[gentoo-commits] proj/linux-patches:4.2 commit in: /

From: Mike Pagano @ 2015-08-19 14:58 UTC
  To: gentoo-commits

commit:     1e94ed6b2554ee5b2a7770481aa5d64ad6a2332b
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Wed Aug 19 14:58:06 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Wed Aug 19 14:58:06 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=1e94ed6b

Patch to enable link security restrictions by default. Patch to disable Windows 8 compatibility for some Lenovo ThinkPads. Patch to ensure that /dev/root doesn't appear in /proc/mounts when booting without an initramfs. Patch to not lock when UMH is waiting on current thread spawned by linuxrc (bug #481344). fbcondecor bootsplash patch. Kernel patch that enables gcc < v4.9 optimizations for additional CPUs. Add patch to support namespace user.pax.* on tmpfs, bug #470644.

 0000_README                                        |    36 +
 1500_XATTR_USER_PREFIX.patch                       |    54 +
 ...ble-link-security-restrictions-by-default.patch |    22 +
 2700_ThinkPad-30-brightness-control-fix.patch      |    67 +
 2900_dev-root-proc-mount-fix.patch                 |    38 +
 2905_2disk-resume-image-fix.patch                  |    24 +
 4200_fbcondecor-3.19.patch                         |  2119 ++
 4567_distro-Gentoo-Kconfig.patch                   |    39 +-
 ...able-additional-cpu-optimizations-for-gcc.patch |   327 +
 ...-additional-cpu-optimizations-for-gcc-4.9.patch |   402 +
 5015_kdbus-8-12-2015.patch                         | 34349 +++++++++++++++++++
 11 files changed, 37446 insertions(+), 31 deletions(-)

diff --git a/0000_README b/0000_README
index 9018993..9022e99 100644
--- a/0000_README
+++ b/0000_README
@@ -43,6 +43,42 @@ EXPERIMENTAL
 Individual Patch Descriptions:
 --------------------------------------------------------------------------
 
+Patch:  1500_XATTR_USER_PREFIX.patch
+From:   https://bugs.gentoo.org/show_bug.cgi?id=470644
+Desc:   Support for namespace user.pax.* on tmpfs.
+
+Patch:  1510_fs-enable-link-security-restrictions-by-default.patch
+From:   http://sources.debian.net/src/linux/3.16.7-ckt4-3/debian/patches/debian/fs-enable-link-security-restrictions-by-default.patch/
+Desc:   Enable link security restrictions by default.
+
+Patch:  2700_ThinkPad-30-brightness-control-fix.patch
+From:   Seth Forshee <seth.forshee@canonical.com>
+Desc:   ACPI: Disable Windows 8 compatibility for some Lenovo ThinkPads.
+
+Patch:  2900_dev-root-proc-mount-fix.patch
+From:   https://bugs.gentoo.org/show_bug.cgi?id=438380
+Desc:   Ensure that /dev/root doesn't appear in /proc/mounts when booting without an initramfs.
+
+Patch:  2905_s2disk-resume-image-fix.patch
+From:   Al Viro <viro <at> ZenIV.linux.org.uk>
+Desc:   Do not lock when UMH is waiting on current thread spawned by linuxrc. (bug #481344)
+
+Patch:  4200_fbcondecor-3.19.patch
+From:   http://www.mepiscommunity.org/fbcondecor
+Desc:   Bootsplash ported by Marco. (Bug #539616)
+
 Patch:  4567_distro-Gentoo-Kconfig.patch
 From:   Tom Wijsman <TomWij@gentoo.org>
 Desc:   Add Gentoo Linux support config settings and defaults.
+
+Patch:  5000_enable-additional-cpu-optimizations-for-gcc.patch
+From:   https://github.com/graysky2/kernel_gcc_patch/
+Desc:   Kernel patch enables gcc < v4.9 optimizations for additional CPUs.
+
+Patch:  5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
+From:   https://github.com/graysky2/kernel_gcc_patch/
+Desc:   Kernel patch enables gcc >= v4.9 optimizations for additional CPUs.
+
+Patch:  5015_kdbus-8-12-2015.patch
+From:   https://lkml.org
+Desc:   Kernel-level IPC implementation

diff --git a/1500_XATTR_USER_PREFIX.patch b/1500_XATTR_USER_PREFIX.patch
new file mode 100644
index 0000000..cc15cd5
--- /dev/null
+++ b/1500_XATTR_USER_PREFIX.patch
@@ -0,0 +1,54 @@
+From: Anthony G. Basile <blueness@gentoo.org>
+
+This patch adds support for a restricted user-controlled namespace on
+tmpfs filesystem used to house PaX flags.  The namespace must be of the
+form user.pax.* and its value cannot exceed a size of 8 bytes.
+
+This is needed even on all Gentoo systems so that XATTR_PAX flags
+are preserved for users who might build packages using portage on
+a tmpfs system with a non-hardened kernel and then switch to a
+hardened kernel with XATTR_PAX enabled.
+
+The namespace is added to any user with Extended Attribute support
+enabled for tmpfs.  Users who do not enable xattrs will not have
+the XATTR_PAX flags preserved.
+
+diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h
+index e4629b9..6958086 100644
+--- a/include/uapi/linux/xattr.h
++++ b/include/uapi/linux/xattr.h
+@@ -63,5 +63,9 @@
+ #define XATTR_POSIX_ACL_DEFAULT  "posix_acl_default"
+ #define XATTR_NAME_POSIX_ACL_DEFAULT XATTR_SYSTEM_PREFIX XATTR_POSIX_ACL_DEFAULT
+ 
++/* User namespace */
++#define XATTR_PAX_PREFIX XATTR_USER_PREFIX "pax."
++#define XATTR_PAX_FLAGS_SUFFIX "flags"
++#define XATTR_NAME_PAX_FLAGS XATTR_PAX_PREFIX XATTR_PAX_FLAGS_SUFFIX
+ 
+ #endif /* _UAPI_LINUX_XATTR_H */
+diff --git a/mm/shmem.c b/mm/shmem.c
+index 1c44af7..f23bb1b 100644
+--- a/mm/shmem.c
++++ b/mm/shmem.c
+@@ -2201,6 +2201,7 @@ static const struct xattr_handler *shmem_xattr_handlers[] = {
+ static int shmem_xattr_validate(const char *name)
+ {
+ 	struct { const char *prefix; size_t len; } arr[] = {
++		{ XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN},
+ 		{ XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN },
+ 		{ XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN }
+ 	};
+@@ -2256,6 +2257,12 @@ static int shmem_setxattr(struct dentry *dentry, const char *name,
+ 	if (err)
+ 		return err;
+ 
++	if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
++		if (strcmp(name, XATTR_NAME_PAX_FLAGS))
++			return -EOPNOTSUPP;
++		if (size > 8)
++			return -EINVAL;
++	}
+ 	return simple_xattr_set(&info->xattrs, name, value, size, flags);
+ }
+ 
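
Editor's illustration (not part of the commit): the check added to
shmem_setxattr() above can be mirrored in user space to see which names
and sizes tmpfs would accept. This is a minimal sketch; the function name
pax_tmpfs_xattr_check is hypothetical, and the macros are re-declared
locally rather than taken from the kernel headers. Note that the hunk
accepts only the exact name user.pax.flags, not every user.pax.* name.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the kernel macros used by the hunk (assumed values). */
#define XATTR_USER_PREFIX     "user."
#define XATTR_USER_PREFIX_LEN (sizeof(XATTR_USER_PREFIX) - 1)
#define XATTR_NAME_PAX_FLAGS  "user.pax.flags"

/*
 * Hypothetical user-space mirror of the added check.
 * Returns 0 when the name/size pair would pass, or a negative errno.
 */
static int pax_tmpfs_xattr_check(const char *name, size_t size)
{
	if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
		/* Only user.pax.flags is allowed in the user namespace. */
		if (strcmp(name, XATTR_NAME_PAX_FLAGS))
			return -EOPNOTSUPP;
		/* The value may hold at most 8 bytes. */
		if (size > 8)
			return -EINVAL;
	}
	/* Names outside "user." are left to the other checks. */
	return 0;
}
```

Calling pax_tmpfs_xattr_check("user.pax.flags", 8) passes, while a
9-byte value or any other user.* name is rejected.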

diff --git a/1510_fs-enable-link-security-restrictions-by-default.patch b/1510_fs-enable-link-security-restrictions-by-default.patch
new file mode 100644
index 0000000..639fb3c
--- /dev/null
+++ b/1510_fs-enable-link-security-restrictions-by-default.patch
@@ -0,0 +1,22 @@
+From: Ben Hutchings <ben@decadent.org.uk>
+Subject: fs: Enable link security restrictions by default
+Date: Fri, 02 Nov 2012 05:32:06 +0000
+Bug-Debian: https://bugs.debian.org/609455
+Forwarded: not-needed
+
+This reverts commit 561ec64ae67ef25cac8d72bb9c4bfc955edfd415
+('VFS: don't do protected {sym,hard}links by default').
+
+--- a/fs/namei.c
++++ b/fs/namei.c
+@@ -651,8 +651,8 @@ static inline void put_link(struct namei
+ 	path_put(link);
+ }
+ 
+-int sysctl_protected_symlinks __read_mostly = 0;
+-int sysctl_protected_hardlinks __read_mostly = 0;
++int sysctl_protected_symlinks __read_mostly = 1;
++int sysctl_protected_hardlinks __read_mostly = 1;
+ 
+ /**
+  * may_follow_link - Check symlink following for unsafe situations

diff --git a/2700_ThinkPad-30-brightness-control-fix.patch b/2700_ThinkPad-30-brightness-control-fix.patch
new file mode 100644
index 0000000..b548c6d
--- /dev/null
+++ b/2700_ThinkPad-30-brightness-control-fix.patch
@@ -0,0 +1,67 @@
+diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
+index cb96296..6c242ed 100644
+--- a/drivers/acpi/blacklist.c
++++ b/drivers/acpi/blacklist.c
+@@ -269,6 +276,61 @@  static struct dmi_system_id acpi_osi_dmi_table[] __initdata = {
+ 	},
+ 
+ 	/*
++	 * The following Lenovo models have a broken workaround in the
++	 * acpi_video backlight implementation to meet the Windows 8
++	 * requirement of 101 backlight levels. Reverting to pre-Win8
++	 * behavior fixes the problem.
++	 */
++	{
++	.callback = dmi_disable_osi_win8,
++	.ident = "Lenovo ThinkPad L430",
++	.matches = {
++		     DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++		     DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad L430"),
++		},
++	},
++	{
++	.callback = dmi_disable_osi_win8,
++	.ident = "Lenovo ThinkPad T430s",
++	.matches = {
++		     DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++		     DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T430s"),
++		},
++	},
++	{
++	.callback = dmi_disable_osi_win8,
++	.ident = "Lenovo ThinkPad T530",
++	.matches = {
++		     DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++		     DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T530"),
++		},
++	},
++	{
++	.callback = dmi_disable_osi_win8,
++	.ident = "Lenovo ThinkPad W530",
++	.matches = {
++		     DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++		     DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad W530"),
++		},
++	},
++	{
++	.callback = dmi_disable_osi_win8,
++	.ident = "Lenovo ThinkPad X1 Carbon",
++	.matches = {
++		     DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++		     DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad X1 Carbon"),
++		},
++	},
++	{
++	.callback = dmi_disable_osi_win8,
++	.ident = "Lenovo ThinkPad X230",
++	.matches = {
++		     DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++		     DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad X230"),
++		},
++	},
++
++	/*
+ 	 * BIOS invocation of _OSI(Linux) is almost always a BIOS bug.
+ 	 * Linux ignores it, except for the machines enumerated below.
+ 	 */
+

diff --git a/2900_dev-root-proc-mount-fix.patch b/2900_dev-root-proc-mount-fix.patch
new file mode 100644
index 0000000..60af1eb
--- /dev/null
+++ b/2900_dev-root-proc-mount-fix.patch
@@ -0,0 +1,38 @@
+--- a/init/do_mounts.c	2015-08-19 10:27:16.753852576 -0400
++++ b/init/do_mounts.c	2015-08-19 10:34:25.473850353 -0400
+@@ -490,7 +490,11 @@ void __init change_floppy(char *fmt, ...
+ 	va_start(args, fmt);
+ 	vsprintf(buf, fmt, args);
+ 	va_end(args);
+-	fd = sys_open("/dev/root", O_RDWR | O_NDELAY, 0);
++	if (saved_root_name[0])
++		fd = sys_open(saved_root_name, O_RDWR | O_NDELAY, 0);
++	else
++		fd = sys_open("/dev/root", O_RDWR | O_NDELAY, 0);
++
+ 	if (fd >= 0) {
+ 		sys_ioctl(fd, FDEJECT, 0);
+ 		sys_close(fd);
+@@ -534,11 +538,17 @@ void __init mount_root(void)
+ #endif
+ #ifdef CONFIG_BLOCK
+ 	{
+-		int err = create_dev("/dev/root", ROOT_DEV);
+-
+-		if (err < 0)
+-			pr_emerg("Failed to create /dev/root: %d\n", err);
+-		mount_block_root("/dev/root", root_mountflags);
++		if (saved_root_name[0] == '/') {
++			int err = create_dev(saved_root_name, ROOT_DEV);
++			if (err < 0)
++				pr_emerg("Failed to create %s: %d\n", saved_root_name, err);
++			mount_block_root(saved_root_name, root_mountflags);
++		} else {
++			int err = create_dev("/dev/root", ROOT_DEV);
++			if (err < 0)
++				pr_emerg("Failed to create /dev/root: %d\n", err);
++			mount_block_root("/dev/root", root_mountflags);
++		}
+ 	}
+ #endif
+ }

diff --git a/2905_2disk-resume-image-fix.patch b/2905_2disk-resume-image-fix.patch
new file mode 100644
index 0000000..7e95d29
--- /dev/null
+++ b/2905_2disk-resume-image-fix.patch
@@ -0,0 +1,24 @@
+diff --git a/kernel/kmod.c b/kernel/kmod.c
+index fb32636..d968882 100644
+--- a/kernel/kmod.c
++++ b/kernel/kmod.c
+@@ -575,7 +575,8 @@
+ 		call_usermodehelper_freeinfo(sub_info);
+ 		return -EINVAL;
+ 	}
+-	helper_lock();
++	if (!(current->flags & PF_FREEZER_SKIP))
++		helper_lock();
+ 	if (!khelper_wq || usermodehelper_disabled) {
+ 		retval = -EBUSY;
+ 		goto out;
+@@ -611,7 +612,8 @@ wait_done:
+ out:
+ 	call_usermodehelper_freeinfo(sub_info);
+ unlock:
+-	helper_unlock();
++	if (!(current->flags & PF_FREEZER_SKIP))
++		helper_unlock();
+ 	return retval;
+ }
+ EXPORT_SYMBOL(call_usermodehelper_exec);
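
Editor's illustration (not part of the commit): both hunks above guard
helper_lock() and helper_unlock() with the same test on current->flags.
A minimal sketch of that predicate follows. The name umh_should_lock is
hypothetical, and the PF_FREEZER_SKIP value is an assumption copied from
kernels of this era; check include/linux/sched.h before relying on it.

```c
#include <stdbool.h>

/* Assumed flag value; verify against include/linux/sched.h. */
#define PF_FREEZER_SKIP 0x40000000u

/*
 * Hypothetical predicate: true when the calling task should take
 * (and later release) the usermodehelper lock. Tasks that carry
 * PF_FREEZER_SKIP skip both calls, which keeps them paired.
 */
static bool umh_should_lock(unsigned int task_flags)
{
	return !(task_flags & PF_FREEZER_SKIP);
}
```

Because the same condition guards the lock and the unlock, a task never
releases a lock it did not take, provided its flags do not change in
between.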

diff --git a/4200_fbcondecor-3.19.patch b/4200_fbcondecor-3.19.patch
new file mode 100644
index 0000000..29c379f
--- /dev/null
+++ b/4200_fbcondecor-3.19.patch
@@ -0,0 +1,2119 @@
+diff --git a/Documentation/fb/00-INDEX b/Documentation/fb/00-INDEX
+index fe85e7c..2230930 100644
+--- a/Documentation/fb/00-INDEX
++++ b/Documentation/fb/00-INDEX
+@@ -23,6 +23,8 @@ ep93xx-fb.txt
+ 	- info on the driver for EP93xx LCD controller.
+ fbcon.txt
+ 	- intro to and usage guide for the framebuffer console (fbcon).
++fbcondecor.txt
++	- info on the Framebuffer Console Decoration
+ framebuffer.txt
+ 	- introduction to frame buffer devices.
+ gxfb.txt
+diff --git a/Documentation/fb/fbcondecor.txt b/Documentation/fb/fbcondecor.txt
+new file mode 100644
+index 0000000..3388c61
+--- /dev/null
++++ b/Documentation/fb/fbcondecor.txt
+@@ -0,0 +1,207 @@
++What is it?
++-----------
++
++The framebuffer decorations are a kernel feature which allows displaying a 
++background picture on selected consoles.
++
++What do I need to get it to work?
++---------------------------------
++
++To get fbcondecor up-and-running you will have to:
++ 1) get a copy of splashutils [1] or a similar program
++ 2) get some fbcondecor themes
++ 3) build the kernel helper program
++ 4) build your kernel with the FB_CON_DECOR option enabled.
++
++To get fbcondecor operational right after fbcon initialization is finished, you
++will have to include a theme and the kernel helper into your initramfs image.
++Please refer to splashutils documentation for instructions on how to do that.
++
++[1] The splashutils package can be downloaded from:
++    http://github.com/alanhaggai/fbsplash
++
++The userspace helper
++--------------------
++
++The userspace fbcondecor helper (by default: /sbin/fbcondecor_helper) is called by the
++kernel whenever an important event occurs and the kernel needs some kind of
++job to be carried out. Important events include console switches and video
++mode switches (the kernel requests background images and configuration
++parameters for the current console). The fbcondecor helper must be accessible at
++all times. If it's not, fbcondecor will be switched off automatically.
++
++It's possible to set the path to the fbcondecor helper by writing it to
++/proc/sys/kernel/fbcondecor.
++
++*****************************************************************************
++
++The information below is mostly technical stuff. There's probably no need to
++read it unless you plan to develop a userspace helper.
++
++The fbcondecor protocol
++-----------------------
++
++The fbcondecor protocol defines a communication interface between the kernel and
++the userspace fbcondecor helper.
++
++The kernel side is responsible for:
++
++ * rendering console text, using an image as a background (instead of a
++   standard solid color fbcon uses),
++ * accepting commands from the user via ioctls on the fbcondecor device,
++ * calling the userspace helper to set things up as soon as the fb subsystem 
++   is initialized.
++
++The userspace helper is responsible for everything else, including parsing
++configuration files, decompressing the image files whenever the kernel needs
++it, and communicating with the kernel if necessary.
++
++The fbcondecor protocol specifies how communication is done in both ways:
++kernel->userspace and userspace->kernel.
++  
++Kernel -> Userspace
++-------------------
++
++The kernel communicates with the userspace helper by calling it and specifying
++the task to be done in a series of arguments.
++
++The arguments follow the pattern:
++<fbcondecor protocol version> <command> <parameters>
++
++All commands defined in fbcondecor protocol v2 have the following parameters:
++ virtual console
++ framebuffer number
++ theme
++
++Fbcondecor protocol v1 specified an additional 'fbcondecor mode' after the
++framebuffer number. Fbcondecor protocol v1 is deprecated and should not be used.
++
++Fbcondecor protocol v2 specifies the following commands:
++
++getpic
++------
++ The kernel issues this command to request image data. It's up to the 
++ userspace helper to find a background image appropriate for the specified
++ theme and the current resolution. The userspace helper should respond by 
++ issuing the FBIOCONDECOR_SETPIC ioctl.
++
++init
++----
++ The kernel issues this command after the fbcondecor device is created and
++ the fbcondecor interface is initialized. Upon receiving 'init', the userspace
++ helper should parse the kernel command line (/proc/cmdline) or otherwise
++ decide whether fbcondecor is to be activated.
++
++ To activate fbcondecor on the first console the helper should issue the
++ FBIOCONDECOR_SETCFG, FBIOCONDECOR_SETPIC and FBIOCONDECOR_SETSTATE commands,
++ in the above-mentioned order.
++
++ When the userspace helper is called in an early phase of the boot process
++ (right after the initialization of fbcon), no filesystems will be mounted.
++ The helper program should mount sysfs and then create the appropriate
++ framebuffer, fbcondecor and tty0 devices (if they don't already exist) to get
++ current display settings and to be able to communicate with the kernel side.
++ It should probably also mount the procfs to be able to parse the kernel
++ command line parameters.
++
++ Note that the console sem is not held when the kernel calls fbcondecor_helper
++ with the 'init' command. The fbcondecor helper should perform all ioctls with
++ origin set to FBCON_DECOR_IO_ORIG_USER.
++
++modechange
++----------
++ The kernel issues this command on a mode change. The helper's response should
++ be similar to the response to the 'init' command. Note that this time the
++ console sem is held and all ioctls must be performed with origin set to
++ FBCON_DECOR_IO_ORIG_KERNEL.
++
++
++Userspace -> Kernel
++-------------------
++
++Userspace programs can communicate with fbcondecor via ioctls on the
++fbcondecor device. These ioctls are to be used by both the userspace helper
++(called only by the kernel) and userspace configuration tools (run by the users).
++
++The fbcondecor helper should set the origin field to FBCON_DECOR_IO_ORIG_KERNEL
++when doing the appropriate ioctls. All userspace configuration tools should
++use FBCON_DECOR_IO_ORIG_USER. Failure to set the appropriate value in the origin
++field when performing ioctls from the kernel helper will most likely result
++in a console deadlock.
++
++FBCON_DECOR_IO_ORIG_KERNEL instructs fbcondecor not to try to acquire the console
++semaphore. Not surprisingly, FBCON_DECOR_IO_ORIG_USER instructs it to acquire
++the console sem.
++
++The framebuffer console decoration provides the following ioctls (all defined in 
++linux/fb.h):
++
++FBIOCONDECOR_SETPIC
++description: loads a background picture for a virtual console
++argument: struct fbcon_decor_iowrapper*; data: struct fb_image*
++notes: 
++If called for consoles other than the current foreground one, the picture data
++will be ignored.
++
++If the current virtual console is running in an 8-bpp mode, the cmap substruct
++of fb_image has to be filled appropriately: start should be set to 16 (first
++16 colors are reserved for fbcon), len to a value <= 240 and red, green and
++blue should point to valid cmap data. The transp field is ignored. The fields
++dx, dy, bg_color, fg_color in fb_image are ignored as well.
++
++FBIOCONDECOR_SETCFG
++description: sets the fbcondecor config for a virtual console
++argument: struct fbcon_decor_iowrapper*; data: struct vc_decor*
++notes: The structure has to be filled with valid data.
++
++FBIOCONDECOR_GETCFG
++description: gets the fbcondecor config for a virtual console
++argument: struct fbcon_decor_iowrapper*; data: struct vc_decor*
++
++FBIOCONDECOR_SETSTATE
++description: sets the fbcondecor state for a virtual console
++argument: struct fbcon_decor_iowrapper*; data: unsigned int*
++          values: 0 = disabled, 1 = enabled.
++
++FBIOCONDECOR_GETSTATE
++description: gets the fbcondecor state for a virtual console
++argument: struct fbcon_decor_iowrapper*; data: unsigned int*
++          values: as in FBIOCONDECOR_SETSTATE
++
++Info on used structures:
++
++Definition of struct vc_decor can be found in linux/console_decor.h. It's
++heavily commented. Note that the 'theme' field should point to a string
++no longer than FBCON_DECOR_THEME_LEN. When FBIOCONDECOR_GETCFG call is
++performed, the theme field should point to a char buffer of length
++FBCON_DECOR_THEME_LEN.
++
++Definition of struct fbcon_decor_iowrapper can be found in linux/fb.h.
++The fields in this struct have the following meaning:
++
++vc: 
++Virtual console number.
++
++origin: 
++Specifies if the ioctl is performed as a response to a kernel request. The
++fbcondecor helper should set this field to FBCON_DECOR_IO_ORIG_KERNEL, userspace
++programs should set it to FBCON_DECOR_IO_ORIG_USER. This field is necessary to
++avoid console semaphore deadlocks.
++
++data: 
++Pointer to a data structure appropriate for the performed ioctl. Type of
++the data struct is specified in the ioctls description.
++
++*****************************************************************************
++
++Credit
++------
++
++Original 'bootsplash' project & implementation by:
++  Volker Poplawski <volker@poplawski.de>, Stefan Reinauer <stepan@suse.de>,
++  Steffen Winterfeldt <snwint@suse.de>, Michael Schroeder <mls@suse.de>,
++  Ken Wimer <wimer@suse.de>.
++
++Fbcondecor, fbcondecor protocol design, current implementation & docs by:
++  Michal Januszewski <michalj+fbcondecor@gmail.com>
++
+diff --git a/drivers/Makefile b/drivers/Makefile
+index 7183b6a..d576148 100644
+--- a/drivers/Makefile
++++ b/drivers/Makefile
+@@ -17,6 +17,10 @@ obj-y				+= pwm/
+ obj-$(CONFIG_PCI)		+= pci/
+ obj-$(CONFIG_PARISC)		+= parisc/
+ obj-$(CONFIG_RAPIDIO)		+= rapidio/
++# tty/ comes before char/ so that the VT console is the boot-time
++# default.
++obj-y				+= tty/
++obj-y				+= char/
+ obj-y				+= video/
+ obj-y				+= idle/
+ 
+@@ -42,11 +46,6 @@ obj-$(CONFIG_REGULATOR)		+= regulator/
+ # reset controllers early, since gpu drivers might rely on them to initialize
+ obj-$(CONFIG_RESET_CONTROLLER)	+= reset/
+ 
+-# tty/ comes before char/ so that the VT console is the boot-time
+-# default.
+-obj-y				+= tty/
+-obj-y				+= char/
+-
+ # iommu/ comes before gpu as gpu are using iommu controllers
+ obj-$(CONFIG_IOMMU_SUPPORT) += iommu/
+
+diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
+index fe1cd01..6d2e87a 100644
+--- a/drivers/video/console/Kconfig
++++ b/drivers/video/console/Kconfig
+@@ -126,6 +126,19 @@ config FRAMEBUFFER_CONSOLE_ROTATION
+          such that other users of the framebuffer will remain normally
+          oriented.
+ 
++config FB_CON_DECOR
++	bool "Support for the Framebuffer Console Decorations"
++	depends on FRAMEBUFFER_CONSOLE=y && !FB_TILEBLITTING
++	default n
++	---help---
++	  This option enables support for framebuffer console decorations which
++	  makes it possible to display images in the background of the system
++	  consoles.  Note that userspace utilities are necessary in order to take 
++	  advantage of these features. Refer to Documentation/fb/fbcondecor.txt 
++	  for more information.
++
++	  If unsure, say N.
++
+ config STI_CONSOLE
+         bool "STI text console"
+         depends on PARISC
+diff --git a/drivers/video/console/Makefile b/drivers/video/console/Makefile
+index 43bfa48..cc104b6f 100644
+--- a/drivers/video/console/Makefile
++++ b/drivers/video/console/Makefile
+@@ -16,4 +16,5 @@ obj-$(CONFIG_FRAMEBUFFER_CONSOLE)     += fbcon_rotate.o fbcon_cw.o fbcon_ud.o \
+                                          fbcon_ccw.o
+ endif
+ 
++obj-$(CONFIG_FB_CON_DECOR)     	  += fbcondecor.o cfbcondecor.o
+ obj-$(CONFIG_FB_STI)              += sticore.o
+diff --git a/drivers/video/console/bitblit.c b/drivers/video/console/bitblit.c
+index 61b182b..984384b 100644
+--- a/drivers/video/console/bitblit.c
++++ b/drivers/video/console/bitblit.c
+@@ -18,6 +18,7 @@
+ #include <linux/console.h>
+ #include <asm/types.h>
+ #include "fbcon.h"
++#include "fbcondecor.h"
+ 
+ /*
+  * Accelerated handlers.
+@@ -55,6 +56,13 @@ static void bit_bmove(struct vc_data *vc, struct fb_info *info, int sy,
+ 	area.height = height * vc->vc_font.height;
+ 	area.width = width * vc->vc_font.width;
+ 
++	if (fbcon_decor_active(info, vc)) {
++		area.sx += vc->vc_decor.tx;
++		area.sy += vc->vc_decor.ty;
++		area.dx += vc->vc_decor.tx;
++		area.dy += vc->vc_decor.ty;
++	}
++
+ 	info->fbops->fb_copyarea(info, &area);
+ }
+ 
+@@ -380,11 +388,15 @@ static void bit_cursor(struct vc_data *vc, struct fb_info *info, int mode,
+ 	cursor.image.depth = 1;
+ 	cursor.rop = ROP_XOR;
+ 
+-	if (info->fbops->fb_cursor)
+-		err = info->fbops->fb_cursor(info, &cursor);
++	if (fbcon_decor_active(info, vc)) {
++		fbcon_decor_cursor(info, &cursor);
++	} else {
++		if (info->fbops->fb_cursor)
++			err = info->fbops->fb_cursor(info, &cursor);
+ 
+-	if (err)
+-		soft_cursor(info, &cursor);
++		if (err)
++			soft_cursor(info, &cursor);
++	}
+ 
+ 	ops->cursor_reset = 0;
+ }
+diff --git a/drivers/video/console/cfbcondecor.c b/drivers/video/console/cfbcondecor.c
+new file mode 100644
+index 0000000..a2b4497
+--- /dev/null
++++ b/drivers/video/console/cfbcondecor.c
+@@ -0,0 +1,471 @@
++/*
++ *  linux/drivers/video/cfbcon_decor.c -- Framebuffer decor render functions
++ *
++ *  Copyright (C) 2004 Michal Januszewski <michalj+fbcondecor@gmail.com>
++ *
++ *  Code based upon "Bootdecor" (C) 2001-2003
++ *       Volker Poplawski <volker@poplawski.de>,
++ *       Stefan Reinauer <stepan@suse.de>,
++ *       Steffen Winterfeldt <snwint@suse.de>,
++ *       Michael Schroeder <mls@suse.de>,
++ *       Ken Wimer <wimer@suse.de>.
++ *
++ *  This file is subject to the terms and conditions of the GNU General Public
++ *  License.  See the file COPYING in the main directory of this archive for
++ *  more details.
++ */
++#include <linux/module.h>
++#include <linux/types.h>
++#include <linux/fb.h>
++#include <linux/selection.h>
++#include <linux/slab.h>
++#include <linux/vt_kern.h>
++#include <asm/irq.h>
++
++#include "fbcon.h"
++#include "fbcondecor.h"
++
++#define parse_pixel(shift,bpp,type)						\
++	do {									\
++		if (d & (0x80 >> (shift)))					\
++			dd2[(shift)] = fgx;					\
++		else								\
++			dd2[(shift)] = transparent ? *(type *)decor_src : bgx;	\
++		decor_src += (bpp);						\
++	} while (0)
++
++extern int get_color(struct vc_data *vc, struct fb_info *info,
++		     u16 c, int is_fg);
++
++void fbcon_decor_fix_pseudo_pal(struct fb_info *info, struct vc_data *vc)
++{
++	int i, j, k;
++	int minlen = min(min(info->var.red.length, info->var.green.length),
++			     info->var.blue.length);
++	u32 col;
++
++	for (j = i = 0; i < 16; i++) {
++		k = color_table[i];
++
++		col = ((vc->vc_palette[j++]  >> (8-minlen))
++			<< info->var.red.offset);
++		col |= ((vc->vc_palette[j++] >> (8-minlen))
++			<< info->var.green.offset);
++		col |= ((vc->vc_palette[j++] >> (8-minlen))
++			<< info->var.blue.offset);
++		((u32 *)info->pseudo_palette)[k] = col;
++	}
++}
++
++void fbcon_decor_renderc(struct fb_info *info, int ypos, int xpos, int height,
++		      int width, u8* src, u32 fgx, u32 bgx, u8 transparent)
++{
++	unsigned int x, y;
++	u32 dd;
++	int bytespp = ((info->var.bits_per_pixel + 7) >> 3);
++	unsigned int d = ypos * info->fix.line_length + xpos * bytespp;
++	unsigned int ds = (ypos * info->var.xres + xpos) * bytespp;
++	u16 dd2[4];
++
++	u8* decor_src = (u8 *)(info->bgdecor.data + ds);
++	u8* dst = (u8 *)(info->screen_base + d);
++
++	if ((ypos + height) > info->var.yres || (xpos + width) > info->var.xres)
++		return;
++
++	for (y = 0; y < height; y++) {
++		switch (info->var.bits_per_pixel) {
++
++		case 32:
++			for (x = 0; x < width; x++) {
++
++				if ((x & 7) == 0)
++					d = *src++;
++				if (d & 0x80)
++					dd = fgx;
++				else
++					dd = transparent ?
++					     *(u32 *)decor_src : bgx;
++
++				d <<= 1;
++				decor_src += 4;
++				fb_writel(dd, dst);
++				dst += 4;
++			}
++			break;
++		case 24:
++			for (x = 0; x < width; x++) {
++
++				if ((x & 7) == 0)
++					d = *src++;
++				if (d & 0x80)
++					dd = fgx;
++				else
++					dd = transparent ?
++					     (*(u32 *)decor_src & 0xffffff) : bgx;
++
++				d <<= 1;
++				decor_src += 3;
++#ifdef __LITTLE_ENDIAN
++				fb_writew(dd & 0xffff, dst);
++				dst += 2;
++				fb_writeb((dd >> 16), dst);
++#else
++				fb_writew(dd >> 8, dst);
++				dst += 2;
++				fb_writeb(dd & 0xff, dst);
++#endif
++				dst++;
++			}
++			break;
++		case 16:
++			for (x = 0; x < width; x += 2) {
++				if ((x & 7) == 0)
++					d = *src++;
++
++				parse_pixel(0, 2, u16);
++				parse_pixel(1, 2, u16);
++#ifdef __LITTLE_ENDIAN
++				dd = dd2[0] | (dd2[1] << 16);
++#else
++				dd = dd2[1] | (dd2[0] << 16);
++#endif
++				d <<= 2;
++				fb_writel(dd, dst);
++				dst += 4;
++			}
++			break;
++
++		case 8:
++			for (x = 0; x < width; x += 4) {
++				if ((x & 7) == 0)
++					d = *src++;
++
++				parse_pixel(0, 1, u8);
++				parse_pixel(1, 1, u8);
++				parse_pixel(2, 1, u8);
++				parse_pixel(3, 1, u8);
++
++#ifdef __LITTLE_ENDIAN
++				dd = dd2[0] | (dd2[1] << 8) | (dd2[2] << 16) | (dd2[3] << 24);
++#else
++				dd = dd2[3] | (dd2[2] << 8) | (dd2[1] << 16) | (dd2[0] << 24);
++#endif
++				d <<= 4;
++				fb_writel(dd, dst);
++				dst += 4;
++			}
++		}
++
++		dst += info->fix.line_length - width * bytespp;
++		decor_src += (info->var.xres - width) * bytespp;
++	}
++}
++
++#define cc2cx(a) 						\
++	((info->fix.visual == FB_VISUAL_TRUECOLOR || 		\
++	  info->fix.visual == FB_VISUAL_DIRECTCOLOR) ? 		\
++	 ((u32*)info->pseudo_palette)[a] : a)
++
++void fbcon_decor_putcs(struct vc_data *vc, struct fb_info *info,
++		   const unsigned short *s, int count, int yy, int xx)
++{
++	unsigned short charmask = vc->vc_hi_font_mask ? 0x1ff : 0xff;
++	struct fbcon_ops *ops = info->fbcon_par;
++	int fg_color, bg_color, transparent;
++	u8 *src;
++	u32 bgx, fgx;
++	u16 c = scr_readw(s);
++
++	fg_color = get_color(vc, info, c, 1);
++	bg_color = get_color(vc, info, c, 0);
++
++	/* Don't paint the background image if console is blanked */
++	transparent = ops->blank_state ? 0 :
++		(vc->vc_decor.bg_color == bg_color);
++
++	xx = xx * vc->vc_font.width + vc->vc_decor.tx;
++	yy = yy * vc->vc_font.height + vc->vc_decor.ty;
++
++	fgx = cc2cx(fg_color);
++	bgx = cc2cx(bg_color);
++
++	while (count--) {
++		c = scr_readw(s++);
++		src = vc->vc_font.data + (c & charmask) * vc->vc_font.height *
++		      ((vc->vc_font.width + 7) >> 3);
++
++		fbcon_decor_renderc(info, yy, xx, vc->vc_font.height,
++			       vc->vc_font.width, src, fgx, bgx, transparent);
++		xx += vc->vc_font.width;
++	}
++}
++
++void fbcon_decor_cursor(struct fb_info *info, struct fb_cursor *cursor)
++{
++	int i;
++	unsigned int dsize, s_pitch;
++	struct fbcon_ops *ops = info->fbcon_par;
++	struct vc_data* vc;
++	u8 *src;
++
++	/* we really don't need any cursors while the console is blanked */
++	if (info->state != FBINFO_STATE_RUNNING || ops->blank_state)
++		return;
++
++	vc = vc_cons[ops->currcon].d;
++
++	src = kmalloc(64 + sizeof(struct fb_image), GFP_ATOMIC);
++	if (!src)
++		return;
++
++	s_pitch = (cursor->image.width + 7) >> 3;
++	dsize = s_pitch * cursor->image.height;
++	if (cursor->enable) {
++		switch (cursor->rop) {
++		case ROP_XOR:
++			for (i = 0; i < dsize; i++)
++				src[i] = cursor->image.data[i] ^ cursor->mask[i];
++			break;
++		case ROP_COPY:
++		default:
++			for (i = 0; i < dsize; i++)
++				src[i] = cursor->image.data[i] & cursor->mask[i];
++			break;
++		}
++	} else
++		memcpy(src, cursor->image.data, dsize);
++
++	fbcon_decor_renderc(info,
++			cursor->image.dy + vc->vc_decor.ty,
++			cursor->image.dx + vc->vc_decor.tx,
++			cursor->image.height,
++			cursor->image.width,
++			(u8*)src,
++			cc2cx(cursor->image.fg_color),
++			cc2cx(cursor->image.bg_color),
++			cursor->image.bg_color == vc->vc_decor.bg_color);
++
++	kfree(src);
++}
++
++static void decorset(u8 *dst, int height, int width, int dstbytes,
++		        u32 bgx, int bpp)
++{
++	int i;
++
++	if (bpp == 8)
++		bgx |= bgx << 8;
++	if (bpp == 16 || bpp == 8)
++		bgx |= bgx << 16;
++
++	while (height-- > 0) {
++		u8 *p = dst;
++
++		switch (bpp) {
++
++		case 32:
++			for (i=0; i < width; i++) {
++				fb_writel(bgx, p); p += 4;
++			}
++			break;
++		case 24:
++			for (i=0; i < width; i++) {
++#ifdef __LITTLE_ENDIAN
++				fb_writew((bgx & 0xffff),(u16*)p); p += 2;
++				fb_writeb((bgx >> 16),p++);
++#else
++				fb_writew((bgx >> 8),(u16*)p); p += 2;
++				fb_writeb((bgx & 0xff),p++);
++#endif
++			}
++			break;
++		case 16:
++			for (i=0; i < width/4; i++) {
++				fb_writel(bgx,p); p += 4;
++				fb_writel(bgx,p); p += 4;
++			}
++			if (width & 2) {
++				fb_writel(bgx,p); p += 4;
++			}
++			if (width & 1)
++				fb_writew(bgx,(u16*)p);
++			break;
++		case 8:
++			for (i=0; i < width/4; i++) {
++				fb_writel(bgx,p); p += 4;
++			}
++
++			if (width & 2) {
++				fb_writew(bgx,p); p += 2;
++			}
++			if (width & 1)
++				fb_writeb(bgx,(u8*)p);
++			break;
++
++		}
++		dst += dstbytes;
++	}
++}
++
++void fbcon_decor_copy(u8 *dst, u8 *src, int height, int width, int linebytes,
++		   int srclinebytes, int bpp)
++{
++	int i;
++
++	while (height-- > 0) {
++		u32 *p = (u32 *)dst;
++		u32 *q = (u32 *)src;
++
++		switch (bpp) {
++
++		case 32:
++			for (i=0; i < width; i++)
++				fb_writel(*q++, p++);
++			break;
++		case 24:
++			for (i=0; i < (width*3/4); i++)
++				fb_writel(*q++, p++);
++			if ((width*3) % 4) {
++				if (width & 2) {
++					fb_writeb(*(u8*)q, (u8*)p);
++				} else if (width & 1) {
++					fb_writew(*(u16*)q, (u16*)p);
++					fb_writeb(*(u8*)((u16*)q+1),(u8*)((u16*)p+2));
++				}
++			}
++			break;
++		case 16:
++			for (i=0; i < width/4; i++) {
++				fb_writel(*q++, p++);
++				fb_writel(*q++, p++);
++			}
++			if (width & 2)
++				fb_writel(*q++, p++);
++			if (width & 1)
++				fb_writew(*(u16*)q, (u16*)p);
++			break;
++		case 8:
++			for (i=0; i < width/4; i++)
++				fb_writel(*q++, p++);
++
++			if (width & 2) {
++				fb_writew(*(u16*)q, (u16*)p);
++				q = (u32*) ((u16*)q + 1);
++				p = (u32*) ((u16*)p + 1);
++			}
++			if (width & 1)
++				fb_writeb(*(u8*)q, (u8*)p);
++			break;
++		}
++
++		dst += linebytes;
++		src += srclinebytes;
++	}
++}
++
++static void decorfill(struct fb_info *info, int sy, int sx, int height,
++		       int width)
++{
++	int bytespp = ((info->var.bits_per_pixel + 7) >> 3);
++	int d  = sy * info->fix.line_length + sx * bytespp;
++	int ds = (sy * info->var.xres + sx) * bytespp;
++
++	fbcon_decor_copy((u8 *)(info->screen_base + d), (u8 *)(info->bgdecor.data + ds),
++		    height, width, info->fix.line_length, info->var.xres * bytespp,
++		    info->var.bits_per_pixel);
++}
++
++void fbcon_decor_clear(struct vc_data *vc, struct fb_info *info, int sy, int sx,
++		    int height, int width)
++{
++	int bgshift = (vc->vc_hi_font_mask) ? 13 : 12;
++	struct fbcon_ops *ops = info->fbcon_par;
++	u8 *dst;
++	int transparent, bg_color = attr_bgcol_ec(bgshift, vc, info);
++
++	transparent = (vc->vc_decor.bg_color == bg_color);
++	sy = sy * vc->vc_font.height + vc->vc_decor.ty;
++	sx = sx * vc->vc_font.width + vc->vc_decor.tx;
++	height *= vc->vc_font.height;
++	width *= vc->vc_font.width;
++
++	/* Don't paint the background image if console is blanked */
++	if (transparent && !ops->blank_state) {
++		decorfill(info, sy, sx, height, width);
++	} else {
++		dst = (u8 *)(info->screen_base + sy * info->fix.line_length +
++			     sx * ((info->var.bits_per_pixel + 7) >> 3));
++		decorset(dst, height, width, info->fix.line_length, cc2cx(bg_color),
++			  info->var.bits_per_pixel);
++	}
++}
++
++void fbcon_decor_clear_margins(struct vc_data *vc, struct fb_info *info,
++			    int bottom_only)
++{
++	unsigned int tw = vc->vc_cols*vc->vc_font.width;
++	unsigned int th = vc->vc_rows*vc->vc_font.height;
++
++	if (!bottom_only) {
++		/* top margin */
++		decorfill(info, 0, 0, vc->vc_decor.ty, info->var.xres);
++		/* left margin */
++		decorfill(info, vc->vc_decor.ty, 0, th, vc->vc_decor.tx);
++		/* right margin */
++		decorfill(info, vc->vc_decor.ty, vc->vc_decor.tx + tw, th, 
++			   info->var.xres - vc->vc_decor.tx - tw);
++	}
++	decorfill(info, vc->vc_decor.ty + th, 0, 
++		   info->var.yres - vc->vc_decor.ty - th, info->var.xres);
++}
++
++void fbcon_decor_bmove_redraw(struct vc_data *vc, struct fb_info *info, int y, 
++			   int sx, int dx, int width)
++{
++	u16 *d = (u16 *) (vc->vc_origin + vc->vc_size_row * y + dx * 2);
++	u16 *s = d + (dx - sx);
++	u16 *start = d;
++	u16 *ls = d;
++	u16 *le = d + width;
++	u16 c;
++	int x = dx;
++	u16 attr = 1;
++
++	do {
++		c = scr_readw(d);
++		if (attr != (c & 0xff00)) {
++			attr = c & 0xff00;
++			if (d > start) {
++				fbcon_decor_putcs(vc, info, start, d - start, y, x);
++				x += d - start;
++				start = d;
++			}
++		}
++		if (s >= ls && s < le && c == scr_readw(s)) {
++			if (d > start) {
++				fbcon_decor_putcs(vc, info, start, d - start, y, x);
++				x += d - start + 1;
++				start = d + 1;
++			} else {
++				x++;
++				start++;
++			}
++		}
++		s++;
++		d++;
++	} while (d < le);
++	if (d > start)
++		fbcon_decor_putcs(vc, info, start, d - start, y, x);
++}
++
++void fbcon_decor_blank(struct vc_data *vc, struct fb_info *info, int blank)
++{
++	if (blank) {
++		decorset((u8 *)info->screen_base, info->var.yres, info->var.xres,
++			  info->fix.line_length, 0, info->var.bits_per_pixel);
++	} else {
++		update_screen(vc);
++		fbcon_decor_clear_margins(vc, info, 0);
++	}
++}
++
+diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
+index f447734..da50d61 100644
+--- a/drivers/video/console/fbcon.c
++++ b/drivers/video/console/fbcon.c
+@@ -79,6 +79,7 @@
+ #include <asm/irq.h>
+ 
+ #include "fbcon.h"
++#include "../console/fbcondecor.h"
+ 
+ #ifdef FBCONDEBUG
+ #  define DPRINTK(fmt, args...) printk(KERN_DEBUG "%s: " fmt, __func__ , ## args)
+@@ -94,7 +95,7 @@ enum {
+ 
+ static struct display fb_display[MAX_NR_CONSOLES];
+ 
+-static signed char con2fb_map[MAX_NR_CONSOLES];
++signed char con2fb_map[MAX_NR_CONSOLES];
+ static signed char con2fb_map_boot[MAX_NR_CONSOLES];
+ 
+ static int logo_lines;
+@@ -286,7 +287,7 @@ static inline int fbcon_is_inactive(struct vc_data *vc, struct fb_info *info)
+ 		!vt_force_oops_output(vc);
+ }
+ 
+-static int get_color(struct vc_data *vc, struct fb_info *info,
++int get_color(struct vc_data *vc, struct fb_info *info,
+ 	      u16 c, int is_fg)
+ {
+ 	int depth = fb_get_color_depth(&info->var, &info->fix);
+@@ -551,6 +552,9 @@ static int do_fbcon_takeover(int show_logo)
+ 		info_idx = -1;
+ 	} else {
+ 		fbcon_has_console_bind = 1;
++#ifdef CONFIG_FB_CON_DECOR
++		fbcon_decor_init();
++#endif
+ 	}
+ 
+ 	return err;
+@@ -1007,6 +1011,12 @@ static const char *fbcon_startup(void)
+ 	rows = FBCON_SWAP(ops->rotate, info->var.yres, info->var.xres);
+ 	cols /= vc->vc_font.width;
+ 	rows /= vc->vc_font.height;
++
++	if (fbcon_decor_active(info, vc)) {
++		cols = vc->vc_decor.twidth / vc->vc_font.width;
++		rows = vc->vc_decor.theight / vc->vc_font.height;
++	}
++
+ 	vc_resize(vc, cols, rows);
+ 
+ 	DPRINTK("mode:   %s\n", info->fix.id);
+@@ -1036,7 +1046,7 @@ static void fbcon_init(struct vc_data *vc, int init)
+ 	cap = info->flags;
+ 
+ 	if (vc != svc || logo_shown == FBCON_LOGO_DONTSHOW ||
+-	    (info->fix.type == FB_TYPE_TEXT))
++	    (info->fix.type == FB_TYPE_TEXT) || fbcon_decor_active(info, vc))
+ 		logo = 0;
+ 
+ 	if (var_to_display(p, &info->var, info))
+@@ -1260,6 +1270,11 @@ static void fbcon_clear(struct vc_data *vc, int sy, int sx, int height,
+ 		fbcon_clear_margins(vc, 0);
+ 	}
+ 
++	if (fbcon_decor_active(info, vc)) {
++		fbcon_decor_clear(vc, info, sy, sx, height, width);
++		return;
++	}
++
+ 	/* Split blits that cross physical y_wrap boundary */
+ 
+ 	y_break = p->vrows - p->yscroll;
+@@ -1279,10 +1294,15 @@ static void fbcon_putcs(struct vc_data *vc, const unsigned short *s,
+ 	struct display *p = &fb_display[vc->vc_num];
+ 	struct fbcon_ops *ops = info->fbcon_par;
+ 
+-	if (!fbcon_is_inactive(vc, info))
+-		ops->putcs(vc, info, s, count, real_y(p, ypos), xpos,
+-			   get_color(vc, info, scr_readw(s), 1),
+-			   get_color(vc, info, scr_readw(s), 0));
++	if (!fbcon_is_inactive(vc, info)) {
++
++		if (fbcon_decor_active(info, vc))
++			fbcon_decor_putcs(vc, info, s, count, ypos, xpos);
++		else
++			ops->putcs(vc, info, s, count, real_y(p, ypos), xpos,
++				   get_color(vc, info, scr_readw(s), 1),
++				   get_color(vc, info, scr_readw(s), 0));
++	}
+ }
+ 
+ static void fbcon_putc(struct vc_data *vc, int c, int ypos, int xpos)
+@@ -1298,8 +1318,13 @@ static void fbcon_clear_margins(struct vc_data *vc, int bottom_only)
+ 	struct fb_info *info = registered_fb[con2fb_map[vc->vc_num]];
+ 	struct fbcon_ops *ops = info->fbcon_par;
+ 
+-	if (!fbcon_is_inactive(vc, info))
+-		ops->clear_margins(vc, info, bottom_only);
++	if (!fbcon_is_inactive(vc, info)) {
++		if (fbcon_decor_active(info, vc)) {
++			fbcon_decor_clear_margins(vc, info, bottom_only);
++		} else {
++			ops->clear_margins(vc, info, bottom_only);
++		}
++	}
+ }
+ 
+ static void fbcon_cursor(struct vc_data *vc, int mode)
+@@ -1819,7 +1844,7 @@ static int fbcon_scroll(struct vc_data *vc, int t, int b, int dir,
+ 			count = vc->vc_rows;
+ 		if (softback_top)
+ 			fbcon_softback_note(vc, t, count);
+-		if (logo_shown >= 0)
++		if (logo_shown >= 0 || fbcon_decor_active(info, vc))
+ 			goto redraw_up;
+ 		switch (p->scrollmode) {
+ 		case SCROLL_MOVE:
+@@ -1912,6 +1937,8 @@ static int fbcon_scroll(struct vc_data *vc, int t, int b, int dir,
+ 			count = vc->vc_rows;
+ 		if (logo_shown >= 0)
+ 			goto redraw_down;
++		if (fbcon_decor_active(info, vc))
++			goto redraw_down;
+ 		switch (p->scrollmode) {
+ 		case SCROLL_MOVE:
+ 			fbcon_redraw_blit(vc, info, p, b - 1, b - t - count,
+@@ -2060,6 +2087,13 @@ static void fbcon_bmove_rec(struct vc_data *vc, struct display *p, int sy, int s
+ 		}
+ 		return;
+ 	}
++
++	if (fbcon_decor_active(info, vc) && sy == dy && height == 1) {
++		/* must use slower redraw bmove to keep background pic intact */
++		fbcon_decor_bmove_redraw(vc, info, sy, sx, dx, width);
++		return;
++	}
++
+ 	ops->bmove(vc, info, real_y(p, sy), sx, real_y(p, dy), dx,
+ 		   height, width);
+ }
+@@ -2130,8 +2164,8 @@ static int fbcon_resize(struct vc_data *vc, unsigned int width,
+ 	var.yres = virt_h * virt_fh;
+ 	x_diff = info->var.xres - var.xres;
+ 	y_diff = info->var.yres - var.yres;
+-	if (x_diff < 0 || x_diff > virt_fw ||
+-	    y_diff < 0 || y_diff > virt_fh) {
++	if ((x_diff < 0 || x_diff > virt_fw ||
++		y_diff < 0 || y_diff > virt_fh) && !vc->vc_decor.state) {
+ 		const struct fb_videomode *mode;
+ 
+ 		DPRINTK("attempting resize %ix%i\n", var.xres, var.yres);
+@@ -2167,6 +2201,21 @@ static int fbcon_switch(struct vc_data *vc)
+ 
+ 	info = registered_fb[con2fb_map[vc->vc_num]];
+ 	ops = info->fbcon_par;
++	prev_console = ops->currcon;
++	if (prev_console != -1)
++		old_info = registered_fb[con2fb_map[prev_console]];
++
++#ifdef CONFIG_FB_CON_DECOR
++	if (!fbcon_decor_active_vc(vc) && info->fix.visual == FB_VISUAL_DIRECTCOLOR) {
++		struct vc_data *vc_curr = vc_cons[prev_console].d;
++		if (vc_curr && fbcon_decor_active_vc(vc_curr)) {
++			/* Clear the screen to avoid displaying funky colors during
++			 * palette updates. */
++			memset((u8*)info->screen_base + info->fix.line_length * info->var.yoffset,
++			       0, info->var.yres * info->fix.line_length);
++		}
++	}
++#endif
+ 
+ 	if (softback_top) {
+ 		if (softback_lines)
+@@ -2185,9 +2234,6 @@ static int fbcon_switch(struct vc_data *vc)
+ 		logo_shown = FBCON_LOGO_CANSHOW;
+ 	}
+ 
+-	prev_console = ops->currcon;
+-	if (prev_console != -1)
+-		old_info = registered_fb[con2fb_map[prev_console]];
+ 	/*
+ 	 * FIXME: If we have multiple fbdev's loaded, we need to
+ 	 * update all info->currcon.  Perhaps, we can place this
+@@ -2231,6 +2277,18 @@ static int fbcon_switch(struct vc_data *vc)
+ 			fbcon_del_cursor_timer(old_info);
+ 	}
+ 
++	if (fbcon_decor_active_vc(vc)) {
++		struct vc_data *vc_curr = vc_cons[prev_console].d;
++
++		if (!vc_curr->vc_decor.theme ||
++			strcmp(vc->vc_decor.theme, vc_curr->vc_decor.theme) ||
++			(fbcon_decor_active_nores(info, vc_curr) &&
++			 !fbcon_decor_active(info, vc_curr))) {
++			fbcon_decor_disable(vc, 0);
++			fbcon_decor_call_helper("modechange", vc->vc_num);
++		}
++	}
++
+ 	if (fbcon_is_inactive(vc, info) ||
+ 	    ops->blank_state != FB_BLANK_UNBLANK)
+ 		fbcon_del_cursor_timer(info);
+@@ -2339,15 +2397,20 @@ static int fbcon_blank(struct vc_data *vc, int blank, int mode_switch)
+ 		}
+ 	}
+ 
+- 	if (!fbcon_is_inactive(vc, info)) {
++	if (!fbcon_is_inactive(vc, info)) {
+ 		if (ops->blank_state != blank) {
+ 			ops->blank_state = blank;
+ 			fbcon_cursor(vc, blank ? CM_ERASE : CM_DRAW);
+ 			ops->cursor_flash = (!blank);
+ 
+-			if (!(info->flags & FBINFO_MISC_USEREVENT))
+-				if (fb_blank(info, blank))
+-					fbcon_generic_blank(vc, info, blank);
++			if (!(info->flags & FBINFO_MISC_USEREVENT)) {
++				if (fb_blank(info, blank)) {
++					if (fbcon_decor_active(info, vc))
++						fbcon_decor_blank(vc, info, blank);
++					else
++						fbcon_generic_blank(vc, info, blank);
++				}
++			}
+ 		}
+ 
+ 		if (!blank)
+@@ -2522,13 +2585,22 @@ static int fbcon_do_set_font(struct vc_data *vc, int w, int h,
+ 	}
+ 
+ 	if (resize) {
++		/* reset wrap/pan */
+ 		int cols, rows;
+ 
+ 		cols = FBCON_SWAP(ops->rotate, info->var.xres, info->var.yres);
+ 		rows = FBCON_SWAP(ops->rotate, info->var.yres, info->var.xres);
++
++		if (fbcon_decor_active(info, vc)) {
++			info->var.xoffset = info->var.yoffset = p->yscroll = 0;
++			cols = vc->vc_decor.twidth;
++			rows = vc->vc_decor.theight;
++		}
+ 		cols /= w;
+ 		rows /= h;
++
+ 		vc_resize(vc, cols, rows);
++
+ 		if (CON_IS_VISIBLE(vc) && softback_buf)
+ 			fbcon_update_softback(vc);
+ 	} else if (CON_IS_VISIBLE(vc)
+@@ -2657,7 +2729,11 @@ static int fbcon_set_palette(struct vc_data *vc, unsigned char *table)
+ 	int i, j, k, depth;
+ 	u8 val;
+ 
+-	if (fbcon_is_inactive(vc, info))
++	if (fbcon_is_inactive(vc, info)
++#ifdef CONFIG_FB_CON_DECOR
++			|| vc->vc_num != fg_console
++#endif
++		)
+ 		return -EINVAL;
+ 
+ 	if (!CON_IS_VISIBLE(vc))
+@@ -2683,14 +2759,56 @@ static int fbcon_set_palette(struct vc_data *vc, unsigned char *table)
+ 	} else
+ 		fb_copy_cmap(fb_default_cmap(1 << depth), &palette_cmap);
+ 
+-	return fb_set_cmap(&palette_cmap, info);
++	if (fbcon_decor_active(info, vc_cons[fg_console].d) &&
++	    info->fix.visual == FB_VISUAL_DIRECTCOLOR) {
++
++		u16 *red, *green, *blue;
++		int minlen = min(min(info->var.red.length, info->var.green.length),
++				     info->var.blue.length);
++		int h;
++
++		struct fb_cmap cmap = {
++			.start = 0,
++			.len = (1 << minlen),
++			.red = NULL,
++			.green = NULL,
++			.blue = NULL,
++			.transp = NULL
++		};
++
++		red = kmalloc(256 * sizeof(u16) * 3, GFP_KERNEL);
++
++		if (!red)
++			goto out;
++
++		green = red + 256;
++		blue = green + 256;
++		cmap.red = red;
++		cmap.green = green;
++		cmap.blue = blue;
++
++		for (i = 0; i < cmap.len; i++) {
++			red[i] = green[i] = blue[i] = (0xffff * i)/(cmap.len-1);
++		}
++
++		h = fb_set_cmap(&cmap, info);
++		fbcon_decor_fix_pseudo_pal(info, vc_cons[fg_console].d);
++		kfree(red);
++
++		return h;
++
++	} else if (fbcon_decor_active(info, vc_cons[fg_console].d) &&
++		   info->var.bits_per_pixel == 8 && info->bgdecor.cmap.red != NULL)
++		fb_set_cmap(&info->bgdecor.cmap, info);
++
++out:	return fb_set_cmap(&palette_cmap, info);
+ }
+ 
+ static u16 *fbcon_screen_pos(struct vc_data *vc, int offset)
+ {
+ 	unsigned long p;
+ 	int line;
+-	
++
+ 	if (vc->vc_num != fg_console || !softback_lines)
+ 		return (u16 *) (vc->vc_origin + offset);
+ 	line = offset / vc->vc_size_row;
+@@ -2909,7 +3027,14 @@ static void fbcon_modechanged(struct fb_info *info)
+ 		rows = FBCON_SWAP(ops->rotate, info->var.yres, info->var.xres);
+ 		cols /= vc->vc_font.width;
+ 		rows /= vc->vc_font.height;
+-		vc_resize(vc, cols, rows);
++
++		if (!fbcon_decor_active_nores(info, vc)) {
++			vc_resize(vc, cols, rows);
++		} else {
++			fbcon_decor_disable(vc, 0);
++			fbcon_decor_call_helper("modechange", vc->vc_num);
++		}
++
+ 		updatescrollmode(p, info, vc);
+ 		scrollback_max = 0;
+ 		scrollback_current = 0;
+@@ -2954,7 +3079,9 @@ static void fbcon_set_all_vcs(struct fb_info *info)
+ 		rows = FBCON_SWAP(ops->rotate, info->var.yres, info->var.xres);
+ 		cols /= vc->vc_font.width;
+ 		rows /= vc->vc_font.height;
+-		vc_resize(vc, cols, rows);
++		if (!fbcon_decor_active_nores(info, vc)) {
++			vc_resize(vc, cols, rows);
++		}
+ 	}
+ 
+ 	if (fg != -1)
+@@ -3596,6 +3723,7 @@ static void fbcon_exit(void)
+ 		}
+ 	}
+ 
++	fbcon_decor_exit();
+ 	fbcon_has_exited = 1;
+ }
+ 
+diff --git a/drivers/video/console/fbcondecor.c b/drivers/video/console/fbcondecor.c
+new file mode 100644
+index 0000000..babc8c5
+--- /dev/null
++++ b/drivers/video/console/fbcondecor.c
+@@ -0,0 +1,555 @@
++/*
++ *  linux/drivers/video/console/fbcondecor.c -- Framebuffer console decorations
++ *
++ *  Copyright (C) 2004-2009 Michal Januszewski <michalj+fbcondecor@gmail.com>
++ *
++ *  Code based upon "Bootsplash" (C) 2001-2003
++ *       Volker Poplawski <volker@poplawski.de>,
++ *       Stefan Reinauer <stepan@suse.de>,
++ *       Steffen Winterfeldt <snwint@suse.de>,
++ *       Michael Schroeder <mls@suse.de>,
++ *       Ken Wimer <wimer@suse.de>.
++ *
++ *  Compat ioctl support by Thorsten Klein <TK@Thorsten-Klein.de>.
++ *
++ *  This file is subject to the terms and conditions of the GNU General Public
++ *  License.  See the file COPYING in the main directory of this archive for
++ *  more details.
++ *
++ */
++#include <linux/module.h>
++#include <linux/kernel.h>
++#include <linux/string.h>
++#include <linux/types.h>
++#include <linux/fb.h>
++#include <linux/vt_kern.h>
++#include <linux/vmalloc.h>
++#include <linux/unistd.h>
++#include <linux/syscalls.h>
++#include <linux/init.h>
++#include <linux/proc_fs.h>
++#include <linux/workqueue.h>
++#include <linux/kmod.h>
++#include <linux/miscdevice.h>
++#include <linux/device.h>
++#include <linux/fs.h>
++#include <linux/compat.h>
++#include <linux/console.h>
++
++#include <asm/uaccess.h>
++#include <asm/irq.h>
++
++#include "fbcon.h"
++#include "fbcondecor.h"
++
++extern signed char con2fb_map[];
++static int fbcon_decor_enable(struct vc_data *vc);
++char fbcon_decor_path[KMOD_PATH_LEN] = "/sbin/fbcondecor_helper";
++static int initialized = 0;
++
++int fbcon_decor_call_helper(char* cmd, unsigned short vc)
++{
++	char *envp[] = {
++		"HOME=/",
++		"PATH=/sbin:/bin",
++		NULL
++	};
++
++	char tfb[5];
++	char tcons[5];
++	unsigned char fb = (int) con2fb_map[vc];
++
++	char *argv[] = {
++		fbcon_decor_path,
++		"2",
++		cmd,
++		tcons,
++		tfb,
++		vc_cons[vc].d->vc_decor.theme,
++		NULL
++	};
++
++	snprintf(tfb,5,"%d",fb);
++	snprintf(tcons,5,"%d",vc);
++
++	return call_usermodehelper(fbcon_decor_path, argv, envp, UMH_WAIT_EXEC);
++}
++
++/* Disables fbcondecor on a virtual console; called with console sem held. */
++int fbcon_decor_disable(struct vc_data *vc, unsigned char redraw)
++{
++	struct fb_info* info;
++
++	if (!vc->vc_decor.state)
++		return -EINVAL;
++
++	info = registered_fb[(int) con2fb_map[vc->vc_num]];
++
++	if (info == NULL)
++		return -EINVAL;
++
++	vc->vc_decor.state = 0;
++	vc_resize(vc, info->var.xres / vc->vc_font.width,
++		  info->var.yres / vc->vc_font.height);
++
++	if (fg_console == vc->vc_num && redraw) {
++		redraw_screen(vc, 0);
++		update_region(vc, vc->vc_origin +
++			      vc->vc_size_row * vc->vc_top,
++			      vc->vc_size_row * (vc->vc_bottom - vc->vc_top) / 2);
++	}
++
++	printk(KERN_INFO "fbcondecor: switched decor state to 'off' on console %d\n",
++			 vc->vc_num);
++
++	return 0;
++}
++
++/* Enables fbcondecor on a virtual console; called with console sem held. */
++static int fbcon_decor_enable(struct vc_data *vc)
++{
++	struct fb_info* info;
++
++	info = registered_fb[(int) con2fb_map[vc->vc_num]];
++
++	if (vc->vc_decor.twidth == 0 || vc->vc_decor.theight == 0 ||
++	    info == NULL || vc->vc_decor.state || (!info->bgdecor.data &&
++	    vc->vc_num == fg_console))
++		return -EINVAL;
++
++	vc->vc_decor.state = 1;
++	vc_resize(vc, vc->vc_decor.twidth / vc->vc_font.width,
++		  vc->vc_decor.theight / vc->vc_font.height);
++
++	if (fg_console == vc->vc_num) {
++		redraw_screen(vc, 0);
++		update_region(vc, vc->vc_origin +
++			      vc->vc_size_row * vc->vc_top,
++			      vc->vc_size_row * (vc->vc_bottom - vc->vc_top) / 2);
++		fbcon_decor_clear_margins(vc, info, 0);
++	}
++
++	printk(KERN_INFO "fbcondecor: switched decor state to 'on' on console %d\n",
++			 vc->vc_num);
++
++	return 0;
++}
++
++static inline int fbcon_decor_ioctl_dosetstate(struct vc_data *vc, unsigned int state, unsigned char origin)
++{
++	int ret;
++
++//	if (origin == FBCON_DECOR_IO_ORIG_USER)
++		console_lock();
++	if (!state)
++		ret = fbcon_decor_disable(vc, 1);
++	else
++		ret = fbcon_decor_enable(vc);
++//	if (origin == FBCON_DECOR_IO_ORIG_USER)
++		console_unlock();
++
++	return ret;
++}
++
++static inline void fbcon_decor_ioctl_dogetstate(struct vc_data *vc, unsigned int *state)
++{
++	*state = vc->vc_decor.state;
++}
++
++static int fbcon_decor_ioctl_dosetcfg(struct vc_data *vc, struct vc_decor *cfg, unsigned char origin)
++{
++	struct fb_info *info;
++	int len;
++	char *tmp;
++
++	info = registered_fb[(int) con2fb_map[vc->vc_num]];
++
++	if (info == NULL || !cfg->twidth || !cfg->theight ||
++	    cfg->tx + cfg->twidth  > info->var.xres ||
++	    cfg->ty + cfg->theight > info->var.yres)
++		return -EINVAL;
++
++	len = strlen_user(cfg->theme);
++	if (!len || len > FBCON_DECOR_THEME_LEN)
++		return -EINVAL;
++	tmp = kmalloc(len, GFP_KERNEL);
++	if (!tmp)
++		return -ENOMEM;
++	if (copy_from_user(tmp, (void __user *)cfg->theme, len)) {
++		kfree(tmp);
++		return -EFAULT;
++	}
++	cfg->theme = tmp;
++	cfg->state = 0;
++
++	/* If this ioctl is a response to a request from kernel, the console sem
++	 * is already held; we also don't need to disable decor because either the
++	 * new config and background picture will be successfully loaded, and the
++	 * decor will stay on, or in case of a failure it'll be turned off in fbcon. */
++//	if (origin == FBCON_DECOR_IO_ORIG_USER) {
++		console_lock();
++		if (vc->vc_decor.state)
++			fbcon_decor_disable(vc, 1);
++//	}
++
++	if (vc->vc_decor.theme)
++		kfree(vc->vc_decor.theme);
++
++	vc->vc_decor = *cfg;
++
++//	if (origin == FBCON_DECOR_IO_ORIG_USER)
++		console_unlock();
++
++	printk(KERN_INFO "fbcondecor: console %d using theme '%s'\n",
++			 vc->vc_num, vc->vc_decor.theme);
++	return 0;
++}
++
++static int fbcon_decor_ioctl_dogetcfg(struct vc_data *vc, struct vc_decor *decor)
++{
++	char __user *tmp;
++
++	tmp = decor->theme;
++	*decor = vc->vc_decor;
++	decor->theme = tmp;
++
++	if (vc->vc_decor.theme) {
++		if (copy_to_user(tmp, vc->vc_decor.theme, strlen(vc->vc_decor.theme) + 1))
++			return -EFAULT;
++	} else
++		if (put_user(0, tmp))
++			return -EFAULT;
++
++	return 0;
++}
++
++static int fbcon_decor_ioctl_dosetpic(struct vc_data *vc, struct fb_image *img, unsigned char origin)
++{
++	struct fb_info *info;
++	int len;
++	u8 *tmp;
++
++	if (vc->vc_num != fg_console)
++		return -EINVAL;
++
++	info = registered_fb[(int) con2fb_map[vc->vc_num]];
++
++	if (info == NULL)
++		return -EINVAL;
++
++	if (img->width != info->var.xres || img->height != info->var.yres) {
++		printk(KERN_ERR "fbcondecor: picture dimensions mismatch\n");
++		printk(KERN_ERR "%dx%d vs %dx%d\n", img->width, img->height, info->var.xres, info->var.yres);
++		return -EINVAL;
++	}
++
++	if (img->depth != info->var.bits_per_pixel) {
++		printk(KERN_ERR "fbcondecor: picture depth mismatch\n");
++		return -EINVAL;
++	}
++
++	if (img->depth == 8) {
++		if (!img->cmap.len || !img->cmap.red || !img->cmap.green ||
++		    !img->cmap.blue)
++			return -EINVAL;
++
++		tmp = vmalloc(img->cmap.len * 3 * 2);
++		if (!tmp)
++			return -ENOMEM;
++
++		if (copy_from_user(tmp,
++			    	   (void __user*)img->cmap.red, (img->cmap.len << 1)) ||
++		    copy_from_user(tmp + (img->cmap.len << 1),
++			    	   (void __user*)img->cmap.green, (img->cmap.len << 1)) ||
++		    copy_from_user(tmp + (img->cmap.len << 2),
++			    	   (void __user*)img->cmap.blue, (img->cmap.len << 1))) {
++			vfree(tmp);
++			return -EFAULT;
++		}
++
++		img->cmap.transp = NULL;
++		img->cmap.red = (u16*)tmp;
++		img->cmap.green = img->cmap.red + img->cmap.len;
++		img->cmap.blue = img->cmap.green + img->cmap.len;
++	} else {
++		img->cmap.red = NULL;
++	}
++
++	len = ((img->depth + 7) >> 3) * img->width * img->height;
++
++	/*
++	 * Allocate an additional byte so that we never go outside of the
++	 * buffer boundaries in the rendering functions in a 24 bpp mode.
++	 */
++	tmp = vmalloc(len + 1);
++
++	if (!tmp)
++		goto out;
++
++	if (copy_from_user(tmp, (void __user*)img->data, len))
++		goto out;
++
++	img->data = tmp;
++
++	/* If this ioctl is a response to a request from kernel, the console sem
++	 * is already held. */
++//	if (origin == FBCON_DECOR_IO_ORIG_USER)
++		console_lock();
++
++	if (info->bgdecor.data)
++		vfree((u8*)info->bgdecor.data);
++	if (info->bgdecor.cmap.red)
++		vfree(info->bgdecor.cmap.red);
++
++	info->bgdecor = *img;
++
++	if (fbcon_decor_active_vc(vc) && fg_console == vc->vc_num) {
++		redraw_screen(vc, 0);
++		update_region(vc, vc->vc_origin +
++			      vc->vc_size_row * vc->vc_top,
++			      vc->vc_size_row * (vc->vc_bottom - vc->vc_top) / 2);
++		fbcon_decor_clear_margins(vc, info, 0);
++	}
++
++//	if (origin == FBCON_DECOR_IO_ORIG_USER)
++		console_unlock();
++
++	return 0;
++
++out:	if (img->cmap.red)
++		vfree(img->cmap.red);
++
++	if (tmp)
++		vfree(tmp);
++	return -ENOMEM;
++}
++
++static long fbcon_decor_ioctl(struct file *filp, u_int cmd, u_long arg)
++{
++	struct fbcon_decor_iowrapper __user *wrapper = (void __user*) arg;
++	struct vc_data *vc = NULL;
++	unsigned short vc_num = 0;
++	unsigned char origin = 0;
++	void __user *data = NULL;
++
++	if (!access_ok(VERIFY_READ, wrapper,
++			sizeof(struct fbcon_decor_iowrapper)))
++		return -EFAULT;
++
++	__get_user(vc_num, &wrapper->vc);
++	__get_user(origin, &wrapper->origin);
++	__get_user(data, &wrapper->data);
++
++	if (!vc_cons_allocated(vc_num))
++		return -EINVAL;
++
++	vc = vc_cons[vc_num].d;
++
++	switch (cmd) {
++	case FBIOCONDECOR_SETPIC:
++	{
++		struct fb_image img;
++		if (copy_from_user(&img, (struct fb_image __user *)data, sizeof(struct fb_image)))
++			return -EFAULT;
++
++		return fbcon_decor_ioctl_dosetpic(vc, &img, origin);
++	}
++	case FBIOCONDECOR_SETCFG:
++	{
++		struct vc_decor cfg;
++		if (copy_from_user(&cfg, (struct vc_decor __user *)data, sizeof(struct vc_decor)))
++			return -EFAULT;
++
++		return fbcon_decor_ioctl_dosetcfg(vc, &cfg, origin);
++	}
++	case FBIOCONDECOR_GETCFG:
++	{
++		int rval;
++		struct vc_decor cfg;
++
++		if (copy_from_user(&cfg, (struct vc_decor __user *)data, sizeof(struct vc_decor)))
++			return -EFAULT;
++
++		rval = fbcon_decor_ioctl_dogetcfg(vc, &cfg);
++
++		if (copy_to_user(data, &cfg, sizeof(struct vc_decor)))
++			return -EFAULT;
++		return rval;
++	}
++	case FBIOCONDECOR_SETSTATE:
++	{
++		unsigned int state = 0;
++		if (get_user(state, (unsigned int __user *)data))
++			return -EFAULT;
++		return fbcon_decor_ioctl_dosetstate(vc, state, origin);
++	}
++	case FBIOCONDECOR_GETSTATE:
++	{
++		unsigned int state = 0;
++		fbcon_decor_ioctl_dogetstate(vc, &state);
++		return put_user(state, (unsigned int __user *)data);
++	}
++
++	default:
++		return -ENOIOCTLCMD;
++	}
++}
++
++#ifdef CONFIG_COMPAT
++
++static long fbcon_decor_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) {
++
++	struct fbcon_decor_iowrapper32 __user *wrapper = (void __user *)arg;
++	struct vc_data *vc = NULL;
++	unsigned short vc_num = 0;
++	unsigned char origin = 0;
++	compat_uptr_t data_compat = 0;
++	void __user *data = NULL;
++
++	if (!access_ok(VERIFY_READ, wrapper,
++                       sizeof(struct fbcon_decor_iowrapper32)))
++		return -EFAULT;
++
++	__get_user(vc_num, &wrapper->vc);
++	__get_user(origin, &wrapper->origin);
++	__get_user(data_compat, &wrapper->data);
++	data = compat_ptr(data_compat);
++
++	if (!vc_cons_allocated(vc_num))
++		return -EINVAL;
++
++	vc = vc_cons[vc_num].d;
++
++	switch (cmd) {
++	case FBIOCONDECOR_SETPIC32:
++	{
++		struct fb_image32 img_compat;
++		struct fb_image img;
++
++		if (copy_from_user(&img_compat, (struct fb_image32 __user *)data, sizeof(struct fb_image32)))
++			return -EFAULT;
++
++		fb_image_from_compat(img, img_compat);
++
++		return fbcon_decor_ioctl_dosetpic(vc, &img, origin);
++	}
++
++	case FBIOCONDECOR_SETCFG32:
++	{
++		struct vc_decor32 cfg_compat;
++		struct vc_decor cfg;
++
++		if (copy_from_user(&cfg_compat, (struct vc_decor32 __user *)data, sizeof(struct vc_decor32)))
++			return -EFAULT;
++
++		vc_decor_from_compat(cfg, cfg_compat);
++
++		return fbcon_decor_ioctl_dosetcfg(vc, &cfg, origin);
++	}
++
++	case FBIOCONDECOR_GETCFG32:
++	{
++		int rval;
++		struct vc_decor32 cfg_compat;
++		struct vc_decor cfg;
++
++		if (copy_from_user(&cfg_compat, (struct vc_decor32 __user *)data, sizeof(struct vc_decor32)))
++			return -EFAULT;
++		cfg.theme = compat_ptr(cfg_compat.theme);
++
++		rval = fbcon_decor_ioctl_dogetcfg(vc, &cfg);
++
++		vc_decor_to_compat(cfg_compat, cfg);
++
++		if (copy_to_user((struct vc_decor32 __user *)data, &cfg_compat, sizeof(struct vc_decor32)))
++			return -EFAULT;
++		return rval;
++	}
++
++	case FBIOCONDECOR_SETSTATE32:
++	{
++		compat_uint_t state_compat = 0;
++		unsigned int state = 0;
++
++		if (get_user(state_compat, (compat_uint_t __user *)data))
++			return -EFAULT;
++
++		state = (unsigned int)state_compat;
++
++		return fbcon_decor_ioctl_dosetstate(vc, state, origin);
++	}
++
++	case FBIOCONDECOR_GETSTATE32:
++	{
++		compat_uint_t state_compat = 0;
++		unsigned int state = 0;
++
++		fbcon_decor_ioctl_dogetstate(vc, &state);
++		state_compat = (compat_uint_t)state;
++
++		return put_user(state_compat, (compat_uint_t __user *)data);
++	}
++
++	default:
++		return -ENOIOCTLCMD;
++	}
++}
++#else
++  #define fbcon_decor_compat_ioctl NULL
++#endif
++
++static struct file_operations fbcon_decor_ops = {
++	.owner = THIS_MODULE,
++	.unlocked_ioctl = fbcon_decor_ioctl,
++	.compat_ioctl = fbcon_decor_compat_ioctl
++};
++
++static struct miscdevice fbcon_decor_dev = {
++	.minor = MISC_DYNAMIC_MINOR,
++	.name = "fbcondecor",
++	.fops = &fbcon_decor_ops
++};
++
++void fbcon_decor_reset(void)
++{
++	int i;
++
++	for (i = 0; i < num_registered_fb; i++) {
++		registered_fb[i]->bgdecor.data = NULL;
++		registered_fb[i]->bgdecor.cmap.red = NULL;
++	}
++
++	for (i = 0; i < MAX_NR_CONSOLES && vc_cons[i].d; i++) {
++		vc_cons[i].d->vc_decor.state = vc_cons[i].d->vc_decor.twidth =
++						vc_cons[i].d->vc_decor.theight = 0;
++		vc_cons[i].d->vc_decor.theme = NULL;
++	}
++
++	return;
++}
++
++int fbcon_decor_init(void)
++{
++	int i;
++
++	fbcon_decor_reset();
++
++	if (initialized)
++		return 0;
++
++	i = misc_register(&fbcon_decor_dev);
++	if (i) {
++		printk(KERN_ERR "fbcondecor: failed to register device\n");
++		return i;
++	}
++
++	fbcon_decor_call_helper("init", 0);
++	initialized = 1;
++	return 0;
++}
++
++int fbcon_decor_exit(void)
++{
++	fbcon_decor_reset();
++	return 0;
++}
++
++EXPORT_SYMBOL(fbcon_decor_path);
+diff --git a/drivers/video/console/fbcondecor.h b/drivers/video/console/fbcondecor.h
+new file mode 100644
+index 0000000..3b3724b
+--- /dev/null
++++ b/drivers/video/console/fbcondecor.h
+@@ -0,0 +1,78 @@
++/* 
++ *  linux/drivers/video/console/fbcondecor.h -- Framebuffer Console Decoration headers
++ *
++ *  Copyright (C) 2004 Michal Januszewski <michalj+fbcondecor@gmail.com>
++ *
++ */
++
++#ifndef __FBCON_DECOR_H
++#define __FBCON_DECOR_H
++
++#ifndef _LINUX_FB_H
++#include <linux/fb.h>
++#endif
++
++/* This is needed for vc_cons in fbcmap.c */
++#include <linux/vt_kern.h>
++
++struct fb_cursor;
++struct fb_info;
++struct vc_data;
++
++#ifdef CONFIG_FB_CON_DECOR
++/* fbcondecor.c */
++int fbcon_decor_init(void);
++int fbcon_decor_exit(void);
++int fbcon_decor_call_helper(char* cmd, unsigned short cons);
++int fbcon_decor_disable(struct vc_data *vc, unsigned char redraw);
++
++/* cfbcondecor.c */
++void fbcon_decor_putcs(struct vc_data *vc, struct fb_info *info, const unsigned short *s, int count, int yy, int xx);
++void fbcon_decor_cursor(struct fb_info *info, struct fb_cursor *cursor);
++void fbcon_decor_clear(struct vc_data *vc, struct fb_info *info, int sy, int sx, int height, int width);
++void fbcon_decor_clear_margins(struct vc_data *vc, struct fb_info *info, int bottom_only);
++void fbcon_decor_blank(struct vc_data *vc, struct fb_info *info, int blank);
++void fbcon_decor_bmove_redraw(struct vc_data *vc, struct fb_info *info, int y, int sx, int dx, int width);
++void fbcon_decor_copy(u8 *dst, u8 *src, int height, int width, int linebytes, int srclinesbytes, int bpp);
++void fbcon_decor_fix_pseudo_pal(struct fb_info *info, struct vc_data *vc);
++
++/* vt.c */
++void acquire_console_sem(void);
++void release_console_sem(void);
++void do_unblank_screen(int entering_gfx);
++
++/* struct vc_data *y */
++#define fbcon_decor_active_vc(y) (y->vc_decor.state && y->vc_decor.theme) 
++
++/* struct fb_info *x, struct vc_data *y */
++#define fbcon_decor_active_nores(x,y) (x->bgdecor.data && fbcon_decor_active_vc(y))
++
++/* struct fb_info *x, struct vc_data *y */
++#define fbcon_decor_active(x,y) (fbcon_decor_active_nores(x,y) &&		\
++			      x->bgdecor.width == x->var.xres && 	\
++			      x->bgdecor.height == x->var.yres &&	\
++			      x->bgdecor.depth == x->var.bits_per_pixel)
++
++
++#else /* CONFIG_FB_CON_DECOR */
++
++static inline void fbcon_decor_putcs(struct vc_data *vc, struct fb_info *info, const unsigned short *s, int count, int yy, int xx) {}
++static inline void fbcon_decor_putc(struct vc_data *vc, struct fb_info *info, int c, int ypos, int xpos) {}
++static inline void fbcon_decor_cursor(struct fb_info *info, struct fb_cursor *cursor) {}
++static inline void fbcon_decor_clear(struct vc_data *vc, struct fb_info *info, int sy, int sx, int height, int width) {}
++static inline void fbcon_decor_clear_margins(struct vc_data *vc, struct fb_info *info, int bottom_only) {}
++static inline void fbcon_decor_blank(struct vc_data *vc, struct fb_info *info, int blank) {}
++static inline void fbcon_decor_bmove_redraw(struct vc_data *vc, struct fb_info *info, int y, int sx, int dx, int width) {}
++static inline void fbcon_decor_fix_pseudo_pal(struct fb_info *info, struct vc_data *vc) {}
++static inline int fbcon_decor_call_helper(char* cmd, unsigned short cons) { return 0; }
++static inline int fbcon_decor_init(void) { return 0; }
++static inline int fbcon_decor_exit(void) { return 0; }
++static inline int fbcon_decor_disable(struct vc_data *vc, unsigned char redraw) { return 0; }
++
++#define fbcon_decor_active_vc(y) (0)
++#define fbcon_decor_active_nores(x,y) (0)
++#define fbcon_decor_active(x,y) (0)
++
++#endif /* CONFIG_FB_CON_DECOR */
++
++#endif /* __FBCON_DECOR_H */
+diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
+index e1f4727..2952e33 100644
+--- a/drivers/video/fbdev/Kconfig
++++ b/drivers/video/fbdev/Kconfig
+@@ -1204,7 +1204,6 @@ config FB_MATROX
+ 	select FB_CFB_FILLRECT
+ 	select FB_CFB_COPYAREA
+ 	select FB_CFB_IMAGEBLIT
+-	select FB_TILEBLITTING
+ 	select FB_MACMODES if PPC_PMAC
+ 	---help---
+ 	  Say Y here if you have a Matrox Millennium, Matrox Millennium II,
+diff --git a/drivers/video/fbdev/core/fbcmap.c b/drivers/video/fbdev/core/fbcmap.c
+index f89245b..05e036c 100644
+--- a/drivers/video/fbdev/core/fbcmap.c
++++ b/drivers/video/fbdev/core/fbcmap.c
+@@ -17,6 +17,8 @@
+ #include <linux/slab.h>
+ #include <linux/uaccess.h>
+ 
++#include "../../console/fbcondecor.h"
++
+ static u16 red2[] __read_mostly = {
+     0x0000, 0xaaaa
+ };
+@@ -249,14 +251,17 @@ int fb_set_cmap(struct fb_cmap *cmap, struct fb_info *info)
+ 			if (transp)
+ 				htransp = *transp++;
+ 			if (info->fbops->fb_setcolreg(start++,
+-						      hred, hgreen, hblue,
++						      hred, hgreen, hblue, 
+ 						      htransp, info))
+ 				break;
+ 		}
+ 	}
+-	if (rc == 0)
++	if (rc == 0) {
+ 		fb_copy_cmap(cmap, &info->cmap);
+-
++		if (fbcon_decor_active(info, vc_cons[fg_console].d) &&
++		    info->fix.visual == FB_VISUAL_DIRECTCOLOR)
++			fbcon_decor_fix_pseudo_pal(info, vc_cons[fg_console].d);
++	}
+ 	return rc;
+ }
+ 
+diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
+index b6d5008..d6703f2 100644
+--- a/drivers/video/fbdev/core/fbmem.c
++++ b/drivers/video/fbdev/core/fbmem.c
+@@ -1250,15 +1250,6 @@ struct fb_fix_screeninfo32 {
+ 	u16			reserved[3];
+ };
+ 
+-struct fb_cmap32 {
+-	u32			start;
+-	u32			len;
+-	compat_caddr_t	red;
+-	compat_caddr_t	green;
+-	compat_caddr_t	blue;
+-	compat_caddr_t	transp;
+-};
+-
+ static int fb_getput_cmap(struct fb_info *info, unsigned int cmd,
+ 			  unsigned long arg)
+ {
+diff --git a/include/linux/console_decor.h b/include/linux/console_decor.h
+new file mode 100644
+index 0000000..04b8d80
+--- /dev/null
++++ b/include/linux/console_decor.h
+@@ -0,0 +1,46 @@
++#ifndef _LINUX_CONSOLE_DECOR_H_
++#define _LINUX_CONSOLE_DECOR_H_ 1
++
++/* A structure used by the framebuffer console decorations (drivers/video/console/fbcondecor.c) */
++struct vc_decor {
++	__u8 bg_color;				/* The color that is to be treated as transparent */
++	__u8 state;				/* Current decor state: 0 = off, 1 = on */
++	__u16 tx, ty;				/* Top left corner coordinates of the text field */
++	__u16 twidth, theight;			/* Width and height of the text field */
++	char* theme;
++};
++
++#ifdef __KERNEL__
++#ifdef CONFIG_COMPAT
++#include <linux/compat.h>
++
++struct vc_decor32 {
++	__u8 bg_color;				/* The color that is to be treated as transparent */
++	__u8 state;				/* Current decor state: 0 = off, 1 = on */
++	__u16 tx, ty;				/* Top left corner coordinates of the text field */
++	__u16 twidth, theight;			/* Width and height of the text field */
++	compat_uptr_t theme;
++};
++
++#define vc_decor_from_compat(to, from) \
++	(to).bg_color = (from).bg_color; \
++	(to).state    = (from).state; \
++	(to).tx       = (from).tx; \
++	(to).ty       = (from).ty; \
++	(to).twidth   = (from).twidth; \
++	(to).theight  = (from).theight; \
++	(to).theme    = compat_ptr((from).theme)
++
++#define vc_decor_to_compat(to, from) \
++	(to).bg_color = (from).bg_color; \
++	(to).state    = (from).state; \
++	(to).tx       = (from).tx; \
++	(to).ty       = (from).ty; \
++	(to).twidth   = (from).twidth; \
++	(to).theight  = (from).theight; \
++	(to).theme    = ptr_to_compat((from).theme)
++
++#endif /* CONFIG_COMPAT */
++#endif /* __KERNEL__ */
++
++#endif
+diff --git a/include/linux/console_struct.h b/include/linux/console_struct.h
+index 7f0c329..98f5d60 100644
+--- a/include/linux/console_struct.h
++++ b/include/linux/console_struct.h
+@@ -19,6 +19,7 @@
+ struct vt_struct;
+ 
+ #define NPAR 16
++#include <linux/console_decor.h>
+ 
+ struct vc_data {
+ 	struct tty_port port;			/* Upper level data */
+@@ -107,6 +108,8 @@ struct vc_data {
+ 	unsigned long	vc_uni_pagedir;
+ 	unsigned long	*vc_uni_pagedir_loc;  /* [!] Location of uni_pagedir variable for this console */
+ 	bool vc_panic_force_write; /* when oops/panic this VC can accept forced output/blanking */
++
++	struct vc_decor vc_decor;
+ 	/* additional information is in vt_kern.h */
+ };
+ 
+diff --git a/include/linux/fb.h b/include/linux/fb.h
+index fe6ac95..1e36b03 100644
+--- a/include/linux/fb.h
++++ b/include/linux/fb.h
+@@ -219,6 +219,34 @@ struct fb_deferred_io {
+ };
+ #endif
+ 
++#ifdef __KERNEL__
++#ifdef CONFIG_COMPAT
++struct fb_image32 {
++	__u32 dx;			/* Where to place image */
++	__u32 dy;
++	__u32 width;			/* Size of image */
++	__u32 height;
++	__u32 fg_color;			/* Only used when a mono bitmap */
++	__u32 bg_color;
++	__u8  depth;			/* Depth of the image */
++	const compat_uptr_t data;	/* Pointer to image data */
++	struct fb_cmap32 cmap;		/* color map info */
++};
++
++#define fb_image_from_compat(to, from) \
++	(to).dx       = (from).dx; \
++	(to).dy       = (from).dy; \
++	(to).width    = (from).width; \
++	(to).height   = (from).height; \
++	(to).fg_color = (from).fg_color; \
++	(to).bg_color = (from).bg_color; \
++	(to).depth    = (from).depth; \
++	(to).data     = compat_ptr((from).data); \
++	fb_cmap_from_compat((to).cmap, (from).cmap)
++
++#endif /* CONFIG_COMPAT */
++#endif /* __KERNEL__ */
++
+ /*
+  * Frame buffer operations
+  *
+@@ -489,6 +517,9 @@ struct fb_info {
+ #define FBINFO_STATE_SUSPENDED	1
+ 	u32 state;			/* Hardware state i.e suspend */
+ 	void *fbcon_par;                /* fbcon use-only private area */
++
++	struct fb_image bgdecor;
++
+ 	/* From here on everything is device dependent */
+ 	void *par;
+ 	/* we need the PCI or similar aperture base/size not
+diff --git a/include/uapi/linux/fb.h b/include/uapi/linux/fb.h
+index fb795c3..dc77a03 100644
+--- a/include/uapi/linux/fb.h
++++ b/include/uapi/linux/fb.h
+@@ -8,6 +8,25 @@
+ 
+ #define FB_MAX			32	/* sufficient for now */
+ 
++struct fbcon_decor_iowrapper
++{
++	unsigned short vc;		/* Virtual console */
++	unsigned char origin;		/* Point of origin of the request */
++	void *data;
++};
++
++#ifdef __KERNEL__
++#ifdef CONFIG_COMPAT
++#include <linux/compat.h>
++struct fbcon_decor_iowrapper32
++{
++	unsigned short vc;		/* Virtual console */
++	unsigned char origin;		/* Point of origin of the request */
++	compat_uptr_t data;
++};
++#endif /* CONFIG_COMPAT */
++#endif /* __KERNEL__ */
++
+ /* ioctls
+    0x46 is 'F'								*/
+ #define FBIOGET_VSCREENINFO	0x4600
+@@ -35,6 +54,25 @@
+ #define FBIOGET_DISPINFO        0x4618
+ #define FBIO_WAITFORVSYNC	_IOW('F', 0x20, __u32)
+ 
++#define FBIOCONDECOR_SETCFG	_IOWR('F', 0x19, struct fbcon_decor_iowrapper)
++#define FBIOCONDECOR_GETCFG	_IOR('F', 0x1A, struct fbcon_decor_iowrapper)
++#define FBIOCONDECOR_SETSTATE	_IOWR('F', 0x1B, struct fbcon_decor_iowrapper)
++#define FBIOCONDECOR_GETSTATE	_IOR('F', 0x1C, struct fbcon_decor_iowrapper)
++#define FBIOCONDECOR_SETPIC 	_IOWR('F', 0x1D, struct fbcon_decor_iowrapper)
++#ifdef __KERNEL__
++#ifdef CONFIG_COMPAT
++#define FBIOCONDECOR_SETCFG32	_IOWR('F', 0x19, struct fbcon_decor_iowrapper32)
++#define FBIOCONDECOR_GETCFG32	_IOR('F', 0x1A, struct fbcon_decor_iowrapper32)
++#define FBIOCONDECOR_SETSTATE32	_IOWR('F', 0x1B, struct fbcon_decor_iowrapper32)
++#define FBIOCONDECOR_GETSTATE32	_IOR('F', 0x1C, struct fbcon_decor_iowrapper32)
++#define FBIOCONDECOR_SETPIC32	_IOWR('F', 0x1D, struct fbcon_decor_iowrapper32)
++#endif /* CONFIG_COMPAT */
++#endif /* __KERNEL__ */
++
++#define FBCON_DECOR_THEME_LEN		128	/* Maximum length of a theme name */
++#define FBCON_DECOR_IO_ORIG_KERNEL	0	/* Kernel ioctl origin */
++#define FBCON_DECOR_IO_ORIG_USER	1	/* User ioctl origin */
++ 
+ #define FB_TYPE_PACKED_PIXELS		0	/* Packed Pixels	*/
+ #define FB_TYPE_PLANES			1	/* Non interleaved planes */
+ #define FB_TYPE_INTERLEAVED_PLANES	2	/* Interleaved planes	*/
+@@ -277,6 +315,29 @@ struct fb_var_screeninfo {
+ 	__u32 reserved[4];		/* Reserved for future compatibility */
+ };
+ 
++#ifdef __KERNEL__
++#ifdef CONFIG_COMPAT
++struct fb_cmap32 {
++	__u32 start;
++	__u32 len;			/* Number of entries */
++	compat_uptr_t red;		/* Red values	*/
++	compat_uptr_t green;
++	compat_uptr_t blue;
++	compat_uptr_t transp;		/* transparency, can be NULL */
++};
++
++#define fb_cmap_from_compat(to, from) \
++	(to).start  = (from).start; \
++	(to).len    = (from).len; \
++	(to).red    = compat_ptr((from).red); \
++	(to).green  = compat_ptr((from).green); \
++	(to).blue   = compat_ptr((from).blue); \
++	(to).transp = compat_ptr((from).transp)
++
++#endif /* CONFIG_COMPAT */
++#endif /* __KERNEL__ */
++
++
+ struct fb_cmap {
+ 	__u32 start;			/* First entry	*/
+ 	__u32 len;			/* Number of entries */
+diff --git a/kernel/sysctl.c b/kernel/sysctl.c
+index 74f5b58..6386ab0 100644
+--- a/kernel/sysctl.c
++++ b/kernel/sysctl.c
+@@ -146,6 +146,10 @@ static const int cap_last_cap = CAP_LAST_CAP;
+ static unsigned long hung_task_timeout_max = (LONG_MAX/HZ);
+ #endif
+ 
++#ifdef CONFIG_FB_CON_DECOR
++extern char fbcon_decor_path[];
++#endif
++
+ #ifdef CONFIG_INOTIFY_USER
+ #include <linux/inotify.h>
+ #endif
+@@ -255,6 +259,15 @@ static struct ctl_table sysctl_base_table[] = {
+ 		.mode		= 0555,
+ 		.child		= dev_table,
+ 	},
++#ifdef CONFIG_FB_CON_DECOR
++	{
++		.procname	= "fbcondecor",
++		.data		= &fbcon_decor_path,
++		.maxlen		= KMOD_PATH_LEN,
++		.mode		= 0644,
++		.proc_handler	= &proc_dostring,
++	},
++#endif
+ 	{ }
+ };
+ 
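
Note on the vc_decor_from_compat / vc_decor_to_compat / fb_image_from_compat /
fb_cmap_from_compat macros in the fbcondecor patch above: each expands to
several semicolon-separated statements. If such a macro is used as the body of
an unbraced `if`, only the first assignment is guarded. Whether any call site in
the patch is affected has not been checked here. The sketch below is a
userspace illustration only, not the patch's code: the struct layouts mirror
the patch, while `compat_uptr_t`, `compat_ptr()`, the `_safe` macro name, and
`convert_if()` are stand-ins assumed for this example. It shows the usual
do/while (0) wrapping that makes the macro behave as a single statement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's compat pointer type (assumption). */
typedef uint32_t compat_uptr_t;

/* Field layout mirrors struct vc_decor from the patch. */
struct vc_decor {
	uint8_t bg_color;
	uint8_t state;
	uint16_t tx, ty;
	uint16_t twidth, theight;
	char *theme;
};

/* Field layout mirrors struct vc_decor32 from the patch. */
struct vc_decor32 {
	uint8_t bg_color;
	uint8_t state;
	uint16_t tx, ty;
	uint16_t twidth, theight;
	compat_uptr_t theme;
};

/* Stand-in for the kernel's compat_ptr(); widens a 32-bit user pointer. */
static inline void *compat_ptr(compat_uptr_t uptr)
{
	return (void *)(uintptr_t)uptr;
}

/*
 * Same assignments as vc_decor_from_compat in the patch, wrapped in
 * do { } while (0) so the whole expansion is one statement.
 * The "_safe" name is hypothetical.
 */
#define vc_decor_from_compat_safe(to, from) \
	do { \
		(to).bg_color = (from).bg_color; \
		(to).state    = (from).state; \
		(to).tx       = (from).tx; \
		(to).ty       = (from).ty; \
		(to).twidth   = (from).twidth; \
		(to).theight  = (from).theight; \
		(to).theme    = compat_ptr((from).theme); \
	} while (0)

/*
 * Hypothetical helper: converts only when `convert` is nonzero.
 * With the unwrapped macro this unbraced if/else would not compile,
 * and without the else only bg_color would be guarded.
 */
static int convert_if(int convert, struct vc_decor *to,
		      const struct vc_decor32 *from)
{
	if (convert)
		vc_decor_from_compat_safe(*to, *from);
	else
		return 0;
	return 1;
}
```

Calling `convert_if(0, &dst, &src)` leaves `dst` untouched, and calling it with
a nonzero first argument copies every field.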

diff --git a/4567_distro-Gentoo-Kconfig.patch b/4567_distro-Gentoo-Kconfig.patch
index c7af596..652e2a7 100644
--- a/4567_distro-Gentoo-Kconfig.patch
+++ b/4567_distro-Gentoo-Kconfig.patch
@@ -1,5 +1,5 @@
---- a/Kconfig
-+++ b/Kconfig
+--- a/Kconfig	2014-04-02 09:45:05.389224541 -0400
++++ b/Kconfig	2014-04-02 09:45:39.269224273 -0400
 @@ -8,4 +8,6 @@ config SRCARCH
  	string
  	option env="SRCARCH"
@@ -7,9 +7,9 @@
 +source "distro/Kconfig"
 +
  source "arch/$SRCARCH/Kconfig"
---- /dev/null
-+++ b/distro/Kconfig
-@@ -0,0 +1,131 @@
+--- 	1969-12-31 19:00:00.000000000 -0500
++++ b/distro/Kconfig	2014-04-02 09:57:03.539218861 -0400
+@@ -0,0 +1,108 @@
 +menu "Gentoo Linux"
 +
 +config GENTOO_LINUX
@@ -30,7 +30,7 @@
 +
 +	depends on GENTOO_LINUX
 +	default y if GENTOO_LINUX
-+
++	
 +	select DEVTMPFS
 +	select TMPFS
 +
@@ -51,29 +51,7 @@
 +		boot process; if not available, it causes sysfs and udev to malfunction.
 +
 +		To ensure Gentoo Linux boots, it is best to leave this setting enabled;
-+		if you run a custom setup, you could consider whether to disable this.
-+
-+config GENTOO_LINUX_PORTAGE
-+	bool "Select options required by Portage features"
-+
-+	depends on GENTOO_LINUX
-+	default y if GENTOO_LINUX
-+
-+	select CGROUPS
-+	select NAMESPACES
-+	select IPC_NS
-+	select NET_NS
-+
-+	help
-+		This enables options required by various Portage FEATURES.
-+		Currently this selects:
-+
-+		CGROUPS     (required for FEATURES=cgroup)
-+		IPC_NS      (required for FEATURES=ipc-sandbox)
-+		NET_NS      (required for FEATURES=network-sandbox)
-+
-+		It is highly recommended that you leave this enabled as these FEATURES
-+		are, or will soon be, enabled by default.
++		if you run a custom setup, you could consider whether to disable this. 
 +
 +menu "Support for init systems, system and service managers"
 +	visible if GENTOO_LINUX
@@ -109,13 +87,12 @@
 +	select AUTOFS4_FS
 +	select BLK_DEV_BSG
 +	select CGROUPS
-+	select DEVPTS_MULTIPLE_INSTANCES
 +	select EPOLL
 +	select FANOTIFY
 +	select FHANDLE
 +	select INOTIFY_USER
 +	select NET
-+	select NET_NS
++	select NET_NS 
 +	select PROC_FS
 +	select SIGNALFD
 +	select SYSFS

diff --git a/5000_enable-additional-cpu-optimizations-for-gcc.patch b/5000_enable-additional-cpu-optimizations-for-gcc.patch
new file mode 100644
index 0000000..f7ab6f0
--- /dev/null
+++ b/5000_enable-additional-cpu-optimizations-for-gcc.patch
@@ -0,0 +1,327 @@
+This patch has been tested on and is known to work with kernel versions from 3.2
+up to the latest git version (pulled on 12/14/2013).
+
+This patch will expand the number of microarchitectures to include new
+processors including: AMD K10-family, AMD Family 10h (Barcelona), AMD Family
+14h (Bobcat), AMD Family 15h (Bulldozer), AMD Family 15h (Piledriver), AMD
+Family 16h (Jaguar), Intel 1st Gen Core i3/i5/i7 (Nehalem), Intel 2nd Gen Core
+i3/i5/i7 (Sandybridge), Intel 3rd Gen Core i3/i5/i7 (Ivybridge), and Intel 4th
+Gen Core i3/i5/i7 (Haswell). It also offers the compiler the 'native' flag.
+
+Small but real speed increases are measurable using a make endpoint comparing
+a generic kernel to one built with one of the respective microarchs.
+
+See the following experimental evidence supporting this statement:
+https://github.com/graysky2/kernel_gcc_patch
+
+REQUIREMENTS
+linux version >=3.15
+gcc version <4.9
+
+---
+diff -uprN a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
+--- a/arch/x86/include/asm/module.h	2013-11-03 18:41:51.000000000 -0500
++++ b/arch/x86/include/asm/module.h	2013-12-15 06:21:24.351122516 -0500
+@@ -15,6 +15,16 @@
+ #define MODULE_PROC_FAMILY "586MMX "
+ #elif defined CONFIG_MCORE2
+ #define MODULE_PROC_FAMILY "CORE2 "
++#elif defined CONFIG_MNATIVE
++#define MODULE_PROC_FAMILY "NATIVE "
++#elif defined CONFIG_MCOREI7
++#define MODULE_PROC_FAMILY "COREI7 "
++#elif defined CONFIG_MCOREI7AVX
++#define MODULE_PROC_FAMILY "COREI7AVX "
++#elif defined CONFIG_MCOREAVXI
++#define MODULE_PROC_FAMILY "COREAVXI "
++#elif defined CONFIG_MCOREAVX2
++#define MODULE_PROC_FAMILY "COREAVX2 "
+ #elif defined CONFIG_MATOM
+ #define MODULE_PROC_FAMILY "ATOM "
+ #elif defined CONFIG_M686
+@@ -33,6 +43,18 @@
+ #define MODULE_PROC_FAMILY "K7 "
+ #elif defined CONFIG_MK8
+ #define MODULE_PROC_FAMILY "K8 "
++#elif defined CONFIG_MK10
++#define MODULE_PROC_FAMILY "K10 "
++#elif defined CONFIG_MBARCELONA
++#define MODULE_PROC_FAMILY "BARCELONA "
++#elif defined CONFIG_MBOBCAT
++#define MODULE_PROC_FAMILY "BOBCAT "
++#elif defined CONFIG_MBULLDOZER
++#define MODULE_PROC_FAMILY "BULLDOZER "
++#elif defined CONFIG_MPILEDRIVER
++#define MODULE_PROC_FAMILY "PILEDRIVER "
++#elif defined CONFIG_MJAGUAR
++#define MODULE_PROC_FAMILY "JAGUAR "
+ #elif defined CONFIG_MELAN
+ #define MODULE_PROC_FAMILY "ELAN "
+ #elif defined CONFIG_MCRUSOE
+diff -uprN a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
+--- a/arch/x86/Kconfig.cpu	2013-11-03 18:41:51.000000000 -0500
++++ b/arch/x86/Kconfig.cpu	2013-12-15 06:21:24.351122516 -0500
+@@ -139,7 +139,7 @@ config MPENTIUM4
+ 
+ 
+ config MK6
+-	bool "K6/K6-II/K6-III"
++	bool "AMD K6/K6-II/K6-III"
+ 	depends on X86_32
+ 	---help---
+ 	  Select this for an AMD K6-family processor.  Enables use of
+@@ -147,7 +147,7 @@ config MK6
+ 	  flags to GCC.
+ 
+ config MK7
+-	bool "Athlon/Duron/K7"
++	bool "AMD Athlon/Duron/K7"
+ 	depends on X86_32
+ 	---help---
+ 	  Select this for an AMD Athlon K7-family processor.  Enables use of
+@@ -155,12 +155,55 @@ config MK7
+ 	  flags to GCC.
+ 
+ config MK8
+-	bool "Opteron/Athlon64/Hammer/K8"
++	bool "AMD Opteron/Athlon64/Hammer/K8"
+ 	---help---
+ 	  Select this for an AMD Opteron or Athlon64 Hammer-family processor.
+ 	  Enables use of some extended instructions, and passes appropriate
+ 	  optimization flags to GCC.
+ 
++config MK10
++	bool "AMD 61xx/7x50/PhenomX3/X4/II/K10"
++	---help---
++	  Select this for an AMD 61xx Eight-Core Magny-Cours, Athlon X2 7x50,
++		Phenom X3/X4/II, Athlon II X2/X3/X4, or Turion II-family processor.
++	  Enables use of some extended instructions, and passes appropriate
++	  optimization flags to GCC.
++
++config MBARCELONA
++	bool "AMD Barcelona"
++	---help---
++	  Select this for AMD Barcelona and newer processors.
++
++	  Enables -march=barcelona
++
++config MBOBCAT
++	bool "AMD Bobcat"
++	---help---
++	  Select this for AMD Bobcat processors.
++
++	  Enables -march=btver1
++
++config MBULLDOZER
++	bool "AMD Bulldozer"
++	---help---
++	  Select this for AMD Bulldozer processors.
++
++	  Enables -march=bdver1
++
++config MPILEDRIVER
++	bool "AMD Piledriver"
++	---help---
++	  Select this for AMD Piledriver processors.
++
++	  Enables -march=bdver2
++
++config MJAGUAR
++	bool "AMD Jaguar"
++	---help---
++	  Select this for AMD Jaguar processors.
++
++	  Enables -march=btver2
++
+ config MCRUSOE
+ 	bool "Crusoe"
+ 	depends on X86_32
+@@ -251,8 +294,17 @@ config MPSC
+ 	  using the cpu family field
+ 	  in /proc/cpuinfo. Family 15 is an older Xeon, Family 6 a newer one.
+ 
++config MATOM
++	bool "Intel Atom"
++	---help---
++
++	  Select this for the Intel Atom platform. Intel Atom CPUs have an
++	  in-order pipelining architecture and thus can benefit from
++	  accordingly optimized code. Use a recent GCC with specific Atom
++	  support in order to fully benefit from selecting this option.
++
+ config MCORE2
+-	bool "Core 2/newer Xeon"
++	bool "Intel Core 2"
+ 	---help---
+ 
+ 	  Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and
+@@ -260,14 +312,40 @@ config MCORE2
+ 	  family in /proc/cpuinfo. Newer ones have 6 and older ones 15
+ 	  (not a typo)
+ 
+-config MATOM
+-	bool "Intel Atom"
++	  Enables -march=core2
++
++config MCOREI7
++	bool "Intel Core i7"
+ 	---help---
+ 
+-	  Select this for the Intel Atom platform. Intel Atom CPUs have an
+-	  in-order pipelining architecture and thus can benefit from
+-	  accordingly optimized code. Use a recent GCC with specific Atom
+-	  support in order to fully benefit from selecting this option.
++	  Select this for the Intel Nehalem platform. Intel Nehalem processors
++	  include Core i3, i5, i7, Xeon: 34xx, 35xx, 55xx, 56xx, 75xx processors.
++
++	  Enables -march=corei7
++
++config MCOREI7AVX
++	bool "Intel Core 2nd Gen AVX"
++	---help---
++
++	  Select this for 2nd Gen Core processors including Sandy Bridge.
++
++	  Enables -march=corei7-avx
++
++config MCOREAVXI
++	bool "Intel Core 3rd Gen AVX"
++	---help---
++
++	  Select this for 3rd Gen Core processors including Ivy Bridge.
++
++	  Enables -march=core-avx-i
++
++config MCOREAVX2
++	bool "Intel Core AVX2"
++	---help---
++
++	  Select this for AVX2 enabled processors including Haswell.
++
++	  Enables -march=core-avx2
+ 
+ config GENERIC_CPU
+ 	bool "Generic-x86-64"
+@@ -276,6 +354,19 @@ config GENERIC_CPU
+ 	  Generic x86-64 CPU.
+ 	  Run equally well on all x86-64 CPUs.
+ 
++config MNATIVE
++ bool "Native optimizations autodetected by GCC"
++ ---help---
++
++   GCC 4.2 and above support -march=native, which automatically detects
++   the optimum settings to use based on your processor. -march=native
++   also detects and applies additional settings beyond -march specific
++   to your CPU (e.g. -msse4). Unless you have a specific reason not to
++   (e.g. distcc cross-compiling), you should probably be using
++   -march=native rather than anything listed below.
++
++   Enables -march=native
++
+ endchoice
+ 
+ config X86_GENERIC
+@@ -300,7 +391,7 @@ config X86_INTERNODE_CACHE_SHIFT
+ config X86_L1_CACHE_SHIFT
+ 	int
+ 	default "7" if MPENTIUM4 || MPSC
+-	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
++	default "6" if MK7 || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MPENTIUMM || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MVIAC7 || X86_GENERIC || MNATIVE || GENERIC_CPU
+ 	default "4" if MELAN || M486 || MGEODEGX1
+ 	default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
+ 
+@@ -331,11 +422,11 @@ config X86_ALIGNMENT_16
+ 
+ config X86_INTEL_USERCOPY
+ 	def_bool y
+-	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
++	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || MNATIVE || X86_GENERIC || MK8 || MK7 || MK10 || MBARCELONA || MEFFICEON || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2
+ 
+ config X86_USE_PPRO_CHECKSUM
+ 	def_bool y
+-	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
++	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MK10 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MNATIVE
+ 
+ config X86_USE_3DNOW
+ 	def_bool y
+@@ -363,17 +454,17 @@ config X86_P6_NOP
+ 
+ config X86_TSC
+ 	def_bool y
+-	depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) || X86_64
++	depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MATOM) || X86_64 || MNATIVE
+ 
+ config X86_CMPXCHG64
+ 	def_bool y
+-	depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM
++	depends on X86_PAE || X86_64 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM || MNATIVE
+ 
+ # this should be set for all -march=.. options where the compiler
+ # generates cmov.
+ config X86_CMOV
+ 	def_bool y
+-	depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM || MGEODE_LX)
++	depends on (MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MK7 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MNATIVE || MATOM || MGEODE_LX)
+ 
+ config X86_MINIMUM_CPU_FAMILY
+ 	int
+diff -uprN a/arch/x86/Makefile b/arch/x86/Makefile
+--- a/arch/x86/Makefile	2013-11-03 18:41:51.000000000 -0500
++++ b/arch/x86/Makefile	2013-12-15 06:21:24.354455723 -0500
+@@ -61,11 +61,26 @@ else
+ 	KBUILD_CFLAGS += $(call cc-option,-mno-sse -mpreferred-stack-boundary=3)
+ 
+         # FIXME - should be integrated in Makefile.cpu (Makefile_32.cpu)
++        cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
+         cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
++        cflags-$(CONFIG_MK10) += $(call cc-option,-march=amdfam10)
++        cflags-$(CONFIG_MBARCELONA) += $(call cc-option,-march=barcelona)
++        cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1)
++        cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1)
++        cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2)
++        cflags-$(CONFIG_MJAGUAR) += $(call cc-option,-march=btver2)
+         cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
+ 
+         cflags-$(CONFIG_MCORE2) += \
+-                $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
++                $(call cc-option,-march=core2,$(call cc-option,-mtune=core2))
++        cflags-$(CONFIG_MCOREI7) += \
++                $(call cc-option,-march=corei7,$(call cc-option,-mtune=corei7))
++        cflags-$(CONFIG_MCOREI7AVX) += \
++                $(call cc-option,-march=corei7-avx,$(call cc-option,-mtune=corei7-avx))
++        cflags-$(CONFIG_MCOREAVXI) += \
++                $(call cc-option,-march=core-avx-i,$(call cc-option,-mtune=core-avx-i))
++        cflags-$(CONFIG_MCOREAVX2) += \
++                $(call cc-option,-march=core-avx2,$(call cc-option,-mtune=core-avx2))
+ 	cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
+ 		$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
+         cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
+diff -uprN a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
+--- a/arch/x86/Makefile_32.cpu	2013-11-03 18:41:51.000000000 -0500
++++ b/arch/x86/Makefile_32.cpu	2013-12-15 06:21:24.354455723 -0500
+@@ -23,7 +23,14 @@ cflags-$(CONFIG_MK6)		+= -march=k6
+ # Please note, that patches that add -march=athlon-xp and friends are pointless.
+ # They make zero difference whatsosever to performance at this time.
+ cflags-$(CONFIG_MK7)		+= -march=athlon
++cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
+ cflags-$(CONFIG_MK8)		+= $(call cc-option,-march=k8,-march=athlon)
++cflags-$(CONFIG_MK10)	+= $(call cc-option,-march=amdfam10,-march=athlon)
++cflags-$(CONFIG_MBARCELONA)	+= $(call cc-option,-march=barcelona,-march=athlon)
++cflags-$(CONFIG_MBOBCAT)	+= $(call cc-option,-march=btver1,-march=athlon)
++cflags-$(CONFIG_MBULLDOZER)	+= $(call cc-option,-march=bdver1,-march=athlon)
++cflags-$(CONFIG_MPILEDRIVER)	+= $(call cc-option,-march=bdver2,-march=athlon)
++cflags-$(CONFIG_MJAGUAR)	+= $(call cc-option,-march=btver2,-march=athlon)
+ cflags-$(CONFIG_MCRUSOE)	+= -march=i686 $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
+ cflags-$(CONFIG_MEFFICEON)	+= -march=i686 $(call tune,pentium3) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
+ cflags-$(CONFIG_MWINCHIPC6)	+= $(call cc-option,-march=winchip-c6,-march=i586)
+@@ -32,6 +39,10 @@ cflags-$(CONFIG_MCYRIXIII)	+= $(call cc-
+ cflags-$(CONFIG_MVIAC3_2)	+= $(call cc-option,-march=c3-2,-march=i686)
+ cflags-$(CONFIG_MVIAC7)		+= -march=i686
+ cflags-$(CONFIG_MCORE2)		+= -march=i686 $(call tune,core2)
++cflags-$(CONFIG_MCOREI7)	+= -march=i686 $(call tune,corei7)
++cflags-$(CONFIG_MCOREI7AVX)	+= -march=i686 $(call tune,corei7-avx)
++cflags-$(CONFIG_MCOREAVXI)	+= -march=i686 $(call tune,core-avx-i)
++cflags-$(CONFIG_MCOREAVX2)	+= -march=i686 $(call tune,core-avx2)
+ cflags-$(CONFIG_MATOM)		+= $(call cc-option,-march=atom,$(call cc-option,-march=core2,-march=i686)) \
+ 	$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))

diff --git a/5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch b/5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
new file mode 100644
index 0000000..c4efd06
--- /dev/null
+++ b/5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
@@ -0,0 +1,402 @@
+WARNING - this version of the patch works with version 4.9+ of gcc and with
+kernel version 3.15.x+ and should NOT be applied when compiling on older
+versions due to name changes of the flags with the 4.9 release of gcc.
+Use the older version of this patch hosted on the same github for older
+versions of gcc. For example:
+
+corei7 --> nehalem
+corei7-avx --> sandybridge
+core-avx-i --> ivybridge
+core-avx2 --> haswell
+
+For more, see: https://gcc.gnu.org/gcc-4.9/changes.html
+
+It also changes 'atom' to 'bonnell' in accordance with the gcc v4.9 changes.
+Note that upstream is using the deprecated 'march=atom' flags when I believe it
+should use the newer 'march=bonnell' flag for atom processors.
+
+I have made that change to this patch set as well.  See the following kernel
+bug report to see if I'm right: https://bugzilla.kernel.org/show_bug.cgi?id=77461
+
+This patch will expand the number of microarchitectures to include newer
+processors including: AMD K10-family, AMD Family 10h (Barcelona), AMD Family
+14h (Bobcat), AMD Family 15h (Bulldozer), AMD Family 15h (Piledriver), AMD
+Family 16h (Jaguar), Intel 1st Gen Core i3/i5/i7 (Nehalem), Intel 1.5 Gen Core
+i3/i5/i7 (Westmere), Intel 2nd Gen Core i3/i5/i7 (Sandybridge), Intel 3rd Gen
+Core i3/i5/i7 (Ivybridge), Intel 4th Gen Core i3/i5/i7 (Haswell), Intel 5th
+Gen Core i3/i5/i7 (Broadwell), and the low power Silvermont series of Atom
+processors (Silvermont). It also offers the compiler the 'native' flag.
+
+Small but real speed increases are measurable using a make endpoint comparing
+a generic kernel to one built with one of the respective microarchs.
+
+See the following experimental evidence supporting this statement:
+https://github.com/graysky2/kernel_gcc_patch
+
+REQUIREMENTS
+linux version >=3.15
+gcc version >=4.9
+
+--- a/arch/x86/include/asm/module.h	2014-06-16 16:44:27.000000000 -0400
++++ b/arch/x86/include/asm/module.h	2015-03-07 03:27:32.556672424 -0500
+@@ -15,6 +15,22 @@
+ #define MODULE_PROC_FAMILY "586MMX "
+ #elif defined CONFIG_MCORE2
+ #define MODULE_PROC_FAMILY "CORE2 "
++#elif defined CONFIG_MNATIVE
++#define MODULE_PROC_FAMILY "NATIVE "
++#elif defined CONFIG_MNEHALEM
++#define MODULE_PROC_FAMILY "NEHALEM "
++#elif defined CONFIG_MWESTMERE
++#define MODULE_PROC_FAMILY "WESTMERE "
++#elif defined CONFIG_MSILVERMONT
++#define MODULE_PROC_FAMILY "SILVERMONT "
++#elif defined CONFIG_MSANDYBRIDGE
++#define MODULE_PROC_FAMILY "SANDYBRIDGE "
++#elif defined CONFIG_MIVYBRIDGE
++#define MODULE_PROC_FAMILY "IVYBRIDGE "
++#elif defined CONFIG_MHASWELL
++#define MODULE_PROC_FAMILY "HASWELL "
++#elif defined CONFIG_MBROADWELL
++#define MODULE_PROC_FAMILY "BROADWELL "
+ #elif defined CONFIG_MATOM
+ #define MODULE_PROC_FAMILY "ATOM "
+ #elif defined CONFIG_M686
+@@ -33,6 +49,20 @@
+ #define MODULE_PROC_FAMILY "K7 "
+ #elif defined CONFIG_MK8
+ #define MODULE_PROC_FAMILY "K8 "
++#elif defined CONFIG_MK8SSE3
++#define MODULE_PROC_FAMILY "K8SSE3 "
++#elif defined CONFIG_MK10
++#define MODULE_PROC_FAMILY "K10 "
++#elif defined CONFIG_MBARCELONA
++#define MODULE_PROC_FAMILY "BARCELONA "
++#elif defined CONFIG_MBOBCAT
++#define MODULE_PROC_FAMILY "BOBCAT "
++#elif defined CONFIG_MBULLDOZER
++#define MODULE_PROC_FAMILY "BULLDOZER "
++#elif defined CONFIG_MPILEDRIVER
++#define MODULE_PROC_FAMILY "PILEDRIVER "
++#elif defined CONFIG_MJAGUAR
++#define MODULE_PROC_FAMILY "JAGUAR "
+ #elif defined CONFIG_MELAN
+ #define MODULE_PROC_FAMILY "ELAN "
+ #elif defined CONFIG_MCRUSOE
+--- a/arch/x86/Kconfig.cpu	2014-06-16 16:44:27.000000000 -0400
++++ b/arch/x86/Kconfig.cpu	2015-03-07 03:32:14.337713226 -0500
+@@ -137,9 +137,8 @@ config MPENTIUM4
+ 		-Paxville
+ 		-Dempsey
+ 
+-
+ config MK6
+-	bool "K6/K6-II/K6-III"
++	bool "AMD K6/K6-II/K6-III"
+ 	depends on X86_32
+ 	---help---
+ 	  Select this for an AMD K6-family processor.  Enables use of
+@@ -147,7 +146,7 @@ config MK6
+ 	  flags to GCC.
+ 
+ config MK7
+-	bool "Athlon/Duron/K7"
++	bool "AMD Athlon/Duron/K7"
+ 	depends on X86_32
+ 	---help---
+ 	  Select this for an AMD Athlon K7-family processor.  Enables use of
+@@ -155,12 +154,62 @@ config MK7
+ 	  flags to GCC.
+ 
+ config MK8
+-	bool "Opteron/Athlon64/Hammer/K8"
++	bool "AMD Opteron/Athlon64/Hammer/K8"
+ 	---help---
+ 	  Select this for an AMD Opteron or Athlon64 Hammer-family processor.
+ 	  Enables use of some extended instructions, and passes appropriate
+ 	  optimization flags to GCC.
+ 
++config MK8SSE3
++	bool "AMD Opteron/Athlon64/Hammer/K8 with SSE3"
++	---help---
++	  Select this for improved AMD Opteron or Athlon64 Hammer-family processors.
++	  Enables use of some extended instructions, and passes appropriate
++	  optimization flags to GCC.
++
++config MK10
++	bool "AMD 61xx/7x50/PhenomX3/X4/II/K10"
++	---help---
++	  Select this for an AMD 61xx Eight-Core Magny-Cours, Athlon X2 7x50,
++		Phenom X3/X4/II, Athlon II X2/X3/X4, or Turion II-family processor.
++	  Enables use of some extended instructions, and passes appropriate
++	  optimization flags to GCC.
++
++config MBARCELONA
++	bool "AMD Barcelona"
++	---help---
++	  Select this for AMD Barcelona and newer processors.
++
++	  Enables -march=barcelona
++
++config MBOBCAT
++	bool "AMD Bobcat"
++	---help---
++	  Select this for AMD Bobcat processors.
++
++	  Enables -march=btver1
++
++config MBULLDOZER
++	bool "AMD Bulldozer"
++	---help---
++	  Select this for AMD Bulldozer processors.
++
++	  Enables -march=bdver1
++
++config MPILEDRIVER
++	bool "AMD Piledriver"
++	---help---
++	  Select this for AMD Piledriver processors.
++
++	  Enables -march=bdver2
++
++config MJAGUAR
++	bool "AMD Jaguar"
++	---help---
++	  Select this for AMD Jaguar processors.
++
++	  Enables -march=btver2
++
+ config MCRUSOE
+ 	bool "Crusoe"
+ 	depends on X86_32
+@@ -251,8 +300,17 @@ config MPSC
+ 	  using the cpu family field
+ 	  in /proc/cpuinfo. Family 15 is an older Xeon, Family 6 a newer one.
+ 
++config MATOM
++	bool "Intel Atom"
++	---help---
++
++	  Select this for the Intel Atom platform. Intel Atom CPUs have an
++	  in-order pipelining architecture and thus can benefit from
++	  accordingly optimized code. Use a recent GCC with specific Atom
++	  support in order to fully benefit from selecting this option.
++
+ config MCORE2
+-	bool "Core 2/newer Xeon"
++	bool "Intel Core 2"
+ 	---help---
+ 
+ 	  Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and
+@@ -260,14 +318,63 @@ config MCORE2
+ 	  family in /proc/cpuinfo. Newer ones have 6 and older ones 15
+ 	  (not a typo)
+ 
+-config MATOM
+-	bool "Intel Atom"
++	  Enables -march=core2
++
++config MNEHALEM
++	bool "Intel Nehalem"
+ 	---help---
+ 
+-	  Select this for the Intel Atom platform. Intel Atom CPUs have an
+-	  in-order pipelining architecture and thus can benefit from
+-	  accordingly optimized code. Use a recent GCC with specific Atom
+-	  support in order to fully benefit from selecting this option.
++	  Select this for 1st Gen Core processors in the Nehalem family.
++
++	  Enables -march=nehalem
++
++config MWESTMERE
++	bool "Intel Westmere"
++	---help---
++
++	  Select this for the Intel Westmere (formerly Nehalem-C) family.
++
++	  Enables -march=westmere
++
++config MSILVERMONT
++	bool "Intel Silvermont"
++	---help---
++
++	  Select this for the Intel Silvermont platform.
++
++	  Enables -march=silvermont
++
++config MSANDYBRIDGE
++	bool "Intel Sandy Bridge"
++	---help---
++
++	  Select this for 2nd Gen Core processors in the Sandy Bridge family.
++
++	  Enables -march=sandybridge
++
++config MIVYBRIDGE
++	bool "Intel Ivy Bridge"
++	---help---
++
++	  Select this for 3rd Gen Core processors in the Ivy Bridge family.
++
++	  Enables -march=ivybridge
++
++config MHASWELL
++	bool "Intel Haswell"
++	---help---
++
++	  Select this for 4th Gen Core processors in the Haswell family.
++
++	  Enables -march=haswell
++
++config MBROADWELL
++	bool "Intel Broadwell"
++	---help---
++
++	  Select this for 5th Gen Core processors in the Broadwell family.
++
++	  Enables -march=broadwell
+ 
+ config GENERIC_CPU
+ 	bool "Generic-x86-64"
+@@ -276,6 +383,19 @@ config GENERIC_CPU
+ 	  Generic x86-64 CPU.
+ 	  Run equally well on all x86-64 CPUs.
+ 
++config MNATIVE
++ bool "Native optimizations autodetected by GCC"
++ ---help---
++
++   GCC 4.2 and above support -march=native, which automatically detects
++   the optimum settings to use based on your processor. -march=native
++   also detects and applies additional settings beyond -march specific
++   to your CPU (e.g. -msse4). Unless you have a specific reason not to
++   (e.g. distcc cross-compiling), you should probably be using
++   -march=native rather than anything listed below.
++
++   Enables -march=native
++
+ endchoice
+ 
+ config X86_GENERIC
+@@ -300,7 +420,7 @@ config X86_INTERNODE_CACHE_SHIFT
+ config X86_L1_CACHE_SHIFT
+ 	int
+ 	default "7" if MPENTIUM4 || MPSC
+-	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
++	default "6" if MK7 || MK8 || MK8SSE3 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MPENTIUMM || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MNATIVE || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
+ 	default "4" if MELAN || M486 || MGEODEGX1
+ 	default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
+ 
+@@ -331,11 +451,11 @@ config X86_ALIGNMENT_16
+ 
+ config X86_INTEL_USERCOPY
+ 	def_bool y
+-	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
++	depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK8SSE3 || MK7 || MEFFICEON || MCORE2 || MK10 || MBARCELONA || MNEHALEM || MWESTMERE || MSILVERMONT || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MNATIVE
+ 
+ config X86_USE_PPRO_CHECKSUM
+ 	def_bool y
+-	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
++	depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MK10 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MK8SSE3 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MATOM || MNATIVE
+ 
+ config X86_USE_3DNOW
+ 	def_bool y
+@@ -359,17 +479,17 @@ config X86_P6_NOP
+ 
+ config X86_TSC
+ 	def_bool y
+-	depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) || X86_64
++	depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MK8SSE3 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MNATIVE || MATOM) || X86_64
+ 
+ config X86_CMPXCHG64
+ 	def_bool y
+-	depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM
++	depends on X86_PAE || X86_64 || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM || MNATIVE
+ 
+ # this should be set for all -march=.. options where the compiler
+ # generates cmov.
+ config X86_CMOV
+ 	def_bool y
+-	depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM || MGEODE_LX)
++	depends on (MK8 || MK8SSE3 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MK7 || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MNATIVE || MATOM || MGEODE_LX)
+ 
+ config X86_MINIMUM_CPU_FAMILY
+ 	int
+--- a/arch/x86/Makefile	2014-06-16 16:44:27.000000000 -0400
++++ b/arch/x86/Makefile	2015-03-07 03:33:27.650843211 -0500
+@@ -92,13 +92,35 @@ else
+ 	KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=3)
+ 
+         # FIXME - should be integrated in Makefile.cpu (Makefile_32.cpu)
++        cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
+         cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
++        cflags-$(CONFIG_MK8SSE3) += $(call cc-option,-march=k8-sse3,-mtune=k8)
++        cflags-$(CONFIG_MK10) += $(call cc-option,-march=amdfam10)
++        cflags-$(CONFIG_MBARCELONA) += $(call cc-option,-march=barcelona)
++        cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1)
++        cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1)
++        cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2)
++        cflags-$(CONFIG_MJAGUAR) += $(call cc-option,-march=btver2)
+         cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
+ 
+         cflags-$(CONFIG_MCORE2) += \
+-                $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
+-	cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
+-		$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
++                $(call cc-option,-march=core2,$(call cc-option,-mtune=core2))
++        cflags-$(CONFIG_MNEHALEM) += \
++                $(call cc-option,-march=nehalem,$(call cc-option,-mtune=nehalem))
++        cflags-$(CONFIG_MWESTMERE) += \
++                $(call cc-option,-march=westmere,$(call cc-option,-mtune=westmere))
++        cflags-$(CONFIG_MSILVERMONT) += \
++                $(call cc-option,-march=silvermont,$(call cc-option,-mtune=silvermont))
++        cflags-$(CONFIG_MSANDYBRIDGE) += \
++                $(call cc-option,-march=sandybridge,$(call cc-option,-mtune=sandybridge))
++        cflags-$(CONFIG_MIVYBRIDGE) += \
++                $(call cc-option,-march=ivybridge,$(call cc-option,-mtune=ivybridge))
++        cflags-$(CONFIG_MHASWELL) += \
++                $(call cc-option,-march=haswell,$(call cc-option,-mtune=haswell))
++        cflags-$(CONFIG_MBROADWELL) += \
++                $(call cc-option,-march=broadwell,$(call cc-option,-mtune=broadwell))
++        cflags-$(CONFIG_MATOM) += $(call cc-option,-march=bonnell) \
++                $(call cc-option,-mtune=bonnell,$(call cc-option,-mtune=generic))
+         cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
+         KBUILD_CFLAGS += $(cflags-y)
+ 
+--- a/arch/x86/Makefile_32.cpu	2014-06-16 16:44:27.000000000 -0400
++++ b/arch/x86/Makefile_32.cpu	2015-03-07 03:34:15.203586024 -0500
+@@ -23,7 +23,15 @@ cflags-$(CONFIG_MK6)		+= -march=k6
+ # Please note, that patches that add -march=athlon-xp and friends are pointless.
+ # They make zero difference whatsosever to performance at this time.
+ cflags-$(CONFIG_MK7)		+= -march=athlon
++cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
+ cflags-$(CONFIG_MK8)		+= $(call cc-option,-march=k8,-march=athlon)
++cflags-$(CONFIG_MK8SSE3)		+= $(call cc-option,-march=k8-sse3,-march=athlon)
++cflags-$(CONFIG_MK10)	+= $(call cc-option,-march=amdfam10,-march=athlon)
++cflags-$(CONFIG_MBARCELONA)	+= $(call cc-option,-march=barcelona,-march=athlon)
++cflags-$(CONFIG_MBOBCAT)	+= $(call cc-option,-march=btver1,-march=athlon)
++cflags-$(CONFIG_MBULLDOZER)	+= $(call cc-option,-march=bdver1,-march=athlon)
++cflags-$(CONFIG_MPILEDRIVER)	+= $(call cc-option,-march=bdver2,-march=athlon)
++cflags-$(CONFIG_MJAGUAR)	+= $(call cc-option,-march=btver2,-march=athlon)
+ cflags-$(CONFIG_MCRUSOE)	+= -march=i686 $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
+ cflags-$(CONFIG_MEFFICEON)	+= -march=i686 $(call tune,pentium3) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
+ cflags-$(CONFIG_MWINCHIPC6)	+= $(call cc-option,-march=winchip-c6,-march=i586)
+@@ -32,8 +40,15 @@ cflags-$(CONFIG_MCYRIXIII)	+= $(call cc-
+ cflags-$(CONFIG_MVIAC3_2)	+= $(call cc-option,-march=c3-2,-march=i686)
+ cflags-$(CONFIG_MVIAC7)		+= -march=i686
+ cflags-$(CONFIG_MCORE2)		+= -march=i686 $(call tune,core2)
+-cflags-$(CONFIG_MATOM)		+= $(call cc-option,-march=atom,$(call cc-option,-march=core2,-march=i686)) \
+-	$(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
++cflags-$(CONFIG_MNEHALEM)	+= -march=i686 $(call tune,nehalem)
++cflags-$(CONFIG_MWESTMERE)	+= -march=i686 $(call tune,westmere)
++cflags-$(CONFIG_MSILVERMONT)	+= -march=i686 $(call tune,silvermont)
++cflags-$(CONFIG_MSANDYBRIDGE)	+= -march=i686 $(call tune,sandybridge)
++cflags-$(CONFIG_MIVYBRIDGE)	+= -march=i686 $(call tune,ivybridge)
++cflags-$(CONFIG_MHASWELL)	+= -march=i686 $(call tune,haswell)
++cflags-$(CONFIG_MBROADWELL)	+= -march=i686 $(call tune,broadwell)
++cflags-$(CONFIG_MATOM)		+= $(call cc-option,-march=bonnell,$(call cc-option,-march=core2,-march=i686)) \
++	$(call cc-option,-mtune=bonnell,$(call cc-option,-mtune=generic))
+ 
+ # AMD Elan support
+ cflags-$(CONFIG_MELAN)		+= -march=i486
+

diff --git a/5015_kdbus-8-12-2015.patch b/5015_kdbus-8-12-2015.patch
new file mode 100644
index 0000000..4e018f2
--- /dev/null
+++ b/5015_kdbus-8-12-2015.patch
@@ -0,0 +1,34349 @@
+diff --git a/Documentation/Makefile b/Documentation/Makefile
+index bc05482..e2127a7 100644
+--- a/Documentation/Makefile
++++ b/Documentation/Makefile
+@@ -1,4 +1,4 @@
+ subdir-y := accounting auxdisplay blackfin connector \
+-	filesystems filesystems ia64 laptops mic misc-devices \
++	filesystems filesystems ia64 kdbus laptops mic misc-devices \
+ 	networking pcmcia prctl ptp spi timers vDSO video4linux \
+ 	watchdog
+diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
+index 51f4221..ec7c81b 100644
+--- a/Documentation/ioctl/ioctl-number.txt
++++ b/Documentation/ioctl/ioctl-number.txt
+@@ -292,6 +292,7 @@ Code  Seq#(hex)	Include File		Comments
+ 0x92	00-0F	drivers/usb/mon/mon_bin.c
+ 0x93	60-7F	linux/auto_fs.h
+ 0x94	all	fs/btrfs/ioctl.h
++0x95	all	uapi/linux/kdbus.h	kdbus IPC driver
+ 0x97	00-7F	fs/ceph/ioctl.h		Ceph file system
+ 0x99	00-0F				537-Addinboard driver
+ 					<mailto:buk@buks.ipn.de>
+diff --git a/Documentation/kdbus/.gitignore b/Documentation/kdbus/.gitignore
+new file mode 100644
+index 0000000..b4a77cc
+--- /dev/null
++++ b/Documentation/kdbus/.gitignore
+@@ -0,0 +1,2 @@
++*.7
++*.html
+diff --git a/Documentation/kdbus/Makefile b/Documentation/kdbus/Makefile
+new file mode 100644
+index 0000000..8caffe5
+--- /dev/null
++++ b/Documentation/kdbus/Makefile
+@@ -0,0 +1,44 @@
++DOCS :=	\
++	kdbus.xml		\
++	kdbus.bus.xml		\
++	kdbus.connection.xml	\
++	kdbus.endpoint.xml	\
++	kdbus.fs.xml		\
++	kdbus.item.xml		\
++	kdbus.match.xml		\
++	kdbus.message.xml	\
++	kdbus.name.xml		\
++	kdbus.policy.xml	\
++	kdbus.pool.xml
++
++XMLFILES := $(addprefix $(obj)/,$(DOCS))
++MANFILES := $(patsubst %.xml, %.7, $(XMLFILES))
++HTMLFILES := $(patsubst %.xml, %.html, $(XMLFILES))
++
++XMLTO_ARGS := -m $(srctree)/$(src)/stylesheet.xsl --skip-validation
++
++quiet_cmd_db2man = MAN     $@
++      cmd_db2man = xmlto man $(XMLTO_ARGS) -o $(obj) $<
++%.7: %.xml
++	@(which xmlto > /dev/null 2>&1) || \
++	 (echo "*** You need to install xmlto ***"; \
++	  exit 1)
++	$(call cmd,db2man)
++
++quiet_cmd_db2html = HTML    $@
++      cmd_db2html = xmlto html-nochunks $(XMLTO_ARGS) -o $(obj) $<
++%.html: %.xml
++	@(which xmlto > /dev/null 2>&1) || \
++	 (echo "*** You need to install xmlto ***"; \
++	  exit 1)
++	$(call cmd,db2html)
++
++mandocs: $(MANFILES)
++
++htmldocs: $(HTMLFILES)
++
++clean-files := $(MANFILES) $(HTMLFILES)
++
++# we don't support other %docs targets right now
++%docs:
++	@true
+diff --git a/Documentation/kdbus/kdbus.bus.xml b/Documentation/kdbus/kdbus.bus.xml
+new file mode 100644
+index 0000000..83f1198
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.bus.xml
+@@ -0,0 +1,344 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.bus">
++
++  <refentryinfo>
++    <title>kdbus.bus</title>
++    <productname>kdbus.bus</productname>
++  </refentryinfo>
++
++  <refmeta>
++    <refentrytitle>kdbus.bus</refentrytitle>
++    <manvolnum>7</manvolnum>
++  </refmeta>
++
++  <refnamediv>
++    <refname>kdbus.bus</refname>
++    <refpurpose>kdbus bus</refpurpose>
++  </refnamediv>
++
++  <refsect1>
++    <title>Description</title>
++
++    <para>
++      A bus is a resource that is shared between connections in order to
++      transmit messages (see
++      <citerefentry>
++        <refentrytitle>kdbus.message</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>).
++      Each bus is independent, and operations on the bus will not have any
++      effect on other buses. A bus is a management entity that controls the
++      addresses of its connections, their policies and message transactions
++      performed via this bus.
++    </para>
++    <para>
++      Each bus is bound to the mount instance it was created on. It has a
++      custom name that is unique across all buses of a domain. In
++      <citerefentry>
++        <refentrytitle>kdbus.fs</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      a bus is presented as a directory. No operations can be performed on
++      the bus itself; instead you need to perform the operations on an endpoint
++      associated with the bus. Endpoints are accessible as files underneath the
++      bus directory. A default endpoint called <constant>bus</constant> is
++      provided on each bus.
++    </para>
++    <para>
++      Bus names may be chosen freely except for one restriction: the name must
++      be prefixed with the numeric effective UID of the creator and a dash. This
++      is required to avoid namespace clashes between different users. When
++      creating a bus, the name that is passed in must be properly formatted, or
++      the kernel will refuse creation of the bus. Example:
++      <literal>1047-foobar</literal> is an acceptable name for a bus
++      registered by a user with UID 1047. However,
++      <literal>1024-foobar</literal> is not, and neither is
++      <literal>foobar</literal>. The UID must be provided in the
++      user-namespace of the bus owner.
++    </para>
++    <para>
++      To create a new bus, you need to open the control file of a domain and
++      employ the <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl. The control
++      file descriptor that was used to issue
++      <constant>KDBUS_CMD_BUS_MAKE</constant> must not previously have been
++      used for any other control-ioctl and must be kept open for the entire
++      life-time of the created bus. Closing it will immediately clean up the
++      entire bus and all its associated resources and endpoints. Every control
++      file descriptor can only be used to create a single new bus; from that
++      point on, it is not used for any further communication until the final
++      <citerefentry>
++        <refentrytitle>close</refentrytitle>
++        <manvolnum>2</manvolnum>
++      </citerefentry>
++      .
++    </para>
++    <para>
++      Each bus will generate a random, 128-bit UUID upon creation. This UUID
++      will be returned to creators of connections through
++      <varname>kdbus_cmd_hello.id128</varname> and can be used to uniquely
++      identify buses, even across different machines or containers. The UUID
++      will have its variant bits set to <literal>DCE</literal>, and denote
++      version 4 (random). For more details on UUIDs, see <ulink
++      url="https://en.wikipedia.org/wiki/Universally_unique_identifier">
++      the Wikipedia article on UUIDs</ulink>.
++    </para>
++
++  </refsect1>
++
++  <refsect1>
++    <title>Creating buses</title>
++    <para>
++      To create a new bus, the <constant>KDBUS_CMD_BUS_MAKE</constant>
++      command is used. It takes a <type>struct kdbus_cmd</type> argument.
++    </para>
++    <programlisting>
++struct kdbus_cmd {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>The flags for creation.</para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_MAKE_ACCESS_GROUP</constant></term>
++              <listitem>
++                <para>Make the bus file group-accessible.</para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_MAKE_ACCESS_WORLD</constant></term>
++              <listitem>
++                <para>Make the bus file world-accessible.</para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++              <listitem>
++                <para>
++                  Requests a set of valid flags for this ioctl. When this bit is
++                  set, no action is taken; the ioctl will return
++                  <errorcode>0</errorcode>, and the <varname>flags</varname>
++                  field will have all bits set that are valid for this command.
++                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++                  cleared by the operation.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            The following items (see
++            <citerefentry>
++              <refentrytitle>kdbus.item</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>)
++            are expected for <constant>KDBUS_CMD_BUS_MAKE</constant>.
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_MAKE_NAME</constant></term>
++              <listitem>
++                <para>
++                  Contains a null-terminated string that identifies the
++                  bus. The name must be unique across the kdbus domain and
++                  must start with the effective UID of the caller, followed by
++                  a '<literal>-</literal>' (dash). This item is mandatory.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_BLOOM_PARAMETER</constant></term>
++              <listitem>
++                <para>
++                  Bus-wide bloom parameters passed in a
++                  <type>struct kdbus_bloom_parameter</type>. These settings are
++                  copied back to new connections verbatim. This item is
++                  mandatory. See
++                  <citerefentry>
++                    <refentrytitle>kdbus.item</refentrytitle>
++                    <manvolnum>7</manvolnum>
++                  </citerefentry>
++                  for a more detailed description of this item.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant></term>
++              <listitem>
++                <para>
++                  An optional item that contains a set of attach flags that are
++                  returned to connections when they query the bus creator
++                  metadata. If not set, no metadata is returned.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++              <listitem><para>
++                With this item, programs can <emphasis>probe</emphasis> the
++                kernel for known item types. See
++                <citerefentry>
++                  <refentrytitle>kdbus.item</refentrytitle>
++                  <manvolnum>7</manvolnum>
++                </citerefentry>
++                for more details.
++              </para></listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++
++    <para>
++      Unrecognized items are rejected, and the ioctl will fail with
++      <varname>errno</varname> set to <constant>EINVAL</constant>.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Return value</title>
++    <para>
++      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++      on error, <errorcode>-1</errorcode> is returned, and
++      <varname>errno</varname> is set to indicate the error.
++      If the issued ioctl is illegal for the file descriptor used,
++      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++    </para>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_BUS_MAKE</constant> may fail with the following
++        errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EBADMSG</constant></term>
++          <listitem><para>
++            A mandatory item is missing.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            The flags supplied in the <constant>struct kdbus_cmd</constant>
++            are invalid or the supplied name does not start with the current
++            UID and a '<literal>-</literal>' (dash).
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EEXIST</constant></term>
++          <listitem><para>
++            A bus of that name already exists.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ESHUTDOWN</constant></term>
++          <listitem><para>
++            The kdbus mount instance for the bus was already shut down.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EMFILE</constant></term>
++          <listitem><para>
++            The maximum number of buses for the current user is exhausted.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++  </refsect1>
++
++  <refsect1>
++    <title>See Also</title>
++    <simplelist type="inline">
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.connection</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.endpoint</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.fs</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.item</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.message</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.name</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.pool</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++    </simplelist>
++  </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.connection.xml b/Documentation/kdbus/kdbus.connection.xml
+new file mode 100644
+index 0000000..4bb5f30
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.connection.xml
+@@ -0,0 +1,1244 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.connection">
++
++  <refentryinfo>
++    <title>kdbus.connection</title>
++    <productname>kdbus.connection</productname>
++  </refentryinfo>
++
++  <refmeta>
++    <refentrytitle>kdbus.connection</refentrytitle>
++    <manvolnum>7</manvolnum>
++  </refmeta>
++
++  <refnamediv>
++    <refname>kdbus.connection</refname>
++    <refpurpose>kdbus connection</refpurpose>
++  </refnamediv>
++
++  <refsect1>
++    <title>Description</title>
++
++    <para>
++      Connections are identified by their <emphasis>connection ID</emphasis>,
++      internally implemented as a <type>uint64_t</type> counter.
++      The IDs of every newly created bus start at <constant>1</constant>, and
++      every new connection will increment the counter by <constant>1</constant>.
++      The IDs are not reused.
++    </para>
++    <para>
++      In higher level tools, the user visible representation of a connection is
++      defined by the D-Bus protocol specification as
++      <constant>":1.&lt;ID&gt;"</constant>.
++    </para>
++    <para>
++      Messages with a specific <type>uint64_t</type> destination ID are
++      directly delivered to the connection with the corresponding ID. Signal
++      messages (see
++      <citerefentry>
++        <refentrytitle>kdbus.message</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>)
++      may be addressed to the special destination ID
++      <constant>KDBUS_DST_ID_BROADCAST</constant> (~0ULL) and will then
++      potentially be delivered to all currently active connections on the bus.
++      However, in order to receive any signal messages, clients must subscribe
++      to them by installing a match (see
++      <citerefentry>
++        <refentrytitle>kdbus.match</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>).
++    </para>
++    <para>
++      Messages synthesized and sent directly by the kernel will carry the
++      special source ID <constant>KDBUS_SRC_ID_KERNEL</constant> (0).
++    </para>
++    <para>
++      In addition to the unique <type>uint64_t</type> connection ID,
++      established connections can request the ownership of
++      <emphasis>well-known names</emphasis>, under which they can be found and
++      addressed by other bus clients. A well-known name is associated with one
++      and only one connection at a time. See
++      <citerefentry>
++        <refentrytitle>kdbus.name</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      on name acquisition, the name registry, and the validity of names.
++    </para>
++    <para>
++      Messages can specify the special destination ID
++      <constant>KDBUS_DST_ID_NAME</constant> (0) and carry a well-known name
++      in the message data. Such a message is delivered to the destination
++      connection which owns that well-known name.
++    </para>
++
++    <programlisting><![CDATA[
++  +-------------------------------------------------------------------------+
++  | +---------------+     +---------------------------+                     |
++  | | Connection    |     | Message                   | -----------------+  |
++  | | :1.22         | --> | src: 22                   |                  |  |
++  | |               |     | dst: 25                   |                  |  |
++  | |               |     |                           |                  |  |
++  | |               |     |                           |                  |  |
++  | |               |     +---------------------------+                  |  |
++  | |               |                                                    |  |
++  | |               | <--------------------------------------+           |  |
++  | +---------------+                                        |           |  |
++  |                                                          |           |  |
++  | +---------------+     +---------------------------+      |           |  |
++  | | Connection    |     | Message                   | -----+           |  |
++  | | :1.25         | --> | src: 25                   |                  |  |
++  | |               |     | dst: 0xffffffffffffffff   | -------------+   |  |
++  | |               |     |  (KDBUS_DST_ID_BROADCAST) |              |   |  |
++  | |               |     |                           | ---------+   |   |  |
++  | |               |     +---------------------------+          |   |   |  |
++  | |               |                                            |   |   |  |
++  | |               | <--------------------------------------------------+  |
++  | +---------------+                                            |   |      |
++  |                                                              |   |      |
++  | +---------------+     +---------------------------+          |   |      |
++  | | Connection    |     | Message                   | --+      |   |      |
++  | | :1.55         | --> | src: 55                   |   |      |   |      |
++  | |               |     | dst: 0 / org.foo.bar      |   |      |   |      |
++  | |               |     |                           |   |      |   |      |
++  | |               |     |                           |   |      |   |      |
++  | |               |     +---------------------------+   |      |   |      |
++  | |               |                                     |      |   |      |
++  | |               | <------------------------------------------+   |      |
++  | +---------------+                                     |          |      |
++  |                                                       |          |      |
++  | +---------------+                                     |          |      |
++  | | Connection    |                                     |          |      |
++  | | :1.81         |                                     |          |      |
++  | | org.foo.bar   |                                     |          |      |
++  | |               |                                     |          |      |
++  | |               |                                     |          |      |
++  | |               | <-----------------------------------+          |      |
++  | |               |                                                |      |
++  | |               | <----------------------------------------------+      |
++  | +---------------+                                                       |
++  +-------------------------------------------------------------------------+
++    ]]></programlisting>
++  </refsect1>
++
++  <refsect1>
++    <title>Privileged connections</title>
++    <para>
++      A connection is considered <emphasis>privileged</emphasis> if the user
++      who created it is the same one who created the bus, or if the creating
++      task had <constant>CAP_IPC_OWNER</constant> set when it called
++      <constant>KDBUS_CMD_HELLO</constant> (see below).
++    </para>
++    <para>
++      Privileged connections have permission to employ certain restricted
++      functions and commands, which are explained below and in other kdbus
++      man-pages.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Activator and policy holder connection</title>
++    <para>
++      An <emphasis>activator</emphasis> connection is a placeholder for a
++      <emphasis>well-known name</emphasis>. Messages sent to such a connection
++      can be used to start an implementer connection, which will then get all
++      the messages from the activator copied over. An activator connection
++      cannot be used to send any message.
++    </para>
++    <para>
++      A <emphasis>policy holder</emphasis> connection only installs a policy
++      for one or more names. These policy entries are kept active as long as
++      the connection is alive, and are removed once it terminates. Such a
++      policy connection type can be used to deploy restrictions for names that
++      are not yet active on the bus. A policy holder connection cannot be used
++      to send any message.
++    </para>
++    <para>
++      The creation of activator or policy holder connections is restricted to
++      privileged users on the bus (see above).
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Monitor connections</title>
++    <para>
++      Monitors are eavesdropping connections that receive all the traffic on the
++      bus, but are invisible to other connections. Such connections have all
++      properties of any other, regular connection, except for the following
++      details:
++    </para>
++
++    <itemizedlist>
++      <listitem><para>
++        They will get every message sent over the bus, both unicasts and
++        broadcasts.
++      </para></listitem>
++
++      <listitem><para>
++        Installing matches for signal messages is neither necessary
++        nor allowed.
++      </para></listitem>
++
++      <listitem><para>
++        They cannot send messages or be directly addressed as receiver.
++      </para></listitem>
++
++      <listitem><para>
++        They cannot own well-known names. Therefore, they also can't operate as
++        activators.
++      </para></listitem>
++
++      <listitem><para>
++        Their creation and destruction will not cause
++        <constant>KDBUS_ITEM_ID_{ADD,REMOVE}</constant> notifications (see
++        <citerefentry>
++          <refentrytitle>kdbus.item</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>).
++      </para></listitem>
++
++      <listitem><para>
++        They are not listed with their unique name in name registry dumps
++        (see <constant>KDBUS_CMD_NAME_LIST</constant> in
++        <citerefentry>
++          <refentrytitle>kdbus.name</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>), so other connections cannot detect the presence of
++        a monitor.
++      </para></listitem>
++    </itemizedlist>
++    <para>
++      The creation of monitor connections is restricted to privileged users on
++      the bus (see above).
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Creating connections</title>
++    <para>
++      A connection to a bus is created by opening an endpoint file (see
++      <citerefentry>
++        <refentrytitle>kdbus.endpoint</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>)
++      of a bus and becoming an active client with the
++      <constant>KDBUS_CMD_HELLO</constant> ioctl. Every connection has a unique
++      identifier on the bus and can address messages to every other connection
++      on the same bus by using the peer's connection ID as the destination.
++    </para>
++    <para>
++      The <constant>KDBUS_CMD_HELLO</constant> ioctl takes a <type>struct
++      kdbus_cmd_hello</type> as argument.
++    </para>
++
++    <programlisting>
++struct kdbus_cmd_hello {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  __u64 attach_flags_send;
++  __u64 attach_flags_recv;
++  __u64 bus_flags;
++  __u64 id;
++  __u64 pool_size;
++  __u64 offset;
++  __u8 id128[16];
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem>
++          <para>Flags to apply to this connection.</para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_HELLO_ACCEPT_FD</constant></term>
++              <listitem>
++                <para>
++                  When this flag is set, the connection can be sent file
++                  descriptors as message payload of unicast messages. If it's
++                  not set, an attempt to send file descriptors will result in
++                  <constant>-ECOMM</constant> on the sender's side.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_HELLO_ACTIVATOR</constant></term>
++              <listitem>
++                <para>
++                  Make this connection an activator (see above). With this bit
++                  set, an item of type <constant>KDBUS_ITEM_NAME</constant> has
++                  to be attached. This item describes the well-known name this
++                  connection should be an activator for.
++                  A connection cannot be an activator and a policy holder at
++                  the same time, so this bit is not allowed together with
++                  <constant>KDBUS_HELLO_POLICY_HOLDER</constant>.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_HELLO_POLICY_HOLDER</constant></term>
++              <listitem>
++                <para>
++                  Make this connection a policy holder (see above). With this
++                  bit set, an item of type <constant>KDBUS_ITEM_NAME</constant>
++                  has to be attached. This item describes the well-known name
++                  this connection should hold a policy for.
++                  A connection cannot be an activator and a policy holder at
++                  the same time, so this bit is not allowed together with
++                  <constant>KDBUS_HELLO_ACTIVATOR</constant>.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_HELLO_MONITOR</constant></term>
++              <listitem>
++                <para>
++                  Make this connection a monitor connection (see above).
++                </para>
++                <para>
++                  This flag can only be set by privileged bus connections. See
++                  above for more information.
++                  A connection cannot be a monitor and an activator or a policy
++                  holder at the same time, so this bit is not allowed
++                  together with <constant>KDBUS_HELLO_ACTIVATOR</constant> or
++                  <constant>KDBUS_HELLO_POLICY_HOLDER</constant>.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++              <listitem>
++                <para>
++                  Requests a set of valid flags for this ioctl. When this bit is
++                  set, no action is taken; the ioctl will return
++                  <errorcode>0</errorcode>, and the <varname>flags</varname>
++                  field will have all bits set that are valid for this command.
++                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++                  cleared by the operation.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>attach_flags_send</varname></term>
++        <listitem><para>
++          Set the bits for metadata this connection permits to be sent to the
++          receiving peer. Only metadata items that are both allowed to be sent
++          by the sender and that are requested by the receiver will be attached
++          to the message.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>attach_flags_recv</varname></term>
++        <listitem><para>
++          Request the attachment of metadata for each message received by this
++          connection. See
++          <citerefentry>
++            <refentrytitle>kdbus</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>
++          for information about metadata, and
++          <citerefentry>
++            <refentrytitle>kdbus.item</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>
++          regarding items in general.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>bus_flags</varname></term>
++        <listitem><para>
++          Upon successful completion of the ioctl, this member will contain the
++          flags of the bus it connected to.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>id</varname></term>
++        <listitem><para>
++          Upon successful completion of the command, this member will contain
++          the numerical ID of the new connection.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>pool_size</varname></term>
++        <listitem><para>
++          The size of the communication pool, in bytes. The pool can be
++          accessed by calling
++          <citerefentry>
++            <refentrytitle>mmap</refentrytitle>
++            <manvolnum>2</manvolnum>
++          </citerefentry>
++          on the file descriptor that was used to issue the
++          <constant>KDBUS_CMD_HELLO</constant> ioctl.
++          The pool size of a connection must be greater than
++          <constant>0</constant> and a multiple of
++          <constant>PAGE_SIZE</constant>. See
++          <citerefentry>
++            <refentrytitle>kdbus.pool</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>
++          for more information.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>offset</varname></term>
++        <listitem><para>
++          The kernel will return the offset in the pool where returned details
++          will be stored. See below.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>id128</varname></term>
++        <listitem><para>
++          Upon successful completion of the ioctl, this member will contain the
++          <emphasis>128-bit UUID</emphasis> of the connected bus.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            Variable list of items containing optional additional information.
++            The following items are currently expected/valid:
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_CONN_DESCRIPTION</constant></term>
++              <listitem>
++                <para>
++                  Contains a string that describes this connection, so it can
++                  be identified later.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NAME</constant></term>
++              <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
++              <listitem>
++                <para>
++                  For activators and policy holders only, combinations of
++                  these two items describe policy access entries. See
++                  <citerefentry>
++                    <refentrytitle>kdbus.policy</refentrytitle>
++                    <manvolnum>7</manvolnum>
++                  </citerefentry>
++                  for further details.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_CREDS</constant></term>
++              <term><constant>KDBUS_ITEM_PIDS</constant></term>
++              <term><constant>KDBUS_ITEM_SECLABEL</constant></term>
++              <listitem>
++                <para>
++                  Privileged bus users may submit these types in order to
++                  create connections with faked credentials. This information
++                  will be returned when peer information is queried by
++                  <constant>KDBUS_CMD_CONN_INFO</constant>. See below for more
++                  information on retrieving information on connections.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++              <listitem><para>
++                With this item, programs can <emphasis>probe</emphasis> the
++                kernel for known item types. See
++                <citerefentry>
++                  <refentrytitle>kdbus.item</refentrytitle>
++                  <manvolnum>7</manvolnum>
++                </citerefentry>
++                for more details.
++              </para></listitem>
++            </varlistentry>
++          </variablelist>
++
++          <para>
++            Unrecognized items are rejected, and the ioctl will fail with
++            <varname>errno</varname> set to <constant>EINVAL</constant>.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++
++    <para>
++      At the offset returned in the <varname>offset</varname> field of
++      <type>struct kdbus_cmd_hello</type>, the kernel will store items
++      of the following types:
++    </para>
++
++    <variablelist>
++      <varlistentry>
++        <term><constant>KDBUS_ITEM_BLOOM_PARAMETER</constant></term>
++        <listitem>
++          <para>
++            Bloom filter parameter as defined by the bus creator.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++
++    <para>
++      The offset in the pool has to be freed with the
++      <constant>KDBUS_CMD_FREE</constant> ioctl. See
++      <citerefentry>
++        <refentrytitle>kdbus.pool</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for further information.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Retrieving information on a connection</title>
++    <para>
++      The <constant>KDBUS_CMD_CONN_INFO</constant> ioctl can be used to
++      retrieve credentials and properties of the initial creator of a
++      connection. This ioctl uses the following struct.
++    </para>
++
++    <programlisting>
++struct kdbus_cmd_info {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  __u64 id;
++  __u64 attach_flags;
++  __u64 offset;
++  __u64 info_size;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>
++          Currently, no flags are supported.
++          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++          valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
++          and the <varname>flags</varname> field is set to
++          <constant>0</constant>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>id</varname></term>
++        <listitem><para>
++          The numerical ID of the connection for which information is to be
++          retrieved. If set to a non-zero value, the
++          <constant>KDBUS_ITEM_OWNED_NAME</constant> item is ignored.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>attach_flags</varname></term>
++        <listitem><para>
++          Specifies which metadata items should be attached to the answer. See
++          <citerefentry>
++            <refentrytitle>kdbus.message</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>offset</varname></term>
++        <listitem><para>
++          When the ioctl returns, this field will contain the offset of the
++          connection information inside the caller's pool. See
++          <citerefentry>
++            <refentrytitle>kdbus.pool</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>
++          for further information.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>info_size</varname></term>
++        <listitem><para>
++          The kernel will return the size of the returned information, so
++          applications can optionally
++          <citerefentry>
++            <refentrytitle>mmap</refentrytitle>
++            <manvolnum>2</manvolnum>
++          </citerefentry>
++          specific parts of the pool. See
++          <citerefentry>
++            <refentrytitle>kdbus.pool</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>
++          for further information.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            The following items are expected for
++            <constant>KDBUS_CMD_CONN_INFO</constant>.
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_OWNED_NAME</constant></term>
++              <listitem>
++                <para>
++                  Contains the well-known name of the connection to look up.
++                  This item is mandatory if the <varname>id</varname> field is
++                  set to 0.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++              <listitem><para>
++                With this item, programs can <emphasis>probe</emphasis> the
++                kernel for known item types. See
++                <citerefentry>
++                  <refentrytitle>kdbus.item</refentrytitle>
++                  <manvolnum>7</manvolnum>
++                </citerefentry>
++                for more details.
++              </para></listitem>
++            </varlistentry>
++          </variablelist>
++          <para>
++            Unrecognized items are rejected, and the ioctl will fail with
++            <varname>errno</varname> set to <constant>EINVAL</constant>.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++
++    <para>
++      When the ioctl returns, the following struct will be stored in the
++      caller's pool at <varname>offset</varname>. The fields in this struct
++      are described below.
++    </para>
++
++    <programlisting>
++struct kdbus_info {
++  __u64 size;
++  __u64 id;
++  __u64 flags;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>id</varname></term>
++        <listitem><para>
++          The connection's unique ID.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>
++          The connection's flags as specified when it was created.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            Depending on the <varname>attach_flags</varname> field in
++            <type>struct kdbus_cmd_info</type>, items of types
++            <constant>KDBUS_ITEM_OWNED_NAME</constant> and
++            <constant>KDBUS_ITEM_CONN_DESCRIPTION</constant> may follow here.
++            <constant>KDBUS_ITEM_NEGOTIATE</constant> is also allowed.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++
++    <para>
++      Once the caller has finished parsing the returned buffer, it must issue
++      the <constant>KDBUS_CMD_FREE</constant> command for the offset in order
++      to release that slice of the pool. See
++      <citerefentry>
++        <refentrytitle>kdbus.pool</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for further information.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Getting information about a connection's bus creator</title>
++    <para>
++      The <constant>KDBUS_CMD_BUS_CREATOR_INFO</constant> ioctl takes the same
++      struct as <constant>KDBUS_CMD_CONN_INFO</constant>, but is used to
++      retrieve information about the creator of the bus the connection is
++      attached to. The metadata returned by this call is collected during the
++      creation of the bus and is never altered afterwards, so it provides
++      pristine information on the task that created the bus, at the moment when
++      it did so.
++    </para>
++    <para>
++      In response to this call, a slice in the connection's pool is allocated
++      and filled with an object of type <type>struct kdbus_info</type>,
++      pointed to by the ioctl's <varname>offset</varname> field.
++    </para>
++
++    <programlisting>
++struct kdbus_info {
++  __u64 size;
++  __u64 id;
++  __u64 flags;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>id</varname></term>
++        <listitem><para>
++          The bus ID.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>
++          The bus flags as specified when it was created.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            Metadata information is stored in items here. The item list
++            contains a <constant>KDBUS_ITEM_MAKE_NAME</constant> item that
++            indicates the bus name of the calling connection.
++            <constant>KDBUS_ITEM_NEGOTIATE</constant> is allowed to probe
++            for known item types.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++
++    <para>
++      Once the caller has finished parsing the returned buffer, it must issue
++      the <constant>KDBUS_CMD_FREE</constant> command for the offset in order
++      to release that slice of the pool. See
++      <citerefentry>
++        <refentrytitle>kdbus.pool</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for further information.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Updating connection details</title>
++    <para>
++      Some of a connection's details can be updated with the
++      <constant>KDBUS_CMD_CONN_UPDATE</constant> ioctl, using the file
++      descriptor that was used to create the connection. The update command
++      uses the following struct.
++    </para>
++
++    <programlisting>
++struct kdbus_cmd {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>
++          Currently, no flags are supported.
++          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++          valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
++          and the <varname>flags</varname> field is set to
++          <constant>0</constant>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            Items to describe the connection details to be updated. The
++            following item types are supported.
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant></term>
++              <listitem>
++                <para>
++                  Supply a new set of metadata items that this connection
++                  permits to be sent along with messages.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_ATTACH_FLAGS_RECV</constant></term>
++              <listitem>
++                <para>
++                  Supply a new set of metadata items that this connection
++                  requests to be attached to each message.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NAME</constant></term>
++              <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
++              <listitem>
++                <para>
++                  Policy holder connections may supply a new set of policy
++                  information with these items. For other connection types,
++                  <constant>EOPNOTSUPP</constant> is returned in
++                  <varname>errno</varname>.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++              <listitem><para>
++                With this item, programs can <emphasis>probe</emphasis> the
++                kernel for known item types. See
++                <citerefentry>
++                  <refentrytitle>kdbus.item</refentrytitle>
++                  <manvolnum>7</manvolnum>
++                </citerefentry>
++                for more details.
++              </para></listitem>
++            </varlistentry>
++          </variablelist>
++
++          <para>
++            Unrecognized items are rejected, and the ioctl will fail with
++            <varname>errno</varname> set to <constant>EINVAL</constant>.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++  </refsect1>
++
++  <refsect1>
++    <title>Termination of connections</title>
++    <para>
++      A connection can be terminated by simply calling
++      <citerefentry>
++        <refentrytitle>close</refentrytitle>
++        <manvolnum>2</manvolnum>
++      </citerefentry>
++      on its file descriptor. All pending incoming messages will be discarded,
++      and the memory allocated by the pool will be freed.
++    </para>
++
++    <para>
++      An alternative way of closing down a connection is via the
++      <constant>KDBUS_CMD_BYEBYE</constant> ioctl. This ioctl will succeed only
++      if the message queue of the connection is empty at the time of closing;
++      otherwise, the ioctl will fail with <varname>errno</varname> set to
++      <constant>EBUSY</constant>. When this ioctl returns
++      successfully, the connection has been terminated and won't accept any new
++      messages from remote peers. This way, a connection can be terminated
++      race-free, without losing any messages. The ioctl takes an argument of
++      type <type>struct kdbus_cmd</type>.
++    </para>
++
++    <programlisting>
++struct kdbus_cmd {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>
++          Currently, no flags are supported.
++          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++          valid flags. If set, the ioctl will fail with
++          <varname>errno</varname> set to <constant>EPROTO</constant>, and
++          the <varname>flags</varname> field is set to <constant>0</constant>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            The following item types are supported.
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++              <listitem><para>
++                With this item, programs can <emphasis>probe</emphasis> the
++                kernel for known item types. See
++                <citerefentry>
++                  <refentrytitle>kdbus.item</refentrytitle>
++                  <manvolnum>7</manvolnum>
++                </citerefentry>
++                for more details.
++              </para></listitem>
++            </varlistentry>
++          </variablelist>
++
++          <para>
++            Unrecognized items are rejected, and the ioctl will fail with
++            <varname>errno</varname> set to <constant>EINVAL</constant>.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++  </refsect1>
++
++  <refsect1>
++    <title>Return value</title>
++    <para>
++      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++      on error, <errorcode>-1</errorcode> is returned, and
++      <varname>errno</varname> is set to indicate the error.
++      If the issued ioctl is illegal for the file descriptor used,
++      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++    </para>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_HELLO</constant> may fail with the following
++        errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EFAULT</constant></term>
++          <listitem><para>
++            The supplied pool size was 0 or not a multiple of the page size.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            The flags supplied in <type>struct kdbus_cmd_hello</type>
++            are invalid.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            An illegal combination of
++            <constant>KDBUS_HELLO_MONITOR</constant>,
++            <constant>KDBUS_HELLO_ACTIVATOR</constant> and
++            <constant>KDBUS_HELLO_POLICY_HOLDER</constant> was passed in
++            <varname>flags</varname>.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            An invalid set of items was supplied.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ECONNREFUSED</constant></term>
++          <listitem><para>
++            The <varname>attach_flags_send</varname> field did not satisfy the
++            requirements of the bus.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EPERM</constant></term>
++          <listitem><para>
++            A <constant>KDBUS_ITEM_CREDS</constant> item was supplied, but the
++            current user is not privileged.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ESHUTDOWN</constant></term>
++          <listitem><para>
++            The bus you were trying to connect to has already been shut down.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EMFILE</constant></term>
++          <listitem><para>
++            The maximum number of connections on the bus has been reached.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EOPNOTSUPP</constant></term>
++          <listitem><para>
++            The endpoint does not support the connection flags supplied in
++            <type>struct kdbus_cmd_hello</type>.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_BYEBYE</constant> may fail with the following
++        errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EALREADY</constant></term>
++          <listitem><para>
++            The connection has already been shut down.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EBUSY</constant></term>
++          <listitem><para>
++            There are still messages queued up in the connection's pool.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_CONN_INFO</constant> may fail with the following
++        errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            Invalid flags, or neither an ID nor a name was provided, or the
++            name is invalid.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ESRCH</constant></term>
++          <listitem><para>
++            Connection lookup by name failed.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ENXIO</constant></term>
++          <listitem><para>
++            No connection with the provided connection ID was found.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_CONN_UPDATE</constant> may fail with the following
++        errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            Illegal flags or items.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            Wildcards submitted in policy entries, or illegal sequence
++            of policy items.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EOPNOTSUPP</constant></term>
++          <listitem><para>
++            Operation not supported by connection.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>E2BIG</constant></term>
++          <listitem><para>
++            Too many policy items attached.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++  </refsect1>
++
++  <refsect1>
++    <title>See Also</title>
++    <simplelist type="inline">
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.bus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.endpoint</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.message</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.name</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.policy</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.pool</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.item</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++    </simplelist>
++  </refsect1>
++</refentry>
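The termination rules in kdbus.connection above (KDBUS_CMD_BYEBYE succeeds only on an empty queue, fails with EBUSY otherwise, and reports EALREADY for a connection that is already shut down) can be sketched as a small decision helper. This is a minimal illustration, not part of the kdbus API: the enum, its values, and the function name `byebye_next_step` are hypothetical, and treating EALREADY as "nothing left to do" is an assumption about what a caller would want.

```c
#include <errno.h>

/* Hypothetical names: possible next steps after KDBUS_CMD_BYEBYE returns. */
enum byebye_action {
    BYEBYE_DONE,        /* connection is terminated; the fd can be closed */
    BYEBYE_DRAIN_RETRY, /* messages still queued: receive them, then retry */
    BYEBYE_FAIL         /* unexpected error; report it to the caller */
};

/*
 * Map the result of the BYEBYE ioctl to a next step.
 * 'ret' is the ioctl return value, 'err' is the errno value saved
 * immediately after the call.
 */
enum byebye_action byebye_next_step(int ret, int err)
{
    if (ret == 0)
        return BYEBYE_DONE;

    switch (err) {
    case EALREADY:      /* the connection has already been shut down */
        return BYEBYE_DONE;
    case EBUSY:         /* the message queue was not empty */
        return BYEBYE_DRAIN_RETRY;
    default:
        return BYEBYE_FAIL;
    }
}
```

A caller would typically loop: issue the ioctl, pass its result to the helper, and on `BYEBYE_DRAIN_RETRY` receive and free the pending messages before trying again.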
+diff --git a/Documentation/kdbus/kdbus.endpoint.xml b/Documentation/kdbus/kdbus.endpoint.xml
+new file mode 100644
+index 0000000..6632485
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.endpoint.xml
+@@ -0,0 +1,429 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.endpoint">
++
++  <refentryinfo>
++    <title>kdbus.endpoint</title>
++    <productname>kdbus.endpoint</productname>
++  </refentryinfo>
++
++  <refmeta>
++    <refentrytitle>kdbus.endpoint</refentrytitle>
++    <manvolnum>7</manvolnum>
++  </refmeta>
++
++  <refnamediv>
++    <refname>kdbus.endpoint</refname>
++    <refpurpose>kdbus endpoint</refpurpose>
++  </refnamediv>
++
++  <refsect1>
++    <title>Description</title>
++
++    <para>
++      Endpoints are entry points to a bus (see
++      <citerefentry>
++        <refentrytitle>kdbus.bus</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>).
++      Each bus has a default endpoint called <literal>bus</literal>. The bus
++      owner can create custom endpoints with specific names, permissions, and
++      policy databases (see below). An endpoint is presented as a file
++      underneath the directory of the parent bus.
++    </para>
++    <para>
++      To create a custom endpoint, open the default endpoint
++      (<literal>bus</literal>) and use the
++      <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> ioctl with
++      <type>struct kdbus_cmd</type>. Custom endpoints always have a policy
++      database that, by default, forbids any operation. You have to explicitly
++      install policy entries to allow any operation on this endpoint.
++    </para>
++    <para>
++      Once <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> has succeeded, the new
++      endpoint will appear in the filesystem
++      (<citerefentry>
++        <refentrytitle>kdbus.bus</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>), and the file descriptor used for the call will manage
++      the newly created endpoint resource. It cannot be used to manage further
++      resources and must be kept open as long as the endpoint is needed. The
++      endpoint will be terminated as soon as the file descriptor is closed.
++    </para>
++    <para>
++      Endpoint names may be chosen freely except for one restriction: the name
++      must be prefixed with the numeric effective UID of the creator and a dash.
++      This is required to avoid namespace clashes between different users. When
++      creating an endpoint, the name that is passed in must be properly
++      formatted or the kernel will refuse creation of the endpoint. Example:
++      <literal>1047-my-endpoint</literal> is an acceptable name for an
++      endpoint registered by a user with UID 1047. However,
++      <literal>1024-my-endpoint</literal> is not, and neither is
++      <literal>my-endpoint</literal>. The UID must be provided in the
++      user-namespace of the bus.
++    </para>
++    <para>
++      To create connections to a bus, use <constant>KDBUS_CMD_HELLO</constant>
++      on a file descriptor returned by <function>open()</function> on an
++      endpoint node. See
++      <citerefentry>
++        <refentrytitle>kdbus.connection</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for further details.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Creating custom endpoints</title>
++    <para>
++      To create a new endpoint, the
++      <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> command is used. Along with
++      the endpoint's name, which will be used to expose the endpoint in the
++      <citerefentry>
++        <refentrytitle>kdbus.fs</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>,
++      the command also optionally takes items to set up the endpoint's
++      <citerefentry>
++        <refentrytitle>kdbus.policy</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>.
++      <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> takes a
++      <type>struct kdbus_cmd</type> argument.
++    </para>
++    <programlisting>
++struct kdbus_cmd {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>The flags for creation.</para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_MAKE_ACCESS_GROUP</constant></term>
++              <listitem>
++                <para>Make the endpoint file group-accessible.</para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_MAKE_ACCESS_WORLD</constant></term>
++              <listitem>
++                <para>Make the endpoint file world-accessible.</para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++              <listitem>
++                <para>
++                  Requests a set of valid flags for this ioctl. When this bit is
++                  set, no action is taken; the ioctl will return
++                  <errorcode>0</errorcode>, and the <varname>flags</varname>
++                  field will have all bits set that are valid for this command.
++                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++                  cleared by the operation.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            The following items are expected for
++            <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>.
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_MAKE_NAME</constant></term>
++              <listitem>
++                <para>Contains a string to identify the endpoint name.</para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NAME</constant></term>
++              <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
++              <listitem>
++                <para>
++                  These items are used to set the policy attached to the
++                  endpoint. For more details on bus and endpoint policies, see
++                  <citerefentry>
++                    <refentrytitle>kdbus.policy</refentrytitle>
++                    <manvolnum>7</manvolnum>
++                  </citerefentry>.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++          <para>
++            Unrecognized items are rejected, and the ioctl will fail with
++            <varname>errno</varname> set to <constant>EINVAL</constant>.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++  </refsect1>
++
++  <refsect1>
++    <title>Updating endpoints</title>
++    <para>
++      To update an existing endpoint, the
++      <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant> command is used on the file
++      descriptor that was used to create the endpoint, using
++      <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>. The only relevant detail of
++      the endpoint that can be updated is the policy. When the command is
++      employed, the policy of the endpoint is <emphasis>replaced</emphasis>
++      atomically with the new set of rules.
++      The command takes a <type>struct kdbus_cmd</type> argument.
++    </para>
++    <programlisting>
++struct kdbus_cmd {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>
++          Unused for this command.
++          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++          valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
++          and the <varname>flags</varname> field is set to
++          <constant>0</constant>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            The following items are expected for
++            <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant>.
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NAME</constant></term>
++              <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
++              <listitem>
++                <para>
++                  These items are used to set the policy attached to the
++                  endpoint. For more details on bus and endpoint policies, see
++                  <citerefentry>
++                    <refentrytitle>kdbus.policy</refentrytitle>
++                    <manvolnum>7</manvolnum>
++                  </citerefentry>.
++                  Existing policy is atomically replaced with the new rules
++                  provided.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++              <listitem><para>
++                With this item, programs can <emphasis>probe</emphasis> the
++                kernel for known item types. See
++                <citerefentry>
++                  <refentrytitle>kdbus.item</refentrytitle>
++                  <manvolnum>7</manvolnum>
++                </citerefentry>
++                for more details.
++              </para></listitem>
++            </varlistentry>
++          </variablelist>
++          <para>
++            Unrecognized items are rejected, and the ioctl will fail with
++            <varname>errno</varname> set to <constant>EINVAL</constant>.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++  </refsect1>
++
++  <refsect1>
++    <title>Return value</title>
++    <para>
++      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++      on error, <errorcode>-1</errorcode> is returned, and
++      <varname>errno</varname> is set to indicate the error.
++      If the issued ioctl is illegal for the file descriptor used,
++      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++    </para>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> may fail with the
++        following errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            The flags supplied in the <type>struct kdbus_cmd</type>
++            are invalid.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            An illegal combination of <constant>KDBUS_ITEM_NAME</constant> and
++            <constant>KDBUS_ITEM_POLICY_ACCESS</constant> was provided.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EEXIST</constant></term>
++          <listitem><para>
++            An endpoint of that name already exists.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EPERM</constant></term>
++          <listitem><para>
++            The calling user is not privileged. See
++            <citerefentry>
++              <refentrytitle>kdbus</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for information about privileged users.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant> may fail with the
++        following errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            The flags supplied in <type>struct kdbus_cmd</type>
++            are invalid.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            An illegal combination of <constant>KDBUS_ITEM_NAME</constant> and
++            <constant>KDBUS_ITEM_POLICY_ACCESS</constant> was provided.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++  </refsect1>
++
++  <refsect1>
++    <title>See Also</title>
++    <simplelist type="inline">
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.bus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.endpoint</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.fs</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.item</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.message</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.name</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.pool</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++    </simplelist>
++  </refsect1>
++</refentry>
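The endpoint naming rule from kdbus.endpoint above (the name must start with the creator's numeric effective UID and a dash) can be checked in user space before calling KDBUS_CMD_ENDPOINT_MAKE. The sketch below is illustrative only: the function name `endpoint_name_has_uid_prefix` is hypothetical, requiring at least one character after the dash is an assumption, and the kernel remains the authority on which names it accepts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if 'name' begins with the decimal form of 'uid' followed by
 * a dash and at least one further character (the last part is an
 * assumption made for this sketch).
 */
bool endpoint_name_has_uid_prefix(const char *name, unsigned int uid)
{
    char prefix[32];
    int len = snprintf(prefix, sizeof(prefix), "%u-", uid);

    /* Defensive check; a 32-byte buffer is ample for any unsigned int. */
    if (len < 0 || (size_t)len >= sizeof(prefix))
        return false;

    /* The name must start with exactly "<uid>-". */
    if (strncmp(name, prefix, (size_t)len) != 0)
        return false;

    /* Reject a bare prefix with nothing after the dash. */
    return name[len] != '\0';
}
```

With the UID 1047 from the example in the text, `1047-my-endpoint` passes the check, while `1024-my-endpoint` and `my-endpoint` do not.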
+diff --git a/Documentation/kdbus/kdbus.fs.xml b/Documentation/kdbus/kdbus.fs.xml
+new file mode 100644
+index 0000000..8c2a90e
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.fs.xml
+@@ -0,0 +1,124 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus_fs">
++
++  <refentryinfo>
++    <title>kdbus.fs</title>
++    <productname>kdbus.fs</productname>
++  </refentryinfo>
++
++  <refmeta>
++    <refentrytitle>kdbus.fs</refentrytitle>
++    <manvolnum>7</manvolnum>
++  </refmeta>
++
++  <refnamediv>
++    <refname>kdbus.fs</refname>
++    <refpurpose>kdbus file system</refpurpose>
++  </refnamediv>
++
++  <refsect1>
++    <title>File-system Layout</title>
++
++    <para>
++      The <emphasis>kdbusfs</emphasis> pseudo filesystem provides access to
++      kdbus entities, such as <emphasis>buses</emphasis> and
++      <emphasis>endpoints</emphasis>. Each time the filesystem is mounted,
++      a new, isolated kdbus instance is created, which is independent from the
++      other instances.
++    </para>
++    <para>
++      The system-wide standard mount point for <emphasis>kdbusfs</emphasis> is
++      <constant>/sys/fs/kdbus</constant>.
++    </para>
++
++    <para>
++      Buses are represented as directories in the file system layout, whereas
++      endpoints are exposed as files inside these directories. At the top-level,
++      a <emphasis>control</emphasis> node is present, which can be opened to
++      create new buses via the <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl.
++      Each <emphasis>bus</emphasis> shows a default endpoint called
++      <varname>bus</varname>, which can be opened to either create a connection
++      with the <constant>KDBUS_CMD_HELLO</constant> ioctl, or to create new
++      custom endpoints for the bus with
++      <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>. See
++      <citerefentry>
++        <refentrytitle>kdbus.bus</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>,
++      <citerefentry>
++        <refentrytitle>kdbus.connection</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry> and
++      <citerefentry>
++        <refentrytitle>kdbus.endpoint</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for more details.
++    </para>
++
++    <para>The following is an example layout of the
++    <emphasis>kdbusfs</emphasis> filesystem:</para>
++
++<programlisting>
++        /sys/fs/kdbus/                          ; mount-point
++        |-- 0-system                            ; bus directory
++        |   |-- bus                             ; default endpoint
++        |   `-- 1017-custom                     ; custom endpoint
++        |-- 1000-user                           ; bus directory
++        |   |-- bus                             ; default endpoint
++        |   |-- 1000-service-A                  ; custom endpoint
++        |   `-- 1000-service-B                  ; custom endpoint
++        `-- control                             ; control file
++</programlisting>
++  </refsect1>
++
++  <refsect1>
++    <title>Mounting instances</title>
++    <para>
++      In order to get a new and separate kdbus environment, a new instance
++      of <emphasis>kdbusfs</emphasis> can be mounted like this:
++    </para>
++<programlisting>
++  # mount -t kdbusfs kdbusfs /tmp/new_kdbus/
++</programlisting>
++  </refsect1>
++
++  <refsect1>
++    <title>See Also</title>
++    <simplelist type="inline">
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.bus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.connection</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.endpoint</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>mount</refentrytitle>
++          <manvolnum>8</manvolnum>
++        </citerefentry>
++      </member>
++    </simplelist>
++  </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.item.xml b/Documentation/kdbus/kdbus.item.xml
+new file mode 100644
+index 0000000..ee09dfa
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.item.xml
+@@ -0,0 +1,839 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus">
++
++  <refentryinfo>
++    <title>kdbus.item</title>
++    <productname>kdbus item</productname>
++  </refentryinfo>
++
++  <refmeta>
++    <refentrytitle>kdbus.item</refentrytitle>
++    <manvolnum>7</manvolnum>
++  </refmeta>
++
++  <refnamediv>
++    <refname>kdbus.item</refname>
++    <refpurpose>kdbus item structure, layout and usage</refpurpose>
++  </refnamediv>
++
++  <refsect1>
++    <title>Description</title>
++
++    <para>
++      To flexibly augment transport structures, data blobs of type
++      <type>struct kdbus_item</type> can be attached to the structs passed
++      into the ioctls. Some ioctls make items of certain types mandatory,
++      others are optional. Items that are unsupported by ioctls they are
++      attached to will cause the ioctl to fail with <varname>errno</varname>
++      set to <constant>EINVAL</constant>.
++      Items are also used for information stored in a connection's
++      <emphasis>pool</emphasis>, such as received messages, name lists or
++      requested connection or bus owner information. Depending on the type of
++      an item, its total size is either fixed or variable.
++    </para>
++
++    <refsect2>
++      <title>Chaining items</title>
++      <para>
++        Whenever items are used as part of the kdbus kernel API, they are
++        embedded in structs that themselves include a size field
++        containing the overall size of the structure.
++        This allows multiple items to be chained up, and lets an item
++        iterator (see below) detect the end of an item chain.
++      </para>
++    </refsect2>
++
++    <refsect2>
++      <title>Alignment</title>
++      <para>
++        The kernel expects all items to be aligned to 8-byte boundaries.
++        Unaligned items will cause the ioctl they are used with to fail
++        with <varname>errno</varname> set to <constant>EINVAL</constant>.
++        An item that has an unaligned size itself hence needs to be padded
++        if it is followed by another item.
++      </para>
++    </refsect2>
++
++    <refsect2>
++      <title>Iterating items</title>
++      <para>
++        A simple iterator walks the items until it reaches the end of the
++        embedding structure, as given by that structure's overall size.
++        An example implementation is shown below.
++      </para>
++
++      <programlisting><![CDATA[
++#define KDBUS_ALIGN8(val) (((val) + 7) & ~7)
++
++#define KDBUS_ITEM_NEXT(item) \
++    (typeof(item))((uint8_t *)(item) + KDBUS_ALIGN8((item)->size))
++
++#define KDBUS_ITEM_FOREACH(item, head, first)                      \
++    for ((item) = (head)->first;                                   \
++         ((uint8_t *)(item) < (uint8_t *)(head) + (head)->size) && \
++          ((uint8_t *)(item) >= (uint8_t *)(head));                \
++         (item) = KDBUS_ITEM_NEXT(item))
++      ]]></programlisting>
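++
++      <para>
++        As a minimal sketch (not part of the kdbus API), the bounds checks
++        that such an iterator depends on can also be written as a plain
++        function. The names demo_count_items, DEMO_ALIGN8 and
++        DEMO_ITEM_HEADER_SIZE are hypothetical, and the buffer stands in
++        for the item area of an embedding structure. The size field is
++        read with memcpy so that the sketch makes no assumption about the
++        alignment of the buffer itself.
++      </para>
++      <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only. */
#define DEMO_ITEM_HEADER_SIZE 16u  /* 64-bit size field + 64-bit type field */
#define DEMO_ALIGN8(val) (((val) + 7) & ~(uint64_t)7)

/*
 * Count the items in buf[0..len). Each item starts with a 64-bit size
 * field that covers its header and payload, but not the padding that
 * follows it.
 */
size_t demo_count_items(const uint8_t *buf, uint64_t len)
{
    uint64_t off = 0;
    size_t n = 0;

    assert(buf != NULL);

    while (len - off >= DEMO_ITEM_HEADER_SIZE) {
        uint64_t size;

        memcpy(&size, buf + off, sizeof(size));

        /* Stop on an item that is too small or runs past the buffer. */
        if (size < DEMO_ITEM_HEADER_SIZE || size > len - off)
            break;

        n++;

        /* The last item in a chain need not be followed by padding. */
        if (DEMO_ALIGN8(size) > len - off)
            break;
        off += DEMO_ALIGN8(size);
    }

    return n;
}
```
++      ]]></programlisting>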
++    </refsect2>
++  </refsect1>
++
++  <refsect1>
++    <title>Item layout</title>
++    <para>
++      A <type>struct kdbus_item</type> consists of a
++      <varname>size</varname> field, describing its overall size, and a
++      <varname>type</varname> field, both 64 bit wide. They are followed by
++      a union to store information that is specific to the item's type.
++      The struct layout is shown below.
++    </para>
++
++    <programlisting>
++struct kdbus_item {
++  __u64 size;
++  __u64 type;
++  /* item payload - see below */
++  union {
++    __u8 data[0];
++    __u32 data32[0];
++    __u64 data64[0];
++    char str[0];
++
++    __u64 id;
++    struct kdbus_vec vec;
++    struct kdbus_creds creds;
++    struct kdbus_pids pids;
++    struct kdbus_audit audit;
++    struct kdbus_caps caps;
++    struct kdbus_timestamp timestamp;
++    struct kdbus_name name;
++    struct kdbus_bloom_parameter bloom_parameter;
++    struct kdbus_bloom_filter bloom_filter;
++    struct kdbus_memfd memfd;
++    int fds[0];
++    struct kdbus_notify_name_change name_change;
++    struct kdbus_notify_id_change id_change;
++    struct kdbus_policy_access policy_access;
++  };
++};
++    </programlisting>
++
++    <para>
++      <type>struct kdbus_item</type> should never be used to allocate
++      an item instance, as its size may grow in future releases of the API.
++      Instead, it should be manually assembled by storing the
++      <varname>size</varname>, <varname>type</varname> and payload to a
++      struct of its own.
++    </para>
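++
++    <para>
++      The manual assembly described above could look roughly like the
++      following sketch for an item that carries a short string. Everything
++      prefixed with demo_ or DEMO_ is hypothetical and not part of the
++      kdbus API, the 32-byte string capacity is an arbitrary choice, and
++      the item type is passed in by the caller rather than taken from the
++      kdbus header.
++    </para>
++    <programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only. */
#define DEMO_ITEM_HEADER_SIZE 16u  /* 64-bit size field + 64-bit type field */

/* Distance from the start of one item to the next item in a chain. */
#define DEMO_ALIGN8(val) (((val) + 7) & ~(uint64_t)7)

/* A struct of its own, sized for this one kind of payload. */
struct demo_str_item {
    uint64_t size;
    uint64_t type;
    char str[32];
};

/*
 * Store size, type and payload in *item. Returns the unpadded item
 * size, or 0 if the string does not fit.
 */
uint64_t demo_make_str_item(struct demo_str_item *item, uint64_t type,
                            const char *s)
{
    size_t len;

    assert(item != NULL && s != NULL);

    len = strlen(s) + 1;  /* the payload includes the terminating NUL */
    if (len > sizeof(item->str))
        return 0;

    memset(item, 0, sizeof(*item));  /* also zeroes any padding bytes */
    item->size = DEMO_ITEM_HEADER_SIZE + len;
    item->type = type;
    memcpy(item->str, s, len);

    return item->size;
}
```
++    ]]></programlisting>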
++  </refsect1>
++
++  <refsect1>
++    <title>Item types</title>
++
++    <refsect2>
++      <title>Negotiation item</title>
++      <variablelist>
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++          <listitem><para>
++            When this item is attached to any ioctl, programs can
++            <emphasis>probe</emphasis> the kernel for known item types.
++            The item carries an array of <type>uint64_t</type> values in
++            <varname>item.data64</varname>, each set to an item type to
++            probe. The kernel will reset each member of this array that is
++            not recognized as a valid item type to <constant>0</constant>.
++            This way, users can negotiate kernel features at start-up to
++            keep newer userspace compatible with older kernels. This item
++            is never attached by the kernel in response to any command.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++
++    <refsect2>
++      <title>Command specific items</title>
++      <variablelist>
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_PAYLOAD_VEC</constant></term>
++          <term><constant>KDBUS_ITEM_PAYLOAD_OFF</constant></term>
++          <listitem><para>
++            Messages are directly copied by the sending process into the
++            receiver's
++            <citerefentry>
++              <refentrytitle>kdbus.pool</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>.
++            This way, two peers can exchange data by effectively doing a
++            single-copy from one process to another; the kernel will not buffer
++            the data anywhere else. <constant>KDBUS_ITEM_PAYLOAD_VEC</constant>
++            is used when <emphasis>sending</emphasis> a message. The item
++            references a memory address where the payload data can be found.
++            <constant>KDBUS_ITEM_PAYLOAD_OFF</constant> is used when messages
++            are <emphasis>received</emphasis>, and the
++            <constant>offset</constant> value describes the offset inside the
++            receiving connection's
++            <citerefentry>
++              <refentrytitle>kdbus.pool</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            where the message payload can be found. See
++            <citerefentry>
++              <refentrytitle>kdbus.message</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information on passing of payload data along with a
++            message.
++            <programlisting>
++struct kdbus_vec {
++  __u64 size;
++  union {
++    __u64 address;
++    __u64 offset;
++  };
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant></term>
++          <listitem><para>
++            Transports a file descriptor of a <emphasis>memfd</emphasis> in
++            <type>struct kdbus_memfd</type> in <varname>item.memfd</varname>.
++            The <varname>size</varname> field has to match the actual size of
++            the memfd that was specified when it was created. The
++            <varname>start</varname> parameter denotes the offset inside the
++            memfd at which the referenced payload starts. See
++            <citerefentry>
++              <refentrytitle>kdbus.message</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information on passing of payload data along with a
++            message.
++            <programlisting>
++struct kdbus_memfd {
++  __u64 start;
++  __u64 size;
++  int fd;
++  __u32 __pad;
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_FDS</constant></term>
++          <listitem><para>
++            Contains an array of <emphasis>file descriptors</emphasis>.
++            When used with <constant>KDBUS_CMD_SEND</constant>, the values of
++            this array must be filled with valid file descriptor numbers.
++            When received as an item attached to a message, the array will
++            contain the numbers of the installed file descriptors, or
++            <constant>-1</constant> in case an error occurred.
++            In either case, the number of entries in the array is derived from
++            the item's total size. See
++            <citerefentry>
++              <refentrytitle>kdbus.message</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++
++    <refsect2>
++      <title>Items specific to some commands</title>
++      <variablelist>
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_CANCEL_FD</constant></term>
++          <listitem><para>
++            Transports a file descriptor that can be used to cancel a
++            synchronous <constant>KDBUS_CMD_SEND</constant> operation by
++            writing to it. The file descriptor is stored in
++            <varname>item.fds[0]</varname>. The item may only contain one
++            file descriptor. See
++            <citerefentry>
++              <refentrytitle>kdbus.message</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information on this item and how to use it.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_BLOOM_PARAMETER</constant></term>
++          <listitem><para>
++            Contains a set of <emphasis>bloom parameters</emphasis> as
++            <type>struct kdbus_bloom_parameter</type> in
++            <varname>item.bloom_parameter</varname>.
++            The item is passed from userspace to kernel during the
++            <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl, and returned
++            verbatim when <constant>KDBUS_CMD_HELLO</constant> is called.
++            The kernel does not use the bloom parameters, but they need to
++            be known by each connection on the bus in order to define the
++            bloom filter hash details. See
++            <citerefentry>
++              <refentrytitle>kdbus.match</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information on matching and bloom filters.
++            <programlisting>
++struct kdbus_bloom_parameter {
++  __u64 size;
++  __u64 n_hash;
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_BLOOM_FILTER</constant></term>
++          <listitem><para>
++            Carries a <emphasis>bloom filter</emphasis> as
++            <type>struct kdbus_bloom_filter</type> in
++            <varname>item.bloom_filter</varname>. It is mandatory to send this
++            item attached to a <type>struct kdbus_msg</type>, in case the
++            message is a signal. This item is never transported from kernel to
++            userspace. See
++            <citerefentry>
++              <refentrytitle>kdbus.match</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information on matching and bloom filters.
++            <programlisting>
++struct kdbus_bloom_filter {
++  __u64 generation;
++  __u64 data[0];
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_BLOOM_MASK</constant></term>
++          <listitem><para>
++            Transports a <emphasis>bloom mask</emphasis> as a binary data blob
++            stored in <varname>item.data</varname>. This item is used to
++            describe a match in a connection's match database. See
++            <citerefentry>
++              <refentrytitle>kdbus.match</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information on matching and bloom filters.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_DST_NAME</constant></term>
++          <listitem><para>
++            Contains a <emphasis>well-known name</emphasis> to send a
++            message to, as null-terminated string in
++            <varname>item.str</varname>. This item is used with
++            <constant>KDBUS_CMD_SEND</constant>. See
++            <citerefentry>
++              <refentrytitle>kdbus.message</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information on how to send a message.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_MAKE_NAME</constant></term>
++          <listitem><para>
++            Contains a <emphasis>bus name</emphasis> or
++            <emphasis>endpoint name</emphasis>, stored as null-terminated
++            string in <varname>item.str</varname>. This item is sent from
++            userspace to kernel when buses or endpoints are created, and
++            returned back to userspace when the bus creator information is
++            queried. See
++            <citerefentry>
++              <refentrytitle>kdbus.bus</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            and
++            <citerefentry>
++              <refentrytitle>kdbus.endpoint</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant></term>
++          <term><constant>KDBUS_ITEM_ATTACH_FLAGS_RECV</constant></term>
++          <listitem><para>
++            Contains a set of <emphasis>attach flags</emphasis> at
++            <emphasis>send</emphasis> or <emphasis>receive</emphasis> time. See
++            <citerefentry>
++              <refentrytitle>kdbus</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>,
++            <citerefentry>
++              <refentrytitle>kdbus.bus</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry> and
++            <citerefentry>
++              <refentrytitle>kdbus.connection</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information on attach flags.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_ID</constant></term>
++          <listitem><para>
++            Transports the <emphasis>numerical ID</emphasis> of
++            a connection as a <type>uint64_t</type> value in
++            <varname>item.id</varname>.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_NAME</constant></term>
++          <listitem><para>
++            Transports a name associated with the
++            <emphasis>name registry</emphasis>, stored as null-terminated
++            string in a <type>struct kdbus_name</type> in
++            <varname>item.name</varname>. The <varname>flags</varname> field
++            contains the flags of the name. See
++            <citerefentry>
++              <refentrytitle>kdbus.name</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information on how to access the name registry of a bus.
++            <programlisting>
++struct kdbus_name {
++  __u64 flags;
++  char name[0];
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++
++    <refsect2>
++      <title>Items attached by the kernel as metadata</title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_TIMESTAMP</constant></term>
++          <listitem><para>
++            Contains both the <emphasis>monotonic</emphasis> and the
++            <emphasis>realtime</emphasis> timestamp, taken when the message
++            was processed on the kernel side.
++            Stored as <type>struct kdbus_timestamp</type> in
++            <varname>item.timestamp</varname>.
++            <programlisting>
++struct kdbus_timestamp {
++  __u64 seqnum;
++  __u64 monotonic_ns;
++  __u64 realtime_ns;
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_CREDS</constant></term>
++          <listitem><para>
++            Contains a set of <emphasis>user</emphasis> and
++            <emphasis>group</emphasis> information as 32-bit values, in the
++            usual four flavors: real, effective, saved and filesystem related.
++            Stored as <type>struct kdbus_creds</type> in
++            <varname>item.creds</varname>.
++            <programlisting>
++struct kdbus_creds {
++  __u32 uid;
++  __u32 euid;
++  __u32 suid;
++  __u32 fsuid;
++  __u32 gid;
++  __u32 egid;
++  __u32 sgid;
++  __u32 fsgid;
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_PIDS</constant></term>
++          <listitem><para>
++            Contains the <emphasis>PID</emphasis>, <emphasis>TID</emphasis>
++            and <emphasis>parent PID (PPID)</emphasis> of a remote peer.
++            Stored as <type>struct kdbus_pids</type> in
++            <varname>item.pids</varname>.
++            <programlisting>
++struct kdbus_pids {
++  __u64 pid;
++  __u64 tid;
++  __u64 ppid;
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_AUXGROUPS</constant></term>
++          <listitem><para>
++            Contains the <emphasis>auxiliary (supplementary) groups</emphasis>
++            a remote peer is a member of, stored as array of
++            <type>uint32_t</type> values in <varname>item.data32</varname>.
++            The array length can be determined by looking at the item's total
++            size, subtracting the size of the header and dividing the
++            remainder by <constant>sizeof(uint32_t)</constant>.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_OWNED_NAME</constant></term>
++          <listitem><para>
++            Contains a <emphasis>well-known name</emphasis> currently owned
++            by a connection. The name is stored as null-terminated string in
++            <varname>item.str</varname>. Its length can also be derived from
++            the item's total size.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_TID_COMM</constant> [*]</term>
++          <listitem><para>
++            Contains the <emphasis>comm</emphasis> string of a task's
++            <emphasis>TID</emphasis> (thread ID), stored as null-terminated
++            string in <varname>item.str</varname>. Its length can also be
++            derived from the item's total size. Receivers of this item should
++            not use its contents for any kind of security measures. See below.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_PID_COMM</constant> [*]</term>
++          <listitem><para>
++            Contains the <emphasis>comm</emphasis> string of a task's
++            <emphasis>PID</emphasis> (process ID), stored as null-terminated
++            string in <varname>item.str</varname>. Its length can also be
++            derived from the item's total size. Receivers of this item should
++            not use its contents for any kind of security measures. See below.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_EXE</constant> [*]</term>
++          <listitem><para>
++            Contains the <emphasis>path to the executable</emphasis> of a task,
++            stored as null-terminated string in <varname>item.str</varname>. Its
++            length can also be derived from the item's total size. Receivers of
++            this item should not use its contents for any kind of security
++            measures. See below.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_CMDLINE</constant> [*]</term>
++          <listitem><para>
++            Contains the <emphasis>command line arguments</emphasis> of a
++            task, stored as an <emphasis>array</emphasis> of null-terminated
++            strings in <varname>item.str</varname>. The total length of all
++            strings in the array can be derived from the item's total size.
++            Receivers of this item should not use its contents for any kind
++            of security measures. See below.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_CGROUP</constant></term>
++          <listitem><para>
++            Contains the <emphasis>cgroup path</emphasis> of a task, stored
++            as null-terminated string in <varname>item.str</varname>. Its
++            length can also be derived from the item's total size.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_CAPS</constant></term>
++          <listitem><para>
++            Contains sets of <emphasis>capabilities</emphasis>, stored as
++            <type>struct kdbus_caps</type> in <varname>item.caps</varname>.
++            As the item size may increase in the future, programs should be
++            written in a way that takes
++            <varname>item.caps.last_cap</varname> into account, and derives
++            the number of sets and rows from the item size and the reported
++            number of valid capability bits.
++            <programlisting>
++struct kdbus_caps {
++  __u32 last_cap;
++  __u32 caps[0];
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_SECLABEL</constant></term>
++          <listitem><para>
++            Contains the <emphasis>LSM label</emphasis> of a task, stored as
++            null-terminated string in <varname>item.str</varname>. Its length
++            can also be derived from the item's total size.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_AUDIT</constant></term>
++          <listitem><para>
++            Contains the audit <emphasis>sessionid</emphasis> and
++            <emphasis>loginuid</emphasis> of a task, stored as
++            <type>struct kdbus_audit</type> in
++            <varname>item.audit</varname>.
++            <programlisting>
++struct kdbus_audit {
++  __u32 sessionid;
++  __u32 loginuid;
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_CONN_DESCRIPTION</constant></term>
++          <listitem><para>
++            Contains the <emphasis>connection description</emphasis>, as set
++            by <constant>KDBUS_CMD_HELLO</constant> or
++            <constant>KDBUS_CMD_CONN_UPDATE</constant>, stored as
++            null-terminated string in <varname>item.str</varname>. Its length
++            can also be derived from the item's total size.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++
++      <para>
++        All metadata is automatically translated into the
++        <emphasis>namespaces</emphasis> of the task that receives them. See
++        <citerefentry>
++          <refentrytitle>kdbus.message</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        for more information.
++      </para>
++
++      <para>
++        [*] Note that the content stored in metadata items of type
++        <constant>KDBUS_ITEM_TID_COMM</constant>,
++        <constant>KDBUS_ITEM_PID_COMM</constant>,
++        <constant>KDBUS_ITEM_EXE</constant> and
++        <constant>KDBUS_ITEM_CMDLINE</constant>
++        can easily be tampered with by the sending tasks. Therefore, they should
++        <emphasis>not</emphasis> be used for any sort of security-relevant
++        assumptions. The only reason they are transmitted is to let
++        receivers know about details that were set when metadata was
++        collected, even though the task they were collected from is not
++        active any longer when the items are received.
++      </para>
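++
++      <para>
++        Several of the items above derive the length of their payload from
++        the item's total size. As a rough sketch, the array length
++        calculation described for
++        <constant>KDBUS_ITEM_AUXGROUPS</constant> and a walk over the
++        string array of <constant>KDBUS_ITEM_CMDLINE</constant> might be
++        written as follows. The function names and
++        DEMO_ITEM_HEADER_SIZE are hypothetical, and the sketch assumes
++        that every string in the payload is null-terminated.
++      </para>
++      <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only. */
#define DEMO_ITEM_HEADER_SIZE 16u  /* 64-bit size field + 64-bit type field */

/* Number of uint32_t entries in an item whose total size is item_size. */
size_t demo_auxgroups_count(uint64_t item_size)
{
    assert(item_size >= DEMO_ITEM_HEADER_SIZE);

    return (size_t)((item_size - DEMO_ITEM_HEADER_SIZE) / sizeof(uint32_t));
}

/*
 * Count the null-terminated strings stored back to back in
 * payload[0..len), where len is the item size minus the header size.
 */
size_t demo_count_strings(const char *payload, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        if (payload[i] == '\0')
            n++;
    }

    return n;
}
```
++      ]]></programlisting>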
++    </refsect2>
++
++    <refsect2>
++      <title>Items used for policy entries, matches and notifications</title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
++          <listitem><para>
++            This item describes a <emphasis>policy access</emphasis> entry to
++            access the policy database of a
++            <citerefentry>
++              <refentrytitle>kdbus.bus</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry> or
++            <citerefentry>
++              <refentrytitle>kdbus.endpoint</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>.
++            Please refer to
++            <citerefentry>
++              <refentrytitle>kdbus.policy</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information on the policy database and how to access it.
++            <programlisting>
++struct kdbus_policy_access {
++  __u64 type;
++  __u64 access;
++  __u64 id;
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_ID_ADD</constant></term>
++          <term><constant>KDBUS_ITEM_ID_REMOVE</constant></term>
++          <listitem><para>
++            This item is sent as attachment to a
++            <emphasis>kernel notification</emphasis> and indicates that a
++            new connection was created on the bus, or that a connection was
++            disconnected, respectively. It stores a
++            <type>struct kdbus_notify_id_change</type> in
++            <varname>item.id_change</varname>.
++            The <varname>id</varname> field contains the numeric ID of the
++            connection that was added or removed, and <varname>flags</varname>
++            is set to the connection flags, as passed by
++            <constant>KDBUS_CMD_HELLO</constant>. See
++            <citerefentry>
++              <refentrytitle>kdbus.match</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            and
++            <citerefentry>
++              <refentrytitle>kdbus.message</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information on matches and notification messages.
++            <programlisting>
++struct kdbus_notify_id_change {
++  __u64 id;
++  __u64 flags;
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_NAME_ADD</constant></term>
++          <term><constant>KDBUS_ITEM_NAME_REMOVE</constant></term>
++          <term><constant>KDBUS_ITEM_NAME_CHANGE</constant></term>
++          <listitem><para>
++            This item is sent as attachment to a
++            <emphasis>kernel notification</emphasis> and indicates that a
++            <emphasis>well-known name</emphasis> appeared, disappeared or
++            was transferred to another owner on the bus. It stores a
++            <type>struct kdbus_notify_name_change</type> in
++            <varname>item.name_change</varname>.
++            <varname>old_id</varname> describes the former owner of the name
++            and is set to <constant>0</constant> values in case of
++            <constant>KDBUS_ITEM_NAME_ADD</constant>.
++            <varname>new_id</varname> describes the new owner of the name and
++            is set to <constant>0</constant> values in case of
++            <constant>KDBUS_ITEM_NAME_REMOVE</constant>.
++            The <varname>name</varname> field contains the well-known name the
++            notification is about, as null-terminated string. See
++            <citerefentry>
++              <refentrytitle>kdbus.match</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            and
++            <citerefentry>
++              <refentrytitle>kdbus.message</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information on matches and notification messages.
++            <programlisting>
++struct kdbus_notify_name_change {
++  struct kdbus_notify_id_change old_id;
++  struct kdbus_notify_id_change new_id;
++  char name[0];
++};
++            </programlisting>
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_REPLY_TIMEOUT</constant></term>
++          <listitem><para>
++            This item is sent as an attachment to a
++            <emphasis>kernel notification</emphasis>. It informs the receiver
++            that an expected reply to a message was not received in time.
++            The remote peer ID and the message cookie are stored in the message
++            header. See
++            <citerefentry>
++              <refentrytitle>kdbus.message</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information about messages, timeouts and notifications.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ITEM_REPLY_DEAD</constant></term>
++          <listitem><para>
++            This item is sent as an attachment to a
++            <emphasis>kernel notification</emphasis>. It informs the receiver
++            that the remote connection from which a reply was expected
++            disconnected before sending that reply. The remote peer ID and the
++            message cookie are stored in the message header. See
++            <citerefentry>
++              <refentrytitle>kdbus.message</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for more information about messages, timeouts and notifications.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++  </refsect1>
++
++  <refsect1>
++    <title>See Also</title>
++    <simplelist type="inline">
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.bus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.connection</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.endpoint</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.fs</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.message</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.name</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.pool</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>memfd_create</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++    </simplelist>
++  </refsect1>
++
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.match.xml b/Documentation/kdbus/kdbus.match.xml
+new file mode 100644
+index 0000000..ae38e04
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.match.xml
+@@ -0,0 +1,555 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.match">
++
++  <refentryinfo>
++    <title>kdbus.match</title>
++    <productname>kdbus.match</productname>
++  </refentryinfo>
++
++  <refmeta>
++    <refentrytitle>kdbus.match</refentrytitle>
++    <manvolnum>7</manvolnum>
++  </refmeta>
++
++  <refnamediv>
++    <refname>kdbus.match</refname>
++    <refpurpose>kdbus match</refpurpose>
++  </refnamediv>
++
++  <refsect1>
++    <title>Description</title>
++
++    <para>
++      kdbus connections can install matches in order to subscribe to signal
++      messages sent on the bus. Such signal messages can be either directed
++      to a single connection (by setting a specific connection ID in
++      <varname>struct kdbus_msg.dst_id</varname> or by sending it to a
++      well-known name), or to potentially <emphasis>all</emphasis> currently
++      active connections on the bus (by setting
++      <varname>struct kdbus_msg.dst_id</varname> to
++      <constant>KDBUS_DST_ID_BROADCAST</constant>).
++      A signal message always has the <constant>KDBUS_MSG_SIGNAL</constant>
++      bit set in the <varname>flags</varname> bitfield.
++      Also, signal messages can originate from either the kernel (called
++      <emphasis>notifications</emphasis>), or from other bus connections.
++      In either case, a bus connection needs to have a suitable
++      <emphasis>match</emphasis> installed in order to receive any signal
++      message. Without any rules installed in the connection, no signal message
++      will be received.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Matches for signal messages from other connections</title>
++    <para>
++      Matches for messages from other connections (not kernel notifications)
++      are implemented as bloom filters (see below). The sender adds certain
++      properties of the message as elements to a bloom filter bit field, and
++      sends that along with the signal message.
++
++      The receiving connection adds the message properties it is interested in
++      as elements to a bloom mask bit field, and uploads the mask as a match rule,
++      possibly along with some other rules to further limit the match.
++
++      The kernel will match the signal message's bloom filter against the
++      connection's bloom mask (simply by &amp;-ing it), and will decide whether
++      the message should be delivered to a connection.
++    </para>
++    <para>
++      The kernel has no notion of any specific properties of the signal message;
++      all it sees are the bit fields of the bloom filter and the mask to match
++      against. The use of bloom filters allows simple and efficient matching,
++      without exposing any message properties or internals to the kernel side.
++      Clients need to deal with the fact that they might receive signal messages
++      which they did not subscribe to, as the bloom filter might allow
++      false-positives to pass the filter.
++
++      To allow the future extension of the set of elements in the bloom filter,
++      the filter specifies a <emphasis>generation</emphasis> number. A later
++      generation must always contain all elements of the set of the previous
++      generation, but can add new elements to the set. The match rules mask can
++      carry an array with all previous generations of masks individually stored.
++      When the filter and mask are matched by the kernel, the mask with the
++      closest matching generation is selected as the index into the mask array.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Bloom filters</title>
++    <para>
++      Bloom filters allow checking whether a given word is present in a
++      dictionary. This allows a connection to set up a mask for the
++      information it is interested in; it will then be delivered signal
++      messages that have a matching filter.
++
++      For general information, see
++      <ulink url="https://en.wikipedia.org/wiki/Bloom_filter">the Wikipedia
++      article on bloom filters</ulink>.
++    </para>
++    <para>
++      The size of the bloom filter is defined per bus when it is created, in
++      <varname>kdbus_bloom_parameter.size</varname>. All bloom filters attached
++      to signal messages on the bus must match this size, and all bloom filter
++      matches uploaded by connections must also match the size, or a multiple
++      thereof (see below).
++
++      The calculation of the mask has to be done in userspace applications. The
++      kernel just checks the bitmasks to decide whether or not to let the
++      message pass. All bits in the mask must match the filter in a bit-wise
++      <emphasis>AND</emphasis> logic, but the mask may have more bits set than
++      the filter. Consequently, false positive matches are expected to happen,
++      and programs must deal with that fact by checking the contents of the
++      payload again at receive time.
++    </para>
++    <para>
++      Masks are entities that are always passed to the kernel as part of a
++      match (with an item of type <constant>KDBUS_ITEM_BLOOM_MASK</constant>),
++      and filters can be attached to signals, with an item of type
++      <constant>KDBUS_ITEM_BLOOM_FILTER</constant>. For a filter to match, all
++      its bits have to be set in the match mask as well.
++    </para>
++    <para>
++      For example, consider a bus that has a bloom size of 8 bytes, and the
++      following mask/filter combinations:
++    </para>
++    <programlisting><![CDATA[
++          filter  0x0101010101010101
++          mask    0x0101010101010101
++                  -> matches
++
++          filter  0x0303030303030303
++          mask    0x0101010101010101
++                  -> doesn't match
++
++          filter  0x0101010101010101
++          mask    0x0303030303030303
++                  -> matches
++    ]]></programlisting>
++
++    <para>
++      Hence, in order to catch all messages, a mask filled with
++      <constant>0xff</constant> bytes can be installed as a wildcard match rule.
++    </para>
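The comparison illustrated by the three examples above can be sketched in C. This is a minimal illustration of the rule as this document states it (a filter passes when every bit it sets is also set in the mask); it is not taken from the kernel sources, and the function name bloom_filter_passes is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true when every bit set in the filter is
 * also set in the mask, which is the rule the examples in the text follow.
 * Both arrays hold n_words 64-bit words (bloom size / 8).
 */
static bool bloom_filter_passes(const uint64_t *filter, const uint64_t *mask,
                                size_t n_words)
{
    for (size_t i = 0; i < n_words; i++) {
        /* A filter bit that is missing from the mask rejects the message. */
        if ((filter[i] & mask[i]) != filter[i])
            return false;
    }
    return true;
}
```

With an 8-byte bloom size, n_words is 1, and a mask word of all ones lets every filter pass, which corresponds to the wildcard rule mentioned above.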
++
++    <refsect2>
++      <title>Generations</title>
++
++      <para>
++        Uploaded matches may contain multiple masks, which have to be as large
++        as the bloom filter size defined by the bus. Each block of a mask is
++        called a <emphasis>generation</emphasis>, starting at index 0.
++
++        At match time, when a signal is about to be delivered, a bloom mask
++        generation is passed, which denotes which of the bloom masks the filter
++        should be matched against. This allows programs to provide backward
++        compatible masks at upload time, while older clients can still match
++        against older versions of filters.
++      </para>
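One possible way to picture generation selection is sketched below. It assumes that the uploaded masks are stored back to back, each bloom_size bytes long, and that "closest matching generation" means falling back to the newest uploaded mask when the requested generation is not available. Both points are assumptions made for illustration, and select_generation_mask is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: picks the mask block for a given generation out of
 * a contiguous array of masks. Returns NULL if no complete mask is present.
 */
static const uint8_t *select_generation_mask(const uint8_t *masks,
                                             size_t masks_len,
                                             size_t bloom_size,
                                             uint64_t generation)
{
    if (bloom_size == 0 || masks_len < bloom_size)
        return NULL;

    size_t n_generations = masks_len / bloom_size;

    /* Assumption: clamp to the newest generation that was uploaded. */
    size_t index = generation < n_generations ? (size_t)generation
                                              : n_generations - 1;

    return masks + index * bloom_size;
}
```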
++    </refsect2>
++  </refsect1>
++
++  <refsect1>
++    <title>Matches for kernel notifications</title>
++    <para>
++      To receive kernel generated notifications (see
++      <citerefentry>
++        <refentrytitle>kdbus.message</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>),
++      a connection must install match rules that are different from
++      the bloom filter matches described in the section above. They can be
++      filtered by the connection ID that caused the notification to be sent, by
++      one of the names it currently owns, or by the type of the notification
++      (ID/name add/remove/change).
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Adding a match</title>
++    <para>
++      To add a match, the <constant>KDBUS_CMD_MATCH_ADD</constant> ioctl is
++      used, which takes a <type>struct kdbus_cmd_match</type> as an argument,
++      described below.
++
++      Note that each of the items attached to this command will internally
++      create one match <emphasis>rule</emphasis>, and the collection of them,
++      which is submitted as one block via the ioctl, is called a
++      <emphasis>match</emphasis>. To allow a message to pass, all rules of a
++      match have to be satisfied. Hence, adding more items to the command will
++      only narrow the possibility of a match to effectively let the message
++      pass, and will decrease the chance that the connection's process will be
++      woken up needlessly.
++
++      Multiple matches can be installed per connection. As long as one of them
++      has a set of rules that allows the message to pass, the message is
++      delivered.
++    </para>
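The relationship between rules and matches described above (all rules of one match must hold, and any one match is enough) can be sketched as follows. The types and names here (rule, match, msg_props and the two rule kinds) are hypothetical simplifications for illustration and do not mirror the kernel's data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified rule kinds for illustration only. */
enum rule_kind { RULE_SENDER_ID, RULE_MSG_TYPE };

struct rule {
    enum rule_kind kind;
    uint64_t value;
};

/* A match is a set of rules submitted together. */
struct match {
    const struct rule *rules;
    size_t n_rules;
};

struct msg_props {
    uint64_t sender_id;
    uint64_t msg_type;
};

static bool rule_satisfied(const struct rule *r, const struct msg_props *m)
{
    switch (r->kind) {
    case RULE_SENDER_ID:
        return r->value == m->sender_id;
    case RULE_MSG_TYPE:
        return r->value == m->msg_type;
    }
    return false;
}

/* A match passes only if every one of its rules is satisfied. */
static bool match_passes(const struct match *mt, const struct msg_props *m)
{
    for (size_t i = 0; i < mt->n_rules; i++) {
        if (!rule_satisfied(&mt->rules[i], m))
            return false;
    }
    return true;
}

/* A message is delivered if at least one installed match passes. */
static bool any_match_passes(const struct match *matches, size_t n_matches,
                             const struct msg_props *m)
{
    for (size_t i = 0; i < n_matches; i++) {
        if (match_passes(&matches[i], m))
            return true;
    }
    return false;
}
```

Adding a rule to a match can only make match_passes stricter, while installing another match can only make any_match_passes more permissive.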
++
++    <programlisting>
++struct kdbus_cmd_match {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  __u64 cookie;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>Flags to control the behavior of the ioctl.</para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_MATCH_REPLACE</constant></term>
++              <listitem>
++                <para>Replace existing matches that use the same cookie</para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++              <listitem>
++                <para>
++                  Requests a set of valid flags for this ioctl. When this bit is
++                  set, no action is taken; the ioctl will return
++                  <errorcode>0</errorcode>, and the <varname>flags</varname>
++                  field will have all bits set that are valid for this command.
++                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++                  cleared by the operation.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>cookie</varname></term>
++        <listitem><para>
++          A cookie which identifies the match, so it can be referred to when
++          removing it.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++        <para>
++          Items to define the actual rules of the matches. The following item
++          types are expected. Each item will create one new match rule.
++        </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_BLOOM_MASK</constant></term>
++              <listitem>
++                <para>
++                  An item that carries the bloom filter mask to match against
++                  in its data field. The payload size must match the bloom
++                  filter size that was specified when the bus was created.
++                  See the "Bloom filters" section above for more information on
++                  bloom filters.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NAME</constant></term>
++              <listitem>
++                <para>
++                  When used as part of kernel notifications, this item specifies
++                  a name that is acquired, lost or that changed its owner (see
++                  below). When used as part of a match for user-generated signal
++                  messages, it specifies a name that the sending connection must
++                  own at the time of sending the signal.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_ID</constant></term>
++              <listitem>
++                <para>
++                  Specify a sender connection's ID that will match this rule.
++                  For kernel notifications, this specifies the ID of a
++                  connection that was added to or removed from the bus.
++                  For user-generated signals, it specifies the ID of the
++                  connection that sent the signal message.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NAME_ADD</constant></term>
++              <term><constant>KDBUS_ITEM_NAME_REMOVE</constant></term>
++              <term><constant>KDBUS_ITEM_NAME_CHANGE</constant></term>
++              <listitem>
++                <para>
++                  These items request delivery of kernel notifications that
++                  describe a name acquisition, loss, or change. The details
++                  are stored in the item's
++                  <varname>kdbus_notify_name_change</varname> member.
++                  All information specified must be matched in order to make
++                  the message pass. Use
++                  <constant>KDBUS_MATCH_ID_ANY</constant> to
++                  match against any unique connection ID.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_ID_ADD</constant></term>
++              <term><constant>KDBUS_ITEM_ID_REMOVE</constant></term>
++              <listitem>
++                <para>
++                  These items request delivery of kernel notifications that are
++                  generated when a connection is created or terminated.
++                  <type>struct kdbus_notify_id_change</type> is used to
++                  store the actual match information. This item can be used to
++                  monitor one particular connection ID, or, when the ID field
++                  is set to <constant>KDBUS_MATCH_ID_ANY</constant>,
++                  all of them.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++              <listitem><para>
++                With this item, programs can <emphasis>probe</emphasis> the
++                kernel for known item types. See
++                <citerefentry>
++                  <refentrytitle>kdbus.item</refentrytitle>
++                  <manvolnum>7</manvolnum>
++                </citerefentry>
++                for more details.
++              </para></listitem>
++            </varlistentry>
++          </variablelist>
++
++          <para>
++            Unrecognized items are rejected, and the ioctl will fail with
++            <varname>errno</varname> set to <constant>EINVAL</constant>.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++
++    <para>
++      Refer to
++      <citerefentry>
++        <refentrytitle>kdbus.message</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for more information on message types.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Removing a match</title>
++    <para>
++      Matches can be removed with the
++      <constant>KDBUS_CMD_MATCH_REMOVE</constant> ioctl, which takes
++      <type>struct kdbus_cmd_match</type> as argument, but the usage of its
++      fields differs slightly from that of
++      <constant>KDBUS_CMD_MATCH_ADD</constant>.
++    </para>
++
++    <programlisting>
++struct kdbus_cmd_match {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  __u64 cookie;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>cookie</varname></term>
++        <listitem><para>
++          The cookie of the match, as it was passed when the match was added.
++          All matches that have this cookie will be removed.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>
++          No flags are supported for this use case.
++          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++          valid flags. If set, the ioctl will fail with
++          <errorcode>-1</errorcode>, <varname>errno</varname> is set to
++          <constant>EPROTO</constant>, and the <varname>flags</varname> field
++          is set to <constant>0</constant>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            No items are supported for this use case, but
++            <constant>KDBUS_ITEM_NEGOTIATE</constant> is allowed nevertheless.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++  </refsect1>
++
++  <refsect1>
++    <title>Return value</title>
++    <para>
++      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++      on error, <errorcode>-1</errorcode> is returned, and
++      <varname>errno</varname> is set to indicate the error.
++      If the issued ioctl is illegal for the file descriptor used,
++      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++    </para>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_MATCH_ADD</constant> may fail with the following
++        errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            Illegal flags or items.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EDOM</constant></term>
++          <listitem><para>
++            Illegal bloom filter size.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EMFILE</constant></term>
++          <listitem><para>
++            Too many matches for this connection.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_MATCH_REMOVE</constant> may fail with the following
++        errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            Illegal flags.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EBADSLT</constant></term>
++          <listitem><para>
++            A match entry with the given cookie could not be found.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++  </refsect1>
++
++  <refsect1>
++    <title>See Also</title>
++    <simplelist type="inline">
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.bus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.match</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.fs</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.item</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.message</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.name</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.pool</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++    </simplelist>
++  </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.message.xml b/Documentation/kdbus/kdbus.message.xml
+new file mode 100644
+index 0000000..0115d9d
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.message.xml
+@@ -0,0 +1,1276 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.message">
++
++  <refentryinfo>
++    <title>kdbus.message</title>
++    <productname>kdbus.message</productname>
++  </refentryinfo>
++
++  <refmeta>
++    <refentrytitle>kdbus.message</refentrytitle>
++    <manvolnum>7</manvolnum>
++  </refmeta>
++
++  <refnamediv>
++    <refname>kdbus.message</refname>
++    <refpurpose>kdbus message</refpurpose>
++  </refnamediv>
++
++  <refsect1>
++    <title>Description</title>
++
++    <para>
++      A kdbus message is used to exchange information between two connections
++      on a bus, or to transport notifications from the kernel to one or many
++      connections. This document describes the layout of messages, how payload
++      is added to them and how they are sent and received.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Message layout</title>
++
++    <para>The layout of a message is shown below.</para>
++
++    <programlisting>
++  +-------------------------------------------------------------------------+
++  | Message                                                                 |
++  | +---------------------------------------------------------------------+ |
++  | | Header                                                              | |
++  | | size:          overall message size, including the data records     | |
++  | | destination:   connection ID of the receiver                        | |
++  | | source:        connection ID of the sender (set by kernel)          | |
++  | | payload_type:  "DBusDBus" textual identifier stored as uint64_t     | |
++  | +---------------------------------------------------------------------+ |
++  | +---------------------------------------------------------------------+ |
++  | | Data Record                                                         | |
++  | | size:  overall record size (without padding)                        | |
++  | | type:  type of data                                                 | |
++  | | data:  reference to data (address or file descriptor)               | |
++  | +---------------------------------------------------------------------+ |
++  | +---------------------------------------------------------------------+ |
++  | | padding bytes to the next 8 byte alignment                          | |
++  | +---------------------------------------------------------------------+ |
++  | +---------------------------------------------------------------------+ |
++  | | Data Record                                                         | |
++  | | size:  overall record size (without padding)                        | |
++  | | ...                                                                 | |
++  | +---------------------------------------------------------------------+ |
++  | +---------------------------------------------------------------------+ |
++  | | padding bytes to the next 8 byte alignment                          | |
++  | +---------------------------------------------------------------------+ |
++  | +---------------------------------------------------------------------+ |
++  | | Data Record                                                         | |
++  | | size:  overall record size                                          | |
++  | | ...                                                                 | |
++  | +---------------------------------------------------------------------+ |
++  |   ... further data records ...                                          |
++  +-------------------------------------------------------------------------+
++    </programlisting>
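The layout above implies that a reader steps from one data record to the next by rounding each record's size up to a multiple of 8. A minimal sketch of that walk follows. The record_header type is a hypothetical stand-in that only models the size and type fields shown in the diagram, and the handling of the final record (which may lack trailing padding) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the start of a data record. */
struct record_header {
    uint64_t size;  /* overall record size, without padding */
    uint64_t type;  /* type of data */
};

/* Round n up to the next multiple of 8. */
static uint64_t align8(uint64_t n)
{
    return (n + 7) & ~UINT64_C(7);
}

/*
 * Count the records in a buffer of len bytes.
 * Returns -1 if a record header is truncated or its size is implausible.
 */
static int count_records(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    while (off < len) {
        struct record_header hdr;

        if (len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + off, sizeof(hdr));

        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;
        count++;

        /* Assumption: the last record may omit its trailing padding. */
        uint64_t step = align8(hdr.size);
        if (step > len - off)
            break;
        off += (size_t)step;
    }
    return count;
}
```

For example, a record with size 17 occupies 24 bytes once padded, so the next record header starts at offset 24.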
++  </refsect1>
++
++  <refsect1>
++    <title>Message payload</title>
++
++    <para>
++      When connecting to the bus, receivers request a memory pool of a given
++      size, large enough to carry all backlog of data enqueued for the
++      connection. The pool is internally backed by a shared memory file which
++      can be <function>mmap()</function>ed by the receiver. See
++      <citerefentry>
++        <refentrytitle>kdbus.pool</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for more information.
++    </para>
++
++    <para>
++      Message payload must be described in items attached to a message when
++      it is sent. A receiver can access the payload by looking at the items
++      that are attached to a message in its pool. The following items are used.
++    </para>
++
++    <variablelist>
++      <varlistentry>
++        <term><constant>KDBUS_ITEM_PAYLOAD_VEC</constant></term>
++        <listitem>
++          <para>
++            This item references a piece of memory on the sender side which is
++            directly copied into the receiver's pool. This way, two peers can
++            exchange data by effectively doing a single-copy from one process
++            to another; the kernel will not buffer the data anywhere else.
++            This item is never found in a message received by a connection.
++          </para>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><constant>KDBUS_ITEM_PAYLOAD_OFF</constant></term>
++        <listitem>
++          <para>
++            This item is attached to messages on the receiving side and points
++            to a memory area inside the receiver's pool. The
++            <varname>offset</varname> variable in the item denotes the memory
++            location relative to the message itself.
++          </para>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant></term>
++        <listitem>
++          <para>
++            Messages can reference <emphasis>memfd</emphasis> files which
++            contain the data. memfd files are tmpfs-backed files that allow
++            sealing of the content of the file, which prevents all writable
++            access to the file content.
++          </para>
++          <para>
++            Only memfds that have
++            <constant>(F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_SEAL)
++            </constant>
++            set are accepted as payload data, which enforces reliable passing of
++            data. The receiver can assume that neither the sender nor anyone
++            else can alter the content after the message is sent. If those
++            seals are not set on the memfd, the ioctl will fail with
++            <errorcode>-1</errorcode>, and <varname>errno</varname> will be
++            set to <constant>ETXTBSY</constant>.
++          </para>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><constant>KDBUS_ITEM_FDS</constant></term>
++        <listitem>
++          <para>
++            Messages can transport regular file descriptors via
++            <constant>KDBUS_ITEM_FDS</constant>. This item carries an array
++            of <type>int</type> values in <varname>item.fd</varname>. The
++            maximum number of file descriptors in the item is
++            <constant>253</constant>, and only one item of this type is
++            accepted per message. All passed values must be valid file
++            descriptors; the open count of each file descriptor is increased
++            by installing it into the receiver's task. This item can only be
++            used for directed messages, not for broadcasts, and only to
++            remote peers that have opted in to receiving file descriptors
++            at connection time (<constant>KDBUS_HELLO_ACCEPT_FD</constant>).
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
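++
++    <para>
++      As a minimal sketch of the sealing requirement described for
++      <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant>, a sender might prepare
++      such a file with <function>memfd_create()</function> and the
++      <constant>F_ADD_SEALS</constant> command of
++      <function>fcntl()</function>. The helper names
++      <function>make_sealed_memfd()</function> and
++      <function>has_payload_seals()</function>, as well as the file name, are
++      assumptions made for illustration and are not part of kdbus. Attaching
++      the descriptor to a message is not shown.
++    </para>
++    <programlisting><![CDATA[
++#define _GNU_SOURCE
++#include <fcntl.h>
++#include <sys/mman.h>
++#include <sys/types.h>
++#include <unistd.h>
++
++/* The seals that the text above requires on a payload memfd. */
++#define PAYLOAD_SEALS \
++  (F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL)
++
++/*
++ * Hypothetical helper: copy len bytes into a new memfd and seal it.
++ * Returns the file descriptor, or -1 on error.
++ */
++static int make_sealed_memfd(const void *data, size_t len)
++{
++  int fd = memfd_create("kdbus-payload", MFD_CLOEXEC | MFD_ALLOW_SEALING);
++
++  if (fd < 0)
++    return -1;
++
++  if (write(fd, data, len) != (ssize_t)len ||
++      fcntl(fd, F_ADD_SEALS, PAYLOAD_SEALS) < 0) {
++    close(fd);
++    return -1;
++  }
++  return fd;
++}
++
++/* Returns 1 if fd carries all required seals, 0 otherwise. */
++static int has_payload_seals(int fd)
++{
++  int seals = fcntl(fd, F_GET_SEALS);
++
++  return seals >= 0 && (seals & PAYLOAD_SEALS) == PAYLOAD_SEALS;
++}
++]]></programlisting>
++    <para>
++      Checking the seals before sending lets a program detect locally the
++      condition under which the ioctl would otherwise be rejected.
++    </para>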
++
++    <para>
++      The sender must not make any assumptions about the form in which data is
++      received by the remote peer. The kernel is free to re-pack multiple
++      <constant>KDBUS_ITEM_PAYLOAD_VEC</constant> and
++      <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant> payloads. For instance, the
++      kernel may decide to merge multiple <constant>VECs</constant> into a
++      single <constant>VEC</constant>, inline <constant>MEMFD</constant>
++      payloads into memory, or merge all passed <constant>VECs</constant> into a
++      single <constant>MEMFD</constant>. However, the kernel preserves the order
++      of passed data. This means that the order of all <constant>VEC</constant>
++      and <constant>MEMFD</constant> items is not changed with respect to each
++      other. In other words: All passed <constant>VEC</constant> and
++      <constant>MEMFD</constant> data payloads are treated as a single stream
++      of data that may be received by the remote peer in a different set of
++      chunks than it was sent as.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Sending messages</title>
++
++    <para>
++      Messages are passed to the kernel with the
++      <constant>KDBUS_CMD_SEND</constant> ioctl. Depending on the destination
++      address of the message, the kernel delivers the message to the specific
++      destination connection, or to some subset of all connections on the same
++      bus. Sending messages across buses is not possible. Messages are always
++      queued in the memory pool of the destination connection (see above).
++    </para>
++
++    <para>
++      The <constant>KDBUS_CMD_SEND</constant> ioctl uses a
++      <type>struct kdbus_cmd_send</type> to describe the message
++      transfer.
++    </para>
++    <programlisting>
++struct kdbus_cmd_send {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  __u64 msg_address;
++  struct kdbus_msg_info reply;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>Flags for message delivery</para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_SEND_SYNC_REPLY</constant></term>
++              <listitem>
++                <para>
++                  By default, all calls to kdbus are asynchronous and
++                  non-blocking. However, as many use cases need to wait for
++                  a remote peer to answer a method call, a message can also
++                  be sent synchronously, waiting for its reply. This is what
++                  the <constant>KDBUS_SEND_SYNC_REPLY</constant> flag
++                  controls. With it set, the
++                  <constant>KDBUS_CMD_SEND</constant> ioctl blocks until the
++                  reply has arrived, the timeout limit is reached, the
++                  remote connection is shut down, or the call is interrupted
++                  by a signal before any reply arrives; see
++                  <citerefentry>
++                    <refentrytitle>signal</refentrytitle>
++                    <manvolnum>7</manvolnum>
++                  </citerefentry>.
++
++                  The offset of the reply message in the sender's pool is stored
++                  in <varname>reply</varname> when the ioctl has returned without
++                  error. Hence, there is no need for another
++                  <constant>KDBUS_CMD_RECV</constant> ioctl or anything else to
++                  receive the reply.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++              <listitem>
++                <para>
++                  Request a set of valid flags for this ioctl. When this bit is
++                  set, no action is taken; the ioctl will fail with
++                  <errorcode>-1</errorcode>, and <varname>errno</varname>
++                  is set to <constant>EPROTO</constant>.
++                  Once the ioctl has returned, the <varname>flags</varname>
++                  field will have all bits set that the kernel recognizes as
++                  valid for this command.
++                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++                  cleared by the operation.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>msg_address</varname></term>
++        <listitem><para>
++          In this field, users have to provide a pointer to a message
++          (<type>struct kdbus_msg</type>) to send. See below for a
++          detailed description.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>reply</varname></term>
++        <listitem><para>
++          Only used for synchronous replies. See description of
++          <type>struct kdbus_cmd_recv</type> for more details.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            The following items are currently recognized.
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_CANCEL_FD</constant></term>
++              <listitem>
++                <para>
++                  When this optional item is passed in and the call is
++                  synchronous, the passed-in file descriptor can be used as
++                  an alternative cancellation point. The kernel will call
++                  <citerefentry>
++                    <refentrytitle>poll</refentrytitle>
++                    <manvolnum>2</manvolnum>
++                  </citerefentry>
++                  on this file descriptor, and once it reports any incoming
++                  bytes, the blocking send operation will be canceled; the
++                  blocking, synchronous ioctl call will return
++                  <errorcode>-1</errorcode>, and <varname>errno</varname> will
++                  be set to <errorname>ECANCELED</errorname>.
++                  Any type of file descriptor on which
++                  <citerefentry>
++                    <refentrytitle>poll</refentrytitle>
++                    <manvolnum>2</manvolnum>
++                  </citerefentry>
++                  can be called can be used as payload to this item; for
++                  example, an eventfd can be used for this purpose, see
++                  <citerefentry>
++                    <refentrytitle>eventfd</refentrytitle>
++                    <manvolnum>2</manvolnum>
++                  </citerefentry>.
++                  For asynchronous message sending, this item is allowed but
++                  ignored.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++          <para>
++            Unrecognized items are rejected, and the ioctl will fail with
++            <varname>errno</varname> set to <constant>EINVAL</constant>.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
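++
++    <para>
++      The cancellation mechanism described for
++      <constant>KDBUS_ITEM_CANCEL_FD</constant> can be sketched in user space
++      with an eventfd: writing to it makes it readable, which is the
++      condition the text says ends the blocking send. The helper names
++      <function>fd_is_readable()</function> and
++      <function>request_cancel()</function> are hypothetical, and placing the
++      descriptor into the item is not shown here.
++    </para>
++    <programlisting><![CDATA[
++#include <poll.h>
++#include <stdint.h>
++#include <sys/eventfd.h>
++#include <sys/types.h>
++#include <unistd.h>
++
++/* Returns 1 if fd reports incoming data right now, 0 if not, -1 on error. */
++static int fd_is_readable(int fd)
++{
++  struct pollfd p = { .fd = fd, .events = POLLIN };
++  int r = poll(&p, 1, 0);
++
++  if (r < 0)
++    return -1;
++  return r > 0 && (p.revents & POLLIN) != 0;
++}
++
++/*
++ * Hypothetical helper: called from another thread to request cancellation.
++ * Adding 1 to the eventfd counter makes the descriptor readable.
++ */
++static int request_cancel(int cancel_fd)
++{
++  uint64_t one = 1;
++
++  if (write(cancel_fd, &one, sizeof(one)) != (ssize_t)sizeof(one))
++    return -1;
++  return 0;
++}
++]]></programlisting>
++    <para>
++      A descriptor created with <function>eventfd(0, EFD_CLOEXEC)</function>
++      starts out not readable, so the send is only canceled once
++      <function>request_cancel()</function> has been called.
++    </para>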
++
++    <para>
++      The message referenced by the <varname>msg_address</varname> above has
++      the following layout.
++    </para>
++
++    <programlisting>
++struct kdbus_msg {
++  __u64 size;
++  __u64 flags;
++  __s64 priority;
++  __u64 dst_id;
++  __u64 src_id;
++  __u64 payload_type;
++  __u64 cookie;
++  __u64 timeout_ns;
++  __u64 cookie_reply;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>Flags to describe message details.</para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_MSG_EXPECT_REPLY</constant></term>
++              <listitem>
++                <para>
++                  Expect a reply to this message from the remote peer. With
++                  this bit set, the <varname>timeout_ns</varname> field must
++                  be set to a non-zero deadline by which the receiving peer
++                  is expected to reply. If such a reply is not received in
++                  time, the sender will be notified with a timeout message
++                  (see below). The value must be an absolute value, in
++                  nanoseconds and based on
++                  <constant>CLOCK_MONOTONIC</constant>.
++                </para><para>
++                  For a message to be accepted as reply, it must be a direct
++                  message to the original sender (not a broadcast and not a
++                  signal message), and its
++                  <varname>kdbus_msg.cookie_reply</varname> must match the
++                  previous message's <varname>kdbus_msg.cookie</varname>.
++                </para><para>
++                  Expected replies also temporarily open the policy of the
++                  sending connection, so the other peer is allowed to respond
++                  within the given time window.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_MSG_NO_AUTO_START</constant></term>
++              <listitem>
++                <para>
++                  By default, when a message is sent to an activator
++                  connection, the activator is notified and will start an
++                  implementer. This flag inhibits that behavior. With this bit
++                  set, and the remote being an activator, the ioctl will fail
++                  with <varname>errno</varname> set to
++                  <constant>EADDRNOTAVAIL</constant>.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++              <listitem>
++                <para>
++                  Requests a set of valid flags for this ioctl. When this bit is
++                  set, no action is taken; the ioctl will return
++                  <errorcode>0</errorcode>, and the <varname>flags</varname>
++                  field will have all bits set that are valid for this command.
++                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++                  cleared by the operation.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>priority</varname></term>
++        <listitem><para>
++          The priority of this message. Receiving messages (see below) may
++          optionally be constrained to messages of a minimum priority. This
++          allows for use cases where timing critical data is interleaved with
++          control data on the same connection. If unused, the priority field
++          should be set to <constant>0</constant>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>dst_id</varname></term>
++        <listitem><para>
++          The numeric ID of the destination connection, or
++          <constant>KDBUS_DST_ID_BROADCAST</constant>
++          (~0ULL) to address every peer on the bus, or
++          <constant>KDBUS_DST_ID_NAME</constant> (0) to look
++          it up dynamically from the bus' name registry.
++          In the latter case, an item of type
++          <constant>KDBUS_ITEM_DST_NAME</constant> is mandatory.
++          Also see
++          <citerefentry>
++            <refentrytitle>kdbus.name</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>src_id</varname></term>
++        <listitem><para>
++          Upon return of the ioctl, this member will contain the sending
++          connection's numerical ID. Should be 0 at send time.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>payload_type</varname></term>
++        <listitem><para>
++          Type of the payload in the actual data records. Currently, only
++          <constant>KDBUS_PAYLOAD_DBUS</constant> is accepted as input value
++          of this field. When receiving messages that are generated by the
++          kernel (notifications), this field will contain
++          <constant>KDBUS_PAYLOAD_KERNEL</constant>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>cookie</varname></term>
++        <listitem><para>
++          Cookie of this message, for later recognition. Also, when replying
++          to a message (see above), the <varname>cookie_reply</varname>
++          field must match this value.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>timeout_ns</varname></term>
++        <listitem><para>
++          If the message sent requires a reply from the remote peer (see above),
++          this field contains the timeout in absolute nanoseconds based on
++          <constant>CLOCK_MONOTONIC</constant>. Also see
++          <citerefentry>
++            <refentrytitle>clock_gettime</refentrytitle>
++            <manvolnum>2</manvolnum>
++          </citerefentry>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>cookie_reply</varname></term>
++        <listitem><para>
++          If the message sent is a reply to another message, this field must
++          match the cookie of the formerly received message.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            A dynamically sized list of items to contain additional information.
++            The following items are expected/valid:
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_PAYLOAD_VEC</constant></term>
++              <term><constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant></term>
++              <term><constant>KDBUS_ITEM_FDS</constant></term>
++              <listitem>
++                <para>
++                  Actual data records containing the payload. See section
++                  "Message payload".
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_BLOOM_FILTER</constant></term>
++              <listitem>
++                <para>
++                  Bloom filter for matches (see below).
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_ITEM_DST_NAME</constant></term>
++              <listitem>
++                <para>
++                  Well-known name to send this message to. Required if
++                  <varname>dst_id</varname> is set to
++                  <constant>KDBUS_DST_ID_NAME</constant>.
++                  If a connection holding the given name can't be found,
++                  the ioctl will fail with <varname>errno</varname> set to
++                  <constant>ESRCH</constant>.
++                </para>
++                <para>
++                  For messages to a unique name (ID), this item is optional. If
++                  present, the kernel will make sure the name owner matches the
++                  given unique name. This allows programs to tie the message
++                  sending to the condition that a name is currently owned by a
++                  certain unique name.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++
++    <para>
++      The message will be augmented by the requested metadata items when
++      queued into the receiver's pool. See
++      <citerefentry>
++        <refentrytitle>kdbus.connection</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      and
++      <citerefentry>
++        <refentrytitle>kdbus.item</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for more information on metadata.
++    </para>
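++
++    <para>
++      Because <varname>timeout_ns</varname> is an absolute
++      <constant>CLOCK_MONOTONIC</constant> value, a caller that thinks in
++      relative timeouts has to convert them first. The following is a small
++      sketch of that conversion; <function>abs_timeout_ns()</function> and
++      <constant>NSEC_PER_SEC</constant> are names chosen for illustration
++      and are not provided by kdbus.
++    </para>
++    <programlisting><![CDATA[
++#include <stdint.h>
++#include <time.h>
++
++#define NSEC_PER_SEC 1000000000ULL
++
++/*
++ * Hypothetical helper: returns an absolute CLOCK_MONOTONIC deadline that
++ * lies rel_ns nanoseconds in the future, or 0 if the clock cannot be read.
++ */
++static uint64_t abs_timeout_ns(uint64_t rel_ns)
++{
++  struct timespec now;
++
++  if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
++    return 0;
++
++  return (uint64_t)now.tv_sec * NSEC_PER_SEC + (uint64_t)now.tv_nsec + rel_ns;
++}
++]]></programlisting>
++    <para>
++      The result would be stored in <varname>kdbus_msg.timeout_ns</varname>
++      when <constant>KDBUS_MSG_EXPECT_REPLY</constant> is set. Returning
++      <constant>0</constant> on failure is a design choice of this sketch: the
++      field must be non-zero in that case, so zero cannot be mistaken for a
++      valid deadline.
++    </para>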
++  </refsect1>
++
++  <refsect1>
++    <title>Receiving messages</title>
++
++    <para>
++      Messages are received by the client with the
++      <constant>KDBUS_CMD_RECV</constant> ioctl. The endpoint file of the bus
++      supports <function>poll()/epoll()/select()</function>; when new messages
++      are available on the connection's file descriptor,
++      <constant>POLLIN</constant> is reported. For compatibility reasons,
++      <constant>POLLOUT</constant> is always reported as well. Note, however,
++      that the latter does not guarantee that a message can in fact be sent, as
++      this depends on how many pending messages the receiver has in its pool.
++    </para>
++
++    <para>
++      With the <constant>KDBUS_CMD_RECV</constant> ioctl, a
++      <type>struct kdbus_cmd_recv</type> is used.
++    </para>
++
++    <programlisting>
++struct kdbus_cmd_recv {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  __s64 priority;
++  __u64 dropped_msgs;
++  struct kdbus_msg_info msg;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>Flags to control the receive command.</para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_RECV_PEEK</constant></term>
++              <listitem>
++                <para>
++                  Just return the location of the next message. Do not install
++                  file descriptors or anything else. This is usually used to
++                  determine the sender of the next queued message.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_RECV_DROP</constant></term>
++              <listitem>
++                <para>
++                  Drop the next message without doing anything else with it,
++                  and free the pool slice. This is a shortcut for
++                  <constant>KDBUS_RECV_PEEK</constant> and
++                  <constant>KDBUS_CMD_FREE</constant>.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_RECV_USE_PRIORITY</constant></term>
++              <listitem>
++                <para>
++                  Dequeue messages ordered by their priority, and filter
++                  them by the priority field (see below).
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++              <listitem>
++                <para>
++                  Request a set of valid flags for this ioctl. When this bit is
++                  set, no action is taken; the ioctl will fail with
++                  <errorcode>-1</errorcode>, and <varname>errno</varname>
++                  is set to <constant>EPROTO</constant>.
++                  Once the ioctl has returned, the <varname>flags</varname>
++                  field will have all bits set that the kernel recognizes as
++                  valid for this command.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. If the <varname>dropped_msgs</varname>
++          field is non-zero, <constant>KDBUS_RECV_RETURN_DROPPED_MSGS</constant>
++          is set. If a file descriptor could not be installed, the
++          <constant>KDBUS_RECV_RETURN_INCOMPLETE_FDS</constant> flag is set.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>priority</varname></term>
++        <listitem><para>
++          With <constant>KDBUS_RECV_USE_PRIORITY</constant> set in
++          <varname>flags</varname>, messages will be dequeued ordered by their
++          priority, starting with the highest value. Also, messages will be
++          filtered by the value given in this field, so the returned message
++          will at least have the requested priority. If no such message is
++          waiting in the queue, the ioctl will fail, and
++          <varname>errno</varname> will be set to <constant>EAGAIN</constant>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>dropped_msgs</varname></term>
++        <listitem><para>
++          Whenever a message with <constant>KDBUS_MSG_SIGNAL</constant> is sent
++          but cannot be queued on a peer (e.g., as it contains FDs but the peer
++          does not support FDs, or there is no space left in the peer's pool)
++          the <varname>dropped_msgs</varname> counter of the peer is
++          incremented. On the next <constant>KDBUS_CMD_RECV</constant> ioctl,
++          the counter is copied into the ioctl struct and cleared on the peer.
++          If it was non-zero, the
++          <constant>KDBUS_RECV_RETURN_DROPPED_MSGS</constant> flag will be set
++          in <varname>return_flags</varname>. Note that this will only happen
++          if the ioctl succeeded or failed with <constant>EAGAIN</constant>. In
++          other error cases, the peer's <varname>dropped_msgs</varname>
++          counter is left untouched.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>msg</varname></term>
++        <listitem><para>
++          Embedded struct containing information on the received message when
++          this command succeeded (see below).
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem><para>
++          Items to specify further details for the receive command.
++          Currently unused, and all items will be rejected with
++          <varname>errno</varname> set to <constant>EINVAL</constant>.
++        </para></listitem>
++      </varlistentry>
++    </variablelist>
++
++    <para>
++      Both <type>struct kdbus_cmd_recv</type> and
++      <type>struct kdbus_cmd_send</type> embed
++      <type>struct kdbus_msg_info</type>.
++      For the <constant>KDBUS_CMD_SEND</constant> ioctl, it is used to catch
++      synchronous replies, if one was requested, and is unused otherwise.
++    </para>
++
++    <programlisting>
++struct kdbus_msg_info {
++  __u64 offset;
++  __u64 msg_size;
++  __u64 return_flags;
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>offset</varname></term>
++        <listitem><para>
++          Upon return of the ioctl, this field contains the offset in the
++          receiver's memory pool. The memory must be freed with
++          <constant>KDBUS_CMD_FREE</constant>. See
++          <citerefentry>
++            <refentrytitle>kdbus.pool</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>
++          for further details.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>msg_size</varname></term>
++        <listitem><para>
++          Upon successful return of the ioctl, this field contains the size of
++          the allocated slice at offset <varname>offset</varname>.
++          It is the combination of the size of the stored
++          <type>struct kdbus_msg</type> object plus all appended VECs.
++          You can use it in combination with <varname>offset</varname> to map
++          a single message, instead of mapping the entire pool. See
++          <citerefentry>
++            <refentrytitle>kdbus.pool</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>
++          for further details.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem>
++          <para>
++            Kernel-provided return flags. Currently, the following flags are
++            defined.
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_RECV_RETURN_INCOMPLETE_FDS</constant></term>
++              <listitem>
++                <para>
++                  The message contained memfds or file descriptors, and the
++                  kernel failed to install one or more of them at receive time.
++                  Most probably that happened because the maximum number of
++                  file descriptors for the receiver's task was exceeded.
++                  In such cases, the message is still delivered, so this is not
++                  a fatal condition. File descriptor numbers inside the
++                  <constant>KDBUS_ITEM_FDS</constant> item or memfd files
++                  referenced by <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant>
++                  items which could not be installed will be set to
++                  <constant>-1</constant>.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++    </variablelist>
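++
++    <para>
++      The description of <varname>msg_size</varname> mentions mapping a single
++      message instead of the whole pool. Since <function>mmap()</function>
++      only accepts page-aligned file offsets, such a mapping generally has to
++      start at the page boundary below <varname>offset</varname>. The
++      following sketch shows that arithmetic; <type>struct msg_window</type>,
++      <function>window_for()</function> and
++      <function>map_message()</function> are hypothetical names, and the use
++      of <constant>PROT_READ</constant> with <constant>MAP_SHARED</constant>
++      on the connection file descriptor is an assumption of this example.
++    </para>
++    <programlisting><![CDATA[
++#include <stdint.h>
++#include <sys/mman.h>
++#include <sys/types.h>
++
++/* Page-aligned window that covers one message in the pool. */
++struct msg_window {
++  uint64_t map_off; /* page-aligned offset to pass to mmap() */
++  uint64_t map_len; /* number of bytes to map */
++  uint64_t delta;   /* where the message starts inside the mapping */
++};
++
++/* Round offset down to a page boundary and extend the length to match. */
++static struct msg_window window_for(uint64_t offset, uint64_t msg_size,
++                                    uint64_t page_size)
++{
++  struct msg_window w;
++
++  w.map_off = offset - (offset % page_size);
++  w.delta = offset - w.map_off;
++  w.map_len = w.delta + msg_size;
++  return w;
++}
++
++/*
++ * Map the window read-only. Returns a pointer to the message, or NULL on
++ * failure. *base receives the address to pass to munmap() together with
++ * w->map_len once the message is no longer needed.
++ */
++static const void *map_message(int conn_fd, const struct msg_window *w,
++                               void **base)
++{
++  *base = mmap(NULL, (size_t)w->map_len, PROT_READ, MAP_SHARED, conn_fd,
++               (off_t)w->map_off);
++  if (*base == MAP_FAILED)
++    return NULL;
++  return (const char *)*base + w->delta;
++}
++]]></programlisting>
++    <para>
++      The page size would typically be obtained with
++      <function>sysconf(_SC_PAGESIZE)</function>. Unmapping the window does
++      not release the pool slice; <constant>KDBUS_CMD_FREE</constant> is
++      still required.
++    </para>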
++
++    <para>
++      Unless <constant>KDBUS_RECV_DROP</constant> was passed, the
++      <varname>offset</varname> field contains the location of the new message
++      inside the receiver's pool after the <constant>KDBUS_CMD_RECV</constant>
++      ioctl was employed. The message is stored as <type>struct kdbus_msg</type>
++      at this offset, and can be interpreted with the semantics described above.
++    </para>
++    <para>
++      Also, if the connection allowed file descriptors to be passed
++      (<constant>KDBUS_HELLO_ACCEPT_FD</constant>), and if the message contained
++      any, they will be installed into the receiving process when the
++      <constant>KDBUS_CMD_RECV</constant> ioctl is called.
++      <emphasis>memfds</emphasis> may always be part of the message payload.
++      The receiving task is obliged to close all file descriptors appropriately
++      once no longer needed. If <constant>KDBUS_RECV_PEEK</constant> is set, no
++      file descriptors are installed. This allows for peeking at a message,
++      looking at its metadata only and dropping it via
++      <constant>KDBUS_RECV_DROP</constant>, without installing any of the file
++      descriptors into the receiving process.
++    </para>
++    <para>
++      The caller is obliged to call the <constant>KDBUS_CMD_FREE</constant>
++      ioctl with the returned offset when the memory is no longer needed.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Notifications</title>
++    <para>
++      A kernel notification is a regular kdbus message with the following
++      details.
++    </para>
++
++    <itemizedlist>
++      <listitem><para>
++        kdbus_msg.src_id == <constant>KDBUS_SRC_ID_KERNEL</constant>
++      </para></listitem>
++      <listitem><para>
++        kdbus_msg.dst_id == <constant>KDBUS_DST_ID_BROADCAST</constant>
++      </para></listitem>
++      <listitem><para>
++        kdbus_msg.payload_type == <constant>KDBUS_PAYLOAD_KERNEL</constant>
++      </para></listitem>
++      <listitem><para>
++        Has exactly one of the items attached that are described below.
++      </para></listitem>
++      <listitem><para>
++        Always has a timestamp item (<constant>KDBUS_ITEM_TIMESTAMP</constant>)
++        attached.
++      </para></listitem>
++    </itemizedlist>
++
++    <para>
++      The kernel will notify its users of the following events.
++    </para>
++
++    <itemizedlist>
++      <listitem><para>
++        When connection <emphasis>A</emphasis> is terminated while connection
++        <emphasis>B</emphasis> is waiting for a reply from it, connection
++        <emphasis>B</emphasis> is notified with a message with an item of
++        type <constant>KDBUS_ITEM_REPLY_DEAD</constant>.
++      </para></listitem>
++
++      <listitem><para>
++        When connection <emphasis>A</emphasis> does not receive a reply from
++        connection <emphasis>B</emphasis> within the specified timeout window,
++        connection <emphasis>A</emphasis> will receive a message with an
++        item of type <constant>KDBUS_ITEM_REPLY_TIMEOUT</constant>.
++      </para></listitem>
++
++      <listitem><para>
++        When an ordinary connection (not a monitor) is created on or removed
++        from a bus, messages with an item of type
++        <constant>KDBUS_ITEM_ID_ADD</constant> or
++        <constant>KDBUS_ITEM_ID_REMOVE</constant>, respectively, are delivered
++        to all bus members that match these messages through their match
++        database. Eavesdroppers (monitor connections) do not cause such
++        notifications to be sent. They are invisible on the bus.
++      </para></listitem>
++
++      <listitem><para>
++        When a connection gains or loses ownership of a name, messages with an
++        item of type <constant>KDBUS_ITEM_NAME_ADD</constant>,
++        <constant>KDBUS_ITEM_NAME_REMOVE</constant> or
++        <constant>KDBUS_ITEM_NAME_CHANGE</constant> are delivered to all bus
++        members that match these messages through their match database.
++      </para></listitem>
++    </itemizedlist>
++  </refsect1>
++
++  <refsect1>
++    <title>Return value</title>
++    <para>
++      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++      on error, <errorcode>-1</errorcode> is returned, and
++      <varname>errno</varname> is set to indicate the error.
++      If the issued ioctl is illegal for the file descriptor used,
++      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++    </para>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_SEND</constant> may fail with the following
++        errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EOPNOTSUPP</constant></term>
++          <listitem><para>
++            The connection is not an ordinary connection, or the file
++            descriptors passed in a <constant>KDBUS_ITEM_FDS</constant> item
++            are either kdbus handles or unix domain sockets. Both are
++            currently unsupported.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            The submitted payload type is
++            <constant>KDBUS_PAYLOAD_KERNEL</constant>,
++            <constant>KDBUS_MSG_EXPECT_REPLY</constant> was set without timeout
++            or cookie values, <constant>KDBUS_SEND_SYNC_REPLY</constant> was
++            set without <constant>KDBUS_MSG_EXPECT_REPLY</constant>, an invalid
++            item was supplied, <constant>src_id</constant> was non-zero and was
++            different from the current connection's ID, a supplied memfd had a
++            size of 0, or a string was not properly null-terminated.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ENOTUNIQ</constant></term>
++          <listitem><para>
++            The supplied destination is
++            <constant>KDBUS_DST_ID_BROADCAST</constant> and either
++            file descriptors were passed, or
++            <constant>KDBUS_MSG_EXPECT_REPLY</constant> was set,
++            or a timeout was given.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>E2BIG</constant></term>
++          <listitem><para>
++            Too many items.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EMSGSIZE</constant></term>
++          <listitem><para>
++            The size of the message header and items or the payload vector
++            is excessive.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EEXIST</constant></term>
++          <listitem><para>
++            Multiple <constant>KDBUS_ITEM_FDS</constant>,
++            <constant>KDBUS_ITEM_BLOOM_FILTER</constant> or
++            <constant>KDBUS_ITEM_DST_NAME</constant> items were supplied.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EBADF</constant></term>
++          <listitem><para>
++            The supplied <constant>KDBUS_ITEM_FDS</constant> or
++            <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant> items
++            contained an illegal file descriptor.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EMEDIUMTYPE</constant></term>
++          <listitem><para>
++            The supplied memfd is not a sealed kdbus memfd.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EMFILE</constant></term>
++          <listitem><para>
++            Too many file descriptors inside a
++            <constant>KDBUS_ITEM_FDS</constant>.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EBADMSG</constant></term>
++          <listitem><para>
++            An item had an illegal size, both a <constant>dst_id</constant>
++            and a <constant>KDBUS_ITEM_DST_NAME</constant> were given, or both
++            a name and a bloom filter were given.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ETXTBSY</constant></term>
++          <listitem><para>
++            The supplied kdbus memfd file cannot be sealed or the seal
++            was removed, because it is shared with other processes or
++            still mapped with
++            <citerefentry>
++              <refentrytitle>mmap</refentrytitle>
++              <manvolnum>2</manvolnum>
++            </citerefentry>.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ECOMM</constant></term>
++          <listitem><para>
++            A peer does not accept the file descriptors addressed to it.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EFAULT</constant></term>
++          <listitem><para>
++            The supplied bloom filter size was not 64-bit aligned, or supplied
++            memory could not be accessed by the kernel.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EDOM</constant></term>
++          <listitem><para>
++            The supplied bloom filter size did not match the bloom filter
++            size of the bus.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EDESTADDRREQ</constant></term>
++          <listitem><para>
++            <constant>dst_id</constant> was set to
++            <constant>KDBUS_DST_ID_NAME</constant>, but no
++            <constant>KDBUS_ITEM_DST_NAME</constant> was attached.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ESRCH</constant></term>
++          <listitem><para>
++            The name to look up was not found in the name registry.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EADDRNOTAVAIL</constant></term>
++          <listitem><para>
++            <constant>KDBUS_MSG_NO_AUTO_START</constant> was given but the
++            destination connection is an activator.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ENXIO</constant></term>
++          <listitem><para>
++            The passed numeric destination connection ID couldn't be found,
++            or is not connected.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ECONNRESET</constant></term>
++          <listitem><para>
++            The destination connection is no longer active.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ETIMEDOUT</constant></term>
++          <listitem><para>
++            Timeout while synchronously waiting for a reply.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EINTR</constant></term>
++          <listitem><para>
++            Interrupted system call while synchronously waiting for a reply.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EPIPE</constant></term>
++          <listitem><para>
++            When sending a message, a synchronous reply from the receiving
++            connection was expected but the connection died before answering.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ENOBUFS</constant></term>
++          <listitem><para>
++            Too many pending messages on the receiver side.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EREMCHG</constant></term>
++          <listitem><para>
++            Both a well-known name and a unique name (ID) were given, but
++            the name is not currently owned by that connection.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EXFULL</constant></term>
++          <listitem><para>
++            The memory pool of the receiver is full.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EREMOTEIO</constant></term>
++          <listitem><para>
++            While synchronously waiting for a reply, the remote peer
++            failed with an I/O error.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
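++
++      <para>
++        As a rough sketch, a caller might group these error codes by how it
++        intends to react. The enum, the function name
++        <function>classify_send_errno</function> and the grouping itself are
++        hypothetical editorial choices; only the <varname>errno</varname>
++        values and their meanings come from the list above.
++      </para>
++
++      <programlisting><![CDATA[
```c
#include <errno.h>

/* Hypothetical reaction classes; not part of the kdbus API. */
enum send_outcome {
	SEND_RETRY,       /* receiver-side congestion, a later attempt may work */
	SEND_PEER_GONE,   /* destination cannot be resolved or has gone away */
	SEND_NO_REPLY,    /* message went out, synchronous wait ended early */
	SEND_BAD_REQUEST  /* the request itself was rejected */
};

/* Maps an errno value left by a failed KDBUS_CMD_SEND to a reaction class. */
enum send_outcome classify_send_errno(int err)
{
	switch (err) {
	case ENOBUFS:        /* too many pending messages on the receiver side */
	case EXFULL:         /* memory pool of the receiver is full */
		return SEND_RETRY;
	case ESRCH:          /* name not found in the name registry */
	case ENXIO:          /* numeric destination ID not found or not connected */
	case ECONNRESET:     /* destination connection no longer active */
	case EADDRNOTAVAIL:  /* no auto-start requested, destination is an activator */
		return SEND_PEER_GONE;
	case ETIMEDOUT:      /* timeout while waiting for a reply */
	case EINTR:          /* interrupted while waiting for a reply */
	case EPIPE:          /* peer died before answering */
	case EREMOTEIO:      /* remote peer failed with an I/O error */
		return SEND_NO_REPLY;
	default:             /* EINVAL, EBADMSG, E2BIG, EMSGSIZE, and so on */
		return SEND_BAD_REQUEST;
	}
}
```
++]]></programlisting>
++
++      <para>
++        <constant>EINTR</constant> is deliberately not treated as retryable
++        here, because blindly resending after an interrupted synchronous
++        wait could deliver the same message twice.
++      </para>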
++    </refsect2>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_RECV</constant> may fail with the following
++        errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EOPNOTSUPP</constant></term>
++          <listitem><para>
++            The connection is not an ordinary connection, or the passed
++            file descriptors are either kdbus handles or unix domain
++            sockets. Both are currently unsupported.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            Invalid flags or offset.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EAGAIN</constant></term>
++          <listitem><para>
++            No message found in the queue.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++  </refsect1>
++
++  <refsect1>
++    <title>See Also</title>
++    <simplelist type="inline">
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.bus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.connection</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.endpoint</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.fs</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.item</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.name</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.pool</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>clock_gettime</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>ioctl</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>poll</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>select</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>epoll</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>eventfd</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>memfd_create</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++    </simplelist>
++  </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.name.xml b/Documentation/kdbus/kdbus.name.xml
+new file mode 100644
+index 0000000..3f5f6a6
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.name.xml
+@@ -0,0 +1,711 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.name">
++
++  <refentryinfo>
++    <title>kdbus.name</title>
++    <productname>kdbus.name</productname>
++  </refentryinfo>
++
++  <refmeta>
++    <refentrytitle>kdbus.name</refentrytitle>
++    <manvolnum>7</manvolnum>
++  </refmeta>
++
++  <refnamediv>
++    <refname>kdbus.name</refname>
++    <refpurpose>kdbus.name</refpurpose>
++  </refnamediv>
++
++  <refsect1>
++    <title>Description</title>
++    <para>
++      Each
++      <citerefentry>
++        <refentrytitle>kdbus.bus</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      instantiates a name registry to resolve well-known names into unique
++      connection IDs for message delivery. The registry will be queried when a
++      message is sent with <varname>kdbus_msg.dst_id</varname> set to
++      <constant>KDBUS_DST_ID_NAME</constant>, or when a registry dump is
++      requested with <constant>KDBUS_CMD_LIST</constant>.
++    </para>
++
++    <para>
++      All of the below is subject to policy rules for <emphasis>SEE</emphasis>
++      and <emphasis>OWN</emphasis> permissions. See
++      <citerefentry>
++        <refentrytitle>kdbus.policy</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for more information.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Name validity</title>
++    <para>
++      A name has to comply with the following rules in order to be considered
++      valid.
++    </para>
++
++    <itemizedlist>
++      <listitem>
++        <para>
++          The name has two or more elements separated by a
++          '<literal>.</literal>' (period) character.
++        </para>
++      </listitem>
++      <listitem>
++        <para>
++          All elements must contain at least one character.
++        </para>
++      </listitem>
++      <listitem>
++        <para>
++          Each element must only contain the ASCII characters
++          <literal>[A-Z][a-z][0-9]_</literal> and must not begin with a
++          digit.
++        </para>
++      </listitem>
++      <listitem>
++        <para>
++          The name must contain at least one '<literal>.</literal>' (period)
++          character (and thus at least two elements).
++        </para>
++      </listitem>
++      <listitem>
++        <para>
++          The name must not begin with a '<literal>.</literal>' (period)
++          character.
++        </para>
++      </listitem>
++      <listitem>
++        <para>
++          The name must not exceed <constant>255</constant> characters in
++          length.
++        </para>
++      </listitem>
++    </itemizedlist>
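++
++    <para>
++      For illustration only, the rules above can be written as a small C
++      predicate. The helper names <function>name_is_valid</function> and
++      <function>name_char_ok</function> are hypothetical, and the code
++      follows the wording of this list rather than the kernel's own
++      validation routine, which may differ in detail.
++    </para>
++
++    <programlisting><![CDATA[
```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUS_NAME_MAX_LEN 255 /* length limit quoted in the list above */

/* Hypothetical helper: true if c is one of [A-Z][a-z][0-9]_ */
static bool name_char_ok(char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
	       (c >= '0' && c <= '9') || c == '_';
}

/* Hypothetical helper: applies the listed rules to a candidate name. */
bool name_is_valid(const char *name)
{
	size_t len = strlen(name);
	bool at_element_start = true;
	bool seen_dot = false;
	size_t i;

	if (len == 0 || len > BUS_NAME_MAX_LEN)
		return false;

	for (i = 0; i < len; i++) {
		char c = name[i];

		if (c == '.') {
			/* A dot at an element start is a leading dot or an empty element. */
			if (at_element_start)
				return false;
			seen_dot = true;
			at_element_start = true;
			continue;
		}
		if (!name_char_ok(c))
			return false;
		/* Elements must not begin with a digit. */
		if (at_element_start && c >= '0' && c <= '9')
			return false;
		at_element_start = false;
	}

	/* Require at least one dot, and reject a trailing dot (empty last element). */
	return seen_dot && !at_element_start;
}
```
++]]></programlisting>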
++  </refsect1>
++
++  <refsect1>
++    <title>Acquiring a name</title>
++    <para>
++      To acquire a name, a client uses the
++      <constant>KDBUS_CMD_NAME_ACQUIRE</constant> ioctl with
++      <type>struct kdbus_cmd</type> as argument.
++    </para>
++
++    <programlisting>
++struct kdbus_cmd {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>Flags to control details in the name acquisition.</para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_NAME_REPLACE_EXISTING</constant></term>
++              <listitem>
++                <para>
++                  Acquiring a name that is already present usually fails,
++                  unless this flag is set in the call, and
++                  <constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant> (see below)
++                  was set when the current owner of the name acquired it, or
++                  if the current owner is an activator connection (see
++                  <citerefentry>
++                    <refentrytitle>kdbus.connection</refentrytitle>
++                    <manvolnum>7</manvolnum>
++                  </citerefentry>).
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant></term>
++              <listitem>
++                <para>
++                  Allow other connections to take over this name. When this
++                  happens, the former owner of the name will be notified
++                  of the name loss.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_NAME_QUEUE</constant></term>
++              <listitem>
++                <para>
++                  A name that is already acquired by a connection cannot be
++                  acquired again (unless the
++                  <constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant> flag was
++                  set during acquisition; see above).
++                  However, a connection can put itself in a queue of
++                  connections waiting for the name to be released. Once that
++                  happens, the first connection in that queue becomes the new
++                  owner and is notified accordingly.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++              <listitem>
++                <para>
++                  Request a set of valid flags for this ioctl. When this bit is
++                  set, no action is taken; the ioctl will fail with
++                  <errorcode>-1</errorcode>, and <varname>errno</varname>
++                  is set to <constant>EPROTO</constant>.
++                  Once the ioctl has returned, the <varname>flags</varname>
++                  field will have all bits set that the kernel recognizes as
++                  valid for this command.
++                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++                  cleared by the operation.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem>
++          <para>
++            Flags returned by the kernel. Currently, the following may be
++            returned by the kernel.
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_NAME_IN_QUEUE</constant></term>
++              <listitem>
++                <para>
++                  The name was not acquired yet, but the connection was
++                  placed in the queue of peers waiting for the name.
++                  This can only happen if <constant>KDBUS_NAME_QUEUE</constant>
++                  was set in the <varname>flags</varname> member (see above).
++                  The connection will receive a name owner change notification
++                  once the current owner has given up the name and its
++                  ownership was transferred.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            Items to submit the name. Currently, one item of type
++            <constant>KDBUS_ITEM_NAME</constant> is expected and allowed, and
++            the contained string must be a valid bus name.
++            <constant>KDBUS_ITEM_NEGOTIATE</constant> may be used to probe for
++            valid item types. See
++            <citerefentry>
++              <refentrytitle>kdbus.item</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for a detailed description of how this item is used.
++          </para>
++          <para>
++            Unrecognized items are rejected, and the ioctl will fail with
++            <varname>errno</varname> set to <errorname>EINVAL</errorname>.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
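++
++    <para>
++      The following sketch shows how a caller might lay out such a command
++      in memory before passing it to the ioctl. It deliberately avoids the
++      real kdbus header: the <type>demo_</type> structures are stand-ins,
++      <constant>DEMO_ITEM_NAME</constant> is a placeholder rather than the
++      real value of <constant>KDBUS_ITEM_NAME</constant>, and the item
++      layout (64-bit size, 64-bit type, payload, padding to 8 bytes) is an
++      assumption that should be checked against
++      <citerefentry>
++        <refentrytitle>kdbus.item</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>.
++      The ioctl call itself is omitted.
++    </para>
++
++    <programlisting><![CDATA[
```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the fixed part of struct kdbus_cmd; items follow it. */
struct demo_cmd_hdr {
	uint64_t size;
	uint64_t flags;
	uint64_t return_flags;
};

/* Assumed item header; the payload follows it. */
struct demo_item_hdr {
	uint64_t size;
	uint64_t type;
};

#define DEMO_ITEM_NAME 0x1000ULL /* placeholder, NOT the real KDBUS_ITEM_NAME */
#define DEMO_ALIGN8(n) (((n) + 7) & ~(size_t)7)

/*
 * Hypothetical helper: allocates a command buffer holding one name item.
 * Returns NULL on allocation failure; the caller frees the buffer.
 */
void *demo_build_name_cmd(const char *name, uint64_t flags, size_t *out_size)
{
	size_t payload = strlen(name) + 1; /* keep the terminating NUL */
	size_t item_size = sizeof(struct demo_item_hdr) + payload;
	size_t total = sizeof(struct demo_cmd_hdr) + DEMO_ALIGN8(item_size);
	struct demo_cmd_hdr cmd = { .size = total, .flags = flags };
	struct demo_item_hdr item = { .size = item_size, .type = DEMO_ITEM_NAME };
	unsigned char *buf = calloc(1, total); /* zeroed, so padding is clean */

	if (!buf)
		return NULL;

	memcpy(buf, &cmd, sizeof(cmd));
	memcpy(buf + sizeof(cmd), &item, sizeof(item));
	memcpy(buf + sizeof(cmd) + sizeof(item), name, payload);
	*out_size = total;
	return buf;
}
```
++]]></programlisting>
++
++    <para>
++      In this sketch, <varname>size</varname> in the command header covers
++      the header plus the padded item, matching the description of the
++      <varname>size</varname> field above.
++    </para>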
++  </refsect1>
++
++  <refsect1>
++    <title>Releasing a name</title>
++    <para>
++      A connection may release a name explicitly with the
++      <constant>KDBUS_CMD_NAME_RELEASE</constant> ioctl. If the connection was
++      an implementer of an activatable name, its pending messages are moved
++      back to the activator. If there are any connections queued up as waiters
++      for the name, the first one in the queue (the oldest entry) will become
++      the new owner. The same happens implicitly for all names once a
++      connection terminates. See
++      <citerefentry>
++        <refentrytitle>kdbus.connection</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for more information on connections.
++    </para>
++    <para>
++      The <constant>KDBUS_CMD_NAME_RELEASE</constant> ioctl uses the same data
++      structure as the acquisition call
++      (<constant>KDBUS_CMD_NAME_ACQUIRE</constant>),
++      but with slightly different field usage.
++    </para>
++
++    <programlisting>
++struct kdbus_cmd {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>
++          Flags to the command. Currently unused.
++          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++          valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
++          and the <varname>flags</varname> field is set to
++          <constant>0</constant>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            Items to submit the name. Currently, one item of type
++            <constant>KDBUS_ITEM_NAME</constant> is expected and allowed, and
++            the contained string must be a valid bus name.
++            <constant>KDBUS_ITEM_NEGOTIATE</constant> may be used to probe for
++            valid item types. See
++            <citerefentry>
++              <refentrytitle>kdbus.item</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++            for a detailed description of how this item is used.
++          </para>
++          <para>
++            Unrecognized items are rejected, and the ioctl will fail with
++            <varname>errno</varname> set to <constant>EINVAL</constant>.
++          </para>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++  </refsect1>
++
++  <refsect1>
++    <title>Dumping the name registry</title>
++    <para>
++      A connection may request a complete or filtered dump of currently active
++      bus names with the <constant>KDBUS_CMD_LIST</constant> ioctl, which
++      takes a <type>struct kdbus_cmd_list</type> as argument.
++    </para>
++
++    <programlisting>
++struct kdbus_cmd_list {
++  __u64 flags;
++  __u64 return_flags;
++  __u64 offset;
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem>
++          <para>
++            Any combination of flags to specify which names should be dumped.
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_LIST_UNIQUE</constant></term>
++              <listitem>
++                <para>
++                  List the unique (numeric) IDs of connections, whether they
++                  own a name or not.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_LIST_NAMES</constant></term>
++              <listitem>
++                <para>
++                  List well-known names stored in the database which are
++                  actively owned by a real connection (not an activator).
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_LIST_ACTIVATORS</constant></term>
++              <listitem>
++                <para>
++                  List names that are owned by an activator.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_LIST_QUEUED</constant></term>
++              <listitem>
++                <para>
++                  List connections that are not yet owning a name but are
++                  waiting for it to become available.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++              <listitem>
++                <para>
++                  Request a set of valid flags for this ioctl. When this bit is
++                  set, no action is taken; the ioctl will fail with
++                  <errorcode>-1</errorcode>, and <varname>errno</varname>
++                  is set to <constant>EPROTO</constant>.
++                  Once the ioctl has returned, the <varname>flags</varname>
++                  field will have all bits set that the kernel recognizes as
++                  valid for this command.
++                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++                  cleared by the operation.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>offset</varname></term>
++        <listitem><para>
++          When the ioctl returns successfully, the offset to the name registry
++          dump inside the connection's pool will be stored in this field.
++        </para></listitem>
++      </varlistentry>
++    </variablelist>
++
++    <para>
++      The returned list of names is stored in a <type>struct kdbus_list</type>
++      that in turn contains an array of type <type>struct kdbus_info</type>.
++      The array size in bytes is given as <varname>list_size</varname>.
++      The fields inside <type>struct kdbus_info</type> are described next.
++    </para>
++
++    <programlisting>
++struct kdbus_info {
++  __u64 size;
++  __u64 id;
++  __u64 flags;
++  struct kdbus_item items[0];
++};
++    </programlisting>
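++
++    <para>
++      Because each record carries its own <varname>size</varname>, an array
++      of such records can be walked without knowing the item contents. The
++      sketch below does this over a plain byte buffer. The names
++      <type>demo_info_hdr</type> and <function>demo_walk_infos</function>
++      are hypothetical, and advancing by the size rounded up to 8 bytes is
++      an assumption about record alignment that is not stated in this page.
++    </para>
++
++    <programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the fixed part of struct kdbus_info; items follow it. */
struct demo_info_hdr {
	uint64_t size;
	uint64_t id;
	uint64_t flags;
};

#define DEMO_ALIGN8(n) (((n) + 7) & ~(size_t)7)

/*
 * Hypothetical helper: walks the records in buf[0..list_size), stores up to
 * max_ids connection IDs in ids, and returns the number of records seen,
 * or -1 if a record header is truncated or its size field is implausible.
 */
int demo_walk_infos(const unsigned char *buf, size_t list_size,
		    uint64_t *ids, size_t max_ids)
{
	size_t off = 0;
	int count = 0;

	while (off < list_size) {
		struct demo_info_hdr hdr;
		size_t step;

		if (list_size - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr)); /* avoids unaligned reads */
		if (hdr.size < sizeof(hdr) || hdr.size > list_size - off)
			return -1;

		if ((size_t)count < max_ids)
			ids[count] = hdr.id;
		count++;

		/* Assumed: the next record starts at the next 8-byte boundary. */
		step = DEMO_ALIGN8((size_t)hdr.size);
		if (step > list_size - off)
			break; /* last record without trailing padding */
		off += step;
	}
	return count;
}
```
++]]></programlisting>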
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>id</varname></term>
++        <listitem><para>
++          The owning connection's unique ID.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>
++          The flags of the owning connection.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem>
++          <para>
++            Items containing the actual name. Currently, one item of type
++            <constant>KDBUS_ITEM_OWNED_NAME</constant> will be attached,
++            including the name's flags. In that item, the flags field of the
++            name may carry the following bits:
++          </para>
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant></term>
++              <listitem>
++                <para>
++                  Other connections are allowed to take over this name from the
++                  connection that owns it.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_NAME_IN_QUEUE</constant></term>
++              <listitem>
++                <para>
++                  When retrieving a list of currently acquired names in the
++                  registry, this flag indicates whether the connection
++                  actually owns the name or is currently waiting for it to
++                  become available.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_NAME_ACTIVATOR</constant></term>
++              <listitem>
++                <para>
++                  An activator connection owns a name as a placeholder for an
++                  implementer, which is started on demand by programs as soon
++                  as the first message arrives. There's some more information
++                  on this topic in
++                  <citerefentry>
++                    <refentrytitle>kdbus.connection</refentrytitle>
++                    <manvolnum>7</manvolnum>
++                  </citerefentry>.
++                </para>
++                <para>
++                  In contrast to
++                  <constant>KDBUS_NAME_REPLACE_EXISTING</constant>,
++                  when a name is taken over from an activator connection, all
++                  the messages that have been queued in the activator
++                  connection will be moved over to the new owner. The activator
++                  connection will still be tracked for the name and will take
++                  control again if the implementer connection terminates.
++                </para>
++                <para>
++                  This flag cannot be used when acquiring a name, but is
++                  implicitly set through <constant>KDBUS_CMD_HELLO</constant>
++                  with <constant>KDBUS_HELLO_ACTIVATOR</constant> set in
++                  <varname>kdbus_cmd_hello.conn_flags</varname>.
++                </para>
++              </listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++              <listitem>
++                <para>
++                  Requests a set of valid flags for this ioctl. When this bit is
++                  set, no action is taken; the ioctl will return
++                  <errorcode>0</errorcode>, and the <varname>flags</varname>
++                  field will have all bits set that are valid for this command.
++                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++                  cleared by the operation.
++                </para>
++              </listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++    </variablelist>
++
++    <para>
++      The returned buffer must be freed with the
++      <constant>KDBUS_CMD_FREE</constant> ioctl when the user is finished with
++      it. See
++      <citerefentry>
++        <refentrytitle>kdbus.pool</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for more information.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Return value</title>
++    <para>
++      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++      on error, <errorcode>-1</errorcode> is returned, and
++      <varname>errno</varname> is set to indicate the error.
++      If the issued ioctl is illegal for the file descriptor used,
++      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++    </para>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_NAME_ACQUIRE</constant> may fail with the following
++        errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            Illegal command flags, illegal name provided, or an activator
++            tried to acquire a second name.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EPERM</constant></term>
++          <listitem><para>
++            Policy prohibited name ownership.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EALREADY</constant></term>
++          <listitem><para>
++            Connection already owns that name.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EEXIST</constant></term>
++          <listitem><para>
++            The name already exists and cannot be taken over.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>E2BIG</constant></term>
++          <listitem><para>
++            The maximum number of well-known names per connection is exhausted.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_NAME_RELEASE</constant>
++        may fail with the following errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            Invalid command flags, or invalid name provided.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ESRCH</constant></term>
++          <listitem><para>
++            Name is not found in the registry.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EADDRINUSE</constant></term>
++          <listitem><para>
++            Name is owned by a different connection and can't be released.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_LIST</constant> may fail with the following
++        errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            Invalid command flags.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>ENOBUFS</constant></term>
++          <listitem><para>
++            No available memory in the connection's pool.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++  </refsect1>
++
++  <refsect1>
++    <title>See Also</title>
++    <simplelist type="inline">
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.bus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.connection</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.item</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.policy</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.pool</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++    </simplelist>
++  </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.policy.xml b/Documentation/kdbus/kdbus.policy.xml
+new file mode 100644
+index 0000000..6732416
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.policy.xml
+@@ -0,0 +1,406 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.policy">
++
++  <refentryinfo>
++    <title>kdbus.policy</title>
++    <productname>kdbus.policy</productname>
++  </refentryinfo>
++
++  <refmeta>
++    <refentrytitle>kdbus.policy</refentrytitle>
++    <manvolnum>7</manvolnum>
++  </refmeta>
++
++  <refnamediv>
++    <refname>kdbus.policy</refname>
++    <refpurpose>kdbus policy</refpurpose>
++  </refnamediv>
++
++  <refsect1>
++    <title>Description</title>
++
++    <para>
++      A kdbus policy restricts which connections may own, see and talk to
++      well-known names. A policy can be associated with a bus (through a
++      policy holder connection) or a custom endpoint. kdbus stores its policy
++      information in a database that can be accessed through the following
++      ioctl commands:
++    </para>
++
++    <variablelist>
++      <varlistentry>
++        <term><constant>KDBUS_CMD_HELLO</constant></term>
++        <listitem><para>
++          When creating, or updating, a policy holder connection. See
++          <citerefentry>
++            <refentrytitle>kdbus.connection</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><constant>KDBUS_CMD_ENDPOINT_MAKE</constant></term>
++        <term><constant>KDBUS_CMD_ENDPOINT_UPDATE</constant></term>
++        <listitem><para>
++          When creating, or updating, a bus custom endpoint. See
++          <citerefentry>
++            <refentrytitle>kdbus.endpoint</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>.
++        </para></listitem>
++      </varlistentry>
++    </variablelist>
++
++    <para>
++      In all cases, the name and policy access information is stored in items
++      of type <constant>KDBUS_ITEM_NAME</constant> and
++      <constant>KDBUS_ITEM_POLICY_ACCESS</constant>. For this transport, the
++      following rules apply.
++    </para>
++
++    <itemizedlist>
++      <listitem>
++        <para>
++          An item of type <constant>KDBUS_ITEM_NAME</constant> must be followed
++          by at least one <constant>KDBUS_ITEM_POLICY_ACCESS</constant> item.
++        </para>
++      </listitem>
++
++      <listitem>
++        <para>
++          An item of type <constant>KDBUS_ITEM_NAME</constant> can be followed
++          by an arbitrary number of
++          <constant>KDBUS_ITEM_POLICY_ACCESS</constant> items.
++        </para>
++      </listitem>
++
++      <listitem>
++        <para>
++          An arbitrary number of groups of names and access levels can be given.
++        </para>
++      </listitem>
++    </itemizedlist>
++
++    <para>
++      Names passed in items of type <constant>KDBUS_ITEM_NAME</constant> must
++      comply with the rules for valid kdbus names. See
++      <citerefentry>
++        <refentrytitle>kdbus.name</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for more information.
++
++      The payload of an item of type
++      <constant>KDBUS_ITEM_POLICY_ACCESS</constant> is defined by the following
++      struct. For more information on the layout of items, please refer to
++      <citerefentry>
++        <refentrytitle>kdbus.item</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>.
++    </para>
++
++    <programlisting>
++struct kdbus_policy_access {
++  __u64 type;
++  __u64 access;
++  __u64 id;
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>type</varname></term>
++        <listitem>
++          <para>
++            One of the following.
++          </para>
++
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_POLICY_ACCESS_USER</constant></term>
++              <listitem><para>
++                Grant access to a user with the UID stored in the
++                <varname>id</varname> field.
++              </para></listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_POLICY_ACCESS_GROUP</constant></term>
++              <listitem><para>
++                Grant access to users in the group with the GID stored in the
++                <varname>id</varname> field.
++              </para></listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_POLICY_ACCESS_WORLD</constant></term>
++              <listitem><para>
++                Grant access to everyone. The <varname>id</varname> field
++                is ignored.
++              </para></listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>access</varname></term>
++        <listitem>
++          <para>
++            The access to grant. One of the following.
++          </para>
++
++          <variablelist>
++            <varlistentry>
++              <term><constant>KDBUS_POLICY_SEE</constant></term>
++              <listitem><para>
++                Allow the name to be seen.
++              </para></listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_POLICY_TALK</constant></term>
++              <listitem><para>
++                Allow the name to be talked to.
++              </para></listitem>
++            </varlistentry>
++
++            <varlistentry>
++              <term><constant>KDBUS_POLICY_OWN</constant></term>
++              <listitem><para>
++                Allow the name to be owned.
++              </para></listitem>
++            </varlistentry>
++          </variablelist>
++        </listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>id</varname></term>
++        <listitem><para>
++           For <constant>KDBUS_POLICY_ACCESS_USER</constant>, stores the UID.
++           For <constant>KDBUS_POLICY_ACCESS_GROUP</constant>, stores the GID.
++        </para></listitem>
++      </varlistentry>
++
++    </variablelist>
++
++    <para>
++      All endpoints of buses have an empty policy database by default.
++      Therefore, unless policy rules are added, all operations will be
++      denied by default. Also see
++      <citerefentry>
++        <refentrytitle>kdbus.endpoint</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Wildcard names</title>
++    <para>
++      Policy holder connections may upload names that contain the wildcard
++      suffix (<literal>".*"</literal>). Such a policy entry is effective for
++      every well-known name that extends the provided name by exactly one more
++      level.
++
++      For example, the name <literal>"foo.bar.*"</literal> matches both
++      <literal>"foo.bar.baz"</literal> and
++      <literal>"foo.bar.bazbaz"</literal>, but not
++      <literal>"foo.bar.baz.baz"</literal>.
++
++      This allows connections to take control over multiple names that the
++      policy holder doesn't need to know about when uploading the policy.
++
++      Such wildcard entries are not allowed for custom endpoints.
++    </para>
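++
++    <para>
++      The wildcard rule described above can be sketched in C as follows.
++      This is an illustration only:
++      <function>policy_wildcard_matches()</function> is a hypothetical helper
++      that is not part of the kdbus API, and the kernel's own matching code
++      may differ in detail.
++    </para>
++
++    <programlisting><![CDATA[
++#include <stdbool.h>
++#include <string.h>
++
++/* Hypothetical helper: does 'name' match the wildcard entry 'pattern'? */
++static bool policy_wildcard_matches(const char *pattern, const char *name)
++{
++  size_t plen = strlen(pattern);
++  const char *rest;
++
++  /* The pattern must end in ".*" and have a non-empty prefix. */
++  if (plen < 3 || strcmp(pattern + plen - 2, ".*") != 0)
++    return false;
++
++  /* The name must start with the prefix, including the trailing dot. */
++  if (strncmp(name, pattern, plen - 1) != 0)
++    return false;
++
++  /* Exactly one more level: non-empty, and no further dots. */
++  rest = name + plen - 1;
++  return *rest != '\0' && strchr(rest, '.') == NULL;
++}
++]]></programlisting>
++
++    <para>
++      With the pattern <literal>"foo.bar.*"</literal>, this helper accepts
++      <literal>"foo.bar.baz"</literal> and rejects
++      <literal>"foo.bar.baz.baz"</literal>, as in the example above.
++    </para>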
++  </refsect1>
++
++  <refsect1>
++    <title>Privileged connections</title>
++    <para>
++      The policy database is overruled when action is taken by a privileged
++      connection. Please refer to
++      <citerefentry>
++        <refentrytitle>kdbus.connection</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for more information on what makes a connection privileged.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Examples</title>
++    <para>
++      For instance, a set of policy rules may look like this:
++    </para>
++
++    <programlisting>
++KDBUS_ITEM_NAME: str='org.foo.bar'
++KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, ID=1000
++KDBUS_ITEM_POLICY_ACCESS: type=USER, access=TALK, ID=1001
++KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=SEE
++
++KDBUS_ITEM_NAME: str='org.blah.baz'
++KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, ID=0
++KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=TALK
++    </programlisting>
++
++    <para>
++      That means that 'org.foo.bar' may only be owned by UID 1000, but every
++      user on the bus is allowed to see the name. However, only UID 1001 may
++      actually send a message to the connection and receive a reply from it.
++
++      The second rule allows 'org.blah.baz' to be owned by UID 0 only, but
++      every user may talk to it.
++    </para>
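++
++    <para>
++      As a rough sketch, the access records of the first rule above could be
++      filled in as shown below. The helper names
++      <function>policy_access_set()</function> and
++      <function>fill_org_foo_bar()</function> are hypothetical, and the code
++      assumes that the kdbus UAPI header is installed as
++      <filename>linux/kdbus.h</filename>. Wrapping each record in an item of
++      type <constant>KDBUS_ITEM_POLICY_ACCESS</constant> is not shown here; see
++      <citerefentry>
++        <refentrytitle>kdbus.item</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for the item layout.
++    </para>
++
++    <programlisting><![CDATA[
++#include <stdint.h>
++#include <string.h>
++#include <linux/kdbus.h>  /* assumed install location of the kdbus header */
++
++/* Hypothetical helper: fill one access record. */
++static void policy_access_set(struct kdbus_policy_access *a,
++                              uint64_t type, uint64_t access, uint64_t id)
++{
++  memset(a, 0, sizeof(*a));
++  a->type = type;
++  a->access = access;
++  a->id = id;  /* ignored for KDBUS_POLICY_ACCESS_WORLD */
++}
++
++/* The three access records listed for 'org.foo.bar' above. */
++static void fill_org_foo_bar(struct kdbus_policy_access acc[3])
++{
++  policy_access_set(&acc[0], KDBUS_POLICY_ACCESS_USER,
++                    KDBUS_POLICY_OWN, 1000);
++  policy_access_set(&acc[1], KDBUS_POLICY_ACCESS_USER,
++                    KDBUS_POLICY_TALK, 1001);
++  policy_access_set(&acc[2], KDBUS_POLICY_ACCESS_WORLD,
++                    KDBUS_POLICY_SEE, 0);
++}
++]]></programlisting>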
++  </refsect1>
++
++  <refsect1>
++    <title>TALK access and multiple well-known names per connection</title>
++    <para>
++      Note that TALK access is checked against all names of a connection. For
++      example, if a connection owns both <constant>'org.foo.bar'</constant> and
++      <constant>'org.blah.baz'</constant>, and the policy database allows
++      <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
++      permission is also granted to <constant>'org.foo.bar'</constant>. That
++      might sound illogical, but after all, we allow messages to be directed to
++      either the ID or a well-known name, and policy is applied to the
++      connection, not the name. In other words, the effective TALK policy for a
++      connection is the most permissive of all names the connection owns.
++
++      For broadcast messages, the receiver needs TALK permissions to the sender
++      to receive the broadcast.
++    </para>
++    <para>
++      Both the endpoint and the bus policy databases are consulted to allow
++      name registry listing, owning a well-known name and message delivery.
++      If either check fails, the operation fails with
++      <varname>errno</varname> set to <constant>EPERM</constant>.
++
++      As a best practice, connections that own names with a restricted TALK
++      access should not install matches. This avoids cases where the sent
++      message may pass the bloom filter due to false-positives and may also
++      satisfy the policy rules.
++
++      Also see
++      <citerefentry>
++        <refentrytitle>kdbus.match</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Implicit policies</title>
++    <para>
++      Depending on the type of the endpoint, a set of implicit rules that
++      override installed policies might be enforced.
++
++      On default endpoints, the following set is enforced and checked before
++      any user-supplied policy is checked.
++    </para>
++
++    <itemizedlist>
++      <listitem>
++        <para>
++          Privileged connections always override any installed policy. Those
++          connections could easily install their own policies, so there is no
++          reason to enforce installed policies.
++        </para>
++      </listitem>
++      <listitem>
++        <para>
++          Connections can always talk to connections of the same user. This
++          includes broadcast messages.
++        </para>
++      </listitem>
++    </itemizedlist>
++
++    <para>
++      Custom endpoints have stricter policies. The following rules apply:
++    </para>
++
++    <itemizedlist>
++      <listitem>
++        <para>
++          Policy rules are always enforced, even if the connection is a
++          privileged connection.
++        </para>
++      </listitem>
++      <listitem>
++        <para>
++          Policy rules are always enforced for <constant>TALK</constant> access,
++          even if both ends are running under the same user. This includes
++          broadcast messages.
++        </para>
++      </listitem>
++      <listitem>
++        <para>
++          To restrict the set of names that can be seen, endpoint policies can
++          install <constant>SEE</constant> policies.
++        </para>
++      </listitem>
++    </itemizedlist>
++  </refsect1>
++
++  <refsect1>
++    <title>See Also</title>
++    <simplelist type="inline">
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.bus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.endpoint</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.fs</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.item</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.message</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.name</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.pool</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++    </simplelist>
++  </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.pool.xml b/Documentation/kdbus/kdbus.pool.xml
+new file mode 100644
+index 0000000..a9e16f1
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.pool.xml
+@@ -0,0 +1,326 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.pool">
++
++  <refentryinfo>
++    <title>kdbus.pool</title>
++    <productname>kdbus.pool</productname>
++  </refentryinfo>
++
++  <refmeta>
++    <refentrytitle>kdbus.pool</refentrytitle>
++    <manvolnum>7</manvolnum>
++  </refmeta>
++
++  <refnamediv>
++    <refname>kdbus.pool</refname>
++    <refpurpose>kdbus pool</refpurpose>
++  </refnamediv>
++
++  <refsect1>
++    <title>Description</title>
++    <para>
++      A pool for data received from the kernel is installed for every
++      <emphasis>connection</emphasis> of the <emphasis>bus</emphasis>, and
++      is sized according to the information stored in the
++      <varname>pool_size</varname> member of <type>struct kdbus_cmd_hello</type>
++      when <constant>KDBUS_CMD_HELLO</constant> is employed. Internally, the
++      pool is segmented into <emphasis>slices</emphasis>, each referenced by its
++      <emphasis>offset</emphasis> in the pool, expressed in <type>bytes</type>.
++      See
++      <citerefentry>
++        <refentrytitle>kdbus.connection</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for more information about <constant>KDBUS_CMD_HELLO</constant>.
++    </para>
++
++    <para>
++      The pool is written to by the kernel when one of the following
++      <emphasis>ioctls</emphasis> is issued:
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>KDBUS_CMD_HELLO</constant></term>
++          <listitem><para>
++            ... to receive details about the bus the connection was made to
++          </para></listitem>
++        </varlistentry>
++        <varlistentry>
++          <term><constant>KDBUS_CMD_RECV</constant></term>
++          <listitem><para>
++            ... to receive a message
++          </para></listitem>
++        </varlistentry>
++        <varlistentry>
++          <term><constant>KDBUS_CMD_LIST</constant></term>
++          <listitem><para>
++            ... to dump the name registry
++          </para></listitem>
++        </varlistentry>
++        <varlistentry>
++          <term><constant>KDBUS_CMD_CONN_INFO</constant></term>
++          <listitem><para>
++            ... to retrieve information on a connection
++          </para></listitem>
++        </varlistentry>
++        <varlistentry>
++          <term><constant>KDBUS_CMD_BUS_CREATOR_INFO</constant></term>
++          <listitem><para>
++            ... to retrieve information about a connection's bus creator
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++
++    </para>
++    <para>
++      The <varname>offset</varname> fields returned by any of the
++      aforementioned ioctls describe offsets inside the pool. In order to make
++      the slice available for subsequent calls,
++      <constant>KDBUS_CMD_FREE</constant> has to be called on that offset
++      (see below). Otherwise, the pool will fill up, and the connection won't
++      be able to receive any more information through its pool.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Pool slice allocation</title>
++    <para>
++      Pool slices are allocated by the kernel in order to report information
++      back to a task, such as messages or returned name lists.
++      Allocation of pool slices cannot be initiated by userspace. See
++      <citerefentry>
++        <refentrytitle>kdbus.connection</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      and
++      <citerefentry>
++        <refentrytitle>kdbus.name</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for examples of commands that use the <emphasis>pool</emphasis> to
++      return data.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Accessing the pool memory</title>
++    <para>
++      Memory in the pool is read-only for userspace and may only be written
++      to by the kernel. To read from the pool memory, the caller is expected to
++      <citerefentry>
++        <refentrytitle>mmap</refentrytitle>
++        <manvolnum>2</manvolnum>
++      </citerefentry>
++      the buffer into its task, like this:
++    </para>
++    <programlisting>
++uint8_t *buf = mmap(NULL, size, PROT_READ, MAP_SHARED, conn_fd, 0);
++    </programlisting>
++
++    <para>
++      In order to map the entire pool, the <varname>size</varname> parameter in
++      the example above should be set to the value of the
++      <varname>pool_size</varname> member of
++      <type>struct kdbus_cmd_hello</type> when
++      <constant>KDBUS_CMD_HELLO</constant> was employed to create the
++      connection (see above).
++    </para>
++
++    <para>
++      The <emphasis>file descriptor</emphasis> used to map the memory must be
++      the one that was used to create the <emphasis>connection</emphasis>.
++      In other words, the one that was used to call
++      <constant>KDBUS_CMD_HELLO</constant>. See
++      <citerefentry>
++        <refentrytitle>kdbus.connection</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>
++      for more details.
++    </para>
++
++    <para>
++      Alternatively, instead of mapping the entire pool buffer, only parts
++      of it can be mapped. Every kdbus command that returns an
++      <emphasis>offset</emphasis> (see above) also reports a
++      <emphasis>size</emphasis> along with it, so programs can be written
++      to map only the portion of the pool needed to access a specific
++      <emphasis>slice</emphasis>.
++    </para>
++
++    <para>
++      When access to the pool memory is no longer needed, programs should
++      call <function>munmap()</function> on the pointer returned by
++      <function>mmap()</function>.
++    </para>
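++
++    <para>
++      Mapping a single slice could look roughly like the sketch below. Pool
++      offsets are expressed in bytes, while
++      <function>mmap()</function> requires a page-aligned file offset, so the
++      sketch rounds the offset down to a page boundary and keeps a pointer to
++      the start of the slice inside the mapping. The type
++      <type>struct slice_map</type> and both function names are hypothetical
++      and only serve this illustration.
++    </para>
++
++    <programlisting><![CDATA[
++#include <stddef.h>
++#include <stdint.h>
++#include <unistd.h>
++#include <sys/mman.h>
++#include <sys/types.h>
++
++/* Hypothetical bookkeeping for one mapped slice. */
++struct slice_map {
++  void *base;           /* address returned by mmap() */
++  size_t len;           /* length passed to mmap(), needed for munmap() */
++  const uint8_t *data;  /* first byte of the slice itself */
++};
++
++static int slice_map_open(int conn_fd, uint64_t offset, uint64_t size,
++                          struct slice_map *m)
++{
++  long page = sysconf(_SC_PAGESIZE);
++  uint64_t start, lead;
++
++  if (page <= 0)
++    return -1;
++
++  /* Round the slice offset down to the enclosing page boundary. */
++  start = offset - (offset % (uint64_t)page);
++  lead = offset - start;
++
++  m->len = (size_t)(lead + size);
++  m->base = mmap(NULL, m->len, PROT_READ, MAP_SHARED, conn_fd,
++                 (off_t)start);
++  if (m->base == MAP_FAILED)
++    return -1;
++
++  m->data = (const uint8_t *)m->base + lead;
++  return 0;
++}
++
++static void slice_map_close(struct slice_map *m)
++{
++  munmap(m->base, m->len);
++}
++]]></programlisting>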
++  </refsect1>
++
++  <refsect1>
++    <title>Freeing pool slices</title>
++    <para>
++      The <constant>KDBUS_CMD_FREE</constant> ioctl is used to free a slice
++      inside the pool, identified by an offset that was returned in an
++      <varname>offset</varname> field of another ioctl struct.
++      The <constant>KDBUS_CMD_FREE</constant> command takes a
++      <type>struct kdbus_cmd_free</type> as argument.
++    </para>
++
++    <programlisting>
++struct kdbus_cmd_free {
++  __u64 size;
++  __u64 flags;
++  __u64 return_flags;
++  __u64 offset;
++  struct kdbus_item items[0];
++};
++    </programlisting>
++
++    <para>The fields in this struct are described below.</para>
++
++    <variablelist>
++      <varlistentry>
++        <term><varname>size</varname></term>
++        <listitem><para>
++          The overall size of the struct, including its items.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>flags</varname></term>
++        <listitem><para>
++          Currently unused.
++          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++          valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
++          and the <varname>flags</varname> field is set to
++          <constant>0</constant>.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>return_flags</varname></term>
++        <listitem><para>
++          Flags returned by the kernel. Currently unused and always set to
++          <constant>0</constant> by the kernel.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>offset</varname></term>
++        <listitem><para>
++          The offset to free, as returned by other ioctls that allocated
++          memory for returned information.
++        </para></listitem>
++      </varlistentry>
++
++      <varlistentry>
++        <term><varname>items</varname></term>
++        <listitem><para>
++          Items to specify further details for the free command.
++          Currently unused.
++          Unrecognized items are rejected, and the ioctl will fail with
++          <varname>errno</varname> set to <constant>EINVAL</constant>.
++          All items except for
++          <constant>KDBUS_ITEM_NEGOTIATE</constant> (see
++            <citerefentry>
++              <refentrytitle>kdbus.item</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>
++          ) will be rejected.
++        </para></listitem>
++      </varlistentry>
++    </variablelist>
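++
++    <para>
++      A minimal sketch of releasing a slice is shown below. It assumes that
++      the kdbus UAPI header is installed as
++      <filename>linux/kdbus.h</filename>, and the function name
++      <function>pool_free_slice()</function> is hypothetical. No items are
++      attached, so <varname>size</varname> covers only the fixed part of the
++      struct.
++    </para>
++
++    <programlisting><![CDATA[
++#include <stdint.h>
++#include <string.h>
++#include <sys/ioctl.h>
++#include <linux/kdbus.h>  /* assumed install location of the kdbus header */
++
++/* Hypothetical helper: free the pool slice at 'offset'. */
++static int pool_free_slice(int conn_fd, uint64_t offset)
++{
++  struct kdbus_cmd_free cmd;
++
++  memset(&cmd, 0, sizeof(cmd));  /* flags and return_flags start at 0 */
++  cmd.size = sizeof(cmd);        /* no items attached */
++  cmd.offset = offset;
++
++  /* Returns 0 on success, or -1 with errno set on failure. */
++  return ioctl(conn_fd, KDBUS_CMD_FREE, &cmd);
++}
++]]></programlisting>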
++  </refsect1>
++
++  <refsect1>
++    <title>Return value</title>
++    <para>
++      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++      on error, <errorcode>-1</errorcode> is returned, and
++      <varname>errno</varname> is set to indicate the error.
++      If the issued ioctl is illegal for the file descriptor used,
++      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++    </para>
++
++    <refsect2>
++      <title>
++        <constant>KDBUS_CMD_FREE</constant> may fail with the following
++        errors
++      </title>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>ENXIO</constant></term>
++          <listitem><para>
++            No pool slice found at given offset.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            Invalid flags provided.
++          </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>EINVAL</constant></term>
++          <listitem><para>
++            The offset is valid, but the user is not allowed to free the slice.
++            This happens, for example, if the offset was retrieved with
++            <constant>KDBUS_RECV_PEEK</constant>.
++          </para></listitem>
++        </varlistentry>
++      </variablelist>
++    </refsect2>
++  </refsect1>
++
++  <refsect1>
++    <title>See Also</title>
++    <simplelist type="inline">
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.bus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.connection</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.endpoint</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.name</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>mmap</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>munmap</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++    </simplelist>
++  </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.xml b/Documentation/kdbus/kdbus.xml
+new file mode 100644
+index 0000000..d8e7400
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.xml
+@@ -0,0 +1,1012 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus">
++
++  <refentryinfo>
++    <title>kdbus</title>
++    <productname>kdbus</productname>
++  </refentryinfo>
++
++  <refmeta>
++    <refentrytitle>kdbus</refentrytitle>
++    <manvolnum>7</manvolnum>
++  </refmeta>
++
++  <refnamediv>
++    <refname>kdbus</refname>
++    <refpurpose>Kernel Message Bus</refpurpose>
++  </refnamediv>
++
++  <refsect1>
++    <title>Synopsis</title>
++    <para>
++      kdbus is an inter-process communication bus system controlled by the
++      kernel. It provides user-space with an API to create buses and send
++      unicast and multicast messages to one, or many, peers connected to the
++      same bus. It does not enforce any layout on the transmitted data, but
++      only provides the transport layer used for message interchange between
++      peers.
++    </para>
++    <para>
++      This set of man-pages gives a comprehensive overview of the kernel-level
++      API, with all ioctl commands, associated structs and bit masks. However,
++      most people will not use this API level directly, but rather let one of
++      the high-level abstraction libraries help them integrate D-Bus
++      functionality into their applications.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>Description</title>
++    <para>
++      kdbus provides a pseudo filesystem called <emphasis>kdbusfs</emphasis>,
++      which is usually mounted on <filename>/sys/fs/kdbus</filename>. Bus
++      primitives can be accessed as files and sub-directories underneath this
++      mount-point. Any advanced operations are done via
++      <function>ioctl()</function> on files created by
++      <emphasis>kdbusfs</emphasis>. Multiple mount-points of
++      <emphasis>kdbusfs</emphasis> are independent of each other. This allows
++      namespacing of kdbus by mounting a new instance of
++      <emphasis>kdbusfs</emphasis> in a new mount-namespace. kdbus calls these
++      mount instances domains and each bus belongs to exactly one domain.
++    </para>
++
++    <para>
++      kdbus was designed as a transport layer for D-Bus, but is in no way
++      limited to, nor controlled by, the D-Bus protocol specification. The
++      D-Bus protocol is one possible application layer on top of kdbus.
++    </para>
++
++    <para>
++      For the general D-Bus protocol specification, its payload format, its
++      marshaling, and its communication semantics, please refer to the
++      <ulink url="http://dbus.freedesktop.org/doc/dbus-specification.html">
++      D-Bus specification</ulink>.
++    </para>
++
++  </refsect1>
++
++  <refsect1>
++    <title>Terminology</title>
++
++    <refsect2>
++      <title>Domain</title>
++      <para>
++        A domain is a <emphasis>kdbusfs</emphasis> mount-point containing all
++        the bus primitives. Each domain is independent, and separate domains
++        do not affect each other.
++      </para>
++    </refsect2>
++
++    <refsect2>
++      <title>Bus</title>
++      <para>
++        A bus is a named object inside a domain. Clients exchange messages
++        over a bus. Multiple buses themselves have no connection to each other;
++        messages can only be exchanged on the same bus. The default endpoint of
++        a bus, to which clients establish connections, is the "bus" file
++        /sys/fs/kdbus/&lt;bus name&gt;/bus.
++        Common operating system setups create one "system bus" per system,
++        and one "user bus" for every logged-in user. Applications or services
++        may create their own private buses. The kernel driver does not
++        distinguish between different bus types; they are all handled the same
++        way. See
++        <citerefentry>
++          <refentrytitle>kdbus.bus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        for more details.
++      </para>
++    </refsect2>
++
++    <refsect2>
++      <title>Endpoint</title>
++      <para>
++        An endpoint provides a file to talk to a bus. Opening an endpoint
++        creates a new connection to the bus to which the endpoint belongs. All
++        endpoints have unique names and are accessible as files underneath the
++        directory of a bus, e.g., /sys/fs/kdbus/&lt;bus&gt;/&lt;endpoint&gt;.
++        Every bus has a default endpoint called "bus".
++        A bus can optionally offer additional endpoints with custom names
++        to provide restricted access to the bus. Custom endpoints carry
++        additional policy which can be used to create sandboxes with
++        locked-down, limited, filtered access to a bus. See
++        <citerefentry>
++          <refentrytitle>kdbus.endpoint</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        for more details.
++      </para>
++    </refsect2>
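The endpoint path layout described above can be sketched as a small path builder. The helper name `kdbus_endpoint_path` and the `KDBUSFS_ROOT` macro are hypothetical; the sketch assumes kdbusfs is mounted at its usual location and that passing `NULL` for the endpoint should select the default endpoint "bus".

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Assumed mount point; the text only says kdbusfs is "usually" mounted here. */
#define KDBUSFS_ROOT "/sys/fs/kdbus"

/*
 * Hypothetical helper: writes "<root>/<bus>/<endpoint>" into buf.
 * A NULL endpoint selects the default endpoint, which is called "bus".
 * Returns 0 on success, or -1 if the path did not fit into buf.
 */
int kdbus_endpoint_path(char *buf, size_t len, const char *bus,
			const char *endpoint)
{
	int n = snprintf(buf, len, "%s/%s/%s", KDBUSFS_ROOT, bus,
			 endpoint ? endpoint : "bus");

	/* snprintf() reports the length it wanted; detect truncation. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string is what a client would pass to open(2) in order to create a connection on that endpoint.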
++
++    <refsect2>
++      <title>Connection</title>
++      <para>
++        A connection to a bus is created by opening an endpoint file of a
++        bus. Every ordinary client connection has a unique identifier on the
++        bus and can address messages to every other connection on the same
++        bus by using the peer's connection ID as the destination. See
++        <citerefentry>
++          <refentrytitle>kdbus.connection</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        for more details.
++      </para>
++    </refsect2>
++
++    <refsect2>
++      <title>Pool</title>
++      <para>
++        Each connection allocates a piece of shmem-backed memory that is
++        used to receive messages and answers to ioctl commands from the kernel.
++        It is never used to send anything to the kernel. In order to access that
++        memory, an application must mmap() it into its address space. See
++        <citerefentry>
++          <refentrytitle>kdbus.pool</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        for more details.
++      </para>
++    </refsect2>
++
++    <refsect2>
++      <title>Well-known Name</title>
++      <para>
++        A connection can, in addition to its implicit unique connection ID,
++        request the ownership of a textual well-known name. Well-known names are
++        noted in reverse-domain notation, such as com.example.service1. A
++        connection that offers a service on a bus is usually reached by its
++        well-known name. An analogy of connection ID and well-known name is an
++        IP address and a DNS name associated with that address. See
++        <citerefentry>
++          <refentrytitle>kdbus.name</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        for more details.
++      </para>
++    </refsect2>
++
++    <refsect2>
++      <title>Message</title>
++      <para>
++        Connections can exchange messages with other connections by addressing
++        the peers with their connection ID or well-known name. A message
++        consists of a message header with information on how to route the
++        message, and the message payload, which is a logical byte stream of
++        arbitrary size. Messages can carry additional file descriptors to be
++        passed from one connection to another, just like passing file
++        descriptors over UNIX domain sockets. Every connection can specify which
++        set of metadata the kernel should attach to the message when it is
++        delivered to the receiving connection. Metadata contains information
++        like: system time stamps, UID, GID, TID, proc-starttime, well-known
++        names, process comm, process exe, process argv, cgroup, capabilities,
++        seclabel, audit session, loginuid and the connection's human-readable
++        name. See
++        <citerefentry>
++          <refentrytitle>kdbus.message</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        for more details.
++      </para>
++    </refsect2>
++
++    <refsect2>
++      <title>Item</title>
++      <para>
++        The API of kdbus implements the notion of items, submitted through and
++        returned by most ioctls, and stored inside data structures in the
++        connection's pool. See
++        <citerefentry>
++          <refentrytitle>kdbus.item</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        for more details.
++      </para>
++    </refsect2>
++
++    <refsect2>
++      <title>Broadcast, signal, filter, match</title>
++      <para>
++        Signals are messages that a receiver opts in for by installing a blob of
++        bytes, called a 'match'. Signal messages must always carry a
++        counter-part blob, called a 'filter', and signals are only delivered to
++        peers which have a match that white-lists the message's filter. Senders
++        of signal messages can use either a single connection ID as receiver,
++        or the special connection ID
++        <constant>KDBUS_DST_ID_BROADCAST</constant> to potentially send it to
++        all connections of a bus, following the logic described above. See
++        <citerefentry>
++          <refentrytitle>kdbus.match</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        and
++        <citerefentry>
++          <refentrytitle>kdbus.message</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        for more details.
++      </para>
++    </refsect2>
++
++    <refsect2>
++      <title>Policy</title>
++      <para>
++        A policy is a set of rules that define which connections can see, talk
++        to, or register a well-known name on the bus. A policy is attached to
++        buses and custom endpoints, and modified by policy holder connections or
++        owners of custom endpoints. See
++        <citerefentry>
++          <refentrytitle>kdbus.policy</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        for more details.
++      </para>
++    </refsect2>
++
++    <refsect2>
++      <title>Privileged bus users</title>
++      <para>
++        A user connecting to the bus is considered privileged if it is either
++        the creator of the bus, or if it has the CAP_IPC_OWNER capability flag
++        set. See
++        <citerefentry>
++          <refentrytitle>kdbus.connection</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        for more details.
++      </para>
++    </refsect2>
++  </refsect1>
++
++  <refsect1>
++    <title>Bus Layout</title>
++
++    <para>
++      A <emphasis>bus</emphasis> provides and defines an environment that peers
++      can connect to for message interchange. A bus is created via the kdbus
++      control interface and can be modified by the bus creator. It applies the
++      policy that controls all bus operations. The bus creator itself does not
++      participate as a peer. To establish a peer
++      <emphasis>connection</emphasis>, you have to open one of the
++      <emphasis>endpoints</emphasis> of a bus. Each bus provides a default
++      endpoint, but further endpoints can be created on-demand. Endpoints are
++      used to apply additional policies for all connections on this endpoint.
++      Thus, they provide additional filters to further restrict access of
++      specific connections to the bus.
++    </para>
++
++    <para>
++      The following shows an example bus layout:
++    </para>
++
++    <programlisting><![CDATA[
++                                  Bus Creator
++                                       |
++                                       |
++                                    +-----+
++                                    | Bus |
++                                    +-----+
++                                       |
++                    __________________/ \__________________
++                   /                                       \
++                   |                                       |
++             +----------+                             +----------+
++             | Endpoint |                             | Endpoint |
++             +----------+                             +----------+
++         _________/|\_________                   _________/|\_________
++        /          |          \                 /          |          \
++        |          |          |                 |          |          |
++        |          |          |                 |          |          |
++   Connection  Connection  Connection      Connection  Connection  Connection
++    ]]></programlisting>
++
++  </refsect1>
++
++  <refsect1>
++    <title>Data structures and interconnections</title>
++    <programlisting><![CDATA[
++  +--------------------------------------------------------------------------+
++  | Domain (Mount Point)                                                     |
++  | /sys/fs/kdbus/control                                                    |
++  | +----------------------------------------------------------------------+ |
++  | | Bus (System Bus)                                                     | |
++  | | /sys/fs/kdbus/0-system/                                              | |
++  | | +-------------------------------+ +--------------------------------+ | |
++  | | | Endpoint                      | | Endpoint                       | | |
++  | | | /sys/fs/kdbus/0-system/bus    | | /sys/fs/kdbus/0-system/ep.app  | | |
++  | | +-------------------------------+ +--------------------------------+ | |
++  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
++  | | | Connection   | | Connection   | | Connection   | | Connection    | | |
++  | | | :1.22        | | :1.25        | | :1.55        | | :1.81         | | |
++  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
++  | +----------------------------------------------------------------------+ |
++  |                                                                          |
++  | +----------------------------------------------------------------------+ |
++  | | Bus (User Bus for UID 2702)                                          | |
++  | | /sys/fs/kdbus/2702-user/                                             | |
++  | | +-------------------------------+ +--------------------------------+ | |
++  | | | Endpoint                      | | Endpoint                       | | |
++  | | | /sys/fs/kdbus/2702-user/bus   | | /sys/fs/kdbus/2702-user/ep.app | | |
++  | | +-------------------------------+ +--------------------------------+ | |
++  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
++  | | | Connection   | | Connection   | | Connection   | | Connection    | | |
++  | | | :1.22        | | :1.25        | | :1.55        | | :1.81         | | |
++  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
++  | +----------------------------------------------------------------------+ |
++  +--------------------------------------------------------------------------+
++    ]]></programlisting>
++  </refsect1>
++
++  <refsect1>
++    <title>Metadata</title>
++
++    <refsect2>
++      <title>When metadata is collected</title>
++      <para>
++        kdbus records data about the system in certain situations. Such metadata
++        can refer to the currently active process (creds, PIDs, current user
++        groups, process names and its executable path, cgroup membership,
++        capabilities, security label and audit information), connection
++        information (description string, currently owned names) and time stamps.
++      </para>
++      <para>
++        Metadata is collected at the following times.
++      </para>
++
++      <itemizedlist>
++        <listitem><para>
++          When a bus is created (<constant>KDBUS_CMD_BUS_MAKE</constant>),
++          information about the calling task is collected. This data is returned
++          by the kernel via the <constant>KDBUS_CMD_BUS_CREATOR_INFO</constant>
++          call.
++        </para></listitem>
++
++        <listitem>
++          <para>
++            When a connection is created (<constant>KDBUS_CMD_HELLO</constant>),
++            information about the calling task is collected. Alternatively, a
++            privileged connection may provide 'faked' information about
++            credentials, PIDs and security labels which will be stored instead.
++            This data is returned by the kernel as information on a connection
++            (<constant>KDBUS_CMD_CONN_INFO</constant>). Only metadata that a
++            connection allowed to be sent (by setting its bit in
++            <varname>attach_flags_send</varname>) will be exported in this way.
++          </para>
++        </listitem>
++
++        <listitem>
++          <para>
++            When a message is sent (<constant>KDBUS_CMD_SEND</constant>),
++            information about the sending task and the sending connection is
++            collected. This metadata will be attached to the message when it
++            arrives in the receiver's pool. If the connection sending the
++            message installed faked credentials (see
++            <citerefentry>
++              <refentrytitle>kdbus.connection</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>),
++            the message will not be augmented by any information about the
++            currently sending task. Note that only metadata that was requested
++            by the receiving connection will be collected and attached to
++            messages.
++          </para>
++        </listitem>
++      </itemizedlist>
++
++      <para>
++        Which metadata items are actually delivered depends on the following
++        sets and masks:
++      </para>
++
++      <itemizedlist>
++        <listitem><para>
++          (a) the system-wide kmod creds mask
++          (module parameter <varname>attach_flags_mask</varname>)
++        </para></listitem>
++
++        <listitem><para>
++          (b) the per-connection send creds mask, set by the connecting client
++        </para></listitem>
++
++        <listitem><para>
++          (c) the per-connection receive creds mask, set by the connecting
++          client
++        </para></listitem>
++
++        <listitem><para>
++          (d) the per-bus minimal creds mask, set by the bus creator
++        </para></listitem>
++
++        <listitem><para>
++          (e) the per-bus owner creds mask, set by the bus creator
++        </para></listitem>
++
++        <listitem><para>
++          (f) the mask specified when querying creds of a bus peer
++        </para></listitem>
++
++        <listitem><para>
++          (g) the mask specified when querying creds of a bus owner
++        </para></listitem>
++      </itemizedlist>
++
++      <para>
++        With the following rules:
++      </para>
++
++      <itemizedlist>
++        <listitem>
++          <para>
++            [1] The creds attached to messages are determined as
++            <constant>a &amp; b &amp; c</constant>.
++          </para>
++        </listitem>
++
++        <listitem>
++          <para>
++            [2] When connecting to a bus (<constant>KDBUS_CMD_HELLO</constant>),
++            if <constant>(~b &amp; d) != 0</constant>, the call will fail with
++            <errorcode>-1</errorcode>, and <varname>errno</varname> is set to
++            <constant>ECONNREFUSED</constant>.
++          </para>
++        </listitem>
++
++        <listitem>
++          <para>
++            [3] When querying creds of a bus peer, the creds returned are
++            <constant>a &amp; b &amp; f</constant>.
++          </para>
++        </listitem>
++
++        <listitem>
++          <para>
++            [4] When querying creds of a bus owner, the creds returned are
++            <constant>a &amp; e &amp; g</constant>.
++          </para>
++        </listitem>
++      </itemizedlist>
++
++      <para>
++        Hence, a program might not always receive all the metadata items that
++        it requested. Code must be written so that it can cope with this fact.
++      </para>
++    </refsect2>
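Rules [1] to [4] above are plain bitwise expressions over the masks (a) to (g), and can be sketched as follows. The function names are hypothetical and the masks are treated as opaque 64-bit values; no real `KDBUS_ATTACH_*` bit positions are assumed.

```c
#include <stdint.h>
#include <stdbool.h>

/* Rule [1]: creds attached to a message are a & b & c. */
uint64_t creds_for_message(uint64_t a, uint64_t b, uint64_t c)
{
	return a & b & c;
}

/*
 * Rule [2]: KDBUS_CMD_HELLO fails with ECONNREFUSED if the bus requires (d)
 * any bit that the connection is not willing to send (b).
 */
bool hello_refused(uint64_t b, uint64_t d)
{
	return (~b & d) != 0;
}

/* Rule [3]: creds returned when querying a bus peer are a & b & f. */
uint64_t creds_for_peer_query(uint64_t a, uint64_t b, uint64_t f)
{
	return a & b & f;
}

/* Rule [4]: creds returned when querying a bus owner are a & e & g. */
uint64_t creds_for_owner_query(uint64_t a, uint64_t e, uint64_t g)
{
	return a & e & g;
}
```

Note the parentheses in `hello_refused()`: in C, `~b & d != 0` would bind as `~b & (d != 0)`, which is not what rule [2] means.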
++
++    <refsect2>
++      <title>Benefits and heads-up</title>
++      <para>
++        Attaching metadata to messages has two major benefits.
++
++        <itemizedlist>
++          <listitem>
++            <para>
++              Metadata attached to messages is gathered at the moment when the
++              other side calls <constant>KDBUS_CMD_SEND</constant>, or,
++              respectively, when the kernel notification is generated. There is
++              no need for the receiving peer to retrieve information about the
++              task in a second step. This closes a race gap that would otherwise
++              be inherent.
++            </para>
++          </listitem>
++          <listitem>
++            <para>
++              As metadata is delivered along with messages in the same data
++              blob, no extra calls to kernel functions etc. are needed to gather
++              them.
++            </para>
++          </listitem>
++        </itemizedlist>
++
++        Note, however, that collecting metadata does come at a price for
++        performance, so developers should carefully assess which metadata to
++        opt in for. As a best practice, data that is not needed as part
++        of a message should not be requested by the connection in the first
++        place (see <varname>attach_flags_recv</varname> in
++        <constant>KDBUS_CMD_HELLO</constant>).
++      </para>
++    </refsect2>
++
++    <refsect2>
++      <title>Attach flags for metadata items</title>
++      <para>
++        To let the kernel know which metadata information to attach as items
++        to the aforementioned commands, a bitmask is used. In such bitmasks,
++        the following <emphasis>attach flags</emphasis> are currently supported.
++        Both the <varname>attach_flags_recv</varname> and
++        <varname>attach_flags_send</varname> fields of
++        <type>struct kdbus_cmd_hello</type>, as well as the payload of the
++        <constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant> and
++        <constant>KDBUS_ITEM_ATTACH_FLAGS_RECV</constant> items follow this
++        scheme.
++      </para>
++
++      <variablelist>
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_TIMESTAMP</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_TIMESTAMP</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_CREDS</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_CREDS</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_PIDS</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_PIDS</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_AUXGROUPS</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_AUXGROUPS</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_NAMES</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_OWNED_NAME</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_TID_COMM</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_TID_COMM</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_PID_COMM</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_PID_COMM</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_EXE</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_EXE</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_CMDLINE</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_CMDLINE</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_CGROUP</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_CGROUP</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_CAPS</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_CAPS</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_SECLABEL</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_SECLABEL</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_AUDIT</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_AUDIT</constant>.
++            </para></listitem>
++        </varlistentry>
++
++        <varlistentry>
++          <term><constant>KDBUS_ATTACH_CONN_DESCRIPTION</constant></term>
++            <listitem><para>
++              Requests the attachment of an item of type
++              <constant>KDBUS_ITEM_CONN_DESCRIPTION</constant>.
++            </para></listitem>
++        </varlistentry>
++      </variablelist>
++
++      <para>
++        Please refer to
++        <citerefentry>
++          <refentrytitle>kdbus.item</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++        for detailed information about the layout and payload of items and
++        what the metadata should be used for.
++      </para>
++    </refsect2>
++  </refsect1>
++
++  <refsect1>
++    <title>The ioctl interface</title>
++
++    <para>
++      As stated in the 'synopsis' section above, application developers are
++      strongly encouraged to use kdbus through one of the high-level D-Bus
++      abstraction libraries, rather than using the low-level API directly.
++    </para>
++
++    <para>
++      kdbus on the kernel level exposes its functions exclusively through
++      <citerefentry>
++        <refentrytitle>ioctl</refentrytitle>
++        <manvolnum>2</manvolnum>
++      </citerefentry>,
++      employed on file descriptors returned by
++      <citerefentry>
++        <refentrytitle>open</refentrytitle>
++        <manvolnum>2</manvolnum>
++      </citerefentry>
++      on pseudo files exposed by
++      <citerefentry>
++        <refentrytitle>kdbus.fs</refentrytitle>
++        <manvolnum>7</manvolnum>
++      </citerefentry>.
++    </para>
++    <para>
++      Following is a list of all the ioctls, along with the command structs
++      they must be used with.
++    </para>
++
++    <informaltable frame="none">
++      <tgroup cols="3" colsep="1">
++        <thead>
++          <row>
++            <entry>ioctl signature</entry>
++            <entry>command</entry>
++            <entry>transported struct</entry>
++          </row>
++        </thead>
++        <tbody>
++          <row>
++            <entry><constant>0x40189500</constant></entry>
++            <entry><constant>KDBUS_CMD_BUS_MAKE</constant></entry>
++            <entry><type>struct kdbus_cmd *</type></entry>
++          </row><row>
++            <entry><constant>0x40189510</constant></entry>
++            <entry><constant>KDBUS_CMD_ENDPOINT_MAKE</constant></entry>
++            <entry><type>struct kdbus_cmd *</type></entry>
++          </row><row>
++            <entry><constant>0xc0609580</constant></entry>
++            <entry><constant>KDBUS_CMD_HELLO</constant></entry>
++            <entry><type>struct kdbus_cmd_hello *</type></entry>
++          </row><row>
++            <entry><constant>0x40189582</constant></entry>
++            <entry><constant>KDBUS_CMD_BYEBYE</constant></entry>
++            <entry><type>struct kdbus_cmd *</type></entry>
++          </row><row>
++            <entry><constant>0x40389590</constant></entry>
++            <entry><constant>KDBUS_CMD_SEND</constant></entry>
++            <entry><type>struct kdbus_cmd_send *</type></entry>
++          </row><row>
++            <entry><constant>0x80409591</constant></entry>
++            <entry><constant>KDBUS_CMD_RECV</constant></entry>
++            <entry><type>struct kdbus_cmd_recv *</type></entry>
++          </row><row>
++            <entry><constant>0x40209583</constant></entry>
++            <entry><constant>KDBUS_CMD_FREE</constant></entry>
++            <entry><type>struct kdbus_cmd_free *</type></entry>
++          </row><row>
++            <entry><constant>0x401895a0</constant></entry>
++            <entry><constant>KDBUS_CMD_NAME_ACQUIRE</constant></entry>
++            <entry><type>struct kdbus_cmd *</type></entry>
++          </row><row>
++            <entry><constant>0x401895a1</constant></entry>
++            <entry><constant>KDBUS_CMD_NAME_RELEASE</constant></entry>
++            <entry><type>struct kdbus_cmd *</type></entry>
++          </row><row>
++            <entry><constant>0x80289586</constant></entry>
++            <entry><constant>KDBUS_CMD_LIST</constant></entry>
++            <entry><type>struct kdbus_cmd_list *</type></entry>
++          </row><row>
++            <entry><constant>0x80309584</constant></entry>
++            <entry><constant>KDBUS_CMD_CONN_INFO</constant></entry>
++            <entry><type>struct kdbus_cmd_info *</type></entry>
++          </row><row>
++            <entry><constant>0x40209551</constant></entry>
++            <entry><constant>KDBUS_CMD_UPDATE</constant></entry>
++            <entry><type>struct kdbus_cmd *</type></entry>
++          </row><row>
++            <entry><constant>0x80309585</constant></entry>
++            <entry><constant>KDBUS_CMD_BUS_CREATOR_INFO</constant></entry>
++            <entry><type>struct kdbus_cmd_info *</type></entry>
++          </row><row>
++            <entry><constant>0x40189511</constant></entry>
++            <entry><constant>KDBUS_CMD_ENDPOINT_UPDATE</constant></entry>
++            <entry><type>struct kdbus_cmd *</type></entry>
++          </row><row>
++            <entry><constant>0x402095b0</constant></entry>
++            <entry><constant>KDBUS_CMD_MATCH_ADD</constant></entry>
++            <entry><type>struct kdbus_cmd_match *</type></entry>
++          </row><row>
++            <entry><constant>0x402095b1</constant></entry>
++            <entry><constant>KDBUS_CMD_MATCH_REMOVE</constant></entry>
++            <entry><type>struct kdbus_cmd_match *</type></entry>
++          </row>
++        </tbody>
++      </tgroup>
++    </informaltable>
++
++    <para>
++      Depending on the type of <emphasis>kdbusfs</emphasis> node that was
++      opened and what ioctls have been executed on a file descriptor before,
++      a different sub-set of ioctl commands is allowed.
++    </para>
++
++    <itemizedlist>
++      <listitem>
++        <para>
++          On a file descriptor resulting from opening a
++          <emphasis>control node</emphasis>, only the
++          <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl may be executed.
++        </para>
++      </listitem>
++      <listitem>
++        <para>
++          On a file descriptor resulting from opening a
++          <emphasis>bus endpoint node</emphasis>, only the
++          <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> and
++          <constant>KDBUS_CMD_HELLO</constant> ioctls may be executed.
++        </para>
++      </listitem>
++      <listitem>
++        <para>
++          A file descriptor that was used to create a bus
++          (via <constant>KDBUS_CMD_BUS_MAKE</constant>) is called a
++          <emphasis>bus owner</emphasis> file descriptor. The bus will be
++          active as long as the file descriptor is kept open.
++          A bus owner file descriptor cannot be used to
++          issue any further ioctls. As soon as
++          <citerefentry>
++            <refentrytitle>close</refentrytitle>
++            <manvolnum>2</manvolnum>
++          </citerefentry>
++          is called on it, the bus will be shut down, along with all associated
++          endpoints and connections. See
++          <citerefentry>
++            <refentrytitle>kdbus.bus</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>
++          for more details.
++        </para>
++      </listitem>
++      <listitem>
++        <para>
++          A file descriptor that was used to create an endpoint
++          (via <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>) is called an
++          <emphasis>endpoint owner</emphasis> file descriptor. The endpoint
++          will be active as long as the file descriptor is kept open.
++          An endpoint owner file descriptor can only be used
++          to update details of an endpoint through the
++          <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant> ioctl. As soon as
++          <citerefentry>
++            <refentrytitle>close</refentrytitle>
++            <manvolnum>2</manvolnum>
++          </citerefentry>
++          is called on it, the endpoint will be removed from the bus, and all
++          connections that are connected to the bus through it are shut down.
++          See
++          <citerefentry>
++            <refentrytitle>kdbus.endpoint</refentrytitle>
++            <manvolnum>7</manvolnum>
++          </citerefentry>
++          for more details.
++        </para>
++      </listitem>
++      <listitem>
++        <para>
++          A file descriptor that was used to create a connection
++          (via <constant>KDBUS_CMD_HELLO</constant>) is called a
++          <emphasis>connection owner</emphasis> file descriptor. The connection
++          will be active as long as the file descriptor is kept open.
++          A connection owner file descriptor may be used to
++          issue any of the following ioctls.
++        </para>
++
++        <itemizedlist>
++          <listitem><para>
++            <constant>KDBUS_CMD_UPDATE</constant> to tweak details of the
++            connection. See
++            <citerefentry>
++              <refentrytitle>kdbus.connection</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>.
++          </para></listitem>
++
++          <listitem><para>
++            <constant>KDBUS_CMD_BYEBYE</constant> to shut down a connection
++            without losing messages. See
++            <citerefentry>
++              <refentrytitle>kdbus.connection</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>.
++          </para></listitem>
++
++          <listitem><para>
++            <constant>KDBUS_CMD_FREE</constant> to free a slice of memory in
++            the pool. See
++            <citerefentry>
++              <refentrytitle>kdbus.pool</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>.
++          </para></listitem>
++
++          <listitem><para>
++            <constant>KDBUS_CMD_CONN_INFO</constant> to retrieve information
++            on other connections on the bus. See
++            <citerefentry>
++              <refentrytitle>kdbus.connection</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>.
++          </para></listitem>
++
++          <listitem><para>
++            <constant>KDBUS_CMD_BUS_CREATOR_INFO</constant> to retrieve
++            information on the bus creator. See
++            <citerefentry>
++              <refentrytitle>kdbus.connection</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>.
++          </para></listitem>
++
++          <listitem><para>
++            <constant>KDBUS_CMD_LIST</constant> to retrieve a list of
++            currently active well-known names and unique IDs on the bus. See
++            <citerefentry>
++              <refentrytitle>kdbus.name</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>.
++          </para></listitem>
++
++          <listitem><para>
++            <constant>KDBUS_CMD_SEND</constant> and
++            <constant>KDBUS_CMD_RECV</constant> to send or receive a message.
++            See
++            <citerefentry>
++              <refentrytitle>kdbus.message</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>.
++          </para></listitem>
++
++          <listitem><para>
++            <constant>KDBUS_CMD_NAME_ACQUIRE</constant> and
++            <constant>KDBUS_CMD_NAME_RELEASE</constant> to acquire or release
++            a well-known name on the bus. See
++            <citerefentry>
++              <refentrytitle>kdbus.name</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>.
++          </para></listitem>
++
++          <listitem><para>
++            <constant>KDBUS_CMD_MATCH_ADD</constant> and
++            <constant>KDBUS_CMD_MATCH_REMOVE</constant> to add or remove
++            a match for signal messages. See
++            <citerefentry>
++              <refentrytitle>kdbus.match</refentrytitle>
++              <manvolnum>7</manvolnum>
++            </citerefentry>.
++          </para></listitem>
++        </itemizedlist>
++      </listitem>
++    </itemizedlist>
++
++    <para>
++      These ioctls, along with the structs they transport, are explained in
++      detail in the other documents linked to in the "See Also" section below.
++    </para>
++  </refsect1>
++
++  <refsect1>
++    <title>See Also</title>
++    <simplelist type="inline">
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.bus</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.connection</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.endpoint</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.fs</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.item</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.message</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.name</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>kdbus.pool</refentrytitle>
++          <manvolnum>7</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>ioctl</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>mmap</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>open</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <citerefentry>
++          <refentrytitle>close</refentrytitle>
++          <manvolnum>2</manvolnum>
++        </citerefentry>
++      </member>
++      <member>
++        <ulink url="http://freedesktop.org/wiki/Software/dbus">D-Bus</ulink>
++      </member>
++    </simplelist>
++  </refsect1>
++
++</refentry>
+diff --git a/Documentation/kdbus/stylesheet.xsl b/Documentation/kdbus/stylesheet.xsl
+new file mode 100644
+index 0000000..52565ea
+--- /dev/null
++++ b/Documentation/kdbus/stylesheet.xsl
+@@ -0,0 +1,16 @@
++<?xml version="1.0" encoding="UTF-8"?>
++<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="1.0">
++	<param name="chunk.quietly">1</param>
++	<param name="funcsynopsis.style">ansi</param>
++	<param name="funcsynopsis.tabular.threshold">80</param>
++	<param name="callout.graphics">0</param>
++	<param name="paper.type">A4</param>
++	<param name="generate.section.toc.level">2</param>
++	<param name="use.id.as.filename">1</param>
++	<param name="citerefentry.link">1</param>
++	<strip-space elements="*"/>
++	<template name="generate.citerefentry.link">
++		<value-of select="refentrytitle"/>
++		<text>.html</text>
++	</template>
++</stylesheet>
+diff --git a/MAINTAINERS b/MAINTAINERS
+index d8afd29..02f7668 100644
+--- a/MAINTAINERS
++++ b/MAINTAINERS
+@@ -5585,6 +5585,19 @@ S:	Maintained
+ F:	Documentation/kbuild/kconfig-language.txt
+ F:	scripts/kconfig/
+ 
++KDBUS
++M:	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++M:	Daniel Mack <daniel@zonque.org>
++M:	David Herrmann <dh.herrmann@googlemail.com>
++M:	Djalal Harouni <tixxdz@opendz.org>
++L:	linux-kernel@vger.kernel.org
++S:	Maintained
++F:	ipc/kdbus/*
++F:	samples/kdbus/*
++F:	Documentation/kdbus/*
++F:	include/uapi/linux/kdbus.h
++F:	tools/testing/selftests/kdbus/
++
+ KDUMP
+ M:	Vivek Goyal <vgoyal@redhat.com>
+ M:	Haren Myneni <hbabu@us.ibm.com>
+diff --git a/Makefile b/Makefile
+index f5c8983..a1c8d57 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1343,6 +1343,7 @@ $(help-board-dirs): help-%:
+ %docs: scripts_basic FORCE
+ 	$(Q)$(MAKE) $(build)=scripts build_docproc
+ 	$(Q)$(MAKE) $(build)=Documentation/DocBook $@
++	$(Q)$(MAKE) $(build)=Documentation/kdbus $@
+ 
+ else # KBUILD_EXTMOD
+ 
+diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
+index 1a0006a..4842a98 100644
+--- a/include/uapi/linux/Kbuild
++++ b/include/uapi/linux/Kbuild
+@@ -215,6 +215,7 @@ header-y += ixjuser.h
+ header-y += jffs2.h
+ header-y += joystick.h
+ header-y += kcmp.h
++header-y += kdbus.h
+ header-y += kdev_t.h
+ header-y += kd.h
+ header-y += kernelcapi.h
+diff --git a/include/uapi/linux/kdbus.h b/include/uapi/linux/kdbus.h
+new file mode 100644
+index 0000000..4fc44cb
+--- /dev/null
++++ b/include/uapi/linux/kdbus.h
+@@ -0,0 +1,984 @@
++/*
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef _UAPI_KDBUS_H_
++#define _UAPI_KDBUS_H_
++
++#include <linux/ioctl.h>
++#include <linux/types.h>
++
++#define KDBUS_IOCTL_MAGIC		0x95
++#define KDBUS_SRC_ID_KERNEL		(0)
++#define KDBUS_DST_ID_NAME		(0)
++#define KDBUS_MATCH_ID_ANY		(~0ULL)
++#define KDBUS_DST_ID_BROADCAST		(~0ULL)
++#define KDBUS_FLAG_NEGOTIATE		(1ULL << 63)
++
++/**
++ * struct kdbus_notify_id_change - name registry change message
++ * @id:			New or former owner of the name
++ * @flags:		flags field from KDBUS_HELLO_*
++ *
++ * Sent from kernel to userspace when the owner or activator of
++ * a well-known name changes.
++ *
++ * Attached to:
++ *   KDBUS_ITEM_ID_ADD
++ *   KDBUS_ITEM_ID_REMOVE
++ */
++struct kdbus_notify_id_change {
++	__u64 id;
++	__u64 flags;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_notify_name_change - name registry change message
++ * @old_id:		ID and flags of former owner of a name
++ * @new_id:		ID and flags of new owner of a name
++ * @name:		Well-known name
++ *
++ * Sent from kernel to userspace when the owner or activator of
++ * a well-known name changes.
++ *
++ * Attached to:
++ *   KDBUS_ITEM_NAME_ADD
++ *   KDBUS_ITEM_NAME_REMOVE
++ *   KDBUS_ITEM_NAME_CHANGE
++ */
++struct kdbus_notify_name_change {
++	struct kdbus_notify_id_change old_id;
++	struct kdbus_notify_id_change new_id;
++	char name[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_creds - process credentials
++ * @uid:		User ID
++ * @euid:		Effective UID
++ * @suid:		Saved UID
++ * @fsuid:		Filesystem UID
++ * @gid:		Group ID
++ * @egid:		Effective GID
++ * @sgid:		Saved GID
++ * @fsgid:		Filesystem GID
++ *
++ * Attached to:
++ *   KDBUS_ITEM_CREDS
++ */
++struct kdbus_creds {
++	__u64 uid;
++	__u64 euid;
++	__u64 suid;
++	__u64 fsuid;
++	__u64 gid;
++	__u64 egid;
++	__u64 sgid;
++	__u64 fsgid;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_pids - process identifiers
++ * @pid:		Process ID
++ * @tid:		Thread ID
++ * @ppid:		Parent process ID
++ *
++ * The PID, TID and parent PID of a process.
++ *
++ * Attached to:
++ *   KDBUS_ITEM_PIDS
++ */
++struct kdbus_pids {
++	__u64 pid;
++	__u64 tid;
++	__u64 ppid;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_caps - process capabilities
++ * @last_cap:	Highest currently known capability bit
++ * @caps:	Variable number of 32-bit capabilities flags
++ *
++ * Contains a variable number of 32-bit capabilities flags.
++ *
++ * Attached to:
++ *   KDBUS_ITEM_CAPS
++ */
++struct kdbus_caps {
++	__u32 last_cap;
++	__u32 caps[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_audit - audit information
++ * @sessionid:		The audit session ID
++ * @loginuid:		The audit login uid
++ *
++ * Attached to:
++ *   KDBUS_ITEM_AUDIT
++ */
++struct kdbus_audit {
++	__u32 sessionid;
++	__u32 loginuid;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_timestamp
++ * @seqnum:		Global per-domain message sequence number
++ * @monotonic_ns:	Monotonic timestamp, in nanoseconds
++ * @realtime_ns:	Realtime timestamp, in nanoseconds
++ *
++ * Attached to:
++ *   KDBUS_ITEM_TIMESTAMP
++ */
++struct kdbus_timestamp {
++	__u64 seqnum;
++	__u64 monotonic_ns;
++	__u64 realtime_ns;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_vec - I/O vector for kdbus payload items
++ * @size:		The size of the vector
++ * @address:		Memory address of data buffer
++ * @offset:		Offset in the in-message payload memory,
++ *			relative to the message head
++ *
++ * Attached to:
++ *   KDBUS_ITEM_PAYLOAD_VEC, KDBUS_ITEM_PAYLOAD_OFF
++ */
++struct kdbus_vec {
++	__u64 size;
++	union {
++		__u64 address;
++		__u64 offset;
++	};
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_bloom_parameter - bus-wide bloom parameters
++ * @size:		Size of the bit field in bytes (m / 8)
++ * @n_hash:		Number of hash functions used (k)
++ */
++struct kdbus_bloom_parameter {
++	__u64 size;
++	__u64 n_hash;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_bloom_filter - bloom filter containing n elements
++ * @generation:		Generation of the element set in the filter
++ * @data:		Bit field, multiple of 8 bytes
++ */
++struct kdbus_bloom_filter {
++	__u64 generation;
++	__u64 data[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_memfd - a kdbus memfd
++ * @start:		The offset into the memfd where the segment starts
++ * @size:		The size of the memfd segment
++ * @fd:			The file descriptor number
++ * @__pad:		Padding to ensure proper alignment and size
++ *
++ * Attached to:
++ *   KDBUS_ITEM_PAYLOAD_MEMFD
++ */
++struct kdbus_memfd {
++	__u64 start;
++	__u64 size;
++	int fd;
++	__u32 __pad;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_name - a registered well-known name with its flags
++ * @flags:		Flags from KDBUS_NAME_*
++ * @name:		Well-known name
++ *
++ * Attached to:
++ *   KDBUS_ITEM_OWNED_NAME
++ */
++struct kdbus_name {
++	__u64 flags;
++	char name[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_policy_access_type - permissions of a policy record
++ * @_KDBUS_POLICY_ACCESS_NULL:	Uninitialized/invalid
++ * @KDBUS_POLICY_ACCESS_USER:	Grant access to a uid
++ * @KDBUS_POLICY_ACCESS_GROUP:	Grant access to gid
++ * @KDBUS_POLICY_ACCESS_WORLD:	World-accessible
++ */
++enum kdbus_policy_access_type {
++	_KDBUS_POLICY_ACCESS_NULL,
++	KDBUS_POLICY_ACCESS_USER,
++	KDBUS_POLICY_ACCESS_GROUP,
++	KDBUS_POLICY_ACCESS_WORLD,
++};
++
++/**
++ * enum kdbus_policy_type - mode flags
++ * @KDBUS_POLICY_OWN:		Allow to own a well-known name
++ *				Implies KDBUS_POLICY_TALK and KDBUS_POLICY_SEE
++ * @KDBUS_POLICY_TALK:		Allow communication to a well-known name
++ *				Implies KDBUS_POLICY_SEE
++ * @KDBUS_POLICY_SEE:		Allow to see a well-known name
++ */
++enum kdbus_policy_type {
++	KDBUS_POLICY_SEE	= 0,
++	KDBUS_POLICY_TALK,
++	KDBUS_POLICY_OWN,
++};
++
++/**
++ * struct kdbus_policy_access - policy access item
++ * @type:		One of KDBUS_POLICY_ACCESS_* types
++ * @access:		Access to grant
++ * @id:			For KDBUS_POLICY_ACCESS_USER, the uid
++ *			For KDBUS_POLICY_ACCESS_GROUP, the gid
++ */
++struct kdbus_policy_access {
++	__u64 type;	/* USER, GROUP, WORLD */
++	__u64 access;	/* OWN, TALK, SEE */
++	__u64 id;	/* uid, gid, 0 */
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_attach_flags - flags for metadata attachments
++ * @KDBUS_ATTACH_TIMESTAMP:		Timestamp
++ * @KDBUS_ATTACH_CREDS:			Credentials
++ * @KDBUS_ATTACH_PIDS:			PIDs
++ * @KDBUS_ATTACH_AUXGROUPS:		Auxiliary groups
++ * @KDBUS_ATTACH_NAMES:			Well-known names
++ * @KDBUS_ATTACH_TID_COMM:		The "comm" process identifier of the TID
++ * @KDBUS_ATTACH_PID_COMM:		The "comm" process identifier of the PID
++ * @KDBUS_ATTACH_EXE:			The path of the executable
++ * @KDBUS_ATTACH_CMDLINE:		The process command line
++ * @KDBUS_ATTACH_CGROUP:		The cgroup membership
++ * @KDBUS_ATTACH_CAPS:			The process capabilities
++ * @KDBUS_ATTACH_SECLABEL:		The security label
++ * @KDBUS_ATTACH_AUDIT:			The audit IDs
++ * @KDBUS_ATTACH_CONN_DESCRIPTION:	The human-readable connection name
++ * @_KDBUS_ATTACH_ALL:			All of the above
++ * @_KDBUS_ATTACH_ANY:			Wildcard match to enable any kind of
++ *					metadata.
++ */
++enum kdbus_attach_flags {
++	KDBUS_ATTACH_TIMESTAMP		=  1ULL <<  0,
++	KDBUS_ATTACH_CREDS		=  1ULL <<  1,
++	KDBUS_ATTACH_PIDS		=  1ULL <<  2,
++	KDBUS_ATTACH_AUXGROUPS		=  1ULL <<  3,
++	KDBUS_ATTACH_NAMES		=  1ULL <<  4,
++	KDBUS_ATTACH_TID_COMM		=  1ULL <<  5,
++	KDBUS_ATTACH_PID_COMM		=  1ULL <<  6,
++	KDBUS_ATTACH_EXE		=  1ULL <<  7,
++	KDBUS_ATTACH_CMDLINE		=  1ULL <<  8,
++	KDBUS_ATTACH_CGROUP		=  1ULL <<  9,
++	KDBUS_ATTACH_CAPS		=  1ULL << 10,
++	KDBUS_ATTACH_SECLABEL		=  1ULL << 11,
++	KDBUS_ATTACH_AUDIT		=  1ULL << 12,
++	KDBUS_ATTACH_CONN_DESCRIPTION	=  1ULL << 13,
++	_KDBUS_ATTACH_ALL		=  (1ULL << 14) - 1,
++	_KDBUS_ATTACH_ANY		=  ~0ULL
++};
++
++/**
++ * enum kdbus_item_type - item types to chain data in a list
++ * @_KDBUS_ITEM_NULL:			Uninitialized/invalid
++ * @_KDBUS_ITEM_USER_BASE:		Start of user items
++ * @KDBUS_ITEM_NEGOTIATE:		Negotiate supported items
++ * @KDBUS_ITEM_PAYLOAD_VEC:		Vector to data
++ * @KDBUS_ITEM_PAYLOAD_OFF:		Data at returned offset to message head
++ * @KDBUS_ITEM_PAYLOAD_MEMFD:		Data as sealed memfd
++ * @KDBUS_ITEM_FDS:			Attached file descriptors
++ * @KDBUS_ITEM_CANCEL_FD:		FD used to cancel a synchronous
++ *					operation by writing to it from
++ *					userspace
++ * @KDBUS_ITEM_BLOOM_PARAMETER:		Bus-wide bloom parameters, used with
++ *					KDBUS_CMD_BUS_MAKE, carries a
++ *					struct kdbus_bloom_parameter
++ * @KDBUS_ITEM_BLOOM_FILTER:		Bloom filter carried with a message,
++ *					used to match against a bloom mask of a
++ *					connection, carries a struct
++ *					kdbus_bloom_filter
++ * @KDBUS_ITEM_BLOOM_MASK:		Bloom mask used to match against a
++ *					message's bloom filter
++ * @KDBUS_ITEM_DST_NAME:		Destination's well-known name
++ * @KDBUS_ITEM_MAKE_NAME:		Name of domain, bus, endpoint
++ * @KDBUS_ITEM_ATTACH_FLAGS_SEND:	Attach-flags, used for updating which
++ *					metadata a connection opts in to send
++ * @KDBUS_ITEM_ATTACH_FLAGS_RECV:	Attach-flags, used for updating which
++ *					metadata a connection requests to
++ *					receive for each received message
++ * @KDBUS_ITEM_ID:			Connection ID
++ * @KDBUS_ITEM_NAME:			Well-known name with flags
++ * @_KDBUS_ITEM_ATTACH_BASE:		Start of metadata attach items
++ * @KDBUS_ITEM_TIMESTAMP:		Timestamp
++ * @KDBUS_ITEM_CREDS:			Process credentials
++ * @KDBUS_ITEM_PIDS:			Process identifiers
++ * @KDBUS_ITEM_AUXGROUPS:		Auxiliary process groups
++ * @KDBUS_ITEM_OWNED_NAME:		A name owned by the associated
++ *					connection
++ * @KDBUS_ITEM_TID_COMM:		Thread ID "comm" identifier
++ *					(Don't trust this, see below.)
++ * @KDBUS_ITEM_PID_COMM:		Process ID "comm" identifier
++ *					(Don't trust this, see below.)
++ * @KDBUS_ITEM_EXE:			The path of the executable
++ *					(Don't trust this, see below.)
++ * @KDBUS_ITEM_CMDLINE:			The process command line
++ *					(Don't trust this, see below.)
++ * @KDBUS_ITEM_CGROUP:			The cgroup membership
++ * @KDBUS_ITEM_CAPS:			The process capabilities
++ * @KDBUS_ITEM_SECLABEL:		The security label
++ * @KDBUS_ITEM_AUDIT:			The audit IDs
++ * @KDBUS_ITEM_CONN_DESCRIPTION:	The connection's human-readable name
++ *					(debugging)
++ * @_KDBUS_ITEM_POLICY_BASE:		Start of policy items
++ * @KDBUS_ITEM_POLICY_ACCESS:		Policy access block
++ * @_KDBUS_ITEM_KERNEL_BASE:		Start of kernel-generated message items
++ * @KDBUS_ITEM_NAME_ADD:		Notification in kdbus_notify_name_change
++ * @KDBUS_ITEM_NAME_REMOVE:		Notification in kdbus_notify_name_change
++ * @KDBUS_ITEM_NAME_CHANGE:		Notification in kdbus_notify_name_change
++ * @KDBUS_ITEM_ID_ADD:			Notification in kdbus_notify_id_change
++ * @KDBUS_ITEM_ID_REMOVE:		Notification in kdbus_notify_id_change
++ * @KDBUS_ITEM_REPLY_TIMEOUT:		Timeout has been reached
++ * @KDBUS_ITEM_REPLY_DEAD:		Destination died
++ *
++ * N.B: The process and thread COMM fields, as well as the CMDLINE and
++ * EXE fields may be altered by unprivileged processes and should
++ * hence *not* be used for security decisions. Peers should make use of
++ * these items only for informational purposes, such as generating log
++ * records.
++ */
++enum kdbus_item_type {
++	_KDBUS_ITEM_NULL,
++	_KDBUS_ITEM_USER_BASE,
++	KDBUS_ITEM_NEGOTIATE	= _KDBUS_ITEM_USER_BASE,
++	KDBUS_ITEM_PAYLOAD_VEC,
++	KDBUS_ITEM_PAYLOAD_OFF,
++	KDBUS_ITEM_PAYLOAD_MEMFD,
++	KDBUS_ITEM_FDS,
++	KDBUS_ITEM_CANCEL_FD,
++	KDBUS_ITEM_BLOOM_PARAMETER,
++	KDBUS_ITEM_BLOOM_FILTER,
++	KDBUS_ITEM_BLOOM_MASK,
++	KDBUS_ITEM_DST_NAME,
++	KDBUS_ITEM_MAKE_NAME,
++	KDBUS_ITEM_ATTACH_FLAGS_SEND,
++	KDBUS_ITEM_ATTACH_FLAGS_RECV,
++	KDBUS_ITEM_ID,
++	KDBUS_ITEM_NAME,
++	KDBUS_ITEM_DST_ID,
++
++	/* keep these item types in sync with KDBUS_ATTACH_* flags */
++	_KDBUS_ITEM_ATTACH_BASE	= 0x1000,
++	KDBUS_ITEM_TIMESTAMP	= _KDBUS_ITEM_ATTACH_BASE,
++	KDBUS_ITEM_CREDS,
++	KDBUS_ITEM_PIDS,
++	KDBUS_ITEM_AUXGROUPS,
++	KDBUS_ITEM_OWNED_NAME,
++	KDBUS_ITEM_TID_COMM,
++	KDBUS_ITEM_PID_COMM,
++	KDBUS_ITEM_EXE,
++	KDBUS_ITEM_CMDLINE,
++	KDBUS_ITEM_CGROUP,
++	KDBUS_ITEM_CAPS,
++	KDBUS_ITEM_SECLABEL,
++	KDBUS_ITEM_AUDIT,
++	KDBUS_ITEM_CONN_DESCRIPTION,
++
++	_KDBUS_ITEM_POLICY_BASE	= 0x2000,
++	KDBUS_ITEM_POLICY_ACCESS = _KDBUS_ITEM_POLICY_BASE,
++
++	_KDBUS_ITEM_KERNEL_BASE	= 0x8000,
++	KDBUS_ITEM_NAME_ADD	= _KDBUS_ITEM_KERNEL_BASE,
++	KDBUS_ITEM_NAME_REMOVE,
++	KDBUS_ITEM_NAME_CHANGE,
++	KDBUS_ITEM_ID_ADD,
++	KDBUS_ITEM_ID_REMOVE,
++	KDBUS_ITEM_REPLY_TIMEOUT,
++	KDBUS_ITEM_REPLY_DEAD,
++};
++
++/**
++ * struct kdbus_item - chain of data blocks
++ * @size:		Overall data record size
++ * @type:		Kdbus_item type of data
++ * @data:		Generic bytes
++ * @data32:		Generic 32 bit array
++ * @data64:		Generic 64 bit array
++ * @str:		Generic string
++ * @id:			Connection ID
++ * @vec:		KDBUS_ITEM_PAYLOAD_VEC
++ * @creds:		KDBUS_ITEM_CREDS
++ * @audit:		KDBUS_ITEM_AUDIT
++ * @timestamp:		KDBUS_ITEM_TIMESTAMP
++ * @name:		KDBUS_ITEM_NAME
++ * @bloom_parameter:	KDBUS_ITEM_BLOOM_PARAMETER
++ * @bloom_filter:	KDBUS_ITEM_BLOOM_FILTER
++ * @memfd:		KDBUS_ITEM_PAYLOAD_MEMFD
++ * @name_change:	KDBUS_ITEM_NAME_ADD
++ *			KDBUS_ITEM_NAME_REMOVE
++ *			KDBUS_ITEM_NAME_CHANGE
++ * @id_change:		KDBUS_ITEM_ID_ADD
++ *			KDBUS_ITEM_ID_REMOVE
++ * @policy:		KDBUS_ITEM_POLICY_ACCESS
++ */
++struct kdbus_item {
++	__u64 size;
++	__u64 type;
++	union {
++		__u8 data[0];
++		__u32 data32[0];
++		__u64 data64[0];
++		char str[0];
++
++		__u64 id;
++		struct kdbus_vec vec;
++		struct kdbus_creds creds;
++		struct kdbus_pids pids;
++		struct kdbus_audit audit;
++		struct kdbus_caps caps;
++		struct kdbus_timestamp timestamp;
++		struct kdbus_name name;
++		struct kdbus_bloom_parameter bloom_parameter;
++		struct kdbus_bloom_filter bloom_filter;
++		struct kdbus_memfd memfd;
++		int fds[0];
++		struct kdbus_notify_name_change name_change;
++		struct kdbus_notify_id_change id_change;
++		struct kdbus_policy_access policy_access;
++	};
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_msg_flags - type of message
++ * @KDBUS_MSG_EXPECT_REPLY:	Expect a reply message, used for
++ *				method calls. The userspace-supplied
++ *				cookie identifies the message and the
++ *				respective reply carries the cookie
++ *				in cookie_reply
++ * @KDBUS_MSG_NO_AUTO_START:	Do not start a service if the addressed
++ *				name is not currently active. This flag is
++ *				not looked at by the kernel but only
++ *				serves as hint for userspace implementations.
++ * @KDBUS_MSG_SIGNAL:		Treat this message as signal
++ */
++enum kdbus_msg_flags {
++	KDBUS_MSG_EXPECT_REPLY	= 1ULL << 0,
++	KDBUS_MSG_NO_AUTO_START	= 1ULL << 1,
++	KDBUS_MSG_SIGNAL	= 1ULL << 2,
++};
++
++/**
++ * enum kdbus_payload_type - type of payload carried by message
++ * @KDBUS_PAYLOAD_KERNEL:	Kernel-generated simple message
++ * @KDBUS_PAYLOAD_DBUS:		D-Bus marshalling "DBusDBus"
++ *
++ * Any payload-type is accepted. Common types will get added here once
++ * established.
++ */
++enum kdbus_payload_type {
++	KDBUS_PAYLOAD_KERNEL,
++	KDBUS_PAYLOAD_DBUS	= 0x4442757344427573ULL,
++};
++
++/**
++ * struct kdbus_msg - the representation of a kdbus message
++ * @size:		Total size of the message
++ * @flags:		Message flags (KDBUS_MSG_*), userspace → kernel
++ * @priority:		Message queue priority value
++ * @dst_id:		64-bit ID of the destination connection
++ * @src_id:		64-bit ID of the source connection
++ * @payload_type:	Payload type (KDBUS_PAYLOAD_*)
++ * @cookie:		Userspace-supplied cookie, for the connection
++ *			to identify its messages
++ * @timeout_ns:		The time to wait for a message reply from the peer.
++ *			If there is no reply, and the send command is
++ *			executed asynchronously, a kernel-generated message
++ *			with an attached KDBUS_ITEM_REPLY_TIMEOUT item
++ *			is sent to @src_id. For synchronously executed send
++ *			command, the value denotes the maximum time the call
++ *			blocks to wait for a reply. The timeout is expected in
++ *			nanoseconds and as absolute CLOCK_MONOTONIC value.
++ * @cookie_reply:	A reply to the requesting message with the same
++ *			cookie. The requesting connection can match its
++ *			request and the reply with this value
++ * @items:		A list of kdbus_items containing the message payload
++ */
++struct kdbus_msg {
++	__u64 size;
++	__u64 flags;
++	__s64 priority;
++	__u64 dst_id;
++	__u64 src_id;
++	__u64 payload_type;
++	__u64 cookie;
++	union {
++		__u64 timeout_ns;
++		__u64 cookie_reply;
++	};
++	struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_msg_info - returned message container
++ * @offset:		Offset of kdbus_msg slice in pool
++ * @msg_size:		Copy of the kdbus_msg.size field
++ * @return_flags:	Command return flags, kernel → userspace
++ */
++struct kdbus_msg_info {
++	__u64 offset;
++	__u64 msg_size;
++	__u64 return_flags;
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_send_flags - flags for sending messages
++ * @KDBUS_SEND_SYNC_REPLY:	Wait for destination connection to
++ *				reply to this message. The
++ *				KDBUS_CMD_SEND ioctl() will block
++ *				until the reply is received, and
++ *				reply in struct kdbus_cmd_send will
++ *				yield the offset in the sender's pool
++ *				where the reply can be found.
++ *				This flag is only valid if
++ *				@KDBUS_MSG_EXPECT_REPLY is set as well.
++ */
++enum kdbus_send_flags {
++	KDBUS_SEND_SYNC_REPLY		= 1ULL << 0,
++};
++
++/**
++ * struct kdbus_cmd_send - send message
++ * @size:		Overall size of this structure
++ * @flags:		Flags to change send behavior (KDBUS_SEND_*)
++ * @return_flags:	Command return flags, kernel → userspace
++ * @msg_address:	Storage address of the kdbus_msg to send
++ * @reply:		Storage for message reply if KDBUS_SEND_SYNC_REPLY
++ *			was given
++ * @items:		Additional items for this command
++ */
++struct kdbus_cmd_send {
++	__u64 size;
++	__u64 flags;
++	__u64 return_flags;
++	__u64 msg_address;
++	struct kdbus_msg_info reply;
++	struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_recv_flags - flags for de-queuing messages
++ * @KDBUS_RECV_PEEK:		Return the next queued message without
++ *				actually de-queuing it, and without installing
++ *				any file descriptors or other resources. It is
++ *				usually used to determine the activating
++ *				connection of a bus name.
++ * @KDBUS_RECV_DROP:		Drop and free the next queued message and all
++ *				its resources without actually receiving it.
++ * @KDBUS_RECV_USE_PRIORITY:	Only de-queue messages with the specified or
++ *				higher priority (lowest values); if not set,
++ *				the priority value is ignored.
++ */
++enum kdbus_recv_flags {
++	KDBUS_RECV_PEEK		= 1ULL <<  0,
++	KDBUS_RECV_DROP		= 1ULL <<  1,
++	KDBUS_RECV_USE_PRIORITY	= 1ULL <<  2,
++};
++
++/**
++ * enum kdbus_recv_return_flags - return flags for message receive commands
++ * @KDBUS_RECV_RETURN_INCOMPLETE_FDS:	One or more file descriptors could not
++ *					be installed. These descriptors in
++ *					KDBUS_ITEM_FDS will carry the value -1.
++ * @KDBUS_RECV_RETURN_DROPPED_MSGS:	There have been dropped messages since
++ *					the last time a message was received.
++ *					The 'dropped_msgs' counter contains the
++ *					number of messages dropped due to pool
++ *					overflows or other missed broadcasts.
++ */
++enum kdbus_recv_return_flags {
++	KDBUS_RECV_RETURN_INCOMPLETE_FDS	= 1ULL <<  0,
++	KDBUS_RECV_RETURN_DROPPED_MSGS		= 1ULL <<  1,
++};
++
++/**
++ * struct kdbus_cmd_recv - struct to de-queue a buffered message
++ * @size:		Overall size of this object
++ * @flags:		KDBUS_RECV_* flags, userspace → kernel
++ * @return_flags:	Command return flags, kernel → userspace
++ * @priority:		Minimum priority of the messages to de-queue. Lowest
++ *			values have the highest priority.
++ * @dropped_msgs:	In case there were any dropped messages since the last
++ *			time a message was received, this will be set to the
++ *			number of lost messages and
++ *			KDBUS_RECV_RETURN_DROPPED_MSGS will be set in
++ *			'return_flags'. This can only happen if the ioctl
++ *			returns 0 or EAGAIN.
++ * @msg:		Return storage for received message.
++ * @items:		Additional items for this command.
++ *
++ * This struct is used with the KDBUS_CMD_RECV ioctl.
++ */
++struct kdbus_cmd_recv {
++	__u64 size;
++	__u64 flags;
++	__u64 return_flags;
++	__s64 priority;
++	__u64 dropped_msgs;
++	struct kdbus_msg_info msg;
++	struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_cmd_free - struct to free a slice of memory in the pool
++ * @size:		Overall size of this structure
++ * @flags:		Flags for the free command, userspace → kernel
++ * @return_flags:	Command return flags, kernel → userspace
++ * @offset:		The offset of the memory slice, as returned by other
++ *			ioctls
++ * @items:		Additional items to modify the behavior
++ *
++ * This struct is used with the KDBUS_CMD_FREE ioctl.
++ */
++struct kdbus_cmd_free {
++	__u64 size;
++	__u64 flags;
++	__u64 return_flags;
++	__u64 offset;
++	struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_hello_flags - flags for struct kdbus_cmd_hello
++ * @KDBUS_HELLO_ACCEPT_FD:	The connection allows the reception of
++ *				any passed file descriptors
++ * @KDBUS_HELLO_ACTIVATOR:	Special-purpose connection which registers
++ *				a well-known name for a process to be started
++ *				when traffic arrives
++ * @KDBUS_HELLO_POLICY_HOLDER:	Special-purpose connection which registers
++ *				policy entries for a name. The provided name
++ *				is not activated and not registered with the
++ *				name database, it only allows unprivileged
++ *				connections to acquire a name, talk or discover
++ *				a service
++ * @KDBUS_HELLO_MONITOR:	Special-purpose connection to monitor
++ *				bus traffic
++ */
++enum kdbus_hello_flags {
++	KDBUS_HELLO_ACCEPT_FD		=  1ULL <<  0,
++	KDBUS_HELLO_ACTIVATOR		=  1ULL <<  1,
++	KDBUS_HELLO_POLICY_HOLDER	=  1ULL <<  2,
++	KDBUS_HELLO_MONITOR		=  1ULL <<  3,
++};
++
++/**
++ * struct kdbus_cmd_hello - struct to say hello to kdbus
++ * @size:		The total size of the structure
++ * @flags:		Connection flags (KDBUS_HELLO_*), userspace → kernel
++ * @return_flags:	Command return flags, kernel → userspace
++ * @attach_flags_send:	Mask of metadata to attach to each message sent
++ *			off by this connection (KDBUS_ATTACH_*)
++ * @attach_flags_recv:	Mask of metadata to attach to each message received
++ *			by the new connection (KDBUS_ATTACH_*)
++ * @bus_flags:		The flags field copied verbatim from the original
++ *			KDBUS_CMD_BUS_MAKE ioctl. It's intended to be useful
++ *			to do negotiation of features of the payload that is
++ *			transferred (kernel → userspace)
++ * @id:			The ID of this connection (kernel → userspace)
++ * @pool_size:		Size of the connection's buffer where the received
++ *			messages are placed
++ * @offset:		Pool offset where items are returned to report
++ *			additional information about the bus and the newly
++ *			created connection.
++ * @items_size:		Size of buffer returned in the pool slice at @offset.
++ * @id128:		Unique 128-bit ID of the bus (kernel → userspace)
++ * @items:		A list of items
++ *
++ * This struct is used with the KDBUS_CMD_HELLO ioctl.
++ */
++struct kdbus_cmd_hello {
++	__u64 size;
++	__u64 flags;
++	__u64 return_flags;
++	__u64 attach_flags_send;
++	__u64 attach_flags_recv;
++	__u64 bus_flags;
++	__u64 id;
++	__u64 pool_size;
++	__u64 offset;
++	__u64 items_size;
++	__u8 id128[16];
++	struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_info - connection information
++ * @size:		total size of the struct
++ * @id:			64bit object ID
++ * @flags:		object creation flags
++ * @items:		list of items
++ *
++ * Note that the user is responsible for freeing the allocated memory with
++ * the KDBUS_CMD_FREE ioctl.
++ */
++struct kdbus_info {
++	__u64 size;
++	__u64 id;
++	__u64 flags;
++	struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_list_flags - what to include into the returned list
++ * @KDBUS_LIST_UNIQUE:		active connections
++ * @KDBUS_LIST_ACTIVATORS:	activator connections
++ * @KDBUS_LIST_NAMES:		known well-known names
++ * @KDBUS_LIST_QUEUED:		queued-up names
++ */
++enum kdbus_list_flags {
++	KDBUS_LIST_UNIQUE		= 1ULL <<  0,
++	KDBUS_LIST_NAMES		= 1ULL <<  1,
++	KDBUS_LIST_ACTIVATORS		= 1ULL <<  2,
++	KDBUS_LIST_QUEUED		= 1ULL <<  3,
++};
++
++/**
++ * struct kdbus_cmd_list - list connections
++ * @size:		overall size of this object
++ * @flags:		flags for the query (KDBUS_LIST_*), userspace → kernel
++ * @return_flags:	command return flags, kernel → userspace
++ * @offset:		Offset in the caller's pool buffer where an array of
++ *			kdbus_info objects is stored.
++ *			The user must use KDBUS_CMD_FREE to free the
++ *			allocated memory.
++ * @list_size:		size of returned list in bytes
++ * @items:		Items for the command. Reserved for future use.
++ *
++ * This structure is used with the KDBUS_CMD_LIST ioctl.
++ */
++struct kdbus_cmd_list {
++	__u64 size;
++	__u64 flags;
++	__u64 return_flags;
++	__u64 offset;
++	__u64 list_size;
++	struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_cmd_info - struct used for KDBUS_CMD_CONN_INFO ioctl
++ * @size:		The total size of the struct
++ * @flags:		Flags for this ioctl, userspace → kernel
++ * @return_flags:	Command return flags, kernel → userspace
++ * @id:			The 64-bit ID of the connection. If set to zero, passing
++ *			@name is required. kdbus will look up the name to
++ *			determine the ID in this case.
++ * @attach_flags:	Set of attach flags to specify the set of information
++ *			to receive, userspace → kernel
++ * @offset:		Returned offset in the caller's pool buffer where the
++ *			kdbus_info struct result is stored. The user must
++ *			use KDBUS_CMD_FREE to free the allocated memory.
++ * @info_size:		Output buffer to report size of data at @offset.
++ * @items:		The optional item list, containing the
++ *			well-known name to look up as a KDBUS_ITEM_NAME.
++ *			Only needed in case @id is zero.
++ *
++ * On success, the KDBUS_CMD_CONN_INFO ioctl will return 0 and @offset will
++ * tell the user the offset in the connection pool buffer at which to find the
++ * result in a struct kdbus_info.
++ */
++struct kdbus_cmd_info {
++	__u64 size;
++	__u64 flags;
++	__u64 return_flags;
++	__u64 id;
++	__u64 attach_flags;
++	__u64 offset;
++	__u64 info_size;
++	struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_cmd_match_flags - flags to control the KDBUS_CMD_MATCH_ADD ioctl
++ * @KDBUS_MATCH_REPLACE:	If entries with the supplied cookie already
++ *				exist, remove them before installing the new
++ *				matches.
++ */
++enum kdbus_cmd_match_flags {
++	KDBUS_MATCH_REPLACE	= 1ULL <<  0,
++};
++
++/**
++ * struct kdbus_cmd_match - struct to add or remove matches
++ * @size:		The total size of the struct
++ * @flags:		Flags for match command (KDBUS_MATCH_*),
++ *			userspace → kernel
++ * @return_flags:	Command return flags, kernel → userspace
++ * @cookie:		Userspace supplied cookie. When removing, the cookie
++ *			identifies the match to remove
++ * @items:		A list of items for additional information
++ *
++ * This structure is used with the KDBUS_CMD_MATCH_ADD and
++ * KDBUS_CMD_MATCH_REMOVE ioctl.
++ */
++struct kdbus_cmd_match {
++	__u64 size;
++	__u64 flags;
++	__u64 return_flags;
++	__u64 cookie;
++	struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_make_flags - Flags for KDBUS_CMD_{BUS,ENDPOINT}_MAKE
++ * @KDBUS_MAKE_ACCESS_GROUP:	Make the bus or endpoint node group-accessible
++ * @KDBUS_MAKE_ACCESS_WORLD:	Make the bus or endpoint node world-accessible
++ */
++enum kdbus_make_flags {
++	KDBUS_MAKE_ACCESS_GROUP		= 1ULL <<  0,
++	KDBUS_MAKE_ACCESS_WORLD		= 1ULL <<  1,
++};
++
++/**
++ * enum kdbus_name_flags - flags for KDBUS_CMD_NAME_ACQUIRE
++ * @KDBUS_NAME_REPLACE_EXISTING:	Try to replace name of other connections
++ * @KDBUS_NAME_ALLOW_REPLACEMENT:	Allow the replacement of the name
++ * @KDBUS_NAME_QUEUE:			Name should be queued if busy
++ * @KDBUS_NAME_IN_QUEUE:		Name is queued
++ * @KDBUS_NAME_ACTIVATOR:		Name is owned by an activator connection
++ * @KDBUS_NAME_PRIMARY:			Primary owner of the name
++ * @KDBUS_NAME_ACQUIRED:		Name was acquired/queued _now_
++ */
++enum kdbus_name_flags {
++	KDBUS_NAME_REPLACE_EXISTING	= 1ULL <<  0,
++	KDBUS_NAME_ALLOW_REPLACEMENT	= 1ULL <<  1,
++	KDBUS_NAME_QUEUE		= 1ULL <<  2,
++	KDBUS_NAME_IN_QUEUE		= 1ULL <<  3,
++	KDBUS_NAME_ACTIVATOR		= 1ULL <<  4,
++	KDBUS_NAME_PRIMARY		= 1ULL <<  5,
++	KDBUS_NAME_ACQUIRED		= 1ULL <<  6,
++};
++
++/**
++ * struct kdbus_cmd - generic ioctl payload
++ * @size:		Overall size of this structure
++ * @flags:		Flags for this ioctl, userspace → kernel
++ * @return_flags:	Ioctl return flags, kernel → userspace
++ * @items:		Additional items to modify the behavior
++ *
++ * This is a generic ioctl payload object. It's used by all ioctls that only
++ * take flags and items as input.
++ */
++struct kdbus_cmd {
++	__u64 size;
++	__u64 flags;
++	__u64 return_flags;
++	struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * Ioctl API
++ *
++ * KDBUS_CMD_BUS_MAKE:		After opening the "control" node, this command
++ *				creates a new bus with the specified
++ *				name. The bus is immediately shut down and
++ *				cleaned up when the opened file descriptor is
++ *				closed.
++ *
++ * KDBUS_CMD_ENDPOINT_MAKE:	Creates a new named special endpoint to talk to
++ *				the bus. Such endpoints usually carry a more
++ *				restrictive policy and grant restricted access
++ *				to specific applications.
++ * KDBUS_CMD_ENDPOINT_UPDATE:	Update the properties of a custom endpoint. Used
++ *				to update the policy.
++ *
++ * KDBUS_CMD_HELLO:		By opening the bus node, a connection is
++ *				created. After a HELLO the opened connection
++ *				becomes an active peer on the bus.
++ * KDBUS_CMD_UPDATE:		Update the properties of a connection. Used to
++ *				update the metadata subscription mask and
++ *				policy.
++ * KDBUS_CMD_BYEBYE:		Disconnect a connection. If there are no
++ *				messages queued up in the connection's pool,
++ *				the call succeeds, and the handle is rendered
++ *				unusable. Otherwise, -EBUSY is returned without
++ *				any further side-effects.
++ * KDBUS_CMD_FREE:		Release the allocated memory in the receiver's
++ *				pool.
++ * KDBUS_CMD_CONN_INFO:		Retrieve credentials and properties of the
++ *				initial creator of the connection. The data was
++ *				stored at registration time and does not
++ *				necessarily represent the connected process or
++ *				the actual state of the process.
++ * KDBUS_CMD_BUS_CREATOR_INFO:	Retrieve information of the creator of the bus
++ *				a connection is attached to.
++ *
++ * KDBUS_CMD_SEND:		Send a message and pass data from userspace to
++ *				the kernel.
++ * KDBUS_CMD_RECV:		Receive a message from the kernel which is
++ *				placed in the receiver's pool.
++ *
++ * KDBUS_CMD_NAME_ACQUIRE:	Request a well-known bus name to associate with
++ *				the connection. Well-known names are used to
++ *				address a peer on the bus.
++ * KDBUS_CMD_NAME_RELEASE:	Release a well-known name the connection
++ *				currently owns.
++ * KDBUS_CMD_LIST:		Retrieve the list of all currently registered
++ *				well-known and unique names.
++ *
++ * KDBUS_CMD_MATCH_ADD:		Install a match which broadcast messages should
++ *				be delivered to the connection.
++ * KDBUS_CMD_MATCH_REMOVE:	Remove a current match for broadcast messages.
++ */
++enum kdbus_ioctl_type {
++	/* bus owner (00-0f) */
++	KDBUS_CMD_BUS_MAKE =		_IOW(KDBUS_IOCTL_MAGIC, 0x00,
++					     struct kdbus_cmd),
++
++	/* endpoint owner (10-1f) */
++	KDBUS_CMD_ENDPOINT_MAKE =	_IOW(KDBUS_IOCTL_MAGIC, 0x10,
++					     struct kdbus_cmd),
++	KDBUS_CMD_ENDPOINT_UPDATE =	_IOW(KDBUS_IOCTL_MAGIC, 0x11,
++					     struct kdbus_cmd),
++
++	/* connection owner (80-ff) */
++	KDBUS_CMD_HELLO =		_IOWR(KDBUS_IOCTL_MAGIC, 0x80,
++					      struct kdbus_cmd_hello),
++	KDBUS_CMD_UPDATE =		_IOW(KDBUS_IOCTL_MAGIC, 0x81,
++					     struct kdbus_cmd),
++	KDBUS_CMD_BYEBYE =		_IOW(KDBUS_IOCTL_MAGIC, 0x82,
++					     struct kdbus_cmd),
++	KDBUS_CMD_FREE =		_IOW(KDBUS_IOCTL_MAGIC, 0x83,
++					     struct kdbus_cmd_free),
++	KDBUS_CMD_CONN_INFO =		_IOR(KDBUS_IOCTL_MAGIC, 0x84,
++					     struct kdbus_cmd_info),
++	KDBUS_CMD_BUS_CREATOR_INFO =	_IOR(KDBUS_IOCTL_MAGIC, 0x85,
++					     struct kdbus_cmd_info),
++	KDBUS_CMD_LIST =		_IOR(KDBUS_IOCTL_MAGIC, 0x86,
++					     struct kdbus_cmd_list),
++
++	KDBUS_CMD_SEND =		_IOW(KDBUS_IOCTL_MAGIC, 0x90,
++					     struct kdbus_cmd_send),
++	KDBUS_CMD_RECV =		_IOR(KDBUS_IOCTL_MAGIC, 0x91,
++					     struct kdbus_cmd_recv),
++
++	KDBUS_CMD_NAME_ACQUIRE =	_IOW(KDBUS_IOCTL_MAGIC, 0xa0,
++					     struct kdbus_cmd),
++	KDBUS_CMD_NAME_RELEASE =	_IOW(KDBUS_IOCTL_MAGIC, 0xa1,
++					     struct kdbus_cmd),
++
++	KDBUS_CMD_MATCH_ADD =		_IOW(KDBUS_IOCTL_MAGIC, 0xb0,
++					     struct kdbus_cmd_match),
++	KDBUS_CMD_MATCH_REMOVE =	_IOW(KDBUS_IOCTL_MAGIC, 0xb1,
++					     struct kdbus_cmd_match),
++};
++
++#endif /* _UAPI_KDBUS_H_ */
+diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
+index 7b1425a..ce2ac5a 100644
+--- a/include/uapi/linux/magic.h
++++ b/include/uapi/linux/magic.h
+@@ -76,4 +76,6 @@
+ #define BTRFS_TEST_MAGIC	0x73727279
+ #define NSFS_MAGIC		0x6e736673
+ 
++#define KDBUS_SUPER_MAGIC	0x44427573
++
+ #endif /* __LINUX_MAGIC_H__ */
+diff --git a/init/Kconfig b/init/Kconfig
+index dc24dec..9388071 100644
+--- a/init/Kconfig
++++ b/init/Kconfig
+@@ -261,6 +261,19 @@ config POSIX_MQUEUE_SYSCTL
+ 	depends on SYSCTL
+ 	default y
+ 
++config KDBUS
++	tristate "kdbus interprocess communication"
++	depends on TMPFS
++	help
++	  D-Bus is a system for low-latency, low-overhead, easy to use
++	  interprocess communication (IPC).
++
++	  See the man-pages and HTML files in Documentation/kdbus/
++	  that are generated by 'make mandocs' and 'make htmldocs'.
++
++	  If you have an ordinary machine, select M here. The module
++	  will be called kdbus.
++
+ config CROSS_MEMORY_ATTACH
+ 	bool "Enable process_vm_readv/writev syscalls"
+ 	depends on MMU
+diff --git a/ipc/Makefile b/ipc/Makefile
+index 86c7300..68ec416 100644
+--- a/ipc/Makefile
++++ b/ipc/Makefile
+@@ -9,4 +9,4 @@ obj_mq-$(CONFIG_COMPAT) += compat_mq.o
+ obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)
+ obj-$(CONFIG_IPC_NS) += namespace.o
+ obj-$(CONFIG_POSIX_MQUEUE_SYSCTL) += mq_sysctl.o
+-
++obj-$(CONFIG_KDBUS) += kdbus/
+diff --git a/ipc/kdbus/Makefile b/ipc/kdbus/Makefile
+new file mode 100644
+index 0000000..66663a1
+--- /dev/null
++++ b/ipc/kdbus/Makefile
+@@ -0,0 +1,33 @@
++#
++# By setting KDBUS_EXT=2, the kdbus module will be built as kdbus2.ko, and
++# KBUILD_MODNAME=kdbus2. This has the effect that all exported objects have
++# different names than usual (kdbus2fs, /sys/fs/kdbus2/) and you can run
++# your test-infrastructure against the kdbus2.ko, while running your system
++# on kdbus.ko.
++#
++# To just build the module, use:
++#     make KDBUS_EXT=2 M=ipc/kdbus
++#
++
++kdbus$(KDBUS_EXT)-y := \
++	bus.o \
++	connection.o \
++	endpoint.o \
++	fs.o \
++	handle.o \
++	item.o \
++	main.o \
++	match.o \
++	message.o \
++	metadata.o \
++	names.o \
++	node.o \
++	notify.o \
++	domain.o \
++	policy.o \
++	pool.o \
++	reply.o \
++	queue.o \
++	util.o
++
++obj-$(CONFIG_KDBUS) += kdbus$(KDBUS_EXT).o
+diff --git a/ipc/kdbus/bus.c b/ipc/kdbus/bus.c
+new file mode 100644
+index 0000000..a67f825
+--- /dev/null
++++ b/ipc/kdbus/bus.c
+@@ -0,0 +1,514 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/fs.h>
++#include <linux/hashtable.h>
++#include <linux/init.h>
++#include <linux/module.h>
++#include <linux/random.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <linux/uio.h>
++
++#include "bus.h"
++#include "notify.h"
++#include "connection.h"
++#include "domain.h"
++#include "endpoint.h"
++#include "handle.h"
++#include "item.h"
++#include "match.h"
++#include "message.h"
++#include "metadata.h"
++#include "names.h"
++#include "policy.h"
++#include "util.h"
++
++static void kdbus_bus_free(struct kdbus_node *node)
++{
++	struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
++
++	WARN_ON(!list_empty(&bus->monitors_list));
++	WARN_ON(!hash_empty(bus->conn_hash));
++
++	kdbus_notify_free(bus);
++
++	kdbus_user_unref(bus->creator);
++	kdbus_name_registry_free(bus->name_registry);
++	kdbus_domain_unref(bus->domain);
++	kdbus_policy_db_clear(&bus->policy_db);
++	kdbus_meta_proc_unref(bus->creator_meta);
++	kfree(bus);
++}
++
++static void kdbus_bus_release(struct kdbus_node *node, bool was_active)
++{
++	struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
++
++	if (was_active)
++		atomic_dec(&bus->creator->buses);
++}
++
++static struct kdbus_bus *kdbus_bus_new(struct kdbus_domain *domain,
++				       const char *name,
++				       struct kdbus_bloom_parameter *bloom,
++				       const u64 *pattach_owner,
++				       u64 flags, kuid_t uid, kgid_t gid)
++{
++	struct kdbus_bus *b;
++	u64 attach_owner;
++	int ret;
++
++	if (bloom->size < 8 || bloom->size > KDBUS_BUS_BLOOM_MAX_SIZE ||
++	    !KDBUS_IS_ALIGNED8(bloom->size) || bloom->n_hash < 1)
++		return ERR_PTR(-EINVAL);
++
++	ret = kdbus_sanitize_attach_flags(pattach_owner ? *pattach_owner : 0,
++					  &attach_owner);
++	if (ret < 0)
++		return ERR_PTR(ret);
++
++	ret = kdbus_verify_uid_prefix(name, domain->user_namespace, uid);
++	if (ret < 0)
++		return ERR_PTR(ret);
++
++	b = kzalloc(sizeof(*b), GFP_KERNEL);
++	if (!b)
++		return ERR_PTR(-ENOMEM);
++
++	kdbus_node_init(&b->node, KDBUS_NODE_BUS);
++
++	b->node.free_cb = kdbus_bus_free;
++	b->node.release_cb = kdbus_bus_release;
++	b->node.uid = uid;
++	b->node.gid = gid;
++	b->node.mode = S_IRUSR | S_IXUSR;
++
++	if (flags & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
++		b->node.mode |= S_IRGRP | S_IXGRP;
++	if (flags & KDBUS_MAKE_ACCESS_WORLD)
++		b->node.mode |= S_IROTH | S_IXOTH;
++
++	b->id = atomic64_inc_return(&domain->last_id);
++	b->bus_flags = flags;
++	b->attach_flags_owner = attach_owner;
++	generate_random_uuid(b->id128);
++	b->bloom = *bloom;
++	b->domain = kdbus_domain_ref(domain);
++
++	kdbus_policy_db_init(&b->policy_db);
++
++	init_rwsem(&b->conn_rwlock);
++	hash_init(b->conn_hash);
++	INIT_LIST_HEAD(&b->monitors_list);
++
++	INIT_LIST_HEAD(&b->notify_list);
++	spin_lock_init(&b->notify_lock);
++	mutex_init(&b->notify_flush_lock);
++
++	ret = kdbus_node_link(&b->node, &domain->node, name);
++	if (ret < 0)
++		goto exit_unref;
++
++	/* cache the metadata/credentials of the creator */
++	b->creator_meta = kdbus_meta_proc_new();
++	if (IS_ERR(b->creator_meta)) {
++		ret = PTR_ERR(b->creator_meta);
++		b->creator_meta = NULL;
++		goto exit_unref;
++	}
++
++	ret = kdbus_meta_proc_collect(b->creator_meta,
++				      KDBUS_ATTACH_CREDS |
++				      KDBUS_ATTACH_PIDS |
++				      KDBUS_ATTACH_AUXGROUPS |
++				      KDBUS_ATTACH_TID_COMM |
++				      KDBUS_ATTACH_PID_COMM |
++				      KDBUS_ATTACH_EXE |
++				      KDBUS_ATTACH_CMDLINE |
++				      KDBUS_ATTACH_CGROUP |
++				      KDBUS_ATTACH_CAPS |
++				      KDBUS_ATTACH_SECLABEL |
++				      KDBUS_ATTACH_AUDIT);
++	if (ret < 0)
++		goto exit_unref;
++
++	b->name_registry = kdbus_name_registry_new();
++	if (IS_ERR(b->name_registry)) {
++		ret = PTR_ERR(b->name_registry);
++		b->name_registry = NULL;
++		goto exit_unref;
++	}
++
++	/*
++	 * Bus-limits of the creator are accounted on its real UID, just like
++	 * all other per-user limits.
++	 */
++	b->creator = kdbus_user_lookup(domain, current_uid());
++	if (IS_ERR(b->creator)) {
++		ret = PTR_ERR(b->creator);
++		b->creator = NULL;
++		goto exit_unref;
++	}
++
++	return b;
++
++exit_unref:
++	kdbus_node_deactivate(&b->node);
++	kdbus_node_unref(&b->node);
++	return ERR_PTR(ret);
++}
++
++/**
++ * kdbus_bus_ref() - increase the reference counter of a kdbus_bus
++ * @bus:		The bus to reference
++ *
++ * Every user of a bus, except for its creator, must add a reference to the
++ * kdbus_bus using this function.
++ *
++ * Return: the bus itself
++ */
++struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus)
++{
++	if (bus)
++		kdbus_node_ref(&bus->node);
++	return bus;
++}
++
++/**
++ * kdbus_bus_unref() - decrease the reference counter of a kdbus_bus
++ * @bus:		The bus to unref
++ *
++ * Release a reference. If the reference count drops to 0, the bus will be
++ * freed.
++ *
++ * Return: NULL
++ */
++struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus)
++{
++	if (bus)
++		kdbus_node_unref(&bus->node);
++	return NULL;
++}
++
++/**
++ * kdbus_bus_find_conn_by_id() - find a connection with a given id
++ * @bus:		The bus to look for the connection
++ * @id:			The 64-bit connection id
++ *
++ * Looks up a connection with a given id. The returned connection
++ * is ref'ed, and needs to be unref'ed by the user. Returns NULL if
++ * the connection can't be found.
++ */
++struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id)
++{
++	struct kdbus_conn *conn, *found = NULL;
++
++	down_read(&bus->conn_rwlock);
++	hash_for_each_possible(bus->conn_hash, conn, hentry, id)
++		if (conn->id == id) {
++			found = kdbus_conn_ref(conn);
++			break;
++		}
++	up_read(&bus->conn_rwlock);
++
++	return found;
++}
++
++/**
++ * kdbus_bus_broadcast() - send a message to all subscribed connections
++ * @bus:	The bus the connections are connected to
++ * @conn_src:	The source connection, may be %NULL for kernel notifications
++ * @staging:	Staging object containing the message to send
++ *
++ * Send message to all connections that are currently active on the bus.
++ * Connections must still have matches installed in order to let the message
++ * pass.
++ *
++ * The caller must hold the name-registry lock of @bus.
++ */
++void kdbus_bus_broadcast(struct kdbus_bus *bus,
++			 struct kdbus_conn *conn_src,
++			 struct kdbus_staging *staging)
++{
++	struct kdbus_conn *conn_dst;
++	unsigned int i;
++	int ret;
++
++	lockdep_assert_held(&bus->name_registry->rwlock);
++
++	/*
++	 * Make sure a broadcast is queued on monitors before we send it out to
++	 * anyone else. Otherwise, connections might react to broadcasts before
++	 * the monitor gets the broadcast queued. In the worst case, the
++	 * monitor sees a reaction to the broadcast before the broadcast itself.
++	 * We don't give ordering guarantees across connections (and monitors
++	 * can re-construct order via sequence numbers), but we should at least
++	 * try to avoid re-ordering for monitors.
++	 */
++	kdbus_bus_eavesdrop(bus, conn_src, staging);
++
++	down_read(&bus->conn_rwlock);
++	hash_for_each(bus->conn_hash, i, conn_dst, hentry) {
++		if (!kdbus_conn_is_ordinary(conn_dst))
++			continue;
++
++		/*
++		 * Check if there is a match for the kmsg object in
++		 * the destination connection match db
++		 */
++		if (!kdbus_match_db_match_msg(conn_dst->match_db, conn_src,
++					      staging))
++			continue;
++
++		if (conn_src) {
++			/*
++			 * Anyone can send broadcasts, as they have no
++			 * destination. But a receiver needs TALK access to
++			 * the sender in order to receive broadcasts.
++			 */
++			if (!kdbus_conn_policy_talk(conn_dst, NULL, conn_src))
++				continue;
++		} else {
++			/*
++			 * Check if there is a policy db that prevents the
++			 * destination connection from receiving this kernel
++			 * notification
++			 */
++			if (!kdbus_conn_policy_see_notification(conn_dst, NULL,
++								staging->msg))
++				continue;
++		}
++
++		ret = kdbus_conn_entry_insert(conn_src, conn_dst, staging,
++					      NULL, NULL);
++		if (ret < 0)
++			kdbus_conn_lost_message(conn_dst);
++	}
++	up_read(&bus->conn_rwlock);
++}
++
++/**
++ * kdbus_bus_eavesdrop() - send a message to all subscribed monitors
++ * @bus:	The bus the monitors are connected to
++ * @conn_src:	The source connection, may be %NULL for kernel notifications
++ * @staging:	Staging object containing the message to send
++ *
++ * Send message to all monitors that are currently active on the bus. Monitors
++ * must still have matches installed in order to let the message pass.
++ *
++ * The caller must hold the name-registry lock of @bus.
++ */
++void kdbus_bus_eavesdrop(struct kdbus_bus *bus,
++			 struct kdbus_conn *conn_src,
++			 struct kdbus_staging *staging)
++{
++	struct kdbus_conn *conn_dst;
++	int ret;
++
++	/*
++	 * Monitor connections get all messages; ignore possible errors
++	 * when sending messages to monitor connections.
++	 */
++
++	lockdep_assert_held(&bus->name_registry->rwlock);
++
++	down_read(&bus->conn_rwlock);
++	list_for_each_entry(conn_dst, &bus->monitors_list, monitor_entry) {
++		ret = kdbus_conn_entry_insert(conn_src, conn_dst, staging,
++					      NULL, NULL);
++		if (ret < 0)
++			kdbus_conn_lost_message(conn_dst);
++	}
++	up_read(&bus->conn_rwlock);
++}
++
++/**
++ * kdbus_cmd_bus_make() - handle KDBUS_CMD_BUS_MAKE
++ * @domain:		domain to operate on
++ * @argp:		command payload
++ *
++ * Return: NULL or newly created bus on success, ERR_PTR on failure.
++ */
++struct kdbus_bus *kdbus_cmd_bus_make(struct kdbus_domain *domain,
++				     void __user *argp)
++{
++	struct kdbus_bus *bus = NULL;
++	struct kdbus_cmd *cmd;
++	struct kdbus_ep *ep = NULL;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++		{ .type = KDBUS_ITEM_MAKE_NAME, .mandatory = true },
++		{ .type = KDBUS_ITEM_BLOOM_PARAMETER, .mandatory = true },
++		{ .type = KDBUS_ITEM_ATTACH_FLAGS_SEND },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
++				 KDBUS_MAKE_ACCESS_GROUP |
++				 KDBUS_MAKE_ACCESS_WORLD,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret < 0)
++		return ERR_PTR(ret);
++	if (ret > 0)
++		return NULL;
++
++	bus = kdbus_bus_new(domain,
++			    argv[1].item->str, &argv[2].item->bloom_parameter,
++			    argv[3].item ? argv[3].item->data64 : NULL,
++			    cmd->flags, current_euid(), current_egid());
++	if (IS_ERR(bus)) {
++		ret = PTR_ERR(bus);
++		bus = NULL;
++		goto exit;
++	}
++
++	if (atomic_inc_return(&bus->creator->buses) > KDBUS_USER_MAX_BUSES) {
++		atomic_dec(&bus->creator->buses);
++		ret = -EMFILE;
++		goto exit;
++	}
++
++	if (!kdbus_node_activate(&bus->node)) {
++		atomic_dec(&bus->creator->buses);
++		ret = -ESHUTDOWN;
++		goto exit;
++	}
++
++	ep = kdbus_ep_new(bus, "bus", cmd->flags, bus->node.uid, bus->node.gid,
++			  false);
++	if (IS_ERR(ep)) {
++		ret = PTR_ERR(ep);
++		ep = NULL;
++		goto exit;
++	}
++
++	if (!kdbus_node_activate(&ep->node)) {
++		ret = -ESHUTDOWN;
++		goto exit;
++	}
++
++	/*
++	 * Drop our own reference, effectively causing the endpoint to be
++	 * deactivated and released when the parent bus is.
++	 */
++	ep = kdbus_ep_unref(ep);
++
++exit:
++	ret = kdbus_args_clear(&args, ret);
++	if (ret < 0) {
++		if (ep) {
++			kdbus_node_deactivate(&ep->node);
++			kdbus_ep_unref(ep);
++		}
++		if (bus) {
++			kdbus_node_deactivate(&bus->node);
++			kdbus_bus_unref(bus);
++		}
++		return ERR_PTR(ret);
++	}
++	return bus;
++}
++
++/**
++ * kdbus_cmd_bus_creator_info() - handle KDBUS_CMD_BUS_CREATOR_INFO
++ * @conn:		connection to operate on
++ * @argp:		command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn, void __user *argp)
++{
++	struct kdbus_cmd_info *cmd;
++	struct kdbus_bus *bus = conn->ep->bus;
++	struct kdbus_pool_slice *slice = NULL;
++	struct kdbus_item *meta_items = NULL;
++	struct kdbus_item_header item_hdr;
++	struct kdbus_info info = {};
++	size_t meta_size, name_len, cnt = 0;
++	struct kvec kvec[6];
++	u64 attach_flags, size = 0;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret != 0)
++		return ret;
++
++	ret = kdbus_sanitize_attach_flags(cmd->attach_flags, &attach_flags);
++	if (ret < 0)
++		goto exit;
++
++	attach_flags &= bus->attach_flags_owner;
++
++	ret = kdbus_meta_emit(bus->creator_meta, NULL, NULL, conn,
++			      attach_flags, &meta_items, &meta_size);
++	if (ret < 0)
++		goto exit;
++
++	name_len = strlen(bus->node.name) + 1;
++	info.id = bus->id;
++	info.flags = bus->bus_flags;
++	item_hdr.type = KDBUS_ITEM_MAKE_NAME;
++	item_hdr.size = KDBUS_ITEM_HEADER_SIZE + name_len;
++
++	kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &size);
++	kdbus_kvec_set(&kvec[cnt++], &item_hdr, sizeof(item_hdr), &size);
++	kdbus_kvec_set(&kvec[cnt++], bus->node.name, name_len, &size);
++	cnt += !!kdbus_kvec_pad(&kvec[cnt], &size);
++	if (meta_size > 0) {
++		kdbus_kvec_set(&kvec[cnt++], meta_items, meta_size, &size);
++		cnt += !!kdbus_kvec_pad(&kvec[cnt], &size);
++	}
++
++	info.size = size;
++
++	slice = kdbus_pool_slice_alloc(conn->pool, size, false);
++	if (IS_ERR(slice)) {
++		ret = PTR_ERR(slice);
++		slice = NULL;
++		goto exit;
++	}
++
++	ret = kdbus_pool_slice_copy_kvec(slice, 0, kvec, cnt, size);
++	if (ret < 0)
++		goto exit;
++
++	kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->info_size);
++
++	if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
++	    kdbus_member_set_user(&cmd->info_size, argp,
++				  typeof(*cmd), info_size))
++		ret = -EFAULT;
++
++exit:
++	kdbus_pool_slice_release(slice);
++	kfree(meta_items);
++	return kdbus_args_clear(&args, ret);
++}
+diff --git a/ipc/kdbus/bus.h b/ipc/kdbus/bus.h
+new file mode 100644
+index 0000000..8c2acae
+--- /dev/null
++++ b/ipc/kdbus/bus.h
+@@ -0,0 +1,101 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_BUS_H
++#define __KDBUS_BUS_H
++
++#include <linux/hashtable.h>
++#include <linux/list.h>
++#include <linux/mutex.h>
++#include <linux/rwsem.h>
++#include <linux/spinlock.h>
++#include <uapi/linux/kdbus.h>
++
++#include "metadata.h"
++#include "names.h"
++#include "node.h"
++#include "policy.h"
++
++struct kdbus_conn;
++struct kdbus_domain;
++struct kdbus_staging;
++struct kdbus_user;
++
++/**
++ * struct kdbus_bus - bus in a domain
++ * @node:		kdbus_node
++ * @id:			ID of this bus in the domain
++ * @bus_flags:		Simple pass-through flags from userspace to userspace
++ * @attach_flags_owner:	KDBUS_ATTACH_* flags of bus creator that other
++ *			connections can see or query
++ * @id128:		Unique random 128 bit ID of this bus
++ * @bloom:		Bloom parameters
++ * @domain:		Domain of this bus
++ * @creator:		Creator of the bus
++ * @creator_meta:	Meta information about the bus creator
++ * @last_message_id:	Last used message id
++ * @policy_db:		Policy database for this bus
++ * @name_registry:	Name registry of this bus
++ * @conn_rwlock:	Read/Write lock for all lists of child connections
++ * @conn_hash:		Map of connection IDs
++ * @monitors_list:	Connections that monitor this bus
++ * @notify_list:	List of pending kernel-generated messages
++ * @notify_lock:	Notification list lock
++ * @notify_flush_lock:	Notification flushing lock
++ */
++struct kdbus_bus {
++	struct kdbus_node node;
++
++	/* static */
++	u64 id;
++	u64 bus_flags;
++	u64 attach_flags_owner;
++	u8 id128[16];
++	struct kdbus_bloom_parameter bloom;
++	struct kdbus_domain *domain;
++	struct kdbus_user *creator;
++	struct kdbus_meta_proc *creator_meta;
++
++	/* protected by own locks */
++	atomic64_t last_message_id;
++	struct kdbus_policy_db policy_db;
++	struct kdbus_name_registry *name_registry;
++
++	/* protected by conn_rwlock */
++	struct rw_semaphore conn_rwlock;
++	DECLARE_HASHTABLE(conn_hash, 8);
++	struct list_head monitors_list;
++
++	/* protected by notify_lock */
++	struct list_head notify_list;
++	spinlock_t notify_lock;
++	struct mutex notify_flush_lock;
++};
++
++struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus);
++struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus);
++
++struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id);
++void kdbus_bus_broadcast(struct kdbus_bus *bus,
++			 struct kdbus_conn *conn_src,
++			 struct kdbus_staging *staging);
++void kdbus_bus_eavesdrop(struct kdbus_bus *bus,
++			 struct kdbus_conn *conn_src,
++			 struct kdbus_staging *staging);
++
++struct kdbus_bus *kdbus_cmd_bus_make(struct kdbus_domain *domain,
++				     void __user *argp);
++int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn, void __user *argp);
++
++#endif
+diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
+new file mode 100644
+index 0000000..ef63d65
+--- /dev/null
++++ b/ipc/kdbus/connection.c
+@@ -0,0 +1,2227 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/audit.h>
++#include <linux/file.h>
++#include <linux/fs.h>
++#include <linux/fs_struct.h>
++#include <linux/hashtable.h>
++#include <linux/idr.h>
++#include <linux/init.h>
++#include <linux/math64.h>
++#include <linux/mm.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/path.h>
++#include <linux/poll.h>
++#include <linux/sched.h>
++#include <linux/shmem_fs.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/syscalls.h>
++#include <linux/uio.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "endpoint.h"
++#include "handle.h"
++#include "match.h"
++#include "message.h"
++#include "metadata.h"
++#include "names.h"
++#include "domain.h"
++#include "item.h"
++#include "notify.h"
++#include "policy.h"
++#include "pool.h"
++#include "reply.h"
++#include "util.h"
++#include "queue.h"
++
++#define KDBUS_CONN_ACTIVE_BIAS	(INT_MIN + 2)
++#define KDBUS_CONN_ACTIVE_NEW	(INT_MIN + 1)
++
++static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep,
++					 struct file *file,
++					 struct kdbus_cmd_hello *hello,
++					 const char *name,
++					 const struct kdbus_creds *creds,
++					 const struct kdbus_pids *pids,
++					 const char *seclabel,
++					 const char *conn_description)
++{
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++	static struct lock_class_key __key;
++#endif
++	struct kdbus_pool_slice *slice = NULL;
++	struct kdbus_bus *bus = ep->bus;
++	struct kdbus_conn *conn;
++	u64 attach_flags_send;
++	u64 attach_flags_recv;
++	u64 items_size = 0;
++	bool is_policy_holder;
++	bool is_activator;
++	bool is_monitor;
++	bool privileged;
++	bool owner;
++	struct kvec kvec;
++	int ret;
++
++	struct {
++		u64 size;
++		u64 type;
++		struct kdbus_bloom_parameter bloom;
++	} bloom_item;
++
++	privileged = kdbus_ep_is_privileged(ep, file);
++	owner = kdbus_ep_is_owner(ep, file);
++
++	is_monitor = hello->flags & KDBUS_HELLO_MONITOR;
++	is_activator = hello->flags & KDBUS_HELLO_ACTIVATOR;
++	is_policy_holder = hello->flags & KDBUS_HELLO_POLICY_HOLDER;
++
++	if (!hello->pool_size || !IS_ALIGNED(hello->pool_size, PAGE_SIZE))
++		return ERR_PTR(-EINVAL);
++	if (is_monitor + is_activator + is_policy_holder > 1)
++		return ERR_PTR(-EINVAL);
++	if (name && !is_activator && !is_policy_holder)
++		return ERR_PTR(-EINVAL);
++	if (!name && (is_activator || is_policy_holder))
++		return ERR_PTR(-EINVAL);
++	if (name && !kdbus_name_is_valid(name, true))
++		return ERR_PTR(-EINVAL);
++	if (is_monitor && ep->user)
++		return ERR_PTR(-EOPNOTSUPP);
++	if (!owner && (is_activator || is_policy_holder || is_monitor))
++		return ERR_PTR(-EPERM);
++	if (!owner && (creds || pids || seclabel))
++		return ERR_PTR(-EPERM);
++
++	ret = kdbus_sanitize_attach_flags(hello->attach_flags_send,
++					  &attach_flags_send);
++	if (ret < 0)
++		return ERR_PTR(ret);
++
++	ret = kdbus_sanitize_attach_flags(hello->attach_flags_recv,
++					  &attach_flags_recv);
++	if (ret < 0)
++		return ERR_PTR(ret);
++
++	conn = kzalloc(sizeof(*conn), GFP_KERNEL);
++	if (!conn)
++		return ERR_PTR(-ENOMEM);
++
++	kref_init(&conn->kref);
++	atomic_set(&conn->active, KDBUS_CONN_ACTIVE_NEW);
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++	lockdep_init_map(&conn->dep_map, "s_active", &__key, 0);
++#endif
++	mutex_init(&conn->lock);
++	INIT_LIST_HEAD(&conn->names_list);
++	INIT_LIST_HEAD(&conn->reply_list);
++	atomic_set(&conn->request_count, 0);
++	atomic_set(&conn->lost_count, 0);
++	INIT_DELAYED_WORK(&conn->work, kdbus_reply_list_scan_work);
++	conn->cred = get_cred(file->f_cred);
++	conn->pid = get_pid(task_pid(current));
++	get_fs_root(current->fs, &conn->root_path);
++	init_waitqueue_head(&conn->wait);
++	kdbus_queue_init(&conn->queue);
++	conn->privileged = privileged;
++	conn->owner = owner;
++	conn->ep = kdbus_ep_ref(ep);
++	conn->id = atomic64_inc_return(&bus->domain->last_id);
++	conn->flags = hello->flags;
++	atomic64_set(&conn->attach_flags_send, attach_flags_send);
++	atomic64_set(&conn->attach_flags_recv, attach_flags_recv);
++	INIT_LIST_HEAD(&conn->monitor_entry);
++
++	if (conn_description) {
++		conn->description = kstrdup(conn_description, GFP_KERNEL);
++		if (!conn->description) {
++			ret = -ENOMEM;
++			goto exit_unref;
++		}
++	}
++
++	conn->pool = kdbus_pool_new(conn->description, hello->pool_size);
++	if (IS_ERR(conn->pool)) {
++		ret = PTR_ERR(conn->pool);
++		conn->pool = NULL;
++		goto exit_unref;
++	}
++
++	conn->match_db = kdbus_match_db_new();
++	if (IS_ERR(conn->match_db)) {
++		ret = PTR_ERR(conn->match_db);
++		conn->match_db = NULL;
++		goto exit_unref;
++	}
++
++	/* return properties of this connection to the caller */
++	hello->bus_flags = bus->bus_flags;
++	hello->id = conn->id;
++
++	BUILD_BUG_ON(sizeof(bus->id128) != sizeof(hello->id128));
++	memcpy(hello->id128, bus->id128, sizeof(hello->id128));
++
++	/* privileged processes can impersonate somebody else */
++	if (creds || pids || seclabel) {
++		conn->meta_fake = kdbus_meta_fake_new();
++		if (IS_ERR(conn->meta_fake)) {
++			ret = PTR_ERR(conn->meta_fake);
++			conn->meta_fake = NULL;
++			goto exit_unref;
++		}
++
++		ret = kdbus_meta_fake_collect(conn->meta_fake,
++					      creds, pids, seclabel);
++		if (ret < 0)
++			goto exit_unref;
++	} else {
++		conn->meta_proc = kdbus_meta_proc_new();
++		if (IS_ERR(conn->meta_proc)) {
++			ret = PTR_ERR(conn->meta_proc);
++			conn->meta_proc = NULL;
++			goto exit_unref;
++		}
++
++		ret = kdbus_meta_proc_collect(conn->meta_proc,
++					      KDBUS_ATTACH_CREDS |
++					      KDBUS_ATTACH_PIDS |
++					      KDBUS_ATTACH_AUXGROUPS |
++					      KDBUS_ATTACH_TID_COMM |
++					      KDBUS_ATTACH_PID_COMM |
++					      KDBUS_ATTACH_EXE |
++					      KDBUS_ATTACH_CMDLINE |
++					      KDBUS_ATTACH_CGROUP |
++					      KDBUS_ATTACH_CAPS |
++					      KDBUS_ATTACH_SECLABEL |
++					      KDBUS_ATTACH_AUDIT);
++		if (ret < 0)
++			goto exit_unref;
++	}
++
++	/*
++	 * Account the connection against the current user (UID), or for
++	 * custom endpoints use the anonymous user assigned to the endpoint.
++	 * Note that limits are always accounted against the real UID, not
++	 * the effective UID (cred->user always points to the accounting of
++	 * cred->uid, not cred->euid).
++	 * In case the caller is privileged, we allow changing the accounting
++	 * to the faked user.
++	 */
++	if (ep->user) {
++		conn->user = kdbus_user_ref(ep->user);
++	} else {
++		kuid_t uid;
++
++		if (conn->meta_fake && uid_valid(conn->meta_fake->uid) &&
++		    conn->privileged)
++			uid = conn->meta_fake->uid;
++		else
++			uid = conn->cred->uid;
++
++		conn->user = kdbus_user_lookup(ep->bus->domain, uid);
++		if (IS_ERR(conn->user)) {
++			ret = PTR_ERR(conn->user);
++			conn->user = NULL;
++			goto exit_unref;
++		}
++	}
++
++	if (atomic_inc_return(&conn->user->connections) > KDBUS_USER_MAX_CONN) {
++		/* decremented by destructor as conn->user is valid */
++		ret = -EMFILE;
++		goto exit_unref;
++	}
++
++	bloom_item.size = sizeof(bloom_item);
++	bloom_item.type = KDBUS_ITEM_BLOOM_PARAMETER;
++	bloom_item.bloom = bus->bloom;
++	kdbus_kvec_set(&kvec, &bloom_item, bloom_item.size, &items_size);
++
++	slice = kdbus_pool_slice_alloc(conn->pool, items_size, false);
++	if (IS_ERR(slice)) {
++		ret = PTR_ERR(slice);
++		slice = NULL;
++		goto exit_unref;
++	}
++
++	ret = kdbus_pool_slice_copy_kvec(slice, 0, &kvec, 1, items_size);
++	if (ret < 0)
++		goto exit_unref;
++
++	kdbus_pool_slice_publish(slice, &hello->offset, &hello->items_size);
++	kdbus_pool_slice_release(slice);
++
++	return conn;
++
++exit_unref:
++	kdbus_pool_slice_release(slice);
++	kdbus_conn_unref(conn);
++	return ERR_PTR(ret);
++}
++
++static void __kdbus_conn_free(struct kref *kref)
++{
++	struct kdbus_conn *conn = container_of(kref, struct kdbus_conn, kref);
++
++	WARN_ON(kdbus_conn_active(conn));
++	WARN_ON(delayed_work_pending(&conn->work));
++	WARN_ON(!list_empty(&conn->queue.msg_list));
++	WARN_ON(!list_empty(&conn->names_list));
++	WARN_ON(!list_empty(&conn->reply_list));
++
++	if (conn->user) {
++		atomic_dec(&conn->user->connections);
++		kdbus_user_unref(conn->user);
++	}
++
++	kdbus_meta_fake_free(conn->meta_fake);
++	kdbus_meta_proc_unref(conn->meta_proc);
++	kdbus_match_db_free(conn->match_db);
++	kdbus_pool_free(conn->pool);
++	kdbus_ep_unref(conn->ep);
++	path_put(&conn->root_path);
++	put_pid(conn->pid);
++	put_cred(conn->cred);
++	kfree(conn->description);
++	kfree(conn->quota);
++	kfree(conn);
++}
++
++/**
++ * kdbus_conn_ref() - take a connection reference
++ * @conn:		Connection, may be %NULL
++ *
++ * Return: the connection itself
++ */
++struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn)
++{
++	if (conn)
++		kref_get(&conn->kref);
++	return conn;
++}
++
++/**
++ * kdbus_conn_unref() - drop a connection reference
++ * @conn:		Connection (may be NULL)
++ *
++ * When the last reference is dropped, the connection's internal structure
++ * is freed.
++ *
++ * Return: NULL
++ */
++struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn)
++{
++	if (conn)
++		kref_put(&conn->kref, __kdbus_conn_free);
++	return NULL;
++}
++
++/**
++ * kdbus_conn_active() - connection is not disconnected
++ * @conn:		Connection to check
++ *
++ * Return true if the connection has not been disconnected yet. Note that a
++ * connection might be disconnected asynchronously, unless you hold the
++ * connection lock. If that's not suitable for you, see kdbus_conn_acquire() to
++ * suppress connection shutdown for a short period.
++ *
++ * Return: true if the connection is still active
++ */
++bool kdbus_conn_active(const struct kdbus_conn *conn)
++{
++	return atomic_read(&conn->active) >= 0;
++}
++
++/**
++ * kdbus_conn_acquire() - acquire an active connection reference
++ * @conn:		Connection
++ *
++ * Users can close a connection via KDBUS_BYEBYE (or by destroying the
++ * endpoint/bus/...) at any time. Whenever this happens, we should deny any
++ * user-visible action on this connection and signal ECONNRESET instead.
++ * To avoid testing for connection availability every time you take the
++ * connection-lock, you can acquire a connection for short periods.
++ *
++ * By calling kdbus_conn_acquire(), you gain an "active reference" to the
++ * connection. You must also hold a regular reference at any time! As long as
++ * you hold the active-ref, the connection will not be shut down. However, if
++ * the connection was shut down, you can never acquire an active-ref again.
++ *
++ * kdbus_conn_disconnect() disables the connection and then waits for all active
++ * references to be dropped. It will also wake up any pending operation.
++ * However, you must not sleep for an indefinite period while holding an
++ * active-reference. Otherwise, kdbus_conn_disconnect() might stall. If you need
++ * to sleep for an indefinite period, either release the reference and try to
++ * acquire it again after waking up, or make kdbus_conn_disconnect() wake up
++ * your wait-queue.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_conn_acquire(struct kdbus_conn *conn)
++{
++	if (!atomic_inc_unless_negative(&conn->active))
++		return -ECONNRESET;
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++	rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
++#endif
++
++	return 0;
++}
++
++/**
++ * kdbus_conn_release() - release an active connection reference
++ * @conn:		Connection
++ *
++ * This releases an active reference that has been acquired via
++ * kdbus_conn_acquire(). If the connection was already disabled and this is the
++ * last active-ref that is dropped, the disconnect-waiter will be woken up and
++ * properly close the connection.
++ */
++void kdbus_conn_release(struct kdbus_conn *conn)
++{
++	int v;
++
++	if (!conn)
++		return;
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++	rwsem_release(&conn->dep_map, 1, _RET_IP_);
++#endif
++
++	v = atomic_dec_return(&conn->active);
++	if (v != KDBUS_CONN_ACTIVE_BIAS)
++		return;
++
++	wake_up_all(&conn->wait);
++}
++
++static int kdbus_conn_connect(struct kdbus_conn *conn, const char *name)
++{
++	struct kdbus_ep *ep = conn->ep;
++	struct kdbus_bus *bus = ep->bus;
++	int ret;
++
++	if (WARN_ON(atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_NEW))
++		return -EALREADY;
++
++	/* make sure the ep-node is active while we add our connection */
++	if (!kdbus_node_acquire(&ep->node))
++		return -ESHUTDOWN;
++
++	/* lock order: domain -> bus -> ep -> names -> conn */
++	mutex_lock(&ep->lock);
++	down_write(&bus->conn_rwlock);
++
++	/* link into monitor list */
++	if (kdbus_conn_is_monitor(conn))
++		list_add_tail(&conn->monitor_entry, &bus->monitors_list);
++
++	/* link into bus and endpoint */
++	list_add_tail(&conn->ep_entry, &ep->conn_list);
++	hash_add(bus->conn_hash, &conn->hentry, conn->id);
++
++	/* enable lookups and acquire active ref */
++	atomic_set(&conn->active, 1);
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++	rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
++#endif
++
++	up_write(&bus->conn_rwlock);
++	mutex_unlock(&ep->lock);
++
++	kdbus_node_release(&ep->node);
++
++	/*
++	 * Notify subscribers about the new active connection, unless it is
++	 * a monitor. Monitors are invisible on the bus, can't be addressed
++	 * directly, and won't cause any notifications.
++	 */
++	if (!kdbus_conn_is_monitor(conn)) {
++		ret = kdbus_notify_id_change(bus, KDBUS_ITEM_ID_ADD,
++					     conn->id, conn->flags);
++		if (ret < 0)
++			goto exit_disconnect;
++	}
++
++	if (kdbus_conn_is_activator(conn)) {
++		u64 flags = KDBUS_NAME_ACTIVATOR;
++
++		if (WARN_ON(!name)) {
++			ret = -EINVAL;
++			goto exit_disconnect;
++		}
++
++		ret = kdbus_name_acquire(bus->name_registry, conn, name,
++					 flags, NULL);
++		if (ret < 0)
++			goto exit_disconnect;
++	}
++
++	kdbus_conn_release(conn);
++	kdbus_notify_flush(bus);
++	return 0;
++
++exit_disconnect:
++	kdbus_conn_release(conn);
++	kdbus_conn_disconnect(conn, false);
++	return ret;
++}
++
++/**
++ * kdbus_conn_disconnect() - disconnect a connection
++ * @conn:		The connection to disconnect
++ * @ensure_queue_empty:	Flag to indicate if the call should fail in
++ *			case the connection's message list is not
++ *			empty
++ *
++ * If @ensure_queue_empty is true, and the connection has pending messages,
++ * -EBUSY is returned.
++ *
++ * Return: 0 on success, negative errno on failure
++ */
++int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty)
++{
++	struct kdbus_queue_entry *entry, *tmp;
++	struct kdbus_bus *bus = conn->ep->bus;
++	struct kdbus_reply *r, *r_tmp;
++	struct kdbus_conn *c;
++	int i, v;
++
++	mutex_lock(&conn->lock);
++	v = atomic_read(&conn->active);
++	if (v == KDBUS_CONN_ACTIVE_NEW) {
++		/* was never connected */
++		mutex_unlock(&conn->lock);
++		return 0;
++	}
++	if (v < 0) {
++		/* already dead */
++		mutex_unlock(&conn->lock);
++		return -ECONNRESET;
++	}
++	if (ensure_queue_empty && !list_empty(&conn->queue.msg_list)) {
++		/* still busy */
++		mutex_unlock(&conn->lock);
++		return -EBUSY;
++	}
++
++	atomic_add(KDBUS_CONN_ACTIVE_BIAS, &conn->active);
++	mutex_unlock(&conn->lock);
++
++	wake_up_interruptible(&conn->wait);
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++	rwsem_acquire(&conn->dep_map, 0, 0, _RET_IP_);
++	if (atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_BIAS)
++		lock_contended(&conn->dep_map, _RET_IP_);
++#endif
++
++	wait_event(conn->wait,
++		   atomic_read(&conn->active) == KDBUS_CONN_ACTIVE_BIAS);
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++	lock_acquired(&conn->dep_map, _RET_IP_);
++	rwsem_release(&conn->dep_map, 1, _RET_IP_);
++#endif
++
++	cancel_delayed_work_sync(&conn->work);
++	kdbus_policy_remove_owner(&conn->ep->bus->policy_db, conn);
++
++	/* lock order: domain -> bus -> ep -> names -> conn */
++	mutex_lock(&conn->ep->lock);
++	down_write(&bus->conn_rwlock);
++
++	/* remove from bus and endpoint */
++	hash_del(&conn->hentry);
++	list_del(&conn->monitor_entry);
++	list_del(&conn->ep_entry);
++
++	up_write(&bus->conn_rwlock);
++	mutex_unlock(&conn->ep->lock);
++
++	/*
++	 * Remove all names associated with this connection; this possibly
++	 * moves queued messages back to the activator connection.
++	 */
++	kdbus_name_release_all(bus->name_registry, conn);
++
++	/* if we die while other connections wait for our reply, notify them */
++	mutex_lock(&conn->lock);
++	list_for_each_entry_safe(entry, tmp, &conn->queue.msg_list, entry) {
++		if (entry->reply)
++			kdbus_notify_reply_dead(bus,
++						entry->reply->reply_dst->id,
++						entry->reply->cookie);
++		kdbus_queue_entry_free(entry);
++	}
++
++	list_for_each_entry_safe(r, r_tmp, &conn->reply_list, entry)
++		kdbus_reply_unlink(r);
++	mutex_unlock(&conn->lock);
++
++	/* lock order: domain -> bus -> ep -> names -> conn */
++	down_read(&bus->conn_rwlock);
++	hash_for_each(bus->conn_hash, i, c, hentry) {
++		mutex_lock(&c->lock);
++		list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
++			if (r->reply_src != conn)
++				continue;
++
++			if (r->sync)
++				kdbus_sync_reply_wakeup(r, -EPIPE);
++			else
++				/* send a 'connection dead' notification */
++				kdbus_notify_reply_dead(bus, c->id, r->cookie);
++
++			kdbus_reply_unlink(r);
++		}
++		mutex_unlock(&c->lock);
++	}
++	up_read(&bus->conn_rwlock);
++
++	if (!kdbus_conn_is_monitor(conn))
++		kdbus_notify_id_change(bus, KDBUS_ITEM_ID_REMOVE,
++				       conn->id, conn->flags);
++
++	kdbus_notify_flush(bus);
++
++	return 0;
++}
++
++/**
++ * kdbus_conn_has_name() - check if a connection owns a name
++ * @conn:		Connection
++ * @name:		Well-known name to check for
++ *
++ * The caller must hold the registry lock of conn->ep->bus.
++ *
++ * Return: true if the name is currently owned by the connection
++ */
++bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name)
++{
++	struct kdbus_name_owner *owner;
++
++	lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
++
++	list_for_each_entry(owner, &conn->names_list, conn_entry)
++		if (!(owner->flags & KDBUS_NAME_IN_QUEUE) &&
++		    !strcmp(name, owner->name->name))
++			return true;
++
++	return false;
++}
++
++struct kdbus_quota {
++	u32 memory;
++	u16 msgs;
++	u8 fds;
++};
++
++/**
++ * kdbus_conn_quota_inc() - increase quota accounting
++ * @c:		connection owning the quota tracking
++ * @u:		user to account for (or NULL for kernel accounting)
++ * @memory:	size of memory to account for
++ * @fds:	number of FDs to account for
++ *
++ * This call manages the quotas on resource @c. That is, it's used if other
++ * users want to use the resources of connection @c, which so far only concerns
++ * the receive queue of the destination.
++ *
++ * This increases the quota-accounting for user @u by @memory bytes and @fds
++ * file descriptors. If the user has already reached the quota limits, this call
++ * will not do any accounting but return a negative error code indicating the
++ * failure.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_conn_quota_inc(struct kdbus_conn *c, struct kdbus_user *u,
++			 size_t memory, size_t fds)
++{
++	struct kdbus_quota *quota;
++	size_t available, accounted;
++	unsigned int id;
++
++	/*
++	 * Pool Layout:
++	 * 50% of a pool is always owned by the connection. It is reserved for
++	 * kernel queries, handling received messages and other tasks that are
++	 * under control of the pool owner. The other 50% of the pool is used
++	 * as the incoming queue.
++	 * As we optionally support user-space based policies, we need fair
++	 * allocation schemes. Furthermore, resource utilization should be
++	 * maximized, so only minimal resources stay reserved. However, we need
++	 * to adapt to a dynamic number of users, as we cannot know how many
++	 * users will talk to a connection. Therefore, the current allocation
++	 * works like this:
++	 * We limit the number of bytes in a destination's pool per sending
++	 * user. The space available for a user is 33% of the unused pool space
++	 * (whereas the space used by the user itself is also treated as
++	 * 'unused'). This way, we favor users coming first, but keep enough
++	 * pool space available for any following users. Given that messages are
++	 * dequeued in FIFO order, this should balance nicely if the number of
++	 * users grows. At the same time, this algorithm guarantees that the
++	 * space available to a connection is reduced dynamically, the more
++	 * concurrent users talk to a connection.
++	 */
++
++	/* per user-accounting is expensive, so we keep state small */
++	BUILD_BUG_ON(sizeof(quota->memory) != 4);
++	BUILD_BUG_ON(sizeof(quota->msgs) != 2);
++	BUILD_BUG_ON(sizeof(quota->fds) != 1);
++	BUILD_BUG_ON(KDBUS_CONN_MAX_MSGS > U16_MAX);
++	BUILD_BUG_ON(KDBUS_CONN_MAX_FDS_PER_USER > U8_MAX);
++
++	id = u ? u->id : KDBUS_USER_KERNEL_ID;
++	if (id >= c->n_quota) {
++		unsigned int users;
++
++		users = max(KDBUS_ALIGN8(id) + 8, id);
++		quota = krealloc(c->quota, users * sizeof(*quota),
++				 GFP_KERNEL | __GFP_ZERO);
++		if (!quota)
++			return -ENOMEM;
++
++		c->n_quota = users;
++		c->quota = quota;
++	}
++
++	quota = &c->quota[id];
++	kdbus_pool_accounted(c->pool, &available, &accounted);
++
++	/* half the pool is _always_ reserved for the pool owner */
++	available /= 2;
++
++	/*
++	 * Pool owner slices are un-accounted slices; they can claim more
++	 * than 50% of the queue. However, the slices we're dealing with here
++	 * belong to the incoming queue, hence they are 'accounted' slices
++	 * to which the 50%-limit applies.
++	 */
++	if (available < accounted)
++		return -ENOBUFS;
++
++	/* 1/3 of the remaining space (including your own memory) */
++	available = (available - accounted + quota->memory) / 3;
++
++	if (available < quota->memory ||
++	    available - quota->memory < memory ||
++	    quota->memory + memory > U32_MAX)
++		return -ENOBUFS;
++	if (quota->msgs >= KDBUS_CONN_MAX_MSGS)
++		return -ENOBUFS;
++	if (quota->fds + fds < quota->fds ||
++	    quota->fds + fds > KDBUS_CONN_MAX_FDS_PER_USER)
++		return -EMFILE;
++
++	quota->memory += memory;
++	quota->fds += fds;
++	++quota->msgs;
++	return 0;
++}
++
++/**
++ * kdbus_conn_quota_dec() - decrease quota accounting
++ * @c:		connection owning the quota tracking
++ * @u:		user which was accounted for (or NULL for kernel accounting)
++ * @memory:	size of memory which was accounted for
++ * @fds:	number of FDs which were accounted for
++ *
++ * This does the reverse of kdbus_conn_quota_inc(). You have to release any
++ * accounted resources that you called kdbus_conn_quota_inc() for. However, you
++ * must not call kdbus_conn_quota_dec() if the accounting failed (that is,
++ * kdbus_conn_quota_inc() failed).
++ */
++void kdbus_conn_quota_dec(struct kdbus_conn *c, struct kdbus_user *u,
++			  size_t memory, size_t fds)
++{
++	struct kdbus_quota *quota;
++	unsigned int id;
++
++	id = u ? u->id : KDBUS_USER_KERNEL_ID;
++	if (WARN_ON(id >= c->n_quota))
++		return;
++
++	quota = &c->quota[id];
++
++	if (!WARN_ON(quota->msgs == 0))
++		--quota->msgs;
++	if (!WARN_ON(quota->memory < memory))
++		quota->memory -= memory;
++	if (!WARN_ON(quota->fds < fds))
++		quota->fds -= fds;
++}
++
++/**
++ * kdbus_conn_lost_message() - handle lost messages
++ * @c:		connection that lost a message
++ *
++ * kdbus is reliable. That means, we try hard to never lose messages. However,
++ * memory is limited, so we cannot rely on transmissions to never fail.
++ * Therefore, we use quota-limits to let callers know if their unicast message
++ * cannot be transmitted to a peer. This works fine for unicasts, but for
++ * broadcasts we cannot make the caller handle the transmission failure.
++ * Instead, we must let the destination know that it couldn't receive a
++ * broadcast.
++ * As this is an unlikely scenario, we keep it simple. A single lost-counter
++ * remembers the number of lost messages since the last call to RECV. The next
++ * message retrieval will notify the connection that it lost messages since the
++ * last message retrieval and thus should resync its state.
++ */
++void kdbus_conn_lost_message(struct kdbus_conn *c)
++{
++	if (atomic_inc_return(&c->lost_count) == 1)
++		wake_up_interruptible(&c->wait);
++}
++
++/* Callers should take the conn_dst lock */
++static struct kdbus_queue_entry *
++kdbus_conn_entry_make(struct kdbus_conn *conn_src,
++		      struct kdbus_conn *conn_dst,
++		      struct kdbus_staging *staging)
++{
++	/* The remote connection was disconnected */
++	if (!kdbus_conn_active(conn_dst))
++		return ERR_PTR(-ECONNRESET);
++
++	/*
++	 * If the connection does not accept file descriptors but the message
++	 * has some attached, refuse it.
++	 *
++	 * If this is a monitor connection, accept the message. In that
++	 * case, all file descriptors will be set to -1 at receive time.
++	 */
++	if (!kdbus_conn_is_monitor(conn_dst) &&
++	    !(conn_dst->flags & KDBUS_HELLO_ACCEPT_FD) &&
++	    staging->gaps && staging->gaps->n_fds > 0)
++		return ERR_PTR(-ECOMM);
++
++	return kdbus_queue_entry_new(conn_src, conn_dst, staging);
++}
++
++/*
++ * Synchronously responding to a message, allocate a queue entry
++ * and attach it to the reply tracking object.
++ * The connection's queue will never get to see it.
++ */
++static int kdbus_conn_entry_sync_attach(struct kdbus_conn *conn_dst,
++					struct kdbus_staging *staging,
++					struct kdbus_reply *reply_wake)
++{
++	struct kdbus_queue_entry *entry;
++	int remote_ret, ret = 0;
++
++	mutex_lock(&reply_wake->reply_dst->lock);
++
++	/*
++	 * If we are still waiting then proceed, allocate a queue
++	 * entry and attach it to the reply object
++	 */
++	if (reply_wake->waiting) {
++		entry = kdbus_conn_entry_make(reply_wake->reply_src, conn_dst,
++					      staging);
++		if (IS_ERR(entry))
++			ret = PTR_ERR(entry);
++		else
++			/* Attach the entry to the reply object */
++			reply_wake->queue_entry = entry;
++	} else {
++		ret = -ECONNRESET;
++	}
++
++	/*
++	 * Update the reply object and wake up remote peer only
++	 * on appropriate return codes
++	 *
++	 * * -ECOMM: if the replying connection failed with -ECOMM
++	 *           then wakeup remote peer with -EREMOTEIO
++	 *
++	 *           We do this to differentiate between -ECOMM errors
++	 *           from the original sender perspective:
++	 *           -ECOMM error during the sync send and
++	 *           -ECOMM error during the sync reply, this last
++	 *           one is rewritten to -EREMOTEIO
++	 *
++	 * * Wake up on all other return codes.
++	 */
++	remote_ret = ret;
++
++	if (ret == -ECOMM)
++		remote_ret = -EREMOTEIO;
++
++	kdbus_sync_reply_wakeup(reply_wake, remote_ret);
++	kdbus_reply_unlink(reply_wake);
++	mutex_unlock(&reply_wake->reply_dst->lock);
++
++	return ret;
++}
++
++/**
++ * kdbus_conn_entry_insert() - enqueue a message into the receiver's pool
++ * @conn_src:		The sending connection
++ * @conn_dst:		The connection to queue into
++ * @staging:		Message to send
++ * @reply:		The reply tracker to attach to the queue entry
++ * @name:		Destination name this msg is sent to, or NULL
++ *
++ * Return: 0 on success, negative error otherwise.
++ */
++int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
++			    struct kdbus_conn *conn_dst,
++			    struct kdbus_staging *staging,
++			    struct kdbus_reply *reply,
++			    const struct kdbus_name_entry *name)
++{
++	struct kdbus_queue_entry *entry;
++	int ret;
++
++	kdbus_conn_lock2(conn_src, conn_dst);
++
++	entry = kdbus_conn_entry_make(conn_src, conn_dst, staging);
++	if (IS_ERR(entry)) {
++		ret = PTR_ERR(entry);
++		goto exit_unlock;
++	}
++
++	if (reply) {
++		kdbus_reply_link(reply);
++		if (!reply->sync)
++			schedule_delayed_work(&conn_src->work, 0);
++	}
++
++	/*
++	 * Record the sequence number of the registered name; it will
++	 * be remembered by the queue, in case messages addressed to a
++	 * name need to be moved from or to an activator.
++	 */
++	if (name)
++		entry->dst_name_id = name->name_id;
++
++	kdbus_queue_entry_enqueue(entry, reply);
++	wake_up_interruptible(&conn_dst->wait);
++
++	ret = 0;
++
++exit_unlock:
++	kdbus_conn_unlock2(conn_src, conn_dst);
++	return ret;
++}
++
++static int kdbus_conn_wait_reply(struct kdbus_conn *conn_src,
++				 struct kdbus_cmd_send *cmd_send,
++				 struct file *ioctl_file,
++				 struct file *cancel_fd,
++				 struct kdbus_reply *reply_wait,
++				 ktime_t expire)
++{
++	struct kdbus_queue_entry *entry;
++	struct poll_wqueues pwq = {};
++	int ret;
++
++	if (WARN_ON(!reply_wait))
++		return -EIO;
++
++	/*
++	 * Block until the reply arrives. reply_wait is left untouched
++	 * by the timeout scans that might be conducted for other,
++	 * asynchronous replies of conn_src.
++	 */
++
++	poll_initwait(&pwq);
++	poll_wait(ioctl_file, &conn_src->wait, &pwq.pt);
++
++	for (;;) {
++		/*
++		 * Any of the following conditions will stop our synchronously
++		 * blocking SEND command:
++		 *
++		 * a) The origin sender closed its connection
++		 * b) The remote peer answered, setting reply_wait->waiting = 0
++		 * c) The cancel FD was written to
++		 * d) A signal was received
++		 * e) The specified timeout was reached, and none of the above
++		 *    conditions kicked in.
++		 */
++
++		/*
++		 * We have already acquired an active reference when
++		 * entering here, but another thread may call
++		 * KDBUS_CMD_BYEBYE which does not acquire an active
++		 * reference, therefore kdbus_conn_disconnect() will
++		 * not wait for us.
++		 */
++		if (!kdbus_conn_active(conn_src)) {
++			ret = -ECONNRESET;
++			break;
++		}
++
++		/*
++		 * After the replying peer unsets the waiting variable,
++		 * it will wake us up.
++		 */
++		if (!reply_wait->waiting) {
++			ret = reply_wait->err;
++			break;
++		}
++
++		if (cancel_fd) {
++			unsigned int r;
++
++			r = cancel_fd->f_op->poll(cancel_fd, &pwq.pt);
++			if (r & POLLIN) {
++				ret = -ECANCELED;
++				break;
++			}
++		}
++
++		if (signal_pending(current)) {
++			ret = -EINTR;
++			break;
++		}
++
++		if (!poll_schedule_timeout(&pwq, TASK_INTERRUPTIBLE,
++					   &expire, 0)) {
++			ret = -ETIMEDOUT;
++			break;
++		}
++
++		/*
++		 * Reset the poll worker func, so the waitqueues are not
++		 * added to the poll table again. We just reuse what we've
++		 * collected earlier for further iterations.
++		 */
++		init_poll_funcptr(&pwq.pt, NULL);
++	}
++
++	poll_freewait(&pwq);
++
++	if (ret == -EINTR) {
++		/*
++		 * Interrupted system call. Unref the reply object, and pass
++		 * the return value down the chain. Mark the reply as
++		 * interrupted, so the cleanup work can remove it, but do not
++		 * unlink it from the list. Once the syscall restarts, we'll
++		 * pick it up and wait on it again.
++		 */
++		mutex_lock(&conn_src->lock);
++		reply_wait->interrupted = true;
++		schedule_delayed_work(&conn_src->work, 0);
++		mutex_unlock(&conn_src->lock);
++
++		return -ERESTARTSYS;
++	}
++
++	mutex_lock(&conn_src->lock);
++	reply_wait->waiting = false;
++	entry = reply_wait->queue_entry;
++	if (entry) {
++		ret = kdbus_queue_entry_install(entry,
++						&cmd_send->reply.return_flags,
++						true);
++		kdbus_pool_slice_publish(entry->slice, &cmd_send->reply.offset,
++					 &cmd_send->reply.msg_size);
++		kdbus_queue_entry_free(entry);
++	}
++	kdbus_reply_unlink(reply_wait);
++	mutex_unlock(&conn_src->lock);
++
++	return ret;
++}
++
++static int kdbus_pin_dst(struct kdbus_bus *bus,
++			 struct kdbus_staging *staging,
++			 struct kdbus_name_entry **out_name,
++			 struct kdbus_conn **out_dst)
++{
++	const struct kdbus_msg *msg = staging->msg;
++	struct kdbus_name_owner *owner = NULL;
++	struct kdbus_name_entry *name = NULL;
++	struct kdbus_conn *dst = NULL;
++	int ret;
++
++	lockdep_assert_held(&bus->name_registry->rwlock);
++
++	if (!staging->dst_name) {
++		dst = kdbus_bus_find_conn_by_id(bus, msg->dst_id);
++		if (!dst)
++			return -ENXIO;
++
++		if (!kdbus_conn_is_ordinary(dst)) {
++			ret = -ENXIO;
++			goto error;
++		}
++	} else {
++		name = kdbus_name_lookup_unlocked(bus->name_registry,
++						  staging->dst_name);
++		if (name)
++			owner = kdbus_name_get_owner(name);
++		if (!owner)
++			return -ESRCH;
++
++		/*
++		 * If both a name and a connection ID are given as destination
++		 * of a message, check that the currently owning connection of
++		 * the name matches the specified ID.
++		 * This way, we allow userspace to send the message to a
++		 * specific connection by ID only if the connection currently
++		 * owns the given name.
++		 */
++		if (msg->dst_id != KDBUS_DST_ID_NAME &&
++		    msg->dst_id != owner->conn->id)
++			return -EREMCHG;
++
++		if ((msg->flags & KDBUS_MSG_NO_AUTO_START) &&
++		    kdbus_conn_is_activator(owner->conn))
++			return -EADDRNOTAVAIL;
++
++		dst = kdbus_conn_ref(owner->conn);
++	}
++
++	*out_name = name;
++	*out_dst = dst;
++	return 0;
++
++error:
++	kdbus_conn_unref(dst);
++	return ret;
++}
++
++static int kdbus_conn_reply(struct kdbus_conn *src,
++			    struct kdbus_staging *staging)
++{
++	const struct kdbus_msg *msg = staging->msg;
++	struct kdbus_name_entry *name = NULL;
++	struct kdbus_reply *reply, *wake = NULL;
++	struct kdbus_conn *dst = NULL;
++	struct kdbus_bus *bus = src->ep->bus;
++	int ret;
++
++	if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
++	    WARN_ON(msg->flags & KDBUS_MSG_EXPECT_REPLY) ||
++	    WARN_ON(msg->flags & KDBUS_MSG_SIGNAL))
++		return -EINVAL;
++
++	/* name-registry must be locked for lookup *and* collecting data */
++	down_read(&bus->name_registry->rwlock);
++
++	/* find and pin destination */
++
++	ret = kdbus_pin_dst(bus, staging, &name, &dst);
++	if (ret < 0)
++		goto exit;
++
++	mutex_lock(&dst->lock);
++	reply = kdbus_reply_find(src, dst, msg->cookie_reply);
++	if (reply) {
++		if (reply->sync)
++			wake = kdbus_reply_ref(reply);
++		kdbus_reply_unlink(reply);
++	}
++	mutex_unlock(&dst->lock);
++
++	if (!reply) {
++		ret = -EBADSLT;
++		goto exit;
++	}
++
++	/* send message */
++
++	kdbus_bus_eavesdrop(bus, src, staging);
++
++	if (wake)
++		ret = kdbus_conn_entry_sync_attach(dst, staging, wake);
++	else
++		ret = kdbus_conn_entry_insert(src, dst, staging, NULL, name);
++
++exit:
++	up_read(&bus->name_registry->rwlock);
++	kdbus_reply_unref(wake);
++	kdbus_conn_unref(dst);
++	return ret;
++}
++
++static struct kdbus_reply *kdbus_conn_call(struct kdbus_conn *src,
++					   struct kdbus_staging *staging,
++					   ktime_t exp)
++{
++	const struct kdbus_msg *msg = staging->msg;
++	struct kdbus_name_entry *name = NULL;
++	struct kdbus_reply *wait = NULL;
++	struct kdbus_conn *dst = NULL;
++	struct kdbus_bus *bus = src->ep->bus;
++	int ret;
++
++	if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
++	    WARN_ON(msg->flags & KDBUS_MSG_SIGNAL) ||
++	    WARN_ON(!(msg->flags & KDBUS_MSG_EXPECT_REPLY)))
++		return ERR_PTR(-EINVAL);
++
++	/* resume previous wait-context, if available */
++
++	mutex_lock(&src->lock);
++	wait = kdbus_reply_find(NULL, src, msg->cookie);
++	if (wait) {
++		if (wait->interrupted) {
++			kdbus_reply_ref(wait);
++			wait->interrupted = false;
++		} else {
++			wait = NULL;
++		}
++	}
++	mutex_unlock(&src->lock);
++
++	if (wait)
++		return wait;
++
++	if (ktime_compare(ktime_get(), exp) >= 0)
++		return ERR_PTR(-ETIMEDOUT);
++
++	/* name-registry must be locked for lookup *and* collecting data */
++	down_read(&bus->name_registry->rwlock);
++
++	/* find and pin destination */
++
++	ret = kdbus_pin_dst(bus, staging, &name, &dst);
++	if (ret < 0)
++		goto exit;
++
++	if (!kdbus_conn_policy_talk(src, current_cred(), dst)) {
++		ret = -EPERM;
++		goto exit;
++	}
++
++	wait = kdbus_reply_new(dst, src, msg, name, true);
++	if (IS_ERR(wait)) {
++		ret = PTR_ERR(wait);
++		wait = NULL;
++		goto exit;
++	}
++
++	/* send message */
++
++	kdbus_bus_eavesdrop(bus, src, staging);
++
++	ret = kdbus_conn_entry_insert(src, dst, staging, wait, name);
++	if (ret < 0)
++		goto exit;
++
++	ret = 0;
++
++exit:
++	up_read(&bus->name_registry->rwlock);
++	if (ret < 0) {
++		kdbus_reply_unref(wait);
++		wait = ERR_PTR(ret);
++	}
++	kdbus_conn_unref(dst);
++	return wait;
++}
++
++static int kdbus_conn_unicast(struct kdbus_conn *src,
++			      struct kdbus_staging *staging)
++{
++	const struct kdbus_msg *msg = staging->msg;
++	struct kdbus_name_entry *name = NULL;
++	struct kdbus_reply *wait = NULL;
++	struct kdbus_conn *dst = NULL;
++	struct kdbus_bus *bus = src->ep->bus;
++	bool is_signal = (msg->flags & KDBUS_MSG_SIGNAL);
++	int ret = 0;
++
++	if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
++	    WARN_ON(!(msg->flags & KDBUS_MSG_EXPECT_REPLY) &&
++		    msg->cookie_reply != 0))
++		return -EINVAL;
++
++	/* name-registry must be locked for lookup *and* collecting data */
++	down_read(&bus->name_registry->rwlock);
++
++	/* find and pin destination */
++
++	ret = kdbus_pin_dst(bus, staging, &name, &dst);
++	if (ret < 0)
++		goto exit;
++
++	if (is_signal) {
++		/* like broadcasts we eavesdrop even if the msg is dropped */
++		kdbus_bus_eavesdrop(bus, src, staging);
++
++		/* drop silently if peer is not interested or not privileged */
++		if (!kdbus_match_db_match_msg(dst->match_db, src, staging) ||
++		    !kdbus_conn_policy_talk(dst, NULL, src))
++			goto exit;
++	} else if (!kdbus_conn_policy_talk(src, current_cred(), dst)) {
++		ret = -EPERM;
++		goto exit;
++	} else if (msg->flags & KDBUS_MSG_EXPECT_REPLY) {
++		wait = kdbus_reply_new(dst, src, msg, name, false);
++		if (IS_ERR(wait)) {
++			ret = PTR_ERR(wait);
++			wait = NULL;
++			goto exit;
++		}
++	}
++
++	/* send message */
++
++	if (!is_signal)
++		kdbus_bus_eavesdrop(bus, src, staging);
++
++	ret = kdbus_conn_entry_insert(src, dst, staging, wait, name);
++	if (ret < 0 && !is_signal)
++		goto exit;
++
++	/* signals are treated like broadcasts, recv-errors are ignored */
++	ret = 0;
++
++exit:
++	up_read(&bus->name_registry->rwlock);
++	kdbus_reply_unref(wait);
++	kdbus_conn_unref(dst);
++	return ret;
++}
++
++/**
++ * kdbus_conn_move_messages() - move messages from one connection to another
++ * @conn_dst:		Connection to copy to
++ * @conn_src:		Connection to copy from
++ * @name_id:		Filter for the sequence number of the registered
++ *			name, 0 means no filtering.
++ *
++ * Move all messages from one connection to another. This is used when
++ * an implementer connection is taking over/giving back a well-known name
++ * from/to an activator connection.
++ */
++void kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
++			      struct kdbus_conn *conn_src,
++			      u64 name_id)
++{
++	struct kdbus_queue_entry *e, *e_tmp;
++	struct kdbus_reply *r, *r_tmp;
++	struct kdbus_bus *bus;
++	struct kdbus_conn *c;
++	LIST_HEAD(msg_list);
++	int i, ret = 0;
++
++	if (WARN_ON(conn_src == conn_dst))
++		return;
++
++	bus = conn_src->ep->bus;
++
++	/* lock order: domain -> bus -> ep -> names -> conn */
++	down_read(&bus->conn_rwlock);
++	hash_for_each(bus->conn_hash, i, c, hentry) {
++		if (c == conn_src || c == conn_dst)
++			continue;
++
++		mutex_lock(&c->lock);
++		list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
++			if (r->reply_src != conn_src)
++				continue;
++
++			/* filter messages for a specific name */
++			if (name_id > 0 && r->name_id != name_id)
++				continue;
++
++			kdbus_conn_unref(r->reply_src);
++			r->reply_src = kdbus_conn_ref(conn_dst);
++		}
++		mutex_unlock(&c->lock);
++	}
++	up_read(&bus->conn_rwlock);
++
++	kdbus_conn_lock2(conn_src, conn_dst);
++	list_for_each_entry_safe(e, e_tmp, &conn_src->queue.msg_list, entry) {
++		/* filter messages for a specific name */
++		if (name_id > 0 && e->dst_name_id != name_id)
++			continue;
++
++		if (!(conn_dst->flags & KDBUS_HELLO_ACCEPT_FD) &&
++		    e->gaps && e->gaps->n_fds > 0) {
++			kdbus_conn_lost_message(conn_dst);
++			kdbus_queue_entry_free(e);
++			continue;
++		}
++
++		ret = kdbus_queue_entry_move(e, conn_dst);
++		if (ret < 0) {
++			kdbus_conn_lost_message(conn_dst);
++			kdbus_queue_entry_free(e);
++			continue;
++		}
++	}
++	kdbus_conn_unlock2(conn_src, conn_dst);
++
++	/* wake up poll() */
++	wake_up_interruptible(&conn_dst->wait);
++}
++
++/* query the policy-database for all names of @whom */
++static bool kdbus_conn_policy_query_all(struct kdbus_conn *conn,
++					const struct cred *conn_creds,
++					struct kdbus_policy_db *db,
++					struct kdbus_conn *whom,
++					unsigned int access)
++{
++	struct kdbus_name_owner *owner;
++	bool pass = false;
++	int res;
++
++	lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
++
++	down_read(&db->entries_rwlock);
++	mutex_lock(&whom->lock);
++
++	list_for_each_entry(owner, &whom->names_list, conn_entry) {
++		if (owner->flags & KDBUS_NAME_IN_QUEUE)
++			continue;
++
++		res = kdbus_policy_query_unlocked(db,
++					conn_creds ? : conn->cred,
++					owner->name->name,
++					kdbus_strhash(owner->name->name));
++		if (res >= (int)access) {
++			pass = true;
++			break;
++		}
++	}
++
++	mutex_unlock(&whom->lock);
++	up_read(&db->entries_rwlock);
++
++	return pass;
++}
++
++/**
++ * kdbus_conn_policy_own_name() - verify a connection can own the given name
++ * @conn:		Connection
++ * @conn_creds:		Credentials of @conn to use for policy check
++ * @name:		Name
++ *
++ * This verifies that @conn is allowed to acquire the well-known name @name.
++ *
++ * Return: true if allowed, false if not.
++ */
++bool kdbus_conn_policy_own_name(struct kdbus_conn *conn,
++				const struct cred *conn_creds,
++				const char *name)
++{
++	unsigned int hash = kdbus_strhash(name);
++	int res;
++
++	if (!conn_creds)
++		conn_creds = conn->cred;
++
++	if (conn->ep->user) {
++		res = kdbus_policy_query(&conn->ep->policy_db, conn_creds,
++					 name, hash);
++		if (res < KDBUS_POLICY_OWN)
++			return false;
++	}
++
++	if (conn->owner)
++		return true;
++
++	res = kdbus_policy_query(&conn->ep->bus->policy_db, conn_creds,
++				 name, hash);
++	return res >= KDBUS_POLICY_OWN;
++}
++
++/**
++ * kdbus_conn_policy_talk() - verify a connection can talk to a given peer
++ * @conn:		Connection that tries to talk
++ * @conn_creds:		Credentials of @conn to use for policy check
++ * @to:			Connection that is talked to
++ *
++ * This verifies that @conn is allowed to talk to @to.
++ *
++ * Return: true if allowed, false if not.
++ */
++bool kdbus_conn_policy_talk(struct kdbus_conn *conn,
++			    const struct cred *conn_creds,
++			    struct kdbus_conn *to)
++{
++	if (!conn_creds)
++		conn_creds = conn->cred;
++
++	if (conn->ep->user &&
++	    !kdbus_conn_policy_query_all(conn, conn_creds, &conn->ep->policy_db,
++					 to, KDBUS_POLICY_TALK))
++		return false;
++
++	if (conn->owner)
++		return true;
++	if (uid_eq(conn_creds->euid, to->cred->uid))
++		return true;
++
++	return kdbus_conn_policy_query_all(conn, conn_creds,
++					   &conn->ep->bus->policy_db, to,
++					   KDBUS_POLICY_TALK);
++}
++
++/**
++ * kdbus_conn_policy_see_name_unlocked() - verify a connection can see a given
++ *					   name
++ * @conn:		Connection
++ * @conn_creds:		Credentials of @conn to use for policy check
++ * @name:		Name
++ *
++ * This verifies that @conn is allowed to see the well-known name @name. Caller
++ * must hold policy-lock.
++ *
++ * Return: true if allowed, false if not.
++ */
++bool kdbus_conn_policy_see_name_unlocked(struct kdbus_conn *conn,
++					 const struct cred *conn_creds,
++					 const char *name)
++{
++	int res;
++
++	/*
++	 * By default, all names are visible on a bus. SEE policies can only be
++	 * installed on custom endpoints, where by default no name is visible.
++	 */
++	if (!conn->ep->user)
++		return true;
++
++	res = kdbus_policy_query_unlocked(&conn->ep->policy_db,
++					  conn_creds ? : conn->cred,
++					  name, kdbus_strhash(name));
++	return res >= KDBUS_POLICY_SEE;
++}
++
++static bool kdbus_conn_policy_see_name(struct kdbus_conn *conn,
++				       const struct cred *conn_creds,
++				       const char *name)
++{
++	bool res;
++
++	down_read(&conn->ep->policy_db.entries_rwlock);
++	res = kdbus_conn_policy_see_name_unlocked(conn, conn_creds, name);
++	up_read(&conn->ep->policy_db.entries_rwlock);
++
++	return res;
++}
++
++static bool kdbus_conn_policy_see(struct kdbus_conn *conn,
++				  const struct cred *conn_creds,
++				  struct kdbus_conn *whom)
++{
++	/*
++	 * By default, all names are visible on a bus, so a connection can
++	 * always see other connections. SEE policies can only be installed on
++	 * custom endpoints, where by default no name is visible and we hide
++	 * peers from each other, unless you see at least _one_ name of the
++	 * peer.
++	 */
++	return !conn->ep->user ||
++	       kdbus_conn_policy_query_all(conn, conn_creds,
++					   &conn->ep->policy_db, whom,
++					   KDBUS_POLICY_SEE);
++}
++
++/**
++ * kdbus_conn_policy_see_notification() - verify a connection is allowed to
++ *					  receive a given kernel notification
++ * @conn:		Connection
++ * @conn_creds:		Credentials of @conn to use for policy check
++ * @msg:		Notification message
++ *
++ * This checks whether @conn is allowed to see the kernel notification.
++ *
++ * Return: true if allowed, false if not.
++ */
++bool kdbus_conn_policy_see_notification(struct kdbus_conn *conn,
++					const struct cred *conn_creds,
++					const struct kdbus_msg *msg)
++{
++	/*
++	 * Depending on the notification type, broadcast kernel notifications
++	 * have to be filtered:
++	 *
++	 * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}: This notification is forwarded
++	 *     to a peer if, and only if, that peer can see the name this
++	 *     notification is for.
++	 *
++	 * KDBUS_ITEM_ID_{ADD,REMOVE}: Notifications for ID changes are
++	 *     broadcast to everyone, to allow tracking peers.
++	 */
++
++	switch (msg->items[0].type) {
++	case KDBUS_ITEM_NAME_ADD:
++	case KDBUS_ITEM_NAME_REMOVE:
++	case KDBUS_ITEM_NAME_CHANGE:
++		return kdbus_conn_policy_see_name(conn, conn_creds,
++					msg->items[0].name_change.name);
++
++	case KDBUS_ITEM_ID_ADD:
++	case KDBUS_ITEM_ID_REMOVE:
++		return true;
++
++	default:
++		WARN(1, "Invalid type for notification broadcast: %llu\n",
++		     (unsigned long long)msg->items[0].type);
++		return false;
++	}
++}
++
++/**
++ * kdbus_cmd_hello() - handle KDBUS_CMD_HELLO
++ * @ep:			Endpoint to operate on
++ * @file:		File this connection is opened on
++ * @argp:		Command payload
++ *
++ * Return: NULL or newly created connection on success, ERR_PTR on failure.
++ */
++struct kdbus_conn *kdbus_cmd_hello(struct kdbus_ep *ep, struct file *file,
++				   void __user *argp)
++{
++	struct kdbus_cmd_hello *cmd;
++	struct kdbus_conn *c = NULL;
++	const char *item_name;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++		{ .type = KDBUS_ITEM_NAME },
++		{ .type = KDBUS_ITEM_CREDS },
++		{ .type = KDBUS_ITEM_PIDS },
++		{ .type = KDBUS_ITEM_SECLABEL },
++		{ .type = KDBUS_ITEM_CONN_DESCRIPTION },
++		{ .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
++				 KDBUS_HELLO_ACCEPT_FD |
++				 KDBUS_HELLO_ACTIVATOR |
++				 KDBUS_HELLO_POLICY_HOLDER |
++				 KDBUS_HELLO_MONITOR,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret < 0)
++		return ERR_PTR(ret);
++	if (ret > 0)
++		return NULL;
++
++	item_name = argv[1].item ? argv[1].item->str : NULL;
++
++	c = kdbus_conn_new(ep, file, cmd, item_name,
++			   argv[2].item ? &argv[2].item->creds : NULL,
++			   argv[3].item ? &argv[3].item->pids : NULL,
++			   argv[4].item ? argv[4].item->str : NULL,
++			   argv[5].item ? argv[5].item->str : NULL);
++	if (IS_ERR(c)) {
++		ret = PTR_ERR(c);
++		c = NULL;
++		goto exit;
++	}
++
++	ret = kdbus_conn_connect(c, item_name);
++	if (ret < 0)
++		goto exit;
++
++	if (kdbus_conn_is_activator(c) || kdbus_conn_is_policy_holder(c)) {
++		ret = kdbus_conn_acquire(c);
++		if (ret < 0)
++			goto exit;
++
++		ret = kdbus_policy_set(&c->ep->bus->policy_db, args.items,
++				       args.items_size, 1,
++				       kdbus_conn_is_policy_holder(c), c);
++		kdbus_conn_release(c);
++		if (ret < 0)
++			goto exit;
++	}
++
++	if (copy_to_user(argp, cmd, sizeof(*cmd)))
++		ret = -EFAULT;
++
++exit:
++	ret = kdbus_args_clear(&args, ret);
++	if (ret < 0) {
++		if (c) {
++			kdbus_conn_disconnect(c, false);
++			kdbus_conn_unref(c);
++		}
++		return ERR_PTR(ret);
++	}
++	return c;
++}
++
++/**
++ * kdbus_cmd_byebye_unlocked() - handle KDBUS_CMD_BYEBYE
++ * @conn:		connection to operate on
++ * @argp:		command payload
++ *
++ * The caller must not hold any active reference to @conn or this will deadlock.
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_byebye_unlocked(struct kdbus_conn *conn, void __user *argp)
++{
++	struct kdbus_cmd *cmd;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	if (!kdbus_conn_is_ordinary(conn))
++		return -EOPNOTSUPP;
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret != 0)
++		return ret;
++
++	ret = kdbus_conn_disconnect(conn, true);
++	return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_conn_info() - handle KDBUS_CMD_CONN_INFO
++ * @conn:		connection to operate on
++ * @argp:		command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_conn_info(struct kdbus_conn *conn, void __user *argp)
++{
++	struct kdbus_meta_conn *conn_meta = NULL;
++	struct kdbus_pool_slice *slice = NULL;
++	struct kdbus_name_entry *entry = NULL;
++	struct kdbus_name_owner *owner = NULL;
++	struct kdbus_conn *owner_conn = NULL;
++	struct kdbus_item *meta_items = NULL;
++	struct kdbus_info info = {};
++	struct kdbus_cmd_info *cmd;
++	struct kdbus_bus *bus = conn->ep->bus;
++	struct kvec kvec[3];
++	size_t meta_size, cnt = 0;
++	const char *name;
++	u64 attach_flags, size = 0;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++		{ .type = KDBUS_ITEM_NAME },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret != 0)
++		return ret;
++
++	/* registry must be held throughout lookup *and* collecting data */
++	down_read(&bus->name_registry->rwlock);
++
++	ret = kdbus_sanitize_attach_flags(cmd->attach_flags, &attach_flags);
++	if (ret < 0)
++		goto exit;
++
++	name = argv[1].item ? argv[1].item->str : NULL;
++
++	if (name) {
++		entry = kdbus_name_lookup_unlocked(bus->name_registry, name);
++		if (entry)
++			owner = kdbus_name_get_owner(entry);
++		if (!owner ||
++		    !kdbus_conn_policy_see_name(conn, current_cred(), name) ||
++		    (cmd->id != 0 && owner->conn->id != cmd->id)) {
++			/* pretend a name doesn't exist if you cannot see it */
++			ret = -ESRCH;
++			goto exit;
++		}
++
++		owner_conn = kdbus_conn_ref(owner->conn);
++	} else if (cmd->id > 0) {
++		owner_conn = kdbus_bus_find_conn_by_id(bus, cmd->id);
++		if (!owner_conn || !kdbus_conn_policy_see(conn, current_cred(),
++							  owner_conn)) {
++			/* pretend an id doesn't exist if you cannot see it */
++			ret = -ENXIO;
++			goto exit;
++		}
++	} else {
++		ret = -EINVAL;
++		goto exit;
++	}
++
++	attach_flags &= atomic64_read(&owner_conn->attach_flags_send);
++
++	conn_meta = kdbus_meta_conn_new();
++	if (IS_ERR(conn_meta)) {
++		ret = PTR_ERR(conn_meta);
++		conn_meta = NULL;
++		goto exit;
++	}
++
++	ret = kdbus_meta_conn_collect(conn_meta, owner_conn, 0, attach_flags);
++	if (ret < 0)
++		goto exit;
++
++	ret = kdbus_meta_emit(owner_conn->meta_proc, owner_conn->meta_fake,
++			      conn_meta, conn, attach_flags,
++			      &meta_items, &meta_size);
++	if (ret < 0)
++		goto exit;
++
++	info.id = owner_conn->id;
++	info.flags = owner_conn->flags;
++
++	kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &size);
++	if (meta_size > 0) {
++		kdbus_kvec_set(&kvec[cnt++], meta_items, meta_size, &size);
++		cnt += !!kdbus_kvec_pad(&kvec[cnt], &size);
++	}
++
++	info.size = size;
++
++	slice = kdbus_pool_slice_alloc(conn->pool, size, false);
++	if (IS_ERR(slice)) {
++		ret = PTR_ERR(slice);
++		slice = NULL;
++		goto exit;
++	}
++
++	ret = kdbus_pool_slice_copy_kvec(slice, 0, kvec, cnt, size);
++	if (ret < 0)
++		goto exit;
++
++	kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->info_size);
++
++	if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
++	    kdbus_member_set_user(&cmd->info_size, argp,
++				  typeof(*cmd), info_size)) {
++		ret = -EFAULT;
++		goto exit;
++	}
++
++	ret = 0;
++
++exit:
++	up_read(&bus->name_registry->rwlock);
++	kdbus_pool_slice_release(slice);
++	kfree(meta_items);
++	kdbus_meta_conn_unref(conn_meta);
++	kdbus_conn_unref(owner_conn);
++	return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_update() - handle KDBUS_CMD_UPDATE
++ * @conn:		connection to operate on
++ * @argp:		command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_update(struct kdbus_conn *conn, void __user *argp)
++{
++	struct kdbus_item *item_policy;
++	u64 *item_attach_send = NULL;
++	u64 *item_attach_recv = NULL;
++	struct kdbus_cmd *cmd;
++	u64 attach_send;
++	u64 attach_recv;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++		{ .type = KDBUS_ITEM_ATTACH_FLAGS_SEND },
++		{ .type = KDBUS_ITEM_ATTACH_FLAGS_RECV },
++		{ .type = KDBUS_ITEM_NAME, .multiple = true },
++		{ .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret != 0)
++		return ret;
++
++	item_attach_send = argv[1].item ? &argv[1].item->data64[0] : NULL;
++	item_attach_recv = argv[2].item ? &argv[2].item->data64[0] : NULL;
++	item_policy = argv[3].item ? : argv[4].item;
++
++	if (item_attach_send) {
++		if (!kdbus_conn_is_ordinary(conn) &&
++		    !kdbus_conn_is_monitor(conn)) {
++			ret = -EOPNOTSUPP;
++			goto exit;
++		}
++
++		ret = kdbus_sanitize_attach_flags(*item_attach_send,
++						  &attach_send);
++		if (ret < 0)
++			goto exit;
++	}
++
++	if (item_attach_recv) {
++		if (!kdbus_conn_is_ordinary(conn) &&
++		    !kdbus_conn_is_monitor(conn) &&
++		    !kdbus_conn_is_activator(conn)) {
++			ret = -EOPNOTSUPP;
++			goto exit;
++		}
++
++		ret = kdbus_sanitize_attach_flags(*item_attach_recv,
++						  &attach_recv);
++		if (ret < 0)
++			goto exit;
++	}
++
++	if (item_policy && !kdbus_conn_is_policy_holder(conn)) {
++		ret = -EOPNOTSUPP;
++		goto exit;
++	}
++
++	/* now that we verified the input, update the connection */
++
++	if (item_policy) {
++		ret = kdbus_policy_set(&conn->ep->bus->policy_db, cmd->items,
++				       KDBUS_ITEMS_SIZE(cmd, items),
++				       1, true, conn);
++		if (ret < 0)
++			goto exit;
++	}
++
++	if (item_attach_send)
++		atomic64_set(&conn->attach_flags_send, attach_send);
++
++	if (item_attach_recv)
++		atomic64_set(&conn->attach_flags_recv, attach_recv);
++
++exit:
++	return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_send() - handle KDBUS_CMD_SEND
++ * @conn:		connection to operate on
++ * @f:			file this command was called on
++ * @argp:		command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_send(struct kdbus_conn *conn, struct file *f, void __user *argp)
++{
++	struct kdbus_cmd_send *cmd;
++	struct kdbus_staging *staging = NULL;
++	struct kdbus_msg *msg = NULL;
++	struct file *cancel_fd = NULL;
++	int ret, ret2;
++
++	/* command arguments */
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++		{ .type = KDBUS_ITEM_CANCEL_FD },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
++				 KDBUS_SEND_SYNC_REPLY,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	/* message arguments */
++	struct kdbus_arg msg_argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++		{ .type = KDBUS_ITEM_PAYLOAD_VEC, .multiple = true },
++		{ .type = KDBUS_ITEM_PAYLOAD_MEMFD, .multiple = true },
++		{ .type = KDBUS_ITEM_FDS },
++		{ .type = KDBUS_ITEM_BLOOM_FILTER },
++		{ .type = KDBUS_ITEM_DST_NAME },
++	};
++	struct kdbus_args msg_args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
++				 KDBUS_MSG_EXPECT_REPLY |
++				 KDBUS_MSG_NO_AUTO_START |
++				 KDBUS_MSG_SIGNAL,
++		.argv = msg_argv,
++		.argc = ARRAY_SIZE(msg_argv),
++	};
++
++	if (!kdbus_conn_is_ordinary(conn))
++		return -EOPNOTSUPP;
++
++	/* make sure to parse both, @cmd and @msg on negotiation */
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret < 0)
++		goto exit;
++	else if (ret > 0 && !cmd->msg_address) /* negotiation without msg */
++		goto exit;
++
++	ret2 = kdbus_args_parse_msg(&msg_args, KDBUS_PTR(cmd->msg_address),
++				    &msg);
++	if (ret2 < 0) { /* cannot parse message */
++		ret = ret2;
++		goto exit;
++	} else if (ret2 > 0 && !ret) { /* msg-negot implies cmd-negot */
++		ret = -EINVAL;
++		goto exit;
++	} else if (ret > 0) { /* negotiation */
++		goto exit;
++	}
++
++	/* here we parsed both, @cmd and @msg, and neither wants negotiation */
++
++	cmd->reply.return_flags = 0;
++	kdbus_pool_publish_empty(conn->pool, &cmd->reply.offset,
++				 &cmd->reply.msg_size);
++
++	if (argv[1].item) {
++		cancel_fd = fget(argv[1].item->fds[0]);
++		if (!cancel_fd) {
++			ret = -EBADF;
++			goto exit;
++		}
++
++		if (!cancel_fd->f_op->poll) {
++			ret = -EINVAL;
++			goto exit;
++		}
++	}
++
++	/* patch-in the source of this message */
++	if (msg->src_id > 0 && msg->src_id != conn->id) {
++		ret = -EINVAL;
++		goto exit;
++	}
++	msg->src_id = conn->id;
++
++	staging = kdbus_staging_new_user(conn->ep->bus, cmd, msg);
++	if (IS_ERR(staging)) {
++		ret = PTR_ERR(staging);
++		staging = NULL;
++		goto exit;
++	}
++
++	if (msg->dst_id == KDBUS_DST_ID_BROADCAST) {
++		down_read(&conn->ep->bus->name_registry->rwlock);
++		kdbus_bus_broadcast(conn->ep->bus, conn, staging);
++		up_read(&conn->ep->bus->name_registry->rwlock);
++	} else if (cmd->flags & KDBUS_SEND_SYNC_REPLY) {
++		struct kdbus_reply *r;
++		ktime_t exp;
++
++		exp = ns_to_ktime(msg->timeout_ns);
++		r = kdbus_conn_call(conn, staging, exp);
++		if (IS_ERR(r)) {
++			ret = PTR_ERR(r);
++			goto exit;
++		}
++
++		ret = kdbus_conn_wait_reply(conn, cmd, f, cancel_fd, r, exp);
++		kdbus_reply_unref(r);
++		if (ret < 0)
++			goto exit;
++	} else if ((msg->flags & KDBUS_MSG_EXPECT_REPLY) ||
++		   msg->cookie_reply == 0) {
++		ret = kdbus_conn_unicast(conn, staging);
++		if (ret < 0)
++			goto exit;
++	} else {
++		ret = kdbus_conn_reply(conn, staging);
++		if (ret < 0)
++			goto exit;
++	}
++
++	if (kdbus_member_set_user(&cmd->reply, argp, typeof(*cmd), reply))
++		ret = -EFAULT;
++
++exit:
++	if (cancel_fd)
++		fput(cancel_fd);
++	kdbus_staging_free(staging);
++	ret = kdbus_args_clear(&msg_args, ret);
++	return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_recv() - handle KDBUS_CMD_RECV
++ * @conn:		connection to operate on
++ * @argp:		command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_recv(struct kdbus_conn *conn, void __user *argp)
++{
++	struct kdbus_queue_entry *entry;
++	struct kdbus_cmd_recv *cmd;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
++				 KDBUS_RECV_PEEK |
++				 KDBUS_RECV_DROP |
++				 KDBUS_RECV_USE_PRIORITY,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	if (!kdbus_conn_is_ordinary(conn) &&
++	    !kdbus_conn_is_monitor(conn) &&
++	    !kdbus_conn_is_activator(conn))
++		return -EOPNOTSUPP;
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret != 0)
++		return ret;
++
++	cmd->dropped_msgs = 0;
++	cmd->msg.return_flags = 0;
++	kdbus_pool_publish_empty(conn->pool, &cmd->msg.offset,
++				 &cmd->msg.msg_size);
++
++	/* DROP+priority is not reliable, so prevent it */
++	if ((cmd->flags & KDBUS_RECV_DROP) &&
++	    (cmd->flags & KDBUS_RECV_USE_PRIORITY)) {
++		ret = -EINVAL;
++		goto exit;
++	}
++
++	mutex_lock(&conn->lock);
++
++	entry = kdbus_queue_peek(&conn->queue, cmd->priority,
++				 cmd->flags & KDBUS_RECV_USE_PRIORITY);
++	if (!entry) {
++		mutex_unlock(&conn->lock);
++		ret = -EAGAIN;
++	} else if (cmd->flags & KDBUS_RECV_DROP) {
++		struct kdbus_reply *reply = kdbus_reply_ref(entry->reply);
++
++		kdbus_queue_entry_free(entry);
++
++		mutex_unlock(&conn->lock);
++
++		if (reply) {
++			mutex_lock(&reply->reply_dst->lock);
++			if (!list_empty(&reply->entry)) {
++				kdbus_reply_unlink(reply);
++				if (reply->sync)
++					kdbus_sync_reply_wakeup(reply, -EPIPE);
++				else
++					kdbus_notify_reply_dead(conn->ep->bus,
++							reply->reply_dst->id,
++							reply->cookie);
++			}
++			mutex_unlock(&reply->reply_dst->lock);
++			kdbus_notify_flush(conn->ep->bus);
++		}
++
++		kdbus_reply_unref(reply);
++	} else {
++		bool install_fds;
++
++		/*
++		 * PEEK just returns the location of the next message. Do not
++		 * install FDs nor memfds nor anything else. The only
++		 * information of interest should be the message header and
++		 * metadata. Any FD numbers in the payload are undefined for
++		 * PEEK'ed messages.
++		 * Also make sure to never install fds into a connection that
++		 * has refused to receive any. Ordinary connections will not get
++		 * messages with FDs queued (the receiver will get -ECOMM), but
++		 * eavesdroppers might.
++		 */
++		install_fds = (conn->flags & KDBUS_HELLO_ACCEPT_FD) &&
++			      !(cmd->flags & KDBUS_RECV_PEEK);
++
++		ret = kdbus_queue_entry_install(entry,
++						&cmd->msg.return_flags,
++						install_fds);
++		if (ret < 0) {
++			mutex_unlock(&conn->lock);
++			goto exit;
++		}
++
++		kdbus_pool_slice_publish(entry->slice, &cmd->msg.offset,
++					 &cmd->msg.msg_size);
++
++		if (!(cmd->flags & KDBUS_RECV_PEEK))
++			kdbus_queue_entry_free(entry);
++
++		mutex_unlock(&conn->lock);
++	}
++
++	cmd->dropped_msgs = atomic_xchg(&conn->lost_count, 0);
++	if (cmd->dropped_msgs > 0)
++		cmd->return_flags |= KDBUS_RECV_RETURN_DROPPED_MSGS;
++
++	if (kdbus_member_set_user(&cmd->msg, argp, typeof(*cmd), msg) ||
++	    kdbus_member_set_user(&cmd->dropped_msgs, argp, typeof(*cmd),
++				  dropped_msgs))
++		ret = -EFAULT;
++
++exit:
++	return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_free() - handle KDBUS_CMD_FREE
++ * @conn:		connection to operate on
++ * @argp:		command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_free(struct kdbus_conn *conn, void __user *argp)
++{
++	struct kdbus_cmd_free *cmd;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	if (!kdbus_conn_is_ordinary(conn) &&
++	    !kdbus_conn_is_monitor(conn) &&
++	    !kdbus_conn_is_activator(conn))
++		return -EOPNOTSUPP;
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret != 0)
++		return ret;
++
++	ret = kdbus_pool_release_offset(conn->pool, cmd->offset);
++
++	return kdbus_args_clear(&args, ret);
++}
+diff --git a/ipc/kdbus/connection.h b/ipc/kdbus/connection.h
+new file mode 100644
+index 0000000..1ad0820
+--- /dev/null
++++ b/ipc/kdbus/connection.h
+@@ -0,0 +1,260 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_CONNECTION_H
++#define __KDBUS_CONNECTION_H
++
++#include <linux/atomic.h>
++#include <linux/kref.h>
++#include <linux/lockdep.h>
++#include <linux/path.h>
++
++#include "limits.h"
++#include "metadata.h"
++#include "pool.h"
++#include "queue.h"
++#include "util.h"
++
++#define KDBUS_HELLO_SPECIAL_CONN	(KDBUS_HELLO_ACTIVATOR | \
++					 KDBUS_HELLO_POLICY_HOLDER | \
++					 KDBUS_HELLO_MONITOR)
++
++struct kdbus_name_entry;
++struct kdbus_quota;
++struct kdbus_staging;
++
++/**
++ * struct kdbus_conn - connection to a bus
++ * @kref:		Reference count
++ * @active:		Active references to the connection
++ * @id:			Connection ID
++ * @flags:		KDBUS_HELLO_* flags
++ * @attach_flags_send:	KDBUS_ATTACH_* flags for sending
++ * @attach_flags_recv:	KDBUS_ATTACH_* flags for receiving
++ * @description:	Human-readable connection description, used for
++ *			debugging. This field is only set when the
++ *			connection is created.
++ * @ep:			The endpoint this connection belongs to
++ * @lock:		Connection data lock
++ * @hentry:		Entry in ID <-> connection map
++ * @ep_entry:		Entry in endpoint
++ * @monitor_entry:	Entry in monitor, if the connection is a monitor
++ * @reply_list:		List of connections this connection should
++ *			reply to
++ * @work:		Delayed work to handle timeouts of pending
++ *			replies
++ * @match_db:		Subscription filter to broadcast messages
++ * @meta_proc:		Process metadata of connection creator, or NULL
++ * @meta_fake:		Faked metadata, or NULL
++ * @pool:		The user's buffer to receive messages
++ * @user:		Owner of the connection
++ * @cred:		The credentials of the connection at creation time
++ * @pid:		Pid at creation time
++ * @root_path:		Root path at creation time
++ * @request_count:	Number of pending requests issued by this
++ *			connection that are waiting for replies from
++ *			other peers
++ * @lost_count:		Number of lost broadcast messages
++ * @wait:		Wake up this endpoint
++ * @queue:		The message queue associated with this connection
++ * @quota:		Array of per-user quota indexed by user->id
++ * @n_quota:		Number of elements in quota array
++ * @names_list:		List of well-known names
++ * @name_count:		Number of owned well-known names
++ * @privileged:		Whether this connection is privileged on the domain
++ * @owner:		Owned by the same user as the bus owner
++ */
++struct kdbus_conn {
++	struct kref kref;
++	atomic_t active;
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++	struct lockdep_map dep_map;
++#endif
++	u64 id;
++	u64 flags;
++	atomic64_t attach_flags_send;
++	atomic64_t attach_flags_recv;
++	const char *description;
++	struct kdbus_ep *ep;
++	struct mutex lock;
++	struct hlist_node hentry;
++	struct list_head ep_entry;
++	struct list_head monitor_entry;
++	struct list_head reply_list;
++	struct delayed_work work;
++	struct kdbus_match_db *match_db;
++	struct kdbus_meta_proc *meta_proc;
++	struct kdbus_meta_fake *meta_fake;
++	struct kdbus_pool *pool;
++	struct kdbus_user *user;
++	const struct cred *cred;
++	struct pid *pid;
++	struct path root_path;
++	atomic_t request_count;
++	atomic_t lost_count;
++	wait_queue_head_t wait;
++	struct kdbus_queue queue;
++
++	struct kdbus_quota *quota;
++	unsigned int n_quota;
++
++	/* protected by registry->rwlock */
++	struct list_head names_list;
++	unsigned int name_count;
++
++	bool privileged:1;
++	bool owner:1;
++};
++
++struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn);
++struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn);
++bool kdbus_conn_active(const struct kdbus_conn *conn);
++int kdbus_conn_acquire(struct kdbus_conn *conn);
++void kdbus_conn_release(struct kdbus_conn *conn);
++int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty);
++bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name);
++int kdbus_conn_quota_inc(struct kdbus_conn *c, struct kdbus_user *u,
++			 size_t memory, size_t fds);
++void kdbus_conn_quota_dec(struct kdbus_conn *c, struct kdbus_user *u,
++			  size_t memory, size_t fds);
++void kdbus_conn_lost_message(struct kdbus_conn *c);
++int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
++			    struct kdbus_conn *conn_dst,
++			    struct kdbus_staging *staging,
++			    struct kdbus_reply *reply,
++			    const struct kdbus_name_entry *name);
++void kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
++			      struct kdbus_conn *conn_src,
++			      u64 name_id);
++
++/* policy */
++bool kdbus_conn_policy_own_name(struct kdbus_conn *conn,
++				const struct cred *conn_creds,
++				const char *name);
++bool kdbus_conn_policy_talk(struct kdbus_conn *conn,
++			    const struct cred *conn_creds,
++			    struct kdbus_conn *to);
++bool kdbus_conn_policy_see_name_unlocked(struct kdbus_conn *conn,
++					 const struct cred *curr_creds,
++					 const char *name);
++bool kdbus_conn_policy_see_notification(struct kdbus_conn *conn,
++					const struct cred *curr_creds,
++					const struct kdbus_msg *msg);
++
++/* command dispatcher */
++struct kdbus_conn *kdbus_cmd_hello(struct kdbus_ep *ep, struct file *file,
++				   void __user *argp);
++int kdbus_cmd_byebye_unlocked(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_conn_info(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_update(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_send(struct kdbus_conn *conn, struct file *f, void __user *argp);
++int kdbus_cmd_recv(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_free(struct kdbus_conn *conn, void __user *argp);
++
++/**
++ * kdbus_conn_is_ordinary() - Check if connection is ordinary
++ * @conn:		The connection to check
++ *
++ * Return: Non-zero if the connection is an ordinary connection
++ */
++static inline int kdbus_conn_is_ordinary(const struct kdbus_conn *conn)
++{
++	return !(conn->flags & KDBUS_HELLO_SPECIAL_CONN);
++}
++
++/**
++ * kdbus_conn_is_activator() - Check if connection is an activator
++ * @conn:		The connection to check
++ *
++ * Return: Non-zero if the connection is an activator
++ */
++static inline int kdbus_conn_is_activator(const struct kdbus_conn *conn)
++{
++	return conn->flags & KDBUS_HELLO_ACTIVATOR;
++}
++
++/**
++ * kdbus_conn_is_policy_holder() - Check if connection is a policy holder
++ * @conn:		The connection to check
++ *
++ * Return: Non-zero if the connection is a policy holder
++ */
++static inline int kdbus_conn_is_policy_holder(const struct kdbus_conn *conn)
++{
++	return conn->flags & KDBUS_HELLO_POLICY_HOLDER;
++}
++
++/**
++ * kdbus_conn_is_monitor() - Check if connection is a monitor
++ * @conn:		The connection to check
++ *
++ * Return: Non-zero if the connection is a monitor
++ */
++static inline int kdbus_conn_is_monitor(const struct kdbus_conn *conn)
++{
++	return conn->flags & KDBUS_HELLO_MONITOR;
++}
++
++/**
++ * kdbus_conn_lock2() - Lock two connections
++ * @a:		connection A to lock or NULL
++ * @b:		connection B to lock or NULL
++ *
++ * Lock two connections at once. As we need to have a stable locking order, we
++ * always lock the connection with lower memory address first.
++ */
++static inline void kdbus_conn_lock2(struct kdbus_conn *a, struct kdbus_conn *b)
++{
++	if (a < b) {
++		if (a)
++			mutex_lock(&a->lock);
++		if (b && b != a)
++			mutex_lock_nested(&b->lock, !!a);
++	} else {
++		if (b)
++			mutex_lock(&b->lock);
++		if (a && a != b)
++			mutex_lock_nested(&a->lock, !!b);
++	}
++}
++
++/**
++ * kdbus_conn_unlock2() - Unlock two connections
++ * @a:		connection A to unlock or NULL
++ * @b:		connection B to unlock or NULL
++ *
++ * Unlock two connections at once. See kdbus_conn_lock2().
++ */
++static inline void kdbus_conn_unlock2(struct kdbus_conn *a,
++				      struct kdbus_conn *b)
++{
++	if (a)
++		mutex_unlock(&a->lock);
++	if (b && b != a)
++		mutex_unlock(&b->lock);
++}
++
++/**
++ * kdbus_conn_assert_active() - lockdep assert on active lock
++ * @conn:	connection that shall be active
++ *
++ * This verifies via lockdep that the caller holds an active reference to the
++ * given connection.
++ */
++static inline void kdbus_conn_assert_active(struct kdbus_conn *conn)
++{
++	lockdep_assert_held(conn);
++}
++
++#endif
+diff --git a/ipc/kdbus/domain.c b/ipc/kdbus/domain.c
+new file mode 100644
+index 0000000..ac9f760
+--- /dev/null
++++ b/ipc/kdbus/domain.c
+@@ -0,0 +1,296 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/fs.h>
++#include <linux/idr.h>
++#include <linux/init.h>
++#include <linux/module.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++
++#include "bus.h"
++#include "domain.h"
++#include "handle.h"
++#include "item.h"
++#include "limits.h"
++#include "util.h"
++
++static void kdbus_domain_control_free(struct kdbus_node *node)
++{
++	kfree(node);
++}
++
++static struct kdbus_node *kdbus_domain_control_new(struct kdbus_domain *domain,
++						   unsigned int access)
++{
++	struct kdbus_node *node;
++	int ret;
++
++	node = kzalloc(sizeof(*node), GFP_KERNEL);
++	if (!node)
++		return ERR_PTR(-ENOMEM);
++
++	kdbus_node_init(node, KDBUS_NODE_CONTROL);
++
++	node->free_cb = kdbus_domain_control_free;
++	node->mode = domain->node.mode;
++	node->mode = S_IRUSR | S_IWUSR;
++	if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
++		node->mode |= S_IRGRP | S_IWGRP;
++	if (access & KDBUS_MAKE_ACCESS_WORLD)
++		node->mode |= S_IROTH | S_IWOTH;
++
++	ret = kdbus_node_link(node, &domain->node, "control");
++	if (ret < 0)
++		goto exit_free;
++
++	return node;
++
++exit_free:
++	kdbus_node_deactivate(node);
++	kdbus_node_unref(node);
++	return ERR_PTR(ret);
++}
++
++static void kdbus_domain_free(struct kdbus_node *node)
++{
++	struct kdbus_domain *domain =
++		container_of(node, struct kdbus_domain, node);
++
++	put_user_ns(domain->user_namespace);
++	ida_destroy(&domain->user_ida);
++	idr_destroy(&domain->user_idr);
++	kfree(domain);
++}
++
++/**
++ * kdbus_domain_new() - create a new domain
++ * @access:		The access mode for this node (KDBUS_MAKE_ACCESS_*)
++ *
++ * Return: a new kdbus_domain on success, ERR_PTR on failure
++ */
++struct kdbus_domain *kdbus_domain_new(unsigned int access)
++{
++	struct kdbus_domain *d;
++	int ret;
++
++	d = kzalloc(sizeof(*d), GFP_KERNEL);
++	if (!d)
++		return ERR_PTR(-ENOMEM);
++
++	kdbus_node_init(&d->node, KDBUS_NODE_DOMAIN);
++
++	d->node.free_cb = kdbus_domain_free;
++	d->node.mode = S_IRUSR | S_IXUSR;
++	if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
++		d->node.mode |= S_IRGRP | S_IXGRP;
++	if (access & KDBUS_MAKE_ACCESS_WORLD)
++		d->node.mode |= S_IROTH | S_IXOTH;
++
++	mutex_init(&d->lock);
++	idr_init(&d->user_idr);
++	ida_init(&d->user_ida);
++
++	/* Pin user namespace so we can guarantee domain-unique bus names. */
++	d->user_namespace = get_user_ns(current_user_ns());
++
++	ret = kdbus_node_link(&d->node, NULL, NULL);
++	if (ret < 0)
++		goto exit_unref;
++
++	return d;
++
++exit_unref:
++	kdbus_node_deactivate(&d->node);
++	kdbus_node_unref(&d->node);
++	return ERR_PTR(ret);
++}
++
++/**
++ * kdbus_domain_ref() - take a domain reference
++ * @domain:		Domain
++ *
++ * Return: the domain itself
++ */
++struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain)
++{
++	if (domain)
++		kdbus_node_ref(&domain->node);
++	return domain;
++}
++
++/**
++ * kdbus_domain_unref() - drop a domain reference
++ * @domain:		Domain
++ *
++ * When the last reference is dropped, the domain internal structure
++ * is freed.
++ *
++ * Return: NULL
++ */
++struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain)
++{
++	if (domain)
++		kdbus_node_unref(&domain->node);
++	return NULL;
++}
++
++/**
++ * kdbus_domain_populate() - populate static domain nodes
++ * @domain:	domain to populate
++ * @access:	KDBUS_MAKE_ACCESS_* access restrictions for new nodes
++ *
++ * Allocate and activate static sub-nodes of the given domain. This will fail if
++ * you call it on a non-active node or if the domain was already populated.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_domain_populate(struct kdbus_domain *domain, unsigned int access)
++{
++	struct kdbus_node *control;
++
++	/*
++	 * Create a control-node for this domain. We drop our own reference
++	 * immediately, effectively causing the node to be deactivated and
++	 * released when the parent domain is.
++	 */
++	control = kdbus_domain_control_new(domain, access);
++	if (IS_ERR(control))
++		return PTR_ERR(control);
++
++	kdbus_node_activate(control);
++	kdbus_node_unref(control);
++	return 0;
++}
++
++/**
++ * kdbus_user_lookup() - lookup a kdbus_user object
++ * @domain:		domain of the user
++ * @uid:		uid of the user; INVALID_UID for an anon user
++ *
++ * Lookup the kdbus user accounting object for the given domain. If INVALID_UID
++ * is passed, a new anonymous user is created which is private to the caller.
++ *
++ * Return: The user object is returned, ERR_PTR on failure.
++ */
++struct kdbus_user *kdbus_user_lookup(struct kdbus_domain *domain, kuid_t uid)
++{
++	struct kdbus_user *u = NULL, *old = NULL;
++	int ret;
++
++	mutex_lock(&domain->lock);
++
++	if (uid_valid(uid)) {
++		old = idr_find(&domain->user_idr, __kuid_val(uid));
++		/*
++		 * If the object is about to be destroyed, ignore it and
++		 * replace the slot in the IDR later on.
++		 */
++		if (old && kref_get_unless_zero(&old->kref)) {
++			mutex_unlock(&domain->lock);
++			return old;
++		}
++	}
++
++	u = kzalloc(sizeof(*u), GFP_KERNEL);
++	if (!u) {
++		ret = -ENOMEM;
++		goto exit;
++	}
++
++	kref_init(&u->kref);
++	u->domain = kdbus_domain_ref(domain);
++	u->uid = uid;
++	atomic_set(&u->buses, 0);
++	atomic_set(&u->connections, 0);
++
++	if (uid_valid(uid)) {
++		if (old) {
++			idr_replace(&domain->user_idr, u, __kuid_val(uid));
++			old->uid = INVALID_UID; /* mark old as removed */
++		} else {
++			ret = idr_alloc(&domain->user_idr, u, __kuid_val(uid),
++					__kuid_val(uid) + 1, GFP_KERNEL);
++			if (ret < 0)
++				goto exit;
++		}
++	}
++
++	/*
++	 * Allocate the smallest possible index for this user; used
++	 * in arrays for accounting user quota in receiver queues.
++	 */
++	ret = ida_simple_get(&domain->user_ida, 1, 0, GFP_KERNEL);
++	if (ret < 0)
++		goto exit;
++
++	u->id = ret;
++	mutex_unlock(&domain->lock);
++	return u;
++
++exit:
++	if (u) {
++		if (uid_valid(u->uid))
++			idr_remove(&domain->user_idr, __kuid_val(u->uid));
++		kdbus_domain_unref(u->domain);
++		kfree(u);
++	}
++	mutex_unlock(&domain->lock);
++	return ERR_PTR(ret);
++}
++
++static void __kdbus_user_free(struct kref *kref)
++{
++	struct kdbus_user *user = container_of(kref, struct kdbus_user, kref);
++
++	WARN_ON(atomic_read(&user->buses) > 0);
++	WARN_ON(atomic_read(&user->connections) > 0);
++
++	mutex_lock(&user->domain->lock);
++	ida_simple_remove(&user->domain->user_ida, user->id);
++	if (uid_valid(user->uid))
++		idr_remove(&user->domain->user_idr, __kuid_val(user->uid));
++	mutex_unlock(&user->domain->lock);
++
++	kdbus_domain_unref(user->domain);
++	kfree(user);
++}
++
++/**
++ * kdbus_user_ref() - take a user reference
++ * @u:		User
++ *
++ * Return: @u is returned
++ */
++struct kdbus_user *kdbus_user_ref(struct kdbus_user *u)
++{
++	if (u)
++		kref_get(&u->kref);
++	return u;
++}
++
++/**
++ * kdbus_user_unref() - drop a user reference
++ * @u:		User
++ *
++ * Return: NULL
++ */
++struct kdbus_user *kdbus_user_unref(struct kdbus_user *u)
++{
++	if (u)
++		kref_put(&u->kref, __kdbus_user_free);
++	return NULL;
++}
+diff --git a/ipc/kdbus/domain.h b/ipc/kdbus/domain.h
+new file mode 100644
+index 0000000..447a2bd
+--- /dev/null
++++ b/ipc/kdbus/domain.h
+@@ -0,0 +1,77 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_DOMAIN_H
++#define __KDBUS_DOMAIN_H
++
++#include <linux/fs.h>
++#include <linux/idr.h>
++#include <linux/kref.h>
++#include <linux/user_namespace.h>
++
++#include "node.h"
++
++/**
++ * struct kdbus_domain - domain for buses
++ * @node:		Underlying API node
++ * @lock:		Domain data lock
++ * @last_id:		Last used object id
++ * @user_idr:		Set of all users indexed by UID
++ * @user_ida:		Set of all users to compute small indices
++ * @user_namespace:	User namespace, pinned at creation time
++ * @dentry:		Root dentry of VFS mount (don't use outside of kdbusfs)
++ */
++struct kdbus_domain {
++	struct kdbus_node node;
++	struct mutex lock;
++	atomic64_t last_id;
++	struct idr user_idr;
++	struct ida user_ida;
++	struct user_namespace *user_namespace;
++	struct dentry *dentry;
++};
++
++/**
++ * struct kdbus_user - resource accounting for users
++ * @kref:		Reference counter
++ * @domain:		Domain of the user
++ * @id:			Index of this user
++ * @uid:		UID of the user
++ * @buses:		Number of buses the user has created
++ * @connections:	Number of connections the user has created
++ */
++struct kdbus_user {
++	struct kref kref;
++	struct kdbus_domain *domain;
++	unsigned int id;
++	kuid_t uid;
++	atomic_t buses;
++	atomic_t connections;
++};
++
++#define kdbus_domain_from_node(_node) \
++	container_of((_node), struct kdbus_domain, node)
++
++struct kdbus_domain *kdbus_domain_new(unsigned int access);
++struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain);
++struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain);
++int kdbus_domain_populate(struct kdbus_domain *domain, unsigned int access);
++
++#define KDBUS_USER_KERNEL_ID 0 /* ID 0 is reserved for kernel accounting */
++
++struct kdbus_user *kdbus_user_lookup(struct kdbus_domain *domain, kuid_t uid);
++struct kdbus_user *kdbus_user_ref(struct kdbus_user *u);
++struct kdbus_user *kdbus_user_unref(struct kdbus_user *u);
++
++#endif
+diff --git a/ipc/kdbus/endpoint.c b/ipc/kdbus/endpoint.c
+new file mode 100644
+index 0000000..44e7a20
+--- /dev/null
++++ b/ipc/kdbus/endpoint.c
+@@ -0,0 +1,303 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/fs.h>
++#include <linux/idr.h>
++#include <linux/init.h>
++#include <linux/module.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <linux/uio.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "domain.h"
++#include "endpoint.h"
++#include "handle.h"
++#include "item.h"
++#include "message.h"
++#include "policy.h"
++
++static void kdbus_ep_free(struct kdbus_node *node)
++{
++	struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
++
++	WARN_ON(!list_empty(&ep->conn_list));
++
++	kdbus_policy_db_clear(&ep->policy_db);
++	kdbus_bus_unref(ep->bus);
++	kdbus_user_unref(ep->user);
++	kfree(ep);
++}
++
++static void kdbus_ep_release(struct kdbus_node *node, bool was_active)
++{
++	struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
++
++	/* disconnect all connections to this endpoint */
++	for (;;) {
++		struct kdbus_conn *conn;
++
++		mutex_lock(&ep->lock);
++		conn = list_first_entry_or_null(&ep->conn_list,
++						struct kdbus_conn,
++						ep_entry);
++		if (!conn) {
++			mutex_unlock(&ep->lock);
++			break;
++		}
++
++		/* take reference, release lock, disconnect without lock */
++		kdbus_conn_ref(conn);
++		mutex_unlock(&ep->lock);
++
++		kdbus_conn_disconnect(conn, false);
++		kdbus_conn_unref(conn);
++	}
++}
++
++/**
++ * kdbus_ep_new() - create a new endpoint
++ * @bus:		The bus this endpoint will be created for
++ * @name:		The name of the endpoint
++ * @access:		The access flags for this node (KDBUS_MAKE_ACCESS_*)
++ * @uid:		The uid of the node
++ * @gid:		The gid of the node
++ * @is_custom:		Whether this is a custom endpoint
++ *
++ * This function will create a new endpoint with the given
++ * name and properties for a given bus.
++ *
++ * Return: a new kdbus_ep on success, ERR_PTR on failure.
++ */
++struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
++			      unsigned int access, kuid_t uid, kgid_t gid,
++			      bool is_custom)
++{
++	struct kdbus_ep *e;
++	int ret;
++
++	/*
++	 * Validate only custom endpoint names; default endpoints
++	 * with a "bus" name are created when the bus is created.
++	 */
++	if (is_custom) {
++		ret = kdbus_verify_uid_prefix(name, bus->domain->user_namespace,
++					      uid);
++		if (ret < 0)
++			return ERR_PTR(ret);
++	}
++
++	e = kzalloc(sizeof(*e), GFP_KERNEL);
++	if (!e)
++		return ERR_PTR(-ENOMEM);
++
++	kdbus_node_init(&e->node, KDBUS_NODE_ENDPOINT);
++
++	e->node.free_cb = kdbus_ep_free;
++	e->node.release_cb = kdbus_ep_release;
++	e->node.uid = uid;
++	e->node.gid = gid;
++	e->node.mode = S_IRUSR | S_IWUSR;
++	if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
++		e->node.mode |= S_IRGRP | S_IWGRP;
++	if (access & KDBUS_MAKE_ACCESS_WORLD)
++		e->node.mode |= S_IROTH | S_IWOTH;
++
++	mutex_init(&e->lock);
++	INIT_LIST_HEAD(&e->conn_list);
++	kdbus_policy_db_init(&e->policy_db);
++	e->bus = kdbus_bus_ref(bus);
++
++	ret = kdbus_node_link(&e->node, &bus->node, name);
++	if (ret < 0)
++		goto exit_unref;
++
++	/*
++	 * Transactions on custom endpoints are never accounted on the global
++	 * user limits. Instead, for each custom endpoint, we create a custom,
++	 * unique user, which all transactions are accounted on. Regardless of
++	 * the user using that endpoint, it is always accounted on the same
++	 * user-object. This budget is not shared with ordinary users on
++	 * non-custom endpoints.
++	 */
++	if (is_custom) {
++		e->user = kdbus_user_lookup(bus->domain, INVALID_UID);
++		if (IS_ERR(e->user)) {
++			ret = PTR_ERR(e->user);
++			e->user = NULL;
++			goto exit_unref;
++		}
++	}
++
++	return e;
++
++exit_unref:
++	kdbus_node_deactivate(&e->node);
++	kdbus_node_unref(&e->node);
++	return ERR_PTR(ret);
++}
++
++/**
++ * kdbus_ep_ref() - increase the reference counter of a kdbus_ep
++ * @ep:			The endpoint to reference
++ *
++ * Every user of an endpoint, except for its creator, must add a reference to
++ * the kdbus_ep instance using this function.
++ *
++ * Return: the ep itself
++ */
++struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep)
++{
++	if (ep)
++		kdbus_node_ref(&ep->node);
++	return ep;
++}
++
++/**
++ * kdbus_ep_unref() - decrease the reference counter of a kdbus_ep
++ * @ep:		The ep to unref
++ *
++ * Release a reference. If the reference count drops to 0, the ep will be
++ * freed.
++ *
++ * Return: NULL
++ */
++struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep)
++{
++	if (ep)
++		kdbus_node_unref(&ep->node);
++	return NULL;
++}
++
++/**
++ * kdbus_ep_is_privileged() - check whether a file is privileged
++ * @ep:		endpoint to operate on
++ * @file:	file to test
++ *
++ * Return: True if @file is privileged in the domain of @ep.
++ */
++bool kdbus_ep_is_privileged(struct kdbus_ep *ep, struct file *file)
++{
++	return !ep->user &&
++		file_ns_capable(file, ep->bus->domain->user_namespace,
++				CAP_IPC_OWNER);
++}
++
++/**
++ * kdbus_ep_is_owner() - check whether a file should be treated as bus owner
++ * @ep:		endpoint to operate on
++ * @file:	file to test
++ *
++ * Return: True if @file should be treated as bus owner on @ep
++ */
++bool kdbus_ep_is_owner(struct kdbus_ep *ep, struct file *file)
++{
++	return !ep->user &&
++		(uid_eq(file->f_cred->euid, ep->bus->node.uid) ||
++		 kdbus_ep_is_privileged(ep, file));
++}
++
++/**
++ * kdbus_cmd_ep_make() - handle KDBUS_CMD_ENDPOINT_MAKE
++ * @bus:		bus to operate on
++ * @argp:		command payload
++ *
++ * Return: NULL or newly created endpoint on success, ERR_PTR on failure.
++ */
++struct kdbus_ep *kdbus_cmd_ep_make(struct kdbus_bus *bus, void __user *argp)
++{
++	const char *item_make_name;
++	struct kdbus_ep *ep = NULL;
++	struct kdbus_cmd *cmd;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++		{ .type = KDBUS_ITEM_MAKE_NAME, .mandatory = true },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
++				 KDBUS_MAKE_ACCESS_GROUP |
++				 KDBUS_MAKE_ACCESS_WORLD,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret < 0)
++		return ERR_PTR(ret);
++	if (ret > 0)
++		return NULL;
++
++	item_make_name = argv[1].item->str;
++
++	ep = kdbus_ep_new(bus, item_make_name, cmd->flags,
++			  current_euid(), current_egid(), true);
++	if (IS_ERR(ep)) {
++		ret = PTR_ERR(ep);
++		ep = NULL;
++		goto exit;
++	}
++
++	if (!kdbus_node_activate(&ep->node)) {
++		ret = -ESHUTDOWN;
++		goto exit;
++	}
++
++exit:
++	ret = kdbus_args_clear(&args, ret);
++	if (ret < 0) {
++		if (ep) {
++			kdbus_node_deactivate(&ep->node);
++			kdbus_ep_unref(ep);
++		}
++		return ERR_PTR(ret);
++	}
++	return ep;
++}
++
++/**
++ * kdbus_cmd_ep_update() - handle KDBUS_CMD_ENDPOINT_UPDATE
++ * @ep:			endpoint to operate on
++ * @argp:		command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_ep_update(struct kdbus_ep *ep, void __user *argp)
++{
++	struct kdbus_cmd *cmd;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++		{ .type = KDBUS_ITEM_NAME, .multiple = true },
++		{ .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret != 0)
++		return ret;
++
++	ret = kdbus_policy_set(&ep->policy_db, args.items, args.items_size,
++			       0, true, ep);
++	return kdbus_args_clear(&args, ret);
++}
+diff --git a/ipc/kdbus/endpoint.h b/ipc/kdbus/endpoint.h
+new file mode 100644
+index 0000000..e0da59f
+--- /dev/null
++++ b/ipc/kdbus/endpoint.h
+@@ -0,0 +1,70 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_ENDPOINT_H
++#define __KDBUS_ENDPOINT_H
++
++#include <linux/list.h>
++#include <linux/mutex.h>
++#include <linux/uidgid.h>
++#include "node.h"
++#include "policy.h"
++
++struct kdbus_bus;
++struct kdbus_user;
++
++/**
++ * struct kdbus_ep - endpoint to access a bus
++ * @node:		The kdbus node
++ * @lock:		Endpoint data lock
++ * @bus:		Bus behind this endpoint
++ * @user:		Custom endpoints account against an anonymous user
++ * @policy_db:		Uploaded policy
++ * @conn_list:		Connections of this endpoint
++ *
++ * An endpoint offers access to a bus; the default endpoint node name is "bus".
++ * Additional custom endpoints to the same bus can be created and they can
++ * carry their own policies/filters.
++ */
++struct kdbus_ep {
++	struct kdbus_node node;
++	struct mutex lock;
++
++	/* static */
++	struct kdbus_bus *bus;
++	struct kdbus_user *user;
++
++	/* protected by own locks */
++	struct kdbus_policy_db policy_db;
++
++	/* protected by ep->lock */
++	struct list_head conn_list;
++};
++
++#define kdbus_ep_from_node(_node) \
++	container_of((_node), struct kdbus_ep, node)
++
++struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
++			      unsigned int access, kuid_t uid, kgid_t gid,
++			      bool policy);
++struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep);
++struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep);
++
++bool kdbus_ep_is_privileged(struct kdbus_ep *ep, struct file *file);
++bool kdbus_ep_is_owner(struct kdbus_ep *ep, struct file *file);
++
++struct kdbus_ep *kdbus_cmd_ep_make(struct kdbus_bus *bus, void __user *argp);
++int kdbus_cmd_ep_update(struct kdbus_ep *ep, void __user *argp);
++
++#endif
+diff --git a/ipc/kdbus/fs.c b/ipc/kdbus/fs.c
+new file mode 100644
+index 0000000..09c4809
+--- /dev/null
++++ b/ipc/kdbus/fs.c
+@@ -0,0 +1,508 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/dcache.h>
++#include <linux/fs.h>
++#include <linux/fsnotify.h>
++#include <linux/init.h>
++#include <linux/ipc_namespace.h>
++#include <linux/magic.h>
++#include <linux/module.h>
++#include <linux/mount.h>
++#include <linux/mutex.h>
++#include <linux/namei.h>
++#include <linux/pagemap.h>
++#include <linux/sched.h>
++#include <linux/slab.h>
++
++#include "bus.h"
++#include "domain.h"
++#include "endpoint.h"
++#include "fs.h"
++#include "handle.h"
++#include "node.h"
++
++#define kdbus_node_from_dentry(_dentry) \
++	((struct kdbus_node *)(_dentry)->d_fsdata)
++
++static struct inode *fs_inode_get(struct super_block *sb,
++				  struct kdbus_node *node);
++
++/*
++ * Directory Management
++ */
++
++static inline unsigned char kdbus_dt_type(struct kdbus_node *node)
++{
++	switch (node->type) {
++	case KDBUS_NODE_DOMAIN:
++	case KDBUS_NODE_BUS:
++		return DT_DIR;
++	case KDBUS_NODE_CONTROL:
++	case KDBUS_NODE_ENDPOINT:
++		return DT_REG;
++	}
++
++	return DT_UNKNOWN;
++}
++
++static int fs_dir_fop_iterate(struct file *file, struct dir_context *ctx)
++{
++	struct dentry *dentry = file->f_path.dentry;
++	struct kdbus_node *parent = kdbus_node_from_dentry(dentry);
++	struct kdbus_node *old, *next = file->private_data;
++
++	/*
++	 * kdbusfs directory iterator (modelled after sysfs/kernfs)
++	 * When iterating kdbusfs directories, we iterate all children of the
++	 * parent kdbus_node object. We use ctx->pos to store the hash of the
++	 * child and file->private_data to store a reference to the next node
++	 * object. If ctx->pos is not modified via llseek while you iterate a
++	 * directory, then we use the file->private_data node pointer to
++	 * directly access the next node in the tree.
++	 * However, if you directly seek on the directory, we have to find the
++	 * closest node to that position and cannot use our node pointer. This
++	 * means iterating the rb-tree to find the closest match and start over
++	 * from there.
++	 * Note that hash values are not necessarily unique. Therefore, llseek
++	 * is not guaranteed to seek to the same node that you got when you
++	 * retrieved the position. Seeking to 0, 1, 2 and >=INT_MAX is safe,
++	 * though. We could use the inode-number as position, but this would
++	 * require another rb-tree for fast access. Kernfs and others already
++	 * ignore those conflicts, so we should be fine, too.
++	 */
++
++	if (!dir_emit_dots(file, ctx))
++		return 0;
++
++	/* acquire @next; if deactivated, or seek detected, find next node */
++	old = next;
++	if (next && ctx->pos == next->hash) {
++		if (kdbus_node_acquire(next))
++			kdbus_node_ref(next);
++		else
++			next = kdbus_node_next_child(parent, next);
++	} else {
++		next = kdbus_node_find_closest(parent, ctx->pos);
++	}
++	kdbus_node_unref(old);
++
++	while (next) {
++		/* emit @next */
++		file->private_data = next;
++		ctx->pos = next->hash;
++
++		kdbus_node_release(next);
++
++		if (!dir_emit(ctx, next->name, strlen(next->name), next->id,
++			      kdbus_dt_type(next)))
++			return 0;
++
++		/* find next node after @next */
++		old = next;
++		next = kdbus_node_next_child(parent, next);
++		kdbus_node_unref(old);
++	}
++
++	file->private_data = NULL;
++	ctx->pos = INT_MAX;
++
++	return 0;
++}
++
++static loff_t fs_dir_fop_llseek(struct file *file, loff_t offset, int whence)
++{
++	struct inode *inode = file_inode(file);
++	loff_t ret;
++
++	/* protect f_pos against fop_iterate */
++	mutex_lock(&inode->i_mutex);
++	ret = generic_file_llseek(file, offset, whence);
++	mutex_unlock(&inode->i_mutex);
++
++	return ret;
++}
++
++static int fs_dir_fop_release(struct inode *inode, struct file *file)
++{
++	kdbus_node_unref(file->private_data);
++	return 0;
++}
++
++static const struct file_operations fs_dir_fops = {
++	.read		= generic_read_dir,
++	.iterate	= fs_dir_fop_iterate,
++	.llseek		= fs_dir_fop_llseek,
++	.release	= fs_dir_fop_release,
++};
++
++static struct dentry *fs_dir_iop_lookup(struct inode *dir,
++					struct dentry *dentry,
++					unsigned int flags)
++{
++	struct dentry *dnew = NULL;
++	struct kdbus_node *parent;
++	struct kdbus_node *node;
++	struct inode *inode;
++
++	parent = kdbus_node_from_dentry(dentry->d_parent);
++	if (!kdbus_node_acquire(parent))
++		return NULL;
++
++	/* returns reference to _acquired_ child node */
++	node = kdbus_node_find_child(parent, dentry->d_name.name);
++	if (node) {
++		dentry->d_fsdata = node;
++		inode = fs_inode_get(dir->i_sb, node);
++		if (IS_ERR(inode))
++			dnew = ERR_CAST(inode);
++		else
++			dnew = d_splice_alias(inode, dentry);
++
++		kdbus_node_release(node);
++	}
++
++	kdbus_node_release(parent);
++	return dnew;
++}
++
++static const struct inode_operations fs_dir_iops = {
++	.permission	= generic_permission,
++	.lookup		= fs_dir_iop_lookup,
++};
++
++/*
++ * Inode Management
++ */
++
++static const struct inode_operations fs_inode_iops = {
++	.permission	= generic_permission,
++};
++
++static struct inode *fs_inode_get(struct super_block *sb,
++				  struct kdbus_node *node)
++{
++	struct inode *inode;
++
++	inode = iget_locked(sb, node->id);
++	if (!inode)
++		return ERR_PTR(-ENOMEM);
++	if (!(inode->i_state & I_NEW))
++		return inode;
++
++	inode->i_private = kdbus_node_ref(node);
++	inode->i_mapping->a_ops = &empty_aops;
++	inode->i_mode = node->mode & S_IALLUGO;
++	inode->i_atime = inode->i_ctime = inode->i_mtime = CURRENT_TIME;
++	inode->i_uid = node->uid;
++	inode->i_gid = node->gid;
++
++	switch (node->type) {
++	case KDBUS_NODE_DOMAIN:
++	case KDBUS_NODE_BUS:
++		inode->i_mode |= S_IFDIR;
++		inode->i_op = &fs_dir_iops;
++		inode->i_fop = &fs_dir_fops;
++		set_nlink(inode, 2);
++		break;
++	case KDBUS_NODE_CONTROL:
++	case KDBUS_NODE_ENDPOINT:
++		inode->i_mode |= S_IFREG;
++		inode->i_op = &fs_inode_iops;
++		inode->i_fop = &kdbus_handle_ops;
++		break;
++	}
++
++	unlock_new_inode(inode);
++
++	return inode;
++}
++
++/*
++ * Superblock Management
++ */
++
++static int fs_super_dop_revalidate(struct dentry *dentry, unsigned int flags)
++{
++	struct kdbus_node *node;
++
++	/* Force lookup on negatives */
++	if (!dentry->d_inode)
++		return 0;
++
++	node = kdbus_node_from_dentry(dentry);
++
++	/* see whether the node has been removed */
++	if (!kdbus_node_is_active(node))
++		return 0;
++
++	return 1;
++}
++
++static void fs_super_dop_release(struct dentry *dentry)
++{
++	kdbus_node_unref(dentry->d_fsdata);
++}
++
++static const struct dentry_operations fs_super_dops = {
++	.d_revalidate	= fs_super_dop_revalidate,
++	.d_release	= fs_super_dop_release,
++};
++
++static void fs_super_sop_evict_inode(struct inode *inode)
++{
++	struct kdbus_node *node = kdbus_node_from_inode(inode);
++
++	truncate_inode_pages_final(&inode->i_data);
++	clear_inode(inode);
++	kdbus_node_unref(node);
++}
++
++static const struct super_operations fs_super_sops = {
++	.statfs		= simple_statfs,
++	.drop_inode	= generic_delete_inode,
++	.evict_inode	= fs_super_sop_evict_inode,
++};
++
++static int fs_super_fill(struct super_block *sb)
++{
++	struct kdbus_domain *domain = sb->s_fs_info;
++	struct inode *inode;
++	int ret;
++
++	sb->s_blocksize = PAGE_CACHE_SIZE;
++	sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
++	sb->s_magic = KDBUS_SUPER_MAGIC;
++	sb->s_maxbytes = MAX_LFS_FILESIZE;
++	sb->s_op = &fs_super_sops;
++	sb->s_time_gran = 1;
++
++	inode = fs_inode_get(sb, &domain->node);
++	if (IS_ERR(inode))
++		return PTR_ERR(inode);
++
++	sb->s_root = d_make_root(inode);
++	if (!sb->s_root) {
++		/* d_make_root iput()s the inode on failure */
++		return -ENOMEM;
++	}
++
++	/* sb holds domain reference */
++	sb->s_root->d_fsdata = &domain->node;
++	sb->s_d_op = &fs_super_dops;
++
++	/* sb holds root reference */
++	domain->dentry = sb->s_root;
++
++	if (!kdbus_node_activate(&domain->node))
++		return -ESHUTDOWN;
++
++	ret = kdbus_domain_populate(domain, KDBUS_MAKE_ACCESS_WORLD);
++	if (ret < 0)
++		return ret;
++
++	sb->s_flags |= MS_ACTIVE;
++	return 0;
++}
++
++static void fs_super_kill(struct super_block *sb)
++{
++	struct kdbus_domain *domain = sb->s_fs_info;
++
++	if (domain) {
++		kdbus_node_deactivate(&domain->node);
++		domain->dentry = NULL;
++	}
++
++	kill_anon_super(sb);
++	kdbus_domain_unref(domain);
++}
++
++static int fs_super_set(struct super_block *sb, void *data)
++{
++	int ret;
++
++	ret = set_anon_super(sb, data);
++	if (!ret)
++		sb->s_fs_info = data;
++
++	return ret;
++}
++
++static struct dentry *fs_super_mount(struct file_system_type *fs_type,
++				     int flags, const char *dev_name,
++				     void *data)
++{
++	struct kdbus_domain *domain;
++	struct super_block *sb;
++	int ret;
++
++	domain = kdbus_domain_new(KDBUS_MAKE_ACCESS_WORLD);
++	if (IS_ERR(domain))
++		return ERR_CAST(domain);
++
++	sb = sget(fs_type, NULL, fs_super_set, flags, domain);
++	if (IS_ERR(sb)) {
++		kdbus_node_deactivate(&domain->node);
++		kdbus_domain_unref(domain);
++		return ERR_CAST(sb);
++	}
++
++	WARN_ON(sb->s_fs_info != domain);
++	WARN_ON(sb->s_root);
++
++	ret = fs_super_fill(sb);
++	if (ret < 0) {
++		/* calls into ->kill_sb() when done */
++		deactivate_locked_super(sb);
++		return ERR_PTR(ret);
++	}
++
++	return dget(sb->s_root);
++}
++
++static struct file_system_type fs_type = {
++	.name		= KBUILD_MODNAME "fs",
++	.owner		= THIS_MODULE,
++	.mount		= fs_super_mount,
++	.kill_sb	= fs_super_kill,
++	.fs_flags	= FS_USERNS_MOUNT,
++};
++
++/**
++ * kdbus_fs_init() - register kdbus filesystem
++ *
++ * This registers a filesystem with the VFS layer. The filesystem is called
++ * `KBUILD_MODNAME "fs"', which usually resolves to `kdbusfs'. The naming
++ * scheme allows you to set KBUILD_MODNAME to "kdbus2" and get an
++ * independent filesystem for developers.
++ *
++ * Each mount of the kdbusfs filesystem has a kdbus_domain attached.
++ * Operations on this mount will only affect the attached domain. On each mount
++ * a new domain is automatically created and used for this mount exclusively.
++ * If you want to share a domain across multiple mounts, you need to bind-mount
++ * it.
++ *
++ * Mounts of kdbusfs (with a different domain each) are unrelated to each other
++ * and will never have any effect on any domain but their own.
++ *
++ * Return: 0 on success, negative error otherwise.
++ */
++int kdbus_fs_init(void)
++{
++	return register_filesystem(&fs_type);
++}
++
++/**
++ * kdbus_fs_exit() - unregister kdbus filesystem
++ *
++ * This does the reverse of kdbus_fs_init(). It unregisters the kdbusfs
++ * filesystem from VFS and cleans up any allocated resources.
++ */
++void kdbus_fs_exit(void)
++{
++	unregister_filesystem(&fs_type);
++}
++
++/* acquire domain of @node, making sure all ancestors are active */
++static struct kdbus_domain *fs_acquire_domain(struct kdbus_node *node)
++{
++	struct kdbus_domain *domain;
++	struct kdbus_node *iter;
++
++	/* caller must guarantee that @node is linked */
++	for (iter = node; iter->parent; iter = iter->parent)
++		if (!kdbus_node_is_active(iter->parent))
++			return NULL;
++
++	/* root nodes are always domains */
++	if (WARN_ON(iter->type != KDBUS_NODE_DOMAIN))
++		return NULL;
++
++	domain = kdbus_domain_from_node(iter);
++	if (!kdbus_node_acquire(&domain->node))
++		return NULL;
++
++	return domain;
++}
++
++/**
++ * kdbus_fs_flush() - flush dcache entries of a node
++ * @node:		Node to flush entries of
++ *
++ * This flushes all VFS filesystem cache entries for a node and all its
++ * children. This should be called whenever a node is destroyed during
++ * runtime. It will flush the cache entries so the linked objects can be
++ * deallocated.
++ *
++ * This is a no-op if you call it on active nodes (they really should stay in
++ * cache) or on nodes with deactivated parents (flushing the parent is enough).
++ * Furthermore, there is no need to call it on nodes whose lifetime is bound to
++ * their parents'. In those cases, the parent-flush will always also flush the
++ * children.
++ */
++void kdbus_fs_flush(struct kdbus_node *node)
++{
++	struct dentry *dentry, *parent_dentry = NULL;
++	struct kdbus_domain *domain;
++	struct qstr name;
++
++	/* active nodes should remain in cache */
++	if (!kdbus_node_is_deactivated(node))
++		return;
++
++	/* nodes that were never linked were never instantiated */
++	if (!node->parent)
++		return;
++
++	/* acquire domain and verify all ancestors are active */
++	domain = fs_acquire_domain(node);
++	if (!domain)
++		return;
++
++	switch (node->type) {
++	case KDBUS_NODE_ENDPOINT:
++		if (WARN_ON(!node->parent || !node->parent->name))
++			goto exit;
++
++		name.name = node->parent->name;
++		name.len = strlen(node->parent->name);
++		parent_dentry = d_hash_and_lookup(domain->dentry, &name);
++		if (IS_ERR_OR_NULL(parent_dentry))
++			goto exit;
++
++		/* fallthrough */
++	case KDBUS_NODE_BUS:
++		if (WARN_ON(!node->name))
++			goto exit;
++
++		name.name = node->name;
++		name.len = strlen(node->name);
++		dentry = d_hash_and_lookup(parent_dentry ? : domain->dentry,
++					   &name);
++		if (!IS_ERR_OR_NULL(dentry)) {
++			d_invalidate(dentry);
++			dput(dentry);
++		}
++
++		dput(parent_dentry);
++		break;
++
++	default:
++		/* all other types are bound to their parent lifetime */
++		break;
++	}
++
++exit:
++	kdbus_node_release(&domain->node);
++}
+diff --git a/ipc/kdbus/fs.h b/ipc/kdbus/fs.h
+new file mode 100644
+index 0000000..62f7d6a
+--- /dev/null
++++ b/ipc/kdbus/fs.h
+@@ -0,0 +1,28 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUSFS_H
++#define __KDBUSFS_H
++
++#include <linux/kernel.h>
++
++struct kdbus_node;
++
++int kdbus_fs_init(void);
++void kdbus_fs_exit(void);
++void kdbus_fs_flush(struct kdbus_node *node);
++
++#define kdbus_node_from_inode(_inode) \
++	((struct kdbus_node *)(_inode)->i_private)
++
++#endif
+diff --git a/ipc/kdbus/handle.c b/ipc/kdbus/handle.c
+new file mode 100644
+index 0000000..fc60932
+--- /dev/null
++++ b/ipc/kdbus/handle.c
+@@ -0,0 +1,691 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/file.h>
++#include <linux/fs.h>
++#include <linux/idr.h>
++#include <linux/init.h>
++#include <linux/kdev_t.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/poll.h>
++#include <linux/rwsem.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <linux/syscalls.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "endpoint.h"
++#include "fs.h"
++#include "handle.h"
++#include "item.h"
++#include "match.h"
++#include "message.h"
++#include "names.h"
++#include "domain.h"
++#include "policy.h"
++
++static int kdbus_args_verify(struct kdbus_args *args)
++{
++	struct kdbus_item *item;
++	size_t i;
++	int ret;
++
++	KDBUS_ITEMS_FOREACH(item, args->items, args->items_size) {
++		struct kdbus_arg *arg = NULL;
++
++		if (!KDBUS_ITEM_VALID(item, args->items, args->items_size))
++			return -EINVAL;
++
++		for (i = 0; i < args->argc; ++i)
++			if (args->argv[i].type == item->type)
++				break;
++		if (i >= args->argc)
++			return -EINVAL;
++
++		arg = &args->argv[i];
++
++		ret = kdbus_item_validate(item);
++		if (ret < 0)
++			return ret;
++
++		if (arg->item && !arg->multiple)
++			return -EINVAL;
++
++		arg->item = item;
++	}
++
++	if (!KDBUS_ITEMS_END(item, args->items, args->items_size))
++		return -EINVAL;
++
++	return 0;
++}
++
++static int kdbus_args_negotiate(struct kdbus_args *args)
++{
++	struct kdbus_item __user *user;
++	struct kdbus_item *negotiation;
++	size_t i, j, num;
++
++	/*
++	 * If KDBUS_FLAG_NEGOTIATE is set, we overwrite the flags field with
++	 * the set of supported flags. Furthermore, if a KDBUS_ITEM_NEGOTIATE
++	 * item is passed, we iterate its payload (array of u64, each set to an
++	 * item type) and clear all unsupported item-types to 0.
++	 * The caller might do this recursively, if other flags or objects are
++	 * embedded in the payload itself.
++	 */
++
++	if (args->cmd->flags & KDBUS_FLAG_NEGOTIATE) {
++		if (put_user(args->allowed_flags & ~KDBUS_FLAG_NEGOTIATE,
++			     &args->user->flags))
++			return -EFAULT;
++	}
++
++	if (args->argc < 1 || args->argv[0].type != KDBUS_ITEM_NEGOTIATE ||
++	    !args->argv[0].item)
++		return 0;
++
++	negotiation = args->argv[0].item;
++	user = (struct kdbus_item __user *)
++		((u8 __user *)args->user +
++		 ((u8 *)negotiation - (u8 *)args->cmd));
++	num = KDBUS_ITEM_PAYLOAD_SIZE(negotiation) / sizeof(u64);
++
++	for (i = 0; i < num; ++i) {
++		for (j = 0; j < args->argc; ++j)
++			if (negotiation->data64[i] == args->argv[j].type)
++				break;
++
++		if (j < args->argc)
++			continue;
++
++		/* this item is not supported, clear it out */
++		negotiation->data64[i] = 0;
++		if (put_user(negotiation->data64[i], &user->data64[i]))
++			return -EFAULT;
++	}
++
++	return 0;
++}
++
++/**
++ * __kdbus_args_parse() - parse payload of kdbus command
++ * @args:		object to parse data into
++ * @is_cmd:		whether this is a command or msg payload
++ * @argp:		user-space location of command payload to parse
++ * @type_size:		overall size of command payload to parse
++ * @items_offset:	offset of items array in command payload
++ * @out:		output variable to store pointer to copied payload
++ *
++ * This parses the ioctl payload at user-space location @argp into @args. @args
++ * must be pre-initialized by the caller to reflect the supported flags and
++ * items of this command. This parser will then copy the command payload into
++ * kernel-space, verify correctness and consistency and cache pointers to parsed
++ * items and other data in @args.
++ *
++ * If this function succeeded, you must call kdbus_args_clear() to release
++ * allocated resources before destroying @args.
++ *
++ * This can also be used to import kdbus_msg objects. In that case, @is_cmd must
++ * be set to 'false' and the 'return_flags' field will not be touched (as it
++ * doesn't exist on kdbus_msg).
++ *
++ * Return: On failure a negative error code is returned. Otherwise, 1 is
++ * returned if negotiation was requested, 0 if not.
++ */
++int __kdbus_args_parse(struct kdbus_args *args, bool is_cmd, void __user *argp,
++		       size_t type_size, size_t items_offset, void **out)
++{
++	u64 user_size;
++	int ret, i;
++
++	ret = kdbus_copy_from_user(&user_size, argp, sizeof(user_size));
++	if (ret < 0)
++		return ret;
++
++	if (user_size < type_size)
++		return -EINVAL;
++	if (user_size > KDBUS_CMD_MAX_SIZE)
++		return -EMSGSIZE;
++
++	if (user_size <= sizeof(args->cmd_buf)) {
++		if (copy_from_user(args->cmd_buf, argp, user_size))
++			return -EFAULT;
++		args->cmd = (void*)args->cmd_buf;
++	} else {
++		args->cmd = memdup_user(argp, user_size);
++		if (IS_ERR(args->cmd))
++			return PTR_ERR(args->cmd);
++	}
++
++	if (args->cmd->size != user_size) {
++		ret = -EINVAL;
++		goto error;
++	}
++
++	if (is_cmd)
++		args->cmd->return_flags = 0;
++	args->user = argp;
++	args->items = (void *)((u8 *)args->cmd + items_offset);
++	args->items_size = args->cmd->size - items_offset;
++	args->is_cmd = is_cmd;
++
++	if (args->cmd->flags & ~args->allowed_flags) {
++		ret = -EINVAL;
++		goto error;
++	}
++
++	ret = kdbus_args_verify(args);
++	if (ret < 0)
++		goto error;
++
++	ret = kdbus_args_negotiate(args);
++	if (ret < 0)
++		goto error;
++
++	/* mandatory items must be given (but not on negotiation) */
++	if (!(args->cmd->flags & KDBUS_FLAG_NEGOTIATE)) {
++		for (i = 0; i < args->argc; ++i)
++			if (args->argv[i].mandatory && !args->argv[i].item) {
++				ret = -EINVAL;
++				goto error;
++			}
++	}
++
++	*out = args->cmd;
++	return !!(args->cmd->flags & KDBUS_FLAG_NEGOTIATE);
++
++error:
++	return kdbus_args_clear(args, ret);
++}
++
++/**
++ * kdbus_args_clear() - release allocated command resources
++ * @args:	object to release resources of
++ * @ret:	return value of this command
++ *
++ * This frees all allocated resources on @args and copies the command result
++ * flags into user-space. @ret is usually returned unchanged by this function,
++ * so it can be used in the final 'return' statement of the command handler.
++ *
++ * Return: -EFAULT if return values cannot be copied into user-space, otherwise
++ *         @ret is returned unchanged.
++ */
++int kdbus_args_clear(struct kdbus_args *args, int ret)
++{
++	if (!args)
++		return ret;
++
++	if (!IS_ERR_OR_NULL(args->cmd)) {
++		if (args->is_cmd && put_user(args->cmd->return_flags,
++					     &args->user->return_flags))
++			ret = -EFAULT;
++		if (args->cmd != (void*)args->cmd_buf)
++			kfree(args->cmd);
++		args->cmd = NULL;
++	}
++
++	return ret;
++}
++
++/**
++ * enum kdbus_handle_type - type a handle can be of
++ * @KDBUS_HANDLE_NONE:		no type set, yet
++ * @KDBUS_HANDLE_BUS_OWNER:	bus owner
++ * @KDBUS_HANDLE_EP_OWNER:	endpoint owner
++ * @KDBUS_HANDLE_CONNECTED:	endpoint connection after HELLO
++ */
++enum kdbus_handle_type {
++	KDBUS_HANDLE_NONE,
++	KDBUS_HANDLE_BUS_OWNER,
++	KDBUS_HANDLE_EP_OWNER,
++	KDBUS_HANDLE_CONNECTED,
++};
++
++/**
++ * struct kdbus_handle - handle to the kdbus system
++ * @lock:		handle lock
++ * @type:		type of this handle (KDBUS_HANDLE_*)
++ * @bus_owner:		bus this handle owns
++ * @ep_owner:		endpoint this handle owns
++ * @conn:		connection this handle owns
++ */
++struct kdbus_handle {
++	struct mutex lock;
++
++	enum kdbus_handle_type type;
++	union {
++		struct kdbus_bus *bus_owner;
++		struct kdbus_ep *ep_owner;
++		struct kdbus_conn *conn;
++	};
++};
++
++static int kdbus_handle_open(struct inode *inode, struct file *file)
++{
++	struct kdbus_handle *handle;
++	struct kdbus_node *node;
++	int ret;
++
++	node = kdbus_node_from_inode(inode);
++	if (!kdbus_node_acquire(node))
++		return -ESHUTDOWN;
++
++	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
++	if (!handle) {
++		ret = -ENOMEM;
++		goto exit;
++	}
++
++	mutex_init(&handle->lock);
++	handle->type = KDBUS_HANDLE_NONE;
++
++	file->private_data = handle;
++	ret = 0;
++
++exit:
++	kdbus_node_release(node);
++	return ret;
++}
++
++static int kdbus_handle_release(struct inode *inode, struct file *file)
++{
++	struct kdbus_handle *handle = file->private_data;
++
++	switch (handle->type) {
++	case KDBUS_HANDLE_BUS_OWNER:
++		if (handle->bus_owner) {
++			kdbus_node_deactivate(&handle->bus_owner->node);
++			kdbus_bus_unref(handle->bus_owner);
++		}
++		break;
++	case KDBUS_HANDLE_EP_OWNER:
++		if (handle->ep_owner) {
++			kdbus_node_deactivate(&handle->ep_owner->node);
++			kdbus_ep_unref(handle->ep_owner);
++		}
++		break;
++	case KDBUS_HANDLE_CONNECTED:
++		kdbus_conn_disconnect(handle->conn, false);
++		kdbus_conn_unref(handle->conn);
++		break;
++	case KDBUS_HANDLE_NONE:
++		/* nothing to clean up */
++		break;
++	}
++
++	kfree(handle);
++
++	return 0;
++}
++
++static long kdbus_handle_ioctl_control(struct file *file, unsigned int cmd,
++				       void __user *argp)
++{
++	struct kdbus_handle *handle = file->private_data;
++	struct kdbus_node *node = file_inode(file)->i_private;
++	struct kdbus_domain *domain;
++	int ret = 0;
++
++	if (!kdbus_node_acquire(node))
++		return -ESHUTDOWN;
++
++	/*
++	 * The parent of control-nodes is always a domain, make sure to pin it
++	 * so the parent is actually valid.
++	 */
++	domain = kdbus_domain_from_node(node->parent);
++	if (!kdbus_node_acquire(&domain->node)) {
++		kdbus_node_release(node);
++		return -ESHUTDOWN;
++	}
++
++	switch (cmd) {
++	case KDBUS_CMD_BUS_MAKE: {
++		struct kdbus_bus *bus;
++
++		bus = kdbus_cmd_bus_make(domain, argp);
++		if (IS_ERR_OR_NULL(bus)) {
++			ret = PTR_ERR_OR_ZERO(bus);
++			break;
++		}
++
++		handle->bus_owner = bus;
++		ret = KDBUS_HANDLE_BUS_OWNER;
++		break;
++	}
++
++	default:
++		ret = -EBADFD;
++		break;
++	}
++
++	kdbus_node_release(&domain->node);
++	kdbus_node_release(node);
++	return ret;
++}
++
++static long kdbus_handle_ioctl_ep(struct file *file, unsigned int cmd,
++				  void __user *buf)
++{
++	struct kdbus_handle *handle = file->private_data;
++	struct kdbus_node *node = file_inode(file)->i_private;
++	struct kdbus_ep *ep, *file_ep = kdbus_ep_from_node(node);
++	struct kdbus_bus *bus = file_ep->bus;
++	struct kdbus_conn *conn;
++	int ret = 0;
++
++	if (!kdbus_node_acquire(node))
++		return -ESHUTDOWN;
++
++	switch (cmd) {
++	case KDBUS_CMD_ENDPOINT_MAKE: {
++		/* creating custom endpoints is a privileged operation */
++		if (!kdbus_ep_is_owner(file_ep, file)) {
++			ret = -EPERM;
++			break;
++		}
++
++		ep = kdbus_cmd_ep_make(bus, buf);
++		if (IS_ERR_OR_NULL(ep)) {
++			ret = PTR_ERR_OR_ZERO(ep);
++			break;
++		}
++
++		handle->ep_owner = ep;
++		ret = KDBUS_HANDLE_EP_OWNER;
++		break;
++	}
++
++	case KDBUS_CMD_HELLO:
++		conn = kdbus_cmd_hello(file_ep, file, buf);
++		if (IS_ERR_OR_NULL(conn)) {
++			ret = PTR_ERR_OR_ZERO(conn);
++			break;
++		}
++
++		handle->conn = conn;
++		ret = KDBUS_HANDLE_CONNECTED;
++		break;
++
++	default:
++		ret = -EBADFD;
++		break;
++	}
++
++	kdbus_node_release(node);
++	return ret;
++}
++
++static long kdbus_handle_ioctl_ep_owner(struct file *file, unsigned int command,
++					void __user *buf)
++{
++	struct kdbus_handle *handle = file->private_data;
++	struct kdbus_ep *ep = handle->ep_owner;
++	int ret;
++
++	if (!kdbus_node_acquire(&ep->node))
++		return -ESHUTDOWN;
++
++	switch (command) {
++	case KDBUS_CMD_ENDPOINT_UPDATE:
++		ret = kdbus_cmd_ep_update(ep, buf);
++		break;
++	default:
++		ret = -EBADFD;
++		break;
++	}
++
++	kdbus_node_release(&ep->node);
++	return ret;
++}
++
++static long kdbus_handle_ioctl_connected(struct file *file,
++					 unsigned int command, void __user *buf)
++{
++	struct kdbus_handle *handle = file->private_data;
++	struct kdbus_conn *conn = handle->conn;
++	struct kdbus_conn *release_conn = NULL;
++	int ret;
++
++	release_conn = conn;
++	ret = kdbus_conn_acquire(release_conn);
++	if (ret < 0)
++		return ret;
++
++	switch (command) {
++	case KDBUS_CMD_BYEBYE:
++		/*
++		 * BYEBYE is special; we must not acquire a connection when
++		 * calling into kdbus_conn_disconnect() or we will deadlock,
++		 * because kdbus_conn_disconnect() will wait for all acquired
++		 * references to be dropped.
++		 */
++		kdbus_conn_release(release_conn);
++		release_conn = NULL;
++		ret = kdbus_cmd_byebye_unlocked(conn, buf);
++		break;
++	case KDBUS_CMD_NAME_ACQUIRE:
++		ret = kdbus_cmd_name_acquire(conn, buf);
++		break;
++	case KDBUS_CMD_NAME_RELEASE:
++		ret = kdbus_cmd_name_release(conn, buf);
++		break;
++	case KDBUS_CMD_LIST:
++		ret = kdbus_cmd_list(conn, buf);
++		break;
++	case KDBUS_CMD_CONN_INFO:
++		ret = kdbus_cmd_conn_info(conn, buf);
++		break;
++	case KDBUS_CMD_BUS_CREATOR_INFO:
++		ret = kdbus_cmd_bus_creator_info(conn, buf);
++		break;
++	case KDBUS_CMD_UPDATE:
++		ret = kdbus_cmd_update(conn, buf);
++		break;
++	case KDBUS_CMD_MATCH_ADD:
++		ret = kdbus_cmd_match_add(conn, buf);
++		break;
++	case KDBUS_CMD_MATCH_REMOVE:
++		ret = kdbus_cmd_match_remove(conn, buf);
++		break;
++	case KDBUS_CMD_SEND:
++		ret = kdbus_cmd_send(conn, file, buf);
++		break;
++	case KDBUS_CMD_RECV:
++		ret = kdbus_cmd_recv(conn, buf);
++		break;
++	case KDBUS_CMD_FREE:
++		ret = kdbus_cmd_free(conn, buf);
++		break;
++	default:
++		ret = -EBADFD;
++		break;
++	}
++
++	kdbus_conn_release(release_conn);
++	return ret;
++}
++
++static long kdbus_handle_ioctl(struct file *file, unsigned int cmd,
++			       unsigned long arg)
++{
++	struct kdbus_handle *handle = file->private_data;
++	struct kdbus_node *node = kdbus_node_from_inode(file_inode(file));
++	void __user *argp = (void __user *)arg;
++	long ret = -EBADFD;
++
++	switch (cmd) {
++	case KDBUS_CMD_BUS_MAKE:
++	case KDBUS_CMD_ENDPOINT_MAKE:
++	case KDBUS_CMD_HELLO:
++		mutex_lock(&handle->lock);
++		if (handle->type == KDBUS_HANDLE_NONE) {
++			if (node->type == KDBUS_NODE_CONTROL)
++				ret = kdbus_handle_ioctl_control(file, cmd,
++								 argp);
++			else if (node->type == KDBUS_NODE_ENDPOINT)
++				ret = kdbus_handle_ioctl_ep(file, cmd, argp);
++
++			if (ret > 0) {
++				/*
++				 * The data given via open() is not sufficient
++				 * to set up a kdbus handle. Hence, we require
++				 * the user to perform a setup ioctl. This setup
++				 * can only be performed once and defines the
++				 * type of the handle. The different setup
++				 * ioctls are locked against each other so they
++				 * cannot race. Once the handle type is set,
++				 * the type-dependent ioctls are enabled. To
++				 * improve performance, we don't lock those via
++				 * handle->lock. Instead, we issue a
++				 * write-barrier before performing the
++				 * type-change, which pairs with smp_rmb() in
++				 * all handlers that access the type field. This
++				 * guarantees the handle is fully set up, if
++				 * handle->type is set. If handle->type is
++				 * unset, you must not make any assumptions
++				 * without taking handle->lock.
++				 * Note that handle->type is only set once. It
++				 * will never change afterwards.
++				 */
++				smp_wmb();
++				handle->type = ret;
++			}
++		}
++		mutex_unlock(&handle->lock);
++		break;
++
++	case KDBUS_CMD_ENDPOINT_UPDATE:
++	case KDBUS_CMD_BYEBYE:
++	case KDBUS_CMD_NAME_ACQUIRE:
++	case KDBUS_CMD_NAME_RELEASE:
++	case KDBUS_CMD_LIST:
++	case KDBUS_CMD_CONN_INFO:
++	case KDBUS_CMD_BUS_CREATOR_INFO:
++	case KDBUS_CMD_UPDATE:
++	case KDBUS_CMD_MATCH_ADD:
++	case KDBUS_CMD_MATCH_REMOVE:
++	case KDBUS_CMD_SEND:
++	case KDBUS_CMD_RECV:
++	case KDBUS_CMD_FREE: {
++		enum kdbus_handle_type type;
++
++		/*
++		 * This read-barrier pairs with smp_wmb() of the handle setup.
++		 * It guarantees the handle is fully written, in case the
++		 * type has been set. It allows us to access the handle without
++		 * taking handle->lock, given the guarantee that the type is
++		 * only ever set once, and stays constant afterwards.
++		 * Furthermore, the handle object itself is not modified in any
++		 * way after the type is set. That is, the type-field is the
++		 * last field that is written on any handle. If it has not been
++		 * set, we must not access the handle here.
++		 */
++		type = handle->type;
++		smp_rmb();
++
++		if (type == KDBUS_HANDLE_EP_OWNER)
++			ret = kdbus_handle_ioctl_ep_owner(file, cmd, argp);
++		else if (type == KDBUS_HANDLE_CONNECTED)
++			ret = kdbus_handle_ioctl_connected(file, cmd, argp);
++
++		break;
++	}
++	default:
++		ret = -ENOTTY;
++		break;
++	}
++
++	return ret < 0 ? ret : 0;
++}
++
++static unsigned int kdbus_handle_poll(struct file *file,
++				      struct poll_table_struct *wait)
++{
++	struct kdbus_handle *handle = file->private_data;
++	enum kdbus_handle_type type;
++	unsigned int mask = POLLOUT | POLLWRNORM;
++
++	/*
++	 * This pairs with smp_wmb() during handle setup. It guarantees that
++	 * _iff_ the handle type is set, handle->conn is valid. Furthermore,
++	 * _iff_ the type is set, the handle object is constant and never
++	 * changed again. If it's not set, we must not access the handle but
++	 * bail out. We also must assume no setup has taken place, yet.
++	 */
++	type = handle->type;
++	smp_rmb();
++
++	/* Only a connected endpoint can read/write data */
++	if (type != KDBUS_HANDLE_CONNECTED)
++		return POLLERR | POLLHUP;
++
++	poll_wait(file, &handle->conn->wait, wait);
++
++	/*
++	 * Verify the connection hasn't been deactivated _after_ adding the
++	 * wait-queue. This guarantees that if the connection is deactivated
++	 * after we checked it, the waitqueue is signaled and we're called
++	 * again.
++	 */
++	if (!kdbus_conn_active(handle->conn))
++		return POLLERR | POLLHUP;
++
++	if (!list_empty(&handle->conn->queue.msg_list) ||
++	    atomic_read(&handle->conn->lost_count) > 0)
++		mask |= POLLIN | POLLRDNORM;
++
++	return mask;
++}
++
++static int kdbus_handle_mmap(struct file *file, struct vm_area_struct *vma)
++{
++	struct kdbus_handle *handle = file->private_data;
++	enum kdbus_handle_type type;
++	int ret = -EBADFD;
++
++	/*
++	 * This pairs with smp_wmb() during handle setup. It guarantees that
++	 * _iff_ the handle type is set, handle->conn is valid. Furthermore,
++	 * _iff_ the type is set, the handle object is constant and never
++	 * changed again. If it's not set, we must not access the handle but
++	 * bail out. We also must assume no setup has taken place, yet.
++	 */
++	type = handle->type;
++	smp_rmb();
++
++	/* Only connected handles have a pool we can map */
++	if (type == KDBUS_HANDLE_CONNECTED)
++		ret = kdbus_pool_mmap(handle->conn->pool, vma);
++
++	return ret;
++}
++
++const struct file_operations kdbus_handle_ops = {
++	.owner =		THIS_MODULE,
++	.open =			kdbus_handle_open,
++	.release =		kdbus_handle_release,
++	.poll =			kdbus_handle_poll,
++	.llseek =		noop_llseek,
++	.unlocked_ioctl =	kdbus_handle_ioctl,
++	.mmap =			kdbus_handle_mmap,
++#ifdef CONFIG_COMPAT
++	.compat_ioctl =		kdbus_handle_ioctl,
++#endif
++};
+diff --git a/ipc/kdbus/handle.h b/ipc/kdbus/handle.h
+new file mode 100644
+index 0000000..5dde2c1
+--- /dev/null
++++ b/ipc/kdbus/handle.h
+@@ -0,0 +1,103 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_HANDLE_H
++#define __KDBUS_HANDLE_H
++
++#include <linux/fs.h>
++#include <uapi/linux/kdbus.h>
++
++extern const struct file_operations kdbus_handle_ops;
++
++/**
++ * struct kdbus_arg - information and state of a single ioctl command item
++ * @type:		item type
++ * @item:		set by the parser to the first found item of this type
++ * @multiple:		whether multiple items of this type are allowed
++ * @mandatory:		whether at least one item of this type is required
++ *
++ * This structure describes a single item in an ioctl command payload. The
++ * caller has to pre-fill the type and flags, the parser will then use this
++ * information to verify the ioctl payload. @item is set by the parser to point
++ * to the first occurrence of the item.
++ */
++struct kdbus_arg {
++	u64 type;
++	struct kdbus_item *item;
++	bool multiple : 1;
++	bool mandatory : 1;
++};
++
++/**
++ * struct kdbus_args - information and state of ioctl command parser
++ * @allowed_flags:	set of flags this command supports
++ * @argc:		number of items in @argv
++ * @argv:		array of items this command supports
++ * @user:		set by parser to user-space location of current command
++ * @cmd:		set by parser to kernel copy of command payload
++ * @cmd_buf:		inline buf to avoid kmalloc() on small cmds
++ * @items:		points to item array in @cmd
++ * @items_size:		size of @items in bytes
++ * @is_cmd:		whether this is a command-payload or msg-payload
++ *
++ * This structure is used to parse ioctl command payloads on each invocation.
++ * The ioctl handler has to pre-fill the flags and allowed items before passing
++ * the object to kdbus_args_parse(). The parser will copy the command payload
++ * into kernel-space and verify the correctness of the data.
++ *
++ * We use a 256-byte buffer for small command payloads, to be allocated on
++ * stack on syscall entrance.
++ */
++struct kdbus_args {
++	u64 allowed_flags;
++	size_t argc;
++	struct kdbus_arg *argv;
++
++	struct kdbus_cmd __user *user;
++	struct kdbus_cmd *cmd;
++	u8 cmd_buf[256];
++
++	struct kdbus_item *items;
++	size_t items_size;
++	bool is_cmd : 1;
++};
++
++int __kdbus_args_parse(struct kdbus_args *args, bool is_cmd, void __user *argp,
++		       size_t type_size, size_t items_offset, void **out);
++int kdbus_args_clear(struct kdbus_args *args, int ret);
++
++#define kdbus_args_parse(_args, _argp, _v)                              \
++	({                                                              \
++		BUILD_BUG_ON(offsetof(typeof(**(_v)), size) !=          \
++			     offsetof(struct kdbus_cmd, size));         \
++		BUILD_BUG_ON(offsetof(typeof(**(_v)), flags) !=         \
++			     offsetof(struct kdbus_cmd, flags));        \
++		BUILD_BUG_ON(offsetof(typeof(**(_v)), return_flags) !=  \
++			     offsetof(struct kdbus_cmd, return_flags)); \
++		__kdbus_args_parse((_args), 1, (_argp), sizeof(**(_v)), \
++				   offsetof(typeof(**(_v)), items),     \
++				   (void **)(_v));                      \
++	})
++
++#define kdbus_args_parse_msg(_args, _argp, _v)                          \
++	({                                                              \
++		BUILD_BUG_ON(offsetof(typeof(**(_v)), size) !=          \
++			     offsetof(struct kdbus_cmd, size));         \
++		BUILD_BUG_ON(offsetof(typeof(**(_v)), flags) !=         \
++			     offsetof(struct kdbus_cmd, flags));        \
++		__kdbus_args_parse((_args), 0, (_argp), sizeof(**(_v)), \
++				   offsetof(typeof(**(_v)), items),     \
++				   (void **)(_v));                      \
++	})
++
++#endif
+diff --git a/ipc/kdbus/item.c b/ipc/kdbus/item.c
+new file mode 100644
+index 0000000..ce78dba
+--- /dev/null
++++ b/ipc/kdbus/item.c
+@@ -0,0 +1,293 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/ctype.h>
++#include <linux/fs.h>
++#include <linux/string.h>
++
++#include "item.h"
++#include "limits.h"
++#include "util.h"
++
++/*
++ * This verifies the string at position @str with size @size is properly
++ * zero-terminated and does not contain a 0-byte except at the end.
++ */
++static bool kdbus_str_valid(const char *str, size_t size)
++{
++	return size > 0 && memchr(str, '\0', size) == str + size - 1;
++}
++
++/**
++ * kdbus_item_validate_name() - validate an item containing a name
++ * @item:		Item to validate
++ *
++ * Return: zero on success or a negative error code on failure
++ */
++int kdbus_item_validate_name(const struct kdbus_item *item)
++{
++	const char *name = item->str;
++	unsigned int i;
++	size_t len;
++
++	if (item->size < KDBUS_ITEM_HEADER_SIZE + 2)
++		return -EINVAL;
++
++	if (item->size > KDBUS_ITEM_HEADER_SIZE +
++			 KDBUS_SYSNAME_MAX_LEN + 1)
++		return -ENAMETOOLONG;
++
++	if (!kdbus_str_valid(name, KDBUS_ITEM_PAYLOAD_SIZE(item)))
++		return -EINVAL;
++
++	len = strlen(name);
++	if (len == 0)
++		return -EINVAL;
++
++	for (i = 0; i < len; i++) {
++		if (isalpha(name[i]))
++			continue;
++		if (isdigit(name[i]))
++			continue;
++		if (name[i] == '_')
++			continue;
++		if (i > 0 && i + 1 < len && (name[i] == '-' || name[i] == '.'))
++			continue;
++
++		return -EINVAL;
++	}
++
++	return 0;
++}
++
++/**
++ * kdbus_item_validate() - validate a single item
++ * @item:	item to validate
++ *
++ * Return: 0 if item is valid, negative error code if not.
++ */
++int kdbus_item_validate(const struct kdbus_item *item)
++{
++	size_t payload_size = KDBUS_ITEM_PAYLOAD_SIZE(item);
++	size_t l;
++	int ret;
++
++	BUILD_BUG_ON(KDBUS_ITEM_HEADER_SIZE !=
++		     sizeof(struct kdbus_item_header));
++
++	if (item->size < KDBUS_ITEM_HEADER_SIZE)
++		return -EINVAL;
++
++	switch (item->type) {
++	case KDBUS_ITEM_NEGOTIATE:
++		if (payload_size % sizeof(u64) != 0)
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_PAYLOAD_VEC:
++	case KDBUS_ITEM_PAYLOAD_OFF:
++		if (payload_size != sizeof(struct kdbus_vec))
++			return -EINVAL;
++		if (item->vec.size == 0 || item->vec.size > SIZE_MAX)
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_PAYLOAD_MEMFD:
++		if (payload_size != sizeof(struct kdbus_memfd))
++			return -EINVAL;
++		if (item->memfd.size == 0 || item->memfd.size > SIZE_MAX)
++			return -EINVAL;
++		if (item->memfd.fd < 0)
++			return -EBADF;
++		break;
++
++	case KDBUS_ITEM_FDS:
++		if (payload_size % sizeof(int) != 0)
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_CANCEL_FD:
++		if (payload_size != sizeof(int))
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_BLOOM_PARAMETER:
++		if (payload_size != sizeof(struct kdbus_bloom_parameter))
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_BLOOM_FILTER:
++		/* followed by the bloom-mask, depends on the bloom-size */
++		if (payload_size < sizeof(struct kdbus_bloom_filter))
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_BLOOM_MASK:
++		/* size depends on bloom-size of bus */
++		break;
++
++	case KDBUS_ITEM_CONN_DESCRIPTION:
++	case KDBUS_ITEM_MAKE_NAME:
++		ret = kdbus_item_validate_name(item);
++		if (ret < 0)
++			return ret;
++		break;
++
++	case KDBUS_ITEM_ATTACH_FLAGS_SEND:
++	case KDBUS_ITEM_ATTACH_FLAGS_RECV:
++	case KDBUS_ITEM_ID:
++	case KDBUS_ITEM_DST_ID:
++		if (payload_size != sizeof(u64))
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_TIMESTAMP:
++		if (payload_size != sizeof(struct kdbus_timestamp))
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_CREDS:
++		if (payload_size != sizeof(struct kdbus_creds))
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_AUXGROUPS:
++		if (payload_size % sizeof(u32) != 0)
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_NAME:
++	case KDBUS_ITEM_DST_NAME:
++	case KDBUS_ITEM_PID_COMM:
++	case KDBUS_ITEM_TID_COMM:
++	case KDBUS_ITEM_EXE:
++	case KDBUS_ITEM_CMDLINE:
++	case KDBUS_ITEM_CGROUP:
++	case KDBUS_ITEM_SECLABEL:
++		if (!kdbus_str_valid(item->str, payload_size))
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_CAPS:
++		if (payload_size < sizeof(u32))
++			return -EINVAL;
++		if (payload_size < sizeof(u32) +
++		    4 * CAP_TO_INDEX(item->caps.last_cap) * sizeof(u32))
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_AUDIT:
++		if (payload_size != sizeof(struct kdbus_audit))
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_POLICY_ACCESS:
++		if (payload_size != sizeof(struct kdbus_policy_access))
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_NAME_ADD:
++	case KDBUS_ITEM_NAME_REMOVE:
++	case KDBUS_ITEM_NAME_CHANGE:
++		if (payload_size < sizeof(struct kdbus_notify_name_change))
++			return -EINVAL;
++		l = payload_size - offsetof(struct kdbus_notify_name_change,
++					    name);
++		if (l > 0 && !kdbus_str_valid(item->name_change.name, l))
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_ID_ADD:
++	case KDBUS_ITEM_ID_REMOVE:
++		if (payload_size != sizeof(struct kdbus_notify_id_change))
++			return -EINVAL;
++		break;
++
++	case KDBUS_ITEM_REPLY_TIMEOUT:
++	case KDBUS_ITEM_REPLY_DEAD:
++		if (payload_size != 0)
++			return -EINVAL;
++		break;
++
++	default:
++		break;
++	}
++
++	return 0;
++}
++
++/**
++ * kdbus_items_validate() - validate items passed by user-space
++ * @items:		items to validate
++ * @items_size:		size of @items in bytes
++ *
++ * This verifies that the passed items pointer is consistent and valid.
++ * Furthermore, each item is checked for:
++ *  - valid "size" value
++ *  - payload is of expected type
++ *  - payload is fully included in the item
++ *  - string payloads are zero-terminated
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_items_validate(const struct kdbus_item *items, size_t items_size)
++{
++	const struct kdbus_item *item;
++	int ret;
++
++	KDBUS_ITEMS_FOREACH(item, items, items_size) {
++		if (!KDBUS_ITEM_VALID(item, items, items_size))
++			return -EINVAL;
++
++		ret = kdbus_item_validate(item);
++		if (ret < 0)
++			return ret;
++	}
++
++	if (!KDBUS_ITEMS_END(item, items, items_size))
++		return -EINVAL;
++
++	return 0;
++}
++
++/**
++ * kdbus_item_set() - Set item content
++ * @item:	The item to modify
++ * @type:	The item type to set (KDBUS_ITEM_*)
++ * @data:	Data to copy to item->data, may be %NULL
++ * @len:	Number of bytes in @data
++ *
++ * This sets type, size and data fields of an item. If @data is NULL, the data
++ * memory is cleared.
++ *
++ * Note that you must align your @data memory to 8 bytes. Trailing padding (in
++ * case @len is not 8-byte aligned) is cleared by this call.
++ *
++ * Returns: Pointer to the following item.
++ */
++struct kdbus_item *kdbus_item_set(struct kdbus_item *item, u64 type,
++				  const void *data, size_t len)
++{
++	item->type = type;
++	item->size = KDBUS_ITEM_HEADER_SIZE + len;
++
++	if (data) {
++		memcpy(item->data, data, len);
++		memset(item->data + len, 0, KDBUS_ALIGN8(len) - len);
++	} else {
++		memset(item->data, 0, KDBUS_ALIGN8(len));
++	}
++
++	return KDBUS_ITEM_NEXT(item);
++}
+diff --git a/ipc/kdbus/item.h b/ipc/kdbus/item.h
+new file mode 100644
+index 0000000..3a7e6cc
+--- /dev/null
++++ b/ipc/kdbus/item.h
+@@ -0,0 +1,61 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_ITEM_H
++#define __KDBUS_ITEM_H
++
++#include <linux/kernel.h>
++#include <uapi/linux/kdbus.h>
++
++#include "util.h"
++
++/* generic access and iterators over a stream of items */
++#define KDBUS_ITEM_NEXT(_i) (typeof(_i))((u8 *)(_i) + KDBUS_ALIGN8((_i)->size))
++#define KDBUS_ITEMS_SIZE(_h, _is) ((_h)->size - offsetof(typeof(*(_h)), _is))
++#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
++#define KDBUS_ITEM_SIZE(_s) KDBUS_ALIGN8(KDBUS_ITEM_HEADER_SIZE + (_s))
++#define KDBUS_ITEM_PAYLOAD_SIZE(_i) ((_i)->size - KDBUS_ITEM_HEADER_SIZE)
++
++#define KDBUS_ITEMS_FOREACH(_i, _is, _s)				\
++	for ((_i) = (_is);						\
++	     ((u8 *)(_i) < (u8 *)(_is) + (_s)) &&			\
++	       ((u8 *)(_i) >= (u8 *)(_is));				\
++	     (_i) = KDBUS_ITEM_NEXT(_i))
++
++#define KDBUS_ITEM_VALID(_i, _is, _s)					\
++	((_i)->size >= KDBUS_ITEM_HEADER_SIZE &&			\
++	 (u8 *)(_i) + (_i)->size > (u8 *)(_i) &&			\
++	 (u8 *)(_i) + (_i)->size <= (u8 *)(_is) + (_s) &&		\
++	 (u8 *)(_i) >= (u8 *)(_is))
++
++#define KDBUS_ITEMS_END(_i, _is, _s)					\
++	((u8 *)(_i) == ((u8 *)(_is) + KDBUS_ALIGN8(_s)))
++
++/**
++ * struct kdbus_item_header - Describes the fixed part of an item
++ * @size:	The total size of the item
++ * @type:	The item type, one of KDBUS_ITEM_*
++ */
++struct kdbus_item_header {
++	u64 size;
++	u64 type;
++};
++
++int kdbus_item_validate_name(const struct kdbus_item *item);
++int kdbus_item_validate(const struct kdbus_item *item);
++int kdbus_items_validate(const struct kdbus_item *items, size_t items_size);
++struct kdbus_item *kdbus_item_set(struct kdbus_item *item, u64 type,
++				  const void *data, size_t len);
++
++#endif
+diff --git a/ipc/kdbus/limits.h b/ipc/kdbus/limits.h
+new file mode 100644
+index 0000000..c54925a
+--- /dev/null
++++ b/ipc/kdbus/limits.h
+@@ -0,0 +1,61 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_DEFAULTS_H
++#define __KDBUS_DEFAULTS_H
++
++#include <linux/kernel.h>
++
++/* maximum size of message header and items */
++#define KDBUS_MSG_MAX_SIZE		SZ_8K
++
++/* maximum number of memfd items per message */
++#define KDBUS_MSG_MAX_MEMFD_ITEMS	16
++
++/* max size of ioctl command data */
++#define KDBUS_CMD_MAX_SIZE		SZ_32K
++
++/* maximum number of inflight fds in a target queue per user */
++#define KDBUS_CONN_MAX_FDS_PER_USER	16
++
++/* maximum message payload size */
++#define KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE		SZ_2M
++
++/* maximum size of bloom bit field in bytes */
++#define KDBUS_BUS_BLOOM_MAX_SIZE		SZ_4K
++
++/* maximum length of well-known bus name */
++#define KDBUS_NAME_MAX_LEN			255
++
++/* maximum length of bus, domain, ep name */
++#define KDBUS_SYSNAME_MAX_LEN			63
++
++/* maximum number of matches per connection */
++#define KDBUS_MATCH_MAX				256
++
++/* maximum number of queued messages from the same individual user */
++#define KDBUS_CONN_MAX_MSGS			256
++
++/* maximum number of well-known names per connection */
++#define KDBUS_CONN_MAX_NAMES			256
++
++/* maximum number of queued requests waiting for a reply */
++#define KDBUS_CONN_MAX_REQUESTS_PENDING		128
++
++/* maximum number of connections per user in one domain */
++#define KDBUS_USER_MAX_CONN			1024
++
++/* maximum number of buses per user in one domain */
++#define KDBUS_USER_MAX_BUSES			16
++
++#endif
+diff --git a/ipc/kdbus/main.c b/ipc/kdbus/main.c
+new file mode 100644
+index 0000000..1ad4dc8
+--- /dev/null
++++ b/ipc/kdbus/main.c
+@@ -0,0 +1,114 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#define pr_fmt(fmt)    KBUILD_MODNAME ": " fmt
++#include <linux/fs.h>
++#include <linux/init.h>
++#include <linux/module.h>
++
++#include "util.h"
++#include "fs.h"
++#include "handle.h"
++#include "metadata.h"
++#include "node.h"
++
++/*
++ * This is a simplified outline of the internal kdbus object relations, for
++ * those interested in the inner life of the driver implementation.
++ *
++ * From a mount point's (domain's) perspective:
++ *
++ * struct kdbus_domain
++ *   |» struct kdbus_user *user (many, owned)
++ *   '» struct kdbus_node node (embedded)
++ *       |» struct kdbus_node children (many, referenced)
++ *       |» struct kdbus_node *parent (pinned)
++ *       '» struct kdbus_bus (many, pinned)
++ *           |» struct kdbus_node node (embedded)
++ *           '» struct kdbus_ep (many, pinned)
++ *               |» struct kdbus_node node (embedded)
++ *               |» struct kdbus_bus *bus (pinned)
++ *               |» struct kdbus_conn conn_list (many, pinned)
++ *               |   |» struct kdbus_ep *ep (pinned)
++ *               |   |» struct kdbus_name_entry *activator_of (owned)
++ *               |   |» struct kdbus_match_db *match_db (owned)
++ *               |   |» struct kdbus_meta *meta (owned)
++ *               |   |» struct kdbus_match_db *match_db (owned)
++ *               |   |    '» struct kdbus_match_entry (many, owned)
++ *               |   |
++ *               |   |» struct kdbus_pool *pool (owned)
++ *               |   |    '» struct kdbus_pool_slice *slices (many, owned)
++ *               |   |       '» struct kdbus_pool *pool (pinned)
++ *               |   |
++ *               |   |» struct kdbus_user *user (pinned)
++ *               |   `» struct kdbus_queue_entry entries (many, embedded)
++ *               |        |» struct kdbus_pool_slice *slice (pinned)
++ *               |        |» struct kdbus_conn_reply *reply (owned)
++ *               |        '» struct kdbus_user *user (pinned)
++ *               |
++ *               '» struct kdbus_user *user (pinned)
++ *                   '» struct kdbus_policy_db policy_db (embedded)
++ *                        |» struct kdbus_policy_db_entry (many, owned)
++ *                        |   |» struct kdbus_conn (pinned)
++ *                        |   '» struct kdbus_ep (pinned)
++ *                        |
++ *                        '» struct kdbus_policy_db_cache_entry (many, owned)
++ *                            '» struct kdbus_conn (pinned)
++ *
++ * For the life-time of a file descriptor derived from calling open() on a file
++ * inside the mount point:
++ *
++ * struct kdbus_handle
++ *  |» struct kdbus_meta *meta (owned)
++ *  |» struct kdbus_ep *ep (pinned)
++ *  |» struct kdbus_conn *conn (owned)
++ *  '» struct kdbus_ep *ep (owned)
++ */
++
++/* kdbus mount-point /sys/fs/kdbus */
++static struct kobject *kdbus_dir;
++
++static int __init kdbus_init(void)
++{
++	int ret;
++
++	kdbus_dir = kobject_create_and_add(KBUILD_MODNAME, fs_kobj);
++	if (!kdbus_dir)
++		return -ENOMEM;
++
++	ret = kdbus_fs_init();
++	if (ret < 0) {
++		pr_err("cannot register filesystem: %d\n", ret);
++		goto exit_dir;
++	}
++
++	pr_info("initialized\n");
++	return 0;
++
++exit_dir:
++	kobject_put(kdbus_dir);
++	return ret;
++}
++
++static void __exit kdbus_exit(void)
++{
++	kdbus_fs_exit();
++	kobject_put(kdbus_dir);
++	ida_destroy(&kdbus_node_ida);
++}
++
++module_init(kdbus_init);
++module_exit(kdbus_exit);
++MODULE_LICENSE("GPL");
++MODULE_DESCRIPTION("D-Bus, powerful, easy to use interprocess communication");
++MODULE_ALIAS_FS(KBUILD_MODNAME "fs");
+diff --git a/ipc/kdbus/match.c b/ipc/kdbus/match.c
+new file mode 100644
+index 0000000..4ee6a1f
+--- /dev/null
++++ b/ipc/kdbus/match.c
+@@ -0,0 +1,546 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/fs.h>
++#include <linux/hash.h>
++#include <linux/init.h>
++#include <linux/mutex.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "endpoint.h"
++#include "handle.h"
++#include "item.h"
++#include "match.h"
++#include "message.h"
++#include "names.h"
++
++/**
++ * struct kdbus_match_db - message filters
++ * @entries_list:	List of matches
++ * @mdb_rwlock:		Match data lock
++ * @entries_count:	Number of entries in database
++ */
++struct kdbus_match_db {
++	struct list_head entries_list;
++	struct rw_semaphore mdb_rwlock;
++	unsigned int entries_count;
++};
++
++/**
++ * struct kdbus_match_entry - a match database entry
++ * @cookie:		User-supplied cookie to lookup the entry
++ * @list_entry:		The list entry element for the db list
++ * @rules_list:		The list head for tracking rules of this entry
++ */
++struct kdbus_match_entry {
++	u64 cookie;
++	struct list_head list_entry;
++	struct list_head rules_list;
++};
++
++/**
++ * struct kdbus_bloom_mask - mask to match against filter
++ * @generations:	Number of generations carried
++ * @data:		Array of bloom bit fields
++ */
++struct kdbus_bloom_mask {
++	u64 generations;
++	u64 *data;
++};
++
++/**
++ * struct kdbus_match_rule - a rule appended to a match entry
++ * @type:		An item type to match against
++ * @bloom_mask:		Bloom mask to match a message's filter against, used
++ *			with KDBUS_ITEM_BLOOM_MASK
++ * @name:		Name to match against, used with KDBUS_ITEM_NAME,
++ *			KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}
++ * @old_id:		ID to match against, used with
++ *			KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
++ *			KDBUS_ITEM_ID_REMOVE
++ * @new_id:		ID to match against, used with
++ *			KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
++ *			KDBUS_ITEM_ID_ADD
++ * @src_id:		ID to match against, used with KDBUS_ITEM_ID
++ * @dst_id:		Message destination ID, used with KDBUS_ITEM_DST_ID
++ * @rules_entry:	Entry in the entry's rules list
++ */
++struct kdbus_match_rule {
++	u64 type;
++	union {
++		struct kdbus_bloom_mask bloom_mask;
++		struct {
++			char *name;
++			u64 old_id;
++			u64 new_id;
++		};
++		u64 src_id;
++		u64 dst_id;
++	};
++	struct list_head rules_entry;
++};
++
++static void kdbus_match_rule_free(struct kdbus_match_rule *rule)
++{
++	if (!rule)
++		return;
++
++	switch (rule->type) {
++	case KDBUS_ITEM_BLOOM_MASK:
++		kfree(rule->bloom_mask.data);
++		break;
++
++	case KDBUS_ITEM_NAME:
++	case KDBUS_ITEM_NAME_ADD:
++	case KDBUS_ITEM_NAME_REMOVE:
++	case KDBUS_ITEM_NAME_CHANGE:
++		kfree(rule->name);
++		break;
++
++	case KDBUS_ITEM_ID:
++	case KDBUS_ITEM_DST_ID:
++	case KDBUS_ITEM_ID_ADD:
++	case KDBUS_ITEM_ID_REMOVE:
++		break;
++
++	default:
++		BUG();
++	}
++
++	list_del(&rule->rules_entry);
++	kfree(rule);
++}
++
++static void kdbus_match_entry_free(struct kdbus_match_entry *entry)
++{
++	struct kdbus_match_rule *r, *tmp;
++
++	if (!entry)
++		return;
++
++	list_for_each_entry_safe(r, tmp, &entry->rules_list, rules_entry)
++		kdbus_match_rule_free(r);
++
++	list_del(&entry->list_entry);
++	kfree(entry);
++}
++
++/**
++ * kdbus_match_db_free() - free match db resources
++ * @mdb:		The match database
++ */
++void kdbus_match_db_free(struct kdbus_match_db *mdb)
++{
++	struct kdbus_match_entry *entry, *tmp;
++
++	if (!mdb)
++		return;
++
++	list_for_each_entry_safe(entry, tmp, &mdb->entries_list, list_entry)
++		kdbus_match_entry_free(entry);
++
++	kfree(mdb);
++}
++
++/**
++ * kdbus_match_db_new() - create a new match database
++ *
++ * Return: a new kdbus_match_db on success, ERR_PTR on failure.
++ */
++struct kdbus_match_db *kdbus_match_db_new(void)
++{
++	struct kdbus_match_db *d;
++
++	d = kzalloc(sizeof(*d), GFP_KERNEL);
++	if (!d)
++		return ERR_PTR(-ENOMEM);
++
++	init_rwsem(&d->mdb_rwlock);
++	INIT_LIST_HEAD(&d->entries_list);
++
++	return d;
++}
++
++static bool kdbus_match_bloom(const struct kdbus_bloom_filter *filter,
++			      const struct kdbus_bloom_mask *mask,
++			      const struct kdbus_conn *conn)
++{
++	size_t n = conn->ep->bus->bloom.size / sizeof(u64);
++	const u64 *m;
++	size_t i;
++
++	/*
++	 * The message's filter carries a generation identifier, the
++	 * match's mask possibly carries an array of multiple generations
++	 * of the mask. Select the mask with the closest match of the
++	 * filter's generation.
++	 */
++	m = mask->data + (min(filter->generation, mask->generations - 1) * n);
++
++	/*
++	 * The message's filter contains the message's properties,
++	 * the match's mask contains the properties to look for in the
++	 * message. Check the mask bit field against the filter bit field,
++	 * if the message possibly carries the properties the connection
++	 * has subscribed to.
++	 */
++	for (i = 0; i < n; i++)
++		if ((filter->data[i] & m[i]) != m[i])
++			return false;
++
++	return true;
++}
++
++static bool kdbus_match_rule_conn(const struct kdbus_match_rule *r,
++				  struct kdbus_conn *c,
++				  const struct kdbus_staging *s)
++{
++	lockdep_assert_held(&c->ep->bus->name_registry->rwlock);
++
++	switch (r->type) {
++	case KDBUS_ITEM_BLOOM_MASK:
++		return kdbus_match_bloom(s->bloom_filter, &r->bloom_mask, c);
++	case KDBUS_ITEM_ID:
++		return r->src_id == c->id || r->src_id == KDBUS_MATCH_ID_ANY;
++	case KDBUS_ITEM_DST_ID:
++		return r->dst_id == s->msg->dst_id ||
++		       r->dst_id == KDBUS_MATCH_ID_ANY;
++	case KDBUS_ITEM_NAME:
++		return kdbus_conn_has_name(c, r->name);
++	default:
++		return false;
++	}
++}
++
++static bool kdbus_match_rule_kernel(const struct kdbus_match_rule *r,
++				    const struct kdbus_staging *s)
++{
++	struct kdbus_item *n = s->notify;
++
++	if (WARN_ON(!n) || n->type != r->type)
++		return false;
++
++	switch (r->type) {
++	case KDBUS_ITEM_ID_ADD:
++		return r->new_id == KDBUS_MATCH_ID_ANY ||
++		       r->new_id == n->id_change.id;
++	case KDBUS_ITEM_ID_REMOVE:
++		return r->old_id == KDBUS_MATCH_ID_ANY ||
++		       r->old_id == n->id_change.id;
++	case KDBUS_ITEM_NAME_ADD:
++	case KDBUS_ITEM_NAME_CHANGE:
++	case KDBUS_ITEM_NAME_REMOVE:
++		return (r->old_id == KDBUS_MATCH_ID_ANY ||
++		        r->old_id == n->name_change.old_id.id) &&
++		       (r->new_id == KDBUS_MATCH_ID_ANY ||
++		        r->new_id == n->name_change.new_id.id) &&
++		       (!r->name || !strcmp(r->name, n->name_change.name));
++	default:
++		return false;
++	}
++}
++
++static bool kdbus_match_rules(const struct kdbus_match_entry *entry,
++			      struct kdbus_conn *c,
++			      const struct kdbus_staging *s)
++{
++	struct kdbus_match_rule *r;
++
++	list_for_each_entry(r, &entry->rules_list, rules_entry)
++		if ((c && !kdbus_match_rule_conn(r, c, s)) ||
++		    (!c && !kdbus_match_rule_kernel(r, s)))
++			return false;
++
++	return true;
++}
++
++/**
++ * kdbus_match_db_match_msg() - match a msg object against the database entries
++ * @mdb:		The match database
++ * @conn_src:		The connection object originating the message
++ * @staging:		Staging object containing the message to match against
++ *
++ * This function will walk through all the database entries previously uploaded
++ * with kdbus_match_db_add(). As soon as any of them has an all-satisfied rule
++ * set, this function will return true.
++ *
++ * The caller must hold the registry lock of conn_src->ep->bus, in case conn_src
++ * is non-NULL.
++ *
++ * Return: true if there was a matching database entry, false otherwise.
++ */
++bool kdbus_match_db_match_msg(struct kdbus_match_db *mdb,
++			      struct kdbus_conn *conn_src,
++			      const struct kdbus_staging *staging)
++{
++	struct kdbus_match_entry *entry;
++	bool matched = false;
++
++	down_read(&mdb->mdb_rwlock);
++	list_for_each_entry(entry, &mdb->entries_list, list_entry) {
++		matched = kdbus_match_rules(entry, conn_src, staging);
++		if (matched)
++			break;
++	}
++	up_read(&mdb->mdb_rwlock);
++
++	return matched;
++}
++
++static int kdbus_match_db_remove_unlocked(struct kdbus_match_db *mdb,
++					  u64 cookie)
++{
++	struct kdbus_match_entry *entry, *tmp;
++	bool found = false;
++
++	list_for_each_entry_safe(entry, tmp, &mdb->entries_list, list_entry)
++		if (entry->cookie == cookie) {
++			kdbus_match_entry_free(entry);
++			--mdb->entries_count;
++			found = true;
++		}
++
++	return found ? 0 : -EBADSLT;
++}
++
++/**
++ * kdbus_cmd_match_add() - handle KDBUS_CMD_MATCH_ADD
++ * @conn:		connection to operate on
++ * @argp:		command payload
++ *
++ * One call to this function (or one ioctl(KDBUS_CMD_MATCH_ADD), respectively)
++ * adds one new database entry with n rules attached to it. Each rule is
++ * described with a kdbus_item, and an entry is considered matching if all
++ * its rules are satisfied.
++ *
++ * The items attached to a kdbus_cmd_match struct have the following mapping:
++ *
++ * KDBUS_ITEM_BLOOM_MASK:	A bloom mask
++ * KDBUS_ITEM_NAME:		A connection's source name
++ * KDBUS_ITEM_ID:		A connection ID
++ * KDBUS_ITEM_DST_ID:		A connection ID
++ * KDBUS_ITEM_NAME_ADD:
++ * KDBUS_ITEM_NAME_REMOVE:
++ * KDBUS_ITEM_NAME_CHANGE:	Well-known name changes, carry
++ *				kdbus_notify_name_change
++ * KDBUS_ITEM_ID_ADD:
++ * KDBUS_ITEM_ID_REMOVE:	Connection ID changes, carry
++ *				kdbus_notify_id_change
++ *
++ * For kdbus_notify_{id,name}_change structs, only the ID and name fields
++ * are looked at when adding an entry. The flags are unused.
++ *
++ * Also note that KDBUS_ITEM_BLOOM_MASK, KDBUS_ITEM_NAME, KDBUS_ITEM_ID,
++ * and KDBUS_ITEM_DST_ID are used to match messages from userspace, while the
++ * others apply to kernel-generated notifications.
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_match_add(struct kdbus_conn *conn, void __user *argp)
++{
++	struct kdbus_match_db *mdb = conn->match_db;
++	struct kdbus_match_entry *entry = NULL;
++	struct kdbus_cmd_match *cmd;
++	struct kdbus_item *item;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++		{ .type = KDBUS_ITEM_BLOOM_MASK, .multiple = true },
++		{ .type = KDBUS_ITEM_NAME, .multiple = true },
++		{ .type = KDBUS_ITEM_ID, .multiple = true },
++		{ .type = KDBUS_ITEM_DST_ID, .multiple = true },
++		{ .type = KDBUS_ITEM_NAME_ADD, .multiple = true },
++		{ .type = KDBUS_ITEM_NAME_REMOVE, .multiple = true },
++		{ .type = KDBUS_ITEM_NAME_CHANGE, .multiple = true },
++		{ .type = KDBUS_ITEM_ID_ADD, .multiple = true },
++		{ .type = KDBUS_ITEM_ID_REMOVE, .multiple = true },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
++				 KDBUS_MATCH_REPLACE,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	if (!kdbus_conn_is_ordinary(conn))
++		return -EOPNOTSUPP;
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret != 0)
++		return ret;
++
++	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
++	if (!entry) {
++		ret = -ENOMEM;
++		goto exit;
++	}
++
++	entry->cookie = cmd->cookie;
++	INIT_LIST_HEAD(&entry->list_entry);
++	INIT_LIST_HEAD(&entry->rules_list);
++
++	KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
++		struct kdbus_match_rule *rule;
++		size_t size = item->size - offsetof(struct kdbus_item, data);
++
++		rule = kzalloc(sizeof(*rule), GFP_KERNEL);
++		if (!rule) {
++			ret = -ENOMEM;
++			goto exit;
++		}
++
++		rule->type = item->type;
++		INIT_LIST_HEAD(&rule->rules_entry);
++
++		switch (item->type) {
++		case KDBUS_ITEM_BLOOM_MASK: {
++			u64 bsize = conn->ep->bus->bloom.size;
++			u64 generations;
++			u64 remainder;
++
++			generations = div64_u64_rem(size, bsize, &remainder);
++			if (size < bsize || remainder > 0) {
++				ret = -EDOM;
++				break;
++			}
++
++			rule->bloom_mask.data = kmemdup(item->data,
++							size, GFP_KERNEL);
++			if (!rule->bloom_mask.data) {
++				ret = -ENOMEM;
++				break;
++			}
++
++			rule->bloom_mask.generations = generations;
++			break;
++		}
++
++		case KDBUS_ITEM_NAME:
++			if (!kdbus_name_is_valid(item->str, false)) {
++				ret = -EINVAL;
++				break;
++			}
++
++			rule->name = kstrdup(item->str, GFP_KERNEL);
++			if (!rule->name)
++				ret = -ENOMEM;
++
++			break;
++
++		case KDBUS_ITEM_ID:
++			rule->src_id = item->id;
++			break;
++
++		case KDBUS_ITEM_DST_ID:
++			rule->dst_id = item->id;
++			break;
++
++		case KDBUS_ITEM_NAME_ADD:
++		case KDBUS_ITEM_NAME_REMOVE:
++		case KDBUS_ITEM_NAME_CHANGE:
++			rule->old_id = item->name_change.old_id.id;
++			rule->new_id = item->name_change.new_id.id;
++
++			if (size > sizeof(struct kdbus_notify_name_change)) {
++				rule->name = kstrdup(item->name_change.name,
++						     GFP_KERNEL);
++				if (!rule->name)
++					ret = -ENOMEM;
++			}
++
++			break;
++
++		case KDBUS_ITEM_ID_ADD:
++		case KDBUS_ITEM_ID_REMOVE:
++			if (item->type == KDBUS_ITEM_ID_ADD)
++				rule->new_id = item->id_change.id;
++			else
++				rule->old_id = item->id_change.id;
++
++			break;
++		}
++
++		if (ret < 0) {
++			kdbus_match_rule_free(rule);
++			goto exit;
++		}
++
++		list_add_tail(&rule->rules_entry, &entry->rules_list);
++	}
++
++	down_write(&mdb->mdb_rwlock);
++
++	/* Remove any entry that has the same cookie as the current one. */
++	if (cmd->flags & KDBUS_MATCH_REPLACE)
++		kdbus_match_db_remove_unlocked(mdb, entry->cookie);
++
++	/*
++	 * If the above removal caught any entry, there will be room for the
++	 * new one.
++	 */
++	if (++mdb->entries_count > KDBUS_MATCH_MAX) {
++		--mdb->entries_count;
++		ret = -EMFILE;
++	} else {
++		list_add_tail(&entry->list_entry, &mdb->entries_list);
++		entry = NULL;
++	}
++
++	up_write(&mdb->mdb_rwlock);
++
++exit:
++	kdbus_match_entry_free(entry);
++	return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_match_remove() - handle KDBUS_CMD_MATCH_REMOVE
++ * @conn:		connection to operate on
++ * @argp:		command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_match_remove(struct kdbus_conn *conn, void __user *argp)
++{
++	struct kdbus_cmd_match *cmd;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	if (!kdbus_conn_is_ordinary(conn))
++		return -EOPNOTSUPP;
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret != 0)
++		return ret;
++
++	down_write(&conn->match_db->mdb_rwlock);
++	ret = kdbus_match_db_remove_unlocked(conn->match_db, cmd->cookie);
++	up_write(&conn->match_db->mdb_rwlock);
++
++	return kdbus_args_clear(&args, ret);
++}
+diff --git a/ipc/kdbus/match.h b/ipc/kdbus/match.h
+new file mode 100644
+index 0000000..ceb492f
+--- /dev/null
++++ b/ipc/kdbus/match.h
+@@ -0,0 +1,35 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_MATCH_H
++#define __KDBUS_MATCH_H
++
++struct kdbus_conn;
++struct kdbus_match_db;
++struct kdbus_staging;
++
++struct kdbus_match_db *kdbus_match_db_new(void);
++void kdbus_match_db_free(struct kdbus_match_db *db);
++int kdbus_match_db_add(struct kdbus_conn *conn,
++		       struct kdbus_cmd_match *cmd);
++int kdbus_match_db_remove(struct kdbus_conn *conn,
++			  struct kdbus_cmd_match *cmd);
++bool kdbus_match_db_match_msg(struct kdbus_match_db *db,
++			      struct kdbus_conn *conn_src,
++			      const struct kdbus_staging *staging);
++
++int kdbus_cmd_match_add(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_match_remove(struct kdbus_conn *conn, void __user *argp);
++
++#endif
+diff --git a/ipc/kdbus/message.c b/ipc/kdbus/message.c
+new file mode 100644
+index 0000000..ae565cd
+--- /dev/null
++++ b/ipc/kdbus/message.c
+@@ -0,0 +1,1040 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/capability.h>
++#include <linux/cgroup.h>
++#include <linux/cred.h>
++#include <linux/file.h>
++#include <linux/init.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/sched.h>
++#include <linux/shmem_fs.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <net/sock.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "domain.h"
++#include "endpoint.h"
++#include "handle.h"
++#include "item.h"
++#include "match.h"
++#include "message.h"
++#include "names.h"
++#include "policy.h"
++
++static const char * const zeros = "\0\0\0\0\0\0\0";
++
++static struct kdbus_gaps *kdbus_gaps_new(size_t n_memfds, size_t n_fds)
++{
++	size_t size_offsets, size_memfds, size_fds, size;
++	struct kdbus_gaps *gaps;
++
++	size_offsets = n_memfds * sizeof(*gaps->memfd_offsets);
++	size_memfds = n_memfds * sizeof(*gaps->memfd_files);
++	size_fds = n_fds * sizeof(*gaps->fd_files);
++	size = sizeof(*gaps) + size_offsets + size_memfds + size_fds;
++
++	gaps = kzalloc(size, GFP_KERNEL);
++	if (!gaps)
++		return ERR_PTR(-ENOMEM);
++
++	kref_init(&gaps->kref);
++	gaps->n_memfds = 0; /* we reserve n_memfds, but don't enforce them */
++	gaps->memfd_offsets = (void *)(gaps + 1);
++	gaps->memfd_files = (void *)((u8 *)gaps->memfd_offsets + size_offsets);
++	gaps->n_fds = 0; /* we reserve n_fds, but don't enforce them */
++	gaps->fd_files = (void *)((u8 *)gaps->memfd_files + size_memfds);
++
++	return gaps;
++}
++
++static void kdbus_gaps_free(struct kref *kref)
++{
++	struct kdbus_gaps *gaps = container_of(kref, struct kdbus_gaps, kref);
++	size_t i;
++
++	for (i = 0; i < gaps->n_fds; ++i)
++		if (gaps->fd_files[i])
++			fput(gaps->fd_files[i]);
++	for (i = 0; i < gaps->n_memfds; ++i)
++		if (gaps->memfd_files[i])
++			fput(gaps->memfd_files[i]);
++
++	kfree(gaps);
++}
++
++/**
++ * kdbus_gaps_ref() - gain reference
++ * @gaps:	gaps object
++ *
++ * Return: @gaps is returned
++ */
++struct kdbus_gaps *kdbus_gaps_ref(struct kdbus_gaps *gaps)
++{
++	if (gaps)
++		kref_get(&gaps->kref);
++	return gaps;
++}
++
++/**
++ * kdbus_gaps_unref() - drop reference
++ * @gaps:	gaps object
++ *
++ * Return: NULL
++ */
++struct kdbus_gaps *kdbus_gaps_unref(struct kdbus_gaps *gaps)
++{
++	if (gaps)
++		kref_put(&gaps->kref, kdbus_gaps_free);
++	return NULL;
++}
++
++/**
++ * kdbus_gaps_install() - install file-descriptors
++ * @gaps:		gaps object, or NULL
++ * @slice:		pool slice that contains the message
++ * @out_incomplete:	output variable to note incomplete fds
++ *
++ * This function installs all file-descriptors of @gaps into the current
++ * process and copies the file-descriptor numbers into the target pool slice.
++ *
++ * If the file-descriptors were only partially installed, then @out_incomplete
++ * will be set to true. Otherwise, it's set to false.
++ *
++ * Return: 0 on success, negative error code on failure
++ */
++int kdbus_gaps_install(struct kdbus_gaps *gaps, struct kdbus_pool_slice *slice,
++		       bool *out_incomplete)
++{
++	bool incomplete_fds = false;
++	struct kvec kvec;
++	size_t i, n_fds;
++	int ret, *fds;
++
++	if (!gaps) {
++		/* nothing to do */
++		*out_incomplete = incomplete_fds;
++		return 0;
++	}
++
++	n_fds = gaps->n_fds + gaps->n_memfds;
++	if (n_fds < 1) {
++		/* nothing to do */
++		*out_incomplete = incomplete_fds;
++		return 0;
++	}
++
++	fds = kmalloc_array(n_fds, sizeof(*fds), GFP_TEMPORARY);
++	n_fds = 0;
++	if (!fds)
++		return -ENOMEM;
++
++	/* 1) allocate fds and copy them over */
++
++	if (gaps->n_fds > 0) {
++		for (i = 0; i < gaps->n_fds; ++i) {
++			int fd;
++
++			fd = get_unused_fd_flags(O_CLOEXEC);
++			if (fd < 0)
++				incomplete_fds = true;
++
++			WARN_ON(!gaps->fd_files[i]);
++
++			fds[n_fds++] = fd < 0 ? -1 : fd;
++		}
++
++		/*
++		 * The file-descriptor array can only be present once per
++		 * message. Hence, prepare all fds and then copy them over with
++		 * a single kvec.
++		 */
++
++		WARN_ON(!gaps->fd_offset);
++
++		kvec.iov_base = fds;
++		kvec.iov_len = gaps->n_fds * sizeof(*fds);
++		ret = kdbus_pool_slice_copy_kvec(slice, gaps->fd_offset,
++						 &kvec, 1, kvec.iov_len);
++		if (ret < 0)
++			goto exit;
++	}
++
++	for (i = 0; i < gaps->n_memfds; ++i) {
++		int memfd;
++
++		memfd = get_unused_fd_flags(O_CLOEXEC);
++		if (memfd < 0) {
++			incomplete_fds = true;
++			/* memfds are initialized to -1, skip copying it */
++			continue;
++		}
++
++		fds[n_fds++] = memfd;
++
++		/*
++		 * memfds have to be copied individually as they each are put
++		 * into a separate item. This should not be an issue, though,
++		 * as usually there is no need to send more than one memfd per
++		 * message.
++		 */
++
++		WARN_ON(!gaps->memfd_offsets[i]);
++		WARN_ON(!gaps->memfd_files[i]);
++
++		kvec.iov_base = &memfd;
++		kvec.iov_len = sizeof(memfd);
++		ret = kdbus_pool_slice_copy_kvec(slice, gaps->memfd_offsets[i],
++						 &kvec, 1, kvec.iov_len);
++		if (ret < 0)
++			goto exit;
++	}
++
++	/* 2) install fds now that everything was successful */
++
++	for (i = 0; i < gaps->n_fds; ++i)
++		if (fds[i] >= 0)
++			fd_install(fds[i], get_file(gaps->fd_files[i]));
++	for (i = 0; i < gaps->n_memfds; ++i)
++		if (fds[gaps->n_fds + i] >= 0)
++			fd_install(fds[gaps->n_fds + i],
++				   get_file(gaps->memfd_files[i]));
++
++	ret = 0;
++
++exit:
++	if (ret < 0)
++		for (i = 0; i < n_fds; ++i)
++			put_unused_fd(fds[i]);
++	kfree(fds);
++	*out_incomplete = incomplete_fds;
++	return ret;
++}
++
++static struct file *kdbus_get_fd(int fd)
++{
++	struct file *f, *ret;
++	struct inode *inode;
++	struct socket *sock;
++
++	if (fd < 0)
++		return ERR_PTR(-EBADF);
++
++	f = fget_raw(fd);
++	if (!f)
++		return ERR_PTR(-EBADF);
++
++	inode = file_inode(f);
++	sock = S_ISSOCK(inode->i_mode) ? SOCKET_I(inode) : NULL;
++
++	if (f->f_mode & FMODE_PATH)
++		ret = f; /* O_PATH is always allowed */
++	else if (f->f_op == &kdbus_handle_ops)
++		ret = ERR_PTR(-EOPNOTSUPP); /* disallow kdbus-fd over kdbus */
++	else if (sock && sock->sk && sock->ops && sock->ops->family == PF_UNIX)
++		ret = ERR_PTR(-EOPNOTSUPP); /* disallow UDS over kdbus */
++	else
++		ret = f; /* all others are allowed */
++
++	if (f != ret)
++		fput(f);
++
++	return ret;
++}
++
++static struct file *kdbus_get_memfd(const struct kdbus_memfd *memfd)
++{
++	const int m = F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL;
++	struct file *f, *ret;
++	int s;
++
++	if (memfd->fd < 0)
++		return ERR_PTR(-EBADF);
++
++	f = fget(memfd->fd);
++	if (!f)
++		return ERR_PTR(-EBADF);
++
++	s = shmem_get_seals(f);
++	if (s < 0)
++		ret = ERR_PTR(-EMEDIUMTYPE);
++	else if ((s & m) != m)
++		ret = ERR_PTR(-ETXTBSY);
++	else if (memfd->start + memfd->size > (u64)i_size_read(file_inode(f)))
++		ret = ERR_PTR(-EFAULT);
++	else
++		ret = f;
++
++	if (f != ret)
++		fput(f);
++
++	return ret;
++}
++
++static int kdbus_msg_examine(struct kdbus_msg *msg, struct kdbus_bus *bus,
++			     struct kdbus_cmd_send *cmd, size_t *out_n_memfds,
++			     size_t *out_n_fds, size_t *out_n_parts)
++{
++	struct kdbus_item *item, *fds = NULL, *bloom = NULL, *dstname = NULL;
++	u64 n_parts, n_memfds, n_fds, vec_size;
++
++	/*
++	 * Step 1:
++	 * Validate the message and command parameters.
++	 */
++
++	/* KDBUS_PAYLOAD_KERNEL is reserved to kernel messages */
++	if (msg->payload_type == KDBUS_PAYLOAD_KERNEL)
++		return -EINVAL;
++
++	if (msg->dst_id == KDBUS_DST_ID_BROADCAST) {
++		/* broadcasts must be marked as signals */
++		if (!(msg->flags & KDBUS_MSG_SIGNAL))
++			return -EBADMSG;
++		/* broadcasts cannot have timeouts */
++		if (msg->timeout_ns > 0)
++			return -ENOTUNIQ;
++	}
++
++	if (msg->flags & KDBUS_MSG_EXPECT_REPLY) {
++		/* if you expect a reply, you must specify a timeout */
++		if (msg->timeout_ns == 0)
++			return -EINVAL;
++		/* signals cannot have replies */
++		if (msg->flags & KDBUS_MSG_SIGNAL)
++			return -ENOTUNIQ;
++	} else {
++		/* must expect reply if sent as synchronous call */
++		if (cmd->flags & KDBUS_SEND_SYNC_REPLY)
++			return -EINVAL;
++		/* cannot mark replies as signal */
++		if (msg->cookie_reply && (msg->flags & KDBUS_MSG_SIGNAL))
++			return -EINVAL;
++	}
++
++	/*
++	 * Step 2:
++	 * Validate all passed items. While at it, collect some statistics that
++	 * are required to allocate state objects later on.
++	 *
++	 * Generic item validation has already been done via
++	 * kdbus_item_validate(). Furthermore, the number of items is naturally
++	 * limited by the maximum message size. Hence, only non-generic item
++	 * checks are performed here (mainly integer overflow tests).
++	 */
++
++	n_parts = 0;
++	n_memfds = 0;
++	n_fds = 0;
++	vec_size = 0;
++
++	KDBUS_ITEMS_FOREACH(item, msg->items, KDBUS_ITEMS_SIZE(msg, items)) {
++		switch (item->type) {
++		case KDBUS_ITEM_PAYLOAD_VEC: {
++			void __force __user *ptr = KDBUS_PTR(item->vec.address);
++			u64 size = item->vec.size;
++
++			if (vec_size + size < vec_size)
++				return -EMSGSIZE;
++			if (vec_size + size > KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE)
++				return -EMSGSIZE;
++			if (ptr && unlikely(!access_ok(VERIFY_READ, ptr, size)))
++				return -EFAULT;
++
++			if (ptr || size % 8) /* data or padding */
++				++n_parts;
++			break;
++		}
++		case KDBUS_ITEM_PAYLOAD_MEMFD: {
++			u64 start = item->memfd.start;
++			u64 size = item->memfd.size;
++
++			if (start + size < start)
++				return -EMSGSIZE;
++			if (n_memfds >= KDBUS_MSG_MAX_MEMFD_ITEMS)
++				return -E2BIG;
++
++			++n_memfds;
++			if (size % 8) /* vec-padding required */
++				++n_parts;
++			break;
++		}
++		case KDBUS_ITEM_FDS: {
++			if (fds)
++				return -EEXIST;
++
++			fds = item;
++			n_fds = KDBUS_ITEM_PAYLOAD_SIZE(item) / sizeof(int);
++			if (n_fds > KDBUS_CONN_MAX_FDS_PER_USER)
++				return -EMFILE;
++
++			break;
++		}
++		case KDBUS_ITEM_BLOOM_FILTER: {
++			u64 bloom_size;
++
++			if (bloom)
++				return -EEXIST;
++
++			bloom = item;
++			bloom_size = KDBUS_ITEM_PAYLOAD_SIZE(item) -
++				     offsetof(struct kdbus_bloom_filter, data);
++			if (!KDBUS_IS_ALIGNED8(bloom_size))
++				return -EFAULT;
++			if (bloom_size != bus->bloom.size)
++				return -EDOM;
++
++			break;
++		}
++		case KDBUS_ITEM_DST_NAME: {
++			if (dstname)
++				return -EEXIST;
++
++			dstname = item;
++			if (!kdbus_name_is_valid(item->str, false))
++				return -EINVAL;
++			if (msg->dst_id == KDBUS_DST_ID_BROADCAST)
++				return -EBADMSG;
++
++			break;
++		}
++		default:
++			return -EINVAL;
++		}
++	}
++
++	/*
++	 * Step 3:
++	 * Validate that required items were actually passed, and that no item
++	 * contradicts the message flags.
++	 */
++
++	/* bloom filters must be attached _iff_ it's a signal */
++	if (!(msg->flags & KDBUS_MSG_SIGNAL) != !bloom)
++		return -EBADMSG;
++	/* destination name is required if no ID is given */
++	if (msg->dst_id == KDBUS_DST_ID_NAME && !dstname)
++		return -EDESTADDRREQ;
++	/* cannot send file-descriptors attached to broadcasts */
++	if (msg->dst_id == KDBUS_DST_ID_BROADCAST && fds)
++		return -ENOTUNIQ;
++
++	*out_n_memfds = n_memfds;
++	*out_n_fds = n_fds;
++	*out_n_parts = n_parts;
++
++	return 0;
++}
++
++static bool kdbus_staging_merge_vecs(struct kdbus_staging *staging,
++				     struct kdbus_item **prev_item,
++				     struct iovec **prev_vec,
++				     const struct kdbus_item *merge)
++{
++	void __user *ptr = (void __user *)KDBUS_PTR(merge->vec.address);
++	u64 padding = merge->vec.size % 8;
++	struct kdbus_item *prev = *prev_item;
++	struct iovec *vec = *prev_vec;
++
++	/* XXX: merging is disabled so far */
++	if (0 && prev && prev->type == KDBUS_ITEM_PAYLOAD_OFF &&
++	    !merge->vec.address == !prev->vec.address) {
++		/*
++		 * If we merge two VECs, we can always drop the second
++		 * PAYLOAD_VEC item. Hence, include its size in the previous
++		 * one.
++		 */
++		prev->vec.size += merge->vec.size;
++
++		if (ptr) {
++			/*
++			 * If we merge two data VECs, we need two iovecs to copy
++			 * the data. But the items can be easily merged by
++			 * summing their lengths.
++			 */
++			vec = &staging->parts[staging->n_parts++];
++			vec->iov_len = merge->vec.size;
++			vec->iov_base = ptr;
++			staging->n_payload += vec->iov_len;
++		} else if (padding) {
++			/*
++			 * If we merge two 0-vecs with the second 0-vec
++			 * requiring padding, we need to insert an iovec to copy
++			 * the 0-padding. We try merging it with the previous
++			 * 0-padding iovec. This might end up with an
++			 * iov_len==0, in which case we simply drop the iovec.
++			 */
++			if (vec) {
++				staging->n_payload -= vec->iov_len;
++				vec->iov_len = prev->vec.size % 8;
++				if (!vec->iov_len) {
++					--staging->n_parts;
++					vec = NULL;
++				} else {
++					staging->n_payload += vec->iov_len;
++				}
++			} else {
++				vec = &staging->parts[staging->n_parts++];
++				vec->iov_len = padding;
++				vec->iov_base = (char __user *)zeros;
++				staging->n_payload += vec->iov_len;
++			}
++		} else {
++			/*
++			 * If we merge two 0-vecs with the second 0-vec having
++			 * no padding, we know the padding of the first stays
++			 * the same. Hence, @vec needs no adjustment.
++			 */
++		}
++
++		/* successfully merged with previous item */
++		merge = prev;
++	} else {
++		/*
++		 * If we cannot merge the payload item with the previous one,
++		 * we simply insert a new iovec for the data/padding.
++		 */
++		if (ptr) {
++			vec = &staging->parts[staging->n_parts++];
++			vec->iov_len = merge->vec.size;
++			vec->iov_base = ptr;
++			staging->n_payload += vec->iov_len;
++		} else if (padding) {
++			vec = &staging->parts[staging->n_parts++];
++			vec->iov_len = padding;
++			vec->iov_base = (char __user *)zeros;
++			staging->n_payload += vec->iov_len;
++		} else {
++			vec = NULL;
++		}
++	}
++
++	*prev_item = (struct kdbus_item *)merge;
++	*prev_vec = vec;
++
++	return merge == prev;
++}
++
++static int kdbus_staging_import(struct kdbus_staging *staging)
++{
++	struct kdbus_item *it, *item, *last, *prev_payload;
++	struct kdbus_gaps *gaps = staging->gaps;
++	struct kdbus_msg *msg = staging->msg;
++	struct iovec *part, *prev_part;
++	bool drop_item;
++
++	drop_item = false;
++	last = NULL;
++	prev_payload = NULL;
++	prev_part = NULL;
++
++	/*
++	 * We modify msg->items along the way; make sure to use @item as offset
++	 * to the next item (instead of the iterator @it).
++	 */
++	for (it = item = msg->items;
++	     it >= msg->items &&
++	             (u8 *)it < (u8 *)msg + msg->size &&
++	             (u8 *)it + it->size <= (u8 *)msg + msg->size; ) {
++		/*
++		 * If we dropped items along the way, move current item to
++		 * front. We must not access @it afterwards, but use @item
++		 * instead!
++		 */
++		if (it != item)
++			memmove(item, it, it->size);
++		it = (void *)((u8 *)it + KDBUS_ALIGN8(item->size));
++
++		switch (item->type) {
++		case KDBUS_ITEM_PAYLOAD_VEC: {
++			size_t offset = staging->n_payload;
++
++			if (kdbus_staging_merge_vecs(staging, &prev_payload,
++						     &prev_part, item)) {
++				drop_item = true;
++			} else if (item->vec.address) {
++				/* real offset is patched later on */
++				item->type = KDBUS_ITEM_PAYLOAD_OFF;
++				item->vec.offset = offset;
++			} else {
++				item->type = KDBUS_ITEM_PAYLOAD_OFF;
++				item->vec.offset = ~0ULL;
++			}
++
++			break;
++		}
++		case KDBUS_ITEM_PAYLOAD_MEMFD: {
++			struct file *f;
++
++			f = kdbus_get_memfd(&item->memfd);
++			if (IS_ERR(f))
++				return PTR_ERR(f);
++
++			gaps->memfd_files[gaps->n_memfds] = f;
++			gaps->memfd_offsets[gaps->n_memfds] =
++					(u8 *)&item->memfd.fd - (u8 *)msg;
++			++gaps->n_memfds;
++
++			/* memfds cannot be merged */
++			prev_payload = item;
++			prev_part = NULL;
++
++			/* insert padding to make following VECs aligned */
++			if (item->memfd.size % 8) {
++				part = &staging->parts[staging->n_parts++];
++				part->iov_len = item->memfd.size % 8;
++				part->iov_base = (char __user *)zeros;
++				staging->n_payload += part->iov_len;
++			}
++
++			break;
++		}
++		case KDBUS_ITEM_FDS: {
++			size_t i, n_fds;
++
++			n_fds = KDBUS_ITEM_PAYLOAD_SIZE(item) / sizeof(int);
++			for (i = 0; i < n_fds; ++i) {
++				struct file *f;
++
++				f = kdbus_get_fd(item->fds[i]);
++				if (IS_ERR(f))
++					return PTR_ERR(f);
++
++				gaps->fd_files[gaps->n_fds++] = f;
++			}
++
++			gaps->fd_offset = (u8 *)item->fds - (u8 *)msg;
++
++			break;
++		}
++		case KDBUS_ITEM_BLOOM_FILTER:
++			staging->bloom_filter = &item->bloom_filter;
++			break;
++		case KDBUS_ITEM_DST_NAME:
++			staging->dst_name = item->str;
++			break;
++		}
++
++		/* drop item if we merged it with a previous one */
++		if (drop_item) {
++			drop_item = false;
++		} else {
++			last = item;
++			item = KDBUS_ITEM_NEXT(item);
++		}
++	}
++
++	/* adjust message size regarding dropped items */
++	msg->size = offsetof(struct kdbus_msg, items);
++	if (last)
++		msg->size += ((u8 *)last - (u8 *)msg->items) + last->size;
++
++	return 0;
++}
++
++static void kdbus_staging_reserve(struct kdbus_staging *staging)
++{
++	struct iovec *part;
++
++	part = &staging->parts[staging->n_parts++];
++	part->iov_base = (void __user *)zeros;
++	part->iov_len = 0;
++}
++
++static struct kdbus_staging *kdbus_staging_new(struct kdbus_bus *bus,
++					       size_t n_parts,
++					       size_t msg_extra_size)
++{
++	const size_t reserved_parts = 5; /* see below for explanation */
++	struct kdbus_staging *staging;
++	int ret;
++
++	n_parts += reserved_parts;
++
++	staging = kzalloc(sizeof(*staging) + n_parts * sizeof(*staging->parts) +
++			  msg_extra_size, GFP_TEMPORARY);
++	if (!staging)
++		return ERR_PTR(-ENOMEM);
++
++	staging->msg_seqnum = atomic64_inc_return(&bus->last_message_id);
++	staging->n_parts = 0; /* we reserve n_parts, but don't enforce them */
++	staging->parts = (void *)(staging + 1);
++
++	if (msg_extra_size) /* if requested, allocate message, too */
++		staging->msg = (void *)((u8 *)staging->parts +
++				        n_parts * sizeof(*staging->parts));
++
++	staging->meta_proc = kdbus_meta_proc_new();
++	if (IS_ERR(staging->meta_proc)) {
++		ret = PTR_ERR(staging->meta_proc);
++		staging->meta_proc = NULL;
++		goto error;
++	}
++
++	staging->meta_conn = kdbus_meta_conn_new();
++	if (IS_ERR(staging->meta_conn)) {
++		ret = PTR_ERR(staging->meta_conn);
++		staging->meta_conn = NULL;
++		goto error;
++	}
++
++	/*
++	 * Prepare iovecs to copy the message into the target pool. We use the
++	 * following iovecs:
++	 *   * iovec to copy "kdbus_msg.size"
++	 *   * iovec to copy "struct kdbus_msg" (minus size) plus items
++	 *   * iovec for possible padding after the items
++	 *   * iovec for metadata items
++	 *   * iovec for possible padding after the metadata items
++	 *
++	 * Make sure to update @reserved_parts if you add more parts here.
++	 */
++
++	kdbus_staging_reserve(staging); /* msg.size */
++	kdbus_staging_reserve(staging); /* msg (minus msg.size) plus items */
++	kdbus_staging_reserve(staging); /* msg padding */
++	kdbus_staging_reserve(staging); /* meta */
++	kdbus_staging_reserve(staging); /* meta padding */
++
++	return staging;
++
++error:
++	kdbus_staging_free(staging);
++	return ERR_PTR(ret);
++}
++
++struct kdbus_staging *kdbus_staging_new_kernel(struct kdbus_bus *bus,
++					       u64 dst, u64 cookie_timeout,
++					       size_t it_size, size_t it_type)
++{
++	struct kdbus_staging *staging;
++	size_t size;
++
++	size = offsetof(struct kdbus_msg, items) +
++	       KDBUS_ITEM_HEADER_SIZE + it_size;
++
++	staging = kdbus_staging_new(bus, 0, KDBUS_ALIGN8(size));
++	if (IS_ERR(staging))
++		return ERR_CAST(staging);
++
++	staging->msg->size = size;
++	staging->msg->flags = (dst == KDBUS_DST_ID_BROADCAST) ?
++							KDBUS_MSG_SIGNAL : 0;
++	staging->msg->dst_id = dst;
++	staging->msg->src_id = KDBUS_SRC_ID_KERNEL;
++	staging->msg->payload_type = KDBUS_PAYLOAD_KERNEL;
++	staging->msg->cookie_reply = cookie_timeout;
++	staging->notify = staging->msg->items;
++	staging->notify->size = KDBUS_ITEM_HEADER_SIZE + it_size;
++	staging->notify->type = it_type;
++
++	return staging;
++}
++
++struct kdbus_staging *kdbus_staging_new_user(struct kdbus_bus *bus,
++					     struct kdbus_cmd_send *cmd,
++					     struct kdbus_msg *msg)
++{
++	const size_t reserved_parts = 1; /* see below for explanation */
++	size_t n_memfds, n_fds, n_parts;
++	struct kdbus_staging *staging;
++	int ret;
++
++	/*
++	 * Examine user-supplied message and figure out how many resources we
++	 * need to allocate in our staging area. This requires us to iterate
++	 * the message twice, but saves us from re-allocating our resources
++	 * all the time.
++	 */
++
++	ret = kdbus_msg_examine(msg, bus, cmd, &n_memfds, &n_fds, &n_parts);
++	if (ret < 0)
++		return ERR_PTR(ret);
++
++	n_parts += reserved_parts;
++
++	/*
++	 * Allocate staging area with the number of required resources. Make
++	 * sure that we have enough iovecs for all required parts pre-allocated
++	 * so this will hopefully be the only memory allocation for this
++	 * message transaction.
++	 */
++
++	staging = kdbus_staging_new(bus, n_parts, 0);
++	if (IS_ERR(staging))
++		return ERR_CAST(staging);
++
++	staging->msg = msg;
++
++	/*
++	 * If the message contains memfds or fd items, we need to remember some
++	 * state so we can fill in the requested information at RECV time.
++	 * File-descriptors cannot be passed at SEND time. Hence, allocate a
++	 * gaps-object to remember that state. That gaps object is linked to
++	 * from the staging area, but will also be linked to from the message
++	 * queue of each peer. Hence, each receiver owns a reference to it, and
++	 * it will later be used to fill the 'gaps' in the message that
++	 * couldn't be filled at SEND time.
++	 * Note that the 'gaps' object is read-only once the staging-allocator
++	 * returns. There might be connections receiving a queued message while
++	 * the sender still broadcasts the message to other receivers.
++	 */
++
++	if (n_memfds > 0 || n_fds > 0) {
++		staging->gaps = kdbus_gaps_new(n_memfds, n_fds);
++		if (IS_ERR(staging->gaps)) {
++			ret = PTR_ERR(staging->gaps);
++			staging->gaps = NULL;
++			kdbus_staging_free(staging);
++			return ERR_PTR(ret);
++		}
++	}
++
++	/*
++	 * kdbus_staging_new() already reserves parts for message setup. For
++	 * user-supplied messages, we add the following iovecs:
++	 *   ... variable number of iovecs for payload ...
++	 *   * final iovec for possible padding of payload
++	 *
++	 * Make sure to update @reserved_parts if you add more parts here.
++	 */
++
++	ret = kdbus_staging_import(staging); /* payload */
++	kdbus_staging_reserve(staging); /* payload padding */
++
++	if (ret < 0)
++		goto error;
++
++	return staging;
++
++error:
++	kdbus_staging_free(staging);
++	return ERR_PTR(ret);
++}
++
++struct kdbus_staging *kdbus_staging_free(struct kdbus_staging *staging)
++{
++	if (!staging)
++		return NULL;
++
++	kdbus_meta_conn_unref(staging->meta_conn);
++	kdbus_meta_proc_unref(staging->meta_proc);
++	kdbus_gaps_unref(staging->gaps);
++	kfree(staging);
++
++	return NULL;
++}
++
++static int kdbus_staging_collect_metadata(struct kdbus_staging *staging,
++					  struct kdbus_conn *src,
++					  struct kdbus_conn *dst,
++					  u64 *out_attach)
++{
++	u64 attach;
++	int ret;
++
++	if (src)
++		attach = kdbus_meta_msg_mask(src, dst);
++	else
++		attach = KDBUS_ATTACH_TIMESTAMP; /* metadata for kernel msgs */
++
++	if (src && !src->meta_fake) {
++		ret = kdbus_meta_proc_collect(staging->meta_proc, attach);
++		if (ret < 0)
++			return ret;
++	}
++
++	ret = kdbus_meta_conn_collect(staging->meta_conn, src,
++				      staging->msg_seqnum, attach);
++	if (ret < 0)
++		return ret;
++
++	*out_attach = attach;
++	return 0;
++}
++
++/**
++ * kdbus_staging_emit() - emit linearized message in target pool
++ * @staging:		staging object to create message from
++ * @src:		sender of the message (or NULL)
++ * @dst:		target connection to allocate message for
++ *
++ * This allocates a pool-slice for @dst and copies the message provided by
++ * @staging into it. The new slice is then returned to the caller for further
++ * processing. It is not yet linked into any queue.
++ *
++ * Return: Newly allocated slice or ERR_PTR on failure.
++ */
++struct kdbus_pool_slice *kdbus_staging_emit(struct kdbus_staging *staging,
++					    struct kdbus_conn *src,
++					    struct kdbus_conn *dst)
++{
++	struct kdbus_item *item, *meta_items = NULL;
++	struct kdbus_pool_slice *slice = NULL;
++	size_t off, size, meta_size;
++	struct iovec *v;
++	u64 attach, msg_size;
++	int ret;
++
++	/*
++	 * Step 1:
++	 * Collect metadata from @src depending on the attach-flags allowed for
++	 * @dst. Translate it into the namespaces pinned by @dst.
++	 */
++
++	ret = kdbus_staging_collect_metadata(staging, src, dst, &attach);
++	if (ret < 0)
++		goto error;
++
++	ret = kdbus_meta_emit(staging->meta_proc, NULL, staging->meta_conn,
++			      dst, attach, &meta_items, &meta_size);
++	if (ret < 0)
++		goto error;
++
++	/*
++	 * Step 2:
++	 * Set up iovecs for the message. See kdbus_staging_new() for allocation
++	 * of those iovecs. All reserved iovecs have been initialized with
++	 * iov_len=0 + iov_base=zeros. Furthermore, the iovecs to copy the
++	 * actual message payload have already been initialized and need not be
++	 * touched.
++	 */
++
++	v = staging->parts;
++	msg_size = staging->msg->size;
++
++	/* msg.size */
++	v->iov_len = sizeof(msg_size);
++	v->iov_base = (void __user *)&msg_size;
++	++v;
++
++	/* msg (after msg.size) plus items */
++	v->iov_len = staging->msg->size - sizeof(staging->msg->size);
++	v->iov_base = (void __user *)((u8 *)staging->msg +
++				      sizeof(staging->msg->size));
++	++v;
++
++	/* padding after msg */
++	v->iov_len = KDBUS_ALIGN8(staging->msg->size) - staging->msg->size;
++	v->iov_base = (void __user *)zeros;
++	++v;
++
++	if (meta_size > 0) {
++		/* metadata items */
++		v->iov_len = meta_size;
++		v->iov_base = (void __user *)meta_items;
++		++v;
++
++		/* padding after metadata */
++		v->iov_len = KDBUS_ALIGN8(meta_size) - meta_size;
++		v->iov_base = (void __user *)zeros;
++		++v;
++
++		msg_size = KDBUS_ALIGN8(msg_size) + meta_size;
++	} else {
++		/* metadata items */
++		v->iov_len = 0;
++		v->iov_base = (void __user *)zeros;
++		++v;
++
++		/* padding after metadata */
++		v->iov_len = 0;
++		v->iov_base = (void __user *)zeros;
++		++v;
++	}
++
++	/* ... payload iovecs are already filled in ... */
++
++	/* compute overall size and fill in padding after payload */
++	size = KDBUS_ALIGN8(msg_size);
++
++	if (staging->n_payload > 0) {
++		size += staging->n_payload;
++
++		v = &staging->parts[staging->n_parts - 1];
++		v->iov_len = KDBUS_ALIGN8(size) - size;
++		v->iov_base = (void __user *)zeros;
++
++		size = KDBUS_ALIGN8(size);
++	}
++
++	/*
++	 * Step 3:
++	 * The PAYLOAD_OFF items in the message contain a relative 'offset'
++	 * field that tells the receiver where to find the actual payload. This
++	 * offset is relative to the start of the message, and as such depends
++	 * on the size of the metadata items we inserted. This size is variable
++	 * and changes for each peer we send the message to. Hence, we remember
++	 * the last relative offset that was used to calculate the 'offset'
++	 * fields. For each message, we re-calculate it and patch all items, in
++	 * case it changed.
++	 */
++
++	off = KDBUS_ALIGN8(msg_size);
++
++	if (off != staging->i_payload) {
++		KDBUS_ITEMS_FOREACH(item, staging->msg->items,
++				    KDBUS_ITEMS_SIZE(staging->msg, items)) {
++			if (item->type != KDBUS_ITEM_PAYLOAD_OFF)
++				continue;
++
++			item->vec.offset -= staging->i_payload;
++			item->vec.offset += off;
++		}
++
++		staging->i_payload = off;
++	}
++
++	/*
++	 * Step 4:
++	 * Allocate pool slice and copy over all data. Make sure to properly
++	 * account against the user quota.
++	 */
++
++	ret = kdbus_conn_quota_inc(dst, src ? src->user : NULL, size,
++				   staging->gaps ? staging->gaps->n_fds : 0);
++	if (ret < 0)
++		goto error;
++
++	slice = kdbus_pool_slice_alloc(dst->pool, size, true);
++	if (IS_ERR(slice)) {
++		ret = PTR_ERR(slice);
++		slice = NULL;
++		goto error;
++	}
++
++	WARN_ON(kdbus_pool_slice_size(slice) != size);
++
++	ret = kdbus_pool_slice_copy_iovec(slice, 0, staging->parts,
++					  staging->n_parts, size);
++	if (ret < 0)
++		goto error;
++
++	/* all done, return slice to caller */
++	goto exit;
++
++error:
++	if (slice)
++		kdbus_conn_quota_dec(dst, src ? src->user : NULL, size,
++				     staging->gaps ? staging->gaps->n_fds : 0);
++	kdbus_pool_slice_release(slice);
++	slice = ERR_PTR(ret);
++exit:
++	kfree(meta_items);
++	return slice;
++}
+diff --git a/ipc/kdbus/message.h b/ipc/kdbus/message.h
+new file mode 100644
+index 0000000..298f9c9
+--- /dev/null
++++ b/ipc/kdbus/message.h
+@@ -0,0 +1,120 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_MESSAGE_H
++#define __KDBUS_MESSAGE_H
++
++#include <linux/fs.h>
++#include <linux/kref.h>
++#include <uapi/linux/kdbus.h>
++
++struct kdbus_bus;
++struct kdbus_conn;
++struct kdbus_meta_conn;
++struct kdbus_meta_proc;
++struct kdbus_pool_slice;
++
++/**
++ * struct kdbus_gaps - gaps in message to be filled later
++ * @kref:		Reference counter
++ * @n_memfds:		Number of memfds
++ * @memfd_offsets:	Offsets of kdbus_memfd items in target slice
++ * @memfd_files:	Array of sent memfd files
++ * @n_fds:		Number of fds
++ * @fd_files:		Array of sent fds
++ * @fd_offset:		Offset of fd-array in target slice
++ *
++ * The 'gaps' object is used to track data that is needed to fill gaps in a
++ * message at RECV time. Usually, we try to compile the whole message at SEND
++ * time. This has the advantage that we don't have to cache any information
++ * and can keep the memory consumption small. Furthermore, all copy operations
++ * can be combined into a single function call, which speeds up transactions
++ * considerably. However, things like file-descriptors can only be fully
++ * installed at RECV time. The gaps object tracks this data and pins it until
++ * a message is received. The gaps object is shared between all receivers of
++ * the same message.
++ */
++struct kdbus_gaps {
++	struct kref kref;
++
++	/* state tracking for KDBUS_ITEM_PAYLOAD_MEMFD entries */
++	size_t n_memfds;
++	u64 *memfd_offsets;
++	struct file **memfd_files;
++
++	/* state tracking for KDBUS_ITEM_FDS */
++	size_t n_fds;
++	struct file **fd_files;
++	u64 fd_offset;
++};
++
++struct kdbus_gaps *kdbus_gaps_ref(struct kdbus_gaps *gaps);
++struct kdbus_gaps *kdbus_gaps_unref(struct kdbus_gaps *gaps);
++int kdbus_gaps_install(struct kdbus_gaps *gaps, struct kdbus_pool_slice *slice,
++		       bool *out_incomplete);
++
++/**
++ * struct kdbus_staging - staging area to import messages
++ * @msg:		User-supplied message
++ * @gaps:		Gaps-object created during import (or NULL if empty)
++ * @msg_seqnum:		Message sequence number
++ * @notify_entry:	Entry into list of kernel-generated notifications
++ * @i_payload:		Current relative index of start of payload
++ * @n_payload:		Total number of bytes needed for payload
++ * @n_parts:		Number of parts
++ * @parts:		Array of iovecs that make up the whole message
++ * @meta_proc:		Process metadata of the sender (or NULL if empty)
++ * @meta_conn:		Connection metadata of the sender (or NULL if empty)
++ * @bloom_filter:	Pointer to the bloom-item in @msg, or NULL
++ * @dst_name:		Pointer to the dst-name-item in @msg, or NULL
++ * @notify:		Pointer to the notification item in @msg, or NULL
++ *
++ * The kdbus_staging object is a temporary staging area to import user-supplied
++ * messages into the kernel. It is only used during SEND and dropped once the
++ * message is queued. Any data that cannot be collected during SEND is
++ * collected in a kdbus_gaps object and attached to the message queue.
++ */
++struct kdbus_staging {
++	struct kdbus_msg *msg;
++	struct kdbus_gaps *gaps;
++	u64 msg_seqnum;
++	struct list_head notify_entry;
++
++	/* crafted iovecs to copy the message */
++	size_t i_payload;
++	size_t n_payload;
++	size_t n_parts;
++	struct iovec *parts;
++
++	/* metadata state */
++	struct kdbus_meta_proc *meta_proc;
++	struct kdbus_meta_conn *meta_conn;
++
++	/* cached pointers into @msg */
++	const struct kdbus_bloom_filter *bloom_filter;
++	const char *dst_name;
++	struct kdbus_item *notify;
++};
++
++struct kdbus_staging *kdbus_staging_new_kernel(struct kdbus_bus *bus,
++					       u64 dst, u64 cookie_timeout,
++					       size_t it_size, size_t it_type);
++struct kdbus_staging *kdbus_staging_new_user(struct kdbus_bus *bus,
++					     struct kdbus_cmd_send *cmd,
++					     struct kdbus_msg *msg);
++struct kdbus_staging *kdbus_staging_free(struct kdbus_staging *staging);
++struct kdbus_pool_slice *kdbus_staging_emit(struct kdbus_staging *staging,
++					    struct kdbus_conn *src,
++					    struct kdbus_conn *dst);
++
++#endif
+diff --git a/ipc/kdbus/metadata.c b/ipc/kdbus/metadata.c
+new file mode 100644
+index 0000000..71ca475
+--- /dev/null
++++ b/ipc/kdbus/metadata.c
+@@ -0,0 +1,1347 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/audit.h>
++#include <linux/capability.h>
++#include <linux/cgroup.h>
++#include <linux/cred.h>
++#include <linux/file.h>
++#include <linux/fs_struct.h>
++#include <linux/init.h>
++#include <linux/kref.h>
++#include <linux/mutex.h>
++#include <linux/sched.h>
++#include <linux/security.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <linux/uidgid.h>
++#include <linux/uio.h>
++#include <linux/user_namespace.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "endpoint.h"
++#include "item.h"
++#include "message.h"
++#include "metadata.h"
++#include "names.h"
++
++/**
++ * struct kdbus_meta_proc - Process metadata
++ * @kref:		Reference counting
++ * @lock:		Object lock
++ * @collected:		Bitmask of collected items
++ * @valid:		Bitmask of collected and valid items
++ * @cred:		Credentials
++ * @pid:		PID of process
++ * @tgid:		TGID of process
++ * @ppid:		PPID of process
++ * @tid_comm:		TID comm line
++ * @pid_comm:		PID comm line
++ * @exe_path:		Executable path
++ * @root_path:		Root-FS path
++ * @cmdline:		Command-line
++ * @cgroup:		Full cgroup path
++ * @seclabel:		Seclabel
++ * @audit_loginuid:	Audit login-UID
++ * @audit_sessionid:	Audit session-ID
++ */
++struct kdbus_meta_proc {
++	struct kref kref;
++	struct mutex lock;
++	u64 collected;
++	u64 valid;
++
++	/* KDBUS_ITEM_CREDS */
++	/* KDBUS_ITEM_AUXGROUPS */
++	/* KDBUS_ITEM_CAPS */
++	const struct cred *cred;
++
++	/* KDBUS_ITEM_PIDS */
++	struct pid *pid;
++	struct pid *tgid;
++	struct pid *ppid;
++
++	/* KDBUS_ITEM_TID_COMM */
++	char tid_comm[TASK_COMM_LEN];
++	/* KDBUS_ITEM_PID_COMM */
++	char pid_comm[TASK_COMM_LEN];
++
++	/* KDBUS_ITEM_EXE */
++	struct path exe_path;
++	struct path root_path;
++
++	/* KDBUS_ITEM_CMDLINE */
++	char *cmdline;
++
++	/* KDBUS_ITEM_CGROUP */
++	char *cgroup;
++
++	/* KDBUS_ITEM_SECLABEL */
++	char *seclabel;
++
++	/* KDBUS_ITEM_AUDIT */
++	kuid_t audit_loginuid;
++	unsigned int audit_sessionid;
++};
++
++/**
++ * struct kdbus_meta_conn - Connection metadata
++ * @kref:		Reference counting
++ * @lock:		Object lock
++ * @collected:		Bitmask of collected items
++ * @valid:		Bitmask of collected and valid items
++ * @ts:			Timestamp values
++ * @owned_names_items:	Serialized items for owned names
++ * @owned_names_size:	Size of @owned_names_items
++ * @conn_description:	Connection description
++ */
++struct kdbus_meta_conn {
++	struct kref kref;
++	struct mutex lock;
++	u64 collected;
++	u64 valid;
++
++	/* KDBUS_ITEM_TIMESTAMP */
++	struct kdbus_timestamp ts;
++
++	/* KDBUS_ITEM_OWNED_NAME */
++	struct kdbus_item *owned_names_items;
++	size_t owned_names_size;
++
++	/* KDBUS_ITEM_CONN_DESCRIPTION */
++	char *conn_description;
++};
++
++/* fixed size equivalent of "kdbus_caps" */
++struct kdbus_meta_caps {
++	u32 last_cap;
++	struct {
++		u32 caps[_KERNEL_CAPABILITY_U32S];
++	} set[4];
++};
++
++/**
++ * kdbus_meta_proc_new() - Create process metadata object
++ *
++ * Return: Pointer to new object on success, ERR_PTR on failure.
++ */
++struct kdbus_meta_proc *kdbus_meta_proc_new(void)
++{
++	struct kdbus_meta_proc *mp;
++
++	mp = kzalloc(sizeof(*mp), GFP_KERNEL);
++	if (!mp)
++		return ERR_PTR(-ENOMEM);
++
++	kref_init(&mp->kref);
++	mutex_init(&mp->lock);
++
++	return mp;
++}
++
++static void kdbus_meta_proc_free(struct kref *kref)
++{
++	struct kdbus_meta_proc *mp = container_of(kref, struct kdbus_meta_proc,
++						  kref);
++
++	path_put(&mp->exe_path);
++	path_put(&mp->root_path);
++	if (mp->cred)
++		put_cred(mp->cred);
++	put_pid(mp->ppid);
++	put_pid(mp->tgid);
++	put_pid(mp->pid);
++
++	kfree(mp->seclabel);
++	kfree(mp->cmdline);
++	kfree(mp->cgroup);
++	kfree(mp);
++}
++
++/**
++ * kdbus_meta_proc_ref() - Gain reference
++ * @mp:		Process metadata object
++ *
++ * Return: @mp is returned
++ */
++struct kdbus_meta_proc *kdbus_meta_proc_ref(struct kdbus_meta_proc *mp)
++{
++	if (mp)
++		kref_get(&mp->kref);
++	return mp;
++}
++
++/**
++ * kdbus_meta_proc_unref() - Drop reference
++ * @mp:		Process metadata object
++ *
++ * Return: NULL
++ */
++struct kdbus_meta_proc *kdbus_meta_proc_unref(struct kdbus_meta_proc *mp)
++{
++	if (mp)
++		kref_put(&mp->kref, kdbus_meta_proc_free);
++	return NULL;
++}
++
++static void kdbus_meta_proc_collect_pids(struct kdbus_meta_proc *mp)
++{
++	struct task_struct *parent;
++
++	mp->pid = get_pid(task_pid(current));
++	mp->tgid = get_pid(task_tgid(current));
++
++	rcu_read_lock();
++	parent = rcu_dereference(current->real_parent);
++	mp->ppid = get_pid(task_tgid(parent));
++	rcu_read_unlock();
++
++	mp->valid |= KDBUS_ATTACH_PIDS;
++}
++
++static void kdbus_meta_proc_collect_tid_comm(struct kdbus_meta_proc *mp)
++{
++	get_task_comm(mp->tid_comm, current);
++	mp->valid |= KDBUS_ATTACH_TID_COMM;
++}
++
++static void kdbus_meta_proc_collect_pid_comm(struct kdbus_meta_proc *mp)
++{
++	get_task_comm(mp->pid_comm, current->group_leader);
++	mp->valid |= KDBUS_ATTACH_PID_COMM;
++}
++
++static void kdbus_meta_proc_collect_exe(struct kdbus_meta_proc *mp)
++{
++	struct file *exe_file;
++
++	rcu_read_lock();
++	exe_file = rcu_dereference(current->mm->exe_file);
++	if (exe_file) {
++		mp->exe_path = exe_file->f_path;
++		path_get(&mp->exe_path);
++		get_fs_root(current->fs, &mp->root_path);
++		mp->valid |= KDBUS_ATTACH_EXE;
++	}
++	rcu_read_unlock();
++}
++
++static int kdbus_meta_proc_collect_cmdline(struct kdbus_meta_proc *mp)
++{
++	struct mm_struct *mm = current->mm;
++	char *cmdline;
++
++	if (!mm->arg_end)
++		return 0;
++
++	cmdline = strndup_user((const char __user *)mm->arg_start,
++			       mm->arg_end - mm->arg_start);
++	if (IS_ERR(cmdline))
++		return PTR_ERR(cmdline);
++
++	mp->cmdline = cmdline;
++	mp->valid |= KDBUS_ATTACH_CMDLINE;
++
++	return 0;
++}
++
++static int kdbus_meta_proc_collect_cgroup(struct kdbus_meta_proc *mp)
++{
++#ifdef CONFIG_CGROUPS
++	void *page;
++	char *s;
++
++	page = (void *)__get_free_page(GFP_TEMPORARY);
++	if (!page)
++		return -ENOMEM;
++
++	s = task_cgroup_path(current, page, PAGE_SIZE);
++	if (s) {
++		mp->cgroup = kstrdup(s, GFP_KERNEL);
++		if (!mp->cgroup) {
++			free_page((unsigned long)page);
++			return -ENOMEM;
++		}
++	}
++
++	free_page((unsigned long)page);
++	mp->valid |= KDBUS_ATTACH_CGROUP;
++#endif
++
++	return 0;
++}
++
++static int kdbus_meta_proc_collect_seclabel(struct kdbus_meta_proc *mp)
++{
++#ifdef CONFIG_SECURITY
++	char *ctx = NULL;
++	u32 sid, len;
++	int ret;
++
++	security_task_getsecid(current, &sid);
++	ret = security_secid_to_secctx(sid, &ctx, &len);
++	if (ret < 0) {
++		/*
++		 * EOPNOTSUPP means no security module is active,
++		 * so skip adding the seclabel. This effectively
++		 * drops the SECLABEL item.
++		 */
++		return (ret == -EOPNOTSUPP) ? 0 : ret;
++	}
++
++	mp->seclabel = kstrdup(ctx, GFP_KERNEL);
++	security_release_secctx(ctx, len);
++	if (!mp->seclabel)
++		return -ENOMEM;
++
++	mp->valid |= KDBUS_ATTACH_SECLABEL;
++#endif
++
++	return 0;
++}
++
++static void kdbus_meta_proc_collect_audit(struct kdbus_meta_proc *mp)
++{
++#ifdef CONFIG_AUDITSYSCALL
++	mp->audit_loginuid = audit_get_loginuid(current);
++	mp->audit_sessionid = audit_get_sessionid(current);
++	mp->valid |= KDBUS_ATTACH_AUDIT;
++#endif
++}
++
++/**
++ * kdbus_meta_proc_collect() - Collect process metadata
++ * @mp:		Process metadata object
++ * @what:	Attach flags to collect
++ *
++ * This collects process metadata from current and saves it in @mp.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_meta_proc_collect(struct kdbus_meta_proc *mp, u64 what)
++{
++	int ret;
++
++	if (!mp || !(what & (KDBUS_ATTACH_CREDS |
++			     KDBUS_ATTACH_PIDS |
++			     KDBUS_ATTACH_AUXGROUPS |
++			     KDBUS_ATTACH_TID_COMM |
++			     KDBUS_ATTACH_PID_COMM |
++			     KDBUS_ATTACH_EXE |
++			     KDBUS_ATTACH_CMDLINE |
++			     KDBUS_ATTACH_CGROUP |
++			     KDBUS_ATTACH_CAPS |
++			     KDBUS_ATTACH_SECLABEL |
++			     KDBUS_ATTACH_AUDIT)))
++		return 0;
++
++	mutex_lock(&mp->lock);
++
++	/* creds, auxgrps and caps share "struct cred" as context */
++	{
++		const u64 m_cred = KDBUS_ATTACH_CREDS |
++				   KDBUS_ATTACH_AUXGROUPS |
++				   KDBUS_ATTACH_CAPS;
++
++		if ((what & m_cred) && !(mp->collected & m_cred)) {
++			mp->cred = get_current_cred();
++			mp->valid |= m_cred;
++			mp->collected |= m_cred;
++		}
++	}
++
++	if ((what & KDBUS_ATTACH_PIDS) &&
++	    !(mp->collected & KDBUS_ATTACH_PIDS)) {
++		kdbus_meta_proc_collect_pids(mp);
++		mp->collected |= KDBUS_ATTACH_PIDS;
++	}
++
++	if ((what & KDBUS_ATTACH_TID_COMM) &&
++	    !(mp->collected & KDBUS_ATTACH_TID_COMM)) {
++		kdbus_meta_proc_collect_tid_comm(mp);
++		mp->collected |= KDBUS_ATTACH_TID_COMM;
++	}
++
++	if ((what & KDBUS_ATTACH_PID_COMM) &&
++	    !(mp->collected & KDBUS_ATTACH_PID_COMM)) {
++		kdbus_meta_proc_collect_pid_comm(mp);
++		mp->collected |= KDBUS_ATTACH_PID_COMM;
++	}
++
++	if ((what & KDBUS_ATTACH_EXE) &&
++	    !(mp->collected & KDBUS_ATTACH_EXE)) {
++		kdbus_meta_proc_collect_exe(mp);
++		mp->collected |= KDBUS_ATTACH_EXE;
++	}
++
++	if ((what & KDBUS_ATTACH_CMDLINE) &&
++	    !(mp->collected & KDBUS_ATTACH_CMDLINE)) {
++		ret = kdbus_meta_proc_collect_cmdline(mp);
++		if (ret < 0)
++			goto exit_unlock;
++		mp->collected |= KDBUS_ATTACH_CMDLINE;
++	}
++
++	if ((what & KDBUS_ATTACH_CGROUP) &&
++	    !(mp->collected & KDBUS_ATTACH_CGROUP)) {
++		ret = kdbus_meta_proc_collect_cgroup(mp);
++		if (ret < 0)
++			goto exit_unlock;
++		mp->collected |= KDBUS_ATTACH_CGROUP;
++	}
++
++	if ((what & KDBUS_ATTACH_SECLABEL) &&
++	    !(mp->collected & KDBUS_ATTACH_SECLABEL)) {
++		ret = kdbus_meta_proc_collect_seclabel(mp);
++		if (ret < 0)
++			goto exit_unlock;
++		mp->collected |= KDBUS_ATTACH_SECLABEL;
++	}
++
++	if ((what & KDBUS_ATTACH_AUDIT) &&
++	    !(mp->collected & KDBUS_ATTACH_AUDIT)) {
++		kdbus_meta_proc_collect_audit(mp);
++		mp->collected |= KDBUS_ATTACH_AUDIT;
++	}
++
++	ret = 0;
++
++exit_unlock:
++	mutex_unlock(&mp->lock);
++	return ret;
++}
++
++/**
++ * kdbus_meta_fake_new() - Create fake metadata object
++ *
++ * Return: Pointer to new object on success, ERR_PTR on failure.
++ */
++struct kdbus_meta_fake *kdbus_meta_fake_new(void)
++{
++	struct kdbus_meta_fake *mf;
++
++	mf = kzalloc(sizeof(*mf), GFP_KERNEL);
++	if (!mf)
++		return ERR_PTR(-ENOMEM);
++
++	return mf;
++}
++
++/**
++ * kdbus_meta_fake_free() - Free fake metadata object
++ * @mf:		Fake metadata object
++ *
++ * Return: NULL
++ */
++struct kdbus_meta_fake *kdbus_meta_fake_free(struct kdbus_meta_fake *mf)
++{
++	if (mf) {
++		put_pid(mf->ppid);
++		put_pid(mf->tgid);
++		put_pid(mf->pid);
++		kfree(mf->seclabel);
++		kfree(mf);
++	}
++
++	return NULL;
++}
++
++/**
++ * kdbus_meta_fake_collect() - Fill fake metadata from faked credentials
++ * @mf:		Fake metadata object
++ * @creds:	Creds to set, may be %NULL
++ * @pids:	PIDs to set, may be %NULL
++ * @seclabel:	Seclabel to set, may be %NULL
++ *
++ * This function takes information stored in @creds, @pids and @seclabel and
++ * resolves them to kernel-representations, if possible. This call uses the
++ * current task's namespaces to resolve the given information.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_meta_fake_collect(struct kdbus_meta_fake *mf,
++			    const struct kdbus_creds *creds,
++			    const struct kdbus_pids *pids,
++			    const char *seclabel)
++{
++	if (mf->valid)
++		return -EALREADY;
++
++	if (creds) {
++		struct user_namespace *ns = current_user_ns();
++
++		mf->uid		= make_kuid(ns, creds->uid);
++		mf->euid	= make_kuid(ns, creds->euid);
++		mf->suid	= make_kuid(ns, creds->suid);
++		mf->fsuid	= make_kuid(ns, creds->fsuid);
++
++		mf->gid		= make_kgid(ns, creds->gid);
++		mf->egid	= make_kgid(ns, creds->egid);
++		mf->sgid	= make_kgid(ns, creds->sgid);
++		mf->fsgid	= make_kgid(ns, creds->fsgid);
++
++		if ((creds->uid   != (uid_t)-1 && !uid_valid(mf->uid))   ||
++		    (creds->euid  != (uid_t)-1 && !uid_valid(mf->euid))  ||
++		    (creds->suid  != (uid_t)-1 && !uid_valid(mf->suid))  ||
++		    (creds->fsuid != (uid_t)-1 && !uid_valid(mf->fsuid)) ||
++		    (creds->gid   != (gid_t)-1 && !gid_valid(mf->gid))   ||
++		    (creds->egid  != (gid_t)-1 && !gid_valid(mf->egid))  ||
++		    (creds->sgid  != (gid_t)-1 && !gid_valid(mf->sgid))  ||
++		    (creds->fsgid != (gid_t)-1 && !gid_valid(mf->fsgid)))
++			return -EINVAL;
++
++		mf->valid |= KDBUS_ATTACH_CREDS;
++	}
++
++	if (pids) {
++		mf->pid = get_pid(find_vpid(pids->tid));
++		mf->tgid = get_pid(find_vpid(pids->pid));
++		mf->ppid = get_pid(find_vpid(pids->ppid));
++
++		if ((pids->tid != 0 && !mf->pid) ||
++		    (pids->pid != 0 && !mf->tgid) ||
++		    (pids->ppid != 0 && !mf->ppid)) {
++			put_pid(mf->pid);
++			put_pid(mf->tgid);
++			put_pid(mf->ppid);
++			mf->pid = NULL;
++			mf->tgid = NULL;
++			mf->ppid = NULL;
++			return -EINVAL;
++		}
++
++		mf->valid |= KDBUS_ATTACH_PIDS;
++	}
++
++	if (seclabel) {
++		mf->seclabel = kstrdup(seclabel, GFP_KERNEL);
++		if (!mf->seclabel)
++			return -ENOMEM;
++
++		mf->valid |= KDBUS_ATTACH_SECLABEL;
++	}
++
++	return 0;
++}
++
++/**
++ * kdbus_meta_conn_new() - Create connection metadata object
++ *
++ * Return: Pointer to new object on success, ERR_PTR on failure.
++ */
++struct kdbus_meta_conn *kdbus_meta_conn_new(void)
++{
++	struct kdbus_meta_conn *mc;
++
++	mc = kzalloc(sizeof(*mc), GFP_KERNEL);
++	if (!mc)
++		return ERR_PTR(-ENOMEM);
++
++	kref_init(&mc->kref);
++	mutex_init(&mc->lock);
++
++	return mc;
++}
++
++static void kdbus_meta_conn_free(struct kref *kref)
++{
++	struct kdbus_meta_conn *mc =
++		container_of(kref, struct kdbus_meta_conn, kref);
++
++	kfree(mc->conn_description);
++	kfree(mc->owned_names_items);
++	kfree(mc);
++}
++
++/**
++ * kdbus_meta_conn_ref() - Gain reference
++ * @mc:		Connection metadata object
++ */
++struct kdbus_meta_conn *kdbus_meta_conn_ref(struct kdbus_meta_conn *mc)
++{
++	if (mc)
++		kref_get(&mc->kref);
++	return mc;
++}
++
++/**
++ * kdbus_meta_conn_unref() - Drop reference
++ * @mc:		Connection metadata object
++ */
++struct kdbus_meta_conn *kdbus_meta_conn_unref(struct kdbus_meta_conn *mc)
++{
++	if (mc)
++		kref_put(&mc->kref, kdbus_meta_conn_free);
++	return NULL;
++}
++
++static void kdbus_meta_conn_collect_timestamp(struct kdbus_meta_conn *mc,
++					      u64 msg_seqnum)
++{
++	mc->ts.monotonic_ns = ktime_get_ns();
++	mc->ts.realtime_ns = ktime_get_real_ns();
++
++	if (msg_seqnum)
++		mc->ts.seqnum = msg_seqnum;
++
++	mc->valid |= KDBUS_ATTACH_TIMESTAMP;
++}
++
++static int kdbus_meta_conn_collect_names(struct kdbus_meta_conn *mc,
++					 struct kdbus_conn *conn)
++{
++	const struct kdbus_name_owner *owner;
++	struct kdbus_item *item;
++	size_t slen, size;
++
++	lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
++
++	size = 0;
++	/* open-code length calculation to avoid final padding */
++	list_for_each_entry(owner, &conn->names_list, conn_entry)
++		if (!(owner->flags & KDBUS_NAME_IN_QUEUE))
++			size = KDBUS_ALIGN8(size) + KDBUS_ITEM_HEADER_SIZE +
++				sizeof(struct kdbus_name) +
++				strlen(owner->name->name) + 1;
++
++	if (!size)
++		return 0;
++
++	/* make sure we include zeroed padding for convenience helpers */
++	item = kmalloc(KDBUS_ALIGN8(size), GFP_KERNEL);
++	if (!item)
++		return -ENOMEM;
++
++	mc->owned_names_items = item;
++	mc->owned_names_size = size;
++
++	list_for_each_entry(owner, &conn->names_list, conn_entry) {
++		if (owner->flags & KDBUS_NAME_IN_QUEUE)
++			continue;
++
++		slen = strlen(owner->name->name) + 1;
++		kdbus_item_set(item, KDBUS_ITEM_OWNED_NAME, NULL,
++			       sizeof(struct kdbus_name) + slen);
++		item->name.flags = owner->flags;
++		memcpy(item->name.name, owner->name->name, slen);
++		item = KDBUS_ITEM_NEXT(item);
++	}
++
++	/* sanity check: the buffer should be completely written now */
++	WARN_ON((u8 *)item !=
++			(u8 *)mc->owned_names_items + KDBUS_ALIGN8(size));
++
++	mc->valid |= KDBUS_ATTACH_NAMES;
++	return 0;
++}
++
++static int kdbus_meta_conn_collect_description(struct kdbus_meta_conn *mc,
++					       struct kdbus_conn *conn)
++{
++	if (!conn->description)
++		return 0;
++
++	mc->conn_description = kstrdup(conn->description, GFP_KERNEL);
++	if (!mc->conn_description)
++		return -ENOMEM;
++
++	mc->valid |= KDBUS_ATTACH_CONN_DESCRIPTION;
++	return 0;
++}
++
++/**
++ * kdbus_meta_conn_collect() - Collect connection metadata
++ * @mc:		Connection metadata object
++ * @conn:	Connection to collect data from
++ * @msg_seqnum:	Sequence number of the message to send
++ * @what:	Attach flags to collect
++ *
++ * This collects connection metadata from @msg_seqnum and @conn and saves it
++ * in @mc.
++ *
++ * If KDBUS_ATTACH_NAMES is set in @what and @conn is non-NULL, the caller must
++ * hold the name-registry read-lock of conn->ep->bus->name_registry.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_meta_conn_collect(struct kdbus_meta_conn *mc,
++			    struct kdbus_conn *conn,
++			    u64 msg_seqnum, u64 what)
++{
++	int ret;
++
++	if (!mc || !(what & (KDBUS_ATTACH_TIMESTAMP |
++			     KDBUS_ATTACH_NAMES |
++			     KDBUS_ATTACH_CONN_DESCRIPTION)))
++		return 0;
++
++	mutex_lock(&mc->lock);
++
++	if (msg_seqnum && (what & KDBUS_ATTACH_TIMESTAMP) &&
++	    !(mc->collected & KDBUS_ATTACH_TIMESTAMP)) {
++		kdbus_meta_conn_collect_timestamp(mc, msg_seqnum);
++		mc->collected |= KDBUS_ATTACH_TIMESTAMP;
++	}
++
++	if (conn && (what & KDBUS_ATTACH_NAMES) &&
++	    !(mc->collected & KDBUS_ATTACH_NAMES)) {
++		ret = kdbus_meta_conn_collect_names(mc, conn);
++		if (ret < 0)
++			goto exit_unlock;
++		mc->collected |= KDBUS_ATTACH_NAMES;
++	}
++
++	if (conn && (what & KDBUS_ATTACH_CONN_DESCRIPTION) &&
++	    !(mc->collected & KDBUS_ATTACH_CONN_DESCRIPTION)) {
++		ret = kdbus_meta_conn_collect_description(mc, conn);
++		if (ret < 0)
++			goto exit_unlock;
++		mc->collected |= KDBUS_ATTACH_CONN_DESCRIPTION;
++	}
++
++	ret = 0;
++
++exit_unlock:
++	mutex_unlock(&mc->lock);
++	return ret;
++}
++
++static void kdbus_meta_export_caps(struct kdbus_meta_caps *out,
++				   const struct kdbus_meta_proc *mp,
++				   struct user_namespace *user_ns)
++{
++	struct user_namespace *iter;
++	const struct cred *cred = mp->cred;
++	bool parent = false, owner = false;
++	int i;
++
++	/*
++	 * This translates the effective capabilities of 'cred' into the given
++	 * user-namespace. If the given user-namespace is a child-namespace of
++	 * the user-namespace of 'cred', the mask can be copied verbatim. If
++	 * not, the mask is cleared.
++	 * There's one exception: If 'cred' is the owner of any user-namespace
++	 * in the path between the given user-namespace and the user-namespace
++	 * of 'cred', then it has all effective capabilities set. This means,
++	 * the user who created a user-namespace always has all effective
++	 * capabilities in any child namespaces. Note that this is based on the
++	 * uid of the namespace creator, not the task hierarchy.
++	 */
++	for (iter = user_ns; iter; iter = iter->parent) {
++		if (iter == cred->user_ns) {
++			parent = true;
++			break;
++		}
++
++		if (iter == &init_user_ns)
++			break;
++
++		if ((iter->parent == cred->user_ns) &&
++		    uid_eq(iter->owner, cred->euid)) {
++			owner = true;
++			break;
++		}
++	}
++
++	out->last_cap = CAP_LAST_CAP;
++
++	CAP_FOR_EACH_U32(i) {
++		if (parent) {
++			out->set[0].caps[i] = cred->cap_inheritable.cap[i];
++			out->set[1].caps[i] = cred->cap_permitted.cap[i];
++			out->set[2].caps[i] = cred->cap_effective.cap[i];
++			out->set[3].caps[i] = cred->cap_bset.cap[i];
++		} else if (owner) {
++			out->set[0].caps[i] = 0U;
++			out->set[1].caps[i] = ~0U;
++			out->set[2].caps[i] = ~0U;
++			out->set[3].caps[i] = ~0U;
++		} else {
++			out->set[0].caps[i] = 0U;
++			out->set[1].caps[i] = 0U;
++			out->set[2].caps[i] = 0U;
++			out->set[3].caps[i] = 0U;
++		}
++	}
++
++	/* clear unused bits */
++	for (i = 0; i < 4; i++)
++		out->set[i].caps[CAP_TO_INDEX(CAP_LAST_CAP)] &=
++					CAP_LAST_U32_VALID_MASK;
++}
++
++/* This is equivalent to from_kuid_munged(), but maps INVALID_UID to itself */
++static uid_t kdbus_from_kuid_keep(struct user_namespace *ns, kuid_t uid)
++{
++	return uid_valid(uid) ? from_kuid_munged(ns, uid) : ((uid_t)-1);
++}
++
++/* This is equivalent to from_kgid_munged(), but maps INVALID_GID to itself */
++static gid_t kdbus_from_kgid_keep(struct user_namespace *ns, kgid_t gid)
++{
++	return gid_valid(gid) ? from_kgid_munged(ns, gid) : ((gid_t)-1);
++}
++
++struct kdbus_meta_staging {
++	const struct kdbus_meta_proc *mp;
++	const struct kdbus_meta_fake *mf;
++	const struct kdbus_meta_conn *mc;
++	const struct kdbus_conn *conn;
++	u64 mask;
++
++	void *exe;
++	const char *exe_path;
++};
++
++static size_t kdbus_meta_measure(struct kdbus_meta_staging *staging)
++{
++	const struct kdbus_meta_proc *mp = staging->mp;
++	const struct kdbus_meta_fake *mf = staging->mf;
++	const struct kdbus_meta_conn *mc = staging->mc;
++	const u64 mask = staging->mask;
++	size_t size = 0;
++
++	/* process metadata */
++
++	if (mf && (mask & KDBUS_ATTACH_CREDS))
++		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_creds));
++	else if (mp && (mask & KDBUS_ATTACH_CREDS))
++		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_creds));
++
++	if (mf && (mask & KDBUS_ATTACH_PIDS))
++		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_pids));
++	else if (mp && (mask & KDBUS_ATTACH_PIDS))
++		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_pids));
++
++	if (mp && (mask & KDBUS_ATTACH_AUXGROUPS))
++		size += KDBUS_ITEM_SIZE(mp->cred->group_info->ngroups *
++					sizeof(u64));
++
++	if (mp && (mask & KDBUS_ATTACH_TID_COMM))
++		size += KDBUS_ITEM_SIZE(strlen(mp->tid_comm) + 1);
++
++	if (mp && (mask & KDBUS_ATTACH_PID_COMM))
++		size += KDBUS_ITEM_SIZE(strlen(mp->pid_comm) + 1);
++
++	if (staging->exe_path && (mask & KDBUS_ATTACH_EXE))
++		size += KDBUS_ITEM_SIZE(strlen(staging->exe_path) + 1);
++
++	if (mp && (mask & KDBUS_ATTACH_CMDLINE))
++		size += KDBUS_ITEM_SIZE(strlen(mp->cmdline) + 1);
++
++	if (mp && (mask & KDBUS_ATTACH_CGROUP))
++		size += KDBUS_ITEM_SIZE(strlen(mp->cgroup) + 1);
++
++	if (mp && (mask & KDBUS_ATTACH_CAPS))
++		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_meta_caps));
++
++	if (mf && (mask & KDBUS_ATTACH_SECLABEL))
++		size += KDBUS_ITEM_SIZE(strlen(mf->seclabel) + 1);
++	else if (mp && (mask & KDBUS_ATTACH_SECLABEL))
++		size += KDBUS_ITEM_SIZE(strlen(mp->seclabel) + 1);
++
++	if (mp && (mask & KDBUS_ATTACH_AUDIT))
++		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_audit));
++
++	/* connection metadata */
++
++	if (mc && (mask & KDBUS_ATTACH_NAMES))
++		size += KDBUS_ALIGN8(mc->owned_names_size);
++
++	if (mc && (mask & KDBUS_ATTACH_CONN_DESCRIPTION))
++		size += KDBUS_ITEM_SIZE(strlen(mc->conn_description) + 1);
++
++	if (mc && (mask & KDBUS_ATTACH_TIMESTAMP))
++		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_timestamp));
++
++	return size;
++}
++
++static struct kdbus_item *kdbus_write_head(struct kdbus_item **iter,
++					   u64 type, u64 size)
++{
++	struct kdbus_item *item = *iter;
++	size_t padding;
++
++	item->type = type;
++	item->size = KDBUS_ITEM_HEADER_SIZE + size;
++
++	/* clear padding */
++	padding = KDBUS_ALIGN8(item->size) - item->size;
++	if (padding)
++		memset(item->data + size, 0, padding);
++
++	*iter = KDBUS_ITEM_NEXT(item);
++	return item;
++}
++
++static struct kdbus_item *kdbus_write_full(struct kdbus_item **iter,
++					   u64 type, u64 size, const void *data)
++{
++	struct kdbus_item *item;
++
++	item = kdbus_write_head(iter, type, size);
++	memcpy(item->data, data, size);
++	return item;
++}
++
++static size_t kdbus_meta_write(struct kdbus_meta_staging *staging, void *mem,
++			       size_t size)
++{
++	struct user_namespace *user_ns = staging->conn->cred->user_ns;
++	struct pid_namespace *pid_ns = ns_of_pid(staging->conn->pid);
++	struct kdbus_item *item = NULL, *items = mem;
++	u8 *end, *owned_names_end = NULL;
++
++	/* process metadata */
++
++	if (staging->mf && (staging->mask & KDBUS_ATTACH_CREDS)) {
++		const struct kdbus_meta_fake *mf = staging->mf;
++
++		item = kdbus_write_head(&items, KDBUS_ITEM_CREDS,
++					sizeof(struct kdbus_creds));
++		item->creds = (struct kdbus_creds){
++			.uid	= kdbus_from_kuid_keep(user_ns, mf->uid),
++			.euid	= kdbus_from_kuid_keep(user_ns, mf->euid),
++			.suid	= kdbus_from_kuid_keep(user_ns, mf->suid),
++			.fsuid	= kdbus_from_kuid_keep(user_ns, mf->fsuid),
++			.gid	= kdbus_from_kgid_keep(user_ns, mf->gid),
++			.egid	= kdbus_from_kgid_keep(user_ns, mf->egid),
++			.sgid	= kdbus_from_kgid_keep(user_ns, mf->sgid),
++			.fsgid	= kdbus_from_kgid_keep(user_ns, mf->fsgid),
++		};
++	} else if (staging->mp && (staging->mask & KDBUS_ATTACH_CREDS)) {
++		const struct cred *c = staging->mp->cred;
++
++		item = kdbus_write_head(&items, KDBUS_ITEM_CREDS,
++					sizeof(struct kdbus_creds));
++		item->creds = (struct kdbus_creds){
++			.uid	= kdbus_from_kuid_keep(user_ns, c->uid),
++			.euid	= kdbus_from_kuid_keep(user_ns, c->euid),
++			.suid	= kdbus_from_kuid_keep(user_ns, c->suid),
++			.fsuid	= kdbus_from_kuid_keep(user_ns, c->fsuid),
++			.gid	= kdbus_from_kgid_keep(user_ns, c->gid),
++			.egid	= kdbus_from_kgid_keep(user_ns, c->egid),
++			.sgid	= kdbus_from_kgid_keep(user_ns, c->sgid),
++			.fsgid	= kdbus_from_kgid_keep(user_ns, c->fsgid),
++		};
++	}
++
++	if (staging->mf && (staging->mask & KDBUS_ATTACH_PIDS)) {
++		item = kdbus_write_head(&items, KDBUS_ITEM_PIDS,
++					sizeof(struct kdbus_pids));
++		item->pids = (struct kdbus_pids){
++			.pid = pid_nr_ns(staging->mf->tgid, pid_ns),
++			.tid = pid_nr_ns(staging->mf->pid, pid_ns),
++			.ppid = pid_nr_ns(staging->mf->ppid, pid_ns),
++		};
++	} else if (staging->mp && (staging->mask & KDBUS_ATTACH_PIDS)) {
++		item = kdbus_write_head(&items, KDBUS_ITEM_PIDS,
++					sizeof(struct kdbus_pids));
++		item->pids = (struct kdbus_pids){
++			.pid = pid_nr_ns(staging->mp->tgid, pid_ns),
++			.tid = pid_nr_ns(staging->mp->pid, pid_ns),
++			.ppid = pid_nr_ns(staging->mp->ppid, pid_ns),
++		};
++	}
++
++	if (staging->mp && (staging->mask & KDBUS_ATTACH_AUXGROUPS)) {
++		const struct group_info *info = staging->mp->cred->group_info;
++		size_t i;
++
++		item = kdbus_write_head(&items, KDBUS_ITEM_AUXGROUPS,
++					info->ngroups * sizeof(u64));
++		for (i = 0; i < info->ngroups; ++i)
++			item->data64[i] = from_kgid_munged(user_ns,
++							   GROUP_AT(info, i));
++	}
++
++	if (staging->mp && (staging->mask & KDBUS_ATTACH_TID_COMM))
++		item = kdbus_write_full(&items, KDBUS_ITEM_TID_COMM,
++					strlen(staging->mp->tid_comm) + 1,
++					staging->mp->tid_comm);
++
++	if (staging->mp && (staging->mask & KDBUS_ATTACH_PID_COMM))
++		item = kdbus_write_full(&items, KDBUS_ITEM_PID_COMM,
++					strlen(staging->mp->pid_comm) + 1,
++					staging->mp->pid_comm);
++
++	if (staging->exe_path && (staging->mask & KDBUS_ATTACH_EXE))
++		item = kdbus_write_full(&items, KDBUS_ITEM_EXE,
++					strlen(staging->exe_path) + 1,
++					staging->exe_path);
++
++	if (staging->mp && (staging->mask & KDBUS_ATTACH_CMDLINE))
++		item = kdbus_write_full(&items, KDBUS_ITEM_CMDLINE,
++					strlen(staging->mp->cmdline) + 1,
++					staging->mp->cmdline);
++
++	if (staging->mp && (staging->mask & KDBUS_ATTACH_CGROUP))
++		item = kdbus_write_full(&items, KDBUS_ITEM_CGROUP,
++					strlen(staging->mp->cgroup) + 1,
++					staging->mp->cgroup);
++
++	if (staging->mp && (staging->mask & KDBUS_ATTACH_CAPS)) {
++		item = kdbus_write_head(&items, KDBUS_ITEM_CAPS,
++					sizeof(struct kdbus_meta_caps));
++		kdbus_meta_export_caps((void*)&item->caps, staging->mp,
++				       user_ns);
++	}
++
++	if (staging->mf && (staging->mask & KDBUS_ATTACH_SECLABEL))
++		item = kdbus_write_full(&items, KDBUS_ITEM_SECLABEL,
++					strlen(staging->mf->seclabel) + 1,
++					staging->mf->seclabel);
++	else if (staging->mp && (staging->mask & KDBUS_ATTACH_SECLABEL))
++		item = kdbus_write_full(&items, KDBUS_ITEM_SECLABEL,
++					strlen(staging->mp->seclabel) + 1,
++					staging->mp->seclabel);
++
++	if (staging->mp && (staging->mask & KDBUS_ATTACH_AUDIT)) {
++		item = kdbus_write_head(&items, KDBUS_ITEM_AUDIT,
++					sizeof(struct kdbus_audit));
++		item->audit = (struct kdbus_audit){
++			.loginuid = from_kuid(user_ns,
++					      staging->mp->audit_loginuid),
++			.sessionid = staging->mp->audit_sessionid,
++		};
++	}
++
++	/* connection metadata */
++
++	if (staging->mc && (staging->mask & KDBUS_ATTACH_NAMES)) {
++		memcpy(items, staging->mc->owned_names_items,
++		       KDBUS_ALIGN8(staging->mc->owned_names_size));
++		owned_names_end = (u8 *)items + staging->mc->owned_names_size;
++		items = (void *)KDBUS_ALIGN8((unsigned long)owned_names_end);
++	}
++
++	if (staging->mc && (staging->mask & KDBUS_ATTACH_CONN_DESCRIPTION))
++		item = kdbus_write_full(&items, KDBUS_ITEM_CONN_DESCRIPTION,
++				strlen(staging->mc->conn_description) + 1,
++				staging->mc->conn_description);
++
++	if (staging->mc && (staging->mask & KDBUS_ATTACH_TIMESTAMP))
++		item = kdbus_write_full(&items, KDBUS_ITEM_TIMESTAMP,
++					sizeof(staging->mc->ts),
++					&staging->mc->ts);
++
++	/*
++	 * Return real size (minus trailing padding). In case of 'owned_names'
++	 * we cannot deduce it from item->size, so treat it specially.
++	 */
++
++	if (items == (void *)KDBUS_ALIGN8((unsigned long)owned_names_end))
++		end = owned_names_end;
++	else if (item)
++		end = (u8 *)item + item->size;
++	else
++		end = mem;
++
++	WARN_ON((u8 *)items - (u8 *)mem != size);
++	WARN_ON((void *)KDBUS_ALIGN8((unsigned long)end) != (void *)items);
++
++	return end - (u8 *)mem;
++}
++
++int kdbus_meta_emit(struct kdbus_meta_proc *mp,
++		    struct kdbus_meta_fake *mf,
++		    struct kdbus_meta_conn *mc,
++		    struct kdbus_conn *conn,
++		    u64 mask,
++		    struct kdbus_item **out_items,
++		    size_t *out_size)
++{
++	struct kdbus_meta_staging staging = {};
++	struct kdbus_item *items = NULL;
++	size_t size = 0;
++	int ret;
++
++	if (WARN_ON(mf && mp))
++		mp = NULL;
++
++	staging.mp = mp;
++	staging.mf = mf;
++	staging.mc = mc;
++	staging.conn = conn;
++
++	/* get mask of valid items */
++	if (mf)
++		staging.mask |= mf->valid;
++	if (mp) {
++		mutex_lock(&mp->lock);
++		staging.mask |= mp->valid;
++		mutex_unlock(&mp->lock);
++	}
++	if (mc) {
++		mutex_lock(&mc->lock);
++		staging.mask |= mc->valid;
++		mutex_unlock(&mc->lock);
++	}
++
++	staging.mask &= mask;
++
++	if (!staging.mask) { /* bail out if nothing to do */
++		ret = 0;
++		goto exit;
++	}
++
++	/* EXE is special as it needs a temporary page to assemble */
++	if (mp && (staging.mask & KDBUS_ATTACH_EXE)) {
++		struct path p;
++
++		/*
++		 * XXX: We need access to __d_path() so we can write the path
++		 * relative to conn->root_path. Once upstream, we need
++		 * EXPORT_SYMBOL(__d_path) or an equivalent of d_path() that
++		 * takes the root path directly. Until then, we drop this item
++		 * if the root-paths differ.
++		 */
++
++		get_fs_root(current->fs, &p);
++		if (path_equal(&p, &conn->root_path)) {
++			staging.exe = (void *)__get_free_page(GFP_TEMPORARY);
++			if (!staging.exe) {
++				path_put(&p);
++				ret = -ENOMEM;
++				goto exit;
++			}
++
++			staging.exe_path = d_path(&mp->exe_path, staging.exe,
++						  PAGE_SIZE);
++			if (IS_ERR(staging.exe_path)) {
++				path_put(&p);
++				ret = PTR_ERR(staging.exe_path);
++				goto exit;
++			}
++		}
++		path_put(&p);
++	}
++
++	size = kdbus_meta_measure(&staging);
++	if (!size) { /* bail out if nothing to do */
++		ret = 0;
++		goto exit;
++	}
++
++	items = kmalloc(size, GFP_KERNEL);
++	if (!items) {
++		ret = -ENOMEM;
++		goto exit;
++	}
++
++	size = kdbus_meta_write(&staging, items, size);
++	if (!size) {
++		kfree(items);
++		items = NULL;
++	}
++
++	ret = 0;
++
++exit:
++	if (staging.exe)
++		free_page((unsigned long)staging.exe);
++	if (ret >= 0) {
++		*out_items = items;
++		*out_size = size;
++	}
++	return ret;
++}
++
++enum {
++	KDBUS_META_PROC_NONE,
++	KDBUS_META_PROC_NORMAL,
++};
++
++/**
++ * kdbus_proc_permission() - check /proc permissions on target pid
++ * @pid_ns:		namespace we operate in
++ * @cred:		credentials of requestor
++ * @target:		target process
++ *
++ * This checks whether a process with credentials @cred can access information
++ * of @target in the namespace @pid_ns. This tries to follow /proc permissions,
++ * but is slightly more restrictive.
++ *
++ * Return: The /proc access level (KDBUS_META_PROC_*) is returned.
++ */
++static unsigned int kdbus_proc_permission(const struct pid_namespace *pid_ns,
++					  const struct cred *cred,
++					  struct pid *target)
++{
++	if (pid_ns->hide_pid < 1)
++		return KDBUS_META_PROC_NORMAL;
++
++	/* XXX: we need groups_search() exported for aux-groups */
++	if (gid_eq(cred->egid, pid_ns->pid_gid))
++		return KDBUS_META_PROC_NORMAL;
++
++	/*
++	 * XXX: If ptrace_may_access(PTRACE_MODE_READ) is granted, you can
++	 * override hide_pid. However, ptrace_may_access() only supports
++	 * checking 'current', hence, we cannot use this here. But we
++	 * simply decide to not support this override, so no need to worry.
++	 */
++
++	return KDBUS_META_PROC_NONE;
++}
++
++/**
++ * kdbus_meta_proc_mask() - calculate which metadata would be visible to
++ *			    a connection via /proc
++ * @prv_pid:		pid of metadata provider
++ * @req_pid:		pid of metadata requestor
++ * @req_cred:		credentials of metadata requestor
++ * @wanted:		metadata that is requested
++ *
++ * This checks which metadata items of @prv_pid can be read via /proc by the
++ * requestor @req_pid.
++ *
++ * Return: Set of metadata flags the requestor can see (limited by @wanted).
++ */
++static u64 kdbus_meta_proc_mask(struct pid *prv_pid,
++				struct pid *req_pid,
++				const struct cred *req_cred,
++				u64 wanted)
++{
++	struct pid_namespace *prv_ns, *req_ns;
++	unsigned int proc;
++
++	prv_ns = ns_of_pid(prv_pid);
++	req_ns = ns_of_pid(req_pid);
++
++	/*
++	 * If the sender is not visible in the receiver namespace, then the
++	 * receiver cannot access the sender via its own procfs. Hence, we do
++	 * not attach any additional metadata.
++	 */
++	if (!pid_nr_ns(prv_pid, req_ns))
++		return 0;
++
++	/*
++	 * If the pid-namespace of the receiver has hide_pid set, it cannot see
++	 * any process but its own. We shortcut this /proc permission check if
++	 * provider and requestor are the same. If not, we perform rather
++	 * expensive /proc permission checks.
++	 */
++	if (prv_pid == req_pid)
++		proc = KDBUS_META_PROC_NORMAL;
++	else
++		proc = kdbus_proc_permission(req_ns, req_cred, prv_pid);
++
++	/* you need /proc access to read standard process attributes */
++	if (proc < KDBUS_META_PROC_NORMAL)
++		wanted &= ~(KDBUS_ATTACH_TID_COMM |
++			    KDBUS_ATTACH_PID_COMM |
++			    KDBUS_ATTACH_SECLABEL |
++			    KDBUS_ATTACH_CMDLINE |
++			    KDBUS_ATTACH_CGROUP |
++			    KDBUS_ATTACH_AUDIT |
++			    KDBUS_ATTACH_CAPS |
++			    KDBUS_ATTACH_EXE);
++
++	/* clear all non-/proc flags */
++	return wanted & (KDBUS_ATTACH_TID_COMM |
++			 KDBUS_ATTACH_PID_COMM |
++			 KDBUS_ATTACH_SECLABEL |
++			 KDBUS_ATTACH_CMDLINE |
++			 KDBUS_ATTACH_CGROUP |
++			 KDBUS_ATTACH_AUDIT |
++			 KDBUS_ATTACH_CAPS |
++			 KDBUS_ATTACH_EXE);
++}
++
++/**
++ * kdbus_meta_get_mask() - calculate attach flags mask for metadata request
++ * @prv_pid:		pid of metadata provider
++ * @prv_mask:		mask of metadata the provider grants unchecked
++ * @req_pid:		pid of metadata requestor
++ * @req_cred:		credentials of metadata requestor
++ * @req_mask:		mask of metadata that is requested
++ *
++ * This calculates the metadata items that the requestor @req_pid can access
++ * from the metadata provider @prv_pid. This permission check consists of
++ * several different parts:
++ *  - Providers can grant metadata items unchecked. Regardless of their type,
++ *    they're always granted to the requestor. This mask is passed as @prv_mask.
++ *  - Basic items (credentials and connection metadata) are granted implicitly
++ *    to everyone. They're publicly available to any bus-user that can see the
++ *    provider.
++ *  - Process credentials that are not granted implicitly follow the same
++ *    permission checks as /proc. This means, we always assume a requestor
++ *    process has access to their *own* /proc mount, if they have access to
++ *    kdbusfs.
++ *
++ * Return: Mask of metadata that is granted.
++ */
++static u64 kdbus_meta_get_mask(struct pid *prv_pid, u64 prv_mask,
++			       struct pid *req_pid,
++			       const struct cred *req_cred, u64 req_mask)
++{
++	u64 missing, impl_mask, proc_mask = 0;
++
++	/*
++	 * Connection metadata and basic unix process credentials are
++	 * transmitted implicitly, and cannot be suppressed. Both are required
++	 * to perform user-space policies on the receiver-side. Furthermore,
++	 * connection metadata is public state, anyway, and unix credentials
++	 * are needed for UDS-compatibility. We extend them slightly by
++	 * auxiliary groups and additional uids/gids/pids.
++	 */
++	impl_mask = /* connection metadata */
++		    KDBUS_ATTACH_CONN_DESCRIPTION |
++		    KDBUS_ATTACH_TIMESTAMP |
++		    KDBUS_ATTACH_NAMES |
++		    /* credentials and pids */
++		    KDBUS_ATTACH_AUXGROUPS |
++		    KDBUS_ATTACH_CREDS |
++		    KDBUS_ATTACH_PIDS;
++
++	/*
++	 * Calculate the set of metadata that is neither granted implicitly
++	 * nor by the sender, but still requested by the receiver. If any are
++	 * left, perform rather expensive /proc access checks for them.
++	 */
++	missing = req_mask & ~((prv_mask | impl_mask) & req_mask);
++	if (missing)
++		proc_mask = kdbus_meta_proc_mask(prv_pid, req_pid, req_cred,
++						 missing);
++
++	return (prv_mask | impl_mask | proc_mask) & req_mask;
++}
++
++/**
++ */
++u64 kdbus_meta_info_mask(const struct kdbus_conn *conn, u64 mask)
++{
++	return kdbus_meta_get_mask(conn->pid,
++				   atomic64_read(&conn->attach_flags_send),
++				   task_pid(current),
++				   current_cred(),
++				   mask);
++}
++
++/**
++ */
++u64 kdbus_meta_msg_mask(const struct kdbus_conn *snd,
++			const struct kdbus_conn *rcv)
++{
++	return kdbus_meta_get_mask(task_pid(current),
++				   atomic64_read(&snd->attach_flags_send),
++				   rcv->pid,
++				   rcv->cred,
++				   atomic64_read(&rcv->attach_flags_recv));
++}
+diff --git a/ipc/kdbus/metadata.h b/ipc/kdbus/metadata.h
+new file mode 100644
+index 0000000..dba7cc7
+--- /dev/null
++++ b/ipc/kdbus/metadata.h
+@@ -0,0 +1,86 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_METADATA_H
++#define __KDBUS_METADATA_H
++
++#include <linux/kernel.h>
++
++struct kdbus_conn;
++struct kdbus_pool_slice;
++
++struct kdbus_meta_proc;
++struct kdbus_meta_conn;
++
++/**
++ * struct kdbus_meta_fake - Fake metadata
++ * @valid:		Bitmask of collected and valid items
++ * @uid:		UID of process
++ * @euid:		EUID of process
++ * @suid:		SUID of process
++ * @fsuid:		FSUID of process
++ * @gid:		GID of process
++ * @egid:		EGID of process
++ * @sgid:		SGID of process
++ * @fsgid:		FSGID of process
++ * @pid:		PID of process
++ * @tgid:		TGID of process
++ * @ppid:		PPID of process
++ * @seclabel:		Seclabel
++ */
++struct kdbus_meta_fake {
++	u64 valid;
++
++	/* KDBUS_ITEM_CREDS */
++	kuid_t uid, euid, suid, fsuid;
++	kgid_t gid, egid, sgid, fsgid;
++
++	/* KDBUS_ITEM_PIDS */
++	struct pid *pid, *tgid, *ppid;
++
++	/* KDBUS_ITEM_SECLABEL */
++	char *seclabel;
++};
++
++struct kdbus_meta_proc *kdbus_meta_proc_new(void);
++struct kdbus_meta_proc *kdbus_meta_proc_ref(struct kdbus_meta_proc *mp);
++struct kdbus_meta_proc *kdbus_meta_proc_unref(struct kdbus_meta_proc *mp);
++int kdbus_meta_proc_collect(struct kdbus_meta_proc *mp, u64 what);
++
++struct kdbus_meta_fake *kdbus_meta_fake_new(void);
++struct kdbus_meta_fake *kdbus_meta_fake_free(struct kdbus_meta_fake *mf);
++int kdbus_meta_fake_collect(struct kdbus_meta_fake *mf,
++			    const struct kdbus_creds *creds,
++			    const struct kdbus_pids *pids,
++			    const char *seclabel);
++
++struct kdbus_meta_conn *kdbus_meta_conn_new(void);
++struct kdbus_meta_conn *kdbus_meta_conn_ref(struct kdbus_meta_conn *mc);
++struct kdbus_meta_conn *kdbus_meta_conn_unref(struct kdbus_meta_conn *mc);
++int kdbus_meta_conn_collect(struct kdbus_meta_conn *mc,
++			    struct kdbus_conn *conn,
++			    u64 msg_seqnum, u64 what);
++
++int kdbus_meta_emit(struct kdbus_meta_proc *mp,
++		    struct kdbus_meta_fake *mf,
++		    struct kdbus_meta_conn *mc,
++		    struct kdbus_conn *conn,
++		    u64 mask,
++		    struct kdbus_item **out_items,
++		    size_t *out_size);
++u64 kdbus_meta_info_mask(const struct kdbus_conn *conn, u64 mask);
++u64 kdbus_meta_msg_mask(const struct kdbus_conn *snd,
++			const struct kdbus_conn *rcv);
++
++#endif
+diff --git a/ipc/kdbus/names.c b/ipc/kdbus/names.c
+new file mode 100644
+index 0000000..bf44ca3
+--- /dev/null
++++ b/ipc/kdbus/names.c
+@@ -0,0 +1,854 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/ctype.h>
++#include <linux/fs.h>
++#include <linux/hash.h>
++#include <linux/idr.h>
++#include <linux/init.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/rwsem.h>
++#include <linux/sched.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <linux/uio.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "endpoint.h"
++#include "handle.h"
++#include "item.h"
++#include "names.h"
++#include "notify.h"
++#include "policy.h"
++
++#define KDBUS_NAME_SAVED_MASK (KDBUS_NAME_ALLOW_REPLACEMENT |	\
++			       KDBUS_NAME_QUEUE)
++
++static bool kdbus_name_owner_is_used(struct kdbus_name_owner *owner)
++{
++	return !list_empty(&owner->name_entry) ||
++	       owner == owner->name->activator;
++}
++
++static struct kdbus_name_owner *
++kdbus_name_owner_new(struct kdbus_conn *conn, struct kdbus_name_entry *name,
++		     u64 flags)
++{
++	struct kdbus_name_owner *owner;
++
++	kdbus_conn_assert_active(conn);
++
++	if (conn->name_count >= KDBUS_CONN_MAX_NAMES)
++		return ERR_PTR(-E2BIG);
++
++	owner = kmalloc(sizeof(*owner), GFP_KERNEL);
++	if (!owner)
++		return ERR_PTR(-ENOMEM);
++
++	owner->flags = flags & KDBUS_NAME_SAVED_MASK;
++	owner->conn = conn;
++	owner->name = name;
++	list_add_tail(&owner->conn_entry, &conn->names_list);
++	INIT_LIST_HEAD(&owner->name_entry);
++
++	++conn->name_count;
++	return owner;
++}
++
++static void kdbus_name_owner_free(struct kdbus_name_owner *owner)
++{
++	if (!owner)
++		return;
++
++	WARN_ON(kdbus_name_owner_is_used(owner));
++	--owner->conn->name_count;
++	list_del(&owner->conn_entry);
++	kfree(owner);
++}
++
++static struct kdbus_name_owner *
++kdbus_name_owner_find(struct kdbus_name_entry *name, struct kdbus_conn *conn)
++{
++	struct kdbus_name_owner *owner;
++
++	/*
++	 * Use conn->names_list over name->queue to make sure boundaries of
++	 * this linear search are controlled by the connection itself.
++	 * Furthermore, this will find normal owners as well as activators
++	 * without any additional code.
++	 */
++	list_for_each_entry(owner, &conn->names_list, conn_entry)
++		if (owner->name == name)
++			return owner;
++
++	return NULL;
++}
++
++static bool kdbus_name_entry_is_used(struct kdbus_name_entry *name)
++{
++	return !list_empty(&name->queue) || name->activator;
++}
++
++static struct kdbus_name_owner *
++kdbus_name_entry_first(struct kdbus_name_entry *name)
++{
++	return list_first_entry_or_null(&name->queue, struct kdbus_name_owner,
++					name_entry);
++}
++
++static struct kdbus_name_entry *
++kdbus_name_entry_new(struct kdbus_name_registry *r, u32 hash,
++		     const char *name_str)
++{
++	struct kdbus_name_entry *name;
++	size_t namelen;
++
++	lockdep_assert_held(&r->rwlock);
++
++	namelen = strlen(name_str);
++
++	name = kmalloc(sizeof(*name) + namelen + 1, GFP_KERNEL);
++	if (!name)
++		return ERR_PTR(-ENOMEM);
++
++	name->name_id = ++r->name_seq_last;
++	name->activator = NULL;
++	INIT_LIST_HEAD(&name->queue);
++	hash_add(r->entries_hash, &name->hentry, hash);
++	memcpy(name->name, name_str, namelen + 1);
++
++	return name;
++}
++
++static void kdbus_name_entry_free(struct kdbus_name_entry *name)
++{
++	if (!name)
++		return;
++
++	WARN_ON(kdbus_name_entry_is_used(name));
++	hash_del(&name->hentry);
++	kfree(name);
++}
++
++static struct kdbus_name_entry *
++kdbus_name_entry_find(struct kdbus_name_registry *r, u32 hash,
++		      const char *name_str)
++{
++	struct kdbus_name_entry *name;
++
++	lockdep_assert_held(&r->rwlock);
++
++	hash_for_each_possible(r->entries_hash, name, hentry, hash)
++		if (!strcmp(name->name, name_str))
++			return name;
++
++	return NULL;
++}
++
++/**
++ * kdbus_name_registry_new() - create a new name registry
++ *
++ * Return: a new kdbus_name_registry on success, ERR_PTR on failure.
++ */
++struct kdbus_name_registry *kdbus_name_registry_new(void)
++{
++	struct kdbus_name_registry *r;
++
++	r = kmalloc(sizeof(*r), GFP_KERNEL);
++	if (!r)
++		return ERR_PTR(-ENOMEM);
++
++	hash_init(r->entries_hash);
++	init_rwsem(&r->rwlock);
++	r->name_seq_last = 0;
++
++	return r;
++}
++
++/**
++ * kdbus_name_registry_free() - free name registry
++ * @r:		name registry to free, or NULL
++ *
++ * Free a name registry and cleanup all internal objects. This is a no-op if
++ * you pass NULL as registry.
++ */
++void kdbus_name_registry_free(struct kdbus_name_registry *r)
++{
++	if (!r)
++		return;
++
++	WARN_ON(!hash_empty(r->entries_hash));
++	kfree(r);
++}
++
++/**
++ * kdbus_name_lookup_unlocked() - lookup name in registry
++ * @reg:		name registry
++ * @name:		name to lookup
++ *
++ * This looks up @name in the given name-registry and returns the
++ * kdbus_name_entry object. The caller must hold the registry-lock and must not
++ * access the returned object after releasing the lock.
++ *
++ * Return: Pointer to name-entry, or NULL if not found.
++ */
++struct kdbus_name_entry *
++kdbus_name_lookup_unlocked(struct kdbus_name_registry *reg, const char *name)
++{
++	return kdbus_name_entry_find(reg, kdbus_strhash(name), name);
++}
++
++static int kdbus_name_become_activator(struct kdbus_name_owner *owner,
++				       u64 *return_flags)
++{
++	if (kdbus_name_owner_is_used(owner))
++		return -EALREADY;
++	if (owner->name->activator)
++		return -EEXIST;
++
++	owner->name->activator = owner;
++	owner->flags |= KDBUS_NAME_ACTIVATOR;
++
++	if (kdbus_name_entry_first(owner->name)) {
++		owner->flags |= KDBUS_NAME_IN_QUEUE;
++	} else {
++		owner->flags |= KDBUS_NAME_PRIMARY;
++		kdbus_notify_name_change(owner->conn->ep->bus,
++					 KDBUS_ITEM_NAME_ADD,
++					 0, owner->conn->id,
++					 0, owner->flags,
++					 owner->name->name);
++	}
++
++	if (return_flags)
++		*return_flags = owner->flags | KDBUS_NAME_ACQUIRED;
++
++	return 0;
++}
++
++static int kdbus_name_update(struct kdbus_name_owner *owner, u64 flags,
++			     u64 *return_flags)
++{
++	struct kdbus_name_owner *primary, *activator;
++	struct kdbus_name_entry *name;
++	struct kdbus_bus *bus;
++	u64 nflags = 0;
++	int ret = 0;
++
++	name = owner->name;
++	bus = owner->conn->ep->bus;
++	primary = kdbus_name_entry_first(name);
++	activator = name->activator;
++
++	/* cannot be activator and acquire a name */
++	if (owner == activator)
++		return -EUCLEAN;
++
++	/* update saved flags */
++	owner->flags = flags & KDBUS_NAME_SAVED_MASK;
++
++	if (!primary) {
++		/*
++		 * No primary owner (but maybe an activator). Take over the
++		 * name.
++		 */
++
++		list_add(&owner->name_entry, &name->queue);
++		owner->flags |= KDBUS_NAME_PRIMARY;
++		nflags |= KDBUS_NAME_ACQUIRED;
++
++		/* move messages to new owner on activation */
++		if (activator) {
++			kdbus_conn_move_messages(owner->conn, activator->conn,
++						 name->name_id);
++			kdbus_notify_name_change(bus, KDBUS_ITEM_NAME_CHANGE,
++					activator->conn->id, owner->conn->id,
++					activator->flags, owner->flags,
++					name->name);
++			activator->flags &= ~KDBUS_NAME_PRIMARY;
++			activator->flags |= KDBUS_NAME_IN_QUEUE;
++		} else {
++			kdbus_notify_name_change(bus, KDBUS_ITEM_NAME_ADD,
++						 0, owner->conn->id,
++						 0, owner->flags,
++						 name->name);
++		}
++
++	} else if (owner == primary) {
++		/*
++		 * Already the primary owner of the name, flags were already
++		 * updated. Nothing to do.
++		 */
++
++		owner->flags |= KDBUS_NAME_PRIMARY;
++
++	} else if ((primary->flags & KDBUS_NAME_ALLOW_REPLACEMENT) &&
++		   (flags & KDBUS_NAME_REPLACE_EXISTING)) {
++		/*
++		 * We're not the primary owner but can replace it. Move us
++		 * ahead of the primary owner and acquire the name (possibly
++		 * skipping queued owners ahead of us).
++		 */
++
++		list_del_init(&owner->name_entry);
++		list_add(&owner->name_entry, &name->queue);
++		owner->flags |= KDBUS_NAME_PRIMARY;
++		nflags |= KDBUS_NAME_ACQUIRED;
++
++		kdbus_notify_name_change(bus, KDBUS_ITEM_NAME_CHANGE,
++					 primary->conn->id, owner->conn->id,
++					 primary->flags, owner->flags,
++					 name->name);
++
++		/* requeue old primary, or drop if queueing not wanted */
++		if (primary->flags & KDBUS_NAME_QUEUE) {
++			primary->flags &= ~KDBUS_NAME_PRIMARY;
++			primary->flags |= KDBUS_NAME_IN_QUEUE;
++		} else {
++			list_del_init(&primary->name_entry);
++			kdbus_name_owner_free(primary);
++		}
++
++	} else if (flags & KDBUS_NAME_QUEUE) {
++		/*
++		 * Name is already occupied and we cannot take it over, but
++		 * queuing is allowed. Put us silently on the queue, if not
++		 * already there.
++		 */
++
++		owner->flags |= KDBUS_NAME_IN_QUEUE;
++		if (!kdbus_name_owner_is_used(owner)) {
++			list_add_tail(&owner->name_entry, &name->queue);
++			nflags |= KDBUS_NAME_ACQUIRED;
++		}
++	} else if (kdbus_name_owner_is_used(owner)) {
++		/*
++		 * Already queued on name, but re-queueing was not requested.
++		 * Make sure to unlink it from the name, the caller is
++		 * responsible for releasing it.
++		 */
++
++		list_del_init(&owner->name_entry);
++	} else {
++		/*
++		 * Name is already claimed and queueing is not requested.
++		 * Return error to the caller.
++		 */
++
++		ret = -EEXIST;
++	}
++
++	if (return_flags)
++		*return_flags = owner->flags | nflags;
++
++	return ret;
++}
++
++int kdbus_name_acquire(struct kdbus_name_registry *reg,
++		       struct kdbus_conn *conn, const char *name_str,
++		       u64 flags, u64 *return_flags)
++{
++	struct kdbus_name_entry *name = NULL;
++	struct kdbus_name_owner *owner = NULL;
++	u32 hash;
++	int ret;
++
++	kdbus_conn_assert_active(conn);
++
++	down_write(&reg->rwlock);
++
++	/*
++	 * Verify the connection has access to the name. Do this before testing
++	 * for double-acquisitions and other errors to make sure we do not leak
++	 * information about this name through possible custom endpoints.
++	 */
++	if (!kdbus_conn_policy_own_name(conn, current_cred(), name_str)) {
++		ret = -EPERM;
++		goto exit;
++	}
++
++	/*
++	 * Look up the name entry. If it already exists, search for an owner
++	 * entry as we might already own that name. If either does not exist,
++	 * we will allocate a fresh one.
++	 */
++	hash = kdbus_strhash(name_str);
++	name = kdbus_name_entry_find(reg, hash, name_str);
++	if (name) {
++		owner = kdbus_name_owner_find(name, conn);
++	} else {
++		name = kdbus_name_entry_new(reg, hash, name_str);
++		if (IS_ERR(name)) {
++			ret = PTR_ERR(name);
++			name = NULL;
++			goto exit;
++		}
++	}
++
++	/* create name owner object if not already queued */
++	if (!owner) {
++		owner = kdbus_name_owner_new(conn, name, flags);
++		if (IS_ERR(owner)) {
++			ret = PTR_ERR(owner);
++			owner = NULL;
++			goto exit;
++		}
++	}
++
++	if (flags & KDBUS_NAME_ACTIVATOR)
++		ret = kdbus_name_become_activator(owner, return_flags);
++	else
++		ret = kdbus_name_update(owner, flags, return_flags);
++	if (ret < 0)
++		goto exit;
++
++exit:
++	if (owner && !kdbus_name_owner_is_used(owner))
++		kdbus_name_owner_free(owner);
++	if (name && !kdbus_name_entry_is_used(name))
++		kdbus_name_entry_free(name);
++	up_write(&reg->rwlock);
++	kdbus_notify_flush(conn->ep->bus);
++	return ret;
++}
++
++static void kdbus_name_release_unlocked(struct kdbus_name_owner *owner)
++{
++	struct kdbus_name_owner *primary, *next;
++	struct kdbus_name_entry *name;
++
++	name = owner->name;
++	primary = kdbus_name_entry_first(name);
++
++	list_del_init(&owner->name_entry);
++	if (owner == name->activator)
++		name->activator = NULL;
++
++	if (!primary || owner == primary) {
++		next = kdbus_name_entry_first(name);
++		if (!next)
++			next = name->activator;
++
++		if (next) {
++			/* hand to next in queue */
++			next->flags &= ~KDBUS_NAME_IN_QUEUE;
++			next->flags |= KDBUS_NAME_PRIMARY;
++			if (next == name->activator)
++				kdbus_conn_move_messages(next->conn,
++							 owner->conn,
++							 name->name_id);
++
++			kdbus_notify_name_change(owner->conn->ep->bus,
++					KDBUS_ITEM_NAME_CHANGE,
++					owner->conn->id, next->conn->id,
++					owner->flags, next->flags,
++					name->name);
++		} else {
++			kdbus_notify_name_change(owner->conn->ep->bus,
++						 KDBUS_ITEM_NAME_REMOVE,
++						 owner->conn->id, 0,
++						 owner->flags, 0,
++						 name->name);
++		}
++	}
++
++	kdbus_name_owner_free(owner);
++	if (!kdbus_name_entry_is_used(name))
++		kdbus_name_entry_free(name);
++}
++
++static int kdbus_name_release(struct kdbus_name_registry *reg,
++			      struct kdbus_conn *conn,
++			      const char *name_str)
++{
++	struct kdbus_name_owner *owner;
++	struct kdbus_name_entry *name;
++	int ret = 0;
++
++	down_write(&reg->rwlock);
++	name = kdbus_name_entry_find(reg, kdbus_strhash(name_str), name_str);
++	if (name) {
++		owner = kdbus_name_owner_find(name, conn);
++		if (owner)
++			kdbus_name_release_unlocked(owner);
++		else
++			ret = -EADDRINUSE;
++	} else {
++		ret = -ESRCH;
++	}
++	up_write(&reg->rwlock);
++
++	kdbus_notify_flush(conn->ep->bus);
++	return ret;
++}
++
++/**
++ * kdbus_name_release_all() - remove all name entries of a given connection
++ * @reg:		name registry
++ * @conn:		connection
++ */
++void kdbus_name_release_all(struct kdbus_name_registry *reg,
++			    struct kdbus_conn *conn)
++{
++	struct kdbus_name_owner *owner;
++
++	down_write(&reg->rwlock);
++
++	while ((owner = list_first_entry_or_null(&conn->names_list,
++						 struct kdbus_name_owner,
++						 conn_entry)))
++		kdbus_name_release_unlocked(owner);
++
++	up_write(&reg->rwlock);
++
++	kdbus_notify_flush(conn->ep->bus);
++}
++
++/**
++ * kdbus_name_is_valid() - check if a name is valid
++ * @p:			The name to check
++ * @allow_wildcard:	Whether or not to allow a wildcard name
++ *
++ * A name is valid if all of the following criteria are met:
++ *
++ *  - The name has two or more elements separated by a period ('.') character.
++ *  - All elements must contain at least one character.
++ *  - Each element must only contain the ASCII characters "[A-Z][a-z][0-9]_-"
++ *    and must not begin with a digit.
++ *  - The name must not exceed KDBUS_NAME_MAX_LEN.
++ *  - If @allow_wildcard is true, the name may end in '.*'
++ */
++bool kdbus_name_is_valid(const char *p, bool allow_wildcard)
++{
++	bool dot, found_dot = false;
++	const char *q;
++
++	for (dot = true, q = p; *q; q++) {
++		if (*q == '.') {
++			if (dot)
++				return false;
++
++			found_dot = true;
++			dot = true;
++		} else {
++			bool good;
++
++			good = isalpha(*q) || (!dot && isdigit(*q)) ||
++				*q == '_' || *q == '-' ||
++				(allow_wildcard && dot &&
++					*q == '*' && *(q + 1) == '\0');
++
++			if (!good)
++				return false;
++
++			dot = false;
++		}
++	}
++
++	if (q - p > KDBUS_NAME_MAX_LEN)
++		return false;
++
++	if (dot)
++		return false;
++
++	if (!found_dot)
++		return false;
++
++	return true;
++}
++
++/**
++ * kdbus_cmd_name_acquire() - handle KDBUS_CMD_NAME_ACQUIRE
++ * @conn:		connection to operate on
++ * @argp:		command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_name_acquire(struct kdbus_conn *conn, void __user *argp)
++{
++	const char *item_name;
++	struct kdbus_cmd *cmd;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++		{ .type = KDBUS_ITEM_NAME, .mandatory = true },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
++				 KDBUS_NAME_REPLACE_EXISTING |
++				 KDBUS_NAME_ALLOW_REPLACEMENT |
++				 KDBUS_NAME_QUEUE,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	if (!kdbus_conn_is_ordinary(conn))
++		return -EOPNOTSUPP;
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret != 0)
++		return ret;
++
++	item_name = argv[1].item->str;
++	if (!kdbus_name_is_valid(item_name, false)) {
++		ret = -EINVAL;
++		goto exit;
++	}
++
++	ret = kdbus_name_acquire(conn->ep->bus->name_registry, conn, item_name,
++				 cmd->flags, &cmd->return_flags);
++
++exit:
++	return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_name_release() - handle KDBUS_CMD_NAME_RELEASE
++ * @conn:		connection to operate on
++ * @argp:		command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_name_release(struct kdbus_conn *conn, void __user *argp)
++{
++	struct kdbus_cmd *cmd;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++		{ .type = KDBUS_ITEM_NAME, .mandatory = true },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	if (!kdbus_conn_is_ordinary(conn))
++		return -EOPNOTSUPP;
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret != 0)
++		return ret;
++
++	ret = kdbus_name_release(conn->ep->bus->name_registry, conn,
++				 argv[1].item->str);
++	return kdbus_args_clear(&args, ret);
++}
++
++static int kdbus_list_write(struct kdbus_conn *conn,
++			    struct kdbus_conn *c,
++			    struct kdbus_pool_slice *slice,
++			    size_t *pos,
++			    struct kdbus_name_owner *o,
++			    bool write)
++{
++	struct kvec kvec[4];
++	size_t cnt = 0;
++	int ret;
++
++	/* info header */
++	struct kdbus_info info = {
++		.size = 0,
++		.id = c->id,
++		.flags = c->flags,
++	};
++
++	/* fake the header of a kdbus_name item */
++	struct {
++		u64 size;
++		u64 type;
++		u64 flags;
++	} h = {};
++
++	if (o && !kdbus_conn_policy_see_name_unlocked(conn, current_cred(),
++						      o->name->name))
++		return 0;
++
++	kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &info.size);
++
++	/* append name */
++	if (o) {
++		size_t slen = strlen(o->name->name) + 1;
++
++		h.size = offsetof(struct kdbus_item, name.name) + slen;
++		h.type = KDBUS_ITEM_OWNED_NAME;
++		h.flags = o->flags;
++
++		kdbus_kvec_set(&kvec[cnt++], &h, sizeof(h), &info.size);
++		kdbus_kvec_set(&kvec[cnt++], o->name->name, slen, &info.size);
++		cnt += !!kdbus_kvec_pad(&kvec[cnt], &info.size);
++	}
++
++	if (write) {
++		ret = kdbus_pool_slice_copy_kvec(slice, *pos, kvec,
++						 cnt, info.size);
++		if (ret < 0)
++			return ret;
++	}
++
++	*pos += info.size;
++	return 0;
++}
++
++static int kdbus_list_all(struct kdbus_conn *conn, u64 flags,
++			  struct kdbus_pool_slice *slice,
++			  size_t *pos, bool write)
++{
++	struct kdbus_conn *c;
++	size_t p = *pos;
++	int ret, i;
++
++	hash_for_each(conn->ep->bus->conn_hash, i, c, hentry) {
++		bool added = false;
++
++		/* skip monitors */
++		if (kdbus_conn_is_monitor(c))
++			continue;
++
++		/* all names the connection owns */
++		if (flags & (KDBUS_LIST_NAMES |
++			     KDBUS_LIST_ACTIVATORS |
++			     KDBUS_LIST_QUEUED)) {
++			struct kdbus_name_owner *o;
++
++			list_for_each_entry(o, &c->names_list, conn_entry) {
++				if (o->flags & KDBUS_NAME_ACTIVATOR) {
++					if (!(flags & KDBUS_LIST_ACTIVATORS))
++						continue;
++
++					ret = kdbus_list_write(conn, c, slice,
++							       &p, o, write);
++					if (ret < 0) {
++						mutex_unlock(&c->lock);
++						return ret;
++					}
++
++					added = true;
++				} else if (o->flags & KDBUS_NAME_IN_QUEUE) {
++					if (!(flags & KDBUS_LIST_QUEUED))
++						continue;
++
++					ret = kdbus_list_write(conn, c, slice,
++							       &p, o, write);
++					if (ret < 0) {
++						mutex_unlock(&c->lock);
++						return ret;
++					}
++
++					added = true;
++				} else if (flags & KDBUS_LIST_NAMES) {
++					ret = kdbus_list_write(conn, c, slice,
++							       &p, o, write);
++					if (ret < 0) {
++						mutex_unlock(&c->lock);
++						return ret;
++					}
++
++					added = true;
++				}
++			}
++		}
++
++		/* nothing added so far, just add the unique ID */
++		if (!added && (flags & KDBUS_LIST_UNIQUE)) {
++			ret = kdbus_list_write(conn, c, slice, &p, NULL, write);
++			if (ret < 0)
++				return ret;
++		}
++	}
++
++	*pos = p;
++	return 0;
++}
++
++/**
++ * kdbus_cmd_list() - handle KDBUS_CMD_LIST
++ * @conn:		connection to operate on
++ * @argp:		command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_list(struct kdbus_conn *conn, void __user *argp)
++{
++	struct kdbus_name_registry *reg = conn->ep->bus->name_registry;
++	struct kdbus_pool_slice *slice = NULL;
++	struct kdbus_cmd_list *cmd;
++	size_t pos, size;
++	int ret;
++
++	struct kdbus_arg argv[] = {
++		{ .type = KDBUS_ITEM_NEGOTIATE },
++	};
++	struct kdbus_args args = {
++		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
++				 KDBUS_LIST_UNIQUE |
++				 KDBUS_LIST_NAMES |
++				 KDBUS_LIST_ACTIVATORS |
++				 KDBUS_LIST_QUEUED,
++		.argv = argv,
++		.argc = ARRAY_SIZE(argv),
++	};
++
++	ret = kdbus_args_parse(&args, argp, &cmd);
++	if (ret != 0)
++		return ret;
++
++	/* lock order: domain -> bus -> ep -> names -> conn */
++	down_read(&reg->rwlock);
++	down_read(&conn->ep->bus->conn_rwlock);
++	down_read(&conn->ep->policy_db.entries_rwlock);
++
++	/* size of records */
++	size = 0;
++	ret = kdbus_list_all(conn, cmd->flags, NULL, &size, false);
++	if (ret < 0)
++		goto exit_unlock;
++
++	if (size == 0) {
++		kdbus_pool_publish_empty(conn->pool, &cmd->offset,
++					 &cmd->list_size);
++	} else {
++		slice = kdbus_pool_slice_alloc(conn->pool, size, false);
++		if (IS_ERR(slice)) {
++			ret = PTR_ERR(slice);
++			slice = NULL;
++			goto exit_unlock;
++		}
++
++		/* copy the records */
++		pos = 0;
++		ret = kdbus_list_all(conn, cmd->flags, slice, &pos, true);
++		if (ret < 0)
++			goto exit_unlock;
++
++		WARN_ON(pos != size);
++		kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->list_size);
++	}
++
++	if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
++	    kdbus_member_set_user(&cmd->list_size, argp,
++				  typeof(*cmd), list_size))
++		ret = -EFAULT;
++
++exit_unlock:
++	up_read(&conn->ep->policy_db.entries_rwlock);
++	up_read(&conn->ep->bus->conn_rwlock);
++	up_read(&reg->rwlock);
++	kdbus_pool_slice_release(slice);
++	return kdbus_args_clear(&args, ret);
++}
+diff --git a/ipc/kdbus/names.h b/ipc/kdbus/names.h
+new file mode 100644
+index 0000000..edac59d
+--- /dev/null
++++ b/ipc/kdbus/names.h
+@@ -0,0 +1,105 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_NAMES_H
++#define __KDBUS_NAMES_H
++
++#include <linux/hashtable.h>
++#include <linux/rwsem.h>
++
++struct kdbus_name_entry;
++struct kdbus_name_owner;
++struct kdbus_name_registry;
++
++/**
++ * struct kdbus_name_registry - names registered for a bus
++ * @entries_hash:	Map of entries
++ * @rwlock:		Registry data lock
++ * @name_seq_last:	Last used sequence number to assign to a name entry
++ */
++struct kdbus_name_registry {
++	DECLARE_HASHTABLE(entries_hash, 8);
++	struct rw_semaphore rwlock;
++	u64 name_seq_last;
++};
++
++/**
++ * struct kdbus_name_entry - well-known name entry
++ * @name_id:		sequence number of name entry to be able to uniquely
++ *			identify a name over its registration lifetime
++ * @activator:		activator of this name, or NULL
++ * @queue:		list of queued owners
++ * @hentry:		entry in registry map
++ * @name:		well-known name
++ */
++struct kdbus_name_entry {
++	u64 name_id;
++	struct kdbus_name_owner *activator;
++	struct list_head queue;
++	struct hlist_node hentry;
++	char name[];
++};
++
++/**
++ * struct kdbus_name_owner - owner of a well-known name
++ * @flags:		KDBUS_NAME_* flags of this owner
++ * @conn:		connection owning the name
++ * @name:		name that is owned
++ * @conn_entry:		link into @conn
++ * @name_entry:		link into @name
++ */
++struct kdbus_name_owner {
++	u64 flags;
++	struct kdbus_conn *conn;
++	struct kdbus_name_entry *name;
++	struct list_head conn_entry;
++	struct list_head name_entry;
++};
++
++bool kdbus_name_is_valid(const char *p, bool allow_wildcard);
++
++struct kdbus_name_registry *kdbus_name_registry_new(void);
++void kdbus_name_registry_free(struct kdbus_name_registry *reg);
++
++struct kdbus_name_entry *
++kdbus_name_lookup_unlocked(struct kdbus_name_registry *reg, const char *name);
++
++int kdbus_name_acquire(struct kdbus_name_registry *reg,
++		       struct kdbus_conn *conn, const char *name,
++		       u64 flags, u64 *return_flags);
++void kdbus_name_release_all(struct kdbus_name_registry *reg,
++			    struct kdbus_conn *conn);
++
++int kdbus_cmd_name_acquire(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_name_release(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_list(struct kdbus_conn *conn, void __user *argp);
++
++/**
++ * kdbus_name_get_owner() - get current owner of a name
++ * @name:	name to get current owner of
++ *
++ * This returns a pointer to the current owner of a name (or its activator if
++ * there is no owner). The caller must make sure @name is valid and does not
++ * vanish.
++ *
++ * Return: Pointer to current owner or NULL if there is none.
++ */
++static inline struct kdbus_name_owner *
++kdbus_name_get_owner(struct kdbus_name_entry *name)
++{
++	return list_first_entry_or_null(&name->queue, struct kdbus_name_owner,
++					name_entry) ? : name->activator;
++}
++
++#endif
+diff --git a/ipc/kdbus/node.c b/ipc/kdbus/node.c
+new file mode 100644
+index 0000000..89f58bc
+--- /dev/null
++++ b/ipc/kdbus/node.c
+@@ -0,0 +1,897 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/atomic.h>
++#include <linux/fs.h>
++#include <linux/idr.h>
++#include <linux/kdev_t.h>
++#include <linux/rbtree.h>
++#include <linux/rwsem.h>
++#include <linux/sched.h>
++#include <linux/slab.h>
++#include <linux/wait.h>
++
++#include "bus.h"
++#include "domain.h"
++#include "endpoint.h"
++#include "fs.h"
++#include "handle.h"
++#include "node.h"
++#include "util.h"
++
++/**
++ * DOC: kdbus nodes
++ *
++ * Nodes unify lifetime management across exposed kdbus objects and provide a
++ * hierarchy. Each kdbus object that might be exposed to user-space has a
++ * kdbus_node object embedded and is linked into the hierarchy. Each node can
++ * have any number (0-n) of child nodes linked. Each child retains a reference
++ * to its parent node. For root-nodes, the parent is NULL.
++ *
++ * Each node object goes through a bunch of states during its lifetime:
++ *     * NEW
++ *       * LINKED    (can be skipped by NEW->FREED transition)
++ *         * ACTIVE  (can be skipped by LINKED->INACTIVE transition)
++ *       * INACTIVE
++ *       * DRAINED
++ *     * FREED
++ *
++ * Each node is allocated by the caller and initialized via kdbus_node_init().
++ * This never fails and sets the object into state NEW. From now on, ref-counts
++ * on the node manage its lifetime. During init, the ref-count is set to 1. Once
++ * it drops to 0, the node goes to state FREED and the node->free_cb() callback
++ * is called to deallocate any memory.
++ *
++ * After initializing a node, you usually link it into the hierarchy. You need
++ * to provide a parent node and a name. The node will be linked as child to the
++ * parent and a globally unique ID is assigned to the child. The name of the
++ * child must be unique for all children of this parent. Otherwise, linking the
++ * child will fail with -EEXIST.
++ * Note that the child is not marked active, yet. Admittedly, it prevents any
++ * other node from being linked with the same name (thus, it reserves that
++ * name), but any child-lookup (via name or unique ID) will never return this
++ * child unless it has been marked active.
++ *
++ * Once successfully linked, you can use kdbus_node_activate() to activate a
++ * child. This will mark the child active. This state can be skipped by directly
++ * deactivating the child via kdbus_node_deactivate() (see below).
++ * By activating a child, you enable any lookups on this child to succeed from
++ * now on. Furthermore, any code that got its hands on a reference to the node,
++ * can from now on "acquire" the node.
++ *
++ *     Active References (or: 'acquiring' and 'releasing' a node)
++ *     In addition to normal object references, nodes support something we call
++ *     "active references". An active reference can be acquired via
++ *     kdbus_node_acquire() and released via kdbus_node_release(). A caller
++ *     _must_ own a normal object reference whenever calling those functions.
++ *     Unlike object references, acquiring an active reference can fail (by
++ *     returning 'false' from kdbus_node_acquire()). An active reference can
++ *     only be acquired if the node is marked active. If it is not marked
++ *     active, yet, or if it was already deactivated, no more active references
++ *     can be acquired, ever!
++ *     Active references are used to track tasks working on a node. Whenever a
++ *     task enters kernel-space to perform an action on a node, it acquires an
++ *     active reference, performs the action and releases the reference again.
++ *     While holding an active reference, the node is guaranteed to stay active.
++ *     If the node is deactivated in parallel, the node is marked as
++ *     deactivated, then we wait for all active references to be dropped, before
++ *     we finally proceed with any cleanups. That is, if you hold an active
++ *     reference to a node, any resources that are bound to the "active" state
++ *     are guaranteed to stay accessible until you release your reference.
++ *
++ *     Active-references are very similar to rw-locks, where acquiring a node is
++ *     equal to try-read-lock and releasing to read-unlock. Deactivating a node
++ *     means write-lock and never releasing it again.
++ *     Unlike rw-locks, the 'active reference' concept is more versatile and
++ *     avoids unusual rw-lock usage (never releasing a write-lock..).
++ *
++ *     It is safe to acquire multiple active-references recursively. But you
++ *     need to check the return value of kdbus_node_acquire() on _each_ call. It
++ *     may stop granting references at _any_ time.
++ *
++ *     You're free to perform any operations you want while holding an active
++ *     reference, except sleeping for an indefinite period. Sleeping for a fixed
++ *     amount of time is fine, but you usually should not wait on wait-queues
++ *     without a timeout.
++ *     For example, if you wait for I/O to happen, you should gather all data
++ *     and schedule the I/O operation, then release your active reference and
++ *     wait for it to complete. Then try to acquire a new reference. If it
++ *     fails, perform any cleanup (the node is now dead). Otherwise, you can
++ *     finish your operation.
++ *
++ * All nodes can be deactivated via kdbus_node_deactivate() at any time. You can
++ * call this multiple times, even in parallel or on nodes that were never
++ * linked, and it will just work. The only restriction is, you must not hold an
++ * active reference when calling kdbus_node_deactivate().
++ * By deactivating a node, it is immediately marked inactive. Then, we wait for
++ * all active references to be released (called 'draining' the node). This
++ * shouldn't take very long as we don't perform long-lasting operations while
++ * holding an active reference. Note that once the node is marked inactive, no
++ * new active references can be acquired.
++ * Once all active references are dropped, the node is considered 'drained'. Now
++ * kdbus_node_deactivate() is called on each child of the node before we
++ * continue deactivating our node. That is, once all children are entirely
++ * deactivated, we call ->release_cb() of our node. ->release_cb() can release
++ * any resources on that node which are bound to the "active" state of a node.
++ * When done, we unlink the node from its parent rb-tree, mark it as
++ * 'released' and return.
++ * If kdbus_node_deactivate() is called multiple times (even in parallel), all
++ * but one caller will just wait until the node is fully deactivated. That is,
++ * one random caller of kdbus_node_deactivate() is selected to call
++ * ->release_cb() and cleanup the node. Only once all this is done, all other
++ * callers will return from kdbus_node_deactivate(). That is, it doesn't matter
++ * whether you're the selected caller or not, it will only return after
++ * everything is fully done.
++ *
++ * When a node is activated, we acquire a normal object reference to the node.
++ * This reference is dropped after deactivation is fully done (and only if the
++ * node really was activated). This allows callers to link+activate a child node
++ * and then drop all refs. The node will be deactivated together with the
++ * parent, and then be freed when this reference is dropped.
++ *
++ * Currently, nodes provide a bunch of resources that external code can use
++ * directly. This includes:
++ *
++ *     * node->waitq: Each node has its own wait-queue that is used to manage
++ *                    the 'active' state. When a node is deactivated, we wait on
++ *                    this queue until all active refs are dropped. Analogously,
++ *                    when you release an active reference on a deactivated
++ *                    node, and the active ref-count drops to 0, we wake up a
++ *                    single thread on this queue. Furthermore, once the
++ *                    ->release_cb() callback finished, we wake up all waiters.
++ *                    The node-owner is free to re-use this wait-queue for other
++ *                    purposes. As node-management uses this queue only during
++ *                    deactivation, it is usually totally fine to re-use the
++ *                    queue for other, preferably low-overhead, use-cases.
++ *
++ *     * node->type: This field defines the type of the owner of this node. It
++ *                   must be set during node initialization and must remain
++ *                   constant. The node management never looks at this value,
++ *                   but external users might use it to gain access to the
++ *                   owner object of a node.
++ *                   It is totally up to the owner of the node to define what
++ *                   their type means. Usually it means you can access the
++ *                   parent structure via container_of(), as long as you hold an
++ *                   active reference to the node.
++ *
++ *     * node->free_cb:    callback after all references are dropped
++ *       node->release_cb: callback during node deactivation
++ *                         These fields must be set by the node owner during
++ *                         node initialization. They must remain constant. If
++ *                         NULL, they're skipped.
++ *
++ *     * node->mode: filesystem access modes
++ *       node->uid:  filesystem owner uid
++ *       node->gid:  filesystem owner gid
++ *                   These fields must be set by the node owner during node
++ *                   initialization. They must remain constant and may be
++ *                   accessed by other callers to properly initialize
++ *                   filesystem nodes.
++ *
++ *     * node->id: This is an unsigned 32bit integer allocated by an IDA. It is
++ *                 always kept as small as possible during allocation and is
++ *                 globally unique across all nodes allocated by this module. 0
++ *                 is reserved as "not assigned" and is the default.
++ *                 The ID is assigned during kdbus_node_link() and is kept until
++ *                 the object is freed. Thus, the ID outlives the active
++ *                 lifetime of a node. As long as you hold an object reference
++ *                 to a node (and the node was linked once), the ID is valid and
++ *                 unique.
++ *
++ *     * node->name: name of this node
++ *       node->hash: 31bit hash-value of @name (range [2..INT_MAX-1])
++ *                   These values follow the same lifetime rules as node->id.
++ *                   They're initialized when the node is linked and then remain
++ *                   constant until the last object reference is dropped.
++ *                   Unlike the id, the name is only unique across all siblings
++ *                   and only until the node is deactivated. Currently, the name
++ *                   is even unique if linked but not activated, yet. This might
++ *                   change in the future, though. Code should not rely on this.
++ *
++ *     * node->lock:     lock to protect node->children, node->rb, node->parent
++ *     * node->parent: Reference to parent node. This is set during LINK time
++ *                     and is dropped during destruction. You must not access
++ *                     it unless you hold an active reference to the node or if
++ *                     you know the node is dead.
++ *     * node->children: rb-tree of all linked children of this node. You must
++ *                       not access this directly, but use one of the iterator
++ *                       or lookup helpers.
++ */
++
++/*
++ * Bias values track states of "active references". They're all negative. If a
++ * node is active, its active-ref-counter is >=0 and tracks all active
++ * references. Once a node is deactivated, we subtract NODE_BIAS. This means the
++ * counter is now negative but still counts the active references. Once it drops
++ * to exactly NODE_BIAS, we know all active references were dropped. Exactly one
++ * thread will change it to NODE_RELEASE now, perform cleanup and then put it
++ * into NODE_DRAINED. Once drained, all other threads that tried deactivating
++ * the node will now be woken up (thus, they wait until the node is fully done).
++ * The initial state during node-setup is NODE_NEW. If a node is directly
++ * deactivated without having ever been active, it is put into
++ * NODE_RELEASE_DIRECT instead of NODE_BIAS. This tracks this one-bit state
++ * across node-deactivation. The task putting it into NODE_RELEASE now knows
++ * whether the node was active before or not.
++ *
++ * Some archs implement atomic_sub(v) with atomic_add(-v), so reserve INT_MIN
++ * to avoid overflows if multiplied by -1.
++ */
++#define KDBUS_NODE_BIAS			(INT_MIN + 5)
++#define KDBUS_NODE_RELEASE_DIRECT	(KDBUS_NODE_BIAS - 1)
++#define KDBUS_NODE_RELEASE		(KDBUS_NODE_BIAS - 2)
++#define KDBUS_NODE_DRAINED		(KDBUS_NODE_BIAS - 3)
++#define KDBUS_NODE_NEW			(KDBUS_NODE_BIAS - 4)
++
++/* global unique ID mapping for kdbus nodes */
++DEFINE_IDA(kdbus_node_ida);
++
++/**
++ * kdbus_node_name_hash() - hash a name
++ * @name:	The string to hash
++ *
++ * This computes the hash of @name. It is guaranteed to be in the range
++ * [2..INT_MAX-1]. The values 0, 1 and INT_MAX are unused as they are reserved
++ * for the filesystem code.
++ *
++ * Return: hash value of the passed string
++ */
++static unsigned int kdbus_node_name_hash(const char *name)
++{
++	unsigned int hash;
++
++	/* reserve hash numbers 0, 1 and >=INT_MAX for magic directories */
++	hash = kdbus_strhash(name) & INT_MAX;
++	if (hash < 2)
++		hash += 2;
++	if (hash >= INT_MAX)
++		hash = INT_MAX - 1;
++
++	return hash;
++}
++
++/**
++ * kdbus_node_name_compare() - compare a name with a node's name
++ * @hash:	hash of the string to compare the node with
++ * @name:	name to compare the node with
++ * @node:	node to compare the name with
++ *
++ * Return: 0 if @name and @hash exactly match the information in @node, or
++ * an integer less than or greater than zero if @name is found, respectively,
++ * to be less than or be greater than the string stored in @node.
++ */
++static int kdbus_node_name_compare(unsigned int hash, const char *name,
++				   const struct kdbus_node *node)
++{
++	if (hash != node->hash)
++		return hash - node->hash;
++
++	return strcmp(name, node->name);
++}
++
++/**
++ * kdbus_node_init() - initialize a kdbus_node
++ * @node:	Pointer to the node to initialize
++ * @type:	The type the node will have (KDBUS_NODE_*)
++ *
++ * The caller is responsible for allocating @node and initializing it to zero.
++ * Once this call returns, you must use the node_ref() and node_unref()
++ * functions to manage this node.
++ */
++void kdbus_node_init(struct kdbus_node *node, unsigned int type)
++{
++	atomic_set(&node->refcnt, 1);
++	mutex_init(&node->lock);
++	node->id = 0;
++	node->type = type;
++	RB_CLEAR_NODE(&node->rb);
++	node->children = RB_ROOT;
++	init_waitqueue_head(&node->waitq);
++	atomic_set(&node->active, KDBUS_NODE_NEW);
++}
++
++/**
++ * kdbus_node_link() - link a node into the nodes system
++ * @node:	Pointer to the node to link
++ * @parent:	Pointer to a parent node, may be %NULL
++ * @name:	The name of the node (or NULL if root node)
++ *
++ * This links a node into the hierarchy. This must not be called multiple times.
++ * If @parent is NULL, the node becomes a new root node.
++ *
++ * This call will fail if @name is not unique across all its siblings or if no
++ * ID could be allocated. You must not activate a node if linking failed! It is
++ * safe to deactivate it, though.
++ *
++ * Once you linked a node, you must call kdbus_node_deactivate() before you drop
++ * the last reference (even if you never activate the node).
++ *
++ * Return: 0 on success, negative error code otherwise.
++ */
++int kdbus_node_link(struct kdbus_node *node, struct kdbus_node *parent,
++		    const char *name)
++{
++	int ret;
++
++	if (WARN_ON(node->type != KDBUS_NODE_DOMAIN && !parent))
++		return -EINVAL;
++
++	if (WARN_ON(parent && !name))
++		return -EINVAL;
++
++	if (name) {
++		node->name = kstrdup(name, GFP_KERNEL);
++		if (!node->name)
++			return -ENOMEM;
++
++		node->hash = kdbus_node_name_hash(name);
++	}
++
++	ret = ida_simple_get(&kdbus_node_ida, 1, 0, GFP_KERNEL);
++	if (ret < 0)
++		return ret;
++
++	node->id = ret;
++	ret = 0;
++
++	if (parent) {
++		struct rb_node **n, *prev;
++
++		if (!kdbus_node_acquire(parent))
++			return -ESHUTDOWN;
++
++		mutex_lock(&parent->lock);
++
++		n = &parent->children.rb_node;
++		prev = NULL;
++
++		while (*n) {
++			struct kdbus_node *pos;
++			int result;
++
++			pos = kdbus_node_from_rb(*n);
++			prev = *n;
++			result = kdbus_node_name_compare(node->hash,
++							 node->name,
++							 pos);
++			if (result == 0) {
++				ret = -EEXIST;
++				goto exit_unlock;
++			}
++
++			if (result < 0)
++				n = &pos->rb.rb_left;
++			else
++				n = &pos->rb.rb_right;
++		}
++
++		/* add new node and rebalance the tree */
++		rb_link_node(&node->rb, prev, n);
++		rb_insert_color(&node->rb, &parent->children);
++		node->parent = kdbus_node_ref(parent);
++
++exit_unlock:
++		mutex_unlock(&parent->lock);
++		kdbus_node_release(parent);
++	}
++
++	return ret;
++}
++
++/**
++ * kdbus_node_ref() - Acquire object reference
++ * @node:	node to acquire reference to (or NULL)
++ *
++ * This acquires a new reference to @node. You must already own a reference when
++ * calling this!
++ * If @node is NULL, this is a no-op.
++ *
++ * Return: @node is returned
++ */
++struct kdbus_node *kdbus_node_ref(struct kdbus_node *node)
++{
++	if (node)
++		atomic_inc(&node->refcnt);
++	return node;
++}
++
++/**
++ * kdbus_node_unref() - Drop object reference
++ * @node:	node to drop reference to (or NULL)
++ *
++ * This drops an object reference to @node. You must not access the node if you
++ * no longer own a reference.
++ * If the ref-count drops to 0, the object will be destroyed (->free_cb will be
++ * called).
++ *
++ * If you linked or activated the node, you must deactivate the node before you
++ * drop your last reference! If you didn't link or activate the node, you can
++ * drop any reference you want.
++ *
++ * Note that this calls into ->free_cb() and thus _might_ sleep. The ->free_cb()
++ * callbacks must not acquire any outer locks, though. So you can safely drop
++ * references while holding locks.
++ *
++ * If @node is NULL, this is a no-op.
++ *
++ * Return: This always returns NULL
++ */
++struct kdbus_node *kdbus_node_unref(struct kdbus_node *node)
++{
++	if (node && atomic_dec_and_test(&node->refcnt)) {
++		struct kdbus_node safe = *node;
++
++		WARN_ON(atomic_read(&node->active) != KDBUS_NODE_DRAINED);
++		WARN_ON(!RB_EMPTY_NODE(&node->rb));
++
++		if (node->free_cb)
++			node->free_cb(node);
++		if (safe.id > 0)
++			ida_simple_remove(&kdbus_node_ida, safe.id);
++
++		kfree(safe.name);
++
++		/*
++		 * kdbusfs relies on the parent to be available even after the
++		 * node was deactivated and unlinked. Therefore, we pin it
++		 * until a node is destroyed.
++		 */
++		kdbus_node_unref(safe.parent);
++	}
++
++	return NULL;
++}
++
++/**
++ * kdbus_node_is_active() - test whether a node is active
++ * @node:	node to test
++ *
++ * This checks whether @node is active. That means, @node was linked and
++ * activated by the node owner and hasn't been deactivated, yet. If, and only
++ * if, a node is active, kdbus_node_acquire() will be able to acquire active
++ * references.
++ *
++ * Note that this function does not give any lifetime guarantees. After this
++ * call returns, the node might be deactivated immediately. Normally, what you
++ * want is to acquire a real active reference via kdbus_node_acquire().
++ *
++ * Return: true if @node is active, false otherwise
++ */
++bool kdbus_node_is_active(struct kdbus_node *node)
++{
++	return atomic_read(&node->active) >= 0;
++}
++
++/**
++ * kdbus_node_is_deactivated() - test whether a node was already deactivated
++ * @node:	node to test
++ *
++ * This checks whether kdbus_node_deactivate() was called on @node. Note that
++ * this might be true even if you never deactivated the node directly, but only
++ * one of its ancestors.
++ *
++ * Note that even if this returns 'false', the node might get deactivated
++ * immediately after the call returns.
++ *
++ * Return: true if @node was already deactivated, false if not
++ */
++bool kdbus_node_is_deactivated(struct kdbus_node *node)
++{
++	int v;
++
++	v = atomic_read(&node->active);
++	return v != KDBUS_NODE_NEW && v < 0;
++}
++
++/**
++ * kdbus_node_activate() - activate a node
++ * @node:	node to activate
++ *
++ * This marks @node as active if, and only if, the node was neither activated
++ * nor deactivated yet, and the parent is still active. Any but the first call
++ * to kdbus_node_activate() is a no-op.
++ * If you called kdbus_node_deactivate() before, then even the first call to
++ * kdbus_node_activate() will be a no-op.
++ *
++ * This call doesn't give any lifetime guarantees. The node might get
++ * deactivated immediately after this call returns. Or the parent might already
++ * be deactivated, which will make this call a no-op.
++ *
++ * If this call successfully activated a node, it will take an object reference
++ * to it. This reference is dropped after the node is deactivated. Therefore,
++ * the object owner can safely drop their reference to @node iff they know that
++ * its parent node will get deactivated at some point. Once the parent node is
++ * deactivated, it will deactivate all its children and thus drop this reference
++ * again.
++ *
++ * Return: True if this call successfully activated the node, otherwise false.
++ *         Note that this might return false, even if the node is still active
++ *         (eg., if you called this a second time).
++ */
++bool kdbus_node_activate(struct kdbus_node *node)
++{
++	bool res = false;
++
++	mutex_lock(&node->lock);
++	if (atomic_read(&node->active) == KDBUS_NODE_NEW) {
++		atomic_sub(KDBUS_NODE_NEW, &node->active);
++		/* activated nodes have ref +1 */
++		kdbus_node_ref(node);
++		res = true;
++	}
++	mutex_unlock(&node->lock);
++
++	return res;
++}
++
++/**
++ * kdbus_node_deactivate() - deactivate a node
++ * @node:	The node to deactivate.
++ *
++ * This function recursively deactivates this node and all its children. It
++ * returns only once all children and the node itself were recursively disabled
++ * (even if you call this function multiple times in parallel).
++ *
++ * It is safe to call this function on _any_ node that was initialized _any_
++ * number of times.
++ *
++ * This call may sleep, as it waits for all active references to be dropped.
++ */
++void kdbus_node_deactivate(struct kdbus_node *node)
++{
++	struct kdbus_node *pos, *child;
++	struct rb_node *rb;
++	int v_pre, v_post;
++
++	pos = node;
++
++	/*
++	 * To avoid recursion, we perform back-tracking while deactivating
++	 * nodes. For each node we enter, we first mark the active-counter as
++	 * deactivated by adding BIAS. If the node has children, we set the first
++	 * child as current position and start over. If the node has no
++	 * children, we drain the node by waiting for all active refs to be
++	 * dropped and then releasing the node.
++	 *
++	 * After the node is released, we set its parent as current position
++	 * and start over. If the current position was the initial node, we're
++	 * done.
++	 *
++	 * Note that this function can be called in parallel by multiple
++	 * callers. We make sure that each node is only released once, and any
++	 * racing caller will wait until the other thread fully released that
++	 * node.
++	 */
++
++	for (;;) {
++		/*
++		 * Add BIAS to node->active to mark it as inactive. If it was
++		 * never active before, immediately mark it as RELEASE_DIRECT
++		 * so we remember this state.
++		 * We cannot remember v_pre as we might iterate into the
++		 * children, overwriting v_pre, before we can release our node.
++		 */
++		mutex_lock(&pos->lock);
++		v_pre = atomic_read(&pos->active);
++		if (v_pre >= 0)
++			atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
++		else if (v_pre == KDBUS_NODE_NEW)
++			atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
++		mutex_unlock(&pos->lock);
++
++		/* wait until all active references were dropped */
++		wait_event(pos->waitq,
++			   atomic_read(&pos->active) <= KDBUS_NODE_BIAS);
++
++		mutex_lock(&pos->lock);
++		/* recurse into first child if any */
++		rb = rb_first(&pos->children);
++		if (rb) {
++			child = kdbus_node_ref(kdbus_node_from_rb(rb));
++			mutex_unlock(&pos->lock);
++			pos = child;
++			continue;
++		}
++
++		/* mark object as RELEASE */
++		v_post = atomic_read(&pos->active);
++		if (v_post == KDBUS_NODE_BIAS ||
++		    v_post == KDBUS_NODE_RELEASE_DIRECT)
++			atomic_set(&pos->active, KDBUS_NODE_RELEASE);
++		mutex_unlock(&pos->lock);
++
++		/*
++		 * If this is the thread that marked the object as RELEASE, we
++		 * perform the actual release. Otherwise, we wait until the
++		 * release is done and the node is marked as DRAINED.
++		 */
++		if (v_post == KDBUS_NODE_BIAS ||
++		    v_post == KDBUS_NODE_RELEASE_DIRECT) {
++			if (pos->release_cb)
++				pos->release_cb(pos, v_post == KDBUS_NODE_BIAS);
++
++			if (pos->parent) {
++				mutex_lock(&pos->parent->lock);
++				if (!RB_EMPTY_NODE(&pos->rb)) {
++					rb_erase(&pos->rb,
++						 &pos->parent->children);
++					RB_CLEAR_NODE(&pos->rb);
++				}
++				mutex_unlock(&pos->parent->lock);
++			}
++
++			/* mark as DRAINED */
++			atomic_set(&pos->active, KDBUS_NODE_DRAINED);
++			wake_up_all(&pos->waitq);
++
++			/* drop VFS cache */
++			kdbus_fs_flush(pos);
++
++			/*
++			 * If the node was activated and someone added BIAS to
++			 * it to deactivate it, we, and only we, are
++			 * responsible for releasing the extra ref-count that
++			 * was taken once in kdbus_node_activate().
++			 * If the node was never activated, no-one ever
++			 * added BIAS, but instead skipped that state and
++			 * immediately went to NODE_RELEASE_DIRECT. In that case
++			 * we must not drop the reference.
++			 */
++			if (v_post == KDBUS_NODE_BIAS)
++				kdbus_node_unref(pos);
++		} else {
++			/* wait until object is DRAINED */
++			wait_event(pos->waitq,
++			    atomic_read(&pos->active) == KDBUS_NODE_DRAINED);
++		}
++
++		/*
++		 * We're done with the current node. Continue on its parent
++		 * again, which will try deactivating its next child, or itself
++		 * if no child is left.
++		 * If we've reached our initial node again, we are done and
++		 * can safely return.
++		 */
++		if (pos == node)
++			break;
++
++		child = pos;
++		pos = pos->parent;
++		kdbus_node_unref(child);
++	}
++}
++
++/**
++ * kdbus_node_acquire() - Acquire an active ref on a node
++ * @node:	The node
++ *
++ * This acquires an active-reference to @node. This will only succeed if the
++ * node is active. You must release this active reference via
++ * kdbus_node_release() again.
++ *
++ * See the introduction to "active references" for more details.
++ *
++ * Return: %true if @node was non-NULL and active
++ */
++bool kdbus_node_acquire(struct kdbus_node *node)
++{
++	return node && atomic_inc_unless_negative(&node->active);
++}
++
++/**
++ * kdbus_node_release() - Release an active ref on a node
++ * @node:	The node
++ *
++ * This releases an active reference that was previously acquired via
++ * kdbus_node_acquire(). See kdbus_node_acquire() for details.
++ */
++void kdbus_node_release(struct kdbus_node *node)
++{
++	if (node && atomic_dec_return(&node->active) == KDBUS_NODE_BIAS)
++		wake_up(&node->waitq);
++}
++
++/**
++ * kdbus_node_find_child() - Find child by name
++ * @node:	parent node to search through
++ * @name:	name of child node
++ *
++ * This searches through all children of @node for a child-node with name @name.
++ * If not found, or if the child is deactivated, NULL is returned. Otherwise,
++ * the child is acquired and a new reference is returned.
++ *
++ * If you're done with the child, you need to release it and drop your
++ * reference.
++ *
++ * This function does not acquire the parent node. However, if the parent was
++ * already deactivated, then kdbus_node_deactivate() will, at some point, also
++ * deactivate the child. Therefore, we can rely on the explicit ordering during
++ * deactivation.
++ *
++ * Return: Reference to acquired child node, or NULL if not found / not active.
++ */
++struct kdbus_node *kdbus_node_find_child(struct kdbus_node *node,
++					 const char *name)
++{
++	struct kdbus_node *child;
++	struct rb_node *rb;
++	unsigned int hash;
++	int ret;
++
++	hash = kdbus_node_name_hash(name);
++
++	mutex_lock(&node->lock);
++	rb = node->children.rb_node;
++	while (rb) {
++		child = kdbus_node_from_rb(rb);
++		ret = kdbus_node_name_compare(hash, name, child);
++		if (ret < 0)
++			rb = rb->rb_left;
++		else if (ret > 0)
++			rb = rb->rb_right;
++		else
++			break;
++	}
++	if (rb && kdbus_node_acquire(child))
++		kdbus_node_ref(child);
++	else
++		child = NULL;
++	mutex_unlock(&node->lock);
++
++	return child;
++}
++
++static struct kdbus_node *node_find_closest_unlocked(struct kdbus_node *node,
++						     unsigned int hash,
++						     const char *name)
++{
++	struct kdbus_node *n, *pos = NULL;
++	struct rb_node *rb;
++	int res;
++
++	/*
++	 * Find the closest child with ``node->hash >= hash'', or, if @name is
++	 * valid, ``node->name >= name'' (where '>=' is the lex. order).
++	 */
++
++	rb = node->children.rb_node;
++	while (rb) {
++		n = kdbus_node_from_rb(rb);
++
++		if (name)
++			res = kdbus_node_name_compare(hash, name, n);
++		else
++			res = hash - n->hash;
++
++		if (res <= 0) {
++			rb = rb->rb_left;
++			pos = n;
++		} else { /* ``hash > n->hash'', ``name > n->name'' */
++			rb = rb->rb_right;
++		}
++	}
++
++	return pos;
++}
++
++/**
++ * kdbus_node_find_closest() - Find closest child-match
++ * @node:	parent node to search through
++ * @hash:	hash value to find closest match for
++ *
++ * Find the closest child of @node with a hash greater than or equal to @hash.
++ * The closest match is the left-most child of @node with this property. Which
++ * means, it is the first child with that hash returned by
++ * kdbus_node_next_child(), if you'd iterate the whole parent node.
++ *
++ * Return: Reference to acquired child, or NULL if none found.
++ */
++struct kdbus_node *kdbus_node_find_closest(struct kdbus_node *node,
++					   unsigned int hash)
++{
++	struct kdbus_node *child;
++	struct rb_node *rb;
++
++	mutex_lock(&node->lock);
++
++	child = node_find_closest_unlocked(node, hash, NULL);
++	while (child && !kdbus_node_acquire(child)) {
++		rb = rb_next(&child->rb);
++		if (rb)
++			child = kdbus_node_from_rb(rb);
++		else
++			child = NULL;
++	}
++	kdbus_node_ref(child);
++
++	mutex_unlock(&node->lock);
++
++	return child;
++}
++
++/**
++ * kdbus_node_next_child() - Acquire next child
++ * @node:	parent node
++ * @prev:	previous child-node position or NULL
++ *
++ * This function returns a reference to the next active child of @node, after
++ * the passed position @prev. If @prev is NULL, a reference to the first active
++ * child is returned. If no more active children are found, NULL is returned.
++ *
++ * This function acquires the next child it returns. If you're done with the
++ * returned pointer, you need to release _and_ unref it.
++ *
++ * The passed in pointer @prev is not modified by this function, and it does
++ * *not* have to be active. If @prev was acquired via different means, or if it
++ * was unlinked from its parent before you pass it in, then this iterator will
++ * still return the next active child (it will have to search through the
++ * rb-tree based on the node-name, though).
++ * However, @prev must not be linked to a different parent than @node!
++ *
++ * Return: Reference to next acquired child, or NULL if at the end.
++ */
++struct kdbus_node *kdbus_node_next_child(struct kdbus_node *node,
++					 struct kdbus_node *prev)
++{
++	struct kdbus_node *pos = NULL;
++	struct rb_node *rb;
++
++	mutex_lock(&node->lock);
++
++	if (!prev) {
++		/*
++		 * New iteration; find first node in rb-tree and try to acquire
++		 * it. If we got it, directly return it as first element.
++		 * Otherwise, the loop below will find the next active node.
++		 */
++		rb = rb_first(&node->children);
++		if (!rb)
++			goto exit;
++		pos = kdbus_node_from_rb(rb);
++		if (kdbus_node_acquire(pos))
++			goto exit;
++	} else if (RB_EMPTY_NODE(&prev->rb)) {
++		/*
++		 * The current iterator is no longer linked to the rb-tree. Use
++		 * its hash value and name to find the next _higher_ node and
++		 * acquire it. If we got it, return it as next element.
++		 * Otherwise, the loop below will find the next active node.
++		 */
++		pos = node_find_closest_unlocked(node, prev->hash, prev->name);
++		if (!pos)
++			goto exit;
++		if (kdbus_node_acquire(pos))
++			goto exit;
++	} else {
++		/*
++		 * The current iterator is still linked to the parent. Set it
++		 * as current position and use the loop below to find the next
++		 * active element.
++		 */
++		pos = prev;
++	}
++
++	/* @pos was already returned or is inactive; find next active node */
++	do {
++		rb = rb_next(&pos->rb);
++		if (rb)
++			pos = kdbus_node_from_rb(rb);
++		else
++			pos = NULL;
++	} while (pos && !kdbus_node_acquire(pos));
++
++exit:
++	/* @pos is NULL or acquired. Take ref if non-NULL and return it */
++	kdbus_node_ref(pos);
++	mutex_unlock(&node->lock);
++	return pos;
++}
+diff --git a/ipc/kdbus/node.h b/ipc/kdbus/node.h
+new file mode 100644
+index 0000000..970e02b
+--- /dev/null
++++ b/ipc/kdbus/node.h
+@@ -0,0 +1,86 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_NODE_H
++#define __KDBUS_NODE_H
++
++#include <linux/atomic.h>
++#include <linux/kernel.h>
++#include <linux/mutex.h>
++#include <linux/wait.h>
++
++struct kdbus_node;
++
++enum kdbus_node_type {
++	KDBUS_NODE_DOMAIN,
++	KDBUS_NODE_CONTROL,
++	KDBUS_NODE_BUS,
++	KDBUS_NODE_ENDPOINT,
++};
++
++typedef void (*kdbus_node_free_t) (struct kdbus_node *node);
++typedef void (*kdbus_node_release_t) (struct kdbus_node *node, bool was_active);
++
++struct kdbus_node {
++	atomic_t refcnt;
++	atomic_t active;
++	wait_queue_head_t waitq;
++
++	/* static members */
++	unsigned int type;
++	kdbus_node_free_t free_cb;
++	kdbus_node_release_t release_cb;
++	umode_t mode;
++	kuid_t uid;
++	kgid_t gid;
++
++	/* valid once linked */
++	char *name;
++	unsigned int hash;
++	unsigned int id;
++	struct kdbus_node *parent; /* may be NULL */
++
++	/* valid iff active */
++	struct mutex lock;
++	struct rb_node rb;
++	struct rb_root children;
++};
++
++#define kdbus_node_from_rb(_node) rb_entry((_node), struct kdbus_node, rb)
++
++extern struct ida kdbus_node_ida;
++
++void kdbus_node_init(struct kdbus_node *node, unsigned int type);
++
++int kdbus_node_link(struct kdbus_node *node, struct kdbus_node *parent,
++		    const char *name);
++
++struct kdbus_node *kdbus_node_ref(struct kdbus_node *node);
++struct kdbus_node *kdbus_node_unref(struct kdbus_node *node);
++
++bool kdbus_node_is_active(struct kdbus_node *node);
++bool kdbus_node_is_deactivated(struct kdbus_node *node);
++bool kdbus_node_activate(struct kdbus_node *node);
++void kdbus_node_deactivate(struct kdbus_node *node);
++
++bool kdbus_node_acquire(struct kdbus_node *node);
++void kdbus_node_release(struct kdbus_node *node);
++
++struct kdbus_node *kdbus_node_find_child(struct kdbus_node *node,
++					 const char *name);
++struct kdbus_node *kdbus_node_find_closest(struct kdbus_node *node,
++					   unsigned int hash);
++struct kdbus_node *kdbus_node_next_child(struct kdbus_node *node,
++					 struct kdbus_node *prev);
++
++#endif
+diff --git a/ipc/kdbus/notify.c b/ipc/kdbus/notify.c
+new file mode 100644
+index 0000000..375758c
+--- /dev/null
++++ b/ipc/kdbus/notify.c
+@@ -0,0 +1,204 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/fs.h>
++#include <linux/init.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/spinlock.h>
++#include <linux/sched.h>
++#include <linux/slab.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "domain.h"
++#include "endpoint.h"
++#include "item.h"
++#include "message.h"
++#include "notify.h"
++
++static inline void kdbus_notify_add_tail(struct kdbus_staging *staging,
++					 struct kdbus_bus *bus)
++{
++	spin_lock(&bus->notify_lock);
++	list_add_tail(&staging->notify_entry, &bus->notify_list);
++	spin_unlock(&bus->notify_lock);
++}
++
++static int kdbus_notify_reply(struct kdbus_bus *bus, u64 id,
++			      u64 cookie, u64 msg_type)
++{
++	struct kdbus_staging *s;
++
++	s = kdbus_staging_new_kernel(bus, id, cookie, 0, msg_type);
++	if (IS_ERR(s))
++		return PTR_ERR(s);
++
++	kdbus_notify_add_tail(s, bus);
++	return 0;
++}
++
++/**
++ * kdbus_notify_reply_timeout() - queue a timeout reply
++ * @bus:		Bus which queues the messages
++ * @id:			The destination's connection ID
++ * @cookie:		The cookie to set in the reply.
++ *
++ * Queues a message that has a KDBUS_ITEM_REPLY_TIMEOUT item attached.
++ *
++ * Return: 0 on success, negative errno on failure.
++ */
++int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie)
++{
++	return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_TIMEOUT);
++}
++
++/**
++ * kdbus_notify_reply_dead() - queue a 'dead' reply
++ * @bus:		Bus which queues the messages
++ * @id:			The destination's connection ID
++ * @cookie:		The cookie to set in the reply.
++ *
++ * Queues a message that has a KDBUS_ITEM_REPLY_DEAD item attached.
++ *
++ * Return: 0 on success, negative errno on failure.
++ */
++int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie)
++{
++	return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_DEAD);
++}
++
++/**
++ * kdbus_notify_name_change() - queue a notification about a name owner change
++ * @bus:		Bus which queues the messages
++ * @type:		The type of the notification; KDBUS_ITEM_NAME_ADD,
++ *			KDBUS_ITEM_NAME_CHANGE or KDBUS_ITEM_NAME_REMOVE
++ * @old_id:		The id of the connection that used to own the name
++ * @new_id:		The id of the new owner connection
++ * @old_flags:		The flags to pass in the KDBUS_ITEM flags field for
++ *			the old owner
++ * @new_flags:		The flags to pass in the KDBUS_ITEM flags field for
++ *			the new owner
++ * @name:		The name that was removed or assigned to a new owner
++ *
++ * Return: 0 on success, negative errno on failure.
++ */
++int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
++			     u64 old_id, u64 new_id,
++			     u64 old_flags, u64 new_flags,
++			     const char *name)
++{
++	size_t name_len, extra_size;
++	struct kdbus_staging *s;
++
++	name_len = strlen(name) + 1;
++	extra_size = sizeof(struct kdbus_notify_name_change) + name_len;
++
++	s = kdbus_staging_new_kernel(bus, KDBUS_DST_ID_BROADCAST, 0,
++				     extra_size, type);
++	if (IS_ERR(s))
++		return PTR_ERR(s);
++
++	s->notify->name_change.old_id.id = old_id;
++	s->notify->name_change.old_id.flags = old_flags;
++	s->notify->name_change.new_id.id = new_id;
++	s->notify->name_change.new_id.flags = new_flags;
++	memcpy(s->notify->name_change.name, name, name_len);
++
++	kdbus_notify_add_tail(s, bus);
++	return 0;
++}
++
++/**
++ * kdbus_notify_id_change() - queue a notification about a unique ID change
++ * @bus:		Bus which queues the messages
++ * @type:		The type of the notification; KDBUS_ITEM_ID_ADD or
++ *			KDBUS_ITEM_ID_REMOVE
++ * @id:			The id of the connection that was added or removed
++ * @flags:		The flags to pass in the KDBUS_ITEM flags field
++ *
++ * Return: 0 on success, negative errno on failure.
++ */
++int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags)
++{
++	struct kdbus_staging *s;
++	size_t extra_size;
++
++	extra_size = sizeof(struct kdbus_notify_id_change);
++	s = kdbus_staging_new_kernel(bus, KDBUS_DST_ID_BROADCAST, 0,
++				     extra_size, type);
++	if (IS_ERR(s))
++		return PTR_ERR(s);
++
++	s->notify->id_change.id = id;
++	s->notify->id_change.flags = flags;
++
++	kdbus_notify_add_tail(s, bus);
++	return 0;
++}
++
++/**
++ * kdbus_notify_flush() - send a list of collected messages
++ * @bus:		Bus which queues the messages
++ *
++ * The list is empty after sending the messages.
++ */
++void kdbus_notify_flush(struct kdbus_bus *bus)
++{
++	LIST_HEAD(notify_list);
++	struct kdbus_staging *s, *tmp;
++
++	mutex_lock(&bus->notify_flush_lock);
++	down_read(&bus->name_registry->rwlock);
++
++	spin_lock(&bus->notify_lock);
++	list_splice_init(&bus->notify_list, &notify_list);
++	spin_unlock(&bus->notify_lock);
++
++	list_for_each_entry_safe(s, tmp, &notify_list, notify_entry) {
++		if (s->msg->dst_id != KDBUS_DST_ID_BROADCAST) {
++			struct kdbus_conn *conn;
++
++			conn = kdbus_bus_find_conn_by_id(bus, s->msg->dst_id);
++			if (conn) {
++				kdbus_bus_eavesdrop(bus, NULL, s);
++				kdbus_conn_entry_insert(NULL, conn, s, NULL,
++							NULL);
++				kdbus_conn_unref(conn);
++			}
++		} else {
++			kdbus_bus_broadcast(bus, NULL, s);
++		}
++
++		list_del(&s->notify_entry);
++		kdbus_staging_free(s);
++	}
++
++	up_read(&bus->name_registry->rwlock);
++	mutex_unlock(&bus->notify_flush_lock);
++}
++
++/**
++ * kdbus_notify_free() - free a list of collected messages
++ * @bus:		Bus which queues the messages
++ */
++void kdbus_notify_free(struct kdbus_bus *bus)
++{
++	struct kdbus_staging *s, *tmp;
++
++	list_for_each_entry_safe(s, tmp, &bus->notify_list, notify_entry) {
++		list_del(&s->notify_entry);
++		kdbus_staging_free(s);
++	}
++}
+diff --git a/ipc/kdbus/notify.h b/ipc/kdbus/notify.h
+new file mode 100644
+index 0000000..03df464
+--- /dev/null
++++ b/ipc/kdbus/notify.h
+@@ -0,0 +1,30 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_NOTIFY_H
++#define __KDBUS_NOTIFY_H
++
++struct kdbus_bus;
++
++int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags);
++int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie);
++int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie);
++int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
++			     u64 old_id, u64 new_id,
++			     u64 old_flags, u64 new_flags,
++			     const char *name);
++void kdbus_notify_flush(struct kdbus_bus *bus);
++void kdbus_notify_free(struct kdbus_bus *bus);
++
++#endif
+diff --git a/ipc/kdbus/policy.c b/ipc/kdbus/policy.c
+new file mode 100644
+index 0000000..f2618e15
+--- /dev/null
++++ b/ipc/kdbus/policy.c
+@@ -0,0 +1,489 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/dcache.h>
++#include <linux/fs.h>
++#include <linux/init.h>
++#include <linux/mutex.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "domain.h"
++#include "item.h"
++#include "names.h"
++#include "policy.h"
++
++#define KDBUS_POLICY_HASH_SIZE	64
++
++/**
++ * struct kdbus_policy_db_entry_access - a database entry access item
++ * @type:		One of KDBUS_POLICY_ACCESS_* types
++ * @access:		Access to grant. One of KDBUS_POLICY_*
++ * @uid:		For KDBUS_POLICY_ACCESS_USER, the global uid
++ * @gid:		For KDBUS_POLICY_ACCESS_GROUP, the global gid
++ * @list:		List entry item for the entry's list
++ *
++ * This is the internal version of struct kdbus_policy_db_access.
++ */
++struct kdbus_policy_db_entry_access {
++	u8 type;		/* USER, GROUP, WORLD */
++	u8 access;		/* OWN, TALK, SEE */
++	union {
++		kuid_t uid;	/* global uid */
++		kgid_t gid;	/* global gid */
++	};
++	struct list_head list;
++};
++
++/**
++ * struct kdbus_policy_db_entry - a policy database entry
++ * @name:		The name to match the policy entry against
++ * @hentry:		The hash entry for the database's entries_hash
++ * @access_list:	List head for keeping track of the entry's
++ *			access items.
++ * @owner:		The owner of this entry. Can be a kdbus_conn or
++ *			a kdbus_ep object.
++ * @wildcard:		The name is a wildcard, such as ending on '.*'
++ */
++struct kdbus_policy_db_entry {
++	char *name;
++	struct hlist_node hentry;
++	struct list_head access_list;
++	const void *owner;
++	bool wildcard:1;
++};
++
++static void kdbus_policy_entry_free(struct kdbus_policy_db_entry *e)
++{
++	struct kdbus_policy_db_entry_access *a, *tmp;
++
++	list_for_each_entry_safe(a, tmp, &e->access_list, list) {
++		list_del(&a->list);
++		kfree(a);
++	}
++
++	kfree(e->name);
++	kfree(e);
++}
++
++static unsigned int kdbus_strnhash(const char *str, size_t len)
++{
++	unsigned long hash = init_name_hash();
++
++	while (len--)
++		hash = partial_name_hash(*str++, hash);
++
++	return end_name_hash(hash);
++}
++
++static const struct kdbus_policy_db_entry *
++kdbus_policy_lookup(struct kdbus_policy_db *db, const char *name, u32 hash)
++{
++	struct kdbus_policy_db_entry *e;
++	const char *dot;
++	size_t len;
++
++	/* find exact match */
++	hash_for_each_possible(db->entries_hash, e, hentry, hash)
++		if (strcmp(e->name, name) == 0 && !e->wildcard)
++			return e;
++
++	/* find wildcard match */
++
++	dot = strrchr(name, '.');
++	if (!dot)
++		return NULL;
++
++	len = dot - name;
++	hash = kdbus_strnhash(name, len);
++
++	hash_for_each_possible(db->entries_hash, e, hentry, hash)
++		if (e->wildcard && !strncmp(e->name, name, len) &&
++		    !e->name[len])
++			return e;
++
++	return NULL;
++}
++
++/**
++ * kdbus_policy_db_clear() - release all memory from a policy db
++ * @db:		The policy database
++ */
++void kdbus_policy_db_clear(struct kdbus_policy_db *db)
++{
++	struct kdbus_policy_db_entry *e;
++	struct hlist_node *tmp;
++	unsigned int i;
++
++	/* purge entries */
++	down_write(&db->entries_rwlock);
++	hash_for_each_safe(db->entries_hash, i, tmp, e, hentry) {
++		hash_del(&e->hentry);
++		kdbus_policy_entry_free(e);
++	}
++	up_write(&db->entries_rwlock);
++}
++
++/**
++ * kdbus_policy_db_init() - initialize a new policy database
++ * @db:		The location of the database
++ *
++ * This initializes a new policy-db. The underlying memory must have been
++ * cleared to zero by the caller.
++ */
++void kdbus_policy_db_init(struct kdbus_policy_db *db)
++{
++	hash_init(db->entries_hash);
++	init_rwsem(&db->entries_rwlock);
++}
++
++/**
++ * kdbus_policy_query_unlocked() - Query the policy database
++ * @db:		Policy database
++ * @cred:	Credentials to test against
++ * @name:	Name to query
++ * @hash:	Hash value of @name
++ *
++ * Same as kdbus_policy_query() but requires the caller to lock the policy
++ * database against concurrent writes.
++ *
++ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
++ */
++int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
++				const struct cred *cred, const char *name,
++				unsigned int hash)
++{
++	struct kdbus_policy_db_entry_access *a;
++	const struct kdbus_policy_db_entry *e;
++	int i, highest = -EPERM;
++
++	e = kdbus_policy_lookup(db, name, hash);
++	if (!e)
++		return -EPERM;
++
++	list_for_each_entry(a, &e->access_list, list) {
++		if ((int)a->access <= highest)
++			continue;
++
++		switch (a->type) {
++		case KDBUS_POLICY_ACCESS_USER:
++			if (uid_eq(cred->euid, a->uid))
++				highest = a->access;
++			break;
++		case KDBUS_POLICY_ACCESS_GROUP:
++			if (gid_eq(cred->egid, a->gid)) {
++				highest = a->access;
++				break;
++			}
++
++			for (i = 0; i < cred->group_info->ngroups; i++) {
++				kgid_t gid = GROUP_AT(cred->group_info, i);
++
++				if (gid_eq(gid, a->gid)) {
++					highest = a->access;
++					break;
++				}
++			}
++
++			break;
++		case KDBUS_POLICY_ACCESS_WORLD:
++			highest = a->access;
++			break;
++		}
++
++		/* OWN is the highest possible policy */
++		if (highest >= KDBUS_POLICY_OWN)
++			break;
++	}
++
++	return highest;
++}
++
++/**
++ * kdbus_policy_query() - Query the policy database
++ * @db:		Policy database
++ * @cred:	Credentials to test against
++ * @name:	Name to query
++ * @hash:	Hash value of @name
++ *
++ * Query the policy database @db for the access rights of @cred to the name
++ * @name. The access rights of @cred are returned, or -EPERM if no access is
++ * granted.
++ *
++ * This call effectively searches for the highest access-right granted to
++ * @cred. The caller should really cache those as policy lookups are rather
++ * expensive.
++ *
++ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
++ */
++int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
++		       const char *name, unsigned int hash)
++{
++	int ret;
++
++	down_read(&db->entries_rwlock);
++	ret = kdbus_policy_query_unlocked(db, cred, name, hash);
++	up_read(&db->entries_rwlock);
++
++	return ret;
++}
++
++static void __kdbus_policy_remove_owner(struct kdbus_policy_db *db,
++					const void *owner)
++{
++	struct kdbus_policy_db_entry *e;
++	struct hlist_node *tmp;
++	int i;
++
++	hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
++		if (e->owner == owner) {
++			hash_del(&e->hentry);
++			kdbus_policy_entry_free(e);
++		}
++}
++
++/**
++ * kdbus_policy_remove_owner() - remove all entries related to a connection
++ * @db:		The policy database
++ * @owner:	The connection whose items to remove
++ */
++void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
++			       const void *owner)
++{
++	down_write(&db->entries_rwlock);
++	__kdbus_policy_remove_owner(db, owner);
++	up_write(&db->entries_rwlock);
++}
++
++/*
++ * Convert user provided policy access to internal kdbus policy
++ * access
++ */
++static struct kdbus_policy_db_entry_access *
++kdbus_policy_make_access(const struct kdbus_policy_access *uaccess)
++{
++	int ret;
++	struct kdbus_policy_db_entry_access *a;
++
++	a = kzalloc(sizeof(*a), GFP_KERNEL);
++	if (!a)
++		return ERR_PTR(-ENOMEM);
++
++	ret = -EINVAL;
++	switch (uaccess->access) {
++	case KDBUS_POLICY_SEE:
++	case KDBUS_POLICY_TALK:
++	case KDBUS_POLICY_OWN:
++		a->access = uaccess->access;
++		break;
++	default:
++		goto err;
++	}
++
++	switch (uaccess->type) {
++	case KDBUS_POLICY_ACCESS_USER:
++		a->uid = make_kuid(current_user_ns(), uaccess->id);
++		if (!uid_valid(a->uid))
++			goto err;
++
++		break;
++	case KDBUS_POLICY_ACCESS_GROUP:
++		a->gid = make_kgid(current_user_ns(), uaccess->id);
++		if (!gid_valid(a->gid))
++			goto err;
++
++		break;
++	case KDBUS_POLICY_ACCESS_WORLD:
++		break;
++	default:
++		goto err;
++	}
++
++	a->type = uaccess->type;
++
++	return a;
++
++err:
++	kfree(a);
++	return ERR_PTR(ret);
++}
++
++/**
++ * kdbus_policy_set() - set a connection's policy rules
++ * @db:				The policy database
++ * @items:			A list of kdbus_item elements that contain both
++ *				names and access rules to set.
++ * @items_size:			The total size of the items.
++ * @max_policies:		The maximum number of policy entries to allow.
++ *				Pass 0 for no limit.
++ * @allow_wildcards:		Boolean value whether wildcard entries (such as
++ *				those ending in '.*') should be allowed.
++ * @owner:			The owner of the new policy items.
++ *
++ * This function sets a new set of policies for a given owner. The names and
++ * access rules are gathered by walking the list of items passed in as
++ * argument. An item of type KDBUS_ITEM_NAME is expected before any number of
++ * KDBUS_ITEM_POLICY_ACCESS items. If there are more repetitions of this
++ * pattern than denoted in @max_policies, -E2BIG is returned.
++ *
++ * In order to allow atomic replacement of rules, the function first removes
++ * all entries that have been created for the given owner previously.
++ *
++ * Callers to this function must make sure that the owner is a custom
++ * endpoint, or if the endpoint is a default endpoint, then it must be
++ * either a policy holder or an activator.
++ *
++ * Return: 0 on success, negative errno on failure.
++ */
++int kdbus_policy_set(struct kdbus_policy_db *db,
++		     const struct kdbus_item *items,
++		     size_t items_size,
++		     size_t max_policies,
++		     bool allow_wildcards,
++		     const void *owner)
++{
++	struct kdbus_policy_db_entry_access *a;
++	struct kdbus_policy_db_entry *e, *p;
++	const struct kdbus_item *item;
++	struct hlist_node *tmp;
++	HLIST_HEAD(entries);
++	HLIST_HEAD(restore);
++	size_t count = 0;
++	int i, ret = 0;
++	u32 hash;
++
++	/* Walk the list of items and look for new policies */
++	e = NULL;
++	KDBUS_ITEMS_FOREACH(item, items, items_size) {
++		switch (item->type) {
++		case KDBUS_ITEM_NAME: {
++			size_t len;
++
++			if (max_policies && ++count > max_policies) {
++				ret = -E2BIG;
++				goto exit;
++			}
++
++			if (!kdbus_name_is_valid(item->str, true)) {
++				ret = -EINVAL;
++				goto exit;
++			}
++
++			e = kzalloc(sizeof(*e), GFP_KERNEL);
++			if (!e) {
++				ret = -ENOMEM;
++				goto exit;
++			}
++
++			INIT_LIST_HEAD(&e->access_list);
++			e->owner = owner;
++			hlist_add_head(&e->hentry, &entries);
++
++			e->name = kstrdup(item->str, GFP_KERNEL);
++			if (!e->name) {
++				ret = -ENOMEM;
++				goto exit;
++			}
++
++			/*
++			 * If a supplied name ends with '.*', cut off that
++			 * part, only store anything before it, and mark the
++			 * entry as wildcard.
++			 */
++			len = strlen(e->name);
++			if (len > 2 &&
++			    e->name[len - 2] == '.' &&
++			    e->name[len - 1] == '*') {
++				if (!allow_wildcards) {
++					ret = -EINVAL;
++					goto exit;
++				}
++
++				e->name[len - 2] = '\0';
++				e->wildcard = true;
++			}
++
++			break;
++		}
++
++		case KDBUS_ITEM_POLICY_ACCESS:
++			if (!e) {
++				ret = -EINVAL;
++				goto exit;
++			}
++
++			a = kdbus_policy_make_access(&item->policy_access);
++			if (IS_ERR(a)) {
++				ret = PTR_ERR(a);
++				goto exit;
++			}
++
++			list_add_tail(&a->list, &e->access_list);
++			break;
++		}
++	}
++
++	down_write(&db->entries_rwlock);
++
++	/* remember previous entries to restore in case of failure */
++	hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
++		if (e->owner == owner) {
++			hash_del(&e->hentry);
++			hlist_add_head(&e->hentry, &restore);
++		}
++
++	hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
++		/* prevent duplicates */
++		hash = kdbus_strhash(e->name);
++		hash_for_each_possible(db->entries_hash, p, hentry, hash)
++			if (strcmp(e->name, p->name) == 0 &&
++			    e->wildcard == p->wildcard) {
++				ret = -EEXIST;
++				goto restore;
++			}
++
++		hlist_del(&e->hentry);
++		hash_add(db->entries_hash, &e->hentry, hash);
++	}
++
++restore:
++	/* if we failed, flush all entries we added so far */
++	if (ret < 0)
++		__kdbus_policy_remove_owner(db, owner);
++
++	/* if we failed, restore entries, otherwise release them */
++	hlist_for_each_entry_safe(e, tmp, &restore, hentry) {
++		hlist_del(&e->hentry);
++		if (ret < 0) {
++			hash = kdbus_strhash(e->name);
++			hash_add(db->entries_hash, &e->hentry, hash);
++		} else {
++			kdbus_policy_entry_free(e);
++		}
++	}
++
++	up_write(&db->entries_rwlock);
++
++exit:
++	hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
++		hlist_del(&e->hentry);
++		kdbus_policy_entry_free(e);
++	}
++
++	return ret;
++}
+diff --git a/ipc/kdbus/policy.h b/ipc/kdbus/policy.h
+new file mode 100644
+index 0000000..15dd7bc
+--- /dev/null
++++ b/ipc/kdbus/policy.h
+@@ -0,0 +1,51 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_POLICY_H
++#define __KDBUS_POLICY_H
++
++#include <linux/hashtable.h>
++#include <linux/rwsem.h>
++
++struct kdbus_conn;
++struct kdbus_item;
++
++/**
++ * struct kdbus_policy_db - policy database
++ * @entries_hash:	Hashtable of entries
++ * @entries_rwlock:	Read/write semaphore protecting the database's entries
++ */
++struct kdbus_policy_db {
++	DECLARE_HASHTABLE(entries_hash, 6);
++	struct rw_semaphore entries_rwlock;
++};
++
++void kdbus_policy_db_init(struct kdbus_policy_db *db);
++void kdbus_policy_db_clear(struct kdbus_policy_db *db);
++
++int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
++				const struct cred *cred, const char *name,
++				unsigned int hash);
++int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
++		       const char *name, unsigned int hash);
++
++void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
++			       const void *owner);
++int kdbus_policy_set(struct kdbus_policy_db *db,
++		     const struct kdbus_item *items,
++		     size_t items_size,
++		     size_t max_policies,
++		     bool allow_wildcards,
++		     const void *owner);
++
++#endif
+diff --git a/ipc/kdbus/pool.c b/ipc/kdbus/pool.c
+new file mode 100644
+index 0000000..63ccd55
+--- /dev/null
++++ b/ipc/kdbus/pool.c
+@@ -0,0 +1,728 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/aio.h>
++#include <linux/file.h>
++#include <linux/fs.h>
++#include <linux/highmem.h>
++#include <linux/init.h>
++#include <linux/mm.h>
++#include <linux/module.h>
++#include <linux/pagemap.h>
++#include <linux/rbtree.h>
++#include <linux/sched.h>
++#include <linux/shmem_fs.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <linux/uio.h>
++
++#include "pool.h"
++#include "util.h"
++
++/**
++ * struct kdbus_pool - the receiver's buffer
++ * @f:			The backing shmem file
++ * @size:		The size of the file
++ * @accounted_size:	Currently accounted memory in bytes
++ * @lock:		Pool data lock
++ * @slices:		All slices sorted by address
++ * @slices_busy:	Tree of allocated slices
++ * @slices_free:	Tree of free slices
++ *
++ * The receiver's buffer, managed as a pool of allocated and free
++ * slices containing the queued messages.
++ *
++ * Messages sent with KDBUS_CMD_SEND are copied directly by the
++ * sending process into the receiver's pool.
++ *
++ * Messages received with KDBUS_CMD_RECV just return the offset
++ * to the data placed in the pool.
++ *
++ * The internally allocated memory needs to be returned by the receiver
++ * with KDBUS_CMD_FREE.
++ */
++struct kdbus_pool {
++	struct file *f;
++	size_t size;
++	size_t accounted_size;
++	struct mutex lock;
++
++	struct list_head slices;
++	struct rb_root slices_busy;
++	struct rb_root slices_free;
++};
++
++/**
++ * struct kdbus_pool_slice - allocated element in kdbus_pool
++ * @pool:		Pool this slice belongs to
++ * @off:		Offset of slice in the shmem file
++ * @size:		Size of slice
++ * @entry:		Entry in "all slices" list
++ * @rb_node:		Entry in free or busy tree
++ * @free:		Unused slice
++ * @accounted:		Accounted as queue slice
++ * @ref_kernel:		Kernel holds a reference
++ * @ref_user:		Userspace holds a reference
++ *
++ * The pool has one or more slices, always spanning the entire size of the
++ * pool.
++ *
++ * Every slice is an element in a list sorted by the buffer address, to
++ * provide access to the next neighbor slice.
++ *
++ * Every slice is a member of either the busy or the free tree. The free
++ * tree is organized by slice size, the busy tree by buffer
++ * offset.
++ */
++struct kdbus_pool_slice {
++	struct kdbus_pool *pool;
++	size_t off;
++	size_t size;
++
++	struct list_head entry;
++	struct rb_node rb_node;
++
++	bool free:1;
++	bool accounted:1;
++	bool ref_kernel:1;
++	bool ref_user:1;
++};
++
++static struct kdbus_pool_slice *kdbus_pool_slice_new(struct kdbus_pool *pool,
++						     size_t off, size_t size)
++{
++	struct kdbus_pool_slice *slice;
++
++	slice = kzalloc(sizeof(*slice), GFP_KERNEL);
++	if (!slice)
++		return NULL;
++
++	slice->pool = pool;
++	slice->off = off;
++	slice->size = size;
++	slice->free = true;
++	return slice;
++}
++
++/* insert a slice into the free tree */
++static void kdbus_pool_add_free_slice(struct kdbus_pool *pool,
++				      struct kdbus_pool_slice *slice)
++{
++	struct rb_node **n;
++	struct rb_node *pn = NULL;
++
++	n = &pool->slices_free.rb_node;
++	while (*n) {
++		struct kdbus_pool_slice *pslice;
++
++		pn = *n;
++		pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
++		if (slice->size < pslice->size)
++			n = &pn->rb_left;
++		else
++			n = &pn->rb_right;
++	}
++
++	rb_link_node(&slice->rb_node, pn, n);
++	rb_insert_color(&slice->rb_node, &pool->slices_free);
++}
++
++/* insert a slice into the busy tree */
++static void kdbus_pool_add_busy_slice(struct kdbus_pool *pool,
++				      struct kdbus_pool_slice *slice)
++{
++	struct rb_node **n;
++	struct rb_node *pn = NULL;
++
++	n = &pool->slices_busy.rb_node;
++	while (*n) {
++		struct kdbus_pool_slice *pslice;
++
++		pn = *n;
++		pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
++		if (slice->off < pslice->off)
++			n = &pn->rb_left;
++		else if (slice->off > pslice->off)
++			n = &pn->rb_right;
++		else
++			BUG();
++	}
++
++	rb_link_node(&slice->rb_node, pn, n);
++	rb_insert_color(&slice->rb_node, &pool->slices_busy);
++}
++
++static struct kdbus_pool_slice *kdbus_pool_find_slice(struct kdbus_pool *pool,
++						      size_t off)
++{
++	struct rb_node *n;
++
++	n = pool->slices_busy.rb_node;
++	while (n) {
++		struct kdbus_pool_slice *s;
++
++		s = rb_entry(n, struct kdbus_pool_slice, rb_node);
++		if (off < s->off)
++			n = n->rb_left;
++		else if (off > s->off)
++			n = n->rb_right;
++		else
++			return s;
++	}
++
++	return NULL;
++}
++
++/**
++ * kdbus_pool_slice_alloc() - allocate memory from a pool
++ * @pool:	The receiver's pool
++ * @size:	The number of bytes to allocate
++ * @accounted:	Whether this slice should be accounted for
++ *
++ * The returned slice must be passed to kdbus_pool_slice_release() to
++ * free the allocated memory. No data is copied into the new slice here;
++ * callers fill it via kdbus_pool_slice_copy_iovec() or
++ * kdbus_pool_slice_copy_kvec().
++ *
++ * Return: the allocated slice on success, ERR_PTR on failure.
++ */
++struct kdbus_pool_slice *kdbus_pool_slice_alloc(struct kdbus_pool *pool,
++						size_t size, bool accounted)
++{
++	size_t slice_size = KDBUS_ALIGN8(size);
++	struct rb_node *n, *found = NULL;
++	struct kdbus_pool_slice *s;
++	int ret = 0;
++
++	if (WARN_ON(!size))
++		return ERR_PTR(-EINVAL);
++
++	/* search a free slice with the closest matching size */
++	mutex_lock(&pool->lock);
++	n = pool->slices_free.rb_node;
++	while (n) {
++		s = rb_entry(n, struct kdbus_pool_slice, rb_node);
++		if (slice_size < s->size) {
++			found = n;
++			n = n->rb_left;
++		} else if (slice_size > s->size) {
++			n = n->rb_right;
++		} else {
++			found = n;
++			break;
++		}
++	}
++
++	/* no slice with the minimum size found in the pool */
++	if (!found) {
++		ret = -EXFULL;
++		goto exit_unlock;
++	}
++
++	/* no exact match, use the closest one */
++	if (!n) {
++		struct kdbus_pool_slice *s_new;
++
++		s = rb_entry(found, struct kdbus_pool_slice, rb_node);
++
++		/* split-off the remainder of the size to its own slice */
++		s_new = kdbus_pool_slice_new(pool, s->off + slice_size,
++					     s->size - slice_size);
++		if (!s_new) {
++			ret = -ENOMEM;
++			goto exit_unlock;
++		}
++
++		list_add(&s_new->entry, &s->entry);
++		kdbus_pool_add_free_slice(pool, s_new);
++
++		/* adjust our size now that we split-off another slice */
++		s->size = slice_size;
++	}
++
++	/* move slice from free to the busy tree */
++	rb_erase(found, &pool->slices_free);
++	kdbus_pool_add_busy_slice(pool, s);
++
++	WARN_ON(s->ref_kernel || s->ref_user);
++
++	s->ref_kernel = true;
++	s->free = false;
++	s->accounted = accounted;
++	if (accounted)
++		pool->accounted_size += s->size;
++	mutex_unlock(&pool->lock);
++
++	return s;
++
++exit_unlock:
++	mutex_unlock(&pool->lock);
++	return ERR_PTR(ret);
++}
++
++static void __kdbus_pool_slice_release(struct kdbus_pool_slice *slice)
++{
++	struct kdbus_pool *pool = slice->pool;
++
++	/* don't free the slice if kernel or user-space holds a reference */
++	if (slice->ref_kernel || slice->ref_user)
++		return;
++
++	if (WARN_ON(slice->free))
++		return;
++
++	rb_erase(&slice->rb_node, &pool->slices_busy);
++
++	/* merge with the next free slice */
++	if (!list_is_last(&slice->entry, &pool->slices)) {
++		struct kdbus_pool_slice *s;
++
++		s = list_entry(slice->entry.next,
++			       struct kdbus_pool_slice, entry);
++		if (s->free) {
++			rb_erase(&s->rb_node, &pool->slices_free);
++			list_del(&s->entry);
++			slice->size += s->size;
++			kfree(s);
++		}
++	}
++
++	/* merge with previous free slice */
++	if (pool->slices.next != &slice->entry) {
++		struct kdbus_pool_slice *s;
++
++		s = list_entry(slice->entry.prev,
++			       struct kdbus_pool_slice, entry);
++		if (s->free) {
++			rb_erase(&s->rb_node, &pool->slices_free);
++			list_del(&slice->entry);
++			s->size += slice->size;
++			kfree(slice);
++			slice = s;
++		}
++	}
++
++	slice->free = true;
++	kdbus_pool_add_free_slice(pool, slice);
++}
++
++/**
++ * kdbus_pool_slice_release() - drop kernel-reference on allocated slice
++ * @slice:		Slice allocated from the pool
++ *
++ * This releases the kernel-reference on the given slice. If the
++ * kernel-reference and the user-reference on a slice are dropped, the slice is
++ * returned to the pool.
++ *
++ * So far, we do not implement full ref-counting on slices. Kernel and
++ * user-space can each hold at most one reference to a slice. Once both
++ * references are dropped, the slice is released.
++ */
++void kdbus_pool_slice_release(struct kdbus_pool_slice *slice)
++{
++	struct kdbus_pool *pool;
++
++	if (!slice)
++		return;
++
++	/* @slice may be freed, so keep local ptr to @pool */
++	pool = slice->pool;
++
++	mutex_lock(&pool->lock);
++	/* kernel must own a ref to @slice to drop it */
++	WARN_ON(!slice->ref_kernel);
++	slice->ref_kernel = false;
++	/* no longer kernel-owned, de-account slice */
++	if (slice->accounted && !WARN_ON(pool->accounted_size < slice->size))
++		pool->accounted_size -= slice->size;
++	__kdbus_pool_slice_release(slice);
++	mutex_unlock(&pool->lock);
++}
++
++/**
++ * kdbus_pool_release_offset() - release a public offset
++ * @pool:		pool to operate on
++ * @off:		offset to release
++ *
++ * This should be called whenever user-space frees a slice given to them. It
++ * verifies the slice is available and public, and then drops it. It ensures
++ * correct locking and barriers against queues.
++ *
++ * Return: 0 on success, -ENXIO if the offset is invalid or not public.
++ */
++int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off)
++{
++	struct kdbus_pool_slice *slice;
++	int ret = 0;
++
++	/* 'pool->size' is used as dummy offset for empty slices */
++	if (off == pool->size)
++		return 0;
++
++	mutex_lock(&pool->lock);
++	slice = kdbus_pool_find_slice(pool, off);
++	if (slice && slice->ref_user) {
++		slice->ref_user = false;
++		__kdbus_pool_slice_release(slice);
++	} else {
++		ret = -ENXIO;
++	}
++	mutex_unlock(&pool->lock);
++
++	return ret;
++}
++
++/**
++ * kdbus_pool_publish_empty() - publish empty slice to user-space
++ * @pool:		pool to operate on
++ * @off:		output storage for offset, or NULL
++ * @size:		output storage for size, or NULL
++ *
++ * This is the same as kdbus_pool_slice_publish(), but uses a dummy slice with
++ * size 0. The returned offset points to the end of the pool and is never
++ * returned on real slices.
++ */
++void kdbus_pool_publish_empty(struct kdbus_pool *pool, u64 *off, u64 *size)
++{
++	if (off)
++		*off = pool->size;
++	if (size)
++		*size = 0;
++}
++
++/**
++ * kdbus_pool_slice_publish() - publish slice to user-space
++ * @slice:		The slice
++ * @out_offset:		Output storage for offset, or NULL
++ * @out_size:		Output storage for size, or NULL
++ *
++ * This prepares a slice to be published to user-space.
++ *
++ * This call combines the following operations:
++ *   * the memory region is flushed so the user's memory view is consistent
++ *   * the slice is marked as referenced by user-space, so user-space has to
++ *     call KDBUS_CMD_FREE to release it
++ *   * the offset and size of the slice are written to the given output
++ *     arguments, if non-NULL
++ */
++void kdbus_pool_slice_publish(struct kdbus_pool_slice *slice,
++			      u64 *out_offset, u64 *out_size)
++{
++	mutex_lock(&slice->pool->lock);
++	/* kernel must own a ref to @slice to gain a user-space ref */
++	WARN_ON(!slice->ref_kernel);
++	slice->ref_user = true;
++	mutex_unlock(&slice->pool->lock);
++
++	if (out_offset)
++		*out_offset = slice->off;
++	if (out_size)
++		*out_size = slice->size;
++}
++
++/**
++ * kdbus_pool_slice_offset() - Get a slice's offset inside the pool
++ * @slice:	Slice to return the offset of
++ *
++ * Return: The internal offset of @slice inside the pool.
++ */
++off_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice)
++{
++	return slice->off;
++}
++
++/**
++ * kdbus_pool_slice_size() - get size of a pool slice
++ * @slice:	slice to query
++ *
++ * Return: size of the given slice
++ */
++size_t kdbus_pool_slice_size(const struct kdbus_pool_slice *slice)
++{
++	return slice->size;
++}
++
++/**
++ * kdbus_pool_new() - create a new pool
++ * @name:		Name of the (deleted) file which shows up in
++ *			/proc, used for debugging
++ * @size:		Maximum size of the pool
++ *
++ * Return: a new kdbus_pool on success, ERR_PTR on failure.
++ */
++struct kdbus_pool *kdbus_pool_new(const char *name, size_t size)
++{
++	struct kdbus_pool_slice *s;
++	struct kdbus_pool *p;
++	struct file *f;
++	char *n = NULL;
++	int ret;
++
++	p = kzalloc(sizeof(*p), GFP_KERNEL);
++	if (!p)
++		return ERR_PTR(-ENOMEM);
++
++	if (name) {
++		n = kasprintf(GFP_KERNEL, KBUILD_MODNAME "-conn:%s", name);
++		if (!n) {
++			ret = -ENOMEM;
++			goto exit_free;
++		}
++	}
++
++	f = shmem_file_setup(n ?: KBUILD_MODNAME "-conn", size, 0);
++	kfree(n);
++
++	if (IS_ERR(f)) {
++		ret = PTR_ERR(f);
++		goto exit_free;
++	}
++
++	ret = get_write_access(file_inode(f));
++	if (ret < 0)
++		goto exit_put_shmem;
++
++	/* allocate first slice spanning the entire pool */
++	s = kdbus_pool_slice_new(p, 0, size);
++	if (!s) {
++		ret = -ENOMEM;
++		goto exit_put_write;
++	}
++
++	p->f = f;
++	p->size = size;
++	p->slices_free = RB_ROOT;
++	p->slices_busy = RB_ROOT;
++	mutex_init(&p->lock);
++
++	INIT_LIST_HEAD(&p->slices);
++	list_add(&s->entry, &p->slices);
++
++	kdbus_pool_add_free_slice(p, s);
++	return p;
++
++exit_put_write:
++	put_write_access(file_inode(f));
++exit_put_shmem:
++	fput(f);
++exit_free:
++	kfree(p);
++	return ERR_PTR(ret);
++}
++
++/**
++ * kdbus_pool_free() - destroy pool
++ * @pool:		The receiver's pool
++ */
++void kdbus_pool_free(struct kdbus_pool *pool)
++{
++	struct kdbus_pool_slice *s, *tmp;
++
++	if (!pool)
++		return;
++
++	list_for_each_entry_safe(s, tmp, &pool->slices, entry) {
++		list_del(&s->entry);
++		kfree(s);
++	}
++
++	put_write_access(file_inode(pool->f));
++	fput(pool->f);
++	kfree(pool);
++}
++
++/**
++ * kdbus_pool_accounted() - retrieve accounting information
++ * @pool:		pool to query
++ * @size:		output for overall pool size
++ * @acc:		output for currently accounted size
++ *
++ * This returns accounting information of the pool. Note that the data might
++ * change after the function returns, as the pool lock is dropped. You need to
++ * protect the data via other means, if you need reliable accounting.
++ */
++void kdbus_pool_accounted(struct kdbus_pool *pool, size_t *size, size_t *acc)
++{
++	mutex_lock(&pool->lock);
++	if (size)
++		*size = pool->size;
++	if (acc)
++		*acc = pool->accounted_size;
++	mutex_unlock(&pool->lock);
++}
++
++/**
++ * kdbus_pool_slice_copy_iovec() - copy user memory to a slice
++ * @slice:		The slice to write to
++ * @off:		Offset in the slice to write to
++ * @iov:		iovec array, pointing to data to copy
++ * @iov_len:		Number of elements in @iov
++ * @total_len:		Total number of bytes described in members of @iov
++ *
++ * User memory referenced by @iov will be copied into @slice at offset @off.
++ *
++ * Return: the number of bytes copied, negative errno on failure.
++ */
++ssize_t
++kdbus_pool_slice_copy_iovec(const struct kdbus_pool_slice *slice, loff_t off,
++			    struct iovec *iov, size_t iov_len, size_t total_len)
++{
++	struct iov_iter iter;
++	ssize_t len;
++
++	if (WARN_ON(off + total_len > slice->size))
++		return -EFAULT;
++
++	off += slice->off;
++	iov_iter_init(&iter, WRITE, iov, iov_len, total_len);
++	len = vfs_iter_write(slice->pool->f, &iter, &off);
++
++	return (len >= 0 && len != total_len) ? -EFAULT : len;
++}
++
++/**
++ * kdbus_pool_slice_copy_kvec() - copy kernel memory to a slice
++ * @slice:		The slice to write to
++ * @off:		Offset in the slice to write to
++ * @kvec:		kvec array, pointing to data to copy
++ * @kvec_len:		Number of elements in @kvec
++ * @total_len:		Total number of bytes described in members of @kvec
++ *
++ * Kernel memory referenced by @kvec will be copied into @slice at offset @off.
++ *
++ * Return: the number of bytes copied, negative errno on failure.
++ */
++ssize_t kdbus_pool_slice_copy_kvec(const struct kdbus_pool_slice *slice,
++				   loff_t off, struct kvec *kvec,
++				   size_t kvec_len, size_t total_len)
++{
++	struct iov_iter iter;
++	mm_segment_t old_fs;
++	ssize_t len;
++
++	if (WARN_ON(off + total_len > slice->size))
++		return -EFAULT;
++
++	off += slice->off;
++	iov_iter_kvec(&iter, WRITE | ITER_KVEC, kvec, kvec_len, total_len);
++
++	old_fs = get_fs();
++	set_fs(get_ds());
++	len = vfs_iter_write(slice->pool->f, &iter, &off);
++	set_fs(old_fs);
++
++	return (len >= 0 && len != total_len) ? -EFAULT : len;
++}
++
++/**
++ * kdbus_pool_slice_copy() - copy data from one slice into another
++ * @slice_dst:		destination slice
++ * @slice_src:		source slice
++ *
++ * Return: 0 on success, negative error number on failure.
++ */
++int kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice_dst,
++			  const struct kdbus_pool_slice *slice_src)
++{
++	struct file *f_src = slice_src->pool->f;
++	struct file *f_dst = slice_dst->pool->f;
++	struct inode *i_dst = file_inode(f_dst);
++	struct address_space *mapping_dst = f_dst->f_mapping;
++	const struct address_space_operations *aops = mapping_dst->a_ops;
++	unsigned long len = slice_src->size;
++	loff_t off_src = slice_src->off;
++	loff_t off_dst = slice_dst->off;
++	mm_segment_t old_fs;
++	int ret = 0;
++
++	if (WARN_ON(slice_src->size != slice_dst->size) ||
++	    WARN_ON(slice_src->free || slice_dst->free))
++		return -EINVAL;
++
++	mutex_lock(&i_dst->i_mutex);
++	old_fs = get_fs();
++	set_fs(get_ds());
++	while (len > 0) {
++		unsigned long page_off;
++		unsigned long copy_len;
++		char __user *kaddr;
++		struct page *page;
++		ssize_t n_read;
++		void *fsdata;
++		long status;
++
++		page_off = off_dst & (PAGE_CACHE_SIZE - 1);
++		copy_len = min_t(unsigned long,
++				 PAGE_CACHE_SIZE - page_off, len);
++
++		status = aops->write_begin(f_dst, mapping_dst, off_dst,
++					   copy_len, 0, &page, &fsdata);
++		if (unlikely(status < 0)) {
++			ret = status;
++			break;
++		}
++
++		kaddr = (char __force __user *)kmap(page) + page_off;
++		n_read = __vfs_read(f_src, kaddr, copy_len, &off_src);
++		kunmap(page);
++		mark_page_accessed(page);
++		flush_dcache_page(page);
++
++		if (unlikely(n_read != copy_len)) {
++			ret = -EFAULT;
++			break;
++		}
++
++		status = aops->write_end(f_dst, mapping_dst, off_dst,
++					 copy_len, copy_len, page, fsdata);
++		if (unlikely(status != copy_len)) {
++			ret = -EFAULT;
++			break;
++		}
++
++		off_dst += copy_len;
++		len -= copy_len;
++	}
++	set_fs(old_fs);
++	mutex_unlock(&i_dst->i_mutex);
++
++	return ret;
++}
++
++/**
++ * kdbus_pool_mmap() - map the pool into the process
++ * @pool:		The receiver's pool
++ * @vma:		passed by mmap() syscall
++ *
++ * Return: the result of the mmap() call, negative errno on failure.
++ */
++int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma)
++{
++	/* deny write access to the pool */
++	if (vma->vm_flags & VM_WRITE)
++		return -EPERM;
++	vma->vm_flags &= ~VM_MAYWRITE;
++
++	/* do not allow mapping more than the size of the file */
++	if ((vma->vm_end - vma->vm_start) > pool->size)
++		return -EFAULT;
++
++	/* replace the connection file with our shmem file */
++	if (vma->vm_file)
++		fput(vma->vm_file);
++	vma->vm_file = get_file(pool->f);
++
++	return pool->f->f_op->mmap(pool->f, vma);
++}
+diff --git a/ipc/kdbus/pool.h b/ipc/kdbus/pool.h
+new file mode 100644
+index 0000000..a903821
+--- /dev/null
++++ b/ipc/kdbus/pool.h
+@@ -0,0 +1,46 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_POOL_H
++#define __KDBUS_POOL_H
++
++#include <linux/uio.h>
++
++struct kdbus_pool;
++struct kdbus_pool_slice;
++
++struct kdbus_pool *kdbus_pool_new(const char *name, size_t size);
++void kdbus_pool_free(struct kdbus_pool *pool);
++void kdbus_pool_accounted(struct kdbus_pool *pool, size_t *size, size_t *acc);
++int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma);
++int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off);
++void kdbus_pool_publish_empty(struct kdbus_pool *pool, u64 *off, u64 *size);
++
++struct kdbus_pool_slice *kdbus_pool_slice_alloc(struct kdbus_pool *pool,
++						size_t size, bool accounted);
++void kdbus_pool_slice_release(struct kdbus_pool_slice *slice);
++void kdbus_pool_slice_publish(struct kdbus_pool_slice *slice,
++			      u64 *out_offset, u64 *out_size);
++off_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice);
++size_t kdbus_pool_slice_size(const struct kdbus_pool_slice *slice);
++int kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice_dst,
++			  const struct kdbus_pool_slice *slice_src);
++ssize_t kdbus_pool_slice_copy_kvec(const struct kdbus_pool_slice *slice,
++				   loff_t off, struct kvec *kvec,
++				   size_t kvec_count, size_t total_len);
++ssize_t kdbus_pool_slice_copy_iovec(const struct kdbus_pool_slice *slice,
++				    loff_t off, struct iovec *iov,
++				    size_t iov_count, size_t total_len);
++
++#endif
+diff --git a/ipc/kdbus/queue.c b/ipc/kdbus/queue.c
+new file mode 100644
+index 0000000..f9c44d7
+--- /dev/null
++++ b/ipc/kdbus/queue.c
+@@ -0,0 +1,363 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/audit.h>
++#include <linux/file.h>
++#include <linux/fs.h>
++#include <linux/hashtable.h>
++#include <linux/idr.h>
++#include <linux/init.h>
++#include <linux/math64.h>
++#include <linux/mm.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/poll.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/syscalls.h>
++#include <linux/uio.h>
++
++#include "util.h"
++#include "domain.h"
++#include "connection.h"
++#include "item.h"
++#include "message.h"
++#include "metadata.h"
++#include "queue.h"
++#include "reply.h"
++
++/**
++ * kdbus_queue_init() - initialize data structure related to a queue
++ * @queue:	The queue to initialize
++ */
++void kdbus_queue_init(struct kdbus_queue *queue)
++{
++	INIT_LIST_HEAD(&queue->msg_list);
++	queue->msg_prio_queue = RB_ROOT;
++}
++
++/**
++ * kdbus_queue_peek() - Retrieves an entry from a queue
++ * @queue:		The queue
++ * @priority:		The minimum priority of the entry to peek
++ * @use_priority:	Boolean flag whether or not to peek by priority
++ *
++ * Look for an entry in a queue, either by priority, or the oldest one (FIFO).
++ * The entry is neither freed nor removed from the queue's lists.
++ *
++ * Return: the peeked queue entry on success, NULL if no suitable msg is found
++ */
++struct kdbus_queue_entry *kdbus_queue_peek(struct kdbus_queue *queue,
++					   s64 priority, bool use_priority)
++{
++	struct kdbus_queue_entry *e;
++
++	if (list_empty(&queue->msg_list))
++		return NULL;
++
++	if (use_priority) {
++		/* get next entry with highest priority */
++		e = rb_entry(queue->msg_prio_highest,
++			     struct kdbus_queue_entry, prio_node);
++
++		/* no entry with the requested priority */
++		if (e->priority > priority)
++			return NULL;
++	} else {
++		/* ignore the priority, return the next entry in the list */
++		e = list_first_entry(&queue->msg_list,
++				     struct kdbus_queue_entry, entry);
++	}
++
++	return e;
++}
++
++static void kdbus_queue_entry_link(struct kdbus_queue_entry *entry)
++{
++	struct kdbus_queue *queue = &entry->conn->queue;
++	struct rb_node **n, *pn = NULL;
++	bool highest = true;
++
++	lockdep_assert_held(&entry->conn->lock);
++	if (WARN_ON(!list_empty(&entry->entry)))
++		return;
++
++	/* sort into priority entry tree */
++	n = &queue->msg_prio_queue.rb_node;
++	while (*n) {
++		struct kdbus_queue_entry *e;
++
++		pn = *n;
++		e = rb_entry(pn, struct kdbus_queue_entry, prio_node);
++
++		/* existing node for this priority, add to its list */
++		if (likely(entry->priority == e->priority)) {
++			list_add_tail(&entry->prio_entry, &e->prio_entry);
++			goto prio_done;
++		}
++
++		if (entry->priority < e->priority) {
++			n = &pn->rb_left;
++		} else {
++			n = &pn->rb_right;
++			highest = false;
++		}
++	}
++
++	/* cache highest-priority entry */
++	if (highest)
++		queue->msg_prio_highest = &entry->prio_node;
++
++	/* new node for this priority */
++	rb_link_node(&entry->prio_node, pn, n);
++	rb_insert_color(&entry->prio_node, &queue->msg_prio_queue);
++	INIT_LIST_HEAD(&entry->prio_entry);
++
++prio_done:
++	/* add to unsorted fifo list */
++	list_add_tail(&entry->entry, &queue->msg_list);
++}
++
++static void kdbus_queue_entry_unlink(struct kdbus_queue_entry *entry)
++{
++	struct kdbus_queue *queue = &entry->conn->queue;
++
++	lockdep_assert_held(&entry->conn->lock);
++	if (list_empty(&entry->entry))
++		return;
++
++	list_del_init(&entry->entry);
++
++	if (list_empty(&entry->prio_entry)) {
++		/*
++		 * Single entry for this priority, update cached
++		 * highest-priority entry, remove the tree node.
++		 */
++		if (queue->msg_prio_highest == &entry->prio_node)
++			queue->msg_prio_highest = rb_next(&entry->prio_node);
++
++		rb_erase(&entry->prio_node, &queue->msg_prio_queue);
++	} else {
++		struct kdbus_queue_entry *q;
++
++		/*
++		 * Multiple entries for this priority entry, get next one in
++		 * the list. Update cached highest-priority entry, store the
++		 * new one as the tree node.
++		 */
++		q = list_first_entry(&entry->prio_entry,
++				     struct kdbus_queue_entry, prio_entry);
++		list_del(&entry->prio_entry);
++
++		if (queue->msg_prio_highest == &entry->prio_node)
++			queue->msg_prio_highest = &q->prio_node;
++
++		rb_replace_node(&entry->prio_node, &q->prio_node,
++				&queue->msg_prio_queue);
++	}
++}
++
++/**
++ * kdbus_queue_entry_new() - allocate a queue entry
++ * @src:	source connection, or NULL
++ * @dst:	destination connection
++ * @s:		staging object carrying the message
++ *
++ * Allocates a queue entry based on a given msg and allocates space for
++ * the message payload and the requested metadata in the connection's pool.
++ * The entry is not actually added to the queue's lists at this point.
++ *
++ * Return: the allocated entry on success, or an ERR_PTR on failure.
++ */
++struct kdbus_queue_entry *kdbus_queue_entry_new(struct kdbus_conn *src,
++						struct kdbus_conn *dst,
++						struct kdbus_staging *s)
++{
++	struct kdbus_queue_entry *entry;
++	int ret;
++
++	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
++	if (!entry)
++		return ERR_PTR(-ENOMEM);
++
++	INIT_LIST_HEAD(&entry->entry);
++	entry->priority = s->msg->priority;
++	entry->conn = kdbus_conn_ref(dst);
++	entry->gaps = kdbus_gaps_ref(s->gaps);
++
++	entry->slice = kdbus_staging_emit(s, src, dst);
++	if (IS_ERR(entry->slice)) {
++		ret = PTR_ERR(entry->slice);
++		entry->slice = NULL;
++		goto error;
++	}
++
++	entry->user = src ? kdbus_user_ref(src->user) : NULL;
++	return entry;
++
++error:
++	kdbus_queue_entry_free(entry);
++	return ERR_PTR(ret);
++}
++
++/**
++ * kdbus_queue_entry_free() - free resources of an entry
++ * @entry:	The entry to free
++ *
++ * Removes resources allocated by a queue entry, along with the entry itself.
++ * Note that the entry's slice is not freed at this point.
++ */
++void kdbus_queue_entry_free(struct kdbus_queue_entry *entry)
++{
++	if (!entry)
++		return;
++
++	lockdep_assert_held(&entry->conn->lock);
++
++	kdbus_queue_entry_unlink(entry);
++	kdbus_reply_unref(entry->reply);
++
++	if (entry->slice) {
++		kdbus_conn_quota_dec(entry->conn, entry->user,
++				     kdbus_pool_slice_size(entry->slice),
++				     entry->gaps ? entry->gaps->n_fds : 0);
++		kdbus_pool_slice_release(entry->slice);
++	}
++
++	kdbus_user_unref(entry->user);
++	kdbus_gaps_unref(entry->gaps);
++	kdbus_conn_unref(entry->conn);
++	kfree(entry);
++}
++
++/**
++ * kdbus_queue_entry_install() - install message components into the
++ *				 receiver's process
++ * @entry:		The queue entry to install
++ * @return_flags:	Pointer to store the return flags for userspace
++ * @install_fds:	Whether or not to install associated file descriptors
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_queue_entry_install(struct kdbus_queue_entry *entry,
++			      u64 *return_flags, bool install_fds)
++{
++	bool incomplete_fds = false;
++	int ret;
++
++	lockdep_assert_held(&entry->conn->lock);
++
++	ret = kdbus_gaps_install(entry->gaps, entry->slice, &incomplete_fds);
++	if (ret < 0)
++		return ret;
++
++	if (incomplete_fds)
++		*return_flags |= KDBUS_RECV_RETURN_INCOMPLETE_FDS;
++	return 0;
++}
++
++/**
++ * kdbus_queue_entry_enqueue() - enqueue an entry
++ * @entry:		entry to enqueue
++ * @reply:		reply to link to this entry (or NULL if none)
++ *
++ * This enqueues an unqueued entry into the message queue of the linked
++ * connection. It also binds a reply object to the entry so we can remember it
++ * when the message is moved.
++ *
++ * Once this call returns (and the connection lock is released), this entry can
++ * be dequeued by the target connection. Note that the entry will not be removed
++ * from the queue until it is destroyed.
++ */
++void kdbus_queue_entry_enqueue(struct kdbus_queue_entry *entry,
++			       struct kdbus_reply *reply)
++{
++	lockdep_assert_held(&entry->conn->lock);
++
++	if (WARN_ON(entry->reply) || WARN_ON(!list_empty(&entry->entry)))
++		return;
++
++	entry->reply = kdbus_reply_ref(reply);
++	kdbus_queue_entry_link(entry);
++}
++
++/**
++ * kdbus_queue_entry_move() - move queue entry
++ * @e:		queue entry to move
++ * @dst:	destination connection to queue the entry on
++ *
++ * This moves a queue entry onto a different connection. It allocates a new
++ * slice on the target connection and copies the message over. If the copy
++ * succeeds, we move the entry from its current connection to @dst.
++ *
++ * On failure, the entry is left untouched.
++ *
++ * The queue entry must be queued right now, and after the call succeeds it will
++ * be queued on the destination, but no longer on the source.
++ *
++ * The caller must hold the connection lock of the source *and* destination.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_queue_entry_move(struct kdbus_queue_entry *e,
++			   struct kdbus_conn *dst)
++{
++	struct kdbus_pool_slice *slice = NULL;
++	struct kdbus_conn *src = e->conn;
++	size_t size, fds;
++	int ret;
++
++	lockdep_assert_held(&src->lock);
++	lockdep_assert_held(&dst->lock);
++
++	if (WARN_ON(list_empty(&e->entry)))
++		return -EINVAL;
++	if (src == dst)
++		return 0;
++
++	size = kdbus_pool_slice_size(e->slice);
++	fds = e->gaps ? e->gaps->n_fds : 0;
++
++	ret = kdbus_conn_quota_inc(dst, e->user, size, fds);
++	if (ret < 0)
++		return ret;
++
++	slice = kdbus_pool_slice_alloc(dst->pool, size, true);
++	if (IS_ERR(slice)) {
++		ret = PTR_ERR(slice);
++		slice = NULL;
++		goto error;
++	}
++
++	ret = kdbus_pool_slice_copy(slice, e->slice);
++	if (ret < 0)
++		goto error;
++
++	kdbus_queue_entry_unlink(e);
++	kdbus_conn_quota_dec(src, e->user, size, fds);
++	kdbus_pool_slice_release(e->slice);
++	kdbus_conn_unref(e->conn);
++
++	e->slice = slice;
++	e->conn = kdbus_conn_ref(dst);
++	kdbus_queue_entry_link(e);
++
++	return 0;
++
++error:
++	kdbus_pool_slice_release(slice);
++	kdbus_conn_quota_dec(dst, e->user, size, fds);
++	return ret;
++}
+diff --git a/ipc/kdbus/queue.h b/ipc/kdbus/queue.h
+new file mode 100644
+index 0000000..bf686d1
+--- /dev/null
++++ b/ipc/kdbus/queue.h
+@@ -0,0 +1,84 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_QUEUE_H
++#define __KDBUS_QUEUE_H
++
++#include <linux/list.h>
++#include <linux/rbtree.h>
++
++struct kdbus_conn;
++struct kdbus_pool_slice;
++struct kdbus_reply;
++struct kdbus_staging;
++struct kdbus_user;
++
++/**
++ * struct kdbus_queue - a connection's message queue
++ * @msg_list:		List head for kdbus_queue_entry objects
++ * @msg_prio_queue:	RB tree root for messages, sorted by priority
++ * @msg_prio_highest:	Link to the RB node referencing the message with the
++ *			highest priority in the tree.
++ */
++struct kdbus_queue {
++	struct list_head msg_list;
++	struct rb_root msg_prio_queue;
++	struct rb_node *msg_prio_highest;
++};
++
++/**
++ * struct kdbus_queue_entry - messages waiting to be read
++ * @entry:		Entry in the connection's list
++ * @prio_node:		Entry in the priority queue tree
++ * @prio_entry:		Queue tree node entry in the list of one priority
++ * @priority:		Message priority
++ * @dst_name_id:	The sequence number of the name this message is
++ *			addressed to, 0 for messages sent to an ID
++ * @conn:		Connection this entry is queued on
++ * @gaps:		Gaps object to fill message gaps at RECV time
++ * @user:		User used for accounting
++ * @slice:		Slice in the receiver's pool for the message
++ * @reply:		The reply block if a reply to this message is expected
++ */
++struct kdbus_queue_entry {
++	struct list_head entry;
++	struct rb_node prio_node;
++	struct list_head prio_entry;
++
++	s64 priority;
++	u64 dst_name_id;
++
++	struct kdbus_conn *conn;
++	struct kdbus_gaps *gaps;
++	struct kdbus_user *user;
++	struct kdbus_pool_slice *slice;
++	struct kdbus_reply *reply;
++};
++
++void kdbus_queue_init(struct kdbus_queue *queue);
++struct kdbus_queue_entry *kdbus_queue_peek(struct kdbus_queue *queue,
++					   s64 priority, bool use_priority);
++
++struct kdbus_queue_entry *kdbus_queue_entry_new(struct kdbus_conn *src,
++						struct kdbus_conn *dst,
++						struct kdbus_staging *s);
++void kdbus_queue_entry_free(struct kdbus_queue_entry *entry);
++int kdbus_queue_entry_install(struct kdbus_queue_entry *entry,
++			      u64 *return_flags, bool install_fds);
++void kdbus_queue_entry_enqueue(struct kdbus_queue_entry *entry,
++			       struct kdbus_reply *reply);
++int kdbus_queue_entry_move(struct kdbus_queue_entry *entry,
++			   struct kdbus_conn *dst);
++
++#endif /* __KDBUS_QUEUE_H */
+diff --git a/ipc/kdbus/reply.c b/ipc/kdbus/reply.c
+new file mode 100644
+index 0000000..e6791d8
+--- /dev/null
++++ b/ipc/kdbus/reply.c
+@@ -0,0 +1,252 @@
++#include <linux/init.h>
++#include <linux/mm.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/slab.h>
++#include <linux/uio.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "endpoint.h"
++#include "message.h"
++#include "metadata.h"
++#include "names.h"
++#include "domain.h"
++#include "item.h"
++#include "notify.h"
++#include "policy.h"
++#include "reply.h"
++#include "util.h"
++
++/**
++ * kdbus_reply_new() - Allocate and set up a new kdbus_reply object
++ * @reply_src:		The connection a reply is expected from
++ * @reply_dst:		The connection this reply object belongs to
++ * @msg:		Message associated with the reply
++ * @name_entry:		Name entry used to send the message
++ * @sync:		Whether or not to make this reply synchronous
++ *
++ * Allocate and fill a new kdbus_reply object.
++ *
++ * Return: New kdbus_reply object on success, ERR_PTR on error.
++ */
++struct kdbus_reply *kdbus_reply_new(struct kdbus_conn *reply_src,
++				    struct kdbus_conn *reply_dst,
++				    const struct kdbus_msg *msg,
++				    struct kdbus_name_entry *name_entry,
++				    bool sync)
++{
++	struct kdbus_reply *r;
++	int ret;
++
++	if (atomic_inc_return(&reply_dst->request_count) >
++	    KDBUS_CONN_MAX_REQUESTS_PENDING) {
++		ret = -EMLINK;
++		goto exit_dec_request_count;
++	}
++
++	r = kzalloc(sizeof(*r), GFP_KERNEL);
++	if (!r) {
++		ret = -ENOMEM;
++		goto exit_dec_request_count;
++	}
++
++	kref_init(&r->kref);
++	INIT_LIST_HEAD(&r->entry);
++	r->reply_src = kdbus_conn_ref(reply_src);
++	r->reply_dst = kdbus_conn_ref(reply_dst);
++	r->cookie = msg->cookie;
++	r->name_id = name_entry ? name_entry->name_id : 0;
++	r->deadline_ns = msg->timeout_ns;
++
++	if (sync) {
++		r->sync = true;
++		r->waiting = true;
++	}
++
++	return r;
++
++exit_dec_request_count:
++	atomic_dec(&reply_dst->request_count);
++	return ERR_PTR(ret);
++}
++
++static void __kdbus_reply_free(struct kref *kref)
++{
++	struct kdbus_reply *reply =
++		container_of(kref, struct kdbus_reply, kref);
++
++	atomic_dec(&reply->reply_dst->request_count);
++	kdbus_conn_unref(reply->reply_src);
++	kdbus_conn_unref(reply->reply_dst);
++	kfree(reply);
++}
++
++/**
++ * kdbus_reply_ref() - Increase reference on kdbus_reply
++ * @r:		The reply, may be %NULL
++ *
++ * Return: The reply object with an extra reference
++ */
++struct kdbus_reply *kdbus_reply_ref(struct kdbus_reply *r)
++{
++	if (r)
++		kref_get(&r->kref);
++	return r;
++}
++
++/**
++ * kdbus_reply_unref() - Decrease reference on kdbus_reply
++ * @r:		The reply, may be %NULL
++ *
++ * Return: NULL
++ */
++struct kdbus_reply *kdbus_reply_unref(struct kdbus_reply *r)
++{
++	if (r)
++		kref_put(&r->kref, __kdbus_reply_free);
++	return NULL;
++}
++
++/**
++ * kdbus_reply_link() - Link reply object into target connection
++ * @r:		Reply to link
++ */
++void kdbus_reply_link(struct kdbus_reply *r)
++{
++	if (WARN_ON(!list_empty(&r->entry)))
++		return;
++
++	list_add(&r->entry, &r->reply_dst->reply_list);
++	kdbus_reply_ref(r);
++}
++
++/**
++ * kdbus_reply_unlink() - Unlink reply object from target connection
++ * @r:		Reply to unlink
++ */
++void kdbus_reply_unlink(struct kdbus_reply *r)
++{
++	if (!list_empty(&r->entry)) {
++		list_del_init(&r->entry);
++		kdbus_reply_unref(r);
++	}
++}
++
++/**
++ * kdbus_sync_reply_wakeup() - Wake a synchronously blocking reply
++ * @reply:	The reply object
++ * @err:	Error code to set on the remote side
++ *
++ * Wake up remote peer (method origin) with the appropriate synchronous reply
++ * code.
++ */
++void kdbus_sync_reply_wakeup(struct kdbus_reply *reply, int err)
++{
++	if (WARN_ON(!reply->sync))
++		return;
++
++	reply->waiting = false;
++	reply->err = err;
++	wake_up_interruptible(&reply->reply_dst->wait);
++}
++
++/**
++ * kdbus_reply_find() - Find the corresponding reply object
++ * @replying:	The replying connection or NULL
++ * @reply_dst:	The connection the reply will be sent to
++ *		(method origin)
++ * @cookie:	The cookie of the requesting message
++ *
++ * Lookup a reply object that should be sent as a reply by
++ * @replying to @reply_dst with the given cookie.
++ *
++ * Callers must take the @reply_dst lock.
++ *
++ * Return: the corresponding reply object or NULL if not found
++ */
++struct kdbus_reply *kdbus_reply_find(struct kdbus_conn *replying,
++				     struct kdbus_conn *reply_dst,
++				     u64 cookie)
++{
++	struct kdbus_reply *r;
++
++	list_for_each_entry(r, &reply_dst->reply_list, entry) {
++		if (r->cookie == cookie &&
++		    (!replying || r->reply_src == replying))
++			return r;
++	}
++
++	return NULL;
++}
++
++/**
++ * kdbus_reply_list_scan_work() - Worker callback to scan the replies of a
++ *				  connection for exceeded timeouts
++ * @work:		Work struct of the connection to scan
++ *
++ * Walk the list of replies stored with a connection and look for entries
++ * that have exceeded their timeout. If such an entry is found, a timeout
++ * notification is sent to the waiting peer, and the reply is removed from
++ * the list.
++ *
++ * The work is rescheduled to the nearest timeout found during the list
++ * iteration.
++ */
++void kdbus_reply_list_scan_work(struct work_struct *work)
++{
++	struct kdbus_conn *conn =
++		container_of(work, struct kdbus_conn, work.work);
++	struct kdbus_reply *reply, *reply_tmp;
++	u64 deadline = ~0ULL;
++	u64 now;
++
++	now = ktime_get_ns();
++
++	mutex_lock(&conn->lock);
++	if (!kdbus_conn_active(conn)) {
++		mutex_unlock(&conn->lock);
++		return;
++	}
++
++	list_for_each_entry_safe(reply, reply_tmp, &conn->reply_list, entry) {
++		/*
++		 * If the reply block is waiting for synchronous I/O,
++		 * the timeout is handled by wait_event_*_timeout(),
++		 * so we don't have to care for it here.
++		 */
++		if (reply->sync && !reply->interrupted)
++			continue;
++
++		WARN_ON(reply->reply_dst != conn);
++
++		if (reply->deadline_ns > now) {
++			/* remember next timeout */
++			if (deadline > reply->deadline_ns)
++				deadline = reply->deadline_ns;
++
++			continue;
++		}
++
++		/*
++		 * A zero deadline means the connection died, was
++		 * cleaned up already and the notification was sent.
++		 * Don't send notifications for reply trackers that were
++		 * left in an interrupted syscall state.
++		 */
++		if (reply->deadline_ns != 0 && !reply->interrupted)
++			kdbus_notify_reply_timeout(conn->ep->bus, conn->id,
++						   reply->cookie);
++
++		kdbus_reply_unlink(reply);
++	}
++
++	/* rearm delayed work with next timeout */
++	if (deadline != ~0ULL)
++		schedule_delayed_work(&conn->work,
++				      nsecs_to_jiffies(deadline - now));
++
++	mutex_unlock(&conn->lock);
++
++	kdbus_notify_flush(conn->ep->bus);
++}
+diff --git a/ipc/kdbus/reply.h b/ipc/kdbus/reply.h
+new file mode 100644
+index 0000000..68d5232
+--- /dev/null
++++ b/ipc/kdbus/reply.h
+@@ -0,0 +1,68 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_REPLY_H
++#define __KDBUS_REPLY_H
++
++/**
++ * struct kdbus_reply - an entry of kdbus_conn's list of replies
++ * @kref:		Ref-count of this object
++ * @entry:		The entry of the connection's reply_list
++ * @reply_src:		The connection the reply will be sent from
++ * @reply_dst:		The connection the reply will be sent to
++ * @queue_entry:	The queue entry item that is prepared by the replying
++ *			connection
++ * @deadline_ns:	The deadline of the reply, in nanoseconds
++ * @cookie:		The cookie of the requesting message
++ * @name_id:		ID of the well-known name the original msg was sent to
++ * @sync:		The reply block is waiting for synchronous I/O
++ * @waiting:		The condition to synchronously wait for
++ * @interrupted:	The sync reply was left in an interrupted state
++ * @err:		The error code for the synchronous reply
++ */
++struct kdbus_reply {
++	struct kref kref;
++	struct list_head entry;
++	struct kdbus_conn *reply_src;
++	struct kdbus_conn *reply_dst;
++	struct kdbus_queue_entry *queue_entry;
++	u64 deadline_ns;
++	u64 cookie;
++	u64 name_id;
++	bool sync:1;
++	bool waiting:1;
++	bool interrupted:1;
++	int err;
++};
++
++struct kdbus_reply *kdbus_reply_new(struct kdbus_conn *reply_src,
++				    struct kdbus_conn *reply_dst,
++				    const struct kdbus_msg *msg,
++				    struct kdbus_name_entry *name_entry,
++				    bool sync);
++
++struct kdbus_reply *kdbus_reply_ref(struct kdbus_reply *r);
++struct kdbus_reply *kdbus_reply_unref(struct kdbus_reply *r);
++
++void kdbus_reply_link(struct kdbus_reply *r);
++void kdbus_reply_unlink(struct kdbus_reply *r);
++
++struct kdbus_reply *kdbus_reply_find(struct kdbus_conn *replying,
++				     struct kdbus_conn *reply_dst,
++				     u64 cookie);
++
++void kdbus_sync_reply_wakeup(struct kdbus_reply *reply, int err);
++void kdbus_reply_list_scan_work(struct work_struct *work);
++
++#endif /* __KDBUS_REPLY_H */
+diff --git a/ipc/kdbus/util.c b/ipc/kdbus/util.c
+new file mode 100644
+index 0000000..72b1883
+--- /dev/null
++++ b/ipc/kdbus/util.c
+@@ -0,0 +1,156 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/capability.h>
++#include <linux/cred.h>
++#include <linux/ctype.h>
++#include <linux/err.h>
++#include <linux/file.h>
++#include <linux/slab.h>
++#include <linux/string.h>
++#include <linux/uaccess.h>
++#include <linux/uio.h>
++#include <linux/user_namespace.h>
++
++#include "limits.h"
++#include "util.h"
++
++/**
++ * kdbus_copy_from_user() - copy aligned data from user-space
++ * @dest:	target buffer in kernel memory
++ * @user_ptr:	user-provided source buffer
++ * @size:	memory size to copy from user
++ *
++ * This copies @size bytes from @user_ptr into the kernel, just like
++ * copy_from_user() does. But we enforce an 8-byte alignment and reject any
++ * unaligned user-space pointers.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size)
++{
++	if (!KDBUS_IS_ALIGNED8((uintptr_t)user_ptr))
++		return -EFAULT;
++
++	if (copy_from_user(dest, user_ptr, size))
++		return -EFAULT;
++
++	return 0;
++}
++
++/**
++ * kdbus_verify_uid_prefix() - verify UID prefix of a user-supplied name
++ * @name:	user-supplied name to verify
++ * @user_ns:	user-namespace to act in
++ * @kuid:	Kernel internal uid of user
++ *
++ * This verifies that the user-supplied name @name has their UID as prefix. This
++ * is the default name-spacing policy we enforce on user-supplied names for
++ * public kdbus entities like buses and endpoints.
++ *
++ * The user must supply names prefixed with "<UID>-", where the UID is
++ * interpreted in the user-namespace of the domain. If the user fails to supply
++ * such a prefixed name, we reject it.
++ *
++ * Return: 0 on success, negative error code on failure
++ */
++int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
++			    kuid_t kuid)
++{
++	uid_t uid;
++	char prefix[16];
++
++	/*
++	 * The kuid must have a mapping into the userns of the domain
++	 * otherwise do not allow creation of buses or endpoints.
++	 */
++	uid = from_kuid(user_ns, kuid);
++	if (uid == (uid_t) -1)
++		return -EINVAL;
++
++	snprintf(prefix, sizeof(prefix), "%u-", uid);
++	if (strncmp(name, prefix, strlen(prefix)) != 0)
++		return -EINVAL;
++
++	return 0;
++}
++
++/**
++ * kdbus_sanitize_attach_flags() - Sanitize attach flags from user-space
++ * @flags:		Attach flags provided by userspace
++ * @attach_flags:	A pointer where to store the valid attach flags
++ *
++ * Convert attach-flags provided by user-space into a valid mask. If the mask
++ * is invalid, an error is returned. The sanitized attach flags are stored in
++ * the output parameter.
++ *
++ * Return: 0 on success, negative error on failure.
++ */
++int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags)
++{
++	/* 'any' degrades to 'all' for compatibility */
++	if (flags == _KDBUS_ATTACH_ANY)
++		flags = _KDBUS_ATTACH_ALL;
++
++	/* reject unknown attach flags */
++	if (flags & ~_KDBUS_ATTACH_ALL)
++		return -EINVAL;
++
++	*attach_flags = flags;
++	return 0;
++}
++
++/**
++ * kdbus_kvec_set - helper utility to assemble kvec arrays
++ * @kvec:	kvec entry to use
++ * @src:	Source address to set in @kvec
++ * @len:	Number of bytes in @src
++ * @total_len:	Pointer to total length variable
++ *
++ * Set @src and @len in @kvec, and increase @total_len by @len.
++ */
++void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len)
++{
++	kvec->iov_base = src;
++	kvec->iov_len = len;
++	*total_len += len;
++}
++
++static const char * const zeros = "\0\0\0\0\0\0\0";
++
++/**
++ * kdbus_kvec_pad - conditionally write a padding kvec
++ * @kvec:	kvec entry to use
++ * @len:	Total length used for kvec array
++ *
++ * Check if the current total byte length of the array in @len is aligned to
++ * 8 bytes. If it isn't, fill @kvec with padding information and increase @len
++ * by the number of bytes stored in @kvec.
++ *
++ * Return: the number of added padding bytes.
++ */
++size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len)
++{
++	size_t pad = KDBUS_ALIGN8(*len) - *len;
++
++	if (!pad)
++		return 0;
++
++	kvec->iov_base = (void *)zeros;
++	kvec->iov_len = pad;
++
++	*len += pad;
++
++	return pad;
++}
+diff --git a/ipc/kdbus/util.h b/ipc/kdbus/util.h
+new file mode 100644
+index 0000000..5297166
+--- /dev/null
++++ b/ipc/kdbus/util.h
+@@ -0,0 +1,73 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_UTIL_H
++#define __KDBUS_UTIL_H
++
++#include <linux/dcache.h>
++#include <linux/ioctl.h>
++
++#include <uapi/linux/kdbus.h>
++
++/* all exported addresses are 64 bit */
++#define KDBUS_PTR(addr) ((void __user *)(uintptr_t)(addr))
++
++/* all exported sizes are 64 bit and data aligned to 64 bit */
++#define KDBUS_ALIGN8(s) ALIGN((s), 8)
++#define KDBUS_IS_ALIGNED8(s) (IS_ALIGNED(s, 8))
++
++/**
++ * kdbus_member_set_user - write a structure member to user memory
++ * @_s:		Variable to copy from
++ * @_b:		Buffer to write to
++ * @_t:		Structure type
++ * @_m:		Member name in the passed structure
++ *
++ * Return: the result of copy_to_user()
++ */
++#define kdbus_member_set_user(_s, _b, _t, _m)				\
++({									\
++	u64 __user *_sz =						\
++		(void __user *)((u8 __user *)(_b) + offsetof(_t, _m));	\
++	copy_to_user(_sz, _s, FIELD_SIZEOF(_t, _m));			\
++})
++
++/**
++ * kdbus_strhash - calculate a hash
++ * @str:	String
++ *
++ * Return: hash value
++ */
++static inline unsigned int kdbus_strhash(const char *str)
++{
++	unsigned long hash = init_name_hash();
++
++	while (*str)
++		hash = partial_name_hash(*str++, hash);
++
++	return end_name_hash(hash);
++}
++
++int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
++			    kuid_t kuid);
++int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags);
++
++int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size);
++
++struct kvec;
++
++void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len);
++size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len);
++
++#endif
+diff --git a/samples/Kconfig b/samples/Kconfig
+index 224ebb4..a4c6b2f 100644
+--- a/samples/Kconfig
++++ b/samples/Kconfig
+@@ -55,6 +55,13 @@ config SAMPLE_KDB
+ 	  Build an example of how to dynamically add the hello
+ 	  command to the kdb shell.
+ 
++config SAMPLE_KDBUS
++	bool "Build kdbus API example"
++	depends on KDBUS
++	help
++	  Build an example of how the kdbus API can be used from
++	  userspace.
++
+ config SAMPLE_RPMSG_CLIENT
+ 	tristate "Build rpmsg client sample -- loadable modules only"
+ 	depends on RPMSG && m
+diff --git a/samples/Makefile b/samples/Makefile
+index f00257b..f0ad51e 100644
+--- a/samples/Makefile
++++ b/samples/Makefile
+@@ -1,4 +1,5 @@
+ # Makefile for Linux samples code
+ 
+ obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ trace_events/ livepatch/ \
+-			   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/
++			   hw_breakpoint/ kfifo/ kdb/ kdbus/ hidraw/ rpmsg/ \
++			   seccomp/
+diff --git a/samples/kdbus/.gitignore b/samples/kdbus/.gitignore
+new file mode 100644
+index 0000000..ee07d98
+--- /dev/null
++++ b/samples/kdbus/.gitignore
+@@ -0,0 +1 @@
++kdbus-workers
+diff --git a/samples/kdbus/Makefile b/samples/kdbus/Makefile
+new file mode 100644
+index 0000000..137f842
+--- /dev/null
++++ b/samples/kdbus/Makefile
+@@ -0,0 +1,9 @@
++# kbuild trick to avoid linker error. Can be omitted if a module is built.
++obj- := dummy.o
++
++hostprogs-$(CONFIG_SAMPLE_KDBUS) += kdbus-workers
++
++always := $(hostprogs-y)
++
++HOSTCFLAGS_kdbus-workers.o += -I$(objtree)/usr/include
++HOSTLOADLIBES_kdbus-workers := -lrt
+diff --git a/samples/kdbus/kdbus-api.h b/samples/kdbus/kdbus-api.h
+new file mode 100644
+index 0000000..7f3abae
+--- /dev/null
++++ b/samples/kdbus/kdbus-api.h
+@@ -0,0 +1,114 @@
++#ifndef KDBUS_API_H
++#define KDBUS_API_H
++
++#include <sys/ioctl.h>
++#include <linux/kdbus.h>
++
++#define KDBUS_ALIGN8(l) (((l) + 7) & ~7)
++#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
++#define KDBUS_ITEM_SIZE(s) KDBUS_ALIGN8((s) + KDBUS_ITEM_HEADER_SIZE)
++#define KDBUS_ITEM_NEXT(item) \
++	(typeof(item))((uint8_t *)(item) + KDBUS_ALIGN8((item)->size))
++#define KDBUS_FOREACH(iter, first, _size)				\
++	for ((iter) = (first);						\
++	     ((uint8_t *)(iter) < (uint8_t *)(first) + (_size)) &&	\
++	       ((uint8_t *)(iter) >= (uint8_t *)(first));		\
++	     (iter) = (void *)((uint8_t *)(iter) + KDBUS_ALIGN8((iter)->size)))
++
++static inline int kdbus_cmd_bus_make(int control_fd, struct kdbus_cmd *cmd)
++{
++	int ret = ioctl(control_fd, KDBUS_CMD_BUS_MAKE, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_endpoint_make(int bus_fd, struct kdbus_cmd *cmd)
++{
++	int ret = ioctl(bus_fd, KDBUS_CMD_ENDPOINT_MAKE, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_endpoint_update(int ep_fd, struct kdbus_cmd *cmd)
++{
++	int ret = ioctl(ep_fd, KDBUS_CMD_ENDPOINT_UPDATE, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_hello(int bus_fd, struct kdbus_cmd_hello *cmd)
++{
++	int ret = ioctl(bus_fd, KDBUS_CMD_HELLO, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_update(int fd, struct kdbus_cmd *cmd)
++{
++	int ret = ioctl(fd, KDBUS_CMD_UPDATE, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_byebye(int conn_fd, struct kdbus_cmd *cmd)
++{
++	int ret = ioctl(conn_fd, KDBUS_CMD_BYEBYE, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_free(int conn_fd, struct kdbus_cmd_free *cmd)
++{
++	int ret = ioctl(conn_fd, KDBUS_CMD_FREE, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_conn_info(int conn_fd, struct kdbus_cmd_info *cmd)
++{
++	int ret = ioctl(conn_fd, KDBUS_CMD_CONN_INFO, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_bus_creator_info(int conn_fd, struct kdbus_cmd_info *cmd)
++{
++	int ret = ioctl(conn_fd, KDBUS_CMD_BUS_CREATOR_INFO, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_list(int fd, struct kdbus_cmd_list *cmd)
++{
++	int ret = ioctl(fd, KDBUS_CMD_LIST, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_send(int conn_fd, struct kdbus_cmd_send *cmd)
++{
++	int ret = ioctl(conn_fd, KDBUS_CMD_SEND, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_recv(int conn_fd, struct kdbus_cmd_recv *cmd)
++{
++	int ret = ioctl(conn_fd, KDBUS_CMD_RECV, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_name_acquire(int conn_fd, struct kdbus_cmd *cmd)
++{
++	int ret = ioctl(conn_fd, KDBUS_CMD_NAME_ACQUIRE, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_name_release(int conn_fd, struct kdbus_cmd *cmd)
++{
++	int ret = ioctl(conn_fd, KDBUS_CMD_NAME_RELEASE, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_match_add(int conn_fd, struct kdbus_cmd_match *cmd)
++{
++	int ret = ioctl(conn_fd, KDBUS_CMD_MATCH_ADD, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_match_remove(int conn_fd, struct kdbus_cmd_match *cmd)
++{
++	int ret = ioctl(conn_fd, KDBUS_CMD_MATCH_REMOVE, cmd);
++	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++#endif /* KDBUS_API_H */
+diff --git a/samples/kdbus/kdbus-workers.c b/samples/kdbus/kdbus-workers.c
+new file mode 100644
+index 0000000..5a6dfdc
+--- /dev/null
++++ b/samples/kdbus/kdbus-workers.c
+@@ -0,0 +1,1346 @@
++/*
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++/*
++ * Example: Workers
++ * This program computes prime-numbers based on the sieve of Eratosthenes. The
++ * master sets up a shared memory region and spawns workers which clear out the
++ * non-primes. The master reacts to keyboard input and to client-requests to
++ * control what each worker does. Note that this is in no way meant as an
++ * efficient way to compute primes. It only serves as an example of how a
++ * master/worker concept can be implemented with kdbus control messages.
++ *
++ * The main process is called the 'master'. It creates a new, private bus which
++ * will be used between the master and its workers to communicate. The master
++ * then spawns a fixed number of workers. Whenever a worker dies (detected via
++ * SIGCHLD), the master spawns a new worker. When done, the master waits for all
++ * workers to exit, prints a status report and exits itself.
++ *
++ * The master process does *not* keep track of its workers. Instead, this
++ * example implements a PULL model. That is, the master acquires a well-known
++ * name on the bus which each worker uses to request tasks from the master. If
++ * there are no more tasks, the master will return an empty task-list, which
++ * causes a worker to exit immediately.
++ *
++ * As tasks can be computationally expensive, we support cancellation. Whenever
++ * the master process is interrupted, it will drop its well-known name on the
++ * bus. This causes kdbus to broadcast a name-change notification. The workers
++ * check for broadcast messages regularly and will exit if they receive one.
++ *
++ * This example consists of 4 objects:
++ *  * master: The master object contains the context of the master process. This
++ *            process manages the prime-context, spawns workers and assigns
++ *            prime-ranges to each worker to compute.
++ *            The master does not do any prime-computations itself.
++ *  * child:  The child object contains the context of a worker. It inherits the
++ *            prime context from its parent (the master) and then creates a new
++ *            bus context to request prime-ranges to compute.
++ *  * prime:  The "prime" object is used to abstract how we compute primes. When
++ *            allocated, it prepares a memory region to hold 1 bit for each
++ *            natural number up to a fixed maximum ('MAX_PRIMES').
++ *            The memory region is backed by a memfd which we share between
++ *            processes. Each worker now gets assigned a range of natural
++ *            numbers whose multiples it clears from the memory region. The
++ *            master process is responsible for distributing all natural numbers
++ *            up to the fixed maximum to its workers.
++ *  * bus:    The bus object is an abstraction of the kdbus API. It is pretty
++ *            straightforward and only manages the connection-fd plus the
++ *            memory-mapped pool in a single object.
++ *
++ * This example is in reversed order, which should make it easier to read
++ * top-down, but requires some forward-declarations. Just ignore those.
++ */
++
++#include <stdio.h>
++#include <stdlib.h>
++#include <sys/syscall.h>
++
++/* glibc < 2.7 does not ship sys/signalfd.h */
++/* we require kernels with __NR_memfd_create */
++#if __GLIBC__ >= 2 && __GLIBC_MINOR__ >= 7 && defined(__NR_memfd_create)
++
++#include <ctype.h>
++#include <errno.h>
++#include <fcntl.h>
++#include <linux/memfd.h>
++#include <signal.h>
++#include <stdbool.h>
++#include <stddef.h>
++#include <stdint.h>
++#include <string.h>
++#include <sys/mman.h>
++#include <sys/poll.h>
++#include <sys/signalfd.h>
++#include <sys/time.h>
++#include <sys/wait.h>
++#include <time.h>
++#include <unistd.h>
++#include "kdbus-api.h"
++
++/* FORWARD DECLARATIONS */
++
++#define POOL_SIZE (16 * 1024 * 1024)
++#define MAX_PRIMES (2UL << 24)
++#define WORKER_COUNT (16)
++#define PRIME_STEPS (65536 * 4)
++
++static const char *arg_busname = "example-workers";
++static const char *arg_modname = "kdbus";
++static const char *arg_master = "org.freedesktop.master";
++
++static int err_assert(int r_errno, const char *msg, const char *func, int line,
++		      const char *file)
++{
++	r_errno = (r_errno != 0) ? -abs(r_errno) : -EFAULT;
++	if (r_errno < 0) {
++		errno = -r_errno;
++		fprintf(stderr, "ERR: %s: %m (%s:%d in %s)\n",
++			msg, func, line, file);
++	}
++	return r_errno;
++}
++
++#define err_r(_r, _msg) err_assert((_r), (_msg), __func__, __LINE__, __FILE__)
++#define err(_msg) err_r(errno, (_msg))
++
++struct prime;
++struct bus;
++struct master;
++struct child;
++
++struct prime {
++	int fd;
++	uint8_t *area;
++	size_t max;
++	size_t done;
++	size_t status;
++};
++
++static int prime_new(struct prime **out);
++static void prime_free(struct prime *p);
++static bool prime_done(struct prime *p);
++static void prime_consume(struct prime *p, size_t amount);
++static int prime_run(struct prime *p, struct bus *cancel, size_t number);
++static void prime_print(struct prime *p);
++
++struct bus {
++	int fd;
++	uint8_t *pool;
++};
++
++static int bus_open_connection(struct bus **out, uid_t uid, const char *name,
++			       uint64_t recv_flags);
++static void bus_close_connection(struct bus *b);
++static void bus_poool_free_slice(struct bus *b, uint64_t offset);
++static int bus_acquire_name(struct bus *b, const char *name);
++static int bus_install_name_loss_match(struct bus *b, const char *name);
++static int bus_poll(struct bus *b);
++static int bus_make(uid_t uid, const char *name);
++
++struct master {
++	size_t n_workers;
++	size_t max_workers;
++
++	int signal_fd;
++	int control_fd;
++
++	struct prime *prime;
++	struct bus *bus;
++};
++
++static int master_new(struct master **out);
++static void master_free(struct master *m);
++static int master_run(struct master *m);
++static int master_poll(struct master *m);
++static int master_handle_stdin(struct master *m);
++static int master_handle_signal(struct master *m);
++static int master_handle_bus(struct master *m);
++static int master_reply(struct master *m, const struct kdbus_msg *msg);
++static int master_waitpid(struct master *m);
++static int master_spawn(struct master *m);
++
++struct child {
++	struct bus *bus;
++	struct prime *prime;
++};
++
++static int child_new(struct child **out, struct prime *p);
++static void child_free(struct child *c);
++static int child_run(struct child *c);
++
++/* END OF FORWARD DECLARATIONS */
++
++/*
++ * This is the main entrypoint of this example. It is pretty straightforward. We
++ * create a master object, run the computation, print a status report and then
++ * exit. Nothing particularly interesting here, so let's look into the master
++ * object...
++ */
++int main(int argc, char **argv)
++{
++	struct master *m = NULL;
++	int r;
++
++	r = master_new(&m);
++	if (r < 0)
++		goto out;
++
++	r = master_run(m);
++	if (r < 0)
++		goto out;
++
++	if (0)
++		prime_print(m->prime);
++
++out:
++	master_free(m);
++	if (r < 0 && r != -EINTR)
++		fprintf(stderr, "failed\n");
++	else
++		fprintf(stderr, "done\n");
++	return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
++}
++
++/*
++ * ...this will allocate a new master context. It keeps track of the current
++ * number of children/workers that are running, manages a signalfd to track
++ * SIGCHLD, and creates a private kdbus bus. Afterwards, it opens its connection
++ * to the bus and acquires a well-known name (arg_master).
++ */
++static int master_new(struct master **out)
++{
++	struct master *m;
++	sigset_t smask;
++	int r;
++
++	m = calloc(1, sizeof(*m));
++	if (!m)
++		return err("cannot allocate master");
++
++	m->max_workers = WORKER_COUNT;
++	m->signal_fd = -1;
++	m->control_fd = -1;
++
++	/* Block SIGINT and SIGCHLD signals */
++	sigemptyset(&smask);
++	sigaddset(&smask, SIGINT);
++	sigaddset(&smask, SIGCHLD);
++	sigprocmask(SIG_BLOCK, &smask, NULL);
++
++	m->signal_fd = signalfd(-1, &smask, SFD_CLOEXEC);
++	if (m->signal_fd < 0) {
++		r = err("cannot create signalfd");
++		goto error;
++	}
++
++	r = prime_new(&m->prime);
++	if (r < 0)
++		goto error;
++
++	m->control_fd = bus_make(getuid(), arg_busname);
++	if (m->control_fd < 0) {
++		r = m->control_fd;
++		goto error;
++	}
++
++	/*
++	 * Open a bus connection for the master, and require each received
++	 * message to have a metadata item of type KDBUS_ITEM_PIDS attached.
++	 * The current UID is needed to compute the name of the bus node to
++	 * connect to.
++	 */
++	r = bus_open_connection(&m->bus, getuid(),
++				arg_busname, KDBUS_ATTACH_PIDS);
++	if (r < 0)
++		goto error;
++
++	/*
++	 * Acquire a well-known name on the bus, so children can address
++	 * messages to the master using KDBUS_DST_ID_NAME as destination-ID
++	 * of messages.
++	 */
++	r = bus_acquire_name(m->bus, arg_master);
++	if (r < 0)
++		goto error;
++
++	*out = m;
++	return 0;
++
++error:
++	master_free(m);
++	return r;
++}
++
++/* pretty straightforward destructor of a master object */
++static void master_free(struct master *m)
++{
++	if (!m)
++		return;
++
++	bus_close_connection(m->bus);
++	if (m->control_fd >= 0)
++		close(m->control_fd);
++	prime_free(m->prime);
++	if (m->signal_fd >= 0)
++		close(m->signal_fd);
++	free(m);
++}
++
++static int master_run(struct master *m)
++{
++	int res, r = 0;
++
++	while (!prime_done(m->prime)) {
++		while (m->n_workers < m->max_workers) {
++			r = master_spawn(m);
++			if (r < 0)
++				break;
++		}
++
++		r = master_poll(m);
++		if (r < 0)
++			break;
++	}
++
++	if (r < 0) {
++		bus_close_connection(m->bus);
++		m->bus = NULL;
++	}
++
++	while (m->n_workers > 0) {
++		res = master_poll(m);
++		if (res < 0) {
++			if (m->bus) {
++				bus_close_connection(m->bus);
++				m->bus = NULL;
++			}
++			r = res;
++		}
++	}
++
++	return r == -EINTR ? 0 : r;
++}
++
++static int master_poll(struct master *m)
++{
++	struct pollfd fds[3] = {};
++	int r = 0, n = 0;
++
++	/*
++	 * Add stdin, the signalfd and the connection owner file descriptor to
++	 * the pollfd table, and handle incoming traffic on the latter in
++	 * master_handle_bus().
++	 */
++	fds[n].fd = STDIN_FILENO;
++	fds[n++].events = POLLIN;
++	fds[n].fd = m->signal_fd;
++	fds[n++].events = POLLIN;
++	if (m->bus) {
++		fds[n].fd = m->bus->fd;
++		fds[n++].events = POLLIN;
++	}
++
++	r = poll(fds, n, -1);
++	if (r < 0)
++		return err("poll() failed");
++
++	if (fds[0].revents & POLLIN)
++		r = master_handle_stdin(m);
++	else if (fds[0].revents)
++		r = err("ERR/HUP on stdin");
++	if (r < 0)
++		return r;
++
++	if (fds[1].revents & POLLIN)
++		r = master_handle_signal(m);
++	else if (fds[1].revents)
++		r = err("ERR/HUP on signalfd");
++	if (r < 0)
++		return r;
++
++	if (fds[2].revents & POLLIN)
++		r = master_handle_bus(m);
++	else if (fds[2].revents)
++		r = err("ERR/HUP on bus");
++
++	return r;
++}
++
++static int master_handle_stdin(struct master *m)
++{
++	char buf[128];
++	ssize_t l;
++	int r = 0;
++
++	l = read(STDIN_FILENO, buf, sizeof(buf));
++	if (l < 0)
++		return err("cannot read stdin");
++	if (l == 0)
++		return err_r(-EINVAL, "EOF on stdin");
++
++	while (l-- > 0) {
++		switch (buf[l]) {
++		case 'q':
++			/* quit */
++			r = -EINTR;
++			break;
++		case '\n':
++		case ' ':
++			/* ignore */
++			break;
++		default:
++			if (isgraph(buf[l]))
++				fprintf(stderr, "invalid input '%c'\n", buf[l]);
++			else
++				fprintf(stderr, "invalid input 0x%x\n", buf[l]);
++			break;
++		}
++	}
++
++	return r;
++}
++
++static int master_handle_signal(struct master *m)
++{
++	struct signalfd_siginfo val;
++	ssize_t l;
++
++	l = read(m->signal_fd, &val, sizeof(val));
++	if (l < 0)
++		return err("cannot read signalfd");
++	if (l != sizeof(val))
++		return err_r(-EINVAL, "invalid data from signalfd");
++
++	switch (val.ssi_signo) {
++	case SIGCHLD:
++		return master_waitpid(m);
++	case SIGINT:
++		return err_r(-EINTR, "interrupted");
++	default:
++		return err_r(-EINVAL, "caught invalid signal");
++	}
++}
++
++static int master_handle_bus(struct master *m)
++{
++	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
++	const struct kdbus_msg *msg = NULL;
++	const struct kdbus_item *item;
++	const struct kdbus_vec *vec = NULL;
++	int r = 0;
++
++	/*
++	 * To receive a message, the KDBUS_CMD_RECV ioctl is used.
++	 * It takes an argument of type 'struct kdbus_cmd_recv', which
++	 * will contain information on the received message when the call
++	 * returns. See kdbus.message(7).
++	 */
++	r = kdbus_cmd_recv(m->bus->fd, &recv);
++	/*
++	 * EAGAIN is returned when there is no message waiting on this
++	 * connection. This is not an error - simply bail out.
++	 */
++	if (r == -EAGAIN)
++		return 0;
++	if (r < 0)
++		return err_r(r, "cannot receive message");
++
++	/*
++	 * Messages received by a connection are stored inside the connection's
++	 * pool, at an offset that has been returned in the 'recv' command
++	 * struct above. The value describes the relative offset from the
++	 * start address of the pool. A message is described with
++	 * 'struct kdbus_msg'. See kdbus.message(7).
++	 */
++	msg = (void *)(m->bus->pool + recv.msg.offset);
++
++	/*
++	 * A message describes its actual payload in an array of items.
++	 * KDBUS_FOREACH() is a simple iterator that walks such an array.
++	 * struct kdbus_msg has a field to denote its total size, which is
++	 * needed to determine the number of items in the array.
++	 */
++	KDBUS_FOREACH(item, msg->items,
++		      msg->size - offsetof(struct kdbus_msg, items)) {
++		/*
++		 * An item of type PAYLOAD_OFF describes in-line memory
++		 * stored in the pool at a described offset. That offset is
++		 * relative to the start address of the message header.
++		 * This example program only expects one single item of that
++		 * type, remembers the struct kdbus_vec member of the item
++		 * when it sees it, and bails out if there is more than one
++		 * of them.
++		 */
++		if (item->type == KDBUS_ITEM_PAYLOAD_OFF) {
++			if (vec) {
++				r = err_r(-EEXIST,
++					  "message with multiple vecs");
++				break;
++			}
++			vec = &item->vec;
++			if (vec->size != 1) {
++				r = err_r(-EINVAL, "invalid message size");
++				break;
++			}
++
++		/*
++		 * MEMFDs are transported as items of type PAYLOAD_MEMFD.
++		 * If such an item is attached, a new file descriptor was
++		 * installed into the task when KDBUS_CMD_RECV was called, and
++		 * its number is stored in item->memfd.fd.
++		 * Implementers *must* handle this item type and close the
++		 * file descriptor when no longer needed in order to prevent
++		 * file descriptor exhaustion. This example program just bails
++		 * out with an error in this case, as memfds are not expected
++		 * in this context.
++		 */
++		} else if (item->type == KDBUS_ITEM_PAYLOAD_MEMFD) {
++			r = err_r(-EINVAL, "message with memfd");
++			break;
++		}
++	}
++	if (r < 0)
++		goto exit;
++	if (!vec) {
++		r = err_r(-EINVAL, "empty message");
++		goto exit;
++	}
++
++	switch (*((const uint8_t *)msg + vec->offset)) {
++	case 'r': {
++		r = master_reply(m, msg);
++		break;
++	}
++	default:
++		r = err_r(-EINVAL, "invalid message type");
++		break;
++	}
++
++exit:
++	/*
++	 * We are done with the memory slice that was given to us through
++	 * recv.msg.offset. Tell the kernel it can use it for other content
++	 * in the future. See kdbus.pool(7).
++	 */
++	bus_poool_free_slice(m->bus, recv.msg.offset);
++	return r;
++}
++
++static int master_reply(struct master *m, const struct kdbus_msg *msg)
++{
++	struct kdbus_cmd_send cmd;
++	struct kdbus_item *item;
++	struct kdbus_msg *reply;
++	size_t size, status, p[2];
++	int r;
++
++	/*
++	 * This function sends a message over kdbus. To do this, it uses the
++	 * KDBUS_CMD_SEND ioctl, which takes a command struct argument of type
++	 * 'struct kdbus_cmd_send'. This struct stores a pointer to the actual
++	 * message to send. See kdbus.message(7).
++	 */
++	p[0] = m->prime->done;
++	p[1] = prime_done(m->prime) ? 0 : PRIME_STEPS;
++
++	size = sizeof(*reply);
++	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++	/* Prepare the message to send */
++	reply = alloca(size);
++	memset(reply, 0, size);
++	reply->size = size;
++
++	/* Each message has a cookie that can be used to send replies */
++	reply->cookie = 1;
++
++	/* The payload_type is arbitrary, but it must be non-zero */
++	reply->payload_type = 0xdeadbeef;
++
++	/*
++	 * We are sending a reply. Let the kernel know the cookie of the
++	 * message we are replying to.
++	 */
++	reply->cookie_reply = msg->cookie;
++
++	/*
++	 * Messages can either be directed to a well-known name (stored as
++	 * string) or to a unique name (stored as number). This example does
++	 * the latter. If the message were directed to a well-known name
++	 * instead, the message's dst_id field would be set to
++	 * KDBUS_DST_ID_NAME, and the name would be attached in an item of type
++	 * KDBUS_ITEM_DST_NAME. See below for an example, and also refer to
++	 * kdbus.message(7).
++	 */
++	reply->dst_id = msg->src_id;
++
++	/* Our message has exactly one item to store its payload */
++	item = reply->items;
++	item->type = KDBUS_ITEM_PAYLOAD_VEC;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++	item->vec.address = (uintptr_t)p;
++	item->vec.size = sizeof(p);
++
++	/*
++	 * Now prepare the command struct, and reference the message we want
++	 * to send.
++	 */
++	memset(&cmd, 0, sizeof(cmd));
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)reply;
++
++	/*
++	 * Finally, employ the command on the connection owner
++	 * file descriptor.
++	 */
++	r = kdbus_cmd_send(m->bus->fd, &cmd);
++	if (r < 0)
++		return err_r(r, "cannot send reply");
++
++	if (p[1]) {
++		prime_consume(m->prime, p[1]);
++		status = m->prime->done * 10000 / m->prime->max;
++		if (status != m->prime->status) {
++			m->prime->status = status;
++			fprintf(stderr, "status: %7.3lf%%\n",
++				(double)status / 100);
++		}
++	}
++
++	return 0;
++}
++
++static int master_waitpid(struct master *m)
++{
++	pid_t pid;
++	int r = 0;
++
++	while ((pid = waitpid(-1, &r, WNOHANG)) > 0) {
++		if (m->n_workers > 0)
++			--m->n_workers;
++		if (!WIFEXITED(r))
++			r = err_r(-EINVAL, "child died unexpectedly");
++		else if (WEXITSTATUS(r) != 0)
++			r = err_r(-WEXITSTATUS(r), "child failed");
++	}
++
++	return r;
++}
++
++static int master_spawn(struct master *m)
++{
++	struct child *c = NULL;
++	struct prime *p = NULL;
++	pid_t pid;
++	int r;
++
++	/* Spawn off one child and call child_run() inside it */
++
++	pid = fork();
++	if (pid < 0)
++		return err("cannot fork");
++	if (pid > 0) {
++		/* parent */
++		++m->n_workers;
++		return 0;
++	}
++
++	/* child */
++
++	p = m->prime;
++	m->prime = NULL;
++	master_free(m);
++
++	r = child_new(&c, p);
++	if (r < 0)
++		goto exit;
++
++	r = child_run(c);
++
++exit:
++	child_free(c);
++	exit(abs(r));
++}
++
++static int child_new(struct child **out, struct prime *p)
++{
++	struct child *c;
++	int r;
++
++	c = calloc(1, sizeof(*c));
++	if (!c)
++		return err("cannot allocate child");
++
++	c->prime = p;
++
++	/*
++	 * Open a connection to the bus and require each received message to
++	 * carry a list of the well-known names the sending connection currently
++	 * owns. The current UID is needed in order to determine the name of the
++	 * bus node to connect to.
++	 */
++	r = bus_open_connection(&c->bus, getuid(),
++				arg_busname, KDBUS_ATTACH_NAMES);
++	if (r < 0)
++		goto error;
++
++	/*
++	 * Install a kdbus match so the child's connection gets notified when
++	 * the master loses its well-known name.
++	 */
++	r = bus_install_name_loss_match(c->bus, arg_master);
++	if (r < 0)
++		goto error;
++
++	*out = c;
++	return 0;
++
++error:
++	child_free(c);
++	return r;
++}
++
++static void child_free(struct child *c)
++{
++	if (!c)
++		return;
++
++	bus_close_connection(c->bus);
++	prime_free(c->prime);
++	free(c);
++}
++
++static int child_run(struct child *c)
++{
++	struct kdbus_cmd_send cmd;
++	struct kdbus_item *item;
++	struct kdbus_vec *vec = NULL;
++	struct kdbus_msg *msg;
++	struct timespec spec;
++	size_t n, steps, size;
++	int r = 0;
++
++	/*
++	 * Let's send a message to the master and ask for work. To do this,
++	 * we use the KDBUS_CMD_SEND ioctl, which takes an argument of type
++	 * 'struct kdbus_cmd_send'. This struct stores a pointer to the actual
++	 * message to send. See kdbus.message(7).
++	 */
++	size = sizeof(*msg);
++	size += KDBUS_ITEM_SIZE(strlen(arg_master) + 1);
++	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++	msg = alloca(size);
++	memset(msg, 0, size);
++	msg->size = size;
++
++	/*
++	 * Tell the kernel that we expect a reply to this message. This means
++	 * that
++	 *
++	 * a) The remote peer will gain temporary permission to talk to us
++	 *    even if it would not be allowed to normally.
++	 *
++	 * b) A timeout value is required.
++	 *
++	 *    For asynchronous send commands, if no reply is received, we will
++	 *    get a kernel notification with an item of type
++	 *    KDBUS_ITEM_REPLY_TIMEOUT attached.
++	 *
++	 *    For synchronous send commands (which this example does), the
++	 *    ioctl will block until a reply is received or the timeout is
++	 *    exceeded.
++	 */
++	msg->flags = KDBUS_MSG_EXPECT_REPLY;
++
++	/* Set our cookie. Replies must use this cookie to send their reply. */
++	msg->cookie = 1;
++
++	/* The payload_type is arbitrary, but it must be non-zero */
++	msg->payload_type = 0xdeadbeef;
++
++	/*
++	 * We are sending our message to the current owner of a well-known
++	 * name. This makes an item of type KDBUS_ITEM_DST_NAME mandatory.
++	 */
++	msg->dst_id = KDBUS_DST_ID_NAME;
++
++	/*
++	 * Set the reply timeout to 5 seconds. Timeouts are always set in
++	 * absolute timestamps, based on CLOCK_MONOTONIC. See kdbus.message(7).
++	 */
++	clock_gettime(CLOCK_MONOTONIC_COARSE, &spec);
++	msg->timeout_ns += (5 + spec.tv_sec) * 1000ULL * 1000ULL * 1000ULL;
++	msg->timeout_ns += spec.tv_nsec;
++
++	/*
++	 * Fill the appended items. First, set the well-known name of the
++	 * destination we want to talk to.
++	 */
++	item = msg->items;
++	item->type = KDBUS_ITEM_DST_NAME;
++	item->size = KDBUS_ITEM_HEADER_SIZE + strlen(arg_master) + 1;
++	strcpy(item->str, arg_master);
++
++	/*
++	 * The 2nd item contains a vector to memory we want to send. It
++	 * can be content of any type. In our case, we're sending a one-byte
++	 * string only. The memory referenced by this item will be copied into
++	 * the pool of the receiver connection, and does not need to be valid
++	 * after the command is employed.
++	 */
++	item = KDBUS_ITEM_NEXT(item);
++	item->type = KDBUS_ITEM_PAYLOAD_VEC;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++	item->vec.address = (uintptr_t)"r";
++	item->vec.size = 1;
++
++	/* Set up the command struct and reference the message we prepared */
++	memset(&cmd, 0, sizeof(cmd));
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)msg;
++
++	/*
++	 * The send command has a mode in which it will block until a
++	 * reply to a message is received. This example uses that mode.
++	 * The pool offset to the received reply will be stored in the command
++	 * struct after the send command returned. See below.
++	 */
++	cmd.flags = KDBUS_SEND_SYNC_REPLY;
++
++	/*
++	 * Finally, employ the command on the connection owner
++	 * file descriptor.
++	 */
++	r = kdbus_cmd_send(c->bus->fd, &cmd);
++	if (r == -ESRCH || r == -EPIPE || r == -ECONNRESET)
++		return 0;
++	if (r < 0)
++		return err_r(r, "cannot send request to master");
++
++	/*
++	 * The command was sent with the KDBUS_SEND_SYNC_REPLY flag set,
++	 * and returned successfully, which means that cmd.reply.offset now
++	 * points to a message inside our connection's pool where the reply
++	 * is found. This is equivalent to receiving the reply with
++	 * KDBUS_CMD_RECV, but it doesn't require waiting for the reply with
++	 * poll() and also saves the ioctl to receive the message.
++	 */
++	msg = (void *)(c->bus->pool + cmd.reply.offset);
++
++	/*
++	 * A message describes its actual payload in an array of items.
++	 * KDBUS_FOREACH() is a simple iterator that walks such an array.
++	 * struct kdbus_msg has a field to denote its total size, which is
++	 * needed to determine the number of items in the array.
++	 */
++	KDBUS_FOREACH(item, msg->items,
++		      msg->size - offsetof(struct kdbus_msg, items)) {
++		/*
++		 * An item of type PAYLOAD_OFF describes in-line memory
++		 * stored in the pool at a described offset. That offset is
++		 * relative to the start address of the message header.
++		 * This example program only expects one single item of that
++		 * type, remembers the struct kdbus_vec member of the item
++		 * when it sees it, and bails out if there is more than one
++		 * of them.
++		 */
++		if (item->type == KDBUS_ITEM_PAYLOAD_OFF) {
++			if (vec) {
++				r = err_r(-EEXIST,
++					  "message with multiple vecs");
++				break;
++			}
++			vec = &item->vec;
++			if (vec->size != 2 * sizeof(size_t)) {
++				r = err_r(-EINVAL, "invalid message size");
++				break;
++			}
++		/*
++		 * MEMFDs are transported as items of type PAYLOAD_MEMFD.
++		 * If such an item is attached, a new file descriptor was
++		 * installed into the task when KDBUS_CMD_RECV was called, and
++		 * its number is stored in item->memfd.fd.
++		 * Implementers *must* handle this item type and close the
++		 * file descriptor when no longer needed in order to prevent
++		 * file descriptor exhaustion. This example program just bails
++		 * out with an error in this case, as memfds are not expected
++		 * in this context.
++		 */
++		} else if (item->type == KDBUS_ITEM_PAYLOAD_MEMFD) {
++			r = err_r(-EINVAL, "message with memfd");
++			break;
++		}
++	}
++	if (r < 0)
++		goto exit;
++	if (!vec) {
++		r = err_r(-EINVAL, "empty message");
++		goto exit;
++	}
++
++	n = ((size_t *)((const uint8_t *)msg + vec->offset))[0];
++	steps = ((size_t *)((const uint8_t *)msg + vec->offset))[1];
++
++	while (steps-- > 0) {
++		++n;
++		r = prime_run(c->prime, c->bus, n);
++		if (r < 0)
++			break;
++		r = bus_poll(c->bus);
++		if (r != 0) {
++			r = r < 0 ? r : -EINTR;
++			break;
++		}
++	}
++
++exit:
++	/*
++	 * We are done with the memory slice that was given to us through
++	 * cmd.reply.offset. Tell the kernel it can use it for other content
++	 * in the future. See kdbus.pool(7).
++	 */
++	bus_poool_free_slice(c->bus, cmd.reply.offset);
++	return r;
++}
++
++/*
++ * Prime Computation
++ *
++ */
++
++static int prime_new(struct prime **out)
++{
++	struct prime *p;
++	int r;
++
++	p = calloc(1, sizeof(*p));
++	if (!p)
++		return err("cannot allocate prime memory");
++
++	p->fd = -1;
++	p->area = MAP_FAILED;
++	p->max = MAX_PRIMES;
++
++	/*
++	 * Prepare and map a memfd to store the bit-fields for the number
++	 * ranges we want to perform the prime detection on.
++	 */
++	p->fd = syscall(__NR_memfd_create, "prime-area", MFD_CLOEXEC);
++	if (p->fd < 0) {
++		r = err("cannot create memfd");
++		goto error;
++	}
++
++	r = ftruncate(p->fd, p->max / 8 + 1);
++	if (r < 0) {
++		r = err("cannot ftruncate area");
++		goto error;
++	}
++
++	p->area = mmap(NULL, p->max / 8 + 1, PROT_READ | PROT_WRITE,
++		       MAP_SHARED, p->fd, 0);
++	if (p->area == MAP_FAILED) {
++		r = err("cannot mmap memfd");
++		goto error;
++	}
++
++	*out = p;
++	return 0;
++
++error:
++	prime_free(p);
++	return r;
++}
++
++static void prime_free(struct prime *p)
++{
++	if (!p)
++		return;
++
++	if (p->area != MAP_FAILED)
++		munmap(p->area, p->max / 8 + 1);
++	if (p->fd >= 0)
++		close(p->fd);
++	free(p);
++}
++
++static bool prime_done(struct prime *p)
++{
++	return p->done >= p->max;
++}
++
++static void prime_consume(struct prime *p, size_t amount)
++{
++	p->done += amount;
++}
++
++static int prime_run(struct prime *p, struct bus *cancel, size_t number)
++{
++	size_t i, n = 0;
++	int r;
++
++	if (number < 2 || number > 65535)
++		return 0;
++
++	for (i = number * number;
++	     i < p->max && i > number;
++	     i += number) {
++		p->area[i / 8] |= 1 << (i % 8);
++
++		if (!(++n % (1 << 20))) {
++			r = bus_poll(cancel);
++			if (r != 0)
++				return r < 0 ? r : -EINTR;
++		}
++	}
++
++	return 0;
++}
++
++static void prime_print(struct prime *p)
++{
++	size_t i, l = 0;
++
++	fprintf(stderr, "PRIMES:");
++	for (i = 0; i < p->max; ++i) {
++		if (!(p->area[i / 8] & (1 << (i % 8))))
++			fprintf(stderr, "%c%7zu", !(l++ % 16) ? '\n' : ' ', i);
++	}
++	fprintf(stderr, "\nEND\n");
++}
++
++static int bus_open_connection(struct bus **out, uid_t uid, const char *name,
++			       uint64_t recv_flags)
++{
++	struct kdbus_cmd_hello hello;
++	char path[128];
++	struct bus *b;
++	int r;
++
++	/*
++	 * The 'bus' object is our representation of a kdbus connection which
++	 * stores two details: the connection owner file descriptor, and the
++	 * mmap()ed memory of its associated pool. See kdbus.connection(7) and
++	 * kdbus.pool(7).
++	 */
++	b = calloc(1, sizeof(*b));
++	if (!b)
++		return err("cannot allocate bus memory");
++
++	b->fd = -1;
++	b->pool = MAP_FAILED;
++
++	/* Compute the name of the bus node to connect to. */
++	snprintf(path, sizeof(path), "/sys/fs/%s/%lu-%s/bus",
++		 arg_modname, (unsigned long)uid, name);
++	b->fd = open(path, O_RDWR | O_CLOEXEC);
++	if (b->fd < 0) {
++		r = err("cannot open bus");
++		goto error;
++	}
++
++	/*
++	 * To make a connection to the bus, the KDBUS_CMD_HELLO ioctl is used.
++	 * It takes an argument of type 'struct kdbus_cmd_hello'.
++	 */
++	memset(&hello, 0, sizeof(hello));
++	hello.size = sizeof(hello);
++
++	/*
++	 * Specify a mask of metadata attach flags, describing metadata items
++	 * that this new connection allows to be sent.
++	 */
++	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++
++	/*
++	 * Specify a mask of metadata attach flags, describing metadata items
++	 * that this new connection wants to receive along with each message.
++	 */
++	hello.attach_flags_recv = recv_flags;
++
++	/*
++	 * A connection may choose the size of its pool, but the number has to
++	 * comply with two rules: a) it must be greater than 0, and b) it must
++	 * be a multiple of PAGE_SIZE. See kdbus.pool(7).
++	 */
++	hello.pool_size = POOL_SIZE;
++
++	/*
++	 * Now employ the command on the file descriptor opened above.
++	 * This command will turn the file descriptor into a connection-owner
++	 * file descriptor that controls the life-time of the connection; once
++	 * it's closed, the connection is shut down.
++	 */
++	r = kdbus_cmd_hello(b->fd, &hello);
++	if (r < 0) {
++		err_r(r, "HELLO failed");
++		goto error;
++	}
++
++	bus_poool_free_slice(b, hello.offset);
++
++	/*
++	 * Map the pool of the connection. Its size has been set in the
++	 * command struct above. See kdbus.pool(7).
++	 */
++	b->pool = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, b->fd, 0);
++	if (b->pool == MAP_FAILED) {
++		r = err("cannot mmap pool");
++		goto error;
++	}
++
++	*out = b;
++	return 0;
++
++error:
++	bus_close_connection(b);
++	return r;
++}
++
++static void bus_close_connection(struct bus *b)
++{
++	if (!b)
++		return;
++
++	/*
++	 * A bus connection is closed by simply calling close() on the
++	 * connection owner file descriptor. The unique name and all owned
++	 * well-known names of the connection will disappear.
++	 * See kdbus.connection(7).
++	 */
++	if (b->pool != MAP_FAILED)
++		munmap(b->pool, POOL_SIZE);
++	if (b->fd >= 0)
++		close(b->fd);
++	free(b);
++}
++
++static void bus_poool_free_slice(struct bus *b, uint64_t offset)
++{
++	struct kdbus_cmd_free cmd = {
++		.size = sizeof(cmd),
++		.offset = offset,
++	};
++	int r;
++
++	/*
++	 * Once we're done with a piece of pool memory that was returned
++	 * by a command, we have to call the KDBUS_CMD_FREE ioctl on it so it
++	 * can be reused. The command takes an argument of type
++	 * 'struct kdbus_cmd_free', in which the pool offset of the slice to
++	 * free is stored. The ioctl is employed on the connection owner
++	 * file descriptor. See kdbus.pool(7).
++	 */
++	r = kdbus_cmd_free(b->fd, &cmd);
++	if (r < 0)
++		err_r(r, "cannot free pool slice");
++}
++
++static int bus_acquire_name(struct bus *b, const char *name)
++{
++	struct kdbus_item *item;
++	struct kdbus_cmd *cmd;
++	size_t size;
++	int r;
++
++	/*
++	 * This function acquires a well-known name on the bus through the
++	 * KDBUS_CMD_NAME_ACQUIRE ioctl. This ioctl takes an argument of type
++	 * 'struct kdbus_cmd', which is assembled below. See kdbus.name(7).
++	 */
++	size = sizeof(*cmd);
++	size += KDBUS_ITEM_SIZE(strlen(name) + 1);
++
++	cmd = alloca(size);
++	memset(cmd, 0, size);
++	cmd->size = size;
++
++	/*
++	 * The command requires an item of type KDBUS_ITEM_NAME, and its
++	 * content must be a valid bus name.
++	 */
++	item = cmd->items;
++	item->type = KDBUS_ITEM_NAME;
++	item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
++	strcpy(item->str, name);
++
++	/*
++	 * Employ the command on the connection owner file descriptor.
++	 */
++	r = kdbus_cmd_name_acquire(b->fd, cmd);
++	if (r < 0)
++		return err_r(r, "cannot acquire name");
++
++	return 0;
++}
++
++static int bus_install_name_loss_match(struct bus *b, const char *name)
++{
++	struct kdbus_cmd_match *match;
++	struct kdbus_item *item;
++	size_t size;
++	int r;
++
++	/*
++	 * In order to install a match for signal messages, we have to
++	 * assemble a 'struct kdbus_cmd_match' and use it along with the
++	 * KDBUS_CMD_MATCH_ADD ioctl. See kdbus.match(7).
++	 */
++	size = sizeof(*match);
++	size += KDBUS_ITEM_SIZE(sizeof(item->name_change) + strlen(name) + 1);
++
++	match = alloca(size);
++	memset(match, 0, size);
++	match->size = size;
++
++	/*
++	 * A match is comprised of many 'rules', each of which describes a
++	 * mandatory detail of the message. All rules of a match must be
++	 * satisfied in order to make a message pass.
++	 */
++	item = match->items;
++
++	/*
++	 * In this case, we're interested in notifications that inform us
++	 * about a well-known name being removed from the bus.
++	 */
++	item->type = KDBUS_ITEM_NAME_REMOVE;
++	item->size = KDBUS_ITEM_HEADER_SIZE +
++			sizeof(item->name_change) + strlen(name) + 1;
++
++	/*
++	 * We could limit the match further and require a specific unique-ID
++	 * to be the new or the old owner of the name. In this case, however,
++	 * we don't, and allow 'any' id.
++	 */
++	item->name_change.old_id.id = KDBUS_MATCH_ID_ANY;
++	item->name_change.new_id.id = KDBUS_MATCH_ID_ANY;
++
++	/* Copy in the well-known name we're interested in */
++	strcpy(item->name_change.name, name);
++
++	/*
++	 * Add the match through the KDBUS_CMD_MATCH_ADD ioctl, employed on
++	 * the connection owner fd.
++	 */
++	r = kdbus_cmd_match_add(b->fd, match);
++	if (r < 0)
++		return err_r(r, "cannot add match");
++
++	return 0;
++}
++
++static int bus_poll(struct bus *b)
++{
++	struct pollfd fds[1] = {};
++	int r;
++
++	/*
++	 * A connection endpoint supports poll() and will wake up the
++	 * task with POLLIN set once a message has arrived.
++	 */
++	fds[0].fd = b->fd;
++	fds[0].events = POLLIN;
++	r = poll(fds, sizeof(fds) / sizeof(*fds), 0);
++	if (r < 0)
++		return err("cannot poll bus");
++	return !!(fds[0].revents & POLLIN);
++}
++
++static int bus_make(uid_t uid, const char *name)
++{
++	struct kdbus_item *item;
++	struct kdbus_cmd *make;
++	char path[128], busname[128];
++	size_t size;
++	int r, fd;
++
++	/*
++	 * Compute the full path to the 'control' node. 'arg_modname' may be
++	 * set to a different value than 'kdbus' for development purposes.
++	 * The 'control' node is the primary entry point to kdbus that must be
++	 * used in order to create a bus. See kdbus(7) and kdbus.bus(7).
++	 */
++	snprintf(path, sizeof(path), "/sys/fs/%s/control", arg_modname);
++
++	/*
++	 * Compute the bus name. A valid bus name must always be prefixed with
++	 * the EUID of the currently running process in order to avoid name
++	 * conflicts. See kdbus.bus(7).
++	 */
++	snprintf(busname, sizeof(busname), "%lu-%s", (unsigned long)uid, name);
++
++	fd = open(path, O_RDWR | O_CLOEXEC);
++	if (fd < 0)
++		return err("cannot open control file");
++
++	/*
++	 * The KDBUS_CMD_BUS_MAKE ioctl takes an argument of type
++	 * 'struct kdbus_cmd', and expects at least two items attached to
++	 * it: one to describe the bloom parameters to be propagated to
++	 * connections of the bus, and the name of the bus that was computed
++	 * above. Assemble this struct now, and fill it with values.
++	 */
++	size = sizeof(*make);
++	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_parameter));
++	size += KDBUS_ITEM_SIZE(strlen(busname) + 1);
++
++	make = alloca(size);
++	memset(make, 0, size);
++	make->size = size;
++
++	/*
++	 * Each item has a 'type' and 'size' field, and must be stored at an
++	 * 8-byte aligned address. The KDBUS_ITEM_NEXT macro is used to advance
++	 * the pointer. See kdbus.item(7) for more details.
++	 */
++	item = make->items;
++	item->type = KDBUS_ITEM_BLOOM_PARAMETER;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(item->bloom_parameter);
++	item->bloom_parameter.size = 8;
++	item->bloom_parameter.n_hash = 1;
++
++	/* The name of the new bus is stored in the next item. */
++	item = KDBUS_ITEM_NEXT(item);
++	item->type = KDBUS_ITEM_MAKE_NAME;
++	item->size = KDBUS_ITEM_HEADER_SIZE + strlen(busname) + 1;
++	strcpy(item->str, busname);
++
++	/*
++	 * Now create the bus via the KDBUS_CMD_BUS_MAKE ioctl and return the
++	 * fd that was used back to the caller of this function. This fd is now
++	 * called a 'bus owner file descriptor', and it controls the life-time
++	 * of the newly created bus; once the file descriptor is closed, the
++	 * bus goes away, and all connections are shut down. See kdbus.bus(7).
++	 */
++	r = kdbus_cmd_bus_make(fd, make);
++	if (r < 0) {
++		err_r(r, "cannot make bus");
++		close(fd);
++		return r;
++	}
++
++	return fd;
++}
++
++#else
++
++#warning "Skipping compilation due to unsupported libc version"
++
++int main(int argc, char **argv)
++{
++	fprintf(stderr,
++		"Compilation of %s was skipped due to unsupported libc.\n",
++		argv[0]);
++
++	return EXIT_FAILURE;
++}
++
++#endif /* libc sanity check */
+diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
+index 95abddc..b57100c 100644
+--- a/tools/testing/selftests/Makefile
++++ b/tools/testing/selftests/Makefile
+@@ -5,6 +5,7 @@ TARGETS += exec
+ TARGETS += firmware
+ TARGETS += ftrace
+ TARGETS += kcmp
++TARGETS += kdbus
+ TARGETS += memfd
+ TARGETS += memory-hotplug
+ TARGETS += mount
+diff --git a/tools/testing/selftests/kdbus/.gitignore b/tools/testing/selftests/kdbus/.gitignore
+new file mode 100644
+index 0000000..d3ef42f
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/.gitignore
+@@ -0,0 +1 @@
++kdbus-test
+diff --git a/tools/testing/selftests/kdbus/Makefile b/tools/testing/selftests/kdbus/Makefile
+new file mode 100644
+index 0000000..8f36cb5
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/Makefile
+@@ -0,0 +1,49 @@
++CFLAGS += -I../../../../usr/include/
++CFLAGS += -I../../../../samples/kdbus/
++CFLAGS += -I../../../../include/uapi/
++CFLAGS += -std=gnu99
++CFLAGS += -DKBUILD_MODNAME=\"kdbus\" -D_GNU_SOURCE
++LDLIBS = -pthread -lcap -lm
++
++OBJS= \
++	kdbus-enum.o		\
++	kdbus-util.o		\
++	kdbus-test.o		\
++	kdbus-test.o		\
++	test-activator.o	\
++	test-benchmark.o	\
++	test-bus.o		\
++	test-chat.o		\
++	test-connection.o	\
++	test-daemon.o		\
++	test-endpoint.o		\
++	test-fd.o		\
++	test-free.o		\
++	test-match.o		\
++	test-message.o		\
++	test-metadata-ns.o	\
++	test-monitor.o		\
++	test-names.o		\
++	test-policy.o		\
++	test-policy-ns.o	\
++	test-policy-priv.o	\
++	test-sync.o		\
++	test-timeout.o
++
++all: kdbus-test
++
++include ../lib.mk
++
++%.o: %.c kdbus-enum.h kdbus-test.h kdbus-util.h
++	$(CC) $(CFLAGS) -c $< -o $@
++
++kdbus-test: $(OBJS)
++	$(CC) $(CFLAGS) $^ $(LDLIBS) -o $@
++
++TEST_PROGS := kdbus-test
++
++run_tests:
++	./kdbus-test --tap
++
++clean:
++	rm -f *.o kdbus-test
+diff --git a/tools/testing/selftests/kdbus/kdbus-enum.c b/tools/testing/selftests/kdbus/kdbus-enum.c
+new file mode 100644
+index 0000000..4f1e579
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/kdbus-enum.c
+@@ -0,0 +1,94 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++struct kdbus_enum_table {
++	long long id;
++	const char *name;
++};
++
++#define TABLE(what) static struct kdbus_enum_table kdbus_table_##what[]
++#define ENUM(_id) { .id = _id, .name = STRINGIFY(_id) }
++#define LOOKUP(what)							\
++	const char *enum_##what(long long id)				\
++	{								\
++		for (size_t i = 0; i < ELEMENTSOF(kdbus_table_##what); i++) \
++			if (id == kdbus_table_##what[i].id)		\
++				return kdbus_table_##what[i].name;	\
++		return "UNKNOWN";					\
++	}
++
++TABLE(CMD) = {
++	ENUM(KDBUS_CMD_BUS_MAKE),
++	ENUM(KDBUS_CMD_ENDPOINT_MAKE),
++	ENUM(KDBUS_CMD_HELLO),
++	ENUM(KDBUS_CMD_SEND),
++	ENUM(KDBUS_CMD_RECV),
++	ENUM(KDBUS_CMD_LIST),
++	ENUM(KDBUS_CMD_NAME_RELEASE),
++	ENUM(KDBUS_CMD_CONN_INFO),
++	ENUM(KDBUS_CMD_MATCH_ADD),
++	ENUM(KDBUS_CMD_MATCH_REMOVE),
++};
++LOOKUP(CMD);
++
++TABLE(MSG) = {
++	ENUM(_KDBUS_ITEM_NULL),
++	ENUM(KDBUS_ITEM_PAYLOAD_VEC),
++	ENUM(KDBUS_ITEM_PAYLOAD_OFF),
++	ENUM(KDBUS_ITEM_PAYLOAD_MEMFD),
++	ENUM(KDBUS_ITEM_FDS),
++	ENUM(KDBUS_ITEM_BLOOM_PARAMETER),
++	ENUM(KDBUS_ITEM_BLOOM_FILTER),
++	ENUM(KDBUS_ITEM_DST_NAME),
++	ENUM(KDBUS_ITEM_MAKE_NAME),
++	ENUM(KDBUS_ITEM_ATTACH_FLAGS_SEND),
++	ENUM(KDBUS_ITEM_ATTACH_FLAGS_RECV),
++	ENUM(KDBUS_ITEM_ID),
++	ENUM(KDBUS_ITEM_NAME),
++	ENUM(KDBUS_ITEM_TIMESTAMP),
++	ENUM(KDBUS_ITEM_CREDS),
++	ENUM(KDBUS_ITEM_PIDS),
++	ENUM(KDBUS_ITEM_AUXGROUPS),
++	ENUM(KDBUS_ITEM_OWNED_NAME),
++	ENUM(KDBUS_ITEM_TID_COMM),
++	ENUM(KDBUS_ITEM_PID_COMM),
++	ENUM(KDBUS_ITEM_EXE),
++	ENUM(KDBUS_ITEM_CMDLINE),
++	ENUM(KDBUS_ITEM_CGROUP),
++	ENUM(KDBUS_ITEM_CAPS),
++	ENUM(KDBUS_ITEM_SECLABEL),
++	ENUM(KDBUS_ITEM_AUDIT),
++	ENUM(KDBUS_ITEM_CONN_DESCRIPTION),
++	ENUM(KDBUS_ITEM_NAME_ADD),
++	ENUM(KDBUS_ITEM_NAME_REMOVE),
++	ENUM(KDBUS_ITEM_NAME_CHANGE),
++	ENUM(KDBUS_ITEM_ID_ADD),
++	ENUM(KDBUS_ITEM_ID_REMOVE),
++	ENUM(KDBUS_ITEM_REPLY_TIMEOUT),
++	ENUM(KDBUS_ITEM_REPLY_DEAD),
++};
++LOOKUP(MSG);
++
++TABLE(PAYLOAD) = {
++	ENUM(KDBUS_PAYLOAD_KERNEL),
++	ENUM(KDBUS_PAYLOAD_DBUS),
++};
++LOOKUP(PAYLOAD);
+diff --git a/tools/testing/selftests/kdbus/kdbus-enum.h b/tools/testing/selftests/kdbus/kdbus-enum.h
+new file mode 100644
+index 0000000..ed28cca
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/kdbus-enum.h
+@@ -0,0 +1,15 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#pragma once
++
++const char *enum_CMD(long long id);
++const char *enum_MSG(long long id);
++const char *enum_MATCH(long long id);
++const char *enum_PAYLOAD(long long id);
+diff --git a/tools/testing/selftests/kdbus/kdbus-test.c b/tools/testing/selftests/kdbus/kdbus-test.c
+new file mode 100644
+index 0000000..db57381
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/kdbus-test.c
+@@ -0,0 +1,905 @@
++#include <errno.h>
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <time.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <assert.h>
++#include <getopt.h>
++#include <stdbool.h>
++#include <signal.h>
++#include <sys/mount.h>
++#include <sys/prctl.h>
++#include <sys/wait.h>
++#include <sys/syscall.h>
++#include <sys/eventfd.h>
++#include <linux/sched.h>
++
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++enum {
++	TEST_CREATE_BUS		= 1 << 0,
++	TEST_CREATE_CONN	= 1 << 1,
++};
++
++struct kdbus_test {
++	const char *name;
++	const char *desc;
++	int (*func)(struct kdbus_test_env *env);
++	unsigned int flags;
++};
++
++struct kdbus_test_args {
++	bool mntns;
++	bool pidns;
++	bool userns;
++	char *uid_map;
++	char *gid_map;
++	int loop;
++	int wait;
++	int fork;
++	int tap_output;
++	char *module;
++	char *root;
++	char *test;
++	char *busname;
++};
++
++static const struct kdbus_test tests[] = {
++	{
++		.name	= "bus-make",
++		.desc	= "bus make functions",
++		.func	= kdbus_test_bus_make,
++		.flags	= 0,
++	},
++	{
++		.name	= "hello",
++		.desc	= "the HELLO command",
++		.func	= kdbus_test_hello,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "byebye",
++		.desc	= "the BYEBYE command",
++		.func	= kdbus_test_byebye,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "chat",
++		.desc	= "a chat pattern",
++		.func	= kdbus_test_chat,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "daemon",
++		.desc	= "a simple daemon",
++		.func	= kdbus_test_daemon,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "fd-passing",
++		.desc	= "file descriptor passing",
++		.func	= kdbus_test_fd_passing,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "endpoint",
++		.desc	= "custom endpoint",
++		.func	= kdbus_test_custom_endpoint,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "monitor",
++		.desc	= "monitor functionality",
++		.func	= kdbus_test_monitor,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "name-basics",
++		.desc	= "basic name registry functions",
++		.func	= kdbus_test_name_basic,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "name-conflict",
++		.desc	= "name registry conflict details",
++		.func	= kdbus_test_name_conflict,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "name-queue",
++		.desc	= "queuing of names",
++		.func	= kdbus_test_name_queue,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "name-takeover",
++		.desc	= "takeover of names",
++		.func	= kdbus_test_name_takeover,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "message-basic",
++		.desc	= "basic message handling",
++		.func	= kdbus_test_message_basic,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "message-prio",
++		.desc	= "handling of messages with priority",
++		.func	= kdbus_test_message_prio,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "message-quota",
++		.desc	= "message quotas are enforced",
++		.func	= kdbus_test_message_quota,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "memory-access",
++		.desc	= "memory access",
++		.func	= kdbus_test_memory_access,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "timeout",
++		.desc	= "timeout",
++		.func	= kdbus_test_timeout,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "sync-byebye",
++		.desc	= "synchronous replies vs. BYEBYE",
++		.func	= kdbus_test_sync_byebye,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "sync-reply",
++		.desc	= "synchronous replies",
++		.func	= kdbus_test_sync_reply,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "message-free",
++		.desc	= "freeing of memory",
++		.func	= kdbus_test_free,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "connection-info",
++		.desc	= "retrieving connection information",
++		.func	= kdbus_test_conn_info,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "connection-update",
++		.desc	= "updating connection information",
++		.func	= kdbus_test_conn_update,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "writable-pool",
++		.desc	= "verifying pools are never writable",
++		.func	= kdbus_test_writable_pool,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "policy",
++		.desc	= "policy",
++		.func	= kdbus_test_policy,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "policy-priv",
++		.desc	= "unprivileged bus access",
++		.func	= kdbus_test_policy_priv,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "policy-ns",
++		.desc	= "policy in user namespaces",
++		.func	= kdbus_test_policy_ns,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "metadata-ns",
++		.desc	= "metadata in different namespaces",
++		.func	= kdbus_test_metadata_ns,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "match-id-add",
++		.desc	= "adding of matches by id",
++		.func	= kdbus_test_match_id_add,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "match-id-remove",
++		.desc	= "removing of matches by id",
++		.func	= kdbus_test_match_id_remove,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "match-replace",
++		.desc	= "replace of matches with the same cookie",
++		.func	= kdbus_test_match_replace,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "match-name-add",
++		.desc	= "adding of matches by name",
++		.func	= kdbus_test_match_name_add,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "match-name-remove",
++		.desc	= "removing of matches by name",
++		.func	= kdbus_test_match_name_remove,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "match-name-change",
++		.desc	= "matching for name changes",
++		.func	= kdbus_test_match_name_change,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "match-bloom",
++		.desc	= "matching with bloom filters",
++		.func	= kdbus_test_match_bloom,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "activator",
++		.desc	= "activator connections",
++		.func	= kdbus_test_activator,
++		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
++	},
++	{
++		.name	= "benchmark",
++		.desc	= "benchmark",
++		.func	= kdbus_test_benchmark,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "benchmark-nomemfds",
++		.desc	= "benchmark without using memfds",
++		.func	= kdbus_test_benchmark_nomemfds,
++		.flags	= TEST_CREATE_BUS,
++	},
++	{
++		.name	= "benchmark-uds",
++		.desc	= "benchmark comparison to UDS",
++		.func	= kdbus_test_benchmark_uds,
++		.flags	= TEST_CREATE_BUS,
++	},
++};
++
++#define N_TESTS ((int) (sizeof(tests) / sizeof(tests[0])))
++
++static int test_prepare_env(const struct kdbus_test *t,
++			    const struct kdbus_test_args *args,
++			    struct kdbus_test_env *env)
++{
++	if (t->flags & TEST_CREATE_BUS) {
++		char *s;
++		char *n = NULL;
++		int ret;
++
++		asprintf(&s, "%s/control", args->root);
++
++		env->control_fd = open(s, O_RDWR);
++		free(s);
++		ASSERT_RETURN(env->control_fd >= 0);
++
++		if (!args->busname) {
++			n = unique_name("test-bus");
++			ASSERT_RETURN(n);
++		}
++
++		ret = kdbus_create_bus(env->control_fd,
++				       args->busname ?: n,
++				       _KDBUS_ATTACH_ALL, &s);
++		free(n);
++		ASSERT_RETURN(ret == 0);
++
++		asprintf(&env->buspath, "%s/%s/bus", args->root, s);
++		free(s);
++	}
++
++	if (t->flags & TEST_CREATE_CONN) {
++		env->conn = kdbus_hello(env->buspath, 0, NULL, 0);
++		ASSERT_RETURN(env->conn);
++	}
++
++	env->root = args->root;
++	env->module = args->module;
++
++	return 0;
++}
++
++void test_unprepare_env(const struct kdbus_test *t, struct kdbus_test_env *env)
++{
++	if (env->conn) {
++		kdbus_conn_free(env->conn);
++		env->conn = NULL;
++	}
++
++	if (env->control_fd >= 0) {
++		close(env->control_fd);
++		env->control_fd = -1;
++	}
++
++	if (env->buspath) {
++		free(env->buspath);
++		env->buspath = NULL;
++	}
++}
++
++static int test_run(const struct kdbus_test *t,
++		    const struct kdbus_test_args *kdbus_args,
++		    int wait)
++{
++	int ret;
++	struct kdbus_test_env env = {};
++
++	ret = test_prepare_env(t, kdbus_args, &env);
++	if (ret != TEST_OK)
++		return ret;
++
++	if (wait > 0) {
++		printf("Sleeping %d seconds before running test ...\n", wait);
++		sleep(wait);
++	}
++
++	ret = t->func(&env);
++	test_unprepare_env(t, &env);
++	return ret;
++}
++
++static int test_run_forked(const struct kdbus_test *t,
++			   const struct kdbus_test_args *kdbus_args,
++			   int wait)
++{
++	int ret;
++	pid_t pid;
++
++	pid = fork();
++	if (pid < 0) {
++		return TEST_ERR;
++	} else if (pid == 0) {
++		ret = test_run(t, kdbus_args, wait);
++		_exit(ret);
++	}
++
++	pid = waitpid(pid, &ret, 0);
++	if (pid <= 0)
++		return TEST_ERR;
++	else if (!WIFEXITED(ret))
++		return TEST_ERR;
++	else
++		return WEXITSTATUS(ret);
++}
++
++static void print_test_result(int ret)
++{
++	switch (ret) {
++	case TEST_OK:
++		printf("OK");
++		break;
++	case TEST_SKIP:
++		printf("SKIPPED");
++		break;
++	case TEST_ERR:
++		printf("ERROR");
++		break;
++	}
++}
++
++static int start_all_tests(struct kdbus_test_args *kdbus_args)
++{
++	int ret;
++	unsigned int fail_cnt = 0;
++	unsigned int skip_cnt = 0;
++	unsigned int ok_cnt = 0;
++	unsigned int i;
++
++	if (kdbus_args->tap_output) {
++		printf("1..%d\n", N_TESTS);
++		fflush(stdout);
++	}
++
++	kdbus_util_verbose = false;
++
++	for (i = 0; i < N_TESTS; i++) {
++		const struct kdbus_test *t = tests + i;
++
++		if (!kdbus_args->tap_output) {
++			unsigned int n;
++
++			printf("Testing %s (%s) ", t->desc, t->name);
++			for (n = 0; n < 60 - strlen(t->desc) - strlen(t->name); n++)
++				printf(".");
++			printf(" ");
++		}
++
++		ret = test_run_forked(t, kdbus_args, 0);
++		switch (ret) {
++		case TEST_OK:
++			ok_cnt++;
++			break;
++		case TEST_SKIP:
++			skip_cnt++;
++			break;
++		case TEST_ERR:
++			fail_cnt++;
++			break;
++		}
++
++		if (kdbus_args->tap_output) {
++			printf("%sok %d - %s%s (%s)\n",
++			       (ret == TEST_ERR) ? "not " : "", i + 1,
++			       (ret == TEST_SKIP) ? "# SKIP " : "",
++			       t->desc, t->name);
++			fflush(stdout);
++		} else {
++			print_test_result(ret);
++			printf("\n");
++		}
++	}
++
++	if (kdbus_args->tap_output)
++		printf("Failed %d/%d tests, %.2f%% okay\n", fail_cnt, N_TESTS,
++		       100.0 - (fail_cnt * 100.0) / ((float) N_TESTS));
++	else
++		printf("\nSUMMARY: %u tests passed, %u skipped, %u failed\n",
++		       ok_cnt, skip_cnt, fail_cnt);
++
++	return fail_cnt > 0 ? TEST_ERR : TEST_OK;
++}
++
++static int start_one_test(struct kdbus_test_args *kdbus_args)
++{
++	int i, ret;
++	bool test_found = false;
++
++	for (i = 0; i < N_TESTS; i++) {
++		const struct kdbus_test *t = tests + i;
++
++		if (strcmp(t->name, kdbus_args->test))
++			continue;
++
++		do {
++			test_found = true;
++			if (kdbus_args->fork)
++				ret = test_run_forked(t, kdbus_args,
++						      kdbus_args->wait);
++			else
++				ret = test_run(t, kdbus_args,
++					       kdbus_args->wait);
++
++			printf("Testing %s: ", t->desc);
++			print_test_result(ret);
++			printf("\n");
++
++			if (ret != TEST_OK)
++				break;
++		} while (kdbus_args->loop);
++
++		return ret;
++	}
++
++	if (!test_found) {
++		printf("Unknown test-id '%s'\n", kdbus_args->test);
++		return TEST_ERR;
++	}
++
++	return TEST_OK;
++}
++
++static void usage(const char *argv0)
++{
++	unsigned int i, j;
++
++	printf("Usage: %s [options]\n"
++	       "Options:\n"
++	       "\t-a, --tap		Output test results in TAP format\n"
++	       "\t-m, --module <module>	Kdbus module name\n"
++	       "\t-x, --loop		Run in a loop\n"
++	       "\t-f, --fork		Fork before running a test\n"
++	       "\t-h, --help		Print this help\n"
++	       "\t-r, --root <root>	Toplevel of the kdbus hierarchy\n"
++	       "\t-t, --test <test-id>	Run one specific test only, in verbose mode\n"
++	       "\t-b, --bus <busname>	Instead of generating a random bus name, take <busname>.\n"
++	       "\t-w, --wait <secs>	Wait <secs> before actually starting test\n"
++	       "\t    --mntns		New mount namespace\n"
++	       "\t    --pidns		New PID namespace\n"
++	       "\t    --userns		New user namespace\n"
++	       "\t    --uidmap uid_map	UID map for user namespace\n"
++	       "\t    --gidmap gid_map	GID map for user namespace\n"
++	       "\n", argv0);
++
++	printf("By default, all tests are run once, and a summary is printed.\n"
++	       "Available tests for --test:\n\n");
++
++	for (i = 0; i < N_TESTS; i++) {
++		const struct kdbus_test *t = tests + i;
++
++		printf("\t%s", t->name);
++
++		for (j = 0; j < 24 - strlen(t->name); j++)
++			printf(" ");
++
++		printf("Test %s\n", t->desc);
++	}
++
++	printf("\n");
++	printf("Note that some tests may, if run specifically by --test, "
++	       "behave differently, and not terminate by themselves.\n");
++
++	exit(EXIT_FAILURE);
++}
++
++void print_kdbus_test_args(struct kdbus_test_args *args)
++{
++	if (args->userns || args->pidns || args->mntns)
++		printf("# Starting tests in new %s%s%s namespaces%s\n",
++			args->mntns ? "MOUNT " : "",
++			args->pidns ? "PID " : "",
++			args->userns ? "USER " : "",
++			args->mntns ? ", kdbusfs will be remounted" : "");
++	else
++		printf("# Starting tests in the same namespaces\n");
++}
++
++void print_metadata_support(void)
++{
++	bool no_meta_audit, no_meta_cgroups, no_meta_seclabel;
++
++	/*
++	 * KDBUS_ATTACH_CGROUP, KDBUS_ATTACH_AUDIT and
++	 * KDBUS_ATTACH_SECLABEL
++	 */
++	no_meta_audit = !config_auditsyscall_is_enabled();
++	no_meta_cgroups = !config_cgroups_is_enabled();
++	no_meta_seclabel = !config_security_is_enabled();
++
++	if (no_meta_audit || no_meta_cgroups || no_meta_seclabel)
++		printf("# Starting tests without %s%s%s metadata support\n",
++		       no_meta_audit ? "AUDIT " : "",
++		       no_meta_cgroups ? "CGROUP " : "",
++		       no_meta_seclabel ? "SECLABEL " : "");
++	else
++		printf("# Starting tests with full metadata support\n");
++}
++
++int run_tests(struct kdbus_test_args *kdbus_args)
++{
++	int ret;
++	static char control[4096];
++
++	snprintf(control, sizeof(control), "%s/control", kdbus_args->root);
++
++	if (access(control, W_OK) < 0) {
++		printf("Unable to locate control node at '%s'.\n",
++			control);
++		return TEST_ERR;
++	}
++
++	if (kdbus_args->test) {
++		ret = start_one_test(kdbus_args);
++	} else {
++		do {
++			ret = start_all_tests(kdbus_args);
++			if (ret != TEST_OK)
++				break;
++		} while (kdbus_args->loop);
++	}
++
++	return ret;
++}
++
++static void nop_handler(int sig) {}
++
++static int test_prepare_mounts(struct kdbus_test_args *kdbus_args)
++{
++	int ret;
++	char kdbusfs[64] = {'\0'};
++
++	snprintf(kdbusfs, sizeof(kdbusfs), "%sfs", kdbus_args->module);
++
++	/* make current mount slave */
++	ret = mount(NULL, "/", NULL, MS_SLAVE|MS_REC, NULL);
++	if (ret < 0) {
++		ret = -errno;
++		printf("error mount() root: %d (%m)\n", ret);
++		return ret;
++	}
++
++	/* Remount procfs since we need it in our tests */
++	if (kdbus_args->pidns) {
++		ret = mount("proc", "/proc", "proc",
++			    MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL);
++		if (ret < 0) {
++			ret = -errno;
++			printf("error mount() /proc : %d (%m)\n", ret);
++			return ret;
++		}
++	}
++
++	/* Remount kdbusfs */
++	ret = mount(kdbusfs, kdbus_args->root, kdbusfs,
++		    MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL);
++	if (ret < 0) {
++		ret = -errno;
++		printf("error mount() %s: %d (%m)\n", kdbusfs, ret);
++		return ret;
++	}
++
++	return 0;
++}
++
++int run_tests_in_namespaces(struct kdbus_test_args *kdbus_args)
++{
++	int ret;
++	int efd = -1;
++	int status;
++	pid_t pid, rpid;
++	struct sigaction oldsa;
++	struct sigaction sa = {
++		.sa_handler = nop_handler,
++		.sa_flags = SA_NOCLDSTOP,
++	};
++
++	efd = eventfd(0, EFD_CLOEXEC);
++	if (efd < 0) {
++		ret = -errno;
++		printf("eventfd() failed: %d (%m)\n", ret);
++		return TEST_ERR;
++	}
++
++	ret = sigaction(SIGCHLD, &sa, &oldsa);
++	if (ret < 0) {
++		ret = -errno;
++		printf("sigaction() failed: %d (%m)\n", ret);
++		return TEST_ERR;
++	}
++
++	/* setup namespaces */
++	pid = syscall(__NR_clone, SIGCHLD|
++		      (kdbus_args->userns ? CLONE_NEWUSER : 0) |
++		      (kdbus_args->mntns ? CLONE_NEWNS : 0) |
++		      (kdbus_args->pidns ? CLONE_NEWPID : 0), NULL);
++	if (pid < 0) {
++		printf("clone() failed: %d (%m)\n", -errno);
++		return TEST_ERR;
++	}
++
++	if (pid == 0) {
++		eventfd_t event_status = 0;
++
++		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
++		if (ret < 0) {
++			ret = -errno;
++			printf("error prctl(): %d (%m)\n", ret);
++			_exit(TEST_ERR);
++		}
++
++		/* restore the original SIGCHLD handler in the child */
++		ret = sigaction(SIGCHLD, &oldsa, NULL);
++		if (ret < 0) {
++			ret = -errno;
++			printf("sigaction() failed: %d (%m)\n", ret);
++			_exit(TEST_ERR);
++		}
++
++		ret = eventfd_read(efd, &event_status);
++		if (ret < 0 || event_status != 1) {
++			printf("error eventfd_read()\n");
++			_exit(TEST_ERR);
++		}
++
++		if (kdbus_args->mntns) {
++			ret = test_prepare_mounts(kdbus_args);
++			if (ret < 0) {
++				printf("error preparing mounts\n");
++				_exit(TEST_ERR);
++			}
++		}
++
++		ret = run_tests(kdbus_args);
++		_exit(ret);
++	}
++
++	/* Setup userns mapping */
++	if (kdbus_args->userns) {
++		ret = userns_map_uid_gid(pid, kdbus_args->uid_map,
++					 kdbus_args->gid_map);
++		if (ret < 0) {
++			printf("error mapping uid and gid in userns\n");
++			eventfd_write(efd, 2);
++			return TEST_ERR;
++		}
++	}
++
++	ret = eventfd_write(efd, 1);
++	if (ret < 0) {
++		ret = -errno;
++		printf("error eventfd_write(): %d (%m)\n", ret);
++		return TEST_ERR;
++	}
++
++	rpid = waitpid(pid, &status, 0);
++	ASSERT_RETURN_VAL(rpid == pid, TEST_ERR);
++
++	close(efd);
++
++	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
++		return TEST_ERR;
++
++	return TEST_OK;
++}
++
++int start_tests(struct kdbus_test_args *kdbus_args)
++{
++	int ret;
++	bool namespaces;
++	static char fspath[4096];
++
++	namespaces = (kdbus_args->mntns || kdbus_args->pidns ||
++		      kdbus_args->userns);
++
++	/* for pidns we need mntns set */
++	if (kdbus_args->pidns && !kdbus_args->mntns) {
++		printf("Failed: please set both pid and mnt namespaces\n");
++		return TEST_ERR;
++	}
++
++	if (kdbus_args->userns) {
++		if (!config_user_ns_is_enabled()) {
++			printf("User namespace not supported\n");
++			return TEST_ERR;
++		}
++
++		if (!kdbus_args->uid_map || !kdbus_args->gid_map) {
++			printf("Failed: please specify both uid and gid mappings\n");
++			return TEST_ERR;
++		}
++	}
++
++	print_kdbus_test_args(kdbus_args);
++	print_metadata_support();
++
++	/* setup kdbus paths */
++	if (!kdbus_args->module)
++		kdbus_args->module = "kdbus";
++
++	if (!kdbus_args->root) {
++		snprintf(fspath, sizeof(fspath), "/sys/fs/%s",
++			 kdbus_args->module);
++		kdbus_args->root = fspath;
++	}
++
++	/* Start tests */
++	if (namespaces)
++		ret = run_tests_in_namespaces(kdbus_args);
++	else
++		ret = run_tests(kdbus_args);
++
++	return ret;
++}
++
++int main(int argc, char *argv[])
++{
++	int t, ret = 0;
++	struct kdbus_test_args *kdbus_args;
++	enum {
++		ARG_MNTNS = 0x100,
++		ARG_PIDNS,
++		ARG_USERNS,
++		ARG_UIDMAP,
++		ARG_GIDMAP,
++	};
++
++	kdbus_args = malloc(sizeof(*kdbus_args));
++	if (!kdbus_args) {
++		printf("unable to malloc() kdbus_args\n");
++		return EXIT_FAILURE;
++	}
++
++	memset(kdbus_args, 0, sizeof(*kdbus_args));
++
++	static const struct option options[] = {
++		{ "loop",	no_argument,		NULL, 'x' },
++		{ "help",	no_argument,		NULL, 'h' },
++		{ "root",	required_argument,	NULL, 'r' },
++		{ "test",	required_argument,	NULL, 't' },
++		{ "bus",	required_argument,	NULL, 'b' },
++		{ "wait",	required_argument,	NULL, 'w' },
++		{ "fork",	no_argument,		NULL, 'f' },
++		{ "module",	required_argument,	NULL, 'm' },
++		{ "tap",	no_argument,		NULL, 'a' },
++		{ "mntns",	no_argument,		NULL, ARG_MNTNS },
++		{ "pidns",	no_argument,		NULL, ARG_PIDNS },
++		{ "userns",	no_argument,		NULL, ARG_USERNS },
++		{ "uidmap",	required_argument,	NULL, ARG_UIDMAP },
++		{ "gidmap",	required_argument,	NULL, ARG_GIDMAP },
++		{}
++	};
++
++	srand(time(NULL));
++
++	while ((t = getopt_long(argc, argv, "hxfm:r:t:b:w:a", options, NULL)) >= 0) {
++		switch (t) {
++		case 'x':
++			kdbus_args->loop = 1;
++			break;
++
++		case 'm':
++			kdbus_args->module = optarg;
++			break;
++
++		case 'r':
++			kdbus_args->root = optarg;
++			break;
++
++		case 't':
++			kdbus_args->test = optarg;
++			break;
++
++		case 'b':
++			kdbus_args->busname = optarg;
++			break;
++
++		case 'w':
++			kdbus_args->wait = strtol(optarg, NULL, 10);
++			break;
++
++		case 'f':
++			kdbus_args->fork = 1;
++			break;
++
++		case 'a':
++			kdbus_args->tap_output = 1;
++			break;
++
++		case ARG_MNTNS:
++			kdbus_args->mntns = true;
++			break;
++
++		case ARG_PIDNS:
++			kdbus_args->pidns = true;
++			break;
++
++		case ARG_USERNS:
++			kdbus_args->userns = true;
++			break;
++
++		case ARG_UIDMAP:
++			kdbus_args->uid_map = optarg;
++			break;
++
++		case ARG_GIDMAP:
++			kdbus_args->gid_map = optarg;
++			break;
++
++		default:
++		case 'h':
++			usage(argv[0]);
++		}
++	}
++
++	ret = start_tests(kdbus_args);
++	if (ret == TEST_ERR)
++		return EXIT_FAILURE;
++
++	free(kdbus_args);
++
++	return 0;
++}
+diff --git a/tools/testing/selftests/kdbus/kdbus-test.h b/tools/testing/selftests/kdbus/kdbus-test.h
+new file mode 100644
+index 0000000..ee937f9
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/kdbus-test.h
+@@ -0,0 +1,84 @@
++#ifndef _TEST_KDBUS_H_
++#define _TEST_KDBUS_H_
++
++struct kdbus_test_env {
++	char *buspath;
++	const char *root;
++	const char *module;
++	int control_fd;
++	struct kdbus_conn *conn;
++};
++
++enum {
++	TEST_OK,
++	TEST_SKIP,
++	TEST_ERR,
++};
++
++#define ASSERT_RETURN_VAL(cond, val)		\
++	if (!(cond)) {			\
++		fprintf(stderr,	"Assertion '%s' failed in %s(), %s:%d\n", \
++			#cond, __func__, __FILE__, __LINE__);	\
++		return val;	\
++	}
++
++#define ASSERT_EXIT_VAL(cond, val)		\
++	if (!(cond)) {			\
++		fprintf(stderr, "Assertion '%s' failed in %s(), %s:%d\n", \
++			#cond, __func__, __FILE__, __LINE__);	\
++		_exit(val);	\
++	}
++
++#define ASSERT_BREAK(cond)		\
++	if (!(cond)) {			\
++		fprintf(stderr, "Assertion '%s' failed in %s(), %s:%d\n", \
++			#cond, __func__, __FILE__, __LINE__);	\
++		break; \
++	}
++
++#define ASSERT_RETURN(cond)		\
++	ASSERT_RETURN_VAL(cond, TEST_ERR)
++
++#define ASSERT_EXIT(cond)		\
++	ASSERT_EXIT_VAL(cond, EXIT_FAILURE)
++
++int kdbus_test_activator(struct kdbus_test_env *env);
++int kdbus_test_benchmark(struct kdbus_test_env *env);
++int kdbus_test_benchmark_nomemfds(struct kdbus_test_env *env);
++int kdbus_test_benchmark_uds(struct kdbus_test_env *env);
++int kdbus_test_bus_make(struct kdbus_test_env *env);
++int kdbus_test_byebye(struct kdbus_test_env *env);
++int kdbus_test_chat(struct kdbus_test_env *env);
++int kdbus_test_conn_info(struct kdbus_test_env *env);
++int kdbus_test_conn_update(struct kdbus_test_env *env);
++int kdbus_test_daemon(struct kdbus_test_env *env);
++int kdbus_test_custom_endpoint(struct kdbus_test_env *env);
++int kdbus_test_fd_passing(struct kdbus_test_env *env);
++int kdbus_test_free(struct kdbus_test_env *env);
++int kdbus_test_hello(struct kdbus_test_env *env);
++int kdbus_test_match_bloom(struct kdbus_test_env *env);
++int kdbus_test_match_id_add(struct kdbus_test_env *env);
++int kdbus_test_match_id_remove(struct kdbus_test_env *env);
++int kdbus_test_match_replace(struct kdbus_test_env *env);
++int kdbus_test_match_name_add(struct kdbus_test_env *env);
++int kdbus_test_match_name_change(struct kdbus_test_env *env);
++int kdbus_test_match_name_remove(struct kdbus_test_env *env);
++int kdbus_test_message_basic(struct kdbus_test_env *env);
++int kdbus_test_message_prio(struct kdbus_test_env *env);
++int kdbus_test_message_quota(struct kdbus_test_env *env);
++int kdbus_test_memory_access(struct kdbus_test_env *env);
++int kdbus_test_metadata_ns(struct kdbus_test_env *env);
++int kdbus_test_monitor(struct kdbus_test_env *env);
++int kdbus_test_name_basic(struct kdbus_test_env *env);
++int kdbus_test_name_conflict(struct kdbus_test_env *env);
++int kdbus_test_name_queue(struct kdbus_test_env *env);
++int kdbus_test_name_takeover(struct kdbus_test_env *env);
++int kdbus_test_policy(struct kdbus_test_env *env);
++int kdbus_test_policy_ns(struct kdbus_test_env *env);
++int kdbus_test_policy_priv(struct kdbus_test_env *env);
++int kdbus_test_sync_byebye(struct kdbus_test_env *env);
++int kdbus_test_sync_reply(struct kdbus_test_env *env);
++int kdbus_test_timeout(struct kdbus_test_env *env);
++int kdbus_test_writable_pool(struct kdbus_test_env *env);
++
++#endif /* _TEST_KDBUS_H_ */
+diff --git a/tools/testing/selftests/kdbus/kdbus-util.c b/tools/testing/selftests/kdbus/kdbus-util.c
+new file mode 100644
+index 0000000..82fa89b
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/kdbus-util.c
+@@ -0,0 +1,1612 @@
++/*
++ * Copyright (C) 2013-2015 Daniel Mack
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2014-2015 Djalal Harouni
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <stdio.h>
++#include <stdarg.h>
++#include <string.h>
++#include <time.h>
++#include <inttypes.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <stdbool.h>
++#include <errno.h>
++#include <assert.h>
++#include <poll.h>
++#include <grp.h>
++#include <sys/capability.h>
++#include <sys/mman.h>
++#include <sys/stat.h>
++#include <sys/time.h>
++#include <linux/unistd.h>
++#include <linux/memfd.h>
++
++#ifndef __NR_memfd_create
++  #ifdef __x86_64__
++    #define __NR_memfd_create 319
++  #elif defined __arm__
++    #define __NR_memfd_create 385
++  #else
++    #define __NR_memfd_create 356
++  #endif
++#endif
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++#ifndef F_ADD_SEALS
++#define F_LINUX_SPECIFIC_BASE	1024
++#define F_ADD_SEALS     (F_LINUX_SPECIFIC_BASE + 9)
++#define F_GET_SEALS     (F_LINUX_SPECIFIC_BASE + 10)
++
++#define F_SEAL_SEAL     0x0001  /* prevent further seals from being set */
++#define F_SEAL_SHRINK   0x0002  /* prevent file from shrinking */
++#define F_SEAL_GROW     0x0004  /* prevent file from growing */
++#define F_SEAL_WRITE    0x0008  /* prevent writes */
++#endif
++
++int kdbus_util_verbose = true;
++
++int kdbus_sysfs_get_parameter_mask(const char *path, uint64_t *mask)
++{
++	int ret;
++	FILE *file;
++	unsigned long long value;
++
++	file = fopen(path, "r");
++	if (!file) {
++		ret = -errno;
++		kdbus_printf("--- error fopen(): %d (%m)\n", ret);
++		return ret;
++	}
++
++	ret = fscanf(file, "%llu", &value);
++	if (ret != 1) {
++		if (ferror(file))
++			ret = -errno;
++		else
++			ret = -EIO;
++
++		kdbus_printf("--- error fscanf(): %d\n", ret);
++		fclose(file);
++		return ret;
++	}
++
++	*mask = (uint64_t)value;
++
++	fclose(file);
++
++	return 0;
++}
++
++int kdbus_sysfs_set_parameter_mask(const char *path, uint64_t mask)
++{
++	int ret;
++	FILE *file;
++
++	file = fopen(path, "w");
++	if (!file) {
++		ret = -errno;
++		kdbus_printf("--- error fopen(): %d (%m)\n", ret);
++		return ret;
++	}
++
++	ret = fprintf(file, "%llu", (unsigned long long)mask);
++	if (ret <= 0) {
++		ret = -EIO;
++		kdbus_printf("--- error fprintf(): %d\n", ret);
++	}
++
++	fclose(file);
++
++	return ret > 0 ? 0 : ret;
++}
++
++int kdbus_create_bus(int control_fd, const char *name,
++		     uint64_t owner_meta, char **path)
++{
++	struct {
++		struct kdbus_cmd cmd;
++
++		/* bloom size item */
++		struct {
++			uint64_t size;
++			uint64_t type;
++			struct kdbus_bloom_parameter bloom;
++		} bp;
++
++		/* owner metadata items */
++		struct {
++			uint64_t size;
++			uint64_t type;
++			uint64_t flags;
++		} attach;
++
++		/* name item */
++		struct {
++			uint64_t size;
++			uint64_t type;
++			char str[64];
++		} name;
++	} bus_make;
++	int ret;
++
++	memset(&bus_make, 0, sizeof(bus_make));
++	bus_make.bp.size = sizeof(bus_make.bp);
++	bus_make.bp.type = KDBUS_ITEM_BLOOM_PARAMETER;
++	bus_make.bp.bloom.size = 64;
++	bus_make.bp.bloom.n_hash = 1;
++
++	snprintf(bus_make.name.str, sizeof(bus_make.name.str),
++		 "%u-%s", getuid(), name);
++
++	bus_make.attach.type = KDBUS_ITEM_ATTACH_FLAGS_SEND;
++	bus_make.attach.size = sizeof(bus_make.attach);
++	bus_make.attach.flags = owner_meta;
++
++	bus_make.name.type = KDBUS_ITEM_MAKE_NAME;
++	bus_make.name.size = KDBUS_ITEM_HEADER_SIZE +
++			     strlen(bus_make.name.str) + 1;
++
++	bus_make.cmd.flags = KDBUS_MAKE_ACCESS_WORLD;
++	bus_make.cmd.size = sizeof(bus_make.cmd) +
++			     bus_make.bp.size +
++			     bus_make.attach.size +
++			     bus_make.name.size;
++
++	kdbus_printf("Creating bus with name >%s< on control fd %d ...\n",
++		     name, control_fd);
++
++	ret = kdbus_cmd_bus_make(control_fd, &bus_make.cmd);
++	if (ret < 0) {
++		kdbus_printf("--- error when making bus: %d (%m)\n", ret);
++		return ret;
++	}
++
++	if (ret == 0 && path)
++		*path = strdup(bus_make.name.str);
++
++	return ret;
++}
++
++struct kdbus_conn *
++kdbus_hello(const char *path, uint64_t flags,
++	    const struct kdbus_item *item, size_t item_size)
++{
++	struct kdbus_cmd_free cmd_free = {};
++	int fd, ret;
++	struct {
++		struct kdbus_cmd_hello hello;
++
++		struct {
++			uint64_t size;
++			uint64_t type;
++			char str[16];
++		} conn_name;
++
++		uint8_t extra_items[item_size];
++	} h;
++	struct kdbus_conn *conn;
++
++	memset(&h, 0, sizeof(h));
++
++	if (item_size > 0)
++		memcpy(h.extra_items, item, item_size);
++
++	kdbus_printf("-- opening bus connection %s\n", path);
++	fd = open(path, O_RDWR|O_CLOEXEC);
++	if (fd < 0) {
++		kdbus_printf("--- error open(): %d (%m)\n", -errno);
++		return NULL;
++	}
++
++	h.hello.flags = flags | KDBUS_HELLO_ACCEPT_FD;
++	h.hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++	h.hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
++	h.conn_name.type = KDBUS_ITEM_CONN_DESCRIPTION;
++	strcpy(h.conn_name.str, "this-is-my-name");
++	h.conn_name.size = KDBUS_ITEM_HEADER_SIZE + strlen(h.conn_name.str) + 1;
++
++	h.hello.size = sizeof(h);
++	h.hello.pool_size = POOL_SIZE;
++
++	ret = kdbus_cmd_hello(fd, (struct kdbus_cmd_hello *) &h.hello);
++	if (ret < 0) {
++		kdbus_printf("--- error when saying hello: %d (%m)\n", ret);
++		return NULL;
++	}
++	kdbus_printf("-- Our peer ID for %s: %llu -- bus uuid: '%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x'\n",
++		     path, (unsigned long long)h.hello.id,
++		     h.hello.id128[0],  h.hello.id128[1],  h.hello.id128[2],
++		     h.hello.id128[3],  h.hello.id128[4],  h.hello.id128[5],
++		     h.hello.id128[6],  h.hello.id128[7],  h.hello.id128[8],
++		     h.hello.id128[9],  h.hello.id128[10], h.hello.id128[11],
++		     h.hello.id128[12], h.hello.id128[13], h.hello.id128[14],
++		     h.hello.id128[15]);
++
++	cmd_free.size = sizeof(cmd_free);
++	cmd_free.offset = h.hello.offset;
++	kdbus_cmd_free(fd, &cmd_free);
++
++	conn = malloc(sizeof(*conn));
++	if (!conn) {
++		kdbus_printf("unable to malloc()!?\n");
++		return NULL;
++	}
++
++	conn->buf = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, fd, 0);
++	if (conn->buf == MAP_FAILED) {
++		free(conn);
++		close(fd);
++		kdbus_printf("--- error mmap (%m)\n");
++		return NULL;
++	}
++
++	conn->fd = fd;
++	conn->id = h.hello.id;
++	return conn;
++}
++
++struct kdbus_conn *
++kdbus_hello_registrar(const char *path, const char *name,
++		      const struct kdbus_policy_access *access,
++		      size_t num_access, uint64_t flags)
++{
++	struct kdbus_item *item, *items;
++	size_t i, size;
++
++	size = KDBUS_ITEM_SIZE(strlen(name) + 1) +
++		num_access * KDBUS_ITEM_SIZE(sizeof(*access));
++
++	items = alloca(size);
++
++	item = items;
++	item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
++	item->type = KDBUS_ITEM_NAME;
++	strcpy(item->str, name);
++	item = KDBUS_ITEM_NEXT(item);
++
++	for (i = 0; i < num_access; i++) {
++		item->size = KDBUS_ITEM_HEADER_SIZE +
++			     sizeof(struct kdbus_policy_access);
++		item->type = KDBUS_ITEM_POLICY_ACCESS;
++
++		item->policy_access.type = access[i].type;
++		item->policy_access.access = access[i].access;
++		item->policy_access.id = access[i].id;
++
++		item = KDBUS_ITEM_NEXT(item);
++	}
++
++	return kdbus_hello(path, flags, items, size);
++}
++
++struct kdbus_conn *kdbus_hello_activator(const char *path, const char *name,
++				   const struct kdbus_policy_access *access,
++				   size_t num_access)
++{
++	return kdbus_hello_registrar(path, name, access, num_access,
++				     KDBUS_HELLO_ACTIVATOR);
++}
++
++bool kdbus_item_in_message(struct kdbus_msg *msg, uint64_t type)
++{
++	const struct kdbus_item *item;
++
++	KDBUS_ITEM_FOREACH(item, msg, items)
++		if (item->type == type)
++			return true;
++
++	return false;
++}
++
++int kdbus_bus_creator_info(struct kdbus_conn *conn,
++			   uint64_t flags,
++			   uint64_t *offset)
++{
++	struct kdbus_cmd_info *cmd;
++	size_t size = sizeof(*cmd);
++	int ret;
++
++	cmd = alloca(size);
++	memset(cmd, 0, size);
++	cmd->size = size;
++	cmd->attach_flags = flags;
++
++	ret = kdbus_cmd_bus_creator_info(conn->fd, cmd);
++	if (ret < 0) {
++		kdbus_printf("--- error when requesting info: %d (%m)\n", ret);
++		return ret;
++	}
++
++	if (offset)
++		*offset = cmd->offset;
++	else
++		kdbus_free(conn, cmd->offset);
++
++	return 0;
++}
++
++int kdbus_conn_info(struct kdbus_conn *conn, uint64_t id,
++		    const char *name, uint64_t flags,
++		    uint64_t *offset)
++{
++	struct kdbus_cmd_info *cmd;
++	size_t size = sizeof(*cmd);
++	struct kdbus_info *info;
++	int ret;
++
++	if (name)
++		size += KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
++
++	cmd = alloca(size);
++	memset(cmd, 0, size);
++	cmd->size = size;
++	cmd->attach_flags = flags;
++
++	if (name) {
++		cmd->items[0].size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
++		cmd->items[0].type = KDBUS_ITEM_NAME;
++		strcpy(cmd->items[0].str, name);
++	} else {
++		cmd->id = id;
++	}
++
++	ret = kdbus_cmd_conn_info(conn->fd, cmd);
++	if (ret < 0) {
++		kdbus_printf("--- error when requesting info: %d (%m)\n", ret);
++		return ret;
++	}
++
++	info = (struct kdbus_info *) (conn->buf + cmd->offset);
++	if (info->size != cmd->info_size) {
++		kdbus_printf("%s(): size mismatch: %d != %d\n", __func__,
++				(int) info->size, (int) cmd->info_size);
++		return -EIO;
++	}
++
++	if (offset)
++		*offset = cmd->offset;
++	else
++		kdbus_free(conn, cmd->offset);
++
++	return 0;
++}
++
++void kdbus_conn_free(struct kdbus_conn *conn)
++{
++	if (!conn)
++		return;
++
++	if (conn->buf)
++		munmap(conn->buf, POOL_SIZE);
++
++	if (conn->fd >= 0)
++		close(conn->fd);
++
++	free(conn);
++}
++
++int sys_memfd_create(const char *name, __u64 size)
++{
++	int ret, fd;
++
++	fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
++	if (fd < 0)
++		return fd;
++
++	ret = ftruncate(fd, size);
++	if (ret < 0) {
++		close(fd);
++		return ret;
++	}
++
++	return fd;
++}
++
++int sys_memfd_seal_set(int fd)
++{
++	return fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK |
++			 F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL);
++}
++
++off_t sys_memfd_get_size(int fd, off_t *size)
++{
++	struct stat stat;
++	int ret;
++
++	ret = fstat(fd, &stat);
++	if (ret < 0) {
++		kdbus_printf("fstat() failed: %m\n");
++		return ret;
++	}
++
++	*size = stat.st_size;
++	return 0;
++}
++
++static int __kdbus_msg_send(const struct kdbus_conn *conn,
++			    const char *name,
++			    uint64_t cookie,
++			    uint64_t flags,
++			    uint64_t timeout,
++			    int64_t priority,
++			    uint64_t dst_id,
++			    uint64_t cmd_flags,
++			    int cancel_fd)
++{
++	struct kdbus_cmd_send *cmd = NULL;
++	struct kdbus_msg *msg = NULL;
++	const char ref1[1024 * 128 + 3] = "0123456789_0";
++	const char ref2[] = "0123456789_1";
++	struct kdbus_item *item;
++	struct timespec now;
++	uint64_t size;
++	int memfd = -1;
++	int ret;
++
++	size = sizeof(*msg) + 3 * KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++	if (dst_id == KDBUS_DST_ID_BROADCAST)
++		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
++	else {
++		memfd = sys_memfd_create("my-name-is-nice", 1024 * 1024);
++		if (memfd < 0) {
++			kdbus_printf("failed to create memfd: %m\n");
++			return memfd;
++		}
++
++		if (write(memfd, "kdbus memfd 1234567", 19) != 19) {
++			ret = -errno;
++			kdbus_printf("writing to memfd failed: %m\n");
++			goto out;
++		}
++
++		ret = sys_memfd_seal_set(memfd);
++		if (ret < 0) {
++			ret = -errno;
++			kdbus_printf("memfd sealing failed: %m\n");
++			goto out;
++		}
++
++		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
++	}
++
++	if (name)
++		size += KDBUS_ITEM_SIZE(strlen(name) + 1);
++
++	msg = malloc(size);
++	if (!msg) {
++		ret = -errno;
++		kdbus_printf("unable to malloc()!?\n");
++		goto out;
++	}
++
++	if (dst_id == KDBUS_DST_ID_BROADCAST)
++		flags |= KDBUS_MSG_SIGNAL;
++
++	memset(msg, 0, size);
++	msg->flags = flags;
++	msg->priority = priority;
++	msg->size = size;
++	msg->src_id = conn->id;
++	msg->dst_id = name ? 0 : dst_id;
++	msg->cookie = cookie;
++	msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++	if (timeout) {
++		ret = clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
++		if (ret < 0)
++			goto out;
++
++		msg->timeout_ns = now.tv_sec * 1000000000ULL +
++				  now.tv_nsec + timeout;
++	}
++
++	item = msg->items;
++
++	if (name) {
++		item->type = KDBUS_ITEM_DST_NAME;
++		item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
++		strcpy(item->str, name);
++		item = KDBUS_ITEM_NEXT(item);
++	}
++
++	item->type = KDBUS_ITEM_PAYLOAD_VEC;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++	item->vec.address = (uintptr_t)&ref1;
++	item->vec.size = sizeof(ref1);
++	item = KDBUS_ITEM_NEXT(item);
++
++	/* data padding for ref1 */
++	item->type = KDBUS_ITEM_PAYLOAD_VEC;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++	item->vec.address = (uintptr_t)NULL;
++	item->vec.size =  KDBUS_ALIGN8(sizeof(ref1)) - sizeof(ref1);
++	item = KDBUS_ITEM_NEXT(item);
++
++	item->type = KDBUS_ITEM_PAYLOAD_VEC;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++	item->vec.address = (uintptr_t)&ref2;
++	item->vec.size = sizeof(ref2);
++	item = KDBUS_ITEM_NEXT(item);
++
++	if (dst_id == KDBUS_DST_ID_BROADCAST) {
++		item->type = KDBUS_ITEM_BLOOM_FILTER;
++		item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
++		item->bloom_filter.generation = 0;
++	} else {
++		item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
++		item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_memfd);
++		item->memfd.size = 16;
++		item->memfd.fd = memfd;
++	}
++	item = KDBUS_ITEM_NEXT(item);
++
++	size = sizeof(*cmd);
++	if (cancel_fd != -1)
++		size += KDBUS_ITEM_SIZE(sizeof(cancel_fd));
++
++	cmd = malloc(size);
++	if (!cmd) {
++		ret = -errno;
++		kdbus_printf("unable to malloc()!?\n");
++		goto out;
++	}
++
++	cmd->size = size;
++	cmd->flags = cmd_flags;
++	cmd->msg_address = (uintptr_t)msg;
++
++	item = cmd->items;
++
++	if (cancel_fd != -1) {
++		item->type = KDBUS_ITEM_CANCEL_FD;
++		item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(cancel_fd);
++		item->fds[0] = cancel_fd;
++		item = KDBUS_ITEM_NEXT(item);
++	}
++
++	ret = kdbus_cmd_send(conn->fd, cmd);
++	if (ret < 0) {
++		kdbus_printf("error sending message: %d (%m)\n", ret);
++		goto out;
++	}
++
++	if (cmd_flags & KDBUS_SEND_SYNC_REPLY) {
++		struct kdbus_msg *reply;
++
++		kdbus_printf("SYNC REPLY @offset %llu:\n", cmd->reply.offset);
++		reply = (struct kdbus_msg *)(conn->buf + cmd->reply.offset);
++		kdbus_msg_dump(conn, reply);
++
++		kdbus_msg_free(reply);
++
++		ret = kdbus_free(conn, cmd->reply.offset);
++		if (ret < 0)
++			goto out;
++	}
++
++out:
++	free(msg);
++	free(cmd);
++
++	if (memfd >= 0)
++		close(memfd);
++
++	return ret < 0 ? ret : 0;
++}
++
++int kdbus_msg_send(const struct kdbus_conn *conn, const char *name,
++		   uint64_t cookie, uint64_t flags, uint64_t timeout,
++		   int64_t priority, uint64_t dst_id)
++{
++	return __kdbus_msg_send(conn, name, cookie, flags, timeout, priority,
++				dst_id, 0, -1);
++}
++
++int kdbus_msg_send_sync(const struct kdbus_conn *conn, const char *name,
++			uint64_t cookie, uint64_t flags, uint64_t timeout,
++			int64_t priority, uint64_t dst_id, int cancel_fd)
++{
++	return __kdbus_msg_send(conn, name, cookie, flags, timeout, priority,
++				dst_id, KDBUS_SEND_SYNC_REPLY, cancel_fd);
++}
++
++int kdbus_msg_send_reply(const struct kdbus_conn *conn,
++			 uint64_t reply_cookie,
++			 uint64_t dst_id)
++{
++	struct kdbus_cmd_send cmd = {};
++	struct kdbus_msg *msg;
++	const char ref1[1024 * 128 + 3] = "0123456789_0";
++	struct kdbus_item *item;
++	uint64_t size;
++	int ret;
++
++	size = sizeof(struct kdbus_msg);
++	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++	msg = malloc(size);
++	if (!msg) {
++		kdbus_printf("unable to malloc()!?\n");
++		return -ENOMEM;
++	}
++
++	memset(msg, 0, size);
++	msg->size = size;
++	msg->src_id = conn->id;
++	msg->dst_id = dst_id;
++	msg->cookie_reply = reply_cookie;
++	msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++	item = msg->items;
++
++	item->type = KDBUS_ITEM_PAYLOAD_VEC;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++	item->vec.address = (uintptr_t)&ref1;
++	item->vec.size = sizeof(ref1);
++	item = KDBUS_ITEM_NEXT(item);
++
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)msg;
++
++	ret = kdbus_cmd_send(conn->fd, &cmd);
++	if (ret < 0)
++		kdbus_printf("error sending message: %d (%m)\n", ret);
++
++	free(msg);
++
++	return ret;
++}
++
++static char *msg_id(uint64_t id, char *buf)
++{
++	if (id == 0)
++		return "KERNEL";
++	if (id == ~0ULL)
++		return "BROADCAST";
++	sprintf(buf, "%llu", (unsigned long long)id);
++	return buf;
++}
++
++int kdbus_msg_dump(const struct kdbus_conn *conn, const struct kdbus_msg *msg)
++{
++	const struct kdbus_item *item = msg->items;
++	char buf_src[32];
++	char buf_dst[32];
++	uint64_t timeout = 0;
++	uint64_t cookie_reply = 0;
++	int ret = 0;
++
++	if (msg->flags & KDBUS_MSG_EXPECT_REPLY)
++		timeout = msg->timeout_ns;
++	else
++		cookie_reply = msg->cookie_reply;
++
++	kdbus_printf("MESSAGE: %s (%llu bytes) flags=0x%08llx, %s → %s, "
++		     "cookie=%llu, timeout=%llu cookie_reply=%llu priority=%lli\n",
++		enum_PAYLOAD(msg->payload_type), (unsigned long long)msg->size,
++		(unsigned long long)msg->flags,
++		msg_id(msg->src_id, buf_src), msg_id(msg->dst_id, buf_dst),
++		(unsigned long long)msg->cookie, (unsigned long long)timeout,
++		(unsigned long long)cookie_reply, (long long)msg->priority);
++
++	KDBUS_ITEM_FOREACH(item, msg, items) {
++		if (item->size < KDBUS_ITEM_HEADER_SIZE) {
++			kdbus_printf("  +%s (%llu bytes) invalid data record\n",
++				     enum_MSG(item->type), item->size);
++			ret = -EINVAL;
++			break;
++		}
++
++		switch (item->type) {
++		case KDBUS_ITEM_PAYLOAD_OFF: {
++			char *s;
++
++			if (item->vec.offset == ~0ULL)
++				s = "[\\0-bytes]";
++			else
++				s = (char *)msg + item->vec.offset;
++
++			kdbus_printf("  +%s (%llu bytes) off=%llu size=%llu '%s'\n",
++			       enum_MSG(item->type), item->size,
++			       (unsigned long long)item->vec.offset,
++			       (unsigned long long)item->vec.size, s);
++			break;
++		}
++
++		case KDBUS_ITEM_FDS: {
++			int i, n = (item->size - KDBUS_ITEM_HEADER_SIZE) /
++					sizeof(int);
++
++			kdbus_printf("  +%s (%llu bytes, %d fds)\n",
++			       enum_MSG(item->type), item->size, n);
++
++			for (i = 0; i < n; i++)
++				kdbus_printf("    fd[%d] = %d\n",
++					     i, item->fds[i]);
++
++			break;
++		}
++
++		case KDBUS_ITEM_PAYLOAD_MEMFD: {
++			char *buf;
++			off_t size;
++
++			buf = mmap(NULL, item->memfd.size, PROT_READ,
++				   MAP_PRIVATE, item->memfd.fd, 0);
++			if (buf == MAP_FAILED) {
++				kdbus_printf("mmap() fd=%i size=%llu failed: %m\n",
++					     item->memfd.fd, item->memfd.size);
++				break;
++			}
++
++			if (sys_memfd_get_size(item->memfd.fd, &size) < 0) {
++				kdbus_printf("sys_memfd_get_size() failed: %m\n");
++				break;
++			}
++
++			kdbus_printf("  +%s (%llu bytes) fd=%i size=%llu filesize=%llu '%s'\n",
++			       enum_MSG(item->type), item->size, item->memfd.fd,
++			       (unsigned long long)item->memfd.size,
++			       (unsigned long long)size, buf);
++			munmap(buf, item->memfd.size);
++			break;
++		}
++
++		case KDBUS_ITEM_CREDS:
++			kdbus_printf("  +%s (%llu bytes) uid=%lld, euid=%lld, suid=%lld, fsuid=%lld, "
++							"gid=%lld, egid=%lld, sgid=%lld, fsgid=%lld\n",
++				enum_MSG(item->type), item->size,
++				item->creds.uid, item->creds.euid,
++				item->creds.suid, item->creds.fsuid,
++				item->creds.gid, item->creds.egid,
++				item->creds.sgid, item->creds.fsgid);
++			break;
++
++		case KDBUS_ITEM_PIDS:
++			kdbus_printf("  +%s (%llu bytes) pid=%lld, tid=%lld, ppid=%lld\n",
++				enum_MSG(item->type), item->size,
++				item->pids.pid, item->pids.tid,
++				item->pids.ppid);
++			break;
++
++		case KDBUS_ITEM_AUXGROUPS: {
++			int i, n;
++
++			kdbus_printf("  +%s (%llu bytes)\n",
++				     enum_MSG(item->type), item->size);
++			n = (item->size - KDBUS_ITEM_HEADER_SIZE) /
++				sizeof(uint64_t);
++
++			for (i = 0; i < n; i++)
++				kdbus_printf("    gid[%d] = %lld\n",
++					     i, item->data64[i]);
++			break;
++		}
++
++		case KDBUS_ITEM_NAME:
++		case KDBUS_ITEM_PID_COMM:
++		case KDBUS_ITEM_TID_COMM:
++		case KDBUS_ITEM_EXE:
++		case KDBUS_ITEM_CGROUP:
++		case KDBUS_ITEM_SECLABEL:
++		case KDBUS_ITEM_DST_NAME:
++		case KDBUS_ITEM_CONN_DESCRIPTION:
++			kdbus_printf("  +%s (%llu bytes) '%s' (%zu)\n",
++				     enum_MSG(item->type), item->size,
++				     item->str, strlen(item->str));
++			break;
++
++		case KDBUS_ITEM_OWNED_NAME: {
++			kdbus_printf("  +%s (%llu bytes) '%s' (%zu) flags=0x%08llx\n",
++				     enum_MSG(item->type), item->size,
++				     item->name.name, strlen(item->name.name),
++				     item->name.flags);
++			break;
++		}
++
++		case KDBUS_ITEM_CMDLINE: {
++			size_t size = item->size - KDBUS_ITEM_HEADER_SIZE;
++			const char *str = item->str;
++			int count = 0;
++
++			kdbus_printf("  +%s (%llu bytes) ",
++				     enum_MSG(item->type), item->size);
++			while (size) {
++				kdbus_printf("'%s' ", str);
++				size -= strlen(str) + 1;
++				str += strlen(str) + 1;
++				count++;
++			}
++
++			kdbus_printf("(%d string%s)\n",
++				     count, (count == 1) ? "" : "s");
++			break;
++		}
++
++		case KDBUS_ITEM_AUDIT:
++			kdbus_printf("  +%s (%llu bytes) loginuid=%u sessionid=%u\n",
++			       enum_MSG(item->type), item->size,
++			       item->audit.loginuid, item->audit.sessionid);
++			break;
++
++		case KDBUS_ITEM_CAPS: {
++			const uint32_t *cap;
++			int n, i;
++
++			kdbus_printf("  +%s (%llu bytes) len=%llu bytes, last_cap %d\n",
++				     enum_MSG(item->type), item->size,
++				     (unsigned long long)item->size -
++					KDBUS_ITEM_HEADER_SIZE,
++				     (int) item->caps.last_cap);
++
++			cap = item->caps.caps;
++			n = (item->size - offsetof(struct kdbus_item, caps.caps))
++				/ 4 / sizeof(uint32_t);
++
++			kdbus_printf("    CapInh=");
++			for (i = 0; i < n; i++)
++				kdbus_printf("%08x", cap[(0 * n) + (n - i - 1)]);
++
++			kdbus_printf(" CapPrm=");
++			for (i = 0; i < n; i++)
++				kdbus_printf("%08x", cap[(1 * n) + (n - i - 1)]);
++
++			kdbus_printf(" CapEff=");
++			for (i = 0; i < n; i++)
++				kdbus_printf("%08x", cap[(2 * n) + (n - i - 1)]);
++
++			kdbus_printf(" CapBnd=");
++			for (i = 0; i < n; i++)
++				kdbus_printf("%08x", cap[(3 * n) + (n - i - 1)]);
++			kdbus_printf("\n");
++			break;
++		}
++
++		case KDBUS_ITEM_TIMESTAMP:
++			kdbus_printf("  +%s (%llu bytes) seq=%llu realtime=%lluns monotonic=%lluns\n",
++			       enum_MSG(item->type), item->size,
++			       (unsigned long long)item->timestamp.seqnum,
++			       (unsigned long long)item->timestamp.realtime_ns,
++			       (unsigned long long)item->timestamp.monotonic_ns);
++			break;
++
++		case KDBUS_ITEM_REPLY_TIMEOUT:
++			kdbus_printf("  +%s (%llu bytes) cookie=%llu\n",
++			       enum_MSG(item->type), item->size,
++			       msg->cookie_reply);
++			break;
++
++		case KDBUS_ITEM_NAME_ADD:
++		case KDBUS_ITEM_NAME_REMOVE:
++		case KDBUS_ITEM_NAME_CHANGE:
++			kdbus_printf("  +%s (%llu bytes) '%s', old id=%lld, now id=%lld, old_flags=0x%llx new_flags=0x%llx\n",
++				enum_MSG(item->type),
++				(unsigned long long) item->size,
++				item->name_change.name,
++				item->name_change.old_id.id,
++				item->name_change.new_id.id,
++				item->name_change.old_id.flags,
++				item->name_change.new_id.flags);
++			break;
++
++		case KDBUS_ITEM_ID_ADD:
++		case KDBUS_ITEM_ID_REMOVE:
++			kdbus_printf("  +%s (%llu bytes) id=%llu flags=%llu\n",
++			       enum_MSG(item->type),
++			       (unsigned long long) item->size,
++			       (unsigned long long) item->id_change.id,
++			       (unsigned long long) item->id_change.flags);
++			break;
++
++		default:
++			kdbus_printf("  +%s (%llu bytes)\n",
++				     enum_MSG(item->type), item->size);
++			break;
++		}
++	}
++
++	if ((char *)item - ((char *)msg + msg->size) >= 8) {
++		kdbus_printf("invalid padding at end of message\n");
++		ret = -EINVAL;
++	}
++
++	kdbus_printf("\n");
++
++	return ret;
++}
++
++void kdbus_msg_free(struct kdbus_msg *msg)
++{
++	const struct kdbus_item *item;
++	int nfds, i;
++
++	if (!msg)
++		return;
++
++	KDBUS_ITEM_FOREACH(item, msg, items) {
++		switch (item->type) {
++		/* close all memfds */
++		case KDBUS_ITEM_PAYLOAD_MEMFD:
++			close(item->memfd.fd);
++			break;
++		case KDBUS_ITEM_FDS:
++			nfds = (item->size - KDBUS_ITEM_HEADER_SIZE) /
++				sizeof(int);
++
++			for (i = 0; i < nfds; i++)
++				close(item->fds[i]);
++
++			break;
++		}
++	}
++}
++
++int kdbus_msg_recv(struct kdbus_conn *conn,
++		   struct kdbus_msg **msg_out,
++		   uint64_t *offset)
++{
++	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
++	struct kdbus_msg *msg;
++	int ret;
++
++	ret = kdbus_cmd_recv(conn->fd, &recv);
++	if (ret < 0)
++		return ret;
++
++	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++	ret = kdbus_msg_dump(conn, msg);
++	if (ret < 0) {
++		kdbus_msg_free(msg);
++		return ret;
++	}
++
++	if (msg_out) {
++		*msg_out = msg;
++
++		if (offset)
++			*offset = recv.msg.offset;
++	} else {
++		kdbus_msg_free(msg);
++
++		ret = kdbus_free(conn, recv.msg.offset);
++		if (ret < 0)
++			return ret;
++	}
++
++	return 0;
++}
++
++/*
++ * Returns: 0 on success, negative errno on failure.
++ *
++ * We must return -ETIMEDOUT, -ECONNRESET, -EAGAIN and other errors.
++ * We must return the result of kdbus_msg_recv()
++ */
++int kdbus_msg_recv_poll(struct kdbus_conn *conn,
++			int timeout_ms,
++			struct kdbus_msg **msg_out,
++			uint64_t *offset)
++{
++	int ret;
++
++	do {
++		struct timeval before, after, diff;
++		struct pollfd fd;
++
++		fd.fd = conn->fd;
++		fd.events = POLLIN | POLLPRI | POLLHUP;
++		fd.revents = 0;
++
++		gettimeofday(&before, NULL);
++		ret = poll(&fd, 1, timeout_ms);
++		gettimeofday(&after, NULL);
++
++		if (ret == 0) {
++			ret = -ETIMEDOUT;
++			break;
++		}
++
++		if (ret > 0) {
++			if (fd.revents & POLLIN)
++				ret = kdbus_msg_recv(conn, msg_out, offset);
++
++			if (fd.revents & (POLLHUP | POLLERR))
++				ret = -ECONNRESET;
++		}
++
++		if (ret == 0 || ret != -EAGAIN)
++			break;
++
++		timersub(&after, &before, &diff);
++		timeout_ms -= diff.tv_sec * 1000UL +
++			      diff.tv_usec / 1000UL;
++	} while (timeout_ms > 0);
++
++	return ret;
++}
++
++int kdbus_free(const struct kdbus_conn *conn, uint64_t offset)
++{
++	struct kdbus_cmd_free cmd_free = {};
++	int ret;
++
++	cmd_free.size = sizeof(cmd_free);
++	cmd_free.offset = offset;
++	cmd_free.flags = 0;
++
++	ret = kdbus_cmd_free(conn->fd, &cmd_free);
++	if (ret < 0) {
++		kdbus_printf("KDBUS_CMD_FREE failed: %d (%m)\n", ret);
++		return ret;
++	}
++
++	return 0;
++}
++
++int kdbus_name_acquire(struct kdbus_conn *conn,
++		       const char *name, uint64_t *flags)
++{
++	struct kdbus_cmd *cmd_name;
++	size_t name_len = strlen(name) + 1;
++	uint64_t size = sizeof(*cmd_name) + KDBUS_ITEM_SIZE(name_len);
++	struct kdbus_item *item;
++	int ret;
++
++	cmd_name = alloca(size);
++
++	memset(cmd_name, 0, size);
++
++	item = cmd_name->items;
++	item->size = KDBUS_ITEM_HEADER_SIZE + name_len;
++	item->type = KDBUS_ITEM_NAME;
++	strcpy(item->str, name);
++
++	cmd_name->size = size;
++	if (flags)
++		cmd_name->flags = *flags;
++
++	ret = kdbus_cmd_name_acquire(conn->fd, cmd_name);
++	if (ret < 0) {
++		kdbus_printf("error acquiring name: %s\n", strerror(-ret));
++		return ret;
++	}
++
++	kdbus_printf("%s(): flags after call: 0x%llx\n", __func__,
++		     cmd_name->return_flags);
++
++	if (flags)
++		*flags = cmd_name->return_flags;
++
++	return 0;
++}
++
++int kdbus_name_release(struct kdbus_conn *conn, const char *name)
++{
++	struct kdbus_cmd *cmd_name;
++	size_t name_len = strlen(name) + 1;
++	uint64_t size = sizeof(*cmd_name) + KDBUS_ITEM_SIZE(name_len);
++	struct kdbus_item *item;
++	int ret;
++
++	cmd_name = alloca(size);
++
++	memset(cmd_name, 0, size);
++
++	item = cmd_name->items;
++	item->size = KDBUS_ITEM_HEADER_SIZE + name_len;
++	item->type = KDBUS_ITEM_NAME;
++	strcpy(item->str, name);
++
++	cmd_name->size = size;
++
++	kdbus_printf("conn %llu giving up name '%s'\n",
++		     (unsigned long long) conn->id, name);
++
++	ret = kdbus_cmd_name_release(conn->fd, cmd_name);
++	if (ret < 0) {
++		kdbus_printf("error releasing name: %s\n", strerror(-ret));
++		return ret;
++	}
++
++	return 0;
++}
++
++int kdbus_list(struct kdbus_conn *conn, uint64_t flags)
++{
++	struct kdbus_cmd_list cmd_list = {};
++	struct kdbus_info *list, *name;
++	int ret;
++
++	cmd_list.size = sizeof(cmd_list);
++	cmd_list.flags = flags;
++
++	ret = kdbus_cmd_list(conn->fd, &cmd_list);
++	if (ret < 0) {
++		kdbus_printf("error listing names: %d (%m)\n", ret);
++		return ret;
++	}
++
++	kdbus_printf("REGISTRY:\n");
++	list = (struct kdbus_info *)(conn->buf + cmd_list.offset);
++
++	KDBUS_FOREACH(name, list, cmd_list.list_size) {
++		uint64_t flags = 0;
++		struct kdbus_item *item;
++		const char *n = "MISSING-NAME";
++
++		if (name->size == sizeof(struct kdbus_cmd))
++			continue;
++
++		KDBUS_ITEM_FOREACH(item, name, items)
++			if (item->type == KDBUS_ITEM_OWNED_NAME) {
++				n = item->name.name;
++				flags = item->name.flags;
++
++				kdbus_printf("%8llu flags=0x%08llx conn=0x%08llx '%s'\n",
++					     name->id,
++					     (unsigned long long) flags,
++					     name->flags, n);
++			}
++	}
++	kdbus_printf("\n");
++
++	ret = kdbus_free(conn, cmd_list.offset);
++
++	return ret;
++}
++
++int kdbus_conn_update_attach_flags(struct kdbus_conn *conn,
++				   uint64_t attach_flags_send,
++				   uint64_t attach_flags_recv)
++{
++	int ret;
++	size_t size;
++	struct kdbus_cmd *update;
++	struct kdbus_item *item;
++
++	size = sizeof(struct kdbus_cmd);
++	size += KDBUS_ITEM_SIZE(sizeof(uint64_t)) * 2;
++
++	update = malloc(size);
++	if (!update) {
++		kdbus_printf("error malloc: %m\n");
++		return -ENOMEM;
++	}
++
++	memset(update, 0, size);
++	update->size = size;
++
++	item = update->items;
++
++	item->type = KDBUS_ITEM_ATTACH_FLAGS_SEND;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(uint64_t);
++	item->data64[0] = attach_flags_send;
++	item = KDBUS_ITEM_NEXT(item);
++
++	item->type = KDBUS_ITEM_ATTACH_FLAGS_RECV;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(uint64_t);
++	item->data64[0] = attach_flags_recv;
++	item = KDBUS_ITEM_NEXT(item);
++
++	ret = kdbus_cmd_update(conn->fd, update);
++	if (ret < 0)
++		kdbus_printf("error conn update: %d (%m)\n", ret);
++
++	free(update);
++
++	return ret;
++}
++
++int kdbus_conn_update_policy(struct kdbus_conn *conn, const char *name,
++			     const struct kdbus_policy_access *access,
++			     size_t num_access)
++{
++	struct kdbus_cmd *update;
++	struct kdbus_item *item;
++	size_t i, size;
++	int ret;
++
++	size = sizeof(struct kdbus_cmd);
++	size += KDBUS_ITEM_SIZE(strlen(name) + 1);
++	size += num_access * KDBUS_ITEM_SIZE(sizeof(struct kdbus_policy_access));
++
++	update = malloc(size);
++	if (!update) {
++		kdbus_printf("error malloc: %m\n");
++		return -ENOMEM;
++	}
++
++	memset(update, 0, size);
++	update->size = size;
++
++	item = update->items;
++
++	item->type = KDBUS_ITEM_NAME;
++	item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
++	strcpy(item->str, name);
++	item = KDBUS_ITEM_NEXT(item);
++
++	for (i = 0; i < num_access; i++) {
++		item->size = KDBUS_ITEM_HEADER_SIZE +
++			     sizeof(struct kdbus_policy_access);
++		item->type = KDBUS_ITEM_POLICY_ACCESS;
++
++		item->policy_access.type = access[i].type;
++		item->policy_access.access = access[i].access;
++		item->policy_access.id = access[i].id;
++
++		item = KDBUS_ITEM_NEXT(item);
++	}
++
++	ret = kdbus_cmd_update(conn->fd, update);
++	if (ret < 0)
++		kdbus_printf("error conn update: %d (%m)\n", ret);
++
++	free(update);
++
++	return ret;
++}
++
++int kdbus_add_match_id(struct kdbus_conn *conn, uint64_t cookie,
++		       uint64_t type, uint64_t id)
++{
++	struct {
++		struct kdbus_cmd_match cmd;
++		struct {
++			uint64_t size;
++			uint64_t type;
++			struct kdbus_notify_id_change chg;
++		} item;
++	} buf;
++	int ret;
++
++	memset(&buf, 0, sizeof(buf));
++
++	buf.cmd.size = sizeof(buf);
++	buf.cmd.cookie = cookie;
++	buf.item.size = sizeof(buf.item);
++	buf.item.type = type;
++	buf.item.chg.id = id;
++
++	ret = kdbus_cmd_match_add(conn->fd, &buf.cmd);
++	if (ret < 0)
++		kdbus_printf("--- error adding conn match: %d (%m)\n", ret);
++
++	return ret;
++}
++
++int kdbus_add_match_empty(struct kdbus_conn *conn)
++{
++	struct {
++		struct kdbus_cmd_match cmd;
++		struct kdbus_item item;
++	} buf;
++	int ret;
++
++	memset(&buf, 0, sizeof(buf));
++
++	buf.item.size = sizeof(uint64_t) * 3;
++	buf.item.type = KDBUS_ITEM_ID;
++	buf.item.id = KDBUS_MATCH_ID_ANY;
++
++	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
++
++	ret = kdbus_cmd_match_add(conn->fd, &buf.cmd);
++	if (ret < 0)
++		kdbus_printf("--- error adding conn match: %d (%m)\n", ret);
++
++	return ret;
++}
++
++static int all_ids_are_mapped(const char *path)
++{
++	int ret;
++	FILE *file;
++	uint32_t inside_id, length;
++
++	file = fopen(path, "r");
++	if (!file) {
++		ret = -errno;
++		kdbus_printf("error fopen() %s: %d (%m)\n",
++			     path, ret);
++		return ret;
++	}
++
++	ret = fscanf(file, "%u\t%*u\t%u", &inside_id, &length);
++	if (ret != 2) {
++		if (ferror(file))
++			ret = -errno;
++		else
++			ret = -EIO;
++
++		kdbus_printf("--- error fscanf(): %d\n", ret);
++		fclose(file);
++		return ret;
++	}
++
++	fclose(file);
++
++	/*
++	 * A length of 4294967295, i.e. the invalid uid (uid_t) -1,
++	 * starting at inside id 0 means that all uids/gids are mapped.
++	 */
++	if (inside_id == 0 && length == (uid_t) -1)
++		return 1;
++
++	return 0;
++}
++
++int all_uids_gids_are_mapped(void)
++{
++	int ret;
++
++	ret = all_ids_are_mapped("/proc/self/uid_map");
++	if (ret <= 0) {
++		kdbus_printf("--- error not all uids are mapped\n");
++		return 0;
++	}
++
++	ret = all_ids_are_mapped("/proc/self/gid_map");
++	if (ret <= 0) {
++		kdbus_printf("--- error not all gids are mapped\n");
++		return 0;
++	}
++
++	return 1;
++}
++
++int drop_privileges(uid_t uid, gid_t gid)
++{
++	int ret;
++
++	ret = setgroups(0, NULL);
++	if (ret < 0) {
++		ret = -errno;
++		kdbus_printf("error setgroups: %d (%m)\n", ret);
++		return ret;
++	}
++
++	ret = setresgid(gid, gid, gid);
++	if (ret < 0) {
++		ret = -errno;
++		kdbus_printf("error setresgid: %d (%m)\n", ret);
++		return ret;
++	}
++
++	ret = setresuid(uid, uid, uid);
++	if (ret < 0) {
++		ret = -errno;
++		kdbus_printf("error setresuid: %d (%m)\n", ret);
++		return ret;
++	}
++
++	return ret;
++}
++
++uint64_t now(clockid_t clock)
++{
++	struct timespec spec;
++
++	clock_gettime(clock, &spec);
++	return spec.tv_sec * 1000ULL * 1000ULL * 1000ULL + spec.tv_nsec;
++}
++
++char *unique_name(const char *prefix)
++{
++	unsigned int i;
++	uint64_t u_now;
++	char n[17];
++	char *str;
++	int r;
++
++	/*
++	 * This returns a random string which is guaranteed to be
++	 * globally unique across all calls to unique_name(). We
++	 * compose the string as:
++	 *   <prefix>-<random>-<time>
++	 * With:
++	 *   <prefix>: string provided by the caller
++	 *   <random>: a random alpha string of 16 characters
++	 *   <time>: the current time in nano-seconds since last boot
++	 *
++	 * The <random> part makes the string always look vastly different,
++	 * the <time> part makes sure no two calls return the same string.
++	 */
++
++	u_now = now(CLOCK_MONOTONIC);
++
++	for (i = 0; i < sizeof(n) - 1; ++i)
++		n[i] = 'a' + (rand() % ('z' - 'a'));
++	n[sizeof(n) - 1] = 0;
++
++	r = asprintf(&str, "%s-%s-%" PRIu64, prefix, n, u_now);
++	if (r < 0)
++		return NULL;
++
++	return str;
++}
++
++static int do_userns_map_id(pid_t pid,
++			    const char *map_file,
++			    const char *map_id)
++{
++	int ret;
++	int fd;
++	char *map;
++	unsigned int i;
++
++	map = strndupa(map_id, strlen(map_id));
++	if (!map) {
++		ret = -errno;
++		kdbus_printf("error strndupa %s: %d (%m)\n",
++			map_file, ret);
++		return ret;
++	}
++
++	for (i = 0; i < strlen(map); i++)
++		if (map[i] == ',')
++			map[i] = '\n';
++
++	fd = open(map_file, O_RDWR);
++	if (fd < 0) {
++		ret = -errno;
++		kdbus_printf("error open %s: %d (%m)\n",
++			map_file, ret);
++		return ret;
++	}
++
++	ret = write(fd, map, strlen(map));
++	if (ret < 0) {
++		ret = -errno;
++		kdbus_printf("error write to %s: %d (%m)\n",
++			     map_file, ret);
++		goto out;
++	}
++
++	ret = 0;
++
++out:
++	close(fd);
++	return ret;
++}
++
++int userns_map_uid_gid(pid_t pid,
++		       const char *map_uid,
++		       const char *map_gid)
++{
++	int fd, ret;
++	char file_id[128] = {'\0'};
++
++	snprintf(file_id, sizeof(file_id), "/proc/%ld/uid_map",
++		 (long) pid);
++
++	ret = do_userns_map_id(pid, file_id, map_uid);
++	if (ret < 0)
++		return ret;
++
++	snprintf(file_id, sizeof(file_id), "/proc/%ld/setgroups",
++		 (long) pid);
++
++	fd = open(file_id, O_WRONLY);
++	if (fd >= 0) {
++		write(fd, "deny\n", 5);
++		close(fd);
++	}
++
++	snprintf(file_id, sizeof(file_id), "/proc/%ld/gid_map",
++		 (long) pid);
++
++	return do_userns_map_id(pid, file_id, map_gid);
++}
++
++static int do_cap_get_flag(cap_t caps, cap_value_t cap)
++{
++	int ret;
++	cap_flag_value_t flag_set;
++
++	ret = cap_get_flag(caps, cap, CAP_EFFECTIVE, &flag_set);
++	if (ret < 0) {
++		ret = -errno;
++		kdbus_printf("error cap_get_flag(): %d (%m)\n", ret);
++		return ret;
++	}
++
++	return (flag_set == CAP_SET);
++}
++
++/*
++ * Returns:
++ *  1 in case all the requested effective capabilities are set.
++ *  0 in case we do not have the requested capabilities. This value
++ *    will be used to abort tests with TEST_SKIP
++ *  Negative errno on failure.
++ *
++ *  Terminate args with a negative value.
++ */
++int test_is_capable(int cap, ...)
++{
++	int ret;
++	va_list ap;
++	cap_t caps;
++
++	caps = cap_get_proc();
++	if (!caps) {
++		ret = -errno;
++		kdbus_printf("error cap_get_proc(): %d (%m)\n", ret);
++		return ret;
++	}
++
++	ret = do_cap_get_flag(caps, (cap_value_t)cap);
++	if (ret <= 0)
++		goto out;
++
++	va_start(ap, cap);
++	while ((cap = va_arg(ap, int)) > 0) {
++		ret = do_cap_get_flag(caps, (cap_value_t)cap);
++		if (ret <= 0)
++			break;
++	}
++	va_end(ap);
++
++out:
++	cap_free(caps);
++	return ret;
++}
++
++int config_user_ns_is_enabled(void)
++{
++	return (access("/proc/self/uid_map", F_OK) == 0);
++}
++
++int config_auditsyscall_is_enabled(void)
++{
++	return (access("/proc/self/loginuid", F_OK) == 0);
++}
++
++int config_cgroups_is_enabled(void)
++{
++	return (access("/proc/self/cgroup", F_OK) == 0);
++}
++
++int config_security_is_enabled(void)
++{
++	int fd;
++	int ret;
++	char buf[128];
++
++	/* CONFIG_SECURITY is disabled */
++	if (access("/proc/self/attr/current", F_OK) != 0)
++		return 0;
++
++	/*
++	 * Assume that SECLABEL and LSM are disabled only if read()
++	 * fails with EINVAL
++	 */
++	fd = open("/proc/self/attr/current", O_RDONLY|O_CLOEXEC);
++	if (fd < 0)
++		return 1;
++
++	ret = read(fd, buf, sizeof(buf));
++	if (ret == -1 && errno == EINVAL)
++		ret = 0;
++	else
++		ret = 1;
++
++	close(fd);
++
++	return ret;
++}
+diff --git a/tools/testing/selftests/kdbus/kdbus-util.h b/tools/testing/selftests/kdbus/kdbus-util.h
+new file mode 100644
+index 0000000..e1e18b9
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/kdbus-util.h
+@@ -0,0 +1,218 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Daniel Mack
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#pragma once
++
++#define BIT(X) (1 << (X))
++
++#include <time.h>
++#include <stdbool.h>
++#include <linux/kdbus.h>
++
++#define _STRINGIFY(x) #x
++#define STRINGIFY(x) _STRINGIFY(x)
++#define ELEMENTSOF(x) (sizeof(x)/sizeof((x)[0]))
++
++#define KDBUS_PTR(addr) ((void *)(uintptr_t)(addr))
++
++#define KDBUS_ALIGN8(l) (((l) + 7) & ~7)
++#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
++#define KDBUS_ITEM_SIZE(s) KDBUS_ALIGN8((s) + KDBUS_ITEM_HEADER_SIZE)
++
++#define KDBUS_ITEM_NEXT(item) \
++	(typeof(item))((uint8_t *)(item) + KDBUS_ALIGN8((item)->size))
++#define KDBUS_ITEM_FOREACH(item, head, first)				\
++	for ((item) = (head)->first;					\
++	     ((uint8_t *)(item) < (uint8_t *)(head) + (head)->size) &&	\
++	       ((uint8_t *)(item) >= (uint8_t *)(head));		\
++	     (item) = KDBUS_ITEM_NEXT(item))
++#define KDBUS_FOREACH(iter, first, _size)				\
++	for ((iter) = (first);						\
++	     ((uint8_t *)(iter) < (uint8_t *)(first) + (_size)) &&	\
++	       ((uint8_t *)(iter) >= (uint8_t *)(first));		\
++	     (iter) = (void *)((uint8_t *)(iter) + KDBUS_ALIGN8((iter)->size)))
++
++#define _KDBUS_ATTACH_BITS_SET_NR (__builtin_popcountll(_KDBUS_ATTACH_ALL))
++
++/* Sum of KDBUS_ITEM_* that reflects _KDBUS_ATTACH_ALL */
++#define KDBUS_ATTACH_ITEMS_TYPE_SUM					\
++	((((_KDBUS_ATTACH_BITS_SET_NR - 1) *				\
++	((_KDBUS_ATTACH_BITS_SET_NR - 1) + 1)) / 2) +			\
++	(_KDBUS_ITEM_ATTACH_BASE * _KDBUS_ATTACH_BITS_SET_NR))
++
++#define POOL_SIZE (16 * 1024LU * 1024LU)
++
++#define UNPRIV_UID 65534
++#define UNPRIV_GID 65534
++
++/* Dump as user of process, useful for user namespace testing */
++#define SUID_DUMP_USER	1
++
++extern int kdbus_util_verbose;
++
++#define kdbus_printf(X...) \
++	if (kdbus_util_verbose) \
++		printf(X)
++
++#define RUN_UNPRIVILEGED(child_uid, child_gid, _child_, _parent_) ({	\
++		pid_t pid, rpid;					\
++		int ret;						\
++									\
++		pid = fork();						\
++		if (pid == 0) {						\
++			ret = drop_privileges(child_uid, child_gid);	\
++			ASSERT_EXIT_VAL(ret == 0, ret);			\
++									\
++			_child_;					\
++			_exit(0);					\
++		} else if (pid > 0) {					\
++			_parent_;					\
++			rpid = waitpid(pid, &ret, 0);			\
++			ASSERT_RETURN(rpid == pid);			\
++			ASSERT_RETURN(WIFEXITED(ret));			\
++			ASSERT_RETURN(WEXITSTATUS(ret) == 0);		\
++			ret = TEST_OK;					\
++		} else {						\
++			ret = pid;					\
++		}							\
++									\
++		ret;							\
++	})
++
++#define RUN_UNPRIVILEGED_CONN(_var_, _bus_, _code_)			\
++	RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({			\
++		struct kdbus_conn *_var_;				\
++		_var_ = kdbus_hello(_bus_, 0, NULL, 0);			\
++		ASSERT_EXIT(_var_);					\
++		_code_;							\
++		kdbus_conn_free(_var_);					\
++	}), ({ 0; }))
++
++#define RUN_CLONE_CHILD(clone_ret, flags, _setup_, _child_body_,	\
++			_parent_setup_, _parent_body_) ({		\
++	pid_t pid, rpid;						\
++	int ret;							\
++	int efd = -1;							\
++									\
++	_setup_;							\
++	efd = eventfd(0, EFD_CLOEXEC);					\
++	ASSERT_RETURN(efd >= 0);					\
++	*(clone_ret) = 0;						\
++	pid = syscall(__NR_clone, flags, NULL);				\
++	if (pid == 0) {							\
++		eventfd_t event_status = 0;				\
++		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);			\
++		ASSERT_EXIT(ret == 0);					\
++		ret = eventfd_read(efd, &event_status);			\
++		if (ret < 0 || event_status != 1) {			\
++			kdbus_printf("error eventfd_read()\n");		\
++			_exit(EXIT_FAILURE);				\
++		}							\
++		_child_body_;						\
++		_exit(0);						\
++	} else if (pid > 0) {						\
++		_parent_setup_;						\
++		ret = eventfd_write(efd, 1);				\
++		ASSERT_RETURN(ret >= 0);				\
++		_parent_body_;						\
++		rpid = waitpid(pid, &ret, 0);				\
++		ASSERT_RETURN(rpid == pid);				\
++		ASSERT_RETURN(WIFEXITED(ret));				\
++		ASSERT_RETURN(WEXITSTATUS(ret) == 0);			\
++		ret = TEST_OK;						\
++	} else {							\
++		ret = -errno;						\
++		*(clone_ret) = -errno;					\
++	}								\
++	close(efd);							\
++	ret;								\
++})
++
++/* Whether, and how, the parent should drop privileges */
++enum kdbus_drop_parent {
++	DO_NOT_DROP,
++	DROP_SAME_UNPRIV,
++	DROP_OTHER_UNPRIV,
++};
++
++struct kdbus_conn {
++	int fd;
++	uint64_t id;
++	unsigned char *buf;
++};
++
++int kdbus_sysfs_get_parameter_mask(const char *path, uint64_t *mask);
++int kdbus_sysfs_set_parameter_mask(const char *path, uint64_t mask);
++
++int sys_memfd_create(const char *name, __u64 size);
++int sys_memfd_seal_set(int fd);
++off_t sys_memfd_get_size(int fd, off_t *size);
++
++int kdbus_list(struct kdbus_conn *conn, uint64_t flags);
++int kdbus_name_release(struct kdbus_conn *conn, const char *name);
++int kdbus_name_acquire(struct kdbus_conn *conn, const char *name,
++		       uint64_t *flags);
++void kdbus_msg_free(struct kdbus_msg *msg);
++int kdbus_msg_recv(struct kdbus_conn *conn,
++		   struct kdbus_msg **msg, uint64_t *offset);
++int kdbus_msg_recv_poll(struct kdbus_conn *conn, int timeout_ms,
++			struct kdbus_msg **msg_out, uint64_t *offset);
++int kdbus_free(const struct kdbus_conn *conn, uint64_t offset);
++int kdbus_msg_dump(const struct kdbus_conn *conn,
++		   const struct kdbus_msg *msg);
++int kdbus_create_bus(int control_fd, const char *name,
++		     uint64_t owner_meta, char **path);
++int kdbus_msg_send(const struct kdbus_conn *conn, const char *name,
++		   uint64_t cookie, uint64_t flags, uint64_t timeout,
++		   int64_t priority, uint64_t dst_id);
++int kdbus_msg_send_sync(const struct kdbus_conn *conn, const char *name,
++			uint64_t cookie, uint64_t flags, uint64_t timeout,
++			int64_t priority, uint64_t dst_id, int cancel_fd);
++int kdbus_msg_send_reply(const struct kdbus_conn *conn,
++			 uint64_t reply_cookie,
++			 uint64_t dst_id);
++struct kdbus_conn *kdbus_hello(const char *path, uint64_t hello_flags,
++			       const struct kdbus_item *item,
++			       size_t item_size);
++struct kdbus_conn *kdbus_hello_registrar(const char *path, const char *name,
++					 const struct kdbus_policy_access *access,
++					 size_t num_access, uint64_t flags);
++struct kdbus_conn *kdbus_hello_activator(const char *path, const char *name,
++					 const struct kdbus_policy_access *access,
++					 size_t num_access);
++bool kdbus_item_in_message(struct kdbus_msg *msg, uint64_t type);
++int kdbus_bus_creator_info(struct kdbus_conn *conn,
++			   uint64_t flags,
++			   uint64_t *offset);
++int kdbus_conn_info(struct kdbus_conn *conn, uint64_t id,
++		    const char *name, uint64_t flags, uint64_t *offset);
++void kdbus_conn_free(struct kdbus_conn *conn);
++int kdbus_conn_update_attach_flags(struct kdbus_conn *conn,
++				   uint64_t attach_flags_send,
++				   uint64_t attach_flags_recv);
++int kdbus_conn_update_policy(struct kdbus_conn *conn, const char *name,
++			     const struct kdbus_policy_access *access,
++			     size_t num_access);
++
++int kdbus_add_match_id(struct kdbus_conn *conn, uint64_t cookie,
++		       uint64_t type, uint64_t id);
++int kdbus_add_match_empty(struct kdbus_conn *conn);
++
++int all_uids_gids_are_mapped(void);
++int drop_privileges(uid_t uid, gid_t gid);
++uint64_t now(clockid_t clock);
++char *unique_name(const char *prefix);
++
++int userns_map_uid_gid(pid_t pid, const char *map_uid, const char *map_gid);
++int test_is_capable(int cap, ...);
++int config_user_ns_is_enabled(void);
++int config_auditsyscall_is_enabled(void);
++int config_cgroups_is_enabled(void);
++int config_security_is_enabled(void);
+diff --git a/tools/testing/selftests/kdbus/test-activator.c b/tools/testing/selftests/kdbus/test-activator.c
+new file mode 100644
+index 0000000..3d1b763
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-activator.c
+@@ -0,0 +1,318 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stdbool.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <poll.h>
++#include <sys/capability.h>
++#include <sys/types.h>
++#include <sys/wait.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++static int kdbus_starter_poll(struct kdbus_conn *conn)
++{
++	int ret;
++	struct pollfd fd;
++
++	fd.fd = conn->fd;
++	fd.events = POLLIN | POLLPRI | POLLHUP;
++	fd.revents = 0;
++
++	ret = poll(&fd, 1, 100);
++	if (ret == 0)
++		return -ETIMEDOUT;
++	else if (ret > 0) {
++		if (fd.revents & POLLIN)
++			return 0;
++
++		if (fd.revents & (POLLHUP | POLLERR))
++			ret = -ECONNRESET;
++	}
++
++	return ret;
++}
++
++/* Ensure that kdbus activator logic is safe */
++static int kdbus_priv_activator(struct kdbus_test_env *env)
++{
++	int ret;
++	struct kdbus_msg *msg = NULL;
++	uint64_t cookie = 0xdeadbeef;
++	uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
++	struct kdbus_conn *activator;
++	struct kdbus_conn *service;
++	struct kdbus_conn *client;
++	struct kdbus_conn *holder;
++	struct kdbus_policy_access *access;
++
++	access = (struct kdbus_policy_access[]){
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = getuid(),
++			.access = KDBUS_POLICY_OWN,
++		},
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = getuid(),
++			.access = KDBUS_POLICY_TALK,
++		},
++	};
++
++	activator = kdbus_hello_activator(env->buspath, "foo.priv.activator",
++					  access, 2);
++	ASSERT_RETURN(activator);
++
++	service = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(service);
++
++	client = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(client);
++
++	/*
++	 * Make sure that other users can't TALK to the activator
++	 */
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		/* Try to talk using the ID */
++		ret = kdbus_msg_send(unpriv, NULL, 0xdeadbeef, 0, 0,
++				     0, activator->id);
++		ASSERT_EXIT(ret == -ENXIO);
++
++		/* Try to talk to the name */
++		ret = kdbus_msg_send(unpriv, "foo.priv.activator",
++				     0xdeadbeef, 0, 0, 0,
++				     KDBUS_DST_ID_NAME);
++		ASSERT_EXIT(ret == -EPERM);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	/*
++	 * Make sure that we did not receive anything, so the
++	 * service will not be started automatically
++	 */
++
++	ret = kdbus_starter_poll(activator);
++	ASSERT_RETURN(ret == -ETIMEDOUT);
++
++	/*
++	 * Now try to emulate the starter/service logic and
++	 * acquire the name.
++	 */
++
++	cookie++;
++	ret = kdbus_msg_send(service, "foo.priv.activator", cookie,
++			     0, 0, 0, KDBUS_DST_ID_NAME);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_starter_poll(activator);
++	ASSERT_RETURN(ret == 0);
++
++	/* Policies are still checked, access denied */
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "foo.priv.activator",
++					 &flags);
++		ASSERT_EXIT(ret == -EPERM);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	ret = kdbus_name_acquire(service, "foo.priv.activator",
++				 &flags);
++	ASSERT_RETURN(ret == 0);
++
++	/* We read our previous starter message */
++
++	ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	/* Try to talk, we still fail */
++
++	cookie++;
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		/* Try to talk to the name */
++		ret = kdbus_msg_send(unpriv, "foo.priv.activator",
++				     cookie, 0, 0, 0,
++				     KDBUS_DST_ID_NAME);
++		ASSERT_EXIT(ret == -EPERM);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	/* Still nothing to read */
++
++	ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
++	ASSERT_RETURN(ret == -ETIMEDOUT);
++
++	/* We receive everything now */
++
++	cookie++;
++	ret = kdbus_msg_send(client, "foo.priv.activator", cookie,
++			     0, 0, 0, KDBUS_DST_ID_NAME);
++	ASSERT_RETURN(ret == 0);
++	ret = kdbus_msg_recv_poll(service, 100, &msg, NULL);
++	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
++
++	kdbus_msg_free(msg);
++
++	/* Policies default to deny TALK now */
++	kdbus_conn_free(activator);
++
++	cookie++;
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		/* Try to talk to the name */
++		ret = kdbus_msg_send(unpriv, "foo.priv.activator",
++				     cookie, 0, 0, 0,
++				     KDBUS_DST_ID_NAME);
++		ASSERT_EXIT(ret == -EPERM);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
++	ASSERT_RETURN(ret == -ETIMEDOUT);
++
++	/* Same user is able to TALK */
++	cookie++;
++	ret = kdbus_msg_send(client, "foo.priv.activator", cookie,
++			     0, 0, 0, KDBUS_DST_ID_NAME);
++	ASSERT_RETURN(ret == 0);
++	ret = kdbus_msg_recv_poll(service, 100, &msg, NULL);
++	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
++
++	kdbus_msg_free(msg);
++
++	access = (struct kdbus_policy_access []){
++		{
++			.type = KDBUS_POLICY_ACCESS_WORLD,
++			.id = getuid(),
++			.access = KDBUS_POLICY_TALK,
++		},
++	};
++
++	holder = kdbus_hello_registrar(env->buspath, "foo.priv.activator",
++				       access, 1, KDBUS_HELLO_POLICY_HOLDER);
++	ASSERT_RETURN(holder);
++
++	/* Now we are able to TALK to the name */
++
++	cookie++;
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		/* Try to talk to the name */
++		ret = kdbus_msg_send(unpriv, "foo.priv.activator",
++				     cookie, 0, 0, 0,
++				     KDBUS_DST_ID_NAME);
++		ASSERT_EXIT(ret == 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "foo.priv.activator",
++					 &flags);
++		ASSERT_EXIT(ret == -EPERM);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	kdbus_conn_free(service);
++	kdbus_conn_free(client);
++	kdbus_conn_free(holder);
++
++	return 0;
++}
++
++int kdbus_test_activator(struct kdbus_test_env *env)
++{
++	int ret;
++	struct kdbus_conn *activator;
++	struct pollfd fds[2];
++	bool activator_done = false;
++	struct kdbus_policy_access access[2];
++
++	access[0].type = KDBUS_POLICY_ACCESS_USER;
++	access[0].id = getuid();
++	access[0].access = KDBUS_POLICY_OWN;
++
++	access[1].type = KDBUS_POLICY_ACCESS_WORLD;
++	access[1].access = KDBUS_POLICY_TALK;
++
++	activator = kdbus_hello_activator(env->buspath, "foo.test.activator",
++					  access, 2);
++	ASSERT_RETURN(activator);
++
++	ret = kdbus_add_match_empty(env->conn);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_list(env->conn, KDBUS_LIST_NAMES |
++				    KDBUS_LIST_UNIQUE |
++				    KDBUS_LIST_ACTIVATORS |
++				    KDBUS_LIST_QUEUED);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_send(env->conn, "foo.test.activator", 0xdeafbeef,
++			     0, 0, 0, KDBUS_DST_ID_NAME);
++	ASSERT_RETURN(ret == 0);
++
++	fds[0].fd = activator->fd;
++	fds[1].fd = env->conn->fd;
++
++	kdbus_printf("-- entering poll loop ...\n");
++
++	for (;;) {
++		int i, nfds = sizeof(fds) / sizeof(fds[0]);
++
++		for (i = 0; i < nfds; i++) {
++			fds[i].events = POLLIN | POLLPRI;
++			fds[i].revents = 0;
++		}
++
++		ret = poll(fds, nfds, 3000);
++		ASSERT_RETURN(ret >= 0);
++
++		ret = kdbus_list(env->conn, KDBUS_LIST_NAMES);
++		ASSERT_RETURN(ret == 0);
++
++		if ((fds[0].revents & POLLIN) && !activator_done) {
++			uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
++
++			kdbus_printf("Starter was called back!\n");
++
++			ret = kdbus_name_acquire(env->conn,
++						 "foo.test.activator", &flags);
++			ASSERT_RETURN(ret == 0);
++
++			activator_done = true;
++		}
++
++		if (fds[1].revents & POLLIN) {
++			kdbus_msg_recv(env->conn, NULL, NULL);
++			break;
++		}
++	}
++
++	/* Check if all uids/gids are mapped */
++	if (!all_uids_gids_are_mapped())
++		return TEST_SKIP;
++
++	/* Check capabilities now, so the previous tests always run */
++	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
++	ASSERT_RETURN(ret >= 0);
++
++	if (!ret)
++		return TEST_SKIP;
++
++	ret = kdbus_priv_activator(env);
++	ASSERT_RETURN(ret == 0);
++
++	kdbus_conn_free(activator);
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-benchmark.c b/tools/testing/selftests/kdbus/test-benchmark.c
+new file mode 100644
+index 0000000..8a9744b
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-benchmark.c
+@@ -0,0 +1,451 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <locale.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <stdbool.h>
++#include <errno.h>
++#include <assert.h>
++#include <poll.h>
++#include <sys/time.h>
++#include <sys/mman.h>
++#include <sys/socket.h>
++#include <math.h>
++
++#include "kdbus-api.h"
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++#define SERVICE_NAME "foo.bar.echo"
++
++/*
++ * To have a benchmark comparison with a unix socket, set:
++ * use_memfd	= false;
++ * compare_uds	= true;
++ * attach_none	= true;		do not attach metadata
++ */
++
++static bool use_memfd = true;		/* transmit memfd? */
++static bool compare_uds = false;		/* unix-socket comparison? */
++static bool attach_none = false;		/* clear attach-flags? */
++static char stress_payload[8192];
++
++struct stats {
++	uint64_t count;
++	uint64_t latency_acc;
++	uint64_t latency_low;
++	uint64_t latency_high;
++	uint64_t latency_avg;
++	uint64_t latency_ssquares;
++};
++
++static struct stats stats;
++
++static void reset_stats(void)
++{
++	stats.count = 0;
++	stats.latency_acc = 0;
++	stats.latency_low = UINT64_MAX;
++	stats.latency_high = 0;
++	stats.latency_avg = 0;
++	stats.latency_ssquares = 0;
++}
++
++static void dump_stats(bool is_uds)
++{
++	if (stats.count > 0) {
++		kdbus_printf("stats %s: %'llu packets processed, latency (nsecs) min/max/avg/dev %'7llu // %'7llu // %'7llu // %'7.f\n",
++			     is_uds ? " (UNIX)" : "(KDBUS)",
++			     (unsigned long long) stats.count,
++			     (unsigned long long) stats.latency_low,
++			     (unsigned long long) stats.latency_high,
++			     (unsigned long long) stats.latency_avg,
++			     sqrt(stats.latency_ssquares / stats.count));
++	} else {
++		kdbus_printf("*** no packets received. bus stuck?\n");
++	}
++}
++
++static void add_stats(uint64_t prev)
++{
++	uint64_t diff, latency_avg_prev;
++
++	diff = now(CLOCK_THREAD_CPUTIME_ID) - prev;
++
++	stats.count++;
++	stats.latency_acc += diff;
++
++	/* see Welford62 */
++	latency_avg_prev = stats.latency_avg;
++	stats.latency_avg = stats.latency_acc / stats.count;
++	stats.latency_ssquares += (diff - latency_avg_prev) * (diff - stats.latency_avg);
++
++	if (stats.latency_low > diff)
++		stats.latency_low = diff;
++
++	if (stats.latency_high < diff)
++		stats.latency_high = diff;
++}
++
++static int setup_simple_kdbus_msg(struct kdbus_conn *conn,
++				  uint64_t dst_id,
++				  struct kdbus_msg **msg_out)
++{
++	struct kdbus_msg *msg;
++	struct kdbus_item *item;
++	uint64_t size;
++
++	size = sizeof(struct kdbus_msg);
++	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++	msg = malloc(size);
++	ASSERT_RETURN_VAL(msg, -ENOMEM);
++
++	memset(msg, 0, size);
++	msg->size = size;
++	msg->src_id = conn->id;
++	msg->dst_id = dst_id;
++	msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++	item = msg->items;
++
++	item->type = KDBUS_ITEM_PAYLOAD_VEC;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++	item->vec.address = (uintptr_t) stress_payload;
++	item->vec.size = sizeof(stress_payload);
++	item = KDBUS_ITEM_NEXT(item);
++
++	*msg_out = msg;
++
++	return 0;
++}
++
++static int setup_memfd_kdbus_msg(struct kdbus_conn *conn,
++				 uint64_t dst_id,
++				 off_t *memfd_item_offset,
++				 struct kdbus_msg **msg_out)
++{
++	struct kdbus_msg *msg;
++	struct kdbus_item *item;
++	uint64_t size;
++
++	size = sizeof(struct kdbus_msg);
++	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
++
++	msg = malloc(size);
++	ASSERT_RETURN_VAL(msg, -ENOMEM);
++
++	memset(msg, 0, size);
++	msg->size = size;
++	msg->src_id = conn->id;
++	msg->dst_id = dst_id;
++	msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++	item = msg->items;
++
++	item->type = KDBUS_ITEM_PAYLOAD_VEC;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++	item->vec.address = (uintptr_t) stress_payload;
++	item->vec.size = sizeof(stress_payload);
++	item = KDBUS_ITEM_NEXT(item);
++
++	item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_memfd);
++	item->memfd.size = sizeof(uint64_t);
++
++	*memfd_item_offset = (unsigned char *)item - (unsigned char *)msg;
++	*msg_out = msg;
++
++	return 0;
++}
++
++static int
++send_echo_request(struct kdbus_conn *conn, uint64_t dst_id,
++		  void *kdbus_msg, off_t memfd_item_offset)
++{
++	struct kdbus_cmd_send cmd = {};
++	int memfd = -1;
++	int ret;
++
++	if (use_memfd) {
++		uint64_t now_ns = now(CLOCK_THREAD_CPUTIME_ID);
++		struct kdbus_item *item = memfd_item_offset + kdbus_msg;
++		memfd = sys_memfd_create("memfd-name", 0);
++		ASSERT_RETURN_VAL(memfd >= 0, memfd);
++
++		ret = write(memfd, &now_ns, sizeof(now_ns));
++		ASSERT_RETURN_VAL(ret == sizeof(now_ns), -EAGAIN);
++
++		ret = sys_memfd_seal_set(memfd);
++		ASSERT_RETURN_VAL(ret == 0, -errno);
++
++		item->memfd.fd = memfd;
++	}
++
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)kdbus_msg;
++
++	ret = kdbus_cmd_send(conn->fd, &cmd);
++	ASSERT_RETURN_VAL(ret == 0, ret);
++
++	close(memfd);
++
++	return 0;
++}
++
++static int
++handle_echo_reply(struct kdbus_conn *conn, uint64_t send_ns)
++{
++	int ret;
++	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
++	struct kdbus_msg *msg;
++	const struct kdbus_item *item;
++	bool has_memfd = false;
++
++	ret = kdbus_cmd_recv(conn->fd, &recv);
++	if (ret == -EAGAIN)
++		return ret;
++
++	ASSERT_RETURN_VAL(ret == 0, ret);
++
++	if (!use_memfd)
++		goto out;
++
++	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++
++	KDBUS_ITEM_FOREACH(item, msg, items) {
++		switch (item->type) {
++		case KDBUS_ITEM_PAYLOAD_MEMFD: {
++			char *buf;
++
++			buf = mmap(NULL, item->memfd.size, PROT_READ,
++				   MAP_PRIVATE, item->memfd.fd, 0);
++			ASSERT_RETURN_VAL(buf != MAP_FAILED, -EINVAL);
++			ASSERT_RETURN_VAL(item->memfd.size == sizeof(uint64_t),
++					  -EINVAL);
++
++			add_stats(*(uint64_t*)buf);
++			munmap(buf, item->memfd.size);
++			close(item->memfd.fd);
++			has_memfd = true;
++			break;
++		}
++
++		case KDBUS_ITEM_PAYLOAD_OFF:
++			/* ignore */
++			break;
++		}
++	}
++
++out:
++	if (!has_memfd)
++		add_stats(send_ns);
++
++	ret = kdbus_free(conn, recv.msg.offset);
++	ASSERT_RETURN_VAL(ret == 0, -errno);
++
++	return 0;
++}
++
++static int benchmark(struct kdbus_test_env *env)
++{
++	static char buf[sizeof(stress_payload)];
++	struct kdbus_msg *kdbus_msg = NULL;
++	off_t memfd_cached_offset = 0;
++	int ret;
++	struct kdbus_conn *conn_a, *conn_b;
++	struct pollfd fds[2];
++	uint64_t start, send_ns, now_ns, diff;
++	unsigned int i;
++	int uds[2];
++
++	setlocale(LC_ALL, "");
++
++	for (i = 0; i < sizeof(stress_payload); i++)
++		stress_payload[i] = i;
++
++	/* setup kdbus pair */
++
++	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
++	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn_a && conn_b);
++
++	ret = kdbus_add_match_empty(conn_a);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_add_match_empty(conn_b);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_name_acquire(conn_a, SERVICE_NAME, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	if (attach_none) {
++		ret = kdbus_conn_update_attach_flags(conn_a,
++						     _KDBUS_ATTACH_ALL,
++						     0);
++		ASSERT_RETURN(ret == 0);
++	}
++
++	/* setup UDS pair */
++
++	ret = socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_NONBLOCK, 0, uds);
++	ASSERT_RETURN(ret == 0);
++
++	/* setup a kdbus msg now */
++	if (use_memfd) {
++		ret = setup_memfd_kdbus_msg(conn_b, conn_a->id,
++					    &memfd_cached_offset,
++					    &kdbus_msg);
++		ASSERT_RETURN(ret == 0);
++	} else {
++		ret = setup_simple_kdbus_msg(conn_b, conn_a->id, &kdbus_msg);
++		ASSERT_RETURN(ret == 0);
++	}
++
++	/* start benchmark */
++
++	kdbus_printf("-- entering poll loop ...\n");
++
++	do {
++		/* run kdbus benchmark */
++		fds[0].fd = conn_a->fd;
++		fds[1].fd = conn_b->fd;
++
++		/* cancel any pending message */
++		handle_echo_reply(conn_a, 0);
++
++		start = now(CLOCK_THREAD_CPUTIME_ID);
++		reset_stats();
++
++		send_ns = now(CLOCK_THREAD_CPUTIME_ID);
++		ret = send_echo_request(conn_b, conn_a->id,
++					kdbus_msg, memfd_cached_offset);
++		ASSERT_RETURN(ret == 0);
++
++		while (1) {
++			unsigned int nfds = sizeof(fds) / sizeof(fds[0]);
++			unsigned int i;
++
++			for (i = 0; i < nfds; i++) {
++				fds[i].events = POLLIN | POLLPRI | POLLHUP;
++				fds[i].revents = 0;
++			}
++
++			ret = poll(fds, nfds, 10);
++			if (ret < 0)
++				break;
++
++			if (fds[0].revents & POLLIN) {
++				ret = handle_echo_reply(conn_a, send_ns);
++				ASSERT_RETURN(ret == 0);
++
++				send_ns = now(CLOCK_THREAD_CPUTIME_ID);
++				ret = send_echo_request(conn_b, conn_a->id,
++							kdbus_msg,
++							memfd_cached_offset);
++				ASSERT_RETURN(ret == 0);
++			}
++
++			now_ns = now(CLOCK_THREAD_CPUTIME_ID);
++			diff = now_ns - start;
++			if (diff > 1000000000ULL) {
++				start = now_ns;
++
++				dump_stats(false);
++				break;
++			}
++		}
++
++		if (!compare_uds)
++			continue;
++
++		/* run unix-socket benchmark as comparison */
++
++		fds[0].fd = uds[0];
++		fds[1].fd = uds[1];
++
++		/* cancel any pending message */
++		read(uds[1], buf, sizeof(buf));
++
++		start = now(CLOCK_THREAD_CPUTIME_ID);
++		reset_stats();
++
++		send_ns = now(CLOCK_THREAD_CPUTIME_ID);
++		ret = write(uds[0], stress_payload, sizeof(stress_payload));
++		ASSERT_RETURN(ret == sizeof(stress_payload));
++
++		while (1) {
++			unsigned int nfds = sizeof(fds) / sizeof(fds[0]);
++			unsigned int i;
++
++			for (i = 0; i < nfds; i++) {
++				fds[i].events = POLLIN | POLLPRI | POLLHUP;
++				fds[i].revents = 0;
++			}
++
++			ret = poll(fds, nfds, 10);
++			if (ret < 0)
++				break;
++
++			if (fds[1].revents & POLLIN) {
++				ret = read(uds[1], buf, sizeof(buf));
++				ASSERT_RETURN(ret == sizeof(buf));
++
++				add_stats(send_ns);
++
++				send_ns = now(CLOCK_THREAD_CPUTIME_ID);
++				ret = write(uds[0], buf, sizeof(buf));
++				ASSERT_RETURN(ret == sizeof(buf));
++			}
++
++			now_ns = now(CLOCK_THREAD_CPUTIME_ID);
++			diff = now_ns - start;
++			if (diff > 1000000000ULL) {
++				start = now_ns;
++
++				dump_stats(true);
++				break;
++			}
++		}
++
++	} while (kdbus_util_verbose);
++
++	kdbus_printf("-- closing bus connections\n");
++
++	free(kdbus_msg);
++
++	kdbus_conn_free(conn_a);
++	kdbus_conn_free(conn_b);
++
++	return (stats.count > 1) ? TEST_OK : TEST_ERR;
++}
++
++int kdbus_test_benchmark(struct kdbus_test_env *env)
++{
++	use_memfd = true;
++	attach_none = false;
++	compare_uds = false;
++	return benchmark(env);
++}
++
++int kdbus_test_benchmark_nomemfds(struct kdbus_test_env *env)
++{
++	use_memfd = false;
++	attach_none = false;
++	compare_uds = false;
++	return benchmark(env);
++}
++
++int kdbus_test_benchmark_uds(struct kdbus_test_env *env)
++{
++	use_memfd = false;
++	attach_none = true;
++	compare_uds = true;
++	return benchmark(env);
++}
+diff --git a/tools/testing/selftests/kdbus/test-bus.c b/tools/testing/selftests/kdbus/test-bus.c
+new file mode 100644
+index 0000000..762fb30
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-bus.c
+@@ -0,0 +1,175 @@
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <limits.h>
++#include <sys/mman.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++static struct kdbus_item *kdbus_get_item(struct kdbus_info *info,
++					 uint64_t type)
++{
++	struct kdbus_item *item;
++
++	KDBUS_ITEM_FOREACH(item, info, items)
++		if (item->type == type)
++			return item;
++
++	return NULL;
++}
++
++static int test_bus_creator_info(const char *bus_path)
++{
++	int ret;
++	uint64_t offset;
++	struct kdbus_conn *conn;
++	struct kdbus_info *info;
++	struct kdbus_item *item;
++	char *tmp, *busname;
++
++	/* extract the bus-name from @bus_path */
++	tmp = strdup(bus_path);
++	ASSERT_RETURN(tmp);
++	busname = strrchr(tmp, '/');
++	ASSERT_RETURN(busname);
++	*busname = 0;
++	busname = strrchr(tmp, '/');
++	ASSERT_RETURN(busname);
++	++busname;
++
++	conn = kdbus_hello(bus_path, 0, NULL, 0);
++	ASSERT_RETURN(conn);
++
++	ret = kdbus_bus_creator_info(conn, _KDBUS_ATTACH_ALL, &offset);
++	ASSERT_RETURN(ret == 0);
++
++	info = (struct kdbus_info *)(conn->buf + offset);
++
++	item = kdbus_get_item(info, KDBUS_ITEM_MAKE_NAME);
++	ASSERT_RETURN(item);
++	ASSERT_RETURN(!strcmp(item->str, busname));
++
++	ret = kdbus_free(conn, offset);
++	ASSERT_RETURN_VAL(ret == 0, ret);
++
++	free(tmp);
++	kdbus_conn_free(conn);
++	return 0;
++}
++
++int kdbus_test_bus_make(struct kdbus_test_env *env)
++{
++	struct {
++		struct kdbus_cmd cmd;
++
++		/* bloom size item */
++		struct {
++			uint64_t size;
++			uint64_t type;
++			struct kdbus_bloom_parameter bloom;
++		} bs;
++
++		/* name item */
++		uint64_t n_size;
++		uint64_t n_type;
++		char name[64];
++	} bus_make;
++	char s[PATH_MAX], *name;
++	int ret, control_fd2;
++	uid_t uid;
++
++	name = unique_name("");
++	ASSERT_RETURN(name);
++
++	snprintf(s, sizeof(s), "%s/control", env->root);
++	env->control_fd = open(s, O_RDWR|O_CLOEXEC);
++	ASSERT_RETURN(env->control_fd >= 0);
++
++	control_fd2 = open(s, O_RDWR|O_CLOEXEC);
++	ASSERT_RETURN(control_fd2 >= 0);
++
++	memset(&bus_make, 0, sizeof(bus_make));
++
++	bus_make.bs.size = sizeof(bus_make.bs);
++	bus_make.bs.type = KDBUS_ITEM_BLOOM_PARAMETER;
++	bus_make.bs.bloom.size = 64;
++	bus_make.bs.bloom.n_hash = 1;
++
++	bus_make.n_type = KDBUS_ITEM_MAKE_NAME;
++
++	uid = getuid();
++
++	/* missing uid prefix */
++	snprintf(bus_make.name, sizeof(bus_make.name), "foo");
++	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
++	bus_make.cmd.size = sizeof(struct kdbus_cmd) +
++			    sizeof(bus_make.bs) + bus_make.n_size;
++	ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	/* non alphanumeric character */
++	snprintf(bus_make.name, sizeof(bus_make.name), "%u-blah@123", uid);
++	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
++	bus_make.cmd.size = sizeof(struct kdbus_cmd) +
++			    sizeof(bus_make.bs) + bus_make.n_size;
++	ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	/* '-' at the end */
++	snprintf(bus_make.name, sizeof(bus_make.name), "%u-blah-", uid);
++	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
++	bus_make.cmd.size = sizeof(struct kdbus_cmd) +
++			    sizeof(bus_make.bs) + bus_make.n_size;
++	ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	/* create a new bus */
++	snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-1", uid, name);
++	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
++	bus_make.cmd.size = sizeof(struct kdbus_cmd) +
++			    sizeof(bus_make.bs) + bus_make.n_size;
++	ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_cmd_bus_make(control_fd2, &bus_make.cmd);
++	ASSERT_RETURN(ret == -EEXIST);
++
++	snprintf(s, sizeof(s), "%s/%u-%s-1/bus", env->root, uid, name);
++	ASSERT_RETURN(access(s, F_OK) == 0);
++
++	ret = test_bus_creator_info(s);
++	ASSERT_RETURN(ret == 0);
++
++	/* can't use the same fd for bus make twice, even though a different
++	 * bus name is used
++	 */
++	snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-2", uid, name);
++	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
++	bus_make.cmd.size = sizeof(struct kdbus_cmd) +
++			    sizeof(bus_make.bs) + bus_make.n_size;
++	ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
++	ASSERT_RETURN(ret == -EBADFD);
++
++	/* create a new bus, with different fd and different bus name */
++	snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-2", uid, name);
++	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
++	bus_make.cmd.size = sizeof(struct kdbus_cmd) +
++			    sizeof(bus_make.bs) + bus_make.n_size;
++	ret = kdbus_cmd_bus_make(control_fd2, &bus_make.cmd);
++	ASSERT_RETURN(ret == 0);
++
++	close(control_fd2);
++	free(name);
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-chat.c b/tools/testing/selftests/kdbus/test-chat.c
+new file mode 100644
+index 0000000..41e5b53
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-chat.c
+@@ -0,0 +1,124 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <poll.h>
++#include <stdbool.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++int kdbus_test_chat(struct kdbus_test_env *env)
++{
++	int ret, cookie;
++	struct kdbus_conn *conn_a, *conn_b;
++	struct pollfd fds[2];
++	uint64_t flags;
++	int count;
++
++	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
++	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn_a && conn_b);
++
++	flags = KDBUS_NAME_ALLOW_REPLACEMENT;
++	ret = kdbus_name_acquire(conn_a, "foo.bar.test", &flags);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_name_acquire(conn_a, "foo.bar.baz", NULL);
++	ASSERT_RETURN(ret == 0);
++
++	flags = KDBUS_NAME_QUEUE;
++	ret = kdbus_name_acquire(conn_b, "foo.bar.baz", &flags);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_name_acquire(conn_a, "foo.bar.double", NULL);
++	ASSERT_RETURN(ret == 0);
++
++	flags = 0;
++	ret = kdbus_name_acquire(conn_a, "foo.bar.double", &flags);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(!(flags & KDBUS_NAME_ACQUIRED));
++
++	ret = kdbus_name_release(conn_a, "foo.bar.double");
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_name_release(conn_a, "foo.bar.double");
++	ASSERT_RETURN(ret == -ESRCH);
++
++	ret = kdbus_list(conn_b, KDBUS_LIST_UNIQUE |
++				 KDBUS_LIST_NAMES  |
++				 KDBUS_LIST_QUEUED |
++				 KDBUS_LIST_ACTIVATORS);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_add_match_empty(conn_a);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_add_match_empty(conn_b);
++	ASSERT_RETURN(ret == 0);
++
++	cookie = 0;
++	ret = kdbus_msg_send(conn_b, NULL, 0xc0000000 | cookie, 0, 0, 0,
++			     KDBUS_DST_ID_BROADCAST);
++	ASSERT_RETURN(ret == 0);
++
++	fds[0].fd = conn_a->fd;
++	fds[1].fd = conn_b->fd;
++
++	kdbus_printf("-- entering poll loop ...\n");
++
++	for (count = 0;; count++) {
++		int i, nfds = sizeof(fds) / sizeof(fds[0]);
++
++		for (i = 0; i < nfds; i++) {
++			fds[i].events = POLLIN | POLLPRI | POLLHUP;
++			fds[i].revents = 0;
++		}
++
++		ret = poll(fds, nfds, 3000);
++		ASSERT_RETURN(ret >= 0);
++
++		if (fds[0].revents & POLLIN) {
++			if (count > 2)
++				kdbus_name_release(conn_a, "foo.bar.baz");
++
++			ret = kdbus_msg_recv(conn_a, NULL, NULL);
++			ASSERT_RETURN(ret == 0);
++			ret = kdbus_msg_send(conn_a, NULL,
++					     0xc0000000 | cookie++,
++					     0, 0, 0, conn_b->id);
++			ASSERT_RETURN(ret == 0);
++		}
++
++		if (fds[1].revents & POLLIN) {
++			ret = kdbus_msg_recv(conn_b, NULL, NULL);
++			ASSERT_RETURN(ret == 0);
++			ret = kdbus_msg_send(conn_b, NULL,
++					     0xc0000000 | cookie++,
++					     0, 0, 0, conn_a->id);
++			ASSERT_RETURN(ret == 0);
++		}
++
++		ret = kdbus_list(conn_b, KDBUS_LIST_UNIQUE |
++					 KDBUS_LIST_NAMES  |
++					 KDBUS_LIST_QUEUED |
++					 KDBUS_LIST_ACTIVATORS);
++		ASSERT_RETURN(ret == 0);
++
++		if (count > 10)
++			break;
++	}
++
++	kdbus_printf("-- closing bus connections\n");
++	kdbus_conn_free(conn_a);
++	kdbus_conn_free(conn_b);
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-connection.c b/tools/testing/selftests/kdbus/test-connection.c
+new file mode 100644
+index 0000000..4688ce8
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-connection.c
+@@ -0,0 +1,597 @@
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <limits.h>
++#include <sys/types.h>
++#include <sys/capability.h>
++#include <sys/mman.h>
++#include <sys/syscall.h>
++#include <sys/wait.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++int kdbus_test_hello(struct kdbus_test_env *env)
++{
++	struct kdbus_cmd_free cmd_free = {};
++	struct kdbus_cmd_hello hello;
++	int fd, ret;
++
++	memset(&hello, 0, sizeof(hello));
++
++	fd = open(env->buspath, O_RDWR|O_CLOEXEC);
++	ASSERT_RETURN(fd >= 0);
++
++	hello.flags = KDBUS_HELLO_ACCEPT_FD;
++	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++	hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
++	hello.size = sizeof(struct kdbus_cmd_hello);
++	hello.pool_size = POOL_SIZE;
++
++	/* an unaligned hello must result in -EFAULT */
++	ret = kdbus_cmd_hello(fd, (struct kdbus_cmd_hello *) ((char *) &hello + 1));
++	ASSERT_RETURN(ret == -EFAULT);
++
++	/* a hello size smaller than the struct must return -EINVAL */
++	hello.size = 1;
++	hello.flags = KDBUS_HELLO_ACCEPT_FD;
++	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++	ret = kdbus_cmd_hello(fd, &hello);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	hello.size = sizeof(struct kdbus_cmd_hello);
++
++	/* check faulty flags */
++	hello.flags = 1ULL << 32;
++	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++	ret = kdbus_cmd_hello(fd, &hello);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	/* check for faulty pool sizes */
++	hello.pool_size = 0;
++	hello.flags = KDBUS_HELLO_ACCEPT_FD;
++	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++	ret = kdbus_cmd_hello(fd, &hello);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	hello.pool_size = 4097;
++	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++	ret = kdbus_cmd_hello(fd, &hello);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	hello.pool_size = POOL_SIZE;
++
++	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++	hello.offset = (__u64)-1;
++
++	/* success test */
++	ret = kdbus_cmd_hello(fd, &hello);
++	ASSERT_RETURN(ret == 0);
++
++	/* The kernel should have returned some items */
++	ASSERT_RETURN(hello.offset != (__u64)-1);
++	cmd_free.size = sizeof(cmd_free);
++	cmd_free.offset = hello.offset;
++	ret = kdbus_cmd_free(fd, &cmd_free);
++	ASSERT_RETURN(ret >= 0);
++
++	close(fd);
++
++	fd = open(env->buspath, O_RDWR|O_CLOEXEC);
++	ASSERT_RETURN(fd >= 0);
++
++	/* no ACTIVATOR flag without a name */
++	hello.flags = KDBUS_HELLO_ACTIVATOR;
++	ret = kdbus_cmd_hello(fd, &hello);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	close(fd);
++
++	return TEST_OK;
++}
++
++int kdbus_test_byebye(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *conn;
++	struct kdbus_cmd_recv cmd_recv = { .size = sizeof(cmd_recv) };
++	struct kdbus_cmd cmd_byebye = { .size = sizeof(cmd_byebye) };
++	int ret;
++
++	/* create a 2nd connection */
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn != NULL);
++
++	ret = kdbus_add_match_empty(conn);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_add_match_empty(env->conn);
++	ASSERT_RETURN(ret == 0);
++
++	/* send over 1st connection */
++	ret = kdbus_msg_send(env->conn, NULL, 0, 0, 0, 0,
++			     KDBUS_DST_ID_BROADCAST);
++	ASSERT_RETURN(ret == 0);
++
++	/* say byebye on the 2nd, which must fail */
++	ret = kdbus_cmd_byebye(conn->fd, &cmd_byebye);
++	ASSERT_RETURN(ret == -EBUSY);
++
++	/* receive the message */
++	ret = kdbus_cmd_recv(conn->fd, &cmd_recv);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_free(conn, cmd_recv.msg.offset);
++	ASSERT_RETURN(ret == 0);
++
++	/* and try again */
++	ret = kdbus_cmd_byebye(conn->fd, &cmd_byebye);
++	ASSERT_RETURN(ret == 0);
++
++	/* a 2nd try should result in -ECONNRESET */
++	ret = kdbus_cmd_byebye(conn->fd, &cmd_byebye);
++	ASSERT_RETURN(ret == -ECONNRESET);
++
++	kdbus_conn_free(conn);
++
++	return TEST_OK;
++}
++
++/* Get only the first item */
++static struct kdbus_item *kdbus_get_item(struct kdbus_info *info,
++					 uint64_t type)
++{
++	struct kdbus_item *item;
++
++	KDBUS_ITEM_FOREACH(item, info, items)
++		if (item->type == type)
++			return item;
++
++	return NULL;
++}
++
++static unsigned int kdbus_count_item(struct kdbus_info *info,
++				     uint64_t type)
++{
++	unsigned int i = 0;
++	const struct kdbus_item *item;
++
++	KDBUS_ITEM_FOREACH(item, info, items)
++		if (item->type == type)
++			i++;
++
++	return i;
++}
++
++static int kdbus_fuzz_conn_info(struct kdbus_test_env *env, int capable)
++{
++	int ret;
++	unsigned int cnt = 0;
++	uint64_t offset = 0;
++	struct kdbus_info *info;
++	struct kdbus_conn *conn;
++	struct kdbus_conn *privileged;
++	const struct kdbus_item *item;
++	uint64_t valid_flags = KDBUS_ATTACH_NAMES |
++			       KDBUS_ATTACH_CREDS |
++			       KDBUS_ATTACH_PIDS |
++			       KDBUS_ATTACH_CONN_DESCRIPTION;
++
++	uint64_t invalid_flags = KDBUS_ATTACH_NAMES	|
++				 KDBUS_ATTACH_CREDS	|
++				 KDBUS_ATTACH_PIDS	|
++				 KDBUS_ATTACH_CAPS	|
++				 KDBUS_ATTACH_CGROUP	|
++				 KDBUS_ATTACH_CONN_DESCRIPTION;
++
++	struct kdbus_creds cached_creds;
++	uid_t ruid, euid, suid;
++	gid_t rgid, egid, sgid;
++
++	getresuid(&ruid, &euid, &suid);
++	getresgid(&rgid, &egid, &sgid);
++
++	cached_creds.uid = ruid;
++	cached_creds.euid = euid;
++	cached_creds.suid = suid;
++	cached_creds.fsuid = ruid;
++
++	cached_creds.gid = rgid;
++	cached_creds.egid = egid;
++	cached_creds.sgid = sgid;
++	cached_creds.fsgid = rgid;
++
++	struct kdbus_pids cached_pids = {
++		.pid	= getpid(),
++		.tid	= syscall(SYS_gettid),
++		.ppid	= getppid(),
++	};
++
++	ret = kdbus_conn_info(env->conn, env->conn->id, NULL,
++			      valid_flags, &offset);
++	ASSERT_RETURN(ret == 0);
++
++	info = (struct kdbus_info *)(env->conn->buf + offset);
++	ASSERT_RETURN(info->id == env->conn->id);
++
++	/* We do not have any well-known name */
++	item = kdbus_get_item(info, KDBUS_ITEM_NAME);
++	ASSERT_RETURN(item == NULL);
++
++	item = kdbus_get_item(info, KDBUS_ITEM_CONN_DESCRIPTION);
++	if (valid_flags & KDBUS_ATTACH_CONN_DESCRIPTION) {
++		ASSERT_RETURN(item);
++	} else {
++		ASSERT_RETURN(item == NULL);
++	}
++
++	kdbus_free(env->conn, offset);
++
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn);
++
++	privileged = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(privileged);
++
++	ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
++	ASSERT_RETURN(ret == 0);
++
++	info = (struct kdbus_info *)(conn->buf + offset);
++	ASSERT_RETURN(info->id == conn->id);
++
++	/* We do not have any well-known name */
++	item = kdbus_get_item(info, KDBUS_ITEM_NAME);
++	ASSERT_RETURN(item == NULL);
++
++	cnt = kdbus_count_item(info, KDBUS_ITEM_CREDS);
++	if (valid_flags & KDBUS_ATTACH_CREDS) {
++		ASSERT_RETURN(cnt == 1);
++
++		item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
++		ASSERT_RETURN(item);
++
++		/* Compare received items with cached creds */
++		ASSERT_RETURN(memcmp(&item->creds, &cached_creds,
++				      sizeof(struct kdbus_creds)) == 0);
++	} else {
++		ASSERT_RETURN(cnt == 0);
++	}
++
++	item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
++	if (valid_flags & KDBUS_ATTACH_PIDS) {
++		ASSERT_RETURN(item);
++
++		/* Compare item->pids with cached PIDs */
++		ASSERT_RETURN(item->pids.pid == cached_pids.pid &&
++			      item->pids.tid == cached_pids.tid &&
++			      item->pids.ppid == cached_pids.ppid);
++	} else {
++		ASSERT_RETURN(item == NULL);
++	}
++
++	/* We did not request KDBUS_ITEM_CAPS */
++	item = kdbus_get_item(info, KDBUS_ITEM_CAPS);
++	ASSERT_RETURN(item == NULL);
++
++	kdbus_free(conn, offset);
++
++	ret = kdbus_name_acquire(conn, "com.example.a", NULL);
++	ASSERT_RETURN(ret >= 0);
++
++	ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
++	ASSERT_RETURN(ret == 0);
++
++	info = (struct kdbus_info *)(conn->buf + offset);
++	ASSERT_RETURN(info->id == conn->id);
++
++	item = kdbus_get_item(info, KDBUS_ITEM_OWNED_NAME);
++	if (valid_flags & KDBUS_ATTACH_NAMES) {
++		ASSERT_RETURN(item && !strcmp(item->name.name, "com.example.a"));
++	} else {
++		ASSERT_RETURN(item == NULL);
++	}
++
++	kdbus_free(conn, offset);
++
++	ret = kdbus_conn_info(conn, 0, "com.example.a", valid_flags, &offset);
++	ASSERT_RETURN(ret == 0);
++
++	info = (struct kdbus_info *)(conn->buf + offset);
++	ASSERT_RETURN(info->id == conn->id);
++
++	kdbus_free(conn, offset);
++
++	/* We do not have the necessary caps to drop privileges */
++	if (!capable)
++		goto continue_test;
++
++	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
++		ret = kdbus_conn_info(conn, conn->id, NULL,
++				      valid_flags, &offset);
++		ASSERT_EXIT(ret == 0);
++
++		info = (struct kdbus_info *)(conn->buf + offset);
++		ASSERT_EXIT(info->id == conn->id);
++
++		if (valid_flags & KDBUS_ATTACH_NAMES) {
++			item = kdbus_get_item(info, KDBUS_ITEM_OWNED_NAME);
++			ASSERT_EXIT(item &&
++				    strcmp(item->name.name,
++				           "com.example.a") == 0);
++		}
++
++		if (valid_flags & KDBUS_ATTACH_CREDS) {
++			item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
++			ASSERT_EXIT(item);
++
++			/* Compare received items with cached creds */
++			ASSERT_EXIT(memcmp(&item->creds, &cached_creds,
++				    sizeof(struct kdbus_creds)) == 0);
++		}
++
++		if (valid_flags & KDBUS_ATTACH_PIDS) {
++			item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
++			ASSERT_EXIT(item);
++
++			/*
++			 * Compare item->pids with cached pids of
++			 * privileged one.
++			 *
++			 * cmd_info will always return cached pids.
++			 */
++			ASSERT_EXIT(item->pids.pid == cached_pids.pid &&
++				    item->pids.tid == cached_pids.tid);
++		}
++
++		kdbus_free(conn, offset);
++
++		/*
++		 * Use invalid_flags and make sure that userspace
++		 * does not play with us.
++		 */
++		ret = kdbus_conn_info(conn, conn->id, NULL,
++				      invalid_flags, &offset);
++		ASSERT_EXIT(ret == 0);
++
++		/*
++		 * Make sure that we return only one creds item and
++		 * it points to the cached creds.
++		 */
++		cnt = kdbus_count_item(info, KDBUS_ITEM_CREDS);
++		if (invalid_flags & KDBUS_ATTACH_CREDS) {
++			ASSERT_EXIT(cnt == 1);
++
++			item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
++			ASSERT_EXIT(item);
++
++			/* Compare received items with cached creds */
++			ASSERT_EXIT(memcmp(&item->creds, &cached_creds,
++				    sizeof(struct kdbus_creds)) == 0);
++		} else {
++			ASSERT_EXIT(cnt == 0);
++		}
++
++		if (invalid_flags & KDBUS_ATTACH_PIDS) {
++			cnt = kdbus_count_item(info, KDBUS_ITEM_PIDS);
++			ASSERT_EXIT(cnt == 1);
++
++			item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
++			ASSERT_EXIT(item);
++
++			/* Compare item->pids with cached pids */
++			ASSERT_EXIT(item->pids.pid == cached_pids.pid &&
++				    item->pids.tid == cached_pids.tid);
++		}
++
++		cnt = kdbus_count_item(info, KDBUS_ITEM_CGROUP);
++		if (invalid_flags & KDBUS_ATTACH_CGROUP) {
++			ASSERT_EXIT(cnt == 1);
++		} else {
++			ASSERT_EXIT(cnt == 0);
++		}
++
++		cnt = kdbus_count_item(info, KDBUS_ITEM_CAPS);
++		if (invalid_flags & KDBUS_ATTACH_CAPS) {
++			ASSERT_EXIT(cnt == 1);
++		} else {
++			ASSERT_EXIT(cnt == 0);
++		}
++
++		kdbus_free(conn, offset);
++	}),
++	({ 0; }));
++	ASSERT_RETURN(ret == 0);
++
++continue_test:
++
++	/* A second name */
++	ret = kdbus_name_acquire(conn, "com.example.b", NULL);
++	ASSERT_RETURN(ret >= 0);
++
++	ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
++	ASSERT_RETURN(ret == 0);
++
++	info = (struct kdbus_info *)(conn->buf + offset);
++	ASSERT_RETURN(info->id == conn->id);
++
++	cnt = kdbus_count_item(info, KDBUS_ITEM_OWNED_NAME);
++	if (valid_flags & KDBUS_ATTACH_NAMES) {
++		ASSERT_RETURN(cnt == 2);
++	} else {
++		ASSERT_RETURN(cnt == 0);
++	}
++
++	kdbus_free(conn, offset);
++
++	ASSERT_RETURN(ret == 0);
++
++	return 0;
++}
++
++int kdbus_test_conn_info(struct kdbus_test_env *env)
++{
++	int ret;
++	int have_caps;
++	struct {
++		struct kdbus_cmd_info cmd_info;
++
++		struct {
++			uint64_t size;
++			uint64_t type;
++			char str[64];
++		} name;
++	} buf;
++
++	buf.cmd_info.size = sizeof(struct kdbus_cmd_info);
++	buf.cmd_info.flags = 0;
++	buf.cmd_info.attach_flags = 0;
++	buf.cmd_info.id = env->conn->id;
++
++	ret = kdbus_conn_info(env->conn, env->conn->id, NULL, 0, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	/* try to pass a name that is longer than the buffer's size */
++	buf.name.size = KDBUS_ITEM_HEADER_SIZE + 1;
++	buf.name.type = KDBUS_ITEM_NAME;
++	strcpy(buf.name.str, "foo.bar.bla");
++
++	buf.cmd_info.id = 0;
++	buf.cmd_info.size = sizeof(buf.cmd_info) + buf.name.size;
++	ret = kdbus_cmd_conn_info(env->conn->fd, (struct kdbus_cmd_info *) &buf);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	/* Pass a non-existent name */
++	ret = kdbus_conn_info(env->conn, 0, "non.existent.name", 0, NULL);
++	ASSERT_RETURN(ret == -ESRCH);
++
++	if (!all_uids_gids_are_mapped())
++		return TEST_SKIP;
++
++	/* Test for caps here, so we run the previous test */
++	have_caps = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
++	ASSERT_RETURN(have_caps >= 0);
++
++	ret = kdbus_fuzz_conn_info(env, have_caps);
++	ASSERT_RETURN(ret == 0);
++
++	/* Now if we have skipped some tests then let the user know */
++	if (!have_caps)
++		return TEST_SKIP;
++
++	return TEST_OK;
++}
++
++int kdbus_test_conn_update(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *conn;
++	struct kdbus_msg *msg;
++	int found = 0;
++	int ret;
++
++	/*
++	 * kdbus_hello() sets all attach flags. Receive a message by this
++	 * connection, and make sure a timestamp item (just to pick one) is
++	 * present.
++	 */
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn);
++
++	ret = kdbus_msg_send(env->conn, NULL, 0x12345678, 0, 0, 0, conn->id);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(conn, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	found = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
++	ASSERT_RETURN(found == 1);
++
++	kdbus_msg_free(msg);
++
++	/*
++	 * Now, modify the attach flags and repeat the action. The item must
++	 * now be missing.
++	 */
++	found = 0;
++
++	ret = kdbus_conn_update_attach_flags(conn,
++					     _KDBUS_ATTACH_ALL,
++					     _KDBUS_ATTACH_ALL &
++					     ~KDBUS_ATTACH_TIMESTAMP);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_send(env->conn, NULL, 0x12345678, 0, 0, 0, conn->id);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(conn, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	found = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
++	ASSERT_RETURN(found == 0);
++
++	/* Provide a bogus attach_flags value */
++	ret = kdbus_conn_update_attach_flags(conn,
++					     _KDBUS_ATTACH_ALL + 1,
++					     _KDBUS_ATTACH_ALL);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	kdbus_msg_free(msg);
++
++	kdbus_conn_free(conn);
++
++	return TEST_OK;
++}
++
++int kdbus_test_writable_pool(struct kdbus_test_env *env)
++{
++	struct kdbus_cmd_free cmd_free = {};
++	struct kdbus_cmd_hello hello;
++	int fd, ret;
++	void *map;
++
++	fd = open(env->buspath, O_RDWR | O_CLOEXEC);
++	ASSERT_RETURN(fd >= 0);
++
++	memset(&hello, 0, sizeof(hello));
++	hello.flags = KDBUS_HELLO_ACCEPT_FD;
++	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++	hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
++	hello.size = sizeof(struct kdbus_cmd_hello);
++	hello.pool_size = POOL_SIZE;
++	hello.offset = (__u64)-1;
++
++	/* success test */
++	ret = kdbus_cmd_hello(fd, &hello);
++	ASSERT_RETURN(ret == 0);
++
++	/* The kernel should have returned some items */
++	ASSERT_RETURN(hello.offset != (__u64)-1);
++	cmd_free.size = sizeof(cmd_free);
++	cmd_free.offset = hello.offset;
++	ret = kdbus_cmd_free(fd, &cmd_free);
++	ASSERT_RETURN(ret >= 0);
++
++	/* pools cannot be mapped writable */
++	map = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
++	ASSERT_RETURN(map == MAP_FAILED);
++
++	/* pools can always be mapped readable */
++	map = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, fd, 0);
++	ASSERT_RETURN(map != MAP_FAILED);
++
++	/* make sure we cannot change protection masks to writable */
++	ret = mprotect(map, POOL_SIZE, PROT_READ | PROT_WRITE);
++	ASSERT_RETURN(ret < 0);
++
++	munmap(map, POOL_SIZE);
++	close(fd);
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-daemon.c b/tools/testing/selftests/kdbus/test-daemon.c
+new file mode 100644
+index 0000000..8bc2386
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-daemon.c
+@@ -0,0 +1,65 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <poll.h>
++#include <stdbool.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++int kdbus_test_daemon(struct kdbus_test_env *env)
++{
++	struct pollfd fds[2];
++	int count;
++	int ret;
++
++	/* This test doesn't make any sense in non-interactive mode */
++	if (!kdbus_util_verbose)
++		return TEST_OK;
++
++	printf("Created connection %llu on bus '%s'\n",
++		(unsigned long long) env->conn->id, env->buspath);
++
++	ret = kdbus_name_acquire(env->conn, "com.example.kdbus-test", NULL);
++	ASSERT_RETURN(ret == 0);
++	printf("  Acquired name: com.example.kdbus-test\n");
++
++	fds[0].fd = env->conn->fd;
++	fds[1].fd = STDIN_FILENO;
++
++	printf("Monitoring connections:\n");
++
++	for (count = 0;; count++) {
++		int i, nfds = sizeof(fds) / sizeof(fds[0]);
++
++		for (i = 0; i < nfds; i++) {
++			fds[i].events = POLLIN | POLLPRI | POLLHUP;
++			fds[i].revents = 0;
++		}
++
++		ret = poll(fds, nfds, -1);
++		if (ret <= 0)
++			break;
++
++		if (fds[0].revents & POLLIN) {
++			ret = kdbus_msg_recv(env->conn, NULL, NULL);
++			ASSERT_RETURN(ret == 0);
++		}
++
++		/* stdin */
++		if (fds[1].revents & POLLIN)
++			break;
++	}
++
++	printf("Closing bus connection\n");
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-endpoint.c b/tools/testing/selftests/kdbus/test-endpoint.c
+new file mode 100644
+index 0000000..34a7be4
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-endpoint.c
+@@ -0,0 +1,352 @@
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <libgen.h>
++#include <sys/capability.h>
++#include <sys/wait.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++#define KDBUS_SYSNAME_MAX_LEN			63
++
++static int install_name_add_match(struct kdbus_conn *conn, const char *name)
++{
++	struct {
++		struct kdbus_cmd_match cmd;
++		struct {
++			uint64_t size;
++			uint64_t type;
++			struct kdbus_notify_name_change chg;
++		} item;
++		char name[64];
++	} buf;
++	int ret;
++
++	/* install the match rule */
++	memset(&buf, 0, sizeof(buf));
++	buf.item.type = KDBUS_ITEM_NAME_ADD;
++	buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
++	buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
++	strncpy(buf.name, name, sizeof(buf.name) - 1);
++	buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
++	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
++
++	ret = kdbus_cmd_match_add(conn->fd, &buf.cmd);
++	if (ret < 0)
++		return ret;
++
++	return 0;
++}
++
++static int create_endpoint(const char *buspath, uid_t uid, const char *name,
++			   uint64_t flags)
++{
++	struct {
++		struct kdbus_cmd cmd;
++
++		/* name item */
++		struct {
++			uint64_t size;
++			uint64_t type;
++			/* max should be KDBUS_SYSNAME_MAX_LEN */
++			char str[128];
++		} name;
++	} ep_make;
++	int fd, ret;
++
++	fd = open(buspath, O_RDWR);
++	if (fd < 0)
++		return fd;
++
++	memset(&ep_make, 0, sizeof(ep_make));
++
++	snprintf(ep_make.name.str,
++		 /* Use the KDBUS_SYSNAME_MAX_LEN or sizeof(str) */
++		 KDBUS_SYSNAME_MAX_LEN > strlen(name) ?
++		 KDBUS_SYSNAME_MAX_LEN : sizeof(ep_make.name.str),
++		 "%u-%s", uid, name);
++
++	ep_make.name.type = KDBUS_ITEM_MAKE_NAME;
++	ep_make.name.size = KDBUS_ITEM_HEADER_SIZE +
++			    strlen(ep_make.name.str) + 1;
++
++	ep_make.cmd.flags = flags;
++	ep_make.cmd.size = sizeof(ep_make.cmd) + ep_make.name.size;
++
++	ret = kdbus_cmd_endpoint_make(fd, &ep_make.cmd);
++	if (ret < 0) {
++		kdbus_printf("error creating endpoint: %d (%m)\n", ret);
++		return ret;
++	}
++
++	return fd;
++}
++
++static int unpriv_test_custom_ep(const char *buspath)
++{
++	int ret, ep_fd1, ep_fd2;
++	char *ep1, *ep2, *tmp1, *tmp2;
++
++	tmp1 = strdup(buspath);
++	tmp2 = strdup(buspath);
++	ASSERT_RETURN(tmp1 && tmp2);
++
++	ret = asprintf(&ep1, "%s/%u-%s", dirname(tmp1), getuid(), "apps1");
++	ASSERT_RETURN(ret >= 0);
++
++	ret = asprintf(&ep2, "%s/%u-%s", dirname(tmp2), getuid(), "apps2");
++	ASSERT_RETURN(ret >= 0);
++
++	free(tmp1);
++	free(tmp2);
++
++	/* endpoint only accessible to current uid */
++	ep_fd1 = create_endpoint(buspath, getuid(), "apps1", 0);
++	ASSERT_RETURN(ep_fd1 >= 0);
++
++	/* endpoint world accessible */
++	ep_fd2 = create_endpoint(buspath, getuid(), "apps2",
++				  KDBUS_MAKE_ACCESS_WORLD);
++	ASSERT_RETURN(ep_fd2 >= 0);
++
++	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_UID, ({
++		int ep_fd;
++		struct kdbus_conn *ep_conn;
++
++		/*
++		 * Make sure that we are not able to create custom
++		 * endpoints
++		 */
++		ep_fd = create_endpoint(buspath, getuid(),
++					"unpriv_custom_ep", 0);
++		ASSERT_EXIT(ep_fd == -EPERM);
++
++		/*
++		 * Endpoint "apps1" is only accessible to the user
++		 * that owns the endpoint. Access is denied by the VFS.
++		 */
++		ep_conn = kdbus_hello(ep1, 0, NULL, 0);
++		ASSERT_EXIT(!ep_conn && errno == EACCES);
++
++		/* Endpoint "apps2" world accessible */
++		ep_conn = kdbus_hello(ep2, 0, NULL, 0);
++		ASSERT_EXIT(ep_conn);
++
++		kdbus_conn_free(ep_conn);
++
++		_exit(EXIT_SUCCESS);
++	}),
++	({ 0; }));
++	ASSERT_RETURN(ret == 0);
++
++	close(ep_fd1);
++	close(ep_fd2);
++	free(ep1);
++	free(ep2);
++
++	return 0;
++}
++
++static int update_endpoint(int fd, const char *name)
++{
++	int len = strlen(name) + 1;
++	struct {
++		struct kdbus_cmd cmd;
++
++		/* name item */
++		struct {
++			uint64_t size;
++			uint64_t type;
++			char str[KDBUS_ALIGN8(len)];
++		} name;
++
++		struct {
++			uint64_t size;
++			uint64_t type;
++			struct kdbus_policy_access access;
++		} access;
++	} ep_update;
++	int ret;
++
++	memset(&ep_update, 0, sizeof(ep_update));
++
++	ep_update.name.size = KDBUS_ITEM_HEADER_SIZE + len;
++	ep_update.name.type = KDBUS_ITEM_NAME;
++	strncpy(ep_update.name.str, name, sizeof(ep_update.name.str) - 1);
++
++	ep_update.access.size = sizeof(ep_update.access);
++	ep_update.access.type = KDBUS_ITEM_POLICY_ACCESS;
++	ep_update.access.access.type = KDBUS_POLICY_ACCESS_WORLD;
++	ep_update.access.access.access = KDBUS_POLICY_SEE;
++
++	ep_update.cmd.size = sizeof(ep_update);
++
++	ret = kdbus_cmd_endpoint_update(fd, &ep_update.cmd);
++	if (ret < 0) {
++		kdbus_printf("error updating endpoint: %d (%m)\n", ret);
++		return ret;
++	}
++
++	return 0;
++}
++
++int kdbus_test_custom_endpoint(struct kdbus_test_env *env)
++{
++	char *ep, *tmp;
++	int ret, ep_fd;
++	struct kdbus_msg *msg;
++	struct kdbus_conn *ep_conn;
++	struct kdbus_conn *reader;
++	const char *name = "foo.bar.baz";
++	const char *epname = "foo";
++	char fake_ep[KDBUS_SYSNAME_MAX_LEN + 1] = {'\0'};
++
++	memset(fake_ep, 'X', sizeof(fake_ep) - 1);
++
++	/* Try to create a custom endpoint with a long name */
++	ret = create_endpoint(env->buspath, getuid(), fake_ep, 0);
++	ASSERT_RETURN(ret == -ENAMETOOLONG);
++
++	/* Try to create a custom endpoint with a different uid */
++	ret = create_endpoint(env->buspath, getuid() + 1, "foobar", 0);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	/* create a custom endpoint, and open a connection on it */
++	ep_fd = create_endpoint(env->buspath, getuid(), "foo", 0);
++	ASSERT_RETURN(ep_fd >= 0);
++
++	tmp = strdup(env->buspath);
++	ASSERT_RETURN(tmp);
++
++	ret = asprintf(&ep, "%s/%u-%s", dirname(tmp), getuid(), epname);
++	free(tmp);
++	ASSERT_RETURN(ret >= 0);
++
++	/* Register a connection that listens to broadcasts */
++	reader = kdbus_hello(ep, 0, NULL, 0);
++	ASSERT_RETURN(reader);
++
++	/* Register for kernel signals */
++	ret = kdbus_add_match_id(reader, 0x1, KDBUS_ITEM_ID_ADD,
++				 KDBUS_MATCH_ID_ANY);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_add_match_id(reader, 0x2, KDBUS_ITEM_ID_REMOVE,
++				 KDBUS_MATCH_ID_ANY);
++	ASSERT_RETURN(ret == 0);
++
++	ret = install_name_add_match(reader, name);
++	ASSERT_RETURN(ret == 0);
++
++	/* Monitor connections are not supported on custom endpoints */
++	ep_conn = kdbus_hello(ep, KDBUS_HELLO_MONITOR, NULL, 0);
++	ASSERT_RETURN(!ep_conn && errno == EOPNOTSUPP);
++
++	ep_conn = kdbus_hello(ep, 0, NULL, 0);
++	ASSERT_RETURN(ep_conn);
++
++	/* Check that the reader got the IdAdd notification */
++	ret = kdbus_msg_recv(reader, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_ADD);
++	ASSERT_RETURN(msg->items[0].id_change.id == ep_conn->id);
++	kdbus_msg_free(msg);
++
++	/*
++	 * Add a name add match on the endpoint connection, acquire name from
++	 * the unfiltered connection, and make sure the filtered connection
++	 * did not get the notification on the name owner change. Also, the
++	 * endpoint connection must not be able to call conn_info on either
++	 * the name or the ID.
++	 */
++	ret = install_name_add_match(ep_conn, name);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_name_acquire(env->conn, name, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(ep_conn, NULL, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	ret = kdbus_conn_info(ep_conn, 0, name, 0, NULL);
++	ASSERT_RETURN(ret == -ESRCH);
++
++	ret = kdbus_conn_info(ep_conn, 0, "random.crappy.name", 0, NULL);
++	ASSERT_RETURN(ret == -ESRCH);
++
++	ret = kdbus_conn_info(ep_conn, env->conn->id, NULL, 0, NULL);
++	ASSERT_RETURN(ret == -ENXIO);
++
++	ret = kdbus_conn_info(ep_conn, 0x0fffffffffffffffULL, NULL, 0, NULL);
++	ASSERT_RETURN(ret == -ENXIO);
++
++	/* Check that the reader did not receive the name notification */
++	ret = kdbus_msg_recv(reader, NULL, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	/*
++	 * Release the name again, update the custom endpoint policy,
++	 * and try again. This time, the connection on the custom endpoint
++	 * should have gotten it.
++	 */
++	ret = kdbus_name_release(env->conn, name);
++	ASSERT_RETURN(ret == 0);
++
++	/* Check that the reader did not receive the name notification */
++	ret = kdbus_msg_recv(reader, NULL, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	ret = update_endpoint(ep_fd, name);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_name_acquire(env->conn, name, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(ep_conn, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_ADD);
++	ASSERT_RETURN(msg->items[0].name_change.old_id.id == 0);
++	ASSERT_RETURN(msg->items[0].name_change.new_id.id == env->conn->id);
++	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
++	kdbus_msg_free(msg);
++
++	ret = kdbus_msg_recv(reader, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
++
++	kdbus_msg_free(msg);
++
++	ret = kdbus_conn_info(ep_conn, 0, name, 0, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_conn_info(ep_conn, env->conn->id, NULL, 0, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	/* If we have privileges, test custom endpoints */
++	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
++	ASSERT_RETURN(ret >= 0);
++
++	/*
++	 * All uids/gids are mapped and we have the necessary caps
++	 */
++	if (ret && all_uids_gids_are_mapped()) {
++		ret = unpriv_test_custom_ep(env->buspath);
++		ASSERT_RETURN(ret == 0);
++	}
++
++	kdbus_conn_free(reader);
++	kdbus_conn_free(ep_conn);
++	close(ep_fd);
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-fd.c b/tools/testing/selftests/kdbus/test-fd.c
+new file mode 100644
+index 0000000..2ae0f5a
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-fd.c
+@@ -0,0 +1,789 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stdbool.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <sys/types.h>
++#include <sys/mman.h>
++#include <sys/socket.h>
++#include <sys/wait.h>
++
++#include "kdbus-api.h"
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++#define KDBUS_MSG_MAX_ITEMS     128
++#define KDBUS_USER_MAX_CONN	256
++
++/* maximum number of inflight fds in a target queue per user */
++#define KDBUS_CONN_MAX_FDS_PER_USER	16
++
++/* maximum number of memfd items per message */
++#define KDBUS_MSG_MAX_MEMFD_ITEMS       16
++
++static int make_msg_payload_dbus(uint64_t src_id, uint64_t dst_id,
++				 uint64_t msg_size,
++				 struct kdbus_msg **msg_dbus)
++{
++	struct kdbus_msg *msg;
++
++	msg = malloc(msg_size);
++	ASSERT_RETURN_VAL(msg, -ENOMEM);
++
++	memset(msg, 0, msg_size);
++	msg->size = msg_size;
++	msg->src_id = src_id;
++	msg->dst_id = dst_id;
++	msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++	*msg_dbus = msg;
++
++	return 0;
++}
++
++static void make_item_memfds(struct kdbus_item *item,
++			     int *memfds, size_t memfd_size)
++{
++	size_t i;
++
++	for (i = 0; i < memfd_size; i++) {
++		item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
++		item->size = KDBUS_ITEM_HEADER_SIZE +
++			     sizeof(struct kdbus_memfd);
++		item->memfd.fd = memfds[i];
++		item->memfd.size = sizeof(uint64_t); /* const size */
++		item = KDBUS_ITEM_NEXT(item);
++	}
++}
++
++static void make_item_fds(struct kdbus_item *item,
++			  int *fd_array, size_t fd_size)
++{
++	size_t i;
++	item->type = KDBUS_ITEM_FDS;
++	item->size = KDBUS_ITEM_HEADER_SIZE + (sizeof(int) * fd_size);
++
++	for (i = 0; i < fd_size; i++)
++		item->fds[i] = fd_array[i];
++}
++
++static int memfd_write(const char *name, void *buf, size_t bufsize)
++{
++	ssize_t ret;
++	int memfd;
++
++	memfd = sys_memfd_create(name, 0);
++	ASSERT_RETURN_VAL(memfd >= 0, memfd);
++
++	ret = write(memfd, buf, bufsize);
++	ASSERT_RETURN_VAL(ret == (ssize_t)bufsize, -EAGAIN);
++
++	ret = sys_memfd_seal_set(memfd);
++	ASSERT_RETURN_VAL(ret == 0, -errno);
++
++	return memfd;
++}
++
++static int send_memfds(struct kdbus_conn *conn, uint64_t dst_id,
++		       int *memfds_array, size_t memfd_count)
++{
++	struct kdbus_cmd_send cmd = {};
++	struct kdbus_item *item;
++	struct kdbus_msg *msg;
++	uint64_t size;
++	int ret;
++
++	size = sizeof(struct kdbus_msg);
++	size += memfd_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
++
++	if (dst_id == KDBUS_DST_ID_BROADCAST)
++		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
++
++	ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
++	ASSERT_RETURN_VAL(ret == 0, ret);
++
++	item = msg->items;
++
++	if (dst_id == KDBUS_DST_ID_BROADCAST) {
++		item->type = KDBUS_ITEM_BLOOM_FILTER;
++		item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
++		item = KDBUS_ITEM_NEXT(item);
++
++		msg->flags |= KDBUS_MSG_SIGNAL;
++	}
++
++	make_item_memfds(item, memfds_array, memfd_count);
++
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)msg;
++
++	ret = kdbus_cmd_send(conn->fd, &cmd);
++	if (ret < 0) {
++		kdbus_printf("error sending message: %d (%m)\n", ret);
++		return ret;
++	}
++
++	free(msg);
++	return 0;
++}
++
++static int send_fds(struct kdbus_conn *conn, uint64_t dst_id,
++		    int *fd_array, size_t fd_count)
++{
++	struct kdbus_cmd_send cmd = {};
++	struct kdbus_item *item;
++	struct kdbus_msg *msg;
++	uint64_t size;
++	int ret;
++
++	size = sizeof(struct kdbus_msg);
++	size += KDBUS_ITEM_SIZE(sizeof(int) * fd_count);
++
++	if (dst_id == KDBUS_DST_ID_BROADCAST)
++		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
++
++	ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
++	ASSERT_RETURN_VAL(ret == 0, ret);
++
++	item = msg->items;
++
++	if (dst_id == KDBUS_DST_ID_BROADCAST) {
++		item->type = KDBUS_ITEM_BLOOM_FILTER;
++		item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
++		item = KDBUS_ITEM_NEXT(item);
++
++		msg->flags |= KDBUS_MSG_SIGNAL;
++	}
++
++	make_item_fds(item, fd_array, fd_count);
++
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)msg;
++
++	ret = kdbus_cmd_send(conn->fd, &cmd);
++	if (ret < 0) {
++		kdbus_printf("error sending message: %d (%m)\n", ret);
++		return ret;
++	}
++
++	free(msg);
++	return ret;
++}
++
++static int send_fds_memfds(struct kdbus_conn *conn, uint64_t dst_id,
++			   int *fds_array, size_t fd_count,
++			   int *memfds_array, size_t memfd_count)
++{
++	struct kdbus_cmd_send cmd = {};
++	struct kdbus_item *item;
++	struct kdbus_msg *msg;
++	uint64_t size;
++	int ret;
++
++	size = sizeof(struct kdbus_msg);
++	size += memfd_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
++	size += KDBUS_ITEM_SIZE(sizeof(int) * fd_count);
++
++	ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
++	ASSERT_RETURN_VAL(ret == 0, ret);
++
++	item = msg->items;
++
++	make_item_fds(item, fds_array, fd_count);
++	item = KDBUS_ITEM_NEXT(item);
++	make_item_memfds(item, memfds_array, memfd_count);
++
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)msg;
++
++	ret = kdbus_cmd_send(conn->fd, &cmd);
++	if (ret < 0) {
++		kdbus_printf("error sending message: %d (%m)\n", ret);
++		return ret;
++	}
++
++	free(msg);
++	return ret;
++}
++
++/* Return the number of received fds */
++static unsigned int kdbus_item_get_nfds(struct kdbus_msg *msg)
++{
++	unsigned int fds = 0;
++	const struct kdbus_item *item;
++
++	KDBUS_ITEM_FOREACH(item, msg, items) {
++		switch (item->type) {
++		case KDBUS_ITEM_FDS: {
++			fds += (item->size - KDBUS_ITEM_HEADER_SIZE) /
++				sizeof(int);
++			break;
++		}
++
++		case KDBUS_ITEM_PAYLOAD_MEMFD:
++			fds++;
++			break;
++
++		default:
++			break;
++		}
++	}
++
++	return fds;
++}
++
++static struct kdbus_msg *
++get_kdbus_msg_with_fd(struct kdbus_conn *conn_src,
++		      uint64_t dst_id, uint64_t cookie, int fd)
++{
++	int ret;
++	uint64_t size;
++	struct kdbus_item *item;
++	struct kdbus_msg *msg;
++
++	size = sizeof(struct kdbus_msg);
++	if (fd >= 0)
++		size += KDBUS_ITEM_SIZE(sizeof(int));
++
++	ret = make_msg_payload_dbus(conn_src->id, dst_id, size, &msg);
++	ASSERT_RETURN_VAL(ret == 0, NULL);
++
++	msg->cookie = cookie;
++
++	if (fd >= 0) {
++		item = msg->items;
++
++		make_item_fds(item, (int *)&fd, 1);
++	}
++
++	return msg;
++}
++
++static int kdbus_test_no_fds(struct kdbus_test_env *env,
++			     int *fds, int *memfd)
++{
++	pid_t pid;
++	int ret, status;
++	uint64_t cookie;
++	int connfd1, connfd2;
++	struct kdbus_msg *msg, *msg_sync_reply;
++	struct kdbus_cmd_hello hello;
++	struct kdbus_conn *conn_src, *conn_dst, *conn_dummy;
++	struct kdbus_cmd_send cmd = {};
++	struct kdbus_cmd_free cmd_free = {};
++
++	conn_src = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn_src);
++
++	connfd1 = open(env->buspath, O_RDWR|O_CLOEXEC);
++	ASSERT_RETURN(connfd1 >= 0);
++
++	connfd2 = open(env->buspath, O_RDWR|O_CLOEXEC);
++	ASSERT_RETURN(connfd2 >= 0);
++
++	/*
++	 * Create connections without KDBUS_HELLO_ACCEPT_FD
++	 * to test if send fd operations are blocked
++	 */
++	conn_dst = malloc(sizeof(*conn_dst));
++	ASSERT_RETURN(conn_dst);
++
++	conn_dummy = malloc(sizeof(*conn_dummy));
++	ASSERT_RETURN(conn_dummy);
++
++	memset(&hello, 0, sizeof(hello));
++	hello.size = sizeof(struct kdbus_cmd_hello);
++	hello.pool_size = POOL_SIZE;
++	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++
++	ret = kdbus_cmd_hello(connfd1, &hello);
++	ASSERT_RETURN(ret == 0);
++
++	cmd_free.size = sizeof(cmd_free);
++	cmd_free.offset = hello.offset;
++	ret = kdbus_cmd_free(connfd1, &cmd_free);
++	ASSERT_RETURN(ret >= 0);
++
++	conn_dst->fd = connfd1;
++	conn_dst->id = hello.id;
++
++	memset(&hello, 0, sizeof(hello));
++	hello.size = sizeof(struct kdbus_cmd_hello);
++	hello.pool_size = POOL_SIZE;
++	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++
++	ret = kdbus_cmd_hello(connfd2, &hello);
++	ASSERT_RETURN(ret == 0);
++
++	cmd_free.size = sizeof(cmd_free);
++	cmd_free.offset = hello.offset;
++	ret = kdbus_cmd_free(connfd2, &cmd_free);
++	ASSERT_RETURN(ret >= 0);
++
++	conn_dummy->fd = connfd2;
++	conn_dummy->id = hello.id;
++
++	conn_dst->buf = mmap(NULL, POOL_SIZE, PROT_READ,
++			     MAP_SHARED, connfd1, 0);
++	ASSERT_RETURN(conn_dst->buf != MAP_FAILED);
++
++	conn_dummy->buf = mmap(NULL, POOL_SIZE, PROT_READ,
++			       MAP_SHARED, connfd2, 0);
++	ASSERT_RETURN(conn_dummy->buf != MAP_FAILED);
++
++	/*
++	 * Send fds to a connection that does not accept fd passing
++	 */
++	ret = send_fds(conn_src, conn_dst->id, fds, 1);
++	ASSERT_RETURN(ret == -ECOMM);
++
++	/*
++	 * memfds are kdbus payload, so they are still accepted
++	 */
++	ret = send_memfds(conn_src, conn_dst->id, memfd, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv_poll(conn_dst, 100, NULL, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	cookie = time(NULL);
++
++	pid = fork();
++	ASSERT_RETURN_VAL(pid >= 0, pid);
++
++	if (pid == 0) {
++		struct timespec now;
++
++		/*
++		 * A sync send/reply to a connection that does not
++		 * accept fds should fail if it contains an fd
++		 */
++		msg_sync_reply = get_kdbus_msg_with_fd(conn_dst,
++						       conn_dummy->id,
++						       cookie, fds[0]);
++		ASSERT_EXIT(msg_sync_reply);
++
++		ret = clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
++		ASSERT_EXIT(ret == 0);
++
++		msg_sync_reply->timeout_ns = now.tv_sec * 1000000000ULL +
++					     now.tv_nsec + 100000000ULL;
++		msg_sync_reply->flags = KDBUS_MSG_EXPECT_REPLY;
++
++		memset(&cmd, 0, sizeof(cmd));
++		cmd.size = sizeof(cmd);
++		cmd.msg_address = (uintptr_t)msg_sync_reply;
++		cmd.flags = KDBUS_SEND_SYNC_REPLY;
++
++		ret = kdbus_cmd_send(conn_dst->fd, &cmd);
++		ASSERT_EXIT(ret == -ECOMM);
++
++		/*
++		 * Now send a normal message, but the sync reply
++		 * will fail since it contains an fd that the
++		 * original sender does not want.
++		 *
++		 * The original sender will fail with -ETIMEDOUT
++		 */
++		cookie++;
++		ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
++					  KDBUS_MSG_EXPECT_REPLY,
++					  5000000000ULL, 0, conn_src->id, -1);
++		ASSERT_EXIT(ret == -EREMOTEIO);
++
++		cookie++;
++		ret = kdbus_msg_recv_poll(conn_dst, 100, &msg, NULL);
++		ASSERT_EXIT(ret == 0);
++		ASSERT_EXIT(msg->cookie == cookie);
++
++		free(msg_sync_reply);
++		kdbus_msg_free(msg);
++
++		_exit(EXIT_SUCCESS);
++	}
++
++	ret = kdbus_msg_recv_poll(conn_dummy, 100, NULL, NULL);
++	ASSERT_RETURN(ret == -ETIMEDOUT);
++
++	cookie++;
++	ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
++	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
++
++	kdbus_msg_free(msg);
++
++	/*
++	 * Try to reply with a kdbus connection handle; this should
++	 * fail with -EOPNOTSUPP
++	 */
++	msg_sync_reply = get_kdbus_msg_with_fd(conn_src,
++					       conn_dst->id,
++					       cookie, conn_dst->fd);
++	ASSERT_RETURN(msg_sync_reply);
++
++	msg_sync_reply->cookie_reply = cookie;
++
++	memset(&cmd, 0, sizeof(cmd));
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)msg_sync_reply;
++
++	ret = kdbus_cmd_send(conn_src->fd, &cmd);
++	ASSERT_RETURN(ret == -EOPNOTSUPP);
++
++	free(msg_sync_reply);
++
++	/*
++	 * Try to reply with a normal fd; this should fail even
++	 * if the response is a sync reply
++	 *
++	 * From the sender's view, we fail with -ECOMM
++	 */
++	msg_sync_reply = get_kdbus_msg_with_fd(conn_src,
++					       conn_dst->id,
++					       cookie, fds[0]);
++	ASSERT_RETURN(msg_sync_reply);
++
++	msg_sync_reply->cookie_reply = cookie;
++
++	memset(&cmd, 0, sizeof(cmd));
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)msg_sync_reply;
++
++	ret = kdbus_cmd_send(conn_src->fd, &cmd);
++	ASSERT_RETURN(ret == -ECOMM);
++
++	free(msg_sync_reply);
++
++	/*
++	 * Resend another normal message and check if the queue
++	 * is clear
++	 */
++	cookie++;
++	ret = kdbus_msg_send(conn_src, NULL, cookie, 0, 0, 0,
++			     conn_dst->id);
++	ASSERT_RETURN(ret == 0);
++
++	ret = waitpid(pid, &status, 0);
++	ASSERT_RETURN_VAL(ret >= 0, ret);
++
++	kdbus_conn_free(conn_dummy);
++	kdbus_conn_free(conn_dst);
++	kdbus_conn_free(conn_src);
++
++	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++static int kdbus_send_multiple_fds(struct kdbus_conn *conn_src,
++				   struct kdbus_conn *conn_dst)
++{
++	int ret, i;
++	unsigned int nfds;
++	int fds[KDBUS_CONN_MAX_FDS_PER_USER + 1];
++	int memfds[KDBUS_MSG_MAX_ITEMS + 1];
++	struct kdbus_msg *msg;
++	uint64_t dummy_value;
++
++	dummy_value = time(NULL);
++
++	for (i = 0; i < KDBUS_CONN_MAX_FDS_PER_USER + 1; i++) {
++		fds[i] = open("/dev/null", O_RDWR|O_CLOEXEC);
++		ASSERT_RETURN_VAL(fds[i] >= 0, -errno);
++	}
++
++	/* Send KDBUS_CONN_MAX_FDS_PER_USER with one more fd */
++	ret = send_fds(conn_src, conn_dst->id, fds,
++		       KDBUS_CONN_MAX_FDS_PER_USER + 1);
++	ASSERT_RETURN(ret == -EMFILE);
++
++	/* Retry with the correct KDBUS_CONN_MAX_FDS_PER_USER */
++	ret = send_fds(conn_src, conn_dst->id, fds,
++		       KDBUS_CONN_MAX_FDS_PER_USER);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	/* Check we got the right number of fds */
++	nfds = kdbus_item_get_nfds(msg);
++	ASSERT_RETURN(nfds == KDBUS_CONN_MAX_FDS_PER_USER);
++
++	kdbus_msg_free(msg);
++
++	for (i = 0; i < KDBUS_MSG_MAX_ITEMS + 1; i++, dummy_value++) {
++		memfds[i] = memfd_write("memfd-name",
++					&dummy_value,
++					sizeof(dummy_value));
++		ASSERT_RETURN_VAL(memfds[i] >= 0, memfds[i]);
++	}
++
++	/* Send KDBUS_MSG_MAX_ITEMS with one more memfd */
++	ret = send_memfds(conn_src, conn_dst->id,
++			  memfds, KDBUS_MSG_MAX_ITEMS + 1);
++	ASSERT_RETURN(ret == -E2BIG);
++
++	ret = send_memfds(conn_src, conn_dst->id,
++			  memfds, KDBUS_MSG_MAX_MEMFD_ITEMS + 1);
++	ASSERT_RETURN(ret == -E2BIG);
++
++	/* Retry with the correct KDBUS_MSG_MAX_MEMFD_ITEMS */
++	ret = send_memfds(conn_src, conn_dst->id,
++			  memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	/* Check we got the right number of fds */
++	nfds = kdbus_item_get_nfds(msg);
++	ASSERT_RETURN(nfds == KDBUS_MSG_MAX_MEMFD_ITEMS);
++
++	kdbus_msg_free(msg);
++
++
++	/*
++	 * Combine multiple KDBUS_CONN_MAX_FDS_PER_USER+1 fds and
++	 * 10 memfds
++	 */
++	ret = send_fds_memfds(conn_src, conn_dst->id,
++			      fds, KDBUS_CONN_MAX_FDS_PER_USER + 1,
++			      memfds, 10);
++	ASSERT_RETURN(ret == -EMFILE);
++
++	ret = kdbus_msg_recv(conn_dst, NULL, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	/*
++	 * Combine multiple KDBUS_CONN_MAX_FDS_PER_USER fds and
++	 * (128 - 1) + 1 memfds, all fds take one item, while each
++	 * memfd takes one item
++	 */
++	ret = send_fds_memfds(conn_src, conn_dst->id,
++			      fds, KDBUS_CONN_MAX_FDS_PER_USER,
++			      memfds, (KDBUS_MSG_MAX_ITEMS - 1) + 1);
++	ASSERT_RETURN(ret == -E2BIG);
++
++	ret = send_fds_memfds(conn_src, conn_dst->id,
++			      fds, KDBUS_CONN_MAX_FDS_PER_USER,
++			      memfds, KDBUS_MSG_MAX_MEMFD_ITEMS + 1);
++	ASSERT_RETURN(ret == -E2BIG);
++
++	ret = kdbus_msg_recv(conn_dst, NULL, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	/*
++	 * Send KDBUS_CONN_MAX_FDS_PER_USER fds +
++	 * KDBUS_MSG_MAX_MEMFD_ITEMS memfds
++	 */
++	ret = send_fds_memfds(conn_src, conn_dst->id,
++			      fds, KDBUS_CONN_MAX_FDS_PER_USER,
++			      memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	/* Check we got the right number of fds */
++	nfds = kdbus_item_get_nfds(msg);
++	ASSERT_RETURN(nfds == KDBUS_CONN_MAX_FDS_PER_USER +
++			      KDBUS_MSG_MAX_MEMFD_ITEMS);
++
++	kdbus_msg_free(msg);
++
++
++	/*
++	 * Re-send fds + memfds, close them, but do not receive them
++	 * and try to queue more
++	 */
++	ret = send_fds_memfds(conn_src, conn_dst->id,
++			      fds, KDBUS_CONN_MAX_FDS_PER_USER,
++			      memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
++	ASSERT_RETURN(ret == 0);
++
++	/* close old references and get new ones */
++	for (i = 0; i < KDBUS_CONN_MAX_FDS_PER_USER + 1; i++) {
++		close(fds[i]);
++		fds[i] = open("/dev/null", O_RDWR|O_CLOEXEC);
++		ASSERT_RETURN_VAL(fds[i] >= 0, -errno);
++	}
++
++	/* should fail since we already have fds in the queue */
++	ret = send_fds(conn_src, conn_dst->id, fds,
++		       KDBUS_CONN_MAX_FDS_PER_USER);
++	ASSERT_RETURN(ret == -EMFILE);
++
++	/* This should succeed */
++	ret = send_memfds(conn_src, conn_dst->id,
++			  memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	nfds = kdbus_item_get_nfds(msg);
++	ASSERT_RETURN(nfds == KDBUS_CONN_MAX_FDS_PER_USER +
++			      KDBUS_MSG_MAX_MEMFD_ITEMS);
++
++	kdbus_msg_free(msg);
++
++	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	nfds = kdbus_item_get_nfds(msg);
++	ASSERT_RETURN(nfds == KDBUS_MSG_MAX_MEMFD_ITEMS);
++
++	kdbus_msg_free(msg);
++
++	ret = kdbus_msg_recv(conn_dst, NULL, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	for (i = 0; i < KDBUS_CONN_MAX_FDS_PER_USER + 1; i++)
++		close(fds[i]);
++
++	for (i = 0; i < KDBUS_MSG_MAX_ITEMS + 1; i++)
++		close(memfds[i]);
++
++	return 0;
++}
++
++int kdbus_test_fd_passing(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *conn_src, *conn_dst;
++	const char *str = "stackenblocken";
++	const struct kdbus_item *item;
++	struct kdbus_msg *msg;
++	unsigned int i;
++	uint64_t now;
++	int fds_conn[2];
++	int sock_pair[2];
++	int fds[2];
++	int memfd;
++	int ret;
++
++	now = (uint64_t) time(NULL);
++
++	/* create two connections */
++	conn_src = kdbus_hello(env->buspath, 0, NULL, 0);
++	conn_dst = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn_src && conn_dst);
++
++	fds_conn[0] = conn_src->fd;
++	fds_conn[1] = conn_dst->fd;
++
++	ret = socketpair(AF_UNIX, SOCK_STREAM, 0, sock_pair);
++	ASSERT_RETURN(ret == 0);
++
++	/* Setup memfd */
++	memfd = memfd_write("memfd-name", &now, sizeof(now));
++	ASSERT_RETURN(memfd >= 0);
++
++	/* Setup pipes */
++	ret = pipe(fds);
++	ASSERT_RETURN(ret == 0);
++
++	i = write(fds[1], str, strlen(str));
++	ASSERT_RETURN(i == strlen(str));
++
++	/*
++	 * Try to pass the handle of a connection as message payload.
++	 * This must fail.
++	 */
++	ret = send_fds(conn_src, conn_dst->id, fds_conn, 2);
++	ASSERT_RETURN(ret == -ENOTSUP);
++
++	ret = send_fds(conn_dst, conn_src->id, fds_conn, 2);
++	ASSERT_RETURN(ret == -ENOTSUP);
++
++	ret = send_fds(conn_src, conn_dst->id, sock_pair, 2);
++	ASSERT_RETURN(ret == -ENOTSUP);
++
++	/*
++	 * Send fds and memfds to a connection that does not accept fds
++	 */
++	ret = kdbus_test_no_fds(env, fds, (int *)&memfd);
++	ASSERT_RETURN(ret == 0);
++
++	/* Try to broadcast file descriptors. This must fail. */
++	ret = send_fds(conn_src, KDBUS_DST_ID_BROADCAST, fds, 1);
++	ASSERT_RETURN(ret == -ENOTUNIQ);
++
++	/* Try to broadcast memfd. This must succeed. */
++	ret = send_memfds(conn_src, KDBUS_DST_ID_BROADCAST, (int *)&memfd, 1);
++	ASSERT_RETURN(ret == 0);
++
++	/* Open code this loop */
++loop_send_fds:
++
++	/*
++	 * Send the read end of the pipe and close it.
++	 */
++	ret = send_fds(conn_src, conn_dst->id, fds, 1);
++	ASSERT_RETURN(ret == 0);
++	close(fds[0]);
++
++	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	KDBUS_ITEM_FOREACH(item, msg, items) {
++		if (item->type == KDBUS_ITEM_FDS) {
++			char tmp[14];
++			int nfds = (item->size - KDBUS_ITEM_HEADER_SIZE) /
++					sizeof(int);
++			ASSERT_RETURN(nfds == 1);
++
++			i = read(item->fds[0], tmp, sizeof(tmp));
++			if (i != 0) {
++				ASSERT_RETURN(i == sizeof(tmp));
++				ASSERT_RETURN(memcmp(tmp, str, sizeof(tmp)) == 0);
++
++				/* Write EOF */
++				close(fds[1]);
++
++				/*
++				 * Resend the read end of the pipe,
++				 * the receiver still holds a reference
++				 * to it...
++				 */
++				goto loop_send_fds;
++			}
++
++			/* Got EOF */
++
++			/*
++			 * Close the last reference to the read end
++			 * of the pipe, other references are
++			 * automatically closed just after send.
++			 */
++			close(item->fds[0]);
++		}
++	}
++
++	/*
++	 * Try to resend the read end of the pipe. Must fail with
++	 * -EBADF since both the sender and receiver closed their
++	 * references to it. We assume the above since sender and
++	 * receiver are in the same process.
++	 */
++	ret = send_fds(conn_src, conn_dst->id, fds, 1);
++	ASSERT_RETURN(ret == -EBADF);
++
++	/* Then we clear out any received data... */
++	kdbus_msg_free(msg);
++
++	ret = kdbus_send_multiple_fds(conn_src, conn_dst);
++	ASSERT_RETURN(ret == 0);
++
++	close(sock_pair[0]);
++	close(sock_pair[1]);
++	close(memfd);
++
++	kdbus_conn_free(conn_src);
++	kdbus_conn_free(conn_dst);
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-free.c b/tools/testing/selftests/kdbus/test-free.c
+new file mode 100644
+index 0000000..f666da3
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-free.c
+@@ -0,0 +1,64 @@
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++static int sample_ioctl_call(struct kdbus_test_env *env)
++{
++	int ret;
++	struct kdbus_cmd_list cmd_list = {
++		.flags = KDBUS_LIST_QUEUED,
++		.size = sizeof(cmd_list),
++	};
++
++	ret = kdbus_cmd_list(env->conn->fd, &cmd_list);
++	ASSERT_RETURN(ret == 0);
++
++	/* DON'T FREE THIS SLICE OF MEMORY! */
++
++	return TEST_OK;
++}
++
++int kdbus_test_free(struct kdbus_test_env *env)
++{
++	int ret;
++	struct kdbus_cmd_free cmd_free = {};
++
++	/* free an unallocated buffer */
++	cmd_free.size = sizeof(cmd_free);
++	cmd_free.flags = 0;
++	cmd_free.offset = 0;
++	ret = kdbus_cmd_free(env->conn->fd, &cmd_free);
++	ASSERT_RETURN(ret == -ENXIO);
++
++	/* free a buffer out of the pool's bounds */
++	cmd_free.size = sizeof(cmd_free);
++	cmd_free.offset = POOL_SIZE + 1;
++	ret = kdbus_cmd_free(env->conn->fd, &cmd_free);
++	ASSERT_RETURN(ret == -ENXIO);
++
++	/*
++	 * The user application is responsible for freeing the allocated
++	 * memory with the KDBUS_CMD_FREE ioctl, so let's test what happens
++	 * if we forget about it.
++	 */
++
++	ret = sample_ioctl_call(env);
++	ASSERT_RETURN(ret == 0);
++
++	ret = sample_ioctl_call(env);
++	ASSERT_RETURN(ret == 0);
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-match.c b/tools/testing/selftests/kdbus/test-match.c
+new file mode 100644
+index 0000000..2360dc1
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-match.c
+@@ -0,0 +1,441 @@
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++int kdbus_test_match_id_add(struct kdbus_test_env *env)
++{
++	struct {
++		struct kdbus_cmd_match cmd;
++		struct {
++			uint64_t size;
++			uint64_t type;
++			struct kdbus_notify_id_change chg;
++		} item;
++	} buf;
++	struct kdbus_conn *conn;
++	struct kdbus_msg *msg;
++	int ret;
++
++	memset(&buf, 0, sizeof(buf));
++
++	buf.cmd.size = sizeof(buf);
++	buf.cmd.cookie = 0xdeafbeefdeaddead;
++	buf.item.size = sizeof(buf.item);
++	buf.item.type = KDBUS_ITEM_ID_ADD;
++	buf.item.chg.id = KDBUS_MATCH_ID_ANY;
++
++	/* match on id add */
++	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++	ASSERT_RETURN(ret == 0);
++
++	/* create 2nd connection */
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn != NULL);
++
++	/* 1st connection should have received a notification */
++	ret = kdbus_msg_recv(env->conn, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_ADD);
++	ASSERT_RETURN(msg->items[0].id_change.id == conn->id);
++
++	kdbus_conn_free(conn);
++
++	return TEST_OK;
++}
++
++int kdbus_test_match_id_remove(struct kdbus_test_env *env)
++{
++	struct {
++		struct kdbus_cmd_match cmd;
++		struct {
++			uint64_t size;
++			uint64_t type;
++			struct kdbus_notify_id_change chg;
++		} item;
++	} buf;
++	struct kdbus_conn *conn;
++	struct kdbus_msg *msg;
++	size_t id;
++	int ret;
++
++	/* create 2nd connection */
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn != NULL);
++	id = conn->id;
++
++	memset(&buf, 0, sizeof(buf));
++	buf.cmd.size = sizeof(buf);
++	buf.cmd.cookie = 0xdeafbeefdeaddead;
++	buf.item.size = sizeof(buf.item);
++	buf.item.type = KDBUS_ITEM_ID_REMOVE;
++	buf.item.chg.id = id;
++
++	/* register match on 2nd connection */
++	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++	ASSERT_RETURN(ret == 0);
++
++	/* remove 2nd connection again */
++	kdbus_conn_free(conn);
++
++	/* 1st connection should have received a notification */
++	ret = kdbus_msg_recv(env->conn, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_REMOVE);
++	ASSERT_RETURN(msg->items[0].id_change.id == id);
++
++	return TEST_OK;
++}
++
++int kdbus_test_match_replace(struct kdbus_test_env *env)
++{
++	struct {
++		struct kdbus_cmd_match cmd;
++		struct {
++			uint64_t size;
++			uint64_t type;
++			struct kdbus_notify_id_change chg;
++		} item;
++	} buf;
++	struct kdbus_conn *conn;
++	struct kdbus_msg *msg;
++	size_t id;
++	int ret;
++
++	/* add a match to id_add */
++	ASSERT_RETURN(kdbus_test_match_id_add(env) == TEST_OK);
++
++	/* do a replace of the match from id_add to id_remove */
++	memset(&buf, 0, sizeof(buf));
++
++	buf.cmd.size = sizeof(buf);
++	buf.cmd.cookie = 0xdeafbeefdeaddead;
++	buf.cmd.flags = KDBUS_MATCH_REPLACE;
++	buf.item.size = sizeof(buf.item);
++	buf.item.type = KDBUS_ITEM_ID_REMOVE;
++	buf.item.chg.id = KDBUS_MATCH_ID_ANY;
++
++	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++
++	/* create 2nd connection */
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn != NULL);
++	id = conn->id;
++
++	/* 1st connection should _not_ have received a notification */
++	ret = kdbus_msg_recv(env->conn, &msg, NULL);
++	ASSERT_RETURN(ret != 0);
++
++	/* remove 2nd connection */
++	kdbus_conn_free(conn);
++
++	/* 1st connection should _now_ have received a notification */
++	ret = kdbus_msg_recv(env->conn, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_REMOVE);
++	ASSERT_RETURN(msg->items[0].id_change.id == id);
++
++	return TEST_OK;
++}
++
++int kdbus_test_match_name_add(struct kdbus_test_env *env)
++{
++	struct {
++		struct kdbus_cmd_match cmd;
++		struct {
++			uint64_t size;
++			uint64_t type;
++			struct kdbus_notify_name_change chg;
++		} item;
++		char name[64];
++	} buf;
++	struct kdbus_msg *msg;
++	char *name;
++	int ret;
++
++	name = "foo.bla.blaz";
++
++	/* install the match rule */
++	memset(&buf, 0, sizeof(buf));
++	buf.item.type = KDBUS_ITEM_NAME_ADD;
++	buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
++	buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
++	strncpy(buf.name, name, sizeof(buf.name) - 1);
++	buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
++	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
++
++	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++	ASSERT_RETURN(ret == 0);
++
++	/* acquire the name */
++	ret = kdbus_name_acquire(env->conn, name, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	/* we should have received a notification */
++	ret = kdbus_msg_recv(env->conn, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_ADD);
++	ASSERT_RETURN(msg->items[0].name_change.old_id.id == 0);
++	ASSERT_RETURN(msg->items[0].name_change.new_id.id == env->conn->id);
++	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
++
++	return TEST_OK;
++}
++
++int kdbus_test_match_name_remove(struct kdbus_test_env *env)
++{
++	struct {
++		struct kdbus_cmd_match cmd;
++		struct {
++			uint64_t size;
++			uint64_t type;
++			struct kdbus_notify_name_change chg;
++		} item;
++		char name[64];
++	} buf;
++	struct kdbus_msg *msg;
++	char *name;
++	int ret;
++
++	name = "foo.bla.blaz";
++
++	/* acquire the name */
++	ret = kdbus_name_acquire(env->conn, name, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	/* install the match rule */
++	memset(&buf, 0, sizeof(buf));
++	buf.item.type = KDBUS_ITEM_NAME_REMOVE;
++	buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
++	buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
++	strncpy(buf.name, name, sizeof(buf.name) - 1);
++	buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
++	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
++
++	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++	ASSERT_RETURN(ret == 0);
++
++	/* release the name again */
++	ret = kdbus_name_release(env->conn, name);
++	ASSERT_RETURN(ret == 0);
++
++	/* we should have received a notification */
++	ret = kdbus_msg_recv(env->conn, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_REMOVE);
++	ASSERT_RETURN(msg->items[0].name_change.old_id.id == env->conn->id);
++	ASSERT_RETURN(msg->items[0].name_change.new_id.id == 0);
++	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
++
++	return TEST_OK;
++}
++
++int kdbus_test_match_name_change(struct kdbus_test_env *env)
++{
++	struct {
++		struct kdbus_cmd_match cmd;
++		struct {
++			uint64_t size;
++			uint64_t type;
++			struct kdbus_notify_name_change chg;
++		} item;
++		char name[64];
++	} buf;
++	struct kdbus_conn *conn;
++	struct kdbus_msg *msg;
++	uint64_t flags;
++	char *name = "foo.bla.baz";
++	int ret;
++
++	/* acquire the name */
++	ret = kdbus_name_acquire(env->conn, name, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	/* install the match rule */
++	memset(&buf, 0, sizeof(buf));
++	buf.item.type = KDBUS_ITEM_NAME_CHANGE;
++	buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
++	buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
++	strncpy(buf.name, name, sizeof(buf.name) - 1);
++	buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
++	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
++
++	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++	ASSERT_RETURN(ret == 0);
++
++	/* create a 2nd connection */
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn != NULL);
++
++	/* allow the new connection to own the same name */
++	/* queue the 2nd connection as waiting owner */
++	flags = KDBUS_NAME_QUEUE;
++	ret = kdbus_name_acquire(conn, name, &flags);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(flags & KDBUS_NAME_IN_QUEUE);
++
++	/* release name from 1st connection */
++	ret = kdbus_name_release(env->conn, name);
++	ASSERT_RETURN(ret == 0);
++
++	/* we should have received a notification */
++	ret = kdbus_msg_recv(env->conn, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_CHANGE);
++	ASSERT_RETURN(msg->items[0].name_change.old_id.id == env->conn->id);
++	ASSERT_RETURN(msg->items[0].name_change.new_id.id == conn->id);
++	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
++
++	kdbus_conn_free(conn);
++
++	return TEST_OK;
++}
++
++static int send_bloom_filter(const struct kdbus_conn *conn,
++			     uint64_t cookie,
++			     const uint8_t *filter,
++			     size_t filter_size,
++			     uint64_t filter_generation)
++{
++	struct kdbus_cmd_send cmd = {};
++	struct kdbus_msg *msg;
++	struct kdbus_item *item;
++	uint64_t size;
++	int ret;
++
++	size = sizeof(struct kdbus_msg);
++	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + filter_size;
++
++	msg = alloca(size);
++
++	memset(msg, 0, size);
++	msg->size = size;
++	msg->src_id = conn->id;
++	msg->dst_id = KDBUS_DST_ID_BROADCAST;
++	msg->flags = KDBUS_MSG_SIGNAL;
++	msg->payload_type = KDBUS_PAYLOAD_DBUS;
++	msg->cookie = cookie;
++
++	item = msg->items;
++	item->type = KDBUS_ITEM_BLOOM_FILTER;
++	item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) +
++				filter_size;
++
++	item->bloom_filter.generation = filter_generation;
++	memcpy(item->bloom_filter.data, filter, filter_size);
++
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)msg;
++
++	ret = kdbus_cmd_send(conn->fd, &cmd);
++	if (ret < 0) {
++		kdbus_printf("error sending message: %d (%m)\n", ret);
++		return ret;
++	}
++
++	return 0;
++}
++
++int kdbus_test_match_bloom(struct kdbus_test_env *env)
++{
++	struct {
++		struct kdbus_cmd_match cmd;
++		struct {
++			uint64_t size;
++			uint64_t type;
++			uint8_t data_gen0[64];
++			uint8_t data_gen1[64];
++		} item;
++	} buf;
++	struct kdbus_conn *conn;
++	struct kdbus_msg *msg;
++	uint64_t cookie = 0xf000f00f;
++	uint8_t filter[64];
++	int ret;
++
++	/* install the match rule */
++	memset(&buf, 0, sizeof(buf));
++	buf.cmd.size = sizeof(buf);
++
++	buf.item.size = sizeof(buf.item);
++	buf.item.type = KDBUS_ITEM_BLOOM_MASK;
++	buf.item.data_gen0[0] = 0x55;
++	buf.item.data_gen0[63] = 0x80;
++
++	buf.item.data_gen1[1] = 0xaa;
++	buf.item.data_gen1[9] = 0x02;
++
++	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++	ASSERT_RETURN(ret == 0);
++
++	/* create a 2nd connection */
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn != NULL);
++
++	/* a message with a 0'ed out filter must not reach the other peer */
++	memset(filter, 0, sizeof(filter));
++	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(env->conn, &msg, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	/* now set the filter to the connection's mask and expect success */
++	filter[0] = 0x55;
++	filter[63] = 0x80;
++	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(env->conn, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->cookie == cookie);
++
++	/* broaden the filter and try again. this should also succeed. */
++	filter[0] = 0xff;
++	filter[8] = 0xff;
++	filter[63] = 0xff;
++	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(env->conn, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->cookie == cookie);
++
++	/* the same filter must not match against bloom generation 1 */
++	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(env->conn, &msg, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	/* set a different filter and try again */
++	filter[1] = 0xaa;
++	filter[9] = 0x02;
++	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(env->conn, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->cookie == cookie);
++
++	kdbus_conn_free(conn);
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-message.c b/tools/testing/selftests/kdbus/test-message.c
+new file mode 100644
+index 0000000..563dc85
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-message.c
+@@ -0,0 +1,734 @@
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <time.h>
++#include <stdbool.h>
++#include <sys/eventfd.h>
++#include <sys/types.h>
++#include <sys/wait.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++/* maximum number of queued messages from the same individual user */
++#define KDBUS_CONN_MAX_MSGS			256
++
++/* maximum number of queued requests waiting for a reply */
++#define KDBUS_CONN_MAX_REQUESTS_PENDING		128
++
++/* maximum message payload size */
++#define KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE		(2 * 1024UL * 1024UL)
++
++int kdbus_test_message_basic(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *conn;
++	struct kdbus_conn *sender;
++	struct kdbus_msg *msg;
++	uint64_t cookie = 0x1234abcd5678eeff;
++	uint64_t offset;
++	int ret;
++
++	sender = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(sender != NULL);
++
++	/* create a 2nd connection */
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn != NULL);
++
++	ret = kdbus_add_match_empty(conn);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_add_match_empty(sender);
++	ASSERT_RETURN(ret == 0);
++
++	/* send over 1st connection */
++	ret = kdbus_msg_send(sender, NULL, cookie, 0, 0, 0,
++			     KDBUS_DST_ID_BROADCAST);
++	ASSERT_RETURN(ret == 0);
++
++	/* Make sure that we do get our own broadcasts */
++	ret = kdbus_msg_recv(sender, &msg, &offset);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->cookie == cookie);
++
++	kdbus_msg_free(msg);
++
++	/* ... and receive on the 2nd */
++	ret = kdbus_msg_recv_poll(conn, 100, &msg, &offset);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->cookie == cookie);
++
++	kdbus_msg_free(msg);
++
++	/* Msgs that expect a reply must have timeout and cookie */
++	ret = kdbus_msg_send(sender, NULL, 0, KDBUS_MSG_EXPECT_REPLY,
++			     0, 0, conn->id);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	/* Faked replies with a valid reply cookie are rejected */
++	ret = kdbus_msg_send_reply(conn, time(NULL) ^ cookie, sender->id);
++	ASSERT_RETURN(ret == -EBADSLT);
++
++	ret = kdbus_free(conn, offset);
++	ASSERT_RETURN(ret == 0);
++
++	kdbus_conn_free(sender);
++	kdbus_conn_free(conn);
++
++	return TEST_OK;
++}
++
++static int msg_recv_prio(struct kdbus_conn *conn,
++			 int64_t requested_prio,
++			 int64_t expected_prio)
++{
++	struct kdbus_cmd_recv recv = {
++		.size = sizeof(recv),
++		.flags = KDBUS_RECV_USE_PRIORITY,
++		.priority = requested_prio,
++	};
++	struct kdbus_msg *msg;
++	int ret;
++
++	ret = kdbus_cmd_recv(conn->fd, &recv);
++	if (ret < 0) {
++		kdbus_printf("error receiving message: %d (%m)\n", -errno);
++		return ret;
++	}
++
++	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++	kdbus_msg_dump(conn, msg);
++
++	if (msg->priority != expected_prio) {
++		kdbus_printf("expected message prio %lld, got %lld\n",
++			     (long long) expected_prio,
++			     (long long) msg->priority);
++		return -EINVAL;
++	}
++
++	kdbus_msg_free(msg);
++	ret = kdbus_free(conn, recv.msg.offset);
++	if (ret < 0)
++		return ret;
++
++	return 0;
++}
++
++int kdbus_test_message_prio(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *a, *b;
++	uint64_t cookie = 0;
++
++	a = kdbus_hello(env->buspath, 0, NULL, 0);
++	b = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(a && b);
++
++	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,   25, a->id) == 0);
++	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -600, a->id) == 0);
++	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,   10, a->id) == 0);
++	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,  -35, a->id) == 0);
++	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -100, a->id) == 0);
++	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,   20, a->id) == 0);
++	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,  -15, a->id) == 0);
++	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -800, a->id) == 0);
++	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -150, a->id) == 0);
++	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,   10, a->id) == 0);
++	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -800, a->id) == 0);
++	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,  -10, a->id) == 0);
++
++	ASSERT_RETURN(msg_recv_prio(a, -200, -800) == 0);
++	ASSERT_RETURN(msg_recv_prio(a, -100, -800) == 0);
++	ASSERT_RETURN(msg_recv_prio(a, -400, -600) == 0);
++	ASSERT_RETURN(msg_recv_prio(a, -400, -600) == -EAGAIN);
++	ASSERT_RETURN(msg_recv_prio(a, 10, -150) == 0);
++	ASSERT_RETURN(msg_recv_prio(a, 10, -100) == 0);
++
++	kdbus_printf("--- get priority (all)\n");
++	ASSERT_RETURN(kdbus_msg_recv(a, NULL, NULL) == 0);
++
++	kdbus_conn_free(a);
++	kdbus_conn_free(b);
++
++	return TEST_OK;
++}
++
++static int kdbus_test_notify_kernel_quota(struct kdbus_test_env *env)
++{
++	int ret;
++	unsigned int i;
++	struct kdbus_conn *conn;
++	struct kdbus_conn *reader;
++	struct kdbus_msg *msg = NULL;
++	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
++
++	reader = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(reader);
++
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn);
++
++	/* Register for ID signals */
++	ret = kdbus_add_match_id(reader, 0x1, KDBUS_ITEM_ID_ADD,
++				 KDBUS_MATCH_ID_ANY);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_add_match_id(reader, 0x2, KDBUS_ITEM_ID_REMOVE,
++				 KDBUS_MATCH_ID_ANY);
++	ASSERT_RETURN(ret == 0);
++
++	/* Each iteration two notifications: add and remove ID */
++	for (i = 0; i < KDBUS_CONN_MAX_MSGS / 2; i++) {
++		struct kdbus_conn *notifier;
++
++		notifier = kdbus_hello(env->buspath, 0, NULL, 0);
++		ASSERT_RETURN(notifier);
++
++		kdbus_conn_free(notifier);
++	}
++
++	/*
++	 * Now the reader queue is full with kernel notifications,
++	 * but as a user we still have room to push our messages.
++	 */
++	ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0, 0, reader->id);
++	ASSERT_RETURN(ret == 0);
++
++	/* More ID kernel notifications that will be lost */
++	kdbus_conn_free(conn);
++
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn);
++
++	kdbus_conn_free(conn);
++
++	/*
++	 * We lost only 3 packets since only signal msgs are
++	 * accounted: the connection ID add/remove notifications.
++	 */
++	ret = kdbus_cmd_recv(reader->fd, &recv);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(recv.return_flags & KDBUS_RECV_RETURN_DROPPED_MSGS);
++	ASSERT_RETURN(recv.dropped_msgs == 3);
++
++	msg = (struct kdbus_msg *)(reader->buf + recv.msg.offset);
++	kdbus_msg_free(msg);
++
++	/* Read our queue */
++	for (i = 0; i < KDBUS_CONN_MAX_MSGS - 1; i++) {
++		memset(&recv, 0, sizeof(recv));
++		recv.size = sizeof(recv);
++
++		ret = kdbus_cmd_recv(reader->fd, &recv);
++		ASSERT_RETURN(ret == 0);
++		ASSERT_RETURN(!(recv.return_flags &
++			        KDBUS_RECV_RETURN_DROPPED_MSGS));
++
++		msg = (struct kdbus_msg *)(reader->buf + recv.msg.offset);
++		kdbus_msg_free(msg);
++	}
++
++	ret = kdbus_msg_recv(reader, NULL, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(reader, NULL, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	kdbus_conn_free(reader);
++
++	return 0;
++}
++
++/* Return the number of messages successfully sent */
++static int kdbus_fill_conn_queue(struct kdbus_conn *conn_src,
++				 uint64_t dst_id,
++				 unsigned int max_msgs)
++{
++	unsigned int i;
++	uint64_t cookie = 0;
++	size_t size;
++	struct kdbus_cmd_send cmd = {};
++	struct kdbus_msg *msg;
++	int ret;
++
++	size = sizeof(struct kdbus_msg);
++	msg = malloc(size);
++	ASSERT_RETURN_VAL(msg, -ENOMEM);
++
++	memset(msg, 0, size);
++	msg->size = size;
++	msg->src_id = conn_src->id;
++	msg->dst_id = dst_id;
++	msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)msg;
++
++	for (i = 0; i < max_msgs; i++) {
++		msg->cookie = cookie++;
++		ret = kdbus_cmd_send(conn_src->fd, &cmd);
++		if (ret < 0)
++			break;
++	}
++
++	free(msg);
++
++	return i;
++}
++
++static int kdbus_test_activator_quota(struct kdbus_test_env *env)
++{
++	int ret;
++	unsigned int i;
++	unsigned int activator_msgs_count = 0;
++	uint64_t cookie = time(NULL);
++	struct kdbus_conn *conn;
++	struct kdbus_conn *sender;
++	struct kdbus_conn *activator;
++	struct kdbus_msg *msg;
++	uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
++	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
++	struct kdbus_policy_access access = {
++		.type = KDBUS_POLICY_ACCESS_USER,
++		.id = geteuid(),
++		.access = KDBUS_POLICY_OWN,
++	};
++
++	activator = kdbus_hello_activator(env->buspath, "foo.test.activator",
++					  &access, 1);
++	ASSERT_RETURN(activator);
++
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	sender = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn && sender);
++
++	ret = kdbus_list(sender, KDBUS_LIST_NAMES |
++				 KDBUS_LIST_UNIQUE |
++				 KDBUS_LIST_ACTIVATORS |
++				 KDBUS_LIST_QUEUED);
++	ASSERT_RETURN(ret == 0);
++
++	for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
++		ret = kdbus_msg_send(sender, "foo.test.activator",
++				     cookie++, 0, 0, 0,
++				     KDBUS_DST_ID_NAME);
++		if (ret < 0)
++			break;
++		activator_msgs_count++;
++	}
++
++	/* we must have at least sent one message */
++	ASSERT_RETURN_VAL(i > 0, -errno);
++	ASSERT_RETURN(ret == -ENOBUFS);
++
++	/* Good, activator queue is full now */
++
++	/* ENXIO on direct send (activators can never be addressed by ID) */
++	ret = kdbus_msg_send(conn, NULL, cookie++, 0, 0, 0, activator->id);
++	ASSERT_RETURN(ret == -ENXIO);
++
++	/* can't queue more */
++	ret = kdbus_msg_send(conn, "foo.test.activator", cookie++,
++			     0, 0, 0, KDBUS_DST_ID_NAME);
++	ASSERT_RETURN(ret == -ENOBUFS);
++
++	/* no match installed, so the broadcast will not inc dropped_msgs */
++	ret = kdbus_msg_send(sender, NULL, cookie++, 0, 0, 0,
++			     KDBUS_DST_ID_BROADCAST);
++	ASSERT_RETURN(ret == 0);
++
++	/* Check activator queue */
++	ret = kdbus_cmd_recv(activator->fd, &recv);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(recv.dropped_msgs == 0);
++
++	activator_msgs_count--;
++
++	msg = (struct kdbus_msg *)(activator->buf + recv.msg.offset);
++	kdbus_msg_free(msg);
++
++
++	/* Stage 1) of test check the pool memory quota */
++
++	/* Consume the connection pool memory */
++	for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
++		ret = kdbus_msg_send(sender, NULL,
++				     cookie++, 0, 0, 0, conn->id);
++		if (ret < 0)
++			break;
++	}
++
++	/* consume one message, so later at least one can be moved */
++	memset(&recv, 0, sizeof(recv));
++	recv.size = sizeof(recv);
++	ret = kdbus_cmd_recv(conn->fd, &recv);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(recv.dropped_msgs == 0);
++	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++	kdbus_msg_free(msg);
++
++	/* Try to acquire the name now */
++	ret = kdbus_name_acquire(conn, "foo.test.activator", &flags);
++	ASSERT_RETURN(ret == 0);
++
++	/* try to read messages and see if we have lost some */
++	memset(&recv, 0, sizeof(recv));
++	recv.size = sizeof(recv);
++	ret = kdbus_cmd_recv(conn->fd, &recv);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(recv.dropped_msgs != 0);
++
++	/* number of dropped msgs < received ones (at least one was moved) */
++	ASSERT_RETURN(recv.dropped_msgs < activator_msgs_count);
++
++	/* Deduct the number of dropped msgs from the activator msgs */
++	activator_msgs_count -= recv.dropped_msgs;
++
++	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++	kdbus_msg_free(msg);
++
++	/*
++	 * Release the name and hand it back to activator, now
++	 * we should have 'activator_msgs_count' msgs again in
++	 * the activator queue
++	 */
++	ret = kdbus_name_release(conn, "foo.test.activator");
++	ASSERT_RETURN(ret == 0);
++
++	/* make sure that we got our previous activator msgs */
++	ret = kdbus_msg_recv(activator, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->src_id == sender->id);
++
++	activator_msgs_count--;
++
++	kdbus_msg_free(msg);
++
++
++	/* Stage 2) of test check max message quota */
++
++	/* Empty conn queue */
++	for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
++		ret = kdbus_msg_recv(conn, NULL, NULL);
++		if (ret == -EAGAIN)
++			break;
++	}
++
++	/* fill queue with max msgs quota */
++	ret = kdbus_fill_conn_queue(sender, conn->id, KDBUS_CONN_MAX_MSGS);
++	ASSERT_RETURN(ret == KDBUS_CONN_MAX_MSGS);
++
++	/* This one is lost but it is not accounted */
++	ret = kdbus_msg_send(sender, NULL,
++			     cookie++, 0, 0, 0, conn->id);
++	ASSERT_RETURN(ret == -ENOBUFS);
++
++	/* Acquire the name again */
++	ret = kdbus_name_acquire(conn, "foo.test.activator", &flags);
++	ASSERT_RETURN(ret == 0);
++
++	memset(&recv, 0, sizeof(recv));
++	recv.size = sizeof(recv);
++
++	/*
++	 * Try to read messages and make sure that we have lost all
++	 * the activator messages due to quota checks. Our queue is
++	 * already full.
++	 */
++	ret = kdbus_cmd_recv(conn->fd, &recv);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(recv.dropped_msgs == activator_msgs_count);
++
++	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++	kdbus_msg_free(msg);
++
++	kdbus_conn_free(sender);
++	kdbus_conn_free(conn);
++	kdbus_conn_free(activator);
++
++	return 0;
++}
++
++static int kdbus_test_expected_reply_quota(struct kdbus_test_env *env)
++{
++	int ret;
++	unsigned int i, n;
++	unsigned int count;
++	uint64_t cookie = 0x1234abcd5678eeff;
++	struct kdbus_conn *conn;
++	struct kdbus_conn *connections[9];
++
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn);
++
++	for (i = 0; i < 9; i++) {
++		connections[i] = kdbus_hello(env->buspath, 0, NULL, 0);
++		ASSERT_RETURN(connections[i]);
++	}
++
++	count = 0;
++	/* Send 16 messages to 8 different connections */
++	for (i = 0; i < 8; i++) {
++		for (n = 0; n < 16; n++) {
++			ret = kdbus_msg_send(conn, NULL, cookie++,
++					     KDBUS_MSG_EXPECT_REPLY,
++					     100000000ULL, 0,
++					     connections[i]->id);
++			if (ret < 0)
++				break;
++
++			count++;
++		}
++	}
++
++	/*
++	 * We should have queued exactly
++	 * KDBUS_CONN_MAX_REQUESTS_PENDING method calls
++	 */
++	ASSERT_RETURN(count == KDBUS_CONN_MAX_REQUESTS_PENDING);
++
++	/*
++	 * Now try to send a message to the last connection,
++	 * if we have reached KDBUS_CONN_MAX_REQUESTS_PENDING
++	 * no further requests are allowed
++	 */
++	ret = kdbus_msg_send(conn, NULL, cookie++, KDBUS_MSG_EXPECT_REPLY,
++			     1000000000ULL, 0, connections[8]->id);
++	ASSERT_RETURN(ret == -EMLINK);
++
++	for (i = 0; i < 9; i++)
++		kdbus_conn_free(connections[i]);
++
++	kdbus_conn_free(conn);
++
++	return 0;
++}
++
++int kdbus_test_pool_quota(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *a, *b, *c;
++	struct kdbus_cmd_send cmd = {};
++	struct kdbus_item *item;
++	struct kdbus_msg *recv_msg;
++	struct kdbus_msg *msg;
++	uint64_t cookie = time(NULL);
++	uint64_t size;
++	unsigned int i;
++	char *payload;
++	int ret;
++
++	/* just a guard */
++	if (POOL_SIZE <= KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE ||
++	    POOL_SIZE % KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE != 0)
++		return 0;
++
++	payload = calloc(KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE, sizeof(char));
++	ASSERT_RETURN_VAL(payload, -ENOMEM);
++
++	a = kdbus_hello(env->buspath, 0, NULL, 0);
++	b = kdbus_hello(env->buspath, 0, NULL, 0);
++	c = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(a && b && c);
++
++	size = sizeof(struct kdbus_msg);
++	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++	msg = malloc(size);
++	ASSERT_RETURN_VAL(msg, -ENOMEM);
++
++	memset(msg, 0, size);
++	msg->size = size;
++	msg->src_id = a->id;
++	msg->dst_id = c->id;
++	msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++	item = msg->items;
++	item->type = KDBUS_ITEM_PAYLOAD_VEC;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++	item->vec.address = (uintptr_t)payload;
++	item->vec.size = KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE;
++	item = KDBUS_ITEM_NEXT(item);
++
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)msg;
++
++	/*
++	 * Send 2097248 bytes, a user is only allowed to get 33% of half of
++	 * the free space of the pool, the already used space is
++	 * accounted as free space
++	 */
++	size += KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE;
++	for (i = size; i < (POOL_SIZE / 2 / 3); i += size) {
++		msg->cookie = cookie++;
++
++		ret = kdbus_cmd_send(a->fd, &cmd);
++		ASSERT_RETURN_VAL(ret == 0, ret);
++	}
++
++	/* Try to get more than 33% */
++	msg->cookie = cookie++;
++	ret = kdbus_cmd_send(a->fd, &cmd);
++	ASSERT_RETURN(ret == -ENOBUFS);
++
++	/* We still can pass small messages */
++	ret = kdbus_msg_send(b, NULL, cookie++, 0, 0, 0, c->id);
++	ASSERT_RETURN(ret == 0);
++
++	for (i = size; i < (POOL_SIZE / 2 / 3); i += size) {
++		ret = kdbus_msg_recv(c, &recv_msg, NULL);
++		ASSERT_RETURN(ret == 0);
++		ASSERT_RETURN(recv_msg->src_id == a->id);
++
++		kdbus_msg_free(recv_msg);
++	}
++
++	ret = kdbus_msg_recv(c, &recv_msg, NULL);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(recv_msg->src_id == b->id);
++
++	kdbus_msg_free(recv_msg);
++
++	ret = kdbus_msg_recv(c, NULL, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	free(msg);
++	free(payload);
++
++	kdbus_conn_free(c);
++	kdbus_conn_free(b);
++	kdbus_conn_free(a);
++
++	return 0;
++}
++
++int kdbus_test_message_quota(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *a, *b;
++	uint64_t cookie = 0;
++	int ret;
++	int i;
++
++	ret = kdbus_test_activator_quota(env);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_test_notify_kernel_quota(env);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_test_pool_quota(env);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_test_expected_reply_quota(env);
++	ASSERT_RETURN(ret == 0);
++
++	a = kdbus_hello(env->buspath, 0, NULL, 0);
++	b = kdbus_hello(env->buspath, 0, NULL, 0);
++
++	ret = kdbus_fill_conn_queue(b, a->id, KDBUS_CONN_MAX_MSGS);
++	ASSERT_RETURN(ret == KDBUS_CONN_MAX_MSGS);
++
++	ret = kdbus_msg_send(b, NULL, ++cookie, 0, 0, 0, a->id);
++	ASSERT_RETURN(ret == -ENOBUFS);
++
++	for (i = 0; i < KDBUS_CONN_MAX_MSGS; ++i) {
++		ret = kdbus_msg_recv(a, NULL, NULL);
++		ASSERT_RETURN(ret == 0);
++	}
++
++	ret = kdbus_msg_recv(a, NULL, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	ret = kdbus_fill_conn_queue(b, a->id, KDBUS_CONN_MAX_MSGS + 1);
++	ASSERT_RETURN(ret == KDBUS_CONN_MAX_MSGS);
++
++	ret = kdbus_msg_send(b, NULL, ++cookie, 0, 0, 0, a->id);
++	ASSERT_RETURN(ret == -ENOBUFS);
++
++	kdbus_conn_free(a);
++	kdbus_conn_free(b);
++
++	return TEST_OK;
++}
++
++int kdbus_test_memory_access(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *a, *b;
++	struct kdbus_cmd_send cmd = {};
++	struct kdbus_item *item;
++	struct kdbus_msg *msg;
++	uint64_t test_addr = 0;
++	char line[256];
++	uint64_t size;
++	FILE *f;
++	int ret;
++
++	/*
++	 * Search in /proc/kallsyms for the address of a kernel symbol that
++	 * should always be there, regardless of the config. Use that address
++	 * in a PAYLOAD_VEC item and make sure it's inaccessible.
++	 */
++
++	f = fopen("/proc/kallsyms", "r");
++	if (!f)
++		return TEST_SKIP;
++
++	while (fgets(line, sizeof(line), f)) {
++		char *s = line;
++
++		if (!strsep(&s, " "))
++			continue;
++
++		if (!strsep(&s, " "))
++			continue;
++
++		if (!strncmp(s, "mutex_lock", 10)) {
++			test_addr = strtoull(line, NULL, 16);
++			break;
++		}
++	}
++
++	fclose(f);
++
++	if (!test_addr)
++		return TEST_SKIP;
++
++	a = kdbus_hello(env->buspath, 0, NULL, 0);
++	b = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(a && b);
++
++	size = sizeof(struct kdbus_msg);
++	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++	msg = alloca(size);
++	ASSERT_RETURN_VAL(msg, -ENOMEM);
++
++	memset(msg, 0, size);
++	msg->size = size;
++	msg->src_id = a->id;
++	msg->dst_id = b->id;
++	msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++	item = msg->items;
++	item->type = KDBUS_ITEM_PAYLOAD_VEC;
++	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++	item->vec.address = test_addr;
++	item->vec.size = sizeof(void*);
++	item = KDBUS_ITEM_NEXT(item);
++
++	cmd.size = sizeof(cmd);
++	cmd.msg_address = (uintptr_t)msg;
++
++	ret = kdbus_cmd_send(a->fd, &cmd);
++	ASSERT_RETURN(ret == -EFAULT);
++
++	kdbus_conn_free(b);
++	kdbus_conn_free(a);
++
++	return 0;
++}
+diff --git a/tools/testing/selftests/kdbus/test-metadata-ns.c b/tools/testing/selftests/kdbus/test-metadata-ns.c
+new file mode 100644
+index 0000000..1f6edc0
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-metadata-ns.c
+@@ -0,0 +1,500 @@
++/*
++ * Test metadata in new namespaces. Even if our tests can run
++ * in a namespaced setup, this test is necessary so we can inspect
++ * metadata on the same kdbusfs but between multiple namespaces
++ */
++
++#include <stdio.h>
++#include <string.h>
++#include <sched.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <signal.h>
++#include <sys/wait.h>
++#include <sys/prctl.h>
++#include <sys/eventfd.h>
++#include <sys/syscall.h>
++#include <sys/capability.h>
++#include <linux/sched.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++static const struct kdbus_creds privileged_creds = {};
++
++static const struct kdbus_creds unmapped_creds = {
++	.uid	= UNPRIV_UID,
++	.euid	= UNPRIV_UID,
++	.suid	= UNPRIV_UID,
++	.fsuid	= UNPRIV_UID,
++	.gid	= UNPRIV_GID,
++	.egid	= UNPRIV_GID,
++	.sgid	= UNPRIV_GID,
++	.fsgid	= UNPRIV_GID,
++};
++
++static const struct kdbus_pids unmapped_pids = {};
++
++/* Get only the first item */
++static struct kdbus_item *kdbus_get_item(struct kdbus_msg *msg,
++					 uint64_t type)
++{
++	struct kdbus_item *item;
++
++	KDBUS_ITEM_FOREACH(item, msg, items)
++		if (item->type == type)
++			return item;
++
++	return NULL;
++}
++
++static int kdbus_match_kdbus_creds(struct kdbus_msg *msg,
++				   const struct kdbus_creds *expected_creds)
++{
++	struct kdbus_item *item;
++
++	item = kdbus_get_item(msg, KDBUS_ITEM_CREDS);
++	ASSERT_RETURN(item);
++
++	ASSERT_RETURN(memcmp(&item->creds, expected_creds,
++			     sizeof(struct kdbus_creds)) == 0);
++
++	return 0;
++}
++
++static int kdbus_match_kdbus_pids(struct kdbus_msg *msg,
++				  const struct kdbus_pids *expected_pids)
++{
++	struct kdbus_item *item;
++
++	item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
++	ASSERT_RETURN(item);
++
++	ASSERT_RETURN(memcmp(&item->pids, expected_pids,
++			     sizeof(struct kdbus_pids)) == 0);
++
++	return 0;
++}
++
++static int __kdbus_clone_userns_test(const char *bus,
++				     struct kdbus_conn *conn,
++				     uint64_t grandpa_pid,
++				     int signal_fd)
++{
++	int clone_ret;
++	int ret;
++	struct kdbus_msg *msg = NULL;
++	const struct kdbus_item *item;
++	uint64_t cookie = time(NULL) ^ 0xdeadbeef;
++	struct kdbus_conn *unpriv_conn = NULL;
++	struct kdbus_pids parent_pids = {
++		.pid = getppid(),
++		.tid = getppid(),
++		.ppid = grandpa_pid,
++	};
++
++	ret = drop_privileges(UNPRIV_UID, UNPRIV_GID);
++	ASSERT_EXIT(ret == 0);
++
++	unpriv_conn = kdbus_hello(bus, 0, NULL, 0);
++	ASSERT_EXIT(unpriv_conn);
++
++	ret = kdbus_add_match_empty(unpriv_conn);
++	ASSERT_EXIT(ret == 0);
++
++	/*
++	 * ping privileged connection from this new unprivileged
++	 * one
++	 */
++
++	ret = kdbus_msg_send(unpriv_conn, NULL, cookie, 0, 0,
++			     0, conn->id);
++	ASSERT_EXIT(ret == 0);
++
++	/*
++	 * Since we just dropped privileges, the dumpable flag
++	 * was cleared, which makes /proc/$clone_child/uid_map
++	 * owned by root. Hence any userns uid mapping will fail
++	 * with -EPERM, since the mapping will be done by uid 65534.
++	 *
++	 * To avoid this, set the dumpable flag again, which makes
++	 * procfs update the owner of the /proc/$clone_child/ inodes to 65534.
++	 *
++	 * With this we will be able to write to /proc/$clone_child/uid_map
++	 * as uid 65534 and map the uid 65534 to 0 inside the user namespace.
++	 */
++	ret = prctl(PR_SET_DUMPABLE, SUID_DUMP_USER);
++	ASSERT_EXIT(ret == 0);
++
++	/* Make child privileged in its new userns and run tests */
++
++	ret = RUN_CLONE_CHILD(&clone_ret,
++			      SIGCHLD | CLONE_NEWUSER | CLONE_NEWPID,
++	({ 0;  /* Clone setup, nothing */ }),
++	({
++		eventfd_t event_status = 0;
++		struct kdbus_conn *userns_conn;
++
++		/* ping connection from the new user namespace */
++		userns_conn = kdbus_hello(bus, 0, NULL, 0);
++		ASSERT_EXIT(userns_conn);
++
++		ret = kdbus_add_match_empty(userns_conn);
++		ASSERT_EXIT(ret == 0);
++
++		cookie++;
++		ret = kdbus_msg_send(userns_conn, NULL, cookie,
++				     0, 0, 0, conn->id);
++		ASSERT_EXIT(ret == 0);
++
++		/* Parent did send */
++		ret = eventfd_read(signal_fd, &event_status);
++		ASSERT_RETURN(ret >= 0 && event_status == 1);
++
++		/*
++		 * Receive from privileged connection
++		 */
++		kdbus_printf("Privileged → unprivileged/privileged "
++			     "in its userns "
++			     "(different userns and pidns):\n");
++		ret = kdbus_msg_recv_poll(userns_conn, 300, &msg, NULL);
++		ASSERT_EXIT(ret == 0);
++		ASSERT_EXIT(msg->dst_id == userns_conn->id);
++
++		item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
++		ASSERT_EXIT(item);
++
++		/* uid/gid not mapped, so we have unpriv cached creds */
++		ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
++		ASSERT_EXIT(ret == 0);
++
++		/*
++		 * Different pid namespaces. This is the child pidns
++		 * so it should not see its parent kdbus_pids
++		 */
++		ret = kdbus_match_kdbus_pids(msg, &unmapped_pids);
++		ASSERT_EXIT(ret == 0);
++
++		kdbus_msg_free(msg);
++
++
++		/*
++		 * Receive broadcast from privileged connection
++		 */
++		kdbus_printf("Privileged → unprivileged/privileged "
++			     "in its userns "
++			     "(different userns and pidns):\n");
++		ret = kdbus_msg_recv_poll(userns_conn, 300, &msg, NULL);
++		ASSERT_EXIT(ret == 0);
++		ASSERT_EXIT(msg->dst_id == KDBUS_DST_ID_BROADCAST);
++
++		item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
++		ASSERT_EXIT(item);
++
++		/* uid/gid not mapped, so we have unpriv cached creds */
++		ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
++		ASSERT_EXIT(ret == 0);
++
++		/*
++		 * Different pid namespaces. This is the child pidns
++		 * so it should not see its parent kdbus_pids
++		 */
++		ret = kdbus_match_kdbus_pids(msg, &unmapped_pids);
++		ASSERT_EXIT(ret == 0);
++
++		kdbus_msg_free(msg);
++
++		kdbus_conn_free(userns_conn);
++	}),
++	({
++		/* Parent setup map child uid/gid */
++		ret = userns_map_uid_gid(pid, "0 65534 1", "0 65534 1");
++		ASSERT_EXIT(ret == 0);
++	}),
++	({ 0; }));
++	/* Unprivileged was not able to create user namespace */
++	if (clone_ret == -EPERM) {
++		kdbus_printf("-- CLONE_NEWUSER TEST Failed for "
++			     "uid: %u\n -- Make sure that your kernel "
++			     "does not allow CLONE_NEWUSER for "
++			     "unprivileged users\n", UNPRIV_UID);
++		ret = 0;
++		goto out;
++	}
++
++	ASSERT_EXIT(ret == 0);
++
++
++	/*
++	 * Receive from privileged connection
++	 */
++	kdbus_printf("\nPrivileged → unprivileged (same namespaces):\n");
++	ret = kdbus_msg_recv_poll(unpriv_conn, 300, &msg, NULL);
++
++	ASSERT_EXIT(ret == 0);
++	ASSERT_EXIT(msg->dst_id == unpriv_conn->id);
++
++	/* will get the privileged creds */
++	ret = kdbus_match_kdbus_creds(msg, &privileged_creds);
++	ASSERT_EXIT(ret == 0);
++
++	/* Same pidns so will get the kdbus_pids */
++	ret = kdbus_match_kdbus_pids(msg, &parent_pids);
++	ASSERT_RETURN(ret == 0);
++
++	kdbus_msg_free(msg);
++
++
++	/*
++	 * Receive broadcast from privileged connection
++	 */
++	kdbus_printf("\nPrivileged → unprivileged (same namespaces):\n");
++	ret = kdbus_msg_recv_poll(unpriv_conn, 300, &msg, NULL);
++
++	ASSERT_EXIT(ret == 0);
++	ASSERT_EXIT(msg->dst_id == KDBUS_DST_ID_BROADCAST);
++
++	/* will get the privileged creds */
++	ret = kdbus_match_kdbus_creds(msg, &privileged_creds);
++	ASSERT_EXIT(ret == 0);
++
++	ret = kdbus_match_kdbus_pids(msg, &parent_pids);
++	ASSERT_RETURN(ret == 0);
++
++	kdbus_msg_free(msg);
++
++out:
++	kdbus_conn_free(unpriv_conn);
++
++	return ret;
++}
++
++static int kdbus_clone_userns_test(const char *bus,
++				   struct kdbus_conn *conn)
++{
++	int ret, status, efd;
++	pid_t pid, ppid;
++	uint64_t unpriv_conn_id, userns_conn_id;
++	struct kdbus_msg *msg;
++	const struct kdbus_item *item;
++	struct kdbus_pids expected_pids;
++	struct kdbus_conn *monitor;
++
++	kdbus_printf("STARTING TEST 'metadata-ns'.\n");
++
++	monitor = kdbus_hello(bus, KDBUS_HELLO_MONITOR, NULL, 0);
++	ASSERT_EXIT(monitor);
++
++	/*
++	 * parent will signal to child that is in its
++	 * userns to read its queue
++	 */
++	efd = eventfd(0, EFD_CLOEXEC);
++	ASSERT_RETURN_VAL(efd >= 0, efd);
++
++	ppid = getppid();
++
++	pid = fork();
++	ASSERT_RETURN_VAL(pid >= 0, -errno);
++
++	if (pid == 0) {
++		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
++		ASSERT_EXIT_VAL(ret == 0, -errno);
++
++		ret = __kdbus_clone_userns_test(bus, conn, ppid, efd);
++		_exit(ret);
++	}
++
++
++	/* Phase 1) privileged receives from unprivileged */
++
++	/*
++	 * Receive from the unprivileged child
++	 */
++	kdbus_printf("\nUnprivileged → privileged (same namespaces):\n");
++	ret = kdbus_msg_recv_poll(conn, 300, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	unpriv_conn_id = msg->src_id;
++
++	/* Unprivileged user */
++	ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
++	ASSERT_RETURN(ret == 0);
++
++	/* Set the expected creds_pids */
++	expected_pids = (struct kdbus_pids) {
++		.pid = pid,
++		.tid = pid,
++		.ppid = getpid(),
++	};
++	ret = kdbus_match_kdbus_pids(msg, &expected_pids);
++	ASSERT_RETURN(ret == 0);
++
++	kdbus_msg_free(msg);
++
++
++	/*
++	 * Receive from the unprivileged that is in its own
++	 * userns and pidns
++	 */
++
++	kdbus_printf("\nUnprivileged/privileged in its userns → privileged "
++		     "(different userns and pidns)\n");
++	ret = kdbus_msg_recv_poll(conn, 300, &msg, NULL);
++	if (ret == -ETIMEDOUT)
++		/* perhaps unprivileged userns is not allowed */
++		goto wait;
++
++	ASSERT_RETURN(ret == 0);
++
++	userns_conn_id = msg->src_id;
++
++	item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
++	ASSERT_RETURN(item);
++
++	/*
++	 * Compare received items, creds must be translated into
++	 * the receiver user namespace, so the user is unprivileged
++	 */
++	ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * We should have the kdbus_pids since we are the parent
++	 * pidns
++	 */
++	item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
++	ASSERT_RETURN(item);
++
++	ASSERT_RETURN(memcmp(&item->pids, &unmapped_pids,
++			     sizeof(struct kdbus_pids)) != 0);
++
++	/*
++	 * Parent pid of the unprivileged/privileged in its userns
++	 * is the unprivileged child pid that was forked here.
++	 */
++	ASSERT_RETURN((uint64_t)pid == item->pids.ppid);
++
++	kdbus_msg_free(msg);
++
++
++	/* Phase 2) Privileged connection sends now 3 packets */
++
++	/*
++	 * Sending to unprivileged connections a unicast
++	 */
++	ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
++			     0, unpriv_conn_id);
++	ASSERT_RETURN(ret == 0);
++
++	/* signal to child that is in its userns */
++	ret = eventfd_write(efd, 1);
++	ASSERT_EXIT(ret == 0);
++
++	/*
++	 * Sending to unprivileged/privileged in its userns
++	 * connections a unicast
++	 */
++	ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
++			     0, userns_conn_id);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Sending to unprivileged connections a broadcast
++	 */
++	ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
++			     0, KDBUS_DST_ID_BROADCAST);
++	ASSERT_RETURN(ret == 0);
++
++
++wait:
++	ret = waitpid(pid, &status, 0);
++	ASSERT_RETURN(ret >= 0);
++
++	ASSERT_RETURN(WIFEXITED(status));
++	ASSERT_RETURN(!WEXITSTATUS(status));
++
++	/* Dump monitor queue */
++	kdbus_printf("\n\nMonitor queue:\n");
++	for (;;) {
++		ret = kdbus_msg_recv_poll(monitor, 100, &msg, NULL);
++		if (ret < 0)
++			break;
++
++		if (msg->payload_type == KDBUS_PAYLOAD_DBUS) {
++			/*
++			 * Parent pidns should see all the
++			 * pids
++			 */
++			item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
++			ASSERT_RETURN(item);
++
++			ASSERT_RETURN(item->pids.pid != 0 &&
++				      item->pids.tid != 0 &&
++				      item->pids.ppid != 0);
++		}
++
++		kdbus_msg_free(msg);
++	}
++
++	kdbus_conn_free(monitor);
++	close(efd);
++
++	return 0;
++}
++
++int kdbus_test_metadata_ns(struct kdbus_test_env *env)
++{
++	int ret;
++	struct kdbus_conn *holder, *conn;
++	struct kdbus_policy_access policy_access = {
++		/* Allow world so we can inspect metadata in namespace */
++		.type = KDBUS_POLICY_ACCESS_WORLD,
++		.id = geteuid(),
++		.access = KDBUS_POLICY_TALK,
++	};
++
++	/*
++	 * We require user-namespaces and all uids/gids
++	 * should be mapped (we can just require the necessary ones)
++	 */
++	if (!config_user_ns_is_enabled() ||
++	    !all_uids_gids_are_mapped())
++		return TEST_SKIP;
++
++	ret = test_is_capable(CAP_SETUID, CAP_SETGID, CAP_SYS_ADMIN, -1);
++	ASSERT_RETURN(ret >= 0);
++
++	/* not enough privileges, SKIP test */
++	if (!ret)
++		return TEST_SKIP;
++
++	holder = kdbus_hello_registrar(env->buspath, "com.example.metadata",
++				       &policy_access, 1,
++				       KDBUS_HELLO_POLICY_HOLDER);
++	ASSERT_RETURN(holder);
++
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn);
++
++	ret = kdbus_add_match_empty(conn);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_name_acquire(conn, "com.example.metadata", NULL);
++	ASSERT_EXIT(ret >= 0);
++
++	ret = kdbus_clone_userns_test(env->buspath, conn);
++	ASSERT_RETURN(ret == 0);
++
++	kdbus_conn_free(holder);
++	kdbus_conn_free(conn);
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-monitor.c b/tools/testing/selftests/kdbus/test-monitor.c
+new file mode 100644
+index 0000000..e00d738
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-monitor.c
+@@ -0,0 +1,176 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <stdbool.h>
++#include <errno.h>
++#include <assert.h>
++#include <signal.h>
++#include <sys/time.h>
++#include <sys/mman.h>
++#include <sys/capability.h>
++#include <sys/wait.h>
++
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++int kdbus_test_monitor(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *monitor, *conn;
++	unsigned int cookie = 0xdeadbeef;
++	struct kdbus_msg *msg;
++	uint64_t offset = 0;
++	int ret;
++
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn);
++
++	/* add matches to make sure the monitor does not trigger an item add or
++	 * remove on connect and disconnect, respectively.
++	 */
++	ret = kdbus_add_match_id(conn, 0x1, KDBUS_ITEM_ID_ADD,
++				 KDBUS_MATCH_ID_ANY);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_add_match_id(conn, 0x2, KDBUS_ITEM_ID_REMOVE,
++				 KDBUS_MATCH_ID_ANY);
++	ASSERT_RETURN(ret == 0);
++
++	/* register a monitor */
++	monitor = kdbus_hello(env->buspath, KDBUS_HELLO_MONITOR, NULL, 0);
++	ASSERT_RETURN(monitor);
++
++	/* make sure we did not receive a monitor connect notification */
++	ret = kdbus_msg_recv(conn, &msg, &offset);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	/* check that a monitor cannot acquire a name */
++	ret = kdbus_name_acquire(monitor, "foo.bar.baz", NULL);
++	ASSERT_RETURN(ret == -EOPNOTSUPP);
++
++	ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0,  0, conn->id);
++	ASSERT_RETURN(ret == 0);
++
++	/* the recipient should have gotten the message */
++	ret = kdbus_msg_recv(conn, &msg, &offset);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->cookie == cookie);
++	kdbus_msg_free(msg);
++	kdbus_free(conn, offset);
++
++	/* and so should the monitor */
++	ret = kdbus_msg_recv(monitor, &msg, &offset);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->cookie == cookie);
++
++	kdbus_msg_free(msg);
++	kdbus_free(monitor, offset);
++
++	/* Installing matches for monitors must fail */
++	ret = kdbus_add_match_empty(monitor);
++	ASSERT_RETURN(ret == -EOPNOTSUPP);
++
++	cookie++;
++	ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
++			     KDBUS_DST_ID_BROADCAST);
++	ASSERT_RETURN(ret == 0);
++
++	/* The monitor should get the message. */
++	ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->cookie == cookie);
++
++	kdbus_msg_free(msg);
++	kdbus_free(monitor, offset);
++
++	/*
++	 * Since we are the only monitor, update the attach flags
++	 * and tell we are not interested in attach flags recv
++	 */
++
++	ret = kdbus_conn_update_attach_flags(monitor,
++					     _KDBUS_ATTACH_ALL,
++					     0);
++	ASSERT_RETURN(ret == 0);
++
++	cookie++;
++	ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
++			     KDBUS_DST_ID_BROADCAST);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->cookie == cookie);
++
++	ret = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
++	ASSERT_RETURN(ret == 0);
++
++	kdbus_msg_free(msg);
++	kdbus_free(monitor, offset);
++
++	/*
++	 * Now we are interested in KDBUS_ITEM_TIMESTAMP and
++	 * KDBUS_ITEM_CREDS
++	 */
++	ret = kdbus_conn_update_attach_flags(monitor,
++					     _KDBUS_ATTACH_ALL,
++					     KDBUS_ATTACH_TIMESTAMP |
++					     KDBUS_ATTACH_CREDS);
++	ASSERT_RETURN(ret == 0);
++
++	cookie++;
++	ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
++			     KDBUS_DST_ID_BROADCAST);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->cookie == cookie);
++
++	ret = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
++	ASSERT_RETURN(ret == 1);
++
++	ret = kdbus_item_in_message(msg, KDBUS_ITEM_CREDS);
++	ASSERT_RETURN(ret == 1);
++
++	/* the KDBUS_ITEM_PID_COMM was not requested */
++	ret = kdbus_item_in_message(msg, KDBUS_ITEM_PID_COMM);
++	ASSERT_RETURN(ret == 0);
++
++	kdbus_msg_free(msg);
++	kdbus_free(monitor, offset);
++
++	kdbus_conn_free(monitor);
++	/* make sure we did not receive a monitor disconnect notification */
++	ret = kdbus_msg_recv(conn, &msg, &offset);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	kdbus_conn_free(conn);
++
++	/* Make sure that monitor as unprivileged is not allowed */
++	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
++	ASSERT_RETURN(ret >= 0);
++
++	if (ret && all_uids_gids_are_mapped()) {
++		ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
++			monitor = kdbus_hello(env->buspath,
++					      KDBUS_HELLO_MONITOR,
++					      NULL, 0);
++			ASSERT_EXIT(!monitor && errno == EPERM);
++
++			_exit(EXIT_SUCCESS);
++		}),
++		({ 0; }));
++		ASSERT_RETURN(ret == 0);
++	}
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-names.c b/tools/testing/selftests/kdbus/test-names.c
+new file mode 100644
+index 0000000..e400dc8
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-names.c
+@@ -0,0 +1,272 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <limits.h>
++#include <getopt.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++struct test_name {
++	const char *name;
++	__u64 owner_id;
++	__u64 flags;
++};
++
++static bool conn_test_names(const struct kdbus_conn *conn,
++			    const struct test_name *tests,
++			    unsigned int n_tests)
++{
++	struct kdbus_cmd_list cmd_list = {};
++	struct kdbus_info *name, *list;
++	unsigned int i;
++	int ret;
++
++	cmd_list.size = sizeof(cmd_list);
++	cmd_list.flags = KDBUS_LIST_NAMES |
++			 KDBUS_LIST_ACTIVATORS |
++			 KDBUS_LIST_QUEUED;
++
++	ret = kdbus_cmd_list(conn->fd, &cmd_list);
++	ASSERT_RETURN(ret == 0);
++
++	list = (struct kdbus_info *)(conn->buf + cmd_list.offset);
++
++	for (i = 0; i < n_tests; i++) {
++		const struct test_name *t = tests + i;
++		bool found = false;
++
++		KDBUS_FOREACH(name, list, cmd_list.list_size) {
++			struct kdbus_item *item;
++
++			KDBUS_ITEM_FOREACH(item, name, items) {
++				if (item->type != KDBUS_ITEM_OWNED_NAME ||
++				    strcmp(item->name.name, t->name) != 0)
++					continue;
++
++				if (t->owner_id == name->id &&
++				    t->flags == item->name.flags) {
++					found = true;
++					break;
++				}
++			}
++		}
++
++		if (!found)
++			return false;
++	}
++
++	return true;
++}
++
++static bool conn_is_name_primary_owner(const struct kdbus_conn *conn,
++				       const char *needle)
++{
++	struct test_name t = {
++		.name = needle,
++		.owner_id = conn->id,
++		.flags = KDBUS_NAME_PRIMARY,
++	};
++
++	return conn_test_names(conn, &t, 1);
++}
++
++int kdbus_test_name_basic(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *conn;
++	char *name, *dot_name, *invalid_name, *wildcard_name;
++	int ret;
++
++	name = "foo.bla.blaz";
++	dot_name = ".bla.blaz";
++	invalid_name = "foo";
++	wildcard_name = "foo.bla.bl.*";
++
++	/* create a 2nd connection */
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn != NULL);
++
++	/* acquire the name "foo.bar.xxx" */
++	ret = kdbus_name_acquire(conn, "foo.bar.xxx", NULL);
++	ASSERT_RETURN(ret == 0);
++
++	/* Name is not valid, must fail */
++	ret = kdbus_name_acquire(env->conn, dot_name, NULL);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	ret = kdbus_name_acquire(env->conn, invalid_name, NULL);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	ret = kdbus_name_acquire(env->conn, wildcard_name, NULL);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	/* check that we can acquire a name */
++	ret = kdbus_name_acquire(env->conn, name, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ret = conn_is_name_primary_owner(env->conn, name);
++	ASSERT_RETURN(ret == true);
++
++	/* ... and release it again */
++	ret = kdbus_name_release(env->conn, name);
++	ASSERT_RETURN(ret == 0);
++
++	ret = conn_is_name_primary_owner(env->conn, name);
++	ASSERT_RETURN(ret == false);
++
++	/* check that we can't release it again */
++	ret = kdbus_name_release(env->conn, name);
++	ASSERT_RETURN(ret == -ESRCH);
++
++	/* check that we can't release a name that we don't own */
++	ret = kdbus_name_release(env->conn, "foo.bar.xxx");
++	ASSERT_RETURN(ret == -EADDRINUSE);
++
++	/* Name is not valid, must fail */
++	ret = kdbus_name_release(env->conn, dot_name);
++	ASSERT_RETURN(ret == -ESRCH);
++
++	ret = kdbus_name_release(env->conn, invalid_name);
++	ASSERT_RETURN(ret == -ESRCH);
++
++	ret = kdbus_name_release(env->conn, wildcard_name);
++	ASSERT_RETURN(ret == -ESRCH);
++
++	kdbus_conn_free(conn);
++
++	return TEST_OK;
++}
++
++int kdbus_test_name_conflict(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *conn;
++	char *name;
++	int ret;
++
++	name = "foo.bla.blaz";
++
++	/* create a 2nd connection */
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn != NULL);
++
++	/* allow the new connection to own the same name */
++	/* acquire name from the 1st connection */
++	ret = kdbus_name_acquire(env->conn, name, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ret = conn_is_name_primary_owner(env->conn, name);
++	ASSERT_RETURN(ret == true);
++
++	/* check that we also can't acquire it again from the 2nd connection */
++	ret = kdbus_name_acquire(conn, name, NULL);
++	ASSERT_RETURN(ret == -EEXIST);
++
++	kdbus_conn_free(conn);
++
++	return TEST_OK;
++}
++
++int kdbus_test_name_queue(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *conn;
++	struct test_name t[2];
++	const char *name;
++	uint64_t flags;
++	int ret;
++
++	name = "foo.bla.blaz";
++
++	flags = 0;
++
++	/* create a 2nd connection */
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn != NULL);
++
++	/* allow the new connection to own the same name */
++	/* acquire name from the 1st connection */
++	ret = kdbus_name_acquire(env->conn, name, &flags);
++	ASSERT_RETURN(ret == 0);
++
++	ret = conn_is_name_primary_owner(env->conn, name);
++	ASSERT_RETURN(ret == true);
++
++	/* queue the 2nd connection as waiting owner */
++	flags = KDBUS_NAME_QUEUE;
++	ret = kdbus_name_acquire(conn, name, &flags);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(flags & KDBUS_NAME_IN_QUEUE);
++
++	t[0].name = name;
++	t[0].owner_id = env->conn->id;
++	t[0].flags = KDBUS_NAME_PRIMARY;
++	t[1].name = name;
++	t[1].owner_id = conn->id;
++	t[1].flags = KDBUS_NAME_QUEUE | KDBUS_NAME_IN_QUEUE;
++	ret = conn_test_names(conn, t, 2);
++	ASSERT_RETURN(ret == true);
++
++	/* release name from 1st connection */
++	ret = kdbus_name_release(env->conn, name);
++	ASSERT_RETURN(ret == 0);
++
++	/* now the name should be owned by the 2nd connection */
++	t[0].name = name;
++	t[0].owner_id = conn->id;
++	t[0].flags = KDBUS_NAME_PRIMARY | KDBUS_NAME_QUEUE;
++	ret = conn_test_names(conn, t, 1);
++	ASSERT_RETURN(ret == true);
++
++	kdbus_conn_free(conn);
++
++	return TEST_OK;
++}
++
++int kdbus_test_name_takeover(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *conn;
++	struct test_name t;
++	const char *name;
++	uint64_t flags;
++	int ret;
++
++	name = "foo.bla.blaz";
++
++	flags = KDBUS_NAME_ALLOW_REPLACEMENT;
++
++	/* create a 2nd connection */
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn != NULL);
++
++	/* acquire name for 1st connection */
++	ret = kdbus_name_acquire(env->conn, name, &flags);
++	ASSERT_RETURN(ret == 0);
++
++	t.name = name;
++	t.owner_id = env->conn->id;
++	t.flags = KDBUS_NAME_ALLOW_REPLACEMENT | KDBUS_NAME_PRIMARY;
++	ret = conn_test_names(conn, &t, 1);
++	ASSERT_RETURN(ret == true);
++
++	/* now steal name with 2nd connection */
++	flags = KDBUS_NAME_REPLACE_EXISTING;
++	ret = kdbus_name_acquire(conn, name, &flags);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(flags & KDBUS_NAME_ACQUIRED);
++
++	ret = conn_is_name_primary_owner(conn, name);
++	ASSERT_RETURN(ret == true);
++
++	kdbus_conn_free(conn);
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-policy-ns.c b/tools/testing/selftests/kdbus/test-policy-ns.c
+new file mode 100644
+index 0000000..3437012
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-policy-ns.c
+@@ -0,0 +1,632 @@
++/*
++ * Test metadata and policies in new namespaces. Even if our tests
++ * can run in a namespaced setup, this test is necessary so we can
++ * inspect policies on the same kdbusfs but between multiple
++ * namespaces.
++ *
++ * Copyright (C) 2014-2015 Djalal Harouni
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <pthread.h>
++#include <sched.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <stdint.h>
++#include <stdbool.h>
++#include <unistd.h>
++#include <errno.h>
++#include <signal.h>
++#include <sys/wait.h>
++#include <sys/prctl.h>
++#include <sys/eventfd.h>
++#include <sys/syscall.h>
++#include <sys/capability.h>
++#include <linux/sched.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++#define MAX_CONN	64
++#define POLICY_NAME	"foo.test.policy-test"
++
++#define KDBUS_CONN_MAX_MSGS_PER_USER            16
++
++/**
++ * Note: this test can be used to inspect policy_db->talk_access_hash
++ *
++ * The purpose of these tests:
++ * 1) Check KDBUS_POLICY_TALK
++ * 2) Check the cache state: kdbus_policy_db->talk_access_hash
++ * Should be extended
++ */
++
++/**
++ * Check a list of connections against conn_db[0]
++ * conn_db[0] will own the name "foo.test.policy-test" and the
++ * policy holder connection for this name will update the policy
++ * entries, so different use cases can be tested.
++ */
++static struct kdbus_conn **conn_db;
++
++static void *kdbus_recv_echo(void *ptr)
++{
++	int ret;
++	struct kdbus_conn *conn = ptr;
++
++	ret = kdbus_msg_recv_poll(conn, 200, NULL, NULL);
++
++	return (void *)(long)ret;
++}
++
++/* Trigger kdbus_policy_set() */
++static int kdbus_set_policy_talk(struct kdbus_conn *conn,
++				 const char *name,
++				 uid_t id, unsigned int type)
++{
++	int ret;
++	struct kdbus_policy_access access = {
++		.type = type,
++		.id = id,
++		.access = KDBUS_POLICY_TALK,
++	};
++
++	ret = kdbus_conn_update_policy(conn, name, &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	return TEST_OK;
++}
++
++/* return TEST_OK or TEST_ERR on failure */
++static int kdbus_register_same_activator(char *bus, const char *name,
++					 struct kdbus_conn **c)
++{
++	int ret;
++	struct kdbus_conn *activator;
++
++	activator = kdbus_hello_activator(bus, name, NULL, 0);
++	if (activator) {
++		*c = activator;
++		fprintf(stderr, "--- error was able to register name twice '%s'.\n",
++			name);
++		return TEST_ERR;
++	}
++
++	ret = -errno;
++	/* -EEXIST means test succeeded */
++	if (ret == -EEXIST)
++		return TEST_OK;
++
++	return TEST_ERR;
++}
++
++/* return TEST_OK or TEST_ERR on failure */
++static int kdbus_register_policy_holder(char *bus, const char *name,
++					struct kdbus_conn **conn)
++{
++	struct kdbus_conn *c;
++	struct kdbus_policy_access access[2];
++
++	access[0].type = KDBUS_POLICY_ACCESS_USER;
++	access[0].access = KDBUS_POLICY_OWN;
++	access[0].id = geteuid();
++
++	access[1].type = KDBUS_POLICY_ACCESS_WORLD;
++	access[1].access = KDBUS_POLICY_TALK;
++	access[1].id = geteuid();
++
++	c = kdbus_hello_registrar(bus, name, access, 2,
++				  KDBUS_HELLO_POLICY_HOLDER);
++	ASSERT_RETURN(c);
++
++	*conn = c;
++
++	return TEST_OK;
++}
++
++/**
++ * Create new threads for receiving from multiple senders.
++ * The 'conn_db' will be populated by newly created connections.
++ * Caller should free all allocated connections.
++ *
++ * return 0 on success, negative errno on failure.
++ */
++static int kdbus_recv_in_threads(const char *bus, const char *name,
++				 struct kdbus_conn **conn_db)
++{
++	int ret;
++	bool pool_full = false;
++	unsigned int sent_packets = 0;
++	unsigned int lost_packets = 0;
++	unsigned int i, tid;
++	unsigned long dst_id;
++	unsigned long cookie = 1;
++	unsigned int thread_nr = MAX_CONN - 1;
++	pthread_t thread_id[MAX_CONN - 1] = {'\0'};
++
++	dst_id = name ? KDBUS_DST_ID_NAME : conn_db[0]->id;
++
++	for (tid = 0, i = 1; tid < thread_nr; tid++, i++) {
++		ret = pthread_create(&thread_id[tid], NULL,
++				     kdbus_recv_echo, (void *)conn_db[0]);
++		if (ret != 0) {
++			ret = -ret; /* error is returned, not set in errno */
++			kdbus_printf("error pthread_create: %d (%s)\n",
++				      ret, strerror(-ret));
++			break;
++		}
++
++		/* just free before re-using */
++		kdbus_conn_free(conn_db[i]);
++		conn_db[i] = NULL;
++
++		/* We need to create connections here */
++		conn_db[i] = kdbus_hello(bus, 0, NULL, 0);
++		if (!conn_db[i]) {
++			ret = -errno;
++			break;
++		}
++
++		ret = kdbus_add_match_empty(conn_db[i]);
++		if (ret < 0)
++			break;
++
++		ret = kdbus_msg_send(conn_db[i], name, cookie++,
++				     0, 0, 0, dst_id);
++		if (ret < 0) {
++			/*
++			 * Receivers are not reading their messages,
++			 * not scheduled ?!
++			 *
++			 * So set the pool full here, perhaps the
++			 * connection pool or queue was full, later
++			 * recheck receivers errors
++			 */
++			if (ret == -ENOBUFS || ret == -EXFULL)
++				pool_full = true;
++			break;
++		}
++
++		sent_packets++;
++	}
++
++	for (tid = 0; tid < thread_nr; tid++) {
++		void *thread_ret = NULL;
++
++		if (thread_id[tid]) {
++			pthread_join(thread_id[tid], &thread_ret);
++			if ((long)thread_ret < 0) {
++				/* Update only if send did not fail */
++				if (ret == 0)
++					ret = (long)thread_ret;
++
++				lost_packets++;
++			}
++		}
++	}
++
++	/*
++	 * If sending failed with -ENOBUFS or -EXFULL, then
++	 * lost_packets must be set and sent_packets must be at
++	 * least KDBUS_CONN_MAX_MSGS_PER_USER
++	 */
++	if (pool_full) {
++		ASSERT_RETURN(lost_packets > 0);
++
++		/*
++		 * We should at least send KDBUS_CONN_MAX_MSGS_PER_USER
++		 *
++		 * For every send operation we create a thread to
++		 * recv the packet, so we keep the queue clean
++		 */
++		ASSERT_RETURN(sent_packets >= KDBUS_CONN_MAX_MSGS_PER_USER);
++
++		/*
++		 * Set ret to zero since we only failed due to
++		 * the receiving threads that have not been
++		 * scheduled
++		 */
++		ret = 0;
++	}
++
++	return ret;
++}
++
++/* Return: TEST_OK or TEST_ERR on failure */
++static int kdbus_normal_test(const char *bus, const char *name,
++			     struct kdbus_conn **conn_db)
++{
++	int ret;
++
++	ret = kdbus_recv_in_threads(bus, name, conn_db);
++	ASSERT_RETURN(ret >= 0);
++
++	return TEST_OK;
++}
++
++static int kdbus_fork_test_by_id(const char *bus,
++				 struct kdbus_conn **conn_db,
++				 int parent_status, int child_status)
++{
++	int ret;
++	pid_t pid;
++	uint64_t cookie = 0x9876ecba;
++	struct kdbus_msg *msg = NULL;
++	uint64_t offset = 0;
++	int status = 0;
++
++	/*
++	 * If the child_status is not EXIT_SUCCESS, then we expect
++	 * that sending from the child will fail, thus receiving
++	 * from parent must error with -ETIMEDOUT, and vice versa.
++	 */
++	bool parent_timedout = !!child_status;
++	bool child_timedout = !!parent_status;
++
++	pid = fork();
++	ASSERT_RETURN_VAL(pid >= 0, pid);
++
++	if (pid == 0) {
++		struct kdbus_conn *conn_src;
++
++		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
++		ASSERT_EXIT(ret == 0);
++
++		ret = drop_privileges(65534, 65534);
++		ASSERT_EXIT(ret == 0);
++
++		conn_src = kdbus_hello(bus, 0, NULL, 0);
++		ASSERT_EXIT(conn_src);
++
++		ret = kdbus_add_match_empty(conn_src);
++		ASSERT_EXIT(ret == 0);
++
++		/*
++		 * child_status is always checked against send
++		 * operations, in case it fails always return
++		 * EXIT_FAILURE.
++		 */
++		ret = kdbus_msg_send(conn_src, NULL, cookie,
++				     0, 0, 0, conn_db[0]->id);
++		ASSERT_EXIT(ret == child_status);
++
++		ret = kdbus_msg_recv_poll(conn_src, 100, NULL, NULL);
++
++		kdbus_conn_free(conn_src);
++
++		/*
++		 * Child kdbus_msg_recv_poll() should timeout since
++		 * the parent_status was set to a non EXIT_SUCCESS
++		 * value.
++		 */
++		if (child_timedout)
++			_exit(ret == -ETIMEDOUT ? EXIT_SUCCESS : EXIT_FAILURE);
++
++		_exit(ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
++	}
++
++	ret = kdbus_msg_recv_poll(conn_db[0], 100, &msg, &offset);
++	/*
++	 * If parent_timedout is set then this should fail with
++	 * -ETIMEDOUT since the child_status was set to a non
++	 * EXIT_SUCCESS value. Otherwise, assume
++	 * that kdbus_msg_recv_poll() has succeeded.
++	 */
++	if (parent_timedout) {
++		ASSERT_RETURN_VAL(ret == -ETIMEDOUT, TEST_ERR);
++
++		/* timed out, no need to continue: we don't have the
++		 * child connection ID, so just terminate. */
++		goto out;
++	} else {
++		ASSERT_RETURN_VAL(ret == 0, ret);
++	}
++
++	ret = kdbus_msg_send(conn_db[0], NULL, ++cookie,
++			     0, 0, 0, msg->src_id);
++	/*
++	 * parent_status is checked against send operations,
++	 * on failures always return TEST_ERR.
++	 */
++	ASSERT_RETURN_VAL(ret == parent_status, TEST_ERR);
++
++	kdbus_msg_free(msg);
++	kdbus_free(conn_db[0], offset);
++
++out:
++	ret = waitpid(pid, &status, 0);
++	ASSERT_RETURN_VAL(ret >= 0, ret);
++
++	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++/*
++ * Return: TEST_OK, TEST_ERR or TEST_SKIP
++ * we return TEST_OK only if the children return with the expected
++ * 'expected_status' that is specified as an argument.
++ */
++static int kdbus_fork_test(const char *bus, const char *name,
++			   struct kdbus_conn **conn_db, int expected_status)
++{
++	pid_t pid;
++	int ret = 0;
++	int status = 0;
++
++	pid = fork();
++	ASSERT_RETURN_VAL(pid >= 0, pid);
++
++	if (pid == 0) {
++		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
++		ASSERT_EXIT(ret == 0);
++
++		ret = drop_privileges(65534, 65534);
++		ASSERT_EXIT(ret == 0);
++
++		ret = kdbus_recv_in_threads(bus, name, conn_db);
++		_exit(ret == expected_status ? EXIT_SUCCESS : EXIT_FAILURE);
++	}
++
++	ret = waitpid(pid, &status, 0);
++	ASSERT_RETURN(ret >= 0);
++
++	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++/* Return EXIT_SUCCESS, EXIT_FAILURE or negative errno */
++static int __kdbus_clone_userns_test(const char *bus,
++				     const char *name,
++				     struct kdbus_conn **conn_db,
++				     int expected_status)
++{
++	int efd;
++	pid_t pid;
++	int ret = 0;
++	unsigned int uid = 65534;
++	int status;
++
++	ret = drop_privileges(uid, uid);
++	ASSERT_RETURN_VAL(ret == 0, ret);
++
++	/*
++	 * Since we just dropped privileges, the dumpable flag was just
++	 * cleared which makes the /proc/$clone_child/uid_map to be
++	 * owned by root, hence any userns uid mapping will fail with
++	 * -EPERM since the mapping will be done by uid 65534.
++	 *
++	 * To avoid this set the dumpable flag again which makes procfs
++	 * update the /proc/$clone_child/ inodes owner to 65534.
++	 *
++	 * Using this we will be able to write to /proc/$clone_child/uid_map
++	 * as uid 65534 and map the uid 65534 to 0 inside the user
++	 * namespace.
++	 */
++	ret = prctl(PR_SET_DUMPABLE, SUID_DUMP_USER);
++	ASSERT_RETURN_VAL(ret == 0, ret);
++
++	/* sync parent/child */
++	efd = eventfd(0, EFD_CLOEXEC);
++	ASSERT_RETURN_VAL(efd >= 0, efd);
++
++	pid = syscall(__NR_clone, SIGCHLD|CLONE_NEWUSER, NULL);
++	if (pid < 0) {
++		ret = -errno;
++		kdbus_printf("error clone: %d (%m)\n", ret);
++		/*
++		 * Normal user not allowed to create userns,
++		 * so nothing to worry about ?
++		 */
++		if (ret == -EPERM) {
++			kdbus_printf("-- CLONE_NEWUSER TEST Failed for uid: %u\n"
++				"-- Make sure that your kernel do not allow "
++				"CLONE_NEWUSER for unprivileged users\n"
++				"-- Upstream Commit: "
++				"https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eaf563e\n",
++				uid);
++			ret = 0;
++		}
++
++		return ret;
++	}
++
++	if (pid == 0) {
++		struct kdbus_conn *conn_src;
++		eventfd_t event_status = 0;
++
++		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
++		ASSERT_EXIT(ret == 0);
++
++		ret = eventfd_read(efd, &event_status);
++		ASSERT_EXIT(ret >= 0 && event_status == 1);
++
++		/* ping connection from the new user namespace */
++		conn_src = kdbus_hello(bus, 0, NULL, 0);
++		ASSERT_EXIT(conn_src);
++
++		ret = kdbus_add_match_empty(conn_src);
++		ASSERT_EXIT(ret == 0);
++
++		ret = kdbus_msg_send(conn_src, name, 0xabcd1234,
++				     0, 0, 0, KDBUS_DST_ID_NAME);
++		kdbus_conn_free(conn_src);
++
++		_exit(ret == expected_status ? EXIT_SUCCESS : EXIT_FAILURE);
++	}
++
++	ret = userns_map_uid_gid(pid, "0 65534 1", "0 65534 1");
++	ASSERT_RETURN_VAL(ret == 0, ret);
++
++	/* Tell child we are ready */
++	ret = eventfd_write(efd, 1);
++	ASSERT_RETURN_VAL(ret == 0, ret);
++
++	ret = waitpid(pid, &status, 0);
++	ASSERT_RETURN_VAL(ret >= 0, ret);
++
++	close(efd);
++
++	return status == EXIT_SUCCESS ? TEST_OK : TEST_ERR;
++}
++
++static int kdbus_clone_userns_test(const char *bus,
++				   const char *name,
++				   struct kdbus_conn **conn_db,
++				   int expected_status)
++{
++	pid_t pid;
++	int ret = 0;
++	int status;
++
++	pid = fork();
++	ASSERT_RETURN_VAL(pid >= 0, -errno);
++
++	if (pid == 0) {
++		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
++		if (ret < 0)
++			_exit(EXIT_FAILURE);
++
++		ret = __kdbus_clone_userns_test(bus, name, conn_db,
++						expected_status);
++		_exit(ret);
++	}
++
++	/*
++	 * Receive in the original (root privileged) user namespace,
++	 * must fail with -ETIMEDOUT.
++	 */
++	ret = kdbus_msg_recv_poll(conn_db[0], 100, NULL, NULL);
++	ASSERT_RETURN_VAL(ret == -ETIMEDOUT, ret);
++
++	ret = waitpid(pid, &status, 0);
++	ASSERT_RETURN_VAL(ret >= 0, ret);
++
++	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++int kdbus_test_policy_ns(struct kdbus_test_env *env)
++{
++	int i;
++	int ret;
++	struct kdbus_conn *activator = NULL;
++	struct kdbus_conn *policy_holder = NULL;
++	char *bus = env->buspath;
++
++	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
++	ASSERT_RETURN(ret >= 0);
++
++	/* not enough privileges, skip the test */
++	if (!ret)
++		return TEST_SKIP;
++
++	/* we require user-namespaces */
++	if (access("/proc/self/uid_map", F_OK) != 0)
++		return TEST_SKIP;
++
++	/* uids/gids must be mapped */
++	if (!all_uids_gids_are_mapped())
++		return TEST_SKIP;
++
++	conn_db = calloc(MAX_CONN, sizeof(struct kdbus_conn *));
++	ASSERT_RETURN(conn_db);
++
++	memset(conn_db, 0, MAX_CONN * sizeof(struct kdbus_conn *));
++
++	conn_db[0] = kdbus_hello(bus, 0, NULL, 0);
++	ASSERT_RETURN(conn_db[0]);
++
++	ret = kdbus_add_match_empty(conn_db[0]);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_fork_test_by_id(bus, conn_db, -EPERM, -EPERM);
++	ASSERT_EXIT(ret == 0);
++
++	ret = kdbus_register_policy_holder(bus, POLICY_NAME,
++					   &policy_holder);
++	ASSERT_RETURN(ret == 0);
++
++	/* Try to register the same name with an activator */
++	ret = kdbus_register_same_activator(bus, POLICY_NAME,
++					    &activator);
++	ASSERT_RETURN(ret == 0);
++
++	/* Acquire POLICY_NAME */
++	ret = kdbus_name_acquire(conn_db[0], POLICY_NAME, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_normal_test(bus, POLICY_NAME, conn_db);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_list(conn_db[0], KDBUS_LIST_NAMES |
++				     KDBUS_LIST_UNIQUE |
++				     KDBUS_LIST_ACTIVATORS |
++				     KDBUS_LIST_QUEUED);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_fork_test(bus, POLICY_NAME, conn_db, EXIT_SUCCESS);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * children connections are able to talk to conn_db[0] since
++	 * current POLICY_NAME TALK type is KDBUS_POLICY_ACCESS_WORLD,
++	 * so expect EXIT_SUCCESS when sending from child. However,
++	 * since the child's connection does not own any well-known
++	 * name, the parent connection conn_db[0] should fail with
++	 * -EPERM, but since it is a privileged bus user the TALK is
++	 * allowed.
++	 */
++	ret = kdbus_fork_test_by_id(bus, conn_db,
++				    EXIT_SUCCESS, EXIT_SUCCESS);
++	ASSERT_EXIT(ret == 0);
++
++	/*
++	 * Connections that can talk are perhaps being destroyed now.
++	 * Restrict the policy and purge cache entries where the
++	 * conn_db[0] is the destination.
++	 *
++	 * Now only connections with uid == 0 are allowed to talk.
++	 */
++	ret = kdbus_set_policy_talk(policy_holder, POLICY_NAME,
++				    geteuid(), KDBUS_POLICY_ACCESS_USER);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Testing connections (FORK+DROP) again:
++	 * After setting the policy re-check connections
++	 * we expect the children to fail with -EPERM
++	 */
++	ret = kdbus_fork_test(bus, POLICY_NAME, conn_db, -EPERM);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Now expect both parent and child to fail.
++	 *
++	 * Child should fail with -EPERM since we just restricted
++	 * the POLICY_NAME TALK to uid 0 and its uid is 65534.
++	 *
++	 * Since the parent's connection will timeout when receiving
++	 * from the child, we never continue. FWIW just put -EPERM.
++	 */
++	ret = kdbus_fork_test_by_id(bus, conn_db, -EPERM, -EPERM);
++	ASSERT_EXIT(ret == 0);
++
++	/* Check if the name can be reached in a new userns */
++	ret = kdbus_clone_userns_test(bus, POLICY_NAME, conn_db, -EPERM);
++	ASSERT_RETURN(ret == 0);
++
++	for (i = 0; i < MAX_CONN; i++)
++		kdbus_conn_free(conn_db[i]);
++
++	kdbus_conn_free(activator);
++	kdbus_conn_free(policy_holder);
++
++	free(conn_db);
++
++	return ret;
++}
+diff --git a/tools/testing/selftests/kdbus/test-policy-priv.c b/tools/testing/selftests/kdbus/test-policy-priv.c
+new file mode 100644
+index 0000000..0208638
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-policy-priv.c
+@@ -0,0 +1,1285 @@
++#include <errno.h>
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stdint.h>
++#include <stdbool.h>
++#include <unistd.h>
++#include <time.h>
++#include <sys/capability.h>
++#include <sys/eventfd.h>
++#include <sys/wait.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++static int test_policy_priv_by_id(const char *bus,
++				  struct kdbus_conn *conn_dst,
++				  bool drop_second_user,
++				  int parent_status,
++				  int child_status)
++{
++	int ret = 0;
++	uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
++
++	ASSERT_RETURN(conn_dst);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, bus, ({
++		ret = kdbus_msg_send(unpriv, NULL,
++				     expected_cookie, 0, 0, 0,
++				     conn_dst->id);
++		ASSERT_EXIT(ret == child_status);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	ret = kdbus_msg_recv_poll(conn_dst, 300, NULL, NULL);
++	ASSERT_RETURN(ret == parent_status);
++
++	return 0;
++}
++
++static int test_policy_priv_by_broadcast(const char *bus,
++					 struct kdbus_conn *conn_dst,
++					 int drop_second_user,
++					 int parent_status,
++					 int child_status)
++{
++	int efd;
++	int ret = 0;
++	eventfd_t event_status = 0;
++	struct kdbus_msg *msg = NULL;
++	uid_t second_uid = UNPRIV_UID;
++	gid_t second_gid = UNPRIV_GID;
++	struct kdbus_conn *child_2 = conn_dst;
++	uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
++
++	/* Drop to another unprivileged user other than UNPRIV_UID */
++	if (drop_second_user == DROP_OTHER_UNPRIV) {
++		second_uid = UNPRIV_UID - 1;
++		second_gid = UNPRIV_GID - 1;
++	}
++
++	/* child will signal parent to send broadcast */
++	efd = eventfd(0, EFD_CLOEXEC);
++	ASSERT_RETURN_VAL(efd >= 0, efd);
++
++	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
++		struct kdbus_conn *child;
++
++		child = kdbus_hello(bus, 0, NULL, 0);
++		ASSERT_EXIT(child);
++
++		ret = kdbus_add_match_empty(child);
++		ASSERT_EXIT(ret == 0);
++
++		/* signal parent */
++		ret = eventfd_write(efd, 1);
++		ASSERT_EXIT(ret == 0);
++
++		/* Use a slightly longer timeout */
++		ret = kdbus_msg_recv_poll(child, 500, &msg, NULL);
++		ASSERT_EXIT(ret == child_status);
++
++		/*
++		 * If we expect the child to get the broadcast
++		 * message, then check the received cookie.
++		 */
++		if (ret == 0) {
++			ASSERT_EXIT(expected_cookie == msg->cookie);
++		}
++
++		/* Use expected_cookie since 'msg' might be NULL */
++		ret = kdbus_msg_send(child, NULL, expected_cookie + 1,
++				     0, 0, 0, KDBUS_DST_ID_BROADCAST);
++		ASSERT_EXIT(ret == 0);
++
++		kdbus_msg_free(msg);
++		kdbus_conn_free(child);
++	}),
++	({
++		if (drop_second_user == DO_NOT_DROP) {
++			ASSERT_RETURN(child_2);
++
++			ret = eventfd_read(efd, &event_status);
++			ASSERT_RETURN(ret >= 0 && event_status == 1);
++
++			ret = kdbus_msg_send(child_2, NULL,
++					     expected_cookie, 0, 0, 0,
++					     KDBUS_DST_ID_BROADCAST);
++			ASSERT_RETURN(ret == 0);
++
++			/* drop own broadcast */
++			ret = kdbus_msg_recv(child_2, &msg, NULL);
++			ASSERT_RETURN(ret == 0);
++			ASSERT_RETURN(msg->src_id == child_2->id);
++			kdbus_msg_free(msg);
++
++			/* Use a slightly longer timeout */
++			ret = kdbus_msg_recv_poll(child_2, 1000,
++						  &msg, NULL);
++			ASSERT_RETURN(ret == parent_status);
++
++			/*
++			 * Check returned cookie in case we expect
++			 * success.
++			 */
++			if (ret == 0) {
++				ASSERT_RETURN(msg->cookie ==
++					      expected_cookie + 1);
++			}
++
++			kdbus_msg_free(msg);
++		} else {
++			/*
++			 * Two unprivileged users will try to
++			 * communicate using broadcast.
++			 */
++			ret = RUN_UNPRIVILEGED(second_uid, second_gid, ({
++				child_2 = kdbus_hello(bus, 0, NULL, 0);
++				ASSERT_EXIT(child_2);
++
++				ret = kdbus_add_match_empty(child_2);
++				ASSERT_EXIT(ret == 0);
++
++				ret = eventfd_read(efd, &event_status);
++				ASSERT_EXIT(ret >= 0 && event_status == 1);
++
++				ret = kdbus_msg_send(child_2, NULL,
++						expected_cookie, 0, 0, 0,
++						KDBUS_DST_ID_BROADCAST);
++				ASSERT_EXIT(ret == 0);
++
++				/* drop own broadcast */
++				ret = kdbus_msg_recv(child_2, &msg, NULL);
++				ASSERT_EXIT(ret == 0);
++				ASSERT_EXIT(msg->src_id == child_2->id);
++				kdbus_msg_free(msg);
++
++				/* Use a slightly longer timeout */
++				ret = kdbus_msg_recv_poll(child_2, 1000,
++							  &msg, NULL);
++				ASSERT_EXIT(ret == parent_status);
++
++				/*
++				 * Check returned cookie in case we expect
++				 * success.
++				 */
++				if (ret == 0) {
++					ASSERT_EXIT(msg->cookie ==
++						    expected_cookie + 1);
++				}
++
++				kdbus_msg_free(msg);
++				kdbus_conn_free(child_2);
++			}),
++			({ 0; }));
++			ASSERT_RETURN(ret == 0);
++		}
++	}));
++	ASSERT_RETURN(ret == 0);
++
++	close(efd);
++
++	return ret;
++}
++
++static void nosig(int sig)
++{
++}
++
++static int test_priv_before_policy_upload(struct kdbus_test_env *env)
++{
++	int ret = 0;
++	struct kdbus_conn *conn;
++
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn);
++
++	/*
++	 * Make sure unprivileged bus user cannot acquire names
++	 * before registering any policy holder.
++	 */
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++		ASSERT_EXIT(ret < 0);
++	}));
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Make sure unprivileged bus users cannot talk by default
++	 * to privileged ones, unless a policy holder that allows
++	 * this was uploaded.
++	 */
++
++	ret = test_policy_priv_by_id(env->buspath, conn, false,
++				     -ETIMEDOUT, -EPERM);
++	ASSERT_RETURN(ret == 0);
++
++	/* Activate matching for a privileged connection */
++	ret = kdbus_add_match_empty(conn);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * First make sure that BROADCAST with msg flag
++	 * KDBUS_MSG_EXPECT_REPLY will fail with -ENOTUNIQ
++	 */
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_msg_send(unpriv, NULL, 0xdeadbeef,
++				     KDBUS_MSG_EXPECT_REPLY,
++				     5000000000ULL, 0,
++				     KDBUS_DST_ID_BROADCAST);
++		ASSERT_EXIT(ret == -ENOTUNIQ);
++	}));
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Test broadcast with a privileged connection.
++	 *
++	 * The first unprivileged receiver should not get the
++	 * broadcast message sent by the privileged connection,
++	 * since there is no TALK policy that allows the
++	 * unprivileged to TALK to the privileged connection. It
++	 * will fail with -ETIMEDOUT
++	 *
++	 * Then second case:
++	 * The privileged connection should get the broadcast
++	 * message from the unprivileged one. Since the receiver is
++	 * a privileged bus user and it has default TALK access to
++	 * all connections it will receive those.
++	 */
++
++	ret = test_policy_priv_by_broadcast(env->buspath, conn,
++					    DO_NOT_DROP,
++					    0, -ETIMEDOUT);
++	ASSERT_RETURN(ret == 0);
++
++
++	/*
++	 * Test broadcast with two unprivileged connections running
++	 * under the same user.
++	 *
++	 * Both connections should succeed.
++	 */
++
++	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
++					    DROP_SAME_UNPRIV, 0, 0);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Test broadcast with two unprivileged connections running
++	 * under different users.
++	 *
++	 * Both connections will fail with -ETIMEDOUT.
++	 */
++
++	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
++					    DROP_OTHER_UNPRIV,
++					    -ETIMEDOUT, -ETIMEDOUT);
++	ASSERT_RETURN(ret == 0);
++
++	kdbus_conn_free(conn);
++
++	return ret;
++}
++
++static int test_broadcast_after_policy_upload(struct kdbus_test_env *env)
++{
++	int ret;
++	int efd;
++	eventfd_t event_status = 0;
++	struct kdbus_msg *msg = NULL;
++	struct kdbus_conn *owner_a, *owner_b;
++	struct kdbus_conn *holder_a, *holder_b;
++	struct kdbus_policy_access access = {};
++	uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
++
++	owner_a = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(owner_a);
++
++	ret = kdbus_name_acquire(owner_a, "com.example.broadcastA", NULL);
++	ASSERT_EXIT(ret >= 0);
++
++	/*
++	 * Make sure unprivileged bus users cannot talk by default
++	 * to privileged ones, unless a policy holder that allows
++	 * this was uploaded.
++	 */
++
++	++expected_cookie;
++	ret = test_policy_priv_by_id(env->buspath, owner_a, false,
++				     -ETIMEDOUT, -EPERM);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Make sure a privileged connection won't receive broadcasts
++	 * unless it installs a match. It will fail with -ETIMEDOUT.
++	 *
++	 * At the same time, check that the unprivileged connection
++	 * will not receive the broadcast message from the privileged
++	 * one, since the privileged one owns a name with a restricted
++	 * TALK policy (actually the TALK policy is still not
++	 * registered so we fail by default); thus the unprivileged
++	 * receiver is not able to TALK to that name.
++	 */
++
++	/* Activate matching for a privileged connection */
++	ret = kdbus_add_match_empty(owner_a);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Redo the previous test. The privileged conn owner_a is
++	 * able to TALK to any connection so it will receive the
++	 * broadcast message now.
++	 */
++
++	ret = test_policy_priv_by_broadcast(env->buspath, owner_a,
++					    DO_NOT_DROP,
++					    0, -ETIMEDOUT);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Test that broadcasts between two unprivileged connections
++	 * running under the same user still succeed.
++	 */
++
++	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
++					    DROP_SAME_UNPRIV, 0, 0);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Test broadcast with two unprivileged connections running
++	 * under different users.
++	 *
++	 * Both connections will fail with -ETIMEDOUT.
++	 */
++
++	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
++					    DROP_OTHER_UNPRIV,
++					    -ETIMEDOUT, -ETIMEDOUT);
++	ASSERT_RETURN(ret == 0);
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_USER,
++		.id = geteuid(),
++		.access = KDBUS_POLICY_OWN,
++	};
++
++	holder_a = kdbus_hello_registrar(env->buspath,
++					 "com.example.broadcastA",
++					 &access, 1,
++					 KDBUS_HELLO_POLICY_HOLDER);
++	ASSERT_RETURN(holder_a);
++
++	holder_b = kdbus_hello_registrar(env->buspath,
++					 "com.example.broadcastB",
++					 &access, 1,
++					 KDBUS_HELLO_POLICY_HOLDER);
++	ASSERT_RETURN(holder_b);
++
++	/* Free connections and their received messages and restart */
++	kdbus_conn_free(owner_a);
++
++	owner_a = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(owner_a);
++
++	/* Activate matching for a privileged connection */
++	ret = kdbus_add_match_empty(owner_a);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_name_acquire(owner_a, "com.example.broadcastA", NULL);
++	ASSERT_EXIT(ret >= 0);
++
++	owner_b = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(owner_b);
++
++	ret = kdbus_name_acquire(owner_b, "com.example.broadcastB", NULL);
++	ASSERT_EXIT(ret >= 0);
++
++	/* Activate matching for a privileged connection */
++	ret = kdbus_add_match_empty(owner_b);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Test that even if "com.example.broadcastA" and
++	 * "com.example.broadcastB" do have a TALK access by default
++	 * they are able to signal each other using broadcast, since
++	 * they are privileged connections: they receive all
++	 * broadcasts if the match allows it.
++	 */
++
++	++expected_cookie;
++	ret = kdbus_msg_send(owner_a, NULL, expected_cookie, 0,
++			     0, 0, KDBUS_DST_ID_BROADCAST);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv_poll(owner_a, 100, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->cookie == expected_cookie);
++
++	/* Check src ID */
++	ASSERT_RETURN(msg->src_id == owner_a->id);
++
++	kdbus_msg_free(msg);
++
++	ret = kdbus_msg_recv_poll(owner_b, 100, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++	ASSERT_RETURN(msg->cookie == expected_cookie);
++
++	/* Check src ID */
++	ASSERT_RETURN(msg->src_id == owner_a->id);
++
++	kdbus_msg_free(msg);
++
++	/* Release name "com.example.broadcastB" */
++
++	ret = kdbus_name_release(owner_b, "com.example.broadcastB");
++	ASSERT_EXIT(ret >= 0);
++
++	/* KDBUS_POLICY_OWN for unprivileged connections */
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_WORLD,
++		.id = geteuid(),
++		.access = KDBUS_POLICY_OWN,
++	};
++
++	/* Update the policy so unprivileged will own the name */
++
++	ret = kdbus_conn_update_policy(holder_b,
++				       "com.example.broadcastB",
++				       &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Send broadcasts from an unprivileged connection that
++	 * owns a name "com.example.broadcastB".
++	 *
++	 * We'll have four destinations here:
++	 *
++	 * 1) owner_a: privileged connection that owns
++	 * "com.example.broadcastA". It will receive the broadcast
++	 * since it is privileged, has default TALK access to all
++	 * connections, and is subscribed to the match.
++	 * Will succeed.
++	 *
++	 * 2) owner_b: privileged connection (running under a different
++	 * uid) that does not own names, but has an empty broadcast
++	 * match, so it will receive broadcasts since it has default
++	 * TALK access to all connections.
++	 *
++	 * 3) unpriv_a: unprivileged connection that does not own any
++	 * name. It will receive the broadcast since it is running
++	 * under the same user as the one broadcasting and did install
++	 * matches. It should get the message.
++	 *
++	 * 4) unpriv_b: unprivileged connection that is not interested
++	 * in broadcast messages, so it did not install broadcast
++	 * matches. Should fail with -ETIMEDOUT.
++	 */
++
++	++expected_cookie;
++	efd = eventfd(0, EFD_CLOEXEC);
++	ASSERT_RETURN_VAL(efd >= 0, efd);
++
++	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
++		struct kdbus_conn *unpriv_owner;
++		struct kdbus_conn *unpriv_a, *unpriv_b;
++
++		unpriv_owner = kdbus_hello(env->buspath, 0, NULL, 0);
++		ASSERT_EXIT(unpriv_owner);
++
++		unpriv_a = kdbus_hello(env->buspath, 0, NULL, 0);
++		ASSERT_EXIT(unpriv_a);
++
++		unpriv_b = kdbus_hello(env->buspath, 0, NULL, 0);
++		ASSERT_EXIT(unpriv_b);
++
++		ret = kdbus_name_acquire(unpriv_owner,
++					 "com.example.broadcastB",
++					 NULL);
++		ASSERT_EXIT(ret >= 0);
++
++		ret = kdbus_add_match_empty(unpriv_a);
++		ASSERT_EXIT(ret == 0);
++
++		/* Signal that we are doing broadcasts */
++		ret = eventfd_write(efd, 1);
++		ASSERT_EXIT(ret == 0);
++
++		/*
++		 * Do broadcast from a connection that owns the
++		 * name "com.example.broadcastB".
++		 */
++		ret = kdbus_msg_send(unpriv_owner, NULL,
++				     expected_cookie,
++				     0, 0, 0,
++				     KDBUS_DST_ID_BROADCAST);
++		ASSERT_EXIT(ret == 0);
++
++		/*
++		 * Unprivileged connection running under the same
++		 * user. It should succeed.
++		 */
++		ret = kdbus_msg_recv_poll(unpriv_a, 300, &msg, NULL);
++		ASSERT_EXIT(ret == 0 && msg->cookie == expected_cookie);
++
++		/*
++		 * Did not install matches, not interested in
++		 * broadcasts
++		 */
++		ret = kdbus_msg_recv_poll(unpriv_b, 300, NULL, NULL);
++		ASSERT_EXIT(ret == -ETIMEDOUT);
++	}),
++	({
++		ret = eventfd_read(efd, &event_status);
++		ASSERT_RETURN(ret >= 0 && event_status == 1);
++
++		/*
++		 * owner_a is privileged and installed a match, so
++		 * it receives the broadcast even though the TALK
++		 * access of "com.example.broadcastA" is restricted.
++		 */
++		ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
++		ASSERT_RETURN(ret == 0);
++
++		/* confirm the received cookie */
++		ASSERT_RETURN(msg->cookie == expected_cookie);
++
++		kdbus_msg_free(msg);
++
++		/*
++		 * owner_b got the broadcast from an unprivileged
++		 * connection.
++		 */
++		ret = kdbus_msg_recv_poll(owner_b, 300, &msg, NULL);
++		ASSERT_RETURN(ret == 0);
++
++		/* confirm the received cookie */
++		ASSERT_RETURN(msg->cookie == expected_cookie);
++
++		kdbus_msg_free(msg);
++
++	}));
++	ASSERT_RETURN(ret == 0);
++
++	close(efd);
++
++	/*
++	 * Test broadcast with two unprivileged connections running
++	 * under different users.
++	 *
++	 * Both connections will fail with -ETIMEDOUT.
++	 */
++
++	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
++					    DROP_OTHER_UNPRIV,
++					    -ETIMEDOUT, -ETIMEDOUT);
++	ASSERT_RETURN(ret == 0);
++
++	/* Drop broadcasts received by the privileged connections */
++	ret = kdbus_msg_recv_poll(owner_a, 100, NULL, NULL);
++	ret = kdbus_msg_recv_poll(owner_a, 100, NULL, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(owner_a, NULL, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	ret = kdbus_msg_recv_poll(owner_b, 100, NULL, NULL);
++	ret = kdbus_msg_recv_poll(owner_b, 100, NULL, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_msg_recv(owner_b, NULL, NULL);
++	ASSERT_RETURN(ret == -EAGAIN);
++
++	/*
++	 * Perform last tests, allow others to talk to name
++	 * "com.example.broadcastA". So now receiving broadcasts
++	 * from it should succeed since the TALK policy allows it.
++	 */
++
++	/* KDBUS_POLICY_TALK for unprivileged connections */
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_WORLD,
++		.id = geteuid(),
++		.access = KDBUS_POLICY_TALK,
++	};
++
++	ret = kdbus_conn_update_policy(holder_a,
++				       "com.example.broadcastA",
++				       &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Unprivileged is able to TALK to "com.example.broadcastA"
++	 * now so it will receive its broadcasts
++	 */
++	ret = test_policy_priv_by_broadcast(env->buspath, owner_a,
++					    DO_NOT_DROP, 0, 0);
++	ASSERT_RETURN(ret == 0);
++
++	++expected_cookie;
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "com.example.broadcastB",
++					 NULL);
++		ASSERT_EXIT(ret >= 0);
++		ret = kdbus_msg_send(unpriv, NULL, expected_cookie,
++				     0, 0, 0, KDBUS_DST_ID_BROADCAST);
++		ASSERT_EXIT(ret == 0);
++	}));
++	ASSERT_RETURN(ret == 0);
++
++	/* owner_a is privileged, so it will get the broadcast now. */
++	ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	/* confirm the received cookie */
++	ASSERT_RETURN(msg->cookie == expected_cookie);
++
++	kdbus_msg_free(msg);
++
++	/*
++	 * owner_a released name "com.example.broadcastA". It should
++	 * receive broadcasts since it is still privileged and has
++	 * the right match.
++	 *
++	 * Unprivileged connection will own a name and will try to
++	 * signal to the privileged connection.
++	 */
++
++	ret = kdbus_name_release(owner_a, "com.example.broadcastA");
++	ASSERT_EXIT(ret >= 0);
++
++	++expected_cookie;
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "com.example.broadcastB",
++					 NULL);
++		ASSERT_EXIT(ret >= 0);
++		ret = kdbus_msg_send(unpriv, NULL, expected_cookie,
++				     0, 0, 0, KDBUS_DST_ID_BROADCAST);
++		ASSERT_EXIT(ret == 0);
++	}));
++	ASSERT_RETURN(ret == 0);
++
++	/* owner_a will get the broadcast now. */
++	ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
++	ASSERT_RETURN(ret == 0);
++
++	/* confirm the received cookie */
++	ASSERT_RETURN(msg->cookie == expected_cookie);
++
++	kdbus_msg_free(msg);
++
++	kdbus_conn_free(owner_a);
++	kdbus_conn_free(owner_b);
++	kdbus_conn_free(holder_a);
++	kdbus_conn_free(holder_b);
++
++	return 0;
++}
++
++static int test_policy_priv(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *conn_a, *conn_b, *conn, *owner;
++	struct kdbus_policy_access access, *acc;
++	sigset_t sset;
++	size_t num;
++	int ret;
++
++	/*
++	 * Make sure we have CAP_SETUID/SETGID so we can drop privileges
++	 */
++
++	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
++	ASSERT_RETURN(ret >= 0);
++
++	if (!ret)
++		return TEST_SKIP;
++
++	/* make sure that uids and gids are mapped */
++	if (!all_uids_gids_are_mapped())
++		return TEST_SKIP;
++
++	/*
++	 * Setup:
++	 *  conn_a: policy holder for com.example.a
++	 *  conn_b: name holder of com.example.b
++	 */
++
++	signal(SIGUSR1, nosig);
++	sigemptyset(&sset);
++	sigaddset(&sset, SIGUSR1);
++	sigprocmask(SIG_BLOCK, &sset, NULL);
++
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn);
++
++	/*
++	 * Before registering any policy holder, make sure that the
++	 * bus is secure by default. This test is necessary: it catches
++	 * several cases where old D-Bus was vulnerable.
++	 */
++
++	ret = test_priv_before_policy_upload(env);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Make sure unprivileged users are not able to register policy
++	 * holders.
++	 */
++
++	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
++		struct kdbus_conn *holder;
++
++		holder = kdbus_hello_registrar(env->buspath,
++					       "com.example.a", NULL, 0,
++					       KDBUS_HELLO_POLICY_HOLDER);
++		ASSERT_EXIT(holder == NULL && errno == EPERM);
++	}),
++	({ 0; }));
++	ASSERT_RETURN(ret == 0);
++
++
++	/* Register policy holder */
++
++	conn_a = kdbus_hello_registrar(env->buspath, "com.example.a",
++				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
++	ASSERT_RETURN(conn_a);
++
++	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn_b);
++
++	ret = kdbus_name_acquire(conn_b, "com.example.b", NULL);
++	ASSERT_EXIT(ret >= 0);
++
++	/*
++	 * Make sure bus-owners can always acquire names.
++	 */
++	ret = kdbus_name_acquire(conn, "com.example.a", NULL);
++	ASSERT_EXIT(ret >= 0);
++
++	kdbus_conn_free(conn);
++
++	/*
++	 * Make sure unprivileged users cannot acquire names with default
++	 * policy assigned.
++	 */
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++		ASSERT_EXIT(ret < 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	/*
++	 * Make sure unprivileged users can acquire names if we make them
++	 * world-accessible.
++	 */
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_WORLD,
++		.id = 0,
++		.access = KDBUS_POLICY_OWN,
++	};
++
++	/*
++	 * Make sure unprivileged/normal connections are not able
++	 * to update policies
++	 */
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_conn_update_policy(unpriv, "com.example.a",
++					       &access, 1);
++		ASSERT_EXIT(ret == -EOPNOTSUPP);
++	}));
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++		ASSERT_EXIT(ret >= 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	/*
++	 * Make sure unprivileged users can acquire names if we make them
++	 * gid-accessible. But only if the gid matches.
++	 */
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_GROUP,
++		.id = UNPRIV_GID,
++		.access = KDBUS_POLICY_OWN,
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++		ASSERT_EXIT(ret >= 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_GROUP,
++		.id = 1,
++		.access = KDBUS_POLICY_OWN,
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++		ASSERT_EXIT(ret < 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	/*
++	 * Make sure unprivileged users can acquire names if we make them
++	 * uid-accessible. But only if the uid matches.
++	 */
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_USER,
++		.id = UNPRIV_UID,
++		.access = KDBUS_POLICY_OWN,
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++		ASSERT_EXIT(ret >= 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_USER,
++		.id = 1,
++		.access = KDBUS_POLICY_OWN,
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++		ASSERT_EXIT(ret < 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	/*
++	 * Make sure unprivileged users cannot acquire names if no owner-policy
++	 * matches, even if SEE/TALK policies match.
++	 */
++
++	num = 4;
++	acc = (struct kdbus_policy_access[]){
++		{
++			.type = KDBUS_POLICY_ACCESS_GROUP,
++			.id = UNPRIV_GID,
++			.access = KDBUS_POLICY_SEE,
++		},
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = UNPRIV_UID,
++			.access = KDBUS_POLICY_TALK,
++		},
++		{
++			.type = KDBUS_POLICY_ACCESS_WORLD,
++			.id = 0,
++			.access = KDBUS_POLICY_TALK,
++		},
++		{
++			.type = KDBUS_POLICY_ACCESS_WORLD,
++			.id = 0,
++			.access = KDBUS_POLICY_SEE,
++		},
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.a", acc, num);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++		ASSERT_EXIT(ret < 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	/*
++	 * Make sure unprivileged users can acquire names if the only matching
++	 * policy is somewhere in the middle.
++	 */
++
++	num = 5;
++	acc = (struct kdbus_policy_access[]){
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = 1,
++			.access = KDBUS_POLICY_OWN,
++		},
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = 2,
++			.access = KDBUS_POLICY_OWN,
++		},
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = UNPRIV_UID,
++			.access = KDBUS_POLICY_OWN,
++		},
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = 3,
++			.access = KDBUS_POLICY_OWN,
++		},
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = 4,
++			.access = KDBUS_POLICY_OWN,
++		},
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.a", acc, num);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++		ASSERT_EXIT(ret >= 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	/*
++	 * Clear policies
++	 */
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.a", NULL, 0);
++	ASSERT_RETURN(ret == 0);
++
++	/*
++	 * Make sure privileged bus users can _always_ talk to others.
++	 */
++
++	conn = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn);
++
++	ret = kdbus_msg_send(conn, "com.example.b", 0xdeadbeef, 0, 0, 0, 0);
++	ASSERT_EXIT(ret >= 0);
++
++	ret = kdbus_msg_recv_poll(conn_b, 300, NULL, NULL);
++	ASSERT_EXIT(ret >= 0);
++
++	kdbus_conn_free(conn);
++
++	/*
++	 * Make sure unprivileged bus users cannot talk by default.
++	 */
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++				     0, 0);
++		ASSERT_EXIT(ret == -EPERM);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	/*
++	 * Make sure unprivileged bus users can talk to equals, even without
++	 * policy.
++	 */
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_USER,
++		.id = UNPRIV_UID,
++		.access = KDBUS_POLICY_OWN,
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.c", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		struct kdbus_conn *owner;
++
++		owner = kdbus_hello(env->buspath, 0, NULL, 0);
++		ASSERT_RETURN(owner);
++
++		ret = kdbus_name_acquire(owner, "com.example.c", NULL);
++		ASSERT_EXIT(ret >= 0);
++
++		ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
++				     0, 0);
++		ASSERT_EXIT(ret >= 0);
++		ret = kdbus_msg_recv_poll(owner, 100, NULL, NULL);
++		ASSERT_EXIT(ret >= 0);
++
++		kdbus_conn_free(owner);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	/*
++	 * Make sure unprivileged bus users can talk to privileged users if a
++	 * suitable UID policy is set.
++	 */
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_USER,
++		.id = UNPRIV_UID,
++		.access = KDBUS_POLICY_TALK,
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++				     0, 0);
++		ASSERT_EXIT(ret >= 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
++	ASSERT_EXIT(ret >= 0);
++
++	/*
++	 * Make sure unprivileged bus users can talk to privileged users if a
++	 * suitable GID policy is set.
++	 */
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_GROUP,
++		.id = UNPRIV_GID,
++		.access = KDBUS_POLICY_TALK,
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++				     0, 0);
++		ASSERT_EXIT(ret >= 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
++	ASSERT_EXIT(ret >= 0);
++
++	/*
++	 * Make sure unprivileged bus users can talk to privileged users if a
++	 * suitable WORLD policy is set.
++	 */
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_WORLD,
++		.id = 0,
++		.access = KDBUS_POLICY_TALK,
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++				     0, 0);
++		ASSERT_EXIT(ret >= 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
++	ASSERT_EXIT(ret >= 0);
++
++	/*
++	 * Make sure unprivileged bus users cannot talk to privileged users if
++	 * no suitable policy is set.
++	 */
++
++	num = 5;
++	acc = (struct kdbus_policy_access[]){
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = 0,
++			.access = KDBUS_POLICY_OWN,
++		},
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = 1,
++			.access = KDBUS_POLICY_TALK,
++		},
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = UNPRIV_UID,
++			.access = KDBUS_POLICY_SEE,
++		},
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = 3,
++			.access = KDBUS_POLICY_TALK,
++		},
++		{
++			.type = KDBUS_POLICY_ACCESS_USER,
++			.id = 4,
++			.access = KDBUS_POLICY_TALK,
++		},
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.b", acc, num);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++				     0, 0);
++		ASSERT_EXIT(ret == -EPERM);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	/*
++	 * Make sure unprivileged bus users can talk to privileged users if a
++	 * suitable OWN privilege overwrites TALK.
++	 */
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_WORLD,
++		.id = 0,
++		.access = KDBUS_POLICY_OWN,
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++				     0, 0);
++		ASSERT_EXIT(ret >= 0);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
++	ASSERT_EXIT(ret >= 0);
++
++	/*
++	 * Make sure the TALK cache is reset correctly when policies are
++	 * updated.
++	 */
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_WORLD,
++		.id = 0,
++		.access = KDBUS_POLICY_TALK,
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++				     0, 0);
++		ASSERT_EXIT(ret >= 0);
++
++		ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
++		ASSERT_EXIT(ret >= 0);
++
++		ret = kdbus_conn_update_policy(conn_a, "com.example.b",
++					       NULL, 0);
++		ASSERT_RETURN(ret == 0);
++
++		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++				     0, 0);
++		ASSERT_EXIT(ret == -EPERM);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++	/*
++	 * Make sure the TALK cache is reset correctly when policy holders
++	 * disconnect.
++	 */
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_WORLD,
++		.id = 0,
++		.access = KDBUS_POLICY_OWN,
++	};
++
++	conn = kdbus_hello_registrar(env->buspath, "com.example.c",
++				     NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
++	ASSERT_RETURN(conn);
++
++	ret = kdbus_conn_update_policy(conn, "com.example.c", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	owner = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(owner);
++
++	ret = kdbus_name_acquire(owner, "com.example.c", NULL);
++	ASSERT_RETURN(ret >= 0);
++
++	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
++		struct kdbus_conn *unpriv;
++
++		/* wait for parent to be finished */
++		sigemptyset(&sset);
++		ret = sigsuspend(&sset);
++		ASSERT_RETURN(ret == -1 && errno == EINTR);
++
++		unpriv = kdbus_hello(env->buspath, 0, NULL, 0);
++		ASSERT_RETURN(unpriv);
++
++		ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
++				     0, 0);
++		ASSERT_EXIT(ret >= 0);
++
++		ret = kdbus_msg_recv_poll(owner, 100, NULL, NULL);
++		ASSERT_EXIT(ret >= 0);
++
++		/* free policy holder */
++		kdbus_conn_free(conn);
++
++		ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
++				     0, 0);
++		ASSERT_EXIT(ret == -EPERM);
++
++		kdbus_conn_free(unpriv);
++	}), ({
++		/* make sure policy holder is only valid in child */
++		kdbus_conn_free(conn);
++		kill(pid, SIGUSR1);
++	}));
++	ASSERT_RETURN(ret >= 0);
++
++
++	/*
++	 * Repeat the broadcast tests now that policy holders are registered.
++	 */
++
++	ret = test_broadcast_after_policy_upload(env);
++	ASSERT_RETURN(ret == 0);
++
++	kdbus_conn_free(owner);
++
++	/*
++	 * cleanup resources
++	 */
++
++	kdbus_conn_free(conn_b);
++	kdbus_conn_free(conn_a);
++
++	return TEST_OK;
++}
++
++int kdbus_test_policy_priv(struct kdbus_test_env *env)
++{
++	pid_t pid;
++	int ret;
++
++	/* make sure to exit() if a child returns from fork() */
++	pid = getpid();
++	ret = test_policy_priv(env);
++	if (pid != getpid())
++		exit(1);
++
++	return ret;
++}
+diff --git a/tools/testing/selftests/kdbus/test-policy.c b/tools/testing/selftests/kdbus/test-policy.c
+new file mode 100644
+index 0000000..96d20d5
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-policy.c
+@@ -0,0 +1,80 @@
++#include <errno.h>
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stdint.h>
++#include <stdbool.h>
++#include <unistd.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++int kdbus_test_policy(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *conn_a, *conn_b;
++	struct kdbus_policy_access access;
++	int ret;
++
++	/* Invalid name */
++	conn_a = kdbus_hello_registrar(env->buspath, ".example.a",
++				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
++	ASSERT_RETURN(conn_a == NULL);
++
++	conn_a = kdbus_hello_registrar(env->buspath, "example",
++				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
++	ASSERT_RETURN(conn_a == NULL);
++
++	conn_a = kdbus_hello_registrar(env->buspath, "com.example.a",
++				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
++	ASSERT_RETURN(conn_a);
++
++	conn_b = kdbus_hello_registrar(env->buspath, "com.example.b",
++				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
++	ASSERT_RETURN(conn_b);
++
++	/*
++	 * Verify there cannot be any duplicate entries, except for specific vs.
++	 * wildcard entries.
++	 */
++
++	access = (struct kdbus_policy_access){
++		.type = KDBUS_POLICY_ACCESS_USER,
++		.id = geteuid(),
++		.access = KDBUS_POLICY_SEE,
++	};
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_conn_update_policy(conn_b, "com.example.a", &access, 1);
++	ASSERT_RETURN(ret == -EEXIST);
++
++	ret = kdbus_conn_update_policy(conn_b, "com.example.a.*", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.a.*", &access, 1);
++	ASSERT_RETURN(ret == -EEXIST);
++
++	ret = kdbus_conn_update_policy(conn_a, "com.example.*", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_conn_update_policy(conn_b, "com.example.a", &access, 1);
++	ASSERT_RETURN(ret == 0);
++
++	ret = kdbus_conn_update_policy(conn_b, "com.example.*", &access, 1);
++	ASSERT_RETURN(ret == -EEXIST);
++
++	/* Invalid name */
++	ret = kdbus_conn_update_policy(conn_b, ".example.*", &access, 1);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	ret = kdbus_conn_update_policy(conn_b, "example", &access, 1);
++	ASSERT_RETURN(ret == -EINVAL);
++
++	kdbus_conn_free(conn_b);
++	kdbus_conn_free(conn_a);
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-sync.c b/tools/testing/selftests/kdbus/test-sync.c
+new file mode 100644
+index 0000000..0655a54
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-sync.c
+@@ -0,0 +1,369 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <pthread.h>
++#include <stdbool.h>
++#include <signal.h>
++#include <sys/wait.h>
++#include <sys/eventfd.h>
++
++#include "kdbus-api.h"
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++static struct kdbus_conn *conn_a, *conn_b;
++static unsigned int cookie = 0xdeadbeef;
++
++static void nop_handler(int sig) {}
++
++static int interrupt_sync(struct kdbus_conn *conn_src,
++			  struct kdbus_conn *conn_dst)
++{
++	pid_t pid;
++	int ret, status;
++	struct kdbus_msg *msg = NULL;
++	struct sigaction sa = {
++		.sa_handler = nop_handler,
++		.sa_flags = SA_NOCLDSTOP|SA_RESTART,
++	};
++
++	cookie++;
++	pid = fork();
++	ASSERT_RETURN_VAL(pid >= 0, pid);
++
++	if (pid == 0) {
++		ret = sigaction(SIGINT, &sa, NULL);
++		ASSERT_EXIT(ret == 0);
++
++		ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
++					  KDBUS_MSG_EXPECT_REPLY,
++					  100000000ULL, 0, conn_src->id, -1);
++		ASSERT_EXIT(ret == -ETIMEDOUT);
++
++		_exit(EXIT_SUCCESS);
++	}
++
++	ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
++	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
++
++	kdbus_msg_free(msg);
++
++	ret = kill(pid, SIGINT);
++	ASSERT_RETURN_VAL(ret == 0, ret);
++
++	ret = waitpid(pid, &status, 0);
++	ASSERT_RETURN_VAL(ret >= 0, ret);
++
++	if (WIFSIGNALED(status))
++		return TEST_ERR;
++
++	ret = kdbus_msg_recv_poll(conn_src, 100, NULL, NULL);
++	ASSERT_RETURN(ret == -ETIMEDOUT);
++
++	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++static int close_epipe_sync(const char *bus)
++{
++	pid_t pid;
++	int ret, status;
++	struct kdbus_conn *conn_src;
++	struct kdbus_conn *conn_dst;
++	struct kdbus_msg *msg = NULL;
++
++	conn_src = kdbus_hello(bus, 0, NULL, 0);
++	ASSERT_RETURN(conn_src);
++
++	ret = kdbus_add_match_empty(conn_src);
++	ASSERT_RETURN(ret == 0);
++
++	conn_dst = kdbus_hello(bus, 0, NULL, 0);
++	ASSERT_RETURN(conn_dst);
++
++	cookie++;
++	pid = fork();
++	ASSERT_RETURN_VAL(pid >= 0, pid);
++
++	if (pid == 0) {
++		uint64_t dst_id;
++
++		/* close our reference */
++		dst_id = conn_dst->id;
++		kdbus_conn_free(conn_dst);
++
++		ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
++		ASSERT_EXIT(ret == 0 && msg->cookie == cookie);
++		ASSERT_EXIT(msg->src_id == dst_id);
++
++		cookie++;
++		ret = kdbus_msg_send_sync(conn_src, NULL, cookie,
++					  KDBUS_MSG_EXPECT_REPLY,
++					  100000000ULL, 0, dst_id, -1);
++		ASSERT_EXIT(ret == -EPIPE);
++
++		_exit(EXIT_SUCCESS);
++	}
++
++	ret = kdbus_msg_send(conn_dst, NULL, cookie, 0, 0, 0,
++			     KDBUS_DST_ID_BROADCAST);
++	ASSERT_RETURN(ret == 0);
++
++	cookie++;
++	ret = kdbus_msg_recv_poll(conn_dst, 100, &msg, NULL);
++	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
++
++	kdbus_msg_free(msg);
++
++	/* destroy connection */
++	kdbus_conn_free(conn_dst);
++	kdbus_conn_free(conn_src);
++
++	ret = waitpid(pid, &status, 0);
++	ASSERT_RETURN_VAL(ret >= 0, ret);
++
++	if (!WIFEXITED(status))
++		return TEST_ERR;
++
++	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++static int cancel_fd_sync(struct kdbus_conn *conn_src,
++			  struct kdbus_conn *conn_dst)
++{
++	pid_t pid;
++	int cancel_fd;
++	int ret, status;
++	uint64_t counter = 1;
++	struct kdbus_msg *msg = NULL;
++
++	cancel_fd = eventfd(0, 0);
++	ASSERT_RETURN_VAL(cancel_fd >= 0, cancel_fd);
++
++	cookie++;
++	pid = fork();
++	ASSERT_RETURN_VAL(pid >= 0, pid);
++
++	if (pid == 0) {
++		ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
++					  KDBUS_MSG_EXPECT_REPLY,
++					  100000000ULL, 0, conn_src->id,
++					  cancel_fd);
++		ASSERT_EXIT(ret == -ECANCELED);
++
++		_exit(EXIT_SUCCESS);
++	}
++
++	ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
++	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
++
++	kdbus_msg_free(msg);
++
++	ret = write(cancel_fd, &counter, sizeof(counter));
++	ASSERT_RETURN(ret == sizeof(counter));
++
++	ret = waitpid(pid, &status, 0);
++	ASSERT_RETURN_VAL(ret >= 0, ret);
++
++	if (WIFSIGNALED(status))
++		return TEST_ERR;
++
++	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++static int no_cancel_sync(struct kdbus_conn *conn_src,
++			  struct kdbus_conn *conn_dst)
++{
++	pid_t pid;
++	int cancel_fd;
++	int ret, status;
++	struct kdbus_msg *msg = NULL;
++
++	/* pass eventfd, but never signal it so it shouldn't have any effect */
++
++	cancel_fd = eventfd(0, 0);
++	ASSERT_RETURN_VAL(cancel_fd >= 0, cancel_fd);
++
++	cookie++;
++	pid = fork();
++	ASSERT_RETURN_VAL(pid >= 0, pid);
++
++	if (pid == 0) {
++		ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
++					  KDBUS_MSG_EXPECT_REPLY,
++					  100000000ULL, 0, conn_src->id,
++					  cancel_fd);
++		ASSERT_EXIT(ret == 0);
++
++		_exit(EXIT_SUCCESS);
++	}
++
++	ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
++	ASSERT_RETURN_VAL(ret == 0 && msg->cookie == cookie, -1);
++
++	kdbus_msg_free(msg);
++
++	ret = kdbus_msg_send_reply(conn_src, cookie, conn_dst->id);
++	ASSERT_RETURN_VAL(ret >= 0, ret);
++
++	ret = waitpid(pid, &status, 0);
++	ASSERT_RETURN_VAL(ret >= 0, ret);
++
++	if (WIFSIGNALED(status))
++		return -1;
++
++	return (status == EXIT_SUCCESS) ? 0 : -1;
++}
++
++static void *run_thread_reply(void *data)
++{
++	int ret;
++	unsigned long status = TEST_OK;
++
++	ret = kdbus_msg_recv_poll(conn_a, 3000, NULL, NULL);
++	if (ret < 0)
++		goto exit_thread;
++
++	kdbus_printf("Thread received message, sending reply ...\n");
++
++	/* using an unknown cookie must fail */
++	ret = kdbus_msg_send_reply(conn_a, ~cookie, conn_b->id);
++	if (ret != -EBADSLT) {
++		status = TEST_ERR;
++		goto exit_thread;
++	}
++
++	ret = kdbus_msg_send_reply(conn_a, cookie, conn_b->id);
++	if (ret != 0) {
++		status = TEST_ERR;
++		goto exit_thread;
++	}
++
++exit_thread:
++	pthread_exit((void *) status);
++	return (void *) status;
++}
++
++int kdbus_test_sync_reply(struct kdbus_test_env *env)
++{
++	unsigned long status;
++	pthread_t thread;
++	int ret;
++
++	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
++	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn_a && conn_b);
++
++	pthread_create(&thread, NULL, run_thread_reply, NULL);
++
++	ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
++				  KDBUS_MSG_EXPECT_REPLY,
++				  5000000000ULL, 0, conn_a->id, -1);
++
++	pthread_join(thread, (void *) &status);
++	ASSERT_RETURN(status == 0);
++	ASSERT_RETURN(ret == 0);
++
++	ret = interrupt_sync(conn_a, conn_b);
++	ASSERT_RETURN(ret == 0);
++
++	ret = close_epipe_sync(env->buspath);
++	ASSERT_RETURN(ret == 0);
++
++	ret = cancel_fd_sync(conn_a, conn_b);
++	ASSERT_RETURN(ret == 0);
++
++	ret = no_cancel_sync(conn_a, conn_b);
++	ASSERT_RETURN(ret == 0);
++
++	kdbus_printf("-- closing bus connections\n");
++
++	kdbus_conn_free(conn_a);
++	kdbus_conn_free(conn_b);
++
++	return TEST_OK;
++}
++
++#define BYEBYE_ME ((void*)0L)
++#define BYEBYE_THEM ((void*)1L)
++
++static void *run_thread_byebye(void *data)
++{
++	struct kdbus_cmd cmd_byebye = { .size = sizeof(cmd_byebye) };
++	int ret;
++
++	ret = kdbus_msg_recv_poll(conn_a, 3000, NULL, NULL);
++	if (ret == 0) {
++		kdbus_printf("Thread received message, invoking BYEBYE ...\n");
++		kdbus_msg_recv(conn_a, NULL, NULL);
++		if (data == BYEBYE_ME)
++			kdbus_cmd_byebye(conn_b->fd, &cmd_byebye);
++		else if (data == BYEBYE_THEM)
++			kdbus_cmd_byebye(conn_a->fd, &cmd_byebye);
++	}
++
++	pthread_exit(NULL);
++	return NULL;
++}
++
++int kdbus_test_sync_byebye(struct kdbus_test_env *env)
++{
++	pthread_t thread;
++	int ret;
++
++	/*
++	 * This sends a synchronous message to a thread, which waits until it
++	 * has received the message and then invokes BYEBYE on the *ORIGINAL*
++	 * connection. That is, on the same connection that synchronously waits
++	 * for a reply.
++	 * This should properly wake the connection up and cause ECONNRESET as
++	 * the connection is disconnected now.
++	 *
++	 * The second time, we do the same but invoke BYEBYE on the *TARGET*
++	 * connection. This should also wake up the synchronous sender as the
++	 * reply cannot be sent by a disconnected target.
++	 */
++
++	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
++	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn_a && conn_b);
++
++	pthread_create(&thread, NULL, run_thread_byebye, BYEBYE_ME);
++
++	ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
++				  KDBUS_MSG_EXPECT_REPLY,
++				  5000000000ULL, 0, conn_a->id, -1);
++
++	ASSERT_RETURN(ret == -ECONNRESET);
++
++	pthread_join(thread, NULL);
++
++	kdbus_conn_free(conn_a);
++	kdbus_conn_free(conn_b);
++
++	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
++	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn_a && conn_b);
++
++	pthread_create(&thread, NULL, run_thread_byebye, BYEBYE_THEM);
++
++	ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
++				  KDBUS_MSG_EXPECT_REPLY,
++				  5000000000ULL, 0, conn_a->id, -1);
++
++	ASSERT_RETURN(ret == -EPIPE);
++
++	pthread_join(thread, NULL);
++
++	kdbus_conn_free(conn_a);
++	kdbus_conn_free(conn_b);
++
++	return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-timeout.c b/tools/testing/selftests/kdbus/test-timeout.c
+new file mode 100644
+index 0000000..cfd1930
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-timeout.c
+@@ -0,0 +1,99 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <poll.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++int timeout_msg_recv(struct kdbus_conn *conn, uint64_t *expected)
++{
++	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
++	struct kdbus_msg *msg;
++	int ret;
++
++	ret = kdbus_cmd_recv(conn->fd, &recv);
++	if (ret < 0) {
++		kdbus_printf("error receiving message: %d (%m)\n", ret);
++		return ret;
++	}
++
++	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++
++	ASSERT_RETURN_VAL(msg->payload_type == KDBUS_PAYLOAD_KERNEL, -EINVAL);
++	ASSERT_RETURN_VAL(msg->src_id == KDBUS_SRC_ID_KERNEL, -EINVAL);
++	ASSERT_RETURN_VAL(msg->dst_id == conn->id, -EINVAL);
++
++	*expected &= ~(1ULL << msg->cookie_reply);
++	kdbus_printf("Got message timeout for cookie %llu\n",
++		     msg->cookie_reply);
++
++	ret = kdbus_free(conn, recv.msg.offset);
++	if (ret < 0)
++		return ret;
++
++	return 0;
++}
++
++int kdbus_test_timeout(struct kdbus_test_env *env)
++{
++	struct kdbus_conn *conn_a, *conn_b;
++	struct pollfd fd;
++	int ret, i, n_msgs = 4;
++	uint64_t expected = 0;
++	uint64_t cookie = 0xdeadbeef;
++
++	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
++	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++	ASSERT_RETURN(conn_a && conn_b);
++
++	fd.fd = conn_b->fd;
++
++	/*
++	 * send messages that expect a reply (within (i + 1) * 100 msec),
++	 * but never answer them.
++	 */
++	for (i = 0; i < n_msgs; i++, cookie++) {
++		kdbus_printf("Sending message with cookie %llu ...\n",
++			     (unsigned long long)cookie);
++		ASSERT_RETURN(kdbus_msg_send(conn_b, NULL, cookie,
++			      KDBUS_MSG_EXPECT_REPLY,
++			      (i + 1) * 100ULL * 1000000ULL, 0,
++			      conn_a->id) == 0);
++		expected |= 1ULL << cookie;
++	}
++
++	for (;;) {
++		fd.events = POLLIN | POLLPRI | POLLHUP;
++		fd.revents = 0;
++
++		ret = poll(&fd, 1, (n_msgs + 1) * 100);
++		if (ret == 0)
++			kdbus_printf("--- timeout\n");
++		if (ret <= 0)
++			break;
++
++		if (fd.revents & POLLIN)
++			ASSERT_RETURN(!timeout_msg_recv(conn_b, &expected));
++
++		if (expected == 0)
++			break;
++	}
++
++	ASSERT_RETURN(expected == 0);
++
++	kdbus_conn_free(conn_a);
++	kdbus_conn_free(conn_b);
++
++	return TEST_OK;
++}

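Note on the timeout test above: it tracks outstanding replies in a 64-bit
mask and uses the raw cookie as the shift amount. With cookies starting at
0xdeadbeef the shift count exceeds 63, which C leaves undefined. A minimal
sketch of one way to keep the index in range follows, assuming the cookies
are consecutive from a known first value. The helpers pending_set(),
pending_clear() and demo_pending() are hypothetical names used only for
illustration; they are not part of the kdbus selftests.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not part of the kdbus selftests): track outstanding
 * replies in a 64-bit mask, one bit per cookie, indexed relative to the
 * first cookie so that the shift amount always stays below 64.
 */
int pending_set(uint64_t *pending, uint64_t base, uint64_t cookie)
{
	uint64_t idx = cookie - base;	/* wraps to a huge value if cookie < base */

	if (idx >= 64)
		return -ERANGE;
	*pending |= UINT64_C(1) << idx;
	return 0;
}

int pending_clear(uint64_t *pending, uint64_t base, uint64_t cookie)
{
	uint64_t idx = cookie - base;

	if (idx >= 64)
		return -ERANGE;
	*pending &= ~(UINT64_C(1) << idx);
	return 0;
}

/* Mark four cookies starting at 0xdeadbeef, then clear all but the third. */
uint64_t demo_pending(void)
{
	const uint64_t base = 0xdeadbeef;
	uint64_t pending = 0;
	uint64_t i;

	for (i = 0; i < 4; i++)
		if (pending_set(&pending, base, base + i) < 0)
			return UINT64_MAX;
	assert(pending == 0xf);

	if (pending_clear(&pending, base, base) < 0 ||
	    pending_clear(&pending, base, base + 1) < 0 ||
	    pending_clear(&pending, base, base + 3) < 0)
		return UINT64_MAX;

	return pending;	/* only bit 2 (cookie base + 2) is still set */
}
```

The range check turns an out-of-range cookie into an explicit -ERANGE
instead of a silently wrong mask; whether the selftest should adopt
something like this is a judgment call for its maintainers.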

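Note on cancel_fd_sync() and no_cancel_sync() above: both pass an eventfd to
the synchronous send and expect -ECANCELED only once that eventfd has been
written. The kernel side of that logic is not shown in this part of the
patch, so the following is only a userspace analogy of the same contract,
built on poll(2) and eventfd(2). The names wait_cancellable() and
demo_cancel() are hypothetical and do not exist in kdbus.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for data_fd to become readable,
 * but report -ECANCELED as soon as cancel_fd (an eventfd) has been signalled.
 */
int wait_cancellable(int data_fd, int cancel_fd, int timeout_ms)
{
	struct pollfd fds[2] = {
		{ .fd = data_fd, .events = POLLIN },
		{ .fd = cancel_fd, .events = POLLIN },
	};
	int ret;

	ret = poll(fds, 2, timeout_ms);
	if (ret < 0)
		return -errno;
	if (ret == 0)
		return -ETIMEDOUT;
	if (fds[1].revents & POLLIN)	/* cancellation wins over data */
		return -ECANCELED;
	return 0;
}

/* Signal the eventfd while no data is pending; the wait must be cancelled. */
int demo_cancel(void)
{
	uint64_t one = 1;
	int pipefd[2];
	int cancel_fd, ret;

	if (pipe(pipefd) < 0)
		return -errno;

	cancel_fd = eventfd(0, 0);
	if (cancel_fd < 0) {
		ret = -errno;
		goto out_pipe;
	}

	/* nothing is ever written to the pipe, so only the cancel can wake us */
	if (write(cancel_fd, &one, sizeof(one)) != (ssize_t)sizeof(one))
		ret = -EIO;
	else
		ret = wait_cancellable(pipefd[0], cancel_fd, 1000);

	close(cancel_fd);
out_pipe:
	close(pipefd[0]);
	close(pipefd[1]);
	return ret;
}
```

An eventfd that is passed but never written stays unreadable, so the wait
ends normally or by timeout, which mirrors what no_cancel_sync() checks.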

* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-02 16:34 Mike Pagano
From: Mike Pagano @ 2015-09-02 16:34 UTC
  To: gentoo-commits

commit:     eddb8464aeb6e997d68c122086cacafc96684e3b
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Wed Sep  2 16:34:29 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Wed Sep  2 16:34:29 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=eddb8464

workqueue: Make flush_workqueue() available again to non GPL modules

 2710_flush-workqueue-non-GPL-availability.patch | 33 +++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/2710_flush-workqueue-non-GPL-availability.patch b/2710_flush-workqueue-non-GPL-availability.patch
new file mode 100644
index 0000000..3e017d4
--- /dev/null
+++ b/2710_flush-workqueue-non-GPL-availability.patch
@@ -0,0 +1,33 @@
+From 1dadafa86a779884f14a6e7a3ddde1a57b0a0a65 Mon Sep 17 00:00:00 2001
+From: Tim Gardner <tim.gardner@canonical.com>
+Date: Tue, 4 Aug 2015 11:26:04 -0600
+Subject: workqueue: Make flush_workqueue() available again to non GPL modules
+
+Commit 37b1ef31a568fc02e53587620226e5f3c66454c8 ("workqueue: move
+flush_scheduled_work() to workqueue.h") moved the exported non GPL
+flush_scheduled_work() from a function to an inline wrapper.
+Unfortunately, it directly calls flush_workqueue() which is a GPL function.
+This has the effect of changing the licensing requirement for this function
+and makes it unavailable to non GPL modules.
+
+See commit ad7b1f841f8a54c6d61ff181451f55b68175e15a ("workqueue: Make
+schedule_work() available again to non GPL modules") for precedent.
+
+Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
+Signed-off-by: Tejun Heo <tj@kernel.org>
+
+diff --git a/kernel/workqueue.c b/kernel/workqueue.c
+index 4c4f061..a413acb 100644
+--- a/kernel/workqueue.c
++++ b/kernel/workqueue.c
+@@ -2614,7 +2614,7 @@ void flush_workqueue(struct workqueue_struct *wq)
+ out_unlock:
+    mutex_unlock(&wq->mutex);
+ }
+-EXPORT_SYMBOL_GPL(flush_workqueue);
++EXPORT_SYMBOL(flush_workqueue);
+ 
+ /**
+  * drain_workqueue - drain a workqueue
+-- 
+cgit v0.10.2



* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-15 12:31 Mike Pagano
From: Mike Pagano @ 2015-09-15 12:31 UTC
  To: gentoo-commits

commit:     f5a88481980ca0cbc0f981717e2368c486afa34c
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Tue Sep 15 12:31:35 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Tue Sep 15 12:31:35 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=f5a88481

BFQ v7r9 for 4.2

 0000_README                                        |   12 +
 ...roups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch |  103 +
 ...introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1 | 7026 ++++++++++++++++++++
 ...Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch | 1097 +++
 4 files changed, 8238 insertions(+)

diff --git a/0000_README b/0000_README
index 9022e99..0f4cdca 100644
--- a/0000_README
+++ b/0000_README
@@ -75,6 +75,18 @@ Patch:  5000_enable-additional-cpu-optimizations-for-gcc.patch
 From:   https://github.com/graysky2/kernel_gcc_patch/
 Desc:   Kernel patch enables gcc < v4.9 optimizations for additional CPUs.
 
+Patch:  5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch
+From:   http://algo.ing.unimo.it/people/paolo/disk_sched/
+Desc:   BFQ v7r9 patch 1 for 4.2: Build, cgroups and kconfig bits
+
+Patch:  5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1
+From:   http://algo.ing.unimo.it/people/paolo/disk_sched/
+Desc:   BFQ v7r9 patch 2 for 4.2: BFQ Scheduler
+
+Patch:  5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.0.patch
+From:   http://algo.ing.unimo.it/people/paolo/disk_sched/
+Desc:   BFQ v7r9 patch 3 for 4.2: Early Queue Merge (EQM)
+
 Patch:  5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
 From:   https://github.com/graysky2/kernel_gcc_patch/
 Desc:   Kernel patch enables gcc >= v4.9 optimizations for additional CPUs.

diff --git a/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch b/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch
new file mode 100644
index 0000000..fc7ef8e
--- /dev/null
+++ b/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch
@@ -0,0 +1,103 @@
+From f53ecde45f8d40a343aa5b5195e9f0944b7a1a37 Mon Sep 17 00:00:00 2001
+From: Paolo Valente <paolo.valente@unimore.it>
+Date: Tue, 7 Apr 2015 13:39:12 +0200
+Subject: [PATCH 1/3] block: cgroups, kconfig, build bits for BFQ-v7r9-4.2
+
+Update Kconfig.iosched and do the related Makefile changes to include
+kernel configuration options for BFQ. Also increase the number of
+policies supported by the blkio controller so that BFQ can add its
+own.
+
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
+Signed-off-by: Arianna Avanzini <avanzini@google.com>
+---
+ block/Kconfig.iosched  | 32 ++++++++++++++++++++++++++++++++
+ block/Makefile         |  1 +
+ include/linux/blkdev.h |  2 +-
+ 3 files changed, 34 insertions(+), 1 deletion(-)
+
+diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
+index 421bef9..0ee5f0f 100644
+--- a/block/Kconfig.iosched
++++ b/block/Kconfig.iosched
+@@ -39,6 +39,27 @@ config CFQ_GROUP_IOSCHED
+ 	---help---
+ 	  Enable group IO scheduling in CFQ.
+ 
++config IOSCHED_BFQ
++	tristate "BFQ I/O scheduler"
++	default n
++	---help---
++	  The BFQ I/O scheduler tries to distribute bandwidth among
++	  all processes according to their weights.
++	  It aims at distributing the bandwidth as desired, independently of
++	  the disk parameters and with any workload. It also tries to
++	  guarantee low latency to interactive and soft real-time
++	  applications. If compiled built-in (saying Y here), BFQ can
++	  be configured to support hierarchical scheduling.
++
++config CGROUP_BFQIO
++	bool "BFQ hierarchical scheduling support"
++	depends on CGROUPS && IOSCHED_BFQ=y
++	default n
++	---help---
++	  Enable hierarchical scheduling in BFQ, using the cgroups
++	  filesystem interface.  The name of the subsystem will be
++	  bfqio.
++
+ choice
+ 	prompt "Default I/O scheduler"
+ 	default DEFAULT_CFQ
+@@ -52,6 +73,16 @@ choice
+ 	config DEFAULT_CFQ
+ 		bool "CFQ" if IOSCHED_CFQ=y
+ 
++	config DEFAULT_BFQ
++		bool "BFQ" if IOSCHED_BFQ=y
++		help
++		  Selects BFQ as the default I/O scheduler which will be
++		  used by default for all block devices.
++		  The BFQ I/O scheduler aims at distributing the bandwidth
++		  as desired, independently of the disk parameters and with
++		  any workload. It also tries to guarantee low latency to
++		  interactive and soft real-time applications.
++
+ 	config DEFAULT_NOOP
+ 		bool "No-op"
+ 
+@@ -61,6 +92,7 @@ config DEFAULT_IOSCHED
+ 	string
+ 	default "deadline" if DEFAULT_DEADLINE
+ 	default "cfq" if DEFAULT_CFQ
++	default "bfq" if DEFAULT_BFQ
+ 	default "noop" if DEFAULT_NOOP
+ 
+ endmenu
+diff --git a/block/Makefile b/block/Makefile
+index 00ecc97..1ed86d5 100644
+--- a/block/Makefile
++++ b/block/Makefile
+@@ -18,6 +18,7 @@ obj-$(CONFIG_BLK_DEV_THROTTLING)	+= blk-throttle.o
+ obj-$(CONFIG_IOSCHED_NOOP)	+= noop-iosched.o
+ obj-$(CONFIG_IOSCHED_DEADLINE)	+= deadline-iosched.o
+ obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
++obj-$(CONFIG_IOSCHED_BFQ)	+= bfq-iosched.o
+ 
+ obj-$(CONFIG_BLOCK_COMPAT)	+= compat_ioctl.o
+ obj-$(CONFIG_BLK_CMDLINE_PARSER)	+= cmdline-parser.o
+diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
+index a622f27..e2b4c03 100644
+--- a/include/linux/blkdev.h
++++ b/include/linux/blkdev.h
+@@ -43,7 +43,7 @@ struct blk_flush_queue;
+  * Maximum number of blkcg policies allowed to be registered concurrently.
+  * Defined here to simplify include dependency.
+  */
+-#define BLKCG_MAX_POLS		2
++#define BLKCG_MAX_POLS		3
+ 
+ struct request;
+ typedef void (rq_end_io_fn)(struct request *, int);
+-- 
+2.1.4
+

diff --git a/5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1 b/5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1
new file mode 100644
index 0000000..04dd37c
--- /dev/null
+++ b/5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1
@@ -0,0 +1,7026 @@
+From 152cacc8a71a6cd7fe8cedc1110a378721e66ffa Mon Sep 17 00:00:00 2001
+From: Paolo Valente <paolo.valente@unimore.it>
+Date: Thu, 9 May 2013 19:10:02 +0200
+Subject: [PATCH 2/3] block: introduce the BFQ-v7r9 I/O sched for 4.2
+
+Add the BFQ-v7r9 I/O scheduler to 4.2.
+The general structure is borrowed from CFQ, as is much of the code for
+handling I/O contexts. Over time, several useful features have been
+ported from CFQ as well (details in the changelog in README.BFQ). A
+(bfq_)queue is associated with each task doing I/O on a device, and each
+time a scheduling decision has to be made, a queue is selected and served
+until it expires.
+
+    - Slices are given in the service domain: tasks are assigned
+      budgets, measured in number of sectors. Once it has got the disk,
+      a task must, however, consume its assigned budget within a
+      configurable maximum time (by default, the maximum possible value
+      of the budgets is automatically computed to comply with this
+      timeout). This allows the desired latency vs "throughput boosting"
+      tradeoff to be set.
+
+    - Budgets are scheduled according to a variant of WF2Q+, implemented
+      using an augmented rb-tree to take eligibility into account while
+      preserving an O(log N) overall complexity.
+
+    - A low-latency tunable is provided; if enabled, both interactive
+      and soft real-time applications are guaranteed a very low latency.
+
+    - Latency guarantees are also preserved in the presence of NCQ.
+
+    - With flash-based devices too, a high throughput is achieved
+      while still preserving latency guarantees.
+
+    - BFQ features Early Queue Merge (EQM), a sort of fusion of the
+      cooperating-queue-merging and the preemption mechanisms present
+      in CFQ. EQM is in fact a unified mechanism that tries to get a
+      sequential read pattern, and hence a high throughput, with any
+      set of processes performing interleaved I/O over a contiguous
+      sequence of sectors.
+
+    - BFQ supports full hierarchical scheduling, exporting a cgroups
+      interface.  Since each node has a full scheduler, each group can
+      be assigned its own weight.
+
+    - If the cgroups interface is not used, only I/O priorities can be
+      assigned to processes, with ioprio values mapped to weights
+      with the relation weight = IOPRIO_BE_NR - ioprio.
+
+    - ioprio classes are served in strict priority order, i.e., lower
+      priority queues are not served as long as there are higher
+      priority queues.  Among queues in the same class the bandwidth is
+      distributed in proportion to the weight of each queue. A very
+      thin extra bandwidth is however guaranteed to the Idle class, to
+      prevent it from starving.
+
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
+Signed-off-by: Arianna Avanzini <avanzini@google.com>
+---
+ block/Kconfig.iosched |    6 +-
+ block/bfq-cgroup.c    | 1108 +++++++++++++++
+ block/bfq-ioc.c       |   36 +
+ block/bfq-iosched.c   | 3753 +++++++++++++++++++++++++++++++++++++++++++++++++
+ block/bfq-sched.c     | 1197 ++++++++++++++++
+ block/bfq.h           |  807 +++++++++++
+ 6 files changed, 6903 insertions(+), 4 deletions(-)
+ create mode 100644 block/bfq-cgroup.c
+ create mode 100644 block/bfq-ioc.c
+ create mode 100644 block/bfq-iosched.c
+ create mode 100644 block/bfq-sched.c
+ create mode 100644 block/bfq.h
+
+diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
+index 0ee5f0f..f78cd1a 100644
+--- a/block/Kconfig.iosched
++++ b/block/Kconfig.iosched
+@@ -51,14 +51,12 @@ config IOSCHED_BFQ
+ 	  applications. If compiled built-in (saying Y here), BFQ can
+ 	  be configured to support hierarchical scheduling.
+ 
+-config CGROUP_BFQIO
++config BFQ_GROUP_IOSCHED
+ 	bool "BFQ hierarchical scheduling support"
+ 	depends on CGROUPS && IOSCHED_BFQ=y
+ 	default n
+ 	---help---
+-	  Enable hierarchical scheduling in BFQ, using the cgroups
+-	  filesystem interface.  The name of the subsystem will be
+-	  bfqio.
++	  Enable hierarchical scheduling in BFQ, using the blkio controller.
+ 
+ choice
+ 	prompt "Default I/O scheduler"
+diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
+new file mode 100644
+index 0000000..c02d65a
+--- /dev/null
++++ b/block/bfq-cgroup.c
+@@ -0,0 +1,1108 @@
++/*
++ * BFQ: CGROUPS support.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ *		      Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
++ * file.
++ */
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++
++/* bfqg stats flags */
++enum bfqg_stats_flags {
++	BFQG_stats_waiting = 0,
++	BFQG_stats_idling,
++	BFQG_stats_empty,
++};
++
++#define BFQG_FLAG_FNS(name)						\
++static void bfqg_stats_mark_##name(struct bfqg_stats *stats)	\
++{									\
++	stats->flags |= (1 << BFQG_stats_##name);			\
++}									\
++static void bfqg_stats_clear_##name(struct bfqg_stats *stats)	\
++{									\
++	stats->flags &= ~(1 << BFQG_stats_##name);			\
++}									\
++static int bfqg_stats_##name(struct bfqg_stats *stats)		\
++{									\
++	return (stats->flags & (1 << BFQG_stats_##name)) != 0;		\
++}									\
++
++BFQG_FLAG_FNS(waiting)
++BFQG_FLAG_FNS(idling)
++BFQG_FLAG_FNS(empty)
++#undef BFQG_FLAG_FNS
++
++/* This should be called with the queue_lock held. */
++static void bfqg_stats_update_group_wait_time(struct bfqg_stats *stats)
++{
++	unsigned long long now;
++
++	if (!bfqg_stats_waiting(stats))
++		return;
++
++	now = sched_clock();
++	if (time_after64(now, stats->start_group_wait_time))
++		blkg_stat_add(&stats->group_wait_time,
++			      now - stats->start_group_wait_time);
++	bfqg_stats_clear_waiting(stats);
++}
++
++/* This should be called with the queue_lock held. */
++static void bfqg_stats_set_start_group_wait_time(struct bfq_group *bfqg,
++						 struct bfq_group *curr_bfqg)
++{
++	struct bfqg_stats *stats = &bfqg->stats;
++
++	if (bfqg_stats_waiting(stats))
++		return;
++	if (bfqg == curr_bfqg)
++		return;
++	stats->start_group_wait_time = sched_clock();
++	bfqg_stats_mark_waiting(stats);
++}
++
++/* This should be called with the queue_lock held. */
++static void bfqg_stats_end_empty_time(struct bfqg_stats *stats)
++{
++	unsigned long long now;
++
++	if (!bfqg_stats_empty(stats))
++		return;
++
++	now = sched_clock();
++	if (time_after64(now, stats->start_empty_time))
++		blkg_stat_add(&stats->empty_time,
++			      now - stats->start_empty_time);
++	bfqg_stats_clear_empty(stats);
++}
++
++static void bfqg_stats_update_dequeue(struct bfq_group *bfqg)
++{
++	blkg_stat_add(&bfqg->stats.dequeue, 1);
++}
++
++static void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg)
++{
++	struct bfqg_stats *stats = &bfqg->stats;
++
++	if (blkg_rwstat_total(&stats->queued))
++		return;
++
++	/*
++	 * group is already marked empty. This can happen if bfqq got new
++	 * request in parent group and moved to this group while being added
++	 * to service tree. Just ignore the event and move on.
++	 */
++	if (bfqg_stats_empty(stats))
++		return;
++
++	stats->start_empty_time = sched_clock();
++	bfqg_stats_mark_empty(stats);
++}
++
++static void bfqg_stats_update_idle_time(struct bfq_group *bfqg)
++{
++	struct bfqg_stats *stats = &bfqg->stats;
++
++	if (bfqg_stats_idling(stats)) {
++		unsigned long long now = sched_clock();
++
++		if (time_after64(now, stats->start_idle_time))
++			blkg_stat_add(&stats->idle_time,
++				      now - stats->start_idle_time);
++		bfqg_stats_clear_idling(stats);
++	}
++}
++
++static void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg)
++{
++	struct bfqg_stats *stats = &bfqg->stats;
++
++	stats->start_idle_time = sched_clock();
++	bfqg_stats_mark_idling(stats);
++}
++
++static void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg)
++{
++	struct bfqg_stats *stats = &bfqg->stats;
++
++	blkg_stat_add(&stats->avg_queue_size_sum,
++		      blkg_rwstat_total(&stats->queued));
++	blkg_stat_add(&stats->avg_queue_size_samples, 1);
++	bfqg_stats_update_group_wait_time(stats);
++}
++
++static struct blkcg_policy blkcg_policy_bfq;
++
++/*
++ * blk-cgroup policy-related handlers
++ * The following functions help in converting between blk-cgroup
++ * internal structures and BFQ-specific structures.
++ */
++
++static struct bfq_group *pd_to_bfqg(struct blkg_policy_data *pd)
++{
++	return pd ? container_of(pd, struct bfq_group, pd) : NULL;
++}
++
++static struct blkcg_gq *bfqg_to_blkg(struct bfq_group *bfqg)
++{
++	return pd_to_blkg(&bfqg->pd);
++}
++
++static struct bfq_group *blkg_to_bfqg(struct blkcg_gq *blkg)
++{
++	return pd_to_bfqg(blkg_to_pd(blkg, &blkcg_policy_bfq));
++}
++
++/*
++ * bfq_group handlers
++ * The following functions help in navigating the bfq_group hierarchy
++ * by finding the parent of a bfq_group or the bfq_group
++ * associated with a bfq_queue.
++ */
++
++static struct bfq_group *bfqg_parent(struct bfq_group *bfqg)
++{
++	struct blkcg_gq *pblkg = bfqg_to_blkg(bfqg)->parent;
++
++	return pblkg ? blkg_to_bfqg(pblkg) : NULL;
++}
++
++static struct bfq_group *bfqq_group(struct bfq_queue *bfqq)
++{
++	struct bfq_entity *group_entity = bfqq->entity.parent;
++
++	return group_entity ? container_of(group_entity, struct bfq_group,
++					   entity) :
++			      bfqq->bfqd->root_group;
++}
++
++/*
++ * The following two functions handle get and put of a bfq_group by
++ * wrapping the related blk-cgroup hooks.
++ */
++
++static void bfqg_get(struct bfq_group *bfqg)
++{
++	blkg_get(bfqg_to_blkg(bfqg));
++}
++
++static void bfqg_put(struct bfq_group *bfqg)
++{
++	blkg_put(bfqg_to_blkg(bfqg));
++}
++
++static void bfqg_stats_update_io_add(struct bfq_group *bfqg,
++				     struct bfq_queue *bfqq,
++				     int rw)
++{
++	blkg_rwstat_add(&bfqg->stats.queued, rw, 1);
++	bfqg_stats_end_empty_time(&bfqg->stats);
++	if (!(bfqq == ((struct bfq_data *)bfqg->bfqd)->in_service_queue))
++		bfqg_stats_set_start_group_wait_time(bfqg, bfqq_group(bfqq));
++}
++
++static void bfqg_stats_update_io_remove(struct bfq_group *bfqg, int rw)
++{
++	blkg_rwstat_add(&bfqg->stats.queued, rw, -1);
++}
++
++static void bfqg_stats_update_io_merged(struct bfq_group *bfqg, int rw)
++{
++	blkg_rwstat_add(&bfqg->stats.merged, rw, 1);
++}
++
++static void bfqg_stats_update_dispatch(struct bfq_group *bfqg,
++					      uint64_t bytes, int rw)
++{
++	blkg_stat_add(&bfqg->stats.sectors, bytes >> 9);
++	blkg_rwstat_add(&bfqg->stats.serviced, rw, 1);
++	blkg_rwstat_add(&bfqg->stats.service_bytes, rw, bytes);
++}
++
++static void bfqg_stats_update_completion(struct bfq_group *bfqg,
++			uint64_t start_time, uint64_t io_start_time, int rw)
++{
++	struct bfqg_stats *stats = &bfqg->stats;
++	unsigned long long now = sched_clock();
++
++	if (time_after64(now, io_start_time))
++		blkg_rwstat_add(&stats->service_time, rw, now - io_start_time);
++	if (time_after64(io_start_time, start_time))
++		blkg_rwstat_add(&stats->wait_time, rw,
++				io_start_time - start_time);
++}
++
++/* @stats = 0 */
++static void bfqg_stats_reset(struct bfqg_stats *stats)
++{
++	if (!stats)
++		return;
++
++	/* queued stats shouldn't be cleared */
++	blkg_rwstat_reset(&stats->service_bytes);
++	blkg_rwstat_reset(&stats->serviced);
++	blkg_rwstat_reset(&stats->merged);
++	blkg_rwstat_reset(&stats->service_time);
++	blkg_rwstat_reset(&stats->wait_time);
++	blkg_stat_reset(&stats->time);
++	blkg_stat_reset(&stats->unaccounted_time);
++	blkg_stat_reset(&stats->avg_queue_size_sum);
++	blkg_stat_reset(&stats->avg_queue_size_samples);
++	blkg_stat_reset(&stats->dequeue);
++	blkg_stat_reset(&stats->group_wait_time);
++	blkg_stat_reset(&stats->idle_time);
++	blkg_stat_reset(&stats->empty_time);
++}
++
++/* @to += @from */
++static void bfqg_stats_merge(struct bfqg_stats *to, struct bfqg_stats *from)
++{
++	if (!to || !from)
++		return;
++
++	/* queued stats shouldn't be cleared */
++	blkg_rwstat_merge(&to->service_bytes, &from->service_bytes);
++	blkg_rwstat_merge(&to->serviced, &from->serviced);
++	blkg_rwstat_merge(&to->merged, &from->merged);
++	blkg_rwstat_merge(&to->service_time, &from->service_time);
++	blkg_rwstat_merge(&to->wait_time, &from->wait_time);
++	blkg_stat_merge(&to->time, &from->time);
++	blkg_stat_merge(&to->unaccounted_time, &from->unaccounted_time);
++	blkg_stat_merge(&to->avg_queue_size_sum, &from->avg_queue_size_sum);
++	blkg_stat_merge(&to->avg_queue_size_samples, &from->avg_queue_size_samples);
++	blkg_stat_merge(&to->dequeue, &from->dequeue);
++	blkg_stat_merge(&to->group_wait_time, &from->group_wait_time);
++	blkg_stat_merge(&to->idle_time, &from->idle_time);
++	blkg_stat_merge(&to->empty_time, &from->empty_time);
++}
++
++/*
++ * Transfer @bfqg's stats to its parent's dead_stats so that the ancestors'
++ * recursive stats can still account for the amount used by this bfqg after
++ * it's gone.
++ */
++static void bfqg_stats_xfer_dead(struct bfq_group *bfqg)
++{
++	struct bfq_group *parent;
++
++	if (!bfqg) /* root_group */
++		return;
++
++	parent = bfqg_parent(bfqg);
++
++	lockdep_assert_held(bfqg_to_blkg(bfqg)->q->queue_lock);
++
++	if (unlikely(!parent))
++		return;
++
++	bfqg_stats_merge(&parent->dead_stats, &bfqg->stats);
++	bfqg_stats_merge(&parent->dead_stats, &bfqg->dead_stats);
++	bfqg_stats_reset(&bfqg->stats);
++	bfqg_stats_reset(&bfqg->dead_stats);
++}
++
++static void bfq_init_entity(struct bfq_entity *entity,
++			    struct bfq_group *bfqg)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++	entity->weight = entity->new_weight;
++	entity->orig_weight = entity->new_weight;
++	if (bfqq) {
++		bfqq->ioprio = bfqq->new_ioprio;
++		bfqq->ioprio_class = bfqq->new_ioprio_class;
++		bfqg_get(bfqg);
++	}
++	entity->parent = bfqg->my_entity;
++	entity->sched_data = &bfqg->sched_data;
++}
++
++static void bfqg_stats_init(struct bfqg_stats *stats)
++{
++	blkg_rwstat_init(&stats->service_bytes);
++	blkg_rwstat_init(&stats->serviced);
++	blkg_rwstat_init(&stats->merged);
++	blkg_rwstat_init(&stats->service_time);
++	blkg_rwstat_init(&stats->wait_time);
++	blkg_rwstat_init(&stats->queued);
++
++	blkg_stat_init(&stats->sectors);
++	blkg_stat_init(&stats->time);
++
++	blkg_stat_init(&stats->unaccounted_time);
++	blkg_stat_init(&stats->avg_queue_size_sum);
++	blkg_stat_init(&stats->avg_queue_size_samples);
++	blkg_stat_init(&stats->dequeue);
++	blkg_stat_init(&stats->group_wait_time);
++	blkg_stat_init(&stats->idle_time);
++	blkg_stat_init(&stats->empty_time);
++}
++
++static struct bfq_group_data *cpd_to_bfqgd(struct blkcg_policy_data *cpd)
++ {
++	return cpd ? container_of(cpd, struct bfq_group_data, pd) : NULL;
++}
++
++static struct bfq_group_data *blkcg_to_bfqgd(struct blkcg *blkcg)
++{
++	return cpd_to_bfqgd(blkcg_to_cpd(blkcg, &blkcg_policy_bfq));
++}
++
++static void bfq_cpd_init(const struct blkcg *blkcg)
++{
++	struct bfq_group_data *d =
++		cpd_to_bfqgd(blkcg->pd[blkcg_policy_bfq.plid]);
++
++	d->weight = BFQ_DEFAULT_GRP_WEIGHT;
++}
++
++static void bfq_pd_init(struct blkcg_gq *blkg)
++{
++	struct bfq_group *bfqg = blkg_to_bfqg(blkg);
++	struct bfq_data *bfqd = blkg->q->elevator->elevator_data;
++	struct bfq_entity *entity = &bfqg->entity;
++	struct bfq_group_data *d = blkcg_to_bfqgd(blkg->blkcg);
++
++	entity->orig_weight = entity->weight = entity->new_weight = d->weight;
++	entity->my_sched_data = &bfqg->sched_data;
++	bfqg->my_entity = entity; /*
++				   * the root_group's will be set to NULL
++				   * in bfq_init_queue()
++				   */
++	bfqg->bfqd = bfqd;
++	bfqg->active_entities = 0;
++
++	/* if the root_group does not exist, we are handling it right now */
++	if (bfqd->root_group && bfqg != bfqd->root_group)
++		hlist_add_head(&bfqg->bfqd_node, &bfqd->group_list);
++
++	bfqg_stats_init(&bfqg->stats);
++	bfqg_stats_init(&bfqg->dead_stats);
++}
++
++/* offset delta from bfqg->stats to bfqg->dead_stats */
++static const int dead_stats_off_delta = offsetof(struct bfq_group, dead_stats) -
++					offsetof(struct bfq_group, stats);
++
++/* to be used by recursive prfill, sums live and dead stats recursively */
++static u64 bfqg_stat_pd_recursive_sum(struct blkg_policy_data *pd, int off)
++{
++	u64 sum = 0;
++
++	sum += blkg_stat_recursive_sum(pd, off);
++	sum += blkg_stat_recursive_sum(pd, off + dead_stats_off_delta);
++	return sum;
++}
++
++/* to be used by recursive prfill, sums live and dead rwstats recursively */
++static struct blkg_rwstat bfqg_rwstat_pd_recursive_sum(struct blkg_policy_data *pd,
++						       int off)
++{
++	struct blkg_rwstat a, b;
++
++	a = blkg_rwstat_recursive_sum(pd, off);
++	b = blkg_rwstat_recursive_sum(pd, off + dead_stats_off_delta);
++	blkg_rwstat_merge(&a, &b);
++	return a;
++}
++
++static void bfq_pd_reset_stats(struct blkcg_gq *blkg)
++{
++	struct bfq_group *bfqg = blkg_to_bfqg(blkg);
++
++	bfqg_stats_reset(&bfqg->stats);
++	bfqg_stats_reset(&bfqg->dead_stats);
++}
++
++static void bfq_group_set_parent(struct bfq_group *bfqg,
++					struct bfq_group *parent)
++{
++	struct bfq_entity *entity;
++
++	BUG_ON(!parent);
++	BUG_ON(!bfqg);
++	BUG_ON(bfqg == parent);
++
++	entity = &bfqg->entity;
++	entity->parent = parent->my_entity;
++	entity->sched_data = &parent->sched_data;
++}
++
++static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
++					      struct blkcg *blkcg)
++{
++	struct request_queue *q = bfqd->queue;
++	struct bfq_group *bfqg = NULL, *parent;
++	struct bfq_entity *entity = NULL;
++
++	assert_spin_locked(bfqd->queue->queue_lock);
++
++	/* avoid lookup for the common case where there's no blkcg */
++	if (blkcg == &blkcg_root) {
++		bfqg = bfqd->root_group;
++	} else {
++		struct blkcg_gq *blkg;
++
++		blkg = blkg_lookup_create(blkcg, q);
++		if (!IS_ERR(blkg))
++			bfqg = blkg_to_bfqg(blkg);
++		else /* fallback to root_group */
++			bfqg = bfqd->root_group;
++	}
++
++	BUG_ON(!bfqg);
++
++	/*
++	 * Update chain of bfq_groups as we might be handling a leaf group
++	 * which, along with some of its relatives, has not been hooked yet
++	 * to the private hierarchy of BFQ.
++	 */
++	entity = &bfqg->entity;
++	for_each_entity(entity) {
++		bfqg = container_of(entity, struct bfq_group, entity);
++		BUG_ON(!bfqg);
++		if (bfqg != bfqd->root_group) {
++			parent = bfqg_parent(bfqg);
++			if (!parent)
++				parent = bfqd->root_group;
++			BUG_ON(!parent);
++			bfq_group_set_parent(bfqg, parent);
++		}
++	}
++
++	return bfqg;
++}
++
++/**
++ * bfq_bfqq_move - migrate @bfqq to @bfqg.
++ * @bfqd: queue descriptor.
++ * @bfqq: the queue to move.
++ * @entity: @bfqq's entity.
++ * @bfqg: the group to move to.
++ *
++ * Move @bfqq to @bfqg, deactivating it from its old group and reactivating
++ * it on the new one.  Avoid putting the entity on the old group idle tree.
++ *
++ * Must be called under the queue lock; the cgroup owning @bfqg must
++ * not disappear (for now this just means that we are called under
++ * rcu_read_lock()).
++ */
++static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++			  struct bfq_entity *entity, struct bfq_group *bfqg)
++{
++	int busy, resume;
++
++	busy = bfq_bfqq_busy(bfqq);
++	resume = !RB_EMPTY_ROOT(&bfqq->sort_list);
++
++	BUG_ON(resume && !entity->on_st);
++	BUG_ON(busy && !resume && entity->on_st &&
++	       bfqq != bfqd->in_service_queue);
++
++	if (busy) {
++		BUG_ON(atomic_read(&bfqq->ref) < 2);
++
++		if (!resume)
++			bfq_del_bfqq_busy(bfqd, bfqq, 0);
++		else
++			bfq_deactivate_bfqq(bfqd, bfqq, 0);
++	} else if (entity->on_st)
++		bfq_put_idle_entity(bfq_entity_service_tree(entity), entity);
++	bfqg_put(bfqq_group(bfqq));
++
++	/*
++	 * Here we use a reference to bfqg.  We don't need a refcounter
++	 * as the cgroup reference will not be dropped, so that its
++	 * destroy() callback will not be invoked.
++	 */
++	entity->parent = bfqg->my_entity;
++	entity->sched_data = &bfqg->sched_data;
++	bfqg_get(bfqg);
++
++	if (busy) {
++		if (resume)
++			bfq_activate_bfqq(bfqd, bfqq);
++	}
++
++	if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
++		bfq_schedule_dispatch(bfqd);
++}
++
++/**
++ * __bfq_bic_change_cgroup - move @bic to @blkcg.
++ * @bfqd: the queue descriptor.
++ * @bic: the bic to move.
++ * @blkcg: the blk-cgroup to move to.
++ *
++ * Move bic to blkcg, assuming that bfqd->queue is locked; the caller
++ * has to make sure that the reference to cgroup is valid across the call.
++ *
++ * NOTE: an alternative approach might have been to store the current
++ * cgroup in bfqq and getting a reference to it, reducing the lookup
++ * time here, at the price of slightly more complex code.
++ */
++static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
++						struct bfq_io_cq *bic,
++						struct blkcg *blkcg)
++{
++	struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0);
++	struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1);
++	struct bfq_group *bfqg;
++	struct bfq_entity *entity;
++
++	lockdep_assert_held(bfqd->queue->queue_lock);
++
++	bfqg = bfq_find_alloc_group(bfqd, blkcg);
++	if (async_bfqq) {
++		entity = &async_bfqq->entity;
++
++		if (entity->sched_data != &bfqg->sched_data) {
++			bic_set_bfqq(bic, NULL, 0);
++			bfq_log_bfqq(bfqd, async_bfqq,
++				     "bic_change_group: %p %d",
++				     async_bfqq, atomic_read(&async_bfqq->ref));
++			bfq_put_queue(async_bfqq);
++		}
++	}
++
++	if (sync_bfqq) {
++		entity = &sync_bfqq->entity;
++		if (entity->sched_data != &bfqg->sched_data)
++			bfq_bfqq_move(bfqd, sync_bfqq, entity, bfqg);
++	}
++
++	return bfqg;
++}
++
++static void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
++{
++	struct bfq_data *bfqd = bic_to_bfqd(bic);
++	struct blkcg *blkcg;
++	struct bfq_group *bfqg = NULL;
++	uint64_t id;
++
++	rcu_read_lock();
++	blkcg = bio_blkcg(bio);
++	id = blkcg->css.serial_nr;
++	rcu_read_unlock();
++
++	/*
++	 * Check whether blkcg has changed.  The condition may trigger
++	 * spuriously on a newly created bic but there's no harm.
++	 */
++	if (unlikely(!bfqd) || likely(bic->blkcg_id == id))
++		return;
++
++	bfqg = __bfq_bic_change_cgroup(bfqd, bic, blkcg);
++	BUG_ON(!bfqg);
++	bic->blkcg_id = id;
++}
++
++/**
++ * bfq_flush_idle_tree - deactivate any entity on the idle tree of @st.
++ * @st: the service tree being flushed.
++ */
++static void bfq_flush_idle_tree(struct bfq_service_tree *st)
++{
++	struct bfq_entity *entity = st->first_idle;
++
++	for (; entity ; entity = st->first_idle)
++		__bfq_deactivate_entity(entity, 0);
++}
++
++/**
++ * bfq_reparent_leaf_entity - move leaf entity to the root_group.
++ * @bfqd: the device data structure with the root group.
++ * @entity: the entity to move.
++ */
++static void bfq_reparent_leaf_entity(struct bfq_data *bfqd,
++				     struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++	BUG_ON(!bfqq);
++	bfq_bfqq_move(bfqd, bfqq, entity, bfqd->root_group);
++	return;
++}
++
++/**
++ * bfq_reparent_active_entities - move to the root group all active
++ *                                entities.
++ * @bfqd: the device data structure with the root group.
++ * @bfqg: the group to move from.
++ * @st: the service tree with the entities.
++ *
++ * Needs queue_lock to be taken and reference to be valid over the call.
++ */
++static void bfq_reparent_active_entities(struct bfq_data *bfqd,
++					 struct bfq_group *bfqg,
++					 struct bfq_service_tree *st)
++{
++	struct rb_root *active = &st->active;
++	struct bfq_entity *entity = NULL;
++
++	if (!RB_EMPTY_ROOT(&st->active))
++		entity = bfq_entity_of(rb_first(active));
++
++	for (; entity ; entity = bfq_entity_of(rb_first(active)))
++		bfq_reparent_leaf_entity(bfqd, entity);
++
++	if (bfqg->sched_data.in_service_entity)
++		bfq_reparent_leaf_entity(bfqd,
++			bfqg->sched_data.in_service_entity);
++
++	return;
++}
++
++/**
++ * bfq_pd_offline - take the bfq_group of @blkg offline.
++ * @blkg: the blkg whose bfq_group is being destroyed.
++ *
++ * Destroy the group, making sure that it is not referenced from its parent.
++ * blkio already grabs the queue_lock for us, so no RCU-based magic is needed.
++ */
++static void bfq_pd_offline(struct blkcg_gq *blkg)
++{
++	struct bfq_service_tree *st;
++	struct bfq_group *bfqg = blkg_to_bfqg(blkg);
++	struct bfq_data *bfqd = bfqg->bfqd;
++	struct bfq_entity *entity = bfqg->my_entity;
++	int i;
++
++	if (!entity) /* root group */
++		return;
++
++	/*
++	 * Empty all service_trees belonging to this group before
++	 * deactivating the group itself.
++	 */
++	for (i = 0; i < BFQ_IOPRIO_CLASSES; i++) {
++		st = bfqg->sched_data.service_tree + i;
++
++		/*
++		 * The idle tree may still contain bfq_queues belonging
++		 * to exited tasks because they never migrated to a different
++		 * cgroup from the one being destroyed now.  No one else
++		 * can access them so it's safe to act without any lock.
++		 */
++		bfq_flush_idle_tree(st);
++
++		/*
++		 * It may happen that some queues are still active
++		 * (busy) upon group destruction (if the corresponding
++		 * processes have been forced to terminate). We move
++		 * all the leaf entities corresponding to these queues
++		 * to the root_group.
++		 * Also, it may happen that the group has an entity
++		 * in service, which is disconnected from the active
++		 * tree: it must be moved, too.
++		 * There is no need to put the sync queues, as the
++		 * scheduler has taken no reference.
++		 */
++		bfq_reparent_active_entities(bfqd, bfqg, st);
++		BUG_ON(!RB_EMPTY_ROOT(&st->active));
++		BUG_ON(!RB_EMPTY_ROOT(&st->idle));
++	}
++	BUG_ON(bfqg->sched_data.next_in_service);
++	BUG_ON(bfqg->sched_data.in_service_entity);
++
++	hlist_del(&bfqg->bfqd_node);
++	__bfq_deactivate_entity(entity, 0);
++	bfq_put_async_queues(bfqd, bfqg);
++	BUG_ON(entity->tree);
++
++	bfqg_stats_xfer_dead(bfqg);
++}
++
++static void bfq_end_wr_async(struct bfq_data *bfqd)
++{
++	struct hlist_node *tmp;
++	struct bfq_group *bfqg;
++
++	hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node)
++		bfq_end_wr_async_queues(bfqd, bfqg);
++	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
++}
++
++/**
++ * bfq_disconnect_groups - disconnect @bfqd from all its groups.
++ * @bfqd: the device descriptor being exited.
++ *
++ * When the device exits we just make sure that no lookup can return
++ * the now unused group structures.  They will be deallocated on cgroup
++ * destruction.
++ */
++static void bfq_disconnect_groups(struct bfq_data *bfqd)
++{
++	struct hlist_node *tmp;
++	struct bfq_group *bfqg;
++
++	bfq_log(bfqd, "disconnect_groups beginning");
++	hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node) {
++		hlist_del(&bfqg->bfqd_node);
++
++		__bfq_deactivate_entity(bfqg->my_entity, 0);
++
++		/*
++		 * Don't remove from the group hash, just set an
++		 * invalid key.  No lookups can race with the
++		 * assignment as bfqd is being destroyed; this
++		 * implies also that new elements cannot be added
++		 * to the list.
++		 */
++		rcu_assign_pointer(bfqg->bfqd, NULL);
++
++		bfq_log(bfqd, "disconnect_groups: put async for group %p",
++			bfqg);
++		bfq_put_async_queues(bfqd, bfqg);
++	}
++}
++
++static u64 bfqio_cgroup_weight_read(struct cgroup_subsys_state *css,
++				       struct cftype *cftype)
++{
++	struct blkcg *blkcg = css_to_blkcg(css);
++	struct bfq_group_data *bfqgd = blkcg_to_bfqgd(blkcg);
++	int ret = -EINVAL;
++
++	spin_lock_irq(&blkcg->lock);
++	ret = bfqgd->weight;
++	spin_unlock_irq(&blkcg->lock);
++
++	return ret;
++}
++
++static int bfqio_cgroup_weight_write(struct cgroup_subsys_state *css,
++					struct cftype *cftype,
++					u64 val)
++{
++	struct blkcg *blkcg = css_to_blkcg(css);
++	struct bfq_group_data *bfqgd = blkcg_to_bfqgd(blkcg);
++	struct blkcg_gq *blkg;
++	int ret = -EINVAL;
++
++	if (val < BFQ_MIN_WEIGHT || val > BFQ_MAX_WEIGHT)
++		return ret;
++
++	ret = 0;
++	spin_lock_irq(&blkcg->lock);
++	bfqgd->weight = (unsigned short)val;
++	hlist_for_each_entry(blkg, &blkcg->blkg_list, blkcg_node) {
++		struct bfq_group *bfqg = blkg_to_bfqg(blkg);
++		if (!bfqg)
++			continue;
++		/*
++		 * Setting the prio_changed flag of the entity
++		 * to 1 with new_weight == weight would re-set
++		 * the value of the weight to its ioprio mapping.
++		 * Set the flag only if necessary.
++		 */
++		if ((unsigned short)val != bfqg->entity.new_weight) {
++			bfqg->entity.new_weight = (unsigned short)val;
++			/*
++			 * Make sure that the above new value has been
++			 * stored in bfqg->entity.new_weight before
++			 * setting the prio_changed flag. In fact,
++			 * this flag may be read asynchronously (in
++			 * critical sections protected by a different
++			 * lock than that held here), and finding this
++			 * flag set may cause the execution of the code
++			 * for updating parameters whose value may
++			 * depend also on bfqg->entity.new_weight (in
++			 * __bfq_entity_update_weight_prio).
++			 * This barrier makes sure that the new value
++			 * of bfqg->entity.new_weight is correctly
++			 * seen in that code.
++			 */
++			smp_wmb();
++			bfqg->entity.prio_changed = 1;
++		}
++	}
++	spin_unlock_irq(&blkcg->lock);
++
++	return ret;
++}
++
++static int bfqg_print_stat(struct seq_file *sf, void *v)
++{
++	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat,
++			  &blkcg_policy_bfq, seq_cft(sf)->private, false);
++	return 0;
++}
++
++static int bfqg_print_rwstat(struct seq_file *sf, void *v)
++{
++	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_rwstat,
++			  &blkcg_policy_bfq, seq_cft(sf)->private, true);
++	return 0;
++}
++
++static u64 bfqg_prfill_stat_recursive(struct seq_file *sf,
++				      struct blkg_policy_data *pd, int off)
++{
++	u64 sum = bfqg_stat_pd_recursive_sum(pd, off);
++
++	return __blkg_prfill_u64(sf, pd, sum);
++}
++
++static u64 bfqg_prfill_rwstat_recursive(struct seq_file *sf,
++					struct blkg_policy_data *pd, int off)
++{
++	struct blkg_rwstat sum = bfqg_rwstat_pd_recursive_sum(pd, off);
++
++	return __blkg_prfill_rwstat(sf, pd, &sum);
++}
++
++static int bfqg_print_stat_recursive(struct seq_file *sf, void *v)
++{
++	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
++			  bfqg_prfill_stat_recursive, &blkcg_policy_bfq,
++			  seq_cft(sf)->private, false);
++	return 0;
++}
++
++static int bfqg_print_rwstat_recursive(struct seq_file *sf, void *v)
++{
++	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
++			  bfqg_prfill_rwstat_recursive, &blkcg_policy_bfq,
++			  seq_cft(sf)->private, true);
++	return 0;
++}
++
++static u64 bfqg_prfill_avg_queue_size(struct seq_file *sf,
++				      struct blkg_policy_data *pd, int off)
++{
++	struct bfq_group *bfqg = pd_to_bfqg(pd);
++	u64 samples = blkg_stat_read(&bfqg->stats.avg_queue_size_samples);
++	u64 v = 0;
++
++	if (samples) {
++		v = blkg_stat_read(&bfqg->stats.avg_queue_size_sum);
++		v = div64_u64(v, samples);
++	}
++	__blkg_prfill_u64(sf, pd, v);
++	return 0;
++}
++
++/* print avg_queue_size */
++static int bfqg_print_avg_queue_size(struct seq_file *sf, void *v)
++{
++	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
++			  bfqg_prfill_avg_queue_size, &blkcg_policy_bfq,
++			  0, false);
++	return 0;
++}
++
++static struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
++{
++	int ret;
++
++	ret = blkcg_activate_policy(bfqd->queue, &blkcg_policy_bfq);
++	if (ret)
++		return NULL;
++
++	return blkg_to_bfqg(bfqd->queue->root_blkg);
++}
++
++static struct cftype bfqio_files[] = {
++	{
++		.name = "bfq.weight",
++		.read_u64 = bfqio_cgroup_weight_read,
++		.write_u64 = bfqio_cgroup_weight_write,
++	},
++	/* statistics, cover only the tasks in the bfqg */
++	{
++		.name = "bfq.time",
++		.private = offsetof(struct bfq_group, stats.time),
++		.seq_show = bfqg_print_stat,
++	},
++	{
++		.name = "bfq.sectors",
++		.private = offsetof(struct bfq_group, stats.sectors),
++		.seq_show = bfqg_print_stat,
++	},
++	{
++		.name = "bfq.io_service_bytes",
++		.private = offsetof(struct bfq_group, stats.service_bytes),
++		.seq_show = bfqg_print_rwstat,
++	},
++	{
++		.name = "bfq.io_serviced",
++		.private = offsetof(struct bfq_group, stats.serviced),
++		.seq_show = bfqg_print_rwstat,
++	},
++	{
++		.name = "bfq.io_service_time",
++		.private = offsetof(struct bfq_group, stats.service_time),
++		.seq_show = bfqg_print_rwstat,
++	},
++	{
++		.name = "bfq.io_wait_time",
++		.private = offsetof(struct bfq_group, stats.wait_time),
++		.seq_show = bfqg_print_rwstat,
++	},
++	{
++		.name = "bfq.io_merged",
++		.private = offsetof(struct bfq_group, stats.merged),
++		.seq_show = bfqg_print_rwstat,
++	},
++	{
++		.name = "bfq.io_queued",
++		.private = offsetof(struct bfq_group, stats.queued),
++		.seq_show = bfqg_print_rwstat,
++	},
++
++	/* the same statistics which cover the bfqg and its descendants */
++	{
++		.name = "bfq.time_recursive",
++		.private = offsetof(struct bfq_group, stats.time),
++		.seq_show = bfqg_print_stat_recursive,
++	},
++	{
++		.name = "bfq.sectors_recursive",
++		.private = offsetof(struct bfq_group, stats.sectors),
++		.seq_show = bfqg_print_stat_recursive,
++	},
++	{
++		.name = "bfq.io_service_bytes_recursive",
++		.private = offsetof(struct bfq_group, stats.service_bytes),
++		.seq_show = bfqg_print_rwstat_recursive,
++	},
++	{
++		.name = "bfq.io_serviced_recursive",
++		.private = offsetof(struct bfq_group, stats.serviced),
++		.seq_show = bfqg_print_rwstat_recursive,
++	},
++	{
++		.name = "bfq.io_service_time_recursive",
++		.private = offsetof(struct bfq_group, stats.service_time),
++		.seq_show = bfqg_print_rwstat_recursive,
++	},
++	{
++		.name = "bfq.io_wait_time_recursive",
++		.private = offsetof(struct bfq_group, stats.wait_time),
++		.seq_show = bfqg_print_rwstat_recursive,
++	},
++	{
++		.name = "bfq.io_merged_recursive",
++		.private = offsetof(struct bfq_group, stats.merged),
++		.seq_show = bfqg_print_rwstat_recursive,
++	},
++	{
++		.name = "bfq.io_queued_recursive",
++		.private = offsetof(struct bfq_group, stats.queued),
++		.seq_show = bfqg_print_rwstat_recursive,
++	},
++	{
++		.name = "bfq.avg_queue_size",
++		.seq_show = bfqg_print_avg_queue_size,
++	},
++	{
++		.name = "bfq.group_wait_time",
++		.private = offsetof(struct bfq_group, stats.group_wait_time),
++		.seq_show = bfqg_print_stat,
++	},
++	{
++		.name = "bfq.idle_time",
++		.private = offsetof(struct bfq_group, stats.idle_time),
++		.seq_show = bfqg_print_stat,
++	},
++	{
++		.name = "bfq.empty_time",
++		.private = offsetof(struct bfq_group, stats.empty_time),
++		.seq_show = bfqg_print_stat,
++	},
++	{
++		.name = "bfq.dequeue",
++		.private = offsetof(struct bfq_group, stats.dequeue),
++		.seq_show = bfqg_print_stat,
++	},
++	{
++		.name = "bfq.unaccounted_time",
++		.private = offsetof(struct bfq_group, stats.unaccounted_time),
++		.seq_show = bfqg_print_stat,
++	},
++	{ }	/* terminate */
++};
++
++static struct blkcg_policy blkcg_policy_bfq = {
++	.pd_size		= sizeof(struct bfq_group),
++	.cpd_size		= sizeof(struct bfq_group_data),
++	.cftypes		= bfqio_files,
++	.pd_init_fn		= bfq_pd_init,
++	.cpd_init_fn		= bfq_cpd_init,
++	.pd_offline_fn		= bfq_pd_offline,
++	.pd_reset_stats_fn	= bfq_pd_reset_stats,
++};
++
++#else
++
++static void bfq_init_entity(struct bfq_entity *entity,
++			    struct bfq_group *bfqg)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++	entity->weight = entity->new_weight;
++	entity->orig_weight = entity->new_weight;
++	if (bfqq) {
++		bfqq->ioprio = bfqq->new_ioprio;
++		bfqq->ioprio_class = bfqq->new_ioprio_class;
++	}
++	entity->sched_data = &bfqg->sched_data;
++}
++
++static struct bfq_group *
++bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
++{
++	struct bfq_data *bfqd = bic_to_bfqd(bic);
++	return bfqd->root_group;
++}
++
++static void bfq_bfqq_move(struct bfq_data *bfqd,
++			  struct bfq_queue *bfqq,
++			  struct bfq_entity *entity,
++			  struct bfq_group *bfqg)
++{
++}
++
++static void bfq_end_wr_async(struct bfq_data *bfqd)
++{
++	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
++}
++
++static void bfq_disconnect_groups(struct bfq_data *bfqd)
++{
++	bfq_put_async_queues(bfqd, bfqd->root_group);
++}
++
++static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
++					      struct blkcg *blkcg)
++{
++	return bfqd->root_group;
++}
++
++static struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
++{
++	struct bfq_group *bfqg;
++	int i;
++
++	bfqg = kmalloc_node(sizeof(*bfqg), GFP_KERNEL | __GFP_ZERO, node);
++	if (!bfqg)
++		return NULL;
++
++	for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
++		bfqg->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
++
++	return bfqg;
++}
++#endif
+diff --git a/block/bfq-ioc.c b/block/bfq-ioc.c
+new file mode 100644
+index 0000000..fb7bb8f
+--- /dev/null
++++ b/block/bfq-ioc.c
+@@ -0,0 +1,36 @@
++/*
++ * BFQ: I/O context handling.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ *		      Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ */
++
++/**
++ * icq_to_bic - convert iocontext queue structure to bfq_io_cq.
++ * @icq: the iocontext queue.
++ */
++static struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
++{
++	/* bic->icq is the first member, %NULL will convert to %NULL */
++	return container_of(icq, struct bfq_io_cq, icq);
++}
++
++/**
++ * bfq_bic_lookup - search into @ioc a bic associated to @bfqd.
++ * @bfqd: the lookup key.
++ * @ioc: the io_context of the process doing I/O.
++ *
++ * Queue lock must be held.
++ */
++static struct bfq_io_cq *bfq_bic_lookup(struct bfq_data *bfqd,
++					struct io_context *ioc)
++{
++	if (ioc)
++		return icq_to_bic(ioc_lookup_icq(ioc, bfqd->queue));
++	return NULL;
++}
+diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
+new file mode 100644
+index 0000000..51d24dd
+--- /dev/null
++++ b/block/bfq-iosched.c
+@@ -0,0 +1,3753 @@
++/*
++ * Budget Fair Queueing (BFQ) disk scheduler.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ *		      Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
++ * file.
++ *
++ * BFQ is a proportional-share storage-I/O scheduling algorithm based on
++ * the slice-by-slice service scheme of CFQ. But BFQ assigns budgets,
++ * measured in number of sectors, to processes instead of time slices. The
++ * device is not granted to the in-service process for a given time slice,
++ * but until it has exhausted its assigned budget. This change from the time
++ * to the service domain allows BFQ to distribute the device throughput
++ * among processes as desired, without any distortion due to ZBR, workload
++ * fluctuations or other factors. BFQ uses an ad hoc internal scheduler,
++ * called B-WF2Q+, to schedule processes according to their budgets. More
++ * precisely, BFQ schedules queues associated to processes. Thanks to the
++ * accurate policy of B-WF2Q+, BFQ can afford to assign high budgets to
++ * I/O-bound processes issuing sequential requests (to boost the
++ * throughput), and yet guarantee a low latency to interactive and soft
++ * real-time applications.
++ *
++ * BFQ is described in [1], which also references the initial, more
++ * theoretical paper on BFQ. The interested reader can find
++ * in the latter paper full details on the main algorithm, as well as
++ * formulas of the guarantees and formal proofs of all the properties.
++ * With respect to the version of BFQ presented in these papers, this
++ * implementation adds a few more heuristics, such as the one that
++ * guarantees a low latency to soft real-time applications, and a
++ * hierarchical extension based on H-WF2Q+.
++ *
++ * B-WF2Q+ is based on WF2Q+, which is described in [2], together with
++ * H-WF2Q+, while the augmented tree used to implement B-WF2Q+ with O(log N)
++ * complexity derives from the one introduced with EEVDF in [3].
++ *
++ * [1] P. Valente and M. Andreolini, ``Improving Application Responsiveness
++ *     with the BFQ Disk I/O Scheduler'',
++ *     Proceedings of the 5th Annual International Systems and Storage
++ *     Conference (SYSTOR '12), June 2012.
++ *
++ * http://algogroup.unimo.it/people/paolo/disk_sched/bf1-v1-suite-results.pdf
++ *
++ * [2] Jon C.R. Bennett and H. Zhang, ``Hierarchical Packet Fair Queueing
++ *     Algorithms,'' IEEE/ACM Transactions on Networking, 5(5):675-689,
++ *     Oct 1997.
++ *
++ * http://www.cs.cmu.edu/~hzhang/papers/TON-97-Oct.ps.gz
++ *
++ * [3] I. Stoica and H. Abdel-Wahab, ``Earliest Eligible Virtual Deadline
++ *     First: A Flexible and Accurate Mechanism for Proportional Share
++ *     Resource Allocation,'' technical report.
++ *
++ * http://www.cs.berkeley.edu/~istoica/papers/eevdf-tr-95.pdf
++ */
++#include <linux/module.h>
++#include <linux/slab.h>
++#include <linux/blkdev.h>
++#include <linux/cgroup.h>
++#include <linux/elevator.h>
++#include <linux/jiffies.h>
++#include <linux/rbtree.h>
++#include <linux/ioprio.h>
++#include "bfq.h"
++#include "blk.h"
++
++/* Expiration time of sync (0) and async (1) requests, in jiffies. */
++static const int bfq_fifo_expire[2] = { HZ / 4, HZ / 8 };
++
++/* Maximum backwards seek, in KiB. */
++static const int bfq_back_max = 16 * 1024;
++
++/* Penalty of a backwards seek, in number of sectors. */
++static const int bfq_back_penalty = 2;
++
++/* Idling period duration, in jiffies. */
++static int bfq_slice_idle = HZ / 125;
++
++/* Minimum number of assigned budgets for which stats are safe to compute. */
++static const int bfq_stats_min_budgets = 194;
++
++/* Default maximum budget values, in sectors and number of requests. */
++static const int bfq_default_max_budget = 16 * 1024;
++static const int bfq_max_budget_async_rq = 4;
++
++/*
++ * Async to sync throughput distribution is controlled as follows:
++ * when an async request is served, the entity is charged the number
++ * of sectors of the request, multiplied by the factor below
++ */
++static const int bfq_async_charge_factor = 10;
++
++/* Default timeout values, in jiffies, approximating CFQ defaults. */
++static const int bfq_timeout_sync = HZ / 8;
++static int bfq_timeout_async = HZ / 25;
++
++struct kmem_cache *bfq_pool;
++
++/* Below this threshold (in ms), we consider thinktime immediate. */
++#define BFQ_MIN_TT		2
++
++/* hw_tag detection: parallel requests threshold and min samples needed. */
++#define BFQ_HW_QUEUE_THRESHOLD	4
++#define BFQ_HW_QUEUE_SAMPLES	32
++
++#define BFQQ_SEEK_THR	 (sector_t)(8 * 1024)
++#define BFQQ_SEEKY(bfqq) ((bfqq)->seek_mean > BFQQ_SEEK_THR)
++
++/* Min samples used for peak rate estimation (for autotuning). */
++#define BFQ_PEAK_RATE_SAMPLES	32
++
++/* Shift used for peak rate fixed precision calculations. */
++#define BFQ_RATE_SHIFT		16
++
++/*
++ * By default, BFQ computes the duration of the weight raising for
++ * interactive applications automatically, using the following formula:
++ * duration = (R / r) * T, where r is the peak rate of the device, and
++ * R and T are two reference parameters.
++ * In particular, R is the peak rate of the reference device (see below),
++ * and T is a reference time: given the systems that are likely to be
++ * installed on the reference device according to its speed class, T is
++ * about the maximum time needed, under BFQ and while reading two files in
++ * parallel, to load typical large applications on these systems.
++ * In practice, the slower/faster the device at hand is, the more/less it
++ * takes to load applications with respect to the reference device.
++ * Accordingly, the longer/shorter BFQ grants weight raising to interactive
++ * applications.
++ *
++ * BFQ uses four different reference pairs (R, T), depending on:
++ * . whether the device is rotational or non-rotational;
++ * . whether the device is slow, such as old or portable HDDs, as well as
++ *   SD cards, or fast, such as newer HDDs and SSDs.
++ *
++ * The device's speed class is dynamically (re)detected in
++ * bfq_update_peak_rate() every time the estimated peak rate is updated.
++ *
++ * In the following definitions, R_slow[0]/R_fast[0] and T_slow[0]/T_fast[0]
++ * are the reference values for a slow/fast rotational device, whereas
++ * R_slow[1]/R_fast[1] and T_slow[1]/T_fast[1] are the reference values for
++ * a slow/fast non-rotational device. Finally, device_speed_thresh are the
++ * thresholds used to switch between speed classes.
++ * Both the reference peak rates and the thresholds are measured in
++ * sectors/usec, left-shifted by BFQ_RATE_SHIFT.
++ */
++static int R_slow[2] = {1536, 10752};
++static int R_fast[2] = {17415, 34791};
++/*
++ * To improve readability, a conversion function is used to initialize the
++ * following arrays, which entails that they can be initialized only in a
++ * function.
++ */
++static int T_slow[2];
++static int T_fast[2];
++static int device_speed_thresh[2];
++
++#define BFQ_SERVICE_TREE_INIT	((struct bfq_service_tree)		\
++				{ RB_ROOT, RB_ROOT, NULL, NULL, 0, 0 })
++
++#define RQ_BIC(rq)		((struct bfq_io_cq *) (rq)->elv.priv[0])
++#define RQ_BFQQ(rq)		((rq)->elv.priv[1])
++
++static void bfq_schedule_dispatch(struct bfq_data *bfqd);
++
++#include "bfq-ioc.c"
++#include "bfq-sched.c"
++#include "bfq-cgroup.c"
++
++#define bfq_class_idle(bfqq)	((bfqq)->ioprio_class == IOPRIO_CLASS_IDLE)
++#define bfq_class_rt(bfqq)	((bfqq)->ioprio_class == IOPRIO_CLASS_RT)
++
++#define bfq_sample_valid(samples)	((samples) > 80)
++
++/*
++ * We regard a request as SYNC, if either it's a read or has the SYNC bit
++ * set (in which case it could also be a direct WRITE).
++ */
++static int bfq_bio_sync(struct bio *bio)
++{
++	if (bio_data_dir(bio) == READ || (bio->bi_rw & REQ_SYNC))
++		return 1;
++
++	return 0;
++}
++
++/*
++ * Schedule a run of the queue, if there are requests pending and no one in
++ * the driver that will restart queueing.
++ */
++static void bfq_schedule_dispatch(struct bfq_data *bfqd)
++{
++	if (bfqd->queued != 0) {
++		bfq_log(bfqd, "schedule dispatch");
++		kblockd_schedule_work(&bfqd->unplug_work);
++	}
++}
++
++/*
++ * Lifted from AS - choose which of rq1 and rq2 is best served now.
++ * We choose the request that is closest to the head right now.  Distance
++ * behind the head is penalized and only allowed to a certain extent.
++ */
++static struct request *bfq_choose_req(struct bfq_data *bfqd,
++				      struct request *rq1,
++				      struct request *rq2,
++				      sector_t last)
++{
++	sector_t s1, s2, d1 = 0, d2 = 0;
++	unsigned long back_max;
++#define BFQ_RQ1_WRAP	0x01 /* request 1 wraps */
++#define BFQ_RQ2_WRAP	0x02 /* request 2 wraps */
++	unsigned wrap = 0; /* bit mask: requests behind the disk head? */
++
++	if (!rq1 || rq1 == rq2)
++		return rq2;
++	if (!rq2)
++		return rq1;
++
++	if (rq_is_sync(rq1) && !rq_is_sync(rq2))
++		return rq1;
++	else if (rq_is_sync(rq2) && !rq_is_sync(rq1))
++		return rq2;
++	if ((rq1->cmd_flags & REQ_META) && !(rq2->cmd_flags & REQ_META))
++		return rq1;
++	else if ((rq2->cmd_flags & REQ_META) && !(rq1->cmd_flags & REQ_META))
++		return rq2;
++
++	s1 = blk_rq_pos(rq1);
++	s2 = blk_rq_pos(rq2);
++
++	/*
++	 * By definition, 1KiB is 2 sectors.
++	 */
++	back_max = bfqd->bfq_back_max * 2;
++
++	/*
++	 * Strict one way elevator _except_ in the case where we allow
++	 * short backward seeks which are biased as twice the cost of a
++	 * similar forward seek.
++	 */
++	if (s1 >= last)
++		d1 = s1 - last;
++	else if (s1 + back_max >= last)
++		d1 = (last - s1) * bfqd->bfq_back_penalty;
++	else
++		wrap |= BFQ_RQ1_WRAP;
++
++	if (s2 >= last)
++		d2 = s2 - last;
++	else if (s2 + back_max >= last)
++		d2 = (last - s2) * bfqd->bfq_back_penalty;
++	else
++		wrap |= BFQ_RQ2_WRAP;
++
++	/* Found required data */
++
++	/*
++	 * By doing switch() on the bit mask "wrap" we avoid having to
++	 * check two variables for all permutations: --> faster!
++	 */
++	switch (wrap) {
++	case 0: /* common case for CFQ: rq1 and rq2 not wrapped */
++		if (d1 < d2)
++			return rq1;
++		else if (d2 < d1)
++			return rq2;
++		else {
++			if (s1 >= s2)
++				return rq1;
++			else
++				return rq2;
++		}
++
++	case BFQ_RQ2_WRAP:
++		return rq1;
++	case BFQ_RQ1_WRAP:
++		return rq2;
++	case (BFQ_RQ1_WRAP|BFQ_RQ2_WRAP): /* both rqs wrapped */
++	default:
++		/*
++		 * Since both rqs are wrapped,
++		 * start with the one that's further behind head
++		 * (--> only *one* back seek required),
++		 * since back seek takes more time than forward.
++		 */
++		if (s1 <= s2)
++			return rq1;
++		else
++			return rq2;
++	}
++}
++
++/*
++ * Tell whether there are active queues or groups with differentiated weights.
++ */
++static bool bfq_differentiated_weights(struct bfq_data *bfqd)
++{
++	/*
++	 * For weights to differ, at least one of the trees must contain
++	 * at least two nodes.
++	 */
++	return (!RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
++		(bfqd->queue_weights_tree.rb_node->rb_left ||
++		 bfqd->queue_weights_tree.rb_node->rb_right)
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	       ) ||
++	       (!RB_EMPTY_ROOT(&bfqd->group_weights_tree) &&
++		(bfqd->group_weights_tree.rb_node->rb_left ||
++		 bfqd->group_weights_tree.rb_node->rb_right)
++#endif
++	       );
++}
++
++/*
++ * The following function returns true if every queue must receive the
++ * same share of the throughput (this condition is used when deciding
++ * whether idling may be disabled, see the comments in the function
++ * bfq_bfqq_may_idle()).
++ *
++ * Such a scenario occurs when:
++ * 1) all active queues have the same weight,
++ * 2) all active groups at the same level in the groups tree have the same
++ *    weight,
++ * 3) all active groups at the same level in the groups tree have the same
++ *    number of children.
++ *
++ * Unfortunately, keeping the necessary state for evaluating exactly the
++ * above symmetry conditions would be quite complex and time-consuming.
++ * Therefore this function evaluates, instead, the following stronger
++ * sub-conditions, for which it is much easier to maintain the needed
++ * state:
++ * 1) all active queues have the same weight,
++ * 2) all active groups have the same weight,
++ * 3) all active groups have at most one active child each.
++ * In particular, the last two conditions are always true if hierarchical
++ * support and the cgroups interface are not enabled, thus no state needs
++ * to be maintained in this case.
++ */
++static bool bfq_symmetric_scenario(struct bfq_data *bfqd)
++{
++	return
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++		!bfqd->active_numerous_groups &&
++#endif
++		!bfq_differentiated_weights(bfqd);
++}
++
++/*
++ * If the weight-counter tree passed as input contains no counter for
++ * the weight of the input entity, then add that counter; otherwise just
++ * increment the existing counter.
++ *
++ * Note that weight-counter trees contain few nodes in mostly symmetric
++ * scenarios. For example, if all queues have the same weight, then the
++ * weight-counter tree for the queues may contain at most one node.
++ * This holds even if low_latency is on, because weight-raised queues
++ * are not inserted in the tree.
++ * In most scenarios, the rate at which nodes are created/destroyed
++ * should be low too.
++ */
++static void bfq_weights_tree_add(struct bfq_data *bfqd,
++				 struct bfq_entity *entity,
++				 struct rb_root *root)
++{
++	struct rb_node **new = &(root->rb_node), *parent = NULL;
++
++	/*
++	 * Do not insert if the entity is already associated with a
++	 * counter, which happens if:
++	 *   1) the entity is associated with a queue,
++	 *   2) a request arrival has caused the queue to become both
++	 *      non-weight-raised, and hence change its weight, and
++	 *      backlogged; in this respect, each of the two events
++	 *      causes an invocation of this function,
++	 *   3) this is the invocation of this function caused by the
++	 *      second event. This second invocation is actually useless,
++	 *      and we handle this fact by exiting immediately. More
++	 *      efficient or clearer solutions might possibly be adopted.
++	 */
++	if (entity->weight_counter)
++		return;
++
++	while (*new) {
++		struct bfq_weight_counter *__counter = container_of(*new,
++						struct bfq_weight_counter,
++						weights_node);
++		parent = *new;
++
++		if (entity->weight == __counter->weight) {
++			entity->weight_counter = __counter;
++			goto inc_counter;
++		}
++		if (entity->weight < __counter->weight)
++			new = &((*new)->rb_left);
++		else
++			new = &((*new)->rb_right);
++	}
++
++	entity->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
++					 GFP_ATOMIC);
++	entity->weight_counter->weight = entity->weight;
++	rb_link_node(&entity->weight_counter->weights_node, parent, new);
++	rb_insert_color(&entity->weight_counter->weights_node, root);
++
++inc_counter:
++	entity->weight_counter->num_active++;
++}
++
++/*
++ * Decrement the weight counter associated with the entity, and, if the
++ * counter reaches 0, remove the counter from the tree.
++ * See the comments to the function bfq_weights_tree_add() for considerations
++ * about overhead.
++ */
++static void bfq_weights_tree_remove(struct bfq_data *bfqd,
++				    struct bfq_entity *entity,
++				    struct rb_root *root)
++{
++	if (!entity->weight_counter)
++		return;
++
++	BUG_ON(RB_EMPTY_ROOT(root));
++	BUG_ON(entity->weight_counter->weight != entity->weight);
++
++	BUG_ON(!entity->weight_counter->num_active);
++	entity->weight_counter->num_active--;
++	if (entity->weight_counter->num_active > 0)
++		goto reset_entity_pointer;
++
++	rb_erase(&entity->weight_counter->weights_node, root);
++	kfree(entity->weight_counter);
++
++reset_entity_pointer:
++	entity->weight_counter = NULL;
++}
++
++static struct request *bfq_find_next_rq(struct bfq_data *bfqd,
++					struct bfq_queue *bfqq,
++					struct request *last)
++{
++	struct rb_node *rbnext = rb_next(&last->rb_node);
++	struct rb_node *rbprev = rb_prev(&last->rb_node);
++	struct request *next = NULL, *prev = NULL;
++
++	BUG_ON(RB_EMPTY_NODE(&last->rb_node));
++
++	if (rbprev)
++		prev = rb_entry_rq(rbprev);
++
++	if (rbnext)
++		next = rb_entry_rq(rbnext);
++	else {
++		rbnext = rb_first(&bfqq->sort_list);
++		if (rbnext && rbnext != &last->rb_node)
++			next = rb_entry_rq(rbnext);
++	}
++
++	return bfq_choose_req(bfqd, next, prev, blk_rq_pos(last));
++}
++
++/* see the definition of bfq_async_charge_factor for details */
++static unsigned long bfq_serv_to_charge(struct request *rq,
++					struct bfq_queue *bfqq)
++{
++	return blk_rq_sectors(rq) *
++		(1 + ((!bfq_bfqq_sync(bfqq)) * (bfqq->wr_coeff == 1) *
++		bfq_async_charge_factor));
++}
++
++/**
++ * bfq_updated_next_req - update the queue after a new next_rq selection.
++ * @bfqd: the device data the queue belongs to.
++ * @bfqq: the queue to update.
++ *
++ * If the first request of a queue changes we make sure that the queue
++ * has enough budget to serve at least its first request (if the
++ * request has grown).  We do this because if the queue has not enough
++ * budget for its first request, it has to go through two dispatch
++ * rounds to actually get it dispatched.
++ */
++static void bfq_updated_next_req(struct bfq_data *bfqd,
++				 struct bfq_queue *bfqq)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++	struct bfq_service_tree *st = bfq_entity_service_tree(entity);
++	struct request *next_rq = bfqq->next_rq;
++	unsigned long new_budget;
++
++	if (!next_rq)
++		return;
++
++	if (bfqq == bfqd->in_service_queue)
++		/*
++		 * In order not to break guarantees, budgets cannot be
++		 * changed after an entity has been selected.
++		 */
++		return;
++
++	BUG_ON(entity->tree != &st->active);
++	BUG_ON(entity == entity->sched_data->in_service_entity);
++
++	new_budget = max_t(unsigned long, bfqq->max_budget,
++			   bfq_serv_to_charge(next_rq, bfqq));
++	if (entity->budget != new_budget) {
++		entity->budget = new_budget;
++		bfq_log_bfqq(bfqd, bfqq, "updated next rq: new budget %lu",
++					 new_budget);
++		bfq_activate_bfqq(bfqd, bfqq);
++	}
++}
++
++static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
++{
++	u64 dur;
++
++	if (bfqd->bfq_wr_max_time > 0)
++		return bfqd->bfq_wr_max_time;
++
++	dur = bfqd->RT_prod;
++	do_div(dur, bfqd->peak_rate);
++
++	return dur;
++}
++
++/* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
++static void bfq_reset_burst_list(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	struct bfq_queue *item;
++	struct hlist_node *n;
++
++	hlist_for_each_entry_safe(item, n, &bfqd->burst_list, burst_list_node)
++		hlist_del_init(&item->burst_list_node);
++	hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
++	bfqd->burst_size = 1;
++}
++
++/* Add bfqq to the list of queues in current burst (see bfq_handle_burst) */
++static void bfq_add_to_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	/* Increment burst size to also take bfqq into account */
++	bfqd->burst_size++;
++
++	if (bfqd->burst_size == bfqd->bfq_large_burst_thresh) {
++		struct bfq_queue *pos, *bfqq_item;
++		struct hlist_node *n;
++
++		/*
++		 * Enough queues have been activated shortly after each
++		 * other to consider this burst as large.
++		 */
++		bfqd->large_burst = true;
++
++		/*
++		 * We can now mark all queues in the burst list as
++		 * belonging to a large burst.
++		 */
++		hlist_for_each_entry(bfqq_item, &bfqd->burst_list,
++				     burst_list_node)
++			bfq_mark_bfqq_in_large_burst(bfqq_item);
++		bfq_mark_bfqq_in_large_burst(bfqq);
++
++		/*
++		 * From now on, and until the current burst finishes, any
++		 * new queue being activated shortly after the last queue
++		 * was inserted in the burst can be immediately marked as
++		 * belonging to a large burst. So the burst list is not
++		 * needed any more. Remove it.
++		 */
++		hlist_for_each_entry_safe(pos, n, &bfqd->burst_list,
++					  burst_list_node)
++			hlist_del_init(&pos->burst_list_node);
++	} else /* burst not yet large: add bfqq to the burst list */
++		hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
++}
++
++/*
++ * If many queues happen to become active shortly after each other, then,
++ * to help the processes associated with these queues get their job done as
++ * soon as possible, it is usually better to not grant either weight-raising
++ * or device idling to these queues. In this comment we describe, firstly,
++ * the reasons why this fact holds, and, secondly, the next function, which
++ * implements the main steps needed to properly mark these queues so that
++ * they can then be treated in a different way.
++ *
++ * As for the terminology, we say that a queue becomes active, i.e.,
++ * switches from idle to backlogged, either when it is created (as a
++ * consequence of the arrival of an I/O request), or, if already existing,
++ * when a new request for the queue arrives while the queue is idle.
++ * Bursts of activations, i.e., activations of different queues occurring
++ * shortly after each other, are typically caused by services or applications
++ * that spawn or reactivate many parallel threads/processes. Examples are
++ * systemd during boot or git grep.
++ *
++ * These services or applications benefit mostly from a high throughput:
++ * the quicker the requests of the activated queues are cumulatively served,
++ * the sooner the target job of these queues gets completed. As a consequence,
++ * weight-raising any of these queues, which also implies idling the device
++ * for it, is almost always counterproductive: in most cases it just lowers
++ * throughput.
++ *
++ * On the other hand, a burst of activations may also be caused by the start
++ * of an application that does not consist of a lot of parallel I/O-bound
++ * threads. In fact, with a complex application, the burst may be just a
++ * consequence of the fact that several processes need to be executed to
++ * start up the application. To start an application as quickly as possible,
++ * the best thing to do is to privilege the I/O related to the application
++ * with respect to all other I/O. Therefore, the best strategy to start as
++ * quickly as possible an application that causes a burst of activations is
++ * to weight-raise all the queues activated during the burst. This is the
++ * exact opposite of the best strategy for the other type of bursts.
++ *
++ * In the end, to take the best action for each of the two cases, the two
++ * types of bursts need to be distinguished. Fortunately, this seems
++ * relatively easy to do, by looking at the sizes of the bursts. In
++ * particular, we found a threshold such that bursts with a larger size
++ * than that threshold are apparently caused only by services or commands
++ * such as systemd or git grep. For brevity, hereafter we simply call these
++ * bursts 'large'. BFQ *does not* weight-raise queues whose activations occur
++ * in a large burst. In addition, for each of these queues BFQ performs or
++ * does not perform idling depending on which choice boosts the throughput
++ * most. The exact choice depends on the device and request pattern at
++ * hand.
++ *
++ * Turning back to the next function, it implements all the steps needed
++ * to detect the occurrence of a large burst and to properly mark all the
++ * queues belonging to it (so that they can then be treated in a different
++ * way). This goal is achieved by maintaining a special "burst list" that
++ * holds, temporarily, the queues that belong to the burst in progress. The
++ * list is then used to mark these queues as belonging to a large burst if
++ * the burst does become large. The main steps are the following.
++ *
++ * . when the very first queue is activated, the queue is inserted into the
++ *   list (as it could be the first queue in a possible burst)
++ *
++ * . if the current burst has not yet become large, and a queue Q that does
++ *   not yet belong to the burst is activated shortly after the last time
++ *   at which a new queue entered the burst list, then the function appends
++ *   Q to the burst list
++ *
++ * . if, as a consequence of the previous step, the burst size reaches
++ *   the large-burst threshold, then
++ *
++ *     . all the queues in the burst list are marked as belonging to a
++ *       large burst
++ *
++ *     . the burst list is deleted; in fact, the burst list already served
++ *       its purpose (temporarily keeping track of the queues in a burst,
++ *       so as to be able to mark them as belonging to a large burst in the
++ *       previous sub-step), and now is not needed any more
++ *
++ *     . the device enters a large-burst mode
++ *
++ * . if a queue Q that does not belong to the burst is activated while
++ *   the device is in large-burst mode and shortly after the last time
++ *   at which a queue either entered the burst list or was marked as
++ *   belonging to the current large burst, then Q is immediately marked
++ *   as belonging to a large burst.
++ *
++ * . if a queue Q that does not belong to the burst is activated a while
++ *   after, i.e., not shortly after, the last time at which a queue
++ *   either entered the burst list or was marked as belonging to the
++ *   current large burst, then the current burst is deemed as finished and:
++ *
++ *        . the large-burst mode is reset if set
++ *
++ *        . the burst list is emptied
++ *
++ *        . Q is inserted in the burst list, as Q may be the first queue
++ *          in a possible new burst (then the burst list contains just Q
++ *          after this step).
++ */
++static void bfq_handle_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++			     bool idle_for_long_time)
++{
++	/*
++	 * If bfqq happened to be activated in a burst, but has been idle
++	 * for at least as long as an interactive queue, then we assume
++	 * that, in the overall I/O initiated in the burst, the I/O
++	 * associated with bfqq is finished. So bfqq does not need to be
++	 * treated as a queue belonging to a burst anymore. Accordingly,
++	 * we reset bfqq's in_large_burst flag if set, and remove bfqq
++	 * from the burst list if it's there. However, we do not decrement
++	 * burst_size, because the fact that bfqq does not need to belong
++	 * to the burst list any more does not invalidate the fact that
++	 * bfqq may have been activated during the current burst.
++	 */
++	if (idle_for_long_time) {
++		hlist_del_init(&bfqq->burst_list_node);
++		bfq_clear_bfqq_in_large_burst(bfqq);
++	}
++
++	/*
++	 * If bfqq is already in the burst list or is part of a large
++	 * burst, then there is nothing else to do.
++	 */
++	if (!hlist_unhashed(&bfqq->burst_list_node) ||
++	    bfq_bfqq_in_large_burst(bfqq))
++		return;
++
++	/*
++	 * If bfqq's activation happens late enough, then the current
++	 * burst is finished, and related data structures must be reset.
++	 *
++	 * In this respect, consider the special case where bfqq is the very
++	 * first queue being activated. In this case, last_ins_in_burst is
++	 * not yet significant when we get here. But it is easy to verify
++	 * that, whether or not the following condition is true, bfqq will
++	 * end up being inserted into the burst list. In particular the
++	 * list will happen to contain only bfqq. And this is exactly what
++	 * has to happen, as bfqq may be the first queue in a possible
++	 * burst.
++	 */
++	if (time_is_before_jiffies(bfqd->last_ins_in_burst +
++	    bfqd->bfq_burst_interval)) {
++		bfqd->large_burst = false;
++		bfq_reset_burst_list(bfqd, bfqq);
++		return;
++	}
++
++	/*
++	 * If we get here, then bfqq is being activated shortly after the
++	 * last queue. So, if the current burst is also large, we can mark
++	 * bfqq as belonging to this large burst immediately.
++	 */
++	if (bfqd->large_burst) {
++		bfq_mark_bfqq_in_large_burst(bfqq);
++		return;
++	}
++
++	/*
++	 * If we get here, then a large-burst state has not yet been
++	 * reached, but bfqq is being activated shortly after the last
++	 * queue. Then we add bfqq to the burst.
++	 */
++	bfq_add_to_burst(bfqd, bfqq);
++}
++
++static void bfq_add_request(struct request *rq)
++{
++	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++	struct bfq_entity *entity = &bfqq->entity;
++	struct bfq_data *bfqd = bfqq->bfqd;
++	struct request *next_rq, *prev;
++	unsigned long old_wr_coeff = bfqq->wr_coeff;
++	bool interactive = false;
++
++	bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
++	bfqq->queued[rq_is_sync(rq)]++;
++	bfqd->queued++;
++
++	elv_rb_add(&bfqq->sort_list, rq);
++
++	/*
++	 * Check if this request is a better next-serve candidate.
++	 */
++	prev = bfqq->next_rq;
++	next_rq = bfq_choose_req(bfqd, bfqq->next_rq, rq, bfqd->last_position);
++	BUG_ON(!next_rq);
++	bfqq->next_rq = next_rq;
++
++	if (!bfq_bfqq_busy(bfqq)) {
++		bool soft_rt, in_burst,
++		     idle_for_long_time = time_is_before_jiffies(
++						bfqq->budget_timeout +
++						bfqd->bfq_wr_min_idle_time);
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++		bfqg_stats_update_io_add(bfqq_group(RQ_BFQQ(rq)), bfqq,
++					 rq->cmd_flags);
++#endif
++		if (bfq_bfqq_sync(bfqq)) {
++			bool already_in_burst =
++			   !hlist_unhashed(&bfqq->burst_list_node) ||
++			   bfq_bfqq_in_large_burst(bfqq);
++			bfq_handle_burst(bfqd, bfqq, idle_for_long_time);
++			/*
++			 * If bfqq was not already in the current burst,
++			 * then, at this point, bfqq either has been
++			 * added to the current burst or has caused the
++			 * current burst to terminate. In particular, in
++			 * the second case, bfqq has become the first
++			 * queue in a possible new burst.
++			 * In both cases last_ins_in_burst needs to be
++			 * moved forward.
++			 */
++			if (!already_in_burst)
++				bfqd->last_ins_in_burst = jiffies;
++		}
++
++		in_burst = bfq_bfqq_in_large_burst(bfqq);
++		soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
++			!in_burst &&
++			time_is_before_jiffies(bfqq->soft_rt_next_start);
++		interactive = !in_burst && idle_for_long_time;
++		entity->budget = max_t(unsigned long, bfqq->max_budget,
++				       bfq_serv_to_charge(next_rq, bfqq));
++
++		if (!bfq_bfqq_IO_bound(bfqq)) {
++			if (time_before(jiffies,
++					RQ_BIC(rq)->ttime.last_end_request +
++					bfqd->bfq_slice_idle)) {
++				bfqq->requests_within_timer++;
++				if (bfqq->requests_within_timer >=
++				    bfqd->bfq_requests_within_timer)
++					bfq_mark_bfqq_IO_bound(bfqq);
++			} else
++				bfqq->requests_within_timer = 0;
++		}
++
++		if (!bfqd->low_latency)
++			goto add_bfqq_busy;
++
++		/*
++		 * If the queue:
++		 * - is not being boosted,
++		 * - has been idle for enough time,
++		 * - is not a sync queue or is linked to a bfq_io_cq (it is
++		 *   shared "by its nature" or it is not shared and its
++		 *   requests have not been redirected to a shared queue)
++		 * start a weight-raising period.
++		 */
++		if (old_wr_coeff == 1 && (interactive || soft_rt) &&
++		    (!bfq_bfqq_sync(bfqq) || bfqq->bic)) {
++			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
++			if (interactive)
++				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++			else
++				bfqq->wr_cur_max_time =
++					bfqd->bfq_wr_rt_max_time;
++			bfq_log_bfqq(bfqd, bfqq,
++				     "wrais starting at %lu, rais_max_time %u",
++				     jiffies,
++				     jiffies_to_msecs(bfqq->wr_cur_max_time));
++		} else if (old_wr_coeff > 1) {
++			if (interactive)
++				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++			else if (in_burst ||
++				 (bfqq->wr_cur_max_time ==
++				  bfqd->bfq_wr_rt_max_time &&
++				  !soft_rt)) {
++				bfqq->wr_coeff = 1;
++				bfq_log_bfqq(bfqd, bfqq,
++					"wrais ending at %lu, rais_max_time %u",
++					jiffies,
++					jiffies_to_msecs(bfqq->
++						wr_cur_max_time));
++			} else if (time_before(
++					bfqq->last_wr_start_finish +
++					bfqq->wr_cur_max_time,
++					jiffies +
++					bfqd->bfq_wr_rt_max_time) &&
++				   soft_rt) {
++				/*
++				 *
++				 * The remaining weight-raising time is lower
++				 * than bfqd->bfq_wr_rt_max_time, which means
++				 * that the application is enjoying weight
++				 * raising either because deemed soft-rt in
++				 * the near past, or because deemed interactive
++				 * a long time ago.
++				 * In both cases, resetting now the current
++				 * remaining weight-raising time for the
++				 * application to the weight-raising duration
++				 * for soft rt applications would not cause any
++				 * latency increase for the application (as the
++				 * new duration would be higher than the
++				 * remaining time).
++				 *
++				 * In addition, the application is now meeting
++				 * the requirements for being deemed soft rt.
++				 * In the end we can correctly and safely
++				 * (re)charge the weight-raising duration for
++				 * the application with the weight-raising
++				 * duration for soft rt applications.
++				 *
++				 * In particular, doing this recharge now, i.e.,
++				 * before the weight-raising period for the
++				 * application finishes, reduces the probability
++				 * of the following negative scenario:
++				 * 1) the weight of a soft rt application is
++				 *    raised at startup (as for any newly
++				 *    created application),
++				 * 2) since the application is not interactive,
++				 *    at a certain time weight-raising is
++				 *    stopped for the application,
++				 * 3) at that time the application happens to
++				 *    still have pending requests, and hence
++				 *    is destined to not have a chance to be
++				 *    deemed soft rt before these requests are
++				 *    completed (see the comments to the
++				 *    function bfq_bfqq_softrt_next_start()
++				 *    for details on soft rt detection),
++				 * 4) these pending requests experience a high
++				 *    latency because the application is not
++				 *    weight-raised while they are pending.
++				 */
++				bfqq->last_wr_start_finish = jiffies;
++				bfqq->wr_cur_max_time =
++					bfqd->bfq_wr_rt_max_time;
++			}
++		}
++		if (old_wr_coeff != bfqq->wr_coeff)
++			entity->prio_changed = 1;
++add_bfqq_busy:
++		bfqq->last_idle_bklogged = jiffies;
++		bfqq->service_from_backlogged = 0;
++		bfq_clear_bfqq_softrt_update(bfqq);
++		bfq_add_bfqq_busy(bfqd, bfqq);
++	} else {
++		if (bfqd->low_latency && old_wr_coeff == 1 && !rq_is_sync(rq) &&
++		    time_is_before_jiffies(
++				bfqq->last_wr_start_finish +
++				bfqd->bfq_wr_min_inter_arr_async)) {
++			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
++			bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++
++			bfqd->wr_busy_queues++;
++			entity->prio_changed = 1;
++			bfq_log_bfqq(bfqd, bfqq,
++			    "non-idle wrais starting at %lu, rais_max_time %u",
++			    jiffies,
++			    jiffies_to_msecs(bfqq->wr_cur_max_time));
++		}
++		if (prev != bfqq->next_rq)
++			bfq_updated_next_req(bfqd, bfqq);
++	}
++
++	if (bfqd->low_latency &&
++		(old_wr_coeff == 1 || bfqq->wr_coeff == 1 || interactive))
++		bfqq->last_wr_start_finish = jiffies;
++}
++
++static struct request *bfq_find_rq_fmerge(struct bfq_data *bfqd,
++					  struct bio *bio)
++{
++	struct task_struct *tsk = current;
++	struct bfq_io_cq *bic;
++	struct bfq_queue *bfqq;
++
++	bic = bfq_bic_lookup(bfqd, tsk->io_context);
++	if (!bic)
++		return NULL;
++
++	bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
++	if (bfqq)
++		return elv_rb_find(&bfqq->sort_list, bio_end_sector(bio));
++
++	return NULL;
++}
++
++static void bfq_activate_request(struct request_queue *q, struct request *rq)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++
++	bfqd->rq_in_driver++;
++	bfqd->last_position = blk_rq_pos(rq) + blk_rq_sectors(rq);
++	bfq_log(bfqd, "activate_request: new bfqd->last_position %llu",
++		(long long unsigned)bfqd->last_position);
++}
++
++static void bfq_deactivate_request(struct request_queue *q, struct request *rq)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++
++	BUG_ON(bfqd->rq_in_driver == 0);
++	bfqd->rq_in_driver--;
++}
++
++static void bfq_remove_request(struct request *rq)
++{
++	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++	struct bfq_data *bfqd = bfqq->bfqd;
++	const int sync = rq_is_sync(rq);
++
++	if (bfqq->next_rq == rq) {
++		bfqq->next_rq = bfq_find_next_rq(bfqd, bfqq, rq);
++		bfq_updated_next_req(bfqd, bfqq);
++	}
++
++	if (rq->queuelist.prev != &rq->queuelist)
++		list_del_init(&rq->queuelist);
++	BUG_ON(bfqq->queued[sync] == 0);
++	bfqq->queued[sync]--;
++	bfqd->queued--;
++	elv_rb_del(&bfqq->sort_list, rq);
++
++	if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
++		if (bfq_bfqq_busy(bfqq) && bfqq != bfqd->in_service_queue)
++			bfq_del_bfqq_busy(bfqd, bfqq, 1);
++		/*
++		 * Remove queue from request-position tree as it is empty.
++		 */
++		if (bfqq->pos_root) {
++			rb_erase(&bfqq->pos_node, bfqq->pos_root);
++			bfqq->pos_root = NULL;
++		}
++	}
++
++	if (rq->cmd_flags & REQ_META) {
++		BUG_ON(bfqq->meta_pending == 0);
++		bfqq->meta_pending--;
++	}
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	bfqg_stats_update_io_remove(bfqq_group(bfqq), rq->cmd_flags);
++#endif
++}
++
++static int bfq_merge(struct request_queue *q, struct request **req,
++		     struct bio *bio)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct request *__rq;
++
++	__rq = bfq_find_rq_fmerge(bfqd, bio);
++	if (__rq && elv_rq_merge_ok(__rq, bio)) {
++		*req = __rq;
++		return ELEVATOR_FRONT_MERGE;
++	}
++
++	return ELEVATOR_NO_MERGE;
++}
++
++static void bfq_merged_request(struct request_queue *q, struct request *req,
++			       int type)
++{
++	if (type == ELEVATOR_FRONT_MERGE &&
++	    rb_prev(&req->rb_node) &&
++	    blk_rq_pos(req) <
++	    blk_rq_pos(container_of(rb_prev(&req->rb_node),
++				    struct request, rb_node))) {
++		struct bfq_queue *bfqq = RQ_BFQQ(req);
++		struct bfq_data *bfqd = bfqq->bfqd;
++		struct request *prev, *next_rq;
++
++		/* Reposition request in its sort_list */
++		elv_rb_del(&bfqq->sort_list, req);
++		elv_rb_add(&bfqq->sort_list, req);
++		/* Choose next request to be served for bfqq */
++		prev = bfqq->next_rq;
++		next_rq = bfq_choose_req(bfqd, bfqq->next_rq, req,
++					 bfqd->last_position);
++		BUG_ON(!next_rq);
++		bfqq->next_rq = next_rq;
++	}
++}
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++static void bfq_bio_merged(struct request_queue *q, struct request *req,
++			   struct bio *bio)
++{
++	bfqg_stats_update_io_merged(bfqq_group(RQ_BFQQ(req)), bio->bi_rw);
++}
++#endif
++
++static void bfq_merged_requests(struct request_queue *q, struct request *rq,
++				struct request *next)
++{
++	struct bfq_queue *bfqq = RQ_BFQQ(rq), *next_bfqq = RQ_BFQQ(next);
++
++	/*
++	 * If next and rq belong to the same bfq_queue and next is older
++	 * than rq, then reposition rq in the fifo (by substituting next
++	 * with rq). Otherwise, if next and rq belong to different
++	 * bfq_queues, never reposition rq: in fact, we would have to
++	 * reposition it with respect to next's position in its own fifo,
++	 * which would most certainly be too expensive with respect to
++	 * the benefits.
++	 */
++	if (bfqq == next_bfqq &&
++	    !list_empty(&rq->queuelist) && !list_empty(&next->queuelist) &&
++	    time_before(next->fifo_time, rq->fifo_time)) {
++		list_del_init(&rq->queuelist);
++		list_replace_init(&next->queuelist, &rq->queuelist);
++		rq->fifo_time = next->fifo_time;
++	}
++
++	if (bfqq->next_rq == next)
++		bfqq->next_rq = rq;
++
++	bfq_remove_request(next);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	bfqg_stats_update_io_merged(bfqq_group(bfqq), next->cmd_flags);
++#endif
++}
++
++/* Must be called with bfqq != NULL */
++static void bfq_bfqq_end_wr(struct bfq_queue *bfqq)
++{
++	BUG_ON(!bfqq);
++	if (bfq_bfqq_busy(bfqq))
++		bfqq->bfqd->wr_busy_queues--;
++	bfqq->wr_coeff = 1;
++	bfqq->wr_cur_max_time = 0;
++	/* Trigger a weight change on the next activation of the queue */
++	bfqq->entity.prio_changed = 1;
++}
++
++static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
++				    struct bfq_group *bfqg)
++{
++	int i, j;
++
++	for (i = 0; i < 2; i++)
++		for (j = 0; j < IOPRIO_BE_NR; j++)
++			if (bfqg->async_bfqq[i][j])
++				bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
++	if (bfqg->async_idle_bfqq)
++		bfq_bfqq_end_wr(bfqg->async_idle_bfqq);
++}
++
++static void bfq_end_wr(struct bfq_data *bfqd)
++{
++	struct bfq_queue *bfqq;
++
++	spin_lock_irq(bfqd->queue->queue_lock);
++
++	list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
++		bfq_bfqq_end_wr(bfqq);
++	list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
++		bfq_bfqq_end_wr(bfqq);
++	bfq_end_wr_async(bfqd);
++
++	spin_unlock_irq(bfqd->queue->queue_lock);
++}
++
++static int bfq_allow_merge(struct request_queue *q, struct request *rq,
++			   struct bio *bio)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct bfq_io_cq *bic;
++
++	/*
++	 * Disallow merge of a sync bio into an async request.
++	 */
++	if (bfq_bio_sync(bio) && !rq_is_sync(rq))
++		return 0;
++
++	/*
++	 * Lookup the bfqq that this bio will be queued with. Allow
++	 * merge only if rq is queued there.
++	 * Queue lock is held here.
++	 */
++	bic = bfq_bic_lookup(bfqd, current->io_context);
++	if (!bic)
++		return 0;
++
++	return bic_to_bfqq(bic, bfq_bio_sync(bio)) == RQ_BFQQ(rq);
++}
++
++static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
++				       struct bfq_queue *bfqq)
++{
++	if (bfqq) {
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++		bfqg_stats_update_avg_queue_size(bfqq_group(bfqq));
++#endif
++		bfq_mark_bfqq_must_alloc(bfqq);
++		bfq_mark_bfqq_budget_new(bfqq);
++		bfq_clear_bfqq_fifo_expire(bfqq);
++
++		bfqd->budgets_assigned = (bfqd->budgets_assigned*7 + 256) / 8;
++
++		bfq_log_bfqq(bfqd, bfqq,
++			     "set_in_service_queue, cur-budget = %d",
++			     bfqq->entity.budget);
++	}
++
++	bfqd->in_service_queue = bfqq;
++}
++
++/*
++ * Get and set a new queue for service.
++ */
++static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd)
++{
++	struct bfq_queue *bfqq = bfq_get_next_queue(bfqd);
++
++	__bfq_set_in_service_queue(bfqd, bfqq);
++	return bfqq;
++}
++
++/*
++ * If enough samples have been computed, return the current max budget
++ * stored in bfqd, which is dynamically updated according to the
++ * estimated disk peak rate; otherwise return the default max budget
++ */
++static int bfq_max_budget(struct bfq_data *bfqd)
++{
++	if (bfqd->budgets_assigned < bfq_stats_min_budgets)
++		return bfq_default_max_budget;
++	else
++		return bfqd->bfq_max_budget;
++}
++
++/*
++ * Return min budget, which is a fraction of the current or default
++ * max budget (trying with 1/32)
++ */
++static int bfq_min_budget(struct bfq_data *bfqd)
++{
++	if (bfqd->budgets_assigned < bfq_stats_min_budgets)
++		return bfq_default_max_budget / 32;
++	else
++		return bfqd->bfq_max_budget / 32;
++}
++
++static void bfq_arm_slice_timer(struct bfq_data *bfqd)
++{
++	struct bfq_queue *bfqq = bfqd->in_service_queue;
++	struct bfq_io_cq *bic;
++	unsigned long sl;
++
++	BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
++
++	/* Processes have exited, don't wait. */
++	bic = bfqd->in_service_bic;
++	if (!bic || atomic_read(&bic->icq.ioc->active_ref) == 0)
++		return;
++
++	bfq_mark_bfqq_wait_request(bfqq);
++
++	/*
++	 * We don't want to idle for seeks, but we do want to allow
++	 * fair distribution of slice time for a process doing back-to-back
++	 * seeks. So allow a little bit of time for it to submit a new rq.
++	 *
++	 * To prevent processes with (partly) seeky workloads from
++	 * being too ill-treated, grant them a small fraction of the
++	 * assigned budget before reducing the waiting time to
++	 * BFQ_MIN_TT. This happened to help reduce latency.
++	 */
++	sl = bfqd->bfq_slice_idle;
++	/*
++	 * Unless the queue is being weight-raised or the scenario is
++	 * asymmetric, grant only minimum idle time if the queue either
++	 * has been seeky for long enough or has already proved to be
++	 * constantly seeky.
++	 */
++	if (bfq_sample_valid(bfqq->seek_samples) &&
++	    ((BFQQ_SEEKY(bfqq) && bfqq->entity.service >
++				  bfq_max_budget(bfqq->bfqd) / 8) ||
++	      bfq_bfqq_constantly_seeky(bfqq)) && bfqq->wr_coeff == 1 &&
++	    bfq_symmetric_scenario(bfqd))
++		sl = min(sl, msecs_to_jiffies(BFQ_MIN_TT));
++	else if (bfqq->wr_coeff > 1)
++		sl = sl * 3;
++	bfqd->last_idling_start = ktime_get();
++	mod_timer(&bfqd->idle_slice_timer, jiffies + sl);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	bfqg_stats_set_start_idle_time(bfqq_group(bfqq));
++#endif
++	bfq_log(bfqd, "arm idle: %u/%u ms",
++		jiffies_to_msecs(sl), jiffies_to_msecs(bfqd->bfq_slice_idle));
++}
++
++/*
++ * Set the maximum time for the in-service queue to consume its
++ * budget. This prevents seeky processes from lowering the disk
++ * throughput (always guaranteed with a time slice scheme as in CFQ).
++ */
++static void bfq_set_budget_timeout(struct bfq_data *bfqd)
++{
++	struct bfq_queue *bfqq = bfqd->in_service_queue;
++	unsigned int timeout_coeff;
++	if (bfqq->wr_cur_max_time == bfqd->bfq_wr_rt_max_time)
++		timeout_coeff = 1;
++	else
++		timeout_coeff = bfqq->entity.weight / bfqq->entity.orig_weight;
++
++	bfqd->last_budget_start = ktime_get();
++
++	bfq_clear_bfqq_budget_new(bfqq);
++	bfqq->budget_timeout = jiffies +
++		bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] * timeout_coeff;
++
++	bfq_log_bfqq(bfqd, bfqq, "set budget_timeout %u",
++		jiffies_to_msecs(bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] *
++		timeout_coeff));
++}
++
++/*
++ * Move request from internal lists to the request queue dispatch list.
++ */
++static void bfq_dispatch_insert(struct request_queue *q, struct request *rq)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++
++	/*
++	 * For consistency, the next instruction should have been executed
++	 * after removing the request from the queue and dispatching it.
++	 * We instead execute this instruction before bfq_remove_request()
++	 * (and hence introduce a temporary inconsistency), for efficiency.
++	 * In fact, in a forced_dispatch, this prevents two counters related
++	 * to bfqq->dispatched from being uselessly decremented if bfqq
++	 * is not in service, and then incremented again after
++	 * incrementing bfqq->dispatched.
++	 */
++	bfqq->dispatched++;
++	bfq_remove_request(rq);
++	elv_dispatch_sort(q, rq);
++
++	if (bfq_bfqq_sync(bfqq))
++		bfqd->sync_flight++;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	bfqg_stats_update_dispatch(bfqq_group(bfqq), blk_rq_bytes(rq),
++				   rq->cmd_flags);
++#endif
++}
++
++/*
++ * Return expired entry, or NULL to just start from scratch in rbtree.
++ */
++static struct request *bfq_check_fifo(struct bfq_queue *bfqq)
++{
++	struct request *rq = NULL;
++
++	if (bfq_bfqq_fifo_expire(bfqq))
++		return NULL;
++
++	bfq_mark_bfqq_fifo_expire(bfqq);
++
++	if (list_empty(&bfqq->fifo))
++		return NULL;
++
++	rq = rq_entry_fifo(bfqq->fifo.next);
++
++	if (time_before(jiffies, rq->fifo_time))
++		return NULL;
++
++	return rq;
++}
++
++static int bfq_bfqq_budget_left(struct bfq_queue *bfqq)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++	return entity->budget - entity->service;
++}
++
++static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	BUG_ON(bfqq != bfqd->in_service_queue);
++
++	__bfq_bfqd_reset_in_service(bfqd);
++
++	if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
++		/*
++		 * Overloading budget_timeout field to store the time
++		 * at which the queue remains with no backlog; used by
++		 * the weight-raising mechanism.
++		 */
++		bfqq->budget_timeout = jiffies;
++		bfq_del_bfqq_busy(bfqd, bfqq, 1);
++	} else
++		bfq_activate_bfqq(bfqd, bfqq);
++}
++
++/**
++ * __bfq_bfqq_recalc_budget - try to adapt the budget to the @bfqq behavior.
++ * @bfqd: device data.
++ * @bfqq: queue to update.
++ * @reason: reason for expiration.
++ *
++ * Handle the feedback on @bfqq budget at queue expiration.
++ * See the body for detailed comments.
++ */
++static void __bfq_bfqq_recalc_budget(struct bfq_data *bfqd,
++				     struct bfq_queue *bfqq,
++				     enum bfqq_expiration reason)
++{
++	struct request *next_rq;
++	int budget, min_budget;
++
++	budget = bfqq->max_budget;
++	min_budget = bfq_min_budget(bfqd);
++
++	BUG_ON(bfqq != bfqd->in_service_queue);
++
++	bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last budg %d, budg left %d",
++		bfqq->entity.budget, bfq_bfqq_budget_left(bfqq));
++	bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last max_budg %d, min budg %d",
++		budget, bfq_min_budget(bfqd));
++	bfq_log_bfqq(bfqd, bfqq, "recalc_budg: sync %d, seeky %d",
++		bfq_bfqq_sync(bfqq), BFQQ_SEEKY(bfqd->in_service_queue));
++
++	if (bfq_bfqq_sync(bfqq)) {
++		switch (reason) {
++		/*
++		 * Caveat: in all the following cases we trade latency
++		 * for throughput.
++		 */
++		case BFQ_BFQQ_TOO_IDLE:
++			/*
++			 * This is the only case where we may reduce
++			 * the budget: if there is no request of the
++			 * process still waiting for completion, then
++			 * we assume (tentatively) that the timer has
++			 * expired because the batch of requests of
++			 * the process could have been served with a
++			 * smaller budget.  Hence, betting that
++			 * process will behave in the same way when it
++			 * becomes backlogged again, we reduce its
++			 * next budget.  As long as we guess right,
++			 * this budget cut reduces the latency
++			 * experienced by the process.
++			 *
++			 * However, if there are still outstanding
++			 * requests, then the process may have not yet
++			 * issued its next request just because it is
++			 * still waiting for the completion of some of
++			 * the still outstanding ones.  So in this
++			 * subcase we do not reduce its budget, on the
++			 * contrary we increase it to possibly boost
++			 * the throughput, as discussed in the
++			 * comments to the BUDGET_TIMEOUT case.
++			 */
++			if (bfqq->dispatched > 0) /* still outstanding reqs */
++				budget = min(budget * 2, bfqd->bfq_max_budget);
++			else {
++				if (budget > 5 * min_budget)
++					budget -= 4 * min_budget;
++				else
++					budget = min_budget;
++			}
++			break;
++		case BFQ_BFQQ_BUDGET_TIMEOUT:
++			/*
++			 * We double the budget here because: 1) it
++			 * gives the chance to boost the throughput if
++			 * this is not a seeky process (which may have
++			 * bumped into this timeout because of, e.g.,
++			 * ZBR), 2) together with charge_full_budget
++			 * it helps give seeky processes higher
++			 * timestamps, and hence be served less
++			 * frequently.
++			 */
++			budget = min(budget * 2, bfqd->bfq_max_budget);
++			break;
++		case BFQ_BFQQ_BUDGET_EXHAUSTED:
++			/*
++			 * The process still has backlog, and did not
++			 * let either the budget timeout or the disk
++			 * idling timeout expire. Hence it is not
++			 * seeky, has a short thinktime and may be
++			 * happy with a higher budget too. So
++			 * definitely increase the budget of this good
++			 * candidate to boost the disk throughput.
++			 */
++			budget = min(budget * 4, bfqd->bfq_max_budget);
++			break;
++		case BFQ_BFQQ_NO_MORE_REQUESTS:
++		       /*
++			* Leave the budget unchanged.
++			*/
++		default:
++			return;
++		}
++	} else
++		/*
++		 * Async queues always get the maximum possible budget
++		 * (their ability to dispatch is limited by
++		 * @bfqd->bfq_max_budget_async_rq).
++		 */
++		budget = bfqd->bfq_max_budget;
++
++	bfqq->max_budget = budget;
++
++	if (bfqd->budgets_assigned >= bfq_stats_min_budgets &&
++	    !bfqd->bfq_user_max_budget)
++		bfqq->max_budget = min(bfqq->max_budget, bfqd->bfq_max_budget);
++
++	/*
++	 * Make sure that we have enough budget for the next request.
++	 * Since the finish time of the bfqq must be kept in sync with
++	 * the budget, be sure to call __bfq_bfqq_expire() after the
++	 * update.
++	 */
++	next_rq = bfqq->next_rq;
++	if (next_rq)
++		bfqq->entity.budget = max_t(unsigned long, bfqq->max_budget,
++					    bfq_serv_to_charge(next_rq, bfqq));
++	else
++		bfqq->entity.budget = bfqq->max_budget;
++
++	bfq_log_bfqq(bfqd, bfqq, "head sect: %u, new budget %d",
++			next_rq ? blk_rq_sectors(next_rq) : 0,
++			bfqq->entity.budget);
++}
++
++static unsigned long bfq_calc_max_budget(u64 peak_rate, u64 timeout)
++{
++	unsigned long max_budget;
++
++	/*
++	 * The max_budget calculated when autotuning is equal to the
++	 * amount of sectors transferred in timeout_sync at the
++	 * estimated peak rate.
++	 */
++	max_budget = (unsigned long)(peak_rate * 1000 *
++				     timeout >> BFQ_RATE_SHIFT);
++
++	return max_budget;
++}
++
++/*
++ * In addition to updating the peak rate, checks whether the process
++ * is "slow", and returns 1 if so. This slow flag is used, in addition
++ * to the budget timeout, to reduce the amount of service provided to
++ * seeky processes, and hence reduce their chances to lower the
++ * throughput. See the code for more details.
++ */
++static bool bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++				 bool compensate, enum bfqq_expiration reason)
++{
++	u64 bw, usecs, expected, timeout;
++	ktime_t delta;
++	int update = 0;
++
++	if (!bfq_bfqq_sync(bfqq) || bfq_bfqq_budget_new(bfqq))
++		return false;
++
++	if (compensate)
++		delta = bfqd->last_idling_start;
++	else
++		delta = ktime_get();
++	delta = ktime_sub(delta, bfqd->last_budget_start);
++	usecs = ktime_to_us(delta);
++
++	/* Don't trust short/unrealistic values. */
++	if (usecs < 100 || usecs >= LONG_MAX)
++		return false;
++
++	/*
++	 * Calculate the bandwidth for the last slice.  We use a 64 bit
++	 * value to store the peak rate, in sectors per usec in fixed
++	 * point math.  We do so to have enough precision in the estimate
++	 * and to avoid overflows.
++	 */
++	bw = (u64)bfqq->entity.service << BFQ_RATE_SHIFT;
++	do_div(bw, (unsigned long)usecs);
++
++	timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
++
++	/*
++	 * Use only long (> 20ms) intervals to filter out spikes for
++	 * the peak rate estimation.
++	 */
++	if (usecs > 20000) {
++		if (bw > bfqd->peak_rate ||
++		   (!BFQQ_SEEKY(bfqq) &&
++		    reason == BFQ_BFQQ_BUDGET_TIMEOUT)) {
++			bfq_log(bfqd, "measured bw =%llu", bw);
++			/*
++			 * To smooth oscillations use a low-pass filter with
++			 * alpha=7/8, i.e.,
++			 * new_rate = (7/8) * old_rate + (1/8) * bw
++			 */
++			do_div(bw, 8);
++			if (bw == 0)
++				return false;
++			bfqd->peak_rate *= 7;
++			do_div(bfqd->peak_rate, 8);
++			bfqd->peak_rate += bw;
++			update = 1;
++			bfq_log(bfqd, "new peak_rate=%llu", bfqd->peak_rate);
++		}
++
++		update |= bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES - 1;
++
++		if (bfqd->peak_rate_samples < BFQ_PEAK_RATE_SAMPLES)
++			bfqd->peak_rate_samples++;
++
++		if (bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES &&
++		    update) {
++			int dev_type = blk_queue_nonrot(bfqd->queue);
++			if (bfqd->bfq_user_max_budget == 0) {
++				bfqd->bfq_max_budget =
++					bfq_calc_max_budget(bfqd->peak_rate,
++							    timeout);
++				bfq_log(bfqd, "new max_budget=%d",
++					bfqd->bfq_max_budget);
++			}
++			if (bfqd->device_speed == BFQ_BFQD_FAST &&
++			    bfqd->peak_rate < device_speed_thresh[dev_type]) {
++				bfqd->device_speed = BFQ_BFQD_SLOW;
++				bfqd->RT_prod = R_slow[dev_type] *
++						T_slow[dev_type];
++			} else if (bfqd->device_speed == BFQ_BFQD_SLOW &&
++			    bfqd->peak_rate > device_speed_thresh[dev_type]) {
++				bfqd->device_speed = BFQ_BFQD_FAST;
++				bfqd->RT_prod = R_fast[dev_type] *
++						T_fast[dev_type];
++			}
++		}
++	}
++
++	/*
++	 * If the process has been served for too short a time
++	 * interval to let its possible sequential accesses prevail over
++	 * the initial seek time needed to move the disk head to the
++	 * first sector it requested, then give the process a chance
++	 * and for the moment return false.
++	 */
++	if (bfqq->entity.budget <= bfq_max_budget(bfqd) / 8)
++		return false;
++
++	/*
++	 * A process is considered ``slow'' (i.e., seeky, so that we
++	 * cannot treat it fairly in the service domain, as it would
++	 * slow down the other processes too much) if, when a slice
++	 * ends for whatever reason, it has received service at a
++	 * rate that would not be high enough to complete the budget
++	 * before the budget timeout expiration.
++	 */
++	expected = bw * 1000 * timeout >> BFQ_RATE_SHIFT;
++
++	/*
++	 * Caveat: processes doing IO in the slower disk zones will
++	 * tend to be slow(er) even if not seeky. And the estimated
++	 * peak rate will actually be an average over the disk
++	 * surface. Hence, to not be too harsh with unlucky processes,
++	 * we keep a budget/3 margin of safety before declaring a
++	 * process slow.
++	 */
++	return expected > (4 * bfqq->entity.budget) / 3;
++}
++
++/*
++ * To be deemed as soft real-time, an application must meet two
++ * requirements. First, the application must not require an average
++ * bandwidth higher than the approximate bandwidth required to playback or
++ * record a compressed high-definition video.
++ * The next function is invoked on the completion of the last request of a
++ * batch, to compute the next-start time instant, soft_rt_next_start, such
++ * that, if the next request of the application does not arrive before
++ * soft_rt_next_start, then the above requirement on the bandwidth is met.
++ *
++ * The second requirement is that the request pattern of the application is
++ * isochronous, i.e., that, after issuing a request or a batch of requests,
++ * the application stops issuing new requests until all its pending requests
++ * have been completed. After that, the application may issue a new batch,
++ * and so on.
++ * For this reason the next function is invoked to compute
++ * soft_rt_next_start only for applications that meet this requirement,
++ * whereas soft_rt_next_start is set to infinity for applications that do
++ * not.
++ *
++ * Unfortunately, even a greedy application may happen to behave in an
++ * isochronous way if the CPU load is high. In fact, the application may
++ * stop issuing requests while the CPUs are busy serving other processes,
++ * then restart, then stop again for a while, and so on. In addition, if
++ * the disk achieves a low enough throughput with the request pattern
++ * issued by the application (e.g., because the request pattern is random
++ * and/or the device is slow), then the application may meet the above
++ * bandwidth requirement too. To prevent such a greedy application from
++ * being deemed soft real-time, a further rule is used in the computation of
++ * soft_rt_next_start: soft_rt_next_start must be higher than the current
++ * time plus the maximum time for which the arrival of a request is waited
++ * for when a sync queue becomes idle, namely bfqd->bfq_slice_idle.
++ * This filters out greedy applications, as the latter issue instead their
++ * next request as soon as possible after the last one has been completed
++ * (in contrast, when a batch of requests is completed, a soft real-time
++ * application spends some time processing data).
++ *
++ * Unfortunately, the last filter may easily generate false positives if
++ * only bfqd->bfq_slice_idle is used as a reference time interval and one
++ * or both the following cases occur:
++ * 1) HZ is so low that the duration of a jiffy is comparable to or higher
++ *    than bfqd->bfq_slice_idle. This happens, e.g., on slow devices with
++ *    HZ=100.
++ * 2) jiffies, instead of increasing at a constant rate, may stop increasing
++ *    for a while, then suddenly 'jump' by several units to recover the lost
++ *    increments. This seems to happen, e.g., inside virtual machines.
++ * To address this issue, we do not use as a reference time interval just
++ * bfqd->bfq_slice_idle, but bfqd->bfq_slice_idle plus a few jiffies. In
++ * particular we add the minimum number of jiffies for which the filter
++ * seems to be quite precise also in embedded systems and KVM/QEMU virtual
++ * machines.
++ */
++static unsigned long bfq_bfqq_softrt_next_start(struct bfq_data *bfqd,
++						struct bfq_queue *bfqq)
++{
++	return max(bfqq->last_idle_bklogged +
++		   HZ * bfqq->service_from_backlogged /
++		   bfqd->bfq_wr_max_softrt_rate,
++		   jiffies + bfqq->bfqd->bfq_slice_idle + 4);
++}
++
++/*
++ * Return the largest-possible time instant such that, for as long as possible,
++ * the current time will be lower than this time instant according to the macro
++ * time_is_before_jiffies().
++ */
++static unsigned long bfq_infinity_from_now(unsigned long now)
++{
++	return now + ULONG_MAX / 2;
++}
++
++/**
++ * bfq_bfqq_expire - expire a queue.
++ * @bfqd: device owning the queue.
++ * @bfqq: the queue to expire.
++ * @compensate: if true, compensate for the time spent idling.
++ * @reason: the reason causing the expiration.
++ *
++ *
++ * If the process associated to the queue is slow (i.e., seeky), or in
++ * case of budget timeout, or, finally, if it is async, we
++ * artificially charge it an entire budget (independently of the
++ * actual service it received). As a consequence, the queue will get
++ * higher timestamps than the correct ones upon reactivation, and
++ * hence it will be rescheduled as if it had received more service
++ * than what it actually received. In the end, this class of processes
++ * will receive less service in proportion to how slowly they consume
++ * their budgets (and hence how seriously they tend to lower the
++ * throughput).
++ *
++ * In contrast, when a queue expires because it has been idling for
++ * too long or because it exhausted its budget, we do not touch the
++ * amount of service it has received. Hence when the queue is
++ * reactivated and its timestamps updated, the latter will be in sync
++ * with the actual service received by the queue until expiration.
++ *
++ * Charging a full budget to the first type of queues and the exact
++ * service to the others has the effect of using the WF2Q+ policy to
++ * schedule the former on a timeslice basis, without violating the
++ * service domain guarantees of the latter.
++ */
++static void bfq_bfqq_expire(struct bfq_data *bfqd,
++			    struct bfq_queue *bfqq,
++			    bool compensate,
++			    enum bfqq_expiration reason)
++{
++	bool slow;
++	BUG_ON(bfqq != bfqd->in_service_queue);
++
++	/*
++	 * Update disk peak rate for autotuning and check whether the
++	 * process is slow (see bfq_update_peak_rate).
++	 */
++	slow = bfq_update_peak_rate(bfqd, bfqq, compensate, reason);
++
++	/*
++	 * As explained above, 'punish' slow (i.e., seeky), timed-out
++	 * and async queues, to favor sequential sync workloads.
++	 *
++	 * Processes doing I/O in the slower disk zones will tend to be
++	 * slow(er) even if not seeky. Hence, since the estimated peak
++	 * rate is actually an average over the disk surface, these
++	 * processes may timeout just for bad luck. To avoid punishing
++	 * them we do not charge a full budget to a process that
++	 * succeeded in consuming at least 2/3 of its budget.
++	 */
++	if (slow || (reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
++		     bfq_bfqq_budget_left(bfqq) >=  bfqq->entity.budget / 3))
++		bfq_bfqq_charge_full_budget(bfqq);
++
++	bfqq->service_from_backlogged += bfqq->entity.service;
++
++	if (BFQQ_SEEKY(bfqq) && reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
++	    !bfq_bfqq_constantly_seeky(bfqq)) {
++		bfq_mark_bfqq_constantly_seeky(bfqq);
++		if (!blk_queue_nonrot(bfqd->queue))
++			bfqd->const_seeky_busy_in_flight_queues++;
++	}
++
++	if (reason == BFQ_BFQQ_TOO_IDLE &&
++	    bfqq->entity.service <= 2 * bfqq->entity.budget / 10)
++		bfq_clear_bfqq_IO_bound(bfqq);
++
++	if (bfqd->low_latency && bfqq->wr_coeff == 1)
++		bfqq->last_wr_start_finish = jiffies;
++
++	if (bfqd->low_latency && bfqd->bfq_wr_max_softrt_rate > 0 &&
++	    RB_EMPTY_ROOT(&bfqq->sort_list)) {
++		/*
++		 * If we get here, and there are no outstanding requests,
++		 * then the request pattern is isochronous (see the comments
++		 * to the function bfq_bfqq_softrt_next_start()). Hence we
++		 * can compute soft_rt_next_start. If, instead, the queue
++		 * still has outstanding requests, then we have to wait
++		 * for the completion of all the outstanding requests to
++		 * discover whether the request pattern is actually
++		 * isochronous.
++		 */
++		if (bfqq->dispatched == 0)
++			bfqq->soft_rt_next_start =
++				bfq_bfqq_softrt_next_start(bfqd, bfqq);
++		else {
++			/*
++			 * The application is still waiting for the
++			 * completion of one or more requests:
++			 * prevent it from possibly being incorrectly
++			 * deemed as soft real-time by setting its
++			 * soft_rt_next_start to infinity. In fact,
++			 * without this assignment, the application
++			 * would be incorrectly deemed as soft
++			 * real-time if:
++			 * 1) it issued a new request before the
++			 *    completion of all its in-flight
++			 *    requests, and
++			 * 2) at that time, its soft_rt_next_start
++			 *    happened to be in the past.
++			 */
++			bfqq->soft_rt_next_start =
++				bfq_infinity_from_now(jiffies);
++			/*
++			 * Schedule an update of soft_rt_next_start to when
++			 * the task may be discovered to be isochronous.
++			 */
++			bfq_mark_bfqq_softrt_update(bfqq);
++		}
++	}
++
++	bfq_log_bfqq(bfqd, bfqq,
++		"expire (%d, slow %d, num_disp %d, idle_win %d)", reason,
++		slow, bfqq->dispatched, bfq_bfqq_idle_window(bfqq));
++
++	/*
++	 * Increase, decrease or leave budget unchanged according to
++	 * reason.
++	 */
++	__bfq_bfqq_recalc_budget(bfqd, bfqq, reason);
++	__bfq_bfqq_expire(bfqd, bfqq);
++}
++
++/*
++ * Budget timeout is not implemented through a dedicated timer, but
++ * just checked on request arrivals and completions, as well as on
++ * idle timer expirations.
++ */
++static bool bfq_bfqq_budget_timeout(struct bfq_queue *bfqq)
++{
++	if (bfq_bfqq_budget_new(bfqq) ||
++	    time_before(jiffies, bfqq->budget_timeout))
++		return false;
++	return true;
++}
++
++/*
++ * If we expire a queue that is waiting for the arrival of a new
++ * request, we may prevent the fictitious timestamp back-shifting that
++ * allows the guarantees of the queue to be preserved (see [1] for
++ * this tricky aspect). Hence we return true only if this condition
++ * does not hold, or if the queue is slow enough to deserve only to be
++ * kicked off for preserving a high throughput.
++ */
++static bool bfq_may_expire_for_budg_timeout(struct bfq_queue *bfqq)
++{
++	bfq_log_bfqq(bfqq->bfqd, bfqq,
++		"may_budget_timeout: wait_request %d left %d timeout %d",
++		bfq_bfqq_wait_request(bfqq),
++			bfq_bfqq_budget_left(bfqq) >=  bfqq->entity.budget / 3,
++		bfq_bfqq_budget_timeout(bfqq));
++
++	return (!bfq_bfqq_wait_request(bfqq) ||
++		bfq_bfqq_budget_left(bfqq) >=  bfqq->entity.budget / 3)
++		&&
++		bfq_bfqq_budget_timeout(bfqq);
++}
++
++/*
++ * For a queue that becomes empty, device idling is allowed only if
++ * this function returns true for that queue. As a consequence, since
++ * device idling plays a critical role for both throughput boosting
++ * and service guarantees, the return value of this function plays a
++ * critical role as well.
++ *
++ * In a nutshell, this function returns true only if idling is
++ * beneficial for throughput or, even if detrimental for throughput,
++ * idling is however necessary to preserve service guarantees (low
++ * latency, desired throughput distribution, ...). In particular, on
++ * NCQ-capable devices, this function tries to return false, so as to
++ * help keep the drives' internal queues full, whenever this helps the
++ * device boost the throughput without causing any service-guarantee
++ * issue.
++ *
++ * In more detail, the return value of this function is obtained by,
++ * first, computing a number of boolean variables that take into
++ * account throughput and service-guarantee issues, and, then,
++ * combining these variables in a logical expression. Most of the
++ * issues taken into account are not trivial. We discuss these issues
++ * while introducing the variables.
++ */
++static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
++{
++	struct bfq_data *bfqd = bfqq->bfqd;
++	bool idling_boosts_thr, idling_boosts_thr_without_issues,
++		all_queues_seeky, on_hdd_and_not_all_queues_seeky,
++		idling_needed_for_service_guarantees,
++		asymmetric_scenario;
++
++	/*
++	 * The next variable takes into account the cases where idling
++	 * boosts the throughput.
++	 *
++	 * The value of the variable is computed considering, first, that
++	 * idling is virtually always beneficial for the throughput if:
++	 * (a) the device is not NCQ-capable, or
++	 * (b) regardless of the presence of NCQ, the device is rotational
++	 *     and the request pattern for bfqq is I/O-bound and sequential.
++	 *
++	 * Secondly, and in contrast to the above item (b), idling an
++	 * NCQ-capable flash-based device would not boost the
++	 * throughput even with sequential I/O; rather it would lower
++	 * the throughput in proportion to how fast the device
++	 * is. Accordingly, the next variable is true if any of the
++	 * above conditions (a) and (b) is true, and, in particular,
++	 * happens to be false if bfqd is an NCQ-capable flash-based
++	 * device.
++	 */
++	idling_boosts_thr = !bfqd->hw_tag ||
++		(!blk_queue_nonrot(bfqd->queue) && bfq_bfqq_IO_bound(bfqq) &&
++		 bfq_bfqq_idle_window(bfqq)) ;
++
++	/*
++	 * The value of the next variable,
++	 * idling_boosts_thr_without_issues, is equal to that of
++	 * idling_boosts_thr, unless a special case holds. In this
++	 * special case, described below, idling may cause problems to
++	 * weight-raised queues.
++	 *
++	 * When the request pool is saturated (e.g., in the presence
++	 * of write hogs), if the processes associated with
++	 * non-weight-raised queues ask for requests at a lower rate,
++	 * then processes associated with weight-raised queues have a
++	 * higher probability to get a request from the pool
++	 * immediately (or at least soon) when they need one. Thus
++	 * they have a higher probability to actually get a fraction
++	 * of the device throughput proportional to their high
++	 * weight. This is especially true with NCQ-capable drives,
++	 * which enqueue several requests in advance, and further
++	 * reorder internally-queued requests.
++	 *
++	 * For this reason, we force to false the value of
++	 * idling_boosts_thr_without_issues if there are weight-raised
++	 * busy queues. In this case, and if bfqq is not weight-raised,
++	 * this guarantees that the device is not idled for bfqq (if,
++	 * instead, bfqq is weight-raised, then idling will be
++	 * guaranteed by another variable, see below). Combined with
++	 * the timestamping rules of BFQ (see [1] for details), this
++	 * behavior causes bfqq, and hence any sync non-weight-raised
++	 * queue, to get a lower number of requests served, and thus
++	 * to ask for a lower number of requests from the request
++	 * pool, before the busy weight-raised queues get served
++	 * again. This often mitigates starvation problems in the
++	 * presence of heavy write workloads and NCQ, thereby
++	 * guaranteeing a higher application and system responsiveness
++	 * in these hostile scenarios.
++	 */
++	idling_boosts_thr_without_issues = idling_boosts_thr &&
++		bfqd->wr_busy_queues == 0;
++
++	/*
++	 * There are then two cases where idling must be performed not
++	 * for throughput concerns, but to preserve service
++	 * guarantees. In the description of these cases, we say, for
++	 * short, that a queue is sequential/random if the process
++	 * associated to the queue issues sequential/random requests
++	 * (in the second case the queue may be tagged as seeky or
++	 * even constantly_seeky).
++	 *
++	 * To introduce the first case, we note that, since
++	 * bfq_bfqq_idle_window(bfqq) is false if the device is
++	 * NCQ-capable and bfqq is random (see
++	 * bfq_update_idle_window()), then, from the above two
++	 * assignments it follows that
++	 * idling_boosts_thr_without_issues is false if the device is
++	 * NCQ-capable and bfqq is random. Therefore, for this case,
++	 * device idling would never be allowed if we used just
++	 * idling_boosts_thr_without_issues to decide whether to allow
++	 * it. And, beneficially, this would imply that throughput
++	 * would always be boosted also with random I/O on NCQ-capable
++	 * HDDs.
++	 *
++	 * But we must be careful on this point, to avoid an unfair
++	 * treatment for bfqq. In fact, because of the same above
++	 * assignments, idling_boosts_thr_without_issues is, on the
++	 * other hand, true if 1) the device is an HDD and bfqq is
++	 * sequential, and 2) there are no busy weight-raised
++	 * queues. As a consequence, if we used just
++	 * idling_boosts_thr_without_issues to decide whether to idle
++	 * the device, then with an HDD we might easily bump into a
++	 * scenario where queues that are sequential and I/O-bound
++	 * would enjoy idling, whereas random queues would not. The
++	 * latter might then get a low share of the device throughput,
++	 * simply because the former would get many requests served
++	 * after being set as in service, while the latter would not.
++	 *
++	 * To address this issue, we start by setting to true a
++	 * sentinel variable, on_hdd_and_not_all_queues_seeky, if the
++	 * device is rotational and not all queues with pending or
++	 * in-flight requests are constantly seeky (i.e., there are
++	 * active sequential queues, and bfqq might then be mistreated
++	 * if it does not enjoy idling because it is random).
++	 */
++	all_queues_seeky = bfq_bfqq_constantly_seeky(bfqq) &&
++			   bfqd->busy_in_flight_queues ==
++			   bfqd->const_seeky_busy_in_flight_queues;
++
++	on_hdd_and_not_all_queues_seeky =
++		!blk_queue_nonrot(bfqd->queue) && !all_queues_seeky;
++
++	/*
++	 * To introduce the second case where idling needs to be
++	 * performed to preserve service guarantees, we can note that
++	 * allowing the drive to enqueue more than one request at a
++	 * time, and hence delegating de facto final scheduling
++	 * decisions to the drive's internal scheduler, causes loss of
++	 * control on the actual request service order. In particular,
++	 * the critical situation is when requests from different
++	 * processes happen to be present, at the same time, in the
++	 * internal queue(s) of the drive. In such a situation, the
++	 * drive, by deciding the service order of the
++	 * internally-queued requests, does determine also the actual
++	 * throughput distribution among these processes. But the
++	 * drive typically has no notion or concern about per-process
++	 * throughput distribution, and makes its decisions only on a
++	 * per-request basis. Therefore, the service distribution
++	 * enforced by the drive's internal scheduler is likely to
++	 * coincide with the desired device-throughput distribution
++	 * only in a completely symmetric scenario where:
++	 * (i)  each of these processes must get the same throughput as
++	 *      the others;
++	 * (ii) all these processes have the same I/O pattern
++	 *      (either sequential or random).
++	 * In fact, in such a scenario, the drive will tend to treat
++	 * the requests of each of these processes in about the same
++	 * way as the requests of the others, and thus to provide
++	 * each of these processes with about the same throughput
++	 * (which is exactly the desired throughput distribution). In
++	 * contrast, in any asymmetric scenario, device idling is
++	 * certainly needed to guarantee that bfqq receives its
++	 * assigned fraction of the device throughput (see [1] for
++	 * details).
++	 *
++	 * We address this issue by controlling, actually, only the
++	 * symmetry sub-condition (i), i.e., provided that
++	 * sub-condition (i) holds, idling is not performed,
++	 * regardless of whether sub-condition (ii) holds. In other
++	 * words, only if sub-condition (i) does not hold is idling
++	 * allowed, and the device tends to be prevented from queueing
++	 * many requests, possibly of several processes. The reason
++	 * for not controlling also sub-condition (ii) is that, first,
++	 * in the case of an HDD, the asymmetry in terms of types of
++	 * I/O patterns is already taken into account in the above
++	 * sentinel variable
++	 * on_hdd_and_not_all_queues_seeky. Secondly, in the case of a
++	 * flash-based device, we prefer instead to privilege
++	 * throughput (and idling lowers throughput for this type of
++	 * device), for the following reasons:
++	 * 1) differently from HDDs, the service time of random
++	 *    requests is not orders of magnitude higher than the service
++	 *    time of sequential requests; thus, even if processes doing
++	 *    sequential I/O get a preferential treatment with respect to
++	 *    others doing random I/O, the consequences are not as
++	 *    dramatic as with HDDs;
++	 * 2) if a process doing random I/O does need strong
++	 *    throughput guarantees, it is hopefully already being
++	 *    weight-raised, or the user is likely to have assigned it a
++	 *    higher weight than the other processes (and thus
++	 *    sub-condition (i) is likely to be false, which triggers
++	 *    idling).
++	 *
++	 * According to the above considerations, the next variable is
++	 * true (only) if sub-condition (i) does not hold. To compute the
++	 * value of this variable, we not only use the return value of
++	 * the function bfq_symmetric_scenario(), but also check
++	 * whether bfqq is being weight-raised, because
++	 * bfq_symmetric_scenario() does not take weight-raised
++	 * queues into account (see comments to
++	 * bfq_weights_tree_add()).
++	 *
++	 * As a side note, it is worth considering that the above
++	 * device-idling countermeasures may however fail in the
++	 * following unlucky scenario: if idling is (correctly)
++	 * disabled in a time period during which all symmetry
++	 * sub-conditions hold, and hence the device is allowed to
++	 * enqueue many requests, but at some later point in time some
++	 * sub-condition ceases to hold, then it may become impossible
++	 * to let requests be served in the desired order until all
++	 * the requests already queued in the device have been served.
++	 */
++	asymmetric_scenario = bfqq->wr_coeff > 1 ||
++		!bfq_symmetric_scenario(bfqd);
++
++	/*
++	 * Finally, there is a case where maximizing throughput is the
++	 * best choice even if it may cause unfairness toward
++	 * bfqq. Such a case is when bfqq became active in a burst of
++	 * queue activations. Queues that became active during a large
++	 * burst benefit only from throughput, as discussed in the
++	 * comments to bfq_handle_burst. Thus, if bfqq became active
++	 * in a burst and not idling the device maximizes throughput,
++	 * then the device must not be idled, because not idling the
++	 * device provides bfqq and all other queues in the burst with
++	 * maximum benefit. Combining this and the two cases above, we
++	 * can now establish when idling is actually needed to
++	 * preserve service guarantees.
++	 */
++	idling_needed_for_service_guarantees =
++		(on_hdd_and_not_all_queues_seeky || asymmetric_scenario) &&
++		!bfq_bfqq_in_large_burst(bfqq);
++
++	/*
++	 * We have now all the components we need to compute the return
++	 * value of the function, which is true only if both the following
++	 * conditions hold:
++	 * 1) bfqq is sync, because idling makes sense only for sync queues;
++	 * 2) idling either boosts the throughput (without issues), or
++	 *    is necessary to preserve service guarantees.
++	 */
++	return bfq_bfqq_sync(bfqq) &&
++		(idling_boosts_thr_without_issues ||
++		 idling_needed_for_service_guarantees);
++}
++
++/*
++ * If the in-service queue is empty but the function bfq_bfqq_may_idle
++ * returns true, then:
++ * 1) the queue must remain in service and cannot be expired, and
++ * 2) the device must be idled to wait for the possible arrival of a new
++ *    request for the queue.
++ * See the comments to the function bfq_bfqq_may_idle for the reasons
++ * why performing device idling is the best choice to boost the throughput
++ * and preserve service guarantees when bfq_bfqq_may_idle itself
++ * returns true.
++ */
++static bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
++{
++	struct bfq_data *bfqd = bfqq->bfqd;
++
++	return RB_EMPTY_ROOT(&bfqq->sort_list) && bfqd->bfq_slice_idle != 0 &&
++	       bfq_bfqq_may_idle(bfqq);
++}
++
++/*
++ * Select a queue for service.  If we have a current queue in service,
++ * check whether to continue servicing it, or retrieve and set a new one.
++ */
++static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
++{
++	struct bfq_queue *bfqq;
++	struct request *next_rq;
++	enum bfqq_expiration reason = BFQ_BFQQ_BUDGET_TIMEOUT;
++
++	bfqq = bfqd->in_service_queue;
++	if (!bfqq)
++		goto new_queue;
++
++	bfq_log_bfqq(bfqd, bfqq, "select_queue: already in-service queue");
++
++	if (bfq_may_expire_for_budg_timeout(bfqq) &&
++	    !timer_pending(&bfqd->idle_slice_timer) &&
++	    !bfq_bfqq_must_idle(bfqq))
++		goto expire;
++
++	next_rq = bfqq->next_rq;
++	/*
++	 * If bfqq has requests queued and it has enough budget left to
++	 * serve them, keep the queue, otherwise expire it.
++	 */
++	if (next_rq) {
++		if (bfq_serv_to_charge(next_rq, bfqq) >
++			bfq_bfqq_budget_left(bfqq)) {
++			reason = BFQ_BFQQ_BUDGET_EXHAUSTED;
++			goto expire;
++		} else {
++			/*
++			 * The idle timer may be pending because we may
++			 * not disable disk idling even when a new request
++			 * arrives.
++			 */
++			if (timer_pending(&bfqd->idle_slice_timer)) {
++				/*
++				 * If we get here: 1) at least one new request
++				 * has arrived but we have not disabled the
++				 * timer because the request was too small,
++				 * 2) then the block layer has unplugged
++				 * the device, causing the dispatch to be
++				 * invoked.
++				 *
++				 * Since the device is unplugged, now the
++				 * requests are probably large enough to
++				 * provide a reasonable throughput.
++				 * So we disable idling.
++				 */
++				bfq_clear_bfqq_wait_request(bfqq);
++				del_timer(&bfqd->idle_slice_timer);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++				bfqg_stats_update_idle_time(bfqq_group(bfqq));
++#endif
++			}
++			goto keep_queue;
++		}
++	}
++
++	/*
++	 * No requests pending. However, if the in-service queue is idling
++	 * for a new request, or has requests waiting for a completion and
++	 * may idle after their completion, then keep it anyway.
++	 */
++	if (timer_pending(&bfqd->idle_slice_timer) ||
++	    (bfqq->dispatched != 0 && bfq_bfqq_may_idle(bfqq))) {
++		bfqq = NULL;
++		goto keep_queue;
++	}
++
++	reason = BFQ_BFQQ_NO_MORE_REQUESTS;
++expire:
++	bfq_bfqq_expire(bfqd, bfqq, false, reason);
++new_queue:
++	bfqq = bfq_set_in_service_queue(bfqd);
++	bfq_log(bfqd, "select_queue: new queue %d returned",
++		bfqq ? bfqq->pid : 0);
++keep_queue:
++	return bfqq;
++}
++
++static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++	if (bfqq->wr_coeff > 1) { /* queue is being weight-raised */
++		bfq_log_bfqq(bfqd, bfqq,
++			"raising period dur %u/%u msec, old coeff %u, w %d(%d)",
++			jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
++			jiffies_to_msecs(bfqq->wr_cur_max_time),
++			bfqq->wr_coeff,
++			bfqq->entity.weight, bfqq->entity.orig_weight);
++
++		BUG_ON(bfqq != bfqd->in_service_queue && entity->weight !=
++		       entity->orig_weight * bfqq->wr_coeff);
++		if (entity->prio_changed)
++			bfq_log_bfqq(bfqd, bfqq, "WARN: pending prio change");
++
++		/*
++		 * If the queue was activated in a burst, or
++		 * too much time has elapsed from the beginning
++		 * of this weight-raising period, then end weight
++		 * raising.
++		 */
++		if (bfq_bfqq_in_large_burst(bfqq) ||
++		    time_is_before_jiffies(bfqq->last_wr_start_finish +
++					   bfqq->wr_cur_max_time)) {
++			bfqq->last_wr_start_finish = jiffies;
++			bfq_log_bfqq(bfqd, bfqq,
++				     "wrais ending at %lu, rais_max_time %u",
++				     bfqq->last_wr_start_finish,
++				     jiffies_to_msecs(bfqq->wr_cur_max_time));
++			bfq_bfqq_end_wr(bfqq);
++		}
++	}
++	/* Update weight both if it must be raised and if it must be lowered */
++	if ((entity->weight > entity->orig_weight) != (bfqq->wr_coeff > 1))
++		__bfq_entity_update_weight_prio(
++			bfq_entity_service_tree(entity),
++			entity);
++}
++
++/*
++ * Dispatch one request from bfqq, moving it to the request queue
++ * dispatch list.
++ */
++static int bfq_dispatch_request(struct bfq_data *bfqd,
++				struct bfq_queue *bfqq)
++{
++	int dispatched = 0;
++	struct request *rq;
++	unsigned long service_to_charge;
++
++	BUG_ON(RB_EMPTY_ROOT(&bfqq->sort_list));
++
++	/* Follow expired path, else get first next available. */
++	rq = bfq_check_fifo(bfqq);
++	if (!rq)
++		rq = bfqq->next_rq;
++	service_to_charge = bfq_serv_to_charge(rq, bfqq);
++
++	if (service_to_charge > bfq_bfqq_budget_left(bfqq)) {
++		/*
++		 * This may happen if the next rq is chosen in fifo order
++		 * instead of sector order. The budget is properly
++		 * dimensioned to be always sufficient to serve the next
++		 * request only if it is chosen in sector order. The reason
++		 * is that it would be quite inefficient and of little use
++		 * to always make sure that the budget is large enough to
++		 * serve even the possible next rq in fifo order.
++		 * In fact, requests are seldom served in fifo order.
++		 *
++		 * Expire the queue for budget exhaustion, and make sure
++		 * that the next act_budget is enough to serve the next
++		 * request, even if it comes from the fifo expired path.
++		 */
++		bfqq->next_rq = rq;
++		/*
++		 * Since this dispatch has failed, make sure that
++		 * a new one will be performed.
++		 */
++		if (!bfqd->rq_in_driver)
++			bfq_schedule_dispatch(bfqd);
++		goto expire;
++	}
++
++	/* Finally, insert request into driver dispatch list. */
++	bfq_bfqq_served(bfqq, service_to_charge);
++	bfq_dispatch_insert(bfqd->queue, rq);
++
++	bfq_update_wr_data(bfqd, bfqq);
++
++	bfq_log_bfqq(bfqd, bfqq,
++			"dispatched %u sec req (%llu), budg left %d",
++			blk_rq_sectors(rq),
++			(long long unsigned)blk_rq_pos(rq),
++			bfq_bfqq_budget_left(bfqq));
++
++	dispatched++;
++
++	if (!bfqd->in_service_bic) {
++		atomic_long_inc(&RQ_BIC(rq)->icq.ioc->refcount);
++		bfqd->in_service_bic = RQ_BIC(rq);
++	}
++
++	if (bfqd->busy_queues > 1 && ((!bfq_bfqq_sync(bfqq) &&
++	    dispatched >= bfqd->bfq_max_budget_async_rq) ||
++	    bfq_class_idle(bfqq)))
++		goto expire;
++
++	return dispatched;
++
++expire:
++	bfq_bfqq_expire(bfqd, bfqq, false, BFQ_BFQQ_BUDGET_EXHAUSTED);
++	return dispatched;
++}
++
++static int __bfq_forced_dispatch_bfqq(struct bfq_queue *bfqq)
++{
++	int dispatched = 0;
++
++	while (bfqq->next_rq) {
++		bfq_dispatch_insert(bfqq->bfqd->queue, bfqq->next_rq);
++		dispatched++;
++	}
++
++	BUG_ON(!list_empty(&bfqq->fifo));
++	return dispatched;
++}
++
++/*
++ * Drain our current requests.
++ * Used for barriers and when switching io schedulers on-the-fly.
++ */
++static int bfq_forced_dispatch(struct bfq_data *bfqd)
++{
++	struct bfq_queue *bfqq, *n;
++	struct bfq_service_tree *st;
++	int dispatched = 0;
++
++	bfqq = bfqd->in_service_queue;
++	if (bfqq)
++		__bfq_bfqq_expire(bfqd, bfqq);
++
++	/*
++	 * Loop through classes, and be careful to leave the scheduler
++	 * in a consistent state, as feedback mechanisms and vtime
++	 * updates cannot be disabled during the process.
++	 */
++	list_for_each_entry_safe(bfqq, n, &bfqd->active_list, bfqq_list) {
++		st = bfq_entity_service_tree(&bfqq->entity);
++
++		dispatched += __bfq_forced_dispatch_bfqq(bfqq);
++		bfqq->max_budget = bfq_max_budget(bfqd);
++
++		bfq_forget_idle(st);
++	}
++
++	BUG_ON(bfqd->busy_queues != 0);
++
++	return dispatched;
++}
++
++static int bfq_dispatch_requests(struct request_queue *q, int force)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct bfq_queue *bfqq;
++	int max_dispatch;
++
++	bfq_log(bfqd, "dispatch requests: %d busy queues", bfqd->busy_queues);
++	if (bfqd->busy_queues == 0)
++		return 0;
++
++	if (unlikely(force))
++		return bfq_forced_dispatch(bfqd);
++
++	bfqq = bfq_select_queue(bfqd);
++	if (!bfqq)
++		return 0;
++
++	if (bfq_class_idle(bfqq))
++		max_dispatch = 1;
++
++	if (!bfq_bfqq_sync(bfqq))
++		max_dispatch = bfqd->bfq_max_budget_async_rq;
++
++	if (!bfq_bfqq_sync(bfqq) && bfqq->dispatched >= max_dispatch) {
++		if (bfqd->busy_queues > 1)
++			return 0;
++		if (bfqq->dispatched >= 4 * max_dispatch)
++			return 0;
++	}
++
++	if (bfqd->sync_flight != 0 && !bfq_bfqq_sync(bfqq))
++		return 0;
++
++	bfq_clear_bfqq_wait_request(bfqq);
++	BUG_ON(timer_pending(&bfqd->idle_slice_timer));
++
++	if (!bfq_dispatch_request(bfqd, bfqq))
++		return 0;
++
++	bfq_log_bfqq(bfqd, bfqq, "dispatched %s request",
++			bfq_bfqq_sync(bfqq) ? "sync" : "async");
++
++	return 1;
++}
++
++/*
++ * Task holds one reference to the queue, dropped when task exits.  Each rq
++ * in-flight on this queue also holds a reference, dropped when rq is freed.
++ *
++ * Queue lock must be held here.
++ */
++static void bfq_put_queue(struct bfq_queue *bfqq)
++{
++	struct bfq_data *bfqd = bfqq->bfqd;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	struct bfq_group *bfqg = bfqq_group(bfqq);
++#endif
++
++	BUG_ON(atomic_read(&bfqq->ref) <= 0);
++
++	bfq_log_bfqq(bfqd, bfqq, "put_queue: %p %d", bfqq,
++		     atomic_read(&bfqq->ref));
++	if (!atomic_dec_and_test(&bfqq->ref))
++		return;
++
++	BUG_ON(rb_first(&bfqq->sort_list));
++	BUG_ON(bfqq->allocated[READ] + bfqq->allocated[WRITE] != 0);
++	BUG_ON(bfqq->entity.tree);
++	BUG_ON(bfq_bfqq_busy(bfqq));
++	BUG_ON(bfqd->in_service_queue == bfqq);
++
++	if (bfq_bfqq_sync(bfqq))
++		/*
++		 * The fact that this queue is being destroyed does not
++		 * invalidate the fact that this queue may have been
++		 * activated during the current burst. As a consequence,
++		 * although the queue does not exist anymore, and hence
++		 * needs to be removed from the burst list if there,
++		 * the burst size must not be decremented.
++		 */
++		hlist_del_init(&bfqq->burst_list_node);
++
++	bfq_log_bfqq(bfqd, bfqq, "put_queue: %p freed", bfqq);
++
++	kmem_cache_free(bfq_pool, bfqq);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	bfqg_put(bfqg);
++#endif
++}
++
++static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	if (bfqq == bfqd->in_service_queue) {
++		__bfq_bfqq_expire(bfqd, bfqq);
++		bfq_schedule_dispatch(bfqd);
++	}
++
++	bfq_log_bfqq(bfqd, bfqq, "exit_bfqq: %p, %d", bfqq,
++		     atomic_read(&bfqq->ref));
++
++	bfq_put_queue(bfqq);
++}
++
++static void bfq_init_icq(struct io_cq *icq)
++{
++	struct bfq_io_cq *bic = icq_to_bic(icq);
++
++	bic->ttime.last_end_request = jiffies;
++}
++
++static void bfq_exit_icq(struct io_cq *icq)
++{
++	struct bfq_io_cq *bic = icq_to_bic(icq);
++	struct bfq_data *bfqd = bic_to_bfqd(bic);
++
++	if (bic->bfqq[BLK_RW_ASYNC]) {
++		bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_ASYNC]);
++		bic->bfqq[BLK_RW_ASYNC] = NULL;
++	}
++
++	if (bic->bfqq[BLK_RW_SYNC]) {
++		bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
++		bic->bfqq[BLK_RW_SYNC] = NULL;
++	}
++}
++
++/*
++ * Update the entity prio values; note that the new values will not
++ * be used until the next (re)activation.
++ */
++static void bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
++{
++	struct task_struct *tsk = current;
++	int ioprio_class;
++
++	ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
++	switch (ioprio_class) {
++	default:
++		dev_err(bfqq->bfqd->queue->backing_dev_info.dev,
++			"bfq: bad prio class %d\n", ioprio_class);
++	case IOPRIO_CLASS_NONE:
++		/*
++		 * No prio set, inherit CPU scheduling settings.
++		 */
++		bfqq->new_ioprio = task_nice_ioprio(tsk);
++		bfqq->new_ioprio_class = task_nice_ioclass(tsk);
++		break;
++	case IOPRIO_CLASS_RT:
++		bfqq->new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
++		bfqq->new_ioprio_class = IOPRIO_CLASS_RT;
++		break;
++	case IOPRIO_CLASS_BE:
++		bfqq->new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
++		bfqq->new_ioprio_class = IOPRIO_CLASS_BE;
++		break;
++	case IOPRIO_CLASS_IDLE:
++		bfqq->new_ioprio_class = IOPRIO_CLASS_IDLE;
++		bfqq->new_ioprio = 7;
++		bfq_clear_bfqq_idle_window(bfqq);
++		break;
++	}
++
++	if (bfqq->new_ioprio < 0 || bfqq->new_ioprio >= IOPRIO_BE_NR) {
++		printk(KERN_CRIT "bfq_set_next_ioprio_data: new_ioprio %d\n",
++				 bfqq->new_ioprio);
++		BUG();
++	}
++
++	bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->new_ioprio);
++	bfqq->entity.prio_changed = 1;
++}
++
++static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio)
++{
++	struct bfq_data *bfqd;
++	struct bfq_queue *bfqq, *new_bfqq;
++	unsigned long uninitialized_var(flags);
++	int ioprio = bic->icq.ioc->ioprio;
++
++	bfqd = bfq_get_bfqd_locked(&(bic->icq.q->elevator->elevator_data),
++				   &flags);
++	/*
++	 * This condition may trigger on a newly created bic; be sure to
++	 * drop the lock before returning.
++	 */
++	if (unlikely(!bfqd) || likely(bic->ioprio == ioprio))
++		goto out;
++
++	bic->ioprio = ioprio;
++
++	bfqq = bic->bfqq[BLK_RW_ASYNC];
++	if (bfqq) {
++		new_bfqq = bfq_get_queue(bfqd, bio, BLK_RW_ASYNC, bic,
++					 GFP_ATOMIC);
++		if (new_bfqq) {
++			bic->bfqq[BLK_RW_ASYNC] = new_bfqq;
++			bfq_log_bfqq(bfqd, bfqq,
++				     "check_ioprio_change: bfqq %p %d",
++				     bfqq, atomic_read(&bfqq->ref));
++			bfq_put_queue(bfqq);
++		}
++	}
++
++	bfqq = bic->bfqq[BLK_RW_SYNC];
++	if (bfqq)
++		bfq_set_next_ioprio_data(bfqq, bic);
++
++out:
++	bfq_put_bfqd_unlock(bfqd, &flags);
++}
++
++static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++			  struct bfq_io_cq *bic, pid_t pid, int is_sync)
++{
++	RB_CLEAR_NODE(&bfqq->entity.rb_node);
++	INIT_LIST_HEAD(&bfqq->fifo);
++	INIT_HLIST_NODE(&bfqq->burst_list_node);
++
++	atomic_set(&bfqq->ref, 0);
++	bfqq->bfqd = bfqd;
++
++	if (bic)
++		bfq_set_next_ioprio_data(bfqq, bic);
++
++	if (is_sync) {
++		if (!bfq_class_idle(bfqq))
++			bfq_mark_bfqq_idle_window(bfqq);
++		bfq_mark_bfqq_sync(bfqq);
++	} else
++		bfq_clear_bfqq_sync(bfqq);
++	bfq_mark_bfqq_IO_bound(bfqq);
++
++	/* Tentative initial value to trade off between thr and lat */
++	bfqq->max_budget = (2 * bfq_max_budget(bfqd)) / 3;
++	bfqq->pid = pid;
++
++	bfqq->wr_coeff = 1;
++	bfqq->last_wr_start_finish = 0;
++	/*
++	 * Set to the value for which bfqq will not be deemed as
++	 * soft rt when it becomes backlogged.
++	 */
++	bfqq->soft_rt_next_start = bfq_infinity_from_now(jiffies);
++}
++
++static struct bfq_queue *bfq_find_alloc_queue(struct bfq_data *bfqd,
++					      struct bio *bio, int is_sync,
++					      struct bfq_io_cq *bic,
++					      gfp_t gfp_mask)
++{
++	struct bfq_group *bfqg;
++	struct bfq_queue *bfqq, *new_bfqq = NULL;
++	struct blkcg *blkcg;
++
++retry:
++	rcu_read_lock();
++
++	blkcg = bio_blkcg(bio);
++	bfqg = bfq_find_alloc_group(bfqd, blkcg);
++	/* bic always exists here */
++	bfqq = bic_to_bfqq(bic, is_sync);
++
++	/*
++	 * Always try a new alloc if we fall back to the OOM bfqq
++	 * originally, since it should just be a temporary situation.
++	 */
++	if (!bfqq || bfqq == &bfqd->oom_bfqq) {
++		bfqq = NULL;
++		if (new_bfqq) {
++			bfqq = new_bfqq;
++			new_bfqq = NULL;
++		} else if (gfp_mask & __GFP_WAIT) {
++			rcu_read_unlock();
++			spin_unlock_irq(bfqd->queue->queue_lock);
++			new_bfqq = kmem_cache_alloc_node(bfq_pool,
++					gfp_mask | __GFP_ZERO,
++					bfqd->queue->node);
++			spin_lock_irq(bfqd->queue->queue_lock);
++			if (new_bfqq)
++				goto retry;
++		} else {
++			bfqq = kmem_cache_alloc_node(bfq_pool,
++					gfp_mask | __GFP_ZERO,
++					bfqd->queue->node);
++		}
++
++		if (bfqq) {
++			bfq_init_bfqq(bfqd, bfqq, bic, current->pid,
++				      is_sync);
++			bfq_init_entity(&bfqq->entity, bfqg);
++			bfq_log_bfqq(bfqd, bfqq, "allocated");
++		} else {
++			bfqq = &bfqd->oom_bfqq;
++			bfq_log_bfqq(bfqd, bfqq, "using oom bfqq");
++		}
++	}
++
++	if (new_bfqq)
++		kmem_cache_free(bfq_pool, new_bfqq);
++
++	rcu_read_unlock();
++
++	return bfqq;
++}
++
++static struct bfq_queue **bfq_async_queue_prio(struct bfq_data *bfqd,
++					       struct bfq_group *bfqg,
++					       int ioprio_class, int ioprio)
++{
++	switch (ioprio_class) {
++	case IOPRIO_CLASS_RT:
++		return &bfqg->async_bfqq[0][ioprio];
++	case IOPRIO_CLASS_NONE:
++		ioprio = IOPRIO_NORM;
++		/* fall through */
++	case IOPRIO_CLASS_BE:
++		return &bfqg->async_bfqq[1][ioprio];
++	case IOPRIO_CLASS_IDLE:
++		return &bfqg->async_idle_bfqq;
++	default:
++		BUG();
++	}
++}
++
++static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
++				       struct bio *bio, int is_sync,
++				       struct bfq_io_cq *bic, gfp_t gfp_mask)
++{
++	const int ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
++	const int ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
++	struct bfq_queue **async_bfqq = NULL;
++	struct bfq_queue *bfqq = NULL;
++
++	if (!is_sync) {
++		struct blkcg *blkcg;
++		struct bfq_group *bfqg;
++
++		rcu_read_lock();
++		blkcg = bio_blkcg(bio);
++		rcu_read_unlock();
++		bfqg = bfq_find_alloc_group(bfqd, blkcg);
++		async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
++						  ioprio);
++		bfqq = *async_bfqq;
++	}
++
++	if (!bfqq)
++		bfqq = bfq_find_alloc_queue(bfqd, bio, is_sync, bic, gfp_mask);
++
++	/*
++	 * Pin the queue now that it's allocated, scheduler exit will
++	 * prune it.
++	 */
++	if (!is_sync && !(*async_bfqq)) {
++		atomic_inc(&bfqq->ref);
++		bfq_log_bfqq(bfqd, bfqq, "get_queue, bfqq not in async: %p, %d",
++			     bfqq, atomic_read(&bfqq->ref));
++		*async_bfqq = bfqq;
++	}
++
++	atomic_inc(&bfqq->ref);
++	bfq_log_bfqq(bfqd, bfqq, "get_queue, at end: %p, %d", bfqq,
++		     atomic_read(&bfqq->ref));
++	return bfqq;
++}
++
++static void bfq_update_io_thinktime(struct bfq_data *bfqd,
++				    struct bfq_io_cq *bic)
++{
++	unsigned long elapsed = jiffies - bic->ttime.last_end_request;
++	unsigned long ttime = min(elapsed, 2UL * bfqd->bfq_slice_idle);
++
++	bic->ttime.ttime_samples = (7*bic->ttime.ttime_samples + 256) / 8;
++	bic->ttime.ttime_total = (7*bic->ttime.ttime_total + 256*ttime) / 8;
++	bic->ttime.ttime_mean = (bic->ttime.ttime_total + 128) /
++				bic->ttime.ttime_samples;
++}
++
++static void bfq_update_io_seektime(struct bfq_data *bfqd,
++				   struct bfq_queue *bfqq,
++				   struct request *rq)
++{
++	sector_t sdist;
++	u64 total;
++
++	if (bfqq->last_request_pos < blk_rq_pos(rq))
++		sdist = blk_rq_pos(rq) - bfqq->last_request_pos;
++	else
++		sdist = bfqq->last_request_pos - blk_rq_pos(rq);
++
++	/*
++	 * Don't allow the seek distance to get too large from the
++	 * odd fragment, pagein, etc.
++	 */
++	if (bfqq->seek_samples == 0) /* first request, not really a seek */
++		sdist = 0;
++	else if (bfqq->seek_samples <= 60) /* second & third seek */
++		sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*1024);
++	else
++		sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*64);
++
++	bfqq->seek_samples = (7*bfqq->seek_samples + 256) / 8;
++	bfqq->seek_total = (7*bfqq->seek_total + (u64)256*sdist) / 8;
++	total = bfqq->seek_total + (bfqq->seek_samples/2);
++	do_div(total, bfqq->seek_samples);
++	bfqq->seek_mean = (sector_t)total;
++
++	bfq_log_bfqq(bfqd, bfqq, "dist=%llu mean=%llu", (u64)sdist,
++			(u64)bfqq->seek_mean);
++}
++
++/*
++ * Disable idle window if the process thinks too long or seeks so much that
++ * it doesn't matter.
++ */
++static void bfq_update_idle_window(struct bfq_data *bfqd,
++				   struct bfq_queue *bfqq,
++				   struct bfq_io_cq *bic)
++{
++	int enable_idle;
++
++	/* Don't idle for async or idle io prio class. */
++	if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
++		return;
++
++	enable_idle = bfq_bfqq_idle_window(bfqq);
++
++	if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
++	    bfqd->bfq_slice_idle == 0 ||
++		(bfqd->hw_tag && BFQQ_SEEKY(bfqq) &&
++			bfqq->wr_coeff == 1))
++		enable_idle = 0;
++	else if (bfq_sample_valid(bic->ttime.ttime_samples)) {
++		if (bic->ttime.ttime_mean > bfqd->bfq_slice_idle &&
++			bfqq->wr_coeff == 1)
++			enable_idle = 0;
++		else
++			enable_idle = 1;
++	}
++	bfq_log_bfqq(bfqd, bfqq, "update_idle_window: enable_idle %d",
++		enable_idle);
++
++	if (enable_idle)
++		bfq_mark_bfqq_idle_window(bfqq);
++	else
++		bfq_clear_bfqq_idle_window(bfqq);
++}
++
++/*
++ * Called when a new fs request (rq) is added to bfqq.  Check if there's
++ * something we should do about it.
++ */
++static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++			    struct request *rq)
++{
++	struct bfq_io_cq *bic = RQ_BIC(rq);
++
++	if (rq->cmd_flags & REQ_META)
++		bfqq->meta_pending++;
++
++	bfq_update_io_thinktime(bfqd, bic);
++	bfq_update_io_seektime(bfqd, bfqq, rq);
++	if (!BFQQ_SEEKY(bfqq) && bfq_bfqq_constantly_seeky(bfqq)) {
++		bfq_clear_bfqq_constantly_seeky(bfqq);
++		if (!blk_queue_nonrot(bfqd->queue)) {
++			BUG_ON(!bfqd->const_seeky_busy_in_flight_queues);
++			bfqd->const_seeky_busy_in_flight_queues--;
++		}
++	}
++	if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
++	    !BFQQ_SEEKY(bfqq))
++		bfq_update_idle_window(bfqd, bfqq, bic);
++
++	bfq_log_bfqq(bfqd, bfqq,
++		     "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
++		     bfq_bfqq_idle_window(bfqq), BFQQ_SEEKY(bfqq),
++		     (long long unsigned)bfqq->seek_mean);
++
++	bfqq->last_request_pos = blk_rq_pos(rq) + blk_rq_sectors(rq);
++
++	if (bfqq == bfqd->in_service_queue && bfq_bfqq_wait_request(bfqq)) {
++		bool small_req = bfqq->queued[rq_is_sync(rq)] == 1 &&
++				 blk_rq_sectors(rq) < 32;
++		bool budget_timeout = bfq_bfqq_budget_timeout(bfqq);
++
++		/*
++		 * There is just this request queued: if the request
++		 * is small and the queue is not to be expired, then
++		 * just exit.
++		 *
++		 * In this way, if the disk is being idled to wait for
++		 * a new request from the in-service queue, we avoid
++		 * unplugging the device and committing the disk to serve
++		 * just a small request. On the contrary, we wait for
++		 * the block layer to decide when to unplug the device:
++		 * hopefully, new requests will be merged to this one
++		 * quickly, then the device will be unplugged and
++		 * larger requests will be dispatched.
++		 */
++		if (small_req && !budget_timeout)
++			return;
++
++		/*
++		 * A large enough request arrived, or the queue is to
++		 * be expired: in both cases disk idling is to be
++		 * stopped, so clear wait_request flag and reset
++		 * timer.
++		 */
++		bfq_clear_bfqq_wait_request(bfqq);
++		del_timer(&bfqd->idle_slice_timer);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++		bfqg_stats_update_idle_time(bfqq_group(bfqq));
++#endif
++
++		/*
++		 * The queue is not empty, because a new request just
++		 * arrived. Hence we can safely expire the queue, in
++		 * case of budget timeout, without risking that the
++		 * timestamps of the queue are not updated correctly.
++		 * See [1] for more details.
++		 */
++		if (budget_timeout)
++			bfq_bfqq_expire(bfqd, bfqq, false,
++					BFQ_BFQQ_BUDGET_TIMEOUT);
++
++		/*
++		 * Let the request rip immediately, or let a new queue be
++		 * selected if bfqq has just been expired.
++		 */
++		__blk_run_queue(bfqd->queue);
++	}
++}
++
++static void bfq_insert_request(struct request_queue *q, struct request *rq)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++
++	assert_spin_locked(bfqd->queue->queue_lock);
++
++	bfq_add_request(rq);
++
++	rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
++	list_add_tail(&rq->queuelist, &bfqq->fifo);
++
++	bfq_rq_enqueued(bfqd, bfqq, rq);
++}
++
++static void bfq_update_hw_tag(struct bfq_data *bfqd)
++{
++	bfqd->max_rq_in_driver = max(bfqd->max_rq_in_driver,
++				     bfqd->rq_in_driver);
++
++	if (bfqd->hw_tag == 1)
++		return;
++
++	/*
++	 * This sample is valid if the number of outstanding requests
++	 * is large enough to allow a queueing behavior.  Note that the
++	 * sum is not exact, as it's not taking into account deactivated
++	 * requests.
++	 */
++	if (bfqd->rq_in_driver + bfqd->queued < BFQ_HW_QUEUE_THRESHOLD)
++		return;
++
++	if (bfqd->hw_tag_samples++ < BFQ_HW_QUEUE_SAMPLES)
++		return;
++
++	bfqd->hw_tag = bfqd->max_rq_in_driver > BFQ_HW_QUEUE_THRESHOLD;
++	bfqd->max_rq_in_driver = 0;
++	bfqd->hw_tag_samples = 0;
++}
++
++static void bfq_completed_request(struct request_queue *q, struct request *rq)
++{
++	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++	struct bfq_data *bfqd = bfqq->bfqd;
++	bool sync = bfq_bfqq_sync(bfqq);
++
++	bfq_log_bfqq(bfqd, bfqq, "completed one req with %u sects left (%d)",
++		     blk_rq_sectors(rq), sync);
++
++	bfq_update_hw_tag(bfqd);
++
++	BUG_ON(!bfqd->rq_in_driver);
++	BUG_ON(!bfqq->dispatched);
++	bfqd->rq_in_driver--;
++	bfqq->dispatched--;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	bfqg_stats_update_completion(bfqq_group(bfqq),
++				     rq_start_time_ns(rq),
++				     rq_io_start_time_ns(rq), rq->cmd_flags);
++#endif
++
++	if (!bfqq->dispatched && !bfq_bfqq_busy(bfqq)) {
++		bfq_weights_tree_remove(bfqd, &bfqq->entity,
++					&bfqd->queue_weights_tree);
++		if (!blk_queue_nonrot(bfqd->queue)) {
++			BUG_ON(!bfqd->busy_in_flight_queues);
++			bfqd->busy_in_flight_queues--;
++			if (bfq_bfqq_constantly_seeky(bfqq)) {
++				BUG_ON(!bfqd->
++					const_seeky_busy_in_flight_queues);
++				bfqd->const_seeky_busy_in_flight_queues--;
++			}
++		}
++	}
++
++	if (sync) {
++		bfqd->sync_flight--;
++		RQ_BIC(rq)->ttime.last_end_request = jiffies;
++	}
++
++	/*
++	 * If we are waiting to discover whether the request pattern of the
++	 * task associated with the queue is actually isochronous, and
++	 * both requisites for this condition to hold are satisfied, then
++	 * compute soft_rt_next_start (see the comments to the function
++	 * bfq_bfqq_softrt_next_start()).
++	 */
++	if (bfq_bfqq_softrt_update(bfqq) && bfqq->dispatched == 0 &&
++	    RB_EMPTY_ROOT(&bfqq->sort_list))
++		bfqq->soft_rt_next_start =
++			bfq_bfqq_softrt_next_start(bfqd, bfqq);
++
++	/*
++	 * If this is the in-service queue, check if it needs to be expired,
++	 * or if we want to idle in case it has no pending requests.
++	 */
++	if (bfqd->in_service_queue == bfqq) {
++		if (bfq_bfqq_budget_new(bfqq))
++			bfq_set_budget_timeout(bfqd);
++
++		if (bfq_bfqq_must_idle(bfqq)) {
++			bfq_arm_slice_timer(bfqd);
++			goto out;
++		} else if (bfq_may_expire_for_budg_timeout(bfqq))
++			bfq_bfqq_expire(bfqd, bfqq, false,
++					BFQ_BFQQ_BUDGET_TIMEOUT);
++		else if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
++			 (bfqq->dispatched == 0 ||
++			  !bfq_bfqq_may_idle(bfqq)))
++			bfq_bfqq_expire(bfqd, bfqq, false,
++					BFQ_BFQQ_NO_MORE_REQUESTS);
++	}
++
++	if (!bfqd->rq_in_driver)
++		bfq_schedule_dispatch(bfqd);
++
++out:
++	return;
++}
++
++static int __bfq_may_queue(struct bfq_queue *bfqq)
++{
++	if (bfq_bfqq_wait_request(bfqq) && bfq_bfqq_must_alloc(bfqq)) {
++		bfq_clear_bfqq_must_alloc(bfqq);
++		return ELV_MQUEUE_MUST;
++	}
++
++	return ELV_MQUEUE_MAY;
++}
++
++static int bfq_may_queue(struct request_queue *q, int rw)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct task_struct *tsk = current;
++	struct bfq_io_cq *bic;
++	struct bfq_queue *bfqq;
++
++	/*
++	 * Don't force setup of a queue from here, as a call to may_queue
++	 * does not necessarily imply that a request actually will be
++	 * queued. So just look up a possibly existing queue, or return
++	 * 'may queue' if that fails.
++	 */
++	bic = bfq_bic_lookup(bfqd, tsk->io_context);
++	if (!bic)
++		return ELV_MQUEUE_MAY;
++
++	bfqq = bic_to_bfqq(bic, rw_is_sync(rw));
++	if (bfqq)
++		return __bfq_may_queue(bfqq);
++
++	return ELV_MQUEUE_MAY;
++}
++
++/*
++ * Queue lock held here.
++ */
++static void bfq_put_request(struct request *rq)
++{
++	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++
++	if (bfqq) {
++		const int rw = rq_data_dir(rq);
++
++		BUG_ON(!bfqq->allocated[rw]);
++		bfqq->allocated[rw]--;
++
++		rq->elv.priv[0] = NULL;
++		rq->elv.priv[1] = NULL;
++
++		bfq_log_bfqq(bfqq->bfqd, bfqq, "put_request %p, %d",
++			     bfqq, atomic_read(&bfqq->ref));
++		bfq_put_queue(bfqq);
++	}
++}
++
++/*
++ * Allocate bfq data structures associated with this request.
++ */
++static int bfq_set_request(struct request_queue *q, struct request *rq,
++			   struct bio *bio, gfp_t gfp_mask)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct bfq_io_cq *bic = icq_to_bic(rq->elv.icq);
++	const int rw = rq_data_dir(rq);
++	const int is_sync = rq_is_sync(rq);
++	struct bfq_queue *bfqq;
++	unsigned long flags;
++
++	might_sleep_if(gfp_mask & __GFP_WAIT);
++
++	bfq_check_ioprio_change(bic, bio);
++
++	spin_lock_irqsave(q->queue_lock, flags);
++
++	if (!bic)
++		goto queue_fail;
++
++	bfq_bic_update_cgroup(bic, bio);
++
++	bfqq = bic_to_bfqq(bic, is_sync);
++	if (!bfqq || bfqq == &bfqd->oom_bfqq) {
++		bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, gfp_mask);
++		bic_set_bfqq(bic, bfqq, is_sync);
++		if (is_sync) {
++			if (bfqd->large_burst)
++				bfq_mark_bfqq_in_large_burst(bfqq);
++			else
++				bfq_clear_bfqq_in_large_burst(bfqq);
++		}
++	}
++
++	bfqq->allocated[rw]++;
++	atomic_inc(&bfqq->ref);
++	bfq_log_bfqq(bfqd, bfqq, "set_request: bfqq %p, %d", bfqq,
++		     atomic_read(&bfqq->ref));
++
++	rq->elv.priv[0] = bic;
++	rq->elv.priv[1] = bfqq;
++
++	spin_unlock_irqrestore(q->queue_lock, flags);
++
++	return 0;
++
++queue_fail:
++	bfq_schedule_dispatch(bfqd);
++	spin_unlock_irqrestore(q->queue_lock, flags);
++
++	return 1;
++}
++
++static void bfq_kick_queue(struct work_struct *work)
++{
++	struct bfq_data *bfqd =
++		container_of(work, struct bfq_data, unplug_work);
++	struct request_queue *q = bfqd->queue;
++
++	spin_lock_irq(q->queue_lock);
++	__blk_run_queue(q);
++	spin_unlock_irq(q->queue_lock);
++}
++
++/*
++ * Handler of the expiration of the timer running if the in-service queue
++ * is idling inside its time slice.
++ */
++static void bfq_idle_slice_timer(unsigned long data)
++{
++	struct bfq_data *bfqd = (struct bfq_data *)data;
++	struct bfq_queue *bfqq;
++	unsigned long flags;
++	enum bfqq_expiration reason;
++
++	spin_lock_irqsave(bfqd->queue->queue_lock, flags);
++
++	bfqq = bfqd->in_service_queue;
++	/*
++	 * Theoretical race here: the in-service queue can be NULL or
++	 * different from the queue that was idling if, while the timer
++	 * handler spins on the queue_lock, a new request arrives for the
++	 * current queue and a full dispatch cycle changes the in-service
++	 * queue.  This is very unlikely, and in the worst case we just
++	 * expire a queue too early.
++	 */
++	if (bfqq) {
++		bfq_log_bfqq(bfqd, bfqq, "slice_timer expired");
++		if (bfq_bfqq_budget_timeout(bfqq))
++			/*
++			 * Here too, the queue can be safely expired
++			 * for budget timeout without wasting
++			 * guarantees.
++			 */
++			reason = BFQ_BFQQ_BUDGET_TIMEOUT;
++		else if (bfqq->queued[0] == 0 && bfqq->queued[1] == 0)
++			/*
++			 * The queue may not be empty upon timer expiration,
++			 * because we may not disable the timer when the
++			 * first request of the in-service queue arrives
++			 * during disk idling.
++			 */
++			reason = BFQ_BFQQ_TOO_IDLE;
++		else
++			goto schedule_dispatch;
++
++		bfq_bfqq_expire(bfqd, bfqq, true, reason);
++	}
++
++schedule_dispatch:
++	bfq_schedule_dispatch(bfqd);
++
++	spin_unlock_irqrestore(bfqd->queue->queue_lock, flags);
++}
++
++static void bfq_shutdown_timer_wq(struct bfq_data *bfqd)
++{
++	del_timer_sync(&bfqd->idle_slice_timer);
++	cancel_work_sync(&bfqd->unplug_work);
++}
++
++static void __bfq_put_async_bfqq(struct bfq_data *bfqd,
++					struct bfq_queue **bfqq_ptr)
++{
++	struct bfq_group *root_group = bfqd->root_group;
++	struct bfq_queue *bfqq = *bfqq_ptr;
++
++	bfq_log(bfqd, "put_async_bfqq: %p", bfqq);
++	if (bfqq) {
++		bfq_bfqq_move(bfqd, bfqq, &bfqq->entity, root_group);
++		bfq_log_bfqq(bfqd, bfqq, "put_async_bfqq: putting %p, %d",
++			     bfqq, atomic_read(&bfqq->ref));
++		bfq_put_queue(bfqq);
++		*bfqq_ptr = NULL;
++	}
++}
++
++/*
++ * Release all the bfqg references to its async queues.  If we are
++ * deallocating the group these queues may still contain requests, so
++ * we reparent them to the root cgroup (i.e., the only one that will
++ * exist for sure until all the requests on a device are gone).
++ */
++static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
++{
++	int i, j;
++
++	for (i = 0; i < 2; i++)
++		for (j = 0; j < IOPRIO_BE_NR; j++)
++			__bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
++
++	__bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
++}
++
++static void bfq_exit_queue(struct elevator_queue *e)
++{
++	struct bfq_data *bfqd = e->elevator_data;
++	struct request_queue *q = bfqd->queue;
++	struct bfq_queue *bfqq, *n;
++
++	bfq_shutdown_timer_wq(bfqd);
++
++	spin_lock_irq(q->queue_lock);
++
++	BUG_ON(bfqd->in_service_queue);
++	list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
++		bfq_deactivate_bfqq(bfqd, bfqq, 0);
++
++	bfq_disconnect_groups(bfqd);
++	spin_unlock_irq(q->queue_lock);
++
++	bfq_shutdown_timer_wq(bfqd);
++
++	synchronize_rcu();
++
++	BUG_ON(timer_pending(&bfqd->idle_slice_timer));
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	blkcg_deactivate_policy(q, &blkcg_policy_bfq);
++#endif
++
++	kfree(bfqd);
++}
++
++static void bfq_init_root_group(struct bfq_group *root_group,
++				struct bfq_data *bfqd)
++{
++	int i;
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	root_group->entity.parent = NULL;
++	root_group->my_entity = NULL;
++	root_group->bfqd = bfqd;
++#endif
++	for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
++		root_group->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
++}
++
++static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
++{
++	struct bfq_data *bfqd;
++	struct elevator_queue *eq;
++
++	eq = elevator_alloc(q, e);
++	if (!eq)
++		return -ENOMEM;
++
++	bfqd = kzalloc_node(sizeof(*bfqd), GFP_KERNEL, q->node);
++	if (!bfqd) {
++		kobject_put(&eq->kobj);
++		return -ENOMEM;
++	}
++	eq->elevator_data = bfqd;
++
++	/*
++	 * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.
++	 * Grab a permanent reference to it, so that the normal code flow
++	 * will not attempt to free it.
++	 */
++	bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0);
++	atomic_inc(&bfqd->oom_bfqq.ref);
++	bfqd->oom_bfqq.new_ioprio = BFQ_DEFAULT_QUEUE_IOPRIO;
++	bfqd->oom_bfqq.new_ioprio_class = IOPRIO_CLASS_BE;
++	bfqd->oom_bfqq.entity.new_weight =
++		bfq_ioprio_to_weight(bfqd->oom_bfqq.new_ioprio);
++	/*
++	 * Trigger weight initialization, according to ioprio, at the
++	 * oom_bfqq's first activation. The oom_bfqq's ioprio and ioprio
++	 * class won't be changed any more.
++	 */
++	bfqd->oom_bfqq.entity.prio_changed = 1;
++
++	bfqd->queue = q;
++
++	spin_lock_irq(q->queue_lock);
++	q->elevator = eq;
++	spin_unlock_irq(q->queue_lock);
++
++	bfqd->root_group = bfq_create_group_hierarchy(bfqd, q->node);
++	if (!bfqd->root_group)
++		goto out_free;
++	bfq_init_root_group(bfqd->root_group, bfqd);
++	bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	bfqd->active_numerous_groups = 0;
++#endif
++
++	init_timer(&bfqd->idle_slice_timer);
++	bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
++	bfqd->idle_slice_timer.data = (unsigned long)bfqd;
++
++	bfqd->queue_weights_tree = RB_ROOT;
++	bfqd->group_weights_tree = RB_ROOT;
++
++	INIT_WORK(&bfqd->unplug_work, bfq_kick_queue);
++
++	INIT_LIST_HEAD(&bfqd->active_list);
++	INIT_LIST_HEAD(&bfqd->idle_list);
++	INIT_HLIST_HEAD(&bfqd->burst_list);
++
++	bfqd->hw_tag = -1;
++
++	bfqd->bfq_max_budget = bfq_default_max_budget;
++
++	bfqd->bfq_fifo_expire[0] = bfq_fifo_expire[0];
++	bfqd->bfq_fifo_expire[1] = bfq_fifo_expire[1];
++	bfqd->bfq_back_max = bfq_back_max;
++	bfqd->bfq_back_penalty = bfq_back_penalty;
++	bfqd->bfq_slice_idle = bfq_slice_idle;
++	bfqd->bfq_class_idle_last_service = 0;
++	bfqd->bfq_max_budget_async_rq = bfq_max_budget_async_rq;
++	bfqd->bfq_timeout[BLK_RW_ASYNC] = bfq_timeout_async;
++	bfqd->bfq_timeout[BLK_RW_SYNC] = bfq_timeout_sync;
++
++	bfqd->bfq_requests_within_timer = 120;
++
++	bfqd->bfq_large_burst_thresh = 11;
++	bfqd->bfq_burst_interval = msecs_to_jiffies(500);
++
++	bfqd->low_latency = true;
++
++	bfqd->bfq_wr_coeff = 20;
++	bfqd->bfq_wr_rt_max_time = msecs_to_jiffies(300);
++	bfqd->bfq_wr_max_time = 0;
++	bfqd->bfq_wr_min_idle_time = msecs_to_jiffies(2000);
++	bfqd->bfq_wr_min_inter_arr_async = msecs_to_jiffies(500);
++	bfqd->bfq_wr_max_softrt_rate = 7000; /*
++					      * Approximate rate required
++					      * to playback or record a
++					      * high-definition compressed
++					      * video.
++					      */
++	bfqd->wr_busy_queues = 0;
++	bfqd->busy_in_flight_queues = 0;
++	bfqd->const_seeky_busy_in_flight_queues = 0;
++
++	/*
++	 * Begin by assuming, optimistically, that the device peak rate is
++	 * equal to the highest reference rate.
++	 */
++	bfqd->RT_prod = R_fast[blk_queue_nonrot(bfqd->queue)] *
++			T_fast[blk_queue_nonrot(bfqd->queue)];
++	bfqd->peak_rate = R_fast[blk_queue_nonrot(bfqd->queue)];
++	bfqd->device_speed = BFQ_BFQD_FAST;
++
++	return 0;
++
++out_free:
++	kfree(bfqd);
++	kobject_put(&eq->kobj);
++	return -ENOMEM;
++}
++
++static void bfq_slab_kill(void)
++{
++	if (bfq_pool)
++		kmem_cache_destroy(bfq_pool);
++}
++
++static int __init bfq_slab_setup(void)
++{
++	bfq_pool = KMEM_CACHE(bfq_queue, 0);
++	if (!bfq_pool)
++		return -ENOMEM;
++	return 0;
++}
++
++static ssize_t bfq_var_show(unsigned int var, char *page)
++{
++	return sprintf(page, "%u\n", var);
++}
++
++static ssize_t bfq_var_store(unsigned long *var, const char *page,
++			     size_t count)
++{
++	unsigned long new_val;
++	int ret = kstrtoul(page, 10, &new_val);
++
++	if (ret == 0)
++		*var = new_val;
++
++	return count;
++}
++
++static ssize_t bfq_wr_max_time_show(struct elevator_queue *e, char *page)
++{
++	struct bfq_data *bfqd = e->elevator_data;
++	return sprintf(page, "%u\n", bfqd->bfq_wr_max_time > 0 ?
++		       jiffies_to_msecs(bfqd->bfq_wr_max_time) :
++		       jiffies_to_msecs(bfq_wr_duration(bfqd)));
++}
++
++static ssize_t bfq_weights_show(struct elevator_queue *e, char *page)
++{
++	struct bfq_queue *bfqq;
++	struct bfq_data *bfqd = e->elevator_data;
++	ssize_t num_char = 0;
++
++	num_char += sprintf(page + num_char, "Tot reqs queued %d\n\n",
++			    bfqd->queued);
++
++	spin_lock_irq(bfqd->queue->queue_lock);
++
++	num_char += sprintf(page + num_char, "Active:\n");
++	list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list) {
++		num_char += sprintf(page + num_char,
++			"pid%d: weight %hu, nr_queued %d %d, dur %d/%u\n",
++			bfqq->pid,
++			bfqq->entity.weight,
++			bfqq->queued[0],
++			bfqq->queued[1],
++			jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
++			jiffies_to_msecs(bfqq->wr_cur_max_time));
++	}
++
++	num_char += sprintf(page + num_char, "Idle:\n");
++	list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list) {
++		num_char += sprintf(page + num_char,
++			"pid%d: weight %hu, dur %d/%u\n",
++			bfqq->pid,
++			bfqq->entity.weight,
++			jiffies_to_msecs(jiffies -
++				bfqq->last_wr_start_finish),
++			jiffies_to_msecs(bfqq->wr_cur_max_time));
++	}
++
++	spin_unlock_irq(bfqd->queue->queue_lock);
++
++	return num_char;
++}
++
++#define SHOW_FUNCTION(__FUNC, __VAR, __CONV)				\
++static ssize_t __FUNC(struct elevator_queue *e, char *page)		\
++{									\
++	struct bfq_data *bfqd = e->elevator_data;			\
++	unsigned int __data = __VAR;					\
++	if (__CONV)							\
++		__data = jiffies_to_msecs(__data);			\
++	return bfq_var_show(__data, (page));				\
++}
++SHOW_FUNCTION(bfq_fifo_expire_sync_show, bfqd->bfq_fifo_expire[1], 1);
++SHOW_FUNCTION(bfq_fifo_expire_async_show, bfqd->bfq_fifo_expire[0], 1);
++SHOW_FUNCTION(bfq_back_seek_max_show, bfqd->bfq_back_max, 0);
++SHOW_FUNCTION(bfq_back_seek_penalty_show, bfqd->bfq_back_penalty, 0);
++SHOW_FUNCTION(bfq_slice_idle_show, bfqd->bfq_slice_idle, 1);
++SHOW_FUNCTION(bfq_max_budget_show, bfqd->bfq_user_max_budget, 0);
++SHOW_FUNCTION(bfq_max_budget_async_rq_show,
++	      bfqd->bfq_max_budget_async_rq, 0);
++SHOW_FUNCTION(bfq_timeout_sync_show, bfqd->bfq_timeout[BLK_RW_SYNC], 1);
++SHOW_FUNCTION(bfq_timeout_async_show, bfqd->bfq_timeout[BLK_RW_ASYNC], 1);
++SHOW_FUNCTION(bfq_low_latency_show, bfqd->low_latency, 0);
++SHOW_FUNCTION(bfq_wr_coeff_show, bfqd->bfq_wr_coeff, 0);
++SHOW_FUNCTION(bfq_wr_rt_max_time_show, bfqd->bfq_wr_rt_max_time, 1);
++SHOW_FUNCTION(bfq_wr_min_idle_time_show, bfqd->bfq_wr_min_idle_time, 1);
++SHOW_FUNCTION(bfq_wr_min_inter_arr_async_show, bfqd->bfq_wr_min_inter_arr_async,
++	1);
++SHOW_FUNCTION(bfq_wr_max_softrt_rate_show, bfqd->bfq_wr_max_softrt_rate, 0);
++#undef SHOW_FUNCTION
++
++#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV)			\
++static ssize_t								\
++__FUNC(struct elevator_queue *e, const char *page, size_t count)	\
++{									\
++	struct bfq_data *bfqd = e->elevator_data;			\
++	unsigned long uninitialized_var(__data);			\
++	int ret = bfq_var_store(&__data, (page), count);		\
++	if (__data < (MIN))						\
++		__data = (MIN);						\
++	else if (__data > (MAX))					\
++		__data = (MAX);						\
++	if (__CONV)							\
++		*(__PTR) = msecs_to_jiffies(__data);			\
++	else								\
++		*(__PTR) = __data;					\
++	return ret;							\
++}
++STORE_FUNCTION(bfq_fifo_expire_sync_store, &bfqd->bfq_fifo_expire[1], 1,
++		INT_MAX, 1);
++STORE_FUNCTION(bfq_fifo_expire_async_store, &bfqd->bfq_fifo_expire[0], 1,
++		INT_MAX, 1);
++STORE_FUNCTION(bfq_back_seek_max_store, &bfqd->bfq_back_max, 0, INT_MAX, 0);
++STORE_FUNCTION(bfq_back_seek_penalty_store, &bfqd->bfq_back_penalty, 1,
++		INT_MAX, 0);
++STORE_FUNCTION(bfq_slice_idle_store, &bfqd->bfq_slice_idle, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_max_budget_async_rq_store, &bfqd->bfq_max_budget_async_rq,
++		1, INT_MAX, 0);
++STORE_FUNCTION(bfq_timeout_async_store, &bfqd->bfq_timeout[BLK_RW_ASYNC], 0,
++		INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_coeff_store, &bfqd->bfq_wr_coeff, 1, INT_MAX, 0);
++STORE_FUNCTION(bfq_wr_max_time_store, &bfqd->bfq_wr_max_time, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_rt_max_time_store, &bfqd->bfq_wr_rt_max_time, 0, INT_MAX,
++		1);
++STORE_FUNCTION(bfq_wr_min_idle_time_store, &bfqd->bfq_wr_min_idle_time, 0,
++		INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_min_inter_arr_async_store,
++		&bfqd->bfq_wr_min_inter_arr_async, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_max_softrt_rate_store, &bfqd->bfq_wr_max_softrt_rate, 0,
++		INT_MAX, 0);
++#undef STORE_FUNCTION
++
++/* do nothing for the moment */
++static ssize_t bfq_weights_store(struct elevator_queue *e,
++				    const char *page, size_t count)
++{
++	return count;
++}
++
++static unsigned long bfq_estimated_max_budget(struct bfq_data *bfqd)
++{
++	u64 timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
++
++	if (bfqd->peak_rate_samples >= BFQ_PEAK_RATE_SAMPLES)
++		return bfq_calc_max_budget(bfqd->peak_rate, timeout);
++	else
++		return bfq_default_max_budget;
++}
++
++static ssize_t bfq_max_budget_store(struct elevator_queue *e,
++				    const char *page, size_t count)
++{
++	struct bfq_data *bfqd = e->elevator_data;
++	unsigned long uninitialized_var(__data);
++	int ret = bfq_var_store(&__data, (page), count);
++
++	if (__data == 0)
++		bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
++	else {
++		if (__data > INT_MAX)
++			__data = INT_MAX;
++		bfqd->bfq_max_budget = __data;
++	}
++
++	bfqd->bfq_user_max_budget = __data;
++
++	return ret;
++}
++
++static ssize_t bfq_timeout_sync_store(struct elevator_queue *e,
++				      const char *page, size_t count)
++{
++	struct bfq_data *bfqd = e->elevator_data;
++	unsigned long uninitialized_var(__data);
++	int ret = bfq_var_store(&__data, (page), count);
++
++	if (__data < 1)
++		__data = 1;
++	else if (__data > INT_MAX)
++		__data = INT_MAX;
++
++	bfqd->bfq_timeout[BLK_RW_SYNC] = msecs_to_jiffies(__data);
++	if (bfqd->bfq_user_max_budget == 0)
++		bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
++
++	return ret;
++}
++
++static ssize_t bfq_low_latency_store(struct elevator_queue *e,
++				     const char *page, size_t count)
++{
++	struct bfq_data *bfqd = e->elevator_data;
++	unsigned long uninitialized_var(__data);
++	int ret = bfq_var_store(&__data, (page), count);
++
++	if (__data > 1)
++		__data = 1;
++	if (__data == 0 && bfqd->low_latency != 0)
++		bfq_end_wr(bfqd);
++	bfqd->low_latency = __data;
++
++	return ret;
++}
++
++#define BFQ_ATTR(name) \
++	__ATTR(name, S_IRUGO|S_IWUSR, bfq_##name##_show, bfq_##name##_store)
++
++static struct elv_fs_entry bfq_attrs[] = {
++	BFQ_ATTR(fifo_expire_sync),
++	BFQ_ATTR(fifo_expire_async),
++	BFQ_ATTR(back_seek_max),
++	BFQ_ATTR(back_seek_penalty),
++	BFQ_ATTR(slice_idle),
++	BFQ_ATTR(max_budget),
++	BFQ_ATTR(max_budget_async_rq),
++	BFQ_ATTR(timeout_sync),
++	BFQ_ATTR(timeout_async),
++	BFQ_ATTR(low_latency),
++	BFQ_ATTR(wr_coeff),
++	BFQ_ATTR(wr_max_time),
++	BFQ_ATTR(wr_rt_max_time),
++	BFQ_ATTR(wr_min_idle_time),
++	BFQ_ATTR(wr_min_inter_arr_async),
++	BFQ_ATTR(wr_max_softrt_rate),
++	BFQ_ATTR(weights),
++	__ATTR_NULL
++};
++
++static struct elevator_type iosched_bfq = {
++	.ops = {
++		.elevator_merge_fn =		bfq_merge,
++		.elevator_merged_fn =		bfq_merged_request,
++		.elevator_merge_req_fn =	bfq_merged_requests,
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++		.elevator_bio_merged_fn =	bfq_bio_merged,
++#endif
++		.elevator_allow_merge_fn =	bfq_allow_merge,
++		.elevator_dispatch_fn =		bfq_dispatch_requests,
++		.elevator_add_req_fn =		bfq_insert_request,
++		.elevator_activate_req_fn =	bfq_activate_request,
++		.elevator_deactivate_req_fn =	bfq_deactivate_request,
++		.elevator_completed_req_fn =	bfq_completed_request,
++		.elevator_former_req_fn =	elv_rb_former_request,
++		.elevator_latter_req_fn =	elv_rb_latter_request,
++		.elevator_init_icq_fn =		bfq_init_icq,
++		.elevator_exit_icq_fn =		bfq_exit_icq,
++		.elevator_set_req_fn =		bfq_set_request,
++		.elevator_put_req_fn =		bfq_put_request,
++		.elevator_may_queue_fn =	bfq_may_queue,
++		.elevator_init_fn =		bfq_init_queue,
++		.elevator_exit_fn =		bfq_exit_queue,
++	},
++	.icq_size =		sizeof(struct bfq_io_cq),
++	.icq_align =		__alignof__(struct bfq_io_cq),
++	.elevator_attrs =	bfq_attrs,
++	.elevator_name =	"bfq",
++	.elevator_owner =	THIS_MODULE,
++};
++
++static int __init bfq_init(void)
++{
++	int ret;
++
++	/*
++	 * Can be 0 on HZ < 1000 setups.
++	 */
++	if (bfq_slice_idle == 0)
++		bfq_slice_idle = 1;
++
++	if (bfq_timeout_async == 0)
++		bfq_timeout_async = 1;
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	ret = blkcg_policy_register(&blkcg_policy_bfq);
++	if (ret)
++		return ret;
++#endif
++
++	ret = -ENOMEM;
++	if (bfq_slab_setup())
++		goto err_pol_unreg;
++
++	/*
++	 * Times to load large popular applications for the typical systems
++	 * installed on the reference devices (see the comments before the
++	 * definitions of the two arrays).
++	 */
++	T_slow[0] = msecs_to_jiffies(2600);
++	T_slow[1] = msecs_to_jiffies(1000);
++	T_fast[0] = msecs_to_jiffies(5500);
++	T_fast[1] = msecs_to_jiffies(2000);
++
++	/*
++	 * Thresholds that determine the switch between speed classes (see
++	 * the comments before the definition of the array).
++	 */
++	device_speed_thresh[0] = (R_fast[0] + R_slow[0]) / 2;
++	device_speed_thresh[1] = (R_fast[1] + R_slow[1]) / 2;
++
++	ret = elv_register(&iosched_bfq);
++	if (ret)
++		goto err_pol_unreg;
++
++	pr_info("BFQ I/O-scheduler: v7r9\n");
++
++	return 0;
++
++err_pol_unreg:
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	blkcg_policy_unregister(&blkcg_policy_bfq);
++#endif
++	return ret;
++}
++
++static void __exit bfq_exit(void)
++{
++	elv_unregister(&iosched_bfq);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	blkcg_policy_unregister(&blkcg_policy_bfq);
++#endif
++	bfq_slab_kill();
++}
++
++module_init(bfq_init);
++module_exit(bfq_exit);
++
++MODULE_AUTHOR("Fabio Checconi, Paolo Valente");
++MODULE_LICENSE("GPL");
+diff --git a/block/bfq-sched.c b/block/bfq-sched.c
+new file mode 100644
+index 0000000..9328a1f
+--- /dev/null
++++ b/block/bfq-sched.c
+@@ -0,0 +1,1197 @@
++/*
++ * BFQ: Hierarchical B-WF2Q+ scheduler.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ *		      Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ */
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++#define for_each_entity(entity)	\
++	for (; entity ; entity = entity->parent)
++
++#define for_each_entity_safe(entity, parent) \
++	for (; entity && ({ parent = entity->parent; 1; }); entity = parent)
++
++
++static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
++						 int extract,
++						 struct bfq_data *bfqd);
++
++static struct bfq_group *bfqq_group(struct bfq_queue *bfqq);
++
++static void bfq_update_budget(struct bfq_entity *next_in_service)
++{
++	struct bfq_entity *bfqg_entity;
++	struct bfq_group *bfqg;
++	struct bfq_sched_data *group_sd;
++
++	BUG_ON(!next_in_service);
++
++	group_sd = next_in_service->sched_data;
++
++	bfqg = container_of(group_sd, struct bfq_group, sched_data);
++	/*
++	 * bfq_group's my_entity field is not NULL only if the group
++	 * is not the root group. We must not touch the root entity
++	 * as it must never become an in-service entity.
++	 */
++	bfqg_entity = bfqg->my_entity;
++	if (bfqg_entity)
++		bfqg_entity->budget = next_in_service->budget;
++}
++
++static int bfq_update_next_in_service(struct bfq_sched_data *sd)
++{
++	struct bfq_entity *next_in_service;
++
++	if (sd->in_service_entity)
++		/* will update/requeue at the end of service */
++		return 0;
++
++	/*
++	 * NOTE: this can be improved in many ways, such as returning
++	 * 1 (and thus propagating upwards the update) only when the
++	 * budget changes, or caching the bfqq that will be scheduled
++	 * next from this subtree.  For now we worry more about
++	 * correctness than about performance...
++	 */
++	next_in_service = bfq_lookup_next_entity(sd, 0, NULL);
++	sd->next_in_service = next_in_service;
++
++	if (next_in_service)
++		bfq_update_budget(next_in_service);
++
++	return 1;
++}
++
++static void bfq_check_next_in_service(struct bfq_sched_data *sd,
++				      struct bfq_entity *entity)
++{
++	BUG_ON(sd->next_in_service != entity);
++}
++#else
++#define for_each_entity(entity)	\
++	for (; entity ; entity = NULL)
++
++#define for_each_entity_safe(entity, parent) \
++	for (parent = NULL; entity ; entity = parent)
++
++static int bfq_update_next_in_service(struct bfq_sched_data *sd)
++{
++	return 0;
++}
++
++static void bfq_check_next_in_service(struct bfq_sched_data *sd,
++				      struct bfq_entity *entity)
++{
++}
++
++static void bfq_update_budget(struct bfq_entity *next_in_service)
++{
++}
++#endif
++
++/*
++ * Shift for timestamp calculations.  This actually limits the maximum
++ * service allowed in one timestamp delta (small shift values increase it),
++ * the maximum total weight that can be used for the queues in the system
++ * (big shift values increase it), and the period of virtual time
++ * wraparounds.
++ */
++#define WFQ_SERVICE_SHIFT	22
++
++/**
++ * bfq_gt - compare two timestamps.
++ * @a: first ts.
++ * @b: second ts.
++ *
++ * Return @a > @b, dealing with wrapping correctly.
++ */
++static int bfq_gt(u64 a, u64 b)
++{
++	return (s64)(a - b) > 0;
++}
++
++static struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = NULL;
++
++	BUG_ON(!entity);
++
++	if (!entity->my_sched_data)
++		bfqq = container_of(entity, struct bfq_queue, entity);
++
++	return bfqq;
++}
++
++
++/**
++ * bfq_delta - map service into the virtual time domain.
++ * @service: amount of service.
++ * @weight: scale factor (weight of an entity or weight sum).
++ */
++static u64 bfq_delta(unsigned long service, unsigned long weight)
++{
++	u64 d = (u64)service << WFQ_SERVICE_SHIFT;
++
++	do_div(d, weight);
++	return d;
++}
++
++/**
++ * bfq_calc_finish - assign the finish time to an entity.
++ * @entity: the entity to act upon.
++ * @service: the service to be charged to the entity.
++ */
++static void bfq_calc_finish(struct bfq_entity *entity, unsigned long service)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++	BUG_ON(entity->weight == 0);
++
++	entity->finish = entity->start +
++		bfq_delta(service, entity->weight);
++
++	if (bfqq) {
++		bfq_log_bfqq(bfqq->bfqd, bfqq,
++			"calc_finish: serv %lu, w %d",
++			service, entity->weight);
++		bfq_log_bfqq(bfqq->bfqd, bfqq,
++			"calc_finish: start %llu, finish %llu, delta %llu",
++			entity->start, entity->finish,
++			bfq_delta(service, entity->weight));
++	}
++}
++
++/**
++ * bfq_entity_of - get an entity from a node.
++ * @node: the node field of the entity.
++ *
++ * Convert a node pointer to the corresponding entity.  This is used only
++ * to simplify the logic of some functions and not as the generic
++ * conversion mechanism because, e.g., in the tree walking functions,
++ * the check for a %NULL value would be redundant.
++ */
++static struct bfq_entity *bfq_entity_of(struct rb_node *node)
++{
++	struct bfq_entity *entity = NULL;
++
++	if (node)
++		entity = rb_entry(node, struct bfq_entity, rb_node);
++
++	return entity;
++}
++
++/**
++ * bfq_extract - remove an entity from a tree.
++ * @root: the tree root.
++ * @entity: the entity to remove.
++ */
++static void bfq_extract(struct rb_root *root, struct bfq_entity *entity)
++{
++	BUG_ON(entity->tree != root);
++
++	entity->tree = NULL;
++	rb_erase(&entity->rb_node, root);
++}
++
++/**
++ * bfq_idle_extract - extract an entity from the idle tree.
++ * @st: the service tree of the owning @entity.
++ * @entity: the entity being removed.
++ */
++static void bfq_idle_extract(struct bfq_service_tree *st,
++			     struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++	struct rb_node *next;
++
++	BUG_ON(entity->tree != &st->idle);
++
++	if (entity == st->first_idle) {
++		next = rb_next(&entity->rb_node);
++		st->first_idle = bfq_entity_of(next);
++	}
++
++	if (entity == st->last_idle) {
++		next = rb_prev(&entity->rb_node);
++		st->last_idle = bfq_entity_of(next);
++	}
++
++	bfq_extract(&st->idle, entity);
++
++	if (bfqq)
++		list_del(&bfqq->bfqq_list);
++}
++
++/**
++ * bfq_insert - generic tree insertion.
++ * @root: tree root.
++ * @entity: entity to insert.
++ *
++ * This is used for the idle and the active tree, since they are both
++ * ordered by finish time.
++ */
++static void bfq_insert(struct rb_root *root, struct bfq_entity *entity)
++{
++	struct bfq_entity *entry;
++	struct rb_node **node = &root->rb_node;
++	struct rb_node *parent = NULL;
++
++	BUG_ON(entity->tree);
++
++	while (*node) {
++		parent = *node;
++		entry = rb_entry(parent, struct bfq_entity, rb_node);
++
++		if (bfq_gt(entry->finish, entity->finish))
++			node = &parent->rb_left;
++		else
++			node = &parent->rb_right;
++	}
++
++	rb_link_node(&entity->rb_node, parent, node);
++	rb_insert_color(&entity->rb_node, root);
++
++	entity->tree = root;
++}
++
++/**
++ * bfq_update_min - update the min_start field of an entity.
++ * @entity: the entity to update.
++ * @node: one of its children.
++ *
++ * This function is called when @entity may store an invalid value for
++ * min_start due to updates to the active tree.  The function assumes
++ * that the subtree rooted at @node (which may be its left or its right
++ * child) has a valid min_start value.
++ */
++static void bfq_update_min(struct bfq_entity *entity, struct rb_node *node)
++{
++	struct bfq_entity *child;
++
++	if (node) {
++		child = rb_entry(node, struct bfq_entity, rb_node);
++		if (bfq_gt(entity->min_start, child->min_start))
++			entity->min_start = child->min_start;
++	}
++}
++
++/**
++ * bfq_update_active_node - recalculate min_start.
++ * @node: the node to update.
++ *
++ * @node may have changed position or one of its children may have moved;
++ * this function updates its min_start value.  The left and right subtrees
++ * are assumed to hold a correct min_start value.
++ */
++static void bfq_update_active_node(struct rb_node *node)
++{
++	struct bfq_entity *entity = rb_entry(node, struct bfq_entity, rb_node);
++
++	entity->min_start = entity->start;
++	bfq_update_min(entity, node->rb_right);
++	bfq_update_min(entity, node->rb_left);
++}
++
++/**
++ * bfq_update_active_tree - update min_start for the whole active tree.
++ * @node: the starting node.
++ *
++ * @node must be the deepest modified node after an update.  This function
++ * updates its min_start using the values held by its children, assuming
++ * that they did not change, and then updates all the nodes that may have
++ * changed in the path to the root.  The only nodes that may have changed
++ * are the ones in the path or their siblings.
++ */
++static void bfq_update_active_tree(struct rb_node *node)
++{
++	struct rb_node *parent;
++
++up:
++	bfq_update_active_node(node);
++
++	parent = rb_parent(node);
++	if (!parent)
++		return;
++
++	if (node == parent->rb_left && parent->rb_right)
++		bfq_update_active_node(parent->rb_right);
++	else if (parent->rb_left)
++		bfq_update_active_node(parent->rb_left);
++
++	node = parent;
++	goto up;
++}
++
++static void bfq_weights_tree_add(struct bfq_data *bfqd,
++				 struct bfq_entity *entity,
++				 struct rb_root *root);
++
++static void bfq_weights_tree_remove(struct bfq_data *bfqd,
++				    struct bfq_entity *entity,
++				    struct rb_root *root);
++
++
++/**
++ * bfq_active_insert - insert an entity in the active tree of its
++ *                     group/device.
++ * @st: the service tree of the entity.
++ * @entity: the entity being inserted.
++ *
++ * The active tree is ordered by finish time, but an extra key is kept
++ * per each node, containing the minimum value for the start times of
++ * its children (and the node itself), so it's possible to search for
++ * the eligible node with the lowest finish time in logarithmic time.
++ */
++static void bfq_active_insert(struct bfq_service_tree *st,
++			      struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++	struct rb_node *node = &entity->rb_node;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	struct bfq_sched_data *sd = NULL;
++	struct bfq_group *bfqg = NULL;
++	struct bfq_data *bfqd = NULL;
++#endif
++
++	bfq_insert(&st->active, entity);
++
++	if (node->rb_left)
++		node = node->rb_left;
++	else if (node->rb_right)
++		node = node->rb_right;
++
++	bfq_update_active_tree(node);
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	sd = entity->sched_data;
++	bfqg = container_of(sd, struct bfq_group, sched_data);
++	BUG_ON(!bfqg);
++	bfqd = (struct bfq_data *)bfqg->bfqd;
++#endif
++	if (bfqq)
++		list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	else { /* bfq_group */
++		BUG_ON(!bfqd);
++		bfq_weights_tree_add(bfqd, entity, &bfqd->group_weights_tree);
++	}
++	if (bfqg != bfqd->root_group) {
++		BUG_ON(!bfqg);
++		BUG_ON(!bfqd);
++		bfqg->active_entities++;
++		if (bfqg->active_entities == 2)
++			bfqd->active_numerous_groups++;
++	}
++#endif
++}
++
++/**
++ * bfq_ioprio_to_weight - calc a weight from an ioprio.
++ * @ioprio: the ioprio value to convert.
++ */
++static unsigned short bfq_ioprio_to_weight(int ioprio)
++{
++	BUG_ON(ioprio < 0 || ioprio >= IOPRIO_BE_NR);
++	return IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - ioprio;
++}
++
++/**
++ * bfq_weight_to_ioprio - calc an ioprio from a weight.
++ * @weight: the weight value to convert.
++ *
++ * To preserve as much as possible the old only-ioprio user interface,
++ * 0 is used as an escape ioprio value for weights (numerically) equal or
++ * larger than IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF.
++ */
++static unsigned short bfq_weight_to_ioprio(int weight)
++{
++	BUG_ON(weight < BFQ_MIN_WEIGHT || weight > BFQ_MAX_WEIGHT);
++	return IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight < 0 ?
++		0 : IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight;
++}
++
++static void bfq_get_entity(struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++	if (bfqq) {
++		atomic_inc(&bfqq->ref);
++		bfq_log_bfqq(bfqq->bfqd, bfqq, "get_entity: %p %d",
++			     bfqq, atomic_read(&bfqq->ref));
++	}
++}
++
++/**
++ * bfq_find_deepest - find the deepest node that an extraction can modify.
++ * @node: the node being removed.
++ *
++ * Do the first step of an extraction in an rb tree, looking for the
++ * node that will replace @node, and returning the deepest node that
++ * the following modifications to the tree can touch.  If @node is the
++ * last node in the tree return %NULL.
++ */
++static struct rb_node *bfq_find_deepest(struct rb_node *node)
++{
++	struct rb_node *deepest;
++
++	if (!node->rb_right && !node->rb_left)
++		deepest = rb_parent(node);
++	else if (!node->rb_right)
++		deepest = node->rb_left;
++	else if (!node->rb_left)
++		deepest = node->rb_right;
++	else {
++		deepest = rb_next(node);
++		if (deepest->rb_right)
++			deepest = deepest->rb_right;
++		else if (rb_parent(deepest) != node)
++			deepest = rb_parent(deepest);
++	}
++
++	return deepest;
++}
++
++/**
++ * bfq_active_extract - remove an entity from the active tree.
++ * @st: the service_tree containing the tree.
++ * @entity: the entity being removed.
++ */
++static void bfq_active_extract(struct bfq_service_tree *st,
++			       struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++	struct rb_node *node;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	struct bfq_sched_data *sd = NULL;
++	struct bfq_group *bfqg = NULL;
++	struct bfq_data *bfqd = NULL;
++#endif
++
++	node = bfq_find_deepest(&entity->rb_node);
++	bfq_extract(&st->active, entity);
++
++	if (node)
++		bfq_update_active_tree(node);
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	sd = entity->sched_data;
++	bfqg = container_of(sd, struct bfq_group, sched_data);
++	BUG_ON(!bfqg);
++	bfqd = (struct bfq_data *)bfqg->bfqd;
++#endif
++	if (bfqq)
++		list_del(&bfqq->bfqq_list);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	else { /* bfq_group */
++		BUG_ON(!bfqd);
++		bfq_weights_tree_remove(bfqd, entity,
++					&bfqd->group_weights_tree);
++	}
++	if (bfqg != bfqd->root_group) {
++		BUG_ON(!bfqg);
++		BUG_ON(!bfqd);
++		BUG_ON(!bfqg->active_entities);
++		bfqg->active_entities--;
++		if (bfqg->active_entities == 1) {
++			BUG_ON(!bfqd->active_numerous_groups);
++			bfqd->active_numerous_groups--;
++		}
++	}
++#endif
++}
++
++/**
++ * bfq_idle_insert - insert an entity into the idle tree.
++ * @st: the service tree containing the tree.
++ * @entity: the entity to insert.
++ */
++static void bfq_idle_insert(struct bfq_service_tree *st,
++			    struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++	struct bfq_entity *first_idle = st->first_idle;
++	struct bfq_entity *last_idle = st->last_idle;
++
++	if (!first_idle || bfq_gt(first_idle->finish, entity->finish))
++		st->first_idle = entity;
++	if (!last_idle || bfq_gt(entity->finish, last_idle->finish))
++		st->last_idle = entity;
++
++	bfq_insert(&st->idle, entity);
++
++	if (bfqq)
++		list_add(&bfqq->bfqq_list, &bfqq->bfqd->idle_list);
++}
++
++/**
++ * bfq_forget_entity - remove an entity from the wfq trees.
++ * @st: the service tree.
++ * @entity: the entity being removed.
++ *
++ * Update the device status and forget everything about @entity, putting
++ * the device reference to it, if it is a queue.  Entities belonging to
++ * groups are not refcounted.
++ */
++static void bfq_forget_entity(struct bfq_service_tree *st,
++			      struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++	struct bfq_sched_data *sd;
++
++	BUG_ON(!entity->on_st);
++
++	entity->on_st = 0;
++	st->wsum -= entity->weight;
++	if (bfqq) {
++		sd = entity->sched_data;
++		bfq_log_bfqq(bfqq->bfqd, bfqq, "forget_entity: %p %d",
++			     bfqq, atomic_read(&bfqq->ref));
++		bfq_put_queue(bfqq);
++	}
++}
++
++/**
++ * bfq_put_idle_entity - release the idle tree ref of an entity.
++ * @st: service tree for the entity.
++ * @entity: the entity being released.
++ */
++static void bfq_put_idle_entity(struct bfq_service_tree *st,
++				struct bfq_entity *entity)
++{
++	bfq_idle_extract(st, entity);
++	bfq_forget_entity(st, entity);
++}
++
++/**
++ * bfq_forget_idle - update the idle tree if necessary.
++ * @st: the service tree to act upon.
++ *
++ * To preserve the global O(log N) complexity we only remove one entry here;
++ * as the idle tree will not grow indefinitely this can be done safely.
++ */
++static void bfq_forget_idle(struct bfq_service_tree *st)
++{
++	struct bfq_entity *first_idle = st->first_idle;
++	struct bfq_entity *last_idle = st->last_idle;
++
++	if (RB_EMPTY_ROOT(&st->active) && last_idle &&
++	    !bfq_gt(last_idle->finish, st->vtime)) {
++		/*
++		 * Forget the whole idle tree, increasing the vtime past
++		 * the last finish time of idle entities.
++		 */
++		st->vtime = last_idle->finish;
++	}
++
++	if (first_idle && !bfq_gt(first_idle->finish, st->vtime))
++		bfq_put_idle_entity(st, first_idle);
++}
++
++static struct bfq_service_tree *
++__bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
++			 struct bfq_entity *entity)
++{
++	struct bfq_service_tree *new_st = old_st;
++
++	if (entity->prio_changed) {
++		struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++		unsigned short prev_weight, new_weight;
++		struct bfq_data *bfqd = NULL;
++		struct rb_root *root;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++		struct bfq_sched_data *sd;
++		struct bfq_group *bfqg;
++#endif
++
++		if (bfqq)
++			bfqd = bfqq->bfqd;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++		else {
++			sd = entity->my_sched_data;
++			bfqg = container_of(sd, struct bfq_group, sched_data);
++			BUG_ON(!bfqg);
++			bfqd = (struct bfq_data *)bfqg->bfqd;
++			BUG_ON(!bfqd);
++		}
++#endif
++
++		BUG_ON(old_st->wsum < entity->weight);
++		old_st->wsum -= entity->weight;
++
++		if (entity->new_weight != entity->orig_weight) {
++			if (entity->new_weight < BFQ_MIN_WEIGHT ||
++			    entity->new_weight > BFQ_MAX_WEIGHT) {
++				printk(KERN_CRIT "update_weight_prio: "
++						 "new_weight %d\n",
++					entity->new_weight);
++				BUG();
++			}
++			entity->orig_weight = entity->new_weight;
++			if (bfqq)
++				bfqq->ioprio =
++				  bfq_weight_to_ioprio(entity->orig_weight);
++		}
++
++		if (bfqq)
++			bfqq->ioprio_class = bfqq->new_ioprio_class;
++		entity->prio_changed = 0;
++
++		/*
++		 * NOTE: here we may be changing the weight too early,
++		 * this will cause unfairness.  The correct approach
++		 * would have required additional complexity to defer
++		 * weight changes to the proper time instants (i.e.,
++		 * when entity->finish <= old_st->vtime).
++		 */
++		new_st = bfq_entity_service_tree(entity);
++
++		prev_weight = entity->weight;
++		new_weight = entity->orig_weight *
++			     (bfqq ? bfqq->wr_coeff : 1);
++		/*
++		 * If the weight of the entity changes, remove the entity
++		 * from its old weight counter (if there is a counter
++		 * associated with the entity), and add it to the counter
++		 * associated with its new weight.
++		 */
++		if (prev_weight != new_weight) {
++			root = bfqq ? &bfqd->queue_weights_tree :
++				      &bfqd->group_weights_tree;
++			bfq_weights_tree_remove(bfqd, entity, root);
++		}
++		entity->weight = new_weight;
++		/*
++		 * Add the entity to its weights tree only if it is
++		 * not associated with a weight-raised queue.
++		 */
++		if (prev_weight != new_weight &&
++		    (bfqq ? bfqq->wr_coeff == 1 : 1))
++			/* If we get here, root has been initialized. */
++			bfq_weights_tree_add(bfqd, entity, root);
++
++		new_st->wsum += entity->weight;
++
++		if (new_st != old_st)
++			entity->start = new_st->vtime;
++	}
++
++	return new_st;
++}
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++static void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg);
++#endif
++
++/**
++ * bfq_bfqq_served - update the scheduler status after selection for
++ *                   service.
++ * @bfqq: the queue being served.
++ * @served: bytes to transfer.
++ *
++ * NOTE: this can be optimized, as the timestamps of upper level entities
++ * are synchronized every time a new bfqq is selected for service.  For now,
++ * we keep it to better check consistency.
++ */
++static void bfq_bfqq_served(struct bfq_queue *bfqq, int served)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++	struct bfq_service_tree *st;
++
++	for_each_entity(entity) {
++		st = bfq_entity_service_tree(entity);
++
++		entity->service += served;
++		BUG_ON(entity->service > entity->budget);
++		BUG_ON(st->wsum == 0);
++
++		st->vtime += bfq_delta(served, st->wsum);
++		bfq_forget_idle(st);
++	}
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	bfqg_stats_set_start_empty_time(bfqq_group(bfqq));
++#endif
++	bfq_log_bfqq(bfqq->bfqd, bfqq, "bfqq_served %d secs", served);
++}
++
++/**
++ * bfq_bfqq_charge_full_budget - set the service to the entity budget.
++ * @bfqq: the queue that needs a service update.
++ *
++ * When it's not possible to be fair in the service domain, because
++ * a queue is not consuming its budget fast enough (the meaning of
++ * fast depends on the timeout parameter), we charge it a full
++ * budget.  In this way we should obtain a sort of time-domain
++ * fairness among all the seeky/slow queues.
++ */
++static void bfq_bfqq_charge_full_budget(struct bfq_queue *bfqq)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++
++	bfq_log_bfqq(bfqq->bfqd, bfqq, "charge_full_budget");
++
++	bfq_bfqq_served(bfqq, entity->budget - entity->service);
++}
++
++/**
++ * __bfq_activate_entity - activate an entity.
++ * @entity: the entity being activated.
++ *
++ * Called whenever an entity is activated, i.e., it is not active and one
++ * of its children receives a new request, or has to be reactivated due to
++ * budget exhaustion.  It uses the current budget of the entity (and the
++ * service received if @entity is active) of the queue to calculate its
++ * timestamps.
++ */
++static void __bfq_activate_entity(struct bfq_entity *entity)
++{
++	struct bfq_sched_data *sd = entity->sched_data;
++	struct bfq_service_tree *st = bfq_entity_service_tree(entity);
++
++	if (entity == sd->in_service_entity) {
++		BUG_ON(entity->tree);
++		/*
++		 * If we are requeueing the current entity we have
++		 * to take care not to charge it for service it has
++		 * not received.
++		 */
++		bfq_calc_finish(entity, entity->service);
++		entity->start = entity->finish;
++		sd->in_service_entity = NULL;
++	} else if (entity->tree == &st->active) {
++		/*
++		 * Requeueing an entity due to a change of some
++		 * next_in_service entity below it.  We reuse the
++		 * old start time.
++		 */
++		bfq_active_extract(st, entity);
++	} else if (entity->tree == &st->idle) {
++		/*
++		 * Must be on the idle tree, bfq_idle_extract() will
++		 * check for that.
++		 */
++		bfq_idle_extract(st, entity);
++		entity->start = bfq_gt(st->vtime, entity->finish) ?
++				       st->vtime : entity->finish;
++	} else {
++		/*
++		 * The finish time of the entity may be invalid, and
++		 * it is in the past for sure, otherwise the queue
++		 * would have been on the idle tree.
++		 */
++		entity->start = st->vtime;
++		st->wsum += entity->weight;
++		bfq_get_entity(entity);
++
++		BUG_ON(entity->on_st);
++		entity->on_st = 1;
++	}
++
++	st = __bfq_entity_update_weight_prio(st, entity);
++	bfq_calc_finish(entity, entity->budget);
++	bfq_active_insert(st, entity);
++}
++
++/**
++ * bfq_activate_entity - activate an entity and its ancestors if necessary.
++ * @entity: the entity to activate.
++ *
++ * Activate @entity and all the entities on the path from it to the root.
++ */
++static void bfq_activate_entity(struct bfq_entity *entity)
++{
++	struct bfq_sched_data *sd;
++
++	for_each_entity(entity) {
++		__bfq_activate_entity(entity);
++
++		sd = entity->sched_data;
++		if (!bfq_update_next_in_service(sd))
++			/*
++			 * No need to propagate the activation to the
++			 * upper entities, as they will be updated when
++			 * the in-service entity is rescheduled.
++			 */
++			break;
++	}
++}
++
++/**
++ * __bfq_deactivate_entity - deactivate an entity from its service tree.
++ * @entity: the entity to deactivate.
++ * @requeue: if false, the entity will not be put into the idle tree.
++ *
++ * Deactivate an entity, independently from its previous state.  If the
++ * entity was not on a service tree just return, otherwise if it is on
++ * any scheduler tree, extract it from that tree, and if necessary
++ * and if the caller did specify @requeue, put it on the idle tree.
++ *
++ * Return %1 if the caller should update the entity hierarchy, i.e.,
++ * if the entity was in service or if it was the next_in_service for
++ * its sched_data; return %0 otherwise.
++ */
++static int __bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
++{
++	struct bfq_sched_data *sd = entity->sched_data;
++	struct bfq_service_tree *st = bfq_entity_service_tree(entity);
++	int was_in_service = entity == sd->in_service_entity;
++	int ret = 0;
++
++	if (!entity->on_st)
++		return 0;
++
++	BUG_ON(was_in_service && entity->tree);
++
++	if (was_in_service) {
++		bfq_calc_finish(entity, entity->service);
++		sd->in_service_entity = NULL;
++	} else if (entity->tree == &st->active)
++		bfq_active_extract(st, entity);
++	else if (entity->tree == &st->idle)
++		bfq_idle_extract(st, entity);
++	else if (entity->tree)
++		BUG();
++
++	if (was_in_service || sd->next_in_service == entity)
++		ret = bfq_update_next_in_service(sd);
++
++	if (!requeue || !bfq_gt(entity->finish, st->vtime))
++		bfq_forget_entity(st, entity);
++	else
++		bfq_idle_insert(st, entity);
++
++	BUG_ON(sd->in_service_entity == entity);
++	BUG_ON(sd->next_in_service == entity);
++
++	return ret;
++}
++
++/**
++ * bfq_deactivate_entity - deactivate an entity.
++ * @entity: the entity to deactivate.
++ * @requeue: true if the entity can be put on the idle tree
++ */
++static void bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
++{
++	struct bfq_sched_data *sd;
++	struct bfq_entity *parent;
++
++	for_each_entity_safe(entity, parent) {
++		sd = entity->sched_data;
++
++		if (!__bfq_deactivate_entity(entity, requeue))
++			/*
++			 * The parent entity is still backlogged, and
++			 * we don't need to update it as it is still
++			 * in service.
++			 */
++			break;
++
++		if (sd->next_in_service)
++			/*
++			 * The parent entity is still backlogged and
++			 * the budgets on the path towards the root
++			 * need to be updated.
++			 */
++			goto update;
++
++		/*
++		 * If we reach here the parent is no longer backlogged and
++		 * we want to propagate the dequeue upwards.
++		 */
++		requeue = 1;
++	}
++
++	return;
++
++update:
++	entity = parent;
++	for_each_entity(entity) {
++		__bfq_activate_entity(entity);
++
++		sd = entity->sched_data;
++		if (!bfq_update_next_in_service(sd))
++			break;
++	}
++}
++
++/**
++ * bfq_update_vtime - update vtime if necessary.
++ * @st: the service tree to act upon.
++ *
++ * If necessary update the service tree vtime to have at least one
++ * eligible entity, skipping to its start time.  Assumes that the
++ * active tree of the device is not empty.
++ *
++ * NOTE: this hierarchical implementation updates vtimes quite often,
++ * we may end up with reactivated processes getting timestamps after a
++ * vtime skip done because we needed a ->first_active entity on some
++ * intermediate node.
++ */
++static void bfq_update_vtime(struct bfq_service_tree *st)
++{
++	struct bfq_entity *entry;
++	struct rb_node *node = st->active.rb_node;
++
++	entry = rb_entry(node, struct bfq_entity, rb_node);
++	if (bfq_gt(entry->min_start, st->vtime)) {
++		st->vtime = entry->min_start;
++		bfq_forget_idle(st);
++	}
++}
++
++/**
++ * bfq_first_active_entity - find the eligible entity with
++ *                           the smallest finish time
++ * @st: the service tree to select from.
++ *
++ * This function searches the first schedulable entity, starting from the
++ * root of the tree and going on the left every time on this side there is
++ * a subtree with at least one eligible (start <= vtime) entity. The path on
++ * the right is followed only if a) the left subtree contains no eligible
++ * entities and b) no eligible entity has been found yet.
++ */
++static struct bfq_entity *bfq_first_active_entity(struct bfq_service_tree *st)
++{
++	struct bfq_entity *entry, *first = NULL;
++	struct rb_node *node = st->active.rb_node;
++
++	while (node) {
++		entry = rb_entry(node, struct bfq_entity, rb_node);
++left:
++		if (!bfq_gt(entry->start, st->vtime))
++			first = entry;
++
++		BUG_ON(bfq_gt(entry->min_start, st->vtime));
++
++		if (node->rb_left) {
++			entry = rb_entry(node->rb_left,
++					 struct bfq_entity, rb_node);
++			if (!bfq_gt(entry->min_start, st->vtime)) {
++				node = node->rb_left;
++				goto left;
++			}
++		}
++		if (first)
++			break;
++		node = node->rb_right;
++	}
++
++	BUG_ON(!first && !RB_EMPTY_ROOT(&st->active));
++	return first;
++}
++
++/**
++ * __bfq_lookup_next_entity - return the first eligible entity in @st.
++ * @st: the service tree.
++ *
++ * Update the virtual time in @st and return the first eligible entity
++ * it contains.
++ */
++static struct bfq_entity *__bfq_lookup_next_entity(struct bfq_service_tree *st,
++						   bool force)
++{
++	struct bfq_entity *entity, *new_next_in_service = NULL;
++
++	if (RB_EMPTY_ROOT(&st->active))
++		return NULL;
++
++	bfq_update_vtime(st);
++	entity = bfq_first_active_entity(st);
++	BUG_ON(bfq_gt(entity->start, st->vtime));
++
++	/*
++	 * If the chosen entity does not match with the sched_data's
++	 * next_in_service and we are forcibly serving the IDLE priority
++	 * class tree, bubble up budget update.
++	 */
++	if (unlikely(force && entity != entity->sched_data->next_in_service)) {
++		new_next_in_service = entity;
++		for_each_entity(new_next_in_service)
++			bfq_update_budget(new_next_in_service);
++	}
++
++	return entity;
++}
++
++/**
++ * bfq_lookup_next_entity - return the first eligible entity in @sd.
++ * @sd: the sched_data.
++ * @extract: if true the returned entity will be also extracted from @sd.
++ *
++ * NOTE: since we cache the next_in_service entity at each level of the
++ * hierarchy, the complexity of the lookup can be decreased with
++ * absolutely no effort just returning the cached next_in_service value;
++ * we prefer to do full lookups to test the consistency of the data
++ * structures.
++ */
++static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
++						 int extract,
++						 struct bfq_data *bfqd)
++{
++	struct bfq_service_tree *st = sd->service_tree;
++	struct bfq_entity *entity;
++	int i = 0;
++
++	BUG_ON(sd->in_service_entity);
++
++	if (bfqd &&
++	    jiffies - bfqd->bfq_class_idle_last_service > BFQ_CL_IDLE_TIMEOUT) {
++		entity = __bfq_lookup_next_entity(st + BFQ_IOPRIO_CLASSES - 1,
++						  true);
++		if (entity) {
++			i = BFQ_IOPRIO_CLASSES - 1;
++			bfqd->bfq_class_idle_last_service = jiffies;
++			sd->next_in_service = entity;
++		}
++	}
++	for (; i < BFQ_IOPRIO_CLASSES; i++) {
++		entity = __bfq_lookup_next_entity(st + i, false);
++		if (entity) {
++			if (extract) {
++				bfq_check_next_in_service(sd, entity);
++				bfq_active_extract(st + i, entity);
++				sd->in_service_entity = entity;
++				sd->next_in_service = NULL;
++			}
++			break;
++		}
++	}
++
++	return entity;
++}
++
++/*
++ * Get next queue for service.
++ */
++static struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
++{
++	struct bfq_entity *entity = NULL;
++	struct bfq_sched_data *sd;
++	struct bfq_queue *bfqq;
++
++	BUG_ON(bfqd->in_service_queue);
++
++	if (bfqd->busy_queues == 0)
++		return NULL;
++
++	sd = &bfqd->root_group->sched_data;
++	for (; sd ; sd = entity->my_sched_data) {
++		entity = bfq_lookup_next_entity(sd, 1, bfqd);
++		BUG_ON(!entity);
++		entity->service = 0;
++	}
++
++	bfqq = bfq_entity_to_bfqq(entity);
++	BUG_ON(!bfqq);
++
++	return bfqq;
++}
++
++static void __bfq_bfqd_reset_in_service(struct bfq_data *bfqd)
++{
++	if (bfqd->in_service_bic) {
++		put_io_context(bfqd->in_service_bic->icq.ioc);
++		bfqd->in_service_bic = NULL;
++	}
++
++	bfqd->in_service_queue = NULL;
++	del_timer(&bfqd->idle_slice_timer);
++}
++
++static void bfq_deactivate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++				int requeue)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++
++	if (bfqq == bfqd->in_service_queue)
++		__bfq_bfqd_reset_in_service(bfqd);
++
++	bfq_deactivate_entity(entity, requeue);
++}
++
++static void bfq_activate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++
++	bfq_activate_entity(entity);
++}
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++static void bfqg_stats_update_dequeue(struct bfq_group *bfqg);
++#endif
++
++/*
++ * Called when the bfqq no longer has requests pending, remove it from
++ * the service tree.
++ */
++static void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++			      int requeue)
++{
++	BUG_ON(!bfq_bfqq_busy(bfqq));
++	BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
++
++	bfq_log_bfqq(bfqd, bfqq, "del from busy");
++
++	bfq_clear_bfqq_busy(bfqq);
++
++	BUG_ON(bfqd->busy_queues == 0);
++	bfqd->busy_queues--;
++
++	if (!bfqq->dispatched) {
++		bfq_weights_tree_remove(bfqd, &bfqq->entity,
++					&bfqd->queue_weights_tree);
++		if (!blk_queue_nonrot(bfqd->queue)) {
++			BUG_ON(!bfqd->busy_in_flight_queues);
++			bfqd->busy_in_flight_queues--;
++			if (bfq_bfqq_constantly_seeky(bfqq)) {
++				BUG_ON(!bfqd->
++					const_seeky_busy_in_flight_queues);
++				bfqd->const_seeky_busy_in_flight_queues--;
++			}
++		}
++	}
++	if (bfqq->wr_coeff > 1)
++		bfqd->wr_busy_queues--;
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	bfqg_stats_update_dequeue(bfqq_group(bfqq));
++#endif
++
++	bfq_deactivate_bfqq(bfqd, bfqq, requeue);
++}
++
++/*
++ * Called when an inactive queue receives a new request.
++ */
++static void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	BUG_ON(bfq_bfqq_busy(bfqq));
++	BUG_ON(bfqq == bfqd->in_service_queue);
++
++	bfq_log_bfqq(bfqd, bfqq, "add to busy");
++
++	bfq_activate_bfqq(bfqd, bfqq);
++
++	bfq_mark_bfqq_busy(bfqq);
++	bfqd->busy_queues++;
++
++	if (!bfqq->dispatched) {
++		if (bfqq->wr_coeff == 1)
++			bfq_weights_tree_add(bfqd, &bfqq->entity,
++					     &bfqd->queue_weights_tree);
++		if (!blk_queue_nonrot(bfqd->queue)) {
++			bfqd->busy_in_flight_queues++;
++			if (bfq_bfqq_constantly_seeky(bfqq))
++				bfqd->const_seeky_busy_in_flight_queues++;
++		}
++	}
++	if (bfqq->wr_coeff > 1)
++		bfqd->wr_busy_queues++;
++}
+diff --git a/block/bfq.h b/block/bfq.h
+new file mode 100644
+index 0000000..ca5ac20
+--- /dev/null
++++ b/block/bfq.h
+@@ -0,0 +1,807 @@
++/*
++ * BFQ-v7r9 for 4.2.0: data structures and common functions prototypes.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ *		      Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ */
++
++#ifndef _BFQ_H
++#define _BFQ_H
++
++#include <linux/blktrace_api.h>
++#include <linux/hrtimer.h>
++#include <linux/ioprio.h>
++#include <linux/rbtree.h>
++#include <linux/blk-cgroup.h>
++
++#define BFQ_IOPRIO_CLASSES	3
++#define BFQ_CL_IDLE_TIMEOUT	(HZ/5)
++
++#define BFQ_MIN_WEIGHT			1
++#define BFQ_MAX_WEIGHT			1000
++#define BFQ_WEIGHT_CONVERSION_COEFF	10
++
++#define BFQ_DEFAULT_QUEUE_IOPRIO	4
++
++#define BFQ_DEFAULT_GRP_WEIGHT	10
++#define BFQ_DEFAULT_GRP_IOPRIO	0
++#define BFQ_DEFAULT_GRP_CLASS	IOPRIO_CLASS_BE
++
++struct bfq_entity;
++
++/**
++ * struct bfq_service_tree - per ioprio_class service tree.
++ * @active: tree for active entities (i.e., those backlogged).
++ * @idle: tree for idle entities (i.e., those not backlogged, with V <= F_i).
++ * @first_idle: idle entity with minimum F_i.
++ * @last_idle: idle entity with maximum F_i.
++ * @vtime: scheduler virtual time.
++ * @wsum: scheduler weight sum; active and idle entities contribute to it.
++ *
++ * Each service tree represents a B-WF2Q+ scheduler on its own.  Each
++ * ioprio_class has its own independent scheduler, and so its own
++ * bfq_service_tree.  All the fields are protected by the queue lock
++ * of the containing bfqd.
++ */
++struct bfq_service_tree {
++	struct rb_root active;
++	struct rb_root idle;
++
++	struct bfq_entity *first_idle;
++	struct bfq_entity *last_idle;
++
++	u64 vtime;
++	unsigned long wsum;
++};
++
++/**
++ * struct bfq_sched_data - multi-class scheduler.
++ * @in_service_entity: entity in service.
++ * @next_in_service: head-of-the-line entity in the scheduler.
++ * @service_tree: array of service trees, one per ioprio_class.
++ *
++ * bfq_sched_data is the basic scheduler queue.  It supports three
++ * ioprio_classes, and can be used either as a toplevel queue or as
++ * an intermediate queue on a hierarchical setup.
++ * @next_in_service points to the active entity of the sched_data
++ * service trees that will be scheduled next.
++ *
++ * The supported ioprio_classes are the same as in CFQ, in descending
++ * priority order, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE.
++ * Requests from higher priority queues are served before all the
++ * requests from lower priority queues; among requests of the same
++ * queue, requests are served according to B-WF2Q+.
++ * All the fields are protected by the queue lock of the containing bfqd.
++ */
++struct bfq_sched_data {
++	struct bfq_entity *in_service_entity;
++	struct bfq_entity *next_in_service;
++	struct bfq_service_tree service_tree[BFQ_IOPRIO_CLASSES];
++};
++
++/**
++ * struct bfq_weight_counter - counter of the number of all active entities
++ *                             with a given weight.
++ * @weight: weight of the entities that this counter refers to.
++ * @num_active: number of active entities with this weight.
++ * @weights_node: weights tree member (see bfq_data's @queue_weights_tree
++ *                and @group_weights_tree).
++ */
++struct bfq_weight_counter {
++	short int weight;
++	unsigned int num_active;
++	struct rb_node weights_node;
++};
++
++/**
++ * struct bfq_entity - schedulable entity.
++ * @rb_node: service_tree member.
++ * @weight_counter: pointer to the weight counter associated with this entity.
++ * @on_st: flag, true if the entity is on a tree (either the active or
++ *         the idle one of its service_tree).
++ * @finish: B-WF2Q+ finish timestamp (aka F_i).
++ * @start: B-WF2Q+ start timestamp (aka S_i).
++ * @tree: tree the entity is enqueued into; %NULL if not on a tree.
++ * @min_start: minimum start time of the (active) subtree rooted at
++ *             this entity; used for O(log N) lookups into active trees.
++ * @service: service received during the last round of service.
++ * @budget: budget used to calculate F_i; F_i = S_i + @budget / @weight.
++ * @weight: weight of the queue
++ * @parent: parent entity, for hierarchical scheduling.
++ * @my_sched_data: for non-leaf nodes in the cgroup hierarchy, the
++ *                 associated scheduler queue, %NULL on leaf nodes.
++ * @sched_data: the scheduler queue this entity belongs to.
++ * @ioprio: the ioprio in use.
++ * @new_weight: when a weight change is requested, the new weight value.
++ * @orig_weight: original weight, used to implement weight boosting
++ * @prio_changed: flag, true when the user requested a weight, ioprio or
++ *		  ioprio_class change.
++ *
++ * A bfq_entity is used to represent either a bfq_queue (leaf node in the
++ * cgroup hierarchy) or a bfq_group into the upper level scheduler.  Each
++ * entity belongs to the sched_data of the parent group in the cgroup
++ * hierarchy.  Non-leaf entities also have their own sched_data, stored
++ * in @my_sched_data.
++ *
++ * Each entity stores independently its priority values; this would
++ * allow different weights on different devices, but this
++ * functionality is not exported to userspace for now.  Priorities and
++ * weights are updated lazily, first storing the new values into the
++ * new_* fields, then setting the @prio_changed flag.  As soon as
++ * there is a transition in the entity state that allows the priority
++ * update to take place the effective and the requested priority
++ * values are synchronized.
++ *
++ * Unless cgroups are used, the weight value is calculated from the
++ * ioprio to export the same interface as CFQ.  When dealing with
++ * ``well-behaved'' queues (i.e., queues that do not spend too much
++ * time to consume their budget and have true sequential behavior, and
++ * when there are no external factors breaking anticipation) the
++ * relative weights at each level of the cgroups hierarchy should be
++ * guaranteed.  All the fields are protected by the queue lock of the
++ * containing bfqd.
++ */
++struct bfq_entity {
++	struct rb_node rb_node;
++	struct bfq_weight_counter *weight_counter;
++
++	int on_st;
++
++	u64 finish;
++	u64 start;
++
++	struct rb_root *tree;
++
++	u64 min_start;
++
++	int service, budget;
++	unsigned short weight, new_weight;
++	unsigned short orig_weight;
++
++	struct bfq_entity *parent;
++
++	struct bfq_sched_data *my_sched_data;
++	struct bfq_sched_data *sched_data;
++
++	int prio_changed;
++};
++
++struct bfq_group;
++
++/**
++ * struct bfq_queue - leaf schedulable entity.
++ * @ref: reference counter.
++ * @bfqd: parent bfq_data.
++ * @new_ioprio: when an ioprio change is requested, the new ioprio value.
++ * @ioprio_class: the ioprio_class in use.
++ * @new_ioprio_class: when an ioprio_class change is requested, the new
++ *                    ioprio_class value.
++ * @new_bfqq: shared bfq_queue if queue is cooperating with
++ *           one or more other queues.
++ * @sort_list: sorted list of pending requests.
++ * @next_rq: if fifo isn't expired, next request to serve.
++ * @queued: nr of requests queued in @sort_list.
++ * @allocated: currently allocated requests.
++ * @meta_pending: pending metadata requests.
++ * @fifo: fifo list of requests in sort_list.
++ * @entity: entity representing this queue in the scheduler.
++ * @max_budget: maximum budget allowed from the feedback mechanism.
++ * @budget_timeout: budget expiration (in jiffies).
++ * @dispatched: number of requests on the dispatch list or inside driver.
++ * @flags: status flags.
++ * @bfqq_list: node for active/idle bfqq list inside our bfqd.
++ * @burst_list_node: node for the device's burst list.
++ * @seek_samples: number of seeks sampled
++ * @seek_total: sum of the distances of the seeks sampled
++ * @seek_mean: mean seek distance
++ * @last_request_pos: position of the last request enqueued
++ * @requests_within_timer: number of consecutive pairs of request completion
++ *                         and arrival, such that the queue becomes idle
++ *                         after the completion, but the next request arrives
++ *                         within an idle time slice; used only if the queue's
++ *                         IO_bound has been cleared.
++ * @pid: pid of the process owning the queue, used for logging purposes.
++ * @last_wr_start_finish: start time of the current weight-raising period if
++ *                        the @bfq_queue is being weight-raised, otherwise
++ *                        finish time of the last weight-raising period
++ * @wr_cur_max_time: current max raising time for this queue
++ * @soft_rt_next_start: minimum time instant such that, only if a new
++ *                      request is enqueued after this time instant in an
++ *                      idle @bfq_queue with no outstanding requests, then
++ *                      the task associated with the queue is deemed
++ *                      soft real-time (see the comments to the function
++ *                      bfq_bfqq_softrt_next_start())
++ * @last_idle_bklogged: time of the last transition of the @bfq_queue from
++ *                      idle to backlogged
++ * @service_from_backlogged: cumulative service received from the @bfq_queue
++ *                           since the last transition from idle to
++ *                           backlogged
++ * @bic: pointer to the bfq_io_cq owning the bfq_queue, set to %NULL if the
++ *	 queue is shared
++ *
++ * A bfq_queue is a leaf request queue; it can be associated with one or
++ * more io_contexts, if it is async or shared between cooperating
++ * processes. @cgroup holds a reference to the cgroup, to be sure that it
++ * does not disappear while a bfqq still references it (mostly to avoid
++ * races between request issuing and task migration followed by cgroup
++ * destruction).
++ * All the fields are protected by the queue lock of the containing bfqd.
++ */
++struct bfq_queue {
++	atomic_t ref;
++	struct bfq_data *bfqd;
++
++	unsigned short ioprio, new_ioprio;
++	unsigned short ioprio_class, new_ioprio_class;
++
++	/* fields for cooperating queues handling */
++	struct bfq_queue *new_bfqq;
++	struct rb_node pos_node;
++	struct rb_root *pos_root;
++
++	struct rb_root sort_list;
++	struct request *next_rq;
++	int queued[2];
++	int allocated[2];
++	int meta_pending;
++	struct list_head fifo;
++
++	struct bfq_entity entity;
++
++	int max_budget;
++	unsigned long budget_timeout;
++
++	int dispatched;
++
++	unsigned int flags;
++
++	struct list_head bfqq_list;
++
++	struct hlist_node burst_list_node;
++
++	unsigned int seek_samples;
++	u64 seek_total;
++	sector_t seek_mean;
++	sector_t last_request_pos;
++
++	unsigned int requests_within_timer;
++
++	pid_t pid;
++	struct bfq_io_cq *bic;
++
++	/* weight-raising fields */
++	unsigned long wr_cur_max_time;
++	unsigned long soft_rt_next_start;
++	unsigned long last_wr_start_finish;
++	unsigned int wr_coeff;
++	unsigned long last_idle_bklogged;
++	unsigned long service_from_backlogged;
++};
++
++/**
++ * struct bfq_ttime - per process thinktime stats.
++ * @ttime_total: total process thinktime
++ * @ttime_samples: number of thinktime samples
++ * @ttime_mean: average process thinktime
++ */
++struct bfq_ttime {
++	unsigned long last_end_request;
++
++	unsigned long ttime_total;
++	unsigned long ttime_samples;
++	unsigned long ttime_mean;
++};
++
++/**
++ * struct bfq_io_cq - per (request_queue, io_context) structure.
++ * @icq: associated io_cq structure
++ * @bfqq: array of two process queues, the sync and the async
++ * @ttime: associated @bfq_ttime struct
++ * @ioprio: per (request_queue, blkcg) ioprio.
++ * @blkcg_id: id of the blkcg the related io_cq belongs to.
++ */
++struct bfq_io_cq {
++	struct io_cq icq; /* must be the first member */
++	struct bfq_queue *bfqq[2];
++	struct bfq_ttime ttime;
++	int ioprio;
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	uint64_t blkcg_id; /* the current blkcg ID */
++#endif
++};
++
++enum bfq_device_speed {
++	BFQ_BFQD_FAST,
++	BFQ_BFQD_SLOW,
++};
++
++/**
++ * struct bfq_data - per device data structure.
++ * @queue: request queue for the managed device.
++ * @root_group: root bfq_group for the device.
++ * @active_numerous_groups: number of bfq_groups containing more than one
++ *                          active @bfq_entity.
++ * @queue_weights_tree: rbtree of weight counters of @bfq_queues, sorted by
++ *                      weight. Used to keep track of whether all @bfq_queues
++ *                      have the same weight. The tree contains one counter
++ *                      for each distinct weight associated to some active
++ *                      and not weight-raised @bfq_queue (see the comments to
++ *                      the functions bfq_weights_tree_[add|remove] for
++ *                      further details).
++ * @group_weights_tree: rbtree of non-queue @bfq_entity weight counters, sorted
++ *                      by weight. Used to keep track of whether all
++ *                      @bfq_groups have the same weight. The tree contains
++ *                      one counter for each distinct weight associated to
++ *                      some active @bfq_group (see the comments to the
++ *                      functions bfq_weights_tree_[add|remove] for further
++ *                      details).
++ * @busy_queues: number of bfq_queues containing requests (including the
++ *		 queue in service, even if it is idling).
++ * @busy_in_flight_queues: number of @bfq_queues containing pending or
++ *                         in-flight requests, plus the @bfq_queue in
++ *                         service, even if idle but waiting for the
++ *                         possible arrival of its next sync request. This
++ *                         field is updated only if the device is rotational,
++ *                         but used only if the device is also NCQ-capable.
++ *                         The reason why the field is updated also for non-
++ *                         NCQ-capable rotational devices is related to the
++ *                         fact that the value of @hw_tag may be set also
++ *                         later than when busy_in_flight_queues may need to
++ *                         be incremented for the first time(s). Taking also
++ *                         this possibility into account, to avoid unbalanced
++ *                         increments/decrements, would imply more overhead
++ *                         than just updating busy_in_flight_queues
++ *                         regardless of the value of @hw_tag.
++ * @const_seeky_busy_in_flight_queues: number of constantly-seeky @bfq_queues
++ *                                     (that is, seeky queues that expired
++ *                                     for budget timeout at least once)
++ *                                     containing pending or in-flight
++ *                                     requests, including the in-service
++ *                                     @bfq_queue if constantly seeky. This
++ *                                     field is updated only if the device
++ *                                     is rotational, but used only if the
++ *                                     device is also NCQ-capable (see the
++ *                                     comments to @busy_in_flight_queues).
++ * @wr_busy_queues: number of weight-raised busy @bfq_queues.
++ * @queued: number of queued requests.
++ * @rq_in_driver: number of requests dispatched and waiting for completion.
++ * @sync_flight: number of sync requests in the driver.
++ * @max_rq_in_driver: max number of reqs in driver in the last
++ *                    @hw_tag_samples completed requests.
++ * @hw_tag_samples: nr of samples used to calculate hw_tag.
++ * @hw_tag: flag set to one if the driver is showing a queueing behavior.
++ * @budgets_assigned: number of budgets assigned.
++ * @idle_slice_timer: timer set when idling for the next sequential request
++ *                    from the queue in service.
++ * @unplug_work: delayed work to restart dispatching on the request queue.
++ * @in_service_queue: bfq_queue in service.
++ * @in_service_bic: bfq_io_cq (bic) associated with the @in_service_queue.
++ * @last_position: on-disk position of the last served request.
++ * @last_budget_start: beginning of the last budget.
++ * @last_idling_start: beginning of the last idle slice.
++ * @peak_rate: peak transfer rate observed for a budget.
++ * @peak_rate_samples: number of samples used to calculate @peak_rate.
++ * @bfq_max_budget: maximum budget allotted to a bfq_queue before
++ *                  rescheduling.
++ * @group_list: list of all the bfq_groups active on the device.
++ * @active_list: list of all the bfq_queues active on the device.
++ * @idle_list: list of all the bfq_queues idle on the device.
++ * @bfq_fifo_expire: timeout for async/sync requests; when it expires
++ *                   requests are served in fifo order.
++ * @bfq_back_penalty: weight of backward seeks wrt forward ones.
++ * @bfq_back_max: maximum allowed backward seek.
++ * @bfq_slice_idle: maximum idling time.
++ * @bfq_user_max_budget: user-configured max budget value
++ *                       (0 for auto-tuning).
++ * @bfq_max_budget_async_rq: maximum budget (in nr of requests) allotted to
++ *                           async queues.
++ * @bfq_timeout: timeout for bfq_queues to consume their budget; used
++ *               to prevent seeky queues from imposing long latencies on
++ *               well-behaved ones (this also implies that seeky queues cannot
++ *               receive guarantees in the service domain; after a timeout
++ *               they are charged for the whole allocated budget, to try
++ *               to preserve a behavior reasonably fair among them, but
++ *               without service-domain guarantees).
++ * @bfq_coop_thresh: number of queue merges after which a @bfq_queue is
++ *                   no longer granted any weight-raising.
++ * @bfq_failed_cooperations: number of consecutive failed cooperation
++ *                           chances after which weight-raising is restored
++ *                           to a queue subject to more than bfq_coop_thresh
++ *                           queue merges.
++ * @bfq_requests_within_timer: number of consecutive requests that must be
++ *                             issued within the idle time slice to set
++ *                             again idling to a queue which was marked as
++ *                             non-I/O-bound (see the definition of the
++ *                             IO_bound flag for further details).
++ * @last_ins_in_burst: last time at which a queue entered the current
++ *                     burst of queues being activated shortly after
++ *                     each other; for more details about this and the
++ *                     following parameters related to a burst of
++ *                     activations, see the comments to the function
++ *                     @bfq_handle_burst.
++ * @bfq_burst_interval: reference time interval used to decide whether a
++ *                      queue has been activated shortly after
++ *                      @last_ins_in_burst.
++ * @burst_size: number of queues in the current burst of queue activations.
++ * @bfq_large_burst_thresh: maximum burst size above which the current
++ * 			    queue-activation burst is deemed as 'large'.
++ * @large_burst: true if a large queue-activation burst is in progress.
++ * @burst_list: head of the burst list (as for the above fields, more details
++ * 		in the comments to the function bfq_handle_burst).
++ * @low_latency: if set to true, low-latency heuristics are enabled.
++ * @bfq_wr_coeff: maximum factor by which the weight of a weight-raised
++ *                queue is multiplied.
++ * @bfq_wr_max_time: maximum duration of a weight-raising period (jiffies).
++ * @bfq_wr_rt_max_time: maximum duration for soft real-time processes.
++ * @bfq_wr_min_idle_time: minimum idle period after which weight-raising
++ *			  may be reactivated for a queue (in jiffies).
++ * @bfq_wr_min_inter_arr_async: minimum period between request arrivals
++ *				after which weight-raising may be
++ *				reactivated for an already busy queue
++ *				(in jiffies).
++ * @bfq_wr_max_softrt_rate: max service-rate for a soft real-time queue,
++ *			    in sectors per second.
++ * @RT_prod: cached value of the product R*T used for computing the maximum
++ *	     duration of the weight raising automatically.
++ * @device_speed: device-speed class for the low-latency heuristic.
++ * @oom_bfqq: fallback dummy bfqq for extreme OOM conditions.
++ *
++ * All the fields are protected by the @queue lock.
++ */
++struct bfq_data {
++	struct request_queue *queue;
++
++	struct bfq_group *root_group;
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++	int active_numerous_groups;
++#endif
++
++	struct rb_root queue_weights_tree;
++	struct rb_root group_weights_tree;
++
++	int busy_queues;
++	int busy_in_flight_queues;
++	int const_seeky_busy_in_flight_queues;
++	int wr_busy_queues;
++	int queued;
++	int rq_in_driver;
++	int sync_flight;
++
++	int max_rq_in_driver;
++	int hw_tag_samples;
++	int hw_tag;
++
++	int budgets_assigned;
++
++	struct timer_list idle_slice_timer;
++	struct work_struct unplug_work;
++
++	struct bfq_queue *in_service_queue;
++	struct bfq_io_cq *in_service_bic;
++
++	sector_t last_position;
++
++	ktime_t last_budget_start;
++	ktime_t last_idling_start;
++	int peak_rate_samples;
++	u64 peak_rate;
++	int bfq_max_budget;
++
++	struct hlist_head group_list;
++	struct list_head active_list;
++	struct list_head idle_list;
++
++	unsigned int bfq_fifo_expire[2];
++	unsigned int bfq_back_penalty;
++	unsigned int bfq_back_max;
++	unsigned int bfq_slice_idle;
++	u64 bfq_class_idle_last_service;
++
++	int bfq_user_max_budget;
++	int bfq_max_budget_async_rq;
++	unsigned int bfq_timeout[2];
++
++	unsigned int bfq_coop_thresh;
++	unsigned int bfq_failed_cooperations;
++	unsigned int bfq_requests_within_timer;
++
++	unsigned long last_ins_in_burst;
++	unsigned long bfq_burst_interval;
++	int burst_size;
++	unsigned long bfq_large_burst_thresh;
++	bool large_burst;
++	struct hlist_head burst_list;
++
++	bool low_latency;
++
++	/* parameters of the low_latency heuristics */
++	unsigned int bfq_wr_coeff;
++	unsigned int bfq_wr_max_time;
++	unsigned int bfq_wr_rt_max_time;
++	unsigned int bfq_wr_min_idle_time;
++	unsigned long bfq_wr_min_inter_arr_async;
++	unsigned int bfq_wr_max_softrt_rate;
++	u64 RT_prod;
++	enum bfq_device_speed device_speed;
++
++	struct bfq_queue oom_bfqq;
++};
++
++enum bfqq_state_flags {
++	BFQ_BFQQ_FLAG_busy = 0,		/* has requests or is in service */
++	BFQ_BFQQ_FLAG_wait_request,	/* waiting for a request */
++	BFQ_BFQQ_FLAG_must_alloc,	/* must be allowed rq alloc */
++	BFQ_BFQQ_FLAG_fifo_expire,	/* FIFO checked in this slice */
++	BFQ_BFQQ_FLAG_idle_window,	/* slice idling enabled */
++	BFQ_BFQQ_FLAG_sync,		/* synchronous queue */
++	BFQ_BFQQ_FLAG_budget_new,	/* no completion with this budget */
++	BFQ_BFQQ_FLAG_IO_bound,		/*
++					 * bfqq has timed-out at least once
++					 * having consumed at most 2/10 of
++					 * its budget
++					 */
++	BFQ_BFQQ_FLAG_in_large_burst,	/*
++					 * bfqq activated in a large burst,
++					 * see comments to bfq_handle_burst.
++					 */
++	BFQ_BFQQ_FLAG_constantly_seeky,	/*
++					 * bfqq has proved to be slow and
++					 * seeky until budget timeout
++					 */
++	BFQ_BFQQ_FLAG_softrt_update,	/*
++					 * may need softrt-next-start
++					 * update
++					 */
++};
++
++#define BFQ_BFQQ_FNS(name)						\
++static void bfq_mark_bfqq_##name(struct bfq_queue *bfqq)		\
++{									\
++	(bfqq)->flags |= (1 << BFQ_BFQQ_FLAG_##name);			\
++}									\
++static void bfq_clear_bfqq_##name(struct bfq_queue *bfqq)		\
++{									\
++	(bfqq)->flags &= ~(1 << BFQ_BFQQ_FLAG_##name);			\
++}									\
++static int bfq_bfqq_##name(const struct bfq_queue *bfqq)		\
++{									\
++	return ((bfqq)->flags & (1 << BFQ_BFQQ_FLAG_##name)) != 0;	\
++}
++
++BFQ_BFQQ_FNS(busy);
++BFQ_BFQQ_FNS(wait_request);
++BFQ_BFQQ_FNS(must_alloc);
++BFQ_BFQQ_FNS(fifo_expire);
++BFQ_BFQQ_FNS(idle_window);
++BFQ_BFQQ_FNS(sync);
++BFQ_BFQQ_FNS(budget_new);
++BFQ_BFQQ_FNS(IO_bound);
++BFQ_BFQQ_FNS(in_large_burst);
++BFQ_BFQQ_FNS(constantly_seeky);
++BFQ_BFQQ_FNS(softrt_update);
++#undef BFQ_BFQQ_FNS
++
++/* Logging facilities. */
++#define bfq_log_bfqq(bfqd, bfqq, fmt, args...) \
++	blk_add_trace_msg((bfqd)->queue, "bfq%d " fmt, (bfqq)->pid, ##args)
++
++#define bfq_log(bfqd, fmt, args...) \
++	blk_add_trace_msg((bfqd)->queue, "bfq " fmt, ##args)
++
++/* Expiration reasons. */
++enum bfqq_expiration {
++	BFQ_BFQQ_TOO_IDLE = 0,		/*
++					 * queue has been idling for
++					 * too long
++					 */
++	BFQ_BFQQ_BUDGET_TIMEOUT,	/* budget took too long to be used */
++	BFQ_BFQQ_BUDGET_EXHAUSTED,	/* budget consumed */
++	BFQ_BFQQ_NO_MORE_REQUESTS,	/* the queue has no more requests */
++};
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++
++struct bfqg_stats {
++	/* total bytes transferred */
++	struct blkg_rwstat		service_bytes;
++	/* total IOs serviced, post merge */
++	struct blkg_rwstat		serviced;
++	/* number of ios merged */
++	struct blkg_rwstat		merged;
++	/* total time spent on device in ns, may not be accurate w/ queueing */
++	struct blkg_rwstat		service_time;
++	/* total time spent waiting in scheduler queue in ns */
++	struct blkg_rwstat		wait_time;
++	/* number of IOs queued up */
++	struct blkg_rwstat		queued;
++	/* total sectors transferred */
++	struct blkg_stat		sectors;
++	/* total disk time and nr sectors dispatched by this group */
++	struct blkg_stat		time;
++	/* time not charged to this cgroup */
++	struct blkg_stat		unaccounted_time;
++	/* sum of number of ios queued across all samples */
++	struct blkg_stat		avg_queue_size_sum;
++	/* count of samples taken for average */
++	struct blkg_stat		avg_queue_size_samples;
++	/* how many times this group has been removed from service tree */
++	struct blkg_stat		dequeue;
++	/* total time spent waiting for it to be assigned a timeslice. */
++	struct blkg_stat		group_wait_time;
++	/* time spent idling for this blkcg_gq */
++	struct blkg_stat		idle_time;
++	/* total time with empty current active q with other requests queued */
++	struct blkg_stat		empty_time;
++	/* fields after this shouldn't be cleared on stat reset */
++	uint64_t			start_group_wait_time;
++	uint64_t			start_idle_time;
++	uint64_t			start_empty_time;
++	uint16_t			flags;
++};
++
++/*
++ * struct bfq_group_data - per-blkcg storage for the blkio subsystem.
++ *
++ * @pd: @blkcg_policy_data that this structure inherits
++ * @weight: weight of the bfq_group
++ */
++struct bfq_group_data {
++	/* must be the first member */
++	struct blkcg_policy_data pd;
++
++	unsigned short weight;
++};
++
++/**
++ * struct bfq_group - per (device, cgroup) data structure.
++ * @entity: schedulable entity to insert into the parent group sched_data.
++ * @sched_data: own sched_data, to contain child entities (they may be
++ *              both bfq_queues and bfq_groups).
++ * @bfqd_node: node to be inserted into the @bfqd->group_list list
++ *             of the groups active on the same device; used for cleanup.
++ * @bfqd: the bfq_data for the device this group acts upon.
++ * @async_bfqq: array of async queues for all the tasks belonging to
++ *              the group, one queue per ioprio value per ioprio_class,
++ *              except for the idle class that has only one queue.
++ * @async_idle_bfqq: async queue for the idle class (ioprio is ignored).
++ * @my_entity: pointer to @entity, %NULL for the toplevel group; used
++ *             to avoid too many special cases during group creation/
++ *             migration.
++ * @active_entities: number of active entities belonging to the group;
++ *                   unused for the root group. Used to know whether there
++ *                   are groups with more than one active @bfq_entity
++ *                   (see the comments to the function
++ *                   bfq_bfqq_must_not_expire()).
++ *
++ * Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
++ * there is a set of bfq_groups, each one collecting the lower-level
++ * entities belonging to the group that are acting on the same device.
++ *
++ * Locking works as follows:
++ *    o @bfqd is protected by the queue lock, RCU is used to access it
++ *      from the readers.
++ *    o All the other fields are protected by the @bfqd queue lock.
++ */
++struct bfq_group {
++	/* must be the first member */
++	struct blkg_policy_data pd;
++
++	struct bfq_entity entity;
++	struct bfq_sched_data sched_data;
++
++	struct hlist_node bfqd_node;
++
++	void *bfqd;
++
++	struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
++	struct bfq_queue *async_idle_bfqq;
++
++	struct bfq_entity *my_entity;
++
++	int active_entities;
++
++	struct bfqg_stats stats;
++	struct bfqg_stats dead_stats;	/* stats pushed from dead children */
++};
++
++#else
++struct bfq_group {
++	struct bfq_sched_data sched_data;
++
++	struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
++	struct bfq_queue *async_idle_bfqq;
++};
++#endif
++
++static struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity);
++
++static struct bfq_service_tree *
++bfq_entity_service_tree(struct bfq_entity *entity)
++{
++	struct bfq_sched_data *sched_data = entity->sched_data;
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++	unsigned int idx = bfqq ? bfqq->ioprio_class - 1 :
++				  BFQ_DEFAULT_GRP_CLASS;
++
++	BUG_ON(idx >= BFQ_IOPRIO_CLASSES);
++	BUG_ON(sched_data == NULL);
++
++	return sched_data->service_tree + idx;
++}
++
++static struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync)
++{
++	return bic->bfqq[is_sync];
++}
++
++static void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq,
++			 bool is_sync)
++{
++	bic->bfqq[is_sync] = bfqq;
++}
++
++static struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic)
++{
++	return bic->icq.q->elevator->elevator_data;
++}
++
++/**
++ * bfq_get_bfqd_locked - get a lock on a bfqd using an RCU-protected pointer.
++ * @ptr: a pointer to a bfqd.
++ * @flags: storage for the flags to be saved.
++ *
++ * This function allows bfqg->bfqd to be protected by the
++ * queue lock of the bfqd they reference; the pointer is dereferenced
++ * under RCU, so the storage for bfqd is assured to be safe as long
++ * as the RCU read side critical section does not end.  After the
++ * bfqd->queue->queue_lock is taken the pointer is rechecked, to be
++ * sure that no other writer accessed it.  If we raced with a writer,
++ * the function returns NULL, with the queue unlocked, otherwise it
++ * returns the dereferenced pointer, with the queue locked.
++ */
++static struct bfq_data *bfq_get_bfqd_locked(void **ptr, unsigned long *flags)
++{
++	struct bfq_data *bfqd;
++
++	rcu_read_lock();
++	bfqd = rcu_dereference(*(struct bfq_data **)ptr);
++
++	if (bfqd != NULL) {
++		spin_lock_irqsave(bfqd->queue->queue_lock, *flags);
++		if (ptr == NULL)
++			printk(KERN_CRIT "get_bfqd_locked pointer NULL\n");
++		else if (*ptr == bfqd)
++			goto out;
++		spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
++	}
++
++	bfqd = NULL;
++out:
++	rcu_read_unlock();
++	return bfqd;
++}
++
++static void bfq_put_bfqd_unlock(struct bfq_data *bfqd, unsigned long *flags)
++{
++	spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
++}
++
++static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio);
++static void bfq_put_queue(struct bfq_queue *bfqq);
++static void bfq_dispatch_insert(struct request_queue *q, struct request *rq);
++static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
++				       struct bio *bio, int is_sync,
++				       struct bfq_io_cq *bic, gfp_t gfp_mask);
++static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
++				    struct bfq_group *bfqg);
++static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
++static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
++
++#endif /* _BFQ_H */
+-- 
+2.1.4
+

diff --git a/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch b/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch
new file mode 100644
index 0000000..dac6db6
--- /dev/null
+++ b/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch
@@ -0,0 +1,1097 @@
+From 75c9c5ea340776c0a9e934581cf63cb963a33fd4 Mon Sep 17 00:00:00 2001
+From: Mauro Andreolini <mauro.andreolini@unimore.it>
+Date: Sun, 6 Sep 2015 16:09:05 +0200
+Subject: [PATCH 3/3] block, bfq: add Early Queue Merge (EQM) to BFQ-v7r9 for
+ 4.2.0
+
+A set of processes may happen to perform interleaved reads, i.e., requests
+whose union would give rise to a sequential read pattern. There are two
+typical cases: in the first case, processes read fixed-size chunks of
+data at a fixed distance from each other, while in the second case processes
+may read variable-size chunks at variable distances. The latter case occurs
+for example with QEMU, which splits the I/O generated by the guest into
+multiple chunks, and lets these chunks be served by a pool of cooperating
+processes, iteratively assigning the next chunk of I/O to the first
+available process. CFQ uses actual queue merging for the first type of
+processes, whereas it uses preemption to get a sequential read pattern out
+of the read requests performed by the second type of processes. In the end
+it uses two different mechanisms to achieve the same goal: boosting the
+throughput with interleaved I/O.
+
+This patch introduces Early Queue Merge (EQM), a unified mechanism to get a
+sequential read pattern with both types of processes. The main idea is
+checking newly arrived requests against the next request of the active queue
+both in case of actual request insert and in case of request merge. By doing
+so, both types of processes can be handled by just merging their queues.
+EQM is thus simpler and more compact than the pair of mechanisms used in
+CFQ.
+
+Finally, EQM also preserves the typical low-latency properties of BFQ, by
+properly restoring the weight-raising state of a queue when it gets back to
+a non-merged state.
+
+Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
+Signed-off-by: Arianna Avanzini <avanzini@google.com>
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
+---
+ block/bfq-cgroup.c  |   4 +
+ block/bfq-iosched.c | 684 ++++++++++++++++++++++++++++++++++++++++++++++++++--
+ block/bfq.h         |  66 +++++
+ 3 files changed, 740 insertions(+), 14 deletions(-)
+
+diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
+index c02d65a..bc34d7a 100644
+--- a/block/bfq-cgroup.c
++++ b/block/bfq-cgroup.c
+@@ -382,6 +382,7 @@ static void bfq_pd_init(struct blkcg_gq *blkg)
+ 				   */
+ 	bfqg->bfqd = bfqd;
+ 	bfqg->active_entities = 0;
++	bfqg->rq_pos_tree = RB_ROOT;
+ 
+ 	/* if the root_group does not exist, we are handling it right now */
+ 	if (bfqd->root_group && bfqg != bfqd->root_group)
+@@ -484,6 +485,8 @@ static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
+ 	return bfqg;
+ }
+ 
++static void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
++
+ /**
+  * bfq_bfqq_move - migrate @bfqq to @bfqg.
+  * @bfqd: queue descriptor.
+@@ -531,6 +534,7 @@ static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ 	bfqg_get(bfqg);
+ 
+ 	if (busy) {
++		bfq_pos_tree_add_move(bfqd, bfqq);
+ 		if (resume)
+ 			bfq_activate_bfqq(bfqd, bfqq);
+ 	}
+diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
+index 51d24dd..fcd6eea 100644
+--- a/block/bfq-iosched.c
++++ b/block/bfq-iosched.c
+@@ -296,6 +296,72 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd,
+ 	}
+ }
+ 
++static struct bfq_queue *
++bfq_rq_pos_tree_lookup(struct bfq_data *bfqd, struct rb_root *root,
++		     sector_t sector, struct rb_node **ret_parent,
++		     struct rb_node ***rb_link)
++{
++	struct rb_node **p, *parent;
++	struct bfq_queue *bfqq = NULL;
++
++	parent = NULL;
++	p = &root->rb_node;
++	while (*p) {
++		struct rb_node **n;
++
++		parent = *p;
++		bfqq = rb_entry(parent, struct bfq_queue, pos_node);
++
++		/*
++		 * Sort strictly based on sector. Smallest to the left,
++		 * largest to the right.
++		 */
++		if (sector > blk_rq_pos(bfqq->next_rq))
++			n = &(*p)->rb_right;
++		else if (sector < blk_rq_pos(bfqq->next_rq))
++			n = &(*p)->rb_left;
++		else
++			break;
++		p = n;
++		bfqq = NULL;
++	}
++
++	*ret_parent = parent;
++	if (rb_link)
++		*rb_link = p;
++
++	bfq_log(bfqd, "rq_pos_tree_lookup %llu: returning %d",
++		(long long unsigned)sector,
++		bfqq ? bfqq->pid : 0);
++
++	return bfqq;
++}
++
++static void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	struct rb_node **p, *parent;
++	struct bfq_queue *__bfqq;
++
++	if (bfqq->pos_root) {
++		rb_erase(&bfqq->pos_node, bfqq->pos_root);
++		bfqq->pos_root = NULL;
++	}
++
++	if (bfq_class_idle(bfqq))
++		return;
++	if (!bfqq->next_rq)
++		return;
++
++	bfqq->pos_root = &bfq_bfqq_to_bfqg(bfqq)->rq_pos_tree;
++	__bfqq = bfq_rq_pos_tree_lookup(bfqd, bfqq->pos_root,
++			blk_rq_pos(bfqq->next_rq), &parent, &p);
++	if (!__bfqq) {
++		rb_link_node(&bfqq->pos_node, parent, p);
++		rb_insert_color(&bfqq->pos_node, bfqq->pos_root);
++	} else
++		bfqq->pos_root = NULL;
++}
++
+ /*
+  * Tell whether there are active queues or groups with differentiated weights.
+  */
+@@ -528,6 +594,57 @@ static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
+ 	return dur;
+ }
+ 
++static unsigned bfq_bfqq_cooperations(struct bfq_queue *bfqq)
++{
++	return bfqq->bic ? bfqq->bic->cooperations : 0;
++}
++
++static void
++bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
++{
++	if (bic->saved_idle_window)
++		bfq_mark_bfqq_idle_window(bfqq);
++	else
++		bfq_clear_bfqq_idle_window(bfqq);
++	if (bic->saved_IO_bound)
++		bfq_mark_bfqq_IO_bound(bfqq);
++	else
++		bfq_clear_bfqq_IO_bound(bfqq);
++	/* Assuming that the flag in_large_burst is already correctly set */
++	if (bic->wr_time_left && bfqq->bfqd->low_latency &&
++	    !bfq_bfqq_in_large_burst(bfqq) &&
++	    bic->cooperations < bfqq->bfqd->bfq_coop_thresh) {
++		/*
++		 * Start a weight raising period with the duration given by
++		 * the wr_time_left snapshot.
++		 */
++		if (bfq_bfqq_busy(bfqq))
++			bfqq->bfqd->wr_busy_queues++;
++		bfqq->wr_coeff = bfqq->bfqd->bfq_wr_coeff;
++		bfqq->wr_cur_max_time = bic->wr_time_left;
++		bfqq->last_wr_start_finish = jiffies;
++		bfqq->entity.prio_changed = 1;
++	}
++	/*
++	 * Clear wr_time_left to prevent bfq_bfqq_save_state() from
++	 * getting confused about the queue's need of a weight-raising
++	 * period.
++	 */
++	bic->wr_time_left = 0;
++}
++
++static int bfqq_process_refs(struct bfq_queue *bfqq)
++{
++	int process_refs, io_refs;
++
++	lockdep_assert_held(bfqq->bfqd->queue->queue_lock);
++
++	io_refs = bfqq->allocated[READ] + bfqq->allocated[WRITE];
++	process_refs = atomic_read(&bfqq->ref) - io_refs - bfqq->entity.on_st;
++	BUG_ON(process_refs < 0);
++	return process_refs;
++}
++
+ /* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
+ static void bfq_reset_burst_list(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ {
+@@ -764,8 +881,14 @@ static void bfq_add_request(struct request *rq)
+ 	BUG_ON(!next_rq);
+ 	bfqq->next_rq = next_rq;
+ 
++	/*
++	 * Adjust the queue's position in the rq_pos_tree, if next_rq changes.
++	 */
++	if (prev != bfqq->next_rq)
++		bfq_pos_tree_add_move(bfqd, bfqq);
++
+ 	if (!bfq_bfqq_busy(bfqq)) {
+-		bool soft_rt, in_burst,
++		bool soft_rt, coop_or_in_burst,
+ 		     idle_for_long_time = time_is_before_jiffies(
+ 						bfqq->budget_timeout +
+ 						bfqd->bfq_wr_min_idle_time);
+@@ -793,11 +916,12 @@ static void bfq_add_request(struct request *rq)
+ 				bfqd->last_ins_in_burst = jiffies;
+ 		}
+ 
+-		in_burst = bfq_bfqq_in_large_burst(bfqq);
++		coop_or_in_burst = bfq_bfqq_in_large_burst(bfqq) ||
++			bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh;
+ 		soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
+-			!in_burst &&
++			!coop_or_in_burst &&
+ 			time_is_before_jiffies(bfqq->soft_rt_next_start);
+-		interactive = !in_burst && idle_for_long_time;
++		interactive = !coop_or_in_burst && idle_for_long_time;
+ 		entity->budget = max_t(unsigned long, bfqq->max_budget,
+ 				       bfq_serv_to_charge(next_rq, bfqq));
+ 
+@@ -816,6 +940,9 @@ static void bfq_add_request(struct request *rq)
+ 		if (!bfqd->low_latency)
+ 			goto add_bfqq_busy;
+ 
++		if (bfq_bfqq_just_split(bfqq))
++			goto set_prio_changed;
++
+ 		/*
+ 		 * If the queue:
+ 		 * - is not being boosted,
+@@ -840,7 +967,7 @@ static void bfq_add_request(struct request *rq)
+ 		} else if (old_wr_coeff > 1) {
+ 			if (interactive)
+ 				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
+-			else if (in_burst ||
++			else if (coop_or_in_burst ||
+ 				 (bfqq->wr_cur_max_time ==
+ 				  bfqd->bfq_wr_rt_max_time &&
+ 				  !soft_rt)) {
+@@ -905,6 +1032,7 @@ static void bfq_add_request(struct request *rq)
+ 					bfqd->bfq_wr_rt_max_time;
+ 			}
+ 		}
++set_prio_changed:
+ 		if (old_wr_coeff != bfqq->wr_coeff)
+ 			entity->prio_changed = 1;
+ add_bfqq_busy:
+@@ -1047,6 +1175,15 @@ static void bfq_merged_request(struct request_queue *q, struct request *req,
+ 					 bfqd->last_position);
+ 		BUG_ON(!next_rq);
+ 		bfqq->next_rq = next_rq;
++		/*
++		 * If next_rq changes, update both the queue's budget to
++		 * fit the new request and the queue's position in its
++		 * rq_pos_tree.
++		 */
++		if (prev != bfqq->next_rq) {
++			bfq_updated_next_req(bfqd, bfqq);
++			bfq_pos_tree_add_move(bfqd, bfqq);
++		}
+ 	}
+ }
+ 
+@@ -1129,11 +1266,343 @@ static void bfq_end_wr(struct bfq_data *bfqd)
+ 	spin_unlock_irq(bfqd->queue->queue_lock);
+ }
+ 
++static sector_t bfq_io_struct_pos(void *io_struct, bool request)
++{
++	if (request)
++		return blk_rq_pos(io_struct);
++	else
++		return ((struct bio *)io_struct)->bi_iter.bi_sector;
++}
++
++static int bfq_rq_close_to_sector(void *io_struct, bool request,
++				  sector_t sector)
++{
++	return abs64(bfq_io_struct_pos(io_struct, request) - sector) <=
++	       BFQQ_SEEK_THR;
++}
++
++static struct bfq_queue *bfqq_find_close(struct bfq_data *bfqd,
++					 struct bfq_queue *bfqq,
++					 sector_t sector)
++{
++	struct rb_root *root = &bfq_bfqq_to_bfqg(bfqq)->rq_pos_tree;
++	struct rb_node *parent, *node;
++	struct bfq_queue *__bfqq;
++
++	if (RB_EMPTY_ROOT(root))
++		return NULL;
++
++	/*
++	 * First, if we find a request starting at the end of the last
++	 * request, choose it.
++	 */
++	__bfqq = bfq_rq_pos_tree_lookup(bfqd, root, sector, &parent, NULL);
++	if (__bfqq)
++		return __bfqq;
++
++	/*
++	 * If the exact sector wasn't found, the parent of the NULL leaf
++	 * will contain the closest sector (rq_pos_tree sorted by
++	 * next_request position).
++	 */
++	__bfqq = rb_entry(parent, struct bfq_queue, pos_node);
++	if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
++		return __bfqq;
++
++	if (blk_rq_pos(__bfqq->next_rq) < sector)
++		node = rb_next(&__bfqq->pos_node);
++	else
++		node = rb_prev(&__bfqq->pos_node);
++	if (!node)
++		return NULL;
++
++	__bfqq = rb_entry(node, struct bfq_queue, pos_node);
++	if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
++		return __bfqq;
++
++	return NULL;
++}
++
++static struct bfq_queue *bfq_find_close_cooperator(struct bfq_data *bfqd,
++						   struct bfq_queue *cur_bfqq,
++						   sector_t sector)
++{
++	struct bfq_queue *bfqq;
++
++	/*
++	 * We should notice if some of the queues are cooperating, e.g.
++	 * working closely on the same area of the disk. In that case,
++	 * we can group them together and avoid wasting time idling.
++	 */
++	bfqq = bfqq_find_close(bfqd, cur_bfqq, sector);
++	if (!bfqq || bfqq == cur_bfqq)
++		return NULL;
++
++	return bfqq;
++}
++
++static struct bfq_queue *
++bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
++{
++	int process_refs, new_process_refs;
++	struct bfq_queue *__bfqq;
++
++	/*
++	 * If there are no process references on the new_bfqq, then it is
++	 * unsafe to follow the ->new_bfqq chain as other bfqq's in the chain
++	 * may have dropped their last reference (not just their last process
++	 * reference).
++	 */
++	if (!bfqq_process_refs(new_bfqq))
++		return NULL;
++
++	/* Avoid a circular list and skip interim queue merges. */
++	while ((__bfqq = new_bfqq->new_bfqq)) {
++		if (__bfqq == bfqq)
++			return NULL;
++		new_bfqq = __bfqq;
++	}
++
++	process_refs = bfqq_process_refs(bfqq);
++	new_process_refs = bfqq_process_refs(new_bfqq);
++	/*
++	 * If the process for the bfqq has gone away, there is no
++	 * sense in merging the queues.
++	 */
++	if (process_refs == 0 || new_process_refs == 0)
++		return NULL;
++
++	bfq_log_bfqq(bfqq->bfqd, bfqq, "scheduling merge with queue %d",
++		new_bfqq->pid);
++
++	/*
++	 * Merging is just a redirection: the requests of the process
++	 * owning one of the two queues are redirected to the other queue.
++	 * The latter queue, in its turn, is set as shared if this is the
++	 * first time that the requests of some process are redirected to
++	 * it.
++	 *
++	 * We redirect bfqq to new_bfqq and not the opposite, because we
++	 * are in the context of the process owning bfqq, hence we have
++	 * the io_cq of this process. So we can immediately configure this
++	 * io_cq to redirect the requests of the process to new_bfqq.
++	 *
++	 * NOTE, even if new_bfqq coincides with the in-service queue, the
++	 * io_cq of new_bfqq is not available, because, if the in-service
++	 * queue is shared, bfqd->in_service_bic may not point to the
++	 * io_cq of the in-service queue.
++	 * Redirecting the requests of the process owning bfqq to the
++	 * currently in-service queue is in any case the best option, as
++	 * we feed the in-service queue with new requests close to the
++	 * last request served and, by doing so, hopefully increase the
++	 * throughput.
++	 */
++	bfqq->new_bfqq = new_bfqq;
++	atomic_add(process_refs, &new_bfqq->ref);
++	return new_bfqq;
++}
++
++static bool bfq_may_be_close_cooperator(struct bfq_queue *bfqq,
++					struct bfq_queue *new_bfqq)
++{
++	if (WARN_ON(bfqq->entity.parent != new_bfqq->entity.parent))
++		return false;
++
++	if (bfq_class_idle(bfqq) || bfq_class_idle(new_bfqq) ||
++	    (bfqq->ioprio_class != new_bfqq->ioprio_class))
++		return false;
++
++	/*
++	 * If either of the queues has already been detected as seeky,
++	 * then merging it with the other queue is unlikely to lead to
++	 * sequential I/O.
++	 */
++	if (BFQQ_SEEKY(bfqq) || BFQQ_SEEKY(new_bfqq))
++		return false;
++
++	/*
++	 * Interleaved I/O is known to be done by (some) applications
++	 * only for reads, so it does not make sense to merge async
++	 * queues.
++	 */
++	if (!bfq_bfqq_sync(bfqq) || !bfq_bfqq_sync(new_bfqq))
++		return false;
++
++	return true;
++}
++
++/*
++ * Attempt to schedule a merge of bfqq with the currently in-service queue
++ * or with a close queue among the scheduled queues.
++ * Return NULL if no merge was scheduled, a pointer to the shared bfq_queue
++ * structure otherwise.
++ *
++ * The OOM queue is not allowed to participate in cooperation: since
++ * the requests temporarily redirected to the OOM queue could be redirected
++ * again to dedicated queues at any time, the state needed to correctly
++ * handle merging with the OOM queue would be quite complex and expensive
++ * to maintain. Besides, in a condition as critical as running out of
++ * memory, the benefits of queue merging are likely small or negligible.
++ */
++static struct bfq_queue *
++bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++		     void *io_struct, bool request)
++{
++	struct bfq_queue *in_service_bfqq, *new_bfqq;
++
++	if (bfqq->new_bfqq)
++		return bfqq->new_bfqq;
++	if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq))
++		return NULL;
++	/* If device has only one backlogged bfq_queue, don't search. */
++	if (bfqd->busy_queues == 1)
++		return NULL;
++
++	in_service_bfqq = bfqd->in_service_queue;
++
++	if (!in_service_bfqq || in_service_bfqq == bfqq ||
++	    !bfqd->in_service_bic ||
++	    unlikely(in_service_bfqq == &bfqd->oom_bfqq))
++		goto check_scheduled;
++
++	if (bfq_rq_close_to_sector(io_struct, request, bfqd->last_position) &&
++	    bfq_may_be_close_cooperator(bfqq, in_service_bfqq)) {
++		new_bfqq = bfq_setup_merge(bfqq, in_service_bfqq);
++		if (new_bfqq)
++			return new_bfqq;
++	}
++	/*
++	 * Check whether there is a cooperator among currently scheduled
++	 * queues. The only thing we need is that the bio/request is not
++	 * NULL, as we need it to establish whether a cooperator exists.
++	 */
++check_scheduled:
++	new_bfqq = bfq_find_close_cooperator(bfqd, bfqq,
++			bfq_io_struct_pos(io_struct, request));
++	if (new_bfqq && likely(new_bfqq != &bfqd->oom_bfqq) &&
++	    bfq_may_be_close_cooperator(bfqq, new_bfqq))
++		return bfq_setup_merge(bfqq, new_bfqq);
++
++	return NULL;
++}
++
++static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
++{
++	/*
++	 * If !bfqq->bic, the queue is already shared or its requests
++	 * have already been redirected to a shared queue; both idle window
++	 * and weight raising state have already been saved. Do nothing.
++	 */
++	if (!bfqq->bic)
++		return;
++	if (bfqq->bic->wr_time_left)
++		/*
++		 * This is the queue of a just-started process, and would
++		 * deserve weight raising: we set wr_time_left to the full
++		 * weight-raising duration to trigger weight-raising when
++		 * and if the queue is split and the first request of the
++		 * queue is enqueued.
++		 */
++		bfqq->bic->wr_time_left = bfq_wr_duration(bfqq->bfqd);
++	else if (bfqq->wr_coeff > 1) {
++		unsigned long wr_duration =
++			jiffies - bfqq->last_wr_start_finish;
++		/*
++		 * It may happen that a queue's weight raising period lasts
++		 * longer than its wr_cur_max_time, as weight raising is
++		 * handled only when a request is enqueued or dispatched (it
++		 * does not use any timer). If the weight raising period is
++		 * about to end, don't save it.
++		 */
++		if (bfqq->wr_cur_max_time <= wr_duration)
++			bfqq->bic->wr_time_left = 0;
++		else
++			bfqq->bic->wr_time_left =
++				bfqq->wr_cur_max_time - wr_duration;
++		/*
++		 * The bfq_queue is becoming shared or the requests of the
++		 * process owning the queue are being redirected to a shared
++		 * queue. Stop the weight raising period of the queue, as in
++		 * both cases it should not be owned by an interactive or
++		 * soft real-time application.
++		 */
++		bfq_bfqq_end_wr(bfqq);
++	} else
++		bfqq->bic->wr_time_left = 0;
++	bfqq->bic->saved_idle_window = bfq_bfqq_idle_window(bfqq);
++	bfqq->bic->saved_IO_bound = bfq_bfqq_IO_bound(bfqq);
++	bfqq->bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
++	bfqq->bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node);
++	bfqq->bic->cooperations++;
++	bfqq->bic->failed_cooperations = 0;
++}
++
++static void bfq_get_bic_reference(struct bfq_queue *bfqq)
++{
++	/*
++	 * If bfqq->bic has a non-NULL value, the bic to which it belongs
++	 * is about to begin using a shared bfq_queue.
++	 */
++	if (bfqq->bic)
++		atomic_long_inc(&bfqq->bic->icq.ioc->refcount);
++}
++
++static void
++bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
++		struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
++{
++	bfq_log_bfqq(bfqd, bfqq, "merging with queue %lu",
++		(long unsigned)new_bfqq->pid);
++	/* Save weight raising and idle window of the merged queues */
++	bfq_bfqq_save_state(bfqq);
++	bfq_bfqq_save_state(new_bfqq);
++	if (bfq_bfqq_IO_bound(bfqq))
++		bfq_mark_bfqq_IO_bound(new_bfqq);
++	bfq_clear_bfqq_IO_bound(bfqq);
++	/*
++	 * Grab a reference to the bic, to prevent it from being destroyed
++	 * before being possibly touched by a bfq_split_bfqq().
++	 */
++	bfq_get_bic_reference(bfqq);
++	bfq_get_bic_reference(new_bfqq);
++	/*
++	 * Merge queues (that is, let bic redirect its requests to new_bfqq)
++	 */
++	bic_set_bfqq(bic, new_bfqq, 1);
++	bfq_mark_bfqq_coop(new_bfqq);
++	/*
++	 * new_bfqq now belongs to at least two bics (it is a shared queue):
++	 * set new_bfqq->bic to NULL. bfqq either:
++	 * - does not belong to any bic any more, and hence bfqq->bic must
++	 *   be set to NULL, or
++	 * - is a queue whose owning bics have already been redirected to a
++	 *   different queue, hence the queue is destined to not belong to
++	 *   any bic soon and bfqq->bic is already NULL (therefore the next
++	 *   assignment causes no harm).
++	 */
++	new_bfqq->bic = NULL;
++	bfqq->bic = NULL;
++	bfq_put_queue(bfqq);
++}
++
++static void bfq_bfqq_increase_failed_cooperations(struct bfq_queue *bfqq)
++{
++	struct bfq_io_cq *bic = bfqq->bic;
++	struct bfq_data *bfqd = bfqq->bfqd;
++
++	if (bic && bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh) {
++		bic->failed_cooperations++;
++		if (bic->failed_cooperations >= bfqd->bfq_failed_cooperations)
++			bic->cooperations = 0;
++	}
++}
++
+ static int bfq_allow_merge(struct request_queue *q, struct request *rq,
+ 			   struct bio *bio)
+ {
+ 	struct bfq_data *bfqd = q->elevator->elevator_data;
+ 	struct bfq_io_cq *bic;
++	struct bfq_queue *bfqq, *new_bfqq;
+ 
+ 	/*
+ 	 * Disallow merge of a sync bio into an async request.
+@@ -1150,7 +1619,26 @@ static int bfq_allow_merge(struct request_queue *q, struct request *rq,
+ 	if (!bic)
+ 		return 0;
+ 
+-	return bic_to_bfqq(bic, bfq_bio_sync(bio)) == RQ_BFQQ(rq);
++	bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
++	/*
++	 * We take advantage of this function to perform an early merge
++	 * of the queues of possible cooperating processes.
++	 */
++	if (bfqq) {
++		new_bfqq = bfq_setup_cooperator(bfqd, bfqq, bio, false);
++		if (new_bfqq) {
++			bfq_merge_bfqqs(bfqd, bic, bfqq, new_bfqq);
++			/*
++			 * If we get here, the bio will be queued in the
++			 * shared queue, i.e., new_bfqq, so use new_bfqq
++			 * to decide whether bio and rq can be merged.
++			 */
++			bfqq = new_bfqq;
++		} else
++			bfq_bfqq_increase_failed_cooperations(bfqq);
++	}
++
++	return bfqq == RQ_BFQQ(rq);
+ }
+ 
+ static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
+@@ -1349,6 +1837,15 @@ static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ 
+ 	__bfq_bfqd_reset_in_service(bfqd);
+ 
++	/*
++	 * If this bfqq is shared between multiple processes, check
++	 * to make sure that those processes are still issuing I/Os
++	 * within the mean seek distance. If not, it may be time to
++	 * break the queues apart again.
++	 */
++	if (bfq_bfqq_coop(bfqq) && BFQQ_SEEKY(bfqq))
++		bfq_mark_bfqq_split_coop(bfqq);
++
+ 	if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
+ 		/*
+ 		 * Overloading budget_timeout field to store the time
+@@ -1357,8 +1854,13 @@ static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ 		 */
+ 		bfqq->budget_timeout = jiffies;
+ 		bfq_del_bfqq_busy(bfqd, bfqq, 1);
+-	} else
++	} else {
+ 		bfq_activate_bfqq(bfqd, bfqq);
++		/*
++		 * Re-sort the position tree of potential close cooperators.
++		 */
++		bfq_pos_tree_add_move(bfqd, bfqq);
++	}
+ }
+ 
+ /**
+@@ -2242,10 +2744,12 @@ static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ 		/*
+ 		 * If the queue was activated in a burst, or
+ 		 * too much time has elapsed from the beginning
+-		 * of this weight-raising period, then end weight
+-		 * raising.
++		 * of this weight-raising period, or the queue has
++		 * exceeded the acceptable number of cooperations,
++		 * then end weight raising.
+ 		 */
+ 		if (bfq_bfqq_in_large_burst(bfqq) ||
++		    bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh ||
+ 		    time_is_before_jiffies(bfqq->last_wr_start_finish +
+ 					   bfqq->wr_cur_max_time)) {
+ 			bfqq->last_wr_start_finish = jiffies;
+@@ -2474,6 +2978,25 @@ static void bfq_put_queue(struct bfq_queue *bfqq)
+ #endif
+ }
+ 
++static void bfq_put_cooperator(struct bfq_queue *bfqq)
++{
++	struct bfq_queue *__bfqq, *next;
++
++	/*
++	 * If this queue was scheduled to merge with another queue, be
++	 * sure to drop the reference taken on that queue (and others in
++	 * the merge chain). See bfq_setup_merge and bfq_merge_bfqqs.
++	 */
++	__bfqq = bfqq->new_bfqq;
++	while (__bfqq) {
++		if (__bfqq == bfqq)
++			break;
++		next = __bfqq->new_bfqq;
++		bfq_put_queue(__bfqq);
++		__bfqq = next;
++	}
++}
++
+ static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ {
+ 	if (bfqq == bfqd->in_service_queue) {
+@@ -2484,6 +3007,8 @@ static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ 	bfq_log_bfqq(bfqd, bfqq, "exit_bfqq: %p, %d", bfqq,
+ 		     atomic_read(&bfqq->ref));
+ 
++	bfq_put_cooperator(bfqq);
++
+ 	bfq_put_queue(bfqq);
+ }
+ 
+@@ -2492,6 +3017,25 @@ static void bfq_init_icq(struct io_cq *icq)
+ 	struct bfq_io_cq *bic = icq_to_bic(icq);
+ 
+ 	bic->ttime.last_end_request = jiffies;
++	/*
++	 * A newly created bic indicates that the process has just
++	 * started doing I/O, and is probably mapping into memory its
++	 * executable and libraries: it definitely needs weight raising.
++	 * There is however the possibility that the process performs,
++	 * for a while, I/O close to some other process. EQM intercepts
++	 * this behavior and may merge the queue corresponding to the
++	 * process with some other queue, BEFORE the weight of the queue
++	 * is raised. Merged queues are not weight-raised (they are assumed
++	 * to belong to processes that benefit only from high throughput).
++	 * If the merge is basically the consequence of an accident, then
++	 * the queue will be split soon and will get back its old weight.
++	 * It is then important to record that this queue does need
++	 * weight raising, even if its weight was not raised before the
++	 * merge. For this purpose, we overload the field wr_time_left
++	 * and assign 1 to it, to mark the queue as needing weight
++	 * raising.
++	 */
++	bic->wr_time_left = 1;
+ }
+ 
+ static void bfq_exit_icq(struct io_cq *icq)
+@@ -2505,6 +3049,13 @@ static void bfq_exit_icq(struct io_cq *icq)
+ 	}
+ 
+ 	if (bic->bfqq[BLK_RW_SYNC]) {
++		/*
++		 * If the bic is using a shared queue, put the reference
++		 * taken on the io_context when the bic started using a
++		 * shared bfq_queue.
++		 */
++		if (bfq_bfqq_coop(bic->bfqq[BLK_RW_SYNC]))
++			put_io_context(icq->ioc);
+ 		bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
+ 		bic->bfqq[BLK_RW_SYNC] = NULL;
+ 	}
+@@ -2809,6 +3360,10 @@ static void bfq_update_idle_window(struct bfq_data *bfqd,
+ 	if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
+ 		return;
+ 
++	/* Idle window just restored, statistics are meaningless. */
++	if (bfq_bfqq_just_split(bfqq))
++		return;
++
+ 	enable_idle = bfq_bfqq_idle_window(bfqq);
+ 
+ 	if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
+@@ -2856,6 +3411,7 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ 	if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
+ 	    !BFQQ_SEEKY(bfqq))
+ 		bfq_update_idle_window(bfqd, bfqq, bic);
++	bfq_clear_bfqq_just_split(bfqq);
+ 
+ 	bfq_log_bfqq(bfqd, bfqq,
+ 		     "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
+@@ -2920,12 +3476,47 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ static void bfq_insert_request(struct request_queue *q, struct request *rq)
+ {
+ 	struct bfq_data *bfqd = q->elevator->elevator_data;
+-	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++	struct bfq_queue *bfqq = RQ_BFQQ(rq), *new_bfqq;
+ 
+ 	assert_spin_locked(bfqd->queue->queue_lock);
+ 
++	/*
++	 * An unplug may trigger a requeue of a request from the device
++	 * driver: make sure we are in process context while trying to
++	 * merge two bfq_queues.
++	 */
++	if (!in_interrupt()) {
++		new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true);
++		if (new_bfqq) {
++			if (bic_to_bfqq(RQ_BIC(rq), 1) != bfqq)
++				new_bfqq = bic_to_bfqq(RQ_BIC(rq), 1);
++			/*
++			 * Release the request's reference to the old bfqq
++			 * and make sure one is taken to the shared queue.
++			 */
++			new_bfqq->allocated[rq_data_dir(rq)]++;
++			bfqq->allocated[rq_data_dir(rq)]--;
++			atomic_inc(&new_bfqq->ref);
++			bfq_put_queue(bfqq);
++			if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq)
++				bfq_merge_bfqqs(bfqd, RQ_BIC(rq),
++						bfqq, new_bfqq);
++			rq->elv.priv[1] = new_bfqq;
++			bfqq = new_bfqq;
++		} else
++			bfq_bfqq_increase_failed_cooperations(bfqq);
++	}
++
+ 	bfq_add_request(rq);
+ 
++	/*
++	 * Here a newly-created bfq_queue has already started a weight-raising
++	 * period: clear wr_time_left to prevent bfq_bfqq_save_state()
++	 * from assigning it a full weight-raising period. See the detailed
++	 * comments about this field in bfq_init_icq().
++	 */
++	if (bfqq->bic)
++		bfqq->bic->wr_time_left = 0;
+ 	rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
+ 	list_add_tail(&rq->queuelist, &bfqq->fifo);
+ 
+@@ -3094,6 +3685,32 @@ static void bfq_put_request(struct request *rq)
+ }
+ 
+ /*
++ * Returns NULL if a new bfqq should be allocated, or the old bfqq if this
++ * was the last process referring to said bfqq.
++ */
++static struct bfq_queue *
++bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
++{
++	bfq_log_bfqq(bfqq->bfqd, bfqq, "splitting queue");
++
++	put_io_context(bic->icq.ioc);
++
++	if (bfqq_process_refs(bfqq) == 1) {
++		bfqq->pid = current->pid;
++		bfq_clear_bfqq_coop(bfqq);
++		bfq_clear_bfqq_split_coop(bfqq);
++		return bfqq;
++	}
++
++	bic_set_bfqq(bic, NULL, 1);
++
++	bfq_put_cooperator(bfqq);
++
++	bfq_put_queue(bfqq);
++	return NULL;
++}
++
++/*
+  * Allocate bfq data structures associated with this request.
+  */
+ static int bfq_set_request(struct request_queue *q, struct request *rq,
+@@ -3105,6 +3722,7 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
+ 	const int is_sync = rq_is_sync(rq);
+ 	struct bfq_queue *bfqq;
+ 	unsigned long flags;
++	bool split = false;
+ 
+ 	might_sleep_if(gfp_mask & __GFP_WAIT);
+ 
+@@ -3117,15 +3735,30 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
+ 
+ 	bfq_bic_update_cgroup(bic, bio);
+ 
++new_queue:
+ 	bfqq = bic_to_bfqq(bic, is_sync);
+ 	if (!bfqq || bfqq == &bfqd->oom_bfqq) {
+ 		bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, gfp_mask);
+ 		bic_set_bfqq(bic, bfqq, is_sync);
+-		if (is_sync) {
+-			if (bfqd->large_burst)
++		if (split && is_sync) {
++			if ((bic->was_in_burst_list && bfqd->large_burst) ||
++			    bic->saved_in_large_burst)
+ 				bfq_mark_bfqq_in_large_burst(bfqq);
+-			else
+-				bfq_clear_bfqq_in_large_burst(bfqq);
++			else {
++				bfq_clear_bfqq_in_large_burst(bfqq);
++				if (bic->was_in_burst_list)
++					hlist_add_head(&bfqq->burst_list_node,
++						       &bfqd->burst_list);
++			}
++		}
++	} else {
++		/* If the queue was seeky for too long, break it apart. */
++		if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq)) {
++			bfq_log_bfqq(bfqd, bfqq, "breaking apart bfqq");
++			bfqq = bfq_split_bfqq(bic, bfqq);
++			split = true;
++			if (!bfqq)
++				goto new_queue;
+ 		}
+ 	}
+ 
+@@ -3137,6 +3770,26 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
+ 	rq->elv.priv[0] = bic;
+ 	rq->elv.priv[1] = bfqq;
+ 
++	/*
++	 * If a bfq_queue has only one process reference, it is owned
++	 * by only one bfq_io_cq: we can set the bic field of the
++	 * bfq_queue to the address of that structure. Also, if the
++	 * queue has just been split, mark a flag so that the
++	 * information is available to the other scheduler hooks.
++	 */
++	if (likely(bfqq != &bfqd->oom_bfqq) && bfqq_process_refs(bfqq) == 1) {
++		bfqq->bic = bic;
++		if (split) {
++			bfq_mark_bfqq_just_split(bfqq);
++			/*
++			 * If the queue has just been split from a shared
++			 * queue, restore the idle window and the possible
++			 * weight raising period.
++			 */
++			bfq_bfqq_resume_state(bfqq, bic);
++		}
++	}
++
+ 	spin_unlock_irqrestore(q->queue_lock, flags);
+ 
+ 	return 0;
+@@ -3289,6 +3942,7 @@ static void bfq_init_root_group(struct bfq_group *root_group,
+ 	root_group->my_entity = NULL;
+ 	root_group->bfqd = bfqd;
+ #endif
++	root_group->rq_pos_tree = RB_ROOT;
+ 	for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
+ 		root_group->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
+ }
+@@ -3369,6 +4023,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
+ 	bfqd->bfq_timeout[BLK_RW_ASYNC] = bfq_timeout_async;
+ 	bfqd->bfq_timeout[BLK_RW_SYNC] = bfq_timeout_sync;
+ 
++	bfqd->bfq_coop_thresh = 2;
++	bfqd->bfq_failed_cooperations = 7000;
+ 	bfqd->bfq_requests_within_timer = 120;
+ 
+ 	bfqd->bfq_large_burst_thresh = 11;
+diff --git a/block/bfq.h b/block/bfq.h
+index ca5ac20..320c438 100644
+--- a/block/bfq.h
++++ b/block/bfq.h
+@@ -183,6 +183,8 @@ struct bfq_group;
+  *                    ioprio_class value.
+  * @new_bfqq: shared bfq_queue if queue is cooperating with
+  *           one or more other queues.
++ * @pos_node: request-position tree member (see bfq_group's @rq_pos_tree).
++ * @pos_root: request-position tree root (see bfq_group's @rq_pos_tree).
+  * @sort_list: sorted list of pending requests.
+  * @next_rq: if fifo isn't expired, next request to serve.
+  * @queued: nr of requests queued in @sort_list.
+@@ -304,6 +306,26 @@ struct bfq_ttime {
+  * @ttime: associated @bfq_ttime struct
+  * @ioprio: per (request_queue, blkcg) ioprio.
+  * @blkcg_id: id of the blkcg the related io_cq belongs to.
++ * @wr_time_left: snapshot of the time left before weight raising ends
++ *                for the sync queue associated to this process; this
++ *		  snapshot is taken to remember this value while the weight
++ *		  raising is suspended because the queue is merged with a
++ *		  shared queue, and is used to set @wr_cur_max_time
++ *		  when the queue is split from the shared queue and its
++ *		  weight is raised again
++ * @saved_idle_window: same purpose as the previous field for the idle
++ *                     window
++ * @saved_IO_bound: same purpose as the previous two fields for the I/O
++ *                  bound classification of a queue
++ * @saved_in_large_burst: same purpose as the previous fields for the
++ *                        value of the field keeping the queue's belonging
++ *                        to a large burst
++ * @was_in_burst_list: true if the queue belonged to a burst list
++ *                     before its merge with another cooperating queue
++ * @cooperations: counter of consecutive successful queue merges undergone
++ *                by any of the process' @bfq_queues
++ * @failed_cooperations: counter of consecutive failed queue merges of any
++ *                       of the process' @bfq_queues
+  */
+ struct bfq_io_cq {
+ 	struct io_cq icq; /* must be the first member */
+@@ -314,6 +336,16 @@ struct bfq_io_cq {
+ #ifdef CONFIG_BFQ_GROUP_IOSCHED
+ 	uint64_t blkcg_id; /* the current blkcg ID */
+ #endif
++
++	unsigned int wr_time_left;
++	bool saved_idle_window;
++	bool saved_IO_bound;
++
++	bool saved_in_large_burst;
++	bool was_in_burst_list;
++
++	unsigned int cooperations;
++	unsigned int failed_cooperations;
+ };
+ 
+ enum bfq_device_speed {
+@@ -559,6 +591,9 @@ enum bfqq_state_flags {
+ 					 * may need softrt-next-start
+ 					 * update
+ 					 */
++	BFQ_BFQQ_FLAG_coop,		/* bfqq is shared */
++	BFQ_BFQQ_FLAG_split_coop,	/* shared bfqq will be split */
++	BFQ_BFQQ_FLAG_just_split,	/* queue has just been split */
+ };
+ 
+ #define BFQ_BFQQ_FNS(name)						\
+@@ -585,6 +620,9 @@ BFQ_BFQQ_FNS(budget_new);
+ BFQ_BFQQ_FNS(IO_bound);
+ BFQ_BFQQ_FNS(in_large_burst);
+ BFQ_BFQQ_FNS(constantly_seeky);
++BFQ_BFQQ_FNS(coop);
++BFQ_BFQQ_FNS(split_coop);
++BFQ_BFQQ_FNS(just_split);
+ BFQ_BFQQ_FNS(softrt_update);
+ #undef BFQ_BFQQ_FNS
+ 
+@@ -679,6 +717,9 @@ struct bfq_group_data {
+  *                   are groups with more than one active @bfq_entity
+  *                   (see the comments to the function
+  *                   bfq_bfqq_must_not_expire()).
++ * @rq_pos_tree: rbtree sorted by next_request position, used when
++ *               determining if two or more queues have interleaving
++ *               requests (see bfq_find_close_cooperator()).
+  *
+  * Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
+  * there is a set of bfq_groups, each one collecting the lower-level
+@@ -707,6 +748,8 @@ struct bfq_group {
+ 
+ 	int active_entities;
+ 
++	struct rb_root rq_pos_tree;
++
+ 	struct bfqg_stats stats;
+ 	struct bfqg_stats dead_stats;	/* stats pushed from dead children */
+ };
+@@ -717,6 +760,8 @@ struct bfq_group {
+ 
+ 	struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
+ 	struct bfq_queue *async_idle_bfqq;
++
++	struct rb_root rq_pos_tree;
+ };
+ #endif
+ 
+@@ -793,6 +838,27 @@ static void bfq_put_bfqd_unlock(struct bfq_data *bfqd, unsigned long *flags)
+ 	spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
+ }
+ 
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++
++static struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq)
++{
++	struct bfq_entity *group_entity = bfqq->entity.parent;
++
++	if (!group_entity)
++		group_entity = &bfqq->bfqd->root_group->entity;
++
++	return container_of(group_entity, struct bfq_group, entity);
++}
++
++#else
++
++static struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq)
++{
++	return bfqq->bfqd->root_group;
++}
++
++#endif
++
+ static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio);
+ static void bfq_put_queue(struct bfq_queue *bfqq);
+ static void bfq_dispatch_insert(struct request_queue *q, struct request *rq);
+-- 
+2.1.4
+

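The close-cooperator search in bfqq_find_close() above can be illustrated
outside the kernel. The following is only a user-space sketch under stated
assumptions: a sorted array stands in for the rq_pos_tree rbtree, SEEK_THR
is a made-up value (the real BFQQ_SEEK_THR is not shown in this part of the
patch), and find_close() and dist() are hypothetical names. Unlike the
kernel code, which inspects the parent of the NULL leaf and then one
neighbour, the sketch simply checks the entries on both sides of the
insertion point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold, in sectors; stands in for BFQQ_SEEK_THR. */
#define SEEK_THR 1024u

/* Absolute distance between two unsigned sector numbers, without wraparound. */
static uint64_t dist(uint64_t a, uint64_t b)
{
	return a > b ? a - b : b - a;
}

/*
 * pos[] holds the next-request position of each queue, sorted ascending
 * (the role played by rq_pos_tree). Return the index of a queue whose
 * next request lies within SEEK_THR of sector, or -1 if there is none.
 */
static int find_close(const uint64_t *pos, size_t n, uint64_t sector)
{
	size_t lo = 0, hi = n;

	assert(n == 0 || pos != NULL);
	if (n == 0)
		return -1;

	/* Lower bound: first index whose position is >= sector. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (pos[mid] < sector)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* An exact match, if any, is at lo (distance 0). */
	if (lo < n && dist(pos[lo], sector) <= SEEK_THR)
		return (int)lo;
	/* Otherwise try the neighbour just below the insertion point. */
	if (lo > 0 && dist(pos[lo - 1], sector) <= SEEK_THR)
		return (int)(lo - 1);
	return -1;
}
```

With positions {100, 5000, 9000}, a request at sector 5500 would be matched
to the queue at 5000, while a request at sector 3000 has no neighbour within
the threshold and finds no cooperator.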

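The reference accounting used by bfqq_process_refs() and bfq_setup_merge()
above can also be sketched in isolation. This is a simplified model, not
the kernel code: struct queue_refs, process_refs() and setup_merge() are
hypothetical names, plain ints replace the atomic counter, and locking, the
->new_bfqq chain walk and logging are left out. It shows only the
arithmetic: process references are whatever remains of the total count
after subtracting the per-request references and the service-tree
reference, and scheduling a merge transfers them to the target queue.

```c
#include <assert.h>

/* Simplified stand-in for the reference-related fields of struct bfq_queue. */
struct queue_refs {
	int ref;             /* total reference count */
	int allocated_read;  /* references held by allocated read requests */
	int allocated_write; /* references held by allocated write requests */
	int on_st;           /* 1 if the queue's entity is on a service tree */
};

/* Number of references held on behalf of processes (via their io contexts). */
static int process_refs(const struct queue_refs *q)
{
	int io_refs = q->allocated_read + q->allocated_write;
	int refs = q->ref - io_refs - q->on_st;

	assert(refs >= 0);
	return refs;
}

/*
 * Schedule a merge of from into to. Returns 1 if the merge was scheduled,
 * 0 if either queue has no process left, in which case merging is pointless.
 */
static int setup_merge(const struct queue_refs *from, struct queue_refs *to)
{
	int from_refs = process_refs(from);

	if (from_refs == 0 || process_refs(to) == 0)
		return 0;

	/* The target now also answers for the processes of the source. */
	to->ref += from_refs;
	return 1;
}
```

For example, a source queue with ref = 4, one read request, one write
request and on_st = 1 has a single process reference; merging it into a
target raises the target's count by exactly one.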

* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-21 22:19 Mike Pagano
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Pagano @ 2015-09-21 22:19 UTC
  To: gentoo-commits

commit:     857fef960f822e5b9d2105502ed3707d4f52df93
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Mon Sep 21 22:19:23 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Mon Sep 21 22:19:23 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=857fef96

Linux patch 4.2.1

 0000_README            |    4 +
 1000_linux-4.2.1.patch | 4522 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 4526 insertions(+)

diff --git a/0000_README b/0000_README
index 0f4cdca..0c6168a 100644
--- a/0000_README
+++ b/0000_README
@@ -43,6 +43,10 @@ EXPERIMENTAL
 Individual Patch Descriptions:
 --------------------------------------------------------------------------
 
+Patch:  1000_linux-4.2.1.patch
+From:   http://www.kernel.org
+Desc:   Linux 4.2.1
+
 Patch:  1500_XATTR_USER_PREFIX.patch
 From:   https://bugs.gentoo.org/show_bug.cgi?id=470644
 Desc:   Support for namespace user.pax.* on tmpfs.

diff --git a/1000_linux-4.2.1.patch b/1000_linux-4.2.1.patch
new file mode 100644
index 0000000..2be0056
--- /dev/null
+++ b/1000_linux-4.2.1.patch
@@ -0,0 +1,4522 @@
+diff --git a/Documentation/ABI/testing/configfs-usb-gadget-loopback b/Documentation/ABI/testing/configfs-usb-gadget-loopback
+index 9aae5bfb9908..06beefbcf061 100644
+--- a/Documentation/ABI/testing/configfs-usb-gadget-loopback
++++ b/Documentation/ABI/testing/configfs-usb-gadget-loopback
+@@ -5,4 +5,4 @@ Description:
+ 		The attributes:
+ 
+ 		qlen		- depth of loopback queue
+-		bulk_buflen	- buffer length
++		buflen		- buffer length
+diff --git a/Documentation/ABI/testing/configfs-usb-gadget-sourcesink b/Documentation/ABI/testing/configfs-usb-gadget-sourcesink
+index 29477c319f61..bc7ff731aa0c 100644
+--- a/Documentation/ABI/testing/configfs-usb-gadget-sourcesink
++++ b/Documentation/ABI/testing/configfs-usb-gadget-sourcesink
+@@ -9,4 +9,4 @@ Description:
+ 		isoc_maxpacket	- 0 - 1023 (fs), 0 - 1024 (hs/ss)
+ 		isoc_mult	- 0..2 (hs/ss only)
+ 		isoc_maxburst	- 0..15 (ss only)
+-		qlen		- buffer length
++		buflen		- buffer length
+diff --git a/Documentation/device-mapper/statistics.txt b/Documentation/device-mapper/statistics.txt
+index 4919b2dfd1b3..6f5ef944ca4c 100644
+--- a/Documentation/device-mapper/statistics.txt
++++ b/Documentation/device-mapper/statistics.txt
+@@ -121,6 +121,10 @@ Messages
+ 
+ 	Output format:
+ 	  <region_id>: <start_sector>+<length> <step> <program_id> <aux_data>
++	        precise_timestamps histogram:n1,n2,n3,...
++
++	The strings "precise_timestamps" and "histogram" are printed only
++	if they were specified when creating the region.
+ 
+     @stats_print <region_id> [<starting_line> <number_of_lines>]
+ 
+diff --git a/Documentation/usb/gadget-testing.txt b/Documentation/usb/gadget-testing.txt
+index 592678009c15..b24d3ef89166 100644
+--- a/Documentation/usb/gadget-testing.txt
++++ b/Documentation/usb/gadget-testing.txt
+@@ -237,9 +237,7 @@ Testing the LOOPBACK function
+ -----------------------------
+ 
+ device: run the gadget
+-host: test-usb
+-
+-http://www.linux-usb.org/usbtest/testusb.c
++host: test-usb (tools/usb/testusb.c)
+ 
+ 8. MASS STORAGE function
+ ========================
+@@ -586,9 +584,8 @@ Testing the SOURCESINK function
+ -------------------------------
+ 
+ device: run the gadget
+-host: test-usb
++host: test-usb (tools/usb/testusb.c)
+ 
+-http://www.linux-usb.org/usbtest/testusb.c
+ 
+ 16. UAC1 function
+ =================
+diff --git a/Makefile b/Makefile
+index c3615937df38..a03efc18aa48 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 0
++SUBLEVEL = 1
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+ 
+diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
+index 1c5021002fe4..ede2526ecf1f 100644
+--- a/arch/arm/Kconfig
++++ b/arch/arm/Kconfig
+@@ -536,6 +536,7 @@ config ARCH_ORION5X
+ 	select MVEBU_MBUS
+ 	select PCI
+ 	select PLAT_ORION_LEGACY
++	select MULTI_IRQ_HANDLER
+ 	help
+ 	  Support for the following Marvell Orion 5x series SoCs:
+ 	  Orion-1 (5181), Orion-VoIP (5181L), Orion-NAS (5182),
+diff --git a/arch/arm/boot/dts/exynos3250-rinato.dts b/arch/arm/boot/dts/exynos3250-rinato.dts
+index 031853b75528..baa9b2f52009 100644
+--- a/arch/arm/boot/dts/exynos3250-rinato.dts
++++ b/arch/arm/boot/dts/exynos3250-rinato.dts
+@@ -182,7 +182,7 @@
+ 
+ 		display-timings {
+ 			timing-0 {
+-				clock-frequency = <0>;
++				clock-frequency = <4600000>;
+ 				hactive = <320>;
+ 				vactive = <320>;
+ 				hfront-porch = <1>;
+diff --git a/arch/arm/boot/dts/rk3288.dtsi b/arch/arm/boot/dts/rk3288.dtsi
+index 22316d00493e..858efd0c861d 100644
+--- a/arch/arm/boot/dts/rk3288.dtsi
++++ b/arch/arm/boot/dts/rk3288.dtsi
+@@ -626,7 +626,7 @@
+ 		compatible = "rockchip,rk3288-wdt", "snps,dw-wdt";
+ 		reg = <0xff800000 0x100>;
+ 		clocks = <&cru PCLK_WDT>;
+-		interrupts = <GIC_SPI 111 IRQ_TYPE_LEVEL_HIGH>;
++		interrupts = <GIC_SPI 79 IRQ_TYPE_LEVEL_HIGH>;
+ 		status = "disabled";
+ 	};
+ 
+diff --git a/arch/arm/mach-bcm/bcm63xx_smp.c b/arch/arm/mach-bcm/bcm63xx_smp.c
+index 3f014f18cea5..b8e18cc8f237 100644
+--- a/arch/arm/mach-bcm/bcm63xx_smp.c
++++ b/arch/arm/mach-bcm/bcm63xx_smp.c
+@@ -127,7 +127,7 @@ static int bcm63138_smp_boot_secondary(unsigned int cpu,
+ 	}
+ 
+ 	/* Locate the secondary CPU node */
+-	dn = of_get_cpu_node(cpu_logical_map(cpu), NULL);
++	dn = of_get_cpu_node(cpu, NULL);
+ 	if (!dn) {
+ 		pr_err("SMP: failed to locate secondary CPU%d node\n", cpu);
+ 		ret = -ENODEV;
+diff --git a/arch/arm/mach-omap2/clockdomains7xx_data.c b/arch/arm/mach-omap2/clockdomains7xx_data.c
+index 57d5df0c1fbd..7581e036bda6 100644
+--- a/arch/arm/mach-omap2/clockdomains7xx_data.c
++++ b/arch/arm/mach-omap2/clockdomains7xx_data.c
+@@ -331,7 +331,7 @@ static struct clockdomain l4per2_7xx_clkdm = {
+ 	.dep_bit	  = DRA7XX_L4PER2_STATDEP_SHIFT,
+ 	.wkdep_srcs	  = l4per2_wkup_sleep_deps,
+ 	.sleepdep_srcs	  = l4per2_wkup_sleep_deps,
+-	.flags		  = CLKDM_CAN_HWSUP_SWSUP,
++	.flags		  = CLKDM_CAN_SWSUP,
+ };
+ 
+ static struct clockdomain mpu0_7xx_clkdm = {
+diff --git a/arch/arm/mach-orion5x/include/mach/irqs.h b/arch/arm/mach-orion5x/include/mach/irqs.h
+index a6fa9d8f12d8..2431d9923427 100644
+--- a/arch/arm/mach-orion5x/include/mach/irqs.h
++++ b/arch/arm/mach-orion5x/include/mach/irqs.h
+@@ -16,42 +16,42 @@
+ /*
+  * Orion Main Interrupt Controller
+  */
+-#define IRQ_ORION5X_BRIDGE		0
+-#define IRQ_ORION5X_DOORBELL_H2C	1
+-#define IRQ_ORION5X_DOORBELL_C2H	2
+-#define IRQ_ORION5X_UART0		3
+-#define IRQ_ORION5X_UART1		4
+-#define IRQ_ORION5X_I2C			5
+-#define IRQ_ORION5X_GPIO_0_7		6
+-#define IRQ_ORION5X_GPIO_8_15		7
+-#define IRQ_ORION5X_GPIO_16_23		8
+-#define IRQ_ORION5X_GPIO_24_31		9
+-#define IRQ_ORION5X_PCIE0_ERR		10
+-#define IRQ_ORION5X_PCIE0_INT		11
+-#define IRQ_ORION5X_USB1_CTRL		12
+-#define IRQ_ORION5X_DEV_BUS_ERR		14
+-#define IRQ_ORION5X_PCI_ERR		15
+-#define IRQ_ORION5X_USB_BR_ERR		16
+-#define IRQ_ORION5X_USB0_CTRL		17
+-#define IRQ_ORION5X_ETH_RX		18
+-#define IRQ_ORION5X_ETH_TX		19
+-#define IRQ_ORION5X_ETH_MISC		20
+-#define IRQ_ORION5X_ETH_SUM		21
+-#define IRQ_ORION5X_ETH_ERR		22
+-#define IRQ_ORION5X_IDMA_ERR		23
+-#define IRQ_ORION5X_IDMA_0		24
+-#define IRQ_ORION5X_IDMA_1		25
+-#define IRQ_ORION5X_IDMA_2		26
+-#define IRQ_ORION5X_IDMA_3		27
+-#define IRQ_ORION5X_CESA		28
+-#define IRQ_ORION5X_SATA		29
+-#define IRQ_ORION5X_XOR0		30
+-#define IRQ_ORION5X_XOR1		31
++#define IRQ_ORION5X_BRIDGE		(1 + 0)
++#define IRQ_ORION5X_DOORBELL_H2C	(1 + 1)
++#define IRQ_ORION5X_DOORBELL_C2H	(1 + 2)
++#define IRQ_ORION5X_UART0		(1 + 3)
++#define IRQ_ORION5X_UART1		(1 + 4)
++#define IRQ_ORION5X_I2C			(1 + 5)
++#define IRQ_ORION5X_GPIO_0_7		(1 + 6)
++#define IRQ_ORION5X_GPIO_8_15		(1 + 7)
++#define IRQ_ORION5X_GPIO_16_23		(1 + 8)
++#define IRQ_ORION5X_GPIO_24_31		(1 + 9)
++#define IRQ_ORION5X_PCIE0_ERR		(1 + 10)
++#define IRQ_ORION5X_PCIE0_INT		(1 + 11)
++#define IRQ_ORION5X_USB1_CTRL		(1 + 12)
++#define IRQ_ORION5X_DEV_BUS_ERR		(1 + 14)
++#define IRQ_ORION5X_PCI_ERR		(1 + 15)
++#define IRQ_ORION5X_USB_BR_ERR		(1 + 16)
++#define IRQ_ORION5X_USB0_CTRL		(1 + 17)
++#define IRQ_ORION5X_ETH_RX		(1 + 18)
++#define IRQ_ORION5X_ETH_TX		(1 + 19)
++#define IRQ_ORION5X_ETH_MISC		(1 + 20)
++#define IRQ_ORION5X_ETH_SUM		(1 + 21)
++#define IRQ_ORION5X_ETH_ERR		(1 + 22)
++#define IRQ_ORION5X_IDMA_ERR		(1 + 23)
++#define IRQ_ORION5X_IDMA_0		(1 + 24)
++#define IRQ_ORION5X_IDMA_1		(1 + 25)
++#define IRQ_ORION5X_IDMA_2		(1 + 26)
++#define IRQ_ORION5X_IDMA_3		(1 + 27)
++#define IRQ_ORION5X_CESA		(1 + 28)
++#define IRQ_ORION5X_SATA		(1 + 29)
++#define IRQ_ORION5X_XOR0		(1 + 30)
++#define IRQ_ORION5X_XOR1		(1 + 31)
+ 
+ /*
+  * Orion General Purpose Pins
+  */
+-#define IRQ_ORION5X_GPIO_START	32
++#define IRQ_ORION5X_GPIO_START	33
+ #define NR_GPIO_IRQS		32
+ 
+ #define NR_IRQS			(IRQ_ORION5X_GPIO_START + NR_GPIO_IRQS)
+diff --git a/arch/arm/mach-orion5x/irq.c b/arch/arm/mach-orion5x/irq.c
+index cd4bac4d7e43..086ecb87d885 100644
+--- a/arch/arm/mach-orion5x/irq.c
++++ b/arch/arm/mach-orion5x/irq.c
+@@ -42,7 +42,7 @@ __exception_irq_entry orion5x_legacy_handle_irq(struct pt_regs *regs)
+ 	stat = readl_relaxed(MAIN_IRQ_CAUSE);
+ 	stat &= readl_relaxed(MAIN_IRQ_MASK);
+ 	if (stat) {
+-		unsigned int hwirq = __fls(stat);
++		unsigned int hwirq = 1 + __fls(stat);
+ 		handle_IRQ(hwirq, regs);
+ 		return;
+ 	}
+@@ -51,7 +51,7 @@ __exception_irq_entry orion5x_legacy_handle_irq(struct pt_regs *regs)
+ 
+ void __init orion5x_init_irq(void)
+ {
+-	orion_irq_init(0, MAIN_IRQ_MASK);
++	orion_irq_init(1, MAIN_IRQ_MASK);
+ 
+ #ifdef CONFIG_MULTI_IRQ_HANDLER
+ 	set_handle_irq(orion5x_legacy_handle_irq);
+diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c
+index 8fcec1cc101e..01b3e3683ede 100644
+--- a/arch/arm/mach-rockchip/platsmp.c
++++ b/arch/arm/mach-rockchip/platsmp.c
+@@ -72,29 +72,22 @@ static struct reset_control *rockchip_get_core_reset(int cpu)
+ static int pmu_set_power_domain(int pd, bool on)
+ {
+ 	u32 val = (on) ? 0 : BIT(pd);
++	struct reset_control *rstc = rockchip_get_core_reset(pd);
+ 	int ret;
+ 
++	if (IS_ERR(rstc) && read_cpuid_part() != ARM_CPU_PART_CORTEX_A9) {
++		pr_err("%s: could not get reset control for core %d\n",
++		       __func__, pd);
++		return PTR_ERR(rstc);
++	}
++
+ 	/*
+ 	 * We need to soft reset the cpu when we turn off the cpu power domain,
+ 	 * or else the active processors might be stalled when the individual
+ 	 * processor is powered down.
+ 	 */
+-	if (read_cpuid_part() != ARM_CPU_PART_CORTEX_A9) {
+-		struct reset_control *rstc = rockchip_get_core_reset(pd);
+-
+-		if (IS_ERR(rstc)) {
+-			pr_err("%s: could not get reset control for core %d\n",
+-			       __func__, pd);
+-			return PTR_ERR(rstc);
+-		}
+-
+-		if (on)
+-			reset_control_deassert(rstc);
+-		else
+-			reset_control_assert(rstc);
+-
+-		reset_control_put(rstc);
+-	}
++	if (!IS_ERR(rstc) && !on)
++		reset_control_assert(rstc);
+ 
+ 	ret = regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val);
+ 	if (ret < 0) {
+@@ -112,6 +105,12 @@ static int pmu_set_power_domain(int pd, bool on)
+ 		}
+ 	}
+ 
++	if (!IS_ERR(rstc)) {
++		if (on)
++			reset_control_deassert(rstc);
++		reset_control_put(rstc);
++	}
++
+ 	return 0;
+ }
+ 
+@@ -146,8 +145,12 @@ static int rockchip_boot_secondary(unsigned int cpu, struct task_struct *idle)
+ 		 * the mailbox:
+ 		 * sram_base_addr + 4: 0xdeadbeaf
+ 		 * sram_base_addr + 8: start address for pc
++		 * The cpu0 need to wait the other cpus other than cpu0 entering
++		 * the wfe state.The wait time is affected by many aspects.
++		 * (e.g: cpu frequency, bootrom frequency, sram frequency, ...)
+ 		 * */
+-		udelay(10);
++		mdelay(1); /* ensure the cpus other than cpu0 to startup */
++
+ 		writel(virt_to_phys(secondary_startup), sram_base_addr + 8);
+ 		writel(0xDEADBEAF, sram_base_addr + 4);
+ 		dsb_sev();
+diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+index b027a89737b6..c6d601cc9764 100644
+--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
++++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+@@ -421,14 +421,20 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
+ 	rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+ 	v = pte & ~HPTE_V_HVLOCK;
+ 	if (v & HPTE_V_VALID) {
+-		u64 pte1;
+-
+-		pte1 = be64_to_cpu(hpte[1]);
+ 		hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
+-		rb = compute_tlbie_rb(v, pte1, pte_index);
++		rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
+ 		do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
+-		/* Read PTE low word after tlbie to get final R/C values */
+-		remove_revmap_chain(kvm, pte_index, rev, v, pte1);
++		/*
++		 * The reference (R) and change (C) bits in a HPT
++		 * entry can be set by hardware at any time up until
++		 * the HPTE is invalidated and the TLB invalidation
++		 * sequence has completed.  This means that when
++		 * removing a HPTE, we need to re-read the HPTE after
++		 * the invalidation sequence has completed in order to
++		 * obtain reliable values of R and C.
++		 */
++		remove_revmap_chain(kvm, pte_index, rev, v,
++				    be64_to_cpu(hpte[1]));
+ 	}
+ 	r = rev->guest_rpte & ~HPTE_GR_RESERVED;
+ 	note_hpte_modification(kvm, rev);
+diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+index faa86e9c0551..76408cf0ad04 100644
+--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
++++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+@@ -1127,6 +1127,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+ 	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
+ 	bne	3f
+ 	lbz	r0, HSTATE_HOST_IPI(r13)
++	cmpwi	r0, 0
+ 	beq	4f
+ 	b	guest_exit_cont
+ 3:
+diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
+index ca070d260af2..b80512b9ef59 100644
+--- a/arch/s390/kernel/setup.c
++++ b/arch/s390/kernel/setup.c
+@@ -688,7 +688,7 @@ static void __init setup_memory(void)
+ /*
+  * Setup hardware capabilities.
+  */
+-static void __init setup_hwcaps(void)
++static int __init setup_hwcaps(void)
+ {
+ 	static const int stfl_bits[6] = { 0, 2, 7, 17, 19, 21 };
+ 	struct cpuid cpu_id;
+@@ -754,9 +754,11 @@ static void __init setup_hwcaps(void)
+ 		elf_hwcap |= HWCAP_S390_TE;
+ 
+ 	/*
+-	 * Vector extension HWCAP_S390_VXRS is bit 11.
++	 * Vector extension HWCAP_S390_VXRS is bit 11. The Vector extension
++	 * can be disabled with the "novx" parameter. Use MACHINE_HAS_VX
++	 * instead of facility bit 129.
+ 	 */
+-	if (test_facility(129))
++	if (MACHINE_HAS_VX)
+ 		elf_hwcap |= HWCAP_S390_VXRS;
+ 	get_cpu_id(&cpu_id);
+ 	add_device_randomness(&cpu_id, sizeof(cpu_id));
+@@ -793,7 +795,9 @@ static void __init setup_hwcaps(void)
+ 		strcpy(elf_platform, "z13");
+ 		break;
+ 	}
++	return 0;
+ }
++arch_initcall(setup_hwcaps);
+ 
+ /*
+  * Add system information as device randomness
+@@ -881,11 +885,6 @@ void __init setup_arch(char **cmdline_p)
+         cpu_init();
+ 
+ 	/*
+-	 * Setup capabilities (ELF_HWCAP & ELF_PLATFORM).
+-	 */
+-	setup_hwcaps();
+-
+-	/*
+ 	 * Create kernel page tables and switch to virtual addressing.
+ 	 */
+         paging_init();
+diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
+index 64d7cf1b50e1..440df0c7a2ee 100644
+--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
++++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
+@@ -294,6 +294,7 @@ static struct ahash_alg ghash_async_alg = {
+ 			.cra_name		= "ghash",
+ 			.cra_driver_name	= "ghash-clmulni",
+ 			.cra_priority		= 400,
++			.cra_ctxsize		= sizeof(struct ghash_async_ctx),
+ 			.cra_flags		= CRYPTO_ALG_TYPE_AHASH | CRYPTO_ALG_ASYNC,
+ 			.cra_blocksize		= GHASH_BLOCK_SIZE,
+ 			.cra_type		= &crypto_ahash_type,
+diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
+index e49ee24da85e..9393896717d0 100644
+--- a/arch/x86/kernel/acpi/boot.c
++++ b/arch/x86/kernel/acpi/boot.c
+@@ -445,6 +445,7 @@ static void __init acpi_sci_ioapic_setup(u8 bus_irq, u16 polarity, u16 trigger,
+ 		polarity = acpi_sci_flags & ACPI_MADT_POLARITY_MASK;
+ 
+ 	mp_override_legacy_irq(bus_irq, polarity, trigger, gsi);
++	acpi_penalize_sci_irq(bus_irq, trigger, polarity);
+ 
+ 	/*
+ 	 * stash over-ride to indicate we've been here
+diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
+index 844f56c5616d..c93c27df9919 100644
+--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
++++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
+@@ -146,6 +146,27 @@ void mce_intel_hcpu_update(unsigned long cpu)
+ 	per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
+ }
+ 
++static void cmci_toggle_interrupt_mode(bool on)
++{
++	unsigned long flags, *owned;
++	int bank;
++	u64 val;
++
++	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
++	owned = this_cpu_ptr(mce_banks_owned);
++	for_each_set_bit(bank, owned, MAX_NR_BANKS) {
++		rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
++
++		if (on)
++			val |= MCI_CTL2_CMCI_EN;
++		else
++			val &= ~MCI_CTL2_CMCI_EN;
++
++		wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
++	}
++	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
++}
++
+ unsigned long cmci_intel_adjust_timer(unsigned long interval)
+ {
+ 	if ((this_cpu_read(cmci_backoff_cnt) > 0) &&
+@@ -175,7 +196,7 @@ unsigned long cmci_intel_adjust_timer(unsigned long interval)
+ 		 */
+ 		if (!atomic_read(&cmci_storm_on_cpus)) {
+ 			__this_cpu_write(cmci_storm_state, CMCI_STORM_NONE);
+-			cmci_reenable();
++			cmci_toggle_interrupt_mode(true);
+ 			cmci_recheck();
+ 		}
+ 		return CMCI_POLL_INTERVAL;
+@@ -186,22 +207,6 @@ unsigned long cmci_intel_adjust_timer(unsigned long interval)
+ 	}
+ }
+ 
+-static void cmci_storm_disable_banks(void)
+-{
+-	unsigned long flags, *owned;
+-	int bank;
+-	u64 val;
+-
+-	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+-	owned = this_cpu_ptr(mce_banks_owned);
+-	for_each_set_bit(bank, owned, MAX_NR_BANKS) {
+-		rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+-		val &= ~MCI_CTL2_CMCI_EN;
+-		wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
+-	}
+-	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+-}
+-
+ static bool cmci_storm_detect(void)
+ {
+ 	unsigned int cnt = __this_cpu_read(cmci_storm_cnt);
+@@ -223,7 +228,7 @@ static bool cmci_storm_detect(void)
+ 	if (cnt <= CMCI_STORM_THRESHOLD)
+ 		return false;
+ 
+-	cmci_storm_disable_banks();
++	cmci_toggle_interrupt_mode(false);
+ 	__this_cpu_write(cmci_storm_state, CMCI_STORM_ACTIVE);
+ 	r = atomic_add_return(1, &cmci_storm_on_cpus);
+ 	mce_timer_kick(CMCI_STORM_INTERVAL);
+diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
+index 44171462bd2a..82362ad2f25d 100644
+--- a/arch/x86/kvm/mmu.c
++++ b/arch/x86/kvm/mmu.c
+@@ -357,12 +357,6 @@ static u64 __get_spte_lockless(u64 *sptep)
+ {
+ 	return ACCESS_ONCE(*sptep);
+ }
+-
+-static bool __check_direct_spte_mmio_pf(u64 spte)
+-{
+-	/* It is valid if the spte is zapped. */
+-	return spte == 0ull;
+-}
+ #else
+ union split_spte {
+ 	struct {
+@@ -478,23 +472,6 @@ retry:
+ 
+ 	return spte.spte;
+ }
+-
+-static bool __check_direct_spte_mmio_pf(u64 spte)
+-{
+-	union split_spte sspte = (union split_spte)spte;
+-	u32 high_mmio_mask = shadow_mmio_mask >> 32;
+-
+-	/* It is valid if the spte is zapped. */
+-	if (spte == 0ull)
+-		return true;
+-
+-	/* It is valid if the spte is being zapped. */
+-	if (sspte.spte_low == 0ull &&
+-	    (sspte.spte_high & high_mmio_mask) == high_mmio_mask)
+-		return true;
+-
+-	return false;
+-}
+ #endif
+ 
+ static bool spte_is_locklessly_modifiable(u64 spte)
+@@ -3299,21 +3276,6 @@ static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
+ 	return vcpu_match_mmio_gva(vcpu, addr);
+ }
+ 
+-
+-/*
+- * On direct hosts, the last spte is only allows two states
+- * for mmio page fault:
+- *   - It is the mmio spte
+- *   - It is zapped or it is being zapped.
+- *
+- * This function completely checks the spte when the last spte
+- * is not the mmio spte.
+- */
+-static bool check_direct_spte_mmio_pf(u64 spte)
+-{
+-	return __check_direct_spte_mmio_pf(spte);
+-}
+-
+ static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
+ {
+ 	struct kvm_shadow_walk_iterator iterator;
+@@ -3356,13 +3318,6 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
+ 	}
+ 
+ 	/*
+-	 * It's ok if the gva is remapped by other cpus on shadow guest,
+-	 * it's a BUG if the gfn is not a mmio page.
+-	 */
+-	if (direct && !check_direct_spte_mmio_pf(spte))
+-		return RET_MMIO_PF_BUG;
+-
+-	/*
+ 	 * If the page table is zapped by other cpus, let CPU fault again on
+ 	 * the address.
+ 	 */
+diff --git a/arch/xtensa/include/asm/traps.h b/arch/xtensa/include/asm/traps.h
+index 677bfcf4ee5d..28f33a8b7f5f 100644
+--- a/arch/xtensa/include/asm/traps.h
++++ b/arch/xtensa/include/asm/traps.h
+@@ -25,30 +25,39 @@ static inline void spill_registers(void)
+ {
+ #if XCHAL_NUM_AREGS > 16
+ 	__asm__ __volatile__ (
+-		"	call12	1f\n"
++		"	call8	1f\n"
+ 		"	_j	2f\n"
+ 		"	retw\n"
+ 		"	.align	4\n"
+ 		"1:\n"
++#if XCHAL_NUM_AREGS == 32
++		"	_entry	a1, 32\n"
++		"	addi	a8, a0, 3\n"
++		"	_entry	a1, 16\n"
++		"	mov	a12, a12\n"
++		"	retw\n"
++#else
+ 		"	_entry	a1, 48\n"
+-		"	addi	a12, a0, 3\n"
+-#if XCHAL_NUM_AREGS > 32
+-		"	.rept	(" __stringify(XCHAL_NUM_AREGS) " - 32) / 12\n"
++		"	call12	1f\n"
++		"	retw\n"
++		"	.align	4\n"
++		"1:\n"
++		"	.rept	(" __stringify(XCHAL_NUM_AREGS) " - 16) / 12\n"
+ 		"	_entry	a1, 48\n"
+ 		"	mov	a12, a0\n"
+ 		"	.endr\n"
+-#endif
+-		"	_entry	a1, 48\n"
++		"	_entry	a1, 16\n"
+ #if XCHAL_NUM_AREGS % 12 == 0
+-		"	mov	a8, a8\n"
+-#elif XCHAL_NUM_AREGS % 12 == 4
+ 		"	mov	a12, a12\n"
+-#elif XCHAL_NUM_AREGS % 12 == 8
++#elif XCHAL_NUM_AREGS % 12 == 4
+ 		"	mov	a4, a4\n"
++#elif XCHAL_NUM_AREGS % 12 == 8
++		"	mov	a8, a8\n"
+ #endif
+ 		"	retw\n"
++#endif
+ 		"2:\n"
+-		: : : "a12", "a13", "memory");
++		: : : "a8", "a9", "memory");
+ #else
+ 	__asm__ __volatile__ (
+ 		"	mov	a12, a12\n"
+diff --git a/arch/xtensa/kernel/entry.S b/arch/xtensa/kernel/entry.S
+index 82bbfa5a05b3..a2a902140c4e 100644
+--- a/arch/xtensa/kernel/entry.S
++++ b/arch/xtensa/kernel/entry.S
+@@ -568,12 +568,13 @@ user_exception_exit:
+ 	 *	 (if we have restored WSBITS-1 frames).
+ 	 */
+ 
++2:
+ #if XCHAL_HAVE_THREADPTR
+ 	l32i	a3, a1, PT_THREADPTR
+ 	wur	a3, threadptr
+ #endif
+ 
+-2:	j	common_exception_exit
++	j	common_exception_exit
+ 
+ 	/* This is the kernel exception exit.
+ 	 * We avoided to do a MOVSP when we entered the exception, but we
+@@ -1820,7 +1821,7 @@ ENDPROC(system_call)
+ 	mov	a12, a0
+ 	.endr
+ #endif
+-	_entry	a1, 48
++	_entry	a1, 16
+ #if XCHAL_NUM_AREGS % 12 == 0
+ 	mov	a8, a8
+ #elif XCHAL_NUM_AREGS % 12 == 4
+@@ -1844,7 +1845,7 @@ ENDPROC(system_call)
+ 
+ ENTRY(_switch_to)
+ 
+-	entry	a1, 16
++	entry	a1, 48
+ 
+ 	mov	a11, a3			# and 'next' (a3)
+ 
+diff --git a/drivers/acpi/acpi_pnp.c b/drivers/acpi/acpi_pnp.c
+index ff6d8adc9cda..fb765524cc3d 100644
+--- a/drivers/acpi/acpi_pnp.c
++++ b/drivers/acpi/acpi_pnp.c
+@@ -153,6 +153,7 @@ static const struct acpi_device_id acpi_pnp_device_ids[] = {
+ 	{"AEI0250"},		/* PROLiNK 1456VH ISA PnP K56flex Fax Modem */
+ 	{"AEI1240"},		/* Actiontec ISA PNP 56K X2 Fax Modem */
+ 	{"AKY1021"},		/* Rockwell 56K ACF II Fax+Data+Voice Modem */
++	{"ALI5123"},		/* ALi Fast Infrared Controller */
+ 	{"AZT4001"},		/* AZT3005 PnP SOUND DEVICE */
+ 	{"BDP3336"},		/* Best Data Products Inc. Smart One 336F PnP Modem */
+ 	{"BRI0A49"},		/* Boca Complete Ofc Communicator 14.4 Data-FAX */
+diff --git a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
+index cfd7581cc19f..b09ad554430a 100644
+--- a/drivers/acpi/pci_link.c
++++ b/drivers/acpi/pci_link.c
+@@ -826,6 +826,22 @@ void acpi_penalize_isa_irq(int irq, int active)
+ }
+ 
+ /*
++ * Penalize IRQ used by ACPI SCI. If ACPI SCI pin attributes conflict with
++ * PCI IRQ attributes, mark ACPI SCI as ISA_ALWAYS so it won't be use for
++ * PCI IRQs.
++ */
++void acpi_penalize_sci_irq(int irq, int trigger, int polarity)
++{
++	if (irq >= 0 && irq < ARRAY_SIZE(acpi_irq_penalty)) {
++		if (trigger != ACPI_MADT_TRIGGER_LEVEL ||
++		    polarity != ACPI_MADT_POLARITY_ACTIVE_LOW)
++			acpi_irq_penalty[irq] += PIRQ_PENALTY_ISA_ALWAYS;
++		else
++			acpi_irq_penalty[irq] += PIRQ_PENALTY_PCI_USING;
++	}
++}
++
++/*
+  * Over-ride default table to reserve additional IRQs for use by ISA
+  * e.g. acpi_irq_isa=5
+  * Useful for telling ACPI how not to interfere with your ISA sound card.
+diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
+index 7e62751abfac..a46660204e3a 100644
+--- a/drivers/ata/ahci.c
++++ b/drivers/ata/ahci.c
+@@ -351,6 +351,7 @@ static const struct pci_device_id ahci_pci_tbl[] = {
+ 	/* JMicron 362B and 362C have an AHCI function with IDE class code */
+ 	{ PCI_VDEVICE(JMICRON, 0x2362), board_ahci_ign_iferr },
+ 	{ PCI_VDEVICE(JMICRON, 0x236f), board_ahci_ign_iferr },
++	/* May need to update quirk_jmicron_async_suspend() for additions */
+ 
+ 	/* ATI */
+ 	{ PCI_VDEVICE(ATI, 0x4380), board_ahci_sb600 }, /* ATI SB600 */
+@@ -1451,18 +1452,6 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
+ 	else if (pdev->vendor == 0x177d && pdev->device == 0xa01c)
+ 		ahci_pci_bar = AHCI_PCI_BAR_CAVIUM;
+ 
+-	/*
+-	 * The JMicron chip 361/363 contains one SATA controller and one
+-	 * PATA controller,for powering on these both controllers, we must
+-	 * follow the sequence one by one, otherwise one of them can not be
+-	 * powered on successfully, so here we disable the async suspend
+-	 * method for these chips.
+-	 */
+-	if (pdev->vendor == PCI_VENDOR_ID_JMICRON &&
+-		(pdev->device == PCI_DEVICE_ID_JMICRON_JMB363 ||
+-		pdev->device == PCI_DEVICE_ID_JMICRON_JMB361))
+-		device_disable_async_suspend(&pdev->dev);
+-
+ 	/* acquire resources */
+ 	rc = pcim_enable_device(pdev);
+ 	if (rc)
+diff --git a/drivers/ata/pata_jmicron.c b/drivers/ata/pata_jmicron.c
+index 47e418b8c8ba..4d1a5d2c4287 100644
+--- a/drivers/ata/pata_jmicron.c
++++ b/drivers/ata/pata_jmicron.c
+@@ -143,18 +143,6 @@ static int jmicron_init_one (struct pci_dev *pdev, const struct pci_device_id *i
+ 	};
+ 	const struct ata_port_info *ppi[] = { &info, NULL };
+ 
+-	/*
+-	 * The JMicron chip 361/363 contains one SATA controller and one
+-	 * PATA controller,for powering on these both controllers, we must
+-	 * follow the sequence one by one, otherwise one of them can not be
+-	 * powered on successfully, so here we disable the async suspend
+-	 * method for these chips.
+-	 */
+-	if (pdev->vendor == PCI_VENDOR_ID_JMICRON &&
+-		(pdev->device == PCI_DEVICE_ID_JMICRON_JMB363 ||
+-		pdev->device == PCI_DEVICE_ID_JMICRON_JMB361))
+-		device_disable_async_suspend(&pdev->dev);
+-
+ 	return ata_pci_bmdma_init_one(pdev, ppi, &jmicron_sht, NULL, 0);
+ }
+ 
+diff --git a/drivers/auxdisplay/ks0108.c b/drivers/auxdisplay/ks0108.c
+index 5b93852392b8..0d752851a1ee 100644
+--- a/drivers/auxdisplay/ks0108.c
++++ b/drivers/auxdisplay/ks0108.c
+@@ -139,6 +139,7 @@ static int __init ks0108_init(void)
+ 
+ 	ks0108_pardevice = parport_register_device(ks0108_parport, KS0108_NAME,
+ 		NULL, NULL, NULL, PARPORT_DEV_EXCL, NULL);
++	parport_put_port(ks0108_parport);
+ 	if (ks0108_pardevice == NULL) {
+ 		printk(KERN_ERR KS0108_NAME ": ERROR: "
+ 			"parport didn't register new device\n");
+diff --git a/drivers/base/devres.c b/drivers/base/devres.c
+index c8a53d1e019f..875464690117 100644
+--- a/drivers/base/devres.c
++++ b/drivers/base/devres.c
+@@ -297,10 +297,10 @@ void * devres_get(struct device *dev, void *new_res,
+ 	if (!dr) {
+ 		add_dr(dev, &new_dr->node);
+ 		dr = new_dr;
+-		new_dr = NULL;
++		new_res = NULL;
+ 	}
+ 	spin_unlock_irqrestore(&dev->devres_lock, flags);
+-	devres_free(new_dr);
++	devres_free(new_res);
+ 
+ 	return dr->data;
+ }
+diff --git a/drivers/base/platform.c b/drivers/base/platform.c
+index 063f0ab15259..f80aaaf9f610 100644
+--- a/drivers/base/platform.c
++++ b/drivers/base/platform.c
+@@ -375,9 +375,7 @@ int platform_device_add(struct platform_device *pdev)
+ 
+ 	while (--i >= 0) {
+ 		struct resource *r = &pdev->resource[i];
+-		unsigned long type = resource_type(r);
+-
+-		if (type == IORESOURCE_MEM || type == IORESOURCE_IO)
++		if (r->parent)
+ 			release_resource(r);
+ 	}
+ 
+@@ -408,9 +406,7 @@ void platform_device_del(struct platform_device *pdev)
+ 
+ 		for (i = 0; i < pdev->num_resources; i++) {
+ 			struct resource *r = &pdev->resource[i];
+-			unsigned long type = resource_type(r);
+-
+-			if (type == IORESOURCE_MEM || type == IORESOURCE_IO)
++			if (r->parent)
+ 				release_resource(r);
+ 		}
+ 	}
+diff --git a/drivers/base/power/clock_ops.c b/drivers/base/power/clock_ops.c
+index acef9f9f759a..652b5a367c1f 100644
+--- a/drivers/base/power/clock_ops.c
++++ b/drivers/base/power/clock_ops.c
+@@ -38,7 +38,7 @@ struct pm_clock_entry {
+  * @dev: The device for the given clock
+  * @ce: PM clock entry corresponding to the clock.
+  */
+-static inline int __pm_clk_enable(struct device *dev, struct pm_clock_entry *ce)
++static inline void __pm_clk_enable(struct device *dev, struct pm_clock_entry *ce)
+ {
+ 	int ret;
+ 
+@@ -50,8 +50,6 @@ static inline int __pm_clk_enable(struct device *dev, struct pm_clock_entry *ce)
+ 			dev_err(dev, "%s: failed to enable clk %p, error %d\n",
+ 				__func__, ce->clk, ret);
+ 	}
+-
+-	return ret;
+ }
+ 
+ /**
+diff --git a/drivers/clk/pistachio/clk-pistachio.c b/drivers/clk/pistachio/clk-pistachio.c
+index 8c0fe8828f99..c4ceb5eaf46c 100644
+--- a/drivers/clk/pistachio/clk-pistachio.c
++++ b/drivers/clk/pistachio/clk-pistachio.c
+@@ -159,9 +159,15 @@ PNAME(mux_debug) = { "mips_pll_mux", "rpu_v_pll_mux",
+ 		     "wifi_pll_mux", "bt_pll_mux" };
+ static u32 mux_debug_idx[] = { 0x0, 0x1, 0x2, 0x4, 0x8, 0x10 };
+ 
+-static unsigned int pistachio_critical_clks[] __initdata = {
+-	CLK_MIPS,
+-	CLK_PERIPH_SYS,
++static unsigned int pistachio_critical_clks_core[] __initdata = {
++	CLK_MIPS
++};
++
++static unsigned int pistachio_critical_clks_sys[] __initdata = {
++	PERIPH_CLK_SYS,
++	PERIPH_CLK_SYS_BUS,
++	PERIPH_CLK_DDR,
++	PERIPH_CLK_ROM,
+ };
+ 
+ static void __init pistachio_clk_init(struct device_node *np)
+@@ -193,8 +199,8 @@ static void __init pistachio_clk_init(struct device_node *np)
+ 
+ 	pistachio_clk_register_provider(p);
+ 
+-	pistachio_clk_force_enable(p, pistachio_critical_clks,
+-				   ARRAY_SIZE(pistachio_critical_clks));
++	pistachio_clk_force_enable(p, pistachio_critical_clks_core,
++				   ARRAY_SIZE(pistachio_critical_clks_core));
+ }
+ CLK_OF_DECLARE(pistachio_clk, "img,pistachio-clk", pistachio_clk_init);
+ 
+@@ -261,6 +267,9 @@ static void __init pistachio_clk_periph_init(struct device_node *np)
+ 				    ARRAY_SIZE(pistachio_periph_gates));
+ 
+ 	pistachio_clk_register_provider(p);
++
++	pistachio_clk_force_enable(p, pistachio_critical_clks_sys,
++				   ARRAY_SIZE(pistachio_critical_clks_sys));
+ }
+ CLK_OF_DECLARE(pistachio_clk_periph, "img,pistachio-clk-periph",
+ 	       pistachio_clk_periph_init);
+diff --git a/drivers/clk/pistachio/clk-pll.c b/drivers/clk/pistachio/clk-pll.c
+index e17dada0dd21..c9b459821084 100644
+--- a/drivers/clk/pistachio/clk-pll.c
++++ b/drivers/clk/pistachio/clk-pll.c
+@@ -65,6 +65,12 @@
+ #define MIN_OUTPUT_FRAC			12000000UL
+ #define MAX_OUTPUT_FRAC			1600000000UL
+ 
++/* Fractional PLL operating modes */
++enum pll_mode {
++	PLL_MODE_FRAC,
++	PLL_MODE_INT,
++};
++
+ struct pistachio_clk_pll {
+ 	struct clk_hw hw;
+ 	void __iomem *base;
+@@ -88,12 +94,10 @@ static inline void pll_lock(struct pistachio_clk_pll *pll)
+ 		cpu_relax();
+ }
+ 
+-static inline u32 do_div_round_closest(u64 dividend, u32 divisor)
++static inline u64 do_div_round_closest(u64 dividend, u64 divisor)
+ {
+ 	dividend += divisor / 2;
+-	do_div(dividend, divisor);
+-
+-	return dividend;
++	return div64_u64(dividend, divisor);
+ }
+ 
+ static inline struct pistachio_clk_pll *to_pistachio_pll(struct clk_hw *hw)
+@@ -101,6 +105,29 @@ static inline struct pistachio_clk_pll *to_pistachio_pll(struct clk_hw *hw)
+ 	return container_of(hw, struct pistachio_clk_pll, hw);
+ }
+ 
++static inline enum pll_mode pll_frac_get_mode(struct clk_hw *hw)
++{
++	struct pistachio_clk_pll *pll = to_pistachio_pll(hw);
++	u32 val;
++
++	val = pll_readl(pll, PLL_CTRL3) & PLL_FRAC_CTRL3_DSMPD;
++	return val ? PLL_MODE_INT : PLL_MODE_FRAC;
++}
++
++static inline void pll_frac_set_mode(struct clk_hw *hw, enum pll_mode mode)
++{
++	struct pistachio_clk_pll *pll = to_pistachio_pll(hw);
++	u32 val;
++
++	val = pll_readl(pll, PLL_CTRL3);
++	if (mode == PLL_MODE_INT)
++		val |= PLL_FRAC_CTRL3_DSMPD | PLL_FRAC_CTRL3_DACPD;
++	else
++		val &= ~(PLL_FRAC_CTRL3_DSMPD | PLL_FRAC_CTRL3_DACPD);
++
++	pll_writel(pll, val, PLL_CTRL3);
++}
++
+ static struct pistachio_pll_rate_table *
+ pll_get_params(struct pistachio_clk_pll *pll, unsigned long fref,
+ 	       unsigned long fout)
+@@ -136,8 +163,7 @@ static int pll_gf40lp_frac_enable(struct clk_hw *hw)
+ 	u32 val;
+ 
+ 	val = pll_readl(pll, PLL_CTRL3);
+-	val &= ~(PLL_FRAC_CTRL3_PD | PLL_FRAC_CTRL3_DACPD |
+-		 PLL_FRAC_CTRL3_DSMPD | PLL_FRAC_CTRL3_FOUTPOSTDIVPD |
++	val &= ~(PLL_FRAC_CTRL3_PD | PLL_FRAC_CTRL3_FOUTPOSTDIVPD |
+ 		 PLL_FRAC_CTRL3_FOUT4PHASEPD | PLL_FRAC_CTRL3_FOUTVCOPD);
+ 	pll_writel(pll, val, PLL_CTRL3);
+ 
+@@ -173,7 +199,7 @@ static int pll_gf40lp_frac_set_rate(struct clk_hw *hw, unsigned long rate,
+ 	struct pistachio_clk_pll *pll = to_pistachio_pll(hw);
+ 	struct pistachio_pll_rate_table *params;
+ 	int enabled = pll_gf40lp_frac_is_enabled(hw);
+-	u32 val, vco, old_postdiv1, old_postdiv2;
++	u64 val, vco, old_postdiv1, old_postdiv2;
+ 	const char *name = __clk_get_name(hw->clk);
+ 
+ 	if (rate < MIN_OUTPUT_FRAC || rate > MAX_OUTPUT_FRAC)
+@@ -183,17 +209,21 @@ static int pll_gf40lp_frac_set_rate(struct clk_hw *hw, unsigned long rate,
+ 	if (!params || !params->refdiv)
+ 		return -EINVAL;
+ 
+-	vco = params->fref * params->fbdiv / params->refdiv;
++	/* calculate vco */
++	vco = params->fref;
++	vco *= (params->fbdiv << 24) + params->frac;
++	vco = div64_u64(vco, params->refdiv << 24);
++
+ 	if (vco < MIN_VCO_FRAC_FRAC || vco > MAX_VCO_FRAC_FRAC)
+-		pr_warn("%s: VCO %u is out of range %lu..%lu\n", name, vco,
++		pr_warn("%s: VCO %llu is out of range %lu..%lu\n", name, vco,
+ 			MIN_VCO_FRAC_FRAC, MAX_VCO_FRAC_FRAC);
+ 
+-	val = params->fref / params->refdiv;
++	val = div64_u64(params->fref, params->refdiv);
+ 	if (val < MIN_PFD)
+-		pr_warn("%s: PFD %u is too low (min %lu)\n",
++		pr_warn("%s: PFD %llu is too low (min %lu)\n",
+ 			name, val, MIN_PFD);
+ 	if (val > vco / 16)
+-		pr_warn("%s: PFD %u is too high (max %u)\n",
++		pr_warn("%s: PFD %llu is too high (max %llu)\n",
+ 			name, val, vco / 16);
+ 
+ 	val = pll_readl(pll, PLL_CTRL1);
+@@ -227,6 +257,12 @@ static int pll_gf40lp_frac_set_rate(struct clk_hw *hw, unsigned long rate,
+ 		(params->postdiv2 << PLL_FRAC_CTRL2_POSTDIV2_SHIFT);
+ 	pll_writel(pll, val, PLL_CTRL2);
+ 
++	/* set operating mode */
++	if (params->frac)
++		pll_frac_set_mode(hw, PLL_MODE_FRAC);
++	else
++		pll_frac_set_mode(hw, PLL_MODE_INT);
++
+ 	if (enabled)
+ 		pll_lock(pll);
+ 
+@@ -237,8 +273,7 @@ static unsigned long pll_gf40lp_frac_recalc_rate(struct clk_hw *hw,
+ 						 unsigned long parent_rate)
+ {
+ 	struct pistachio_clk_pll *pll = to_pistachio_pll(hw);
+-	u32 val, prediv, fbdiv, frac, postdiv1, postdiv2;
+-	u64 rate = parent_rate;
++	u64 val, prediv, fbdiv, frac, postdiv1, postdiv2, rate;
+ 
+ 	val = pll_readl(pll, PLL_CTRL1);
+ 	prediv = (val >> PLL_CTRL1_REFDIV_SHIFT) & PLL_CTRL1_REFDIV_MASK;
+@@ -251,7 +286,13 @@ static unsigned long pll_gf40lp_frac_recalc_rate(struct clk_hw *hw,
+ 		PLL_FRAC_CTRL2_POSTDIV2_MASK;
+ 	frac = (val >> PLL_FRAC_CTRL2_FRAC_SHIFT) & PLL_FRAC_CTRL2_FRAC_MASK;
+ 
+-	rate *= (fbdiv << 24) + frac;
++	/* get operating mode (int/frac) and calculate rate accordingly */
++	rate = parent_rate;
++	if (pll_frac_get_mode(hw) == PLL_MODE_FRAC)
++		rate *= (fbdiv << 24) + frac;
++	else
++		rate *= (fbdiv << 24);
++
+ 	rate = do_div_round_closest(rate, (prediv * postdiv1 * postdiv2) << 24);
+ 
+ 	return rate;
+@@ -279,7 +320,7 @@ static int pll_gf40lp_laint_enable(struct clk_hw *hw)
+ 	u32 val;
+ 
+ 	val = pll_readl(pll, PLL_CTRL1);
+-	val &= ~(PLL_INT_CTRL1_PD | PLL_INT_CTRL1_DSMPD |
++	val &= ~(PLL_INT_CTRL1_PD |
+ 		 PLL_INT_CTRL1_FOUTPOSTDIVPD | PLL_INT_CTRL1_FOUTVCOPD);
+ 	pll_writel(pll, val, PLL_CTRL1);
+ 
+@@ -325,12 +366,12 @@ static int pll_gf40lp_laint_set_rate(struct clk_hw *hw, unsigned long rate,
+ 	if (!params || !params->refdiv)
+ 		return -EINVAL;
+ 
+-	vco = params->fref * params->fbdiv / params->refdiv;
++	vco = div_u64(params->fref * params->fbdiv, params->refdiv);
+ 	if (vco < MIN_VCO_LA || vco > MAX_VCO_LA)
+ 		pr_warn("%s: VCO %u is out of range %lu..%lu\n", name, vco,
+ 			MIN_VCO_LA, MAX_VCO_LA);
+ 
+-	val = params->fref / params->refdiv;
++	val = div_u64(params->fref, params->refdiv);
+ 	if (val < MIN_PFD)
+ 		pr_warn("%s: PFD %u is too low (min %lu)\n",
+ 			name, val, MIN_PFD);
+diff --git a/drivers/clk/pistachio/clk.h b/drivers/clk/pistachio/clk.h
+index 52fabbc24624..8d45178dbde3 100644
+--- a/drivers/clk/pistachio/clk.h
++++ b/drivers/clk/pistachio/clk.h
+@@ -95,13 +95,13 @@ struct pistachio_fixed_factor {
+ 	}
+ 
+ struct pistachio_pll_rate_table {
+-	unsigned long fref;
+-	unsigned long fout;
+-	unsigned int refdiv;
+-	unsigned int fbdiv;
+-	unsigned int postdiv1;
+-	unsigned int postdiv2;
+-	unsigned int frac;
++	unsigned long long fref;
++	unsigned long long fout;
++	unsigned long long refdiv;
++	unsigned long long fbdiv;
++	unsigned long long postdiv1;
++	unsigned long long postdiv2;
++	unsigned long long frac;
+ };
+ 
+ enum pistachio_pll_type {
+diff --git a/drivers/clk/pxa/clk-pxa25x.c b/drivers/clk/pxa/clk-pxa25x.c
+index 6cd88d963a7f..542e45ef5087 100644
+--- a/drivers/clk/pxa/clk-pxa25x.c
++++ b/drivers/clk/pxa/clk-pxa25x.c
+@@ -79,7 +79,7 @@ unsigned int pxa25x_get_clk_frequency_khz(int info)
+ 			clks[3] / 1000000, (clks[3] % 1000000) / 10000);
+ 	}
+ 
+-	return (unsigned int)clks[0];
++	return (unsigned int)clks[0] / KHz;
+ }
+ 
+ static unsigned long clk_pxa25x_memory_get_rate(struct clk_hw *hw,
+diff --git a/drivers/clk/pxa/clk-pxa27x.c b/drivers/clk/pxa/clk-pxa27x.c
+index 9a31b77eed23..5b82d30baf9f 100644
+--- a/drivers/clk/pxa/clk-pxa27x.c
++++ b/drivers/clk/pxa/clk-pxa27x.c
+@@ -80,7 +80,7 @@ unsigned int pxa27x_get_clk_frequency_khz(int info)
+ 		pr_info("System bus clock: %ld.%02ldMHz\n",
+ 			clks[4] / 1000000, (clks[4] % 1000000) / 10000);
+ 	}
+-	return (unsigned int)clks[0];
++	return (unsigned int)clks[0] / KHz;
+ }
+ 
+ bool pxa27x_is_ppll_disabled(void)
+diff --git a/drivers/clk/pxa/clk-pxa3xx.c b/drivers/clk/pxa/clk-pxa3xx.c
+index ac03ba49e9d1..4af4eed5f89f 100644
+--- a/drivers/clk/pxa/clk-pxa3xx.c
++++ b/drivers/clk/pxa/clk-pxa3xx.c
+@@ -78,7 +78,7 @@ unsigned int pxa3xx_get_clk_frequency_khz(int info)
+ 		pr_info("System bus clock: %ld.%02ldMHz\n",
+ 			clks[4] / 1000000, (clks[4] % 1000000) / 10000);
+ 	}
+-	return (unsigned int)clks[0];
++	return (unsigned int)clks[0] / KHz;
+ }
+ 
+ static unsigned long clk_pxa3xx_ac97_get_rate(struct clk_hw *hw,
+diff --git a/drivers/clk/qcom/gcc-apq8084.c b/drivers/clk/qcom/gcc-apq8084.c
+index 54a756b90a37..457c540585f9 100644
+--- a/drivers/clk/qcom/gcc-apq8084.c
++++ b/drivers/clk/qcom/gcc-apq8084.c
+@@ -2105,6 +2105,7 @@ static struct clk_branch gcc_ce1_clk = {
+ 				"ce1_clk_src",
+ 			},
+ 			.num_parents = 1,
++			.flags = CLK_SET_RATE_PARENT,
+ 			.ops = &clk_branch2_ops,
+ 		},
+ 	},
+diff --git a/drivers/clk/qcom/gcc-msm8916.c b/drivers/clk/qcom/gcc-msm8916.c
+index c66f7bc2ae87..5d75bffab141 100644
+--- a/drivers/clk/qcom/gcc-msm8916.c
++++ b/drivers/clk/qcom/gcc-msm8916.c
+@@ -2278,7 +2278,7 @@ static struct clk_branch gcc_prng_ahb_clk = {
+ 	.halt_check = BRANCH_HALT_VOTED,
+ 	.clkr = {
+ 		.enable_reg = 0x45004,
+-		.enable_mask = BIT(0),
++		.enable_mask = BIT(8),
+ 		.hw.init = &(struct clk_init_data){
+ 			.name = "gcc_prng_ahb_clk",
+ 			.parent_names = (const char *[]){
+diff --git a/drivers/clk/qcom/gcc-msm8974.c b/drivers/clk/qcom/gcc-msm8974.c
+index c39d09874e74..f06a082e3e87 100644
+--- a/drivers/clk/qcom/gcc-msm8974.c
++++ b/drivers/clk/qcom/gcc-msm8974.c
+@@ -1783,6 +1783,7 @@ static struct clk_branch gcc_ce1_clk = {
+ 				"ce1_clk_src",
+ 			},
+ 			.num_parents = 1,
++			.flags = CLK_SET_RATE_PARENT,
+ 			.ops = &clk_branch2_ops,
+ 		},
+ 	},
+diff --git a/drivers/clk/rockchip/clk-rk3288.c b/drivers/clk/rockchip/clk-rk3288.c
+index 4f817ed9e6ee..0211162ee879 100644
+--- a/drivers/clk/rockchip/clk-rk3288.c
++++ b/drivers/clk/rockchip/clk-rk3288.c
+@@ -578,7 +578,7 @@ static struct rockchip_clk_branch rk3288_clk_branches[] __initdata = {
+ 	COMPOSITE(0, "mac_pll_src", mux_pll_src_npll_cpll_gpll_p, 0,
+ 			RK3288_CLKSEL_CON(21), 0, 2, MFLAGS, 8, 5, DFLAGS,
+ 			RK3288_CLKGATE_CON(2), 5, GFLAGS),
+-	MUX(SCLK_MAC, "mac_clk", mux_mac_p, 0,
++	MUX(SCLK_MAC, "mac_clk", mux_mac_p, CLK_SET_RATE_PARENT,
+ 			RK3288_CLKSEL_CON(21), 4, 1, MFLAGS),
+ 	GATE(SCLK_MACREF_OUT, "sclk_macref_out", "mac_clk", 0,
+ 			RK3288_CLKGATE_CON(5), 3, GFLAGS),
+diff --git a/drivers/clk/samsung/clk-exynos4.c b/drivers/clk/samsung/clk-exynos4.c
+index cae2c048488d..d1af2fc53c5f 100644
+--- a/drivers/clk/samsung/clk-exynos4.c
++++ b/drivers/clk/samsung/clk-exynos4.c
+@@ -86,6 +86,7 @@
+ #define DIV_PERIL4		0xc560
+ #define DIV_PERIL5		0xc564
+ #define E4X12_DIV_CAM1		0xc568
++#define E4X12_GATE_BUS_FSYS1	0xc744
+ #define GATE_SCLK_CAM		0xc820
+ #define GATE_IP_CAM		0xc920
+ #define GATE_IP_TV		0xc924
+@@ -1097,6 +1098,7 @@ static struct samsung_gate_clock exynos4x12_gate_clks[] __initdata = {
+ 		0),
+ 	GATE(CLK_PPMUIMAGE, "ppmuimage", "aclk200", E4X12_GATE_IP_IMAGE, 9, 0,
+ 		0),
++	GATE(CLK_TSADC, "tsadc", "aclk133", E4X12_GATE_BUS_FSYS1, 16, 0, 0),
+ 	GATE(CLK_MIPI_HSI, "mipi_hsi", "aclk133", GATE_IP_FSYS, 10, 0, 0),
+ 	GATE(CLK_CHIPID, "chipid", "aclk100", E4X12_GATE_IP_PERIR, 0, 0, 0),
+ 	GATE(CLK_SYSREG, "sysreg", "aclk100", E4X12_GATE_IP_PERIR, 1,
+diff --git a/drivers/clk/samsung/clk-s5pv210.c b/drivers/clk/samsung/clk-s5pv210.c
+index cf7e8fa7b624..793cb1d2f7ae 100644
+--- a/drivers/clk/samsung/clk-s5pv210.c
++++ b/drivers/clk/samsung/clk-s5pv210.c
+@@ -828,6 +828,8 @@ static void __init __s5pv210_clk_init(struct device_node *np,
+ 
+ 	s5pv210_clk_sleep_init();
+ 
++	samsung_clk_of_add_provider(np, ctx);
++
+ 	pr_info("%s clocks: mout_apll = %ld, mout_mpll = %ld\n"
+ 		"\tmout_epll = %ld, mout_vpll = %ld\n",
+ 		is_s5p6442 ? "S5P6442" : "S5PV210",
+diff --git a/drivers/clk/versatile/clk-sp810.c b/drivers/clk/versatile/clk-sp810.c
+index a96dd8e53fdb..b674ffc4f5ce 100644
+--- a/drivers/clk/versatile/clk-sp810.c
++++ b/drivers/clk/versatile/clk-sp810.c
+@@ -128,8 +128,8 @@ static struct clk *clk_sp810_timerclken_of_get(struct of_phandle_args *clkspec,
+ {
+ 	struct clk_sp810 *sp810 = data;
+ 
+-	if (WARN_ON(clkspec->args_count != 1 || clkspec->args[0] >
+-			ARRAY_SIZE(sp810->timerclken)))
++	if (WARN_ON(clkspec->args_count != 1 ||
++		    clkspec->args[0] >=	ARRAY_SIZE(sp810->timerclken)))
+ 		return NULL;
+ 
+ 	return sp810->timerclken[clkspec->args[0]].clk;
+diff --git a/drivers/crypto/vmx/aes_ctr.c b/drivers/crypto/vmx/aes_ctr.c
+index 7adae42a7b79..ed3838781b4c 100644
+--- a/drivers/crypto/vmx/aes_ctr.c
++++ b/drivers/crypto/vmx/aes_ctr.c
+@@ -113,6 +113,7 @@ static int p8_aes_ctr_crypt(struct blkcipher_desc *desc,
+ 			    struct scatterlist *src, unsigned int nbytes)
+ {
+ 	int ret;
++	u64 inc;
+ 	struct blkcipher_walk walk;
+ 	struct p8_aes_ctr_ctx *ctx =
+ 		crypto_tfm_ctx(crypto_blkcipher_tfm(desc->tfm));
+@@ -140,7 +141,12 @@ static int p8_aes_ctr_crypt(struct blkcipher_desc *desc,
+ 						    walk.iv);
+ 			pagefault_enable();
+ 
+-			crypto_inc(walk.iv, AES_BLOCK_SIZE);
++			/* We need to update IV mostly for last bytes/round */
++			inc = (nbytes & AES_BLOCK_MASK) / AES_BLOCK_SIZE;
++			if (inc > 0)
++				while (inc--)
++					crypto_inc(walk.iv, AES_BLOCK_SIZE);
++
+ 			nbytes &= AES_BLOCK_SIZE - 1;
+ 			ret = blkcipher_walk_done(desc, &walk, nbytes);
+ 		}
+diff --git a/drivers/crypto/vmx/aesp8-ppc.pl b/drivers/crypto/vmx/aesp8-ppc.pl
+index 6c5c20c6108e..228053921b3f 100644
+--- a/drivers/crypto/vmx/aesp8-ppc.pl
++++ b/drivers/crypto/vmx/aesp8-ppc.pl
+@@ -1437,28 +1437,28 @@ Load_ctr32_enc_key:
+ 	?vperm		v31,v31,$out0,$keyperm
+ 	lvx		v25,$x10,$key_		# pre-load round[2]
+ 
+-	vadduwm		$two,$one,$one
++	vadduqm		$two,$one,$one
+ 	subi		$inp,$inp,15		# undo "caller"
+ 	$SHL		$len,$len,4
+ 
+-	vadduwm		$out1,$ivec,$one	# counter values ...
+-	vadduwm		$out2,$ivec,$two
++	vadduqm		$out1,$ivec,$one	# counter values ...
++	vadduqm		$out2,$ivec,$two
+ 	vxor		$out0,$ivec,$rndkey0	# ... xored with rndkey[0]
+ 	 le?li		$idx,8
+-	vadduwm		$out3,$out1,$two
++	vadduqm		$out3,$out1,$two
+ 	vxor		$out1,$out1,$rndkey0
+ 	 le?lvsl	$inpperm,0,$idx
+-	vadduwm		$out4,$out2,$two
++	vadduqm		$out4,$out2,$two
+ 	vxor		$out2,$out2,$rndkey0
+ 	 le?vspltisb	$tmp,0x0f
+-	vadduwm		$out5,$out3,$two
++	vadduqm		$out5,$out3,$two
+ 	vxor		$out3,$out3,$rndkey0
+ 	 le?vxor	$inpperm,$inpperm,$tmp	# transform for lvx_u/stvx_u
+-	vadduwm		$out6,$out4,$two
++	vadduqm		$out6,$out4,$two
+ 	vxor		$out4,$out4,$rndkey0
+-	vadduwm		$out7,$out5,$two
++	vadduqm		$out7,$out5,$two
+ 	vxor		$out5,$out5,$rndkey0
+-	vadduwm		$ivec,$out6,$two	# next counter value
++	vadduqm		$ivec,$out6,$two	# next counter value
+ 	vxor		$out6,$out6,$rndkey0
+ 	vxor		$out7,$out7,$rndkey0
+ 
+@@ -1594,27 +1594,27 @@ Loop_ctr32_enc8x_middle:
+ 
+ 	vcipherlast	$in0,$out0,$in0
+ 	vcipherlast	$in1,$out1,$in1
+-	 vadduwm	$out1,$ivec,$one	# counter values ...
++	 vadduqm	$out1,$ivec,$one	# counter values ...
+ 	vcipherlast	$in2,$out2,$in2
+-	 vadduwm	$out2,$ivec,$two
++	 vadduqm	$out2,$ivec,$two
+ 	 vxor		$out0,$ivec,$rndkey0	# ... xored with rndkey[0]
+ 	vcipherlast	$in3,$out3,$in3
+-	 vadduwm	$out3,$out1,$two
++	 vadduqm	$out3,$out1,$two
+ 	 vxor		$out1,$out1,$rndkey0
+ 	vcipherlast	$in4,$out4,$in4
+-	 vadduwm	$out4,$out2,$two
++	 vadduqm	$out4,$out2,$two
+ 	 vxor		$out2,$out2,$rndkey0
+ 	vcipherlast	$in5,$out5,$in5
+-	 vadduwm	$out5,$out3,$two
++	 vadduqm	$out5,$out3,$two
+ 	 vxor		$out3,$out3,$rndkey0
+ 	vcipherlast	$in6,$out6,$in6
+-	 vadduwm	$out6,$out4,$two
++	 vadduqm	$out6,$out4,$two
+ 	 vxor		$out4,$out4,$rndkey0
+ 	vcipherlast	$in7,$out7,$in7
+-	 vadduwm	$out7,$out5,$two
++	 vadduqm	$out7,$out5,$two
+ 	 vxor		$out5,$out5,$rndkey0
+ 	le?vperm	$in0,$in0,$in0,$inpperm
+-	 vadduwm	$ivec,$out6,$two	# next counter value
++	 vadduqm	$ivec,$out6,$two	# next counter value
+ 	 vxor		$out6,$out6,$rndkey0
+ 	le?vperm	$in1,$in1,$in1,$inpperm
+ 	 vxor		$out7,$out7,$rndkey0
+diff --git a/drivers/crypto/vmx/ghashp8-ppc.pl b/drivers/crypto/vmx/ghashp8-ppc.pl
+index 0a6f899839dd..d8429cb71f02 100644
+--- a/drivers/crypto/vmx/ghashp8-ppc.pl
++++ b/drivers/crypto/vmx/ghashp8-ppc.pl
+@@ -61,6 +61,12 @@ $code=<<___;
+ 	mtspr		256,r0
+ 	li		r10,0x30
+ 	lvx_u		$H,0,r4			# load H
++	le?xor		r7,r7,r7
++	le?addi		r7,r7,0x8		# need a vperm start with 08
++	le?lvsr		5,0,r7
++	le?vspltisb	6,0x0f
++	le?vxor		5,5,6			# set a b-endian mask
++	le?vperm	$H,$H,$H,5
+ 
+ 	vspltisb	$xC2,-16		# 0xf0
+ 	vspltisb	$t0,1			# one
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+index 27df17a0e620..89c3dd62ba21 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+@@ -75,6 +75,11 @@ void amdgpu_connector_hotplug(struct drm_connector *connector)
+ 			if (!amdgpu_display_hpd_sense(adev, amdgpu_connector->hpd.hpd)) {
+ 				drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
+ 			} else if (amdgpu_atombios_dp_needs_link_train(amdgpu_connector)) {
++				/* Don't try to start link training before we
++				 * have the dpcd */
++				if (!amdgpu_atombios_dp_get_dpcd(amdgpu_connector))
++					return;
++
+ 				/* set it to OFF so that drm_helper_connector_dpms()
+ 				 * won't return immediately since the current state
+ 				 * is ON at this point.
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
+index db5422e65ec5..a8207e5a8549 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
+@@ -97,18 +97,12 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, unsigned ring_size,
+ 			/* add 8 bytes for the rptr/wptr shadows and
+ 			 * add them to the end of the ring allocation.
+ 			 */
+-			adev->irq.ih.ring = kzalloc(adev->irq.ih.ring_size + 8, GFP_KERNEL);
++			adev->irq.ih.ring = pci_alloc_consistent(adev->pdev,
++								 adev->irq.ih.ring_size + 8,
++								 &adev->irq.ih.rb_dma_addr);
+ 			if (adev->irq.ih.ring == NULL)
+ 				return -ENOMEM;
+-			adev->irq.ih.rb_dma_addr = pci_map_single(adev->pdev,
+-								  (void *)adev->irq.ih.ring,
+-								  adev->irq.ih.ring_size,
+-								  PCI_DMA_BIDIRECTIONAL);
+-			if (pci_dma_mapping_error(adev->pdev, adev->irq.ih.rb_dma_addr)) {
+-				dev_err(&adev->pdev->dev, "Failed to DMA MAP the IH RB page\n");
+-				kfree((void *)adev->irq.ih.ring);
+-				return -ENOMEM;
+-			}
++			memset((void *)adev->irq.ih.ring, 0, adev->irq.ih.ring_size + 8);
+ 			adev->irq.ih.wptr_offs = (adev->irq.ih.ring_size / 4) + 0;
+ 			adev->irq.ih.rptr_offs = (adev->irq.ih.ring_size / 4) + 1;
+ 		}
+@@ -148,9 +142,9 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev)
+ 			/* add 8 bytes for the rptr/wptr shadows and
+ 			 * add them to the end of the ring allocation.
+ 			 */
+-			pci_unmap_single(adev->pdev, adev->irq.ih.rb_dma_addr,
+-					 adev->irq.ih.ring_size + 8, PCI_DMA_BIDIRECTIONAL);
+-			kfree((void *)adev->irq.ih.ring);
++			pci_free_consistent(adev->pdev, adev->irq.ih.ring_size + 8,
++					    (void *)adev->irq.ih.ring,
++					    adev->irq.ih.rb_dma_addr);
+ 			adev->irq.ih.ring = NULL;
+ 		}
+ 	} else {
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+index f5c22556ec2c..2abc661845b6 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+@@ -374,7 +374,8 @@ static int amdgpu_uvd_cs_msg_decode(uint32_t *msg, unsigned buf_sizes[])
+ 	unsigned height_in_mb = ALIGN(height / 16, 2);
+ 	unsigned fs_in_mb = width_in_mb * height_in_mb;
+ 
+-	unsigned image_size, tmp, min_dpb_size, num_dpb_buffer, min_ctx_size;
++	unsigned image_size, tmp, min_dpb_size, num_dpb_buffer;
++	unsigned min_ctx_size = 0;
+ 
+ 	image_size = width * height;
+ 	image_size += image_size / 2;
+diff --git a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
+index 9ba0a7d5bc8e..92b6acadfc52 100644
+--- a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
++++ b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
+@@ -139,7 +139,8 @@ amdgpu_atombios_dp_aux_transfer(struct drm_dp_aux *aux, struct drm_dp_aux_msg *m
+ 
+ 	tx_buf[0] = msg->address & 0xff;
+ 	tx_buf[1] = msg->address >> 8;
+-	tx_buf[2] = msg->request << 4;
++	tx_buf[2] = (msg->request << 4) |
++		((msg->address >> 16) & 0xf);
+ 	tx_buf[3] = msg->size ? (msg->size - 1) : 0;
+ 
+ 	switch (msg->request & ~DP_AUX_I2C_MOT) {
+diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
+index e70a26f587a0..e774a437dd65 100644
+--- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
+@@ -1331,7 +1331,7 @@ static void dce_v10_0_program_watermarks(struct amdgpu_device *adev,
+ 	tmp = REG_SET_FIELD(wm_mask, DPG_WATERMARK_MASK_CONTROL, URGENCY_WATERMARK_MASK, 2);
+ 	WREG32(mmDPG_WATERMARK_MASK_CONTROL + amdgpu_crtc->crtc_offset, tmp);
+ 	tmp = RREG32(mmDPG_PIPE_URGENCY_CONTROL + amdgpu_crtc->crtc_offset);
+-	tmp = REG_SET_FIELD(tmp, DPG_PIPE_URGENCY_CONTROL, URGENCY_LOW_WATERMARK, latency_watermark_a);
++	tmp = REG_SET_FIELD(tmp, DPG_PIPE_URGENCY_CONTROL, URGENCY_LOW_WATERMARK, latency_watermark_b);
+ 	tmp = REG_SET_FIELD(tmp, DPG_PIPE_URGENCY_CONTROL, URGENCY_HIGH_WATERMARK, line_time);
+ 	WREG32(mmDPG_PIPE_URGENCY_CONTROL + amdgpu_crtc->crtc_offset, tmp);
+ 	/* restore original selection */
+diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
+index dcb402ee048a..c4a21a7afd68 100644
+--- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
+@@ -1329,7 +1329,7 @@ static void dce_v11_0_program_watermarks(struct amdgpu_device *adev,
+ 	tmp = REG_SET_FIELD(wm_mask, DPG_WATERMARK_MASK_CONTROL, URGENCY_WATERMARK_MASK, 2);
+ 	WREG32(mmDPG_WATERMARK_MASK_CONTROL + amdgpu_crtc->crtc_offset, tmp);
+ 	tmp = RREG32(mmDPG_PIPE_URGENCY_CONTROL + amdgpu_crtc->crtc_offset);
+-	tmp = REG_SET_FIELD(tmp, DPG_PIPE_URGENCY_CONTROL, URGENCY_LOW_WATERMARK, latency_watermark_a);
++	tmp = REG_SET_FIELD(tmp, DPG_PIPE_URGENCY_CONTROL, URGENCY_LOW_WATERMARK, latency_watermark_b);
+ 	tmp = REG_SET_FIELD(tmp, DPG_PIPE_URGENCY_CONTROL, URGENCY_HIGH_WATERMARK, line_time);
+ 	WREG32(mmDPG_PIPE_URGENCY_CONTROL + amdgpu_crtc->crtc_offset, tmp);
+ 	/* restore original selection */
+diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
+index 884b4f9b81c4..603146ec9868 100644
+--- a/drivers/gpu/drm/i915/i915_drv.c
++++ b/drivers/gpu/drm/i915/i915_drv.c
+@@ -683,15 +683,18 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation)
+ 
+ 	pci_disable_device(drm_dev->pdev);
+ 	/*
+-	 * During hibernation on some GEN4 platforms the BIOS may try to access
++	 * During hibernation on some platforms the BIOS may try to access
+ 	 * the device even though it's already in D3 and hang the machine. So
+ 	 * leave the device in D0 on those platforms and hope the BIOS will
+-	 * power down the device properly. Platforms where this was seen:
+-	 * Lenovo Thinkpad X301, X61s
++	 * power down the device properly. The issue was seen on multiple old
++	 * GENs with different BIOS vendors, so having an explicit blacklist
++	 * is impractical; apply the workaround on everything pre GEN6. The
++	 * platforms where the issue was seen:
++	 * Lenovo Thinkpad X301, X61s, X60, T60, X41
++	 * Fujitsu FSC S7110
++	 * Acer Aspire 1830T
+ 	 */
+-	if (!(hibernation &&
+-	      drm_dev->pdev->subsystem_vendor == PCI_VENDOR_ID_LENOVO &&
+-	      INTEL_INFO(dev_priv)->gen == 4))
++	if (!(hibernation && INTEL_INFO(dev_priv)->gen < 6))
+ 		pci_set_power_state(drm_dev->pdev, PCI_D3hot);
+ 
+ 	return 0;
+diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
+index fd1de451c8c6..e1df8feb05be 100644
+--- a/drivers/gpu/drm/i915/i915_drv.h
++++ b/drivers/gpu/drm/i915/i915_drv.h
+@@ -3303,13 +3303,13 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val);
+ #define I915_READ64(reg)	dev_priv->uncore.funcs.mmio_readq(dev_priv, (reg), true)
+ 
+ #define I915_READ64_2x32(lower_reg, upper_reg) ({			\
+-	u32 upper, lower, tmp;						\
+-	tmp = I915_READ(upper_reg);					\
++	u32 upper, lower, old_upper, loop = 0;				\
++	upper = I915_READ(upper_reg);					\
+ 	do {								\
+-		upper = tmp;						\
++		old_upper = upper;					\
+ 		lower = I915_READ(lower_reg);				\
+-		tmp = I915_READ(upper_reg);				\
+-	} while (upper != tmp);						\
++		upper = I915_READ(upper_reg);				\
++	} while (upper != old_upper && loop++ < 2);			\
+ 	(u64)upper << 32 | lower; })
+ 
+ #define POSTING_READ(reg)	(void)I915_READ_NOTRACE(reg)
+diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+index a7fa14516cda..5e6b4a29e503 100644
+--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
++++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+@@ -1024,6 +1024,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
+ 		u32 old_read = obj->base.read_domains;
+ 		u32 old_write = obj->base.write_domain;
+ 
++		obj->dirty = 1; /* be paranoid  */
+ 		obj->base.write_domain = obj->base.pending_write_domain;
+ 		if (obj->base.write_domain == 0)
+ 			obj->base.pending_read_domains |= obj->base.read_domains;
+@@ -1031,7 +1032,6 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
+ 
+ 		i915_vma_move_to_active(vma, ring);
+ 		if (obj->base.write_domain) {
+-			obj->dirty = 1;
+ 			i915_gem_request_assign(&obj->last_write_req, req);
+ 
+ 			intel_fb_obj_invalidate(obj, ring, ORIGIN_CS);
+diff --git a/drivers/gpu/drm/i915/intel_csr.c b/drivers/gpu/drm/i915/intel_csr.c
+index bcb41e61877d..fb842d6e343f 100644
+--- a/drivers/gpu/drm/i915/intel_csr.c
++++ b/drivers/gpu/drm/i915/intel_csr.c
+@@ -350,7 +350,7 @@ static void finish_csr_load(const struct firmware *fw, void *context)
+ 	}
+ 	csr->mmio_count = dmc_header->mmio_count;
+ 	for (i = 0; i < dmc_header->mmio_count; i++) {
+-		if (dmc_header->mmioaddr[i] < CSR_MMIO_START_RANGE &&
++		if (dmc_header->mmioaddr[i] < CSR_MMIO_START_RANGE ||
+ 			dmc_header->mmioaddr[i] > CSR_MMIO_END_RANGE) {
+ 			DRM_ERROR(" Firmware has wrong mmio address 0x%x\n",
+ 						dmc_header->mmioaddr[i]);
+diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
+index 87476ff181dd..107c6c0519fd 100644
+--- a/drivers/gpu/drm/i915/intel_display.c
++++ b/drivers/gpu/drm/i915/intel_display.c
+@@ -14665,6 +14665,24 @@ void intel_modeset_init(struct drm_device *dev)
+ 	if (INTEL_INFO(dev)->num_pipes == 0)
+ 		return;
+ 
++	/*
++	 * There may be no VBT; and if the BIOS enabled SSC we can
++	 * just keep using it to avoid unnecessary flicker.  Whereas if the
++	 * BIOS isn't using it, don't assume it will work even if the VBT
++	 * indicates as much.
++	 */
++	if (HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev)) {
++		bool bios_lvds_use_ssc = !!(I915_READ(PCH_DREF_CONTROL) &
++					    DREF_SSC1_ENABLE);
++
++		if (dev_priv->vbt.lvds_use_ssc != bios_lvds_use_ssc) {
++			DRM_DEBUG_KMS("SSC %sabled by BIOS, overriding VBT which says %sabled\n",
++				     bios_lvds_use_ssc ? "en" : "dis",
++				     dev_priv->vbt.lvds_use_ssc ? "en" : "dis");
++			dev_priv->vbt.lvds_use_ssc = bios_lvds_use_ssc;
++		}
++	}
++
+ 	intel_init_display(dev);
+ 	intel_init_audio(dev);
+ 
+@@ -15160,7 +15178,6 @@ void intel_modeset_setup_hw_state(struct drm_device *dev,
+ 
+ void intel_modeset_gem_init(struct drm_device *dev)
+ {
+-	struct drm_i915_private *dev_priv = dev->dev_private;
+ 	struct drm_crtc *c;
+ 	struct drm_i915_gem_object *obj;
+ 	int ret;
+@@ -15169,16 +15186,6 @@ void intel_modeset_gem_init(struct drm_device *dev)
+ 	intel_init_gt_powersave(dev);
+ 	mutex_unlock(&dev->struct_mutex);
+ 
+-	/*
+-	 * There may be no VBT; and if the BIOS enabled SSC we can
+-	 * just keep using it to avoid unnecessary flicker.  Whereas if the
+-	 * BIOS isn't using it, don't assume it will work even if the VBT
+-	 * indicates as much.
+-	 */
+-	if (HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev))
+-		dev_priv->vbt.lvds_use_ssc = !!(I915_READ(PCH_DREF_CONTROL) &
+-						DREF_SSC1_ENABLE);
+-
+ 	intel_modeset_init_hw(dev);
+ 
+ 	intel_setup_overlay(dev);
+diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
+index 1df0e1fe235f..bd8f8863eb0e 100644
+--- a/drivers/gpu/drm/i915/intel_dp.c
++++ b/drivers/gpu/drm/i915/intel_dp.c
+@@ -4987,9 +4987,12 @@ intel_dp_hpd_pulse(struct intel_digital_port *intel_dig_port, bool long_hpd)
+ 
+ 		intel_dp_probe_oui(intel_dp);
+ 
+-		if (!intel_dp_probe_mst(intel_dp))
++		if (!intel_dp_probe_mst(intel_dp)) {
++			drm_modeset_lock(&dev->mode_config.connection_mutex, NULL);
++			intel_dp_check_link_status(intel_dp);
++			drm_modeset_unlock(&dev->mode_config.connection_mutex);
+ 			goto mst_fail;
+-
++		}
+ 	} else {
+ 		if (intel_dp->is_mst) {
+ 			if (intel_dp_check_mst_status(intel_dp) == -EINVAL)
+@@ -4997,10 +5000,6 @@ intel_dp_hpd_pulse(struct intel_digital_port *intel_dig_port, bool long_hpd)
+ 		}
+ 
+ 		if (!intel_dp->is_mst) {
+-			/*
+-			 * we'll check the link status via the normal hot plug path later -
+-			 * but for short hpds we should check it now
+-			 */
+ 			drm_modeset_lock(&dev->mode_config.connection_mutex, NULL);
+ 			intel_dp_check_link_status(intel_dp);
+ 			drm_modeset_unlock(&dev->mode_config.connection_mutex);
+diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
+index b5a5558ecd63..68b25dd525f0 100644
+--- a/drivers/gpu/drm/i915/intel_dsi.c
++++ b/drivers/gpu/drm/i915/intel_dsi.c
+@@ -1036,11 +1036,7 @@ void intel_dsi_init(struct drm_device *dev)
+ 	intel_connector->unregister = intel_connector_unregister;
+ 
+ 	/* Pipe A maps to MIPI DSI port A, pipe B maps to MIPI DSI port C */
+-	if (dev_priv->vbt.dsi.config->dual_link) {
+-		/* XXX: does dual link work on either pipe? */
+-		intel_encoder->crtc_mask = (1 << PIPE_A);
+-		intel_dsi->ports = ((1 << PORT_A) | (1 << PORT_C));
+-	} else if (dev_priv->vbt.dsi.port == DVO_PORT_MIPIA) {
++	if (dev_priv->vbt.dsi.port == DVO_PORT_MIPIA) {
+ 		intel_encoder->crtc_mask = (1 << PIPE_A);
+ 		intel_dsi->ports = (1 << PORT_A);
+ 	} else if (dev_priv->vbt.dsi.port == DVO_PORT_MIPIC) {
+@@ -1048,6 +1044,9 @@ void intel_dsi_init(struct drm_device *dev)
+ 		intel_dsi->ports = (1 << PORT_C);
+ 	}
+ 
++	if (dev_priv->vbt.dsi.config->dual_link)
++		intel_dsi->ports = ((1 << PORT_A) | (1 << PORT_C));
++
+ 	/* Create a DSI host (and a device) for each port. */
+ 	for_each_dsi_port(port, intel_dsi->ports) {
+ 		struct intel_dsi_host *host;
+diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
+index a8dbb3ef4e3c..7c6225c84ba6 100644
+--- a/drivers/gpu/drm/qxl/qxl_display.c
++++ b/drivers/gpu/drm/qxl/qxl_display.c
+@@ -160,9 +160,35 @@ static int qxl_add_monitors_config_modes(struct drm_connector *connector,
+ 	*pwidth = head->width;
+ 	*pheight = head->height;
+ 	drm_mode_probed_add(connector, mode);
++	/* remember the last custom size for mode validation */
++	qdev->monitors_config_width = mode->hdisplay;
++	qdev->monitors_config_height = mode->vdisplay;
+ 	return 1;
+ }
+ 
++static struct mode_size {
++	int w;
++	int h;
++} common_modes[] = {
++	{ 640,  480},
++	{ 720,  480},
++	{ 800,  600},
++	{ 848,  480},
++	{1024,  768},
++	{1152,  768},
++	{1280,  720},
++	{1280,  800},
++	{1280,  854},
++	{1280,  960},
++	{1280, 1024},
++	{1440,  900},
++	{1400, 1050},
++	{1680, 1050},
++	{1600, 1200},
++	{1920, 1080},
++	{1920, 1200}
++};
++
+ static int qxl_add_common_modes(struct drm_connector *connector,
+                                 unsigned pwidth,
+                                 unsigned pheight)
+@@ -170,29 +196,6 @@ static int qxl_add_common_modes(struct drm_connector *connector,
+ 	struct drm_device *dev = connector->dev;
+ 	struct drm_display_mode *mode = NULL;
+ 	int i;
+-	struct mode_size {
+-		int w;
+-		int h;
+-	} common_modes[] = {
+-		{ 640,  480},
+-		{ 720,  480},
+-		{ 800,  600},
+-		{ 848,  480},
+-		{1024,  768},
+-		{1152,  768},
+-		{1280,  720},
+-		{1280,  800},
+-		{1280,  854},
+-		{1280,  960},
+-		{1280, 1024},
+-		{1440,  900},
+-		{1400, 1050},
+-		{1680, 1050},
+-		{1600, 1200},
+-		{1920, 1080},
+-		{1920, 1200}
+-	};
+-
+ 	for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
+ 		mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h,
+ 				    60, false, false, false);
+@@ -823,11 +826,22 @@ static int qxl_conn_get_modes(struct drm_connector *connector)
+ static int qxl_conn_mode_valid(struct drm_connector *connector,
+ 			       struct drm_display_mode *mode)
+ {
++	struct drm_device *ddev = connector->dev;
++	struct qxl_device *qdev = ddev->dev_private;
++	int i;
++
+ 	/* TODO: is this called for user defined modes? (xrandr --add-mode)
+ 	 * TODO: check that the mode fits in the framebuffer */
+-	DRM_DEBUG("%s: %dx%d status=%d\n", mode->name, mode->hdisplay,
+-		  mode->vdisplay, mode->status);
+-	return MODE_OK;
++
++	if(qdev->monitors_config_width == mode->hdisplay &&
++	   qdev->monitors_config_height == mode->vdisplay)
++		return MODE_OK;
++
++	for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
++		if (common_modes[i].w == mode->hdisplay && common_modes[i].h == mode->vdisplay)
++			return MODE_OK;
++	}
++	return MODE_BAD;
+ }
+ 
+ static struct drm_encoder *qxl_best_encoder(struct drm_connector *connector)
+diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
+index d8549690801d..01a86948eb8c 100644
+--- a/drivers/gpu/drm/qxl/qxl_drv.h
++++ b/drivers/gpu/drm/qxl/qxl_drv.h
+@@ -325,6 +325,8 @@ struct qxl_device {
+ 	struct work_struct fb_work;
+ 
+ 	struct drm_property *hotplug_mode_update_property;
++	int monitors_config_width;
++	int monitors_config_height;
+ };
+ 
+ /* forward declaration for QXL_INFO_IO */
+diff --git a/drivers/gpu/drm/radeon/atombios_dp.c b/drivers/gpu/drm/radeon/atombios_dp.c
+index f81e0d7d0232..9cd49c584263 100644
+--- a/drivers/gpu/drm/radeon/atombios_dp.c
++++ b/drivers/gpu/drm/radeon/atombios_dp.c
+@@ -171,8 +171,9 @@ radeon_dp_aux_transfer_atom(struct drm_dp_aux *aux, struct drm_dp_aux_msg *msg)
+ 		return -E2BIG;
+ 
+ 	tx_buf[0] = msg->address & 0xff;
+-	tx_buf[1] = msg->address >> 8;
+-	tx_buf[2] = msg->request << 4;
++	tx_buf[1] = (msg->address >> 8) & 0xff;
++	tx_buf[2] = (msg->request << 4) |
++		((msg->address >> 16) & 0xf);
+ 	tx_buf[3] = msg->size ? (msg->size - 1) : 0;
+ 
+ 	switch (msg->request & ~DP_AUX_I2C_MOT) {
+diff --git a/drivers/gpu/drm/radeon/radeon_audio.c b/drivers/gpu/drm/radeon/radeon_audio.c
+index fbc8d88d6e5d..2c02e99b5f95 100644
+--- a/drivers/gpu/drm/radeon/radeon_audio.c
++++ b/drivers/gpu/drm/radeon/radeon_audio.c
+@@ -522,13 +522,15 @@ static int radeon_audio_set_avi_packet(struct drm_encoder *encoder,
+ 		return err;
+ 	}
+ 
+-	if (drm_rgb_quant_range_selectable(radeon_connector_edid(connector))) {
+-		if (radeon_encoder->output_csc == RADEON_OUTPUT_CSC_TVRGB)
+-			frame.quantization_range = HDMI_QUANTIZATION_RANGE_LIMITED;
+-		else
+-			frame.quantization_range = HDMI_QUANTIZATION_RANGE_FULL;
+-	} else {
+-		frame.quantization_range = HDMI_QUANTIZATION_RANGE_DEFAULT;
++	if (radeon_encoder->output_csc != RADEON_OUTPUT_CSC_BYPASS) {
++		if (drm_rgb_quant_range_selectable(radeon_connector_edid(connector))) {
++			if (radeon_encoder->output_csc == RADEON_OUTPUT_CSC_TVRGB)
++				frame.quantization_range = HDMI_QUANTIZATION_RANGE_LIMITED;
++			else
++				frame.quantization_range = HDMI_QUANTIZATION_RANGE_FULL;
++		} else {
++			frame.quantization_range = HDMI_QUANTIZATION_RANGE_DEFAULT;
++		}
+ 	}
+ 
+ 	err = hdmi_avi_infoframe_pack(&frame, buffer, sizeof(buffer));
+diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c
+index 94b21ae70ef7..5a2cafb4f1bc 100644
+--- a/drivers/gpu/drm/radeon/radeon_connectors.c
++++ b/drivers/gpu/drm/radeon/radeon_connectors.c
+@@ -95,6 +95,11 @@ void radeon_connector_hotplug(struct drm_connector *connector)
+ 			if (!radeon_hpd_sense(rdev, radeon_connector->hpd.hpd)) {
+ 				drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
+ 			} else if (radeon_dp_needs_link_train(radeon_connector)) {
++				/* Don't try to start link training before we
++				 * have the dpcd */
++				if (!radeon_dp_getdpcd(radeon_connector))
++					return;
++
+ 				/* set it to OFF so that drm_helper_connector_dpms()
+ 				 * won't return immediately since the current state
+ 				 * is ON at this point.
+diff --git a/drivers/gpu/drm/radeon/radeon_dp_auxch.c b/drivers/gpu/drm/radeon/radeon_dp_auxch.c
+index fcbd60bb0349..3b0c229d7dcd 100644
+--- a/drivers/gpu/drm/radeon/radeon_dp_auxch.c
++++ b/drivers/gpu/drm/radeon/radeon_dp_auxch.c
+@@ -116,8 +116,8 @@ radeon_dp_aux_transfer_native(struct drm_dp_aux *aux, struct drm_dp_aux_msg *msg
+ 	       AUX_SW_WR_BYTES(bytes));
+ 
+ 	/* write the data header into the registers */
+-	/* request, addres, msg size */
+-	byte = (msg->request << 4);
++	/* request, address, msg size */
++	byte = (msg->request << 4) | ((msg->address >> 16) & 0xf);
+ 	WREG32(AUX_SW_DATA + aux_offset[instance],
+ 	       AUX_SW_DATA_MASK(byte) | AUX_SW_AUTOINCREMENT_DISABLE);
+ 
+diff --git a/drivers/hid/hid-cp2112.c b/drivers/hid/hid-cp2112.c
+index a2dbbbe0d8d7..39bf74793b8b 100644
+--- a/drivers/hid/hid-cp2112.c
++++ b/drivers/hid/hid-cp2112.c
+@@ -537,7 +537,7 @@ static int cp2112_xfer(struct i2c_adapter *adap, u16 addr,
+ 	struct cp2112_device *dev = (struct cp2112_device *)adap->algo_data;
+ 	struct hid_device *hdev = dev->hdev;
+ 	u8 buf[64];
+-	__be16 word;
++	__le16 word;
+ 	ssize_t count;
+ 	size_t read_length = 0;
+ 	unsigned int retries;
+@@ -554,7 +554,7 @@ static int cp2112_xfer(struct i2c_adapter *adap, u16 addr,
+ 		if (I2C_SMBUS_READ == read_write)
+ 			count = cp2112_read_req(buf, addr, read_length);
+ 		else
+-			count = cp2112_write_req(buf, addr, data->byte, NULL,
++			count = cp2112_write_req(buf, addr, command, NULL,
+ 						 0);
+ 		break;
+ 	case I2C_SMBUS_BYTE_DATA:
+@@ -569,7 +569,7 @@ static int cp2112_xfer(struct i2c_adapter *adap, u16 addr,
+ 		break;
+ 	case I2C_SMBUS_WORD_DATA:
+ 		read_length = 2;
+-		word = cpu_to_be16(data->word);
++		word = cpu_to_le16(data->word);
+ 
+ 		if (I2C_SMBUS_READ == read_write)
+ 			count = cp2112_write_read_req(buf, addr, read_length,
+@@ -582,7 +582,7 @@ static int cp2112_xfer(struct i2c_adapter *adap, u16 addr,
+ 		size = I2C_SMBUS_WORD_DATA;
+ 		read_write = I2C_SMBUS_READ;
+ 		read_length = 2;
+-		word = cpu_to_be16(data->word);
++		word = cpu_to_le16(data->word);
+ 
+ 		count = cp2112_write_read_req(buf, addr, read_length, command,
+ 					      (u8 *)&word, 2);
+@@ -675,7 +675,7 @@ static int cp2112_xfer(struct i2c_adapter *adap, u16 addr,
+ 		data->byte = buf[0];
+ 		break;
+ 	case I2C_SMBUS_WORD_DATA:
+-		data->word = be16_to_cpup((__be16 *)buf);
++		data->word = le16_to_cpup((__le16 *)buf);
+ 		break;
+ 	case I2C_SMBUS_BLOCK_DATA:
+ 		if (read_length > I2C_SMBUS_BLOCK_MAX) {
+diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
+index bfbe1bedda7f..eab5bd6a2442 100644
+--- a/drivers/hid/usbhid/hid-core.c
++++ b/drivers/hid/usbhid/hid-core.c
+@@ -164,7 +164,7 @@ static void hid_io_error(struct hid_device *hid)
+ 	if (time_after(jiffies, usbhid->stop_retry)) {
+ 
+ 		/* Retries failed, so do a port reset unless we lack bandwidth*/
+-		if (test_bit(HID_NO_BANDWIDTH, &usbhid->iofl)
++		if (!test_bit(HID_NO_BANDWIDTH, &usbhid->iofl)
+ 		     && !test_and_set_bit(HID_RESET_PENDING, &usbhid->iofl)) {
+ 
+ 			schedule_work(&usbhid->reset_work);
+diff --git a/drivers/iio/accel/mma8452.c b/drivers/iio/accel/mma8452.c
+index 13ea1ea23328..bda69a4355fa 100644
+--- a/drivers/iio/accel/mma8452.c
++++ b/drivers/iio/accel/mma8452.c
+@@ -229,7 +229,7 @@ static int mma8452_get_hp_filter_index(struct mma8452_data *data,
+ 	int i = mma8452_get_odr_index(data);
+ 
+ 	return mma8452_get_int_plus_micros_index(mma8452_hp_filter_cutoff[i],
+-		ARRAY_SIZE(mma8452_scales[0]), val, val2);
++		ARRAY_SIZE(mma8452_hp_filter_cutoff[0]), val, val2);
+ }
+ 
+ static int mma8452_read_hp_filter(struct mma8452_data *data, int *hz, int *uHz)
+diff --git a/drivers/iio/gyro/Kconfig b/drivers/iio/gyro/Kconfig
+index b3d0e94f72eb..8d2439345673 100644
+--- a/drivers/iio/gyro/Kconfig
++++ b/drivers/iio/gyro/Kconfig
+@@ -53,7 +53,8 @@ config ADXRS450
+ config BMG160
+ 	tristate "BOSCH BMG160 Gyro Sensor"
+ 	depends on I2C
+-	select IIO_TRIGGERED_BUFFER if IIO_BUFFER
++	select IIO_BUFFER
++	select IIO_TRIGGERED_BUFFER
+ 	help
+ 	  Say yes here to build support for Bosch BMG160 Tri-axis Gyro Sensor
+ 	  driver. This driver also supports BMI055 gyroscope.
+diff --git a/drivers/iio/imu/adis16400_core.c b/drivers/iio/imu/adis16400_core.c
+index 2fd68f2219a7..d42e4fe2c7ed 100644
+--- a/drivers/iio/imu/adis16400_core.c
++++ b/drivers/iio/imu/adis16400_core.c
+@@ -780,7 +780,7 @@ static struct adis16400_chip_info adis16400_chips[] = {
+ 		.flags = ADIS16400_HAS_PROD_ID |
+ 				ADIS16400_HAS_SERIAL_NUMBER |
+ 				ADIS16400_BURST_DIAG_STAT,
+-		.gyro_scale_micro = IIO_DEGREE_TO_RAD(10000), /* 0.01 deg/s */
++		.gyro_scale_micro = IIO_DEGREE_TO_RAD(40000), /* 0.04 deg/s */
+ 		.accel_scale_micro = IIO_G_TO_M_S_2(833), /* 1/1200 g */
+ 		.temp_scale_nano = 73860000, /* 0.07386 C */
+ 		.temp_offset = 31000000 / 73860, /* 31 C = 0x00 */
+diff --git a/drivers/iio/imu/adis16480.c b/drivers/iio/imu/adis16480.c
+index 989605dd6f78..b94bfd3f595b 100644
+--- a/drivers/iio/imu/adis16480.c
++++ b/drivers/iio/imu/adis16480.c
+@@ -110,6 +110,10 @@
+ struct adis16480_chip_info {
+ 	unsigned int num_channels;
+ 	const struct iio_chan_spec *channels;
++	unsigned int gyro_max_val;
++	unsigned int gyro_max_scale;
++	unsigned int accel_max_val;
++	unsigned int accel_max_scale;
+ };
+ 
+ struct adis16480 {
+@@ -497,19 +501,21 @@ static int adis16480_set_filter_freq(struct iio_dev *indio_dev,
+ static int adis16480_read_raw(struct iio_dev *indio_dev,
+ 	const struct iio_chan_spec *chan, int *val, int *val2, long info)
+ {
++	struct adis16480 *st = iio_priv(indio_dev);
++
+ 	switch (info) {
+ 	case IIO_CHAN_INFO_RAW:
+ 		return adis_single_conversion(indio_dev, chan, 0, val);
+ 	case IIO_CHAN_INFO_SCALE:
+ 		switch (chan->type) {
+ 		case IIO_ANGL_VEL:
+-			*val = 0;
+-			*val2 = IIO_DEGREE_TO_RAD(20000); /* 0.02 degree/sec */
+-			return IIO_VAL_INT_PLUS_MICRO;
++			*val = st->chip_info->gyro_max_scale;
++			*val2 = st->chip_info->gyro_max_val;
++			return IIO_VAL_FRACTIONAL;
+ 		case IIO_ACCEL:
+-			*val = 0;
+-			*val2 = IIO_G_TO_M_S_2(800); /* 0.8 mg */
+-			return IIO_VAL_INT_PLUS_MICRO;
++			*val = st->chip_info->accel_max_scale;
++			*val2 = st->chip_info->accel_max_val;
++			return IIO_VAL_FRACTIONAL;
+ 		case IIO_MAGN:
+ 			*val = 0;
+ 			*val2 = 100; /* 0.0001 gauss */
+@@ -674,18 +680,39 @@ static const struct adis16480_chip_info adis16480_chip_info[] = {
+ 	[ADIS16375] = {
+ 		.channels = adis16485_channels,
+ 		.num_channels = ARRAY_SIZE(adis16485_channels),
++		/*
++		 * storing the value in rad/degree and the scale in degree
++		 * gives us the result in rad and better precision than
++		 * storing the scale directly in rad.
++		 */
++		.gyro_max_val = IIO_RAD_TO_DEGREE(22887),
++		.gyro_max_scale = 300,
++		.accel_max_val = IIO_M_S_2_TO_G(21973),
++		.accel_max_scale = 18,
+ 	},
+ 	[ADIS16480] = {
+ 		.channels = adis16480_channels,
+ 		.num_channels = ARRAY_SIZE(adis16480_channels),
++		.gyro_max_val = IIO_RAD_TO_DEGREE(22500),
++		.gyro_max_scale = 450,
++		.accel_max_val = IIO_M_S_2_TO_G(12500),
++		.accel_max_scale = 5,
+ 	},
+ 	[ADIS16485] = {
+ 		.channels = adis16485_channels,
+ 		.num_channels = ARRAY_SIZE(adis16485_channels),
++		.gyro_max_val = IIO_RAD_TO_DEGREE(22500),
++		.gyro_max_scale = 450,
++		.accel_max_val = IIO_M_S_2_TO_G(20000),
++		.accel_max_scale = 5,
+ 	},
+ 	[ADIS16488] = {
+ 		.channels = adis16480_channels,
+ 		.num_channels = ARRAY_SIZE(adis16480_channels),
++		.gyro_max_val = IIO_RAD_TO_DEGREE(22500),
++		.gyro_max_scale = 450,
++		.accel_max_val = IIO_M_S_2_TO_G(22500),
++		.accel_max_scale = 18,
+ 	},
+ };
+ 
+diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
+index 6eee1b044c60..b3fda9ee4174 100644
+--- a/drivers/iio/industrialio-buffer.c
++++ b/drivers/iio/industrialio-buffer.c
+@@ -151,7 +151,7 @@ unsigned int iio_buffer_poll(struct file *filp,
+ 	struct iio_buffer *rb = indio_dev->buffer;
+ 
+ 	if (!indio_dev->info)
+-		return -ENODEV;
++		return 0;
+ 
+ 	poll_wait(filp, &rb->pollq, wait);
+ 	if (iio_buffer_ready(indio_dev, rb, rb->watermark, 0))
+diff --git a/drivers/iio/industrialio-event.c b/drivers/iio/industrialio-event.c
+index 894d8137c4cf..52d4fcb0de1d 100644
+--- a/drivers/iio/industrialio-event.c
++++ b/drivers/iio/industrialio-event.c
+@@ -84,7 +84,7 @@ static unsigned int iio_event_poll(struct file *filep,
+ 	unsigned int events = 0;
+ 
+ 	if (!indio_dev->info)
+-		return -ENODEV;
++		return events;
+ 
+ 	poll_wait(filep, &ev_int->wait, wait);
+ 
+diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
+index 1fe93cfea7d3..9d0672b58c31 100644
+--- a/drivers/md/dm-cache-target.c
++++ b/drivers/md/dm-cache-target.c
+@@ -1729,6 +1729,8 @@ static void remap_cell_to_origin_clear_discard(struct cache *cache,
+ 		remap_to_origin(cache, bio);
+ 		issue(cache, bio);
+ 	}
++
++	free_prison_cell(cache, cell);
+ }
+ 
+ static void remap_cell_to_cache_dirty(struct cache *cache, struct dm_bio_prison_cell *cell,
+@@ -1763,6 +1765,8 @@ static void remap_cell_to_cache_dirty(struct cache *cache, struct dm_bio_prison_
+ 		remap_to_cache(cache, bio, cblock);
+ 		issue(cache, bio);
+ 	}
++
++	free_prison_cell(cache, cell);
+ }
+ 
+ /*----------------------------------------------------------------*/
+diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
+index 8a8b48fa901a..8289804ccd99 100644
+--- a/drivers/md/dm-stats.c
++++ b/drivers/md/dm-stats.c
+@@ -457,12 +457,24 @@ static int dm_stats_list(struct dm_stats *stats, const char *program,
+ 	list_for_each_entry(s, &stats->list, list_entry) {
+ 		if (!program || !strcmp(program, s->program_id)) {
+ 			len = s->end - s->start;
+-			DMEMIT("%d: %llu+%llu %llu %s %s\n", s->id,
++			DMEMIT("%d: %llu+%llu %llu %s %s", s->id,
+ 				(unsigned long long)s->start,
+ 				(unsigned long long)len,
+ 				(unsigned long long)s->step,
+ 				s->program_id,
+ 				s->aux_data);
++			if (s->stat_flags & STAT_PRECISE_TIMESTAMPS)
++				DMEMIT(" precise_timestamps");
++			if (s->n_histogram_entries) {
++				unsigned i;
++				DMEMIT(" histogram:");
++				for (i = 0; i < s->n_histogram_entries; i++) {
++					if (i)
++						DMEMIT(",");
++					DMEMIT("%llu", s->histogram_boundaries[i]);
++				}
++			}
++			DMEMIT("\n");
+ 		}
+ 	}
+ 	mutex_unlock(&stats->mutex);
+diff --git a/drivers/of/address.c b/drivers/of/address.c
+index 8bfda6ade2c0..384574c3987c 100644
+--- a/drivers/of/address.c
++++ b/drivers/of/address.c
+@@ -845,10 +845,10 @@ struct device_node *of_find_matching_node_by_address(struct device_node *from,
+ 	struct resource res;
+ 
+ 	while (dn) {
+-		if (of_address_to_resource(dn, 0, &res))
+-			continue;
+-		if (res.start == base_address)
++		if (!of_address_to_resource(dn, 0, &res) &&
++		    res.start == base_address)
+ 			return dn;
++
+ 		dn = of_find_matching_node(dn, matches);
+ 	}
+ 
+diff --git a/drivers/pci/access.c b/drivers/pci/access.c
+index d9b64a175990..b965c12168b7 100644
+--- a/drivers/pci/access.c
++++ b/drivers/pci/access.c
+@@ -439,6 +439,56 @@ static const struct pci_vpd_ops pci_vpd_pci22_ops = {
+ 	.release = pci_vpd_pci22_release,
+ };
+ 
++static ssize_t pci_vpd_f0_read(struct pci_dev *dev, loff_t pos, size_t count,
++			       void *arg)
++{
++	struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
++	ssize_t ret;
++
++	if (!tdev)
++		return -ENODEV;
++
++	ret = pci_read_vpd(tdev, pos, count, arg);
++	pci_dev_put(tdev);
++	return ret;
++}
++
++static ssize_t pci_vpd_f0_write(struct pci_dev *dev, loff_t pos, size_t count,
++				const void *arg)
++{
++	struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
++	ssize_t ret;
++
++	if (!tdev)
++		return -ENODEV;
++
++	ret = pci_write_vpd(tdev, pos, count, arg);
++	pci_dev_put(tdev);
++	return ret;
++}
++
++static const struct pci_vpd_ops pci_vpd_f0_ops = {
++	.read = pci_vpd_f0_read,
++	.write = pci_vpd_f0_write,
++	.release = pci_vpd_pci22_release,
++};
++
++static int pci_vpd_f0_dev_check(struct pci_dev *dev)
++{
++	struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
++	int ret = 0;
++
++	if (!tdev)
++		return -ENODEV;
++	if (!tdev->vpd || !tdev->multifunction ||
++	    dev->class != tdev->class || dev->vendor != tdev->vendor ||
++	    dev->device != tdev->device)
++		ret = -ENODEV;
++
++	pci_dev_put(tdev);
++	return ret;
++}
++
+ int pci_vpd_pci22_init(struct pci_dev *dev)
+ {
+ 	struct pci_vpd_pci22 *vpd;
+@@ -447,12 +497,21 @@ int pci_vpd_pci22_init(struct pci_dev *dev)
+ 	cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
+ 	if (!cap)
+ 		return -ENODEV;
++	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
++		int ret = pci_vpd_f0_dev_check(dev);
++
++		if (ret)
++			return ret;
++	}
+ 	vpd = kzalloc(sizeof(*vpd), GFP_ATOMIC);
+ 	if (!vpd)
+ 		return -ENOMEM;
+ 
+ 	vpd->base.len = PCI_VPD_PCI22_SIZE;
+-	vpd->base.ops = &pci_vpd_pci22_ops;
++	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0)
++		vpd->base.ops = &pci_vpd_f0_ops;
++	else
++		vpd->base.ops = &pci_vpd_pci22_ops;
+ 	mutex_init(&vpd->lock);
+ 	vpd->cap = cap;
+ 	vpd->busy = false;
+diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
+index e9fd0e90fa3b..dbd13854f21e 100644
+--- a/drivers/pci/quirks.c
++++ b/drivers/pci/quirks.c
+@@ -1569,6 +1569,18 @@ DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_JMICRON, PCI_DEVICE_ID_JMICRON_JMB3
+ 
+ #endif
+ 
++static void quirk_jmicron_async_suspend(struct pci_dev *dev)
++{
++	if (dev->multifunction) {
++		device_disable_async_suspend(&dev->dev);
++		dev_info(&dev->dev, "async suspend disabled to avoid multi-function power-on ordering issue\n");
++	}
++}
++DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_CLASS_STORAGE_IDE, 8, quirk_jmicron_async_suspend);
++DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_CLASS_STORAGE_SATA_AHCI, 0, quirk_jmicron_async_suspend);
++DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_JMICRON, 0x2362, quirk_jmicron_async_suspend);
++DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_JMICRON, 0x236f, quirk_jmicron_async_suspend);
++
+ #ifdef CONFIG_X86_IO_APIC
+ static void quirk_alder_ioapic(struct pci_dev *pdev)
+ {
+@@ -1894,6 +1906,15 @@ static void quirk_netmos(struct pci_dev *dev)
+ DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_NETMOS, PCI_ANY_ID,
+ 			 PCI_CLASS_COMMUNICATION_SERIAL, 8, quirk_netmos);
+ 
++static void quirk_f0_vpd_link(struct pci_dev *dev)
++{
++	if (!dev->multifunction || !PCI_FUNC(dev->devfn))
++		return;
++	dev->dev_flags |= PCI_DEV_FLAGS_VPD_REF_F0;
++}
++DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
++			      PCI_CLASS_NETWORK_ETHERNET, 8, quirk_f0_vpd_link);
++
+ static void quirk_e100_interrupt(struct pci_dev *dev)
+ {
+ 	u16 command, pmcsr;
+@@ -2829,12 +2850,15 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x3c28, vtd_mask_spec_errors);
+ 
+ static void fixup_ti816x_class(struct pci_dev *dev)
+ {
++	u32 class = dev->class;
++
+ 	/* TI 816x devices do not have class code set when in PCIe boot mode */
+-	dev_info(&dev->dev, "Setting PCI class for 816x PCIe device\n");
+-	dev->class = PCI_CLASS_MULTIMEDIA_VIDEO;
++	dev->class = PCI_CLASS_MULTIMEDIA_VIDEO << 8;
++	dev_info(&dev->dev, "PCI class overridden (%#08x -> %#08x)\n",
++		 class, dev->class);
+ }
+ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_TI, 0xb800,
+-				 PCI_CLASS_NOT_DEFINED, 0, fixup_ti816x_class);
++			      PCI_CLASS_NOT_DEFINED, 0, fixup_ti816x_class);
+ 
+ /* Some PCIe devices do not work reliably with the claimed maximum
+  * payload size supported.
+diff --git a/drivers/regulator/pbias-regulator.c b/drivers/regulator/pbias-regulator.c
+index bd2b75c0d1d1..4fa7bcaf454e 100644
+--- a/drivers/regulator/pbias-regulator.c
++++ b/drivers/regulator/pbias-regulator.c
+@@ -30,6 +30,7 @@
+ struct pbias_reg_info {
+ 	u32 enable;
+ 	u32 enable_mask;
++	u32 disable_val;
+ 	u32 vmode;
+ 	unsigned int enable_time;
+ 	char *name;
+@@ -62,6 +63,7 @@ static const struct pbias_reg_info pbias_mmc_omap2430 = {
+ 	.enable = BIT(1),
+ 	.enable_mask = BIT(1),
+ 	.vmode = BIT(0),
++	.disable_val = 0,
+ 	.enable_time = 100,
+ 	.name = "pbias_mmc_omap2430"
+ };
+@@ -77,6 +79,7 @@ static const struct pbias_reg_info pbias_sim_omap3 = {
+ static const struct pbias_reg_info pbias_mmc_omap4 = {
+ 	.enable = BIT(26) | BIT(22),
+ 	.enable_mask = BIT(26) | BIT(25) | BIT(22),
++	.disable_val = BIT(25),
+ 	.vmode = BIT(21),
+ 	.enable_time = 100,
+ 	.name = "pbias_mmc_omap4"
+@@ -85,6 +88,7 @@ static const struct pbias_reg_info pbias_mmc_omap4 = {
+ static const struct pbias_reg_info pbias_mmc_omap5 = {
+ 	.enable = BIT(27) | BIT(26),
+ 	.enable_mask = BIT(27) | BIT(25) | BIT(26),
++	.disable_val = BIT(25),
+ 	.vmode = BIT(21),
+ 	.enable_time = 100,
+ 	.name = "pbias_mmc_omap5"
+@@ -159,6 +163,7 @@ static int pbias_regulator_probe(struct platform_device *pdev)
+ 		drvdata[data_idx].desc.enable_reg = res->start;
+ 		drvdata[data_idx].desc.enable_mask = info->enable_mask;
+ 		drvdata[data_idx].desc.enable_val = info->enable;
++		drvdata[data_idx].desc.disable_val = info->disable_val;
+ 
+ 		cfg.init_data = pbias_matches[idx].init_data;
+ 		cfg.driver_data = &drvdata[data_idx];
+diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
+index 75d0457a77b7..fa7036c4daf9 100644
+--- a/drivers/soc/tegra/pmc.c
++++ b/drivers/soc/tegra/pmc.c
+@@ -736,12 +736,12 @@ void tegra_pmc_init_tsense_reset(struct tegra_pmc *pmc)
+ 	u32 value, checksum;
+ 
+ 	if (!pmc->soc->has_tsense_reset)
+-		goto out;
++		return;
+ 
+ 	np = of_find_node_by_name(pmc->dev->of_node, "i2c-thermtrip");
+ 	if (!np) {
+ 		dev_warn(dev, "i2c-thermtrip node not found, %s.\n", disabled);
+-		goto out;
++		return;
+ 	}
+ 
+ 	if (of_property_read_u32(np, "nvidia,i2c-controller-id", &ctrl_id)) {
+diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
+index 59705ab23577..c9357bb393d3 100644
+--- a/drivers/spi/spi-bcm2835.c
++++ b/drivers/spi/spi-bcm2835.c
+@@ -553,13 +553,11 @@ static int bcm2835_spi_transfer_one(struct spi_master *master,
+ 	spi_used_hz = cdiv ? (clk_hz / cdiv) : (clk_hz / 65536);
+ 	bcm2835_wr(bs, BCM2835_SPI_CLK, cdiv);
+ 
+-	/* handle all the modes */
++	/* handle the 3-wire mode */
+ 	if ((spi->mode & SPI_3WIRE) && (tfr->rx_buf))
+ 		cs |= BCM2835_SPI_CS_REN;
+-	if (spi->mode & SPI_CPOL)
+-		cs |= BCM2835_SPI_CS_CPOL;
+-	if (spi->mode & SPI_CPHA)
+-		cs |= BCM2835_SPI_CS_CPHA;
++	else
++		cs &= ~BCM2835_SPI_CS_REN;
+ 
+ 	/* for gpio_cs set dummy CS so that no HW-CS get changed
+ 	 * we can not run this in bcm2835_spi_set_cs, as it does
+@@ -592,6 +590,25 @@ static int bcm2835_spi_transfer_one(struct spi_master *master,
+ 	return bcm2835_spi_transfer_one_irq(master, spi, tfr, cs);
+ }
+ 
++static int bcm2835_spi_prepare_message(struct spi_master *master,
++				       struct spi_message *msg)
++{
++	struct spi_device *spi = msg->spi;
++	struct bcm2835_spi *bs = spi_master_get_devdata(master);
++	u32 cs = bcm2835_rd(bs, BCM2835_SPI_CS);
++
++	cs &= ~(BCM2835_SPI_CS_CPOL | BCM2835_SPI_CS_CPHA);
++
++	if (spi->mode & SPI_CPOL)
++		cs |= BCM2835_SPI_CS_CPOL;
++	if (spi->mode & SPI_CPHA)
++		cs |= BCM2835_SPI_CS_CPHA;
++
++	bcm2835_wr(bs, BCM2835_SPI_CS, cs);
++
++	return 0;
++}
++
+ static void bcm2835_spi_handle_err(struct spi_master *master,
+ 				   struct spi_message *msg)
+ {
+@@ -739,6 +756,7 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
+ 	master->set_cs = bcm2835_spi_set_cs;
+ 	master->transfer_one = bcm2835_spi_transfer_one;
+ 	master->handle_err = bcm2835_spi_handle_err;
++	master->prepare_message = bcm2835_spi_prepare_message;
+ 	master->dev.of_node = pdev->dev.of_node;
+ 
+ 	bs = spi_master_get_devdata(master);
+diff --git a/drivers/spi/spi-bitbang-txrx.h b/drivers/spi/spi-bitbang-txrx.h
+index 06b34e5bcfa3..47bb9b898dfd 100644
+--- a/drivers/spi/spi-bitbang-txrx.h
++++ b/drivers/spi/spi-bitbang-txrx.h
+@@ -49,7 +49,7 @@ bitbang_txrx_be_cpha0(struct spi_device *spi,
+ {
+ 	/* if (cpol == 0) this is SPI_MODE_0; else this is SPI_MODE_2 */
+ 
+-	bool oldbit = !(word & 1);
++	u32 oldbit = (!(word & (1<<(bits-1)))) << 31;
+ 	/* clock starts at inactive polarity */
+ 	for (word <<= (32 - bits); likely(bits); bits--) {
+ 
+@@ -81,7 +81,7 @@ bitbang_txrx_be_cpha1(struct spi_device *spi,
+ {
+ 	/* if (cpol == 0) this is SPI_MODE_1; else this is SPI_MODE_3 */
+ 
+-	bool oldbit = !(word & (1 << 31));
++	u32 oldbit = (!(word & (1<<(bits-1)))) << 31;
+ 	/* clock starts at inactive polarity */
+ 	for (word <<= (32 - bits); likely(bits); bits--) {
+ 
+diff --git a/drivers/spi/spi-dw-mmio.c b/drivers/spi/spi-dw-mmio.c
+index eb03e1215195..7edede6e024b 100644
+--- a/drivers/spi/spi-dw-mmio.c
++++ b/drivers/spi/spi-dw-mmio.c
+@@ -74,6 +74,9 @@ static int dw_spi_mmio_probe(struct platform_device *pdev)
+ 
+ 	dws->max_freq = clk_get_rate(dwsmmio->clk);
+ 
++	of_property_read_u32(pdev->dev.of_node, "reg-io-width",
++			     &dws->reg_io_width);
++
+ 	num_cs = 4;
+ 
+ 	if (pdev->dev.of_node)
+diff --git a/drivers/spi/spi-dw.c b/drivers/spi/spi-dw.c
+index 8d67d03c71eb..4fbfcdc5cb24 100644
+--- a/drivers/spi/spi-dw.c
++++ b/drivers/spi/spi-dw.c
+@@ -194,7 +194,7 @@ static void dw_writer(struct dw_spi *dws)
+ 			else
+ 				txw = *(u16 *)(dws->tx);
+ 		}
+-		dw_writel(dws, DW_SPI_DR, txw);
++		dw_write_io_reg(dws, DW_SPI_DR, txw);
+ 		dws->tx += dws->n_bytes;
+ 	}
+ }
+@@ -205,7 +205,7 @@ static void dw_reader(struct dw_spi *dws)
+ 	u16 rxw;
+ 
+ 	while (max--) {
+-		rxw = dw_readl(dws, DW_SPI_DR);
++		rxw = dw_read_io_reg(dws, DW_SPI_DR);
+ 		/* Care rx only if the transfer's original "rx" is not null */
+ 		if (dws->rx_end - dws->len) {
+ 			if (dws->n_bytes == 1)
+diff --git a/drivers/spi/spi-dw.h b/drivers/spi/spi-dw.h
+index 6c91391c1a4f..b75ed327d5a2 100644
+--- a/drivers/spi/spi-dw.h
++++ b/drivers/spi/spi-dw.h
+@@ -109,6 +109,7 @@ struct dw_spi {
+ 	u32			fifo_len;	/* depth of the FIFO buffer */
+ 	u32			max_freq;	/* max bus freq supported */
+ 
++	u32			reg_io_width;	/* DR I/O width in bytes */
+ 	u16			bus_num;
+ 	u16			num_cs;		/* supported slave numbers */
+ 
+@@ -145,11 +146,45 @@ static inline u32 dw_readl(struct dw_spi *dws, u32 offset)
+ 	return __raw_readl(dws->regs + offset);
+ }
+ 
++static inline u16 dw_readw(struct dw_spi *dws, u32 offset)
++{
++	return __raw_readw(dws->regs + offset);
++}
++
+ static inline void dw_writel(struct dw_spi *dws, u32 offset, u32 val)
+ {
+ 	__raw_writel(val, dws->regs + offset);
+ }
+ 
++static inline void dw_writew(struct dw_spi *dws, u32 offset, u16 val)
++{
++	__raw_writew(val, dws->regs + offset);
++}
++
++static inline u32 dw_read_io_reg(struct dw_spi *dws, u32 offset)
++{
++	switch (dws->reg_io_width) {
++	case 2:
++		return dw_readw(dws, offset);
++	case 4:
++	default:
++		return dw_readl(dws, offset);
++	}
++}
++
++static inline void dw_write_io_reg(struct dw_spi *dws, u32 offset, u32 val)
++{
++	switch (dws->reg_io_width) {
++	case 2:
++		dw_writew(dws, offset, val);
++		break;
++	case 4:
++	default:
++		dw_writel(dws, offset, val);
++		break;
++	}
++}
++
+ static inline void spi_enable_chip(struct dw_spi *dws, int enable)
+ {
+ 	dw_writel(dws, DW_SPI_SSIENR, (enable ? 1 : 0));
+diff --git a/drivers/spi/spi-img-spfi.c b/drivers/spi/spi-img-spfi.c
+index acce90ac7371..bb916c8d40db 100644
+--- a/drivers/spi/spi-img-spfi.c
++++ b/drivers/spi/spi-img-spfi.c
+@@ -105,6 +105,10 @@ struct img_spfi {
+ 	bool rx_dma_busy;
+ };
+ 
++struct img_spfi_device_data {
++	bool gpio_requested;
++};
++
+ static inline u32 spfi_readl(struct img_spfi *spfi, u32 reg)
+ {
+ 	return readl(spfi->regs + reg);
+@@ -267,15 +271,15 @@ static int img_spfi_start_pio(struct spi_master *master,
+ 		cpu_relax();
+ 	}
+ 
+-	ret = spfi_wait_all_done(spfi);
+-	if (ret < 0)
+-		return ret;
+-
+ 	if (rx_bytes > 0 || tx_bytes > 0) {
+ 		dev_err(spfi->dev, "PIO transfer timed out\n");
+ 		return -ETIMEDOUT;
+ 	}
+ 
++	ret = spfi_wait_all_done(spfi);
++	if (ret < 0)
++		return ret;
++
+ 	return 0;
+ }
+ 
+@@ -440,21 +444,50 @@ static int img_spfi_unprepare(struct spi_master *master,
+ 
+ static int img_spfi_setup(struct spi_device *spi)
+ {
+-	int ret;
+-
+-	ret = gpio_request_one(spi->cs_gpio, (spi->mode & SPI_CS_HIGH) ?
+-			       GPIOF_OUT_INIT_LOW : GPIOF_OUT_INIT_HIGH,
+-			       dev_name(&spi->dev));
+-	if (ret)
+-		dev_err(&spi->dev, "can't request chipselect gpio %d\n",
++	int ret = -EINVAL;
++	struct img_spfi_device_data *spfi_data = spi_get_ctldata(spi);
++
++	if (!spfi_data) {
++		spfi_data = kzalloc(sizeof(*spfi_data), GFP_KERNEL);
++		if (!spfi_data)
++			return -ENOMEM;
++		spfi_data->gpio_requested = false;
++		spi_set_ctldata(spi, spfi_data);
++	}
++	if (!spfi_data->gpio_requested) {
++		ret = gpio_request_one(spi->cs_gpio,
++				       (spi->mode & SPI_CS_HIGH) ?
++				       GPIOF_OUT_INIT_LOW : GPIOF_OUT_INIT_HIGH,
++				       dev_name(&spi->dev));
++		if (ret)
++			dev_err(&spi->dev, "can't request chipselect gpio %d\n",
+ 				spi->cs_gpio);
+-
++		else
++			spfi_data->gpio_requested = true;
++	} else {
++		if (gpio_is_valid(spi->cs_gpio)) {
++			int mode = ((spi->mode & SPI_CS_HIGH) ?
++				    GPIOF_OUT_INIT_LOW : GPIOF_OUT_INIT_HIGH);
++
++			ret = gpio_direction_output(spi->cs_gpio, mode);
++			if (ret)
++				dev_err(&spi->dev, "chipselect gpio %d setup failed (%d)\n",
++					spi->cs_gpio, ret);
++		}
++	}
+ 	return ret;
+ }
+ 
+ static void img_spfi_cleanup(struct spi_device *spi)
+ {
+-	gpio_free(spi->cs_gpio);
++	struct img_spfi_device_data *spfi_data = spi_get_ctldata(spi);
++
++	if (spfi_data) {
++		if (spfi_data->gpio_requested)
++			gpio_free(spi->cs_gpio);
++		kfree(spfi_data);
++		spi_set_ctldata(spi, NULL);
++	}
+ }
+ 
+ static void img_spfi_config(struct spi_master *master, struct spi_device *spi,
+diff --git a/drivers/spi/spi-omap2-mcspi.c b/drivers/spi/spi-omap2-mcspi.c
+index 58673841286c..3d09e0b69b73 100644
+--- a/drivers/spi/spi-omap2-mcspi.c
++++ b/drivers/spi/spi-omap2-mcspi.c
+@@ -245,6 +245,7 @@ static void omap2_mcspi_set_enable(const struct spi_device *spi, int enable)
+ 
+ static void omap2_mcspi_set_cs(struct spi_device *spi, bool enable)
+ {
++	struct omap2_mcspi *mcspi = spi_master_get_devdata(spi->master);
+ 	u32 l;
+ 
+ 	/* The controller handles the inverted chip selects
+@@ -255,6 +256,12 @@ static void omap2_mcspi_set_cs(struct spi_device *spi, bool enable)
+ 		enable = !enable;
+ 
+ 	if (spi->controller_state) {
++		int err = pm_runtime_get_sync(mcspi->dev);
++		if (err < 0) {
++			dev_err(mcspi->dev, "failed to get sync: %d\n", err);
++			return;
++		}
++
+ 		l = mcspi_cached_chconf0(spi);
+ 
+ 		if (enable)
+@@ -263,6 +270,9 @@ static void omap2_mcspi_set_cs(struct spi_device *spi, bool enable)
+ 			l |= OMAP2_MCSPI_CHCONF_FORCE;
+ 
+ 		mcspi_write_chconf0(spi, l);
++
++		pm_runtime_mark_last_busy(mcspi->dev);
++		pm_runtime_put_autosuspend(mcspi->dev);
+ 	}
+ }
+ 
+diff --git a/drivers/spi/spi-orion.c b/drivers/spi/spi-orion.c
+index 8cad107a5b3f..a87cfd4ba17b 100644
+--- a/drivers/spi/spi-orion.c
++++ b/drivers/spi/spi-orion.c
+@@ -41,6 +41,11 @@
+ #define ORION_SPI_DATA_OUT_REG		0x08
+ #define ORION_SPI_DATA_IN_REG		0x0c
+ #define ORION_SPI_INT_CAUSE_REG		0x10
++#define ORION_SPI_TIMING_PARAMS_REG	0x18
++
++#define ORION_SPI_TMISO_SAMPLE_MASK	(0x3 << 6)
++#define ORION_SPI_TMISO_SAMPLE_1	(1 << 6)
++#define ORION_SPI_TMISO_SAMPLE_2	(2 << 6)
+ 
+ #define ORION_SPI_MODE_CPOL		(1 << 11)
+ #define ORION_SPI_MODE_CPHA		(1 << 12)
+@@ -70,6 +75,7 @@ struct orion_spi_dev {
+ 	unsigned int		min_divisor;
+ 	unsigned int		max_divisor;
+ 	u32			prescale_mask;
++	bool			is_errata_50mhz_ac;
+ };
+ 
+ struct orion_spi {
+@@ -195,6 +201,41 @@ orion_spi_mode_set(struct spi_device *spi)
+ 	writel(reg, spi_reg(orion_spi, ORION_SPI_IF_CONFIG_REG));
+ }
+ 
++static void
++orion_spi_50mhz_ac_timing_erratum(struct spi_device *spi, unsigned int speed)
++{
++	u32 reg;
++	struct orion_spi *orion_spi;
++
++	orion_spi = spi_master_get_devdata(spi->master);
++
++	/*
++	 * Erratum description: (Erratum NO. FE-9144572) The device
++	 * SPI interface supports frequencies of up to 50 MHz.
++	 * However, due to this erratum, when the device core clock is
++	 * 250 MHz and the SPI interfaces is configured for 50MHz SPI
++	 * clock and CPOL=CPHA=1 there might occur data corruption on
++	 * reads from the SPI device.
++	 * Erratum Workaround:
++	 * Work in one of the following configurations:
++	 * 1. Set CPOL=CPHA=0 in "SPI Interface Configuration
++	 * Register".
++	 * 2. Set TMISO_SAMPLE value to 0x2 in "SPI Timing Parameters 1
++	 * Register" before setting the interface.
++	 */
++	reg = readl(spi_reg(orion_spi, ORION_SPI_TIMING_PARAMS_REG));
++	reg &= ~ORION_SPI_TMISO_SAMPLE_MASK;
++
++	if (clk_get_rate(orion_spi->clk) == 250000000 &&
++			speed == 50000000 && spi->mode & SPI_CPOL &&
++			spi->mode & SPI_CPHA)
++		reg |= ORION_SPI_TMISO_SAMPLE_2;
++	else
++		reg |= ORION_SPI_TMISO_SAMPLE_1; /* This is the default value */
++
++	writel(reg, spi_reg(orion_spi, ORION_SPI_TIMING_PARAMS_REG));
++}
++
+ /*
+  * called only when no transfer is active on the bus
+  */
+@@ -216,6 +257,9 @@ orion_spi_setup_transfer(struct spi_device *spi, struct spi_transfer *t)
+ 
+ 	orion_spi_mode_set(spi);
+ 
++	if (orion_spi->devdata->is_errata_50mhz_ac)
++		orion_spi_50mhz_ac_timing_erratum(spi, speed);
++
+ 	rc = orion_spi_baudrate_set(spi, speed);
+ 	if (rc)
+ 		return rc;
+@@ -413,6 +457,14 @@ static const struct orion_spi_dev armada_375_spi_dev_data = {
+ 	.prescale_mask = ARMADA_SPI_CLK_PRESCALE_MASK,
+ };
+ 
++static const struct orion_spi_dev armada_380_spi_dev_data = {
++	.typ = ARMADA_SPI,
++	.max_hz = 50000000,
++	.max_divisor = 1920,
++	.prescale_mask = ARMADA_SPI_CLK_PRESCALE_MASK,
++	.is_errata_50mhz_ac = true,
++};
++
+ static const struct of_device_id orion_spi_of_match_table[] = {
+ 	{
+ 		.compatible = "marvell,orion-spi",
+@@ -428,7 +480,7 @@ static const struct of_device_id orion_spi_of_match_table[] = {
+ 	},
+ 	{
+ 		.compatible = "marvell,armada-380-spi",
+-		.data = &armada_xp_spi_dev_data,
++		.data = &armada_380_spi_dev_data,
+ 	},
+ 	{
+ 		.compatible = "marvell,armada-390-spi",
+diff --git a/drivers/spi/spi-sh-msiof.c b/drivers/spi/spi-sh-msiof.c
+index d3370a612d84..a7629f8edfca 100644
+--- a/drivers/spi/spi-sh-msiof.c
++++ b/drivers/spi/spi-sh-msiof.c
+@@ -48,8 +48,8 @@ struct sh_msiof_spi_priv {
+ 	const struct sh_msiof_chipdata *chipdata;
+ 	struct sh_msiof_spi_info *info;
+ 	struct completion done;
+-	int tx_fifo_size;
+-	int rx_fifo_size;
++	unsigned int tx_fifo_size;
++	unsigned int rx_fifo_size;
+ 	void *tx_dma_page;
+ 	void *rx_dma_page;
+ 	dma_addr_t tx_dma_addr;
+@@ -95,8 +95,6 @@ struct sh_msiof_spi_priv {
+ #define MDR2_WDLEN1(i)	(((i) - 1) << 16) /* Word Count (1-64/256 (SH, A1))) */
+ #define MDR2_GRPMASK1	0x00000001 /* Group Output Mask 1 (SH, A1) */
+ 
+-#define MAX_WDLEN	256U
+-
+ /* TSCR and RSCR */
+ #define SCR_BRPS_MASK	    0x1f00 /* Prescaler Setting (1-32) */
+ #define SCR_BRPS(i)	(((i) - 1) << 8)
+@@ -850,7 +848,12 @@ static int sh_msiof_transfer_one(struct spi_master *master,
+ 		 *  DMA supports 32-bit words only, hence pack 8-bit and 16-bit
+ 		 *  words, with byte resp. word swapping.
+ 		 */
+-		unsigned int l = min(len, MAX_WDLEN * 4);
++		unsigned int l = 0;
++
++		if (tx_buf)
++			l = min(len, p->tx_fifo_size * 4);
++		if (rx_buf)
++			l = min(len, p->rx_fifo_size * 4);
+ 
+ 		if (bits <= 8) {
+ 			if (l & 3)
+@@ -963,7 +966,7 @@ static const struct sh_msiof_chipdata sh_data = {
+ 
+ static const struct sh_msiof_chipdata r8a779x_data = {
+ 	.tx_fifo_size = 64,
+-	.rx_fifo_size = 256,
++	.rx_fifo_size = 64,
+ 	.master_flags = SPI_MASTER_MUST_TX,
+ };
+ 
+diff --git a/drivers/spi/spi-xilinx.c b/drivers/spi/spi-xilinx.c
+index 133f53a9c1d4..a339c1e9997a 100644
+--- a/drivers/spi/spi-xilinx.c
++++ b/drivers/spi/spi-xilinx.c
+@@ -249,19 +249,23 @@ static int xilinx_spi_txrx_bufs(struct spi_device *spi, struct spi_transfer *t)
+ 	xspi->tx_ptr = t->tx_buf;
+ 	xspi->rx_ptr = t->rx_buf;
+ 	remaining_words = t->len / xspi->bytes_per_word;
+-	reinit_completion(&xspi->done);
+ 
+ 	if (xspi->irq >= 0 &&  remaining_words > xspi->buffer_size) {
++		u32 isr;
+ 		use_irq = true;
+-		xspi->write_fn(XSPI_INTR_TX_EMPTY,
+-				xspi->regs + XIPIF_V123B_IISR_OFFSET);
+-		/* Enable the global IPIF interrupt */
+-		xspi->write_fn(XIPIF_V123B_GINTR_ENABLE,
+-				xspi->regs + XIPIF_V123B_DGIER_OFFSET);
+ 		/* Inhibit irq to avoid spurious irqs on tx_empty*/
+ 		cr = xspi->read_fn(xspi->regs + XSPI_CR_OFFSET);
+ 		xspi->write_fn(cr | XSPI_CR_TRANS_INHIBIT,
+ 			       xspi->regs + XSPI_CR_OFFSET);
++		/* ACK old irqs (if any) */
++		isr = xspi->read_fn(xspi->regs + XIPIF_V123B_IISR_OFFSET);
++		if (isr)
++			xspi->write_fn(isr,
++				       xspi->regs + XIPIF_V123B_IISR_OFFSET);
++		/* Enable the global IPIF interrupt */
++		xspi->write_fn(XIPIF_V123B_GINTR_ENABLE,
++				xspi->regs + XIPIF_V123B_DGIER_OFFSET);
++		reinit_completion(&xspi->done);
+ 	}
+ 
+ 	while (remaining_words) {
+@@ -302,8 +306,10 @@ static int xilinx_spi_txrx_bufs(struct spi_device *spi, struct spi_transfer *t)
+ 		remaining_words -= n_words;
+ 	}
+ 
+-	if (use_irq)
++	if (use_irq) {
+ 		xspi->write_fn(0, xspi->regs + XIPIF_V123B_DGIER_OFFSET);
++		xspi->write_fn(cr, xspi->regs + XSPI_CR_OFFSET);
++	}
+ 
+ 	return t->len;
+ }
+diff --git a/drivers/staging/comedi/drivers/adl_pci7x3x.c b/drivers/staging/comedi/drivers/adl_pci7x3x.c
+index 934af3ff7897..b0fc027cf485 100644
+--- a/drivers/staging/comedi/drivers/adl_pci7x3x.c
++++ b/drivers/staging/comedi/drivers/adl_pci7x3x.c
+@@ -120,8 +120,20 @@ static int adl_pci7x3x_do_insn_bits(struct comedi_device *dev,
+ {
+ 	unsigned long reg = (unsigned long)s->private;
+ 
+-	if (comedi_dio_update_state(s, data))
+-		outl(s->state, dev->iobase + reg);
++	if (comedi_dio_update_state(s, data)) {
++		unsigned int val = s->state;
++
++		if (s->n_chan == 16) {
++			/*
++			 * It seems the PCI-7230 needs the 16-bit DO state
++			 * to be shifted left by 16 bits before being written
++			 * to the 32-bit register.  Set the value in both
++			 * halves of the register to be sure.
++			 */
++			val |= val << 16;
++		}
++		outl(val, dev->iobase + reg);
++	}
+ 
+ 	data[1] = s->state;
+ 
+diff --git a/drivers/staging/comedi/drivers/usbduxsigma.c b/drivers/staging/comedi/drivers/usbduxsigma.c
+index eaa9add491df..dc0b25a54088 100644
+--- a/drivers/staging/comedi/drivers/usbduxsigma.c
++++ b/drivers/staging/comedi/drivers/usbduxsigma.c
+@@ -550,27 +550,6 @@ static int usbduxsigma_ai_cmdtest(struct comedi_device *dev,
+ 	if (err)
+ 		return 3;
+ 
+-	/* Step 4: fix up any arguments */
+-
+-	if (high_speed) {
+-		/*
+-		 * every 2 channels get a time window of 125us. Thus, if we
+-		 * sample all 16 channels we need 1ms. If we sample only one
+-		 * channel we need only 125us
+-		 */
+-		devpriv->ai_interval = interval;
+-		devpriv->ai_timer = cmd->scan_begin_arg / (125000 * interval);
+-	} else {
+-		/* interval always 1ms */
+-		devpriv->ai_interval = 1;
+-		devpriv->ai_timer = cmd->scan_begin_arg / 1000000;
+-	}
+-	if (devpriv->ai_timer < 1)
+-		err |= -EINVAL;
+-
+-	if (err)
+-		return 4;
+-
+ 	return 0;
+ }
+ 
+@@ -668,6 +647,22 @@ static int usbduxsigma_ai_cmd(struct comedi_device *dev,
+ 
+ 	down(&devpriv->sem);
+ 
++	if (devpriv->high_speed) {
++		/*
++		 * every 2 channels get a time window of 125us. Thus, if we
++		 * sample all 16 channels we need 1ms. If we sample only one
++		 * channel we need only 125us
++		 */
++		unsigned int interval = usbduxsigma_chans_to_interval(len);
++
++		devpriv->ai_interval = interval;
++		devpriv->ai_timer = cmd->scan_begin_arg / (125000 * interval);
++	} else {
++		/* interval always 1ms */
++		devpriv->ai_interval = 1;
++		devpriv->ai_timer = cmd->scan_begin_arg / 1000000;
++	}
++
+ 	for (i = 0; i < len; i++) {
+ 		unsigned int chan  = CR_CHAN(cmd->chanlist[i]);
+ 
+@@ -917,25 +912,6 @@ static int usbduxsigma_ao_cmdtest(struct comedi_device *dev,
+ 	if (err)
+ 		return 3;
+ 
+-	/* Step 4: fix up any arguments */
+-
+-	/* we count in timer steps */
+-	if (high_speed) {
+-		/* timing of the conversion itself: every 125 us */
+-		devpriv->ao_timer = cmd->convert_arg / 125000;
+-	} else {
+-		/*
+-		 * timing of the scan: every 1ms
+-		 * we get all channels at once
+-		 */
+-		devpriv->ao_timer = cmd->scan_begin_arg / 1000000;
+-	}
+-	if (devpriv->ao_timer < 1)
+-		err |= -EINVAL;
+-
+-	if (err)
+-		return 4;
+-
+ 	return 0;
+ }
+ 
+@@ -948,6 +924,20 @@ static int usbduxsigma_ao_cmd(struct comedi_device *dev,
+ 
+ 	down(&devpriv->sem);
+ 
++	if (cmd->convert_src == TRIG_TIMER) {
++		/*
++		 * timing of the conversion itself: every 125 us
++		 * at high speed (not used yet)
++		 */
++		devpriv->ao_timer = cmd->convert_arg / 125000;
++	} else {
++		/*
++		 * timing of the scan: every 1ms
++		 * we get all channels at once
++		 */
++		devpriv->ao_timer = cmd->scan_begin_arg / 1000000;
++	}
++
+ 	devpriv->ao_counter = devpriv->ao_timer;
+ 
+ 	if (cmd->start_src == TRIG_NOW) {
+diff --git a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
+index c6cdb43b864c..476808261fa8 100644
+--- a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
++++ b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
+@@ -1826,8 +1826,8 @@ void rtl8192_hard_data_xmit(struct sk_buff *skb, struct net_device *dev,
+ 		return;
+ 	}
+ 
+-	if (queue_index != TXCMD_QUEUE)
+-		netdev_warn(dev, "%s(): queue index != TXCMD_QUEUE\n",
++	if (queue_index == TXCMD_QUEUE)
++		netdev_warn(dev, "%s(): queue index == TXCMD_QUEUE\n",
+ 			    __func__);
+ 
+ 	memcpy((unsigned char *)(skb->cb), &dev, sizeof(dev));
+diff --git a/drivers/staging/unisys/visorbus/visorchipset.c b/drivers/staging/unisys/visorbus/visorchipset.c
+index bb8087e70127..44269d58eb51 100644
+--- a/drivers/staging/unisys/visorbus/visorchipset.c
++++ b/drivers/staging/unisys/visorbus/visorchipset.c
+@@ -2381,6 +2381,9 @@ static struct acpi_driver unisys_acpi_driver = {
+ 		.remove = visorchipset_exit,
+ 		},
+ };
++
++MODULE_DEVICE_TABLE(acpi, unisys_device_ids);
++
+ static __init uint32_t visorutil_spar_detect(void)
+ {
+ 	unsigned int eax, ebx, ecx, edx;
+diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
+index d75a66c72750..b470df122642 100644
+--- a/drivers/tty/serial/8250/8250_omap.c
++++ b/drivers/tty/serial/8250/8250_omap.c
+@@ -100,6 +100,7 @@ struct omap8250_priv {
+ 	struct work_struct qos_work;
+ 	struct uart_8250_dma omap8250_dma;
+ 	spinlock_t rx_dma_lock;
++	bool rx_dma_broken;
+ };
+ 
+ static u32 uart_read(struct uart_8250_port *up, u32 reg)
+@@ -754,6 +755,7 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
+ 	struct omap8250_priv	*priv = p->port.private_data;
+ 	struct uart_8250_dma	*dma = p->dma;
+ 	unsigned long		flags;
++	int ret;
+ 
+ 	spin_lock_irqsave(&priv->rx_dma_lock, flags);
+ 
+@@ -762,7 +764,9 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
+ 		return;
+ 	}
+ 
+-	dmaengine_pause(dma->rxchan);
++	ret = dmaengine_pause(dma->rxchan);
++	if (WARN_ON_ONCE(ret))
++		priv->rx_dma_broken = true;
+ 
+ 	spin_unlock_irqrestore(&priv->rx_dma_lock, flags);
+ 
+@@ -806,6 +810,9 @@ static int omap_8250_rx_dma(struct uart_8250_port *p, unsigned int iir)
+ 		break;
+ 	}
+ 
++	if (priv->rx_dma_broken)
++		return -EINVAL;
++
+ 	spin_lock_irqsave(&priv->rx_dma_lock, flags);
+ 
+ 	if (dma->rx_running)
+@@ -1180,6 +1187,11 @@ static int omap8250_probe(struct platform_device *pdev)
+ 
+ 			if (of_machine_is_compatible("ti,am33xx"))
+ 				priv->habit |= OMAP_DMA_TX_KICK;
++			/*
++			 * pause is currently not supported atleast on omap-sdma
++			 * and edma on most earlier kernels.
++			 */
++			priv->rx_dma_broken = true;
+ 		}
+ 	}
+ #endif
+diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c
+index e55f18b93fe7..46ddce479f26 100644
+--- a/drivers/tty/serial/8250/8250_pci.c
++++ b/drivers/tty/serial/8250/8250_pci.c
+@@ -2017,6 +2017,12 @@ pci_wch_ch38x_setup(struct serial_private *priv,
+ #define PCIE_DEVICE_ID_WCH_CH382_2S1P	0x3250
+ #define PCIE_DEVICE_ID_WCH_CH384_4S	0x3470
+ 
++#define PCI_VENDOR_ID_PERICOM			0x12D8
++#define PCI_DEVICE_ID_PERICOM_PI7C9X7951	0x7951
++#define PCI_DEVICE_ID_PERICOM_PI7C9X7952	0x7952
++#define PCI_DEVICE_ID_PERICOM_PI7C9X7954	0x7954
++#define PCI_DEVICE_ID_PERICOM_PI7C9X7958	0x7958
++
+ /* Unknown vendors/cards - this should not be in linux/pci_ids.h */
+ #define PCI_SUBDEVICE_ID_UNKNOWN_0x1584	0x1584
+ #define PCI_SUBDEVICE_ID_UNKNOWN_0x1588	0x1588
+@@ -2331,27 +2337,12 @@ static struct pci_serial_quirk pci_serial_quirks[] __refdata = {
+ 	 * Pericom
+ 	 */
+ 	{
+-		.vendor		= 0x12d8,
+-		.device		= 0x7952,
+-		.subvendor	= PCI_ANY_ID,
+-		.subdevice	= PCI_ANY_ID,
+-		.setup		= pci_pericom_setup,
+-	},
+-	{
+-		.vendor		= 0x12d8,
+-		.device		= 0x7954,
+-		.subvendor	= PCI_ANY_ID,
+-		.subdevice	= PCI_ANY_ID,
+-		.setup		= pci_pericom_setup,
+-	},
+-	{
+-		.vendor		= 0x12d8,
+-		.device		= 0x7958,
+-		.subvendor	= PCI_ANY_ID,
+-		.subdevice	= PCI_ANY_ID,
+-		.setup		= pci_pericom_setup,
++		.vendor         = PCI_VENDOR_ID_PERICOM,
++		.device         = PCI_ANY_ID,
++		.subvendor      = PCI_ANY_ID,
++		.subdevice      = PCI_ANY_ID,
++		.setup          = pci_pericom_setup,
+ 	},
+-
+ 	/*
+ 	 * PLX
+ 	 */
+@@ -3056,6 +3047,10 @@ enum pci_board_num_t {
+ 	pbn_fintek_8,
+ 	pbn_fintek_12,
+ 	pbn_wch384_4,
++	pbn_pericom_PI7C9X7951,
++	pbn_pericom_PI7C9X7952,
++	pbn_pericom_PI7C9X7954,
++	pbn_pericom_PI7C9X7958,
+ };
+ 
+ /*
+@@ -3881,7 +3876,6 @@ static struct pciserial_board pci_boards[] = {
+ 		.base_baud	= 115200,
+ 		.first_offset	= 0x40,
+ 	},
+-
+ 	[pbn_wch384_4] = {
+ 		.flags		= FL_BASE0,
+ 		.num_ports	= 4,
+@@ -3889,6 +3883,33 @@ static struct pciserial_board pci_boards[] = {
+ 		.uart_offset    = 8,
+ 		.first_offset   = 0xC0,
+ 	},
++	/*
++	 * Pericom PI7C9X795[1248] Uno/Dual/Quad/Octal UART
++	 */
++	[pbn_pericom_PI7C9X7951] = {
++		.flags          = FL_BASE0,
++		.num_ports      = 1,
++		.base_baud      = 921600,
++		.uart_offset	= 0x8,
++	},
++	[pbn_pericom_PI7C9X7952] = {
++		.flags          = FL_BASE0,
++		.num_ports      = 2,
++		.base_baud      = 921600,
++		.uart_offset	= 0x8,
++	},
++	[pbn_pericom_PI7C9X7954] = {
++		.flags          = FL_BASE0,
++		.num_ports      = 4,
++		.base_baud      = 921600,
++		.uart_offset	= 0x8,
++	},
++	[pbn_pericom_PI7C9X7958] = {
++		.flags          = FL_BASE0,
++		.num_ports      = 8,
++		.base_baud      = 921600,
++		.uart_offset	= 0x8,
++	},
+ };
+ 
+ static const struct pci_device_id blacklist[] = {
+@@ -5154,6 +5175,25 @@ static struct pci_device_id serial_pci_tbl[] = {
+ 		0,
+ 		0, pbn_exar_XR17V8358 },
+ 	/*
++	 * Pericom PI7C9X795[1248] Uno/Dual/Quad/Octal UART
++	 */
++	{   PCI_VENDOR_ID_PERICOM, PCI_DEVICE_ID_PERICOM_PI7C9X7951,
++		PCI_ANY_ID, PCI_ANY_ID,
++		0,
++		0, pbn_pericom_PI7C9X7951 },
++	{   PCI_VENDOR_ID_PERICOM, PCI_DEVICE_ID_PERICOM_PI7C9X7952,
++		PCI_ANY_ID, PCI_ANY_ID,
++		0,
++		0, pbn_pericom_PI7C9X7952 },
++	{   PCI_VENDOR_ID_PERICOM, PCI_DEVICE_ID_PERICOM_PI7C9X7954,
++		PCI_ANY_ID, PCI_ANY_ID,
++		0,
++		0, pbn_pericom_PI7C9X7954 },
++	{   PCI_VENDOR_ID_PERICOM, PCI_DEVICE_ID_PERICOM_PI7C9X7958,
++		PCI_ANY_ID, PCI_ANY_ID,
++		0,
++		0, pbn_pericom_PI7C9X7958 },
++	/*
+ 	 * Topic TP560 Data/Fax/Voice 56k modem (reported by Evan Clarke)
+ 	 */
+ 	{	PCI_VENDOR_ID_TOPIC, PCI_DEVICE_ID_TOPIC_TP560,
+diff --git a/drivers/tty/serial/8250/8250_pnp.c b/drivers/tty/serial/8250/8250_pnp.c
+index 50a09cd76d50..658b392d1170 100644
+--- a/drivers/tty/serial/8250/8250_pnp.c
++++ b/drivers/tty/serial/8250/8250_pnp.c
+@@ -41,6 +41,12 @@ static const struct pnp_device_id pnp_dev_table[] = {
+ 	{	"AEI1240",		0	},
+ 	/* Rockwell 56K ACF II Fax+Data+Voice Modem */
+ 	{	"AKY1021",		0 /*SPCI_FL_NO_SHIRQ*/	},
++	/*
++	 * ALi Fast Infrared Controller
++	 * Native driver (ali-ircc) is broken so at least
++	 * it can be used with irtty-sir.
++	 */
++	{	"ALI5123",		0	},
+ 	/* AZT3005 PnP SOUND DEVICE */
+ 	{	"AZT4001",		0	},
+ 	/* Best Data Products Inc. Smart One 336F PnP Modem */
+@@ -364,6 +370,11 @@ static const struct pnp_device_id pnp_dev_table[] = {
+ 	/* Winbond CIR port, should not be probed. We should keep track
+ 	   of it to prevent the legacy serial driver from probing it */
+ 	{	"WEC1022",		CIR_PORT	},
++	/*
++	 * SMSC IrCC SIR/FIR port, should not be probed by serial driver
++	 * as well so its own driver can bind to it.
++	 */
++	{	"SMCF010",		CIR_PORT	},
+ 	{	"",			0	}
+ };
+ 
+diff --git a/drivers/tty/serial/8250/8250_uniphier.c b/drivers/tty/serial/8250/8250_uniphier.c
+index 7d79425c2b09..d11621e2cf1d 100644
+--- a/drivers/tty/serial/8250/8250_uniphier.c
++++ b/drivers/tty/serial/8250/8250_uniphier.c
+@@ -218,6 +218,7 @@ static int uniphier_uart_probe(struct platform_device *pdev)
+ 	ret = serial8250_register_8250_port(&up);
+ 	if (ret < 0) {
+ 		dev_err(dev, "failed to register 8250 port\n");
++		clk_disable_unprepare(priv->clk);
+ 		return ret;
+ 	}
+ 
+diff --git a/drivers/tty/serial/men_z135_uart.c b/drivers/tty/serial/men_z135_uart.c
+index 35c55505b3eb..5a41b8fbb10a 100644
+--- a/drivers/tty/serial/men_z135_uart.c
++++ b/drivers/tty/serial/men_z135_uart.c
+@@ -392,7 +392,6 @@ static irqreturn_t men_z135_intr(int irq, void *data)
+ 	struct men_z135_port *uart = (struct men_z135_port *)data;
+ 	struct uart_port *port = &uart->port;
+ 	bool handled = false;
+-	unsigned long flags;
+ 	int irq_id;
+ 
+ 	uart->stat_reg = ioread32(port->membase + MEN_Z135_STAT_REG);
+@@ -401,7 +400,7 @@ static irqreturn_t men_z135_intr(int irq, void *data)
+ 	if (!irq_id)
+ 		goto out;
+ 
+-	spin_lock_irqsave(&port->lock, flags);
++	spin_lock(&port->lock);
+ 	/* It's save to write to IIR[7:6] RXC[9:8] */
+ 	iowrite8(irq_id, port->membase + MEN_Z135_STAT_REG);
+ 
+@@ -427,7 +426,7 @@ static irqreturn_t men_z135_intr(int irq, void *data)
+ 		handled = true;
+ 	}
+ 
+-	spin_unlock_irqrestore(&port->lock, flags);
++	spin_unlock(&port->lock);
+ out:
+ 	return IRQ_RETVAL(handled);
+ }
+@@ -717,7 +716,7 @@ static void men_z135_set_termios(struct uart_port *port,
+ 
+ 	baud = uart_get_baud_rate(port, termios, old, 0, uart_freq / 16);
+ 
+-	spin_lock(&port->lock);
++	spin_lock_irq(&port->lock);
+ 	if (tty_termios_baud_rate(termios))
+ 		tty_termios_encode_baud_rate(termios, baud, baud);
+ 
+@@ -725,7 +724,7 @@ static void men_z135_set_termios(struct uart_port *port,
+ 	iowrite32(bd_reg, port->membase + MEN_Z135_BAUD_REG);
+ 
+ 	uart_update_timeout(port, termios->c_cflag, baud);
+-	spin_unlock(&port->lock);
++	spin_unlock_irq(&port->lock);
+ }
+ 
+ static const char *men_z135_type(struct uart_port *port)
+diff --git a/drivers/tty/serial/samsung.c b/drivers/tty/serial/samsung.c
+index 67d0c213b1c7..5916311eecb1 100644
+--- a/drivers/tty/serial/samsung.c
++++ b/drivers/tty/serial/samsung.c
+@@ -295,15 +295,6 @@ static int s3c24xx_serial_start_tx_dma(struct s3c24xx_uart_port *ourport,
+ 	if (ourport->tx_mode != S3C24XX_TX_DMA)
+ 		enable_tx_dma(ourport);
+ 
+-	while (xmit->tail & (dma_get_cache_alignment() - 1)) {
+-		if (rd_regl(port, S3C2410_UFSTAT) & ourport->info->tx_fifofull)
+-			return 0;
+-		wr_regb(port, S3C2410_UTXH, xmit->buf[xmit->tail]);
+-		xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);
+-		port->icount.tx++;
+-		count--;
+-	}
+-
+ 	dma->tx_size = count & ~(dma_get_cache_alignment() - 1);
+ 	dma->tx_transfer_addr = dma->tx_addr + xmit->tail;
+ 
+@@ -342,7 +333,9 @@ static void s3c24xx_serial_start_next_tx(struct s3c24xx_uart_port *ourport)
+ 		return;
+ 	}
+ 
+-	if (!ourport->dma || !ourport->dma->tx_chan || count < port->fifosize)
++	if (!ourport->dma || !ourport->dma->tx_chan ||
++	    count < ourport->min_dma_size ||
++	    xmit->tail & (dma_get_cache_alignment() - 1))
+ 		s3c24xx_serial_start_tx_pio(ourport);
+ 	else
+ 		s3c24xx_serial_start_tx_dma(ourport, count);
+@@ -736,15 +729,20 @@ static irqreturn_t s3c24xx_serial_tx_chars(int irq, void *id)
+ 	struct uart_port *port = &ourport->port;
+ 	struct circ_buf *xmit = &port->state->xmit;
+ 	unsigned long flags;
+-	int count;
++	int count, dma_count = 0;
+ 
+ 	spin_lock_irqsave(&port->lock, flags);
+ 
+ 	count = CIRC_CNT_TO_END(xmit->head, xmit->tail, UART_XMIT_SIZE);
+ 
+-	if (ourport->dma && ourport->dma->tx_chan && count >= port->fifosize) {
+-		s3c24xx_serial_start_tx_dma(ourport, count);
+-		goto out;
++	if (ourport->dma && ourport->dma->tx_chan &&
++	    count >= ourport->min_dma_size) {
++		int align = dma_get_cache_alignment() -
++			(xmit->tail & (dma_get_cache_alignment() - 1));
++		if (count-align >= ourport->min_dma_size) {
++			dma_count = count-align;
++			count = align;
++		}
+ 	}
+ 
+ 	if (port->x_char) {
+@@ -765,14 +763,24 @@ static irqreturn_t s3c24xx_serial_tx_chars(int irq, void *id)
+ 
+ 	/* try and drain the buffer... */
+ 
+-	count = port->fifosize;
+-	while (!uart_circ_empty(xmit) && count-- > 0) {
++	if (count > port->fifosize) {
++		count = port->fifosize;
++		dma_count = 0;
++	}
++
++	while (!uart_circ_empty(xmit) && count > 0) {
+ 		if (rd_regl(port, S3C2410_UFSTAT) & ourport->info->tx_fifofull)
+ 			break;
+ 
+ 		wr_regb(port, S3C2410_UTXH, xmit->buf[xmit->tail]);
+ 		xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);
+ 		port->icount.tx++;
++		count--;
++	}
++
++	if (!count && dma_count) {
++		s3c24xx_serial_start_tx_dma(ourport, dma_count);
++		goto out;
+ 	}
+ 
+ 	if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS) {
+@@ -1838,6 +1846,13 @@ static int s3c24xx_serial_probe(struct platform_device *pdev)
+ 	else if (ourport->info->fifosize)
+ 		ourport->port.fifosize = ourport->info->fifosize;
+ 
++	/*
++	 * DMA transfers must be aligned at least to cache line size,
++	 * so find minimal transfer size suitable for DMA mode
++	 */
++	ourport->min_dma_size = max_t(int, ourport->port.fifosize,
++				    dma_get_cache_alignment());
++
+ 	probe_index++;
+ 
+ 	dbg("%s: initialising port %p...\n", __func__, ourport);
+diff --git a/drivers/tty/serial/samsung.h b/drivers/tty/serial/samsung.h
+index d275032aa68d..fc5deaa4f382 100644
+--- a/drivers/tty/serial/samsung.h
++++ b/drivers/tty/serial/samsung.h
+@@ -82,6 +82,7 @@ struct s3c24xx_uart_port {
+ 	unsigned char			tx_claimed;
+ 	unsigned int			pm_level;
+ 	unsigned long			baudclk_rate;
++	unsigned int			min_dma_size;
+ 
+ 	unsigned int			rx_irq;
+ 	unsigned int			tx_irq;
+diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
+index 69e769c35cf5..06ecd1e6871c 100644
+--- a/drivers/usb/dwc3/ep0.c
++++ b/drivers/usb/dwc3/ep0.c
+@@ -820,6 +820,11 @@ static void dwc3_ep0_complete_data(struct dwc3 *dwc,
+ 		unsigned maxp = ep0->endpoint.maxpacket;
+ 
+ 		transfer_size += (maxp - (transfer_size % maxp));
++
++		/* Maximum of DWC3_EP0_BOUNCE_SIZE can only be received */
++		if (transfer_size > DWC3_EP0_BOUNCE_SIZE)
++			transfer_size = DWC3_EP0_BOUNCE_SIZE;
++
+ 		transferred = min_t(u32, ur->length,
+ 				transfer_size - length);
+ 		memcpy(ur->buf, dwc->ep0_bounce, transferred);
+@@ -941,11 +946,14 @@ static void __dwc3_ep0_do_control_data(struct dwc3 *dwc,
+ 			return;
+ 		}
+ 
+-		WARN_ON(req->request.length > DWC3_EP0_BOUNCE_SIZE);
+-
+ 		maxpacket = dep->endpoint.maxpacket;
+ 		transfer_size = roundup(req->request.length, maxpacket);
+ 
++		if (transfer_size > DWC3_EP0_BOUNCE_SIZE) {
++			dev_WARN(dwc->dev, "bounce buf can't handle req len\n");
++			transfer_size = DWC3_EP0_BOUNCE_SIZE;
++		}
++
+ 		dwc->ep0_bounced = true;
+ 
+ 		/*
+diff --git a/drivers/usb/gadget/function/f_uac2.c b/drivers/usb/gadget/function/f_uac2.c
+index 531861547253..96d935b00504 100644
+--- a/drivers/usb/gadget/function/f_uac2.c
++++ b/drivers/usb/gadget/function/f_uac2.c
+@@ -975,6 +975,29 @@ free_ep(struct uac2_rtd_params *prm, struct usb_ep *ep)
+ 			"%s:%d Error!\n", __func__, __LINE__);
+ }
+ 
++static void set_ep_max_packet_size(const struct f_uac2_opts *uac2_opts,
++	struct usb_endpoint_descriptor *ep_desc,
++	unsigned int factor, bool is_playback)
++{
++	int chmask, srate, ssize;
++	u16 max_packet_size;
++
++	if (is_playback) {
++		chmask = uac2_opts->p_chmask;
++		srate = uac2_opts->p_srate;
++		ssize = uac2_opts->p_ssize;
++	} else {
++		chmask = uac2_opts->c_chmask;
++		srate = uac2_opts->c_srate;
++		ssize = uac2_opts->c_ssize;
++	}
++
++	max_packet_size = num_channels(chmask) * ssize *
++		DIV_ROUND_UP(srate, factor / (1 << (ep_desc->bInterval - 1)));
++	ep_desc->wMaxPacketSize = cpu_to_le16(min(max_packet_size,
++				le16_to_cpu(ep_desc->wMaxPacketSize)));
++}
++
+ static int
+ afunc_bind(struct usb_configuration *cfg, struct usb_function *fn)
+ {
+@@ -1070,10 +1093,14 @@ afunc_bind(struct usb_configuration *cfg, struct usb_function *fn)
+ 	uac2->p_prm.uac2 = uac2;
+ 	uac2->c_prm.uac2 = uac2;
+ 
++	/* Calculate wMaxPacketSize according to audio bandwidth */
++	set_ep_max_packet_size(uac2_opts, &fs_epin_desc, 1000, true);
++	set_ep_max_packet_size(uac2_opts, &fs_epout_desc, 1000, false);
++	set_ep_max_packet_size(uac2_opts, &hs_epin_desc, 8000, true);
++	set_ep_max_packet_size(uac2_opts, &hs_epout_desc, 8000, false);
++
+ 	hs_epout_desc.bEndpointAddress = fs_epout_desc.bEndpointAddress;
+-	hs_epout_desc.wMaxPacketSize = fs_epout_desc.wMaxPacketSize;
+ 	hs_epin_desc.bEndpointAddress = fs_epin_desc.bEndpointAddress;
+-	hs_epin_desc.wMaxPacketSize = fs_epin_desc.wMaxPacketSize;
+ 
+ 	ret = usb_assign_descriptors(fn, fs_audio_desc, hs_audio_desc, NULL);
+ 	if (ret)
+diff --git a/drivers/usb/gadget/udc/m66592-udc.c b/drivers/usb/gadget/udc/m66592-udc.c
+index 309706fe4bf0..9704053dfe05 100644
+--- a/drivers/usb/gadget/udc/m66592-udc.c
++++ b/drivers/usb/gadget/udc/m66592-udc.c
+@@ -1052,7 +1052,7 @@ static void set_feature(struct m66592 *m66592, struct usb_ctrlrequest *ctrl)
+ 				tmp = m66592_read(m66592, M66592_INTSTS0) &
+ 								M66592_CTSQ;
+ 				udelay(1);
+-			} while (tmp != M66592_CS_IDST || timeout-- > 0);
++			} while (tmp != M66592_CS_IDST && timeout-- > 0);
+ 
+ 			if (tmp == M66592_CS_IDST)
+ 				m66592_bset(m66592,
+diff --git a/drivers/usb/host/ehci-sysfs.c b/drivers/usb/host/ehci-sysfs.c
+index 5e44407aa099..5216f2b09d63 100644
+--- a/drivers/usb/host/ehci-sysfs.c
++++ b/drivers/usb/host/ehci-sysfs.c
+@@ -29,7 +29,7 @@ static ssize_t show_companion(struct device *dev,
+ 	int			count = PAGE_SIZE;
+ 	char			*ptr = buf;
+ 
+-	ehci = hcd_to_ehci(bus_to_hcd(dev_get_drvdata(dev)));
++	ehci = hcd_to_ehci(dev_get_drvdata(dev));
+ 	nports = HCS_N_PORTS(ehci->hcs_params);
+ 
+ 	for (index = 0; index < nports; ++index) {
+@@ -54,7 +54,7 @@ static ssize_t store_companion(struct device *dev,
+ 	struct ehci_hcd		*ehci;
+ 	int			portnum, new_owner;
+ 
+-	ehci = hcd_to_ehci(bus_to_hcd(dev_get_drvdata(dev)));
++	ehci = hcd_to_ehci(dev_get_drvdata(dev));
+ 	new_owner = PORT_OWNER;		/* Owned by companion */
+ 	if (sscanf(buf, "%d", &portnum) != 1)
+ 		return -EINVAL;
+@@ -85,7 +85,7 @@ static ssize_t show_uframe_periodic_max(struct device *dev,
+ 	struct ehci_hcd		*ehci;
+ 	int			n;
+ 
+-	ehci = hcd_to_ehci(bus_to_hcd(dev_get_drvdata(dev)));
++	ehci = hcd_to_ehci(dev_get_drvdata(dev));
+ 	n = scnprintf(buf, PAGE_SIZE, "%d\n", ehci->uframe_periodic_max);
+ 	return n;
+ }
+@@ -101,7 +101,7 @@ static ssize_t store_uframe_periodic_max(struct device *dev,
+ 	unsigned long		flags;
+ 	ssize_t			ret;
+ 
+-	ehci = hcd_to_ehci(bus_to_hcd(dev_get_drvdata(dev)));
++	ehci = hcd_to_ehci(dev_get_drvdata(dev));
+ 	if (kstrtouint(buf, 0, &uframe_periodic_max) < 0)
+ 		return -EINVAL;
+ 
+diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
+index 4c8b3b82103d..a5a0376bbd48 100644
+--- a/drivers/usb/serial/ftdi_sio.c
++++ b/drivers/usb/serial/ftdi_sio.c
+@@ -605,6 +605,10 @@ static const struct usb_device_id id_table_combined[] = {
+ 	{ USB_DEVICE(FTDI_VID, FTDI_NT_ORIONLXM_PID),
+ 		.driver_info = (kernel_ulong_t)&ftdi_jtag_quirk },
+ 	{ USB_DEVICE(FTDI_VID, FTDI_SYNAPSE_SS200_PID) },
++	{ USB_DEVICE(FTDI_VID, FTDI_CUSTOMWARE_MINIPLEX_PID) },
++	{ USB_DEVICE(FTDI_VID, FTDI_CUSTOMWARE_MINIPLEX2_PID) },
++	{ USB_DEVICE(FTDI_VID, FTDI_CUSTOMWARE_MINIPLEX2WI_PID) },
++	{ USB_DEVICE(FTDI_VID, FTDI_CUSTOMWARE_MINIPLEX3_PID) },
+ 	/*
+ 	 * ELV devices:
+ 	 */
+diff --git a/drivers/usb/serial/ftdi_sio_ids.h b/drivers/usb/serial/ftdi_sio_ids.h
+index 792e054126de..2943b97b2a83 100644
+--- a/drivers/usb/serial/ftdi_sio_ids.h
++++ b/drivers/usb/serial/ftdi_sio_ids.h
+@@ -568,6 +568,14 @@
+  */
+ #define FTDI_SYNAPSE_SS200_PID 0x9090 /* SS200 - SNAP Stick 200 */
+ 
++/*
++ * CustomWare / ShipModul NMEA multiplexers product ids (FTDI_VID)
++ */
++#define FTDI_CUSTOMWARE_MINIPLEX_PID	0xfd48	/* MiniPlex first generation NMEA Multiplexer */
++#define FTDI_CUSTOMWARE_MINIPLEX2_PID	0xfd49	/* MiniPlex-USB and MiniPlex-2 series */
++#define FTDI_CUSTOMWARE_MINIPLEX2WI_PID	0xfd4a	/* MiniPlex-2Wi */
++#define FTDI_CUSTOMWARE_MINIPLEX3_PID	0xfd4b	/* MiniPlex-3 series */
++
+ 
+ /********************************/
+ /** third-party VID/PID combos **/
+diff --git a/drivers/usb/serial/pl2303.c b/drivers/usb/serial/pl2303.c
+index f5257af33ecf..ae682e4eeaef 100644
+--- a/drivers/usb/serial/pl2303.c
++++ b/drivers/usb/serial/pl2303.c
+@@ -362,21 +362,38 @@ static speed_t pl2303_encode_baud_rate_direct(unsigned char buf[4],
+ static speed_t pl2303_encode_baud_rate_divisor(unsigned char buf[4],
+ 								speed_t baud)
+ {
+-	unsigned int tmp;
++	unsigned int baseline, mantissa, exponent;
+ 
+ 	/*
+ 	 * Apparently the formula is:
+-	 * baudrate = 12M * 32 / (2^buf[1]) / buf[0]
++	 *   baudrate = 12M * 32 / (mantissa * 4^exponent)
++	 * where
++	 *   mantissa = buf[8:0]
++	 *   exponent = buf[11:9]
+ 	 */
+-	tmp = 12000000 * 32 / baud;
++	baseline = 12000000 * 32;
++	mantissa = baseline / baud;
++	if (mantissa == 0)
++		mantissa = 1;	/* Avoid dividing by zero if baud > 32*12M. */
++	exponent = 0;
++	while (mantissa >= 512) {
++		if (exponent < 7) {
++			mantissa >>= 2;	/* divide by 4 */
++			exponent++;
++		} else {
++			/* Exponent is maxed. Trim mantissa and leave. */
++			mantissa = 511;
++			break;
++		}
++	}
++
+ 	buf[3] = 0x80;
+ 	buf[2] = 0;
+-	buf[1] = (tmp >= 256);
+-	while (tmp >= 256) {
+-		tmp >>= 2;
+-		buf[1] <<= 1;
+-	}
+-	buf[0] = tmp;
++	buf[1] = exponent << 1 | mantissa >> 8;
++	buf[0] = mantissa & 0xff;
++
++	/* Calculate and return the exact baud rate. */
++	baud = (baseline / mantissa) >> (exponent << 1);
+ 
+ 	return baud;
+ }
+diff --git a/drivers/usb/serial/qcserial.c b/drivers/usb/serial/qcserial.c
+index d156545728c2..ebcec8cda858 100644
+--- a/drivers/usb/serial/qcserial.c
++++ b/drivers/usb/serial/qcserial.c
+@@ -139,6 +139,7 @@ static const struct usb_device_id id_table[] = {
+ 	{USB_DEVICE(0x0AF0, 0x8120)},	/* Option GTM681W */
+ 
+ 	/* non-Gobi Sierra Wireless devices */
++	{DEVICE_SWI(0x03f0, 0x4e1d)},	/* HP lt4111 LTE/EV-DO/HSPA+ Gobi 4G Module */
+ 	{DEVICE_SWI(0x0f3d, 0x68a2)},	/* Sierra Wireless MC7700 */
+ 	{DEVICE_SWI(0x114f, 0x68a2)},	/* Sierra Wireless MC7750 */
+ 	{DEVICE_SWI(0x1199, 0x68a2)},	/* Sierra Wireless MC7710 */
+diff --git a/drivers/usb/serial/symbolserial.c b/drivers/usb/serial/symbolserial.c
+index 8fceec7298e0..6ed804450a5a 100644
+--- a/drivers/usb/serial/symbolserial.c
++++ b/drivers/usb/serial/symbolserial.c
+@@ -94,7 +94,7 @@ exit:
+ 
+ static int symbol_open(struct tty_struct *tty, struct usb_serial_port *port)
+ {
+-	struct symbol_private *priv = usb_get_serial_data(port->serial);
++	struct symbol_private *priv = usb_get_serial_port_data(port);
+ 	unsigned long flags;
+ 	int result = 0;
+ 
+@@ -120,7 +120,7 @@ static void symbol_close(struct usb_serial_port *port)
+ static void symbol_throttle(struct tty_struct *tty)
+ {
+ 	struct usb_serial_port *port = tty->driver_data;
+-	struct symbol_private *priv = usb_get_serial_data(port->serial);
++	struct symbol_private *priv = usb_get_serial_port_data(port);
+ 
+ 	spin_lock_irq(&priv->lock);
+ 	priv->throttled = true;
+@@ -130,7 +130,7 @@ static void symbol_throttle(struct tty_struct *tty)
+ static void symbol_unthrottle(struct tty_struct *tty)
+ {
+ 	struct usb_serial_port *port = tty->driver_data;
+-	struct symbol_private *priv = usb_get_serial_data(port->serial);
++	struct symbol_private *priv = usb_get_serial_port_data(port);
+ 	int result;
+ 	bool was_throttled;
+ 
+diff --git a/fs/ceph/super.c b/fs/ceph/super.c
+index d1c833c321b9..7b6bfcbf801c 100644
+--- a/fs/ceph/super.c
++++ b/fs/ceph/super.c
+@@ -479,7 +479,7 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
+ 	if (fsopt->max_readdir_bytes != CEPH_MAX_READDIR_BYTES_DEFAULT)
+ 		seq_printf(m, ",readdir_max_bytes=%d", fsopt->max_readdir_bytes);
+ 	if (strcmp(fsopt->snapdir_name, CEPH_SNAPDIRNAME_DEFAULT))
+-		seq_printf(m, ",snapdirname=%s", fsopt->snapdir_name);
++		seq_show_option(m, "snapdirname", fsopt->snapdir_name);
+ 
+ 	return 0;
+ }
+diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
+index 0a9fb6b53126..6a1119e87fbb 100644
+--- a/fs/cifs/cifsfs.c
++++ b/fs/cifs/cifsfs.c
+@@ -394,17 +394,17 @@ cifs_show_options(struct seq_file *s, struct dentry *root)
+ 	struct sockaddr *srcaddr;
+ 	srcaddr = (struct sockaddr *)&tcon->ses->server->srcaddr;
+ 
+-	seq_printf(s, ",vers=%s", tcon->ses->server->vals->version_string);
++	seq_show_option(s, "vers", tcon->ses->server->vals->version_string);
+ 	cifs_show_security(s, tcon->ses);
+ 	cifs_show_cache_flavor(s, cifs_sb);
+ 
+ 	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER)
+ 		seq_puts(s, ",multiuser");
+ 	else if (tcon->ses->user_name)
+-		seq_printf(s, ",username=%s", tcon->ses->user_name);
++		seq_show_option(s, "username", tcon->ses->user_name);
+ 
+ 	if (tcon->ses->domainName)
+-		seq_printf(s, ",domain=%s", tcon->ses->domainName);
++		seq_show_option(s, "domain", tcon->ses->domainName);
+ 
+ 	if (srcaddr->sa_family != AF_UNSPEC) {
+ 		struct sockaddr_in *saddr4;
+diff --git a/fs/ext4/super.c b/fs/ext4/super.c
+index 58987b5c514b..9981064c4a54 100644
+--- a/fs/ext4/super.c
++++ b/fs/ext4/super.c
+@@ -1763,10 +1763,10 @@ static inline void ext4_show_quota_options(struct seq_file *seq,
+ 	}
+ 
+ 	if (sbi->s_qf_names[USRQUOTA])
+-		seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]);
++		seq_show_option(seq, "usrjquota", sbi->s_qf_names[USRQUOTA]);
+ 
+ 	if (sbi->s_qf_names[GRPQUOTA])
+-		seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]);
++		seq_show_option(seq, "grpjquota", sbi->s_qf_names[GRPQUOTA]);
+ #endif
+ }
+ 
+diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
+index 2982445947e1..894fb01a91da 100644
+--- a/fs/gfs2/super.c
++++ b/fs/gfs2/super.c
+@@ -1334,11 +1334,11 @@ static int gfs2_show_options(struct seq_file *s, struct dentry *root)
+ 	if (is_ancestor(root, sdp->sd_master_dir))
+ 		seq_puts(s, ",meta");
+ 	if (args->ar_lockproto[0])
+-		seq_printf(s, ",lockproto=%s", args->ar_lockproto);
++		seq_show_option(s, "lockproto", args->ar_lockproto);
+ 	if (args->ar_locktable[0])
+-		seq_printf(s, ",locktable=%s", args->ar_locktable);
++		seq_show_option(s, "locktable", args->ar_locktable);
+ 	if (args->ar_hostdata[0])
+-		seq_printf(s, ",hostdata=%s", args->ar_hostdata);
++		seq_show_option(s, "hostdata", args->ar_hostdata);
+ 	if (args->ar_spectator)
+ 		seq_puts(s, ",spectator");
+ 	if (args->ar_localflocks)
+diff --git a/fs/hfs/super.c b/fs/hfs/super.c
+index 55c03b9e9070..4574fdd3d421 100644
+--- a/fs/hfs/super.c
++++ b/fs/hfs/super.c
+@@ -136,9 +136,9 @@ static int hfs_show_options(struct seq_file *seq, struct dentry *root)
+ 	struct hfs_sb_info *sbi = HFS_SB(root->d_sb);
+ 
+ 	if (sbi->s_creator != cpu_to_be32(0x3f3f3f3f))
+-		seq_printf(seq, ",creator=%.4s", (char *)&sbi->s_creator);
++		seq_show_option_n(seq, "creator", (char *)&sbi->s_creator, 4);
+ 	if (sbi->s_type != cpu_to_be32(0x3f3f3f3f))
+-		seq_printf(seq, ",type=%.4s", (char *)&sbi->s_type);
++		seq_show_option_n(seq, "type", (char *)&sbi->s_type, 4);
+ 	seq_printf(seq, ",uid=%u,gid=%u",
+ 			from_kuid_munged(&init_user_ns, sbi->s_uid),
+ 			from_kgid_munged(&init_user_ns, sbi->s_gid));
+diff --git a/fs/hfsplus/options.c b/fs/hfsplus/options.c
+index c90b72ee676d..bb806e58c977 100644
+--- a/fs/hfsplus/options.c
++++ b/fs/hfsplus/options.c
+@@ -218,9 +218,9 @@ int hfsplus_show_options(struct seq_file *seq, struct dentry *root)
+ 	struct hfsplus_sb_info *sbi = HFSPLUS_SB(root->d_sb);
+ 
+ 	if (sbi->creator != HFSPLUS_DEF_CR_TYPE)
+-		seq_printf(seq, ",creator=%.4s", (char *)&sbi->creator);
++		seq_show_option_n(seq, "creator", (char *)&sbi->creator, 4);
+ 	if (sbi->type != HFSPLUS_DEF_CR_TYPE)
+-		seq_printf(seq, ",type=%.4s", (char *)&sbi->type);
++		seq_show_option_n(seq, "type", (char *)&sbi->type, 4);
+ 	seq_printf(seq, ",umask=%o,uid=%u,gid=%u", sbi->umask,
+ 			from_kuid_munged(&init_user_ns, sbi->uid),
+ 			from_kgid_munged(&init_user_ns, sbi->gid));
+diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
+index 059597b23f67..2ac99db3750e 100644
+--- a/fs/hostfs/hostfs_kern.c
++++ b/fs/hostfs/hostfs_kern.c
+@@ -260,7 +260,7 @@ static int hostfs_show_options(struct seq_file *seq, struct dentry *root)
+ 	size_t offset = strlen(root_ino) + 1;
+ 
+ 	if (strlen(root_path) > offset)
+-		seq_printf(seq, ",%s", root_path + offset);
++		seq_show_option(seq, root_path + offset, NULL);
+ 
+ 	if (append)
+ 		seq_puts(seq, ",append");
+diff --git a/fs/hpfs/namei.c b/fs/hpfs/namei.c
+index a0872f239f04..9e92c9c2d319 100644
+--- a/fs/hpfs/namei.c
++++ b/fs/hpfs/namei.c
+@@ -8,6 +8,17 @@
+ #include <linux/sched.h>
+ #include "hpfs_fn.h"
+ 
++static void hpfs_update_directory_times(struct inode *dir)
++{
++	time_t t = get_seconds();
++	if (t == dir->i_mtime.tv_sec &&
++	    t == dir->i_ctime.tv_sec)
++		return;
++	dir->i_mtime.tv_sec = dir->i_ctime.tv_sec = t;
++	dir->i_mtime.tv_nsec = dir->i_ctime.tv_nsec = 0;
++	hpfs_write_inode_nolock(dir);
++}
++
+ static int hpfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+ {
+ 	const unsigned char *name = dentry->d_name.name;
+@@ -99,6 +110,7 @@ static int hpfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+ 		result->i_mode = mode | S_IFDIR;
+ 		hpfs_write_inode_nolock(result);
+ 	}
++	hpfs_update_directory_times(dir);
+ 	d_instantiate(dentry, result);
+ 	hpfs_unlock(dir->i_sb);
+ 	return 0;
+@@ -187,6 +199,7 @@ static int hpfs_create(struct inode *dir, struct dentry *dentry, umode_t mode, b
+ 		result->i_mode = mode | S_IFREG;
+ 		hpfs_write_inode_nolock(result);
+ 	}
++	hpfs_update_directory_times(dir);
+ 	d_instantiate(dentry, result);
+ 	hpfs_unlock(dir->i_sb);
+ 	return 0;
+@@ -262,6 +275,7 @@ static int hpfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, de
+ 	insert_inode_hash(result);
+ 
+ 	hpfs_write_inode_nolock(result);
++	hpfs_update_directory_times(dir);
+ 	d_instantiate(dentry, result);
+ 	brelse(bh);
+ 	hpfs_unlock(dir->i_sb);
+@@ -340,6 +354,7 @@ static int hpfs_symlink(struct inode *dir, struct dentry *dentry, const char *sy
+ 	insert_inode_hash(result);
+ 
+ 	hpfs_write_inode_nolock(result);
++	hpfs_update_directory_times(dir);
+ 	d_instantiate(dentry, result);
+ 	hpfs_unlock(dir->i_sb);
+ 	return 0;
+@@ -423,6 +438,8 @@ again:
+ out1:
+ 	hpfs_brelse4(&qbh);
+ out:
++	if (!err)
++		hpfs_update_directory_times(dir);
+ 	hpfs_unlock(dir->i_sb);
+ 	return err;
+ }
+@@ -477,6 +494,8 @@ static int hpfs_rmdir(struct inode *dir, struct dentry *dentry)
+ out1:
+ 	hpfs_brelse4(&qbh);
+ out:
++	if (!err)
++		hpfs_update_directory_times(dir);
+ 	hpfs_unlock(dir->i_sb);
+ 	return err;
+ }
+@@ -595,7 +614,7 @@ static int hpfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ 		goto end1;
+ 	}
+ 
+-	end:
++end:
+ 	hpfs_i(i)->i_parent_dir = new_dir->i_ino;
+ 	if (S_ISDIR(i->i_mode)) {
+ 		inc_nlink(new_dir);
+@@ -610,6 +629,10 @@ static int hpfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ 		brelse(bh);
+ 	}
+ end1:
++	if (!err) {
++		hpfs_update_directory_times(old_dir);
++		hpfs_update_directory_times(new_dir);
++	}
+ 	hpfs_unlock(i->i_sb);
+ 	return err;
+ }
+diff --git a/fs/libfs.c b/fs/libfs.c
+index 102edfd39000..c7cbfb092e94 100644
+--- a/fs/libfs.c
++++ b/fs/libfs.c
+@@ -1185,7 +1185,7 @@ void make_empty_dir_inode(struct inode *inode)
+ 	inode->i_uid = GLOBAL_ROOT_UID;
+ 	inode->i_gid = GLOBAL_ROOT_GID;
+ 	inode->i_rdev = 0;
+-	inode->i_size = 2;
++	inode->i_size = 0;
+ 	inode->i_blkbits = PAGE_SHIFT;
+ 	inode->i_blocks = 0;
+ 
+diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
+index 719f7f4c7a37..33efa334ec76 100644
+--- a/fs/ocfs2/file.c
++++ b/fs/ocfs2/file.c
+@@ -2372,6 +2372,20 @@ relock:
+ 	/* buffered aio wouldn't have proper lock coverage today */
+ 	BUG_ON(written == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT));
+ 
++	/*
++	 * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io
++	 * function pointer which is called when o_direct io completes so that
++	 * it can unlock our rw lock.
++	 * Unfortunately there are error cases which call end_io and others
++	 * that don't.  so we don't have to unlock the rw_lock if either an
++	 * async dio is going to do it in the future or an end_io after an
++	 * error has already done it.
++	 */
++	if ((written == -EIOCBQUEUED) || (!ocfs2_iocb_is_rw_locked(iocb))) {
++		rw_level = -1;
++		unaligned_dio = 0;
++	}
++
+ 	if (unlikely(written <= 0))
+ 		goto no_sync;
+ 
+@@ -2396,20 +2410,6 @@ relock:
+ 	}
+ 
+ no_sync:
+-	/*
+-	 * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io
+-	 * function pointer which is called when o_direct io completes so that
+-	 * it can unlock our rw lock.
+-	 * Unfortunately there are error cases which call end_io and others
+-	 * that don't.  so we don't have to unlock the rw_lock if either an
+-	 * async dio is going to do it in the future or an end_io after an
+-	 * error has already done it.
+-	 */
+-	if ((ret == -EIOCBQUEUED) || (!ocfs2_iocb_is_rw_locked(iocb))) {
+-		rw_level = -1;
+-		unaligned_dio = 0;
+-	}
+-
+ 	if (unaligned_dio) {
+ 		ocfs2_iocb_clear_unaligned_aio(iocb);
+ 		mutex_unlock(&OCFS2_I(inode)->ip_unaligned_aio);
+diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
+index 403c5660b306..a482e312c7b2 100644
+--- a/fs/ocfs2/super.c
++++ b/fs/ocfs2/super.c
+@@ -1550,8 +1550,8 @@ static int ocfs2_show_options(struct seq_file *s, struct dentry *root)
+ 		seq_printf(s, ",localflocks,");
+ 
+ 	if (osb->osb_cluster_stack[0])
+-		seq_printf(s, ",cluster_stack=%.*s", OCFS2_STACK_LABEL_LEN,
+-			   osb->osb_cluster_stack);
++		seq_show_option_n(s, "cluster_stack", osb->osb_cluster_stack,
++				  OCFS2_STACK_LABEL_LEN);
+ 	if (opts & OCFS2_MOUNT_USRQUOTA)
+ 		seq_printf(s, ",usrquota");
+ 	if (opts & OCFS2_MOUNT_GRPQUOTA)
+diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
+index 7466ff339c66..79073d68b475 100644
+--- a/fs/overlayfs/super.c
++++ b/fs/overlayfs/super.c
+@@ -588,10 +588,10 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
+ 	struct super_block *sb = dentry->d_sb;
+ 	struct ovl_fs *ufs = sb->s_fs_info;
+ 
+-	seq_printf(m, ",lowerdir=%s", ufs->config.lowerdir);
++	seq_show_option(m, "lowerdir", ufs->config.lowerdir);
+ 	if (ufs->config.upperdir) {
+-		seq_printf(m, ",upperdir=%s", ufs->config.upperdir);
+-		seq_printf(m, ",workdir=%s", ufs->config.workdir);
++		seq_show_option(m, "upperdir", ufs->config.upperdir);
++		seq_show_option(m, "workdir", ufs->config.workdir);
+ 	}
+ 	return 0;
+ }
+diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
+index 0e4cf728126f..4a62fe8cc3bf 100644
+--- a/fs/reiserfs/super.c
++++ b/fs/reiserfs/super.c
+@@ -714,18 +714,20 @@ static int reiserfs_show_options(struct seq_file *seq, struct dentry *root)
+ 		seq_puts(seq, ",acl");
+ 
+ 	if (REISERFS_SB(s)->s_jdev)
+-		seq_printf(seq, ",jdev=%s", REISERFS_SB(s)->s_jdev);
++		seq_show_option(seq, "jdev", REISERFS_SB(s)->s_jdev);
+ 
+ 	if (journal->j_max_commit_age != journal->j_default_max_commit_age)
+ 		seq_printf(seq, ",commit=%d", journal->j_max_commit_age);
+ 
+ #ifdef CONFIG_QUOTA
+ 	if (REISERFS_SB(s)->s_qf_names[USRQUOTA])
+-		seq_printf(seq, ",usrjquota=%s", REISERFS_SB(s)->s_qf_names[USRQUOTA]);
++		seq_show_option(seq, "usrjquota",
++				REISERFS_SB(s)->s_qf_names[USRQUOTA]);
+ 	else if (opts & (1 << REISERFS_USRQUOTA))
+ 		seq_puts(seq, ",usrquota");
+ 	if (REISERFS_SB(s)->s_qf_names[GRPQUOTA])
+-		seq_printf(seq, ",grpjquota=%s", REISERFS_SB(s)->s_qf_names[GRPQUOTA]);
++		seq_show_option(seq, "grpjquota",
++				REISERFS_SB(s)->s_qf_names[GRPQUOTA]);
+ 	else if (opts & (1 << REISERFS_GRPQUOTA))
+ 		seq_puts(seq, ",grpquota");
+ 	if (REISERFS_SB(s)->s_jquota_fmt) {
+diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
+index 74bcbabfa523..b14bbd6bb05f 100644
+--- a/fs/xfs/libxfs/xfs_da_format.h
++++ b/fs/xfs/libxfs/xfs_da_format.h
+@@ -680,8 +680,15 @@ typedef struct xfs_attr_leaf_name_remote {
+ typedef struct xfs_attr_leafblock {
+ 	xfs_attr_leaf_hdr_t	hdr;	/* constant-structure header block */
+ 	xfs_attr_leaf_entry_t	entries[1];	/* sorted on key, not name */
+-	xfs_attr_leaf_name_local_t namelist;	/* grows from bottom of buf */
+-	xfs_attr_leaf_name_remote_t valuelist;	/* grows from bottom of buf */
++	/*
++	 * The rest of the block contains the following structures after the
++	 * leaf entries, growing from the bottom up. The variables are never
++	 * referenced and definining them can actually make gcc optimize away
++	 * accesses to the 'entries' array above index 0 so don't do that.
++	 *
++	 * xfs_attr_leaf_name_local_t namelist;
++	 * xfs_attr_leaf_name_remote_t valuelist;
++	 */
+ } xfs_attr_leafblock_t;
+ 
+ /*
+diff --git a/fs/xfs/libxfs/xfs_dir2_data.c b/fs/xfs/libxfs/xfs_dir2_data.c
+index de1ea16f5748..534bbf283d6b 100644
+--- a/fs/xfs/libxfs/xfs_dir2_data.c
++++ b/fs/xfs/libxfs/xfs_dir2_data.c
+@@ -252,7 +252,8 @@ xfs_dir3_data_reada_verify(
+ 		return;
+ 	case cpu_to_be32(XFS_DIR2_DATA_MAGIC):
+ 	case cpu_to_be32(XFS_DIR3_DATA_MAGIC):
+-		xfs_dir3_data_verify(bp);
++		bp->b_ops = &xfs_dir3_data_buf_ops;
++		bp->b_ops->verify_read(bp);
+ 		return;
+ 	default:
+ 		xfs_buf_ioerror(bp, -EFSCORRUPTED);
+diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
+index 41b80d3d3877..06bb4218b362 100644
+--- a/fs/xfs/libxfs/xfs_dir2_node.c
++++ b/fs/xfs/libxfs/xfs_dir2_node.c
+@@ -2132,6 +2132,7 @@ xfs_dir2_node_replace(
+ 	int			error;		/* error return value */
+ 	int			i;		/* btree level */
+ 	xfs_ino_t		inum;		/* new inode number */
++	int			ftype;		/* new file type */
+ 	xfs_dir2_leaf_t		*leaf;		/* leaf structure */
+ 	xfs_dir2_leaf_entry_t	*lep;		/* leaf entry being changed */
+ 	int			rval;		/* internal return value */
+@@ -2145,7 +2146,14 @@ xfs_dir2_node_replace(
+ 	state = xfs_da_state_alloc();
+ 	state->args = args;
+ 	state->mp = args->dp->i_mount;
++
++	/*
++	 * We have to save new inode number and ftype since
++	 * xfs_da3_node_lookup_int() is going to overwrite them
++	 */
+ 	inum = args->inumber;
++	ftype = args->filetype;
++
+ 	/*
+ 	 * Lookup the entry to change in the btree.
+ 	 */
+@@ -2183,7 +2191,7 @@ xfs_dir2_node_replace(
+ 		 * Fill in the new inode number and log the entry.
+ 		 */
+ 		dep->inumber = cpu_to_be64(inum);
+-		args->dp->d_ops->data_put_ftype(dep, args->filetype);
++		args->dp->d_ops->data_put_ftype(dep, ftype);
+ 		xfs_dir2_data_log_entry(args, state->extrablk.bp, dep);
+ 		rval = 0;
+ 	}
+diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
+index 3859f5e27a4d..458fced2c0f9 100644
+--- a/fs/xfs/xfs_aops.c
++++ b/fs/xfs/xfs_aops.c
+@@ -356,7 +356,8 @@ xfs_end_bio(
+ {
+ 	xfs_ioend_t		*ioend = bio->bi_private;
+ 
+-	ioend->io_error = test_bit(BIO_UPTODATE, &bio->bi_flags) ? 0 : error;
++	if (!ioend->io_error && !test_bit(BIO_UPTODATE, &bio->bi_flags))
++		ioend->io_error = error;
+ 
+ 	/* Toss bio and pass work off to an xfsdatad thread */
+ 	bio->bi_private = NULL;
+diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
+index 1fb16562c159..bbd9b1f10ffb 100644
+--- a/fs/xfs/xfs_super.c
++++ b/fs/xfs/xfs_super.c
+@@ -511,9 +511,9 @@ xfs_showargs(
+ 		seq_printf(m, "," MNTOPT_LOGBSIZE "=%dk", mp->m_logbsize >> 10);
+ 
+ 	if (mp->m_logname)
+-		seq_printf(m, "," MNTOPT_LOGDEV "=%s", mp->m_logname);
++		seq_show_option(m, MNTOPT_LOGDEV, mp->m_logname);
+ 	if (mp->m_rtname)
+-		seq_printf(m, "," MNTOPT_RTDEV "=%s", mp->m_rtname);
++		seq_show_option(m, MNTOPT_RTDEV, mp->m_rtname);
+ 
+ 	if (mp->m_dalign > 0)
+ 		seq_printf(m, "," MNTOPT_SUNIT "=%d",
+diff --git a/include/linux/acpi.h b/include/linux/acpi.h
+index d2445fa9999f..0b2394f61af4 100644
+--- a/include/linux/acpi.h
++++ b/include/linux/acpi.h
+@@ -221,7 +221,7 @@ struct pci_dev;
+ 
+ int acpi_pci_irq_enable (struct pci_dev *dev);
+ void acpi_penalize_isa_irq(int irq, int active);
+-
++void acpi_penalize_sci_irq(int irq, int trigger, int polarity);
+ void acpi_pci_irq_disable (struct pci_dev *dev);
+ 
+ extern int ec_read(u8 addr, u8 *val);
+diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h
+index f79148261d16..7bb7f673cb3f 100644
+--- a/include/linux/iio/iio.h
++++ b/include/linux/iio/iio.h
+@@ -645,6 +645,15 @@ int iio_str_to_fixpoint(const char *str, int fract_mult, int *integer,
+ #define IIO_DEGREE_TO_RAD(deg) (((deg) * 314159ULL + 9000000ULL) / 18000000ULL)
+ 
+ /**
++ * IIO_RAD_TO_DEGREE() - Convert rad to degree
++ * @rad: A value in rad
++ *
++ * Returns the given value converted from rad to degree
++ */
++#define IIO_RAD_TO_DEGREE(rad) \
++	(((rad) * 18000000ULL + 314159ULL / 2) / 314159ULL)
++
++/**
+  * IIO_G_TO_M_S_2() - Convert g to meter / second**2
+  * @g: A value in g
+  *
+@@ -652,4 +661,12 @@ int iio_str_to_fixpoint(const char *str, int fract_mult, int *integer,
+  */
+ #define IIO_G_TO_M_S_2(g) ((g) * 980665ULL / 100000ULL)
+ 
++/**
++ * IIO_M_S_2_TO_G() - Convert meter / second**2 to g
++ * @ms2: A value in meter / second**2
++ *
++ * Returns the given value converted from meter / second**2 to g
++ */
++#define IIO_M_S_2_TO_G(ms2) (((ms2) * 100000ULL + 980665ULL / 2) / 980665ULL)
++
+ #endif /* _INDUSTRIAL_IO_H_ */
+diff --git a/include/linux/pci.h b/include/linux/pci.h
+index 860c751810fc..1d4eb6057f72 100644
+--- a/include/linux/pci.h
++++ b/include/linux/pci.h
+@@ -180,6 +180,8 @@ enum pci_dev_flags {
+ 	PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 6),
+ 	/* Do not use PM reset even if device advertises NoSoftRst- */
+ 	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
++	/* Get VPD from function 0 VPD */
++	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+ };
+ 
+ enum pci_irq_reroute_variant {
+diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
+index 912a7c482649..d4c7271382cb 100644
+--- a/include/linux/seq_file.h
++++ b/include/linux/seq_file.h
+@@ -149,6 +149,41 @@ static inline struct user_namespace *seq_user_ns(struct seq_file *seq)
+ #endif
+ }
+ 
++/**
++ * seq_show_options - display mount options with appropriate escapes.
++ * @m: the seq_file handle
++ * @name: the mount option name
++ * @value: the mount option name's value, can be NULL
++ */
++static inline void seq_show_option(struct seq_file *m, const char *name,
++				   const char *value)
++{
++	seq_putc(m, ',');
++	seq_escape(m, name, ",= \t\n\\");
++	if (value) {
++		seq_putc(m, '=');
++		seq_escape(m, value, ", \t\n\\");
++	}
++}
++
++/**
++ * seq_show_option_n - display mount options with appropriate escapes
++ *		       where @value must be a specific length.
++ * @m: the seq_file handle
++ * @name: the mount option name
++ * @value: the mount option name's value, cannot be NULL
++ * @length: the length of @value to display
++ *
++ * This is a macro since this uses "length" to define the size of the
++ * stack buffer.
++ */
++#define seq_show_option_n(m, name, value, length) {	\
++	char val_buf[length + 1];			\
++	strncpy(val_buf, value, length);		\
++	val_buf[length] = '\0';				\
++	seq_show_option(m, name, val_buf);		\
++}
++
+ #define SEQ_START_TOKEN ((void *)1)
+ /*
+  * Helpers for iteration over list_head-s in seq_files
+diff --git a/include/uapi/linux/dm-ioctl.h b/include/uapi/linux/dm-ioctl.h
+index 061aca3a962d..d34611e35a30 100644
+--- a/include/uapi/linux/dm-ioctl.h
++++ b/include/uapi/linux/dm-ioctl.h
+@@ -267,9 +267,9 @@ enum {
+ #define DM_DEV_SET_GEOMETRY	_IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl)
+ 
+ #define DM_VERSION_MAJOR	4
+-#define DM_VERSION_MINOR	32
++#define DM_VERSION_MINOR	33
+ #define DM_VERSION_PATCHLEVEL	0
+-#define DM_VERSION_EXTRA	"-ioctl (2015-6-26)"
++#define DM_VERSION_EXTRA	"-ioctl (2015-8-18)"
+ 
+ /* Status bits */
+ #define DM_READONLY_FLAG	(1 << 0) /* In/Out */
+diff --git a/kernel/cgroup.c b/kernel/cgroup.c
+index f89d9292eee6..c6c4240e7d28 100644
+--- a/kernel/cgroup.c
++++ b/kernel/cgroup.c
+@@ -1334,7 +1334,7 @@ static int cgroup_show_options(struct seq_file *seq,
+ 
+ 	for_each_subsys(ss, ssid)
+ 		if (root->subsys_mask & (1 << ssid))
+-			seq_printf(seq, ",%s", ss->name);
++			seq_show_option(seq, ss->name, NULL);
+ 	if (root->flags & CGRP_ROOT_NOPREFIX)
+ 		seq_puts(seq, ",noprefix");
+ 	if (root->flags & CGRP_ROOT_XATTR)
+@@ -1342,13 +1342,14 @@ static int cgroup_show_options(struct seq_file *seq,
+ 
+ 	spin_lock(&release_agent_path_lock);
+ 	if (strlen(root->release_agent_path))
+-		seq_printf(seq, ",release_agent=%s", root->release_agent_path);
++		seq_show_option(seq, "release_agent",
++				root->release_agent_path);
+ 	spin_unlock(&release_agent_path_lock);
+ 
+ 	if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags))
+ 		seq_puts(seq, ",clone_children");
+ 	if (strlen(root->name))
+-		seq_printf(seq, ",name=%s", root->name);
++		seq_show_option(seq, "name", root->name);
+ 	return 0;
+ }
+ 
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index 78b4bad10081..e9673433cc01 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -5433,6 +5433,14 @@ static int sched_cpu_active(struct notifier_block *nfb,
+ 	case CPU_STARTING:
+ 		set_cpu_rq_start_time();
+ 		return NOTIFY_OK;
++	case CPU_ONLINE:
++		/*
++		 * At this point a starting CPU has marked itself as online via
++		 * set_cpu_online(). But it might not yet have marked itself
++		 * as active, which is essential from here on.
++		 *
++		 * Thus, fall-through and help the starting CPU along.
++		 */
+ 	case CPU_DOWN_FAILED:
+ 		set_cpu_active((long)hcpu, true);
+ 		return NOTIFY_OK;
+diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
+index 6da82bcb0a8b..8fd97dac538a 100644
+--- a/mm/memory_hotplug.c
++++ b/mm/memory_hotplug.c
+@@ -1248,6 +1248,14 @@ int __ref add_memory(int nid, u64 start, u64 size)
+ 
+ 	mem_hotplug_begin();
+ 
++	/*
++	 * Add new range to memblock so that when hotadd_new_pgdat() is called
++	 * to allocate new pgdat, get_pfn_range_for_nid() will be able to find
++	 * this new range and calculate total pages correctly.  The range will
++	 * be removed at hot-remove time.
++	 */
++	memblock_add_node(start, size, nid);
++
+ 	new_node = !node_online(nid);
+ 	if (new_node) {
+ 		pgdat = hotadd_new_pgdat(nid, start);
+@@ -1277,7 +1285,6 @@ int __ref add_memory(int nid, u64 start, u64 size)
+ 
+ 	/* create new memmap entry */
+ 	firmware_map_add_hotplug(start, start + size, "System RAM");
+-	memblock_add_node(start, size, nid);
+ 
+ 	goto out;
+ 
+@@ -1286,6 +1293,7 @@ error:
+ 	if (new_pgdat)
+ 		rollback_node_hotadd(nid, pgdat);
+ 	release_memory_resource(res);
++	memblock_remove(start, size);
+ 
+ out:
+ 	mem_hotplug_done();
+diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
+index f30329f72641..69a4d30a9ccf 100644
+--- a/net/ceph/ceph_common.c
++++ b/net/ceph/ceph_common.c
+@@ -517,8 +517,11 @@ int ceph_print_client_options(struct seq_file *m, struct ceph_client *client)
+ 	struct ceph_options *opt = client->options;
+ 	size_t pos = m->count;
+ 
+-	if (opt->name)
+-		seq_printf(m, "name=%s,", opt->name);
++	if (opt->name) {
++		seq_puts(m, "name=");
++		seq_escape(m, opt->name, ", \t\n\\");
++		seq_putc(m, ',');
++	}
+ 	if (opt->key)
+ 		seq_puts(m, "secret=<hidden>,");
+ 
+diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
+index 564079c5c49d..cdf4c589a391 100644
+--- a/security/selinux/hooks.c
++++ b/security/selinux/hooks.c
+@@ -1100,7 +1100,7 @@ static void selinux_write_opts(struct seq_file *m,
+ 		seq_puts(m, prefix);
+ 		if (has_comma)
+ 			seq_putc(m, '\"');
+-		seq_puts(m, opts->mnt_opts[i]);
++		seq_escape(m, opts->mnt_opts[i], "\"\n\\");
+ 		if (has_comma)
+ 			seq_putc(m, '\"');
+ 	}
+diff --git a/sound/soc/codecs/adav80x.c b/sound/soc/codecs/adav80x.c
+index 36d842570745..69c63b92e078 100644
+--- a/sound/soc/codecs/adav80x.c
++++ b/sound/soc/codecs/adav80x.c
+@@ -865,7 +865,6 @@ const struct regmap_config adav80x_regmap_config = {
+ 	.val_bits = 8,
+ 	.pad_bits = 1,
+ 	.reg_bits = 7,
+-	.read_flag_mask = 0x01,
+ 
+ 	.max_register = ADAV80X_PLL_OUTE,
+ 
+diff --git a/sound/soc/codecs/arizona.c b/sound/soc/codecs/arizona.c
+index 802e05eae3e9..4180827a8480 100644
+--- a/sound/soc/codecs/arizona.c
++++ b/sound/soc/codecs/arizona.c
+@@ -1756,17 +1756,6 @@ int arizona_init_dai(struct arizona_priv *priv, int id)
+ }
+ EXPORT_SYMBOL_GPL(arizona_init_dai);
+ 
+-static irqreturn_t arizona_fll_clock_ok(int irq, void *data)
+-{
+-	struct arizona_fll *fll = data;
+-
+-	arizona_fll_dbg(fll, "clock OK\n");
+-
+-	complete(&fll->ok);
+-
+-	return IRQ_HANDLED;
+-}
+-
+ static struct {
+ 	unsigned int min;
+ 	unsigned int max;
+@@ -2048,17 +2037,18 @@ static int arizona_is_enabled_fll(struct arizona_fll *fll)
+ static int arizona_enable_fll(struct arizona_fll *fll)
+ {
+ 	struct arizona *arizona = fll->arizona;
+-	unsigned long time_left;
+ 	bool use_sync = false;
+ 	int already_enabled = arizona_is_enabled_fll(fll);
+ 	struct arizona_fll_cfg cfg;
++	int i;
++	unsigned int val;
+ 
+ 	if (already_enabled < 0)
+ 		return already_enabled;
+ 
+ 	if (already_enabled) {
+ 		/* Facilitate smooth refclk across the transition */
+-		regmap_update_bits_async(fll->arizona->regmap, fll->base + 0x7,
++		regmap_update_bits_async(fll->arizona->regmap, fll->base + 0x9,
+ 					 ARIZONA_FLL1_GAIN_MASK, 0);
+ 		regmap_update_bits_async(fll->arizona->regmap, fll->base + 1,
+ 					 ARIZONA_FLL1_FREERUN,
+@@ -2110,9 +2100,6 @@ static int arizona_enable_fll(struct arizona_fll *fll)
+ 	if (!already_enabled)
+ 		pm_runtime_get(arizona->dev);
+ 
+-	/* Clear any pending completions */
+-	try_wait_for_completion(&fll->ok);
+-
+ 	regmap_update_bits_async(arizona->regmap, fll->base + 1,
+ 				 ARIZONA_FLL1_ENA, ARIZONA_FLL1_ENA);
+ 	if (use_sync)
+@@ -2124,10 +2111,24 @@ static int arizona_enable_fll(struct arizona_fll *fll)
+ 		regmap_update_bits_async(arizona->regmap, fll->base + 1,
+ 					 ARIZONA_FLL1_FREERUN, 0);
+ 
+-	time_left = wait_for_completion_timeout(&fll->ok,
+-					  msecs_to_jiffies(250));
+-	if (time_left == 0)
++	arizona_fll_dbg(fll, "Waiting for FLL lock...\n");
++	val = 0;
++	for (i = 0; i < 15; i++) {
++		if (i < 5)
++			usleep_range(200, 400);
++		else
++			msleep(20);
++
++		regmap_read(arizona->regmap,
++			    ARIZONA_INTERRUPT_RAW_STATUS_5,
++			    &val);
++		if (val & (ARIZONA_FLL1_CLOCK_OK_STS << (fll->id - 1)))
++			break;
++	}
++	if (i == 15)
+ 		arizona_fll_warn(fll, "Timed out waiting for lock\n");
++	else
++		arizona_fll_dbg(fll, "FLL locked (%d polls)\n", i);
+ 
+ 	return 0;
+ }
+@@ -2212,11 +2213,8 @@ EXPORT_SYMBOL_GPL(arizona_set_fll);
+ int arizona_init_fll(struct arizona *arizona, int id, int base, int lock_irq,
+ 		     int ok_irq, struct arizona_fll *fll)
+ {
+-	int ret;
+ 	unsigned int val;
+ 
+-	init_completion(&fll->ok);
+-
+ 	fll->id = id;
+ 	fll->base = base;
+ 	fll->arizona = arizona;
+@@ -2238,13 +2236,6 @@ int arizona_init_fll(struct arizona *arizona, int id, int base, int lock_irq,
+ 	snprintf(fll->clock_ok_name, sizeof(fll->clock_ok_name),
+ 		 "FLL%d clock OK", id);
+ 
+-	ret = arizona_request_irq(arizona, ok_irq, fll->clock_ok_name,
+-				  arizona_fll_clock_ok, fll);
+-	if (ret != 0) {
+-		dev_err(arizona->dev, "Failed to get FLL%d clock OK IRQ: %d\n",
+-			id, ret);
+-	}
+-
+ 	regmap_update_bits(arizona->regmap, fll->base + 1,
+ 			   ARIZONA_FLL1_FREERUN, 0);
+ 
+diff --git a/sound/soc/codecs/arizona.h b/sound/soc/codecs/arizona.h
+index 43deb0462309..36867d05e0bb 100644
+--- a/sound/soc/codecs/arizona.h
++++ b/sound/soc/codecs/arizona.h
+@@ -242,7 +242,6 @@ struct arizona_fll {
+ 	int id;
+ 	unsigned int base;
+ 	unsigned int vco_mult;
+-	struct completion ok;
+ 
+ 	unsigned int fout;
+ 	int sync_src;
+diff --git a/sound/soc/codecs/rt5640.c b/sound/soc/codecs/rt5640.c
+index 9bc78e57513d..ff72cd8c236e 100644
+--- a/sound/soc/codecs/rt5640.c
++++ b/sound/soc/codecs/rt5640.c
+@@ -984,6 +984,35 @@ static int rt5640_hp_event(struct snd_soc_dapm_widget *w,
+ 	return 0;
+ }
+ 
++static int rt5640_lout_event(struct snd_soc_dapm_widget *w,
++	struct snd_kcontrol *kcontrol, int event)
++{
++	struct snd_soc_codec *codec = snd_soc_dapm_to_codec(w->dapm);
++
++	switch (event) {
++	case SND_SOC_DAPM_POST_PMU:
++		hp_amp_power_on(codec);
++		snd_soc_update_bits(codec, RT5640_PWR_ANLG1,
++			RT5640_PWR_LM, RT5640_PWR_LM);
++		snd_soc_update_bits(codec, RT5640_OUTPUT,
++			RT5640_L_MUTE | RT5640_R_MUTE, 0);
++		break;
++
++	case SND_SOC_DAPM_PRE_PMD:
++		snd_soc_update_bits(codec, RT5640_OUTPUT,
++			RT5640_L_MUTE | RT5640_R_MUTE,
++			RT5640_L_MUTE | RT5640_R_MUTE);
++		snd_soc_update_bits(codec, RT5640_PWR_ANLG1,
++			RT5640_PWR_LM, 0);
++		break;
++
++	default:
++		return 0;
++	}
++
++	return 0;
++}
++
+ static int rt5640_hp_power_event(struct snd_soc_dapm_widget *w,
+ 			   struct snd_kcontrol *kcontrol, int event)
+ {
+@@ -1179,13 +1208,16 @@ static const struct snd_soc_dapm_widget rt5640_dapm_widgets[] = {
+ 		0, rt5640_spo_l_mix, ARRAY_SIZE(rt5640_spo_l_mix)),
+ 	SND_SOC_DAPM_MIXER("SPOR MIX", SND_SOC_NOPM, 0,
+ 		0, rt5640_spo_r_mix, ARRAY_SIZE(rt5640_spo_r_mix)),
+-	SND_SOC_DAPM_MIXER("LOUT MIX", RT5640_PWR_ANLG1, RT5640_PWR_LM_BIT, 0,
++	SND_SOC_DAPM_MIXER("LOUT MIX", SND_SOC_NOPM, 0, 0,
+ 		rt5640_lout_mix, ARRAY_SIZE(rt5640_lout_mix)),
+ 	SND_SOC_DAPM_SUPPLY_S("Improve HP Amp Drv", 1, SND_SOC_NOPM,
+ 		0, 0, rt5640_hp_power_event, SND_SOC_DAPM_POST_PMU),
+ 	SND_SOC_DAPM_PGA_S("HP Amp", 1, SND_SOC_NOPM, 0, 0,
+ 		rt5640_hp_event,
+ 		SND_SOC_DAPM_PRE_PMD | SND_SOC_DAPM_POST_PMU),
++	SND_SOC_DAPM_PGA_S("LOUT amp", 1, SND_SOC_NOPM, 0, 0,
++		rt5640_lout_event,
++		SND_SOC_DAPM_PRE_PMD | SND_SOC_DAPM_POST_PMU),
+ 	SND_SOC_DAPM_SUPPLY("HP L Amp", RT5640_PWR_ANLG1,
+ 		RT5640_PWR_HP_L_BIT, 0, NULL, 0),
+ 	SND_SOC_DAPM_SUPPLY("HP R Amp", RT5640_PWR_ANLG1,
+@@ -1500,8 +1532,10 @@ static const struct snd_soc_dapm_route rt5640_dapm_routes[] = {
+ 	{"HP R Playback", "Switch", "HP Amp"},
+ 	{"HPOL", NULL, "HP L Playback"},
+ 	{"HPOR", NULL, "HP R Playback"},
+-	{"LOUTL", NULL, "LOUT MIX"},
+-	{"LOUTR", NULL, "LOUT MIX"},
++
++	{"LOUT amp", NULL, "LOUT MIX"},
++	{"LOUTL", NULL, "LOUT amp"},
++	{"LOUTR", NULL, "LOUT amp"},
+ };
+ 
+ static const struct snd_soc_dapm_route rt5640_specific_dapm_routes[] = {
+diff --git a/sound/soc/codecs/rt5645.c b/sound/soc/codecs/rt5645.c
+index 961bd7e5877e..58713733d314 100644
+--- a/sound/soc/codecs/rt5645.c
++++ b/sound/soc/codecs/rt5645.c
+@@ -3232,6 +3232,13 @@ static struct dmi_system_id dmi_platform_intel_braswell[] = {
+ 			DMI_MATCH(DMI_PRODUCT_NAME, "Strago"),
+ 		},
+ 	},
++	{
++		.ident = "Google Celes",
++		.callback = strago_quirk_cb,
++		.matches = {
++			DMI_MATCH(DMI_PRODUCT_NAME, "Celes"),
++		},
++	},
+ 	{ }
+ };
+ 
+diff --git a/sound/soc/samsung/arndale_rt5631.c b/sound/soc/samsung/arndale_rt5631.c
+index 8bf2e2c4bafb..9e371eb3e4fa 100644
+--- a/sound/soc/samsung/arndale_rt5631.c
++++ b/sound/soc/samsung/arndale_rt5631.c
+@@ -116,15 +116,6 @@ static int arndale_audio_probe(struct platform_device *pdev)
+ 	return ret;
+ }
+ 
+-static int arndale_audio_remove(struct platform_device *pdev)
+-{
+-	struct snd_soc_card *card = platform_get_drvdata(pdev);
+-
+-	snd_soc_unregister_card(card);
+-
+-	return 0;
+-}
+-
+ static const struct of_device_id samsung_arndale_rt5631_of_match[] __maybe_unused = {
+ 	{ .compatible = "samsung,arndale-rt5631", },
+ 	{ .compatible = "samsung,arndale-alc5631", },
+@@ -139,7 +130,6 @@ static struct platform_driver arndale_audio_driver = {
+ 		.of_match_table = of_match_ptr(samsung_arndale_rt5631_of_match),
+ 	},
+ 	.probe = arndale_audio_probe,
+-	.remove = arndale_audio_remove,
+ };
+ 
+ module_platform_driver(arndale_audio_driver);
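
Note on the seq_show_option() helper added in the include/linux/seq_file.h hunk above: it writes a leading comma, then the option name, then (if the value is not NULL) "=" and the value, passing name and value through seq_escape() so that separators inside them cannot be mistaken for option boundaries. The following is a minimal user-space sketch of that behaviour, not the kernel code. The names escape_append() and show_option() and the fixed output buffer are assumptions made for illustration, and the octal "\ooo" escape form is assumed to match what seq_escape() emits.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for seq_escape(): append s to the NUL-terminated
 * string in out (capacity cap), writing every byte that occurs in esc as a
 * backslash followed by three octal digits. Output is silently truncated
 * when the buffer is full.
 */
static void escape_append(char *out, size_t cap, const char *s, const char *esc)
{
	size_t len = strlen(out);

	for (; *s != '\0'; s++) {
		unsigned char c = (unsigned char)*s;

		if (strchr(esc, c)) {
			if (len + 5 > cap)	/* 4 escape bytes + NUL */
				break;
			snprintf(out + len, cap - len, "\\%03o", (unsigned int)c);
			len += 4;
		} else {
			if (len + 2 > cap)	/* 1 byte + NUL */
				break;
			out[len++] = (char)c;
			out[len] = '\0';
		}
	}
}

/*
 * Hypothetical user-space analogue of seq_show_option(): emit ",name" or
 * ",name=value" using the same escape sets as the patch.
 */
static void show_option(char *out, size_t cap, const char *name,
			const char *value)
{
	escape_append(out, cap, ",", "");		/* literal separator */
	escape_append(out, cap, name, ",= \t\n\\");
	if (value) {
		escape_append(out, cap, "=", "");	/* literal '=' */
		escape_append(out, cap, value, ", \t\n\\");
	}
}
```

With an empty buffer, show_option(buf, sizeof(buf), "lowerdir", "/a,b c") yields ",lowerdir=/a\054b\040c": the comma and the space inside the value are escaped, while the separators written by the helper itself are not.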

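Note on the IIO_RAD_TO_DEGREE() and IIO_M_S_2_TO_G() macros added in the include/linux/iio/iio.h hunk above: both add half of the divisor before dividing, which turns truncating integer division into round-to-nearest. The sketch below copies the two macros from the patch and adds a helper, div_round_nearest(), that is hypothetical and only spells out the idiom. What scaling callers apply to the inputs is not shown in the patch and is not assumed here.

```c
/* Copied from the include/linux/iio/iio.h hunk in the patch above. */
#define IIO_RAD_TO_DEGREE(rad) \
	(((rad) * 18000000ULL + 314159ULL / 2) / 314159ULL)
#define IIO_M_S_2_TO_G(ms2) (((ms2) * 100000ULL + 980665ULL / 2) / 980665ULL)

/*
 * Hypothetical helper (not part of the patch): unsigned division rounded to
 * the nearest integer, the same idiom the two macros use inline.
 */
static unsigned long long div_round_nearest(unsigned long long n,
					     unsigned long long d)
{
	return (n + d / 2) / d;
}
```

For example, IIO_M_S_2_TO_G(5) evaluates to 1 and IIO_M_S_2_TO_G(4) evaluates to 0, since 5 / 9.80665 is about 0.51 and 4 / 9.80665 is about 0.41. Plain truncating division would give 0 in both cases.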


* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-22 11:43 Mike Pagano
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Pagano @ 2015-09-22 11:43 UTC (permalink / raw
  To: gentoo-commits

commit:     5d62f231ba9e82e14ed7c8a7e0117cec4fa5973d
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Tue Sep 22 11:43:09 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Tue Sep 22 11:43:09 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=5d62f231

Remove BFQ due to compilation issues. I will add the next working version once it is released.

 0000_README                                        |   12 -
 ...roups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch |  103 -
 ...introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1 | 7026 --------------------
 ...Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch | 1097 ---
 4 files changed, 8238 deletions(-)

diff --git a/0000_README b/0000_README
index 0c6168a..7050114 100644
--- a/0000_README
+++ b/0000_README
@@ -79,18 +79,6 @@ Patch:  5000_enable-additional-cpu-optimizations-for-gcc.patch
 From:   https://github.com/graysky2/kernel_gcc_patch/
 Desc:   Kernel patch enables gcc < v4.9 optimizations for additional CPUs.
 
-Patch:  5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch
-From:   http://algo.ing.unimo.it/people/paolo/disk_sched/
-Desc:   BFQ v7r9 patch 1 for 4.2: Build, cgroups and kconfig bits
-
-Patch:  5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1
-From:   http://algo.ing.unimo.it/people/paolo/disk_sched/
-Desc:   BFQ v7r9 patch 2 for 4.2: BFQ Scheduler
-
-Patch:  5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.0.patch
-From:   http://algo.ing.unimo.it/people/paolo/disk_sched/
-Desc:   BFQ v7r9 patch 3 for 4.2: Early Queue Merge (EQM)
-
 Patch:  5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
 From:   https://github.com/graysky2/kernel_gcc_patch/
 Desc:   Kernel patch enables gcc >= v4.9 optimizations for additional CPUs.

diff --git a/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch b/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch
deleted file mode 100644
index fc7ef8e..0000000
--- a/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch
+++ /dev/null
@@ -1,103 +0,0 @@
-From f53ecde45f8d40a343aa5b5195e9f0944b7a1a37 Mon Sep 17 00:00:00 2001
-From: Paolo Valente <paolo.valente@unimore.it>
-Date: Tue, 7 Apr 2015 13:39:12 +0200
-Subject: [PATCH 1/3] block: cgroups, kconfig, build bits for BFQ-v7r9-4.2
-
-Update Kconfig.iosched and do the related Makefile changes to include
-kernel configuration options for BFQ. Also increase the number of
-policies supported by the blkio controller so that BFQ can add its
-own.
-
-Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
-Signed-off-by: Arianna Avanzini <avanzini@google.com>
----
- block/Kconfig.iosched  | 32 ++++++++++++++++++++++++++++++++
- block/Makefile         |  1 +
- include/linux/blkdev.h |  2 +-
- 3 files changed, 34 insertions(+), 1 deletion(-)
-
-diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
-index 421bef9..0ee5f0f 100644
---- a/block/Kconfig.iosched
-+++ b/block/Kconfig.iosched
-@@ -39,6 +39,27 @@ config CFQ_GROUP_IOSCHED
- 	---help---
- 	  Enable group IO scheduling in CFQ.
- 
-+config IOSCHED_BFQ
-+	tristate "BFQ I/O scheduler"
-+	default n
-+	---help---
-+	  The BFQ I/O scheduler tries to distribute bandwidth among
-+	  all processes according to their weights.
-+	  It aims at distributing the bandwidth as desired, independently of
-+	  the disk parameters and with any workload. It also tries to
-+	  guarantee low latency to interactive and soft real-time
-+	  applications. If compiled built-in (saying Y here), BFQ can
-+	  be configured to support hierarchical scheduling.
-+
-+config CGROUP_BFQIO
-+	bool "BFQ hierarchical scheduling support"
-+	depends on CGROUPS && IOSCHED_BFQ=y
-+	default n
-+	---help---
-+	  Enable hierarchical scheduling in BFQ, using the cgroups
-+	  filesystem interface.  The name of the subsystem will be
-+	  bfqio.
-+
- choice
- 	prompt "Default I/O scheduler"
- 	default DEFAULT_CFQ
-@@ -52,6 +73,16 @@ choice
- 	config DEFAULT_CFQ
- 		bool "CFQ" if IOSCHED_CFQ=y
- 
-+	config DEFAULT_BFQ
-+		bool "BFQ" if IOSCHED_BFQ=y
-+		help
-+		  Selects BFQ as the default I/O scheduler which will be
-+		  used by default for all block devices.
-+		  The BFQ I/O scheduler aims at distributing the bandwidth
-+		  as desired, independently of the disk parameters and with
-+		  any workload. It also tries to guarantee low latency to
-+		  interactive and soft real-time applications.
-+
- 	config DEFAULT_NOOP
- 		bool "No-op"
- 
-@@ -61,6 +92,7 @@ config DEFAULT_IOSCHED
- 	string
- 	default "deadline" if DEFAULT_DEADLINE
- 	default "cfq" if DEFAULT_CFQ
-+	default "bfq" if DEFAULT_BFQ
- 	default "noop" if DEFAULT_NOOP
- 
- endmenu
-diff --git a/block/Makefile b/block/Makefile
-index 00ecc97..1ed86d5 100644
---- a/block/Makefile
-+++ b/block/Makefile
-@@ -18,6 +18,7 @@ obj-$(CONFIG_BLK_DEV_THROTTLING)	+= blk-throttle.o
- obj-$(CONFIG_IOSCHED_NOOP)	+= noop-iosched.o
- obj-$(CONFIG_IOSCHED_DEADLINE)	+= deadline-iosched.o
- obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
-+obj-$(CONFIG_IOSCHED_BFQ)	+= bfq-iosched.o
- 
- obj-$(CONFIG_BLOCK_COMPAT)	+= compat_ioctl.o
- obj-$(CONFIG_BLK_CMDLINE_PARSER)	+= cmdline-parser.o
-diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
-index a622f27..e2b4c03 100644
---- a/include/linux/blkdev.h
-+++ b/include/linux/blkdev.h
-@@ -43,7 +43,7 @@ struct blk_flush_queue;
-  * Maximum number of blkcg policies allowed to be registered concurrently.
-  * Defined here to simplify include dependency.
-  */
--#define BLKCG_MAX_POLS		2
-+#define BLKCG_MAX_POLS		3
- 
- struct request;
- typedef void (rq_end_io_fn)(struct request *, int);
--- 
-2.1.4
-
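
Note on the BLKCG_MAX_POLS change in the patch above: per the comment in
include/linux/blkdev.h, that constant caps how many blkcg policies can be
registered at once, so an extra policy such as BFQ needs an extra slot. What
follows is only a minimal userspace sketch of that idea, assuming a fixed-size
slot table. MAX_POLS, struct policy, policy_table and policy_register are
hypothetical names made up for illustration; they are not the kernel's API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for BLKCG_MAX_POLS as it was before the bump. */
#define MAX_POLS 2

/* Illustrative policy descriptor (not the kernel's struct blkcg_policy). */
struct policy {
	const char *name;
	int plid;		/* slot index assigned at registration */
};

/* Fixed-size table of registered policies; NULL marks a free slot. */
static struct policy *policy_table[MAX_POLS];

/*
 * Claim the first free slot for @pol.
 * Returns 0 on success, -ENOSPC when every slot is already taken.
 */
static int policy_register(struct policy *pol)
{
	int i;

	assert(pol != NULL);

	for (i = 0; i < MAX_POLS; i++) {
		if (!policy_table[i]) {
			policy_table[i] = pol;
			pol->plid = i;
			return 0;
		}
	}
	return -ENOSPC;
}
```

In this model, with MAX_POLS at 2, a third call to policy_register() fails
with -ENOSPC. Raising the limit to 3 is what would let a third policy in,
which appears to be the motivation for the one-line change above.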

diff --git a/5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1 b/5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1
deleted file mode 100644
index 04dd37c..0000000
--- a/5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1
+++ /dev/null
@@ -1,7026 +0,0 @@
-From 152cacc8a71a6cd7fe8cedc1110a378721e66ffa Mon Sep 17 00:00:00 2001
-From: Paolo Valente <paolo.valente@unimore.it>
-Date: Thu, 9 May 2013 19:10:02 +0200
-Subject: [PATCH 2/3] block: introduce the BFQ-v7r9 I/O sched for 4.2
-
-Add the BFQ-v7r9 I/O scheduler to 4.2.
-The general structure is borrowed from CFQ, as is much of the code for
-handling I/O contexts. Over time, several useful features have been
-ported from CFQ as well (details in the changelog in README.BFQ). A
-(bfq_)queue is associated with each task doing I/O on a device, and each
-time a scheduling decision has to be made, a queue is selected and served
-until it expires.
-
-    - Slices are given in the service domain: tasks are assigned
-      budgets, measured in number of sectors. Once got the disk, a task
-      must however consume its assigned budget within a configurable
-      maximum time (by default, the maximum possible value of the
-      budgets is automatically computed to comply with this timeout).
-      This allows the desired latency vs "throughput boosting" tradeoff
-      to be set.
-
-    - Budgets are scheduled according to a variant of WF2Q+, implemented
-      using an augmented rb-tree to take eligibility into account while
-      preserving an O(log N) overall complexity.
-
-    - A low-latency tunable is provided; if enabled, both interactive
-      and soft real-time applications are guaranteed a very low latency.
-
-    - Latency guarantees are preserved also in the presence of NCQ.
-
-    - Also with flash-based devices, a high throughput is achieved
-      while still preserving latency guarantees.
-
-    - BFQ features Early Queue Merge (EQM), a sort of fusion of the
-      cooperating-queue-merging and the preemption mechanisms present
-      in CFQ. EQM is in fact a unified mechanism that tries to get a
-      sequential read pattern, and hence a high throughput, with any
-      set of processes performing interleaved I/O over a contiguous
-      sequence of sectors.
-
-    - BFQ supports full hierarchical scheduling, exporting a cgroups
-      interface.  Since each node has a full scheduler, each group can
-      be assigned its own weight.
-
-    - If the cgroups interface is not used, only I/O priorities can be
-      assigned to processes, with ioprio values mapped to weights
-      with the relation weight = IOPRIO_BE_NR - ioprio.
-
-    - ioprio classes are served in strict priority order, i.e., lower
-      priority queues are not served as long as there are higher
-      priority queues.  Among queues in the same class the bandwidth is
-      distributed in proportion to the weight of each queue. A very
-      thin extra bandwidth is however guaranteed to the Idle class, to
-      prevent it from starving.
-
-Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
-Signed-off-by: Arianna Avanzini <avanzini@google.com>
----
- block/Kconfig.iosched |    6 +-
- block/bfq-cgroup.c    | 1108 +++++++++++++++
- block/bfq-ioc.c       |   36 +
- block/bfq-iosched.c   | 3753 +++++++++++++++++++++++++++++++++++++++++++++++++
- block/bfq-sched.c     | 1197 ++++++++++++++++
- block/bfq.h           |  807 +++++++++++
- 6 files changed, 6903 insertions(+), 4 deletions(-)
- create mode 100644 block/bfq-cgroup.c
- create mode 100644 block/bfq-ioc.c
- create mode 100644 block/bfq-iosched.c
- create mode 100644 block/bfq-sched.c
- create mode 100644 block/bfq.h
-
-diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
-index 0ee5f0f..f78cd1a 100644
---- a/block/Kconfig.iosched
-+++ b/block/Kconfig.iosched
-@@ -51,14 +51,12 @@ config IOSCHED_BFQ
- 	  applications. If compiled built-in (saying Y here), BFQ can
- 	  be configured to support hierarchical scheduling.
- 
--config CGROUP_BFQIO
-+config BFQ_GROUP_IOSCHED
- 	bool "BFQ hierarchical scheduling support"
- 	depends on CGROUPS && IOSCHED_BFQ=y
- 	default n
- 	---help---
--	  Enable hierarchical scheduling in BFQ, using the cgroups
--	  filesystem interface.  The name of the subsystem will be
--	  bfqio.
-+	  Enable hierarchical scheduling in BFQ, using the blkio controller.
- 
- choice
- 	prompt "Default I/O scheduler"
-diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
-new file mode 100644
-index 0000000..c02d65a
---- /dev/null
-+++ b/block/bfq-cgroup.c
-@@ -0,0 +1,1108 @@
-+/*
-+ * BFQ: CGROUPS support.
-+ *
-+ * Based on ideas and code from CFQ:
-+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
-+ *
-+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
-+ *		      Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
-+ * file.
-+ */
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+
-+/* bfqg stats flags */
-+enum bfqg_stats_flags {
-+	BFQG_stats_waiting = 0,
-+	BFQG_stats_idling,
-+	BFQG_stats_empty,
-+};
-+
-+#define BFQG_FLAG_FNS(name)						\
-+static void bfqg_stats_mark_##name(struct bfqg_stats *stats)	\
-+{									\
-+	stats->flags |= (1 << BFQG_stats_##name);			\
-+}									\
-+static void bfqg_stats_clear_##name(struct bfqg_stats *stats)	\
-+{									\
-+	stats->flags &= ~(1 << BFQG_stats_##name);			\
-+}									\
-+static int bfqg_stats_##name(struct bfqg_stats *stats)		\
-+{									\
-+	return (stats->flags & (1 << BFQG_stats_##name)) != 0;		\
-+}									\
-+
-+BFQG_FLAG_FNS(waiting)
-+BFQG_FLAG_FNS(idling)
-+BFQG_FLAG_FNS(empty)
-+#undef BFQG_FLAG_FNS
-+
-+/* This should be called with the queue_lock held. */
-+static void bfqg_stats_update_group_wait_time(struct bfqg_stats *stats)
-+{
-+	unsigned long long now;
-+
-+	if (!bfqg_stats_waiting(stats))
-+		return;
-+
-+	now = sched_clock();
-+	if (time_after64(now, stats->start_group_wait_time))
-+		blkg_stat_add(&stats->group_wait_time,
-+			      now - stats->start_group_wait_time);
-+	bfqg_stats_clear_waiting(stats);
-+}
-+
-+/* This should be called with the queue_lock held. */
-+static void bfqg_stats_set_start_group_wait_time(struct bfq_group *bfqg,
-+						 struct bfq_group *curr_bfqg)
-+{
-+	struct bfqg_stats *stats = &bfqg->stats;
-+
-+	if (bfqg_stats_waiting(stats))
-+		return;
-+	if (bfqg == curr_bfqg)
-+		return;
-+	stats->start_group_wait_time = sched_clock();
-+	bfqg_stats_mark_waiting(stats);
-+}
-+
-+/* This should be called with the queue_lock held. */
-+static void bfqg_stats_end_empty_time(struct bfqg_stats *stats)
-+{
-+	unsigned long long now;
-+
-+	if (!bfqg_stats_empty(stats))
-+		return;
-+
-+	now = sched_clock();
-+	if (time_after64(now, stats->start_empty_time))
-+		blkg_stat_add(&stats->empty_time,
-+			      now - stats->start_empty_time);
-+	bfqg_stats_clear_empty(stats);
-+}
-+
-+static void bfqg_stats_update_dequeue(struct bfq_group *bfqg)
-+{
-+	blkg_stat_add(&bfqg->stats.dequeue, 1);
-+}
-+
-+static void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg)
-+{
-+	struct bfqg_stats *stats = &bfqg->stats;
-+
-+	if (blkg_rwstat_total(&stats->queued))
-+		return;
-+
-+	/*
-+	 * group is already marked empty. This can happen if bfqq got new
-+	 * request in parent group and moved to this group while being added
-+	 * to service tree. Just ignore the event and move on.
-+	 */
-+	if (bfqg_stats_empty(stats))
-+		return;
-+
-+	stats->start_empty_time = sched_clock();
-+	bfqg_stats_mark_empty(stats);
-+}
-+
-+static void bfqg_stats_update_idle_time(struct bfq_group *bfqg)
-+{
-+	struct bfqg_stats *stats = &bfqg->stats;
-+
-+	if (bfqg_stats_idling(stats)) {
-+		unsigned long long now = sched_clock();
-+
-+		if (time_after64(now, stats->start_idle_time))
-+			blkg_stat_add(&stats->idle_time,
-+				      now - stats->start_idle_time);
-+		bfqg_stats_clear_idling(stats);
-+	}
-+}
-+
-+static void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg)
-+{
-+	struct bfqg_stats *stats = &bfqg->stats;
-+
-+	stats->start_idle_time = sched_clock();
-+	bfqg_stats_mark_idling(stats);
-+}
-+
-+static void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg)
-+{
-+	struct bfqg_stats *stats = &bfqg->stats;
-+
-+	blkg_stat_add(&stats->avg_queue_size_sum,
-+		      blkg_rwstat_total(&stats->queued));
-+	blkg_stat_add(&stats->avg_queue_size_samples, 1);
-+	bfqg_stats_update_group_wait_time(stats);
-+}
-+
-+static struct blkcg_policy blkcg_policy_bfq;
-+
-+/*
-+ * blk-cgroup policy-related handlers
-+ * The following functions help in converting between blk-cgroup
-+ * internal structures and BFQ-specific structures.
-+ */
-+
-+static struct bfq_group *pd_to_bfqg(struct blkg_policy_data *pd)
-+{
-+	return pd ? container_of(pd, struct bfq_group, pd) : NULL;
-+}
-+
-+static struct blkcg_gq *bfqg_to_blkg(struct bfq_group *bfqg)
-+{
-+	return pd_to_blkg(&bfqg->pd);
-+}
-+
-+static struct bfq_group *blkg_to_bfqg(struct blkcg_gq *blkg)
-+{
-+	return pd_to_bfqg(blkg_to_pd(blkg, &blkcg_policy_bfq));
-+}
-+
-+/*
-+ * bfq_group handlers
-+ * The following functions help in navigating the bfq_group hierarchy
-+ * by finding the parent of a bfq_group or the bfq_group
-+ * associated with a bfq_queue.
-+ */
-+
-+static struct bfq_group *bfqg_parent(struct bfq_group *bfqg)
-+{
-+	struct blkcg_gq *pblkg = bfqg_to_blkg(bfqg)->parent;
-+
-+	return pblkg ? blkg_to_bfqg(pblkg) : NULL;
-+}
-+
-+static struct bfq_group *bfqq_group(struct bfq_queue *bfqq)
-+{
-+	struct bfq_entity *group_entity = bfqq->entity.parent;
-+
-+	return group_entity ? container_of(group_entity, struct bfq_group,
-+					   entity) :
-+			      bfqq->bfqd->root_group;
-+}
-+
-+/*
-+ * The following two functions handle get and put of a bfq_group by
-+ * wrapping the related blk-cgroup hooks.
-+ */
-+
-+static void bfqg_get(struct bfq_group *bfqg)
-+{
-+	blkg_get(bfqg_to_blkg(bfqg));
-+}
-+
-+static void bfqg_put(struct bfq_group *bfqg)
-+{
-+	blkg_put(bfqg_to_blkg(bfqg));
-+}
-+
-+static void bfqg_stats_update_io_add(struct bfq_group *bfqg,
-+				     struct bfq_queue *bfqq,
-+				     int rw)
-+{
-+	blkg_rwstat_add(&bfqg->stats.queued, rw, 1);
-+	bfqg_stats_end_empty_time(&bfqg->stats);
-+	if (!(bfqq == ((struct bfq_data *)bfqg->bfqd)->in_service_queue))
-+		bfqg_stats_set_start_group_wait_time(bfqg, bfqq_group(bfqq));
-+}
-+
-+static void bfqg_stats_update_io_remove(struct bfq_group *bfqg, int rw)
-+{
-+	blkg_rwstat_add(&bfqg->stats.queued, rw, -1);
-+}
-+
-+static void bfqg_stats_update_io_merged(struct bfq_group *bfqg, int rw)
-+{
-+	blkg_rwstat_add(&bfqg->stats.merged, rw, 1);
-+}
-+
-+static void bfqg_stats_update_dispatch(struct bfq_group *bfqg,
-+					      uint64_t bytes, int rw)
-+{
-+	blkg_stat_add(&bfqg->stats.sectors, bytes >> 9);
-+	blkg_rwstat_add(&bfqg->stats.serviced, rw, 1);
-+	blkg_rwstat_add(&bfqg->stats.service_bytes, rw, bytes);
-+}
-+
-+static void bfqg_stats_update_completion(struct bfq_group *bfqg,
-+			uint64_t start_time, uint64_t io_start_time, int rw)
-+{
-+	struct bfqg_stats *stats = &bfqg->stats;
-+	unsigned long long now = sched_clock();
-+
-+	if (time_after64(now, io_start_time))
-+		blkg_rwstat_add(&stats->service_time, rw, now - io_start_time);
-+	if (time_after64(io_start_time, start_time))
-+		blkg_rwstat_add(&stats->wait_time, rw,
-+				io_start_time - start_time);
-+}
-+
-+/* @stats = 0 */
-+static void bfqg_stats_reset(struct bfqg_stats *stats)
-+{
-+	if (!stats)
-+		return;
-+
-+	/* queued stats shouldn't be cleared */
-+	blkg_rwstat_reset(&stats->service_bytes);
-+	blkg_rwstat_reset(&stats->serviced);
-+	blkg_rwstat_reset(&stats->merged);
-+	blkg_rwstat_reset(&stats->service_time);
-+	blkg_rwstat_reset(&stats->wait_time);
-+	blkg_stat_reset(&stats->time);
-+	blkg_stat_reset(&stats->unaccounted_time);
-+	blkg_stat_reset(&stats->avg_queue_size_sum);
-+	blkg_stat_reset(&stats->avg_queue_size_samples);
-+	blkg_stat_reset(&stats->dequeue);
-+	blkg_stat_reset(&stats->group_wait_time);
-+	blkg_stat_reset(&stats->idle_time);
-+	blkg_stat_reset(&stats->empty_time);
-+}
-+
-+/* @to += @from */
-+static void bfqg_stats_merge(struct bfqg_stats *to, struct bfqg_stats *from)
-+{
-+	if (!to || !from)
-+		return;
-+
-+	/* queued stats are deliberately not merged */
-+	blkg_rwstat_merge(&to->service_bytes, &from->service_bytes);
-+	blkg_rwstat_merge(&to->serviced, &from->serviced);
-+	blkg_rwstat_merge(&to->merged, &from->merged);
-+	blkg_rwstat_merge(&to->service_time, &from->service_time);
-+	blkg_rwstat_merge(&to->wait_time, &from->wait_time);
-+	blkg_stat_merge(&to->time, &from->time);
-+	blkg_stat_merge(&to->unaccounted_time, &from->unaccounted_time);
-+	blkg_stat_merge(&to->avg_queue_size_sum, &from->avg_queue_size_sum);
-+	blkg_stat_merge(&to->avg_queue_size_samples, &from->avg_queue_size_samples);
-+	blkg_stat_merge(&to->dequeue, &from->dequeue);
-+	blkg_stat_merge(&to->group_wait_time, &from->group_wait_time);
-+	blkg_stat_merge(&to->idle_time, &from->idle_time);
-+	blkg_stat_merge(&to->empty_time, &from->empty_time);
-+}
-+
-+/*
-+ * Transfer @bfqg's stats to its parent's dead_stats so that the ancestors'
-+ * recursive stats can still account for the amount used by this bfqg after
-+ * it's gone.
-+ */
-+static void bfqg_stats_xfer_dead(struct bfq_group *bfqg)
-+{
-+	struct bfq_group *parent;
-+
-+	if (!bfqg) /* root_group */
-+		return;
-+
-+	parent = bfqg_parent(bfqg);
-+
-+	lockdep_assert_held(bfqg_to_blkg(bfqg)->q->queue_lock);
-+
-+	if (unlikely(!parent))
-+		return;
-+
-+	bfqg_stats_merge(&parent->dead_stats, &bfqg->stats);
-+	bfqg_stats_merge(&parent->dead_stats, &bfqg->dead_stats);
-+	bfqg_stats_reset(&bfqg->stats);
-+	bfqg_stats_reset(&bfqg->dead_stats);
-+}
-+
-+static void bfq_init_entity(struct bfq_entity *entity,
-+			    struct bfq_group *bfqg)
-+{
-+	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+
-+	entity->weight = entity->new_weight;
-+	entity->orig_weight = entity->new_weight;
-+	if (bfqq) {
-+		bfqq->ioprio = bfqq->new_ioprio;
-+		bfqq->ioprio_class = bfqq->new_ioprio_class;
-+		bfqg_get(bfqg);
-+	}
-+	entity->parent = bfqg->my_entity;
-+	entity->sched_data = &bfqg->sched_data;
-+}
-+
-+static void bfqg_stats_init(struct bfqg_stats *stats)
-+{
-+	blkg_rwstat_init(&stats->service_bytes);
-+	blkg_rwstat_init(&stats->serviced);
-+	blkg_rwstat_init(&stats->merged);
-+	blkg_rwstat_init(&stats->service_time);
-+	blkg_rwstat_init(&stats->wait_time);
-+	blkg_rwstat_init(&stats->queued);
-+
-+	blkg_stat_init(&stats->sectors);
-+	blkg_stat_init(&stats->time);
-+
-+	blkg_stat_init(&stats->unaccounted_time);
-+	blkg_stat_init(&stats->avg_queue_size_sum);
-+	blkg_stat_init(&stats->avg_queue_size_samples);
-+	blkg_stat_init(&stats->dequeue);
-+	blkg_stat_init(&stats->group_wait_time);
-+	blkg_stat_init(&stats->idle_time);
-+	blkg_stat_init(&stats->empty_time);
-+}
-+
-+static struct bfq_group_data *cpd_to_bfqgd(struct blkcg_policy_data *cpd)
-+{
-+	return cpd ? container_of(cpd, struct bfq_group_data, pd) : NULL;
-+}
-+
-+static struct bfq_group_data *blkcg_to_bfqgd(struct blkcg *blkcg)
-+{
-+	return cpd_to_bfqgd(blkcg_to_cpd(blkcg, &blkcg_policy_bfq));
-+}
-+
-+static void bfq_cpd_init(const struct blkcg *blkcg)
-+{
-+	struct bfq_group_data *d =
-+		cpd_to_bfqgd(blkcg->pd[blkcg_policy_bfq.plid]);
-+
-+	d->weight = BFQ_DEFAULT_GRP_WEIGHT;
-+}
-+
-+static void bfq_pd_init(struct blkcg_gq *blkg)
-+{
-+	struct bfq_group *bfqg = blkg_to_bfqg(blkg);
-+	struct bfq_data *bfqd = blkg->q->elevator->elevator_data;
-+	struct bfq_entity *entity = &bfqg->entity;
-+	struct bfq_group_data *d = blkcg_to_bfqgd(blkg->blkcg);
-+
-+	entity->orig_weight = entity->weight = entity->new_weight = d->weight;
-+	entity->my_sched_data = &bfqg->sched_data;
-+	bfqg->my_entity = entity; /*
-+				   * the root_group's will be set to NULL
-+				   * in bfq_init_queue()
-+				   */
-+	bfqg->bfqd = bfqd;
-+	bfqg->active_entities = 0;
-+
-+	/* if the root_group does not exist, we are handling it right now */
-+	if (bfqd->root_group && bfqg != bfqd->root_group)
-+		hlist_add_head(&bfqg->bfqd_node, &bfqd->group_list);
-+
-+	bfqg_stats_init(&bfqg->stats);
-+	bfqg_stats_init(&bfqg->dead_stats);
-+}
-+
-+/* offset delta from bfqg->stats to bfqg->dead_stats */
-+static const int dead_stats_off_delta = offsetof(struct bfq_group, dead_stats) -
-+					offsetof(struct bfq_group, stats);
-+
-+/* to be used by recursive prfill, sums live and dead stats recursively */
-+static u64 bfqg_stat_pd_recursive_sum(struct blkg_policy_data *pd, int off)
-+{
-+	u64 sum = 0;
-+
-+	sum += blkg_stat_recursive_sum(pd, off);
-+	sum += blkg_stat_recursive_sum(pd, off + dead_stats_off_delta);
-+	return sum;
-+}
-+
-+/* to be used by recursive prfill, sums live and dead rwstats recursively */
-+static struct blkg_rwstat bfqg_rwstat_pd_recursive_sum(struct blkg_policy_data *pd,
-+						       int off)
-+{
-+	struct blkg_rwstat a, b;
-+
-+	a = blkg_rwstat_recursive_sum(pd, off);
-+	b = blkg_rwstat_recursive_sum(pd, off + dead_stats_off_delta);
-+	blkg_rwstat_merge(&a, &b);
-+	return a;
-+}
-+
-+static void bfq_pd_reset_stats(struct blkcg_gq *blkg)
-+{
-+	struct bfq_group *bfqg = blkg_to_bfqg(blkg);
-+
-+	bfqg_stats_reset(&bfqg->stats);
-+	bfqg_stats_reset(&bfqg->dead_stats);
-+}
-+
-+static void bfq_group_set_parent(struct bfq_group *bfqg,
-+					struct bfq_group *parent)
-+{
-+	struct bfq_entity *entity;
-+
-+	BUG_ON(!parent);
-+	BUG_ON(!bfqg);
-+	BUG_ON(bfqg == parent);
-+
-+	entity = &bfqg->entity;
-+	entity->parent = parent->my_entity;
-+	entity->sched_data = &parent->sched_data;
-+}
-+
-+static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
-+					      struct blkcg *blkcg)
-+{
-+	struct request_queue *q = bfqd->queue;
-+	struct bfq_group *bfqg = NULL, *parent;
-+	struct bfq_entity *entity = NULL;
-+
-+	assert_spin_locked(bfqd->queue->queue_lock);
-+
-+	/* avoid lookup for the common case where there's no blkcg */
-+	if (blkcg == &blkcg_root) {
-+		bfqg = bfqd->root_group;
-+	} else {
-+		struct blkcg_gq *blkg;
-+
-+		blkg = blkg_lookup_create(blkcg, q);
-+		if (!IS_ERR(blkg))
-+			bfqg = blkg_to_bfqg(blkg);
-+		else /* fallback to root_group */
-+			bfqg = bfqd->root_group;
-+	}
-+
-+	BUG_ON(!bfqg);
-+
-+	/*
-+	 * Update chain of bfq_groups as we might be handling a leaf group
-+	 * which, along with some of its relatives, has not been hooked yet
-+	 * to the private hierarchy of BFQ.
-+	 */
-+	entity = &bfqg->entity;
-+	for_each_entity(entity) {
-+		bfqg = container_of(entity, struct bfq_group, entity);
-+		BUG_ON(!bfqg);
-+		if (bfqg != bfqd->root_group) {
-+			parent = bfqg_parent(bfqg);
-+			if (!parent)
-+				parent = bfqd->root_group;
-+			BUG_ON(!parent);
-+			bfq_group_set_parent(bfqg, parent);
-+		}
-+	}
-+
-+	return bfqg;
-+}
-+
-+/**
-+ * bfq_bfqq_move - migrate @bfqq to @bfqg.
-+ * @bfqd: queue descriptor.
-+ * @bfqq: the queue to move.
-+ * @entity: @bfqq's entity.
-+ * @bfqg: the group to move to.
-+ *
-+ * Move @bfqq to @bfqg, deactivating it from its old group and reactivating
-+ * it on the new one.  Avoid putting the entity on the old group idle tree.
-+ *
-+ * Must be called under the queue lock; the cgroup owning @bfqg must
-+ * not disappear (by now this just means that we are called under
-+ * rcu_read_lock()).
-+ */
-+static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+			  struct bfq_entity *entity, struct bfq_group *bfqg)
-+{
-+	int busy, resume;
-+
-+	busy = bfq_bfqq_busy(bfqq);
-+	resume = !RB_EMPTY_ROOT(&bfqq->sort_list);
-+
-+	BUG_ON(resume && !entity->on_st);
-+	BUG_ON(busy && !resume && entity->on_st &&
-+	       bfqq != bfqd->in_service_queue);
-+
-+	if (busy) {
-+		BUG_ON(atomic_read(&bfqq->ref) < 2);
-+
-+		if (!resume)
-+			bfq_del_bfqq_busy(bfqd, bfqq, 0);
-+		else
-+			bfq_deactivate_bfqq(bfqd, bfqq, 0);
-+	} else if (entity->on_st)
-+		bfq_put_idle_entity(bfq_entity_service_tree(entity), entity);
-+	bfqg_put(bfqq_group(bfqq));
-+
-+	/*
-+	 * Here we use a reference to bfqg.  We don't need a refcounter
-+	 * as the cgroup reference will not be dropped, so that its
-+	 * destroy() callback will not be invoked.
-+	 */
-+	entity->parent = bfqg->my_entity;
-+	entity->sched_data = &bfqg->sched_data;
-+	bfqg_get(bfqg);
-+
-+	if (busy) {
-+		if (resume)
-+			bfq_activate_bfqq(bfqd, bfqq);
-+	}
-+
-+	if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
-+		bfq_schedule_dispatch(bfqd);
-+}
-+
-+/**
-+ * __bfq_bic_change_cgroup - move @bic to @blkcg.
-+ * @bfqd: the queue descriptor.
-+ * @bic: the bic to move.
-+ * @blkcg: the blk-cgroup to move to.
-+ *
-+ * Move @bic to @blkcg, assuming that bfqd->queue is locked; the caller
-+ * has to make sure that the reference to @blkcg is valid across the call.
-+ *
-+ * NOTE: an alternative approach might have been to store the current
-+ * cgroup in bfqq and getting a reference to it, reducing the lookup
-+ * time here, at the price of slightly more complex code.
-+ */
-+static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
-+						struct bfq_io_cq *bic,
-+						struct blkcg *blkcg)
-+{
-+	struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0);
-+	struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1);
-+	struct bfq_group *bfqg;
-+	struct bfq_entity *entity;
-+
-+	lockdep_assert_held(bfqd->queue->queue_lock);
-+
-+	bfqg = bfq_find_alloc_group(bfqd, blkcg);
-+	if (async_bfqq) {
-+		entity = &async_bfqq->entity;
-+
-+		if (entity->sched_data != &bfqg->sched_data) {
-+			bic_set_bfqq(bic, NULL, 0);
-+			bfq_log_bfqq(bfqd, async_bfqq,
-+				     "bic_change_group: %p %d",
-+				     async_bfqq, atomic_read(&async_bfqq->ref));
-+			bfq_put_queue(async_bfqq);
-+		}
-+	}
-+
-+	if (sync_bfqq) {
-+		entity = &sync_bfqq->entity;
-+		if (entity->sched_data != &bfqg->sched_data)
-+			bfq_bfqq_move(bfqd, sync_bfqq, entity, bfqg);
-+	}
-+
-+	return bfqg;
-+}
-+
-+static void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
-+{
-+	struct bfq_data *bfqd = bic_to_bfqd(bic);
-+	struct blkcg *blkcg;
-+	struct bfq_group *bfqg = NULL;
-+	uint64_t id;
-+
-+	rcu_read_lock();
-+	blkcg = bio_blkcg(bio);
-+	id = blkcg->css.serial_nr;
-+	rcu_read_unlock();
-+
-+	/*
-+	 * Check whether blkcg has changed.  The condition may trigger
-+	 * spuriously on a newly created bic but there's no harm.
-+	 */
-+	if (unlikely(!bfqd) || likely(bic->blkcg_id == id))
-+		return;
-+
-+	bfqg = __bfq_bic_change_cgroup(bfqd, bic, blkcg);
-+	BUG_ON(!bfqg);
-+	bic->blkcg_id = id;
-+}
-+
-+/**
-+ * bfq_flush_idle_tree - deactivate any entity on the idle tree of @st.
-+ * @st: the service tree being flushed.
-+ */
-+static void bfq_flush_idle_tree(struct bfq_service_tree *st)
-+{
-+	struct bfq_entity *entity = st->first_idle;
-+
-+	for (; entity ; entity = st->first_idle)
-+		__bfq_deactivate_entity(entity, 0);
-+}
-+
-+/**
-+ * bfq_reparent_leaf_entity - move leaf entity to the root_group.
-+ * @bfqd: the device data structure with the root group.
-+ * @entity: the entity to move.
-+ */
-+static void bfq_reparent_leaf_entity(struct bfq_data *bfqd,
-+				     struct bfq_entity *entity)
-+{
-+	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+
-+	BUG_ON(!bfqq);
-+	bfq_bfqq_move(bfqd, bfqq, entity, bfqd->root_group);
-+	return;
-+}
-+
-+/**
-+ * bfq_reparent_active_entities - move to the root group all active
-+ *                                entities.
-+ * @bfqd: the device data structure with the root group.
-+ * @bfqg: the group to move from.
-+ * @st: the service tree with the entities.
-+ *
-+ * Needs queue_lock to be taken and reference to be valid over the call.
-+ */
-+static void bfq_reparent_active_entities(struct bfq_data *bfqd,
-+					 struct bfq_group *bfqg,
-+					 struct bfq_service_tree *st)
-+{
-+	struct rb_root *active = &st->active;
-+	struct bfq_entity *entity = NULL;
-+
-+	if (!RB_EMPTY_ROOT(&st->active))
-+		entity = bfq_entity_of(rb_first(active));
-+
-+	for (; entity ; entity = bfq_entity_of(rb_first(active)))
-+		bfq_reparent_leaf_entity(bfqd, entity);
-+
-+	if (bfqg->sched_data.in_service_entity)
-+		bfq_reparent_leaf_entity(bfqd,
-+			bfqg->sched_data.in_service_entity);
-+
-+	return;
-+}
-+
-+/**
-+ * bfq_pd_offline - take the bfq_group associated with @blkg offline.
-+ * @blkg: the blkcg_gq whose bfq_group is being destroyed.
-+ *
-+ * Destroy the group, making sure that it is not referenced from its parent.
-+ * blkio already grabs the queue_lock for us, so no need to use RCU-based magic.
-+ */
-+static void bfq_pd_offline(struct blkcg_gq *blkg)
-+{
-+	struct bfq_service_tree *st;
-+	struct bfq_group *bfqg = blkg_to_bfqg(blkg);
-+	struct bfq_data *bfqd = bfqg->bfqd;
-+	struct bfq_entity *entity = bfqg->my_entity;
-+	int i;
-+
-+	if (!entity) /* root group */
-+		return;
-+
-+	/*
-+	 * Empty all service_trees belonging to this group before
-+	 * deactivating the group itself.
-+	 */
-+	for (i = 0; i < BFQ_IOPRIO_CLASSES; i++) {
-+		st = bfqg->sched_data.service_tree + i;
-+
-+		/*
-+		 * The idle tree may still contain bfq_queues belonging
-+		 * to exited task because they never migrated to a different
-+		 * cgroup from the one being destroyed now.  No one else
-+		 * can access them so it's safe to act without any lock.
-+		 */
-+		bfq_flush_idle_tree(st);
-+
-+		/*
-+		 * It may happen that some queues are still active
-+		 * (busy) upon group destruction (if the corresponding
-+		 * processes have been forced to terminate). We move
-+		 * all the leaf entities corresponding to these queues
-+		 * to the root_group.
-+		 * Also, it may happen that the group has an entity
-+		 * in service, which is disconnected from the active
-+		 * tree: it must be moved, too.
-+		 * There is no need to put the sync queues, as the
-+		 * scheduler has taken no reference.
-+		 */
-+		bfq_reparent_active_entities(bfqd, bfqg, st);
-+		BUG_ON(!RB_EMPTY_ROOT(&st->active));
-+		BUG_ON(!RB_EMPTY_ROOT(&st->idle));
-+	}
-+	BUG_ON(bfqg->sched_data.next_in_service);
-+	BUG_ON(bfqg->sched_data.in_service_entity);
-+
-+	hlist_del(&bfqg->bfqd_node);
-+	__bfq_deactivate_entity(entity, 0);
-+	bfq_put_async_queues(bfqd, bfqg);
-+	BUG_ON(entity->tree);
-+
-+	bfqg_stats_xfer_dead(bfqg);
-+}
-+
-+static void bfq_end_wr_async(struct bfq_data *bfqd)
-+{
-+	struct hlist_node *tmp;
-+	struct bfq_group *bfqg;
-+
-+	hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node)
-+		bfq_end_wr_async_queues(bfqd, bfqg);
-+	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
-+}
-+
-+/**
-+ * bfq_disconnect_groups - disconnect @bfqd from all its groups.
-+ * @bfqd: the device descriptor being exited.
-+ *
-+ * When the device exits we just make sure that no lookup can return
-+ * the now unused group structures.  They will be deallocated on cgroup
-+ * destruction.
-+ */
-+static void bfq_disconnect_groups(struct bfq_data *bfqd)
-+{
-+	struct hlist_node *tmp;
-+	struct bfq_group *bfqg;
-+
-+	bfq_log(bfqd, "disconnect_groups beginning");
-+	hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node) {
-+		hlist_del(&bfqg->bfqd_node);
-+
-+		__bfq_deactivate_entity(bfqg->my_entity, 0);
-+
-+		/*
-+		 * Don't remove from the group hash, just set an
-+		 * invalid key.  No lookups can race with the
-+		 * assignment as bfqd is being destroyed; this
-+		 * implies also that new elements cannot be added
-+		 * to the list.
-+		 */
-+		rcu_assign_pointer(bfqg->bfqd, NULL);
-+
-+		bfq_log(bfqd, "disconnect_groups: put async for group %p",
-+			bfqg);
-+		bfq_put_async_queues(bfqd, bfqg);
-+	}
-+}
-+
-+static u64 bfqio_cgroup_weight_read(struct cgroup_subsys_state *css,
-+				       struct cftype *cftype)
-+{
-+	struct blkcg *blkcg = css_to_blkcg(css);
-+	struct bfq_group_data *bfqgd = blkcg_to_bfqgd(blkcg);
-+	int ret = -EINVAL;
-+
-+	spin_lock_irq(&blkcg->lock);
-+	ret = bfqgd->weight;
-+	spin_unlock_irq(&blkcg->lock);
-+
-+	return ret;
-+}
-+
-+static int bfqio_cgroup_weight_write(struct cgroup_subsys_state *css,
-+					struct cftype *cftype,
-+					u64 val)
-+{
-+	struct blkcg *blkcg = css_to_blkcg(css);
-+	struct bfq_group_data *bfqgd = blkcg_to_bfqgd(blkcg);
-+	struct blkcg_gq *blkg;
-+	int ret = -EINVAL;
-+
-+	if (val < BFQ_MIN_WEIGHT || val > BFQ_MAX_WEIGHT)
-+		return ret;
-+
-+	ret = 0;
-+	spin_lock_irq(&blkcg->lock);
-+	bfqgd->weight = (unsigned short)val;
-+	hlist_for_each_entry(blkg, &blkcg->blkg_list, blkcg_node) {
-+		struct bfq_group *bfqg = blkg_to_bfqg(blkg);
-+		if (!bfqg)
-+			continue;
-+		/*
-+		 * Setting the prio_changed flag of the entity
-+		 * to 1 with new_weight == weight would re-set
-+		 * the value of the weight to its ioprio mapping.
-+		 * Set the flag only if necessary.
-+		 */
-+		if ((unsigned short)val != bfqg->entity.new_weight) {
-+			bfqg->entity.new_weight = (unsigned short)val;
-+			/*
-+			 * Make sure that the above new value has been
-+			 * stored in bfqg->entity.new_weight before
-+			 * setting the prio_changed flag. In fact,
-+			 * this flag may be read asynchronously (in
-+			 * critical sections protected by a different
-+			 * lock than that held here), and finding this
-+			 * flag set may cause the execution of the code
-+			 * for updating parameters whose value may
-+			 * depend also on bfqg->entity.new_weight (in
-+			 * __bfq_entity_update_weight_prio).
-+			 * This barrier makes sure that the new value
-+			 * of bfqg->entity.new_weight is correctly
-+			 * seen in that code.
-+			 */
-+			smp_wmb();
-+			bfqg->entity.prio_changed = 1;
-+		}
-+	}
-+	spin_unlock_irq(&blkcg->lock);
-+
-+	return ret;
-+}
-+
-+static int bfqg_print_stat(struct seq_file *sf, void *v)
-+{
-+	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat,
-+			  &blkcg_policy_bfq, seq_cft(sf)->private, false);
-+	return 0;
-+}
-+
-+static int bfqg_print_rwstat(struct seq_file *sf, void *v)
-+{
-+	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_rwstat,
-+			  &blkcg_policy_bfq, seq_cft(sf)->private, true);
-+	return 0;
-+}
-+
-+static u64 bfqg_prfill_stat_recursive(struct seq_file *sf,
-+				      struct blkg_policy_data *pd, int off)
-+{
-+	u64 sum = bfqg_stat_pd_recursive_sum(pd, off);
-+
-+	return __blkg_prfill_u64(sf, pd, sum);
-+}
-+
-+static u64 bfqg_prfill_rwstat_recursive(struct seq_file *sf,
-+					struct blkg_policy_data *pd, int off)
-+{
-+	struct blkg_rwstat sum = bfqg_rwstat_pd_recursive_sum(pd, off);
-+
-+	return __blkg_prfill_rwstat(sf, pd, &sum);
-+}
-+
-+static int bfqg_print_stat_recursive(struct seq_file *sf, void *v)
-+{
-+	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
-+			  bfqg_prfill_stat_recursive, &blkcg_policy_bfq,
-+			  seq_cft(sf)->private, false);
-+	return 0;
-+}
-+
-+static int bfqg_print_rwstat_recursive(struct seq_file *sf, void *v)
-+{
-+	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
-+			  bfqg_prfill_rwstat_recursive, &blkcg_policy_bfq,
-+			  seq_cft(sf)->private, true);
-+	return 0;
-+}
-+
-+static u64 bfqg_prfill_avg_queue_size(struct seq_file *sf,
-+				      struct blkg_policy_data *pd, int off)
-+{
-+	struct bfq_group *bfqg = pd_to_bfqg(pd);
-+	u64 samples = blkg_stat_read(&bfqg->stats.avg_queue_size_samples);
-+	u64 v = 0;
-+
-+	if (samples) {
-+		v = blkg_stat_read(&bfqg->stats.avg_queue_size_sum);
-+		v = div64_u64(v, samples);
-+	}
-+	__blkg_prfill_u64(sf, pd, v);
-+	return 0;
-+}
-+
-+/* print avg_queue_size */
-+static int bfqg_print_avg_queue_size(struct seq_file *sf, void *v)
-+{
-+	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
-+			  bfqg_prfill_avg_queue_size, &blkcg_policy_bfq,
-+			  0, false);
-+	return 0;
-+}
-+
-+static struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
-+{
-+	int ret;
-+
-+	ret = blkcg_activate_policy(bfqd->queue, &blkcg_policy_bfq);
-+	if (ret)
-+		return NULL;
-+
-+	return blkg_to_bfqg(bfqd->queue->root_blkg);
-+}
-+
-+static struct cftype bfqio_files[] = {
-+	{
-+		.name = "bfq.weight",
-+		.read_u64 = bfqio_cgroup_weight_read,
-+		.write_u64 = bfqio_cgroup_weight_write,
-+	},
-+	/* statistics, cover only the tasks in the bfqg */
-+	{
-+		.name = "bfq.time",
-+		.private = offsetof(struct bfq_group, stats.time),
-+		.seq_show = bfqg_print_stat,
-+	},
-+	{
-+		.name = "bfq.sectors",
-+		.private = offsetof(struct bfq_group, stats.sectors),
-+		.seq_show = bfqg_print_stat,
-+	},
-+	{
-+		.name = "bfq.io_service_bytes",
-+		.private = offsetof(struct bfq_group, stats.service_bytes),
-+		.seq_show = bfqg_print_rwstat,
-+	},
-+	{
-+		.name = "bfq.io_serviced",
-+		.private = offsetof(struct bfq_group, stats.serviced),
-+		.seq_show = bfqg_print_rwstat,
-+	},
-+	{
-+		.name = "bfq.io_service_time",
-+		.private = offsetof(struct bfq_group, stats.service_time),
-+		.seq_show = bfqg_print_rwstat,
-+	},
-+	{
-+		.name = "bfq.io_wait_time",
-+		.private = offsetof(struct bfq_group, stats.wait_time),
-+		.seq_show = bfqg_print_rwstat,
-+	},
-+	{
-+		.name = "bfq.io_merged",
-+		.private = offsetof(struct bfq_group, stats.merged),
-+		.seq_show = bfqg_print_rwstat,
-+	},
-+	{
-+		.name = "bfq.io_queued",
-+		.private = offsetof(struct bfq_group, stats.queued),
-+		.seq_show = bfqg_print_rwstat,
-+	},
-+
-+	/* the same statistics which cover the bfqg and its descendants */
-+	{
-+		.name = "bfq.time_recursive",
-+		.private = offsetof(struct bfq_group, stats.time),
-+		.seq_show = bfqg_print_stat_recursive,
-+	},
-+	{
-+		.name = "bfq.sectors_recursive",
-+		.private = offsetof(struct bfq_group, stats.sectors),
-+		.seq_show = bfqg_print_stat_recursive,
-+	},
-+	{
-+		.name = "bfq.io_service_bytes_recursive",
-+		.private = offsetof(struct bfq_group, stats.service_bytes),
-+		.seq_show = bfqg_print_rwstat_recursive,
-+	},
-+	{
-+		.name = "bfq.io_serviced_recursive",
-+		.private = offsetof(struct bfq_group, stats.serviced),
-+		.seq_show = bfqg_print_rwstat_recursive,
-+	},
-+	{
-+		.name = "bfq.io_service_time_recursive",
-+		.private = offsetof(struct bfq_group, stats.service_time),
-+		.seq_show = bfqg_print_rwstat_recursive,
-+	},
-+	{
-+		.name = "bfq.io_wait_time_recursive",
-+		.private = offsetof(struct bfq_group, stats.wait_time),
-+		.seq_show = bfqg_print_rwstat_recursive,
-+	},
-+	{
-+		.name = "bfq.io_merged_recursive",
-+		.private = offsetof(struct bfq_group, stats.merged),
-+		.seq_show = bfqg_print_rwstat_recursive,
-+	},
-+	{
-+		.name = "bfq.io_queued_recursive",
-+		.private = offsetof(struct bfq_group, stats.queued),
-+		.seq_show = bfqg_print_rwstat_recursive,
-+	},
-+	{
-+		.name = "bfq.avg_queue_size",
-+		.seq_show = bfqg_print_avg_queue_size,
-+	},
-+	{
-+		.name = "bfq.group_wait_time",
-+		.private = offsetof(struct bfq_group, stats.group_wait_time),
-+		.seq_show = bfqg_print_stat,
-+	},
-+	{
-+		.name = "bfq.idle_time",
-+		.private = offsetof(struct bfq_group, stats.idle_time),
-+		.seq_show = bfqg_print_stat,
-+	},
-+	{
-+		.name = "bfq.empty_time",
-+		.private = offsetof(struct bfq_group, stats.empty_time),
-+		.seq_show = bfqg_print_stat,
-+	},
-+	{
-+		.name = "bfq.dequeue",
-+		.private = offsetof(struct bfq_group, stats.dequeue),
-+		.seq_show = bfqg_print_stat,
-+	},
-+	{
-+		.name = "bfq.unaccounted_time",
-+		.private = offsetof(struct bfq_group, stats.unaccounted_time),
-+		.seq_show = bfqg_print_stat,
-+	},
-+	{ }	/* terminate */
-+};
-+
-+static struct blkcg_policy blkcg_policy_bfq = {
-+       .pd_size                = sizeof(struct bfq_group),
-+       .cpd_size               = sizeof(struct bfq_group_data),
-+       .cftypes                = bfqio_files,
-+       .pd_init_fn             = bfq_pd_init,
-+       .cpd_init_fn            = bfq_cpd_init,
-+       .pd_offline_fn          = bfq_pd_offline,
-+       .pd_reset_stats_fn      = bfq_pd_reset_stats,
-+};
-+
-+#else
-+
-+static void bfq_init_entity(struct bfq_entity *entity,
-+			    struct bfq_group *bfqg)
-+{
-+	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+	entity->weight = entity->new_weight;
-+	entity->orig_weight = entity->new_weight;
-+	if (bfqq) {
-+		bfqq->ioprio = bfqq->new_ioprio;
-+		bfqq->ioprio_class = bfqq->new_ioprio_class;
-+	}
-+	entity->sched_data = &bfqg->sched_data;
-+}
-+
-+static struct bfq_group *
-+bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
-+{
-+	struct bfq_data *bfqd = bic_to_bfqd(bic);
-+	return bfqd->root_group;
-+}
-+
-+static void bfq_bfqq_move(struct bfq_data *bfqd,
-+			  struct bfq_queue *bfqq,
-+			  struct bfq_entity *entity,
-+			  struct bfq_group *bfqg)
-+{
-+}
-+
-+static void bfq_end_wr_async(struct bfq_data *bfqd)
-+{
-+	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
-+}
-+
-+static void bfq_disconnect_groups(struct bfq_data *bfqd)
-+{
-+	bfq_put_async_queues(bfqd, bfqd->root_group);
-+}
-+
-+static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
-+                                              struct blkcg *blkcg)
-+{
-+	return bfqd->root_group;
-+}
-+
-+static struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
-+{
-+	struct bfq_group *bfqg;
-+	int i;
-+
-+	bfqg = kmalloc_node(sizeof(*bfqg), GFP_KERNEL | __GFP_ZERO, node);
-+	if (!bfqg)
-+		return NULL;
-+
-+	for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
-+		bfqg->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
-+
-+	return bfqg;
-+}
-+#endif
-diff --git a/block/bfq-ioc.c b/block/bfq-ioc.c
-new file mode 100644
-index 0000000..fb7bb8f
---- /dev/null
-+++ b/block/bfq-ioc.c
-@@ -0,0 +1,36 @@
-+/*
-+ * BFQ: I/O context handling.
-+ *
-+ * Based on ideas and code from CFQ:
-+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
-+ *
-+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
-+ *		      Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
-+ */
-+
-+/**
-+ * icq_to_bic - convert iocontext queue structure to bfq_io_cq.
-+ * @icq: the iocontext queue.
-+ */
-+static struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
-+{
-+	/* bic->icq is the first member, %NULL will convert to %NULL */
-+	return container_of(icq, struct bfq_io_cq, icq);
-+}
-+
-+/**
-+ * bfq_bic_lookup - search into @ioc a bic associated to @bfqd.
-+ * @bfqd: the lookup key.
-+ * @ioc: the io_context of the process doing I/O.
-+ *
-+ * Queue lock must be held.
-+ */
-+static struct bfq_io_cq *bfq_bic_lookup(struct bfq_data *bfqd,
-+					struct io_context *ioc)
-+{
-+	if (ioc)
-+		return icq_to_bic(ioc_lookup_icq(ioc, bfqd->queue));
-+	return NULL;
-+}
-diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
-new file mode 100644
-index 0000000..51d24dd
---- /dev/null
-+++ b/block/bfq-iosched.c
-@@ -0,0 +1,3753 @@
-+/*
-+ * Budget Fair Queueing (BFQ) disk scheduler.
-+ *
-+ * Based on ideas and code from CFQ:
-+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
-+ *
-+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
-+ *		      Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
-+ * file.
-+ *
-+ * BFQ is a proportional-share storage-I/O scheduling algorithm based on
-+ * the slice-by-slice service scheme of CFQ. But BFQ assigns budgets,
-+ * measured in number of sectors, to processes instead of time slices. The
-+ * device is not granted to the in-service process for a given time slice,
-+ * but until it has exhausted its assigned budget. This change from the time
-+ * to the service domain allows BFQ to distribute the device throughput
-+ * among processes as desired, without any distortion due to ZBR, workload
-+ * fluctuations or other factors. BFQ uses an ad hoc internal scheduler,
-+ * called B-WF2Q+, to schedule processes according to their budgets. More
-+ * precisely, BFQ schedules queues associated to processes. Thanks to the
-+ * accurate policy of B-WF2Q+, BFQ can afford to assign high budgets to
-+ * I/O-bound processes issuing sequential requests (to boost the
-+ * throughput), and yet guarantee a low latency to interactive and soft
-+ * real-time applications.
-+ *
-+ * BFQ is described in [1], where also a reference to the initial, more
-+ * theoretical paper on BFQ can be found. The interested reader can find
-+ * in the latter paper full details on the main algorithm, as well as
-+ * formulas of the guarantees and formal proofs of all the properties.
-+ * With respect to the version of BFQ presented in these papers, this
-+ * implementation adds a few more heuristics, such as the one that
-+ * guarantees a low latency to soft real-time applications, and a
-+ * hierarchical extension based on H-WF2Q+.
-+ *
-+ * B-WF2Q+ is based on WF2Q+, that is described in [2], together with
-+ * H-WF2Q+, while the augmented tree used to implement B-WF2Q+ with O(log N)
-+ * complexity derives from the one introduced with EEVDF in [3].
-+ *
-+ * [1] P. Valente and M. Andreolini, ``Improving Application Responsiveness
-+ *     with the BFQ Disk I/O Scheduler'',
-+ *     Proceedings of the 5th Annual International Systems and Storage
-+ *     Conference (SYSTOR '12), June 2012.
-+ *
-+ * http://algogroup.unimo.it/people/paolo/disk_sched/bf1-v1-suite-results.pdf
-+ *
-+ * [2] Jon C.R. Bennett and H. Zhang, ``Hierarchical Packet Fair Queueing
-+ *     Algorithms,'' IEEE/ACM Transactions on Networking, 5(5):675-689,
-+ *     Oct 1997.
-+ *
-+ * http://www.cs.cmu.edu/~hzhang/papers/TON-97-Oct.ps.gz
-+ *
-+ * [3] I. Stoica and H. Abdel-Wahab, ``Earliest Eligible Virtual Deadline
-+ *     First: A Flexible and Accurate Mechanism for Proportional Share
-+ *     Resource Allocation,'' technical report.
-+ *
-+ * http://www.cs.berkeley.edu/~istoica/papers/eevdf-tr-95.pdf
-+ */
-+#include <linux/module.h>
-+#include <linux/slab.h>
-+#include <linux/blkdev.h>
-+#include <linux/cgroup.h>
-+#include <linux/elevator.h>
-+#include <linux/jiffies.h>
-+#include <linux/rbtree.h>
-+#include <linux/ioprio.h>
-+#include "bfq.h"
-+#include "blk.h"
-+
-+/* Expiration time of sync (0) and async (1) requests, in jiffies. */
-+static const int bfq_fifo_expire[2] = { HZ / 4, HZ / 8 };
-+
-+/* Maximum backwards seek, in KiB. */
-+static const int bfq_back_max = 16 * 1024;
-+
-+/* Penalty of a backwards seek, in number of sectors. */
-+static const int bfq_back_penalty = 2;
-+
-+/* Idling period duration, in jiffies. */
-+static int bfq_slice_idle = HZ / 125;
-+
-+/* Minimum number of assigned budgets for which stats are safe to compute. */
-+static const int bfq_stats_min_budgets = 194;
-+
-+/* Default maximum budget values, in sectors and number of requests. */
-+static const int bfq_default_max_budget = 16 * 1024;
-+static const int bfq_max_budget_async_rq = 4;
-+
-+/*
-+ * Async to sync throughput distribution is controlled as follows:
-+ * when an async request is served, the entity is charged the number
-+ * of sectors of the request, multiplied by the factor below
-+ */
-+static const int bfq_async_charge_factor = 10;
-+
-+/* Default timeout values, in jiffies, approximating CFQ defaults. */
-+static const int bfq_timeout_sync = HZ / 8;
-+static int bfq_timeout_async = HZ / 25;
-+
-+struct kmem_cache *bfq_pool;
-+
-+/* Below this threshold (in ms), we consider thinktime immediate. */
-+#define BFQ_MIN_TT		2
-+
-+/* hw_tag detection: parallel requests threshold and min samples needed. */
-+#define BFQ_HW_QUEUE_THRESHOLD	4
-+#define BFQ_HW_QUEUE_SAMPLES	32
-+
-+#define BFQQ_SEEK_THR	 (sector_t)(8 * 1024)
-+#define BFQQ_SEEKY(bfqq) ((bfqq)->seek_mean > BFQQ_SEEK_THR)
-+
-+/* Min samples used for peak rate estimation (for autotuning). */
-+#define BFQ_PEAK_RATE_SAMPLES	32
-+
-+/* Shift used for peak rate fixed precision calculations. */
-+#define BFQ_RATE_SHIFT		16
-+
-+/*
-+ * By default, BFQ computes the duration of the weight raising for
-+ * interactive applications automatically, using the following formula:
-+ * duration = (R / r) * T, where r is the peak rate of the device, and
-+ * R and T are two reference parameters.
-+ * In particular, R is the peak rate of the reference device (see below),
-+ * and T is a reference time: given the systems that are likely to be
-+ * installed on the reference device according to its speed class, T is
-+ * about the maximum time needed, under BFQ and while reading two files in
-+ * parallel, to load typical large applications on these systems.
-+ * In practice, the slower/faster the device at hand is, the more/less it
-+ * takes to load applications with respect to the reference device.
-+ * Accordingly, the longer/shorter BFQ grants weight raising to interactive
-+ * applications.
-+ *
-+ * BFQ uses four different reference pairs (R, T), depending on:
-+ * . whether the device is rotational or non-rotational;
-+ * . whether the device is slow, such as old or portable HDDs, as well as
-+ *   SD cards, or fast, such as newer HDDs and SSDs.
-+ *
-+ * The device's speed class is dynamically (re)detected in
-+ * bfq_update_peak_rate() every time the estimated peak rate is updated.
-+ *
-+ * In the following definitions, R_slow[0]/R_fast[0] and T_slow[0]/T_fast[0]
-+ * are the reference values for a slow/fast rotational device, whereas
-+ * R_slow[1]/R_fast[1] and T_slow[1]/T_fast[1] are the reference values for
-+ * a slow/fast non-rotational device. Finally, device_speed_thresh are the
-+ * thresholds used to switch between speed classes.
-+ * Both the reference peak rates and the thresholds are measured in
-+ * sectors/usec, left-shifted by BFQ_RATE_SHIFT.
-+ */
-+static int R_slow[2] = {1536, 10752};
-+static int R_fast[2] = {17415, 34791};
-+/*
-+ * To improve readability, a conversion function is used to initialize the
-+ * following arrays, which entails that they can be initialized only in a
-+ * function.
-+ */
-+static int T_slow[2];
-+static int T_fast[2];
-+static int device_speed_thresh[2];
-+
-+#define BFQ_SERVICE_TREE_INIT	((struct bfq_service_tree)		\
-+				{ RB_ROOT, RB_ROOT, NULL, NULL, 0, 0 })
-+
-+#define RQ_BIC(rq)		((struct bfq_io_cq *) (rq)->elv.priv[0])
-+#define RQ_BFQQ(rq)		((rq)->elv.priv[1])
-+
-+static void bfq_schedule_dispatch(struct bfq_data *bfqd);
-+
-+#include "bfq-ioc.c"
-+#include "bfq-sched.c"
-+#include "bfq-cgroup.c"
-+
-+#define bfq_class_idle(bfqq)	((bfqq)->ioprio_class == IOPRIO_CLASS_IDLE)
-+#define bfq_class_rt(bfqq)	((bfqq)->ioprio_class == IOPRIO_CLASS_RT)
-+
-+#define bfq_sample_valid(samples)	((samples) > 80)
-+
-+/*
-+ * We regard a request as SYNC, if either it's a read or has the SYNC bit
-+ * set (in which case it could also be a direct WRITE).
-+ */
-+static int bfq_bio_sync(struct bio *bio)
-+{
-+	if (bio_data_dir(bio) == READ || (bio->bi_rw & REQ_SYNC))
-+		return 1;
-+
-+	return 0;
-+}
-+
-+/*
-+ * Schedule a run of the queue, if there are requests pending and no one in
-+ * the driver that will restart queueing.
-+ */
-+static void bfq_schedule_dispatch(struct bfq_data *bfqd)
-+{
-+	if (bfqd->queued != 0) {
-+		bfq_log(bfqd, "schedule dispatch");
-+		kblockd_schedule_work(&bfqd->unplug_work);
-+	}
-+}
-+
-+/*
-+ * Lifted from AS - choose which of rq1 and rq2 is best served now.
-+ * We choose the request that is closest to the head right now.  Distance
-+ * behind the head is penalized and only allowed to a certain extent.
-+ */
-+static struct request *bfq_choose_req(struct bfq_data *bfqd,
-+				      struct request *rq1,
-+				      struct request *rq2,
-+				      sector_t last)
-+{
-+	sector_t s1, s2, d1 = 0, d2 = 0;
-+	unsigned long back_max;
-+#define BFQ_RQ1_WRAP	0x01 /* request 1 wraps */
-+#define BFQ_RQ2_WRAP	0x02 /* request 2 wraps */
-+	unsigned wrap = 0; /* bit mask: requests behind the disk head? */
-+
-+	if (!rq1 || rq1 == rq2)
-+		return rq2;
-+	if (!rq2)
-+		return rq1;
-+
-+	if (rq_is_sync(rq1) && !rq_is_sync(rq2))
-+		return rq1;
-+	else if (rq_is_sync(rq2) && !rq_is_sync(rq1))
-+		return rq2;
-+	if ((rq1->cmd_flags & REQ_META) && !(rq2->cmd_flags & REQ_META))
-+		return rq1;
-+	else if ((rq2->cmd_flags & REQ_META) && !(rq1->cmd_flags & REQ_META))
-+		return rq2;
-+
-+	s1 = blk_rq_pos(rq1);
-+	s2 = blk_rq_pos(rq2);
-+
-+	/*
-+	 * By definition, 1KiB is 2 sectors.
-+	 */
-+	back_max = bfqd->bfq_back_max * 2;
-+
-+	/*
-+	 * Strict one way elevator _except_ in the case where we allow
-+	 * short backward seeks which are biased as twice the cost of a
-+	 * similar forward seek.
-+	 */
-+	if (s1 >= last)
-+		d1 = s1 - last;
-+	else if (s1 + back_max >= last)
-+		d1 = (last - s1) * bfqd->bfq_back_penalty;
-+	else
-+		wrap |= BFQ_RQ1_WRAP;
-+
-+	if (s2 >= last)
-+		d2 = s2 - last;
-+	else if (s2 + back_max >= last)
-+		d2 = (last - s2) * bfqd->bfq_back_penalty;
-+	else
-+		wrap |= BFQ_RQ2_WRAP;
-+
-+	/* Found required data */
-+
-+	/*
-+	 * By doing switch() on the bit mask "wrap" we avoid having to
-+	 * check two variables for all permutations: --> faster!
-+	 */
-+	switch (wrap) {
-+	case 0: /* common case for CFQ: rq1 and rq2 not wrapped */
-+		if (d1 < d2)
-+			return rq1;
-+		else if (d2 < d1)
-+			return rq2;
-+		else {
-+			if (s1 >= s2)
-+				return rq1;
-+			else
-+				return rq2;
-+		}
-+
-+	case BFQ_RQ2_WRAP:
-+		return rq1;
-+	case BFQ_RQ1_WRAP:
-+		return rq2;
-+	case (BFQ_RQ1_WRAP|BFQ_RQ2_WRAP): /* both rqs wrapped */
-+	default:
-+		/*
-+		 * Since both rqs are wrapped,
-+		 * start with the one that's further behind head
-+		 * (--> only *one* back seek required),
-+		 * since back seek takes more time than forward.
-+		 */
-+		if (s1 <= s2)
-+			return rq1;
-+		else
-+			return rq2;
-+	}
-+}
-+
-+/*
-+ * Tell whether there are active queues or groups with differentiated weights.
-+ */
-+static bool bfq_differentiated_weights(struct bfq_data *bfqd)
-+{
-+	/*
-+	 * For weights to differ, at least one of the trees must contain
-+	 * at least two nodes.
-+	 */
-+	return (!RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
-+		(bfqd->queue_weights_tree.rb_node->rb_left ||
-+		 bfqd->queue_weights_tree.rb_node->rb_right)
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	       ) ||
-+	       (!RB_EMPTY_ROOT(&bfqd->group_weights_tree) &&
-+		(bfqd->group_weights_tree.rb_node->rb_left ||
-+		 bfqd->group_weights_tree.rb_node->rb_right)
-+#endif
-+	       );
-+}
-+
-+/*
-+ * The following function returns true if every queue must receive the
-+ * same share of the throughput (this condition is used when deciding
-+ * whether idling may be disabled, see the comments in the function
-+ * bfq_bfqq_may_idle()).
-+ *
-+ * Such a scenario occurs when:
-+ * 1) all active queues have the same weight,
-+ * 2) all active groups at the same level in the groups tree have the same
-+ *    weight,
-+ * 3) all active groups at the same level in the groups tree have the same
-+ *    number of children.
-+ *
-+ * Unfortunately, keeping the necessary state for evaluating exactly the
-+ * above symmetry conditions would be quite complex and time-consuming.
-+ * Therefore this function evaluates, instead, the following stronger
-+ * sub-conditions, for which it is much easier to maintain the needed
-+ * state:
-+ * 1) all active queues have the same weight,
-+ * 2) all active groups have the same weight,
-+ * 3) all active groups have at most one active child each.
-+ * In particular, the last two conditions are always true if hierarchical
-+ * support and the cgroups interface are not enabled, thus no state needs
-+ * to be maintained in this case.
-+ */
-+static bool bfq_symmetric_scenario(struct bfq_data *bfqd)
-+{
-+	return
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+		!bfqd->active_numerous_groups &&
-+#endif
-+		!bfq_differentiated_weights(bfqd);
-+}
-+
-+/*
-+ * If the weight-counter tree passed as input contains no counter for
-+ * the weight of the input entity, then add that counter; otherwise just
-+ * increment the existing counter.
-+ *
-+ * Note that weight-counter trees contain few nodes in mostly symmetric
-+ * scenarios. For example, if all queues have the same weight, then the
-+ * weight-counter tree for the queues may contain at most one node.
-+ * This holds even if low_latency is on, because weight-raised queues
-+ * are not inserted in the tree.
-+ * In most scenarios, the rate at which nodes are created/destroyed
-+ * should be low too.
-+ */
-+static void bfq_weights_tree_add(struct bfq_data *bfqd,
-+				 struct bfq_entity *entity,
-+				 struct rb_root *root)
-+{
-+	struct rb_node **new = &(root->rb_node), *parent = NULL;
-+
-+	/*
-+	 * Do not insert if the entity is already associated with a
-+	 * counter, which happens if:
-+	 *   1) the entity is associated with a queue,
-+	 *   2) a request arrival has caused the queue to become both
-+	 *      non-weight-raised, and hence change its weight, and
-+	 *      backlogged; in this respect, each of the two events
-+	 *      causes an invocation of this function,
-+	 *   3) this is the invocation of this function caused by the
-+	 *      second event. This second invocation is actually useless,
-+	 *      and we handle this fact by exiting immediately. More
-+	 *      efficient or clearer solutions might possibly be adopted.
-+	 */
-+	if (entity->weight_counter)
-+		return;
-+
-+	while (*new) {
-+		struct bfq_weight_counter *__counter = container_of(*new,
-+						struct bfq_weight_counter,
-+						weights_node);
-+		parent = *new;
-+
-+		if (entity->weight == __counter->weight) {
-+			entity->weight_counter = __counter;
-+			goto inc_counter;
-+		}
-+		if (entity->weight < __counter->weight)
-+			new = &((*new)->rb_left);
-+		else
-+			new = &((*new)->rb_right);
-+	}
-+
-+	entity->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
-+					 GFP_ATOMIC);
-+	entity->weight_counter->weight = entity->weight;
-+	rb_link_node(&entity->weight_counter->weights_node, parent, new);
-+	rb_insert_color(&entity->weight_counter->weights_node, root);
-+
-+inc_counter:
-+	entity->weight_counter->num_active++;
-+}
-+
-+/*
-+ * Decrement the weight counter associated with the entity, and, if the
-+ * counter reaches 0, remove the counter from the tree.
-+ * See the comments to the function bfq_weights_tree_add() for considerations
-+ * about overhead.
-+ */
-+static void bfq_weights_tree_remove(struct bfq_data *bfqd,
-+				    struct bfq_entity *entity,
-+				    struct rb_root *root)
-+{
-+	if (!entity->weight_counter)
-+		return;
-+
-+	BUG_ON(RB_EMPTY_ROOT(root));
-+	BUG_ON(entity->weight_counter->weight != entity->weight);
-+
-+	BUG_ON(!entity->weight_counter->num_active);
-+	entity->weight_counter->num_active--;
-+	if (entity->weight_counter->num_active > 0)
-+		goto reset_entity_pointer;
-+
-+	rb_erase(&entity->weight_counter->weights_node, root);
-+	kfree(entity->weight_counter);
-+
-+reset_entity_pointer:
-+	entity->weight_counter = NULL;
-+}
-+
-+static struct request *bfq_find_next_rq(struct bfq_data *bfqd,
-+					struct bfq_queue *bfqq,
-+					struct request *last)
-+{
-+	struct rb_node *rbnext = rb_next(&last->rb_node);
-+	struct rb_node *rbprev = rb_prev(&last->rb_node);
-+	struct request *next = NULL, *prev = NULL;
-+
-+	BUG_ON(RB_EMPTY_NODE(&last->rb_node));
-+
-+	if (rbprev)
-+		prev = rb_entry_rq(rbprev);
-+
-+	if (rbnext)
-+		next = rb_entry_rq(rbnext);
-+	else {
-+		rbnext = rb_first(&bfqq->sort_list);
-+		if (rbnext && rbnext != &last->rb_node)
-+			next = rb_entry_rq(rbnext);
-+	}
-+
-+	return bfq_choose_req(bfqd, next, prev, blk_rq_pos(last));
-+}
-+
-+/* see the definition of bfq_async_charge_factor for details */
-+static unsigned long bfq_serv_to_charge(struct request *rq,
-+					struct bfq_queue *bfqq)
-+{
-+	return blk_rq_sectors(rq) *
-+		(1 + ((!bfq_bfqq_sync(bfqq)) * (bfqq->wr_coeff == 1) *
-+		bfq_async_charge_factor));
-+}
-+
-+/**
-+ * bfq_updated_next_req - update the queue after a new next_rq selection.
-+ * @bfqd: the device data the queue belongs to.
-+ * @bfqq: the queue to update.
-+ *
-+ * If the first request of a queue changes we make sure that the queue
-+ * has enough budget to serve at least its first request (if the
-+ * request has grown).  We do this because if the queue has not enough
-+ * budget for its first request, it has to go through two dispatch
-+ * rounds to actually get it dispatched.
-+ */
-+static void bfq_updated_next_req(struct bfq_data *bfqd,
-+				 struct bfq_queue *bfqq)
-+{
-+	struct bfq_entity *entity = &bfqq->entity;
-+	struct bfq_service_tree *st = bfq_entity_service_tree(entity);
-+	struct request *next_rq = bfqq->next_rq;
-+	unsigned long new_budget;
-+
-+	if (!next_rq)
-+		return;
-+
-+	if (bfqq == bfqd->in_service_queue)
-+		/*
-+		 * In order not to break guarantees, budgets cannot be
-+		 * changed after an entity has been selected.
-+		 */
-+		return;
-+
-+	BUG_ON(entity->tree != &st->active);
-+	BUG_ON(entity == entity->sched_data->in_service_entity);
-+
-+	new_budget = max_t(unsigned long, bfqq->max_budget,
-+			   bfq_serv_to_charge(next_rq, bfqq));
-+	if (entity->budget != new_budget) {
-+		entity->budget = new_budget;
-+		bfq_log_bfqq(bfqd, bfqq, "updated next rq: new budget %lu",
-+					 new_budget);
-+		bfq_activate_bfqq(bfqd, bfqq);
-+	}
-+}
-+
-+static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
-+{
-+	u64 dur;
-+
-+	if (bfqd->bfq_wr_max_time > 0)
-+		return bfqd->bfq_wr_max_time;
-+
-+	dur = bfqd->RT_prod;
-+	do_div(dur, bfqd->peak_rate);
-+
-+	return dur;
-+}
-+
-+/* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
-+static void bfq_reset_burst_list(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+	struct bfq_queue *item;
-+	struct hlist_node *n;
-+
-+	hlist_for_each_entry_safe(item, n, &bfqd->burst_list, burst_list_node)
-+		hlist_del_init(&item->burst_list_node);
-+	hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
-+	bfqd->burst_size = 1;
-+}
-+
-+/* Add bfqq to the list of queues in current burst (see bfq_handle_burst) */
-+static void bfq_add_to_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+	/* Increment burst size to take into account also bfqq */
-+	bfqd->burst_size++;
-+
-+	if (bfqd->burst_size == bfqd->bfq_large_burst_thresh) {
-+		struct bfq_queue *pos, *bfqq_item;
-+		struct hlist_node *n;
-+
-+		/*
-+		 * Enough queues have been activated shortly after each
-+		 * other to consider this burst as large.
-+		 */
-+		bfqd->large_burst = true;
-+
-+		/*
-+		 * We can now mark all queues in the burst list as
-+		 * belonging to a large burst.
-+		 */
-+		hlist_for_each_entry(bfqq_item, &bfqd->burst_list,
-+				     burst_list_node)
-+			bfq_mark_bfqq_in_large_burst(bfqq_item);
-+		bfq_mark_bfqq_in_large_burst(bfqq);
-+
-+		/*
-+		 * From now on, and until the current burst finishes, any
-+		 * new queue being activated shortly after the last queue
-+		 * was inserted in the burst can be immediately marked as
-+		 * belonging to a large burst. So the burst list is not
-+		 * needed any more. Remove it.
-+		 */
-+		hlist_for_each_entry_safe(pos, n, &bfqd->burst_list,
-+					  burst_list_node)
-+			hlist_del_init(&pos->burst_list_node);
-+	} else /* burst not yet large: add bfqq to the burst list */
-+		hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
-+}
-+
-+/*
-+ * If many queues happen to become active shortly after each other, then,
-+ * to help the processes associated to these queues get their job done as
-+ * soon as possible, it is usually better to not grant either weight-raising
-+ * or device idling to these queues. In this comment we describe, firstly,
-+ * the reasons why this fact holds, and, secondly, the next function, which
-+ * implements the main steps needed to properly mark these queues so that
-+ * they can then be treated in a different way.
-+ *
-+ * As for the terminology, we say that a queue becomes active, i.e.,
-+ * switches from idle to backlogged, either when it is created (as a
-+ * consequence of the arrival of an I/O request), or, if already existing,
-+ * when a new request for the queue arrives while the queue is idle.
-+ * Bursts of activations, i.e., activations of different queues occurring
-+ * shortly after each other, are typically caused by services or applications
-+ * that spawn or reactivate many parallel threads/processes. Examples are
-+ * systemd during boot or git grep.
-+ *
-+ * These services or applications benefit mostly from a high throughput:
-+ * the quicker the requests of the activated queues are cumulatively served,
-+ * the sooner the target job of these queues gets completed. As a consequence,
-+ * weight-raising any of these queues, which also implies idling the device
-+ * for it, is almost always counterproductive: in most cases it just lowers
-+ * throughput.
-+ *
-+ * On the other hand, a burst of activations may be also caused by the start
-+ * of an application that does not consist of a lot of parallel I/O-bound
-+ * threads. In fact, with a complex application, the burst may be just a
-+ * consequence of the fact that several processes need to be executed to
-+ * start up the application. To start an application as quickly as possible,
-+ * the best thing to do is to privilege the I/O related to the application
-+ * with respect to all other I/O. Therefore, the best strategy to start as
-+ * quickly as possible an application that causes a burst of activations is
-+ * to weight-raise all the queues activated during the burst. This is the
-+ * exact opposite of the best strategy for the other type of bursts.
-+ *
-+ * In the end, to take the best action for each of the two cases, the two
-+ * types of bursts need to be distinguished. Fortunately, this seems
-+ * relatively easy to do, by looking at the sizes of the bursts. In
-+ * particular, we found a threshold such that bursts with a larger size
-+ * than that threshold are apparently caused only by services or commands
-+ * such as systemd or git grep. For brevity, hereafter we simply call these
-+ * bursts 'large'. BFQ *does not* weight-raise queues whose activations occur
-+ * in a large burst. In addition, for each of these queues BFQ performs or
-+ * does not perform idling depending on which choice boosts the throughput
-+ * most. The exact choice depends on the device and request pattern at
-+ * hand.
-+ *
-+ * Turning back to the next function, it implements all the steps needed
-+ * to detect the occurrence of a large burst and to properly mark all the
-+ * queues belonging to it (so that they can then be treated in a different
-+ * way). This goal is achieved by maintaining a special "burst list" that
-+ * holds, temporarily, the queues that belong to the burst in progress. The
-+ * list is then used to mark these queues as belonging to a large burst if
-+ * the burst does become large. The main steps are the following.
-+ *
-+ * . when the very first queue is activated, the queue is inserted into the
-+ *   list (as it could be the first queue in a possible burst)
-+ *
-+ * . if the current burst has not yet become large, and a queue Q that does
-+ *   not yet belong to the burst is activated shortly after the last time
-+ *   at which a new queue entered the burst list, then the function appends
-+ *   Q to the burst list
-+ *
-+ * . if, as a consequence of the previous step, the burst size reaches
-+ *   the large-burst threshold, then
-+ *
-+ *     . all the queues in the burst list are marked as belonging to a
-+ *       large burst
-+ *
-+ *     . the burst list is deleted; in fact, the burst list already served
-+ *       its purpose (keeping temporarily track of the queues in a burst,
-+ *       so as to be able to mark them as belonging to a large burst in the
-+ *       previous sub-step), and now is not needed any more
-+ *
-+ *     . the device enters a large-burst mode
-+ *
-+ * . if a queue Q that does not belong to the burst is activated while
-+ *   the device is in large-burst mode and shortly after the last time
-+ *   at which a queue either entered the burst list or was marked as
-+ *   belonging to the current large burst, then Q is immediately marked
-+ *   as belonging to a large burst.
-+ *
-+ * . if a queue Q that does not belong to the burst is activated a while
-+ *   after, i.e., not shortly after, the last time at which a queue
-+ *   either entered the burst list or was marked as belonging to the
-+ *   current large burst, then the current burst is deemed finished and:
-+ *
-+ *        . the large-burst mode is reset if set
-+ *
-+ *        . the burst list is emptied
-+ *
-+ *        . Q is inserted in the burst list, as Q may be the first queue
-+ *          in a possible new burst (then the burst list contains just Q
-+ *          after this step).
-+ */
-+static void bfq_handle_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+			     bool idle_for_long_time)
-+{
-+	/*
-+	 * If bfqq happened to be activated in a burst, but has been idle
-+	 * for at least as long as an interactive queue, then we assume
-+	 * that, in the overall I/O initiated in the burst, the I/O
-+	 * associated to bfqq is finished. So bfqq does not need to be
-+	 * treated as a queue belonging to a burst anymore. Accordingly,
-+	 * we reset bfqq's in_large_burst flag if set, and remove bfqq
-+	 * from the burst list if it's there. We do not, however, decrement
-+	 * burst_size, because the fact that bfqq does not need to belong
-+	 * to the burst list any more does not invalidate the fact that
-+	 * bfqq may have been activated during the current burst.
-+	 */
-+	if (idle_for_long_time) {
-+		hlist_del_init(&bfqq->burst_list_node);
-+		bfq_clear_bfqq_in_large_burst(bfqq);
-+	}
-+
-+	/*
-+	 * If bfqq is already in the burst list or is part of a large
-+	 * burst, then there is nothing else to do.
-+	 */
-+	if (!hlist_unhashed(&bfqq->burst_list_node) ||
-+	    bfq_bfqq_in_large_burst(bfqq))
-+		return;
-+
-+	/*
-+	 * If bfqq's activation happens late enough, then the current
-+	 * burst is finished, and related data structures must be reset.
-+	 *
-+	 * In this respect, consider the special case where bfqq is the very
-+	 * first queue being activated. In this case, last_ins_in_burst is
-+	 * not yet significant when we get here. But it is easy to verify
-+	 * that, whether or not the following condition is true, bfqq will
-+	 * end up being inserted into the burst list. In particular the
-+	 * list will happen to contain only bfqq. And this is exactly what
-+	 * has to happen, as bfqq may be the first queue in a possible
-+	 * burst.
-+	 */
-+	if (time_is_before_jiffies(bfqd->last_ins_in_burst +
-+	    bfqd->bfq_burst_interval)) {
-+		bfqd->large_burst = false;
-+		bfq_reset_burst_list(bfqd, bfqq);
-+		return;
-+	}
-+
-+	/*
-+	 * If we get here, then bfqq is being activated shortly after the
-+	 * last queue. So, if the current burst is also large, we can mark
-+	 * bfqq as belonging to this large burst immediately.
-+	 */
-+	if (bfqd->large_burst) {
-+		bfq_mark_bfqq_in_large_burst(bfqq);
-+		return;
-+	}
-+
-+	/*
-+	 * If we get here, then a large-burst state has not yet been
-+	 * reached, but bfqq is being activated shortly after the last
-+	 * queue. Then we add bfqq to the burst.
-+	 */
-+	bfq_add_to_burst(bfqd, bfqq);
-+}
-+
-+static void bfq_add_request(struct request *rq)
-+{
-+	struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+	struct bfq_entity *entity = &bfqq->entity;
-+	struct bfq_data *bfqd = bfqq->bfqd;
-+	struct request *next_rq, *prev;
-+	unsigned long old_wr_coeff = bfqq->wr_coeff;
-+	bool interactive = false;
-+
-+	bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
-+	bfqq->queued[rq_is_sync(rq)]++;
-+	bfqd->queued++;
-+
-+	elv_rb_add(&bfqq->sort_list, rq);
-+
-+	/*
-+	 * Check if this request is a better next-serve candidate.
-+	 */
-+	prev = bfqq->next_rq;
-+	next_rq = bfq_choose_req(bfqd, bfqq->next_rq, rq, bfqd->last_position);
-+	BUG_ON(!next_rq);
-+	bfqq->next_rq = next_rq;
-+
-+	if (!bfq_bfqq_busy(bfqq)) {
-+		bool soft_rt, in_burst,
-+		     idle_for_long_time = time_is_before_jiffies(
-+						bfqq->budget_timeout +
-+						bfqd->bfq_wr_min_idle_time);
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+		bfqg_stats_update_io_add(bfqq_group(RQ_BFQQ(rq)), bfqq,
-+					 rq->cmd_flags);
-+#endif
-+		if (bfq_bfqq_sync(bfqq)) {
-+			bool already_in_burst =
-+			   !hlist_unhashed(&bfqq->burst_list_node) ||
-+			   bfq_bfqq_in_large_burst(bfqq);
-+			bfq_handle_burst(bfqd, bfqq, idle_for_long_time);
-+			/*
-+			 * If bfqq was not already in the current burst,
-+			 * then, at this point, bfqq either has been
-+			 * added to the current burst or has caused the
-+			 * current burst to terminate. In particular, in
-+			 * the second case, bfqq has become the first
-+			 * queue in a possible new burst.
-+			 * In both cases last_ins_in_burst needs to be
-+			 * moved forward.
-+			 */
-+			if (!already_in_burst)
-+				bfqd->last_ins_in_burst = jiffies;
-+		}
-+
-+		in_burst = bfq_bfqq_in_large_burst(bfqq);
-+		soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
-+			!in_burst &&
-+			time_is_before_jiffies(bfqq->soft_rt_next_start);
-+		interactive = !in_burst && idle_for_long_time;
-+		entity->budget = max_t(unsigned long, bfqq->max_budget,
-+				       bfq_serv_to_charge(next_rq, bfqq));
-+
-+		if (!bfq_bfqq_IO_bound(bfqq)) {
-+			if (time_before(jiffies,
-+					RQ_BIC(rq)->ttime.last_end_request +
-+					bfqd->bfq_slice_idle)) {
-+				bfqq->requests_within_timer++;
-+				if (bfqq->requests_within_timer >=
-+				    bfqd->bfq_requests_within_timer)
-+					bfq_mark_bfqq_IO_bound(bfqq);
-+			} else
-+				bfqq->requests_within_timer = 0;
-+		}
-+
-+		if (!bfqd->low_latency)
-+			goto add_bfqq_busy;
-+
-+		/*
-+		 * If the queue:
-+		 * - is not being boosted,
-+		 * - has been idle for enough time,
-+		 * - is not a sync queue or is linked to a bfq_io_cq (it is
-+		 *   shared "for its nature" or it is not shared and its
-+		 *   requests have not been redirected to a shared queue)
-+		 * start a weight-raising period.
-+		 */
-+		if (old_wr_coeff == 1 && (interactive || soft_rt) &&
-+		    (!bfq_bfqq_sync(bfqq) || bfqq->bic)) {
-+			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
-+			if (interactive)
-+				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
-+			else
-+				bfqq->wr_cur_max_time =
-+					bfqd->bfq_wr_rt_max_time;
-+			bfq_log_bfqq(bfqd, bfqq,
-+				     "wrais starting at %lu, rais_max_time %u",
-+				     jiffies,
-+				     jiffies_to_msecs(bfqq->wr_cur_max_time));
-+		} else if (old_wr_coeff > 1) {
-+			if (interactive)
-+				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
-+			else if (in_burst ||
-+				 (bfqq->wr_cur_max_time ==
-+				  bfqd->bfq_wr_rt_max_time &&
-+				  !soft_rt)) {
-+				bfqq->wr_coeff = 1;
-+				bfq_log_bfqq(bfqd, bfqq,
-+					"wrais ending at %lu, rais_max_time %u",
-+					jiffies,
-+					jiffies_to_msecs(bfqq->
-+						wr_cur_max_time));
-+			} else if (time_before(
-+					bfqq->last_wr_start_finish +
-+					bfqq->wr_cur_max_time,
-+					jiffies +
-+					bfqd->bfq_wr_rt_max_time) &&
-+				   soft_rt) {
-+				/*
-+				 *
-+				 * The remaining weight-raising time is lower
-+				 * than bfqd->bfq_wr_rt_max_time, which means
-+				 * that the application is enjoying weight
-+				 * raising either because deemed soft-rt in
-+				 * the near past, or because deemed interactive
-+				 * a long time ago.
-+				 * In both cases, resetting now the current
-+				 * remaining weight-raising time for the
-+				 * application to the weight-raising duration
-+				 * for soft rt applications would not cause any
-+				 * latency increase for the application (as the
-+				 * new duration would be higher than the
-+				 * remaining time).
-+				 *
-+				 * In addition, the application is now meeting
-+				 * the requirements for being deemed soft rt.
-+				 * In the end we can correctly and safely
-+				 * (re)charge the weight-raising duration for
-+				 * the application with the weight-raising
-+				 * duration for soft rt applications.
-+				 *
-+				 * In particular, doing this recharge now, i.e.,
-+				 * before the weight-raising period for the
-+				 * application finishes, reduces the probability
-+				 * of the following negative scenario:
-+				 * 1) the weight of a soft rt application is
-+				 *    raised at startup (as for any newly
-+				 *    created application),
-+				 * 2) since the application is not interactive,
-+				 *    at a certain time weight-raising is
-+				 *    stopped for the application,
-+				 * 3) at that time the application happens to
-+				 *    still have pending requests, and hence
-+				 *    is destined to not have a chance to be
-+				 *    deemed soft rt before these requests are
-+				 *    completed (see the comments to the
-+				 *    function bfq_bfqq_softrt_next_start()
-+				 *    for details on soft rt detection),
-+				 * 4) these pending requests experience a high
-+				 *    latency because the application is not
-+				 *    weight-raised while they are pending.
-+				 */
-+				bfqq->last_wr_start_finish = jiffies;
-+				bfqq->wr_cur_max_time =
-+					bfqd->bfq_wr_rt_max_time;
-+			}
-+		}
-+		if (old_wr_coeff != bfqq->wr_coeff)
-+			entity->prio_changed = 1;
-+add_bfqq_busy:
-+		bfqq->last_idle_bklogged = jiffies;
-+		bfqq->service_from_backlogged = 0;
-+		bfq_clear_bfqq_softrt_update(bfqq);
-+		bfq_add_bfqq_busy(bfqd, bfqq);
-+	} else {
-+		if (bfqd->low_latency && old_wr_coeff == 1 && !rq_is_sync(rq) &&
-+		    time_is_before_jiffies(
-+				bfqq->last_wr_start_finish +
-+				bfqd->bfq_wr_min_inter_arr_async)) {
-+			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
-+			bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
-+
-+			bfqd->wr_busy_queues++;
-+			entity->prio_changed = 1;
-+			bfq_log_bfqq(bfqd, bfqq,
-+			    "non-idle wrais starting at %lu, rais_max_time %u",
-+			    jiffies,
-+			    jiffies_to_msecs(bfqq->wr_cur_max_time));
-+		}
-+		if (prev != bfqq->next_rq)
-+			bfq_updated_next_req(bfqd, bfqq);
-+	}
-+
-+	if (bfqd->low_latency &&
-+		(old_wr_coeff == 1 || bfqq->wr_coeff == 1 || interactive))
-+		bfqq->last_wr_start_finish = jiffies;
-+}
-+
-+static struct request *bfq_find_rq_fmerge(struct bfq_data *bfqd,
-+					  struct bio *bio)
-+{
-+	struct task_struct *tsk = current;
-+	struct bfq_io_cq *bic;
-+	struct bfq_queue *bfqq;
-+
-+	bic = bfq_bic_lookup(bfqd, tsk->io_context);
-+	if (!bic)
-+		return NULL;
-+
-+	bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
-+	if (bfqq)
-+		return elv_rb_find(&bfqq->sort_list, bio_end_sector(bio));
-+
-+	return NULL;
-+}
-+
-+static void bfq_activate_request(struct request_queue *q, struct request *rq)
-+{
-+	struct bfq_data *bfqd = q->elevator->elevator_data;
-+
-+	bfqd->rq_in_driver++;
-+	bfqd->last_position = blk_rq_pos(rq) + blk_rq_sectors(rq);
-+	bfq_log(bfqd, "activate_request: new bfqd->last_position %llu",
-+		(long long unsigned)bfqd->last_position);
-+}
-+
-+static void bfq_deactivate_request(struct request_queue *q, struct request *rq)
-+{
-+	struct bfq_data *bfqd = q->elevator->elevator_data;
-+
-+	BUG_ON(bfqd->rq_in_driver == 0);
-+	bfqd->rq_in_driver--;
-+}
-+
-+static void bfq_remove_request(struct request *rq)
-+{
-+	struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+	struct bfq_data *bfqd = bfqq->bfqd;
-+	const int sync = rq_is_sync(rq);
-+
-+	if (bfqq->next_rq == rq) {
-+		bfqq->next_rq = bfq_find_next_rq(bfqd, bfqq, rq);
-+		bfq_updated_next_req(bfqd, bfqq);
-+	}
-+
-+	if (rq->queuelist.prev != &rq->queuelist)
-+		list_del_init(&rq->queuelist);
-+	BUG_ON(bfqq->queued[sync] == 0);
-+	bfqq->queued[sync]--;
-+	bfqd->queued--;
-+	elv_rb_del(&bfqq->sort_list, rq);
-+
-+	if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
-+		if (bfq_bfqq_busy(bfqq) && bfqq != bfqd->in_service_queue)
-+			bfq_del_bfqq_busy(bfqd, bfqq, 1);
-+		/*
-+		 * Remove queue from request-position tree as it is empty.
-+		 */
-+		if (bfqq->pos_root) {
-+			rb_erase(&bfqq->pos_node, bfqq->pos_root);
-+			bfqq->pos_root = NULL;
-+		}
-+	}
-+
-+	if (rq->cmd_flags & REQ_META) {
-+		BUG_ON(bfqq->meta_pending == 0);
-+		bfqq->meta_pending--;
-+	}
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	bfqg_stats_update_io_remove(bfqq_group(bfqq), rq->cmd_flags);
-+#endif
-+}
-+
-+static int bfq_merge(struct request_queue *q, struct request **req,
-+		     struct bio *bio)
-+{
-+	struct bfq_data *bfqd = q->elevator->elevator_data;
-+	struct request *__rq;
-+
-+	__rq = bfq_find_rq_fmerge(bfqd, bio);
-+	if (__rq && elv_rq_merge_ok(__rq, bio)) {
-+		*req = __rq;
-+		return ELEVATOR_FRONT_MERGE;
-+	}
-+
-+	return ELEVATOR_NO_MERGE;
-+}
-+
-+static void bfq_merged_request(struct request_queue *q, struct request *req,
-+			       int type)
-+{
-+	if (type == ELEVATOR_FRONT_MERGE &&
-+	    rb_prev(&req->rb_node) &&
-+	    blk_rq_pos(req) <
-+	    blk_rq_pos(container_of(rb_prev(&req->rb_node),
-+				    struct request, rb_node))) {
-+		struct bfq_queue *bfqq = RQ_BFQQ(req);
-+		struct bfq_data *bfqd = bfqq->bfqd;
-+		struct request *prev, *next_rq;
-+
-+		/* Reposition request in its sort_list */
-+		elv_rb_del(&bfqq->sort_list, req);
-+		elv_rb_add(&bfqq->sort_list, req);
-+		/* Choose next request to be served for bfqq */
-+		prev = bfqq->next_rq;
-+		next_rq = bfq_choose_req(bfqd, bfqq->next_rq, req,
-+					 bfqd->last_position);
-+		BUG_ON(!next_rq);
-+		bfqq->next_rq = next_rq;
-+	}
-+}
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+static void bfq_bio_merged(struct request_queue *q, struct request *req,
-+			   struct bio *bio)
-+{
-+	bfqg_stats_update_io_merged(bfqq_group(RQ_BFQQ(req)), bio->bi_rw);
-+}
-+#endif
-+
-+static void bfq_merged_requests(struct request_queue *q, struct request *rq,
-+				struct request *next)
-+{
-+	struct bfq_queue *bfqq = RQ_BFQQ(rq), *next_bfqq = RQ_BFQQ(next);
-+
-+	/*
-+	 * If next and rq belong to the same bfq_queue and next is older
-+	 * than rq, then reposition rq in the fifo (by substituting next
-+	 * with rq). Otherwise, if next and rq belong to different
-+	 * bfq_queues, never reposition rq: in fact, we would have to
-+	 * reposition it with respect to next's position in its own fifo,
-+	 * which would most certainly be too expensive with respect to
-+	 * the benefits.
-+	 */
-+	if (bfqq == next_bfqq &&
-+	    !list_empty(&rq->queuelist) && !list_empty(&next->queuelist) &&
-+	    time_before(next->fifo_time, rq->fifo_time)) {
-+		list_del_init(&rq->queuelist);
-+		list_replace_init(&next->queuelist, &rq->queuelist);
-+		rq->fifo_time = next->fifo_time;
-+	}
-+
-+	if (bfqq->next_rq == next)
-+		bfqq->next_rq = rq;
-+
-+	bfq_remove_request(next);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	bfqg_stats_update_io_merged(bfqq_group(bfqq), next->cmd_flags);
-+#endif
-+}
-+
-+/* Must be called with bfqq != NULL */
-+static void bfq_bfqq_end_wr(struct bfq_queue *bfqq)
-+{
-+	BUG_ON(!bfqq);
-+	if (bfq_bfqq_busy(bfqq))
-+		bfqq->bfqd->wr_busy_queues--;
-+	bfqq->wr_coeff = 1;
-+	bfqq->wr_cur_max_time = 0;
-+	/* Trigger a weight change on the next activation of the queue */
-+	bfqq->entity.prio_changed = 1;
-+}
-+
-+static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
-+				    struct bfq_group *bfqg)
-+{
-+	int i, j;
-+
-+	for (i = 0; i < 2; i++)
-+		for (j = 0; j < IOPRIO_BE_NR; j++)
-+			if (bfqg->async_bfqq[i][j])
-+				bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
-+	if (bfqg->async_idle_bfqq)
-+		bfq_bfqq_end_wr(bfqg->async_idle_bfqq);
-+}
-+
-+static void bfq_end_wr(struct bfq_data *bfqd)
-+{
-+	struct bfq_queue *bfqq;
-+
-+	spin_lock_irq(bfqd->queue->queue_lock);
-+
-+	list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
-+		bfq_bfqq_end_wr(bfqq);
-+	list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
-+		bfq_bfqq_end_wr(bfqq);
-+	bfq_end_wr_async(bfqd);
-+
-+	spin_unlock_irq(bfqd->queue->queue_lock);
-+}
-+
-+static int bfq_allow_merge(struct request_queue *q, struct request *rq,
-+			   struct bio *bio)
-+{
-+	struct bfq_data *bfqd = q->elevator->elevator_data;
-+	struct bfq_io_cq *bic;
-+
-+	/*
-+	 * Disallow merge of a sync bio into an async request.
-+	 */
-+	if (bfq_bio_sync(bio) && !rq_is_sync(rq))
-+		return 0;
-+
-+	/*
-+	 * Lookup the bfqq that this bio will be queued with. Allow
-+	 * merge only if rq is queued there.
-+	 * Queue lock is held here.
-+	 */
-+	bic = bfq_bic_lookup(bfqd, current->io_context);
-+	if (!bic)
-+		return 0;
-+
-+	return bic_to_bfqq(bic, bfq_bio_sync(bio)) == RQ_BFQQ(rq);
-+}
-+
-+static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
-+				       struct bfq_queue *bfqq)
-+{
-+	if (bfqq) {
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+		bfqg_stats_update_avg_queue_size(bfqq_group(bfqq));
-+#endif
-+		bfq_mark_bfqq_must_alloc(bfqq);
-+		bfq_mark_bfqq_budget_new(bfqq);
-+		bfq_clear_bfqq_fifo_expire(bfqq);
-+
-+		bfqd->budgets_assigned = (bfqd->budgets_assigned*7 + 256) / 8;
-+
-+		bfq_log_bfqq(bfqd, bfqq,
-+			     "set_in_service_queue, cur-budget = %d",
-+			     bfqq->entity.budget);
-+	}
-+
-+	bfqd->in_service_queue = bfqq;
-+}
-+
-+/*
-+ * Get and set a new queue for service.
-+ */
-+static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd)
-+{
-+	struct bfq_queue *bfqq = bfq_get_next_queue(bfqd);
-+
-+	__bfq_set_in_service_queue(bfqd, bfqq);
-+	return bfqq;
-+}
-+
-+/*
-+ * If enough samples have been computed, return the current max budget
-+ * stored in bfqd, which is dynamically updated according to the
-+ * estimated disk peak rate; otherwise return the default max budget
-+ */
-+static int bfq_max_budget(struct bfq_data *bfqd)
-+{
-+	if (bfqd->budgets_assigned < bfq_stats_min_budgets)
-+		return bfq_default_max_budget;
-+	else
-+		return bfqd->bfq_max_budget;
-+}
-+
-+/*
-+ * Return min budget, which is a fraction of the current or default
-+ * max budget (trying with 1/32)
-+ */
-+static int bfq_min_budget(struct bfq_data *bfqd)
-+{
-+	if (bfqd->budgets_assigned < bfq_stats_min_budgets)
-+		return bfq_default_max_budget / 32;
-+	else
-+		return bfqd->bfq_max_budget / 32;
-+}
-+
-+static void bfq_arm_slice_timer(struct bfq_data *bfqd)
-+{
-+	struct bfq_queue *bfqq = bfqd->in_service_queue;
-+	struct bfq_io_cq *bic;
-+	unsigned long sl;
-+
-+	BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
-+
-+	/* Processes have exited, don't wait. */
-+	bic = bfqd->in_service_bic;
-+	if (!bic || atomic_read(&bic->icq.ioc->active_ref) == 0)
-+		return;
-+
-+	bfq_mark_bfqq_wait_request(bfqq);
-+
-+	/*
-+	 * We don't want to idle for seeks, but we do want to allow
-+	 * fair distribution of slice time for a process doing back-to-back
-+	 * seeks. So allow a little bit of time for it to submit a new rq.
-+	 *
-+	 * To prevent processes with (partly) seeky workloads from
-+	 * being too ill-treated, grant them a small fraction of the
-+	 * assigned budget before reducing the waiting time to
-+	 * BFQ_MIN_TT. This happened to help reduce latency.
-+	 */
-+	sl = bfqd->bfq_slice_idle;
-+	/*
-+	 * Unless the queue is being weight-raised or the scenario is
-+	 * asymmetric, grant only minimum idle time if the queue either
-+	 * has been seeky for long enough or has already proved to be
-+	 * constantly seeky.
-+	 */
-+	if (bfq_sample_valid(bfqq->seek_samples) &&
-+	    ((BFQQ_SEEKY(bfqq) && bfqq->entity.service >
-+				  bfq_max_budget(bfqq->bfqd) / 8) ||
-+	      bfq_bfqq_constantly_seeky(bfqq)) && bfqq->wr_coeff == 1 &&
-+	    bfq_symmetric_scenario(bfqd))
-+		sl = min(sl, msecs_to_jiffies(BFQ_MIN_TT));
-+	else if (bfqq->wr_coeff > 1)
-+		sl = sl * 3;
-+	bfqd->last_idling_start = ktime_get();
-+	mod_timer(&bfqd->idle_slice_timer, jiffies + sl);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	bfqg_stats_set_start_idle_time(bfqq_group(bfqq));
-+#endif
-+	bfq_log(bfqd, "arm idle: %u/%u ms",
-+		jiffies_to_msecs(sl), jiffies_to_msecs(bfqd->bfq_slice_idle));
-+}
-+
-+/*
-+ * Set the maximum time for the in-service queue to consume its
-+ * budget. This prevents seeky processes from lowering the disk
-+ * throughput (always guaranteed with a time slice scheme as in CFQ).
-+ */
-+static void bfq_set_budget_timeout(struct bfq_data *bfqd)
-+{
-+	struct bfq_queue *bfqq = bfqd->in_service_queue;
-+	unsigned int timeout_coeff;
-+	if (bfqq->wr_cur_max_time == bfqd->bfq_wr_rt_max_time)
-+		timeout_coeff = 1;
-+	else
-+		timeout_coeff = bfqq->entity.weight / bfqq->entity.orig_weight;
-+
-+	bfqd->last_budget_start = ktime_get();
-+
-+	bfq_clear_bfqq_budget_new(bfqq);
-+	bfqq->budget_timeout = jiffies +
-+		bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] * timeout_coeff;
-+
-+	bfq_log_bfqq(bfqd, bfqq, "set budget_timeout %u",
-+		jiffies_to_msecs(bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] *
-+		timeout_coeff));
-+}
-+
-+/*
-+ * Move request from internal lists to the request queue dispatch list.
-+ */
-+static void bfq_dispatch_insert(struct request_queue *q, struct request *rq)
-+{
-+	struct bfq_data *bfqd = q->elevator->elevator_data;
-+	struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+
-+	/*
-+	 * For consistency, the next instruction should have been executed
-+	 * after removing the request from the queue and dispatching it.
-+	 * We execute instead this instruction before bfq_remove_request()
-+	 * (and hence introduce a temporary inconsistency), for efficiency.
-+	 * In fact, in a forced_dispatch, this prevents two counters related
-+	 * to bfqq->dispatched from being uselessly decremented if bfqq
-+	 * is not in service, and then incremented again after
-+	 * incrementing bfqq->dispatched.
-+	 */
-+	bfqq->dispatched++;
-+	bfq_remove_request(rq);
-+	elv_dispatch_sort(q, rq);
-+
-+	if (bfq_bfqq_sync(bfqq))
-+		bfqd->sync_flight++;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	bfqg_stats_update_dispatch(bfqq_group(bfqq), blk_rq_bytes(rq),
-+				   rq->cmd_flags);
-+#endif
-+}
-+
-+/*
-+ * Return expired entry, or NULL to just start from scratch in rbtree.
-+ */
-+static struct request *bfq_check_fifo(struct bfq_queue *bfqq)
-+{
-+	struct request *rq = NULL;
-+
-+	if (bfq_bfqq_fifo_expire(bfqq))
-+		return NULL;
-+
-+	bfq_mark_bfqq_fifo_expire(bfqq);
-+
-+	if (list_empty(&bfqq->fifo))
-+		return NULL;
-+
-+	rq = rq_entry_fifo(bfqq->fifo.next);
-+
-+	if (time_before(jiffies, rq->fifo_time))
-+		return NULL;
-+
-+	return rq;
-+}
-+
-+static int bfq_bfqq_budget_left(struct bfq_queue *bfqq)
-+{
-+	struct bfq_entity *entity = &bfqq->entity;
-+	return entity->budget - entity->service;
-+}
-+
-+static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+	BUG_ON(bfqq != bfqd->in_service_queue);
-+
-+	__bfq_bfqd_reset_in_service(bfqd);
-+
-+	if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
-+		/*
-+		 * Overloading budget_timeout field to store the time
-+		 * at which the queue remains with no backlog; used by
-+		 * the weight-raising mechanism.
-+		 */
-+		bfqq->budget_timeout = jiffies;
-+		bfq_del_bfqq_busy(bfqd, bfqq, 1);
-+	} else
-+		bfq_activate_bfqq(bfqd, bfqq);
-+}
-+
-+/**
-+ * __bfq_bfqq_recalc_budget - try to adapt the budget to the @bfqq behavior.
-+ * @bfqd: device data.
-+ * @bfqq: queue to update.
-+ * @reason: reason for expiration.
-+ *
-+ * Handle the feedback on @bfqq budget at queue expiration.
-+ * See the body for detailed comments.
-+ */
-+static void __bfq_bfqq_recalc_budget(struct bfq_data *bfqd,
-+				     struct bfq_queue *bfqq,
-+				     enum bfqq_expiration reason)
-+{
-+	struct request *next_rq;
-+	int budget, min_budget;
-+
-+	budget = bfqq->max_budget;
-+	min_budget = bfq_min_budget(bfqd);
-+
-+	BUG_ON(bfqq != bfqd->in_service_queue);
-+
-+	bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last budg %d, budg left %d",
-+		bfqq->entity.budget, bfq_bfqq_budget_left(bfqq));
-+	bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last max_budg %d, min budg %d",
-+		budget, bfq_min_budget(bfqd));
-+	bfq_log_bfqq(bfqd, bfqq, "recalc_budg: sync %d, seeky %d",
-+		bfq_bfqq_sync(bfqq), BFQQ_SEEKY(bfqd->in_service_queue));
-+
-+	if (bfq_bfqq_sync(bfqq)) {
-+		switch (reason) {
-+		/*
-+		 * Caveat: in all the following cases we trade latency
-+		 * for throughput.
-+		 */
-+		case BFQ_BFQQ_TOO_IDLE:
-+			/*
-+			 * This is the only case where we may reduce
-+			 * the budget: if there is no request of the
-+			 * process still waiting for completion, then
-+			 * we assume (tentatively) that the timer has
-+			 * expired because the batch of requests of
-+			 * the process could have been served with a
-+			 * smaller budget.  Hence, betting that
-+			 * process will behave in the same way when it
-+			 * becomes backlogged again, we reduce its
-+			 * next budget.  As long as we guess right,
-+			 * this budget cut reduces the latency
-+			 * experienced by the process.
-+			 *
-+			 * However, if there are still outstanding
-+			 * requests, then the process may not yet have
-+			 * issued its next request just because it is
-+			 * still waiting for the completion of some of
-+			 * the still outstanding ones.  So in this
-+			 * subcase we do not reduce its budget, on the
-+			 * contrary we increase it to possibly boost
-+			 * the throughput, as discussed in the
-+			 * comments to the BUDGET_TIMEOUT case.
-+			 */
-+			if (bfqq->dispatched > 0) /* still outstanding reqs */
-+				budget = min(budget * 2, bfqd->bfq_max_budget);
-+			else {
-+				if (budget > 5 * min_budget)
-+					budget -= 4 * min_budget;
-+				else
-+					budget = min_budget;
-+			}
-+			break;
-+		case BFQ_BFQQ_BUDGET_TIMEOUT:
-+			/*
-+			 * We double the budget here because: 1) it
-+			 * gives the chance to boost the throughput if
-+			 * this is not a seeky process (which may have
-+			 * bumped into this timeout because of, e.g.,
-+			 * ZBR), 2) together with charge_full_budget
-+			 * it helps give seeky processes higher
-+			 * timestamps, and hence be served less
-+			 * frequently.
-+			 */
-+			budget = min(budget * 2, bfqd->bfq_max_budget);
-+			break;
-+		case BFQ_BFQQ_BUDGET_EXHAUSTED:
-+			/*
-+			 * The process still has backlog, and did not
-+			 * let either the budget timeout or the disk
-+			 * idling timeout expire. Hence it is not
-+			 * seeky, has a short thinktime and may be
-+			 * happy with a higher budget too. So
-+			 * definitely increase the budget of this good
-+			 * candidate to boost the disk throughput.
-+			 */
-+			budget = min(budget * 4, bfqd->bfq_max_budget);
-+			break;
-+		case BFQ_BFQQ_NO_MORE_REQUESTS:
-+		       /*
-+			* Leave the budget unchanged.
-+			*/
-+		default:
-+			return;
-+		}
-+	} else
-+		/*
-+		 * Async queues always get the maximum possible budget
-+		 * (their ability to dispatch is limited by
-+		 * @bfqd->bfq_max_budget_async_rq).
-+		 */
-+		budget = bfqd->bfq_max_budget;
-+
-+	bfqq->max_budget = budget;
-+
-+	if (bfqd->budgets_assigned >= bfq_stats_min_budgets &&
-+	    !bfqd->bfq_user_max_budget)
-+		bfqq->max_budget = min(bfqq->max_budget, bfqd->bfq_max_budget);
-+
-+	/*
-+	 * Make sure that we have enough budget for the next request.
-+	 * Since the finish time of the bfqq must be kept in sync with
-+	 * the budget, be sure to call __bfq_bfqq_expire() after the
-+	 * update.
-+	 */
-+	next_rq = bfqq->next_rq;
-+	if (next_rq)
-+		bfqq->entity.budget = max_t(unsigned long, bfqq->max_budget,
-+					    bfq_serv_to_charge(next_rq, bfqq));
-+	else
-+		bfqq->entity.budget = bfqq->max_budget;
-+
-+	bfq_log_bfqq(bfqd, bfqq, "head sect: %u, new budget %d",
-+			next_rq ? blk_rq_sectors(next_rq) : 0,
-+			bfqq->entity.budget);
-+}
-+
-+static unsigned long bfq_calc_max_budget(u64 peak_rate, u64 timeout)
-+{
-+	unsigned long max_budget;
-+
-+	/*
-+	 * The max_budget calculated when autotuning is equal to the
-+	 * amount of sectors transferred in timeout_sync at the
-+	 * estimated peak rate.
-+	 */
-+	max_budget = (unsigned long)(peak_rate * 1000 *
-+				     timeout >> BFQ_RATE_SHIFT);
-+
-+	return max_budget;
-+}
-+
-+/*
-+ * In addition to updating the peak rate, checks whether the process
-+ * is "slow", and returns 1 if so. This slow flag is used, in addition
-+ * to the budget timeout, to reduce the amount of service provided to
-+ * seeky processes, and hence reduce their chances to lower the
-+ * throughput. See the code for more details.
-+ */
-+static bool bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+				 bool compensate, enum bfqq_expiration reason)
-+{
-+	u64 bw, usecs, expected, timeout;
-+	ktime_t delta;
-+	int update = 0;
-+
-+	if (!bfq_bfqq_sync(bfqq) || bfq_bfqq_budget_new(bfqq))
-+		return false;
-+
-+	if (compensate)
-+		delta = bfqd->last_idling_start;
-+	else
-+		delta = ktime_get();
-+	delta = ktime_sub(delta, bfqd->last_budget_start);
-+	usecs = ktime_to_us(delta);
-+
-+	/* Don't trust short/unrealistic values. */
-+	if (usecs < 100 || usecs >= LONG_MAX)
-+		return false;
-+
-+	/*
-+	 * Calculate the bandwidth for the last slice.  We use a 64 bit
-+	 * value to store the peak rate, in sectors per usec in fixed
-+	 * point math.  We do so to have enough precision in the estimate
-+	 * and to avoid overflows.
-+	 */
-+	bw = (u64)bfqq->entity.service << BFQ_RATE_SHIFT;
-+	do_div(bw, (unsigned long)usecs);
-+
-+	timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
-+
-+	/*
-+	 * Use only long (> 20ms) intervals to filter out spikes for
-+	 * the peak rate estimation.
-+	 */
-+	if (usecs > 20000) {
-+		if (bw > bfqd->peak_rate ||
-+		   (!BFQQ_SEEKY(bfqq) &&
-+		    reason == BFQ_BFQQ_BUDGET_TIMEOUT)) {
-+			bfq_log(bfqd, "measured bw =%llu", bw);
-+			/*
-+			 * To smooth oscillations use a low-pass filter with
-+			 * alpha=7/8, i.e.,
-+			 * new_rate = (7/8) * old_rate + (1/8) * bw
-+			 */
-+			do_div(bw, 8);
-+			if (bw == 0)
-+				return false;
-+			bfqd->peak_rate *= 7;
-+			do_div(bfqd->peak_rate, 8);
-+			bfqd->peak_rate += bw;
-+			update = 1;
-+			bfq_log(bfqd, "new peak_rate=%llu", bfqd->peak_rate);
-+		}
-+
-+		update |= bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES - 1;
-+
-+		if (bfqd->peak_rate_samples < BFQ_PEAK_RATE_SAMPLES)
-+			bfqd->peak_rate_samples++;
-+
-+		if (bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES &&
-+		    update) {
-+			int dev_type = blk_queue_nonrot(bfqd->queue);
-+			if (bfqd->bfq_user_max_budget == 0) {
-+				bfqd->bfq_max_budget =
-+					bfq_calc_max_budget(bfqd->peak_rate,
-+							    timeout);
-+				bfq_log(bfqd, "new max_budget=%d",
-+					bfqd->bfq_max_budget);
-+			}
-+			if (bfqd->device_speed == BFQ_BFQD_FAST &&
-+			    bfqd->peak_rate < device_speed_thresh[dev_type]) {
-+				bfqd->device_speed = BFQ_BFQD_SLOW;
-+				bfqd->RT_prod = R_slow[dev_type] *
-+						T_slow[dev_type];
-+			} else if (bfqd->device_speed == BFQ_BFQD_SLOW &&
-+			    bfqd->peak_rate > device_speed_thresh[dev_type]) {
-+				bfqd->device_speed = BFQ_BFQD_FAST;
-+				bfqd->RT_prod = R_fast[dev_type] *
-+						T_fast[dev_type];
-+			}
-+		}
-+	}
-+
-+	/*
-+	 * If the process has been served for too short a time
-+	 * interval to let its possible sequential accesses prevail over
-+	 * the initial seek time needed to move the disk head to the
-+	 * first sector it requested, then give the process a chance
-+	 * and, for the moment, return false.
-+	 */
-+	if (bfqq->entity.budget <= bfq_max_budget(bfqd) / 8)
-+		return false;
-+
-+	/*
-+	 * A process is considered ``slow'' (i.e., seeky, so that we
-+	 * cannot treat it fairly in the service domain, as it would
-+	 * slow down the other processes too much) if, when a slice
-+	 * ends for whatever reason, it has received service at a
-+	 * rate that would not be high enough to complete the budget
-+	 * before the budget timeout expiration.
-+	 */
-+	expected = bw * 1000 * timeout >> BFQ_RATE_SHIFT;
-+
-+	/*
-+	 * Caveat: processes doing IO in the slower disk zones will
-+	 * tend to be slow(er) even if not seeky. And the estimated
-+	 * peak rate will actually be an average over the disk
-+	 * surface. Hence, to not be too harsh with unlucky processes,
-+	 * we keep a budget/3 margin of safety before declaring a
-+	 * process slow.
-+	 */
-+	return expected > (4 * bfqq->entity.budget) / 3;
-+}
-+
-+/*
-+ * To be deemed as soft real-time, an application must meet two
-+ * requirements. First, the application must not require an average
-+ * bandwidth higher than the approximate bandwidth required to playback or
-+ * record a compressed high-definition video.
-+ * The next function is invoked on the completion of the last request of a
-+ * batch, to compute the next-start time instant, soft_rt_next_start, such
-+ * that, if the next request of the application does not arrive before
-+ * soft_rt_next_start, then the above requirement on the bandwidth is met.
-+ *
-+ * The second requirement is that the request pattern of the application is
-+ * isochronous, i.e., that, after issuing a request or a batch of requests,
-+ * the application stops issuing new requests until all its pending requests
-+ * have been completed. After that, the application may issue a new batch,
-+ * and so on.
-+ * For this reason the next function is invoked to compute
-+ * soft_rt_next_start only for applications that meet this requirement,
-+ * whereas soft_rt_next_start is set to infinity for applications that do
-+ * not.
-+ *
-+ * Unfortunately, even a greedy application may happen to behave in an
-+ * isochronous way if the CPU load is high. In fact, the application may
-+ * stop issuing requests while the CPUs are busy serving other processes,
-+ * then restart, then stop again for a while, and so on. In addition, if
-+ * the disk achieves a low enough throughput with the request pattern
-+ * issued by the application (e.g., because the request pattern is random
-+ * and/or the device is slow), then the application may meet the above
-+ * bandwidth requirement too. To prevent such a greedy application from
-+ * being deemed soft real-time, a further rule is used in the computation of
-+ * soft_rt_next_start: soft_rt_next_start must be higher than the current
-+ * time plus the maximum time for which the arrival of a request is waited
-+ * for when a sync queue becomes idle, namely bfqd->bfq_slice_idle.
-+ * This filters out greedy applications, as the latter issue instead their
-+ * next request as soon as possible after the last one has been completed
-+ * (in contrast, when a batch of requests is completed, a soft real-time
-+ * application spends some time processing data).
-+ *
-+ * Unfortunately, the last filter may easily generate false positives if
-+ * only bfqd->bfq_slice_idle is used as a reference time interval and one
-+ * or both the following cases occur:
-+ * 1) HZ is so low that the duration of a jiffy is comparable to or higher
-+ *    than bfqd->bfq_slice_idle. This happens, e.g., on slow devices with
-+ *    HZ=100.
-+ * 2) jiffies, instead of increasing at a constant rate, may stop increasing
-+ *    for a while, then suddenly 'jump' by several units to recover the lost
-+ *    increments. This seems to happen, e.g., inside virtual machines.
-+ * To address this issue, we do not use as a reference time interval just
-+ * bfqd->bfq_slice_idle, but bfqd->bfq_slice_idle plus a few jiffies. In
-+ * particular we add the minimum number of jiffies for which the filter
-+ * seems to be quite precise also in embedded systems and KVM/QEMU virtual
-+ * machines.
-+ */
-+static unsigned long bfq_bfqq_softrt_next_start(struct bfq_data *bfqd,
-+						struct bfq_queue *bfqq)
-+{
-+	return max(bfqq->last_idle_bklogged +
-+		   HZ * bfqq->service_from_backlogged /
-+		   bfqd->bfq_wr_max_softrt_rate,
-+		   jiffies + bfqq->bfqd->bfq_slice_idle + 4);
-+}
-+
-+/*
-+ * Return the largest-possible time instant such that, for as long as possible,
-+ * the current time will be lower than this time instant according to the macro
-+ * time_is_before_jiffies().
-+ */
-+static unsigned long bfq_infinity_from_now(unsigned long now)
-+{
-+	return now + ULONG_MAX / 2;
-+}
-+
-+/**
-+ * bfq_bfqq_expire - expire a queue.
-+ * @bfqd: device owning the queue.
-+ * @bfqq: the queue to expire.
-+ * @compensate: if true, compensate for the time spent idling.
-+ * @reason: the reason causing the expiration.
-+ *
-+ *
-+ * If the process associated to the queue is slow (i.e., seeky), or in
-+ * case of budget timeout, or, finally, if it is async, we
-+ * artificially charge it an entire budget (independently of the
-+ * actual service it received). As a consequence, the queue will get
-+ * higher timestamps than the correct ones upon reactivation, and
-+ * hence it will be rescheduled as if it had received more service
-+ * than what it actually received. In the end, this class of processes
-+ * will receive less service in proportion to how slowly they consume
-+ * their budgets (and hence how seriously they tend to lower the
-+ * throughput).
-+ *
-+ * In contrast, when a queue expires because it has been idling for
-+ * too long or because it exhausted its budget, we do not touch the
-+ * amount of service it has received. Hence, when the queue is
-+ * reactivated and its timestamps updated, the latter will be in sync
-+ * with the actual service received by the queue until expiration.
-+ *
-+ * Charging a full budget to the first type of queues and the exact
-+ * service to the others has the effect of using the WF2Q+ policy to
-+ * schedule the former on a timeslice basis, without violating the
-+ * service domain guarantees of the latter.
-+ */
-+static void bfq_bfqq_expire(struct bfq_data *bfqd,
-+			    struct bfq_queue *bfqq,
-+			    bool compensate,
-+			    enum bfqq_expiration reason)
-+{
-+	bool slow;
-+	BUG_ON(bfqq != bfqd->in_service_queue);
-+
-+	/*
-+	 * Update disk peak rate for autotuning and check whether the
-+	 * process is slow (see bfq_update_peak_rate).
-+	 */
-+	slow = bfq_update_peak_rate(bfqd, bfqq, compensate, reason);
-+
-+	/*
-+	 * As explained above, 'punish' slow (i.e., seeky), timed-out
-+	 * and async queues, to favor sequential sync workloads.
-+	 *
-+	 * Processes doing I/O in the slower disk zones will tend to be
-+	 * slow(er) even if not seeky. Hence, since the estimated peak
-+	 * rate is actually an average over the disk surface, these
-+	 * processes may time out just for bad luck. To avoid punishing
-+	 * them we do not charge a full budget to a process that
-+	 * succeeded in consuming at least 2/3 of its budget.
-+	 */
-+	if (slow || (reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
-+		     bfq_bfqq_budget_left(bfqq) >=  bfqq->entity.budget / 3))
-+		bfq_bfqq_charge_full_budget(bfqq);
-+
-+	bfqq->service_from_backlogged += bfqq->entity.service;
-+
-+	if (BFQQ_SEEKY(bfqq) && reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
-+	    !bfq_bfqq_constantly_seeky(bfqq)) {
-+		bfq_mark_bfqq_constantly_seeky(bfqq);
-+		if (!blk_queue_nonrot(bfqd->queue))
-+			bfqd->const_seeky_busy_in_flight_queues++;
-+	}
-+
-+	if (reason == BFQ_BFQQ_TOO_IDLE &&
-+	    bfqq->entity.service <= 2 * bfqq->entity.budget / 10)
-+		bfq_clear_bfqq_IO_bound(bfqq);
-+
-+	if (bfqd->low_latency && bfqq->wr_coeff == 1)
-+		bfqq->last_wr_start_finish = jiffies;
-+
-+	if (bfqd->low_latency && bfqd->bfq_wr_max_softrt_rate > 0 &&
-+	    RB_EMPTY_ROOT(&bfqq->sort_list)) {
-+		/*
-+		 * If we get here, and there are no outstanding requests,
-+		 * then the request pattern is isochronous (see the comments
-+		 * to the function bfq_bfqq_softrt_next_start()). Hence we
-+		 * can compute soft_rt_next_start. If, instead, the queue
-+		 * still has outstanding requests, then we have to wait
-+		 * for the completion of all the outstanding requests to
-+		 * discover whether the request pattern is actually
-+		 * isochronous.
-+		 */
-+		if (bfqq->dispatched == 0)
-+			bfqq->soft_rt_next_start =
-+				bfq_bfqq_softrt_next_start(bfqd, bfqq);
-+		else {
-+			/*
-+			 * The application is still waiting for the
-+			 * completion of one or more requests:
-+			 * prevent it from possibly being incorrectly
-+			 * deemed as soft real-time by setting its
-+			 * soft_rt_next_start to infinity. In fact,
-+			 * without this assignment, the application
-+			 * would be incorrectly deemed as soft
-+			 * real-time if:
-+			 * 1) it issued a new request before the
-+			 *    completion of all its in-flight
-+			 *    requests, and
-+			 * 2) at that time, its soft_rt_next_start
-+			 *    happened to be in the past.
-+			 */
-+			bfqq->soft_rt_next_start =
-+				bfq_infinity_from_now(jiffies);
-+			/*
-+			 * Schedule an update of soft_rt_next_start to when
-+			 * the task may be discovered to be isochronous.
-+			 */
-+			bfq_mark_bfqq_softrt_update(bfqq);
-+		}
-+	}
-+
-+	bfq_log_bfqq(bfqd, bfqq,
-+		"expire (%d, slow %d, num_disp %d, idle_win %d)", reason,
-+		slow, bfqq->dispatched, bfq_bfqq_idle_window(bfqq));
-+
-+	/*
-+	 * Increase, decrease or leave budget unchanged according to
-+	 * reason.
-+	 */
-+	__bfq_bfqq_recalc_budget(bfqd, bfqq, reason);
-+	__bfq_bfqq_expire(bfqd, bfqq);
-+}
-+
-+/*
-+ * Budget timeout is not implemented through a dedicated timer, but
-+ * just checked on request arrivals and completions, as well as on
-+ * idle timer expirations.
-+ */
-+static bool bfq_bfqq_budget_timeout(struct bfq_queue *bfqq)
-+{
-+	if (bfq_bfqq_budget_new(bfqq) ||
-+	    time_before(jiffies, bfqq->budget_timeout))
-+		return false;
-+	return true;
-+}
-+
-+/*
-+ * If we expire a queue that is waiting for the arrival of a new
-+ * request, we may prevent the fictitious timestamp back-shifting that
-+ * allows the guarantees of the queue to be preserved (see [1] for
-+ * this tricky aspect). Hence we return true only if this condition
-+ * does not hold, or if the queue is slow enough to deserve only to be
-+ * kicked off for preserving a high throughput.
-+ */
-+static bool bfq_may_expire_for_budg_timeout(struct bfq_queue *bfqq)
-+{
-+	bfq_log_bfqq(bfqq->bfqd, bfqq,
-+		"may_budget_timeout: wait_request %d left %d timeout %d",
-+		bfq_bfqq_wait_request(bfqq),
-+			bfq_bfqq_budget_left(bfqq) >=  bfqq->entity.budget / 3,
-+		bfq_bfqq_budget_timeout(bfqq));
-+
-+	return (!bfq_bfqq_wait_request(bfqq) ||
-+		bfq_bfqq_budget_left(bfqq) >=  bfqq->entity.budget / 3)
-+		&&
-+		bfq_bfqq_budget_timeout(bfqq);
-+}
-+
-+/*
-+ * For a queue that becomes empty, device idling is allowed only if
-+ * this function returns true for that queue. As a consequence, since
-+ * device idling plays a critical role for both throughput boosting
-+ * and service guarantees, the return value of this function plays a
-+ * critical role as well.
-+ *
-+ * In a nutshell, this function returns true only if idling is
-+ * beneficial for throughput or, even if detrimental for throughput,
-+ * idling is however necessary to preserve service guarantees (low
-+ * latency, desired throughput distribution, ...). In particular, on
-+ * NCQ-capable devices, this function tries to return false, so as to
-+ * help keep the drives' internal queues full, whenever this helps the
-+ * device boost the throughput without causing any service-guarantee
-+ * issue.
-+ *
-+ * In more detail, the return value of this function is obtained by,
-+ * first, computing a number of boolean variables that take into
-+ * account throughput and service-guarantee issues, and, then,
-+ * combining these variables in a logical expression. Most of the
-+ * issues taken into account are not trivial. We discuss these issues
-+ * while introducing the variables.
-+ */
-+static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
-+{
-+	struct bfq_data *bfqd = bfqq->bfqd;
-+	bool idling_boosts_thr, idling_boosts_thr_without_issues,
-+		all_queues_seeky, on_hdd_and_not_all_queues_seeky,
-+		idling_needed_for_service_guarantees,
-+		asymmetric_scenario;
-+
-+	/*
-+	 * The next variable takes into account the cases where idling
-+	 * boosts the throughput.
-+	 *
-+	 * The value of the variable is computed considering, first, that
-+	 * idling is virtually always beneficial for the throughput if:
-+	 * (a) the device is not NCQ-capable, or
-+	 * (b) regardless of the presence of NCQ, the device is rotational
-+	 *     and the request pattern for bfqq is I/O-bound and sequential.
-+	 *
-+	 * Secondly, and in contrast to the above item (b), idling an
-+	 * NCQ-capable flash-based device would not boost the
-+	 * throughput even with sequential I/O; rather it would lower
-+	 * the throughput in proportion to how fast the device
-+	 * is. Accordingly, the next variable is true if any of the
-+	 * above conditions (a) and (b) is true, and, in particular,
-+	 * happens to be false if bfqd is an NCQ-capable flash-based
-+	 * device.
-+	 */
-+	idling_boosts_thr = !bfqd->hw_tag ||
-+		(!blk_queue_nonrot(bfqd->queue) && bfq_bfqq_IO_bound(bfqq) &&
-+		 bfq_bfqq_idle_window(bfqq));
-+
-+	/*
-+	 * The value of the next variable,
-+	 * idling_boosts_thr_without_issues, is equal to that of
-+	 * idling_boosts_thr, unless a special case holds. In this
-+	 * special case, described below, idling may cause problems to
-+	 * weight-raised queues.
-+	 *
-+	 * When the request pool is saturated (e.g., in the presence
-+	 * of write hogs), if the processes associated with
-+	 * non-weight-raised queues ask for requests at a lower rate,
-+	 * then processes associated with weight-raised queues have a
-+	 * higher probability to get a request from the pool
-+	 * immediately (or at least soon) when they need one. Thus
-+	 * they have a higher probability to actually get a fraction
-+	 * of the device throughput proportional to their high
-+	 * weight. This is especially true with NCQ-capable drives,
-+	 * which enqueue several requests in advance, and further
-+	 * reorder internally-queued requests.
-+	 *
-+	 * For this reason, we force to false the value of
-+	 * idling_boosts_thr_without_issues if there are weight-raised
-+	 * busy queues. In this case, and if bfqq is not weight-raised,
-+	 * this guarantees that the device is not idled for bfqq (if,
-+	 * instead, bfqq is weight-raised, then idling will be
-+	 * guaranteed by another variable, see below). Combined with
-+	 * the timestamping rules of BFQ (see [1] for details), this
-+	 * behavior causes bfqq, and hence any sync non-weight-raised
-+	 * queue, to get a lower number of requests served, and thus
-+	 * to ask for a lower number of requests from the request
-+	 * pool, before the busy weight-raised queues get served
-+	 * again. This often mitigates starvation problems in the
-+	 * presence of heavy write workloads and NCQ, thereby
-+	 * guaranteeing a higher application and system responsiveness
-+	 * in these hostile scenarios.
-+	 */
-+	idling_boosts_thr_without_issues = idling_boosts_thr &&
-+		bfqd->wr_busy_queues == 0;
-+
-+	/*
-+	 * There are then two cases where idling must be performed not
-+	 * for throughput concerns, but to preserve service
-+	 * guarantees. In the description of these cases, we say, for
-+	 * short, that a queue is sequential/random if the process
-+	 * associated to the queue issues sequential/random requests
-+	 * (in the second case the queue may be tagged as seeky or
-+	 * even constantly_seeky).
-+	 *
-+	 * To introduce the first case, we note that, since
-+	 * bfq_bfqq_idle_window(bfqq) is false if the device is
-+	 * NCQ-capable and bfqq is random (see
-+	 * bfq_update_idle_window()), then, from the above two
-+	 * assignments it follows that
-+	 * idling_boosts_thr_without_issues is false if the device is
-+	 * NCQ-capable and bfqq is random. Therefore, for this case,
-+	 * device idling would never be allowed if we used just
-+	 * idling_boosts_thr_without_issues to decide whether to allow
-+	 * it. And, beneficially, this would imply that throughput
-+	 * would always be boosted also with random I/O on NCQ-capable
-+	 * HDDs.
-+	 *
-+	 * But we must be careful on this point, to avoid an unfair
-+	 * treatment for bfqq. In fact, because of the same above
-+	 * assignments, idling_boosts_thr_without_issues is, on the
-+	 * other hand, true if 1) the device is an HDD and bfqq is
-+	 * sequential, and 2) there are no busy weight-raised
-+	 * queues. As a consequence, if we used just
-+	 * idling_boosts_thr_without_issues to decide whether to idle
-+	 * the device, then with an HDD we might easily bump into a
-+	 * scenario where queues that are sequential and I/O-bound
-+	 * would enjoy idling, whereas random queues would not. The
-+	 * latter might then get a low share of the device throughput,
-+	 * simply because the former would get many requests served
-+	 * after being set as in service, while the latter would not.
-+	 *
-+	 * To address this issue, we start by setting to true a
-+	 * sentinel variable, on_hdd_and_not_all_queues_seeky, if the
-+	 * device is rotational and not all queues with pending or
-+	 * in-flight requests are constantly seeky (i.e., there are
-+	 * active sequential queues, and bfqq might then be mistreated
-+	 * if it does not enjoy idling because it is random).
-+	 */
-+	all_queues_seeky = bfq_bfqq_constantly_seeky(bfqq) &&
-+			   bfqd->busy_in_flight_queues ==
-+			   bfqd->const_seeky_busy_in_flight_queues;
-+
-+	on_hdd_and_not_all_queues_seeky =
-+		!blk_queue_nonrot(bfqd->queue) && !all_queues_seeky;
-+
-+	/*
-+	 * To introduce the second case where idling needs to be
-+	 * performed to preserve service guarantees, we can note that
-+	 * allowing the drive to enqueue more than one request at a
-+	 * time, and hence delegating de facto final scheduling
-+	 * decisions to the drive's internal scheduler, causes loss of
-+	 * control on the actual request service order. In particular,
-+	 * the critical situation is when requests from different
-+	 * processes happen to be present, at the same time, in the
-+	 * internal queue(s) of the drive. In such a situation, the
-+	 * drive, by deciding the service order of the
-+	 * internally-queued requests, does determine also the actual
-+	 * throughput distribution among these processes. But the
-+	 * drive typically has no notion or concern about per-process
-+	 * throughput distribution, and makes its decisions only on a
-+	 * per-request basis. Therefore, the service distribution
-+	 * enforced by the drive's internal scheduler is likely to
-+	 * coincide with the desired device-throughput distribution
-+	 * only in a completely symmetric scenario where:
-+	 * (i)  each of these processes must get the same throughput as
-+	 *      the others;
-+	 * (ii) all these processes have the same I/O pattern
-+	 *      (either sequential or random).
-+	 * In fact, in such a scenario, the drive will tend to treat
-+	 * the requests of each of these processes in about the same
-+	 * way as the requests of the others, and thus to provide
-+	 * each of these processes with about the same throughput
-+	 * (which is exactly the desired throughput distribution). In
-+	 * contrast, in any asymmetric scenario, device idling is
-+	 * certainly needed to guarantee that bfqq receives its
-+	 * assigned fraction of the device throughput (see [1] for
-+	 * details).
-+	 *
-+	 * We address this issue by checking, actually, only the
-+	 * symmetry sub-condition (i), i.e., provided that
-+	 * sub-condition (i) holds, idling is not performed,
-+	 * regardless of whether sub-condition (ii) holds. In other
-+	 * words, idling is allowed only if sub-condition (i) does not
-+	 * hold, and the device then tends to be prevented from queueing
-+	 * many requests, possibly of several processes. The reason
-+	 * for not also checking sub-condition (ii) is that, first,
-+	 * in the case of an HDD, the asymmetry in terms of types of
-+	 * I/O patterns is already taken into account in the above
-+	 * sentinel variable
-+	 * on_hdd_and_not_all_queues_seeky. Secondly, in the case of a
-+	 * flash-based device, we prefer however to privilege
-+	 * throughput (and idling lowers throughput for this type of
-+	 * devices), for the following reasons:
-+	 * 1) unlike with HDDs, the service time of random
-+	 *    requests is not orders of magnitude higher than the service
-+	 *    time of sequential requests; thus, even if processes doing
-+	 *    sequential I/O get a preferential treatment with respect to
-+	 *    others doing random I/O, the consequences are not as
-+	 *    dramatic as with HDDs;
-+	 * 2) if a process doing random I/O does need strong
-+	 *    throughput guarantees, it is hopefully already being
-+	 *    weight-raised, or the user is likely to have assigned it a
-+	 *    higher weight than the other processes (and thus
-+	 *    sub-condition (i) is likely to be false, which triggers
-+	 *    idling).
-+	 *
-+	 * According to the above considerations, the next variable is
-+	 * true (only) if sub-condition (i) does not hold. To compute the
-+	 * value of this variable, we not only use the return value of
-+	 * the function bfq_symmetric_scenario(), but also check
-+	 * whether bfqq is being weight-raised, because
-+	 * bfq_symmetric_scenario() does not take into account also
-+	 * weight-raised queues (see comments to
-+	 * bfq_weights_tree_add()).
-+	 *
-+	 * As a side note, it is worth considering that the above
-+	 * device-idling countermeasures may however fail in the
-+	 * following unlucky scenario: if idling is (correctly)
-+	 * disabled in a time period during which all symmetry
-+	 * sub-conditions hold, and hence the device is allowed to
-+	 * enqueue many requests, but at some later point in time some
-+	 * sub-condition ceases to hold, then it may become impossible
-+	 * to let requests be served in the desired order until all
-+	 * the requests already queued in the device have been served.
-+	 */
-+	asymmetric_scenario = bfqq->wr_coeff > 1 ||
-+		!bfq_symmetric_scenario(bfqd);
-+
-+	/*
-+	 * Finally, there is a case where maximizing throughput is the
-+	 * best choice even if it may cause unfairness toward
-+	 * bfqq. Such a case is when bfqq became active in a burst of
-+	 * queue activations. Queues that became active during a large
-+	 * burst benefit only from throughput, as discussed in the
-+	 * comments to bfq_handle_burst. Thus, if bfqq became active
-+	 * in a burst and not idling the device maximizes throughput,
-+	 * then the device must not be idled, because not idling the
-+	 * device provides bfqq and all other queues in the burst with
-+	 * maximum benefit. Combining this and the two cases above, we
-+	 * can now establish when idling is actually needed to
-+	 * preserve service guarantees.
-+	 */
-+	idling_needed_for_service_guarantees =
-+		(on_hdd_and_not_all_queues_seeky || asymmetric_scenario) &&
-+		!bfq_bfqq_in_large_burst(bfqq);
-+
-+	/*
-+	 * We have now all the components we need to compute the return
-+	 * value of the function, which is true only if both the following
-+	 * conditions hold:
-+	 * 1) bfqq is sync, because idling makes sense only for sync queues;
-+	 * 2) idling either boosts the throughput (without issues), or
-+	 *    is necessary to preserve service guarantees.
-+	 */
-+	return bfq_bfqq_sync(bfqq) &&
-+		(idling_boosts_thr_without_issues ||
-+		 idling_needed_for_service_guarantees);
-+}
-+
-+/*
-+ * If the in-service queue is empty but the function bfq_bfqq_may_idle
-+ * returns true, then:
-+ * 1) the queue must remain in service and cannot be expired, and
-+ * 2) the device must be idled to wait for the possible arrival of a new
-+ *    request for the queue.
-+ * See the comments to the function bfq_bfqq_may_idle for the reasons
-+ * why performing device idling is the best choice to boost the throughput
-+ * and preserve service guarantees when bfq_bfqq_may_idle itself
-+ * returns true.
-+ */
-+static bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
-+{
-+	struct bfq_data *bfqd = bfqq->bfqd;
-+
-+	return RB_EMPTY_ROOT(&bfqq->sort_list) && bfqd->bfq_slice_idle != 0 &&
-+	       bfq_bfqq_may_idle(bfqq);
-+}
-+
-+/*
-+ * Select a queue for service.  If we have a current queue in service,
-+ * check whether to continue servicing it, or retrieve and set a new one.
-+ */
-+static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
-+{
-+	struct bfq_queue *bfqq;
-+	struct request *next_rq;
-+	enum bfqq_expiration reason = BFQ_BFQQ_BUDGET_TIMEOUT;
-+
-+	bfqq = bfqd->in_service_queue;
-+	if (!bfqq)
-+		goto new_queue;
-+
-+	bfq_log_bfqq(bfqd, bfqq, "select_queue: already in-service queue");
-+
-+	if (bfq_may_expire_for_budg_timeout(bfqq) &&
-+	    !timer_pending(&bfqd->idle_slice_timer) &&
-+	    !bfq_bfqq_must_idle(bfqq))
-+		goto expire;
-+
-+	next_rq = bfqq->next_rq;
-+	/*
-+	 * If bfqq has requests queued and it has enough budget left to
-+	 * serve them, keep the queue, otherwise expire it.
-+	 */
-+	if (next_rq) {
-+		if (bfq_serv_to_charge(next_rq, bfqq) >
-+			bfq_bfqq_budget_left(bfqq)) {
-+			reason = BFQ_BFQQ_BUDGET_EXHAUSTED;
-+			goto expire;
-+		} else {
-+			/*
-+			 * The idle timer may be pending because we may
-+			 * not disable disk idling even when a new request
-+			 * arrives.
-+			 */
-+			if (timer_pending(&bfqd->idle_slice_timer)) {
-+				/*
-+				 * If we get here: 1) at least a new request
-+				 * has arrived but we have not disabled the
-+				 * timer because the request was too small,
-+				 * 2) then the block layer has unplugged
-+				 * the device, causing the dispatch to be
-+				 * invoked.
-+				 *
-+				 * Since the device is unplugged, now the
-+				 * requests are probably large enough to
-+				 * provide a reasonable throughput.
-+				 * So we disable idling.
-+				 */
-+				bfq_clear_bfqq_wait_request(bfqq);
-+				del_timer(&bfqd->idle_slice_timer);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+				bfqg_stats_update_idle_time(bfqq_group(bfqq));
-+#endif
-+			}
-+			goto keep_queue;
-+		}
-+	}
-+
-+	/*
-+	 * No requests pending. However, if the in-service queue is idling
-+	 * for a new request, or has requests waiting for a completion and
-+	 * may idle after their completion, then keep it anyway.
-+	 */
-+	if (timer_pending(&bfqd->idle_slice_timer) ||
-+	    (bfqq->dispatched != 0 && bfq_bfqq_may_idle(bfqq))) {
-+		bfqq = NULL;
-+		goto keep_queue;
-+	}
-+
-+	reason = BFQ_BFQQ_NO_MORE_REQUESTS;
-+expire:
-+	bfq_bfqq_expire(bfqd, bfqq, false, reason);
-+new_queue:
-+	bfqq = bfq_set_in_service_queue(bfqd);
-+	bfq_log(bfqd, "select_queue: new queue %d returned",
-+		bfqq ? bfqq->pid : 0);
-+keep_queue:
-+	return bfqq;
-+}
-+
-+static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+	struct bfq_entity *entity = &bfqq->entity;
-+	if (bfqq->wr_coeff > 1) { /* queue is being weight-raised */
-+		bfq_log_bfqq(bfqd, bfqq,
-+			"raising period dur %u/%u msec, old coeff %u, w %d(%d)",
-+			jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
-+			jiffies_to_msecs(bfqq->wr_cur_max_time),
-+			bfqq->wr_coeff,
-+			bfqq->entity.weight, bfqq->entity.orig_weight);
-+
-+		BUG_ON(bfqq != bfqd->in_service_queue && entity->weight !=
-+		       entity->orig_weight * bfqq->wr_coeff);
-+		if (entity->prio_changed)
-+			bfq_log_bfqq(bfqd, bfqq, "WARN: pending prio change");
-+
-+		/*
-+		 * If the queue was activated in a burst, or
-+		 * too much time has elapsed from the beginning
-+		 * of this weight-raising period, then end weight
-+		 * raising.
-+		 */
-+		if (bfq_bfqq_in_large_burst(bfqq) ||
-+		    time_is_before_jiffies(bfqq->last_wr_start_finish +
-+					   bfqq->wr_cur_max_time)) {
-+			bfqq->last_wr_start_finish = jiffies;
-+			bfq_log_bfqq(bfqd, bfqq,
-+				     "wrais ending at %lu, rais_max_time %u",
-+				     bfqq->last_wr_start_finish,
-+				     jiffies_to_msecs(bfqq->wr_cur_max_time));
-+			bfq_bfqq_end_wr(bfqq);
-+		}
-+	}
-+	/* Update weight both if it must be raised and if it must be lowered */
-+	if ((entity->weight > entity->orig_weight) != (bfqq->wr_coeff > 1))
-+		__bfq_entity_update_weight_prio(
-+			bfq_entity_service_tree(entity),
-+			entity);
-+}
-+
-+/*
-+ * Dispatch one request from bfqq, moving it to the request queue
-+ * dispatch list.
-+ */
-+static int bfq_dispatch_request(struct bfq_data *bfqd,
-+				struct bfq_queue *bfqq)
-+{
-+	int dispatched = 0;
-+	struct request *rq;
-+	unsigned long service_to_charge;
-+
-+	BUG_ON(RB_EMPTY_ROOT(&bfqq->sort_list));
-+
-+	/* Follow expired path, else get first next available. */
-+	rq = bfq_check_fifo(bfqq);
-+	if (!rq)
-+		rq = bfqq->next_rq;
-+	service_to_charge = bfq_serv_to_charge(rq, bfqq);
-+
-+	if (service_to_charge > bfq_bfqq_budget_left(bfqq)) {
-+		/*
-+		 * This may happen if the next rq is chosen in fifo order
-+		 * instead of sector order. The budget is properly
-+		 * dimensioned to be always sufficient to serve the next
-+		 * request only if it is chosen in sector order. The reason
-+		 * is that it would be quite inefficient and little useful
-+		 * to always make sure that the budget is large enough to
-+		 * serve even the possible next rq in fifo order.
-+		 * In fact, requests are seldom served in fifo order.
-+		 *
-+		 * Expire the queue for budget exhaustion, and make sure
-+		 * that the next act_budget is enough to serve the next
-+		 * request, even if it comes from the fifo expired path.
-+		 */
-+		bfqq->next_rq = rq;
-+		/*
-+		 * Since this dispatch is failed, make sure that
-+		 * a new one will be performed
-+		 */
-+		if (!bfqd->rq_in_driver)
-+			bfq_schedule_dispatch(bfqd);
-+		goto expire;
-+	}
-+
-+	/* Finally, insert request into driver dispatch list. */
-+	bfq_bfqq_served(bfqq, service_to_charge);
-+	bfq_dispatch_insert(bfqd->queue, rq);
-+
-+	bfq_update_wr_data(bfqd, bfqq);
-+
-+	bfq_log_bfqq(bfqd, bfqq,
-+			"dispatched %u sec req (%llu), budg left %d",
-+			blk_rq_sectors(rq),
-+			(long long unsigned)blk_rq_pos(rq),
-+			bfq_bfqq_budget_left(bfqq));
-+
-+	dispatched++;
-+
-+	if (!bfqd->in_service_bic) {
-+		atomic_long_inc(&RQ_BIC(rq)->icq.ioc->refcount);
-+		bfqd->in_service_bic = RQ_BIC(rq);
-+	}
-+
-+	if (bfqd->busy_queues > 1 && ((!bfq_bfqq_sync(bfqq) &&
-+	    dispatched >= bfqd->bfq_max_budget_async_rq) ||
-+	    bfq_class_idle(bfqq)))
-+		goto expire;
-+
-+	return dispatched;
-+
-+expire:
-+	bfq_bfqq_expire(bfqd, bfqq, false, BFQ_BFQQ_BUDGET_EXHAUSTED);
-+	return dispatched;
-+}
-+
-+static int __bfq_forced_dispatch_bfqq(struct bfq_queue *bfqq)
-+{
-+	int dispatched = 0;
-+
-+	while (bfqq->next_rq) {
-+		bfq_dispatch_insert(bfqq->bfqd->queue, bfqq->next_rq);
-+		dispatched++;
-+	}
-+
-+	BUG_ON(!list_empty(&bfqq->fifo));
-+	return dispatched;
-+}
-+
-+/*
-+ * Drain our current requests.
-+ * Used for barriers and when switching io schedulers on-the-fly.
-+ */
-+static int bfq_forced_dispatch(struct bfq_data *bfqd)
-+{
-+	struct bfq_queue *bfqq, *n;
-+	struct bfq_service_tree *st;
-+	int dispatched = 0;
-+
-+	bfqq = bfqd->in_service_queue;
-+	if (bfqq)
-+		__bfq_bfqq_expire(bfqd, bfqq);
-+
-+	/*
-+	 * Loop through classes, and be careful to leave the scheduler
-+	 * in a consistent state, as feedback mechanisms and vtime
-+	 * updates cannot be disabled during the process.
-+	 */
-+	list_for_each_entry_safe(bfqq, n, &bfqd->active_list, bfqq_list) {
-+		st = bfq_entity_service_tree(&bfqq->entity);
-+
-+		dispatched += __bfq_forced_dispatch_bfqq(bfqq);
-+		bfqq->max_budget = bfq_max_budget(bfqd);
-+
-+		bfq_forget_idle(st);
-+	}
-+
-+	BUG_ON(bfqd->busy_queues != 0);
-+
-+	return dispatched;
-+}
-+
-+static int bfq_dispatch_requests(struct request_queue *q, int force)
-+{
-+	struct bfq_data *bfqd = q->elevator->elevator_data;
-+	struct bfq_queue *bfqq;
-+	int max_dispatch;
-+
-+	bfq_log(bfqd, "dispatch requests: %d busy queues", bfqd->busy_queues);
-+	if (bfqd->busy_queues == 0)
-+		return 0;
-+
-+	if (unlikely(force))
-+		return bfq_forced_dispatch(bfqd);
-+
-+	bfqq = bfq_select_queue(bfqd);
-+	if (!bfqq)
-+		return 0;
-+
-+	if (bfq_class_idle(bfqq))
-+		max_dispatch = 1;
-+
-+	if (!bfq_bfqq_sync(bfqq))
-+		max_dispatch = bfqd->bfq_max_budget_async_rq;
-+
-+	if (!bfq_bfqq_sync(bfqq) && bfqq->dispatched >= max_dispatch) {
-+		if (bfqd->busy_queues > 1)
-+			return 0;
-+		if (bfqq->dispatched >= 4 * max_dispatch)
-+			return 0;
-+	}
-+
-+	if (bfqd->sync_flight != 0 && !bfq_bfqq_sync(bfqq))
-+		return 0;
-+
-+	bfq_clear_bfqq_wait_request(bfqq);
-+	BUG_ON(timer_pending(&bfqd->idle_slice_timer));
-+
-+	if (!bfq_dispatch_request(bfqd, bfqq))
-+		return 0;
-+
-+	bfq_log_bfqq(bfqd, bfqq, "dispatched %s request",
-+			bfq_bfqq_sync(bfqq) ? "sync" : "async");
-+
-+	return 1;
-+}
-+
-+/*
-+ * Task holds one reference to the queue, dropped when task exits.  Each rq
-+ * in-flight on this queue also holds a reference, dropped when rq is freed.
-+ *
-+ * Queue lock must be held here.
-+ */
-+static void bfq_put_queue(struct bfq_queue *bfqq)
-+{
-+	struct bfq_data *bfqd = bfqq->bfqd;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	struct bfq_group *bfqg = bfqq_group(bfqq);
-+#endif
-+
-+	BUG_ON(atomic_read(&bfqq->ref) <= 0);
-+
-+	bfq_log_bfqq(bfqd, bfqq, "put_queue: %p %d", bfqq,
-+		     atomic_read(&bfqq->ref));
-+	if (!atomic_dec_and_test(&bfqq->ref))
-+		return;
-+
-+	BUG_ON(rb_first(&bfqq->sort_list));
-+	BUG_ON(bfqq->allocated[READ] + bfqq->allocated[WRITE] != 0);
-+	BUG_ON(bfqq->entity.tree);
-+	BUG_ON(bfq_bfqq_busy(bfqq));
-+	BUG_ON(bfqd->in_service_queue == bfqq);
-+
-+	if (bfq_bfqq_sync(bfqq))
-+		/*
-+		 * The fact that this queue is being destroyed does not
-+		 * invalidate the fact that this queue may have been
-+		 * activated during the current burst. As a consequence,
-+		 * although the queue does not exist anymore, and hence
-+		 * needs to be removed from the burst list if there,
-+		 * the burst size has not to be decremented.
-+		 */
-+		hlist_del_init(&bfqq->burst_list_node);
-+
-+	bfq_log_bfqq(bfqd, bfqq, "put_queue: %p freed", bfqq);
-+
-+	kmem_cache_free(bfq_pool, bfqq);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	bfqg_put(bfqg);
-+#endif
-+}
-+
-+static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+	if (bfqq == bfqd->in_service_queue) {
-+		__bfq_bfqq_expire(bfqd, bfqq);
-+		bfq_schedule_dispatch(bfqd);
-+	}
-+
-+	bfq_log_bfqq(bfqd, bfqq, "exit_bfqq: %p, %d", bfqq,
-+		     atomic_read(&bfqq->ref));
-+
-+	bfq_put_queue(bfqq);
-+}
-+
-+static void bfq_init_icq(struct io_cq *icq)
-+{
-+	struct bfq_io_cq *bic = icq_to_bic(icq);
-+
-+	bic->ttime.last_end_request = jiffies;
-+}
-+
-+static void bfq_exit_icq(struct io_cq *icq)
-+{
-+	struct bfq_io_cq *bic = icq_to_bic(icq);
-+	struct bfq_data *bfqd = bic_to_bfqd(bic);
-+
-+	if (bic->bfqq[BLK_RW_ASYNC]) {
-+		bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_ASYNC]);
-+		bic->bfqq[BLK_RW_ASYNC] = NULL;
-+	}
-+
-+	if (bic->bfqq[BLK_RW_SYNC]) {
-+		bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
-+		bic->bfqq[BLK_RW_SYNC] = NULL;
-+	}
-+}
-+
-+/*
-+ * Update the entity prio values; note that the new values will not
-+ * be used until the next (re)activation.
-+ */
-+static void bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
-+{
-+	struct task_struct *tsk = current;
-+	int ioprio_class;
-+
-+	ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
-+	switch (ioprio_class) {
-+	default:
-+		dev_err(bfqq->bfqd->queue->backing_dev_info.dev,
-+			"bfq: bad prio class %d\n", ioprio_class);
-+	case IOPRIO_CLASS_NONE:
-+		/*
-+		 * No prio set, inherit CPU scheduling settings.
-+		 */
-+		bfqq->new_ioprio = task_nice_ioprio(tsk);
-+		bfqq->new_ioprio_class = task_nice_ioclass(tsk);
-+		break;
-+	case IOPRIO_CLASS_RT:
-+		bfqq->new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
-+		bfqq->new_ioprio_class = IOPRIO_CLASS_RT;
-+		break;
-+	case IOPRIO_CLASS_BE:
-+		bfqq->new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
-+		bfqq->new_ioprio_class = IOPRIO_CLASS_BE;
-+		break;
-+	case IOPRIO_CLASS_IDLE:
-+		bfqq->new_ioprio_class = IOPRIO_CLASS_IDLE;
-+		bfqq->new_ioprio = 7;
-+		bfq_clear_bfqq_idle_window(bfqq);
-+		break;
-+	}
-+
-+	if (bfqq->new_ioprio < 0 || bfqq->new_ioprio >= IOPRIO_BE_NR) {
-+		printk(KERN_CRIT "bfq_set_next_ioprio_data: new_ioprio %d\n",
-+				 bfqq->new_ioprio);
-+		BUG();
-+	}
-+
-+	bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->new_ioprio);
-+	bfqq->entity.prio_changed = 1;
-+}
-+
-+static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio)
-+{
-+	struct bfq_data *bfqd;
-+	struct bfq_queue *bfqq, *new_bfqq;
-+	unsigned long uninitialized_var(flags);
-+	int ioprio = bic->icq.ioc->ioprio;
-+
-+	bfqd = bfq_get_bfqd_locked(&(bic->icq.q->elevator->elevator_data),
-+				   &flags);
-+	/*
-+	 * This condition may trigger on a newly created bic, be sure to
-+	 * drop the lock before returning.
-+	 */
-+	if (unlikely(!bfqd) || likely(bic->ioprio == ioprio))
-+		goto out;
-+
-+	bic->ioprio = ioprio;
-+
-+	bfqq = bic->bfqq[BLK_RW_ASYNC];
-+	if (bfqq) {
-+		new_bfqq = bfq_get_queue(bfqd, bio, BLK_RW_ASYNC, bic,
-+					 GFP_ATOMIC);
-+		if (new_bfqq) {
-+			bic->bfqq[BLK_RW_ASYNC] = new_bfqq;
-+			bfq_log_bfqq(bfqd, bfqq,
-+				     "check_ioprio_change: bfqq %p %d",
-+				     bfqq, atomic_read(&bfqq->ref));
-+			bfq_put_queue(bfqq);
-+		}
-+	}
-+
-+	bfqq = bic->bfqq[BLK_RW_SYNC];
-+	if (bfqq)
-+		bfq_set_next_ioprio_data(bfqq, bic);
-+
-+out:
-+	bfq_put_bfqd_unlock(bfqd, &flags);
-+}
-+
-+static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+			  struct bfq_io_cq *bic, pid_t pid, int is_sync)
-+{
-+	RB_CLEAR_NODE(&bfqq->entity.rb_node);
-+	INIT_LIST_HEAD(&bfqq->fifo);
-+	INIT_HLIST_NODE(&bfqq->burst_list_node);
-+
-+	atomic_set(&bfqq->ref, 0);
-+	bfqq->bfqd = bfqd;
-+
-+	if (bic)
-+		bfq_set_next_ioprio_data(bfqq, bic);
-+
-+	if (is_sync) {
-+		if (!bfq_class_idle(bfqq))
-+			bfq_mark_bfqq_idle_window(bfqq);
-+		bfq_mark_bfqq_sync(bfqq);
-+	} else
-+		bfq_clear_bfqq_sync(bfqq);
-+	bfq_mark_bfqq_IO_bound(bfqq);
-+
-+	/* Tentative initial value to trade off between thr and lat */
-+	bfqq->max_budget = (2 * bfq_max_budget(bfqd)) / 3;
-+	bfqq->pid = pid;
-+
-+	bfqq->wr_coeff = 1;
-+	bfqq->last_wr_start_finish = 0;
-+	/*
-+	 * Set to the value for which bfqq will not be deemed as
-+	 * soft rt when it becomes backlogged.
-+	 */
-+	bfqq->soft_rt_next_start = bfq_infinity_from_now(jiffies);
-+}
-+
-+static struct bfq_queue *bfq_find_alloc_queue(struct bfq_data *bfqd,
-+					      struct bio *bio, int is_sync,
-+					      struct bfq_io_cq *bic,
-+					      gfp_t gfp_mask)
-+{
-+	struct bfq_group *bfqg;
-+	struct bfq_queue *bfqq, *new_bfqq = NULL;
-+	struct blkcg *blkcg;
-+
-+retry:
-+	rcu_read_lock();
-+
-+	blkcg = bio_blkcg(bio);
-+	bfqg = bfq_find_alloc_group(bfqd, blkcg);
-+	/* bic always exists here */
-+	bfqq = bic_to_bfqq(bic, is_sync);
-+
-+	/*
-+	 * Always try a new alloc if we fall back to the OOM bfqq
-+	 * originally, since it should just be a temporary situation.
-+	 */
-+	if (!bfqq || bfqq == &bfqd->oom_bfqq) {
-+		bfqq = NULL;
-+		if (new_bfqq) {
-+			bfqq = new_bfqq;
-+			new_bfqq = NULL;
-+		} else if (gfp_mask & __GFP_WAIT) {
-+			rcu_read_unlock();
-+			spin_unlock_irq(bfqd->queue->queue_lock);
-+			new_bfqq = kmem_cache_alloc_node(bfq_pool,
-+					gfp_mask | __GFP_ZERO,
-+					bfqd->queue->node);
-+			spin_lock_irq(bfqd->queue->queue_lock);
-+			if (new_bfqq)
-+				goto retry;
-+		} else {
-+			bfqq = kmem_cache_alloc_node(bfq_pool,
-+					gfp_mask | __GFP_ZERO,
-+					bfqd->queue->node);
-+		}
-+
-+		if (bfqq) {
-+			bfq_init_bfqq(bfqd, bfqq, bic, current->pid,
-+                                      is_sync);
-+			bfq_init_entity(&bfqq->entity, bfqg);
-+			bfq_log_bfqq(bfqd, bfqq, "allocated");
-+		} else {
-+			bfqq = &bfqd->oom_bfqq;
-+			bfq_log_bfqq(bfqd, bfqq, "using oom bfqq");
-+		}
-+	}
-+
-+	if (new_bfqq)
-+		kmem_cache_free(bfq_pool, new_bfqq);
-+
-+	rcu_read_unlock();
-+
-+	return bfqq;
-+}
-+
-+static struct bfq_queue **bfq_async_queue_prio(struct bfq_data *bfqd,
-+					       struct bfq_group *bfqg,
-+					       int ioprio_class, int ioprio)
-+{
-+	switch (ioprio_class) {
-+	case IOPRIO_CLASS_RT:
-+		return &bfqg->async_bfqq[0][ioprio];
-+	case IOPRIO_CLASS_NONE:
-+		ioprio = IOPRIO_NORM;
-+		/* fall through */
-+	case IOPRIO_CLASS_BE:
-+		return &bfqg->async_bfqq[1][ioprio];
-+	case IOPRIO_CLASS_IDLE:
-+		return &bfqg->async_idle_bfqq;
-+	default:
-+		BUG();
-+	}
-+}
-+
-+static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
-+				       struct bio *bio, int is_sync,
-+				       struct bfq_io_cq *bic, gfp_t gfp_mask)
-+{
-+	const int ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
-+	const int ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
-+	struct bfq_queue **async_bfqq = NULL;
-+	struct bfq_queue *bfqq = NULL;
-+
-+	if (!is_sync) {
-+		struct blkcg *blkcg;
-+		struct bfq_group *bfqg;
-+
-+		rcu_read_lock();
-+		blkcg = bio_blkcg(bio);
-+		rcu_read_unlock();
-+		bfqg = bfq_find_alloc_group(bfqd, blkcg);
-+		async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
-+						  ioprio);
-+		bfqq = *async_bfqq;
-+	}
-+
-+	if (!bfqq)
-+		bfqq = bfq_find_alloc_queue(bfqd, bio, is_sync, bic, gfp_mask);
-+
-+	/*
-+	 * Pin the queue now that it's allocated, scheduler exit will
-+	 * prune it.
-+	 */
-+	if (!is_sync && !(*async_bfqq)) {
-+		atomic_inc(&bfqq->ref);
-+		bfq_log_bfqq(bfqd, bfqq, "get_queue, bfqq not in async: %p, %d",
-+			     bfqq, atomic_read(&bfqq->ref));
-+		*async_bfqq = bfqq;
-+	}
-+
-+	atomic_inc(&bfqq->ref);
-+	bfq_log_bfqq(bfqd, bfqq, "get_queue, at end: %p, %d", bfqq,
-+		     atomic_read(&bfqq->ref));
-+	return bfqq;
-+}
-+
-+static void bfq_update_io_thinktime(struct bfq_data *bfqd,
-+				    struct bfq_io_cq *bic)
-+{
-+	unsigned long elapsed = jiffies - bic->ttime.last_end_request;
-+	unsigned long ttime = min(elapsed, 2UL * bfqd->bfq_slice_idle);
-+
-+	bic->ttime.ttime_samples = (7*bic->ttime.ttime_samples + 256) / 8;
-+	bic->ttime.ttime_total = (7*bic->ttime.ttime_total + 256*ttime) / 8;
-+	bic->ttime.ttime_mean = (bic->ttime.ttime_total + 128) /
-+				bic->ttime.ttime_samples;
-+}
-+
-+static void bfq_update_io_seektime(struct bfq_data *bfqd,
-+				   struct bfq_queue *bfqq,
-+				   struct request *rq)
-+{
-+	sector_t sdist;
-+	u64 total;
-+
-+	if (bfqq->last_request_pos < blk_rq_pos(rq))
-+		sdist = blk_rq_pos(rq) - bfqq->last_request_pos;
-+	else
-+		sdist = bfqq->last_request_pos - blk_rq_pos(rq);
-+
-+	/*
-+	 * Don't allow the seek distance to get too large from the
-+	 * odd fragment, pagein, etc.
-+	 */
-+	if (bfqq->seek_samples == 0) /* first request, not really a seek */
-+		sdist = 0;
-+	else if (bfqq->seek_samples <= 60) /* second & third seek */
-+		sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*1024);
-+	else
-+		sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*64);
-+
-+	bfqq->seek_samples = (7*bfqq->seek_samples + 256) / 8;
-+	bfqq->seek_total = (7*bfqq->seek_total + (u64)256*sdist) / 8;
-+	total = bfqq->seek_total + (bfqq->seek_samples/2);
-+	do_div(total, bfqq->seek_samples);
-+	bfqq->seek_mean = (sector_t)total;
-+
-+	bfq_log_bfqq(bfqd, bfqq, "dist=%llu mean=%llu", (u64)sdist,
-+			(u64)bfqq->seek_mean);
-+}
-+
-+/*
-+ * Disable idle window if the process thinks too long or seeks so much that
-+ * it doesn't matter.
-+ */
-+static void bfq_update_idle_window(struct bfq_data *bfqd,
-+				   struct bfq_queue *bfqq,
-+				   struct bfq_io_cq *bic)
-+{
-+	int enable_idle;
-+
-+	/* Don't idle for async or idle io prio class. */
-+	if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
-+		return;
-+
-+	enable_idle = bfq_bfqq_idle_window(bfqq);
-+
-+	if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
-+	    bfqd->bfq_slice_idle == 0 ||
-+		(bfqd->hw_tag && BFQQ_SEEKY(bfqq) &&
-+			bfqq->wr_coeff == 1))
-+		enable_idle = 0;
-+	else if (bfq_sample_valid(bic->ttime.ttime_samples)) {
-+		if (bic->ttime.ttime_mean > bfqd->bfq_slice_idle &&
-+			bfqq->wr_coeff == 1)
-+			enable_idle = 0;
-+		else
-+			enable_idle = 1;
-+	}
-+	bfq_log_bfqq(bfqd, bfqq, "update_idle_window: enable_idle %d",
-+		enable_idle);
-+
-+	if (enable_idle)
-+		bfq_mark_bfqq_idle_window(bfqq);
-+	else
-+		bfq_clear_bfqq_idle_window(bfqq);
-+}
-+
-+/*
-+ * Called when a new fs request (rq) is added to bfqq.  Check if there's
-+ * something we should do about it.
-+ */
-+static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+			    struct request *rq)
-+{
-+	struct bfq_io_cq *bic = RQ_BIC(rq);
-+
-+	if (rq->cmd_flags & REQ_META)
-+		bfqq->meta_pending++;
-+
-+	bfq_update_io_thinktime(bfqd, bic);
-+	bfq_update_io_seektime(bfqd, bfqq, rq);
-+	if (!BFQQ_SEEKY(bfqq) && bfq_bfqq_constantly_seeky(bfqq)) {
-+		bfq_clear_bfqq_constantly_seeky(bfqq);
-+		if (!blk_queue_nonrot(bfqd->queue)) {
-+			BUG_ON(!bfqd->const_seeky_busy_in_flight_queues);
-+			bfqd->const_seeky_busy_in_flight_queues--;
-+		}
-+	}
-+	if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
-+	    !BFQQ_SEEKY(bfqq))
-+		bfq_update_idle_window(bfqd, bfqq, bic);
-+
-+	bfq_log_bfqq(bfqd, bfqq,
-+		     "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
-+		     bfq_bfqq_idle_window(bfqq), BFQQ_SEEKY(bfqq),
-+		     (long long unsigned)bfqq->seek_mean);
-+
-+	bfqq->last_request_pos = blk_rq_pos(rq) + blk_rq_sectors(rq);
-+
-+	if (bfqq == bfqd->in_service_queue && bfq_bfqq_wait_request(bfqq)) {
-+		bool small_req = bfqq->queued[rq_is_sync(rq)] == 1 &&
-+				 blk_rq_sectors(rq) < 32;
-+		bool budget_timeout = bfq_bfqq_budget_timeout(bfqq);
-+
-+		/*
-+		 * There is just this request queued: if the request
-+		 * is small and the queue is not to be expired, then
-+		 * just exit.
-+		 *
-+		 * In this way, if the disk is being idled to wait for
-+		 * a new request from the in-service queue, we avoid
-+		 * unplugging the device and committing the disk to serve
-+		 * just a small request. On the contrary, we wait for
-+		 * the block layer to decide when to unplug the device:
-+		 * hopefully, new requests will be merged to this one
-+		 * quickly, then the device will be unplugged and
-+		 * larger requests will be dispatched.
-+		 */
-+		if (small_req && !budget_timeout)
-+			return;
-+
-+		/*
-+		 * A large enough request arrived, or the queue is to
-+		 * be expired: in both cases disk idling is to be
-+		 * stopped, so clear wait_request flag and reset
-+		 * timer.
-+		 */
-+		bfq_clear_bfqq_wait_request(bfqq);
-+		del_timer(&bfqd->idle_slice_timer);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+		bfqg_stats_update_idle_time(bfqq_group(bfqq));
-+#endif
-+
-+		/*
-+		 * The queue is not empty, because a new request just
-+		 * arrived. Hence we can safely expire the queue, in
-+		 * case of budget timeout, without risking that the
-+		 * timestamps of the queue are not updated correctly.
-+		 * See [1] for more details.
-+		 */
-+		if (budget_timeout)
-+			bfq_bfqq_expire(bfqd, bfqq, false,
-+					BFQ_BFQQ_BUDGET_TIMEOUT);
-+
-+		/*
-+		 * Let the request rip immediately, or let a new queue be
-+		 * selected if bfqq has just been expired.
-+		 */
-+		__blk_run_queue(bfqd->queue);
-+	}
-+}
-+
-+static void bfq_insert_request(struct request_queue *q, struct request *rq)
-+{
-+	struct bfq_data *bfqd = q->elevator->elevator_data;
-+	struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+
-+	assert_spin_locked(bfqd->queue->queue_lock);
-+
-+	bfq_add_request(rq);
-+
-+	rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
-+	list_add_tail(&rq->queuelist, &bfqq->fifo);
-+
-+	bfq_rq_enqueued(bfqd, bfqq, rq);
-+}
-+
-+static void bfq_update_hw_tag(struct bfq_data *bfqd)
-+{
-+	bfqd->max_rq_in_driver = max(bfqd->max_rq_in_driver,
-+				     bfqd->rq_in_driver);
-+
-+	if (bfqd->hw_tag == 1)
-+		return;
-+
-+	/*
-+	 * This sample is valid if the number of outstanding requests
-+	 * is large enough to allow a queueing behavior.  Note that the
-+	 * sum is not exact, as it's not taking into account deactivated
-+	 * requests.
-+	 */
-+	if (bfqd->rq_in_driver + bfqd->queued < BFQ_HW_QUEUE_THRESHOLD)
-+		return;
-+
-+	if (bfqd->hw_tag_samples++ < BFQ_HW_QUEUE_SAMPLES)
-+		return;
-+
-+	bfqd->hw_tag = bfqd->max_rq_in_driver > BFQ_HW_QUEUE_THRESHOLD;
-+	bfqd->max_rq_in_driver = 0;
-+	bfqd->hw_tag_samples = 0;
-+}
-+
-+static void bfq_completed_request(struct request_queue *q, struct request *rq)
-+{
-+	struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+	struct bfq_data *bfqd = bfqq->bfqd;
-+	bool sync = bfq_bfqq_sync(bfqq);
-+
-+	bfq_log_bfqq(bfqd, bfqq, "completed one req with %u sects left (%d)",
-+		     blk_rq_sectors(rq), sync);
-+
-+	bfq_update_hw_tag(bfqd);
-+
-+	BUG_ON(!bfqd->rq_in_driver);
-+	BUG_ON(!bfqq->dispatched);
-+	bfqd->rq_in_driver--;
-+	bfqq->dispatched--;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	bfqg_stats_update_completion(bfqq_group(bfqq),
-+				     rq_start_time_ns(rq),
-+				     rq_io_start_time_ns(rq), rq->cmd_flags);
-+#endif
-+
-+	if (!bfqq->dispatched && !bfq_bfqq_busy(bfqq)) {
-+		bfq_weights_tree_remove(bfqd, &bfqq->entity,
-+					&bfqd->queue_weights_tree);
-+		if (!blk_queue_nonrot(bfqd->queue)) {
-+			BUG_ON(!bfqd->busy_in_flight_queues);
-+			bfqd->busy_in_flight_queues--;
-+			if (bfq_bfqq_constantly_seeky(bfqq)) {
-+				BUG_ON(!bfqd->
-+					const_seeky_busy_in_flight_queues);
-+				bfqd->const_seeky_busy_in_flight_queues--;
-+			}
-+		}
-+	}
-+
-+	if (sync) {
-+		bfqd->sync_flight--;
-+		RQ_BIC(rq)->ttime.last_end_request = jiffies;
-+	}
-+
-+	/*
-+	 * If we are waiting to discover whether the request pattern of the
-+	 * task associated with the queue is actually isochronous, and
-+	 * both requisites for this condition to hold are satisfied, then
-+	 * compute soft_rt_next_start (see the comments to the function
-+	 * bfq_bfqq_softrt_next_start()).
-+	 */
-+	if (bfq_bfqq_softrt_update(bfqq) && bfqq->dispatched == 0 &&
-+	    RB_EMPTY_ROOT(&bfqq->sort_list))
-+		bfqq->soft_rt_next_start =
-+			bfq_bfqq_softrt_next_start(bfqd, bfqq);
-+
-+	/*
-+	 * If this is the in-service queue, check if it needs to be expired,
-+	 * or if we want to idle in case it has no pending requests.
-+	 */
-+	if (bfqd->in_service_queue == bfqq) {
-+		if (bfq_bfqq_budget_new(bfqq))
-+			bfq_set_budget_timeout(bfqd);
-+
-+		if (bfq_bfqq_must_idle(bfqq)) {
-+			bfq_arm_slice_timer(bfqd);
-+			goto out;
-+		} else if (bfq_may_expire_for_budg_timeout(bfqq))
-+			bfq_bfqq_expire(bfqd, bfqq, false,
-+					BFQ_BFQQ_BUDGET_TIMEOUT);
-+		else if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
-+			 (bfqq->dispatched == 0 ||
-+			  !bfq_bfqq_may_idle(bfqq)))
-+			bfq_bfqq_expire(bfqd, bfqq, false,
-+					BFQ_BFQQ_NO_MORE_REQUESTS);
-+	}
-+
-+	if (!bfqd->rq_in_driver)
-+		bfq_schedule_dispatch(bfqd);
-+
-+out:
-+	return;
-+}
-+
-+static int __bfq_may_queue(struct bfq_queue *bfqq)
-+{
-+	if (bfq_bfqq_wait_request(bfqq) && bfq_bfqq_must_alloc(bfqq)) {
-+		bfq_clear_bfqq_must_alloc(bfqq);
-+		return ELV_MQUEUE_MUST;
-+	}
-+
-+	return ELV_MQUEUE_MAY;
-+}
-+
-+static int bfq_may_queue(struct request_queue *q, int rw)
-+{
-+	struct bfq_data *bfqd = q->elevator->elevator_data;
-+	struct task_struct *tsk = current;
-+	struct bfq_io_cq *bic;
-+	struct bfq_queue *bfqq;
-+
-+	/*
-+	 * Don't force setup of a queue from here, as a call to may_queue
-+	 * does not necessarily imply that a request actually will be
-+	 * queued. So just lookup a possibly existing queue, or return
-+	 * 'may queue' if that fails.
-+	 */
-+	bic = bfq_bic_lookup(bfqd, tsk->io_context);
-+	if (!bic)
-+		return ELV_MQUEUE_MAY;
-+
-+	bfqq = bic_to_bfqq(bic, rw_is_sync(rw));
-+	if (bfqq)
-+		return __bfq_may_queue(bfqq);
-+
-+	return ELV_MQUEUE_MAY;
-+}
-+
-+/*
-+ * Queue lock held here.
-+ */
-+static void bfq_put_request(struct request *rq)
-+{
-+	struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+
-+	if (bfqq) {
-+		const int rw = rq_data_dir(rq);
-+
-+		BUG_ON(!bfqq->allocated[rw]);
-+		bfqq->allocated[rw]--;
-+
-+		rq->elv.priv[0] = NULL;
-+		rq->elv.priv[1] = NULL;
-+
-+		bfq_log_bfqq(bfqq->bfqd, bfqq, "put_request %p, %d",
-+			     bfqq, atomic_read(&bfqq->ref));
-+		bfq_put_queue(bfqq);
-+	}
-+}
-+
-+/*
-+ * Allocate bfq data structures associated with this request.
-+ */
-+static int bfq_set_request(struct request_queue *q, struct request *rq,
-+			   struct bio *bio, gfp_t gfp_mask)
-+{
-+	struct bfq_data *bfqd = q->elevator->elevator_data;
-+	struct bfq_io_cq *bic = icq_to_bic(rq->elv.icq);
-+	const int rw = rq_data_dir(rq);
-+	const int is_sync = rq_is_sync(rq);
-+	struct bfq_queue *bfqq;
-+	unsigned long flags;
-+
-+	might_sleep_if(gfp_mask & __GFP_WAIT);
-+
-+	bfq_check_ioprio_change(bic, bio);
-+
-+	spin_lock_irqsave(q->queue_lock, flags);
-+
-+	if (!bic)
-+		goto queue_fail;
-+
-+	bfq_bic_update_cgroup(bic, bio);
-+
-+	bfqq = bic_to_bfqq(bic, is_sync);
-+	if (!bfqq || bfqq == &bfqd->oom_bfqq) {
-+		bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, gfp_mask);
-+		bic_set_bfqq(bic, bfqq, is_sync);
-+		if (is_sync) {
-+			if (bfqd->large_burst)
-+				bfq_mark_bfqq_in_large_burst(bfqq);
-+			else
-+				bfq_clear_bfqq_in_large_burst(bfqq);
-+		}
-+	}
-+
-+	bfqq->allocated[rw]++;
-+	atomic_inc(&bfqq->ref);
-+	bfq_log_bfqq(bfqd, bfqq, "set_request: bfqq %p, %d", bfqq,
-+		     atomic_read(&bfqq->ref));
-+
-+	rq->elv.priv[0] = bic;
-+	rq->elv.priv[1] = bfqq;
-+
-+	spin_unlock_irqrestore(q->queue_lock, flags);
-+
-+	return 0;
-+
-+queue_fail:
-+	bfq_schedule_dispatch(bfqd);
-+	spin_unlock_irqrestore(q->queue_lock, flags);
-+
-+	return 1;
-+}
-+
-+static void bfq_kick_queue(struct work_struct *work)
-+{
-+	struct bfq_data *bfqd =
-+		container_of(work, struct bfq_data, unplug_work);
-+	struct request_queue *q = bfqd->queue;
-+
-+	spin_lock_irq(q->queue_lock);
-+	__blk_run_queue(q);
-+	spin_unlock_irq(q->queue_lock);
-+}
-+
-+/*
-+ * Handler of the expiration of the timer running if the in-service queue
-+ * is idling inside its time slice.
-+ */
-+static void bfq_idle_slice_timer(unsigned long data)
-+{
-+	struct bfq_data *bfqd = (struct bfq_data *)data;
-+	struct bfq_queue *bfqq;
-+	unsigned long flags;
-+	enum bfqq_expiration reason;
-+
-+	spin_lock_irqsave(bfqd->queue->queue_lock, flags);
-+
-+	bfqq = bfqd->in_service_queue;
-+	/*
-+	 * Theoretical race here: the in-service queue can be NULL or
-+	 * different from the queue that was idling if the timer handler
-+	 * spins on the queue_lock and a new request arrives for the
-+	 * current queue and there is a full dispatch cycle that changes
-+	 * the in-service queue.  This can hardly happen, but in the worst
-+	 * case we just expire a queue too early.
-+	 */
-+	if (bfqq) {
-+		bfq_log_bfqq(bfqd, bfqq, "slice_timer expired");
-+		if (bfq_bfqq_budget_timeout(bfqq))
-+			/*
-+			 * Also here the queue can be safely expired
-+			 * for budget timeout without wasting
-+			 * guarantees
-+			 */
-+			reason = BFQ_BFQQ_BUDGET_TIMEOUT;
-+		else if (bfqq->queued[0] == 0 && bfqq->queued[1] == 0)
-+			/*
-+			 * The queue may not be empty upon timer expiration,
-+			 * because we may not disable the timer when the
-+			 * first request of the in-service queue arrives
-+			 * during disk idling.
-+			 */
-+			reason = BFQ_BFQQ_TOO_IDLE;
-+		else
-+			goto schedule_dispatch;
-+
-+		bfq_bfqq_expire(bfqd, bfqq, true, reason);
-+	}
-+
-+schedule_dispatch:
-+	bfq_schedule_dispatch(bfqd);
-+
-+	spin_unlock_irqrestore(bfqd->queue->queue_lock, flags);
-+}
-+
-+static void bfq_shutdown_timer_wq(struct bfq_data *bfqd)
-+{
-+	del_timer_sync(&bfqd->idle_slice_timer);
-+	cancel_work_sync(&bfqd->unplug_work);
-+}
-+
-+static void __bfq_put_async_bfqq(struct bfq_data *bfqd,
-+					struct bfq_queue **bfqq_ptr)
-+{
-+	struct bfq_group *root_group = bfqd->root_group;
-+	struct bfq_queue *bfqq = *bfqq_ptr;
-+
-+	bfq_log(bfqd, "put_async_bfqq: %p", bfqq);
-+	if (bfqq) {
-+		bfq_bfqq_move(bfqd, bfqq, &bfqq->entity, root_group);
-+		bfq_log_bfqq(bfqd, bfqq, "put_async_bfqq: putting %p, %d",
-+			     bfqq, atomic_read(&bfqq->ref));
-+		bfq_put_queue(bfqq);
-+		*bfqq_ptr = NULL;
-+	}
-+}
-+
-+/*
-+ * Release all the bfqg references to its async queues.  If we are
-+ * deallocating the group these queues may still contain requests, so
-+ * we reparent them to the root cgroup (i.e., the only one that will
-+ * exist for sure until all the requests on a device are gone).
-+ */
-+static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
-+{
-+	int i, j;
-+
-+	for (i = 0; i < 2; i++)
-+		for (j = 0; j < IOPRIO_BE_NR; j++)
-+			__bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
-+
-+	__bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
-+}
-+
-+static void bfq_exit_queue(struct elevator_queue *e)
-+{
-+	struct bfq_data *bfqd = e->elevator_data;
-+	struct request_queue *q = bfqd->queue;
-+	struct bfq_queue *bfqq, *n;
-+
-+	bfq_shutdown_timer_wq(bfqd);
-+
-+	spin_lock_irq(q->queue_lock);
-+
-+	BUG_ON(bfqd->in_service_queue);
-+	list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
-+		bfq_deactivate_bfqq(bfqd, bfqq, 0);
-+
-+	bfq_disconnect_groups(bfqd);
-+	spin_unlock_irq(q->queue_lock);
-+
-+	bfq_shutdown_timer_wq(bfqd);
-+
-+	synchronize_rcu();
-+
-+	BUG_ON(timer_pending(&bfqd->idle_slice_timer));
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	blkcg_deactivate_policy(q, &blkcg_policy_bfq);
-+#endif
-+
-+	kfree(bfqd);
-+}
-+
-+static void bfq_init_root_group(struct bfq_group *root_group,
-+				struct bfq_data *bfqd)
-+{
-+	int i;
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	root_group->entity.parent = NULL;
-+	root_group->my_entity = NULL;
-+	root_group->bfqd = bfqd;
-+#endif
-+	for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
-+		root_group->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
-+}
-+
-+static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
-+{
-+	struct bfq_data *bfqd;
-+	struct elevator_queue *eq;
-+
-+	eq = elevator_alloc(q, e);
-+	if (!eq)
-+		return -ENOMEM;
-+
-+	bfqd = kzalloc_node(sizeof(*bfqd), GFP_KERNEL, q->node);
-+	if (!bfqd) {
-+		kobject_put(&eq->kobj);
-+		return -ENOMEM;
-+	}
-+	eq->elevator_data = bfqd;
-+
-+	/*
-+	 * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.
-+	 * Grab a permanent reference to it, so that the normal code flow
-+	 * will not attempt to free it.
-+	 */
-+	bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0);
-+	atomic_inc(&bfqd->oom_bfqq.ref);
-+	bfqd->oom_bfqq.new_ioprio = BFQ_DEFAULT_QUEUE_IOPRIO;
-+	bfqd->oom_bfqq.new_ioprio_class = IOPRIO_CLASS_BE;
-+	bfqd->oom_bfqq.entity.new_weight =
-+		bfq_ioprio_to_weight(bfqd->oom_bfqq.new_ioprio);
-+	/*
-+	 * Trigger weight initialization, according to ioprio, at the
-+	 * oom_bfqq's first activation. The oom_bfqq's ioprio and ioprio
-+	 * class won't be changed any more.
-+	 */
-+	bfqd->oom_bfqq.entity.prio_changed = 1;
-+
-+	bfqd->queue = q;
-+
-+	spin_lock_irq(q->queue_lock);
-+	q->elevator = eq;
-+	spin_unlock_irq(q->queue_lock);
-+
-+	bfqd->root_group = bfq_create_group_hierarchy(bfqd, q->node);
-+	if (!bfqd->root_group)
-+		goto out_free;
-+	bfq_init_root_group(bfqd->root_group, bfqd);
-+	bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	bfqd->active_numerous_groups = 0;
-+#endif
-+
-+	init_timer(&bfqd->idle_slice_timer);
-+	bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
-+	bfqd->idle_slice_timer.data = (unsigned long)bfqd;
-+
-+	bfqd->queue_weights_tree = RB_ROOT;
-+	bfqd->group_weights_tree = RB_ROOT;
-+
-+	INIT_WORK(&bfqd->unplug_work, bfq_kick_queue);
-+
-+	INIT_LIST_HEAD(&bfqd->active_list);
-+	INIT_LIST_HEAD(&bfqd->idle_list);
-+	INIT_HLIST_HEAD(&bfqd->burst_list);
-+
-+	bfqd->hw_tag = -1;
-+
-+	bfqd->bfq_max_budget = bfq_default_max_budget;
-+
-+	bfqd->bfq_fifo_expire[0] = bfq_fifo_expire[0];
-+	bfqd->bfq_fifo_expire[1] = bfq_fifo_expire[1];
-+	bfqd->bfq_back_max = bfq_back_max;
-+	bfqd->bfq_back_penalty = bfq_back_penalty;
-+	bfqd->bfq_slice_idle = bfq_slice_idle;
-+	bfqd->bfq_class_idle_last_service = 0;
-+	bfqd->bfq_max_budget_async_rq = bfq_max_budget_async_rq;
-+	bfqd->bfq_timeout[BLK_RW_ASYNC] = bfq_timeout_async;
-+	bfqd->bfq_timeout[BLK_RW_SYNC] = bfq_timeout_sync;
-+
-+	bfqd->bfq_requests_within_timer = 120;
-+
-+	bfqd->bfq_large_burst_thresh = 11;
-+	bfqd->bfq_burst_interval = msecs_to_jiffies(500);
-+
-+	bfqd->low_latency = true;
-+
-+	bfqd->bfq_wr_coeff = 20;
-+	bfqd->bfq_wr_rt_max_time = msecs_to_jiffies(300);
-+	bfqd->bfq_wr_max_time = 0;
-+	bfqd->bfq_wr_min_idle_time = msecs_to_jiffies(2000);
-+	bfqd->bfq_wr_min_inter_arr_async = msecs_to_jiffies(500);
-+	bfqd->bfq_wr_max_softrt_rate = 7000; /*
-+					      * Approximate rate required
-+					      * to playback or record a
-+					      * high-definition compressed
-+					      * video.
-+					      */
-+	bfqd->wr_busy_queues = 0;
-+	bfqd->busy_in_flight_queues = 0;
-+	bfqd->const_seeky_busy_in_flight_queues = 0;
-+
-+	/*
-+	 * Begin by assuming, optimistically, that the device peak rate is
-+	 * equal to the highest reference rate.
-+	 */
-+	bfqd->RT_prod = R_fast[blk_queue_nonrot(bfqd->queue)] *
-+			T_fast[blk_queue_nonrot(bfqd->queue)];
-+	bfqd->peak_rate = R_fast[blk_queue_nonrot(bfqd->queue)];
-+	bfqd->device_speed = BFQ_BFQD_FAST;
-+
-+	return 0;
-+
-+out_free:
-+	kfree(bfqd);
-+	kobject_put(&eq->kobj);
-+	return -ENOMEM;
-+}
-+
-+static void bfq_slab_kill(void)
-+{
-+	if (bfq_pool)
-+		kmem_cache_destroy(bfq_pool);
-+}
-+
-+static int __init bfq_slab_setup(void)
-+{
-+	bfq_pool = KMEM_CACHE(bfq_queue, 0);
-+	if (!bfq_pool)
-+		return -ENOMEM;
-+	return 0;
-+}
-+
-+static ssize_t bfq_var_show(unsigned int var, char *page)
-+{
-+	return sprintf(page, "%d\n", var);
-+}
-+
-+static ssize_t bfq_var_store(unsigned long *var, const char *page,
-+			     size_t count)
-+{
-+	unsigned long new_val;
-+	int ret = kstrtoul(page, 10, &new_val);
-+
-+	if (ret == 0)
-+		*var = new_val;
-+
-+	return count;
-+}
-+
-+static ssize_t bfq_wr_max_time_show(struct elevator_queue *e, char *page)
-+{
-+	struct bfq_data *bfqd = e->elevator_data;
-+	return sprintf(page, "%d\n", bfqd->bfq_wr_max_time > 0 ?
-+		       jiffies_to_msecs(bfqd->bfq_wr_max_time) :
-+		       jiffies_to_msecs(bfq_wr_duration(bfqd)));
-+}
-+
-+static ssize_t bfq_weights_show(struct elevator_queue *e, char *page)
-+{
-+	struct bfq_queue *bfqq;
-+	struct bfq_data *bfqd = e->elevator_data;
-+	ssize_t num_char = 0;
-+
-+	num_char += sprintf(page + num_char, "Tot reqs queued %d\n\n",
-+			    bfqd->queued);
-+
-+	spin_lock_irq(bfqd->queue->queue_lock);
-+
-+	num_char += sprintf(page + num_char, "Active:\n");
-+	list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list) {
-+	  num_char += sprintf(page + num_char,
-+			      "pid%d: weight %hu, nr_queued %d %d, dur %d/%u\n",
-+			      bfqq->pid,
-+			      bfqq->entity.weight,
-+			      bfqq->queued[0],
-+			      bfqq->queued[1],
-+			jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
-+			jiffies_to_msecs(bfqq->wr_cur_max_time));
-+	}
-+
-+	num_char += sprintf(page + num_char, "Idle:\n");
-+	list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list) {
-+			num_char += sprintf(page + num_char,
-+				"pid%d: weight %hu, dur %d/%u\n",
-+				bfqq->pid,
-+				bfqq->entity.weight,
-+				jiffies_to_msecs(jiffies -
-+					bfqq->last_wr_start_finish),
-+				jiffies_to_msecs(bfqq->wr_cur_max_time));
-+	}
-+
-+	spin_unlock_irq(bfqd->queue->queue_lock);
-+
-+	return num_char;
-+}
-+
-+#define SHOW_FUNCTION(__FUNC, __VAR, __CONV)				\
-+static ssize_t __FUNC(struct elevator_queue *e, char *page)		\
-+{									\
-+	struct bfq_data *bfqd = e->elevator_data;			\
-+	unsigned int __data = __VAR;					\
-+	if (__CONV)							\
-+		__data = jiffies_to_msecs(__data);			\
-+	return bfq_var_show(__data, (page));				\
-+}
-+SHOW_FUNCTION(bfq_fifo_expire_sync_show, bfqd->bfq_fifo_expire[1], 1);
-+SHOW_FUNCTION(bfq_fifo_expire_async_show, bfqd->bfq_fifo_expire[0], 1);
-+SHOW_FUNCTION(bfq_back_seek_max_show, bfqd->bfq_back_max, 0);
-+SHOW_FUNCTION(bfq_back_seek_penalty_show, bfqd->bfq_back_penalty, 0);
-+SHOW_FUNCTION(bfq_slice_idle_show, bfqd->bfq_slice_idle, 1);
-+SHOW_FUNCTION(bfq_max_budget_show, bfqd->bfq_user_max_budget, 0);
-+SHOW_FUNCTION(bfq_max_budget_async_rq_show,
-+	      bfqd->bfq_max_budget_async_rq, 0);
-+SHOW_FUNCTION(bfq_timeout_sync_show, bfqd->bfq_timeout[BLK_RW_SYNC], 1);
-+SHOW_FUNCTION(bfq_timeout_async_show, bfqd->bfq_timeout[BLK_RW_ASYNC], 1);
-+SHOW_FUNCTION(bfq_low_latency_show, bfqd->low_latency, 0);
-+SHOW_FUNCTION(bfq_wr_coeff_show, bfqd->bfq_wr_coeff, 0);
-+SHOW_FUNCTION(bfq_wr_rt_max_time_show, bfqd->bfq_wr_rt_max_time, 1);
-+SHOW_FUNCTION(bfq_wr_min_idle_time_show, bfqd->bfq_wr_min_idle_time, 1);
-+SHOW_FUNCTION(bfq_wr_min_inter_arr_async_show, bfqd->bfq_wr_min_inter_arr_async,
-+	1);
-+SHOW_FUNCTION(bfq_wr_max_softrt_rate_show, bfqd->bfq_wr_max_softrt_rate, 0);
-+#undef SHOW_FUNCTION
-+
-+#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV)			\
-+static ssize_t								\
-+__FUNC(struct elevator_queue *e, const char *page, size_t count)	\
-+{									\
-+	struct bfq_data *bfqd = e->elevator_data;			\
-+	unsigned long uninitialized_var(__data);			\
-+	int ret = bfq_var_store(&__data, (page), count);		\
-+	if (__data < (MIN))						\
-+		__data = (MIN);						\
-+	else if (__data > (MAX))					\
-+		__data = (MAX);						\
-+	if (__CONV)							\
-+		*(__PTR) = msecs_to_jiffies(__data);			\
-+	else								\
-+		*(__PTR) = __data;					\
-+	return ret;							\
-+}
-+STORE_FUNCTION(bfq_fifo_expire_sync_store, &bfqd->bfq_fifo_expire[1], 1,
-+		INT_MAX, 1);
-+STORE_FUNCTION(bfq_fifo_expire_async_store, &bfqd->bfq_fifo_expire[0], 1,
-+		INT_MAX, 1);
-+STORE_FUNCTION(bfq_back_seek_max_store, &bfqd->bfq_back_max, 0, INT_MAX, 0);
-+STORE_FUNCTION(bfq_back_seek_penalty_store, &bfqd->bfq_back_penalty, 1,
-+		INT_MAX, 0);
-+STORE_FUNCTION(bfq_slice_idle_store, &bfqd->bfq_slice_idle, 0, INT_MAX, 1);
-+STORE_FUNCTION(bfq_max_budget_async_rq_store, &bfqd->bfq_max_budget_async_rq,
-+		1, INT_MAX, 0);
-+STORE_FUNCTION(bfq_timeout_async_store, &bfqd->bfq_timeout[BLK_RW_ASYNC], 0,
-+		INT_MAX, 1);
-+STORE_FUNCTION(bfq_wr_coeff_store, &bfqd->bfq_wr_coeff, 1, INT_MAX, 0);
-+STORE_FUNCTION(bfq_wr_max_time_store, &bfqd->bfq_wr_max_time, 0, INT_MAX, 1);
-+STORE_FUNCTION(bfq_wr_rt_max_time_store, &bfqd->bfq_wr_rt_max_time, 0, INT_MAX,
-+		1);
-+STORE_FUNCTION(bfq_wr_min_idle_time_store, &bfqd->bfq_wr_min_idle_time, 0,
-+		INT_MAX, 1);
-+STORE_FUNCTION(bfq_wr_min_inter_arr_async_store,
-+		&bfqd->bfq_wr_min_inter_arr_async, 0, INT_MAX, 1);
-+STORE_FUNCTION(bfq_wr_max_softrt_rate_store, &bfqd->bfq_wr_max_softrt_rate, 0,
-+		INT_MAX, 0);
-+#undef STORE_FUNCTION
-+
-+/* do nothing for the moment */
-+static ssize_t bfq_weights_store(struct elevator_queue *e,
-+				    const char *page, size_t count)
-+{
-+	return count;
-+}
-+
-+static unsigned long bfq_estimated_max_budget(struct bfq_data *bfqd)
-+{
-+	u64 timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
-+
-+	if (bfqd->peak_rate_samples >= BFQ_PEAK_RATE_SAMPLES)
-+		return bfq_calc_max_budget(bfqd->peak_rate, timeout);
-+	else
-+		return bfq_default_max_budget;
-+}
-+
-+static ssize_t bfq_max_budget_store(struct elevator_queue *e,
-+				    const char *page, size_t count)
-+{
-+	struct bfq_data *bfqd = e->elevator_data;
-+	unsigned long uninitialized_var(__data);
-+	int ret = bfq_var_store(&__data, (page), count);
-+
-+	if (__data == 0)
-+		bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
-+	else {
-+		if (__data > INT_MAX)
-+			__data = INT_MAX;
-+		bfqd->bfq_max_budget = __data;
-+	}
-+
-+	bfqd->bfq_user_max_budget = __data;
-+
-+	return ret;
-+}
-+
-+static ssize_t bfq_timeout_sync_store(struct elevator_queue *e,
-+				      const char *page, size_t count)
-+{
-+	struct bfq_data *bfqd = e->elevator_data;
-+	unsigned long uninitialized_var(__data);
-+	int ret = bfq_var_store(&__data, (page), count);
-+
-+	if (__data < 1)
-+		__data = 1;
-+	else if (__data > INT_MAX)
-+		__data = INT_MAX;
-+
-+	bfqd->bfq_timeout[BLK_RW_SYNC] = msecs_to_jiffies(__data);
-+	if (bfqd->bfq_user_max_budget == 0)
-+		bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
-+
-+	return ret;
-+}
-+
-+static ssize_t bfq_low_latency_store(struct elevator_queue *e,
-+				     const char *page, size_t count)
-+{
-+	struct bfq_data *bfqd = e->elevator_data;
-+	unsigned long uninitialized_var(__data);
-+	int ret = bfq_var_store(&__data, (page), count);
-+
-+	if (__data > 1)
-+		__data = 1;
-+	if (__data == 0 && bfqd->low_latency != 0)
-+		bfq_end_wr(bfqd);
-+	bfqd->low_latency = __data;
-+
-+	return ret;
-+}
-+
-+#define BFQ_ATTR(name) \
-+	__ATTR(name, S_IRUGO|S_IWUSR, bfq_##name##_show, bfq_##name##_store)
-+
-+static struct elv_fs_entry bfq_attrs[] = {
-+	BFQ_ATTR(fifo_expire_sync),
-+	BFQ_ATTR(fifo_expire_async),
-+	BFQ_ATTR(back_seek_max),
-+	BFQ_ATTR(back_seek_penalty),
-+	BFQ_ATTR(slice_idle),
-+	BFQ_ATTR(max_budget),
-+	BFQ_ATTR(max_budget_async_rq),
-+	BFQ_ATTR(timeout_sync),
-+	BFQ_ATTR(timeout_async),
-+	BFQ_ATTR(low_latency),
-+	BFQ_ATTR(wr_coeff),
-+	BFQ_ATTR(wr_max_time),
-+	BFQ_ATTR(wr_rt_max_time),
-+	BFQ_ATTR(wr_min_idle_time),
-+	BFQ_ATTR(wr_min_inter_arr_async),
-+	BFQ_ATTR(wr_max_softrt_rate),
-+	BFQ_ATTR(weights),
-+	__ATTR_NULL
-+};
-+
-+static struct elevator_type iosched_bfq = {
-+	.ops = {
-+		.elevator_merge_fn =		bfq_merge,
-+		.elevator_merged_fn =		bfq_merged_request,
-+		.elevator_merge_req_fn =	bfq_merged_requests,
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+		.elevator_bio_merged_fn =	bfq_bio_merged,
-+#endif
-+		.elevator_allow_merge_fn =	bfq_allow_merge,
-+		.elevator_dispatch_fn =		bfq_dispatch_requests,
-+		.elevator_add_req_fn =		bfq_insert_request,
-+		.elevator_activate_req_fn =	bfq_activate_request,
-+		.elevator_deactivate_req_fn =	bfq_deactivate_request,
-+		.elevator_completed_req_fn =	bfq_completed_request,
-+		.elevator_former_req_fn =	elv_rb_former_request,
-+		.elevator_latter_req_fn =	elv_rb_latter_request,
-+		.elevator_init_icq_fn =		bfq_init_icq,
-+		.elevator_exit_icq_fn =		bfq_exit_icq,
-+		.elevator_set_req_fn =		bfq_set_request,
-+		.elevator_put_req_fn =		bfq_put_request,
-+		.elevator_may_queue_fn =	bfq_may_queue,
-+		.elevator_init_fn =		bfq_init_queue,
-+		.elevator_exit_fn =		bfq_exit_queue,
-+	},
-+	.icq_size =		sizeof(struct bfq_io_cq),
-+	.icq_align =		__alignof__(struct bfq_io_cq),
-+	.elevator_attrs =	bfq_attrs,
-+	.elevator_name =	"bfq",
-+	.elevator_owner =	THIS_MODULE,
-+};
-+
-+static int __init bfq_init(void)
-+{
-+	int ret;
-+
-+	/*
-+	 * Can be 0 on HZ < 1000 setups.
-+	 */
-+	if (bfq_slice_idle == 0)
-+		bfq_slice_idle = 1;
-+
-+	if (bfq_timeout_async == 0)
-+		bfq_timeout_async = 1;
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	ret = blkcg_policy_register(&blkcg_policy_bfq);
-+	if (ret)
-+		return ret;
-+#endif
-+
-+	ret = -ENOMEM;
-+	if (bfq_slab_setup())
-+		goto err_pol_unreg;
-+
-+	/*
-+	 * Times to load large popular applications for the typical systems
-+	 * installed on the reference devices (see the comments before the
-+	 * definitions of the two arrays).
-+	 */
-+	T_slow[0] = msecs_to_jiffies(2600);
-+	T_slow[1] = msecs_to_jiffies(1000);
-+	T_fast[0] = msecs_to_jiffies(5500);
-+	T_fast[1] = msecs_to_jiffies(2000);
-+
-+	/*
-+	 * Thresholds that determine the switch between speed classes (see
-+	 * the comments before the definition of the array).
-+	 */
-+	device_speed_thresh[0] = (R_fast[0] + R_slow[0]) / 2;
-+	device_speed_thresh[1] = (R_fast[1] + R_slow[1]) / 2;
-+
-+	ret = elv_register(&iosched_bfq);
-+	if (ret)
-+		goto err_pol_unreg;
-+
-+	pr_info("BFQ I/O-scheduler: v7r9");
-+
-+	return 0;
-+
-+err_pol_unreg:
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	blkcg_policy_unregister(&blkcg_policy_bfq);
-+#endif
-+	return ret;
-+}
-+
-+static void __exit bfq_exit(void)
-+{
-+	elv_unregister(&iosched_bfq);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	blkcg_policy_unregister(&blkcg_policy_bfq);
-+#endif
-+	bfq_slab_kill();
-+}
-+
-+module_init(bfq_init);
-+module_exit(bfq_exit);
-+
-+MODULE_AUTHOR("Fabio Checconi, Paolo Valente");
-+MODULE_LICENSE("GPL");
-diff --git a/block/bfq-sched.c b/block/bfq-sched.c
-new file mode 100644
-index 0000000..9328a1f
---- /dev/null
-+++ b/block/bfq-sched.c
-@@ -0,0 +1,1197 @@
-+/*
-+ * BFQ: Hierarchical B-WF2Q+ scheduler.
-+ *
-+ * Based on ideas and code from CFQ:
-+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
-+ *
-+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
-+ *		      Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
-+ */
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+#define for_each_entity(entity)	\
-+	for (; entity ; entity = entity->parent)
-+
-+#define for_each_entity_safe(entity, parent) \
-+	for (; entity && ({ parent = entity->parent; 1; }); entity = parent)
-+
-+
-+static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
-+						 int extract,
-+						 struct bfq_data *bfqd);
-+
-+static struct bfq_group *bfqq_group(struct bfq_queue *bfqq);
-+
-+static void bfq_update_budget(struct bfq_entity *next_in_service)
-+{
-+	struct bfq_entity *bfqg_entity;
-+	struct bfq_group *bfqg;
-+	struct bfq_sched_data *group_sd;
-+
-+	BUG_ON(!next_in_service);
-+
-+	group_sd = next_in_service->sched_data;
-+
-+	bfqg = container_of(group_sd, struct bfq_group, sched_data);
-+	/*
-+	 * bfq_group's my_entity field is not NULL only if the group
-+	 * is not the root group. We must not touch the root entity
-+	 * as it must never become an in-service entity.
-+	 */
-+	bfqg_entity = bfqg->my_entity;
-+	if (bfqg_entity)
-+		bfqg_entity->budget = next_in_service->budget;
-+}
-+
-+static int bfq_update_next_in_service(struct bfq_sched_data *sd)
-+{
-+	struct bfq_entity *next_in_service;
-+
-+	if (sd->in_service_entity)
-+		/* will update/requeue at the end of service */
-+		return 0;
-+
-+	/*
-+	 * NOTE: this can be improved in many ways, such as returning
-+	 * 1 (and thus propagating upwards the update) only when the
-+	 * budget changes, or caching the bfqq that will be scheduled
-+	 * next from this subtree.  For now we worry more about
-+	 * correctness than about performance...
-+	 */
-+	next_in_service = bfq_lookup_next_entity(sd, 0, NULL);
-+	sd->next_in_service = next_in_service;
-+
-+	if (next_in_service)
-+		bfq_update_budget(next_in_service);
-+
-+	return 1;
-+}
-+
-+static void bfq_check_next_in_service(struct bfq_sched_data *sd,
-+				      struct bfq_entity *entity)
-+{
-+	BUG_ON(sd->next_in_service != entity);
-+}
-+#else
-+#define for_each_entity(entity)	\
-+	for (; entity ; entity = NULL)
-+
-+#define for_each_entity_safe(entity, parent) \
-+	for (parent = NULL; entity ; entity = parent)
-+
-+static int bfq_update_next_in_service(struct bfq_sched_data *sd)
-+{
-+	return 0;
-+}
-+
-+static void bfq_check_next_in_service(struct bfq_sched_data *sd,
-+				      struct bfq_entity *entity)
-+{
-+}
-+
-+static void bfq_update_budget(struct bfq_entity *next_in_service)
-+{
-+}
-+#endif
-+
-+/*
-+ * Shift for timestamp calculations.  This actually limits the maximum
-+ * service allowed in one timestamp delta (small shift values increase it),
-+ * the maximum total weight that can be used for the queues in the system
-+ * (big shift values increase it), and the period of virtual time
-+ * wraparounds.
-+ */
-+#define WFQ_SERVICE_SHIFT	22
-+
-+/**
-+ * bfq_gt - compare two timestamps.
-+ * @a: first ts.
-+ * @b: second ts.
-+ *
-+ * Return @a > @b, dealing with wrapping correctly.
-+ */
-+static int bfq_gt(u64 a, u64 b)
-+{
-+	return (s64)(a - b) > 0;
-+}
-+
-+static struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity)
-+{
-+	struct bfq_queue *bfqq = NULL;
-+
-+	BUG_ON(!entity);
-+
-+	if (!entity->my_sched_data)
-+		bfqq = container_of(entity, struct bfq_queue, entity);
-+
-+	return bfqq;
-+}
-+
-+
-+/**
-+ * bfq_delta - map service into the virtual time domain.
-+ * @service: amount of service.
-+ * @weight: scale factor (weight of an entity or weight sum).
-+ */
-+static u64 bfq_delta(unsigned long service, unsigned long weight)
-+{
-+	u64 d = (u64)service << WFQ_SERVICE_SHIFT;
-+
-+	do_div(d, weight);
-+	return d;
-+}
-+
-+/**
-+ * bfq_calc_finish - assign the finish time to an entity.
-+ * @entity: the entity to act upon.
-+ * @service: the service to be charged to the entity.
-+ */
-+static void bfq_calc_finish(struct bfq_entity *entity, unsigned long service)
-+{
-+	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+
-+	BUG_ON(entity->weight == 0);
-+
-+	entity->finish = entity->start +
-+		bfq_delta(service, entity->weight);
-+
-+	if (bfqq) {
-+		bfq_log_bfqq(bfqq->bfqd, bfqq,
-+			"calc_finish: serv %lu, w %d",
-+			service, entity->weight);
-+		bfq_log_bfqq(bfqq->bfqd, bfqq,
-+			"calc_finish: start %llu, finish %llu, delta %llu",
-+			entity->start, entity->finish,
-+			bfq_delta(service, entity->weight));
-+	}
-+}
-+
-+/**
-+ * bfq_entity_of - get an entity from a node.
-+ * @node: the node field of the entity.
-+ *
-+ * Convert a node pointer to the relative entity.  This is used only
-+ * to simplify the logic of some functions and not as the generic
-+ * conversion mechanism because, e.g., in the tree walking functions,
-+ * the check for a %NULL value would be redundant.
-+ */
-+static struct bfq_entity *bfq_entity_of(struct rb_node *node)
-+{
-+	struct bfq_entity *entity = NULL;
-+
-+	if (node)
-+		entity = rb_entry(node, struct bfq_entity, rb_node);
-+
-+	return entity;
-+}
-+
-+/**
-+ * bfq_extract - remove an entity from a tree.
-+ * @root: the tree root.
-+ * @entity: the entity to remove.
-+ */
-+static void bfq_extract(struct rb_root *root, struct bfq_entity *entity)
-+{
-+	BUG_ON(entity->tree != root);
-+
-+	entity->tree = NULL;
-+	rb_erase(&entity->rb_node, root);
-+}
-+
-+/**
-+ * bfq_idle_extract - extract an entity from the idle tree.
-+ * @st: the service tree of the owning @entity.
-+ * @entity: the entity being removed.
-+ */
-+static void bfq_idle_extract(struct bfq_service_tree *st,
-+			     struct bfq_entity *entity)
-+{
-+	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+	struct rb_node *next;
-+
-+	BUG_ON(entity->tree != &st->idle);
-+
-+	if (entity == st->first_idle) {
-+		next = rb_next(&entity->rb_node);
-+		st->first_idle = bfq_entity_of(next);
-+	}
-+
-+	if (entity == st->last_idle) {
-+		next = rb_prev(&entity->rb_node);
-+		st->last_idle = bfq_entity_of(next);
-+	}
-+
-+	bfq_extract(&st->idle, entity);
-+
-+	if (bfqq)
-+		list_del(&bfqq->bfqq_list);
-+}
-+
-+/**
-+ * bfq_insert - generic tree insertion.
-+ * @root: tree root.
-+ * @entity: entity to insert.
-+ *
-+ * This is used for the idle and the active tree, since they are both
-+ * ordered by finish time.
-+ */
-+static void bfq_insert(struct rb_root *root, struct bfq_entity *entity)
-+{
-+	struct bfq_entity *entry;
-+	struct rb_node **node = &root->rb_node;
-+	struct rb_node *parent = NULL;
-+
-+	BUG_ON(entity->tree);
-+
-+	while (*node) {
-+		parent = *node;
-+		entry = rb_entry(parent, struct bfq_entity, rb_node);
-+
-+		if (bfq_gt(entry->finish, entity->finish))
-+			node = &parent->rb_left;
-+		else
-+			node = &parent->rb_right;
-+	}
-+
-+	rb_link_node(&entity->rb_node, parent, node);
-+	rb_insert_color(&entity->rb_node, root);
-+
-+	entity->tree = root;
-+}
-+
-+/**
-+ * bfq_update_min - update the min_start field of an entity.
-+ * @entity: the entity to update.
-+ * @node: one of its children.
-+ *
-+ * This function is called when @entity may store an invalid value for
-+ * min_start due to updates to the active tree.  The function assumes
-+ * that the subtree rooted at @node (which may be its left or its right
-+ * child) has a valid min_start value.
-+ */
-+static void bfq_update_min(struct bfq_entity *entity, struct rb_node *node)
-+{
-+	struct bfq_entity *child;
-+
-+	if (node) {
-+		child = rb_entry(node, struct bfq_entity, rb_node);
-+		if (bfq_gt(entity->min_start, child->min_start))
-+			entity->min_start = child->min_start;
-+	}
-+}
-+
-+/**
-+ * bfq_update_active_node - recalculate min_start.
-+ * @node: the node to update.
-+ *
-+ * @node may have changed position or one of its children may have moved;
-+ * this function updates its min_start value.  The left and right subtrees
-+ * are assumed to hold a correct min_start value.
-+ */
-+static void bfq_update_active_node(struct rb_node *node)
-+{
-+	struct bfq_entity *entity = rb_entry(node, struct bfq_entity, rb_node);
-+
-+	entity->min_start = entity->start;
-+	bfq_update_min(entity, node->rb_right);
-+	bfq_update_min(entity, node->rb_left);
-+}
-+
-+/**
-+ * bfq_update_active_tree - update min_start for the whole active tree.
-+ * @node: the starting node.
-+ *
-+ * @node must be the deepest modified node after an update.  This function
-+ * updates its min_start using the values held by its children, assuming
-+ * that they did not change, and then updates all the nodes that may have
-+ * changed in the path to the root.  The only nodes that may have changed
-+ * are the ones in the path or their siblings.
-+ */
-+static void bfq_update_active_tree(struct rb_node *node)
-+{
-+	struct rb_node *parent;
-+
-+up:
-+	bfq_update_active_node(node);
-+
-+	parent = rb_parent(node);
-+	if (!parent)
-+		return;
-+
-+	if (node == parent->rb_left && parent->rb_right)
-+		bfq_update_active_node(parent->rb_right);
-+	else if (parent->rb_left)
-+		bfq_update_active_node(parent->rb_left);
-+
-+	node = parent;
-+	goto up;
-+}
-+
-+static void bfq_weights_tree_add(struct bfq_data *bfqd,
-+				 struct bfq_entity *entity,
-+				 struct rb_root *root);
-+
-+static void bfq_weights_tree_remove(struct bfq_data *bfqd,
-+				    struct bfq_entity *entity,
-+				    struct rb_root *root);
-+
-+
-+/**
-+ * bfq_active_insert - insert an entity in the active tree of its
-+ *                     group/device.
-+ * @st: the service tree of the entity.
-+ * @entity: the entity being inserted.
-+ *
-+ * The active tree is ordered by finish time, but an extra key is kept
-+ * for each node, containing the minimum value for the start times of
-+ * its children (and the node itself), so it's possible to search for
-+ * the eligible node with the lowest finish time in logarithmic time.
-+ */
-+static void bfq_active_insert(struct bfq_service_tree *st,
-+			      struct bfq_entity *entity)
-+{
-+	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+	struct rb_node *node = &entity->rb_node;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	struct bfq_sched_data *sd = NULL;
-+	struct bfq_group *bfqg = NULL;
-+	struct bfq_data *bfqd = NULL;
-+#endif
-+
-+	bfq_insert(&st->active, entity);
-+
-+	if (node->rb_left)
-+		node = node->rb_left;
-+	else if (node->rb_right)
-+		node = node->rb_right;
-+
-+	bfq_update_active_tree(node);
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	sd = entity->sched_data;
-+	bfqg = container_of(sd, struct bfq_group, sched_data);
-+	BUG_ON(!bfqg);
-+	bfqd = (struct bfq_data *)bfqg->bfqd;
-+#endif
-+	if (bfqq)
-+		list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	else { /* bfq_group */
-+		BUG_ON(!bfqd);
-+		bfq_weights_tree_add(bfqd, entity, &bfqd->group_weights_tree);
-+	}
-+	if (bfqg != bfqd->root_group) {
-+		BUG_ON(!bfqg);
-+		BUG_ON(!bfqd);
-+		bfqg->active_entities++;
-+		if (bfqg->active_entities == 2)
-+			bfqd->active_numerous_groups++;
-+	}
-+#endif
-+}
-+
-+/**
-+ * bfq_ioprio_to_weight - calc a weight from an ioprio.
-+ * @ioprio: the ioprio value to convert.
-+ */
-+static unsigned short bfq_ioprio_to_weight(int ioprio)
-+{
-+	BUG_ON(ioprio < 0 || ioprio >= IOPRIO_BE_NR);
-+	return IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - ioprio;
-+}
-+
-+/**
-+ * bfq_weight_to_ioprio - calc an ioprio from a weight.
-+ * @weight: the weight value to convert.
-+ *
-+ * To preserve as much as possible the old only-ioprio user interface,
-+ * 0 is used as an escape ioprio value for weights (numerically) equal or
-+ * larger than IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF.
-+ */
-+static unsigned short bfq_weight_to_ioprio(int weight)
-+{
-+	BUG_ON(weight < BFQ_MIN_WEIGHT || weight > BFQ_MAX_WEIGHT);
-+	return IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight < 0 ?
-+		0 : IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight;
-+}
-+
-+static void bfq_get_entity(struct bfq_entity *entity)
-+{
-+	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+
-+	if (bfqq) {
-+		atomic_inc(&bfqq->ref);
-+		bfq_log_bfqq(bfqq->bfqd, bfqq, "get_entity: %p %d",
-+			     bfqq, atomic_read(&bfqq->ref));
-+	}
-+}
-+
-+/**
-+ * bfq_find_deepest - find the deepest node that an extraction can modify.
-+ * @node: the node being removed.
-+ *
-+ * Do the first step of an extraction in an rb tree, looking for the
-+ * node that will replace @node, and returning the deepest node that
-+ * the following modifications to the tree can touch.  If @node is the
-+ * last node in the tree return %NULL.
-+ */
-+static struct rb_node *bfq_find_deepest(struct rb_node *node)
-+{
-+	struct rb_node *deepest;
-+
-+	if (!node->rb_right && !node->rb_left)
-+		deepest = rb_parent(node);
-+	else if (!node->rb_right)
-+		deepest = node->rb_left;
-+	else if (!node->rb_left)
-+		deepest = node->rb_right;
-+	else {
-+		deepest = rb_next(node);
-+		if (deepest->rb_right)
-+			deepest = deepest->rb_right;
-+		else if (rb_parent(deepest) != node)
-+			deepest = rb_parent(deepest);
-+	}
-+
-+	return deepest;
-+}
-+
-+/**
-+ * bfq_active_extract - remove an entity from the active tree.
-+ * @st: the service_tree containing the tree.
-+ * @entity: the entity being removed.
-+ */
-+static void bfq_active_extract(struct bfq_service_tree *st,
-+			       struct bfq_entity *entity)
-+{
-+	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+	struct rb_node *node;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	struct bfq_sched_data *sd = NULL;
-+	struct bfq_group *bfqg = NULL;
-+	struct bfq_data *bfqd = NULL;
-+#endif
-+
-+	node = bfq_find_deepest(&entity->rb_node);
-+	bfq_extract(&st->active, entity);
-+
-+	if (node)
-+		bfq_update_active_tree(node);
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	sd = entity->sched_data;
-+	bfqg = container_of(sd, struct bfq_group, sched_data);
-+	BUG_ON(!bfqg);
-+	bfqd = (struct bfq_data *)bfqg->bfqd;
-+#endif
-+	if (bfqq)
-+		list_del(&bfqq->bfqq_list);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	else { /* bfq_group */
-+		BUG_ON(!bfqd);
-+		bfq_weights_tree_remove(bfqd, entity,
-+					&bfqd->group_weights_tree);
-+	}
-+	if (bfqg != bfqd->root_group) {
-+		BUG_ON(!bfqg);
-+		BUG_ON(!bfqd);
-+		BUG_ON(!bfqg->active_entities);
-+		bfqg->active_entities--;
-+		if (bfqg->active_entities == 1) {
-+			BUG_ON(!bfqd->active_numerous_groups);
-+			bfqd->active_numerous_groups--;
-+		}
-+	}
-+#endif
-+}
-+
-+/**
-+ * bfq_idle_insert - insert an entity into the idle tree.
-+ * @st: the service tree containing the tree.
-+ * @entity: the entity to insert.
-+ */
-+static void bfq_idle_insert(struct bfq_service_tree *st,
-+			    struct bfq_entity *entity)
-+{
-+	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+	struct bfq_entity *first_idle = st->first_idle;
-+	struct bfq_entity *last_idle = st->last_idle;
-+
-+	if (!first_idle || bfq_gt(first_idle->finish, entity->finish))
-+		st->first_idle = entity;
-+	if (!last_idle || bfq_gt(entity->finish, last_idle->finish))
-+		st->last_idle = entity;
-+
-+	bfq_insert(&st->idle, entity);
-+
-+	if (bfqq)
-+		list_add(&bfqq->bfqq_list, &bfqq->bfqd->idle_list);
-+}
-+
-+/**
-+ * bfq_forget_entity - remove an entity from the wfq trees.
-+ * @st: the service tree.
-+ * @entity: the entity being removed.
-+ *
-+ * Update the device status and forget everything about @entity, putting
-+ * the device reference to it, if it is a queue.  Entities belonging to
-+ * groups are not refcounted.
-+ */
-+static void bfq_forget_entity(struct bfq_service_tree *st,
-+			      struct bfq_entity *entity)
-+{
-+	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+	struct bfq_sched_data *sd;
-+
-+	BUG_ON(!entity->on_st);
-+
-+	entity->on_st = 0;
-+	st->wsum -= entity->weight;
-+	if (bfqq) {
-+		sd = entity->sched_data;
-+		bfq_log_bfqq(bfqq->bfqd, bfqq, "forget_entity: %p %d",
-+			     bfqq, atomic_read(&bfqq->ref));
-+		bfq_put_queue(bfqq);
-+	}
-+}
-+
-+/**
-+ * bfq_put_idle_entity - release the idle tree ref of an entity.
-+ * @st: service tree for the entity.
-+ * @entity: the entity being released.
-+ */
-+static void bfq_put_idle_entity(struct bfq_service_tree *st,
-+				struct bfq_entity *entity)
-+{
-+	bfq_idle_extract(st, entity);
-+	bfq_forget_entity(st, entity);
-+}
-+
-+/**
-+ * bfq_forget_idle - update the idle tree if necessary.
-+ * @st: the service tree to act upon.
-+ *
-+ * To preserve the global O(log N) complexity we only remove one entry here;
-+ * as the idle tree will not grow indefinitely this can be done safely.
-+ */
-+static void bfq_forget_idle(struct bfq_service_tree *st)
-+{
-+	struct bfq_entity *first_idle = st->first_idle;
-+	struct bfq_entity *last_idle = st->last_idle;
-+
-+	if (RB_EMPTY_ROOT(&st->active) && last_idle &&
-+	    !bfq_gt(last_idle->finish, st->vtime)) {
-+		/*
-+		 * Forget the whole idle tree, increasing the vtime past
-+		 * the last finish time of idle entities.
-+		 */
-+		st->vtime = last_idle->finish;
-+	}
-+
-+	if (first_idle && !bfq_gt(first_idle->finish, st->vtime))
-+		bfq_put_idle_entity(st, first_idle);
-+}
-+
-+static struct bfq_service_tree *
-+__bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
-+			 struct bfq_entity *entity)
-+{
-+	struct bfq_service_tree *new_st = old_st;
-+
-+	if (entity->prio_changed) {
-+		struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+		unsigned short prev_weight, new_weight;
-+		struct bfq_data *bfqd = NULL;
-+		struct rb_root *root;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+		struct bfq_sched_data *sd;
-+		struct bfq_group *bfqg;
-+#endif
-+
-+		if (bfqq)
-+			bfqd = bfqq->bfqd;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+		else {
-+			sd = entity->my_sched_data;
-+			bfqg = container_of(sd, struct bfq_group, sched_data);
-+			BUG_ON(!bfqg);
-+			bfqd = (struct bfq_data *)bfqg->bfqd;
-+			BUG_ON(!bfqd);
-+		}
-+#endif
-+
-+		BUG_ON(old_st->wsum < entity->weight);
-+		old_st->wsum -= entity->weight;
-+
-+		if (entity->new_weight != entity->orig_weight) {
-+			if (entity->new_weight < BFQ_MIN_WEIGHT ||
-+			    entity->new_weight > BFQ_MAX_WEIGHT) {
-+				printk(KERN_CRIT "update_weight_prio: "
-+						 "new_weight %d\n",
-+					entity->new_weight);
-+				BUG();
-+			}
-+			entity->orig_weight = entity->new_weight;
-+			if (bfqq)
-+				bfqq->ioprio =
-+				  bfq_weight_to_ioprio(entity->orig_weight);
-+		}
-+
-+		if (bfqq)
-+			bfqq->ioprio_class = bfqq->new_ioprio_class;
-+		entity->prio_changed = 0;
-+
-+		/*
-+		 * NOTE: here we may be changing the weight too early,
-+		 * this will cause unfairness.  The correct approach
-+		 * would have required additional complexity to defer
-+		 * weight changes to the proper time instants (i.e.,
-+		 * when entity->finish <= old_st->vtime).
-+		 */
-+		new_st = bfq_entity_service_tree(entity);
-+
-+		prev_weight = entity->weight;
-+		new_weight = entity->orig_weight *
-+			     (bfqq ? bfqq->wr_coeff : 1);
-+		/*
-+		 * If the weight of the entity changes, remove the entity
-+		 * from its old weight counter (if there is a counter
-+		 * associated with the entity), and add it to the counter
-+		 * associated with its new weight.
-+		 */
-+		if (prev_weight != new_weight) {
-+			root = bfqq ? &bfqd->queue_weights_tree :
-+				      &bfqd->group_weights_tree;
-+			bfq_weights_tree_remove(bfqd, entity, root);
-+		}
-+		entity->weight = new_weight;
-+		/*
-+		 * Add the entity to its weights tree only if it is
-+		 * not associated with a weight-raised queue.
-+		 */
-+		if (prev_weight != new_weight &&
-+		    (bfqq ? bfqq->wr_coeff == 1 : 1))
-+			/* If we get here, root has been initialized. */
-+			bfq_weights_tree_add(bfqd, entity, root);
-+
-+		new_st->wsum += entity->weight;
-+
-+		if (new_st != old_st)
-+			entity->start = new_st->vtime;
-+	}
-+
-+	return new_st;
-+}
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+static void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg);
-+#endif
-+
-+/**
-+ * bfq_bfqq_served - update the scheduler status after selection for
-+ *                   service.
-+ * @bfqq: the queue being served.
-+ * @served: bytes to transfer.
-+ *
-+ * NOTE: this can be optimized, as the timestamps of upper level entities
-+ * are synchronized every time a new bfqq is selected for service.  For now,
-+ * we keep it to better check consistency.
-+ */
-+static void bfq_bfqq_served(struct bfq_queue *bfqq, int served)
-+{
-+	struct bfq_entity *entity = &bfqq->entity;
-+	struct bfq_service_tree *st;
-+
-+	for_each_entity(entity) {
-+		st = bfq_entity_service_tree(entity);
-+
-+		entity->service += served;
-+		BUG_ON(entity->service > entity->budget);
-+		BUG_ON(st->wsum == 0);
-+
-+		st->vtime += bfq_delta(served, st->wsum);
-+		bfq_forget_idle(st);
-+	}
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	bfqg_stats_set_start_empty_time(bfqq_group(bfqq));
-+#endif
-+	bfq_log_bfqq(bfqq->bfqd, bfqq, "bfqq_served %d secs", served);
-+}
-+
-+/**
-+ * bfq_bfqq_charge_full_budget - set the service to the entity budget.
-+ * @bfqq: the queue that needs a service update.
-+ *
-+ * When it's not possible to be fair in the service domain, because
-+ * a queue is not consuming its budget fast enough (the meaning of
-+ * fast depends on the timeout parameter), we charge it a full
-+ * budget.  In this way we should obtain a sort of time-domain
-+ * fairness among all the seeky/slow queues.
-+ */
-+static void bfq_bfqq_charge_full_budget(struct bfq_queue *bfqq)
-+{
-+	struct bfq_entity *entity = &bfqq->entity;
-+
-+	bfq_log_bfqq(bfqq->bfqd, bfqq, "charge_full_budget");
-+
-+	bfq_bfqq_served(bfqq, entity->budget - entity->service);
-+}
-+
-+/**
-+ * __bfq_activate_entity - activate an entity.
-+ * @entity: the entity being activated.
-+ *
-+ * Called whenever an entity is activated, i.e., it is not active and one
-+ * of its children receives a new request, or has to be reactivated due to
-+ * budget exhaustion.  It uses the current budget of the entity (and the
-+ * service received if @entity is active) of the queue to calculate its
-+ * timestamps.
-+ */
-+static void __bfq_activate_entity(struct bfq_entity *entity)
-+{
-+	struct bfq_sched_data *sd = entity->sched_data;
-+	struct bfq_service_tree *st = bfq_entity_service_tree(entity);
-+
-+	if (entity == sd->in_service_entity) {
-+		BUG_ON(entity->tree);
-+		/*
-+		 * If we are requeueing the current entity, we must
-+		 * take care not to charge it for service it has
-+		 * not received.
-+		 */
-+		bfq_calc_finish(entity, entity->service);
-+		entity->start = entity->finish;
-+		sd->in_service_entity = NULL;
-+	} else if (entity->tree == &st->active) {
-+		/*
-+		 * Requeueing an entity due to a change of some
-+		 * next_in_service entity below it.  We reuse the
-+		 * old start time.
-+		 */
-+		bfq_active_extract(st, entity);
-+	} else if (entity->tree == &st->idle) {
-+		/*
-+		 * Must be on the idle tree, bfq_idle_extract() will
-+		 * check for that.
-+		 */
-+		bfq_idle_extract(st, entity);
-+		entity->start = bfq_gt(st->vtime, entity->finish) ?
-+				       st->vtime : entity->finish;
-+	} else {
-+		/*
-+		 * The finish time of the entity may be invalid, and
-+		 * it is in the past for sure, otherwise the queue
-+		 * would have been on the idle tree.
-+		 */
-+		entity->start = st->vtime;
-+		st->wsum += entity->weight;
-+		bfq_get_entity(entity);
-+
-+		BUG_ON(entity->on_st);
-+		entity->on_st = 1;
-+	}
-+
-+	st = __bfq_entity_update_weight_prio(st, entity);
-+	bfq_calc_finish(entity, entity->budget);
-+	bfq_active_insert(st, entity);
-+}
-+
-+/**
-+ * bfq_activate_entity - activate an entity and its ancestors if necessary.
-+ * @entity: the entity to activate.
-+ *
-+ * Activate @entity and all the entities on the path from it to the root.
-+ */
-+static void bfq_activate_entity(struct bfq_entity *entity)
-+{
-+	struct bfq_sched_data *sd;
-+
-+	for_each_entity(entity) {
-+		__bfq_activate_entity(entity);
-+
-+		sd = entity->sched_data;
-+		if (!bfq_update_next_in_service(sd))
-+			/*
-+			 * No need to propagate the activation to the
-+			 * upper entities, as they will be updated when
-+			 * the in-service entity is rescheduled.
-+			 */
-+			break;
-+	}
-+}
-+
-+/**
-+ * __bfq_deactivate_entity - deactivate an entity from its service tree.
-+ * @entity: the entity to deactivate.
-+ * @requeue: if false, the entity will not be put into the idle tree.
-+ *
-+ * Deactivate an entity, independently from its previous state.  If the
-+ * entity was not on a service tree just return, otherwise if it is on
-+ * any scheduler tree, extract it from that tree, and if necessary
-+ * and if the caller did specify @requeue, put it on the idle tree.
-+ *
-+ * Return %1 if the caller should update the entity hierarchy, i.e.,
-+ * if the entity was in service or if it was the next_in_service for
-+ * its sched_data; return %0 otherwise.
-+ */
-+static int __bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
-+{
-+	struct bfq_sched_data *sd = entity->sched_data;
-+	struct bfq_service_tree *st = bfq_entity_service_tree(entity);
-+	int was_in_service = entity == sd->in_service_entity;
-+	int ret = 0;
-+
-+	if (!entity->on_st)
-+		return 0;
-+
-+	BUG_ON(was_in_service && entity->tree);
-+
-+	if (was_in_service) {
-+		bfq_calc_finish(entity, entity->service);
-+		sd->in_service_entity = NULL;
-+	} else if (entity->tree == &st->active)
-+		bfq_active_extract(st, entity);
-+	else if (entity->tree == &st->idle)
-+		bfq_idle_extract(st, entity);
-+	else if (entity->tree)
-+		BUG();
-+
-+	if (was_in_service || sd->next_in_service == entity)
-+		ret = bfq_update_next_in_service(sd);
-+
-+	if (!requeue || !bfq_gt(entity->finish, st->vtime))
-+		bfq_forget_entity(st, entity);
-+	else
-+		bfq_idle_insert(st, entity);
-+
-+	BUG_ON(sd->in_service_entity == entity);
-+	BUG_ON(sd->next_in_service == entity);
-+
-+	return ret;
-+}
-+
-+/**
-+ * bfq_deactivate_entity - deactivate an entity.
-+ * @entity: the entity to deactivate.
-+ * @requeue: true if the entity can be put on the idle tree
-+ */
-+static void bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
-+{
-+	struct bfq_sched_data *sd;
-+	struct bfq_entity *parent;
-+
-+	for_each_entity_safe(entity, parent) {
-+		sd = entity->sched_data;
-+
-+		if (!__bfq_deactivate_entity(entity, requeue))
-+			/*
-+			 * The parent entity is still backlogged, and
-+			 * we don't need to update it as it is still
-+			 * in service.
-+			 */
-+			break;
-+
-+		if (sd->next_in_service)
-+			/*
-+			 * The parent entity is still backlogged and
-+			 * the budgets on the path towards the root
-+			 * need to be updated.
-+			 */
-+			goto update;
-+
-+		/*
-+		 * If we get here, the parent is no longer backlogged and
-+		 * we want to propagate the dequeue upwards.
-+		 */
-+		requeue = 1;
-+	}
-+
-+	return;
-+
-+update:
-+	entity = parent;
-+	for_each_entity(entity) {
-+		__bfq_activate_entity(entity);
-+
-+		sd = entity->sched_data;
-+		if (!bfq_update_next_in_service(sd))
-+			break;
-+	}
-+}
-+
-+/**
-+ * bfq_update_vtime - update vtime if necessary.
-+ * @st: the service tree to act upon.
-+ *
-+ * If necessary update the service tree vtime to have at least one
-+ * eligible entity, skipping to its start time.  Assumes that the
-+ * active tree of the device is not empty.
-+ *
-+ * NOTE: this hierarchical implementation updates vtimes quite often,
-+ * we may end up with reactivated processes getting timestamps after a
-+ * vtime skip done because we needed a ->first_active entity on some
-+ * intermediate node.
-+ */
-+static void bfq_update_vtime(struct bfq_service_tree *st)
-+{
-+	struct bfq_entity *entry;
-+	struct rb_node *node = st->active.rb_node;
-+
-+	entry = rb_entry(node, struct bfq_entity, rb_node);
-+	if (bfq_gt(entry->min_start, st->vtime)) {
-+		st->vtime = entry->min_start;
-+		bfq_forget_idle(st);
-+	}
-+}
-+
-+/**
-+ * bfq_first_active_entity - find the eligible entity with
-+ *                           the smallest finish time
-+ * @st: the service tree to select from.
-+ *
-+ * This function searches the first schedulable entity, starting from the
-+ * root of the tree and going on the left every time on this side there is
-+ * a subtree with at least one eligible (start <= vtime) entity. The path on
-+ * the right is followed only if a) the left subtree contains no eligible
-+ * entities and b) no eligible entity has been found yet.
-+ */
-+static struct bfq_entity *bfq_first_active_entity(struct bfq_service_tree *st)
-+{
-+	struct bfq_entity *entry, *first = NULL;
-+	struct rb_node *node = st->active.rb_node;
-+
-+	while (node) {
-+		entry = rb_entry(node, struct bfq_entity, rb_node);
-+left:
-+		if (!bfq_gt(entry->start, st->vtime))
-+			first = entry;
-+
-+		BUG_ON(bfq_gt(entry->min_start, st->vtime));
-+
-+		if (node->rb_left) {
-+			entry = rb_entry(node->rb_left,
-+					 struct bfq_entity, rb_node);
-+			if (!bfq_gt(entry->min_start, st->vtime)) {
-+				node = node->rb_left;
-+				goto left;
-+			}
-+		}
-+		if (first)
-+			break;
-+		node = node->rb_right;
-+	}
-+
-+	BUG_ON(!first && !RB_EMPTY_ROOT(&st->active));
-+	return first;
-+}
-+
-+/**
-+ * __bfq_lookup_next_entity - return the first eligible entity in @st.
-+ * @st: the service tree.
-+ * @force: true if service of the IDLE class tree is being forced.
-+ *
-+ * Update the virtual time in @st and return its first eligible entity.
-+ */
-+static struct bfq_entity *__bfq_lookup_next_entity(struct bfq_service_tree *st,
-+						   bool force)
-+{
-+	struct bfq_entity *entity, *new_next_in_service = NULL;
-+
-+	if (RB_EMPTY_ROOT(&st->active))
-+		return NULL;
-+
-+	bfq_update_vtime(st);
-+	entity = bfq_first_active_entity(st);
-+	BUG_ON(bfq_gt(entity->start, st->vtime));
-+
-+	/*
-+	 * If the chosen entity does not match with the sched_data's
-+	 * next_in_service and we are forcedly serving the IDLE priority
-+	 * class tree, bubble up budget update.
-+	 */
-+	if (unlikely(force && entity != entity->sched_data->next_in_service)) {
-+		new_next_in_service = entity;
-+		for_each_entity(new_next_in_service)
-+			bfq_update_budget(new_next_in_service);
-+	}
-+
-+	return entity;
-+}
-+
-+/**
-+ * bfq_lookup_next_entity - return the first eligible entity in @sd.
-+ * @sd: the sched_data.
-+ * @extract: if true the returned entity will be also extracted from @sd.
-+ * @bfqd: device data, used to time out the IDLE class; may be %NULL.
-+ *
-+ * NOTE: since we cache the next_in_service entity at each level of the
-+ * hierarchy, the lookup could be made cheaper by just returning the cached
-+ * next_in_service value; we prefer to do full lookups to test the
-+ * consistency of the data structures.
-+ */
-+static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
-+						 int extract,
-+						 struct bfq_data *bfqd)
-+{
-+	struct bfq_service_tree *st = sd->service_tree;
-+	struct bfq_entity *entity;
-+	int i = 0;
-+
-+	BUG_ON(sd->in_service_entity);
-+
-+	if (bfqd &&
-+	    jiffies - bfqd->bfq_class_idle_last_service > BFQ_CL_IDLE_TIMEOUT) {
-+		entity = __bfq_lookup_next_entity(st + BFQ_IOPRIO_CLASSES - 1,
-+						  true);
-+		if (entity) {
-+			i = BFQ_IOPRIO_CLASSES - 1;
-+			bfqd->bfq_class_idle_last_service = jiffies;
-+			sd->next_in_service = entity;
-+		}
-+	}
-+	for (; i < BFQ_IOPRIO_CLASSES; i++) {
-+		entity = __bfq_lookup_next_entity(st + i, false);
-+		if (entity) {
-+			if (extract) {
-+				bfq_check_next_in_service(sd, entity);
-+				bfq_active_extract(st + i, entity);
-+				sd->in_service_entity = entity;
-+				sd->next_in_service = NULL;
-+			}
-+			break;
-+		}
-+	}
-+
-+	return entity;
-+}
-+
-+/*
-+ * Get next queue for service.
-+ */
-+static struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
-+{
-+	struct bfq_entity *entity = NULL;
-+	struct bfq_sched_data *sd;
-+	struct bfq_queue *bfqq;
-+
-+	BUG_ON(bfqd->in_service_queue);
-+
-+	if (bfqd->busy_queues == 0)
-+		return NULL;
-+
-+	sd = &bfqd->root_group->sched_data;
-+	for (; sd ; sd = entity->my_sched_data) {
-+		entity = bfq_lookup_next_entity(sd, 1, bfqd);
-+		BUG_ON(!entity);
-+		entity->service = 0;
-+	}
-+
-+	bfqq = bfq_entity_to_bfqq(entity);
-+	BUG_ON(!bfqq);
-+
-+	return bfqq;
-+}
-+
-+static void __bfq_bfqd_reset_in_service(struct bfq_data *bfqd)
-+{
-+	if (bfqd->in_service_bic) {
-+		put_io_context(bfqd->in_service_bic->icq.ioc);
-+		bfqd->in_service_bic = NULL;
-+	}
-+
-+	bfqd->in_service_queue = NULL;
-+	del_timer(&bfqd->idle_slice_timer);
-+}
-+
-+static void bfq_deactivate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+				int requeue)
-+{
-+	struct bfq_entity *entity = &bfqq->entity;
-+
-+	if (bfqq == bfqd->in_service_queue)
-+		__bfq_bfqd_reset_in_service(bfqd);
-+
-+	bfq_deactivate_entity(entity, requeue);
-+}
-+
-+static void bfq_activate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+	struct bfq_entity *entity = &bfqq->entity;
-+
-+	bfq_activate_entity(entity);
-+}
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+static void bfqg_stats_update_dequeue(struct bfq_group *bfqg);
-+#endif
-+
-+/*
-+ * Called when the bfqq no longer has requests pending; removes it from
-+ * the service tree.
-+ */
-+static void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+			      int requeue)
-+{
-+	BUG_ON(!bfq_bfqq_busy(bfqq));
-+	BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
-+
-+	bfq_log_bfqq(bfqd, bfqq, "del from busy");
-+
-+	bfq_clear_bfqq_busy(bfqq);
-+
-+	BUG_ON(bfqd->busy_queues == 0);
-+	bfqd->busy_queues--;
-+
-+	if (!bfqq->dispatched) {
-+		bfq_weights_tree_remove(bfqd, &bfqq->entity,
-+					&bfqd->queue_weights_tree);
-+		if (!blk_queue_nonrot(bfqd->queue)) {
-+			BUG_ON(!bfqd->busy_in_flight_queues);
-+			bfqd->busy_in_flight_queues--;
-+			if (bfq_bfqq_constantly_seeky(bfqq)) {
-+				BUG_ON(!bfqd->
-+					const_seeky_busy_in_flight_queues);
-+				bfqd->const_seeky_busy_in_flight_queues--;
-+			}
-+		}
-+	}
-+	if (bfqq->wr_coeff > 1)
-+		bfqd->wr_busy_queues--;
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	bfqg_stats_update_dequeue(bfqq_group(bfqq));
-+#endif
-+
-+	bfq_deactivate_bfqq(bfqd, bfqq, requeue);
-+}
-+
-+/*
-+ * Called when an inactive queue receives a new request.
-+ */
-+static void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+	BUG_ON(bfq_bfqq_busy(bfqq));
-+	BUG_ON(bfqq == bfqd->in_service_queue);
-+
-+	bfq_log_bfqq(bfqd, bfqq, "add to busy");
-+
-+	bfq_activate_bfqq(bfqd, bfqq);
-+
-+	bfq_mark_bfqq_busy(bfqq);
-+	bfqd->busy_queues++;
-+
-+	if (!bfqq->dispatched) {
-+		if (bfqq->wr_coeff == 1)
-+			bfq_weights_tree_add(bfqd, &bfqq->entity,
-+					     &bfqd->queue_weights_tree);
-+		if (!blk_queue_nonrot(bfqd->queue)) {
-+			bfqd->busy_in_flight_queues++;
-+			if (bfq_bfqq_constantly_seeky(bfqq))
-+				bfqd->const_seeky_busy_in_flight_queues++;
-+		}
-+	}
-+	if (bfqq->wr_coeff > 1)
-+		bfqd->wr_busy_queues++;
-+}
-diff --git a/block/bfq.h b/block/bfq.h
-new file mode 100644
-index 0000000..ca5ac20
---- /dev/null
-+++ b/block/bfq.h
-@@ -0,0 +1,807 @@
-+/*
-+ * BFQ-v7r9 for 4.2.0: data structures and common functions prototypes.
-+ *
-+ * Based on ideas and code from CFQ:
-+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
-+ *
-+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
-+ *		      Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
-+ */
-+
-+#ifndef _BFQ_H
-+#define _BFQ_H
-+
-+#include <linux/blktrace_api.h>
-+#include <linux/hrtimer.h>
-+#include <linux/ioprio.h>
-+#include <linux/rbtree.h>
-+#include <linux/blk-cgroup.h>
-+
-+#define BFQ_IOPRIO_CLASSES	3
-+#define BFQ_CL_IDLE_TIMEOUT	(HZ/5)
-+
-+#define BFQ_MIN_WEIGHT			1
-+#define BFQ_MAX_WEIGHT			1000
-+#define BFQ_WEIGHT_CONVERSION_COEFF	10
-+
-+#define BFQ_DEFAULT_QUEUE_IOPRIO	4
-+
-+#define BFQ_DEFAULT_GRP_WEIGHT	10
-+#define BFQ_DEFAULT_GRP_IOPRIO	0
-+#define BFQ_DEFAULT_GRP_CLASS	IOPRIO_CLASS_BE
-+
-+struct bfq_entity;
-+
-+/**
-+ * struct bfq_service_tree - per ioprio_class service tree.
-+ * @active: tree for active entities (i.e., those backlogged).
-+ * @idle: tree for idle entities (i.e., those not backlogged, with V <= F_i).
-+ * @first_idle: idle entity with minimum F_i.
-+ * @last_idle: idle entity with maximum F_i.
-+ * @vtime: scheduler virtual time.
-+ * @wsum: scheduler weight sum; active and idle entities contribute to it.
-+ *
-+ * Each service tree represents a B-WF2Q+ scheduler on its own.  Each
-+ * ioprio_class has its own independent scheduler, and so its own
-+ * bfq_service_tree.  All the fields are protected by the queue lock
-+ * of the containing bfqd.
-+ */
-+struct bfq_service_tree {
-+	struct rb_root active;
-+	struct rb_root idle;
-+
-+	struct bfq_entity *first_idle;
-+	struct bfq_entity *last_idle;
-+
-+	u64 vtime;
-+	unsigned long wsum;
-+};
-+
-+/**
-+ * struct bfq_sched_data - multi-class scheduler.
-+ * @in_service_entity: entity in service.
-+ * @next_in_service: head-of-the-line entity in the scheduler.
-+ * @service_tree: array of service trees, one per ioprio_class.
-+ *
-+ * bfq_sched_data is the basic scheduler queue.  It supports three
-+ * ioprio_classes, and can be used either as a toplevel queue or as
-+ * an intermediate queue on a hierarchical setup.
-+ * @next_in_service points to the active entity of the sched_data
-+ * service trees that will be scheduled next.
-+ *
-+ * The supported ioprio_classes are the same as in CFQ, in descending
-+ * priority order, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE.
-+ * Requests from queues of a higher priority class are served before all
-+ * the requests from queues of a lower priority class; among queues of the
-+ * same class, requests are served according to B-WF2Q+.
-+ * All the fields are protected by the queue lock of the containing bfqd.
-+ */
-+struct bfq_sched_data {
-+	struct bfq_entity *in_service_entity;
-+	struct bfq_entity *next_in_service;
-+	struct bfq_service_tree service_tree[BFQ_IOPRIO_CLASSES];
-+};
-+
-+/**
-+ * struct bfq_weight_counter - counter of the number of all active entities
-+ *                             with a given weight.
-+ * @weight: weight of the entities that this counter refers to.
-+ * @num_active: number of active entities with this weight.
-+ * @weights_node: weights tree member (see bfq_data's @queue_weights_tree
-+ *                and @group_weights_tree).
-+ */
-+struct bfq_weight_counter {
-+	short int weight;
-+	unsigned int num_active;
-+	struct rb_node weights_node;
-+};
-+
-+/**
-+ * struct bfq_entity - schedulable entity.
-+ * @rb_node: service_tree member.
-+ * @weight_counter: pointer to the weight counter associated with this entity.
-+ * @on_st: flag, true if the entity is on a tree (either the active or
-+ *         the idle one of its service_tree).
-+ * @finish: B-WF2Q+ finish timestamp (aka F_i).
-+ * @start: B-WF2Q+ start timestamp (aka S_i).
-+ * @tree: tree the entity is enqueued into; %NULL if not on a tree.
-+ * @min_start: minimum start time of the (active) subtree rooted at
-+ *             this entity; used for O(log N) lookups into active trees.
-+ * @service: service received during the last round of service.
-+ * @budget: budget used to calculate F_i; F_i = S_i + @budget / @weight.
-+ * @weight: weight of the queue
-+ * @parent: parent entity, for hierarchical scheduling.
-+ * @my_sched_data: for non-leaf nodes in the cgroup hierarchy, the
-+ *                 associated scheduler queue, %NULL on leaf nodes.
-+ * @sched_data: the scheduler queue this entity belongs to.
-+ * @ioprio: the ioprio in use.
-+ * @new_weight: when a weight change is requested, the new weight value.
-+ * @orig_weight: original weight, used to implement weight boosting
-+ * @prio_changed: flag, true when the user requested a weight, ioprio or
-+ *		  ioprio_class change.
-+ *
-+ * A bfq_entity is used to represent either a bfq_queue (leaf node in the
-+ * cgroup hierarchy) or a bfq_group into the upper level scheduler.  Each
-+ * entity belongs to the sched_data of the parent group in the cgroup
-+ * hierarchy.  Non-leaf entities also have their own sched_data, stored
-+ * in @my_sched_data.
-+ *
-+ * Each entity stores independently its priority values; this would
-+ * allow different weights on different devices, but this
-+ * functionality is not exported to userspace for now.  Priorities and
-+ * weights are updated lazily, first storing the new values into the
-+ * new_* fields, then setting the @prio_changed flag.  As soon as
-+ * there is a transition in the entity state that allows the priority
-+ * update to take place the effective and the requested priority
-+ * values are synchronized.
-+ *
-+ * Unless cgroups are used, the weight value is calculated from the
-+ * ioprio to export the same interface as CFQ.  When dealing with
-+ * ``well-behaved'' queues (i.e., queues that do not spend too much
-+ * time to consume their budget and have true sequential behavior, and
-+ * when there are no external factors breaking anticipation) the
-+ * relative weights at each level of the cgroups hierarchy should be
-+ * guaranteed.  All the fields are protected by the queue lock of the
-+ * containing bfqd.
-+ */
-+struct bfq_entity {
-+	struct rb_node rb_node;
-+	struct bfq_weight_counter *weight_counter;
-+
-+	int on_st;
-+
-+	u64 finish;
-+	u64 start;
-+
-+	struct rb_root *tree;
-+
-+	u64 min_start;
-+
-+	int service, budget;
-+	unsigned short weight, new_weight;
-+	unsigned short orig_weight;
-+
-+	struct bfq_entity *parent;
-+
-+	struct bfq_sched_data *my_sched_data;
-+	struct bfq_sched_data *sched_data;
-+
-+	int prio_changed;
-+};
-+
-+struct bfq_group;
-+
-+/**
-+ * struct bfq_queue - leaf schedulable entity.
-+ * @ref: reference counter.
-+ * @bfqd: parent bfq_data.
-+ * @new_ioprio: when an ioprio change is requested, the new ioprio value.
-+ * @ioprio_class: the ioprio_class in use.
-+ * @new_ioprio_class: when an ioprio_class change is requested, the new
-+ *                    ioprio_class value.
-+ * @new_bfqq: shared bfq_queue if queue is cooperating with
-+ *           one or more other queues.
-+ * @sort_list: sorted list of pending requests.
-+ * @next_rq: if fifo isn't expired, next request to serve.
-+ * @queued: nr of requests queued in @sort_list.
-+ * @allocated: currently allocated requests.
-+ * @meta_pending: pending metadata requests.
-+ * @fifo: fifo list of requests in sort_list.
-+ * @entity: entity representing this queue in the scheduler.
-+ * @max_budget: maximum budget allowed from the feedback mechanism.
-+ * @budget_timeout: budget expiration (in jiffies).
-+ * @dispatched: number of requests on the dispatch list or inside driver.
-+ * @flags: status flags.
-+ * @bfqq_list: node for active/idle bfqq list inside our bfqd.
-+ * @burst_list_node: node for the device's burst list.
-+ * @seek_samples: number of seeks sampled
-+ * @seek_total: sum of the distances of the seeks sampled
-+ * @seek_mean: mean seek distance
-+ * @last_request_pos: position of the last request enqueued
-+ * @requests_within_timer: number of consecutive pairs of request completion
-+ *                         and arrival, such that the queue becomes idle
-+ *                         after the completion, but the next request arrives
-+ *                         within an idle time slice; used only if the queue's
-+ *                         IO_bound has been cleared.
-+ * @pid: pid of the process owning the queue, used for logging purposes.
-+ * @last_wr_start_finish: start time of the current weight-raising period if
-+ *                        the @bfq_queue is being weight-raised, otherwise
-+ *                        finish time of the last weight-raising period
-+ * @wr_cur_max_time: current max raising time for this queue
-+ * @soft_rt_next_start: minimum time instant such that, only if a new
-+ *                      request is enqueued after this time instant in an
-+ *                      idle @bfq_queue with no outstanding requests, then
-+ *                      the task associated with the queue is deemed as
-+ *                      soft real-time (see the comments to the function
-+ *                      bfq_bfqq_softrt_next_start())
-+ * @last_idle_bklogged: time of the last transition of the @bfq_queue from
-+ *                      idle to backlogged
-+ * @service_from_backlogged: cumulative service received by the @bfq_queue
-+ *                           since the last transition from idle to
-+ *                           backlogged
-+ * @bic: pointer to the bfq_io_cq owning the bfq_queue, set to %NULL if the
-+ *	 queue is shared
-+ *
-+ * A bfq_queue is a leaf request queue; it can be associated with one
-+ * or more io_contexts, if it is async or shared between cooperating
-+ * processes. @cgroup holds a reference to the cgroup, to be sure that it
-+ * does not disappear while a bfqq still references it (mostly to avoid
-+ * races between request issuing and task migration followed by cgroup
-+ * destruction).
-+ * All the fields are protected by the queue lock of the containing bfqd.
-+ */
-+struct bfq_queue {
-+	atomic_t ref;
-+	struct bfq_data *bfqd;
-+
-+	unsigned short ioprio, new_ioprio;
-+	unsigned short ioprio_class, new_ioprio_class;
-+
-+	/* fields for cooperating queues handling */
-+	struct bfq_queue *new_bfqq;
-+	struct rb_node pos_node;
-+	struct rb_root *pos_root;
-+
-+	struct rb_root sort_list;
-+	struct request *next_rq;
-+	int queued[2];
-+	int allocated[2];
-+	int meta_pending;
-+	struct list_head fifo;
-+
-+	struct bfq_entity entity;
-+
-+	int max_budget;
-+	unsigned long budget_timeout;
-+
-+	int dispatched;
-+
-+	unsigned int flags;
-+
-+	struct list_head bfqq_list;
-+
-+	struct hlist_node burst_list_node;
-+
-+	unsigned int seek_samples;
-+	u64 seek_total;
-+	sector_t seek_mean;
-+	sector_t last_request_pos;
-+
-+	unsigned int requests_within_timer;
-+
-+	pid_t pid;
-+	struct bfq_io_cq *bic;
-+
-+	/* weight-raising fields */
-+	unsigned long wr_cur_max_time;
-+	unsigned long soft_rt_next_start;
-+	unsigned long last_wr_start_finish;
-+	unsigned int wr_coeff;
-+	unsigned long last_idle_bklogged;
-+	unsigned long service_from_backlogged;
-+};
-+
-+/**
-+ * struct bfq_ttime - per process thinktime stats.
-+ * @ttime_total: total process thinktime
-+ * @ttime_samples: number of thinktime samples
-+ * @ttime_mean: average process thinktime
-+ */
-+struct bfq_ttime {
-+	unsigned long last_end_request;
-+
-+	unsigned long ttime_total;
-+	unsigned long ttime_samples;
-+	unsigned long ttime_mean;
-+};
-+
-+/**
-+ * struct bfq_io_cq - per (request_queue, io_context) structure.
-+ * @icq: associated io_cq structure
-+ * @bfqq: array of two process queues, the sync and the async
-+ * @ttime: associated @bfq_ttime struct
-+ * @ioprio: per (request_queue, blkcg) ioprio.
-+ * @blkcg_id: id of the blkcg the related io_cq belongs to.
-+ */
-+struct bfq_io_cq {
-+	struct io_cq icq; /* must be the first member */
-+	struct bfq_queue *bfqq[2];
-+	struct bfq_ttime ttime;
-+	int ioprio;
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	uint64_t blkcg_id; /* the current blkcg ID */
-+#endif
-+};
-+
-+enum bfq_device_speed {
-+	BFQ_BFQD_FAST,
-+	BFQ_BFQD_SLOW,
-+};
-+
-+/**
-+ * struct bfq_data - per device data structure.
-+ * @queue: request queue for the managed device.
-+ * @root_group: root bfq_group for the device.
-+ * @active_numerous_groups: number of bfq_groups containing more than one
-+ *                          active @bfq_entity.
-+ * @queue_weights_tree: rbtree of weight counters of @bfq_queues, sorted by
-+ *                      weight. Used to keep track of whether all @bfq_queues
-+ *                      have the same weight. The tree contains one counter
-+ *                      for each distinct weight associated to some active
-+ *                      and not weight-raised @bfq_queue (see the comments to
-+ *                      the functions bfq_weights_tree_[add|remove] for
-+ *                      further details).
-+ * @group_weights_tree: rbtree of non-queue @bfq_entity weight counters, sorted
-+ *                      by weight. Used to keep track of whether all
-+ *                      @bfq_groups have the same weight. The tree contains
-+ *                      one counter for each distinct weight associated to
-+ *                      some active @bfq_group (see the comments to the
-+ *                      functions bfq_weights_tree_[add|remove] for further
-+ *                      details).
-+ * @busy_queues: number of bfq_queues containing requests (including the
-+ *		 queue in service, even if it is idling).
-+ * @busy_in_flight_queues: number of @bfq_queues containing pending or
-+ *                         in-flight requests, plus the @bfq_queue in
-+ *                         service, even if idle but waiting for the
-+ *                         possible arrival of its next sync request. This
-+ *                         field is updated only if the device is rotational,
-+ *                         but used only if the device is also NCQ-capable.
-+ *                         The reason why the field is updated also for non-
-+ *                         NCQ-capable rotational devices is related to the
-+ *                         fact that the value of @hw_tag may be set also
-+ *                         later than when busy_in_flight_queues may need to
-+ *                         be incremented for the first time(s). Taking also
-+ *                         this possibility into account, to avoid unbalanced
-+ *                         increments/decrements, would imply more overhead
-+ *                         than just updating busy_in_flight_queues
-+ *                         regardless of the value of @hw_tag.
-+ * @const_seeky_busy_in_flight_queues: number of constantly-seeky @bfq_queues
-+ *                                     (that is, seeky queues that expired
-+ *                                     for budget timeout at least once)
-+ *                                     containing pending or in-flight
-+ *                                     requests, including the in-service
-+ *                                     @bfq_queue if constantly seeky. This
-+ *                                     field is updated only if the device
-+ *                                     is rotational, but used only if the
-+ *                                     device is also NCQ-capable (see the
-+ *                                     comments to @busy_in_flight_queues).
-+ * @wr_busy_queues: number of weight-raised busy @bfq_queues.
-+ * @queued: number of queued requests.
-+ * @rq_in_driver: number of requests dispatched and waiting for completion.
-+ * @sync_flight: number of sync requests in the driver.
-+ * @max_rq_in_driver: max number of reqs in driver in the last
-+ *                    @hw_tag_samples completed requests.
-+ * @hw_tag_samples: nr of samples used to calculate hw_tag.
-+ * @hw_tag: flag set to one if the driver is showing a queueing behavior.
-+ * @budgets_assigned: number of budgets assigned.
-+ * @idle_slice_timer: timer set when idling for the next sequential request
-+ *                    from the queue in service.
-+ * @unplug_work: delayed work to restart dispatching on the request queue.
-+ * @in_service_queue: bfq_queue in service.
-+ * @in_service_bic: bfq_io_cq (bic) associated with the @in_service_queue.
-+ * @last_position: on-disk position of the last served request.
-+ * @last_budget_start: beginning of the last budget.
-+ * @last_idling_start: beginning of the last idle slice.
-+ * @peak_rate: peak transfer rate observed for a budget.
-+ * @peak_rate_samples: number of samples used to calculate @peak_rate.
-+ * @bfq_max_budget: maximum budget allotted to a bfq_queue before
-+ *                  rescheduling.
-+ * @group_list: list of all the bfq_groups active on the device.
-+ * @active_list: list of all the bfq_queues active on the device.
-+ * @idle_list: list of all the bfq_queues idle on the device.
-+ * @bfq_fifo_expire: timeout for async/sync requests; when it expires
-+ *                   requests are served in fifo order.
-+ * @bfq_back_penalty: weight of backward seeks wrt forward ones.
-+ * @bfq_back_max: maximum allowed backward seek.
-+ * @bfq_slice_idle: maximum idling time.
-+ * @bfq_user_max_budget: user-configured max budget value
-+ *                       (0 for auto-tuning).
-+ * @bfq_max_budget_async_rq: maximum budget (in nr of requests) allotted to
-+ *                           async queues.
-+ * @bfq_timeout: timeout for bfq_queues to consume their budget; used to
-+ *               prevent seeky queues from imposing long latencies on
-+ *               well-behaved ones (this also implies that seeky queues cannot
-+ *               receive guarantees in the service domain; after a timeout
-+ *               they are charged for the whole allocated budget, to try
-+ *               to preserve a behavior reasonably fair among them, but
-+ *               without service-domain guarantees).
-+ * @bfq_coop_thresh: number of queue merges after which a @bfq_queue is
-+ *                   no longer granted any weight-raising.
-+ * @bfq_failed_cooperations: number of consecutive failed cooperation
-+ *                           chances after which weight-raising is restored
-+ *                           to a queue subject to more than bfq_coop_thresh
-+ *                           queue merges.
-+ * @bfq_requests_within_timer: number of consecutive requests that must be
-+ *                             issued within the idle time slice to set
-+ *                             again idling to a queue which was marked as
-+ *                             non-I/O-bound (see the definition of the
-+ *                             IO_bound flag for further details).
-+ * @last_ins_in_burst: last time at which a queue entered the current
-+ *                     burst of queues being activated shortly after
-+ *                     each other; for more details about this and the
-+ *                     following parameters related to a burst of
-+ *                     activations, see the comments to the function
-+ *                     @bfq_handle_burst.
-+ * @bfq_burst_interval: reference time interval used to decide whether a
-+ *                      queue has been activated shortly after
-+ *                      @last_ins_in_burst.
-+ * @burst_size: number of queues in the current burst of queue activations.
-+ * @bfq_large_burst_thresh: maximum burst size above which the current
-+ * 			    queue-activation burst is deemed as 'large'.
-+ * @large_burst: true if a large queue-activation burst is in progress.
-+ * @burst_list: head of the burst list (as for the above fields, more details
-+ * 		in the comments to the function bfq_handle_burst).
-+ * @low_latency: if set to true, low-latency heuristics are enabled.
-+ * @bfq_wr_coeff: maximum factor by which the weight of a weight-raised
-+ *                queue is multiplied.
-+ * @bfq_wr_max_time: maximum duration of a weight-raising period (jiffies).
-+ * @bfq_wr_rt_max_time: maximum duration for soft real-time processes.
-+ * @bfq_wr_min_idle_time: minimum idle period after which weight-raising
-+ *			  may be reactivated for a queue (in jiffies).
-+ * @bfq_wr_min_inter_arr_async: minimum period between request arrivals
-+ *				after which weight-raising may be
-+ *				reactivated for an already busy queue
-+ *				(in jiffies).
-+ * @bfq_wr_max_softrt_rate: max service-rate for a soft real-time queue,
-+ *			    sectors per seconds.
-+ * @RT_prod: cached value of the product R*T used for computing the maximum
-+ *	     duration of the weight raising automatically.
-+ * @device_speed: device-speed class for the low-latency heuristic.
-+ * @oom_bfqq: fallback dummy bfqq for extreme OOM conditions.
-+ *
-+ * All the fields are protected by the @queue lock.
-+ */
-+struct bfq_data {
-+	struct request_queue *queue;
-+
-+	struct bfq_group *root_group;
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+	int active_numerous_groups;
-+#endif
-+
-+	struct rb_root queue_weights_tree;
-+	struct rb_root group_weights_tree;
-+
-+	int busy_queues;
-+	int busy_in_flight_queues;
-+	int const_seeky_busy_in_flight_queues;
-+	int wr_busy_queues;
-+	int queued;
-+	int rq_in_driver;
-+	int sync_flight;
-+
-+	int max_rq_in_driver;
-+	int hw_tag_samples;
-+	int hw_tag;
-+
-+	int budgets_assigned;
-+
-+	struct timer_list idle_slice_timer;
-+	struct work_struct unplug_work;
-+
-+	struct bfq_queue *in_service_queue;
-+	struct bfq_io_cq *in_service_bic;
-+
-+	sector_t last_position;
-+
-+	ktime_t last_budget_start;
-+	ktime_t last_idling_start;
-+	int peak_rate_samples;
-+	u64 peak_rate;
-+	int bfq_max_budget;
-+
-+	struct hlist_head group_list;
-+	struct list_head active_list;
-+	struct list_head idle_list;
-+
-+	unsigned int bfq_fifo_expire[2];
-+	unsigned int bfq_back_penalty;
-+	unsigned int bfq_back_max;
-+	unsigned int bfq_slice_idle;
-+	u64 bfq_class_idle_last_service;
-+
-+	int bfq_user_max_budget;
-+	int bfq_max_budget_async_rq;
-+	unsigned int bfq_timeout[2];
-+
-+	unsigned int bfq_coop_thresh;
-+	unsigned int bfq_failed_cooperations;
-+	unsigned int bfq_requests_within_timer;
-+
-+	unsigned long last_ins_in_burst;
-+	unsigned long bfq_burst_interval;
-+	int burst_size;
-+	unsigned long bfq_large_burst_thresh;
-+	bool large_burst;
-+	struct hlist_head burst_list;
-+
-+	bool low_latency;
-+
-+	/* parameters of the low_latency heuristics */
-+	unsigned int bfq_wr_coeff;
-+	unsigned int bfq_wr_max_time;
-+	unsigned int bfq_wr_rt_max_time;
-+	unsigned int bfq_wr_min_idle_time;
-+	unsigned long bfq_wr_min_inter_arr_async;
-+	unsigned int bfq_wr_max_softrt_rate;
-+	u64 RT_prod;
-+	enum bfq_device_speed device_speed;
-+
-+	struct bfq_queue oom_bfqq;
-+};
-+
-+enum bfqq_state_flags {
-+	BFQ_BFQQ_FLAG_busy = 0,		/* has requests or is in service */
-+	BFQ_BFQQ_FLAG_wait_request,	/* waiting for a request */
-+	BFQ_BFQQ_FLAG_must_alloc,	/* must be allowed rq alloc */
-+	BFQ_BFQQ_FLAG_fifo_expire,	/* FIFO checked in this slice */
-+	BFQ_BFQQ_FLAG_idle_window,	/* slice idling enabled */
-+	BFQ_BFQQ_FLAG_sync,		/* synchronous queue */
-+	BFQ_BFQQ_FLAG_budget_new,	/* no completion with this budget */
-+	BFQ_BFQQ_FLAG_IO_bound,		/*
-+					 * bfqq has timed-out at least once
-+					 * having consumed at most 2/10 of
-+					 * its budget
-+					 */
-+	BFQ_BFQQ_FLAG_in_large_burst,	/*
-+					 * bfqq activated in a large burst,
-+					 * see comments to bfq_handle_burst.
-+					 */
-+	BFQ_BFQQ_FLAG_constantly_seeky,	/*
-+					 * bfqq has proved to be slow and
-+					 * seeky until budget timeout
-+					 */
-+	BFQ_BFQQ_FLAG_softrt_update,	/*
-+					 * may need softrt-next-start
-+					 * update
-+					 */
-+};
-+
-+#define BFQ_BFQQ_FNS(name)						\
-+static void bfq_mark_bfqq_##name(struct bfq_queue *bfqq)		\
-+{									\
-+	(bfqq)->flags |= (1 << BFQ_BFQQ_FLAG_##name);			\
-+}									\
-+static void bfq_clear_bfqq_##name(struct bfq_queue *bfqq)		\
-+{									\
-+	(bfqq)->flags &= ~(1 << BFQ_BFQQ_FLAG_##name);			\
-+}									\
-+static int bfq_bfqq_##name(const struct bfq_queue *bfqq)		\
-+{									\
-+	return ((bfqq)->flags & (1 << BFQ_BFQQ_FLAG_##name)) != 0;	\
-+}
-+
-+BFQ_BFQQ_FNS(busy);
-+BFQ_BFQQ_FNS(wait_request);
-+BFQ_BFQQ_FNS(must_alloc);
-+BFQ_BFQQ_FNS(fifo_expire);
-+BFQ_BFQQ_FNS(idle_window);
-+BFQ_BFQQ_FNS(sync);
-+BFQ_BFQQ_FNS(budget_new);
-+BFQ_BFQQ_FNS(IO_bound);
-+BFQ_BFQQ_FNS(in_large_burst);
-+BFQ_BFQQ_FNS(constantly_seeky);
-+BFQ_BFQQ_FNS(softrt_update);
-+#undef BFQ_BFQQ_FNS
-+
-+/* Logging facilities. */
-+#define bfq_log_bfqq(bfqd, bfqq, fmt, args...) \
-+	blk_add_trace_msg((bfqd)->queue, "bfq%d " fmt, (bfqq)->pid, ##args)
-+
-+#define bfq_log(bfqd, fmt, args...) \
-+	blk_add_trace_msg((bfqd)->queue, "bfq " fmt, ##args)
-+
-+/* Expiration reasons. */
-+enum bfqq_expiration {
-+	BFQ_BFQQ_TOO_IDLE = 0,		/*
-+					 * queue has been idling for
-+					 * too long
-+					 */
-+	BFQ_BFQQ_BUDGET_TIMEOUT,	/* budget took too long to be used */
-+	BFQ_BFQQ_BUDGET_EXHAUSTED,	/* budget consumed */
-+	BFQ_BFQQ_NO_MORE_REQUESTS,	/* the queue has no more requests */
-+};
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+
-+struct bfqg_stats {
-+	/* total bytes transferred */
-+	struct blkg_rwstat		service_bytes;
-+	/* total IOs serviced, post merge */
-+	struct blkg_rwstat		serviced;
-+	/* number of ios merged */
-+	struct blkg_rwstat		merged;
-+	/* total time spent on device in ns, may not be accurate w/ queueing */
-+	struct blkg_rwstat		service_time;
-+	/* total time spent waiting in scheduler queue in ns */
-+	struct blkg_rwstat		wait_time;
-+	/* number of IOs queued up */
-+	struct blkg_rwstat		queued;
-+	/* total sectors transferred */
-+	struct blkg_stat		sectors;
-+	/* total disk time and nr sectors dispatched by this group */
-+	struct blkg_stat		time;
-+	/* time not charged to this cgroup */
-+	struct blkg_stat		unaccounted_time;
-+	/* sum of number of ios queued across all samples */
-+	struct blkg_stat		avg_queue_size_sum;
-+	/* count of samples taken for average */
-+	struct blkg_stat		avg_queue_size_samples;
-+	/* how many times this group has been removed from service tree */
-+	struct blkg_stat		dequeue;
-+	/* total time spent waiting for it to be assigned a timeslice. */
-+	struct blkg_stat		group_wait_time;
-+	/* time spent idling for this blkcg_gq */
-+	struct blkg_stat		idle_time;
-+	/* total time with empty current active q with other requests queued */
-+	struct blkg_stat		empty_time;
-+	/* fields after this shouldn't be cleared on stat reset */
-+	uint64_t			start_group_wait_time;
-+	uint64_t			start_idle_time;
-+	uint64_t			start_empty_time;
-+	uint16_t			flags;
-+};
-+
-+/*
-+ * struct bfq_group_data - per-blkcg storage for the blkio subsystem.
-+ *
-+ * @pd: @blkcg_policy_data that this structure inherits
-+ * @weight: weight of the bfq_group
-+ */
-+struct bfq_group_data {
-+	/* must be the first member */
-+	struct blkcg_policy_data pd;
-+
-+	unsigned short weight;
-+};
-+
-+/**
-+ * struct bfq_group - per (device, cgroup) data structure.
-+ * @entity: schedulable entity to insert into the parent group sched_data.
-+ * @sched_data: own sched_data, to contain child entities (they may be
-+ *              both bfq_queues and bfq_groups).
-+ * @bfqd_node: node to be inserted into the @bfqd->group_list list
-+ *             of the groups active on the same device; used for cleanup.
-+ * @bfqd: the bfq_data for the device this group acts upon.
-+ * @async_bfqq: array of async queues for all the tasks belonging to
-+ *              the group, one queue per ioprio value per ioprio_class,
-+ *              except for the idle class that has only one queue.
-+ * @async_idle_bfqq: async queue for the idle class (ioprio is ignored).
-+ * @my_entity: pointer to @entity, %NULL for the toplevel group; used
-+ *             to avoid too many special cases during group creation/
-+ *             migration.
-+ * @active_entities: number of active entities belonging to the group;
-+ *                   unused for the root group. Used to know whether there
-+ *                   are groups with more than one active @bfq_entity
-+ *                   (see the comments to the function
-+ *                   bfq_bfqq_must_not_expire()).
-+ *
-+ * Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
-+ * there is a set of bfq_groups, each one collecting the lower-level
-+ * entities belonging to the group that are acting on the same device.
-+ *
-+ * Locking works as follows:
-+ *    o @bfqd is protected by the queue lock, RCU is used to access it
-+ *      from the readers.
-+ *    o All the other fields are protected by the @bfqd queue lock.
-+ */
-+struct bfq_group {
-+	/* must be the first member */
-+	struct blkg_policy_data pd;
-+
-+	struct bfq_entity entity;
-+	struct bfq_sched_data sched_data;
-+
-+	struct hlist_node bfqd_node;
-+
-+	void *bfqd;
-+
-+	struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
-+	struct bfq_queue *async_idle_bfqq;
-+
-+	struct bfq_entity *my_entity;
-+
-+	int active_entities;
-+
-+	struct bfqg_stats stats;
-+	struct bfqg_stats dead_stats;	/* stats pushed from dead children */
-+};
-+
-+#else
-+struct bfq_group {
-+	struct bfq_sched_data sched_data;
-+
-+	struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
-+	struct bfq_queue *async_idle_bfqq;
-+};
-+#endif
-+
-+static struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity);
-+
-+static struct bfq_service_tree *
-+bfq_entity_service_tree(struct bfq_entity *entity)
-+{
-+	struct bfq_sched_data *sched_data = entity->sched_data;
-+	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+	unsigned int idx = bfqq ? bfqq->ioprio_class - 1 :
-+				  BFQ_DEFAULT_GRP_CLASS;
-+
-+	BUG_ON(idx >= BFQ_IOPRIO_CLASSES);
-+	BUG_ON(sched_data == NULL);
-+
-+	return sched_data->service_tree + idx;
-+}
-+
-+static struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync)
-+{
-+	return bic->bfqq[is_sync];
-+}
-+
-+static void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq,
-+			 bool is_sync)
-+{
-+	bic->bfqq[is_sync] = bfqq;
-+}
-+
-+static struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic)
-+{
-+	return bic->icq.q->elevator->elevator_data;
-+}
-+
-+/**
-+ * bfq_get_bfqd_locked - get a lock to a bfqd using a RCU protected pointer.
-+ * @ptr: a pointer to a bfqd.
-+ * @flags: storage for the flags to be saved.
-+ *
-+ * This function allows bfqg->bfqd to be protected by the
-+ * queue lock of the bfqd they reference; the pointer is dereferenced
-+ * under RCU, so the storage for bfqd is assured to be safe as long
-+ * as the RCU read side critical section does not end.  After the
-+ * bfqd->queue->queue_lock is taken the pointer is rechecked, to be
-+ * sure that no other writer accessed it.  If we raced with a writer,
-+ * the function returns NULL, with the queue unlocked, otherwise it
-+ * returns the dereferenced pointer, with the queue locked.
-+ */
-+static struct bfq_data *bfq_get_bfqd_locked(void **ptr, unsigned long *flags)
-+{
-+	struct bfq_data *bfqd;
-+
-+	rcu_read_lock();
-+	bfqd = rcu_dereference(*(struct bfq_data **)ptr);
-+
-+	if (bfqd != NULL) {
-+		spin_lock_irqsave(bfqd->queue->queue_lock, *flags);
-+		if (ptr == NULL)
-+			printk(KERN_CRIT "get_bfqd_locked pointer NULL\n");
-+		else if (*ptr == bfqd)
-+			goto out;
-+		spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
-+	}
-+
-+	bfqd = NULL;
-+out:
-+	rcu_read_unlock();
-+	return bfqd;
-+}
-+
-+static void bfq_put_bfqd_unlock(struct bfq_data *bfqd, unsigned long *flags)
-+{
-+	spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
-+}
-+
-+static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio);
-+static void bfq_put_queue(struct bfq_queue *bfqq);
-+static void bfq_dispatch_insert(struct request_queue *q, struct request *rq);
-+static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
-+				       struct bio *bio, int is_sync,
-+				       struct bfq_io_cq *bic, gfp_t gfp_mask);
-+static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
-+				    struct bfq_group *bfqg);
-+static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
-+static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
-+
-+#endif /* _BFQ_H */
--- 
-2.1.4
-

diff --git a/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch b/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch
deleted file mode 100644
index dac6db6..0000000
--- a/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch
+++ /dev/null
@@ -1,1097 +0,0 @@
-From 75c9c5ea340776c0a9e934581cf63cb963a33fd4 Mon Sep 17 00:00:00 2001
-From: Mauro Andreolini <mauro.andreolini@unimore.it>
-Date: Sun, 6 Sep 2015 16:09:05 +0200
-Subject: [PATCH 3/3] block, bfq: add Early Queue Merge (EQM) to BFQ-v7r9 for
- 4.2.0
-
-A set of processes may happen  to  perform interleaved reads, i.e., requests
-whose union would give rise to a  sequential read  pattern.  There are two
-typical  cases: in the first  case,   processes  read  fixed-size chunks of
-data at a fixed distance from each other, while in the second case processes
-may read variable-size chunks at  variable distances. The latter case occurs
-for  example with  QEMU, which  splits the  I/O generated  by the  guest into
-multiple chunks,  and lets these chunks  be served by a  pool of cooperating
-processes,  iteratively  assigning  the  next  chunk of  I/O  to  the first
-available  process. CFQ  uses actual  queue merging  for the  first type of
-processes, whereas it  uses preemption to get a sequential  read pattern out
-of the read requests  performed by the second type of  processes. In the end
-it uses  two different  mechanisms to  achieve the  same goal: boosting the
-throughput with interleaved I/O.
-
-This patch introduces  Early Queue Merge (EQM), a unified mechanism to get a
-sequential  read pattern  with both  types of  processes. The  main idea is
-checking newly arrived requests against the next request of the active queue
-both in case of actual request insert and in case of request merge. By doing
-so, both the types of processes can be handled by just merging their queues.
-EQM is  then simpler and  more compact than the  pair of mechanisms used in
-CFQ.
-
-Finally, EQM  also preserves the  typical low-latency properties of BFQ, by
-properly restoring the weight-raising state of a queue when it gets back to
-a non-merged state.
-
-Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
-Signed-off-by: Arianna Avanzini <avanzini@google.com>
-Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
----
- block/bfq-cgroup.c  |   4 +
- block/bfq-iosched.c | 684 ++++++++++++++++++++++++++++++++++++++++++++++++++--
- block/bfq.h         |  66 +++++
- 3 files changed, 740 insertions(+), 14 deletions(-)
-
-diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
-index c02d65a..bc34d7a 100644
---- a/block/bfq-cgroup.c
-+++ b/block/bfq-cgroup.c
-@@ -382,6 +382,7 @@ static void bfq_pd_init(struct blkcg_gq *blkg)
- 				   */
- 	bfqg->bfqd = bfqd;
- 	bfqg->active_entities = 0;
-+	bfqg->rq_pos_tree = RB_ROOT;
- 
- 	/* if the root_group does not exist, we are handling it right now */
- 	if (bfqd->root_group && bfqg != bfqd->root_group)
-@@ -484,6 +485,8 @@ static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
- 	return bfqg;
- }
- 
-+static void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
-+
- /**
-  * bfq_bfqq_move - migrate @bfqq to @bfqg.
-  * @bfqd: queue descriptor.
-@@ -531,6 +534,7 @@ static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
- 	bfqg_get(bfqg);
- 
- 	if (busy) {
-+		bfq_pos_tree_add_move(bfqd, bfqq);
- 		if (resume)
- 			bfq_activate_bfqq(bfqd, bfqq);
- 	}
-diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
-index 51d24dd..fcd6eea 100644
---- a/block/bfq-iosched.c
-+++ b/block/bfq-iosched.c
-@@ -296,6 +296,72 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd,
- 	}
- }
- 
-+static struct bfq_queue *
-+bfq_rq_pos_tree_lookup(struct bfq_data *bfqd, struct rb_root *root,
-+		     sector_t sector, struct rb_node **ret_parent,
-+		     struct rb_node ***rb_link)
-+{
-+	struct rb_node **p, *parent;
-+	struct bfq_queue *bfqq = NULL;
-+
-+	parent = NULL;
-+	p = &root->rb_node;
-+	while (*p) {
-+		struct rb_node **n;
-+
-+		parent = *p;
-+		bfqq = rb_entry(parent, struct bfq_queue, pos_node);
-+
-+		/*
-+		 * Sort strictly based on sector. Smallest to the left,
-+		 * largest to the right.
-+		 */
-+		if (sector > blk_rq_pos(bfqq->next_rq))
-+			n = &(*p)->rb_right;
-+		else if (sector < blk_rq_pos(bfqq->next_rq))
-+			n = &(*p)->rb_left;
-+		else
-+			break;
-+		p = n;
-+		bfqq = NULL;
-+	}
-+
-+	*ret_parent = parent;
-+	if (rb_link)
-+		*rb_link = p;
-+
-+	bfq_log(bfqd, "rq_pos_tree_lookup %llu: returning %d",
-+		(long long unsigned)sector,
-+		bfqq ? bfqq->pid : 0);
-+
-+	return bfqq;
-+}
-+
-+static void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+	struct rb_node **p, *parent;
-+	struct bfq_queue *__bfqq;
-+
-+	if (bfqq->pos_root) {
-+		rb_erase(&bfqq->pos_node, bfqq->pos_root);
-+		bfqq->pos_root = NULL;
-+	}
-+
-+	if (bfq_class_idle(bfqq))
-+		return;
-+	if (!bfqq->next_rq)
-+		return;
-+
-+	bfqq->pos_root = &bfq_bfqq_to_bfqg(bfqq)->rq_pos_tree;
-+	__bfqq = bfq_rq_pos_tree_lookup(bfqd, bfqq->pos_root,
-+			blk_rq_pos(bfqq->next_rq), &parent, &p);
-+	if (!__bfqq) {
-+		rb_link_node(&bfqq->pos_node, parent, p);
-+		rb_insert_color(&bfqq->pos_node, bfqq->pos_root);
-+	} else
-+		bfqq->pos_root = NULL;
-+}
-+
- /*
-  * Tell whether there are active queues or groups with differentiated weights.
-  */
-@@ -528,6 +594,57 @@ static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
- 	return dur;
- }
- 
-+static unsigned bfq_bfqq_cooperations(struct bfq_queue *bfqq)
-+{
-+	return bfqq->bic ? bfqq->bic->cooperations : 0;
-+}
-+
-+static void
-+bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
-+{
-+	if (bic->saved_idle_window)
-+		bfq_mark_bfqq_idle_window(bfqq);
-+	else
-+		bfq_clear_bfqq_idle_window(bfqq);
-+	if (bic->saved_IO_bound)
-+		bfq_mark_bfqq_IO_bound(bfqq);
-+	else
-+		bfq_clear_bfqq_IO_bound(bfqq);
-+	/* Assuming that the flag in_large_burst is already correctly set */
-+	if (bic->wr_time_left && bfqq->bfqd->low_latency &&
-+	    !bfq_bfqq_in_large_burst(bfqq) &&
-+	    bic->cooperations < bfqq->bfqd->bfq_coop_thresh) {
-+		/*
-+		 * Start a weight raising period with the duration given by
-+		 * the raising_time_left snapshot.
-+		 */
-+		if (bfq_bfqq_busy(bfqq))
-+			bfqq->bfqd->wr_busy_queues++;
-+		bfqq->wr_coeff = bfqq->bfqd->bfq_wr_coeff;
-+		bfqq->wr_cur_max_time = bic->wr_time_left;
-+		bfqq->last_wr_start_finish = jiffies;
-+		bfqq->entity.prio_changed = 1;
-+	}
-+	/*
-+	 * Clear wr_time_left to prevent bfq_bfqq_save_state() from
-+	 * getting confused about the queue's need of a weight-raising
-+	 * period.
-+	 */
-+	bic->wr_time_left = 0;
-+}
-+
-+static int bfqq_process_refs(struct bfq_queue *bfqq)
-+{
-+	int process_refs, io_refs;
-+
-+	lockdep_assert_held(bfqq->bfqd->queue->queue_lock);
-+
-+	io_refs = bfqq->allocated[READ] + bfqq->allocated[WRITE];
-+	process_refs = atomic_read(&bfqq->ref) - io_refs - bfqq->entity.on_st;
-+	BUG_ON(process_refs < 0);
-+	return process_refs;
-+}
-+
- /* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
- static void bfq_reset_burst_list(struct bfq_data *bfqd, struct bfq_queue *bfqq)
- {
-@@ -764,8 +881,14 @@ static void bfq_add_request(struct request *rq)
- 	BUG_ON(!next_rq);
- 	bfqq->next_rq = next_rq;
- 
-+	/*
-+	 * Adjust priority tree position, if next_rq changes.
-+	 */
-+	if (prev != bfqq->next_rq)
-+		bfq_pos_tree_add_move(bfqd, bfqq);
-+
- 	if (!bfq_bfqq_busy(bfqq)) {
--		bool soft_rt, in_burst,
-+		bool soft_rt, coop_or_in_burst,
- 		     idle_for_long_time = time_is_before_jiffies(
- 						bfqq->budget_timeout +
- 						bfqd->bfq_wr_min_idle_time);
-@@ -793,11 +916,12 @@ static void bfq_add_request(struct request *rq)
- 				bfqd->last_ins_in_burst = jiffies;
- 		}
- 
--		in_burst = bfq_bfqq_in_large_burst(bfqq);
-+		coop_or_in_burst = bfq_bfqq_in_large_burst(bfqq) ||
-+			bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh;
- 		soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
--			!in_burst &&
-+			!coop_or_in_burst &&
- 			time_is_before_jiffies(bfqq->soft_rt_next_start);
--		interactive = !in_burst && idle_for_long_time;
-+		interactive = !coop_or_in_burst && idle_for_long_time;
- 		entity->budget = max_t(unsigned long, bfqq->max_budget,
- 				       bfq_serv_to_charge(next_rq, bfqq));
- 
-@@ -816,6 +940,9 @@ static void bfq_add_request(struct request *rq)
- 		if (!bfqd->low_latency)
- 			goto add_bfqq_busy;
- 
-+		if (bfq_bfqq_just_split(bfqq))
-+			goto set_prio_changed;
-+
- 		/*
- 		 * If the queue:
- 		 * - is not being boosted,
-@@ -840,7 +967,7 @@ static void bfq_add_request(struct request *rq)
- 		} else if (old_wr_coeff > 1) {
- 			if (interactive)
- 				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
--			else if (in_burst ||
-+			else if (coop_or_in_burst ||
- 				 (bfqq->wr_cur_max_time ==
- 				  bfqd->bfq_wr_rt_max_time &&
- 				  !soft_rt)) {
-@@ -905,6 +1032,7 @@ static void bfq_add_request(struct request *rq)
- 					bfqd->bfq_wr_rt_max_time;
- 			}
- 		}
-+set_prio_changed:
- 		if (old_wr_coeff != bfqq->wr_coeff)
- 			entity->prio_changed = 1;
- add_bfqq_busy:
-@@ -1047,6 +1175,15 @@ static void bfq_merged_request(struct request_queue *q, struct request *req,
- 					 bfqd->last_position);
- 		BUG_ON(!next_rq);
- 		bfqq->next_rq = next_rq;
-+		/*
-+		 * If next_rq changes, update both the queue's budget to
-+		 * fit the new request and the queue's position in its
-+		 * rq_pos_tree.
-+		 */
-+		if (prev != bfqq->next_rq) {
-+			bfq_updated_next_req(bfqd, bfqq);
-+			bfq_pos_tree_add_move(bfqd, bfqq);
-+		}
- 	}
- }
- 
-@@ -1129,11 +1266,343 @@ static void bfq_end_wr(struct bfq_data *bfqd)
- 	spin_unlock_irq(bfqd->queue->queue_lock);
- }
- 
-+static sector_t bfq_io_struct_pos(void *io_struct, bool request)
-+{
-+	if (request)
-+		return blk_rq_pos(io_struct);
-+	else
-+		return ((struct bio *)io_struct)->bi_iter.bi_sector;
-+}
-+
-+static int bfq_rq_close_to_sector(void *io_struct, bool request,
-+				  sector_t sector)
-+{
-+	return abs64(bfq_io_struct_pos(io_struct, request) - sector) <=
-+	       BFQQ_SEEK_THR;
-+}
-+
-+static struct bfq_queue *bfqq_find_close(struct bfq_data *bfqd,
-+					 struct bfq_queue *bfqq,
-+					 sector_t sector)
-+{
-+	struct rb_root *root = &bfq_bfqq_to_bfqg(bfqq)->rq_pos_tree;
-+	struct rb_node *parent, *node;
-+	struct bfq_queue *__bfqq;
-+
-+	if (RB_EMPTY_ROOT(root))
-+		return NULL;
-+
-+	/*
-+	 * First, if we find a request starting at the end of the last
-+	 * request, choose it.
-+	 */
-+	__bfqq = bfq_rq_pos_tree_lookup(bfqd, root, sector, &parent, NULL);
-+	if (__bfqq)
-+		return __bfqq;
-+
-+	/*
-+	 * If the exact sector wasn't found, the parent of the NULL leaf
-+	 * will contain the closest sector (rq_pos_tree sorted by
-+	 * next_request position).
-+	 */
-+	__bfqq = rb_entry(parent, struct bfq_queue, pos_node);
-+	if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
-+		return __bfqq;
-+
-+	if (blk_rq_pos(__bfqq->next_rq) < sector)
-+		node = rb_next(&__bfqq->pos_node);
-+	else
-+		node = rb_prev(&__bfqq->pos_node);
-+	if (!node)
-+		return NULL;
-+
-+	__bfqq = rb_entry(node, struct bfq_queue, pos_node);
-+	if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
-+		return __bfqq;
-+
-+	return NULL;
-+}
-+
-+static struct bfq_queue *bfq_find_close_cooperator(struct bfq_data *bfqd,
-+						   struct bfq_queue *cur_bfqq,
-+						   sector_t sector)
-+{
-+	struct bfq_queue *bfqq;
-+
-+	/*
-+	 * We should notice if some of the queues are cooperating, e.g.
-+	 * working closely on the same area of the disk. In that case,
-+	 * we can group them together and don't waste time idling.
-+	 */
-+	bfqq = bfqq_find_close(bfqd, cur_bfqq, sector);
-+	if (!bfqq || bfqq == cur_bfqq)
-+		return NULL;
-+
-+	return bfqq;
-+}
-+
-+static struct bfq_queue *
-+bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
-+{
-+	int process_refs, new_process_refs;
-+	struct bfq_queue *__bfqq;
-+
-+	/*
-+	 * If there are no process references on the new_bfqq, then it is
-+	 * unsafe to follow the ->new_bfqq chain as other bfqq's in the chain
-+	 * may have dropped their last reference (not just their last process
-+	 * reference).
-+	 */
-+	if (!bfqq_process_refs(new_bfqq))
-+		return NULL;
-+
-+	/* Avoid a circular list and skip interim queue merges. */
-+	while ((__bfqq = new_bfqq->new_bfqq)) {
-+		if (__bfqq == bfqq)
-+			return NULL;
-+		new_bfqq = __bfqq;
-+	}
-+
-+	process_refs = bfqq_process_refs(bfqq);
-+	new_process_refs = bfqq_process_refs(new_bfqq);
-+	/*
-+	 * If the process for the bfqq has gone away, there is no
-+	 * sense in merging the queues.
-+	 */
-+	if (process_refs == 0 || new_process_refs == 0)
-+		return NULL;
-+
-+	bfq_log_bfqq(bfqq->bfqd, bfqq, "scheduling merge with queue %d",
-+		new_bfqq->pid);
-+
-+	/*
-+	 * Merging is just a redirection: the requests of the process
-+	 * owning one of the two queues are redirected to the other queue.
-+	 * The latter queue, in its turn, is set as shared if this is the
-+	 * first time that the requests of some process are redirected to
-+	 * it.
-+	 *
-+	 * We redirect bfqq to new_bfqq and not the opposite, because we
-+	 * are in the context of the process owning bfqq, hence we have
-+	 * the io_cq of this process. So we can immediately configure this
-+	 * io_cq to redirect the requests of the process to new_bfqq.
-+	 *
-+	 * NOTE, even if new_bfqq coincides with the in-service queue, the
-+	 * io_cq of new_bfqq is not available, because, if the in-service
-+	 * queue is shared, bfqd->in_service_bic may not point to the
-+	 * io_cq of the in-service queue.
-+	 * Redirecting the requests of the process owning bfqq to the
-+	 * currently in-service queue is in any case the best option, as
-+	 * we feed the in-service queue with new requests close to the
-+	 * last request served and, by doing so, hopefully increase the
-+	 * throughput.
-+	 */
-+	bfqq->new_bfqq = new_bfqq;
-+	atomic_add(process_refs, &new_bfqq->ref);
-+	return new_bfqq;
-+}
-+
-+static bool bfq_may_be_close_cooperator(struct bfq_queue *bfqq,
-+					struct bfq_queue *new_bfqq)
-+{
-+	if (WARN_ON(bfqq->entity.parent != new_bfqq->entity.parent))
-+		return false;
-+
-+	if (bfq_class_idle(bfqq) || bfq_class_idle(new_bfqq) ||
-+	    (bfqq->ioprio_class != new_bfqq->ioprio_class))
-+		return false;
-+
-+	/*
-+	 * If either of the queues has already been detected as seeky,
-+	 * then merging it with the other queue is unlikely to lead to
-+	 * sequential I/O.
-+	 */
-+	if (BFQQ_SEEKY(bfqq) || BFQQ_SEEKY(new_bfqq))
-+		return false;
-+
-+	/*
-+	 * Interleaved I/O is known to be done by (some) applications
-+	 * only for reads, so it does not make sense to merge async
-+	 * queues.
-+	 */
-+	if (!bfq_bfqq_sync(bfqq) || !bfq_bfqq_sync(new_bfqq))
-+		return false;
-+
-+	return true;
-+}
-+
-+/*
-+ * Attempt to schedule a merge of bfqq with the currently in-service queue
-+ * or with a close queue among the scheduled queues.
-+ * Return NULL if no merge was scheduled, a pointer to the shared bfq_queue
-+ * structure otherwise.
-+ *
-+ * The OOM queue is not allowed to participate in cooperation: in fact, since
-+ * the requests temporarily redirected to the OOM queue could be redirected
-+ * again to dedicated queues at any time, the state needed to correctly
-+ * handle merging with the OOM queue would be quite complex and expensive
-+ * to maintain. Besides, in such a critical condition as an out of memory,
-+ * the benefits of queue merging may be little relevant, or even negligible.
-+ */
-+static struct bfq_queue *
-+bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+		     void *io_struct, bool request)
-+{
-+	struct bfq_queue *in_service_bfqq, *new_bfqq;
-+
-+	if (bfqq->new_bfqq)
-+		return bfqq->new_bfqq;
-+	if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq))
-+		return NULL;
-+	/* If device has only one backlogged bfq_queue, don't search. */
-+	if (bfqd->busy_queues == 1)
-+		return NULL;
-+
-+	in_service_bfqq = bfqd->in_service_queue;
-+
-+	if (!in_service_bfqq || in_service_bfqq == bfqq ||
-+	    !bfqd->in_service_bic ||
-+	    unlikely(in_service_bfqq == &bfqd->oom_bfqq))
-+		goto check_scheduled;
-+
-+	if (bfq_rq_close_to_sector(io_struct, request, bfqd->last_position) &&
-+	    bfq_may_be_close_cooperator(bfqq, in_service_bfqq)) {
-+		new_bfqq = bfq_setup_merge(bfqq, in_service_bfqq);
-+		if (new_bfqq)
-+			return new_bfqq;
-+	}
-+	/*
-+	 * Check whether there is a cooperator among currently scheduled
-+	 * queues. The only thing we need is that the bio/request is not
-+	 * NULL, as we need it to establish whether a cooperator exists.
-+	 */
-+check_scheduled:
-+	new_bfqq = bfq_find_close_cooperator(bfqd, bfqq,
-+			bfq_io_struct_pos(io_struct, request));
-+	if (new_bfqq && likely(new_bfqq != &bfqd->oom_bfqq) &&
-+	    bfq_may_be_close_cooperator(bfqq, new_bfqq))
-+		return bfq_setup_merge(bfqq, new_bfqq);
-+
-+	return NULL;
-+}
-+
-+static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
-+{
-+	/*
-+	 * If !bfqq->bic, the queue is already shared or its requests
-+	 * have already been redirected to a shared queue; both idle window
-+	 * and weight raising state have already been saved. Do nothing.
-+	 */
-+	if (!bfqq->bic)
-+		return;
-+	if (bfqq->bic->wr_time_left)
-+		/*
-+		 * This is the queue of a just-started process, and would
-+		 * deserve weight raising: we set wr_time_left to the full
-+		 * weight-raising duration to trigger weight-raising when
-+		 * and if the queue is split and the first request of the
-+		 * queue is enqueued.
-+		 */
-+		bfqq->bic->wr_time_left = bfq_wr_duration(bfqq->bfqd);
-+	else if (bfqq->wr_coeff > 1) {
-+		unsigned long wr_duration =
-+			jiffies - bfqq->last_wr_start_finish;
-+		/*
-+		 * It may happen that a queue's weight raising period lasts
-+		 * longer than its wr_cur_max_time, as weight raising is
-+		 * handled only when a request is enqueued or dispatched (it
-+		 * does not use any timer). If the weight raising period is
-+		 * about to end, don't save it.
-+		 */
-+		if (bfqq->wr_cur_max_time <= wr_duration)
-+			bfqq->bic->wr_time_left = 0;
-+		else
-+			bfqq->bic->wr_time_left =
-+				bfqq->wr_cur_max_time - wr_duration;
-+		/*
-+		 * The bfq_queue is becoming shared or the requests of the
-+		 * process owning the queue are being redirected to a shared
-+		 * queue. Stop the weight raising period of the queue, as in
-+		 * both cases it should not be owned by an interactive or
-+		 * soft real-time application.
-+		 */
-+		bfq_bfqq_end_wr(bfqq);
-+	} else
-+		bfqq->bic->wr_time_left = 0;
-+	bfqq->bic->saved_idle_window = bfq_bfqq_idle_window(bfqq);
-+	bfqq->bic->saved_IO_bound = bfq_bfqq_IO_bound(bfqq);
-+	bfqq->bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
-+	bfqq->bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node);
-+	bfqq->bic->cooperations++;
-+	bfqq->bic->failed_cooperations = 0;
-+}
-+
-+static void bfq_get_bic_reference(struct bfq_queue *bfqq)
-+{
-+	/*
-+	 * If bfqq->bic has a non-NULL value, the bic to which it belongs
-+	 * is about to begin using a shared bfq_queue.
-+	 */
-+	if (bfqq->bic)
-+		atomic_long_inc(&bfqq->bic->icq.ioc->refcount);
-+}
-+
-+static void
-+bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
-+		struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
-+{
-+	bfq_log_bfqq(bfqd, bfqq, "merging with queue %lu",
-+		(long unsigned)new_bfqq->pid);
-+	/* Save weight raising and idle window of the merged queues */
-+	bfq_bfqq_save_state(bfqq);
-+	bfq_bfqq_save_state(new_bfqq);
-+	if (bfq_bfqq_IO_bound(bfqq))
-+		bfq_mark_bfqq_IO_bound(new_bfqq);
-+	bfq_clear_bfqq_IO_bound(bfqq);
-+	/*
-+	 * Grab a reference to the bic, to prevent it from being destroyed
-+	 * before being possibly touched by a bfq_split_bfqq().
-+	 */
-+	bfq_get_bic_reference(bfqq);
-+	bfq_get_bic_reference(new_bfqq);
-+	/*
-+	 * Merge queues (that is, let bic redirect its requests to new_bfqq)
-+	 */
-+	bic_set_bfqq(bic, new_bfqq, 1);
-+	bfq_mark_bfqq_coop(new_bfqq);
-+	/*
-+	 * new_bfqq now belongs to at least two bics (it is a shared queue):
-+	 * set new_bfqq->bic to NULL. bfqq either:
-+	 * - does not belong to any bic any more, and hence bfqq->bic must
-+	 *   be set to NULL, or
-+	 * - is a queue whose owning bics have already been redirected to a
-+	 *   different queue, hence the queue is destined to not belong to
-+	 *   any bic soon and bfqq->bic is already NULL (therefore the next
-+	 *   assignment causes no harm).
-+	 */
-+	new_bfqq->bic = NULL;
-+	bfqq->bic = NULL;
-+	bfq_put_queue(bfqq);
-+}
-+
-+static void bfq_bfqq_increase_failed_cooperations(struct bfq_queue *bfqq)
-+{
-+	struct bfq_io_cq *bic = bfqq->bic;
-+	struct bfq_data *bfqd = bfqq->bfqd;
-+
-+	if (bic && bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh) {
-+		bic->failed_cooperations++;
-+		if (bic->failed_cooperations >= bfqd->bfq_failed_cooperations)
-+			bic->cooperations = 0;
-+	}
-+}
-+
- static int bfq_allow_merge(struct request_queue *q, struct request *rq,
- 			   struct bio *bio)
- {
- 	struct bfq_data *bfqd = q->elevator->elevator_data;
- 	struct bfq_io_cq *bic;
-+	struct bfq_queue *bfqq, *new_bfqq;
- 
- 	/*
- 	 * Disallow merge of a sync bio into an async request.
-@@ -1150,7 +1619,26 @@ static int bfq_allow_merge(struct request_queue *q, struct request *rq,
- 	if (!bic)
- 		return 0;
- 
--	return bic_to_bfqq(bic, bfq_bio_sync(bio)) == RQ_BFQQ(rq);
-+	bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
-+	/*
-+	 * We take advantage of this function to perform an early merge
-+	 * of the queues of possible cooperating processes.
-+	 */
-+	if (bfqq) {
-+		new_bfqq = bfq_setup_cooperator(bfqd, bfqq, bio, false);
-+		if (new_bfqq) {
-+			bfq_merge_bfqqs(bfqd, bic, bfqq, new_bfqq);
-+			/*
-+			 * If we get here, the bio will be queued in the
-+			 * shared queue, i.e., new_bfqq, so use new_bfqq
-+			 * to decide whether bio and rq can be merged.
-+			 */
-+			bfqq = new_bfqq;
-+		} else
-+			bfq_bfqq_increase_failed_cooperations(bfqq);
-+	}
-+
-+	return bfqq == RQ_BFQQ(rq);
- }
- 
- static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
-@@ -1349,6 +1837,15 @@ static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
- 
- 	__bfq_bfqd_reset_in_service(bfqd);
- 
-+	/*
-+	 * If this bfqq is shared between multiple processes, check
-+	 * to make sure that those processes are still issuing I/Os
-+	 * within the mean seek distance. If not, it may be time to
-+	 * break the queues apart again.
-+	 */
-+	if (bfq_bfqq_coop(bfqq) && BFQQ_SEEKY(bfqq))
-+		bfq_mark_bfqq_split_coop(bfqq);
-+
- 	if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
- 		/*
- 		 * Overloading budget_timeout field to store the time
-@@ -1357,8 +1854,13 @@ static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
- 		 */
- 		bfqq->budget_timeout = jiffies;
- 		bfq_del_bfqq_busy(bfqd, bfqq, 1);
--	} else
-+	} else {
- 		bfq_activate_bfqq(bfqd, bfqq);
-+		/*
-+		 * Resort priority tree of potential close cooperators.
-+		 */
-+		bfq_pos_tree_add_move(bfqd, bfqq);
-+	}
- }
- 
- /**
-@@ -2242,10 +2744,12 @@ static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
- 		/*
- 		 * If the queue was activated in a burst, or
- 		 * too much time has elapsed from the beginning
--		 * of this weight-raising period, then end weight
--		 * raising.
-+		 * of this weight-raising period, or the queue has
-+		 * exceeded the acceptable number of cooperations,
-+		 * then end weight raising.
- 		 */
- 		if (bfq_bfqq_in_large_burst(bfqq) ||
-+		    bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh ||
- 		    time_is_before_jiffies(bfqq->last_wr_start_finish +
- 					   bfqq->wr_cur_max_time)) {
- 			bfqq->last_wr_start_finish = jiffies;
-@@ -2474,6 +2978,25 @@ static void bfq_put_queue(struct bfq_queue *bfqq)
- #endif
- }
- 
-+static void bfq_put_cooperator(struct bfq_queue *bfqq)
-+{
-+	struct bfq_queue *__bfqq, *next;
-+
-+	/*
-+	 * If this queue was scheduled to merge with another queue, be
-+	 * sure to drop the reference taken on that queue (and others in
-+	 * the merge chain). See bfq_setup_merge and bfq_merge_bfqqs.
-+	 */
-+	__bfqq = bfqq->new_bfqq;
-+	while (__bfqq) {
-+		if (__bfqq == bfqq)
-+			break;
-+		next = __bfqq->new_bfqq;
-+		bfq_put_queue(__bfqq);
-+		__bfqq = next;
-+	}
-+}
-+
- static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
- {
- 	if (bfqq == bfqd->in_service_queue) {
-@@ -2484,6 +3007,8 @@ static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
- 	bfq_log_bfqq(bfqd, bfqq, "exit_bfqq: %p, %d", bfqq,
- 		     atomic_read(&bfqq->ref));
- 
-+	bfq_put_cooperator(bfqq);
-+
- 	bfq_put_queue(bfqq);
- }
- 
-@@ -2492,6 +3017,25 @@ static void bfq_init_icq(struct io_cq *icq)
- 	struct bfq_io_cq *bic = icq_to_bic(icq);
- 
- 	bic->ttime.last_end_request = jiffies;
-+	/*
-+	 * A newly created bic indicates that the process has just
-+	 * started doing I/O, and is probably mapping into memory its
-+	 * executable and libraries: it definitely needs weight raising.
-+	 * There is however the possibility that the process performs,
-+	 * for a while, I/O close to some other process. EQM intercepts
-+	 * this behavior and may merge the queue corresponding to the
-+	 * process  with some other queue, BEFORE the weight of the queue
-+	 * is raised. Merged queues are not weight-raised (they are assumed
-+	 * to belong to processes that benefit only from high throughput).
-+	 * If the merge is basically the consequence of an accident, then
-+	 * the queue will be split soon and will get back its old weight.
-+	 * It is then important to write down somewhere that this queue
-+	 * does need weight raising, even if it did not make it to get its
-+	 * weight raised before being merged. To this purpose, we overload
-+	 * the field wr_time_left and assign 1 to it, to mark the queue
-+	 * as needing weight raising.
-+	 */
-+	bic->wr_time_left = 1;
- }
- 
- static void bfq_exit_icq(struct io_cq *icq)
-@@ -2505,6 +3049,13 @@ static void bfq_exit_icq(struct io_cq *icq)
- 	}
- 
- 	if (bic->bfqq[BLK_RW_SYNC]) {
-+		/*
-+		 * If the bic is using a shared queue, put the reference
-+		 * taken on the io_context when the bic started using a
-+		 * shared bfq_queue.
-+		 */
-+		if (bfq_bfqq_coop(bic->bfqq[BLK_RW_SYNC]))
-+			put_io_context(icq->ioc);
- 		bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
- 		bic->bfqq[BLK_RW_SYNC] = NULL;
- 	}
-@@ -2809,6 +3360,10 @@ static void bfq_update_idle_window(struct bfq_data *bfqd,
- 	if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
- 		return;
- 
-+	/* Idle window just restored, statistics are meaningless. */
-+	if (bfq_bfqq_just_split(bfqq))
-+		return;
-+
- 	enable_idle = bfq_bfqq_idle_window(bfqq);
- 
- 	if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
-@@ -2856,6 +3411,7 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
- 	if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
- 	    !BFQQ_SEEKY(bfqq))
- 		bfq_update_idle_window(bfqd, bfqq, bic);
-+	bfq_clear_bfqq_just_split(bfqq);
- 
- 	bfq_log_bfqq(bfqd, bfqq,
- 		     "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
-@@ -2920,12 +3476,47 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
- static void bfq_insert_request(struct request_queue *q, struct request *rq)
- {
- 	struct bfq_data *bfqd = q->elevator->elevator_data;
--	struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+	struct bfq_queue *bfqq = RQ_BFQQ(rq), *new_bfqq;
- 
- 	assert_spin_locked(bfqd->queue->queue_lock);
- 
-+	/*
-+	 * An unplug may trigger a requeue of a request from the device
-+	 * driver: make sure we are in process context while trying to
-+	 * merge two bfq_queues.
-+	 */
-+	if (!in_interrupt()) {
-+		new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true);
-+		if (new_bfqq) {
-+			if (bic_to_bfqq(RQ_BIC(rq), 1) != bfqq)
-+				new_bfqq = bic_to_bfqq(RQ_BIC(rq), 1);
-+			/*
-+			 * Release the request's reference to the old bfqq
-+			 * and make sure one is taken to the shared queue.
-+			 */
-+			new_bfqq->allocated[rq_data_dir(rq)]++;
-+			bfqq->allocated[rq_data_dir(rq)]--;
-+			atomic_inc(&new_bfqq->ref);
-+			bfq_put_queue(bfqq);
-+			if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq)
-+				bfq_merge_bfqqs(bfqd, RQ_BIC(rq),
-+						bfqq, new_bfqq);
-+			rq->elv.priv[1] = new_bfqq;
-+			bfqq = new_bfqq;
-+		} else
-+			bfq_bfqq_increase_failed_cooperations(bfqq);
-+	}
-+
- 	bfq_add_request(rq);
- 
-+	/*
-+	 * Here a newly-created bfq_queue has already started a weight-raising
-+	 * period: clear wr_time_left to prevent bfq_bfqq_save_state()
-+	 * from assigning it a full weight-raising period. See the detailed
-+	 * comments about this field in bfq_init_icq().
-+	 */
-+	if (bfqq->bic)
-+		bfqq->bic->wr_time_left = 0;
- 	rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
- 	list_add_tail(&rq->queuelist, &bfqq->fifo);
- 
-@@ -3094,6 +3685,32 @@ static void bfq_put_request(struct request *rq)
- }
- 
- /*
-+ * Returns NULL if a new bfqq should be allocated, or the old bfqq if this
-+ * was the last process referring to said bfqq.
-+ */
-+static struct bfq_queue *
-+bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
-+{
-+	bfq_log_bfqq(bfqq->bfqd, bfqq, "splitting queue");
-+
-+	put_io_context(bic->icq.ioc);
-+
-+	if (bfqq_process_refs(bfqq) == 1) {
-+		bfqq->pid = current->pid;
-+		bfq_clear_bfqq_coop(bfqq);
-+		bfq_clear_bfqq_split_coop(bfqq);
-+		return bfqq;
-+	}
-+
-+	bic_set_bfqq(bic, NULL, 1);
-+
-+	bfq_put_cooperator(bfqq);
-+
-+	bfq_put_queue(bfqq);
-+	return NULL;
-+}
-+
-+/*
-  * Allocate bfq data structures associated with this request.
-  */
- static int bfq_set_request(struct request_queue *q, struct request *rq,
-@@ -3105,6 +3722,7 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
- 	const int is_sync = rq_is_sync(rq);
- 	struct bfq_queue *bfqq;
- 	unsigned long flags;
-+	bool split = false;
- 
- 	might_sleep_if(gfp_mask & __GFP_WAIT);
- 
-@@ -3117,15 +3735,30 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
- 
- 	bfq_bic_update_cgroup(bic, bio);
- 
-+new_queue:
- 	bfqq = bic_to_bfqq(bic, is_sync);
- 	if (!bfqq || bfqq == &bfqd->oom_bfqq) {
- 		bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, gfp_mask);
- 		bic_set_bfqq(bic, bfqq, is_sync);
--		if (is_sync) {
--			if (bfqd->large_burst)
-+		if (split && is_sync) {
-+			if ((bic->was_in_burst_list && bfqd->large_burst) ||
-+			    bic->saved_in_large_burst)
- 				bfq_mark_bfqq_in_large_burst(bfqq);
--			else
--				bfq_clear_bfqq_in_large_burst(bfqq);
-+			else {
-+			    bfq_clear_bfqq_in_large_burst(bfqq);
-+			    if (bic->was_in_burst_list)
-+			       hlist_add_head(&bfqq->burst_list_node,
-+				              &bfqd->burst_list);
-+			}
-+		}
-+	} else {
-+		/* If the queue was seeky for too long, break it apart. */
-+		if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq)) {
-+			bfq_log_bfqq(bfqd, bfqq, "breaking apart bfqq");
-+			bfqq = bfq_split_bfqq(bic, bfqq);
-+			split = true;
-+			if (!bfqq)
-+				goto new_queue;
- 		}
- 	}
- 
-@@ -3137,6 +3770,26 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
- 	rq->elv.priv[0] = bic;
- 	rq->elv.priv[1] = bfqq;
- 
-+	/*
-+	 * If a bfq_queue has only one process reference, it is owned
-+	 * by only one bfq_io_cq: we can set the bic field of the
-+	 * bfq_queue to the address of that structure. Also, if the
-+	 * queue has just been split, mark a flag so that the
-+	 * information is available to the other scheduler hooks.
-+	 */
-+	if (likely(bfqq != &bfqd->oom_bfqq) && bfqq_process_refs(bfqq) == 1) {
-+		bfqq->bic = bic;
-+		if (split) {
-+			bfq_mark_bfqq_just_split(bfqq);
-+			/*
-+			 * If the queue has just been split from a shared
-+			 * queue, restore the idle window and the possible
-+			 * weight raising period.
-+			 */
-+			bfq_bfqq_resume_state(bfqq, bic);
-+		}
-+	}
-+
- 	spin_unlock_irqrestore(q->queue_lock, flags);
- 
- 	return 0;
-@@ -3289,6 +3942,7 @@ static void bfq_init_root_group(struct bfq_group *root_group,
- 	root_group->my_entity = NULL;
- 	root_group->bfqd = bfqd;
- #endif
-+	root_group->rq_pos_tree = RB_ROOT;
- 	for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
- 		root_group->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
- }
-@@ -3369,6 +4023,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
- 	bfqd->bfq_timeout[BLK_RW_ASYNC] = bfq_timeout_async;
- 	bfqd->bfq_timeout[BLK_RW_SYNC] = bfq_timeout_sync;
- 
-+	bfqd->bfq_coop_thresh = 2;
-+	bfqd->bfq_failed_cooperations = 7000;
- 	bfqd->bfq_requests_within_timer = 120;
- 
- 	bfqd->bfq_large_burst_thresh = 11;
-diff --git a/block/bfq.h b/block/bfq.h
-index ca5ac20..320c438 100644
---- a/block/bfq.h
-+++ b/block/bfq.h
-@@ -183,6 +183,8 @@ struct bfq_group;
-  *                    ioprio_class value.
-  * @new_bfqq: shared bfq_queue if queue is cooperating with
-  *           one or more other queues.
-+ * @pos_node: request-position tree member (see bfq_group's @rq_pos_tree).
-+ * @pos_root: request-position tree root (see bfq_group's @rq_pos_tree).
-  * @sort_list: sorted list of pending requests.
-  * @next_rq: if fifo isn't expired, next request to serve.
-  * @queued: nr of requests queued in @sort_list.
-@@ -304,6 +306,26 @@ struct bfq_ttime {
-  * @ttime: associated @bfq_ttime struct
-  * @ioprio: per (request_queue, blkcg) ioprio.
-  * @blkcg_id: id of the blkcg the related io_cq belongs to.
-+ * @wr_time_left: snapshot of the time left before weight raising ends
-+ *                for the sync queue associated to this process; this
-+ *		  snapshot is taken to remember this value while the weight
-+ *		  raising is suspended because the queue is merged with a
-+ *		  shared queue, and is used to set @wr_cur_max_time
-+ *		  when the queue is split from the shared queue and its
-+ *		  weight is raised again
-+ * @saved_idle_window: same purpose as the previous field for the idle
-+ *                     window
-+ * @saved_IO_bound: same purpose as the previous two fields for the I/O
-+ *                  bound classification of a queue
-+ * @saved_in_large_burst: same purpose as the previous fields for the
-+ *                        value of the field keeping the queue's belonging
-+ *                        to a large burst
-+ * @was_in_burst_list: true if the queue belonged to a burst list
-+ *                     before its merge with another cooperating queue
-+ * @cooperations: counter of consecutive successful queue merges underwent
-+ *                by any of the process' @bfq_queues
-+ * @failed_cooperations: counter of consecutive failed queue merges of any
-+ *                       of the process' @bfq_queues
-  */
- struct bfq_io_cq {
- 	struct io_cq icq; /* must be the first member */
-@@ -314,6 +336,16 @@ struct bfq_io_cq {
- #ifdef CONFIG_BFQ_GROUP_IOSCHED
- 	uint64_t blkcg_id; /* the current blkcg ID */
- #endif
-+
-+	unsigned int wr_time_left;
-+	bool saved_idle_window;
-+	bool saved_IO_bound;
-+
-+	bool saved_in_large_burst;
-+	bool was_in_burst_list;
-+
-+	unsigned int cooperations;
-+	unsigned int failed_cooperations;
- };
- 
- enum bfq_device_speed {
-@@ -559,6 +591,9 @@ enum bfqq_state_flags {
- 					 * may need softrt-next-start
- 					 * update
- 					 */
-+	BFQ_BFQQ_FLAG_coop,		/* bfqq is shared */
-+	BFQ_BFQQ_FLAG_split_coop,	/* shared bfqq will be split */
-+	BFQ_BFQQ_FLAG_just_split,	/* queue has just been split */
- };
- 
- #define BFQ_BFQQ_FNS(name)						\
-@@ -585,6 +620,9 @@ BFQ_BFQQ_FNS(budget_new);
- BFQ_BFQQ_FNS(IO_bound);
- BFQ_BFQQ_FNS(in_large_burst);
- BFQ_BFQQ_FNS(constantly_seeky);
-+BFQ_BFQQ_FNS(coop);
-+BFQ_BFQQ_FNS(split_coop);
-+BFQ_BFQQ_FNS(just_split);
- BFQ_BFQQ_FNS(softrt_update);
- #undef BFQ_BFQQ_FNS
- 
-@@ -679,6 +717,9 @@ struct bfq_group_data {
-  *                   are groups with more than one active @bfq_entity
-  *                   (see the comments to the function
-  *                   bfq_bfqq_must_not_expire()).
-+ * @rq_pos_tree: rbtree sorted by next_request position, used when
-+ *               determining if two or more queues have interleaving
-+ *               requests (see bfq_find_close_cooperator()).
-  *
-  * Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
-  * there is a set of bfq_groups, each one collecting the lower-level
-@@ -707,6 +748,8 @@ struct bfq_group {
- 
- 	int active_entities;
- 
-+	struct rb_root rq_pos_tree;
-+
- 	struct bfqg_stats stats;
- 	struct bfqg_stats dead_stats;	/* stats pushed from dead children */
- };
-@@ -717,6 +760,8 @@ struct bfq_group {
- 
- 	struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
- 	struct bfq_queue *async_idle_bfqq;
-+
-+	struct rb_root rq_pos_tree;
- };
- #endif
- 
-@@ -793,6 +838,27 @@ static void bfq_put_bfqd_unlock(struct bfq_data *bfqd, unsigned long *flags)
- 	spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
- }
- 
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+
-+static struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq)
-+{
-+	struct bfq_entity *group_entity = bfqq->entity.parent;
-+
-+	if (!group_entity)
-+		group_entity = &bfqq->bfqd->root_group->entity;
-+
-+	return container_of(group_entity, struct bfq_group, entity);
-+}
-+
-+#else
-+
-+static struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq)
-+{
-+	return bfqq->bfqd->root_group;
-+}
-+
-+#endif
-+
- static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio);
- static void bfq_put_queue(struct bfq_queue *bfqq);
- static void bfq_dispatch_insert(struct request_queue *q, struct request *rq);
--- 
-2.1.4
-
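The weight-raising snapshot taken in bfq_bfqq_save_state() in the patch above can be modelled in user space. What follows is a minimal sketch, not code from the patch: struct wr_model and wr_time_left() are hypothetical names, and plain unsigned long ticks stand in for jiffies. It shows only the clamping step: if the period has already run for at least wr_cur_max_time, nothing is saved; otherwise the remainder is saved.

```c
/*
 * Hypothetical user-space model of the time-left computation.
 * Field names mirror the patch; the struct itself is an assumption.
 */
struct wr_model {
	unsigned long wr_cur_max_time;      /* length of the weight-raising period, in ticks */
	unsigned long last_wr_start_finish; /* tick at which the period started */
};

/*
 * Return the part of the weight-raising period still unused at tick "now".
 * A period that has already expired yields 0, because weight raising is
 * only checked when a request is enqueued or dispatched and may overrun.
 */
unsigned long wr_time_left(const struct wr_model *q, unsigned long now)
{
	unsigned long elapsed = now - q->last_wr_start_finish;

	if (q->wr_cur_max_time <= elapsed)
		return 0;
	return q->wr_cur_max_time - elapsed;
}
```

With a period of 100 ticks started at tick 1000, the function returns 60 at tick 1040 and 0 at tick 1100 or later.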



* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-28 16:49 Mike Pagano
From: Mike Pagano @ 2015-09-28 16:49 UTC
  To: gentoo-commits

commit:     24113c3716b8d5a19a98dca269fbd61c48ce37dc
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Mon Sep 28 16:49:45 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Mon Sep 28 16:49:45 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=24113c37

Add BFQ v7r8.

 0000_README                                        |   12 +
 ...roups-kconfig-build-bits-for-BFQ-v7r8-4.2.patch |  104 +
 ...introduce-the-BFQ-v7r8-I-O-sched-for-4.2.patch1 | 6952 ++++++++++++++++++++
 ...Early-Queue-Merge-EQM-to-BFQ-v7r8-for-4.2.patch | 1220 ++++
 4 files changed, 8288 insertions(+)

diff --git a/0000_README b/0000_README
index 7050114..93b94b6 100644
--- a/0000_README
+++ b/0000_README
@@ -79,6 +79,18 @@ Patch:  5000_enable-additional-cpu-optimizations-for-gcc.patch
 From:   https://github.com/graysky2/kernel_gcc_patch/
 Desc:   Kernel patch enables gcc < v4.9 optimizations for additional CPUs.
 
+Patch:  5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r8-4.2.patch
+From:   http://algo.ing.unimo.it/people/paolo/disk_sched/
+Desc:   BFQ v7r8 patch 1 for 4.2: Build, cgroups and kconfig bits
+
+Patch:  5002_block-introduce-the-BFQ-v7r8-I-O-sched-for-4.2.patch1
+From:   http://algo.ing.unimo.it/people/paolo/disk_sched/
+Desc:   BFQ v7r8 patch 2 for 4.2: BFQ Scheduler
+
+Patch:  5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r8-for-4.2.0.patch
+From:   http://algo.ing.unimo.it/people/paolo/disk_sched/
+Desc:   BFQ v7r8 patch 3 for 4.2: Early Queue Merge (EQM)
+
 Patch:  5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
 From:   https://github.com/graysky2/kernel_gcc_patch/
 Desc:   Kernel patch enables gcc >= v4.9 optimizations for additional CPUs.

diff --git a/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r8-4.2.patch b/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r8-4.2.patch
new file mode 100644
index 0000000..daf9be7
--- /dev/null
+++ b/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r8-4.2.patch
@@ -0,0 +1,104 @@
+From c710d693f32c3d4952626aa2bdcf68ac7b40dd0e Mon Sep 17 00:00:00 2001
+From: Paolo Valente <paolo.valente@unimore.it>
+Date: Tue, 7 Apr 2015 13:39:12 +0200
+Subject: [PATCH 1/3] block: cgroups, kconfig, build bits for BFQ-v7r8-4.2
+
+Update Kconfig.iosched and do the related Makefile changes to include
+kernel configuration options for BFQ. Also add the bfqio controller
+to the cgroups subsystem.
+
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
+Signed-off-by: Arianna Avanzini <avanzini@google.com>
+---
+ block/Kconfig.iosched         | 32 ++++++++++++++++++++++++++++++++
+ block/Makefile                |  1 +
+ include/linux/cgroup_subsys.h |  4 ++++
+ 3 files changed, 37 insertions(+)
+
+diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
+index 421bef9..0ee5f0f 100644
+--- a/block/Kconfig.iosched
++++ b/block/Kconfig.iosched
+@@ -39,6 +39,27 @@ config CFQ_GROUP_IOSCHED
+ 	---help---
+ 	  Enable group IO scheduling in CFQ.
+ 
++config IOSCHED_BFQ
++	tristate "BFQ I/O scheduler"
++	default n
++	---help---
++	  The BFQ I/O scheduler tries to distribute bandwidth among
++	  all processes according to their weights.
++	  It aims at distributing the bandwidth as desired, independently of
++	  the disk parameters and with any workload. It also tries to
++	  guarantee low latency to interactive and soft real-time
++	  applications. If compiled built-in (saying Y here), BFQ can
++	  be configured to support hierarchical scheduling.
++
++config CGROUP_BFQIO
++	bool "BFQ hierarchical scheduling support"
++	depends on CGROUPS && IOSCHED_BFQ=y
++	default n
++	---help---
++	  Enable hierarchical scheduling in BFQ, using the cgroups
++	  filesystem interface.  The name of the subsystem will be
++	  bfqio.
++
+ choice
+ 	prompt "Default I/O scheduler"
+ 	default DEFAULT_CFQ
+@@ -52,6 +73,16 @@ choice
+ 	config DEFAULT_CFQ
+ 		bool "CFQ" if IOSCHED_CFQ=y
+ 
++	config DEFAULT_BFQ
++		bool "BFQ" if IOSCHED_BFQ=y
++		help
++		  Selects BFQ as the default I/O scheduler which will be
++		  used by default for all block devices.
++		  The BFQ I/O scheduler aims at distributing the bandwidth
++		  as desired, independently of the disk parameters and with
++		  any workload. It also tries to guarantee low latency to
++		  interactive and soft real-time applications.
++
+ 	config DEFAULT_NOOP
+ 		bool "No-op"
+ 
+@@ -61,6 +92,7 @@ config DEFAULT_IOSCHED
+ 	string
+ 	default "deadline" if DEFAULT_DEADLINE
+ 	default "cfq" if DEFAULT_CFQ
++	default "bfq" if DEFAULT_BFQ
+ 	default "noop" if DEFAULT_NOOP
+ 
+ endmenu
+diff --git a/block/Makefile b/block/Makefile
+index 00ecc97..1ed86d5 100644
+--- a/block/Makefile
++++ b/block/Makefile
+@@ -18,6 +18,7 @@ obj-$(CONFIG_BLK_DEV_THROTTLING)	+= blk-throttle.o
+ obj-$(CONFIG_IOSCHED_NOOP)	+= noop-iosched.o
+ obj-$(CONFIG_IOSCHED_DEADLINE)	+= deadline-iosched.o
+ obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
++obj-$(CONFIG_IOSCHED_BFQ)	+= bfq-iosched.o
+ 
+ obj-$(CONFIG_BLOCK_COMPAT)	+= compat_ioctl.o
+ obj-$(CONFIG_BLK_CMDLINE_PARSER)	+= cmdline-parser.o
+diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
+index 1a96fda..81ad8a0 100644
+--- a/include/linux/cgroup_subsys.h
++++ b/include/linux/cgroup_subsys.h
+@@ -46,6 +46,10 @@ SUBSYS(freezer)
+ SUBSYS(net_cls)
+ #endif
+ 
++#if IS_ENABLED(CONFIG_CGROUP_BFQIO)
++SUBSYS(bfqio)
++#endif
++
+ #if IS_ENABLED(CONFIG_CGROUP_PERF)
+ SUBSYS(perf_event)
+ #endif
+-- 
+1.9.1
+
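The SUBSYS(bfqio) entry added to include/linux/cgroup_subsys.h in the patch above relies on a list-of-entries macro pattern: one list is expanded several times with different definitions of the per-entry macro. The sketch below imitates that pattern in user space under stated assumptions. DEMO_ENABLE_BFQIO, DEMO_SUBSYS_LIST, demo_subsys_id and demo_subsys_names are hypothetical names; the kernel itself re-includes the header with SUBSYS redefined rather than passing a macro argument, and the real list has more entries.

```c
/* Stand-in for CONFIG_CGROUP_BFQIO; set to 0 to drop the entry. */
#define DEMO_ENABLE_BFQIO 1

/* The optional entry expands to nothing when the feature is disabled. */
#if DEMO_ENABLE_BFQIO
#define DEMO_BFQIO(X) X(bfqio)
#else
#define DEMO_BFQIO(X)
#endif

/* Hypothetical, shortened subsystem list; X is applied to every entry. */
#define DEMO_SUBSYS_LIST(X) \
	X(freezer)          \
	X(net_cls)          \
	DEMO_BFQIO(X)       \
	X(perf_event)

/* First expansion: one enum constant per enabled subsystem. */
#define DEMO_AS_ENUM(name) name##_cgrp_id,
enum demo_subsys_id { DEMO_SUBSYS_LIST(DEMO_AS_ENUM) DEMO_SUBSYS_COUNT };

/* Second expansion: a name table indexed by the same constants. */
#define DEMO_AS_NAME(name) #name,
const char *const demo_subsys_names[] = { DEMO_SUBSYS_LIST(DEMO_AS_NAME) };
```

Because both expansions come from the same list, enabling or disabling the entry keeps the identifiers and the name table in step without touching either definition.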

diff --git a/5002_block-introduce-the-BFQ-v7r8-I-O-sched-for-4.2.patch1 b/5002_block-introduce-the-BFQ-v7r8-I-O-sched-for-4.2.patch1
new file mode 100644
index 0000000..4cc232d
--- /dev/null
+++ b/5002_block-introduce-the-BFQ-v7r8-I-O-sched-for-4.2.patch1
@@ -0,0 +1,6952 @@
+From a364e1785d2eef24c2ca0ade5db036721b86c185 Mon Sep 17 00:00:00 2001
+From: Paolo Valente <paolo.valente@unimore.it>
+Date: Thu, 9 May 2013 19:10:02 +0200
+Subject: [PATCH 2/3] block: introduce the BFQ-v7r8 I/O sched for 4.2
+
+Add the BFQ-v7r8 I/O scheduler to 4.2.
+The general structure is borrowed from CFQ, as much of the code for
+handling I/O contexts. Over time, several useful features have been
+ported from CFQ as well (details in the changelog in README.BFQ). A
+(bfq_)queue is associated to each task doing I/O on a device, and each
+time a scheduling decision has to be made a queue is selected and served
+until it expires.
+
+    - Slices are given in the service domain: tasks are assigned
+      budgets, measured in number of sectors. Once got the disk, a task
+      must however consume its assigned budget within a configurable
+      maximum time (by default, the maximum possible value of the
+      budgets is automatically computed to comply with this timeout).
+      This allows the desired latency vs "throughput boosting" tradeoff
+      to be set.
+
+    - Budgets are scheduled according to a variant of WF2Q+, implemented
+      using an augmented rb-tree to take eligibility into account while
+      preserving an O(log N) overall complexity.
+
+    - A low-latency tunable is provided; if enabled, both interactive
+      and soft real-time applications are guaranteed a very low latency.
+
+    - Latency guarantees are preserved also in the presence of NCQ.
+
+    - Also with flash-based devices, a high throughput is achieved
+      while still preserving latency guarantees.
+
+    - BFQ features Early Queue Merge (EQM), a sort of fusion of the
+      cooperating-queue-merging and the preemption mechanisms present
+      in CFQ. EQM is in fact a unified mechanism that tries to get a
+      sequential read pattern, and hence a high throughput, with any
+      set of processes performing interleaved I/O over a contiguous
+      sequence of sectors.
+
+    - BFQ supports full hierarchical scheduling, exporting a cgroups
+      interface.  Since each node has a full scheduler, each group can
+      be assigned its own weight.
+
+    - If the cgroups interface is not used, only I/O priorities can be
+      assigned to processes, with ioprio values mapped to weights
+      with the relation weight = IOPRIO_BE_NR - ioprio.
+
+    - ioprio classes are served in strict priority order, i.e., lower
+      priority queues are not served as long as there are higher
+      priority queues.  Among queues in the same class the bandwidth is
+      distributed in proportion to the weight of each queue. A very
+      thin extra bandwidth is however guaranteed to the Idle class, to
+      prevent it from starving.
+
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
+Signed-off-by: Arianna Avanzini <avanzini@google.com>
+---
+ block/bfq-cgroup.c  |  936 +++++++++++++
+ block/bfq-ioc.c     |   36 +
+ block/bfq-iosched.c | 3898 +++++++++++++++++++++++++++++++++++++++++++++++++++
+ block/bfq-sched.c   | 1208 ++++++++++++++++
+ block/bfq.h         |  771 ++++++++++
+ 5 files changed, 6849 insertions(+)
+ create mode 100644 block/bfq-cgroup.c
+ create mode 100644 block/bfq-ioc.c
+ create mode 100644 block/bfq-iosched.c
+ create mode 100644 block/bfq-sched.c
+ create mode 100644 block/bfq.h
+
+diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
+new file mode 100644
+index 0000000..11e2f1d
+--- /dev/null
++++ b/block/bfq-cgroup.c
+@@ -0,0 +1,936 @@
++/*
++ * BFQ: CGROUPS support.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ *		      Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
++ * file.
++ */
++
++#ifdef CONFIG_CGROUP_BFQIO
++
++static DEFINE_MUTEX(bfqio_mutex);
++
++static bool bfqio_is_removed(struct bfqio_cgroup *bgrp)
++{
++	return bgrp ? !bgrp->online : false;
++}
++
++static struct bfqio_cgroup bfqio_root_cgroup = {
++	.weight = BFQ_DEFAULT_GRP_WEIGHT,
++	.ioprio = BFQ_DEFAULT_GRP_IOPRIO,
++	.ioprio_class = BFQ_DEFAULT_GRP_CLASS,
++};
++
++static inline void bfq_init_entity(struct bfq_entity *entity,
++				   struct bfq_group *bfqg)
++{
++	entity->weight = entity->new_weight;
++	entity->orig_weight = entity->new_weight;
++	entity->ioprio = entity->new_ioprio;
++	entity->ioprio_class = entity->new_ioprio_class;
++	entity->parent = bfqg->my_entity;
++	entity->sched_data = &bfqg->sched_data;
++}
++
++static struct bfqio_cgroup *css_to_bfqio(struct cgroup_subsys_state *css)
++{
++	return css ? container_of(css, struct bfqio_cgroup, css) : NULL;
++}
++
++/*
++ * Search the bfq_group for bfqd into the hash table (by now only a list)
++ * of bgrp.  Must be called under rcu_read_lock().
++ */
++static struct bfq_group *bfqio_lookup_group(struct bfqio_cgroup *bgrp,
++					    struct bfq_data *bfqd)
++{
++	struct bfq_group *bfqg;
++	void *key;
++
++	hlist_for_each_entry_rcu(bfqg, &bgrp->group_data, group_node) {
++		key = rcu_dereference(bfqg->bfqd);
++		if (key == bfqd)
++			return bfqg;
++	}
++
++	return NULL;
++}
++
++static inline void bfq_group_init_entity(struct bfqio_cgroup *bgrp,
++					 struct bfq_group *bfqg)
++{
++	struct bfq_entity *entity = &bfqg->entity;
++
++	/*
++	 * If the weight of the entity has never been set via the sysfs
++	 * interface, then bgrp->weight == 0. In this case we initialize
++	 * the weight from the current ioprio value. Otherwise, the group
++	 * weight, if set, has priority over the ioprio value.
++	 */
++	if (bgrp->weight == 0) {
++		entity->new_weight = bfq_ioprio_to_weight(bgrp->ioprio);
++		entity->new_ioprio = bgrp->ioprio;
++	} else {
++		if (bgrp->weight < BFQ_MIN_WEIGHT ||
++		    bgrp->weight > BFQ_MAX_WEIGHT) {
++			printk(KERN_CRIT "bfq_group_init_entity: "
++					 "bgrp->weight %d\n", bgrp->weight);
++			BUG();
++		}
++		entity->new_weight = bgrp->weight;
++		entity->new_ioprio = bfq_weight_to_ioprio(bgrp->weight);
++	}
++	entity->orig_weight = entity->weight = entity->new_weight;
++	entity->ioprio = entity->new_ioprio;
++	entity->ioprio_class = entity->new_ioprio_class = bgrp->ioprio_class;
++	entity->my_sched_data = &bfqg->sched_data;
++	bfqg->active_entities = 0;
++}
++
++static inline void bfq_group_set_parent(struct bfq_group *bfqg,
++					struct bfq_group *parent)
++{
++	struct bfq_entity *entity;
++
++	BUG_ON(parent == NULL);
++	BUG_ON(bfqg == NULL);
++
++	entity = &bfqg->entity;
++	entity->parent = parent->my_entity;
++	entity->sched_data = &parent->sched_data;
++}
++
++/**
++ * bfq_group_chain_alloc - allocate a chain of groups.
++ * @bfqd: queue descriptor.
++ * @css: the leaf cgroup_subsys_state this chain starts from.
++ *
++ * Allocate a chain of groups starting from the one belonging to
++ * @css up to the root cgroup.  Stop if a cgroup on the chain
++ * to the root already has an allocated group on @bfqd.
++ */
++static struct bfq_group *bfq_group_chain_alloc(struct bfq_data *bfqd,
++					       struct cgroup_subsys_state *css)
++{
++	struct bfqio_cgroup *bgrp;
++	struct bfq_group *bfqg, *prev = NULL, *leaf = NULL;
++
++	for (; css != NULL; css = css->parent) {
++		bgrp = css_to_bfqio(css);
++
++		bfqg = bfqio_lookup_group(bgrp, bfqd);
++		if (bfqg != NULL) {
++			/*
++			 * All the cgroups in the path from there to the
++			 * root must have a bfq_group for bfqd, so we don't
++			 * need any more allocations.
++			 */
++			break;
++		}
++
++		bfqg = kzalloc(sizeof(*bfqg), GFP_ATOMIC);
++		if (bfqg == NULL)
++			goto cleanup;
++
++		bfq_group_init_entity(bgrp, bfqg);
++		bfqg->my_entity = &bfqg->entity;
++
++		if (leaf == NULL) {
++			leaf = bfqg;
++			prev = leaf;
++		} else {
++			bfq_group_set_parent(prev, bfqg);
++			/*
++			 * Build a list of allocated nodes using the bfqd
++			 * field, which is still unused and will be
++			 * initialized only after the node has been
++			 * connected.
++			 */
++			prev->bfqd = bfqg;
++			prev = bfqg;
++		}
++	}
++
++	return leaf;
++
++cleanup:
++	while (leaf != NULL) {
++		prev = leaf;
++		leaf = leaf->bfqd;
++		kfree(prev);
++	}
++
++	return NULL;
++}
++
++/**
++ * bfq_group_chain_link - link an allocated group chain to a cgroup
++ *                        hierarchy.
++ * @bfqd: the queue descriptor.
++ * @css: the leaf cgroup_subsys_state to start from.
++ * @leaf: the leaf group (to be associated to @css).
++ *
++ * Try to link a chain of groups to a cgroup hierarchy, connecting the
++ * nodes bottom-up, so we can be sure that when we find a cgroup in the
++ * hierarchy that already has a group associated to @bfqd all the nodes
++ * in the path to the root cgroup have one too.
++ *
++ * On locking: the queue lock protects the hierarchy (there is a hierarchy
++ * per device) while the bfqio_cgroup lock protects the list of groups
++ * belonging to the same cgroup.
++ */
++static void bfq_group_chain_link(struct bfq_data *bfqd,
++				 struct cgroup_subsys_state *css,
++				 struct bfq_group *leaf)
++{
++	struct bfqio_cgroup *bgrp;
++	struct bfq_group *bfqg, *next, *prev = NULL;
++	unsigned long flags;
++
++	assert_spin_locked(bfqd->queue->queue_lock);
++
++	for (; css != NULL && leaf != NULL; css = css->parent) {
++		bgrp = css_to_bfqio(css);
++		next = leaf->bfqd;
++
++		bfqg = bfqio_lookup_group(bgrp, bfqd);
++		BUG_ON(bfqg != NULL);
++
++		spin_lock_irqsave(&bgrp->lock, flags);
++
++		rcu_assign_pointer(leaf->bfqd, bfqd);
++		hlist_add_head_rcu(&leaf->group_node, &bgrp->group_data);
++		hlist_add_head(&leaf->bfqd_node, &bfqd->group_list);
++
++		spin_unlock_irqrestore(&bgrp->lock, flags);
++
++		prev = leaf;
++		leaf = next;
++	}
++
++	BUG_ON(css == NULL && leaf != NULL);
++	if (css != NULL && prev != NULL) {
++		bgrp = css_to_bfqio(css);
++		bfqg = bfqio_lookup_group(bgrp, bfqd);
++		bfq_group_set_parent(prev, bfqg);
++	}
++}
++
++/**
++ * bfq_find_alloc_group - return the group associated to @bfqd in @css.
++ * @bfqd: queue descriptor.
++ * @css: cgroup being searched for.
++ *
++ * Return a group associated to @bfqd in @css, allocating one if
++ * necessary.  When a group is returned all the cgroups in the path
++ * to the root have a group associated to @bfqd.
++ *
++ * If the allocation fails, return the root group: this breaks guarantees
++ * but is a safe fallback.  If this loss becomes a problem it can be
++ * mitigated using the equivalent weight (given by the product of the
++ * weights of the groups in the path from the group to the root) in the
++ * root scheduler.
++ *
++ * We allocate all the missing nodes in the path from the leaf cgroup
++ * to the root and we connect the nodes only after all the allocations
++ * have been successful.
++ */
++static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
++					      struct cgroup_subsys_state *css)
++{
++	struct bfqio_cgroup *bgrp = css_to_bfqio(css);
++	struct bfq_group *bfqg;
++
++	bfqg = bfqio_lookup_group(bgrp, bfqd);
++	if (bfqg != NULL)
++		return bfqg;
++
++	bfqg = bfq_group_chain_alloc(bfqd, css);
++	if (bfqg != NULL)
++		bfq_group_chain_link(bfqd, css, bfqg);
++	else
++		bfqg = bfqd->root_group;
++
++	return bfqg;
++}
++
++/**
++ * bfq_bfqq_move - migrate @bfqq to @bfqg.
++ * @bfqd: queue descriptor.
++ * @bfqq: the queue to move.
++ * @entity: @bfqq's entity.
++ * @bfqg: the group to move to.
++ *
++ * Move @bfqq to @bfqg, deactivating it from its old group and reactivating
++ * it on the new one.  Avoid putting the entity on the old group idle tree.
++ *
++ * Must be called under the queue lock; the cgroup owning @bfqg must
++ * not disappear (for now this just means that we are called under
++ * rcu_read_lock()).
++ */
++static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++			  struct bfq_entity *entity, struct bfq_group *bfqg)
++{
++	int busy, resume;
++
++	busy = bfq_bfqq_busy(bfqq);
++	resume = !RB_EMPTY_ROOT(&bfqq->sort_list);
++
++	BUG_ON(resume && !entity->on_st);
++	BUG_ON(busy && !resume && entity->on_st &&
++	       bfqq != bfqd->in_service_queue);
++
++	if (busy) {
++		BUG_ON(atomic_read(&bfqq->ref) < 2);
++
++		if (!resume)
++			bfq_del_bfqq_busy(bfqd, bfqq, 0);
++		else
++			bfq_deactivate_bfqq(bfqd, bfqq, 0);
++	} else if (entity->on_st)
++		bfq_put_idle_entity(bfq_entity_service_tree(entity), entity);
++
++	/*
++	 * Here we use a reference to bfqg.  We don't need a refcounter
++	 * as the cgroup reference will not be dropped, so that its
++	 * destroy() callback will not be invoked.
++	 */
++	entity->parent = bfqg->my_entity;
++	entity->sched_data = &bfqg->sched_data;
++
++	if (busy && resume)
++		bfq_activate_bfqq(bfqd, bfqq);
++
++	if (bfqd->in_service_queue == NULL && !bfqd->rq_in_driver)
++		bfq_schedule_dispatch(bfqd);
++}
++
++/**
++ * __bfq_bic_change_cgroup - move @bic to the cgroup of @css.
++ * @bfqd: the queue descriptor.
++ * @bic: the bic to move.
++ * @css: the cgroup to move to.
++ *
++ * Move bic to cgroup, assuming that bfqd->queue is locked; the caller
++ * has to make sure that the reference to cgroup is valid across the call.
++ *
++ * NOTE: an alternative approach might have been to store the current
++ * cgroup in bfqq and get a reference to it, reducing the lookup
++ * time here, at the price of slightly more complex code.
++ */
++static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
++						struct bfq_io_cq *bic,
++						struct cgroup_subsys_state *css)
++{
++	struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0);
++	struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1);
++	struct bfq_entity *entity;
++	struct bfq_group *bfqg;
++	struct bfqio_cgroup *bgrp;
++
++	bgrp = css_to_bfqio(css);
++
++	bfqg = bfq_find_alloc_group(bfqd, css);
++	if (async_bfqq != NULL) {
++		entity = &async_bfqq->entity;
++
++		if (entity->sched_data != &bfqg->sched_data) {
++			bic_set_bfqq(bic, NULL, 0);
++			bfq_log_bfqq(bfqd, async_bfqq,
++				     "bic_change_group: %p %d",
++				     async_bfqq, atomic_read(&async_bfqq->ref));
++			bfq_put_queue(async_bfqq);
++		}
++	}
++
++	if (sync_bfqq != NULL) {
++		entity = &sync_bfqq->entity;
++		if (entity->sched_data != &bfqg->sched_data)
++			bfq_bfqq_move(bfqd, sync_bfqq, entity, bfqg);
++	}
++
++	return bfqg;
++}
++
++/**
++ * bfq_bic_change_cgroup - move @bic to the cgroup of @css.
++ * @bic: the bic being migrated.
++ * @css: the destination cgroup.
++ *
++ * When the task owning @bic is moved to @css, @bic is immediately
++ * moved into its new parent group.
++ */
++static void bfq_bic_change_cgroup(struct bfq_io_cq *bic,
++				  struct cgroup_subsys_state *css)
++{
++	struct bfq_data *bfqd;
++	unsigned long uninitialized_var(flags);
++
++	bfqd = bfq_get_bfqd_locked(&(bic->icq.q->elevator->elevator_data),
++				   &flags);
++	if (bfqd != NULL) {
++		__bfq_bic_change_cgroup(bfqd, bic, css);
++		bfq_put_bfqd_unlock(bfqd, &flags);
++	}
++}
++
++/**
++ * bfq_bic_update_cgroup - update the cgroup of @bic.
++ * @bic: the @bic to update.
++ *
++ * Make sure that @bic is enqueued in the cgroup of the current task.
++ * We need this in addition to moving bics during the cgroup attach
++ * phase because the task owning @bic could be at its first disk
++ * access or we may end up in the root cgroup as the result of a
++ * memory allocation failure and here we try to move to the right
++ * group.
++ *
++ * Must be called under the queue lock.  It is safe to use the returned
++ * value even after the rcu_read_unlock() as the migration/destruction
++ * paths act under the queue lock too.  IOW it is impossible to race with
++ * group migration/destruction and end up with an invalid group as:
++ *   a) here cgroup has not yet been destroyed, nor has its destroy callback
++ *      started execution, as current holds a reference to it,
++ *   b) if it is destroyed after rcu_read_unlock() [after current is
++ *      migrated to a different cgroup] its attach() callback will have
++ *      taken care of removing all the references to the old cgroup data.
++ */
++static struct bfq_group *bfq_bic_update_cgroup(struct bfq_io_cq *bic)
++{
++	struct bfq_data *bfqd = bic_to_bfqd(bic);
++	struct bfq_group *bfqg;
++	struct cgroup_subsys_state *css;
++
++	BUG_ON(bfqd == NULL);
++
++	rcu_read_lock();
++	css = task_css(current, bfqio_cgrp_id);
++	bfqg = __bfq_bic_change_cgroup(bfqd, bic, css);
++	rcu_read_unlock();
++
++	return bfqg;
++}
++
++/**
++ * bfq_flush_idle_tree - deactivate any entity on the idle tree of @st.
++ * @st: the service tree being flushed.
++ */
++static inline void bfq_flush_idle_tree(struct bfq_service_tree *st)
++{
++	struct bfq_entity *entity = st->first_idle;
++
++	for (; entity != NULL; entity = st->first_idle)
++		__bfq_deactivate_entity(entity, 0);
++}
++
++/**
++ * bfq_reparent_leaf_entity - move leaf entity to the root_group.
++ * @bfqd: the device data structure with the root group.
++ * @entity: the entity to move.
++ */
++static inline void bfq_reparent_leaf_entity(struct bfq_data *bfqd,
++					    struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++	BUG_ON(bfqq == NULL);
++	bfq_bfqq_move(bfqd, bfqq, entity, bfqd->root_group);
++	return;
++}
++
++/**
++ * bfq_reparent_active_entities - move to the root group all active
++ *                                entities.
++ * @bfqd: the device data structure with the root group.
++ * @bfqg: the group to move from.
++ * @st: the service tree with the entities.
++ *
++ * Needs queue_lock to be taken and reference to be valid over the call.
++ */
++static inline void bfq_reparent_active_entities(struct bfq_data *bfqd,
++						struct bfq_group *bfqg,
++						struct bfq_service_tree *st)
++{
++	struct rb_root *active = &st->active;
++	struct bfq_entity *entity = NULL;
++
++	if (!RB_EMPTY_ROOT(&st->active))
++		entity = bfq_entity_of(rb_first(active));
++
++	for (; entity != NULL; entity = bfq_entity_of(rb_first(active)))
++		bfq_reparent_leaf_entity(bfqd, entity);
++
++	if (bfqg->sched_data.in_service_entity != NULL)
++		bfq_reparent_leaf_entity(bfqd,
++			bfqg->sched_data.in_service_entity);
++
++	return;
++}
++
++/**
++ * bfq_destroy_group - destroy @bfqg.
++ * @bgrp: the bfqio_cgroup containing @bfqg.
++ * @bfqg: the group being destroyed.
++ *
++ * Destroy @bfqg, making sure that it is not referenced from its parent.
++ */
++static void bfq_destroy_group(struct bfqio_cgroup *bgrp, struct bfq_group *bfqg)
++{
++	struct bfq_data *bfqd;
++	struct bfq_service_tree *st;
++	struct bfq_entity *entity = bfqg->my_entity;
++	unsigned long uninitialized_var(flags);
++	int i;
++
++	hlist_del(&bfqg->group_node);
++
++	/*
++	 * Empty all service_trees belonging to this group before
++	 * deactivating the group itself.
++	 */
++	for (i = 0; i < BFQ_IOPRIO_CLASSES; i++) {
++		st = bfqg->sched_data.service_tree + i;
++
++		/*
++		 * The idle tree may still contain bfq_queues belonging
++		 * to exited tasks because they never migrated to a different
++		 * cgroup from the one being destroyed now.  No one else
++		 * can access them so it's safe to act without any lock.
++		 */
++		bfq_flush_idle_tree(st);
++
++		/*
++		 * It may happen that some queues are still active
++		 * (busy) upon group destruction (if the corresponding
++		 * processes have been forced to terminate). We move
++		 * all the leaf entities corresponding to these queues
++		 * to the root_group.
++		 * Also, it may happen that the group has an entity
++		 * in service, which is disconnected from the active
++		 * tree: it must be moved, too.
++		 * There is no need to put the sync queues, as the
++		 * scheduler has taken no reference.
++		 */
++		bfqd = bfq_get_bfqd_locked(&bfqg->bfqd, &flags);
++		if (bfqd != NULL) {
++			bfq_reparent_active_entities(bfqd, bfqg, st);
++			bfq_put_bfqd_unlock(bfqd, &flags);
++		}
++		BUG_ON(!RB_EMPTY_ROOT(&st->active));
++		BUG_ON(!RB_EMPTY_ROOT(&st->idle));
++	}
++	BUG_ON(bfqg->sched_data.next_in_service != NULL);
++	BUG_ON(bfqg->sched_data.in_service_entity != NULL);
++
++	/*
++	 * We may race with device destruction, take extra care when
++	 * dereferencing bfqg->bfqd.
++	 */
++	bfqd = bfq_get_bfqd_locked(&bfqg->bfqd, &flags);
++	if (bfqd != NULL) {
++		hlist_del(&bfqg->bfqd_node);
++		__bfq_deactivate_entity(entity, 0);
++		bfq_put_async_queues(bfqd, bfqg);
++		bfq_put_bfqd_unlock(bfqd, &flags);
++	}
++	BUG_ON(entity->tree != NULL);
++
++	/*
++	 * No need to defer the kfree() to the end of the RCU grace
++	 * period: we are called from the destroy() callback of our
++	 * cgroup, so we can be sure that no one is a) still using
++	 * this cgroup or b) doing lookups in it.
++	 */
++	kfree(bfqg);
++}
++
++static void bfq_end_wr_async(struct bfq_data *bfqd)
++{
++	struct hlist_node *tmp;
++	struct bfq_group *bfqg;
++
++	hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node)
++		bfq_end_wr_async_queues(bfqd, bfqg);
++	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
++}
++
++/**
++ * bfq_disconnect_groups - disconnect @bfqd from all its groups.
++ * @bfqd: the device descriptor being exited.
++ *
++ * When the device exits we just make sure that no lookup can return
++ * the now unused group structures.  They will be deallocated on cgroup
++ * destruction.
++ */
++static void bfq_disconnect_groups(struct bfq_data *bfqd)
++{
++	struct hlist_node *tmp;
++	struct bfq_group *bfqg;
++
++	bfq_log(bfqd, "disconnect_groups beginning");
++	hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node) {
++		hlist_del(&bfqg->bfqd_node);
++
++		__bfq_deactivate_entity(bfqg->my_entity, 0);
++
++		/*
++		 * Don't remove from the group hash, just set an
++		 * invalid key.  No lookups can race with the
++		 * assignment as bfqd is being destroyed; this
++		 * implies also that new elements cannot be added
++		 * to the list.
++		 */
++		rcu_assign_pointer(bfqg->bfqd, NULL);
++
++		bfq_log(bfqd, "disconnect_groups: put async for group %p",
++			bfqg);
++		bfq_put_async_queues(bfqd, bfqg);
++	}
++}
++
++static inline void bfq_free_root_group(struct bfq_data *bfqd)
++{
++	struct bfqio_cgroup *bgrp = &bfqio_root_cgroup;
++	struct bfq_group *bfqg = bfqd->root_group;
++
++	bfq_put_async_queues(bfqd, bfqg);
++
++	spin_lock_irq(&bgrp->lock);
++	hlist_del_rcu(&bfqg->group_node);
++	spin_unlock_irq(&bgrp->lock);
++
++	/*
++	 * No need to synchronize_rcu() here: since the device is gone
++	 * there cannot be any read-side access to its root_group.
++	 */
++	kfree(bfqg);
++}
++
++static struct bfq_group *bfq_alloc_root_group(struct bfq_data *bfqd, int node)
++{
++	struct bfq_group *bfqg;
++	struct bfqio_cgroup *bgrp;
++	int i;
++
++	bfqg = kzalloc_node(sizeof(*bfqg), GFP_KERNEL, node);
++	if (bfqg == NULL)
++		return NULL;
++
++	bfqg->entity.parent = NULL;
++	for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
++		bfqg->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
++
++	bgrp = &bfqio_root_cgroup;
++	spin_lock_irq(&bgrp->lock);
++	rcu_assign_pointer(bfqg->bfqd, bfqd);
++	hlist_add_head_rcu(&bfqg->group_node, &bgrp->group_data);
++	spin_unlock_irq(&bgrp->lock);
++
++	return bfqg;
++}
++
++#define SHOW_FUNCTION(__VAR)						\
++static u64 bfqio_cgroup_##__VAR##_read(struct cgroup_subsys_state *css, \
++				       struct cftype *cftype)		\
++{									\
++	struct bfqio_cgroup *bgrp = css_to_bfqio(css);			\
++	u64 ret = -ENODEV;						\
++									\
++	mutex_lock(&bfqio_mutex);					\
++	if (bfqio_is_removed(bgrp))					\
++		goto out_unlock;					\
++									\
++	spin_lock_irq(&bgrp->lock);					\
++	ret = bgrp->__VAR;						\
++	spin_unlock_irq(&bgrp->lock);					\
++									\
++out_unlock:								\
++	mutex_unlock(&bfqio_mutex);					\
++	return ret;							\
++}
++
++SHOW_FUNCTION(weight);
++SHOW_FUNCTION(ioprio);
++SHOW_FUNCTION(ioprio_class);
++#undef SHOW_FUNCTION
++
++#define STORE_FUNCTION(__VAR, __MIN, __MAX)				\
++static int bfqio_cgroup_##__VAR##_write(struct cgroup_subsys_state *css,\
++					struct cftype *cftype,		\
++					u64 val)			\
++{									\
++	struct bfqio_cgroup *bgrp = css_to_bfqio(css);			\
++	struct bfq_group *bfqg;						\
++	int ret = -EINVAL;						\
++									\
++	if (val < (__MIN) || val > (__MAX))				\
++		return ret;						\
++									\
++	ret = -ENODEV;							\
++	mutex_lock(&bfqio_mutex);					\
++	if (bfqio_is_removed(bgrp))					\
++		goto out_unlock;					\
++	ret = 0;							\
++									\
++	spin_lock_irq(&bgrp->lock);					\
++	bgrp->__VAR = (unsigned short)val;				\
++	hlist_for_each_entry(bfqg, &bgrp->group_data, group_node) {	\
++		/*							\
++		 * Setting the ioprio_changed flag of the entity        \
++		 * to 1 with new_##__VAR == ##__VAR would re-set        \
++		 * the value of the weight to its ioprio mapping.       \
++		 * Set the flag only if necessary.			\
++		 */							\
++		if ((unsigned short)val != bfqg->entity.new_##__VAR) {  \
++			bfqg->entity.new_##__VAR = (unsigned short)val; \
++			/*						\
++			 * Make sure that the above new value has been	\
++			 * stored in bfqg->entity.new_##__VAR before	\
++			 * setting the ioprio_changed flag. In fact,	\
++			 * this flag may be read asynchronously (in	\
++			 * critical sections protected by a different	\
++			 * lock than that held here), and finding this	\
++			 * flag set may cause the execution of the code	\
++			 * for updating parameters whose value may	\
++			 * depend also on bfqg->entity.new_##__VAR (in	\
++			 * __bfq_entity_update_weight_prio).		\
++			 * This barrier makes sure that the new value	\
++			 * of bfqg->entity.new_##__VAR is correctly	\
++			 * seen in that code.				\
++			 */						\
++			smp_wmb();                                      \
++			bfqg->entity.ioprio_changed = 1;                \
++		}							\
++	}								\
++	spin_unlock_irq(&bgrp->lock);					\
++									\
++out_unlock:								\
++	mutex_unlock(&bfqio_mutex);					\
++	return ret;							\
++}
++
++STORE_FUNCTION(weight, BFQ_MIN_WEIGHT, BFQ_MAX_WEIGHT);
++STORE_FUNCTION(ioprio, 0, IOPRIO_BE_NR - 1);
++STORE_FUNCTION(ioprio_class, IOPRIO_CLASS_RT, IOPRIO_CLASS_IDLE);
++#undef STORE_FUNCTION
++
++static struct cftype bfqio_files[] = {
++	{
++		.name = "weight",
++		.read_u64 = bfqio_cgroup_weight_read,
++		.write_u64 = bfqio_cgroup_weight_write,
++	},
++	{
++		.name = "ioprio",
++		.read_u64 = bfqio_cgroup_ioprio_read,
++		.write_u64 = bfqio_cgroup_ioprio_write,
++	},
++	{
++		.name = "ioprio_class",
++		.read_u64 = bfqio_cgroup_ioprio_class_read,
++		.write_u64 = bfqio_cgroup_ioprio_class_write,
++	},
++	{ },	/* terminate */
++};
++
++static struct cgroup_subsys_state *bfqio_create(struct cgroup_subsys_state
++						*parent_css)
++{
++	struct bfqio_cgroup *bgrp;
++
++	if (parent_css != NULL) {
++		bgrp = kzalloc(sizeof(*bgrp), GFP_KERNEL);
++		if (bgrp == NULL)
++			return ERR_PTR(-ENOMEM);
++	} else
++		bgrp = &bfqio_root_cgroup;
++
++	spin_lock_init(&bgrp->lock);
++	INIT_HLIST_HEAD(&bgrp->group_data);
++	bgrp->ioprio = BFQ_DEFAULT_GRP_IOPRIO;
++	bgrp->ioprio_class = BFQ_DEFAULT_GRP_CLASS;
++
++	return &bgrp->css;
++}
++
++/*
++ * We cannot support shared io contexts, as we have no means to support
++ * two tasks with the same ioc in two different groups without major rework
++ * of the main bic/bfqq data structures.  For now we allow a task to change
++ * its cgroup only if it's the only owner of its ioc; the drawback of this
++ * behavior is that a group containing a task that forked using CLONE_IO
++ * will not be destroyed until the tasks sharing the ioc die.
++ */
++static int bfqio_can_attach(struct cgroup_subsys_state *css,
++			    struct cgroup_taskset *tset)
++{
++	struct task_struct *task;
++	struct io_context *ioc;
++	int ret = 0;
++
++	cgroup_taskset_for_each(task, tset) {
++		/*
++		 * task_lock() is needed to avoid races with
++		 * exit_io_context()
++		 */
++		task_lock(task);
++		ioc = task->io_context;
++		if (ioc != NULL && atomic_read(&ioc->nr_tasks) > 1)
++			/*
++			 * ioc == NULL means that the task is either too
++			 * young or exiting: if it still has no ioc the
++			 * ioc can't be shared, if the task is exiting the
++			 * attach will fail anyway, no matter what we
++			 * return here.
++			 */
++			ret = -EINVAL;
++		task_unlock(task);
++		if (ret)
++			break;
++	}
++
++	return ret;
++}
++
++static void bfqio_attach(struct cgroup_subsys_state *css,
++			 struct cgroup_taskset *tset)
++{
++	struct task_struct *task;
++	struct io_context *ioc;
++	struct io_cq *icq;
++
++	/*
++	 * IMPORTANT NOTE: The move of more than one process at a time to a
++	 * new group has not yet been tested.
++	 */
++	cgroup_taskset_for_each(task, tset) {
++		ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE);
++		if (ioc) {
++			/*
++			 * Handle cgroup change here.
++			 */
++			rcu_read_lock();
++			hlist_for_each_entry_rcu(icq, &ioc->icq_list, ioc_node)
++				if (!strncmp(
++					icq->q->elevator->type->elevator_name,
++					"bfq", ELV_NAME_MAX))
++					bfq_bic_change_cgroup(icq_to_bic(icq),
++							      css);
++			rcu_read_unlock();
++			put_io_context(ioc);
++		}
++	}
++}
++
++static void bfqio_destroy(struct cgroup_subsys_state *css)
++{
++	struct bfqio_cgroup *bgrp = css_to_bfqio(css);
++	struct hlist_node *tmp;
++	struct bfq_group *bfqg;
++
++	/*
++	 * Since we are destroying the cgroup, there are no more tasks
++	 * referencing it, and all the RCU grace periods that may have
++	 * referenced it have ended (as the destruction of the parent
++	 * cgroup is RCU-safe); bgrp->group_data will not be accessed by
++	 * anything else and we don't need any synchronization.
++	 */
++	hlist_for_each_entry_safe(bfqg, tmp, &bgrp->group_data, group_node)
++		bfq_destroy_group(bgrp, bfqg);
++
++	BUG_ON(!hlist_empty(&bgrp->group_data));
++
++	kfree(bgrp);
++}
++
++static int bfqio_css_online(struct cgroup_subsys_state *css)
++{
++	struct bfqio_cgroup *bgrp = css_to_bfqio(css);
++
++	mutex_lock(&bfqio_mutex);
++	bgrp->online = true;
++	mutex_unlock(&bfqio_mutex);
++
++	return 0;
++}
++
++static void bfqio_css_offline(struct cgroup_subsys_state *css)
++{
++	struct bfqio_cgroup *bgrp = css_to_bfqio(css);
++
++	mutex_lock(&bfqio_mutex);
++	bgrp->online = false;
++	mutex_unlock(&bfqio_mutex);
++}
++
++struct cgroup_subsys bfqio_cgrp_subsys = {
++	.css_alloc = bfqio_create,
++	.css_online = bfqio_css_online,
++	.css_offline = bfqio_css_offline,
++	.can_attach = bfqio_can_attach,
++	.attach = bfqio_attach,
++	.css_free = bfqio_destroy,
++	.legacy_cftypes = bfqio_files,
++};
++#else
++static inline void bfq_init_entity(struct bfq_entity *entity,
++				   struct bfq_group *bfqg)
++{
++	entity->weight = entity->new_weight;
++	entity->orig_weight = entity->new_weight;
++	entity->ioprio = entity->new_ioprio;
++	entity->ioprio_class = entity->new_ioprio_class;
++	entity->sched_data = &bfqg->sched_data;
++}
++
++static inline struct bfq_group *
++bfq_bic_update_cgroup(struct bfq_io_cq *bic)
++{
++	struct bfq_data *bfqd = bic_to_bfqd(bic);
++	return bfqd->root_group;
++}
++
++static inline void bfq_bfqq_move(struct bfq_data *bfqd,
++				 struct bfq_queue *bfqq,
++				 struct bfq_entity *entity,
++				 struct bfq_group *bfqg)
++{
++}
++
++static void bfq_end_wr_async(struct bfq_data *bfqd)
++{
++	bfq_end_wr_async_queues(bfqd, bfqd->root_group);
++}
++
++static inline void bfq_disconnect_groups(struct bfq_data *bfqd)
++{
++	bfq_put_async_queues(bfqd, bfqd->root_group);
++}
++
++static inline void bfq_free_root_group(struct bfq_data *bfqd)
++{
++	kfree(bfqd->root_group);
++}
++
++static struct bfq_group *bfq_alloc_root_group(struct bfq_data *bfqd, int node)
++{
++	struct bfq_group *bfqg;
++	int i;
++
++	bfqg = kmalloc_node(sizeof(*bfqg), GFP_KERNEL | __GFP_ZERO, node);
++	if (bfqg == NULL)
++		return NULL;
++
++	for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
++		bfqg->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
++
++	return bfqg;
++}
++#endif
+diff --git a/block/bfq-ioc.c b/block/bfq-ioc.c
+new file mode 100644
+index 0000000..7f6b000
+--- /dev/null
++++ b/block/bfq-ioc.c
+@@ -0,0 +1,36 @@
++/*
++ * BFQ: I/O context handling.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ *		      Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ */
++
++/**
++ * icq_to_bic - convert iocontext queue structure to bfq_io_cq.
++ * @icq: the iocontext queue.
++ */
++static inline struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
++{
++	/* bic->icq is the first member, %NULL will convert to %NULL */
++	return container_of(icq, struct bfq_io_cq, icq);
++}
++
++/**
++ * bfq_bic_lookup - search @ioc for a bic associated to @bfqd.
++ * @bfqd: the lookup key.
++ * @ioc: the io_context of the process doing I/O.
++ *
++ * Queue lock must be held.
++ */
++static inline struct bfq_io_cq *bfq_bic_lookup(struct bfq_data *bfqd,
++					       struct io_context *ioc)
++{
++	if (ioc)
++		return icq_to_bic(ioc_lookup_icq(ioc, bfqd->queue));
++	return NULL;
++}
+diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
+new file mode 100644
+index 0000000..773b2ee
+--- /dev/null
++++ b/block/bfq-iosched.c
+@@ -0,0 +1,3898 @@
++/*
++ * Budget Fair Queueing (BFQ) disk scheduler.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ *		      Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
++ * file.
++ *
++ * BFQ is a proportional-share storage-I/O scheduling algorithm based on
++ * the slice-by-slice service scheme of CFQ. But BFQ assigns budgets,
++ * measured in number of sectors, to processes instead of time slices. The
++ * device is not granted to the in-service process for a given time slice,
++ * but until it has exhausted its assigned budget. This change from the time
++ * to the service domain allows BFQ to distribute the device throughput
++ * among processes as desired, without any distortion due to ZBR, workload
++ * fluctuations or other factors. BFQ uses an ad hoc internal scheduler,
++ * called B-WF2Q+, to schedule processes according to their budgets. More
++ * precisely, BFQ schedules queues associated to processes. Thanks to the
++ * accurate policy of B-WF2Q+, BFQ can afford to assign high budgets to
++ * I/O-bound processes issuing sequential requests (to boost the
++ * throughput), and yet guarantee a low latency to interactive and soft
++ * real-time applications.
++ *
++ * BFQ is described in [1], where also a reference to the initial, more
++ * theoretical paper on BFQ can be found. The interested reader can find
++ * in the latter paper full details on the main algorithm, as well as
++ * formulas of the guarantees and formal proofs of all the properties.
++ * With respect to the version of BFQ presented in these papers, this
++ * implementation adds a few more heuristics, such as the one that
++ * guarantees a low latency to soft real-time applications, and a
++ * hierarchical extension based on H-WF2Q+.
++ *
++ * B-WF2Q+ is based on WF2Q+, that is described in [2], together with
++ * H-WF2Q+, while the augmented tree used to implement B-WF2Q+ with O(log N)
++ * complexity derives from the one introduced with EEVDF in [3].
++ *
++ * [1] P. Valente and M. Andreolini, ``Improving Application Responsiveness
++ *     with the BFQ Disk I/O Scheduler'',
++ *     Proceedings of the 5th Annual International Systems and Storage
++ *     Conference (SYSTOR '12), June 2012.
++ *
++ * http://algogroup.unimo.it/people/paolo/disk_sched/bf1-v1-suite-results.pdf
++ *
++ * [2] Jon C.R. Bennett and H. Zhang, ``Hierarchical Packet Fair Queueing
++ *     Algorithms,'' IEEE/ACM Transactions on Networking, 5(5):675-689,
++ *     Oct 1997.
++ *
++ * http://www.cs.cmu.edu/~hzhang/papers/TON-97-Oct.ps.gz
++ *
++ * [3] I. Stoica and H. Abdel-Wahab, ``Earliest Eligible Virtual Deadline
++ *     First: A Flexible and Accurate Mechanism for Proportional Share
++ *     Resource Allocation,'' technical report.
++ *
++ * http://www.cs.berkeley.edu/~istoica/papers/eevdf-tr-95.pdf
++ */
++#include <linux/module.h>
++#include <linux/slab.h>
++#include <linux/blkdev.h>
++#include <linux/cgroup.h>
++#include <linux/elevator.h>
++#include <linux/jiffies.h>
++#include <linux/rbtree.h>
++#include <linux/ioprio.h>
++#include "bfq.h"
++#include "blk.h"
++
++/* Expiration time of sync (0) and async (1) requests, in jiffies. */
++static const int bfq_fifo_expire[2] = { HZ / 4, HZ / 8 };
++
++/* Maximum backwards seek, in KiB. */
++static const int bfq_back_max = 16 * 1024;
++
++/* Penalty of a backwards seek, in number of sectors. */
++static const int bfq_back_penalty = 2;
++
++/* Idling period duration, in jiffies. */
++static int bfq_slice_idle = HZ / 125;
++
++/* Default maximum budget values, in sectors and number of requests. */
++static const int bfq_default_max_budget = 16 * 1024;
++static const int bfq_max_budget_async_rq = 4;
++
++/*
++ * Async to sync throughput distribution is controlled as follows:
++ * when an async request is served, the entity is charged the number
++ * of sectors of the request, multiplied by the factor below
++ */
++static const int bfq_async_charge_factor = 10;
++
++/* Default timeout values, in jiffies, approximating CFQ defaults. */
++static const int bfq_timeout_sync = HZ / 8;
++static int bfq_timeout_async = HZ / 25;
++
++struct kmem_cache *bfq_pool;
++
++/* Below this threshold (in ms), we consider thinktime immediate. */
++#define BFQ_MIN_TT		2
++
++/* hw_tag detection: parallel requests threshold and min samples needed. */
++#define BFQ_HW_QUEUE_THRESHOLD	4
++#define BFQ_HW_QUEUE_SAMPLES	32
++
++#define BFQQ_SEEK_THR	 (sector_t)(8 * 1024)
++#define BFQQ_SEEKY(bfqq) ((bfqq)->seek_mean > BFQQ_SEEK_THR)
++
++/* Min samples used for peak rate estimation (for autotuning). */
++#define BFQ_PEAK_RATE_SAMPLES	32
++
++/* Shift used for peak rate fixed precision calculations. */
++#define BFQ_RATE_SHIFT		16
++
++/*
++ * By default, BFQ computes the duration of the weight raising for
++ * interactive applications automatically, using the following formula:
++ * duration = (R / r) * T, where r is the peak rate of the device, and
++ * R and T are two reference parameters.
++ * In particular, R is the peak rate of the reference device (see below),
++ * and T is a reference time: given the systems that are likely to be
++ * installed on the reference device according to its speed class, T is
++ * about the maximum time needed, under BFQ and while reading two files in
++ * parallel, to load typical large applications on these systems.
++ * In practice, the slower/faster the device at hand is, the more/less time
++ * it takes to load applications with respect to the reference device.
++ * Accordingly, the longer/shorter BFQ grants weight raising to interactive
++ * applications.
++ *
++ * BFQ uses four different reference pairs (R, T), depending on:
++ * . whether the device is rotational or non-rotational;
++ * . whether the device is slow, such as old or portable HDDs, as well as
++ *   SD cards, or fast, such as newer HDDs and SSDs.
++ *
++ * The device's speed class is dynamically (re)detected in
++ * bfq_update_peak_rate() every time the estimated peak rate is updated.
++ *
++ * In the following definitions, R_slow[0]/R_fast[0] and T_slow[0]/T_fast[0]
++ * are the reference values for a slow/fast rotational device, whereas
++ * R_slow[1]/R_fast[1] and T_slow[1]/T_fast[1] are the reference values for
++ * a slow/fast non-rotational device. Finally, device_speed_thresh are the
++ * thresholds used to switch between speed classes.
++ * Both the reference peak rates and the thresholds are measured in
++ * sectors/usec, left-shifted by BFQ_RATE_SHIFT.
++ */
++static int R_slow[2] = {1536, 10752};
++static int R_fast[2] = {17415, 34791};
++/*
++ * To improve readability, a conversion function is used to initialize the
++ * following arrays, which entails that they can be initialized only in a
++ * function.
++ */
++static int T_slow[2];
++static int T_fast[2];
++static int device_speed_thresh[2];
++
++#define BFQ_SERVICE_TREE_INIT	((struct bfq_service_tree)		\
++				{ RB_ROOT, RB_ROOT, NULL, NULL, 0, 0 })
++
++#define RQ_BIC(rq)		((struct bfq_io_cq *) (rq)->elv.priv[0])
++#define RQ_BFQQ(rq)		((rq)->elv.priv[1])
++
++static inline void bfq_schedule_dispatch(struct bfq_data *bfqd);
++
++#include "bfq-ioc.c"
++#include "bfq-sched.c"
++#include "bfq-cgroup.c"
++
++#define bfq_class_idle(bfqq)	((bfqq)->entity.ioprio_class ==\
++				 IOPRIO_CLASS_IDLE)
++#define bfq_class_rt(bfqq)	((bfqq)->entity.ioprio_class ==\
++				 IOPRIO_CLASS_RT)
++
++#define bfq_sample_valid(samples)	((samples) > 80)
++
++/*
++ * The following macro groups conditions that need to be evaluated when
++ * checking if existing queues and groups form a symmetric scenario
++ * and therefore idling can be reduced or disabled for some of the
++ * queues. See the comment to the function bfq_bfqq_must_not_expire()
++ * for further details.
++ */
++#ifdef CONFIG_CGROUP_BFQIO
++#define symmetric_scenario	  (!bfqd->active_numerous_groups && \
++				   !bfq_differentiated_weights(bfqd))
++#else
++#define symmetric_scenario	  (!bfq_differentiated_weights(bfqd))
++#endif
++
++/*
++ * We regard a request as SYNC if it is either a read or has the SYNC bit
++ * set (in which case it could also be a direct WRITE).
++ */
++static inline int bfq_bio_sync(struct bio *bio)
++{
++	if (bio_data_dir(bio) == READ || (bio->bi_rw & REQ_SYNC))
++		return 1;
++
++	return 0;
++}
++
++/*
++ * Schedule a run of the queue if there are requests pending and none in the
++ * driver that will restart queueing.
++ */
++static inline void bfq_schedule_dispatch(struct bfq_data *bfqd)
++{
++	if (bfqd->queued != 0) {
++		bfq_log(bfqd, "schedule dispatch");
++		kblockd_schedule_work(&bfqd->unplug_work);
++	}
++}
++
++/*
++ * Lifted from AS - choose which of rq1 and rq2 is best served now.
++ * We choose the request that is closest to the head right now.  Distance
++ * behind the head is penalized and only allowed to a certain extent.
++ */
++static struct request *bfq_choose_req(struct bfq_data *bfqd,
++				      struct request *rq1,
++				      struct request *rq2,
++				      sector_t last)
++{
++	sector_t s1, s2, d1 = 0, d2 = 0;
++	unsigned long back_max;
++#define BFQ_RQ1_WRAP	0x01 /* request 1 wraps */
++#define BFQ_RQ2_WRAP	0x02 /* request 2 wraps */
++	unsigned wrap = 0; /* bit mask: requests behind the disk head? */
++
++	if (rq1 == NULL || rq1 == rq2)
++		return rq2;
++	if (rq2 == NULL)
++		return rq1;
++
++	if (rq_is_sync(rq1) && !rq_is_sync(rq2))
++		return rq1;
++	else if (rq_is_sync(rq2) && !rq_is_sync(rq1))
++		return rq2;
++	if ((rq1->cmd_flags & REQ_META) && !(rq2->cmd_flags & REQ_META))
++		return rq1;
++	else if ((rq2->cmd_flags & REQ_META) && !(rq1->cmd_flags & REQ_META))
++		return rq2;
++
++	s1 = blk_rq_pos(rq1);
++	s2 = blk_rq_pos(rq2);
++
++	/*
++	 * By definition, 1KiB is 2 sectors.
++	 */
++	back_max = bfqd->bfq_back_max * 2;
++
++	/*
++	 * Strict one way elevator _except_ in the case where we allow
++	 * short backward seeks which are biased as twice the cost of a
++	 * similar forward seek.
++	 */
++	if (s1 >= last)
++		d1 = s1 - last;
++	else if (s1 + back_max >= last)
++		d1 = (last - s1) * bfqd->bfq_back_penalty;
++	else
++		wrap |= BFQ_RQ1_WRAP;
++
++	if (s2 >= last)
++		d2 = s2 - last;
++	else if (s2 + back_max >= last)
++		d2 = (last - s2) * bfqd->bfq_back_penalty;
++	else
++		wrap |= BFQ_RQ2_WRAP;
++
++	/* Found required data */
++
++	/*
++	 * By doing switch() on the bit mask "wrap" we avoid having to
++	 * check two variables for all permutations: --> faster!
++	 */
++	switch (wrap) {
++	case 0: /* common case: neither rq1 nor rq2 wrapped */
++		if (d1 < d2)
++			return rq1;
++		else if (d2 < d1)
++			return rq2;
++		else {
++			if (s1 >= s2)
++				return rq1;
++			else
++				return rq2;
++		}
++
++	case BFQ_RQ2_WRAP:
++		return rq1;
++	case BFQ_RQ1_WRAP:
++		return rq2;
++	case (BFQ_RQ1_WRAP|BFQ_RQ2_WRAP): /* both rqs wrapped */
++	default:
++		/*
++		 * Since both rqs are wrapped,
++		 * start with the one that's further behind head
++		 * (--> only *one* back seek required),
++		 * since back seek takes more time than forward.
++		 */
++		if (s1 <= s2)
++			return rq1;
++		else
++			return rq2;
++	}
++}
++
++static struct bfq_queue *
++bfq_rq_pos_tree_lookup(struct bfq_data *bfqd, struct rb_root *root,
++		     sector_t sector, struct rb_node **ret_parent,
++		     struct rb_node ***rb_link)
++{
++	struct rb_node **p, *parent;
++	struct bfq_queue *bfqq = NULL;
++
++	parent = NULL;
++	p = &root->rb_node;
++	while (*p) {
++		struct rb_node **n;
++
++		parent = *p;
++		bfqq = rb_entry(parent, struct bfq_queue, pos_node);
++
++		/*
++		 * Sort strictly based on sector. Smallest to the left,
++		 * largest to the right.
++		 */
++		if (sector > blk_rq_pos(bfqq->next_rq))
++			n = &(*p)->rb_right;
++		else if (sector < blk_rq_pos(bfqq->next_rq))
++			n = &(*p)->rb_left;
++		else
++			break;
++		p = n;
++		bfqq = NULL;
++	}
++
++	*ret_parent = parent;
++	if (rb_link)
++		*rb_link = p;
++
++	bfq_log(bfqd, "rq_pos_tree_lookup %llu: returning %d",
++		(long long unsigned)sector,
++		bfqq != NULL ? bfqq->pid : 0);
++
++	return bfqq;
++}
++
++static void bfq_rq_pos_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	struct rb_node **p, *parent;
++	struct bfq_queue *__bfqq;
++
++	if (bfqq->pos_root != NULL) {
++		rb_erase(&bfqq->pos_node, bfqq->pos_root);
++		bfqq->pos_root = NULL;
++	}
++
++	if (bfq_class_idle(bfqq))
++		return;
++	if (!bfqq->next_rq)
++		return;
++
++	bfqq->pos_root = &bfqd->rq_pos_tree;
++	__bfqq = bfq_rq_pos_tree_lookup(bfqd, bfqq->pos_root,
++			blk_rq_pos(bfqq->next_rq), &parent, &p);
++	if (__bfqq == NULL) {
++		rb_link_node(&bfqq->pos_node, parent, p);
++		rb_insert_color(&bfqq->pos_node, bfqq->pos_root);
++	} else
++		bfqq->pos_root = NULL;
++}
++
++/*
++ * Tell whether there are active queues or groups with differentiated weights.
++ */
++static inline bool bfq_differentiated_weights(struct bfq_data *bfqd)
++{
++	/*
++	 * For weights to differ, at least one of the trees must contain
++	 * at least two nodes.
++	 */
++	return (!RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
++		(bfqd->queue_weights_tree.rb_node->rb_left ||
++		 bfqd->queue_weights_tree.rb_node->rb_right)
++#ifdef CONFIG_CGROUP_BFQIO
++	       ) ||
++	       (!RB_EMPTY_ROOT(&bfqd->group_weights_tree) &&
++		(bfqd->group_weights_tree.rb_node->rb_left ||
++		 bfqd->group_weights_tree.rb_node->rb_right)
++#endif
++	       );
++}
++
++/*
++ * If the weight-counter tree passed as input contains no counter for
++ * the weight of the input entity, then add that counter; otherwise just
++ * increment the existing counter.
++ *
++ * Note that weight-counter trees contain few nodes in mostly symmetric
++ * scenarios. For example, if all queues have the same weight, then the
++ * weight-counter tree for the queues may contain at most one node.
++ * This holds even if low_latency is on, because weight-raised queues
++ * are not inserted in the tree.
++ * In most scenarios, the rate at which nodes are created/destroyed
++ * should be low too.
++ */
++static void bfq_weights_tree_add(struct bfq_data *bfqd,
++				 struct bfq_entity *entity,
++				 struct rb_root *root)
++{
++	struct rb_node **new = &(root->rb_node), *parent = NULL;
++
++	/*
++	 * Do not insert if the entity is already associated with a
++	 * counter, which happens if:
++	 *   1) the entity is associated with a queue,
++	 *   2) a request arrival has caused the queue to become both
++	 *      non-weight-raised, and hence change its weight, and
++	 *      backlogged; in this respect, each of the two events
++	 *      causes an invocation of this function,
++	 *   3) this is the invocation of this function caused by the
++	 *      second event. This second invocation is actually useless,
++	 *      and we handle this fact by exiting immediately. More
++	 *      efficient or clearer solutions might possibly be adopted.
++	 */
++	if (entity->weight_counter)
++		return;
++
++	while (*new) {
++		struct bfq_weight_counter *__counter = container_of(*new,
++						struct bfq_weight_counter,
++						weights_node);
++		parent = *new;
++
++		if (entity->weight == __counter->weight) {
++			entity->weight_counter = __counter;
++			goto inc_counter;
++		}
++		if (entity->weight < __counter->weight)
++			new = &((*new)->rb_left);
++		else
++			new = &((*new)->rb_right);
++	}
++
++	entity->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
++					 GFP_ATOMIC);
++	entity->weight_counter->weight = entity->weight;
++	rb_link_node(&entity->weight_counter->weights_node, parent, new);
++	rb_insert_color(&entity->weight_counter->weights_node, root);
++
++inc_counter:
++	entity->weight_counter->num_active++;
++}
++
++/*
++ * Decrement the weight counter associated with the entity, and, if the
++ * counter reaches 0, remove the counter from the tree.
++ * See the comments to the function bfq_weights_tree_add() for considerations
++ * about overhead.
++ */
++static void bfq_weights_tree_remove(struct bfq_data *bfqd,
++				    struct bfq_entity *entity,
++				    struct rb_root *root)
++{
++	if (!entity->weight_counter)
++		return;
++
++	BUG_ON(RB_EMPTY_ROOT(root));
++	BUG_ON(entity->weight_counter->weight != entity->weight);
++
++	BUG_ON(!entity->weight_counter->num_active);
++	entity->weight_counter->num_active--;
++	if (entity->weight_counter->num_active > 0)
++		goto reset_entity_pointer;
++
++	rb_erase(&entity->weight_counter->weights_node, root);
++	kfree(entity->weight_counter);
++
++reset_entity_pointer:
++	entity->weight_counter = NULL;
++}
++
++static struct request *bfq_find_next_rq(struct bfq_data *bfqd,
++					struct bfq_queue *bfqq,
++					struct request *last)
++{
++	struct rb_node *rbnext = rb_next(&last->rb_node);
++	struct rb_node *rbprev = rb_prev(&last->rb_node);
++	struct request *next = NULL, *prev = NULL;
++
++	BUG_ON(RB_EMPTY_NODE(&last->rb_node));
++
++	if (rbprev != NULL)
++		prev = rb_entry_rq(rbprev);
++
++	if (rbnext != NULL)
++		next = rb_entry_rq(rbnext);
++	else {
++		rbnext = rb_first(&bfqq->sort_list);
++		if (rbnext && rbnext != &last->rb_node)
++			next = rb_entry_rq(rbnext);
++	}
++
++	return bfq_choose_req(bfqd, next, prev, blk_rq_pos(last));
++}
++
++/* see the definition of bfq_async_charge_factor for details */
++static inline unsigned long bfq_serv_to_charge(struct request *rq,
++					       struct bfq_queue *bfqq)
++{
++	return blk_rq_sectors(rq) *
++		(1 + ((!bfq_bfqq_sync(bfqq)) * (bfqq->wr_coeff == 1) *
++		bfq_async_charge_factor));
++}
++
++/**
++ * bfq_updated_next_req - update the queue after a new next_rq selection.
++ * @bfqd: the device data the queue belongs to.
++ * @bfqq: the queue to update.
++ *
++ * If the first request of a queue changes we make sure that the queue
++ * has enough budget to serve at least its first request (if the
++ * request has grown).  We do this because, if the queue does not have
++ * enough budget for its first request, it has to go through two dispatch
++ * rounds to actually get it dispatched.
++ */
++static void bfq_updated_next_req(struct bfq_data *bfqd,
++				 struct bfq_queue *bfqq)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++	struct bfq_service_tree *st = bfq_entity_service_tree(entity);
++	struct request *next_rq = bfqq->next_rq;
++	unsigned long new_budget;
++
++	if (next_rq == NULL)
++		return;
++
++	if (bfqq == bfqd->in_service_queue)
++		/*
++		 * In order not to break guarantees, budgets cannot be
++		 * changed after an entity has been selected.
++		 */
++		return;
++
++	BUG_ON(entity->tree != &st->active);
++	BUG_ON(entity == entity->sched_data->in_service_entity);
++
++	new_budget = max_t(unsigned long, bfqq->max_budget,
++			   bfq_serv_to_charge(next_rq, bfqq));
++	if (entity->budget != new_budget) {
++		entity->budget = new_budget;
++		bfq_log_bfqq(bfqd, bfqq, "updated next rq: new budget %lu",
++					 new_budget);
++		bfq_activate_bfqq(bfqd, bfqq);
++	}
++}
++
++static inline unsigned int bfq_wr_duration(struct bfq_data *bfqd)
++{
++	u64 dur;
++
++	if (bfqd->bfq_wr_max_time > 0)
++		return bfqd->bfq_wr_max_time;
++
++	dur = bfqd->RT_prod;
++	do_div(dur, bfqd->peak_rate);
++
++	return dur;
++}
++
++/* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
++static inline void bfq_reset_burst_list(struct bfq_data *bfqd,
++					struct bfq_queue *bfqq)
++{
++	struct bfq_queue *item;
++	struct hlist_node *n;
++
++	hlist_for_each_entry_safe(item, n, &bfqd->burst_list, burst_list_node)
++		hlist_del_init(&item->burst_list_node);
++	hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
++	bfqd->burst_size = 1;
++}
++
++/* Add bfqq to the list of queues in current burst (see bfq_handle_burst) */
++static void bfq_add_to_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	/* Increment burst size to also take bfqq into account */
++	bfqd->burst_size++;
++
++	if (bfqd->burst_size == bfqd->bfq_large_burst_thresh) {
++		struct bfq_queue *pos, *bfqq_item;
++		struct hlist_node *n;
++
++		/*
++		 * Enough queues have been activated shortly after each
++		 * other to consider this burst as large.
++		 */
++		bfqd->large_burst = true;
++
++		/*
++		 * We can now mark all queues in the burst list as
++		 * belonging to a large burst.
++		 */
++		hlist_for_each_entry(bfqq_item, &bfqd->burst_list,
++				     burst_list_node)
++			bfq_mark_bfqq_in_large_burst(bfqq_item);
++		bfq_mark_bfqq_in_large_burst(bfqq);
++
++		/*
++		 * From now on, and until the current burst finishes, any
++		 * new queue being activated shortly after the last queue
++		 * was inserted in the burst can be immediately marked as
++		 * belonging to a large burst. So the burst list is not
++		 * needed any more. Remove it.
++		 */
++		hlist_for_each_entry_safe(pos, n, &bfqd->burst_list,
++					  burst_list_node)
++			hlist_del_init(&pos->burst_list_node);
++	} else /* burst not yet large: add bfqq to the burst list */
++		hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
++}
++
++/*
++ * If many queues happen to become active shortly after each other, then,
++ * to help the processes associated with these queues get their job done as
++ * soon as possible, it is usually better to not grant either weight-raising
++ * or device idling to these queues. In this comment we describe, firstly,
++ * the reasons why this fact holds, and, secondly, the next function, which
++ * implements the main steps needed to properly mark these queues so that
++ * they can then be treated in a different way.
++ *
++ * As for the terminology, we say that a queue becomes active, i.e.,
++ * switches from idle to backlogged, either when it is created (as a
++ * consequence of the arrival of an I/O request), or, if already existing,
++ * when a new request for the queue arrives while the queue is idle.
++ * Bursts of activations, i.e., activations of different queues occurring
++ * shortly after each other, are typically caused by services or applications
++ * that spawn or reactivate many parallel threads/processes. Examples are
++ * systemd during boot or git grep.
++ *
++ * These services or applications benefit mostly from a high throughput:
++ * the quicker the requests of the activated queues are cumulatively served,
++ * the sooner the target job of these queues gets completed. As a consequence,
++ * weight-raising any of these queues, which also implies idling the device
++ * for it, is almost always counterproductive: in most cases it just lowers
++ * throughput.
++ *
++ * On the other hand, a burst of activations may be also caused by the start
++ * of an application that does not consist of a lot of parallel I/O-bound
++ * threads. In fact, with a complex application, the burst may be just a
++ * consequence of the fact that several processes need to be executed to
++ * start up the application. To start an application as quickly as possible,
++ * the best thing to do is to privilege the I/O related to the application
++ * with respect to all other I/O. Therefore, the best strategy to start as
++ * quickly as possible an application that causes a burst of activations is
++ * to weight-raise all the queues activated during the burst. This is the
++ * exact opposite of the best strategy for the other type of bursts.
++ *
++ * In the end, to take the best action for each of the two cases, the two
++ * types of bursts need to be distinguished. Fortunately, this seems
++ * relatively easy to do, by looking at the sizes of the bursts. In
++ * particular, we found a threshold such that bursts with a larger size
++ * than that threshold are apparently caused only by services or commands
++ * such as systemd or git grep. For brevity, hereafter we simply call these
++ * bursts 'large'. BFQ *does not* weight-raise queues whose activations occur
++ * in a large burst. In addition, for each of these queues BFQ performs or
++ * does not perform idling depending on which choice boosts the throughput
++ * most. The exact choice depends on the device and request pattern at
++ * hand.
++ *
++ * Turning back to the next function, it implements all the steps needed
++ * to detect the occurrence of a large burst and to properly mark all the
++ * queues belonging to it (so that they can then be treated in a different
++ * way). This goal is achieved by maintaining a special "burst list" that
++ * holds, temporarily, the queues that belong to the burst in progress. The
++ * list is then used to mark these queues as belonging to a large burst if
++ * the burst does become large. The main steps are the following.
++ *
++ * . when the very first queue is activated, the queue is inserted into the
++ *   list (as it could be the first queue in a possible burst)
++ *
++ * . if the current burst has not yet become large, and a queue Q that does
++ *   not yet belong to the burst is activated shortly after the last time
++ *   at which a new queue entered the burst list, then the function appends
++ *   Q to the burst list
++ *
++ * . if, as a consequence of the previous step, the burst size reaches
++ *   the large-burst threshold, then
++ *
++ *     . all the queues in the burst list are marked as belonging to a
++ *       large burst
++ *
++ *     . the burst list is deleted; in fact, the burst list already served
++ *       its purpose (keeping temporarily track of the queues in a burst,
++ *       so as to be able to mark them as belonging to a large burst in the
++ *       previous sub-step), and now is not needed any more
++ *
++ *     . the device enters a large-burst mode
++ *
++ * . if a queue Q that does not belong to the burst is activated while
++ *   the device is in large-burst mode and shortly after the last time
++ *   at which a queue either entered the burst list or was marked as
++ *   belonging to the current large burst, then Q is immediately marked
++ *   as belonging to a large burst.
++ *
++ * . if a queue Q that does not belong to the burst is activated a while
++ *   after, i.e., not shortly after, the last time at which a queue
++ *   either entered the burst list or was marked as belonging to the
++ *   current large burst, then the current burst is deemed as finished and:
++ *
++ *        . the large-burst mode is reset if set
++ *
++ *        . the burst list is emptied
++ *
++ *        . Q is inserted in the burst list, as Q may be the first queue
++ *          in a possible new burst (then the burst list contains just Q
++ *          after this step).
++ */
++static void bfq_handle_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++			     bool idle_for_long_time)
++{
++	/*
++	 * If bfqq happened to be activated in a burst, but has been idle
++	 * for at least as long as an interactive queue, then we assume
++	 * that, in the overall I/O initiated in the burst, the I/O
++	 * associated with bfqq is finished. So bfqq does not need to be
++	 * treated as a queue belonging to a burst anymore. Accordingly,
++	 * we reset bfqq's in_large_burst flag if set, and remove bfqq
++	 * from the burst list if it's there. However, we do not decrement
++	 * burst_size, because the fact that bfqq does not need to belong
++	 * to the burst list any more does not invalidate the fact that
++	 * bfqq may have been activated during the current burst.
++	 */
++	if (idle_for_long_time) {
++		hlist_del_init(&bfqq->burst_list_node);
++		bfq_clear_bfqq_in_large_burst(bfqq);
++	}
++
++	/*
++	 * If bfqq is already in the burst list or is part of a large
++	 * burst, then there is nothing else to do.
++	 */
++	if (!hlist_unhashed(&bfqq->burst_list_node) ||
++	    bfq_bfqq_in_large_burst(bfqq))
++		return;
++
++	/*
++	 * If bfqq's activation happens late enough, then the current
++	 * burst is finished, and related data structures must be reset.
++	 *
++	 * In this respect, consider the special case where bfqq is the very
++	 * first queue being activated. In this case, last_ins_in_burst is
++	 * not yet significant when we get here. But it is easy to verify
++	 * that, whether or not the following condition is true, bfqq will
++	 * end up being inserted into the burst list. In particular the
++	 * list will happen to contain only bfqq. And this is exactly what
++	 * has to happen, as bfqq may be the first queue in a possible
++	 * burst.
++	 */
++	if (time_is_before_jiffies(bfqd->last_ins_in_burst +
++	    bfqd->bfq_burst_interval)) {
++		bfqd->large_burst = false;
++		bfq_reset_burst_list(bfqd, bfqq);
++		return;
++	}
++
++	/*
++	 * If we get here, then bfqq is being activated shortly after the
++	 * last queue. So, if the current burst is also large, we can mark
++	 * bfqq as belonging to this large burst immediately.
++	 */
++	if (bfqd->large_burst) {
++		bfq_mark_bfqq_in_large_burst(bfqq);
++		return;
++	}
++
++	/*
++	 * If we get here, then a large-burst state has not yet been
++	 * reached, but bfqq is being activated shortly after the last
++	 * queue. Then we add bfqq to the burst.
++	 */
++	bfq_add_to_burst(bfqd, bfqq);
++}
++
++static void bfq_add_request(struct request *rq)
++{
++	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++	struct bfq_entity *entity = &bfqq->entity;
++	struct bfq_data *bfqd = bfqq->bfqd;
++	struct request *next_rq, *prev;
++	unsigned long old_wr_coeff = bfqq->wr_coeff;
++	bool interactive = false;
++
++	bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
++	bfqq->queued[rq_is_sync(rq)]++;
++	bfqd->queued++;
++
++	elv_rb_add(&bfqq->sort_list, rq);
++
++	/*
++	 * Check if this request is a better next-serve candidate.
++	 */
++	prev = bfqq->next_rq;
++	next_rq = bfq_choose_req(bfqd, bfqq->next_rq, rq, bfqd->last_position);
++	BUG_ON(next_rq == NULL);
++	bfqq->next_rq = next_rq;
++
++	/*
++	 * Adjust priority tree position, if next_rq changes.
++	 */
++	if (prev != bfqq->next_rq)
++		bfq_rq_pos_tree_add(bfqd, bfqq);
++
++	if (!bfq_bfqq_busy(bfqq)) {
++		bool soft_rt,
++		     idle_for_long_time = time_is_before_jiffies(
++						bfqq->budget_timeout +
++						bfqd->bfq_wr_min_idle_time);
++
++		if (bfq_bfqq_sync(bfqq)) {
++			bool already_in_burst =
++			   !hlist_unhashed(&bfqq->burst_list_node) ||
++			   bfq_bfqq_in_large_burst(bfqq);
++			bfq_handle_burst(bfqd, bfqq, idle_for_long_time);
++			/*
++			 * If bfqq was not already in the current burst,
++			 * then, at this point, bfqq either has been
++			 * added to the current burst or has caused the
++			 * current burst to terminate. In particular, in
++			 * the second case, bfqq has become the first
++			 * queue in a possible new burst.
++			 * In both cases last_ins_in_burst needs to be
++			 * moved forward.
++			 */
++			if (!already_in_burst)
++				bfqd->last_ins_in_burst = jiffies;
++		}
++
++		soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
++			!bfq_bfqq_in_large_burst(bfqq) &&
++			time_is_before_jiffies(bfqq->soft_rt_next_start);
++		interactive = !bfq_bfqq_in_large_burst(bfqq) &&
++			      idle_for_long_time;
++		entity->budget = max_t(unsigned long, bfqq->max_budget,
++				       bfq_serv_to_charge(next_rq, bfqq));
++
++		if (!bfq_bfqq_IO_bound(bfqq)) {
++			if (time_before(jiffies,
++					RQ_BIC(rq)->ttime.last_end_request +
++					bfqd->bfq_slice_idle)) {
++				bfqq->requests_within_timer++;
++				if (bfqq->requests_within_timer >=
++				    bfqd->bfq_requests_within_timer)
++					bfq_mark_bfqq_IO_bound(bfqq);
++			} else
++				bfqq->requests_within_timer = 0;
++		}
++
++		if (!bfqd->low_latency)
++			goto add_bfqq_busy;
++
++		/*
++		 * If the queue is not being boosted and has been idle
++		 * for enough time, start a weight-raising period
++		 */
++		if (old_wr_coeff == 1 && (interactive || soft_rt)) {
++			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
++			if (interactive)
++				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++			else
++				bfqq->wr_cur_max_time =
++					bfqd->bfq_wr_rt_max_time;
++			bfq_log_bfqq(bfqd, bfqq,
++				     "wrais starting at %lu, rais_max_time %u",
++				     jiffies,
++				     jiffies_to_msecs(bfqq->wr_cur_max_time));
++		} else if (old_wr_coeff > 1) {
++			if (interactive)
++				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++			else if (bfq_bfqq_in_large_burst(bfqq) ||
++				 (bfqq->wr_cur_max_time ==
++				  bfqd->bfq_wr_rt_max_time &&
++				  !soft_rt)) {
++				bfqq->wr_coeff = 1;
++				bfq_log_bfqq(bfqd, bfqq,
++					"wrais ending at %lu, rais_max_time %u",
++					jiffies,
++					jiffies_to_msecs(bfqq->
++						wr_cur_max_time));
++			} else if (time_before(
++					bfqq->last_wr_start_finish +
++					bfqq->wr_cur_max_time,
++					jiffies +
++					bfqd->bfq_wr_rt_max_time) &&
++				   soft_rt) {
++				/*
++				 *
++				 * The remaining weight-raising time is lower
++				 * than bfqd->bfq_wr_rt_max_time, which
++				 * means that the application is enjoying
++				 * weight raising either because it was deemed
++				 * soft rt in the recent past, or because it was
++				 * deemed interactive long ago. In both cases,
++				 * resetting now the current remaining weight-
++				 * raising time for the application to the
++				 * weight-raising duration for soft rt
++				 * applications would not cause any latency
++				 * increase for the application (as the new
++				 * duration would be higher than the remaining
++				 * time).
++				 *
++				 * In addition, the application is now meeting
++				 * the requirements for being deemed soft rt.
++				 * In the end we can correctly and safely
++				 * (re)charge the weight-raising duration for
++				 * the application with the weight-raising
++				 * duration for soft rt applications.
++				 *
++				 * In particular, doing this recharge now, i.e.,
++				 * before the weight-raising period for the
++				 * application finishes, reduces the probability
++				 * of the following negative scenario:
++				 * 1) the weight of a soft rt application is
++				 *    raised at startup (as for any newly
++				 *    created application),
++				 * 2) since the application is not interactive,
++				 *    at a certain time weight-raising is
++				 *    stopped for the application,
++				 * 3) at that time the application happens to
++				 *    still have pending requests, and hence
++				 *    is destined to not have a chance to be
++				 *    deemed soft rt before these requests are
++				 *    completed (see the comments to the
++				 *    function bfq_bfqq_softrt_next_start()
++				 *    for details on soft rt detection),
++				 * 4) these pending requests experience a high
++				 *    latency because the application is not
++				 *    weight-raised while they are pending.
++				 */
++				bfqq->last_wr_start_finish = jiffies;
++				bfqq->wr_cur_max_time =
++					bfqd->bfq_wr_rt_max_time;
++			}
++		}
++		if (old_wr_coeff != bfqq->wr_coeff)
++			entity->ioprio_changed = 1;
++add_bfqq_busy:
++		bfqq->last_idle_bklogged = jiffies;
++		bfqq->service_from_backlogged = 0;
++		bfq_clear_bfqq_softrt_update(bfqq);
++		bfq_add_bfqq_busy(bfqd, bfqq);
++	} else {
++		if (bfqd->low_latency && old_wr_coeff == 1 && !rq_is_sync(rq) &&
++		    time_is_before_jiffies(
++				bfqq->last_wr_start_finish +
++				bfqd->bfq_wr_min_inter_arr_async)) {
++			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
++			bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++
++			bfqd->wr_busy_queues++;
++			entity->ioprio_changed = 1;
++			bfq_log_bfqq(bfqd, bfqq,
++			    "non-idle wrais starting at %lu, rais_max_time %u",
++			    jiffies,
++			    jiffies_to_msecs(bfqq->wr_cur_max_time));
++		}
++		if (prev != bfqq->next_rq)
++			bfq_updated_next_req(bfqd, bfqq);
++	}
++
++	if (bfqd->low_latency &&
++		(old_wr_coeff == 1 || bfqq->wr_coeff == 1 || interactive))
++		bfqq->last_wr_start_finish = jiffies;
++}
++
++static struct request *bfq_find_rq_fmerge(struct bfq_data *bfqd,
++					  struct bio *bio)
++{
++	struct task_struct *tsk = current;
++	struct bfq_io_cq *bic;
++	struct bfq_queue *bfqq;
++
++	bic = bfq_bic_lookup(bfqd, tsk->io_context);
++	if (bic == NULL)
++		return NULL;
++
++	bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
++	if (bfqq != NULL)
++		return elv_rb_find(&bfqq->sort_list, bio_end_sector(bio));
++
++	return NULL;
++}
++
++static void bfq_activate_request(struct request_queue *q, struct request *rq)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++
++	bfqd->rq_in_driver++;
++	bfqd->last_position = blk_rq_pos(rq) + blk_rq_sectors(rq);
++	bfq_log(bfqd, "activate_request: new bfqd->last_position %llu",
++		(long long unsigned)bfqd->last_position);
++}
++
++static inline void bfq_deactivate_request(struct request_queue *q,
++					  struct request *rq)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++
++	BUG_ON(bfqd->rq_in_driver == 0);
++	bfqd->rq_in_driver--;
++}
++
++static void bfq_remove_request(struct request *rq)
++{
++	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++	struct bfq_data *bfqd = bfqq->bfqd;
++	const int sync = rq_is_sync(rq);
++
++	if (bfqq->next_rq == rq) {
++		bfqq->next_rq = bfq_find_next_rq(bfqd, bfqq, rq);
++		bfq_updated_next_req(bfqd, bfqq);
++	}
++
++	if (rq->queuelist.prev != &rq->queuelist)
++		list_del_init(&rq->queuelist);
++	BUG_ON(bfqq->queued[sync] == 0);
++	bfqq->queued[sync]--;
++	bfqd->queued--;
++	elv_rb_del(&bfqq->sort_list, rq);
++
++	if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
++		if (bfq_bfqq_busy(bfqq) && bfqq != bfqd->in_service_queue)
++			bfq_del_bfqq_busy(bfqd, bfqq, 1);
++		/*
++		 * Remove queue from request-position tree as it is empty.
++		 */
++		if (bfqq->pos_root != NULL) {
++			rb_erase(&bfqq->pos_node, bfqq->pos_root);
++			bfqq->pos_root = NULL;
++		}
++	}
++
++	if (rq->cmd_flags & REQ_META) {
++		BUG_ON(bfqq->meta_pending == 0);
++		bfqq->meta_pending--;
++	}
++}
++
++static int bfq_merge(struct request_queue *q, struct request **req,
++		     struct bio *bio)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct request *__rq;
++
++	__rq = bfq_find_rq_fmerge(bfqd, bio);
++	if (__rq != NULL && elv_rq_merge_ok(__rq, bio)) {
++		*req = __rq;
++		return ELEVATOR_FRONT_MERGE;
++	}
++
++	return ELEVATOR_NO_MERGE;
++}
++
++static void bfq_merged_request(struct request_queue *q, struct request *req,
++			       int type)
++{
++	if (type == ELEVATOR_FRONT_MERGE &&
++	    rb_prev(&req->rb_node) &&
++	    blk_rq_pos(req) <
++	    blk_rq_pos(container_of(rb_prev(&req->rb_node),
++				    struct request, rb_node))) {
++		struct bfq_queue *bfqq = RQ_BFQQ(req);
++		struct bfq_data *bfqd = bfqq->bfqd;
++		struct request *prev, *next_rq;
++
++		/* Reposition request in its sort_list */
++		elv_rb_del(&bfqq->sort_list, req);
++		elv_rb_add(&bfqq->sort_list, req);
++		/* Choose next request to be served for bfqq */
++		prev = bfqq->next_rq;
++		next_rq = bfq_choose_req(bfqd, bfqq->next_rq, req,
++					 bfqd->last_position);
++		BUG_ON(next_rq == NULL);
++		bfqq->next_rq = next_rq;
++		/*
++		 * If next_rq changes, update both the queue's budget to
++		 * fit the new request and the queue's position in its
++		 * rq_pos_tree.
++		 */
++		if (prev != bfqq->next_rq) {
++			bfq_updated_next_req(bfqd, bfqq);
++			bfq_rq_pos_tree_add(bfqd, bfqq);
++		}
++	}
++}
++
++static void bfq_merged_requests(struct request_queue *q, struct request *rq,
++				struct request *next)
++{
++	struct bfq_queue *bfqq = RQ_BFQQ(rq), *next_bfqq = RQ_BFQQ(next);
++
++	/*
++	 * If next and rq belong to the same bfq_queue and next is older
++	 * than rq, then reposition rq in the fifo (by substituting next
++	 * with rq). Otherwise, if next and rq belong to different
++	 * bfq_queues, never reposition rq: in fact, we would have to
++	 * reposition it with respect to next's position in its own fifo,
++	 * which would most certainly be too expensive with respect to
++	 * the benefits.
++	 */
++	if (bfqq == next_bfqq &&
++	    !list_empty(&rq->queuelist) && !list_empty(&next->queuelist) &&
++	    time_before(next->fifo_time, rq->fifo_time)) {
++		list_del_init(&rq->queuelist);
++		list_replace_init(&next->queuelist, &rq->queuelist);
++		rq->fifo_time = next->fifo_time;
++	}
++
++	if (bfqq->next_rq == next)
++		bfqq->next_rq = rq;
++
++	bfq_remove_request(next);
++}
++
++/* Must be called with bfqq != NULL */
++static inline void bfq_bfqq_end_wr(struct bfq_queue *bfqq)
++{
++	BUG_ON(bfqq == NULL);
++	if (bfq_bfqq_busy(bfqq))
++		bfqq->bfqd->wr_busy_queues--;
++	bfqq->wr_coeff = 1;
++	bfqq->wr_cur_max_time = 0;
++	/* Trigger a weight change on the next activation of the queue */
++	bfqq->entity.ioprio_changed = 1;
++}
++
++static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
++				    struct bfq_group *bfqg)
++{
++	int i, j;
++
++	for (i = 0; i < 2; i++)
++		for (j = 0; j < IOPRIO_BE_NR; j++)
++			if (bfqg->async_bfqq[i][j] != NULL)
++				bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
++	if (bfqg->async_idle_bfqq != NULL)
++		bfq_bfqq_end_wr(bfqg->async_idle_bfqq);
++}
++
++static void bfq_end_wr(struct bfq_data *bfqd)
++{
++	struct bfq_queue *bfqq;
++
++	spin_lock_irq(bfqd->queue->queue_lock);
++
++	list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
++		bfq_bfqq_end_wr(bfqq);
++	list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
++		bfq_bfqq_end_wr(bfqq);
++	bfq_end_wr_async(bfqd);
++
++	spin_unlock_irq(bfqd->queue->queue_lock);
++}
++
++static int bfq_allow_merge(struct request_queue *q, struct request *rq,
++			   struct bio *bio)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct bfq_io_cq *bic;
++	struct bfq_queue *bfqq;
++
++	/*
++	 * Disallow merge of a sync bio into an async request.
++	 */
++	if (bfq_bio_sync(bio) && !rq_is_sync(rq))
++		return 0;
++
++	/*
++	 * Lookup the bfqq that this bio will be queued with. Allow
++	 * merge only if rq is queued there.
++	 * Queue lock is held here.
++	 */
++	bic = bfq_bic_lookup(bfqd, current->io_context);
++	if (bic == NULL)
++		return 0;
++
++	bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
++	return bfqq == RQ_BFQQ(rq);
++}
++
++static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
++				       struct bfq_queue *bfqq)
++{
++	if (bfqq != NULL) {
++		bfq_mark_bfqq_must_alloc(bfqq);
++		bfq_mark_bfqq_budget_new(bfqq);
++		bfq_clear_bfqq_fifo_expire(bfqq);
++
++		bfqd->budgets_assigned = (bfqd->budgets_assigned*7 + 256) / 8;
++
++		bfq_log_bfqq(bfqd, bfqq,
++			     "set_in_service_queue, cur-budget = %lu",
++			     bfqq->entity.budget);
++	}
++
++	bfqd->in_service_queue = bfqq;
++}
++
++/*
++ * Get and set a new queue for service.
++ */
++static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd,
++						  struct bfq_queue *bfqq)
++{
++	if (!bfqq)
++		bfqq = bfq_get_next_queue(bfqd);
++	else
++		bfq_get_next_queue_forced(bfqd, bfqq);
++
++	__bfq_set_in_service_queue(bfqd, bfqq);
++	return bfqq;
++}
++
++static inline sector_t bfq_dist_from_last(struct bfq_data *bfqd,
++					  struct request *rq)
++{
++	if (blk_rq_pos(rq) >= bfqd->last_position)
++		return blk_rq_pos(rq) - bfqd->last_position;
++	else
++		return bfqd->last_position - blk_rq_pos(rq);
++}
++
++/*
++ * Return true if rq is close enough to bfqd->last_position, i.e., if
++ * the distance between the position of rq and bfqd->last_position is
++ * at most BFQQ_SEEK_THR sectors.
++ */
++static inline int bfq_rq_close(struct bfq_data *bfqd, struct request *rq)
++{
++	return bfq_dist_from_last(bfqd, rq) <= BFQQ_SEEK_THR;
++}
++
++static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
++{
++	struct rb_root *root = &bfqd->rq_pos_tree;
++	struct rb_node *parent, *node;
++	struct bfq_queue *__bfqq;
++	sector_t sector = bfqd->last_position;
++
++	if (RB_EMPTY_ROOT(root))
++		return NULL;
++
++	/*
++	 * First, if we find a request starting at the end of the last
++	 * request, choose it.
++	 */
++	__bfqq = bfq_rq_pos_tree_lookup(bfqd, root, sector, &parent, NULL);
++	if (__bfqq != NULL)
++		return __bfqq;
++
++	/*
++	 * If the exact sector wasn't found, the parent of the NULL leaf
++	 * will contain the closest sector (rq_pos_tree sorted by
++	 * next_request position).
++	 */
++	__bfqq = rb_entry(parent, struct bfq_queue, pos_node);
++	if (bfq_rq_close(bfqd, __bfqq->next_rq))
++		return __bfqq;
++
++	if (blk_rq_pos(__bfqq->next_rq) < sector)
++		node = rb_next(&__bfqq->pos_node);
++	else
++		node = rb_prev(&__bfqq->pos_node);
++	if (node == NULL)
++		return NULL;
++
++	__bfqq = rb_entry(node, struct bfq_queue, pos_node);
++	if (bfq_rq_close(bfqd, __bfqq->next_rq))
++		return __bfqq;
++
++	return NULL;
++}
++
++/*
++ * bfqd - obvious
++ * cur_bfqq - passed in so that we don't decide that the current queue
++ *            is closely cooperating with itself.
++ *
++ * We are assuming that cur_bfqq has dispatched at least one request,
++ * and that bfqd->last_position reflects a position on the disk associated
++ * with the I/O issued by cur_bfqq.
++ */
++static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
++					      struct bfq_queue *cur_bfqq)
++{
++	struct bfq_queue *bfqq;
++
++	if (bfq_class_idle(cur_bfqq))
++		return NULL;
++	if (!bfq_bfqq_sync(cur_bfqq))
++		return NULL;
++	if (BFQQ_SEEKY(cur_bfqq))
++		return NULL;
++
++	/* If device has only one backlogged bfq_queue, don't search. */
++	if (bfqd->busy_queues == 1)
++		return NULL;
++
++	/*
++	 * We should notice if some of the queues are cooperating, e.g.
++	 * working closely on the same area of the disk. In that case,
++	 * we can group them together and avoid wasting time idling.
++	 */
++	bfqq = bfqq_close(bfqd);
++	if (bfqq == NULL || bfqq == cur_bfqq)
++		return NULL;
++
++	/*
++	 * Do not merge queues from different bfq_groups.
++	 */
++	if (bfqq->entity.parent != cur_bfqq->entity.parent)
++		return NULL;
++
++	/*
++	 * It only makes sense to merge sync queues.
++	 */
++	if (!bfq_bfqq_sync(bfqq))
++		return NULL;
++	if (BFQQ_SEEKY(bfqq))
++		return NULL;
++
++	/*
++	 * Do not merge queues of different priority classes.
++	 */
++	if (bfq_class_rt(bfqq) != bfq_class_rt(cur_bfqq))
++		return NULL;
++
++	return bfqq;
++}
++
++/*
++ * If enough samples have been computed, return the current max budget
++ * stored in bfqd, which is dynamically updated according to the
++ * estimated disk peak rate; otherwise return the default max budget
++ */
++static inline unsigned long bfq_max_budget(struct bfq_data *bfqd)
++{
++	if (bfqd->budgets_assigned < 194)
++		return bfq_default_max_budget;
++	else
++		return bfqd->bfq_max_budget;
++}
++
++/*
++ * Return min budget, which is a fraction of the current or default
++ * max budget (trying with 1/32)
++ */
++static inline unsigned long bfq_min_budget(struct bfq_data *bfqd)
++{
++	if (bfqd->budgets_assigned < 194)
++		return bfq_default_max_budget / 32;
++	else
++		return bfqd->bfq_max_budget / 32;
++}
++
++static void bfq_arm_slice_timer(struct bfq_data *bfqd)
++{
++	struct bfq_queue *bfqq = bfqd->in_service_queue;
++	struct bfq_io_cq *bic;
++	unsigned long sl;
++
++	BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
++
++	/* Processes have exited, don't wait. */
++	bic = bfqd->in_service_bic;
++	if (bic == NULL || atomic_read(&bic->icq.ioc->active_ref) == 0)
++		return;
++
++	bfq_mark_bfqq_wait_request(bfqq);
++
++	/*
++	 * We don't want to idle for seeks, but we do want to allow
++	 * fair distribution of slice time for a process doing back-to-back
++	 * seeks. So allow a little bit of time for it to submit a new rq.
++	 *
++	 * To prevent processes with (partly) seeky workloads from
++	 * being too ill-treated, grant them a small fraction of the
++	 * assigned budget before reducing the waiting time to
++	 * BFQ_MIN_TT. This happened to help reduce latency.
++	 */
++	sl = bfqd->bfq_slice_idle;
++	/*
++	 * Unless the queue is being weight-raised or the scenario is
++	 * asymmetric, grant only minimum idle time if the queue either
++	 * has been seeky for long enough or has already proved to be
++	 * constantly seeky.
++	 */
++	if (bfq_sample_valid(bfqq->seek_samples) &&
++	    ((BFQQ_SEEKY(bfqq) && bfqq->entity.service >
++				  bfq_max_budget(bfqq->bfqd) / 8) ||
++	      bfq_bfqq_constantly_seeky(bfqq)) && bfqq->wr_coeff == 1 &&
++	    symmetric_scenario)
++		sl = min(sl, msecs_to_jiffies(BFQ_MIN_TT));
++	else if (bfqq->wr_coeff > 1)
++		sl = sl * 3;
++	bfqd->last_idling_start = ktime_get();
++	mod_timer(&bfqd->idle_slice_timer, jiffies + sl);
++	bfq_log(bfqd, "arm idle: %u/%u ms",
++		jiffies_to_msecs(sl), jiffies_to_msecs(bfqd->bfq_slice_idle));
++}
++
++/*
++ * Set the maximum time for the in-service queue to consume its
++ * budget. This prevents seeky processes from lowering the disk
++ * throughput (always guaranteed with a time slice scheme as in CFQ).
++ */
++static void bfq_set_budget_timeout(struct bfq_data *bfqd)
++{
++	struct bfq_queue *bfqq = bfqd->in_service_queue;
++	unsigned int timeout_coeff;
++	if (bfqq->wr_cur_max_time == bfqd->bfq_wr_rt_max_time)
++		timeout_coeff = 1;
++	else
++		timeout_coeff = bfqq->entity.weight / bfqq->entity.orig_weight;
++
++	bfqd->last_budget_start = ktime_get();
++
++	bfq_clear_bfqq_budget_new(bfqq);
++	bfqq->budget_timeout = jiffies +
++		bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] * timeout_coeff;
++
++	bfq_log_bfqq(bfqd, bfqq, "set budget_timeout %u",
++		jiffies_to_msecs(bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] *
++		timeout_coeff));
++}
++
++/*
++ * Move request from internal lists to the request queue dispatch list.
++ */
++static void bfq_dispatch_insert(struct request_queue *q, struct request *rq)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++
++	/*
++	 * For consistency, the next instruction should have been executed
++	 * after removing the request from the queue and dispatching it.
++	 * Instead, we execute this instruction before bfq_remove_request()
++	 * (and hence introduce a temporary inconsistency), for efficiency.
++	 * In fact, in a forced_dispatch, this prevents two counters related
++	 * to bfqq->dispatched from being uselessly decremented if bfqq is
++	 * not in service, and then incremented again after incrementing
++	 * bfqq->dispatched.
++	 */
++	bfqq->dispatched++;
++	bfq_remove_request(rq);
++	elv_dispatch_sort(q, rq);
++
++	if (bfq_bfqq_sync(bfqq))
++		bfqd->sync_flight++;
++}
++
++/*
++ * Return expired entry, or NULL to just start from scratch in rbtree.
++ */
++static struct request *bfq_check_fifo(struct bfq_queue *bfqq)
++{
++	struct request *rq = NULL;
++
++	if (bfq_bfqq_fifo_expire(bfqq))
++		return NULL;
++
++	bfq_mark_bfqq_fifo_expire(bfqq);
++
++	if (list_empty(&bfqq->fifo))
++		return NULL;
++
++	rq = rq_entry_fifo(bfqq->fifo.next);
++
++	if (time_before(jiffies, rq->fifo_time))
++		return NULL;
++
++	return rq;
++}
++
++/* Must be called with the queue_lock held. */
++static int bfqq_process_refs(struct bfq_queue *bfqq)
++{
++	int process_refs, io_refs;
++
++	io_refs = bfqq->allocated[READ] + bfqq->allocated[WRITE];
++	process_refs = atomic_read(&bfqq->ref) - io_refs - bfqq->entity.on_st;
++	BUG_ON(process_refs < 0);
++	return process_refs;
++}
++
++static void bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
++{
++	int process_refs, new_process_refs;
++	struct bfq_queue *__bfqq;
++
++	/*
++	 * If there are no process references on the new_bfqq, then it is
++	 * unsafe to follow the ->new_bfqq chain as other bfqq's in the chain
++	 * may have dropped their last reference (not just their last process
++	 * reference).
++	 */
++	if (!bfqq_process_refs(new_bfqq))
++		return;
++
++	/* Avoid a circular list and skip interim queue merges. */
++	while ((__bfqq = new_bfqq->new_bfqq)) {
++		if (__bfqq == bfqq)
++			return;
++		new_bfqq = __bfqq;
++	}
++
++	process_refs = bfqq_process_refs(bfqq);
++	new_process_refs = bfqq_process_refs(new_bfqq);
++	/*
++	 * If the process for the bfqq has gone away, there is no
++	 * sense in merging the queues.
++	 */
++	if (process_refs == 0 || new_process_refs == 0)
++		return;
++
++	/*
++	 * Merge in the direction of the lesser amount of work.
++	 */
++	if (new_process_refs >= process_refs) {
++		bfqq->new_bfqq = new_bfqq;
++		atomic_add(process_refs, &new_bfqq->ref);
++	} else {
++		new_bfqq->new_bfqq = bfqq;
++		atomic_add(new_process_refs, &bfqq->ref);
++	}
++	bfq_log_bfqq(bfqq->bfqd, bfqq, "scheduling merge with queue %d",
++		new_bfqq->pid);
++}
++
++static inline unsigned long bfq_bfqq_budget_left(struct bfq_queue *bfqq)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++	return entity->budget - entity->service;
++}
++
++static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	BUG_ON(bfqq != bfqd->in_service_queue);
++
++	__bfq_bfqd_reset_in_service(bfqd);
++
++	/*
++	 * If this bfqq is shared between multiple processes, check
++	 * to make sure that those processes are still issuing I/Os
++	 * within the mean seek distance. If not, it may be time to
++	 * break the queues apart again.
++	 */
++	if (bfq_bfqq_coop(bfqq) && BFQQ_SEEKY(bfqq))
++		bfq_mark_bfqq_split_coop(bfqq);
++
++	if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
++		/*
++		 * Overloading budget_timeout field to store the time
++		 * at which the queue is left with no backlog; used by
++		 * the weight-raising mechanism.
++		 */
++		bfqq->budget_timeout = jiffies;
++		bfq_del_bfqq_busy(bfqd, bfqq, 1);
++	} else {
++		bfq_activate_bfqq(bfqd, bfqq);
++		/*
++		 * Resort priority tree of potential close cooperators.
++		 */
++		bfq_rq_pos_tree_add(bfqd, bfqq);
++	}
++}
++
++/**
++ * __bfq_bfqq_recalc_budget - try to adapt the budget to the @bfqq behavior.
++ * @bfqd: device data.
++ * @bfqq: queue to update.
++ * @reason: reason for expiration.
++ *
++ * Handle the feedback on @bfqq budget.  See the body for detailed
++ * comments.
++ */
++static void __bfq_bfqq_recalc_budget(struct bfq_data *bfqd,
++				     struct bfq_queue *bfqq,
++				     enum bfqq_expiration reason)
++{
++	struct request *next_rq;
++	unsigned long budget, min_budget;
++
++	budget = bfqq->max_budget;
++	min_budget = bfq_min_budget(bfqd);
++
++	BUG_ON(bfqq != bfqd->in_service_queue);
++
++	bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last budg %lu, budg left %lu",
++		bfqq->entity.budget, bfq_bfqq_budget_left(bfqq));
++	bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last max_budg %lu, min budg %lu",
++		budget, bfq_min_budget(bfqd));
++	bfq_log_bfqq(bfqd, bfqq, "recalc_budg: sync %d, seeky %d",
++		bfq_bfqq_sync(bfqq), BFQQ_SEEKY(bfqd->in_service_queue));
++
++	if (bfq_bfqq_sync(bfqq)) {
++		switch (reason) {
++		/*
++		 * Caveat: in all the following cases we trade latency
++		 * for throughput.
++		 */
++		case BFQ_BFQQ_TOO_IDLE:
++			/*
++			 * This is the only case where we may reduce
++			 * the budget: if there is no request of the
++			 * process still waiting for completion, then
++			 * we assume (tentatively) that the timer has
++			 * expired because the batch of requests of
++			 * the process could have been served with a
++			 * smaller budget.  Hence, betting that
++			 * process will behave in the same way when it
++			 * becomes backlogged again, we reduce its
++			 * next budget.  As long as we guess right,
++			 * this budget cut reduces the latency
++			 * experienced by the process.
++			 *
++			 * However, if there are still outstanding
++			 * requests, then the process may not yet have
++			 * issued its next request just because it is
++			 * still waiting for the completion of some of
++			 * the still outstanding ones.  So in this
++			 * subcase we do not reduce its budget, on the
++			 * contrary we increase it to possibly boost
++			 * the throughput, as discussed in the
++			 * comments to the BUDGET_TIMEOUT case.
++			 */
++			if (bfqq->dispatched > 0) /* still outstanding reqs */
++				budget = min(budget * 2, bfqd->bfq_max_budget);
++			else {
++				if (budget > 5 * min_budget)
++					budget -= 4 * min_budget;
++				else
++					budget = min_budget;
++			}
++			break;
++		case BFQ_BFQQ_BUDGET_TIMEOUT:
++			/*
++			 * We double the budget here because: 1) it
++			 * gives the chance to boost the throughput if
++			 * this is not a seeky process (which may have
++			 * bumped into this timeout because of, e.g.,
++			 * ZBR), 2) together with charge_full_budget
++			 * it helps give seeky processes higher
++			 * timestamps, and hence be served less
++			 * frequently.
++			 */
++			budget = min(budget * 2, bfqd->bfq_max_budget);
++			break;
++		case BFQ_BFQQ_BUDGET_EXHAUSTED:
++			/*
++			 * The process still has backlog, and did not
++			 * let either the budget timeout or the disk
++			 * idling timeout expire. Hence it is not
++			 * seeky, has a short thinktime and may be
++			 * happy with a higher budget too. So
++			 * definitely increase the budget of this good
++			 * candidate to boost the disk throughput.
++			 */
++			budget = min(budget * 4, bfqd->bfq_max_budget);
++			break;
++		case BFQ_BFQQ_NO_MORE_REQUESTS:
++			/*
++			 * Leave the budget unchanged.
++			 */
++		default:
++			return;
++		}
++	} else /* async queue */
++	    /* async queues always get the maximum possible budget
++	     * (their ability to dispatch is limited by
++	     * @bfqd->bfq_max_budget_async_rq).
++	     */
++		budget = bfqd->bfq_max_budget;
++
++	bfqq->max_budget = budget;
++
++	if (bfqd->budgets_assigned >= 194 && bfqd->bfq_user_max_budget == 0 &&
++	    bfqq->max_budget > bfqd->bfq_max_budget)
++		bfqq->max_budget = bfqd->bfq_max_budget;
++
++	/*
++	 * Make sure that we have enough budget for the next request.
++	 * Since the finish time of the bfqq must be kept in sync with
++	 * the budget, be sure to call __bfq_bfqq_expire() after the
++	 * update.
++	 */
++	next_rq = bfqq->next_rq;
++	if (next_rq != NULL)
++		bfqq->entity.budget = max_t(unsigned long, bfqq->max_budget,
++					    bfq_serv_to_charge(next_rq, bfqq));
++	else
++		bfqq->entity.budget = bfqq->max_budget;
++
++	bfq_log_bfqq(bfqd, bfqq, "head sect: %u, new budget %lu",
++			next_rq != NULL ? blk_rq_sectors(next_rq) : 0,
++			bfqq->entity.budget);
++}
++
++static unsigned long bfq_calc_max_budget(u64 peak_rate, u64 timeout)
++{
++	unsigned long max_budget;
++
++	/*
++	 * The max_budget calculated when autotuning is equal to the
++	 * amount of sectors transferred in timeout_sync at the
++	 * estimated peak rate.
++	 */
++	max_budget = (unsigned long)(peak_rate * 1000 *
++				     timeout >> BFQ_RATE_SHIFT);
++
++	return max_budget;
++}
++
++/*
++ * In addition to updating the peak rate, checks whether the process
++ * is "slow", and returns 1 if so. This slow flag is used, in addition
++ * to the budget timeout, to reduce the amount of service provided to
++ * seeky processes, and hence reduce their chances to lower the
++ * throughput. See the code for more details.
++ */
++static int bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++				int compensate, enum bfqq_expiration reason)
++{
++	u64 bw, usecs, expected, timeout;
++	ktime_t delta;
++	int update = 0;
++
++	if (!bfq_bfqq_sync(bfqq) || bfq_bfqq_budget_new(bfqq))
++		return 0;
++
++	if (compensate)
++		delta = bfqd->last_idling_start;
++	else
++		delta = ktime_get();
++	delta = ktime_sub(delta, bfqd->last_budget_start);
++	usecs = ktime_to_us(delta);
++
++	/* Don't trust short/unrealistic values. */
++	if (usecs < 100 || usecs >= LONG_MAX)
++		return 0;
++
++	/*
++	 * Calculate the bandwidth for the last slice.  We use a 64 bit
++	 * value to store the peak rate, in sectors per usec in fixed
++	 * point math.  We do so to have enough precision in the estimate
++	 * and to avoid overflows.
++	 */
++	bw = (u64)bfqq->entity.service << BFQ_RATE_SHIFT;
++	do_div(bw, (unsigned long)usecs);
++
++	timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
++
++	/*
++	 * Use only long (> 20ms) intervals to filter out spikes for
++	 * the peak rate estimation.
++	 */
++	if (usecs > 20000) {
++		if (bw > bfqd->peak_rate ||
++		   (!BFQQ_SEEKY(bfqq) &&
++		    reason == BFQ_BFQQ_BUDGET_TIMEOUT)) {
++			bfq_log(bfqd, "measured bw =%llu", bw);
++			/*
++			 * To smooth oscillations use a low-pass filter with
++			 * alpha=7/8, i.e.,
++			 * new_rate = (7/8) * old_rate + (1/8) * bw
++			 */
++			do_div(bw, 8);
++			if (bw == 0)
++				return 0;
++			bfqd->peak_rate *= 7;
++			do_div(bfqd->peak_rate, 8);
++			bfqd->peak_rate += bw;
++			update = 1;
++			bfq_log(bfqd, "new peak_rate=%llu", bfqd->peak_rate);
++		}
++
++		update |= bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES - 1;
++
++		if (bfqd->peak_rate_samples < BFQ_PEAK_RATE_SAMPLES)
++			bfqd->peak_rate_samples++;
++
++		if (bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES &&
++		    update) {
++			int dev_type = blk_queue_nonrot(bfqd->queue);
++			if (bfqd->bfq_user_max_budget == 0) {
++				bfqd->bfq_max_budget =
++					bfq_calc_max_budget(bfqd->peak_rate,
++							    timeout);
++				bfq_log(bfqd, "new max_budget=%lu",
++					bfqd->bfq_max_budget);
++			}
++			if (bfqd->device_speed == BFQ_BFQD_FAST &&
++			    bfqd->peak_rate < device_speed_thresh[dev_type]) {
++				bfqd->device_speed = BFQ_BFQD_SLOW;
++				bfqd->RT_prod = R_slow[dev_type] *
++						T_slow[dev_type];
++			} else if (bfqd->device_speed == BFQ_BFQD_SLOW &&
++			    bfqd->peak_rate > device_speed_thresh[dev_type]) {
++				bfqd->device_speed = BFQ_BFQD_FAST;
++				bfqd->RT_prod = R_fast[dev_type] *
++						T_fast[dev_type];
++			}
++		}
++	}
++
++	/*
++	 * If the process has been served for too short a time
++	 * interval to let its possible sequential accesses prevail over
++	 * the initial seek time needed to move the disk head to the
++	 * first sector it requested, then give the process a chance
++	 * and for the moment return false.
++	 */
++	if (bfqq->entity.budget <= bfq_max_budget(bfqd) / 8)
++		return 0;
++
++	/*
++	 * A process is considered ``slow'' (i.e., seeky, so that we
++	 * cannot treat it fairly in the service domain, as it would
++	 * slow down the other processes too much) if, when a slice
++	 * ends for whatever reason, it has received service at a
++	 * rate that would not be high enough to complete the budget
++	 * before the budget timeout expiration.
++	 */
++	expected = bw * 1000 * timeout >> BFQ_RATE_SHIFT;
++
++	/*
++	 * Caveat: processes doing IO in the slower disk zones will
++	 * tend to be slow(er) even if not seeky. And the estimated
++	 * peak rate will actually be an average over the disk
++	 * surface. Hence, to not be too harsh with unlucky processes,
++	 * we keep a budget/3 margin of safety before declaring a
++	 * process slow.
++	 */
++	return expected > (4 * bfqq->entity.budget) / 3;
++}
++
++/*
++ * To be deemed as soft real-time, an application must meet two
++ * requirements. First, the application must not require an average
++ * bandwidth higher than the approximate bandwidth required to play back or
++ * record a compressed high-definition video.
++ * The next function is invoked on the completion of the last request of a
++ * batch, to compute the next-start time instant, soft_rt_next_start, such
++ * that, if the next request of the application does not arrive before
++ * soft_rt_next_start, then the above requirement on the bandwidth is met.
++ *
++ * The second requirement is that the request pattern of the application is
++ * isochronous, i.e., that, after issuing a request or a batch of requests,
++ * the application stops issuing new requests until all its pending requests
++ * have been completed. After that, the application may issue a new batch,
++ * and so on.
++ * For this reason the next function is invoked to compute
++ * soft_rt_next_start only for applications that meet this requirement,
++ * whereas soft_rt_next_start is set to infinity for applications that do
++ * not.
++ *
++ * Unfortunately, even a greedy application may happen to behave in an
++ * isochronous way if the CPU load is high. In fact, the application may
++ * stop issuing requests while the CPUs are busy serving other processes,
++ * then restart, then stop again for a while, and so on. In addition, if
++ * the disk achieves a low enough throughput with the request pattern
++ * issued by the application (e.g., because the request pattern is random
++ * and/or the device is slow), then the application may meet the above
++ * bandwidth requirement too. To prevent such a greedy application from
++ * being deemed soft real-time, a further rule is used in the computation of
++ * soft_rt_next_start: soft_rt_next_start must be higher than the current
++ * time plus the maximum time for which the arrival of a request is waited
++ * for when a sync queue becomes idle, namely bfqd->bfq_slice_idle.
++ * This filters out greedy applications, as the latter issue instead their
++ * next request as soon as possible after the last one has been completed
++ * (in contrast, when a batch of requests is completed, a soft real-time
++ * application spends some time processing data).
++ *
++ * Unfortunately, the last filter may easily generate false positives if
++ * only bfqd->bfq_slice_idle is used as a reference time interval and one
++ * or both the following cases occur:
++ * 1) HZ is so low that the duration of a jiffy is comparable to or higher
++ *    than bfqd->bfq_slice_idle. This happens, e.g., on slow devices with
++ *    HZ=100.
++ * 2) jiffies, instead of increasing at a constant rate, may stop increasing
++ *    for a while, then suddenly 'jump' by several units to recover the lost
++ *    increments. This seems to happen, e.g., inside virtual machines.
++ * To address this issue, we do not use as a reference time interval just
++ * bfqd->bfq_slice_idle, but bfqd->bfq_slice_idle plus a few jiffies. In
++ * particular we add the minimum number of jiffies for which the filter
++ * seems to be quite precise also in embedded systems and KVM/QEMU virtual
++ * machines.
++ */
++static inline unsigned long bfq_bfqq_softrt_next_start(struct bfq_data *bfqd,
++						       struct bfq_queue *bfqq)
++{
++	return max(bfqq->last_idle_bklogged +
++		   HZ * bfqq->service_from_backlogged /
++		   bfqd->bfq_wr_max_softrt_rate,
++		   jiffies + bfqq->bfqd->bfq_slice_idle + 4);
++}
++
++/*
++ * Return the largest-possible time instant such that, for as long as possible,
++ * the current time will be lower than this time instant according to the macro
++ * time_is_before_jiffies().
++ */
++static inline unsigned long bfq_infinity_from_now(unsigned long now)
++{
++	return now + ULONG_MAX / 2;
++}
++
++/**
++ * bfq_bfqq_expire - expire a queue.
++ * @bfqd: device owning the queue.
++ * @bfqq: the queue to expire.
++ * @compensate: if true, compensate for the time spent idling.
++ * @reason: the reason causing the expiration.
++ *
++ *
++ * If the process associated with the queue is slow (i.e., seeky), or in
++ * case of budget timeout, or, finally, if it is async, we
++ * artificially charge it an entire budget (independently of the
++ * actual service it received). As a consequence, the queue will get
++ * higher timestamps than the correct ones upon reactivation, and
++ * hence it will be rescheduled as if it had received more service
++ * than what it actually received. In the end, this class of processes
++ * will receive less service in proportion to how slowly they consume
++ * their budgets (and hence how seriously they tend to lower the
++ * throughput).
++ *
++ * In contrast, when a queue expires because it has been idling for
++ * too long or because it exhausted its budget, we do not touch the
++ * amount of service it has received. Hence, when the queue is
++ * reactivated and its timestamps updated, the latter will be in sync
++ * with the actual service received by the queue until expiration.
++ *
++ * Charging a full budget to the first type of queues and the exact
++ * service to the others has the effect of using the WF2Q+ policy to
++ * schedule the former on a timeslice basis, without violating the
++ * service domain guarantees of the latter.
++ */
++static void bfq_bfqq_expire(struct bfq_data *bfqd,
++			    struct bfq_queue *bfqq,
++			    int compensate,
++			    enum bfqq_expiration reason)
++{
++	int slow;
++	BUG_ON(bfqq != bfqd->in_service_queue);
++
++	/* Update disk peak rate for autotuning and check whether the
++	 * process is slow (see bfq_update_peak_rate).
++	 */
++	slow = bfq_update_peak_rate(bfqd, bfqq, compensate, reason);
++
++	/*
++	 * As explained above, 'punish' slow (i.e., seeky), timed-out
++	 * and async queues, to favor sequential sync workloads.
++	 *
++	 * Processes doing I/O in the slower disk zones will tend to be
++	 * slow(er) even if not seeky. Hence, since the estimated peak
++	 * rate is actually an average over the disk surface, these
++	 * processes may time out just out of bad luck. To avoid punishing
++	 * them we do not charge a full budget to a process that
++	 * succeeded in consuming at least 2/3 of its budget.
++	 */
++	if (slow || (reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
++		     bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3))
++		bfq_bfqq_charge_full_budget(bfqq);
++
++	bfqq->service_from_backlogged += bfqq->entity.service;
++
++	if (BFQQ_SEEKY(bfqq) && reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
++	    !bfq_bfqq_constantly_seeky(bfqq)) {
++		bfq_mark_bfqq_constantly_seeky(bfqq);
++		if (!blk_queue_nonrot(bfqd->queue))
++			bfqd->const_seeky_busy_in_flight_queues++;
++	}
++
++	if (reason == BFQ_BFQQ_TOO_IDLE &&
++	    bfqq->entity.service <= 2 * bfqq->entity.budget / 10)
++		bfq_clear_bfqq_IO_bound(bfqq);
++
++	if (bfqd->low_latency && bfqq->wr_coeff == 1)
++		bfqq->last_wr_start_finish = jiffies;
++
++	if (bfqd->low_latency && bfqd->bfq_wr_max_softrt_rate > 0 &&
++	    RB_EMPTY_ROOT(&bfqq->sort_list)) {
++		/*
++		 * If we get here, and there are no outstanding requests,
++		 * then the request pattern is isochronous (see the comments
++		 * to the function bfq_bfqq_softrt_next_start()). Hence we
++		 * can compute soft_rt_next_start. If, instead, the queue
++		 * still has outstanding requests, then we have to wait
++		 * for the completion of all the outstanding requests to
++		 * discover whether the request pattern is actually
++		 * isochronous.
++		 */
++		if (bfqq->dispatched == 0)
++			bfqq->soft_rt_next_start =
++				bfq_bfqq_softrt_next_start(bfqd, bfqq);
++		else {
++			/*
++			 * The application is still waiting for the
++			 * completion of one or more requests:
++			 * prevent it from possibly being incorrectly
++			 * deemed as soft real-time by setting its
++			 * soft_rt_next_start to infinity. In fact,
++			 * without this assignment, the application
++			 * would be incorrectly deemed as soft
++			 * real-time if:
++			 * 1) it issued a new request before the
++			 *    completion of all its in-flight
++			 *    requests, and
++			 * 2) at that time, its soft_rt_next_start
++			 *    happened to be in the past.
++			 */
++			bfqq->soft_rt_next_start =
++				bfq_infinity_from_now(jiffies);
++			/*
++			 * Schedule an update of soft_rt_next_start to when
++			 * the task may be discovered to be isochronous.
++			 */
++			bfq_mark_bfqq_softrt_update(bfqq);
++		}
++	}
++
++	bfq_log_bfqq(bfqd, bfqq,
++		"expire (%d, slow %d, num_disp %d, idle_win %d)", reason,
++		slow, bfqq->dispatched, bfq_bfqq_idle_window(bfqq));
++
++	/*
++	 * Increase, decrease or leave budget unchanged according to
++	 * reason.
++	 */
++	__bfq_bfqq_recalc_budget(bfqd, bfqq, reason);
++	__bfq_bfqq_expire(bfqd, bfqq);
++}
++
++/*
++ * Budget timeout is not implemented through a dedicated timer, but
++ * just checked on request arrivals and completions, as well as on
++ * idle timer expirations.
++ */
++static int bfq_bfqq_budget_timeout(struct bfq_queue *bfqq)
++{
++	if (bfq_bfqq_budget_new(bfqq) ||
++	    time_before(jiffies, bfqq->budget_timeout))
++		return 0;
++	return 1;
++}
++
++/*
++ * If we expire a queue that is waiting for the arrival of a new
++ * request, we may prevent the fictitious timestamp back-shifting that
++ * allows the guarantees of the queue to be preserved (see [1] for
++ * this tricky aspect). Hence we return true only if this condition
++ * does not hold, or if the queue is slow enough to deserve only to be
++ * kicked off for preserving a high throughput.
++ */
++static inline int bfq_may_expire_for_budg_timeout(struct bfq_queue *bfqq)
++{
++	bfq_log_bfqq(bfqq->bfqd, bfqq,
++		"may_budget_timeout: wait_request %d left %d timeout %d",
++		bfq_bfqq_wait_request(bfqq),
++		bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3,
++		bfq_bfqq_budget_timeout(bfqq));
++
++	return (!bfq_bfqq_wait_request(bfqq) ||
++		bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3)
++		&&
++		bfq_bfqq_budget_timeout(bfqq);
++}
++
++/*
++ * Device idling is allowed only for the queues for which this function
++ * returns true. For this reason, the return value of this function plays a
++ * critical role for both throughput boosting and service guarantees. The
++ * return value is computed through a logical expression. In this rather
++ * long comment, we try to briefly describe all the details and motivations
++ * behind the components of this logical expression.
++ *
++ * First, the expression is false if bfqq is not sync, or if: bfqq happened
++ * to become active during a large burst of queue activations, and the
++ * pattern of requests bfqq contains boosts the throughput if bfqq is
++ * expired. In fact, queues that became active during a large burst benefit
++ * only from throughput, as discussed in the comments to bfq_handle_burst.
++ * In this respect, expiring bfqq certainly boosts the throughput on NCQ-
++ * capable flash-based devices, whereas, on rotational devices, it boosts
++ * the throughput only if bfqq contains random requests.
++ *
++ * On the opposite end, if (a) bfqq is sync, (b) the above burst-related
++ * condition does not hold, and (c) bfqq is being weight-raised, then the
++ * expression always evaluates to true, as device idling is instrumental
++ * for preserving low-latency guarantees (see [1]). If, instead, conditions
++ * (a) and (b) do hold, but (c) does not, then the expression evaluates to
++ * true only if: (1) bfqq is I/O-bound and has a non-null idle window, and
++ * (2) at least one of the following two conditions holds.
++ * The first condition is that the device is not performing NCQ, because
++ * idling the device most certainly boosts the throughput if this condition
++ * holds and bfqq is I/O-bound and has been granted a non-null idle window.
++ * The second compound condition is made of the logical AND of two components.
++ *
++ * The first component is true only if there is no weight-raised busy
++ * queue. This guarantees that the device is not idled for a sync non-
++ * weight-raised queue when there are busy weight-raised queues. The former
++ * is then expired immediately if empty. Combined with the timestamping
++ * rules of BFQ (see [1] for details), this causes sync non-weight-raised
++ * queues to get a lower number of requests served, and hence to ask for a
++ * lower number of requests from the request pool, before the busy weight-
++ * raised queues get served again.
++ *
++ * This is beneficial for the processes associated with weight-raised
++ * queues, when the request pool is saturated (e.g., in the presence of
++ * write hogs). In fact, if the processes associated with the other queues
++ * ask for requests at a lower rate, then weight-raised processes have a
++ * higher probability to get a request from the pool immediately (or at
++ * least soon) when they need one. Hence they have a higher probability to
++ * actually get a fraction of the disk throughput proportional to their
++ * high weight. This is especially true with NCQ-capable drives, which
++ * enqueue several requests in advance and further reorder internally-
++ * queued requests.
++ *
++ * In the end, mistreating non-weight-raised queues when there are busy
++ * weight-raised queues seems to mitigate starvation problems in the
++ * presence of heavy write workloads and NCQ, and hence to guarantee a
++ * higher application and system responsiveness in these hostile scenarios.
++ *
++ * If the first component of the compound condition is instead true, i.e.,
++ * there is no weight-raised busy queue, then the second component of the
++ * compound condition takes into account service-guarantee and throughput
++ * issues related to NCQ (recall that the compound condition is evaluated
++ * only if the device is detected as supporting NCQ).
++ *
++ * As for service guarantees, allowing the drive to enqueue more than one
++ * request at a time, and hence delegating de facto final scheduling
++ * decisions to the drive's internal scheduler, causes loss of control on
++ * the actual request service order. In this respect, when the drive is
++ * allowed to enqueue more than one request at a time, the service
++ * distribution enforced by the drive's internal scheduler is likely to
++ * coincide with the desired device-throughput distribution only in the
++ * following, perfectly symmetric, scenario:
++ * 1) all active queues have the same weight,
++ * 2) all active groups at the same level in the groups tree have the same
++ *    weight,
++ * 3) all active groups at the same level in the groups tree have the same
++ *    number of children.
++ *
++ * Even in such a scenario, sequential I/O may still receive a preferential
++ * treatment, but this is not likely to be a big issue with flash-based
++ * devices, because of their non-dramatic loss of throughput with random
++ * I/O. Things do differ with HDDs, for which additional care is taken, as
++ * explained after completing the discussion for flash-based devices.
++ *
++ * Unfortunately, keeping the necessary state for evaluating exactly the
++ * above symmetry conditions would be quite complex and time-consuming.
++ * Therefore BFQ evaluates instead the following stronger sub-conditions,
++ * for which it is much easier to maintain the needed state:
++ * 1) all active queues have the same weight,
++ * 2) all active groups have the same weight,
++ * 3) all active groups have at most one active child each.
++ * In particular, the last two conditions are always true if hierarchical
++ * support and the cgroups interface are not enabled, hence no state needs
++ * to be maintained in this case.
++ *
++ * According to the above considerations, the second component of the
++ * compound condition evaluates to true if any of the above symmetry
++ * sub-conditions does not hold, or the device is not flash-based. Therefore,
++ * if also the first component is true, then idling is allowed for a sync
++ * queue. These are the only sub-conditions considered if the device is
++ * flash-based, as, for such a device, it is sensible to force idling only
++ * for service-guarantee issues. In fact, as for throughput, idling
++ * NCQ-capable flash-based devices would not boost the throughput even
++ * with sequential I/O; rather it would lower the throughput in proportion
++ * to how fast the device is. In the end, (only) if all the three
++ * sub-conditions hold and the device is flash-based, the compound
++ * condition evaluates to false and therefore no idling is performed.
++ *
++ * As already said, things change with a rotational device, where idling
++ * boosts the throughput with sequential I/O (even with NCQ). Hence, for
++ * such a device the second component of the compound condition evaluates
++ * to true also if the following additional sub-condition does not hold:
++ * the queue is constantly seeky. Unfortunately, this different behavior
++ * with respect to flash-based devices causes an additional asymmetry: if
++ * some sync queues enjoy idling and some other sync queues do not, then
++ * the latter get a low share of the device throughput, simply because the
++ * former get many requests served after being set as in service, whereas
++ * the latter do not. As a consequence, to guarantee the desired throughput
++ * distribution, on HDDs the compound expression evaluates to true (and
++ * hence device idling is performed) also if the following last symmetry
++ * condition does not hold: no other queue is benefiting from idling. Also
++ * this last condition is actually replaced with a simpler-to-maintain and
++ * stronger condition: there is no busy queue which is not constantly seeky
++ * (and hence may also benefit from idling).
++ *
++ * To sum up, when all the required symmetry and throughput-boosting
++ * sub-conditions hold, the second component of the compound condition
++ * evaluates to false, and hence no idling is performed. This helps to
++ * keep the drives' internal queues full on NCQ-capable devices, and hence
++ * to boost the throughput, without causing 'almost' any loss of service
++ * guarantees. The 'almost' follows from the fact that, if the internal
++ * queue of one such device is filled while all the sub-conditions hold,
++ * but at some point in time some sub-condition stops holding, then it may
++ * become impossible to let requests be served in the new desired order
++ * until all the requests already queued in the device have been served.
++ */
++static inline bool bfq_bfqq_must_not_expire(struct bfq_queue *bfqq)
++{
++	struct bfq_data *bfqd = bfqq->bfqd;
++#define cond_for_seeky_on_ncq_hdd (bfq_bfqq_constantly_seeky(bfqq) && \
++				   bfqd->busy_in_flight_queues == \
++				   bfqd->const_seeky_busy_in_flight_queues)
++
++#define cond_for_expiring_in_burst	(bfq_bfqq_in_large_burst(bfqq) && \
++					 bfqd->hw_tag && \
++					 (blk_queue_nonrot(bfqd->queue) || \
++					  bfq_bfqq_constantly_seeky(bfqq)))
++
++/*
++ * Condition for expiring a non-weight-raised queue (and hence not idling
++ * the device).
++ */
++#define cond_for_expiring_non_wr  (bfqd->hw_tag && \
++				   (bfqd->wr_busy_queues > 0 || \
++				    (blk_queue_nonrot(bfqd->queue) || \
++				      cond_for_seeky_on_ncq_hdd)))
++
++	return bfq_bfqq_sync(bfqq) &&
++		!cond_for_expiring_in_burst &&
++		(bfqq->wr_coeff > 1 || !symmetric_scenario ||
++		 (bfq_bfqq_IO_bound(bfqq) && bfq_bfqq_idle_window(bfqq) &&
++		  !cond_for_expiring_non_wr)
++	);
++}
++
++/*
++ * If the in-service queue is empty but sync, and the function
++ * bfq_bfqq_must_not_expire returns true, then:
++ * 1) the queue must remain in service and cannot be expired, and
++ * 2) the disk must be idled to wait for the possible arrival of a new
++ *    request for the queue.
++ * See the comments to the function bfq_bfqq_must_not_expire for the reasons
++ * why performing device idling is the best choice to boost the throughput
++ * and preserve service guarantees when bfq_bfqq_must_not_expire itself
++ * returns true.
++ */
++static inline bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
++{
++	struct bfq_data *bfqd = bfqq->bfqd;
++
++	return RB_EMPTY_ROOT(&bfqq->sort_list) && bfqd->bfq_slice_idle != 0 &&
++	       bfq_bfqq_must_not_expire(bfqq);
++}
++
++/*
++ * Select a queue for service.  If we have a current queue in service,
++ * check whether to continue servicing it, or retrieve and set a new one.
++ */
++static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
++{
++	struct bfq_queue *bfqq, *new_bfqq = NULL;
++	struct request *next_rq;
++	enum bfqq_expiration reason = BFQ_BFQQ_BUDGET_TIMEOUT;
++
++	bfqq = bfqd->in_service_queue;
++	if (bfqq == NULL)
++		goto new_queue;
++
++	bfq_log_bfqq(bfqd, bfqq, "select_queue: already in-service queue");
++
++	/*
++	 * If another queue has a request waiting within our mean seek
++	 * distance, let it run. The expire code will check for close
++	 * cooperators and put the close queue at the front of the
++	 * service tree. If possible, merge the expiring queue with the
++	 * new bfqq.
++	 */
++	new_bfqq = bfq_close_cooperator(bfqd, bfqq);
++	if (new_bfqq != NULL && bfqq->new_bfqq == NULL)
++		bfq_setup_merge(bfqq, new_bfqq);
++
++	if (bfq_may_expire_for_budg_timeout(bfqq) &&
++	    !timer_pending(&bfqd->idle_slice_timer) &&
++	    !bfq_bfqq_must_idle(bfqq))
++		goto expire;
++
++	next_rq = bfqq->next_rq;
++	/*
++	 * If bfqq has requests queued and it has enough budget left to
++	 * serve them, keep the queue, otherwise expire it.
++	 */
++	if (next_rq != NULL) {
++		if (bfq_serv_to_charge(next_rq, bfqq) >
++			bfq_bfqq_budget_left(bfqq)) {
++			reason = BFQ_BFQQ_BUDGET_EXHAUSTED;
++			goto expire;
++		} else {
++			/*
++			 * The idle timer may be pending because we may
++			 * not disable disk idling even when a new request
++			 * arrives.
++			 */
++			if (timer_pending(&bfqd->idle_slice_timer)) {
++				/*
++				 * If we get here: 1) at least one new request
++				 * has arrived but we have not disabled the
++				 * timer because the request was too small,
++				 * 2) then the block layer has unplugged
++				 * the device, causing the dispatch to be
++				 * invoked.
++				 *
++				 * Since the device is unplugged, now the
++				 * requests are probably large enough to
++				 * provide a reasonable throughput.
++				 * So we disable idling.
++				 */
++				bfq_clear_bfqq_wait_request(bfqq);
++				del_timer(&bfqd->idle_slice_timer);
++			}
++			if (new_bfqq == NULL)
++				goto keep_queue;
++			else
++				goto expire;
++		}
++	}
++
++	/*
++	 * No requests pending. However, if the in-service queue is idling
++	 * for a new request, or has requests waiting for a completion and
++	 * may idle after their completion, then keep it anyway.
++	 */
++	if (new_bfqq == NULL && (timer_pending(&bfqd->idle_slice_timer) ||
++	    (bfqq->dispatched != 0 && bfq_bfqq_must_not_expire(bfqq)))) {
++		bfqq = NULL;
++		goto keep_queue;
++	} else if (new_bfqq != NULL && timer_pending(&bfqd->idle_slice_timer)) {
++		/*
++		 * Expiring the queue because there is a close cooperator;
++		 * cancel the timer.
++		 */
++		bfq_clear_bfqq_wait_request(bfqq);
++		del_timer(&bfqd->idle_slice_timer);
++	}
++
++	reason = BFQ_BFQQ_NO_MORE_REQUESTS;
++expire:
++	bfq_bfqq_expire(bfqd, bfqq, 0, reason);
++new_queue:
++	bfqq = bfq_set_in_service_queue(bfqd, new_bfqq);
++	bfq_log(bfqd, "select_queue: new queue %d returned",
++		bfqq != NULL ? bfqq->pid : 0);
++keep_queue:
++	return bfqq;
++}
++
++static void bfq_update_wr_data(struct bfq_data *bfqd,
++			       struct bfq_queue *bfqq)
++{
++	if (bfqq->wr_coeff > 1) { /* queue is being boosted */
++		struct bfq_entity *entity = &bfqq->entity;
++
++		bfq_log_bfqq(bfqd, bfqq,
++			"raising period dur %u/%u msec, old coeff %u, w %d(%d)",
++			jiffies_to_msecs(jiffies -
++				bfqq->last_wr_start_finish),
++			jiffies_to_msecs(bfqq->wr_cur_max_time),
++			bfqq->wr_coeff,
++			bfqq->entity.weight, bfqq->entity.orig_weight);
++
++		BUG_ON(bfqq != bfqd->in_service_queue && entity->weight !=
++		       entity->orig_weight * bfqq->wr_coeff);
++		if (entity->ioprio_changed)
++			bfq_log_bfqq(bfqd, bfqq, "WARN: pending prio change");
++		/*
++		 * If the queue was activated in a burst, or
++		 * too much time has elapsed from the beginning
++		 * of this weight-raising, then end weight raising.
++		 */
++		if (bfq_bfqq_in_large_burst(bfqq) ||
++		    time_is_before_jiffies(bfqq->last_wr_start_finish +
++					   bfqq->wr_cur_max_time)) {
++			bfqq->last_wr_start_finish = jiffies;
++			bfq_log_bfqq(bfqd, bfqq,
++				     "wrais ending at %lu, rais_max_time %u",
++				     bfqq->last_wr_start_finish,
++				     jiffies_to_msecs(bfqq->wr_cur_max_time));
++			bfq_bfqq_end_wr(bfqq);
++			__bfq_entity_update_weight_prio(
++				bfq_entity_service_tree(entity),
++				entity);
++		}
++	}
++}
++
++/*
++ * Dispatch one request from bfqq, moving it to the request queue
++ * dispatch list.
++ */
++static int bfq_dispatch_request(struct bfq_data *bfqd,
++				struct bfq_queue *bfqq)
++{
++	int dispatched = 0;
++	struct request *rq;
++	unsigned long service_to_charge;
++
++	BUG_ON(RB_EMPTY_ROOT(&bfqq->sort_list));
++
++	/* Follow expired path, else get first next available. */
++	rq = bfq_check_fifo(bfqq);
++	if (rq == NULL)
++		rq = bfqq->next_rq;
++	service_to_charge = bfq_serv_to_charge(rq, bfqq);
++
++	if (service_to_charge > bfq_bfqq_budget_left(bfqq)) {
++		/*
++		 * This may happen if the next rq is chosen in fifo order
++		 * instead of sector order. The budget is properly
++		 * dimensioned to be always sufficient to serve the next
++		 * request only if it is chosen in sector order. The reason
++		 * is that it would be quite inefficient and of little use
++		 * to always make sure that the budget is large enough to
++		 * serve even the possible next rq in fifo order.
++		 * In fact, requests are seldom served in fifo order.
++		 *
++		 * Expire the queue for budget exhaustion, and make sure
++		 * that the next act_budget is enough to serve the next
++		 * request, even if it comes from the fifo expired path.
++		 */
++		bfqq->next_rq = rq;
++		/*
++		 * Since this dispatch has failed, make sure that
++		 * a new one will be performed.
++		 */
++		if (!bfqd->rq_in_driver)
++			bfq_schedule_dispatch(bfqd);
++		goto expire;
++	}
++
++	/* Finally, insert request into driver dispatch list. */
++	bfq_bfqq_served(bfqq, service_to_charge);
++	bfq_dispatch_insert(bfqd->queue, rq);
++
++	bfq_update_wr_data(bfqd, bfqq);
++
++	bfq_log_bfqq(bfqd, bfqq,
++			"dispatched %u sec req (%llu), budg left %lu",
++			blk_rq_sectors(rq),
++			(long long unsigned)blk_rq_pos(rq),
++			bfq_bfqq_budget_left(bfqq));
++
++	dispatched++;
++
++	if (bfqd->in_service_bic == NULL) {
++		atomic_long_inc(&RQ_BIC(rq)->icq.ioc->refcount);
++		bfqd->in_service_bic = RQ_BIC(rq);
++	}
++
++	if (bfqd->busy_queues > 1 && ((!bfq_bfqq_sync(bfqq) &&
++	    dispatched >= bfqd->bfq_max_budget_async_rq) ||
++	    bfq_class_idle(bfqq)))
++		goto expire;
++
++	return dispatched;
++
++expire:
++	bfq_bfqq_expire(bfqd, bfqq, 0, BFQ_BFQQ_BUDGET_EXHAUSTED);
++	return dispatched;
++}
++
++static int __bfq_forced_dispatch_bfqq(struct bfq_queue *bfqq)
++{
++	int dispatched = 0;
++
++	while (bfqq->next_rq != NULL) {
++		bfq_dispatch_insert(bfqq->bfqd->queue, bfqq->next_rq);
++		dispatched++;
++	}
++
++	BUG_ON(!list_empty(&bfqq->fifo));
++	return dispatched;
++}
++
++/*
++ * Drain our current requests.
++ * Used for barriers and when switching io schedulers on-the-fly.
++ */
++static int bfq_forced_dispatch(struct bfq_data *bfqd)
++{
++	struct bfq_queue *bfqq, *n;
++	struct bfq_service_tree *st;
++	int dispatched = 0;
++
++	bfqq = bfqd->in_service_queue;
++	if (bfqq != NULL)
++		__bfq_bfqq_expire(bfqd, bfqq);
++
++	/*
++	 * Loop through classes, and be careful to leave the scheduler
++	 * in a consistent state, as feedback mechanisms and vtime
++	 * updates cannot be disabled during the process.
++	 */
++	list_for_each_entry_safe(bfqq, n, &bfqd->active_list, bfqq_list) {
++		st = bfq_entity_service_tree(&bfqq->entity);
++
++		dispatched += __bfq_forced_dispatch_bfqq(bfqq);
++		bfqq->max_budget = bfq_max_budget(bfqd);
++
++		bfq_forget_idle(st);
++	}
++
++	BUG_ON(bfqd->busy_queues != 0);
++
++	return dispatched;
++}
++
++static int bfq_dispatch_requests(struct request_queue *q, int force)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct bfq_queue *bfqq;
++	int max_dispatch;
++
++	bfq_log(bfqd, "dispatch requests: %d busy queues", bfqd->busy_queues);
++	if (bfqd->busy_queues == 0)
++		return 0;
++
++	if (unlikely(force))
++		return bfq_forced_dispatch(bfqd);
++
++	bfqq = bfq_select_queue(bfqd);
++	if (bfqq == NULL)
++		return 0;
++
++	if (bfq_class_idle(bfqq))
++		max_dispatch = 1;
++
++	if (!bfq_bfqq_sync(bfqq))
++		max_dispatch = bfqd->bfq_max_budget_async_rq;
++
++	if (!bfq_bfqq_sync(bfqq) && bfqq->dispatched >= max_dispatch) {
++		if (bfqd->busy_queues > 1)
++			return 0;
++		if (bfqq->dispatched >= 4 * max_dispatch)
++			return 0;
++	}
++
++	if (bfqd->sync_flight != 0 && !bfq_bfqq_sync(bfqq))
++		return 0;
++
++	bfq_clear_bfqq_wait_request(bfqq);
++	BUG_ON(timer_pending(&bfqd->idle_slice_timer));
++
++	if (!bfq_dispatch_request(bfqd, bfqq))
++		return 0;
++
++	bfq_log_bfqq(bfqd, bfqq, "dispatched %s request",
++			bfq_bfqq_sync(bfqq) ? "sync" : "async");
++
++	return 1;
++}
++
++/*
++ * Task holds one reference to the queue, dropped when task exits.  Each rq
++ * in-flight on this queue also holds a reference, dropped when rq is freed.
++ *
++ * Queue lock must be held here.
++ */
++static void bfq_put_queue(struct bfq_queue *bfqq)
++{
++	struct bfq_data *bfqd = bfqq->bfqd;
++
++	BUG_ON(atomic_read(&bfqq->ref) <= 0);
++
++	bfq_log_bfqq(bfqd, bfqq, "put_queue: %p %d", bfqq,
++		     atomic_read(&bfqq->ref));
++	if (!atomic_dec_and_test(&bfqq->ref))
++		return;
++
++	BUG_ON(rb_first(&bfqq->sort_list) != NULL);
++	BUG_ON(bfqq->allocated[READ] + bfqq->allocated[WRITE] != 0);
++	BUG_ON(bfqq->entity.tree != NULL);
++	BUG_ON(bfq_bfqq_busy(bfqq));
++	BUG_ON(bfqd->in_service_queue == bfqq);
++
++	if (bfq_bfqq_sync(bfqq))
++		/*
++		 * The fact that this queue is being destroyed does not
++		 * invalidate the fact that this queue may have been
++		 * activated during the current burst. As a consequence,
++		 * although the queue does not exist anymore, and hence
++		 * needs to be removed from the burst list if there,
++		 * the burst size must not be decremented.
++		 */
++		hlist_del_init(&bfqq->burst_list_node);
++
++	bfq_log_bfqq(bfqd, bfqq, "put_queue: %p freed", bfqq);
++
++	kmem_cache_free(bfq_pool, bfqq);
++}
++
++static void bfq_put_cooperator(struct bfq_queue *bfqq)
++{
++	struct bfq_queue *__bfqq, *next;
++
++	/*
++	 * If this queue was scheduled to merge with another queue, be
++	 * sure to drop the reference taken on that queue (and others in
++	 * the merge chain). See bfq_setup_merge and bfq_merge_bfqqs.
++	 */
++	__bfqq = bfqq->new_bfqq;
++	while (__bfqq) {
++		if (__bfqq == bfqq)
++			break;
++		next = __bfqq->new_bfqq;
++		bfq_put_queue(__bfqq);
++		__bfqq = next;
++	}
++}
++
++static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	if (bfqq == bfqd->in_service_queue) {
++		__bfq_bfqq_expire(bfqd, bfqq);
++		bfq_schedule_dispatch(bfqd);
++	}
++
++	bfq_log_bfqq(bfqd, bfqq, "exit_bfqq: %p, %d", bfqq,
++		     atomic_read(&bfqq->ref));
++
++	bfq_put_cooperator(bfqq);
++
++	bfq_put_queue(bfqq);
++}
++
++static inline void bfq_init_icq(struct io_cq *icq)
++{
++	struct bfq_io_cq *bic = icq_to_bic(icq);
++
++	bic->ttime.last_end_request = jiffies;
++}
++
++static void bfq_exit_icq(struct io_cq *icq)
++{
++	struct bfq_io_cq *bic = icq_to_bic(icq);
++	struct bfq_data *bfqd = bic_to_bfqd(bic);
++
++	if (bic->bfqq[BLK_RW_ASYNC]) {
++		bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_ASYNC]);
++		bic->bfqq[BLK_RW_ASYNC] = NULL;
++	}
++
++	if (bic->bfqq[BLK_RW_SYNC]) {
++		bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
++		bic->bfqq[BLK_RW_SYNC] = NULL;
++	}
++}
++
++/*
++ * Update the entity prio values; note that the new values will not
++ * be used until the next (re)activation.
++ */
++static void bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
++{
++	struct task_struct *tsk = current;
++	int ioprio_class;
++
++	ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
++	switch (ioprio_class) {
++	default:
++		dev_err(bfqq->bfqd->queue->backing_dev_info.dev,
++			"bfq: bad prio class %d\n", ioprio_class);
++	case IOPRIO_CLASS_NONE:
++		/*
++		 * No prio set, inherit CPU scheduling settings.
++		 */
++		bfqq->entity.new_ioprio = task_nice_ioprio(tsk);
++		bfqq->entity.new_ioprio_class = task_nice_ioclass(tsk);
++		break;
++	case IOPRIO_CLASS_RT:
++		bfqq->entity.new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
++		bfqq->entity.new_ioprio_class = IOPRIO_CLASS_RT;
++		break;
++	case IOPRIO_CLASS_BE:
++		bfqq->entity.new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
++		bfqq->entity.new_ioprio_class = IOPRIO_CLASS_BE;
++		break;
++	case IOPRIO_CLASS_IDLE:
++		bfqq->entity.new_ioprio_class = IOPRIO_CLASS_IDLE;
++		bfqq->entity.new_ioprio = 7;
++		bfq_clear_bfqq_idle_window(bfqq);
++		break;
++	}
++
++	if (bfqq->entity.new_ioprio < 0 ||
++	    bfqq->entity.new_ioprio >= IOPRIO_BE_NR) {
++		printk(KERN_CRIT "bfq_set_next_ioprio_data: new_ioprio %d\n",
++				 bfqq->entity.new_ioprio);
++		BUG();
++	}
++
++	bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->entity.new_ioprio);
++	bfqq->entity.ioprio_changed = 1;
++}
++
++static void bfq_check_ioprio_change(struct bfq_io_cq *bic)
++{
++	struct bfq_data *bfqd;
++	struct bfq_queue *bfqq, *new_bfqq;
++	struct bfq_group *bfqg;
++	unsigned long uninitialized_var(flags);
++	int ioprio = bic->icq.ioc->ioprio;
++
++	bfqd = bfq_get_bfqd_locked(&(bic->icq.q->elevator->elevator_data),
++				   &flags);
++	/*
++	 * This condition may trigger on a newly created bic; be sure to
++	 * drop the lock before returning.
++	 */
++	if (unlikely(bfqd == NULL) || likely(bic->ioprio == ioprio))
++		goto out;
++
++	bic->ioprio = ioprio;
++
++	bfqq = bic->bfqq[BLK_RW_ASYNC];
++	if (bfqq != NULL) {
++		bfqg = container_of(bfqq->entity.sched_data, struct bfq_group,
++				    sched_data);
++		new_bfqq = bfq_get_queue(bfqd, bfqg, BLK_RW_ASYNC, bic,
++					 GFP_ATOMIC);
++		if (new_bfqq != NULL) {
++			bic->bfqq[BLK_RW_ASYNC] = new_bfqq;
++			bfq_log_bfqq(bfqd, bfqq,
++				     "check_ioprio_change: bfqq %p %d",
++				     bfqq, atomic_read(&bfqq->ref));
++			bfq_put_queue(bfqq);
++		}
++	}
++
++	bfqq = bic->bfqq[BLK_RW_SYNC];
++	if (bfqq != NULL)
++		bfq_set_next_ioprio_data(bfqq, bic);
++
++out:
++	bfq_put_bfqd_unlock(bfqd, &flags);
++}
++
++static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++			  struct bfq_io_cq *bic, pid_t pid, int is_sync)
++{
++	RB_CLEAR_NODE(&bfqq->entity.rb_node);
++	INIT_LIST_HEAD(&bfqq->fifo);
++	INIT_HLIST_NODE(&bfqq->burst_list_node);
++
++	atomic_set(&bfqq->ref, 0);
++	bfqq->bfqd = bfqd;
++
++	if (bic)
++		bfq_set_next_ioprio_data(bfqq, bic);
++
++	if (is_sync) {
++		if (!bfq_class_idle(bfqq))
++			bfq_mark_bfqq_idle_window(bfqq);
++		bfq_mark_bfqq_sync(bfqq);
++	}
++	bfq_mark_bfqq_IO_bound(bfqq);
++
++	/* Tentative initial value to trade off between throughput and latency */
++	bfqq->max_budget = (2 * bfq_max_budget(bfqd)) / 3;
++	bfqq->pid = pid;
++
++	bfqq->wr_coeff = 1;
++	bfqq->last_wr_start_finish = 0;
++	/*
++	 * Set to the value for which bfqq will not be deemed as
++	 * soft rt when it becomes backlogged.
++	 */
++	bfqq->soft_rt_next_start = bfq_infinity_from_now(jiffies);
++}
++
++static struct bfq_queue *bfq_find_alloc_queue(struct bfq_data *bfqd,
++					      struct bfq_group *bfqg,
++					      int is_sync,
++					      struct bfq_io_cq *bic,
++					      gfp_t gfp_mask)
++{
++	struct bfq_queue *bfqq, *new_bfqq = NULL;
++
++retry:
++	/* bic always exists here */
++	bfqq = bic_to_bfqq(bic, is_sync);
++
++	/*
++	 * Always try a new alloc if we fall back to the OOM bfqq
++	 * originally, since it should just be a temporary situation.
++	 */
++	if (bfqq == NULL || bfqq == &bfqd->oom_bfqq) {
++		bfqq = NULL;
++		if (new_bfqq != NULL) {
++			bfqq = new_bfqq;
++			new_bfqq = NULL;
++		} else if (gfp_mask & __GFP_WAIT) {
++			spin_unlock_irq(bfqd->queue->queue_lock);
++			new_bfqq = kmem_cache_alloc_node(bfq_pool,
++					gfp_mask | __GFP_ZERO,
++					bfqd->queue->node);
++			spin_lock_irq(bfqd->queue->queue_lock);
++			if (new_bfqq != NULL)
++				goto retry;
++		} else {
++			bfqq = kmem_cache_alloc_node(bfq_pool,
++					gfp_mask | __GFP_ZERO,
++					bfqd->queue->node);
++		}
++
++		if (bfqq != NULL) {
++			bfq_init_bfqq(bfqd, bfqq, bic, current->pid,
++				      is_sync);
++			bfq_init_entity(&bfqq->entity, bfqg);
++			bfq_log_bfqq(bfqd, bfqq, "allocated");
++		} else {
++			bfqq = &bfqd->oom_bfqq;
++			bfq_log_bfqq(bfqd, bfqq, "using oom bfqq");
++		}
++	}
++
++	if (new_bfqq != NULL)
++		kmem_cache_free(bfq_pool, new_bfqq);
++
++	return bfqq;
++}
++
++static struct bfq_queue **bfq_async_queue_prio(struct bfq_data *bfqd,
++					       struct bfq_group *bfqg,
++					       int ioprio_class, int ioprio)
++{
++	switch (ioprio_class) {
++	case IOPRIO_CLASS_RT:
++		return &bfqg->async_bfqq[0][ioprio];
++	case IOPRIO_CLASS_NONE:
++		ioprio = IOPRIO_NORM;
++		/* fall through */
++	case IOPRIO_CLASS_BE:
++		return &bfqg->async_bfqq[1][ioprio];
++	case IOPRIO_CLASS_IDLE:
++		return &bfqg->async_idle_bfqq;
++	default:
++		BUG();
++	}
++}
++
++static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
++				       struct bfq_group *bfqg, int is_sync,
++				       struct bfq_io_cq *bic, gfp_t gfp_mask)
++{
++	const int ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
++	const int ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
++	struct bfq_queue **async_bfqq = NULL;
++	struct bfq_queue *bfqq = NULL;
++
++	if (!is_sync) {
++		async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
++						  ioprio);
++		bfqq = *async_bfqq;
++	}
++
++	if (bfqq == NULL)
++		bfqq = bfq_find_alloc_queue(bfqd, bfqg, is_sync, bic, gfp_mask);
++
++	/*
++	 * Pin the queue now that it's allocated; scheduler exit will
++	 * prune it.
++	 */
++	if (!is_sync && *async_bfqq == NULL) {
++		atomic_inc(&bfqq->ref);
++		bfq_log_bfqq(bfqd, bfqq, "get_queue, bfqq not in async: %p, %d",
++			     bfqq, atomic_read(&bfqq->ref));
++		*async_bfqq = bfqq;
++	}
++
++	atomic_inc(&bfqq->ref);
++	bfq_log_bfqq(bfqd, bfqq, "get_queue, at end: %p, %d", bfqq,
++		     atomic_read(&bfqq->ref));
++	return bfqq;
++}
++
++static void bfq_update_io_thinktime(struct bfq_data *bfqd,
++				    struct bfq_io_cq *bic)
++{
++	unsigned long elapsed = jiffies - bic->ttime.last_end_request;
++	unsigned long ttime = min(elapsed, 2UL * bfqd->bfq_slice_idle);
++
++	bic->ttime.ttime_samples = (7*bic->ttime.ttime_samples + 256) / 8;
++	bic->ttime.ttime_total = (7*bic->ttime.ttime_total + 256*ttime) / 8;
++	bic->ttime.ttime_mean = (bic->ttime.ttime_total + 128) /
++				bic->ttime.ttime_samples;
++}
++
++static void bfq_update_io_seektime(struct bfq_data *bfqd,
++				   struct bfq_queue *bfqq,
++				   struct request *rq)
++{
++	sector_t sdist;
++	u64 total;
++
++	if (bfqq->last_request_pos < blk_rq_pos(rq))
++		sdist = blk_rq_pos(rq) - bfqq->last_request_pos;
++	else
++		sdist = bfqq->last_request_pos - blk_rq_pos(rq);
++
++	/*
++	 * Don't allow the seek distance to get too large from the
++	 * odd fragment, pagein, etc.
++	 */
++	if (bfqq->seek_samples == 0) /* first request, not really a seek */
++		sdist = 0;
++	else if (bfqq->seek_samples <= 60) /* second & third seek */
++		sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*1024);
++	else
++		sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*64);
++
++	bfqq->seek_samples = (7*bfqq->seek_samples + 256) / 8;
++	bfqq->seek_total = (7*bfqq->seek_total + (u64)256*sdist) / 8;
++	total = bfqq->seek_total + (bfqq->seek_samples/2);
++	do_div(total, bfqq->seek_samples);
++	bfqq->seek_mean = (sector_t)total;
++
++	bfq_log_bfqq(bfqd, bfqq, "dist=%llu mean=%llu", (u64)sdist,
++			(u64)bfqq->seek_mean);
++}
++
++/*
++ * Disable idle window if the process thinks too long or seeks so much that
++ * it doesn't matter.
++ */
++static void bfq_update_idle_window(struct bfq_data *bfqd,
++				   struct bfq_queue *bfqq,
++				   struct bfq_io_cq *bic)
++{
++	int enable_idle;
++
++	/* Don't idle for async or idle io prio class. */
++	if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
++		return;
++
++	enable_idle = bfq_bfqq_idle_window(bfqq);
++
++	if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
++	    bfqd->bfq_slice_idle == 0 ||
++		(bfqd->hw_tag && BFQQ_SEEKY(bfqq) &&
++			bfqq->wr_coeff == 1))
++		enable_idle = 0;
++	else if (bfq_sample_valid(bic->ttime.ttime_samples)) {
++		if (bic->ttime.ttime_mean > bfqd->bfq_slice_idle &&
++			bfqq->wr_coeff == 1)
++			enable_idle = 0;
++		else
++			enable_idle = 1;
++	}
++	bfq_log_bfqq(bfqd, bfqq, "update_idle_window: enable_idle %d",
++		enable_idle);
++
++	if (enable_idle)
++		bfq_mark_bfqq_idle_window(bfqq);
++	else
++		bfq_clear_bfqq_idle_window(bfqq);
++}
++
++/*
++ * Called when a new fs request (rq) is added to bfqq.  Check if there's
++ * something we should do about it.
++ */
++static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++			    struct request *rq)
++{
++	struct bfq_io_cq *bic = RQ_BIC(rq);
++
++	if (rq->cmd_flags & REQ_META)
++		bfqq->meta_pending++;
++
++	bfq_update_io_thinktime(bfqd, bic);
++	bfq_update_io_seektime(bfqd, bfqq, rq);
++	if (!BFQQ_SEEKY(bfqq) && bfq_bfqq_constantly_seeky(bfqq)) {
++		bfq_clear_bfqq_constantly_seeky(bfqq);
++		if (!blk_queue_nonrot(bfqd->queue)) {
++			BUG_ON(!bfqd->const_seeky_busy_in_flight_queues);
++			bfqd->const_seeky_busy_in_flight_queues--;
++		}
++	}
++	if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
++	    !BFQQ_SEEKY(bfqq))
++		bfq_update_idle_window(bfqd, bfqq, bic);
++
++	bfq_log_bfqq(bfqd, bfqq,
++		     "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
++		     bfq_bfqq_idle_window(bfqq), BFQQ_SEEKY(bfqq),
++		     (long long unsigned)bfqq->seek_mean);
++
++	bfqq->last_request_pos = blk_rq_pos(rq) + blk_rq_sectors(rq);
++
++	if (bfqq == bfqd->in_service_queue && bfq_bfqq_wait_request(bfqq)) {
++		int small_req = bfqq->queued[rq_is_sync(rq)] == 1 &&
++				blk_rq_sectors(rq) < 32;
++		int budget_timeout = bfq_bfqq_budget_timeout(bfqq);
++
++		/*
++		 * There is just this request queued: if the request
++		 * is small and the queue is not to be expired, then
++		 * just exit.
++		 *
++		 * In this way, if the disk is being idled to wait for
++		 * a new request from the in-service queue, we avoid
++		 * unplugging the device and committing the disk to serve
++		 * just a small request. Instead, we wait for
++		 * the block layer to decide when to unplug the device:
++		 * hopefully, new requests will be merged to this one
++		 * quickly, then the device will be unplugged and
++		 * larger requests will be dispatched.
++		 */
++		if (small_req && !budget_timeout)
++			return;
++
++		/*
++		 * A large enough request arrived, or the queue is to
++		 * be expired: in both cases disk idling is to be
++		 * stopped, so clear wait_request flag and reset
++		 * timer.
++		 */
++		bfq_clear_bfqq_wait_request(bfqq);
++		del_timer(&bfqd->idle_slice_timer);
++
++		/*
++		 * The queue is not empty, because a new request just
++		 * arrived. Hence we can safely expire the queue, in
++		 * case of budget timeout, without risking that the
++		 * timestamps of the queue are not updated correctly.
++		 * See [1] for more details.
++		 */
++		if (budget_timeout)
++			bfq_bfqq_expire(bfqd, bfqq, 0, BFQ_BFQQ_BUDGET_TIMEOUT);
++
++		/*
++		 * Let the request rip immediately, or let a new queue be
++		 * selected if bfqq has just been expired.
++		 */
++		__blk_run_queue(bfqd->queue);
++	}
++}
++
++static void bfq_insert_request(struct request_queue *q, struct request *rq)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++
++	assert_spin_locked(bfqd->queue->queue_lock);
++
++	bfq_add_request(rq);
++
++	rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
++	list_add_tail(&rq->queuelist, &bfqq->fifo);
++
++	bfq_rq_enqueued(bfqd, bfqq, rq);
++}
++
++static void bfq_update_hw_tag(struct bfq_data *bfqd)
++{
++	bfqd->max_rq_in_driver = max(bfqd->max_rq_in_driver,
++				     bfqd->rq_in_driver);
++
++	if (bfqd->hw_tag == 1)
++		return;
++
++	/*
++	 * This sample is valid if the number of outstanding requests
++	 * is large enough to allow a queueing behavior.  Note that the
++	 * sum is not exact, as it's not taking into account deactivated
++	 * requests.
++	 */
++	if (bfqd->rq_in_driver + bfqd->queued < BFQ_HW_QUEUE_THRESHOLD)
++		return;
++
++	if (bfqd->hw_tag_samples++ < BFQ_HW_QUEUE_SAMPLES)
++		return;
++
++	bfqd->hw_tag = bfqd->max_rq_in_driver > BFQ_HW_QUEUE_THRESHOLD;
++	bfqd->max_rq_in_driver = 0;
++	bfqd->hw_tag_samples = 0;
++}
++
++static void bfq_completed_request(struct request_queue *q, struct request *rq)
++{
++	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++	struct bfq_data *bfqd = bfqq->bfqd;
++	bool sync = bfq_bfqq_sync(bfqq);
++
++	bfq_log_bfqq(bfqd, bfqq, "completed one req with %u sects left (%d)",
++		     blk_rq_sectors(rq), sync);
++
++	bfq_update_hw_tag(bfqd);
++
++	BUG_ON(!bfqd->rq_in_driver);
++	BUG_ON(!bfqq->dispatched);
++	bfqd->rq_in_driver--;
++	bfqq->dispatched--;
++
++	if (!bfqq->dispatched && !bfq_bfqq_busy(bfqq)) {
++		bfq_weights_tree_remove(bfqd, &bfqq->entity,
++					&bfqd->queue_weights_tree);
++		if (!blk_queue_nonrot(bfqd->queue)) {
++			BUG_ON(!bfqd->busy_in_flight_queues);
++			bfqd->busy_in_flight_queues--;
++			if (bfq_bfqq_constantly_seeky(bfqq)) {
++				BUG_ON(!bfqd->
++					const_seeky_busy_in_flight_queues);
++				bfqd->const_seeky_busy_in_flight_queues--;
++			}
++		}
++	}
++
++	if (sync) {
++		bfqd->sync_flight--;
++		RQ_BIC(rq)->ttime.last_end_request = jiffies;
++	}
++
++	/*
++	 * If we are waiting to discover whether the request pattern of the
++	 * task associated with the queue is actually isochronous, and
++	 * both requisites for this condition to hold are satisfied, then
++	 * compute soft_rt_next_start (see the comments to the function
++	 * bfq_bfqq_softrt_next_start()).
++	 */
++	if (bfq_bfqq_softrt_update(bfqq) && bfqq->dispatched == 0 &&
++	    RB_EMPTY_ROOT(&bfqq->sort_list))
++		bfqq->soft_rt_next_start =
++			bfq_bfqq_softrt_next_start(bfqd, bfqq);
++
++	/*
++	 * If this is the in-service queue, check if it needs to be expired,
++	 * or if we want to idle in case it has no pending requests.
++	 */
++	if (bfqd->in_service_queue == bfqq) {
++		if (bfq_bfqq_budget_new(bfqq))
++			bfq_set_budget_timeout(bfqd);
++
++		if (bfq_bfqq_must_idle(bfqq)) {
++			bfq_arm_slice_timer(bfqd);
++			goto out;
++		} else if (bfq_may_expire_for_budg_timeout(bfqq))
++			bfq_bfqq_expire(bfqd, bfqq, 0, BFQ_BFQQ_BUDGET_TIMEOUT);
++		else if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
++			 (bfqq->dispatched == 0 ||
++			  !bfq_bfqq_must_not_expire(bfqq)))
++			bfq_bfqq_expire(bfqd, bfqq, 0,
++					BFQ_BFQQ_NO_MORE_REQUESTS);
++	}
++
++	if (!bfqd->rq_in_driver)
++		bfq_schedule_dispatch(bfqd);
++
++out:
++	return;
++}
++
++static inline int __bfq_may_queue(struct bfq_queue *bfqq)
++{
++	if (bfq_bfqq_wait_request(bfqq) && bfq_bfqq_must_alloc(bfqq)) {
++		bfq_clear_bfqq_must_alloc(bfqq);
++		return ELV_MQUEUE_MUST;
++	}
++
++	return ELV_MQUEUE_MAY;
++}
++
++static int bfq_may_queue(struct request_queue *q, int rw)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct task_struct *tsk = current;
++	struct bfq_io_cq *bic;
++	struct bfq_queue *bfqq;
++
++	/*
++	 * Don't force setup of a queue from here, as a call to may_queue
++	 * does not necessarily imply that a request actually will be
++	 * queued. So just lookup a possibly existing queue, or return
++	 * 'may queue' if that fails.
++	 */
++	bic = bfq_bic_lookup(bfqd, tsk->io_context);
++	if (bic == NULL)
++		return ELV_MQUEUE_MAY;
++
++	bfqq = bic_to_bfqq(bic, rw_is_sync(rw));
++	if (bfqq != NULL)
++		return __bfq_may_queue(bfqq);
++
++	return ELV_MQUEUE_MAY;
++}
++
++/*
++ * Queue lock held here.
++ */
++static void bfq_put_request(struct request *rq)
++{
++	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++
++	if (bfqq != NULL) {
++		const int rw = rq_data_dir(rq);
++
++		BUG_ON(!bfqq->allocated[rw]);
++		bfqq->allocated[rw]--;
++
++		rq->elv.priv[0] = NULL;
++		rq->elv.priv[1] = NULL;
++
++		bfq_log_bfqq(bfqq->bfqd, bfqq, "put_request %p, %d",
++			     bfqq, atomic_read(&bfqq->ref));
++		bfq_put_queue(bfqq);
++	}
++}
++
++static struct bfq_queue *
++bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
++		struct bfq_queue *bfqq)
++{
++	bfq_log_bfqq(bfqd, bfqq, "merging with queue %lu",
++		(long unsigned)bfqq->new_bfqq->pid);
++	bic_set_bfqq(bic, bfqq->new_bfqq, 1);
++	bfq_mark_bfqq_coop(bfqq->new_bfqq);
++	bfq_put_queue(bfqq);
++	return bic_to_bfqq(bic, 1);
++}
++
++/*
++ * Returns NULL if a new bfqq should be allocated, or the old bfqq if this
++ * was the last process referring to said bfqq.
++ */
++static struct bfq_queue *
++bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
++{
++	bfq_log_bfqq(bfqq->bfqd, bfqq, "splitting queue");
++	if (bfqq_process_refs(bfqq) == 1) {
++		bfqq->pid = current->pid;
++		bfq_clear_bfqq_coop(bfqq);
++		bfq_clear_bfqq_split_coop(bfqq);
++		return bfqq;
++	}
++
++	bic_set_bfqq(bic, NULL, 1);
++
++	bfq_put_cooperator(bfqq);
++
++	bfq_put_queue(bfqq);
++	return NULL;
++}
++
++/*
++ * Allocate bfq data structures associated with this request.
++ */
++static int bfq_set_request(struct request_queue *q, struct request *rq,
++			   struct bio *bio, gfp_t gfp_mask)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct bfq_io_cq *bic = icq_to_bic(rq->elv.icq);
++	const int rw = rq_data_dir(rq);
++	const int is_sync = rq_is_sync(rq);
++	struct bfq_queue *bfqq;
++	struct bfq_group *bfqg;
++	unsigned long flags;
++
++	might_sleep_if(gfp_mask & __GFP_WAIT);
++
++	bfq_check_ioprio_change(bic);
++
++	spin_lock_irqsave(q->queue_lock, flags);
++
++	if (bic == NULL)
++		goto queue_fail;
++
++	bfqg = bfq_bic_update_cgroup(bic);
++
++new_queue:
++	bfqq = bic_to_bfqq(bic, is_sync);
++	if (bfqq == NULL || bfqq == &bfqd->oom_bfqq) {
++		bfqq = bfq_get_queue(bfqd, bfqg, is_sync, bic, gfp_mask);
++		bic_set_bfqq(bic, bfqq, is_sync);
++	} else {
++		/*
++		 * If the queue was seeky for too long, break it apart.
++		 */
++		if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq)) {
++			bfq_log_bfqq(bfqd, bfqq, "breaking apart bfqq");
++			bfqq = bfq_split_bfqq(bic, bfqq);
++			if (!bfqq)
++				goto new_queue;
++		}
++
++		/*
++		 * Check to see if this queue is scheduled to merge with
++		 * another closely cooperating queue. The merging of queues
++		 * happens here as it must be done in process context.
++		 * The reference on new_bfqq was taken in merge_bfqqs.
++		 */
++		if (bfqq->new_bfqq != NULL)
++			bfqq = bfq_merge_bfqqs(bfqd, bic, bfqq);
++	}
++
++	bfqq->allocated[rw]++;
++	atomic_inc(&bfqq->ref);
++	bfq_log_bfqq(bfqd, bfqq, "set_request: bfqq %p, %d", bfqq,
++		     atomic_read(&bfqq->ref));
++
++	rq->elv.priv[0] = bic;
++	rq->elv.priv[1] = bfqq;
++
++	spin_unlock_irqrestore(q->queue_lock, flags);
++
++	return 0;
++
++queue_fail:
++	bfq_schedule_dispatch(bfqd);
++	spin_unlock_irqrestore(q->queue_lock, flags);
++
++	return 1;
++}
++
++static void bfq_kick_queue(struct work_struct *work)
++{
++	struct bfq_data *bfqd =
++		container_of(work, struct bfq_data, unplug_work);
++	struct request_queue *q = bfqd->queue;
++
++	spin_lock_irq(q->queue_lock);
++	__blk_run_queue(q);
++	spin_unlock_irq(q->queue_lock);
++}
++
++/*
++ * Handler of the expiration of the timer running if the in-service queue
++ * is idling inside its time slice.
++ */
++static void bfq_idle_slice_timer(unsigned long data)
++{
++	struct bfq_data *bfqd = (struct bfq_data *)data;
++	struct bfq_queue *bfqq;
++	unsigned long flags;
++	enum bfqq_expiration reason;
++
++	spin_lock_irqsave(bfqd->queue->queue_lock, flags);
++
++	bfqq = bfqd->in_service_queue;
++	/*
++	 * Theoretical race here: the in-service queue can be NULL or
++	 * different from the queue that was idling if the timer handler
++	 * spins on the queue_lock and a new request arrives for the
++	 * current queue and there is a full dispatch cycle that changes
++	 * the in-service queue.  This can hardly happen, but in the worst
++	 * case we just expire a queue too early.
++	 */
++	if (bfqq != NULL) {
++		bfq_log_bfqq(bfqd, bfqq, "slice_timer expired");
++		if (bfq_bfqq_budget_timeout(bfqq))
++			/*
++			 * Here, too, the queue can be safely expired
++			 * for budget timeout without wasting
++			 * guarantees.
++			 */
++			reason = BFQ_BFQQ_BUDGET_TIMEOUT;
++		else if (bfqq->queued[0] == 0 && bfqq->queued[1] == 0)
++			/*
++			 * The queue may not be empty upon timer expiration,
++			 * because we may not disable the timer when the
++			 * first request of the in-service queue arrives
++			 * during disk idling.
++			 */
++			reason = BFQ_BFQQ_TOO_IDLE;
++		else
++			goto schedule_dispatch;
++
++		bfq_bfqq_expire(bfqd, bfqq, 1, reason);
++	}
++
++schedule_dispatch:
++	bfq_schedule_dispatch(bfqd);
++
++	spin_unlock_irqrestore(bfqd->queue->queue_lock, flags);
++}
++
++static void bfq_shutdown_timer_wq(struct bfq_data *bfqd)
++{
++	del_timer_sync(&bfqd->idle_slice_timer);
++	cancel_work_sync(&bfqd->unplug_work);
++}
++
++static inline void __bfq_put_async_bfqq(struct bfq_data *bfqd,
++					struct bfq_queue **bfqq_ptr)
++{
++	struct bfq_group *root_group = bfqd->root_group;
++	struct bfq_queue *bfqq = *bfqq_ptr;
++
++	bfq_log(bfqd, "put_async_bfqq: %p", bfqq);
++	if (bfqq != NULL) {
++		bfq_bfqq_move(bfqd, bfqq, &bfqq->entity, root_group);
++		bfq_log_bfqq(bfqd, bfqq, "put_async_bfqq: putting %p, %d",
++			     bfqq, atomic_read(&bfqq->ref));
++		bfq_put_queue(bfqq);
++		*bfqq_ptr = NULL;
++	}
++}
++
++/*
++ * Release all the bfqg references to its async queues.  If we are
++ * deallocating the group these queues may still contain requests, so
++ * we reparent them to the root cgroup (i.e., the only one that will
++ * exist for sure until all the requests on a device are gone).
++ */
++static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
++{
++	int i, j;
++
++	for (i = 0; i < 2; i++)
++		for (j = 0; j < IOPRIO_BE_NR; j++)
++			__bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
++
++	__bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
++}
++
++static void bfq_exit_queue(struct elevator_queue *e)
++{
++	struct bfq_data *bfqd = e->elevator_data;
++	struct request_queue *q = bfqd->queue;
++	struct bfq_queue *bfqq, *n;
++
++	bfq_shutdown_timer_wq(bfqd);
++
++	spin_lock_irq(q->queue_lock);
++
++	BUG_ON(bfqd->in_service_queue != NULL);
++	list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
++		bfq_deactivate_bfqq(bfqd, bfqq, 0);
++
++	bfq_disconnect_groups(bfqd);
++	spin_unlock_irq(q->queue_lock);
++
++	bfq_shutdown_timer_wq(bfqd);
++
++	synchronize_rcu();
++
++	BUG_ON(timer_pending(&bfqd->idle_slice_timer));
++
++	bfq_free_root_group(bfqd);
++	kfree(bfqd);
++}
++
++static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
++{
++	struct bfq_group *bfqg;
++	struct bfq_data *bfqd;
++	struct elevator_queue *eq;
++
++	eq = elevator_alloc(q, e);
++	if (eq == NULL)
++		return -ENOMEM;
++
++	bfqd = kzalloc_node(sizeof(*bfqd), GFP_KERNEL, q->node);
++	if (bfqd == NULL) {
++		kobject_put(&eq->kobj);
++		return -ENOMEM;
++	}
++	eq->elevator_data = bfqd;
++
++	/*
++	 * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.
++	 * Grab a permanent reference to it, so that the normal code flow
++	 * will not attempt to free it.
++	 */
++	bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0);
++	atomic_inc(&bfqd->oom_bfqq.ref);
++	bfqd->oom_bfqq.entity.new_ioprio = BFQ_DEFAULT_QUEUE_IOPRIO;
++	bfqd->oom_bfqq.entity.new_ioprio_class = IOPRIO_CLASS_BE;
++	bfqd->oom_bfqq.entity.new_weight =
++		bfq_ioprio_to_weight(bfqd->oom_bfqq.entity.new_ioprio);
++	/*
++	 * Trigger weight initialization, according to ioprio, at the
++	 * oom_bfqq's first activation. The oom_bfqq's ioprio and ioprio
++	 * class won't be changed any more.
++	 */
++	bfqd->oom_bfqq.entity.ioprio_changed = 1;
++
++	bfqd->queue = q;
++
++	spin_lock_irq(q->queue_lock);
++	q->elevator = eq;
++	spin_unlock_irq(q->queue_lock);
++
++	bfqg = bfq_alloc_root_group(bfqd, q->node);
++	if (bfqg == NULL) {
++		kfree(bfqd);
++		kobject_put(&eq->kobj);
++		return -ENOMEM;
++	}
++
++	bfqd->root_group = bfqg;
++	bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group);
++#ifdef CONFIG_CGROUP_BFQIO
++	bfqd->active_numerous_groups = 0;
++#endif
++
++	init_timer(&bfqd->idle_slice_timer);
++	bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
++	bfqd->idle_slice_timer.data = (unsigned long)bfqd;
++
++	bfqd->rq_pos_tree = RB_ROOT;
++	bfqd->queue_weights_tree = RB_ROOT;
++	bfqd->group_weights_tree = RB_ROOT;
++
++	INIT_WORK(&bfqd->unplug_work, bfq_kick_queue);
++
++	INIT_LIST_HEAD(&bfqd->active_list);
++	INIT_LIST_HEAD(&bfqd->idle_list);
++	INIT_HLIST_HEAD(&bfqd->burst_list);
++
++	bfqd->hw_tag = -1;
++
++	bfqd->bfq_max_budget = bfq_default_max_budget;
++
++	bfqd->bfq_fifo_expire[0] = bfq_fifo_expire[0];
++	bfqd->bfq_fifo_expire[1] = bfq_fifo_expire[1];
++	bfqd->bfq_back_max = bfq_back_max;
++	bfqd->bfq_back_penalty = bfq_back_penalty;
++	bfqd->bfq_slice_idle = bfq_slice_idle;
++	bfqd->bfq_class_idle_last_service = 0;
++	bfqd->bfq_max_budget_async_rq = bfq_max_budget_async_rq;
++	bfqd->bfq_timeout[BLK_RW_ASYNC] = bfq_timeout_async;
++	bfqd->bfq_timeout[BLK_RW_SYNC] = bfq_timeout_sync;
++
++	bfqd->bfq_coop_thresh = 2;
++	bfqd->bfq_failed_cooperations = 7000;
++	bfqd->bfq_requests_within_timer = 120;
++
++	bfqd->bfq_large_burst_thresh = 11;
++	bfqd->bfq_burst_interval = msecs_to_jiffies(500);
++
++	bfqd->low_latency = true;
++
++	bfqd->bfq_wr_coeff = 20;
++	bfqd->bfq_wr_rt_max_time = msecs_to_jiffies(300);
++	bfqd->bfq_wr_max_time = 0;
++	bfqd->bfq_wr_min_idle_time = msecs_to_jiffies(2000);
++	bfqd->bfq_wr_min_inter_arr_async = msecs_to_jiffies(500);
++	bfqd->bfq_wr_max_softrt_rate = 7000; /*
++					      * Approximate rate required
++					      * to playback or record a
++					      * high-definition compressed
++					      * video.
++					      */
++	bfqd->wr_busy_queues = 0;
++	bfqd->busy_in_flight_queues = 0;
++	bfqd->const_seeky_busy_in_flight_queues = 0;
++
++	/*
++	 * Begin by assuming, optimistically, that the device peak rate is
++	 * equal to the highest reference rate.
++	 */
++	bfqd->RT_prod = R_fast[blk_queue_nonrot(bfqd->queue)] *
++			T_fast[blk_queue_nonrot(bfqd->queue)];
++	bfqd->peak_rate = R_fast[blk_queue_nonrot(bfqd->queue)];
++	bfqd->device_speed = BFQ_BFQD_FAST;
++
++	return 0;
++}
++
++static void bfq_slab_kill(void)
++{
++	if (bfq_pool != NULL)
++		kmem_cache_destroy(bfq_pool);
++}
++
++static int __init bfq_slab_setup(void)
++{
++	bfq_pool = KMEM_CACHE(bfq_queue, 0);
++	if (bfq_pool == NULL)
++		return -ENOMEM;
++	return 0;
++}
++
++static ssize_t bfq_var_show(unsigned int var, char *page)
++{
++	return sprintf(page, "%d\n", var);
++}
++
++static ssize_t bfq_var_store(unsigned long *var, const char *page,
++			     size_t count)
++{
++	unsigned long new_val;
++	int ret = kstrtoul(page, 10, &new_val);
++
++	if (ret == 0)
++		*var = new_val;
++
++	return count;
++}
++
++static ssize_t bfq_wr_max_time_show(struct elevator_queue *e, char *page)
++{
++	struct bfq_data *bfqd = e->elevator_data;
++	return sprintf(page, "%d\n", bfqd->bfq_wr_max_time > 0 ?
++		       jiffies_to_msecs(bfqd->bfq_wr_max_time) :
++		       jiffies_to_msecs(bfq_wr_duration(bfqd)));
++}
++
++static ssize_t bfq_weights_show(struct elevator_queue *e, char *page)
++{
++	struct bfq_queue *bfqq;
++	struct bfq_data *bfqd = e->elevator_data;
++	ssize_t num_char = 0;
++
++	num_char += sprintf(page + num_char, "Tot reqs queued %d\n\n",
++			    bfqd->queued);
++
++	spin_lock_irq(bfqd->queue->queue_lock);
++
++	num_char += sprintf(page + num_char, "Active:\n");
++	list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list) {
++	  num_char += sprintf(page + num_char,
++			      "pid%d: weight %hu, nr_queued %d %d, dur %d/%u\n",
++			      bfqq->pid,
++			      bfqq->entity.weight,
++			      bfqq->queued[0],
++			      bfqq->queued[1],
++			jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
++			jiffies_to_msecs(bfqq->wr_cur_max_time));
++	}
++
++	num_char += sprintf(page + num_char, "Idle:\n");
++	list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list) {
++			num_char += sprintf(page + num_char,
++				"pid%d: weight %hu, dur %d/%u\n",
++				bfqq->pid,
++				bfqq->entity.weight,
++				jiffies_to_msecs(jiffies -
++					bfqq->last_wr_start_finish),
++				jiffies_to_msecs(bfqq->wr_cur_max_time));
++	}
++
++	spin_unlock_irq(bfqd->queue->queue_lock);
++
++	return num_char;
++}
++
++#define SHOW_FUNCTION(__FUNC, __VAR, __CONV)				\
++static ssize_t __FUNC(struct elevator_queue *e, char *page)		\
++{									\
++	struct bfq_data *bfqd = e->elevator_data;			\
++	unsigned int __data = __VAR;					\
++	if (__CONV)							\
++		__data = jiffies_to_msecs(__data);			\
++	return bfq_var_show(__data, (page));				\
++}
++SHOW_FUNCTION(bfq_fifo_expire_sync_show, bfqd->bfq_fifo_expire[1], 1);
++SHOW_FUNCTION(bfq_fifo_expire_async_show, bfqd->bfq_fifo_expire[0], 1);
++SHOW_FUNCTION(bfq_back_seek_max_show, bfqd->bfq_back_max, 0);
++SHOW_FUNCTION(bfq_back_seek_penalty_show, bfqd->bfq_back_penalty, 0);
++SHOW_FUNCTION(bfq_slice_idle_show, bfqd->bfq_slice_idle, 1);
++SHOW_FUNCTION(bfq_max_budget_show, bfqd->bfq_user_max_budget, 0);
++SHOW_FUNCTION(bfq_max_budget_async_rq_show,
++	      bfqd->bfq_max_budget_async_rq, 0);
++SHOW_FUNCTION(bfq_timeout_sync_show, bfqd->bfq_timeout[BLK_RW_SYNC], 1);
++SHOW_FUNCTION(bfq_timeout_async_show, bfqd->bfq_timeout[BLK_RW_ASYNC], 1);
++SHOW_FUNCTION(bfq_low_latency_show, bfqd->low_latency, 0);
++SHOW_FUNCTION(bfq_wr_coeff_show, bfqd->bfq_wr_coeff, 0);
++SHOW_FUNCTION(bfq_wr_rt_max_time_show, bfqd->bfq_wr_rt_max_time, 1);
++SHOW_FUNCTION(bfq_wr_min_idle_time_show, bfqd->bfq_wr_min_idle_time, 1);
++SHOW_FUNCTION(bfq_wr_min_inter_arr_async_show, bfqd->bfq_wr_min_inter_arr_async,
++	1);
++SHOW_FUNCTION(bfq_wr_max_softrt_rate_show, bfqd->bfq_wr_max_softrt_rate, 0);
++#undef SHOW_FUNCTION
++
++#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV)			\
++static ssize_t								\
++__FUNC(struct elevator_queue *e, const char *page, size_t count)	\
++{									\
++	struct bfq_data *bfqd = e->elevator_data;			\
++	unsigned long uninitialized_var(__data);			\
++	int ret = bfq_var_store(&__data, (page), count);		\
++	if (__data < (MIN))						\
++		__data = (MIN);						\
++	else if (__data > (MAX))					\
++		__data = (MAX);						\
++	if (__CONV)							\
++		*(__PTR) = msecs_to_jiffies(__data);			\
++	else								\
++		*(__PTR) = __data;					\
++	return ret;							\
++}
++STORE_FUNCTION(bfq_fifo_expire_sync_store, &bfqd->bfq_fifo_expire[1], 1,
++		INT_MAX, 1);
++STORE_FUNCTION(bfq_fifo_expire_async_store, &bfqd->bfq_fifo_expire[0], 1,
++		INT_MAX, 1);
++STORE_FUNCTION(bfq_back_seek_max_store, &bfqd->bfq_back_max, 0, INT_MAX, 0);
++STORE_FUNCTION(bfq_back_seek_penalty_store, &bfqd->bfq_back_penalty, 1,
++		INT_MAX, 0);
++STORE_FUNCTION(bfq_slice_idle_store, &bfqd->bfq_slice_idle, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_max_budget_async_rq_store, &bfqd->bfq_max_budget_async_rq,
++		1, INT_MAX, 0);
++STORE_FUNCTION(bfq_timeout_async_store, &bfqd->bfq_timeout[BLK_RW_ASYNC], 0,
++		INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_coeff_store, &bfqd->bfq_wr_coeff, 1, INT_MAX, 0);
++STORE_FUNCTION(bfq_wr_max_time_store, &bfqd->bfq_wr_max_time, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_rt_max_time_store, &bfqd->bfq_wr_rt_max_time, 0, INT_MAX,
++		1);
++STORE_FUNCTION(bfq_wr_min_idle_time_store, &bfqd->bfq_wr_min_idle_time, 0,
++		INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_min_inter_arr_async_store,
++		&bfqd->bfq_wr_min_inter_arr_async, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_max_softrt_rate_store, &bfqd->bfq_wr_max_softrt_rate, 0,
++		INT_MAX, 0);
++#undef STORE_FUNCTION
++
++/* do nothing for the moment */
++static ssize_t bfq_weights_store(struct elevator_queue *e,
++				    const char *page, size_t count)
++{
++	return count;
++}
++
++static inline unsigned long bfq_estimated_max_budget(struct bfq_data *bfqd)
++{
++	u64 timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
++
++	if (bfqd->peak_rate_samples >= BFQ_PEAK_RATE_SAMPLES)
++		return bfq_calc_max_budget(bfqd->peak_rate, timeout);
++	else
++		return bfq_default_max_budget;
++}
++
++static ssize_t bfq_max_budget_store(struct elevator_queue *e,
++				    const char *page, size_t count)
++{
++	struct bfq_data *bfqd = e->elevator_data;
++	unsigned long uninitialized_var(__data);
++	int ret = bfq_var_store(&__data, (page), count);
++
++	if (__data == 0)
++		bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
++	else {
++		if (__data > INT_MAX)
++			__data = INT_MAX;
++		bfqd->bfq_max_budget = __data;
++	}
++
++	bfqd->bfq_user_max_budget = __data;
++
++	return ret;
++}
++
++static ssize_t bfq_timeout_sync_store(struct elevator_queue *e,
++				      const char *page, size_t count)
++{
++	struct bfq_data *bfqd = e->elevator_data;
++	unsigned long uninitialized_var(__data);
++	int ret = bfq_var_store(&__data, (page), count);
++
++	if (__data < 1)
++		__data = 1;
++	else if (__data > INT_MAX)
++		__data = INT_MAX;
++
++	bfqd->bfq_timeout[BLK_RW_SYNC] = msecs_to_jiffies(__data);
++	if (bfqd->bfq_user_max_budget == 0)
++		bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
++
++	return ret;
++}
++
++static ssize_t bfq_low_latency_store(struct elevator_queue *e,
++				     const char *page, size_t count)
++{
++	struct bfq_data *bfqd = e->elevator_data;
++	unsigned long uninitialized_var(__data);
++	int ret = bfq_var_store(&__data, (page), count);
++
++	if (__data > 1)
++		__data = 1;
++	if (__data == 0 && bfqd->low_latency != 0)
++		bfq_end_wr(bfqd);
++	bfqd->low_latency = __data;
++
++	return ret;
++}
++
++#define BFQ_ATTR(name) \
++	__ATTR(name, S_IRUGO|S_IWUSR, bfq_##name##_show, bfq_##name##_store)
++
++static struct elv_fs_entry bfq_attrs[] = {
++	BFQ_ATTR(fifo_expire_sync),
++	BFQ_ATTR(fifo_expire_async),
++	BFQ_ATTR(back_seek_max),
++	BFQ_ATTR(back_seek_penalty),
++	BFQ_ATTR(slice_idle),
++	BFQ_ATTR(max_budget),
++	BFQ_ATTR(max_budget_async_rq),
++	BFQ_ATTR(timeout_sync),
++	BFQ_ATTR(timeout_async),
++	BFQ_ATTR(low_latency),
++	BFQ_ATTR(wr_coeff),
++	BFQ_ATTR(wr_max_time),
++	BFQ_ATTR(wr_rt_max_time),
++	BFQ_ATTR(wr_min_idle_time),
++	BFQ_ATTR(wr_min_inter_arr_async),
++	BFQ_ATTR(wr_max_softrt_rate),
++	BFQ_ATTR(weights),
++	__ATTR_NULL
++};
++
++static struct elevator_type iosched_bfq = {
++	.ops = {
++		.elevator_merge_fn =		bfq_merge,
++		.elevator_merged_fn =		bfq_merged_request,
++		.elevator_merge_req_fn =	bfq_merged_requests,
++		.elevator_allow_merge_fn =	bfq_allow_merge,
++		.elevator_dispatch_fn =		bfq_dispatch_requests,
++		.elevator_add_req_fn =		bfq_insert_request,
++		.elevator_activate_req_fn =	bfq_activate_request,
++		.elevator_deactivate_req_fn =	bfq_deactivate_request,
++		.elevator_completed_req_fn =	bfq_completed_request,
++		.elevator_former_req_fn =	elv_rb_former_request,
++		.elevator_latter_req_fn =	elv_rb_latter_request,
++		.elevator_init_icq_fn =		bfq_init_icq,
++		.elevator_exit_icq_fn =		bfq_exit_icq,
++		.elevator_set_req_fn =		bfq_set_request,
++		.elevator_put_req_fn =		bfq_put_request,
++		.elevator_may_queue_fn =	bfq_may_queue,
++		.elevator_init_fn =		bfq_init_queue,
++		.elevator_exit_fn =		bfq_exit_queue,
++	},
++	.icq_size =		sizeof(struct bfq_io_cq),
++	.icq_align =		__alignof__(struct bfq_io_cq),
++	.elevator_attrs =	bfq_attrs,
++	.elevator_name =	"bfq",
++	.elevator_owner =	THIS_MODULE,
++};
++
++static int __init bfq_init(void)
++{
++	/*
++	 * Can be 0 on HZ < 1000 setups.
++	 */
++	if (bfq_slice_idle == 0)
++		bfq_slice_idle = 1;
++
++	if (bfq_timeout_async == 0)
++		bfq_timeout_async = 1;
++
++	if (bfq_slab_setup())
++		return -ENOMEM;
++
++	/*
++	 * Times to load large popular applications for the typical systems
++	 * installed on the reference devices (see the comments before the
++	 * definitions of the two arrays).
++	 */
++	T_slow[0] = msecs_to_jiffies(2600);
++	T_slow[1] = msecs_to_jiffies(1000);
++	T_fast[0] = msecs_to_jiffies(5500);
++	T_fast[1] = msecs_to_jiffies(2000);
++
++	/*
++	 * Thresholds that determine the switch between speed classes (see
++	 * the comments before the definition of the array).
++	 */
++	device_speed_thresh[0] = (R_fast[0] + R_slow[0]) / 2;
++	device_speed_thresh[1] = (R_fast[1] + R_slow[1]) / 2;
++
++	elv_register(&iosched_bfq);
++	pr_info("BFQ I/O-scheduler: v7r8");
++
++	return 0;
++}
++
++static void __exit bfq_exit(void)
++{
++	elv_unregister(&iosched_bfq);
++	bfq_slab_kill();
++}
++
++module_init(bfq_init);
++module_exit(bfq_exit);
++
++MODULE_AUTHOR("Fabio Checconi, Paolo Valente");
++MODULE_LICENSE("GPL");
+diff --git a/block/bfq-sched.c b/block/bfq-sched.c
+new file mode 100644
+index 0000000..c343099
+--- /dev/null
++++ b/block/bfq-sched.c
+@@ -0,0 +1,1208 @@
++/*
++ * BFQ: Hierarchical B-WF2Q+ scheduler.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ *		      Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ */
++
++#ifdef CONFIG_CGROUP_BFQIO
++#define for_each_entity(entity)	\
++	for (; entity != NULL; entity = entity->parent)
++
++#define for_each_entity_safe(entity, parent) \
++	for (; entity && ({ parent = entity->parent; 1; }); entity = parent)
++
++static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
++						 int extract,
++						 struct bfq_data *bfqd);
++
++static inline void bfq_update_budget(struct bfq_entity *next_in_service)
++{
++	struct bfq_entity *bfqg_entity;
++	struct bfq_group *bfqg;
++	struct bfq_sched_data *group_sd;
++
++	BUG_ON(next_in_service == NULL);
++
++	group_sd = next_in_service->sched_data;
++
++	bfqg = container_of(group_sd, struct bfq_group, sched_data);
++	/*
++	 * bfq_group's my_entity field is not NULL only if the group
++	 * is not the root group. We must not touch the root entity
++	 * as it must never become an in-service entity.
++	 */
++	bfqg_entity = bfqg->my_entity;
++	if (bfqg_entity != NULL)
++		bfqg_entity->budget = next_in_service->budget;
++}
++
++static int bfq_update_next_in_service(struct bfq_sched_data *sd)
++{
++	struct bfq_entity *next_in_service;
++
++	if (sd->in_service_entity != NULL)
++		/* will update/requeue at the end of service */
++		return 0;
++
++	/*
++	 * NOTE: this can be improved in many ways, such as returning
++	 * 1 (and thus propagating upwards the update) only when the
++	 * budget changes, or caching the bfqq that will be scheduled
++	 * next from this subtree.  For now we worry more about
++	 * correctness than about performance...
++	 */
++	next_in_service = bfq_lookup_next_entity(sd, 0, NULL);
++	sd->next_in_service = next_in_service;
++
++	if (next_in_service != NULL)
++		bfq_update_budget(next_in_service);
++
++	return 1;
++}
++
++static inline void bfq_check_next_in_service(struct bfq_sched_data *sd,
++					     struct bfq_entity *entity)
++{
++	BUG_ON(sd->next_in_service != entity);
++}
++#else
++#define for_each_entity(entity)	\
++	for (; entity != NULL; entity = NULL)
++
++#define for_each_entity_safe(entity, parent) \
++	for (parent = NULL; entity != NULL; entity = parent)
++
++static inline int bfq_update_next_in_service(struct bfq_sched_data *sd)
++{
++	return 0;
++}
++
++static inline void bfq_check_next_in_service(struct bfq_sched_data *sd,
++					     struct bfq_entity *entity)
++{
++}
++
++static inline void bfq_update_budget(struct bfq_entity *next_in_service)
++{
++}
++#endif
++
++/*
++ * Shift for timestamp calculations.  This actually limits the maximum
++ * service allowed in one timestamp delta (small shift values increase it),
++ * the maximum total weight that can be used for the queues in the system
++ * (big shift values increase it), and the period of virtual time
++ * wraparounds.
++ */
++#define WFQ_SERVICE_SHIFT	22
++
++/**
++ * bfq_gt - compare two timestamps.
++ * @a: first ts.
++ * @b: second ts.
++ *
++ * Return @a > @b, dealing with wrapping correctly.
++ */
++static inline int bfq_gt(u64 a, u64 b)
++{
++	return (s64)(a - b) > 0;
++}
++
++static inline struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = NULL;
++
++	BUG_ON(entity == NULL);
++
++	if (entity->my_sched_data == NULL)
++		bfqq = container_of(entity, struct bfq_queue, entity);
++
++	return bfqq;
++}
++
++
++/**
++ * bfq_delta - map service into the virtual time domain.
++ * @service: amount of service.
++ * @weight: scale factor (weight of an entity or weight sum).
++ */
++static inline u64 bfq_delta(unsigned long service,
++					unsigned long weight)
++{
++	u64 d = (u64)service << WFQ_SERVICE_SHIFT;
++
++	do_div(d, weight);
++	return d;
++}
++
++/**
++ * bfq_calc_finish - assign the finish time to an entity.
++ * @entity: the entity to act upon.
++ * @service: the service to be charged to the entity.
++ */
++static inline void bfq_calc_finish(struct bfq_entity *entity,
++				   unsigned long service)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++	BUG_ON(entity->weight == 0);
++
++	entity->finish = entity->start +
++		bfq_delta(service, entity->weight);
++
++	if (bfqq != NULL) {
++		bfq_log_bfqq(bfqq->bfqd, bfqq,
++			"calc_finish: serv %lu, w %d",
++			service, entity->weight);
++		bfq_log_bfqq(bfqq->bfqd, bfqq,
++			"calc_finish: start %llu, finish %llu, delta %llu",
++			entity->start, entity->finish,
++			bfq_delta(service, entity->weight));
++	}
++}
++
++/**
++ * bfq_entity_of - get an entity from a node.
++ * @node: the node field of the entity.
++ *
++ * Convert a node pointer to the corresponding entity.  This is used only
++ * to simplify the logic of some functions and not as the generic
++ * conversion mechanism because, e.g., in the tree walking functions,
++ * the check for a %NULL value would be redundant.
++ */
++static inline struct bfq_entity *bfq_entity_of(struct rb_node *node)
++{
++	struct bfq_entity *entity = NULL;
++
++	if (node != NULL)
++		entity = rb_entry(node, struct bfq_entity, rb_node);
++
++	return entity;
++}
++
++/**
++ * bfq_extract - remove an entity from a tree.
++ * @root: the tree root.
++ * @entity: the entity to remove.
++ */
++static inline void bfq_extract(struct rb_root *root,
++			       struct bfq_entity *entity)
++{
++	BUG_ON(entity->tree != root);
++
++	entity->tree = NULL;
++	rb_erase(&entity->rb_node, root);
++}
++
++/**
++ * bfq_idle_extract - extract an entity from the idle tree.
++ * @st: the service tree of the owning @entity.
++ * @entity: the entity being removed.
++ */
++static void bfq_idle_extract(struct bfq_service_tree *st,
++			     struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++	struct rb_node *next;
++
++	BUG_ON(entity->tree != &st->idle);
++
++	if (entity == st->first_idle) {
++		next = rb_next(&entity->rb_node);
++		st->first_idle = bfq_entity_of(next);
++	}
++
++	if (entity == st->last_idle) {
++		next = rb_prev(&entity->rb_node);
++		st->last_idle = bfq_entity_of(next);
++	}
++
++	bfq_extract(&st->idle, entity);
++
++	if (bfqq != NULL)
++		list_del(&bfqq->bfqq_list);
++}
++
++/**
++ * bfq_insert - generic tree insertion.
++ * @root: tree root.
++ * @entity: entity to insert.
++ *
++ * This is used for the idle and the active tree, since they are both
++ * ordered by finish time.
++ */
++static void bfq_insert(struct rb_root *root, struct bfq_entity *entity)
++{
++	struct bfq_entity *entry;
++	struct rb_node **node = &root->rb_node;
++	struct rb_node *parent = NULL;
++
++	BUG_ON(entity->tree != NULL);
++
++	while (*node != NULL) {
++		parent = *node;
++		entry = rb_entry(parent, struct bfq_entity, rb_node);
++
++		if (bfq_gt(entry->finish, entity->finish))
++			node = &parent->rb_left;
++		else
++			node = &parent->rb_right;
++	}
++
++	rb_link_node(&entity->rb_node, parent, node);
++	rb_insert_color(&entity->rb_node, root);
++
++	entity->tree = root;
++}
++
++/**
++ * bfq_update_min - update the min_start field of an entity.
++ * @entity: the entity to update.
++ * @node: one of its children.
++ *
++ * This function is called when @entity may store an invalid value for
++ * min_start due to updates to the active tree.  The function assumes
++ * that the subtree rooted at @node (which may be its left or its right
++ * child) has a valid min_start value.
++ */
++static inline void bfq_update_min(struct bfq_entity *entity,
++				  struct rb_node *node)
++{
++	struct bfq_entity *child;
++
++	if (node != NULL) {
++		child = rb_entry(node, struct bfq_entity, rb_node);
++		if (bfq_gt(entity->min_start, child->min_start))
++			entity->min_start = child->min_start;
++	}
++}
++
++/**
++ * bfq_update_active_node - recalculate min_start.
++ * @node: the node to update.
++ *
++ * @node may have changed position or one of its children may have moved;
++ * this function updates its min_start value.  The left and right subtrees
++ * are assumed to hold a correct min_start value.
++ */
++static inline void bfq_update_active_node(struct rb_node *node)
++{
++	struct bfq_entity *entity = rb_entry(node, struct bfq_entity, rb_node);
++
++	entity->min_start = entity->start;
++	bfq_update_min(entity, node->rb_right);
++	bfq_update_min(entity, node->rb_left);
++}
++
++/**
++ * bfq_update_active_tree - update min_start for the whole active tree.
++ * @node: the starting node.
++ *
++ * @node must be the deepest modified node after an update.  This function
++ * updates its min_start using the values held by its children, assuming
++ * that they did not change, and then updates all the nodes that may have
++ * changed in the path to the root.  The only nodes that may have changed
++ * are the ones in the path or their siblings.
++ */
++static void bfq_update_active_tree(struct rb_node *node)
++{
++	struct rb_node *parent;
++
++up:
++	bfq_update_active_node(node);
++
++	parent = rb_parent(node);
++	if (parent == NULL)
++		return;
++
++	if (node == parent->rb_left && parent->rb_right != NULL)
++		bfq_update_active_node(parent->rb_right);
++	else if (parent->rb_left != NULL)
++		bfq_update_active_node(parent->rb_left);
++
++	node = parent;
++	goto up;
++}
++
++static void bfq_weights_tree_add(struct bfq_data *bfqd,
++				 struct bfq_entity *entity,
++				 struct rb_root *root);
++
++static void bfq_weights_tree_remove(struct bfq_data *bfqd,
++				    struct bfq_entity *entity,
++				    struct rb_root *root);
++
++
++/**
++ * bfq_active_insert - insert an entity in the active tree of its
++ *                     group/device.
++ * @st: the service tree of the entity.
++ * @entity: the entity being inserted.
++ *
++ * The active tree is ordered by finish time, but an extra key is kept
++ * for each node, containing the minimum value for the start times of
++ * its children (and the node itself), so it's possible to search for
++ * the eligible node with the lowest finish time in logarithmic time.
++ */
++static void bfq_active_insert(struct bfq_service_tree *st,
++			      struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++	struct rb_node *node = &entity->rb_node;
++#ifdef CONFIG_CGROUP_BFQIO
++	struct bfq_sched_data *sd = NULL;
++	struct bfq_group *bfqg = NULL;
++	struct bfq_data *bfqd = NULL;
++#endif
++
++	bfq_insert(&st->active, entity);
++
++	if (node->rb_left != NULL)
++		node = node->rb_left;
++	else if (node->rb_right != NULL)
++		node = node->rb_right;
++
++	bfq_update_active_tree(node);
++
++#ifdef CONFIG_CGROUP_BFQIO
++	sd = entity->sched_data;
++	bfqg = container_of(sd, struct bfq_group, sched_data);
++	BUG_ON(!bfqg);
++	bfqd = (struct bfq_data *)bfqg->bfqd;
++#endif
++	if (bfqq != NULL)
++		list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list);
++#ifdef CONFIG_CGROUP_BFQIO
++	else { /* bfq_group */
++		BUG_ON(!bfqd);
++		bfq_weights_tree_add(bfqd, entity, &bfqd->group_weights_tree);
++	}
++	if (bfqg != bfqd->root_group) {
++		BUG_ON(!bfqg);
++		BUG_ON(!bfqd);
++		bfqg->active_entities++;
++		if (bfqg->active_entities == 2)
++			bfqd->active_numerous_groups++;
++	}
++#endif
++}
++
++/**
++ * bfq_ioprio_to_weight - calc a weight from an ioprio.
++ * @ioprio: the ioprio value to convert.
++ */
++static inline unsigned short bfq_ioprio_to_weight(int ioprio)
++{
++	BUG_ON(ioprio < 0 || ioprio >= IOPRIO_BE_NR);
++	return IOPRIO_BE_NR - ioprio;
++}
++
++/**
++ * bfq_weight_to_ioprio - calc an ioprio from a weight.
++ * @weight: the weight value to convert.
++ *
++ * To preserve as much as possible the old only-ioprio user interface,
++ * 0 is used as an escape ioprio value for weights (numerically) equal to
++ * or larger than IOPRIO_BE_NR.
++ */
++static inline unsigned short bfq_weight_to_ioprio(int weight)
++{
++	BUG_ON(weight < BFQ_MIN_WEIGHT || weight > BFQ_MAX_WEIGHT);
++	return IOPRIO_BE_NR - weight < 0 ? 0 : IOPRIO_BE_NR - weight;
++}
++
++static inline void bfq_get_entity(struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++	if (bfqq != NULL) {
++		atomic_inc(&bfqq->ref);
++		bfq_log_bfqq(bfqq->bfqd, bfqq, "get_entity: %p %d",
++			     bfqq, atomic_read(&bfqq->ref));
++	}
++}
++
++/**
++ * bfq_find_deepest - find the deepest node that an extraction can modify.
++ * @node: the node being removed.
++ *
++ * Do the first step of an extraction in an rb tree, looking for the
++ * node that will replace @node, and returning the deepest node that
++ * the following modifications to the tree can touch.  If @node is the
++ * last node in the tree return %NULL.
++ */
++static struct rb_node *bfq_find_deepest(struct rb_node *node)
++{
++	struct rb_node *deepest;
++
++	if (node->rb_right == NULL && node->rb_left == NULL)
++		deepest = rb_parent(node);
++	else if (node->rb_right == NULL)
++		deepest = node->rb_left;
++	else if (node->rb_left == NULL)
++		deepest = node->rb_right;
++	else {
++		deepest = rb_next(node);
++		if (deepest->rb_right != NULL)
++			deepest = deepest->rb_right;
++		else if (rb_parent(deepest) != node)
++			deepest = rb_parent(deepest);
++	}
++
++	return deepest;
++}
++
++/**
++ * bfq_active_extract - remove an entity from the active tree.
++ * @st: the service_tree containing the tree.
++ * @entity: the entity being removed.
++ */
++static void bfq_active_extract(struct bfq_service_tree *st,
++			       struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++	struct rb_node *node;
++#ifdef CONFIG_CGROUP_BFQIO
++	struct bfq_sched_data *sd = NULL;
++	struct bfq_group *bfqg = NULL;
++	struct bfq_data *bfqd = NULL;
++#endif
++
++	node = bfq_find_deepest(&entity->rb_node);
++	bfq_extract(&st->active, entity);
++
++	if (node != NULL)
++		bfq_update_active_tree(node);
++
++#ifdef CONFIG_CGROUP_BFQIO
++	sd = entity->sched_data;
++	bfqg = container_of(sd, struct bfq_group, sched_data);
++	BUG_ON(!bfqg);
++	bfqd = (struct bfq_data *)bfqg->bfqd;
++#endif
++	if (bfqq != NULL)
++		list_del(&bfqq->bfqq_list);
++#ifdef CONFIG_CGROUP_BFQIO
++	else { /* bfq_group */
++		BUG_ON(!bfqd);
++		bfq_weights_tree_remove(bfqd, entity,
++					&bfqd->group_weights_tree);
++	}
++	if (bfqg != bfqd->root_group) {
++		BUG_ON(!bfqg);
++		BUG_ON(!bfqd);
++		BUG_ON(!bfqg->active_entities);
++		bfqg->active_entities--;
++		if (bfqg->active_entities == 1) {
++			BUG_ON(!bfqd->active_numerous_groups);
++			bfqd->active_numerous_groups--;
++		}
++	}
++#endif
++}
++
++/**
++ * bfq_idle_insert - insert an entity into the idle tree.
++ * @st: the service tree containing the tree.
++ * @entity: the entity to insert.
++ */
++static void bfq_idle_insert(struct bfq_service_tree *st,
++			    struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++	struct bfq_entity *first_idle = st->first_idle;
++	struct bfq_entity *last_idle = st->last_idle;
++
++	if (first_idle == NULL || bfq_gt(first_idle->finish, entity->finish))
++		st->first_idle = entity;
++	if (last_idle == NULL || bfq_gt(entity->finish, last_idle->finish))
++		st->last_idle = entity;
++
++	bfq_insert(&st->idle, entity);
++
++	if (bfqq != NULL)
++		list_add(&bfqq->bfqq_list, &bfqq->bfqd->idle_list);
++}
++
++/**
++ * bfq_forget_entity - remove an entity from the wfq trees.
++ * @st: the service tree.
++ * @entity: the entity being removed.
++ *
++ * Update the device status and forget everything about @entity, putting
++ * the device reference to it, if it is a queue.  Entities belonging to
++ * groups are not refcounted.
++ */
++static void bfq_forget_entity(struct bfq_service_tree *st,
++			      struct bfq_entity *entity)
++{
++	struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++	struct bfq_sched_data *sd;
++
++	BUG_ON(!entity->on_st);
++
++	entity->on_st = 0;
++	st->wsum -= entity->weight;
++	if (bfqq != NULL) {
++		sd = entity->sched_data;
++		bfq_log_bfqq(bfqq->bfqd, bfqq, "forget_entity: %p %d",
++			     bfqq, atomic_read(&bfqq->ref));
++		bfq_put_queue(bfqq);
++	}
++}
++
++/**
++ * bfq_put_idle_entity - release the idle tree ref of an entity.
++ * @st: service tree for the entity.
++ * @entity: the entity being released.
++ */
++static void bfq_put_idle_entity(struct bfq_service_tree *st,
++				struct bfq_entity *entity)
++{
++	bfq_idle_extract(st, entity);
++	bfq_forget_entity(st, entity);
++}
++
++/**
++ * bfq_forget_idle - update the idle tree if necessary.
++ * @st: the service tree to act upon.
++ *
++ * To preserve the global O(log N) complexity we only remove one entry here;
++ * as the idle tree will not grow indefinitely this can be done safely.
++ */
++static void bfq_forget_idle(struct bfq_service_tree *st)
++{
++	struct bfq_entity *first_idle = st->first_idle;
++	struct bfq_entity *last_idle = st->last_idle;
++
++	if (RB_EMPTY_ROOT(&st->active) && last_idle != NULL &&
++	    !bfq_gt(last_idle->finish, st->vtime)) {
++		/*
++		 * Forget the whole idle tree, increasing the vtime past
++		 * the last finish time of idle entities.
++		 */
++		st->vtime = last_idle->finish;
++	}
++
++	if (first_idle != NULL && !bfq_gt(first_idle->finish, st->vtime))
++		bfq_put_idle_entity(st, first_idle);
++}
++
++static struct bfq_service_tree *
++__bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
++			 struct bfq_entity *entity)
++{
++	struct bfq_service_tree *new_st = old_st;
++
++	if (entity->ioprio_changed) {
++		struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++		unsigned short prev_weight, new_weight;
++		struct bfq_data *bfqd = NULL;
++		struct rb_root *root;
++#ifdef CONFIG_CGROUP_BFQIO
++		struct bfq_sched_data *sd;
++		struct bfq_group *bfqg;
++#endif
++
++		if (bfqq != NULL)
++			bfqd = bfqq->bfqd;
++#ifdef CONFIG_CGROUP_BFQIO
++		else {
++			sd = entity->my_sched_data;
++			bfqg = container_of(sd, struct bfq_group, sched_data);
++			BUG_ON(!bfqg);
++			bfqd = (struct bfq_data *)bfqg->bfqd;
++			BUG_ON(!bfqd);
++		}
++#endif
++
++		BUG_ON(old_st->wsum < entity->weight);
++		old_st->wsum -= entity->weight;
++
++		if (entity->new_weight != entity->orig_weight) {
++			if (entity->new_weight < BFQ_MIN_WEIGHT ||
++			    entity->new_weight > BFQ_MAX_WEIGHT) {
++				printk(KERN_CRIT "update_weight_prio: "
++						 "new_weight %d\n",
++					entity->new_weight);
++				BUG();
++			}
++			entity->orig_weight = entity->new_weight;
++			entity->ioprio =
++				bfq_weight_to_ioprio(entity->orig_weight);
++		}
++
++		entity->ioprio_class = entity->new_ioprio_class;
++		entity->ioprio_changed = 0;
++
++		/*
++		 * NOTE: here we may be changing the weight too early,
++		 * this will cause unfairness.  The correct approach
++		 * would have required additional complexity to defer
++		 * weight changes to the proper time instants (i.e.,
++		 * when entity->finish <= old_st->vtime).
++		 */
++		new_st = bfq_entity_service_tree(entity);
++
++		prev_weight = entity->weight;
++		new_weight = entity->orig_weight *
++			     (bfqq != NULL ? bfqq->wr_coeff : 1);
++		/*
++		 * If the weight of the entity changes, remove the entity
++		 * from its old weight counter (if there is a counter
++		 * associated with the entity), and add it to the counter
++		 * associated with its new weight.
++		 */
++		if (prev_weight != new_weight) {
++			root = bfqq ? &bfqd->queue_weights_tree :
++				      &bfqd->group_weights_tree;
++			bfq_weights_tree_remove(bfqd, entity, root);
++		}
++		entity->weight = new_weight;
++		/*
++		 * Add the entity to its weights tree only if it is
++		 * not associated with a weight-raised queue.
++		 */
++		if (prev_weight != new_weight &&
++		    (bfqq ? bfqq->wr_coeff == 1 : 1))
++			/* If we get here, root has been initialized. */
++			bfq_weights_tree_add(bfqd, entity, root);
++
++		new_st->wsum += entity->weight;
++
++		if (new_st != old_st)
++			entity->start = new_st->vtime;
++	}
++
++	return new_st;
++}
++
++/**
++ * bfq_bfqq_served - update the scheduler status after selection for
++ *                   service.
++ * @bfqq: the queue being served.
++ * @served: bytes to transfer.
++ *
++ * NOTE: this can be optimized, as the timestamps of upper level entities
++ * are synchronized every time a new bfqq is selected for service.  For now,
++ * we keep it to better check consistency.
++ */
++static void bfq_bfqq_served(struct bfq_queue *bfqq, unsigned long served)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++	struct bfq_service_tree *st;
++
++	for_each_entity(entity) {
++		st = bfq_entity_service_tree(entity);
++
++		entity->service += served;
++		BUG_ON(entity->service > entity->budget);
++		BUG_ON(st->wsum == 0);
++
++		st->vtime += bfq_delta(served, st->wsum);
++		bfq_forget_idle(st);
++	}
++	bfq_log_bfqq(bfqq->bfqd, bfqq, "bfqq_served %lu secs", served);
++}
++
++/**
++ * bfq_bfqq_charge_full_budget - set the service to the entity budget.
++ * @bfqq: the queue that needs a service update.
++ *
++ * When it's not possible to be fair in the service domain, because
++ * a queue is not consuming its budget fast enough (the meaning of
++ * fast depends on the timeout parameter), we charge it a full
++ * budget.  In this way we should obtain a sort of time-domain
++ * fairness among all the seeky/slow queues.
++ */
++static inline void bfq_bfqq_charge_full_budget(struct bfq_queue *bfqq)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++
++	bfq_log_bfqq(bfqq->bfqd, bfqq, "charge_full_budget");
++
++	bfq_bfqq_served(bfqq, entity->budget - entity->service);
++}
++
++/**
++ * __bfq_activate_entity - activate an entity.
++ * @entity: the entity being activated.
++ *
++ * Called whenever an entity is activated, i.e., it is not active and one
++ * of its children receives a new request, or has to be reactivated due to
++ * budget exhaustion.  It uses the current budget of the entity (and the
++ * service received if @entity is active) of the queue to calculate its
++ * timestamps.
++ */
++static void __bfq_activate_entity(struct bfq_entity *entity)
++{
++	struct bfq_sched_data *sd = entity->sched_data;
++	struct bfq_service_tree *st = bfq_entity_service_tree(entity);
++
++	if (entity == sd->in_service_entity) {
++		BUG_ON(entity->tree != NULL);
++		/*
++		 * If we are requeueing the current entity, we must
++		 * take care not to charge it for service it has
++		 * not received.
++		 */
++		bfq_calc_finish(entity, entity->service);
++		entity->start = entity->finish;
++		sd->in_service_entity = NULL;
++	} else if (entity->tree == &st->active) {
++		/*
++		 * Requeueing an entity due to a change of some
++		 * next_in_service entity below it.  We reuse the
++		 * old start time.
++		 */
++		bfq_active_extract(st, entity);
++	} else if (entity->tree == &st->idle) {
++		/*
++		 * Must be on the idle tree, bfq_idle_extract() will
++		 * check for that.
++		 */
++		bfq_idle_extract(st, entity);
++		entity->start = bfq_gt(st->vtime, entity->finish) ?
++				       st->vtime : entity->finish;
++	} else {
++		/*
++		 * The finish time of the entity may be invalid, and
++		 * it is in the past for sure, otherwise the queue
++		 * would have been on the idle tree.
++		 */
++		entity->start = st->vtime;
++		st->wsum += entity->weight;
++		bfq_get_entity(entity);
++
++		BUG_ON(entity->on_st);
++		entity->on_st = 1;
++	}
++
++	st = __bfq_entity_update_weight_prio(st, entity);
++	bfq_calc_finish(entity, entity->budget);
++	bfq_active_insert(st, entity);
++}
++
++/**
++ * bfq_activate_entity - activate an entity and its ancestors if necessary.
++ * @entity: the entity to activate.
++ *
++ * Activate @entity and all the entities on the path from it to the root.
++ */
++static void bfq_activate_entity(struct bfq_entity *entity)
++{
++	struct bfq_sched_data *sd;
++
++	for_each_entity(entity) {
++		__bfq_activate_entity(entity);
++
++		sd = entity->sched_data;
++		if (!bfq_update_next_in_service(sd))
++			/*
++			 * No need to propagate the activation to the
++			 * upper entities, as they will be updated when
++			 * the in-service entity is rescheduled.
++			 */
++			break;
++	}
++}
++
++/**
++ * __bfq_deactivate_entity - deactivate an entity from its service tree.
++ * @entity: the entity to deactivate.
++ * @requeue: if false, the entity will not be put into the idle tree.
++ *
++ * Deactivate an entity, independently from its previous state.  If the
++ * entity was not on a service tree just return; otherwise, if it is on
++ * any scheduler tree, extract it from that tree and, if necessary
++ * and if the caller did specify @requeue, put it on the idle tree.
++ *
++ * Return %1 if the caller should update the entity hierarchy, i.e.,
++ * if the entity was in service or if it was the next_in_service for
++ * its sched_data; return %0 otherwise.
++ */
++static int __bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
++{
++	struct bfq_sched_data *sd = entity->sched_data;
++	struct bfq_service_tree *st = bfq_entity_service_tree(entity);
++	int was_in_service = entity == sd->in_service_entity;
++	int ret = 0;
++
++	if (!entity->on_st)
++		return 0;
++
++	BUG_ON(was_in_service && entity->tree != NULL);
++
++	if (was_in_service) {
++		bfq_calc_finish(entity, entity->service);
++		sd->in_service_entity = NULL;
++	} else if (entity->tree == &st->active)
++		bfq_active_extract(st, entity);
++	else if (entity->tree == &st->idle)
++		bfq_idle_extract(st, entity);
++	else if (entity->tree != NULL)
++		BUG();
++
++	if (was_in_service || sd->next_in_service == entity)
++		ret = bfq_update_next_in_service(sd);
++
++	if (!requeue || !bfq_gt(entity->finish, st->vtime))
++		bfq_forget_entity(st, entity);
++	else
++		bfq_idle_insert(st, entity);
++
++	BUG_ON(sd->in_service_entity == entity);
++	BUG_ON(sd->next_in_service == entity);
++
++	return ret;
++}
++
++/**
++ * bfq_deactivate_entity - deactivate an entity.
++ * @entity: the entity to deactivate.
++ * @requeue: true if the entity can be put on the idle tree
++ */
++static void bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
++{
++	struct bfq_sched_data *sd;
++	struct bfq_entity *parent;
++
++	for_each_entity_safe(entity, parent) {
++		sd = entity->sched_data;
++
++		if (!__bfq_deactivate_entity(entity, requeue))
++			/*
++			 * The parent entity is still backlogged, and
++			 * we don't need to update it as it is still
++			 * in service.
++			 */
++			break;
++
++		if (sd->next_in_service != NULL)
++			/*
++			 * The parent entity is still backlogged and
++			 * the budgets on the path towards the root
++			 * need to be updated.
++			 */
++			goto update;
++
++		/*
++		 * If we get here, the parent is no longer backlogged and
++		 * we want to propagate the dequeue upwards.
++		 */
++		requeue = 1;
++	}
++
++	return;
++
++update:
++	entity = parent;
++	for_each_entity(entity) {
++		__bfq_activate_entity(entity);
++
++		sd = entity->sched_data;
++		if (!bfq_update_next_in_service(sd))
++			break;
++	}
++}
++
++/**
++ * bfq_update_vtime - update vtime if necessary.
++ * @st: the service tree to act upon.
++ *
++ * If necessary update the service tree vtime to have at least one
++ * eligible entity, skipping to its start time.  Assumes that the
++ * active tree of the device is not empty.
++ *
++ * NOTE: this hierarchical implementation updates vtimes quite often;
++ * we may end up with reactivated processes getting timestamps after a
++ * vtime skip done because we needed a ->first_active entity on some
++ * intermediate node.
++ */
++static void bfq_update_vtime(struct bfq_service_tree *st)
++{
++	struct bfq_entity *entry;
++	struct rb_node *node = st->active.rb_node;
++
++	entry = rb_entry(node, struct bfq_entity, rb_node);
++	if (bfq_gt(entry->min_start, st->vtime)) {
++		st->vtime = entry->min_start;
++		bfq_forget_idle(st);
++	}
++}
++
++/**
++ * bfq_first_active_entity - find the eligible entity with
++ *                           the smallest finish time
++ * @st: the service tree to select from.
++ *
++ * This function searches for the first schedulable entity, starting from
++ * the root of the tree and going left whenever the left subtree contains
++ * at least one eligible (start <= vtime) entity. The path on
++ * the right is followed only if a) the left subtree contains no eligible
++ * entities and b) no eligible entity has been found yet.
++ */
++static struct bfq_entity *bfq_first_active_entity(struct bfq_service_tree *st)
++{
++	struct bfq_entity *entry, *first = NULL;
++	struct rb_node *node = st->active.rb_node;
++
++	while (node != NULL) {
++		entry = rb_entry(node, struct bfq_entity, rb_node);
++left:
++		if (!bfq_gt(entry->start, st->vtime))
++			first = entry;
++
++		BUG_ON(bfq_gt(entry->min_start, st->vtime));
++
++		if (node->rb_left != NULL) {
++			entry = rb_entry(node->rb_left,
++					 struct bfq_entity, rb_node);
++			if (!bfq_gt(entry->min_start, st->vtime)) {
++				node = node->rb_left;
++				goto left;
++			}
++		}
++		if (first != NULL)
++			break;
++		node = node->rb_right;
++	}
++
++	BUG_ON(first == NULL && !RB_EMPTY_ROOT(&st->active));
++	return first;
++}
++
++/**
++ * __bfq_lookup_next_entity - return the first eligible entity in @st.
++ * @st: the service tree.
++ * @force: true when the IDLE class tree is being served forcedly.
++ *
++ * Update the virtual time in @st and return its first eligible entity.
++ */
++static struct bfq_entity *__bfq_lookup_next_entity(struct bfq_service_tree *st,
++						   bool force)
++{
++	struct bfq_entity *entity, *new_next_in_service = NULL;
++
++	if (RB_EMPTY_ROOT(&st->active))
++		return NULL;
++
++	bfq_update_vtime(st);
++	entity = bfq_first_active_entity(st);
++	BUG_ON(bfq_gt(entity->start, st->vtime));
++
++	/*
++	 * If the chosen entity does not match with the sched_data's
++	 * next_in_service and we are forcedly serving the IDLE priority
++	 * class tree, bubble up budget update.
++	 */
++	if (unlikely(force && entity != entity->sched_data->next_in_service)) {
++		new_next_in_service = entity;
++		for_each_entity(new_next_in_service)
++			bfq_update_budget(new_next_in_service);
++	}
++
++	return entity;
++}
++
++/**
++ * bfq_lookup_next_entity - return the first eligible entity in @sd.
++ * @sd: the sched_data.
++ * @extract: if true the returned entity will also be extracted from @sd.
++ *
++ * NOTE: since we cache the next_in_service entity at each level of the
++ * hierarchy, the complexity of the lookup can be decreased with
++ * absolutely no effort by just returning the cached next_in_service value;
++ * we prefer to do full lookups to test the consistency of the data
++ * structures.
++ */
++static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
++						 int extract,
++						 struct bfq_data *bfqd)
++{
++	struct bfq_service_tree *st = sd->service_tree;
++	struct bfq_entity *entity;
++	int i = 0;
++
++	BUG_ON(sd->in_service_entity != NULL);
++
++	if (bfqd != NULL &&
++	    jiffies - bfqd->bfq_class_idle_last_service > BFQ_CL_IDLE_TIMEOUT) {
++		entity = __bfq_lookup_next_entity(st + BFQ_IOPRIO_CLASSES - 1,
++						  true);
++		if (entity != NULL) {
++			i = BFQ_IOPRIO_CLASSES - 1;
++			bfqd->bfq_class_idle_last_service = jiffies;
++			sd->next_in_service = entity;
++		}
++	}
++	for (; i < BFQ_IOPRIO_CLASSES; i++) {
++		entity = __bfq_lookup_next_entity(st + i, false);
++		if (entity != NULL) {
++			if (extract) {
++				bfq_check_next_in_service(sd, entity);
++				bfq_active_extract(st + i, entity);
++				sd->in_service_entity = entity;
++				sd->next_in_service = NULL;
++			}
++			break;
++		}
++	}
++
++	return entity;
++}
++
++/*
++ * Get next queue for service.
++ */
++static struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
++{
++	struct bfq_entity *entity = NULL;
++	struct bfq_sched_data *sd;
++	struct bfq_queue *bfqq;
++
++	BUG_ON(bfqd->in_service_queue != NULL);
++
++	if (bfqd->busy_queues == 0)
++		return NULL;
++
++	sd = &bfqd->root_group->sched_data;
++	for (; sd != NULL; sd = entity->my_sched_data) {
++		entity = bfq_lookup_next_entity(sd, 1, bfqd);
++		BUG_ON(entity == NULL);
++		entity->service = 0;
++	}
++
++	bfqq = bfq_entity_to_bfqq(entity);
++	BUG_ON(bfqq == NULL);
++
++	return bfqq;
++}
++
++/*
++ * Forced extraction of the given queue.
++ */
++static void bfq_get_next_queue_forced(struct bfq_data *bfqd,
++				      struct bfq_queue *bfqq)
++{
++	struct bfq_entity *entity;
++	struct bfq_sched_data *sd;
++
++	BUG_ON(bfqd->in_service_queue != NULL);
++
++	entity = &bfqq->entity;
++	/*
++	 * Bubble up extraction/update from the leaf to the root.
++	 */
++	for_each_entity(entity) {
++		sd = entity->sched_data;
++		bfq_update_budget(entity);
++		bfq_update_vtime(bfq_entity_service_tree(entity));
++		bfq_active_extract(bfq_entity_service_tree(entity), entity);
++		sd->in_service_entity = entity;
++		sd->next_in_service = NULL;
++		entity->service = 0;
++	}
++
++	return;
++}
++
++static void __bfq_bfqd_reset_in_service(struct bfq_data *bfqd)
++{
++	if (bfqd->in_service_bic != NULL) {
++		put_io_context(bfqd->in_service_bic->icq.ioc);
++		bfqd->in_service_bic = NULL;
++	}
++
++	bfqd->in_service_queue = NULL;
++	del_timer(&bfqd->idle_slice_timer);
++}
++
++static void bfq_deactivate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++				int requeue)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++
++	if (bfqq == bfqd->in_service_queue)
++		__bfq_bfqd_reset_in_service(bfqd);
++
++	bfq_deactivate_entity(entity, requeue);
++}
++
++static void bfq_activate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	struct bfq_entity *entity = &bfqq->entity;
++
++	bfq_activate_entity(entity);
++}
++
++/*
++ * Called when the bfqq no longer has requests pending; removes it from
++ * the service tree.
++ */
++static void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++			      int requeue)
++{
++	BUG_ON(!bfq_bfqq_busy(bfqq));
++	BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
++
++	bfq_log_bfqq(bfqd, bfqq, "del from busy");
++
++	bfq_clear_bfqq_busy(bfqq);
++
++	BUG_ON(bfqd->busy_queues == 0);
++	bfqd->busy_queues--;
++
++	if (!bfqq->dispatched) {
++		bfq_weights_tree_remove(bfqd, &bfqq->entity,
++					&bfqd->queue_weights_tree);
++		if (!blk_queue_nonrot(bfqd->queue)) {
++			BUG_ON(!bfqd->busy_in_flight_queues);
++			bfqd->busy_in_flight_queues--;
++			if (bfq_bfqq_constantly_seeky(bfqq)) {
++				BUG_ON(!bfqd->
++					const_seeky_busy_in_flight_queues);
++				bfqd->const_seeky_busy_in_flight_queues--;
++			}
++		}
++	}
++	if (bfqq->wr_coeff > 1)
++		bfqd->wr_busy_queues--;
++
++	bfq_deactivate_bfqq(bfqd, bfqq, requeue);
++}
++
++/*
++ * Called when an inactive queue receives a new request.
++ */
++static void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++	BUG_ON(bfq_bfqq_busy(bfqq));
++	BUG_ON(bfqq == bfqd->in_service_queue);
++
++	bfq_log_bfqq(bfqd, bfqq, "add to busy");
++
++	bfq_activate_bfqq(bfqd, bfqq);
++
++	bfq_mark_bfqq_busy(bfqq);
++	bfqd->busy_queues++;
++
++	if (!bfqq->dispatched) {
++		if (bfqq->wr_coeff == 1)
++			bfq_weights_tree_add(bfqd, &bfqq->entity,
++					     &bfqd->queue_weights_tree);
++		if (!blk_queue_nonrot(bfqd->queue)) {
++			bfqd->busy_in_flight_queues++;
++			if (bfq_bfqq_constantly_seeky(bfqq))
++				bfqd->const_seeky_busy_in_flight_queues++;
++		}
++	}
++	if (bfqq->wr_coeff > 1)
++		bfqd->wr_busy_queues++;
++}
+diff --git a/block/bfq.h b/block/bfq.h
+new file mode 100644
+index 0000000..e350b5f
+--- /dev/null
++++ b/block/bfq.h
+@@ -0,0 +1,771 @@
++/*
++ * BFQ-v7r8 for 4.2.0: data structures and common function prototypes.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ *		      Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ */
++
++#ifndef _BFQ_H
++#define _BFQ_H
++
++#include <linux/blktrace_api.h>
++#include <linux/hrtimer.h>
++#include <linux/ioprio.h>
++#include <linux/rbtree.h>
++
++#define BFQ_IOPRIO_CLASSES	3
++#define BFQ_CL_IDLE_TIMEOUT	(HZ/5)
++
++#define BFQ_MIN_WEIGHT	1
++#define BFQ_MAX_WEIGHT	1000
++
++#define BFQ_DEFAULT_QUEUE_IOPRIO	4
++
++#define BFQ_DEFAULT_GRP_WEIGHT	10
++#define BFQ_DEFAULT_GRP_IOPRIO	0
++#define BFQ_DEFAULT_GRP_CLASS	IOPRIO_CLASS_BE
++
++struct bfq_entity;
++
++/**
++ * struct bfq_service_tree - per ioprio_class service tree.
++ * @active: tree for active entities (i.e., those backlogged).
++ * @idle: tree for idle entities (i.e., those not backlogged, with V <= F_i).
++ * @first_idle: idle entity with minimum F_i.
++ * @last_idle: idle entity with maximum F_i.
++ * @vtime: scheduler virtual time.
++ * @wsum: scheduler weight sum; active and idle entities contribute to it.
++ *
++ * Each service tree represents a B-WF2Q+ scheduler on its own.  Each
++ * ioprio_class has its own independent scheduler, and so its own
++ * bfq_service_tree.  All the fields are protected by the queue lock
++ * of the containing bfqd.
++ */
++struct bfq_service_tree {
++	struct rb_root active;
++	struct rb_root idle;
++
++	struct bfq_entity *first_idle;
++	struct bfq_entity *last_idle;
++
++	u64 vtime;
++	unsigned long wsum;
++};
++
++/**
++ * struct bfq_sched_data - multi-class scheduler.
++ * @in_service_entity: entity in service.
++ * @next_in_service: head-of-the-line entity in the scheduler.
++ * @service_tree: array of service trees, one per ioprio_class.
++ *
++ * bfq_sched_data is the basic scheduler queue.  It supports three
++ * ioprio_classes, and can be used either as a toplevel queue or as
++ * an intermediate queue on a hierarchical setup.
++ * @next_in_service points to the active entity of the sched_data
++ * service trees that will be scheduled next.
++ *
++ * The supported ioprio_classes are the same as in CFQ, in descending
++ * priority order, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE.
++ * Requests from higher priority queues are served before all the
++ * requests from lower priority queues; among requests of the same
++ * queue requests are served according to B-WF2Q+.
++ * All the fields are protected by the queue lock of the containing bfqd.
++ */
++struct bfq_sched_data {
++	struct bfq_entity *in_service_entity;
++	struct bfq_entity *next_in_service;
++	struct bfq_service_tree service_tree[BFQ_IOPRIO_CLASSES];
++};
++
++/**
++ * struct bfq_weight_counter - counter of the number of all active entities
++ *                             with a given weight.
++ * @weight: weight of the entities that this counter refers to.
++ * @num_active: number of active entities with this weight.
++ * @weights_node: weights tree member (see bfq_data's @queue_weights_tree
++ *                and @group_weights_tree).
++ */
++struct bfq_weight_counter {
++	short int weight;
++	unsigned int num_active;
++	struct rb_node weights_node;
++};
++
++/**
++ * struct bfq_entity - schedulable entity.
++ * @rb_node: service_tree member.
++ * @weight_counter: pointer to the weight counter associated with this entity.
++ * @on_st: flag, true if the entity is on a tree (either the active or
++ *         the idle one of its service_tree).
++ * @finish: B-WF2Q+ finish timestamp (aka F_i).
++ * @start: B-WF2Q+ start timestamp (aka S_i).
++ * @tree: tree the entity is enqueued into; %NULL if not on a tree.
++ * @min_start: minimum start time of the (active) subtree rooted at
++ *             this entity; used for O(log N) lookups into active trees.
++ * @service: service received during the last round of service.
++ * @budget: budget used to calculate F_i; F_i = S_i + @budget / @weight.
++ * @weight: weight of the queue
++ * @parent: parent entity, for hierarchical scheduling.
++ * @my_sched_data: for non-leaf nodes in the cgroup hierarchy, the
++ *                 associated scheduler queue, %NULL on leaf nodes.
++ * @sched_data: the scheduler queue this entity belongs to.
++ * @ioprio: the ioprio in use.
++ * @new_weight: when a weight change is requested, the new weight value.
++ * @orig_weight: original weight, used to implement weight boosting
++ * @new_ioprio: when an ioprio change is requested, the new ioprio value.
++ * @ioprio_class: the ioprio_class in use.
++ * @new_ioprio_class: when an ioprio_class change is requested, the new
++ *                    ioprio_class value.
++ * @ioprio_changed: flag, true when the user requested a weight, ioprio or
++ *                  ioprio_class change.
++ *
++ * A bfq_entity is used to represent either a bfq_queue (leaf node in the
++ * cgroup hierarchy) or a bfq_group into the upper level scheduler.  Each
++ * entity belongs to the sched_data of the parent group in the cgroup
++ * hierarchy.  Non-leaf entities also have their own sched_data, stored
++ * in @my_sched_data.
++ *
++ * Each entity stores independently its priority values; this would
++ * allow different weights on different devices, but this
++ * functionality is not exported to userspace for now.  Priorities and
++ * weights are updated lazily, first storing the new values into the
++ * new_* fields, then setting the @ioprio_changed flag.  As soon as
++ * there is a transition in the entity state that allows the priority
++ * update to take place the effective and the requested priority
++ * values are synchronized.
++ *
++ * Unless cgroups are used, the weight value is calculated from the
++ * ioprio to export the same interface as CFQ.  When dealing with
++ * ``well-behaved'' queues (i.e., queues that do not spend too much
++ * time to consume their budget and have true sequential behavior, and
++ * when there are no external factors breaking anticipation) the
++ * relative weights at each level of the cgroups hierarchy should be
++ * guaranteed.  All the fields are protected by the queue lock of the
++ * containing bfqd.
++ */
++struct bfq_entity {
++	struct rb_node rb_node;
++	struct bfq_weight_counter *weight_counter;
++
++	int on_st;
++
++	u64 finish;
++	u64 start;
++
++	struct rb_root *tree;
++
++	u64 min_start;
++
++	unsigned long service, budget;
++	unsigned short weight, new_weight;
++	unsigned short orig_weight;
++
++	struct bfq_entity *parent;
++
++	struct bfq_sched_data *my_sched_data;
++	struct bfq_sched_data *sched_data;
++
++	unsigned short ioprio, new_ioprio;
++	unsigned short ioprio_class, new_ioprio_class;
++
++	int ioprio_changed;
++};
++
++struct bfq_group;
++
++/**
++ * struct bfq_queue - leaf schedulable entity.
++ * @ref: reference counter.
++ * @bfqd: parent bfq_data.
++ * @new_bfqq: shared bfq_queue if queue is cooperating with
++ *           one or more other queues.
++ * @pos_node: request-position tree member (see bfq_data's @rq_pos_tree).
++ * @pos_root: request-position tree root (see bfq_data's @rq_pos_tree).
++ * @sort_list: sorted list of pending requests.
++ * @next_rq: if fifo isn't expired, next request to serve.
++ * @queued: nr of requests queued in @sort_list.
++ * @allocated: currently allocated requests.
++ * @meta_pending: pending metadata requests.
++ * @fifo: fifo list of requests in sort_list.
++ * @entity: entity representing this queue in the scheduler.
++ * @max_budget: maximum budget allowed from the feedback mechanism.
++ * @budget_timeout: budget expiration (in jiffies).
++ * @dispatched: number of requests on the dispatch list or inside driver.
++ * @flags: status flags.
++ * @bfqq_list: node for active/idle bfqq list inside our bfqd.
++ * @burst_list_node: node for the device's burst list.
++ * @seek_samples: number of seeks sampled
++ * @seek_total: sum of the distances of the seeks sampled
++ * @seek_mean: mean seek distance
++ * @last_request_pos: position of the last request enqueued
++ * @requests_within_timer: number of consecutive pairs of request completion
++ *                         and arrival, such that the queue becomes idle
++ *                         after the completion, but the next request arrives
++ *                         within an idle time slice; used only if the queue's
++ *                         IO_bound has been cleared.
++ * @pid: pid of the process owning the queue, used for logging purposes.
++ * @last_wr_start_finish: start time of the current weight-raising period if
++ *                        the @bfq_queue is being weight-raised, otherwise
++ *                        finish time of the last weight-raising period
++ * @wr_cur_max_time: current max raising time for this queue
++ * @soft_rt_next_start: minimum time instant such that, only if a new
++ *                      request is enqueued after this time instant in an
++ *                      idle @bfq_queue with no outstanding requests, then
++ *                      the task associated with the queue is deemed
++ *                      soft real-time (see the comments to the function
++ *                      bfq_bfqq_softrt_next_start()).
++ * @last_idle_bklogged: time of the last transition of the @bfq_queue from
++ *                      idle to backlogged
++ * @service_from_backlogged: cumulative service received from the @bfq_queue
++ *                           since the last transition from idle to
++ *                           backlogged
++ *
++ * A bfq_queue is a leaf request queue; it can be associated with one
++ * io_context, or with several if it is async or shared between cooperating
++ * processes. @cgroup holds a reference to the cgroup, to be sure that it does
++ * not disappear while a bfqq still references it (mostly to avoid races between
++ * request issuing and task migration followed by cgroup destruction).
++ * All the fields are protected by the queue lock of the containing bfqd.
++ */
++struct bfq_queue {
++	atomic_t ref;
++	struct bfq_data *bfqd;
++
++	/* fields for cooperating queues handling */
++	struct bfq_queue *new_bfqq;
++	struct rb_node pos_node;
++	struct rb_root *pos_root;
++
++	struct rb_root sort_list;
++	struct request *next_rq;
++	int queued[2];
++	int allocated[2];
++	int meta_pending;
++	struct list_head fifo;
++
++	struct bfq_entity entity;
++
++	unsigned long max_budget;
++	unsigned long budget_timeout;
++
++	int dispatched;
++
++	unsigned int flags;
++
++	struct list_head bfqq_list;
++
++	struct hlist_node burst_list_node;
++
++	unsigned int seek_samples;
++	u64 seek_total;
++	sector_t seek_mean;
++	sector_t last_request_pos;
++
++	unsigned int requests_within_timer;
++
++	pid_t pid;
++
++	/* weight-raising fields */
++	unsigned long wr_cur_max_time;
++	unsigned long soft_rt_next_start;
++	unsigned long last_wr_start_finish;
++	unsigned int wr_coeff;
++	unsigned long last_idle_bklogged;
++	unsigned long service_from_backlogged;
++};
++
++/**
++ * struct bfq_ttime - per process thinktime stats.
++ * @ttime_total: total process thinktime
++ * @ttime_samples: number of thinktime samples
++ * @ttime_mean: average process thinktime
++ */
++struct bfq_ttime {
++	unsigned long last_end_request;
++
++	unsigned long ttime_total;
++	unsigned long ttime_samples;
++	unsigned long ttime_mean;
++};
++
++/**
++ * struct bfq_io_cq - per (request_queue, io_context) structure.
++ * @icq: associated io_cq structure
++ * @bfqq: array of two process queues, the sync and the async
++ * @ttime: associated @bfq_ttime struct
++ */
++struct bfq_io_cq {
++	struct io_cq icq; /* must be the first member */
++	struct bfq_queue *bfqq[2];
++	struct bfq_ttime ttime;
++	int ioprio;
++};
++
++enum bfq_device_speed {
++	BFQ_BFQD_FAST,
++	BFQ_BFQD_SLOW,
++};
++
++/**
++ * struct bfq_data - per device data structure.
++ * @queue: request queue for the managed device.
++ * @root_group: root bfq_group for the device.
++ * @rq_pos_tree: rbtree sorted by next_request position, used when
++ *               determining if two or more queues have interleaving
++ *               requests (see bfq_close_cooperator()).
++ * @active_numerous_groups: number of bfq_groups containing more than one
++ *                          active @bfq_entity.
++ * @queue_weights_tree: rbtree of weight counters of @bfq_queues, sorted by
++ *                      weight. Used to keep track of whether all @bfq_queues
++ *                      have the same weight. The tree contains one counter
++ *                      for each distinct weight associated to some active
++ *                      and not weight-raised @bfq_queue (see the comments to
++ *                      the functions bfq_weights_tree_[add|remove] for
++ *                      further details).
++ * @group_weights_tree: rbtree of non-queue @bfq_entity weight counters, sorted
++ *                      by weight. Used to keep track of whether all
++ *                      @bfq_groups have the same weight. The tree contains
++ *                      one counter for each distinct weight associated to
++ *                      some active @bfq_group (see the comments to the
++ *                      functions bfq_weights_tree_[add|remove] for further
++ *                      details).
++ * @busy_queues: number of bfq_queues containing requests (including the
++ *		 queue in service, even if it is idling).
++ * @busy_in_flight_queues: number of @bfq_queues containing pending or
++ *                         in-flight requests, plus the @bfq_queue in
++ *                         service, even if idle but waiting for the
++ *                         possible arrival of its next sync request. This
++ *                         field is updated only if the device is rotational,
++ *                         but used only if the device is also NCQ-capable.
++ *                         The reason why the field is updated also for non-
++ *                         NCQ-capable rotational devices is related to the
++ *                         fact that the value of @hw_tag may be set also
++ *                         later than when busy_in_flight_queues may need to
++ *                         be incremented for the first time(s). Taking also
++ *                         this possibility into account, to avoid unbalanced
++ *                         increments/decrements, would imply more overhead
++ *                         than just updating busy_in_flight_queues
++ *                         regardless of the value of @hw_tag.
++ * @const_seeky_busy_in_flight_queues: number of constantly-seeky @bfq_queues
++ *                                     (that is, seeky queues that expired
++ *                                     for budget timeout at least once)
++ *                                     containing pending or in-flight
++ *                                     requests, including the in-service
++ *                                     @bfq_queue if constantly seeky. This
++ *                                     field is updated only if the device
++ *                                     is rotational, but used only if the
++ *                                     device is also NCQ-capable (see the
++ *                                     comments to @busy_in_flight_queues).
++ * @wr_busy_queues: number of weight-raised busy @bfq_queues.
++ * @queued: number of queued requests.
++ * @rq_in_driver: number of requests dispatched and waiting for completion.
++ * @sync_flight: number of sync requests in the driver.
++ * @max_rq_in_driver: max number of reqs in driver in the last
++ *                    @hw_tag_samples completed requests.
++ * @hw_tag_samples: nr of samples used to calculate hw_tag.
++ * @hw_tag: flag set to one if the driver is showing a queueing behavior.
++ * @budgets_assigned: number of budgets assigned.
++ * @idle_slice_timer: timer set when idling for the next sequential request
++ *                    from the queue in service.
++ * @unplug_work: delayed work to restart dispatching on the request queue.
++ * @in_service_queue: bfq_queue in service.
++ * @in_service_bic: bfq_io_cq (bic) associated with the @in_service_queue.
++ * @last_position: on-disk position of the last served request.
++ * @last_budget_start: beginning of the last budget.
++ * @last_idling_start: beginning of the last idle slice.
++ * @peak_rate: peak transfer rate observed for a budget.
++ * @peak_rate_samples: number of samples used to calculate @peak_rate.
++ * @bfq_max_budget: maximum budget allotted to a bfq_queue before
++ *                  rescheduling.
++ * @group_list: list of all the bfq_groups active on the device.
++ * @active_list: list of all the bfq_queues active on the device.
++ * @idle_list: list of all the bfq_queues idle on the device.
++ * @bfq_fifo_expire: timeout for async/sync requests; when it expires
++ *                   requests are served in fifo order.
++ * @bfq_back_penalty: weight of backward seeks wrt forward ones.
++ * @bfq_back_max: maximum allowed backward seek.
++ * @bfq_slice_idle: maximum idling time.
++ * @bfq_user_max_budget: user-configured max budget value
++ *                       (0 for auto-tuning).
++ * @bfq_max_budget_async_rq: maximum budget (in nr of requests) allotted to
++ *                           async queues.
++ * @bfq_timeout: timeout for bfq_queues to consume their budget; used to
++ *               prevent seeky queues from imposing long latencies on
++ *               well-behaved ones (this also implies that seeky queues cannot
++ *               receive guarantees in the service domain; after a timeout
++ *               they are charged for the whole allocated budget, to try
++ *               to preserve a behavior reasonably fair among them, but
++ *               without service-domain guarantees).
++ * @bfq_coop_thresh: number of queue merges after which a @bfq_queue is
++ *                   no longer granted any weight-raising.
++ * @bfq_failed_cooperations: number of consecutive failed cooperation
++ *                           chances after which weight-raising is restored
++ *                           to a queue subject to more than bfq_coop_thresh
++ *                           queue merges.
++ * @bfq_requests_within_timer: number of consecutive requests that must be
++ *                             issued within the idle time slice to re-enable
++ *                             idling for a queue which was marked as
++ *                             non-I/O-bound (see the definition of the
++ *                             IO_bound flag for further details).
++ * @last_ins_in_burst: last time at which a queue entered the current
++ *                     burst of queues being activated shortly after
++ *                     each other; for more details about this and the
++ *                     following parameters related to a burst of
++ *                     activations, see the comments to the function
++ *                     bfq_handle_burst().
++ * @bfq_burst_interval: reference time interval used to decide whether a
++ *                      queue has been activated shortly after
++ *                      @last_ins_in_burst.
++ * @burst_size: number of queues in the current burst of queue activations.
++ * @bfq_large_burst_thresh: maximum burst size above which the current
++ *                          queue-activation burst is deemed as 'large'.
++ * @large_burst: true if a large queue-activation burst is in progress.
++ * @burst_list: head of the burst list (as for the above fields, more details
++ *              in the comments to the function bfq_handle_burst).
++ * @low_latency: if set to true, low-latency heuristics are enabled.
++ * @bfq_wr_coeff: maximum factor by which the weight of a weight-raised
++ *                queue is multiplied.
++ * @bfq_wr_max_time: maximum duration of a weight-raising period (jiffies).
++ * @bfq_wr_rt_max_time: maximum duration for soft real-time processes.
++ * @bfq_wr_min_idle_time: minimum idle period after which weight-raising
++ *			  may be reactivated for a queue (in jiffies).
++ * @bfq_wr_min_inter_arr_async: minimum period between request arrivals
++ *				after which weight-raising may be
++ *				reactivated for an already busy queue
++ *				(in jiffies).
++ * @bfq_wr_max_softrt_rate: max service-rate for a soft real-time queue,
++ *			    in sectors per second.
++ * @RT_prod: cached value of the product R*T used for computing the maximum
++ *	     duration of the weight raising automatically.
++ * @device_speed: device-speed class for the low-latency heuristic.
++ * @oom_bfqq: fallback dummy bfqq for extreme OOM conditions.
++ *
++ * All the fields are protected by the @queue lock.
++ */
++struct bfq_data {
++	struct request_queue *queue;
++
++	struct bfq_group *root_group;
++	struct rb_root rq_pos_tree;
++
++#ifdef CONFIG_CGROUP_BFQIO
++	int active_numerous_groups;
++#endif
++
++	struct rb_root queue_weights_tree;
++	struct rb_root group_weights_tree;
++
++	int busy_queues;
++	int busy_in_flight_queues;
++	int const_seeky_busy_in_flight_queues;
++	int wr_busy_queues;
++	int queued;
++	int rq_in_driver;
++	int sync_flight;
++
++	int max_rq_in_driver;
++	int hw_tag_samples;
++	int hw_tag;
++
++	int budgets_assigned;
++
++	struct timer_list idle_slice_timer;
++	struct work_struct unplug_work;
++
++	struct bfq_queue *in_service_queue;
++	struct bfq_io_cq *in_service_bic;
++
++	sector_t last_position;
++
++	ktime_t last_budget_start;
++	ktime_t last_idling_start;
++	int peak_rate_samples;
++	u64 peak_rate;
++	unsigned long bfq_max_budget;
++
++	struct hlist_head group_list;
++	struct list_head active_list;
++	struct list_head idle_list;
++
++	unsigned int bfq_fifo_expire[2];
++	unsigned int bfq_back_penalty;
++	unsigned int bfq_back_max;
++	unsigned int bfq_slice_idle;
++	u64 bfq_class_idle_last_service;
++
++	unsigned int bfq_user_max_budget;
++	unsigned int bfq_max_budget_async_rq;
++	unsigned int bfq_timeout[2];
++
++	unsigned int bfq_coop_thresh;
++	unsigned int bfq_failed_cooperations;
++	unsigned int bfq_requests_within_timer;
++
++	unsigned long last_ins_in_burst;
++	unsigned long bfq_burst_interval;
++	int burst_size;
++	unsigned long bfq_large_burst_thresh;
++	bool large_burst;
++	struct hlist_head burst_list;
++
++	bool low_latency;
++
++	/* parameters of the low_latency heuristics */
++	unsigned int bfq_wr_coeff;
++	unsigned int bfq_wr_max_time;
++	unsigned int bfq_wr_rt_max_time;
++	unsigned int bfq_wr_min_idle_time;
++	unsigned long bfq_wr_min_inter_arr_async;
++	unsigned int bfq_wr_max_softrt_rate;
++	u64 RT_prod;
++	enum bfq_device_speed device_speed;
++
++	struct bfq_queue oom_bfqq;
++};
++
++enum bfqq_state_flags {
++	BFQ_BFQQ_FLAG_busy = 0,		/* has requests or is in service */
++	BFQ_BFQQ_FLAG_wait_request,	/* waiting for a request */
++	BFQ_BFQQ_FLAG_must_alloc,	/* must be allowed rq alloc */
++	BFQ_BFQQ_FLAG_fifo_expire,	/* FIFO checked in this slice */
++	BFQ_BFQQ_FLAG_idle_window,	/* slice idling enabled */
++	BFQ_BFQQ_FLAG_sync,		/* synchronous queue */
++	BFQ_BFQQ_FLAG_budget_new,	/* no completion with this budget */
++	BFQ_BFQQ_FLAG_IO_bound,         /*
++					 * bfqq has timed-out at least once
++					 * having consumed at most 2/10 of
++					 * its budget
++					 */
++	BFQ_BFQQ_FLAG_in_large_burst,	/*
++					 * bfqq activated in a large burst,
++					 * see comments to bfq_handle_burst.
++					 */
++	BFQ_BFQQ_FLAG_constantly_seeky,	/*
++					 * bfqq has proved to be slow and
++					 * seeky until budget timeout
++					 */
++	BFQ_BFQQ_FLAG_softrt_update,    /*
++					 * may need softrt-next-start
++					 * update
++					 */
++	BFQ_BFQQ_FLAG_coop,		/* bfqq is shared */
++	BFQ_BFQQ_FLAG_split_coop,	/* shared bfqq will be split */
++};
++
++#define BFQ_BFQQ_FNS(name)						\
++static inline void bfq_mark_bfqq_##name(struct bfq_queue *bfqq)		\
++{									\
++	(bfqq)->flags |= (1 << BFQ_BFQQ_FLAG_##name);			\
++}									\
++static inline void bfq_clear_bfqq_##name(struct bfq_queue *bfqq)	\
++{									\
++	(bfqq)->flags &= ~(1 << BFQ_BFQQ_FLAG_##name);			\
++}									\
++static inline int bfq_bfqq_##name(const struct bfq_queue *bfqq)		\
++{									\
++	return ((bfqq)->flags & (1 << BFQ_BFQQ_FLAG_##name)) != 0;	\
++}
++
++BFQ_BFQQ_FNS(busy);
++BFQ_BFQQ_FNS(wait_request);
++BFQ_BFQQ_FNS(must_alloc);
++BFQ_BFQQ_FNS(fifo_expire);
++BFQ_BFQQ_FNS(idle_window);
++BFQ_BFQQ_FNS(sync);
++BFQ_BFQQ_FNS(budget_new);
++BFQ_BFQQ_FNS(IO_bound);
++BFQ_BFQQ_FNS(in_large_burst);
++BFQ_BFQQ_FNS(constantly_seeky);
++BFQ_BFQQ_FNS(coop);
++BFQ_BFQQ_FNS(split_coop);
++BFQ_BFQQ_FNS(softrt_update);
++#undef BFQ_BFQQ_FNS
++
++/* Logging facilities. */
++#define bfq_log_bfqq(bfqd, bfqq, fmt, args...) \
++	blk_add_trace_msg((bfqd)->queue, "bfq%d " fmt, (bfqq)->pid, ##args)
++
++#define bfq_log(bfqd, fmt, args...) \
++	blk_add_trace_msg((bfqd)->queue, "bfq " fmt, ##args)
++
++/* Expiration reasons. */
++enum bfqq_expiration {
++	BFQ_BFQQ_TOO_IDLE = 0,		/*
++					 * queue has been idling for
++					 * too long
++					 */
++	BFQ_BFQQ_BUDGET_TIMEOUT,	/* budget took too long to be used */
++	BFQ_BFQQ_BUDGET_EXHAUSTED,	/* budget consumed */
++	BFQ_BFQQ_NO_MORE_REQUESTS,	/* the queue has no more requests */
++};
++
++#ifdef CONFIG_CGROUP_BFQIO
++/**
++ * struct bfq_group - per (device, cgroup) data structure.
++ * @entity: schedulable entity to insert into the parent group sched_data.
++ * @sched_data: own sched_data, to contain child entities (they may be
++ *              both bfq_queues and bfq_groups).
++ * @group_node: node to be inserted into the bfqio_cgroup->group_data
++ *              list of the containing cgroup's bfqio_cgroup.
++ * @bfqd_node: node to be inserted into the @bfqd->group_list list
++ *             of the groups active on the same device; used for cleanup.
++ * @bfqd: the bfq_data for the device this group acts upon.
++ * @async_bfqq: array of async queues for all the tasks belonging to
++ *              the group, one queue per ioprio value per ioprio_class,
++ *              except for the idle class that has only one queue.
++ * @async_idle_bfqq: async queue for the idle class (ioprio is ignored).
++ * @my_entity: pointer to @entity, %NULL for the toplevel group; used
++ *             to avoid too many special cases during group creation/
++ *             migration.
++ * @active_entities: number of active entities belonging to the group;
++ *                   unused for the root group. Used to know whether there
++ *                   are groups with more than one active @bfq_entity
++ *                   (see the comments to the function
++ *                   bfq_bfqq_must_not_expire()).
++ *
++ * Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
++ * there is a set of bfq_groups, each one collecting the lower-level
++ * entities belonging to the group that are acting on the same device.
++ *
++ * Locking works as follows:
++ *    o @group_node is protected by the bfqio_cgroup lock, and is accessed
++ *      via RCU from its readers.
++ *    o @bfqd is protected by the queue lock, RCU is used to access it
++ *      from the readers.
++ *    o All the other fields are protected by the @bfqd queue lock.
++ */
++struct bfq_group {
++	struct bfq_entity entity;
++	struct bfq_sched_data sched_data;
++
++	struct hlist_node group_node;
++	struct hlist_node bfqd_node;
++
++	void *bfqd;
++
++	struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
++	struct bfq_queue *async_idle_bfqq;
++
++	struct bfq_entity *my_entity;
++
++	int active_entities;
++};
++
++/**
++ * struct bfqio_cgroup - bfq cgroup data structure.
++ * @css: subsystem state for bfq in the containing cgroup.
++ * @online: flag marked when the subsystem is inserted.
++ * @weight: cgroup weight.
++ * @ioprio: cgroup ioprio.
++ * @ioprio_class: cgroup ioprio_class.
++ * @lock: spinlock that protects @ioprio, @ioprio_class and @group_data.
++ * @group_data: list containing the bfq_group belonging to this cgroup.
++ *
++ * @group_data is accessed using RCU, with @lock protecting the updates,
++ * @ioprio and @ioprio_class are protected by @lock.
++ */
++struct bfqio_cgroup {
++	struct cgroup_subsys_state css;
++	bool online;
++
++	unsigned short weight, ioprio, ioprio_class;
++
++	spinlock_t lock;
++	struct hlist_head group_data;
++};
++#else
++struct bfq_group {
++	struct bfq_sched_data sched_data;
++
++	struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
++	struct bfq_queue *async_idle_bfqq;
++};
++#endif
++
++static inline struct bfq_service_tree *
++bfq_entity_service_tree(struct bfq_entity *entity)
++{
++	struct bfq_sched_data *sched_data = entity->sched_data;
++	unsigned int idx = entity->ioprio_class - 1;
++
++	BUG_ON(idx >= BFQ_IOPRIO_CLASSES);
++	BUG_ON(sched_data == NULL);
++
++	return sched_data->service_tree + idx;
++}
++
++static inline struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic,
++					    bool is_sync)
++{
++	return bic->bfqq[is_sync];
++}
++
++static inline void bic_set_bfqq(struct bfq_io_cq *bic,
++				struct bfq_queue *bfqq, bool is_sync)
++{
++	bic->bfqq[is_sync] = bfqq;
++}
++
++static inline struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic)
++{
++	return bic->icq.q->elevator->elevator_data;
++}
++
++/**
++ * bfq_get_bfqd_locked - get a lock on a bfqd using an RCU-protected pointer.
++ * @ptr: a pointer to a bfqd.
++ * @flags: storage for the flags to be saved.
++ *
++ * This function allows bfqg->bfqd to be protected by the
++ * queue lock of the bfqd it references; the pointer is dereferenced
++ * under RCU, so the storage for bfqd is assured to be safe as long
++ * as the RCU read-side critical section does not end.  After the
++ * bfqd->queue->queue_lock is taken the pointer is rechecked, to be
++ * sure that no other writer accessed it.  If we raced with a writer,
++ * the function returns NULL, with the queue unlocked, otherwise it
++ * returns the dereferenced pointer, with the queue locked.
++ */
++static inline struct bfq_data *bfq_get_bfqd_locked(void **ptr,
++						   unsigned long *flags)
++{
++	struct bfq_data *bfqd;
++
++	rcu_read_lock();
++	bfqd = rcu_dereference(*(struct bfq_data **)ptr);
++
++	if (bfqd != NULL) {
++		spin_lock_irqsave(bfqd->queue->queue_lock, *flags);
++		if (*ptr == bfqd)
++			goto out;
++		spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
++	}
++
++	bfqd = NULL;
++out:
++	rcu_read_unlock();
++	return bfqd;
++}
++
++static inline void bfq_put_bfqd_unlock(struct bfq_data *bfqd,
++				       unsigned long *flags)
++{
++	spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
++}
++
++static void bfq_check_ioprio_change(struct bfq_io_cq *bic);
++static void bfq_put_queue(struct bfq_queue *bfqq);
++static void bfq_dispatch_insert(struct request_queue *q, struct request *rq);
++static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
++				       struct bfq_group *bfqg, int is_sync,
++				       struct bfq_io_cq *bic, gfp_t gfp_mask);
++static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
++				    struct bfq_group *bfqg);
++static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
++static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
++
++#endif /* _BFQ_H */
+-- 
+1.9.1
+

diff --git a/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r8-for-4.2.patch b/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r8-for-4.2.patch
new file mode 100644
index 0000000..547a098
--- /dev/null
+++ b/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r8-for-4.2.patch
@@ -0,0 +1,1220 @@
+From e7a71ea27442adefc78628dedca1477a1ac6994e Mon Sep 17 00:00:00 2001
+From: Mauro Andreolini <mauro.andreolini@unimore.it>
+Date: Fri, 5 Jun 2015 17:45:40 +0200
+Subject: [PATCH 3/3] block, bfq: add Early Queue Merge (EQM) to BFQ-v7r8 for
+ 4.2.0
+
+A set of processes may happen to perform interleaved reads, i.e., requests
+whose union would give rise to a sequential read pattern. There are two
+typical cases: in the first case, processes read fixed-size chunks of
+data at a fixed distance from each other, while in the second case processes
+may read variable-size chunks at variable distances. The latter case occurs
+for example with QEMU, which splits the I/O generated by the guest into
+multiple chunks, and lets these chunks be served by a pool of cooperating
+processes, iteratively assigning the next chunk of I/O to the first
+available process. CFQ uses actual queue merging for the first type of
+processes, whereas it uses preemption to get a sequential read pattern out
+of the read requests performed by the second type of processes. In the end
+it uses two different mechanisms to achieve the same goal: boosting the
+throughput with interleaved I/O.
+
+This patch introduces Early Queue Merge (EQM), a unified mechanism to get a
+sequential read pattern with both types of processes. The main idea is
+checking newly arrived requests against the next request of the active queue
+both in case of actual request insert and in case of request merge. By doing
+so, both the types of processes can be handled by just merging their queues.
+EQM is then simpler and more compact than the pair of mechanisms used in
+CFQ.
+
+Finally, EQM also preserves the typical low-latency properties of BFQ, by
+properly restoring the weight-raising state of a queue when it gets back to
+a non-merged state.
+
+Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
+Signed-off-by: Arianna Avanzini <avanzini@google.com>
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
+---
+ block/bfq-iosched.c | 750 +++++++++++++++++++++++++++++++++++++---------------
+ block/bfq-sched.c   |  28 --
+ block/bfq.h         |  54 +++-
+ 3 files changed, 580 insertions(+), 252 deletions(-)
+
+diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
+index 773b2ee..71b51c1 100644
+--- a/block/bfq-iosched.c
++++ b/block/bfq-iosched.c
+@@ -573,6 +573,57 @@ static inline unsigned int bfq_wr_duration(struct bfq_data *bfqd)
+ 	return dur;
+ }
+ 
++static inline unsigned
++bfq_bfqq_cooperations(struct bfq_queue *bfqq)
++{
++	return bfqq->bic ? bfqq->bic->cooperations : 0;
++}
++
++static inline void
++bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
++{
++	if (bic->saved_idle_window)
++		bfq_mark_bfqq_idle_window(bfqq);
++	else
++		bfq_clear_bfqq_idle_window(bfqq);
++	if (bic->saved_IO_bound)
++		bfq_mark_bfqq_IO_bound(bfqq);
++	else
++		bfq_clear_bfqq_IO_bound(bfqq);
++	/* Assuming that the flag in_large_burst is already correctly set */
++	if (bic->wr_time_left && bfqq->bfqd->low_latency &&
++	    !bfq_bfqq_in_large_burst(bfqq) &&
++	    bic->cooperations < bfqq->bfqd->bfq_coop_thresh) {
++		/*
++		 * Start a weight raising period with the duration given by
++		 * the wr_time_left snapshot.
++		 */
++		if (bfq_bfqq_busy(bfqq))
++			bfqq->bfqd->wr_busy_queues++;
++		bfqq->wr_coeff = bfqq->bfqd->bfq_wr_coeff;
++		bfqq->wr_cur_max_time = bic->wr_time_left;
++		bfqq->last_wr_start_finish = jiffies;
++		bfqq->entity.ioprio_changed = 1;
++	}
++	/*
++	 * Clear wr_time_left to prevent bfq_bfqq_save_state() from
++	 * getting confused about the queue's need for a weight-raising
++	 * period.
++	 */
++	bic->wr_time_left = 0;
++}
++
++/* Must be called with the queue_lock held. */
++static int bfqq_process_refs(struct bfq_queue *bfqq)
++{
++	int process_refs, io_refs;
++
++	io_refs = bfqq->allocated[READ] + bfqq->allocated[WRITE];
++	process_refs = atomic_read(&bfqq->ref) - io_refs - bfqq->entity.on_st;
++	BUG_ON(process_refs < 0);
++	return process_refs;
++}
++
+ /* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
+ static inline void bfq_reset_burst_list(struct bfq_data *bfqd,
+ 					struct bfq_queue *bfqq)
+@@ -817,7 +868,7 @@ static void bfq_add_request(struct request *rq)
+ 		bfq_rq_pos_tree_add(bfqd, bfqq);
+ 
+ 	if (!bfq_bfqq_busy(bfqq)) {
+-		bool soft_rt,
++		bool soft_rt, coop_or_in_burst,
+ 		     idle_for_long_time = time_is_before_jiffies(
+ 						bfqq->budget_timeout +
+ 						bfqd->bfq_wr_min_idle_time);
+@@ -841,11 +892,12 @@ static void bfq_add_request(struct request *rq)
+ 				bfqd->last_ins_in_burst = jiffies;
+ 		}
+ 
++		coop_or_in_burst = bfq_bfqq_in_large_burst(bfqq) ||
++			bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh;
+ 		soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
+-			!bfq_bfqq_in_large_burst(bfqq) &&
++			!coop_or_in_burst &&
+ 			time_is_before_jiffies(bfqq->soft_rt_next_start);
+-		interactive = !bfq_bfqq_in_large_burst(bfqq) &&
+-			      idle_for_long_time;
++		interactive = !coop_or_in_burst && idle_for_long_time;
+ 		entity->budget = max_t(unsigned long, bfqq->max_budget,
+ 				       bfq_serv_to_charge(next_rq, bfqq));
+ 
+@@ -864,11 +916,20 @@ static void bfq_add_request(struct request *rq)
+ 		if (!bfqd->low_latency)
+ 			goto add_bfqq_busy;
+ 
++		if (bfq_bfqq_just_split(bfqq))
++			goto set_ioprio_changed;
++
+ 		/*
+-		 * If the queue is not being boosted and has been idle
+-		 * for enough time, start a weight-raising period
++		 * If the queue:
++		 * - is not being boosted,
++		 * - has been idle for enough time,
++		 * - is not a sync queue or is linked to a bfq_io_cq (it is
++		 *   shared "by its nature" or it is not shared and its
++		 *   requests have not been redirected to a shared queue)
++		 * start a weight-raising period.
+ 		 */
+-		if (old_wr_coeff == 1 && (interactive || soft_rt)) {
++		if (old_wr_coeff == 1 && (interactive || soft_rt) &&
++		    (!bfq_bfqq_sync(bfqq) || bfqq->bic != NULL)) {
+ 			bfqq->wr_coeff = bfqd->bfq_wr_coeff;
+ 			if (interactive)
+ 				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
+@@ -882,7 +943,7 @@ static void bfq_add_request(struct request *rq)
+ 		} else if (old_wr_coeff > 1) {
+ 			if (interactive)
+ 				bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
+-			else if (bfq_bfqq_in_large_burst(bfqq) ||
++			else if (coop_or_in_burst ||
+ 				 (bfqq->wr_cur_max_time ==
+ 				  bfqd->bfq_wr_rt_max_time &&
+ 				  !soft_rt)) {
+@@ -901,18 +962,18 @@ static void bfq_add_request(struct request *rq)
+ 				/*
+ 				 *
+ 				 * The remaining weight-raising time is lower
+-				 * than bfqd->bfq_wr_rt_max_time, which
+-				 * means that the application is enjoying
+-				 * weight raising either because deemed soft-
+-				 * rt in the near past, or because deemed
+-				 * interactive a long ago. In both cases,
+-				 * resetting now the current remaining weight-
+-				 * raising time for the application to the
+-				 * weight-raising duration for soft rt
+-				 * applications would not cause any latency
+-				 * increase for the application (as the new
+-				 * duration would be higher than the remaining
+-				 * time).
++				 * than bfqd->bfq_wr_rt_max_time, which means
++				 * that the application is enjoying weight
++				 * raising either because it was deemed soft-rt
++				 * in the near past, or because it was deemed
++				 * interactive long ago.
++				 * In both cases, resetting now the current
++				 * remaining weight-raising time for the
++				 * application to the weight-raising duration
++				 * for soft rt applications would not cause any
++				 * latency increase for the application (as the
++				 * new duration would be higher than the
++				 * remaining time).
+ 				 *
+ 				 * In addition, the application is now meeting
+ 				 * the requirements for being deemed soft rt.
+@@ -947,6 +1008,7 @@ static void bfq_add_request(struct request *rq)
+ 					bfqd->bfq_wr_rt_max_time;
+ 			}
+ 		}
++set_ioprio_changed:
+ 		if (old_wr_coeff != bfqq->wr_coeff)
+ 			entity->ioprio_changed = 1;
+ add_bfqq_busy:
+@@ -1167,90 +1229,35 @@ static void bfq_end_wr(struct bfq_data *bfqd)
+ 	spin_unlock_irq(bfqd->queue->queue_lock);
+ }
+ 
+-static int bfq_allow_merge(struct request_queue *q, struct request *rq,
+-			   struct bio *bio)
++static inline sector_t bfq_io_struct_pos(void *io_struct, bool request)
+ {
+-	struct bfq_data *bfqd = q->elevator->elevator_data;
+-	struct bfq_io_cq *bic;
+-	struct bfq_queue *bfqq;
+-
+-	/*
+-	 * Disallow merge of a sync bio into an async request.
+-	 */
+-	if (bfq_bio_sync(bio) && !rq_is_sync(rq))
+-		return 0;
+-
+-	/*
+-	 * Lookup the bfqq that this bio will be queued with. Allow
+-	 * merge only if rq is queued there.
+-	 * Queue lock is held here.
+-	 */
+-	bic = bfq_bic_lookup(bfqd, current->io_context);
+-	if (bic == NULL)
+-		return 0;
+-
+-	bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
+-	return bfqq == RQ_BFQQ(rq);
+-}
+-
+-static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
+-				       struct bfq_queue *bfqq)
+-{
+-	if (bfqq != NULL) {
+-		bfq_mark_bfqq_must_alloc(bfqq);
+-		bfq_mark_bfqq_budget_new(bfqq);
+-		bfq_clear_bfqq_fifo_expire(bfqq);
+-
+-		bfqd->budgets_assigned = (bfqd->budgets_assigned*7 + 256) / 8;
+-
+-		bfq_log_bfqq(bfqd, bfqq,
+-			     "set_in_service_queue, cur-budget = %lu",
+-			     bfqq->entity.budget);
+-	}
+-
+-	bfqd->in_service_queue = bfqq;
+-}
+-
+-/*
+- * Get and set a new queue for service.
+- */
+-static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd,
+-						  struct bfq_queue *bfqq)
+-{
+-	if (!bfqq)
+-		bfqq = bfq_get_next_queue(bfqd);
++	if (request)
++		return blk_rq_pos(io_struct);
+ 	else
+-		bfq_get_next_queue_forced(bfqd, bfqq);
+-
+-	__bfq_set_in_service_queue(bfqd, bfqq);
+-	return bfqq;
++		return ((struct bio *)io_struct)->bi_iter.bi_sector;
+ }
+ 
+-static inline sector_t bfq_dist_from_last(struct bfq_data *bfqd,
+-					  struct request *rq)
++static inline sector_t bfq_dist_from(sector_t pos1,
++				     sector_t pos2)
+ {
+-	if (blk_rq_pos(rq) >= bfqd->last_position)
+-		return blk_rq_pos(rq) - bfqd->last_position;
++	if (pos1 >= pos2)
++		return pos1 - pos2;
+ 	else
+-		return bfqd->last_position - blk_rq_pos(rq);
++		return pos2 - pos1;
+ }
+ 
+-/*
+- * Return true if bfqq has no request pending and rq is close enough to
+- * bfqd->last_position, or if rq is closer to bfqd->last_position than
+- * bfqq->next_rq
+- */
+-static inline int bfq_rq_close(struct bfq_data *bfqd, struct request *rq)
++static inline int bfq_rq_close_to_sector(void *io_struct, bool request,
++					 sector_t sector)
+ {
+-	return bfq_dist_from_last(bfqd, rq) <= BFQQ_SEEK_THR;
++	return bfq_dist_from(bfq_io_struct_pos(io_struct, request), sector) <=
++	       BFQQ_SEEK_THR;
+ }
+ 
+-static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
++static struct bfq_queue *bfqq_close(struct bfq_data *bfqd, sector_t sector)
+ {
+ 	struct rb_root *root = &bfqd->rq_pos_tree;
+ 	struct rb_node *parent, *node;
+ 	struct bfq_queue *__bfqq;
+-	sector_t sector = bfqd->last_position;
+ 
+ 	if (RB_EMPTY_ROOT(root))
+ 		return NULL;
+@@ -1269,7 +1276,7 @@ static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
+ 	 * next_request position).
+ 	 */
+ 	__bfqq = rb_entry(parent, struct bfq_queue, pos_node);
+-	if (bfq_rq_close(bfqd, __bfqq->next_rq))
++	if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
+ 		return __bfqq;
+ 
+ 	if (blk_rq_pos(__bfqq->next_rq) < sector)
+@@ -1280,7 +1287,7 @@ static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
+ 		return NULL;
+ 
+ 	__bfqq = rb_entry(node, struct bfq_queue, pos_node);
+-	if (bfq_rq_close(bfqd, __bfqq->next_rq))
++	if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
+ 		return __bfqq;
+ 
+ 	return NULL;
+@@ -1289,14 +1296,12 @@ static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
+ /*
+  * bfqd - obvious
+  * cur_bfqq - passed in so that we don't decide that the current queue
+- *            is closely cooperating with itself.
+- *
+- * We are assuming that cur_bfqq has dispatched at least one request,
+- * and that bfqd->last_position reflects a position on the disk associated
+- * with the I/O issued by cur_bfqq.
++ *            is closely cooperating with itself
++ * sector - used as a reference point to search for a close queue
+  */
+ static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
+-					      struct bfq_queue *cur_bfqq)
++					      struct bfq_queue *cur_bfqq,
++					      sector_t sector)
+ {
+ 	struct bfq_queue *bfqq;
+ 
+@@ -1316,7 +1321,7 @@ static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
+ 	 * working closely on the same area of the disk. In that case,
+ 	 * we can group them together and don't waste time idling.
+ 	 */
+-	bfqq = bfqq_close(bfqd);
++	bfqq = bfqq_close(bfqd, sector);
+ 	if (bfqq == NULL || bfqq == cur_bfqq)
+ 		return NULL;
+ 
+@@ -1343,6 +1348,315 @@ static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
+ 	return bfqq;
+ }
+ 
++static struct bfq_queue *
++bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
++{
++	int process_refs, new_process_refs;
++	struct bfq_queue *__bfqq;
++
++	/*
++	 * If there are no process references on the new_bfqq, then it is
++	 * unsafe to follow the ->new_bfqq chain as other bfqq's in the chain
++	 * may have dropped their last reference (not just their last process
++	 * reference).
++	 */
++	if (!bfqq_process_refs(new_bfqq))
++		return NULL;
++
++	/* Avoid a circular list and skip interim queue merges. */
++	while ((__bfqq = new_bfqq->new_bfqq)) {
++		if (__bfqq == bfqq)
++			return NULL;
++		new_bfqq = __bfqq;
++	}
++
++	process_refs = bfqq_process_refs(bfqq);
++	new_process_refs = bfqq_process_refs(new_bfqq);
++	/*
++	 * If the process for the bfqq has gone away, there is no
++	 * sense in merging the queues.
++	 */
++	if (process_refs == 0 || new_process_refs == 0)
++		return NULL;
++
++	bfq_log_bfqq(bfqq->bfqd, bfqq, "scheduling merge with queue %d",
++		new_bfqq->pid);
++
++	/*
++	 * Merging is just a redirection: the requests of the process
++	 * owning one of the two queues are redirected to the other queue.
++	 * The latter queue, in its turn, is set as shared if this is the
++	 * first time that the requests of some process are redirected to
++	 * it.
++	 *
++	 * We redirect bfqq to new_bfqq and not the opposite, because we
++	 * are in the context of the process owning bfqq, hence we have
++	 * the io_cq of this process. So we can immediately configure this
++	 * io_cq to redirect the requests of the process to new_bfqq.
++	 *
++	 * NOTE, even if new_bfqq coincides with the in-service queue, the
++	 * io_cq of new_bfqq is not available, because, if the in-service
++	 * queue is shared, bfqd->in_service_bic may not point to the
++	 * io_cq of the in-service queue.
++	 * Redirecting the requests of the process owning bfqq to the
++	 * currently in-service queue is in any case the best option, as
++	 * we feed the in-service queue with new requests close to the
++	 * last request served and, by doing so, hopefully increase the
++	 * throughput.
++	 */
++	bfqq->new_bfqq = new_bfqq;
++	atomic_add(process_refs, &new_bfqq->ref);
++	return new_bfqq;
++}
++
++/*
++ * Attempt to schedule a merge of bfqq with the currently in-service queue
++ * or with a close queue among the scheduled queues.
++ * Return NULL if no merge was scheduled, a pointer to the shared bfq_queue
++ * structure otherwise.
++ *
++ * The OOM queue is not allowed to participate in cooperation: in fact, since
++ * the requests temporarily redirected to the OOM queue could be redirected
++ * again to dedicated queues at any time, the state needed to correctly
++ * handle merging with the OOM queue would be quite complex and expensive
++ * to maintain. Besides, in a condition as critical as out of memory, the
++ * benefits of queue merging may be of little relevance, or even negligible.
++ */
++static struct bfq_queue *
++bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++		     void *io_struct, bool request)
++{
++	struct bfq_queue *in_service_bfqq, *new_bfqq;
++
++	if (bfqq->new_bfqq)
++		return bfqq->new_bfqq;
++
++	if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq))
++		return NULL;
++
++	in_service_bfqq = bfqd->in_service_queue;
++
++	if (in_service_bfqq == NULL || in_service_bfqq == bfqq ||
++	    !bfqd->in_service_bic ||
++	    unlikely(in_service_bfqq == &bfqd->oom_bfqq))
++		goto check_scheduled;
++
++	if (bfq_class_idle(in_service_bfqq) || bfq_class_idle(bfqq))
++		goto check_scheduled;
++
++	if (bfq_class_rt(in_service_bfqq) != bfq_class_rt(bfqq))
++		goto check_scheduled;
++
++	if (in_service_bfqq->entity.parent != bfqq->entity.parent)
++		goto check_scheduled;
++
++	if (bfq_rq_close_to_sector(io_struct, request, bfqd->last_position) &&
++	    bfq_bfqq_sync(in_service_bfqq) && bfq_bfqq_sync(bfqq)) {
++		new_bfqq = bfq_setup_merge(bfqq, in_service_bfqq);
++		if (new_bfqq != NULL)
++			return new_bfqq; /* Merge with in-service queue */
++	}
++
++	/*
++	 * Check whether there is a cooperator among currently scheduled
++	 * queues. The only thing we need is that the bio/request is not
++	 * NULL, as we need it to establish whether a cooperator exists.
++	 */
++check_scheduled:
++	new_bfqq = bfq_close_cooperator(bfqd, bfqq,
++					bfq_io_struct_pos(io_struct, request));
++	if (new_bfqq && likely(new_bfqq != &bfqd->oom_bfqq))
++		return bfq_setup_merge(bfqq, new_bfqq);
++
++	return NULL;
++}
++
++static inline void
++bfq_bfqq_save_state(struct bfq_queue *bfqq)
++{
++	/*
++	 * If bfqq->bic == NULL, the queue is already shared or its requests
++	 * have already been redirected to a shared queue; both idle window
++	 * and weight raising state have already been saved. Do nothing.
++	 */
++	if (bfqq->bic == NULL)
++		return;
++	if (bfqq->bic->wr_time_left)
++		/*
++		 * This is the queue of a just-started process, and would
++		 * deserve weight raising: we set wr_time_left to the full
++		 * weight-raising duration to trigger weight-raising when
++		 * and if the queue is split and the first request of the
++		 * queue is enqueued.
++		 */
++		bfqq->bic->wr_time_left = bfq_wr_duration(bfqq->bfqd);
++	else if (bfqq->wr_coeff > 1) {
++		unsigned long wr_duration =
++			jiffies - bfqq->last_wr_start_finish;
++		/*
++		 * It may happen that a queue's weight raising period lasts
++		 * longer than its wr_cur_max_time, as weight raising is
++		 * handled only when a request is enqueued or dispatched (it
++		 * does not use any timer). If the weight raising period is
++		 * about to end, don't save it.
++		 */
++		if (bfqq->wr_cur_max_time <= wr_duration)
++			bfqq->bic->wr_time_left = 0;
++		else
++			bfqq->bic->wr_time_left =
++				bfqq->wr_cur_max_time - wr_duration;
++		/*
++		 * The bfq_queue is becoming shared or the requests of the
++		 * process owning the queue are being redirected to a shared
++		 * queue. Stop the weight raising period of the queue, as in
++		 * both cases it should not be owned by an interactive or
++		 * soft real-time application.
++		 */
++		bfq_bfqq_end_wr(bfqq);
++	} else
++		bfqq->bic->wr_time_left = 0;
++	bfqq->bic->saved_idle_window = bfq_bfqq_idle_window(bfqq);
++	bfqq->bic->saved_IO_bound = bfq_bfqq_IO_bound(bfqq);
++	bfqq->bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
++	bfqq->bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node);
++	bfqq->bic->cooperations++;
++	bfqq->bic->failed_cooperations = 0;
++}
++
++static inline void
++bfq_get_bic_reference(struct bfq_queue *bfqq)
++{
++	/*
++	 * If bfqq->bic has a non-NULL value, the bic to which it belongs
++	 * is about to begin using a shared bfq_queue.
++	 */
++	if (bfqq->bic)
++		atomic_long_inc(&bfqq->bic->icq.ioc->refcount);
++}
++
++static void
++bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
++		struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
++{
++	bfq_log_bfqq(bfqd, bfqq, "merging with queue %lu",
++		(long unsigned)new_bfqq->pid);
++	/* Save weight raising and idle window of the merged queues */
++	bfq_bfqq_save_state(bfqq);
++	bfq_bfqq_save_state(new_bfqq);
++	if (bfq_bfqq_IO_bound(bfqq))
++		bfq_mark_bfqq_IO_bound(new_bfqq);
++	bfq_clear_bfqq_IO_bound(bfqq);
++	/*
++	 * Grab a reference to the bic, to prevent it from being destroyed
++	 * before being possibly touched by a bfq_split_bfqq().
++	 */
++	bfq_get_bic_reference(bfqq);
++	bfq_get_bic_reference(new_bfqq);
++	/*
++	 * Merge queues (that is, let bic redirect its requests to new_bfqq)
++	 */
++	bic_set_bfqq(bic, new_bfqq, 1);
++	bfq_mark_bfqq_coop(new_bfqq);
++	/*
++	 * new_bfqq now belongs to at least two bics (it is a shared queue):
++	 * set new_bfqq->bic to NULL. bfqq either:
++	 * - does not belong to any bic any more, and hence bfqq->bic must
++	 *   be set to NULL, or
++	 * - is a queue whose owning bics have already been redirected to a
++	 *   different queue, hence the queue is destined to not belong to
++	 *   any bic soon and bfqq->bic is already NULL (therefore the next
++	 *   assignment causes no harm).
++	 */
++	new_bfqq->bic = NULL;
++	bfqq->bic = NULL;
++	bfq_put_queue(bfqq);
++}
++
++static inline void bfq_bfqq_increase_failed_cooperations(struct bfq_queue *bfqq)
++{
++	struct bfq_io_cq *bic = bfqq->bic;
++	struct bfq_data *bfqd = bfqq->bfqd;
++
++	if (bic && bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh) {
++		bic->failed_cooperations++;
++		if (bic->failed_cooperations >= bfqd->bfq_failed_cooperations)
++			bic->cooperations = 0;
++	}
++}
++
++static int bfq_allow_merge(struct request_queue *q, struct request *rq,
++			   struct bio *bio)
++{
++	struct bfq_data *bfqd = q->elevator->elevator_data;
++	struct bfq_io_cq *bic;
++	struct bfq_queue *bfqq, *new_bfqq;
++
++	/*
++	 * Disallow merge of a sync bio into an async request.
++	 */
++	if (bfq_bio_sync(bio) && !rq_is_sync(rq))
++		return 0;
++
++	/*
++	 * Lookup the bfqq that this bio will be queued with. Allow
++	 * merge only if rq is queued there.
++	 * Queue lock is held here.
++	 */
++	bic = bfq_bic_lookup(bfqd, current->io_context);
++	if (bic == NULL)
++		return 0;
++
++	bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
++	/*
++	 * We take advantage of this function to perform an early merge
++	 * of the queues of possible cooperating processes.
++	 */
++	if (bfqq != NULL) {
++		new_bfqq = bfq_setup_cooperator(bfqd, bfqq, bio, false);
++		if (new_bfqq != NULL) {
++			bfq_merge_bfqqs(bfqd, bic, bfqq, new_bfqq);
++			/*
++			 * If we get here, the bio will be queued in the
++			 * shared queue, i.e., new_bfqq, so use new_bfqq
++			 * to decide whether bio and rq can be merged.
++			 */
++			bfqq = new_bfqq;
++		} else
++			bfq_bfqq_increase_failed_cooperations(bfqq);
++	}
++
++	return bfqq == RQ_BFQQ(rq);
++}
++
++static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
++				       struct bfq_queue *bfqq)
++{
++	if (bfqq != NULL) {
++		bfq_mark_bfqq_must_alloc(bfqq);
++		bfq_mark_bfqq_budget_new(bfqq);
++		bfq_clear_bfqq_fifo_expire(bfqq);
++
++		bfqd->budgets_assigned = (bfqd->budgets_assigned*7 + 256) / 8;
++
++		bfq_log_bfqq(bfqd, bfqq,
++			     "set_in_service_queue, cur-budget = %lu",
++			     bfqq->entity.budget);
++	}
++
++	bfqd->in_service_queue = bfqq;
++}
++
++/*
++ * Get and set a new queue for service.
++ */
++static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd)
++{
++	struct bfq_queue *bfqq = bfq_get_next_queue(bfqd);
++
++	__bfq_set_in_service_queue(bfqd, bfqq);
++	return bfqq;
++}
++
+ /*
+  * If enough samples have been computed, return the current max budget
+  * stored in bfqd, which is dynamically updated according to the
+@@ -1488,61 +1802,6 @@ static struct request *bfq_check_fifo(struct bfq_queue *bfqq)
+ 	return rq;
+ }
+ 
+-/* Must be called with the queue_lock held. */
+-static int bfqq_process_refs(struct bfq_queue *bfqq)
+-{
+-	int process_refs, io_refs;
+-
+-	io_refs = bfqq->allocated[READ] + bfqq->allocated[WRITE];
+-	process_refs = atomic_read(&bfqq->ref) - io_refs - bfqq->entity.on_st;
+-	BUG_ON(process_refs < 0);
+-	return process_refs;
+-}
+-
+-static void bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
+-{
+-	int process_refs, new_process_refs;
+-	struct bfq_queue *__bfqq;
+-
+-	/*
+-	 * If there are no process references on the new_bfqq, then it is
+-	 * unsafe to follow the ->new_bfqq chain as other bfqq's in the chain
+-	 * may have dropped their last reference (not just their last process
+-	 * reference).
+-	 */
+-	if (!bfqq_process_refs(new_bfqq))
+-		return;
+-
+-	/* Avoid a circular list and skip interim queue merges. */
+-	while ((__bfqq = new_bfqq->new_bfqq)) {
+-		if (__bfqq == bfqq)
+-			return;
+-		new_bfqq = __bfqq;
+-	}
+-
+-	process_refs = bfqq_process_refs(bfqq);
+-	new_process_refs = bfqq_process_refs(new_bfqq);
+-	/*
+-	 * If the process for the bfqq has gone away, there is no
+-	 * sense in merging the queues.
+-	 */
+-	if (process_refs == 0 || new_process_refs == 0)
+-		return;
+-
+-	/*
+-	 * Merge in the direction of the lesser amount of work.
+-	 */
+-	if (new_process_refs >= process_refs) {
+-		bfqq->new_bfqq = new_bfqq;
+-		atomic_add(process_refs, &new_bfqq->ref);
+-	} else {
+-		new_bfqq->new_bfqq = bfqq;
+-		atomic_add(new_process_refs, &bfqq->ref);
+-	}
+-	bfq_log_bfqq(bfqq->bfqd, bfqq, "scheduling merge with queue %d",
+-		new_bfqq->pid);
+-}
+-
+ static inline unsigned long bfq_bfqq_budget_left(struct bfq_queue *bfqq)
+ {
+ 	struct bfq_entity *entity = &bfqq->entity;
+@@ -2269,7 +2528,7 @@ static inline bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
+  */
+ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
+ {
+-	struct bfq_queue *bfqq, *new_bfqq = NULL;
++	struct bfq_queue *bfqq;
+ 	struct request *next_rq;
+ 	enum bfqq_expiration reason = BFQ_BFQQ_BUDGET_TIMEOUT;
+ 
+@@ -2279,17 +2538,6 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
+ 
+ 	bfq_log_bfqq(bfqd, bfqq, "select_queue: already in-service queue");
+ 
+-	/*
+-         * If another queue has a request waiting within our mean seek
+-         * distance, let it run. The expire code will check for close
+-         * cooperators and put the close queue at the front of the
+-         * service tree. If possible, merge the expiring queue with the
+-         * new bfqq.
+-         */
+-        new_bfqq = bfq_close_cooperator(bfqd, bfqq);
+-        if (new_bfqq != NULL && bfqq->new_bfqq == NULL)
+-                bfq_setup_merge(bfqq, new_bfqq);
+-
+ 	if (bfq_may_expire_for_budg_timeout(bfqq) &&
+ 	    !timer_pending(&bfqd->idle_slice_timer) &&
+ 	    !bfq_bfqq_must_idle(bfqq))
+@@ -2328,10 +2576,7 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
+ 				bfq_clear_bfqq_wait_request(bfqq);
+ 				del_timer(&bfqd->idle_slice_timer);
+ 			}
+-			if (new_bfqq == NULL)
+-				goto keep_queue;
+-			else
+-				goto expire;
++			goto keep_queue;
+ 		}
+ 	}
+ 
+@@ -2340,40 +2585,30 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
+ 	 * for a new request, or has requests waiting for a completion and
+ 	 * may idle after their completion, then keep it anyway.
+ 	 */
+-	if (new_bfqq == NULL && (timer_pending(&bfqd->idle_slice_timer) ||
+-	    (bfqq->dispatched != 0 && bfq_bfqq_must_not_expire(bfqq)))) {
++	if (timer_pending(&bfqd->idle_slice_timer) ||
++	    (bfqq->dispatched != 0 && bfq_bfqq_must_not_expire(bfqq))) {
+ 		bfqq = NULL;
+ 		goto keep_queue;
+-	} else if (new_bfqq != NULL && timer_pending(&bfqd->idle_slice_timer)) {
+-		/*
+-		 * Expiring the queue because there is a close cooperator,
+-		 * cancel timer.
+-		 */
+-		bfq_clear_bfqq_wait_request(bfqq);
+-		del_timer(&bfqd->idle_slice_timer);
+ 	}
+ 
+ 	reason = BFQ_BFQQ_NO_MORE_REQUESTS;
+ expire:
+ 	bfq_bfqq_expire(bfqd, bfqq, 0, reason);
+ new_queue:
+-	bfqq = bfq_set_in_service_queue(bfqd, new_bfqq);
++	bfqq = bfq_set_in_service_queue(bfqd);
+ 	bfq_log(bfqd, "select_queue: new queue %d returned",
+ 		bfqq != NULL ? bfqq->pid : 0);
+ keep_queue:
+ 	return bfqq;
+ }
+ 
+-static void bfq_update_wr_data(struct bfq_data *bfqd,
+-			       struct bfq_queue *bfqq)
++static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ {
+-	if (bfqq->wr_coeff > 1) { /* queue is being boosted */
+-		struct bfq_entity *entity = &bfqq->entity;
+-
++	struct bfq_entity *entity = &bfqq->entity;
++	if (bfqq->wr_coeff > 1) { /* queue is being weight-raised */
+ 		bfq_log_bfqq(bfqd, bfqq,
+ 			"raising period dur %u/%u msec, old coeff %u, w %d(%d)",
+-			jiffies_to_msecs(jiffies -
+-				bfqq->last_wr_start_finish),
++			jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
+ 			jiffies_to_msecs(bfqq->wr_cur_max_time),
+ 			bfqq->wr_coeff,
+ 			bfqq->entity.weight, bfqq->entity.orig_weight);
+@@ -2382,12 +2617,16 @@ static void bfq_update_wr_data(struct bfq_data *bfqd,
+ 		       entity->orig_weight * bfqq->wr_coeff);
+ 		if (entity->ioprio_changed)
+ 			bfq_log_bfqq(bfqd, bfqq, "WARN: pending prio change");
++
+ 		/*
+ 		 * If the queue was activated in a burst, or
+ 		 * too much time has elapsed from the beginning
+-		 * of this weight-raising, then end weight raising.
++		 * of this weight-raising period, or the queue has
++		 * exceeded the acceptable number of cooperations,
++		 * then end weight raising.
+ 		 */
+ 		if (bfq_bfqq_in_large_burst(bfqq) ||
++		    bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh ||
+ 		    time_is_before_jiffies(bfqq->last_wr_start_finish +
+ 					   bfqq->wr_cur_max_time)) {
+ 			bfqq->last_wr_start_finish = jiffies;
+@@ -2396,11 +2635,13 @@ static void bfq_update_wr_data(struct bfq_data *bfqd,
+ 				     bfqq->last_wr_start_finish,
+ 				     jiffies_to_msecs(bfqq->wr_cur_max_time));
+ 			bfq_bfqq_end_wr(bfqq);
+-			__bfq_entity_update_weight_prio(
+-				bfq_entity_service_tree(entity),
+-				entity);
+ 		}
+ 	}
++	/* Update weight both if it must be raised and if it must be lowered */
++	if ((entity->weight > entity->orig_weight) != (bfqq->wr_coeff > 1))
++		__bfq_entity_update_weight_prio(
++			bfq_entity_service_tree(entity),
++			entity);
+ }
+ 
+ /*
+@@ -2647,6 +2888,25 @@ static inline void bfq_init_icq(struct io_cq *icq)
+ 	struct bfq_io_cq *bic = icq_to_bic(icq);
+ 
+ 	bic->ttime.last_end_request = jiffies;
++	/*
++	 * A newly created bic indicates that the process has just
++	 * started doing I/O, and is probably mapping into memory its
++	 * executable and libraries: it definitely needs weight raising.
++	 * There is however the possibility that the process performs,
++	 * for a while, I/O close to some other process. EQM intercepts
++	 * this behavior and may merge the queue corresponding to the
++	 * process with some other queue, BEFORE the weight of the queue
++	 * is raised. Merged queues are not weight-raised (they are assumed
++	 * to belong to processes that benefit only from high throughput).
++	 * If the merge is basically the consequence of an accident, then
++	 * the queue will be split soon and will get back its old weight.
++	 * It is then important to write down somewhere that this queue
++	 * does need weight raising, even if it did not make it to get its
++	 * weight raised before being merged. To this purpose, we overload
++	 * the field wr_time_left and assign 1 to it, to mark the queue
++	 * as needing weight raising.
++	 */
++	bic->wr_time_left = 1;
+ }
+ 
+ static void bfq_exit_icq(struct io_cq *icq)
+@@ -2660,6 +2920,13 @@ static void bfq_exit_icq(struct io_cq *icq)
+ 	}
+ 
+ 	if (bic->bfqq[BLK_RW_SYNC]) {
++		/*
++		 * If the bic is using a shared queue, put the reference
++		 * taken on the io_context when the bic started using a
++		 * shared bfq_queue.
++		 */
++		if (bfq_bfqq_coop(bic->bfqq[BLK_RW_SYNC]))
++			put_io_context(icq->ioc);
+ 		bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
+ 		bic->bfqq[BLK_RW_SYNC] = NULL;
+ 	}
+@@ -2952,6 +3219,10 @@ static void bfq_update_idle_window(struct bfq_data *bfqd,
+ 	if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
+ 		return;
+ 
++	/* Idle window just restored, statistics are meaningless. */
++	if (bfq_bfqq_just_split(bfqq))
++		return;
++
+ 	enable_idle = bfq_bfqq_idle_window(bfqq);
+ 
+ 	if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
+@@ -2999,6 +3270,7 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ 	if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
+ 	    !BFQQ_SEEKY(bfqq))
+ 		bfq_update_idle_window(bfqd, bfqq, bic);
++	bfq_clear_bfqq_just_split(bfqq);
+ 
+ 	bfq_log_bfqq(bfqd, bfqq,
+ 		     "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
+@@ -3059,12 +3331,47 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ static void bfq_insert_request(struct request_queue *q, struct request *rq)
+ {
+ 	struct bfq_data *bfqd = q->elevator->elevator_data;
+-	struct bfq_queue *bfqq = RQ_BFQQ(rq);
++	struct bfq_queue *bfqq = RQ_BFQQ(rq), *new_bfqq;
+ 
+ 	assert_spin_locked(bfqd->queue->queue_lock);
+ 
++	/*
++	 * An unplug may trigger a requeue of a request from the device
++	 * driver: make sure we are in process context while trying to
++	 * merge two bfq_queues.
++	 */
++	if (!in_interrupt()) {
++		new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true);
++		if (new_bfqq != NULL) {
++			if (bic_to_bfqq(RQ_BIC(rq), 1) != bfqq)
++				new_bfqq = bic_to_bfqq(RQ_BIC(rq), 1);
++			/*
++			 * Release the request's reference to the old bfqq
++			 * and make sure one is taken to the shared queue.
++			 */
++			new_bfqq->allocated[rq_data_dir(rq)]++;
++			bfqq->allocated[rq_data_dir(rq)]--;
++			atomic_inc(&new_bfqq->ref);
++			bfq_put_queue(bfqq);
++			if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq)
++				bfq_merge_bfqqs(bfqd, RQ_BIC(rq),
++						bfqq, new_bfqq);
++			rq->elv.priv[1] = new_bfqq;
++			bfqq = new_bfqq;
++		} else
++			bfq_bfqq_increase_failed_cooperations(bfqq);
++	}
++
+ 	bfq_add_request(rq);
+ 
++	/*
++	 * Here a newly-created bfq_queue has already started a weight-raising
++	 * period: clear wr_time_left to prevent bfq_bfqq_save_state()
++	 * from assigning it a full weight-raising period. See the detailed
++	 * comments about this field in bfq_init_icq().
++	 */
++	if (bfqq->bic != NULL)
++		bfqq->bic->wr_time_left = 0;
+ 	rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
+ 	list_add_tail(&rq->queuelist, &bfqq->fifo);
+ 
+@@ -3226,18 +3533,6 @@ static void bfq_put_request(struct request *rq)
+ 	}
+ }
+ 
+-static struct bfq_queue *
+-bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
+-		struct bfq_queue *bfqq)
+-{
+-	bfq_log_bfqq(bfqd, bfqq, "merging with queue %lu",
+-		(long unsigned)bfqq->new_bfqq->pid);
+-	bic_set_bfqq(bic, bfqq->new_bfqq, 1);
+-	bfq_mark_bfqq_coop(bfqq->new_bfqq);
+-	bfq_put_queue(bfqq);
+-	return bic_to_bfqq(bic, 1);
+-}
+-
+ /*
+  * Returns NULL if a new bfqq should be allocated, or the old bfqq if this
+  * was the last process referring to said bfqq.
+@@ -3246,6 +3541,9 @@ static struct bfq_queue *
+ bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
+ {
+ 	bfq_log_bfqq(bfqq->bfqd, bfqq, "splitting queue");
++
++	put_io_context(bic->icq.ioc);
++
+ 	if (bfqq_process_refs(bfqq) == 1) {
+ 		bfqq->pid = current->pid;
+ 		bfq_clear_bfqq_coop(bfqq);
+@@ -3274,6 +3572,7 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
+ 	struct bfq_queue *bfqq;
+ 	struct bfq_group *bfqg;
+ 	unsigned long flags;
++	bool split = false;
+ 
+ 	might_sleep_if(gfp_mask & __GFP_WAIT);
+ 
+@@ -3291,25 +3590,26 @@ new_queue:
+ 	if (bfqq == NULL || bfqq == &bfqd->oom_bfqq) {
+ 		bfqq = bfq_get_queue(bfqd, bfqg, is_sync, bic, gfp_mask);
+ 		bic_set_bfqq(bic, bfqq, is_sync);
++		if (split && is_sync) {
++			if ((bic->was_in_burst_list && bfqd->large_burst) ||
++			    bic->saved_in_large_burst)
++				bfq_mark_bfqq_in_large_burst(bfqq);
++			else {
++			    bfq_clear_bfqq_in_large_burst(bfqq);
++			    if (bic->was_in_burst_list)
++			       hlist_add_head(&bfqq->burst_list_node,
++				              &bfqd->burst_list);
++			}
++		}
+ 	} else {
+-		/*
+-		 * If the queue was seeky for too long, break it apart.
+-		 */
++		/* If the queue was seeky for too long, break it apart. */
+ 		if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq)) {
+ 			bfq_log_bfqq(bfqd, bfqq, "breaking apart bfqq");
+ 			bfqq = bfq_split_bfqq(bic, bfqq);
++			split = true;
+ 			if (!bfqq)
+ 				goto new_queue;
+ 		}
+-
+-		/*
+-		 * Check to see if this queue is scheduled to merge with
+-		 * another closely cooperating queue. The merging of queues
+-		 * happens here as it must be done in process context.
+-		 * The reference on new_bfqq was taken in merge_bfqqs.
+-		 */
+-		if (bfqq->new_bfqq != NULL)
+-			bfqq = bfq_merge_bfqqs(bfqd, bic, bfqq);
+ 	}
+ 
+ 	bfqq->allocated[rw]++;
+@@ -3320,6 +3620,26 @@ new_queue:
+ 	rq->elv.priv[0] = bic;
+ 	rq->elv.priv[1] = bfqq;
+ 
++	/*
++	 * If a bfq_queue has only one process reference, it is owned
++	 * by only one bfq_io_cq: we can set the bic field of the
++	 * bfq_queue to the address of that structure. Also, if the
++	 * queue has just been split, mark a flag so that the
++	 * information is available to the other scheduler hooks.
++	 */
++	if (likely(bfqq != &bfqd->oom_bfqq) && bfqq_process_refs(bfqq) == 1) {
++		bfqq->bic = bic;
++		if (split) {
++			bfq_mark_bfqq_just_split(bfqq);
++			/*
++			 * If the queue has just been split from a shared
++			 * queue, restore the idle window and the possible
++			 * weight raising period.
++			 */
++			bfq_bfqq_resume_state(bfqq, bic);
++		}
++	}
++
+ 	spin_unlock_irqrestore(q->queue_lock, flags);
+ 
+ 	return 0;
+diff --git a/block/bfq-sched.c b/block/bfq-sched.c
+index c343099..d0890c6 100644
+--- a/block/bfq-sched.c
++++ b/block/bfq-sched.c
+@@ -1085,34 +1085,6 @@ static struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
+ 	return bfqq;
+ }
+ 
+-/*
+- * Forced extraction of the given queue.
+- */
+-static void bfq_get_next_queue_forced(struct bfq_data *bfqd,
+-				      struct bfq_queue *bfqq)
+-{
+-	struct bfq_entity *entity;
+-	struct bfq_sched_data *sd;
+-
+-	BUG_ON(bfqd->in_service_queue != NULL);
+-
+-	entity = &bfqq->entity;
+-	/*
+-	 * Bubble up extraction/update from the leaf to the root.
+-	*/
+-	for_each_entity(entity) {
+-		sd = entity->sched_data;
+-		bfq_update_budget(entity);
+-		bfq_update_vtime(bfq_entity_service_tree(entity));
+-		bfq_active_extract(bfq_entity_service_tree(entity), entity);
+-		sd->in_service_entity = entity;
+-		sd->next_in_service = NULL;
+-		entity->service = 0;
+-	}
+-
+-	return;
+-}
+-
+ static void __bfq_bfqd_reset_in_service(struct bfq_data *bfqd)
+ {
+ 	if (bfqd->in_service_bic != NULL) {
+diff --git a/block/bfq.h b/block/bfq.h
+index e350b5f..93d3f6e 100644
+--- a/block/bfq.h
++++ b/block/bfq.h
+@@ -218,18 +218,21 @@ struct bfq_group;
+  *                      idle @bfq_queue with no outstanding requests, then
+  *                      the task associated with the queue it is deemed as
+  *                      soft real-time (see the comments to the function
+- *                      bfq_bfqq_softrt_next_start()).
++ *                      bfq_bfqq_softrt_next_start())
+  * @last_idle_bklogged: time of the last transition of the @bfq_queue from
+  *                      idle to backlogged
+  * @service_from_backlogged: cumulative service received from the @bfq_queue
+  *                           since the last transition from idle to
+  *                           backlogged
++ * @bic: pointer to the bfq_io_cq owning the bfq_queue, set to %NULL if the
++ *	 queue is shared
+  *
+- * A bfq_queue is a leaf request queue; it can be associated with an io_context
+- * or more, if it is async or shared between cooperating processes. @cgroup
+- * holds a reference to the cgroup, to be sure that it does not disappear while
+- * a bfqq still references it (mostly to avoid races between request issuing and
+- * task migration followed by cgroup destruction).
++ * A bfq_queue is a leaf request queue; it can be associated with an
++ * io_context or more, if it is async or shared between cooperating
++ * processes. @cgroup holds a reference to the cgroup, to be sure that it
++ * does not disappear while a bfqq still references it (mostly to avoid
++ * races between request issuing and task migration followed by cgroup
++ * destruction).
+  * All the fields are protected by the queue lock of the containing bfqd.
+  */
+ struct bfq_queue {
+@@ -269,6 +272,7 @@ struct bfq_queue {
+ 	unsigned int requests_within_timer;
+ 
+ 	pid_t pid;
++	struct bfq_io_cq *bic;
+ 
+ 	/* weight-raising fields */
+ 	unsigned long wr_cur_max_time;
+@@ -298,12 +302,42 @@ struct bfq_ttime {
+  * @icq: associated io_cq structure
+  * @bfqq: array of two process queues, the sync and the async
+  * @ttime: associated @bfq_ttime struct
++ * @wr_time_left: snapshot of the time left before weight raising ends
++ *                for the sync queue associated to this process; this
++ *		  snapshot is taken to remember this value while the weight
++ *		  raising is suspended because the queue is merged with a
++ *		  shared queue, and is used to set @wr_cur_max_time
++ *		  when the queue is split from the shared queue and its
++ *		  weight is raised again
++ * @saved_idle_window: same purpose as the previous field for the idle
++ *                     window
++ * @saved_IO_bound: same purpose as the previous two fields for the I/O
++ *                  bound classification of a queue
++ * @saved_in_large_burst: same purpose as the previous fields for the
++ *                        value of the field keeping the queue's belonging
++ *                        to a large burst
++ * @was_in_burst_list: true if the queue belonged to a burst list
++ *                     before its merge with another cooperating queue
++ * @cooperations: counter of consecutive successful queue merges undergone
++ *                by any of the process' @bfq_queues
++ * @failed_cooperations: counter of consecutive failed queue merges of any
++ *                       of the process' @bfq_queues
+  */
+ struct bfq_io_cq {
+ 	struct io_cq icq; /* must be the first member */
+ 	struct bfq_queue *bfqq[2];
+ 	struct bfq_ttime ttime;
+ 	int ioprio;
++
++	unsigned int wr_time_left;
++	bool saved_idle_window;
++	bool saved_IO_bound;
++
++	bool saved_in_large_burst;
++	bool was_in_burst_list;
++
++	unsigned int cooperations;
++	unsigned int failed_cooperations;
+ };
+ 
+ enum bfq_device_speed {
+@@ -536,7 +570,7 @@ enum bfqq_state_flags {
+ 	BFQ_BFQQ_FLAG_idle_window,	/* slice idling enabled */
+ 	BFQ_BFQQ_FLAG_sync,		/* synchronous queue */
+ 	BFQ_BFQQ_FLAG_budget_new,	/* no completion with this budget */
+-	BFQ_BFQQ_FLAG_IO_bound,         /*
++	BFQ_BFQQ_FLAG_IO_bound,		/*
+ 					 * bfqq has timed-out at least once
+ 					 * having consumed at most 2/10 of
+ 					 * its budget
+@@ -549,12 +583,13 @@ enum bfqq_state_flags {
+ 					 * bfqq has proved to be slow and
+ 					 * seeky until budget timeout
+ 					 */
+-	BFQ_BFQQ_FLAG_softrt_update,    /*
++	BFQ_BFQQ_FLAG_softrt_update,	/*
+ 					 * may need softrt-next-start
+ 					 * update
+ 					 */
+ 	BFQ_BFQQ_FLAG_coop,		/* bfqq is shared */
+-	BFQ_BFQQ_FLAG_split_coop,	/* shared bfqq will be splitted */
++	BFQ_BFQQ_FLAG_split_coop,	/* shared bfqq will be split */
++	BFQ_BFQQ_FLAG_just_split,	/* queue has just been split */
+ };
+ 
+ #define BFQ_BFQQ_FNS(name)						\
+@@ -583,6 +618,7 @@ BFQ_BFQQ_FNS(in_large_burst);
+ BFQ_BFQQ_FNS(constantly_seeky);
+ BFQ_BFQQ_FNS(coop);
+ BFQ_BFQQ_FNS(split_coop);
++BFQ_BFQQ_FNS(just_split);
+ BFQ_BFQQ_FNS(softrt_update);
+ #undef BFQ_BFQQ_FNS
+ 
+-- 
+1.9.1
+
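The last bfq_update_wr_data() hunk in the patch above stops updating the weight only when weight raising ends. It now updates whenever the entity's current weight disagrees with the queue's weight-raising coefficient, in either direction. The following is a minimal standalone sketch of that predicate, not kernel code: `demo_entity`, `demo_queue`, and `needs_weight_update` are hypothetical simplified stand-ins for the structures and the inline condition in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct bfq_entity. */
struct demo_entity {
	unsigned int weight;      /* weight currently in effect */
	unsigned int orig_weight; /* weight before any raising */
};

/* Hypothetical, simplified stand-in for struct bfq_queue. */
struct demo_queue {
	struct demo_entity entity;
	unsigned int wr_coeff;    /* 1 means "not weight-raised" */
};

/*
 * Mirrors the condition added by the patch: an update is needed when
 * "the weight is currently raised" and "the weight should be raised"
 * disagree, whichever way the mismatch goes.
 */
static bool needs_weight_update(const struct demo_queue *q)
{
	bool currently_raised;
	bool should_be_raised;

	assert(q != NULL);

	currently_raised = q->entity.weight > q->entity.orig_weight;
	should_be_raised = q->wr_coeff > 1;

	return currently_raised != should_be_raised;
}
```

Comparing the two booleans with `!=` acts as a logical XOR, so one test covers both the "must be raised" and the "must be lowered" cases named in the patch comment.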



* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-28 23:44 Mike Pagano
From: Mike Pagano @ 2015-09-28 23:44 UTC
  To: gentoo-commits

commit:     226f35b4faf8c37111b54e1449a20137b0b3212c
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Mon Sep 28 23:44:18 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Mon Sep 28 23:44:18 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=226f35b4

dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE. See bug #561558. Thanks to kipplasterjoe for reporting.

 0000_README                                |  4 ++
 1600_dm-crypt-limit-max-segment-size.patch | 84 ++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+)

diff --git a/0000_README b/0000_README
index 93b94b6..551dcf3 100644
--- a/0000_README
+++ b/0000_README
@@ -55,6 +55,10 @@ Patch:  1510_fs-enable-link-security-restrictions-by-default.patch
 From:   http://sources.debian.net/src/linux/3.16.7-ckt4-3/debian/patches/debian/fs-enable-link-security-restrictions-by-default.patch/
 Desc:   Enable link security restrictions by default.
 
+Patch:  1600_dm-crypt-limit-max-segment-size.patch
+From:   https://bugzilla.kernel.org/show_bug.cgi?id=104421
+Desc:   dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE.
+
 Patch:  2700_ThinkPad-30-brightness-control-fix.patch
 From:   Seth Forshee <seth.forshee@canonical.com>
 Desc:   ACPI: Disable Windows 8 compatibility for some Lenovo ThinkPads.

diff --git a/1600_dm-crypt-limit-max-segment-size.patch b/1600_dm-crypt-limit-max-segment-size.patch
new file mode 100644
index 0000000..82aca44
--- /dev/null
+++ b/1600_dm-crypt-limit-max-segment-size.patch
@@ -0,0 +1,84 @@
+From 586b286b110e94eb31840ac5afc0c24e0881fe34 Mon Sep 17 00:00:00 2001
+From: Mike Snitzer <snitzer@redhat.com>
+Date: Wed, 9 Sep 2015 21:34:51 -0400
+Subject: dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE
+
+Setting the dm-crypt device's max_segment_size to PAGE_SIZE is an
+unfortunate constraint that is required to avoid the potential for
+exceeding dm-crypt's underlying device's max_segments limits -- due to
+crypt_alloc_buffer() possibly allocating pages for the encryption bio
+that are not as physically contiguous as the original bio.
+
+It is interesting to note that this problem was already fixed back in
+2007 via commit 91e106259 ("dm crypt: use bio_add_page").  But Linux 4.0
+commit cf2f1abfb ("dm crypt: don't allocate pages for a partial
+request") regressed dm-crypt back to _not_ using bio_add_page().  But
+given dm-crypt's cpu parallelization changes all depend on commit
+cf2f1abfb's abandoning of the more complex io fragments processing that
+dm-crypt previously had we cannot easily go back to using
+bio_add_page().
+
+So all said the cleanest way to resolve this issue is to fix dm-crypt to
+properly constrain the original bios entering dm-crypt so the encryption
+bios that dm-crypt generates from the original bios are always
+compatible with the underlying device's max_segments queue limits.
+
+It should be noted that technically Linux 4.3 does _not_ need this fix
+because of the block core's new late bio-splitting capability.  But, it
+is reasoned, there is little to be gained by having the block core split
+the encrypted bio that is composed of PAGE_SIZE segments.  That said, in
+the future we may revert this change.
+
+Fixes: cf2f1abfb ("dm crypt: don't allocate pages for a partial request")
+Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=104421
+Suggested-by: Jeff Moyer <jmoyer@redhat.com>
+Signed-off-by: Mike Snitzer <snitzer@redhat.com>
+Cc: stable@vger.kernel.org # 4.0+
+
+diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
+index d60c88d..4b3b6f8 100644
+--- a/drivers/md/dm-crypt.c
++++ b/drivers/md/dm-crypt.c
+@@ -968,7 +968,8 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone);
+ 
+ /*
+  * Generate a new unfragmented bio with the given size
+- * This should never violate the device limitations
++ * This should never violate the device limitations (but only because
++ * max_segment_size is being constrained to PAGE_SIZE).
+  *
+  * This function may be called concurrently. If we allocate from the mempool
+  * concurrently, there is a possibility of deadlock. For example, if we have
+@@ -2045,9 +2046,20 @@ static int crypt_iterate_devices(struct dm_target *ti,
+ 	return fn(ti, cc->dev, cc->start, ti->len, data);
+ }
+ 
++static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
++{
++	/*
++	 * Unfortunate constraint that is required to avoid the potential
++	 * for exceeding underlying device's max_segments limits -- due to
++	 * crypt_alloc_buffer() possibly allocating pages for the encryption
++	 * bio that are not as physically contiguous as the original bio.
++	 */
++	limits->max_segment_size = PAGE_SIZE;
++}
++
+ static struct target_type crypt_target = {
+ 	.name   = "crypt",
+-	.version = {1, 14, 0},
++	.version = {1, 14, 1},
+ 	.module = THIS_MODULE,
+ 	.ctr    = crypt_ctr,
+ 	.dtr    = crypt_dtr,
+@@ -2058,6 +2070,7 @@ static struct target_type crypt_target = {
+ 	.resume = crypt_resume,
+ 	.message = crypt_message,
+ 	.iterate_devices = crypt_iterate_devices,
++	.io_hints = crypt_io_hints,
+ };
+ 
+ static int __init dm_crypt_init(void)
+-- 
+cgit v0.10.2
+
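The commit message in the patch above argues that capping max_segment_size at PAGE_SIZE keeps the encryption bio within the underlying device's max_segments limit, even when its pages are less physically contiguous than the original's. The toy model below illustrates that counting argument only. It is a sketch under stated assumptions, not the block layer's real accounting: `DEMO_PAGE_SIZE` and `demo_count_segments` are hypothetical names, pages are described by bare page frame numbers, and other queue limits (such as segment boundaries) are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for the illustration */

/*
 * Count the segments needed for a buffer made of whole pages, given the
 * page frame number of each page and a maximum segment size in bytes.
 * Physically adjacent pages are merged into one segment as long as the
 * segment does not grow beyond max_segment_size.
 */
static size_t demo_count_segments(const unsigned long *pfns, size_t npages,
				  unsigned int max_segment_size)
{
	size_t segments = 0;
	unsigned int cur_len = 0;
	size_t i;

	assert(max_segment_size >= DEMO_PAGE_SIZE);

	for (i = 0; i < npages; i++) {
		int contiguous = i > 0 && pfns[i] == pfns[i - 1] + 1;

		if (contiguous && cur_len + DEMO_PAGE_SIZE <= max_segment_size) {
			cur_len += DEMO_PAGE_SIZE;	/* extend the current segment */
		} else {
			segments++;			/* start a new segment */
			cur_len = DEMO_PAGE_SIZE;
		}
	}

	return segments;
}
```

In this model, four contiguous pages form one segment under a 64 KiB segment limit, while four scattered pages of the same total size form four. With the limit set to one page, both layouts yield four segments, so the scattered copy can no longer need more segments than the original.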



* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-29 17:51 Mike Pagano
From: Mike Pagano @ 2015-09-29 17:51 UTC
  To: gentoo-commits

commit:     418b300cac3a4b2286197e6433c3e8a08c638305
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Tue Sep 29 17:51:49 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Tue Sep 29 17:51:49 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=418b300c

Linux patch 4.2.2

 0000_README            |    4 +
 1001_linux-4.2.2.patch | 5014 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 5018 insertions(+)

diff --git a/0000_README b/0000_README
index 551dcf3..9428abc 100644
--- a/0000_README
+++ b/0000_README
@@ -47,6 +47,10 @@ Patch:  1000_linux-4.2.1.patch
 From:   http://www.kernel.org
 Desc:   Linux 4.2.1
 
+Patch:  1001_linux-4.2.2.patch
+From:   http://www.kernel.org
+Desc:   Linux 4.2.2
+
 Patch:  1500_XATTR_USER_PREFIX.patch
 From:   https://bugs.gentoo.org/show_bug.cgi?id=470644
 Desc:   Support for namespace user.pax.* on tmpfs.

diff --git a/1001_linux-4.2.2.patch b/1001_linux-4.2.2.patch
new file mode 100644
index 0000000..6e64028
--- /dev/null
+++ b/1001_linux-4.2.2.patch
@@ -0,0 +1,5014 @@
+diff --git a/Makefile b/Makefile
+index a03efc18aa48..3578b4426ecf 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 1
++SUBLEVEL = 2
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+ 
+diff --git a/arch/arm/boot/compressed/decompress.c b/arch/arm/boot/compressed/decompress.c
+index bd245d34952d..a0765e7ed6c7 100644
+--- a/arch/arm/boot/compressed/decompress.c
++++ b/arch/arm/boot/compressed/decompress.c
+@@ -57,5 +57,5 @@ extern char * strstr(const char * s1, const char *s2);
+ 
+ int do_decompress(u8 *input, int len, u8 *output, void (*error)(char *x))
+ {
+-	return decompress(input, len, NULL, NULL, output, NULL, error);
++	return __decompress(input, len, NULL, NULL, output, 0, NULL, error);
+ }
+diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
+index bc738d2b8392..f9c341c5ae78 100644
+--- a/arch/arm/kvm/arm.c
++++ b/arch/arm/kvm/arm.c
+@@ -449,7 +449,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
+ 	 * Map the VGIC hardware resources before running a vcpu the first
+ 	 * time on this VM.
+ 	 */
+-	if (unlikely(!vgic_ready(kvm))) {
++	if (unlikely(irqchip_in_kernel(kvm) && !vgic_ready(kvm))) {
+ 		ret = kvm_vgic_map_resources(kvm);
+ 		if (ret)
+ 			return ret;
+diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
+index 318175f62c24..735456feb08e 100644
+--- a/arch/arm64/Kconfig
++++ b/arch/arm64/Kconfig
+@@ -104,6 +104,10 @@ config NO_IOPORT_MAP
+ config STACKTRACE_SUPPORT
+ 	def_bool y
+ 
++config ILLEGAL_POINTER_VALUE
++	hex
++	default 0xdead000000000000
++
+ config LOCKDEP_SUPPORT
+ 	def_bool y
+ 
+@@ -417,6 +421,22 @@ config ARM64_ERRATUM_845719
+ 
+ 	  If unsure, say Y.
+ 
++config ARM64_ERRATUM_843419
++	bool "Cortex-A53: 843419: A load or store might access an incorrect address"
++	depends on MODULES
++	default y
++	help
++	  This option builds kernel modules using the large memory model in
++	  order to avoid the use of the ADRP instruction, which can cause
++	  a subsequent memory access to use an incorrect address on Cortex-A53
++	  parts up to r0p4.
++
++	  Note that the kernel itself must be linked with a version of ld
++	  which fixes potentially affected ADRP instructions through the
++	  use of veneers.
++
++	  If unsure, say Y.
++
+ endmenu
+ 
+ 
+diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
+index 4d2a925998f9..81151663ef38 100644
+--- a/arch/arm64/Makefile
++++ b/arch/arm64/Makefile
+@@ -30,6 +30,10 @@ endif
+ 
+ CHECKFLAGS	+= -D__aarch64__
+ 
++ifeq ($(CONFIG_ARM64_ERRATUM_843419), y)
++CFLAGS_MODULE	+= -mcmodel=large
++endif
++
+ # Default value
+ head-y		:= arch/arm64/kernel/head.o
+ 
+diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
+index f800d45ea226..44a59c20e773 100644
+--- a/arch/arm64/include/asm/memory.h
++++ b/arch/arm64/include/asm/memory.h
+@@ -114,6 +114,14 @@ extern phys_addr_t		memstart_addr;
+ #define PHYS_OFFSET		({ memstart_addr; })
+ 
+ /*
++ * The maximum physical address that the linear direct mapping
++ * of system RAM can cover. (PAGE_OFFSET can be interpreted as
++ * a 2's complement signed quantity and negated to derive the
++ * maximum size of the linear mapping.)
++ */
++#define MAX_MEMBLOCK_ADDR	({ memstart_addr - PAGE_OFFSET - 1; })
++
++/*
+  * PFNs are used to describe any physical page; this means
+  * PFN 0 == physical address 0.
+  *
+diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
+index e16351819fed..8213ca15abd2 100644
+--- a/arch/arm64/kernel/entry.S
++++ b/arch/arm64/kernel/entry.S
+@@ -116,7 +116,7 @@
+ 	*/
+ 	.endm
+ 
+-	.macro	kernel_exit, el, ret = 0
++	.macro	kernel_exit, el
+ 	ldp	x21, x22, [sp, #S_PC]		// load ELR, SPSR
+ 	.if	\el == 0
+ 	ct_user_enter
+@@ -146,11 +146,7 @@
+ 	.endif
+ 	msr	elr_el1, x21			// set up the return data
+ 	msr	spsr_el1, x22
+-	.if	\ret
+-	ldr	x1, [sp, #S_X1]			// preserve x0 (syscall return)
+-	.else
+ 	ldp	x0, x1, [sp, #16 * 0]
+-	.endif
+ 	ldp	x2, x3, [sp, #16 * 1]
+ 	ldp	x4, x5, [sp, #16 * 2]
+ 	ldp	x6, x7, [sp, #16 * 3]
+@@ -613,22 +609,21 @@ ENDPROC(cpu_switch_to)
+  */
+ ret_fast_syscall:
+ 	disable_irq				// disable interrupts
++	str	x0, [sp, #S_X0]			// returned x0
+ 	ldr	x1, [tsk, #TI_FLAGS]		// re-check for syscall tracing
+ 	and	x2, x1, #_TIF_SYSCALL_WORK
+ 	cbnz	x2, ret_fast_syscall_trace
+ 	and	x2, x1, #_TIF_WORK_MASK
+-	cbnz	x2, fast_work_pending
++	cbnz	x2, work_pending
+ 	enable_step_tsk x1, x2
+-	kernel_exit 0, ret = 1
++	kernel_exit 0
+ ret_fast_syscall_trace:
+ 	enable_irq				// enable interrupts
+-	b	__sys_trace_return
++	b	__sys_trace_return_skipped	// we already saved x0
+ 
+ /*
+  * Ok, we need to do extra processing, enter the slow path.
+  */
+-fast_work_pending:
+-	str	x0, [sp, #S_X0]			// returned x0
+ work_pending:
+ 	tbnz	x1, #TIF_NEED_RESCHED, work_resched
+ 	/* TIF_SIGPENDING, TIF_NOTIFY_RESUME or TIF_FOREIGN_FPSTATE case */
+@@ -652,7 +647,7 @@ ret_to_user:
+ 	cbnz	x2, work_pending
+ 	enable_step_tsk x1, x2
+ no_work_pending:
+-	kernel_exit 0, ret = 0
++	kernel_exit 0
+ ENDPROC(ret_to_user)
+ 
+ /*
+diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
+index 44d6f7545505..c56956a16d3f 100644
+--- a/arch/arm64/kernel/fpsimd.c
++++ b/arch/arm64/kernel/fpsimd.c
+@@ -158,6 +158,7 @@ void fpsimd_thread_switch(struct task_struct *next)
+ void fpsimd_flush_thread(void)
+ {
+ 	memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
++	fpsimd_flush_task_state(current);
+ 	set_thread_flag(TIF_FOREIGN_FPSTATE);
+ }
+ 
+diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
+index c0ff3ce4299e..370541162658 100644
+--- a/arch/arm64/kernel/head.S
++++ b/arch/arm64/kernel/head.S
+@@ -528,6 +528,11 @@ CPU_LE(	movk	x0, #0x30d0, lsl #16	)	// Clear EE and E0E on LE systems
+ 	msr	hstr_el2, xzr			// Disable CP15 traps to EL2
+ #endif
+ 
++	/* EL2 debug */
++	mrs	x0, pmcr_el0			// Disable debug access traps
++	ubfx	x0, x0, #11, #5			// to EL2 and allow access to
++	msr	mdcr_el2, x0			// all PMU counters from EL1
++
+ 	/* Stage-2 translation */
+ 	msr	vttbr_el2, xzr
+ 
+diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
+index 67bf4107f6ef..876eb8df50bf 100644
+--- a/arch/arm64/kernel/module.c
++++ b/arch/arm64/kernel/module.c
+@@ -332,12 +332,14 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
+ 			ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 0, 21,
+ 					     AARCH64_INSN_IMM_ADR);
+ 			break;
++#ifndef CONFIG_ARM64_ERRATUM_843419
+ 		case R_AARCH64_ADR_PREL_PG_HI21_NC:
+ 			overflow_check = false;
+ 		case R_AARCH64_ADR_PREL_PG_HI21:
+ 			ovf = reloc_insn_imm(RELOC_OP_PAGE, loc, val, 12, 21,
+ 					     AARCH64_INSN_IMM_ADR);
+ 			break;
++#endif
+ 		case R_AARCH64_ADD_ABS_LO12_NC:
+ 		case R_AARCH64_LDST8_ABS_LO12_NC:
+ 			overflow_check = false;
+diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
+index 948f0ad2de23..71ef6dc89ae5 100644
+--- a/arch/arm64/kernel/signal32.c
++++ b/arch/arm64/kernel/signal32.c
+@@ -212,14 +212,32 @@ int copy_siginfo_from_user32(siginfo_t *to, compat_siginfo_t __user *from)
+ 
+ /*
+  * VFP save/restore code.
++ *
++ * We have to be careful with endianness, since the fpsimd context-switch
++ * code operates on 128-bit (Q) register values whereas the compat ABI
++ * uses an array of 64-bit (D) registers. Consequently, we need to swap
++ * the two halves of each Q register when running on a big-endian CPU.
+  */
++union __fpsimd_vreg {
++	__uint128_t	raw;
++	struct {
++#ifdef __AARCH64EB__
++		u64	hi;
++		u64	lo;
++#else
++		u64	lo;
++		u64	hi;
++#endif
++	};
++};
++
+ static int compat_preserve_vfp_context(struct compat_vfp_sigframe __user *frame)
+ {
+ 	struct fpsimd_state *fpsimd = &current->thread.fpsimd_state;
+ 	compat_ulong_t magic = VFP_MAGIC;
+ 	compat_ulong_t size = VFP_STORAGE_SIZE;
+ 	compat_ulong_t fpscr, fpexc;
+-	int err = 0;
++	int i, err = 0;
+ 
+ 	/*
+ 	 * Save the hardware registers to the fpsimd_state structure.
+@@ -235,10 +253,15 @@ static int compat_preserve_vfp_context(struct compat_vfp_sigframe __user *frame)
+ 	/*
+ 	 * Now copy the FP registers. Since the registers are packed,
+ 	 * we can copy the prefix we want (V0-V15) as it is.
+-	 * FIXME: Won't work if big endian.
+ 	 */
+-	err |= __copy_to_user(&frame->ufp.fpregs, fpsimd->vregs,
+-			      sizeof(frame->ufp.fpregs));
++	for (i = 0; i < ARRAY_SIZE(frame->ufp.fpregs); i += 2) {
++		union __fpsimd_vreg vreg = {
++			.raw = fpsimd->vregs[i >> 1],
++		};
++
++		__put_user_error(vreg.lo, &frame->ufp.fpregs[i], err);
++		__put_user_error(vreg.hi, &frame->ufp.fpregs[i + 1], err);
++	}
+ 
+ 	/* Create an AArch32 fpscr from the fpsr and the fpcr. */
+ 	fpscr = (fpsimd->fpsr & VFP_FPSCR_STAT_MASK) |
+@@ -263,7 +286,7 @@ static int compat_restore_vfp_context(struct compat_vfp_sigframe __user *frame)
+ 	compat_ulong_t magic = VFP_MAGIC;
+ 	compat_ulong_t size = VFP_STORAGE_SIZE;
+ 	compat_ulong_t fpscr;
+-	int err = 0;
++	int i, err = 0;
+ 
+ 	__get_user_error(magic, &frame->magic, err);
+ 	__get_user_error(size, &frame->size, err);
+@@ -273,12 +296,14 @@ static int compat_restore_vfp_context(struct compat_vfp_sigframe __user *frame)
+ 	if (magic != VFP_MAGIC || size != VFP_STORAGE_SIZE)
+ 		return -EINVAL;
+ 
+-	/*
+-	 * Copy the FP registers into the start of the fpsimd_state.
+-	 * FIXME: Won't work if big endian.
+-	 */
+-	err |= __copy_from_user(fpsimd.vregs, frame->ufp.fpregs,
+-				sizeof(frame->ufp.fpregs));
++	/* Copy the FP registers into the start of the fpsimd_state. */
++	for (i = 0; i < ARRAY_SIZE(frame->ufp.fpregs); i += 2) {
++		union __fpsimd_vreg vreg;
++
++		__get_user_error(vreg.lo, &frame->ufp.fpregs[i], err);
++		__get_user_error(vreg.hi, &frame->ufp.fpregs[i + 1], err);
++		fpsimd.vregs[i >> 1] = vreg.raw;
++	}
+ 
+ 	/* Extract the fpsr and the fpcr from the fpscr */
+ 	__get_user_error(fpscr, &frame->ufp.fpscr, err);
+diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
+index 17a8fb14f428..3c6051cbf442 100644
+--- a/arch/arm64/kvm/hyp.S
++++ b/arch/arm64/kvm/hyp.S
+@@ -840,8 +840,6 @@
+ 	mrs	x3, cntv_ctl_el0
+ 	and	x3, x3, #3
+ 	str	w3, [x0, #VCPU_TIMER_CNTV_CTL]
+-	bic	x3, x3, #1		// Clear Enable
+-	msr	cntv_ctl_el0, x3
+ 
+ 	isb
+ 
+@@ -849,6 +847,9 @@
+ 	str	x3, [x0, #VCPU_TIMER_CNTV_CVAL]
+ 
+ 1:
++	// Disable the virtual timer
++	msr	cntv_ctl_el0, xzr
++
+ 	// Allow physical timer/counter access for the host
+ 	mrs	x2, cnthctl_el2
+ 	orr	x2, x2, #3
+@@ -943,13 +944,15 @@ ENTRY(__kvm_vcpu_run)
+ 	// Guest context
+ 	add	x2, x0, #VCPU_CONTEXT
+ 
++	// We must restore the 32-bit state before the sysregs, thanks
++	// to Cortex-A57 erratum #852523.
++	restore_guest_32bit_state
+ 	bl __restore_sysregs
+ 	bl __restore_fpsimd
+ 
+ 	skip_debug_state x3, 1f
+ 	bl	__restore_debug
+ 1:
+-	restore_guest_32bit_state
+ 	restore_guest_regs
+ 
+ 	// That's it, no more messing around.
+diff --git a/arch/h8300/boot/compressed/misc.c b/arch/h8300/boot/compressed/misc.c
+index 704274127c07..c4f2cfcb117b 100644
+--- a/arch/h8300/boot/compressed/misc.c
++++ b/arch/h8300/boot/compressed/misc.c
+@@ -70,5 +70,5 @@ void decompress_kernel(void)
+ 	free_mem_ptr = (unsigned long)&_end;
+ 	free_mem_end_ptr = free_mem_ptr + HEAP_SIZE;
+ 
+-	decompress(input_data, input_len, NULL, NULL, output, NULL, error);
++	__decompress(input_data, input_len, NULL, NULL, output, 0, NULL, error);
+ }
+diff --git a/arch/m32r/boot/compressed/misc.c b/arch/m32r/boot/compressed/misc.c
+index 28a09529f206..3a7692745868 100644
+--- a/arch/m32r/boot/compressed/misc.c
++++ b/arch/m32r/boot/compressed/misc.c
+@@ -86,6 +86,7 @@ decompress_kernel(int mmu_on, unsigned char *zimage_data,
+ 	free_mem_end_ptr = free_mem_ptr + BOOT_HEAP_SIZE;
+ 
+ 	puts("\nDecompressing Linux... ");
+-	decompress(input_data, input_len, NULL, NULL, output_data, NULL, error);
++	__decompress(input_data, input_len, NULL, NULL, output_data, 0,
++			NULL, error);
+ 	puts("done.\nBooting the kernel.\n");
+ }
+diff --git a/arch/mips/boot/compressed/decompress.c b/arch/mips/boot/compressed/decompress.c
+index 54831069a206..080cd53bac36 100644
+--- a/arch/mips/boot/compressed/decompress.c
++++ b/arch/mips/boot/compressed/decompress.c
+@@ -111,8 +111,8 @@ void decompress_kernel(unsigned long boot_heap_start)
+ 	puts("\n");
+ 
+ 	/* Decompress the kernel with according algorithm */
+-	decompress((char *)zimage_start, zimage_size, 0, 0,
+-		   (void *)VMLINUX_LOAD_ADDRESS_ULL, 0, error);
++	__decompress((char *)zimage_start, zimage_size, 0, 0,
++		   (void *)VMLINUX_LOAD_ADDRESS_ULL, 0, 0, error);
+ 
+ 	/* FIXME: should we flush cache here? */
+ 	puts("Now, booting the kernel...\n");
+diff --git a/arch/mips/kernel/cps-vec.S b/arch/mips/kernel/cps-vec.S
+index 1b6ca634e646..9f71c06aebf6 100644
+--- a/arch/mips/kernel/cps-vec.S
++++ b/arch/mips/kernel/cps-vec.S
+@@ -152,7 +152,7 @@ dcache_done:
+ 
+ 	/* Enter the coherent domain */
+ 	li	t0, 0xff
+-	PTR_S	t0, GCR_CL_COHERENCE_OFS(v1)
++	sw	t0, GCR_CL_COHERENCE_OFS(v1)
+ 	ehb
+ 
+ 	/* Jump to kseg0 */
+@@ -302,7 +302,7 @@ LEAF(mips_cps_boot_vpes)
+ 	PTR_L	t0, 0(t0)
+ 
+ 	/* Calculate a pointer to this cores struct core_boot_config */
+-	PTR_L	t0, GCR_CL_ID_OFS(t0)
++	lw	t0, GCR_CL_ID_OFS(t0)
+ 	li	t1, COREBOOTCFG_SIZE
+ 	mul	t0, t0, t1
+ 	PTR_LA	t1, mips_cps_core_bootcfg
+diff --git a/arch/mips/math-emu/cp1emu.c b/arch/mips/math-emu/cp1emu.c
+index 712f17a2ecf2..f0f1b98a5fde 100644
+--- a/arch/mips/math-emu/cp1emu.c
++++ b/arch/mips/math-emu/cp1emu.c
+@@ -1137,7 +1137,7 @@ emul:
+ 			break;
+ 
+ 		case mfhc_op:
+-			if (!cpu_has_mips_r2)
++			if (!cpu_has_mips_r2_r6)
+ 				goto sigill;
+ 
+ 			/* copregister rd -> gpr[rt] */
+@@ -1148,7 +1148,7 @@ emul:
+ 			break;
+ 
+ 		case mthc_op:
+-			if (!cpu_has_mips_r2)
++			if (!cpu_has_mips_r2_r6)
+ 				goto sigill;
+ 
+ 			/* copregister rd <- gpr[rt] */
+@@ -1181,6 +1181,24 @@ emul:
+ 			}
+ 			break;
+ 
++		case bc1eqz_op:
++		case bc1nez_op:
++			if (!cpu_has_mips_r6 || delay_slot(xcp))
++				return SIGILL;
++
++			cond = likely = 0;
++			switch (MIPSInst_RS(ir)) {
++			case bc1eqz_op:
++				if (get_fpr32(&current->thread.fpu.fpr[MIPSInst_RT(ir)], 0) & 0x1)
++				    cond = 1;
++				break;
++			case bc1nez_op:
++				if (!(get_fpr32(&current->thread.fpu.fpr[MIPSInst_RT(ir)], 0) & 0x1))
++				    cond = 1;
++				break;
++			}
++			goto branch_common;
++
+ 		case bc_op:
+ 			if (delay_slot(xcp))
+ 				return SIGILL;
+@@ -1207,7 +1225,7 @@ emul:
+ 			case bct_op:
+ 				break;
+ 			}
+-
++branch_common:
+ 			set_delay_slot(xcp);
+ 			if (cond) {
+ 				/*
+diff --git a/arch/parisc/kernel/irq.c b/arch/parisc/kernel/irq.c
+index f3191db6e2e9..c0eab24f6a9e 100644
+--- a/arch/parisc/kernel/irq.c
++++ b/arch/parisc/kernel/irq.c
+@@ -507,8 +507,8 @@ void do_cpu_irq_mask(struct pt_regs *regs)
+ 	struct pt_regs *old_regs;
+ 	unsigned long eirr_val;
+ 	int irq, cpu = smp_processor_id();
+-#ifdef CONFIG_SMP
+ 	struct irq_desc *desc;
++#ifdef CONFIG_SMP
+ 	cpumask_t dest;
+ #endif
+ 
+@@ -521,8 +521,12 @@ void do_cpu_irq_mask(struct pt_regs *regs)
+ 		goto set_out;
+ 	irq = eirr_to_irq(eirr_val);
+ 
+-#ifdef CONFIG_SMP
++	/* Filter out spurious interrupts, mostly from serial port at bootup */
+ 	desc = irq_to_desc(irq);
++	if (unlikely(!desc->action))
++		goto set_out;
++
++#ifdef CONFIG_SMP
+ 	cpumask_copy(&dest, desc->irq_data.affinity);
+ 	if (irqd_is_per_cpu(&desc->irq_data) &&
+ 	    !cpumask_test_cpu(smp_processor_id(), &dest)) {
+diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
+index 7ef22e3387e0..0b8d26d3ba43 100644
+--- a/arch/parisc/kernel/syscall.S
++++ b/arch/parisc/kernel/syscall.S
+@@ -821,7 +821,7 @@ cas2_action:
+ 	/* 64bit CAS */
+ #ifdef CONFIG_64BIT
+ 19:	ldd,ma	0(%sr3,%r26), %r29
+-	sub,=	%r29, %r25, %r0
++	sub,*=	%r29, %r25, %r0
+ 	b,n	cas2_end
+ 20:	std,ma	%r24, 0(%sr3,%r26)
+ 	copy	%r0, %r28
+diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
+index 73eddda53b8e..4eec430d8fa8 100644
+--- a/arch/powerpc/boot/Makefile
++++ b/arch/powerpc/boot/Makefile
+@@ -28,6 +28,9 @@ BOOTCFLAGS	+= -m64
+ endif
+ ifdef CONFIG_CPU_BIG_ENDIAN
+ BOOTCFLAGS	+= -mbig-endian
++else
++BOOTCFLAGS	+= -mlittle-endian
++BOOTCFLAGS	+= $(call cc-option,-mabi=elfv2)
+ endif
+ 
+ BOOTAFLAGS	:= -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional -nostdinc
+diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
+index 3bb7488bd24b..7ee2300ee392 100644
+--- a/arch/powerpc/include/asm/pgtable-ppc64.h
++++ b/arch/powerpc/include/asm/pgtable-ppc64.h
+@@ -135,7 +135,19 @@
+ #define pte_iterate_hashed_end() } while(0)
+ 
+ #ifdef CONFIG_PPC_HAS_HASH_64K
+-#define pte_pagesize_index(mm, addr, pte)	get_slice_psize(mm, addr)
++/*
++ * We expect this to be called only for user addresses or kernel virtual
++ * addresses other than the linear mapping.
++ */
++#define pte_pagesize_index(mm, addr, pte)			\
++	({							\
++		unsigned int psize;				\
++		if (is_kernel_addr(addr))			\
++			psize = MMU_PAGE_4K;			\
++		else						\
++			psize = get_slice_psize(mm, addr);	\
++		psize;						\
++	})
+ #else
+ #define pte_pagesize_index(mm, addr, pte)	MMU_PAGE_4K
+ #endif
+diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
+index 7a4ede16b283..b77ef369c0f0 100644
+--- a/arch/powerpc/include/asm/rtas.h
++++ b/arch/powerpc/include/asm/rtas.h
+@@ -343,6 +343,7 @@ extern void rtas_power_off(void);
+ extern void rtas_halt(void);
+ extern void rtas_os_term(char *str);
+ extern int rtas_get_sensor(int sensor, int index, int *state);
++extern int rtas_get_sensor_fast(int sensor, int index, int *state);
+ extern int rtas_get_power_level(int powerdomain, int *level);
+ extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
+ extern bool rtas_indicator_present(int token, int *maxindex);
+diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
+index 58abeda64cb7..15cca17cba4b 100644
+--- a/arch/powerpc/include/asm/switch_to.h
++++ b/arch/powerpc/include/asm/switch_to.h
+@@ -29,6 +29,7 @@ static inline void save_early_sprs(struct thread_struct *prev) {}
+ 
+ extern void enable_kernel_fp(void);
+ extern void enable_kernel_altivec(void);
++extern void enable_kernel_vsx(void);
+ extern int emulate_altivec(struct pt_regs *);
+ extern void __giveup_vsx(struct task_struct *);
+ extern void giveup_vsx(struct task_struct *);
+diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
+index af9b597b10af..01c961d5d2de 100644
+--- a/arch/powerpc/kernel/eeh.c
++++ b/arch/powerpc/kernel/eeh.c
+@@ -308,11 +308,26 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
+ 	if (!(pe->type & EEH_PE_PHB)) {
+ 		if (eeh_has_flag(EEH_ENABLE_IO_FOR_LOG))
+ 			eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
++
++		/*
++		 * The config space of some PCI devices can't be accessed
++		 * when their PEs are in frozen state. Otherwise, fenced
++		 * PHB might be seen. Those PEs are identified with flag
++		 * EEH_PE_CFG_RESTRICTED, indicating EEH_PE_CFG_BLOCKED
++		 * is set automatically when the PE is put to EEH_PE_ISOLATED.
++		 *
++		 * Restoring BARs possibly triggers PCI config access in
++		 * (OPAL) firmware and then causes fenced PHB. If the
++		 * PCI config is blocked with flag EEH_PE_CFG_BLOCKED, it's
++		 * pointless to restore BARs and dump config space.
++		 */
+ 		eeh_ops->configure_bridge(pe);
+-		eeh_pe_restore_bars(pe);
++		if (!(pe->state & EEH_PE_CFG_BLOCKED)) {
++			eeh_pe_restore_bars(pe);
+ 
+-		pci_regs_buf[0] = 0;
+-		eeh_pe_traverse(pe, eeh_dump_pe_log, &loglen);
++			pci_regs_buf[0] = 0;
++			eeh_pe_traverse(pe, eeh_dump_pe_log, &loglen);
++		}
+ 	}
+ 
+ 	eeh_ops->get_log(pe, severity, pci_regs_buf, loglen);
+@@ -1116,9 +1131,6 @@ void eeh_add_device_late(struct pci_dev *dev)
+ 		return;
+ 	}
+ 
+-	if (eeh_has_flag(EEH_PROBE_MODE_DEV))
+-		eeh_ops->probe(pdn, NULL);
+-
+ 	/*
+ 	 * The EEH cache might not be removed correctly because of
+ 	 * unbalanced kref to the device during unplug time, which
+@@ -1142,6 +1154,9 @@ void eeh_add_device_late(struct pci_dev *dev)
+ 		dev->dev.archdata.edev = NULL;
+ 	}
+ 
++	if (eeh_has_flag(EEH_PROBE_MODE_DEV))
++		eeh_ops->probe(pdn, NULL);
++
+ 	edev->pdev = dev;
+ 	dev->dev.archdata.edev = edev;
+ 
+diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
+index 8005e18d1b40..64e6e9d9e656 100644
+--- a/arch/powerpc/kernel/process.c
++++ b/arch/powerpc/kernel/process.c
+@@ -204,8 +204,6 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
+ #endif /* CONFIG_ALTIVEC */
+ 
+ #ifdef CONFIG_VSX
+-#if 0
+-/* not currently used, but some crazy RAID module might want to later */
+ void enable_kernel_vsx(void)
+ {
+ 	WARN_ON(preemptible());
+@@ -220,7 +218,6 @@ void enable_kernel_vsx(void)
+ #endif /* CONFIG_SMP */
+ }
+ EXPORT_SYMBOL(enable_kernel_vsx);
+-#endif
+ 
+ void giveup_vsx(struct task_struct *tsk)
+ {
+diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
+index 7a488c108410..caffb10e7aa3 100644
+--- a/arch/powerpc/kernel/rtas.c
++++ b/arch/powerpc/kernel/rtas.c
+@@ -584,6 +584,23 @@ int rtas_get_sensor(int sensor, int index, int *state)
+ }
+ EXPORT_SYMBOL(rtas_get_sensor);
+ 
++int rtas_get_sensor_fast(int sensor, int index, int *state)
++{
++	int token = rtas_token("get-sensor-state");
++	int rc;
++
++	if (token == RTAS_UNKNOWN_SERVICE)
++		return -ENOENT;
++
++	rc = rtas_call(token, 2, 2, state, sensor, index);
++	WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
++				    rc <= RTAS_EXTENDED_DELAY_MAX));
++
++	if (rc < 0)
++		return rtas_error_rc(rc);
++	return rc;
++}
++
+ bool rtas_indicator_present(int token, int *maxindex)
+ {
+ 	int proplen, count, i;
+diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
+index 43dafb9d6a46..4d87122cf6a7 100644
+--- a/arch/powerpc/mm/hugepage-hash64.c
++++ b/arch/powerpc/mm/hugepage-hash64.c
+@@ -85,7 +85,6 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
+ 	BUG_ON(index >= 4096);
+ 
+ 	vpn = hpt_vpn(ea, vsid, ssize);
+-	hash = hpt_hash(vpn, shift, ssize);
+ 	hpte_slot_array = get_hpte_slot_array(pmdp);
+ 	if (psize == MMU_PAGE_4K) {
+ 		/*
+@@ -101,6 +100,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
+ 	valid = hpte_valid(hpte_slot_array, index);
+ 	if (valid) {
+ 		/* update the hpte bits */
++		hash = hpt_hash(vpn, shift, ssize);
+ 		hidx =  hpte_hash_index(hpte_slot_array, index);
+ 		if (hidx & _PTEIDX_SECONDARY)
+ 			hash = ~hash;
+@@ -126,6 +126,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
+ 	if (!valid) {
+ 		unsigned long hpte_group;
+ 
++		hash = hpt_hash(vpn, shift, ssize);
+ 		/* insert new entry */
+ 		pa = pmd_pfn(__pmd(old_pmd)) << PAGE_SHIFT;
+ 		new_pmd |= _PAGE_HASHPTE;
+diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
+index 85cbc96eff6c..8b64f89e68c9 100644
+--- a/arch/powerpc/platforms/powernv/pci-ioda.c
++++ b/arch/powerpc/platforms/powernv/pci-ioda.c
+@@ -2078,9 +2078,23 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
+ 	struct iommu_table *tbl = NULL;
+ 	long rc;
+ 
++	/*
++	 * crashkernel= specifies the kdump kernel's maximum memory at
++	 * some offset and there is no guarantee the result is a power
++	 * of 2, which will cause errors later.
++	 */
++	const u64 max_memory = __rounddown_pow_of_two(memory_hotplug_max());
++
++	/*
++	 * In memory constrained environments, e.g. kdump kernel, the
++	 * DMA window can be larger than available memory, which will
++	 * cause errors later.
++	 */
++	const u64 window_size = min((u64)pe->table_group.tce32_size, max_memory);
++
+ 	rc = pnv_pci_ioda2_create_table(&pe->table_group, 0,
+ 			IOMMU_PAGE_SHIFT_4K,
+-			pe->table_group.tce32_size,
++			window_size,
+ 			POWERNV_IOMMU_DEFAULT_LEVELS, &tbl);
+ 	if (rc) {
+ 		pe_err(pe, "Failed to create 32-bit TCE table, err %ld",
+diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
+index 47d9cebe7159..db17827eb746 100644
+--- a/arch/powerpc/platforms/pseries/dlpar.c
++++ b/arch/powerpc/platforms/pseries/dlpar.c
+@@ -422,8 +422,10 @@ static ssize_t dlpar_cpu_probe(const char *buf, size_t count)
+ 
+ 	dn = dlpar_configure_connector(cpu_to_be32(drc_index), parent);
+ 	of_node_put(parent);
+-	if (!dn)
++	if (!dn) {
++		dlpar_release_drc(drc_index);
+ 		return -EINVAL;
++	}
+ 
+ 	rc = dlpar_attach_node(dn);
+ 	if (rc) {
+diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
+index 02e4a1745516..3b6647e574b6 100644
+--- a/arch/powerpc/platforms/pseries/ras.c
++++ b/arch/powerpc/platforms/pseries/ras.c
+@@ -189,7 +189,8 @@ static irqreturn_t ras_epow_interrupt(int irq, void *dev_id)
+ 	int state;
+ 	int critical;
+ 
+-	status = rtas_get_sensor(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX, &state);
++	status = rtas_get_sensor_fast(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX,
++				      &state);
+ 
+ 	if (state > 3)
+ 		critical = 1;		/* Time Critical */
+diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
+index df6a7041922b..e6e8b241d717 100644
+--- a/arch/powerpc/platforms/pseries/setup.c
++++ b/arch/powerpc/platforms/pseries/setup.c
+@@ -268,6 +268,11 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
+ 			eeh_dev_init(PCI_DN(np), pci->phb);
+ 		}
+ 		break;
++	case OF_RECONFIG_DETACH_NODE:
++		pci = PCI_DN(np);
++		if (pci)
++			list_del(&pci->list);
++		break;
+ 	default:
+ 		err = NOTIFY_DONE;
+ 		break;
+diff --git a/arch/s390/boot/compressed/misc.c b/arch/s390/boot/compressed/misc.c
+index 42506b371b74..4da604ebf6fd 100644
+--- a/arch/s390/boot/compressed/misc.c
++++ b/arch/s390/boot/compressed/misc.c
+@@ -167,7 +167,7 @@ unsigned long decompress_kernel(void)
+ #endif
+ 
+ 	puts("Uncompressing Linux... ");
+-	decompress(input_data, input_len, NULL, NULL, output, NULL, error);
++	__decompress(input_data, input_len, NULL, NULL, output, 0, NULL, error);
+ 	puts("Ok, booting the kernel.\n");
+ 	return (unsigned long) output;
+ }
+diff --git a/arch/sh/boot/compressed/misc.c b/arch/sh/boot/compressed/misc.c
+index 95470a472d2c..208a9753ab38 100644
+--- a/arch/sh/boot/compressed/misc.c
++++ b/arch/sh/boot/compressed/misc.c
+@@ -132,7 +132,7 @@ void decompress_kernel(void)
+ 
+ 	puts("Uncompressing Linux... ");
+ 	cache_control(CACHE_ENABLE);
+-	decompress(input_data, input_len, NULL, NULL, output, NULL, error);
++	__decompress(input_data, input_len, NULL, NULL, output, 0, NULL, error);
+ 	cache_control(CACHE_DISABLE);
+ 	puts("Ok, booting the kernel.\n");
+ }
+diff --git a/arch/unicore32/boot/compressed/misc.c b/arch/unicore32/boot/compressed/misc.c
+index 176d5bda3559..5c65dfee278c 100644
+--- a/arch/unicore32/boot/compressed/misc.c
++++ b/arch/unicore32/boot/compressed/misc.c
+@@ -119,8 +119,8 @@ unsigned long decompress_kernel(unsigned long output_start,
+ 	output_ptr = get_unaligned_le32(tmp);
+ 
+ 	arch_decomp_puts("Uncompressing Linux...");
+-	decompress(input_data, input_data_end - input_data, NULL, NULL,
+-			output_data, NULL, error);
++	__decompress(input_data, input_data_end - input_data, NULL, NULL,
++			output_data, 0, NULL, error);
+ 	arch_decomp_puts(" done, booting the kernel.\n");
+ 	return output_ptr;
+ }
+diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
+index a107b935e22f..e28437e0f708 100644
+--- a/arch/x86/boot/compressed/misc.c
++++ b/arch/x86/boot/compressed/misc.c
+@@ -424,7 +424,8 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
+ #endif
+ 
+ 	debug_putstr("\nDecompressing Linux... ");
+-	decompress(input_data, input_len, NULL, NULL, output, NULL, error);
++	__decompress(input_data, input_len, NULL, NULL, output, output_len,
++			NULL, error);
+ 	parse_elf(output);
+ 	/*
+ 	 * 32-bit always performs relocations. 64-bit relocations are only
+diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
+index 8340e45c891a..68aec42545c2 100644
+--- a/arch/x86/mm/init_32.c
++++ b/arch/x86/mm/init_32.c
+@@ -137,6 +137,7 @@ page_table_range_init_count(unsigned long start, unsigned long end)
+ 
+ 	vaddr = start;
+ 	pgd_idx = pgd_index(vaddr);
++	pmd_idx = pmd_index(vaddr);
+ 
+ 	for ( ; (pgd_idx < PTRS_PER_PGD) && (vaddr != end); pgd_idx++) {
+ 		for (; (pmd_idx < PTRS_PER_PMD) && (vaddr != end);
+diff --git a/block/blk-flush.c b/block/blk-flush.c
+index 20badd7b9d1b..9c423e53324a 100644
+--- a/block/blk-flush.c
++++ b/block/blk-flush.c
+@@ -73,6 +73,7 @@
+ 
+ #include "blk.h"
+ #include "blk-mq.h"
++#include "blk-mq-tag.h"
+ 
+ /* FLUSH/FUA sequences */
+ enum {
+@@ -226,7 +227,12 @@ static void flush_end_io(struct request *flush_rq, int error)
+ 	struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
+ 
+ 	if (q->mq_ops) {
++		struct blk_mq_hw_ctx *hctx;
++
++		/* release the tag's ownership to the req cloned from */
+ 		spin_lock_irqsave(&fq->mq_flush_lock, flags);
++		hctx = q->mq_ops->map_queue(q, flush_rq->mq_ctx->cpu);
++		blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
+ 		flush_rq->tag = -1;
+ 	}
+ 
+@@ -308,11 +314,18 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
+ 
+ 	/*
+ 	 * Borrow tag from the first request since they can't
+-	 * be in flight at the same time.
++	 * be in flight at the same time. And acquire the tag's
++	 * ownership for flush req.
+ 	 */
+ 	if (q->mq_ops) {
++		struct blk_mq_hw_ctx *hctx;
++
+ 		flush_rq->mq_ctx = first_rq->mq_ctx;
+ 		flush_rq->tag = first_rq->tag;
++		fq->orig_rq = first_rq;
++
++		hctx = q->mq_ops->map_queue(q, first_rq->mq_ctx->cpu);
++		blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
+ 	}
+ 
+ 	flush_rq->cmd_type = REQ_TYPE_FS;
+diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
+index b79685e06b70..279c5d674edf 100644
+--- a/block/blk-mq-sysfs.c
++++ b/block/blk-mq-sysfs.c
+@@ -141,15 +141,26 @@ static ssize_t blk_mq_sysfs_completed_show(struct blk_mq_ctx *ctx, char *page)
+ 
+ static ssize_t sysfs_list_show(char *page, struct list_head *list, char *msg)
+ {
+-	char *start_page = page;
+ 	struct request *rq;
++	int len = snprintf(page, PAGE_SIZE - 1, "%s:\n", msg);
++
++	list_for_each_entry(rq, list, queuelist) {
++		const int rq_len = 2 * sizeof(rq) + 2;
++
++		/* if the output will be truncated */
++		if (PAGE_SIZE - 1 < len + rq_len) {
++			/* backspacing if it can't hold '\t...\n' */
++			if (PAGE_SIZE - 1 < len + 5)
++				len -= rq_len;
++			len += snprintf(page + len, PAGE_SIZE - 1 - len,
++					"\t...\n");
++			break;
++		}
++		len += snprintf(page + len, PAGE_SIZE - 1 - len,
++				"\t%p\n", rq);
++	}
+ 
+-	page += sprintf(page, "%s:\n", msg);
+-
+-	list_for_each_entry(rq, list, queuelist)
+-		page += sprintf(page, "\t%p\n", rq);
+-
+-	return page - start_page;
++	return len;
+ }
+ 
+ static ssize_t blk_mq_sysfs_rq_list_show(struct blk_mq_ctx *ctx, char *page)
+diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
+index 9b6e28830b82..9115c6d59948 100644
+--- a/block/blk-mq-tag.c
++++ b/block/blk-mq-tag.c
+@@ -429,7 +429,7 @@ static void bt_for_each(struct blk_mq_hw_ctx *hctx,
+ 		for (bit = find_first_bit(&bm->word, bm->depth);
+ 		     bit < bm->depth;
+ 		     bit = find_next_bit(&bm->word, bm->depth, bit + 1)) {
+-		     	rq = blk_mq_tag_to_rq(hctx->tags, off + bit);
++			rq = hctx->tags->rqs[off + bit];
+ 			if (rq->q == hctx->queue)
+ 				fn(hctx, rq, data, reserved);
+ 		}
+@@ -453,7 +453,7 @@ static void bt_tags_for_each(struct blk_mq_tags *tags,
+ 		for (bit = find_first_bit(&bm->word, bm->depth);
+ 		     bit < bm->depth;
+ 		     bit = find_next_bit(&bm->word, bm->depth, bit + 1)) {
+-			rq = blk_mq_tag_to_rq(tags, off + bit);
++			rq = tags->rqs[off + bit];
+ 			fn(rq, data, reserved);
+ 		}
+ 
+diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
+index 75893a34237d..9eb2cf4f01cb 100644
+--- a/block/blk-mq-tag.h
++++ b/block/blk-mq-tag.h
+@@ -89,4 +89,16 @@ static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
+ 	__blk_mq_tag_idle(hctx);
+ }
+ 
++/*
++ * This helper should only be used for flush request to share tag
++ * with the request cloned from, and both the two requests can't be
++ * in flight at the same time. The caller has to make sure the tag
++ * can't be freed.
++ */
++static inline void blk_mq_tag_set_rq(struct blk_mq_hw_ctx *hctx,
++		unsigned int tag, struct request *rq)
++{
++	hctx->tags->rqs[tag] = rq;
++}
++
+ #endif
+diff --git a/block/blk-mq.c b/block/blk-mq.c
+index 7d842db59699..176262ec3731 100644
+--- a/block/blk-mq.c
++++ b/block/blk-mq.c
+@@ -559,23 +559,9 @@ void blk_mq_abort_requeue_list(struct request_queue *q)
+ }
+ EXPORT_SYMBOL(blk_mq_abort_requeue_list);
+ 
+-static inline bool is_flush_request(struct request *rq,
+-		struct blk_flush_queue *fq, unsigned int tag)
+-{
+-	return ((rq->cmd_flags & REQ_FLUSH_SEQ) &&
+-			fq->flush_rq->tag == tag);
+-}
+-
+ struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
+ {
+-	struct request *rq = tags->rqs[tag];
+-	/* mq_ctx of flush rq is always cloned from the corresponding req */
+-	struct blk_flush_queue *fq = blk_get_flush_queue(rq->q, rq->mq_ctx);
+-
+-	if (!is_flush_request(rq, fq, tag))
+-		return rq;
+-
+-	return fq->flush_rq;
++	return tags->rqs[tag];
+ }
+ EXPORT_SYMBOL(blk_mq_tag_to_rq);
+ 
+diff --git a/block/blk.h b/block/blk.h
+index 026d9594142b..838188b35a83 100644
+--- a/block/blk.h
++++ b/block/blk.h
+@@ -22,6 +22,12 @@ struct blk_flush_queue {
+ 	struct list_head	flush_queue[2];
+ 	struct list_head	flush_data_in_flight;
+ 	struct request		*flush_rq;
++
++	/*
++	 * flush_rq shares tag with this rq, both can't be active
++	 * at the same time
++	 */
++	struct request		*orig_rq;
+ 	spinlock_t		mq_flush_lock;
+ };
+ 
+diff --git a/drivers/base/node.c b/drivers/base/node.c
+index 31df474d72f4..560751bad294 100644
+--- a/drivers/base/node.c
++++ b/drivers/base/node.c
+@@ -392,6 +392,16 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
+ 	for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
+ 		int page_nid;
+ 
++		/*
++		 * memory block could have several absent sections from start.
++		 * skip pfn range from absent section
++		 */
++		if (!pfn_present(pfn)) {
++			pfn = round_down(pfn + PAGES_PER_SECTION,
++					 PAGES_PER_SECTION) - 1;
++			continue;
++		}
++
+ 		page_nid = get_nid_for_pfn(pfn);
+ 		if (page_nid < 0)
+ 			continue;
+diff --git a/drivers/crypto/vmx/aes.c b/drivers/crypto/vmx/aes.c
+index e79e567e43aa..263af709e536 100644
+--- a/drivers/crypto/vmx/aes.c
++++ b/drivers/crypto/vmx/aes.c
+@@ -84,6 +84,7 @@ static int p8_aes_setkey(struct crypto_tfm *tfm, const u8 *key,
+ 	preempt_disable();
+ 	pagefault_disable();
+ 	enable_kernel_altivec();
++	enable_kernel_vsx();
+ 	ret = aes_p8_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
+ 	ret += aes_p8_set_decrypt_key(key, keylen * 8, &ctx->dec_key);
+ 	pagefault_enable();
+@@ -103,6 +104,7 @@ static void p8_aes_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+ 		preempt_disable();
+ 		pagefault_disable();
+ 		enable_kernel_altivec();
++		enable_kernel_vsx();
+ 		aes_p8_encrypt(src, dst, &ctx->enc_key);
+ 		pagefault_enable();
+ 		preempt_enable();
+@@ -119,6 +121,7 @@ static void p8_aes_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+ 		preempt_disable();
+ 		pagefault_disable();
+ 		enable_kernel_altivec();
++		enable_kernel_vsx();
+ 		aes_p8_decrypt(src, dst, &ctx->dec_key);
+ 		pagefault_enable();
+ 		preempt_enable();
+diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c
+index 7299995c78ec..0b8fe2ec5315 100644
+--- a/drivers/crypto/vmx/aes_cbc.c
++++ b/drivers/crypto/vmx/aes_cbc.c
+@@ -85,6 +85,7 @@ static int p8_aes_cbc_setkey(struct crypto_tfm *tfm, const u8 *key,
+ 	preempt_disable();
+ 	pagefault_disable();
+ 	enable_kernel_altivec();
++	enable_kernel_vsx();
+ 	ret = aes_p8_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
+ 	ret += aes_p8_set_decrypt_key(key, keylen * 8, &ctx->dec_key);
+ 	pagefault_enable();
+@@ -115,6 +116,7 @@ static int p8_aes_cbc_encrypt(struct blkcipher_desc *desc,
+ 		preempt_disable();
+ 		pagefault_disable();
+ 		enable_kernel_altivec();
++		enable_kernel_vsx();
+ 
+ 		blkcipher_walk_init(&walk, dst, src, nbytes);
+ 		ret = blkcipher_walk_virt(desc, &walk);
+@@ -155,6 +157,7 @@ static int p8_aes_cbc_decrypt(struct blkcipher_desc *desc,
+ 		preempt_disable();
+ 		pagefault_disable();
+ 		enable_kernel_altivec();
++		enable_kernel_vsx();
+ 
+ 		blkcipher_walk_init(&walk, dst, src, nbytes);
+ 		ret = blkcipher_walk_virt(desc, &walk);
+diff --git a/drivers/crypto/vmx/aes_ctr.c b/drivers/crypto/vmx/aes_ctr.c
+index ed3838781b4c..ee1306cd8f59 100644
+--- a/drivers/crypto/vmx/aes_ctr.c
++++ b/drivers/crypto/vmx/aes_ctr.c
+@@ -82,6 +82,7 @@ static int p8_aes_ctr_setkey(struct crypto_tfm *tfm, const u8 *key,
+ 
+ 	pagefault_disable();
+ 	enable_kernel_altivec();
++	enable_kernel_vsx();
+ 	ret = aes_p8_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
+ 	pagefault_enable();
+ 
+@@ -100,6 +101,7 @@ static void p8_aes_ctr_final(struct p8_aes_ctr_ctx *ctx,
+ 
+ 	pagefault_disable();
+ 	enable_kernel_altivec();
++	enable_kernel_vsx();
+ 	aes_p8_encrypt(ctrblk, keystream, &ctx->enc_key);
+ 	pagefault_enable();
+ 
+@@ -132,6 +134,7 @@ static int p8_aes_ctr_crypt(struct blkcipher_desc *desc,
+ 		while ((nbytes = walk.nbytes) >= AES_BLOCK_SIZE) {
+ 			pagefault_disable();
+ 			enable_kernel_altivec();
++			enable_kernel_vsx();
+ 			aes_p8_ctr32_encrypt_blocks(walk.src.virt.addr,
+ 						    walk.dst.virt.addr,
+ 						    (nbytes &
+diff --git a/drivers/crypto/vmx/ghash.c b/drivers/crypto/vmx/ghash.c
+index b5e29002b666..2183a2e77641 100644
+--- a/drivers/crypto/vmx/ghash.c
++++ b/drivers/crypto/vmx/ghash.c
+@@ -119,6 +119,7 @@ static int p8_ghash_setkey(struct crypto_shash *tfm, const u8 *key,
+ 	preempt_disable();
+ 	pagefault_disable();
+ 	enable_kernel_altivec();
++	enable_kernel_vsx();
+ 	enable_kernel_fp();
+ 	gcm_init_p8(ctx->htable, (const u64 *) key);
+ 	pagefault_enable();
+@@ -149,6 +150,7 @@ static int p8_ghash_update(struct shash_desc *desc,
+ 			preempt_disable();
+ 			pagefault_disable();
+ 			enable_kernel_altivec();
++			enable_kernel_vsx();
+ 			enable_kernel_fp();
+ 			gcm_ghash_p8(dctx->shash, ctx->htable,
+ 				     dctx->buffer, GHASH_DIGEST_SIZE);
+@@ -163,6 +165,7 @@ static int p8_ghash_update(struct shash_desc *desc,
+ 			preempt_disable();
+ 			pagefault_disable();
+ 			enable_kernel_altivec();
++			enable_kernel_vsx();
+ 			enable_kernel_fp();
+ 			gcm_ghash_p8(dctx->shash, ctx->htable, src, len);
+ 			pagefault_enable();
+@@ -193,6 +196,7 @@ static int p8_ghash_final(struct shash_desc *desc, u8 *out)
+ 			preempt_disable();
+ 			pagefault_disable();
+ 			enable_kernel_altivec();
++			enable_kernel_vsx();
+ 			enable_kernel_fp();
+ 			gcm_ghash_p8(dctx->shash, ctx->htable,
+ 				     dctx->buffer, GHASH_DIGEST_SIZE);
+diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c
+index cacb07b7a8f1..32e7b4a686ef 100644
+--- a/drivers/gpu/drm/i915/intel_ddi.c
++++ b/drivers/gpu/drm/i915/intel_ddi.c
+@@ -1293,17 +1293,14 @@ skl_ddi_pll_select(struct intel_crtc *intel_crtc,
+ 			 DPLL_CFGCR2_PDIV(wrpll_params.pdiv) |
+ 			 wrpll_params.central_freq;
+ 	} else if (intel_encoder->type == INTEL_OUTPUT_DISPLAYPORT) {
+-		struct drm_encoder *encoder = &intel_encoder->base;
+-		struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
+-
+-		switch (intel_dp->link_bw) {
+-		case DP_LINK_BW_1_62:
++		switch (crtc_state->port_clock / 2) {
++		case 81000:
+ 			ctrl1 |= DPLL_CTRL1_LINK_RATE(DPLL_CTRL1_LINK_RATE_810, 0);
+ 			break;
+-		case DP_LINK_BW_2_7:
++		case 135000:
+ 			ctrl1 |= DPLL_CTRL1_LINK_RATE(DPLL_CTRL1_LINK_RATE_1350, 0);
+ 			break;
+-		case DP_LINK_BW_5_4:
++		case 270000:
+ 			ctrl1 |= DPLL_CTRL1_LINK_RATE(DPLL_CTRL1_LINK_RATE_2700, 0);
+ 			break;
+ 		}
+diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
+index bd8f8863eb0e..ca2d923101fc 100644
+--- a/drivers/gpu/drm/i915/intel_dp.c
++++ b/drivers/gpu/drm/i915/intel_dp.c
+@@ -48,28 +48,28 @@
+ #define INTEL_DP_RESOLUTION_FAILSAFE	(3 << INTEL_DP_RESOLUTION_SHIFT_MASK)
+ 
+ struct dp_link_dpll {
+-	int link_bw;
++	int clock;
+ 	struct dpll dpll;
+ };
+ 
+ static const struct dp_link_dpll gen4_dpll[] = {
+-	{ DP_LINK_BW_1_62,
++	{ 162000,
+ 		{ .p1 = 2, .p2 = 10, .n = 2, .m1 = 23, .m2 = 8 } },
+-	{ DP_LINK_BW_2_7,
++	{ 270000,
+ 		{ .p1 = 1, .p2 = 10, .n = 1, .m1 = 14, .m2 = 2 } }
+ };
+ 
+ static const struct dp_link_dpll pch_dpll[] = {
+-	{ DP_LINK_BW_1_62,
++	{ 162000,
+ 		{ .p1 = 2, .p2 = 10, .n = 1, .m1 = 12, .m2 = 9 } },
+-	{ DP_LINK_BW_2_7,
++	{ 270000,
+ 		{ .p1 = 1, .p2 = 10, .n = 2, .m1 = 14, .m2 = 8 } }
+ };
+ 
+ static const struct dp_link_dpll vlv_dpll[] = {
+-	{ DP_LINK_BW_1_62,
++	{ 162000,
+ 		{ .p1 = 3, .p2 = 2, .n = 5, .m1 = 3, .m2 = 81 } },
+-	{ DP_LINK_BW_2_7,
++	{ 270000,
+ 		{ .p1 = 2, .p2 = 2, .n = 1, .m1 = 2, .m2 = 27 } }
+ };
+ 
+@@ -83,11 +83,11 @@ static const struct dp_link_dpll chv_dpll[] = {
+ 	 * m2 is stored in fixed point format using formula below
+ 	 * (m2_int << 22) | m2_fraction
+ 	 */
+-	{ DP_LINK_BW_1_62,	/* m2_int = 32, m2_fraction = 1677722 */
++	{ 162000,	/* m2_int = 32, m2_fraction = 1677722 */
+ 		{ .p1 = 4, .p2 = 2, .n = 1, .m1 = 2, .m2 = 0x819999a } },
+-	{ DP_LINK_BW_2_7,	/* m2_int = 27, m2_fraction = 0 */
++	{ 270000,	/* m2_int = 27, m2_fraction = 0 */
+ 		{ .p1 = 4, .p2 = 1, .n = 1, .m1 = 2, .m2 = 0x6c00000 } },
+-	{ DP_LINK_BW_5_4,	/* m2_int = 27, m2_fraction = 0 */
++	{ 540000,	/* m2_int = 27, m2_fraction = 0 */
+ 		{ .p1 = 2, .p2 = 1, .n = 1, .m1 = 2, .m2 = 0x6c00000 } }
+ };
+ 
+@@ -1089,7 +1089,7 @@ intel_dp_connector_unregister(struct intel_connector *intel_connector)
+ }
+ 
+ static void
+-skl_edp_set_pll_config(struct intel_crtc_state *pipe_config, int link_clock)
++skl_edp_set_pll_config(struct intel_crtc_state *pipe_config)
+ {
+ 	u32 ctrl1;
+ 
+@@ -1101,7 +1101,7 @@ skl_edp_set_pll_config(struct intel_crtc_state *pipe_config, int link_clock)
+ 	pipe_config->dpll_hw_state.cfgcr2 = 0;
+ 
+ 	ctrl1 = DPLL_CTRL1_OVERRIDE(SKL_DPLL0);
+-	switch (link_clock / 2) {
++	switch (pipe_config->port_clock / 2) {
+ 	case 81000:
+ 		ctrl1 |= DPLL_CTRL1_LINK_RATE(DPLL_CTRL1_LINK_RATE_810,
+ 					      SKL_DPLL0);
+@@ -1134,20 +1134,20 @@ skl_edp_set_pll_config(struct intel_crtc_state *pipe_config, int link_clock)
+ 	pipe_config->dpll_hw_state.ctrl1 = ctrl1;
+ }
+ 
+-static void
+-hsw_dp_set_ddi_pll_sel(struct intel_crtc_state *pipe_config, int link_bw)
++void
++hsw_dp_set_ddi_pll_sel(struct intel_crtc_state *pipe_config)
+ {
+ 	memset(&pipe_config->dpll_hw_state, 0,
+ 	       sizeof(pipe_config->dpll_hw_state));
+ 
+-	switch (link_bw) {
+-	case DP_LINK_BW_1_62:
++	switch (pipe_config->port_clock / 2) {
++	case 81000:
+ 		pipe_config->ddi_pll_sel = PORT_CLK_SEL_LCPLL_810;
+ 		break;
+-	case DP_LINK_BW_2_7:
++	case 135000:
+ 		pipe_config->ddi_pll_sel = PORT_CLK_SEL_LCPLL_1350;
+ 		break;
+-	case DP_LINK_BW_5_4:
++	case 270000:
+ 		pipe_config->ddi_pll_sel = PORT_CLK_SEL_LCPLL_2700;
+ 		break;
+ 	}
+@@ -1198,7 +1198,7 @@ intel_dp_source_rates(struct drm_device *dev, const int **source_rates)
+ 
+ static void
+ intel_dp_set_clock(struct intel_encoder *encoder,
+-		   struct intel_crtc_state *pipe_config, int link_bw)
++		   struct intel_crtc_state *pipe_config)
+ {
+ 	struct drm_device *dev = encoder->base.dev;
+ 	const struct dp_link_dpll *divisor = NULL;
+@@ -1220,7 +1220,7 @@ intel_dp_set_clock(struct intel_encoder *encoder,
+ 
+ 	if (divisor && count) {
+ 		for (i = 0; i < count; i++) {
+-			if (link_bw == divisor[i].link_bw) {
++			if (pipe_config->port_clock == divisor[i].clock) {
+ 				pipe_config->dpll = divisor[i].dpll;
+ 				pipe_config->clock_set = true;
+ 				break;
+@@ -1494,13 +1494,13 @@ found:
+ 	}
+ 
+ 	if (IS_SKYLAKE(dev) && is_edp(intel_dp))
+-		skl_edp_set_pll_config(pipe_config, common_rates[clock]);
++		skl_edp_set_pll_config(pipe_config);
+ 	else if (IS_BROXTON(dev))
+ 		/* handled in ddi */;
+ 	else if (IS_HASWELL(dev) || IS_BROADWELL(dev))
+-		hsw_dp_set_ddi_pll_sel(pipe_config, intel_dp->link_bw);
++		hsw_dp_set_ddi_pll_sel(pipe_config);
+ 	else
+-		intel_dp_set_clock(encoder, pipe_config, intel_dp->link_bw);
++		intel_dp_set_clock(encoder, pipe_config);
+ 
+ 	return true;
+ }
+diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
+index 600afdbef8c9..8c127201ab3c 100644
+--- a/drivers/gpu/drm/i915/intel_dp_mst.c
++++ b/drivers/gpu/drm/i915/intel_dp_mst.c
+@@ -33,6 +33,7 @@
+ static bool intel_dp_mst_compute_config(struct intel_encoder *encoder,
+ 					struct intel_crtc_state *pipe_config)
+ {
++	struct drm_device *dev = encoder->base.dev;
+ 	struct intel_dp_mst_encoder *intel_mst = enc_to_mst(&encoder->base);
+ 	struct intel_digital_port *intel_dig_port = intel_mst->primary;
+ 	struct intel_dp *intel_dp = &intel_dig_port->dp;
+@@ -97,6 +98,10 @@ static bool intel_dp_mst_compute_config(struct intel_encoder *encoder,
+ 			       &pipe_config->dp_m_n);
+ 
+ 	pipe_config->dp_m_n.tu = slots;
++
++	if (IS_HASWELL(dev) || IS_BROADWELL(dev))
++		hsw_dp_set_ddi_pll_sel(pipe_config);
++
+ 	return true;
+ 
+ }
+diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
+index 105928382e21..04d426156bdb 100644
+--- a/drivers/gpu/drm/i915/intel_drv.h
++++ b/drivers/gpu/drm/i915/intel_drv.h
+@@ -1194,6 +1194,7 @@ void intel_edp_drrs_disable(struct intel_dp *intel_dp);
+ void intel_edp_drrs_invalidate(struct drm_device *dev,
+ 		unsigned frontbuffer_bits);
+ void intel_edp_drrs_flush(struct drm_device *dev, unsigned frontbuffer_bits);
++void hsw_dp_set_ddi_pll_sel(struct intel_crtc_state *pipe_config);
+ 
+ /* intel_dp_mst.c */
+ int intel_dp_mst_encoder_init(struct intel_digital_port *intel_dig_port, int conn_id);
+diff --git a/drivers/gpu/drm/radeon/radeon_combios.c b/drivers/gpu/drm/radeon/radeon_combios.c
+index c097d3a82bda..a9b01bcf7d0a 100644
+--- a/drivers/gpu/drm/radeon/radeon_combios.c
++++ b/drivers/gpu/drm/radeon/radeon_combios.c
+@@ -3387,6 +3387,14 @@ void radeon_combios_asic_init(struct drm_device *dev)
+ 	    rdev->pdev->subsystem_device == 0x30ae)
+ 		return;
+ 
++	/* quirk for rs4xx HP Compaq dc5750 Small Form Factor to make it resume
++	 * - it hangs on resume inside the dynclk 1 table.
++	 */
++	if (rdev->family == CHIP_RS480 &&
++	    rdev->pdev->subsystem_vendor == 0x103c &&
++	    rdev->pdev->subsystem_device == 0x280a)
++		return;
++
+ 	/* DYN CLK 1 */
+ 	table = combios_get_table_offset(dev, COMBIOS_DYN_CLK_1_TABLE);
+ 	if (table)
+diff --git a/drivers/i2c/busses/i2c-xgene-slimpro.c b/drivers/i2c/busses/i2c-xgene-slimpro.c
+index 1c9cb65ac4cf..4233f5695352 100644
+--- a/drivers/i2c/busses/i2c-xgene-slimpro.c
++++ b/drivers/i2c/busses/i2c-xgene-slimpro.c
+@@ -198,10 +198,10 @@ static int slimpro_i2c_blkrd(struct slimpro_i2c_dev *ctx, u32 chip, u32 addr,
+ 	int rc;
+ 
+ 	paddr = dma_map_single(ctx->dev, ctx->dma_buffer, readlen, DMA_FROM_DEVICE);
+-	rc = dma_mapping_error(ctx->dev, paddr);
+-	if (rc) {
++	if (dma_mapping_error(ctx->dev, paddr)) {
+ 		dev_err(&ctx->adapter.dev, "Error in mapping dma buffer %p\n",
+ 			ctx->dma_buffer);
++		rc = -ENOMEM;
+ 		goto err;
+ 	}
+ 
+@@ -241,10 +241,10 @@ static int slimpro_i2c_blkwr(struct slimpro_i2c_dev *ctx, u32 chip,
+ 	memcpy(ctx->dma_buffer, data, writelen);
+ 	paddr = dma_map_single(ctx->dev, ctx->dma_buffer, writelen,
+ 			       DMA_TO_DEVICE);
+-	rc = dma_mapping_error(ctx->dev, paddr);
+-	if (rc) {
++	if (dma_mapping_error(ctx->dev, paddr)) {
+ 		dev_err(&ctx->adapter.dev, "Error in mapping dma buffer %p\n",
+ 			ctx->dma_buffer);
++		rc = -ENOMEM;
+ 		goto err;
+ 	}
+ 
+diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
+index ba365b6d1e8d..65cbfcc92f11 100644
+--- a/drivers/infiniband/core/uverbs.h
++++ b/drivers/infiniband/core/uverbs.h
+@@ -85,7 +85,7 @@
+  */
+ 
+ struct ib_uverbs_device {
+-	struct kref				ref;
++	atomic_t				refcount;
+ 	int					num_comp_vectors;
+ 	struct completion			comp;
+ 	struct device			       *dev;
+@@ -94,6 +94,7 @@ struct ib_uverbs_device {
+ 	struct cdev			        cdev;
+ 	struct rb_root				xrcd_tree;
+ 	struct mutex				xrcd_tree_mutex;
++	struct kobject				kobj;
+ };
+ 
+ struct ib_uverbs_event_file {
+diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
+index bbb02ffe87df..a6ca83b3153f 100644
+--- a/drivers/infiniband/core/uverbs_cmd.c
++++ b/drivers/infiniband/core/uverbs_cmd.c
+@@ -2346,6 +2346,12 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
+ 		next->send_flags = user_wr->send_flags;
+ 
+ 		if (is_ud) {
++			if (next->opcode != IB_WR_SEND &&
++			    next->opcode != IB_WR_SEND_WITH_IMM) {
++				ret = -EINVAL;
++				goto out_put;
++			}
++
+ 			next->wr.ud.ah = idr_read_ah(user_wr->wr.ud.ah,
+ 						     file->ucontext);
+ 			if (!next->wr.ud.ah) {
+@@ -2385,9 +2391,11 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
+ 					user_wr->wr.atomic.compare_add;
+ 				next->wr.atomic.swap = user_wr->wr.atomic.swap;
+ 				next->wr.atomic.rkey = user_wr->wr.atomic.rkey;
++			case IB_WR_SEND:
+ 				break;
+ 			default:
+-				break;
++				ret = -EINVAL;
++				goto out_put;
+ 			}
+ 		}
+ 
+diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
+index f6eef2da7097..15f4126a577d 100644
+--- a/drivers/infiniband/core/uverbs_main.c
++++ b/drivers/infiniband/core/uverbs_main.c
+@@ -130,14 +130,18 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
+ static void ib_uverbs_add_one(struct ib_device *device);
+ static void ib_uverbs_remove_one(struct ib_device *device);
+ 
+-static void ib_uverbs_release_dev(struct kref *ref)
++static void ib_uverbs_release_dev(struct kobject *kobj)
+ {
+ 	struct ib_uverbs_device *dev =
+-		container_of(ref, struct ib_uverbs_device, ref);
++		container_of(kobj, struct ib_uverbs_device, kobj);
+ 
+-	complete(&dev->comp);
++	kfree(dev);
+ }
+ 
++static struct kobj_type ib_uverbs_dev_ktype = {
++	.release = ib_uverbs_release_dev,
++};
++
+ static void ib_uverbs_release_event_file(struct kref *ref)
+ {
+ 	struct ib_uverbs_event_file *file =
+@@ -303,13 +307,19 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
+ 	return context->device->dealloc_ucontext(context);
+ }
+ 
++static void ib_uverbs_comp_dev(struct ib_uverbs_device *dev)
++{
++	complete(&dev->comp);
++}
++
+ static void ib_uverbs_release_file(struct kref *ref)
+ {
+ 	struct ib_uverbs_file *file =
+ 		container_of(ref, struct ib_uverbs_file, ref);
+ 
+ 	module_put(file->device->ib_dev->owner);
+-	kref_put(&file->device->ref, ib_uverbs_release_dev);
++	if (atomic_dec_and_test(&file->device->refcount))
++		ib_uverbs_comp_dev(file->device);
+ 
+ 	kfree(file);
+ }
+@@ -743,9 +753,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
+ 	int ret;
+ 
+ 	dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev);
+-	if (dev)
+-		kref_get(&dev->ref);
+-	else
++	if (!atomic_inc_not_zero(&dev->refcount))
+ 		return -ENXIO;
+ 
+ 	if (!try_module_get(dev->ib_dev->owner)) {
+@@ -766,6 +774,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
+ 	mutex_init(&file->mutex);
+ 
+ 	filp->private_data = file;
++	kobject_get(&dev->kobj);
+ 
+ 	return nonseekable_open(inode, filp);
+ 
+@@ -773,13 +782,16 @@ err_module:
+ 	module_put(dev->ib_dev->owner);
+ 
+ err:
+-	kref_put(&dev->ref, ib_uverbs_release_dev);
++	if (atomic_dec_and_test(&dev->refcount))
++		ib_uverbs_comp_dev(dev);
++
+ 	return ret;
+ }
+ 
+ static int ib_uverbs_close(struct inode *inode, struct file *filp)
+ {
+ 	struct ib_uverbs_file *file = filp->private_data;
++	struct ib_uverbs_device *dev = file->device;
+ 
+ 	ib_uverbs_cleanup_ucontext(file, file->ucontext);
+ 
+@@ -787,6 +799,7 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
+ 		kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
+ 
+ 	kref_put(&file->ref, ib_uverbs_release_file);
++	kobject_put(&dev->kobj);
+ 
+ 	return 0;
+ }
+@@ -882,10 +895,11 @@ static void ib_uverbs_add_one(struct ib_device *device)
+ 	if (!uverbs_dev)
+ 		return;
+ 
+-	kref_init(&uverbs_dev->ref);
++	atomic_set(&uverbs_dev->refcount, 1);
+ 	init_completion(&uverbs_dev->comp);
+ 	uverbs_dev->xrcd_tree = RB_ROOT;
+ 	mutex_init(&uverbs_dev->xrcd_tree_mutex);
++	kobject_init(&uverbs_dev->kobj, &ib_uverbs_dev_ktype);
+ 
+ 	spin_lock(&map_lock);
+ 	devnum = find_first_zero_bit(dev_map, IB_UVERBS_MAX_DEVICES);
+@@ -912,6 +926,7 @@ static void ib_uverbs_add_one(struct ib_device *device)
+ 	cdev_init(&uverbs_dev->cdev, NULL);
+ 	uverbs_dev->cdev.owner = THIS_MODULE;
+ 	uverbs_dev->cdev.ops = device->mmap ? &uverbs_mmap_fops : &uverbs_fops;
++	uverbs_dev->cdev.kobj.parent = &uverbs_dev->kobj;
+ 	kobject_set_name(&uverbs_dev->cdev.kobj, "uverbs%d", uverbs_dev->devnum);
+ 	if (cdev_add(&uverbs_dev->cdev, base, 1))
+ 		goto err_cdev;
+@@ -942,9 +957,10 @@ err_cdev:
+ 		clear_bit(devnum, overflow_map);
+ 
+ err:
+-	kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
++	if (atomic_dec_and_test(&uverbs_dev->refcount))
++		ib_uverbs_comp_dev(uverbs_dev);
+ 	wait_for_completion(&uverbs_dev->comp);
+-	kfree(uverbs_dev);
++	kobject_put(&uverbs_dev->kobj);
+ 	return;
+ }
+ 
+@@ -964,9 +980,10 @@ static void ib_uverbs_remove_one(struct ib_device *device)
+ 	else
+ 		clear_bit(uverbs_dev->devnum - IB_UVERBS_MAX_DEVICES, overflow_map);
+ 
+-	kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
++	if (atomic_dec_and_test(&uverbs_dev->refcount))
++		ib_uverbs_comp_dev(uverbs_dev);
+ 	wait_for_completion(&uverbs_dev->comp);
+-	kfree(uverbs_dev);
++	kobject_put(&uverbs_dev->kobj);
+ }
+ 
+ static char *uverbs_devnode(struct device *dev, umode_t *mode)
+diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
+index f50a546224ad..33fdd50123f7 100644
+--- a/drivers/infiniband/hw/mlx4/ah.c
++++ b/drivers/infiniband/hw/mlx4/ah.c
+@@ -148,9 +148,13 @@ int mlx4_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
+ 	enum rdma_link_layer ll;
+ 
+ 	memset(ah_attr, 0, sizeof *ah_attr);
+-	ah_attr->sl = be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 28;
+ 	ah_attr->port_num = be32_to_cpu(ah->av.ib.port_pd) >> 24;
+ 	ll = rdma_port_get_link_layer(ibah->device, ah_attr->port_num);
++	if (ll == IB_LINK_LAYER_ETHERNET)
++		ah_attr->sl = be32_to_cpu(ah->av.eth.sl_tclass_flowlabel) >> 29;
++	else
++		ah_attr->sl = be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 28;
++
+ 	ah_attr->dlid = ll == IB_LINK_LAYER_INFINIBAND ? be16_to_cpu(ah->av.ib.dlid) : 0;
+ 	if (ah->av.ib.stat_rate)
+ 		ah_attr->static_rate = ah->av.ib.stat_rate - MLX4_STAT_RATE_OFFSET;
+diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
+index 36eb3d012b6d..2f4259525bb1 100644
+--- a/drivers/infiniband/hw/mlx4/cq.c
++++ b/drivers/infiniband/hw/mlx4/cq.c
+@@ -638,7 +638,7 @@ static void mlx4_ib_poll_sw_comp(struct mlx4_ib_cq *cq, int num_entries,
+ 	 * simulated FLUSH_ERR completions
+ 	 */
+ 	list_for_each_entry(qp, &cq->send_qp_list, cq_send_list) {
+-		mlx4_ib_qp_sw_comp(qp, num_entries, wc, npolled, 1);
++		mlx4_ib_qp_sw_comp(qp, num_entries, wc + *npolled, npolled, 1);
+ 		if (*npolled >= num_entries)
+ 			goto out;
+ 	}
+diff --git a/drivers/infiniband/hw/mlx4/mcg.c b/drivers/infiniband/hw/mlx4/mcg.c
+index ed327e6c8fdc..a0559a8af4f4 100644
+--- a/drivers/infiniband/hw/mlx4/mcg.c
++++ b/drivers/infiniband/hw/mlx4/mcg.c
+@@ -206,15 +206,16 @@ static int send_mad_to_wire(struct mlx4_ib_demux_ctx *ctx, struct ib_mad *mad)
+ {
+ 	struct mlx4_ib_dev *dev = ctx->dev;
+ 	struct ib_ah_attr	ah_attr;
++	unsigned long flags;
+ 
+-	spin_lock(&dev->sm_lock);
++	spin_lock_irqsave(&dev->sm_lock, flags);
+ 	if (!dev->sm_ah[ctx->port - 1]) {
+ 		/* port is not yet Active, sm_ah not ready */
+-		spin_unlock(&dev->sm_lock);
++		spin_unlock_irqrestore(&dev->sm_lock, flags);
+ 		return -EAGAIN;
+ 	}
+ 	mlx4_ib_query_ah(dev->sm_ah[ctx->port - 1], &ah_attr);
+-	spin_unlock(&dev->sm_lock);
++	spin_unlock_irqrestore(&dev->sm_lock, flags);
+ 	return mlx4_ib_send_to_wire(dev, mlx4_master_func_num(dev->dev),
+ 				    ctx->port, IB_QPT_GSI, 0, 1, IB_QP1_QKEY,
+ 				    &ah_attr, NULL, mad);
+diff --git a/drivers/infiniband/hw/mlx4/sysfs.c b/drivers/infiniband/hw/mlx4/sysfs.c
+index 6797108ce873..69fb5ba94d0f 100644
+--- a/drivers/infiniband/hw/mlx4/sysfs.c
++++ b/drivers/infiniband/hw/mlx4/sysfs.c
+@@ -640,6 +640,8 @@ static int add_port(struct mlx4_ib_dev *dev, int port_num, int slave)
+ 	struct mlx4_port *p;
+ 	int i;
+ 	int ret;
++	int is_eth = rdma_port_get_link_layer(&dev->ib_dev, port_num) ==
++			IB_LINK_LAYER_ETHERNET;
+ 
+ 	p = kzalloc(sizeof *p, GFP_KERNEL);
+ 	if (!p)
+@@ -657,7 +659,8 @@ static int add_port(struct mlx4_ib_dev *dev, int port_num, int slave)
+ 
+ 	p->pkey_group.name  = "pkey_idx";
+ 	p->pkey_group.attrs =
+-		alloc_group_attrs(show_port_pkey, store_port_pkey,
++		alloc_group_attrs(show_port_pkey,
++				  is_eth ? NULL : store_port_pkey,
+ 				  dev->dev->caps.pkey_table_len[port_num]);
+ 	if (!p->pkey_group.attrs) {
+ 		ret = -ENOMEM;
+diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
+index bc9a0de897cb..dbb75c0de848 100644
+--- a/drivers/infiniband/hw/mlx5/mr.c
++++ b/drivers/infiniband/hw/mlx5/mr.c
+@@ -1118,19 +1118,7 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
+ 	return &mr->ibmr;
+ 
+ error:
+-	/*
+-	 * Destroy the umem *before* destroying the MR, to ensure we
+-	 * will not have any in-flight notifiers when destroying the
+-	 * MR.
+-	 *
+-	 * As the MR is completely invalid to begin with, and this
+-	 * error path is only taken if we can't push the mr entry into
+-	 * the pagefault tree, this is safe.
+-	 */
+-
+ 	ib_umem_release(umem);
+-	/* Kill the MR, and return an error code. */
+-	clean_mr(mr);
+ 	return ERR_PTR(err);
+ }
+ 
+diff --git a/drivers/infiniband/hw/qib/qib_keys.c b/drivers/infiniband/hw/qib/qib_keys.c
+index ad843c786e72..5afaa218508d 100644
+--- a/drivers/infiniband/hw/qib/qib_keys.c
++++ b/drivers/infiniband/hw/qib/qib_keys.c
+@@ -86,6 +86,10 @@ int qib_alloc_lkey(struct qib_mregion *mr, int dma_region)
+ 	 * unrestricted LKEY.
+ 	 */
+ 	rkt->gen++;
++	/*
++	 * bits are capped in qib_verbs.c to insure enough bits
++	 * for generation number
++	 */
+ 	mr->lkey = (r << (32 - ib_qib_lkey_table_size)) |
+ 		((((1 << (24 - ib_qib_lkey_table_size)) - 1) & rkt->gen)
+ 		 << 8);
+diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
+index a05d1a372208..77e981abfce4 100644
+--- a/drivers/infiniband/hw/qib/qib_verbs.c
++++ b/drivers/infiniband/hw/qib/qib_verbs.c
+@@ -40,6 +40,7 @@
+ #include <linux/rculist.h>
+ #include <linux/mm.h>
+ #include <linux/random.h>
++#include <linux/vmalloc.h>
+ 
+ #include "qib.h"
+ #include "qib_common.h"
+@@ -2109,10 +2110,16 @@ int qib_register_ib_device(struct qib_devdata *dd)
+ 	 * the LKEY).  The remaining bits act as a generation number or tag.
+ 	 */
+ 	spin_lock_init(&dev->lk_table.lock);
++	/* insure generation is at least 4 bits see keys.c */
++	if (ib_qib_lkey_table_size > MAX_LKEY_TABLE_BITS) {
++		qib_dev_warn(dd, "lkey bits %u too large, reduced to %u\n",
++			ib_qib_lkey_table_size, MAX_LKEY_TABLE_BITS);
++		ib_qib_lkey_table_size = MAX_LKEY_TABLE_BITS;
++	}
+ 	dev->lk_table.max = 1 << ib_qib_lkey_table_size;
+ 	lk_tab_size = dev->lk_table.max * sizeof(*dev->lk_table.table);
+ 	dev->lk_table.table = (struct qib_mregion __rcu **)
+-		__get_free_pages(GFP_KERNEL, get_order(lk_tab_size));
++		vmalloc(lk_tab_size);
+ 	if (dev->lk_table.table == NULL) {
+ 		ret = -ENOMEM;
+ 		goto err_lk;
+@@ -2286,7 +2293,7 @@ err_tx:
+ 					sizeof(struct qib_pio_header),
+ 				  dev->pio_hdrs, dev->pio_hdrs_phys);
+ err_hdrs:
+-	free_pages((unsigned long) dev->lk_table.table, get_order(lk_tab_size));
++	vfree(dev->lk_table.table);
+ err_lk:
+ 	kfree(dev->qp_table);
+ err_qpt:
+@@ -2340,8 +2347,7 @@ void qib_unregister_ib_device(struct qib_devdata *dd)
+ 					sizeof(struct qib_pio_header),
+ 				  dev->pio_hdrs, dev->pio_hdrs_phys);
+ 	lk_tab_size = dev->lk_table.max * sizeof(*dev->lk_table.table);
+-	free_pages((unsigned long) dev->lk_table.table,
+-		   get_order(lk_tab_size));
++	vfree(dev->lk_table.table);
+ 	kfree(dev->qp_table);
+ }
+ 
+diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
+index 1635572752ce..bce0fa596b4d 100644
+--- a/drivers/infiniband/hw/qib/qib_verbs.h
++++ b/drivers/infiniband/hw/qib/qib_verbs.h
+@@ -647,6 +647,8 @@ struct qib_qpn_table {
+ 	struct qpn_map map[QPNMAP_ENTRIES];
+ };
+ 
++#define MAX_LKEY_TABLE_BITS 23
++
+ struct qib_lkey_table {
+ 	spinlock_t lock; /* protect changes in this struct */
+ 	u32 next;               /* next unused index (speeds search) */
+diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
+index 6a594aac2290..c933d882c35c 100644
+--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
++++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
+@@ -201,6 +201,7 @@ iser_initialize_task_headers(struct iscsi_task *task,
+ 		goto out;
+ 	}
+ 
++	tx_desc->mapped = true;
+ 	tx_desc->dma_addr = dma_addr;
+ 	tx_desc->tx_sg[0].addr   = tx_desc->dma_addr;
+ 	tx_desc->tx_sg[0].length = ISER_HEADERS_LEN;
+@@ -360,16 +361,19 @@ iscsi_iser_task_xmit(struct iscsi_task *task)
+ static void iscsi_iser_cleanup_task(struct iscsi_task *task)
+ {
+ 	struct iscsi_iser_task *iser_task = task->dd_data;
+-	struct iser_tx_desc    *tx_desc   = &iser_task->desc;
+-	struct iser_conn       *iser_conn	  = task->conn->dd_data;
++	struct iser_tx_desc *tx_desc = &iser_task->desc;
++	struct iser_conn *iser_conn = task->conn->dd_data;
+ 	struct iser_device *device = iser_conn->ib_conn.device;
+ 
+ 	/* DEVICE_REMOVAL event might have already released the device */
+ 	if (!device)
+ 		return;
+ 
+-	ib_dma_unmap_single(device->ib_device,
+-		tx_desc->dma_addr, ISER_HEADERS_LEN, DMA_TO_DEVICE);
++	if (likely(tx_desc->mapped)) {
++		ib_dma_unmap_single(device->ib_device, tx_desc->dma_addr,
++				    ISER_HEADERS_LEN, DMA_TO_DEVICE);
++		tx_desc->mapped = false;
++	}
+ 
+ 	/* mgmt tasks do not need special cleanup */
+ 	if (!task->sc)
+diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
+index 262ba1f8ee50..d2b6caf7694d 100644
+--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
++++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
+@@ -270,6 +270,7 @@ enum iser_desc_type {
+  *                 sg[1] optionally points to either of immediate data
+  *                 unsolicited data-out or control
+  * @num_sge:       number sges used on this TX task
++ * @mapped:        Is the task header mapped
+  */
+ struct iser_tx_desc {
+ 	struct iser_hdr              iser_header;
+@@ -278,6 +279,7 @@ struct iser_tx_desc {
+ 	u64		             dma_addr;
+ 	struct ib_sge		     tx_sg[2];
+ 	int                          num_sge;
++	bool			     mapped;
+ };
+ 
+ #define ISER_RX_PAD_SIZE	(256 - (ISER_RX_PAYLOAD_SIZE + \
+diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
+index 3e2118e8ed87..0a47f42fec24 100644
+--- a/drivers/infiniband/ulp/iser/iser_initiator.c
++++ b/drivers/infiniband/ulp/iser/iser_initiator.c
+@@ -454,7 +454,7 @@ int iser_send_data_out(struct iscsi_conn *conn,
+ 	unsigned long buf_offset;
+ 	unsigned long data_seg_len;
+ 	uint32_t itt;
+-	int err = 0;
++	int err;
+ 	struct ib_sge *tx_dsg;
+ 
+ 	itt = (__force uint32_t)hdr->itt;
+@@ -475,7 +475,9 @@ int iser_send_data_out(struct iscsi_conn *conn,
+ 	memcpy(&tx_desc->iscsi_header, hdr, sizeof(struct iscsi_hdr));
+ 
+ 	/* build the tx desc */
+-	iser_initialize_task_headers(task, tx_desc);
++	err = iser_initialize_task_headers(task, tx_desc);
++	if (err)
++		goto send_data_out_error;
+ 
+ 	mem_reg = &iser_task->rdma_reg[ISER_DIR_OUT];
+ 	tx_dsg = &tx_desc->tx_sg[1];
+@@ -502,7 +504,7 @@ int iser_send_data_out(struct iscsi_conn *conn,
+ 
+ send_data_out_error:
+ 	kmem_cache_free(ig.desc_cache, tx_desc);
+-	iser_err("conn %p failed err %d\n",conn, err);
++	iser_err("conn %p failed err %d\n", conn, err);
+ 	return err;
+ }
+ 
+diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
+index 31a20b462266..ffda44ff9375 100644
+--- a/drivers/infiniband/ulp/srp/ib_srp.c
++++ b/drivers/infiniband/ulp/srp/ib_srp.c
+@@ -2757,6 +2757,13 @@ static int srp_sdev_count(struct Scsi_Host *host)
+ 	return c;
+ }
+ 
++/*
++ * Return values:
++ * < 0 upon failure. Caller is responsible for SRP target port cleanup.
++ * 0 and target->state == SRP_TARGET_REMOVED if asynchronous target port
++ *    removal has been scheduled.
++ * 0 and target->state != SRP_TARGET_REMOVED upon success.
++ */
+ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
+ {
+ 	struct srp_rport_identifiers ids;
+@@ -3262,7 +3269,7 @@ static ssize_t srp_create_target(struct device *dev,
+ 					srp_free_ch_ib(target, ch);
+ 					srp_free_req_data(target, ch);
+ 					target->ch_count = ch - target->ch;
+-					break;
++					goto connected;
+ 				}
+ 			}
+ 
+@@ -3272,6 +3279,7 @@ static ssize_t srp_create_target(struct device *dev,
+ 		node_idx++;
+ 	}
+ 
++connected:
+ 	target->scsi_host->nr_hw_queues = target->ch_count;
+ 
+ 	ret = srp_add_target(host, target);
+@@ -3294,6 +3302,8 @@ out:
+ 	mutex_unlock(&host->add_target_mutex);
+ 
+ 	scsi_host_put(target->scsi_host);
++	if (ret < 0)
++		scsi_host_put(target->scsi_host);
+ 
+ 	return ret;
+ 
+diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
+index 9d35499faca4..08d496411f75 100644
+--- a/drivers/input/evdev.c
++++ b/drivers/input/evdev.c
+@@ -290,19 +290,14 @@ static int evdev_flush(struct file *file, fl_owner_t id)
+ {
+ 	struct evdev_client *client = file->private_data;
+ 	struct evdev *evdev = client->evdev;
+-	int retval;
+ 
+-	retval = mutex_lock_interruptible(&evdev->mutex);
+-	if (retval)
+-		return retval;
++	mutex_lock(&evdev->mutex);
+ 
+-	if (!evdev->exist || client->revoked)
+-		retval = -ENODEV;
+-	else
+-		retval = input_flush_device(&evdev->handle, file);
++	if (evdev->exist && !client->revoked)
++		input_flush_device(&evdev->handle, file);
+ 
+ 	mutex_unlock(&evdev->mutex);
+-	return retval;
++	return 0;
+ }
+ 
+ static void evdev_free(struct device *dev)
+diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
+index abeedc9a78c2..2570f2a25dc4 100644
+--- a/drivers/iommu/fsl_pamu.c
++++ b/drivers/iommu/fsl_pamu.c
+@@ -41,7 +41,6 @@ struct pamu_isr_data {
+ 
+ static struct paace *ppaact;
+ static struct paace *spaact;
+-static struct ome *omt __initdata;
+ 
+ /*
+  * Table for matching compatible strings, for device tree
+@@ -50,7 +49,7 @@ static struct ome *omt __initdata;
+  * SOCs. For the older SOCs "fsl,qoriq-device-config-1.0"
+  * string would be used.
+  */
+-static const struct of_device_id guts_device_ids[] __initconst = {
++static const struct of_device_id guts_device_ids[] = {
+ 	{ .compatible = "fsl,qoriq-device-config-1.0", },
+ 	{ .compatible = "fsl,qoriq-device-config-2.0", },
+ 	{}
+@@ -599,7 +598,7 @@ found_cpu_node:
+  * Memory accesses to QMAN and BMAN private memory need not be coherent, so
+  * clear the PAACE entry coherency attribute for them.
+  */
+-static void __init setup_qbman_paace(struct paace *ppaace, int  paace_type)
++static void setup_qbman_paace(struct paace *ppaace, int  paace_type)
+ {
+ 	switch (paace_type) {
+ 	case QMAN_PAACE:
+@@ -629,7 +628,7 @@ static void __init setup_qbman_paace(struct paace *ppaace, int  paace_type)
+  * this table to translate device transaction to appropriate corenet
+  * transaction.
+  */
+-static void __init setup_omt(struct ome *omt)
++static void setup_omt(struct ome *omt)
+ {
+ 	struct ome *ome;
+ 
+@@ -666,7 +665,7 @@ static void __init setup_omt(struct ome *omt)
+  * Get the maximum number of PAACT table entries
+  * and subwindows supported by PAMU
+  */
+-static void __init get_pamu_cap_values(unsigned long pamu_reg_base)
++static void get_pamu_cap_values(unsigned long pamu_reg_base)
+ {
+ 	u32 pc_val;
+ 
+@@ -676,9 +675,9 @@ static void __init get_pamu_cap_values(unsigned long pamu_reg_base)
+ }
+ 
+ /* Setup PAMU registers pointing to PAACT, SPAACT and OMT */
+-static int __init setup_one_pamu(unsigned long pamu_reg_base, unsigned long pamu_reg_size,
+-				 phys_addr_t ppaact_phys, phys_addr_t spaact_phys,
+-				 phys_addr_t omt_phys)
++static int setup_one_pamu(unsigned long pamu_reg_base, unsigned long pamu_reg_size,
++			  phys_addr_t ppaact_phys, phys_addr_t spaact_phys,
++			  phys_addr_t omt_phys)
+ {
+ 	u32 *pc;
+ 	struct pamu_mmap_regs *pamu_regs;
+@@ -720,7 +719,7 @@ static int __init setup_one_pamu(unsigned long pamu_reg_base, unsigned long pamu
+ }
+ 
+ /* Enable all device LIODNS */
+-static void __init setup_liodns(void)
++static void setup_liodns(void)
+ {
+ 	int i, len;
+ 	struct paace *ppaace;
+@@ -846,7 +845,7 @@ struct ccsr_law {
+ /*
+  * Create a coherence subdomain for a given memory block.
+  */
+-static int __init create_csd(phys_addr_t phys, size_t size, u32 csd_port_id)
++static int create_csd(phys_addr_t phys, size_t size, u32 csd_port_id)
+ {
+ 	struct device_node *np;
+ 	const __be32 *iprop;
+@@ -988,7 +987,7 @@ error:
+ static const struct {
+ 	u32 svr;
+ 	u32 port_id;
+-} port_id_map[] __initconst = {
++} port_id_map[] = {
+ 	{(SVR_P2040 << 8) | 0x10, 0xFF000000},	/* P2040 1.0 */
+ 	{(SVR_P2040 << 8) | 0x11, 0xFF000000},	/* P2040 1.1 */
+ 	{(SVR_P2041 << 8) | 0x10, 0xFF000000},	/* P2041 1.0 */
+@@ -1006,7 +1005,7 @@ static const struct {
+ 
+ #define SVR_SECURITY	0x80000	/* The Security (E) bit */
+ 
+-static int __init fsl_pamu_probe(struct platform_device *pdev)
++static int fsl_pamu_probe(struct platform_device *pdev)
+ {
+ 	struct device *dev = &pdev->dev;
+ 	void __iomem *pamu_regs = NULL;
+@@ -1022,6 +1021,7 @@ static int __init fsl_pamu_probe(struct platform_device *pdev)
+ 	int irq;
+ 	phys_addr_t ppaact_phys;
+ 	phys_addr_t spaact_phys;
++	struct ome *omt;
+ 	phys_addr_t omt_phys;
+ 	size_t mem_size = 0;
+ 	unsigned int order = 0;
+@@ -1200,7 +1200,7 @@ error:
+ 	return ret;
+ }
+ 
+-static struct platform_driver fsl_of_pamu_driver __initdata = {
++static struct platform_driver fsl_of_pamu_driver = {
+ 	.driver = {
+ 		.name = "fsl-of-pamu",
+ 	},
+diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
+index 0649b94f5958..7553cb90627f 100644
+--- a/drivers/iommu/intel-iommu.c
++++ b/drivers/iommu/intel-iommu.c
+@@ -755,6 +755,7 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu
+ 	struct context_entry *context;
+ 	u64 *entry;
+ 
++	entry = &root->lo;
+ 	if (ecs_enabled(iommu)) {
+ 		if (devfn >= 0x80) {
+ 			devfn -= 0x80;
+@@ -762,7 +763,6 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu
+ 		}
+ 		devfn *= 2;
+ 	}
+-	entry = &root->lo;
+ 	if (*entry & 1)
+ 		context = phys_to_virt(*entry & VTD_PAGE_MASK);
+ 	else {
+diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
+index 4e460216bd16..e29d5d7fe220 100644
+--- a/drivers/iommu/io-pgtable-arm.c
++++ b/drivers/iommu/io-pgtable-arm.c
+@@ -200,6 +200,10 @@ typedef u64 arm_lpae_iopte;
+ 
+ static bool selftest_running = false;
+ 
++static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
++			    unsigned long iova, size_t size, int lvl,
++			    arm_lpae_iopte *ptep);
++
+ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
+ 			     unsigned long iova, phys_addr_t paddr,
+ 			     arm_lpae_iopte prot, int lvl,
+@@ -207,10 +211,21 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
+ {
+ 	arm_lpae_iopte pte = prot;
+ 
+-	/* We require an unmap first */
+ 	if (iopte_leaf(*ptep, lvl)) {
++		/* We require an unmap first */
+ 		WARN_ON(!selftest_running);
+ 		return -EEXIST;
++	} else if (iopte_type(*ptep, lvl) == ARM_LPAE_PTE_TYPE_TABLE) {
++		/*
++		 * We need to unmap and free the old table before
++		 * overwriting it with a block entry.
++		 */
++		arm_lpae_iopte *tblp;
++		size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
++
++		tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
++		if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp) != sz))
++			return -EINVAL;
+ 	}
+ 
+ 	if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
+diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
+index c1f2e521dc52..2cd439203d0f 100644
+--- a/drivers/iommu/tegra-smmu.c
++++ b/drivers/iommu/tegra-smmu.c
+@@ -27,6 +27,7 @@ struct tegra_smmu {
+ 	const struct tegra_smmu_soc *soc;
+ 
+ 	unsigned long pfn_mask;
++	unsigned long tlb_mask;
+ 
+ 	unsigned long *asids;
+ 	struct mutex lock;
+@@ -68,7 +69,8 @@ static inline u32 smmu_readl(struct tegra_smmu *smmu, unsigned long offset)
+ #define SMMU_TLB_CONFIG 0x14
+ #define  SMMU_TLB_CONFIG_HIT_UNDER_MISS (1 << 29)
+ #define  SMMU_TLB_CONFIG_ROUND_ROBIN_ARBITRATION (1 << 28)
+-#define  SMMU_TLB_CONFIG_ACTIVE_LINES(x) ((x) & 0x3f)
++#define  SMMU_TLB_CONFIG_ACTIVE_LINES(smmu) \
++	((smmu)->soc->num_tlb_lines & (smmu)->tlb_mask)
+ 
+ #define SMMU_PTC_CONFIG 0x18
+ #define  SMMU_PTC_CONFIG_ENABLE (1 << 29)
+@@ -816,6 +818,9 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
+ 	smmu->pfn_mask = BIT_MASK(mc->soc->num_address_bits - PAGE_SHIFT) - 1;
+ 	dev_dbg(dev, "address bits: %u, PFN mask: %#lx\n",
+ 		mc->soc->num_address_bits, smmu->pfn_mask);
++	smmu->tlb_mask = (smmu->soc->num_tlb_lines << 1) - 1;
++	dev_dbg(dev, "TLB lines: %u, mask: %#lx\n", smmu->soc->num_tlb_lines,
++		smmu->tlb_mask);
+ 
+ 	value = SMMU_PTC_CONFIG_ENABLE | SMMU_PTC_CONFIG_INDEX_MAP(0x3f);
+ 
+@@ -825,7 +830,7 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
+ 	smmu_writel(smmu, value, SMMU_PTC_CONFIG);
+ 
+ 	value = SMMU_TLB_CONFIG_HIT_UNDER_MISS |
+-		SMMU_TLB_CONFIG_ACTIVE_LINES(0x20);
++		SMMU_TLB_CONFIG_ACTIVE_LINES(smmu);
+ 
+ 	if (soc->supports_round_robin_arbitration)
+ 		value |= SMMU_TLB_CONFIG_ROUND_ROBIN_ARBITRATION;
+diff --git a/drivers/media/platform/am437x/am437x-vpfe.c b/drivers/media/platform/am437x/am437x-vpfe.c
+index 1fba339cddc1..c8447fa3fd91 100644
+--- a/drivers/media/platform/am437x/am437x-vpfe.c
++++ b/drivers/media/platform/am437x/am437x-vpfe.c
+@@ -1186,14 +1186,24 @@ static int vpfe_initialize_device(struct vpfe_device *vpfe)
+ static int vpfe_release(struct file *file)
+ {
+ 	struct vpfe_device *vpfe = video_drvdata(file);
++	bool fh_singular;
+ 	int ret;
+ 
+ 	mutex_lock(&vpfe->lock);
+ 
+-	if (v4l2_fh_is_singular_file(file))
+-		vpfe_ccdc_close(&vpfe->ccdc, vpfe->pdev);
++	/* Save the singular status before we call the clean-up helper */
++	fh_singular = v4l2_fh_is_singular_file(file);
++
++	/* the release helper will cleanup any on-going streaming */
+ 	ret = _vb2_fop_release(file, NULL);
+ 
++	/*
++	 * If this was the last open file.
++	 * Then de-initialize hw module.
++	 */
++	if (fh_singular)
++		vpfe_ccdc_close(&vpfe->ccdc, vpfe->pdev);
++
+ 	mutex_unlock(&vpfe->lock);
+ 
+ 	return ret;
+@@ -1565,7 +1575,7 @@ static int vpfe_s_fmt(struct file *file, void *priv,
+ 		return -EBUSY;
+ 	}
+ 
+-	ret = vpfe_try_fmt(file, priv, fmt);
++	ret = vpfe_try_fmt(file, priv, &format);
+ 	if (ret)
+ 		return ret;
+ 
+diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c
+index 18d0a871747f..12be830d704f 100644
+--- a/drivers/media/platform/omap3isp/isp.c
++++ b/drivers/media/platform/omap3isp/isp.c
+@@ -829,14 +829,14 @@ static int isp_pipeline_link_notify(struct media_link *link, u32 flags,
+ 	int ret;
+ 
+ 	if (notification == MEDIA_DEV_NOTIFY_POST_LINK_CH &&
+-	    !(link->flags & MEDIA_LNK_FL_ENABLED)) {
++	    !(flags & MEDIA_LNK_FL_ENABLED)) {
+ 		/* Powering off entities is assumed to never fail. */
+ 		isp_pipeline_pm_power(source, -sink_use);
+ 		isp_pipeline_pm_power(sink, -source_use);
+ 		return 0;
+ 	}
+ 
+-	if (notification == MEDIA_DEV_NOTIFY_POST_LINK_CH &&
++	if (notification == MEDIA_DEV_NOTIFY_PRE_LINK_CH &&
+ 		(flags & MEDIA_LNK_FL_ENABLED)) {
+ 
+ 		ret = isp_pipeline_pm_power(source, sink_use);
+@@ -2000,10 +2000,8 @@ static int isp_register_entities(struct isp_device *isp)
+ 	ret = v4l2_device_register_subdev_nodes(&isp->v4l2_dev);
+ 
+ done:
+-	if (ret < 0) {
++	if (ret < 0)
+ 		isp_unregister_entities(isp);
+-		v4l2_async_notifier_unregister(&isp->notifier);
+-	}
+ 
+ 	return ret;
+ }
+@@ -2423,10 +2421,6 @@ static int isp_probe(struct platform_device *pdev)
+ 		ret = isp_of_parse_nodes(&pdev->dev, &isp->notifier);
+ 		if (ret < 0)
+ 			return ret;
+-		ret = v4l2_async_notifier_register(&isp->v4l2_dev,
+-						   &isp->notifier);
+-		if (ret)
+-			return ret;
+ 	} else {
+ 		isp->pdata = pdev->dev.platform_data;
+ 		isp->syscon = syscon_regmap_lookup_by_pdevname("syscon.0");
+@@ -2557,18 +2551,27 @@ static int isp_probe(struct platform_device *pdev)
+ 	if (ret < 0)
+ 		goto error_iommu;
+ 
+-	isp->notifier.bound = isp_subdev_notifier_bound;
+-	isp->notifier.complete = isp_subdev_notifier_complete;
+-
+ 	ret = isp_register_entities(isp);
+ 	if (ret < 0)
+ 		goto error_modules;
+ 
++	if (IS_ENABLED(CONFIG_OF) && pdev->dev.of_node) {
++		isp->notifier.bound = isp_subdev_notifier_bound;
++		isp->notifier.complete = isp_subdev_notifier_complete;
++
++		ret = v4l2_async_notifier_register(&isp->v4l2_dev,
++						   &isp->notifier);
++		if (ret)
++			goto error_register_entities;
++	}
++
+ 	isp_core_init(isp, 1);
+ 	omap3isp_put(isp);
+ 
+ 	return 0;
+ 
++error_register_entities:
++	isp_unregister_entities(isp);
+ error_modules:
+ 	isp_cleanup_modules(isp);
+ error_iommu:
+diff --git a/drivers/media/platform/xilinx/xilinx-dma.c b/drivers/media/platform/xilinx/xilinx-dma.c
+index 98e50e446d57..e779c93cb015 100644
+--- a/drivers/media/platform/xilinx/xilinx-dma.c
++++ b/drivers/media/platform/xilinx/xilinx-dma.c
+@@ -699,8 +699,10 @@ int xvip_dma_init(struct xvip_composite_device *xdev, struct xvip_dma *dma,
+ 
+ 	/* ... and the buffers queue... */
+ 	dma->alloc_ctx = vb2_dma_contig_init_ctx(dma->xdev->dev);
+-	if (IS_ERR(dma->alloc_ctx))
++	if (IS_ERR(dma->alloc_ctx)) {
++		ret = PTR_ERR(dma->alloc_ctx);
+ 		goto error;
++	}
+ 
+ 	/* Don't enable VB2_READ and VB2_WRITE, as using the read() and write()
+ 	 * V4L2 APIs would be inefficient. Testing on the command line with a
+diff --git a/drivers/media/rc/rc-main.c b/drivers/media/rc/rc-main.c
+index 0ff388a16168..f3b6b2caabf6 100644
+--- a/drivers/media/rc/rc-main.c
++++ b/drivers/media/rc/rc-main.c
+@@ -1191,9 +1191,6 @@ static int rc_dev_uevent(struct device *device, struct kobj_uevent_env *env)
+ {
+ 	struct rc_dev *dev = to_rc_dev(device);
+ 
+-	if (!dev || !dev->input_dev)
+-		return -ENODEV;
+-
+ 	if (dev->rc_map.name)
+ 		ADD_HOTPLUG_VAR("NAME=%s", dev->rc_map.name);
+ 	if (dev->driver_name)
+diff --git a/drivers/memory/tegra/tegra114.c b/drivers/memory/tegra/tegra114.c
+index 9f579589e800..9bf11ea90549 100644
+--- a/drivers/memory/tegra/tegra114.c
++++ b/drivers/memory/tegra/tegra114.c
+@@ -935,6 +935,7 @@ static const struct tegra_smmu_soc tegra114_smmu_soc = {
+ 	.num_swgroups = ARRAY_SIZE(tegra114_swgroups),
+ 	.supports_round_robin_arbitration = false,
+ 	.supports_request_limit = false,
++	.num_tlb_lines = 32,
+ 	.num_asids = 4,
+ 	.ops = &tegra114_smmu_ops,
+ };
+diff --git a/drivers/memory/tegra/tegra124.c b/drivers/memory/tegra/tegra124.c
+index 966e1557e6f4..70ed80d23431 100644
+--- a/drivers/memory/tegra/tegra124.c
++++ b/drivers/memory/tegra/tegra124.c
+@@ -1023,6 +1023,7 @@ static const struct tegra_smmu_soc tegra124_smmu_soc = {
+ 	.num_swgroups = ARRAY_SIZE(tegra124_swgroups),
+ 	.supports_round_robin_arbitration = true,
+ 	.supports_request_limit = true,
++	.num_tlb_lines = 32,
+ 	.num_asids = 128,
+ 	.ops = &tegra124_smmu_ops,
+ };
+diff --git a/drivers/memory/tegra/tegra30.c b/drivers/memory/tegra/tegra30.c
+index 1abcd8f6f3ba..b2a34fefabef 100644
+--- a/drivers/memory/tegra/tegra30.c
++++ b/drivers/memory/tegra/tegra30.c
+@@ -957,6 +957,7 @@ static const struct tegra_smmu_soc tegra30_smmu_soc = {
+ 	.num_swgroups = ARRAY_SIZE(tegra30_swgroups),
+ 	.supports_round_robin_arbitration = false,
+ 	.supports_request_limit = false,
++	.num_tlb_lines = 16,
+ 	.num_asids = 4,
+ 	.ops = &tegra30_smmu_ops,
+ };
+diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
+index 729e0851167d..4224a6acf4c4 100644
+--- a/drivers/misc/cxl/api.c
++++ b/drivers/misc/cxl/api.c
+@@ -59,7 +59,7 @@ EXPORT_SYMBOL_GPL(cxl_get_phys_dev);
+ 
+ int cxl_release_context(struct cxl_context *ctx)
+ {
+-	if (ctx->status != CLOSED)
++	if (ctx->status >= STARTED)
+ 		return -EBUSY;
+ 
+ 	put_device(&ctx->afu->dev);
+diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
+index 32ad09705949..dc836071c633 100644
+--- a/drivers/misc/cxl/pci.c
++++ b/drivers/misc/cxl/pci.c
+@@ -851,16 +851,9 @@ int cxl_reset(struct cxl *adapter)
+ {
+ 	struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+ 	int rc;
+-	int i;
+-	u32 val;
+ 
+ 	dev_info(&dev->dev, "CXL reset\n");
+ 
+-	for (i = 0; i < adapter->slices; i++) {
+-		cxl_pci_vphb_remove(adapter->afu[i]);
+-		cxl_remove_afu(adapter->afu[i]);
+-	}
+-
+ 	/* pcie_warm_reset requests a fundamental pci reset which includes a
+ 	 * PERST assert/deassert.  PERST triggers a loading of the image
+ 	 * if "user" or "factory" is selected in sysfs */
+@@ -869,20 +862,6 @@ int cxl_reset(struct cxl *adapter)
+ 		return rc;
+ 	}
+ 
+-	/* the PERST done above fences the PHB.  So, reset depends on EEH
+-	 * to unbind the driver, tell Sapphire to reinit the PHB, and rebind
+-	 * the driver.  Do an mmio read explictly to ensure EEH notices the
+-	 * fenced PHB.  Retry for a few seconds before giving up. */
+-	i = 0;
+-	while (((val = mmio_read32be(adapter->p1_mmio)) != 0xffffffff) &&
+-		(i < 5)) {
+-		msleep(500);
+-		i++;
+-	}
+-
+-	if (val != 0xffffffff)
+-		dev_err(&dev->dev, "cxl: PERST failed to trigger EEH\n");
+-
+ 	return rc;
+ }
+ 
+@@ -1140,8 +1119,6 @@ static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
+ 	int slice;
+ 	int rc;
+ 
+-	pci_dev_get(dev);
+-
+ 	if (cxl_verbose)
+ 		dump_cxl_config_space(dev);
+ 
+diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
+index 9ad73f30f744..9e3fdbdc4037 100644
+--- a/drivers/mmc/core/core.c
++++ b/drivers/mmc/core/core.c
+@@ -358,8 +358,10 @@ EXPORT_SYMBOL(mmc_start_bkops);
+  */
+ static void mmc_wait_data_done(struct mmc_request *mrq)
+ {
+-	mrq->host->context_info.is_done_rcv = true;
+-	wake_up_interruptible(&mrq->host->context_info.wait);
++	struct mmc_context_info *context_info = &mrq->host->context_info;
++
++	context_info->is_done_rcv = true;
++	wake_up_interruptible(&context_info->wait);
+ }
+ 
+ static void mmc_wait_done(struct mmc_request *mrq)
+diff --git a/drivers/mmc/host/sdhci-of-esdhc.c b/drivers/mmc/host/sdhci-of-esdhc.c
+index 797be7549a15..653f335bef15 100644
+--- a/drivers/mmc/host/sdhci-of-esdhc.c
++++ b/drivers/mmc/host/sdhci-of-esdhc.c
+@@ -208,6 +208,12 @@ static void esdhc_of_set_clock(struct sdhci_host *host, unsigned int clock)
+ 	if (clock == 0)
+ 		return;
+ 
++	/* Workaround to start pre_div at 2 for VNN < VENDOR_V_23 */
++	temp = esdhc_readw(host, SDHCI_HOST_VERSION);
++	temp = (temp & SDHCI_VENDOR_VER_MASK) >> SDHCI_VENDOR_VER_SHIFT;
++	if (temp < VENDOR_V_23)
++		pre_div = 2;
++
+ 	/* Workaround to reduce the clock frequency for p1010 esdhc */
+ 	if (of_find_compatible_node(NULL, NULL, "fsl,p1010-esdhc")) {
+ 		if (clock > 20000000)
+diff --git a/drivers/mmc/host/sdhci-pci.c b/drivers/mmc/host/sdhci-pci.c
+index 94f54d2772e8..b3b0a3e4fca1 100644
+--- a/drivers/mmc/host/sdhci-pci.c
++++ b/drivers/mmc/host/sdhci-pci.c
+@@ -618,6 +618,7 @@ static int jmicron_resume(struct sdhci_pci_chip *chip)
+ static const struct sdhci_pci_fixes sdhci_o2 = {
+ 	.probe = sdhci_pci_o2_probe,
+ 	.quirks = SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC,
++	.quirks2 = SDHCI_QUIRK2_CLEAR_TRANSFERMODE_REG_BEFORE_CMD,
+ 	.probe_slot = sdhci_pci_o2_probe_slot,
+ 	.resume = sdhci_pci_o2_resume,
+ };
+diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
+index 1dbe93232030..b0c915a35a9e 100644
+--- a/drivers/mmc/host/sdhci.c
++++ b/drivers/mmc/host/sdhci.c
+@@ -54,8 +54,7 @@ static void sdhci_finish_command(struct sdhci_host *);
+ static int sdhci_execute_tuning(struct mmc_host *mmc, u32 opcode);
+ static void sdhci_enable_preset_value(struct sdhci_host *host, bool enable);
+ static int sdhci_pre_dma_transfer(struct sdhci_host *host,
+-					struct mmc_data *data,
+-					struct sdhci_host_next *next);
++					struct mmc_data *data);
+ static int sdhci_do_get_cd(struct sdhci_host *host);
+ 
+ #ifdef CONFIG_PM
+@@ -496,7 +495,7 @@ static int sdhci_adma_table_pre(struct sdhci_host *host,
+ 		goto fail;
+ 	BUG_ON(host->align_addr & host->align_mask);
+ 
+-	host->sg_count = sdhci_pre_dma_transfer(host, data, NULL);
++	host->sg_count = sdhci_pre_dma_transfer(host, data);
+ 	if (host->sg_count < 0)
+ 		goto unmap_align;
+ 
+@@ -635,9 +634,11 @@ static void sdhci_adma_table_post(struct sdhci_host *host,
+ 		}
+ 	}
+ 
+-	if (!data->host_cookie)
++	if (data->host_cookie == COOKIE_MAPPED) {
+ 		dma_unmap_sg(mmc_dev(host->mmc), data->sg,
+ 			data->sg_len, direction);
++		data->host_cookie = COOKIE_UNMAPPED;
++	}
+ }
+ 
+ static u8 sdhci_calc_timeout(struct sdhci_host *host, struct mmc_command *cmd)
+@@ -833,7 +834,7 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
+ 		} else {
+ 			int sg_cnt;
+ 
+-			sg_cnt = sdhci_pre_dma_transfer(host, data, NULL);
++			sg_cnt = sdhci_pre_dma_transfer(host, data);
+ 			if (sg_cnt <= 0) {
+ 				/*
+ 				 * This only happens when someone fed
+@@ -949,11 +950,13 @@ static void sdhci_finish_data(struct sdhci_host *host)
+ 		if (host->flags & SDHCI_USE_ADMA)
+ 			sdhci_adma_table_post(host, data);
+ 		else {
+-			if (!data->host_cookie)
++			if (data->host_cookie == COOKIE_MAPPED) {
+ 				dma_unmap_sg(mmc_dev(host->mmc),
+ 					data->sg, data->sg_len,
+ 					(data->flags & MMC_DATA_READ) ?
+ 					DMA_FROM_DEVICE : DMA_TO_DEVICE);
++				data->host_cookie = COOKIE_UNMAPPED;
++			}
+ 		}
+ 	}
+ 
+@@ -1132,6 +1135,7 @@ static u16 sdhci_get_preset_value(struct sdhci_host *host)
+ 		preset = sdhci_readw(host, SDHCI_PRESET_FOR_SDR104);
+ 		break;
+ 	case MMC_TIMING_UHS_DDR50:
++	case MMC_TIMING_MMC_DDR52:
+ 		preset = sdhci_readw(host, SDHCI_PRESET_FOR_DDR50);
+ 		break;
+ 	case MMC_TIMING_MMC_HS400:
+@@ -1559,7 +1563,8 @@ static void sdhci_do_set_ios(struct sdhci_host *host, struct mmc_ios *ios)
+ 				 (ios->timing == MMC_TIMING_UHS_SDR25) ||
+ 				 (ios->timing == MMC_TIMING_UHS_SDR50) ||
+ 				 (ios->timing == MMC_TIMING_UHS_SDR104) ||
+-				 (ios->timing == MMC_TIMING_UHS_DDR50))) {
++				 (ios->timing == MMC_TIMING_UHS_DDR50) ||
++				 (ios->timing == MMC_TIMING_MMC_DDR52))) {
+ 			u16 preset;
+ 
+ 			sdhci_enable_preset_value(host, true);
+@@ -2097,49 +2102,36 @@ static void sdhci_post_req(struct mmc_host *mmc, struct mmc_request *mrq,
+ 	struct mmc_data *data = mrq->data;
+ 
+ 	if (host->flags & SDHCI_REQ_USE_DMA) {
+-		if (data->host_cookie)
++		if (data->host_cookie == COOKIE_GIVEN ||
++				data->host_cookie == COOKIE_MAPPED)
+ 			dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
+ 					 data->flags & MMC_DATA_WRITE ?
+ 					 DMA_TO_DEVICE : DMA_FROM_DEVICE);
+-		mrq->data->host_cookie = 0;
++		data->host_cookie = COOKIE_UNMAPPED;
+ 	}
+ }
+ 
+ static int sdhci_pre_dma_transfer(struct sdhci_host *host,
+-				       struct mmc_data *data,
+-				       struct sdhci_host_next *next)
++				       struct mmc_data *data)
+ {
+ 	int sg_count;
+ 
+-	if (!next && data->host_cookie &&
+-	    data->host_cookie != host->next_data.cookie) {
+-		pr_debug(DRIVER_NAME "[%s] invalid cookie: %d, next-cookie %d\n",
+-			__func__, data->host_cookie, host->next_data.cookie);
+-		data->host_cookie = 0;
++	if (data->host_cookie == COOKIE_MAPPED) {
++		data->host_cookie = COOKIE_GIVEN;
++		return data->sg_count;
+ 	}
+ 
+-	/* Check if next job is already prepared */
+-	if (next ||
+-	    (!next && data->host_cookie != host->next_data.cookie)) {
+-		sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg,
+-				     data->sg_len,
+-				     data->flags & MMC_DATA_WRITE ?
+-				     DMA_TO_DEVICE : DMA_FROM_DEVICE);
+-
+-	} else {
+-		sg_count = host->next_data.sg_count;
+-		host->next_data.sg_count = 0;
+-	}
++	WARN_ON(data->host_cookie == COOKIE_GIVEN);
+ 
++	sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
++				data->flags & MMC_DATA_WRITE ?
++				DMA_TO_DEVICE : DMA_FROM_DEVICE);
+ 
+ 	if (sg_count == 0)
+-		return -EINVAL;
++		return -ENOSPC;
+ 
+-	if (next) {
+-		next->sg_count = sg_count;
+-		data->host_cookie = ++next->cookie < 0 ? 1 : next->cookie;
+-	} else
+-		host->sg_count = sg_count;
++	data->sg_count = sg_count;
++	data->host_cookie = COOKIE_MAPPED;
+ 
+ 	return sg_count;
+ }
+@@ -2149,16 +2141,10 @@ static void sdhci_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
+ {
+ 	struct sdhci_host *host = mmc_priv(mmc);
+ 
+-	if (mrq->data->host_cookie) {
+-		mrq->data->host_cookie = 0;
+-		return;
+-	}
++	mrq->data->host_cookie = COOKIE_UNMAPPED;
+ 
+ 	if (host->flags & SDHCI_REQ_USE_DMA)
+-		if (sdhci_pre_dma_transfer(host,
+-					mrq->data,
+-					&host->next_data) < 0)
+-			mrq->data->host_cookie = 0;
++		sdhci_pre_dma_transfer(host, mrq->data);
+ }
+ 
+ static void sdhci_card_event(struct mmc_host *mmc)
+@@ -3030,7 +3016,6 @@ int sdhci_add_host(struct sdhci_host *host)
+ 		host->max_clk = host->ops->get_max_clock(host);
+ 	}
+ 
+-	host->next_data.cookie = 1;
+ 	/*
+ 	 * In case of Host Controller v3.00, find out whether clock
+ 	 * multiplier is supported.
+diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
+index 5521d29368e4..a9512a421f52 100644
+--- a/drivers/mmc/host/sdhci.h
++++ b/drivers/mmc/host/sdhci.h
+@@ -309,9 +309,10 @@ struct sdhci_adma2_64_desc {
+  */
+ #define SDHCI_MAX_SEGS		128
+ 
+-struct sdhci_host_next {
+-	unsigned int	sg_count;
+-	s32		cookie;
++enum sdhci_cookie {
++	COOKIE_UNMAPPED,
++	COOKIE_MAPPED,
++	COOKIE_GIVEN,
+ };
+ 
+ struct sdhci_host {
+@@ -503,7 +504,6 @@ struct sdhci_host {
+ 	unsigned int		tuning_mode;	/* Re-tuning mode supported by host */
+ #define SDHCI_TUNING_MODE_1	0
+ 
+-	struct sdhci_host_next	next_data;
+ 	unsigned long private[0] ____cacheline_aligned;
+ };
+ 
+diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
+index 73c934cf6c61..79789d8e52da 100644
+--- a/drivers/net/ethernet/broadcom/tg3.c
++++ b/drivers/net/ethernet/broadcom/tg3.c
+@@ -10757,7 +10757,7 @@ static ssize_t tg3_show_temp(struct device *dev,
+ 	tg3_ape_scratchpad_read(tp, &temperature, attr->index,
+ 				sizeof(temperature));
+ 	spin_unlock_bh(&tp->lock);
+-	return sprintf(buf, "%u\n", temperature);
++	return sprintf(buf, "%u\n", temperature * 1000);
+ }
+ 
+ 
+diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
+index c2bd4f98a837..212d668dabb3 100644
+--- a/drivers/net/ethernet/intel/igb/igb.h
++++ b/drivers/net/ethernet/intel/igb/igb.h
+@@ -540,6 +540,7 @@ void igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, unsigned char *va,
+ 			 struct sk_buff *skb);
+ int igb_ptp_set_ts_config(struct net_device *netdev, struct ifreq *ifr);
+ int igb_ptp_get_ts_config(struct net_device *netdev, struct ifreq *ifr);
++void igb_set_flag_queue_pairs(struct igb_adapter *, const u32);
+ #ifdef CONFIG_IGB_HWMON
+ void igb_sysfs_exit(struct igb_adapter *adapter);
+ int igb_sysfs_init(struct igb_adapter *adapter);
+diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
+index d5673eb90c54..0afc0913e5b9 100644
+--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
++++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
+@@ -2991,6 +2991,7 @@ static int igb_set_channels(struct net_device *netdev,
+ {
+ 	struct igb_adapter *adapter = netdev_priv(netdev);
+ 	unsigned int count = ch->combined_count;
++	unsigned int max_combined = 0;
+ 
+ 	/* Verify they are not requesting separate vectors */
+ 	if (!count || ch->rx_count || ch->tx_count)
+@@ -3001,11 +3002,13 @@ static int igb_set_channels(struct net_device *netdev,
+ 		return -EINVAL;
+ 
+ 	/* Verify the number of channels doesn't exceed hw limits */
+-	if (count > igb_max_channels(adapter))
++	max_combined = igb_max_channels(adapter);
++	if (count > max_combined)
+ 		return -EINVAL;
+ 
+ 	if (count != adapter->rss_queues) {
+ 		adapter->rss_queues = count;
++		igb_set_flag_queue_pairs(adapter, max_combined);
+ 
+ 		/* Hardware has to reinitialize queues and interrupts to
+ 		 * match the new configuration.
+diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
+index 830466c49987..8d7b59689722 100644
+--- a/drivers/net/ethernet/intel/igb/igb_main.c
++++ b/drivers/net/ethernet/intel/igb/igb_main.c
+@@ -1205,10 +1205,14 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
+ 
+ 	/* allocate q_vector and rings */
+ 	q_vector = adapter->q_vector[v_idx];
+-	if (!q_vector)
++	if (!q_vector) {
+ 		q_vector = kzalloc(size, GFP_KERNEL);
+-	else
++	} else if (size > ksize(q_vector)) {
++		kfree_rcu(q_vector, rcu);
++		q_vector = kzalloc(size, GFP_KERNEL);
++	} else {
+ 		memset(q_vector, 0, size);
++	}
+ 	if (!q_vector)
+ 		return -ENOMEM;
+ 
+@@ -2888,6 +2892,14 @@ static void igb_init_queue_configuration(struct igb_adapter *adapter)
+ 
+ 	adapter->rss_queues = min_t(u32, max_rss_queues, num_online_cpus());
+ 
++	igb_set_flag_queue_pairs(adapter, max_rss_queues);
++}
++
++void igb_set_flag_queue_pairs(struct igb_adapter *adapter,
++			      const u32 max_rss_queues)
++{
++	struct e1000_hw *hw = &adapter->hw;
++
+ 	/* Determine if we need to pair queues. */
+ 	switch (hw->mac.type) {
+ 	case e1000_82575:
+diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+index 864b476f7fd5..925f2f8659b8 100644
+--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
++++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+@@ -837,8 +837,11 @@ static int stmmac_init_phy(struct net_device *dev)
+ 				     interface);
+ 	}
+ 
+-	if (IS_ERR(phydev)) {
++	if (IS_ERR_OR_NULL(phydev)) {
+ 		pr_err("%s: Could not attach to PHY\n", dev->name);
++		if (!phydev)
++			return -ENODEV;
++
+ 		return PTR_ERR(phydev);
+ 	}
+ 
+diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
+index 23806c243a53..fd4a5353d216 100644
+--- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
++++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
+@@ -321,6 +321,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = {
+ 	{RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/
+ 	{RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/
+ 	{RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/
++	{RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NG WNA1000Mv2*/
+ 	{RTL_USB_DEVICE(0x0b05, 0x17ba, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/
+ 	{RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CC&C*/
+ 	{RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/
+diff --git a/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c b/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c
+index 3236d44b459d..b7f18e2155eb 100644
+--- a/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c
++++ b/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c
+@@ -2180,7 +2180,7 @@ static int _rtl8821ae_set_media_status(struct ieee80211_hw *hw,
+ 
+ 	rtl_write_byte(rtlpriv, MSR, bt_msr);
+ 	rtlpriv->cfg->ops->led_control(hw, ledaction);
+-	if ((bt_msr & 0xfc) == MSR_AP)
++	if ((bt_msr & MSR_MASK) == MSR_AP)
+ 		rtl_write_byte(rtlpriv, REG_BCNTCFG + 1, 0x00);
+ 	else
+ 		rtl_write_byte(rtlpriv, REG_BCNTCFG + 1, 0x66);
+diff --git a/drivers/net/wireless/rtlwifi/rtl8821ae/reg.h b/drivers/net/wireless/rtlwifi/rtl8821ae/reg.h
+index 53668fc8f23e..1d6110f9c1fb 100644
+--- a/drivers/net/wireless/rtlwifi/rtl8821ae/reg.h
++++ b/drivers/net/wireless/rtlwifi/rtl8821ae/reg.h
+@@ -429,6 +429,7 @@
+ #define	MSR_ADHOC				0x01
+ #define	MSR_INFRA				0x02
+ #define	MSR_AP					0x03
++#define MSR_MASK				0x03
+ 
+ #define	RRSR_RSC_OFFSET				21
+ #define	RRSR_SHORT_OFFSET			23
+diff --git a/drivers/nfc/st-nci/i2c.c b/drivers/nfc/st-nci/i2c.c
+index 06175ce769bb..707ed2eb5936 100644
+--- a/drivers/nfc/st-nci/i2c.c
++++ b/drivers/nfc/st-nci/i2c.c
+@@ -25,15 +25,15 @@
+ #include <linux/interrupt.h>
+ #include <linux/delay.h>
+ #include <linux/nfc.h>
+-#include <linux/platform_data/st_nci.h>
++#include <linux/platform_data/st-nci.h>
+ 
+ #include "ndlc.h"
+ 
+-#define DRIVER_DESC "NCI NFC driver for ST21NFCB"
++#define DRIVER_DESC "NCI NFC driver for ST_NCI"
+ 
+ /* ndlc header */
+-#define ST21NFCB_FRAME_HEADROOM	1
+-#define ST21NFCB_FRAME_TAILROOM 0
++#define ST_NCI_FRAME_HEADROOM	1
++#define ST_NCI_FRAME_TAILROOM 0
+ 
+ #define ST_NCI_I2C_MIN_SIZE 4   /* PCB(1) + NCI Packet header(3) */
+ #define ST_NCI_I2C_MAX_SIZE 250 /* req 4.2.1 */
+@@ -118,15 +118,10 @@ static int st_nci_i2c_write(void *phy_id, struct sk_buff *skb)
+ /*
+  * Reads an ndlc frame and returns it in a newly allocated sk_buff.
+  * returns:
+- * frame size : if received frame is complete (find ST21NFCB_SOF_EOF at
+- * end of read)
+- * -EAGAIN : if received frame is incomplete (not find ST21NFCB_SOF_EOF
+- * at end of read)
++ * 0 : if received frame is complete
+  * -EREMOTEIO : i2c read error (fatal)
+  * -EBADMSG : frame was incorrect and discarded
+- * (value returned from st_nci_i2c_repack)
+- * -EIO : if no ST21NFCB_SOF_EOF is found after reaching
+- * the read length end sequence
++ * -ENOMEM : cannot allocate skb, frame dropped
+  */
+ static int st_nci_i2c_read(struct st_nci_i2c_phy *phy,
+ 				 struct sk_buff **skb)
+@@ -179,7 +174,7 @@ static int st_nci_i2c_read(struct st_nci_i2c_phy *phy,
+ /*
+  * Reads an ndlc frame from the chip.
+  *
+- * On ST21NFCB, IRQ goes in idle state when read starts.
++ * On ST_NCI, IRQ goes in idle state when read starts.
+  */
+ static irqreturn_t st_nci_irq_thread_fn(int irq, void *phy_id)
+ {
+@@ -325,12 +320,12 @@ static int st_nci_i2c_probe(struct i2c_client *client,
+ 		}
+ 	} else {
+ 		nfc_err(&client->dev,
+-			"st21nfcb platform resources not available\n");
++			"st_nci platform resources not available\n");
+ 		return -ENODEV;
+ 	}
+ 
+ 	r = ndlc_probe(phy, &i2c_phy_ops, &client->dev,
+-			ST21NFCB_FRAME_HEADROOM, ST21NFCB_FRAME_TAILROOM,
++			ST_NCI_FRAME_HEADROOM, ST_NCI_FRAME_TAILROOM,
+ 			&phy->ndlc);
+ 	if (r < 0) {
+ 		nfc_err(&client->dev, "Unable to register ndlc layer\n");
+diff --git a/drivers/nfc/st-nci/ndlc.c b/drivers/nfc/st-nci/ndlc.c
+index 56c6a4cb4c96..4f51649d0e75 100644
+--- a/drivers/nfc/st-nci/ndlc.c
++++ b/drivers/nfc/st-nci/ndlc.c
+@@ -171,6 +171,8 @@ static void llt_ndlc_rcv_queue(struct llt_ndlc *ndlc)
+ 		if ((pcb & PCB_TYPE_MASK) == PCB_TYPE_SUPERVISOR) {
+ 			switch (pcb & PCB_SYNC_MASK) {
+ 			case PCB_SYNC_ACK:
++				skb = skb_dequeue(&ndlc->ack_pending_q);
++				kfree_skb(skb);
+ 				del_timer_sync(&ndlc->t1_timer);
+ 				del_timer_sync(&ndlc->t2_timer);
+ 				ndlc->t2_active = false;
+@@ -196,8 +198,10 @@ static void llt_ndlc_rcv_queue(struct llt_ndlc *ndlc)
+ 				kfree_skb(skb);
+ 				break;
+ 			}
+-		} else {
++		} else if ((pcb & PCB_TYPE_MASK) == PCB_TYPE_DATAFRAME) {
+ 			nci_recv_frame(ndlc->ndev, skb);
++		} else {
++			kfree_skb(skb);
+ 		}
+ 	}
+ }
+diff --git a/drivers/nfc/st-nci/st-nci_se.c b/drivers/nfc/st-nci/st-nci_se.c
+index 97addfa96c6f..c742ef65a05a 100644
+--- a/drivers/nfc/st-nci/st-nci_se.c
++++ b/drivers/nfc/st-nci/st-nci_se.c
+@@ -189,14 +189,14 @@ int st_nci_hci_load_session(struct nci_dev *ndev)
+ 				ST_NCI_DEVICE_MGNT_GATE,
+ 				ST_NCI_DEVICE_MGNT_PIPE);
+ 	if (r < 0)
+-		goto free_info;
++		return r;
+ 
+ 	/* Get pipe list */
+ 	r = nci_hci_send_cmd(ndev, ST_NCI_DEVICE_MGNT_GATE,
+ 			ST_NCI_DM_GETINFO, pipe_list, sizeof(pipe_list),
+ 			&skb_pipe_list);
+ 	if (r < 0)
+-		goto free_info;
++		return r;
+ 
+ 	/* Complete the existing gate_pipe table */
+ 	for (i = 0; i < skb_pipe_list->len; i++) {
+@@ -222,6 +222,7 @@ int st_nci_hci_load_session(struct nci_dev *ndev)
+ 		    dm_pipe_info->src_host_id != ST_NCI_ESE_HOST_ID) {
+ 			pr_err("Unexpected apdu_reader pipe on host %x\n",
+ 			       dm_pipe_info->src_host_id);
++			kfree_skb(skb_pipe_info);
+ 			continue;
+ 		}
+ 
+@@ -241,13 +242,12 @@ int st_nci_hci_load_session(struct nci_dev *ndev)
+ 			ndev->hci_dev->pipes[st_nci_gates[j].pipe].host =
+ 						dm_pipe_info->src_host_id;
+ 		}
++		kfree_skb(skb_pipe_info);
+ 	}
+ 
+ 	memcpy(ndev->hci_dev->init_data.gates, st_nci_gates,
+ 	       sizeof(st_nci_gates));
+ 
+-free_info:
+-	kfree_skb(skb_pipe_info);
+ 	kfree_skb(skb_pipe_list);
+ 	return r;
+ }
+diff --git a/drivers/nfc/st21nfca/st21nfca.c b/drivers/nfc/st21nfca/st21nfca.c
+index d251f7229c4e..051286562fab 100644
+--- a/drivers/nfc/st21nfca/st21nfca.c
++++ b/drivers/nfc/st21nfca/st21nfca.c
+@@ -148,14 +148,14 @@ static int st21nfca_hci_load_session(struct nfc_hci_dev *hdev)
+ 				ST21NFCA_DEVICE_MGNT_GATE,
+ 				ST21NFCA_DEVICE_MGNT_PIPE);
+ 	if (r < 0)
+-		goto free_info;
++		return r;
+ 
+ 	/* Get pipe list */
+ 	r = nfc_hci_send_cmd(hdev, ST21NFCA_DEVICE_MGNT_GATE,
+ 			ST21NFCA_DM_GETINFO, pipe_list, sizeof(pipe_list),
+ 			&skb_pipe_list);
+ 	if (r < 0)
+-		goto free_info;
++		return r;
+ 
+ 	/* Complete the existing gate_pipe table */
+ 	for (i = 0; i < skb_pipe_list->len; i++) {
+@@ -181,6 +181,7 @@ static int st21nfca_hci_load_session(struct nfc_hci_dev *hdev)
+ 			info->src_host_id != ST21NFCA_ESE_HOST_ID) {
+ 			pr_err("Unexpected apdu_reader pipe on host %x\n",
+ 				info->src_host_id);
++			kfree_skb(skb_pipe_info);
+ 			continue;
+ 		}
+ 
+@@ -200,6 +201,7 @@ static int st21nfca_hci_load_session(struct nfc_hci_dev *hdev)
+ 			hdev->pipes[st21nfca_gates[j].pipe].dest_host =
+ 							info->src_host_id;
+ 		}
++		kfree_skb(skb_pipe_info);
+ 	}
+ 
+ 	/*
+@@ -214,13 +216,12 @@ static int st21nfca_hci_load_session(struct nfc_hci_dev *hdev)
+ 					st21nfca_gates[i].gate,
+ 					st21nfca_gates[i].pipe);
+ 			if (r < 0)
+-				goto free_info;
++				goto free_list;
+ 		}
+ 	}
+ 
+ 	memcpy(hdev->init_data.gates, st21nfca_gates, sizeof(st21nfca_gates));
+-free_info:
+-	kfree_skb(skb_pipe_info);
++free_list:
+ 	kfree_skb(skb_pipe_list);
+ 	return r;
+ }
+diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
+index 07496560e5b9..6e82bc42373b 100644
+--- a/drivers/of/fdt.c
++++ b/drivers/of/fdt.c
+@@ -967,7 +967,9 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
+ }
+ 
+ #ifdef CONFIG_HAVE_MEMBLOCK
+-#define MAX_PHYS_ADDR	((phys_addr_t)~0)
++#ifndef MAX_MEMBLOCK_ADDR
++#define MAX_MEMBLOCK_ADDR	((phys_addr_t)~0)
++#endif
+ 
+ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
+ {
+@@ -984,16 +986,16 @@ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
+ 	}
+ 	size &= PAGE_MASK;
+ 
+-	if (base > MAX_PHYS_ADDR) {
++	if (base > MAX_MEMBLOCK_ADDR) {
+ 		pr_warning("Ignoring memory block 0x%llx - 0x%llx\n",
+ 				base, base + size);
+ 		return;
+ 	}
+ 
+-	if (base + size - 1 > MAX_PHYS_ADDR) {
++	if (base + size - 1 > MAX_MEMBLOCK_ADDR) {
+ 		pr_warning("Ignoring memory range 0x%llx - 0x%llx\n",
+-				((u64)MAX_PHYS_ADDR) + 1, base + size);
+-		size = MAX_PHYS_ADDR - base + 1;
++				((u64)MAX_MEMBLOCK_ADDR) + 1, base + size);
++		size = MAX_MEMBLOCK_ADDR - base + 1;
+ 	}
+ 
+ 	if (base + size < phys_offset) {
+diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
+index dceb9ddfd99a..a32c1f6c252c 100644
+--- a/drivers/parisc/lba_pci.c
++++ b/drivers/parisc/lba_pci.c
+@@ -1556,8 +1556,11 @@ lba_driver_probe(struct parisc_device *dev)
+ 	if (lba_dev->hba.lmmio_space.flags)
+ 		pci_add_resource_offset(&resources, &lba_dev->hba.lmmio_space,
+ 					lba_dev->hba.lmmio_space_offset);
+-	if (lba_dev->hba.gmmio_space.flags)
+-		pci_add_resource(&resources, &lba_dev->hba.gmmio_space);
++	if (lba_dev->hba.gmmio_space.flags) {
++		/* pci_add_resource(&resources, &lba_dev->hba.gmmio_space); */
++		pr_warn("LBA: Not registering GMMIO space %pR\n",
++			&lba_dev->hba.gmmio_space);
++	}
+ 
+ 	pci_add_resource(&resources, &lba_dev->hba.bus_num);
+ 
+diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
+index 944f50015ed0..73de4efcbe6e 100644
+--- a/drivers/pci/Kconfig
++++ b/drivers/pci/Kconfig
+@@ -2,7 +2,7 @@
+ # PCI configuration
+ #
+ config PCI_BUS_ADDR_T_64BIT
+-	def_bool y if (ARCH_DMA_ADDR_T_64BIT || (64BIT && !PARISC))
++	def_bool y if (ARCH_DMA_ADDR_T_64BIT || 64BIT)
+ 	depends on PCI
+ 
+ config PCI_MSI
+diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
+index ad1ea1695b4a..4a52072d1d3f 100644
+--- a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
++++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
+@@ -1202,12 +1202,6 @@ static int mtk_pctrl_build_state(struct platform_device *pdev)
+ 	return 0;
+ }
+ 
+-static struct pinctrl_desc mtk_pctrl_desc = {
+-	.confops	= &mtk_pconf_ops,
+-	.pctlops	= &mtk_pctrl_ops,
+-	.pmxops		= &mtk_pmx_ops,
+-};
+-
+ int mtk_pctrl_init(struct platform_device *pdev,
+ 		const struct mtk_pinctrl_devdata *data,
+ 		struct regmap *regmap)
+@@ -1265,12 +1259,17 @@ int mtk_pctrl_init(struct platform_device *pdev,
+ 
+ 	for (i = 0; i < pctl->devdata->npins; i++)
+ 		pins[i] = pctl->devdata->pins[i].pin;
+-	mtk_pctrl_desc.name = dev_name(&pdev->dev);
+-	mtk_pctrl_desc.owner = THIS_MODULE;
+-	mtk_pctrl_desc.pins = pins;
+-	mtk_pctrl_desc.npins = pctl->devdata->npins;
++
++	pctl->pctl_desc.name = dev_name(&pdev->dev);
++	pctl->pctl_desc.owner = THIS_MODULE;
++	pctl->pctl_desc.pins = pins;
++	pctl->pctl_desc.npins = pctl->devdata->npins;
++	pctl->pctl_desc.confops = &mtk_pconf_ops;
++	pctl->pctl_desc.pctlops = &mtk_pctrl_ops;
++	pctl->pctl_desc.pmxops = &mtk_pmx_ops;
+ 	pctl->dev = &pdev->dev;
+-	pctl->pctl_dev = pinctrl_register(&mtk_pctrl_desc, &pdev->dev, pctl);
++
++	pctl->pctl_dev = pinctrl_register(&pctl->pctl_desc, &pdev->dev, pctl);
+ 	if (IS_ERR(pctl->pctl_dev)) {
+ 		dev_err(&pdev->dev, "couldn't register pinctrl driver\n");
+ 		return PTR_ERR(pctl->pctl_dev);
+diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common.h b/drivers/pinctrl/mediatek/pinctrl-mtk-common.h
+index 30213e514c2f..c532c23c70b4 100644
+--- a/drivers/pinctrl/mediatek/pinctrl-mtk-common.h
++++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common.h
+@@ -256,6 +256,7 @@ struct mtk_pinctrl_devdata {
+ struct mtk_pinctrl {
+ 	struct regmap	*regmap1;
+ 	struct regmap	*regmap2;
++	struct pinctrl_desc pctl_desc;
+ 	struct device           *dev;
+ 	struct gpio_chip	*chip;
+ 	struct mtk_pinctrl_group	*groups;
+diff --git a/drivers/pinctrl/pinctrl-at91.c b/drivers/pinctrl/pinctrl-at91.c
+index a0824477072b..2deb1309fcac 100644
+--- a/drivers/pinctrl/pinctrl-at91.c
++++ b/drivers/pinctrl/pinctrl-at91.c
+@@ -320,6 +320,9 @@ static const struct pinctrl_ops at91_pctrl_ops = {
+ static void __iomem *pin_to_controller(struct at91_pinctrl *info,
+ 				 unsigned int bank)
+ {
++	if (!gpio_chips[bank])
++		return NULL;
++
+ 	return gpio_chips[bank]->regbase;
+ }
+ 
+@@ -729,6 +732,10 @@ static int at91_pmx_set(struct pinctrl_dev *pctldev, unsigned selector,
+ 		pin = &pins_conf[i];
+ 		at91_pin_dbg(info->dev, pin);
+ 		pio = pin_to_controller(info, pin->bank);
++
++		if (!pio)
++			continue;
++
+ 		mask = pin_to_mask(pin->pin);
+ 		at91_mux_disable_interrupt(pio, mask);
+ 		switch (pin->mux) {
+@@ -848,6 +855,10 @@ static int at91_pinconf_get(struct pinctrl_dev *pctldev,
+ 	*config = 0;
+ 	dev_dbg(info->dev, "%s:%d, pin_id=%d", __func__, __LINE__, pin_id);
+ 	pio = pin_to_controller(info, pin_to_bank(pin_id));
++
++	if (!pio)
++		return -EINVAL;
++
+ 	pin = pin_id % MAX_NB_GPIO_PER_BANK;
+ 
+ 	if (at91_mux_get_multidrive(pio, pin))
+@@ -889,6 +900,10 @@ static int at91_pinconf_set(struct pinctrl_dev *pctldev,
+ 			"%s:%d, pin_id=%d, config=0x%lx",
+ 			__func__, __LINE__, pin_id, config);
+ 		pio = pin_to_controller(info, pin_to_bank(pin_id));
++
++		if (!pio)
++			return -EINVAL;
++
+ 		pin = pin_id % MAX_NB_GPIO_PER_BANK;
+ 		mask = pin_to_mask(pin);
+ 
+diff --git a/drivers/platform/x86/ideapad-laptop.c b/drivers/platform/x86/ideapad-laptop.c
+index 76b57388d01b..81c3e582309a 100644
+--- a/drivers/platform/x86/ideapad-laptop.c
++++ b/drivers/platform/x86/ideapad-laptop.c
+@@ -853,6 +853,13 @@ static const struct dmi_system_id no_hw_rfkill_list[] = {
+ 		},
+ 	},
+ 	{
++		.ident = "Lenovo Yoga 3 14",
++		.matches = {
++			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++			DMI_MATCH(DMI_PRODUCT_VERSION, "Lenovo Yoga 3 14"),
++		},
++	},
++	{
+ 		.ident = "Lenovo Yoga 3 Pro 1370",
+ 		.matches = {
+ 			DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+diff --git a/drivers/rtc/rtc-abx80x.c b/drivers/rtc/rtc-abx80x.c
+index 4337c3bc6ace..afea84c7a155 100644
+--- a/drivers/rtc/rtc-abx80x.c
++++ b/drivers/rtc/rtc-abx80x.c
+@@ -28,7 +28,7 @@
+ #define ABX8XX_REG_WD		0x07
+ 
+ #define ABX8XX_REG_CTRL1	0x10
+-#define ABX8XX_CTRL_WRITE	BIT(1)
++#define ABX8XX_CTRL_WRITE	BIT(0)
+ #define ABX8XX_CTRL_12_24	BIT(6)
+ 
+ #define ABX8XX_REG_CFG_KEY	0x1f
+diff --git a/drivers/rtc/rtc-s3c.c b/drivers/rtc/rtc-s3c.c
+index a0f832362199..2e709e239dbc 100644
+--- a/drivers/rtc/rtc-s3c.c
++++ b/drivers/rtc/rtc-s3c.c
+@@ -39,6 +39,7 @@ struct s3c_rtc {
+ 	void __iomem *base;
+ 	struct clk *rtc_clk;
+ 	struct clk *rtc_src_clk;
++	bool clk_disabled;
+ 
+ 	struct s3c_rtc_data *data;
+ 
+@@ -71,9 +72,12 @@ static void s3c_rtc_enable_clk(struct s3c_rtc *info)
+ 	unsigned long irq_flags;
+ 
+ 	spin_lock_irqsave(&info->alarm_clk_lock, irq_flags);
+-	clk_enable(info->rtc_clk);
+-	if (info->data->needs_src_clk)
+-		clk_enable(info->rtc_src_clk);
++	if (info->clk_disabled) {
++		clk_enable(info->rtc_clk);
++		if (info->data->needs_src_clk)
++			clk_enable(info->rtc_src_clk);
++		info->clk_disabled = false;
++	}
+ 	spin_unlock_irqrestore(&info->alarm_clk_lock, irq_flags);
+ }
+ 
+@@ -82,9 +86,12 @@ static void s3c_rtc_disable_clk(struct s3c_rtc *info)
+ 	unsigned long irq_flags;
+ 
+ 	spin_lock_irqsave(&info->alarm_clk_lock, irq_flags);
+-	if (info->data->needs_src_clk)
+-		clk_disable(info->rtc_src_clk);
+-	clk_disable(info->rtc_clk);
++	if (!info->clk_disabled) {
++		if (info->data->needs_src_clk)
++			clk_disable(info->rtc_src_clk);
++		clk_disable(info->rtc_clk);
++		info->clk_disabled = true;
++	}
+ 	spin_unlock_irqrestore(&info->alarm_clk_lock, irq_flags);
+ }
+ 
+@@ -128,6 +135,11 @@ static int s3c_rtc_setaie(struct device *dev, unsigned int enabled)
+ 
+ 	s3c_rtc_disable_clk(info);
+ 
++	if (enabled)
++		s3c_rtc_enable_clk(info);
++	else
++		s3c_rtc_disable_clk(info);
++
+ 	return 0;
+ }
+ 
+diff --git a/drivers/rtc/rtc-s5m.c b/drivers/rtc/rtc-s5m.c
+index 8c70d785ba73..ab60287ee72d 100644
+--- a/drivers/rtc/rtc-s5m.c
++++ b/drivers/rtc/rtc-s5m.c
+@@ -635,6 +635,16 @@ static int s5m8767_rtc_init_reg(struct s5m_rtc_info *info)
+ 	case S2MPS13X:
+ 		data[0] = (0 << BCD_EN_SHIFT) | (1 << MODEL24_SHIFT);
+ 		ret = regmap_write(info->regmap, info->regs->ctrl, data[0]);
++		if (ret < 0)
++			break;
++
++		/*
++		 * The WUDR and (RUDR or AUDR) bits must be set high after
++		 * writing the RTC_CTRL register, just as when writing the
++		 * Alarm registers. The datasheet does not describe this, but
++		 * the vendor code does it.
++		 */
++		ret = s5m8767_rtc_set_alarm_reg(info);
+ 		break;
+ 
+ 	default:
+diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
+index f5021fcb154e..089e7f8543a5 100644
+--- a/fs/btrfs/transaction.c
++++ b/fs/btrfs/transaction.c
+@@ -1893,8 +1893,11 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
+ 			spin_unlock(&root->fs_info->trans_lock);
+ 
+ 			wait_for_commit(root, prev_trans);
++			ret = prev_trans->aborted;
+ 
+ 			btrfs_put_transaction(prev_trans);
++			if (ret)
++				goto cleanup_transaction;
+ 		} else {
+ 			spin_unlock(&root->fs_info->trans_lock);
+ 		}
+diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
+index 49b8b6e41a18..c7b84f3bf6ad 100644
+--- a/fs/cifs/ioctl.c
++++ b/fs/cifs/ioctl.c
+@@ -70,6 +70,12 @@ static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
+ 		goto out_drop_write;
+ 	}
+ 
++	if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
++		rc = -EBADF;
++		cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
++		goto out_fput;
++	}
++
+ 	if ((!src_file.file->private_data) || (!dst_file->private_data)) {
+ 		rc = -EBADF;
+ 		cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
+diff --git a/fs/coredump.c b/fs/coredump.c
+index c5ecde6f3eed..a8f75640ac86 100644
+--- a/fs/coredump.c
++++ b/fs/coredump.c
+@@ -513,10 +513,10 @@ void do_coredump(const siginfo_t *siginfo)
+ 	const struct cred *old_cred;
+ 	struct cred *cred;
+ 	int retval = 0;
+-	int flag = 0;
+ 	int ispipe;
+ 	struct files_struct *displaced;
+-	bool need_nonrelative = false;
++	/* require nonrelative corefile path and be extra careful */
++	bool need_suid_safe = false;
+ 	bool core_dumped = false;
+ 	static atomic_t core_dump_count = ATOMIC_INIT(0);
+ 	struct coredump_params cprm = {
+@@ -550,9 +550,8 @@ void do_coredump(const siginfo_t *siginfo)
+ 	 */
+ 	if (__get_dumpable(cprm.mm_flags) == SUID_DUMP_ROOT) {
+ 		/* Setuid core dump mode */
+-		flag = O_EXCL;		/* Stop rewrite attacks */
+ 		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
+-		need_nonrelative = true;
++		need_suid_safe = true;
+ 	}
+ 
+ 	retval = coredump_wait(siginfo->si_signo, &core_state);
+@@ -633,7 +632,7 @@ void do_coredump(const siginfo_t *siginfo)
+ 		if (cprm.limit < binfmt->min_coredump)
+ 			goto fail_unlock;
+ 
+-		if (need_nonrelative && cn.corename[0] != '/') {
++		if (need_suid_safe && cn.corename[0] != '/') {
+ 			printk(KERN_WARNING "Pid %d(%s) can only dump core "\
+ 				"to fully qualified path!\n",
+ 				task_tgid_vnr(current), current->comm);
+@@ -641,8 +640,35 @@ void do_coredump(const siginfo_t *siginfo)
+ 			goto fail_unlock;
+ 		}
+ 
++		/*
++		 * Unlink the file if it exists unless this is a SUID
++		 * binary - in that case, we're running around with root
++		 * privs and don't want to unlink another user's coredump.
++		 */
++		if (!need_suid_safe) {
++			mm_segment_t old_fs;
++
++			old_fs = get_fs();
++			set_fs(KERNEL_DS);
++			/*
++			 * If it doesn't exist, that's fine. If there's some
++			 * other problem, we'll catch it at the filp_open().
++			 */
++			(void) sys_unlink((const char __user *)cn.corename);
++			set_fs(old_fs);
++		}
++
++		/*
++		 * There is a race between unlinking and creating the
++		 * file, but if that causes an EEXIST here, that's
++		 * fine - another process raced with us while creating
++		 * the corefile, and the other process won. To userspace,
++		 * what matters is that at least one of the two processes
++		 * writes its coredump successfully, not which one.
++		 */
+ 		cprm.file = filp_open(cn.corename,
+-				 O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
++				 O_CREAT | 2 | O_NOFOLLOW |
++				 O_LARGEFILE | O_EXCL,
+ 				 0600);
+ 		if (IS_ERR(cprm.file))
+ 			goto fail_unlock;
+@@ -659,11 +685,15 @@ void do_coredump(const siginfo_t *siginfo)
+ 		if (!S_ISREG(inode->i_mode))
+ 			goto close_fail;
+ 		/*
+-		 * Dont allow local users get cute and trick others to coredump
+-		 * into their pre-created files.
++		 * Don't dump core if the filesystem changed owner or mode
++		 * of the file during file creation. This is an issue when
++		 * a process dumps core while its cwd is e.g. on a vfat
++		 * filesystem.
+ 		 */
+ 		if (!uid_eq(inode->i_uid, current_fsuid()))
+ 			goto close_fail;
++		if ((inode->i_mode & 0677) != 0600)
++			goto close_fail;
+ 		if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
+ 			goto close_fail;
+ 		if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
+diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
+index 8db0b464483f..63cd2c147221 100644
+--- a/fs/ecryptfs/dentry.c
++++ b/fs/ecryptfs/dentry.c
+@@ -45,20 +45,20 @@
+ static int ecryptfs_d_revalidate(struct dentry *dentry, unsigned int flags)
+ {
+ 	struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
+-	int rc;
+-
+-	if (!(lower_dentry->d_flags & DCACHE_OP_REVALIDATE))
+-		return 1;
++	int rc = 1;
+ 
+ 	if (flags & LOOKUP_RCU)
+ 		return -ECHILD;
+ 
+-	rc = lower_dentry->d_op->d_revalidate(lower_dentry, flags);
++	if (lower_dentry->d_flags & DCACHE_OP_REVALIDATE)
++		rc = lower_dentry->d_op->d_revalidate(lower_dentry, flags);
++
+ 	if (d_really_is_positive(dentry)) {
+-		struct inode *lower_inode =
+-			ecryptfs_inode_to_lower(d_inode(dentry));
++		struct inode *inode = d_inode(dentry);
+ 
+-		fsstack_copy_attr_all(d_inode(dentry), lower_inode);
++		fsstack_copy_attr_all(inode, ecryptfs_inode_to_lower(inode));
++		if (!inode->i_nlink)
++			return 0;
+ 	}
+ 	return rc;
+ }
+diff --git a/fs/ext4/super.c b/fs/ext4/super.c
+index 9981064c4a54..a5e8c744e962 100644
+--- a/fs/ext4/super.c
++++ b/fs/ext4/super.c
+@@ -325,6 +325,22 @@ static void save_error_info(struct super_block *sb, const char *func,
+ 	ext4_commit_super(sb, 1);
+ }
+ 
++/*
++ * The del_gendisk() function uninitializes the disk-specific data
++ * structures, including the bdi structure, without telling anyone
++ * else.  Once this happens, any attempt to call mark_buffer_dirty()
++ * (for example, by ext4_commit_super), will cause a kernel OOPS.
++ * This is a kludge to prevent these oops until we can put in a proper
++ * hook in del_gendisk() to inform the VFS and file system layers.
++ */
++static int block_device_ejected(struct super_block *sb)
++{
++	struct inode *bd_inode = sb->s_bdev->bd_inode;
++	struct backing_dev_info *bdi = inode_to_bdi(bd_inode);
++
++	return bdi->dev == NULL;
++}
++
+ static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
+ {
+ 	struct super_block		*sb = journal->j_private;
+@@ -4617,7 +4633,7 @@ static int ext4_commit_super(struct super_block *sb, int sync)
+ 	struct buffer_head *sbh = EXT4_SB(sb)->s_sbh;
+ 	int error = 0;
+ 
+-	if (!sbh)
++	if (!sbh || block_device_ejected(sb))
+ 		return error;
+ 	if (buffer_write_io_error(sbh)) {
+ 		/*
+@@ -4833,10 +4849,11 @@ static int ext4_freeze(struct super_block *sb)
+ 		error = jbd2_journal_flush(journal);
+ 		if (error < 0)
+ 			goto out;
++
++		/* Journal blocked and flushed, clear needs_recovery flag. */
++		EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
+ 	}
+ 
+-	/* Journal blocked and flushed, clear needs_recovery flag. */
+-	EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
+ 	error = ext4_commit_super(sb, 1);
+ out:
+ 	if (journal)
+@@ -4854,8 +4871,11 @@ static int ext4_unfreeze(struct super_block *sb)
+ 	if (sb->s_flags & MS_RDONLY)
+ 		return 0;
+ 
+-	/* Reset the needs_recovery flag before the fs is unlocked. */
+-	EXT4_SET_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
++	if (EXT4_SB(sb)->s_journal) {
++		/* Reset the needs_recovery flag before the fs is unlocked. */
++		EXT4_SET_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
++	}
++
+ 	ext4_commit_super(sb, 1);
+ 	return 0;
+ }
+diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
+index d3fa6bd9503e..221719eac5de 100644
+--- a/fs/hfs/bnode.c
++++ b/fs/hfs/bnode.c
+@@ -288,7 +288,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
+ 			page_cache_release(page);
+ 			goto fail;
+ 		}
+-		page_cache_release(page);
+ 		node->page[i] = page;
+ 	}
+ 
+@@ -398,11 +397,11 @@ node_error:
+ 
+ void hfs_bnode_free(struct hfs_bnode *node)
+ {
+-	//int i;
++	int i;
+ 
+-	//for (i = 0; i < node->tree->pages_per_bnode; i++)
+-	//	if (node->page[i])
+-	//		page_cache_release(node->page[i]);
++	for (i = 0; i < node->tree->pages_per_bnode; i++)
++		if (node->page[i])
++			page_cache_release(node->page[i]);
+ 	kfree(node);
+ }
+ 
+diff --git a/fs/hfs/brec.c b/fs/hfs/brec.c
+index 9f4ee7f52026..6fc766df0461 100644
+--- a/fs/hfs/brec.c
++++ b/fs/hfs/brec.c
+@@ -131,13 +131,16 @@ skip:
+ 	hfs_bnode_write(node, entry, data_off + key_len, entry_len);
+ 	hfs_bnode_dump(node);
+ 
+-	if (new_node) {
+-		/* update parent key if we inserted a key
+-		 * at the start of the first node
+-		 */
+-		if (!rec && new_node != node)
+-			hfs_brec_update_parent(fd);
++	/*
++	 * update parent key if we inserted a key
++	 * at the start of the node and it is not the new node
++	 */
++	if (!rec && new_node != node) {
++		hfs_bnode_read_key(node, fd->search_key, data_off + size);
++		hfs_brec_update_parent(fd);
++	}
+ 
++	if (new_node) {
+ 		hfs_bnode_put(fd->bnode);
+ 		if (!new_node->parent) {
+ 			hfs_btree_inc_height(tree);
+@@ -166,9 +169,6 @@ skip:
+ 		goto again;
+ 	}
+ 
+-	if (!rec)
+-		hfs_brec_update_parent(fd);
+-
+ 	return 0;
+ }
+ 
+@@ -366,6 +366,8 @@ again:
+ 	if (IS_ERR(parent))
+ 		return PTR_ERR(parent);
+ 	__hfs_brec_find(parent, fd);
++	if (fd->record < 0)
++		return -ENOENT;
+ 	hfs_bnode_dump(parent);
+ 	rec = fd->record;
+ 
+diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
+index 759708fd9331..63924662aaf3 100644
+--- a/fs/hfsplus/bnode.c
++++ b/fs/hfsplus/bnode.c
+@@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
+ 			page_cache_release(page);
+ 			goto fail;
+ 		}
+-		page_cache_release(page);
+ 		node->page[i] = page;
+ 	}
+ 
+@@ -566,13 +565,11 @@ node_error:
+ 
+ void hfs_bnode_free(struct hfs_bnode *node)
+ {
+-#if 0
+ 	int i;
+ 
+ 	for (i = 0; i < node->tree->pages_per_bnode; i++)
+ 		if (node->page[i])
+ 			page_cache_release(node->page[i]);
+-#endif
+ 	kfree(node);
+ }
+ 
+diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
+index 4227dc4f7437..8c44654ce274 100644
+--- a/fs/jbd2/checkpoint.c
++++ b/fs/jbd2/checkpoint.c
+@@ -417,12 +417,12 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
+  * journal_clean_one_cp_list
+  *
+  * Find all the written-back checkpoint buffers in the given list and
+- * release them.
++ * release them. If 'destroy' is set, clean all buffers unconditionally.
+  *
+  * Called with j_list_lock held.
+  * Returns 1 if we freed the transaction, 0 otherwise.
+  */
+-static int journal_clean_one_cp_list(struct journal_head *jh)
++static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
+ {
+ 	struct journal_head *last_jh;
+ 	struct journal_head *next_jh = jh;
+@@ -436,7 +436,10 @@ static int journal_clean_one_cp_list(struct journal_head *jh)
+ 	do {
+ 		jh = next_jh;
+ 		next_jh = jh->b_cpnext;
+-		ret = __try_to_free_cp_buf(jh);
++		if (!destroy)
++			ret = __try_to_free_cp_buf(jh);
++		else
++			ret = __jbd2_journal_remove_checkpoint(jh) + 1;
+ 		if (!ret)
+ 			return freed;
+ 		if (ret == 2)
+@@ -459,10 +462,11 @@ static int journal_clean_one_cp_list(struct journal_head *jh)
+  * journal_clean_checkpoint_list
+  *
+  * Find all the written-back checkpoint buffers in the journal and release them.
++ * If 'destroy' is set, release all buffers unconditionally.
+  *
+  * Called with j_list_lock held.
+  */
+-void __jbd2_journal_clean_checkpoint_list(journal_t *journal)
++void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy)
+ {
+ 	transaction_t *transaction, *last_transaction, *next_transaction;
+ 	int ret;
+@@ -476,7 +480,8 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal)
+ 	do {
+ 		transaction = next_transaction;
+ 		next_transaction = transaction->t_cpnext;
+-		ret = journal_clean_one_cp_list(transaction->t_checkpoint_list);
++		ret = journal_clean_one_cp_list(transaction->t_checkpoint_list,
++						destroy);
+ 		/*
+ 		 * This function only frees up some memory if possible so we
+ 		 * dont have an obligation to finish processing. Bail out if
+@@ -492,7 +497,7 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal)
+ 		 * we can possibly see not yet submitted buffers on io_list
+ 		 */
+ 		ret = journal_clean_one_cp_list(transaction->
+-				t_checkpoint_io_list);
++				t_checkpoint_io_list, destroy);
+ 		if (need_resched())
+ 			return;
+ 		/*
+@@ -506,6 +511,28 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal)
+ }
+ 
+ /*
++ * Remove buffers from all checkpoint lists as journal is aborted and we just
++ * need to free memory
++ */
++void jbd2_journal_destroy_checkpoint(journal_t *journal)
++{
++	/*
++	 * We loop because __jbd2_journal_clean_checkpoint_list() may abort
++	 * early due to a need of rescheduling.
++	 */
++	while (1) {
++		spin_lock(&journal->j_list_lock);
++		if (!journal->j_checkpoint_transactions) {
++			spin_unlock(&journal->j_list_lock);
++			break;
++		}
++		__jbd2_journal_clean_checkpoint_list(journal, true);
++		spin_unlock(&journal->j_list_lock);
++		cond_resched();
++	}
++}
++
++/*
+  * journal_remove_checkpoint: called after a buffer has been committed
+  * to disk (either by being write-back flushed to disk, or being
+  * committed to the log).
+diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
+index b73e0215baa7..362e5f614450 100644
+--- a/fs/jbd2/commit.c
++++ b/fs/jbd2/commit.c
+@@ -510,7 +510,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
+ 	 * frees some memory
+ 	 */
+ 	spin_lock(&journal->j_list_lock);
+-	__jbd2_journal_clean_checkpoint_list(journal);
++	__jbd2_journal_clean_checkpoint_list(journal, false);
+ 	spin_unlock(&journal->j_list_lock);
+ 
+ 	jbd_debug(3, "JBD2: commit phase 1\n");
+diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
+index 4ff3fad4e9e3..2721513adb1f 100644
+--- a/fs/jbd2/journal.c
++++ b/fs/jbd2/journal.c
+@@ -1693,8 +1693,17 @@ int jbd2_journal_destroy(journal_t *journal)
+ 	while (journal->j_checkpoint_transactions != NULL) {
+ 		spin_unlock(&journal->j_list_lock);
+ 		mutex_lock(&journal->j_checkpoint_mutex);
+-		jbd2_log_do_checkpoint(journal);
++		err = jbd2_log_do_checkpoint(journal);
+ 		mutex_unlock(&journal->j_checkpoint_mutex);
++		/*
++		 * If checkpointing failed, just free the buffers to avoid
++		 * looping forever
++		 */
++		if (err) {
++			jbd2_journal_destroy_checkpoint(journal);
++			spin_lock(&journal->j_list_lock);
++			break;
++		}
+ 		spin_lock(&journal->j_list_lock);
+ 	}
+ 
+diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
+index b3289d701eea..14e3b1e1b17d 100644
+--- a/fs/nfs/flexfilelayout/flexfilelayout.c
++++ b/fs/nfs/flexfilelayout/flexfilelayout.c
+@@ -1199,6 +1199,11 @@ static int ff_layout_write_done_cb(struct rpc_task *task,
+ 	    hdr->res.verf->committed == NFS_DATA_SYNC)
+ 		ff_layout_set_layoutcommit(hdr);
+ 
++	/* zero out fattr since we don't care about DS attrs at all */
++	hdr->fattr.valid = 0;
++	if (task->tk_status >= 0)
++		nfs_writeback_update_inode(hdr);
++
+ 	return 0;
+ }
+ 
+diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+index f13e1969eedd..b28fa4cbea52 100644
+--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
++++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+@@ -500,16 +500,19 @@ int ff_layout_encode_ds_ioerr(struct nfs4_flexfile_layout *flo,
+ 					   range->offset, range->length))
+ 			continue;
+ 		/* offset(8) + length(8) + stateid(NFS4_STATEID_SIZE)
+-		 * + deviceid(NFS4_DEVICEID4_SIZE) + status(4) + opnum(4)
++		 * + array length + deviceid(NFS4_DEVICEID4_SIZE)
++		 * + status(4) + opnum(4)
+ 		 */
+ 		p = xdr_reserve_space(xdr,
+-				24 + NFS4_STATEID_SIZE + NFS4_DEVICEID4_SIZE);
++				28 + NFS4_STATEID_SIZE + NFS4_DEVICEID4_SIZE);
+ 		if (unlikely(!p))
+ 			return -ENOBUFS;
+ 		p = xdr_encode_hyper(p, err->offset);
+ 		p = xdr_encode_hyper(p, err->length);
+ 		p = xdr_encode_opaque_fixed(p, &err->stateid,
+ 					    NFS4_STATEID_SIZE);
++		/* Encode 1 error */
++		*p++ = cpu_to_be32(1);
+ 		p = xdr_encode_opaque_fixed(p, &err->deviceid,
+ 					    NFS4_DEVICEID4_SIZE);
+ 		*p++ = cpu_to_be32(err->status);
+diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
+index 0adc7d245b3d..4afbe13321cb 100644
+--- a/fs/nfs/inode.c
++++ b/fs/nfs/inode.c
+@@ -1273,13 +1273,6 @@ static int nfs_check_inode_attributes(struct inode *inode, struct nfs_fattr *fat
+ 	return 0;
+ }
+ 
+-static int nfs_ctime_need_update(const struct inode *inode, const struct nfs_fattr *fattr)
+-{
+-	if (!(fattr->valid & NFS_ATTR_FATTR_CTIME))
+-		return 0;
+-	return timespec_compare(&fattr->ctime, &inode->i_ctime) > 0;
+-}
+-
+ static atomic_long_t nfs_attr_generation_counter;
+ 
+ static unsigned long nfs_read_attr_generation_counter(void)
+@@ -1428,7 +1421,6 @@ static int nfs_inode_attrs_need_update(const struct inode *inode, const struct n
+ 	const struct nfs_inode *nfsi = NFS_I(inode);
+ 
+ 	return ((long)fattr->gencount - (long)nfsi->attr_gencount) > 0 ||
+-		nfs_ctime_need_update(inode, fattr) ||
+ 		((long)nfsi->attr_gencount - (long)nfs_read_attr_generation_counter() > 0);
+ }
+ 
+@@ -1491,6 +1483,13 @@ static int nfs_post_op_update_inode_locked(struct inode *inode, struct nfs_fattr
+ {
+ 	unsigned long invalid = NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE;
+ 
++	/*
++	 * Don't revalidate the pagecache if we hold a delegation, but do
++	 * force an attribute update
++	 */
++	if (NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
++		invalid = NFS_INO_INVALID_ATTR|NFS_INO_REVAL_FORCED;
++
+ 	if (S_ISDIR(inode->i_mode))
+ 		invalid |= NFS_INO_INVALID_DATA;
+ 	nfs_set_cache_invalid(inode, invalid);
+diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
+index 9b372b845f6a..1dad18105ed0 100644
+--- a/fs/nfs/internal.h
++++ b/fs/nfs/internal.h
+@@ -490,6 +490,9 @@ void nfs_retry_commit(struct list_head *page_list,
+ void nfs_commitdata_release(struct nfs_commit_data *data);
+ void nfs_request_add_commit_list(struct nfs_page *req, struct list_head *dst,
+ 				 struct nfs_commit_info *cinfo);
++void nfs_request_add_commit_list_locked(struct nfs_page *req,
++		struct list_head *dst,
++		struct nfs_commit_info *cinfo);
+ void nfs_request_remove_commit_list(struct nfs_page *req,
+ 				    struct nfs_commit_info *cinfo);
+ void nfs_init_cinfo(struct nfs_commit_info *cinfo,
+@@ -623,13 +626,15 @@ void nfs_super_set_maxbytes(struct super_block *sb, __u64 maxfilesize)
+  * Record the page as unstable and mark its inode as dirty.
+  */
+ static inline
+-void nfs_mark_page_unstable(struct page *page)
++void nfs_mark_page_unstable(struct page *page, struct nfs_commit_info *cinfo)
+ {
+-	struct inode *inode = page_file_mapping(page)->host;
++	if (!cinfo->dreq) {
++		struct inode *inode = page_file_mapping(page)->host;
+ 
+-	inc_zone_page_state(page, NR_UNSTABLE_NFS);
+-	inc_wb_stat(&inode_to_bdi(inode)->wb, WB_RECLAIMABLE);
+-	 __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
++		inc_zone_page_state(page, NR_UNSTABLE_NFS);
++		inc_wb_stat(&inode_to_bdi(inode)->wb, WB_RECLAIMABLE);
++		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
++	}
+ }
+ 
+ /*
+diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
+index 3acb1eb72930..73c8204ad463 100644
+--- a/fs/nfs/nfs4proc.c
++++ b/fs/nfs/nfs4proc.c
+@@ -1156,6 +1156,8 @@ static int can_open_delegated(struct nfs_delegation *delegation, fmode_t fmode)
+ 		return 0;
+ 	if ((delegation->type & fmode) != fmode)
+ 		return 0;
++	if (test_bit(NFS_DELEGATION_NEED_RECLAIM, &delegation->flags))
++		return 0;
+ 	if (test_bit(NFS_DELEGATION_RETURNING, &delegation->flags))
+ 		return 0;
+ 	nfs_mark_delegation_referenced(delegation);
+@@ -1220,6 +1222,7 @@ static void nfs_resync_open_stateid_locked(struct nfs4_state *state)
+ }
+ 
+ static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
++		nfs4_stateid *arg_stateid,
+ 		nfs4_stateid *stateid, fmode_t fmode)
+ {
+ 	clear_bit(NFS_O_RDWR_STATE, &state->flags);
+@@ -1238,8 +1241,9 @@ static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
+ 	if (stateid == NULL)
+ 		return;
+ 	/* Handle races with OPEN */
+-	if (!nfs4_stateid_match_other(stateid, &state->open_stateid) ||
+-	    !nfs4_stateid_is_newer(stateid, &state->open_stateid)) {
++	if (!nfs4_stateid_match_other(arg_stateid, &state->open_stateid) ||
++	    (nfs4_stateid_match_other(stateid, &state->open_stateid) &&
++	    !nfs4_stateid_is_newer(stateid, &state->open_stateid))) {
+ 		nfs_resync_open_stateid_locked(state);
+ 		return;
+ 	}
+@@ -1248,10 +1252,12 @@ static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
+ 	nfs4_stateid_copy(&state->open_stateid, stateid);
+ }
+ 
+-static void nfs_clear_open_stateid(struct nfs4_state *state, nfs4_stateid *stateid, fmode_t fmode)
++static void nfs_clear_open_stateid(struct nfs4_state *state,
++	nfs4_stateid *arg_stateid,
++	nfs4_stateid *stateid, fmode_t fmode)
+ {
+ 	write_seqlock(&state->seqlock);
+-	nfs_clear_open_stateid_locked(state, stateid, fmode);
++	nfs_clear_open_stateid_locked(state, arg_stateid, stateid, fmode);
+ 	write_sequnlock(&state->seqlock);
+ 	if (test_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags))
+ 		nfs4_schedule_state_manager(state->owner->so_server->nfs_client);
+@@ -2425,7 +2431,7 @@ static int _nfs4_do_open(struct inode *dir,
+ 		goto err_free_label;
+ 	state = ctx->state;
+ 
+-	if ((opendata->o_arg.open_flags & O_EXCL) &&
++	if ((opendata->o_arg.open_flags & (O_CREAT|O_EXCL)) == (O_CREAT|O_EXCL) &&
+ 	    (opendata->o_arg.createmode != NFS4_CREATE_GUARDED)) {
+ 		nfs4_exclusive_attrset(opendata, sattr);
+ 
+@@ -2684,7 +2690,8 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
+ 				goto out_release;
+ 			}
+ 	}
+-	nfs_clear_open_stateid(state, res_stateid, calldata->arg.fmode);
++	nfs_clear_open_stateid(state, &calldata->arg.stateid,
++			res_stateid, calldata->arg.fmode);
+ out_release:
+ 	nfs_release_seqid(calldata->arg.seqid);
+ 	nfs_refresh_inode(calldata->inode, calldata->res.fattr);
+@@ -4984,7 +4991,7 @@ nfs4_init_nonuniform_client_string(struct nfs_client *clp)
+ 		return 0;
+ retry:
+ 	rcu_read_lock();
+-	len = 10 + strlen(clp->cl_ipaddr) + 1 +
++	len = 14 + strlen(clp->cl_ipaddr) + 1 +
+ 		strlen(rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR)) +
+ 		1 +
+ 		strlen(rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_PROTO)) +
+@@ -8661,6 +8668,7 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
+ 	.reboot_recovery_ops = &nfs41_reboot_recovery_ops,
+ 	.nograce_recovery_ops = &nfs41_nograce_recovery_ops,
+ 	.state_renewal_ops = &nfs41_state_renewal_ops,
++	.mig_recovery_ops = &nfs41_mig_recovery_ops,
+ };
+ #endif
+ 
+diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
+index 4984bbe55ff1..7c5718ba625e 100644
+--- a/fs/nfs/pagelist.c
++++ b/fs/nfs/pagelist.c
+@@ -77,8 +77,8 @@ EXPORT_SYMBOL_GPL(nfs_pgheader_init);
+ void nfs_set_pgio_error(struct nfs_pgio_header *hdr, int error, loff_t pos)
+ {
+ 	spin_lock(&hdr->lock);
+-	if (pos < hdr->io_start + hdr->good_bytes) {
+-		set_bit(NFS_IOHDR_ERROR, &hdr->flags);
++	if (!test_and_set_bit(NFS_IOHDR_ERROR, &hdr->flags)
++	    || pos < hdr->io_start + hdr->good_bytes) {
+ 		clear_bit(NFS_IOHDR_EOF, &hdr->flags);
+ 		hdr->good_bytes = pos - hdr->io_start;
+ 		hdr->error = error;
+diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
+index f37e25b6311c..e5c679f04099 100644
+--- a/fs/nfs/pnfs_nfs.c
++++ b/fs/nfs/pnfs_nfs.c
+@@ -359,26 +359,31 @@ same_sockaddr(struct sockaddr *addr1, struct sockaddr *addr2)
+ 	return false;
+ }
+ 
++/*
++ * Checks if every address in 'dsaddrs1' also appears in 'dsaddrs2',
++ * i.e. 'dsaddrs1' is a subset of 'dsaddrs2'. If so, declare a match.
++ */
+ static bool
+ _same_data_server_addrs_locked(const struct list_head *dsaddrs1,
+ 			       const struct list_head *dsaddrs2)
+ {
+ 	struct nfs4_pnfs_ds_addr *da1, *da2;
+-
+-	/* step through both lists, comparing as we go */
+-	for (da1 = list_first_entry(dsaddrs1, typeof(*da1), da_node),
+-	     da2 = list_first_entry(dsaddrs2, typeof(*da2), da_node);
+-	     da1 != NULL && da2 != NULL;
+-	     da1 = list_entry(da1->da_node.next, typeof(*da1), da_node),
+-	     da2 = list_entry(da2->da_node.next, typeof(*da2), da_node)) {
+-		if (!same_sockaddr((struct sockaddr *)&da1->da_addr,
+-				   (struct sockaddr *)&da2->da_addr))
+-			return false;
++	struct sockaddr *sa1, *sa2;
++	bool match = false;
++
++	list_for_each_entry(da1, dsaddrs1, da_node) {
++		sa1 = (struct sockaddr *)&da1->da_addr;
++		match = false;
++		list_for_each_entry(da2, dsaddrs2, da_node) {
++			sa2 = (struct sockaddr *)&da2->da_addr;
++			match = same_sockaddr(sa1, sa2);
++			if (match)
++				break;
++		}
++		if (!match)
++			break;
+ 	}
+-	if (da1 == NULL && da2 == NULL)
+-		return true;
+-
+-	return false;
++	return match;
+ }
+ 
+ /*
+@@ -863,9 +868,10 @@ pnfs_layout_mark_request_commit(struct nfs_page *req,
+ 	}
+ 	set_bit(PG_COMMIT_TO_DS, &req->wb_flags);
+ 	cinfo->ds->nwritten++;
+-	spin_unlock(cinfo->lock);
+ 
+-	nfs_request_add_commit_list(req, list, cinfo);
++	nfs_request_add_commit_list_locked(req, list, cinfo);
++	spin_unlock(cinfo->lock);
++	nfs_mark_page_unstable(req->wb_page, cinfo);
+ }
+ EXPORT_SYMBOL_GPL(pnfs_layout_mark_request_commit);
+ 
+diff --git a/fs/nfs/write.c b/fs/nfs/write.c
+index 75a35a1afa79..fdee9270ca15 100644
+--- a/fs/nfs/write.c
++++ b/fs/nfs/write.c
+@@ -768,6 +768,28 @@ nfs_page_search_commits_for_head_request_locked(struct nfs_inode *nfsi,
+ }
+ 
+ /**
++ * nfs_request_add_commit_list_locked - add request to a commit list
++ * @req: pointer to a struct nfs_page
++ * @dst: commit list head
++ * @cinfo: holds list lock and accounting info
++ *
++ * This sets the PG_CLEAN bit and updates the cinfo count of
++ * outstanding requests requiring a commit. The MM page stats
++ * are left to the caller (see nfs_mark_page_unstable()).
++ *
++ * The caller must hold the cinfo->lock, and the nfs_page lock.
++ */
++void
++nfs_request_add_commit_list_locked(struct nfs_page *req, struct list_head *dst,
++			    struct nfs_commit_info *cinfo)
++{
++	set_bit(PG_CLEAN, &req->wb_flags);
++	nfs_list_add_request(req, dst);
++	cinfo->mds->ncommit++;
++}
++EXPORT_SYMBOL_GPL(nfs_request_add_commit_list_locked);
++
++/**
+  * nfs_request_add_commit_list - add request to a commit list
+  * @req: pointer to a struct nfs_page
+  * @dst: commit list head
+@@ -784,13 +806,10 @@ void
+ nfs_request_add_commit_list(struct nfs_page *req, struct list_head *dst,
+ 			    struct nfs_commit_info *cinfo)
+ {
+-	set_bit(PG_CLEAN, &(req)->wb_flags);
+ 	spin_lock(cinfo->lock);
+-	nfs_list_add_request(req, dst);
+-	cinfo->mds->ncommit++;
++	nfs_request_add_commit_list_locked(req, dst, cinfo);
+ 	spin_unlock(cinfo->lock);
+-	if (!cinfo->dreq)
+-		nfs_mark_page_unstable(req->wb_page);
++	nfs_mark_page_unstable(req->wb_page, cinfo);
+ }
+ EXPORT_SYMBOL_GPL(nfs_request_add_commit_list);
+ 
+diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
+index 95202719a1fd..75189cd34583 100644
+--- a/fs/nfsd/nfs4state.c
++++ b/fs/nfsd/nfs4state.c
+@@ -777,13 +777,16 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
+ 	list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
+ }
+ 
+-static void
++static bool
+ unhash_delegation_locked(struct nfs4_delegation *dp)
+ {
+ 	struct nfs4_file *fp = dp->dl_stid.sc_file;
+ 
+ 	lockdep_assert_held(&state_lock);
+ 
++	if (list_empty(&dp->dl_perfile))
++		return false;
++
+ 	dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
+ 	/* Ensure that deleg break won't try to requeue it */
+ 	++dp->dl_time;
+@@ -792,16 +795,21 @@ unhash_delegation_locked(struct nfs4_delegation *dp)
+ 	list_del_init(&dp->dl_recall_lru);
+ 	list_del_init(&dp->dl_perfile);
+ 	spin_unlock(&fp->fi_lock);
++	return true;
+ }
+ 
+ static void destroy_delegation(struct nfs4_delegation *dp)
+ {
++	bool unhashed;
++
+ 	spin_lock(&state_lock);
+-	unhash_delegation_locked(dp);
++	unhashed = unhash_delegation_locked(dp);
+ 	spin_unlock(&state_lock);
+-	put_clnt_odstate(dp->dl_clnt_odstate);
+-	nfs4_put_deleg_lease(dp->dl_stid.sc_file);
+-	nfs4_put_stid(&dp->dl_stid);
++	if (unhashed) {
++		put_clnt_odstate(dp->dl_clnt_odstate);
++		nfs4_put_deleg_lease(dp->dl_stid.sc_file);
++		nfs4_put_stid(&dp->dl_stid);
++	}
+ }
+ 
+ static void revoke_delegation(struct nfs4_delegation *dp)
+@@ -1004,16 +1012,20 @@ static void nfs4_put_stateowner(struct nfs4_stateowner *sop)
+ 	sop->so_ops->so_free(sop);
+ }
+ 
+-static void unhash_ol_stateid(struct nfs4_ol_stateid *stp)
++static bool unhash_ol_stateid(struct nfs4_ol_stateid *stp)
+ {
+ 	struct nfs4_file *fp = stp->st_stid.sc_file;
+ 
+ 	lockdep_assert_held(&stp->st_stateowner->so_client->cl_lock);
+ 
++	if (list_empty(&stp->st_perfile))
++		return false;
++
+ 	spin_lock(&fp->fi_lock);
+-	list_del(&stp->st_perfile);
++	list_del_init(&stp->st_perfile);
+ 	spin_unlock(&fp->fi_lock);
+ 	list_del(&stp->st_perstateowner);
++	return true;
+ }
+ 
+ static void nfs4_free_ol_stateid(struct nfs4_stid *stid)
+@@ -1063,25 +1075,27 @@ static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
+ 	list_add(&stp->st_locks, reaplist);
+ }
+ 
+-static void unhash_lock_stateid(struct nfs4_ol_stateid *stp)
++static bool unhash_lock_stateid(struct nfs4_ol_stateid *stp)
+ {
+ 	struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
+ 
+ 	lockdep_assert_held(&oo->oo_owner.so_client->cl_lock);
+ 
+ 	list_del_init(&stp->st_locks);
+-	unhash_ol_stateid(stp);
+ 	nfs4_unhash_stid(&stp->st_stid);
++	return unhash_ol_stateid(stp);
+ }
+ 
+ static void release_lock_stateid(struct nfs4_ol_stateid *stp)
+ {
+ 	struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
++	bool unhashed;
+ 
+ 	spin_lock(&oo->oo_owner.so_client->cl_lock);
+-	unhash_lock_stateid(stp);
++	unhashed = unhash_lock_stateid(stp);
+ 	spin_unlock(&oo->oo_owner.so_client->cl_lock);
+-	nfs4_put_stid(&stp->st_stid);
++	if (unhashed)
++		nfs4_put_stid(&stp->st_stid);
+ }
+ 
+ static void unhash_lockowner_locked(struct nfs4_lockowner *lo)
+@@ -1129,7 +1143,7 @@ static void release_lockowner(struct nfs4_lockowner *lo)
+ 	while (!list_empty(&lo->lo_owner.so_stateids)) {
+ 		stp = list_first_entry(&lo->lo_owner.so_stateids,
+ 				struct nfs4_ol_stateid, st_perstateowner);
+-		unhash_lock_stateid(stp);
++		WARN_ON(!unhash_lock_stateid(stp));
+ 		put_ol_stateid_locked(stp, &reaplist);
+ 	}
+ 	spin_unlock(&clp->cl_lock);
+@@ -1142,21 +1156,26 @@ static void release_open_stateid_locks(struct nfs4_ol_stateid *open_stp,
+ {
+ 	struct nfs4_ol_stateid *stp;
+ 
++	lockdep_assert_held(&open_stp->st_stid.sc_client->cl_lock);
++
+ 	while (!list_empty(&open_stp->st_locks)) {
+ 		stp = list_entry(open_stp->st_locks.next,
+ 				struct nfs4_ol_stateid, st_locks);
+-		unhash_lock_stateid(stp);
++		WARN_ON(!unhash_lock_stateid(stp));
+ 		put_ol_stateid_locked(stp, reaplist);
+ 	}
+ }
+ 
+-static void unhash_open_stateid(struct nfs4_ol_stateid *stp,
++static bool unhash_open_stateid(struct nfs4_ol_stateid *stp,
+ 				struct list_head *reaplist)
+ {
++	bool unhashed;
++
+ 	lockdep_assert_held(&stp->st_stid.sc_client->cl_lock);
+ 
+-	unhash_ol_stateid(stp);
++	unhashed = unhash_ol_stateid(stp);
+ 	release_open_stateid_locks(stp, reaplist);
++	return unhashed;
+ }
+ 
+ static void release_open_stateid(struct nfs4_ol_stateid *stp)
+@@ -1164,8 +1183,8 @@ static void release_open_stateid(struct nfs4_ol_stateid *stp)
+ 	LIST_HEAD(reaplist);
+ 
+ 	spin_lock(&stp->st_stid.sc_client->cl_lock);
+-	unhash_open_stateid(stp, &reaplist);
+-	put_ol_stateid_locked(stp, &reaplist);
++	if (unhash_open_stateid(stp, &reaplist))
++		put_ol_stateid_locked(stp, &reaplist);
+ 	spin_unlock(&stp->st_stid.sc_client->cl_lock);
+ 	free_ol_stateid_reaplist(&reaplist);
+ }
+@@ -1210,8 +1229,8 @@ static void release_openowner(struct nfs4_openowner *oo)
+ 	while (!list_empty(&oo->oo_owner.so_stateids)) {
+ 		stp = list_first_entry(&oo->oo_owner.so_stateids,
+ 				struct nfs4_ol_stateid, st_perstateowner);
+-		unhash_open_stateid(stp, &reaplist);
+-		put_ol_stateid_locked(stp, &reaplist);
++		if (unhash_open_stateid(stp, &reaplist))
++			put_ol_stateid_locked(stp, &reaplist);
+ 	}
+ 	spin_unlock(&clp->cl_lock);
+ 	free_ol_stateid_reaplist(&reaplist);
+@@ -1714,7 +1733,7 @@ __destroy_client(struct nfs4_client *clp)
+ 	spin_lock(&state_lock);
+ 	while (!list_empty(&clp->cl_delegations)) {
+ 		dp = list_entry(clp->cl_delegations.next, struct nfs4_delegation, dl_perclnt);
+-		unhash_delegation_locked(dp);
++		WARN_ON(!unhash_delegation_locked(dp));
+ 		list_add(&dp->dl_recall_lru, &reaplist);
+ 	}
+ 	spin_unlock(&state_lock);
+@@ -4345,7 +4364,7 @@ nfs4_laundromat(struct nfsd_net *nn)
+ 			new_timeo = min(new_timeo, t);
+ 			break;
+ 		}
+-		unhash_delegation_locked(dp);
++		WARN_ON(!unhash_delegation_locked(dp));
+ 		list_add(&dp->dl_recall_lru, &reaplist);
+ 	}
+ 	spin_unlock(&state_lock);
+@@ -4751,7 +4770,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ 		if (check_for_locks(stp->st_stid.sc_file,
+ 				    lockowner(stp->st_stateowner)))
+ 			break;
+-		unhash_lock_stateid(stp);
++		WARN_ON(!unhash_lock_stateid(stp));
+ 		spin_unlock(&cl->cl_lock);
+ 		nfs4_put_stid(s);
+ 		ret = nfs_ok;
+@@ -4967,20 +4986,23 @@ out:
+ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
+ {
+ 	struct nfs4_client *clp = s->st_stid.sc_client;
++	bool unhashed;
+ 	LIST_HEAD(reaplist);
+ 
+ 	s->st_stid.sc_type = NFS4_CLOSED_STID;
+ 	spin_lock(&clp->cl_lock);
+-	unhash_open_stateid(s, &reaplist);
++	unhashed = unhash_open_stateid(s, &reaplist);
+ 
+ 	if (clp->cl_minorversion) {
+-		put_ol_stateid_locked(s, &reaplist);
++		if (unhashed)
++			put_ol_stateid_locked(s, &reaplist);
+ 		spin_unlock(&clp->cl_lock);
+ 		free_ol_stateid_reaplist(&reaplist);
+ 	} else {
+ 		spin_unlock(&clp->cl_lock);
+ 		free_ol_stateid_reaplist(&reaplist);
+-		move_to_close_lru(s, clp->net);
++		if (unhashed)
++			move_to_close_lru(s, clp->net);
+ 	}
+ }
+ 
+@@ -6019,7 +6041,7 @@ nfsd_inject_add_lock_to_list(struct nfs4_ol_stateid *lst,
+ 
+ static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
+ 				    struct list_head *collect,
+-				    void (*func)(struct nfs4_ol_stateid *))
++				    bool (*func)(struct nfs4_ol_stateid *))
+ {
+ 	struct nfs4_openowner *oop;
+ 	struct nfs4_ol_stateid *stp, *st_next;
+@@ -6033,9 +6055,9 @@ static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
+ 			list_for_each_entry_safe(lst, lst_next,
+ 					&stp->st_locks, st_locks) {
+ 				if (func) {
+-					func(lst);
+-					nfsd_inject_add_lock_to_list(lst,
+-								collect);
++					if (func(lst))
++						nfsd_inject_add_lock_to_list(lst,
++									collect);
+ 				}
+ 				++count;
+ 				/*
+@@ -6305,7 +6327,7 @@ static u64 nfsd_find_all_delegations(struct nfs4_client *clp, u64 max,
+ 				continue;
+ 
+ 			atomic_inc(&clp->cl_refcount);
+-			unhash_delegation_locked(dp);
++			WARN_ON(!unhash_delegation_locked(dp));
+ 			list_add(&dp->dl_recall_lru, victims);
+ 		}
+ 		++count;
+@@ -6635,7 +6657,7 @@ nfs4_state_shutdown_net(struct net *net)
+ 	spin_lock(&state_lock);
+ 	list_for_each_safe(pos, next, &nn->del_recall_lru) {
+ 		dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru);
+-		unhash_delegation_locked(dp);
++		WARN_ON(!unhash_delegation_locked(dp));
+ 		list_add(&dp->dl_recall_lru, &reaplist);
+ 	}
+ 	spin_unlock(&state_lock);
+diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
+index 75e0563c09d1..b81f725ee21d 100644
+--- a/fs/nfsd/nfs4xdr.c
++++ b/fs/nfsd/nfs4xdr.c
+@@ -2140,6 +2140,27 @@ nfsd4_encode_aclname(struct xdr_stream *xdr, struct svc_rqst *rqstp,
+ 		return nfsd4_encode_user(xdr, rqstp, ace->who_uid);
+ }
+ 
++static inline __be32
++nfsd4_encode_layout_type(struct xdr_stream *xdr, enum pnfs_layouttype layout_type)
++{
++	__be32 *p;
++
++	if (layout_type) {
++		p = xdr_reserve_space(xdr, 8);
++		if (!p)
++			return nfserr_resource;
++		*p++ = cpu_to_be32(1);
++		*p++ = cpu_to_be32(layout_type);
++	} else {
++		p = xdr_reserve_space(xdr, 4);
++		if (!p)
++			return nfserr_resource;
++		*p++ = cpu_to_be32(0);
++	}
++
++	return 0;
++}
++
+ #define WORD0_ABSENT_FS_ATTRS (FATTR4_WORD0_FS_LOCATIONS | FATTR4_WORD0_FSID | \
+ 			      FATTR4_WORD0_RDATTR_ERROR)
+ #define WORD1_ABSENT_FS_ATTRS FATTR4_WORD1_MOUNTED_ON_FILEID
+@@ -2688,20 +2709,16 @@ out_acl:
+ 		p = xdr_encode_hyper(p, stat.ino);
+ 	}
+ #ifdef CONFIG_NFSD_PNFS
+-	if ((bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) ||
+-	    (bmval2 & FATTR4_WORD2_LAYOUT_TYPES)) {
+-		if (exp->ex_layout_type) {
+-			p = xdr_reserve_space(xdr, 8);
+-			if (!p)
+-				goto out_resource;
+-			*p++ = cpu_to_be32(1);
+-			*p++ = cpu_to_be32(exp->ex_layout_type);
+-		} else {
+-			p = xdr_reserve_space(xdr, 4);
+-			if (!p)
+-				goto out_resource;
+-			*p++ = cpu_to_be32(0);
+-		}
++	if (bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) {
++		status = nfsd4_encode_layout_type(xdr, exp->ex_layout_type);
++		if (status)
++			goto out;
++	}
++
++	if (bmval2 & FATTR4_WORD2_LAYOUT_TYPES) {
++		status = nfsd4_encode_layout_type(xdr, exp->ex_layout_type);
++		if (status)
++			goto out;
+ 	}
+ 
+ 	if (bmval2 & FATTR4_WORD2_LAYOUT_BLKSIZE) {
+diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
+index edb640ae9a94..eb1cebed3f36 100644
+--- a/include/linux/jbd2.h
++++ b/include/linux/jbd2.h
+@@ -1042,8 +1042,9 @@ void jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block);
+ extern void jbd2_journal_commit_transaction(journal_t *);
+ 
+ /* Checkpoint list management */
+-void __jbd2_journal_clean_checkpoint_list(journal_t *journal);
++void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
+ int __jbd2_journal_remove_checkpoint(struct journal_head *);
++void jbd2_journal_destroy_checkpoint(journal_t *journal);
+ void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
+ 
+ 
+diff --git a/include/linux/platform_data/st_nci.h b/include/linux/platform_data/st_nci.h
+deleted file mode 100644
+index d9d400a297bd..000000000000
+--- a/include/linux/platform_data/st_nci.h
++++ /dev/null
+@@ -1,29 +0,0 @@
+-/*
+- * Driver include for ST NCI NFC chip family.
+- *
+- * Copyright (C) 2014-2015  STMicroelectronics SAS. All rights reserved.
+- *
+- * This program is free software; you can redistribute it and/or modify it
+- * under the terms and conditions of the GNU General Public License,
+- * version 2, as published by the Free Software Foundation.
+- *
+- * This program is distributed in the hope that it will be useful,
+- * but WITHOUT ANY WARRANTY; without even the implied warranty of
+- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+- * GNU General Public License for more details.
+- *
+- * You should have received a copy of the GNU General Public License
+- * along with this program; if not, see <http://www.gnu.org/licenses/>.
+- */
+-
+-#ifndef _ST_NCI_H_
+-#define _ST_NCI_H_
+-
+-#define ST_NCI_DRIVER_NAME "st_nci"
+-
+-struct st_nci_nfc_platform_data {
+-	unsigned int gpio_reset;
+-	unsigned int irq_polarity;
+-};
+-
+-#endif /* _ST_NCI_H_ */
+diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
+index cb94ee4181d4..4929a8a9fd52 100644
+--- a/include/linux/sunrpc/svc_rdma.h
++++ b/include/linux/sunrpc/svc_rdma.h
+@@ -172,13 +172,6 @@ struct svcxprt_rdma {
+ #define RDMAXPRT_SQ_PENDING	2
+ #define RDMAXPRT_CONN_PENDING	3
+ 
+-#define RPCRDMA_MAX_SVC_SEGS	(64)	/* server max scatter/gather */
+-#if RPCSVC_MAXPAYLOAD < (RPCRDMA_MAX_SVC_SEGS << PAGE_SHIFT)
+-#define RPCRDMA_MAXPAYLOAD	RPCSVC_MAXPAYLOAD
+-#else
+-#define RPCRDMA_MAXPAYLOAD	(RPCRDMA_MAX_SVC_SEGS << PAGE_SHIFT)
+-#endif
+-
+ #define RPCRDMA_LISTEN_BACKLOG  10
+ /* The default ORD value is based on two outstanding full-size writes with a
+  * page size of 4k, or 32k * 2 ops / 4k = 16 outstanding RDMA_READ.  */
+@@ -187,6 +180,8 @@ struct svcxprt_rdma {
+ #define RPCRDMA_MAX_REQUESTS    32
+ #define RPCRDMA_MAX_REQ_SIZE    4096
+ 
++#define RPCSVC_MAXPAYLOAD_RDMA	RPCSVC_MAXPAYLOAD
++
+ /* svc_rdma_marshal.c */
+ extern int svc_rdma_xdr_decode_req(struct rpcrdma_msg **, struct svc_rqst *);
+ extern int svc_rdma_xdr_encode_error(struct svcxprt_rdma *,
+diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
+index 7591788e9fbf..357e44c1a46b 100644
+--- a/include/linux/sunrpc/xprtsock.h
++++ b/include/linux/sunrpc/xprtsock.h
+@@ -42,6 +42,7 @@ struct sock_xprt {
+ 	/*
+ 	 * Connection of transports
+ 	 */
++	unsigned long		sock_state;
+ 	struct delayed_work	connect_worker;
+ 	struct sockaddr_storage	srcaddr;
+ 	unsigned short		srcport;
+@@ -76,6 +77,8 @@ struct sock_xprt {
+  */
+ #define TCP_RPC_REPLY		(1UL << 6)
+ 
++#define XPRT_SOCK_CONNECTING	1U
++
+ #endif /* __KERNEL__ */
+ 
+ #endif /* _LINUX_SUNRPC_XPRTSOCK_H */
+diff --git a/include/soc/tegra/mc.h b/include/soc/tegra/mc.h
+index 1ab2813273cd..bf2058690ceb 100644
+--- a/include/soc/tegra/mc.h
++++ b/include/soc/tegra/mc.h
+@@ -66,6 +66,7 @@ struct tegra_smmu_soc {
+ 	bool supports_round_robin_arbitration;
+ 	bool supports_request_limit;
+ 
++	unsigned int num_tlb_lines;
+ 	unsigned int num_asids;
+ 
+ 	const struct tegra_smmu_ops *ops;
+diff --git a/include/sound/hda_i915.h b/include/sound/hda_i915.h
+index adb5ba5cbd9d..ff99140831ba 100644
+--- a/include/sound/hda_i915.h
++++ b/include/sound/hda_i915.h
+@@ -11,7 +11,7 @@ int snd_hdac_get_display_clk(struct hdac_bus *bus);
+ int snd_hdac_i915_init(struct hdac_bus *bus);
+ int snd_hdac_i915_exit(struct hdac_bus *bus);
+ #else
+-static int snd_hdac_set_codec_wakeup(struct hdac_bus *bus, bool enable)
++static inline int snd_hdac_set_codec_wakeup(struct hdac_bus *bus, bool enable)
+ {
+ 	return 0;
+ }
+diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
+index fd1a02cb3c82..003dca933803 100644
+--- a/include/trace/events/sunrpc.h
++++ b/include/trace/events/sunrpc.h
+@@ -529,18 +529,21 @@ TRACE_EVENT(svc_xprt_do_enqueue,
+ 
+ 	TP_STRUCT__entry(
+ 		__field(struct svc_xprt *, xprt)
+-		__field(struct svc_rqst *, rqst)
++		__field_struct(struct sockaddr_storage, ss)
++		__field(int, pid)
++		__field(unsigned long, flags)
+ 	),
+ 
+ 	TP_fast_assign(
+ 		__entry->xprt = xprt;
+-		__entry->rqst = rqst;
++		xprt ? memcpy(&__entry->ss, &xprt->xpt_remote, sizeof(__entry->ss)) : memset(&__entry->ss, 0, sizeof(__entry->ss));
++		__entry->pid = rqst? rqst->rq_task->pid : 0;
++		__entry->flags = xprt ? xprt->xpt_flags : 0;
+ 	),
+ 
+ 	TP_printk("xprt=0x%p addr=%pIScp pid=%d flags=%s", __entry->xprt,
+-		(struct sockaddr *)&__entry->xprt->xpt_remote,
+-		__entry->rqst ? __entry->rqst->rq_task->pid : 0,
+-		show_svc_xprt_flags(__entry->xprt->xpt_flags))
++		(struct sockaddr *)&__entry->ss,
++		__entry->pid, show_svc_xprt_flags(__entry->flags))
+ );
+ 
+ TRACE_EVENT(svc_xprt_dequeue,
+@@ -589,16 +592,20 @@ TRACE_EVENT(svc_handle_xprt,
+ 	TP_STRUCT__entry(
+ 		__field(struct svc_xprt *, xprt)
+ 		__field(int, len)
++		__field_struct(struct sockaddr_storage, ss)
++		__field(unsigned long, flags)
+ 	),
+ 
+ 	TP_fast_assign(
+ 		__entry->xprt = xprt;
++		xprt ? memcpy(&__entry->ss, &xprt->xpt_remote, sizeof(__entry->ss)) : memset(&__entry->ss, 0, sizeof(__entry->ss));
+ 		__entry->len = len;
++		__entry->flags = xprt ? xprt->xpt_flags : 0;
+ 	),
+ 
+ 	TP_printk("xprt=0x%p addr=%pIScp len=%d flags=%s", __entry->xprt,
+-		(struct sockaddr *)&__entry->xprt->xpt_remote, __entry->len,
+-		show_svc_xprt_flags(__entry->xprt->xpt_flags))
++		(struct sockaddr *)&__entry->ss,
++		__entry->len, show_svc_xprt_flags(__entry->flags))
+ );
+ #endif /* _TRACE_SUNRPC_H */
+ 
+diff --git a/kernel/fork.c b/kernel/fork.c
+index dbd9b8d7b7cc..26a70dc7a915 100644
+--- a/kernel/fork.c
++++ b/kernel/fork.c
+@@ -1871,13 +1871,21 @@ static int check_unshare_flags(unsigned long unshare_flags)
+ 				CLONE_NEWUSER|CLONE_NEWPID))
+ 		return -EINVAL;
+ 	/*
+-	 * Not implemented, but pretend it works if there is nothing to
+-	 * unshare. Note that unsharing CLONE_THREAD or CLONE_SIGHAND
+-	 * needs to unshare vm.
++	 * Not implemented, but pretend it works if there is nothing
++	 * to unshare.  Note that unsharing the address space or the
++	 * signal handlers also need to unshare the signal queues (aka
++	 * CLONE_THREAD).
+ 	 */
+ 	if (unshare_flags & (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM)) {
+-		/* FIXME: get_task_mm() increments ->mm_users */
+-		if (atomic_read(&current->mm->mm_users) > 1)
++		if (!thread_group_empty(current))
++			return -EINVAL;
++	}
++	if (unshare_flags & (CLONE_SIGHAND | CLONE_VM)) {
++		if (atomic_read(&current->sighand->count) > 1)
++			return -EINVAL;
++	}
++	if (unshare_flags & CLONE_VM) {
++		if (!current_is_single_threaded())
+ 			return -EINVAL;
+ 	}
+ 
+@@ -1946,16 +1954,16 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
+ 	if (unshare_flags & CLONE_NEWUSER)
+ 		unshare_flags |= CLONE_THREAD | CLONE_FS;
+ 	/*
+-	 * If unsharing a thread from a thread group, must also unshare vm.
+-	 */
+-	if (unshare_flags & CLONE_THREAD)
+-		unshare_flags |= CLONE_VM;
+-	/*
+ 	 * If unsharing vm, must also unshare signal handlers.
+ 	 */
+ 	if (unshare_flags & CLONE_VM)
+ 		unshare_flags |= CLONE_SIGHAND;
+ 	/*
++	 * If unsharing a signal handlers, must also unshare the signal queues.
++	 */
++	if (unshare_flags & CLONE_SIGHAND)
++		unshare_flags |= CLONE_THREAD;
++	/*
+ 	 * If unsharing namespace, must also unshare filesystem information.
+ 	 */
+ 	if (unshare_flags & CLONE_NEWNS)
+diff --git a/kernel/workqueue.c b/kernel/workqueue.c
+index 4c4f06176f74..a413acb59a07 100644
+--- a/kernel/workqueue.c
++++ b/kernel/workqueue.c
+@@ -2614,7 +2614,7 @@ void flush_workqueue(struct workqueue_struct *wq)
+ out_unlock:
+ 	mutex_unlock(&wq->mutex);
+ }
+-EXPORT_SYMBOL_GPL(flush_workqueue);
++EXPORT_SYMBOL(flush_workqueue);
+ 
+ /**
+  * drain_workqueue - drain a workqueue
+diff --git a/lib/decompress_bunzip2.c b/lib/decompress_bunzip2.c
+index 6dd0335ea61b..0234361b24b8 100644
+--- a/lib/decompress_bunzip2.c
++++ b/lib/decompress_bunzip2.c
+@@ -743,12 +743,12 @@ exit_0:
+ }
+ 
+ #ifdef PREBOOT
+-STATIC int INIT decompress(unsigned char *buf, long len,
++STATIC int INIT __decompress(unsigned char *buf, long len,
+ 			long (*fill)(void*, unsigned long),
+ 			long (*flush)(void*, unsigned long),
+-			unsigned char *outbuf,
++			unsigned char *outbuf, long olen,
+ 			long *pos,
+-			void(*error)(char *x))
++			void (*error)(char *x))
+ {
+ 	return bunzip2(buf, len - 4, fill, flush, outbuf, pos, error);
+ }
+diff --git a/lib/decompress_inflate.c b/lib/decompress_inflate.c
+index d4c7891635ec..555c06bf20da 100644
+--- a/lib/decompress_inflate.c
++++ b/lib/decompress_inflate.c
+@@ -1,4 +1,5 @@
+ #ifdef STATIC
++#define PREBOOT
+ /* Pre-boot environment: included */
+ 
+ /* prevent inclusion of _LINUX_KERNEL_H in pre-boot environment: lots
+@@ -33,23 +34,23 @@ static long INIT nofill(void *buffer, unsigned long len)
+ }
+ 
+ /* Included from initramfs et al code */
+-STATIC int INIT gunzip(unsigned char *buf, long len,
++STATIC int INIT __gunzip(unsigned char *buf, long len,
+ 		       long (*fill)(void*, unsigned long),
+ 		       long (*flush)(void*, unsigned long),
+-		       unsigned char *out_buf,
++		       unsigned char *out_buf, long out_len,
+ 		       long *pos,
+ 		       void(*error)(char *x)) {
+ 	u8 *zbuf;
+ 	struct z_stream_s *strm;
+ 	int rc;
+-	size_t out_len;
+ 
+ 	rc = -1;
+ 	if (flush) {
+ 		out_len = 0x8000; /* 32 K */
+ 		out_buf = malloc(out_len);
+ 	} else {
+-		out_len = ((size_t)~0) - (size_t)out_buf; /* no limit */
++		if (!out_len)
++			out_len = ((size_t)~0) - (size_t)out_buf; /* no limit */
+ 	}
+ 	if (!out_buf) {
+ 		error("Out of memory while allocating output buffer");
+@@ -181,4 +182,24 @@ gunzip_nomem1:
+ 	return rc; /* returns Z_OK (0) if successful */
+ }
+ 
+-#define decompress gunzip
++#ifndef PREBOOT
++STATIC int INIT gunzip(unsigned char *buf, long len,
++		       long (*fill)(void*, unsigned long),
++		       long (*flush)(void*, unsigned long),
++		       unsigned char *out_buf,
++		       long *pos,
++		       void (*error)(char *x))
++{
++	return __gunzip(buf, len, fill, flush, out_buf, 0, pos, error);
++}
++#else
++STATIC int INIT __decompress(unsigned char *buf, long len,
++			   long (*fill)(void*, unsigned long),
++			   long (*flush)(void*, unsigned long),
++			   unsigned char *out_buf, long out_len,
++			   long *pos,
++			   void (*error)(char *x))
++{
++	return __gunzip(buf, len, fill, flush, out_buf, out_len, pos, error);
++}
++#endif
+diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c
+index 40f66ebe57b7..036fc882cd72 100644
+--- a/lib/decompress_unlz4.c
++++ b/lib/decompress_unlz4.c
+@@ -196,12 +196,12 @@ exit_0:
+ }
+ 
+ #ifdef PREBOOT
+-STATIC int INIT decompress(unsigned char *buf, long in_len,
++STATIC int INIT __decompress(unsigned char *buf, long in_len,
+ 			      long (*fill)(void*, unsigned long),
+ 			      long (*flush)(void*, unsigned long),
+-			      unsigned char *output,
++			      unsigned char *output, long out_len,
+ 			      long *posp,
+-			      void(*error)(char *x)
++			      void (*error)(char *x)
+ 	)
+ {
+ 	return unlz4(buf, in_len - 4, fill, flush, output, posp, error);
+diff --git a/lib/decompress_unlzma.c b/lib/decompress_unlzma.c
+index 0be83af62b88..decb64629c14 100644
+--- a/lib/decompress_unlzma.c
++++ b/lib/decompress_unlzma.c
+@@ -667,13 +667,12 @@ exit_0:
+ }
+ 
+ #ifdef PREBOOT
+-STATIC int INIT decompress(unsigned char *buf, long in_len,
++STATIC int INIT __decompress(unsigned char *buf, long in_len,
+ 			      long (*fill)(void*, unsigned long),
+ 			      long (*flush)(void*, unsigned long),
+-			      unsigned char *output,
++			      unsigned char *output, long out_len,
+ 			      long *posp,
+-			      void(*error)(char *x)
+-	)
++			      void (*error)(char *x))
+ {
+ 	return unlzma(buf, in_len - 4, fill, flush, output, posp, error);
+ }
+diff --git a/lib/decompress_unlzo.c b/lib/decompress_unlzo.c
+index b94a31bdd87d..f4c158e3a022 100644
+--- a/lib/decompress_unlzo.c
++++ b/lib/decompress_unlzo.c
+@@ -31,6 +31,7 @@
+  */
+ 
+ #ifdef STATIC
++#define PREBOOT
+ #include "lzo/lzo1x_decompress_safe.c"
+ #else
+ #include <linux/decompress/unlzo.h>
+@@ -287,4 +288,14 @@ exit:
+ 	return ret;
+ }
+ 
+-#define decompress unlzo
++#ifdef PREBOOT
++STATIC int INIT __decompress(unsigned char *buf, long len,
++			   long (*fill)(void*, unsigned long),
++			   long (*flush)(void*, unsigned long),
++			   unsigned char *out_buf, long olen,
++			   long *pos,
++			   void (*error)(char *x))
++{
++	return unlzo(buf, len, fill, flush, out_buf, pos, error);
++}
++#endif
+diff --git a/lib/decompress_unxz.c b/lib/decompress_unxz.c
+index b07a78340e9d..25d59a95bd66 100644
+--- a/lib/decompress_unxz.c
++++ b/lib/decompress_unxz.c
+@@ -394,4 +394,14 @@ error_alloc_state:
+  * This macro is used by architecture-specific files to decompress
+  * the kernel image.
+  */
+-#define decompress unxz
++#ifdef XZ_PREBOOT
++STATIC int INIT __decompress(unsigned char *buf, long len,
++			   long (*fill)(void*, unsigned long),
++			   long (*flush)(void*, unsigned long),
++			   unsigned char *out_buf, long olen,
++			   long *pos,
++			   void (*error)(char *x))
++{
++	return unxz(buf, len, fill, flush, out_buf, pos, error);
++}
++#endif
+diff --git a/mm/vmscan.c b/mm/vmscan.c
+index 8286938c70de..26c86e2fb5af 100644
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -1190,7 +1190,7 @@ cull_mlocked:
+ 		if (PageSwapCache(page))
+ 			try_to_free_swap(page);
+ 		unlock_page(page);
+-		putback_lru_page(page);
++		list_add(&page->lru, &ret_pages);
+ 		continue;
+ 
+ activate_locked:
+diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
+index b8233505bf9f..8f1df6793650 100644
+--- a/net/mac80211/tx.c
++++ b/net/mac80211/tx.c
+@@ -311,9 +311,6 @@ ieee80211_tx_h_check_assoc(struct ieee80211_tx_data *tx)
+ 	if (tx->sdata->vif.type == NL80211_IFTYPE_WDS)
+ 		return TX_CONTINUE;
+ 
+-	if (tx->sdata->vif.type == NL80211_IFTYPE_MESH_POINT)
+-		return TX_CONTINUE;
+-
+ 	if (tx->flags & IEEE80211_TX_PS_BUFFERED)
+ 		return TX_CONTINUE;
+ 
+diff --git a/net/nfc/nci/hci.c b/net/nfc/nci/hci.c
+index af002df640c7..609f92283d1b 100644
+--- a/net/nfc/nci/hci.c
++++ b/net/nfc/nci/hci.c
+@@ -233,7 +233,7 @@ int nci_hci_send_cmd(struct nci_dev *ndev, u8 gate, u8 cmd,
+ 	r = nci_request(ndev, nci_hci_send_data_req, (unsigned long)&data,
+ 			msecs_to_jiffies(NCI_DATA_TIMEOUT));
+ 
+-	if (r == NCI_STATUS_OK)
++	if (r == NCI_STATUS_OK && skb)
+ 		*skb = conn_info->rx_skb;
+ 
+ 	return r;
+diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c
+index f85f37ed19b2..73d1ca7c546c 100644
+--- a/net/nfc/netlink.c
++++ b/net/nfc/netlink.c
+@@ -1518,12 +1518,13 @@ static int nfc_genl_vendor_cmd(struct sk_buff *skb,
+ 	if (!dev || !dev->vendor_cmds || !dev->n_vendor_cmds)
+ 		return -ENODEV;
+ 
+-	data = nla_data(info->attrs[NFC_ATTR_VENDOR_DATA]);
+-	if (data) {
++	if (info->attrs[NFC_ATTR_VENDOR_DATA]) {
++		data = nla_data(info->attrs[NFC_ATTR_VENDOR_DATA]);
+ 		data_len = nla_len(info->attrs[NFC_ATTR_VENDOR_DATA]);
+ 		if (data_len == 0)
+ 			return -EINVAL;
+ 	} else {
++		data = NULL;
+ 		data_len = 0;
+ 	}
+ 
+diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
+index ab5dd621ae0c..2e98f4a243e5 100644
+--- a/net/sunrpc/xprt.c
++++ b/net/sunrpc/xprt.c
+@@ -614,6 +614,7 @@ static void xprt_autoclose(struct work_struct *work)
+ 	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+ 	xprt->ops->close(xprt);
+ 	xprt_release_write(xprt, NULL);
++	wake_up_bit(&xprt->state, XPRT_LOCKED);
+ }
+ 
+ /**
+@@ -723,6 +724,7 @@ void xprt_unlock_connect(struct rpc_xprt *xprt, void *cookie)
+ 	xprt->ops->release_xprt(xprt, NULL);
+ out:
+ 	spin_unlock_bh(&xprt->transport_lock);
++	wake_up_bit(&xprt->state, XPRT_LOCKED);
+ }
+ 
+ /**
+@@ -1394,6 +1396,10 @@ out:
+ static void xprt_destroy(struct rpc_xprt *xprt)
+ {
+ 	dprintk("RPC:       destroying transport %p\n", xprt);
++
++	/* Exclude transport connect/disconnect handlers */
++	wait_on_bit_lock(&xprt->state, XPRT_LOCKED, TASK_UNINTERRUPTIBLE);
++
+ 	del_timer_sync(&xprt->timer);
+ 
+ 	rpc_xprt_debugfs_unregister(xprt);
+diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
+index 6b36279e4288..48f6de912f78 100644
+--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
++++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
+@@ -91,7 +91,7 @@ struct svc_xprt_class svc_rdma_class = {
+ 	.xcl_name = "rdma",
+ 	.xcl_owner = THIS_MODULE,
+ 	.xcl_ops = &svc_rdma_ops,
+-	.xcl_max_payload = RPCRDMA_MAXPAYLOAD,
++	.xcl_max_payload = RPCSVC_MAXPAYLOAD_RDMA,
+ 	.xcl_ident = XPRT_TRANSPORT_RDMA,
+ };
+ 
+diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
+index f49dd8b38122..e718d0959af3 100644
+--- a/net/sunrpc/xprtrdma/xprt_rdma.h
++++ b/net/sunrpc/xprtrdma/xprt_rdma.h
+@@ -51,7 +51,6 @@
+ #include <linux/sunrpc/clnt.h> 		/* rpc_xprt */
+ #include <linux/sunrpc/rpc_rdma.h> 	/* RPC/RDMA protocol */
+ #include <linux/sunrpc/xprtrdma.h> 	/* xprt parameters */
+-#include <linux/sunrpc/svc.h>		/* RPCSVC_MAXPAYLOAD */
+ 
+ #define RDMA_RESOLVE_TIMEOUT	(5000)	/* 5 seconds */
+ #define RDMA_CONNECT_RETRY_MAX	(2)	/* retries if no listener backlog */
+diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
+index 0030376327b7..8a39b1e48bc4 100644
+--- a/net/sunrpc/xprtsock.c
++++ b/net/sunrpc/xprtsock.c
+@@ -829,6 +829,7 @@ static void xs_reset_transport(struct sock_xprt *transport)
+ 	sk->sk_user_data = NULL;
+ 
+ 	xs_restore_old_callbacks(transport, sk);
++	xprt_clear_connected(xprt);
+ 	write_unlock_bh(&sk->sk_callback_lock);
+ 	xs_sock_reset_connection_flags(xprt);
+ 
+@@ -1432,6 +1433,7 @@ out:
+ static void xs_tcp_state_change(struct sock *sk)
+ {
+ 	struct rpc_xprt *xprt;
++	struct sock_xprt *transport;
+ 
+ 	read_lock_bh(&sk->sk_callback_lock);
+ 	if (!(xprt = xprt_from_sock(sk)))
+@@ -1443,13 +1445,12 @@ static void xs_tcp_state_change(struct sock *sk)
+ 			sock_flag(sk, SOCK_ZAPPED),
+ 			sk->sk_shutdown);
+ 
++	transport = container_of(xprt, struct sock_xprt, xprt);
+ 	trace_rpc_socket_state_change(xprt, sk->sk_socket);
+ 	switch (sk->sk_state) {
+ 	case TCP_ESTABLISHED:
+ 		spin_lock(&xprt->transport_lock);
+ 		if (!xprt_test_and_set_connected(xprt)) {
+-			struct sock_xprt *transport = container_of(xprt,
+-					struct sock_xprt, xprt);
+ 
+ 			/* Reset TCP record info */
+ 			transport->tcp_offset = 0;
+@@ -1458,6 +1459,8 @@ static void xs_tcp_state_change(struct sock *sk)
+ 			transport->tcp_flags =
+ 				TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
+ 			xprt->connect_cookie++;
++			clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
++			xprt_clear_connecting(xprt);
+ 
+ 			xprt_wake_pending_tasks(xprt, -EAGAIN);
+ 		}
+@@ -1493,6 +1496,9 @@ static void xs_tcp_state_change(struct sock *sk)
+ 		smp_mb__after_atomic();
+ 		break;
+ 	case TCP_CLOSE:
++		if (test_and_clear_bit(XPRT_SOCK_CONNECTING,
++					&transport->sock_state))
++			xprt_clear_connecting(xprt);
+ 		xs_sock_mark_closed(xprt);
+ 	}
+  out:
+@@ -2176,6 +2182,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
+ 	/* Tell the socket layer to start connecting... */
+ 	xprt->stat.connect_count++;
+ 	xprt->stat.connect_start = jiffies;
++	set_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
+ 	ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
+ 	switch (ret) {
+ 	case 0:
+@@ -2237,7 +2244,6 @@ static void xs_tcp_setup_socket(struct work_struct *work)
+ 	case -EINPROGRESS:
+ 	case -EALREADY:
+ 		xprt_unlock_connect(xprt, transport);
+-		xprt_clear_connecting(xprt);
+ 		return;
+ 	case -EINVAL:
+ 		/* Happens, for instance, if the user specified a link
+@@ -2279,13 +2285,14 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
+ 
+ 	WARN_ON_ONCE(!xprt_lock_connect(xprt, task, transport));
+ 
+-	/* Start by resetting any existing state */
+-	xs_reset_transport(transport);
+-
+-	if (transport->sock != NULL && !RPC_IS_SOFTCONN(task)) {
++	if (transport->sock != NULL) {
+ 		dprintk("RPC:       xs_connect delayed xprt %p for %lu "
+ 				"seconds\n",
+ 				xprt, xprt->reestablish_timeout / HZ);
++
++		/* Start by resetting any existing state */
++		xs_reset_transport(transport);
++
+ 		queue_delayed_work(rpciod_workqueue,
+ 				   &transport->connect_worker,
+ 				   xprt->reestablish_timeout);
+diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
+index 374ea53288ca..c8f01ccc2513 100644
+--- a/sound/pci/hda/patch_realtek.c
++++ b/sound/pci/hda/patch_realtek.c
+@@ -1135,7 +1135,7 @@ static const struct hda_fixup alc880_fixups[] = {
+ 		/* override all pins as BIOS on old Amilo is broken */
+ 		.type = HDA_FIXUP_PINS,
+ 		.v.pins = (const struct hda_pintbl[]) {
+-			{ 0x14, 0x0121411f }, /* HP */
++			{ 0x14, 0x0121401f }, /* HP */
+ 			{ 0x15, 0x99030120 }, /* speaker */
+ 			{ 0x16, 0x99030130 }, /* bass speaker */
+ 			{ 0x17, 0x411111f0 }, /* N/A */
+@@ -1155,7 +1155,7 @@ static const struct hda_fixup alc880_fixups[] = {
+ 		/* almost compatible with FUJITSU, but no bass and SPDIF */
+ 		.type = HDA_FIXUP_PINS,
+ 		.v.pins = (const struct hda_pintbl[]) {
+-			{ 0x14, 0x0121411f }, /* HP */
++			{ 0x14, 0x0121401f }, /* HP */
+ 			{ 0x15, 0x99030120 }, /* speaker */
+ 			{ 0x16, 0x411111f0 }, /* N/A */
+ 			{ 0x17, 0x411111f0 }, /* N/A */
+@@ -1364,7 +1364,7 @@ static const struct snd_pci_quirk alc880_fixup_tbl[] = {
+ 	SND_PCI_QUIRK(0x161f, 0x203d, "W810", ALC880_FIXUP_W810),
+ 	SND_PCI_QUIRK(0x161f, 0x205d, "Medion Rim 2150", ALC880_FIXUP_MEDION_RIM),
+ 	SND_PCI_QUIRK(0x1631, 0xe011, "PB 13201056", ALC880_FIXUP_6ST_AUTOMUTE),
+-	SND_PCI_QUIRK(0x1734, 0x107c, "FSC F1734", ALC880_FIXUP_F1734),
++	SND_PCI_QUIRK(0x1734, 0x107c, "FSC Amilo M1437", ALC880_FIXUP_FUJITSU),
+ 	SND_PCI_QUIRK(0x1734, 0x1094, "FSC Amilo M1451G", ALC880_FIXUP_FUJITSU),
+ 	SND_PCI_QUIRK(0x1734, 0x10ac, "FSC AMILO Xi 1526", ALC880_FIXUP_F1734),
+ 	SND_PCI_QUIRK(0x1734, 0x10b0, "FSC Amilo Pi1556", ALC880_FIXUP_FUJITSU),
+@@ -5189,8 +5189,11 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
+ 	SND_PCI_QUIRK(0x1028, 0x06c7, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE),
+ 	SND_PCI_QUIRK(0x1028, 0x06d9, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE),
+ 	SND_PCI_QUIRK(0x1028, 0x06da, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE),
+-	SND_PCI_QUIRK(0x1028, 0x06de, "Dell", ALC292_FIXUP_DISABLE_AAMIX),
+ 	SND_PCI_QUIRK(0x1028, 0x06db, "Dell", ALC292_FIXUP_DISABLE_AAMIX),
++	SND_PCI_QUIRK(0x1028, 0x06dd, "Dell", ALC292_FIXUP_DISABLE_AAMIX),
++	SND_PCI_QUIRK(0x1028, 0x06de, "Dell", ALC292_FIXUP_DISABLE_AAMIX),
++	SND_PCI_QUIRK(0x1028, 0x06df, "Dell", ALC292_FIXUP_DISABLE_AAMIX),
++	SND_PCI_QUIRK(0x1028, 0x06e0, "Dell", ALC292_FIXUP_DISABLE_AAMIX),
+ 	SND_PCI_QUIRK(0x1028, 0x164a, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE),
+ 	SND_PCI_QUIRK(0x1028, 0x164b, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE),
+ 	SND_PCI_QUIRK(0x103c, 0x1586, "HP", ALC269_FIXUP_HP_MUTE_LED_MIC2),
+@@ -6579,6 +6582,7 @@ static const struct snd_pci_quirk alc662_fixup_tbl[] = {
+ 	SND_PCI_QUIRK(0x1028, 0x05db, "Dell", ALC668_FIXUP_DELL_MIC_NO_PRESENCE),
+ 	SND_PCI_QUIRK(0x1028, 0x05fe, "Dell XPS 15", ALC668_FIXUP_DELL_XPS13),
+ 	SND_PCI_QUIRK(0x1028, 0x060a, "Dell XPS 13", ALC668_FIXUP_DELL_XPS13),
++	SND_PCI_QUIRK(0x1028, 0x060d, "Dell M3800", ALC668_FIXUP_DELL_XPS13),
+ 	SND_PCI_QUIRK(0x1028, 0x0625, "Dell", ALC668_FIXUP_DELL_MIC_NO_PRESENCE),
+ 	SND_PCI_QUIRK(0x1028, 0x0626, "Dell", ALC668_FIXUP_DELL_MIC_NO_PRESENCE),
+ 	SND_PCI_QUIRK(0x1028, 0x0696, "Dell", ALC668_FIXUP_DELL_MIC_NO_PRESENCE),
+diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c
+index 6b3acba5da7a..83d6e76435b4 100644
+--- a/sound/usb/mixer.c
++++ b/sound/usb/mixer.c
+@@ -2522,7 +2522,7 @@ static int restore_mixer_value(struct usb_mixer_elem_list *list)
+ 		for (c = 0; c < MAX_CHANNELS; c++) {
+ 			if (!(cval->cmask & (1 << c)))
+ 				continue;
+-			if (cval->cached & (1 << c)) {
++			if (cval->cached & (1 << (c + 1))) {
+ 				err = snd_usb_set_cur_mix_value(cval, c + 1, idx,
+ 							cval->cache_val[idx]);
+ 				if (err < 0)
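
Note on the sound/usb/mixer.c hunk above: the loop index c is zero-based, while the value is written back for channel c + 1, so the cached-bit test has to use the same one-based channel number. The following is a minimal standalone sketch of that off-by-one, not the kernel code itself. The names demo_elem, demo_cache_channel and demo_count_restored are hypothetical, the value of MAX_CHANNELS is an assumption, and the bit layout (bit n of cached marks channel n, counted from 1) is inferred from the fix rather than taken from the driver source.

```c
#include <assert.h>

#define MAX_CHANNELS 16 /* assumed value, for this sketch only */

/* Hypothetical stand-in for the mixer element state. */
struct demo_elem {
	unsigned int cmask;  /* bit c set: channel c + 1 is present */
	unsigned int cached; /* bit n set: a value is cached for channel n (1-based) */
};

/* Record that a value has been cached for a one-based channel number. */
static void demo_cache_channel(struct demo_elem *e, int channel)
{
	assert(channel >= 1 && channel <= MAX_CHANNELS);
	e->cached |= 1u << channel;
}

/*
 * Count how many present channels would be restored.
 * fixed == 0 uses the old test (1 << c), fixed != 0 the new one (1 << (c + 1)).
 */
static int demo_count_restored(const struct demo_elem *e, int fixed)
{
	int c, restored = 0;

	for (c = 0; c < MAX_CHANNELS; c++) {
		unsigned int bit = fixed ? (1u << (c + 1)) : (1u << c);

		if (!(e->cmask & (1u << c)))
			continue;
		if (e->cached & bit)
			restored++;
	}
	return restored;
}
```

With two channels present (cmask = 0x3) and both cached through demo_cache_channel(), the old test finds only one of them because it looks one bit too low, while the new test finds both.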


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-29 19:16 Mike Pagano
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Pagano @ 2015-09-29 19:16 UTC (permalink / raw
  To: gentoo-commits

commit:     ddc71720a6ed4cec05ef162cbcab0a74b71f76a7
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Mon Oct  5 11:49:35 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Mon Oct  5 11:49:35 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=ddc71720

Remove redundant patch

 2710_flush-workqueue-non-GPL-availability.patch | 33 -------------------------
 1 file changed, 33 deletions(-)

diff --git a/2710_flush-workqueue-non-GPL-availability.patch b/2710_flush-workqueue-non-GPL-availability.patch
deleted file mode 100644
index 3e017d4..0000000
--- a/2710_flush-workqueue-non-GPL-availability.patch
+++ /dev/null
@@ -1,33 +0,0 @@
-From 1dadafa86a779884f14a6e7a3ddde1a57b0a0a65 Mon Sep 17 00:00:00 2001
-From: Tim Gardner <tim.gardner@canonical.com>
-Date: Tue, 4 Aug 2015 11:26:04 -0600
-Subject: workqueue: Make flush_workqueue() available again to non GPL modules
-
-Commit 37b1ef31a568fc02e53587620226e5f3c66454c8 ("workqueue: move
-flush_scheduled_work() to workqueue.h") moved the exported non GPL
-flush_scheduled_work() from a function to an inline wrapper.
-Unfortunately, it directly calls flush_workqueue() which is a GPL function.
-This has the effect of changing the licensing requirement for this function
-and makes it unavailable to non GPL modules.
-
-See commit ad7b1f841f8a54c6d61ff181451f55b68175e15a ("workqueue: Make
-schedule_work() available again to non GPL modules") for precedent.
-
-Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-Signed-off-by: Tejun Heo <tj@kernel.org>
-
-diff --git a/kernel/workqueue.c b/kernel/workqueue.c
-index 4c4f061..a413acb 100644
---- a/kernel/workqueue.c
-+++ b/kernel/workqueue.c
-@@ -2614,7 +2614,7 @@ void flush_workqueue(struct workqueue_struct *wq)
- out_unlock:
-    mutex_unlock(&wq->mutex);
- }
--EXPORT_SYMBOL_GPL(flush_workqueue);
-+EXPORT_SYMBOL(flush_workqueue);
- 
- /**
-  * drain_workqueue - drain a workqueue
--- 
-cgit v0.10.2


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-10-03 16:12 Mike Pagano
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Pagano @ 2015-10-03 16:12 UTC (permalink / raw
  To: gentoo-commits

commit:     2c0f6c3b92e2248ee19155496c89a7eead78472a
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Sat Oct  3 16:12:47 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Sat Oct  3 16:12:47 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=2c0f6c3b

Linux patch 4.2.3

 0000_README            |    4 +
 1002_linux-4.2.3.patch | 1532 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1536 insertions(+)

diff --git a/0000_README b/0000_README
index 9428abc..5a14372 100644
--- a/0000_README
+++ b/0000_README
@@ -51,6 +51,10 @@ Patch:  1001_linux-4.2.2.patch
 From:   http://www.kernel.org
 Desc:   Linux 4.2.2
 
+Patch:  1002_linux-4.2.3.patch
+From:   http://www.kernel.org
+Desc:   Linux 4.2.3
+
 Patch:  1500_XATTR_USER_PREFIX.patch
 From:   https://bugs.gentoo.org/show_bug.cgi?id=470644
 Desc:   Support for namespace user.pax.* on tmpfs.

diff --git a/1002_linux-4.2.3.patch b/1002_linux-4.2.3.patch
new file mode 100644
index 0000000..018e36c
--- /dev/null
+++ b/1002_linux-4.2.3.patch
@@ -0,0 +1,1532 @@
+diff --git a/Documentation/devicetree/bindings/net/ethernet.txt b/Documentation/devicetree/bindings/net/ethernet.txt
+index 41b3f3f864e8..5d88f37480b6 100644
+--- a/Documentation/devicetree/bindings/net/ethernet.txt
++++ b/Documentation/devicetree/bindings/net/ethernet.txt
+@@ -25,7 +25,11 @@ The following properties are common to the Ethernet controllers:
+   flow control thresholds.
+ - tx-fifo-depth: the size of the controller's transmit fifo in bytes. This
+   is used for components that can have configurable fifo sizes.
++- managed: string, specifies the PHY management type. Supported values are:
++  "auto", "in-band-status". "auto" is the default, it uses MDIO for
++  management if fixed-link is not specified.
+ 
+ Child nodes of the Ethernet controller are typically the individual PHY devices
+ connected via the MDIO bus (sometimes the MDIO bus controller is separate).
+ They are described in the phy.txt file in this same directory.
++For non-MDIO PHY management see fixed-link.txt.
+diff --git a/Makefile b/Makefile
+index 3578b4426ecf..a6edbb11a69a 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 2
++SUBLEVEL = 3
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+ 
+diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
+index 965d1afb0eaa..5cb13ca3a3ac 100644
+--- a/drivers/block/zram/zcomp.c
++++ b/drivers/block/zram/zcomp.c
+@@ -330,12 +330,14 @@ void zcomp_destroy(struct zcomp *comp)
+  * allocate new zcomp and initialize it. return compressing
+  * backend pointer or ERR_PTR if things went bad. ERR_PTR(-EINVAL)
+  * if requested algorithm is not supported, ERR_PTR(-ENOMEM) in
+- * case of allocation error.
++ * case of allocation error, or any other error potentially
++ * returned by functions zcomp_strm_{multi,single}_create.
+  */
+ struct zcomp *zcomp_create(const char *compress, int max_strm)
+ {
+ 	struct zcomp *comp;
+ 	struct zcomp_backend *backend;
++	int error;
+ 
+ 	backend = find_backend(compress);
+ 	if (!backend)
+@@ -347,12 +349,12 @@ struct zcomp *zcomp_create(const char *compress, int max_strm)
+ 
+ 	comp->backend = backend;
+ 	if (max_strm > 1)
+-		zcomp_strm_multi_create(comp, max_strm);
++		error = zcomp_strm_multi_create(comp, max_strm);
+ 	else
+-		zcomp_strm_single_create(comp);
+-	if (!comp->stream) {
++		error = zcomp_strm_single_create(comp);
++	if (error) {
+ 		kfree(comp);
+-		return ERR_PTR(-ENOMEM);
++		return ERR_PTR(error);
+ 	}
+ 	return comp;
+ }
+diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
+index 079897b3a955..9d56515f4c4d 100644
+--- a/drivers/net/dsa/bcm_sf2.c
++++ b/drivers/net/dsa/bcm_sf2.c
+@@ -418,7 +418,7 @@ static int bcm_sf2_sw_fast_age_port(struct dsa_switch  *ds, int port)
+ 	core_writel(priv, port, CORE_FAST_AGE_PORT);
+ 
+ 	reg = core_readl(priv, CORE_FAST_AGE_CTRL);
+-	reg |= EN_AGE_PORT | FAST_AGE_STR_DONE;
++	reg |= EN_AGE_PORT | EN_AGE_DYNAMIC | FAST_AGE_STR_DONE;
+ 	core_writel(priv, reg, CORE_FAST_AGE_CTRL);
+ 
+ 	do {
+@@ -432,6 +432,8 @@ static int bcm_sf2_sw_fast_age_port(struct dsa_switch  *ds, int port)
+ 	if (!timeout)
+ 		return -ETIMEDOUT;
+ 
++	core_writel(priv, 0, CORE_FAST_AGE_CTRL);
++
+ 	return 0;
+ }
+ 
+@@ -507,7 +509,7 @@ static int bcm_sf2_sw_br_set_stp_state(struct dsa_switch *ds, int port,
+ 	u32 reg;
+ 
+ 	reg = core_readl(priv, CORE_G_PCTL_PORT(port));
+-	cur_hw_state = reg >> G_MISTP_STATE_SHIFT;
++	cur_hw_state = reg & (G_MISTP_STATE_MASK << G_MISTP_STATE_SHIFT);
+ 
+ 	switch (state) {
+ 	case BR_STATE_DISABLED:
+@@ -531,10 +533,12 @@ static int bcm_sf2_sw_br_set_stp_state(struct dsa_switch *ds, int port,
+ 	}
+ 
+ 	/* Fast-age ARL entries if we are moving a port from Learning or
+-	 * Forwarding state to Disabled, Blocking or Listening state
++	 * Forwarding (cur_hw_state) state to Disabled, Blocking or Listening
++	 * state (hw_state)
+ 	 */
+ 	if (cur_hw_state != hw_state) {
+-		if (cur_hw_state & 4 && !(hw_state & 4)) {
++		if (cur_hw_state >= G_MISTP_LEARN_STATE &&
++		    hw_state <= G_MISTP_LISTEN_STATE) {
+ 			ret = bcm_sf2_sw_fast_age_port(ds, port);
+ 			if (ret) {
+ 				pr_err("%s: fast-ageing failed\n", __func__);
+@@ -901,15 +905,11 @@ static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,
+ 					 struct fixed_phy_status *status)
+ {
+ 	struct bcm_sf2_priv *priv = ds_to_priv(ds);
+-	u32 duplex, pause, speed;
++	u32 duplex, pause;
+ 	u32 reg;
+ 
+ 	duplex = core_readl(priv, CORE_DUPSTS);
+ 	pause = core_readl(priv, CORE_PAUSESTS);
+-	speed = core_readl(priv, CORE_SPDSTS);
+-
+-	speed >>= (port * SPDSTS_SHIFT);
+-	speed &= SPDSTS_MASK;
+ 
+ 	status->link = 0;
+ 
+@@ -944,18 +944,6 @@ static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,
+ 		reg &= ~LINK_STS;
+ 	core_writel(priv, reg, CORE_STS_OVERRIDE_GMIIP_PORT(port));
+ 
+-	switch (speed) {
+-	case SPDSTS_10:
+-		status->speed = SPEED_10;
+-		break;
+-	case SPDSTS_100:
+-		status->speed = SPEED_100;
+-		break;
+-	case SPDSTS_1000:
+-		status->speed = SPEED_1000;
+-		break;
+-	}
+-
+ 	if ((pause & (1 << port)) &&
+ 	    (pause & (1 << (port + PAUSESTS_TX_PAUSE_SHIFT)))) {
+ 		status->asym_pause = 1;
+diff --git a/drivers/net/dsa/bcm_sf2.h b/drivers/net/dsa/bcm_sf2.h
+index 22e2ebf31333..789d7b7737da 100644
+--- a/drivers/net/dsa/bcm_sf2.h
++++ b/drivers/net/dsa/bcm_sf2.h
+@@ -112,8 +112,8 @@ static inline u64 name##_readq(struct bcm_sf2_priv *priv, u32 off)	\
+ 	spin_unlock(&priv->indir_lock);					\
+ 	return (u64)indir << 32 | dir;					\
+ }									\
+-static inline void name##_writeq(struct bcm_sf2_priv *priv, u32 off,	\
+-							u64 val)	\
++static inline void name##_writeq(struct bcm_sf2_priv *priv, u64 val,	\
++							u32 off)	\
+ {									\
+ 	spin_lock(&priv->indir_lock);					\
+ 	reg_writel(priv, upper_32_bits(val), REG_DIR_DATA_WRITE);	\
+diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
+index 561342466076..26ec2fbfaa89 100644
+--- a/drivers/net/dsa/mv88e6xxx.c
++++ b/drivers/net/dsa/mv88e6xxx.c
+@@ -1387,6 +1387,7 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
+ 		reg = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
+ 		if (dsa_is_cpu_port(ds, port) ||
+ 		    ds->dsa_port_mask & (1 << port)) {
++			reg &= ~PORT_PCS_CTRL_UNFORCED;
+ 			reg |= PORT_PCS_CTRL_FORCE_LINK |
+ 				PORT_PCS_CTRL_LINK_UP |
+ 				PORT_PCS_CTRL_DUPLEX_FULL |
+diff --git a/drivers/net/ethernet/altera/altera_tse_main.c b/drivers/net/ethernet/altera/altera_tse_main.c
+index da48e66377b5..8207877d6237 100644
+--- a/drivers/net/ethernet/altera/altera_tse_main.c
++++ b/drivers/net/ethernet/altera/altera_tse_main.c
+@@ -511,8 +511,7 @@ static int tse_poll(struct napi_struct *napi, int budget)
+ 
+ 	if (rxcomplete < budget) {
+ 
+-		napi_gro_flush(napi, false);
+-		__napi_complete(napi);
++		napi_complete(napi);
+ 
+ 		netdev_dbg(priv->dev,
+ 			   "NAPI Complete, did %d packets with budget %d\n",
+diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
+index b349e6f36ea7..de63266de16b 100644
+--- a/drivers/net/ethernet/freescale/fec_main.c
++++ b/drivers/net/ethernet/freescale/fec_main.c
+@@ -1402,6 +1402,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
+ 		if ((status & BD_ENET_RX_LAST) == 0)
+ 			netdev_err(ndev, "rcv is not +last\n");
+ 
++		writel(FEC_ENET_RXF, fep->hwp + FEC_IEVENT);
+ 
+ 		/* Check for errors. */
+ 		if (status & (BD_ENET_RX_LG | BD_ENET_RX_SH | BD_ENET_RX_NO |
+diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
+index 62e48bc0cb23..09ec32e33076 100644
+--- a/drivers/net/ethernet/marvell/mvneta.c
++++ b/drivers/net/ethernet/marvell/mvneta.c
+@@ -1479,6 +1479,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
+ 		struct mvneta_rx_desc *rx_desc = mvneta_rxq_next_desc_get(rxq);
+ 		struct sk_buff *skb;
+ 		unsigned char *data;
++		dma_addr_t phys_addr;
+ 		u32 rx_status;
+ 		int rx_bytes, err;
+ 
+@@ -1486,6 +1487,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
+ 		rx_status = rx_desc->status;
+ 		rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
+ 		data = (unsigned char *)rx_desc->buf_cookie;
++		phys_addr = rx_desc->buf_phys_addr;
+ 
+ 		if (!mvneta_rxq_desc_is_first_last(rx_status) ||
+ 		    (rx_status & MVNETA_RXD_ERR_SUMMARY)) {
+@@ -1534,7 +1536,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
+ 		if (!skb)
+ 			goto err_drop_frame;
+ 
+-		dma_unmap_single(dev->dev.parent, rx_desc->buf_phys_addr,
++		dma_unmap_single(dev->dev.parent, phys_addr,
+ 				 MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
+ 
+ 		rcvd_pkts++;
+@@ -3027,8 +3029,8 @@ static int mvneta_probe(struct platform_device *pdev)
+ 	const char *dt_mac_addr;
+ 	char hw_mac_addr[ETH_ALEN];
+ 	const char *mac_from;
++	const char *managed;
+ 	int phy_mode;
+-	int fixed_phy = 0;
+ 	int err;
+ 
+ 	/* Our multiqueue support is not complete, so for now, only
+@@ -3062,7 +3064,6 @@ static int mvneta_probe(struct platform_device *pdev)
+ 			dev_err(&pdev->dev, "cannot register fixed PHY\n");
+ 			goto err_free_irq;
+ 		}
+-		fixed_phy = 1;
+ 
+ 		/* In the case of a fixed PHY, the DT node associated
+ 		 * to the PHY is the Ethernet MAC DT node.
+@@ -3086,8 +3087,10 @@ static int mvneta_probe(struct platform_device *pdev)
+ 	pp = netdev_priv(dev);
+ 	pp->phy_node = phy_node;
+ 	pp->phy_interface = phy_mode;
+-	pp->use_inband_status = (phy_mode == PHY_INTERFACE_MODE_SGMII) &&
+-				fixed_phy;
++
++	err = of_property_read_string(dn, "managed", &managed);
++	pp->use_inband_status = (err == 0 &&
++				 strcmp(managed, "in-band-status") == 0);
+ 
+ 	pp->clk = devm_clk_get(&pdev->dev, NULL);
+ 	if (IS_ERR(pp->clk)) {
+diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+index 9c145dddd717..4f95fa7b594d 100644
+--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
++++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+@@ -1250,8 +1250,6 @@ int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv)
+ 		rss_context->hash_fn = MLX4_RSS_HASH_TOP;
+ 		memcpy(rss_context->rss_key, priv->rss_key,
+ 		       MLX4_EN_RSS_KEY_SIZE);
+-		netdev_rss_key_fill(rss_context->rss_key,
+-				    MLX4_EN_RSS_KEY_SIZE);
+ 	} else {
+ 		en_err(priv, "Unknown RSS hash function requested\n");
+ 		err = -EINVAL;
+diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
+index 29c2a017a450..a408977a531a 100644
+--- a/drivers/net/ethernet/mellanox/mlx4/main.c
++++ b/drivers/net/ethernet/mellanox/mlx4/main.c
+@@ -2654,9 +2654,14 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
+ 
+ 	if (msi_x) {
+ 		int nreq = dev->caps.num_ports * num_online_cpus() + 1;
++		bool shared_ports = false;
+ 
+ 		nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
+ 			     nreq);
++		if (nreq > MAX_MSIX) {
++			nreq = MAX_MSIX;
++			shared_ports = true;
++		}
+ 
+ 		entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL);
+ 		if (!entries)
+@@ -2679,6 +2684,9 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
+ 		bitmap_zero(priv->eq_table.eq[MLX4_EQ_ASYNC].actv_ports.ports,
+ 			    dev->caps.num_ports);
+ 
++		if (MLX4_IS_LEGACY_EQ_MODE(dev->caps))
++			shared_ports = true;
++
+ 		for (i = 0; i < dev->caps.num_comp_vectors + 1; i++) {
+ 			if (i == MLX4_EQ_ASYNC)
+ 				continue;
+@@ -2686,7 +2694,7 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
+ 			priv->eq_table.eq[i].irq =
+ 				entries[i + 1 - !!(i > MLX4_EQ_ASYNC)].vector;
+ 
+-			if (MLX4_IS_LEGACY_EQ_MODE(dev->caps)) {
++			if (shared_ports) {
+ 				bitmap_fill(priv->eq_table.eq[i].actv_ports.ports,
+ 					    dev->caps.num_ports);
+ 				/* We don't set affinity hint when there
+diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
+index edd77342773a..248478c6f6e4 100644
+--- a/drivers/net/macvtap.c
++++ b/drivers/net/macvtap.c
+@@ -1111,10 +1111,10 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
+ 		return 0;
+ 
+ 	case TUNSETSNDBUF:
+-		if (get_user(u, up))
++		if (get_user(s, sp))
+ 			return -EFAULT;
+ 
+-		q->sk.sk_sndbuf = u;
++		q->sk.sk_sndbuf = s;
+ 		return 0;
+ 
+ 	case TUNGETVNETHDRSZ:
+diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
+index d7a65247f952..99d9bc19c94a 100644
+--- a/drivers/net/phy/fixed_phy.c
++++ b/drivers/net/phy/fixed_phy.c
+@@ -52,6 +52,10 @@ static int fixed_phy_update_regs(struct fixed_phy *fp)
+ 	u16 lpagb = 0;
+ 	u16 lpa = 0;
+ 
++	if (!fp->status.link)
++		goto done;
++	bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
++
+ 	if (fp->status.duplex) {
+ 		bmcr |= BMCR_FULLDPLX;
+ 
+@@ -96,15 +100,13 @@ static int fixed_phy_update_regs(struct fixed_phy *fp)
+ 		}
+ 	}
+ 
+-	if (fp->status.link)
+-		bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
+-
+ 	if (fp->status.pause)
+ 		lpa |= LPA_PAUSE_CAP;
+ 
+ 	if (fp->status.asym_pause)
+ 		lpa |= LPA_PAUSE_ASYM;
+ 
++done:
+ 	fp->regs[MII_PHYSID1] = 0;
+ 	fp->regs[MII_PHYSID2] = 0;
+ 
+diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
+index 46a14cbb0215..02a4615b65f8 100644
+--- a/drivers/net/phy/mdio_bus.c
++++ b/drivers/net/phy/mdio_bus.c
+@@ -303,12 +303,12 @@ void mdiobus_unregister(struct mii_bus *bus)
+ 	BUG_ON(bus->state != MDIOBUS_REGISTERED);
+ 	bus->state = MDIOBUS_UNREGISTERED;
+ 
+-	device_del(&bus->dev);
+ 	for (i = 0; i < PHY_MAX_ADDR; i++) {
+ 		if (bus->phy_map[i])
+ 			device_unregister(&bus->phy_map[i]->dev);
+ 		bus->phy_map[i] = NULL;
+ 	}
++	device_del(&bus->dev);
+ }
+ EXPORT_SYMBOL(mdiobus_unregister);
+ 
+diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
+index fa8f5046afe9..487be20b6b12 100644
+--- a/drivers/net/ppp/ppp_generic.c
++++ b/drivers/net/ppp/ppp_generic.c
+@@ -2742,6 +2742,7 @@ static struct ppp *ppp_create_interface(struct net *net, int unit,
+ 	 */
+ 	dev_net_set(dev, net);
+ 
++	rtnl_lock();
+ 	mutex_lock(&pn->all_ppp_mutex);
+ 
+ 	if (unit < 0) {
+@@ -2772,7 +2773,7 @@ static struct ppp *ppp_create_interface(struct net *net, int unit,
+ 	ppp->file.index = unit;
+ 	sprintf(dev->name, "ppp%d", unit);
+ 
+-	ret = register_netdev(dev);
++	ret = register_netdevice(dev);
+ 	if (ret != 0) {
+ 		unit_put(&pn->units_idr, unit);
+ 		netdev_err(ppp->dev, "PPP: couldn't register device %s (%d)\n",
+@@ -2784,6 +2785,7 @@ static struct ppp *ppp_create_interface(struct net *net, int unit,
+ 
+ 	atomic_inc(&ppp_unit_count);
+ 	mutex_unlock(&pn->all_ppp_mutex);
++	rtnl_unlock();
+ 
+ 	*retp = 0;
+ 	return ppp;
+diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
+index fdc60db60829..7c8c23cc6896 100644
+--- a/drivers/of/of_mdio.c
++++ b/drivers/of/of_mdio.c
+@@ -266,7 +266,8 @@ EXPORT_SYMBOL(of_phy_attach);
+ bool of_phy_is_fixed_link(struct device_node *np)
+ {
+ 	struct device_node *dn;
+-	int len;
++	int len, err;
++	const char *managed;
+ 
+ 	/* New binding */
+ 	dn = of_get_child_by_name(np, "fixed-link");
+@@ -275,6 +276,10 @@ bool of_phy_is_fixed_link(struct device_node *np)
+ 		return true;
+ 	}
+ 
++	err = of_property_read_string(np, "managed", &managed);
++	if (err == 0 && strcmp(managed, "auto") != 0)
++		return true;
++
+ 	/* Old binding */
+ 	if (of_get_property(np, "fixed-link", &len) &&
+ 	    len == (5 * sizeof(__be32)))
+@@ -289,8 +294,18 @@ int of_phy_register_fixed_link(struct device_node *np)
+ 	struct fixed_phy_status status = {};
+ 	struct device_node *fixed_link_node;
+ 	const __be32 *fixed_link_prop;
+-	int len;
++	int len, err;
+ 	struct phy_device *phy;
++	const char *managed;
++
++	err = of_property_read_string(np, "managed", &managed);
++	if (err == 0) {
++		if (strcmp(managed, "in-band-status") == 0) {
++			/* status is zeroed, namely its .link member */
++			phy = fixed_phy_register(PHY_POLL, &status, np);
++			return IS_ERR(phy) ? PTR_ERR(phy) : 0;
++		}
++	}
+ 
+ 	/* New binding */
+ 	fixed_link_node = of_get_child_by_name(np, "fixed-link");
+diff --git a/drivers/platform/x86/hp-wmi.c b/drivers/platform/x86/hp-wmi.c
+index 06697315a088..fb4dd7b3ee71 100644
+--- a/drivers/platform/x86/hp-wmi.c
++++ b/drivers/platform/x86/hp-wmi.c
+@@ -54,8 +54,9 @@ MODULE_ALIAS("wmi:5FB7F034-2C63-45e9-BE91-3D44E2C707E4");
+ #define HPWMI_HARDWARE_QUERY 0x4
+ #define HPWMI_WIRELESS_QUERY 0x5
+ #define HPWMI_BIOS_QUERY 0x9
++#define HPWMI_FEATURE_QUERY 0xb
+ #define HPWMI_HOTKEY_QUERY 0xc
+-#define HPWMI_FEATURE_QUERY 0xd
++#define HPWMI_FEATURE2_QUERY 0xd
+ #define HPWMI_WIRELESS2_QUERY 0x1b
+ #define HPWMI_POSTCODEERROR_QUERY 0x2a
+ 
+@@ -295,25 +296,33 @@ static int hp_wmi_tablet_state(void)
+ 	return (state & 0x4) ? 1 : 0;
+ }
+ 
+-static int __init hp_wmi_bios_2009_later(void)
++static int __init hp_wmi_bios_2008_later(void)
+ {
+ 	int state = 0;
+ 	int ret = hp_wmi_perform_query(HPWMI_FEATURE_QUERY, 0, &state,
+ 				       sizeof(state), sizeof(state));
+-	if (ret)
+-		return ret;
++	if (!ret)
++		return 1;
+ 
+-	return (state & 0x10) ? 1 : 0;
++	return (ret == HPWMI_RET_UNKNOWN_CMDTYPE) ? 0 : -ENXIO;
+ }
+ 
+-static int hp_wmi_enable_hotkeys(void)
++static int __init hp_wmi_bios_2009_later(void)
+ {
+-	int ret;
+-	int query = 0x6e;
++	int state = 0;
++	int ret = hp_wmi_perform_query(HPWMI_FEATURE2_QUERY, 0, &state,
++				       sizeof(state), sizeof(state));
++	if (!ret)
++		return 1;
+ 
+-	ret = hp_wmi_perform_query(HPWMI_BIOS_QUERY, 1, &query, sizeof(query),
+-				   0);
++	return (ret == HPWMI_RET_UNKNOWN_CMDTYPE) ? 0 : -ENXIO;
++}
+ 
++static int __init hp_wmi_enable_hotkeys(void)
++{
++	int value = 0x6e;
++	int ret = hp_wmi_perform_query(HPWMI_BIOS_QUERY, 1, &value,
++				       sizeof(value), 0);
+ 	if (ret)
+ 		return -EINVAL;
+ 	return 0;
+@@ -663,7 +672,7 @@ static int __init hp_wmi_input_setup(void)
+ 			    hp_wmi_tablet_state());
+ 	input_sync(hp_wmi_input_dev);
+ 
+-	if (hp_wmi_bios_2009_later() == 4)
++	if (!hp_wmi_bios_2009_later() && hp_wmi_bios_2008_later())
+ 		hp_wmi_enable_hotkeys();
+ 
+ 	status = wmi_install_notify_handler(HPWMI_EVENT_GUID, hp_wmi_notify, NULL);
+diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
+index 1285eaf5dc22..03cdb9e18d57 100644
+--- a/net/bridge/br_multicast.c
++++ b/net/bridge/br_multicast.c
+@@ -991,7 +991,7 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
+ 
+ 	ih = igmpv3_report_hdr(skb);
+ 	num = ntohs(ih->ngrec);
+-	len = sizeof(*ih);
++	len = skb_transport_offset(skb) + sizeof(*ih);
+ 
+ 	for (i = 0; i < num; i++) {
+ 		len += sizeof(*grec);
+@@ -1052,7 +1052,7 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br,
+ 
+ 	icmp6h = icmp6_hdr(skb);
+ 	num = ntohs(icmp6h->icmp6_dataun.un_data16[1]);
+-	len = sizeof(*icmp6h);
++	len = skb_transport_offset(skb) + sizeof(*icmp6h);
+ 
+ 	for (i = 0; i < num; i++) {
+ 		__be16 *nsrcs, _nsrcs;
+diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
+index 9a12668f7d62..0ad144fb0c79 100644
+--- a/net/core/fib_rules.c
++++ b/net/core/fib_rules.c
+@@ -615,15 +615,17 @@ static int dump_rules(struct sk_buff *skb, struct netlink_callback *cb,
+ {
+ 	int idx = 0;
+ 	struct fib_rule *rule;
++	int err = 0;
+ 
+ 	rcu_read_lock();
+ 	list_for_each_entry_rcu(rule, &ops->rules_list, list) {
+ 		if (idx < cb->args[1])
+ 			goto skip;
+ 
+-		if (fib_nl_fill_rule(skb, rule, NETLINK_CB(cb->skb).portid,
+-				     cb->nlh->nlmsg_seq, RTM_NEWRULE,
+-				     NLM_F_MULTI, ops) < 0)
++		err = fib_nl_fill_rule(skb, rule, NETLINK_CB(cb->skb).portid,
++				       cb->nlh->nlmsg_seq, RTM_NEWRULE,
++				       NLM_F_MULTI, ops);
++		if (err)
+ 			break;
+ skip:
+ 		idx++;
+@@ -632,7 +634,7 @@ skip:
+ 	cb->args[1] = idx;
+ 	rules_ops_put(ops);
+ 
+-	return skb->len;
++	return err;
+ }
+ 
+ static int fib_nl_dumprule(struct sk_buff *skb, struct netlink_callback *cb)
+@@ -648,7 +650,9 @@ static int fib_nl_dumprule(struct sk_buff *skb, struct netlink_callback *cb)
+ 		if (ops == NULL)
+ 			return -EAFNOSUPPORT;
+ 
+-		return dump_rules(skb, cb, ops);
++		dump_rules(skb, cb, ops);
++
++		return skb->len;
+ 	}
+ 
+ 	rcu_read_lock();
+diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
+index dc004b1e1f85..0861018be708 100644
+--- a/net/core/rtnetlink.c
++++ b/net/core/rtnetlink.c
+@@ -3021,6 +3021,7 @@ static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
+ 	u32 portid = NETLINK_CB(cb->skb).portid;
+ 	u32 seq = cb->nlh->nlmsg_seq;
+ 	u32 filter_mask = 0;
++	int err;
+ 
+ 	if (nlmsg_len(cb->nlh) > sizeof(struct ifinfomsg)) {
+ 		struct nlattr *extfilt;
+@@ -3041,20 +3042,25 @@ static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
+ 		struct net_device *br_dev = netdev_master_upper_dev_get(dev);
+ 
+ 		if (br_dev && br_dev->netdev_ops->ndo_bridge_getlink) {
+-			if (idx >= cb->args[0] &&
+-			    br_dev->netdev_ops->ndo_bridge_getlink(
+-				    skb, portid, seq, dev, filter_mask,
+-				    NLM_F_MULTI) < 0)
+-				break;
++			if (idx >= cb->args[0]) {
++				err = br_dev->netdev_ops->ndo_bridge_getlink(
++						skb, portid, seq, dev,
++						filter_mask, NLM_F_MULTI);
++				if (err < 0 && err != -EOPNOTSUPP)
++					break;
++			}
+ 			idx++;
+ 		}
+ 
+ 		if (ops->ndo_bridge_getlink) {
+-			if (idx >= cb->args[0] &&
+-			    ops->ndo_bridge_getlink(skb, portid, seq, dev,
+-						    filter_mask,
+-						    NLM_F_MULTI) < 0)
+-				break;
++			if (idx >= cb->args[0]) {
++				err = ops->ndo_bridge_getlink(skb, portid,
++							      seq, dev,
++							      filter_mask,
++							      NLM_F_MULTI);
++				if (err < 0 && err != -EOPNOTSUPP)
++					break;
++			}
+ 			idx++;
+ 		}
+ 	}
+diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
+index d79866c5f8bc..817622f3dbb7 100644
+--- a/net/core/sock_diag.c
++++ b/net/core/sock_diag.c
+@@ -90,6 +90,9 @@ int sock_diag_put_filterinfo(bool may_report_filterinfo, struct sock *sk,
+ 		goto out;
+ 
+ 	fprog = filter->prog->orig_prog;
++	if (!fprog)
++		goto out;
++
+ 	flen = bpf_classic_proglen(fprog);
+ 
+ 	attr = nla_reserve(skb, attrtype, flen);
+diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
+index b1c218df2c85..b7dedd9d36d8 100644
+--- a/net/ipv4/tcp_output.c
++++ b/net/ipv4/tcp_output.c
+@@ -2898,6 +2898,7 @@ void tcp_send_active_reset(struct sock *sk, gfp_t priority)
+ 	skb_reserve(skb, MAX_TCP_HEADER);
+ 	tcp_init_nondata_skb(skb, tcp_acceptable_seq(sk),
+ 			     TCPHDR_ACK | TCPHDR_RST);
++	skb_mstamp_get(&skb->skb_mstamp);
+ 	/* Send it off. */
+ 	if (tcp_transmit_skb(sk, skb, 0, priority))
+ 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTFAILED);
+diff --git a/net/ipv6/exthdrs_offload.c b/net/ipv6/exthdrs_offload.c
+index 447a7fbd1bb6..f5e2ba1c18bf 100644
+--- a/net/ipv6/exthdrs_offload.c
++++ b/net/ipv6/exthdrs_offload.c
+@@ -36,6 +36,6 @@ out:
+ 	return ret;
+ 
+ out_rt:
+-	inet_del_offload(&rthdr_offload, IPPROTO_ROUTING);
++	inet6_del_offload(&rthdr_offload, IPPROTO_ROUTING);
+ 	goto out;
+ }
+diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
+index 74ceb73c1c9a..5f36266b1f5e 100644
+--- a/net/ipv6/ip6mr.c
++++ b/net/ipv6/ip6mr.c
+@@ -550,7 +550,7 @@ static void ipmr_mfc_seq_stop(struct seq_file *seq, void *v)
+ 
+ 	if (it->cache == &mrt->mfc6_unres_queue)
+ 		spin_unlock_bh(&mfc_unres_lock);
+-	else if (it->cache == mrt->mfc6_cache_array)
++	else if (it->cache == &mrt->mfc6_cache_array[it->ct])
+ 		read_unlock(&mrt_lock);
+ }
+ 
+diff --git a/net/ipv6/route.c b/net/ipv6/route.c
+index d15586490cec..00b64d402a57 100644
+--- a/net/ipv6/route.c
++++ b/net/ipv6/route.c
+@@ -1727,7 +1727,7 @@ static int ip6_convert_metrics(struct mx6_config *mxc,
+ 	return -EINVAL;
+ }
+ 
+-int ip6_route_add(struct fib6_config *cfg)
++int ip6_route_info_create(struct fib6_config *cfg, struct rt6_info **rt_ret)
+ {
+ 	int err;
+ 	struct net *net = cfg->fc_nlinfo.nl_net;
+@@ -1735,7 +1735,6 @@ int ip6_route_add(struct fib6_config *cfg)
+ 	struct net_device *dev = NULL;
+ 	struct inet6_dev *idev = NULL;
+ 	struct fib6_table *table;
+-	struct mx6_config mxc = { .mx = NULL, };
+ 	int addr_type;
+ 
+ 	if (cfg->fc_dst_len > 128 || cfg->fc_src_len > 128)
+@@ -1941,6 +1940,32 @@ install_route:
+ 
+ 	cfg->fc_nlinfo.nl_net = dev_net(dev);
+ 
++	*rt_ret = rt;
++
++	return 0;
++out:
++	if (dev)
++		dev_put(dev);
++	if (idev)
++		in6_dev_put(idev);
++	if (rt)
++		dst_free(&rt->dst);
++
++	*rt_ret = NULL;
++
++	return err;
++}
++
++int ip6_route_add(struct fib6_config *cfg)
++{
++	struct mx6_config mxc = { .mx = NULL, };
++	struct rt6_info *rt = NULL;
++	int err;
++
++	err = ip6_route_info_create(cfg, &rt);
++	if (err)
++		goto out;
++
+ 	err = ip6_convert_metrics(&mxc, cfg);
+ 	if (err)
+ 		goto out;
+@@ -1948,14 +1973,12 @@ install_route:
+ 	err = __ip6_ins_rt(rt, &cfg->fc_nlinfo, &mxc);
+ 
+ 	kfree(mxc.mx);
++
+ 	return err;
+ out:
+-	if (dev)
+-		dev_put(dev);
+-	if (idev)
+-		in6_dev_put(idev);
+ 	if (rt)
+ 		dst_free(&rt->dst);
++
+ 	return err;
+ }
+ 
+@@ -2727,19 +2750,78 @@ errout:
+ 	return err;
+ }
+ 
+-static int ip6_route_multipath(struct fib6_config *cfg, int add)
++struct rt6_nh {
++	struct rt6_info *rt6_info;
++	struct fib6_config r_cfg;
++	struct mx6_config mxc;
++	struct list_head next;
++};
++
++static void ip6_print_replace_route_err(struct list_head *rt6_nh_list)
++{
++	struct rt6_nh *nh;
++
++	list_for_each_entry(nh, rt6_nh_list, next) {
++		pr_warn("IPV6: multipath route replace failed (check consistency of installed routes): %pI6 nexthop %pI6 ifi %d\n",
++		        &nh->r_cfg.fc_dst, &nh->r_cfg.fc_gateway,
++		        nh->r_cfg.fc_ifindex);
++	}
++}
++
++static int ip6_route_info_append(struct list_head *rt6_nh_list,
++				 struct rt6_info *rt, struct fib6_config *r_cfg)
++{
++	struct rt6_nh *nh;
++	struct rt6_info *rtnh;
++	int err = -EEXIST;
++
++	list_for_each_entry(nh, rt6_nh_list, next) {
++		/* check if rt6_info already exists */
++		rtnh = nh->rt6_info;
++
++		if (rtnh->dst.dev == rt->dst.dev &&
++		    rtnh->rt6i_idev == rt->rt6i_idev &&
++		    ipv6_addr_equal(&rtnh->rt6i_gateway,
++				    &rt->rt6i_gateway))
++			return err;
++	}
++
++	nh = kzalloc(sizeof(*nh), GFP_KERNEL);
++	if (!nh)
++		return -ENOMEM;
++	nh->rt6_info = rt;
++	err = ip6_convert_metrics(&nh->mxc, r_cfg);
++	if (err) {
++		kfree(nh);
++		return err;
++	}
++	memcpy(&nh->r_cfg, r_cfg, sizeof(*r_cfg));
++	list_add_tail(&nh->next, rt6_nh_list);
++
++	return 0;
++}
++
++static int ip6_route_multipath_add(struct fib6_config *cfg)
+ {
+ 	struct fib6_config r_cfg;
+ 	struct rtnexthop *rtnh;
++	struct rt6_info *rt;
++	struct rt6_nh *err_nh;
++	struct rt6_nh *nh, *nh_safe;
+ 	int remaining;
+ 	int attrlen;
+-	int err = 0, last_err = 0;
++	int err = 1;
++	int nhn = 0;
++	int replace = (cfg->fc_nlinfo.nlh &&
++		       (cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_REPLACE));
++	LIST_HEAD(rt6_nh_list);
+ 
+ 	remaining = cfg->fc_mp_len;
+-beginning:
+ 	rtnh = (struct rtnexthop *)cfg->fc_mp;
+ 
+-	/* Parse a Multipath Entry */
++	/* Parse a Multipath Entry and build a list (rt6_nh_list) of
++	 * rt6_info structs per nexthop
++	 */
+ 	while (rtnh_ok(rtnh, remaining)) {
+ 		memcpy(&r_cfg, cfg, sizeof(*cfg));
+ 		if (rtnh->rtnh_ifindex)
+@@ -2755,22 +2837,32 @@ beginning:
+ 				r_cfg.fc_flags |= RTF_GATEWAY;
+ 			}
+ 		}
+-		err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
++
++		err = ip6_route_info_create(&r_cfg, &rt);
++		if (err)
++			goto cleanup;
++
++		err = ip6_route_info_append(&rt6_nh_list, rt, &r_cfg);
+ 		if (err) {
+-			last_err = err;
+-			/* If we are trying to remove a route, do not stop the
+-			 * loop when ip6_route_del() fails (because next hop is
+-			 * already gone), we should try to remove all next hops.
+-			 */
+-			if (add) {
+-				/* If add fails, we should try to delete all
+-				 * next hops that have been already added.
+-				 */
+-				add = 0;
+-				remaining = cfg->fc_mp_len - remaining;
+-				goto beginning;
+-			}
++			dst_free(&rt->dst);
++			goto cleanup;
++		}
++
++		rtnh = rtnh_next(rtnh, &remaining);
++	}
++
++	err_nh = NULL;
++	list_for_each_entry(nh, &rt6_nh_list, next) {
++		err = __ip6_ins_rt(nh->rt6_info, &cfg->fc_nlinfo, &nh->mxc);
++		/* nh->rt6_info is used or freed at this point, reset to NULL*/
++		nh->rt6_info = NULL;
++		if (err) {
++			if (replace && nhn)
++				ip6_print_replace_route_err(&rt6_nh_list);
++			err_nh = nh;
++			goto add_errout;
+ 		}
++
+ 		/* Because each route is added like a single route we remove
+ 		 * these flags after the first nexthop: if there is a collision,
+ 		 * we have already failed to add the first nexthop:
+@@ -2780,6 +2872,63 @@ beginning:
+ 		 */
+ 		cfg->fc_nlinfo.nlh->nlmsg_flags &= ~(NLM_F_EXCL |
+ 						     NLM_F_REPLACE);
++		nhn++;
++	}
++
++	goto cleanup;
++
++add_errout:
++	/* Delete routes that were already added */
++	list_for_each_entry(nh, &rt6_nh_list, next) {
++		if (err_nh == nh)
++			break;
++		ip6_route_del(&nh->r_cfg);
++	}
++
++cleanup:
++	list_for_each_entry_safe(nh, nh_safe, &rt6_nh_list, next) {
++		if (nh->rt6_info)
++			dst_free(&nh->rt6_info->dst);
++		if (nh->mxc.mx)
++			kfree(nh->mxc.mx);
++		list_del(&nh->next);
++		kfree(nh);
++	}
++
++	return err;
++}
++
++static int ip6_route_multipath_del(struct fib6_config *cfg)
++{
++	struct fib6_config r_cfg;
++	struct rtnexthop *rtnh;
++	int remaining;
++	int attrlen;
++	int err = 1, last_err = 0;
++
++	remaining = cfg->fc_mp_len;
++	rtnh = (struct rtnexthop *)cfg->fc_mp;
++
++	/* Parse a Multipath Entry */
++	while (rtnh_ok(rtnh, remaining)) {
++		memcpy(&r_cfg, cfg, sizeof(*cfg));
++		if (rtnh->rtnh_ifindex)
++			r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
++
++		attrlen = rtnh_attrlen(rtnh);
++		if (attrlen > 0) {
++			struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
++
++			nla = nla_find(attrs, attrlen, RTA_GATEWAY);
++			if (nla) {
++				nla_memcpy(&r_cfg.fc_gateway, nla, 16);
++				r_cfg.fc_flags |= RTF_GATEWAY;
++			}
++		}
++		err = ip6_route_del(&r_cfg);
++		if (err)
++			last_err = err;
++
+ 		rtnh = rtnh_next(rtnh, &remaining);
+ 	}
+ 
+@@ -2796,7 +2945,7 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh)
+ 		return err;
+ 
+ 	if (cfg.fc_mp)
+-		return ip6_route_multipath(&cfg, 0);
++		return ip6_route_multipath_del(&cfg);
+ 	else
+ 		return ip6_route_del(&cfg);
+ }
+@@ -2811,7 +2960,7 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
+ 		return err;
+ 
+ 	if (cfg.fc_mp)
+-		return ip6_route_multipath(&cfg, 1);
++		return ip6_route_multipath_add(&cfg);
+ 	else
+ 		return ip6_route_add(&cfg);
+ }
+diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
+index a774985489e2..0857f7243797 100644
+--- a/net/netlink/af_netlink.c
++++ b/net/netlink/af_netlink.c
+@@ -124,6 +124,24 @@ static inline u32 netlink_group_mask(u32 group)
+ 	return group ? 1 << (group - 1) : 0;
+ }
+ 
++static struct sk_buff *netlink_to_full_skb(const struct sk_buff *skb,
++					   gfp_t gfp_mask)
++{
++	unsigned int len = skb_end_offset(skb);
++	struct sk_buff *new;
++
++	new = alloc_skb(len, gfp_mask);
++	if (new == NULL)
++		return NULL;
++
++	NETLINK_CB(new).portid = NETLINK_CB(skb).portid;
++	NETLINK_CB(new).dst_group = NETLINK_CB(skb).dst_group;
++	NETLINK_CB(new).creds = NETLINK_CB(skb).creds;
++
++	memcpy(skb_put(new, len), skb->data, len);
++	return new;
++}
++
+ int netlink_add_tap(struct netlink_tap *nt)
+ {
+ 	if (unlikely(nt->dev->type != ARPHRD_NETLINK))
+@@ -205,7 +223,11 @@ static int __netlink_deliver_tap_skb(struct sk_buff *skb,
+ 	int ret = -ENOMEM;
+ 
+ 	dev_hold(dev);
+-	nskb = skb_clone(skb, GFP_ATOMIC);
++
++	if (netlink_skb_is_mmaped(skb) || is_vmalloc_addr(skb->head))
++		nskb = netlink_to_full_skb(skb, GFP_ATOMIC);
++	else
++		nskb = skb_clone(skb, GFP_ATOMIC);
+ 	if (nskb) {
+ 		nskb->dev = dev;
+ 		nskb->protocol = htons((u16) sk->sk_protocol);
+@@ -278,11 +300,6 @@ static void netlink_rcv_wake(struct sock *sk)
+ }
+ 
+ #ifdef CONFIG_NETLINK_MMAP
+-static bool netlink_skb_is_mmaped(const struct sk_buff *skb)
+-{
+-	return NETLINK_CB(skb).flags & NETLINK_SKB_MMAPED;
+-}
+-
+ static bool netlink_rx_is_mmaped(struct sock *sk)
+ {
+ 	return nlk_sk(sk)->rx_ring.pg_vec != NULL;
+@@ -834,7 +851,6 @@ static void netlink_ring_set_copied(struct sock *sk, struct sk_buff *skb)
+ }
+ 
+ #else /* CONFIG_NETLINK_MMAP */
+-#define netlink_skb_is_mmaped(skb)	false
+ #define netlink_rx_is_mmaped(sk)	false
+ #define netlink_tx_is_mmaped(sk)	false
+ #define netlink_mmap			sock_no_mmap
+@@ -1082,8 +1098,8 @@ static int netlink_insert(struct sock *sk, u32 portid)
+ 
+ 	lock_sock(sk);
+ 
+-	err = -EBUSY;
+-	if (nlk_sk(sk)->portid)
++	err = nlk_sk(sk)->portid == portid ? 0 : -EBUSY;
++	if (nlk_sk(sk)->bound)
+ 		goto err;
+ 
+ 	err = -ENOMEM;
+@@ -1103,10 +1119,14 @@ static int netlink_insert(struct sock *sk, u32 portid)
+ 			err = -EOVERFLOW;
+ 		if (err == -EEXIST)
+ 			err = -EADDRINUSE;
+-		nlk_sk(sk)->portid = 0;
+ 		sock_put(sk);
++		goto err;
+ 	}
+ 
++	/* We need to ensure that the socket is hashed and visible. */
++	smp_wmb();
++	nlk_sk(sk)->bound = portid;
++
+ err:
+ 	release_sock(sk);
+ 	return err;
+@@ -1491,6 +1511,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
+ 	struct sockaddr_nl *nladdr = (struct sockaddr_nl *)addr;
+ 	int err;
+ 	long unsigned int groups = nladdr->nl_groups;
++	bool bound;
+ 
+ 	if (addr_len < sizeof(struct sockaddr_nl))
+ 		return -EINVAL;
+@@ -1507,9 +1528,14 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
+ 			return err;
+ 	}
+ 
+-	if (nlk->portid)
++	bound = nlk->bound;
++	if (bound) {
++		/* Ensure nlk->portid is up-to-date. */
++		smp_rmb();
++
+ 		if (nladdr->nl_pid != nlk->portid)
+ 			return -EINVAL;
++	}
+ 
+ 	if (nlk->netlink_bind && groups) {
+ 		int group;
+@@ -1525,7 +1551,10 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
+ 		}
+ 	}
+ 
+-	if (!nlk->portid) {
++	/* No need for barriers here as we return to user-space without
++	 * using any of the bound attributes.
++	 */
++	if (!bound) {
+ 		err = nladdr->nl_pid ?
+ 			netlink_insert(sk, nladdr->nl_pid) :
+ 			netlink_autobind(sock);
+@@ -1573,7 +1602,10 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
+ 	    !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
+ 		return -EPERM;
+ 
+-	if (!nlk->portid)
++	/* No need for barriers here as we return to user-space without
++	 * using any of the bound attributes.
++	 */
++	if (!nlk->bound)
+ 		err = netlink_autobind(sock);
+ 
+ 	if (err == 0) {
+@@ -2391,10 +2423,13 @@ static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
+ 		dst_group = nlk->dst_group;
+ 	}
+ 
+-	if (!nlk->portid) {
++	if (!nlk->bound) {
+ 		err = netlink_autobind(sock);
+ 		if (err)
+ 			goto out;
++	} else {
++		/* Ensure nlk is hashed and visible. */
++		smp_rmb();
+ 	}
+ 
+ 	/* It's a really convoluted way for userland to ask for mmaped
+diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
+index 89008405d6b4..14437d9b1965 100644
+--- a/net/netlink/af_netlink.h
++++ b/net/netlink/af_netlink.h
+@@ -35,6 +35,7 @@ struct netlink_sock {
+ 	unsigned long		state;
+ 	size_t			max_recvmsg_len;
+ 	wait_queue_head_t	wait;
++	bool			bound;
+ 	bool			cb_running;
+ 	struct netlink_callback	cb;
+ 	struct mutex		*cb_mutex;
+@@ -59,6 +60,15 @@ static inline struct netlink_sock *nlk_sk(struct sock *sk)
+ 	return container_of(sk, struct netlink_sock, sk);
+ }
+ 
++static inline bool netlink_skb_is_mmaped(const struct sk_buff *skb)
++{
++#ifdef CONFIG_NETLINK_MMAP
++	return NETLINK_CB(skb).flags & NETLINK_SKB_MMAPED;
++#else
++	return false;
++#endif /* CONFIG_NETLINK_MMAP */
++}
++
+ struct netlink_table {
+ 	struct rhashtable	hash;
+ 	struct hlist_head	mc_list;
+diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
+index ff8c4a4c1609..ff332d1b94bc 100644
+--- a/net/openvswitch/datapath.c
++++ b/net/openvswitch/datapath.c
+@@ -920,7 +920,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
+ 	if (error)
+ 		goto err_kfree_flow;
+ 
+-	ovs_flow_mask_key(&new_flow->key, &key, &mask);
++	ovs_flow_mask_key(&new_flow->key, &key, true, &mask);
+ 
+ 	/* Extract flow identifier. */
+ 	error = ovs_nla_get_identifier(&new_flow->id, a[OVS_FLOW_ATTR_UFID],
+@@ -1047,7 +1047,7 @@ static struct sw_flow_actions *get_flow_actions(const struct nlattr *a,
+ 	struct sw_flow_key masked_key;
+ 	int error;
+ 
+-	ovs_flow_mask_key(&masked_key, key, mask);
++	ovs_flow_mask_key(&masked_key, key, true, mask);
+ 	error = ovs_nla_copy_actions(a, &masked_key, &acts, log);
+ 	if (error) {
+ 		OVS_NLERR(log,
+diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
+index 65523948fb95..b5c3bba87fc8 100644
+--- a/net/openvswitch/flow_table.c
++++ b/net/openvswitch/flow_table.c
+@@ -56,20 +56,21 @@ static u16 range_n_bytes(const struct sw_flow_key_range *range)
+ }
+ 
+ void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
+-		       const struct sw_flow_mask *mask)
++		       bool full, const struct sw_flow_mask *mask)
+ {
+-	const long *m = (const long *)((const u8 *)&mask->key +
+-				mask->range.start);
+-	const long *s = (const long *)((const u8 *)src +
+-				mask->range.start);
+-	long *d = (long *)((u8 *)dst + mask->range.start);
++	int start = full ? 0 : mask->range.start;
++	int len = full ? sizeof *dst : range_n_bytes(&mask->range);
++	const long *m = (const long *)((const u8 *)&mask->key + start);
++	const long *s = (const long *)((const u8 *)src + start);
++	long *d = (long *)((u8 *)dst + start);
+ 	int i;
+ 
+-	/* The memory outside of the 'mask->range' are not set since
+-	 * further operations on 'dst' only uses contents within
+-	 * 'mask->range'.
++	/* If 'full' is true then all of 'dst' is fully initialized. Otherwise,
++	 * if 'full' is false the memory outside of the 'mask->range' is left
++	 * uninitialized. This can be used as an optimization when further
++	 * operations on 'dst' only use contents within 'mask->range'.
+ 	 */
+-	for (i = 0; i < range_n_bytes(&mask->range); i += sizeof(long))
++	for (i = 0; i < len; i += sizeof(long))
+ 		*d++ = *s++ & *m++;
+ }
+ 
+@@ -473,7 +474,7 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
+ 	u32 hash;
+ 	struct sw_flow_key masked_key;
+ 
+-	ovs_flow_mask_key(&masked_key, unmasked, mask);
++	ovs_flow_mask_key(&masked_key, unmasked, false, mask);
+ 	hash = flow_hash(&masked_key, &mask->range);
+ 	head = find_bucket(ti, hash);
+ 	hlist_for_each_entry_rcu(flow, head, flow_table.node[ti->node_ver]) {
+diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
+index 616eda10d955..2dd9900f533d 100644
+--- a/net/openvswitch/flow_table.h
++++ b/net/openvswitch/flow_table.h
+@@ -86,5 +86,5 @@ struct sw_flow *ovs_flow_tbl_lookup_ufid(struct flow_table *,
+ bool ovs_flow_cmp(const struct sw_flow *, const struct sw_flow_match *);
+ 
+ void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
+-		       const struct sw_flow_mask *mask);
++		       bool full, const struct sw_flow_mask *mask);
+ #endif /* flow_table.h */
+diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
+index ed458b315ef4..7851b1222a36 100644
+--- a/net/packet/af_packet.c
++++ b/net/packet/af_packet.c
+@@ -229,6 +229,8 @@ struct packet_skb_cb {
+ 	} sa;
+ };
+ 
++#define vio_le() virtio_legacy_is_little_endian()
++
+ #define PACKET_SKB_CB(__skb)	((struct packet_skb_cb *)((__skb)->cb))
+ 
+ #define GET_PBDQC_FROM_RB(x)	((struct tpacket_kbdq_core *)(&(x)->prb_bdqc))
+@@ -2561,15 +2563,15 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
+ 			goto out_unlock;
+ 
+ 		if ((vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
+-		    (__virtio16_to_cpu(false, vnet_hdr.csum_start) +
+-		     __virtio16_to_cpu(false, vnet_hdr.csum_offset) + 2 >
+-		      __virtio16_to_cpu(false, vnet_hdr.hdr_len)))
+-			vnet_hdr.hdr_len = __cpu_to_virtio16(false,
+-				 __virtio16_to_cpu(false, vnet_hdr.csum_start) +
+-				__virtio16_to_cpu(false, vnet_hdr.csum_offset) + 2);
++		    (__virtio16_to_cpu(vio_le(), vnet_hdr.csum_start) +
++		     __virtio16_to_cpu(vio_le(), vnet_hdr.csum_offset) + 2 >
++		      __virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len)))
++			vnet_hdr.hdr_len = __cpu_to_virtio16(vio_le(),
++				 __virtio16_to_cpu(vio_le(), vnet_hdr.csum_start) +
++				__virtio16_to_cpu(vio_le(), vnet_hdr.csum_offset) + 2);
+ 
+ 		err = -EINVAL;
+-		if (__virtio16_to_cpu(false, vnet_hdr.hdr_len) > len)
++		if (__virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len) > len)
+ 			goto out_unlock;
+ 
+ 		if (vnet_hdr.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+@@ -2612,7 +2614,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
+ 	hlen = LL_RESERVED_SPACE(dev);
+ 	tlen = dev->needed_tailroom;
+ 	skb = packet_alloc_skb(sk, hlen + tlen, hlen, len,
+-			       __virtio16_to_cpu(false, vnet_hdr.hdr_len),
++			       __virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len),
+ 			       msg->msg_flags & MSG_DONTWAIT, &err);
+ 	if (skb == NULL)
+ 		goto out_unlock;
+@@ -2659,8 +2661,8 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
+ 
+ 	if (po->has_vnet_hdr) {
+ 		if (vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+-			u16 s = __virtio16_to_cpu(false, vnet_hdr.csum_start);
+-			u16 o = __virtio16_to_cpu(false, vnet_hdr.csum_offset);
++			u16 s = __virtio16_to_cpu(vio_le(), vnet_hdr.csum_start);
++			u16 o = __virtio16_to_cpu(vio_le(), vnet_hdr.csum_offset);
+ 			if (!skb_partial_csum_set(skb, s, o)) {
+ 				err = -EINVAL;
+ 				goto out_free;
+@@ -2668,7 +2670,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
+ 		}
+ 
+ 		skb_shinfo(skb)->gso_size =
+-			__virtio16_to_cpu(false, vnet_hdr.gso_size);
++			__virtio16_to_cpu(vio_le(), vnet_hdr.gso_size);
+ 		skb_shinfo(skb)->gso_type = gso_type;
+ 
+ 		/* Header must be checked, and gso_segs computed. */
+@@ -3042,9 +3044,9 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
+ 
+ 			/* This is a hint as to how much should be linear. */
+ 			vnet_hdr.hdr_len =
+-				__cpu_to_virtio16(false, skb_headlen(skb));
++				__cpu_to_virtio16(vio_le(), skb_headlen(skb));
+ 			vnet_hdr.gso_size =
+-				__cpu_to_virtio16(false, sinfo->gso_size);
++				__cpu_to_virtio16(vio_le(), sinfo->gso_size);
+ 			if (sinfo->gso_type & SKB_GSO_TCPV4)
+ 				vnet_hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
+ 			else if (sinfo->gso_type & SKB_GSO_TCPV6)
+@@ -3062,9 +3064,9 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
+ 
+ 		if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ 			vnet_hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+-			vnet_hdr.csum_start = __cpu_to_virtio16(false,
++			vnet_hdr.csum_start = __cpu_to_virtio16(vio_le(),
+ 					  skb_checksum_start_offset(skb));
+-			vnet_hdr.csum_offset = __cpu_to_virtio16(false,
++			vnet_hdr.csum_offset = __cpu_to_virtio16(vio_le(),
+ 							 skb->csum_offset);
+ 		} else if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
+ 			vnet_hdr.flags = VIRTIO_NET_HDR_F_DATA_VALID;
+diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
+index 715e01e5910a..f23a3b68bba6 100644
+--- a/net/sched/cls_fw.c
++++ b/net/sched/cls_fw.c
+@@ -33,7 +33,6 @@
+ 
+ struct fw_head {
+ 	u32			mask;
+-	bool			mask_set;
+ 	struct fw_filter __rcu	*ht[HTSIZE];
+ 	struct rcu_head		rcu;
+ };
+@@ -84,7 +83,7 @@ static int fw_classify(struct sk_buff *skb, const struct tcf_proto *tp,
+ 			}
+ 		}
+ 	} else {
+-		/* old method */
++		/* Old method: classify the packet using its skb mark. */
+ 		if (id && (TC_H_MAJ(id) == 0 ||
+ 			   !(TC_H_MAJ(id ^ tp->q->handle)))) {
+ 			res->classid = id;
+@@ -114,14 +113,9 @@ static unsigned long fw_get(struct tcf_proto *tp, u32 handle)
+ 
+ static int fw_init(struct tcf_proto *tp)
+ {
+-	struct fw_head *head;
+-
+-	head = kzalloc(sizeof(struct fw_head), GFP_KERNEL);
+-	if (head == NULL)
+-		return -ENOBUFS;
+-
+-	head->mask_set = false;
+-	rcu_assign_pointer(tp->root, head);
++	/* We don't allocate fw_head here, because in the old method
++	 * we don't need it at all.
++	 */
+ 	return 0;
+ }
+ 
+@@ -252,7 +246,7 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
+ 	int err;
+ 
+ 	if (!opt)
+-		return handle ? -EINVAL : 0;
++		return handle ? -EINVAL : 0; /* Succeed if it is old method. */
+ 
+ 	err = nla_parse_nested(tb, TCA_FW_MAX, opt, fw_policy);
+ 	if (err < 0)
+@@ -302,11 +296,17 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
+ 	if (!handle)
+ 		return -EINVAL;
+ 
+-	if (!head->mask_set) {
+-		head->mask = 0xFFFFFFFF;
++	if (!head) {
++		u32 mask = 0xFFFFFFFF;
+ 		if (tb[TCA_FW_MASK])
+-			head->mask = nla_get_u32(tb[TCA_FW_MASK]);
+-		head->mask_set = true;
++			mask = nla_get_u32(tb[TCA_FW_MASK]);
++
++		head = kzalloc(sizeof(*head), GFP_KERNEL);
++		if (!head)
++			return -ENOBUFS;
++		head->mask = mask;
++
++		rcu_assign_pointer(tp->root, head);
+ 	}
+ 
+ 	f = kzalloc(sizeof(struct fw_filter), GFP_KERNEL);
+diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
+index 59e80356672b..3ac604f96da0 100644
+--- a/net/sctp/protocol.c
++++ b/net/sctp/protocol.c
+@@ -1166,7 +1166,7 @@ static void sctp_v4_del_protocol(void)
+ 	unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
+ }
+ 
+-static int __net_init sctp_net_init(struct net *net)
++static int __net_init sctp_defaults_init(struct net *net)
+ {
+ 	int status;
+ 
+@@ -1259,12 +1259,6 @@ static int __net_init sctp_net_init(struct net *net)
+ 
+ 	sctp_dbg_objcnt_init(net);
+ 
+-	/* Initialize the control inode/socket for handling OOTB packets.  */
+-	if ((status = sctp_ctl_sock_init(net))) {
+-		pr_err("Failed to initialize the SCTP control sock\n");
+-		goto err_ctl_sock_init;
+-	}
+-
+ 	/* Initialize the local address list. */
+ 	INIT_LIST_HEAD(&net->sctp.local_addr_list);
+ 	spin_lock_init(&net->sctp.local_addr_lock);
+@@ -1280,9 +1274,6 @@ static int __net_init sctp_net_init(struct net *net)
+ 
+ 	return 0;
+ 
+-err_ctl_sock_init:
+-	sctp_dbg_objcnt_exit(net);
+-	sctp_proc_exit(net);
+ err_init_proc:
+ 	cleanup_sctp_mibs(net);
+ err_init_mibs:
+@@ -1291,15 +1282,12 @@ err_sysctl_register:
+ 	return status;
+ }
+ 
+-static void __net_exit sctp_net_exit(struct net *net)
++static void __net_exit sctp_defaults_exit(struct net *net)
+ {
+ 	/* Free the local address list */
+ 	sctp_free_addr_wq(net);
+ 	sctp_free_local_addr_list(net);
+ 
+-	/* Free the control endpoint.  */
+-	inet_ctl_sock_destroy(net->sctp.ctl_sock);
+-
+ 	sctp_dbg_objcnt_exit(net);
+ 
+ 	sctp_proc_exit(net);
+@@ -1307,9 +1295,32 @@ static void __net_exit sctp_net_exit(struct net *net)
+ 	sctp_sysctl_net_unregister(net);
+ }
+ 
+-static struct pernet_operations sctp_net_ops = {
+-	.init = sctp_net_init,
+-	.exit = sctp_net_exit,
++static struct pernet_operations sctp_defaults_ops = {
++	.init = sctp_defaults_init,
++	.exit = sctp_defaults_exit,
++};
++
++static int __net_init sctp_ctrlsock_init(struct net *net)
++{
++	int status;
++
++	/* Initialize the control inode/socket for handling OOTB packets.  */
++	status = sctp_ctl_sock_init(net);
++	if (status)
++		pr_err("Failed to initialize the SCTP control sock\n");
++
++	return status;
++}
++
++static void __net_init sctp_ctrlsock_exit(struct net *net)
++{
++	/* Free the control endpoint.  */
++	inet_ctl_sock_destroy(net->sctp.ctl_sock);
++}
++
++static struct pernet_operations sctp_ctrlsock_ops = {
++	.init = sctp_ctrlsock_init,
++	.exit = sctp_ctrlsock_exit,
+ };
+ 
+ /* Initialize the universe into something sensible.  */
+@@ -1442,8 +1453,11 @@ static __init int sctp_init(void)
+ 	sctp_v4_pf_init();
+ 	sctp_v6_pf_init();
+ 
+-	status = sctp_v4_protosw_init();
++	status = register_pernet_subsys(&sctp_defaults_ops);
++	if (status)
++		goto err_register_defaults;
+ 
++	status = sctp_v4_protosw_init();
+ 	if (status)
+ 		goto err_protosw_init;
+ 
+@@ -1451,9 +1465,9 @@ static __init int sctp_init(void)
+ 	if (status)
+ 		goto err_v6_protosw_init;
+ 
+-	status = register_pernet_subsys(&sctp_net_ops);
++	status = register_pernet_subsys(&sctp_ctrlsock_ops);
+ 	if (status)
+-		goto err_register_pernet_subsys;
++		goto err_register_ctrlsock;
+ 
+ 	status = sctp_v4_add_protocol();
+ 	if (status)
+@@ -1469,12 +1483,14 @@ out:
+ err_v6_add_protocol:
+ 	sctp_v4_del_protocol();
+ err_add_protocol:
+-	unregister_pernet_subsys(&sctp_net_ops);
+-err_register_pernet_subsys:
++	unregister_pernet_subsys(&sctp_ctrlsock_ops);
++err_register_ctrlsock:
+ 	sctp_v6_protosw_exit();
+ err_v6_protosw_init:
+ 	sctp_v4_protosw_exit();
+ err_protosw_init:
++	unregister_pernet_subsys(&sctp_defaults_ops);
++err_register_defaults:
+ 	sctp_v4_pf_exit();
+ 	sctp_v6_pf_exit();
+ 	sctp_sysctl_unregister();
+@@ -1507,12 +1523,14 @@ static __exit void sctp_exit(void)
+ 	sctp_v6_del_protocol();
+ 	sctp_v4_del_protocol();
+ 
+-	unregister_pernet_subsys(&sctp_net_ops);
++	unregister_pernet_subsys(&sctp_ctrlsock_ops);
+ 
+ 	/* Free protosw registrations */
+ 	sctp_v6_protosw_exit();
+ 	sctp_v4_protosw_exit();
+ 
++	unregister_pernet_subsys(&sctp_defaults_ops);
++
+ 	/* Unregister with socket layer. */
+ 	sctp_v6_pf_exit();
+ 	sctp_v4_pf_exit();



* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-10-23 17:14 Mike Pagano
From: Mike Pagano @ 2015-10-23 17:14 UTC
  To: gentoo-commits

commit:     a66c9411919f0d467ddacb949af14b1336517b90
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Fri Oct 23 17:14:16 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Fri Oct 23 17:14:16 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=a66c9411

Linux patch 4.2.4

 0000_README            |     4 +
 1003_linux-4.2.4.patch | 10010 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 10014 insertions(+)

diff --git a/0000_README b/0000_README
index 5a14372..2a467c2 100644
--- a/0000_README
+++ b/0000_README
@@ -55,6 +55,10 @@ Patch:  1002_linux-4.2.3.patch
 From:   http://www.kernel.org
 Desc:   Linux 4.2.3
 
+Patch:  1003_linux-4.2.4.patch
+From:   http://www.kernel.org
+Desc:   Linux 4.2.4
+
 Patch:  1500_XATTR_USER_PREFIX.patch
 From:   https://bugs.gentoo.org/show_bug.cgi?id=470644
 Desc:   Support for namespace user.pax.* on tmpfs.

diff --git a/1003_linux-4.2.4.patch b/1003_linux-4.2.4.patch
new file mode 100644
index 0000000..4118bfa
--- /dev/null
+++ b/1003_linux-4.2.4.patch
@@ -0,0 +1,10010 @@
+diff --git a/Documentation/HOWTO b/Documentation/HOWTO
+index 93aa8604630e..21152d397b88 100644
+--- a/Documentation/HOWTO
++++ b/Documentation/HOWTO
+@@ -218,16 +218,16 @@ The development process
+ Linux kernel development process currently consists of a few different
+ main kernel "branches" and lots of different subsystem-specific kernel
+ branches.  These different branches are:
+-  - main 3.x kernel tree
+-  - 3.x.y -stable kernel tree
+-  - 3.x -git kernel patches
++  - main 4.x kernel tree
++  - 4.x.y -stable kernel tree
++  - 4.x -git kernel patches
+   - subsystem specific kernel trees and patches
+-  - the 3.x -next kernel tree for integration tests
++  - the 4.x -next kernel tree for integration tests
+ 
+-3.x kernel tree
++4.x kernel tree
+ -----------------
+-3.x kernels are maintained by Linus Torvalds, and can be found on
+-kernel.org in the pub/linux/kernel/v3.x/ directory.  Its development
++4.x kernels are maintained by Linus Torvalds, and can be found on
++kernel.org in the pub/linux/kernel/v4.x/ directory.  Its development
+ process is as follows:
+   - As soon as a new kernel is released a two weeks window is open,
+     during this period of time maintainers can submit big diffs to
+@@ -262,20 +262,20 @@ mailing list about kernel releases:
+ 	released according to perceived bug status, not according to a
+ 	preconceived timeline."
+ 
+-3.x.y -stable kernel tree
++4.x.y -stable kernel tree
+ ---------------------------
+ Kernels with 3-part versions are -stable kernels. They contain
+ relatively small and critical fixes for security problems or significant
+-regressions discovered in a given 3.x kernel.
++regressions discovered in a given 4.x kernel.
+ 
+ This is the recommended branch for users who want the most recent stable
+ kernel and are not interested in helping test development/experimental
+ versions.
+ 
+-If no 3.x.y kernel is available, then the highest numbered 3.x
++If no 4.x.y kernel is available, then the highest numbered 4.x
+ kernel is the current stable kernel.
+ 
+-3.x.y are maintained by the "stable" team <stable@vger.kernel.org>, and
++4.x.y are maintained by the "stable" team <stable@vger.kernel.org>, and
+ are released as needs dictate.  The normal release period is approximately
+ two weeks, but it can be longer if there are no pressing problems.  A
+ security-related problem, instead, can cause a release to happen almost
+@@ -285,7 +285,7 @@ The file Documentation/stable_kernel_rules.txt in the kernel tree
+ documents what kinds of changes are acceptable for the -stable tree, and
+ how the release process works.
+ 
+-3.x -git patches
++4.x -git patches
+ ------------------
+ These are daily snapshots of Linus' kernel tree which are managed in a
+ git repository (hence the name.) These patches are usually released
+@@ -317,9 +317,9 @@ revisions to it, and maintainers can mark patches as under review,
+ accepted, or rejected.  Most of these patchwork sites are listed at
+ http://patchwork.kernel.org/.
+ 
+-3.x -next kernel tree for integration tests
++4.x -next kernel tree for integration tests
+ ---------------------------------------------
+-Before updates from subsystem trees are merged into the mainline 3.x
++Before updates from subsystem trees are merged into the mainline 4.x
+ tree, they need to be integration-tested.  For this purpose, a special
+ testing repository exists into which virtually all subsystem trees are
+ pulled on an almost daily basis:
+diff --git a/Makefile b/Makefile
+index a6edbb11a69a..a952801a6cd5 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 3
++SUBLEVEL = 4
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+ 
+diff --git a/arch/arc/plat-axs10x/axs10x.c b/arch/arc/plat-axs10x/axs10x.c
+index e7769c3ab5f2..ac79491ee2c0 100644
+--- a/arch/arc/plat-axs10x/axs10x.c
++++ b/arch/arc/plat-axs10x/axs10x.c
+@@ -402,6 +402,8 @@ static void __init axs103_early_init(void)
+ 	unsigned int num_cores = (read_aux_reg(ARC_REG_MCIP_BCR) >> 16) & 0x3F;
+ 	if (num_cores > 2)
+ 		arc_set_core_freq(50 * 1000000);
++	else if (num_cores == 2)
++		arc_set_core_freq(75 * 1000000);
+ #endif
+ 
+ 	switch (arc_get_core_freq()/1000000) {
+diff --git a/arch/arm/Makefile b/arch/arm/Makefile
+index 7451b447cc2d..2c2b28ee4811 100644
+--- a/arch/arm/Makefile
++++ b/arch/arm/Makefile
+@@ -54,6 +54,14 @@ AS		+= -EL
+ LD		+= -EL
+ endif
+ 
++#
++# The Scalar Replacement of Aggregates (SRA) optimization pass in GCC 4.9 and
++# later may result in code being generated that handles signed short and signed
++# char struct members incorrectly. So disable it.
++# (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932)
++#
++KBUILD_CFLAGS	+= $(call cc-option,-fno-ipa-sra)
++
+ # This selects which instruction set is used.
+ # Note that GCC does not numerically define an architecture version
+ # macro, but instead defines a whole series of macros which makes
+diff --git a/arch/arm/boot/dts/exynos5420.dtsi b/arch/arm/boot/dts/exynos5420.dtsi
+index 534f27ceb10b..fa8107dec109 100644
+--- a/arch/arm/boot/dts/exynos5420.dtsi
++++ b/arch/arm/boot/dts/exynos5420.dtsi
+@@ -1118,7 +1118,7 @@
+ 		interrupt-parent = <&combiner>;
+ 		interrupts = <3 0>;
+ 		clock-names = "sysmmu", "master";
+-		clocks = <&clock CLK_SMMU_FIMD1M0>, <&clock CLK_FIMD1>;
++		clocks = <&clock CLK_SMMU_FIMD1M1>, <&clock CLK_FIMD1>;
+ 		power-domains = <&disp_pd>;
+ 		#iommu-cells = <0>;
+ 	};
+diff --git a/arch/arm/boot/dts/imx6qdl-rex.dtsi b/arch/arm/boot/dts/imx6qdl-rex.dtsi
+index 3373fd958e95..a50356243888 100644
+--- a/arch/arm/boot/dts/imx6qdl-rex.dtsi
++++ b/arch/arm/boot/dts/imx6qdl-rex.dtsi
+@@ -35,7 +35,6 @@
+ 			compatible = "regulator-fixed";
+ 			reg = <1>;
+ 			pinctrl-names = "default";
+-			pinctrl-0 = <&pinctrl_usbh1>;
+ 			regulator-name = "usbh1_vbus";
+ 			regulator-min-microvolt = <5000000>;
+ 			regulator-max-microvolt = <5000000>;
+@@ -47,7 +46,6 @@
+ 			compatible = "regulator-fixed";
+ 			reg = <2>;
+ 			pinctrl-names = "default";
+-			pinctrl-0 = <&pinctrl_usbotg>;
+ 			regulator-name = "usb_otg_vbus";
+ 			regulator-min-microvolt = <5000000>;
+ 			regulator-max-microvolt = <5000000>;
+diff --git a/arch/arm/boot/dts/omap3-beagle.dts b/arch/arm/boot/dts/omap3-beagle.dts
+index a5474113cd50..67659a0ed13e 100644
+--- a/arch/arm/boot/dts/omap3-beagle.dts
++++ b/arch/arm/boot/dts/omap3-beagle.dts
+@@ -202,7 +202,7 @@
+ 
+ 	tfp410_pins: pinmux_tfp410_pins {
+ 		pinctrl-single,pins = <
+-			0x194 (PIN_OUTPUT | MUX_MODE4)	/* hdq_sio.gpio_170 */
++			0x196 (PIN_OUTPUT | MUX_MODE4)	/* hdq_sio.gpio_170 */
+ 		>;
+ 	};
+ 
+diff --git a/arch/arm/boot/dts/omap5-uevm.dts b/arch/arm/boot/dts/omap5-uevm.dts
+index 275618f19a43..5771a149ce4a 100644
+--- a/arch/arm/boot/dts/omap5-uevm.dts
++++ b/arch/arm/boot/dts/omap5-uevm.dts
+@@ -174,8 +174,8 @@
+ 
+ 	i2c5_pins: pinmux_i2c5_pins {
+ 		pinctrl-single,pins = <
+-			0x184 (PIN_INPUT | MUX_MODE0)		/* i2c5_scl */
+-			0x186 (PIN_INPUT | MUX_MODE0)		/* i2c5_sda */
++			0x186 (PIN_INPUT | MUX_MODE0)		/* i2c5_scl */
++			0x188 (PIN_INPUT | MUX_MODE0)		/* i2c5_sda */
+ 		>;
+ 	};
+ 
+diff --git a/arch/arm/boot/dts/sun7i-a20.dtsi b/arch/arm/boot/dts/sun7i-a20.dtsi
+index 6a63f30c9a69..f5f384c04335 100644
+--- a/arch/arm/boot/dts/sun7i-a20.dtsi
++++ b/arch/arm/boot/dts/sun7i-a20.dtsi
+@@ -107,7 +107,7 @@
+ 				720000	1200000
+ 				528000	1100000
+ 				312000	1000000
+-				144000	900000
++				144000	1000000
+ 				>;
+ 			#cooling-cells = <2>;
+ 			cooling-min-level = <0>;
+diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
+index a6ad93c9bce3..fd9eefce0a7b 100644
+--- a/arch/arm/kernel/kgdb.c
++++ b/arch/arm/kernel/kgdb.c
+@@ -259,15 +259,17 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
+ 	if (err)
+ 		return err;
+ 
+-	patch_text((void *)bpt->bpt_addr,
+-		   *(unsigned int *)arch_kgdb_ops.gdb_bpt_instr);
++	/* Machine is already stopped, so we can use __patch_text() directly */
++	__patch_text((void *)bpt->bpt_addr,
++		     *(unsigned int *)arch_kgdb_ops.gdb_bpt_instr);
+ 
+ 	return err;
+ }
+ 
+ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
+ {
+-	patch_text((void *)bpt->bpt_addr, *(unsigned int *)bpt->saved_instr);
++	/* Machine is already stopped, so we can use __patch_text() directly */
++	__patch_text((void *)bpt->bpt_addr, *(unsigned int *)bpt->saved_instr);
+ 
+ 	return 0;
+ }
+diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
+index 54272e0be713..7d5379c1c443 100644
+--- a/arch/arm/kernel/perf_event.c
++++ b/arch/arm/kernel/perf_event.c
+@@ -795,8 +795,10 @@ static int of_pmu_irq_cfg(struct arm_pmu *pmu)
+ 
+ 	/* Don't bother with PPIs; they're already affine */
+ 	irq = platform_get_irq(pdev, 0);
+-	if (irq >= 0 && irq_is_percpu(irq))
++	if (irq >= 0 && irq_is_percpu(irq)) {
++		cpumask_setall(&pmu->supported_cpus);
+ 		return 0;
++	}
+ 
+ 	irqs = kcalloc(pdev->num_resources, sizeof(*irqs), GFP_KERNEL);
+ 	if (!irqs)
+diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
+index 423663e23791..586eef26203d 100644
+--- a/arch/arm/kernel/signal.c
++++ b/arch/arm/kernel/signal.c
+@@ -343,12 +343,17 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
+ 		 */
+ 		thumb = handler & 1;
+ 
+-#if __LINUX_ARM_ARCH__ >= 7
++#if __LINUX_ARM_ARCH__ >= 6
+ 		/*
+-		 * Clear the If-Then Thumb-2 execution state
+-		 * ARM spec requires this to be all 000s in ARM mode
+-		 * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
+-		 * signal transition without this.
++		 * Clear the If-Then Thumb-2 execution state.  ARM spec
++		 * requires this to be all 000s in ARM mode.  Snapdragon
++		 * S4/Krait misbehaves on a Thumb=>ARM signal transition
++		 * without this.
++		 *
++		 * We must do this whenever we are running on a Thumb-2
++		 * capable CPU, which includes ARMv6T2.  However, we elect
++		 * to do this whenever we're on an ARMv6 or later CPU for
++		 * simplicity.
+ 		 */
+ 		cpsr &= ~PSR_IT_MASK;
+ #endif
+diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
+index 702740d37465..51a59504bef4 100644
+--- a/arch/arm/kvm/interrupts_head.S
++++ b/arch/arm/kvm/interrupts_head.S
+@@ -515,8 +515,7 @@ ARM_BE8(rev	r6, r6  )
+ 
+ 	mrc	p15, 0, r2, c14, c3, 1	@ CNTV_CTL
+ 	str	r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
+-	bic	r2, #1			@ Clear ENABLE
+-	mcr	p15, 0, r2, c14, c3, 1	@ CNTV_CTL
++
+ 	isb
+ 
+ 	mrrc	p15, 3, rr_lo_hi(r2, r3), c14	@ CNTV_CVAL
+@@ -529,6 +528,9 @@ ARM_BE8(rev	r6, r6  )
+ 	mcrr	p15, 4, r2, r2, c14	@ CNTVOFF
+ 
+ 1:
++	mov	r2, #0			@ Clear ENABLE
++	mcr	p15, 0, r2, c14, c3, 1	@ CNTV_CTL
++
+ 	@ Allow physical timer/counter access for the host
+ 	mrc	p15, 4, r2, c14, c1, 0	@ CNTHCTL
+ 	orr	r2, r2, #(CNTHCTL_PL1PCEN | CNTHCTL_PL1PCTEN)
+diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
+index 7b4201294187..6984342da13d 100644
+--- a/arch/arm/kvm/mmu.c
++++ b/arch/arm/kvm/mmu.c
+@@ -1792,8 +1792,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
+ 		if (vma->vm_flags & VM_PFNMAP) {
+ 			gpa_t gpa = mem->guest_phys_addr +
+ 				    (vm_start - mem->userspace_addr);
+-			phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) +
+-					 vm_start - vma->vm_start;
++			phys_addr_t pa;
++
++			pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
++			pa += vm_start - vma->vm_start;
+ 
+ 			/* IO region dirty page logging not allowed */
+ 			if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES)
+diff --git a/arch/arm/mach-exynos/mcpm-exynos.c b/arch/arm/mach-exynos/mcpm-exynos.c
+index 9bdf54795f05..56978199c479 100644
+--- a/arch/arm/mach-exynos/mcpm-exynos.c
++++ b/arch/arm/mach-exynos/mcpm-exynos.c
+@@ -20,6 +20,7 @@
+ #include <asm/cputype.h>
+ #include <asm/cp15.h>
+ #include <asm/mcpm.h>
++#include <asm/smp_plat.h>
+ 
+ #include "regs-pmu.h"
+ #include "common.h"
+@@ -70,7 +71,31 @@ static int exynos_cpu_powerup(unsigned int cpu, unsigned int cluster)
+ 		cluster >= EXYNOS5420_NR_CLUSTERS)
+ 		return -EINVAL;
+ 
+-	exynos_cpu_power_up(cpunr);
++	if (!exynos_cpu_power_state(cpunr)) {
++		exynos_cpu_power_up(cpunr);
++
++		/*
++		 * This assumes the cluster number of the big cores(Cortex A15)
++		 * is 0 and the Little cores(Cortex A7) is 1.
++		 * When the system was booted from the Little core,
++		 * they should be reset during power up cpu.
++		 */
++		if (cluster &&
++		    cluster == MPIDR_AFFINITY_LEVEL(cpu_logical_map(0), 1)) {
++			/*
++			 * Before we reset the Little cores, we should wait
++			 * the SPARE2 register is set to 1 because the init
++			 * codes of the iROM will set the register after
++			 * initialization.
++			 */
++			while (!pmu_raw_readl(S5P_PMU_SPARE2))
++				udelay(10);
++
++			pmu_raw_writel(EXYNOS5420_KFC_CORE_RESET(cpu),
++					EXYNOS_SWRESET);
++		}
++	}
++
+ 	return 0;
+ }
+ 
+diff --git a/arch/arm/mach-exynos/regs-pmu.h b/arch/arm/mach-exynos/regs-pmu.h
+index b7614333d296..fba9068ed260 100644
+--- a/arch/arm/mach-exynos/regs-pmu.h
++++ b/arch/arm/mach-exynos/regs-pmu.h
+@@ -513,6 +513,12 @@ static inline unsigned int exynos_pmu_cpunr(unsigned int mpidr)
+ #define SPREAD_ENABLE						0xF
+ #define SPREAD_USE_STANDWFI					0xF
+ 
++#define EXYNOS5420_KFC_CORE_RESET0				BIT(8)
++#define EXYNOS5420_KFC_ETM_RESET0				BIT(20)
++
++#define EXYNOS5420_KFC_CORE_RESET(_nr)				\
++	((EXYNOS5420_KFC_CORE_RESET0 | EXYNOS5420_KFC_ETM_RESET0) << (_nr))
++
+ #define EXYNOS5420_BB_CON1					0x0784
+ #define EXYNOS5420_BB_SEL_EN					BIT(31)
+ #define EXYNOS5420_BB_PMOS_EN					BIT(7)
+diff --git a/arch/arm/plat-pxa/ssp.c b/arch/arm/plat-pxa/ssp.c
+index ad9529cc4203..daa1a65f2eb7 100644
+--- a/arch/arm/plat-pxa/ssp.c
++++ b/arch/arm/plat-pxa/ssp.c
+@@ -107,7 +107,6 @@ static const struct of_device_id pxa_ssp_of_ids[] = {
+ 	{ .compatible = "mvrl,pxa168-ssp",	.data = (void *) PXA168_SSP },
+ 	{ .compatible = "mrvl,pxa910-ssp",	.data = (void *) PXA910_SSP },
+ 	{ .compatible = "mrvl,ce4100-ssp",	.data = (void *) CE4100_SSP },
+-	{ .compatible = "mrvl,lpss-ssp",	.data = (void *) LPSS_SSP },
+ 	{ },
+ };
+ MODULE_DEVICE_TABLE(of, pxa_ssp_of_ids);
+diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
+index e8ca6eaedd02..13671a9cf016 100644
+--- a/arch/arm64/kernel/efi.c
++++ b/arch/arm64/kernel/efi.c
+@@ -258,7 +258,8 @@ static bool __init efi_virtmap_init(void)
+ 		 */
+ 		if (!is_normal_ram(md))
+ 			prot = __pgprot(PROT_DEVICE_nGnRE);
+-		else if (md->type == EFI_RUNTIME_SERVICES_CODE)
++		else if (md->type == EFI_RUNTIME_SERVICES_CODE ||
++			 !PAGE_ALIGNED(md->phys_addr))
+ 			prot = PAGE_KERNEL_EXEC;
+ 		else
+ 			prot = PAGE_KERNEL;
+diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
+index 08cafc518b9a..0f03a8fe2314 100644
+--- a/arch/arm64/kernel/entry-ftrace.S
++++ b/arch/arm64/kernel/entry-ftrace.S
+@@ -178,6 +178,24 @@ ENTRY(ftrace_stub)
+ ENDPROC(ftrace_stub)
+ 
+ #ifdef CONFIG_FUNCTION_GRAPH_TRACER
++	/* save return value regs*/
++	.macro save_return_regs
++	sub sp, sp, #64
++	stp x0, x1, [sp]
++	stp x2, x3, [sp, #16]
++	stp x4, x5, [sp, #32]
++	stp x6, x7, [sp, #48]
++	.endm
++
++	/* restore return value regs*/
++	.macro restore_return_regs
++	ldp x0, x1, [sp]
++	ldp x2, x3, [sp, #16]
++	ldp x4, x5, [sp, #32]
++	ldp x6, x7, [sp, #48]
++	add sp, sp, #64
++	.endm
++
+ /*
+  * void ftrace_graph_caller(void)
+  *
+@@ -204,11 +222,11 @@ ENDPROC(ftrace_graph_caller)
+  * only when CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST is enabled.
+  */
+ ENTRY(return_to_handler)
+-	str	x0, [sp, #-16]!
++	save_return_regs
+ 	mov	x0, x29			//     parent's fp
+ 	bl	ftrace_return_to_handler// addr = ftrace_return_to_hander(fp);
+ 	mov	x30, x0			// restore the original return address
+-	ldr	x0, [sp], #16
++	restore_return_regs
+ 	ret
+ END(return_to_handler)
+ #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
+index 94d98cd1aad8..27c3e6fd24c1 100644
+--- a/arch/arm64/mm/fault.c
++++ b/arch/arm64/mm/fault.c
+@@ -278,6 +278,7 @@ retry:
+ 			 * starvation.
+ 			 */
+ 			mm_flags &= ~FAULT_FLAG_ALLOW_RETRY;
++			mm_flags |= FAULT_FLAG_TRIED;
+ 			goto retry;
+ 		}
+ 	}
+diff --git a/arch/m68k/include/asm/linkage.h b/arch/m68k/include/asm/linkage.h
+index 5a822bb790f7..066e74f666ae 100644
+--- a/arch/m68k/include/asm/linkage.h
++++ b/arch/m68k/include/asm/linkage.h
+@@ -4,4 +4,34 @@
+ #define __ALIGN .align 4
+ #define __ALIGN_STR ".align 4"
+ 
++/*
++ * Make sure the compiler doesn't do anything stupid with the
++ * arguments on the stack - they are owned by the *caller*, not
++ * the callee. This just fools gcc into not spilling into them,
++ * and keeps it from doing tailcall recursion and/or using the
++ * stack slots for temporaries, since they are live and "used"
++ * all the way to the end of the function.
++ */
++#define asmlinkage_protect(n, ret, args...) \
++	__asmlinkage_protect##n(ret, ##args)
++#define __asmlinkage_protect_n(ret, args...) \
++	__asm__ __volatile__ ("" : "=r" (ret) : "0" (ret), ##args)
++#define __asmlinkage_protect0(ret) \
++	__asmlinkage_protect_n(ret)
++#define __asmlinkage_protect1(ret, arg1) \
++	__asmlinkage_protect_n(ret, "m" (arg1))
++#define __asmlinkage_protect2(ret, arg1, arg2) \
++	__asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2))
++#define __asmlinkage_protect3(ret, arg1, arg2, arg3) \
++	__asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2), "m" (arg3))
++#define __asmlinkage_protect4(ret, arg1, arg2, arg3, arg4) \
++	__asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2), "m" (arg3), \
++			      "m" (arg4))
++#define __asmlinkage_protect5(ret, arg1, arg2, arg3, arg4, arg5) \
++	__asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2), "m" (arg3), \
++			      "m" (arg4), "m" (arg5))
++#define __asmlinkage_protect6(ret, arg1, arg2, arg3, arg4, arg5, arg6) \
++	__asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2), "m" (arg3), \
++			      "m" (arg4), "m" (arg5), "m" (arg6))
++
+ #endif
+diff --git a/arch/mips/kernel/cps-vec.S b/arch/mips/kernel/cps-vec.S
+index 9f71c06aebf6..209ded16806b 100644
+--- a/arch/mips/kernel/cps-vec.S
++++ b/arch/mips/kernel/cps-vec.S
+@@ -39,6 +39,7 @@
+ 	 mfc0	\dest, CP0_CONFIG, 3
+ 	andi	\dest, \dest, MIPS_CONF3_MT
+ 	beqz	\dest, \nomt
++	 nop
+ 	.endm
+ 
+ .section .text.cps-vec
+@@ -223,10 +224,9 @@ LEAF(excep_ejtag)
+ 	END(excep_ejtag)
+ 
+ LEAF(mips_cps_core_init)
+-#ifdef CONFIG_MIPS_MT
++#ifdef CONFIG_MIPS_MT_SMP
+ 	/* Check that the core implements the MT ASE */
+ 	has_mt	t0, 3f
+-	 nop
+ 
+ 	.set	push
+ 	.set	mips64r2
+@@ -310,8 +310,9 @@ LEAF(mips_cps_boot_vpes)
+ 	PTR_ADDU t0, t0, t1
+ 
+ 	/* Calculate this VPEs ID. If the core doesn't support MT use 0 */
++	li	t9, 0
++#ifdef CONFIG_MIPS_MT_SMP
+ 	has_mt	ta2, 1f
+-	 li	t9, 0
+ 
+ 	/* Find the number of VPEs present in the core */
+ 	mfc0	t1, CP0_MVPCONF0
+@@ -330,6 +331,7 @@ LEAF(mips_cps_boot_vpes)
+ 	/* Retrieve the VPE ID from EBase.CPUNum */
+ 	mfc0	t9, $15, 1
+ 	and	t9, t9, t1
++#endif
+ 
+ 1:	/* Calculate a pointer to this VPEs struct vpe_boot_config */
+ 	li	t1, VPEBOOTCFG_SIZE
+@@ -337,7 +339,7 @@ LEAF(mips_cps_boot_vpes)
+ 	PTR_L	ta3, COREBOOTCFG_VPECONFIG(t0)
+ 	PTR_ADDU v0, v0, ta3
+ 
+-#ifdef CONFIG_MIPS_MT
++#ifdef CONFIG_MIPS_MT_SMP
+ 
+ 	/* If the core doesn't support MT then return */
+ 	bnez	ta2, 1f
+@@ -451,7 +453,7 @@ LEAF(mips_cps_boot_vpes)
+ 
+ 2:	.set	pop
+ 
+-#endif /* CONFIG_MIPS_MT */
++#endif /* CONFIG_MIPS_MT_SMP */
+ 
+ 	/* Return */
+ 	jr	ra
+diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
+index 008b3378653a..4ceac5cdd6b8 100644
+--- a/arch/mips/kernel/setup.c
++++ b/arch/mips/kernel/setup.c
+@@ -338,7 +338,7 @@ static void __init bootmem_init(void)
+ 		if (end <= reserved_end)
+ 			continue;
+ #ifdef CONFIG_BLK_DEV_INITRD
+-		/* mapstart should be after initrd_end */
++		/* Skip zones before initrd and initrd itself */
+ 		if (initrd_end && end <= (unsigned long)PFN_UP(__pa(initrd_end)))
+ 			continue;
+ #endif
+@@ -371,6 +371,14 @@ static void __init bootmem_init(void)
+ 		max_low_pfn = PFN_DOWN(HIGHMEM_START);
+ 	}
+ 
++#ifdef CONFIG_BLK_DEV_INITRD
++	/*
++	 * mapstart should be after initrd_end
++	 */
++	if (initrd_end)
++		mapstart = max(mapstart, (unsigned long)PFN_UP(__pa(initrd_end)));
++#endif
++
+ 	/*
+ 	 * Initialize the boot-time allocator with low memory only.
+ 	 */
+diff --git a/arch/mips/loongson64/common/env.c b/arch/mips/loongson64/common/env.c
+index f6c44dd332e2..d6d07ad56180 100644
+--- a/arch/mips/loongson64/common/env.c
++++ b/arch/mips/loongson64/common/env.c
+@@ -64,6 +64,9 @@ void __init prom_init_env(void)
+ 	}
+ 	if (memsize == 0)
+ 		memsize = 256;
++
++	loongson_sysconf.nr_uarts = 1;
++
+ 	pr_info("memsize=%u, highmemsize=%u\n", memsize, highmemsize);
+ #else
+ 	struct boot_params *boot_p;
+diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c
+index eeaf0245c3b1..815892ed3fe8 100644
+--- a/arch/mips/mm/dma-default.c
++++ b/arch/mips/mm/dma-default.c
+@@ -100,7 +100,7 @@ static gfp_t massage_gfp_flags(const struct device *dev, gfp_t gfp)
+ 	else
+ #endif
+ #if defined(CONFIG_ZONE_DMA) && !defined(CONFIG_ZONE_DMA32)
+-	     if (dev->coherent_dma_mask < DMA_BIT_MASK(64))
++	     if (dev->coherent_dma_mask < DMA_BIT_MASK(sizeof(phys_addr_t) * 8))
+ 		dma_flag = __GFP_DMA;
+ 	else
+ #endif
+diff --git a/arch/mips/net/bpf_jit_asm.S b/arch/mips/net/bpf_jit_asm.S
+index e92726099be0..dabf4179cd7e 100644
+--- a/arch/mips/net/bpf_jit_asm.S
++++ b/arch/mips/net/bpf_jit_asm.S
+@@ -64,8 +64,20 @@ sk_load_word_positive:
+ 	PTR_ADDU t1, $r_skb_data, offset
+ 	lw	$r_A, 0(t1)
+ #ifdef CONFIG_CPU_LITTLE_ENDIAN
++# if defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
+ 	wsbh	t0, $r_A
+ 	rotr	$r_A, t0, 16
++# else
++	sll	t0, $r_A, 24
++	srl	t1, $r_A, 24
++	srl	t2, $r_A, 8
++	or	t0, t0, t1
++	andi	t2, t2, 0xff00
++	andi	t1, $r_A, 0xff00
++	or	t0, t0, t2
++	sll	t1, t1, 8
++	or	$r_A, t0, t1
++# endif
+ #endif
+ 	jr	$r_ra
+ 	 move	$r_ret, zero
+@@ -80,8 +92,16 @@ sk_load_half_positive:
+ 	PTR_ADDU t1, $r_skb_data, offset
+ 	lh	$r_A, 0(t1)
+ #ifdef CONFIG_CPU_LITTLE_ENDIAN
++# if defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
+ 	wsbh	t0, $r_A
+ 	seh	$r_A, t0
++# else
++	sll	t0, $r_A, 24
++	andi	t1, $r_A, 0xff00
++	sra	t0, t0, 16
++	srl	t1, t1, 8
++	or	$r_A, t0, t1
++# endif
+ #endif
+ 	jr	$r_ra
+ 	 move	$r_ret, zero
+@@ -148,23 +168,47 @@ sk_load_byte_positive:
+ NESTED(bpf_slow_path_word, (6 * SZREG), $r_sp)
+ 	bpf_slow_path_common(4)
+ #ifdef CONFIG_CPU_LITTLE_ENDIAN
++# if defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
+ 	wsbh	t0, $r_s0
+ 	jr	$r_ra
+ 	 rotr	$r_A, t0, 16
+-#endif
++# else
++	sll	t0, $r_s0, 24
++	srl	t1, $r_s0, 24
++	srl	t2, $r_s0, 8
++	or	t0, t0, t1
++	andi	t2, t2, 0xff00
++	andi	t1, $r_s0, 0xff00
++	or	t0, t0, t2
++	sll	t1, t1, 8
++	jr	$r_ra
++	 or	$r_A, t0, t1
++# endif
++#else
+ 	jr	$r_ra
+-	move	$r_A, $r_s0
++	 move	$r_A, $r_s0
++#endif
+ 
+ 	END(bpf_slow_path_word)
+ 
+ NESTED(bpf_slow_path_half, (6 * SZREG), $r_sp)
+ 	bpf_slow_path_common(2)
+ #ifdef CONFIG_CPU_LITTLE_ENDIAN
++# if defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
+ 	jr	$r_ra
+ 	 wsbh	$r_A, $r_s0
+-#endif
++# else
++	sll	t0, $r_s0, 8
++	andi	t1, $r_s0, 0xff00
++	andi	t0, t0, 0xff00
++	srl	t1, t1, 8
++	jr	$r_ra
++	 or	$r_A, t0, t1
++# endif
++#else
+ 	jr	$r_ra
+ 	 move	$r_A, $r_s0
++#endif
+ 
+ 	END(bpf_slow_path_half)
+ 
+diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
+index 05ea8fc7f829..4816fe2fa857 100644
+--- a/arch/powerpc/kvm/book3s.c
++++ b/arch/powerpc/kvm/book3s.c
+@@ -827,12 +827,15 @@ int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
+ 	unsigned long size = kvmppc_get_gpr(vcpu, 4);
+ 	unsigned long addr = kvmppc_get_gpr(vcpu, 5);
+ 	u64 buf;
++	int srcu_idx;
+ 	int ret;
+ 
+ 	if (!is_power_of_2(size) || (size > sizeof(buf)))
+ 		return H_TOO_HARD;
+ 
++	srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+ 	ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, &buf);
++	srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+ 	if (ret != 0)
+ 		return H_TOO_HARD;
+ 
+@@ -867,6 +870,7 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
+ 	unsigned long addr = kvmppc_get_gpr(vcpu, 5);
+ 	unsigned long val = kvmppc_get_gpr(vcpu, 6);
+ 	u64 buf;
++	int srcu_idx;
+ 	int ret;
+ 
+ 	switch (size) {
+@@ -890,7 +894,9 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
+ 		return H_TOO_HARD;
+ 	}
+ 
++	srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+ 	ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, size, &buf);
++	srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+ 	if (ret != 0)
+ 		return H_TOO_HARD;
+ 
+diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
+index 68d067ad4222..a9f753fb73a8 100644
+--- a/arch/powerpc/kvm/book3s_hv.c
++++ b/arch/powerpc/kvm/book3s_hv.c
+@@ -2178,7 +2178,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
+ 		vc->runner = vcpu;
+ 		if (n_ceded == vc->n_runnable) {
+ 			kvmppc_vcore_blocked(vc);
+-		} else if (should_resched()) {
++		} else if (need_resched()) {
+ 			vc->vcore_state = VCORE_PREEMPT;
+ 			/* Let something else run */
+ 			cond_resched_lock(&vc->lock);
+diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+index 76408cf0ad04..437f64350847 100644
+--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
++++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+@@ -1171,6 +1171,7 @@ mc_cont:
+ 	bl	kvmhv_accumulate_time
+ #endif
+ 
++	mr 	r3, r12
+ 	/* Increment exit count, poke other threads to exit */
+ 	bl	kvmhv_commence_exit
+ 	nop
+diff --git a/arch/powerpc/platforms/pasemi/msi.c b/arch/powerpc/platforms/pasemi/msi.c
+index 27f2b187a91b..ff1bb4b690b9 100644
+--- a/arch/powerpc/platforms/pasemi/msi.c
++++ b/arch/powerpc/platforms/pasemi/msi.c
+@@ -63,6 +63,7 @@ static struct irq_chip mpic_pasemi_msi_chip = {
+ static void pasemi_msi_teardown_msi_irqs(struct pci_dev *pdev)
+ {
+ 	struct msi_desc *entry;
++	irq_hw_number_t hwirq;
+ 
+ 	pr_debug("pasemi_msi_teardown_msi_irqs, pdev %p\n", pdev);
+ 
+@@ -70,10 +71,10 @@ static void pasemi_msi_teardown_msi_irqs(struct pci_dev *pdev)
+ 		if (entry->irq == NO_IRQ)
+ 			continue;
+ 
++		hwirq = virq_to_hw(entry->irq);
+ 		irq_set_msi_desc(entry->irq, NULL);
+-		msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap,
+-				       virq_to_hw(entry->irq), ALLOC_CHUNK);
+ 		irq_dispose_mapping(entry->irq);
++		msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap, hwirq, ALLOC_CHUNK);
+ 	}
+ 
+ 	return;
+diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
+index 765d8ed558d0..fd16f86e54a9 100644
+--- a/arch/powerpc/platforms/powernv/pci.c
++++ b/arch/powerpc/platforms/powernv/pci.c
+@@ -99,6 +99,7 @@ void pnv_teardown_msi_irqs(struct pci_dev *pdev)
+ 	struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+ 	struct pnv_phb *phb = hose->private_data;
+ 	struct msi_desc *entry;
++	irq_hw_number_t hwirq;
+ 
+ 	if (WARN_ON(!phb))
+ 		return;
+@@ -106,10 +107,10 @@ void pnv_teardown_msi_irqs(struct pci_dev *pdev)
+ 	list_for_each_entry(entry, &pdev->msi_list, list) {
+ 		if (entry->irq == NO_IRQ)
+ 			continue;
++		hwirq = virq_to_hw(entry->irq);
+ 		irq_set_msi_desc(entry->irq, NULL);
+-		msi_bitmap_free_hwirqs(&phb->msi_bmp,
+-			virq_to_hw(entry->irq) - phb->msi_base, 1);
+ 		irq_dispose_mapping(entry->irq);
++		msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq - phb->msi_base, 1);
+ 	}
+ }
+ #endif /* CONFIG_PCI_MSI */
+diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
+index 5236e5427c38..691e8e517b3e 100644
+--- a/arch/powerpc/sysdev/fsl_msi.c
++++ b/arch/powerpc/sysdev/fsl_msi.c
+@@ -128,15 +128,16 @@ static void fsl_teardown_msi_irqs(struct pci_dev *pdev)
+ {
+ 	struct msi_desc *entry;
+ 	struct fsl_msi *msi_data;
++	irq_hw_number_t hwirq;
+ 
+ 	list_for_each_entry(entry, &pdev->msi_list, list) {
+ 		if (entry->irq == NO_IRQ)
+ 			continue;
++		hwirq = virq_to_hw(entry->irq);
+ 		msi_data = irq_get_chip_data(entry->irq);
+ 		irq_set_msi_desc(entry->irq, NULL);
+-		msi_bitmap_free_hwirqs(&msi_data->bitmap,
+-				       virq_to_hw(entry->irq), 1);
+ 		irq_dispose_mapping(entry->irq);
++		msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
+ 	}
+ 
+ 	return;
+diff --git a/arch/powerpc/sysdev/mpic_u3msi.c b/arch/powerpc/sysdev/mpic_u3msi.c
+index fc46ef3b816e..4c3165fa521c 100644
+--- a/arch/powerpc/sysdev/mpic_u3msi.c
++++ b/arch/powerpc/sysdev/mpic_u3msi.c
+@@ -107,15 +107,16 @@ static u64 find_u4_magic_addr(struct pci_dev *pdev, unsigned int hwirq)
+ static void u3msi_teardown_msi_irqs(struct pci_dev *pdev)
+ {
+ 	struct msi_desc *entry;
++	irq_hw_number_t hwirq;
+ 
+         list_for_each_entry(entry, &pdev->msi_list, list) {
+ 		if (entry->irq == NO_IRQ)
+ 			continue;
+ 
++		hwirq = virq_to_hw(entry->irq);
+ 		irq_set_msi_desc(entry->irq, NULL);
+-		msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap,
+-				       virq_to_hw(entry->irq), 1);
+ 		irq_dispose_mapping(entry->irq);
++		msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap, hwirq, 1);
+ 	}
+ 
+ 	return;
+diff --git a/arch/powerpc/sysdev/ppc4xx_msi.c b/arch/powerpc/sysdev/ppc4xx_msi.c
+index 6eb21f2ea585..060f23775255 100644
+--- a/arch/powerpc/sysdev/ppc4xx_msi.c
++++ b/arch/powerpc/sysdev/ppc4xx_msi.c
+@@ -124,16 +124,17 @@ void ppc4xx_teardown_msi_irqs(struct pci_dev *dev)
+ {
+ 	struct msi_desc *entry;
+ 	struct ppc4xx_msi *msi_data = &ppc4xx_msi;
++	irq_hw_number_t hwirq;
+ 
+ 	dev_dbg(&dev->dev, "PCIE-MSI: tearing down msi irqs\n");
+ 
+ 	list_for_each_entry(entry, &dev->msi_list, list) {
+ 		if (entry->irq == NO_IRQ)
+ 			continue;
++		hwirq = virq_to_hw(entry->irq);
+ 		irq_set_msi_desc(entry->irq, NULL);
+-		msi_bitmap_free_hwirqs(&msi_data->bitmap,
+-				virq_to_hw(entry->irq), 1);
+ 		irq_dispose_mapping(entry->irq);
++		msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
+ 	}
+ }
+ 
+diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
+index d4788111c161..fac6ac9790fa 100644
+--- a/arch/s390/boot/compressed/Makefile
++++ b/arch/s390/boot/compressed/Makefile
+@@ -10,7 +10,7 @@ targets += misc.o piggy.o sizes.h head.o
+ 
+ KBUILD_CFLAGS := -m64 -D__KERNEL__ $(LINUX_INCLUDE) -O2
+ KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
+-KBUILD_CFLAGS += $(cflags-y) -fno-delete-null-pointer-checks
++KBUILD_CFLAGS += $(cflags-y) -fno-delete-null-pointer-checks -msoft-float
+ KBUILD_CFLAGS += $(call cc-option,-mpacked-stack)
+ KBUILD_CFLAGS += $(call cc-option,-ffreestanding)
+ 
+diff --git a/arch/s390/kernel/compat_signal.c b/arch/s390/kernel/compat_signal.c
+index fe8d6924efaa..c78ba51ae285 100644
+--- a/arch/s390/kernel/compat_signal.c
++++ b/arch/s390/kernel/compat_signal.c
+@@ -48,6 +48,19 @@ typedef struct
+ 	struct ucontext32 uc;
+ } rt_sigframe32;
+ 
++static inline void sigset_to_sigset32(unsigned long *set64,
++				      compat_sigset_word *set32)
++{
++	set32[0] = (compat_sigset_word) set64[0];
++	set32[1] = (compat_sigset_word)(set64[0] >> 32);
++}
++
++static inline void sigset32_to_sigset(compat_sigset_word *set32,
++				      unsigned long *set64)
++{
++	set64[0] = (unsigned long) set32[0] | ((unsigned long) set32[1] << 32);
++}
++
+ int copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from)
+ {
+ 	int err;
+@@ -303,10 +316,12 @@ COMPAT_SYSCALL_DEFINE0(sigreturn)
+ {
+ 	struct pt_regs *regs = task_pt_regs(current);
+ 	sigframe32 __user *frame = (sigframe32 __user *)regs->gprs[15];
++	compat_sigset_t cset;
+ 	sigset_t set;
+ 
+-	if (__copy_from_user(&set.sig, &frame->sc.oldmask, _SIGMASK_COPY_SIZE32))
++	if (__copy_from_user(&cset.sig, &frame->sc.oldmask, _SIGMASK_COPY_SIZE32))
+ 		goto badframe;
++	sigset32_to_sigset(cset.sig, set.sig);
+ 	set_current_blocked(&set);
+ 	if (restore_sigregs32(regs, &frame->sregs))
+ 		goto badframe;
+@@ -323,10 +338,12 @@ COMPAT_SYSCALL_DEFINE0(rt_sigreturn)
+ {
+ 	struct pt_regs *regs = task_pt_regs(current);
+ 	rt_sigframe32 __user *frame = (rt_sigframe32 __user *)regs->gprs[15];
++	compat_sigset_t cset;
+ 	sigset_t set;
+ 
+-	if (__copy_from_user(&set, &frame->uc.uc_sigmask, sizeof(set)))
++	if (__copy_from_user(&cset, &frame->uc.uc_sigmask, sizeof(cset)))
+ 		goto badframe;
++	sigset32_to_sigset(cset.sig, set.sig);
+ 	set_current_blocked(&set);
+ 	if (compat_restore_altstack(&frame->uc.uc_stack))
+ 		goto badframe;
+@@ -397,7 +414,7 @@ static int setup_frame32(struct ksignal *ksig, sigset_t *set,
+ 		return -EFAULT;
+ 
+ 	/* Create struct sigcontext32 on the signal stack */
+-	memcpy(&sc.oldmask, &set->sig, _SIGMASK_COPY_SIZE32);
++	sigset_to_sigset32(set->sig, sc.oldmask);
+ 	sc.sregs = (__u32)(unsigned long __force) &frame->sregs;
+ 	if (__copy_to_user(&frame->sc, &sc, sizeof(frame->sc)))
+ 		return -EFAULT;
+@@ -458,6 +475,7 @@ static int setup_frame32(struct ksignal *ksig, sigset_t *set,
+ static int setup_rt_frame32(struct ksignal *ksig, sigset_t *set,
+ 			    struct pt_regs *regs)
+ {
++	compat_sigset_t cset;
+ 	rt_sigframe32 __user *frame;
+ 	unsigned long restorer;
+ 	size_t frame_size;
+@@ -505,11 +523,12 @@ static int setup_rt_frame32(struct ksignal *ksig, sigset_t *set,
+ 	store_sigregs();
+ 
+ 	/* Create ucontext on the signal stack. */
++	sigset_to_sigset32(set->sig, cset.sig);
+ 	if (__put_user(uc_flags, &frame->uc.uc_flags) ||
+ 	    __put_user(0, &frame->uc.uc_link) ||
+ 	    __compat_save_altstack(&frame->uc.uc_stack, regs->gprs[15]) ||
+ 	    save_sigregs32(regs, &frame->uc.uc_mcontext) ||
+-	    __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set)) ||
++	    __copy_to_user(&frame->uc.uc_sigmask, &cset, sizeof(cset)) ||
+ 	    save_sigregs_ext32(regs, &frame->uc.uc_mcontext_ext))
+ 		return -EFAULT;
+ 
+diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
+index 8cb3e438f21e..d330840a2b18 100644
+--- a/arch/x86/entry/entry_64.S
++++ b/arch/x86/entry/entry_64.S
+@@ -1219,7 +1219,18 @@ END(error_exit)
+ 
+ /* Runs on exception stack */
+ ENTRY(nmi)
++	/*
++	 * Fix up the exception frame if we're on Xen.
++	 * PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most
++	 * one value to the stack on native, so it may clobber the rdx
++	 * scratch slot, but it won't clobber any of the important
++	 * slots past it.
++	 *
++	 * Xen is a different story, because the Xen frame itself overlaps
++	 * the "NMI executing" variable.
++	 */
+ 	PARAVIRT_ADJUST_EXCEPTION_FRAME
++
+ 	/*
+ 	 * We allow breakpoints in NMIs. If a breakpoint occurs, then
+ 	 * the iretq it performs will take us out of NMI context.
+@@ -1270,9 +1281,12 @@ ENTRY(nmi)
+ 	 * we don't want to enable interrupts, because then we'll end
+ 	 * up in an awkward situation in which IRQs are on but NMIs
+ 	 * are off.
++	 *
++	 * We also must not push anything to the stack before switching
++	 * stacks lest we corrupt the "NMI executing" variable.
+ 	 */
+ 
+-	SWAPGS
++	SWAPGS_UNSAFE_STACK
+ 	cld
+ 	movq	%rsp, %rdx
+ 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
+index 9ebc3d009373..2350ab78183a 100644
+--- a/arch/x86/include/asm/msr-index.h
++++ b/arch/x86/include/asm/msr-index.h
+@@ -311,6 +311,7 @@
+ /* C1E active bits in int pending message */
+ #define K8_INTP_C1E_ACTIVE_MASK		0x18000000
+ #define MSR_K8_TSEG_ADDR		0xc0010112
++#define MSR_K8_TSEG_MASK		0xc0010113
+ #define K8_MTRRFIXRANGE_DRAM_ENABLE	0x00040000 /* MtrrFixDramEn bit    */
+ #define K8_MTRRFIXRANGE_DRAM_MODIFY	0x00080000 /* MtrrFixDramModEn bit */
+ #define K8_MTRR_RDMEM_WRMEM_MASK	0x18181818 /* Mask: RdMem|WrMem    */
+diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
+index dca71714f860..b12f81022a6b 100644
+--- a/arch/x86/include/asm/preempt.h
++++ b/arch/x86/include/asm/preempt.h
+@@ -90,9 +90,9 @@ static __always_inline bool __preempt_count_dec_and_test(void)
+ /*
+  * Returns true when we need to resched and can (barring IRQ state).
+  */
+-static __always_inline bool should_resched(void)
++static __always_inline bool should_resched(int preempt_offset)
+ {
+-	return unlikely(!raw_cpu_read_4(__preempt_count));
++	return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset);
+ }
+ 
+ #ifdef CONFIG_PREEMPT
+diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
+index 9d51fae1cba3..eaba08076030 100644
+--- a/arch/x86/include/asm/qspinlock.h
++++ b/arch/x86/include/asm/qspinlock.h
+@@ -39,18 +39,27 @@ static inline void queued_spin_unlock(struct qspinlock *lock)
+ }
+ #endif
+ 
+-#define virt_queued_spin_lock virt_queued_spin_lock
+-
+-static inline bool virt_queued_spin_lock(struct qspinlock *lock)
++#ifdef CONFIG_PARAVIRT
++#define virt_spin_lock virt_spin_lock
++static inline bool virt_spin_lock(struct qspinlock *lock)
+ {
+ 	if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
+ 		return false;
+ 
+-	while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0)
+-		cpu_relax();
++	/*
++	 * On hypervisors without PARAVIRT_SPINLOCKS support we fall
++	 * back to a Test-and-Set spinlock, because fair locks have
++	 * horrible lock 'holder' preemption issues.
++	 */
++
++	do {
++		while (atomic_read(&lock->val) != 0)
++			cpu_relax();
++	} while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
+ 
+ 	return true;
+ }
++#endif /* CONFIG_PARAVIRT */
+ 
+ #include <asm-generic/qspinlock.h>
+ 
+diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
+index c42827eb86cf..25f909362b7a 100644
+--- a/arch/x86/kernel/alternative.c
++++ b/arch/x86/kernel/alternative.c
+@@ -338,10 +338,15 @@ done:
+ 
+ static void __init_or_module optimize_nops(struct alt_instr *a, u8 *instr)
+ {
++	unsigned long flags;
++
+ 	if (instr[0] != 0x90)
+ 		return;
+ 
++	local_irq_save(flags);
+ 	add_nops(instr + (a->instrlen - a->padlen), a->padlen);
++	sync_core();
++	local_irq_restore(flags);
+ 
+ 	DUMP_BYTES(instr, a->instrlen, "%p: [%d:%d) optimized NOPs: ",
+ 		   instr, a->instrlen - a->padlen, a->padlen);
+diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
+index cde732c1b495..307a49828826 100644
+--- a/arch/x86/kernel/apic/apic.c
++++ b/arch/x86/kernel/apic/apic.c
+@@ -336,6 +336,13 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen)
+ 	apic_write(APIC_LVTT, lvtt_value);
+ 
+ 	if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) {
++		/*
++		 * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode,
++		 * writing to the APIC LVTT and TSC_DEADLINE MSR isn't serialized.
++		 * According to Intel, MFENCE can do the serialization here.
++		 */
++		asm volatile("mfence" : : : "memory");
++
+ 		printk_once(KERN_DEBUG "TSC deadline timer enabled\n");
+ 		return;
+ 	}
+diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
+index 206052e55517..5880b482d83c 100644
+--- a/arch/x86/kernel/apic/io_apic.c
++++ b/arch/x86/kernel/apic/io_apic.c
+@@ -2522,6 +2522,7 @@ void __init setup_ioapic_dest(void)
+ 	int pin, ioapic, irq, irq_entry;
+ 	const struct cpumask *mask;
+ 	struct irq_data *idata;
++	struct irq_chip *chip;
+ 
+ 	if (skip_ioapic_setup == 1)
+ 		return;
+@@ -2545,9 +2546,9 @@ void __init setup_ioapic_dest(void)
+ 		else
+ 			mask = apic->target_cpus();
+ 
+-		irq_set_affinity(irq, mask);
++		chip = irq_data_get_irq_chip(idata);
++		chip->irq_set_affinity(idata, mask, false);
+ 	}
+-
+ }
+ #endif
+ 
+diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
+index 6326ae24e4d5..1b09c420c7ff 100644
+--- a/arch/x86/kernel/cpu/perf_event_intel.c
++++ b/arch/x86/kernel/cpu/perf_event_intel.c
+@@ -2102,9 +2102,12 @@ static struct event_constraint *
+ intel_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
+ 			    struct perf_event *event)
+ {
+-	struct event_constraint *c1 = cpuc->event_constraint[idx];
++	struct event_constraint *c1 = NULL;
+ 	struct event_constraint *c2;
+ 
++	if (idx >= 0) /* fake does < 0 */
++		c1 = cpuc->event_constraint[idx];
++
+ 	/*
+ 	 * first time only
+ 	 * - static constraint: no change across incremental scheduling calls
+diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
+index e068d6683dba..74ca2fe7a0b3 100644
+--- a/arch/x86/kernel/crash.c
++++ b/arch/x86/kernel/crash.c
+@@ -185,10 +185,9 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
+ }
+ 
+ #ifdef CONFIG_KEXEC_FILE
+-static int get_nr_ram_ranges_callback(unsigned long start_pfn,
+-				unsigned long nr_pfn, void *arg)
++static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
+ {
+-	int *nr_ranges = arg;
++	unsigned int *nr_ranges = arg;
+ 
+ 	(*nr_ranges)++;
+ 	return 0;
+@@ -214,7 +213,7 @@ static void fill_up_crash_elf_data(struct crash_elf_data *ced,
+ 
+ 	ced->image = image;
+ 
+-	walk_system_ram_range(0, -1, &nr_ranges,
++	walk_system_ram_res(0, -1, &nr_ranges,
+ 				get_nr_ram_ranges_callback);
+ 
+ 	ced->max_nr_ranges = nr_ranges;
+diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
+index 58bcfb67c01f..ebb5657ee280 100644
+--- a/arch/x86/kernel/paravirt.c
++++ b/arch/x86/kernel/paravirt.c
+@@ -41,10 +41,18 @@
+ #include <asm/timer.h>
+ #include <asm/special_insns.h>
+ 
+-/* nop stub */
+-void _paravirt_nop(void)
+-{
+-}
++/*
++ * nop stub, which must not clobber anything *including the stack* to
++ * avoid confusing the entry prologues.
++ */
++extern void _paravirt_nop(void);
++asm (".pushsection .entry.text, \"ax\"\n"
++     ".global _paravirt_nop\n"
++     "_paravirt_nop:\n\t"
++     "ret\n\t"
++     ".size _paravirt_nop, . - _paravirt_nop\n\t"
++     ".type _paravirt_nop, @function\n\t"
++     ".popsection");
+ 
+ /* identity function, which can be inlined */
+ u32 _paravirt_ident_32(u32 x)
+diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
+index f6b916387590..a90ac95562af 100644
+--- a/arch/x86/kernel/process_64.c
++++ b/arch/x86/kernel/process_64.c
+@@ -497,27 +497,59 @@ void set_personality_ia32(bool x32)
+ }
+ EXPORT_SYMBOL_GPL(set_personality_ia32);
+ 
++/*
++ * Called from fs/proc with a reference on @p to find the function
++ * which called into schedule(). This needs to be done carefully
++ * because the task might wake up and we might look at a stack
++ * changing under us.
++ */
+ unsigned long get_wchan(struct task_struct *p)
+ {
+-	unsigned long stack;
+-	u64 fp, ip;
++	unsigned long start, bottom, top, sp, fp, ip;
+ 	int count = 0;
+ 
+ 	if (!p || p == current || p->state == TASK_RUNNING)
+ 		return 0;
+-	stack = (unsigned long)task_stack_page(p);
+-	if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
++
++	start = (unsigned long)task_stack_page(p);
++	if (!start)
++		return 0;
++
++	/*
++	 * Layout of the stack page:
++	 *
++	 * ----------- topmax = start + THREAD_SIZE - sizeof(unsigned long)
++	 * PADDING
++	 * ----------- top = topmax - TOP_OF_KERNEL_STACK_PADDING
++	 * stack
++	 * ----------- bottom = start + sizeof(thread_info)
++	 * thread_info
++	 * ----------- start
++	 *
++	 * The tasks stack pointer points at the location where the
++	 * framepointer is stored. The data on the stack is:
++	 * ... IP FP ... IP FP
++	 *
++	 * We need to read FP and IP, so we need to adjust the upper
++	 * bound by another unsigned long.
++	 */
++	top = start + THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;
++	top -= 2 * sizeof(unsigned long);
++	bottom = start + sizeof(struct thread_info);
++
++	sp = READ_ONCE(p->thread.sp);
++	if (sp < bottom || sp > top)
+ 		return 0;
+-	fp = *(u64 *)(p->thread.sp);
++
++	fp = READ_ONCE(*(unsigned long *)sp);
+ 	do {
+-		if (fp < (unsigned long)stack ||
+-		    fp >= (unsigned long)stack+THREAD_SIZE)
++		if (fp < bottom || fp > top)
+ 			return 0;
+-		ip = *(u64 *)(fp+8);
++		ip = READ_ONCE(*(unsigned long *)(fp + sizeof(unsigned long)));
+ 		if (!in_sched_functions(ip))
+ 			return ip;
+-		fp = *(u64 *)fp;
+-	} while (count++ < 16);
++		fp = READ_ONCE(*(unsigned long *)fp);
++	} while (count++ < 16 && p->state != TASK_RUNNING);
+ 	return 0;
+ }
+ 
+diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
+index 7437b41f6a47..dc9af7a0839a 100644
+--- a/arch/x86/kernel/tsc.c
++++ b/arch/x86/kernel/tsc.c
+@@ -21,6 +21,7 @@
+ #include <asm/hypervisor.h>
+ #include <asm/nmi.h>
+ #include <asm/x86_init.h>
++#include <asm/geode.h>
+ 
+ unsigned int __read_mostly cpu_khz;	/* TSC clocks / usec, not used here */
+ EXPORT_SYMBOL(cpu_khz);
+@@ -1013,15 +1014,17 @@ EXPORT_SYMBOL_GPL(mark_tsc_unstable);
+ 
+ static void __init check_system_tsc_reliable(void)
+ {
+-#ifdef CONFIG_MGEODE_LX
+-	/* RTSC counts during suspend */
++#if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
++	if (is_geode_lx()) {
++		/* RTSC counts during suspend */
+ #define RTSC_SUSP 0x100
+-	unsigned long res_low, res_high;
++		unsigned long res_low, res_high;
+ 
+-	rdmsr_safe(MSR_GEODE_BUSCONT_CONF0, &res_low, &res_high);
+-	/* Geode_LX - the OLPC CPU has a very reliable TSC */
+-	if (res_low & RTSC_SUSP)
+-		tsc_clocksource_reliable = 1;
++		rdmsr_safe(MSR_GEODE_BUSCONT_CONF0, &res_low, &res_high);
++		/* Geode_LX - the OLPC CPU has a very reliable TSC */
++		if (res_low & RTSC_SUSP)
++			tsc_clocksource_reliable = 1;
++	}
+ #endif
+ 	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
+ 		tsc_clocksource_reliable = 1;
+diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
+index 8e0c0844c6b9..2d32b67a1043 100644
+--- a/arch/x86/kvm/svm.c
++++ b/arch/x86/kvm/svm.c
+@@ -513,7 +513,7 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
+ 	struct vcpu_svm *svm = to_svm(vcpu);
+ 
+ 	if (svm->vmcb->control.next_rip != 0) {
+-		WARN_ON(!static_cpu_has(X86_FEATURE_NRIPS));
++		WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
+ 		svm->next_rip = svm->vmcb->control.next_rip;
+ 	}
+ 
+@@ -865,64 +865,6 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
+ 	set_msr_interception(msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
+ }
+ 
+-#define MTRR_TYPE_UC_MINUS	7
+-#define MTRR2PROTVAL_INVALID 0xff
+-
+-static u8 mtrr2protval[8];
+-
+-static u8 fallback_mtrr_type(int mtrr)
+-{
+-	/*
+-	 * WT and WP aren't always available in the host PAT.  Treat
+-	 * them as UC and UC- respectively.  Everything else should be
+-	 * there.
+-	 */
+-	switch (mtrr)
+-	{
+-	case MTRR_TYPE_WRTHROUGH:
+-		return MTRR_TYPE_UNCACHABLE;
+-	case MTRR_TYPE_WRPROT:
+-		return MTRR_TYPE_UC_MINUS;
+-	default:
+-		BUG();
+-	}
+-}
+-
+-static void build_mtrr2protval(void)
+-{
+-	int i;
+-	u64 pat;
+-
+-	for (i = 0; i < 8; i++)
+-		mtrr2protval[i] = MTRR2PROTVAL_INVALID;
+-
+-	/* Ignore the invalid MTRR types.  */
+-	mtrr2protval[2] = 0;
+-	mtrr2protval[3] = 0;
+-
+-	/*
+-	 * Use host PAT value to figure out the mapping from guest MTRR
+-	 * values to nested page table PAT/PCD/PWT values.  We do not
+-	 * want to change the host PAT value every time we enter the
+-	 * guest.
+-	 */
+-	rdmsrl(MSR_IA32_CR_PAT, pat);
+-	for (i = 0; i < 8; i++) {
+-		u8 mtrr = pat >> (8 * i);
+-
+-		if (mtrr2protval[mtrr] == MTRR2PROTVAL_INVALID)
+-			mtrr2protval[mtrr] = __cm_idx2pte(i);
+-	}
+-
+-	for (i = 0; i < 8; i++) {
+-		if (mtrr2protval[i] == MTRR2PROTVAL_INVALID) {
+-			u8 fallback = fallback_mtrr_type(i);
+-			mtrr2protval[i] = mtrr2protval[fallback];
+-			BUG_ON(mtrr2protval[i] == MTRR2PROTVAL_INVALID);
+-		}
+-	}
+-}
+-
+ static __init int svm_hardware_setup(void)
+ {
+ 	int cpu;
+@@ -989,7 +931,6 @@ static __init int svm_hardware_setup(void)
+ 	} else
+ 		kvm_disable_tdp();
+ 
+-	build_mtrr2protval();
+ 	return 0;
+ 
+ err:
+@@ -1144,39 +1085,6 @@ static u64 svm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+ 	return target_tsc - tsc;
+ }
+ 
+-static void svm_set_guest_pat(struct vcpu_svm *svm, u64 *g_pat)
+-{
+-	struct kvm_vcpu *vcpu = &svm->vcpu;
+-
+-	/* Unlike Intel, AMD takes the guest's CR0.CD into account.
+-	 *
+-	 * AMD does not have IPAT.  To emulate it for the case of guests
+-	 * with no assigned devices, just set everything to WB.  If guests
+-	 * have assigned devices, however, we cannot force WB for RAM
+-	 * pages only, so use the guest PAT directly.
+-	 */
+-	if (!kvm_arch_has_assigned_device(vcpu->kvm))
+-		*g_pat = 0x0606060606060606;
+-	else
+-		*g_pat = vcpu->arch.pat;
+-}
+-
+-static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+-{
+-	u8 mtrr;
+-
+-	/*
+-	 * 1. MMIO: trust guest MTRR, so same as item 3.
+-	 * 2. No passthrough: always map as WB, and force guest PAT to WB as well
+-	 * 3. Passthrough: can't guarantee the result, try to trust guest.
+-	 */
+-	if (!is_mmio && !kvm_arch_has_assigned_device(vcpu->kvm))
+-		return 0;
+-
+-	mtrr = kvm_mtrr_get_guest_memory_type(vcpu, gfn);
+-	return mtrr2protval[mtrr];
+-}
+-
+ static void init_vmcb(struct vcpu_svm *svm, bool init_event)
+ {
+ 	struct vmcb_control_area *control = &svm->vmcb->control;
+@@ -1260,6 +1168,7 @@ static void init_vmcb(struct vcpu_svm *svm, bool init_event)
+ 	 * It also updates the guest-visible cr0 value.
+ 	 */
+ 	(void)kvm_set_cr0(&svm->vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET);
++	kvm_mmu_reset_context(&svm->vcpu);
+ 
+ 	save->cr4 = X86_CR4_PAE;
+ 	/* rdx = ?? */
+@@ -1272,7 +1181,6 @@ static void init_vmcb(struct vcpu_svm *svm, bool init_event)
+ 		clr_cr_intercept(svm, INTERCEPT_CR3_READ);
+ 		clr_cr_intercept(svm, INTERCEPT_CR3_WRITE);
+ 		save->g_pat = svm->vcpu.arch.pat;
+-		svm_set_guest_pat(svm, &save->g_pat);
+ 		save->cr3 = 0;
+ 		save->cr4 = 0;
+ 	}
+@@ -3347,16 +3255,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
+ 	case MSR_VM_IGNNE:
+ 		vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
+ 		break;
+-	case MSR_IA32_CR_PAT:
+-		if (npt_enabled) {
+-			if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
+-				return 1;
+-			vcpu->arch.pat = data;
+-			svm_set_guest_pat(svm, &svm->vmcb->save.g_pat);
+-			mark_dirty(svm->vmcb, VMCB_NPT);
+-			break;
+-		}
+-		/* fall through */
+ 	default:
+ 		return kvm_set_msr_common(vcpu, msr);
+ 	}
+@@ -4191,6 +4089,11 @@ static bool svm_has_high_real_mode_segbase(void)
+ 	return true;
+ }
+ 
++static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
++{
++	return 0;
++}
++
+ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
+ {
+ }
+diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
+index 83b7b5cd75d5..aa9e8229571d 100644
+--- a/arch/x86/kvm/vmx.c
++++ b/arch/x86/kvm/vmx.c
+@@ -6134,6 +6134,8 @@ static __init int hardware_setup(void)
+ 	memcpy(vmx_msr_bitmap_longmode_x2apic,
+ 			vmx_msr_bitmap_longmode, PAGE_SIZE);
+ 
++	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
++
+ 	if (enable_apicv) {
+ 		for (msr = 0x800; msr <= 0x8ff; msr++)
+ 			vmx_disable_intercept_msr_read_x2apic(msr);
+@@ -8632,17 +8634,22 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+ 	u64 ipat = 0;
+ 
+ 	/* For VT-d and EPT combination
+-	 * 1. MMIO: guest may want to apply WC, trust it.
++	 * 1. MMIO: always map as UC
+ 	 * 2. EPT with VT-d:
+ 	 *   a. VT-d without snooping control feature: can't guarantee the
+-	 *	result, try to trust guest.  So the same as item 1.
++	 *	result, try to trust guest.
+ 	 *   b. VT-d with snooping control feature: snooping control feature of
+ 	 *	VT-d engine can guarantee the cache correctness. Just set it
+ 	 *	to WB to keep consistent with host. So the same as item 3.
+ 	 * 3. EPT without VT-d: always map as WB and set IPAT=1 to keep
+ 	 *    consistent with host MTRR
+ 	 */
+-	if (!is_mmio && !kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
++	if (is_mmio) {
++		cache = MTRR_TYPE_UNCACHABLE;
++		goto exit;
++	}
++
++	if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
+ 		ipat = VMX_EPT_IPAT_BIT;
+ 		cache = MTRR_TYPE_WRBACK;
+ 		goto exit;
+diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
+index 8f0f6eca69da..32c6e6ac5964 100644
+--- a/arch/x86/kvm/x86.c
++++ b/arch/x86/kvm/x86.c
+@@ -2388,6 +2388,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+ 	case MSR_IA32_LASTINTFROMIP:
+ 	case MSR_IA32_LASTINTTOIP:
+ 	case MSR_K8_SYSCFG:
++	case MSR_K8_TSEG_ADDR:
++	case MSR_K8_TSEG_MASK:
+ 	case MSR_K7_HWCR:
+ 	case MSR_VM_HSAVE_PA:
+ 	case MSR_K8_INT_PENDING_MSG:
+diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
+index 3fba623e3ba5..f9977a7a9444 100644
+--- a/arch/x86/mm/init_64.c
++++ b/arch/x86/mm/init_64.c
+@@ -1132,7 +1132,7 @@ void mark_rodata_ro(void)
+ 	 * has been zapped already via cleanup_highmem().
+ 	 */
+ 	all_end = roundup((unsigned long)_brk_end, PMD_SIZE);
+-	set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
++	set_memory_nx(text_end, (all_end - text_end) >> PAGE_SHIFT);
+ 
+ 	rodata_test();
+ 
+diff --git a/arch/x86/pci/intel_mid_pci.c b/arch/x86/pci/intel_mid_pci.c
+index 27062303c881..7553921c146c 100644
+--- a/arch/x86/pci/intel_mid_pci.c
++++ b/arch/x86/pci/intel_mid_pci.c
+@@ -35,6 +35,9 @@
+ 
+ #define PCIE_CAP_OFFSET	0x100
+ 
++/* Quirks for the listed devices */
++#define PCI_DEVICE_ID_INTEL_MRFL_MMC	0x1190
++
+ /* Fixed BAR fields */
+ #define PCIE_VNDR_CAP_ID_FIXED_BAR 0x00	/* Fixed BAR (TBD) */
+ #define PCI_FIXED_BAR_0_SIZE	0x04
+@@ -214,10 +217,27 @@ static int intel_mid_pci_irq_enable(struct pci_dev *dev)
+ 	if (dev->irq_managed && dev->irq > 0)
+ 		return 0;
+ 
+-	if (intel_mid_identify_cpu() == INTEL_MID_CPU_CHIP_TANGIER)
++	switch (intel_mid_identify_cpu()) {
++	case INTEL_MID_CPU_CHIP_TANGIER:
+ 		polarity = 0; /* active high */
+-	else
++
++		/* Special treatment for IRQ0 */
++		if (dev->irq == 0) {
++			/*
++			 * TNG has IRQ0 assigned to eMMC controller. But there
++			 * are also other devices with bogus PCI configuration
++			 * that have IRQ0 assigned. This check ensures that
++			 * eMMC gets it.
++			 */
++			if (dev->device != PCI_DEVICE_ID_INTEL_MRFL_MMC)
++				return -EBUSY;
++		}
++		break;
++	default:
+ 		polarity = 1; /* active low */
++		break;
++	}
++
+ 	ioapic_set_alloc_attr(&info, dev_to_node(&dev->dev), 1, polarity);
+ 
+ 	/*
+diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
+index e4308fe6afe8..c6835bfad3a1 100644
+--- a/arch/x86/platform/efi/efi.c
++++ b/arch/x86/platform/efi/efi.c
+@@ -705,6 +705,70 @@ out:
+ }
+ 
+ /*
++ * Iterate the EFI memory map in reverse order because the regions
++ * will be mapped top-down. The end result is the same as if we had
++ * mapped things forward, but doesn't require us to change the
++ * existing implementation of efi_map_region().
++ */
++static inline void *efi_map_next_entry_reverse(void *entry)
++{
++	/* Initial call */
++	if (!entry)
++		return memmap.map_end - memmap.desc_size;
++
++	entry -= memmap.desc_size;
++	if (entry < memmap.map)
++		return NULL;
++
++	return entry;
++}
++
++/*
++ * efi_map_next_entry - Return the next EFI memory map descriptor
++ * @entry: Previous EFI memory map descriptor
++ *
++ * This is a helper function to iterate over the EFI memory map, which
++ * we do in different orders depending on the current configuration.
++ *
++ * To begin traversing the memory map @entry must be %NULL.
++ *
++ * Returns %NULL when we reach the end of the memory map.
++ */
++static void *efi_map_next_entry(void *entry)
++{
++	if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
++		/*
++		 * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
++		 * config table feature requires us to map all entries
++		 * in the same order as they appear in the EFI memory
++		 * map. That is to say, entry N must have a lower
++		 * virtual address than entry N+1. This is because the
++		 * firmware toolchain leaves relative references in
++		 * the code/data sections, which are split and become
++		 * separate EFI memory regions. Mapping things
++		 * out-of-order leads to the firmware accessing
++		 * unmapped addresses.
++		 *
++		 * Since we need to map things this way whether or not
++		 * the kernel actually makes use of
++		 * EFI_PROPERTIES_TABLE, let's just switch to this
++		 * scheme by default for 64-bit.
++		 */
++		return efi_map_next_entry_reverse(entry);
++	}
++
++	/* Initial call */
++	if (!entry)
++		return memmap.map;
++
++	entry += memmap.desc_size;
++	if (entry >= memmap.map_end)
++		return NULL;
++
++	return entry;
++}
++
++/*
+  * Map the efi memory ranges of the runtime services and update new_mmap with
+  * virtual addresses.
+  */
+@@ -714,7 +778,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
+ 	unsigned long left = 0;
+ 	efi_memory_desc_t *md;
+ 
+-	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
++	p = NULL;
++	while ((p = efi_map_next_entry(p))) {
+ 		md = p;
+ 		if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
+ #ifdef CONFIG_X86_64
+diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
+index 11d6fb4e8483..777ad2f03160 100644
+--- a/arch/x86/xen/enlighten.c
++++ b/arch/x86/xen/enlighten.c
+@@ -33,6 +33,10 @@
+ #include <linux/memblock.h>
+ #include <linux/edd.h>
+ 
++#ifdef CONFIG_KEXEC_CORE
++#include <linux/kexec.h>
++#endif
++
+ #include <xen/xen.h>
+ #include <xen/events.h>
+ #include <xen/interface/xen.h>
+@@ -1800,6 +1804,21 @@ static struct notifier_block xen_hvm_cpu_notifier = {
+ 	.notifier_call	= xen_hvm_cpu_notify,
+ };
+ 
++#ifdef CONFIG_KEXEC_CORE
++static void xen_hvm_shutdown(void)
++{
++	native_machine_shutdown();
++	if (kexec_in_progress)
++		xen_reboot(SHUTDOWN_soft_reset);
++}
++
++static void xen_hvm_crash_shutdown(struct pt_regs *regs)
++{
++	native_machine_crash_shutdown(regs);
++	xen_reboot(SHUTDOWN_soft_reset);
++}
++#endif
++
+ static void __init xen_hvm_guest_init(void)
+ {
+ 	if (xen_pv_domain())
+@@ -1819,6 +1838,10 @@ static void __init xen_hvm_guest_init(void)
+ 	x86_init.irqs.intr_init = xen_init_IRQ;
+ 	xen_hvm_init_time_ops();
+ 	xen_hvm_init_mmu_ops();
++#ifdef CONFIG_KEXEC_CORE
++	machine_ops.shutdown = xen_hvm_shutdown;
++	machine_ops.crash_shutdown = xen_hvm_crash_shutdown;
++#endif
+ }
+ #endif
+ 
+diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
+index d6283b3f5db5..9cc48d1d7abb 100644
+--- a/block/blk-cgroup.c
++++ b/block/blk-cgroup.c
+@@ -387,6 +387,9 @@ static void blkg_destroy_all(struct request_queue *q)
+ 		blkg_destroy(blkg);
+ 		spin_unlock(&blkcg->lock);
+ 	}
++
++	q->root_blkg = NULL;
++	q->root_rl.blkg = NULL;
+ }
+ 
+ /*
+diff --git a/block/blk-mq.c b/block/blk-mq.c
+index 176262ec3731..c69902695136 100644
+--- a/block/blk-mq.c
++++ b/block/blk-mq.c
+@@ -1807,7 +1807,6 @@ static void blk_mq_map_swqueue(struct request_queue *q)
+ 
+ 		hctx = q->mq_ops->map_queue(q, i);
+ 		cpumask_set_cpu(i, hctx->cpumask);
+-		cpumask_set_cpu(i, hctx->tags->cpumask);
+ 		ctx->index_hw = hctx->nr_ctx;
+ 		hctx->ctxs[hctx->nr_ctx++] = ctx;
+ 	}
+@@ -1847,6 +1846,14 @@ static void blk_mq_map_swqueue(struct request_queue *q)
+ 		hctx->next_cpu = cpumask_first(hctx->cpumask);
+ 		hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
+ 	}
++
++	queue_for_each_ctx(q, ctx, i) {
++		if (!cpu_online(i))
++			continue;
++
++		hctx = q->mq_ops->map_queue(q, i);
++		cpumask_set_cpu(i, hctx->tags->cpumask);
++	}
+ }
+ 
+ static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set)
+diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
+index 764280a91776..e9fd32e91668 100644
+--- a/drivers/base/cacheinfo.c
++++ b/drivers/base/cacheinfo.c
+@@ -148,7 +148,11 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
+ 
+ 			if (sibling == cpu) /* skip itself */
+ 				continue;
++
+ 			sib_cpu_ci = get_cpu_cacheinfo(sibling);
++			if (!sib_cpu_ci->info_list)
++				continue;
++
+ 			sib_leaf = sib_cpu_ci->info_list + index;
+ 			cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map);
+ 			cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map);
+@@ -159,6 +163,9 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
+ 
+ static void free_cache_attributes(unsigned int cpu)
+ {
++	if (!per_cpu_cacheinfo(cpu))
++		return;
++
+ 	cache_shared_cpu_map_remove(cpu);
+ 
+ 	kfree(per_cpu_cacheinfo(cpu));
+@@ -514,8 +521,7 @@ static int cacheinfo_cpu_callback(struct notifier_block *nfb,
+ 		break;
+ 	case CPU_DEAD:
+ 		cache_remove_dev(cpu);
+-		if (per_cpu_cacheinfo(cpu))
+-			free_cache_attributes(cpu);
++		free_cache_attributes(cpu);
+ 		break;
+ 	}
+ 	return notifier_from_errno(rc);
+diff --git a/drivers/base/property.c b/drivers/base/property.c
+index f3f6d167f3f1..37a7bb7b239d 100644
+--- a/drivers/base/property.c
++++ b/drivers/base/property.c
+@@ -27,9 +27,10 @@
+  */
+ void device_add_property_set(struct device *dev, struct property_set *pset)
+ {
+-	if (pset)
+-		pset->fwnode.type = FWNODE_PDATA;
++	if (!pset)
++		return;
+ 
++	pset->fwnode.type = FWNODE_PDATA;
+ 	set_secondary_fwnode(dev, &pset->fwnode);
+ }
+ EXPORT_SYMBOL_GPL(device_add_property_set);
+diff --git a/drivers/base/regmap/regmap-debugfs.c b/drivers/base/regmap/regmap-debugfs.c
+index 5799a0b9e6cc..c8941f39c919 100644
+--- a/drivers/base/regmap/regmap-debugfs.c
++++ b/drivers/base/regmap/regmap-debugfs.c
+@@ -32,8 +32,7 @@ static DEFINE_MUTEX(regmap_debugfs_early_lock);
+ /* Calculate the length of a fixed format  */
+ static size_t regmap_calc_reg_len(int max_val, char *buf, size_t buf_size)
+ {
+-	snprintf(buf, buf_size, "%x", max_val);
+-	return strlen(buf);
++	return snprintf(NULL, 0, "%x", max_val);
+ }
+ 
+ static ssize_t regmap_name_read_file(struct file *file,
+@@ -432,7 +431,7 @@ static ssize_t regmap_access_read_file(struct file *file,
+ 		/* If we're in the region the user is trying to read */
+ 		if (p >= *ppos) {
+ 			/* ...but not beyond it */
+-			if (buf_pos >= count - 1 - tot_len)
++			if (buf_pos + tot_len + 1 >= count)
+ 				break;
+ 
+ 			/* Format the register */
+diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
+index deb3f001791f..767657565de6 100644
+--- a/drivers/block/xen-blkback/xenbus.c
++++ b/drivers/block/xen-blkback/xenbus.c
+@@ -212,6 +212,9 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
+ 
+ static int xen_blkif_disconnect(struct xen_blkif *blkif)
+ {
++	struct pending_req *req, *n;
++	int i = 0, j;
++
+ 	if (blkif->xenblkd) {
+ 		kthread_stop(blkif->xenblkd);
+ 		wake_up(&blkif->shutdown_wq);
+@@ -238,13 +241,28 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
+ 	/* Remove all persistent grants and the cache of ballooned pages. */
+ 	xen_blkbk_free_caches(blkif);
+ 
++	/* Check that there is no request in use */
++	list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
++		list_del(&req->free_list);
++
++		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
++			kfree(req->segments[j]);
++
++		for (j = 0; j < MAX_INDIRECT_PAGES; j++)
++			kfree(req->indirect_pages[j]);
++
++		kfree(req);
++		i++;
++	}
++
++	WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
++	blkif->nr_ring_pages = 0;
++
+ 	return 0;
+ }
+ 
+ static void xen_blkif_free(struct xen_blkif *blkif)
+ {
+-	struct pending_req *req, *n;
+-	int i = 0, j;
+ 
+ 	xen_blkif_disconnect(blkif);
+ 	xen_vbd_free(&blkif->vbd);
+@@ -257,22 +275,6 @@ static void xen_blkif_free(struct xen_blkif *blkif)
+ 	BUG_ON(!list_empty(&blkif->free_pages));
+ 	BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
+ 
+-	/* Check that there is no request in use */
+-	list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
+-		list_del(&req->free_list);
+-
+-		for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
+-			kfree(req->segments[j]);
+-
+-		for (j = 0; j < MAX_INDIRECT_PAGES; j++)
+-			kfree(req->indirect_pages[j]);
+-
+-		kfree(req);
+-		i++;
+-	}
+-
+-	WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
+-
+ 	kmem_cache_free(xen_blkif_cachep, blkif);
+ }
+ 
+diff --git a/drivers/clk/samsung/clk-cpu.c b/drivers/clk/samsung/clk-cpu.c
+index 3a1fe07cfe9e..dd02356e2e86 100644
+--- a/drivers/clk/samsung/clk-cpu.c
++++ b/drivers/clk/samsung/clk-cpu.c
+@@ -161,7 +161,7 @@ static int exynos_cpuclk_pre_rate_change(struct clk_notifier_data *ndata,
+ 	 * the values for DIV_COPY and DIV_HPM dividers need not be set.
+ 	 */
+ 	div0 = cfg_data->div0;
+-	if (test_bit(CLK_CPU_HAS_DIV1, &cpuclk->flags)) {
++	if (cpuclk->flags & CLK_CPU_HAS_DIV1) {
+ 		div1 = cfg_data->div1;
+ 		if (readl(base + E4210_SRC_CPU) & E4210_MUX_HPM_MASK)
+ 			div1 = readl(base + E4210_DIV_CPU1) &
+@@ -182,7 +182,7 @@ static int exynos_cpuclk_pre_rate_change(struct clk_notifier_data *ndata,
+ 		alt_div = DIV_ROUND_UP(alt_prate, tmp_rate) - 1;
+ 		WARN_ON(alt_div >= MAX_DIV);
+ 
+-		if (test_bit(CLK_CPU_NEEDS_DEBUG_ALT_DIV, &cpuclk->flags)) {
++		if (cpuclk->flags & CLK_CPU_NEEDS_DEBUG_ALT_DIV) {
+ 			/*
+ 			 * In Exynos4210, ATB clock parent is also mout_core. So
+ 			 * ATB clock also needs to be mantained at safe speed.
+@@ -203,7 +203,7 @@ static int exynos_cpuclk_pre_rate_change(struct clk_notifier_data *ndata,
+ 	writel(div0, base + E4210_DIV_CPU0);
+ 	wait_until_divider_stable(base + E4210_DIV_STAT_CPU0, DIV_MASK_ALL);
+ 
+-	if (test_bit(CLK_CPU_HAS_DIV1, &cpuclk->flags)) {
++	if (cpuclk->flags & CLK_CPU_HAS_DIV1) {
+ 		writel(div1, base + E4210_DIV_CPU1);
+ 		wait_until_divider_stable(base + E4210_DIV_STAT_CPU1,
+ 				DIV_MASK_ALL);
+@@ -222,7 +222,7 @@ static int exynos_cpuclk_post_rate_change(struct clk_notifier_data *ndata,
+ 	unsigned long mux_reg;
+ 
+ 	/* find out the divider values to use for clock data */
+-	if (test_bit(CLK_CPU_NEEDS_DEBUG_ALT_DIV, &cpuclk->flags)) {
++	if (cpuclk->flags & CLK_CPU_NEEDS_DEBUG_ALT_DIV) {
+ 		while ((cfg_data->prate * 1000) != ndata->new_rate) {
+ 			if (cfg_data->prate == 0)
+ 				return -EINVAL;
+@@ -237,7 +237,7 @@ static int exynos_cpuclk_post_rate_change(struct clk_notifier_data *ndata,
+ 	writel(mux_reg & ~(1 << 16), base + E4210_SRC_CPU);
+ 	wait_until_mux_stable(base + E4210_STAT_CPU, 16, 1);
+ 
+-	if (test_bit(CLK_CPU_NEEDS_DEBUG_ALT_DIV, &cpuclk->flags)) {
++	if (cpuclk->flags & CLK_CPU_NEEDS_DEBUG_ALT_DIV) {
+ 		div |= (cfg_data->div0 & E4210_DIV0_ATB_MASK);
+ 		div_mask |= E4210_DIV0_ATB_MASK;
+ 	}
+diff --git a/drivers/clk/ti/clk-3xxx.c b/drivers/clk/ti/clk-3xxx.c
+index 757636d166cf..4ab28cfb8d2a 100644
+--- a/drivers/clk/ti/clk-3xxx.c
++++ b/drivers/clk/ti/clk-3xxx.c
+@@ -163,7 +163,6 @@ static struct ti_dt_clk omap3xxx_clks[] = {
+ 	DT_CLK(NULL, "gpio2_ick", "gpio2_ick"),
+ 	DT_CLK(NULL, "wdt3_ick", "wdt3_ick"),
+ 	DT_CLK(NULL, "uart3_ick", "uart3_ick"),
+-	DT_CLK(NULL, "uart4_ick", "uart4_ick"),
+ 	DT_CLK(NULL, "gpt9_ick", "gpt9_ick"),
+ 	DT_CLK(NULL, "gpt8_ick", "gpt8_ick"),
+ 	DT_CLK(NULL, "gpt7_ick", "gpt7_ick"),
+@@ -308,6 +307,7 @@ static struct ti_dt_clk am35xx_clks[] = {
+ static struct ti_dt_clk omap36xx_clks[] = {
+ 	DT_CLK(NULL, "omap_192m_alwon_fck", "omap_192m_alwon_fck"),
+ 	DT_CLK(NULL, "uart4_fck", "uart4_fck"),
++	DT_CLK(NULL, "uart4_ick", "uart4_ick"),
+ 	{ .node_name = NULL },
+ };
+ 
+diff --git a/drivers/clk/ti/clk-7xx.c b/drivers/clk/ti/clk-7xx.c
+index 63b8323df918..0eb82107c421 100644
+--- a/drivers/clk/ti/clk-7xx.c
++++ b/drivers/clk/ti/clk-7xx.c
+@@ -16,7 +16,6 @@
+ #include <linux/clkdev.h>
+ #include <linux/clk/ti.h>
+ 
+-#define DRA7_DPLL_ABE_DEFFREQ				180633600
+ #define DRA7_DPLL_GMAC_DEFFREQ				1000000000
+ #define DRA7_DPLL_USB_DEFFREQ				960000000
+ 
+@@ -312,27 +311,12 @@ static struct ti_dt_clk dra7xx_clks[] = {
+ int __init dra7xx_dt_clk_init(void)
+ {
+ 	int rc;
+-	struct clk *abe_dpll_mux, *sys_clkin2, *dpll_ck, *hdcp_ck;
++	struct clk *dpll_ck, *hdcp_ck;
+ 
+ 	ti_dt_clocks_register(dra7xx_clks);
+ 
+ 	omap2_clk_disable_autoidle_all();
+ 
+-	abe_dpll_mux = clk_get_sys(NULL, "abe_dpll_sys_clk_mux");
+-	sys_clkin2 = clk_get_sys(NULL, "sys_clkin2");
+-	dpll_ck = clk_get_sys(NULL, "dpll_abe_ck");
+-
+-	rc = clk_set_parent(abe_dpll_mux, sys_clkin2);
+-	if (!rc)
+-		rc = clk_set_rate(dpll_ck, DRA7_DPLL_ABE_DEFFREQ);
+-	if (rc)
+-		pr_err("%s: failed to configure ABE DPLL!\n", __func__);
+-
+-	dpll_ck = clk_get_sys(NULL, "dpll_abe_m2x2_ck");
+-	rc = clk_set_rate(dpll_ck, DRA7_DPLL_ABE_DEFFREQ * 2);
+-	if (rc)
+-		pr_err("%s: failed to configure ABE DPLL m2x2!\n", __func__);
+-
+ 	dpll_ck = clk_get_sys(NULL, "dpll_gmac_ck");
+ 	rc = clk_set_rate(dpll_ck, DRA7_DPLL_GMAC_DEFFREQ);
+ 	if (rc)
+diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
+index 0136dfcdabf0..7c2a7385c2ad 100644
+--- a/drivers/cpufreq/acpi-cpufreq.c
++++ b/drivers/cpufreq/acpi-cpufreq.c
+@@ -146,6 +146,9 @@ static ssize_t show_freqdomain_cpus(struct cpufreq_policy *policy, char *buf)
+ {
+ 	struct acpi_cpufreq_data *data = per_cpu(acfreq_data, policy->cpu);
+ 
++	if (unlikely(!data))
++		return -ENODEV;
++
+ 	return cpufreq_show_cpus(data->freqdomain_cpus, buf);
+ }
+ 
+diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
+index 528a82bf5038..99a406501e8c 100644
+--- a/drivers/cpufreq/cpufreq-dt.c
++++ b/drivers/cpufreq/cpufreq-dt.c
+@@ -255,7 +255,8 @@ static int cpufreq_init(struct cpufreq_policy *policy)
+ 			rcu_read_unlock();
+ 
+ 			tol_uV = opp_uV * priv->voltage_tolerance / 100;
+-			if (regulator_is_supported_voltage(cpu_reg, opp_uV,
++			if (regulator_is_supported_voltage(cpu_reg,
++							   opp_uV - tol_uV,
+ 							   opp_uV + tol_uV)) {
+ 				if (opp_uV < min_uV)
+ 					min_uV = opp_uV;
+diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
+index b60698b30d30..bc2a55bc35e4 100644
+--- a/drivers/crypto/marvell/cesa.h
++++ b/drivers/crypto/marvell/cesa.h
+@@ -687,6 +687,33 @@ static inline u32 mv_cesa_get_int_mask(struct mv_cesa_engine *engine)
+ 
+ int mv_cesa_queue_req(struct crypto_async_request *req);
+ 
++/*
++ * Helper function that indicates whether a crypto request needs to be
++ * cleaned up or not after being enqueued using mv_cesa_queue_req().
++ */
++static inline int mv_cesa_req_needs_cleanup(struct crypto_async_request *req,
++					    int ret)
++{
++	/*
++	 * The queue still had some space, the request was queued
++	 * normally, so there's no need to clean it up.
++	 */
++	if (ret == -EINPROGRESS)
++		return false;
++
++	/*
++	 * The queue had not space left, but since the request is
++	 * flagged with CRYPTO_TFM_REQ_MAY_BACKLOG, it was added to
++	 * the backlog and will be processed later. There's no need to
++	 * clean it up.
++	 */
++	if (ret == -EBUSY && req->flags & CRYPTO_TFM_REQ_MAY_BACKLOG)
++		return false;
++
++	/* Request wasn't queued, we need to clean it up */
++	return true;
++}
++
+ /* TDMA functions */
+ 
+ static inline void mv_cesa_req_dma_iter_init(struct mv_cesa_dma_iter *iter,
+diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c
+index 0745cf3b9c0e..3df2f4e7adb2 100644
+--- a/drivers/crypto/marvell/cipher.c
++++ b/drivers/crypto/marvell/cipher.c
+@@ -189,7 +189,6 @@ static inline void mv_cesa_ablkcipher_prepare(struct crypto_async_request *req,
+ {
+ 	struct ablkcipher_request *ablkreq = ablkcipher_request_cast(req);
+ 	struct mv_cesa_ablkcipher_req *creq = ablkcipher_request_ctx(ablkreq);
+-
+ 	creq->req.base.engine = engine;
+ 
+ 	if (creq->req.base.type == CESA_DMA_REQ)
+@@ -431,7 +430,7 @@ static int mv_cesa_des_op(struct ablkcipher_request *req,
+ 		return ret;
+ 
+ 	ret = mv_cesa_queue_req(&req->base);
+-	if (ret && ret != -EINPROGRESS)
++	if (mv_cesa_req_needs_cleanup(&req->base, ret))
+ 		mv_cesa_ablkcipher_cleanup(req);
+ 
+ 	return ret;
+@@ -551,7 +550,7 @@ static int mv_cesa_des3_op(struct ablkcipher_request *req,
+ 		return ret;
+ 
+ 	ret = mv_cesa_queue_req(&req->base);
+-	if (ret && ret != -EINPROGRESS)
++	if (mv_cesa_req_needs_cleanup(&req->base, ret))
+ 		mv_cesa_ablkcipher_cleanup(req);
+ 
+ 	return ret;
+@@ -693,7 +692,7 @@ static int mv_cesa_aes_op(struct ablkcipher_request *req,
+ 		return ret;
+ 
+ 	ret = mv_cesa_queue_req(&req->base);
+-	if (ret && ret != -EINPROGRESS)
++	if (mv_cesa_req_needs_cleanup(&req->base, ret))
+ 		mv_cesa_ablkcipher_cleanup(req);
+ 
+ 	return ret;
+diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
+index ae9272eb9c1a..e8d0d7128137 100644
+--- a/drivers/crypto/marvell/hash.c
++++ b/drivers/crypto/marvell/hash.c
+@@ -739,10 +739,8 @@ static int mv_cesa_ahash_update(struct ahash_request *req)
+ 		return 0;
+ 
+ 	ret = mv_cesa_queue_req(&req->base);
+-	if (ret && ret != -EINPROGRESS) {
++	if (mv_cesa_req_needs_cleanup(&req->base, ret))
+ 		mv_cesa_ahash_cleanup(req);
+-		return ret;
+-	}
+ 
+ 	return ret;
+ }
+@@ -766,7 +764,7 @@ static int mv_cesa_ahash_final(struct ahash_request *req)
+ 		return 0;
+ 
+ 	ret = mv_cesa_queue_req(&req->base);
+-	if (ret && ret != -EINPROGRESS)
++	if (mv_cesa_req_needs_cleanup(&req->base, ret))
+ 		mv_cesa_ahash_cleanup(req);
+ 
+ 	return ret;
+@@ -791,7 +789,7 @@ static int mv_cesa_ahash_finup(struct ahash_request *req)
+ 		return 0;
+ 
+ 	ret = mv_cesa_queue_req(&req->base);
+-	if (ret && ret != -EINPROGRESS)
++	if (mv_cesa_req_needs_cleanup(&req->base, ret))
+ 		mv_cesa_ahash_cleanup(req);
+ 
+ 	return ret;
+diff --git a/drivers/dma/at_xdmac.c b/drivers/dma/at_xdmac.c
+index 40afa2a16cfc..da7917a2eed2 100644
+--- a/drivers/dma/at_xdmac.c
++++ b/drivers/dma/at_xdmac.c
+@@ -455,6 +455,15 @@ static struct at_xdmac_desc *at_xdmac_alloc_desc(struct dma_chan *chan,
+ 	return desc;
+ }
+ 
++void at_xdmac_init_used_desc(struct at_xdmac_desc *desc)
++{
++	memset(&desc->lld, 0, sizeof(desc->lld));
++	INIT_LIST_HEAD(&desc->descs_list);
++	desc->direction = DMA_TRANS_NONE;
++	desc->xfer_size = 0;
++	desc->active_xfer = false;
++}
++
+ /* Call must be protected by lock. */
+ static struct at_xdmac_desc *at_xdmac_get_desc(struct at_xdmac_chan *atchan)
+ {
+@@ -466,7 +475,7 @@ static struct at_xdmac_desc *at_xdmac_get_desc(struct at_xdmac_chan *atchan)
+ 		desc = list_first_entry(&atchan->free_descs_list,
+ 					struct at_xdmac_desc, desc_node);
+ 		list_del(&desc->desc_node);
+-		desc->active_xfer = false;
++		at_xdmac_init_used_desc(desc);
+ 	}
+ 
+ 	return desc;
+@@ -797,10 +806,7 @@ at_xdmac_prep_dma_cyclic(struct dma_chan *chan, dma_addr_t buf_addr,
+ 		list_add_tail(&desc->desc_node, &first->descs_list);
+ 	}
+ 
+-	prev->lld.mbr_nda = first->tx_dma_desc.phys;
+-	dev_dbg(chan2dev(chan),
+-		"%s: chain lld: prev=0x%p, mbr_nda=%pad\n",
+-		__func__, prev, &prev->lld.mbr_nda);
++	at_xdmac_queue_desc(chan, prev, first);
+ 	first->tx_dma_desc.flags = flags;
+ 	first->xfer_size = buf_len;
+ 	first->direction = direction;
+@@ -878,14 +884,14 @@ at_xdmac_interleaved_queue_desc(struct dma_chan *chan,
+ 
+ 	if (xt->src_inc) {
+ 		if (xt->src_sgl)
+-			chan_cc |=  AT_XDMAC_CC_SAM_UBS_DS_AM;
++			chan_cc |=  AT_XDMAC_CC_SAM_UBS_AM;
+ 		else
+ 			chan_cc |=  AT_XDMAC_CC_SAM_INCREMENTED_AM;
+ 	}
+ 
+ 	if (xt->dst_inc) {
+ 		if (xt->dst_sgl)
+-			chan_cc |=  AT_XDMAC_CC_DAM_UBS_DS_AM;
++			chan_cc |=  AT_XDMAC_CC_DAM_UBS_AM;
+ 		else
+ 			chan_cc |=  AT_XDMAC_CC_DAM_INCREMENTED_AM;
+ 	}
+diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
+index cf1c87fa1edd..bedce038c6e2 100644
+--- a/drivers/dma/dw/core.c
++++ b/drivers/dma/dw/core.c
+@@ -1591,7 +1591,6 @@ int dw_dma_probe(struct dw_dma_chip *chip, struct dw_dma_platform_data *pdata)
+ 	INIT_LIST_HEAD(&dw->dma.channels);
+ 	for (i = 0; i < nr_channels; i++) {
+ 		struct dw_dma_chan	*dwc = &dw->chan[i];
+-		int			r = nr_channels - i - 1;
+ 
+ 		dwc->chan.device = &dw->dma;
+ 		dma_cookie_init(&dwc->chan);
+@@ -1603,7 +1602,7 @@ int dw_dma_probe(struct dw_dma_chip *chip, struct dw_dma_platform_data *pdata)
+ 
+ 		/* 7 is highest priority & 0 is lowest. */
+ 		if (pdata->chan_priority == CHAN_PRIORITY_ASCENDING)
+-			dwc->priority = r;
++			dwc->priority = nr_channels - i - 1;
+ 		else
+ 			dwc->priority = i;
+ 
+@@ -1622,6 +1621,7 @@ int dw_dma_probe(struct dw_dma_chip *chip, struct dw_dma_platform_data *pdata)
+ 		/* Hardware configuration */
+ 		if (autocfg) {
+ 			unsigned int dwc_params;
++			unsigned int r = DW_DMA_MAX_NR_CHANNELS - i - 1;
+ 			void __iomem *addr = chip->regs + r * sizeof(u32);
+ 
+ 			dwc_params = dma_read_byaddr(addr, DWC_PARAMS);
+diff --git a/drivers/dma/pxa_dma.c b/drivers/dma/pxa_dma.c
+index ddcbbf5cd9e9..95bdbbe2a671 100644
+--- a/drivers/dma/pxa_dma.c
++++ b/drivers/dma/pxa_dma.c
+@@ -888,6 +888,7 @@ pxad_tx_prep(struct virt_dma_chan *vc, struct virt_dma_desc *vd,
+ 	struct dma_async_tx_descriptor *tx;
+ 	struct pxad_chan *chan = container_of(vc, struct pxad_chan, vc);
+ 
++	INIT_LIST_HEAD(&vd->node);
+ 	tx = vchan_tx_prep(vc, vd, tx_flags);
+ 	tx->tx_submit = pxad_tx_submit;
+ 	dev_dbg(&chan->vc.chan.dev->device,
+diff --git a/drivers/extcon/extcon.c b/drivers/extcon/extcon.c
+index 43b57b02d050..ca94f475fd05 100644
+--- a/drivers/extcon/extcon.c
++++ b/drivers/extcon/extcon.c
+@@ -126,7 +126,7 @@ static int find_cable_index_by_id(struct extcon_dev *edev, const unsigned int id
+ 
+ static int find_cable_id_by_name(struct extcon_dev *edev, const char *name)
+ {
+-	unsigned int id = -EINVAL;
++	int id = -EINVAL;
+ 	int i = 0;
+ 
+ 	/* Find the id of extcon cable */
+@@ -143,7 +143,7 @@ static int find_cable_id_by_name(struct extcon_dev *edev, const char *name)
+ 
+ static int find_cable_index_by_name(struct extcon_dev *edev, const char *name)
+ {
+-	unsigned int id;
++	int id;
+ 
+ 	if (edev->max_supported == 0)
+ 		return -EINVAL;
+@@ -159,7 +159,7 @@ static int find_cable_index_by_name(struct extcon_dev *edev, const char *name)
+ static bool is_extcon_changed(u32 prev, u32 new, int idx, bool *attached)
+ {
+ 	if (((prev >> idx) & 0x1) != ((new >> idx) & 0x1)) {
+-		*attached = new ? true : false;
++		*attached = ((new >> idx) & 0x1) ? true : false;
+ 		return true;
+ 	}
+ 
+@@ -378,7 +378,7 @@ EXPORT_SYMBOL_GPL(extcon_get_cable_state_);
+  */
+ int extcon_get_cable_state(struct extcon_dev *edev, const char *cable_name)
+ {
+-	unsigned int id;
++	int id;
+ 
+ 	id = find_cable_id_by_name(edev, cable_name);
+ 	if (id < 0)
+@@ -426,7 +426,7 @@ EXPORT_SYMBOL_GPL(extcon_set_cable_state_);
+ int extcon_set_cable_state(struct extcon_dev *edev,
+ 			const char *cable_name, bool cable_state)
+ {
+-	unsigned int id;
++	int id;
+ 
+ 	id = find_cable_id_by_name(edev, cable_name);
+ 	if (id < 0)
+diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c
+index e29560e6b40b..950c87f5d279 100644
+--- a/drivers/firmware/efi/libstub/arm-stub.c
++++ b/drivers/firmware/efi/libstub/arm-stub.c
+@@ -13,6 +13,7 @@
+  */
+ 
+ #include <linux/efi.h>
++#include <linux/sort.h>
+ #include <asm/efi.h>
+ 
+ #include "efistub.h"
+@@ -305,6 +306,44 @@ fail:
+  */
+ #define EFI_RT_VIRTUAL_BASE	0x40000000
+ 
++static int cmp_mem_desc(const void *l, const void *r)
++{
++	const efi_memory_desc_t *left = l, *right = r;
++
++	return (left->phys_addr > right->phys_addr) ? 1 : -1;
++}
++
++/*
++ * Returns whether region @left ends exactly where region @right starts,
++ * or false if either argument is NULL.
++ */
++static bool regions_are_adjacent(efi_memory_desc_t *left,
++				 efi_memory_desc_t *right)
++{
++	u64 left_end;
++
++	if (left == NULL || right == NULL)
++		return false;
++
++	left_end = left->phys_addr + left->num_pages * EFI_PAGE_SIZE;
++
++	return left_end == right->phys_addr;
++}
++
++/*
++ * Returns whether region @left and region @right have compatible memory type
++ * mapping attributes, and are both EFI_MEMORY_RUNTIME regions.
++ */
++static bool regions_have_compatible_memory_type_attrs(efi_memory_desc_t *left,
++						      efi_memory_desc_t *right)
++{
++	static const u64 mem_type_mask = EFI_MEMORY_WB | EFI_MEMORY_WT |
++					 EFI_MEMORY_WC | EFI_MEMORY_UC |
++					 EFI_MEMORY_RUNTIME;
++
++	return ((left->attribute ^ right->attribute) & mem_type_mask) == 0;
++}
++
+ /*
+  * efi_get_virtmap() - create a virtual mapping for the EFI memory map
+  *
+@@ -317,33 +356,52 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, unsigned long map_size,
+ 		     int *count)
+ {
+ 	u64 efi_virt_base = EFI_RT_VIRTUAL_BASE;
+-	efi_memory_desc_t *out = runtime_map;
++	efi_memory_desc_t *in, *prev = NULL, *out = runtime_map;
+ 	int l;
+ 
+-	for (l = 0; l < map_size; l += desc_size) {
+-		efi_memory_desc_t *in = (void *)memory_map + l;
++	/*
++	 * To work around potential issues with the Properties Table feature
++	 * introduced in UEFI 2.5, which may split PE/COFF executable images
++	 * in memory into several RuntimeServicesCode and RuntimeServicesData
++	 * regions, we need to preserve the relative offsets between adjacent
++	 * EFI_MEMORY_RUNTIME regions with the same memory type attributes.
++	 * The easiest way to find adjacent regions is to sort the memory map
++	 * before traversing it.
++	 */
++	sort(memory_map, map_size / desc_size, desc_size, cmp_mem_desc, NULL);
++
++	for (l = 0; l < map_size; l += desc_size, prev = in) {
+ 		u64 paddr, size;
+ 
++		in = (void *)memory_map + l;
+ 		if (!(in->attribute & EFI_MEMORY_RUNTIME))
+ 			continue;
+ 
++		paddr = in->phys_addr;
++		size = in->num_pages * EFI_PAGE_SIZE;
++
+ 		/*
+ 		 * Make the mapping compatible with 64k pages: this allows
+ 		 * a 4k page size kernel to kexec a 64k page size kernel and
+ 		 * vice versa.
+ 		 */
+-		paddr = round_down(in->phys_addr, SZ_64K);
+-		size = round_up(in->num_pages * EFI_PAGE_SIZE +
+-				in->phys_addr - paddr, SZ_64K);
+-
+-		/*
+-		 * Avoid wasting memory on PTEs by choosing a virtual base that
+-		 * is compatible with section mappings if this region has the
+-		 * appropriate size and physical alignment. (Sections are 2 MB
+-		 * on 4k granule kernels)
+-		 */
+-		if (IS_ALIGNED(in->phys_addr, SZ_2M) && size >= SZ_2M)
+-			efi_virt_base = round_up(efi_virt_base, SZ_2M);
++		if (!regions_are_adjacent(prev, in) ||
++		    !regions_have_compatible_memory_type_attrs(prev, in)) {
++
++			paddr = round_down(in->phys_addr, SZ_64K);
++			size += in->phys_addr - paddr;
++
++			/*
++			 * Avoid wasting memory on PTEs by choosing a virtual
++			 * base that is compatible with section mappings if this
++			 * region has the appropriate size and physical
++			 * alignment. (Sections are 2 MB on 4k granule kernels)
++			 */
++			if (IS_ALIGNED(in->phys_addr, SZ_2M) && size >= SZ_2M)
++				efi_virt_base = round_up(efi_virt_base, SZ_2M);
++			else
++				efi_virt_base = round_up(efi_virt_base, SZ_64K);
++		}
+ 
+ 		in->virt_addr = efi_virt_base + in->phys_addr - paddr;
+ 		efi_virt_base += size;
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+index b4d36f0f2153..c098d762089c 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+@@ -140,7 +140,7 @@ void amdgpu_irq_preinstall(struct drm_device *dev)
+  */
+ int amdgpu_irq_postinstall(struct drm_device *dev)
+ {
+-	dev->max_vblank_count = 0x001fffff;
++	dev->max_vblank_count = 0x00ffffff;
+ 	return 0;
+ }
+ 
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+index 2abc661845b6..ddcfbf3b188b 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+@@ -543,46 +543,60 @@ static int amdgpu_uvd_cs_msg(struct amdgpu_uvd_cs_ctx *ctx,
+ 		return -EINVAL;
+ 	}
+ 
+-	if (msg_type == 1) {
++	switch (msg_type) {
++	case 0:
++		/* it's a create msg, calc image size (width * height) */
++		amdgpu_bo_kunmap(bo);
++
++		/* try to alloc a new handle */
++		for (i = 0; i < AMDGPU_MAX_UVD_HANDLES; ++i) {
++			if (atomic_read(&adev->uvd.handles[i]) == handle) {
++				DRM_ERROR("Handle 0x%x already in use!\n", handle);
++				return -EINVAL;
++			}
++
++			if (!atomic_cmpxchg(&adev->uvd.handles[i], 0, handle)) {
++				adev->uvd.filp[i] = ctx->parser->filp;
++				return 0;
++			}
++		}
++
++		DRM_ERROR("No more free UVD handles!\n");
++		return -EINVAL;
++
++	case 1:
+ 		/* it's a decode msg, calc buffer sizes */
+ 		r = amdgpu_uvd_cs_msg_decode(msg, ctx->buf_sizes);
+ 		amdgpu_bo_kunmap(bo);
+ 		if (r)
+ 			return r;
+ 
+-	} else if (msg_type == 2) {
++		/* validate the handle */
++		for (i = 0; i < AMDGPU_MAX_UVD_HANDLES; ++i) {
++			if (atomic_read(&adev->uvd.handles[i]) == handle) {
++				if (adev->uvd.filp[i] != ctx->parser->filp) {
++					DRM_ERROR("UVD handle collision detected!\n");
++					return -EINVAL;
++				}
++				return 0;
++			}
++		}
++
++		DRM_ERROR("Invalid UVD handle 0x%x!\n", handle);
++		return -ENOENT;
++
++	case 2:
+ 		/* it's a destroy msg, free the handle */
+ 		for (i = 0; i < AMDGPU_MAX_UVD_HANDLES; ++i)
+ 			atomic_cmpxchg(&adev->uvd.handles[i], handle, 0);
+ 		amdgpu_bo_kunmap(bo);
+ 		return 0;
+-	} else {
+-		/* it's a create msg */
+-		amdgpu_bo_kunmap(bo);
+-
+-		if (msg_type != 0) {
+-			DRM_ERROR("Illegal UVD message type (%d)!\n", msg_type);
+-			return -EINVAL;
+-		}
+-
+-		/* it's a create msg, no special handling needed */
+-	}
+-
+-	/* create or decode, validate the handle */
+-	for (i = 0; i < AMDGPU_MAX_UVD_HANDLES; ++i) {
+-		if (atomic_read(&adev->uvd.handles[i]) == handle)
+-			return 0;
+-	}
+ 
+-	/* handle not found try to alloc a new one */
+-	for (i = 0; i < AMDGPU_MAX_UVD_HANDLES; ++i) {
+-		if (!atomic_cmpxchg(&adev->uvd.handles[i], 0, handle)) {
+-			adev->uvd.filp[i] = ctx->parser->filp;
+-			return 0;
+-		}
++	default:
++		DRM_ERROR("Illegal UVD message type (%d)!\n", msg_type);
++		return -EINVAL;
+ 	}
+-
+-	DRM_ERROR("No more free UVD handles!\n");
++	BUG();
+ 	return -EINVAL;
+ }
+ 
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+index 9a4e3b63f1cb..b07402fc8ded 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+@@ -787,7 +787,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev,
+ 	int r;
+ 
+ 	if (mem) {
+-		addr = mem->start << PAGE_SHIFT;
++		addr = (u64)mem->start << PAGE_SHIFT;
+ 		if (mem->mem_type != TTM_PL_TT)
+ 			addr += adev->vm_manager.vram_base_offset;
+ 	} else {
+diff --git a/drivers/gpu/drm/amd/amdgpu/atombios_encoders.c b/drivers/gpu/drm/amd/amdgpu/atombios_encoders.c
+index ae8caca61e04..e60557417049 100644
+--- a/drivers/gpu/drm/amd/amdgpu/atombios_encoders.c
++++ b/drivers/gpu/drm/amd/amdgpu/atombios_encoders.c
+@@ -1279,8 +1279,7 @@ amdgpu_atombios_encoder_setup_dig(struct drm_encoder *encoder, int action)
+ 			amdgpu_atombios_encoder_setup_dig_encoder(encoder, ATOM_ENCODER_CMD_DP_VIDEO_ON, 0);
+ 		}
+ 		if (amdgpu_encoder->devices & (ATOM_DEVICE_LCD_SUPPORT))
+-			amdgpu_atombios_encoder_setup_dig_transmitter(encoder,
+-							       ATOM_TRANSMITTER_ACTION_LCD_BLON, 0, 0);
++			amdgpu_atombios_encoder_set_backlight_level(amdgpu_encoder, dig->backlight_level);
+ 		if (ext_encoder)
+ 			amdgpu_atombios_encoder_setup_external_encoder(encoder, ext_encoder, ATOM_ENABLE);
+ 	} else {
+diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
+index 4efd671d7a9b..9488ea6ea93f 100644
+--- a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
++++ b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
+@@ -224,11 +224,11 @@ static int uvd_v4_2_suspend(void *handle)
+ 	int r;
+ 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ 
+-	r = uvd_v4_2_hw_fini(adev);
++	r = amdgpu_uvd_suspend(adev);
+ 	if (r)
+ 		return r;
+ 
+-	r = amdgpu_uvd_suspend(adev);
++	r = uvd_v4_2_hw_fini(adev);
+ 	if (r)
+ 		return r;
+ 
+diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
+index b756bd99c0fd..d0ed998228ef 100644
+--- a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
+@@ -220,11 +220,11 @@ static int uvd_v5_0_suspend(void *handle)
+ 	int r;
+ 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ 
+-	r = uvd_v5_0_hw_fini(adev);
++	r = amdgpu_uvd_suspend(adev);
+ 	if (r)
+ 		return r;
+ 
+-	r = amdgpu_uvd_suspend(adev);
++	r = uvd_v5_0_hw_fini(adev);
+ 	if (r)
+ 		return r;
+ 
+diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+index 49aa931b2cb4..345eb760fd5b 100644
+--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+@@ -214,11 +214,11 @@ static int uvd_v6_0_suspend(void *handle)
+ 	int r;
+ 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ 
+-	r = uvd_v6_0_hw_fini(adev);
++	r = amdgpu_uvd_suspend(adev);
+ 	if (r)
+ 		return r;
+ 
+-	r = amdgpu_uvd_suspend(adev);
++	r = uvd_v6_0_hw_fini(adev);
+ 	if (r)
+ 		return r;
+ 
+diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
+index 68552da40287..4f58a1e18de6 100644
+--- a/drivers/gpu/drm/amd/amdgpu/vi.c
++++ b/drivers/gpu/drm/amd/amdgpu/vi.c
+@@ -1290,7 +1290,8 @@ static int vi_common_early_init(void *handle)
+ 	case CHIP_CARRIZO:
+ 		adev->has_uvd = true;
+ 		adev->cg_flags = 0;
+-		adev->pg_flags = AMDGPU_PG_SUPPORT_UVD | AMDGPU_PG_SUPPORT_VCE;
++		/* Disable UVD pg */
++		adev->pg_flags = /* AMDGPU_PG_SUPPORT_UVD | */AMDGPU_PG_SUPPORT_VCE;
+ 		adev->external_rev_id = adev->rev_id + 0x1;
+ 		if (amdgpu_smc_load_fw && smc_enabled)
+ 			adev->firmware.smu_load = true;
+diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
+index eb603f1defc2..969e7898a7ed 100644
+--- a/drivers/gpu/drm/drm_dp_mst_topology.c
++++ b/drivers/gpu/drm/drm_dp_mst_topology.c
+@@ -804,8 +804,6 @@ static void drm_dp_destroy_mst_branch_device(struct kref *kref)
+ 	struct drm_dp_mst_port *port, *tmp;
+ 	bool wake_tx = false;
+ 
+-	cancel_work_sync(&mstb->mgr->work);
+-
+ 	/*
+ 	 * destroy all ports - don't need lock
+ 	 * as there are no more references to the mst branch
+@@ -863,29 +861,33 @@ static void drm_dp_destroy_port(struct kref *kref)
+ {
+ 	struct drm_dp_mst_port *port = container_of(kref, struct drm_dp_mst_port, kref);
+ 	struct drm_dp_mst_topology_mgr *mgr = port->mgr;
++
+ 	if (!port->input) {
+ 		port->vcpi.num_slots = 0;
+ 
+ 		kfree(port->cached_edid);
+ 
+-		/* we can't destroy the connector here, as
+-		   we might be holding the mode_config.mutex
+-		   from an EDID retrieval */
++		/*
++		 * The only time we don't have a connector
++		 * on an output port is if the connector init
++		 * fails.
++		 */
+ 		if (port->connector) {
++			/* we can't destroy the connector here, as
++			 * we might be holding the mode_config.mutex
++			 * from an EDID retrieval */
++
+ 			mutex_lock(&mgr->destroy_connector_lock);
+ 			list_add(&port->next, &mgr->destroy_connector_list);
+ 			mutex_unlock(&mgr->destroy_connector_lock);
+ 			schedule_work(&mgr->destroy_connector_work);
+ 			return;
+ 		}
++		/* no need to clean up vcpi
++		 * as if we have no connector we never setup a vcpi */
+ 		drm_dp_port_teardown_pdt(port, port->pdt);
+-
+-		if (!port->input && port->vcpi.vcpi > 0)
+-			drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
+ 	}
+ 	kfree(port);
+-
+-	(*mgr->cbs->hotplug)(mgr);
+ }
+ 
+ static void drm_dp_put_port(struct drm_dp_mst_port *port)
+@@ -1115,12 +1117,21 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
+ 		char proppath[255];
+ 		build_mst_prop_path(port, mstb, proppath, sizeof(proppath));
+ 		port->connector = (*mstb->mgr->cbs->add_connector)(mstb->mgr, port, proppath);
+-
++		if (!port->connector) {
++			/* remove it from the port list */
++			mutex_lock(&mstb->mgr->lock);
++			list_del(&port->next);
++			mutex_unlock(&mstb->mgr->lock);
++			/* drop port list reference */
++			drm_dp_put_port(port);
++			goto out;
++		}
+ 		if (port->port_num >= 8) {
+ 			port->cached_edid = drm_get_edid(port->connector, &port->aux.ddc);
+ 		}
+ 	}
+ 
++out:
+ 	/* put reference to this port */
+ 	drm_dp_put_port(port);
+ }
+@@ -1978,6 +1989,8 @@ void drm_dp_mst_topology_mgr_suspend(struct drm_dp_mst_topology_mgr *mgr)
+ 	drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
+ 			   DP_MST_EN | DP_UPSTREAM_IS_SRC);
+ 	mutex_unlock(&mgr->lock);
++	flush_work(&mgr->work);
++	flush_work(&mgr->destroy_connector_work);
+ }
+ EXPORT_SYMBOL(drm_dp_mst_topology_mgr_suspend);
+ 
+@@ -2661,7 +2674,7 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
+ {
+ 	struct drm_dp_mst_topology_mgr *mgr = container_of(work, struct drm_dp_mst_topology_mgr, destroy_connector_work);
+ 	struct drm_dp_mst_port *port;
+-
++	bool send_hotplug = false;
+ 	/*
+ 	 * Not a regular list traverse as we have to drop the destroy
+ 	 * connector lock before destroying the connector, to avoid AB->BA
+@@ -2684,7 +2697,10 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
+ 		if (!port->input && port->vcpi.vcpi > 0)
+ 			drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
+ 		kfree(port);
++		send_hotplug = true;
+ 	}
++	if (send_hotplug)
++		(*mgr->cbs->hotplug)(mgr);
+ }
+ 
+ /**
+@@ -2737,6 +2753,7 @@ EXPORT_SYMBOL(drm_dp_mst_topology_mgr_init);
+  */
+ void drm_dp_mst_topology_mgr_destroy(struct drm_dp_mst_topology_mgr *mgr)
+ {
++	flush_work(&mgr->work);
+ 	flush_work(&mgr->destroy_connector_work);
+ 	mutex_lock(&mgr->payload_lock);
+ 	kfree(mgr->payloads);
+diff --git a/drivers/gpu/drm/drm_lock.c b/drivers/gpu/drm/drm_lock.c
+index f861361a635e..4924d381b664 100644
+--- a/drivers/gpu/drm/drm_lock.c
++++ b/drivers/gpu/drm/drm_lock.c
+@@ -61,6 +61,9 @@ int drm_legacy_lock(struct drm_device *dev, void *data,
+ 	struct drm_master *master = file_priv->master;
+ 	int ret = 0;
+ 
++	if (drm_core_check_feature(dev, DRIVER_MODESET))
++		return -EINVAL;
++
+ 	++file_priv->lock_count;
+ 
+ 	if (lock->context == DRM_KERNEL_CONTEXT) {
+@@ -153,6 +156,9 @@ int drm_legacy_unlock(struct drm_device *dev, void *data, struct drm_file *file_
+ 	struct drm_lock *lock = data;
+ 	struct drm_master *master = file_priv->master;
+ 
++	if (drm_core_check_feature(dev, DRIVER_MODESET))
++		return -EINVAL;
++
+ 	if (lock->context == DRM_KERNEL_CONTEXT) {
+ 		DRM_ERROR("Process %d using kernel context %d\n",
+ 			  task_pid_nr(current), lock->context);
+diff --git a/drivers/gpu/drm/i915/intel_bios.c b/drivers/gpu/drm/i915/intel_bios.c
+index 198fc3c3291b..17522f733513 100644
+--- a/drivers/gpu/drm/i915/intel_bios.c
++++ b/drivers/gpu/drm/i915/intel_bios.c
+@@ -42,7 +42,7 @@ find_section(const void *_bdb, int section_id)
+ 	const struct bdb_header *bdb = _bdb;
+ 	const u8 *base = _bdb;
+ 	int index = 0;
+-	u16 total, current_size;
++	u32 total, current_size;
+ 	u8 current_id;
+ 
+ 	/* skip to first section */
+@@ -57,6 +57,10 @@ find_section(const void *_bdb, int section_id)
+ 		current_size = *((const u16 *)(base + index));
+ 		index += 2;
+ 
++		/* The MIPI Sequence Block v3+ has a separate size field. */
++		if (current_id == BDB_MIPI_SEQUENCE && *(base + index) >= 3)
++			current_size = *((const u32 *)(base + index + 1));
++
+ 		if (index + current_size > total)
+ 			return NULL;
+ 
+@@ -859,6 +863,12 @@ parse_mipi(struct drm_i915_private *dev_priv, const struct bdb_header *bdb)
+ 		return;
+ 	}
+ 
++	/* Fail gracefully for forward incompatible sequence block. */
++	if (sequence->version >= 3) {
++		DRM_ERROR("Unable to parse MIPI Sequence Block v3+\n");
++		return;
++	}
++
+ 	DRM_DEBUG_DRIVER("Found MIPI sequence block\n");
+ 
+ 	block_size = get_blocksize(sequence);
+diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
+index 7c6225c84ba6..4649bd2ed340 100644
+--- a/drivers/gpu/drm/qxl/qxl_display.c
++++ b/drivers/gpu/drm/qxl/qxl_display.c
+@@ -618,7 +618,7 @@ static int qxl_crtc_mode_set(struct drm_crtc *crtc,
+ 		  adjusted_mode->hdisplay,
+ 		  adjusted_mode->vdisplay);
+ 
+-	if (qcrtc->index == 0)
++	if (bo->is_primary == false)
+ 		recreate_primary = true;
+ 
+ 	if (bo->surf.stride * bo->surf.height > qdev->vram_size) {
+@@ -886,13 +886,15 @@ static enum drm_connector_status qxl_conn_detect(
+ 		drm_connector_to_qxl_output(connector);
+ 	struct drm_device *ddev = connector->dev;
+ 	struct qxl_device *qdev = ddev->dev_private;
+-	int connected;
++	bool connected = false;
+ 
+ 	/* The first monitor is always connected */
+-	connected = (output->index == 0) ||
+-		    (qdev->client_monitors_config &&
+-		     qdev->client_monitors_config->count > output->index &&
+-		     qxl_head_enabled(&qdev->client_monitors_config->heads[output->index]));
++	if (!qdev->client_monitors_config) {
++		if (output->index == 0)
++			connected = true;
++	} else
++		connected = qdev->client_monitors_config->count > output->index &&
++		     qxl_head_enabled(&qdev->client_monitors_config->heads[output->index]);
+ 
+ 	DRM_DEBUG("#%d connected: %d\n", output->index, connected);
+ 	if (!connected)
+diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c b/drivers/gpu/drm/radeon/atombios_encoders.c
+index c3872598b85a..65adb9c72377 100644
+--- a/drivers/gpu/drm/radeon/atombios_encoders.c
++++ b/drivers/gpu/drm/radeon/atombios_encoders.c
+@@ -1624,8 +1624,9 @@ radeon_atom_encoder_dpms_avivo(struct drm_encoder *encoder, int mode)
+ 		} else
+ 			atom_execute_table(rdev->mode_info.atom_context, index, (uint32_t *)&args);
+ 		if (radeon_encoder->devices & (ATOM_DEVICE_LCD_SUPPORT)) {
+-			args.ucAction = ATOM_LCD_BLON;
+-			atom_execute_table(rdev->mode_info.atom_context, index, (uint32_t *)&args);
++			struct radeon_encoder_atom_dig *dig = radeon_encoder->enc_priv;
++
++			atombios_set_backlight_level(radeon_encoder, dig->backlight_level);
+ 		}
+ 		break;
+ 	case DRM_MODE_DPMS_STANDBY:
+@@ -1706,8 +1707,7 @@ radeon_atom_encoder_dpms_dig(struct drm_encoder *encoder, int mode)
+ 				atombios_dig_encoder_setup(encoder, ATOM_ENCODER_CMD_DP_VIDEO_ON, 0);
+ 		}
+ 		if (radeon_encoder->devices & (ATOM_DEVICE_LCD_SUPPORT))
+-			atombios_dig_transmitter_setup(encoder,
+-						       ATOM_TRANSMITTER_ACTION_LCD_BLON, 0, 0);
++			atombios_set_backlight_level(radeon_encoder, dig->backlight_level);
+ 		if (ext_encoder)
+ 			atombios_external_encoder_setup(encoder, ext_encoder, ATOM_ENABLE);
+ 		break;
+diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
+index ea7ba5ef16a9..6a9d80a5332d 100644
+--- a/drivers/hv/hv_utils_transport.c
++++ b/drivers/hv/hv_utils_transport.c
+@@ -186,7 +186,7 @@ int hvutil_transport_send(struct hvutil_transport *hvt, void *msg, int len)
+ 		return -EINVAL;
+ 	} else if (hvt->mode == HVUTIL_TRANSPORT_NETLINK) {
+ 		cn_msg = kzalloc(sizeof(*cn_msg) + len, GFP_ATOMIC);
+-		if (!msg)
++		if (!cn_msg)
+ 			return -ENOMEM;
+ 		cn_msg->id.idx = hvt->cn_id.idx;
+ 		cn_msg->id.val = hvt->cn_id.val;
+diff --git a/drivers/hwmon/nct6775.c b/drivers/hwmon/nct6775.c
+index bd1c99deac71..2aaedbe0b023 100644
+--- a/drivers/hwmon/nct6775.c
++++ b/drivers/hwmon/nct6775.c
+@@ -354,6 +354,10 @@ static const u16 NCT6775_REG_TEMP_CRIT[ARRAY_SIZE(nct6775_temp_label) - 1]
+ 
+ /* NCT6776 specific data */
+ 
++/* STEP_UP_TIME and STEP_DOWN_TIME regs are swapped for all chips but NCT6775 */
++#define NCT6776_REG_FAN_STEP_UP_TIME NCT6775_REG_FAN_STEP_DOWN_TIME
++#define NCT6776_REG_FAN_STEP_DOWN_TIME NCT6775_REG_FAN_STEP_UP_TIME
++
+ static const s8 NCT6776_ALARM_BITS[] = {
+ 	0, 1, 2, 3, 8, 21, 20, 16,	/* in0.. in7 */
+ 	17, -1, -1, -1, -1, -1, -1,	/* in8..in14 */
+@@ -3528,8 +3532,8 @@ static int nct6775_probe(struct platform_device *pdev)
+ 		data->REG_FAN_PULSES = NCT6776_REG_FAN_PULSES;
+ 		data->FAN_PULSE_SHIFT = NCT6775_FAN_PULSE_SHIFT;
+ 		data->REG_FAN_TIME[0] = NCT6775_REG_FAN_STOP_TIME;
+-		data->REG_FAN_TIME[1] = NCT6775_REG_FAN_STEP_UP_TIME;
+-		data->REG_FAN_TIME[2] = NCT6775_REG_FAN_STEP_DOWN_TIME;
++		data->REG_FAN_TIME[1] = NCT6776_REG_FAN_STEP_UP_TIME;
++		data->REG_FAN_TIME[2] = NCT6776_REG_FAN_STEP_DOWN_TIME;
+ 		data->REG_TOLERANCE_H = NCT6776_REG_TOLERANCE_H;
+ 		data->REG_PWM[0] = NCT6775_REG_PWM;
+ 		data->REG_PWM[1] = NCT6775_REG_FAN_START_OUTPUT;
+@@ -3600,8 +3604,8 @@ static int nct6775_probe(struct platform_device *pdev)
+ 		data->REG_FAN_PULSES = NCT6779_REG_FAN_PULSES;
+ 		data->FAN_PULSE_SHIFT = NCT6775_FAN_PULSE_SHIFT;
+ 		data->REG_FAN_TIME[0] = NCT6775_REG_FAN_STOP_TIME;
+-		data->REG_FAN_TIME[1] = NCT6775_REG_FAN_STEP_UP_TIME;
+-		data->REG_FAN_TIME[2] = NCT6775_REG_FAN_STEP_DOWN_TIME;
++		data->REG_FAN_TIME[1] = NCT6776_REG_FAN_STEP_UP_TIME;
++		data->REG_FAN_TIME[2] = NCT6776_REG_FAN_STEP_DOWN_TIME;
+ 		data->REG_TOLERANCE_H = NCT6776_REG_TOLERANCE_H;
+ 		data->REG_PWM[0] = NCT6775_REG_PWM;
+ 		data->REG_PWM[1] = NCT6775_REG_FAN_START_OUTPUT;
+@@ -3677,8 +3681,8 @@ static int nct6775_probe(struct platform_device *pdev)
+ 		data->REG_FAN_PULSES = NCT6779_REG_FAN_PULSES;
+ 		data->FAN_PULSE_SHIFT = NCT6775_FAN_PULSE_SHIFT;
+ 		data->REG_FAN_TIME[0] = NCT6775_REG_FAN_STOP_TIME;
+-		data->REG_FAN_TIME[1] = NCT6775_REG_FAN_STEP_UP_TIME;
+-		data->REG_FAN_TIME[2] = NCT6775_REG_FAN_STEP_DOWN_TIME;
++		data->REG_FAN_TIME[1] = NCT6776_REG_FAN_STEP_UP_TIME;
++		data->REG_FAN_TIME[2] = NCT6776_REG_FAN_STEP_DOWN_TIME;
+ 		data->REG_TOLERANCE_H = NCT6776_REG_TOLERANCE_H;
+ 		data->REG_PWM[0] = NCT6775_REG_PWM;
+ 		data->REG_PWM[1] = NCT6775_REG_FAN_START_OUTPUT;
+diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
+index d851e1828d6f..85761b78bb5f 100644
+--- a/drivers/infiniband/ulp/isert/ib_isert.c
++++ b/drivers/infiniband/ulp/isert/ib_isert.c
+@@ -3012,9 +3012,16 @@ isert_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd, bool recovery)
+ static int
+ isert_immediate_queue(struct iscsi_conn *conn, struct iscsi_cmd *cmd, int state)
+ {
+-	int ret;
++	struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
++	int ret = 0;
+ 
+ 	switch (state) {
++	case ISTATE_REMOVE:
++		spin_lock_bh(&conn->cmd_lock);
++		list_del_init(&cmd->i_conn_node);
++		spin_unlock_bh(&conn->cmd_lock);
++		isert_put_cmd(isert_cmd, true);
++		break;
+ 	case ISTATE_SEND_NOPIN_WANT_RESPONSE:
+ 		ret = isert_put_nopin(cmd, conn, false);
+ 		break;
+@@ -3379,6 +3386,41 @@ isert_wait4flush(struct isert_conn *isert_conn)
+ 	wait_for_completion(&isert_conn->wait_comp_err);
+ }
+ 
++/**
++ * isert_put_unsol_pending_cmds() - Drop commands waiting for
++ *     unsolicitate dataout
++ * @conn:    iscsi connection
++ *
++ * We might still have commands that are waiting for unsolicited
++ * dataouts messages. We must put the extra reference on those
++ * before blocking on the target_wait_for_session_cmds
++ */
++static void
++isert_put_unsol_pending_cmds(struct iscsi_conn *conn)
++{
++	struct iscsi_cmd *cmd, *tmp;
++	static LIST_HEAD(drop_cmd_list);
++
++	spin_lock_bh(&conn->cmd_lock);
++	list_for_each_entry_safe(cmd, tmp, &conn->conn_cmd_list, i_conn_node) {
++		if ((cmd->cmd_flags & ICF_NON_IMMEDIATE_UNSOLICITED_DATA) &&
++		    (cmd->write_data_done < conn->sess->sess_ops->FirstBurstLength) &&
++		    (cmd->write_data_done < cmd->se_cmd.data_length))
++			list_move_tail(&cmd->i_conn_node, &drop_cmd_list);
++	}
++	spin_unlock_bh(&conn->cmd_lock);
++
++	list_for_each_entry_safe(cmd, tmp, &drop_cmd_list, i_conn_node) {
++		list_del_init(&cmd->i_conn_node);
++		if (cmd->i_state != ISTATE_REMOVE) {
++			struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
++
++			isert_info("conn %p dropping cmd %p\n", conn, cmd);
++			isert_put_cmd(isert_cmd, true);
++		}
++	}
++}
++
+ static void isert_wait_conn(struct iscsi_conn *conn)
+ {
+ 	struct isert_conn *isert_conn = conn->context;
+@@ -3397,8 +3439,9 @@ static void isert_wait_conn(struct iscsi_conn *conn)
+ 	isert_conn_terminate(isert_conn);
+ 	mutex_unlock(&isert_conn->mutex);
+ 
+-	isert_wait4cmds(conn);
+ 	isert_wait4flush(isert_conn);
++	isert_put_unsol_pending_cmds(conn);
++	isert_wait4cmds(conn);
+ 	isert_wait4logout(isert_conn);
+ 
+ 	queue_work(isert_release_wq, &isert_conn->release_work);
+diff --git a/drivers/irqchip/irq-atmel-aic5.c b/drivers/irqchip/irq-atmel-aic5.c
+index 459bf4429d36..7e077bf13fe1 100644
+--- a/drivers/irqchip/irq-atmel-aic5.c
++++ b/drivers/irqchip/irq-atmel-aic5.c
+@@ -88,28 +88,36 @@ static void aic5_mask(struct irq_data *d)
+ {
+ 	struct irq_domain *domain = d->domain;
+ 	struct irq_domain_chip_generic *dgc = domain->gc;
+-	struct irq_chip_generic *gc = dgc->gc[0];
++	struct irq_chip_generic *bgc = dgc->gc[0];
++	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ 
+-	/* Disable interrupt on AIC5 */
+-	irq_gc_lock(gc);
++	/*
++	 * Disable interrupt on AIC5. We always take the lock of the
++	 * first irq chip as all chips share the same registers.
++	 */
++	irq_gc_lock(bgc);
+ 	irq_reg_writel(gc, d->hwirq, AT91_AIC5_SSR);
+ 	irq_reg_writel(gc, 1, AT91_AIC5_IDCR);
+ 	gc->mask_cache &= ~d->mask;
+-	irq_gc_unlock(gc);
++	irq_gc_unlock(bgc);
+ }
+ 
+ static void aic5_unmask(struct irq_data *d)
+ {
+ 	struct irq_domain *domain = d->domain;
+ 	struct irq_domain_chip_generic *dgc = domain->gc;
+-	struct irq_chip_generic *gc = dgc->gc[0];
++	struct irq_chip_generic *bgc = dgc->gc[0];
++	struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ 
+-	/* Enable interrupt on AIC5 */
+-	irq_gc_lock(gc);
++	/*
++	 * Enable interrupt on AIC5. We always take the lock of the
++	 * first irq chip as all chips share the same registers.
++	 */
++	irq_gc_lock(bgc);
+ 	irq_reg_writel(gc, d->hwirq, AT91_AIC5_SSR);
+ 	irq_reg_writel(gc, 1, AT91_AIC5_IECR);
+ 	gc->mask_cache |= d->mask;
+-	irq_gc_unlock(gc);
++	irq_gc_unlock(bgc);
+ }
+ 
+ static int aic5_retrigger(struct irq_data *d)
+diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
+index c00e2db351ba..9a791dd52199 100644
+--- a/drivers/irqchip/irq-gic-v3-its.c
++++ b/drivers/irqchip/irq-gic-v3-its.c
+@@ -921,8 +921,10 @@ retry_baser:
+ 			 * non-cacheable as well.
+ 			 */
+ 			shr = tmp & GITS_BASER_SHAREABILITY_MASK;
+-			if (!shr)
++			if (!shr) {
+ 				cache = GITS_BASER_nC;
++				__flush_dcache_area(base, alloc_size);
++			}
+ 			goto retry_baser;
+ 		}
+ 
+@@ -1163,6 +1165,8 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
+ 		return NULL;
+ 	}
+ 
++	__flush_dcache_area(itt, sz);
++
+ 	dev->its = its;
+ 	dev->itt = itt;
+ 	dev->nr_ites = nr_ites;
+diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
+index 9ad35f72ab4c..433fb9df848a 100644
+--- a/drivers/leds/Kconfig
++++ b/drivers/leds/Kconfig
+@@ -229,7 +229,7 @@ config LEDS_LP55XX_COMMON
+ 	tristate "Common Driver for TI/National LP5521/5523/55231/5562/8501"
+ 	depends on LEDS_LP5521 || LEDS_LP5523 || LEDS_LP5562 || LEDS_LP8501
+ 	select FW_LOADER
+-	select FW_LOADER_USER_HELPER_FALLBACK
++	select FW_LOADER_USER_HELPER
+ 	help
+ 	  This option supports common operations for LP5521/5523/55231/5562/8501
+ 	  devices.
+diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c
+index beabfbc6f7cd..ca51d58bed24 100644
+--- a/drivers/leds/led-class.c
++++ b/drivers/leds/led-class.c
+@@ -228,12 +228,15 @@ static int led_classdev_next_name(const char *init_name, char *name,
+ {
+ 	unsigned int i = 0;
+ 	int ret = 0;
++	struct device *dev;
+ 
+ 	strlcpy(name, init_name, len);
+ 
+-	while (class_find_device(leds_class, NULL, name, match_name) &&
+-	       (ret < len))
++	while ((ret < len) &&
++	       (dev = class_find_device(leds_class, NULL, name, match_name))) {
++		put_device(dev);
+ 		ret = snprintf(name, len, "%s_%u", init_name, ++i);
++	}
+ 
+ 	if (ret >= len)
+ 		return -ENOMEM;
+diff --git a/drivers/macintosh/windfarm_core.c b/drivers/macintosh/windfarm_core.c
+index 3ee198b65843..cc7ece1712b5 100644
+--- a/drivers/macintosh/windfarm_core.c
++++ b/drivers/macintosh/windfarm_core.c
+@@ -435,7 +435,7 @@ int wf_unregister_client(struct notifier_block *nb)
+ {
+ 	mutex_lock(&wf_lock);
+ 	blocking_notifier_chain_unregister(&wf_client_list, nb);
+-	wf_client_count++;
++	wf_client_count--;
+ 	if (wf_client_count == 0)
+ 		wf_stop_thread();
+ 	mutex_unlock(&wf_lock);
+diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
+index e51de52eeb94..48b5890c28e3 100644
+--- a/drivers/md/bitmap.c
++++ b/drivers/md/bitmap.c
+@@ -1997,7 +1997,8 @@ int bitmap_resize(struct bitmap *bitmap, sector_t blocks,
+ 	if (bitmap->mddev->bitmap_info.offset || bitmap->mddev->bitmap_info.file)
+ 		ret = bitmap_storage_alloc(&store, chunks,
+ 					   !bitmap->mddev->bitmap_info.external,
+-					   bitmap->cluster_slot);
++					   mddev_is_clustered(bitmap->mddev)
++					   ? bitmap->cluster_slot : 0);
+ 	if (ret)
+ 		goto err;
+ 
+diff --git a/drivers/md/dm-cache-policy-cleaner.c b/drivers/md/dm-cache-policy-cleaner.c
+index 240c9f0e85e7..8a096456579b 100644
+--- a/drivers/md/dm-cache-policy-cleaner.c
++++ b/drivers/md/dm-cache-policy-cleaner.c
+@@ -436,7 +436,7 @@ static struct dm_cache_policy *wb_create(dm_cblock_t cache_size,
+ static struct dm_cache_policy_type wb_policy_type = {
+ 	.name = "cleaner",
+ 	.version = {1, 0, 0},
+-	.hint_size = 0,
++	.hint_size = 4,
+ 	.owner = THIS_MODULE,
+ 	.create = wb_create
+ };
+diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
+index 0f48fed44a17..0d28c5b9d065 100644
+--- a/drivers/md/dm-crypt.c
++++ b/drivers/md/dm-crypt.c
+@@ -968,7 +968,8 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone);
+ 
+ /*
+  * Generate a new unfragmented bio with the given size
+- * This should never violate the device limitations
++ * This should never violate the device limitations (but only because
++ * max_segment_size is being constrained to PAGE_SIZE).
+  *
+  * This function may be called concurrently. If we allocate from the mempool
+  * concurrently, there is a possibility of deadlock. For example, if we have
+@@ -2058,9 +2059,20 @@ static int crypt_iterate_devices(struct dm_target *ti,
+ 	return fn(ti, cc->dev, cc->start, ti->len, data);
+ }
+ 
++static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
++{
++	/*
++	 * Unfortunate constraint that is required to avoid the potential
++	 * for exceeding underlying device's max_segments limits -- due to
++	 * crypt_alloc_buffer() possibly allocating pages for the encryption
++	 * bio that are not as physically contiguous as the original bio.
++	 */
++	limits->max_segment_size = PAGE_SIZE;
++}
++
+ static struct target_type crypt_target = {
+ 	.name   = "crypt",
+-	.version = {1, 14, 0},
++	.version = {1, 14, 1},
+ 	.module = THIS_MODULE,
+ 	.ctr    = crypt_ctr,
+ 	.dtr    = crypt_dtr,
+@@ -2072,6 +2084,7 @@ static struct target_type crypt_target = {
+ 	.message = crypt_message,
+ 	.merge  = crypt_merge,
+ 	.iterate_devices = crypt_iterate_devices,
++	.io_hints = crypt_io_hints,
+ };
+ 
+ static int __init dm_crypt_init(void)
+diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
+index 2daa67793511..1257d484392a 100644
+--- a/drivers/md/dm-raid.c
++++ b/drivers/md/dm-raid.c
+@@ -329,8 +329,7 @@ static int validate_region_size(struct raid_set *rs, unsigned long region_size)
+ 		 */
+ 		if (min_region_size > (1 << 13)) {
+ 			/* If not a power of 2, make it the next power of 2 */
+-			if (min_region_size & (min_region_size - 1))
+-				region_size = 1 << fls(region_size);
++			region_size = roundup_pow_of_two(min_region_size);
+ 			DMINFO("Choosing default region size of %lu sectors",
+ 			       region_size);
+ 		} else {
+diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
+index d2bbe8cc1e97..75aef240c2d1 100644
+--- a/drivers/md/dm-thin.c
++++ b/drivers/md/dm-thin.c
+@@ -4333,6 +4333,10 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
+ {
+ 	struct thin_c *tc = ti->private;
+ 	struct pool *pool = tc->pool;
++	struct queue_limits *pool_limits = dm_get_queue_limits(pool->pool_md);
++
++	if (!pool_limits->discard_granularity)
++		return; /* pool's discard support is disabled */
+ 
+ 	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
+ 	limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */
+diff --git a/drivers/md/dm.c b/drivers/md/dm.c
+index 0d7ab20c58df..3e32f4e31bbb 100644
+--- a/drivers/md/dm.c
++++ b/drivers/md/dm.c
+@@ -2952,8 +2952,6 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
+ 
+ 	might_sleep();
+ 
+-	map = dm_get_live_table(md, &srcu_idx);
+-
+ 	spin_lock(&_minor_lock);
+ 	idr_replace(&_minor_idr, MINOR_ALLOCED, MINOR(disk_devt(dm_disk(md))));
+ 	set_bit(DMF_FREEING, &md->flags);
+@@ -2967,14 +2965,14 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
+ 	 * do not race with internal suspend.
+ 	 */
+ 	mutex_lock(&md->suspend_lock);
++	map = dm_get_live_table(md, &srcu_idx);
+ 	if (!dm_suspended_md(md)) {
+ 		dm_table_presuspend_targets(map);
+ 		dm_table_postsuspend_targets(map);
+ 	}
+-	mutex_unlock(&md->suspend_lock);
+-
+ 	/* dm_put_live_table must be before msleep, otherwise deadlock is possible */
+ 	dm_put_live_table(md, srcu_idx);
++	mutex_unlock(&md->suspend_lock);
+ 
+ 	/*
+ 	 * Rare, but there may be I/O requests still going to complete,
+diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
+index efb654eb5399..0875e5e7e09a 100644
+--- a/drivers/md/raid0.c
++++ b/drivers/md/raid0.c
+@@ -83,7 +83,7 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
+ 	char b[BDEVNAME_SIZE];
+ 	char b2[BDEVNAME_SIZE];
+ 	struct r0conf *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
+-	bool discard_supported = false;
++	unsigned short blksize = 512;
+ 
+ 	if (!conf)
+ 		return -ENOMEM;
+@@ -98,6 +98,9 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
+ 		sector_div(sectors, mddev->chunk_sectors);
+ 		rdev1->sectors = sectors * mddev->chunk_sectors;
+ 
++		blksize = max(blksize, queue_logical_block_size(
++				      rdev1->bdev->bd_disk->queue));
++
+ 		rdev_for_each(rdev2, mddev) {
+ 			pr_debug("md/raid0:%s:   comparing %s(%llu)"
+ 				 " with %s(%llu)\n",
+@@ -134,6 +137,18 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
+ 	}
+ 	pr_debug("md/raid0:%s: FINAL %d zones\n",
+ 		 mdname(mddev), conf->nr_strip_zones);
++	/*
++	 * now since we have the hard sector sizes, we can make sure
++	 * chunk size is a multiple of that sector size
++	 */
++	if ((mddev->chunk_sectors << 9) % blksize) {
++		printk(KERN_ERR "md/raid0:%s: chunk_size of %d not multiple of block size %d\n",
++		       mdname(mddev),
++		       mddev->chunk_sectors << 9, blksize);
++		err = -EINVAL;
++		goto abort;
++	}
++
+ 	err = -ENOMEM;
+ 	conf->strip_zone = kzalloc(sizeof(struct strip_zone)*
+ 				conf->nr_strip_zones, GFP_KERNEL);
+@@ -188,19 +203,12 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
+ 		}
+ 		dev[j] = rdev1;
+ 
+-		if (mddev->queue)
+-			disk_stack_limits(mddev->gendisk, rdev1->bdev,
+-					  rdev1->data_offset << 9);
+-
+ 		if (rdev1->bdev->bd_disk->queue->merge_bvec_fn)
+ 			conf->has_merge_bvec = 1;
+ 
+ 		if (!smallest || (rdev1->sectors < smallest->sectors))
+ 			smallest = rdev1;
+ 		cnt++;
+-
+-		if (blk_queue_discard(bdev_get_queue(rdev1->bdev)))
+-			discard_supported = true;
+ 	}
+ 	if (cnt != mddev->raid_disks) {
+ 		printk(KERN_ERR "md/raid0:%s: too few disks (%d of %d) - "
+@@ -261,28 +269,6 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
+ 			 (unsigned long long)smallest->sectors);
+ 	}
+ 
+-	/*
+-	 * now since we have the hard sector sizes, we can make sure
+-	 * chunk size is a multiple of that sector size
+-	 */
+-	if ((mddev->chunk_sectors << 9) % queue_logical_block_size(mddev->queue)) {
+-		printk(KERN_ERR "md/raid0:%s: chunk_size of %d not valid\n",
+-		       mdname(mddev),
+-		       mddev->chunk_sectors << 9);
+-		goto abort;
+-	}
+-
+-	if (mddev->queue) {
+-		blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
+-		blk_queue_io_opt(mddev->queue,
+-				 (mddev->chunk_sectors << 9) * mddev->raid_disks);
+-
+-		if (!discard_supported)
+-			queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+-		else
+-			queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+-	}
+-
+ 	pr_debug("md/raid0:%s: done.\n", mdname(mddev));
+ 	*private_conf = conf;
+ 
+@@ -433,12 +419,6 @@ static int raid0_run(struct mddev *mddev)
+ 	if (md_check_no_bitmap(mddev))
+ 		return -EINVAL;
+ 
+-	if (mddev->queue) {
+-		blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
+-		blk_queue_max_write_same_sectors(mddev->queue, mddev->chunk_sectors);
+-		blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
+-	}
+-
+ 	/* if private is not null, we are here after takeover */
+ 	if (mddev->private == NULL) {
+ 		ret = create_strip_zones(mddev, &conf);
+@@ -447,6 +427,29 @@ static int raid0_run(struct mddev *mddev)
+ 		mddev->private = conf;
+ 	}
+ 	conf = mddev->private;
++	if (mddev->queue) {
++		struct md_rdev *rdev;
++		bool discard_supported = false;
++
++		blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
++		blk_queue_max_write_same_sectors(mddev->queue, mddev->chunk_sectors);
++		blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
++
++		blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
++		blk_queue_io_opt(mddev->queue,
++				 (mddev->chunk_sectors << 9) * mddev->raid_disks);
++
++		rdev_for_each(rdev, mddev) {
++			disk_stack_limits(mddev->gendisk, rdev->bdev,
++					  rdev->data_offset << 9);
++			if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
++				discard_supported = true;
++		}
++		if (!discard_supported)
++			queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
++		else
++			queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
++	}
+ 
+ 	/* calculate array device size */
+ 	md_set_array_sectors(mddev, raid0_size(mddev, 0, 0));
+diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
+index 9e3fdbdc4037..2f4503a7f315 100644
+--- a/drivers/mmc/core/core.c
++++ b/drivers/mmc/core/core.c
+@@ -134,9 +134,11 @@ void mmc_request_done(struct mmc_host *host, struct mmc_request *mrq)
+ 	int err = cmd->error;
+ 
+ 	/* Flag re-tuning needed on CRC errors */
+-	if (err == -EILSEQ || (mrq->sbc && mrq->sbc->error == -EILSEQ) ||
++	if ((cmd->opcode != MMC_SEND_TUNING_BLOCK &&
++	    cmd->opcode != MMC_SEND_TUNING_BLOCK_HS200) &&
++	    (err == -EILSEQ || (mrq->sbc && mrq->sbc->error == -EILSEQ) ||
+ 	    (mrq->data && mrq->data->error == -EILSEQ) ||
+-	    (mrq->stop && mrq->stop->error == -EILSEQ))
++	    (mrq->stop && mrq->stop->error == -EILSEQ)))
+ 		mmc_retune_needed(host);
+ 
+ 	if (err && cmd->retries && mmc_host_is_spi(host)) {
+diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
+index 99a9c9011c50..79979e9d5a09 100644
+--- a/drivers/mmc/core/host.c
++++ b/drivers/mmc/core/host.c
+@@ -457,7 +457,7 @@ int mmc_of_parse(struct mmc_host *host)
+ 					   0, &cd_gpio_invert);
+ 		if (!ret)
+ 			dev_info(host->parent, "Got CD GPIO\n");
+-		else if (ret != -ENOENT)
++		else if (ret != -ENOENT && ret != -ENOSYS)
+ 			return ret;
+ 
+ 		/*
+@@ -481,7 +481,7 @@ int mmc_of_parse(struct mmc_host *host)
+ 	ret = mmc_gpiod_request_ro(host, "wp", 0, false, 0, &ro_gpio_invert);
+ 	if (!ret)
+ 		dev_info(host->parent, "Got WP GPIO\n");
+-	else if (ret != -ENOENT)
++	else if (ret != -ENOENT && ret != -ENOSYS)
+ 		return ret;
+ 
+ 	if (of_property_read_bool(np, "disable-wp"))
+diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
+index 40e9d8e45f25..e41fb7405426 100644
+--- a/drivers/mmc/host/dw_mmc.c
++++ b/drivers/mmc/host/dw_mmc.c
+@@ -99,6 +99,9 @@ struct idmac_desc {
+ 
+ 	__le32		des3;	/* buffer 2 physical address */
+ };
++
++/* Each descriptor can transfer up to 4KB of data in chained mode */
++#define DW_MCI_DESC_DATA_LENGTH	0x1000
+ #endif /* CONFIG_MMC_DW_IDMAC */
+ 
+ static bool dw_mci_reset(struct dw_mci *host);
+@@ -462,66 +465,96 @@ static void dw_mci_idmac_complete_dma(struct dw_mci *host)
+ static void dw_mci_translate_sglist(struct dw_mci *host, struct mmc_data *data,
+ 				    unsigned int sg_len)
+ {
++	unsigned int desc_len;
+ 	int i;
+ 	if (host->dma_64bit_address == 1) {
+-		struct idmac_desc_64addr *desc = host->sg_cpu;
++		struct idmac_desc_64addr *desc_first, *desc_last, *desc;
++
++		desc_first = desc_last = desc = host->sg_cpu;
+ 
+-		for (i = 0; i < sg_len; i++, desc++) {
++		for (i = 0; i < sg_len; i++) {
+ 			unsigned int length = sg_dma_len(&data->sg[i]);
+ 			u64 mem_addr = sg_dma_address(&data->sg[i]);
+ 
+-			/*
+-			 * Set the OWN bit and disable interrupts for this
+-			 * descriptor
+-			 */
+-			desc->des0 = IDMAC_DES0_OWN | IDMAC_DES0_DIC |
+-						IDMAC_DES0_CH;
+-			/* Buffer length */
+-			IDMAC_64ADDR_SET_BUFFER1_SIZE(desc, length);
+-
+-			/* Physical address to DMA to/from */
+-			desc->des4 = mem_addr & 0xffffffff;
+-			desc->des5 = mem_addr >> 32;
++			for ( ; length ; desc++) {
++				desc_len = (length <= DW_MCI_DESC_DATA_LENGTH) ?
++					   length : DW_MCI_DESC_DATA_LENGTH;
++
++				length -= desc_len;
++
++				/*
++				 * Set the OWN bit and disable interrupts
++				 * for this descriptor
++				 */
++				desc->des0 = IDMAC_DES0_OWN | IDMAC_DES0_DIC |
++							IDMAC_DES0_CH;
++
++				/* Buffer length */
++				IDMAC_64ADDR_SET_BUFFER1_SIZE(desc, desc_len);
++
++				/* Physical address to DMA to/from */
++				desc->des4 = mem_addr & 0xffffffff;
++				desc->des5 = mem_addr >> 32;
++
++				/* Update physical address for the next desc */
++				mem_addr += desc_len;
++
++				/* Save pointer to the last descriptor */
++				desc_last = desc;
++			}
+ 		}
+ 
+ 		/* Set first descriptor */
+-		desc = host->sg_cpu;
+-		desc->des0 |= IDMAC_DES0_FD;
++		desc_first->des0 |= IDMAC_DES0_FD;
+ 
+ 		/* Set last descriptor */
+-		desc = host->sg_cpu + (i - 1) *
+-				sizeof(struct idmac_desc_64addr);
+-		desc->des0 &= ~(IDMAC_DES0_CH | IDMAC_DES0_DIC);
+-		desc->des0 |= IDMAC_DES0_LD;
++		desc_last->des0 &= ~(IDMAC_DES0_CH | IDMAC_DES0_DIC);
++		desc_last->des0 |= IDMAC_DES0_LD;
+ 
+ 	} else {
+-		struct idmac_desc *desc = host->sg_cpu;
++		struct idmac_desc *desc_first, *desc_last, *desc;
++
++		desc_first = desc_last = desc = host->sg_cpu;
+ 
+-		for (i = 0; i < sg_len; i++, desc++) {
++		for (i = 0; i < sg_len; i++) {
+ 			unsigned int length = sg_dma_len(&data->sg[i]);
+ 			u32 mem_addr = sg_dma_address(&data->sg[i]);
+ 
+-			/*
+-			 * Set the OWN bit and disable interrupts for this
+-			 * descriptor
+-			 */
+-			desc->des0 = cpu_to_le32(IDMAC_DES0_OWN |
+-					IDMAC_DES0_DIC | IDMAC_DES0_CH);
+-			/* Buffer length */
+-			IDMAC_SET_BUFFER1_SIZE(desc, length);
++			for ( ; length ; desc++) {
++				desc_len = (length <= DW_MCI_DESC_DATA_LENGTH) ?
++					   length : DW_MCI_DESC_DATA_LENGTH;
++
++				length -= desc_len;
++
++				/*
++				 * Set the OWN bit and disable interrupts
++				 * for this descriptor
++				 */
++				desc->des0 = cpu_to_le32(IDMAC_DES0_OWN |
++							 IDMAC_DES0_DIC |
++							 IDMAC_DES0_CH);
++
++				/* Buffer length */
++				IDMAC_SET_BUFFER1_SIZE(desc, desc_len);
+ 
+-			/* Physical address to DMA to/from */
+-			desc->des2 = cpu_to_le32(mem_addr);
++				/* Physical address to DMA to/from */
++				desc->des2 = cpu_to_le32(mem_addr);
++
++				/* Update physical address for the next desc */
++				mem_addr += desc_len;
++
++				/* Save pointer to the last descriptor */
++				desc_last = desc;
++			}
+ 		}
+ 
+ 		/* Set first descriptor */
+-		desc = host->sg_cpu;
+-		desc->des0 |= cpu_to_le32(IDMAC_DES0_FD);
++		desc_first->des0 |= cpu_to_le32(IDMAC_DES0_FD);
+ 
+ 		/* Set last descriptor */
+-		desc = host->sg_cpu + (i - 1) * sizeof(struct idmac_desc);
+-		desc->des0 &= cpu_to_le32(~(IDMAC_DES0_CH | IDMAC_DES0_DIC));
+-		desc->des0 |= cpu_to_le32(IDMAC_DES0_LD);
++		desc_last->des0 &= cpu_to_le32(~(IDMAC_DES0_CH |
++					       IDMAC_DES0_DIC));
++		desc_last->des0 |= cpu_to_le32(IDMAC_DES0_LD);
+ 	}
+ 
+ 	wmb();
+@@ -2394,7 +2427,7 @@ static int dw_mci_init_slot(struct dw_mci *host, unsigned int id)
+ #ifdef CONFIG_MMC_DW_IDMAC
+ 		mmc->max_segs = host->ring_size;
+ 		mmc->max_blk_size = 65536;
+-		mmc->max_seg_size = 0x1000;
++		mmc->max_seg_size = DW_MCI_DESC_DATA_LENGTH;
+ 		mmc->max_req_size = mmc->max_seg_size * host->ring_size;
+ 		mmc->max_blk_count = mmc->max_req_size / 512;
+ #else
+diff --git a/drivers/mmc/host/sdhci-pxav3.c b/drivers/mmc/host/sdhci-pxav3.c
+index 946d37f94a31..f5edf9d3a18a 100644
+--- a/drivers/mmc/host/sdhci-pxav3.c
++++ b/drivers/mmc/host/sdhci-pxav3.c
+@@ -135,6 +135,7 @@ static int armada_38x_quirks(struct platform_device *pdev,
+ 	struct sdhci_pxa *pxa = pltfm_host->priv;
+ 	struct resource *res;
+ 
++	host->quirks &= ~SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN;
+ 	host->quirks |= SDHCI_QUIRK_MISSING_CAPS;
+ 	res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
+ 					   "conf-sdio3");
+@@ -290,6 +291,9 @@ static void pxav3_set_uhs_signaling(struct sdhci_host *host, unsigned int uhs)
+ 		    uhs == MMC_TIMING_UHS_DDR50) {
+ 			reg_val &= ~SDIO3_CONF_CLK_INV;
+ 			reg_val |= SDIO3_CONF_SD_FB_CLK;
++		} else if (uhs == MMC_TIMING_MMC_HS) {
++			reg_val &= ~SDIO3_CONF_CLK_INV;
++			reg_val &= ~SDIO3_CONF_SD_FB_CLK;
+ 		} else {
+ 			reg_val |= SDIO3_CONF_CLK_INV;
+ 			reg_val &= ~SDIO3_CONF_SD_FB_CLK;
+@@ -398,7 +402,7 @@ static int sdhci_pxav3_probe(struct platform_device *pdev)
+ 	if (of_device_is_compatible(np, "marvell,armada-380-sdhci")) {
+ 		ret = armada_38x_quirks(pdev, host);
+ 		if (ret < 0)
+-			goto err_clk_get;
++			goto err_mbus_win;
+ 		ret = mv_conf_mbus_windows(pdev, mv_mbus_dram_info());
+ 		if (ret < 0)
+ 			goto err_mbus_win;
+diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c
+index 1259cc558ce9..5465fa439c9e 100644
+--- a/drivers/mtd/nand/pxa3xx_nand.c
++++ b/drivers/mtd/nand/pxa3xx_nand.c
+@@ -1473,6 +1473,9 @@ static int pxa3xx_nand_scan(struct mtd_info *mtd)
+ 	if (pdata->keep_config && !pxa3xx_nand_detect_config(info))
+ 		goto KEEP_CONFIG;
+ 
++	/* Set a default chunk size */
++	info->chunk_size = 512;
++
+ 	ret = pxa3xx_nand_sensing(info);
+ 	if (ret) {
+ 		dev_info(&info->pdev->dev, "There is no chip on cs %d!\n",
+diff --git a/drivers/mtd/nand/sunxi_nand.c b/drivers/mtd/nand/sunxi_nand.c
+index 6f93b2990d25..499b8e433d3d 100644
+--- a/drivers/mtd/nand/sunxi_nand.c
++++ b/drivers/mtd/nand/sunxi_nand.c
+@@ -138,6 +138,10 @@
+ #define NFC_ECC_MODE		GENMASK(15, 12)
+ #define NFC_RANDOM_SEED		GENMASK(30, 16)
+ 
++/* NFC_USER_DATA helper macros */
++#define NFC_BUF_TO_USER_DATA(buf)	((buf)[0] | ((buf)[1] << 8) | \
++					((buf)[2] << 16) | ((buf)[3] << 24))
++
+ #define NFC_DEFAULT_TIMEOUT_MS	1000
+ 
+ #define NFC_SRAM_SIZE		1024
+@@ -632,15 +636,9 @@ static int sunxi_nfc_hw_ecc_write_page(struct mtd_info *mtd,
+ 		offset = layout->eccpos[i * ecc->bytes] - 4 + mtd->writesize;
+ 
+ 		/* Fill OOB data in */
+-		if (oob_required) {
+-			tmp = 0xffffffff;
+-			memcpy_toio(nfc->regs + NFC_REG_USER_DATA_BASE, &tmp,
+-				    4);
+-		} else {
+-			memcpy_toio(nfc->regs + NFC_REG_USER_DATA_BASE,
+-				    chip->oob_poi + offset - mtd->writesize,
+-				    4);
+-		}
++		writel(NFC_BUF_TO_USER_DATA(chip->oob_poi +
++					    layout->oobfree[i].offset),
++		       nfc->regs + NFC_REG_USER_DATA_BASE);
+ 
+ 		chip->cmdfunc(mtd, NAND_CMD_RNDIN, offset, -1);
+ 
+@@ -770,14 +768,8 @@ static int sunxi_nfc_hw_syndrome_ecc_write_page(struct mtd_info *mtd,
+ 		offset += ecc->size;
+ 
+ 		/* Fill OOB data in */
+-		if (oob_required) {
+-			tmp = 0xffffffff;
+-			memcpy_toio(nfc->regs + NFC_REG_USER_DATA_BASE, &tmp,
+-				    4);
+-		} else {
+-			memcpy_toio(nfc->regs + NFC_REG_USER_DATA_BASE, oob,
+-				    4);
+-		}
++		writel(NFC_BUF_TO_USER_DATA(oob),
++		       nfc->regs + NFC_REG_USER_DATA_BASE);
+ 
+ 		tmp = NFC_DATA_TRANS | NFC_DATA_SWAP_METHOD | NFC_ACCESS_DIR |
+ 		      (1 << 30);
+@@ -1312,6 +1304,7 @@ static void sunxi_nand_chips_cleanup(struct sunxi_nfc *nfc)
+ 					node);
+ 		nand_release(&chip->mtd);
+ 		sunxi_nand_ecc_cleanup(&chip->nand.ecc);
++		list_del(&chip->node);
+ 	}
+ }
+ 
+diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c
+index 5bbd1f094f4e..1fc23e48fe8e 100644
+--- a/drivers/mtd/ubi/io.c
++++ b/drivers/mtd/ubi/io.c
+@@ -926,6 +926,11 @@ static int validate_vid_hdr(const struct ubi_device *ubi,
+ 		goto bad;
+ 	}
+ 
++	if (data_size > ubi->leb_size) {
++		ubi_err(ubi, "bad data_size");
++		goto bad;
++	}
++
+ 	if (vol_type == UBI_VID_STATIC) {
+ 		/*
+ 		 * Although from high-level point of view static volumes may
+diff --git a/drivers/mtd/ubi/vtbl.c b/drivers/mtd/ubi/vtbl.c
+index 80bdd5b88bac..d85c19762160 100644
+--- a/drivers/mtd/ubi/vtbl.c
++++ b/drivers/mtd/ubi/vtbl.c
+@@ -649,6 +649,7 @@ static int init_volumes(struct ubi_device *ubi,
+ 		if (ubi->corr_peb_count)
+ 			ubi_err(ubi, "%d PEBs are corrupted and not used",
+ 				ubi->corr_peb_count);
++		return -ENOSPC;
+ 	}
+ 	ubi->rsvd_pebs += reserved_pebs;
+ 	ubi->avail_pebs -= reserved_pebs;
+diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
+index 275d9fb6fe5c..eb4489f9082f 100644
+--- a/drivers/mtd/ubi/wl.c
++++ b/drivers/mtd/ubi/wl.c
+@@ -1601,6 +1601,7 @@ int ubi_wl_init(struct ubi_device *ubi, struct ubi_attach_info *ai)
+ 		if (ubi->corr_peb_count)
+ 			ubi_err(ubi, "%d PEBs are corrupted and not used",
+ 				ubi->corr_peb_count);
++		err = -ENOSPC;
+ 		goto out_free;
+ 	}
+ 	ubi->avail_pebs -= reserved_pebs;
+diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
+index 89d788d8f263..adfe1de78d99 100644
+--- a/drivers/net/ethernet/intel/e1000e/netdev.c
++++ b/drivers/net/ethernet/intel/e1000e/netdev.c
+@@ -4280,18 +4280,29 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc)
+ 	struct e1000_adapter *adapter = container_of(cc, struct e1000_adapter,
+ 						     cc);
+ 	struct e1000_hw *hw = &adapter->hw;
++	u32 systimel_1, systimel_2, systimeh;
+ 	cycle_t systim, systim_next;
+-	/* SYSTIMH latching upon SYSTIML read does not work well. To fix that
+-	 * we don't want to allow overflow of SYSTIML and a change to SYSTIMH
+-	 * to occur between reads, so if we read a vale close to overflow, we
+-	 * wait for overflow to occur and read both registers when its safe.
++	/* SYSTIMH latching upon SYSTIML read does not work well.
++	 * This means that if SYSTIML overflows after we read it but before
++	 * we read SYSTIMH, the value of SYSTIMH has been incremented and we
+	 * will experience a huge non-linear increment in the systime value.
+	 * To fix that, we test for overflow and, if true, re-read systime.
+ 	 */
+-	u32 systim_overflow_latch_fix = 0x3FFFFFFF;
+-
+-	do {
+-		systim = (cycle_t)er32(SYSTIML);
+-	} while (systim > systim_overflow_latch_fix);
+-	systim |= (cycle_t)er32(SYSTIMH) << 32;
++	systimel_1 = er32(SYSTIML);
++	systimeh = er32(SYSTIMH);
++	systimel_2 = er32(SYSTIML);
++	/* Check for overflow. If there was no overflow, use the values */
++	if (systimel_1 < systimel_2) {
++		systim = (cycle_t)systimel_1;
++		systim |= (cycle_t)systimeh << 32;
++	} else {
+		/* There was an overflow; read SYSTIMH again and use
++		 * systimel_2
++		 */
++		systimeh = er32(SYSTIMH);
++		systim = (cycle_t)systimel_2;
++		systim |= (cycle_t)systimeh << 32;
++	}
+ 
+ 	if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) {
+ 		u64 incvalue, time_delta, rem, temp;
+diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
+index 8d7b59689722..5bc9fca67957 100644
+--- a/drivers/net/ethernet/intel/igb/igb_main.c
++++ b/drivers/net/ethernet/intel/igb/igb_main.c
+@@ -2851,7 +2851,7 @@ static void igb_probe_vfs(struct igb_adapter *adapter)
+ 		return;
+ 
+ 	pci_sriov_set_totalvfs(pdev, 7);
+-	igb_pci_enable_sriov(pdev, max_vfs);
++	igb_enable_sriov(pdev, max_vfs);
+ 
+ #endif /* CONFIG_PCI_IOV */
+ }
+diff --git a/drivers/net/ethernet/via/Kconfig b/drivers/net/ethernet/via/Kconfig
+index 2f1264b882b9..d3d094742a7e 100644
+--- a/drivers/net/ethernet/via/Kconfig
++++ b/drivers/net/ethernet/via/Kconfig
+@@ -17,7 +17,7 @@ if NET_VENDOR_VIA
+ 
+ config VIA_RHINE
+ 	tristate "VIA Rhine support"
+-	depends on (PCI || OF_IRQ)
++	depends on PCI || (OF_IRQ && GENERIC_PCI_IOMAP)
+ 	depends on HAS_DMA
+ 	select CRC32
+ 	select MII
+diff --git a/drivers/net/wireless/ath/ath10k/htc.c b/drivers/net/wireless/ath/ath10k/htc.c
+index 85bfa2acb801..32d9ff1b19dc 100644
+--- a/drivers/net/wireless/ath/ath10k/htc.c
++++ b/drivers/net/wireless/ath/ath10k/htc.c
+@@ -145,8 +145,10 @@ int ath10k_htc_send(struct ath10k_htc *htc,
+ 	skb_cb->eid = eid;
+ 	skb_cb->paddr = dma_map_single(dev, skb->data, skb->len, DMA_TO_DEVICE);
+ 	ret = dma_mapping_error(dev, skb_cb->paddr);
+-	if (ret)
++	if (ret) {
++		ret = -EIO;
+ 		goto err_credits;
++	}
+ 
+ 	sg_item.transfer_id = ep->eid;
+ 	sg_item.transfer_context = skb;
+diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
+index a60ef7d1d5fc..7be3ce6e0ffa 100644
+--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
++++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
+@@ -371,8 +371,10 @@ int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
+ 	skb_cb->paddr = dma_map_single(dev, msdu->data, msdu->len,
+ 				       DMA_TO_DEVICE);
+ 	res = dma_mapping_error(dev, skb_cb->paddr);
+-	if (res)
++	if (res) {
++		res = -EIO;
+ 		goto err_free_txdesc;
++	}
+ 
+ 	skb_put(txdesc, len);
+ 	cmd = (struct htt_cmd *)txdesc->data;
+@@ -456,8 +458,10 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
+ 	skb_cb->paddr = dma_map_single(dev, msdu->data, msdu->len,
+ 				       DMA_TO_DEVICE);
+ 	res = dma_mapping_error(dev, skb_cb->paddr);
+-	if (res)
++	if (res) {
++		res = -EIO;
+ 		goto err_free_txbuf;
++	}
+ 
+ 	switch (skb_cb->txmode) {
+ 	case ATH10K_HW_TXRX_RAW:
+diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
+index 218b6af63447..0d3c474ff76d 100644
+--- a/drivers/net/wireless/ath/ath10k/mac.c
++++ b/drivers/net/wireless/ath/ath10k/mac.c
+@@ -591,11 +591,19 @@ ath10k_mac_get_any_chandef_iter(struct ieee80211_hw *hw,
+ static int ath10k_peer_create(struct ath10k *ar, u32 vdev_id, const u8 *addr,
+ 			      enum wmi_peer_type peer_type)
+ {
++	struct ath10k_vif *arvif;
++	int num_peers = 0;
+ 	int ret;
+ 
+ 	lockdep_assert_held(&ar->conf_mutex);
+ 
+-	if (ar->num_peers >= ar->max_num_peers)
++	num_peers = ar->num_peers;
++
++	/* Each vdev consumes a peer entry as well */
++	list_for_each_entry(arvif, &ar->arvifs, list)
++		num_peers++;
++
++	if (num_peers >= ar->max_num_peers)
+ 		return -ENOBUFS;
+ 
+ 	ret = ath10k_wmi_peer_create(ar, vdev_id, addr, peer_type);
+@@ -2995,6 +3003,8 @@ void ath10k_mac_tx_unlock(struct ath10k *ar, int reason)
+ 						   IEEE80211_IFACE_ITER_RESUME_ALL,
+ 						   ath10k_mac_tx_unlock_iter,
+ 						   ar);
++
++	ieee80211_wake_queue(ar->hw, ar->hw->offchannel_tx_hw_queue);
+ }
+ 
+ void ath10k_mac_vif_tx_lock(struct ath10k_vif *arvif, int reason)
+@@ -3034,38 +3044,16 @@ static void ath10k_mac_vif_handle_tx_pause(struct ath10k_vif *arvif,
+ 
+ 	lockdep_assert_held(&ar->htt.tx_lock);
+ 
+-	switch (pause_id) {
+-	case WMI_TLV_TX_PAUSE_ID_MCC:
+-	case WMI_TLV_TX_PAUSE_ID_P2P_CLI_NOA:
+-	case WMI_TLV_TX_PAUSE_ID_P2P_GO_PS:
+-	case WMI_TLV_TX_PAUSE_ID_AP_PS:
+-	case WMI_TLV_TX_PAUSE_ID_IBSS_PS:
+-		switch (action) {
+-		case WMI_TLV_TX_PAUSE_ACTION_STOP:
+-			ath10k_mac_vif_tx_lock(arvif, pause_id);
+-			break;
+-		case WMI_TLV_TX_PAUSE_ACTION_WAKE:
+-			ath10k_mac_vif_tx_unlock(arvif, pause_id);
+-			break;
+-		default:
+-			ath10k_warn(ar, "received unknown tx pause action %d on vdev %i, ignoring\n",
+-				    action, arvif->vdev_id);
+-			break;
+-		}
++	switch (action) {
++	case WMI_TLV_TX_PAUSE_ACTION_STOP:
++		ath10k_mac_vif_tx_lock(arvif, pause_id);
++		break;
++	case WMI_TLV_TX_PAUSE_ACTION_WAKE:
++		ath10k_mac_vif_tx_unlock(arvif, pause_id);
+ 		break;
+-	case WMI_TLV_TX_PAUSE_ID_AP_PEER_PS:
+-	case WMI_TLV_TX_PAUSE_ID_AP_PEER_UAPSD:
+-	case WMI_TLV_TX_PAUSE_ID_STA_ADD_BA:
+-	case WMI_TLV_TX_PAUSE_ID_HOST:
+ 	default:
+-		/* FIXME: Some pause_ids aren't vdev specific. Instead they
+-		 * target peer_id and tid. Implementing these could improve
+-		 * traffic scheduling fairness across multiple connected
+-		 * stations in AP/IBSS modes.
+-		 */
+-		ath10k_dbg(ar, ATH10K_DBG_MAC,
+-			   "mac ignoring unsupported tx pause vdev %i id %d\n",
+-			   arvif->vdev_id, pause_id);
++		ath10k_warn(ar, "received unknown tx pause action %d on vdev %i, ignoring\n",
++			    action, arvif->vdev_id);
+ 		break;
+ 	}
+ }
+@@ -3082,12 +3070,15 @@ static void ath10k_mac_handle_tx_pause_iter(void *data, u8 *mac,
+ 	struct ath10k_vif *arvif = ath10k_vif_to_arvif(vif);
+ 	struct ath10k_mac_tx_pause *arg = data;
+ 
++	if (arvif->vdev_id != arg->vdev_id)
++		return;
++
+ 	ath10k_mac_vif_handle_tx_pause(arvif, arg->pause_id, arg->action);
+ }
+ 
+-void ath10k_mac_handle_tx_pause(struct ath10k *ar, u32 vdev_id,
+-				enum wmi_tlv_tx_pause_id pause_id,
+-				enum wmi_tlv_tx_pause_action action)
++void ath10k_mac_handle_tx_pause_vdev(struct ath10k *ar, u32 vdev_id,
++				     enum wmi_tlv_tx_pause_id pause_id,
++				     enum wmi_tlv_tx_pause_action action)
+ {
+ 	struct ath10k_mac_tx_pause arg = {
+ 		.vdev_id = vdev_id,
+@@ -4080,6 +4071,11 @@ static int ath10k_add_interface(struct ieee80211_hw *hw,
+ 		       sizeof(arvif->bitrate_mask.control[i].vht_mcs));
+ 	}
+ 
++	if (ar->num_peers >= ar->max_num_peers) {
++		ath10k_warn(ar, "refusing vdev creation due to insufficient peer entry resources in firmware\n");
++		return -ENOBUFS;
++	}
++
+ 	if (ar->free_vdev_map == 0) {
+ 		ath10k_warn(ar, "Free vdev map is empty, no more interfaces allowed.\n");
+ 		ret = -EBUSY;
+@@ -4287,6 +4283,11 @@ static int ath10k_add_interface(struct ieee80211_hw *hw,
+ 		}
+ 	}
+ 
++	spin_lock_bh(&ar->htt.tx_lock);
++	if (!ar->tx_paused)
++		ieee80211_wake_queue(ar->hw, arvif->vdev_id);
++	spin_unlock_bh(&ar->htt.tx_lock);
++
+ 	mutex_unlock(&ar->conf_mutex);
+ 	return 0;
+ 
+@@ -5561,6 +5562,21 @@ static int ath10k_set_rts_threshold(struct ieee80211_hw *hw, u32 value)
+ 	return ret;
+ }
+ 
++static int ath10k_mac_op_set_frag_threshold(struct ieee80211_hw *hw, u32 value)
++{
++	/* Even though there's a WMI enum for fragmentation threshold, no known
++	 * firmware actually implements it. Moreover it is not possible to leave
++	 * frame fragmentation to mac80211 because firmware clears the "more
++	 * fragments" bit in frame control, making it impossible for remote
++	 * devices to reassemble frames.
++	 *
++	 * Hence implement a dummy callback just to say fragmentation isn't
++	 * supported. This effectively prevents mac80211 from doing frame
++	 * fragmentation in software.
++	 */
++	return -EOPNOTSUPP;
++}
++
+ static void ath10k_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ 			 u32 queues, bool drop)
+ {
+@@ -6395,6 +6411,7 @@ static const struct ieee80211_ops ath10k_ops = {
+ 	.remain_on_channel		= ath10k_remain_on_channel,
+ 	.cancel_remain_on_channel	= ath10k_cancel_remain_on_channel,
+ 	.set_rts_threshold		= ath10k_set_rts_threshold,
++	.set_frag_threshold		= ath10k_mac_op_set_frag_threshold,
+ 	.flush				= ath10k_flush,
+ 	.tx_last_beacon			= ath10k_tx_last_beacon,
+ 	.set_antenna			= ath10k_set_antenna,
+diff --git a/drivers/net/wireless/ath/ath10k/mac.h b/drivers/net/wireless/ath/ath10k/mac.h
+index b291f063705c..e3cefe4c7cfd 100644
+--- a/drivers/net/wireless/ath/ath10k/mac.h
++++ b/drivers/net/wireless/ath/ath10k/mac.h
+@@ -61,9 +61,9 @@ int ath10k_mac_vif_chan(struct ieee80211_vif *vif,
+ 
+ void ath10k_mac_handle_beacon(struct ath10k *ar, struct sk_buff *skb);
+ void ath10k_mac_handle_beacon_miss(struct ath10k *ar, u32 vdev_id);
+-void ath10k_mac_handle_tx_pause(struct ath10k *ar, u32 vdev_id,
+-				enum wmi_tlv_tx_pause_id pause_id,
+-				enum wmi_tlv_tx_pause_action action);
++void ath10k_mac_handle_tx_pause_vdev(struct ath10k *ar, u32 vdev_id,
++				     enum wmi_tlv_tx_pause_id pause_id,
++				     enum wmi_tlv_tx_pause_action action);
+ 
+ u8 ath10k_mac_hw_rate_to_idx(const struct ieee80211_supported_band *sband,
+ 			     u8 hw_rate);
+diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
+index ea656e011a96..8c5cc1facc45 100644
+--- a/drivers/net/wireless/ath/ath10k/pci.c
++++ b/drivers/net/wireless/ath/ath10k/pci.c
+@@ -1546,8 +1546,10 @@ static int ath10k_pci_hif_exchange_bmi_msg(struct ath10k *ar,
+ 
+ 	req_paddr = dma_map_single(ar->dev, treq, req_len, DMA_TO_DEVICE);
+ 	ret = dma_mapping_error(ar->dev, req_paddr);
+-	if (ret)
++	if (ret) {
++		ret = -EIO;
+ 		goto err_dma;
++	}
+ 
+ 	if (resp && resp_len) {
+ 		tresp = kzalloc(*resp_len, GFP_KERNEL);
+@@ -1559,8 +1561,10 @@ static int ath10k_pci_hif_exchange_bmi_msg(struct ath10k *ar,
+ 		resp_paddr = dma_map_single(ar->dev, tresp, *resp_len,
+ 					    DMA_FROM_DEVICE);
+ 		ret = dma_mapping_error(ar->dev, resp_paddr);
+-		if (ret)
++		if (ret) {
++			ret = -EIO;
+ 			goto err_req;
++		}
+ 
+ 		xfer.wait_for_resp = true;
+ 		xfer.resp_len = 0;
+diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.c b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+index 8fdba3865c96..6f477e83099d 100644
+--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c
++++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+@@ -377,12 +377,34 @@ static int ath10k_wmi_tlv_event_tx_pause(struct ath10k *ar,
+ 		   "wmi tlv tx pause pause_id %u action %u vdev_map 0x%08x peer_id %u tid_map 0x%08x\n",
+ 		   pause_id, action, vdev_map, peer_id, tid_map);
+ 
+-	for (vdev_id = 0; vdev_map; vdev_id++) {
+-		if (!(vdev_map & BIT(vdev_id)))
+-			continue;
+-
+-		vdev_map &= ~BIT(vdev_id);
+-		ath10k_mac_handle_tx_pause(ar, vdev_id, pause_id, action);
++	switch (pause_id) {
++	case WMI_TLV_TX_PAUSE_ID_MCC:
++	case WMI_TLV_TX_PAUSE_ID_P2P_CLI_NOA:
++	case WMI_TLV_TX_PAUSE_ID_P2P_GO_PS:
++	case WMI_TLV_TX_PAUSE_ID_AP_PS:
++	case WMI_TLV_TX_PAUSE_ID_IBSS_PS:
++		for (vdev_id = 0; vdev_map; vdev_id++) {
++			if (!(vdev_map & BIT(vdev_id)))
++				continue;
++
++			vdev_map &= ~BIT(vdev_id);
++			ath10k_mac_handle_tx_pause_vdev(ar, vdev_id, pause_id,
++							action);
++		}
++		break;
++	case WMI_TLV_TX_PAUSE_ID_AP_PEER_PS:
++	case WMI_TLV_TX_PAUSE_ID_AP_PEER_UAPSD:
++	case WMI_TLV_TX_PAUSE_ID_STA_ADD_BA:
++	case WMI_TLV_TX_PAUSE_ID_HOST:
++		ath10k_dbg(ar, ATH10K_DBG_MAC,
++			   "mac ignoring unsupported tx pause id %d\n",
++			   pause_id);
++		break;
++	default:
++		ath10k_dbg(ar, ATH10K_DBG_MAC,
++			   "mac ignoring unknown tx pause id %d\n",
++			   pause_id);
++		break;
+ 	}
+ 
+ 	kfree(tb);
+diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
+index 6c046c244705..8dd84c160cfd 100644
+--- a/drivers/net/wireless/ath/ath10k/wmi.c
++++ b/drivers/net/wireless/ath/ath10k/wmi.c
+@@ -2391,6 +2391,7 @@ void ath10k_wmi_event_host_swba(struct ath10k *ar, struct sk_buff *skb)
+ 				ath10k_warn(ar, "failed to map beacon: %d\n",
+ 					    ret);
+ 				dev_kfree_skb_any(bcn);
++				ret = -EIO;
+ 				goto skip;
+ 			}
+ 
+diff --git a/drivers/net/wireless/rsi/rsi_91x_sdio_ops.c b/drivers/net/wireless/rsi/rsi_91x_sdio_ops.c
+index 1c6788aecc62..40d72312f3df 100644
+--- a/drivers/net/wireless/rsi/rsi_91x_sdio_ops.c
++++ b/drivers/net/wireless/rsi/rsi_91x_sdio_ops.c
+@@ -203,8 +203,10 @@ static int rsi_load_ta_instructions(struct rsi_common *common)
+ 
+ 	/* Copy firmware into DMA-accessible memory */
+ 	fw = kmemdup(fw_entry->data, fw_entry->size, GFP_KERNEL);
+-	if (!fw)
+-		return -ENOMEM;
++	if (!fw) {
++		status = -ENOMEM;
++		goto out;
++	}
+ 	len = fw_entry->size;
+ 
+ 	if (len % 4)
+@@ -217,6 +219,8 @@ static int rsi_load_ta_instructions(struct rsi_common *common)
+ 
+ 	status = rsi_copy_to_card(common, fw, len, num_blocks);
+ 	kfree(fw);
++
++out:
+ 	release_firmware(fw_entry);
+ 	return status;
+ }
+diff --git a/drivers/net/wireless/rsi/rsi_91x_usb_ops.c b/drivers/net/wireless/rsi/rsi_91x_usb_ops.c
+index 30c2cf7fa93b..de4900862836 100644
+--- a/drivers/net/wireless/rsi/rsi_91x_usb_ops.c
++++ b/drivers/net/wireless/rsi/rsi_91x_usb_ops.c
+@@ -148,8 +148,10 @@ static int rsi_load_ta_instructions(struct rsi_common *common)
+ 
+ 	/* Copy firmware into DMA-accessible memory */
+ 	fw = kmemdup(fw_entry->data, fw_entry->size, GFP_KERNEL);
+-	if (!fw)
+-		return -ENOMEM;
++	if (!fw) {
++		status = -ENOMEM;
++		goto out;
++	}
+ 	len = fw_entry->size;
+ 
+ 	if (len % 4)
+@@ -162,6 +164,8 @@ static int rsi_load_ta_instructions(struct rsi_common *common)
+ 
+ 	status = rsi_copy_to_card(common, fw, len, num_blocks);
+ 	kfree(fw);
++
++out:
+ 	release_firmware(fw_entry);
+ 	return status;
+ }
+diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
+index f948c46d5132..5ff0cfd142ee 100644
+--- a/drivers/net/xen-netfront.c
++++ b/drivers/net/xen-netfront.c
+@@ -1348,7 +1348,8 @@ static void xennet_disconnect_backend(struct netfront_info *info)
+ 		queue->tx_evtchn = queue->rx_evtchn = 0;
+ 		queue->tx_irq = queue->rx_irq = 0;
+ 
+-		napi_synchronize(&queue->napi);
++		if (netif_running(info->netdev))
++			napi_synchronize(&queue->napi);
+ 
+ 		xennet_release_tx_bufs(queue);
+ 		xennet_release_rx_bufs(queue);
+diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
+index ade9eb917a4d..b796d1bd8988 100644
+--- a/drivers/nvdimm/pmem.c
++++ b/drivers/nvdimm/pmem.c
+@@ -86,6 +86,8 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
+ 	struct pmem_device *pmem = bdev->bd_disk->private_data;
+ 
+ 	pmem_do_bvec(pmem, page, PAGE_CACHE_SIZE, 0, rw, sector);
++	if (rw & WRITE)
++		wmb_pmem();
+ 	page_endio(page, rw & WRITE, 0);
+ 
+ 	return 0;
+diff --git a/drivers/pci/access.c b/drivers/pci/access.c
+index b965c12168b7..502a82ca1db0 100644
+--- a/drivers/pci/access.c
++++ b/drivers/pci/access.c
+@@ -442,7 +442,8 @@ static const struct pci_vpd_ops pci_vpd_pci22_ops = {
+ static ssize_t pci_vpd_f0_read(struct pci_dev *dev, loff_t pos, size_t count,
+ 			       void *arg)
+ {
+-	struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
++	struct pci_dev *tdev = pci_get_slot(dev->bus,
++					    PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
+ 	ssize_t ret;
+ 
+ 	if (!tdev)
+@@ -456,7 +457,8 @@ static ssize_t pci_vpd_f0_read(struct pci_dev *dev, loff_t pos, size_t count,
+ static ssize_t pci_vpd_f0_write(struct pci_dev *dev, loff_t pos, size_t count,
+ 				const void *arg)
+ {
+-	struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
++	struct pci_dev *tdev = pci_get_slot(dev->bus,
++					    PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
+ 	ssize_t ret;
+ 
+ 	if (!tdev)
+@@ -473,22 +475,6 @@ static const struct pci_vpd_ops pci_vpd_f0_ops = {
+ 	.release = pci_vpd_pci22_release,
+ };
+ 
+-static int pci_vpd_f0_dev_check(struct pci_dev *dev)
+-{
+-	struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
+-	int ret = 0;
+-
+-	if (!tdev)
+-		return -ENODEV;
+-	if (!tdev->vpd || !tdev->multifunction ||
+-	    dev->class != tdev->class || dev->vendor != tdev->vendor ||
+-	    dev->device != tdev->device)
+-		ret = -ENODEV;
+-
+-	pci_dev_put(tdev);
+-	return ret;
+-}
+-
+ int pci_vpd_pci22_init(struct pci_dev *dev)
+ {
+ 	struct pci_vpd_pci22 *vpd;
+@@ -497,12 +483,7 @@ int pci_vpd_pci22_init(struct pci_dev *dev)
+ 	cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
+ 	if (!cap)
+ 		return -ENODEV;
+-	if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
+-		int ret = pci_vpd_f0_dev_check(dev);
+ 
+-		if (ret)
+-			return ret;
+-	}
+ 	vpd = kzalloc(sizeof(*vpd), GFP_ATOMIC);
+ 	if (!vpd)
+ 		return -ENOMEM;
+diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
+index 6fbd3f2b5992..d3346d23963b 100644
+--- a/drivers/pci/bus.c
++++ b/drivers/pci/bus.c
+@@ -256,6 +256,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx)
+ 
+ 		res->start = start;
+ 		res->end = end;
++		res->flags &= ~IORESOURCE_UNSET;
++		orig_res.flags &= ~IORESOURCE_UNSET;
+ 		dev_printk(KERN_DEBUG, &dev->dev, "%pR clipped to %pR\n",
+ 				 &orig_res, res);
+ 
+diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
+index dbd13854f21e..6b1c6a915daa 100644
+--- a/drivers/pci/quirks.c
++++ b/drivers/pci/quirks.c
+@@ -1906,11 +1906,27 @@ static void quirk_netmos(struct pci_dev *dev)
+ DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_NETMOS, PCI_ANY_ID,
+ 			 PCI_CLASS_COMMUNICATION_SERIAL, 8, quirk_netmos);
+ 
++/*
++ * Quirk non-zero PCI functions to route VPD access through function 0 for
++ * devices that share VPD resources between functions.  The functions are
++ * expected to be identical devices.
++ */
+ static void quirk_f0_vpd_link(struct pci_dev *dev)
+ {
+-	if (!dev->multifunction || !PCI_FUNC(dev->devfn))
++	struct pci_dev *f0;
++
++	if (!PCI_FUNC(dev->devfn))
+ 		return;
+-	dev->dev_flags |= PCI_DEV_FLAGS_VPD_REF_F0;
++
++	f0 = pci_get_slot(dev->bus, PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
++	if (!f0)
++		return;
++
++	if (f0->vpd && dev->class == f0->class &&
++	    dev->vendor == f0->vendor && dev->device == f0->device)
++		dev->dev_flags |= PCI_DEV_FLAGS_VPD_REF_F0;
++
++	pci_dev_put(f0);
+ }
+ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
+ 			      PCI_CLASS_NETWORK_ETHERNET, 8, quirk_f0_vpd_link);
+diff --git a/drivers/pcmcia/sa1100_generic.c b/drivers/pcmcia/sa1100_generic.c
+index 803945259da8..42861cc70158 100644
+--- a/drivers/pcmcia/sa1100_generic.c
++++ b/drivers/pcmcia/sa1100_generic.c
+@@ -93,7 +93,6 @@ static int sa11x0_drv_pcmcia_remove(struct platform_device *dev)
+ 	for (i = 0; i < sinfo->nskt; i++)
+ 		soc_pcmcia_remove_one(&sinfo->skt[i]);
+ 
+-	clk_put(sinfo->clk);
+ 	kfree(sinfo);
+ 	return 0;
+ }
+diff --git a/drivers/pcmcia/sa11xx_base.c b/drivers/pcmcia/sa11xx_base.c
+index cf6de2c2b329..553d70a67f80 100644
+--- a/drivers/pcmcia/sa11xx_base.c
++++ b/drivers/pcmcia/sa11xx_base.c
+@@ -222,7 +222,7 @@ int sa11xx_drv_pcmcia_probe(struct device *dev, struct pcmcia_low_level *ops,
+ 	int i, ret = 0;
+ 	struct clk *clk;
+ 
+-	clk = clk_get(dev, NULL);
++	clk = devm_clk_get(dev, NULL);
+ 	if (IS_ERR(clk))
+ 		return PTR_ERR(clk);
+ 
+@@ -251,7 +251,6 @@ int sa11xx_drv_pcmcia_probe(struct device *dev, struct pcmcia_low_level *ops,
+ 	if (ret) {
+ 		while (--i >= 0)
+ 			soc_pcmcia_remove_one(&sinfo->skt[i]);
+-		clk_put(clk);
+ 		kfree(sinfo);
+ 	} else {
+ 		dev_set_drvdata(dev, sinfo);
+diff --git a/drivers/platform/x86/toshiba_acpi.c b/drivers/platform/x86/toshiba_acpi.c
+index 3ad7b1fa24ce..6f4f310de946 100644
+--- a/drivers/platform/x86/toshiba_acpi.c
++++ b/drivers/platform/x86/toshiba_acpi.c
+@@ -2408,11 +2408,9 @@ static int toshiba_acpi_setup_keyboard(struct toshiba_acpi_dev *dev)
+ 	if (error)
+ 		return error;
+ 
+-	error = toshiba_hotkey_event_type_get(dev, &events_type);
+-	if (error) {
+-		pr_err("Unable to query Hotkey Event Type\n");
+-		return error;
+-	}
++	if (toshiba_hotkey_event_type_get(dev, &events_type))
++		pr_notice("Unable to query Hotkey Event Type\n");
++
+ 	dev->hotkey_event_type = events_type;
+ 
+ 	dev->hotkey_dev = input_allocate_device();
+diff --git a/drivers/power/avs/Kconfig b/drivers/power/avs/Kconfig
+index 7f3d389bd601..a67eeace6a89 100644
+--- a/drivers/power/avs/Kconfig
++++ b/drivers/power/avs/Kconfig
+@@ -13,7 +13,7 @@ menuconfig POWER_AVS
+ 
+ config ROCKCHIP_IODOMAIN
+         tristate "Rockchip IO domain support"
+-        depends on ARCH_ROCKCHIP && OF
++        depends on POWER_AVS && ARCH_ROCKCHIP && OF
+         help
+           Say y here to enable support io domains on Rockchip SoCs. It is
+           necessary for the io domain setting of the SoC to match the
+diff --git a/drivers/regulator/axp20x-regulator.c b/drivers/regulator/axp20x-regulator.c
+index 646829132b59..1dea0e8353e0 100644
+--- a/drivers/regulator/axp20x-regulator.c
++++ b/drivers/regulator/axp20x-regulator.c
+@@ -192,9 +192,9 @@ static const struct regulator_desc axp22x_regulators[] = {
+ 	AXP_DESC(AXP22X, DCDC3, "dcdc3", "vin3", 600, 1860, 20,
+ 		 AXP22X_DCDC3_V_OUT, 0x3f, AXP22X_PWR_OUT_CTRL1, BIT(3)),
+ 	AXP_DESC(AXP22X, DCDC4, "dcdc4", "vin4", 600, 1540, 20,
+-		 AXP22X_DCDC4_V_OUT, 0x3f, AXP22X_PWR_OUT_CTRL1, BIT(3)),
++		 AXP22X_DCDC4_V_OUT, 0x3f, AXP22X_PWR_OUT_CTRL1, BIT(4)),
+ 	AXP_DESC(AXP22X, DCDC5, "dcdc5", "vin5", 1000, 2550, 50,
+-		 AXP22X_DCDC5_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL1, BIT(4)),
++		 AXP22X_DCDC5_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL1, BIT(5)),
+ 	/* secondary switchable output of DCDC1 */
+ 	AXP_DESC_SW(AXP22X, DC1SW, "dc1sw", "dcdc1", 1600, 3400, 100,
+ 		    AXP22X_DCDC1_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(7)),
+diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
+index 78387a6cbae5..5081533858f1 100644
+--- a/drivers/regulator/core.c
++++ b/drivers/regulator/core.c
+@@ -1376,15 +1376,19 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
+ 		return 0;
+ 
+ 	r = regulator_dev_lookup(dev, rdev->supply_name, &ret);
+-	if (ret == -ENODEV) {
+-		/*
+-		 * No supply was specified for this regulator and
+-		 * there will never be one.
+-		 */
+-		return 0;
+-	}
+-
+ 	if (!r) {
++		if (ret == -ENODEV) {
++			/*
++			 * No supply was specified for this regulator and
++			 * there will never be one.
++			 */
++			return 0;
++		}
++
++		/* Did the lookup explicitly defer for us? */
++		if (ret == -EPROBE_DEFER)
++			return ret;
++
+ 		if (have_full_constraints()) {
+ 			r = dummy_regulator_rdev;
+ 		} else {
+diff --git a/drivers/scsi/3w-9xxx.c b/drivers/scsi/3w-9xxx.c
+index add419d6ff34..a56a7b243e91 100644
+--- a/drivers/scsi/3w-9xxx.c
++++ b/drivers/scsi/3w-9xxx.c
+@@ -212,6 +212,17 @@ static const struct file_operations twa_fops = {
+ 	.llseek		= noop_llseek,
+ };
+ 
++/*
++ * The controllers use an inline buffer instead of a mapped SGL for small,
++ * single entry buffers.  Note that we treat a zero-length transfer like
++ * a mapped SGL.
++ */
++static bool twa_command_mapped(struct scsi_cmnd *cmd)
++{
++	return scsi_sg_count(cmd) != 1 ||
++		scsi_bufflen(cmd) >= TW_MIN_SGL_LENGTH;
++}
++
+ /* This function will complete an aen request from the isr */
+ static int twa_aen_complete(TW_Device_Extension *tw_dev, int request_id)
+ {
+@@ -1339,7 +1350,8 @@ static irqreturn_t twa_interrupt(int irq, void *dev_instance)
+ 				}
+ 
+ 				/* Now complete the io */
+-				scsi_dma_unmap(cmd);
++				if (twa_command_mapped(cmd))
++					scsi_dma_unmap(cmd);
+ 				cmd->scsi_done(cmd);
+ 				tw_dev->state[request_id] = TW_S_COMPLETED;
+ 				twa_free_request_id(tw_dev, request_id);
+@@ -1582,7 +1594,8 @@ static int twa_reset_device_extension(TW_Device_Extension *tw_dev)
+ 				struct scsi_cmnd *cmd = tw_dev->srb[i];
+ 
+ 				cmd->result = (DID_RESET << 16);
+-				scsi_dma_unmap(cmd);
++				if (twa_command_mapped(cmd))
++					scsi_dma_unmap(cmd);
+ 				cmd->scsi_done(cmd);
+ 			}
+ 		}
+@@ -1765,12 +1778,14 @@ static int twa_scsi_queue_lck(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_
+ 	retval = twa_scsiop_execute_scsi(tw_dev, request_id, NULL, 0, NULL);
+ 	switch (retval) {
+ 	case SCSI_MLQUEUE_HOST_BUSY:
+-		scsi_dma_unmap(SCpnt);
++		if (twa_command_mapped(SCpnt))
++			scsi_dma_unmap(SCpnt);
+ 		twa_free_request_id(tw_dev, request_id);
+ 		break;
+ 	case 1:
+ 		SCpnt->result = (DID_ERROR << 16);
+-		scsi_dma_unmap(SCpnt);
++		if (twa_command_mapped(SCpnt))
++			scsi_dma_unmap(SCpnt);
+ 		done(SCpnt);
+ 		tw_dev->state[request_id] = TW_S_COMPLETED;
+ 		twa_free_request_id(tw_dev, request_id);
+@@ -1831,8 +1846,7 @@ static int twa_scsiop_execute_scsi(TW_Device_Extension *tw_dev, int request_id,
+ 		/* Map sglist from scsi layer to cmd packet */
+ 
+ 		if (scsi_sg_count(srb)) {
+-			if ((scsi_sg_count(srb) == 1) &&
+-			    (scsi_bufflen(srb) < TW_MIN_SGL_LENGTH)) {
++			if (!twa_command_mapped(srb)) {
+ 				if (srb->sc_data_direction == DMA_TO_DEVICE ||
+ 				    srb->sc_data_direction == DMA_BIDIRECTIONAL)
+ 					scsi_sg_copy_to_buffer(srb,
+@@ -1905,7 +1919,7 @@ static void twa_scsiop_execute_scsi_complete(TW_Device_Extension *tw_dev, int re
+ {
+ 	struct scsi_cmnd *cmd = tw_dev->srb[request_id];
+ 
+-	if (scsi_bufflen(cmd) < TW_MIN_SGL_LENGTH &&
++	if (!twa_command_mapped(cmd) &&
+ 	    (cmd->sc_data_direction == DMA_FROM_DEVICE ||
+ 	     cmd->sc_data_direction == DMA_BIDIRECTIONAL)) {
+ 		if (scsi_sg_count(cmd) == 1) {
+diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
+index 1dafeb43333b..cab4e98b2b0e 100644
+--- a/drivers/scsi/hpsa.c
++++ b/drivers/scsi/hpsa.c
+@@ -5104,7 +5104,7 @@ static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd)
+ 	int rc;
+ 	struct ctlr_info *h;
+ 	struct hpsa_scsi_dev_t *dev;
+-	char msg[40];
++	char msg[48];
+ 
+ 	/* find the controller to which the command to be aborted was sent */
+ 	h = sdev_to_hba(scsicmd->device);
+@@ -5122,16 +5122,18 @@ static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd)
+ 
+ 	/* if controller locked up, we can guarantee command won't complete */
+ 	if (lockup_detected(h)) {
+-		sprintf(msg, "cmd %d RESET FAILED, lockup detected",
+-				hpsa_get_cmd_index(scsicmd));
++		snprintf(msg, sizeof(msg),
++			 "cmd %d RESET FAILED, lockup detected",
++			 hpsa_get_cmd_index(scsicmd));
+ 		hpsa_show_dev_msg(KERN_WARNING, h, dev, msg);
+ 		return FAILED;
+ 	}
+ 
+ 	/* this reset request might be the result of a lockup; check */
+ 	if (detect_controller_lockup(h)) {
+-		sprintf(msg, "cmd %d RESET FAILED, new lockup detected",
+-				hpsa_get_cmd_index(scsicmd));
++		snprintf(msg, sizeof(msg),
++			 "cmd %d RESET FAILED, new lockup detected",
++			 hpsa_get_cmd_index(scsicmd));
+ 		hpsa_show_dev_msg(KERN_WARNING, h, dev, msg);
+ 		return FAILED;
+ 	}
+@@ -5145,7 +5147,8 @@ static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd)
+ 	/* send a reset to the SCSI LUN which the command was sent to */
+ 	rc = hpsa_do_reset(h, dev, dev->scsi3addr, HPSA_RESET_TYPE_LUN,
+ 			   DEFAULT_REPLY_QUEUE);
+-	sprintf(msg, "reset %s", rc == 0 ? "completed successfully" : "failed");
++	snprintf(msg, sizeof(msg), "reset %s",
++		 rc == 0 ? "completed successfully" : "failed");
+ 	hpsa_show_dev_msg(KERN_WARNING, h, dev, msg);
+ 	return rc == 0 ? SUCCESS : FAILED;
+ }
+diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
+index a9aa38903efe..cccab6188328 100644
+--- a/drivers/scsi/ipr.c
++++ b/drivers/scsi/ipr.c
+@@ -4554,7 +4554,7 @@ static ssize_t ipr_store_raw_mode(struct device *dev,
+ 	spin_lock_irqsave(ioa_cfg->host->host_lock, lock_flags);
+ 	res = (struct ipr_resource_entry *)sdev->hostdata;
+ 	if (res) {
+-		if (ioa_cfg->sis64 && ipr_is_af_dasd_device(res)) {
++		if (ipr_is_af_dasd_device(res)) {
+ 			res->raw_mode = simple_strtoul(buf, NULL, 10);
+ 			len = strlen(buf);
+ 			if (res->sdev)
+diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
+index 6457a8a0db9c..bf3d801ac5f9 100644
+--- a/drivers/scsi/scsi_error.c
++++ b/drivers/scsi/scsi_error.c
+@@ -2169,8 +2169,17 @@ int scsi_error_handler(void *data)
+ 	 * We never actually get interrupted because kthread_run
+ 	 * disables signal delivery for the created thread.
+ 	 */
+-	while (!kthread_should_stop()) {
++	while (true) {
++		/*
++		 * The sequence in kthread_stop() sets the stop flag first
++		 * then wakes the process.  To avoid missed wakeups, the task
++		 * should always be in a non-running state before the stop
++		 * flag is checked.
++		 */
+ 		set_current_state(TASK_INTERRUPTIBLE);
++		if (kthread_should_stop())
++			break;
++
+ 		if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
+ 		    shost->host_failed != atomic_read(&shost->host_busy)) {
+ 			SCSI_LOG_ERROR_RECOVERY(1,
+diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
+index c9357bb393d3..744596464d33 100644
+--- a/drivers/spi/spi-bcm2835.c
++++ b/drivers/spi/spi-bcm2835.c
+@@ -386,14 +386,14 @@ static bool bcm2835_spi_can_dma(struct spi_master *master,
+ 	/* otherwise we only allow transfers within the same page
+ 	 * to avoid wasting time on dma_mapping when it is not practical
+ 	 */
+-	if (((size_t)tfr->tx_buf & PAGE_MASK) + tfr->len > PAGE_SIZE) {
++	if (((size_t)tfr->tx_buf & (PAGE_SIZE - 1)) + tfr->len > PAGE_SIZE) {
+ 		dev_warn_once(&spi->dev,
+ 			      "Unaligned spi tx-transfer bridging page\n");
+ 		return false;
+ 	}
+-	if (((size_t)tfr->rx_buf & PAGE_MASK) + tfr->len > PAGE_SIZE) {
++	if (((size_t)tfr->rx_buf & (PAGE_SIZE - 1)) + tfr->len > PAGE_SIZE) {
+ 		dev_warn_once(&spi->dev,
+-			      "Unaligned spi tx-transfer bridging page\n");
++			      "Unaligned spi rx-transfer bridging page\n");
+ 		return false;
+ 	}
+ 
+diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c
+index 7293d6d875c5..8e4b1a7c37ce 100644
+--- a/drivers/spi/spi-pxa2xx.c
++++ b/drivers/spi/spi-pxa2xx.c
+@@ -643,6 +643,10 @@ static irqreturn_t ssp_int(int irq, void *dev_id)
+ 	if (!(sccr1_reg & SSCR1_TIE))
+ 		mask &= ~SSSR_TFS;
+ 
++	/* Ignore RX timeout interrupt if it is disabled */
++	if (!(sccr1_reg & SSCR1_TINTE))
++		mask &= ~SSSR_TINT;
++
+ 	if (!(status & mask))
+ 		return IRQ_NONE;
+ 
+diff --git a/drivers/spi/spi-xtensa-xtfpga.c b/drivers/spi/spi-xtensa-xtfpga.c
+index 2e32ea2f194f..be6155cba9de 100644
+--- a/drivers/spi/spi-xtensa-xtfpga.c
++++ b/drivers/spi/spi-xtensa-xtfpga.c
+@@ -34,13 +34,13 @@ struct xtfpga_spi {
+ static inline void xtfpga_spi_write32(const struct xtfpga_spi *spi,
+ 				      unsigned addr, u32 val)
+ {
+-	iowrite32(val, spi->regs + addr);
++	__raw_writel(val, spi->regs + addr);
+ }
+ 
+ static inline unsigned int xtfpga_spi_read32(const struct xtfpga_spi *spi,
+ 					     unsigned addr)
+ {
+-	return ioread32(spi->regs + addr);
++	return __raw_readl(spi->regs + addr);
+ }
+ 
+ static inline void xtfpga_spi_wait_busy(struct xtfpga_spi *xspi)
+diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
+index cf8b91b23a76..9ce2f156d382 100644
+--- a/drivers/spi/spi.c
++++ b/drivers/spi/spi.c
+@@ -1437,8 +1437,7 @@ static struct class spi_master_class = {
+  *
+  * The caller is responsible for assigning the bus number and initializing
+  * the master's methods before calling spi_register_master(); and (after errors
+- * adding the device) calling spi_master_put() and kfree() to prevent a memory
+- * leak.
++ * adding the device) calling spi_master_put() to prevent a memory leak.
+  */
+ struct spi_master *spi_alloc_master(struct device *dev, unsigned size)
+ {
+diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
+index c7de64171c45..97aad8f91c2f 100644
+--- a/drivers/spi/spidev.c
++++ b/drivers/spi/spidev.c
+@@ -651,7 +651,8 @@ static int spidev_release(struct inode *inode, struct file *filp)
+ 		kfree(spidev->rx_buffer);
+ 		spidev->rx_buffer = NULL;
+ 
+-		spidev->speed_hz = spidev->spi->max_speed_hz;
++		if (spidev->spi)
++			spidev->speed_hz = spidev->spi->max_speed_hz;
+ 
+ 		/* ... after we unbound from the underlying device? */
+ 		spin_lock_irq(&spidev->spi_lock);
+diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
+index 6f4811263557..b71b1f2d98d5 100644
+--- a/drivers/staging/android/ion/ion.c
++++ b/drivers/staging/android/ion/ion.c
+@@ -1179,13 +1179,13 @@ struct ion_handle *ion_import_dma_buf(struct ion_client *client, int fd)
+ 		mutex_unlock(&client->lock);
+ 		goto end;
+ 	}
+-	mutex_unlock(&client->lock);
+ 
+ 	handle = ion_handle_create(client, buffer);
+-	if (IS_ERR(handle))
++	if (IS_ERR(handle)) {
++		mutex_unlock(&client->lock);
+ 		goto end;
++	}
+ 
+-	mutex_lock(&client->lock);
+ 	ret = ion_handle_add(client, handle);
+ 	mutex_unlock(&client->lock);
+ 	if (ret) {
+diff --git a/drivers/staging/speakup/fakekey.c b/drivers/staging/speakup/fakekey.c
+index 4299cf45f947..5e1f16c36b49 100644
+--- a/drivers/staging/speakup/fakekey.c
++++ b/drivers/staging/speakup/fakekey.c
+@@ -81,6 +81,7 @@ void speakup_fake_down_arrow(void)
+ 	__this_cpu_write(reporting_keystroke, true);
+ 	input_report_key(virt_keyboard, KEY_DOWN, PRESSED);
+ 	input_report_key(virt_keyboard, KEY_DOWN, RELEASED);
++	input_sync(virt_keyboard);
+ 	__this_cpu_write(reporting_keystroke, false);
+ 
+ 	/* reenable preemption */
+diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
+index fd092909a457..56cf1996f30f 100644
+--- a/drivers/target/iscsi/iscsi_target.c
++++ b/drivers/target/iscsi/iscsi_target.c
+@@ -341,7 +341,6 @@ static struct iscsi_np *iscsit_get_np(
+ 
+ struct iscsi_np *iscsit_add_np(
+ 	struct __kernel_sockaddr_storage *sockaddr,
+-	char *ip_str,
+ 	int network_transport)
+ {
+ 	struct sockaddr_in *sock_in;
+@@ -370,11 +369,9 @@ struct iscsi_np *iscsit_add_np(
+ 	np->np_flags |= NPF_IP_NETWORK;
+ 	if (sockaddr->ss_family == AF_INET6) {
+ 		sock_in6 = (struct sockaddr_in6 *)sockaddr;
+-		snprintf(np->np_ip, IPV6_ADDRESS_SPACE, "%s", ip_str);
+ 		np->np_port = ntohs(sock_in6->sin6_port);
+ 	} else {
+ 		sock_in = (struct sockaddr_in *)sockaddr;
+-		sprintf(np->np_ip, "%s", ip_str);
+ 		np->np_port = ntohs(sock_in->sin_port);
+ 	}
+ 
+@@ -411,8 +408,8 @@ struct iscsi_np *iscsit_add_np(
+ 	list_add_tail(&np->np_list, &g_np_list);
+ 	mutex_unlock(&np_lock);
+ 
+-	pr_debug("CORE[0] - Added Network Portal: %s:%hu on %s\n",
+-		np->np_ip, np->np_port, np->np_transport->name);
++	pr_debug("CORE[0] - Added Network Portal: %pISc:%hu on %s\n",
++		&np->np_sockaddr, np->np_port, np->np_transport->name);
+ 
+ 	return np;
+ }
+@@ -481,8 +478,8 @@ int iscsit_del_np(struct iscsi_np *np)
+ 	list_del(&np->np_list);
+ 	mutex_unlock(&np_lock);
+ 
+-	pr_debug("CORE[0] - Removed Network Portal: %s:%hu on %s\n",
+-		np->np_ip, np->np_port, np->np_transport->name);
++	pr_debug("CORE[0] - Removed Network Portal: %pISc:%hu on %s\n",
++		&np->np_sockaddr, np->np_port, np->np_transport->name);
+ 
+ 	iscsit_put_transport(np->np_transport);
+ 	kfree(np);
+@@ -3464,7 +3461,6 @@ iscsit_build_sendtargets_response(struct iscsi_cmd *cmd,
+ 						tpg_np_list) {
+ 				struct iscsi_np *np = tpg_np->tpg_np;
+ 				bool inaddr_any = iscsit_check_inaddr_any(np);
+-				char *fmt_str;
+ 
+ 				if (np->np_network_transport != network_transport)
+ 					continue;
+@@ -3492,15 +3488,18 @@ iscsit_build_sendtargets_response(struct iscsi_cmd *cmd,
+ 					}
+ 				}
+ 
+-				if (np->np_sockaddr.ss_family == AF_INET6)
+-					fmt_str = "TargetAddress=[%s]:%hu,%hu";
+-				else
+-					fmt_str = "TargetAddress=%s:%hu,%hu";
+-
+-				len = sprintf(buf, fmt_str,
+-					inaddr_any ? conn->local_ip : np->np_ip,
+-					np->np_port,
+-					tpg->tpgt);
++				if (inaddr_any) {
++					len = sprintf(buf, "TargetAddress="
++						      "%s:%hu,%hu",
++						      conn->local_ip,
++						      np->np_port,
++						      tpg->tpgt);
++				} else {
++					len = sprintf(buf, "TargetAddress="
++						      "%pISpc,%hu",
++						      &np->np_sockaddr,
++						      tpg->tpgt);
++				}
+ 				len += 1;
+ 
+ 				if ((len + payload_len) > buffer_len) {
+diff --git a/drivers/target/iscsi/iscsi_target.h b/drivers/target/iscsi/iscsi_target.h
+index 7d0f9c00d9c2..d294f030a097 100644
+--- a/drivers/target/iscsi/iscsi_target.h
++++ b/drivers/target/iscsi/iscsi_target.h
+@@ -13,7 +13,7 @@ extern int iscsit_deaccess_np(struct iscsi_np *, struct iscsi_portal_group *,
+ extern bool iscsit_check_np_match(struct __kernel_sockaddr_storage *,
+ 				struct iscsi_np *, int);
+ extern struct iscsi_np *iscsit_add_np(struct __kernel_sockaddr_storage *,
+-				char *, int);
++				int);
+ extern int iscsit_reset_np_thread(struct iscsi_np *, struct iscsi_tpg_np *,
+ 				struct iscsi_portal_group *, bool);
+ extern int iscsit_del_np(struct iscsi_np *);
+diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c
+index c1898c84b3d2..db3b9b986954 100644
+--- a/drivers/target/iscsi/iscsi_target_configfs.c
++++ b/drivers/target/iscsi/iscsi_target_configfs.c
+@@ -99,7 +99,7 @@ static ssize_t lio_target_np_store_sctp(
+ 		 * Use existing np->np_sockaddr for SCTP network portal reference
+ 		 */
+ 		tpg_np_sctp = iscsit_tpg_add_network_portal(tpg, &np->np_sockaddr,
+-					np->np_ip, tpg_np, ISCSI_SCTP_TCP);
++					tpg_np, ISCSI_SCTP_TCP);
+ 		if (!tpg_np_sctp || IS_ERR(tpg_np_sctp))
+ 			goto out;
+ 	} else {
+@@ -177,7 +177,7 @@ static ssize_t lio_target_np_store_iser(
+ 		}
+ 
+ 		tpg_np_iser = iscsit_tpg_add_network_portal(tpg, &np->np_sockaddr,
+-				np->np_ip, tpg_np, ISCSI_INFINIBAND);
++				tpg_np, ISCSI_INFINIBAND);
+ 		if (IS_ERR(tpg_np_iser)) {
+ 			rc = PTR_ERR(tpg_np_iser);
+ 			goto out;
+@@ -248,8 +248,8 @@ static struct se_tpg_np *lio_target_call_addnptotpg(
+ 			return ERR_PTR(-EINVAL);
+ 		}
+ 		str++; /* Skip over leading "[" */
+-		*str2 = '\0'; /* Terminate the IPv6 address */
+-		str2++; /* Skip over the "]" */
++		*str2 = '\0'; /* Terminate the unbracketed IPv6 address */
++		str2++; /* Skip over the \0 */
+ 		port_str = strstr(str2, ":");
+ 		if (!port_str) {
+ 			pr_err("Unable to locate \":port\""
+@@ -316,7 +316,7 @@ static struct se_tpg_np *lio_target_call_addnptotpg(
+ 	 * sys/kernel/config/iscsi/$IQN/$TPG/np/$IP:$PORT/
+ 	 *
+ 	 */
+-	tpg_np = iscsit_tpg_add_network_portal(tpg, &sockaddr, str, NULL,
++	tpg_np = iscsit_tpg_add_network_portal(tpg, &sockaddr, NULL,
+ 				ISCSI_TCP);
+ 	if (IS_ERR(tpg_np)) {
+ 		iscsit_put_tpg(tpg);
+@@ -344,8 +344,8 @@ static void lio_target_call_delnpfromtpg(
+ 
+ 	se_tpg = &tpg->tpg_se_tpg;
+ 	pr_debug("LIO_Target_ConfigFS: DEREGISTER -> %s TPGT: %hu"
+-		" PORTAL: %s:%hu\n", config_item_name(&se_tpg->se_tpg_wwn->wwn_group.cg_item),
+-		tpg->tpgt, tpg_np->tpg_np->np_ip, tpg_np->tpg_np->np_port);
++		" PORTAL: %pISc:%hu\n", config_item_name(&se_tpg->se_tpg_wwn->wwn_group.cg_item),
++		tpg->tpgt, &tpg_np->tpg_np->np_sockaddr, tpg_np->tpg_np->np_port);
+ 
+ 	ret = iscsit_tpg_del_network_portal(tpg, tpg_np);
+ 	if (ret < 0)
+diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
+index 7e8f65e5448f..666c0739bfbe 100644
+--- a/drivers/target/iscsi/iscsi_target_login.c
++++ b/drivers/target/iscsi/iscsi_target_login.c
+@@ -823,8 +823,8 @@ static void iscsi_handle_login_thread_timeout(unsigned long data)
+ 	struct iscsi_np *np = (struct iscsi_np *) data;
+ 
+ 	spin_lock_bh(&np->np_thread_lock);
+-	pr_err("iSCSI Login timeout on Network Portal %s:%hu\n",
+-			np->np_ip, np->np_port);
++	pr_err("iSCSI Login timeout on Network Portal %pISc:%hu\n",
++			&np->np_sockaddr, np->np_port);
+ 
+ 	if (np->np_login_timer_flags & ISCSI_TF_STOP) {
+ 		spin_unlock_bh(&np->np_thread_lock);
+@@ -1302,8 +1302,8 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
+ 	spin_lock_bh(&np->np_thread_lock);
+ 	if (np->np_thread_state != ISCSI_NP_THREAD_ACTIVE) {
+ 		spin_unlock_bh(&np->np_thread_lock);
+-		pr_err("iSCSI Network Portal on %s:%hu currently not"
+-			" active.\n", np->np_ip, np->np_port);
++		pr_err("iSCSI Network Portal on %pISc:%hu currently not"
++			" active.\n", &np->np_sockaddr, np->np_port);
+ 		iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR,
+ 				ISCSI_LOGIN_STATUS_SVC_UNAVAILABLE);
+ 		goto new_sess_out;
+diff --git a/drivers/target/iscsi/iscsi_target_parameters.c b/drivers/target/iscsi/iscsi_target_parameters.c
+index e8a52f7d6204..51d1734d5390 100644
+--- a/drivers/target/iscsi/iscsi_target_parameters.c
++++ b/drivers/target/iscsi/iscsi_target_parameters.c
+@@ -407,6 +407,7 @@ int iscsi_create_default_params(struct iscsi_param_list **param_list_ptr)
+ 			TYPERANGE_UTF8, USE_INITIAL_ONLY);
+ 	if (!param)
+ 		goto out;
++
+ 	/*
+ 	 * Extra parameters for ISER from RFC-5046
+ 	 */
+@@ -496,9 +497,9 @@ int iscsi_set_keys_to_negotiate(
+ 		} else if (!strcmp(param->name, SESSIONTYPE)) {
+ 			SET_PSTATE_NEGOTIATE(param);
+ 		} else if (!strcmp(param->name, IFMARKER)) {
+-			SET_PSTATE_NEGOTIATE(param);
++			SET_PSTATE_REJECT(param);
+ 		} else if (!strcmp(param->name, OFMARKER)) {
+-			SET_PSTATE_NEGOTIATE(param);
++			SET_PSTATE_REJECT(param);
+ 		} else if (!strcmp(param->name, IFMARKINT)) {
+ 			SET_PSTATE_REJECT(param);
+ 		} else if (!strcmp(param->name, OFMARKINT)) {
+diff --git a/drivers/target/iscsi/iscsi_target_tpg.c b/drivers/target/iscsi/iscsi_target_tpg.c
+index 968068ffcb1c..de26bee4bddd 100644
+--- a/drivers/target/iscsi/iscsi_target_tpg.c
++++ b/drivers/target/iscsi/iscsi_target_tpg.c
+@@ -460,7 +460,6 @@ static bool iscsit_tpg_check_network_portal(
+ struct iscsi_tpg_np *iscsit_tpg_add_network_portal(
+ 	struct iscsi_portal_group *tpg,
+ 	struct __kernel_sockaddr_storage *sockaddr,
+-	char *ip_str,
+ 	struct iscsi_tpg_np *tpg_np_parent,
+ 	int network_transport)
+ {
+@@ -470,8 +469,8 @@ struct iscsi_tpg_np *iscsit_tpg_add_network_portal(
+ 	if (!tpg_np_parent) {
+ 		if (iscsit_tpg_check_network_portal(tpg->tpg_tiqn, sockaddr,
+ 				network_transport)) {
+-			pr_err("Network Portal: %s already exists on a"
+-				" different TPG on %s\n", ip_str,
++			pr_err("Network Portal: %pISc already exists on a"
++				" different TPG on %s\n", sockaddr,
+ 				tpg->tpg_tiqn->tiqn);
+ 			return ERR_PTR(-EEXIST);
+ 		}
+@@ -484,7 +483,7 @@ struct iscsi_tpg_np *iscsit_tpg_add_network_portal(
+ 		return ERR_PTR(-ENOMEM);
+ 	}
+ 
+-	np = iscsit_add_np(sockaddr, ip_str, network_transport);
++	np = iscsit_add_np(sockaddr, network_transport);
+ 	if (IS_ERR(np)) {
+ 		kfree(tpg_np);
+ 		return ERR_CAST(np);
+@@ -514,8 +513,8 @@ struct iscsi_tpg_np *iscsit_tpg_add_network_portal(
+ 		spin_unlock(&tpg_np_parent->tpg_np_parent_lock);
+ 	}
+ 
+-	pr_debug("CORE[%s] - Added Network Portal: %s:%hu,%hu on %s\n",
+-		tpg->tpg_tiqn->tiqn, np->np_ip, np->np_port, tpg->tpgt,
++	pr_debug("CORE[%s] - Added Network Portal: %pISc:%hu,%hu on %s\n",
++		tpg->tpg_tiqn->tiqn, &np->np_sockaddr, np->np_port, tpg->tpgt,
+ 		np->np_transport->name);
+ 
+ 	return tpg_np;
+@@ -528,8 +527,8 @@ static int iscsit_tpg_release_np(
+ {
+ 	iscsit_clear_tpg_np_login_thread(tpg_np, tpg, true);
+ 
+-	pr_debug("CORE[%s] - Removed Network Portal: %s:%hu,%hu on %s\n",
+-		tpg->tpg_tiqn->tiqn, np->np_ip, np->np_port, tpg->tpgt,
++	pr_debug("CORE[%s] - Removed Network Portal: %pISc:%hu,%hu on %s\n",
++		tpg->tpg_tiqn->tiqn, &np->np_sockaddr, np->np_port, tpg->tpgt,
+ 		np->np_transport->name);
+ 
+ 	tpg_np->tpg_np = NULL;
+diff --git a/drivers/target/iscsi/iscsi_target_tpg.h b/drivers/target/iscsi/iscsi_target_tpg.h
+index 95ff5bdecd71..28abda89ea98 100644
+--- a/drivers/target/iscsi/iscsi_target_tpg.h
++++ b/drivers/target/iscsi/iscsi_target_tpg.h
+@@ -22,7 +22,7 @@ extern struct iscsi_node_attrib *iscsit_tpg_get_node_attrib(struct iscsi_session
+ extern void iscsit_tpg_del_external_nps(struct iscsi_tpg_np *);
+ extern struct iscsi_tpg_np *iscsit_tpg_locate_child_np(struct iscsi_tpg_np *, int);
+ extern struct iscsi_tpg_np *iscsit_tpg_add_network_portal(struct iscsi_portal_group *,
+-			struct __kernel_sockaddr_storage *, char *, struct iscsi_tpg_np *,
++			struct __kernel_sockaddr_storage *, struct iscsi_tpg_np *,
+ 			int);
+ extern int iscsit_tpg_del_network_portal(struct iscsi_portal_group *,
+ 			struct iscsi_tpg_np *);
+diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
+index 09e682b1c549..8f1cd194f06a 100644
+--- a/drivers/target/target_core_device.c
++++ b/drivers/target/target_core_device.c
+@@ -427,8 +427,6 @@ void core_disable_device_list_for_node(
+ 
+ 	hlist_del_rcu(&orig->link);
+ 	clear_bit(DEF_PR_REG_ACTIVE, &orig->deve_flags);
+-	rcu_assign_pointer(orig->se_lun, NULL);
+-	rcu_assign_pointer(orig->se_lun_acl, NULL);
+ 	orig->lun_flags = 0;
+ 	orig->creation_time = 0;
+ 	orig->attach_count--;
+@@ -439,6 +437,9 @@ void core_disable_device_list_for_node(
+ 	kref_put(&orig->pr_kref, target_pr_kref_release);
+ 	wait_for_completion(&orig->pr_comp);
+ 
++	rcu_assign_pointer(orig->se_lun, NULL);
++	rcu_assign_pointer(orig->se_lun_acl, NULL);
++
+ 	kfree_rcu(orig, rcu_head);
+ 
+ 	core_scsi3_free_pr_reg_from_nacl(dev, nacl);
+diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_pr.c
+index 5ab7100de17e..e7933115087a 100644
+--- a/drivers/target/target_core_pr.c
++++ b/drivers/target/target_core_pr.c
+@@ -618,7 +618,7 @@ static struct t10_pr_registration *__core_scsi3_do_alloc_registration(
+ 	struct se_device *dev,
+ 	struct se_node_acl *nacl,
+ 	struct se_lun *lun,
+-	struct se_dev_entry *deve,
++	struct se_dev_entry *dest_deve,
+ 	u64 mapped_lun,
+ 	unsigned char *isid,
+ 	u64 sa_res_key,
+@@ -640,7 +640,29 @@ static struct t10_pr_registration *__core_scsi3_do_alloc_registration(
+ 	INIT_LIST_HEAD(&pr_reg->pr_reg_atp_mem_list);
+ 	atomic_set(&pr_reg->pr_res_holders, 0);
+ 	pr_reg->pr_reg_nacl = nacl;
+-	pr_reg->pr_reg_deve = deve;
++	/*
++	 * For destination registrations for ALL_TG_PT=1 and SPEC_I_PT=1,
++	 * the se_dev_entry->pr_ref will have been already obtained by
++	 * core_get_se_deve_from_rtpi() or __core_scsi3_alloc_registration().
++	 *
++	 * Otherwise, locate se_dev_entry now and obtain a reference until
++	 * registration completes in __core_scsi3_add_registration().
++	 */
++	if (dest_deve) {
++		pr_reg->pr_reg_deve = dest_deve;
++	} else {
++		rcu_read_lock();
++		pr_reg->pr_reg_deve = target_nacl_find_deve(nacl, mapped_lun);
++		if (!pr_reg->pr_reg_deve) {
++			rcu_read_unlock();
++			pr_err("Unable to locate PR deve %s mapped_lun: %llu\n",
++				nacl->initiatorname, mapped_lun);
++			kmem_cache_free(t10_pr_reg_cache, pr_reg);
++			return NULL;
++		}
++		kref_get(&pr_reg->pr_reg_deve->pr_kref);
++		rcu_read_unlock();
++	}
+ 	pr_reg->pr_res_mapped_lun = mapped_lun;
+ 	pr_reg->pr_aptpl_target_lun = lun->unpacked_lun;
+ 	pr_reg->tg_pt_sep_rtpi = lun->lun_rtpi;
+@@ -936,17 +958,29 @@ static int __core_scsi3_check_aptpl_registration(
+ 		    !(strcmp(pr_reg->pr_tport, t_port)) &&
+ 		     (pr_reg->pr_reg_tpgt == tpgt) &&
+ 		     (pr_reg->pr_aptpl_target_lun == target_lun)) {
++			/*
++			 * Obtain the ->pr_reg_deve pointer + reference, that
++			 * is released by __core_scsi3_add_registration() below.
++			 */
++			rcu_read_lock();
++			pr_reg->pr_reg_deve = target_nacl_find_deve(nacl, mapped_lun);
++			if (!pr_reg->pr_reg_deve) {
++				pr_err("Unable to locate PR APTPL %s mapped_lun:"
++					" %llu\n", nacl->initiatorname, mapped_lun);
++				rcu_read_unlock();
++				continue;
++			}
++			kref_get(&pr_reg->pr_reg_deve->pr_kref);
++			rcu_read_unlock();
+ 
+ 			pr_reg->pr_reg_nacl = nacl;
+ 			pr_reg->tg_pt_sep_rtpi = lun->lun_rtpi;
+-
+ 			list_del(&pr_reg->pr_reg_aptpl_list);
+ 			spin_unlock(&pr_tmpl->aptpl_reg_lock);
+ 			/*
+ 			 * At this point all of the pointers in *pr_reg will
+ 			 * be setup, so go ahead and add the registration.
+ 			 */
+-
+ 			__core_scsi3_add_registration(dev, nacl, pr_reg, 0, 0);
+ 			/*
+ 			 * If this registration is the reservation holder,
+@@ -1044,18 +1078,11 @@ static void __core_scsi3_add_registration(
+ 
+ 	__core_scsi3_dump_registration(tfo, dev, nacl, pr_reg, register_type);
+ 	spin_unlock(&pr_tmpl->registration_lock);
+-
+-	rcu_read_lock();
+-	deve = pr_reg->pr_reg_deve;
+-	if (deve)
+-		set_bit(DEF_PR_REG_ACTIVE, &deve->deve_flags);
+-	rcu_read_unlock();
+-
+ 	/*
+ 	 * Skip extra processing for ALL_TG_PT=0 or REGISTER_AND_MOVE.
+ 	 */
+ 	if (!pr_reg->pr_reg_all_tg_pt || register_move)
+-		return;
++		goto out;
+ 	/*
+ 	 * Walk pr_reg->pr_reg_atp_list and add registrations for ALL_TG_PT=1
+ 	 * allocated in __core_scsi3_alloc_registration()
+@@ -1075,19 +1102,31 @@ static void __core_scsi3_add_registration(
+ 		__core_scsi3_dump_registration(tfo, dev, nacl_tmp, pr_reg_tmp,
+ 					       register_type);
+ 		spin_unlock(&pr_tmpl->registration_lock);
+-
++		/*
++		 * Drop configfs group dependency reference and deve->pr_kref
++		 * obtained from __core_scsi3_alloc_registration() code.
++		 */
+ 		rcu_read_lock();
+ 		deve = pr_reg_tmp->pr_reg_deve;
+-		if (deve)
++		if (deve) {
+ 			set_bit(DEF_PR_REG_ACTIVE, &deve->deve_flags);
++			core_scsi3_lunacl_undepend_item(deve);
++			pr_reg_tmp->pr_reg_deve = NULL;
++		}
+ 		rcu_read_unlock();
+-
+-		/*
+-		 * Drop configfs group dependency reference from
+-		 * __core_scsi3_alloc_registration()
+-		 */
+-		core_scsi3_lunacl_undepend_item(pr_reg_tmp->pr_reg_deve);
+ 	}
++out:
++	/*
++	 * Drop deve->pr_kref obtained in __core_scsi3_do_alloc_registration()
++	 */
++	rcu_read_lock();
++	deve = pr_reg->pr_reg_deve;
++	if (deve) {
++		set_bit(DEF_PR_REG_ACTIVE, &deve->deve_flags);
++		kref_put(&deve->pr_kref, target_pr_kref_release);
++		pr_reg->pr_reg_deve = NULL;
++	}
++	rcu_read_unlock();
+ }
+ 
+ static int core_scsi3_alloc_registration(
+@@ -1785,9 +1824,11 @@ core_scsi3_decode_spec_i_port(
+ 			dest_node_acl->initiatorname, i_buf, (dest_se_deve) ?
+ 			dest_se_deve->mapped_lun : 0);
+ 
+-		if (!dest_se_deve)
++		if (!dest_se_deve) {
++			kref_put(&local_pr_reg->pr_reg_deve->pr_kref,
++				 target_pr_kref_release);
+ 			continue;
+-
++		}
+ 		core_scsi3_lunacl_undepend_item(dest_se_deve);
+ 		core_scsi3_nodeacl_undepend_item(dest_node_acl);
+ 		core_scsi3_tpg_undepend_item(dest_tpg);
+@@ -1823,9 +1864,11 @@ out:
+ 
+ 		kmem_cache_free(t10_pr_reg_cache, dest_pr_reg);
+ 
+-		if (!dest_se_deve)
++		if (!dest_se_deve) {
++			kref_put(&local_pr_reg->pr_reg_deve->pr_kref,
++				 target_pr_kref_release);
+ 			continue;
+-
++		}
+ 		core_scsi3_lunacl_undepend_item(dest_se_deve);
+ 		core_scsi3_nodeacl_undepend_item(dest_node_acl);
+ 		core_scsi3_tpg_undepend_item(dest_tpg);
+diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c
+index 4515f52546f8..47fe94ee10b8 100644
+--- a/drivers/target/target_core_xcopy.c
++++ b/drivers/target/target_core_xcopy.c
+@@ -450,6 +450,8 @@ int target_xcopy_setup_pt(void)
+ 	memset(&xcopy_pt_sess, 0, sizeof(struct se_session));
+ 	INIT_LIST_HEAD(&xcopy_pt_sess.sess_list);
+ 	INIT_LIST_HEAD(&xcopy_pt_sess.sess_acl_list);
++	INIT_LIST_HEAD(&xcopy_pt_sess.sess_cmd_list);
++	spin_lock_init(&xcopy_pt_sess.sess_cmd_lock);
+ 
+ 	xcopy_pt_nacl.se_tpg = &xcopy_pt_tpg;
+ 	xcopy_pt_nacl.nacl_sess = &xcopy_pt_sess;
+@@ -644,7 +646,7 @@ static int target_xcopy_read_source(
+ 	pr_debug("XCOPY: Built READ_16: LBA: %llu Sectors: %u Length: %u\n",
+ 		(unsigned long long)src_lba, src_sectors, length);
+ 
+-	transport_init_se_cmd(se_cmd, &xcopy_pt_tfo, NULL, length,
++	transport_init_se_cmd(se_cmd, &xcopy_pt_tfo, &xcopy_pt_sess, length,
+ 			      DMA_FROM_DEVICE, 0, &xpt_cmd->sense_buffer[0]);
+ 	xop->src_pt_cmd = xpt_cmd;
+ 
+@@ -704,7 +706,7 @@ static int target_xcopy_write_destination(
+ 	pr_debug("XCOPY: Built WRITE_16: LBA: %llu Sectors: %u Length: %u\n",
+ 		(unsigned long long)dst_lba, dst_sectors, length);
+ 
+-	transport_init_se_cmd(se_cmd, &xcopy_pt_tfo, NULL, length,
++	transport_init_se_cmd(se_cmd, &xcopy_pt_tfo, &xcopy_pt_sess, length,
+ 			      DMA_TO_DEVICE, 0, &xpt_cmd->sense_buffer[0]);
+ 	xop->dst_pt_cmd = xpt_cmd;
+ 
+diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
+index 620dcd405ff6..42c6f71bdcc1 100644
+--- a/drivers/thermal/cpu_cooling.c
++++ b/drivers/thermal/cpu_cooling.c
+@@ -262,7 +262,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
+  * efficiently.  Power is stored in mW, frequency in KHz.  The
+  * resulting table is in ascending order.
+  *
+- * Return: 0 on success, -E* on error.
++ * Return: 0 on success, -EINVAL if there are no OPPs for any CPUs,
++ * -ENOMEM if we run out of memory or -EAGAIN if an OPP was
++ * added/enabled while the function was executing.
+  */
+ static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_device,
+ 				 u32 capacitance)
+@@ -273,8 +275,6 @@ static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_device,
+ 	int num_opps = 0, cpu, i, ret = 0;
+ 	unsigned long freq;
+ 
+-	rcu_read_lock();
+-
+ 	for_each_cpu(cpu, &cpufreq_device->allowed_cpus) {
+ 		dev = get_cpu_device(cpu);
+ 		if (!dev) {
+@@ -284,24 +284,20 @@ static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_device,
+ 		}
+ 
+ 		num_opps = dev_pm_opp_get_opp_count(dev);
+-		if (num_opps > 0) {
++		if (num_opps > 0)
+ 			break;
+-		} else if (num_opps < 0) {
+-			ret = num_opps;
+-			goto unlock;
+-		}
++		else if (num_opps < 0)
++			return num_opps;
+ 	}
+ 
+-	if (num_opps == 0) {
+-		ret = -EINVAL;
+-		goto unlock;
+-	}
++	if (num_opps == 0)
++		return -EINVAL;
+ 
+ 	power_table = kcalloc(num_opps, sizeof(*power_table), GFP_KERNEL);
+-	if (!power_table) {
+-		ret = -ENOMEM;
+-		goto unlock;
+-	}
++	if (!power_table)
++		return -ENOMEM;
++
++	rcu_read_lock();
+ 
+ 	for (freq = 0, i = 0;
+ 	     opp = dev_pm_opp_find_freq_ceil(dev, &freq), !IS_ERR(opp);
+@@ -309,6 +305,12 @@ static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_device,
+ 		u32 freq_mhz, voltage_mv;
+ 		u64 power;
+ 
++		if (i >= num_opps) {
++			rcu_read_unlock();
++			ret = -EAGAIN;
++			goto free_power_table;
++		}
++
+ 		freq_mhz = freq / 1000000;
+ 		voltage_mv = dev_pm_opp_get_voltage(opp) / 1000;
+ 
+@@ -326,17 +328,22 @@ static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_device,
+ 		power_table[i].power = power;
+ 	}
+ 
+-	if (i == 0) {
++	rcu_read_unlock();
++
++	if (i != num_opps) {
+ 		ret = PTR_ERR(opp);
+-		goto unlock;
++		goto free_power_table;
+ 	}
+ 
+ 	cpufreq_device->cpu_dev = dev;
+ 	cpufreq_device->dyn_power_table = power_table;
+ 	cpufreq_device->dyn_power_table_entries = i;
+ 
+-unlock:
+-	rcu_read_unlock();
++	return 0;
++
++free_power_table:
++	kfree(power_table);
++
+ 	return ret;
+ }
+ 
+@@ -847,7 +854,7 @@ __cpufreq_cooling_register(struct device_node *np,
+ 	ret = get_idr(&cpufreq_idr, &cpufreq_dev->id);
+ 	if (ret) {
+ 		cool_dev = ERR_PTR(ret);
+-		goto free_table;
++		goto free_power_table;
+ 	}
+ 
+ 	snprintf(dev_name, sizeof(dev_name), "thermal-cpufreq-%d",
+@@ -889,6 +896,8 @@ __cpufreq_cooling_register(struct device_node *np,
+ 
+ remove_idr:
+ 	release_idr(&cpufreq_idr, cpufreq_dev->id);
++free_power_table:
++	kfree(cpufreq_dev->dyn_power_table);
+ free_table:
+ 	kfree(cpufreq_dev->freq_table);
+ free_time_in_idle_timestamp:
+@@ -1039,6 +1048,7 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
+ 
+ 	thermal_cooling_device_unregister(cpufreq_dev->cool_dev);
+ 	release_idr(&cpufreq_idr, cpufreq_dev->id);
++	kfree(cpufreq_dev->dyn_power_table);
+ 	kfree(cpufreq_dev->time_in_idle_timestamp);
+ 	kfree(cpufreq_dev->time_in_idle);
+ 	kfree(cpufreq_dev->freq_table);
+diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
+index ee8bfacf2071..afc1879f66e0 100644
+--- a/drivers/tty/n_tty.c
++++ b/drivers/tty/n_tty.c
+@@ -343,8 +343,7 @@ static void n_tty_packet_mode_flush(struct tty_struct *tty)
+ 		spin_lock_irqsave(&tty->ctrl_lock, flags);
+ 		tty->ctrl_status |= TIOCPKT_FLUSHREAD;
+ 		spin_unlock_irqrestore(&tty->ctrl_lock, flags);
+-		if (waitqueue_active(&tty->link->read_wait))
+-			wake_up_interruptible(&tty->link->read_wait);
++		wake_up_interruptible(&tty->link->read_wait);
+ 	}
+ }
+ 
+@@ -1382,8 +1381,7 @@ handle_newline:
+ 			put_tty_queue(c, ldata);
+ 			smp_store_release(&ldata->canon_head, ldata->read_head);
+ 			kill_fasync(&tty->fasync, SIGIO, POLL_IN);
+-			if (waitqueue_active(&tty->read_wait))
+-				wake_up_interruptible_poll(&tty->read_wait, POLLIN);
++			wake_up_interruptible_poll(&tty->read_wait, POLLIN);
+ 			return 0;
+ 		}
+ 	}
+@@ -1667,8 +1665,7 @@ static void __receive_buf(struct tty_struct *tty, const unsigned char *cp,
+ 
+ 	if ((read_cnt(ldata) >= ldata->minimum_to_wake) || L_EXTPROC(tty)) {
+ 		kill_fasync(&tty->fasync, SIGIO, POLL_IN);
+-		if (waitqueue_active(&tty->read_wait))
+-			wake_up_interruptible_poll(&tty->read_wait, POLLIN);
++		wake_up_interruptible_poll(&tty->read_wait, POLLIN);
+ 	}
+ }
+ 
+@@ -1887,10 +1884,8 @@ static void n_tty_set_termios(struct tty_struct *tty, struct ktermios *old)
+ 	}
+ 
+ 	/* The termios change make the tty ready for I/O */
+-	if (waitqueue_active(&tty->write_wait))
+-		wake_up_interruptible(&tty->write_wait);
+-	if (waitqueue_active(&tty->read_wait))
+-		wake_up_interruptible(&tty->read_wait);
++	wake_up_interruptible(&tty->write_wait);
++	wake_up_interruptible(&tty->read_wait);
+ }
+ 
+ /**
+diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
+index 37fff12dd4d0..c35d96ece8ff 100644
+--- a/drivers/tty/serial/8250/8250_core.c
++++ b/drivers/tty/serial/8250/8250_core.c
+@@ -326,6 +326,14 @@ configured less than Maximum supported fifo bytes */
+ 				  UART_FCR7_64BYTE,
+ 		.flags		= UART_CAP_FIFO,
+ 	},
++	[PORT_RT2880] = {
++		.name		= "Palmchip BK-3103",
++		.fifo_size	= 16,
++		.tx_loadsz	= 16,
++		.fcr		= UART_FCR_ENABLE_FIFO | UART_FCR_R_TRIG_10,
++		.rxtrig_bytes	= {1, 4, 8, 14},
++		.flags		= UART_CAP_FIFO,
++	},
+ };
+ 
+ /* Uart divisor latch read */
+diff --git a/drivers/tty/serial/atmel_serial.c b/drivers/tty/serial/atmel_serial.c
+index 2a8f528153e7..40326b342762 100644
+--- a/drivers/tty/serial/atmel_serial.c
++++ b/drivers/tty/serial/atmel_serial.c
+@@ -2641,7 +2641,7 @@ static int atmel_serial_probe(struct platform_device *pdev)
+ 	ret = atmel_init_gpios(port, &pdev->dev);
+ 	if (ret < 0) {
+ 		dev_err(&pdev->dev, "Failed to initialize GPIOs.");
+-		goto err;
++		goto err_clear_bit;
+ 	}
+ 
+ 	ret = atmel_init_port(port, pdev);
+diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
+index 57fc6ee12332..774df354af55 100644
+--- a/drivers/tty/tty_io.c
++++ b/drivers/tty/tty_io.c
+@@ -2136,8 +2136,24 @@ retry_open:
+ 	if (!noctty &&
+ 	    current->signal->leader &&
+ 	    !current->signal->tty &&
+-	    tty->session == NULL)
+-		__proc_set_tty(tty);
++	    tty->session == NULL) {
++		/*
++		 * Don't let a process that only has write access to the tty
++		 * obtain the privileges associated with having a tty as
++		 * controlling terminal (being able to reopen it with full
++		 * access through /dev/tty, being able to perform pushback).
++		 * Many distributions set the group of all ttys to "tty" and
++		 * grant write-only access to all terminals for setgid tty
++		 * binaries, which should not imply full privileges on all ttys.
++		 *
++		 * This could theoretically break old code that performs open()
++		 * on a write-only file descriptor. In that case, it might be
++		 * necessary to also permit this if
++		 * inode_permission(inode, MAY_READ) == 0.
++		 */
++		if (filp->f_mode & FMODE_READ)
++			__proc_set_tty(tty);
++	}
+ 	spin_unlock_irq(&current->sighand->siglock);
+ 	read_unlock(&tasklist_lock);
+ 	tty_unlock(tty);
+@@ -2426,7 +2442,7 @@ static int fionbio(struct file *file, int __user *p)
+  *		Takes ->siglock() when updating signal->tty
+  */
+ 
+-static int tiocsctty(struct tty_struct *tty, int arg)
++static int tiocsctty(struct tty_struct *tty, struct file *file, int arg)
+ {
+ 	int ret = 0;
+ 
+@@ -2460,6 +2476,13 @@ static int tiocsctty(struct tty_struct *tty, int arg)
+ 			goto unlock;
+ 		}
+ 	}
++
++	/* See the comment in tty_open(). */
++	if ((file->f_mode & FMODE_READ) == 0 && !capable(CAP_SYS_ADMIN)) {
++		ret = -EPERM;
++		goto unlock;
++	}
++
+ 	proc_set_tty(tty);
+ unlock:
+ 	read_unlock(&tasklist_lock);
+@@ -2852,7 +2875,7 @@ long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+ 		no_tty();
+ 		return 0;
+ 	case TIOCSCTTY:
+-		return tiocsctty(tty, arg);
++		return tiocsctty(tty, file, arg);
+ 	case TIOCGPGRP:
+ 		return tiocgpgrp(tty, real_tty, p);
+ 	case TIOCSPGRP:
+diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
+index 389f0e034259..fa774323ebda 100644
+--- a/drivers/usb/chipidea/ci_hdrc_imx.c
++++ b/drivers/usb/chipidea/ci_hdrc_imx.c
+@@ -56,7 +56,7 @@ static const struct of_device_id ci_hdrc_imx_dt_ids[] = {
+ 	{ .compatible = "fsl,imx27-usb", .data = &imx27_usb_data},
+ 	{ .compatible = "fsl,imx6q-usb", .data = &imx6q_usb_data},
+ 	{ .compatible = "fsl,imx6sl-usb", .data = &imx6sl_usb_data},
+-	{ .compatible = "fsl,imx6sx-usb", .data = &imx6sl_usb_data},
++	{ .compatible = "fsl,imx6sx-usb", .data = &imx6sx_usb_data},
+ 	{ /* sentinel */ }
+ };
+ MODULE_DEVICE_TABLE(of, ci_hdrc_imx_dt_ids);
+diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
+index 764f668d45a9..6e53c24fa1cb 100644
+--- a/drivers/usb/chipidea/udc.c
++++ b/drivers/usb/chipidea/udc.c
+@@ -656,6 +656,44 @@ __acquires(hwep->lock)
+ 	return 0;
+ }
+ 
++static int _ep_set_halt(struct usb_ep *ep, int value, bool check_transfer)
++{
++	struct ci_hw_ep *hwep = container_of(ep, struct ci_hw_ep, ep);
++	int direction, retval = 0;
++	unsigned long flags;
++
++	if (ep == NULL || hwep->ep.desc == NULL)
++		return -EINVAL;
++
++	if (usb_endpoint_xfer_isoc(hwep->ep.desc))
++		return -EOPNOTSUPP;
++
++	spin_lock_irqsave(hwep->lock, flags);
++
++	if (value && hwep->dir == TX && check_transfer &&
++		!list_empty(&hwep->qh.queue) &&
++			!usb_endpoint_xfer_control(hwep->ep.desc)) {
++		spin_unlock_irqrestore(hwep->lock, flags);
++		return -EAGAIN;
++	}
++
++	direction = hwep->dir;
++	do {
++		retval |= hw_ep_set_halt(hwep->ci, hwep->num, hwep->dir, value);
++
++		if (!value)
++			hwep->wedge = 0;
++
++		if (hwep->type == USB_ENDPOINT_XFER_CONTROL)
++			hwep->dir = (hwep->dir == TX) ? RX : TX;
++
++	} while (hwep->dir != direction);
++
++	spin_unlock_irqrestore(hwep->lock, flags);
++	return retval;
++}
++
++
+ /**
+  * _gadget_stop_activity: stops all USB activity, flushes & disables all endpts
+  * @gadget: gadget
+@@ -1051,7 +1089,7 @@ __acquires(ci->lock)
+ 				num += ci->hw_ep_max / 2;
+ 
+ 			spin_unlock(&ci->lock);
+-			err = usb_ep_set_halt(&ci->ci_hw_ep[num].ep);
++			err = _ep_set_halt(&ci->ci_hw_ep[num].ep, 1, false);
+ 			spin_lock(&ci->lock);
+ 			if (!err)
+ 				isr_setup_status_phase(ci);
+@@ -1110,8 +1148,8 @@ delegate:
+ 
+ 	if (err < 0) {
+ 		spin_unlock(&ci->lock);
+-		if (usb_ep_set_halt(&hwep->ep))
+-			dev_err(ci->dev, "error: ep_set_halt\n");
++		if (_ep_set_halt(&hwep->ep, 1, false))
++			dev_err(ci->dev, "error: _ep_set_halt\n");
+ 		spin_lock(&ci->lock);
+ 	}
+ }
+@@ -1142,9 +1180,9 @@ __acquires(ci->lock)
+ 					err = isr_setup_status_phase(ci);
+ 				if (err < 0) {
+ 					spin_unlock(&ci->lock);
+-					if (usb_ep_set_halt(&hwep->ep))
++					if (_ep_set_halt(&hwep->ep, 1, false))
+ 						dev_err(ci->dev,
+-							"error: ep_set_halt\n");
++						"error: _ep_set_halt\n");
+ 					spin_lock(&ci->lock);
+ 				}
+ 			}
+@@ -1390,41 +1428,7 @@ static int ep_dequeue(struct usb_ep *ep, struct usb_request *req)
+  */
+ static int ep_set_halt(struct usb_ep *ep, int value)
+ {
+-	struct ci_hw_ep *hwep = container_of(ep, struct ci_hw_ep, ep);
+-	int direction, retval = 0;
+-	unsigned long flags;
+-
+-	if (ep == NULL || hwep->ep.desc == NULL)
+-		return -EINVAL;
+-
+-	if (usb_endpoint_xfer_isoc(hwep->ep.desc))
+-		return -EOPNOTSUPP;
+-
+-	spin_lock_irqsave(hwep->lock, flags);
+-
+-#ifndef STALL_IN
+-	/* g_file_storage MS compliant but g_zero fails chapter 9 compliance */
+-	if (value && hwep->type == USB_ENDPOINT_XFER_BULK && hwep->dir == TX &&
+-	    !list_empty(&hwep->qh.queue)) {
+-		spin_unlock_irqrestore(hwep->lock, flags);
+-		return -EAGAIN;
+-	}
+-#endif
+-
+-	direction = hwep->dir;
+-	do {
+-		retval |= hw_ep_set_halt(hwep->ci, hwep->num, hwep->dir, value);
+-
+-		if (!value)
+-			hwep->wedge = 0;
+-
+-		if (hwep->type == USB_ENDPOINT_XFER_CONTROL)
+-			hwep->dir = (hwep->dir == TX) ? RX : TX;
+-
+-	} while (hwep->dir != direction);
+-
+-	spin_unlock_irqrestore(hwep->lock, flags);
+-	return retval;
++	return _ep_set_halt(ep, value, true);
+ }
+ 
+ /**
+diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c
+index b2a540b43f97..b9ddf0c1ffe5 100644
+--- a/drivers/usb/core/config.c
++++ b/drivers/usb/core/config.c
+@@ -112,7 +112,7 @@ static void usb_parse_ss_endpoint_companion(struct device *ddev, int cfgno,
+ 				cfgno, inum, asnum, ep->desc.bEndpointAddress);
+ 		ep->ss_ep_comp.bmAttributes = 16;
+ 	} else if (usb_endpoint_xfer_isoc(&ep->desc) &&
+-			desc->bmAttributes > 2) {
++		   USB_SS_MULT(desc->bmAttributes) > 3) {
+ 		dev_warn(ddev, "Isoc endpoint has Mult of %d in "
+ 				"config %d interface %d altsetting %d ep %d: "
+ 				"setting to 3\n", desc->bmAttributes + 1,
+@@ -121,7 +121,8 @@ static void usb_parse_ss_endpoint_companion(struct device *ddev, int cfgno,
+ 	}
+ 
+ 	if (usb_endpoint_xfer_isoc(&ep->desc))
+-		max_tx = (desc->bMaxBurst + 1) * (desc->bmAttributes + 1) *
++		max_tx = (desc->bMaxBurst + 1) *
++			(USB_SS_MULT(desc->bmAttributes)) *
+ 			usb_endpoint_maxp(&ep->desc);
+ 	else if (usb_endpoint_xfer_int(&ep->desc))
+ 		max_tx = usb_endpoint_maxp(&ep->desc) *
+diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
+index d85abfed84cc..f5a381945db2 100644
+--- a/drivers/usb/core/quirks.c
++++ b/drivers/usb/core/quirks.c
+@@ -54,6 +54,13 @@ static const struct usb_device_id usb_quirk_list[] = {
+ 	{ USB_DEVICE(0x046d, 0x082d), .driver_info = USB_QUIRK_DELAY_INIT },
+ 	{ USB_DEVICE(0x046d, 0x0843), .driver_info = USB_QUIRK_DELAY_INIT },
+ 
++	/* Logitech ConferenceCam CC3000e */
++	{ USB_DEVICE(0x046d, 0x0847), .driver_info = USB_QUIRK_DELAY_INIT },
++	{ USB_DEVICE(0x046d, 0x0848), .driver_info = USB_QUIRK_DELAY_INIT },
++
++	/* Logitech PTZ Pro Camera */
++	{ USB_DEVICE(0x046d, 0x0853), .driver_info = USB_QUIRK_DELAY_INIT },
++
+ 	/* Logitech Quickcam Fusion */
+ 	{ USB_DEVICE(0x046d, 0x08c1), .driver_info = USB_QUIRK_RESET_RESUME },
+ 
+@@ -78,6 +85,12 @@ static const struct usb_device_id usb_quirk_list[] = {
+ 	/* Philips PSC805 audio device */
+ 	{ USB_DEVICE(0x0471, 0x0155), .driver_info = USB_QUIRK_RESET_RESUME },
+ 
++	/* Plantronic Audio 655 DSP */
++	{ USB_DEVICE(0x047f, 0xc008), .driver_info = USB_QUIRK_RESET_RESUME },
++
++	/* Plantronic Audio 648 USB */
++	{ USB_DEVICE(0x047f, 0xc013), .driver_info = USB_QUIRK_RESET_RESUME },
++
+ 	/* Artisman Watchdog Dongle */
+ 	{ USB_DEVICE(0x04b4, 0x0526), .driver_info =
+ 			USB_QUIRK_CONFIG_INTF_STRINGS },
+diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
+index 9a8c936cd42c..41f841fa6c4d 100644
+--- a/drivers/usb/host/xhci-mem.c
++++ b/drivers/usb/host/xhci-mem.c
+@@ -1498,10 +1498,10 @@ int xhci_endpoint_init(struct xhci_hcd *xhci,
+ 	 * use Event Data TRBs, and we don't chain in a link TRB on short
+ 	 * transfers, we're basically dividing by 1.
+ 	 *
+-	 * xHCI 1.0 specification indicates that the Average TRB Length should
+-	 * be set to 8 for control endpoints.
++	 * xHCI 1.0 and 1.1 specifications indicate that the Average TRB Length
++	 * should be set to 8 for control endpoints.
+ 	 */
+-	if (usb_endpoint_xfer_control(&ep->desc) && xhci->hci_version == 0x100)
++	if (usb_endpoint_xfer_control(&ep->desc) && xhci->hci_version >= 0x100)
+ 		ep_ctx->tx_info |= cpu_to_le32(AVG_TRB_LENGTH_FOR_EP(8));
+ 	else
+ 		ep_ctx->tx_info |=
+@@ -1792,8 +1792,7 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
+ 	int size;
+ 	int i, j, num_ports;
+ 
+-	if (timer_pending(&xhci->cmd_timer))
+-		del_timer_sync(&xhci->cmd_timer);
++	del_timer_sync(&xhci->cmd_timer);
+ 
+ 	/* Free the Event Ring Segment Table and the actual Event Ring */
+ 	size = sizeof(struct xhci_erst_entry)*(xhci->erst.num_entries);
+@@ -2321,6 +2320,10 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
+ 
+ 	INIT_LIST_HEAD(&xhci->cmd_list);
+ 
++	/* init command timeout timer */
++	setup_timer(&xhci->cmd_timer, xhci_handle_command_timeout,
++		    (unsigned long)xhci);
++
+ 	page_size = readl(&xhci->op_regs->page_size);
+ 	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
+ 			"Supported page size register = 0x%x", page_size);
+@@ -2505,10 +2508,6 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
+ 			"Wrote ERST address to ir_set 0.");
+ 	xhci_print_ir_set(xhci, 0);
+ 
+-	/* init command timeout timer */
+-	setup_timer(&xhci->cmd_timer, xhci_handle_command_timeout,
+-		    (unsigned long)xhci);
+-
+ 	/*
+ 	 * XXX: Might need to set the Interrupter Moderation Register to
+ 	 * something other than the default (~1ms minimum between interrupts).
+diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
+index 5590eac2b22d..c79d33676672 100644
+--- a/drivers/usb/host/xhci-pci.c
++++ b/drivers/usb/host/xhci-pci.c
+@@ -180,51 +180,6 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
+ 				"QUIRK: Resetting on resume");
+ }
+ 
+-/*
+- * In some Intel xHCI controllers, in order to get D3 working,
+- * through a vendor specific SSIC CONFIG register at offset 0x883c,
+- * SSIC PORT need to be marked as "unused" before putting xHCI
+- * into D3. After D3 exit, the SSIC port need to be marked as "used".
+- * Without this change, xHCI might not enter D3 state.
+- * Make sure PME works on some Intel xHCI controllers by writing 1 to clear
+- * the Internal PME flag bit in vendor specific PMCTRL register at offset 0x80a4
+- */
+-static void xhci_pme_quirk(struct usb_hcd *hcd, bool suspend)
+-{
+-	struct xhci_hcd	*xhci = hcd_to_xhci(hcd);
+-	struct pci_dev		*pdev = to_pci_dev(hcd->self.controller);
+-	u32 val;
+-	void __iomem *reg;
+-
+-	if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
+-		 pdev->device == PCI_DEVICE_ID_INTEL_CHERRYVIEW_XHCI) {
+-
+-		reg = (void __iomem *) xhci->cap_regs + PORT2_SSIC_CONFIG_REG2;
+-
+-		/* Notify SSIC that SSIC profile programming is not done */
+-		val = readl(reg) & ~PROG_DONE;
+-		writel(val, reg);
+-
+-		/* Mark SSIC port as unused(suspend) or used(resume) */
+-		val = readl(reg);
+-		if (suspend)
+-			val |= SSIC_PORT_UNUSED;
+-		else
+-			val &= ~SSIC_PORT_UNUSED;
+-		writel(val, reg);
+-
+-		/* Notify SSIC that SSIC profile programming is done */
+-		val = readl(reg) | PROG_DONE;
+-		writel(val, reg);
+-		readl(reg);
+-	}
+-
+-	reg = (void __iomem *) xhci->cap_regs + 0x80a4;
+-	val = readl(reg);
+-	writel(val | BIT(28), reg);
+-	readl(reg);
+-}
+-
+ #ifdef CONFIG_ACPI
+ static void xhci_pme_acpi_rtd3_enable(struct pci_dev *dev)
+ {
+@@ -345,6 +300,51 @@ static void xhci_pci_remove(struct pci_dev *dev)
+ }
+ 
+ #ifdef CONFIG_PM
++/*
++ * In some Intel xHCI controllers, in order to get D3 working,
++ * through a vendor specific SSIC CONFIG register at offset 0x883c,
++ * SSIC PORT need to be marked as "unused" before putting xHCI
++ * into D3. After D3 exit, the SSIC port need to be marked as "used".
++ * Without this change, xHCI might not enter D3 state.
++ * Make sure PME works on some Intel xHCI controllers by writing 1 to clear
++ * the Internal PME flag bit in vendor specific PMCTRL register at offset 0x80a4
++ */
++static void xhci_pme_quirk(struct usb_hcd *hcd, bool suspend)
++{
++	struct xhci_hcd	*xhci = hcd_to_xhci(hcd);
++	struct pci_dev		*pdev = to_pci_dev(hcd->self.controller);
++	u32 val;
++	void __iomem *reg;
++
++	if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
++		 pdev->device == PCI_DEVICE_ID_INTEL_CHERRYVIEW_XHCI) {
++
++		reg = (void __iomem *) xhci->cap_regs + PORT2_SSIC_CONFIG_REG2;
++
++		/* Notify SSIC that SSIC profile programming is not done */
++		val = readl(reg) & ~PROG_DONE;
++		writel(val, reg);
++
++		/* Mark SSIC port as unused(suspend) or used(resume) */
++		val = readl(reg);
++		if (suspend)
++			val |= SSIC_PORT_UNUSED;
++		else
++			val &= ~SSIC_PORT_UNUSED;
++		writel(val, reg);
++
++		/* Notify SSIC that SSIC profile programming is done */
++		val = readl(reg) | PROG_DONE;
++		writel(val, reg);
++		readl(reg);
++	}
++
++	reg = (void __iomem *) xhci->cap_regs + 0x80a4;
++	val = readl(reg);
++	writel(val | BIT(28), reg);
++	readl(reg);
++}
++
+ static int xhci_pci_suspend(struct usb_hcd *hcd, bool do_wakeup)
+ {
+ 	struct xhci_hcd	*xhci = hcd_to_xhci(hcd);
+diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
+index 32f4d564494a..8aadf3def901 100644
+--- a/drivers/usb/host/xhci-ring.c
++++ b/drivers/usb/host/xhci-ring.c
+@@ -302,6 +302,15 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
+ 	ret = xhci_handshake(&xhci->op_regs->cmd_ring,
+ 			CMD_RING_RUNNING, 0, 5 * 1000 * 1000);
+ 	if (ret < 0) {
++		/* we are about to kill xhci, give it one more chance */
++		xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
++			      &xhci->op_regs->cmd_ring);
++		udelay(1000);
++		ret = xhci_handshake(&xhci->op_regs->cmd_ring,
++				     CMD_RING_RUNNING, 0, 3 * 1000 * 1000);
++		if (ret == 0)
++			return 0;
++
+ 		xhci_err(xhci, "Stopped the command ring failed, "
+ 				"maybe the host is dead\n");
+ 		xhci->xhc_state |= XHCI_STATE_DYING;
+@@ -3041,9 +3050,11 @@ static int queue_bulk_sg_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ 	struct xhci_td *td;
+ 	struct scatterlist *sg;
+ 	int num_sgs;
+-	int trb_buff_len, this_sg_len, running_total;
++	int trb_buff_len, this_sg_len, running_total, ret;
+ 	unsigned int total_packet_count;
++	bool zero_length_needed;
+ 	bool first_trb;
++	int last_trb_num;
+ 	u64 addr;
+ 	bool more_trbs_coming;
+ 
+@@ -3059,13 +3070,27 @@ static int queue_bulk_sg_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ 	total_packet_count = DIV_ROUND_UP(urb->transfer_buffer_length,
+ 			usb_endpoint_maxp(&urb->ep->desc));
+ 
+-	trb_buff_len = prepare_transfer(xhci, xhci->devs[slot_id],
++	ret = prepare_transfer(xhci, xhci->devs[slot_id],
+ 			ep_index, urb->stream_id,
+ 			num_trbs, urb, 0, mem_flags);
+-	if (trb_buff_len < 0)
+-		return trb_buff_len;
++	if (ret < 0)
++		return ret;
+ 
+ 	urb_priv = urb->hcpriv;
++
++	/* Deal with URB_ZERO_PACKET - need one more td/trb */
++	zero_length_needed = urb->transfer_flags & URB_ZERO_PACKET &&
++		urb_priv->length == 2;
++	if (zero_length_needed) {
++		num_trbs++;
++		xhci_dbg(xhci, "Creating zero length td.\n");
++		ret = prepare_transfer(xhci, xhci->devs[slot_id],
++				ep_index, urb->stream_id,
++				1, urb, 1, mem_flags);
++		if (ret < 0)
++			return ret;
++	}
++
+ 	td = urb_priv->td[0];
+ 
+ 	/*
+@@ -3095,6 +3120,7 @@ static int queue_bulk_sg_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ 		trb_buff_len = urb->transfer_buffer_length;
+ 
+ 	first_trb = true;
++	last_trb_num = zero_length_needed ? 2 : 1;
+ 	/* Queue the first TRB, even if it's zero-length */
+ 	do {
+ 		u32 field = 0;
+@@ -3112,12 +3138,15 @@ static int queue_bulk_sg_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ 		/* Chain all the TRBs together; clear the chain bit in the last
+ 		 * TRB to indicate it's the last TRB in the chain.
+ 		 */
+-		if (num_trbs > 1) {
++		if (num_trbs > last_trb_num) {
+ 			field |= TRB_CHAIN;
+-		} else {
+-			/* FIXME - add check for ZERO_PACKET flag before this */
++		} else if (num_trbs == last_trb_num) {
+ 			td->last_trb = ep_ring->enqueue;
+ 			field |= TRB_IOC;
++		} else if (zero_length_needed && num_trbs == 1) {
++			trb_buff_len = 0;
++			urb_priv->td[1]->last_trb = ep_ring->enqueue;
++			field |= TRB_IOC;
+ 		}
+ 
+ 		/* Only set interrupt on short packet for IN endpoints */
+@@ -3179,7 +3208,7 @@ static int queue_bulk_sg_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ 		if (running_total + trb_buff_len > urb->transfer_buffer_length)
+ 			trb_buff_len =
+ 				urb->transfer_buffer_length - running_total;
+-	} while (running_total < urb->transfer_buffer_length);
++	} while (num_trbs > 0);
+ 
+ 	check_trb_math(urb, num_trbs, running_total);
+ 	giveback_first_trb(xhci, slot_id, ep_index, urb->stream_id,
+@@ -3197,7 +3226,9 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ 	int num_trbs;
+ 	struct xhci_generic_trb *start_trb;
+ 	bool first_trb;
++	int last_trb_num;
+ 	bool more_trbs_coming;
++	bool zero_length_needed;
+ 	int start_cycle;
+ 	u32 field, length_field;
+ 
+@@ -3228,7 +3259,6 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ 		num_trbs++;
+ 		running_total += TRB_MAX_BUFF_SIZE;
+ 	}
+-	/* FIXME: this doesn't deal with URB_ZERO_PACKET - need one more */
+ 
+ 	ret = prepare_transfer(xhci, xhci->devs[slot_id],
+ 			ep_index, urb->stream_id,
+@@ -3237,6 +3267,20 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ 		return ret;
+ 
+ 	urb_priv = urb->hcpriv;
++
++	/* Deal with URB_ZERO_PACKET - need one more td/trb */
++	zero_length_needed = urb->transfer_flags & URB_ZERO_PACKET &&
++		urb_priv->length == 2;
++	if (zero_length_needed) {
++		num_trbs++;
++		xhci_dbg(xhci, "Creating zero length td.\n");
++		ret = prepare_transfer(xhci, xhci->devs[slot_id],
++				ep_index, urb->stream_id,
++				1, urb, 1, mem_flags);
++		if (ret < 0)
++			return ret;
++	}
++
+ 	td = urb_priv->td[0];
+ 
+ 	/*
+@@ -3258,7 +3302,7 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ 		trb_buff_len = urb->transfer_buffer_length;
+ 
+ 	first_trb = true;
+-
++	last_trb_num = zero_length_needed ? 2 : 1;
+ 	/* Queue the first TRB, even if it's zero-length */
+ 	do {
+ 		u32 remainder = 0;
+@@ -3275,12 +3319,15 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ 		/* Chain all the TRBs together; clear the chain bit in the last
+ 		 * TRB to indicate it's the last TRB in the chain.
+ 		 */
+-		if (num_trbs > 1) {
++		if (num_trbs > last_trb_num) {
+ 			field |= TRB_CHAIN;
+-		} else {
+-			/* FIXME - add check for ZERO_PACKET flag before this */
++		} else if (num_trbs == last_trb_num) {
+ 			td->last_trb = ep_ring->enqueue;
+ 			field |= TRB_IOC;
++		} else if (zero_length_needed && num_trbs == 1) {
++			trb_buff_len = 0;
++			urb_priv->td[1]->last_trb = ep_ring->enqueue;
++			field |= TRB_IOC;
+ 		}
+ 
+ 		/* Only set interrupt on short packet for IN endpoints */
+@@ -3318,7 +3365,7 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ 		trb_buff_len = urb->transfer_buffer_length - running_total;
+ 		if (trb_buff_len > TRB_MAX_BUFF_SIZE)
+ 			trb_buff_len = TRB_MAX_BUFF_SIZE;
+-	} while (running_total < urb->transfer_buffer_length);
++	} while (num_trbs > 0);
+ 
+ 	check_trb_math(urb, num_trbs, running_total);
+ 	giveback_first_trb(xhci, slot_id, ep_index, urb->stream_id,
+@@ -3385,8 +3432,8 @@ int xhci_queue_ctrl_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ 	if (start_cycle == 0)
+ 		field |= 0x1;
+ 
+-	/* xHCI 1.0 6.4.1.2.1: Transfer Type field */
+-	if (xhci->hci_version == 0x100) {
++	/* xHCI 1.0/1.1 6.4.1.2.1: Transfer Type field */
++	if (xhci->hci_version >= 0x100) {
+ 		if (urb->transfer_buffer_length > 0) {
+ 			if (setup->bRequestType & USB_DIR_IN)
+ 				field |= TRB_TX_TYPE(TRB_DATA_IN);
+diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
+index 526ebc0c7e72..d7b9f484d4e9 100644
+--- a/drivers/usb/host/xhci.c
++++ b/drivers/usb/host/xhci.c
+@@ -146,7 +146,8 @@ static int xhci_start(struct xhci_hcd *xhci)
+ 				"waited %u microseconds.\n",
+ 				XHCI_MAX_HALT_USEC);
+ 	if (!ret)
+-		xhci->xhc_state &= ~XHCI_STATE_HALTED;
++		xhci->xhc_state &= ~(XHCI_STATE_HALTED | XHCI_STATE_DYING);
++
+ 	return ret;
+ }
+ 
+@@ -654,15 +655,6 @@ int xhci_run(struct usb_hcd *hcd)
+ }
+ EXPORT_SYMBOL_GPL(xhci_run);
+ 
+-static void xhci_only_stop_hcd(struct usb_hcd *hcd)
+-{
+-	struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+-
+-	spin_lock_irq(&xhci->lock);
+-	xhci_halt(xhci);
+-	spin_unlock_irq(&xhci->lock);
+-}
+-
+ /*
+  * Stop xHCI driver.
+  *
+@@ -677,12 +669,14 @@ void xhci_stop(struct usb_hcd *hcd)
+ 	u32 temp;
+ 	struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+ 
+-	if (!usb_hcd_is_primary_hcd(hcd)) {
+-		xhci_only_stop_hcd(xhci->shared_hcd);
++	if (xhci->xhc_state & XHCI_STATE_HALTED)
+ 		return;
+-	}
+ 
++	mutex_lock(&xhci->mutex);
+ 	spin_lock_irq(&xhci->lock);
++	xhci->xhc_state |= XHCI_STATE_HALTED;
++	xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
++
+ 	/* Make sure the xHC is halted for a USB3 roothub
+ 	 * (xhci_stop() could be called as part of failed init).
+ 	 */
+@@ -717,6 +711,7 @@ void xhci_stop(struct usb_hcd *hcd)
+ 	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
+ 			"xhci_stop completed - status = %x",
+ 			readl(&xhci->op_regs->status));
++	mutex_unlock(&xhci->mutex);
+ }
+ 
+ /*
+@@ -1340,6 +1335,11 @@ int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flags)
+ 
+ 	if (usb_endpoint_xfer_isoc(&urb->ep->desc))
+ 		size = urb->number_of_packets;
++	else if (usb_endpoint_is_bulk_out(&urb->ep->desc) &&
++	    urb->transfer_buffer_length > 0 &&
++	    urb->transfer_flags & URB_ZERO_PACKET &&
++	    !(urb->transfer_buffer_length % usb_endpoint_maxp(&urb->ep->desc)))
++		size = 2;
+ 	else
+ 		size = 1;
+ 
+@@ -3788,6 +3788,9 @@ static int xhci_setup_device(struct usb_hcd *hcd, struct usb_device *udev,
+ 
+ 	mutex_lock(&xhci->mutex);
+ 
++	if (xhci->xhc_state)	/* dying or halted */
++		goto out;
++
+ 	if (!udev->slot_id) {
+ 		xhci_dbg_trace(xhci, trace_xhci_dbg_address,
+ 				"Bad Slot ID %d", udev->slot_id);
+diff --git a/drivers/usb/misc/chaoskey.c b/drivers/usb/misc/chaoskey.c
+index 3ad5d19e4d04..23c794813e6a 100644
+--- a/drivers/usb/misc/chaoskey.c
++++ b/drivers/usb/misc/chaoskey.c
+@@ -472,7 +472,7 @@ static int chaoskey_rng_read(struct hwrng *rng, void *data,
+ 	if (this_time > max)
+ 		this_time = max;
+ 
+-	memcpy(data, dev->buf, this_time);
++	memcpy(data, dev->buf + dev->used, this_time);
+ 
+ 	dev->used += this_time;
+ 
+diff --git a/drivers/usb/musb/musb_cppi41.c b/drivers/usb/musb/musb_cppi41.c
+index 4d1b44c232ee..d07cafb7d5f5 100644
+--- a/drivers/usb/musb/musb_cppi41.c
++++ b/drivers/usb/musb/musb_cppi41.c
+@@ -614,7 +614,7 @@ static int cppi41_dma_controller_start(struct cppi41_dma_controller *controller)
+ {
+ 	struct musb *musb = controller->musb;
+ 	struct device *dev = musb->controller;
+-	struct device_node *np = dev->of_node;
++	struct device_node *np = dev->parent->of_node;
+ 	struct cppi41_dma_channel *cppi41_channel;
+ 	int count;
+ 	int i;
+@@ -664,7 +664,7 @@ static int cppi41_dma_controller_start(struct cppi41_dma_controller *controller)
+ 		musb_dma->status = MUSB_DMA_STATUS_FREE;
+ 		musb_dma->max_len = SZ_4M;
+ 
+-		dc = dma_request_slave_channel(dev, str);
++		dc = dma_request_slave_channel(dev->parent, str);
+ 		if (!dc) {
+ 			dev_err(dev, "Failed to request %s.\n", str);
+ 			ret = -EPROBE_DEFER;
+@@ -695,7 +695,7 @@ cppi41_dma_controller_create(struct musb *musb, void __iomem *base)
+ 	struct cppi41_dma_controller *controller;
+ 	int ret = 0;
+ 
+-	if (!musb->controller->of_node) {
++	if (!musb->controller->parent->of_node) {
+ 		dev_err(musb->controller, "Need DT for the DMA engine.\n");
+ 		return NULL;
+ 	}
+diff --git a/drivers/usb/musb/musb_dsps.c b/drivers/usb/musb/musb_dsps.c
+index 1334a3de31b8..67325ec94894 100644
+--- a/drivers/usb/musb/musb_dsps.c
++++ b/drivers/usb/musb/musb_dsps.c
+@@ -225,8 +225,11 @@ static void dsps_musb_enable(struct musb *musb)
+ 
+ 	dsps_writel(reg_base, wrp->epintr_set, epmask);
+ 	dsps_writel(reg_base, wrp->coreintr_set, coremask);
+-	/* start polling for ID change. */
+-	mod_timer(&glue->timer, jiffies + msecs_to_jiffies(wrp->poll_timeout));
++	/* start polling for ID change in dual-role idle mode */
++	if (musb->xceiv->otg->state == OTG_STATE_B_IDLE &&
++			musb->port_mode == MUSB_PORT_MODE_DUAL_ROLE)
++		mod_timer(&glue->timer, jiffies +
++				msecs_to_jiffies(wrp->poll_timeout));
+ 	dsps_musb_try_idle(musb, 0);
+ }
+ 
+diff --git a/drivers/usb/phy/phy-generic.c b/drivers/usb/phy/phy-generic.c
+index deee68eafb72..0cd85f2ccddd 100644
+--- a/drivers/usb/phy/phy-generic.c
++++ b/drivers/usb/phy/phy-generic.c
+@@ -230,7 +230,8 @@ int usb_phy_gen_create_phy(struct device *dev, struct usb_phy_generic *nop,
+ 		clk_rate = pdata->clk_rate;
+ 		needs_vcc = pdata->needs_vcc;
+ 		if (gpio_is_valid(pdata->gpio_reset)) {
+-			err = devm_gpio_request_one(dev, pdata->gpio_reset, 0,
++			err = devm_gpio_request_one(dev, pdata->gpio_reset,
++						    GPIOF_ACTIVE_LOW,
+ 						    dev_name(dev));
+ 			if (!err)
+ 				nop->gpiod_reset =
+diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
+index 876423b8892c..7c8eb4c4c175 100644
+--- a/drivers/usb/serial/option.c
++++ b/drivers/usb/serial/option.c
+@@ -278,6 +278,10 @@ static void option_instat_callback(struct urb *urb);
+ #define ZTE_PRODUCT_MF622			0x0001
+ #define ZTE_PRODUCT_MF628			0x0015
+ #define ZTE_PRODUCT_MF626			0x0031
++#define ZTE_PRODUCT_ZM8620_X			0x0396
++#define ZTE_PRODUCT_ME3620_MBIM			0x0426
++#define ZTE_PRODUCT_ME3620_X			0x1432
++#define ZTE_PRODUCT_ME3620_L			0x1433
+ #define ZTE_PRODUCT_AC2726			0xfff1
+ #define ZTE_PRODUCT_MG880			0xfffd
+ #define ZTE_PRODUCT_CDMA_TECH			0xfffe
+@@ -544,6 +548,18 @@ static const struct option_blacklist_info zte_mc2716_z_blacklist = {
+ 	.sendsetup = BIT(1) | BIT(2) | BIT(3),
+ };
+ 
++static const struct option_blacklist_info zte_me3620_mbim_blacklist = {
++	.reserved = BIT(2) | BIT(3) | BIT(4),
++};
++
++static const struct option_blacklist_info zte_me3620_xl_blacklist = {
++	.reserved = BIT(3) | BIT(4) | BIT(5),
++};
++
++static const struct option_blacklist_info zte_zm8620_x_blacklist = {
++	.reserved = BIT(3) | BIT(4) | BIT(5),
++};
++
+ static const struct option_blacklist_info huawei_cdc12_blacklist = {
+ 	.reserved = BIT(1) | BIT(2),
+ };
+@@ -1591,6 +1607,14 @@ static const struct usb_device_id option_ids[] = {
+ 	 .driver_info = (kernel_ulong_t)&zte_ad3812_z_blacklist },
+ 	{ USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, ZTE_PRODUCT_MC2716, 0xff, 0xff, 0xff),
+ 	 .driver_info = (kernel_ulong_t)&zte_mc2716_z_blacklist },
++	{ USB_DEVICE(ZTE_VENDOR_ID, ZTE_PRODUCT_ME3620_L),
++	 .driver_info = (kernel_ulong_t)&zte_me3620_xl_blacklist },
++	{ USB_DEVICE(ZTE_VENDOR_ID, ZTE_PRODUCT_ME3620_MBIM),
++	 .driver_info = (kernel_ulong_t)&zte_me3620_mbim_blacklist },
++	{ USB_DEVICE(ZTE_VENDOR_ID, ZTE_PRODUCT_ME3620_X),
++	 .driver_info = (kernel_ulong_t)&zte_me3620_xl_blacklist },
++	{ USB_DEVICE(ZTE_VENDOR_ID, ZTE_PRODUCT_ZM8620_X),
++	 .driver_info = (kernel_ulong_t)&zte_zm8620_x_blacklist },
+ 	{ USB_VENDOR_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0xff, 0x02, 0x01) },
+ 	{ USB_VENDOR_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0xff, 0x02, 0x05) },
+ 	{ USB_VENDOR_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0xff, 0x86, 0x10) },
+diff --git a/drivers/usb/serial/whiteheat.c b/drivers/usb/serial/whiteheat.c
+index 6c3734d2b45a..d3ea90bef84d 100644
+--- a/drivers/usb/serial/whiteheat.c
++++ b/drivers/usb/serial/whiteheat.c
+@@ -80,6 +80,8 @@ static int  whiteheat_firmware_download(struct usb_serial *serial,
+ static int  whiteheat_firmware_attach(struct usb_serial *serial);
+ 
+ /* function prototypes for the Connect Tech WhiteHEAT serial converter */
++static int whiteheat_probe(struct usb_serial *serial,
++				const struct usb_device_id *id);
+ static int  whiteheat_attach(struct usb_serial *serial);
+ static void whiteheat_release(struct usb_serial *serial);
+ static int  whiteheat_port_probe(struct usb_serial_port *port);
+@@ -116,6 +118,7 @@ static struct usb_serial_driver whiteheat_device = {
+ 	.description =		"Connect Tech - WhiteHEAT",
+ 	.id_table =		id_table_std,
+ 	.num_ports =		4,
++	.probe =		whiteheat_probe,
+ 	.attach =		whiteheat_attach,
+ 	.release =		whiteheat_release,
+ 	.port_probe =		whiteheat_port_probe,
+@@ -217,6 +220,34 @@ static int whiteheat_firmware_attach(struct usb_serial *serial)
+ /*****************************************************************************
+  * Connect Tech's White Heat serial driver functions
+  *****************************************************************************/
++
++static int whiteheat_probe(struct usb_serial *serial,
++				const struct usb_device_id *id)
++{
++	struct usb_host_interface *iface_desc;
++	struct usb_endpoint_descriptor *endpoint;
++	size_t num_bulk_in = 0;
++	size_t num_bulk_out = 0;
++	size_t min_num_bulk;
++	unsigned int i;
++
++	iface_desc = serial->interface->cur_altsetting;
++
++	for (i = 0; i < iface_desc->desc.bNumEndpoints; i++) {
++		endpoint = &iface_desc->endpoint[i].desc;
++		if (usb_endpoint_is_bulk_in(endpoint))
++			++num_bulk_in;
++		if (usb_endpoint_is_bulk_out(endpoint))
++			++num_bulk_out;
++	}
++
++	min_num_bulk = COMMAND_PORT + 1;
++	if (num_bulk_in < min_num_bulk || num_bulk_out < min_num_bulk)
++		return -ENODEV;
++
++	return 0;
++}
++
+ static int whiteheat_attach(struct usb_serial *serial)
+ {
+ 	struct usb_serial_port *command_port;
+diff --git a/drivers/watchdog/imgpdc_wdt.c b/drivers/watchdog/imgpdc_wdt.c
+index 0f73621827ab..15ab07230960 100644
+--- a/drivers/watchdog/imgpdc_wdt.c
++++ b/drivers/watchdog/imgpdc_wdt.c
+@@ -316,6 +316,7 @@ static int pdc_wdt_remove(struct platform_device *pdev)
+ {
+ 	struct pdc_wdt_dev *pdc_wdt = platform_get_drvdata(pdev);
+ 
++	unregister_restart_handler(&pdc_wdt->restart_handler);
+ 	pdc_wdt_stop(&pdc_wdt->wdt_dev);
+ 	watchdog_unregister_device(&pdc_wdt->wdt_dev);
+ 	clk_disable_unprepare(pdc_wdt->wdt_clk);
+diff --git a/drivers/watchdog/sunxi_wdt.c b/drivers/watchdog/sunxi_wdt.c
+index a29afb37c48c..47bd8a14d01f 100644
+--- a/drivers/watchdog/sunxi_wdt.c
++++ b/drivers/watchdog/sunxi_wdt.c
+@@ -184,7 +184,7 @@ static int sunxi_wdt_start(struct watchdog_device *wdt_dev)
+ 	/* Set system reset function */
+ 	reg = readl(wdt_base + regs->wdt_cfg);
+ 	reg &= ~(regs->wdt_reset_mask);
+-	reg |= ~(regs->wdt_reset_val);
++	reg |= regs->wdt_reset_val;
+ 	writel(reg, wdt_base + regs->wdt_cfg);
+ 
+ 	/* Enable watchdog */
+diff --git a/drivers/xen/preempt.c b/drivers/xen/preempt.c
+index a1800c150839..08cb419eb4e6 100644
+--- a/drivers/xen/preempt.c
++++ b/drivers/xen/preempt.c
+@@ -31,7 +31,7 @@ EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall);
+ asmlinkage __visible void xen_maybe_preempt_hcall(void)
+ {
+ 	if (unlikely(__this_cpu_read(xen_in_preemptible_hcall)
+-		     && should_resched())) {
++		     && need_resched())) {
+ 		/*
+ 		 * Clear flag as we may be rescheduled on a different
+ 		 * cpu.
+diff --git a/fs/block_dev.c b/fs/block_dev.c
+index 198243717da5..1170f8ce5e7f 100644
+--- a/fs/block_dev.c
++++ b/fs/block_dev.c
+@@ -1241,6 +1241,13 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
+ 				goto out_clear;
+ 			}
+ 			bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
++			/*
++			 * If the partition is not aligned on a page
++			 * boundary, we can't do dax I/O to it.
++			 */
++			if ((bdev->bd_part->start_sect % (PAGE_SIZE / 512)) ||
++			    (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
++				bdev->bd_inode->i_flags &= ~S_DAX;
+ 		}
+ 	} else {
+ 		if (bdev->bd_contains == bdev) {
+diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
+index 02d05817cbdf..3fc4fec9b94e 100644
+--- a/fs/btrfs/extent_io.c
++++ b/fs/btrfs/extent_io.c
+@@ -2798,7 +2798,8 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
+ 			      bio_end_io_t end_io_func,
+ 			      int mirror_num,
+ 			      unsigned long prev_bio_flags,
+-			      unsigned long bio_flags)
++			      unsigned long bio_flags,
++			      bool force_bio_submit)
+ {
+ 	int ret = 0;
+ 	struct bio *bio;
+@@ -2816,6 +2817,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
+ 			contig = bio_end_sector(bio) == sector;
+ 
+ 		if (prev_bio_flags != bio_flags || !contig ||
++		    force_bio_submit ||
+ 		    merge_bio(rw, tree, page, offset, page_size, bio, bio_flags) ||
+ 		    bio_add_page(bio, page, page_size, offset) < page_size) {
+ 			ret = submit_one_bio(rw, bio, mirror_num,
+@@ -2909,7 +2911,8 @@ static int __do_readpage(struct extent_io_tree *tree,
+ 			 get_extent_t *get_extent,
+ 			 struct extent_map **em_cached,
+ 			 struct bio **bio, int mirror_num,
+-			 unsigned long *bio_flags, int rw)
++			 unsigned long *bio_flags, int rw,
++			 u64 *prev_em_start)
+ {
+ 	struct inode *inode = page->mapping->host;
+ 	u64 start = page_offset(page);
+@@ -2957,6 +2960,7 @@ static int __do_readpage(struct extent_io_tree *tree,
+ 	}
+ 	while (cur <= end) {
+ 		unsigned long pnr = (last_byte >> PAGE_CACHE_SHIFT) + 1;
++		bool force_bio_submit = false;
+ 
+ 		if (cur >= last_byte) {
+ 			char *userpage;
+@@ -3007,6 +3011,49 @@ static int __do_readpage(struct extent_io_tree *tree,
+ 		block_start = em->block_start;
+ 		if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
+ 			block_start = EXTENT_MAP_HOLE;
++
++		/*
++		 * If we have a file range that points to a compressed extent
++		 * and it's followed by a consecutive file range that points to
++		 * the same compressed extent (possibly with a different
++		 * offset and/or length, so it either points to the whole extent
++		 * or only part of it), we must make sure we do not submit a
++		 * single bio to populate the pages for the 2 ranges because
++		 * this makes the compressed extent read zero out the pages
++		 * belonging to the 2nd range. Imagine the following scenario:
++		 *
++		 *  File layout
++		 *  [0 - 8K]                     [8K - 24K]
++		 *    |                               |
++		 *    |                               |
++		 * points to extent X,         points to extent X,
++		 * offset 4K, length of 8K     offset 0, length 16K
++		 *
++		 * [extent X, compressed length = 4K uncompressed length = 16K]
++		 *
++		 * If the bio to read the compressed extent covers both ranges,
++		 * it will decompress extent X into the pages belonging to the
++		 * first range and then it will stop, zeroing out the remaining
++		 * pages that belong to the other range that points to extent X.
++		 * So here we make sure we submit 2 bios, one for the first
++		 * range and another one for the second range. Both will target
++		 * the same physical extent from disk, but we can't currently
++		 * make the compressed bio endio callback populate the pages
++		 * for both ranges because each compressed bio is tightly
++		 * coupled with a single extent map, and each range can have
++		 * an extent map with a different offset value relative to the
++		 * uncompressed data of our extent and different lengths. This
++		 * is a corner case so we prioritize correctness over
++		 * non-optimal behavior (submitting 2 bios for the same extent).
++		 */
++		if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags) &&
++		    prev_em_start && *prev_em_start != (u64)-1 &&
++		    *prev_em_start != em->orig_start)
++			force_bio_submit = true;
++
++		if (prev_em_start)
++			*prev_em_start = em->orig_start;
++
+ 		free_extent_map(em);
+ 		em = NULL;
+ 
+@@ -3056,7 +3103,8 @@ static int __do_readpage(struct extent_io_tree *tree,
+ 					 bdev, bio, pnr,
+ 					 end_bio_extent_readpage, mirror_num,
+ 					 *bio_flags,
+-					 this_bio_flag);
++					 this_bio_flag,
++					 force_bio_submit);
+ 		if (!ret) {
+ 			nr++;
+ 			*bio_flags = this_bio_flag;
+@@ -3083,7 +3131,8 @@ static inline void __do_contiguous_readpages(struct extent_io_tree *tree,
+ 					     get_extent_t *get_extent,
+ 					     struct extent_map **em_cached,
+ 					     struct bio **bio, int mirror_num,
+-					     unsigned long *bio_flags, int rw)
++					     unsigned long *bio_flags, int rw,
++					     u64 *prev_em_start)
+ {
+ 	struct inode *inode;
+ 	struct btrfs_ordered_extent *ordered;
+@@ -3103,7 +3152,7 @@ static inline void __do_contiguous_readpages(struct extent_io_tree *tree,
+ 
+ 	for (index = 0; index < nr_pages; index++) {
+ 		__do_readpage(tree, pages[index], get_extent, em_cached, bio,
+-			      mirror_num, bio_flags, rw);
++			      mirror_num, bio_flags, rw, prev_em_start);
+ 		page_cache_release(pages[index]);
+ 	}
+ }
+@@ -3113,7 +3162,8 @@ static void __extent_readpages(struct extent_io_tree *tree,
+ 			       int nr_pages, get_extent_t *get_extent,
+ 			       struct extent_map **em_cached,
+ 			       struct bio **bio, int mirror_num,
+-			       unsigned long *bio_flags, int rw)
++			       unsigned long *bio_flags, int rw,
++			       u64 *prev_em_start)
+ {
+ 	u64 start = 0;
+ 	u64 end = 0;
+@@ -3134,7 +3184,7 @@ static void __extent_readpages(struct extent_io_tree *tree,
+ 						  index - first_index, start,
+ 						  end, get_extent, em_cached,
+ 						  bio, mirror_num, bio_flags,
+-						  rw);
++						  rw, prev_em_start);
+ 			start = page_start;
+ 			end = start + PAGE_CACHE_SIZE - 1;
+ 			first_index = index;
+@@ -3145,7 +3195,8 @@ static void __extent_readpages(struct extent_io_tree *tree,
+ 		__do_contiguous_readpages(tree, &pages[first_index],
+ 					  index - first_index, start,
+ 					  end, get_extent, em_cached, bio,
+-					  mirror_num, bio_flags, rw);
++					  mirror_num, bio_flags, rw,
++					  prev_em_start);
+ }
+ 
+ static int __extent_read_full_page(struct extent_io_tree *tree,
+@@ -3171,7 +3222,7 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
+ 	}
+ 
+ 	ret = __do_readpage(tree, page, get_extent, NULL, bio, mirror_num,
+-			    bio_flags, rw);
++			    bio_flags, rw, NULL);
+ 	return ret;
+ }
+ 
+@@ -3197,7 +3248,7 @@ int extent_read_full_page_nolock(struct extent_io_tree *tree, struct page *page,
+ 	int ret;
+ 
+ 	ret = __do_readpage(tree, page, get_extent, NULL, &bio, mirror_num,
+-				      &bio_flags, READ);
++			    &bio_flags, READ, NULL);
+ 	if (bio)
+ 		ret = submit_one_bio(READ, bio, mirror_num, bio_flags);
+ 	return ret;
+@@ -3450,7 +3501,7 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
+ 						 sector, iosize, pg_offset,
+ 						 bdev, &epd->bio, max_nr,
+ 						 end_bio_extent_writepage,
+-						 0, 0, 0);
++						 0, 0, 0, false);
+ 			if (ret)
+ 				SetPageError(page);
+ 		}
+@@ -3752,7 +3803,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
+ 		ret = submit_extent_page(rw, tree, p, offset >> 9,
+ 					 PAGE_CACHE_SIZE, 0, bdev, &epd->bio,
+ 					 -1, end_bio_extent_buffer_writepage,
+-					 0, epd->bio_flags, bio_flags);
++					 0, epd->bio_flags, bio_flags, false);
+ 		epd->bio_flags = bio_flags;
+ 		if (ret) {
+ 			set_btree_ioerr(p);
+@@ -4156,6 +4207,7 @@ int extent_readpages(struct extent_io_tree *tree,
+ 	struct page *page;
+ 	struct extent_map *em_cached = NULL;
+ 	int nr = 0;
++	u64 prev_em_start = (u64)-1;
+ 
+ 	for (page_idx = 0; page_idx < nr_pages; page_idx++) {
+ 		page = list_entry(pages->prev, struct page, lru);
+@@ -4172,12 +4224,12 @@ int extent_readpages(struct extent_io_tree *tree,
+ 		if (nr < ARRAY_SIZE(pagepool))
+ 			continue;
+ 		__extent_readpages(tree, pagepool, nr, get_extent, &em_cached,
+-				   &bio, 0, &bio_flags, READ);
++				   &bio, 0, &bio_flags, READ, &prev_em_start);
+ 		nr = 0;
+ 	}
+ 	if (nr)
+ 		__extent_readpages(tree, pagepool, nr, get_extent, &em_cached,
+-				   &bio, 0, &bio_flags, READ);
++				   &bio, 0, &bio_flags, READ, &prev_em_start);
+ 
+ 	if (em_cached)
+ 		free_extent_map(em_cached);
+diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
+index e33dff356460..b54e63038b96 100644
+--- a/fs/btrfs/inode.c
++++ b/fs/btrfs/inode.c
+@@ -5051,7 +5051,8 @@ void btrfs_evict_inode(struct inode *inode)
+ 		goto no_delete;
+ 	}
+ 	/* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
+-	btrfs_wait_ordered_range(inode, 0, (u64)-1);
++	if (!special_file(inode->i_mode))
++		btrfs_wait_ordered_range(inode, 0, (u64)-1);
+ 
+ 	btrfs_free_io_failure_record(inode, 0, (u64)-1);
+ 
+diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
+index aa0dc2573374..afa09fce8151 100644
+--- a/fs/cifs/cifsencrypt.c
++++ b/fs/cifs/cifsencrypt.c
+@@ -444,6 +444,48 @@ find_domain_name(struct cifs_ses *ses, const struct nls_table *nls_cp)
+ 	return 0;
+ }
+ 
++/* Server has provided av pairs/target info in the type 2 challenge
++ * packet and we have plucked it and stored within smb session.
++ * We parse that blob here to find the server given timestamp
++ * as part of ntlmv2 authentication (or local current time as
++ * default in case of failure)
++ */
++static __le64
++find_timestamp(struct cifs_ses *ses)
++{
++	unsigned int attrsize;
++	unsigned int type;
++	unsigned int onesize = sizeof(struct ntlmssp2_name);
++	unsigned char *blobptr;
++	unsigned char *blobend;
++	struct ntlmssp2_name *attrptr;
++
++	if (!ses->auth_key.len || !ses->auth_key.response)
++		return 0;
++
++	blobptr = ses->auth_key.response;
++	blobend = blobptr + ses->auth_key.len;
++
++	while (blobptr + onesize < blobend) {
++		attrptr = (struct ntlmssp2_name *) blobptr;
++		type = le16_to_cpu(attrptr->type);
++		if (type == NTLMSSP_AV_EOL)
++			break;
++		blobptr += 2; /* advance attr type */
++		attrsize = le16_to_cpu(attrptr->length);
++		blobptr += 2; /* advance attr size */
++		if (blobptr + attrsize > blobend)
++			break;
++		if (type == NTLMSSP_AV_TIMESTAMP) {
++			if (attrsize == sizeof(u64))
++				return *((__le64 *)blobptr);
++		}
++		blobptr += attrsize; /* advance attr value */
++	}
++
++	return cpu_to_le64(cifs_UnixTimeToNT(CURRENT_TIME));
++}
++
+ static int calc_ntlmv2_hash(struct cifs_ses *ses, char *ntlmv2_hash,
+ 			    const struct nls_table *nls_cp)
+ {
+@@ -641,6 +683,7 @@ setup_ntlmv2_rsp(struct cifs_ses *ses, const struct nls_table *nls_cp)
+ 	struct ntlmv2_resp *ntlmv2;
+ 	char ntlmv2_hash[16];
+ 	unsigned char *tiblob = NULL; /* target info blob */
++	__le64 rsp_timestamp;
+ 
+ 	if (ses->server->negflavor == CIFS_NEGFLAVOR_EXTENDED) {
+ 		if (!ses->domainName) {
+@@ -659,6 +702,12 @@ setup_ntlmv2_rsp(struct cifs_ses *ses, const struct nls_table *nls_cp)
+ 		}
+ 	}
+ 
++	/* Must be within 5 minutes of the server (or in range +/-2h
++	 * in case of Mac OS X), so simply carry over server timestamp
++	 * (as Windows 7 does)
++	 */
++	rsp_timestamp = find_timestamp(ses);
++
+ 	baselen = CIFS_SESS_KEY_SIZE + sizeof(struct ntlmv2_resp);
+ 	tilen = ses->auth_key.len;
+ 	tiblob = ses->auth_key.response;
+@@ -675,8 +724,8 @@ setup_ntlmv2_rsp(struct cifs_ses *ses, const struct nls_table *nls_cp)
+ 			(ses->auth_key.response + CIFS_SESS_KEY_SIZE);
+ 	ntlmv2->blob_signature = cpu_to_le32(0x00000101);
+ 	ntlmv2->reserved = 0;
+-	/* Must be within 5 minutes of the server */
+-	ntlmv2->time = cpu_to_le64(cifs_UnixTimeToNT(CURRENT_TIME));
++	ntlmv2->time = rsp_timestamp;
++
+ 	get_random_bytes(&ntlmv2->client_chal, sizeof(ntlmv2->client_chal));
+ 	ntlmv2->reserved2 = 0;
+ 
+diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
+index f621b44cb800..6b66dd5d1540 100644
+--- a/fs/cifs/inode.c
++++ b/fs/cifs/inode.c
+@@ -2034,7 +2034,6 @@ cifs_set_file_size(struct inode *inode, struct iattr *attrs,
+ 	struct tcon_link *tlink = NULL;
+ 	struct cifs_tcon *tcon = NULL;
+ 	struct TCP_Server_Info *server;
+-	struct cifs_io_parms io_parms;
+ 
+ 	/*
+ 	 * To avoid spurious oplock breaks from server, in the case of
+@@ -2056,18 +2055,6 @@ cifs_set_file_size(struct inode *inode, struct iattr *attrs,
+ 			rc = -ENOSYS;
+ 		cifsFileInfo_put(open_file);
+ 		cifs_dbg(FYI, "SetFSize for attrs rc = %d\n", rc);
+-		if ((rc == -EINVAL) || (rc == -EOPNOTSUPP)) {
+-			unsigned int bytes_written;
+-
+-			io_parms.netfid = open_file->fid.netfid;
+-			io_parms.pid = open_file->pid;
+-			io_parms.tcon = tcon;
+-			io_parms.offset = 0;
+-			io_parms.length = attrs->ia_size;
+-			rc = CIFSSMBWrite(xid, &io_parms, &bytes_written,
+-					  NULL, NULL, 1);
+-			cifs_dbg(FYI, "Wrt seteof rc %d\n", rc);
+-		}
+ 	} else
+ 		rc = -EINVAL;
+ 
+@@ -2093,28 +2080,7 @@ cifs_set_file_size(struct inode *inode, struct iattr *attrs,
+ 	else
+ 		rc = -ENOSYS;
+ 	cifs_dbg(FYI, "SetEOF by path (setattrs) rc = %d\n", rc);
+-	if ((rc == -EINVAL) || (rc == -EOPNOTSUPP)) {
+-		__u16 netfid;
+-		int oplock = 0;
+ 
+-		rc = SMBLegacyOpen(xid, tcon, full_path, FILE_OPEN,
+-				   GENERIC_WRITE, CREATE_NOT_DIR, &netfid,
+-				   &oplock, NULL, cifs_sb->local_nls,
+-				   cifs_remap(cifs_sb));
+-		if (rc == 0) {
+-			unsigned int bytes_written;
+-
+-			io_parms.netfid = netfid;
+-			io_parms.pid = current->tgid;
+-			io_parms.tcon = tcon;
+-			io_parms.offset = 0;
+-			io_parms.length = attrs->ia_size;
+-			rc = CIFSSMBWrite(xid, &io_parms, &bytes_written, NULL,
+-					  NULL,  1);
+-			cifs_dbg(FYI, "wrt seteof rc %d\n", rc);
+-			CIFSSMBClose(xid, tcon, netfid);
+-		}
+-	}
+ 	if (tlink)
+ 		cifs_put_tlink(tlink);
+ 
+diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
+index df91bcf56d67..18da19f4f811 100644
+--- a/fs/cifs/smb2ops.c
++++ b/fs/cifs/smb2ops.c
+@@ -50,9 +50,13 @@ change_conf(struct TCP_Server_Info *server)
+ 		break;
+ 	default:
+ 		server->echoes = true;
+-		server->oplocks = true;
++		if (enable_oplocks) {
++			server->oplocks = true;
++			server->oplock_credits = 1;
++		} else
++			server->oplocks = false;
++
+ 		server->echo_credits = 1;
+-		server->oplock_credits = 1;
+ 	}
+ 	server->credits -= server->echo_credits + server->oplock_credits;
+ 	return 0;
+diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
+index b8b4f08ee094..60dd83164ed6 100644
+--- a/fs/cifs/smb2pdu.c
++++ b/fs/cifs/smb2pdu.c
+@@ -46,6 +46,7 @@
+ #include "smb2status.h"
+ #include "smb2glob.h"
+ #include "cifspdu.h"
++#include "cifs_spnego.h"
+ 
+ /*
+  *  The following table defines the expected "StructureSize" of SMB2 requests
+@@ -486,19 +487,15 @@ SMB2_negotiate(const unsigned int xid, struct cifs_ses *ses)
+ 		cifs_dbg(FYI, "missing security blob on negprot\n");
+ 
+ 	rc = cifs_enable_signing(server, ses->sign);
+-#ifdef CONFIG_SMB2_ASN1  /* BB REMOVEME when updated asn1.c ready */
+ 	if (rc)
+ 		goto neg_exit;
+-	if (blob_length)
++	if (blob_length) {
+ 		rc = decode_negTokenInit(security_blob, blob_length, server);
+-	if (rc == 1)
+-		rc = 0;
+-	else if (rc == 0) {
+-		rc = -EIO;
+-		goto neg_exit;
++		if (rc == 1)
++			rc = 0;
++		else if (rc == 0)
++			rc = -EIO;
+ 	}
+-#endif
+-
+ neg_exit:
+ 	free_rsp_buf(resp_buftype, rsp);
+ 	return rc;
+@@ -592,7 +589,8 @@ SMB2_sess_setup(const unsigned int xid, struct cifs_ses *ses,
+ 	__le32 phase = NtLmNegotiate; /* NTLMSSP, if needed, is multistage */
+ 	struct TCP_Server_Info *server = ses->server;
+ 	u16 blob_length = 0;
+-	char *security_blob;
++	struct key *spnego_key = NULL;
++	char *security_blob = NULL;
+ 	char *ntlmssp_blob = NULL;
+ 	bool use_spnego = false; /* else use raw ntlmssp */
+ 
+@@ -620,7 +618,8 @@ SMB2_sess_setup(const unsigned int xid, struct cifs_ses *ses,
+ 	ses->ntlmssp->sesskey_per_smbsess = true;
+ 
+ 	/* FIXME: allow for other auth types besides NTLMSSP (e.g. krb5) */
+-	ses->sectype = RawNTLMSSP;
++	if (ses->sectype != Kerberos && ses->sectype != RawNTLMSSP)
++		ses->sectype = RawNTLMSSP;
+ 
+ ssetup_ntlmssp_authenticate:
+ 	if (phase == NtLmChallenge)
+@@ -649,7 +648,48 @@ ssetup_ntlmssp_authenticate:
+ 	iov[0].iov_base = (char *)req;
+ 	/* 4 for rfc1002 length field and 1 for pad */
+ 	iov[0].iov_len = get_rfc1002_length(req) + 4 - 1;
+-	if (phase == NtLmNegotiate) {
++
++	if (ses->sectype == Kerberos) {
++#ifdef CONFIG_CIFS_UPCALL
++		struct cifs_spnego_msg *msg;
++
++		spnego_key = cifs_get_spnego_key(ses);
++		if (IS_ERR(spnego_key)) {
++			rc = PTR_ERR(spnego_key);
++			spnego_key = NULL;
++			goto ssetup_exit;
++		}
++
++		msg = spnego_key->payload.data;
++		/*
++		 * check version field to make sure that cifs.upcall is
++		 * sending us a response in an expected form
++		 */
++		if (msg->version != CIFS_SPNEGO_UPCALL_VERSION) {
++			cifs_dbg(VFS,
++				  "bad cifs.upcall version. Expected %d got %d",
++				  CIFS_SPNEGO_UPCALL_VERSION, msg->version);
++			rc = -EKEYREJECTED;
++			goto ssetup_exit;
++		}
++		ses->auth_key.response = kmemdup(msg->data, msg->sesskey_len,
++						 GFP_KERNEL);
++		if (!ses->auth_key.response) {
++			cifs_dbg(VFS,
++				"Kerberos can't allocate (%u bytes) memory",
++				msg->sesskey_len);
++			rc = -ENOMEM;
++			goto ssetup_exit;
++		}
++		ses->auth_key.len = msg->sesskey_len;
++		blob_length = msg->secblob_len;
++		iov[1].iov_base = msg->data + msg->sesskey_len;
++		iov[1].iov_len = blob_length;
++#else
++		rc = -EOPNOTSUPP;
++		goto ssetup_exit;
++#endif /* CONFIG_CIFS_UPCALL */
++	} else if (phase == NtLmNegotiate) { /* if not krb5 must be ntlmssp */
+ 		ntlmssp_blob = kmalloc(sizeof(struct _NEGOTIATE_MESSAGE),
+ 				       GFP_KERNEL);
+ 		if (ntlmssp_blob == NULL) {
+@@ -672,6 +712,8 @@ ssetup_ntlmssp_authenticate:
+ 			/* with raw NTLMSSP we don't encapsulate in SPNEGO */
+ 			security_blob = ntlmssp_blob;
+ 		}
++		iov[1].iov_base = security_blob;
++		iov[1].iov_len = blob_length;
+ 	} else if (phase == NtLmAuthenticate) {
+ 		req->hdr.SessionId = ses->Suid;
+ 		ntlmssp_blob = kzalloc(sizeof(struct _NEGOTIATE_MESSAGE) + 500,
+@@ -699,6 +741,8 @@ ssetup_ntlmssp_authenticate:
+ 		} else {
+ 			security_blob = ntlmssp_blob;
+ 		}
++		iov[1].iov_base = security_blob;
++		iov[1].iov_len = blob_length;
+ 	} else {
+ 		cifs_dbg(VFS, "illegal ntlmssp phase\n");
+ 		rc = -EIO;
+@@ -710,8 +754,6 @@ ssetup_ntlmssp_authenticate:
+ 				cpu_to_le16(sizeof(struct smb2_sess_setup_req) -
+ 					    1 /* pad */ - 4 /* rfc1001 len */);
+ 	req->SecurityBufferLength = cpu_to_le16(blob_length);
+-	iov[1].iov_base = security_blob;
+-	iov[1].iov_len = blob_length;
+ 
+ 	inc_rfc1001_len(req, blob_length - 1 /* pad */);
+ 
+@@ -722,6 +764,7 @@ ssetup_ntlmssp_authenticate:
+ 
+ 	kfree(security_blob);
+ 	rsp = (struct smb2_sess_setup_rsp *)iov[0].iov_base;
++	ses->Suid = rsp->hdr.SessionId;
+ 	if (resp_buftype != CIFS_NO_BUFFER &&
+ 	    rsp->hdr.Status == STATUS_MORE_PROCESSING_REQUIRED) {
+ 		if (phase != NtLmNegotiate) {
+@@ -739,7 +782,6 @@ ssetup_ntlmssp_authenticate:
+ 		/* NTLMSSP Negotiate sent now processing challenge (response) */
+ 		phase = NtLmChallenge; /* process ntlmssp challenge */
+ 		rc = 0; /* MORE_PROCESSING is not an error here but expected */
+-		ses->Suid = rsp->hdr.SessionId;
+ 		rc = decode_ntlmssp_challenge(rsp->Buffer,
+ 				le16_to_cpu(rsp->SecurityBufferLength), ses);
+ 	}
+@@ -796,6 +838,10 @@ keygen_exit:
+ 		kfree(ses->auth_key.response);
+ 		ses->auth_key.response = NULL;
+ 	}
++	if (spnego_key) {
++		key_invalidate(spnego_key);
++		key_put(spnego_key);
++	}
+ 	kfree(ses->ntlmssp);
+ 
+ 	return rc;
+diff --git a/fs/dax.c b/fs/dax.c
+index a7f77e1fa18c..ef35a2014580 100644
+--- a/fs/dax.c
++++ b/fs/dax.c
+@@ -116,7 +116,8 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
+ 		unsigned len;
+ 		if (pos == max) {
+ 			unsigned blkbits = inode->i_blkbits;
+-			sector_t block = pos >> blkbits;
++			long page = pos >> PAGE_SHIFT;
++			sector_t block = page << (PAGE_SHIFT - blkbits);
+ 			unsigned first = pos - (block << blkbits);
+ 			long size;
+ 
+diff --git a/fs/dcache.c b/fs/dcache.c
+index 9b5fe503f6cb..e3b44ca75a1b 100644
+--- a/fs/dcache.c
++++ b/fs/dcache.c
+@@ -2926,6 +2926,13 @@ restart:
+ 
+ 		if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
+ 			struct mount *parent = ACCESS_ONCE(mnt->mnt_parent);
++			/* Escaped? */
++			if (dentry != vfsmnt->mnt_root) {
++				bptr = *buffer;
++				blen = *buflen;
++				error = 3;
++				break;
++			}
+ 			/* Global root? */
+ 			if (mnt != parent) {
+ 				dentry = ACCESS_ONCE(mnt->mnt_mountpoint);
+diff --git a/fs/namei.c b/fs/namei.c
+index 1c2105ed20c5..36df4818a635 100644
+--- a/fs/namei.c
++++ b/fs/namei.c
+@@ -560,6 +560,24 @@ static int __nd_alloc_stack(struct nameidata *nd)
+ 	return 0;
+ }
+ 
++/**
++ * path_connected - Verify that a path->dentry is below path->mnt.mnt_root
++ * @path: nameidata to verify
++ *
++ * Rename can sometimes move a file or directory outside of a bind
++ * mount, path_connected allows those cases to be detected.
++ */
++static bool path_connected(const struct path *path)
++{
++	struct vfsmount *mnt = path->mnt;
++
++	/* Only bind mounts can have disconnected paths */
++	if (mnt->mnt_root == mnt->mnt_sb->s_root)
++		return true;
++
++	return is_subdir(path->dentry, mnt->mnt_root);
++}
++
+ static inline int nd_alloc_stack(struct nameidata *nd)
+ {
+ 	if (likely(nd->depth != EMBEDDED_LEVELS))
+@@ -1296,6 +1314,8 @@ static int follow_dotdot_rcu(struct nameidata *nd)
+ 				return -ECHILD;
+ 			nd->path.dentry = parent;
+ 			nd->seq = seq;
++			if (unlikely(!path_connected(&nd->path)))
++				return -ENOENT;
+ 			break;
+ 		} else {
+ 			struct mount *mnt = real_mount(nd->path.mnt);
+@@ -1396,7 +1416,7 @@ static void follow_mount(struct path *path)
+ 	}
+ }
+ 
+-static void follow_dotdot(struct nameidata *nd)
++static int follow_dotdot(struct nameidata *nd)
+ {
+ 	if (!nd->root.mnt)
+ 		set_root(nd);
+@@ -1412,6 +1432,8 @@ static void follow_dotdot(struct nameidata *nd)
+ 			/* rare case of legitimate dget_parent()... */
+ 			nd->path.dentry = dget_parent(nd->path.dentry);
+ 			dput(old);
++			if (unlikely(!path_connected(&nd->path)))
++				return -ENOENT;
+ 			break;
+ 		}
+ 		if (!follow_up(&nd->path))
+@@ -1419,6 +1441,7 @@ static void follow_dotdot(struct nameidata *nd)
+ 	}
+ 	follow_mount(&nd->path);
+ 	nd->inode = nd->path.dentry->d_inode;
++	return 0;
+ }
+ 
+ /*
+@@ -1535,8 +1558,6 @@ static int lookup_fast(struct nameidata *nd,
+ 		negative = d_is_negative(dentry);
+ 		if (read_seqcount_retry(&dentry->d_seq, seq))
+ 			return -ECHILD;
+-		if (negative)
+-			return -ENOENT;
+ 
+ 		/*
+ 		 * This sequence count validates that the parent had no
+@@ -1557,6 +1578,12 @@ static int lookup_fast(struct nameidata *nd,
+ 				goto unlazy;
+ 			}
+ 		}
++		/*
++		 * Note: do negative dentry check after revalidation in
++		 * case that drops it.
++		 */
++		if (negative)
++			return -ENOENT;
+ 		path->mnt = mnt;
+ 		path->dentry = dentry;
+ 		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+@@ -1634,7 +1661,7 @@ static inline int handle_dots(struct nameidata *nd, int type)
+ 		if (nd->flags & LOOKUP_RCU) {
+ 			return follow_dotdot_rcu(nd);
+ 		} else
+-			follow_dotdot(nd);
++			return follow_dotdot(nd);
+ 	}
+ 	return 0;
+ }
+diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
+index 029d688a969f..c56886829708 100644
+--- a/fs/nfs/delegation.c
++++ b/fs/nfs/delegation.c
+@@ -113,7 +113,8 @@ out:
+ 	return status;
+ }
+ 
+-static int nfs_delegation_claim_opens(struct inode *inode, const nfs4_stateid *stateid)
++static int nfs_delegation_claim_opens(struct inode *inode,
++		const nfs4_stateid *stateid, fmode_t type)
+ {
+ 	struct nfs_inode *nfsi = NFS_I(inode);
+ 	struct nfs_open_context *ctx;
+@@ -140,7 +141,7 @@ again:
+ 		/* Block nfs4_proc_unlck */
+ 		mutex_lock(&sp->so_delegreturn_mutex);
+ 		seq = raw_seqcount_begin(&sp->so_reclaim_seqcount);
+-		err = nfs4_open_delegation_recall(ctx, state, stateid);
++		err = nfs4_open_delegation_recall(ctx, state, stateid, type);
+ 		if (!err)
+ 			err = nfs_delegation_claim_locks(ctx, state, stateid);
+ 		if (!err && read_seqcount_retry(&sp->so_reclaim_seqcount, seq))
+@@ -411,7 +412,8 @@ static int nfs_end_delegation_return(struct inode *inode, struct nfs_delegation
+ 	do {
+ 		if (test_bit(NFS_DELEGATION_REVOKED, &delegation->flags))
+ 			break;
+-		err = nfs_delegation_claim_opens(inode, &delegation->stateid);
++		err = nfs_delegation_claim_opens(inode, &delegation->stateid,
++				delegation->type);
+ 		if (!issync || err != -EAGAIN)
+ 			break;
+ 		/*
+diff --git a/fs/nfs/delegation.h b/fs/nfs/delegation.h
+index e3c20a3ccc93..785c8525b576 100644
+--- a/fs/nfs/delegation.h
++++ b/fs/nfs/delegation.h
+@@ -54,7 +54,7 @@ void nfs_delegation_reap_unclaimed(struct nfs_client *clp);
+ 
+ /* NFSv4 delegation-related procedures */
+ int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4_stateid *stateid, int issync);
+-int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state *state, const nfs4_stateid *stateid);
++int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state *state, const nfs4_stateid *stateid, fmode_t type);
+ int nfs4_lock_delegation_recall(struct file_lock *fl, struct nfs4_state *state, const nfs4_stateid *stateid);
+ bool nfs4_copy_delegation_stateid(nfs4_stateid *dst, struct inode *inode, fmode_t flags);
+ 
+diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
+index b34f2e228601..02ec07973bc4 100644
+--- a/fs/nfs/filelayout/filelayout.c
++++ b/fs/nfs/filelayout/filelayout.c
+@@ -629,23 +629,18 @@ out_put:
+ 	goto out;
+ }
+ 
+-static void filelayout_free_fh_array(struct nfs4_filelayout_segment *fl)
++static void _filelayout_free_lseg(struct nfs4_filelayout_segment *fl)
+ {
+ 	int i;
+ 
+-	for (i = 0; i < fl->num_fh; i++) {
+-		if (!fl->fh_array[i])
+-			break;
+-		kfree(fl->fh_array[i]);
++	if (fl->fh_array) {
++		for (i = 0; i < fl->num_fh; i++) {
++			if (!fl->fh_array[i])
++				break;
++			kfree(fl->fh_array[i]);
++		}
++		kfree(fl->fh_array);
+ 	}
+-	kfree(fl->fh_array);
+-	fl->fh_array = NULL;
+-}
+-
+-static void
+-_filelayout_free_lseg(struct nfs4_filelayout_segment *fl)
+-{
+-	filelayout_free_fh_array(fl);
+ 	kfree(fl);
+ }
+ 
+@@ -716,21 +711,21 @@ filelayout_decode_layout(struct pnfs_layout_hdr *flo,
+ 		/* Do we want to use a mempool here? */
+ 		fl->fh_array[i] = kmalloc(sizeof(struct nfs_fh), gfp_flags);
+ 		if (!fl->fh_array[i])
+-			goto out_err_free;
++			goto out_err;
+ 
+ 		p = xdr_inline_decode(&stream, 4);
+ 		if (unlikely(!p))
+-			goto out_err_free;
++			goto out_err;
+ 		fl->fh_array[i]->size = be32_to_cpup(p++);
+ 		if (sizeof(struct nfs_fh) < fl->fh_array[i]->size) {
+ 			printk(KERN_ERR "NFS: Too big fh %d received %d\n",
+ 			       i, fl->fh_array[i]->size);
+-			goto out_err_free;
++			goto out_err;
+ 		}
+ 
+ 		p = xdr_inline_decode(&stream, fl->fh_array[i]->size);
+ 		if (unlikely(!p))
+-			goto out_err_free;
++			goto out_err;
+ 		memcpy(fl->fh_array[i]->data, p, fl->fh_array[i]->size);
+ 		dprintk("DEBUG: %s: fh len %d\n", __func__,
+ 			fl->fh_array[i]->size);
+@@ -739,8 +734,6 @@ filelayout_decode_layout(struct pnfs_layout_hdr *flo,
+ 	__free_page(scratch);
+ 	return 0;
+ 
+-out_err_free:
+-	filelayout_free_fh_array(fl);
+ out_err:
+ 	__free_page(scratch);
+ 	return -EIO;
+diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
+index d731bbf974aa..0f020e4d8421 100644
+--- a/fs/nfs/nfs42proc.c
++++ b/fs/nfs/nfs42proc.c
+@@ -175,10 +175,12 @@ loff_t nfs42_proc_llseek(struct file *filep, loff_t offset, int whence)
+ {
+ 	struct nfs_server *server = NFS_SERVER(file_inode(filep));
+ 	struct nfs4_exception exception = { };
+-	int err;
++	loff_t err;
+ 
+ 	do {
+ 		err = _nfs42_proc_llseek(filep, offset, whence);
++		if (err >= 0)
++			break;
+ 		if (err == -ENOTSUPP)
+ 			return -EOPNOTSUPP;
+ 		err = nfs4_handle_exception(server, err, &exception);
+diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
+index 73c8204ad463..d2daacad3568 100644
+--- a/fs/nfs/nfs4proc.c
++++ b/fs/nfs/nfs4proc.c
+@@ -1127,6 +1127,21 @@ static int nfs4_wait_for_completion_rpc_task(struct rpc_task *task)
+ 	return ret;
+ }
+ 
++static bool nfs4_mode_match_open_stateid(struct nfs4_state *state,
++		fmode_t fmode)
++{
++	switch(fmode & (FMODE_READ|FMODE_WRITE)) {
++	case FMODE_READ|FMODE_WRITE:
++		return state->n_rdwr != 0;
++	case FMODE_WRITE:
++		return state->n_wronly != 0;
++	case FMODE_READ:
++		return state->n_rdonly != 0;
++	}
++	WARN_ON_ONCE(1);
++	return false;
++}
++
+ static int can_open_cached(struct nfs4_state *state, fmode_t mode, int open_mode)
+ {
+ 	int ret = 0;
+@@ -1561,17 +1576,13 @@ static struct nfs4_opendata *nfs4_open_recoverdata_alloc(struct nfs_open_context
+ 	return opendata;
+ }
+ 
+-static int nfs4_open_recover_helper(struct nfs4_opendata *opendata, fmode_t fmode, struct nfs4_state **res)
++static int nfs4_open_recover_helper(struct nfs4_opendata *opendata,
++		fmode_t fmode)
+ {
+ 	struct nfs4_state *newstate;
+ 	int ret;
+ 
+-	if ((opendata->o_arg.claim == NFS4_OPEN_CLAIM_DELEGATE_CUR ||
+-	     opendata->o_arg.claim == NFS4_OPEN_CLAIM_DELEG_CUR_FH) &&
+-	    (opendata->o_arg.u.delegation_type & fmode) != fmode)
+-		/* This mode can't have been delegated, so we must have
+-		 * a valid open_stateid to cover it - not need to reclaim.
+-		 */
++	if (!nfs4_mode_match_open_stateid(opendata->state, fmode))
+ 		return 0;
+ 	opendata->o_arg.open_flags = 0;
+ 	opendata->o_arg.fmode = fmode;
+@@ -1587,14 +1598,14 @@ static int nfs4_open_recover_helper(struct nfs4_opendata *opendata, fmode_t fmod
+ 	newstate = nfs4_opendata_to_nfs4_state(opendata);
+ 	if (IS_ERR(newstate))
+ 		return PTR_ERR(newstate);
++	if (newstate != opendata->state)
++		ret = -ESTALE;
+ 	nfs4_close_state(newstate, fmode);
+-	*res = newstate;
+-	return 0;
++	return ret;
+ }
+ 
+ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *state)
+ {
+-	struct nfs4_state *newstate;
+ 	int ret;
+ 
+ 	/* Don't trigger recovery in nfs_test_and_clear_all_open_stateid */
+@@ -1605,27 +1616,15 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
+ 	clear_bit(NFS_DELEGATED_STATE, &state->flags);
+ 	clear_bit(NFS_OPEN_STATE, &state->flags);
+ 	smp_rmb();
+-	if (state->n_rdwr != 0) {
+-		ret = nfs4_open_recover_helper(opendata, FMODE_READ|FMODE_WRITE, &newstate);
+-		if (ret != 0)
+-			return ret;
+-		if (newstate != state)
+-			return -ESTALE;
+-	}
+-	if (state->n_wronly != 0) {
+-		ret = nfs4_open_recover_helper(opendata, FMODE_WRITE, &newstate);
+-		if (ret != 0)
+-			return ret;
+-		if (newstate != state)
+-			return -ESTALE;
+-	}
+-	if (state->n_rdonly != 0) {
+-		ret = nfs4_open_recover_helper(opendata, FMODE_READ, &newstate);
+-		if (ret != 0)
+-			return ret;
+-		if (newstate != state)
+-			return -ESTALE;
+-	}
++	ret = nfs4_open_recover_helper(opendata, FMODE_READ|FMODE_WRITE);
++	if (ret != 0)
++		return ret;
++	ret = nfs4_open_recover_helper(opendata, FMODE_WRITE);
++	if (ret != 0)
++		return ret;
++	ret = nfs4_open_recover_helper(opendata, FMODE_READ);
++	if (ret != 0)
++		return ret;
+ 	/*
+ 	 * We may have performed cached opens for all three recoveries.
+ 	 * Check if we need to update the current stateid.
+@@ -1749,18 +1748,32 @@ static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct
+ 	return err;
+ }
+ 
+-int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state *state, const nfs4_stateid *stateid)
++int nfs4_open_delegation_recall(struct nfs_open_context *ctx,
++		struct nfs4_state *state, const nfs4_stateid *stateid,
++		fmode_t type)
+ {
+ 	struct nfs_server *server = NFS_SERVER(state->inode);
+ 	struct nfs4_opendata *opendata;
+-	int err;
++	int err = 0;
+ 
+ 	opendata = nfs4_open_recoverdata_alloc(ctx, state,
+ 			NFS4_OPEN_CLAIM_DELEG_CUR_FH);
+ 	if (IS_ERR(opendata))
+ 		return PTR_ERR(opendata);
+ 	nfs4_stateid_copy(&opendata->o_arg.u.delegation, stateid);
+-	err = nfs4_open_recover(opendata, state);
++	clear_bit(NFS_DELEGATED_STATE, &state->flags);
++	switch (type & (FMODE_READ|FMODE_WRITE)) {
++	case FMODE_READ|FMODE_WRITE:
++	case FMODE_WRITE:
++		err = nfs4_open_recover_helper(opendata, FMODE_READ|FMODE_WRITE);
++		if (err)
++			break;
++		err = nfs4_open_recover_helper(opendata, FMODE_WRITE);
++		if (err)
++			break;
++	case FMODE_READ:
++		err = nfs4_open_recover_helper(opendata, FMODE_READ);
++	}
+ 	nfs4_opendata_put(opendata);
+ 	return nfs4_handle_delegation_recall_error(server, state, stateid, err);
+ }
+diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
+index 7c5718ba625e..fe3ddd20ff89 100644
+--- a/fs/nfs/pagelist.c
++++ b/fs/nfs/pagelist.c
+@@ -508,7 +508,7 @@ size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
+ 	 * for it without upsetting the slab allocator.
+ 	 */
+ 	if (((mirror->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
+-			sizeof(struct page) > PAGE_SIZE)
++			sizeof(struct page *) > PAGE_SIZE)
+ 		return 0;
+ 
+ 	return min(mirror->pg_bsize - mirror->pg_count, (size_t)req->wb_bytes);
+diff --git a/fs/nfs/read.c b/fs/nfs/read.c
+index ae0ff7a11b40..01b8cc8e8cfc 100644
+--- a/fs/nfs/read.c
++++ b/fs/nfs/read.c
+@@ -72,6 +72,9 @@ void nfs_pageio_reset_read_mds(struct nfs_pageio_descriptor *pgio)
+ {
+ 	struct nfs_pgio_mirror *mirror;
+ 
++	if (pgio->pg_ops && pgio->pg_ops->pg_cleanup)
++		pgio->pg_ops->pg_cleanup(pgio);
++
+ 	pgio->pg_ops = &nfs_pgio_rw_ops;
+ 
+ 	/* read path should never have more than one mirror */
+diff --git a/fs/nfs/write.c b/fs/nfs/write.c
+index fdee9270ca15..b45b465bc205 100644
+--- a/fs/nfs/write.c
++++ b/fs/nfs/write.c
+@@ -1223,7 +1223,7 @@ static int nfs_can_extend_write(struct file *file, struct page *page, struct ino
+ 		return 1;
+ 	if (!flctx || (list_empty_careful(&flctx->flc_flock) &&
+ 		       list_empty_careful(&flctx->flc_posix)))
+-		return 0;
++		return 1;
+ 
+ 	/* Check to see if there are whole file write locks */
+ 	ret = 0;
+@@ -1351,6 +1351,9 @@ void nfs_pageio_reset_write_mds(struct nfs_pageio_descriptor *pgio)
+ {
+ 	struct nfs_pgio_mirror *mirror;
+ 
++	if (pgio->pg_ops && pgio->pg_ops->pg_cleanup)
++		pgio->pg_ops->pg_cleanup(pgio);
++
+ 	pgio->pg_ops = &nfs_pgio_rw_ops;
+ 
+ 	nfs_pageio_stop_mirroring(pgio);
+diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
+index fdf4b41d0609..482cfd34472d 100644
+--- a/fs/ocfs2/dlm/dlmmaster.c
++++ b/fs/ocfs2/dlm/dlmmaster.c
+@@ -1439,6 +1439,7 @@ int dlm_master_request_handler(struct o2net_msg *msg, u32 len, void *data,
+ 	int found, ret;
+ 	int set_maybe;
+ 	int dispatch_assert = 0;
++	int dispatched = 0;
+ 
+ 	if (!dlm_grab(dlm))
+ 		return DLM_MASTER_RESP_NO;
+@@ -1658,15 +1659,18 @@ send_response:
+ 			mlog(ML_ERROR, "failed to dispatch assert master work\n");
+ 			response = DLM_MASTER_RESP_ERROR;
+ 			dlm_lockres_put(res);
+-		} else
++		} else {
++			dispatched = 1;
+ 			__dlm_lockres_grab_inflight_worker(dlm, res);
++		}
+ 		spin_unlock(&res->spinlock);
+ 	} else {
+ 		if (res)
+ 			dlm_lockres_put(res);
+ 	}
+ 
+-	dlm_put(dlm);
++	if (!dispatched)
++		dlm_put(dlm);
+ 	return response;
+ }
+ 
+@@ -2090,7 +2094,6 @@ int dlm_dispatch_assert_master(struct dlm_ctxt *dlm,
+ 
+ 
+ 	/* queue up work for dlm_assert_master_worker */
+-	dlm_grab(dlm);  /* get an extra ref for the work item */
+ 	dlm_init_work_item(dlm, item, dlm_assert_master_worker, NULL);
+ 	item->u.am.lockres = res; /* already have a ref */
+ 	/* can optionally ignore node numbers higher than this node */
+diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
+index ce12e0b1a31f..3d90ad7ff91f 100644
+--- a/fs/ocfs2/dlm/dlmrecovery.c
++++ b/fs/ocfs2/dlm/dlmrecovery.c
+@@ -1694,6 +1694,7 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
+ 	unsigned int hash;
+ 	int master = DLM_LOCK_RES_OWNER_UNKNOWN;
+ 	u32 flags = DLM_ASSERT_MASTER_REQUERY;
++	int dispatched = 0;
+ 
+ 	if (!dlm_grab(dlm)) {
+ 		/* since the domain has gone away on this
+@@ -1719,8 +1720,10 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
+ 				dlm_put(dlm);
+ 				/* sender will take care of this and retry */
+ 				return ret;
+-			} else
++			} else {
++				dispatched = 1;
+ 				__dlm_lockres_grab_inflight_worker(dlm, res);
++			}
+ 			spin_unlock(&res->spinlock);
+ 		} else {
+ 			/* put.. incase we are not the master */
+@@ -1730,7 +1733,8 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
+ 	}
+ 	spin_unlock(&dlm->spinlock);
+ 
+-	dlm_put(dlm);
++	if (!dispatched)
++		dlm_put(dlm);
+ 	return master;
+ }
+ 
+diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
+index 96f3448b6eb4..fd65b3f1923c 100644
+--- a/fs/ubifs/xattr.c
++++ b/fs/ubifs/xattr.c
+@@ -652,11 +652,8 @@ int ubifs_init_security(struct inode *dentry, struct inode *inode,
+ {
+ 	int err;
+ 
+-	mutex_lock(&inode->i_mutex);
+ 	err = security_inode_init_security(inode, dentry, qstr,
+ 					   &init_xattrs, 0);
+-	mutex_unlock(&inode->i_mutex);
+-
+ 	if (err) {
+ 		struct ubifs_info *c = dentry->i_sb->s_fs_info;
+ 		ubifs_err(c, "cannot initialize security for inode %lu, error %d",
+diff --git a/include/asm-generic/preempt.h b/include/asm-generic/preempt.h
+index d0a7a4753db2..0bec580a4885 100644
+--- a/include/asm-generic/preempt.h
++++ b/include/asm-generic/preempt.h
+@@ -71,9 +71,10 @@ static __always_inline bool __preempt_count_dec_and_test(void)
+ /*
+  * Returns true when we need to resched and can (barring IRQ state).
+  */
+-static __always_inline bool should_resched(void)
++static __always_inline bool should_resched(int preempt_offset)
+ {
+-	return unlikely(!preempt_count() && tif_need_resched());
++	return unlikely(preempt_count() == preempt_offset &&
++			tif_need_resched());
+ }
+ 
+ #ifdef CONFIG_PREEMPT
+diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
+index 83bfb87f5bf1..e2aadbc7151f 100644
+--- a/include/asm-generic/qspinlock.h
++++ b/include/asm-generic/qspinlock.h
+@@ -111,8 +111,8 @@ static inline void queued_spin_unlock_wait(struct qspinlock *lock)
+ 		cpu_relax();
+ }
+ 
+-#ifndef virt_queued_spin_lock
+-static __always_inline bool virt_queued_spin_lock(struct qspinlock *lock)
++#ifndef virt_spin_lock
++static __always_inline bool virt_spin_lock(struct qspinlock *lock)
+ {
+ 	return false;
+ }
+diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
+index 93755a629299..430c876ad717 100644
+--- a/include/linux/cgroup-defs.h
++++ b/include/linux/cgroup-defs.h
+@@ -463,31 +463,8 @@ struct cgroup_subsys {
+ 	unsigned int depends_on;
+ };
+ 
+-extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
+-
+-/**
+- * cgroup_threadgroup_change_begin - threadgroup exclusion for cgroups
+- * @tsk: target task
+- *
+- * Called from threadgroup_change_begin() and allows cgroup operations to
+- * synchronize against threadgroup changes using a percpu_rw_semaphore.
+- */
+-static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
+-{
+-	percpu_down_read(&cgroup_threadgroup_rwsem);
+-}
+-
+-/**
+- * cgroup_threadgroup_change_end - threadgroup exclusion for cgroups
+- * @tsk: target task
+- *
+- * Called from threadgroup_change_end().  Counterpart of
+- * cgroup_threadcgroup_change_begin().
+- */
+-static inline void cgroup_threadgroup_change_end(struct task_struct *tsk)
+-{
+-	percpu_up_read(&cgroup_threadgroup_rwsem);
+-}
++void cgroup_threadgroup_change_begin(struct task_struct *tsk);
++void cgroup_threadgroup_change_end(struct task_struct *tsk);
+ 
+ #else	/* CONFIG_CGROUPS */
+ 
+diff --git a/include/linux/init_task.h b/include/linux/init_task.h
+index e8493fee8160..bb9b075f0eb0 100644
+--- a/include/linux/init_task.h
++++ b/include/linux/init_task.h
+@@ -25,6 +25,13 @@
+ extern struct files_struct init_files;
+ extern struct fs_struct init_fs;
+ 
++#ifdef CONFIG_CGROUPS
++#define INIT_GROUP_RWSEM(sig)						\
++	.group_rwsem = __RWSEM_INITIALIZER(sig.group_rwsem),
++#else
++#define INIT_GROUP_RWSEM(sig)
++#endif
++
+ #ifdef CONFIG_CPUSETS
+ #define INIT_CPUSET_SEQ(tsk)							\
+ 	.mems_allowed_seq = SEQCNT_ZERO(tsk.mems_allowed_seq),
+@@ -48,6 +55,7 @@ extern struct fs_struct init_fs;
+ 	},								\
+ 	.cred_guard_mutex =						\
+ 		 __MUTEX_INITIALIZER(sig.cred_guard_mutex),		\
++	INIT_GROUP_RWSEM(sig)						\
+ }
+ 
+ extern struct nsproxy init_nsproxy;
+diff --git a/include/linux/mm.h b/include/linux/mm.h
+index bf6f117fcf4d..2b05068f5878 100644
+--- a/include/linux/mm.h
++++ b/include/linux/mm.h
+@@ -916,6 +916,27 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
+ #endif
+ }
+ 
++#ifdef CONFIG_MEMCG
++static inline struct mem_cgroup *page_memcg(struct page *page)
++{
++	return page->mem_cgroup;
++}
++
++static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
++{
++	page->mem_cgroup = memcg;
++}
++#else
++static inline struct mem_cgroup *page_memcg(struct page *page)
++{
++	return NULL;
++}
++
++static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
++{
++}
++#endif
++
+ /*
+  * Some inline functions in vmstat.h depend on page_zone()
+  */
+diff --git a/include/linux/preempt.h b/include/linux/preempt.h
+index 84991f185173..bea8dd8ff5e0 100644
+--- a/include/linux/preempt.h
++++ b/include/linux/preempt.h
+@@ -84,13 +84,21 @@
+  */
+ #define in_nmi()	(preempt_count() & NMI_MASK)
+ 
++/*
++ * The preempt_count offset after preempt_disable();
++ */
+ #if defined(CONFIG_PREEMPT_COUNT)
+-# define PREEMPT_DISABLE_OFFSET 1
++# define PREEMPT_DISABLE_OFFSET	PREEMPT_OFFSET
+ #else
+-# define PREEMPT_DISABLE_OFFSET 0
++# define PREEMPT_DISABLE_OFFSET	0
+ #endif
+ 
+ /*
++ * The preempt_count offset after spin_lock()
++ */
++#define PREEMPT_LOCK_OFFSET	PREEMPT_DISABLE_OFFSET
++
++/*
+  * The preempt_count offset needed for things like:
+  *
+  *  spin_lock_bh()
+@@ -103,7 +111,7 @@
+  *
+  * Work as expected.
+  */
+-#define SOFTIRQ_LOCK_OFFSET (SOFTIRQ_DISABLE_OFFSET + PREEMPT_DISABLE_OFFSET)
++#define SOFTIRQ_LOCK_OFFSET (SOFTIRQ_DISABLE_OFFSET + PREEMPT_LOCK_OFFSET)
+ 
+ /*
+  * Are we running in atomic context?  WARNING: this macro cannot
+@@ -124,7 +132,8 @@
+ #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
+ extern void preempt_count_add(int val);
+ extern void preempt_count_sub(int val);
+-#define preempt_count_dec_and_test() ({ preempt_count_sub(1); should_resched(); })
++#define preempt_count_dec_and_test() \
++	({ preempt_count_sub(1); should_resched(0); })
+ #else
+ #define preempt_count_add(val)	__preempt_count_add(val)
+ #define preempt_count_sub(val)	__preempt_count_sub(val)
+@@ -184,7 +193,7 @@ do { \
+ 
+ #define preempt_check_resched() \
+ do { \
+-	if (should_resched()) \
++	if (should_resched(0)) \
+ 		__preempt_schedule(); \
+ } while (0)
+ 
+diff --git a/include/linux/sched.h b/include/linux/sched.h
+index 04b5ada460b4..bfca8aa215d1 100644
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -754,6 +754,18 @@ struct signal_struct {
+ 	unsigned audit_tty_log_passwd;
+ 	struct tty_audit_buf *tty_audit_buf;
+ #endif
++#ifdef CONFIG_CGROUPS
++	/*
++	 * group_rwsem prevents new tasks from entering the threadgroup and
++	 * member tasks from exiting, more specifically, setting of
++	 * PF_EXITING.  fork and exit paths are protected with this rwsem
++	 * using threadgroup_change_begin/end().  Users which require
++	 * threadgroup to remain stable should use threadgroup_[un]lock()
++	 * which also takes care of exec path.  Currently, cgroup is the
++	 * only user.
++	 */
++	struct rw_semaphore group_rwsem;
++#endif
+ 
+ 	oom_flags_t oom_flags;
+ 	short oom_score_adj;		/* OOM kill score adjustment */
+@@ -2897,12 +2909,6 @@ extern int _cond_resched(void);
+ 
+ extern int __cond_resched_lock(spinlock_t *lock);
+ 
+-#ifdef CONFIG_PREEMPT_COUNT
+-#define PREEMPT_LOCK_OFFSET	PREEMPT_OFFSET
+-#else
+-#define PREEMPT_LOCK_OFFSET	0
+-#endif
+-
+ #define cond_resched_lock(lock) ({				\
+ 	___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
+ 	__cond_resched_lock(lock);				\
+diff --git a/include/linux/security.h b/include/linux/security.h
+index 79d85ddf8093..2f4c1f7aa7db 100644
+--- a/include/linux/security.h
++++ b/include/linux/security.h
+@@ -946,7 +946,7 @@ static inline int security_task_prctl(int option, unsigned long arg2,
+ 				      unsigned long arg4,
+ 				      unsigned long arg5)
+ {
+-	return cap_task_prctl(option, arg2, arg3, arg3, arg5);
++	return cap_task_prctl(option, arg2, arg3, arg4, arg5);
+ }
+ 
+ static inline void security_task_to_inode(struct task_struct *p, struct inode *inode)
+diff --git a/include/net/netfilter/br_netfilter.h b/include/net/netfilter/br_netfilter.h
+index bab824bde92c..d4c6b5f30acd 100644
+--- a/include/net/netfilter/br_netfilter.h
++++ b/include/net/netfilter/br_netfilter.h
+@@ -59,7 +59,7 @@ static inline unsigned int
+ br_nf_pre_routing_ipv6(const struct nf_hook_ops *ops, struct sk_buff *skb,
+ 		       const struct nf_hook_state *state)
+ {
+-	return NF_DROP;
++	return NF_ACCEPT;
+ }
+ #endif
+ 
+diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
+index 37cd3911d5c5..4023c4ce260f 100644
+--- a/include/net/netfilter/nf_conntrack.h
++++ b/include/net/netfilter/nf_conntrack.h
+@@ -292,6 +292,7 @@ extern unsigned int nf_conntrack_hash_rnd;
+ void init_nf_conntrack_hash_rnd(void);
+ 
+ struct nf_conn *nf_ct_tmpl_alloc(struct net *net, u16 zone, gfp_t flags);
++void nf_ct_tmpl_free(struct nf_conn *tmpl);
+ 
+ #define NF_CT_STAT_INC(net, count)	  __this_cpu_inc((net)->ct.stat->count)
+ #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
+diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
+index 2a246680a6c3..aa8bee72c9d3 100644
+--- a/include/net/netfilter/nf_tables.h
++++ b/include/net/netfilter/nf_tables.h
+@@ -125,7 +125,7 @@ static inline enum nft_data_types nft_dreg_to_type(enum nft_registers reg)
+ 
+ static inline enum nft_registers nft_type_to_reg(enum nft_data_types type)
+ {
+-	return type == NFT_DATA_VERDICT ? NFT_REG_VERDICT : NFT_REG_1;
++	return type == NFT_DATA_VERDICT ? NFT_REG_VERDICT : NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE;
+ }
+ 
+ unsigned int nft_parse_register(const struct nlattr *attr);
+diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h
+index 0aedbb2c10e0..7e7f8875ac32 100644
+--- a/include/target/iscsi/iscsi_target_core.h
++++ b/include/target/iscsi/iscsi_target_core.h
+@@ -776,7 +776,6 @@ struct iscsi_np {
+ 	enum iscsi_timer_flags_table np_login_timer_flags;
+ 	u32			np_exports;
+ 	enum np_flags_table	np_flags;
+-	unsigned char		np_ip[IPV6_ADDRESS_SPACE];
+ 	u16			np_port;
+ 	spinlock_t		np_thread_lock;
+ 	struct completion	np_restart_comp;
+diff --git a/include/xen/interface/sched.h b/include/xen/interface/sched.h
+index 9ce083960a25..f18490985fc8 100644
+--- a/include/xen/interface/sched.h
++++ b/include/xen/interface/sched.h
+@@ -107,5 +107,13 @@ struct sched_watchdog {
+ #define SHUTDOWN_suspend    2  /* Clean up, save suspend info, kill.         */
+ #define SHUTDOWN_crash      3  /* Tell controller we've crashed.             */
+ #define SHUTDOWN_watchdog   4  /* Restart because watchdog time expired.     */
++/*
++ * Domain asked to perform 'soft reset' for it. The expected behavior is to
++ * reset internal Xen state for the domain returning it to the point where it
++ * was created but leaving the domain's memory contents and vCPU contexts
++ * intact. This will allow the domain to start over and set up all Xen specific
++ * interfaces again.
++ */
++#define SHUTDOWN_soft_reset 5
+ 
+ #endif /* __XEN_PUBLIC_SCHED_H__ */
+diff --git a/ipc/msg.c b/ipc/msg.c
+index 66c4f567eb73..1471db9a7e61 100644
+--- a/ipc/msg.c
++++ b/ipc/msg.c
+@@ -137,13 +137,6 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
+ 		return retval;
+ 	}
+ 
+-	/* ipc_addid() locks msq upon success. */
+-	id = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni);
+-	if (id < 0) {
+-		ipc_rcu_putref(msq, msg_rcu_free);
+-		return id;
+-	}
+-
+ 	msq->q_stime = msq->q_rtime = 0;
+ 	msq->q_ctime = get_seconds();
+ 	msq->q_cbytes = msq->q_qnum = 0;
+@@ -153,6 +146,13 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
+ 	INIT_LIST_HEAD(&msq->q_receivers);
+ 	INIT_LIST_HEAD(&msq->q_senders);
+ 
++	/* ipc_addid() locks msq upon success. */
++	id = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni);
++	if (id < 0) {
++		ipc_rcu_putref(msq, msg_rcu_free);
++		return id;
++	}
++
+ 	ipc_unlock_object(&msq->q_perm);
+ 	rcu_read_unlock();
+ 
+diff --git a/ipc/shm.c b/ipc/shm.c
+index 4aef24d91b63..0e61fd430547 100644
+--- a/ipc/shm.c
++++ b/ipc/shm.c
+@@ -551,12 +551,6 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
+ 	if (IS_ERR(file))
+ 		goto no_file;
+ 
+-	id = ipc_addid(&shm_ids(ns), &shp->shm_perm, ns->shm_ctlmni);
+-	if (id < 0) {
+-		error = id;
+-		goto no_id;
+-	}
+-
+ 	shp->shm_cprid = task_tgid_vnr(current);
+ 	shp->shm_lprid = 0;
+ 	shp->shm_atim = shp->shm_dtim = 0;
+@@ -565,6 +559,13 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
+ 	shp->shm_nattch = 0;
+ 	shp->shm_file = file;
+ 	shp->shm_creator = current;
++
++	id = ipc_addid(&shm_ids(ns), &shp->shm_perm, ns->shm_ctlmni);
++	if (id < 0) {
++		error = id;
++		goto no_id;
++	}
++
+ 	list_add(&shp->shm_clist, &current->sysvshm.shm_clist);
+ 
+ 	/*
+diff --git a/ipc/util.c b/ipc/util.c
+index be4230020a1f..0f401d94b7c6 100644
+--- a/ipc/util.c
++++ b/ipc/util.c
+@@ -237,6 +237,10 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int size)
+ 	rcu_read_lock();
+ 	spin_lock(&new->lock);
+ 
++	current_euid_egid(&euid, &egid);
++	new->cuid = new->uid = euid;
++	new->gid = new->cgid = egid;
++
+ 	id = idr_alloc(&ids->ipcs_idr, new,
+ 		       (next_id < 0) ? 0 : ipcid_to_idx(next_id), 0,
+ 		       GFP_NOWAIT);
+@@ -249,10 +253,6 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int size)
+ 
+ 	ids->in_use++;
+ 
+-	current_euid_egid(&euid, &egid);
+-	new->cuid = new->uid = euid;
+-	new->gid = new->cgid = egid;
+-
+ 	if (next_id < 0) {
+ 		new->seq = ids->seq++;
+ 		if (ids->seq > IPCID_SEQ_MAX)
+diff --git a/kernel/cgroup.c b/kernel/cgroup.c
+index c6c4240e7d28..fe6f855de3d1 100644
+--- a/kernel/cgroup.c
++++ b/kernel/cgroup.c
+@@ -46,7 +46,6 @@
+ #include <linux/slab.h>
+ #include <linux/spinlock.h>
+ #include <linux/rwsem.h>
+-#include <linux/percpu-rwsem.h>
+ #include <linux/string.h>
+ #include <linux/sort.h>
+ #include <linux/kmod.h>
+@@ -104,8 +103,6 @@ static DEFINE_SPINLOCK(cgroup_idr_lock);
+  */
+ static DEFINE_SPINLOCK(release_agent_path_lock);
+ 
+-struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
+-
+ #define cgroup_assert_mutex_or_rcu_locked()				\
+ 	rcu_lockdep_assert(rcu_read_lock_held() ||			\
+ 			   lockdep_is_held(&cgroup_mutex),		\
+@@ -870,6 +867,48 @@ static struct css_set *find_css_set(struct css_set *old_cset,
+ 	return cset;
+ }
+ 
++void cgroup_threadgroup_change_begin(struct task_struct *tsk)
++{
++	down_read(&tsk->signal->group_rwsem);
++}
++
++void cgroup_threadgroup_change_end(struct task_struct *tsk)
++{
++	up_read(&tsk->signal->group_rwsem);
++}
++
++/**
++ * threadgroup_lock - lock threadgroup
++ * @tsk: member task of the threadgroup to lock
++ *
++ * Lock the threadgroup @tsk belongs to.  No new task is allowed to enter
++ * and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
++ * change ->group_leader/pid.  This is useful for cases where the threadgroup
++ * needs to stay stable across blockable operations.
++ *
++ * fork and exit explicitly call threadgroup_change_{begin|end}() for
++ * synchronization.  While held, no new task will be added to threadgroup
++ * and no existing live task will have its PF_EXITING set.
++ *
++ * de_thread() does threadgroup_change_{begin|end}() when a non-leader
++ * sub-thread becomes a new leader.
++ */
++static void threadgroup_lock(struct task_struct *tsk)
++{
++	down_write(&tsk->signal->group_rwsem);
++}
++
++/**
++ * threadgroup_unlock - unlock threadgroup
++ * @tsk: member task of the threadgroup to unlock
++ *
++ * Reverse threadgroup_lock().
++ */
++static inline void threadgroup_unlock(struct task_struct *tsk)
++{
++	up_write(&tsk->signal->group_rwsem);
++}
++
+ static struct cgroup_root *cgroup_root_from_kf(struct kernfs_root *kf_root)
+ {
+ 	struct cgroup *root_cgrp = kf_root->kn->priv;
+@@ -2066,9 +2105,9 @@ static void cgroup_task_migrate(struct cgroup *old_cgrp,
+ 	lockdep_assert_held(&css_set_rwsem);
+ 
+ 	/*
+-	 * We are synchronized through cgroup_threadgroup_rwsem against
+-	 * PF_EXITING setting such that we can't race against cgroup_exit()
+-	 * changing the css_set to init_css_set and dropping the old one.
++	 * We are synchronized through threadgroup_lock() against PF_EXITING
++	 * setting such that we can't race against cgroup_exit() changing the
++	 * css_set to init_css_set and dropping the old one.
+ 	 */
+ 	WARN_ON_ONCE(tsk->flags & PF_EXITING);
+ 	old_cset = task_css_set(tsk);
+@@ -2125,11 +2164,10 @@ static void cgroup_migrate_finish(struct list_head *preloaded_csets)
+  * @src_cset and add it to @preloaded_csets, which should later be cleaned
+  * up by cgroup_migrate_finish().
+  *
+- * This function may be called without holding cgroup_threadgroup_rwsem
+- * even if the target is a process.  Threads may be created and destroyed
+- * but as long as cgroup_mutex is not dropped, no new css_set can be put
+- * into play and the preloaded css_sets are guaranteed to cover all
+- * migrations.
++ * This function may be called without holding threadgroup_lock even if the
++ * target is a process.  Threads may be created and destroyed but as long
++ * as cgroup_mutex is not dropped, no new css_set can be put into play and
++ * the preloaded css_sets are guaranteed to cover all migrations.
+  */
+ static void cgroup_migrate_add_src(struct css_set *src_cset,
+ 				   struct cgroup *dst_cgrp,
+@@ -2232,7 +2270,7 @@ err:
+  * @threadgroup: whether @leader points to the whole process or a single task
+  *
+  * Migrate a process or task denoted by @leader to @cgrp.  If migrating a
+- * process, the caller must be holding cgroup_threadgroup_rwsem.  The
++ * process, the caller must be holding threadgroup_lock of @leader.  The
+  * caller is also responsible for invoking cgroup_migrate_add_src() and
+  * cgroup_migrate_prepare_dst() on the targets before invoking this
+  * function and following up with cgroup_migrate_finish().
+@@ -2360,7 +2398,7 @@ out_release_tset:
+  * @leader: the task or the leader of the threadgroup to be attached
+  * @threadgroup: attach the whole threadgroup?
+  *
+- * Call holding cgroup_mutex and cgroup_threadgroup_rwsem.
++ * Call holding cgroup_mutex and threadgroup_lock of @leader.
+  */
+ static int cgroup_attach_task(struct cgroup *dst_cgrp,
+ 			      struct task_struct *leader, bool threadgroup)
+@@ -2452,13 +2490,14 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
+ 	if (!cgrp)
+ 		return -ENODEV;
+ 
+-	percpu_down_write(&cgroup_threadgroup_rwsem);
++retry_find_task:
+ 	rcu_read_lock();
+ 	if (pid) {
+ 		tsk = find_task_by_vpid(pid);
+ 		if (!tsk) {
++			rcu_read_unlock();
+ 			ret = -ESRCH;
+-			goto out_unlock_rcu;
++			goto out_unlock_cgroup;
+ 		}
+ 	} else {
+ 		tsk = current;
+@@ -2474,23 +2513,37 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
+ 	 */
+ 	if (tsk == kthreadd_task || (tsk->flags & PF_NO_SETAFFINITY)) {
+ 		ret = -EINVAL;
+-		goto out_unlock_rcu;
++		rcu_read_unlock();
++		goto out_unlock_cgroup;
+ 	}
+ 
+ 	get_task_struct(tsk);
+ 	rcu_read_unlock();
+ 
++	threadgroup_lock(tsk);
++	if (threadgroup) {
++		if (!thread_group_leader(tsk)) {
++			/*
++			 * a race with de_thread from another thread's exec()
++			 * may strip us of our leadership, if this happens,
++			 * there is no choice but to throw this task away and
++			 * try again; this is
++			 * "double-double-toil-and-trouble-check locking".
++			 */
++			threadgroup_unlock(tsk);
++			put_task_struct(tsk);
++			goto retry_find_task;
++		}
++	}
++
+ 	ret = cgroup_procs_write_permission(tsk, cgrp, of);
+ 	if (!ret)
+ 		ret = cgroup_attach_task(cgrp, tsk, threadgroup);
+ 
+-	put_task_struct(tsk);
+-	goto out_unlock_threadgroup;
++	threadgroup_unlock(tsk);
+ 
+-out_unlock_rcu:
+-	rcu_read_unlock();
+-out_unlock_threadgroup:
+-	percpu_up_write(&cgroup_threadgroup_rwsem);
++	put_task_struct(tsk);
++out_unlock_cgroup:
+ 	cgroup_kn_unlock(of->kn);
+ 	return ret ?: nbytes;
+ }
+@@ -2635,8 +2688,6 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
+ 
+ 	lockdep_assert_held(&cgroup_mutex);
+ 
+-	percpu_down_write(&cgroup_threadgroup_rwsem);
+-
+ 	/* look up all csses currently attached to @cgrp's subtree */
+ 	down_read(&css_set_rwsem);
+ 	css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) {
+@@ -2692,8 +2743,17 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
+ 				goto out_finish;
+ 			last_task = task;
+ 
++			threadgroup_lock(task);
++			/* raced against de_thread() from another thread? */
++			if (!thread_group_leader(task)) {
++				threadgroup_unlock(task);
++				put_task_struct(task);
++				continue;
++			}
++
+ 			ret = cgroup_migrate(src_cset->dfl_cgrp, task, true);
+ 
++			threadgroup_unlock(task);
+ 			put_task_struct(task);
+ 
+ 			if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret))
+@@ -2703,7 +2763,6 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
+ 
+ out_finish:
+ 	cgroup_migrate_finish(&preloaded_csets);
+-	percpu_up_write(&cgroup_threadgroup_rwsem);
+ 	return ret;
+ }
+ 
+@@ -5013,7 +5072,6 @@ int __init cgroup_init(void)
+ 	unsigned long key;
+ 	int ssid, err;
+ 
+-	BUG_ON(percpu_init_rwsem(&cgroup_threadgroup_rwsem));
+ 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_dfl_base_files));
+ 	BUG_ON(cgroup_init_cftypes(NULL, cgroup_legacy_base_files));
+ 
+diff --git a/kernel/fork.c b/kernel/fork.c
+index 26a70dc7a915..e769c8c86f86 100644
+--- a/kernel/fork.c
++++ b/kernel/fork.c
+@@ -1146,6 +1146,10 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
+ 	tty_audit_fork(sig);
+ 	sched_autogroup_fork(sig);
+ 
++#ifdef CONFIG_CGROUPS
++	init_rwsem(&sig->group_rwsem);
++#endif
++
+ 	sig->oom_score_adj = current->signal->oom_score_adj;
+ 	sig->oom_score_adj_min = current->signal->oom_score_adj_min;
+ 
+diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
+index 0e97c142ce40..4e6267a34440 100644
+--- a/kernel/irq/proc.c
++++ b/kernel/irq/proc.c
+@@ -12,6 +12,7 @@
+ #include <linux/seq_file.h>
+ #include <linux/interrupt.h>
+ #include <linux/kernel_stat.h>
++#include <linux/mutex.h>
+ 
+ #include "internals.h"
+ 
+@@ -323,18 +324,29 @@ void register_handler_proc(unsigned int irq, struct irqaction *action)
+ 
+ void register_irq_proc(unsigned int irq, struct irq_desc *desc)
+ {
++	static DEFINE_MUTEX(register_lock);
+ 	char name [MAX_NAMELEN];
+ 
+-	if (!root_irq_dir || (desc->irq_data.chip == &no_irq_chip) || desc->dir)
++	if (!root_irq_dir || (desc->irq_data.chip == &no_irq_chip))
+ 		return;
+ 
++	/*
++	 * irq directories are registered only when a handler is
++	 * added, not when the descriptor is created, so multiple
++	 * tasks might try to register at the same time.
++	 */
++	mutex_lock(&register_lock);
++
++	if (desc->dir)
++		goto out_unlock;
++
+ 	memset(name, 0, MAX_NAMELEN);
+ 	sprintf(name, "%d", irq);
+ 
+ 	/* create /proc/irq/1234 */
+ 	desc->dir = proc_mkdir(name, root_irq_dir);
+ 	if (!desc->dir)
+-		return;
++		goto out_unlock;
+ 
+ #ifdef CONFIG_SMP
+ 	/* create /proc/irq/<irq>/smp_affinity */
+@@ -355,6 +367,9 @@ void register_irq_proc(unsigned int irq, struct irq_desc *desc)
+ 
+ 	proc_create_data("spurious", 0444, desc->dir,
+ 			 &irq_spurious_proc_fops, (void *)(long)irq);
++
++out_unlock:
++	mutex_unlock(&register_lock);
+ }
+ 
+ void unregister_irq_proc(unsigned int irq, struct irq_desc *desc)
+diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
+index 38c49202d532..8ed01611ae73 100644
+--- a/kernel/locking/qspinlock.c
++++ b/kernel/locking/qspinlock.c
+@@ -289,7 +289,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+ 	if (pv_enabled())
+ 		goto queue;
+ 
+-	if (virt_queued_spin_lock(lock))
++	if (virt_spin_lock(lock))
+ 		return;
+ 
+ 	/*
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index e9673433cc01..6776631676e0 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -2461,11 +2461,11 @@ static struct rq *finish_task_switch(struct task_struct *prev)
+ 	 * If a task dies, then it sets TASK_DEAD in tsk->state and calls
+ 	 * schedule one last time. The schedule call will never return, and
+ 	 * the scheduled task must drop that reference.
+-	 * The test for TASK_DEAD must occur while the runqueue locks are
+-	 * still held, otherwise prev could be scheduled on another cpu, die
+-	 * there before we look at prev->state, and then the reference would
+-	 * be dropped twice.
+-	 *		Manfred Spraul <manfred@colorfullife.com>
++	 *
++	 * We must observe prev->state before clearing prev->on_cpu (in
++	 * finish_lock_switch), otherwise a concurrent wakeup can get prev
++	 * running on another CPU and we could race with its RUNNING -> DEAD
++	 * transition, resulting in a double drop.
+ 	 */
+ 	prev_state = prev->state;
+ 	vtime_task_switch(prev);
+@@ -2614,13 +2614,20 @@ unsigned long nr_running(void)
+ 
+ /*
+  * Check if only the current task is running on the cpu.
++ *
++ * Caution: this function does not check that the caller has disabled
++ * preemption, thus the result might have a time-of-check-to-time-of-use
++ * race.  The caller is responsible to use it correctly, for example:
++ *
++ * - from a non-preemptable section (of course)
++ *
++ * - from a thread that is bound to a single CPU
++ *
++ * - in a loop with very short iterations (e.g. a polling loop)
+  */
+ bool single_task_running(void)
+ {
+-	if (cpu_rq(smp_processor_id())->nr_running == 1)
+-		return true;
+-	else
+-		return false;
++	return raw_rq()->nr_running == 1;
+ }
+ EXPORT_SYMBOL(single_task_running);
+ 
+@@ -4492,7 +4499,7 @@ SYSCALL_DEFINE0(sched_yield)
+ 
+ int __sched _cond_resched(void)
+ {
+-	if (should_resched()) {
++	if (should_resched(0)) {
+ 		preempt_schedule_common();
+ 		return 1;
+ 	}
+@@ -4510,7 +4517,7 @@ EXPORT_SYMBOL(_cond_resched);
+  */
+ int __cond_resched_lock(spinlock_t *lock)
+ {
+-	int resched = should_resched();
++	int resched = should_resched(PREEMPT_LOCK_OFFSET);
+ 	int ret = 0;
+ 
+ 	lockdep_assert_held(lock);
+@@ -4532,7 +4539,7 @@ int __sched __cond_resched_softirq(void)
+ {
+ 	BUG_ON(!in_softirq());
+ 
+-	if (should_resched()) {
++	if (should_resched(SOFTIRQ_DISABLE_OFFSET)) {
+ 		local_bh_enable();
+ 		preempt_schedule_common();
+ 		local_bh_disable();
+diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
+index 84d48790bb6d..08ab96b366bf 100644
+--- a/kernel/sched/sched.h
++++ b/kernel/sched/sched.h
+@@ -1091,9 +1091,10 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
+ 	 * After ->on_cpu is cleared, the task can be moved to a different CPU.
+ 	 * We must ensure this doesn't happen until the switch is completely
+ 	 * finished.
++	 *
++	 * Pairs with the control dependency and rmb in try_to_wake_up().
+ 	 */
+-	smp_wmb();
+-	prev->on_cpu = 0;
++	smp_store_release(&prev->on_cpu, 0);
+ #endif
+ #ifdef CONFIG_DEBUG_SPINLOCK
+ 	/* this is a valid case when another task releases the spinlock */
+diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
+index 841b72f720e8..3a38775b50c2 100644
+--- a/kernel/time/clocksource.c
++++ b/kernel/time/clocksource.c
+@@ -217,7 +217,7 @@ static void clocksource_watchdog(unsigned long data)
+ 			continue;
+ 
+ 		/* Check the deviation from the watchdog clocksource. */
+-		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
++		if (abs64(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
+ 			pr_warn("timekeeping watchdog: Marking clocksource '%s' as unstable because the skew is too large:\n",
+ 				cs->name);
+ 			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
+diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
+index bca3667a2de1..a20d4110e871 100644
+--- a/kernel/time/timekeeping.c
++++ b/kernel/time/timekeeping.c
+@@ -1607,7 +1607,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
+ 	negative = (tick_error < 0);
+ 
+ 	/* Sort out the magnitude of the correction */
+-	tick_error = abs(tick_error);
++	tick_error = abs64(tick_error);
+ 	for (adj = 0; tick_error > interval; adj++)
+ 		tick_error >>= 1;
+ 
+diff --git a/lib/iommu-common.c b/lib/iommu-common.c
+index ff19f66d3f7f..b1c93e94ca7a 100644
+--- a/lib/iommu-common.c
++++ b/lib/iommu-common.c
+@@ -21,8 +21,7 @@ static	DEFINE_PER_CPU(unsigned int, iommu_hash_common);
+ 
+ static inline bool need_flush(struct iommu_map_table *iommu)
+ {
+-	return (iommu->lazy_flush != NULL &&
+-		(iommu->flags & IOMMU_NEED_FLUSH) != 0);
++	return ((iommu->flags & IOMMU_NEED_FLUSH) != 0);
+ }
+ 
+ static inline void set_flush(struct iommu_map_table *iommu)
+@@ -211,7 +210,8 @@ unsigned long iommu_tbl_range_alloc(struct device *dev,
+ 			goto bail;
+ 		}
+ 	}
+-	if (n < pool->hint || need_flush(iommu)) {
++	if (iommu->lazy_flush &&
++	    (n < pool->hint || need_flush(iommu))) {
+ 		clear_flush(iommu);
+ 		iommu->lazy_flush(iommu);
+ 	}
+diff --git a/mm/hugetlb.c b/mm/hugetlb.c
+index a8c3087089d8..62c1ec5a9d31 100644
+--- a/mm/hugetlb.c
++++ b/mm/hugetlb.c
+@@ -2974,6 +2974,14 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
+ 			continue;
+ 
+ 		/*
++		 * Shared VMAs have their own reserves and do not affect
++		 * MAP_PRIVATE accounting but it is possible that a shared
++		 * VMA is using the same page so check and skip such VMAs.
++		 */
++		if (iter_vma->vm_flags & VM_MAYSHARE)
++			continue;
++
++		/*
+ 		 * Unmap the page from other VMAs without their own reserves.
+ 		 * They get marked to be SIGKILLed if they fault in these
+ 		 * areas. This is because a future no-page fault on this VMA
+diff --git a/mm/memcontrol.c b/mm/memcontrol.c
+index acb93c554f6e..237d4686482d 100644
+--- a/mm/memcontrol.c
++++ b/mm/memcontrol.c
+@@ -806,12 +806,14 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
+ }
+ 
+ /*
++ * Return page count for single (non recursive) @memcg.
++ *
+  * Implementation Note: reading percpu statistics for memcg.
+  *
+  * Both of vmstat[] and percpu_counter has threshold and do periodic
+  * synchronization to implement "quick" read. There are trade-off between
+  * reading cost and precision of value. Then, we may have a chance to implement
+- * a periodic synchronizion of counter in memcg's counter.
++ * a periodic synchronization of counter in memcg's counter.
+  *
+  * But this _read() function is used for user interface now. The user accounts
+  * memory usage by memory cgroup and he _always_ requires exact value because
+@@ -821,17 +823,24 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
+  *
+  * If there are kernel internal actions which can make use of some not-exact
+  * value, and reading all cpu value can be performance bottleneck in some
+- * common workload, threashold and synchonization as vmstat[] should be
++ * common workload, threshold and synchronization as vmstat[] should be
+  * implemented.
+  */
+-static long mem_cgroup_read_stat(struct mem_cgroup *memcg,
+-				 enum mem_cgroup_stat_index idx)
++static unsigned long
++mem_cgroup_read_stat(struct mem_cgroup *memcg, enum mem_cgroup_stat_index idx)
+ {
+ 	long val = 0;
+ 	int cpu;
+ 
++	/* Per-cpu values can be negative, use a signed accumulator */
+ 	for_each_possible_cpu(cpu)
+ 		val += per_cpu(memcg->stat->count[idx], cpu);
++	/*
++	 * Summing races with updates, so val may be negative.  Avoid exposing
++	 * transient negative values.
++	 */
++	if (val < 0)
++		val = 0;
+ 	return val;
+ }
+ 
+@@ -1498,7 +1507,7 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
+ 		for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
+ 			if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+ 				continue;
+-			pr_cont(" %s:%ldKB", mem_cgroup_stat_names[i],
++			pr_cont(" %s:%luKB", mem_cgroup_stat_names[i],
+ 				K(mem_cgroup_read_stat(iter, i)));
+ 		}
+ 
+@@ -3119,14 +3128,11 @@ static unsigned long tree_stat(struct mem_cgroup *memcg,
+ 			       enum mem_cgroup_stat_index idx)
+ {
+ 	struct mem_cgroup *iter;
+-	long val = 0;
++	unsigned long val = 0;
+ 
+-	/* Per-cpu values can be negative, use a signed accumulator */
+ 	for_each_mem_cgroup_tree(iter, memcg)
+ 		val += mem_cgroup_read_stat(iter, idx);
+ 
+-	if (val < 0) /* race ? */
+-		val = 0;
+ 	return val;
+ }
+ 
+@@ -3469,7 +3475,7 @@ static int memcg_stat_show(struct seq_file *m, void *v)
+ 	for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
+ 		if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+ 			continue;
+-		seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
++		seq_printf(m, "%s %lu\n", mem_cgroup_stat_names[i],
+ 			   mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
+ 	}
+ 
+@@ -3494,13 +3500,13 @@ static int memcg_stat_show(struct seq_file *m, void *v)
+ 			   (u64)memsw * PAGE_SIZE);
+ 
+ 	for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
+-		long long val = 0;
++		unsigned long long val = 0;
+ 
+ 		if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+ 			continue;
+ 		for_each_mem_cgroup_tree(mi, memcg)
+ 			val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
+-		seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
++		seq_printf(m, "total_%s %llu\n", mem_cgroup_stat_names[i], val);
+ 	}
+ 
+ 	for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
+diff --git a/mm/migrate.c b/mm/migrate.c
+index eb4267107d1f..fcb6204de108 100644
+--- a/mm/migrate.c
++++ b/mm/migrate.c
+@@ -734,6 +734,15 @@ static int move_to_new_page(struct page *newpage, struct page *page,
+ 	if (PageSwapBacked(page))
+ 		SetPageSwapBacked(newpage);
+ 
++	/*
++	 * Indirectly called below, migrate_page_copy() copies PG_dirty and thus
++	 * needs newpage's memcg set to transfer memcg dirty page accounting.
++	 * So perform memcg migration in two steps:
++	 * 1. set newpage->mem_cgroup (here)
++	 * 2. clear page->mem_cgroup (below)
++	 */
++	set_page_memcg(newpage, page_memcg(page));
++
+ 	mapping = page_mapping(page);
+ 	if (!mapping)
+ 		rc = migrate_page(mapping, newpage, page, mode);
+@@ -750,9 +759,10 @@ static int move_to_new_page(struct page *newpage, struct page *page,
+ 		rc = fallback_migrate_page(mapping, newpage, page, mode);
+ 
+ 	if (rc != MIGRATEPAGE_SUCCESS) {
++		set_page_memcg(newpage, NULL);
+ 		newpage->mapping = NULL;
+ 	} else {
+-		mem_cgroup_migrate(page, newpage, false);
++		set_page_memcg(page, NULL);
+ 		if (page_was_mapped)
+ 			remove_migration_ptes(page, newpage);
+ 		page->mapping = NULL;
+@@ -1068,7 +1078,7 @@ out:
+ 	if (rc != MIGRATEPAGE_SUCCESS && put_new_page)
+ 		put_new_page(new_hpage, private);
+ 	else
+-		put_page(new_hpage);
++		putback_active_hugepage(new_hpage);
+ 
+ 	if (result) {
+ 		if (rc)
+diff --git a/mm/slab.c b/mm/slab.c
+index bbd0b47dc6a9..ae360283029c 100644
+--- a/mm/slab.c
++++ b/mm/slab.c
+@@ -2190,9 +2190,16 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
+ 			size += BYTES_PER_WORD;
+ 	}
+ #if FORCED_DEBUG && defined(CONFIG_DEBUG_PAGEALLOC)
+-	if (size >= kmalloc_size(INDEX_NODE + 1)
+-	    && cachep->object_size > cache_line_size()
+-	    && ALIGN(size, cachep->align) < PAGE_SIZE) {
++	/*
++	 * To activate debug pagealloc, off-slab management is a necessary
++	 * requirement. In early phase of initialization, small sized slab
++	 * doesn't get initialized so it would not be possible. So, we need
++	 * to check size >= 256. It guarantees that all necessary small
++	 * sized slab is initialized in current slab initialization sequence.
++	 */
++	if (!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
++		size >= 256 && cachep->object_size > cache_line_size() &&
++		ALIGN(size, cachep->align) < PAGE_SIZE) {
+ 		cachep->obj_offset += PAGE_SIZE - ALIGN(size, cachep->align);
+ 		size = PAGE_SIZE;
+ 	}
+diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
+index 6d0b471eede8..cc7d87d64987 100644
+--- a/net/batman-adv/distributed-arp-table.c
++++ b/net/batman-adv/distributed-arp-table.c
+@@ -19,6 +19,7 @@
+ #include "main.h"
+ 
+ #include <linux/atomic.h>
++#include <linux/bitops.h>
+ #include <linux/byteorder/generic.h>
+ #include <linux/errno.h>
+ #include <linux/etherdevice.h>
+@@ -453,7 +454,7 @@ static bool batadv_is_orig_node_eligible(struct batadv_dat_candidate *res,
+ 	int j;
+ 
+ 	/* check if orig node candidate is running DAT */
+-	if (!(candidate->capabilities & BATADV_ORIG_CAPA_HAS_DAT))
++	if (!test_bit(BATADV_ORIG_CAPA_HAS_DAT, &candidate->capabilities))
+ 		goto out;
+ 
+ 	/* Check if this node has already been selected... */
+@@ -713,9 +714,9 @@ static void batadv_dat_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
+ 					   uint16_t tvlv_value_len)
+ {
+ 	if (flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND)
+-		orig->capabilities &= ~BATADV_ORIG_CAPA_HAS_DAT;
++		clear_bit(BATADV_ORIG_CAPA_HAS_DAT, &orig->capabilities);
+ 	else
+-		orig->capabilities |= BATADV_ORIG_CAPA_HAS_DAT;
++		set_bit(BATADV_ORIG_CAPA_HAS_DAT, &orig->capabilities);
+ }
+ 
+ /**
+diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c
+index 7aa480b7edd0..68a9554961eb 100644
+--- a/net/batman-adv/multicast.c
++++ b/net/batman-adv/multicast.c
+@@ -19,6 +19,8 @@
+ #include "main.h"
+ 
+ #include <linux/atomic.h>
++#include <linux/bitops.h>
++#include <linux/bug.h>
+ #include <linux/byteorder/generic.h>
+ #include <linux/errno.h>
+ #include <linux/etherdevice.h>
+@@ -588,19 +590,26 @@ batadv_mcast_forw_mode(struct batadv_priv *bat_priv, struct sk_buff *skb,
+  *
+  * If the BATADV_MCAST_WANT_ALL_UNSNOOPABLES flag of this originator,
+  * orig, has toggled then this method updates counter and list accordingly.
++ *
++ * Caller needs to hold orig->mcast_handler_lock.
+  */
+ static void batadv_mcast_want_unsnoop_update(struct batadv_priv *bat_priv,
+ 					     struct batadv_orig_node *orig,
+ 					     uint8_t mcast_flags)
+ {
++	struct hlist_node *node = &orig->mcast_want_all_unsnoopables_node;
++	struct hlist_head *head = &bat_priv->mcast.want_all_unsnoopables_list;
++
+ 	/* switched from flag unset to set */
+ 	if (mcast_flags & BATADV_MCAST_WANT_ALL_UNSNOOPABLES &&
+ 	    !(orig->mcast_flags & BATADV_MCAST_WANT_ALL_UNSNOOPABLES)) {
+ 		atomic_inc(&bat_priv->mcast.num_want_all_unsnoopables);
+ 
+ 		spin_lock_bh(&bat_priv->mcast.want_lists_lock);
+-		hlist_add_head_rcu(&orig->mcast_want_all_unsnoopables_node,
+-				   &bat_priv->mcast.want_all_unsnoopables_list);
++		/* flag checks above + mcast_handler_lock prevents this */
++		WARN_ON(!hlist_unhashed(node));
++
++		hlist_add_head_rcu(node, head);
+ 		spin_unlock_bh(&bat_priv->mcast.want_lists_lock);
+ 	/* switched from flag set to unset */
+ 	} else if (!(mcast_flags & BATADV_MCAST_WANT_ALL_UNSNOOPABLES) &&
+@@ -608,7 +617,10 @@ static void batadv_mcast_want_unsnoop_update(struct batadv_priv *bat_priv,
+ 		atomic_dec(&bat_priv->mcast.num_want_all_unsnoopables);
+ 
+ 		spin_lock_bh(&bat_priv->mcast.want_lists_lock);
+-		hlist_del_rcu(&orig->mcast_want_all_unsnoopables_node);
++		/* flag checks above + mcast_handler_lock prevents this */
++		WARN_ON(hlist_unhashed(node));
++
++		hlist_del_init_rcu(node);
+ 		spin_unlock_bh(&bat_priv->mcast.want_lists_lock);
+ 	}
+ }
+@@ -621,19 +633,26 @@ static void batadv_mcast_want_unsnoop_update(struct batadv_priv *bat_priv,
+  *
+  * If the BATADV_MCAST_WANT_ALL_IPV4 flag of this originator, orig, has
+  * toggled then this method updates counter and list accordingly.
++ *
++ * Caller needs to hold orig->mcast_handler_lock.
+  */
+ static void batadv_mcast_want_ipv4_update(struct batadv_priv *bat_priv,
+ 					  struct batadv_orig_node *orig,
+ 					  uint8_t mcast_flags)
+ {
++	struct hlist_node *node = &orig->mcast_want_all_ipv4_node;
++	struct hlist_head *head = &bat_priv->mcast.want_all_ipv4_list;
++
+ 	/* switched from flag unset to set */
+ 	if (mcast_flags & BATADV_MCAST_WANT_ALL_IPV4 &&
+ 	    !(orig->mcast_flags & BATADV_MCAST_WANT_ALL_IPV4)) {
+ 		atomic_inc(&bat_priv->mcast.num_want_all_ipv4);
+ 
+ 		spin_lock_bh(&bat_priv->mcast.want_lists_lock);
+-		hlist_add_head_rcu(&orig->mcast_want_all_ipv4_node,
+-				   &bat_priv->mcast.want_all_ipv4_list);
++		/* flag checks above + mcast_handler_lock prevents this */
++		WARN_ON(!hlist_unhashed(node));
++
++		hlist_add_head_rcu(node, head);
+ 		spin_unlock_bh(&bat_priv->mcast.want_lists_lock);
+ 	/* switched from flag set to unset */
+ 	} else if (!(mcast_flags & BATADV_MCAST_WANT_ALL_IPV4) &&
+@@ -641,7 +660,10 @@ static void batadv_mcast_want_ipv4_update(struct batadv_priv *bat_priv,
+ 		atomic_dec(&bat_priv->mcast.num_want_all_ipv4);
+ 
+ 		spin_lock_bh(&bat_priv->mcast.want_lists_lock);
+-		hlist_del_rcu(&orig->mcast_want_all_ipv4_node);
++		/* flag checks above + mcast_handler_lock prevents this */
++		WARN_ON(hlist_unhashed(node));
++
++		hlist_del_init_rcu(node);
+ 		spin_unlock_bh(&bat_priv->mcast.want_lists_lock);
+ 	}
+ }
+@@ -654,19 +676,26 @@ static void batadv_mcast_want_ipv4_update(struct batadv_priv *bat_priv,
+  *
+  * If the BATADV_MCAST_WANT_ALL_IPV6 flag of this originator, orig, has
+  * toggled then this method updates counter and list accordingly.
++ *
++ * Caller needs to hold orig->mcast_handler_lock.
+  */
+ static void batadv_mcast_want_ipv6_update(struct batadv_priv *bat_priv,
+ 					  struct batadv_orig_node *orig,
+ 					  uint8_t mcast_flags)
+ {
++	struct hlist_node *node = &orig->mcast_want_all_ipv6_node;
++	struct hlist_head *head = &bat_priv->mcast.want_all_ipv6_list;
++
+ 	/* switched from flag unset to set */
+ 	if (mcast_flags & BATADV_MCAST_WANT_ALL_IPV6 &&
+ 	    !(orig->mcast_flags & BATADV_MCAST_WANT_ALL_IPV6)) {
+ 		atomic_inc(&bat_priv->mcast.num_want_all_ipv6);
+ 
+ 		spin_lock_bh(&bat_priv->mcast.want_lists_lock);
+-		hlist_add_head_rcu(&orig->mcast_want_all_ipv6_node,
+-				   &bat_priv->mcast.want_all_ipv6_list);
++		/* flag checks above + mcast_handler_lock prevents this */
++		WARN_ON(!hlist_unhashed(node));
++
++		hlist_add_head_rcu(node, head);
+ 		spin_unlock_bh(&bat_priv->mcast.want_lists_lock);
+ 	/* switched from flag set to unset */
+ 	} else if (!(mcast_flags & BATADV_MCAST_WANT_ALL_IPV6) &&
+@@ -674,7 +703,10 @@ static void batadv_mcast_want_ipv6_update(struct batadv_priv *bat_priv,
+ 		atomic_dec(&bat_priv->mcast.num_want_all_ipv6);
+ 
+ 		spin_lock_bh(&bat_priv->mcast.want_lists_lock);
+-		hlist_del_rcu(&orig->mcast_want_all_ipv6_node);
++		/* flag checks above + mcast_handler_lock prevents this */
++		WARN_ON(hlist_unhashed(node));
++
++		hlist_del_init_rcu(node);
+ 		spin_unlock_bh(&bat_priv->mcast.want_lists_lock);
+ 	}
+ }
+@@ -697,39 +729,42 @@ static void batadv_mcast_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
+ 	uint8_t mcast_flags = BATADV_NO_FLAGS;
+ 	bool orig_initialized;
+ 
+-	orig_initialized = orig->capa_initialized & BATADV_ORIG_CAPA_HAS_MCAST;
++	if (orig_mcast_enabled && tvlv_value &&
++	    (tvlv_value_len >= sizeof(mcast_flags)))
++		mcast_flags = *(uint8_t *)tvlv_value;
++
++	spin_lock_bh(&orig->mcast_handler_lock);
++	orig_initialized = test_bit(BATADV_ORIG_CAPA_HAS_MCAST,
++				    &orig->capa_initialized);
+ 
+ 	/* If mcast support is turned on decrease the disabled mcast node
+ 	 * counter only if we had increased it for this node before. If this
+ 	 * is a completely new orig_node no need to decrease the counter.
+ 	 */
+ 	if (orig_mcast_enabled &&
+-	    !(orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST)) {
++	    !test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities)) {
+ 		if (orig_initialized)
+ 			atomic_dec(&bat_priv->mcast.num_disabled);
+-		orig->capabilities |= BATADV_ORIG_CAPA_HAS_MCAST;
++		set_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities);
+ 	/* If mcast support is being switched off or if this is an initial
+ 	 * OGM without mcast support then increase the disabled mcast
+ 	 * node counter.
+ 	 */
+ 	} else if (!orig_mcast_enabled &&
+-		   (orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST ||
++		   (test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities) ||
+ 		    !orig_initialized)) {
+ 		atomic_inc(&bat_priv->mcast.num_disabled);
+-		orig->capabilities &= ~BATADV_ORIG_CAPA_HAS_MCAST;
++		clear_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities);
+ 	}
+ 
+-	orig->capa_initialized |= BATADV_ORIG_CAPA_HAS_MCAST;
+-
+-	if (orig_mcast_enabled && tvlv_value &&
+-	    (tvlv_value_len >= sizeof(mcast_flags)))
+-		mcast_flags = *(uint8_t *)tvlv_value;
++	set_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capa_initialized);
+ 
+ 	batadv_mcast_want_unsnoop_update(bat_priv, orig, mcast_flags);
+ 	batadv_mcast_want_ipv4_update(bat_priv, orig, mcast_flags);
+ 	batadv_mcast_want_ipv6_update(bat_priv, orig, mcast_flags);
+ 
+ 	orig->mcast_flags = mcast_flags;
++	spin_unlock_bh(&orig->mcast_handler_lock);
+ }
+ 
+ /**
+@@ -763,11 +798,15 @@ void batadv_mcast_purge_orig(struct batadv_orig_node *orig)
+ {
+ 	struct batadv_priv *bat_priv = orig->bat_priv;
+ 
+-	if (!(orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST) &&
+-	    orig->capa_initialized & BATADV_ORIG_CAPA_HAS_MCAST)
++	spin_lock_bh(&orig->mcast_handler_lock);
++
++	if (!test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities) &&
++	    test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capa_initialized))
+ 		atomic_dec(&bat_priv->mcast.num_disabled);
+ 
+ 	batadv_mcast_want_unsnoop_update(bat_priv, orig, BATADV_NO_FLAGS);
+ 	batadv_mcast_want_ipv4_update(bat_priv, orig, BATADV_NO_FLAGS);
+ 	batadv_mcast_want_ipv6_update(bat_priv, orig, BATADV_NO_FLAGS);
++
++	spin_unlock_bh(&orig->mcast_handler_lock);
+ }
+diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c
+index f0a50f31d822..46604010dcd4 100644
+--- a/net/batman-adv/network-coding.c
++++ b/net/batman-adv/network-coding.c
+@@ -19,6 +19,7 @@
+ #include "main.h"
+ 
+ #include <linux/atomic.h>
++#include <linux/bitops.h>
+ #include <linux/byteorder/generic.h>
+ #include <linux/compiler.h>
+ #include <linux/debugfs.h>
+@@ -134,9 +135,9 @@ static void batadv_nc_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
+ 					  uint16_t tvlv_value_len)
+ {
+ 	if (flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND)
+-		orig->capabilities &= ~BATADV_ORIG_CAPA_HAS_NC;
++		clear_bit(BATADV_ORIG_CAPA_HAS_NC, &orig->capabilities);
+ 	else
+-		orig->capabilities |= BATADV_ORIG_CAPA_HAS_NC;
++		set_bit(BATADV_ORIG_CAPA_HAS_NC, &orig->capabilities);
+ }
+ 
+ /**
+@@ -894,7 +895,7 @@ void batadv_nc_update_nc_node(struct batadv_priv *bat_priv,
+ 		goto out;
+ 
+ 	/* check if orig node is network coding enabled */
+-	if (!(orig_node->capabilities & BATADV_ORIG_CAPA_HAS_NC))
++	if (!test_bit(BATADV_ORIG_CAPA_HAS_NC, &orig_node->capabilities))
+ 		goto out;
+ 
+ 	/* accept ogms from 'good' neighbors and single hop neighbors */
+diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
+index 018b7495ad84..32a0fcfab36d 100644
+--- a/net/batman-adv/originator.c
++++ b/net/batman-adv/originator.c
+@@ -696,8 +696,13 @@ struct batadv_orig_node *batadv_orig_node_new(struct batadv_priv *bat_priv,
+ 	orig_node->last_seen = jiffies;
+ 	reset_time = jiffies - 1 - msecs_to_jiffies(BATADV_RESET_PROTECTION_MS);
+ 	orig_node->bcast_seqno_reset = reset_time;
++
+ #ifdef CONFIG_BATMAN_ADV_MCAST
+ 	orig_node->mcast_flags = BATADV_NO_FLAGS;
++	INIT_HLIST_NODE(&orig_node->mcast_want_all_unsnoopables_node);
++	INIT_HLIST_NODE(&orig_node->mcast_want_all_ipv4_node);
++	INIT_HLIST_NODE(&orig_node->mcast_want_all_ipv6_node);
++	spin_lock_init(&orig_node->mcast_handler_lock);
+ #endif
+ 
+ 	/* create a vlan object for the "untagged" LAN */
+diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
+index a2fc843c2243..51cda3a7c51d 100644
+--- a/net/batman-adv/soft-interface.c
++++ b/net/batman-adv/soft-interface.c
+@@ -202,6 +202,7 @@ static int batadv_interface_tx(struct sk_buff *skb,
+ 	int gw_mode;
+ 	enum batadv_forw_mode forw_mode;
+ 	struct batadv_orig_node *mcast_single_orig = NULL;
++	int network_offset = ETH_HLEN;
+ 
+ 	if (atomic_read(&bat_priv->mesh_state) != BATADV_MESH_ACTIVE)
+ 		goto dropped;
+@@ -214,14 +215,18 @@ static int batadv_interface_tx(struct sk_buff *skb,
+ 	case ETH_P_8021Q:
+ 		vhdr = vlan_eth_hdr(skb);
+ 
+-		if (vhdr->h_vlan_encapsulated_proto != ethertype)
++		if (vhdr->h_vlan_encapsulated_proto != ethertype) {
++			network_offset += VLAN_HLEN;
+ 			break;
++		}
+ 
+ 		/* fall through */
+ 	case ETH_P_BATMAN:
+ 		goto dropped;
+ 	}
+ 
++	skb_set_network_header(skb, network_offset);
++
+ 	if (batadv_bla_tx(bat_priv, skb, vid))
+ 		goto dropped;
+ 
+diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
+index 5809b39c1922..c9b26291ac4c 100644
+--- a/net/batman-adv/translation-table.c
++++ b/net/batman-adv/translation-table.c
+@@ -19,6 +19,7 @@
+ #include "main.h"
+ 
+ #include <linux/atomic.h>
++#include <linux/bitops.h>
+ #include <linux/bug.h>
+ #include <linux/byteorder/generic.h>
+ #include <linux/compiler.h>
+@@ -1882,7 +1883,7 @@ void batadv_tt_global_del_orig(struct batadv_priv *bat_priv,
+ 		}
+ 		spin_unlock_bh(list_lock);
+ 	}
+-	orig_node->capa_initialized &= ~BATADV_ORIG_CAPA_HAS_TT;
++	clear_bit(BATADV_ORIG_CAPA_HAS_TT, &orig_node->capa_initialized);
+ }
+ 
+ static bool batadv_tt_global_to_purge(struct batadv_tt_global_entry *tt_global,
+@@ -2841,7 +2842,7 @@ static void _batadv_tt_update_changes(struct batadv_priv *bat_priv,
+ 				return;
+ 		}
+ 	}
+-	orig_node->capa_initialized |= BATADV_ORIG_CAPA_HAS_TT;
++	set_bit(BATADV_ORIG_CAPA_HAS_TT, &orig_node->capa_initialized);
+ }
+ 
+ static void batadv_tt_fill_gtable(struct batadv_priv *bat_priv,
+@@ -3343,7 +3344,8 @@ static void batadv_tt_update_orig(struct batadv_priv *bat_priv,
+ 	bool has_tt_init;
+ 
+ 	tt_vlan = (struct batadv_tvlv_tt_vlan_data *)tt_buff;
+-	has_tt_init = orig_node->capa_initialized & BATADV_ORIG_CAPA_HAS_TT;
++	has_tt_init = test_bit(BATADV_ORIG_CAPA_HAS_TT,
++			       &orig_node->capa_initialized);
+ 
+ 	/* orig table not initialised AND first diff is in the OGM OR the ttvn
+ 	 * increased by one -> we can apply the attached changes
+diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
+index 67d63483618e..55610a805b53 100644
+--- a/net/batman-adv/types.h
++++ b/net/batman-adv/types.h
+@@ -221,6 +221,7 @@ struct batadv_orig_bat_iv {
+  * @batadv_dat_addr_t:  address of the orig node in the distributed hash
+  * @last_seen: time when last packet from this node was received
+  * @bcast_seqno_reset: time when the broadcast seqno window was reset
++ * @mcast_handler_lock: synchronizes mcast-capability and -flag changes
+  * @mcast_flags: multicast flags announced by the orig node
+  * @mcast_want_all_unsnoop_node: a list node for the
+  *  mcast.want_all_unsnoopables list
+@@ -268,13 +269,15 @@ struct batadv_orig_node {
+ 	unsigned long last_seen;
+ 	unsigned long bcast_seqno_reset;
+ #ifdef CONFIG_BATMAN_ADV_MCAST
++	/* synchronizes mcast tvlv specific orig changes */
++	spinlock_t mcast_handler_lock;
+ 	uint8_t mcast_flags;
+ 	struct hlist_node mcast_want_all_unsnoopables_node;
+ 	struct hlist_node mcast_want_all_ipv4_node;
+ 	struct hlist_node mcast_want_all_ipv6_node;
+ #endif
+-	uint8_t capabilities;
+-	uint8_t capa_initialized;
++	unsigned long capabilities;
++	unsigned long capa_initialized;
+ 	atomic_t last_ttvn;
+ 	unsigned char *tt_buff;
+ 	int16_t tt_buff_len;
+@@ -313,10 +316,10 @@ struct batadv_orig_node {
+  *  (= orig node announces a tvlv of type BATADV_TVLV_MCAST)
+  */
+ enum batadv_orig_capabilities {
+-	BATADV_ORIG_CAPA_HAS_DAT = BIT(0),
+-	BATADV_ORIG_CAPA_HAS_NC = BIT(1),
+-	BATADV_ORIG_CAPA_HAS_TT = BIT(2),
+-	BATADV_ORIG_CAPA_HAS_MCAST = BIT(3),
++	BATADV_ORIG_CAPA_HAS_DAT,
++	BATADV_ORIG_CAPA_HAS_NC,
++	BATADV_ORIG_CAPA_HAS_TT,
++	BATADV_ORIG_CAPA_HAS_MCAST,
+ };
+ 
+ /**
+diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c
+index ad82324f710f..0510a577a7b5 100644
+--- a/net/bluetooth/smp.c
++++ b/net/bluetooth/smp.c
+@@ -2311,12 +2311,6 @@ int smp_conn_security(struct hci_conn *hcon, __u8 sec_level)
+ 	if (!conn)
+ 		return 1;
+ 
+-	chan = conn->smp;
+-	if (!chan) {
+-		BT_ERR("SMP security requested but not available");
+-		return 1;
+-	}
+-
+ 	if (!hci_dev_test_flag(hcon->hdev, HCI_LE_ENABLED))
+ 		return 1;
+ 
+@@ -2330,6 +2324,12 @@ int smp_conn_security(struct hci_conn *hcon, __u8 sec_level)
+ 		if (smp_ltk_encrypt(conn, hcon->pending_sec_level))
+ 			return 0;
+ 
++	chan = conn->smp;
++	if (!chan) {
++		BT_ERR("SMP security requested but not available");
++		return 1;
++	}
++
+ 	l2cap_chan_lock(chan);
+ 
+ 	/* If SMP is already in progress ignore this request */
+diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
+index afe905c208af..691b54fcaf2a 100644
+--- a/net/netfilter/ipset/ip_set_hash_gen.h
++++ b/net/netfilter/ipset/ip_set_hash_gen.h
+@@ -152,9 +152,13 @@ htable_bits(u32 hashsize)
+ #define SET_HOST_MASK(family)	(family == AF_INET ? 32 : 128)
+ 
+ #ifdef IP_SET_HASH_WITH_NET0
++/* cidr from 0 to SET_HOST_MASK() value and c = cidr + 1 */
+ #define NLEN(family)		(SET_HOST_MASK(family) + 1)
++#define CIDR_POS(c)		((c) - 1)
+ #else
++/* cidr from 1 to SET_HOST_MASK() value and c = cidr + 1 */
+ #define NLEN(family)		SET_HOST_MASK(family)
++#define CIDR_POS(c)		((c) - 2)
+ #endif
+ 
+ #else
+@@ -305,7 +309,7 @@ mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
+ 		} else if (h->nets[i].cidr[n] < cidr) {
+ 			j = i;
+ 		} else if (h->nets[i].cidr[n] == cidr) {
+-			h->nets[cidr - 1].nets[n]++;
++			h->nets[CIDR_POS(cidr)].nets[n]++;
+ 			return;
+ 		}
+ 	}
+@@ -314,7 +318,7 @@ mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
+ 			h->nets[i].cidr[n] = h->nets[i - 1].cidr[n];
+ 	}
+ 	h->nets[i].cidr[n] = cidr;
+-	h->nets[cidr - 1].nets[n] = 1;
++	h->nets[CIDR_POS(cidr)].nets[n] = 1;
+ }
+ 
+ static void
+@@ -325,8 +329,8 @@ mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
+ 	for (i = 0; i < nets_length; i++) {
+ 		if (h->nets[i].cidr[n] != cidr)
+ 			continue;
+-		h->nets[cidr - 1].nets[n]--;
+-		if (h->nets[cidr - 1].nets[n] > 0)
++		h->nets[CIDR_POS(cidr)].nets[n]--;
++		if (h->nets[CIDR_POS(cidr)].nets[n] > 0)
+ 			return;
+ 		for (j = i; j < net_end && h->nets[j].cidr[n]; j++)
+ 			h->nets[j].cidr[n] = h->nets[j + 1].cidr[n];
+diff --git a/net/netfilter/ipset/ip_set_hash_netnet.c b/net/netfilter/ipset/ip_set_hash_netnet.c
+index 3c862c0a76d1..a93dfebffa81 100644
+--- a/net/netfilter/ipset/ip_set_hash_netnet.c
++++ b/net/netfilter/ipset/ip_set_hash_netnet.c
+@@ -131,6 +131,13 @@ hash_netnet4_data_next(struct hash_netnet4_elem *next,
+ #define HOST_MASK	32
+ #include "ip_set_hash_gen.h"
+ 
++static void
++hash_netnet4_init(struct hash_netnet4_elem *e)
++{
++	e->cidr[0] = HOST_MASK;
++	e->cidr[1] = HOST_MASK;
++}
++
+ static int
+ hash_netnet4_kadt(struct ip_set *set, const struct sk_buff *skb,
+ 		  const struct xt_action_param *par,
+@@ -160,7 +167,7 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
+ {
+ 	const struct hash_netnet *h = set->data;
+ 	ipset_adtfn adtfn = set->variant->adt[adt];
+-	struct hash_netnet4_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
++	struct hash_netnet4_elem e = { };
+ 	struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
+ 	u32 ip = 0, ip_to = 0, last;
+ 	u32 ip2 = 0, ip2_from = 0, ip2_to = 0, last2;
+@@ -169,6 +176,7 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
+ 	if (tb[IPSET_ATTR_LINENO])
+ 		*lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
+ 
++	hash_netnet4_init(&e);
+ 	if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
+ 		     !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS)))
+ 		return -IPSET_ERR_PROTOCOL;
+@@ -357,6 +365,13 @@ hash_netnet6_data_next(struct hash_netnet4_elem *next,
+ #define IP_SET_EMIT_CREATE
+ #include "ip_set_hash_gen.h"
+ 
++static void
++hash_netnet6_init(struct hash_netnet6_elem *e)
++{
++	e->cidr[0] = HOST_MASK;
++	e->cidr[1] = HOST_MASK;
++}
++
+ static int
+ hash_netnet6_kadt(struct ip_set *set, const struct sk_buff *skb,
+ 		  const struct xt_action_param *par,
+@@ -385,13 +400,14 @@ hash_netnet6_uadt(struct ip_set *set, struct nlattr *tb[],
+ 		  enum ipset_adt adt, u32 *lineno, u32 flags, bool retried)
+ {
+ 	ipset_adtfn adtfn = set->variant->adt[adt];
+-	struct hash_netnet6_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
++	struct hash_netnet6_elem e = { };
+ 	struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
+ 	int ret;
+ 
+ 	if (tb[IPSET_ATTR_LINENO])
+ 		*lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
+ 
++	hash_netnet6_init(&e);
+ 	if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
+ 		     !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS)))
+ 		return -IPSET_ERR_PROTOCOL;
+diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c b/net/netfilter/ipset/ip_set_hash_netportnet.c
+index 0c68734f5cc4..9a14c237830f 100644
+--- a/net/netfilter/ipset/ip_set_hash_netportnet.c
++++ b/net/netfilter/ipset/ip_set_hash_netportnet.c
+@@ -142,6 +142,13 @@ hash_netportnet4_data_next(struct hash_netportnet4_elem *next,
+ #define HOST_MASK	32
+ #include "ip_set_hash_gen.h"
+ 
++static void
++hash_netportnet4_init(struct hash_netportnet4_elem *e)
++{
++	e->cidr[0] = HOST_MASK;
++	e->cidr[1] = HOST_MASK;
++}
++
+ static int
+ hash_netportnet4_kadt(struct ip_set *set, const struct sk_buff *skb,
+ 		      const struct xt_action_param *par,
+@@ -175,7 +182,7 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[],
+ {
+ 	const struct hash_netportnet *h = set->data;
+ 	ipset_adtfn adtfn = set->variant->adt[adt];
+-	struct hash_netportnet4_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
++	struct hash_netportnet4_elem e = { };
+ 	struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
+ 	u32 ip = 0, ip_to = 0, ip_last, p = 0, port, port_to;
+ 	u32 ip2_from = 0, ip2_to = 0, ip2_last, ip2;
+@@ -185,6 +192,7 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[],
+ 	if (tb[IPSET_ATTR_LINENO])
+ 		*lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
+ 
++	hash_netportnet4_init(&e);
+ 	if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
+ 		     !ip_set_attr_netorder(tb, IPSET_ATTR_PORT) ||
+ 		     !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) ||
+@@ -412,6 +420,13 @@ hash_netportnet6_data_next(struct hash_netportnet4_elem *next,
+ #define IP_SET_EMIT_CREATE
+ #include "ip_set_hash_gen.h"
+ 
++static void
++hash_netportnet6_init(struct hash_netportnet6_elem *e)
++{
++	e->cidr[0] = HOST_MASK;
++	e->cidr[1] = HOST_MASK;
++}
++
+ static int
+ hash_netportnet6_kadt(struct ip_set *set, const struct sk_buff *skb,
+ 		      const struct xt_action_param *par,
+@@ -445,7 +460,7 @@ hash_netportnet6_uadt(struct ip_set *set, struct nlattr *tb[],
+ {
+ 	const struct hash_netportnet *h = set->data;
+ 	ipset_adtfn adtfn = set->variant->adt[adt];
+-	struct hash_netportnet6_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
++	struct hash_netportnet6_elem e = { };
+ 	struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
+ 	u32 port, port_to;
+ 	bool with_ports = false;
+@@ -454,6 +469,7 @@ hash_netportnet6_uadt(struct ip_set *set, struct nlattr *tb[],
+ 	if (tb[IPSET_ATTR_LINENO])
+ 		*lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
+ 
++	hash_netportnet6_init(&e);
+ 	if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
+ 		     !ip_set_attr_netorder(tb, IPSET_ATTR_PORT) ||
+ 		     !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) ||
+diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
+index 3c20d02aee73..0625a42df108 100644
+--- a/net/netfilter/nf_conntrack_core.c
++++ b/net/netfilter/nf_conntrack_core.c
+@@ -320,12 +320,13 @@ out_free:
+ }
+ EXPORT_SYMBOL_GPL(nf_ct_tmpl_alloc);
+ 
+-static void nf_ct_tmpl_free(struct nf_conn *tmpl)
++void nf_ct_tmpl_free(struct nf_conn *tmpl)
+ {
+ 	nf_ct_ext_destroy(tmpl);
+ 	nf_ct_ext_free(tmpl);
+ 	kfree(tmpl);
+ }
++EXPORT_SYMBOL_GPL(nf_ct_tmpl_free);
+ 
+ static void
+ destroy_conntrack(struct nf_conntrack *nfct)
+diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
+index 675d12c69e32..a5d41dfa9f05 100644
+--- a/net/netfilter/nf_log.c
++++ b/net/netfilter/nf_log.c
+@@ -107,12 +107,17 @@ EXPORT_SYMBOL(nf_log_register);
+ 
+ void nf_log_unregister(struct nf_logger *logger)
+ {
++	const struct nf_logger *log;
+ 	int i;
+ 
+ 	mutex_lock(&nf_log_mutex);
+-	for (i = 0; i < NFPROTO_NUMPROTO; i++)
+-		RCU_INIT_POINTER(loggers[i][logger->type], NULL);
++	for (i = 0; i < NFPROTO_NUMPROTO; i++) {
++		log = nft_log_dereference(loggers[i][logger->type]);
++		if (log == logger)
++			RCU_INIT_POINTER(loggers[i][logger->type], NULL);
++	}
+ 	mutex_unlock(&nf_log_mutex);
++	synchronize_rcu();
+ }
+ EXPORT_SYMBOL(nf_log_unregister);
+ 
+diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
+index d7f168527903..d6ee8f8b19b6 100644
+--- a/net/netfilter/nf_synproxy_core.c
++++ b/net/netfilter/nf_synproxy_core.c
+@@ -378,7 +378,7 @@ static int __net_init synproxy_net_init(struct net *net)
+ err3:
+ 	free_percpu(snet->stats);
+ err2:
+-	nf_conntrack_free(ct);
++	nf_ct_tmpl_free(ct);
+ err1:
+ 	return err;
+ }
+diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
+index 0c0e8ecf02ab..70277b11f742 100644
+--- a/net/netfilter/nfnetlink.c
++++ b/net/netfilter/nfnetlink.c
+@@ -444,6 +444,7 @@ done:
+ static void nfnetlink_rcv(struct sk_buff *skb)
+ {
+ 	struct nlmsghdr *nlh = nlmsg_hdr(skb);
++	u_int16_t res_id;
+ 	int msglen;
+ 
+ 	if (nlh->nlmsg_len < NLMSG_HDRLEN ||
+@@ -468,7 +469,12 @@ static void nfnetlink_rcv(struct sk_buff *skb)
+ 
+ 		nfgenmsg = nlmsg_data(nlh);
+ 		skb_pull(skb, msglen);
+-		nfnetlink_rcv_batch(skb, nlh, nfgenmsg->res_id);
++		/* Work around old nft using host byte order */
++		if (nfgenmsg->res_id == NFNL_SUBSYS_NFTABLES)
++			res_id = NFNL_SUBSYS_NFTABLES;
++		else
++			res_id = ntohs(nfgenmsg->res_id);
++		nfnetlink_rcv_batch(skb, nlh, res_id);
+ 	} else {
+ 		netlink_rcv_skb(skb, &nfnetlink_rcv_msg);
+ 	}
+diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
+index 66def315eb56..9c8fab00164b 100644
+--- a/net/netfilter/nft_compat.c
++++ b/net/netfilter/nft_compat.c
+@@ -619,6 +619,13 @@ struct nft_xt {
+ 
+ static struct nft_expr_type nft_match_type;
+ 
++static bool nft_match_cmp(const struct xt_match *match,
++			  const char *name, u32 rev, u32 family)
++{
++	return strcmp(match->name, name) == 0 && match->revision == rev &&
++	       (match->family == NFPROTO_UNSPEC || match->family == family);
++}
++
+ static const struct nft_expr_ops *
+ nft_match_select_ops(const struct nft_ctx *ctx,
+ 		     const struct nlattr * const tb[])
+@@ -626,7 +633,7 @@ nft_match_select_ops(const struct nft_ctx *ctx,
+ 	struct nft_xt *nft_match;
+ 	struct xt_match *match;
+ 	char *mt_name;
+-	__u32 rev, family;
++	u32 rev, family;
+ 
+ 	if (tb[NFTA_MATCH_NAME] == NULL ||
+ 	    tb[NFTA_MATCH_REV] == NULL ||
+@@ -641,8 +648,7 @@ nft_match_select_ops(const struct nft_ctx *ctx,
+ 	list_for_each_entry(nft_match, &nft_match_list, head) {
+ 		struct xt_match *match = nft_match->ops.data;
+ 
+-		if (strcmp(match->name, mt_name) == 0 &&
+-		    match->revision == rev && match->family == family) {
++		if (nft_match_cmp(match, mt_name, rev, family)) {
+ 			if (!try_module_get(match->me))
+ 				return ERR_PTR(-ENOENT);
+ 
+@@ -693,6 +699,13 @@ static LIST_HEAD(nft_target_list);
+ 
+ static struct nft_expr_type nft_target_type;
+ 
++static bool nft_target_cmp(const struct xt_target *tg,
++			   const char *name, u32 rev, u32 family)
++{
++	return strcmp(tg->name, name) == 0 && tg->revision == rev &&
++	       (tg->family == NFPROTO_UNSPEC || tg->family == family);
++}
++
+ static const struct nft_expr_ops *
+ nft_target_select_ops(const struct nft_ctx *ctx,
+ 		      const struct nlattr * const tb[])
+@@ -700,7 +713,7 @@ nft_target_select_ops(const struct nft_ctx *ctx,
+ 	struct nft_xt *nft_target;
+ 	struct xt_target *target;
+ 	char *tg_name;
+-	__u32 rev, family;
++	u32 rev, family;
+ 
+ 	if (tb[NFTA_TARGET_NAME] == NULL ||
+ 	    tb[NFTA_TARGET_REV] == NULL ||
+@@ -715,8 +728,7 @@ nft_target_select_ops(const struct nft_ctx *ctx,
+ 	list_for_each_entry(nft_target, &nft_target_list, head) {
+ 		struct xt_target *target = nft_target->ops.data;
+ 
+-		if (strcmp(target->name, tg_name) == 0 &&
+-		    target->revision == rev && target->family == family) {
++		if (nft_target_cmp(target, tg_name, rev, family)) {
+ 			if (!try_module_get(target->me))
+ 				return ERR_PTR(-ENOENT);
+ 
+diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
+index 43ddeee404e9..f3377ce1ff18 100644
+--- a/net/netfilter/xt_CT.c
++++ b/net/netfilter/xt_CT.c
+@@ -233,7 +233,7 @@ out:
+ 	return 0;
+ 
+ err3:
+-	nf_conntrack_free(ct);
++	nf_ct_tmpl_free(ct);
+ err2:
+ 	nf_ct_l3proto_module_put(par->family);
+ err1:
+diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+index d25cd430f9ff..95412abc95b0 100644
+--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
++++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+@@ -384,6 +384,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
+ 		      int byte_count)
+ {
+ 	struct ib_send_wr send_wr;
++	u32 xdr_off;
+ 	int sge_no;
+ 	int sge_bytes;
+ 	int page_no;
+@@ -418,8 +419,8 @@ static int send_reply(struct svcxprt_rdma *rdma,
+ 	ctxt->direction = DMA_TO_DEVICE;
+ 
+ 	/* Map the payload indicated by 'byte_count' */
++	xdr_off = 0;
+ 	for (sge_no = 1; byte_count && sge_no < vec->count; sge_no++) {
+-		int xdr_off = 0;
+ 		sge_bytes = min_t(size_t, vec->sge[sge_no].iov_len, byte_count);
+ 		byte_count -= sge_bytes;
+ 		ctxt->sge[sge_no].addr =
+@@ -457,6 +458,13 @@ static int send_reply(struct svcxprt_rdma *rdma,
+ 	}
+ 	rqstp->rq_next_page = rqstp->rq_respages + 1;
+ 
++	/* The loop above bumps sc_dma_used for each sge. The
++	 * xdr_buf.tail gets a separate sge, but resides in the
++	 * same page as xdr_buf.head. Don't count it twice.
++	 */
++	if (sge_no > ctxt->count)
++		atomic_dec(&rdma->sc_dma_used);
++
+ 	if (sge_no > rdma->sc_max_sge) {
+ 		pr_err("svcrdma: Too many sges (%d)\n", sge_no);
+ 		goto err;
+diff --git a/sound/arm/Kconfig b/sound/arm/Kconfig
+index 885683a3b0bd..e0406211716b 100644
+--- a/sound/arm/Kconfig
++++ b/sound/arm/Kconfig
+@@ -9,6 +9,14 @@ menuconfig SND_ARM
+ 	  Drivers that are implemented on ASoC can be found in
+ 	  "ALSA for SoC audio support" section.
+ 
++config SND_PXA2XX_LIB
++	tristate
++	select SND_AC97_CODEC if SND_PXA2XX_LIB_AC97
++	select SND_DMAENGINE_PCM
++
++config SND_PXA2XX_LIB_AC97
++	bool
++
+ if SND_ARM
+ 
+ config SND_ARMAACI
+@@ -21,13 +29,6 @@ config SND_PXA2XX_PCM
+ 	tristate
+ 	select SND_PCM
+ 
+-config SND_PXA2XX_LIB
+-	tristate
+-	select SND_AC97_CODEC if SND_PXA2XX_LIB_AC97
+-
+-config SND_PXA2XX_LIB_AC97
+-	bool
+-
+ config SND_PXA2XX_AC97
+ 	tristate "AC97 driver for the Intel PXA2xx chip"
+ 	depends on ARCH_PXA
+diff --git a/sound/pci/hda/hda_tegra.c b/sound/pci/hda/hda_tegra.c
+index 477742cb70a2..58c0aad37284 100644
+--- a/sound/pci/hda/hda_tegra.c
++++ b/sound/pci/hda/hda_tegra.c
+@@ -73,6 +73,7 @@ struct hda_tegra {
+ 	struct clk *hda2codec_2x_clk;
+ 	struct clk *hda2hdmi_clk;
+ 	void __iomem *regs;
++	struct work_struct probe_work;
+ };
+ 
+ #ifdef CONFIG_PM
+@@ -294,7 +295,9 @@ static int hda_tegra_dev_disconnect(struct snd_device *device)
+ static int hda_tegra_dev_free(struct snd_device *device)
+ {
+ 	struct azx *chip = device->device_data;
++	struct hda_tegra *hda = container_of(chip, struct hda_tegra, chip);
+ 
++	cancel_work_sync(&hda->probe_work);
+ 	if (azx_bus(chip)->chip_init) {
+ 		azx_stop_all_streams(chip);
+ 		azx_stop_chip(chip);
+@@ -426,6 +429,9 @@ static int hda_tegra_first_init(struct azx *chip, struct platform_device *pdev)
+ /*
+  * constructor
+  */
++
++static void hda_tegra_probe_work(struct work_struct *work);
++
+ static int hda_tegra_create(struct snd_card *card,
+ 			    unsigned int driver_caps,
+ 			    struct hda_tegra *hda)
+@@ -452,6 +458,8 @@ static int hda_tegra_create(struct snd_card *card,
+ 	chip->single_cmd = false;
+ 	chip->snoop = true;
+ 
++	INIT_WORK(&hda->probe_work, hda_tegra_probe_work);
++
+ 	err = azx_bus_init(chip, NULL, &hda_tegra_io_ops);
+ 	if (err < 0)
+ 		return err;
+@@ -499,6 +507,21 @@ static int hda_tegra_probe(struct platform_device *pdev)
+ 	card->private_data = chip;
+ 
+ 	dev_set_drvdata(&pdev->dev, card);
++	schedule_work(&hda->probe_work);
++
++	return 0;
++
++out_free:
++	snd_card_free(card);
++	return err;
++}
++
++static void hda_tegra_probe_work(struct work_struct *work)
++{
++	struct hda_tegra *hda = container_of(work, struct hda_tegra, probe_work);
++	struct azx *chip = &hda->chip;
++	struct platform_device *pdev = to_platform_device(hda->dev);
++	int err;
+ 
+ 	err = hda_tegra_first_init(chip, pdev);
+ 	if (err < 0)
+@@ -520,11 +543,8 @@ static int hda_tegra_probe(struct platform_device *pdev)
+ 	chip->running = 1;
+ 	snd_hda_set_power_save(&chip->bus, power_save * 1000);
+ 
+-	return 0;
+-
+-out_free:
+-	snd_card_free(card);
+-	return err;
++ out_free:
++	return; /* no error return from async probe */
+ }
+ 
+ static int hda_tegra_remove(struct platform_device *pdev)
+diff --git a/sound/pci/hda/patch_cirrus.c b/sound/pci/hda/patch_cirrus.c
+index 584a0343ab0c..85813de26da8 100644
+--- a/sound/pci/hda/patch_cirrus.c
++++ b/sound/pci/hda/patch_cirrus.c
+@@ -633,6 +633,7 @@ static const struct snd_pci_quirk cs4208_mac_fixup_tbl[] = {
+ 	SND_PCI_QUIRK(0x106b, 0x5e00, "MacBookPro 11,2", CS4208_MBP11),
+ 	SND_PCI_QUIRK(0x106b, 0x7100, "MacBookAir 6,1", CS4208_MBA6),
+ 	SND_PCI_QUIRK(0x106b, 0x7200, "MacBookAir 6,2", CS4208_MBA6),
++	SND_PCI_QUIRK(0x106b, 0x7b00, "MacBookPro 12,1", CS4208_MBP11),
+ 	{} /* terminator */
+ };
+ 
+diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
+index c8f01ccc2513..6a66139871c6 100644
+--- a/sound/pci/hda/patch_realtek.c
++++ b/sound/pci/hda/patch_realtek.c
+@@ -4188,6 +4188,24 @@ static void alc_fixup_disable_aamix(struct hda_codec *codec,
+ 	}
+ }
+ 
++/* fixup for Thinkpad docks: add dock pins, avoid HP parser fixup */
++static void alc_fixup_tpt440_dock(struct hda_codec *codec,
++				  const struct hda_fixup *fix, int action)
++{
++	static const struct hda_pintbl pincfgs[] = {
++		{ 0x16, 0x21211010 }, /* dock headphone */
++		{ 0x19, 0x21a11010 }, /* dock mic */
++		{ }
++	};
++	struct alc_spec *spec = codec->spec;
++
++	if (action == HDA_FIXUP_ACT_PRE_PROBE) {
++		spec->parse_flags = HDA_PINCFG_NO_HP_FIXUP;
++		codec->power_save_node = 0; /* avoid click noises */
++		snd_hda_apply_pincfgs(codec, pincfgs);
++	}
++}
++
+ static void alc_shutup_dell_xps13(struct hda_codec *codec)
+ {
+ 	struct alc_spec *spec = codec->spec;
+@@ -4562,7 +4580,6 @@ enum {
+ 	ALC255_FIXUP_HEADSET_MODE_NO_HP_MIC,
+ 	ALC293_FIXUP_DELL1_MIC_NO_PRESENCE,
+ 	ALC292_FIXUP_TPT440_DOCK,
+-	ALC292_FIXUP_TPT440_DOCK2,
+ 	ALC283_FIXUP_BXBT2807_MIC,
+ 	ALC255_FIXUP_DELL_WMI_MIC_MUTE_LED,
+ 	ALC282_FIXUP_ASPIRE_V5_PINS,
+@@ -5029,17 +5046,7 @@ static const struct hda_fixup alc269_fixups[] = {
+ 	},
+ 	[ALC292_FIXUP_TPT440_DOCK] = {
+ 		.type = HDA_FIXUP_FUNC,
+-		.v.func = alc269_fixup_pincfg_no_hp_to_lineout,
+-		.chained = true,
+-		.chain_id = ALC292_FIXUP_TPT440_DOCK2
+-	},
+-	[ALC292_FIXUP_TPT440_DOCK2] = {
+-		.type = HDA_FIXUP_PINS,
+-		.v.pins = (const struct hda_pintbl[]) {
+-			{ 0x16, 0x21211010 }, /* dock headphone */
+-			{ 0x19, 0x21a11010 }, /* dock mic */
+-			{ }
+-		},
++		.v.func = alc_fixup_tpt440_dock,
+ 		.chained = true,
+ 		.chain_id = ALC269_FIXUP_LIMIT_INT_MIC_BOOST
+ 	},
+@@ -5299,6 +5306,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
+ 	SND_PCI_QUIRK(0x17aa, 0x2212, "Thinkpad T440", ALC292_FIXUP_TPT440_DOCK),
+ 	SND_PCI_QUIRK(0x17aa, 0x2214, "Thinkpad X240", ALC292_FIXUP_TPT440_DOCK),
+ 	SND_PCI_QUIRK(0x17aa, 0x2215, "Thinkpad", ALC269_FIXUP_LIMIT_INT_MIC_BOOST),
++	SND_PCI_QUIRK(0x17aa, 0x2223, "ThinkPad T550", ALC292_FIXUP_TPT440_DOCK),
+ 	SND_PCI_QUIRK(0x17aa, 0x2226, "ThinkPad X250", ALC292_FIXUP_TPT440_DOCK),
+ 	SND_PCI_QUIRK(0x17aa, 0x3977, "IdeaPad S210", ALC283_FIXUP_INT_MIC),
+ 	SND_PCI_QUIRK(0x17aa, 0x3978, "IdeaPad Y410P", ALC269_FIXUP_NO_SHUTUP),
+diff --git a/sound/pci/hda/patch_sigmatel.c b/sound/pci/hda/patch_sigmatel.c
+index 9d947aef2c8b..def5cc8dff02 100644
+--- a/sound/pci/hda/patch_sigmatel.c
++++ b/sound/pci/hda/patch_sigmatel.c
+@@ -4520,7 +4520,11 @@ static int patch_stac92hd73xx(struct hda_codec *codec)
+ 		return err;
+ 
+ 	spec = codec->spec;
+-	codec->power_save_node = 1;
++	/* enable power_save_node only for new 92HD89xx chips, as it causes
++	 * click noises on old 92HD73xx chips.
++	 */
++	if ((codec->core.vendor_id & 0xfffffff0) != 0x111d7670)
++		codec->power_save_node = 1;
+ 	spec->linear_tone_beep = 0;
+ 	spec->gen.mixer_nid = 0x1d;
+ 	spec->have_spdif_mux = 1;
+diff --git a/sound/soc/au1x/db1200.c b/sound/soc/au1x/db1200.c
+index 58c3164802b8..8c907ebea189 100644
+--- a/sound/soc/au1x/db1200.c
++++ b/sound/soc/au1x/db1200.c
+@@ -129,6 +129,8 @@ static struct snd_soc_dai_link db1300_i2s_dai = {
+ 	.cpu_dai_name	= "au1xpsc_i2s.2",
+ 	.platform_name	= "au1xpsc-pcm.2",
+ 	.codec_name	= "wm8731.0-001b",
++	.dai_fmt	= SND_SOC_DAIFMT_LEFT_J | SND_SOC_DAIFMT_NB_NF |
++			  SND_SOC_DAIFMT_CBM_CFM,
+ 	.ops		= &db1200_i2s_wm8731_ops,
+ };
+ 
+@@ -146,6 +148,8 @@ static struct snd_soc_dai_link db1550_i2s_dai = {
+ 	.cpu_dai_name	= "au1xpsc_i2s.3",
+ 	.platform_name	= "au1xpsc-pcm.3",
+ 	.codec_name	= "wm8731.0-001b",
++	.dai_fmt	= SND_SOC_DAIFMT_LEFT_J | SND_SOC_DAIFMT_NB_NF |
++			  SND_SOC_DAIFMT_CBM_CFM,
+ 	.ops		= &db1200_i2s_wm8731_ops,
+ };
+ 
+diff --git a/sound/soc/codecs/sgtl5000.c b/sound/soc/codecs/sgtl5000.c
+index e673f6ceb521..7c411297bfdd 100644
+--- a/sound/soc/codecs/sgtl5000.c
++++ b/sound/soc/codecs/sgtl5000.c
+@@ -1377,8 +1377,8 @@ static int sgtl5000_probe(struct snd_soc_codec *codec)
+ 			sgtl5000->micbias_resistor << SGTL5000_BIAS_R_SHIFT);
+ 
+ 	snd_soc_update_bits(codec, SGTL5000_CHIP_MIC_CTRL,
+-			SGTL5000_BIAS_R_MASK,
+-			sgtl5000->micbias_voltage << SGTL5000_BIAS_R_SHIFT);
++			SGTL5000_BIAS_VOLT_MASK,
++			sgtl5000->micbias_voltage << SGTL5000_BIAS_VOLT_SHIFT);
+ 	/*
+ 	 * disable DAP
+ 	 * TODO:
+diff --git a/sound/soc/codecs/tas2552.c b/sound/soc/codecs/tas2552.c
+index 4f25a7d0efa2..b3e5685aca1e 100644
+--- a/sound/soc/codecs/tas2552.c
++++ b/sound/soc/codecs/tas2552.c
+@@ -551,7 +551,7 @@ static struct snd_soc_dai_driver tas2552_dai[] = {
+ /*
+  * DAC digital volumes. From -7 to 24 dB in 1 dB steps
+  */
+-static DECLARE_TLV_DB_SCALE(dac_tlv, -7, 100, 0);
++static DECLARE_TLV_DB_SCALE(dac_tlv, -700, 100, 0);
+ 
+ static const char * const tas2552_din_source_select[] = {
+ 	"Muted",
+diff --git a/sound/soc/dwc/designware_i2s.c b/sound/soc/dwc/designware_i2s.c
+index a3e97b46b64e..0d28e3b356f6 100644
+--- a/sound/soc/dwc/designware_i2s.c
++++ b/sound/soc/dwc/designware_i2s.c
+@@ -131,10 +131,10 @@ static inline void i2s_clear_irqs(struct dw_i2s_dev *dev, u32 stream)
+ 
+ 	if (stream == SNDRV_PCM_STREAM_PLAYBACK) {
+ 		for (i = 0; i < 4; i++)
+-			i2s_write_reg(dev->i2s_base, TOR(i), 0);
++			i2s_read_reg(dev->i2s_base, TOR(i));
+ 	} else {
+ 		for (i = 0; i < 4; i++)
+-			i2s_write_reg(dev->i2s_base, ROR(i), 0);
++			i2s_read_reg(dev->i2s_base, ROR(i));
+ 	}
+ }
+ 
+diff --git a/sound/soc/pxa/Kconfig b/sound/soc/pxa/Kconfig
+index 39cea80846c3..f2bf8661dd21 100644
+--- a/sound/soc/pxa/Kconfig
++++ b/sound/soc/pxa/Kconfig
+@@ -1,7 +1,6 @@
+ config SND_PXA2XX_SOC
+ 	tristate "SoC Audio for the Intel PXA2xx chip"
+ 	depends on ARCH_PXA
+-	select SND_ARM
+ 	select SND_PXA2XX_LIB
+ 	help
+ 	  Say Y or M if you want to add support for codecs attached to
+@@ -25,7 +24,6 @@ config SND_PXA2XX_AC97
+ config SND_PXA2XX_SOC_AC97
+ 	tristate
+ 	select AC97_BUS
+-	select SND_ARM
+ 	select SND_PXA2XX_LIB_AC97
+ 	select SND_SOC_AC97_BUS
+ 
+diff --git a/sound/soc/pxa/pxa2xx-ac97.c b/sound/soc/pxa/pxa2xx-ac97.c
+index 1f6054650991..9e4b04e0fbd1 100644
+--- a/sound/soc/pxa/pxa2xx-ac97.c
++++ b/sound/soc/pxa/pxa2xx-ac97.c
+@@ -49,7 +49,7 @@ static struct snd_ac97_bus_ops pxa2xx_ac97_ops = {
+ 	.reset	= pxa2xx_ac97_cold_reset,
+ };
+ 
+-static unsigned long pxa2xx_ac97_pcm_stereo_in_req = 12;
++static unsigned long pxa2xx_ac97_pcm_stereo_in_req = 11;
+ static struct snd_dmaengine_dai_dma_data pxa2xx_ac97_pcm_stereo_in = {
+ 	.addr		= __PREG(PCDR),
+ 	.addr_width	= DMA_SLAVE_BUSWIDTH_4_BYTES,
+@@ -57,7 +57,7 @@ static struct snd_dmaengine_dai_dma_data pxa2xx_ac97_pcm_stereo_in = {
+ 	.filter_data	= &pxa2xx_ac97_pcm_stereo_in_req,
+ };
+ 
+-static unsigned long pxa2xx_ac97_pcm_stereo_out_req = 11;
++static unsigned long pxa2xx_ac97_pcm_stereo_out_req = 12;
+ static struct snd_dmaengine_dai_dma_data pxa2xx_ac97_pcm_stereo_out = {
+ 	.addr		= __PREG(PCDR),
+ 	.addr_width	= DMA_SLAVE_BUSWIDTH_4_BYTES,
+diff --git a/sound/synth/emux/emux_oss.c b/sound/synth/emux/emux_oss.c
+index 82e350e9501c..ac75816ada7c 100644
+--- a/sound/synth/emux/emux_oss.c
++++ b/sound/synth/emux/emux_oss.c
+@@ -69,7 +69,8 @@ snd_emux_init_seq_oss(struct snd_emux *emu)
+ 	struct snd_seq_oss_reg *arg;
+ 	struct snd_seq_device *dev;
+ 
+-	if (snd_seq_device_new(emu->card, 0, SNDRV_SEQ_DEV_ID_OSS,
++	/* using device#1 here for avoiding conflicts with OPL3 */
++	if (snd_seq_device_new(emu->card, 1, SNDRV_SEQ_DEV_ID_OSS,
+ 			       sizeof(struct snd_seq_oss_reg), &dev) < 0)
+ 		return;
+ 
+diff --git a/tools/lguest/lguest.c b/tools/lguest/lguest.c
+index e44052483ed9..80159e6811c2 100644
+--- a/tools/lguest/lguest.c
++++ b/tools/lguest/lguest.c
+@@ -125,7 +125,11 @@ struct device_list {
+ /* The list of Guest devices, based on command line arguments. */
+ static struct device_list devices;
+ 
+-struct virtio_pci_cfg_cap {
++/*
++ * Just like struct virtio_pci_cfg_cap in uapi/linux/virtio_pci.h,
++ * but uses a u32 explicitly for the data.
++ */
++struct virtio_pci_cfg_cap_u32 {
+ 	struct virtio_pci_cap cap;
+ 	u32 pci_cfg_data; /* Data for BAR access. */
+ };
+@@ -157,7 +161,7 @@ struct pci_config {
+ 	struct virtio_pci_notify_cap notify;
+ 	struct virtio_pci_cap isr;
+ 	struct virtio_pci_cap device;
+-	struct virtio_pci_cfg_cap cfg_access;
++	struct virtio_pci_cfg_cap_u32 cfg_access;
+ };
+ 
+ /* The device structure describes a single device. */
+@@ -1291,7 +1295,7 @@ static struct device *dev_and_reg(u32 *reg)
+  * only fault if they try to write with some invalid bar/offset/length.
+  */
+ static bool valid_bar_access(struct device *d,
+-			     struct virtio_pci_cfg_cap *cfg_access)
++			     struct virtio_pci_cfg_cap_u32 *cfg_access)
+ {
+ 	/* We only have 1 bar (BAR0) */
+ 	if (cfg_access->cap.bar != 0)
+diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
+index cc25f059ab3d..a843bee66a4f 100644
+--- a/tools/lib/traceevent/event-parse.c
++++ b/tools/lib/traceevent/event-parse.c
+@@ -3721,7 +3721,7 @@ static void print_str_arg(struct trace_seq *s, void *data, int size,
+ 	struct format_field *field;
+ 	struct printk_map *printk;
+ 	long long val, fval;
+-	unsigned long addr;
++	unsigned long long addr;
+ 	char *str;
+ 	unsigned char *hex;
+ 	int print;
+@@ -3754,13 +3754,30 @@ static void print_str_arg(struct trace_seq *s, void *data, int size,
+ 		 */
+ 		if (!(field->flags & FIELD_IS_ARRAY) &&
+ 		    field->size == pevent->long_size) {
+-			addr = *(unsigned long *)(data + field->offset);
++
++			/* Handle heterogeneous recording and processing
++			 * architectures
++			 *
++			 * CASE I:
++			 * Traces recorded on 32-bit devices (32-bit
++			 * addressing) and processed on 64-bit devices:
++			 * In this case, only 32 bits should be read.
++			 *
++			 * CASE II:
++			 * Traces recorded on 64 bit devices and processed
++			 * on 32-bit devices:
++			 * In this case, 64 bits must be read.
++			 */
++			addr = (pevent->long_size == 8) ?
++				*(unsigned long long *)(data + field->offset) :
++				(unsigned long long)*(unsigned int *)(data + field->offset);
++
+ 			/* Check if it matches a print format */
+ 			printk = find_printk(pevent, addr);
+ 			if (printk)
+ 				trace_seq_puts(s, printk->printk);
+ 			else
+-				trace_seq_printf(s, "%lx", addr);
++				trace_seq_printf(s, "%llx", addr);
+ 			break;
+ 		}
+ 		str = malloc(len + 1);
+diff --git a/tools/perf/arch/alpha/Build b/tools/perf/arch/alpha/Build
+new file mode 100644
+index 000000000000..1bb8bf6d7fd4
+--- /dev/null
++++ b/tools/perf/arch/alpha/Build
+@@ -0,0 +1 @@
++# empty
+diff --git a/tools/perf/arch/mips/Build b/tools/perf/arch/mips/Build
+new file mode 100644
+index 000000000000..1bb8bf6d7fd4
+--- /dev/null
++++ b/tools/perf/arch/mips/Build
+@@ -0,0 +1 @@
++# empty
+diff --git a/tools/perf/arch/parisc/Build b/tools/perf/arch/parisc/Build
+new file mode 100644
+index 000000000000..1bb8bf6d7fd4
+--- /dev/null
++++ b/tools/perf/arch/parisc/Build
+@@ -0,0 +1 @@
++# empty
+diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
+index d99d850e1444..ef355fc0e870 100644
+--- a/tools/perf/builtin-stat.c
++++ b/tools/perf/builtin-stat.c
+@@ -694,7 +694,7 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
+ static void print_aggr(char *prefix)
+ {
+ 	struct perf_evsel *counter;
+-	int cpu, cpu2, s, s2, id, nr;
++	int cpu, s, s2, id, nr;
+ 	double uval;
+ 	u64 ena, run, val;
+ 
+@@ -707,8 +707,7 @@ static void print_aggr(char *prefix)
+ 			val = ena = run = 0;
+ 			nr = 0;
+ 			for (cpu = 0; cpu < perf_evsel__nr_cpus(counter); cpu++) {
+-				cpu2 = perf_evsel__cpus(counter)->map[cpu];
+-				s2 = aggr_get_id(evsel_list->cpus, cpu2);
++				s2 = aggr_get_id(perf_evsel__cpus(counter), cpu);
+ 				if (s2 != id)
+ 					continue;
+ 				val += perf_counts(counter->counts, cpu, 0)->val;
+diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
+index 03ace57a800c..4215cc155041 100644
+--- a/tools/perf/util/header.c
++++ b/tools/perf/util/header.c
+@@ -1442,7 +1442,7 @@ static int process_nrcpus(struct perf_file_section *section __maybe_unused,
+ 	if (ph->needs_swap)
+ 		nr = bswap_32(nr);
+ 
+-	ph->env.nr_cpus_online = nr;
++	ph->env.nr_cpus_avail = nr;
+ 
+ 	ret = readn(fd, &nr, sizeof(nr));
+ 	if (ret != sizeof(nr))
+@@ -1451,7 +1451,7 @@ static int process_nrcpus(struct perf_file_section *section __maybe_unused,
+ 	if (ph->needs_swap)
+ 		nr = bswap_32(nr);
+ 
+-	ph->env.nr_cpus_avail = nr;
++	ph->env.nr_cpus_online = nr;
+ 	return 0;
+ }
+ 
+diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
+index 6f28d53d4e46..f298c696e24f 100644
+--- a/tools/perf/util/hist.c
++++ b/tools/perf/util/hist.c
+@@ -151,6 +151,9 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
+ 	hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12);
+ 	hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12);
+ 
++	if (h->srcline)
++		hists__new_col_len(hists, HISTC_SRCLINE, strlen(h->srcline));
++
+ 	if (h->transaction)
+ 		hists__new_col_len(hists, HISTC_TRANSACTION,
+ 				   hist_entry__transaction_len());
+diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
+index 591905a02b92..9cd70819c795 100644
+--- a/tools/perf/util/parse-events.y
++++ b/tools/perf/util/parse-events.y
+@@ -255,7 +255,7 @@ PE_PMU_EVENT_PRE '-' PE_PMU_EVENT_SUF sep_dc
+ 	list_add_tail(&term->list, head);
+ 
+ 	ALLOC_LIST(list);
+-	ABORT_ON(parse_events_add_pmu(list, &data->idx, "cpu", head));
++	ABORT_ON(parse_events_add_pmu(data, list, "cpu", head));
+ 	parse_events__free_terms(head);
+ 	$$ = list;
+ }
+diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
+index 381f23a443c7..ae6351db6de4 100644
+--- a/tools/perf/util/probe-event.c
++++ b/tools/perf/util/probe-event.c
+@@ -274,12 +274,13 @@ static int kernel_get_module_dso(const char *module, struct dso **pdso)
+ 	int ret = 0;
+ 
+ 	if (module) {
+-		list_for_each_entry(dso, &host_machine->dsos.head, node) {
+-			if (!dso->kernel)
+-				continue;
+-			if (strncmp(dso->short_name + 1, module,
+-				    dso->short_name_len - 2) == 0)
+-				goto found;
++		char module_name[128];
++
++		snprintf(module_name, sizeof(module_name), "[%s]", module);
++		map = map_groups__find_by_name(&host_machine->kmaps, MAP__FUNCTION, module_name);
++		if (map) {
++			dso = map->dso;
++			goto found;
+ 		}
+ 		pr_debug("Failed to find module %s.\n", module);
+ 		return -ENOENT;
+diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
+index 31db6ee7db54..cd55c6db421d 100644
+--- a/tools/perf/util/probe-event.h
++++ b/tools/perf/util/probe-event.h
+@@ -106,6 +106,8 @@ struct variable_list {
+ 	struct strlist			*vars;	/* Available variables */
+ };
+ 
++struct map;
++
+ /* Command string to events */
+ extern int parse_perf_probe_command(const char *cmd,
+ 				    struct perf_probe_event *pev);
+diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
+index 65f7e389ae09..333858821ab0 100644
+--- a/tools/perf/util/symbol-elf.c
++++ b/tools/perf/util/symbol-elf.c
+@@ -1260,8 +1260,6 @@ out_close:
+ static int kcore__init(struct kcore *kcore, char *filename, int elfclass,
+ 		       bool temp)
+ {
+-	GElf_Ehdr *ehdr;
+-
+ 	kcore->elfclass = elfclass;
+ 
+ 	if (temp)
+@@ -1278,9 +1276,7 @@ static int kcore__init(struct kcore *kcore, char *filename, int elfclass,
+ 	if (!gelf_newehdr(kcore->elf, elfclass))
+ 		goto out_end;
+ 
+-	ehdr = gelf_getehdr(kcore->elf, &kcore->ehdr);
+-	if (!ehdr)
+-		goto out_end;
++	memset(&kcore->ehdr, 0, sizeof(GElf_Ehdr));
+ 
+ 	return 0;
+ 
+@@ -1337,23 +1333,18 @@ static int kcore__copy_hdr(struct kcore *from, struct kcore *to, size_t count)
+ static int kcore__add_phdr(struct kcore *kcore, int idx, off_t offset,
+ 			   u64 addr, u64 len)
+ {
+-	GElf_Phdr gphdr;
+-	GElf_Phdr *phdr;
+-
+-	phdr = gelf_getphdr(kcore->elf, idx, &gphdr);
+-	if (!phdr)
+-		return -1;
+-
+-	phdr->p_type	= PT_LOAD;
+-	phdr->p_flags	= PF_R | PF_W | PF_X;
+-	phdr->p_offset	= offset;
+-	phdr->p_vaddr	= addr;
+-	phdr->p_paddr	= 0;
+-	phdr->p_filesz	= len;
+-	phdr->p_memsz	= len;
+-	phdr->p_align	= page_size;
+-
+-	if (!gelf_update_phdr(kcore->elf, idx, phdr))
++	GElf_Phdr phdr = {
++		.p_type		= PT_LOAD,
++		.p_flags	= PF_R | PF_W | PF_X,
++		.p_offset	= offset,
++		.p_vaddr	= addr,
++		.p_paddr	= 0,
++		.p_filesz	= len,
++		.p_memsz	= len,
++		.p_align	= page_size,
++	};
++
++	if (!gelf_update_phdr(kcore->elf, idx, &phdr))
+ 		return -1;
+ 
+ 	return 0;
+diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
+index 9ff4193dfa49..79db45336e3a 100644
+--- a/virt/kvm/eventfd.c
++++ b/virt/kvm/eventfd.c
+@@ -771,40 +771,14 @@ static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
+ 	return KVM_MMIO_BUS;
+ }
+ 
+-static int
+-kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
++static int kvm_assign_ioeventfd_idx(struct kvm *kvm,
++				enum kvm_bus bus_idx,
++				struct kvm_ioeventfd *args)
+ {
+-	enum kvm_bus              bus_idx;
+-	struct _ioeventfd        *p;
+-	struct eventfd_ctx       *eventfd;
+-	int                       ret;
+-
+-	bus_idx = ioeventfd_bus_from_flags(args->flags);
+-	/* must be natural-word sized, or 0 to ignore length */
+-	switch (args->len) {
+-	case 0:
+-	case 1:
+-	case 2:
+-	case 4:
+-	case 8:
+-		break;
+-	default:
+-		return -EINVAL;
+-	}
+-
+-	/* check for range overflow */
+-	if (args->addr + args->len < args->addr)
+-		return -EINVAL;
+ 
+-	/* check for extra flags that we don't understand */
+-	if (args->flags & ~KVM_IOEVENTFD_VALID_FLAG_MASK)
+-		return -EINVAL;
+-
+-	/* ioeventfd with no length can't be combined with DATAMATCH */
+-	if (!args->len &&
+-	    args->flags & (KVM_IOEVENTFD_FLAG_PIO |
+-			   KVM_IOEVENTFD_FLAG_DATAMATCH))
+-		return -EINVAL;
++	struct eventfd_ctx *eventfd;
++	struct _ioeventfd *p;
++	int ret;
+ 
+ 	eventfd = eventfd_ctx_fdget(args->fd);
+ 	if (IS_ERR(eventfd))
+@@ -843,16 +817,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
+ 	if (ret < 0)
+ 		goto unlock_fail;
+ 
+-	/* When length is ignored, MMIO is also put on a separate bus, for
+-	 * faster lookups.
+-	 */
+-	if (!args->len && !(args->flags & KVM_IOEVENTFD_FLAG_PIO)) {
+-		ret = kvm_io_bus_register_dev(kvm, KVM_FAST_MMIO_BUS,
+-					      p->addr, 0, &p->dev);
+-		if (ret < 0)
+-			goto register_fail;
+-	}
+-
+ 	kvm->buses[bus_idx]->ioeventfd_count++;
+ 	list_add_tail(&p->list, &kvm->ioeventfds);
+ 
+@@ -860,8 +824,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
+ 
+ 	return 0;
+ 
+-register_fail:
+-	kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
+ unlock_fail:
+ 	mutex_unlock(&kvm->slots_lock);
+ 
+@@ -873,14 +835,13 @@ fail:
+ }
+ 
+ static int
+-kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
++kvm_deassign_ioeventfd_idx(struct kvm *kvm, enum kvm_bus bus_idx,
++			   struct kvm_ioeventfd *args)
+ {
+-	enum kvm_bus              bus_idx;
+ 	struct _ioeventfd        *p, *tmp;
+ 	struct eventfd_ctx       *eventfd;
+ 	int                       ret = -ENOENT;
+ 
+-	bus_idx = ioeventfd_bus_from_flags(args->flags);
+ 	eventfd = eventfd_ctx_fdget(args->fd);
+ 	if (IS_ERR(eventfd))
+ 		return PTR_ERR(eventfd);
+@@ -901,10 +862,6 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
+ 			continue;
+ 
+ 		kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
+-		if (!p->length) {
+-			kvm_io_bus_unregister_dev(kvm, KVM_FAST_MMIO_BUS,
+-						  &p->dev);
+-		}
+ 		kvm->buses[bus_idx]->ioeventfd_count--;
+ 		ioeventfd_release(p);
+ 		ret = 0;
+@@ -918,6 +875,71 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
+ 	return ret;
+ }
+ 
++static int kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
++{
++	enum kvm_bus bus_idx = ioeventfd_bus_from_flags(args->flags);
++	int ret = kvm_deassign_ioeventfd_idx(kvm, bus_idx, args);
++
++	if (!args->len && bus_idx == KVM_MMIO_BUS)
++		kvm_deassign_ioeventfd_idx(kvm, KVM_FAST_MMIO_BUS, args);
++
++	return ret;
++}
++
++static int
++kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
++{
++	enum kvm_bus              bus_idx;
++	int ret;
++
++	bus_idx = ioeventfd_bus_from_flags(args->flags);
++	/* must be natural-word sized, or 0 to ignore length */
++	switch (args->len) {
++	case 0:
++	case 1:
++	case 2:
++	case 4:
++	case 8:
++		break;
++	default:
++		return -EINVAL;
++	}
++
++	/* check for range overflow */
++	if (args->addr + args->len < args->addr)
++		return -EINVAL;
++
++	/* check for extra flags that we don't understand */
++	if (args->flags & ~KVM_IOEVENTFD_VALID_FLAG_MASK)
++		return -EINVAL;
++
++	/* ioeventfd with no length can't be combined with DATAMATCH */
++	if (!args->len &&
++	    args->flags & (KVM_IOEVENTFD_FLAG_PIO |
++			   KVM_IOEVENTFD_FLAG_DATAMATCH))
++		return -EINVAL;
++
++	ret = kvm_assign_ioeventfd_idx(kvm, bus_idx, args);
++	if (ret)
++		goto fail;
++
++	/* When length is ignored, MMIO is also put on a separate bus, for
++	 * faster lookups.
++	 */
++	if (!args->len && bus_idx == KVM_MMIO_BUS) {
++		ret = kvm_assign_ioeventfd_idx(kvm, KVM_FAST_MMIO_BUS, args);
++		if (ret < 0)
++			goto fast_fail;
++	}
++
++	return 0;
++
++fast_fail:
++	kvm_deassign_ioeventfd_idx(kvm, bus_idx, args);
++fail:
++	return ret;
++}
++
+ int
+ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
+ {
+diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
+index 8b8a44453670..5a2a78a91d58 100644
+--- a/virt/kvm/kvm_main.c
++++ b/virt/kvm/kvm_main.c
+@@ -3080,10 +3080,25 @@ static void kvm_io_bus_destroy(struct kvm_io_bus *bus)
+ static inline int kvm_io_bus_cmp(const struct kvm_io_range *r1,
+ 				 const struct kvm_io_range *r2)
+ {
+-	if (r1->addr < r2->addr)
++	gpa_t addr1 = r1->addr;
++	gpa_t addr2 = r2->addr;
++
++	if (addr1 < addr2)
+ 		return -1;
+-	if (r1->addr + r1->len > r2->addr + r2->len)
++
++	/* If r2->len == 0, match the exact address.  If r2->len != 0,
++	 * accept any overlapping write.  Any order is acceptable for
++	 * overlapping ranges, because kvm_io_bus_get_first_dev ensures
++	 * we process all of them.
++	 */
++	if (r2->len) {
++		addr1 += r1->len;
++		addr2 += r2->len;
++	}
++
++	if (addr1 > addr2)
+ 		return 1;
++
+ 	return 0;
+ }
+ 
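
Note on the virt/kvm/kvm_main.c hunk above: the comparator change is easier to follow outside the kernel. The following is a minimal user-space sketch, not the kernel code itself. The names `struct io_range`, `io_range_cmp_old` and `io_range_cmp_new` are hypothetical stand-ins for `struct kvm_io_range` and the before/after versions of `kvm_io_bus_cmp()`, and it assumes `r1` is the incoming access while `r2` is a registered range, which is how a bsearch-style lookup would call it.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct kvm_io_range: only the compared fields. */
struct io_range {
	uint64_t addr;
	int len;
};

/* Pre-patch logic: always compares end addresses, even when r2->len == 0. */
static int io_range_cmp_old(const struct io_range *r1,
			    const struct io_range *r2)
{
	if (r1->addr < r2->addr)
		return -1;
	if (r1->addr + r1->len > r2->addr + r2->len)
		return 1;
	return 0;
}

/* Post-patch logic: a zero-length r2 matches on the start address alone. */
static int io_range_cmp_new(const struct io_range *r1,
			    const struct io_range *r2)
{
	uint64_t addr1 = r1->addr;
	uint64_t addr2 = r2->addr;

	if (addr1 < addr2)
		return -1;

	/* Only compare end addresses when the registered range has a length. */
	if (r2->len) {
		addr1 += r1->len;
		addr2 += r2->len;
	}

	if (addr1 > addr2)
		return 1;

	return 0;
}
```

With a registered range of `{0x1000, 0}` and a 4-byte access at `0x1000`, the old comparator returns 1 (no match) because it compares `0x1004` against `0x1000`, while the new one returns 0. For a registered range with a non-zero length the two versions agree.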



* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-10-23 17:19 Mike Pagano
From: Mike Pagano @ 2015-10-23 17:19 UTC
  To: gentoo-commits

commit:     6bc02433d40973c69bd8f87e1f849c63dc01a3c4
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Fri Oct 23 17:19:17 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Fri Oct 23 17:19:17 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=6bc02433

Remove redundant patch.

 0000_README                                |  4 --
 1600_dm-crypt-limit-max-segment-size.patch | 84 ------------------------------
 2 files changed, 88 deletions(-)

diff --git a/0000_README b/0000_README
index 2a467c2..daafdd3 100644
--- a/0000_README
+++ b/0000_README
@@ -67,10 +67,6 @@ Patch:  1510_fs-enable-link-security-restrictions-by-default.patch
 From:   http://sources.debian.net/src/linux/3.16.7-ckt4-3/debian/patches/debian/fs-enable-link-security-restrictions-by-default.patch/
 Desc:   Enable link security restrictions by default.
 
-Patch:  1600_dm-crypt-limit-max-segment-size.patch
-From:   https://bugzilla.kernel.org/show_bug.cgi?id=104421
-Desc:   dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE.
-
 Patch:  2700_ThinkPad-30-brightness-control-fix.patch
 From:   Seth Forshee <seth.forshee@canonical.com>
 Desc:   ACPI: Disable Windows 8 compatibility for some Lenovo ThinkPads.

diff --git a/1600_dm-crypt-limit-max-segment-size.patch b/1600_dm-crypt-limit-max-segment-size.patch
deleted file mode 100644
index 82aca44..0000000
--- a/1600_dm-crypt-limit-max-segment-size.patch
+++ /dev/null
@@ -1,84 +0,0 @@
-From 586b286b110e94eb31840ac5afc0c24e0881fe34 Mon Sep 17 00:00:00 2001
-From: Mike Snitzer <snitzer@redhat.com>
-Date: Wed, 9 Sep 2015 21:34:51 -0400
-Subject: dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE
-
-Setting the dm-crypt device's max_segment_size to PAGE_SIZE is an
-unfortunate constraint that is required to avoid the potential for
-exceeding dm-crypt's underlying device's max_segments limits -- due to
-crypt_alloc_buffer() possibly allocating pages for the encryption bio
-that are not as physically contiguous as the original bio.
-
-It is interesting to note that this problem was already fixed back in
-2007 via commit 91e106259 ("dm crypt: use bio_add_page").  But Linux 4.0
-commit cf2f1abfb ("dm crypt: don't allocate pages for a partial
-request") regressed dm-crypt back to _not_ using bio_add_page().  But
-given dm-crypt's cpu parallelization changes all depend on commit
-cf2f1abfb's abandoning of the more complex io fragments processing that
-dm-crypt previously had we cannot easily go back to using
-bio_add_page().
-
-So all said the cleanest way to resolve this issue is to fix dm-crypt to
-properly constrain the original bios entering dm-crypt so the encryption
-bios that dm-crypt generates from the original bios are always
-compatible with the underlying device's max_segments queue limits.
-
-It should be noted that technically Linux 4.3 does _not_ need this fix
-because of the block core's new late bio-splitting capability.  But, it
-is reasoned, there is little to be gained by having the block core split
-the encrypted bio that is composed of PAGE_SIZE segments.  That said, in
-the future we may revert this change.
-
-Fixes: cf2f1abfb ("dm crypt: don't allocate pages for a partial request")
-Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=104421
-Suggested-by: Jeff Moyer <jmoyer@redhat.com>
-Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-Cc: stable@vger.kernel.org # 4.0+
-
-diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
-index d60c88d..4b3b6f8 100644
---- a/drivers/md/dm-crypt.c
-+++ b/drivers/md/dm-crypt.c
-@@ -968,7 +968,8 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone);
- 
- /*
-  * Generate a new unfragmented bio with the given size
-- * This should never violate the device limitations
-+ * This should never violate the device limitations (but only because
-+ * max_segment_size is being constrained to PAGE_SIZE).
-  *
-  * This function may be called concurrently. If we allocate from the mempool
-  * concurrently, there is a possibility of deadlock. For example, if we have
-@@ -2045,9 +2046,20 @@ static int crypt_iterate_devices(struct dm_target *ti,
- 	return fn(ti, cc->dev, cc->start, ti->len, data);
- }
- 
-+static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
-+{
-+	/*
-+	 * Unfortunate constraint that is required to avoid the potential
-+	 * for exceeding underlying device's max_segments limits -- due to
-+	 * crypt_alloc_buffer() possibly allocating pages for the encryption
-+	 * bio that are not as physically contiguous as the original bio.
-+	 */
-+	limits->max_segment_size = PAGE_SIZE;
-+}
-+
- static struct target_type crypt_target = {
- 	.name   = "crypt",
--	.version = {1, 14, 0},
-+	.version = {1, 14, 1},
- 	.module = THIS_MODULE,
- 	.ctr    = crypt_ctr,
- 	.dtr    = crypt_dtr,
-@@ -2058,6 +2070,7 @@ static struct target_type crypt_target = {
- 	.resume = crypt_resume,
- 	.message = crypt_message,
- 	.iterate_devices = crypt_iterate_devices,
-+	.io_hints = crypt_io_hints,
- };
- 
- static int __init dm_crypt_init(void)
--- 
-cgit v0.10.2
-
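
Note on the removed dm-crypt patch description above: its argument is that an encryption bio built from freshly allocated pages can need more segments than the original bio, unless the original is already limited to one page per segment. The following is a rough user-space model of that counting argument only. It is not the block layer's merging code; `count_segments`, `SKETCH_PAGE_SIZE` and the page frame numbers used with it are hypothetical.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/*
 * Count the segments needed for a run of pages identified by page frame
 * number. Physically adjacent pages share a segment until adding another
 * page would exceed max_segment_size bytes.
 */
static unsigned int count_segments(const unsigned long *pfn, size_t npages,
				   unsigned int max_segment_size)
{
	unsigned int segments = 0;
	unsigned int seg_bytes = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		int contiguous = i > 0 && pfn[i] == pfn[i - 1] + 1;

		if (contiguous &&
		    seg_bytes + SKETCH_PAGE_SIZE <= max_segment_size) {
			/* Page extends the current segment. */
			seg_bytes += SKETCH_PAGE_SIZE;
		} else {
			/* Page starts a new segment. */
			segments++;
			seg_bytes = SKETCH_PAGE_SIZE;
		}
	}

	return segments;
}
```

In this model, four contiguous pages with a 64 KiB segment limit count as one segment, while four scattered pages count as four, so the clone can exceed a limit the original respected. With the limit set to `SKETCH_PAGE_SIZE`, both layouts count as four segments, so the count seen when the original bio enters is already an upper bound for the clone.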



* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-10-27 13:36 Mike Pagano
From: Mike Pagano @ 2015-10-27 13:36 UTC
  To: gentoo-commits

commit:     b00da6f810d31f1fb924713c20c3f3b103f03228
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Tue Oct 27 13:36:07 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Tue Oct 27 13:36:07 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=b00da6f8

Linux patch 4.2.5

 0000_README            |    4 +
 1004_linux-4.2.5.patch | 1945 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1949 insertions(+)

diff --git a/0000_README b/0000_README
index daafdd3..d40ecf2 100644
--- a/0000_README
+++ b/0000_README
@@ -59,6 +59,10 @@ Patch:  1003_linux-4.2.4.patch
 From:   http://www.kernel.org
 Desc:   Linux 4.2.4
 
+Patch:  1004_linux-4.2.5.patch
+From:   http://www.kernel.org
+Desc:   Linux 4.2.5
+
 Patch:  1500_XATTR_USER_PREFIX.patch
 From:   https://bugs.gentoo.org/show_bug.cgi?id=470644
 Desc:   Support for namespace user.pax.* on tmpfs.

diff --git a/1004_linux-4.2.5.patch b/1004_linux-4.2.5.patch
new file mode 100644
index 0000000..b866faf
--- /dev/null
+++ b/1004_linux-4.2.5.patch
@@ -0,0 +1,1945 @@
+diff --git a/Makefile b/Makefile
+index a952801a6cd5..96076dcad18e 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 4
++SUBLEVEL = 5
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+ 
+diff --git a/arch/arm/mach-ux500/Makefile b/arch/arm/mach-ux500/Makefile
+index 4418a5078833..c8643ac5db71 100644
+--- a/arch/arm/mach-ux500/Makefile
++++ b/arch/arm/mach-ux500/Makefile
+@@ -7,7 +7,7 @@ obj-$(CONFIG_CACHE_L2X0)	+= cache-l2x0.o
+ obj-$(CONFIG_UX500_SOC_DB8500)	+= cpu-db8500.o
+ obj-$(CONFIG_MACH_MOP500)	+= board-mop500-regulators.o \
+ 				board-mop500-audio.o
+-obj-$(CONFIG_SMP)		+= platsmp.o headsmp.o
++obj-$(CONFIG_SMP)		+= platsmp.o
+ obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug.o
+ obj-$(CONFIG_PM_GENERIC_DOMAINS) += pm_domains.o
+ 
+diff --git a/arch/arm/mach-ux500/cpu-db8500.c b/arch/arm/mach-ux500/cpu-db8500.c
+index 16913800bbf9..ba708ce08616 100644
+--- a/arch/arm/mach-ux500/cpu-db8500.c
++++ b/arch/arm/mach-ux500/cpu-db8500.c
+@@ -154,7 +154,6 @@ static const char * stericsson_dt_platform_compat[] = {
+ };
+ 
+ DT_MACHINE_START(U8500_DT, "ST-Ericsson Ux5x0 platform (Device Tree Support)")
+-	.smp            = smp_ops(ux500_smp_ops),
+ 	.map_io		= u8500_map_io,
+ 	.init_irq	= ux500_init_irq,
+ 	/* we re-use nomadik timer here */
+diff --git a/arch/arm/mach-ux500/headsmp.S b/arch/arm/mach-ux500/headsmp.S
+deleted file mode 100644
+index 9cdea049485d..000000000000
+--- a/arch/arm/mach-ux500/headsmp.S
++++ /dev/null
+@@ -1,37 +0,0 @@
+-/*
+- *  Copyright (c) 2009 ST-Ericsson
+- *	This file is based  ARM Realview platform
+- *  Copyright (c) 2003 ARM Limited
+- *  All Rights Reserved
+- *
+- * This program is free software; you can redistribute it and/or modify
+- * it under the terms of the GNU General Public License version 2 as
+- * published by the Free Software Foundation.
+- */
+-#include <linux/linkage.h>
+-#include <linux/init.h>
+-
+-/*
+- * U8500 specific entry point for secondary CPUs.
+- */
+-ENTRY(u8500_secondary_startup)
+-	mrc	p15, 0, r0, c0, c0, 5
+-	and	r0, r0, #15
+-	adr	r4, 1f
+-	ldmia	r4, {r5, r6}
+-	sub	r4, r4, r5
+-	add	r6, r6, r4
+-pen:	ldr	r7, [r6]
+-	cmp	r7, r0
+-	bne	pen
+-
+-	/*
+-	 * we've been released from the holding pen: secondary_stack
+-	 * should now contain the SVC stack for this core
+-	 */
+-	b	secondary_startup
+-ENDPROC(u8500_secondary_startup)
+-
+-	.align 2
+-1:	.long	.
+-	.long	pen_release
+diff --git a/arch/arm/mach-ux500/platsmp.c b/arch/arm/mach-ux500/platsmp.c
+index 62b1de922bd8..70766b963758 100644
+--- a/arch/arm/mach-ux500/platsmp.c
++++ b/arch/arm/mach-ux500/platsmp.c
+@@ -28,135 +28,81 @@
+ #include "db8500-regs.h"
+ #include "id.h"
+ 
+-static void __iomem *scu_base;
+-static void __iomem *backupram;
+-
+-/* This is called from headsmp.S to wakeup the secondary core */
+-extern void u8500_secondary_startup(void);
+-
+-/*
+- * Write pen_release in a way that is guaranteed to be visible to all
+- * observers, irrespective of whether they're taking part in coherency
+- * or not.  This is necessary for the hotplug code to work reliably.
+- */
+-static void write_pen_release(int val)
+-{
+-	pen_release = val;
+-	smp_wmb();
+-	sync_cache_w(&pen_release);
+-}
+-
+-static DEFINE_SPINLOCK(boot_lock);
+-
+-static void ux500_secondary_init(unsigned int cpu)
+-{
+-	/*
+-	 * let the primary processor know we're out of the
+-	 * pen, then head off into the C entry point
+-	 */
+-	write_pen_release(-1);
+-
+-	/*
+-	 * Synchronise with the boot thread.
+-	 */
+-	spin_lock(&boot_lock);
+-	spin_unlock(&boot_lock);
+-}
++/* Magic triggers in backup RAM */
++#define UX500_CPU1_JUMPADDR_OFFSET 0x1FF4
++#define UX500_CPU1_WAKEMAGIC_OFFSET 0x1FF0
+ 
+-static int ux500_boot_secondary(unsigned int cpu, struct task_struct *idle)
++static void wakeup_secondary(void)
+ {
+-	unsigned long timeout;
+-
+-	/*
+-	 * set synchronisation state between this boot processor
+-	 * and the secondary one
+-	 */
+-	spin_lock(&boot_lock);
+-
+-	/*
+-	 * The secondary processor is waiting to be released from
+-	 * the holding pen - release it, then wait for it to flag
+-	 * that it has been released by resetting pen_release.
+-	 */
+-	write_pen_release(cpu_logical_map(cpu));
+-
+-	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
++	struct device_node *np;
++	static void __iomem *backupram;
+ 
+-	timeout = jiffies + (1 * HZ);
+-	while (time_before(jiffies, timeout)) {
+-		if (pen_release == -1)
+-			break;
++	np = of_find_compatible_node(NULL, NULL, "ste,dbx500-backupram");
++	if (!np) {
++		pr_err("No backupram base address\n");
++		return;
++	}
++	backupram = of_iomap(np, 0);
++	of_node_put(np);
++	if (!backupram) {
++		pr_err("No backupram remap\n");
++		return;
+ 	}
+ 
+ 	/*
+-	 * now the secondary core is starting up let it run its
+-	 * calibrations, then wait for it to finish
+-	 */
+-	spin_unlock(&boot_lock);
+-
+-	return pen_release != -1 ? -ENOSYS : 0;
+-}
+-
+-static void __init wakeup_secondary(void)
+-{
+-	/*
+ 	 * write the address of secondary startup into the backup ram register
+ 	 * at offset 0x1FF4, then write the magic number 0xA1FEED01 to the
+ 	 * backup ram register at offset 0x1FF0, which is what boot rom code
+-	 * is waiting for. This would wake up the secondary core from WFE
++	 * is waiting for. This will wake up the secondary core from WFE.
+ 	 */
+-#define UX500_CPU1_JUMPADDR_OFFSET 0x1FF4
+-	__raw_writel(virt_to_phys(u8500_secondary_startup),
+-		     backupram + UX500_CPU1_JUMPADDR_OFFSET);
+-
+-#define UX500_CPU1_WAKEMAGIC_OFFSET 0x1FF0
+-	__raw_writel(0xA1FEED01,
+-		     backupram + UX500_CPU1_WAKEMAGIC_OFFSET);
++	writel(virt_to_phys(secondary_startup),
++	       backupram + UX500_CPU1_JUMPADDR_OFFSET);
++	writel(0xA1FEED01,
++	       backupram + UX500_CPU1_WAKEMAGIC_OFFSET);
+ 
+ 	/* make sure write buffer is drained */
+ 	mb();
++	iounmap(backupram);
+ }
+ 
+-/*
+- * Initialise the CPU possible map early - this describes the CPUs
+- * which may be present or become present in the system.
+- */
+-static void __init ux500_smp_init_cpus(void)
++static void __init ux500_smp_prepare_cpus(unsigned int max_cpus)
+ {
+-	unsigned int i, ncores;
+ 	struct device_node *np;
++	static void __iomem *scu_base;
++	unsigned int ncores;
++	int i;
+ 
+ 	np = of_find_compatible_node(NULL, NULL, "arm,cortex-a9-scu");
++	if (!np) {
++		pr_err("No SCU base address\n");
++		return;
++	}
+ 	scu_base = of_iomap(np, 0);
+ 	of_node_put(np);
+-	if (!scu_base)
++	if (!scu_base) {
++		pr_err("No SCU remap\n");
+ 		return;
+-	backupram = ioremap(U8500_BACKUPRAM0_BASE, SZ_8K);
+-	ncores = scu_get_core_count(scu_base);
+-
+-	/* sanity check */
+-	if (ncores > nr_cpu_ids) {
+-		pr_warn("SMP: %u cores greater than maximum (%u), clipping\n",
+-			ncores, nr_cpu_ids);
+-		ncores = nr_cpu_ids;
+ 	}
+ 
++	scu_enable(scu_base);
++	ncores = scu_get_core_count(scu_base);
+ 	for (i = 0; i < ncores; i++)
+ 		set_cpu_possible(i, true);
++	iounmap(scu_base);
+ }
+ 
+-static void __init ux500_smp_prepare_cpus(unsigned int max_cpus)
++static int ux500_boot_secondary(unsigned int cpu, struct task_struct *idle)
+ {
+-	scu_enable(scu_base);
+ 	wakeup_secondary();
++	arch_send_wakeup_ipi_mask(cpumask_of(cpu));
++	return 0;
+ }
+ 
+ struct smp_operations ux500_smp_ops __initdata = {
+-	.smp_init_cpus		= ux500_smp_init_cpus,
+ 	.smp_prepare_cpus	= ux500_smp_prepare_cpus,
+-	.smp_secondary_init	= ux500_secondary_init,
+ 	.smp_boot_secondary	= ux500_boot_secondary,
+ #ifdef CONFIG_HOTPLUG_CPU
+ 	.cpu_die		= ux500_cpu_die,
+ #endif
+ };
++CPU_METHOD_OF_DECLARE(ux500_smp, "ste,dbx500-smp", &ux500_smp_ops);
+diff --git a/arch/arm/mach-ux500/setup.h b/arch/arm/mach-ux500/setup.h
+index 1fb6ad2789f1..65876eac0761 100644
+--- a/arch/arm/mach-ux500/setup.h
++++ b/arch/arm/mach-ux500/setup.h
+@@ -26,7 +26,6 @@ extern struct device *ux500_soc_device_init(const char *soc_id);
+ 
+ extern void ux500_timer_init(void);
+ 
+-extern struct smp_operations ux500_smp_ops;
+ extern void ux500_cpu_die(unsigned int cpu);
+ 
+ #endif /*  __ASM_ARCH_SETUP_H */
+diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
+index 81151663ef38..3258174e6152 100644
+--- a/arch/arm64/Makefile
++++ b/arch/arm64/Makefile
+@@ -31,7 +31,7 @@ endif
+ CHECKFLAGS	+= -D__aarch64__
+ 
+ ifeq ($(CONFIG_ARM64_ERRATUM_843419), y)
+-CFLAGS_MODULE	+= -mcmodel=large
++KBUILD_CFLAGS_MODULE	+= -mcmodel=large
+ endif
+ 
+ # Default value
+diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
+index 56283f8a675c..cf7319422768 100644
+--- a/arch/arm64/include/asm/pgtable.h
++++ b/arch/arm64/include/asm/pgtable.h
+@@ -80,7 +80,7 @@ extern void __pgd_error(const char *file, int line, unsigned long val);
+ #define PAGE_S2			__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY)
+ #define PAGE_S2_DEVICE		__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_UXN)
+ 
+-#define PAGE_NONE		__pgprot(((_PAGE_DEFAULT) & ~PTE_TYPE_MASK) | PTE_PROT_NONE | PTE_PXN | PTE_UXN)
++#define PAGE_NONE		__pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_PXN | PTE_UXN)
+ #define PAGE_SHARED		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
+ #define PAGE_SHARED_EXEC	__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_WRITE)
+ #define PAGE_COPY		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)
+@@ -460,7 +460,7 @@ static inline pud_t *pud_offset(pgd_t *pgd, unsigned long addr)
+ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
+ {
+ 	const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
+-			      PTE_PROT_NONE | PTE_WRITE | PTE_TYPE_MASK;
++			      PTE_PROT_NONE | PTE_VALID | PTE_WRITE;
+ 	pte_val(pte) = (pte_val(pte) & ~mask) | (pgprot_val(newprot) & mask);
+ 	return pte;
+ }
+diff --git a/arch/sparc/crypto/aes_glue.c b/arch/sparc/crypto/aes_glue.c
+index 2e48eb8813ff..c90930de76ba 100644
+--- a/arch/sparc/crypto/aes_glue.c
++++ b/arch/sparc/crypto/aes_glue.c
+@@ -433,6 +433,7 @@ static struct crypto_alg algs[] = { {
+ 		.blkcipher = {
+ 			.min_keysize	= AES_MIN_KEY_SIZE,
+ 			.max_keysize	= AES_MAX_KEY_SIZE,
++			.ivsize		= AES_BLOCK_SIZE,
+ 			.setkey		= aes_set_key,
+ 			.encrypt	= cbc_encrypt,
+ 			.decrypt	= cbc_decrypt,
+@@ -452,6 +453,7 @@ static struct crypto_alg algs[] = { {
+ 		.blkcipher = {
+ 			.min_keysize	= AES_MIN_KEY_SIZE,
+ 			.max_keysize	= AES_MAX_KEY_SIZE,
++			.ivsize		= AES_BLOCK_SIZE,
+ 			.setkey		= aes_set_key,
+ 			.encrypt	= ctr_crypt,
+ 			.decrypt	= ctr_crypt,
+diff --git a/arch/sparc/crypto/camellia_glue.c b/arch/sparc/crypto/camellia_glue.c
+index 6bf2479a12fb..561a84d93cf6 100644
+--- a/arch/sparc/crypto/camellia_glue.c
++++ b/arch/sparc/crypto/camellia_glue.c
+@@ -274,6 +274,7 @@ static struct crypto_alg algs[] = { {
+ 		.blkcipher = {
+ 			.min_keysize	= CAMELLIA_MIN_KEY_SIZE,
+ 			.max_keysize	= CAMELLIA_MAX_KEY_SIZE,
++			.ivsize		= CAMELLIA_BLOCK_SIZE,
+ 			.setkey		= camellia_set_key,
+ 			.encrypt	= cbc_encrypt,
+ 			.decrypt	= cbc_decrypt,
+diff --git a/arch/sparc/crypto/des_glue.c b/arch/sparc/crypto/des_glue.c
+index dd6a34fa6e19..61af794aa2d3 100644
+--- a/arch/sparc/crypto/des_glue.c
++++ b/arch/sparc/crypto/des_glue.c
+@@ -429,6 +429,7 @@ static struct crypto_alg algs[] = { {
+ 		.blkcipher = {
+ 			.min_keysize	= DES_KEY_SIZE,
+ 			.max_keysize	= DES_KEY_SIZE,
++			.ivsize		= DES_BLOCK_SIZE,
+ 			.setkey		= des_set_key,
+ 			.encrypt	= cbc_encrypt,
+ 			.decrypt	= cbc_decrypt,
+@@ -485,6 +486,7 @@ static struct crypto_alg algs[] = { {
+ 		.blkcipher = {
+ 			.min_keysize	= DES3_EDE_KEY_SIZE,
+ 			.max_keysize	= DES3_EDE_KEY_SIZE,
++			.ivsize		= DES3_EDE_BLOCK_SIZE,
+ 			.setkey		= des3_ede_set_key,
+ 			.encrypt	= cbc3_encrypt,
+ 			.decrypt	= cbc3_decrypt,
+diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
+index 80a0e4389c9a..bacaa13acac5 100644
+--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
++++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
+@@ -554,6 +554,11 @@ static int __init camellia_aesni_init(void)
+ {
+ 	const char *feature_name;
+ 
++	if (!cpu_has_avx || !cpu_has_aes || !cpu_has_osxsave) {
++		pr_info("AVX or AES-NI instructions are not detected.\n");
++		return -ENODEV;
++	}
++
+ 	if (!cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, &feature_name)) {
+ 		pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ 		return -ENODEV;
+diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
+index e7a4fde5d631..2392541a96e6 100644
+--- a/arch/x86/kvm/emulate.c
++++ b/arch/x86/kvm/emulate.c
+@@ -2418,7 +2418,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase)
+ 	u64 val, cr0, cr4;
+ 	u32 base3;
+ 	u16 selector;
+-	int i;
++	int i, r;
+ 
+ 	for (i = 0; i < 16; i++)
+ 		*reg_write(ctxt, i) = GET_SMSTATE(u64, smbase, 0x7ff8 - i * 8);
+@@ -2460,13 +2460,17 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase)
+ 	dt.address =                GET_SMSTATE(u64, smbase, 0x7e68);
+ 	ctxt->ops->set_gdt(ctxt, &dt);
+ 
++	r = rsm_enter_protected_mode(ctxt, cr0, cr4);
++	if (r != X86EMUL_CONTINUE)
++		return r;
++
+ 	for (i = 0; i < 6; i++) {
+-		int r = rsm_load_seg_64(ctxt, smbase, i);
++		r = rsm_load_seg_64(ctxt, smbase, i);
+ 		if (r != X86EMUL_CONTINUE)
+ 			return r;
+ 	}
+ 
+-	return rsm_enter_protected_mode(ctxt, cr0, cr4);
++	return X86EMUL_CONTINUE;
+ }
+ 
+ static int em_rsm(struct x86_emulate_ctxt *ctxt)
+diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
+index 32c6e6ac5964..373328b71599 100644
+--- a/arch/x86/kvm/x86.c
++++ b/arch/x86/kvm/x86.c
+@@ -6706,6 +6706,12 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
+ 	return 1;
+ }
+ 
++static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
++{
++	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
++		!vcpu->arch.apf.halted);
++}
++
+ static int vcpu_run(struct kvm_vcpu *vcpu)
+ {
+ 	int r;
+@@ -6714,8 +6720,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
+ 	vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
+ 
+ 	for (;;) {
+-		if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
+-		    !vcpu->arch.apf.halted)
++		if (kvm_vcpu_running(vcpu))
+ 			r = vcpu_enter_guest(vcpu);
+ 		else
+ 			r = vcpu_block(kvm, vcpu);
+@@ -8011,19 +8016,36 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+ 	kvm_mmu_invalidate_zap_all_pages(kvm);
+ }
+ 
++static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
++{
++	if (!list_empty_careful(&vcpu->async_pf.done))
++		return true;
++
++	if (kvm_apic_has_events(vcpu))
++		return true;
++
++	if (vcpu->arch.pv.pv_unhalted)
++		return true;
++
++	if (atomic_read(&vcpu->arch.nmi_queued))
++		return true;
++
++	if (test_bit(KVM_REQ_SMI, &vcpu->requests))
++		return true;
++
++	if (kvm_arch_interrupt_allowed(vcpu) &&
++	    kvm_cpu_has_interrupt(vcpu))
++		return true;
++
++	return false;
++}
++
+ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
+ {
+ 	if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
+ 		kvm_x86_ops->check_nested_events(vcpu, false);
+ 
+-	return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
+-		!vcpu->arch.apf.halted)
+-		|| !list_empty_careful(&vcpu->async_pf.done)
+-		|| kvm_apic_has_events(vcpu)
+-		|| vcpu->arch.pv.pv_unhalted
+-		|| atomic_read(&vcpu->arch.nmi_queued) ||
+-		(kvm_arch_interrupt_allowed(vcpu) &&
+-		 kvm_cpu_has_interrupt(vcpu));
++	return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
+ }
+ 
+ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
+diff --git a/crypto/ahash.c b/crypto/ahash.c
+index 8acb886032ae..9c1dc8d6106a 100644
+--- a/crypto/ahash.c
++++ b/crypto/ahash.c
+@@ -544,7 +544,8 @@ static int ahash_prepare_alg(struct ahash_alg *alg)
+ 	struct crypto_alg *base = &alg->halg.base;
+ 
+ 	if (alg->halg.digestsize > PAGE_SIZE / 8 ||
+-	    alg->halg.statesize > PAGE_SIZE / 8)
++	    alg->halg.statesize > PAGE_SIZE / 8 ||
++	    alg->halg.statesize == 0)
+ 		return -EINVAL;
+ 
+ 	base->cra_type = &crypto_ahash_type;
+diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
+index bc67a93aa4f4..324bf35ec4dd 100644
+--- a/drivers/block/rbd.c
++++ b/drivers/block/rbd.c
+@@ -5201,7 +5201,6 @@ static int rbd_dev_probe_parent(struct rbd_device *rbd_dev)
+ out_err:
+ 	if (parent) {
+ 		rbd_dev_unparent(rbd_dev);
+-		kfree(rbd_dev->header_name);
+ 		rbd_dev_destroy(parent);
+ 	} else {
+ 		rbd_put_client(rbdc);
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+index b16b9256883e..4c4035fdeb6f 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+@@ -76,8 +76,6 @@ static void amdgpu_flip_work_func(struct work_struct *__work)
+ 	/* We borrow the event spin lock for protecting flip_status */
+ 	spin_lock_irqsave(&crtc->dev->event_lock, flags);
+ 
+-	/* set the proper interrupt */
+-	amdgpu_irq_get(adev, &adev->pageflip_irq, work->crtc_id);
+ 	/* do the flip (mmio) */
+ 	adev->mode_info.funcs->page_flip(adev, work->crtc_id, work->base);
+ 	/* set the flip status */
+diff --git a/drivers/gpu/drm/amd/amdgpu/ci_dpm.c b/drivers/gpu/drm/amd/amdgpu/ci_dpm.c
+index 82e8d0730517..a1a35a5df8e7 100644
+--- a/drivers/gpu/drm/amd/amdgpu/ci_dpm.c
++++ b/drivers/gpu/drm/amd/amdgpu/ci_dpm.c
+@@ -6185,6 +6185,11 @@ static int ci_dpm_late_init(void *handle)
+ 	if (!amdgpu_dpm)
+ 		return 0;
+ 
++	/* init the sysfs and debugfs files late */
++	ret = amdgpu_pm_sysfs_init(adev);
++	if (ret)
++		return ret;
++
+ 	ret = ci_set_temperature_range(adev);
+ 	if (ret)
+ 		return ret;
+@@ -6232,9 +6237,6 @@ static int ci_dpm_sw_init(void *handle)
+ 	adev->pm.dpm.current_ps = adev->pm.dpm.requested_ps = adev->pm.dpm.boot_ps;
+ 	if (amdgpu_dpm == 1)
+ 		amdgpu_pm_print_power_states(adev);
+-	ret = amdgpu_pm_sysfs_init(adev);
+-	if (ret)
+-		goto dpm_failed;
+ 	mutex_unlock(&adev->pm.mutex);
+ 	DRM_INFO("amdgpu: dpm initialized\n");
+ 
+diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c
+index 341c56681841..519fa515c4d8 100644
+--- a/drivers/gpu/drm/amd/amdgpu/cik.c
++++ b/drivers/gpu/drm/amd/amdgpu/cik.c
+@@ -1565,6 +1565,9 @@ static void cik_pcie_gen3_enable(struct amdgpu_device *adev)
+ 	int ret, i;
+ 	u16 tmp16;
+ 
++	if (pci_is_root_bus(adev->pdev->bus))
++		return;
++
+ 	if (amdgpu_pcie_gen2 == 0)
+ 		return;
+ 
+diff --git a/drivers/gpu/drm/amd/amdgpu/cz_dpm.c b/drivers/gpu/drm/amd/amdgpu/cz_dpm.c
+index ace870afc7d4..fd29c18fc14e 100644
+--- a/drivers/gpu/drm/amd/amdgpu/cz_dpm.c
++++ b/drivers/gpu/drm/amd/amdgpu/cz_dpm.c
+@@ -596,6 +596,12 @@ static int cz_dpm_late_init(void *handle)
+ 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ 
+ 	if (amdgpu_dpm) {
++		int ret;
++		/* init the sysfs and debugfs files late */
++		ret = amdgpu_pm_sysfs_init(adev);
++		if (ret)
++			return ret;
++
+ 		/* powerdown unused blocks for now */
+ 		cz_dpm_powergate_uvd(adev, true);
+ 		cz_dpm_powergate_vce(adev, true);
+@@ -632,10 +638,6 @@ static int cz_dpm_sw_init(void *handle)
+ 	if (amdgpu_dpm == 1)
+ 		amdgpu_pm_print_power_states(adev);
+ 
+-	ret = amdgpu_pm_sysfs_init(adev);
+-	if (ret)
+-		goto dpm_init_failed;
+-
+ 	mutex_unlock(&adev->pm.mutex);
+ 	DRM_INFO("amdgpu: dpm initialized\n");
+ 
+diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
+index e774a437dd65..ef36467c7e34 100644
+--- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
+@@ -233,6 +233,24 @@ static u32 dce_v10_0_vblank_get_counter(struct amdgpu_device *adev, int crtc)
+ 		return RREG32(mmCRTC_STATUS_FRAME_COUNT + crtc_offsets[crtc]);
+ }
+ 
++static void dce_v10_0_pageflip_interrupt_init(struct amdgpu_device *adev)
++{
++	unsigned i;
++
++	/* Enable pflip interrupts */
++	for (i = 0; i < adev->mode_info.num_crtc; i++)
++		amdgpu_irq_get(adev, &adev->pageflip_irq, i);
++}
++
++static void dce_v10_0_pageflip_interrupt_fini(struct amdgpu_device *adev)
++{
++	unsigned i;
++
++	/* Disable pflip interrupts */
++	for (i = 0; i < adev->mode_info.num_crtc; i++)
++		amdgpu_irq_put(adev, &adev->pageflip_irq, i);
++}
++
+ /**
+  * dce_v10_0_page_flip - pageflip callback.
+  *
+@@ -2641,9 +2659,10 @@ static void dce_v10_0_crtc_dpms(struct drm_crtc *crtc, int mode)
+ 		dce_v10_0_vga_enable(crtc, true);
+ 		amdgpu_atombios_crtc_blank(crtc, ATOM_DISABLE);
+ 		dce_v10_0_vga_enable(crtc, false);
+-		/* Make sure VBLANK interrupt is still enabled */
++		/* Make sure VBLANK and PFLIP interrupts are still enabled */
+ 		type = amdgpu_crtc_idx_to_irq_type(adev, amdgpu_crtc->crtc_id);
+ 		amdgpu_irq_update(adev, &adev->crtc_irq, type);
++		amdgpu_irq_update(adev, &adev->pageflip_irq, type);
+ 		drm_vblank_post_modeset(dev, amdgpu_crtc->crtc_id);
+ 		dce_v10_0_crtc_load_lut(crtc);
+ 		break;
+@@ -3002,6 +3021,8 @@ static int dce_v10_0_hw_init(void *handle)
+ 		dce_v10_0_audio_enable(adev, &adev->mode_info.audio.pin[i], false);
+ 	}
+ 
++	dce_v10_0_pageflip_interrupt_init(adev);
++
+ 	return 0;
+ }
+ 
+@@ -3016,6 +3037,8 @@ static int dce_v10_0_hw_fini(void *handle)
+ 		dce_v10_0_audio_enable(adev, &adev->mode_info.audio.pin[i], false);
+ 	}
+ 
++	dce_v10_0_pageflip_interrupt_fini(adev);
++
+ 	return 0;
+ }
+ 
+@@ -3027,6 +3050,8 @@ static int dce_v10_0_suspend(void *handle)
+ 
+ 	dce_v10_0_hpd_fini(adev);
+ 
++	dce_v10_0_pageflip_interrupt_fini(adev);
++
+ 	return 0;
+ }
+ 
+@@ -3052,6 +3077,8 @@ static int dce_v10_0_resume(void *handle)
+ 	/* initialize hpd */
+ 	dce_v10_0_hpd_init(adev);
+ 
++	dce_v10_0_pageflip_interrupt_init(adev);
++
+ 	return 0;
+ }
+ 
+@@ -3346,7 +3373,6 @@ static int dce_v10_0_pageflip_irq(struct amdgpu_device *adev,
+ 	spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
+ 
+ 	drm_vblank_put(adev->ddev, amdgpu_crtc->crtc_id);
+-	amdgpu_irq_put(adev, &adev->pageflip_irq, crtc_id);
+ 	queue_work(amdgpu_crtc->pflip_queue, &works->unpin_work);
+ 
+ 	return 0;
+diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
+index c4a21a7afd68..329bca0f1331 100644
+--- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
+@@ -233,6 +233,24 @@ static u32 dce_v11_0_vblank_get_counter(struct amdgpu_device *adev, int crtc)
+ 		return RREG32(mmCRTC_STATUS_FRAME_COUNT + crtc_offsets[crtc]);
+ }
+ 
++static void dce_v11_0_pageflip_interrupt_init(struct amdgpu_device *adev)
++{
++	unsigned i;
++
++	/* Enable pflip interrupts */
++	for (i = 0; i < adev->mode_info.num_crtc; i++)
++		amdgpu_irq_get(adev, &adev->pageflip_irq, i);
++}
++
++static void dce_v11_0_pageflip_interrupt_fini(struct amdgpu_device *adev)
++{
++	unsigned i;
++
++	/* Disable pflip interrupts */
++	for (i = 0; i < adev->mode_info.num_crtc; i++)
++		amdgpu_irq_put(adev, &adev->pageflip_irq, i);
++}
++
+ /**
+  * dce_v11_0_page_flip - pageflip callback.
+  *
+@@ -2640,9 +2658,10 @@ static void dce_v11_0_crtc_dpms(struct drm_crtc *crtc, int mode)
+ 		dce_v11_0_vga_enable(crtc, true);
+ 		amdgpu_atombios_crtc_blank(crtc, ATOM_DISABLE);
+ 		dce_v11_0_vga_enable(crtc, false);
+-		/* Make sure VBLANK interrupt is still enabled */
++		/* Make sure VBLANK and PFLIP interrupts are still enabled */
+ 		type = amdgpu_crtc_idx_to_irq_type(adev, amdgpu_crtc->crtc_id);
+ 		amdgpu_irq_update(adev, &adev->crtc_irq, type);
++		amdgpu_irq_update(adev, &adev->pageflip_irq, type);
+ 		drm_vblank_post_modeset(dev, amdgpu_crtc->crtc_id);
+ 		dce_v11_0_crtc_load_lut(crtc);
+ 		break;
+@@ -2888,7 +2907,7 @@ static int dce_v11_0_early_init(void *handle)
+ 
+ 	switch (adev->asic_type) {
+ 	case CHIP_CARRIZO:
+-		adev->mode_info.num_crtc = 4;
++		adev->mode_info.num_crtc = 3;
+ 		adev->mode_info.num_hpd = 6;
+ 		adev->mode_info.num_dig = 9;
+ 		break;
+@@ -3000,6 +3019,8 @@ static int dce_v11_0_hw_init(void *handle)
+ 		dce_v11_0_audio_enable(adev, &adev->mode_info.audio.pin[i], false);
+ 	}
+ 
++	dce_v11_0_pageflip_interrupt_init(adev);
++
+ 	return 0;
+ }
+ 
+@@ -3014,6 +3035,8 @@ static int dce_v11_0_hw_fini(void *handle)
+ 		dce_v11_0_audio_enable(adev, &adev->mode_info.audio.pin[i], false);
+ 	}
+ 
++	dce_v11_0_pageflip_interrupt_fini(adev);
++
+ 	return 0;
+ }
+ 
+@@ -3025,6 +3048,8 @@ static int dce_v11_0_suspend(void *handle)
+ 
+ 	dce_v11_0_hpd_fini(adev);
+ 
++	dce_v11_0_pageflip_interrupt_fini(adev);
++
+ 	return 0;
+ }
+ 
+@@ -3051,6 +3076,8 @@ static int dce_v11_0_resume(void *handle)
+ 	/* initialize hpd */
+ 	dce_v11_0_hpd_init(adev);
+ 
++	dce_v11_0_pageflip_interrupt_init(adev);
++
+ 	return 0;
+ }
+ 
+@@ -3345,7 +3372,6 @@ static int dce_v11_0_pageflip_irq(struct amdgpu_device *adev,
+ 	spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
+ 
+ 	drm_vblank_put(adev->ddev, amdgpu_crtc->crtc_id);
+-	amdgpu_irq_put(adev, &adev->pageflip_irq, crtc_id);
+ 	queue_work(amdgpu_crtc->pflip_queue, &works->unpin_work);
+ 
+ 	return 0;
+diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
+index cc050a329c49..937879ed86bc 100644
+--- a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
+@@ -204,6 +204,24 @@ static u32 dce_v8_0_vblank_get_counter(struct amdgpu_device *adev, int crtc)
+ 		return RREG32(mmCRTC_STATUS_FRAME_COUNT + crtc_offsets[crtc]);
+ }
+ 
++static void dce_v8_0_pageflip_interrupt_init(struct amdgpu_device *adev)
++{
++	unsigned i;
++
++	/* Enable pflip interrupts */
++	for (i = 0; i < adev->mode_info.num_crtc; i++)
++		amdgpu_irq_get(adev, &adev->pageflip_irq, i);
++}
++
++static void dce_v8_0_pageflip_interrupt_fini(struct amdgpu_device *adev)
++{
++	unsigned i;
++
++	/* Disable pflip interrupts */
++	for (i = 0; i < adev->mode_info.num_crtc; i++)
++		amdgpu_irq_put(adev, &adev->pageflip_irq, i);
++}
++
+ /**
+  * dce_v8_0_page_flip - pageflip callback.
+  *
+@@ -2575,9 +2593,10 @@ static void dce_v8_0_crtc_dpms(struct drm_crtc *crtc, int mode)
+ 		dce_v8_0_vga_enable(crtc, true);
+ 		amdgpu_atombios_crtc_blank(crtc, ATOM_DISABLE);
+ 		dce_v8_0_vga_enable(crtc, false);
+-		/* Make sure VBLANK interrupt is still enabled */
++		/* Make sure VBLANK and PFLIP interrupts are still enabled */
+ 		type = amdgpu_crtc_idx_to_irq_type(adev, amdgpu_crtc->crtc_id);
+ 		amdgpu_irq_update(adev, &adev->crtc_irq, type);
++		amdgpu_irq_update(adev, &adev->pageflip_irq, type);
+ 		drm_vblank_post_modeset(dev, amdgpu_crtc->crtc_id);
+ 		dce_v8_0_crtc_load_lut(crtc);
+ 		break;
+@@ -2933,6 +2952,8 @@ static int dce_v8_0_hw_init(void *handle)
+ 		dce_v8_0_audio_enable(adev, &adev->mode_info.audio.pin[i], false);
+ 	}
+ 
++	dce_v8_0_pageflip_interrupt_init(adev);
++
+ 	return 0;
+ }
+ 
+@@ -2947,6 +2968,8 @@ static int dce_v8_0_hw_fini(void *handle)
+ 		dce_v8_0_audio_enable(adev, &adev->mode_info.audio.pin[i], false);
+ 	}
+ 
++	dce_v8_0_pageflip_interrupt_fini(adev);
++
+ 	return 0;
+ }
+ 
+@@ -2958,6 +2981,8 @@ static int dce_v8_0_suspend(void *handle)
+ 
+ 	dce_v8_0_hpd_fini(adev);
+ 
++	dce_v8_0_pageflip_interrupt_fini(adev);
++
+ 	return 0;
+ }
+ 
+@@ -2981,6 +3006,8 @@ static int dce_v8_0_resume(void *handle)
+ 	/* initialize hpd */
+ 	dce_v8_0_hpd_init(adev);
+ 
++	dce_v8_0_pageflip_interrupt_init(adev);
++
+ 	return 0;
+ }
+ 
+@@ -3376,7 +3403,6 @@ static int dce_v8_0_pageflip_irq(struct amdgpu_device *adev,
+ 	spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
+ 
+ 	drm_vblank_put(adev->ddev, amdgpu_crtc->crtc_id);
+-	amdgpu_irq_put(adev, &adev->pageflip_irq, crtc_id);
+ 	queue_work(amdgpu_crtc->pflip_queue, &works->unpin_work);
+ 
+ 	return 0;
+diff --git a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
+index 94ec04a9c4d5..9745ed3a9aef 100644
+--- a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
++++ b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
+@@ -2995,6 +2995,12 @@ static int kv_dpm_late_init(void *handle)
+ {
+ 	/* powerdown unused blocks for now */
+ 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
++	int ret;
++
++	/* init the sysfs and debugfs files late */
++	ret = amdgpu_pm_sysfs_init(adev);
++	if (ret)
++		return ret;
+ 
+ 	kv_dpm_powergate_acp(adev, true);
+ 	kv_dpm_powergate_samu(adev, true);
+@@ -3038,9 +3044,6 @@ static int kv_dpm_sw_init(void *handle)
+ 	adev->pm.dpm.current_ps = adev->pm.dpm.requested_ps = adev->pm.dpm.boot_ps;
+ 	if (amdgpu_dpm == 1)
+ 		amdgpu_pm_print_power_states(adev);
+-	ret = amdgpu_pm_sysfs_init(adev);
+-	if (ret)
+-		goto dpm_failed;
+ 	mutex_unlock(&adev->pm.mutex);
+ 	DRM_INFO("amdgpu: dpm initialized\n");
+ 
+diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
+index 4f58a1e18de6..9ffa56cebdbc 100644
+--- a/drivers/gpu/drm/amd/amdgpu/vi.c
++++ b/drivers/gpu/drm/amd/amdgpu/vi.c
+@@ -968,6 +968,9 @@ static void vi_pcie_gen3_enable(struct amdgpu_device *adev)
+ 	u32 mask;
+ 	int ret;
+ 
++	if (pci_is_root_bus(adev->pdev->bus))
++		return;
++
+ 	if (amdgpu_pcie_gen2 == 0)
+ 		return;
+ 
+diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
+index 969e7898a7ed..27a2426c3daa 100644
+--- a/drivers/gpu/drm/drm_dp_mst_topology.c
++++ b/drivers/gpu/drm/drm_dp_mst_topology.c
+@@ -2789,12 +2789,13 @@ static int drm_dp_mst_i2c_xfer(struct i2c_adapter *adapter, struct i2c_msg *msgs
+ 	if (msgs[num - 1].flags & I2C_M_RD)
+ 		reading = true;
+ 
+-	if (!reading) {
++	if (!reading || (num - 1 > DP_REMOTE_I2C_READ_MAX_TRANSACTIONS)) {
+ 		DRM_DEBUG_KMS("Unsupported I2C transaction for MST device\n");
+ 		ret = -EIO;
+ 		goto out;
+ 	}
+ 
++	memset(&msg, 0, sizeof(msg));
+ 	msg.req_type = DP_REMOTE_I2C_READ;
+ 	msg.u.i2c_read.num_transactions = num - 1;
+ 	msg.u.i2c_read.port_number = port->port_num;
+diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
+index 0f6cd33b531f..684bd4a13843 100644
+--- a/drivers/gpu/drm/drm_sysfs.c
++++ b/drivers/gpu/drm/drm_sysfs.c
+@@ -235,18 +235,12 @@ static ssize_t dpms_show(struct device *device,
+ 			   char *buf)
+ {
+ 	struct drm_connector *connector = to_drm_connector(device);
+-	struct drm_device *dev = connector->dev;
+-	uint64_t dpms_status;
+-	int ret;
++	int dpms;
+ 
+-	ret = drm_object_property_get_value(&connector->base,
+-					    dev->mode_config.dpms_property,
+-					    &dpms_status);
+-	if (ret)
+-		return 0;
++	dpms = READ_ONCE(connector->dpms);
+ 
+ 	return snprintf(buf, PAGE_SIZE, "%s\n",
+-			drm_get_dpms_name((int)dpms_status));
++			drm_get_dpms_name(dpms));
+ }
+ 
+ static ssize_t enabled_show(struct device *device,
+diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+index 6751553abe4a..567791b27d6d 100644
+--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
++++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+@@ -178,8 +178,30 @@ nouveau_fbcon_sync(struct fb_info *info)
+ 	return 0;
+ }
+ 
++static int
++nouveau_fbcon_open(struct fb_info *info, int user)
++{
++	struct nouveau_fbdev *fbcon = info->par;
++	struct nouveau_drm *drm = nouveau_drm(fbcon->dev);
++	int ret = pm_runtime_get_sync(drm->dev->dev);
++	if (ret < 0 && ret != -EACCES)
++		return ret;
++	return 0;
++}
++
++static int
++nouveau_fbcon_release(struct fb_info *info, int user)
++{
++	struct nouveau_fbdev *fbcon = info->par;
++	struct nouveau_drm *drm = nouveau_drm(fbcon->dev);
++	pm_runtime_put(drm->dev->dev);
++	return 0;
++}
++
+ static struct fb_ops nouveau_fbcon_ops = {
+ 	.owner = THIS_MODULE,
++	.fb_open = nouveau_fbcon_open,
++	.fb_release = nouveau_fbcon_release,
+ 	.fb_check_var = drm_fb_helper_check_var,
+ 	.fb_set_par = drm_fb_helper_set_par,
+ 	.fb_fillrect = nouveau_fbcon_fillrect,
+@@ -195,6 +217,8 @@ static struct fb_ops nouveau_fbcon_ops = {
+ 
+ static struct fb_ops nouveau_fbcon_sw_ops = {
+ 	.owner = THIS_MODULE,
++	.fb_open = nouveau_fbcon_open,
++	.fb_release = nouveau_fbcon_release,
+ 	.fb_check_var = drm_fb_helper_check_var,
+ 	.fb_set_par = drm_fb_helper_set_par,
+ 	.fb_fillrect = cfb_fillrect,
+diff --git a/drivers/gpu/drm/qxl/qxl_fb.c b/drivers/gpu/drm/qxl/qxl_fb.c
+index 6b6e57e8c2d6..847a902e7385 100644
+--- a/drivers/gpu/drm/qxl/qxl_fb.c
++++ b/drivers/gpu/drm/qxl/qxl_fb.c
+@@ -144,14 +144,17 @@ static void qxl_dirty_update(struct qxl_fbdev *qfbdev,
+ 
+ 	spin_lock_irqsave(&qfbdev->dirty.lock, flags);
+ 
+-	if (qfbdev->dirty.y1 < y)
+-		y = qfbdev->dirty.y1;
+-	if (qfbdev->dirty.y2 > y2)
+-		y2 = qfbdev->dirty.y2;
+-	if (qfbdev->dirty.x1 < x)
+-		x = qfbdev->dirty.x1;
+-	if (qfbdev->dirty.x2 > x2)
+-		x2 = qfbdev->dirty.x2;
++	if ((qfbdev->dirty.y2 - qfbdev->dirty.y1) &&
++	    (qfbdev->dirty.x2 - qfbdev->dirty.x1)) {
++		if (qfbdev->dirty.y1 < y)
++			y = qfbdev->dirty.y1;
++		if (qfbdev->dirty.y2 > y2)
++			y2 = qfbdev->dirty.y2;
++		if (qfbdev->dirty.x1 < x)
++			x = qfbdev->dirty.x1;
++		if (qfbdev->dirty.x2 > x2)
++			x2 = qfbdev->dirty.x2;
++	}
+ 
+ 	qfbdev->dirty.x1 = x;
+ 	qfbdev->dirty.x2 = x2;
+diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
+index d2e9e9efc159..6743174acdbc 100644
+--- a/drivers/gpu/drm/radeon/radeon_display.c
++++ b/drivers/gpu/drm/radeon/radeon_display.c
+@@ -1633,18 +1633,8 @@ int radeon_modeset_init(struct radeon_device *rdev)
+ 	radeon_fbdev_init(rdev);
+ 	drm_kms_helper_poll_init(rdev->ddev);
+ 
+-	if (rdev->pm.dpm_enabled) {
+-		/* do dpm late init */
+-		ret = radeon_pm_late_init(rdev);
+-		if (ret) {
+-			rdev->pm.dpm_enabled = false;
+-			DRM_ERROR("radeon_pm_late_init failed, disabling dpm\n");
+-		}
+-		/* set the dpm state for PX since there won't be
+-		 * a modeset to call this.
+-		 */
+-		radeon_pm_compute_clocks(rdev);
+-	}
++	/* do pm late init */
++	ret = radeon_pm_late_init(rdev);
+ 
+ 	return 0;
+ }
+diff --git a/drivers/gpu/drm/radeon/radeon_dp_mst.c b/drivers/gpu/drm/radeon/radeon_dp_mst.c
+index 257b10be5cda..42986130cc63 100644
+--- a/drivers/gpu/drm/radeon/radeon_dp_mst.c
++++ b/drivers/gpu/drm/radeon/radeon_dp_mst.c
+@@ -283,6 +283,7 @@ static struct drm_connector *radeon_dp_add_mst_connector(struct drm_dp_mst_topol
+ 	radeon_connector->mst_encoder = radeon_dp_create_fake_mst_encoder(master);
+ 
+ 	drm_object_attach_property(&connector->base, dev->mode_config.path_property, 0);
++	drm_object_attach_property(&connector->base, dev->mode_config.tile_property, 0);
+ 	drm_mode_connector_set_path_property(connector, pathprop);
+ 	drm_reinit_primary_mode_group(dev);
+ 
+diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
+index c1ba83a8dd8c..948c33105801 100644
+--- a/drivers/gpu/drm/radeon/radeon_pm.c
++++ b/drivers/gpu/drm/radeon/radeon_pm.c
+@@ -1331,14 +1331,6 @@ static int radeon_pm_init_old(struct radeon_device *rdev)
+ 	INIT_DELAYED_WORK(&rdev->pm.dynpm_idle_work, radeon_dynpm_idle_work_handler);
+ 
+ 	if (rdev->pm.num_power_states > 1) {
+-		/* where's the best place to put these? */
+-		ret = device_create_file(rdev->dev, &dev_attr_power_profile);
+-		if (ret)
+-			DRM_ERROR("failed to create device file for power profile\n");
+-		ret = device_create_file(rdev->dev, &dev_attr_power_method);
+-		if (ret)
+-			DRM_ERROR("failed to create device file for power method\n");
+-
+ 		if (radeon_debugfs_pm_init(rdev)) {
+ 			DRM_ERROR("Failed to register debugfs file for PM!\n");
+ 		}
+@@ -1396,20 +1388,6 @@ static int radeon_pm_init_dpm(struct radeon_device *rdev)
+ 		goto dpm_failed;
+ 	rdev->pm.dpm_enabled = true;
+ 
+-	ret = device_create_file(rdev->dev, &dev_attr_power_dpm_state);
+-	if (ret)
+-		DRM_ERROR("failed to create device file for dpm state\n");
+-	ret = device_create_file(rdev->dev, &dev_attr_power_dpm_force_performance_level);
+-	if (ret)
+-		DRM_ERROR("failed to create device file for dpm state\n");
+-	/* XXX: these are noops for dpm but are here for backwards compat */
+-	ret = device_create_file(rdev->dev, &dev_attr_power_profile);
+-	if (ret)
+-		DRM_ERROR("failed to create device file for power profile\n");
+-	ret = device_create_file(rdev->dev, &dev_attr_power_method);
+-	if (ret)
+-		DRM_ERROR("failed to create device file for power method\n");
+-
+ 	if (radeon_debugfs_pm_init(rdev)) {
+ 		DRM_ERROR("Failed to register debugfs file for dpm!\n");
+ 	}
+@@ -1550,9 +1528,44 @@ int radeon_pm_late_init(struct radeon_device *rdev)
+ 	int ret = 0;
+ 
+ 	if (rdev->pm.pm_method == PM_METHOD_DPM) {
+-		mutex_lock(&rdev->pm.mutex);
+-		ret = radeon_dpm_late_enable(rdev);
+-		mutex_unlock(&rdev->pm.mutex);
++		if (rdev->pm.dpm_enabled) {
++			ret = device_create_file(rdev->dev, &dev_attr_power_dpm_state);
++			if (ret)
++				DRM_ERROR("failed to create device file for dpm state\n");
++			ret = device_create_file(rdev->dev, &dev_attr_power_dpm_force_performance_level);
++			if (ret)
++				DRM_ERROR("failed to create device file for dpm state\n");
++			/* XXX: these are noops for dpm but are here for backwards compat */
++			ret = device_create_file(rdev->dev, &dev_attr_power_profile);
++			if (ret)
++				DRM_ERROR("failed to create device file for power profile\n");
++			ret = device_create_file(rdev->dev, &dev_attr_power_method);
++			if (ret)
++				DRM_ERROR("failed to create device file for power method\n");
++
++			mutex_lock(&rdev->pm.mutex);
++			ret = radeon_dpm_late_enable(rdev);
++			mutex_unlock(&rdev->pm.mutex);
++			if (ret) {
++				rdev->pm.dpm_enabled = false;
++				DRM_ERROR("radeon_pm_late_init failed, disabling dpm\n");
++			} else {
++				/* set the dpm state for PX since there won't be
++				 * a modeset to call this.
++				 */
++				radeon_pm_compute_clocks(rdev);
++			}
++		}
++	} else {
++		if (rdev->pm.num_power_states > 1) {
++			/* where's the best place to put these? */
++			ret = device_create_file(rdev->dev, &dev_attr_power_profile);
++			if (ret)
++				DRM_ERROR("failed to create device file for power profile\n");
++			ret = device_create_file(rdev->dev, &dev_attr_power_method);
++			if (ret)
++				DRM_ERROR("failed to create device file for power method\n");
++		}
+ 	}
+ 	return ret;
+ }
+diff --git a/drivers/i2c/busses/i2c-designware-platdrv.c b/drivers/i2c/busses/i2c-designware-platdrv.c
+index 3dd2de31a2f8..472b88285c75 100644
+--- a/drivers/i2c/busses/i2c-designware-platdrv.c
++++ b/drivers/i2c/busses/i2c-designware-platdrv.c
+@@ -24,6 +24,7 @@
+ #include <linux/kernel.h>
+ #include <linux/module.h>
+ #include <linux/delay.h>
++#include <linux/dmi.h>
+ #include <linux/i2c.h>
+ #include <linux/clk.h>
+ #include <linux/clk-provider.h>
+@@ -51,6 +52,22 @@ static u32 i2c_dw_get_clk_rate_khz(struct dw_i2c_dev *dev)
+ }
+ 
+ #ifdef CONFIG_ACPI
++/*
++ * The HCNT/LCNT information coming from ACPI should be the most accurate
++ * for given platform. However, some systems get it wrong. On such systems
++ * we get better results by calculating those based on the input clock.
++ */
++static const struct dmi_system_id dw_i2c_no_acpi_params[] = {
++	{
++		.ident = "Dell Inspiron 7348",
++		.matches = {
++			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
++			DMI_MATCH(DMI_PRODUCT_NAME, "Inspiron 7348"),
++		},
++	},
++	{ }
++};
++
+ static void dw_i2c_acpi_params(struct platform_device *pdev, char method[],
+ 			       u16 *hcnt, u16 *lcnt, u32 *sda_hold)
+ {
+@@ -58,6 +75,9 @@ static void dw_i2c_acpi_params(struct platform_device *pdev, char method[],
+ 	acpi_handle handle = ACPI_HANDLE(&pdev->dev);
+ 	union acpi_object *obj;
+ 
++	if (dmi_check_system(dw_i2c_no_acpi_params))
++		return;
++
+ 	if (ACPI_FAILURE(acpi_evaluate_object(handle, method, NULL, &buf)))
+ 		return;
+ 
+@@ -253,12 +273,6 @@ static int dw_i2c_probe(struct platform_device *pdev)
+ 	adap->dev.parent = &pdev->dev;
+ 	adap->dev.of_node = pdev->dev.of_node;
+ 
+-	r = i2c_add_numbered_adapter(adap);
+-	if (r) {
+-		dev_err(&pdev->dev, "failure adding adapter\n");
+-		return r;
+-	}
+-
+ 	if (dev->pm_runtime_disabled) {
+ 		pm_runtime_forbid(&pdev->dev);
+ 	} else {
+@@ -268,6 +282,13 @@ static int dw_i2c_probe(struct platform_device *pdev)
+ 		pm_runtime_enable(&pdev->dev);
+ 	}
+ 
++	r = i2c_add_numbered_adapter(adap);
++	if (r) {
++		dev_err(&pdev->dev, "failure adding adapter\n");
++		pm_runtime_disable(&pdev->dev);
++		return r;
++	}
++
+ 	return 0;
+ }
+ 
+diff --git a/drivers/i2c/busses/i2c-rcar.c b/drivers/i2c/busses/i2c-rcar.c
+index d8361dada584..d8b5a8fee1e6 100644
+--- a/drivers/i2c/busses/i2c-rcar.c
++++ b/drivers/i2c/busses/i2c-rcar.c
+@@ -690,15 +690,16 @@ static int rcar_i2c_probe(struct platform_device *pdev)
+ 		return ret;
+ 	}
+ 
++	pm_runtime_enable(dev);
++	platform_set_drvdata(pdev, priv);
++
+ 	ret = i2c_add_numbered_adapter(adap);
+ 	if (ret < 0) {
+ 		dev_err(dev, "reg adap failed: %d\n", ret);
++		pm_runtime_disable(dev);
+ 		return ret;
+ 	}
+ 
+-	pm_runtime_enable(dev);
+-	platform_set_drvdata(pdev, priv);
+-
+ 	dev_info(dev, "probed\n");
+ 
+ 	return 0;
+diff --git a/drivers/i2c/busses/i2c-s3c2410.c b/drivers/i2c/busses/i2c-s3c2410.c
+index 50bfd8cef5f2..5df819610d52 100644
+--- a/drivers/i2c/busses/i2c-s3c2410.c
++++ b/drivers/i2c/busses/i2c-s3c2410.c
+@@ -1243,17 +1243,19 @@ static int s3c24xx_i2c_probe(struct platform_device *pdev)
+ 	i2c->adap.nr = i2c->pdata->bus_num;
+ 	i2c->adap.dev.of_node = pdev->dev.of_node;
+ 
++	platform_set_drvdata(pdev, i2c);
++
++	pm_runtime_enable(&pdev->dev);
++
+ 	ret = i2c_add_numbered_adapter(&i2c->adap);
+ 	if (ret < 0) {
+ 		dev_err(&pdev->dev, "failed to add bus to i2c core\n");
++		pm_runtime_disable(&pdev->dev);
+ 		s3c24xx_i2c_deregister_cpufreq(i2c);
+ 		clk_unprepare(i2c->clk);
+ 		return ret;
+ 	}
+ 
+-	platform_set_drvdata(pdev, i2c);
+-
+-	pm_runtime_enable(&pdev->dev);
+ 	pm_runtime_enable(&i2c->adap.dev);
+ 
+ 	dev_info(&pdev->dev, "%s: S3C I2C adapter\n", dev_name(&i2c->adap.dev));
+diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
+index 75aef240c2d1..493c38e08bd2 100644
+--- a/drivers/md/dm-thin.c
++++ b/drivers/md/dm-thin.c
+@@ -3255,7 +3255,7 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
+ 						metadata_low_callback,
+ 						pool);
+ 	if (r)
+-		goto out_free_pt;
++		goto out_flags_changed;
+ 
+ 	pt->callbacks.congested_fn = pool_is_congested;
+ 	dm_table_add_target_callbacks(ti->table, &pt->callbacks);
+diff --git a/drivers/mfd/max77843.c b/drivers/mfd/max77843.c
+index a354ac677ec7..1074a0d68680 100644
+--- a/drivers/mfd/max77843.c
++++ b/drivers/mfd/max77843.c
+@@ -79,7 +79,7 @@ static int max77843_chg_init(struct max77843 *max77843)
+ 	if (!max77843->i2c_chg) {
+ 		dev_err(&max77843->i2c->dev,
+ 				"Cannot allocate I2C device for Charger\n");
+-		return PTR_ERR(max77843->i2c_chg);
++		return -ENODEV;
+ 	}
+ 	i2c_set_clientdata(max77843->i2c_chg, max77843);
+ 
+diff --git a/drivers/net/ethernet/ibm/emac/core.h b/drivers/net/ethernet/ibm/emac/core.h
+index 28df37420da9..ac02c675c59c 100644
+--- a/drivers/net/ethernet/ibm/emac/core.h
++++ b/drivers/net/ethernet/ibm/emac/core.h
+@@ -460,8 +460,8 @@ struct emac_ethtool_regs_subhdr {
+ 	u32 index;
+ };
+ 
+-#define EMAC_ETHTOOL_REGS_VER		0
+-#define EMAC4_ETHTOOL_REGS_VER		1
+-#define EMAC4SYNC_ETHTOOL_REGS_VER	2
++#define EMAC_ETHTOOL_REGS_VER		3
++#define EMAC4_ETHTOOL_REGS_VER		4
++#define EMAC4SYNC_ETHTOOL_REGS_VER	5
+ 
+ #endif /* __IBM_NEWEMAC_CORE_H */
+diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
+index 3837ae344f63..2ed75060da50 100644
+--- a/drivers/net/ppp/pppoe.c
++++ b/drivers/net/ppp/pppoe.c
+@@ -313,7 +313,6 @@ static void pppoe_flush_dev(struct net_device *dev)
+ 			if (po->pppoe_dev == dev &&
+ 			    sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
+ 				pppox_unbind_sock(sk);
+-				sk->sk_state = PPPOX_ZOMBIE;
+ 				sk->sk_state_change(sk);
+ 				po->pppoe_dev = NULL;
+ 				dev_put(dev);
+diff --git a/drivers/pinctrl/freescale/pinctrl-imx25.c b/drivers/pinctrl/freescale/pinctrl-imx25.c
+index faf635654312..293ed4381cc0 100644
+--- a/drivers/pinctrl/freescale/pinctrl-imx25.c
++++ b/drivers/pinctrl/freescale/pinctrl-imx25.c
+@@ -26,7 +26,8 @@
+ #include "pinctrl-imx.h"
+ 
+ enum imx25_pads {
+-	MX25_PAD_RESERVE0 = 1,
++	MX25_PAD_RESERVE0 = 0,
++	MX25_PAD_RESERVE1 = 1,
+ 	MX25_PAD_A10 = 2,
+ 	MX25_PAD_A13 = 3,
+ 	MX25_PAD_A14 = 4,
+@@ -169,6 +170,7 @@ enum imx25_pads {
+ /* Pad names for the pinmux subsystem */
+ static const struct pinctrl_pin_desc imx25_pinctrl_pads[] = {
+ 	IMX_PINCTRL_PIN(MX25_PAD_RESERVE0),
++	IMX_PINCTRL_PIN(MX25_PAD_RESERVE1),
+ 	IMX_PINCTRL_PIN(MX25_PAD_A10),
+ 	IMX_PINCTRL_PIN(MX25_PAD_A13),
+ 	IMX_PINCTRL_PIN(MX25_PAD_A14),
+diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
+index 802fabb30e15..34cbe3505dac 100644
+--- a/fs/btrfs/backref.c
++++ b/fs/btrfs/backref.c
+@@ -1809,7 +1809,6 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
+ 	int found = 0;
+ 	struct extent_buffer *eb;
+ 	struct btrfs_inode_extref *extref;
+-	struct extent_buffer *leaf;
+ 	u32 item_size;
+ 	u32 cur_offset;
+ 	unsigned long ptr;
+@@ -1837,9 +1836,8 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
+ 		btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+ 		btrfs_release_path(path);
+ 
+-		leaf = path->nodes[0];
+-		item_size = btrfs_item_size_nr(leaf, slot);
+-		ptr = btrfs_item_ptr_offset(leaf, slot);
++		item_size = btrfs_item_size_nr(eb, slot);
++		ptr = btrfs_item_ptr_offset(eb, slot);
+ 		cur_offset = 0;
+ 
+ 		while (cur_offset < item_size) {
+@@ -1853,7 +1851,7 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
+ 			if (ret)
+ 				break;
+ 
+-			cur_offset += btrfs_inode_extref_name_len(leaf, extref);
++			cur_offset += btrfs_inode_extref_name_len(eb, extref);
+ 			cur_offset += sizeof(*extref);
+ 		}
+ 		btrfs_tree_read_unlock_blocking(eb);
+diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
+index 0770c91586ca..f490b6155091 100644
+--- a/fs/btrfs/ioctl.c
++++ b/fs/btrfs/ioctl.c
+@@ -4647,6 +4647,11 @@ locked:
+ 		bctl->flags |= BTRFS_BALANCE_TYPE_MASK;
+ 	}
+ 
++	if (bctl->flags & ~(BTRFS_BALANCE_ARGS_MASK | BTRFS_BALANCE_TYPE_MASK)) {
++		ret = -EINVAL;
++		goto out_bargs;
++	}
++
+ do_balance:
+ 	/*
+ 	 * Ownership of bctl and mutually_exclusive_operation_running
+diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
+index 95842a909e7f..2ac5f8cd701a 100644
+--- a/fs/btrfs/volumes.h
++++ b/fs/btrfs/volumes.h
+@@ -376,6 +376,14 @@ struct map_lookup {
+ #define BTRFS_BALANCE_ARGS_VRANGE	(1ULL << 4)
+ #define BTRFS_BALANCE_ARGS_LIMIT	(1ULL << 5)
+ 
++#define BTRFS_BALANCE_ARGS_MASK			\
++	(BTRFS_BALANCE_ARGS_PROFILES |		\
++	 BTRFS_BALANCE_ARGS_USAGE |		\
++	 BTRFS_BALANCE_ARGS_DEVID | 		\
++	 BTRFS_BALANCE_ARGS_DRANGE |		\
++	 BTRFS_BALANCE_ARGS_VRANGE |		\
++	 BTRFS_BALANCE_ARGS_LIMIT)
++
+ /*
+  * Profile changing flags.  When SOFT is set we won't relocate chunk if
+  * it already has the target profile (even though it may be
+diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
+index cdefaa331a07..c29d9421bd5e 100644
+--- a/fs/nfsd/blocklayout.c
++++ b/fs/nfsd/blocklayout.c
+@@ -56,14 +56,6 @@ nfsd4_block_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
+ 	u32 device_generation = 0;
+ 	int error;
+ 
+-	/*
+-	 * We do not attempt to support I/O smaller than the fs block size,
+-	 * or not aligned to it.
+-	 */
+-	if (args->lg_minlength < block_size) {
+-		dprintk("pnfsd: I/O too small\n");
+-		goto out_layoutunavailable;
+-	}
+ 	if (seg->offset & (block_size - 1)) {
+ 		dprintk("pnfsd: I/O misaligned\n");
+ 		goto out_layoutunavailable;
+diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
+index 86d0b25ed054..a89f505c856b 100644
+--- a/include/drm/drm_dp_mst_helper.h
++++ b/include/drm/drm_dp_mst_helper.h
+@@ -253,6 +253,7 @@ struct drm_dp_remote_dpcd_write {
+ 	u8 *bytes;
+ };
+ 
++#define DP_REMOTE_I2C_READ_MAX_TRANSACTIONS 4
+ struct drm_dp_remote_i2c_read {
+ 	u8 num_transactions;
+ 	u8 port_number;
+@@ -262,7 +263,7 @@ struct drm_dp_remote_i2c_read {
+ 		u8 *bytes;
+ 		u8 no_stop_bit;
+ 		u8 i2c_transaction_delay;
+-	} transactions[4];
++	} transactions[DP_REMOTE_I2C_READ_MAX_TRANSACTIONS];
+ 	u8 read_i2c_device_id;
+ 	u8 num_bytes_read;
+ };
+diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
+index 9b88536487e6..275158803824 100644
+--- a/include/linux/skbuff.h
++++ b/include/linux/skbuff.h
+@@ -2601,6 +2601,9 @@ static inline void skb_postpull_rcsum(struct sk_buff *skb,
+ {
+ 	if (skb->ip_summed == CHECKSUM_COMPLETE)
+ 		skb->csum = csum_sub(skb->csum, csum_partial(start, len, 0));
++	else if (skb->ip_summed == CHECKSUM_PARTIAL &&
++		 skb_checksum_start_offset(skb) < 0)
++		skb->ip_summed = CHECKSUM_NONE;
+ }
+ 
+ unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len);
+diff --git a/include/net/af_unix.h b/include/net/af_unix.h
+index 4a167b30a12f..cb1b9bbda332 100644
+--- a/include/net/af_unix.h
++++ b/include/net/af_unix.h
+@@ -63,7 +63,11 @@ struct unix_sock {
+ #define UNIX_GC_MAYBE_CYCLE	1
+ 	struct socket_wq	peer_wq;
+ };
+-#define unix_sk(__sk) ((struct unix_sock *)__sk)
++
++static inline struct unix_sock *unix_sk(struct sock *sk)
++{
++	return (struct unix_sock *)sk;
++}
+ 
+ #define peer_wait peer_wq.wait
+ 
+diff --git a/include/net/sock.h b/include/net/sock.h
+index f21f0708ec59..4ca4c3fe446f 100644
+--- a/include/net/sock.h
++++ b/include/net/sock.h
+@@ -826,6 +826,14 @@ static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *s
+ 	if (sk_rcvqueues_full(sk, limit))
+ 		return -ENOBUFS;
+ 
++	/*
++	 * If the skb was allocated from pfmemalloc reserves, only
++	 * allow SOCK_MEMALLOC sockets to use it as this socket is
++	 * helping free memory
++	 */
++	if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
++		return -ENOMEM;
++
+ 	__sk_add_backlog(sk, skb);
+ 	sk->sk_backlog.len += skb->truesize;
+ 	return 0;
+diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
+index a20d4110e871..3688f1e07ebd 100644
+--- a/kernel/time/timekeeping.c
++++ b/kernel/time/timekeeping.c
+@@ -1244,7 +1244,7 @@ void __init timekeeping_init(void)
+ 	set_normalized_timespec64(&tmp, -boot.tv_sec, -boot.tv_nsec);
+ 	tk_set_wall_to_mono(tk, tmp);
+ 
+-	timekeeping_update(tk, TK_MIRROR);
++	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
+ 
+ 	write_seqcount_end(&tk_core.seq);
+ 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+diff --git a/kernel/workqueue.c b/kernel/workqueue.c
+index a413acb59a07..1de0f5fabb98 100644
+--- a/kernel/workqueue.c
++++ b/kernel/workqueue.c
+@@ -1458,13 +1458,13 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
+ 	timer_stats_timer_set_start_info(&dwork->timer);
+ 
+ 	dwork->wq = wq;
++	/* timer isn't guaranteed to run in this cpu, record earlier */
++	if (cpu == WORK_CPU_UNBOUND)
++		cpu = raw_smp_processor_id();
+ 	dwork->cpu = cpu;
+ 	timer->expires = jiffies + delay;
+ 
+-	if (unlikely(cpu != WORK_CPU_UNBOUND))
+-		add_timer_on(timer, cpu);
+-	else
+-		add_timer(timer);
++	add_timer_on(timer, cpu);
+ }
+ 
+ /**
+diff --git a/mm/memcontrol.c b/mm/memcontrol.c
+index 237d4686482d..03a6f7506cf3 100644
+--- a/mm/memcontrol.c
++++ b/mm/memcontrol.c
+@@ -3687,6 +3687,7 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
+ 	ret = page_counter_memparse(args, "-1", &threshold);
+ 	if (ret)
+ 		return ret;
++	threshold <<= PAGE_SHIFT;
+ 
+ 	mutex_lock(&memcg->thresholds_lock);
+ 
+diff --git a/net/core/ethtool.c b/net/core/ethtool.c
+index b495ab1797fa..29edf74846fc 100644
+--- a/net/core/ethtool.c
++++ b/net/core/ethtool.c
+@@ -1284,7 +1284,7 @@ static int ethtool_get_strings(struct net_device *dev, void __user *useraddr)
+ 
+ 	gstrings.len = ret;
+ 
+-	data = kmalloc(gstrings.len * ETH_GSTRING_LEN, GFP_USER);
++	data = kcalloc(gstrings.len, ETH_GSTRING_LEN, GFP_USER);
+ 	if (!data)
+ 		return -ENOMEM;
+ 
+diff --git a/net/core/filter.c b/net/core/filter.c
+index be3098fb65e4..8dcdd86b68dd 100644
+--- a/net/core/filter.c
++++ b/net/core/filter.c
+@@ -1412,6 +1412,7 @@ static u64 bpf_clone_redirect(u64 r1, u64 ifindex, u64 flags, u64 r4, u64 r5)
+ 		return dev_forward_skb(dev, skb2);
+ 
+ 	skb2->dev = dev;
++	skb_sender_cpu_clear(skb2);
+ 	return dev_queue_xmit(skb2);
+ }
+ 
+@@ -1701,9 +1702,13 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
+ 		goto out;
+ 
+ 	/* We're copying the filter that has been originally attached,
+-	 * so no conversion/decode needed anymore.
++	 * so no conversion/decode needed anymore. eBPF programs that
++	 * have no original program cannot be dumped through this.
+ 	 */
++	ret = -EACCES;
+ 	fprog = filter->prog->orig_prog;
++	if (!fprog)
++		goto out;
+ 
+ 	ret = fprog->len;
+ 	if (!len)
+diff --git a/net/core/skbuff.c b/net/core/skbuff.c
+index 7b84330e5d30..7bfa18746681 100644
+--- a/net/core/skbuff.c
++++ b/net/core/skbuff.c
+@@ -2958,11 +2958,12 @@ EXPORT_SYMBOL_GPL(skb_append_pagefrags);
+  */
+ unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len)
+ {
++	unsigned char *data = skb->data;
++
+ 	BUG_ON(len > skb->len);
+-	skb->len -= len;
+-	BUG_ON(skb->len < skb->data_len);
+-	skb_postpull_rcsum(skb, skb->data, len);
+-	return skb->data += len;
++	__skb_pull(skb, len);
++	skb_postpull_rcsum(skb, data, len);
++	return skb->data;
+ }
+ EXPORT_SYMBOL_GPL(skb_pull_rcsum);
+ 
+diff --git a/net/dsa/slave.c b/net/dsa/slave.c
+index 35c47ddd04f0..25dbb91e1bc0 100644
+--- a/net/dsa/slave.c
++++ b/net/dsa/slave.c
+@@ -348,12 +348,17 @@ static int dsa_slave_stp_update(struct net_device *dev, u8 state)
+ static int dsa_slave_port_attr_set(struct net_device *dev,
+ 				   struct switchdev_attr *attr)
+ {
+-	int ret = 0;
++	struct dsa_slave_priv *p = netdev_priv(dev);
++	struct dsa_switch *ds = p->parent;
++	int ret;
+ 
+ 	switch (attr->id) {
+ 	case SWITCHDEV_ATTR_PORT_STP_STATE:
+-		if (attr->trans == SWITCHDEV_TRANS_COMMIT)
+-			ret = dsa_slave_stp_update(dev, attr->u.stp_state);
++		if (attr->trans == SWITCHDEV_TRANS_PREPARE)
++			ret = ds->drv->port_stp_update ? 0 : -EOPNOTSUPP;
++		else
++			ret = ds->drv->port_stp_update(ds, p->port,
++						       attr->u.stp_state);
+ 		break;
+ 	default:
+ 		ret = -EOPNOTSUPP;
+diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
+index 134957159c27..61b45a17fc73 100644
+--- a/net/ipv4/inet_connection_sock.c
++++ b/net/ipv4/inet_connection_sock.c
+@@ -577,21 +577,22 @@ EXPORT_SYMBOL(inet_rtx_syn_ack);
+ static bool reqsk_queue_unlink(struct request_sock_queue *queue,
+ 			       struct request_sock *req)
+ {
+-	struct listen_sock *lopt = queue->listen_opt;
+ 	struct request_sock **prev;
++	struct listen_sock *lopt;
+ 	bool found = false;
+ 
+ 	spin_lock(&queue->syn_wait_lock);
+-
+-	for (prev = &lopt->syn_table[req->rsk_hash]; *prev != NULL;
+-	     prev = &(*prev)->dl_next) {
+-		if (*prev == req) {
+-			*prev = req->dl_next;
+-			found = true;
+-			break;
++	lopt = queue->listen_opt;
++	if (lopt) {
++		for (prev = &lopt->syn_table[req->rsk_hash]; *prev != NULL;
++		     prev = &(*prev)->dl_next) {
++			if (*prev == req) {
++				*prev = req->dl_next;
++				found = true;
++				break;
++			}
+ 		}
+ 	}
+-
+ 	spin_unlock(&queue->syn_wait_lock);
+ 	if (timer_pending(&req->rsk_timer) && del_timer_sync(&req->rsk_timer))
+ 		reqsk_put(req);
+@@ -685,20 +686,20 @@ void reqsk_queue_hash_req(struct request_sock_queue *queue,
+ 	req->num_timeout = 0;
+ 	req->sk = NULL;
+ 
++	setup_timer(&req->rsk_timer, reqsk_timer_handler, (unsigned long)req);
++	mod_timer_pinned(&req->rsk_timer, jiffies + timeout);
++	req->rsk_hash = hash;
++
+ 	/* before letting lookups find us, make sure all req fields
+ 	 * are committed to memory and refcnt initialized.
+ 	 */
+ 	smp_wmb();
+ 	atomic_set(&req->rsk_refcnt, 2);
+-	setup_timer(&req->rsk_timer, reqsk_timer_handler, (unsigned long)req);
+-	req->rsk_hash = hash;
+ 
+ 	spin_lock(&queue->syn_wait_lock);
+ 	req->dl_next = lopt->syn_table[hash];
+ 	lopt->syn_table[hash] = req;
+ 	spin_unlock(&queue->syn_wait_lock);
+-
+-	mod_timer_pinned(&req->rsk_timer, jiffies + timeout);
+ }
+ EXPORT_SYMBOL(reqsk_queue_hash_req);
+ 
+diff --git a/net/ipv6/route.c b/net/ipv6/route.c
+index 00b64d402a57..dd6ebba5846c 100644
+--- a/net/ipv6/route.c
++++ b/net/ipv6/route.c
+@@ -139,6 +139,9 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
+ 	struct net_device *loopback_dev = net->loopback_dev;
+ 	int cpu;
+ 
++	if (dev == loopback_dev)
++		return;
++
+ 	for_each_possible_cpu(cpu) {
+ 		struct uncached_list *ul = per_cpu_ptr(&rt6_uncached_list, cpu);
+ 		struct rt6_info *rt;
+@@ -148,14 +151,12 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
+ 			struct inet6_dev *rt_idev = rt->rt6i_idev;
+ 			struct net_device *rt_dev = rt->dst.dev;
+ 
+-			if (rt_idev && (rt_idev->dev == dev || !dev) &&
+-			    rt_idev->dev != loopback_dev) {
++			if (rt_idev->dev == dev) {
+ 				rt->rt6i_idev = in6_dev_get(loopback_dev);
+ 				in6_dev_put(rt_idev);
+ 			}
+ 
+-			if (rt_dev && (rt_dev == dev || !dev) &&
+-			    rt_dev != loopback_dev) {
++			if (rt_dev == dev) {
+ 				rt->dst.dev = loopback_dev;
+ 				dev_hold(rt->dst.dev);
+ 				dev_put(rt_dev);
+@@ -2577,7 +2578,8 @@ void rt6_ifdown(struct net *net, struct net_device *dev)
+ 
+ 	fib6_clean_all(net, fib6_ifdown, &adn);
+ 	icmp6_clean_all(fib6_ifdown, &adn);
+-	rt6_uncached_list_flush_dev(net, dev);
++	if (dev)
++		rt6_uncached_list_flush_dev(net, dev);
+ }
+ 
+ struct rt6_mtu_change_arg {
+diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
+index f6b090df3930..afca2eb4dfa7 100644
+--- a/net/l2tp/l2tp_core.c
++++ b/net/l2tp/l2tp_core.c
+@@ -1319,7 +1319,7 @@ static void l2tp_tunnel_del_work(struct work_struct *work)
+ 	tunnel = container_of(work, struct l2tp_tunnel, del_work);
+ 	sk = l2tp_tunnel_sock_lookup(tunnel);
+ 	if (!sk)
+-		return;
++		goto out;
+ 
+ 	sock = sk->sk_socket;
+ 
+@@ -1341,6 +1341,8 @@ static void l2tp_tunnel_del_work(struct work_struct *work)
+ 	}
+ 
+ 	l2tp_tunnel_sock_put(sk);
++out:
++	l2tp_tunnel_dec_refcount(tunnel);
+ }
+ 
+ /* Create a socket for the tunnel, if one isn't set up by
+@@ -1636,8 +1638,13 @@ EXPORT_SYMBOL_GPL(l2tp_tunnel_create);
+  */
+ int l2tp_tunnel_delete(struct l2tp_tunnel *tunnel)
+ {
++	l2tp_tunnel_inc_refcount(tunnel);
+ 	l2tp_tunnel_closeall(tunnel);
+-	return (false == queue_work(l2tp_wq, &tunnel->del_work));
++	if (false == queue_work(l2tp_wq, &tunnel->del_work)) {
++		l2tp_tunnel_dec_refcount(tunnel);
++		return 1;
++	}
++	return 0;
+ }
+ EXPORT_SYMBOL_GPL(l2tp_tunnel_delete);
+ 
+diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
+index 0857f7243797..a133d16eb053 100644
+--- a/net/netlink/af_netlink.c
++++ b/net/netlink/af_netlink.c
+@@ -2750,6 +2750,7 @@ static int netlink_dump(struct sock *sk)
+ 	struct sk_buff *skb = NULL;
+ 	struct nlmsghdr *nlh;
+ 	int len, err = -ENOBUFS;
++	int alloc_min_size;
+ 	int alloc_size;
+ 
+ 	mutex_lock(nlk->cb_mutex);
+@@ -2758,9 +2759,6 @@ static int netlink_dump(struct sock *sk)
+ 		goto errout_skb;
+ 	}
+ 
+-	cb = &nlk->cb;
+-	alloc_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE);
+-
+ 	if (!netlink_rx_is_mmaped(sk) &&
+ 	    atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
+ 		goto errout_skb;
+@@ -2770,23 +2768,35 @@ static int netlink_dump(struct sock *sk)
+ 	 * to reduce number of system calls on dump operations, if user
+ 	 * ever provided a big enough buffer.
+ 	 */
+-	if (alloc_size < nlk->max_recvmsg_len) {
+-		skb = netlink_alloc_skb(sk,
+-					nlk->max_recvmsg_len,
+-					nlk->portid,
++	cb = &nlk->cb;
++	alloc_min_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE);
++
++	if (alloc_min_size < nlk->max_recvmsg_len) {
++		alloc_size = nlk->max_recvmsg_len;
++		skb = netlink_alloc_skb(sk, alloc_size, nlk->portid,
+ 					GFP_KERNEL |
+ 					__GFP_NOWARN |
+ 					__GFP_NORETRY);
+-		/* available room should be exact amount to avoid MSG_TRUNC */
+-		if (skb)
+-			skb_reserve(skb, skb_tailroom(skb) -
+-					 nlk->max_recvmsg_len);
+ 	}
+-	if (!skb)
++	if (!skb) {
++		alloc_size = alloc_min_size;
+ 		skb = netlink_alloc_skb(sk, alloc_size, nlk->portid,
+ 					GFP_KERNEL);
++	}
+ 	if (!skb)
+ 		goto errout_skb;
++
++	/* Trim skb to allocated size. User is expected to provide buffer as
++	 * large as max(min_dump_alloc, 16KiB (mac_recvmsg_len capped at
++	 * netlink_recvmsg())). dump will pack as many smaller messages as
++	 * could fit within the allocated skb. skb is typically allocated
++	 * with larger space than required (could be as much as near 2x the
++	 * requested size with align to next power of 2 approach). Allowing
++	 * dump to use the excess space makes it difficult for a user to have a
++	 * reasonable static buffer based on the expected largest dump of a
++	 * single netdev. The outcome is MSG_TRUNC error.
++	 */
++	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
+ 	netlink_skb_set_owner_r(skb, sk);
+ 
+ 	len = cb->dump(skb, cb);
+diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
+index b5c3bba87fc8..af08e6fc9860 100644
+--- a/net/openvswitch/flow_table.c
++++ b/net/openvswitch/flow_table.c
+@@ -92,7 +92,8 @@ struct sw_flow *ovs_flow_alloc(void)
+ 
+ 	/* Initialize the default stat node. */
+ 	stats = kmem_cache_alloc_node(flow_stats_cache,
+-				      GFP_KERNEL | __GFP_ZERO, 0);
++				      GFP_KERNEL | __GFP_ZERO,
++				      node_online(0) ? 0 : NUMA_NO_NODE);
+ 	if (!stats)
+ 		goto err;
+ 
+diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
+index 268545050ddb..b1768198ad59 100644
+--- a/net/sched/act_mirred.c
++++ b/net/sched/act_mirred.c
+@@ -168,6 +168,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
+ 
+ 	skb2->skb_iif = skb->dev->ifindex;
+ 	skb2->dev = dev;
++	skb_sender_cpu_clear(skb2);
+ 	err = dev_queue_xmit(skb2);
+ 
+ out:
+diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+index 2e1348bde325..96d886a866e9 100644
+--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
++++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+@@ -146,7 +146,8 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
+ 	ctxt->read_hdr = head;
+ 	pages_needed =
+ 		min_t(int, pages_needed, rdma_read_max_sge(xprt, pages_needed));
+-	read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
++	read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
++		     rs_length);
+ 
+ 	for (pno = 0; pno < pages_needed; pno++) {
+ 		int len = min_t(int, rs_length, PAGE_SIZE - pg_off);
+@@ -245,7 +246,8 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
+ 	ctxt->direction = DMA_FROM_DEVICE;
+ 	ctxt->frmr = frmr;
+ 	pages_needed = min_t(int, pages_needed, xprt->sc_frmr_pg_list_len);
+-	read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
++	read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
++		     rs_length);
+ 
+ 	frmr->kva = page_address(rqstp->rq_arg.pages[pg_no]);
+ 	frmr->direction = DMA_FROM_DEVICE;
+diff --git a/net/tipc/msg.h b/net/tipc/msg.h
+index 19c45fb66238..49f9a9648aa9 100644
+--- a/net/tipc/msg.h
++++ b/net/tipc/msg.h
+@@ -357,7 +357,7 @@ static inline u32 msg_importance(struct tipc_msg *m)
+ 	if (likely((usr <= TIPC_CRITICAL_IMPORTANCE) && !msg_errcode(m)))
+ 		return usr;
+ 	if ((usr == MSG_FRAGMENTER) || (usr == MSG_BUNDLER))
+-		return msg_bits(m, 5, 13, 0x7);
++		return msg_bits(m, 9, 0, 0x7);
+ 	return TIPC_SYSTEM_IMPORTANCE;
+ }
+ 
+@@ -366,7 +366,7 @@ static inline void msg_set_importance(struct tipc_msg *m, u32 i)
+ 	int usr = msg_user(m);
+ 
+ 	if (likely((usr == MSG_FRAGMENTER) || (usr == MSG_BUNDLER)))
+-		msg_set_bits(m, 5, 13, 0x7, i);
++		msg_set_bits(m, 9, 0, 0x7, i);
+ 	else if (i < TIPC_SYSTEM_IMPORTANCE)
+ 		msg_set_user(m, i);
+ 	else
+diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
+index 03ee4d359f6a..94f658235fb4 100644
+--- a/net/unix/af_unix.c
++++ b/net/unix/af_unix.c
+@@ -2064,6 +2064,11 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state)
+ 		goto out;
+ 	}
+ 
++	if (flags & MSG_PEEK)
++		skip = sk_peek_offset(sk, flags);
++	else
++		skip = 0;
++
+ 	do {
+ 		int chunk;
+ 		struct sk_buff *skb, *last;
+@@ -2112,7 +2117,6 @@ unlock:
+ 			break;
+ 		}
+ 
+-		skip = sk_peek_offset(sk, flags);
+ 		while (skip >= unix_skb_len(skb)) {
+ 			skip -= unix_skb_len(skb);
+ 			last = skb;
+@@ -2181,6 +2185,17 @@ unlock:
+ 
+ 			sk_peek_offset_fwd(sk, chunk);
+ 
++			if (UNIXCB(skb).fp)
++				break;
++
++			skip = 0;
++			last = skb;
++			last_len = skb->len;
++			unix_state_lock(sk);
++			skb = skb_peek_next(skb, &sk->sk_receive_queue);
++			if (skb)
++				goto again;
++			unix_state_unlock(sk);
+ 			break;
+ 		}
+ 	} while (size);



* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-11-05 23:30 Mike Pagano
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Pagano @ 2015-11-05 23:30 UTC
  To: gentoo-commits

commit:     3a0e597bb6b80d0db9567050a1fb2c397c1e3594
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Thu Nov  5 23:30:34 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Thu Nov  5 23:30:34 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=3a0e597b

Removing kdbus as per upstream developers. See http://lwn.net/Articles/663062/

 0000_README                |     4 -
 5015_kdbus-8-12-2015.patch | 34349 -------------------------------------------
 2 files changed, 34353 deletions(-)

diff --git a/0000_README b/0000_README
index d40ecf2..cf9d964 100644
--- a/0000_README
+++ b/0000_README
@@ -110,7 +110,3 @@ Desc:   BFQ v7r8 patch 3 for 4.2: Early Queue Merge (EQM)
 Patch:  5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
 From:   https://github.com/graysky2/kernel_gcc_patch/
 Desc:   Kernel patch enables gcc >= v4.9 optimizations for additional CPUs.
-
-Patch:  5015_kdbus-8-12-2015.patch
-From:   https://lkml.org
-Desc:   Kernel-level IPC implementation

diff --git a/5015_kdbus-8-12-2015.patch b/5015_kdbus-8-12-2015.patch
deleted file mode 100644
index 4e018f2..0000000
--- a/5015_kdbus-8-12-2015.patch
+++ /dev/null
@@ -1,34349 +0,0 @@
-diff --git a/Documentation/Makefile b/Documentation/Makefile
-index bc05482..e2127a7 100644
---- a/Documentation/Makefile
-+++ b/Documentation/Makefile
-@@ -1,4 +1,4 @@
- subdir-y := accounting auxdisplay blackfin connector \
--	filesystems filesystems ia64 laptops mic misc-devices \
-+	filesystems filesystems ia64 kdbus laptops mic misc-devices \
- 	networking pcmcia prctl ptp spi timers vDSO video4linux \
- 	watchdog
-diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
-index 51f4221..ec7c81b 100644
---- a/Documentation/ioctl/ioctl-number.txt
-+++ b/Documentation/ioctl/ioctl-number.txt
-@@ -292,6 +292,7 @@ Code  Seq#(hex)	Include File		Comments
- 0x92	00-0F	drivers/usb/mon/mon_bin.c
- 0x93	60-7F	linux/auto_fs.h
- 0x94	all	fs/btrfs/ioctl.h
-+0x95	all	uapi/linux/kdbus.h	kdbus IPC driver
- 0x97	00-7F	fs/ceph/ioctl.h		Ceph file system
- 0x99	00-0F				537-Addinboard driver
- 					<mailto:buk@buks.ipn.de>
-diff --git a/Documentation/kdbus/.gitignore b/Documentation/kdbus/.gitignore
-new file mode 100644
-index 0000000..b4a77cc
---- /dev/null
-+++ b/Documentation/kdbus/.gitignore
-@@ -0,0 +1,2 @@
-+*.7
-+*.html
-diff --git a/Documentation/kdbus/Makefile b/Documentation/kdbus/Makefile
-new file mode 100644
-index 0000000..8caffe5
---- /dev/null
-+++ b/Documentation/kdbus/Makefile
-@@ -0,0 +1,44 @@
-+DOCS :=	\
-+	kdbus.xml		\
-+	kdbus.bus.xml		\
-+	kdbus.connection.xml	\
-+	kdbus.endpoint.xml	\
-+	kdbus.fs.xml		\
-+	kdbus.item.xml		\
-+	kdbus.match.xml		\
-+	kdbus.message.xml	\
-+	kdbus.name.xml		\
-+	kdbus.policy.xml	\
-+	kdbus.pool.xml
-+
-+XMLFILES := $(addprefix $(obj)/,$(DOCS))
-+MANFILES := $(patsubst %.xml, %.7, $(XMLFILES))
-+HTMLFILES := $(patsubst %.xml, %.html, $(XMLFILES))
-+
-+XMLTO_ARGS := -m $(srctree)/$(src)/stylesheet.xsl --skip-validation
-+
-+quiet_cmd_db2man = MAN     $@
-+      cmd_db2man = xmlto man $(XMLTO_ARGS) -o $(obj) $<
-+%.7: %.xml
-+	@(which xmlto > /dev/null 2>&1) || \
-+	 (echo "*** You need to install xmlto ***"; \
-+	  exit 1)
-+	$(call cmd,db2man)
-+
-+quiet_cmd_db2html = HTML    $@
-+      cmd_db2html = xmlto html-nochunks $(XMLTO_ARGS) -o $(obj) $<
-+%.html: %.xml
-+	@(which xmlto > /dev/null 2>&1) || \
-+	 (echo "*** You need to install xmlto ***"; \
-+	  exit 1)
-+	$(call cmd,db2html)
-+
-+mandocs: $(MANFILES)
-+
-+htmldocs: $(HTMLFILES)
-+
-+clean-files := $(MANFILES) $(HTMLFILES)
-+
-+# we don't support other %docs targets right now
-+%docs:
-+	@true
-diff --git a/Documentation/kdbus/kdbus.bus.xml b/Documentation/kdbus/kdbus.bus.xml
-new file mode 100644
-index 0000000..83f1198
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.bus.xml
-@@ -0,0 +1,344 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.bus">
-+
-+  <refentryinfo>
-+    <title>kdbus.bus</title>
-+    <productname>kdbus.bus</productname>
-+  </refentryinfo>
-+
-+  <refmeta>
-+    <refentrytitle>kdbus.bus</refentrytitle>
-+    <manvolnum>7</manvolnum>
-+  </refmeta>
-+
-+  <refnamediv>
-+    <refname>kdbus.bus</refname>
-+    <refpurpose>kdbus bus</refpurpose>
-+  </refnamediv>
-+
-+  <refsect1>
-+    <title>Description</title>
-+
-+    <para>
-+      A bus is a resource that is shared between connections in order to
-+      transmit messages (see
-+      <citerefentry>
-+        <refentrytitle>kdbus.message</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>).
-+      Each bus is independent, and operations on the bus will not have any
-+      effect on other buses. A bus is a management entity that controls the
-+      addresses of its connections, their policies and message transactions
-+      performed via this bus.
-+    </para>
-+    <para>
-+      Each bus is bound to the mount instance it was created on. It has a
-+      custom name that is unique across all buses of a domain. In
-+      <citerefentry>
-+        <refentrytitle>kdbus.fs</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      a bus is presented as a directory. No operations can be performed on
-+      the bus itself; instead you need to perform the operations on an endpoint
-+      associated with the bus. Endpoints are accessible as files underneath the
-+      bus directory. A default endpoint called <constant>bus</constant> is
-+      provided on each bus.
-+    </para>
-+    <para>
-+      Bus names may be chosen freely except for one restriction: the name must
-+      be prefixed with the numeric effective UID of the creator and a dash. This
-+      is required to avoid namespace clashes between different users. When
-+      creating a bus, the name that is passed in must be properly formatted, or
-+      the kernel will refuse creation of the bus. Example:
-+      <literal>1047-foobar</literal> is an acceptable name for a bus
-+      registered by a user with UID 1047. However,
-+      <literal>1024-foobar</literal> is not, and neither is
-+      <literal>foobar</literal>. The UID must be provided in the
-+      user-namespace of the bus owner.
-+    </para>
-+    <para>
-+      To create a new bus, you need to open the control file of a domain and
-+      employ the <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl. The control
-+      file descriptor that was used to issue
-+      <constant>KDBUS_CMD_BUS_MAKE</constant> must not previously have been
-+      used for any other control-ioctl and must be kept open for the entire
-+      life-time of the created bus. Closing it will immediately clean up the
-+      entire bus and all its associated resources and endpoints. Every control
-+      file descriptor can only be used to create a single new bus; from that
-+      point on, it is not used for any further communication until the final
-+      <citerefentry>
-+        <refentrytitle>close</refentrytitle>
-+        <manvolnum>2</manvolnum>
-+      </citerefentry>
-+      .
-+    </para>
-+    <para>
-+      Each bus will generate a random, 128-bit UUID upon creation. This UUID
-+      will be returned to creators of connections through
-+      <varname>kdbus_cmd_hello.id128</varname> and can be used to uniquely
-+      identify buses, even across different machines or containers. The UUID
-+      will have its variant bits set to <literal>DCE</literal>, and denote
-+      version 4 (random). For more details on UUIDs, see <ulink
-+      url="https://en.wikipedia.org/wiki/Universally_unique_identifier">
-+      the Wikipedia article on UUIDs</ulink>.
-+    </para>
-+
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Creating buses</title>
-+    <para>
-+      To create a new bus, the <constant>KDBUS_CMD_BUS_MAKE</constant>
-+      command is used. It takes a <type>struct kdbus_cmd</type> argument.
-+    </para>
-+    <programlisting>
-+struct kdbus_cmd {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>The flags for creation.</para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_MAKE_ACCESS_GROUP</constant></term>
-+              <listitem>
-+                <para>Make the bus file group-accessible.</para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_MAKE_ACCESS_WORLD</constant></term>
-+              <listitem>
-+                <para>Make the bus file world-accessible.</para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+              <listitem>
-+                <para>
-+                  Requests a set of valid flags for this ioctl. When this bit is
-+                  set, no action is taken; the ioctl will return
-+                  <errorcode>0</errorcode>, and the <varname>flags</varname>
-+                  field will have all bits set that are valid for this command.
-+                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+                  cleared by the operation.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            The following items (see
-+            <citerefentry>
-+              <refentrytitle>kdbus.item</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>)
-+            are expected for <constant>KDBUS_CMD_BUS_MAKE</constant>.
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_MAKE_NAME</constant></term>
-+              <listitem>
-+                <para>
-+                  Contains a null-terminated string that identifies the
-+                  bus. The name must be unique across the kdbus domain and
-+                  must start with the effective UID of the caller, followed by
-+                  a '<literal>-</literal>' (dash). This item is mandatory.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_BLOOM_PARAMETER</constant></term>
-+              <listitem>
-+                <para>
-+                  Bus-wide bloom parameters passed in a
-+                  <type>struct kdbus_bloom_parameter</type>. These settings are
-+                  copied back to new connections verbatim. This item is
-+                  mandatory. See
-+                  <citerefentry>
-+                    <refentrytitle>kdbus.item</refentrytitle>
-+                    <manvolnum>7</manvolnum>
-+                  </citerefentry>
-+                  for a more detailed description of this item.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant></term>
-+              <listitem>
-+                <para>
-+                  An optional item that contains a set of attach flags that are
-+                  returned to connections when they query the bus creator
-+                  metadata. If not set, no metadata is returned.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+              <listitem><para>
-+                With this item, programs can <emphasis>probe</emphasis> the
-+                kernel for known item types. See
-+                <citerefentry>
-+                  <refentrytitle>kdbus.item</refentrytitle>
-+                  <manvolnum>7</manvolnum>
-+                </citerefentry>
-+                for more details.
-+              </para></listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      Unrecognized items are rejected, and the ioctl will fail with
-+      <varname>errno</varname> set to <constant>EINVAL</constant>.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Return value</title>
-+    <para>
-+      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+      on error, <errorcode>-1</errorcode> is returned, and
-+      <varname>errno</varname> is set to indicate the error.
-+      If the issued ioctl is illegal for the file descriptor used,
-+      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+    </para>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_BUS_MAKE</constant> may fail with the following
-+        errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EBADMSG</constant></term>
-+          <listitem><para>
-+            A mandatory item is missing.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            The flags supplied in the <constant>struct kdbus_cmd</constant>
-+            are invalid or the supplied name does not start with the current
-+            UID and a '<literal>-</literal>' (dash).
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EEXIST</constant></term>
-+          <listitem><para>
-+            A bus of that name already exists.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ESHUTDOWN</constant></term>
-+          <listitem><para>
-+            The kdbus mount instance for the bus was already shut down.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EMFILE</constant></term>
-+          <listitem><para>
-+            The maximum number of buses for the current user is exhausted.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>See Also</title>
-+    <simplelist type="inline">
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.connection</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.endpoint</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.fs</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.item</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.message</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.name</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.pool</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+    </simplelist>
-+  </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.connection.xml b/Documentation/kdbus/kdbus.connection.xml
-new file mode 100644
-index 0000000..4bb5f30
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.connection.xml
-@@ -0,0 +1,1244 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.connection">
-+
-+  <refentryinfo>
-+    <title>kdbus.connection</title>
-+    <productname>kdbus.connection</productname>
-+  </refentryinfo>
-+
-+  <refmeta>
-+    <refentrytitle>kdbus.connection</refentrytitle>
-+    <manvolnum>7</manvolnum>
-+  </refmeta>
-+
-+  <refnamediv>
-+    <refname>kdbus.connection</refname>
-+    <refpurpose>kdbus connection</refpurpose>
-+  </refnamediv>
-+
-+  <refsect1>
-+    <title>Description</title>
-+
-+    <para>
-+      Connections are identified by their <emphasis>connection ID</emphasis>,
-+      internally implemented as a <type>uint64_t</type> counter.
-+      The IDs of every newly created bus start at <constant>1</constant>, and
-+      every new connection will increment the counter by <constant>1</constant>.
-+      The IDs are not reused.
-+    </para>
-+    <para>
-+      In higher level tools, the user visible representation of a connection is
-+      defined by the D-Bus protocol specification as
-+      <constant>":1.&lt;ID&gt;"</constant>.
-+    </para>
-+    <para>
-+      Messages with a specific <type>uint64_t</type> destination ID are
-+      directly delivered to the connection with the corresponding ID. Signal
-+      messages (see
-+      <citerefentry>
-+        <refentrytitle>kdbus.message</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>)
-+      may be addressed to the special destination ID
-+      <constant>KDBUS_DST_ID_BROADCAST</constant> (~0ULL) and will then
-+      potentially be delivered to all currently active connections on the bus.
-+      However, in order to receive any signal messages, clients must subscribe
-+      to them by installing a match (see
-+      <citerefentry>
-+        <refentrytitle>kdbus.match</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>).
-+    </para>
-+    <para>
-+      Messages synthesized and sent directly by the kernel will carry the
-+      special source ID <constant>KDBUS_SRC_ID_KERNEL</constant> (0).
-+    </para>
-+    <para>
-+      In addition to the unique <type>uint64_t</type> connection ID,
-+      established connections can request the ownership of
-+      <emphasis>well-known names</emphasis>, under which they can be found and
-+      addressed by other bus clients. A well-known name is associated with one
-+      and only one connection at a time. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.name</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      on name acquisition, the name registry, and the validity of names.
-+    </para>
-+    <para>
-+      Messages can specify the special destination ID
-+      <constant>KDBUS_DST_ID_NAME</constant> (0) and carry a well-known name
-+      in the message data. Such a message is delivered to the destination
-+      connection which owns that well-known name.
-+    </para>
-+
-+    <programlisting><![CDATA[
-+  +-------------------------------------------------------------------------+
-+  | +---------------+     +---------------------------+                     |
-+  | | Connection    |     | Message                   | -----------------+  |
-+  | | :1.22         | --> | src: 22                   |                  |  |
-+  | |               |     | dst: 25                   |                  |  |
-+  | |               |     |                           |                  |  |
-+  | |               |     |                           |                  |  |
-+  | |               |     +---------------------------+                  |  |
-+  | |               |                                                    |  |
-+  | |               | <--------------------------------------+           |  |
-+  | +---------------+                                        |           |  |
-+  |                                                          |           |  |
-+  | +---------------+     +---------------------------+      |           |  |
-+  | | Connection    |     | Message                   | -----+           |  |
-+  | | :1.25         | --> | src: 25                   |                  |  |
-+  | |               |     | dst: 0xffffffffffffffff   | -------------+   |  |
-+  | |               |     |  (KDBUS_DST_ID_BROADCAST) |              |   |  |
-+  | |               |     |                           | ---------+   |   |  |
-+  | |               |     +---------------------------+          |   |   |  |
-+  | |               |                                            |   |   |  |
-+  | |               | <--------------------------------------------------+  |
-+  | +---------------+                                            |   |      |
-+  |                                                              |   |      |
-+  | +---------------+     +---------------------------+          |   |      |
-+  | | Connection    |     | Message                   | --+      |   |      |
-+  | | :1.55         | --> | src: 55                   |   |      |   |      |
-+  | |               |     | dst: 0 / org.foo.bar      |   |      |   |      |
-+  | |               |     |                           |   |      |   |      |
-+  | |               |     |                           |   |      |   |      |
-+  | |               |     +---------------------------+   |      |   |      |
-+  | |               |                                     |      |   |      |
-+  | |               | <------------------------------------------+   |      |
-+  | +---------------+                                     |          |      |
-+  |                                                       |          |      |
-+  | +---------------+                                     |          |      |
-+  | | Connection    |                                     |          |      |
-+  | | :1.81         |                                     |          |      |
-+  | | org.foo.bar   |                                     |          |      |
-+  | |               |                                     |          |      |
-+  | |               |                                     |          |      |
-+  | |               | <-----------------------------------+          |      |
-+  | |               |                                                |      |
-+  | |               | <----------------------------------------------+      |
-+  | +---------------+                                                       |
-+  +-------------------------------------------------------------------------+
-+    ]]></programlisting>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Privileged connections</title>
-+    <para>
-+      A connection is considered <emphasis>privileged</emphasis> if the user
-+      it was created by is the same that created the bus, or if the creating
-+      task had <constant>CAP_IPC_OWNER</constant> set when it called
-+      <constant>KDBUS_CMD_HELLO</constant> (see below).
-+    </para>
-+    <para>
-+      Privileged connections have permission to employ certain restricted
-+      functions and commands, which are explained below and in other kdbus
-+      man-pages.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Activator and policy holder connection</title>
-+    <para>
-+      An <emphasis>activator</emphasis> connection is a placeholder for a
-+      <emphasis>well-known name</emphasis>. Messages sent to such a connection
-+      can be used to start an implementer connection, which will then get all
-+      the messages from the activator copied over. An activator connection
-+      cannot be used to send any message.
-+    </para>
-+    <para>
-+      A <emphasis>policy holder</emphasis> connection only installs a policy
-+      for one or more names. These policy entries are kept active as long as
-+      the connection is alive, and are removed once it terminates. Such a
-+      policy connection type can be used to deploy restrictions for names that
-+      are not yet active on the bus. A policy holder connection cannot be used
-+      to send any message.
-+    </para>
-+    <para>
-+      The creation of activator or policy holder connections is restricted to
-+      privileged users on the bus (see above).
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Monitor connections</title>
-+    <para>
-+      Monitors are eavesdropping connections that receive all the traffic on the
-+      bus, but are invisible to other connections. Such connections have all
-+      properties of any other, regular connection, except for the following
-+      details:
-+    </para>
-+
-+    <itemizedlist>
-+      <listitem><para>
-+        They will get every message sent over the bus, both unicasts and
-+        broadcasts.
-+      </para></listitem>
-+
-+      <listitem><para>
-+        Installing matches for signal messages is neither necessary
-+        nor allowed.
-+      </para></listitem>
-+
-+      <listitem><para>
-+        They cannot send messages or be directly addressed as receivers.
-+      </para></listitem>
-+
-+      <listitem><para>
-+        They cannot own well-known names. Therefore, they also can't operate as
-+        activators.
-+      </para></listitem>
-+
-+      <listitem><para>
-+        Their creation and destruction will not cause
-+        <constant>KDBUS_ITEM_ID_{ADD,REMOVE}</constant> notifications (see
-+        <citerefentry>
-+          <refentrytitle>kdbus.item</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>).
-+      </para></listitem>
-+
-+      <listitem><para>
-+        They are not listed with their unique name in name registry dumps
-+        (see <constant>KDBUS_CMD_NAME_LIST</constant> in
-+        <citerefentry>
-+          <refentrytitle>kdbus.name</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>), so other connections cannot detect the presence of
-+        a monitor.
-+      </para></listitem>
-+    </itemizedlist>
-+    <para>
-+      The creation of monitor connections is restricted to privileged users on
-+      the bus (see above).
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Creating connections</title>
-+    <para>
-+      A connection to a bus is created by opening an endpoint file (see
-+      <citerefentry>
-+        <refentrytitle>kdbus.endpoint</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>)
-+      of a bus and becoming an active client with the
-+      <constant>KDBUS_CMD_HELLO</constant> ioctl. Every connection has a unique
-+      identifier on the bus and can address messages to every other connection
-+      on the same bus by using the peer's connection ID as the destination.
-+    </para>
-+    <para>
-+      The <constant>KDBUS_CMD_HELLO</constant> ioctl takes a <type>struct
-+      kdbus_cmd_hello</type> as argument.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_cmd_hello {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  __u64 attach_flags_send;
-+  __u64 attach_flags_recv;
-+  __u64 bus_flags;
-+  __u64 id;
-+  __u64 pool_size;
-+  __u64 offset;
-+  __u8 id128[16];
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem>
-+          <para>Flags to apply to this connection.</para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_HELLO_ACCEPT_FD</constant></term>
-+              <listitem>
-+                <para>
-+                  When this flag is set, the connection can be sent file
-+                  descriptors as message payload of unicast messages. If it's
-+                  not set, an attempt to send file descriptors will result in
-+                  <constant>-ECOMM</constant> on the sender's side.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_HELLO_ACTIVATOR</constant></term>
-+              <listitem>
-+                <para>
-+                  Make this connection an activator (see above). With this bit
-+                  set, an item of type <constant>KDBUS_ITEM_NAME</constant> has
-+                  to be attached. This item describes the well-known name this
-+                  connection should be an activator for.
-+                  A connection cannot be an activator and a policy holder at
-+                  the same time, so this bit is not allowed together with
-+                  <constant>KDBUS_HELLO_POLICY_HOLDER</constant>.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_HELLO_POLICY_HOLDER</constant></term>
-+              <listitem>
-+                <para>
-+                  Make this connection a policy holder (see above). With this
-+                  bit set, an item of type <constant>KDBUS_ITEM_NAME</constant>
-+                  has to be attached. This item describes the well-known name
-+                  this connection should hold a policy for.
-+                  A connection cannot be an activator and a policy holder at
-+                  the same time, so this bit is not allowed together with
-+                  <constant>KDBUS_HELLO_ACTIVATOR</constant>.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_HELLO_MONITOR</constant></term>
-+              <listitem>
-+                <para>
-+                  Make this connection a monitor connection (see above).
-+                </para>
-+                <para>
-+                  This flag can only be set by privileged bus connections. See
-+                  below for more information.
-+                  A connection cannot be a monitor and an activator or a policy
-+                  holder at the same time, so this bit is not allowed
-+                  together with <constant>KDBUS_HELLO_ACTIVATOR</constant> or
-+                  <constant>KDBUS_HELLO_POLICY_HOLDER</constant>.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+              <listitem>
-+                <para>
-+                  Requests a set of valid flags for this ioctl. When this bit is
-+                  set, no action is taken; the ioctl will return
-+                  <errorcode>0</errorcode>, and the <varname>flags</varname>
-+                  field will have all bits set that are valid for this command.
-+                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+                  cleared by the operation.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>attach_flags_send</varname></term>
-+        <listitem><para>
-+          Set the bits for metadata this connection permits to be sent to the
-+          receiving peer. Only metadata items that are both allowed to be sent
-+          by the sender and that are requested by the receiver will be attached
-+          to the message.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>attach_flags_recv</varname></term>
-+        <listitem><para>
-+          Request the attachment of metadata for each message received by this
-+          connection. See
-+          <citerefentry>
-+            <refentrytitle>kdbus</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>
-+          for information about metadata, and
-+          <citerefentry>
-+            <refentrytitle>kdbus.item</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>
-+          regarding items in general.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>bus_flags</varname></term>
-+        <listitem><para>
-+          Upon successful completion of the ioctl, this member will contain the
-+          flags of the bus it connected to.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>id</varname></term>
-+        <listitem><para>
-+          Upon successful completion of the command, this member will contain
-+          the numerical ID of the new connection.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>pool_size</varname></term>
-+        <listitem><para>
-+          The size of the communication pool, in bytes. The pool can be
-+          accessed by calling
-+          <citerefentry>
-+            <refentrytitle>mmap</refentrytitle>
-+            <manvolnum>2</manvolnum>
-+          </citerefentry>
-+          on the file descriptor that was used to issue the
-+          <constant>KDBUS_CMD_HELLO</constant> ioctl.
-+          The pool size of a connection must be greater than
-+          <constant>0</constant> and a multiple of
-+          <constant>PAGE_SIZE</constant>. See
-+          <citerefentry>
-+            <refentrytitle>kdbus.pool</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>
-+          for more information.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>offset</varname></term>
-+        <listitem><para>
-+          The kernel will return the offset in the pool where returned details
-+          will be stored. See below.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>id128</varname></term>
-+        <listitem><para>
-+          Upon successful completion of the ioctl, this member will contain the
-+          <emphasis>128-bit UUID</emphasis> of the connected bus.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            Variable list of items containing optional additional information.
-+            The following items are currently expected/valid:
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_CONN_DESCRIPTION</constant></term>
-+              <listitem>
-+                <para>
-+                  Contains a string that describes this connection, so it can
-+                  be identified later.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NAME</constant></term>
-+              <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
-+              <listitem>
-+                <para>
-+                  For activators and policy holders only, combinations of
-+                  these two items describe policy access entries. See
-+                  <citerefentry>
-+                    <refentrytitle>kdbus.policy</refentrytitle>
-+                    <manvolnum>7</manvolnum>
-+                  </citerefentry>
-+                  for further details.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_CREDS</constant></term>
-+              <term><constant>KDBUS_ITEM_PIDS</constant></term>
-+              <term><constant>KDBUS_ITEM_SECLABEL</constant></term>
-+              <listitem>
-+                <para>
-+                  Privileged bus users may submit these types in order to
-+                  create connections with faked credentials. This information
-+                  will be returned when peer information is queried by
-+                  <constant>KDBUS_CMD_CONN_INFO</constant>. See below for more
-+                  information on retrieving information on connections.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+              <listitem><para>
-+                With this item, programs can <emphasis>probe</emphasis> the
-+                kernel for known item types. See
-+                <citerefentry>
-+                  <refentrytitle>kdbus.item</refentrytitle>
-+                  <manvolnum>7</manvolnum>
-+                </citerefentry>
-+                for more details.
-+              </para></listitem>
-+            </varlistentry>
-+          </variablelist>
-+
-+          <para>
-+            Unrecognized items are rejected, and the ioctl will fail with
-+            <varname>errno</varname> set to <constant>EINVAL</constant>.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      At the offset returned in the <varname>offset</varname> field of
-+      <type>struct kdbus_cmd_hello</type>, the kernel will store items
-+      of the following types:
-+    </para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><constant>KDBUS_ITEM_BLOOM_PARAMETER</constant></term>
-+        <listitem>
-+          <para>
-+            Bloom filter parameter as defined by the bus creator.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      The offset in the pool has to be freed with the
-+      <constant>KDBUS_CMD_FREE</constant> ioctl. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.pool</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for further information.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Retrieving information on a connection</title>
-+    <para>
-+      The <constant>KDBUS_CMD_CONN_INFO</constant> ioctl can be used to
-+      retrieve credentials and properties of the initial creator of a
-+      connection. This ioctl uses the following struct.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_cmd_info {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  __u64 id;
-+  __u64 attach_flags;
-+  __u64 offset;
-+  __u64 info_size;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>
-+          Currently, no flags are supported.
-+          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+          valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
-+          and the <varname>flags</varname> field is set to
-+          <constant>0</constant>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>id</varname></term>
-+        <listitem><para>
-+          The numerical ID of the connection for which information is to be
-+          retrieved. If set to a non-zero value, the
-+          <constant>KDBUS_ITEM_OWNED_NAME</constant> item is ignored.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>attach_flags</varname></term>
-+        <listitem><para>
-+          Specifies which metadata items should be attached to the answer. See
-+          <citerefentry>
-+            <refentrytitle>kdbus.message</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>offset</varname></term>
-+        <listitem><para>
-+          When the ioctl returns, this field will contain the offset of the
-+          connection information inside the caller's pool. See
-+          <citerefentry>
-+            <refentrytitle>kdbus.pool</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>
-+          for further information.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>info_size</varname></term>
-+        <listitem><para>
-+          The kernel will return the size of the returned information, so
-+          applications can optionally
-+          <citerefentry>
-+            <refentrytitle>mmap</refentrytitle>
-+            <manvolnum>2</manvolnum>
-+          </citerefentry>
-+          specific parts of the pool. See
-+          <citerefentry>
-+            <refentrytitle>kdbus.pool</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>
-+          for further information.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            The following items are expected for
-+            <constant>KDBUS_CMD_CONN_INFO</constant>.
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_OWNED_NAME</constant></term>
-+              <listitem>
-+                <para>
-+                  Contains the well-known name of the connection to look up.
-+                  This item is mandatory if the <varname>id</varname> field is
-+                  set to 0.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+              <listitem><para>
-+                With this item, programs can <emphasis>probe</emphasis> the
-+                kernel for known item types. See
-+                <citerefentry>
-+                  <refentrytitle>kdbus.item</refentrytitle>
-+                  <manvolnum>7</manvolnum>
-+                </citerefentry>
-+                for more details.
-+              </para></listitem>
-+            </varlistentry>
-+          </variablelist>
-+          <para>
-+            Unrecognized items are rejected, and the ioctl will fail with
-+            <varname>errno</varname> set to <constant>EINVAL</constant>.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      When the ioctl returns, the following struct will be stored in the
-+      caller's pool at <varname>offset</varname>. The fields in this struct
-+      are described below.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_info {
-+  __u64 size;
-+  __u64 id;
-+  __u64 flags;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>id</varname></term>
-+        <listitem><para>
-+          The connection's unique ID.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>
-+          The connection's flags as specified when it was created.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            Depending on the <varname>attach_flags</varname> field in
-+            <type>struct kdbus_cmd_info</type>, items of types
-+            <constant>KDBUS_ITEM_OWNED_NAME</constant> and
-+            <constant>KDBUS_ITEM_CONN_DESCRIPTION</constant> may follow here.
-+            <constant>KDBUS_ITEM_NEGOTIATE</constant> is also allowed.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      Once the caller has finished parsing the returned buffer, it must
-+      issue the <constant>KDBUS_CMD_FREE</constant> command for the offset to
-+      release that slice of the pool. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.pool</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for further information.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Getting information about a connection's bus creator</title>
-+    <para>
-+      The <constant>KDBUS_CMD_BUS_CREATOR_INFO</constant> ioctl takes the same
-+      struct as <constant>KDBUS_CMD_CONN_INFO</constant>, but is used to
-+      retrieve information about the creator of the bus the connection is
-+      attached to. The metadata returned by this call is collected during the
-+      creation of the bus and is never altered afterwards, so it provides
-+      pristine information on the task that created the bus, at the moment when
-+      it did so.
-+    </para>
-+    <para>
-+      In response to this call, a slice in the connection's pool is allocated
-+      and filled with an object of type <type>struct kdbus_info</type>,
-+      pointed to by the ioctl's <varname>offset</varname> field.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_info {
-+  __u64 size;
-+  __u64 id;
-+  __u64 flags;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>id</varname></term>
-+        <listitem><para>
-+          The bus ID.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>
-+          The bus flags as specified when it was created.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            Metadata information is stored in items here. The item list
-+            contains a <constant>KDBUS_ITEM_MAKE_NAME</constant> item that
-+            indicates the bus name of the calling connection.
-+            <constant>KDBUS_ITEM_NEGOTIATE</constant> is allowed to probe
-+            for known item types.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      Once the caller has finished parsing the returned buffer, it must
-+      issue the <constant>KDBUS_CMD_FREE</constant> command for the offset to
-+      release that slice of the pool. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.pool</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for further information.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Updating connection details</title>
-+    <para>
-+      Some of a connection's details can be updated with the
-+      <constant>KDBUS_CMD_CONN_UPDATE</constant> ioctl, using the file
-+      descriptor that was used to create the connection. The update command
-+      uses the following struct.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_cmd {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>
-+          Currently, no flags are supported.
-+          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+          valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
-+          and the <varname>flags</varname> field is set to
-+          <constant>0</constant>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            Items to describe the connection details to be updated. The
-+            following item types are supported.
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant></term>
-+              <listitem>
-+                <para>
-+                  Supply a new set of metadata items that this connection
-+                  permits to be sent along with messages.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_ATTACH_FLAGS_RECV</constant></term>
-+              <listitem>
-+                <para>
-+                  Supply a new set of metadata items that this connection
-+                  requests to be attached to each message.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NAME</constant></term>
-+              <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
-+              <listitem>
-+                <para>
-+                  Policy holder connections may supply a new set of policy
-+                  information with these items. For other connection types,
-+                  <constant>EOPNOTSUPP</constant> is returned in
-+                  <varname>errno</varname>.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+              <listitem><para>
-+                With this item, programs can <emphasis>probe</emphasis> the
-+                kernel for known item types. See
-+                <citerefentry>
-+                  <refentrytitle>kdbus.item</refentrytitle>
-+                  <manvolnum>7</manvolnum>
-+                </citerefentry>
-+                for more details.
-+              </para></listitem>
-+            </varlistentry>
-+          </variablelist>
-+
-+          <para>
-+            Unrecognized items are rejected, and the ioctl will fail with
-+            <varname>errno</varname> set to <constant>EINVAL</constant>.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Termination of connections</title>
-+    <para>
-+      A connection can be terminated by simply calling
-+      <citerefentry>
-+        <refentrytitle>close</refentrytitle>
-+        <manvolnum>2</manvolnum>
-+      </citerefentry>
-+      on its file descriptor. All pending incoming messages will be discarded,
-+      and the memory allocated by the pool will be freed.
-+    </para>
-+
-+    <para>
-+      An alternative way of closing down a connection is via the
-+      <constant>KDBUS_CMD_BYEBYE</constant> ioctl. This ioctl will succeed only
-+      if the message queue of the connection is empty at the time of closing;
-+      otherwise, the ioctl will fail with <varname>errno</varname> set to
-+      <constant>EBUSY</constant>. When this ioctl returns
-+      successfully, the connection has been terminated and won't accept any new
-+      messages from remote peers. This way, a connection can be terminated
-+      race-free, without losing any messages. The ioctl takes an argument of
-+      type <type>struct kdbus_cmd</type>.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_cmd {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>
-+          Currently, no flags are supported.
-+          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+          valid flags. If set, the ioctl will fail with
-+          <varname>errno</varname> set to <constant>EPROTO</constant>, and
-+          the <varname>flags</varname> field is set to <constant>0</constant>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            The following item types are supported.
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+              <listitem><para>
-+                With this item, programs can <emphasis>probe</emphasis> the
-+                kernel for known item types. See
-+                <citerefentry>
-+                  <refentrytitle>kdbus.item</refentrytitle>
-+                  <manvolnum>7</manvolnum>
-+                </citerefentry>
-+                for more details.
-+              </para></listitem>
-+            </varlistentry>
-+          </variablelist>
-+
-+          <para>
-+            Unrecognized items are rejected, and the ioctl will fail with
-+            <varname>errno</varname> set to <constant>EINVAL</constant>.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Return value</title>
-+    <para>
-+      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+      on error, <errorcode>-1</errorcode> is returned, and
-+      <varname>errno</varname> is set to indicate the error.
-+      If the issued ioctl is illegal for the file descriptor used,
-+      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+    </para>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_HELLO</constant> may fail with the following
-+        errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EFAULT</constant></term>
-+          <listitem><para>
-+            The supplied pool size was 0 or not a multiple of the page size.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            The flags supplied in <type>struct kdbus_cmd_hello</type>
-+            are invalid.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            An illegal combination of
-+            <constant>KDBUS_HELLO_MONITOR</constant>,
-+            <constant>KDBUS_HELLO_ACTIVATOR</constant> and
-+            <constant>KDBUS_HELLO_POLICY_HOLDER</constant> was passed in
-+            <varname>flags</varname>.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            An invalid set of items was supplied.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ECONNREFUSED</constant></term>
-+          <listitem><para>
-+            The attach_flags_send field did not satisfy the requirements of
-+            the bus.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EPERM</constant></term>
-+          <listitem><para>
-+            A <constant>KDBUS_ITEM_CREDS</constant> item was supplied, but the
-+            current user is not privileged.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ESHUTDOWN</constant></term>
-+          <listitem><para>
-+            The bus you were trying to connect to has already been shut down.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EMFILE</constant></term>
-+          <listitem><para>
-+            The maximum number of connections on the bus has been reached.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EOPNOTSUPP</constant></term>
-+          <listitem><para>
-+            The endpoint does not support the connection flags supplied in
-+            <type>struct kdbus_cmd_hello</type>.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_BYEBYE</constant> may fail with the following
-+        errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EALREADY</constant></term>
-+          <listitem><para>
-+            The connection has already been shut down.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EBUSY</constant></term>
-+          <listitem><para>
-+            There are still messages queued up in the connection's pool.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_CONN_INFO</constant> may fail with the following
-+        errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            Invalid flags, or neither an ID nor a name was provided, or the
-+            name is invalid.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ESRCH</constant></term>
-+          <listitem><para>
-+            Connection lookup by name failed.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ENXIO</constant></term>
-+          <listitem><para>
-+            No connection with the provided connection ID was found.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_CONN_UPDATE</constant> may fail with the following
-+        errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            Illegal flags or items.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            Wildcards submitted in policy entries, or illegal sequence
-+            of policy items.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EOPNOTSUPP</constant></term>
-+          <listitem><para>
-+            Operation not supported by connection.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>E2BIG</constant></term>
-+          <listitem><para>
-+            Too many policy items attached.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>See Also</title>
-+    <simplelist type="inline">
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.bus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.endpoint</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.message</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.name</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.policy</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.pool</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.item</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+    </simplelist>
-+  </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.endpoint.xml b/Documentation/kdbus/kdbus.endpoint.xml
-new file mode 100644
-index 0000000..6632485
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.endpoint.xml
-@@ -0,0 +1,429 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.endpoint">
-+
-+  <refentryinfo>
-+    <title>kdbus.endpoint</title>
-+    <productname>kdbus.endpoint</productname>
-+  </refentryinfo>
-+
-+  <refmeta>
-+    <refentrytitle>kdbus.endpoint</refentrytitle>
-+    <manvolnum>7</manvolnum>
-+  </refmeta>
-+
-+  <refnamediv>
-+    <refname>kdbus.endpoint</refname>
-+    <refpurpose>kdbus endpoint</refpurpose>
-+  </refnamediv>
-+
-+  <refsect1>
-+    <title>Description</title>
-+
-+    <para>
-+      Endpoints are entry points to a bus (see
-+      <citerefentry>
-+        <refentrytitle>kdbus.bus</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>).
-+      Each bus has a default
-+      endpoint called 'bus'. The bus owner can create custom
-+      endpoints with specific names, permissions, and policy databases
-+      (see below). An endpoint is presented as a file underneath the directory
-+      of the parent bus.
-+    </para>
-+    <para>
-+      To create a custom endpoint, open the default endpoint
-+      (<literal>bus</literal>) and use the
-+      <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> ioctl with
-+      <type>struct kdbus_cmd</type>. Custom endpoints always have a policy
-+      database that, by default, forbids any operation. You have to explicitly
-+      install policy entries to allow any operation on this endpoint.
-+    </para>
-+    <para>
-+      Once <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> has succeeded, the new
-+      endpoint will appear in the filesystem
-+      (<citerefentry>
-+        <refentrytitle>kdbus.bus</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>), and the used file descriptor will manage the
-+      newly created endpoint resource. It cannot be used to manage further
-+      resources and must be kept open as long as the endpoint is needed. The
-+      endpoint will be terminated as soon as the file descriptor is closed.
-+    </para>
-+    <para>
-+      Endpoint names may be chosen freely except for one restriction: the name
-+      must be prefixed with the numeric effective UID of the creator and a dash.
-+      This is required to avoid namespace clashes between different users. When
-+      creating an endpoint, the name that is passed in must be properly
-+      formatted or the kernel will refuse creation of the endpoint. Example:
-+      <literal>1047-my-endpoint</literal> is an acceptable name for an
-+      endpoint registered by a user with UID 1047. However,
-+      <literal>1024-my-endpoint</literal> is not, and neither is
-+      <literal>my-endpoint</literal>. The UID must be provided in the
-+      user-namespace of the bus.
-+    </para>
-+    <para>
-+      To create connections to a bus, use <constant>KDBUS_CMD_HELLO</constant>
-+      on a file descriptor returned by <function>open()</function> on an
-+      endpoint node. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.connection</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for further details.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Creating custom endpoints</title>
-+    <para>
-+      To create a new endpoint, the
-+      <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> command is used. Along with
-+      the endpoint's name, which will be used to expose the endpoint in the
-+      <citerefentry>
-+        <refentrytitle>kdbus.fs</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>,
-+      the command also optionally takes items to set up the endpoint's
-+      <citerefentry>
-+        <refentrytitle>kdbus.policy</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>.
-+      <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> takes a
-+      <type>struct kdbus_cmd</type> argument.
-+    </para>
-+    <programlisting>
-+struct kdbus_cmd {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>The flags for creation.</para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_MAKE_ACCESS_GROUP</constant></term>
-+              <listitem>
-+                <para>Make the endpoint file group-accessible.</para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_MAKE_ACCESS_WORLD</constant></term>
-+              <listitem>
-+                <para>Make the endpoint file world-accessible.</para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+              <listitem>
-+                <para>
-+                  Requests a set of valid flags for this ioctl. When this bit is
-+                  set, no action is taken; the ioctl will return
-+                  <errorcode>0</errorcode>, and the <varname>flags</varname>
-+                  field will have all bits set that are valid for this command.
-+                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+                  cleared by the operation.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            The following items are expected for
-+            <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>.
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_MAKE_NAME</constant></term>
-+              <listitem>
-+                <para>Contains a string to identify the endpoint name.</para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NAME</constant></term>
-+              <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
-+              <listitem>
-+                <para>
-+                  These items are used to set the policy attached to the
-+                  endpoint. For more details on bus and endpoint policies, see
-+                  <citerefentry>
-+                    <refentrytitle>kdbus.policy</refentrytitle>
-+                    <manvolnum>7</manvolnum>
-+                  </citerefentry>.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+          <para>
-+            Unrecognized items are rejected, and the ioctl will fail with
-+            <varname>errno</varname> set to <constant>EINVAL</constant>.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Updating endpoints</title>
-+    <para>
-+      To update an existing endpoint, the
-+      <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant> command is used on the file
-+      descriptor that was used to create the endpoint with
-+      <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>. The only relevant detail of
-+      the endpoint that can be updated is the policy. When the command is
-+      employed, the policy of the endpoint is <emphasis>replaced</emphasis>
-+      atomically with the new set of rules.
-+      The command takes a <type>struct kdbus_cmd</type> argument.
-+    </para>
-+    <programlisting>
-+struct kdbus_cmd {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>
-+          Unused for this command.
-+          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+          valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
-+          and the <varname>flags</varname> field is set to
-+          <constant>0</constant>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            The following items are expected for
-+            <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant>.
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NAME</constant></term>
-+              <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
-+              <listitem>
-+                <para>
-+                  These items are used to set the policy attached to the
-+                  endpoint. For more details on bus and endpoint policies, see
-+                  <citerefentry>
-+                    <refentrytitle>kdbus.policy</refentrytitle>
-+                    <manvolnum>7</manvolnum>
-+                  </citerefentry>.
-+                  Existing policy is atomically replaced with the new rules
-+                  provided.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+              <listitem><para>
-+                With this item, programs can <emphasis>probe</emphasis> the
-+                kernel for known item types. See
-+                <citerefentry>
-+                  <refentrytitle>kdbus.item</refentrytitle>
-+                  <manvolnum>7</manvolnum>
-+                </citerefentry>
-+                for more details.
-+              </para></listitem>
-+            </varlistentry>
-+          </variablelist>
-+          <para>
-+            Unrecognized items are rejected, and the ioctl will fail with
-+            <varname>errno</varname> set to <constant>EINVAL</constant>.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Return value</title>
-+    <para>
-+      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+      on error, <errorcode>-1</errorcode> is returned, and
-+      <varname>errno</varname> is set to indicate the error.
-+      If the issued ioctl is illegal for the file descriptor used,
-+      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+    </para>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> may fail with the
-+        following errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            The flags supplied in the <type>struct kdbus_cmd</type>
-+            are invalid.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            Illegal combination of <constant>KDBUS_ITEM_NAME</constant> and
-+            <constant>KDBUS_ITEM_POLICY_ACCESS</constant> was provided.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EEXIST</constant></term>
-+          <listitem><para>
-+            An endpoint of that name already exists.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EPERM</constant></term>
-+          <listitem><para>
-+            The calling user is not privileged. See
-+            <citerefentry>
-+              <refentrytitle>kdbus</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for information about privileged users.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant> may fail with the
-+        following errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            The flags supplied in <type>struct kdbus_cmd</type>
-+            are invalid.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            Illegal combination of <constant>KDBUS_ITEM_NAME</constant> and
-+            <constant>KDBUS_ITEM_POLICY_ACCESS</constant> was provided.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>See Also</title>
-+    <simplelist type="inline">
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.bus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.endpoint</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.fs</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.item</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.message</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.name</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.pool</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+    </simplelist>
-+  </refsect1>
-+</refentry>
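
The endpoint page above says that KDBUS_CMD_ENDPOINT_MAKE takes a struct kdbus_cmd whose size field covers the header plus all attached items, and that the endpoint name travels in a KDBUS_ITEM_MAKE_NAME item. The following is a minimal sketch of how such a buffer might be laid out in memory. It makes several assumptions: the sketch_* names are invented for illustration, SKETCH_ITEM_MAKE_NAME is a placeholder rather than the real constant from the kdbus UAPI header, the overall size is rounded up to 8 bytes, and no ioctl is issued.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder item type for this sketch only; the real numeric value of
 * KDBUS_ITEM_MAKE_NAME must be taken from the kdbus UAPI header. */
#define SKETCH_ITEM_MAKE_NAME 0x1000ULL

/* Round a size up to the next multiple of 8 bytes. */
#define SKETCH_ALIGN8(v) (((uint64_t)(v) + 7ULL) & ~7ULL)

/* Fixed-size prefix of struct kdbus_cmd as documented above. */
struct sketch_cmd_hdr {
    uint64_t size;
    uint64_t flags;
    uint64_t return_flags;
};

/* Fixed-size prefix of struct kdbus_item (size and type). */
struct sketch_item_hdr {
    uint64_t size;
    uint64_t type;
};

/* Allocate a zeroed command buffer carrying a single name item.
 * Returns NULL on allocation failure; the caller frees the buffer. */
void *sketch_endpoint_make_cmd(const char *name, uint64_t flags)
{
    assert(name != NULL);

    /* The item size counts the header plus the string and its NUL byte. */
    uint64_t item_size = sizeof(struct sketch_item_hdr) + strlen(name) + 1;
    /* Assumption: the overall size includes the padding after the item. */
    uint64_t total = sizeof(struct sketch_cmd_hdr) + SKETCH_ALIGN8(item_size);

    uint8_t *buf = calloc(1, (size_t)total);
    if (buf == NULL)
        return NULL;

    struct sketch_cmd_hdr *cmd = (struct sketch_cmd_hdr *)buf;
    cmd->size = total;
    cmd->flags = flags;

    struct sketch_item_hdr *item =
        (struct sketch_item_hdr *)(buf + sizeof(*cmd));
    item->size = item_size;
    item->type = SKETCH_ITEM_MAKE_NAME;
    memcpy((uint8_t *)item + sizeof(*item), name, strlen(name) + 1);

    return buf;
}
```

In a real program the returned buffer would be passed to ioctl() on the bus endpoint file descriptor. That step is left out here because it depends on the kdbus headers and a kernel with kdbus support.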
-diff --git a/Documentation/kdbus/kdbus.fs.xml b/Documentation/kdbus/kdbus.fs.xml
-new file mode 100644
-index 0000000..8c2a90e
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.fs.xml
-@@ -0,0 +1,124 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus_fs">
-+
-+  <refentryinfo>
-+    <title>kdbus.fs</title>
-+    <productname>kdbus.fs</productname>
-+  </refentryinfo>
-+
-+  <refmeta>
-+    <refentrytitle>kdbus.fs</refentrytitle>
-+    <manvolnum>7</manvolnum>
-+  </refmeta>
-+
-+  <refnamediv>
-+    <refname>kdbus.fs</refname>
-+    <refpurpose>kdbus file system</refpurpose>
-+  </refnamediv>
-+
-+  <refsect1>
-+    <title>File-system Layout</title>
-+
-+    <para>
-+      The <emphasis>kdbusfs</emphasis> pseudo filesystem provides access to
-+      kdbus entities, such as <emphasis>buses</emphasis> and
-+      <emphasis>endpoints</emphasis>. Each time the filesystem is mounted,
-+      a new, isolated kdbus instance is created, which is independent from the
-+      other instances.
-+    </para>
-+    <para>
-+      The system-wide standard mount point for <emphasis>kdbusfs</emphasis> is
-+      <constant>/sys/fs/kdbus</constant>.
-+    </para>
-+
-+    <para>
-+      Buses are represented as directories in the file system layout, whereas
-+      endpoints are exposed as files inside these directories. At the top level,
-+      a <emphasis>control</emphasis> node is present, which can be opened to
-+      create new buses via the <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl.
-+      Each <emphasis>bus</emphasis> shows a default endpoint called
-+      <varname>bus</varname>, which can be opened to either create a connection
-+      with the <constant>KDBUS_CMD_HELLO</constant> ioctl, or to create new
-+      custom endpoints for the bus with
-+      <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.bus</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>,
-+      <citerefentry>
-+        <refentrytitle>kdbus.connection</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry> and
-+      <citerefentry>
-+        <refentrytitle>kdbus.endpoint</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for more details.
-+    </para>
-+
-+    <para>The following listing shows an example layout of the
-+    <emphasis>kdbusfs</emphasis> filesystem:</para>
-+
-+<programlisting>
-+        /sys/fs/kdbus/                          ; mount-point
-+        |-- 0-system                            ; bus directory
-+        |   |-- bus                             ; default endpoint
-+        |   `-- 1017-custom                     ; custom endpoint
-+        |-- 1000-user                           ; bus directory
-+        |   |-- bus                             ; default endpoint
-+        |   |-- 1000-service-A                  ; custom endpoint
-+        |   `-- 1000-service-B                  ; custom endpoint
-+        `-- control                             ; control file
-+</programlisting>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Mounting instances</title>
-+    <para>
-+      In order to get a new and separate kdbus environment, a new instance
-+      of <emphasis>kdbusfs</emphasis> can be mounted like this:
-+    </para>
-+<programlisting>
-+  # mount -t kdbusfs kdbusfs /tmp/new_kdbus/
-+</programlisting>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>See Also</title>
-+    <simplelist type="inline">
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.bus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.connection</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.endpoint</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>mount</refentrytitle>
-+          <manvolnum>8</manvolnum>
-+        </citerefentry>
-+      </member>
-+    </simplelist>
-+  </refsect1>
-+</refentry>
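
The layout described in kdbus.fs above (buses as directories, endpoints as files inside them, one control node at the top level) maps directly onto path names. As a small illustration, the sketch below builds those paths for a given mount point. The function names are hypothetical and not part of any kdbus API; the mount point is passed without a trailing slash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "<mount>/<bus>/<endpoint>" into out.
 * Returns 0 on success, -1 on empty input or if the result was truncated. */
int sketch_endpoint_path(char *out, size_t len, const char *mount,
                         const char *bus, const char *endpoint)
{
    assert(out != NULL && mount != NULL && bus != NULL && endpoint != NULL);

    if (strlen(bus) == 0 || strlen(endpoint) == 0)
        return -1;

    int n = snprintf(out, len, "%s/%s/%s", mount, bus, endpoint);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "<mount>/control" into out, the node used to create new buses.
 * Returns 0 on success, -1 if the result was truncated. */
int sketch_control_path(char *out, size_t len, const char *mount)
{
    assert(out != NULL && mount != NULL);

    int n = snprintf(out, len, "%s/control", mount);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For the example layout, the bus "1000-user" and the endpoint "1000-service-A" under the mount point "/sys/fs/kdbus" give "/sys/fs/kdbus/1000-user/1000-service-A". The same helpers work for a private instance mounted elsewhere, such as "/tmp/new_kdbus".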
-diff --git a/Documentation/kdbus/kdbus.item.xml b/Documentation/kdbus/kdbus.item.xml
-new file mode 100644
-index 0000000..ee09dfa
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.item.xml
-@@ -0,0 +1,839 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus">
-+
-+  <refentryinfo>
-+    <title>kdbus.item</title>
-+    <productname>kdbus item</productname>
-+  </refentryinfo>
-+
-+  <refmeta>
-+    <refentrytitle>kdbus.item</refentrytitle>
-+    <manvolnum>7</manvolnum>
-+  </refmeta>
-+
-+  <refnamediv>
-+    <refname>kdbus.item</refname>
-+    <refpurpose>kdbus item structure, layout and usage</refpurpose>
-+  </refnamediv>
-+
-+  <refsect1>
-+    <title>Description</title>
-+
-+    <para>
-+      To flexibly augment transport structures, data blobs of type
-+      <type>struct kdbus_item</type> can be attached to the structs passed
-+      into the ioctls. Some ioctls make items of certain types mandatory;
-+      others are optional. An item that is not supported by the ioctl it is
-+      attached to causes the ioctl to fail with <varname>errno</varname>
-+      set to <constant>EINVAL</constant>.
-+      Items are also used for information stored in a connection's
-+      <emphasis>pool</emphasis>, such as received messages, name lists or
-+      requested connection or bus owner information. Depending on the type of
-+      an item, its total size is either fixed or variable.
-+    </para>
-+
-+    <refsect2>
-+      <title>Chaining items</title>
-+      <para>
-+        Whenever items are used as part of the kdbus kernel API, they are
-+        embedded in structs that themselves include a size field containing
-+        the overall size of the structure. This allows multiple items to be
-+        chained up, and lets an item iterator (see below) detect the end of
-+        an item chain.
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Alignment</title>
-+      <para>
-+        The kernel expects all items to be aligned to 8-byte boundaries.
-+        Unaligned items will cause the ioctl they are used with to fail
-+        with <varname>errno</varname> set to <constant>EINVAL</constant>.
-+        An item that has an unaligned size itself hence needs to be padded
-+        if it is followed by another item.
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Iterating items</title>
-+      <para>
-+        A simple iterator walks the items until it reaches the end of the
-+        embedding structure, as given by that structure's overall size. An
-+        example implementation is shown below.
-+      </para>
-+
-+      <programlisting><![CDATA[
-+#define KDBUS_ALIGN8(val) (((val) + 7) & ~7)
-+
-+#define KDBUS_ITEM_NEXT(item) \
-+    (typeof(item))((uint8_t *)(item) + KDBUS_ALIGN8((item)->size))
-+
-+#define KDBUS_ITEM_FOREACH(item, head, first)                      \
-+    for ((item) = (head)->first;                                   \
-+         ((uint8_t *)(item) < (uint8_t *)(head) + (head)->size) && \
-+          ((uint8_t *)(item) >= (uint8_t *)(head));                \
-+         (item) = KDBUS_ITEM_NEXT(item))
-+      ]]></programlisting>
-+    </refsect2>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Item layout</title>
-+    <para>
-+      A <type>struct kdbus_item</type> consists of a
-+      <varname>size</varname> field, describing its overall size, and a
-+      <varname>type</varname> field, both 64 bits wide. They are followed by
-+      a union to store information that is specific to the item's type.
-+      The struct layout is shown below.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_item {
-+  __u64 size;
-+  __u64 type;
-+  /* item payload - see below */
-+  union {
-+    __u8 data[0];
-+    __u32 data32[0];
-+    __u64 data64[0];
-+    char str[0];
-+
-+    __u64 id;
-+    struct kdbus_vec vec;
-+    struct kdbus_creds creds;
-+    struct kdbus_pids pids;
-+    struct kdbus_audit audit;
-+    struct kdbus_caps caps;
-+    struct kdbus_timestamp timestamp;
-+    struct kdbus_name name;
-+    struct kdbus_bloom_parameter bloom_parameter;
-+    struct kdbus_bloom_filter bloom_filter;
-+    struct kdbus_memfd memfd;
-+    int fds[0];
-+    struct kdbus_notify_name_change name_change;
-+    struct kdbus_notify_id_change id_change;
-+    struct kdbus_policy_access policy_access;
-+  };
-+};
-+    </programlisting>
-+
-+    <para>
-+      <type>struct kdbus_item</type> should never be used to allocate
-+      an item instance, as its size may grow in future releases of the API.
-+      Instead, it should be manually assembled by storing the
-+      <varname>size</varname>, <varname>type</varname> and payload in a
-+      struct of its own.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Item types</title>
-+
-+    <refsect2>
-+      <title>Negotiation item</title>
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+          <listitem><para>
-+            When this item is attached to any ioctl, programs can
-+            <emphasis>probe</emphasis> the kernel for known item types.
-+            The item carries an array of <type>uint64_t</type> values in
-+            <varname>item.data64</varname>, each set to an item type to
-+            probe. The kernel will reset each member of this array that is
-+            not recognized as a valid item type to <constant>0</constant>.
-+            This way, users can negotiate kernel features at start-up to
-+            keep newer userspace compatible with older kernels. This item
-+            is never attached by the kernel in response to any command.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Command specific items</title>
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_PAYLOAD_VEC</constant></term>
-+          <term><constant>KDBUS_ITEM_PAYLOAD_OFF</constant></term>
-+          <listitem><para>
-+            Messages are directly copied by the sending process into the
-+            receiver's
-+            <citerefentry>
-+              <refentrytitle>kdbus.pool</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>.
-+            This way, two peers can exchange data by effectively doing a
-+            single-copy from one process to another; the kernel will not buffer
-+            the data anywhere else. <constant>KDBUS_ITEM_PAYLOAD_VEC</constant>
-+            is used when <emphasis>sending</emphasis> a message. The item
-+            references a memory address where the payload data can be found.
-+            <constant>KDBUS_ITEM_PAYLOAD_OFF</constant> is used when messages
-+            are <emphasis>received</emphasis>, and the
-+            <varname>offset</varname> value describes the offset inside the
-+            receiving connection's
-+            <citerefentry>
-+              <refentrytitle>kdbus.pool</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            where the message payload can be found. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.message</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information on passing of payload data along with a
-+            message.
-+            <programlisting>
-+struct kdbus_vec {
-+  __u64 size;
-+  union {
-+    __u64 address;
-+    __u64 offset;
-+  };
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant></term>
-+          <listitem><para>
-+            Transports a file descriptor of a <emphasis>memfd</emphasis> in
-+            <type>struct kdbus_memfd</type> in <varname>item.memfd</varname>.
-+            The <varname>size</varname> field has to match the actual size of
-+            the memfd that was specified when it was created. The
-+            <varname>start</varname> parameter denotes the offset inside the
-+            memfd at which the referenced payload starts. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.message</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information on passing of payload data along with a
-+            message.
-+            <programlisting>
-+struct kdbus_memfd {
-+  __u64 start;
-+  __u64 size;
-+  int fd;
-+  __u32 __pad;
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_FDS</constant></term>
-+          <listitem><para>
-+            Contains an array of <emphasis>file descriptors</emphasis>.
-+            When used with <constant>KDBUS_CMD_SEND</constant>, the values of
-+            this array must be filled with valid file descriptor numbers.
-+            When received as an item attached to a message, the array will
-+            contain the numbers of the installed file descriptors, or
-+            <constant>-1</constant> in case an error occurred.
-+            In either case, the number of entries in the array is derived from
-+            the item's total size. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.message</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Items specific to some commands</title>
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_CANCEL_FD</constant></term>
-+          <listitem><para>
-+            Transports a file descriptor that can be used to cancel a
-+            synchronous <constant>KDBUS_CMD_SEND</constant> operation by
-+            writing to it. The file descriptor is stored in
-+            <varname>item.fds[0]</varname>. The item may only contain one
-+            file descriptor. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.message</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information on this item and how to use it.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_BLOOM_PARAMETER</constant></term>
-+          <listitem><para>
-+            Contains a set of <emphasis>bloom parameters</emphasis> as
-+            <type>struct kdbus_bloom_parameter</type> in
-+            <varname>item.bloom_parameter</varname>.
-+            The item is passed from userspace to kernel during the
-+            <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl, and returned
-+            verbatim when <constant>KDBUS_CMD_HELLO</constant> is called.
-+            The kernel does not use the bloom parameters, but they need to
-+            be known by each connection on the bus in order to define the
-+            bloom filter hash details. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.match</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information on matching and bloom filters.
-+            <programlisting>
-+struct kdbus_bloom_parameter {
-+  __u64 size;
-+  __u64 n_hash;
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_BLOOM_FILTER</constant></term>
-+          <listitem><para>
-+            Carries a <emphasis>bloom filter</emphasis> as
-+            <type>struct kdbus_bloom_filter</type> in
-+            <varname>item.bloom_filter</varname>. It is mandatory to send this
-+            item attached to a <type>struct kdbus_msg</type>, in case the
-+            message is a signal. This item is never transported from kernel to
-+            userspace. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.match</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information on matching and bloom filters.
-+            <programlisting>
-+struct kdbus_bloom_filter {
-+  __u64 generation;
-+  __u64 data[0];
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_BLOOM_MASK</constant></term>
-+          <listitem><para>
-+            Transports a <emphasis>bloom mask</emphasis> as a binary data blob
-+            stored in <varname>item.data</varname>. This item is used to
-+            describe a match into a connection's match database. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.match</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information on matching and bloom filters.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_DST_NAME</constant></term>
-+          <listitem><para>
-+            Contains a <emphasis>well-known name</emphasis> to send a
-+            message to, as a null-terminated string in
-+            <varname>item.str</varname>. This item is used with
-+            <constant>KDBUS_CMD_SEND</constant>. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.message</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information on how to send a message.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_MAKE_NAME</constant></term>
-+          <listitem><para>
-+            Contains a <emphasis>bus name</emphasis> or
-+            <emphasis>endpoint name</emphasis>, stored as a null-terminated
-+            string in <varname>item.str</varname>. This item is sent from
-+            userspace to kernel when buses or endpoints are created, and
-+            returned to userspace when the bus creator information is
-+            queried. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.bus</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            and
-+            <citerefentry>
-+              <refentrytitle>kdbus.endpoint</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant></term>
-+          <term><constant>KDBUS_ITEM_ATTACH_FLAGS_RECV</constant></term>
-+          <listitem><para>
-+            Contains a set of <emphasis>attach flags</emphasis> at
-+            <emphasis>send</emphasis> or <emphasis>receive</emphasis> time. See
-+            <citerefentry>
-+              <refentrytitle>kdbus</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>,
-+            <citerefentry>
-+              <refentrytitle>kdbus.bus</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry> and
-+            <citerefentry>
-+              <refentrytitle>kdbus.connection</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information on attach flags.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_ID</constant></term>
-+          <listitem><para>
-+            Transports the <emphasis>numerical ID</emphasis> of
-+            a connection as a <type>uint64_t</type> value in
-+            <varname>item.id</varname>.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_NAME</constant></term>
-+          <listitem><para>
-+            Transports a name associated with the
-+            <emphasis>name registry</emphasis> as a null-terminated string,
-+            stored in a <type>struct kdbus_name</type> in
-+            <varname>item.name</varname>. The <varname>flags</varname>
-+            field contains the flags of the name. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.name</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information on how to access the name registry of a bus.
-+            <programlisting>
-+struct kdbus_name {
-+  __u64 flags;
-+  char name[0];
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Items attached by the kernel as metadata</title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_TIMESTAMP</constant></term>
-+          <listitem><para>
-+            Contains both the <emphasis>monotonic</emphasis> and the
-+            <emphasis>realtime</emphasis> timestamp, taken when the message
-+            was processed on the kernel side.
-+            Stored as <type>struct kdbus_timestamp</type> in
-+            <varname>item.timestamp</varname>.
-+            <programlisting>
-+struct kdbus_timestamp {
-+  __u64 seqnum;
-+  __u64 monotonic_ns;
-+  __u64 realtime_ns;
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_CREDS</constant></term>
-+          <listitem><para>
-+            Contains a set of <emphasis>user</emphasis> and
-+            <emphasis>group</emphasis> information as 32-bit values, in the
-+            usual four flavors: real, effective, saved and filesystem related.
-+            Stored as <type>struct kdbus_creds</type> in
-+            <varname>item.creds</varname>.
-+            <programlisting>
-+struct kdbus_creds {
-+  __u32 uid;
-+  __u32 euid;
-+  __u32 suid;
-+  __u32 fsuid;
-+  __u32 gid;
-+  __u32 egid;
-+  __u32 sgid;
-+  __u32 fsgid;
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_PIDS</constant></term>
-+          <listitem><para>
-+            Contains the <emphasis>PID</emphasis>, <emphasis>TID</emphasis>
-+            and <emphasis>parent PID (PPID)</emphasis> of a remote peer.
-+            Stored as <type>struct kdbus_pids</type> in
-+            <varname>item.pids</varname>.
-+            <programlisting>
-+struct kdbus_pids {
-+  __u64 pid;
-+  __u64 tid;
-+  __u64 ppid;
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_AUXGROUPS</constant></term>
-+          <listitem><para>
-+            Contains the <emphasis>auxiliary (supplementary) groups</emphasis>
-+            a remote peer is a member of, stored as array of
-+            <type>uint32_t</type> values in <varname>item.data32</varname>.
-+            The array length can be determined by looking at the item's total
-+            size, subtracting the size of the header and dividing the
-+            remainder by <constant>sizeof(uint32_t)</constant>.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_OWNED_NAME</constant></term>
-+          <listitem><para>
-+            Contains a <emphasis>well-known name</emphasis> currently owned
-+            by a connection. The name is stored as null-terminated string in
-+            <varname>item.str</varname>. Its length can also be derived from
-+            the item's total size.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_TID_COMM</constant> [*]</term>
-+          <listitem><para>
-+            Contains the <emphasis>comm</emphasis> string of a task's
-+            <emphasis>TID</emphasis> (thread ID), stored as null-terminated
-+            string in <varname>item.str</varname>. Its length can also be
-+            derived from the item's total size. Receivers of this item should
-+            not use its contents for any kind of security measures. See below.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_PID_COMM</constant> [*]</term>
-+          <listitem><para>
-+            Contains the <emphasis>comm</emphasis> string of a task's
-+            <emphasis>PID</emphasis> (process ID), stored as null-terminated
-+            string in <varname>item.str</varname>. Its length can also be
-+            derived from the item's total size. Receivers of this item should
-+            not use its contents for any kind of security measures. See below.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_EXE</constant> [*]</term>
-+          <listitem><para>
-+            Contains the <emphasis>path to the executable</emphasis> of a task,
-+            stored as null-terminated string in <varname>item.str</varname>. Its
-+            length can also be derived from the item's total size. Receivers of
-+            this item should not use its contents for any kind of security
-+            measures. See below.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_CMDLINE</constant> [*]</term>
-+          <listitem><para>
-+            Contains the <emphasis>command line arguments</emphasis> of a
-+            task, stored as an <emphasis>array</emphasis> of null-terminated
-+            strings in <varname>item.str</varname>. The total length of all
-+            strings in the array can be derived from the item's total size.
-+            Receivers of this item should not use its contents for any kind
-+            of security measures. See below.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_CGROUP</constant></term>
-+          <listitem><para>
-+            Contains the <emphasis>cgroup path</emphasis> of a task, stored
-+            as null-terminated string in <varname>item.str</varname>. Its
-+            length can also be derived from the item's total size.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_CAPS</constant></term>
-+          <listitem><para>
-+            Contains sets of <emphasis>capabilities</emphasis>, stored as
-+            <type>struct kdbus_caps</type> in <varname>item.caps</varname>.
-+            As the item size may increase in the future, programs should be
-+            written so that they take
-+            <varname>item.caps.last_cap</varname> into account and derive
-+            the number of sets and rows from the item size and the reported
-+            number of valid capability bits.
-+            <programlisting>
-+struct kdbus_caps {
-+  __u32 last_cap;
-+  __u32 caps[0];
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_SECLABEL</constant></term>
-+          <listitem><para>
-+            Contains the <emphasis>LSM label</emphasis> of a task, stored as
-+            null-terminated string in <varname>item.str</varname>. Its length
-+            can also be derived from the item's total size.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_AUDIT</constant></term>
-+          <listitem><para>
-+            Contains the audit <emphasis>sessionid</emphasis> and
-+            <emphasis>loginuid</emphasis> of a task, stored as
-+            <type>struct kdbus_audit</type> in
-+            <varname>item.audit</varname>.
-+            <programlisting>
-+struct kdbus_audit {
-+  __u32 sessionid;
-+  __u32 loginuid;
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_CONN_DESCRIPTION</constant></term>
-+          <listitem><para>
-+            Contains the <emphasis>connection description</emphasis>, as set
-+            by <constant>KDBUS_CMD_HELLO</constant> or
-+            <constant>KDBUS_CMD_CONN_UPDATE</constant>, stored as
-+            null-terminated string in <varname>item.str</varname>. Its length
-+            can also be derived from the item's total size.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+
-+      <para>
-+        All metadata is automatically translated into the
-+        <emphasis>namespaces</emphasis> of the task that receives them. See
-+        <citerefentry>
-+          <refentrytitle>kdbus.message</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        for more information.
-+      </para>
-+
-+      <para>
-+        [*] Note that the content stored in metadata items of type
-+        <constant>KDBUS_ITEM_TID_COMM</constant>,
-+        <constant>KDBUS_ITEM_PID_COMM</constant>,
-+        <constant>KDBUS_ITEM_EXE</constant> and
-+        <constant>KDBUS_ITEM_CMDLINE</constant>
-+        can easily be tampered with by the sending tasks. Therefore, they should
-+        <emphasis>not</emphasis> be used for any sort of security-relevant
-+        assumptions. The only reason they are transmitted is to let
-+        receivers know about details that were set when metadata was
-+        collected, even though the task they were collected from is not
-+        active any longer when the items are received.
-+      </para>
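-+
-+      <para>
-+        As an illustration of the length computation described for
-+        <constant>KDBUS_ITEM_AUXGROUPS</constant> above, the following is a
-+        minimal sketch. It assumes that the item header consists of two
-+        <type>uint64_t</type> fields (size and type). The type
-+        <type>struct item_hdr</type> and the helper
-+        <function>auxgroups_count()</function> are hypothetical names used
-+        only for this example and are not part of the kdbus API.
-+      </para>
-+      <programlisting><![CDATA[
-+#include <stddef.h>
-+#include <stdint.h>
-+
-+/* Hypothetical mirror of the item header (assumed layout: size, type). */
-+struct item_hdr {
-+  uint64_t size;  /* total item size in bytes, header included */
-+  uint64_t type;
-+};
-+
-+/* Number of uint32_t group IDs carried in the item payload. */
-+static size_t auxgroups_count(const struct item_hdr *hdr)
-+{
-+  if (hdr->size < sizeof(*hdr))
-+    return 0;  /* malformed item: smaller than its own header */
-+  return (size_t)((hdr->size - sizeof(*hdr)) / sizeof(uint32_t));
-+}
-+      ]]></programlisting>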
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Items used for policy entries, matches and notifications</title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
-+          <listitem><para>
-+            This item describes a <emphasis>policy access</emphasis> entry to
-+            access the policy database of a
-+            <citerefentry>
-+              <refentrytitle>kdbus.bus</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry> or
-+            <citerefentry>
-+              <refentrytitle>kdbus.endpoint</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>.
-+            Please refer to
-+            <citerefentry>
-+              <refentrytitle>kdbus.policy</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information on the policy database and how to access it.
-+            <programlisting>
-+struct kdbus_policy_access {
-+  __u64 type;
-+  __u64 access;
-+  __u64 id;
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_ID_ADD</constant></term>
-+          <term><constant>KDBUS_ITEM_ID_REMOVE</constant></term>
-+          <listitem><para>
-+            This item is sent as attachment to a
-+            <emphasis>kernel notification</emphasis> and indicates that a
-+            new connection was created on the bus, or that a connection was
-+            disconnected, respectively. It stores a
-+            <type>struct kdbus_notify_id_change</type> in
-+            <varname>item.id_change</varname>.
-+            The <varname>id</varname> field contains the numeric ID of the
-+            connection that was added or removed, and <varname>flags</varname>
-+            is set to the connection flags, as passed by
-+            <constant>KDBUS_CMD_HELLO</constant>. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.match</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            and
-+            <citerefentry>
-+              <refentrytitle>kdbus.message</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information on matches and notification messages.
-+            <programlisting>
-+struct kdbus_notify_id_change {
-+  __u64 id;
-+  __u64 flags;
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_NAME_ADD</constant></term>
-+          <term><constant>KDBUS_ITEM_NAME_REMOVE</constant></term>
-+          <term><constant>KDBUS_ITEM_NAME_CHANGE</constant></term>
-+          <listitem><para>
-+            This item is sent as attachment to a
-+            <emphasis>kernel notification</emphasis> and indicates that a
-+            <emphasis>well-known name</emphasis> appeared, disappeared or
-+            was transferred to another owner on the bus. It stores a
-+            <type>struct kdbus_notify_name_change</type> in
-+            <varname>item.name_change</varname>.
-+            <varname>old_id</varname> describes the former owner of the name
-+            and has its fields set to <constant>0</constant> in case of
-+            <constant>KDBUS_ITEM_NAME_ADD</constant>.
-+            <varname>new_id</varname> describes the new owner of the name and
-+            has its fields set to <constant>0</constant> in case of
-+            <constant>KDBUS_ITEM_NAME_REMOVE</constant>.
-+            The <varname>name</varname> field contains the well-known name the
-+            notification is about, as null-terminated string. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.match</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            and
-+            <citerefentry>
-+              <refentrytitle>kdbus.message</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information on matches and notification messages.
-+            <programlisting>
-+struct kdbus_notify_name_change {
-+  struct kdbus_notify_id_change old_id;
-+  struct kdbus_notify_id_change new_id;
-+  char name[0];
-+};
-+            </programlisting>
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_REPLY_TIMEOUT</constant></term>
-+          <listitem><para>
-+            This item is sent as attachment to a
-+            <emphasis>kernel notification</emphasis>. It informs the receiver
-+            that an expected reply to a message was not received in time.
-+            The remote peer ID and the message cookie are stored in the message
-+            header. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.message</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information about messages, timeouts and notifications.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ITEM_REPLY_DEAD</constant></term>
-+          <listitem><para>
-+            This item is sent as attachment to a
-+            <emphasis>kernel notification</emphasis>. It informs the receiver
-+            that a remote connection a reply is expected from was disconnected
-+            before that reply was sent. The remote peer ID and the message
-+            cookie are stored in the message header. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.message</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for more information about messages, timeouts and notifications.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>See Also</title>
-+    <simplelist type="inline">
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.bus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.connection</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.endpoint</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.fs</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.message</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.name</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.pool</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>memfd_create</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+    </simplelist>
-+  </refsect1>
-+
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.match.xml b/Documentation/kdbus/kdbus.match.xml
-new file mode 100644
-index 0000000..ae38e04
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.match.xml
-@@ -0,0 +1,555 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.match">
-+
-+  <refentryinfo>
-+    <title>kdbus.match</title>
-+    <productname>kdbus.match</productname>
-+  </refentryinfo>
-+
-+  <refmeta>
-+    <refentrytitle>kdbus.match</refentrytitle>
-+    <manvolnum>7</manvolnum>
-+  </refmeta>
-+
-+  <refnamediv>
-+    <refname>kdbus.match</refname>
-+    <refpurpose>kdbus match</refpurpose>
-+  </refnamediv>
-+
-+  <refsect1>
-+    <title>Description</title>
-+
-+    <para>
-+      kdbus connections can install matches in order to subscribe to signal
-+      messages sent on the bus. Such signal messages can be either directed
-+      to a single connection (by setting a specific connection ID in
-+      <varname>struct kdbus_msg.dst_id</varname> or by sending it to a
-+      well-known name), or to potentially <emphasis>all</emphasis> currently
-+      active connections on the bus (by setting
-+      <varname>struct kdbus_msg.dst_id</varname> to
-+      <constant>KDBUS_DST_ID_BROADCAST</constant>).
-+      A signal message always has the <constant>KDBUS_MSG_SIGNAL</constant>
-+      bit set in the <varname>flags</varname> bitfield.
-+      Also, signal messages can originate from either the kernel (called
-+      <emphasis>notifications</emphasis>), or from other bus connections.
-+      In either case, a bus connection needs to have a suitable
-+      <emphasis>match</emphasis> installed in order to receive any signal
-+      message. Without any rules installed in the connection, no signal message
-+      will be received.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Matches for signal messages from other connections</title>
-+    <para>
-+      Matches for messages from other connections (not kernel notifications)
-+      are implemented as bloom filters (see below). The sender adds certain
-+      properties of the message as elements to a bloom filter bit field, and
-+      sends that along with the signal message.
-+
-+      The receiving connection adds the message properties it is interested in
-+      as elements to a bloom mask bit field, and uploads the mask as match rule,
-+      possibly along with some other rules to further limit the match.
-+
-+      The kernel will match the signal message's bloom filter against the
-+      connection's bloom mask (simply by &amp;-ing it), and will decide whether
-+      the message should be delivered to a connection.
-+    </para>
-+    <para>
-+      The kernel has no notion of any specific properties of the signal message,
-+      all it sees are the bit fields of the bloom filter and the mask to match
-+      against. The use of bloom filters allows simple and efficient matching,
-+      without exposing any message properties or internals to the kernel side.
-+      Clients need to deal with the fact that they might receive signal messages
-+      which they did not subscribe to, as the bloom filter might allow
-+      false-positives to pass the filter.
-+
-+      To allow the future extension of the set of elements in the bloom filter,
-+      the filter specifies a <emphasis>generation</emphasis> number. A later
-+      generation must always contain all elements of the set of the previous
-+      generation, but can add new elements to the set. The match rules mask can
-+      carry an array with all previous generations of masks individually stored.
-+      When the filter and mask are matched by the kernel, the mask with the
-+      closest matching generation is selected as the index into the mask array.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Bloom filters</title>
-+    <para>
-+      Bloom filters allow checking whether a given word is present in a
-+      dictionary. This allows a connection to set up a mask for the
-+      information it is interested in; it will then be delivered signal
-+      messages that have a matching filter.
-+
-+      For general information, see
-+      <ulink url="https://en.wikipedia.org/wiki/Bloom_filter">the Wikipedia
-+      article on bloom filters</ulink>.
-+    </para>
-+    <para>
-+      The size of the bloom filter is defined per bus when it is created, in
-+      <varname>kdbus_bloom_parameter.size</varname>. All bloom filters attached
-+      to signal messages on the bus must match this size, and all bloom filter
-+      matches uploaded by connections must also match the size, or a multiple
-+      thereof (see below).
-+
-+      The calculation of the mask has to be done in userspace applications. The
-+      kernel just checks the bitmasks to decide whether or not to let the
-+      message pass. All bits set in the filter must also be set in the mask
-+      (bit-wise <emphasis>AND</emphasis> logic), but the mask may have more
-+      bits set than the filter. Consequently, false positive matches are
-+      expected to happen, and programs must deal with that fact by checking
-+      the contents of the payload again at receive time.
-+    </para>
-+    <para>
-+      Masks are entities that are always passed to the kernel as part of a
-+      match (with an item of type <constant>KDBUS_ITEM_BLOOM_MASK</constant>),
-+      and filters can be attached to signals, with an item of type
-+      <constant>KDBUS_ITEM_BLOOM_FILTER</constant>. For a filter to match, all
-+      its bits have to be set in the match mask as well.
-+    </para>
-+    <para>
-+      For example, consider a bus that has a bloom size of 8 bytes, and the
-+      following mask/filter combinations:
-+    </para>
-+    <programlisting><![CDATA[
-+          filter  0x0101010101010101
-+          mask    0x0101010101010101
-+                  -> matches
-+
-+          filter  0x0303030303030303
-+          mask    0x0101010101010101
-+                  -> doesn't match
-+
-+          filter  0x0101010101010101
-+          mask    0x0303030303030303
-+                  -> matches
-+    ]]></programlisting>
-+
-+    <para>
-+      Hence, in order to catch all messages, a mask filled with
-+      <constant>0xff</constant> bytes can be installed as a wildcard match rule.
-+    </para>
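-+
-+    <para>
-+      The rule stated above (every bit set in the filter must also be set in
-+      the mask) can be sketched in C as follows. This only illustrates the
-+      text of this section and is not taken from the kernel implementation;
-+      the function name <function>bloom_filter_passes()</function> and the
-+      layout as an array of 64-bit words are assumptions made for the example.
-+    </para>
-+    <programlisting><![CDATA[
-+#include <stdbool.h>
-+#include <stddef.h>
-+#include <stdint.h>
-+
-+/*
-+ * Returns true if every bit set in the filter is also set in the mask,
-+ * comparing n_words 64-bit words.
-+ */
-+static bool bloom_filter_passes(const uint64_t *filter,
-+                                const uint64_t *mask, size_t n_words)
-+{
-+  size_t i;
-+
-+  for (i = 0; i < n_words; i++)
-+    if ((filter[i] & mask[i]) != filter[i])
-+      return false;
-+  return true;
-+}
-+    ]]></programlisting>
-+    <para>
-+      With a bloom size of 8 bytes, <varname>n_words</varname> is 1, and the
-+      three filter/mask combinations listed above yield true, false and true,
-+      in that order.
-+    </para>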
-+
-+    <refsect2>
-+      <title>Generations</title>
-+
-+      <para>
-+        Uploaded matches may contain multiple masks, which have to be as large
-+        as the bloom filter size defined by the bus. Each block of a mask is
-+        called a <emphasis>generation</emphasis>, starting at index 0.
-+
-+        At match time, when a signal is about to be delivered, a bloom mask
-+        generation is passed, which denotes which of the bloom masks the filter
-+        should be matched against. This allows programs to provide backward
-+        compatible masks at upload time, while older clients can still match
-+        against older versions of filters.
-+      </para>
-+    </refsect2>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Matches for kernel notifications</title>
-+    <para>
-+      To receive kernel generated notifications (see
-+      <citerefentry>
-+        <refentrytitle>kdbus.message</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>),
-+      a connection must install match rules that are different from
-+      the bloom filter matches described in the section above. They can be
-+      filtered by the connection ID that caused the notification to be sent, by
-+      one of the names it currently owns, or by the type of the notification
-+      (ID/name add/remove/change).
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Adding a match</title>
-+    <para>
-+      To add a match, the <constant>KDBUS_CMD_MATCH_ADD</constant> ioctl is
-+      used, which takes a <type>struct kdbus_cmd_match</type> as an argument
-+      described below.
-+
-+      Note that each of the items attached to this command will internally
-+      create one match <emphasis>rule</emphasis>, and the collection of them,
-+      which is submitted as one block via the ioctl, is called a
-+      <emphasis>match</emphasis>. To allow a message to pass, all rules of a
-+      match have to be satisfied. Hence, adding more items to the command will
-+      only narrow the possibility of a match to effectively let the message
-+      pass, and will decrease the chance that the connection's process will be
-+      woken up needlessly.
-+
-+      Multiple matches can be installed per connection. As long as one of them
-+      has a set of rules which allows the message to pass, this one will be
-+      decisive.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_cmd_match {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  __u64 cookie;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>Flags to control the behavior of the ioctl.</para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_MATCH_REPLACE</constant></term>
-+              <listitem>
-+                <para>
-+                  Remove any existing match with the same cookie before
-+                  installing the new one.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+              <listitem>
-+                <para>
-+                  Requests a set of valid flags for this ioctl. When this bit is
-+                  set, no action is taken; the ioctl will return
-+                  <errorcode>0</errorcode>, and the <varname>flags</varname>
-+                  field will have all bits set that are valid for this command.
-+                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+                  cleared by the operation.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>cookie</varname></term>
-+        <listitem><para>
-+          A cookie which identifies the match, so it can be referred to when
-+          removing it.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+        <para>
-+          Items to define the actual rules of the matches. The following item
-+          types are expected. Each item will create one new match rule.
-+        </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_BLOOM_MASK</constant></term>
-+              <listitem>
-+                <para>
-+                  An item that carries the bloom filter mask to match against
-+                  in its data field. The payload size must match the bloom
-+                  filter size that was specified when the bus was created.
-+                  See the "Bloom filters" section above for more information on
-+                  bloom filters.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NAME</constant></term>
-+              <listitem>
-+                <para>
-+                  When used as part of kernel notifications, this item specifies
-+                  a name that is acquired, lost or that changed its owner (see
-+                  below). When used as part of a match for user-generated signal
-+                  messages, it specifies a name that the sending connection must
-+                  own at the time of sending the signal.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_ID</constant></term>
-+              <listitem>
-+                <para>
-+                  Specify a sender connection's ID that will match this rule.
-+                  For kernel notifications, this specifies the ID of a
-+                  connection that was added to or removed from the bus.
-+                  For user-generated signals, it specifies the ID of the
-+                  connection that sent the signal message.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NAME_ADD</constant></term>
-+              <term><constant>KDBUS_ITEM_NAME_REMOVE</constant></term>
-+              <term><constant>KDBUS_ITEM_NAME_CHANGE</constant></term>
-+              <listitem>
-+                <para>
-+                  These items request delivery of kernel notifications that
-+                  describe a name acquisition, loss, or change. The details
-+                  are stored in the item's
-+                  <varname>kdbus_notify_name_change</varname> member.
-+                  All information specified must be matched in order to make
-+                  the message pass. Use
-+                  <constant>KDBUS_MATCH_ID_ANY</constant> to
-+                  match against any unique connection ID.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_ID_ADD</constant></term>
-+              <term><constant>KDBUS_ITEM_ID_REMOVE</constant></term>
-+              <listitem>
-+                <para>
-+                  These items request delivery of kernel notifications that are
-+                  generated when a connection is created or terminated.
-+                  <type>struct kdbus_notify_id_change</type> is used to
-+                  store the actual match information. This item can be used to
-+                  monitor one particular connection ID, or, when the ID field
-+                  is set to <constant>KDBUS_MATCH_ID_ANY</constant>,
-+                  all of them.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+              <listitem><para>
-+                With this item, programs can <emphasis>probe</emphasis> the
-+                kernel for known item types. See
-+                <citerefentry>
-+                  <refentrytitle>kdbus.item</refentrytitle>
-+                  <manvolnum>7</manvolnum>
-+                </citerefentry>
-+                for more details.
-+              </para></listitem>
-+            </varlistentry>
-+          </variablelist>
-+
-+          <para>
-+            Unrecognized items are rejected, and the ioctl will fail with
-+            <varname>errno</varname> set to <constant>EINVAL</constant>.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      Refer to
-+      <citerefentry>
-+        <refentrytitle>kdbus.message</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for more information on message types.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Removing a match</title>
-+    <para>
-+      Matches can be removed with the
-+      <constant>KDBUS_CMD_MATCH_REMOVE</constant> ioctl, which takes
-+      <type>struct kdbus_cmd_match</type> as argument, but the use of its
-+      fields differs slightly from that of
-+      <constant>KDBUS_CMD_MATCH_ADD</constant>.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_cmd_match {
-+  __u64 size;
-+  __u64 cookie;
-+  __u64 flags;
-+  __u64 return_flags;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>cookie</varname></term>
-+        <listitem><para>
-+          The cookie of the match, as it was passed when the match was added.
-+          All matches that have this cookie will be removed.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>
-+          No flags are supported for this use case.
-+          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+          valid flags. If set, the ioctl will fail with
-+          <errorcode>-1</errorcode>, <varname>errno</varname> is set to
-+          <constant>EPROTO</constant>, and the <varname>flags</varname> field
-+          is set to <constant>0</constant>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            No items are supported for this use case, but
-+            <constant>KDBUS_ITEM_NEGOTIATE</constant> is allowed nevertheless.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Return value</title>
-+    <para>
-+      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+      on error, <errorcode>-1</errorcode> is returned, and
-+      <varname>errno</varname> is set to indicate the error.
-+      If the issued ioctl is illegal for the file descriptor used,
-+      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+    </para>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_MATCH_ADD</constant> may fail with the following
-+        errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            Illegal flags or items.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EDOM</constant></term>
-+          <listitem><para>
-+            Illegal bloom filter size.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EMFILE</constant></term>
-+          <listitem><para>
-+            Too many matches for this connection.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_MATCH_REMOVE</constant> may fail with the following
-+        errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            Illegal flags.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EBADSLT</constant></term>
-+          <listitem><para>
-+            A match entry with the given cookie could not be found.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>See Also</title>
-+    <simplelist type="inline">
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.bus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.match</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.fs</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.item</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.message</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.name</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.pool</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+    </simplelist>
-+  </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.message.xml b/Documentation/kdbus/kdbus.message.xml
-new file mode 100644
-index 0000000..0115d9d
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.message.xml
-@@ -0,0 +1,1276 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.message">
-+
-+  <refentryinfo>
-+    <title>kdbus.message</title>
-+    <productname>kdbus.message</productname>
-+  </refentryinfo>
-+
-+  <refmeta>
-+    <refentrytitle>kdbus.message</refentrytitle>
-+    <manvolnum>7</manvolnum>
-+  </refmeta>
-+
-+  <refnamediv>
-+    <refname>kdbus.message</refname>
-+    <refpurpose>kdbus message</refpurpose>
-+  </refnamediv>
-+
-+  <refsect1>
-+    <title>Description</title>
-+
-+    <para>
-+      A kdbus message is used to exchange information between two connections
-+      on a bus, or to transport notifications from the kernel to one or many
-+      connections. This document describes the layout of messages, how payload
-+      is added to them and how they are sent and received.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Message layout</title>
-+
-+    <para>The layout of a message is shown below.</para>
-+
-+    <programlisting>
-+  +-------------------------------------------------------------------------+
-+  | Message                                                                 |
-+  | +---------------------------------------------------------------------+ |
-+  | | Header                                                              | |
-+  | | size:          overall message size, including the data records     | |
-+  | | destination:   connection ID of the receiver                        | |
-+  | | source:        connection ID of the sender (set by kernel)          | |
-+  | | payload_type:  "DBusDBus" textual identifier stored as uint64_t     | |
-+  | +---------------------------------------------------------------------+ |
-+  | +---------------------------------------------------------------------+ |
-+  | | Data Record                                                         | |
-+  | | size:  overall record size (without padding)                        | |
-+  | | type:  type of data                                                 | |
-+  | | data:  reference to data (address or file descriptor)               | |
-+  | +---------------------------------------------------------------------+ |
-+  | +---------------------------------------------------------------------+ |
-+  | | padding bytes to the next 8 byte alignment                          | |
-+  | +---------------------------------------------------------------------+ |
-+  | +---------------------------------------------------------------------+ |
-+  | | Data Record                                                         | |
-+  | | size:  overall record size (without padding)                        | |
-+  | | ...                                                                 | |
-+  | +---------------------------------------------------------------------+ |
-+  | +---------------------------------------------------------------------+ |
-+  | | padding bytes to the next 8 byte alignment                          | |
-+  | +---------------------------------------------------------------------+ |
-+  | +---------------------------------------------------------------------+ |
-+  | | Data Record                                                         | |
-+  | | size:  overall record size                                          | |
-+  | | ...                                                                 | |
-+  | +---------------------------------------------------------------------+ |
-+  |   ... further data records ...                                          |
-+  +-------------------------------------------------------------------------+
-+    </programlisting>
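-+
-+    <para>
-+      As a minimal sketch of the padding rule shown above, the following
-+      helper walks a buffer of data records and advances by each record's
-+      size rounded up to the next multiple of 8. The names
-+      <type>struct record_hdr</type>, <function>align8()</function> and
-+      <function>count_records()</function> are illustrative assumptions and
-+      are not part of the kdbus API; only the size and type header fields
-+      from the diagram are assumed.
-+    </para>
-+
-+    <programlisting><![CDATA[
-+#include <stdint.h>
-+#include <string.h>
-+
-+/* Illustrative record header; not the kdbus UAPI definition. */
-+struct record_hdr {
-+  uint64_t size;  /* record size, without trailing padding */
-+  uint64_t type;  /* type of data */
-+};
-+
-+/* Round n up to the next multiple of 8. */
-+static uint64_t align8(uint64_t n)
-+{
-+  return (n + 7) & ~UINT64_C(7);
-+}
-+
-+/* Count the records in buf[0..len); return -1 if a record is malformed. */
-+static int count_records(const uint8_t *buf, uint64_t len)
-+{
-+  uint64_t off = 0;
-+  int n = 0;
-+
-+  while (off < len) {
-+    struct record_hdr hdr;
-+
-+    if (len - off < sizeof(hdr))
-+      return -1;
-+    memcpy(&hdr, buf + off, sizeof(hdr));
-+    if (hdr.size < sizeof(hdr) || hdr.size > len - off)
-+      return -1;
-+    off += align8(hdr.size);  /* skip the record and its padding */
-+    n++;
-+  }
-+  return n;
-+}
-+]]></programlisting>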
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Message payload</title>
-+
-+    <para>
-+      When connecting to the bus, receivers request a memory pool of a given
-+      size, large enough to hold the entire backlog of data enqueued for the
-+      connection. The pool is internally backed by a shared memory file which
-+      can be <function>mmap()</function>ed by the receiver. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.pool</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for more information.
-+    </para>
-+
-+    <para>
-+      Message payload must be described in items attached to a message when
-+      it is sent. A receiver can access the payload by looking at the items
-+      that are attached to a message in its pool. The following items are used.
-+    </para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><constant>KDBUS_ITEM_PAYLOAD_VEC</constant></term>
-+        <listitem>
-+          <para>
-+            This item references a piece of memory on the sender side which is
-+            directly copied into the receiver's pool. This way, two peers can
-+            exchange data by effectively doing a single-copy from one process
-+            to another; the kernel will not buffer the data anywhere else.
-+            This item is never found in a message received by a connection.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><constant>KDBUS_ITEM_PAYLOAD_OFF</constant></term>
-+        <listitem>
-+          <para>
-+            This item is attached to messages on the receiving side and points
-+            to a memory area inside the receiver's pool. The
-+            <varname>offset</varname> variable in the item denotes the memory
-+            location relative to the message itself.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant></term>
-+        <listitem>
-+          <para>
-+            Messages can reference <emphasis>memfd</emphasis> files which
-+            contain the data. memfd files are tmpfs-backed files that allow
-+            sealing of the content of the file, which prevents all writable
-+            access to the file content.
-+          </para>
-+          <para>
-+            Only memfds that have
-+            <constant>(F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_SEAL)
-+            </constant>
-+            set are accepted as payload data, which enforces reliable passing of
-+            data. The receiver can assume that neither the sender nor anyone
-+            else can alter the content after the message is sent. If those
-+            seals are not set on the memfd, the ioctl will fail with
-+            <errorcode>-1</errorcode>, and <varname>errno</varname> will be
-+            set to <constant>ETXTBSY</constant>.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><constant>KDBUS_ITEM_FDS</constant></term>
-+        <listitem>
-+          <para>
-+            Messages can transport regular file descriptors via
-+            <constant>KDBUS_ITEM_FDS</constant>. This item carries an array
-+            of <type>int</type> values in <varname>item.fd</varname>. The
-+            maximum number of file descriptors in the item is
-+            <constant>253</constant>, and only one item of this type is
-+            accepted per message. All passed values must be valid file
-+            descriptors; the open count of each file descriptor is increased
-+            by installing it to the receiver's task. This item can only be
-+            used for directed messages, not for broadcasts, and only to
-+            remote peers that have opted-in for receiving file descriptors
-+            at connection time (<constant>KDBUS_HELLO_ACCEPT_FD</constant>).
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      The sender must not make any assumptions about the form in which data is
-+      received by the remote peer. The kernel is free to re-pack multiple
-+      <constant>KDBUS_ITEM_PAYLOAD_VEC</constant> and
-+      <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant> payloads. For instance, the
-+      kernel may decide to merge multiple <constant>VECs</constant> into a
-+      single <constant>VEC</constant>, inline <constant>MEMFD</constant>
-+      payloads into memory, or merge all passed <constant>VECs</constant> into a
-+      single <constant>MEMFD</constant>. However, the kernel preserves the order
-+      of passed data. This means that the order of all <constant>VEC</constant>
-+      and <constant>MEMFD</constant> items is not changed with respect to each
-+      other. In other words: All passed <constant>VEC</constant> and
-+      <constant>MEMFD</constant> data payloads are treated as a single stream
-+      of data that may be received by the remote peer in a different set of
-+      chunks than it was sent as.
-+    </para>
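-+
-+    <para>
-+      The sealing requirement for
-+      <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant> can be sketched as
-+      follows, assuming a C library that provides
-+      <function>memfd_create()</function>. The helper name
-+      <function>make_sealed_memfd()</function> and the file name
-+      "kdbus-payload" are illustrative assumptions; attaching the resulting
-+      file descriptor to a message is not shown.
-+    </para>
-+
-+    <programlisting><![CDATA[
-+#define _GNU_SOURCE
-+#include <fcntl.h>
-+#include <sys/mman.h>
-+#include <unistd.h>
-+
-+/*
-+ * Create a memfd holding data[0..len) and apply the four seals listed
-+ * above. Returns the file descriptor, or -1 on failure.
-+ */
-+static int make_sealed_memfd(const void *data, size_t len)
-+{
-+  int fd = memfd_create("kdbus-payload", MFD_CLOEXEC | MFD_ALLOW_SEALING);
-+
-+  if (fd < 0)
-+    return -1;
-+
-+  /* A short write is treated as a failure in this sketch. */
-+  if (write(fd, data, len) != (ssize_t)len ||
-+      fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW |
-+                             F_SEAL_WRITE | F_SEAL_SEAL) < 0) {
-+    close(fd);
-+    return -1;
-+  }
-+
-+  return fd;
-+}
-+]]></programlisting>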
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Sending messages</title>
-+
-+    <para>
-+      Messages are passed to the kernel with the
-+      <constant>KDBUS_CMD_SEND</constant> ioctl. Depending on the destination
-+      address of the message, the kernel delivers the message to the specific
-+      destination connection, or to some subset of all connections on the same
-+      bus. Sending messages across buses is not possible. Messages are always
-+      queued in the memory pool of the destination connection (see above).
-+    </para>
-+
-+    <para>
-+      The <constant>KDBUS_CMD_SEND</constant> ioctl uses a
-+      <type>struct kdbus_cmd_send</type> to describe the message
-+      transfer.
-+    </para>
-+    <programlisting>
-+struct kdbus_cmd_send {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  __u64 msg_address;
-+  struct kdbus_msg_info reply;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>Flags for message delivery.</para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_SEND_SYNC_REPLY</constant></term>
-+              <listitem>
-+                <para>
-+                  By default, all calls to kdbus are asynchronous and
-+                  non-blocking. However, as there are many use cases that need
-+                  to wait for a remote peer to answer a method call, there's a
-+                  way to send a message and wait for a reply in a synchronous
-+                  fashion. This is what the
-+                  <constant>KDBUS_SEND_SYNC_REPLY</constant> flag controls. The
-+                  <constant>KDBUS_CMD_SEND</constant> ioctl will block until the
-+                  reply has arrived, the timeout limit is reached, the remote
-+                  connection is shut down, or the call is interrupted by a
-+                  signal before any reply arrives; see
-+                  <citerefentry>
-+                    <refentrytitle>signal</refentrytitle>
-+                    <manvolnum>7</manvolnum>
-+                  </citerefentry>.
-+
-+                  The offset of the reply message in the sender's pool is stored
-+                  in <varname>reply</varname> when the ioctl has returned without
-+                  error. Hence, there is no need for another
-+                  <constant>KDBUS_CMD_RECV</constant> ioctl or anything else to
-+                  receive the reply.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+              <listitem>
-+                <para>
-+                  Request a set of valid flags for this ioctl. When this bit is
-+                  set, no action is taken; the ioctl will fail with
-+                  <errorcode>-1</errorcode>, <varname>errno</varname>
-+                  is set to <constant>EPROTO</constant>.
-+                  Once the ioctl has returned, the <varname>flags</varname>
-+                  field will have all bits set that the kernel recognizes as
-+                  valid for this command.
-+                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+                  cleared by the operation.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>msg_address</varname></term>
-+        <listitem><para>
-+          In this field, users have to provide a pointer to a message
-+          (<type>struct kdbus_msg</type>) to send. See below for a
-+          detailed description.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>reply</varname></term>
-+        <listitem><para>
-+          Only used for synchronous replies. See description of
-+          <type>struct kdbus_cmd_recv</type> for more details.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            The following items are currently recognized.
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_CANCEL_FD</constant></term>
-+              <listitem>
-+                <para>
-+                  When this optional item is passed in and the call is
-+                  synchronous, the passed-in file descriptor can be used as
-+                  an alternative cancellation point. The kernel will call
-+                  <citerefentry>
-+                    <refentrytitle>poll</refentrytitle>
-+                    <manvolnum>2</manvolnum>
-+                  </citerefentry>
-+                  on this file descriptor, and once it reports any incoming
-+                  bytes, the blocking send operation will be canceled; the
-+                  blocking, synchronous ioctl call will return
-+                  <errorcode>-1</errorcode>, and <varname>errno</varname> will
-+                  be set to <errorname>ECANCELED</errorname>.
-+                  Any type of file descriptor on which
-+                  <citerefentry>
-+                    <refentrytitle>poll</refentrytitle>
-+                    <manvolnum>2</manvolnum>
-+                  </citerefentry>
-+                  can be called can be used as payload to this item; for
-+                  example, an eventfd can be used for this purpose, see
-+                  <citerefentry>
-+                    <refentrytitle>eventfd</refentrytitle>
-+                    <manvolnum>2</manvolnum>
-+                  </citerefentry>.
-+                  For asynchronous message sending, this item is allowed but
-+                  ignored.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+          <para>
-+            Unrecognized items are rejected, and the ioctl will fail with
-+            <varname>errno</varname> set to <constant>EINVAL</constant>.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      The message referenced by the <varname>msg_address</varname> above has
-+      the following layout.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_msg {
-+  __u64 size;
-+  __u64 flags;
-+  __s64 priority;
-+  __u64 dst_id;
-+  __u64 src_id;
-+  __u64 payload_type;
-+  __u64 cookie;
-+  __u64 timeout_ns;
-+  __u64 cookie_reply;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>Flags to describe message details.</para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_MSG_EXPECT_REPLY</constant></term>
-+              <listitem>
-+                <para>
-+                  Expect a reply to this message from the remote peer. With
-+                  this bit set, the timeout_ns field must be set to a non-zero
-+                  number of nanoseconds in which the receiving peer is expected
-+                  to reply. If such a reply is not received in time, the sender
-+                  will be notified with a timeout message (see below). The
-+                  value must be an absolute value, in nanoseconds and based on
-+                  <constant>CLOCK_MONOTONIC</constant>.
-+                </para><para>
-+                  For a message to be accepted as reply, it must be a direct
-+                  message to the original sender (not a broadcast and not a
-+                  signal message), and its
-+                  <varname>kdbus_msg.cookie_reply</varname> must match the
-+                  previous message's <varname>kdbus_msg.cookie</varname>.
-+                </para><para>
-+                  Expected replies also temporarily open the policy of the
-+                  sending connection, so the other peer is allowed to respond
-+                  within the given time window.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_MSG_NO_AUTO_START</constant></term>
-+              <listitem>
-+                <para>
-+                  By default, when a message is sent to an activator
-+                  connection, the activator is notified and will start an
-+                  implementer. This flag inhibits that behavior. With this bit
-+                  set, and the remote being an activator, the ioctl will fail
-+                  with <varname>errno</varname> set to
-+                  <constant>EADDRNOTAVAIL</constant>.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+              <listitem>
-+                <para>
-+                  Requests a set of valid flags for this ioctl. When this bit is
-+                  set, no action is taken; the ioctl will return
-+                  <errorcode>0</errorcode>, and the <varname>flags</varname>
-+                  field will have all bits set that are valid for this command.
-+                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+                  cleared by the operation.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>priority</varname></term>
-+        <listitem><para>
-+          The priority of this message. Receiving messages (see below) may
-+          optionally be constrained to messages of a minimal priority. This
-+          allows for use cases where timing critical data is interleaved with
-+          control data on the same connection. If unused, the priority field
-+          should be set to <constant>0</constant>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>dst_id</varname></term>
-+        <listitem><para>
-+          The numeric ID of the destination connection, or
-+          <constant>KDBUS_DST_ID_BROADCAST</constant>
-+          (~0ULL) to address every peer on the bus, or
-+          <constant>KDBUS_DST_ID_NAME</constant> (0) to look
-+          it up dynamically from the bus' name registry.
-+          In the latter case, an item of type
-+          <constant>KDBUS_ITEM_DST_NAME</constant> is mandatory.
-+          Also see
-+          <citerefentry>
-+            <refentrytitle>kdbus.name</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>src_id</varname></term>
-+        <listitem><para>
-+          Upon return of the ioctl, this member will contain the sending
-+          connection's numerical ID. It should be set to 0 at send time.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>payload_type</varname></term>
-+        <listitem><para>
-+          Type of the payload in the actual data records. Currently, only
-+          <constant>KDBUS_PAYLOAD_DBUS</constant> is accepted as input value
-+          of this field. When receiving messages that are generated by the
-+          kernel (notifications), this field will contain
-+          <constant>KDBUS_PAYLOAD_KERNEL</constant>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>cookie</varname></term>
-+        <listitem><para>
-+          Cookie of this message, for later recognition. Also, when replying
-+          to a message (see above), the <varname>cookie_reply</varname>
-+          field must match this value.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>timeout_ns</varname></term>
-+        <listitem><para>
-+          If the message sent requires a reply from the remote peer (see above),
-+          this field contains the timeout in absolute nanoseconds based on
-+          <constant>CLOCK_MONOTONIC</constant>. Also see
-+          <citerefentry>
-+            <refentrytitle>clock_gettime</refentrytitle>
-+            <manvolnum>2</manvolnum>
-+          </citerefentry>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>cookie_reply</varname></term>
-+        <listitem><para>
-+          If the message sent is a reply to another message, this field must
-+          match the cookie of the formerly received message.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            A dynamically sized list of items to contain additional information.
-+            The following items are expected/valid:
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_PAYLOAD_VEC</constant></term>
-+              <term><constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant></term>
-+              <term><constant>KDBUS_ITEM_FDS</constant></term>
-+              <listitem>
-+                <para>
-+                  Actual data records containing the payload. See section
-+                  "Message payload".
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_BLOOM_FILTER</constant></term>
-+              <listitem>
-+                <para>
-+                  Bloom filter for matches (see below).
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_ITEM_DST_NAME</constant></term>
-+              <listitem>
-+                <para>
-+                  Well-known name to send this message to. Required if
-+                  <varname>dst_id</varname> is set to
-+                  <constant>KDBUS_DST_ID_NAME</constant>.
-+                  If a connection holding the given name can't be found,
-+                  the ioctl will fail with <varname>errno</varname> set to
-+                  <constant>ESRCH</constant>.
-+                </para>
-+                <para>
-+                  For messages to a unique name (ID), this item is optional. If
-+                  present, the kernel will make sure the name owner matches the
-+                  given unique name. This allows programs to tie the message
-+                  sending to the condition that a name is currently owned by a
-+                  certain unique name.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      The message will be augmented by the requested metadata items when
-+      queued into the receiver's pool. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.connection</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      and
-+      <citerefentry>
-+        <refentrytitle>kdbus.item</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for more information on metadata.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Receiving messages</title>
-+
-+    <para>
-+      Messages are received by the client with the
-+      <constant>KDBUS_CMD_RECV</constant> ioctl. The endpoint file of the bus
-+      supports <function>poll()/epoll()/select()</function>; when new messages
-+      are available on the connection's file descriptor,
-+      <constant>POLLIN</constant> is reported. For compatibility reasons,
-+      <constant>POLLOUT</constant> is always reported as well. Note, however,
-+      that the latter does not guarantee that a message can in fact be sent, as
-+      this depends on how many pending messages the receiver has in its pool.
-+    </para>
-+
-+    <para>
-+      With the <constant>KDBUS_CMD_RECV</constant> ioctl, a
-+      <type>struct kdbus_cmd_recv</type> is used.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_cmd_recv {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  __s64 priority;
-+  __u64 dropped_msgs;
-+  struct kdbus_msg_info msg;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>Flags to control the receive command.</para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_RECV_PEEK</constant></term>
-+              <listitem>
-+                <para>
-+                  Just return the location of the next message. Do not install
-+                  file descriptors or anything else. This is usually used to
-+                  determine the sender of the next queued message.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_RECV_DROP</constant></term>
-+              <listitem>
-+                <para>
-+                  Drop the next message without doing anything else with it,
-+                  and free the pool slice. This is a shortcut for
-+                  <constant>KDBUS_RECV_PEEK</constant> and
-+                  <constant>KDBUS_CMD_FREE</constant>.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_RECV_USE_PRIORITY</constant></term>
-+              <listitem>
-+                <para>
-+                  Dequeue messages ordered by their priority, and filter
-+                  them by the priority field (see below).
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+              <listitem>
-+                <para>
-+                  Request a set of valid flags for this ioctl. When this bit is
-+                  set, no action is taken; the ioctl will fail with
-+                  <errorcode>-1</errorcode>, and <varname>errno</varname>
-+                  is set to <constant>EPROTO</constant>.
-+                  Once the ioctl has returned, the <varname>flags</varname>
-+                  field will have all bits set that the kernel recognizes as
-+                  valid for this command.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. If the <varname>dropped_msgs</varname>
-+          field is non-zero, <constant>KDBUS_RECV_RETURN_DROPPED_MSGS</constant>
-+          is set. If a file descriptor could not be installed, the
-+          <constant>KDBUS_RECV_RETURN_INCOMPLETE_FDS</constant> flag is set.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>priority</varname></term>
-+        <listitem><para>
-+          With <constant>KDBUS_RECV_USE_PRIORITY</constant> set in
-+          <varname>flags</varname>, messages will be dequeued ordered by their
-+          priority, starting with the highest value. Also, messages will be
-+          filtered by the value given in this field, so the returned message
-+          will have at least the requested priority. If no such message is
-+          waiting in the queue, the ioctl will fail, and
-+          <varname>errno</varname> will be set to <constant>EAGAIN</constant>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>dropped_msgs</varname></term>
-+        <listitem><para>
-+          Whenever a message with <constant>KDBUS_MSG_SIGNAL</constant> is sent
-+          but cannot be queued on a peer (e.g., as it contains FDs but the peer
-+          does not support FDs, or there is no space left in the peer's pool)
-+          the 'dropped_msgs' counter of the peer is incremented. On the next
-+          RECV ioctl, the 'dropped_msgs' field is copied into the ioctl struct
-+          and cleared on the peer. If it was non-zero, the
-+          <constant>KDBUS_RECV_RETURN_DROPPED_MSGS</constant> flag will be set
-+          in <varname>return_flags</varname>. Note that this will only happen
-+          if the ioctl succeeded or failed with <constant>EAGAIN</constant>. In
-+          other error cases, the 'dropped_msgs' field of the peer is left
-+          untouched.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>msg</varname></term>
-+        <listitem><para>
-+          Embedded struct containing information on the received message when
-+          this command succeeded (see below).
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem><para>
-+          Items to specify further details for the receive command.
-+          Currently unused, and all items will be rejected with
-+          <varname>errno</varname> set to <constant>EINVAL</constant>.
-+        </para></listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      Both <type>struct kdbus_cmd_recv</type> and
-+      <type>struct kdbus_cmd_send</type> embed
-+      <type>struct kdbus_msg_info</type>.
-+      For the <constant>KDBUS_CMD_SEND</constant> ioctl, it is used to catch
-+      synchronous replies, if one was requested, and is unused otherwise.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_msg_info {
-+  __u64 offset;
-+  __u64 msg_size;
-+  __u64 return_flags;
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>offset</varname></term>
-+        <listitem><para>
-+          Upon return of the ioctl, this field contains the offset in the
-+          receiver's memory pool. The memory must be freed with
-+          <constant>KDBUS_CMD_FREE</constant>. See
-+          <citerefentry>
-+            <refentrytitle>kdbus.pool</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>
-+          for further details.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>msg_size</varname></term>
-+        <listitem><para>
-+          Upon successful return of the ioctl, this field contains the size of
-+          the allocated slice at offset <varname>offset</varname>.
-+          It is the combination of the size of the stored
-+          <type>struct kdbus_msg</type> object plus all appended VECs.
-+          You can use it in combination with <varname>offset</varname> to map
-+          a single message, instead of mapping the entire pool. See
-+          <citerefentry>
-+            <refentrytitle>kdbus.pool</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>
-+          for further details.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem>
-+          <para>
-+            Kernel-provided return flags. Currently, the following flags are
-+            defined.
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_RECV_RETURN_INCOMPLETE_FDS</constant></term>
-+              <listitem>
-+                <para>
-+                  The message contained memfds or file descriptors, and the
-+                  kernel failed to install one or more of them at receive time.
-+                  Most probably that happened because the maximum number of
-+                  file descriptors for the receiver's task was exceeded.
-+                  In such cases, the message is still delivered, so this is not
-+                  a fatal condition. File descriptor numbers inside the
-+                  <constant>KDBUS_ITEM_FDS</constant> item or memfd files
-+                  referenced by <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant>
-+                  items which could not be installed will be set to
-+                  <constant>-1</constant>.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      Unless <constant>KDBUS_RECV_DROP</constant> was passed, the
-+      <varname>offset</varname> field contains the location of the new message
-+      inside the receiver's pool after the <constant>KDBUS_CMD_RECV</constant>
-+      ioctl returns. The message is stored as <type>struct kdbus_msg</type>
-+      at this offset, and can be interpreted with the semantics described above.
-+    </para>
-+    <para>
-+      Also, if the connection allowed for file descriptors to be passed
-+      (<constant>KDBUS_HELLO_ACCEPT_FD</constant>), and if the message contained
-+      any, they will be installed into the receiving process when the
-+      <constant>KDBUS_CMD_RECV</constant> ioctl is called.
-+      <emphasis>memfds</emphasis> may always be part of the message payload.
-+      The receiving task is obliged to close all file descriptors appropriately
-+      once no longer needed. If <constant>KDBUS_RECV_PEEK</constant> is set, no
-+      file descriptors are installed. This allows for peeking at a message,
-+      looking at its metadata only and dropping it via
-+      <constant>KDBUS_RECV_DROP</constant>, without installing any of the file
-+      descriptors into the receiving process.
-+    </para>
-+    <para>
-+      The caller is obliged to call the <constant>KDBUS_CMD_FREE</constant>
-+      ioctl with the returned offset when the memory is no longer needed.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Notifications</title>
-+    <para>
-+      A kernel notification is a regular kdbus message with the following
-+      details.
-+    </para>
-+
-+    <itemizedlist>
-+      <listitem><para>
-+        kdbus_msg.src_id == <constant>KDBUS_SRC_ID_KERNEL</constant>
-+      </para></listitem>
-+      <listitem><para>
-+        kdbus_msg.dst_id == <constant>KDBUS_DST_ID_BROADCAST</constant>
-+      </para></listitem>
-+      <listitem><para>
-+        kdbus_msg.payload_type == <constant>KDBUS_PAYLOAD_KERNEL</constant>
-+      </para></listitem>
-+      <listitem><para>
-+        Has exactly one of the items attached that are described below.
-+      </para></listitem>
-+      <listitem><para>
-+        Always has a timestamp item (<constant>KDBUS_ITEM_TIMESTAMP</constant>)
-+        attached.
-+      </para></listitem>
-+    </itemizedlist>
-+
-+    <para>
-+      The kernel will notify its users of the following events.
-+    </para>
-+
-+    <itemizedlist>
-+      <listitem><para>
-+        When connection <emphasis>A</emphasis> is terminated while connection
-+        <emphasis>B</emphasis> is waiting for a reply from it, connection
-+        <emphasis>B</emphasis> is notified with a message with an item of
-+        type <constant>KDBUS_ITEM_REPLY_DEAD</constant>.
-+      </para></listitem>
-+
-+      <listitem><para>
-+        When connection <emphasis>A</emphasis> does not receive a reply from
-+        connection <emphasis>B</emphasis> within the specified timeout window,
-+        connection <emphasis>A</emphasis> will receive a message with an
-+        item of type <constant>KDBUS_ITEM_REPLY_TIMEOUT</constant>.
-+      </para></listitem>
-+
-+      <listitem><para>
-+        When an ordinary connection (not a monitor) is created on or removed
-+        from a bus, messages with an item of type
-+        <constant>KDBUS_ITEM_ID_ADD</constant> or
-+        <constant>KDBUS_ITEM_ID_REMOVE</constant>, respectively, are delivered
-+        to all bus members that match these messages through their match
-+        database. Eavesdroppers (monitor connections) do not cause such
-+        notifications to be sent. They are invisible on the bus.
-+      </para></listitem>
-+
-+      <listitem><para>
-+        When a connection gains or loses ownership of a name, messages with an
-+        item of type <constant>KDBUS_ITEM_NAME_ADD</constant>,
-+        <constant>KDBUS_ITEM_NAME_REMOVE</constant> or
-+        <constant>KDBUS_ITEM_NAME_CHANGE</constant> are delivered to all bus
-+        members that match these messages through their match database.
-+      </para></listitem>
-+    </itemizedlist>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Return value</title>
-+    <para>
-+      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+      on error, <errorcode>-1</errorcode> is returned, and
-+      <varname>errno</varname> is set to indicate the error.
-+      If the issued ioctl is illegal for the file descriptor used,
-+      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+    </para>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_SEND</constant> may fail with the following
-+        errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EOPNOTSUPP</constant></term>
-+          <listitem><para>
-+            The connection is not an ordinary connection, or the file
-+            descriptors passed in the <constant>KDBUS_ITEM_FDS</constant> item
-+            are either kdbus handles or Unix domain sockets. Both are currently
-+            unsupported.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            The submitted payload type is
-+            <constant>KDBUS_PAYLOAD_KERNEL</constant>,
-+            <constant>KDBUS_MSG_EXPECT_REPLY</constant> was set without timeout
-+            or cookie values, <constant>KDBUS_SEND_SYNC_REPLY</constant> was
-+            set without <constant>KDBUS_MSG_EXPECT_REPLY</constant>, an invalid
-+            item was supplied, <constant>src_id</constant> was non-zero and was
-+            different from the current connection's ID, a supplied memfd had a
-+            size of 0, or a string was not properly null-terminated.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ENOTUNIQ</constant></term>
-+          <listitem><para>
-+            The supplied destination is
-+            <constant>KDBUS_DST_ID_BROADCAST</constant> and either
-+            file descriptors were passed, or
-+            <constant>KDBUS_MSG_EXPECT_REPLY</constant> was set,
-+            or a timeout was given.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>E2BIG</constant></term>
-+          <listitem><para>
-+            Too many items.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EMSGSIZE</constant></term>
-+          <listitem><para>
-+            The size of the message header and items or the payload vector
-+            is excessive.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EEXIST</constant></term>
-+          <listitem><para>
-+            Multiple <constant>KDBUS_ITEM_FDS</constant>,
-+            <constant>KDBUS_ITEM_BLOOM_FILTER</constant> or
-+            <constant>KDBUS_ITEM_DST_NAME</constant> items were supplied.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EBADF</constant></term>
-+          <listitem><para>
-+            The supplied <constant>KDBUS_ITEM_FDS</constant> or
-+            <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant> items
-+            contained an illegal file descriptor.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EMEDIUMTYPE</constant></term>
-+          <listitem><para>
-+            The supplied memfd is not a sealed kdbus memfd.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EMFILE</constant></term>
-+          <listitem><para>
-+            Too many file descriptors inside a
-+            <constant>KDBUS_ITEM_FDS</constant>.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EBADMSG</constant></term>
-+          <listitem><para>
-+            An item had an illegal size, both a <constant>dst_id</constant> and
-+            a <constant>KDBUS_ITEM_DST_NAME</constant> were given, or both a
-+            name and a bloom filter were given.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ETXTBSY</constant></term>
-+          <listitem><para>
-+            The supplied kdbus memfd file cannot be sealed or the seal
-+            was removed, because it is shared with other processes or
-+            still mapped with
-+            <citerefentry>
-+              <refentrytitle>mmap</refentrytitle>
-+              <manvolnum>2</manvolnum>
-+            </citerefentry>.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ECOMM</constant></term>
-+          <listitem><para>
-+            A peer does not accept the file descriptors addressed to it.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EFAULT</constant></term>
-+          <listitem><para>
-+            The supplied bloom filter size was not 64-bit aligned, or supplied
-+            memory could not be accessed by the kernel.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EDOM</constant></term>
-+          <listitem><para>
-+            The supplied bloom filter size did not match the bloom filter
-+            size of the bus.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EDESTADDRREQ</constant></term>
-+          <listitem><para>
-+            <constant>dst_id</constant> was set to
-+            <constant>KDBUS_DST_ID_NAME</constant>, but no
-+            <constant>KDBUS_ITEM_DST_NAME</constant> was attached.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ESRCH</constant></term>
-+          <listitem><para>
-+            The name to look up was not found in the name registry.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EADDRNOTAVAIL</constant></term>
-+          <listitem><para>
-+            <constant>KDBUS_MSG_NO_AUTO_START</constant> was given but the
-+            destination connection is an activator.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ENXIO</constant></term>
-+          <listitem><para>
-+            The passed numeric destination connection ID couldn't be found,
-+            or is not connected.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ECONNRESET</constant></term>
-+          <listitem><para>
-+            The destination connection is no longer active.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ETIMEDOUT</constant></term>
-+          <listitem><para>
-+            Timeout while synchronously waiting for a reply.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EINTR</constant></term>
-+          <listitem><para>
-+            Interrupted system call while synchronously waiting for a reply.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EPIPE</constant></term>
-+          <listitem><para>
-+            When sending a message, a synchronous reply from the receiving
-+            connection was expected but the connection died before answering.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ENOBUFS</constant></term>
-+          <listitem><para>
-+            Too many pending messages on the receiver side.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EREMCHG</constant></term>
-+          <listitem><para>
-+            Both a well-known name and a unique name (ID) were given, but
-+            the name is not currently owned by that connection.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EXFULL</constant></term>
-+          <listitem><para>
-+            The memory pool of the receiver is full.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EREMOTEIO</constant></term>
-+          <listitem><para>
-+            While synchronously waiting for a reply, the remote peer
-+            failed with an I/O error.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_RECV</constant> may fail with the following
-+        errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EOPNOTSUPP</constant></term>
-+          <listitem><para>
-+            The connection is not an ordinary connection, or the passed
-+            file descriptors are either kdbus handles or unix domain
-+            sockets. Both are currently unsupported.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            Invalid flags or offset.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EAGAIN</constant></term>
-+          <listitem><para>
-+            No message found in the queue.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>See Also</title>
-+    <simplelist type="inline">
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.bus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.connection</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.endpoint</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.fs</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.item</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.name</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.pool</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>clock_gettime</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>ioctl</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>poll</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>select</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>epoll</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>eventfd</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>memfd_create</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+    </simplelist>
-+  </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.name.xml b/Documentation/kdbus/kdbus.name.xml
-new file mode 100644
-index 0000000..3f5f6a6
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.name.xml
-@@ -0,0 +1,711 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.name">
-+
-+  <refentryinfo>
-+    <title>kdbus.name</title>
-+    <productname>kdbus.name</productname>
-+  </refentryinfo>
-+
-+  <refmeta>
-+    <refentrytitle>kdbus.name</refentrytitle>
-+    <manvolnum>7</manvolnum>
-+  </refmeta>
-+
-+  <refnamediv>
-+    <refname>kdbus.name</refname>
-+    <refpurpose>kdbus.name</refpurpose>
-+  </refnamediv>
-+
-+  <refsect1>
-+    <title>Description</title>
-+    <para>
-+      Each
-+      <citerefentry>
-+        <refentrytitle>kdbus.bus</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      instantiates a name registry to resolve well-known names into unique
-+      connection IDs for message delivery. The registry will be queried when a
-+      message is sent with <varname>kdbus_msg.dst_id</varname> set to
-+      <constant>KDBUS_DST_ID_NAME</constant>, or when a registry dump is
-+      requested with <constant>KDBUS_CMD_NAME_LIST</constant>.
-+    </para>
-+
-+    <para>
-+      All of the below is subject to policy rules for <emphasis>SEE</emphasis>
-+      and <emphasis>OWN</emphasis> permissions. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.policy</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for more information.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Name validity</title>
-+    <para>
-+      A name has to comply with the following rules in order to be considered
-+      valid.
-+    </para>
-+
-+    <itemizedlist>
-+      <listitem>
-+        <para>
-+          The name has two or more elements separated by a
-+          '<literal>.</literal>' (period) character.
-+        </para>
-+      </listitem>
-+      <listitem>
-+        <para>
-+          All elements must contain at least one character.
-+        </para>
-+      </listitem>
-+      <listitem>
-+        <para>
-+          Each element must only contain the ASCII characters
-+          <literal>[A-Z][a-z][0-9]_</literal> and must not begin with a
-+          digit.
-+        </para>
-+      </listitem>
-+      <listitem>
-+        <para>
-+          The name must contain at least one '<literal>.</literal>' (period)
-+          character (and thus at least two elements).
-+        </para>
-+      </listitem>
-+      <listitem>
-+        <para>
-+          The name must not begin with a '<literal>.</literal>' (period)
-+          character.
-+        </para>
-+      </listitem>
-+      <listitem>
-+        <para>
-+          The name must not exceed <constant>255</constant> characters in
-+          length.
-+        </para>
-+      </listitem>
-+    </itemizedlist>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Acquiring a name</title>
-+    <para>
-+      To acquire a name, a client uses the
-+      <constant>KDBUS_CMD_NAME_ACQUIRE</constant> ioctl with
-+      <type>struct kdbus_cmd</type> as argument.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_cmd {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>Flags to control details in the name acquisition.</para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_NAME_REPLACE_EXISTING</constant></term>
-+              <listitem>
-+                <para>
-+                  Acquiring a name that is already present usually fails,
-+                  unless this flag is set in the call, and
-+                  <constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant> (see below)
-+                  was set when the current owner of the name acquired it, or
-+                  if the current owner is an activator connection (see
-+                  <citerefentry>
-+                    <refentrytitle>kdbus.connection</refentrytitle>
-+                    <manvolnum>7</manvolnum>
-+                  </citerefentry>).
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant></term>
-+              <listitem>
-+                <para>
-+                  Allow other connections to take over this name. When this
-+                  happens, the former owner of the name will be notified
-+                  of the name loss.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_NAME_QUEUE</constant></term>
-+              <listitem>
-+                <para>
-+                  A name that is already acquired by a connection can not be
-+                  acquired again (unless the
-+                  <constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant> flag was
-+                  set during acquisition; see above).
-+                  However, a connection can put itself in a queue of
-+                  connections waiting for the name to be released. Once that
-+                  happens, the first connection in that queue becomes the new
-+                  owner and is notified accordingly.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+              <listitem>
-+                <para>
-+                  Request a set of valid flags for this ioctl. When this bit is
-+                  set, no action is taken; the ioctl will fail with
-+                  <errorcode>-1</errorcode>, and <varname>errno</varname>
-+                  is set to <constant>EPROTO</constant>.
-+                  Once the ioctl has returned, the <varname>flags</varname>
-+                  field will have all bits set that the kernel recognizes as
-+                  valid for this command.
-+                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+                  cleared by the operation.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem>
-+          <para>
-+            Flags returned by the kernel. Currently, the following may be
-+            returned by the kernel.
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_NAME_IN_QUEUE</constant></term>
-+              <listitem>
-+                <para>
-+                  The name was not acquired yet, but the connection was
-+                  placed in the queue of peers waiting for the name.
-+                  This can only happen if <constant>KDBUS_NAME_QUEUE</constant>
-+                  was set in the <varname>flags</varname> member (see above).
-+                  The connection will receive a name owner change notification
-+                  once the current owner has given up the name and its
-+                  ownership was transferred.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            Items to submit the name. Currently, one item of type
-+            <constant>KDBUS_ITEM_NAME</constant> is expected and allowed, and
-+            the contained string must be a valid bus name.
-+            <constant>KDBUS_ITEM_NEGOTIATE</constant> may be used to probe for
-+            valid item types. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.item</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for a detailed description of how this item is used.
-+          </para>
-+          <para>
-+            Unrecognized items are rejected, and the ioctl will fail with
-+            <varname>errno</varname> set to <errorname>EINVAL</errorname>.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Releasing a name</title>
-+    <para>
-+      A connection may release a name explicitly with the
-+      <constant>KDBUS_CMD_NAME_RELEASE</constant> ioctl. If the connection was
-+      an implementer of an activatable name, its pending messages are moved
-+      back to the activator. If there are any connections queued up as waiters
-+      for the name, the first one in the queue (the oldest entry) will become
-+      the new owner. The same happens implicitly for all names once a
-+      connection terminates. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.connection</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for more information on connections.
-+    </para>
-+    <para>
-+      The <constant>KDBUS_CMD_NAME_RELEASE</constant> ioctl uses the same data
-+      structure as the acquisition call
-+      (<constant>KDBUS_CMD_NAME_ACQUIRE</constant>),
-+      but with slightly different field usage.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_cmd {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>
-+          Flags to the command. Currently unused.
-+          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+          valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
-+          and the <varname>flags</varname> field is set to
-+          <constant>0</constant>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            Items to submit the name. Currently, one item of type
-+            <constant>KDBUS_ITEM_NAME</constant> is expected and allowed, and
-+            the contained string must be a valid bus name.
-+            <constant>KDBUS_ITEM_NEGOTIATE</constant> may be used to probe for
-+            valid item types. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.item</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+            for a detailed description of how this item is used.
-+          </para>
-+          <para>
-+            Unrecognized items are rejected, and the ioctl will fail with
-+            <varname>errno</varname> set to <constant>EINVAL</constant>.
-+          </para>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Dumping the name registry</title>
-+    <para>
-+      A connection may request a complete or filtered dump of currently active
-+      bus names with the <constant>KDBUS_CMD_LIST</constant> ioctl, which
-+      takes a <type>struct kdbus_cmd_list</type> as argument.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_cmd_list {
-+  __u64 flags;
-+  __u64 return_flags;
-+  __u64 offset;
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem>
-+          <para>
-+            Any combination of flags to specify which names should be dumped.
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_LIST_UNIQUE</constant></term>
-+              <listitem>
-+                <para>
-+                  List the unique (numeric) IDs of all connections, whether
-+                  they own a name or not.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_LIST_NAMES</constant></term>
-+              <listitem>
-+                <para>
-+                  List well-known names stored in the database which are
-+                  actively owned by a real connection (not an activator).
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_LIST_ACTIVATORS</constant></term>
-+              <listitem>
-+                <para>
-+                  List names that are owned by an activator.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_LIST_QUEUED</constant></term>
-+              <listitem>
-+                <para>
-+                  List connections that are not yet owning a name but are
-+                  waiting for it to become available.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+              <listitem>
-+                <para>
-+                  Request a set of valid flags for this ioctl. When this bit is
-+                  set, no action is taken; the ioctl will fail with
-+                  <errorcode>-1</errorcode>, and <varname>errno</varname>
-+                  is set to <constant>EPROTO</constant>.
-+                  Once the ioctl has returned, the <varname>flags</varname>
-+                  field will have all bits set that the kernel recognizes as
-+                  valid for this command.
-+                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+                  cleared by the operation.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>offset</varname></term>
-+        <listitem><para>
-+          When the ioctl returns successfully, the offset to the name registry
-+          dump inside the connection's pool will be stored in this field.
-+        </para></listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      The returned list of names is stored in a <type>struct kdbus_list</type>
-+      that in turn contains an array of type <type>struct kdbus_info</type>.
-+      The array size in bytes is given as <varname>list_size</varname>.
-+      The fields inside <type>struct kdbus_info</type> are described next.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_info {
-+  __u64 size;
-+  __u64 id;
-+  __u64 flags;
-+  struct kdbus_item items[0];
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>id</varname></term>
-+        <listitem><para>
-+          The owning connection's unique ID.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>
-+          The flags of the owning connection.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem>
-+          <para>
-+            Items containing the actual name. Currently, one item of type
-+            <constant>KDBUS_ITEM_OWNED_NAME</constant> will be attached,
-+            including the name's flags. In that item, the flags field of the
-+            name may carry the following bits:
-+          </para>
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant></term>
-+              <listitem>
-+                <para>
-+                  Other connections are allowed to take over this name from the
-+                  connection that owns it.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_NAME_IN_QUEUE</constant></term>
-+              <listitem>
-+                <para>
-+                  When retrieving a list of currently acquired names in the
-+                  registry, this flag indicates whether the connection
-+                  actually owns the name or is currently waiting for it to
-+                  become available.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_NAME_ACTIVATOR</constant></term>
-+              <listitem>
-+                <para>
-+                  An activator connection owns a name as a placeholder for an
-+                  implementer, which is started on demand by programs as soon
-+                  as the first message arrives. There's some more information
-+                  on this topic in
-+                  <citerefentry>
-+                    <refentrytitle>kdbus.connection</refentrytitle>
-+                    <manvolnum>7</manvolnum>
-+                  </citerefentry>
-+                  .
-+                </para>
-+                <para>
-+                  In contrast to
-+                  <constant>KDBUS_NAME_REPLACE_EXISTING</constant>,
-+                  when a name is taken over from an activator connection, all
-+                  the messages that have been queued in the activator
-+                  connection will be moved over to the new owner. The activator
-+                  connection will still be tracked for the name and will take
-+                  control again if the implementer connection terminates.
-+                </para>
-+                <para>
-+                  This flag can not be used when acquiring a name, but is
-+                  implicitly set through <constant>KDBUS_CMD_HELLO</constant>
-+                  with <constant>KDBUS_HELLO_ACTIVATOR</constant> set in
-+                  <varname>kdbus_cmd_hello.conn_flags</varname>.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+              <listitem>
-+                <para>
-+                  Requests a set of valid flags for this ioctl. When this bit is
-+                  set, no action is taken; the ioctl will return
-+                  <errorcode>0</errorcode>, and the <varname>flags</varname>
-+                  field will have all bits set that are valid for this command.
-+                  The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+                  cleared by the operation.
-+                </para>
-+              </listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      The returned buffer must be freed with the
-+      <constant>KDBUS_CMD_FREE</constant> ioctl when the user is finished with
-+      it. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.pool</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for more information.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Return value</title>
-+    <para>
-+      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+      on error, <errorcode>-1</errorcode> is returned, and
-+      <varname>errno</varname> is set to indicate the error.
-+      If the issued ioctl is illegal for the file descriptor used,
-+      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+    </para>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_NAME_ACQUIRE</constant> may fail with the following
-+        errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            Illegal command flags, illegal name provided, or an activator
-+            tried to acquire a second name.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EPERM</constant></term>
-+          <listitem><para>
-+            Policy prohibited name ownership.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EALREADY</constant></term>
-+          <listitem><para>
-+            Connection already owns that name.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EEXIST</constant></term>
-+          <listitem><para>
-+            The name already exists and can not be taken over.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>E2BIG</constant></term>
-+          <listitem><para>
-+            The maximum number of well-known names per connection is exhausted.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_NAME_RELEASE</constant>
-+        may fail with the following errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            Invalid command flags, or invalid name provided.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ESRCH</constant></term>
-+          <listitem><para>
-+            Name is not found in the registry.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EADDRINUSE</constant></term>
-+          <listitem><para>
-+            Name is owned by a different connection and can't be released.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_LIST</constant> may fail with the following
-+        errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            Invalid command flags.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>ENOBUFS</constant></term>
-+          <listitem><para>
-+            No available memory in the connection's pool.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>See Also</title>
-+    <simplelist type="inline">
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.bus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.connection</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.item</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.policy</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.pool</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+    </simplelist>
-+  </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.policy.xml b/Documentation/kdbus/kdbus.policy.xml
-new file mode 100644
-index 0000000..6732416
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.policy.xml
-@@ -0,0 +1,406 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.policy">
-+
-+  <refentryinfo>
-+    <title>kdbus.policy</title>
-+    <productname>kdbus.policy</productname>
-+  </refentryinfo>
-+
-+  <refmeta>
-+    <refentrytitle>kdbus.policy</refentrytitle>
-+    <manvolnum>7</manvolnum>
-+  </refmeta>
-+
-+  <refnamediv>
-+    <refname>kdbus.policy</refname>
-+    <refpurpose>kdbus policy</refpurpose>
-+  </refnamediv>
-+
-+  <refsect1>
-+    <title>Description</title>
-+
-+    <para>
-+      A kdbus policy restricts the ability of connections to own, see and
-+      talk to well-known names. A policy can be associated with a bus (through a
-+      policy holder connection) or a custom endpoint. kdbus stores its policy
-+      information in a database that can be accessed through the following
-+      ioctl commands:
-+    </para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><constant>KDBUS_CMD_HELLO</constant></term>
-+        <listitem><para>
-+          When creating, or updating, a policy holder connection. See
-+          <citerefentry>
-+            <refentrytitle>kdbus.connection</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><constant>KDBUS_CMD_ENDPOINT_MAKE</constant></term>
-+        <term><constant>KDBUS_CMD_ENDPOINT_UPDATE</constant></term>
-+        <listitem><para>
-+          When creating, or updating, a bus custom endpoint. See
-+          <citerefentry>
-+            <refentrytitle>kdbus.endpoint</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>.
-+        </para></listitem>
-+      </varlistentry>
-+    </variablelist>
-+
-+    <para>
-+      In all cases, the name and policy access information is stored in items
-+      of type <constant>KDBUS_ITEM_NAME</constant> and
-+      <constant>KDBUS_ITEM_POLICY_ACCESS</constant>. For this transport, the
-+      following rules apply.
-+    </para>
-+
-+    <itemizedlist>
-+      <listitem>
-+        <para>
-+          An item of type <constant>KDBUS_ITEM_NAME</constant> must be followed
-+          by at least one <constant>KDBUS_ITEM_POLICY_ACCESS</constant> item.
-+        </para>
-+      </listitem>
-+
-+      <listitem>
-+        <para>
-+          An item of type <constant>KDBUS_ITEM_NAME</constant> can be followed
-+          by an arbitrary number of
-+          <constant>KDBUS_ITEM_POLICY_ACCESS</constant> items.
-+        </para>
-+      </listitem>
-+
-+      <listitem>
-+        <para>
-+          An arbitrary number of groups of names and access levels can be given.
-+        </para>
-+      </listitem>
-+    </itemizedlist>
-+
-+    <para>
-+      Names passed in items of type <constant>KDBUS_ITEM_NAME</constant> must
-+      comply with the rules for valid bus names. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.name</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for more information.
-+
-+      The payload of an item of type
-+      <constant>KDBUS_ITEM_POLICY_ACCESS</constant> is defined by the following
-+      struct. For more information on the layout of items, please refer to
-+      <citerefentry>
-+        <refentrytitle>kdbus.item</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>.
-+    </para>
-+
-+    <programlisting>
-+struct kdbus_policy_access {
-+  __u64 type;
-+  __u64 access;
-+  __u64 id;
-+};
-+    </programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>type</varname></term>
-+        <listitem>
-+          <para>
-+            One of the following.
-+          </para>
-+
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_POLICY_ACCESS_USER</constant></term>
-+              <listitem><para>
-+                Grant access to a user with the UID stored in the
-+                <varname>id</varname> field.
-+              </para></listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_POLICY_ACCESS_GROUP</constant></term>
-+              <listitem><para>
-+                Grant access to members of the group with the GID stored in the
-+                <varname>id</varname> field.
-+              </para></listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_POLICY_ACCESS_WORLD</constant></term>
-+              <listitem><para>
-+                Grant access to everyone. The <varname>id</varname> field
-+                is ignored.
-+              </para></listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>access</varname></term>
-+        <listitem>
-+          <para>
-+            The access to grant. One of the following.
-+          </para>
-+
-+          <variablelist>
-+            <varlistentry>
-+              <term><constant>KDBUS_POLICY_SEE</constant></term>
-+              <listitem><para>
-+                Allow the name to be seen.
-+              </para></listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_POLICY_TALK</constant></term>
-+              <listitem><para>
-+                Allow the name to be talked to.
-+              </para></listitem>
-+            </varlistentry>
-+
-+            <varlistentry>
-+              <term><constant>KDBUS_POLICY_OWN</constant></term>
-+              <listitem><para>
-+                Allow the name to be owned.
-+              </para></listitem>
-+            </varlistentry>
-+          </variablelist>
-+        </listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>id</varname></term>
-+        <listitem><para>
-+           For <constant>KDBUS_POLICY_ACCESS_USER</constant>, stores the UID.
-+           For <constant>KDBUS_POLICY_ACCESS_GROUP</constant>, stores the GID.
-+        </para></listitem>
-+      </varlistentry>
-+
-+    </variablelist>
-+
-+    <para>
-+      All endpoints of buses have an empty policy database by default.
-+      Therefore, unless policy rules are added, all operations will also be
-+      denied by default. Also see
-+      <citerefentry>
-+        <refentrytitle>kdbus.endpoint</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Wildcard names</title>
-+    <para>
-+      Policy holder connections may upload names that contain the wildcard
-+      suffix (<literal>".*"</literal>). Such a policy entry is effective for
-+      every well-known name that extends the provided name by exactly one more
-+      level.
-+
-+      For example, the name <literal>foo.bar.*</literal> matches both
-+      <literal>"foo.bar.baz"</literal> and
-+      <literal>"foo.bar.bazbaz"</literal>, but not
-+      <literal>"foo.bar.baz.baz"</literal>.
-+
-+      This allows connections to take control over multiple names that the
-+      policy holder doesn't need to know about when uploading the policy.
-+
-+      Such wildcard entries are not allowed for custom endpoints.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Privileged connections</title>
-+    <para>
-+      The policy database is overruled when action is taken by a privileged
-+      connection. Please refer to
-+      <citerefentry>
-+        <refentrytitle>kdbus.connection</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for more information on what makes a connection privileged.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Examples</title>
-+    <para>
-+      For instance, a set of policy rules may look like this:
-+    </para>
-+
-+    <programlisting>
-+KDBUS_ITEM_NAME: str='org.foo.bar'
-+KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, ID=1000
-+KDBUS_ITEM_POLICY_ACCESS: type=USER, access=TALK, ID=1001
-+KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=SEE
-+
-+KDBUS_ITEM_NAME: str='org.blah.baz'
-+KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, ID=0
-+KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=TALK
-+    </programlisting>
-+
-+    <para>
-+      That means that 'org.foo.bar' may only be owned by UID 1000, but every
-+      user on the bus is allowed to see the name. However, only UID 1001 may
-+      actually send a message to the connection and receive a reply from it.
-+
-+      The second rule allows 'org.blah.baz' to be owned by UID 0 only, but
-+      every user may talk to it.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>TALK access and multiple well-known names per connection</title>
-+    <para>
-+      Note that TALK access is checked against all names of a connection. For
-+      example, if a connection owns both <constant>'org.foo.bar'</constant> and
-+      <constant>'org.blah.baz'</constant>, and the policy database allows
-+      <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
-+      permission is also granted to <constant>'org.foo.bar'</constant>. That
-+      might sound illogical, but after all, we allow messages to be directed to
-+      either the ID or a well-known name, and policy is applied to the
-+      connection, not the name. In other words, the effective TALK policy for a
-+      connection is the most permissive of all names the connection owns.
-+
-+      For broadcast messages, the receiver needs TALK permissions to the sender
-+      to receive the broadcast.
-+    </para>
-+    <para>
-+      Both the endpoint and the bus policy databases are consulted to allow
-+      name registry listing, owning a well-known name and message delivery.
-+      If either check fails, the operation fails with
-+      <varname>errno</varname> set to <constant>EPERM</constant>.
-+
-+      As a best practice, connections that own names with a restricted TALK
-+      access should not install matches. This avoids cases where the sent
-+      message may pass the bloom filter due to false-positives and may also
-+      satisfy the policy rules.
-+
-+      Also see
-+      <citerefentry>
-+        <refentrytitle>kdbus.match</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Implicit policies</title>
-+    <para>
-+      Depending on the type of the endpoint, a set of implicit rules that
-+      override installed policies might be enforced.
-+
-+      On default endpoints, the following set is enforced and checked before
-+      any user-supplied policy is checked.
-+    </para>
-+
-+    <itemizedlist>
-+      <listitem>
-+        <para>
-+          Privileged connections always override any installed policy. Those
-+          connections could easily install their own policies, so there is no
-+          reason to enforce installed policies.
-+        </para>
-+      </listitem>
-+      <listitem>
-+        <para>
-+          Connections can always talk to connections of the same user. This
-+          includes broadcast messages.
-+        </para>
-+      </listitem>
-+    </itemizedlist>
-+
-+    <para>
-+      Custom endpoints have stricter policies. The following rules apply:
-+    </para>
-+
-+    <itemizedlist>
-+      <listitem>
-+        <para>
-+          Policy rules are always enforced, even if the connection is a
-+          privileged connection.
-+        </para>
-+      </listitem>
-+      <listitem>
-+        <para>
-+          Policy rules are always enforced for <constant>TALK</constant> access,
-+          even if both ends are running under the same user. This includes
-+          broadcast messages.
-+        </para>
-+      </listitem>
-+      <listitem>
-+        <para>
-+          To restrict the set of names that can be seen, endpoint policies can
-+          install <constant>SEE</constant> policies.
-+        </para>
-+      </listitem>
-+    </itemizedlist>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>See Also</title>
-+    <simplelist type="inline">
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.bus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.endpoint</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.fs</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.item</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.message</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.name</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.pool</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+    </simplelist>
-+  </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.pool.xml b/Documentation/kdbus/kdbus.pool.xml
-new file mode 100644
-index 0000000..a9e16f1
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.pool.xml
-@@ -0,0 +1,326 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.pool">
-+
-+  <refentryinfo>
-+    <title>kdbus.pool</title>
-+    <productname>kdbus.pool</productname>
-+  </refentryinfo>
-+
-+  <refmeta>
-+    <refentrytitle>kdbus.pool</refentrytitle>
-+    <manvolnum>7</manvolnum>
-+  </refmeta>
-+
-+  <refnamediv>
-+    <refname>kdbus.pool</refname>
-+    <refpurpose>kdbus pool</refpurpose>
-+  </refnamediv>
-+
-+  <refsect1>
-+    <title>Description</title>
-+    <para>
-+      A pool for data received from the kernel is installed for every
-+      <emphasis>connection</emphasis> of the <emphasis>bus</emphasis>, and
-+      is sized according to the information stored in the
-+      <varname>pool_size</varname> member of <type>struct kdbus_cmd_hello</type>
-+      when <constant>KDBUS_CMD_HELLO</constant> is employed. Internally, the
-+      pool is segmented into <emphasis>slices</emphasis>, each referenced by its
-+      <emphasis>offset</emphasis> in the pool, expressed in <type>bytes</type>.
-+      See
-+      <citerefentry>
-+        <refentrytitle>kdbus.connection</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for more information about <constant>KDBUS_CMD_HELLO</constant>.
-+    </para>
-+
-+    <para>
-+      The pool is written to by the kernel when one of the following
-+      <emphasis>ioctls</emphasis> is issued:
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>KDBUS_CMD_HELLO</constant></term>
-+          <listitem><para>
-+            ... to receive details about the bus the connection was made to
-+          </para></listitem>
-+        </varlistentry>
-+        <varlistentry>
-+          <term><constant>KDBUS_CMD_RECV</constant></term>
-+          <listitem><para>
-+            ... to receive a message
-+          </para></listitem>
-+        </varlistentry>
-+        <varlistentry>
-+          <term><constant>KDBUS_CMD_LIST</constant></term>
-+          <listitem><para>
-+            ... to dump the name registry
-+          </para></listitem>
-+        </varlistentry>
-+        <varlistentry>
-+          <term><constant>KDBUS_CMD_CONN_INFO</constant></term>
-+          <listitem><para>
-+            ... to retrieve information on a connection
-+          </para></listitem>
-+        </varlistentry>
-+        <varlistentry>
-+          <term><constant>KDBUS_CMD_BUS_CREATOR_INFO</constant></term>
-+          <listitem><para>
-+            ... to retrieve information about a connection's bus creator
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+
-+    </para>
-+    <para>
-+      The <varname>offset</varname> fields returned by any of the
-+      aforementioned ioctls describe offsets inside the pool. In order to make
-+      the slice available for subsequent calls,
-+      <constant>KDBUS_CMD_FREE</constant> has to be called on that offset
-+      (see below). Otherwise, the pool will fill up, and the connection won't
-+      be able to receive any more information through its pool.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Pool slice allocation</title>
-+    <para>
-+      Pool slices are allocated by the kernel in order to report information
-+      back to a task, such as messages or returned name lists.
-+      Allocation of pool slices cannot be initiated by userspace. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.connection</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      and
-+      <citerefentry>
-+        <refentrytitle>kdbus.name</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for examples of commands that use the <emphasis>pool</emphasis> to
-+      return data.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Accessing the pool memory</title>
-+    <para>
-+      Memory in the pool is read-only for userspace and may only be written
-+      to by the kernel. To read from the pool memory, the caller is expected to
-+      <citerefentry>
-+        <refentrytitle>mmap</refentrytitle>
-+        <manvolnum>2</manvolnum>
-+      </citerefentry>
-+      the buffer into its task, like this:
-+    </para>
-+    <programlisting>
-+uint8_t *buf = mmap(NULL, size, PROT_READ, MAP_SHARED, conn_fd, 0);
-+    </programlisting>
-+
-+    <para>
-+      In order to map the entire pool, the <varname>size</varname> parameter in
-+      the example above should be set to the value of the
-+      <varname>pool_size</varname> member of
-+      <type>struct kdbus_cmd_hello</type> when
-+      <constant>KDBUS_CMD_HELLO</constant> was employed to create the
-+      connection (see above).
-+    </para>
-+
-+    <para>
-+      The <emphasis>file descriptor</emphasis> used to map the memory must be
-+      the one that was used to create the <emphasis>connection</emphasis>.
-+      In other words, the one that was used to call
-+      <constant>KDBUS_CMD_HELLO</constant>. See
-+      <citerefentry>
-+        <refentrytitle>kdbus.connection</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>
-+      for more details.
-+    </para>
-+
-+    <para>
-+      Alternatively, instead of mapping the entire pool buffer, only parts
-+      of it can be mapped. Every kdbus command that returns an
-+      <emphasis>offset</emphasis> (see above) also reports a
-+      <emphasis>size</emphasis> along with it, so programs can be written
-+      in a way that they only map portions of the pool to access a specific
-+      <emphasis>slice</emphasis>.
-+    </para>
-+
-+    <para>
-+      When access to the pool memory is no longer needed, programs should
-+      call <function>munmap()</function> on the pointer returned by
-+      <function>mmap()</function>.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Freeing pool slices</title>
-+    <para>
-+      The <constant>KDBUS_CMD_FREE</constant> ioctl is used to free a slice
-+      inside the pool, identified by an offset that was returned in an
-+      <varname>offset</varname> field of another ioctl struct.
-+      The <constant>KDBUS_CMD_FREE</constant> command takes a
-+      <type>struct kdbus_cmd_free</type> as argument.
-+    </para>
-+
-+<programlisting>
-+struct kdbus_cmd_free {
-+  __u64 size;
-+  __u64 flags;
-+  __u64 return_flags;
-+  __u64 offset;
-+  struct kdbus_item items[0];
-+};
-+</programlisting>
-+
-+    <para>The fields in this struct are described below.</para>
-+
-+    <variablelist>
-+      <varlistentry>
-+        <term><varname>size</varname></term>
-+        <listitem><para>
-+          The overall size of the struct, including its items.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>flags</varname></term>
-+        <listitem><para>
-+          Currently unused.
-+          <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+          valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
-+          and the <varname>flags</varname> field is set to
-+          <constant>0</constant>.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>return_flags</varname></term>
-+        <listitem><para>
-+          Flags returned by the kernel. Currently unused and always set to
-+          <constant>0</constant> by the kernel.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>offset</varname></term>
-+        <listitem><para>
-+          The offset to free, as returned by other ioctls that allocated
-+          memory for returned information.
-+        </para></listitem>
-+      </varlistentry>
-+
-+      <varlistentry>
-+        <term><varname>items</varname></term>
-+        <listitem><para>
-+          Items to specify further details for the free command.
-+          Currently unused.
-+          Unrecognized items are rejected, and the ioctl will fail with
-+          <varname>errno</varname> set to <constant>EINVAL</constant>.
-+          All items except for
-+          <constant>KDBUS_ITEM_NEGOTIATE</constant> (see
-+            <citerefentry>
-+              <refentrytitle>kdbus.item</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>
-+          ) will be rejected.
-+        </para></listitem>
-+      </varlistentry>
-+    </variablelist>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Return value</title>
-+    <para>
-+      On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+      on error, <errorcode>-1</errorcode> is returned, and
-+      <varname>errno</varname> is set to indicate the error.
-+      If the issued ioctl is illegal for the file descriptor used,
-+      <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+    </para>
-+
-+    <refsect2>
-+      <title>
-+        <constant>KDBUS_CMD_FREE</constant> may fail with the following
-+        errors
-+      </title>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>ENXIO</constant></term>
-+          <listitem><para>
-+            No pool slice found at given offset.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            Invalid flags provided.
-+          </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>EINVAL</constant></term>
-+          <listitem><para>
-+            The offset is valid, but the user is not allowed to free the slice.
-+            This happens, for example, if the offset was retrieved with
-+            <constant>KDBUS_RECV_PEEK</constant>.
-+          </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+    </refsect2>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>See Also</title>
-+    <simplelist type="inline">
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.bus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.connection</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.endpoint</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.name</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>mmap</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>munmap</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+    </simplelist>
-+  </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.xml b/Documentation/kdbus/kdbus.xml
-new file mode 100644
-index 0000000..d8e7400
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.xml
-@@ -0,0 +1,1012 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+        "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus">
-+
-+  <refentryinfo>
-+    <title>kdbus</title>
-+    <productname>kdbus</productname>
-+  </refentryinfo>
-+
-+  <refmeta>
-+    <refentrytitle>kdbus</refentrytitle>
-+    <manvolnum>7</manvolnum>
-+  </refmeta>
-+
-+  <refnamediv>
-+    <refname>kdbus</refname>
-+    <refpurpose>Kernel Message Bus</refpurpose>
-+  </refnamediv>
-+
-+  <refsect1>
-+    <title>Synopsis</title>
-+    <para>
-+      kdbus is an inter-process communication bus system controlled by the
-+      kernel. It provides user-space with an API to create buses and send
-+      unicast and multicast messages to one, or many, peers connected to the
-+      same bus. It does not enforce any layout on the transmitted data, but
-+      only provides the transport layer used for message interchange between
-+      peers.
-+    </para>
-+    <para>
-+      This set of man-pages gives a comprehensive overview of the kernel-level
-+      API, with all ioctl commands, associated structs and bit masks. However,
-+      most people will not use this API level directly, but rather let one of
-+      the high-level abstraction libraries help them integrate D-Bus
-+      functionality into their applications.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Description</title>
-+    <para>
-+      kdbus provides a pseudo filesystem called <emphasis>kdbusfs</emphasis>,
-+      which is usually mounted on <filename>/sys/fs/kdbus</filename>. Bus
-+      primitives can be accessed as files and sub-directories underneath this
-+      mount-point. Any advanced operations are done via
-+      <function>ioctl()</function> on files created by
-+      <emphasis>kdbusfs</emphasis>. Multiple mount-points of
-+      <emphasis>kdbusfs</emphasis> are independent of each other. This allows
-+      namespacing of kdbus by mounting a new instance of
-+      <emphasis>kdbusfs</emphasis> in a new mount-namespace. kdbus calls these
-+      mount instances domains and each bus belongs to exactly one domain.
-+    </para>
-+
-+    <para>
-+      kdbus was designed as a transport layer for D-Bus, but is in no way
-+      limited to, or controlled by, the D-Bus protocol specification. The D-Bus
-+      protocol is one possible application layer on top of kdbus.
-+    </para>
-+
-+    <para>
-+      For the general D-Bus protocol specification, its payload format, its
-+      marshaling, and its communication semantics, please refer to the
-+      <ulink url="http://dbus.freedesktop.org/doc/dbus-specification.html">
-+      D-Bus specification</ulink>.
-+    </para>
-+
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Terminology</title>
-+
-+    <refsect2>
-+      <title>Domain</title>
-+      <para>
-+        A domain is a <emphasis>kdbusfs</emphasis> mount-point containing all
-+        the bus primitives. Each domain is independent, and separate domains
-+        do not affect each other.
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Bus</title>
-+      <para>
-+        A bus is a named object inside a domain. Clients exchange messages
-+        over a bus. Multiple buses themselves have no connection to each other;
-+        messages can only be exchanged on the same bus. The default endpoint of
-+        a bus, to which clients establish connections, is the "bus" file
-+        /sys/fs/kdbus/&lt;bus name&gt;/bus.
-+        Common operating system setups create one "system bus" per system,
-+        and one "user bus" for every logged-in user. Applications or services
-+        may create their own private buses. The kernel driver does not
-+        distinguish between different bus types; they are all handled the same
-+        way. See
-+        <citerefentry>
-+          <refentrytitle>kdbus.bus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        for more details.
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Endpoint</title>
-+      <para>
-+        An endpoint provides a file to talk to a bus. Opening an endpoint
-+        creates a new connection to the bus to which the endpoint belongs. All
-+        endpoints have unique names and are accessible as files underneath the
-+        directory of a bus, e.g., /sys/fs/kdbus/&lt;bus&gt;/&lt;endpoint&gt;.
-+        Every bus has a default endpoint called "bus".
-+        A bus can optionally offer additional endpoints with custom names
-+        to provide restricted access to the bus. Custom endpoints carry
-+        additional policy which can be used to create sandboxes with
-+        locked-down, limited, filtered access to a bus. See
-+        <citerefentry>
-+          <refentrytitle>kdbus.endpoint</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        for more details.
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Connection</title>
-+      <para>
-+        A connection to a bus is created by opening an endpoint file of a
-+        bus. Every ordinary client connection has a unique identifier on the
-+        bus and can address messages to every other connection on the same
-+        bus by using the peer's connection ID as the destination. See
-+        <citerefentry>
-+          <refentrytitle>kdbus.connection</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        for more details.
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Pool</title>
-+      <para>
-+        Each connection allocates a piece of shmem-backed memory that is
-+        used to receive messages and answers to ioctl commands from the kernel.
-+        It is never used to send anything to the kernel. In order to access that
-+        memory, an application must mmap() it into its address space. See
-+        <citerefentry>
-+          <refentrytitle>kdbus.pool</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        for more details.
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Well-known Name</title>
-+      <para>
-+        A connection can, in addition to its implicit unique connection ID,
-+        request the ownership of a textual well-known name. Well-known names are
-+        noted in reverse-domain notation, such as com.example.service1. A
-+        connection that offers a service on a bus is usually reached by its
-+        well-known name. An analogy for connection ID and well-known name is an
-+        IP address and a DNS name associated with that address. See
-+        <citerefentry>
-+          <refentrytitle>kdbus.name</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        for more details.
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Message</title>
-+      <para>
-+        Connections can exchange messages with other connections by addressing
-+        the peers with their connection ID or well-known name. A message
-+        consists of a message header with information on how to route the
-+        message, and the message payload, which is a logical byte stream of
-+        arbitrary size. Messages can carry additional file descriptors to be
-+        passed from one connection to another, just like passing file
-+        descriptors over UNIX domain sockets. Every connection can specify which
-+        set of metadata the kernel should attach to the message when it is
-+        delivered to the receiving connection. Metadata contains information
-+        like: system time stamps, UID, GID, TID, proc-starttime, well-known
-+        names, process comm, process exe, process argv, cgroup, capabilities,
-+        seclabel, audit session, loginuid and the connection's human-readable
-+        name. See
-+        <citerefentry>
-+          <refentrytitle>kdbus.message</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        for more details.
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Item</title>
-+      <para>
-+        The API of kdbus implements the notion of items, submitted through and
-+        returned by most ioctls, and stored inside data structures in the
-+        connection's pool. See
-+        <citerefentry>
-+          <refentrytitle>kdbus.item</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        for more details.
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Broadcast, signal, filter, match</title>
-+      <para>
-+        Signals are messages that a receiver opts in for by installing a blob of
-+        bytes, called a 'match'. Signal messages must always carry a
-+        counter-part blob, called a 'filter', and signals are only delivered to
-+        peers which have a match that white-lists the message's filter. Senders
-+        of signal messages can use either a single connection ID as receiver,
-+        or the special connection ID
-+        <constant>KDBUS_DST_ID_BROADCAST</constant> to potentially send it to
-+        all connections of a bus, following the logic described above. See
-+        <citerefentry>
-+          <refentrytitle>kdbus.match</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        and
-+        <citerefentry>
-+          <refentrytitle>kdbus.message</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        for more details.
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Policy</title>
-+      <para>
-+        A policy is a set of rules that define which connections can see, talk
-+        to, or register a well-known name on the bus. A policy is attached to
-+        buses and custom endpoints, and modified by policy holder connections or
-+        owners of custom endpoints. See
-+        <citerefentry>
-+          <refentrytitle>kdbus.policy</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        for more details.
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Privileged bus users</title>
-+      <para>
-+        A user connecting to the bus is considered privileged if it is either
-+        the creator of the bus, or if it has the CAP_IPC_OWNER capability flag
-+        set. See
-+        <citerefentry>
-+          <refentrytitle>kdbus.connection</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        for more details.
-+      </para>
-+    </refsect2>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Bus Layout</title>
-+
-+    <para>
-+      A <emphasis>bus</emphasis> provides and defines an environment that peers
-+      can connect to for message interchange. A bus is created via the kdbus
-+      control interface and can be modified by the bus creator. It applies the
-+      policy that controls all bus operations. The bus creator itself does not
-+      participate as a peer. To establish a peer
-+      <emphasis>connection</emphasis>, you have to open one of the
-+      <emphasis>endpoints</emphasis> of a bus. Each bus provides a default
-+      endpoint, but further endpoints can be created on-demand. Endpoints are
-+      used to apply additional policies for all connections on this endpoint.
-+      Thus, they provide additional filters to further restrict access of
-+      specific connections to the bus.
-+    </para>
-+
-+    <para>
-+      The following shows an example bus layout:
-+    </para>
-+
-+    <programlisting><![CDATA[
-+                                  Bus Creator
-+                                       |
-+                                       |
-+                                    +-----+
-+                                    | Bus |
-+                                    +-----+
-+                                       |
-+                    __________________/ \__________________
-+                   /                                       \
-+                   |                                       |
-+             +----------+                             +----------+
-+             | Endpoint |                             | Endpoint |
-+             +----------+                             +----------+
-+         _________/|\_________                   _________/|\_________
-+        /          |          \                 /          |          \
-+        |          |          |                 |          |          |
-+        |          |          |                 |          |          |
-+   Connection  Connection  Connection      Connection  Connection  Connection
-+    ]]></programlisting>
-+
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Data structures and interconnections</title>
-+    <programlisting><![CDATA[
-+  +--------------------------------------------------------------------------+
-+  | Domain (Mount Point)                                                     |
-+  | /sys/fs/kdbus/control                                                    |
-+  | +----------------------------------------------------------------------+ |
-+  | | Bus (System Bus)                                                     | |
-+  | | /sys/fs/kdbus/0-system/                                              | |
-+  | | +-------------------------------+ +--------------------------------+ | |
-+  | | | Endpoint                      | | Endpoint                       | | |
-+  | | | /sys/fs/kdbus/0-system/bus    | | /sys/fs/kdbus/0-system/ep.app  | | |
-+  | | +-------------------------------+ +--------------------------------+ | |
-+  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
-+  | | | Connection   | | Connection   | | Connection   | | Connection    | | |
-+  | | | :1.22        | | :1.25        | | :1.55        | | :1.81         | | |
-+  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
-+  | +----------------------------------------------------------------------+ |
-+  |                                                                          |
-+  | +----------------------------------------------------------------------+ |
-+  | | Bus (User Bus for UID 2702)                                          | |
-+  | | /sys/fs/kdbus/2702-user/                                             | |
-+  | | +-------------------------------+ +--------------------------------+ | |
-+  | | | Endpoint                      | | Endpoint                       | | |
-+  | | | /sys/fs/kdbus/2702-user/bus   | | /sys/fs/kdbus/2702-user/ep.app | | |
-+  | | +-------------------------------+ +--------------------------------+ | |
-+  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
-+  | | | Connection   | | Connection   | | Connection   | | Connection    | | |
-+  | | | :1.22        | | :1.25        | | :1.55        | | :1.81         | | |
-+  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
-+  | +----------------------------------------------------------------------+ |
-+  +--------------------------------------------------------------------------+
-+    ]]></programlisting>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>Metadata</title>
-+
-+    <refsect2>
-+      <title>When metadata is collected</title>
-+      <para>
-+        kdbus records data about the system in certain situations. Such metadata
-+        can refer to the currently active process (creds, PIDs, current user
-+        groups, process names and its executable path, cgroup membership,
-+        capabilities, security label and audit information), connection
-+        information (description string, currently owned names) and time stamps.
-+      </para>
-+      <para>
-+        Metadata is collected at the following times.
-+      </para>
-+
-+      <itemizedlist>
-+        <listitem><para>
-+          When a bus is created (<constant>KDBUS_CMD_MAKE</constant>),
-+          information about the calling task is collected. This data is returned
-+          by the kernel via the <constant>KDBUS_CMD_BUS_CREATOR_INFO</constant>
-+          call.
-+        </para></listitem>
-+
-+        <listitem>
-+          <para>
-+            When a connection is created (<constant>KDBUS_CMD_HELLO</constant>),
-+            information about the calling task is collected. Alternatively, a
-+            privileged connection may provide 'faked' information about
-+            credentials, PIDs and security labels which will be stored instead.
-+            This data is returned by the kernel as information on a connection
-+            (<constant>KDBUS_CMD_CONN_INFO</constant>). Only metadata that a
-+            connection allowed to be sent (by setting its bit in
-+            <varname>attach_flags_send</varname>) will be exported in this way.
-+          </para>
-+        </listitem>
-+
-+        <listitem>
-+          <para>
-+            When a message is sent (<constant>KDBUS_CMD_SEND</constant>),
-+            information about the sending task and the sending connection is
-+            collected. This metadata will be attached to the message when it
-+            arrives in the receiver's pool. If the connection sending the
-+            message installed faked credentials (see
-+            <citerefentry>
-+              <refentrytitle>kdbus.connection</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>),
-+            the message will not be augmented by any information about the
-+            currently sending task. Note that only metadata that was requested
-+            by the receiving connection will be collected and attached to
-+            messages.
-+          </para>
-+        </listitem>
-+      </itemizedlist>
-+
-+      <para>
-+        Which metadata items are actually delivered depends on the following
-+        sets and masks:
-+      </para>
-+
-+      <itemizedlist>
-+        <listitem><para>
-+          (a) the system-wide kmod creds mask
-+          (module parameter <varname>attach_flags_mask</varname>)
-+        </para></listitem>
-+
-+        <listitem><para>
-+          (b) the per-connection send creds mask, set by the connecting client
-+        </para></listitem>
-+
-+        <listitem><para>
-+          (c) the per-connection receive creds mask, set by the connecting
-+          client
-+        </para></listitem>
-+
-+        <listitem><para>
-+          (d) the per-bus minimal creds mask, set by the bus creator
-+        </para></listitem>
-+
-+        <listitem><para>
-+          (e) the per-bus owner creds mask, set by the bus creator
-+        </para></listitem>
-+
-+        <listitem><para>
-+          (f) the mask specified when querying creds of a bus peer
-+        </para></listitem>
-+
-+        <listitem><para>
-+          (g) the mask specified when querying creds of a bus owner
-+        </para></listitem>
-+      </itemizedlist>
-+
-+      <para>
-+        With the following rules:
-+      </para>
-+
-+      <itemizedlist>
-+        <listitem>
-+          <para>
-+            [1] The creds attached to messages are determined as
-+            <constant>a &amp; b &amp; c</constant>.
-+          </para>
-+        </listitem>
-+
-+        <listitem>
-+          <para>
-+            [2] When connecting to a bus (<constant>KDBUS_CMD_HELLO</constant>)
-+            with <constant>~b &amp; d != 0</constant>, the call will fail with
-+            <errorcode>-1</errorcode>, and <varname>errno</varname> is set to
-+            <constant>ECONNREFUSED</constant>.
-+          </para>
-+        </listitem>
-+
-+        <listitem>
-+          <para>
-+            [3] When querying creds of a bus peer, the creds returned are
-+            <constant>a &amp; b &amp; f</constant>.
-+          </para>
-+        </listitem>
-+
-+        <listitem>
-+          <para>
-+            [4] When querying creds of a bus owner, the creds returned are
-+            <constant>a &amp; e &amp; g</constant>.
-+          </para>
-+        </listitem>
-+      </itemizedlist>
-+
-+      <para>
-+        Hence, programs might not always get all the metadata items they
-+        requested. Code must be written so that it can cope with this fact.
-+      </para>
-+    </refsect2>
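Editorial sketch (not part of the patch): rules [1] to [4] above reduce to a few bitwise expressions. The struct and function names below are hypothetical and do not appear in kdbus; only the mask letters (a) to (g) and the formulas are taken from the text. The assignment of (b) to the sender and (c) to the receiver follows the "attach_flags_send" and "requested by the receiving connection" wording above. Note that in C, rule [2] needs explicit parentheses, because `~b & d != 0` would otherwise parse as `~b & (d != 0)`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the masks (a) to (e) from the list above. */
struct creds_masks {
	uint64_t kmod;      /* (a) system-wide mask (attach_flags_mask) */
	uint64_t conn_send; /* (b) send mask of the connection being described */
	uint64_t conn_recv; /* (c) receive mask of the receiving connection */
	uint64_t bus_min;   /* (d) minimal mask required by the bus creator */
	uint64_t bus_owner; /* (e) owner mask set by the bus creator */
};

/* Rule [1]: metadata attached to a message is a & b & c. */
static uint64_t msg_creds(const struct creds_masks *m)
{
	return m->kmod & m->conn_send & m->conn_recv;
}

/* Rule [2]: HELLO is refused if the bus requires a bit the client won't send. */
static bool hello_refused(const struct creds_masks *m)
{
	return (~m->conn_send & m->bus_min) != 0;
}

/* Rule [3]: querying a bus peer with query mask (f) yields a & b & f. */
static uint64_t peer_creds(const struct creds_masks *m, uint64_t query)
{
	return m->kmod & m->conn_send & query;
}

/* Rule [4]: querying the bus owner with query mask (g) yields a & e & g. */
static uint64_t owner_creds(const struct creds_masks *m, uint64_t query)
{
	return m->kmod & m->bus_owner & query;
}
```

Because every result is an intersection, a caller can compare the returned mask against the one it asked for to detect which items were withheld.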
-+
-+    <refsect2>
-+      <title>Benefits and heads-up</title>
-+      <para>
-+        Attaching metadata to messages has two major benefits.
-+
-+        <itemizedlist>
-+          <listitem>
-+            <para>
-+              Metadata attached to messages is gathered at the moment when the
-+              other side calls <constant>KDBUS_CMD_SEND</constant>, or,
-+              respectively, when the kernel notification is generated. There is
-+              no need for the receiving peer to retrieve information about the
-+              task in a second step. This closes a race gap that would otherwise
-+              be inherent.
-+            </para>
-+          </listitem>
-+          <listitem>
-+            <para>
-+              As metadata is delivered along with messages in the same data
-+              blob, no extra calls to kernel functions etc. are needed to gather
-+              them.
-+            </para>
-+          </listitem>
-+        </itemizedlist>
-+
-+        Note, however, that collecting metadata comes at a performance cost,
-+        so developers should carefully assess which metadata they really need
-+        to opt in to. As a best practice, data that is not needed as part
-+        of a message should not be requested by the connection in the first
-+        place (see <varname>attach_flags_recv</varname> in
-+        <constant>KDBUS_CMD_HELLO</constant>).
-+      </para>
-+    </refsect2>
-+
-+    <refsect2>
-+      <title>Attach flags for metadata items</title>
-+      <para>
-+        To tell the kernel which metadata to attach as items to the
-+        aforementioned commands, a bitmask is used. The following
-+        <emphasis>attach flags</emphasis> are currently supported.
-+        Both the <varname>attach_flags_recv</varname> and
-+        <varname>attach_flags_send</varname> fields of
-+        <type>struct kdbus_cmd_hello</type>, as well as the payload of the
-+        <constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant> and
-+        <constant>KDBUS_ITEM_ATTACH_FLAGS_RECV</constant> items follow this
-+        scheme.
-+      </para>
-+
-+      <variablelist>
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_TIMESTAMP</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_TIMESTAMP</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_CREDS</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_CREDS</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_PIDS</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_PIDS</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_AUXGROUPS</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_AUXGROUPS</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_NAMES</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_OWNED_NAME</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_TID_COMM</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_TID_COMM</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_PID_COMM</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_PID_COMM</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_EXE</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_EXE</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_CMDLINE</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_CMDLINE</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_CGROUP</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_CGROUP</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_CAPS</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_CAPS</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_SECLABEL</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_SECLABEL</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_AUDIT</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_AUDIT</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+
-+        <varlistentry>
-+          <term><constant>KDBUS_ATTACH_CONN_DESCRIPTION</constant></term>
-+            <listitem><para>
-+              Requests the attachment of an item of type
-+              <constant>KDBUS_ITEM_CONN_DESCRIPTION</constant>.
-+            </para></listitem>
-+        </varlistentry>
-+      </variablelist>
-+
-+      <para>
-+        Please refer to
-+        <citerefentry>
-+          <refentrytitle>kdbus.item</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+        for detailed information about the layout and payload of items and
-+        what the metadata should be used for.
-+      </para>
-+    </refsect2>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>The ioctl interface</title>
-+
-+    <para>
-+      As stated in the 'synopsis' section above, application developers are
-+      strongly encouraged to use kdbus through one of the high-level D-Bus
-+      abstraction libraries, rather than using the low-level API directly.
-+    </para>
-+
-+    <para>
-+      kdbus on the kernel level exposes its functions exclusively through
-+      <citerefentry>
-+        <refentrytitle>ioctl</refentrytitle>
-+        <manvolnum>2</manvolnum>
-+      </citerefentry>,
-+      employed on file descriptors returned by
-+      <citerefentry>
-+        <refentrytitle>open</refentrytitle>
-+        <manvolnum>2</manvolnum>
-+      </citerefentry>
-+      on pseudo files exposed by
-+      <citerefentry>
-+        <refentrytitle>kdbus.fs</refentrytitle>
-+        <manvolnum>7</manvolnum>
-+      </citerefentry>.
-+    </para>
-+    <para>
-+      The following table lists all ioctls, along with the command structs
-+      they must be used with.
-+    </para>
-+
-+    <informaltable frame="none">
-+      <tgroup cols="3" colsep="1">
-+        <thead>
-+          <row>
-+            <entry>ioctl signature</entry>
-+            <entry>command</entry>
-+            <entry>transported struct</entry>
-+          </row>
-+        </thead>
-+        <tbody>
-+          <row>
-+            <entry><constant>0x40189500</constant></entry>
-+            <entry><constant>KDBUS_CMD_BUS_MAKE</constant></entry>
-+            <entry><type>struct kdbus_cmd *</type></entry>
-+          </row><row>
-+            <entry><constant>0x40189510</constant></entry>
-+            <entry><constant>KDBUS_CMD_ENDPOINT_MAKE</constant></entry>
-+            <entry><type>struct kdbus_cmd *</type></entry>
-+          </row><row>
-+            <entry><constant>0xc0609580</constant></entry>
-+            <entry><constant>KDBUS_CMD_HELLO</constant></entry>
-+            <entry><type>struct kdbus_cmd_hello *</type></entry>
-+          </row><row>
-+            <entry><constant>0x40189582</constant></entry>
-+            <entry><constant>KDBUS_CMD_BYEBYE</constant></entry>
-+            <entry><type>struct kdbus_cmd *</type></entry>
-+          </row><row>
-+            <entry><constant>0x40389590</constant></entry>
-+            <entry><constant>KDBUS_CMD_SEND</constant></entry>
-+            <entry><type>struct kdbus_cmd_send *</type></entry>
-+          </row><row>
-+            <entry><constant>0x80409591</constant></entry>
-+            <entry><constant>KDBUS_CMD_RECV</constant></entry>
-+            <entry><type>struct kdbus_cmd_recv *</type></entry>
-+          </row><row>
-+            <entry><constant>0x40209583</constant></entry>
-+            <entry><constant>KDBUS_CMD_FREE</constant></entry>
-+            <entry><type>struct kdbus_cmd_free *</type></entry>
-+          </row><row>
-+            <entry><constant>0x401895a0</constant></entry>
-+            <entry><constant>KDBUS_CMD_NAME_ACQUIRE</constant></entry>
-+            <entry><type>struct kdbus_cmd *</type></entry>
-+          </row><row>
-+            <entry><constant>0x401895a1</constant></entry>
-+            <entry><constant>KDBUS_CMD_NAME_RELEASE</constant></entry>
-+            <entry><type>struct kdbus_cmd *</type></entry>
-+          </row><row>
-+            <entry><constant>0x80289586</constant></entry>
-+            <entry><constant>KDBUS_CMD_LIST</constant></entry>
-+            <entry><type>struct kdbus_cmd_list *</type></entry>
-+          </row><row>
-+            <entry><constant>0x80309584</constant></entry>
-+            <entry><constant>KDBUS_CMD_CONN_INFO</constant></entry>
-+            <entry><type>struct kdbus_cmd_info *</type></entry>
-+          </row><row>
-+            <entry><constant>0x40209551</constant></entry>
-+            <entry><constant>KDBUS_CMD_UPDATE</constant></entry>
-+            <entry><type>struct kdbus_cmd *</type></entry>
-+          </row><row>
-+            <entry><constant>0x80309585</constant></entry>
-+            <entry><constant>KDBUS_CMD_BUS_CREATOR_INFO</constant></entry>
-+            <entry><type>struct kdbus_cmd_info *</type></entry>
-+          </row><row>
-+            <entry><constant>0x40189511</constant></entry>
-+            <entry><constant>KDBUS_CMD_ENDPOINT_UPDATE</constant></entry>
-+            <entry><type>struct kdbus_cmd *</type></entry>
-+          </row><row>
-+            <entry><constant>0x402095b0</constant></entry>
-+            <entry><constant>KDBUS_CMD_MATCH_ADD</constant></entry>
-+            <entry><type>struct kdbus_cmd_match *</type></entry>
-+          </row><row>
-+            <entry><constant>0x402095b1</constant></entry>
-+            <entry><constant>KDBUS_CMD_MATCH_REMOVE</constant></entry>
-+            <entry><type>struct kdbus_cmd_match *</type></entry>
-+          </row>
-+        </tbody>
-+      </tgroup>
-+    </informaltable>
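Editorial sketch (not part of the patch): the values in the "ioctl signature" column can be taken apart to cross-check the other two columns. The decoder below assumes the generic Linux ioctl encoding (direction in bits 30-31, struct size in bits 16-29, magic in bits 8-15, command number in bits 0-7), as used on x86 and ARM; some architectures lay out the direction and size bits differently. The names `ioctl_fields` and `ioctl_decode` are hypothetical helpers, not kdbus API.

```c
#include <stdint.h>

/* Decoded parts of an ioctl request number (asm-generic layout assumed). */
struct ioctl_fields {
	unsigned dir;  /* 1 = userspace writes, 2 = userspace reads, 3 = both */
	unsigned size; /* size in bytes of the transported struct */
	unsigned type; /* magic number, 0x95 (KDBUS_IOCTL_MAGIC) for kdbus */
	unsigned nr;   /* command number within the magic */
};

static struct ioctl_fields ioctl_decode(uint32_t cmd)
{
	struct ioctl_fields f;

	f.dir  = (cmd >> 30) & 0x3;
	f.size = (cmd >> 16) & 0x3fff;
	f.type = (cmd >> 8) & 0xff;
	f.nr   = cmd & 0xff;
	return f;
}
```

Under that assumption, `0x40189500` (`KDBUS_CMD_BUS_MAKE`) decodes to a write-only command with a 24-byte struct, magic `0x95` and number `0x00`, while `0xc0609580` (`KDBUS_CMD_HELLO`) is read-write with a 96-byte struct and number `0x80`.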
-+
-+    <para>
-+      Depending on the type of <emphasis>kdbusfs</emphasis> node that was
-+      opened and what ioctls have been executed on a file descriptor before,
-+      a different sub-set of ioctl commands is allowed.
-+    </para>
-+
-+    <itemizedlist>
-+      <listitem>
-+        <para>
-+          On a file descriptor resulting from opening a
-+          <emphasis>control node</emphasis>, only the
-+          <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl may be executed.
-+        </para>
-+      </listitem>
-+      <listitem>
-+        <para>
-+          On a file descriptor resulting from opening a
-+          <emphasis>bus endpoint node</emphasis>, only the
-+          <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> and
-+          <constant>KDBUS_CMD_HELLO</constant> ioctls may be executed.
-+        </para>
-+      </listitem>
-+      <listitem>
-+        <para>
-+          A file descriptor that was used to create a bus
-+          (via <constant>KDBUS_CMD_BUS_MAKE</constant>) is called a
-+          <emphasis>bus owner</emphasis> file descriptor. The bus will be
-+          active as long as the file descriptor is kept open.
-+          A bus owner file descriptor cannot be used to
-+          issue any further ioctls. As soon as
-+          <citerefentry>
-+            <refentrytitle>close</refentrytitle>
-+            <manvolnum>2</manvolnum>
-+          </citerefentry>
-+          is called on it, the bus will be shut down, along with all associated
-+          endpoints and connections. See
-+          <citerefentry>
-+            <refentrytitle>kdbus.bus</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>
-+          for more details.
-+        </para>
-+      </listitem>
-+      <listitem>
-+        <para>
-+          A file descriptor that was used to create an endpoint
-+          (via <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>) is called an
-+          <emphasis>endpoint owner</emphasis> file descriptor. The endpoint
-+          will be active as long as the file descriptor is kept open.
-+          An endpoint owner file descriptor can only be used
-+          to update details of an endpoint through the
-+          <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant> ioctl. As soon as
-+          <citerefentry>
-+            <refentrytitle>close</refentrytitle>
-+            <manvolnum>2</manvolnum>
-+          </citerefentry>
-+          is called on it, the endpoint will be removed from the bus, and all
-+          connections that are connected to the bus through it are shut down.
-+          See
-+          <citerefentry>
-+            <refentrytitle>kdbus.endpoint</refentrytitle>
-+            <manvolnum>7</manvolnum>
-+          </citerefentry>
-+          for more details.
-+        </para>
-+      </listitem>
-+      <listitem>
-+        <para>
-+          A file descriptor that was used to create a connection
-+          (via <constant>KDBUS_CMD_HELLO</constant>) is called a
-+          <emphasis>connection owner</emphasis> file descriptor. The connection
-+          will be active as long as the file descriptor is kept open.
-+          A connection owner file descriptor may be used to
-+          issue any of the following ioctls.
-+        </para>
-+
-+        <itemizedlist>
-+          <listitem><para>
-+            <constant>KDBUS_CMD_UPDATE</constant> to tweak details of the
-+            connection. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.connection</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>.
-+          </para></listitem>
-+
-+          <listitem><para>
-+            <constant>KDBUS_CMD_BYEBYE</constant> to shut down a connection
-+            without losing messages. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.connection</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>.
-+          </para></listitem>
-+
-+          <listitem><para>
-+            <constant>KDBUS_CMD_FREE</constant> to free a slice of memory in
-+            the pool. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.pool</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>.
-+          </para></listitem>
-+
-+          <listitem><para>
-+            <constant>KDBUS_CMD_CONN_INFO</constant> to retrieve information
-+            on other connections on the bus. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.connection</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>.
-+          </para></listitem>
-+
-+          <listitem><para>
-+            <constant>KDBUS_CMD_BUS_CREATOR_INFO</constant> to retrieve
-+            information on the bus creator. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.connection</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>.
-+          </para></listitem>
-+
-+          <listitem><para>
-+            <constant>KDBUS_CMD_LIST</constant> to retrieve a list of
-+            currently active well-known names and unique IDs on the bus. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.name</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>.
-+          </para></listitem>
-+
-+          <listitem><para>
-+            <constant>KDBUS_CMD_SEND</constant> and
-+            <constant>KDBUS_CMD_RECV</constant> to send or receive a message.
-+            See
-+            <citerefentry>
-+              <refentrytitle>kdbus.message</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>.
-+          </para></listitem>
-+
-+          <listitem><para>
-+            <constant>KDBUS_CMD_NAME_ACQUIRE</constant> and
-+            <constant>KDBUS_CMD_NAME_RELEASE</constant> to acquire or release
-+            a well-known name on the bus. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.name</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>.
-+          </para></listitem>
-+
-+          <listitem><para>
-+            <constant>KDBUS_CMD_MATCH_ADD</constant> and
-+            <constant>KDBUS_CMD_MATCH_REMOVE</constant> to add or remove
-+            a match for signal messages. See
-+            <citerefentry>
-+              <refentrytitle>kdbus.match</refentrytitle>
-+              <manvolnum>7</manvolnum>
-+            </citerefentry>.
-+          </para></listitem>
-+        </itemizedlist>
-+      </listitem>
-+    </itemizedlist>
-+
-+    <para>
-+      These ioctls, along with the structs they transport, are explained in
-+      detail in the other documents linked to in the "See Also" section below.
-+    </para>
-+  </refsect1>
-+
-+  <refsect1>
-+    <title>See Also</title>
-+    <simplelist type="inline">
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.bus</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.connection</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.endpoint</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.fs</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.item</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.message</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.name</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>kdbus.pool</refentrytitle>
-+          <manvolnum>7</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>ioctl</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>mmap</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>open</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <citerefentry>
-+          <refentrytitle>close</refentrytitle>
-+          <manvolnum>2</manvolnum>
-+        </citerefentry>
-+      </member>
-+      <member>
-+        <ulink url="http://freedesktop.org/wiki/Software/dbus">D-Bus</ulink>
-+      </member>
-+    </simplelist>
-+  </refsect1>
-+
-+</refentry>
-diff --git a/Documentation/kdbus/stylesheet.xsl b/Documentation/kdbus/stylesheet.xsl
-new file mode 100644
-index 0000000..52565ea
---- /dev/null
-+++ b/Documentation/kdbus/stylesheet.xsl
-@@ -0,0 +1,16 @@
-+<?xml version="1.0" encoding="UTF-8"?>
-+<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="1.0">
-+	<param name="chunk.quietly">1</param>
-+	<param name="funcsynopsis.style">ansi</param>
-+	<param name="funcsynopsis.tabular.threshold">80</param>
-+	<param name="callout.graphics">0</param>
-+	<param name="paper.type">A4</param>
-+	<param name="generate.section.toc.level">2</param>
-+	<param name="use.id.as.filename">1</param>
-+	<param name="citerefentry.link">1</param>
-+	<strip-space elements="*"/>
-+	<template name="generate.citerefentry.link">
-+		<value-of select="refentrytitle"/>
-+		<text>.html</text>
-+	</template>
-+</stylesheet>
-diff --git a/MAINTAINERS b/MAINTAINERS
-index d8afd29..02f7668 100644
---- a/MAINTAINERS
-+++ b/MAINTAINERS
-@@ -5585,6 +5585,19 @@ S:	Maintained
- F:	Documentation/kbuild/kconfig-language.txt
- F:	scripts/kconfig/
- 
-+KDBUS
-+M:	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+M:	Daniel Mack <daniel@zonque.org>
-+M:	David Herrmann <dh.herrmann@googlemail.com>
-+M:	Djalal Harouni <tixxdz@opendz.org>
-+L:	linux-kernel@vger.kernel.org
-+S:	Maintained
-+F:	ipc/kdbus/*
-+F:	samples/kdbus/*
-+F:	Documentation/kdbus/*
-+F:	include/uapi/linux/kdbus.h
-+F:	tools/testing/selftests/kdbus/
-+
- KDUMP
- M:	Vivek Goyal <vgoyal@redhat.com>
- M:	Haren Myneni <hbabu@us.ibm.com>
-diff --git a/Makefile b/Makefile
-index f5c8983..a1c8d57 100644
---- a/Makefile
-+++ b/Makefile
-@@ -1343,6 +1343,7 @@ $(help-board-dirs): help-%:
- %docs: scripts_basic FORCE
- 	$(Q)$(MAKE) $(build)=scripts build_docproc
- 	$(Q)$(MAKE) $(build)=Documentation/DocBook $@
-+	$(Q)$(MAKE) $(build)=Documentation/kdbus $@
- 
- else # KBUILD_EXTMOD
- 
-diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
-index 1a0006a..4842a98 100644
---- a/include/uapi/linux/Kbuild
-+++ b/include/uapi/linux/Kbuild
-@@ -215,6 +215,7 @@ header-y += ixjuser.h
- header-y += jffs2.h
- header-y += joystick.h
- header-y += kcmp.h
-+header-y += kdbus.h
- header-y += kdev_t.h
- header-y += kd.h
- header-y += kernelcapi.h
-diff --git a/include/uapi/linux/kdbus.h b/include/uapi/linux/kdbus.h
-new file mode 100644
-index 0000000..4fc44cb
---- /dev/null
-+++ b/include/uapi/linux/kdbus.h
-@@ -0,0 +1,984 @@
-+/*
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef _UAPI_KDBUS_H_
-+#define _UAPI_KDBUS_H_
-+
-+#include <linux/ioctl.h>
-+#include <linux/types.h>
-+
-+#define KDBUS_IOCTL_MAGIC		0x95
-+#define KDBUS_SRC_ID_KERNEL		(0)
-+#define KDBUS_DST_ID_NAME		(0)
-+#define KDBUS_MATCH_ID_ANY		(~0ULL)
-+#define KDBUS_DST_ID_BROADCAST		(~0ULL)
-+#define KDBUS_FLAG_NEGOTIATE		(1ULL << 63)
-+
-+/**
-+ * struct kdbus_notify_id_change - name registry change message
-+ * @id:			New or former owner of the name
-+ * @flags:		flags field from KDBUS_HELLO_*
-+ *
-+ * Sent from kernel to userspace when the owner or activator of
-+ * a well-known name changes.
-+ *
-+ * Attached to:
-+ *   KDBUS_ITEM_ID_ADD
-+ *   KDBUS_ITEM_ID_REMOVE
-+ */
-+struct kdbus_notify_id_change {
-+	__u64 id;
-+	__u64 flags;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_notify_name_change - name registry change message
-+ * @old_id:		ID and flags of former owner of a name
-+ * @new_id:		ID and flags of new owner of a name
-+ * @name:		Well-known name
-+ *
-+ * Sent from kernel to userspace when the owner or activator of
-+ * a well-known name changes.
-+ *
-+ * Attached to:
-+ *   KDBUS_ITEM_NAME_ADD
-+ *   KDBUS_ITEM_NAME_REMOVE
-+ *   KDBUS_ITEM_NAME_CHANGE
-+ */
-+struct kdbus_notify_name_change {
-+	struct kdbus_notify_id_change old_id;
-+	struct kdbus_notify_id_change new_id;
-+	char name[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_creds - process credentials
-+ * @uid:		User ID
-+ * @euid:		Effective UID
-+ * @suid:		Saved UID
-+ * @fsuid:		Filesystem UID
-+ * @gid:		Group ID
-+ * @egid:		Effective GID
-+ * @sgid:		Saved GID
-+ * @fsgid:		Filesystem GID
-+ *
-+ * Attached to:
-+ *   KDBUS_ITEM_CREDS
-+ */
-+struct kdbus_creds {
-+	__u64 uid;
-+	__u64 euid;
-+	__u64 suid;
-+	__u64 fsuid;
-+	__u64 gid;
-+	__u64 egid;
-+	__u64 sgid;
-+	__u64 fsgid;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_pids - process identifiers
-+ * @pid:		Process ID
-+ * @tid:		Thread ID
-+ * @ppid:		Parent process ID
-+ *
-+ * The PID, TID and parent PID of a process.
-+ *
-+ * Attached to:
-+ *   KDBUS_ITEM_PIDS
-+ */
-+struct kdbus_pids {
-+	__u64 pid;
-+	__u64 tid;
-+	__u64 ppid;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_caps - process capabilities
-+ * @last_cap:	Highest currently known capability bit
-+ * @caps:	Variable number of 32-bit capabilities flags
-+ *
-+ * Contains a variable number of 32-bit capabilities flags.
-+ *
-+ * Attached to:
-+ *   KDBUS_ITEM_CAPS
-+ */
-+struct kdbus_caps {
-+	__u32 last_cap;
-+	__u32 caps[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_audit - audit information
-+ * @sessionid:		The audit session ID
-+ * @loginuid:		The audit login uid
-+ *
-+ * Attached to:
-+ *   KDBUS_ITEM_AUDIT
-+ */
-+struct kdbus_audit {
-+	__u32 sessionid;
-+	__u32 loginuid;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_timestamp
-+ * @seqnum:		Global per-domain message sequence number
-+ * @monotonic_ns:	Monotonic timestamp, in nanoseconds
-+ * @realtime_ns:	Realtime timestamp, in nanoseconds
-+ *
-+ * Attached to:
-+ *   KDBUS_ITEM_TIMESTAMP
-+ */
-+struct kdbus_timestamp {
-+	__u64 seqnum;
-+	__u64 monotonic_ns;
-+	__u64 realtime_ns;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_vec - I/O vector for kdbus payload items
-+ * @size:		The size of the vector
-+ * @address:		Memory address of data buffer
-+ * @offset:		Offset in the in-message payload memory,
-+ *			relative to the message head
-+ *
-+ * Attached to:
-+ *   KDBUS_ITEM_PAYLOAD_VEC, KDBUS_ITEM_PAYLOAD_OFF
-+ */
-+struct kdbus_vec {
-+	__u64 size;
-+	union {
-+		__u64 address;
-+		__u64 offset;
-+	};
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_bloom_parameter - bus-wide bloom parameters
-+ * @size:		Size of the bit field in bytes (m / 8)
-+ * @n_hash:		Number of hash functions used (k)
-+ */
-+struct kdbus_bloom_parameter {
-+	__u64 size;
-+	__u64 n_hash;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_bloom_filter - bloom filter containing n elements
-+ * @generation:		Generation of the element set in the filter
-+ * @data:		Bit field, multiple of 8 bytes
-+ */
-+struct kdbus_bloom_filter {
-+	__u64 generation;
-+	__u64 data[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_memfd - a kdbus memfd
-+ * @start:		The offset into the memfd where the segment starts
-+ * @size:		The size of the memfd segment
-+ * @fd:			The file descriptor number
-+ * @__pad:		Padding to ensure proper alignment and size
-+ *
-+ * Attached to:
-+ *   KDBUS_ITEM_PAYLOAD_MEMFD
-+ */
-+struct kdbus_memfd {
-+	__u64 start;
-+	__u64 size;
-+	int fd;
-+	__u32 __pad;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_name - a registered well-known name with its flags
-+ * @flags:		Flags from KDBUS_NAME_*
-+ * @name:		Well-known name
-+ *
-+ * Attached to:
-+ *   KDBUS_ITEM_OWNED_NAME
-+ */
-+struct kdbus_name {
-+	__u64 flags;
-+	char name[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_policy_access_type - permissions of a policy record
-+ * @_KDBUS_POLICY_ACCESS_NULL:	Uninitialized/invalid
-+ * @KDBUS_POLICY_ACCESS_USER:	Grant access to a uid
-+ * @KDBUS_POLICY_ACCESS_GROUP:	Grant access to gid
-+ * @KDBUS_POLICY_ACCESS_WORLD:	World-accessible
-+ */
-+enum kdbus_policy_access_type {
-+	_KDBUS_POLICY_ACCESS_NULL,
-+	KDBUS_POLICY_ACCESS_USER,
-+	KDBUS_POLICY_ACCESS_GROUP,
-+	KDBUS_POLICY_ACCESS_WORLD,
-+};
-+
-+/**
-+ * enum kdbus_policy_type - mode flags
-+ * @KDBUS_POLICY_OWN:		Allow to own a well-known name
-+ *				Implies KDBUS_POLICY_TALK and KDBUS_POLICY_SEE
-+ * @KDBUS_POLICY_TALK:		Allow communication to a well-known name
-+ *				Implies KDBUS_POLICY_SEE
-+ * @KDBUS_POLICY_SEE:		Allow to see a well-known name
-+ */
-+enum kdbus_policy_type {
-+	KDBUS_POLICY_SEE	= 0,
-+	KDBUS_POLICY_TALK,
-+	KDBUS_POLICY_OWN,
-+};
-+
-+/**
-+ * struct kdbus_policy_access - policy access item
-+ * @type:		One of KDBUS_POLICY_ACCESS_* types
-+ * @access:		Access to grant
-+ * @id:			For KDBUS_POLICY_ACCESS_USER, the uid
-+ *			For KDBUS_POLICY_ACCESS_GROUP, the gid
-+ */
-+struct kdbus_policy_access {
-+	__u64 type;	/* USER, GROUP, WORLD */
-+	__u64 access;	/* OWN, TALK, SEE */
-+	__u64 id;	/* uid, gid, 0 */
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_attach_flags - flags for metadata attachments
-+ * @KDBUS_ATTACH_TIMESTAMP:		Timestamp
-+ * @KDBUS_ATTACH_CREDS:			Credentials
-+ * @KDBUS_ATTACH_PIDS:			PIDs
-+ * @KDBUS_ATTACH_AUXGROUPS:		Auxiliary groups
-+ * @KDBUS_ATTACH_NAMES:			Well-known names
-+ * @KDBUS_ATTACH_TID_COMM:		The "comm" process identifier of the TID
-+ * @KDBUS_ATTACH_PID_COMM:		The "comm" process identifier of the PID
-+ * @KDBUS_ATTACH_EXE:			The path of the executable
-+ * @KDBUS_ATTACH_CMDLINE:		The process command line
-+ * @KDBUS_ATTACH_CGROUP:		The cgroup membership
-+ * @KDBUS_ATTACH_CAPS:			The process capabilities
-+ * @KDBUS_ATTACH_SECLABEL:		The security label
-+ * @KDBUS_ATTACH_AUDIT:			The audit IDs
-+ * @KDBUS_ATTACH_CONN_DESCRIPTION:	The human-readable connection name
-+ * @_KDBUS_ATTACH_ALL:			All of the above
-+ * @_KDBUS_ATTACH_ANY:			Wildcard match to enable any kind of
-+ *					metadata.
-+ */
-+enum kdbus_attach_flags {
-+	KDBUS_ATTACH_TIMESTAMP		=  1ULL <<  0,
-+	KDBUS_ATTACH_CREDS		=  1ULL <<  1,
-+	KDBUS_ATTACH_PIDS		=  1ULL <<  2,
-+	KDBUS_ATTACH_AUXGROUPS		=  1ULL <<  3,
-+	KDBUS_ATTACH_NAMES		=  1ULL <<  4,
-+	KDBUS_ATTACH_TID_COMM		=  1ULL <<  5,
-+	KDBUS_ATTACH_PID_COMM		=  1ULL <<  6,
-+	KDBUS_ATTACH_EXE		=  1ULL <<  7,
-+	KDBUS_ATTACH_CMDLINE		=  1ULL <<  8,
-+	KDBUS_ATTACH_CGROUP		=  1ULL <<  9,
-+	KDBUS_ATTACH_CAPS		=  1ULL << 10,
-+	KDBUS_ATTACH_SECLABEL		=  1ULL << 11,
-+	KDBUS_ATTACH_AUDIT		=  1ULL << 12,
-+	KDBUS_ATTACH_CONN_DESCRIPTION	=  1ULL << 13,
-+	_KDBUS_ATTACH_ALL		=  (1ULL << 14) - 1,
-+	_KDBUS_ATTACH_ANY		=  ~0ULL
-+};
-+
-+/**
-+ * enum kdbus_item_type - item types to chain data in a list
-+ * @_KDBUS_ITEM_NULL:			Uninitialized/invalid
-+ * @_KDBUS_ITEM_USER_BASE:		Start of user items
-+ * @KDBUS_ITEM_NEGOTIATE:		Negotiate supported items
-+ * @KDBUS_ITEM_PAYLOAD_VEC:		Vector to data
-+ * @KDBUS_ITEM_PAYLOAD_OFF:		Data at returned offset to message head
-+ * @KDBUS_ITEM_PAYLOAD_MEMFD:		Data as sealed memfd
-+ * @KDBUS_ITEM_FDS:			Attached file descriptors
-+ * @KDBUS_ITEM_CANCEL_FD:		FD used to cancel a synchronous
-+ *					operation by writing to it from
-+ *					userspace
-+ * @KDBUS_ITEM_BLOOM_PARAMETER:		Bus-wide bloom parameters, used with
-+ *					KDBUS_CMD_BUS_MAKE, carries a
-+ *					struct kdbus_bloom_parameter
-+ * @KDBUS_ITEM_BLOOM_FILTER:		Bloom filter carried with a message,
-+ *					used to match against a bloom mask of a
-+ *					connection, carries a struct
-+ *					kdbus_bloom_filter
-+ * @KDBUS_ITEM_BLOOM_MASK:		Bloom mask used to match against a
-+ *					message's bloom filter
-+ * @KDBUS_ITEM_DST_NAME:		Destination's well-known name
-+ * @KDBUS_ITEM_MAKE_NAME:		Name of domain, bus, endpoint
-+ * @KDBUS_ITEM_ATTACH_FLAGS_SEND:	Attach-flags, used for updating which
-+ *					metadata a connection opts in to send
-+ * @KDBUS_ITEM_ATTACH_FLAGS_RECV:	Attach-flags, used for updating which
-+ *					metadata a connection requests to
-+ *					receive for each received message
-+ * @KDBUS_ITEM_ID:			Connection ID
-+ * @KDBUS_ITEM_NAME:			Well-known name with flags
-+ * @_KDBUS_ITEM_ATTACH_BASE:		Start of metadata attach items
-+ * @KDBUS_ITEM_TIMESTAMP:		Timestamp
-+ * @KDBUS_ITEM_CREDS:			Process credentials
-+ * @KDBUS_ITEM_PIDS:			Process identifiers
-+ * @KDBUS_ITEM_AUXGROUPS:		Auxiliary process groups
-+ * @KDBUS_ITEM_OWNED_NAME:		A name owned by the associated
-+ *					connection
-+ * @KDBUS_ITEM_TID_COMM:		Thread ID "comm" identifier
-+ *					(Don't trust this, see below.)
-+ * @KDBUS_ITEM_PID_COMM:		Process ID "comm" identifier
-+ *					(Don't trust this, see below.)
-+ * @KDBUS_ITEM_EXE:			The path of the executable
-+ *					(Don't trust this, see below.)
-+ * @KDBUS_ITEM_CMDLINE:			The process command line
-+ *					(Don't trust this, see below.)
-+ * @KDBUS_ITEM_CGROUP:			The cgroup membership
-+ * @KDBUS_ITEM_CAPS:			The process capabilities
-+ * @KDBUS_ITEM_SECLABEL:		The security label
-+ * @KDBUS_ITEM_AUDIT:			The audit IDs
-+ * @KDBUS_ITEM_CONN_DESCRIPTION:	The connection's human-readable name
-+ *					(debugging)
-+ * @_KDBUS_ITEM_POLICY_BASE:		Start of policy items
-+ * @KDBUS_ITEM_POLICY_ACCESS:		Policy access block
-+ * @_KDBUS_ITEM_KERNEL_BASE:		Start of kernel-generated message items
-+ * @KDBUS_ITEM_NAME_ADD:		Notification in kdbus_notify_name_change
-+ * @KDBUS_ITEM_NAME_REMOVE:		Notification in kdbus_notify_name_change
-+ * @KDBUS_ITEM_NAME_CHANGE:		Notification in kdbus_notify_name_change
-+ * @KDBUS_ITEM_ID_ADD:			Notification in kdbus_notify_id_change
-+ * @KDBUS_ITEM_ID_REMOVE:		Notification in kdbus_notify_id_change
-+ * @KDBUS_ITEM_REPLY_TIMEOUT:		Timeout has been reached
-+ * @KDBUS_ITEM_REPLY_DEAD:		Destination died
-+ *
-+ * N.B: The process and thread COMM fields, as well as the CMDLINE and
-+ * EXE fields may be altered by unprivileged processes and should
-+ * hence *not* be used for security decisions. Peers should make use of
-+ * these items only for informational purposes, such as generating log
-+ * records.
-+ */
-+enum kdbus_item_type {
-+	_KDBUS_ITEM_NULL,
-+	_KDBUS_ITEM_USER_BASE,
-+	KDBUS_ITEM_NEGOTIATE	= _KDBUS_ITEM_USER_BASE,
-+	KDBUS_ITEM_PAYLOAD_VEC,
-+	KDBUS_ITEM_PAYLOAD_OFF,
-+	KDBUS_ITEM_PAYLOAD_MEMFD,
-+	KDBUS_ITEM_FDS,
-+	KDBUS_ITEM_CANCEL_FD,
-+	KDBUS_ITEM_BLOOM_PARAMETER,
-+	KDBUS_ITEM_BLOOM_FILTER,
-+	KDBUS_ITEM_BLOOM_MASK,
-+	KDBUS_ITEM_DST_NAME,
-+	KDBUS_ITEM_MAKE_NAME,
-+	KDBUS_ITEM_ATTACH_FLAGS_SEND,
-+	KDBUS_ITEM_ATTACH_FLAGS_RECV,
-+	KDBUS_ITEM_ID,
-+	KDBUS_ITEM_NAME,
-+	KDBUS_ITEM_DST_ID,
-+
-+	/* keep these item types in sync with KDBUS_ATTACH_* flags */
-+	_KDBUS_ITEM_ATTACH_BASE	= 0x1000,
-+	KDBUS_ITEM_TIMESTAMP	= _KDBUS_ITEM_ATTACH_BASE,
-+	KDBUS_ITEM_CREDS,
-+	KDBUS_ITEM_PIDS,
-+	KDBUS_ITEM_AUXGROUPS,
-+	KDBUS_ITEM_OWNED_NAME,
-+	KDBUS_ITEM_TID_COMM,
-+	KDBUS_ITEM_PID_COMM,
-+	KDBUS_ITEM_EXE,
-+	KDBUS_ITEM_CMDLINE,
-+	KDBUS_ITEM_CGROUP,
-+	KDBUS_ITEM_CAPS,
-+	KDBUS_ITEM_SECLABEL,
-+	KDBUS_ITEM_AUDIT,
-+	KDBUS_ITEM_CONN_DESCRIPTION,
-+
-+	_KDBUS_ITEM_POLICY_BASE	= 0x2000,
-+	KDBUS_ITEM_POLICY_ACCESS = _KDBUS_ITEM_POLICY_BASE,
-+
-+	_KDBUS_ITEM_KERNEL_BASE	= 0x8000,
-+	KDBUS_ITEM_NAME_ADD	= _KDBUS_ITEM_KERNEL_BASE,
-+	KDBUS_ITEM_NAME_REMOVE,
-+	KDBUS_ITEM_NAME_CHANGE,
-+	KDBUS_ITEM_ID_ADD,
-+	KDBUS_ITEM_ID_REMOVE,
-+	KDBUS_ITEM_REPLY_TIMEOUT,
-+	KDBUS_ITEM_REPLY_DEAD,
-+};
-+
-+/**
-+ * struct kdbus_item - chain of data blocks
-+ * @size:		Overall data record size
-+ * @type:		Kdbus_item type of data
-+ * @data:		Generic bytes
-+ * @data32:		Generic 32 bit array
-+ * @data64:		Generic 64 bit array
-+ * @str:		Generic string
-+ * @id:			Connection ID
-+ * @vec:		KDBUS_ITEM_PAYLOAD_VEC
-+ * @creds:		KDBUS_ITEM_CREDS
-+ * @audit:		KDBUS_ITEM_AUDIT
-+ * @timestamp:		KDBUS_ITEM_TIMESTAMP
-+ * @name:		KDBUS_ITEM_NAME
-+ * @bloom_parameter:	KDBUS_ITEM_BLOOM_PARAMETER
-+ * @bloom_filter:	KDBUS_ITEM_BLOOM_FILTER
-+ * @memfd:		KDBUS_ITEM_PAYLOAD_MEMFD
-+ * @name_change:	KDBUS_ITEM_NAME_ADD
-+ *			KDBUS_ITEM_NAME_REMOVE
-+ *			KDBUS_ITEM_NAME_CHANGE
-+ * @id_change:		KDBUS_ITEM_ID_ADD
-+ *			KDBUS_ITEM_ID_REMOVE
-+ * @policy_access:	KDBUS_ITEM_POLICY_ACCESS
-+ */
-+struct kdbus_item {
-+	__u64 size;
-+	__u64 type;
-+	union {
-+		__u8 data[0];
-+		__u32 data32[0];
-+		__u64 data64[0];
-+		char str[0];
-+
-+		__u64 id;
-+		struct kdbus_vec vec;
-+		struct kdbus_creds creds;
-+		struct kdbus_pids pids;
-+		struct kdbus_audit audit;
-+		struct kdbus_caps caps;
-+		struct kdbus_timestamp timestamp;
-+		struct kdbus_name name;
-+		struct kdbus_bloom_parameter bloom_parameter;
-+		struct kdbus_bloom_filter bloom_filter;
-+		struct kdbus_memfd memfd;
-+		int fds[0];
-+		struct kdbus_notify_name_change name_change;
-+		struct kdbus_notify_id_change id_change;
-+		struct kdbus_policy_access policy_access;
-+	};
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_msg_flags - type of message
-+ * @KDBUS_MSG_EXPECT_REPLY:	Expect a reply message, used for
-+ *				method calls. The userspace-supplied
-+ *				cookie identifies the message and the
-+ *				respective reply carries the cookie
-+ *				in cookie_reply
-+ * @KDBUS_MSG_NO_AUTO_START:	Do not start a service if the addressed
-+ *				name is not currently active. This flag is
-+ *				not looked at by the kernel but only
-+ *				serves as hint for userspace implementations.
-+ * @KDBUS_MSG_SIGNAL:		Treat this message as signal
-+ */
-+enum kdbus_msg_flags {
-+	KDBUS_MSG_EXPECT_REPLY	= 1ULL << 0,
-+	KDBUS_MSG_NO_AUTO_START	= 1ULL << 1,
-+	KDBUS_MSG_SIGNAL	= 1ULL << 2,
-+};
-+
-+/**
-+ * enum kdbus_payload_type - type of payload carried by message
-+ * @KDBUS_PAYLOAD_KERNEL:	Kernel-generated simple message
-+ * @KDBUS_PAYLOAD_DBUS:		D-Bus marshalling "DBusDBus"
-+ *
-+ * Any payload-type is accepted. Common types will get added here once
-+ * established.
-+ */
-+enum kdbus_payload_type {
-+	KDBUS_PAYLOAD_KERNEL,
-+	KDBUS_PAYLOAD_DBUS	= 0x4442757344427573ULL,
-+};
-+
-+/**
-+ * struct kdbus_msg - the representation of a kdbus message
-+ * @size:		Total size of the message
-+ * @flags:		Message flags (KDBUS_MSG_*), userspace → kernel
-+ * @priority:		Message queue priority value
-+ * @dst_id:		64-bit ID of the destination connection
-+ * @src_id:		64-bit ID of the source connection
-+ * @payload_type:	Payload type (KDBUS_PAYLOAD_*)
-+ * @cookie:		Userspace-supplied cookie, for the connection
-+ *			to identify its messages
-+ * @timeout_ns:		The time to wait for a message reply from the peer.
-+ *			If there is no reply, and the send command is
-+ *			executed asynchronously, a kernel-generated message
-+ *			with an attached KDBUS_ITEM_REPLY_TIMEOUT item
-+ *			is sent to @src_id. For synchronously executed send
-+ *			command, the value denotes the maximum time the call
-+ *			blocks to wait for a reply. The timeout is expected in
-+ *			nanoseconds and as absolute CLOCK_MONOTONIC value.
-+ * @cookie_reply:	A reply to the requesting message with the same
-+ *			cookie. The requesting connection can match its
-+ *			request and the reply with this value
-+ * @items:		A list of kdbus_items containing the message payload
-+ */
-+struct kdbus_msg {
-+	__u64 size;
-+	__u64 flags;
-+	__s64 priority;
-+	__u64 dst_id;
-+	__u64 src_id;
-+	__u64 payload_type;
-+	__u64 cookie;
-+	union {
-+		__u64 timeout_ns;
-+		__u64 cookie_reply;
-+	};
-+	struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_msg_info - returned message container
-+ * @offset:		Offset of kdbus_msg slice in pool
-+ * @msg_size:		Copy of the kdbus_msg.size field
-+ * @return_flags:	Command return flags, kernel → userspace
-+ */
-+struct kdbus_msg_info {
-+	__u64 offset;
-+	__u64 msg_size;
-+	__u64 return_flags;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_send_flags - flags for sending messages
-+ * @KDBUS_SEND_SYNC_REPLY:	Wait for destination connection to
-+ *				reply to this message. The
-+ *				KDBUS_CMD_SEND ioctl() will block
-+ *				until the reply is received, and
-+ *				reply in struct kdbus_cmd_send will
-+ *				yield the offset in the sender's pool
-+ *				where the reply can be found.
-+ *				This flag is only valid if
-+ *				@KDBUS_MSG_EXPECT_REPLY is set as well.
-+ */
-+enum kdbus_send_flags {
-+	KDBUS_SEND_SYNC_REPLY		= 1ULL << 0,
-+};
-+
-+/**
-+ * struct kdbus_cmd_send - send message
-+ * @size:		Overall size of this structure
-+ * @flags:		Flags to change send behavior (KDBUS_SEND_*)
-+ * @return_flags:	Command return flags, kernel → userspace
-+ * @msg_address:	Storage address of the kdbus_msg to send
-+ * @reply:		Storage for message reply if KDBUS_SEND_SYNC_REPLY
-+ *			was given
-+ * @items:		Additional items for this command
-+ */
-+struct kdbus_cmd_send {
-+	__u64 size;
-+	__u64 flags;
-+	__u64 return_flags;
-+	__u64 msg_address;
-+	struct kdbus_msg_info reply;
-+	struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_recv_flags - flags for de-queuing messages
-+ * @KDBUS_RECV_PEEK:		Return the next queued message without
-+ *				actually de-queuing it, and without installing
-+ *				any file descriptors or other resources. It is
-+ *				usually used to determine the activating
-+ *				connection of a bus name.
-+ * @KDBUS_RECV_DROP:		Drop and free the next queued message and all
-+ *				its resources without actually receiving it.
-+ * @KDBUS_RECV_USE_PRIORITY:	Only de-queue messages with the specified or
-+ *				higher priority (lowest values); if not set,
-+ *				the priority value is ignored.
-+ */
-+enum kdbus_recv_flags {
-+	KDBUS_RECV_PEEK		= 1ULL <<  0,
-+	KDBUS_RECV_DROP		= 1ULL <<  1,
-+	KDBUS_RECV_USE_PRIORITY	= 1ULL <<  2,
-+};
-+
-+/**
-+ * enum kdbus_recv_return_flags - return flags for message receive commands
-+ * @KDBUS_RECV_RETURN_INCOMPLETE_FDS:	One or more file descriptors could not
-+ *					be installed. These descriptors in
-+ *					KDBUS_ITEM_FDS will carry the value -1.
-+ * @KDBUS_RECV_RETURN_DROPPED_MSGS:	There have been dropped messages since
-+ *					the last time a message was received.
-+ *					The 'dropped_msgs' counter contains the
-+ *					number of messages dropped due to pool
-+ *					overflows or other missed broadcasts.
-+ */
-+enum kdbus_recv_return_flags {
-+	KDBUS_RECV_RETURN_INCOMPLETE_FDS	= 1ULL <<  0,
-+	KDBUS_RECV_RETURN_DROPPED_MSGS		= 1ULL <<  1,
-+};
-+
-+/**
-+ * struct kdbus_cmd_recv - struct to de-queue a buffered message
-+ * @size:		Overall size of this object
-+ * @flags:		KDBUS_RECV_* flags, userspace → kernel
-+ * @return_flags:	Command return flags, kernel → userspace
-+ * @priority:		Minimum priority of the messages to de-queue. Lowest
-+ *			values have the highest priority.
-+ * @dropped_msgs:	In case there were any dropped messages since the last
-+ *			time a message was received, this will be set to the
-+ *			number of lost messages and
-+ *			KDBUS_RECV_RETURN_DROPPED_MSGS will be set in
-+ *			'return_flags'. This can only happen if the ioctl
-+ *			returns 0 or EAGAIN.
-+ * @msg:		Return storage for received message.
-+ * @items:		Additional items for this command.
-+ *
-+ * This struct is used with the KDBUS_CMD_RECV ioctl.
-+ */
-+struct kdbus_cmd_recv {
-+	__u64 size;
-+	__u64 flags;
-+	__u64 return_flags;
-+	__s64 priority;
-+	__u64 dropped_msgs;
-+	struct kdbus_msg_info msg;
-+	struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_cmd_free - struct to free a slice of memory in the pool
-+ * @size:		Overall size of this structure
-+ * @flags:		Flags for the free command, userspace → kernel
-+ * @return_flags:	Command return flags, kernel → userspace
-+ * @offset:		The offset of the memory slice, as returned by other
-+ *			ioctls
-+ * @items:		Additional items to modify the behavior
-+ *
-+ * This struct is used with the KDBUS_CMD_FREE ioctl.
-+ */
-+struct kdbus_cmd_free {
-+	__u64 size;
-+	__u64 flags;
-+	__u64 return_flags;
-+	__u64 offset;
-+	struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_hello_flags - flags for struct kdbus_cmd_hello
-+ * @KDBUS_HELLO_ACCEPT_FD:	The connection allows the reception of
-+ *				any passed file descriptors
-+ * @KDBUS_HELLO_ACTIVATOR:	Special-purpose connection which registers
-+ *				a well-known name for a process to be started
-+ *				when traffic arrives
-+ * @KDBUS_HELLO_POLICY_HOLDER:	Special-purpose connection which registers
-+ *				policy entries for a name. The provided name
-+ *				is not activated and not registered with the
-+ *				name database, it only allows unprivileged
-+ *				connections to acquire a name, talk or discover
-+ *				a service
-+ * @KDBUS_HELLO_MONITOR:	Special-purpose connection to monitor
-+ *				bus traffic
-+ */
-+enum kdbus_hello_flags {
-+	KDBUS_HELLO_ACCEPT_FD		=  1ULL <<  0,
-+	KDBUS_HELLO_ACTIVATOR		=  1ULL <<  1,
-+	KDBUS_HELLO_POLICY_HOLDER	=  1ULL <<  2,
-+	KDBUS_HELLO_MONITOR		=  1ULL <<  3,
-+};
-+
-+/**
-+ * struct kdbus_cmd_hello - struct to say hello to kdbus
-+ * @size:		The total size of the structure
-+ * @flags:		Connection flags (KDBUS_HELLO_*), userspace → kernel
-+ * @return_flags:	Command return flags, kernel → userspace
-+ * @attach_flags_send:	Mask of metadata to attach to each message sent
-+ *			off by this connection (KDBUS_ATTACH_*)
-+ * @attach_flags_recv:	Mask of metadata to attach to each message received
-+ *			by the new connection (KDBUS_ATTACH_*)
-+ * @bus_flags:		The flags field copied verbatim from the original
-+ *			KDBUS_CMD_BUS_MAKE ioctl. It's intended to be useful
-+ *			to do negotiation of features of the payload that is
-+ *			transferred (kernel → userspace)
-+ * @id:			The ID of this connection (kernel → userspace)
-+ * @pool_size:		Size of the connection's buffer where the received
-+ *			messages are placed
-+ * @offset:		Pool offset where items are returned to report
-+ *			additional information about the bus and the newly
-+ *			created connection.
-+ * @items_size:		Size of buffer returned in the pool slice at @offset.
-+ * @id128:		Unique 128-bit ID of the bus (kernel → userspace)
-+ * @items:		A list of items
-+ *
-+ * This struct is used with the KDBUS_CMD_HELLO ioctl.
-+ */
-+struct kdbus_cmd_hello {
-+	__u64 size;
-+	__u64 flags;
-+	__u64 return_flags;
-+	__u64 attach_flags_send;
-+	__u64 attach_flags_recv;
-+	__u64 bus_flags;
-+	__u64 id;
-+	__u64 pool_size;
-+	__u64 offset;
-+	__u64 items_size;
-+	__u8 id128[16];
-+	struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_info - connection information
-+ * @size:		total size of the struct
-+ * @id:			64bit object ID
-+ * @flags:		object creation flags
-+ * @items:		list of items
-+ *
-+ * Note that the user is responsible for freeing the allocated memory with
-+ * the KDBUS_CMD_FREE ioctl.
-+ */
-+struct kdbus_info {
-+	__u64 size;
-+	__u64 id;
-+	__u64 flags;
-+	struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_list_flags - what to include into the returned list
-+ * @KDBUS_LIST_UNIQUE:		active connections
-+ * @KDBUS_LIST_ACTIVATORS:	activator connections
-+ * @KDBUS_LIST_NAMES:		known well-known names
-+ * @KDBUS_LIST_QUEUED:		queued-up names
-+ */
-+enum kdbus_list_flags {
-+	KDBUS_LIST_UNIQUE		= 1ULL <<  0,
-+	KDBUS_LIST_NAMES		= 1ULL <<  1,
-+	KDBUS_LIST_ACTIVATORS		= 1ULL <<  2,
-+	KDBUS_LIST_QUEUED		= 1ULL <<  3,
-+};
-+
-+/**
-+ * struct kdbus_cmd_list - list connections
-+ * @size:		overall size of this object
-+ * @flags:		flags for the query (KDBUS_LIST_*), userspace → kernel
-+ * @return_flags:	command return flags, kernel → userspace
-+ * @offset:		Offset in the caller's pool buffer where an array of
-+ *			kdbus_info objects is stored.
-+ *			The user must use KDBUS_CMD_FREE to free the
-+ *			allocated memory.
-+ * @list_size:		size of returned list in bytes
-+ * @items:		Items for the command. Reserved for future use.
-+ *
-+ * This structure is used with the KDBUS_CMD_LIST ioctl.
-+ */
-+struct kdbus_cmd_list {
-+	__u64 size;
-+	__u64 flags;
-+	__u64 return_flags;
-+	__u64 offset;
-+	__u64 list_size;
-+	struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_cmd_info - struct used for KDBUS_CMD_CONN_INFO ioctl
-+ * @size:		The total size of the struct
-+ * @flags:		Flags for this ioctl, userspace → kernel
-+ * @return_flags:	Command return flags, kernel → userspace
-+ * @id:			The 64-bit ID of the connection. If set to zero, passing
-+ *			@name is required. kdbus will look up the name to
-+ *			determine the ID in this case.
-+ * @attach_flags:	Set of attach flags to specify the set of information
-+ *			to receive, userspace → kernel
-+ * @offset:		Returned offset in the caller's pool buffer where the
-+ *			kdbus_info struct result is stored. The user must
-+ *			use KDBUS_CMD_FREE to free the allocated memory.
-+ * @info_size:		Output buffer to report size of data at @offset.
-+ * @items:		The optional item list, containing the
-+ *			well-known name to look up as a KDBUS_ITEM_NAME.
-+ *			Only needed in case @id is zero.
-+ *
-+ * On success, the KDBUS_CMD_CONN_INFO ioctl will return 0 and @offset will
-+ * tell the user the offset in the connection pool buffer at which to find the
-+ * result in a struct kdbus_info.
-+ */
-+struct kdbus_cmd_info {
-+	__u64 size;
-+	__u64 flags;
-+	__u64 return_flags;
-+	__u64 id;
-+	__u64 attach_flags;
-+	__u64 offset;
-+	__u64 info_size;
-+	struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_cmd_match_flags - flags to control the KDBUS_CMD_MATCH_ADD ioctl
-+ * @KDBUS_MATCH_REPLACE:	If entries with the supplied cookie already
-+ *				exist, remove them before installing the new
-+ *				matches.
-+ */
-+enum kdbus_cmd_match_flags {
-+	KDBUS_MATCH_REPLACE	= 1ULL <<  0,
-+};
-+
-+/**
-+ * struct kdbus_cmd_match - struct to add or remove matches
-+ * @size:		The total size of the struct
-+ * @flags:		Flags for match command (KDBUS_MATCH_*),
-+ *			userspace → kernel
-+ * @return_flags:	Command return flags, kernel → userspace
-+ * @cookie:		Userspace supplied cookie. When removing, the cookie
-+ *			identifies the match to remove
-+ * @items:		A list of items for additional information
-+ *
-+ * This structure is used with the KDBUS_CMD_MATCH_ADD and
-+ * KDBUS_CMD_MATCH_REMOVE ioctl.
-+ */
-+struct kdbus_cmd_match {
-+	__u64 size;
-+	__u64 flags;
-+	__u64 return_flags;
-+	__u64 cookie;
-+	struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_make_flags - Flags for KDBUS_CMD_{BUS,ENDPOINT}_MAKE
-+ * @KDBUS_MAKE_ACCESS_GROUP:	Make the bus or endpoint node group-accessible
-+ * @KDBUS_MAKE_ACCESS_WORLD:	Make the bus or endpoint node world-accessible
-+ */
-+enum kdbus_make_flags {
-+	KDBUS_MAKE_ACCESS_GROUP		= 1ULL <<  0,
-+	KDBUS_MAKE_ACCESS_WORLD		= 1ULL <<  1,
-+};
-+
-+/**
-+ * enum kdbus_name_flags - flags for KDBUS_CMD_NAME_ACQUIRE
-+ * @KDBUS_NAME_REPLACE_EXISTING:	Try to replace name of other connections
-+ * @KDBUS_NAME_ALLOW_REPLACEMENT:	Allow the replacement of the name
-+ * @KDBUS_NAME_QUEUE:			Name should be queued if busy
-+ * @KDBUS_NAME_IN_QUEUE:		Name is queued
-+ * @KDBUS_NAME_ACTIVATOR:		Name is owned by an activator connection
-+ * @KDBUS_NAME_PRIMARY:			Primary owner of the name
-+ * @KDBUS_NAME_ACQUIRED:		Name was acquired/queued _now_
-+ */
-+enum kdbus_name_flags {
-+	KDBUS_NAME_REPLACE_EXISTING	= 1ULL <<  0,
-+	KDBUS_NAME_ALLOW_REPLACEMENT	= 1ULL <<  1,
-+	KDBUS_NAME_QUEUE		= 1ULL <<  2,
-+	KDBUS_NAME_IN_QUEUE		= 1ULL <<  3,
-+	KDBUS_NAME_ACTIVATOR		= 1ULL <<  4,
-+	KDBUS_NAME_PRIMARY		= 1ULL <<  5,
-+	KDBUS_NAME_ACQUIRED		= 1ULL <<  6,
-+};
-+
-+/**
-+ * struct kdbus_cmd - generic ioctl payload
-+ * @size:		Overall size of this structure
-+ * @flags:		Flags for this ioctl, userspace → kernel
-+ * @return_flags:	Ioctl return flags, kernel → userspace
-+ * @items:		Additional items to modify the behavior
-+ *
-+ * This is a generic ioctl payload object. It's used by all ioctls that only
-+ * take flags and items as input.
-+ */
-+struct kdbus_cmd {
-+	__u64 size;
-+	__u64 flags;
-+	__u64 return_flags;
-+	struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * Ioctl API
-+ *
-+ * KDBUS_CMD_BUS_MAKE:		After opening the "control" node, this command
-+ *				creates a new bus with the specified
-+ *				name. The bus is immediately shut down and
-+ *				cleaned up when the opened file descriptor is
-+ *				closed.
-+ *
-+ * KDBUS_CMD_ENDPOINT_MAKE:	Creates a new named special endpoint to talk to
-+ *				the bus. Such endpoints usually carry a more
-+ *				restrictive policy and grant restricted access
-+ *				to specific applications.
-+ * KDBUS_CMD_ENDPOINT_UPDATE:	Update the properties of a custom endpoint. Used
-+ *				to update the policy.
-+ *
-+ * KDBUS_CMD_HELLO:		By opening the bus node, a connection is
-+ *				created. After a HELLO the opened connection
-+ *				becomes an active peer on the bus.
-+ * KDBUS_CMD_UPDATE:		Update the properties of a connection. Used to
-+ *				update the metadata subscription mask and
-+ *				policy.
-+ * KDBUS_CMD_BYEBYE:		Disconnect a connection. If there are no
-+ *				messages queued up in the connection's pool,
-+ *				the call succeeds, and the handle is rendered
-+ *				unusable. Otherwise, -EBUSY is returned without
-+ *				any further side-effects.
-+ * KDBUS_CMD_FREE:		Release the allocated memory in the receiver's
-+ *				pool.
-+ * KDBUS_CMD_CONN_INFO:		Retrieve credentials and properties of the
-+ *				initial creator of the connection. The data was
-+ *				stored at registration time and does not
-+ *				necessarily represent the connected process or
-+ *				the actual state of the process.
-+ * KDBUS_CMD_BUS_CREATOR_INFO:	Retrieve information of the creator of the bus
-+ *				a connection is attached to.
-+ *
-+ * KDBUS_CMD_SEND:		Send a message and pass data from userspace to
-+ *				the kernel.
-+ * KDBUS_CMD_RECV:		Receive a message from the kernel which is
-+ *				placed in the receiver's pool.
-+ *
-+ * KDBUS_CMD_NAME_ACQUIRE:	Request a well-known bus name to associate with
-+ *				the connection. Well-known names are used to
-+ *				address a peer on the bus.
-+ * KDBUS_CMD_NAME_RELEASE:	Release a well-known name the connection
-+ *				currently owns.
-+ * KDBUS_CMD_LIST:		Retrieve the list of all currently registered
-+ *				well-known and unique names.
-+ *
-+ * KDBUS_CMD_MATCH_ADD:		Install a match that selects which broadcast
-+ *				messages are delivered to the connection.
-+ * KDBUS_CMD_MATCH_REMOVE:	Remove a current match for broadcast messages.
-+ */
-+enum kdbus_ioctl_type {
-+	/* bus owner (00-0f) */
-+	KDBUS_CMD_BUS_MAKE =		_IOW(KDBUS_IOCTL_MAGIC, 0x00,
-+					     struct kdbus_cmd),
-+
-+	/* endpoint owner (10-1f) */
-+	KDBUS_CMD_ENDPOINT_MAKE =	_IOW(KDBUS_IOCTL_MAGIC, 0x10,
-+					     struct kdbus_cmd),
-+	KDBUS_CMD_ENDPOINT_UPDATE =	_IOW(KDBUS_IOCTL_MAGIC, 0x11,
-+					     struct kdbus_cmd),
-+
-+	/* connection owner (80-ff) */
-+	KDBUS_CMD_HELLO =		_IOWR(KDBUS_IOCTL_MAGIC, 0x80,
-+					      struct kdbus_cmd_hello),
-+	KDBUS_CMD_UPDATE =		_IOW(KDBUS_IOCTL_MAGIC, 0x81,
-+					     struct kdbus_cmd),
-+	KDBUS_CMD_BYEBYE =		_IOW(KDBUS_IOCTL_MAGIC, 0x82,
-+					     struct kdbus_cmd),
-+	KDBUS_CMD_FREE =		_IOW(KDBUS_IOCTL_MAGIC, 0x83,
-+					     struct kdbus_cmd_free),
-+	KDBUS_CMD_CONN_INFO =		_IOR(KDBUS_IOCTL_MAGIC, 0x84,
-+					     struct kdbus_cmd_info),
-+	KDBUS_CMD_BUS_CREATOR_INFO =	_IOR(KDBUS_IOCTL_MAGIC, 0x85,
-+					     struct kdbus_cmd_info),
-+	KDBUS_CMD_LIST =		_IOR(KDBUS_IOCTL_MAGIC, 0x86,
-+					     struct kdbus_cmd_list),
-+
-+	KDBUS_CMD_SEND =		_IOW(KDBUS_IOCTL_MAGIC, 0x90,
-+					     struct kdbus_cmd_send),
-+	KDBUS_CMD_RECV =		_IOR(KDBUS_IOCTL_MAGIC, 0x91,
-+					     struct kdbus_cmd_recv),
-+
-+	KDBUS_CMD_NAME_ACQUIRE =	_IOW(KDBUS_IOCTL_MAGIC, 0xa0,
-+					     struct kdbus_cmd),
-+	KDBUS_CMD_NAME_RELEASE =	_IOW(KDBUS_IOCTL_MAGIC, 0xa1,
-+					     struct kdbus_cmd),
-+
-+	KDBUS_CMD_MATCH_ADD =		_IOW(KDBUS_IOCTL_MAGIC, 0xb0,
-+					     struct kdbus_cmd_match),
-+	KDBUS_CMD_MATCH_REMOVE =	_IOW(KDBUS_IOCTL_MAGIC, 0xb1,
-+					     struct kdbus_cmd_match),
-+};
-+
-+#endif /* _UAPI_KDBUS_H_ */
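
Editorial note: the header above asks that the metadata item types be kept in sync with the KDBUS_ATTACH_* flags. As listed, the 14 flags and the 14 items starting at _KDBUS_ITEM_ATTACH_BASE appear in the same order, so the item type for a flag is the base plus the flag's bit index. The sketch below illustrates that relationship in plain userspace C. The helper names (attach_flag_to_item_type, attach_all_mask) and the two macros are hypothetical and are not part of the kdbus API; only the numeric values are taken from the header.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the header: _KDBUS_ITEM_ATTACH_BASE and the number
 * of KDBUS_ATTACH_* bits (bits 0 through 13). */
#define ITEM_ATTACH_BASE	0x1000ULL
#define ATTACH_FLAG_COUNT	14

/* Hypothetical helper: map a single KDBUS_ATTACH_* bit to the matching
 * KDBUS_ITEM_* type, relying on both lists being in the same order. */
static uint64_t attach_flag_to_item_type(uint64_t flag)
{
	unsigned int bit = 0;

	/* The caller must pass exactly one set bit. */
	assert(flag != 0 && (flag & (flag - 1)) == 0);

	while (!(flag & 1ULL)) {
		flag >>= 1;
		bit++;
	}
	assert(bit < ATTACH_FLAG_COUNT);

	return ITEM_ATTACH_BASE + bit;
}

/* Same expression as _KDBUS_ATTACH_ALL: every defined attach bit set. */
static uint64_t attach_all_mask(void)
{
	return (1ULL << ATTACH_FLAG_COUNT) - 1;
}
```

For example, KDBUS_ATTACH_CREDS (1ULL << 1) maps to 0x1001, which is the value KDBUS_ITEM_CREDS receives in the enum, and KDBUS_ATTACH_CONN_DESCRIPTION (1ULL << 13) maps to 0x100d.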
-diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
-index 7b1425a..ce2ac5a 100644
---- a/include/uapi/linux/magic.h
-+++ b/include/uapi/linux/magic.h
-@@ -76,4 +76,6 @@
- #define BTRFS_TEST_MAGIC	0x73727279
- #define NSFS_MAGIC		0x6e736673
- 
-+#define KDBUS_SUPER_MAGIC	0x44427573
-+
- #endif /* __LINUX_MAGIC_H__ */
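
Editorial note: both magic constants in this patch are ASCII strings packed into integers. KDBUS_SUPER_MAGIC (0x44427573) reads "DBus" when its bytes are taken most significant first, and KDBUS_PAYLOAD_DBUS (0x4442757344427573) reads "DBusDBus", as the kdbus.h comment states. A minimal userspace sketch to decode them follows. magic_to_ascii is a hypothetical helper written for illustration, not something the patch provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write the low nbytes bytes of value into out,
 * most significant byte first, followed by a terminating NUL.
 * out must have room for nbytes + 1 characters. */
static void magic_to_ascii(uint64_t value, size_t nbytes, char *out)
{
	size_t i;

	assert(nbytes >= 1 && nbytes <= 8);

	for (i = 0; i < nbytes; i++)
		out[i] = (char)((value >> (8 * (nbytes - 1 - i))) & 0xff);
	out[nbytes] = '\0';
}
```

Calling magic_to_ascii(0x44427573ULL, 4, buf) with a buffer of at least 5 bytes leaves the string "DBus" in buf.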
-diff --git a/init/Kconfig b/init/Kconfig
-index dc24dec..9388071 100644
---- a/init/Kconfig
-+++ b/init/Kconfig
-@@ -261,6 +261,19 @@ config POSIX_MQUEUE_SYSCTL
- 	depends on SYSCTL
- 	default y
- 
-+config KDBUS
-+	tristate "kdbus interprocess communication"
-+	depends on TMPFS
-+	help
-+	  D-Bus is a system for low-latency, low-overhead, easy to use
-+	  interprocess communication (IPC).
-+
-+	  See the man-pages and HTML files in Documentation/kdbus/
-+	  that are generated by 'make mandocs' and 'make htmldocs'.
-+
-+	  If you have an ordinary machine, select M here. The module
-+	  will be called kdbus.
-+
- config CROSS_MEMORY_ATTACH
- 	bool "Enable process_vm_readv/writev syscalls"
- 	depends on MMU
-diff --git a/ipc/Makefile b/ipc/Makefile
-index 86c7300..68ec416 100644
---- a/ipc/Makefile
-+++ b/ipc/Makefile
-@@ -9,4 +9,4 @@ obj_mq-$(CONFIG_COMPAT) += compat_mq.o
- obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)
- obj-$(CONFIG_IPC_NS) += namespace.o
- obj-$(CONFIG_POSIX_MQUEUE_SYSCTL) += mq_sysctl.o
--
-+obj-$(CONFIG_KDBUS) += kdbus/
-diff --git a/ipc/kdbus/Makefile b/ipc/kdbus/Makefile
-new file mode 100644
-index 0000000..66663a1
---- /dev/null
-+++ b/ipc/kdbus/Makefile
-@@ -0,0 +1,33 @@
-+#
-+# By setting KDBUS_EXT=2, the kdbus module will be built as kdbus2.ko, and
-+# KBUILD_MODNAME=kdbus2. This has the effect that all exported objects have
-+# different names than usual (kdbus2fs, /sys/fs/kdbus2/) and you can run
-+# your test-infrastructure against the kdbus2.ko, while running your system
-+# on kdbus.ko.
-+#
-+# To just build the module, use:
-+#     make KDBUS_EXT=2 M=ipc/kdbus
-+#
-+
-+kdbus$(KDBUS_EXT)-y := \
-+	bus.o \
-+	connection.o \
-+	endpoint.o \
-+	fs.o \
-+	handle.o \
-+	item.o \
-+	main.o \
-+	match.o \
-+	message.o \
-+	metadata.o \
-+	names.o \
-+	node.o \
-+	notify.o \
-+	domain.o \
-+	policy.o \
-+	pool.o \
-+	reply.o \
-+	queue.o \
-+	util.o
-+
-+obj-$(CONFIG_KDBUS) += kdbus$(KDBUS_EXT).o
-diff --git a/ipc/kdbus/bus.c b/ipc/kdbus/bus.c
-new file mode 100644
-index 0000000..a67f825
---- /dev/null
-+++ b/ipc/kdbus/bus.c
-@@ -0,0 +1,514 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/fs.h>
-+#include <linux/hashtable.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+#include <linux/random.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <linux/uio.h>
-+
-+#include "bus.h"
-+#include "notify.h"
-+#include "connection.h"
-+#include "domain.h"
-+#include "endpoint.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "match.h"
-+#include "message.h"
-+#include "metadata.h"
-+#include "names.h"
-+#include "policy.h"
-+#include "util.h"
-+
-+static void kdbus_bus_free(struct kdbus_node *node)
-+{
-+	struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
-+
-+	WARN_ON(!list_empty(&bus->monitors_list));
-+	WARN_ON(!hash_empty(bus->conn_hash));
-+
-+	kdbus_notify_free(bus);
-+
-+	kdbus_user_unref(bus->creator);
-+	kdbus_name_registry_free(bus->name_registry);
-+	kdbus_domain_unref(bus->domain);
-+	kdbus_policy_db_clear(&bus->policy_db);
-+	kdbus_meta_proc_unref(bus->creator_meta);
-+	kfree(bus);
-+}
-+
-+static void kdbus_bus_release(struct kdbus_node *node, bool was_active)
-+{
-+	struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
-+
-+	if (was_active)
-+		atomic_dec(&bus->creator->buses);
-+}
-+
-+static struct kdbus_bus *kdbus_bus_new(struct kdbus_domain *domain,
-+				       const char *name,
-+				       struct kdbus_bloom_parameter *bloom,
-+				       const u64 *pattach_owner,
-+				       u64 flags, kuid_t uid, kgid_t gid)
-+{
-+	struct kdbus_bus *b;
-+	u64 attach_owner;
-+	int ret;
-+
-+	if (bloom->size < 8 || bloom->size > KDBUS_BUS_BLOOM_MAX_SIZE ||
-+	    !KDBUS_IS_ALIGNED8(bloom->size) || bloom->n_hash < 1)
-+		return ERR_PTR(-EINVAL);
-+
-+	ret = kdbus_sanitize_attach_flags(pattach_owner ? *pattach_owner : 0,
-+					  &attach_owner);
-+	if (ret < 0)
-+		return ERR_PTR(ret);
-+
-+	ret = kdbus_verify_uid_prefix(name, domain->user_namespace, uid);
-+	if (ret < 0)
-+		return ERR_PTR(ret);
-+
-+	b = kzalloc(sizeof(*b), GFP_KERNEL);
-+	if (!b)
-+		return ERR_PTR(-ENOMEM);
-+
-+	kdbus_node_init(&b->node, KDBUS_NODE_BUS);
-+
-+	b->node.free_cb = kdbus_bus_free;
-+	b->node.release_cb = kdbus_bus_release;
-+	b->node.uid = uid;
-+	b->node.gid = gid;
-+	b->node.mode = S_IRUSR | S_IXUSR;
-+
-+	if (flags & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
-+		b->node.mode |= S_IRGRP | S_IXGRP;
-+	if (flags & KDBUS_MAKE_ACCESS_WORLD)
-+		b->node.mode |= S_IROTH | S_IXOTH;
-+
-+	b->id = atomic64_inc_return(&domain->last_id);
-+	b->bus_flags = flags;
-+	b->attach_flags_owner = attach_owner;
-+	generate_random_uuid(b->id128);
-+	b->bloom = *bloom;
-+	b->domain = kdbus_domain_ref(domain);
-+
-+	kdbus_policy_db_init(&b->policy_db);
-+
-+	init_rwsem(&b->conn_rwlock);
-+	hash_init(b->conn_hash);
-+	INIT_LIST_HEAD(&b->monitors_list);
-+
-+	INIT_LIST_HEAD(&b->notify_list);
-+	spin_lock_init(&b->notify_lock);
-+	mutex_init(&b->notify_flush_lock);
-+
-+	ret = kdbus_node_link(&b->node, &domain->node, name);
-+	if (ret < 0)
-+		goto exit_unref;
-+
-+	/* cache the metadata/credentials of the creator */
-+	b->creator_meta = kdbus_meta_proc_new();
-+	if (IS_ERR(b->creator_meta)) {
-+		ret = PTR_ERR(b->creator_meta);
-+		b->creator_meta = NULL;
-+		goto exit_unref;
-+	}
-+
-+	ret = kdbus_meta_proc_collect(b->creator_meta,
-+				      KDBUS_ATTACH_CREDS |
-+				      KDBUS_ATTACH_PIDS |
-+				      KDBUS_ATTACH_AUXGROUPS |
-+				      KDBUS_ATTACH_TID_COMM |
-+				      KDBUS_ATTACH_PID_COMM |
-+				      KDBUS_ATTACH_EXE |
-+				      KDBUS_ATTACH_CMDLINE |
-+				      KDBUS_ATTACH_CGROUP |
-+				      KDBUS_ATTACH_CAPS |
-+				      KDBUS_ATTACH_SECLABEL |
-+				      KDBUS_ATTACH_AUDIT);
-+	if (ret < 0)
-+		goto exit_unref;
-+
-+	b->name_registry = kdbus_name_registry_new();
-+	if (IS_ERR(b->name_registry)) {
-+		ret = PTR_ERR(b->name_registry);
-+		b->name_registry = NULL;
-+		goto exit_unref;
-+	}
-+
-+	/*
-+	 * Bus-limits of the creator are accounted on its real UID, just like
-+	 * all other per-user limits.
-+	 */
-+	b->creator = kdbus_user_lookup(domain, current_uid());
-+	if (IS_ERR(b->creator)) {
-+		ret = PTR_ERR(b->creator);
-+		b->creator = NULL;
-+		goto exit_unref;
-+	}
-+
-+	return b;
-+
-+exit_unref:
-+	kdbus_node_deactivate(&b->node);
-+	kdbus_node_unref(&b->node);
-+	return ERR_PTR(ret);
-+}
-+
-+/**
-+ * kdbus_bus_ref() - increase the reference counter of a kdbus_bus
-+ * @bus:		The bus to reference
-+ *
-+ * Every user of a bus, except for its creator, must add a reference to the
-+ * kdbus_bus using this function.
-+ *
-+ * Return: the bus itself
-+ */
-+struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus)
-+{
-+	if (bus)
-+		kdbus_node_ref(&bus->node);
-+	return bus;
-+}
-+
-+/**
-+ * kdbus_bus_unref() - decrease the reference counter of a kdbus_bus
-+ * @bus:		The bus to unref
-+ *
-+ * Release a reference. If the reference count drops to 0, the bus will be
-+ * freed.
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus)
-+{
-+	if (bus)
-+		kdbus_node_unref(&bus->node);
-+	return NULL;
-+}
-+
-+/**
-+ * kdbus_bus_find_conn_by_id() - find a connection with a given id
-+ * @bus:		The bus to look for the connection
-+ * @id:			The 64-bit connection id
-+ *
-+ * Looks up a connection with a given id. The returned connection
-+ * is ref'ed, and needs to be unref'ed by the user. Returns NULL if
-+ * the connection can't be found.
-+ */
-+struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id)
-+{
-+	struct kdbus_conn *conn, *found = NULL;
-+
-+	down_read(&bus->conn_rwlock);
-+	hash_for_each_possible(bus->conn_hash, conn, hentry, id)
-+		if (conn->id == id) {
-+			found = kdbus_conn_ref(conn);
-+			break;
-+		}
-+	up_read(&bus->conn_rwlock);
-+
-+	return found;
-+}
-+
-+/**
-+ * kdbus_bus_broadcast() - send a message to all subscribed connections
-+ * @bus:	The bus the connections are connected to
-+ * @conn_src:	The source connection, may be %NULL for kernel notifications
-+ * @staging:	Staging object containing the message to send
-+ *
-+ * Send message to all connections that are currently active on the bus.
-+ * Connections must still have matches installed in order to let the message
-+ * pass.
-+ *
-+ * The caller must hold the name-registry lock of @bus.
-+ */
-+void kdbus_bus_broadcast(struct kdbus_bus *bus,
-+			 struct kdbus_conn *conn_src,
-+			 struct kdbus_staging *staging)
-+{
-+	struct kdbus_conn *conn_dst;
-+	unsigned int i;
-+	int ret;
-+
-+	lockdep_assert_held(&bus->name_registry->rwlock);
-+
-+	/*
-+	 * Make sure broadcasts are queued on monitors before we send them out
-+	 * to anyone else. Otherwise, connections might react to broadcasts
-+	 * before the monitor gets the broadcast queued. In the worst case, the
-+	 * monitor sees a reaction to the broadcast before the broadcast itself.
-+	 * We don't give ordering guarantees across connections (and monitors
-+	 * can re-construct order via sequence numbers), but we should at least
-+	 * try to avoid re-ordering for monitors.
-+	 */
-+	kdbus_bus_eavesdrop(bus, conn_src, staging);
-+
-+	down_read(&bus->conn_rwlock);
-+	hash_for_each(bus->conn_hash, i, conn_dst, hentry) {
-+		if (!kdbus_conn_is_ordinary(conn_dst))
-+			continue;
-+
-+		/*
-+		 * Check if there is a match for the kmsg object in
-+		 * the destination connection match db
-+		 */
-+		if (!kdbus_match_db_match_msg(conn_dst->match_db, conn_src,
-+					      staging))
-+			continue;
-+
-+		if (conn_src) {
-+			/*
-+			 * Anyone can send broadcasts, as they have no
-+			 * destination. But a receiver needs TALK access to
-+			 * the sender in order to receive broadcasts.
-+			 */
-+			if (!kdbus_conn_policy_talk(conn_dst, NULL, conn_src))
-+				continue;
-+		} else {
-+			/*
-+			 * Check if there is a policy db that prevents the
-+			 * destination connection from receiving this kernel
-+			 * notification
-+			 */
-+			if (!kdbus_conn_policy_see_notification(conn_dst, NULL,
-+								staging->msg))
-+				continue;
-+		}
-+
-+		ret = kdbus_conn_entry_insert(conn_src, conn_dst, staging,
-+					      NULL, NULL);
-+		if (ret < 0)
-+			kdbus_conn_lost_message(conn_dst);
-+	}
-+	up_read(&bus->conn_rwlock);
-+}
-+
-+/**
-+ * kdbus_bus_eavesdrop() - send a message to all subscribed monitors
-+ * @bus:	The bus the monitors are connected to
-+ * @conn_src:	The source connection, may be %NULL for kernel notifications
-+ * @staging:	Staging object containing the message to send
-+ *
-+ * Send message to all monitors that are currently active on the bus. Monitors
-+ * must still have matches installed in order to let the message pass.
-+ *
-+ * The caller must hold the name-registry lock of @bus.
-+ */
-+void kdbus_bus_eavesdrop(struct kdbus_bus *bus,
-+			 struct kdbus_conn *conn_src,
-+			 struct kdbus_staging *staging)
-+{
-+	struct kdbus_conn *conn_dst;
-+	int ret;
-+
-+	/*
-+	 * Monitor connections get all messages; ignore possible errors
-+	 * when sending messages to monitor connections.
-+	 */
-+
-+	lockdep_assert_held(&bus->name_registry->rwlock);
-+
-+	down_read(&bus->conn_rwlock);
-+	list_for_each_entry(conn_dst, &bus->monitors_list, monitor_entry) {
-+		ret = kdbus_conn_entry_insert(conn_src, conn_dst, staging,
-+					      NULL, NULL);
-+		if (ret < 0)
-+			kdbus_conn_lost_message(conn_dst);
-+	}
-+	up_read(&bus->conn_rwlock);
-+}
-+
-+/**
-+ * kdbus_cmd_bus_make() - handle KDBUS_CMD_BUS_MAKE
-+ * @domain:		domain to operate on
-+ * @argp:		command payload
-+ *
-+ * Return: NULL or newly created bus on success, ERR_PTR on failure.
-+ */
-+struct kdbus_bus *kdbus_cmd_bus_make(struct kdbus_domain *domain,
-+				     void __user *argp)
-+{
-+	struct kdbus_bus *bus = NULL;
-+	struct kdbus_cmd *cmd;
-+	struct kdbus_ep *ep = NULL;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+		{ .type = KDBUS_ITEM_MAKE_NAME, .mandatory = true },
-+		{ .type = KDBUS_ITEM_BLOOM_PARAMETER, .mandatory = true },
-+		{ .type = KDBUS_ITEM_ATTACH_FLAGS_SEND },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+				 KDBUS_MAKE_ACCESS_GROUP |
-+				 KDBUS_MAKE_ACCESS_WORLD,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret < 0)
-+		return ERR_PTR(ret);
-+	if (ret > 0)
-+		return NULL;
-+
-+	bus = kdbus_bus_new(domain,
-+			    argv[1].item->str, &argv[2].item->bloom_parameter,
-+			    argv[3].item ? argv[3].item->data64 : NULL,
-+			    cmd->flags, current_euid(), current_egid());
-+	if (IS_ERR(bus)) {
-+		ret = PTR_ERR(bus);
-+		bus = NULL;
-+		goto exit;
-+	}
-+
-+	if (atomic_inc_return(&bus->creator->buses) > KDBUS_USER_MAX_BUSES) {
-+		atomic_dec(&bus->creator->buses);
-+		ret = -EMFILE;
-+		goto exit;
-+	}
-+
-+	if (!kdbus_node_activate(&bus->node)) {
-+		atomic_dec(&bus->creator->buses);
-+		ret = -ESHUTDOWN;
-+		goto exit;
-+	}
-+
-+	ep = kdbus_ep_new(bus, "bus", cmd->flags, bus->node.uid, bus->node.gid,
-+			  false);
-+	if (IS_ERR(ep)) {
-+		ret = PTR_ERR(ep);
-+		ep = NULL;
-+		goto exit;
-+	}
-+
-+	if (!kdbus_node_activate(&ep->node)) {
-+		ret = -ESHUTDOWN;
-+		goto exit;
-+	}
-+
-+	/*
-+	 * Drop our own reference, effectively causing the endpoint to be
-+	 * deactivated and released when the parent bus is.
-+	 */
-+	ep = kdbus_ep_unref(ep);
-+
-+exit:
-+	ret = kdbus_args_clear(&args, ret);
-+	if (ret < 0) {
-+		if (ep) {
-+			kdbus_node_deactivate(&ep->node);
-+			kdbus_ep_unref(ep);
-+		}
-+		if (bus) {
-+			kdbus_node_deactivate(&bus->node);
-+			kdbus_bus_unref(bus);
-+		}
-+		return ERR_PTR(ret);
-+	}
-+	return bus;
-+}
-+
-+/**
-+ * kdbus_cmd_bus_creator_info() - handle KDBUS_CMD_BUS_CREATOR_INFO
-+ * @conn:		connection to operate on
-+ * @argp:		command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn, void __user *argp)
-+{
-+	struct kdbus_cmd_info *cmd;
-+	struct kdbus_bus *bus = conn->ep->bus;
-+	struct kdbus_pool_slice *slice = NULL;
-+	struct kdbus_item *meta_items = NULL;
-+	struct kdbus_item_header item_hdr;
-+	struct kdbus_info info = {};
-+	size_t meta_size, name_len, cnt = 0;
-+	struct kvec kvec[6];
-+	u64 attach_flags, size = 0;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret != 0)
-+		return ret;
-+
-+	ret = kdbus_sanitize_attach_flags(cmd->attach_flags, &attach_flags);
-+	if (ret < 0)
-+		goto exit;
-+
-+	attach_flags &= bus->attach_flags_owner;
-+
-+	ret = kdbus_meta_emit(bus->creator_meta, NULL, NULL, conn,
-+			      attach_flags, &meta_items, &meta_size);
-+	if (ret < 0)
-+		goto exit;
-+
-+	name_len = strlen(bus->node.name) + 1;
-+	info.id = bus->id;
-+	info.flags = bus->bus_flags;
-+	item_hdr.type = KDBUS_ITEM_MAKE_NAME;
-+	item_hdr.size = KDBUS_ITEM_HEADER_SIZE + name_len;
-+
-+	kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &size);
-+	kdbus_kvec_set(&kvec[cnt++], &item_hdr, sizeof(item_hdr), &size);
-+	kdbus_kvec_set(&kvec[cnt++], bus->node.name, name_len, &size);
-+	cnt += !!kdbus_kvec_pad(&kvec[cnt], &size);
-+	if (meta_size > 0) {
-+		kdbus_kvec_set(&kvec[cnt++], meta_items, meta_size, &size);
-+		cnt += !!kdbus_kvec_pad(&kvec[cnt], &size);
-+	}
-+
-+	info.size = size;
-+
-+	slice = kdbus_pool_slice_alloc(conn->pool, size, false);
-+	if (IS_ERR(slice)) {
-+		ret = PTR_ERR(slice);
-+		slice = NULL;
-+		goto exit;
-+	}
-+
-+	ret = kdbus_pool_slice_copy_kvec(slice, 0, kvec, cnt, size);
-+	if (ret < 0)
-+		goto exit;
-+
-+	kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->info_size);
-+
-+	if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
-+	    kdbus_member_set_user(&cmd->info_size, argp,
-+				  typeof(*cmd), info_size))
-+		ret = -EFAULT;
-+
-+exit:
-+	kdbus_pool_slice_release(slice);
-+	kfree(meta_items);
-+	return kdbus_args_clear(&args, ret);
-+}
-diff --git a/ipc/kdbus/bus.h b/ipc/kdbus/bus.h
-new file mode 100644
-index 0000000..8c2acae
---- /dev/null
-+++ b/ipc/kdbus/bus.h
-@@ -0,0 +1,101 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_BUS_H
-+#define __KDBUS_BUS_H
-+
-+#include <linux/hashtable.h>
-+#include <linux/list.h>
-+#include <linux/mutex.h>
-+#include <linux/rwsem.h>
-+#include <linux/spinlock.h>
-+#include <uapi/linux/kdbus.h>
-+
-+#include "metadata.h"
-+#include "names.h"
-+#include "node.h"
-+#include "policy.h"
-+
-+struct kdbus_conn;
-+struct kdbus_domain;
-+struct kdbus_staging;
-+struct kdbus_user;
-+
-+/**
-+ * struct kdbus_bus - bus in a domain
-+ * @node:		kdbus_node
-+ * @id:			ID of this bus in the domain
-+ * @bus_flags:		Simple pass-through flags from userspace to userspace
-+ * @attach_flags_owner:	KDBUS_ATTACH_* flags of bus creator that other
-+ *			connections can see or query
-+ * @id128:		Unique random 128 bit ID of this bus
-+ * @bloom:		Bloom parameters
-+ * @domain:		Domain of this bus
-+ * @creator:		Creator of the bus
-+ * @creator_meta:	Meta information about the bus creator
-+ * @last_message_id:	Last used message id
-+ * @policy_db:		Policy database for this bus
-+ * @name_registry:	Name registry of this bus
-+ * @conn_rwlock:	Read/Write lock for all lists of child connections
-+ * @conn_hash:		Map of connection IDs
-+ * @monitors_list:	Connections that monitor this bus
-+ * @notify_list:	List of pending kernel-generated messages
-+ * @notify_lock:	Notification list lock
-+ * @notify_flush_lock:	Notification flushing lock
-+ */
-+struct kdbus_bus {
-+	struct kdbus_node node;
-+
-+	/* static */
-+	u64 id;
-+	u64 bus_flags;
-+	u64 attach_flags_owner;
-+	u8 id128[16];
-+	struct kdbus_bloom_parameter bloom;
-+	struct kdbus_domain *domain;
-+	struct kdbus_user *creator;
-+	struct kdbus_meta_proc *creator_meta;
-+
-+	/* protected by own locks */
-+	atomic64_t last_message_id;
-+	struct kdbus_policy_db policy_db;
-+	struct kdbus_name_registry *name_registry;
-+
-+	/* protected by conn_rwlock */
-+	struct rw_semaphore conn_rwlock;
-+	DECLARE_HASHTABLE(conn_hash, 8);
-+	struct list_head monitors_list;
-+
-+	/* protected by notify_lock */
-+	struct list_head notify_list;
-+	spinlock_t notify_lock;
-+	struct mutex notify_flush_lock;
-+};
-+
-+struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus);
-+struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus);
-+
-+struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id);
-+void kdbus_bus_broadcast(struct kdbus_bus *bus,
-+			 struct kdbus_conn *conn_src,
-+			 struct kdbus_staging *staging);
-+void kdbus_bus_eavesdrop(struct kdbus_bus *bus,
-+			 struct kdbus_conn *conn_src,
-+			 struct kdbus_staging *staging);
-+
-+struct kdbus_bus *kdbus_cmd_bus_make(struct kdbus_domain *domain,
-+				     void __user *argp);
-+int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn, void __user *argp);
-+
-+#endif
-diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
-new file mode 100644
-index 0000000..ef63d65
---- /dev/null
-+++ b/ipc/kdbus/connection.c
-@@ -0,0 +1,2227 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/audit.h>
-+#include <linux/file.h>
-+#include <linux/fs.h>
-+#include <linux/fs_struct.h>
-+#include <linux/hashtable.h>
-+#include <linux/idr.h>
-+#include <linux/init.h>
-+#include <linux/math64.h>
-+#include <linux/mm.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/path.h>
-+#include <linux/poll.h>
-+#include <linux/sched.h>
-+#include <linux/shmem_fs.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/syscalls.h>
-+#include <linux/uio.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "endpoint.h"
-+#include "handle.h"
-+#include "match.h"
-+#include "message.h"
-+#include "metadata.h"
-+#include "names.h"
-+#include "domain.h"
-+#include "item.h"
-+#include "notify.h"
-+#include "policy.h"
-+#include "pool.h"
-+#include "reply.h"
-+#include "util.h"
-+#include "queue.h"
-+
-+#define KDBUS_CONN_ACTIVE_BIAS	(INT_MIN + 2)
-+#define KDBUS_CONN_ACTIVE_NEW	(INT_MIN + 1)
-+
-+static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep,
-+					 struct file *file,
-+					 struct kdbus_cmd_hello *hello,
-+					 const char *name,
-+					 const struct kdbus_creds *creds,
-+					 const struct kdbus_pids *pids,
-+					 const char *seclabel,
-+					 const char *conn_description)
-+{
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+	static struct lock_class_key __key;
-+#endif
-+	struct kdbus_pool_slice *slice = NULL;
-+	struct kdbus_bus *bus = ep->bus;
-+	struct kdbus_conn *conn;
-+	u64 attach_flags_send;
-+	u64 attach_flags_recv;
-+	u64 items_size = 0;
-+	bool is_policy_holder;
-+	bool is_activator;
-+	bool is_monitor;
-+	bool privileged;
-+	bool owner;
-+	struct kvec kvec;
-+	int ret;
-+
-+	struct {
-+		u64 size;
-+		u64 type;
-+		struct kdbus_bloom_parameter bloom;
-+	} bloom_item;
-+
-+	privileged = kdbus_ep_is_privileged(ep, file);
-+	owner = kdbus_ep_is_owner(ep, file);
-+
-+	is_monitor = hello->flags & KDBUS_HELLO_MONITOR;
-+	is_activator = hello->flags & KDBUS_HELLO_ACTIVATOR;
-+	is_policy_holder = hello->flags & KDBUS_HELLO_POLICY_HOLDER;
-+
-+	if (!hello->pool_size || !IS_ALIGNED(hello->pool_size, PAGE_SIZE))
-+		return ERR_PTR(-EINVAL);
-+	if (is_monitor + is_activator + is_policy_holder > 1)
-+		return ERR_PTR(-EINVAL);
-+	if (name && !is_activator && !is_policy_holder)
-+		return ERR_PTR(-EINVAL);
-+	if (!name && (is_activator || is_policy_holder))
-+		return ERR_PTR(-EINVAL);
-+	if (name && !kdbus_name_is_valid(name, true))
-+		return ERR_PTR(-EINVAL);
-+	if (is_monitor && ep->user)
-+		return ERR_PTR(-EOPNOTSUPP);
-+	if (!owner && (is_activator || is_policy_holder || is_monitor))
-+		return ERR_PTR(-EPERM);
-+	if (!owner && (creds || pids || seclabel))
-+		return ERR_PTR(-EPERM);
-+
-+	ret = kdbus_sanitize_attach_flags(hello->attach_flags_send,
-+					  &attach_flags_send);
-+	if (ret < 0)
-+		return ERR_PTR(ret);
-+
-+	ret = kdbus_sanitize_attach_flags(hello->attach_flags_recv,
-+					  &attach_flags_recv);
-+	if (ret < 0)
-+		return ERR_PTR(ret);
-+
-+	conn = kzalloc(sizeof(*conn), GFP_KERNEL);
-+	if (!conn)
-+		return ERR_PTR(-ENOMEM);
-+
-+	kref_init(&conn->kref);
-+	atomic_set(&conn->active, KDBUS_CONN_ACTIVE_NEW);
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+	lockdep_init_map(&conn->dep_map, "s_active", &__key, 0);
-+#endif
-+	mutex_init(&conn->lock);
-+	INIT_LIST_HEAD(&conn->names_list);
-+	INIT_LIST_HEAD(&conn->reply_list);
-+	atomic_set(&conn->request_count, 0);
-+	atomic_set(&conn->lost_count, 0);
-+	INIT_DELAYED_WORK(&conn->work, kdbus_reply_list_scan_work);
-+	conn->cred = get_cred(file->f_cred);
-+	conn->pid = get_pid(task_pid(current));
-+	get_fs_root(current->fs, &conn->root_path);
-+	init_waitqueue_head(&conn->wait);
-+	kdbus_queue_init(&conn->queue);
-+	conn->privileged = privileged;
-+	conn->owner = owner;
-+	conn->ep = kdbus_ep_ref(ep);
-+	conn->id = atomic64_inc_return(&bus->domain->last_id);
-+	conn->flags = hello->flags;
-+	atomic64_set(&conn->attach_flags_send, attach_flags_send);
-+	atomic64_set(&conn->attach_flags_recv, attach_flags_recv);
-+	INIT_LIST_HEAD(&conn->monitor_entry);
-+
-+	if (conn_description) {
-+		conn->description = kstrdup(conn_description, GFP_KERNEL);
-+		if (!conn->description) {
-+			ret = -ENOMEM;
-+			goto exit_unref;
-+		}
-+	}
-+
-+	conn->pool = kdbus_pool_new(conn->description, hello->pool_size);
-+	if (IS_ERR(conn->pool)) {
-+		ret = PTR_ERR(conn->pool);
-+		conn->pool = NULL;
-+		goto exit_unref;
-+	}
-+
-+	conn->match_db = kdbus_match_db_new();
-+	if (IS_ERR(conn->match_db)) {
-+		ret = PTR_ERR(conn->match_db);
-+		conn->match_db = NULL;
-+		goto exit_unref;
-+	}
-+
-+	/* return properties of this connection to the caller */
-+	hello->bus_flags = bus->bus_flags;
-+	hello->id = conn->id;
-+
-+	BUILD_BUG_ON(sizeof(bus->id128) != sizeof(hello->id128));
-+	memcpy(hello->id128, bus->id128, sizeof(hello->id128));
-+
-+	/* privileged processes can impersonate somebody else */
-+	if (creds || pids || seclabel) {
-+		conn->meta_fake = kdbus_meta_fake_new();
-+		if (IS_ERR(conn->meta_fake)) {
-+			ret = PTR_ERR(conn->meta_fake);
-+			conn->meta_fake = NULL;
-+			goto exit_unref;
-+		}
-+
-+		ret = kdbus_meta_fake_collect(conn->meta_fake,
-+					      creds, pids, seclabel);
-+		if (ret < 0)
-+			goto exit_unref;
-+	} else {
-+		conn->meta_proc = kdbus_meta_proc_new();
-+		if (IS_ERR(conn->meta_proc)) {
-+			ret = PTR_ERR(conn->meta_proc);
-+			conn->meta_proc = NULL;
-+			goto exit_unref;
-+		}
-+
-+		ret = kdbus_meta_proc_collect(conn->meta_proc,
-+					      KDBUS_ATTACH_CREDS |
-+					      KDBUS_ATTACH_PIDS |
-+					      KDBUS_ATTACH_AUXGROUPS |
-+					      KDBUS_ATTACH_TID_COMM |
-+					      KDBUS_ATTACH_PID_COMM |
-+					      KDBUS_ATTACH_EXE |
-+					      KDBUS_ATTACH_CMDLINE |
-+					      KDBUS_ATTACH_CGROUP |
-+					      KDBUS_ATTACH_CAPS |
-+					      KDBUS_ATTACH_SECLABEL |
-+					      KDBUS_ATTACH_AUDIT);
-+		if (ret < 0)
-+			goto exit_unref;
-+	}
-+
-+	/*
-+	 * Account the connection against the current user (UID), or for
-+	 * custom endpoints use the anonymous user assigned to the endpoint.
-+	 * Note that limits are always accounted against the real UID, not
-+	 * the effective UID (cred->user always points to the accounting of
-+	 * cred->uid, not cred->euid).
-+	 * In case the caller is privileged, we allow changing the accounting
-+	 * to the faked user.
-+	 */
-+	if (ep->user) {
-+		conn->user = kdbus_user_ref(ep->user);
-+	} else {
-+		kuid_t uid;
-+
-+		if (conn->meta_fake && uid_valid(conn->meta_fake->uid) &&
-+		    conn->privileged)
-+			uid = conn->meta_fake->uid;
-+		else
-+			uid = conn->cred->uid;
-+
-+		conn->user = kdbus_user_lookup(ep->bus->domain, uid);
-+		if (IS_ERR(conn->user)) {
-+			ret = PTR_ERR(conn->user);
-+			conn->user = NULL;
-+			goto exit_unref;
-+		}
-+	}
-+
-+	if (atomic_inc_return(&conn->user->connections) > KDBUS_USER_MAX_CONN) {
-+		/* decremented by destructor as conn->user is valid */
-+		ret = -EMFILE;
-+		goto exit_unref;
-+	}
-+
-+	bloom_item.size = sizeof(bloom_item);
-+	bloom_item.type = KDBUS_ITEM_BLOOM_PARAMETER;
-+	bloom_item.bloom = bus->bloom;
-+	kdbus_kvec_set(&kvec, &bloom_item, bloom_item.size, &items_size);
-+
-+	slice = kdbus_pool_slice_alloc(conn->pool, items_size, false);
-+	if (IS_ERR(slice)) {
-+		ret = PTR_ERR(slice);
-+		slice = NULL;
-+		goto exit_unref;
-+	}
-+
-+	ret = kdbus_pool_slice_copy_kvec(slice, 0, &kvec, 1, items_size);
-+	if (ret < 0)
-+		goto exit_unref;
-+
-+	kdbus_pool_slice_publish(slice, &hello->offset, &hello->items_size);
-+	kdbus_pool_slice_release(slice);
-+
-+	return conn;
-+
-+exit_unref:
-+	kdbus_pool_slice_release(slice);
-+	kdbus_conn_unref(conn);
-+	return ERR_PTR(ret);
-+}
-+
-+static void __kdbus_conn_free(struct kref *kref)
-+{
-+	struct kdbus_conn *conn = container_of(kref, struct kdbus_conn, kref);
-+
-+	WARN_ON(kdbus_conn_active(conn));
-+	WARN_ON(delayed_work_pending(&conn->work));
-+	WARN_ON(!list_empty(&conn->queue.msg_list));
-+	WARN_ON(!list_empty(&conn->names_list));
-+	WARN_ON(!list_empty(&conn->reply_list));
-+
-+	if (conn->user) {
-+		atomic_dec(&conn->user->connections);
-+		kdbus_user_unref(conn->user);
-+	}
-+
-+	kdbus_meta_fake_free(conn->meta_fake);
-+	kdbus_meta_proc_unref(conn->meta_proc);
-+	kdbus_match_db_free(conn->match_db);
-+	kdbus_pool_free(conn->pool);
-+	kdbus_ep_unref(conn->ep);
-+	path_put(&conn->root_path);
-+	put_pid(conn->pid);
-+	put_cred(conn->cred);
-+	kfree(conn->description);
-+	kfree(conn->quota);
-+	kfree(conn);
-+}
-+
-+/**
-+ * kdbus_conn_ref() - take a connection reference
-+ * @conn:		Connection, may be %NULL
-+ *
-+ * Return: the connection itself
-+ */
-+struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn)
-+{
-+	if (conn)
-+		kref_get(&conn->kref);
-+	return conn;
-+}
-+
-+/**
-+ * kdbus_conn_unref() - drop a connection reference
-+ * @conn:		Connection (may be NULL)
-+ *
-+ * When the last reference is dropped, the connection's internal structure
-+ * is freed.
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn)
-+{
-+	if (conn)
-+		kref_put(&conn->kref, __kdbus_conn_free);
-+	return NULL;
-+}
-+
-+/**
-+ * kdbus_conn_active() - connection is not disconnected
-+ * @conn:		Connection to check
-+ *
-+ * Return true if the connection was not disconnected, yet. Note that a
-+ * connection might be disconnected asynchronously, unless you hold the
-+ * connection lock. If that's not suitable for you, see kdbus_conn_acquire() to
-+ * suppress connection shutdown for a short period.
-+ *
-+ * Return: true if the connection is still active
-+ */
-+bool kdbus_conn_active(const struct kdbus_conn *conn)
-+{
-+	return atomic_read(&conn->active) >= 0;
-+}
-+
-+/**
-+ * kdbus_conn_acquire() - acquire an active connection reference
-+ * @conn:		Connection
-+ *
-+ * Users can close a connection via KDBUS_BYEBYE (or by destroying the
-+ * endpoint/bus/...) at any time. Whenever this happens, we should deny any
-+ * user-visible action on this connection and signal ECONNRESET instead.
-+ * To avoid testing for connection availability every time you take the
-+ * connection-lock, you can acquire a connection for short periods.
-+ *
-+ * By calling kdbus_conn_acquire(), you gain an "active reference" to the
-+ * connection. You must also hold a regular reference at any time! As long as
-+ * you hold the active-ref, the connection will not be shut down. However, if
-+ * the connection was shut down, you can never acquire an active-ref again.
-+ *
-+ * kdbus_conn_disconnect() disables the connection and then waits for all active
-+ * references to be dropped. It will also wake up any pending operation.
-+ * However, you must not sleep for an indefinite period while holding an
-+ * active-reference. Otherwise, kdbus_conn_disconnect() might stall. If you need
-+ * to sleep for an indefinite period, either release the reference and try to
-+ * acquire it again after waking up, or make kdbus_conn_disconnect() wake up
-+ * your wait-queue.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_conn_acquire(struct kdbus_conn *conn)
-+{
-+	if (!atomic_inc_unless_negative(&conn->active))
-+		return -ECONNRESET;
-+
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+	rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
-+#endif
-+
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_conn_release() - release an active connection reference
-+ * @conn:		Connection
-+ *
-+ * This releases an active reference that has been acquired via
-+ * kdbus_conn_acquire(). If the connection was already disabled and this is the
-+ * last active-ref that is dropped, the disconnect-waiter will be woken up and
-+ * properly close the connection.
-+ */
-+void kdbus_conn_release(struct kdbus_conn *conn)
-+{
-+	int v;
-+
-+	if (!conn)
-+		return;
-+
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+	rwsem_release(&conn->dep_map, 1, _RET_IP_);
-+#endif
-+
-+	v = atomic_dec_return(&conn->active);
-+	if (v != KDBUS_CONN_ACTIVE_BIAS)
-+		return;
-+
-+	wake_up_all(&conn->wait);
-+}
-+
-+static int kdbus_conn_connect(struct kdbus_conn *conn, const char *name)
-+{
-+	struct kdbus_ep *ep = conn->ep;
-+	struct kdbus_bus *bus = ep->bus;
-+	int ret;
-+
-+	if (WARN_ON(atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_NEW))
-+		return -EALREADY;
-+
-+	/* make sure the ep-node is active while we add our connection */
-+	if (!kdbus_node_acquire(&ep->node))
-+		return -ESHUTDOWN;
-+
-+	/* lock order: domain -> bus -> ep -> names -> conn */
-+	mutex_lock(&ep->lock);
-+	down_write(&bus->conn_rwlock);
-+
-+	/* link into monitor list */
-+	if (kdbus_conn_is_monitor(conn))
-+		list_add_tail(&conn->monitor_entry, &bus->monitors_list);
-+
-+	/* link into bus and endpoint */
-+	list_add_tail(&conn->ep_entry, &ep->conn_list);
-+	hash_add(bus->conn_hash, &conn->hentry, conn->id);
-+
-+	/* enable lookups and acquire active ref */
-+	atomic_set(&conn->active, 1);
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+	rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
-+#endif
-+
-+	up_write(&bus->conn_rwlock);
-+	mutex_unlock(&ep->lock);
-+
-+	kdbus_node_release(&ep->node);
-+
-+	/*
-+	 * Notify subscribers about the new active connection, unless it is
-+	 * a monitor. Monitors are invisible on the bus, can't be addressed
-+	 * directly, and won't cause any notifications.
-+	 */
-+	if (!kdbus_conn_is_monitor(conn)) {
-+		ret = kdbus_notify_id_change(bus, KDBUS_ITEM_ID_ADD,
-+					     conn->id, conn->flags);
-+		if (ret < 0)
-+			goto exit_disconnect;
-+	}
-+
-+	if (kdbus_conn_is_activator(conn)) {
-+		u64 flags = KDBUS_NAME_ACTIVATOR;
-+
-+		if (WARN_ON(!name)) {
-+			ret = -EINVAL;
-+			goto exit_disconnect;
-+		}
-+
-+		ret = kdbus_name_acquire(bus->name_registry, conn, name,
-+					 flags, NULL);
-+		if (ret < 0)
-+			goto exit_disconnect;
-+	}
-+
-+	kdbus_conn_release(conn);
-+	kdbus_notify_flush(bus);
-+	return 0;
-+
-+exit_disconnect:
-+	kdbus_conn_release(conn);
-+	kdbus_conn_disconnect(conn, false);
-+	return ret;
-+}
-+
-+/**
-+ * kdbus_conn_disconnect() - disconnect a connection
-+ * @conn:		The connection to disconnect
-+ * @ensure_queue_empty:	Flag to indicate if the call should fail in
-+ *			case the connection's message list is not
-+ *			empty
-+ *
-+ * If @ensure_queue_empty is true, and the connection has pending messages,
-+ * -EBUSY is returned.
-+ *
-+ * Return: 0 on success, negative errno on failure
-+ */
-+int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty)
-+{
-+	struct kdbus_queue_entry *entry, *tmp;
-+	struct kdbus_bus *bus = conn->ep->bus;
-+	struct kdbus_reply *r, *r_tmp;
-+	struct kdbus_conn *c;
-+	int i, v;
-+
-+	mutex_lock(&conn->lock);
-+	v = atomic_read(&conn->active);
-+	if (v == KDBUS_CONN_ACTIVE_NEW) {
-+		/* was never connected */
-+		mutex_unlock(&conn->lock);
-+		return 0;
-+	}
-+	if (v < 0) {
-+		/* already dead */
-+		mutex_unlock(&conn->lock);
-+		return -ECONNRESET;
-+	}
-+	if (ensure_queue_empty && !list_empty(&conn->queue.msg_list)) {
-+		/* still busy */
-+		mutex_unlock(&conn->lock);
-+		return -EBUSY;
-+	}
-+
-+	atomic_add(KDBUS_CONN_ACTIVE_BIAS, &conn->active);
-+	mutex_unlock(&conn->lock);
-+
-+	wake_up_interruptible(&conn->wait);
-+
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+	rwsem_acquire(&conn->dep_map, 0, 0, _RET_IP_);
-+	if (atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_BIAS)
-+		lock_contended(&conn->dep_map, _RET_IP_);
-+#endif
-+
-+	wait_event(conn->wait,
-+		   atomic_read(&conn->active) == KDBUS_CONN_ACTIVE_BIAS);
-+
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+	lock_acquired(&conn->dep_map, _RET_IP_);
-+	rwsem_release(&conn->dep_map, 1, _RET_IP_);
-+#endif
-+
-+	cancel_delayed_work_sync(&conn->work);
-+	kdbus_policy_remove_owner(&conn->ep->bus->policy_db, conn);
-+
-+	/* lock order: domain -> bus -> ep -> names -> conn */
-+	mutex_lock(&conn->ep->lock);
-+	down_write(&bus->conn_rwlock);
-+
-+	/* remove from bus and endpoint */
-+	hash_del(&conn->hentry);
-+	list_del(&conn->monitor_entry);
-+	list_del(&conn->ep_entry);
-+
-+	up_write(&bus->conn_rwlock);
-+	mutex_unlock(&conn->ep->lock);
-+
-+	/*
-+	 * Remove all names associated with this connection; this possibly
-+	 * moves queued messages back to the activator connection.
-+	 */
-+	kdbus_name_release_all(bus->name_registry, conn);
-+
-+	/* if we die while other connections wait for our reply, notify them */
-+	mutex_lock(&conn->lock);
-+	list_for_each_entry_safe(entry, tmp, &conn->queue.msg_list, entry) {
-+		if (entry->reply)
-+			kdbus_notify_reply_dead(bus,
-+						entry->reply->reply_dst->id,
-+						entry->reply->cookie);
-+		kdbus_queue_entry_free(entry);
-+	}
-+
-+	list_for_each_entry_safe(r, r_tmp, &conn->reply_list, entry)
-+		kdbus_reply_unlink(r);
-+	mutex_unlock(&conn->lock);
-+
-+	/* lock order: domain -> bus -> ep -> names -> conn */
-+	down_read(&bus->conn_rwlock);
-+	hash_for_each(bus->conn_hash, i, c, hentry) {
-+		mutex_lock(&c->lock);
-+		list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
-+			if (r->reply_src != conn)
-+				continue;
-+
-+			if (r->sync)
-+				kdbus_sync_reply_wakeup(r, -EPIPE);
-+			else
-+				/* send a 'connection dead' notification */
-+				kdbus_notify_reply_dead(bus, c->id, r->cookie);
-+
-+			kdbus_reply_unlink(r);
-+		}
-+		mutex_unlock(&c->lock);
-+	}
-+	up_read(&bus->conn_rwlock);
-+
-+	if (!kdbus_conn_is_monitor(conn))
-+		kdbus_notify_id_change(bus, KDBUS_ITEM_ID_REMOVE,
-+				       conn->id, conn->flags);
-+
-+	kdbus_notify_flush(bus);
-+
-+	return 0;
-+}
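
The disconnect path above relies on a biased counter: once the bias is added, the counter turns negative, new active references are refused, and the disconnecting caller waits until the counter equals the bias exactly. The following is a minimal single-threaded sketch of that idea only. The names `conn_model`, `model_*`, `ACTIVE_BIAS` and the constant's value are hypothetical stand-ins chosen for illustration; the real code uses `atomic_t`, a wait queue and lockdep annotations, none of which are modelled here.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical value; any large negative bias illustrates the idea. */
#define ACTIVE_BIAS (INT_MIN + 2)

/* Hypothetical model of a connection's active-reference counter. */
struct conn_model {
	int active;	/* >= 0: live, < 0: disconnecting or dead */
};

/* Take an active reference; refused once the bias has been applied. */
static bool model_acquire(struct conn_model *c)
{
	if (c->active < 0)
		return false;
	c->active++;
	return true;
}

/* Drop an active reference taken earlier. */
static void model_release(struct conn_model *c)
{
	c->active--;
}

/* Start disconnecting: apply the bias exactly once. */
static bool model_deactivate(struct conn_model *c)
{
	if (c->active < 0)
		return false;	/* already dead */
	c->active += ACTIVE_BIAS;
	return true;
}

/* True once every outstanding active reference has been released. */
static bool model_drained(const struct conn_model *c)
{
	return c->active == ACTIVE_BIAS;
}
```

In this model, a holder that acquired its reference before `model_deactivate()` keeps `model_drained()` false until it calls `model_release()`, which corresponds to the `wait_event()` condition in the function above.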
-+
-+/**
-+ * kdbus_conn_has_name() - check if a connection owns a name
-+ * @conn:		Connection
-+ * @name:		Well-known name to check for
-+ *
-+ * The caller must hold the registry lock of conn->ep->bus.
-+ *
-+ * Return: true if the name is currently owned by the connection
-+ */
-+bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name)
-+{
-+	struct kdbus_name_owner *owner;
-+
-+	lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
-+
-+	list_for_each_entry(owner, &conn->names_list, conn_entry)
-+		if (!(owner->flags & KDBUS_NAME_IN_QUEUE) &&
-+		    !strcmp(name, owner->name->name))
-+			return true;
-+
-+	return false;
-+}
-+
-+struct kdbus_quota {
-+	u32 memory;
-+	u16 msgs;
-+	u8 fds;
-+};
-+
-+/**
-+ * kdbus_conn_quota_inc() - increase quota accounting
-+ * @c:		connection owning the quota tracking
-+ * @u:		user to account for (or NULL for kernel accounting)
-+ * @memory:	size of memory to account for
-+ * @fds:	number of FDs to account for
-+ *
-+ * This call manages the quotas on resource @c. That is, it's used if other
-+ * users want to use the resources of connection @c, which so far only concerns
-+ * the receive queue of the destination.
-+ *
-+ * This increases the quota-accounting for user @u by @memory bytes and @fds
-+ * file descriptors. If the user has already reached the quota limits, this call
-+ * will not do any accounting but return a negative error code indicating the
-+ * failure.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_conn_quota_inc(struct kdbus_conn *c, struct kdbus_user *u,
-+			 size_t memory, size_t fds)
-+{
-+	struct kdbus_quota *quota;
-+	size_t available, accounted;
-+	unsigned int id;
-+
-+	/*
-+	 * Pool Layout:
-+	 * 50% of a pool is always owned by the connection. It is reserved for
-+	 * kernel queries, handling received messages and other tasks that are
-+	 * under control of the pool owner. The other 50% of the pool is used
-+	 * as incoming queue.
-+	 * As we optionally support user-space based policies, we need fair
-+	 * allocation schemes. Furthermore, resource utilization should be
-+	 * maximized, so only minimal resources stay reserved. However, we need
-+	 * to adapt to a dynamic number of users, as we cannot know how many
-+	 * users will talk to a connection. Therefore, the current allocation
-+	 * works like this:
-+	 * We limit the number of bytes in a destination's pool per sending
-+	 * user. The space available for a user is 33% of the unused pool space
-+	 * (whereas the space used by the user itself is also treated as
-+	 * 'unused'). This way, we favor users coming first, but keep enough
-+	 * pool space available for any following users. Given that messages are
-+	 * dequeued in FIFO order, this should balance nicely if the number of
-+	 * users grows. At the same time, this algorithm guarantees that the
-+	 * space available to a connection is reduced dynamically, the more
-+	 * concurrent users talk to a connection.
-+	 */
-+
-+	/* per user-accounting is expensive, so we keep state small */
-+	BUILD_BUG_ON(sizeof(quota->memory) != 4);
-+	BUILD_BUG_ON(sizeof(quota->msgs) != 2);
-+	BUILD_BUG_ON(sizeof(quota->fds) != 1);
-+	BUILD_BUG_ON(KDBUS_CONN_MAX_MSGS > U16_MAX);
-+	BUILD_BUG_ON(KDBUS_CONN_MAX_FDS_PER_USER > U8_MAX);
-+
-+	id = u ? u->id : KDBUS_USER_KERNEL_ID;
-+	if (id >= c->n_quota) {
-+		unsigned int users;
-+
-+		users = max(KDBUS_ALIGN8(id) + 8, id);
-+		quota = krealloc(c->quota, users * sizeof(*quota),
-+				 GFP_KERNEL | __GFP_ZERO);
-+		if (!quota)
-+			return -ENOMEM;
-+
-+		c->n_quota = users;
-+		c->quota = quota;
-+	}
-+
-+	quota = &c->quota[id];
-+	kdbus_pool_accounted(c->pool, &available, &accounted);
-+
-+	/* half the pool is _always_ reserved for the pool owner */
-+	available /= 2;
-+
-+	/*
-+	 * Pool owner slices are un-accounted slices; they can claim more
-+	 * than 50% of the queue. However, the slices we're dealing with here
-+	 * belong to the incoming queue, hence they are 'accounted' slices
-+	 * to which the 50%-limit applies.
-+	 */
-+	if (available < accounted)
-+		return -ENOBUFS;
-+
-+	/* 1/3 of the remaining space (including your own memory) */
-+	available = (available - accounted + quota->memory) / 3;
-+
-+	if (available < quota->memory ||
-+	    available - quota->memory < memory ||
-+	    quota->memory + memory > U32_MAX)
-+		return -ENOBUFS;
-+	if (quota->msgs >= KDBUS_CONN_MAX_MSGS)
-+		return -ENOBUFS;
-+	if (quota->fds + fds < quota->fds ||
-+	    quota->fds + fds > KDBUS_CONN_MAX_FDS_PER_USER)
-+		return -EMFILE;
-+
-+	quota->memory += memory;
-+	quota->fds += fds;
-+	++quota->msgs;
-+	return 0;
-+}
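
The memory part of the quota check above can be worked through in isolation. The sketch below is a userspace model, not the kernel function: `quota_memory_check()` and its parameter names are hypothetical, and it assumes the first value reported by `kdbus_pool_accounted()` is the overall pool size. It reproduces only the arithmetic (half the pool reserved for the owner, then one third of the remaining space, counting the user's own memory as unused) and leaves out the message and FD limits.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the per-user memory check.
 *   pool_size: overall size of the destination's pool (assumption)
 *   accounted: bytes currently accounted to all senders
 *   user_mem:  bytes already accounted to this sender
 *   request:   bytes the sender wants to add
 * Returns 0 if the request fits, -ENOBUFS otherwise.
 */
static int quota_memory_check(size_t pool_size, size_t accounted,
			      uint32_t user_mem, size_t request)
{
	/* half the pool is always reserved for the pool owner */
	size_t available = pool_size / 2;

	if (available < accounted)
		return -ENOBUFS;

	/* 1/3 of the remaining space, including the user's own memory */
	available = (available - accounted + user_mem) / 3;

	if (available < user_mem ||
	    available - user_mem < request ||
	    (uint64_t)user_mem + request > UINT32_MAX)
		return -ENOBUFS;

	return 0;
}
```

For a 1200-byte pool with nothing accounted, 600 bytes form the incoming queue and the first sender may use up to 200 of them; a 201-byte request is refused.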
-+
-+/**
-+ * kdbus_conn_quota_dec() - decrease quota accounting
-+ * @c:		connection owning the quota tracking
-+ * @u:		user which was accounted for (or NULL for kernel accounting)
-+ * @memory:	size of memory which was accounted for
-+ * @fds:	number of FDs which were accounted for
-+ *
-+ * This does the reverse of kdbus_conn_quota_inc(). You have to release any
-+ * accounted resources that you called kdbus_conn_quota_inc() for. However, you
-+ * must not call kdbus_conn_quota_dec() if the accounting failed (that is,
-+ * kdbus_conn_quota_inc() failed).
-+ */
-+void kdbus_conn_quota_dec(struct kdbus_conn *c, struct kdbus_user *u,
-+			  size_t memory, size_t fds)
-+{
-+	struct kdbus_quota *quota;
-+	unsigned int id;
-+
-+	id = u ? u->id : KDBUS_USER_KERNEL_ID;
-+	if (WARN_ON(id >= c->n_quota))
-+		return;
-+
-+	quota = &c->quota[id];
-+
-+	if (!WARN_ON(quota->msgs == 0))
-+		--quota->msgs;
-+	if (!WARN_ON(quota->memory < memory))
-+		quota->memory -= memory;
-+	if (!WARN_ON(quota->fds < fds))
-+		quota->fds -= fds;
-+}
-+
-+/**
-+ * kdbus_conn_lost_message() - handle lost messages
-+ * @c:		connection that lost a message
-+ *
-+ * kdbus is reliable. That means, we try hard to never lose messages. However,
-+ * memory is limited, so we cannot rely on transmissions to never fail.
-+ * Therefore, we use quota-limits to let callers know if their unicast message
-+ * cannot be transmitted to a peer. This works fine for unicasts, but for
-+ * broadcasts we cannot make the caller handle the transmission failure.
-+ * Instead, we must let the destination know that it couldn't receive a
-+ * broadcast.
-+ * As this is an unlikely scenario, we keep it simple. A single lost-counter
-+ * remembers the number of lost messages since the last call to RECV. The next
-+ * message retrieval will notify the connection that it lost messages since the
-+ * last message retrieval and thus should resync its state.
-+ */
-+void kdbus_conn_lost_message(struct kdbus_conn *c)
-+{
-+	if (atomic_inc_return(&c->lost_count) == 1)
-+		wake_up_interruptible(&c->wait);
-+}
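
The lost-message counter above wakes the receiver only on the transition from zero to one, so a burst of dropped broadcasts causes a single wakeup. A small userspace sketch of that pattern with C11 atomics follows. The type `lost_counter` and both helpers are hypothetical, and the drain step is an assumption about how a receive path might consume the counter; it is not taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-connection lost counter. */
struct lost_counter {
	atomic_uint lost;
};

/*
 * Record one lost message. Returns true only for the first loss since
 * the last drain, i.e. when the caller should wake the receiver.
 */
static bool lost_counter_inc(struct lost_counter *c)
{
	return atomic_fetch_add(&c->lost, 1) == 0;
}

/*
 * Assumed receive-side step: report how many messages were lost since
 * the previous call and reset the counter to zero.
 */
static unsigned int lost_counter_drain(struct lost_counter *c)
{
	return atomic_exchange(&c->lost, 0);
}
```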
-+
-+/* Callers should take the conn_dst lock */
-+static struct kdbus_queue_entry *
-+kdbus_conn_entry_make(struct kdbus_conn *conn_src,
-+		      struct kdbus_conn *conn_dst,
-+		      struct kdbus_staging *staging)
-+{
-+	/* The remote connection was disconnected */
-+	if (!kdbus_conn_active(conn_dst))
-+		return ERR_PTR(-ECONNRESET);
-+
-+	/*
-+	 * If the connection does not accept file descriptors but the message
-+	 * has some attached, refuse it.
-+	 *
-+	 * If this is a monitor connection, accept the message. In that
-+	 * case, all file descriptors will be set to -1 at receive time.
-+	 */
-+	if (!kdbus_conn_is_monitor(conn_dst) &&
-+	    !(conn_dst->flags & KDBUS_HELLO_ACCEPT_FD) &&
-+	    staging->gaps && staging->gaps->n_fds > 0)
-+		return ERR_PTR(-ECOMM);
-+
-+	return kdbus_queue_entry_new(conn_src, conn_dst, staging);
-+}
-+
-+/*
-+ * When synchronously responding to a message, allocate a queue entry
-+ * and attach it to the reply tracking object.
-+ * The connection's queue will never get to see it.
-+ */
-+static int kdbus_conn_entry_sync_attach(struct kdbus_conn *conn_dst,
-+					struct kdbus_staging *staging,
-+					struct kdbus_reply *reply_wake)
-+{
-+	struct kdbus_queue_entry *entry;
-+	int remote_ret, ret = 0;
-+
-+	mutex_lock(&reply_wake->reply_dst->lock);
-+
-+	/*
-+	 * If we are still waiting then proceed, allocate a queue
-+	 * entry and attach it to the reply object
-+	 */
-+	if (reply_wake->waiting) {
-+		entry = kdbus_conn_entry_make(reply_wake->reply_src, conn_dst,
-+					      staging);
-+		if (IS_ERR(entry))
-+			ret = PTR_ERR(entry);
-+		else
-+			/* Attach the entry to the reply object */
-+			reply_wake->queue_entry = entry;
-+	} else {
-+		ret = -ECONNRESET;
-+	}
-+
-+	/*
-+	 * Update the reply object and wake up remote peer only
-+	 * on appropriate return codes
-+	 *
-+	 * * -ECOMM: if the replying connection failed with -ECOMM
-+	 *           then wakeup remote peer with -EREMOTEIO
-+	 *
-+	 *           We do this to differentiate between -ECOMM errors
-+	 *           from the original sender perspective:
-+	 *           -ECOMM error during the sync send and
-+	 *           -ECOMM error during the sync reply, this last
-+	 *           one is rewritten to -EREMOTEIO
-+	 *
-+	 * * Wake up on all other return codes.
-+	 */
-+	remote_ret = ret;
-+
-+	if (ret == -ECOMM)
-+		remote_ret = -EREMOTEIO;
-+
-+	kdbus_sync_reply_wakeup(reply_wake, remote_ret);
-+	kdbus_reply_unlink(reply_wake);
-+	mutex_unlock(&reply_wake->reply_dst->lock);
-+
-+	return ret;
-+}
-+
-+/**
-+ * kdbus_conn_entry_insert() - enqueue a message into the receiver's pool
-+ * @conn_src:		The sending connection
-+ * @conn_dst:		The connection to queue into
-+ * @staging:		Message to send
-+ * @reply:		The reply tracker to attach to the queue entry
-+ * @name:		Destination name this msg is sent to, or NULL
-+ *
-+ * Return: 0 on success, negative error otherwise.
-+ */
-+int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
-+			    struct kdbus_conn *conn_dst,
-+			    struct kdbus_staging *staging,
-+			    struct kdbus_reply *reply,
-+			    const struct kdbus_name_entry *name)
-+{
-+	struct kdbus_queue_entry *entry;
-+	int ret;
-+
-+	kdbus_conn_lock2(conn_src, conn_dst);
-+
-+	entry = kdbus_conn_entry_make(conn_src, conn_dst, staging);
-+	if (IS_ERR(entry)) {
-+		ret = PTR_ERR(entry);
-+		goto exit_unlock;
-+	}
-+
-+	if (reply) {
-+		kdbus_reply_link(reply);
-+		if (!reply->sync)
-+			schedule_delayed_work(&conn_src->work, 0);
-+	}
-+
-+	/*
-+	 * Record the sequence number of the registered name; it will
-+	 * be remembered by the queue, in case messages addressed to a
-+	 * name need to be moved from or to an activator.
-+	 */
-+	if (name)
-+		entry->dst_name_id = name->name_id;
-+
-+	kdbus_queue_entry_enqueue(entry, reply);
-+	wake_up_interruptible(&conn_dst->wait);
-+
-+	ret = 0;
-+
-+exit_unlock:
-+	kdbus_conn_unlock2(conn_src, conn_dst);
-+	return ret;
-+}
-+
-+static int kdbus_conn_wait_reply(struct kdbus_conn *conn_src,
-+				 struct kdbus_cmd_send *cmd_send,
-+				 struct file *ioctl_file,
-+				 struct file *cancel_fd,
-+				 struct kdbus_reply *reply_wait,
-+				 ktime_t expire)
-+{
-+	struct kdbus_queue_entry *entry;
-+	struct poll_wqueues pwq = {};
-+	int ret;
-+
-+	if (WARN_ON(!reply_wait))
-+		return -EIO;
-+
-+	/*
-+	 * Block until the reply arrives. reply_wait is left untouched
-+	 * by the timeout scans that might be conducted for other,
-+	 * asynchronous replies of conn_src.
-+	 */
-+
-+	poll_initwait(&pwq);
-+	poll_wait(ioctl_file, &conn_src->wait, &pwq.pt);
-+
-+	for (;;) {
-+		/*
-+		 * Any of the following conditions will stop our synchronously
-+		 * blocking SEND command:
-+		 *
-+		 * a) The origin sender closed its connection
-+		 * b) The remote peer answered, setting reply_wait->waiting = 0
-+		 * c) The cancel FD was written to
-+		 * d) A signal was received
-+		 * e) The specified timeout was reached, and none of the above
-+		 *    conditions kicked in.
-+		 */
-+
-+		/*
-+		 * We have already acquired an active reference when
-+		 * entering here, but another thread may call
-+		 * KDBUS_CMD_BYEBYE which does not acquire an active
-+		 * reference, therefore kdbus_conn_disconnect() will
-+		 * not wait for us.
-+		 */
-+		if (!kdbus_conn_active(conn_src)) {
-+			ret = -ECONNRESET;
-+			break;
-+		}
-+
-+		/*
-+		 * After the replying peer unsets the waiting variable,
-+		 * it will wake us up.
-+		 */
-+		if (!reply_wait->waiting) {
-+			ret = reply_wait->err;
-+			break;
-+		}
-+
-+		if (cancel_fd) {
-+			unsigned int r;
-+
-+			r = cancel_fd->f_op->poll(cancel_fd, &pwq.pt);
-+			if (r & POLLIN) {
-+				ret = -ECANCELED;
-+				break;
-+			}
-+		}
-+
-+		if (signal_pending(current)) {
-+			ret = -EINTR;
-+			break;
-+		}
-+
-+		if (!poll_schedule_timeout(&pwq, TASK_INTERRUPTIBLE,
-+					   &expire, 0)) {
-+			ret = -ETIMEDOUT;
-+			break;
-+		}
-+
-+		/*
-+		 * Reset the poll worker func, so the waitqueues are not
-+		 * added to the poll table again. We just reuse what we've
-+		 * collected earlier for further iterations.
-+		 */
-+		init_poll_funcptr(&pwq.pt, NULL);
-+	}
-+
-+	poll_freewait(&pwq);
-+
-+	if (ret == -EINTR) {
-+		/*
-+		 * Interrupted system call. Unref the reply object, and pass
-+		 * the return value down the chain. Mark the reply as
-+		 * interrupted, so the cleanup work can remove it, but do not
-+		 * unlink it from the list. Once the syscall restarts, we'll
-+		 * pick it up and wait on it again.
-+		 */
-+		mutex_lock(&conn_src->lock);
-+		reply_wait->interrupted = true;
-+		schedule_delayed_work(&conn_src->work, 0);
-+		mutex_unlock(&conn_src->lock);
-+
-+		return -ERESTARTSYS;
-+	}
-+
-+	mutex_lock(&conn_src->lock);
-+	reply_wait->waiting = false;
-+	entry = reply_wait->queue_entry;
-+	if (entry) {
-+		ret = kdbus_queue_entry_install(entry,
-+						&cmd_send->reply.return_flags,
-+						true);
-+		kdbus_pool_slice_publish(entry->slice, &cmd_send->reply.offset,
-+					 &cmd_send->reply.msg_size);
-+		kdbus_queue_entry_free(entry);
-+	}
-+	kdbus_reply_unlink(reply_wait);
-+	mutex_unlock(&conn_src->lock);
-+
-+	return ret;
-+}
-+
-+static int kdbus_pin_dst(struct kdbus_bus *bus,
-+			 struct kdbus_staging *staging,
-+			 struct kdbus_name_entry **out_name,
-+			 struct kdbus_conn **out_dst)
-+{
-+	const struct kdbus_msg *msg = staging->msg;
-+	struct kdbus_name_owner *owner = NULL;
-+	struct kdbus_name_entry *name = NULL;
-+	struct kdbus_conn *dst = NULL;
-+	int ret;
-+
-+	lockdep_assert_held(&bus->name_registry->rwlock);
-+
-+	if (!staging->dst_name) {
-+		dst = kdbus_bus_find_conn_by_id(bus, msg->dst_id);
-+		if (!dst)
-+			return -ENXIO;
-+
-+		if (!kdbus_conn_is_ordinary(dst)) {
-+			ret = -ENXIO;
-+			goto error;
-+		}
-+	} else {
-+		name = kdbus_name_lookup_unlocked(bus->name_registry,
-+						  staging->dst_name);
-+		if (name)
-+			owner = kdbus_name_get_owner(name);
-+		if (!owner)
-+			return -ESRCH;
-+
-+		/*
-+		 * If both a name and a connection ID are given as destination
-+		 * of a message, check that the currently owning connection of
-+		 * the name matches the specified ID.
-+		 * This way, we allow userspace to send the message to a
-+		 * specific connection by ID only if the connection currently
-+		 * owns the given name.
-+		 */
-+		if (msg->dst_id != KDBUS_DST_ID_NAME &&
-+		    msg->dst_id != owner->conn->id)
-+			return -EREMCHG;
-+
-+		if ((msg->flags & KDBUS_MSG_NO_AUTO_START) &&
-+		    kdbus_conn_is_activator(owner->conn))
-+			return -EADDRNOTAVAIL;
-+
-+		dst = kdbus_conn_ref(owner->conn);
-+	}
-+
-+	*out_name = name;
-+	*out_dst = dst;
-+	return 0;
-+
-+error:
-+	kdbus_conn_unref(dst);
-+	return ret;
-+}
-+
-+static int kdbus_conn_reply(struct kdbus_conn *src,
-+			    struct kdbus_staging *staging)
-+{
-+	const struct kdbus_msg *msg = staging->msg;
-+	struct kdbus_name_entry *name = NULL;
-+	struct kdbus_reply *reply, *wake = NULL;
-+	struct kdbus_conn *dst = NULL;
-+	struct kdbus_bus *bus = src->ep->bus;
-+	int ret;
-+
-+	if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
-+	    WARN_ON(msg->flags & KDBUS_MSG_EXPECT_REPLY) ||
-+	    WARN_ON(msg->flags & KDBUS_MSG_SIGNAL))
-+		return -EINVAL;
-+
-+	/* name-registry must be locked for lookup *and* collecting data */
-+	down_read(&bus->name_registry->rwlock);
-+
-+	/* find and pin destination */
-+
-+	ret = kdbus_pin_dst(bus, staging, &name, &dst);
-+	if (ret < 0)
-+		goto exit;
-+
-+	mutex_lock(&dst->lock);
-+	reply = kdbus_reply_find(src, dst, msg->cookie_reply);
-+	if (reply) {
-+		if (reply->sync)
-+			wake = kdbus_reply_ref(reply);
-+		kdbus_reply_unlink(reply);
-+	}
-+	mutex_unlock(&dst->lock);
-+
-+	if (!reply) {
-+		ret = -EBADSLT;
-+		goto exit;
-+	}
-+
-+	/* send message */
-+
-+	kdbus_bus_eavesdrop(bus, src, staging);
-+
-+	if (wake)
-+		ret = kdbus_conn_entry_sync_attach(dst, staging, wake);
-+	else
-+		ret = kdbus_conn_entry_insert(src, dst, staging, NULL, name);
-+
-+exit:
-+	up_read(&bus->name_registry->rwlock);
-+	kdbus_reply_unref(wake);
-+	kdbus_conn_unref(dst);
-+	return ret;
-+}
-+
-+static struct kdbus_reply *kdbus_conn_call(struct kdbus_conn *src,
-+					   struct kdbus_staging *staging,
-+					   ktime_t exp)
-+{
-+	const struct kdbus_msg *msg = staging->msg;
-+	struct kdbus_name_entry *name = NULL;
-+	struct kdbus_reply *wait = NULL;
-+	struct kdbus_conn *dst = NULL;
-+	struct kdbus_bus *bus = src->ep->bus;
-+	int ret;
-+
-+	if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
-+	    WARN_ON(msg->flags & KDBUS_MSG_SIGNAL) ||
-+	    WARN_ON(!(msg->flags & KDBUS_MSG_EXPECT_REPLY)))
-+		return ERR_PTR(-EINVAL);
-+
-+	/* resume previous wait-context, if available */
-+
-+	mutex_lock(&src->lock);
-+	wait = kdbus_reply_find(NULL, src, msg->cookie);
-+	if (wait) {
-+		if (wait->interrupted) {
-+			kdbus_reply_ref(wait);
-+			wait->interrupted = false;
-+		} else {
-+			wait = NULL;
-+		}
-+	}
-+	mutex_unlock(&src->lock);
-+
-+	if (wait)
-+		return wait;
-+
-+	if (ktime_compare(ktime_get(), exp) >= 0)
-+		return ERR_PTR(-ETIMEDOUT);
-+
-+	/* name-registry must be locked for lookup *and* collecting data */
-+	down_read(&bus->name_registry->rwlock);
-+
-+	/* find and pin destination */
-+
-+	ret = kdbus_pin_dst(bus, staging, &name, &dst);
-+	if (ret < 0)
-+		goto exit;
-+
-+	if (!kdbus_conn_policy_talk(src, current_cred(), dst)) {
-+		ret = -EPERM;
-+		goto exit;
-+	}
-+
-+	wait = kdbus_reply_new(dst, src, msg, name, true);
-+	if (IS_ERR(wait)) {
-+		ret = PTR_ERR(wait);
-+		wait = NULL;
-+		goto exit;
-+	}
-+
-+	/* send message */
-+
-+	kdbus_bus_eavesdrop(bus, src, staging);
-+
-+	ret = kdbus_conn_entry_insert(src, dst, staging, wait, name);
-+	if (ret < 0)
-+		goto exit;
-+
-+	ret = 0;
-+
-+exit:
-+	up_read(&bus->name_registry->rwlock);
-+	if (ret < 0) {
-+		kdbus_reply_unref(wait);
-+		wait = ERR_PTR(ret);
-+	}
-+	kdbus_conn_unref(dst);
-+	return wait;
-+}
-+
-+static int kdbus_conn_unicast(struct kdbus_conn *src,
-+			      struct kdbus_staging *staging)
-+{
-+	const struct kdbus_msg *msg = staging->msg;
-+	struct kdbus_name_entry *name = NULL;
-+	struct kdbus_reply *wait = NULL;
-+	struct kdbus_conn *dst = NULL;
-+	struct kdbus_bus *bus = src->ep->bus;
-+	bool is_signal = (msg->flags & KDBUS_MSG_SIGNAL);
-+	int ret = 0;
-+
-+	if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
-+	    WARN_ON(!(msg->flags & KDBUS_MSG_EXPECT_REPLY) &&
-+		    msg->cookie_reply != 0))
-+		return -EINVAL;
-+
-+	/* name-registry must be locked for lookup *and* collecting data */
-+	down_read(&bus->name_registry->rwlock);
-+
-+	/* find and pin destination */
-+
-+	ret = kdbus_pin_dst(bus, staging, &name, &dst);
-+	if (ret < 0)
-+		goto exit;
-+
-+	if (is_signal) {
-+		/* like broadcasts we eavesdrop even if the msg is dropped */
-+		kdbus_bus_eavesdrop(bus, src, staging);
-+
-+		/* drop silently if peer is not interested or not privileged */
-+		if (!kdbus_match_db_match_msg(dst->match_db, src, staging) ||
-+		    !kdbus_conn_policy_talk(dst, NULL, src))
-+			goto exit;
-+	} else if (!kdbus_conn_policy_talk(src, current_cred(), dst)) {
-+		ret = -EPERM;
-+		goto exit;
-+	} else if (msg->flags & KDBUS_MSG_EXPECT_REPLY) {
-+		wait = kdbus_reply_new(dst, src, msg, name, false);
-+		if (IS_ERR(wait)) {
-+			ret = PTR_ERR(wait);
-+			wait = NULL;
-+			goto exit;
-+		}
-+	}
-+
-+	/* send message */
-+
-+	if (!is_signal)
-+		kdbus_bus_eavesdrop(bus, src, staging);
-+
-+	ret = kdbus_conn_entry_insert(src, dst, staging, wait, name);
-+	if (ret < 0 && !is_signal)
-+		goto exit;
-+
-+	/* signals are treated like broadcasts, recv-errors are ignored */
-+	ret = 0;
-+
-+exit:
-+	up_read(&bus->name_registry->rwlock);
-+	kdbus_reply_unref(wait);
-+	kdbus_conn_unref(dst);
-+	return ret;
-+}
-+
-+/**
-+ * kdbus_conn_move_messages() - move messages from one connection to another
-+ * @conn_dst:		Connection to copy to
-+ * @conn_src:		Connection to copy from
-+ * @name_id:		Filter for the sequence number of the registered
-+ *			name, 0 means no filtering.
-+ *
-+ * Move all messages from one connection to another. This is used when
-+ * an implementer connection is taking over/giving back a well-known name
-+ * from/to an activator connection.
-+ */
-+void kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
-+			      struct kdbus_conn *conn_src,
-+			      u64 name_id)
-+{
-+	struct kdbus_queue_entry *e, *e_tmp;
-+	struct kdbus_reply *r, *r_tmp;
-+	struct kdbus_bus *bus;
-+	struct kdbus_conn *c;
-+	LIST_HEAD(msg_list);
-+	int i, ret = 0;
-+
-+	if (WARN_ON(conn_src == conn_dst))
-+		return;
-+
-+	bus = conn_src->ep->bus;
-+
-+	/* lock order: domain -> bus -> ep -> names -> conn */
-+	down_read(&bus->conn_rwlock);
-+	hash_for_each(bus->conn_hash, i, c, hentry) {
-+		if (c == conn_src || c == conn_dst)
-+			continue;
-+
-+		mutex_lock(&c->lock);
-+		list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
-+			if (r->reply_src != conn_src)
-+				continue;
-+
-+			/* filter messages for a specific name */
-+			if (name_id > 0 && r->name_id != name_id)
-+				continue;
-+
-+			kdbus_conn_unref(r->reply_src);
-+			r->reply_src = kdbus_conn_ref(conn_dst);
-+		}
-+		mutex_unlock(&c->lock);
-+	}
-+	up_read(&bus->conn_rwlock);
-+
-+	kdbus_conn_lock2(conn_src, conn_dst);
-+	list_for_each_entry_safe(e, e_tmp, &conn_src->queue.msg_list, entry) {
-+		/* filter messages for a specific name */
-+		if (name_id > 0 && e->dst_name_id != name_id)
-+			continue;
-+
-+		if (!(conn_dst->flags & KDBUS_HELLO_ACCEPT_FD) &&
-+		    e->gaps && e->gaps->n_fds > 0) {
-+			kdbus_conn_lost_message(conn_dst);
-+			kdbus_queue_entry_free(e);
-+			continue;
-+		}
-+
-+		ret = kdbus_queue_entry_move(e, conn_dst);
-+		if (ret < 0) {
-+			kdbus_conn_lost_message(conn_dst);
-+			kdbus_queue_entry_free(e);
-+			continue;
-+		}
-+	}
-+	kdbus_conn_unlock2(conn_src, conn_dst);
-+
-+	/* wake up poll() */
-+	wake_up_interruptible(&conn_dst->wait);
-+}
-+
-+/* query the policy-database for all names of @whom */
-+static bool kdbus_conn_policy_query_all(struct kdbus_conn *conn,
-+					const struct cred *conn_creds,
-+					struct kdbus_policy_db *db,
-+					struct kdbus_conn *whom,
-+					unsigned int access)
-+{
-+	struct kdbus_name_owner *owner;
-+	bool pass = false;
-+	int res;
-+
-+	lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
-+
-+	down_read(&db->entries_rwlock);
-+	mutex_lock(&whom->lock);
-+
-+	list_for_each_entry(owner, &whom->names_list, conn_entry) {
-+		if (owner->flags & KDBUS_NAME_IN_QUEUE)
-+			continue;
-+
-+		res = kdbus_policy_query_unlocked(db,
-+					conn_creds ? : conn->cred,
-+					owner->name->name,
-+					kdbus_strhash(owner->name->name));
-+		if (res >= (int)access) {
-+			pass = true;
-+			break;
-+		}
-+	}
-+
-+	mutex_unlock(&whom->lock);
-+	up_read(&db->entries_rwlock);
-+
-+	return pass;
-+}
-+
-+/**
-+ * kdbus_conn_policy_own_name() - verify a connection can own the given name
-+ * @conn:		Connection
-+ * @conn_creds:		Credentials of @conn to use for policy check
-+ * @name:		Name
-+ *
-+ * This verifies that @conn is allowed to acquire the well-known name @name.
-+ *
-+ * Return: true if allowed, false if not.
-+ */
-+bool kdbus_conn_policy_own_name(struct kdbus_conn *conn,
-+				const struct cred *conn_creds,
-+				const char *name)
-+{
-+	unsigned int hash = kdbus_strhash(name);
-+	int res;
-+
-+	if (!conn_creds)
-+		conn_creds = conn->cred;
-+
-+	if (conn->ep->user) {
-+		res = kdbus_policy_query(&conn->ep->policy_db, conn_creds,
-+					 name, hash);
-+		if (res < KDBUS_POLICY_OWN)
-+			return false;
-+	}
-+
-+	if (conn->owner)
-+		return true;
-+
-+	res = kdbus_policy_query(&conn->ep->bus->policy_db, conn_creds,
-+				 name, hash);
-+	return res >= KDBUS_POLICY_OWN;
-+}
-+
-+/**
-+ * kdbus_conn_policy_talk() - verify a connection can talk to a given peer
-+ * @conn:		Connection that tries to talk
-+ * @conn_creds:		Credentials of @conn to use for policy check
-+ * @to:			Connection that is talked to
-+ *
-+ * This verifies that @conn is allowed to talk to @to.
-+ *
-+ * Return: true if allowed, false if not.
-+ */
-+bool kdbus_conn_policy_talk(struct kdbus_conn *conn,
-+			    const struct cred *conn_creds,
-+			    struct kdbus_conn *to)
-+{
-+	if (!conn_creds)
-+		conn_creds = conn->cred;
-+
-+	if (conn->ep->user &&
-+	    !kdbus_conn_policy_query_all(conn, conn_creds, &conn->ep->policy_db,
-+					 to, KDBUS_POLICY_TALK))
-+		return false;
-+
-+	if (conn->owner)
-+		return true;
-+	if (uid_eq(conn_creds->euid, to->cred->uid))
-+		return true;
-+
-+	return kdbus_conn_policy_query_all(conn, conn_creds,
-+					   &conn->ep->bus->policy_db, to,
-+					   KDBUS_POLICY_TALK);
-+}
-+
-+/**
-+ * kdbus_conn_policy_see_name_unlocked() - verify a connection can see a given
-+ *					   name
-+ * @conn:		Connection
-+ * @conn_creds:		Credentials of @conn to use for policy check
-+ * @name:		Name
-+ *
-+ * This verifies that @conn is allowed to see the well-known name @name. Caller
-+ * must hold policy-lock.
-+ *
-+ * Return: true if allowed, false if not.
-+ */
-+bool kdbus_conn_policy_see_name_unlocked(struct kdbus_conn *conn,
-+					 const struct cred *conn_creds,
-+					 const char *name)
-+{
-+	int res;
-+
-+	/*
-+	 * By default, all names are visible on a bus. SEE policies can only be
-+	 * installed on custom endpoints, where by default no name is visible.
-+	 */
-+	if (!conn->ep->user)
-+		return true;
-+
-+	res = kdbus_policy_query_unlocked(&conn->ep->policy_db,
-+					  conn_creds ? : conn->cred,
-+					  name, kdbus_strhash(name));
-+	return res >= KDBUS_POLICY_SEE;
-+}
-+
-+static bool kdbus_conn_policy_see_name(struct kdbus_conn *conn,
-+				       const struct cred *conn_creds,
-+				       const char *name)
-+{
-+	bool res;
-+
-+	down_read(&conn->ep->policy_db.entries_rwlock);
-+	res = kdbus_conn_policy_see_name_unlocked(conn, conn_creds, name);
-+	up_read(&conn->ep->policy_db.entries_rwlock);
-+
-+	return res;
-+}
-+
-+static bool kdbus_conn_policy_see(struct kdbus_conn *conn,
-+				  const struct cred *conn_creds,
-+				  struct kdbus_conn *whom)
-+{
-+	/*
-+	 * By default, all names are visible on a bus, so a connection can
-+	 * always see other connections. SEE policies can only be installed on
-+	 * custom endpoints, where by default no name is visible and we hide
-+	 * peers from each other, unless you see at least _one_ name of the
-+	 * peer.
-+	 */
-+	return !conn->ep->user ||
-+	       kdbus_conn_policy_query_all(conn, conn_creds,
-+					   &conn->ep->policy_db, whom,
-+					   KDBUS_POLICY_SEE);
-+}
-+
-+/**
-+ * kdbus_conn_policy_see_notification() - verify a connection is allowed to
-+ *					  receive a given kernel notification
-+ * @conn:		Connection
-+ * @conn_creds:		Credentials of @conn to use for policy check
-+ * @msg:		Notification message
-+ *
-+ * This checks whether @conn is allowed to see the kernel notification.
-+ *
-+ * Return: true if allowed, false if not.
-+ */
-+bool kdbus_conn_policy_see_notification(struct kdbus_conn *conn,
-+					const struct cred *conn_creds,
-+					const struct kdbus_msg *msg)
-+{
-+	/*
-+	 * Depending on the notification type, broadcasted kernel notifications
-+	 * have to be filtered:
-+	 *
-+	 * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}: This notification is forwarded
-+	 *     to a peer if, and only if, that peer can see the name this
-+	 *     notification is for.
-+	 *
-+	 * KDBUS_ITEM_ID_{ADD,REMOVE}: Notifications for ID changes are
-+	 *     broadcast to everyone, to allow tracking peers.
-+	 */
-+
-+	switch (msg->items[0].type) {
-+	case KDBUS_ITEM_NAME_ADD:
-+	case KDBUS_ITEM_NAME_REMOVE:
-+	case KDBUS_ITEM_NAME_CHANGE:
-+		return kdbus_conn_policy_see_name(conn, conn_creds,
-+					msg->items[0].name_change.name);
-+
-+	case KDBUS_ITEM_ID_ADD:
-+	case KDBUS_ITEM_ID_REMOVE:
-+		return true;
-+
-+	default:
-+		WARN(1, "Invalid type for notification broadcast: %llu\n",
-+		     (unsigned long long)msg->items[0].type);
-+		return false;
-+	}
-+}
-+
-+/**
-+ * kdbus_cmd_hello() - handle KDBUS_CMD_HELLO
-+ * @ep:			Endpoint to operate on
-+ * @file:		File this connection is opened on
-+ * @argp:		Command payload
-+ *
-+ * Return: NULL or newly created connection on success, ERR_PTR on failure.
-+ */
-+struct kdbus_conn *kdbus_cmd_hello(struct kdbus_ep *ep, struct file *file,
-+				   void __user *argp)
-+{
-+	struct kdbus_cmd_hello *cmd;
-+	struct kdbus_conn *c = NULL;
-+	const char *item_name;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+		{ .type = KDBUS_ITEM_NAME },
-+		{ .type = KDBUS_ITEM_CREDS },
-+		{ .type = KDBUS_ITEM_PIDS },
-+		{ .type = KDBUS_ITEM_SECLABEL },
-+		{ .type = KDBUS_ITEM_CONN_DESCRIPTION },
-+		{ .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+				 KDBUS_HELLO_ACCEPT_FD |
-+				 KDBUS_HELLO_ACTIVATOR |
-+				 KDBUS_HELLO_POLICY_HOLDER |
-+				 KDBUS_HELLO_MONITOR,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret < 0)
-+		return ERR_PTR(ret);
-+	if (ret > 0)
-+		return NULL;
-+
-+	item_name = argv[1].item ? argv[1].item->str : NULL;
-+
-+	c = kdbus_conn_new(ep, file, cmd, item_name,
-+			   argv[2].item ? &argv[2].item->creds : NULL,
-+			   argv[3].item ? &argv[3].item->pids : NULL,
-+			   argv[4].item ? argv[4].item->str : NULL,
-+			   argv[5].item ? argv[5].item->str : NULL);
-+	if (IS_ERR(c)) {
-+		ret = PTR_ERR(c);
-+		c = NULL;
-+		goto exit;
-+	}
-+
-+	ret = kdbus_conn_connect(c, item_name);
-+	if (ret < 0)
-+		goto exit;
-+
-+	if (kdbus_conn_is_activator(c) || kdbus_conn_is_policy_holder(c)) {
-+		ret = kdbus_conn_acquire(c);
-+		if (ret < 0)
-+			goto exit;
-+
-+		ret = kdbus_policy_set(&c->ep->bus->policy_db, args.items,
-+				       args.items_size, 1,
-+				       kdbus_conn_is_policy_holder(c), c);
-+		kdbus_conn_release(c);
-+		if (ret < 0)
-+			goto exit;
-+	}
-+
-+	if (copy_to_user(argp, cmd, sizeof(*cmd)))
-+		ret = -EFAULT;
-+
-+exit:
-+	ret = kdbus_args_clear(&args, ret);
-+	if (ret < 0) {
-+		if (c) {
-+			kdbus_conn_disconnect(c, false);
-+			kdbus_conn_unref(c);
-+		}
-+		return ERR_PTR(ret);
-+	}
-+	return c;
-+}
-+
-+/**
-+ * kdbus_cmd_byebye_unlocked() - handle KDBUS_CMD_BYEBYE
-+ * @conn:		connection to operate on
-+ * @argp:		command payload
-+ *
-+ * The caller must not hold any active reference to @conn or this will deadlock.
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_byebye_unlocked(struct kdbus_conn *conn, void __user *argp)
-+{
-+	struct kdbus_cmd *cmd;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	if (!kdbus_conn_is_ordinary(conn))
-+		return -EOPNOTSUPP;
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret != 0)
-+		return ret;
-+
-+	ret = kdbus_conn_disconnect(conn, true);
-+	return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_conn_info() - handle KDBUS_CMD_CONN_INFO
-+ * @conn:		connection to operate on
-+ * @argp:		command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_conn_info(struct kdbus_conn *conn, void __user *argp)
-+{
-+	struct kdbus_meta_conn *conn_meta = NULL;
-+	struct kdbus_pool_slice *slice = NULL;
-+	struct kdbus_name_entry *entry = NULL;
-+	struct kdbus_name_owner *owner = NULL;
-+	struct kdbus_conn *owner_conn = NULL;
-+	struct kdbus_item *meta_items = NULL;
-+	struct kdbus_info info = {};
-+	struct kdbus_cmd_info *cmd;
-+	struct kdbus_bus *bus = conn->ep->bus;
-+	struct kvec kvec[3];
-+	size_t meta_size, cnt = 0;
-+	const char *name;
-+	u64 attach_flags, size = 0;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+		{ .type = KDBUS_ITEM_NAME },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret != 0)
-+		return ret;
-+
-+	/* registry must be held throughout lookup *and* collecting data */
-+	down_read(&bus->name_registry->rwlock);
-+
-+	ret = kdbus_sanitize_attach_flags(cmd->attach_flags, &attach_flags);
-+	if (ret < 0)
-+		goto exit;
-+
-+	name = argv[1].item ? argv[1].item->str : NULL;
-+
-+	if (name) {
-+		entry = kdbus_name_lookup_unlocked(bus->name_registry, name);
-+		if (entry)
-+			owner = kdbus_name_get_owner(entry);
-+		if (!owner ||
-+		    !kdbus_conn_policy_see_name(conn, current_cred(), name) ||
-+		    (cmd->id != 0 && owner->conn->id != cmd->id)) {
-+			/* pretend a name doesn't exist if you cannot see it */
-+			ret = -ESRCH;
-+			goto exit;
-+		}
-+
-+		owner_conn = kdbus_conn_ref(owner->conn);
-+	} else if (cmd->id > 0) {
-+		owner_conn = kdbus_bus_find_conn_by_id(bus, cmd->id);
-+		if (!owner_conn || !kdbus_conn_policy_see(conn, current_cred(),
-+							  owner_conn)) {
-+			/* pretend an id doesn't exist if you cannot see it */
-+			ret = -ENXIO;
-+			goto exit;
-+		}
-+	} else {
-+		ret = -EINVAL;
-+		goto exit;
-+	}
-+
-+	attach_flags &= atomic64_read(&owner_conn->attach_flags_send);
-+
-+	conn_meta = kdbus_meta_conn_new();
-+	if (IS_ERR(conn_meta)) {
-+		ret = PTR_ERR(conn_meta);
-+		conn_meta = NULL;
-+		goto exit;
-+	}
-+
-+	ret = kdbus_meta_conn_collect(conn_meta, owner_conn, 0, attach_flags);
-+	if (ret < 0)
-+		goto exit;
-+
-+	ret = kdbus_meta_emit(owner_conn->meta_proc, owner_conn->meta_fake,
-+			      conn_meta, conn, attach_flags,
-+			      &meta_items, &meta_size);
-+	if (ret < 0)
-+		goto exit;
-+
-+	info.id = owner_conn->id;
-+	info.flags = owner_conn->flags;
-+
-+	kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &size);
-+	if (meta_size > 0) {
-+		kdbus_kvec_set(&kvec[cnt++], meta_items, meta_size, &size);
-+		cnt += !!kdbus_kvec_pad(&kvec[cnt], &size);
-+	}
-+
-+	info.size = size;
-+
-+	slice = kdbus_pool_slice_alloc(conn->pool, size, false);
-+	if (IS_ERR(slice)) {
-+		ret = PTR_ERR(slice);
-+		slice = NULL;
-+		goto exit;
-+	}
-+
-+	ret = kdbus_pool_slice_copy_kvec(slice, 0, kvec, cnt, size);
-+	if (ret < 0)
-+		goto exit;
-+
-+	kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->info_size);
-+
-+	if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
-+	    kdbus_member_set_user(&cmd->info_size, argp,
-+				  typeof(*cmd), info_size)) {
-+		ret = -EFAULT;
-+		goto exit;
-+	}
-+
-+	ret = 0;
-+
-+exit:
-+	up_read(&bus->name_registry->rwlock);
-+	kdbus_pool_slice_release(slice);
-+	kfree(meta_items);
-+	kdbus_meta_conn_unref(conn_meta);
-+	kdbus_conn_unref(owner_conn);
-+	return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_update() - handle KDBUS_CMD_UPDATE
-+ * @conn:		connection to operate on
-+ * @argp:		command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_update(struct kdbus_conn *conn, void __user *argp)
-+{
-+	struct kdbus_item *item_policy;
-+	u64 *item_attach_send = NULL;
-+	u64 *item_attach_recv = NULL;
-+	struct kdbus_cmd *cmd;
-+	u64 attach_send;
-+	u64 attach_recv;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+		{ .type = KDBUS_ITEM_ATTACH_FLAGS_SEND },
-+		{ .type = KDBUS_ITEM_ATTACH_FLAGS_RECV },
-+		{ .type = KDBUS_ITEM_NAME, .multiple = true },
-+		{ .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret != 0)
-+		return ret;
-+
-+	item_attach_send = argv[1].item ? &argv[1].item->data64[0] : NULL;
-+	item_attach_recv = argv[2].item ? &argv[2].item->data64[0] : NULL;
-+	item_policy = argv[3].item ? : argv[4].item;
-+
-+	if (item_attach_send) {
-+		if (!kdbus_conn_is_ordinary(conn) &&
-+		    !kdbus_conn_is_monitor(conn)) {
-+			ret = -EOPNOTSUPP;
-+			goto exit;
-+		}
-+
-+		ret = kdbus_sanitize_attach_flags(*item_attach_send,
-+						  &attach_send);
-+		if (ret < 0)
-+			goto exit;
-+	}
-+
-+	if (item_attach_recv) {
-+		if (!kdbus_conn_is_ordinary(conn) &&
-+		    !kdbus_conn_is_monitor(conn) &&
-+		    !kdbus_conn_is_activator(conn)) {
-+			ret = -EOPNOTSUPP;
-+			goto exit;
-+		}
-+
-+		ret = kdbus_sanitize_attach_flags(*item_attach_recv,
-+						  &attach_recv);
-+		if (ret < 0)
-+			goto exit;
-+	}
-+
-+	if (item_policy && !kdbus_conn_is_policy_holder(conn)) {
-+		ret = -EOPNOTSUPP;
-+		goto exit;
-+	}
-+
-+	/* now that we verified the input, update the connection */
-+
-+	if (item_policy) {
-+		ret = kdbus_policy_set(&conn->ep->bus->policy_db, cmd->items,
-+				       KDBUS_ITEMS_SIZE(cmd, items),
-+				       1, true, conn);
-+		if (ret < 0)
-+			goto exit;
-+	}
-+
-+	if (item_attach_send)
-+		atomic64_set(&conn->attach_flags_send, attach_send);
-+
-+	if (item_attach_recv)
-+		atomic64_set(&conn->attach_flags_recv, attach_recv);
-+
-+exit:
-+	return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_send() - handle KDBUS_CMD_SEND
-+ * @conn:		connection to operate on
-+ * @f:			file this command was called on
-+ * @argp:		command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_send(struct kdbus_conn *conn, struct file *f, void __user *argp)
-+{
-+	struct kdbus_cmd_send *cmd;
-+	struct kdbus_staging *staging = NULL;
-+	struct kdbus_msg *msg = NULL;
-+	struct file *cancel_fd = NULL;
-+	int ret, ret2;
-+
-+	/* command arguments */
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+		{ .type = KDBUS_ITEM_CANCEL_FD },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+				 KDBUS_SEND_SYNC_REPLY,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	/* message arguments */
-+	struct kdbus_arg msg_argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+		{ .type = KDBUS_ITEM_PAYLOAD_VEC, .multiple = true },
-+		{ .type = KDBUS_ITEM_PAYLOAD_MEMFD, .multiple = true },
-+		{ .type = KDBUS_ITEM_FDS },
-+		{ .type = KDBUS_ITEM_BLOOM_FILTER },
-+		{ .type = KDBUS_ITEM_DST_NAME },
-+	};
-+	struct kdbus_args msg_args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+				 KDBUS_MSG_EXPECT_REPLY |
-+				 KDBUS_MSG_NO_AUTO_START |
-+				 KDBUS_MSG_SIGNAL,
-+		.argv = msg_argv,
-+		.argc = ARRAY_SIZE(msg_argv),
-+	};
-+
-+	if (!kdbus_conn_is_ordinary(conn))
-+		return -EOPNOTSUPP;
-+
-+	/* make sure to parse both, @cmd and @msg on negotiation */
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret < 0)
-+		goto exit;
-+	else if (ret > 0 && !cmd->msg_address) /* negotiation without msg */
-+		goto exit;
-+
-+	ret2 = kdbus_args_parse_msg(&msg_args, KDBUS_PTR(cmd->msg_address),
-+				    &msg);
-+	if (ret2 < 0) { /* cannot parse message */
-+		ret = ret2;
-+		goto exit;
-+	} else if (ret2 > 0 && !ret) { /* msg-negot implies cmd-negot */
-+		ret = -EINVAL;
-+		goto exit;
-+	} else if (ret > 0) { /* negotiation */
-+		goto exit;
-+	}
-+
-+	/* here we parsed both, @cmd and @msg, and neither wants negotiation */
-+
-+	cmd->reply.return_flags = 0;
-+	kdbus_pool_publish_empty(conn->pool, &cmd->reply.offset,
-+				 &cmd->reply.msg_size);
-+
-+	if (argv[1].item) {
-+		cancel_fd = fget(argv[1].item->fds[0]);
-+		if (!cancel_fd) {
-+			ret = -EBADF;
-+			goto exit;
-+		}
-+
-+		if (!cancel_fd->f_op->poll) {
-+			ret = -EINVAL;
-+			goto exit;
-+		}
-+	}
-+
-+	/* patch-in the source of this message */
-+	if (msg->src_id > 0 && msg->src_id != conn->id) {
-+		ret = -EINVAL;
-+		goto exit;
-+	}
-+	msg->src_id = conn->id;
-+
-+	staging = kdbus_staging_new_user(conn->ep->bus, cmd, msg);
-+	if (IS_ERR(staging)) {
-+		ret = PTR_ERR(staging);
-+		staging = NULL;
-+		goto exit;
-+	}
-+
-+	if (msg->dst_id == KDBUS_DST_ID_BROADCAST) {
-+		down_read(&conn->ep->bus->name_registry->rwlock);
-+		kdbus_bus_broadcast(conn->ep->bus, conn, staging);
-+		up_read(&conn->ep->bus->name_registry->rwlock);
-+	} else if (cmd->flags & KDBUS_SEND_SYNC_REPLY) {
-+		struct kdbus_reply *r;
-+		ktime_t exp;
-+
-+		exp = ns_to_ktime(msg->timeout_ns);
-+		r = kdbus_conn_call(conn, staging, exp);
-+		if (IS_ERR(r)) {
-+			ret = PTR_ERR(r);
-+			goto exit;
-+		}
-+
-+		ret = kdbus_conn_wait_reply(conn, cmd, f, cancel_fd, r, exp);
-+		kdbus_reply_unref(r);
-+		if (ret < 0)
-+			goto exit;
-+	} else if ((msg->flags & KDBUS_MSG_EXPECT_REPLY) ||
-+		   msg->cookie_reply == 0) {
-+		ret = kdbus_conn_unicast(conn, staging);
-+		if (ret < 0)
-+			goto exit;
-+	} else {
-+		ret = kdbus_conn_reply(conn, staging);
-+		if (ret < 0)
-+			goto exit;
-+	}
-+
-+	if (kdbus_member_set_user(&cmd->reply, argp, typeof(*cmd), reply))
-+		ret = -EFAULT;
-+
-+exit:
-+	if (cancel_fd)
-+		fput(cancel_fd);
-+	kdbus_staging_free(staging);
-+	ret = kdbus_args_clear(&msg_args, ret);
-+	return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_recv() - handle KDBUS_CMD_RECV
-+ * @conn:		connection to operate on
-+ * @argp:		command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_recv(struct kdbus_conn *conn, void __user *argp)
-+{
-+	struct kdbus_queue_entry *entry;
-+	struct kdbus_cmd_recv *cmd;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+				 KDBUS_RECV_PEEK |
-+				 KDBUS_RECV_DROP |
-+				 KDBUS_RECV_USE_PRIORITY,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	if (!kdbus_conn_is_ordinary(conn) &&
-+	    !kdbus_conn_is_monitor(conn) &&
-+	    !kdbus_conn_is_activator(conn))
-+		return -EOPNOTSUPP;
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret != 0)
-+		return ret;
-+
-+	cmd->dropped_msgs = 0;
-+	cmd->msg.return_flags = 0;
-+	kdbus_pool_publish_empty(conn->pool, &cmd->msg.offset,
-+				 &cmd->msg.msg_size);
-+
-+	/* DROP+priority is not reliable, so prevent it */
-+	if ((cmd->flags & KDBUS_RECV_DROP) &&
-+	    (cmd->flags & KDBUS_RECV_USE_PRIORITY)) {
-+		ret = -EINVAL;
-+		goto exit;
-+	}
-+
-+	mutex_lock(&conn->lock);
-+
-+	entry = kdbus_queue_peek(&conn->queue, cmd->priority,
-+				 cmd->flags & KDBUS_RECV_USE_PRIORITY);
-+	if (!entry) {
-+		mutex_unlock(&conn->lock);
-+		ret = -EAGAIN;
-+	} else if (cmd->flags & KDBUS_RECV_DROP) {
-+		struct kdbus_reply *reply = kdbus_reply_ref(entry->reply);
-+
-+		kdbus_queue_entry_free(entry);
-+
-+		mutex_unlock(&conn->lock);
-+
-+		if (reply) {
-+			mutex_lock(&reply->reply_dst->lock);
-+			if (!list_empty(&reply->entry)) {
-+				kdbus_reply_unlink(reply);
-+				if (reply->sync)
-+					kdbus_sync_reply_wakeup(reply, -EPIPE);
-+				else
-+					kdbus_notify_reply_dead(conn->ep->bus,
-+							reply->reply_dst->id,
-+							reply->cookie);
-+			}
-+			mutex_unlock(&reply->reply_dst->lock);
-+			kdbus_notify_flush(conn->ep->bus);
-+		}
-+
-+		kdbus_reply_unref(reply);
-+	} else {
-+		bool install_fds;
-+
-+		/*
-+		 * PEEK just returns the location of the next message. Do not
-+		 * install FDs nor memfds nor anything else. The only
-+		 * information of interest should be the message header and
-+		 * metadata. Any FD numbers in the payload are undefined for
-+		 * PEEK'ed messages.
-+		 * Also make sure to never install fds into a connection that
-+		 * has refused to receive any. Ordinary connections will not get
-+		 * messages with FDs queued (the receiver will get -ECOMM), but
-+		 * eavesdroppers might.
-+		 */
-+		install_fds = (conn->flags & KDBUS_HELLO_ACCEPT_FD) &&
-+			      !(cmd->flags & KDBUS_RECV_PEEK);
-+
-+		ret = kdbus_queue_entry_install(entry,
-+						&cmd->msg.return_flags,
-+						install_fds);
-+		if (ret < 0) {
-+			mutex_unlock(&conn->lock);
-+			goto exit;
-+		}
-+
-+		kdbus_pool_slice_publish(entry->slice, &cmd->msg.offset,
-+					 &cmd->msg.msg_size);
-+
-+		if (!(cmd->flags & KDBUS_RECV_PEEK))
-+			kdbus_queue_entry_free(entry);
-+
-+		mutex_unlock(&conn->lock);
-+	}
-+
-+	cmd->dropped_msgs = atomic_xchg(&conn->lost_count, 0);
-+	if (cmd->dropped_msgs > 0)
-+		cmd->return_flags |= KDBUS_RECV_RETURN_DROPPED_MSGS;
-+
-+	if (kdbus_member_set_user(&cmd->msg, argp, typeof(*cmd), msg) ||
-+	    kdbus_member_set_user(&cmd->dropped_msgs, argp, typeof(*cmd),
-+				  dropped_msgs))
-+		ret = -EFAULT;
-+
-+exit:
-+	return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_free() - handle KDBUS_CMD_FREE
-+ * @conn:		connection to operate on
-+ * @argp:		command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_free(struct kdbus_conn *conn, void __user *argp)
-+{
-+	struct kdbus_cmd_free *cmd;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	if (!kdbus_conn_is_ordinary(conn) &&
-+	    !kdbus_conn_is_monitor(conn) &&
-+	    !kdbus_conn_is_activator(conn))
-+		return -EOPNOTSUPP;
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret != 0)
-+		return ret;
-+
-+	ret = kdbus_pool_release_offset(conn->pool, cmd->offset);
-+
-+	return kdbus_args_clear(&args, ret);
-+}
-diff --git a/ipc/kdbus/connection.h b/ipc/kdbus/connection.h
-new file mode 100644
-index 0000000..1ad0820
---- /dev/null
-+++ b/ipc/kdbus/connection.h
-@@ -0,0 +1,260 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_CONNECTION_H
-+#define __KDBUS_CONNECTION_H
-+
-+#include <linux/atomic.h>
-+#include <linux/kref.h>
-+#include <linux/lockdep.h>
-+#include <linux/path.h>
-+
-+#include "limits.h"
-+#include "metadata.h"
-+#include "pool.h"
-+#include "queue.h"
-+#include "util.h"
-+
-+#define KDBUS_HELLO_SPECIAL_CONN	(KDBUS_HELLO_ACTIVATOR | \
-+					 KDBUS_HELLO_POLICY_HOLDER | \
-+					 KDBUS_HELLO_MONITOR)
-+
-+struct kdbus_name_entry;
-+struct kdbus_quota;
-+struct kdbus_staging;
-+
-+/**
-+ * struct kdbus_conn - connection to a bus
-+ * @kref:		Reference count
-+ * @active:		Active references to the connection
-+ * @id:			Connection ID
-+ * @flags:		KDBUS_HELLO_* flags
-+ * @attach_flags_send:	KDBUS_ATTACH_* flags for sending
-+ * @attach_flags_recv:	KDBUS_ATTACH_* flags for receiving
-+ * @description:	Human-readable connection description, used for
-+ *			debugging. This field is only set when the
-+ *			connection is created.
-+ * @ep:			The endpoint this connection belongs to
-+ * @lock:		Connection data lock
-+ * @hentry:		Entry in ID <-> connection map
-+ * @ep_entry:		Entry in endpoint
-+ * @monitor_entry:	Entry in monitor, if the connection is a monitor
-+ * @reply_list:		List of connections this connection should
-+ *			reply to
-+ * @work:		Delayed work to handle timeouts
-+ *			activator for
-+ * @match_db:		Subscription filter to broadcast messages
-+ * @meta_proc:		Process metadata of connection creator, or NULL
-+ * @meta_fake:		Faked metadata, or NULL
-+ * @pool:		The user's buffer to receive messages
-+ * @user:		Owner of the connection
-+ * @cred:		The credentials of the connection at creation time
-+ * @pid:		Pid at creation time
-+ * @root_path:		Root path at creation time
-+ * @request_count:	Number of pending requests issued by this
-+ *			connection that are waiting for replies from
-+ *			other peers
-+ * @lost_count:		Number of lost broadcast messages
-+ * @wait:		Wake up this endpoint
-+ * @queue:		The message queue associated with this connection
-+ * @quota:		Array of per-user quota indexed by user->id
-+ * @n_quota:		Number of elements in quota array
-+ * @names_list:		List of well-known names
-+ * @name_count:		Number of owned well-known names
-+ * @privileged:		Whether this connection is privileged on the domain
-+ * @owner:		Owned by the same user as the bus owner
-+ */
-+struct kdbus_conn {
-+	struct kref kref;
-+	atomic_t active;
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+	struct lockdep_map dep_map;
-+#endif
-+	u64 id;
-+	u64 flags;
-+	atomic64_t attach_flags_send;
-+	atomic64_t attach_flags_recv;
-+	const char *description;
-+	struct kdbus_ep *ep;
-+	struct mutex lock;
-+	struct hlist_node hentry;
-+	struct list_head ep_entry;
-+	struct list_head monitor_entry;
-+	struct list_head reply_list;
-+	struct delayed_work work;
-+	struct kdbus_match_db *match_db;
-+	struct kdbus_meta_proc *meta_proc;
-+	struct kdbus_meta_fake *meta_fake;
-+	struct kdbus_pool *pool;
-+	struct kdbus_user *user;
-+	const struct cred *cred;
-+	struct pid *pid;
-+	struct path root_path;
-+	atomic_t request_count;
-+	atomic_t lost_count;
-+	wait_queue_head_t wait;
-+	struct kdbus_queue queue;
-+
-+	struct kdbus_quota *quota;
-+	unsigned int n_quota;
-+
-+	/* protected by registry->rwlock */
-+	struct list_head names_list;
-+	unsigned int name_count;
-+
-+	bool privileged:1;
-+	bool owner:1;
-+};
-+
-+struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn);
-+struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn);
-+bool kdbus_conn_active(const struct kdbus_conn *conn);
-+int kdbus_conn_acquire(struct kdbus_conn *conn);
-+void kdbus_conn_release(struct kdbus_conn *conn);
-+int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty);
-+bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name);
-+int kdbus_conn_quota_inc(struct kdbus_conn *c, struct kdbus_user *u,
-+			 size_t memory, size_t fds);
-+void kdbus_conn_quota_dec(struct kdbus_conn *c, struct kdbus_user *u,
-+			  size_t memory, size_t fds);
-+void kdbus_conn_lost_message(struct kdbus_conn *c);
-+int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
-+			    struct kdbus_conn *conn_dst,
-+			    struct kdbus_staging *staging,
-+			    struct kdbus_reply *reply,
-+			    const struct kdbus_name_entry *name);
-+void kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
-+			      struct kdbus_conn *conn_src,
-+			      u64 name_id);
-+
-+/* policy */
-+bool kdbus_conn_policy_own_name(struct kdbus_conn *conn,
-+				const struct cred *conn_creds,
-+				const char *name);
-+bool kdbus_conn_policy_talk(struct kdbus_conn *conn,
-+			    const struct cred *conn_creds,
-+			    struct kdbus_conn *to);
-+bool kdbus_conn_policy_see_name_unlocked(struct kdbus_conn *conn,
-+					 const struct cred *curr_creds,
-+					 const char *name);
-+bool kdbus_conn_policy_see_notification(struct kdbus_conn *conn,
-+					const struct cred *curr_creds,
-+					const struct kdbus_msg *msg);
-+
-+/* command dispatcher */
-+struct kdbus_conn *kdbus_cmd_hello(struct kdbus_ep *ep, struct file *file,
-+				   void __user *argp);
-+int kdbus_cmd_byebye_unlocked(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_conn_info(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_update(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_send(struct kdbus_conn *conn, struct file *f, void __user *argp);
-+int kdbus_cmd_recv(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_free(struct kdbus_conn *conn, void __user *argp);
-+
-+/**
-+ * kdbus_conn_is_ordinary() - Check if connection is ordinary
-+ * @conn:		The connection to check
-+ *
-+ * Return: Non-zero if the connection is an ordinary connection
-+ */
-+static inline int kdbus_conn_is_ordinary(const struct kdbus_conn *conn)
-+{
-+	return !(conn->flags & KDBUS_HELLO_SPECIAL_CONN);
-+}
-+
-+/**
-+ * kdbus_conn_is_activator() - Check if connection is an activator
-+ * @conn:		The connection to check
-+ *
-+ * Return: Non-zero if the connection is an activator
-+ */
-+static inline int kdbus_conn_is_activator(const struct kdbus_conn *conn)
-+{
-+	return conn->flags & KDBUS_HELLO_ACTIVATOR;
-+}
-+
-+/**
-+ * kdbus_conn_is_policy_holder() - Check if connection is a policy holder
-+ * @conn:		The connection to check
-+ *
-+ * Return: Non-zero if the connection is a policy holder
-+ */
-+static inline int kdbus_conn_is_policy_holder(const struct kdbus_conn *conn)
-+{
-+	return conn->flags & KDBUS_HELLO_POLICY_HOLDER;
-+}
-+
-+/**
-+ * kdbus_conn_is_monitor() - Check if connection is a monitor
-+ * @conn:		The connection to check
-+ *
-+ * Return: Non-zero if the connection is a monitor
-+ */
-+static inline int kdbus_conn_is_monitor(const struct kdbus_conn *conn)
-+{
-+	return conn->flags & KDBUS_HELLO_MONITOR;
-+}
-+
-+/**
-+ * kdbus_conn_lock2() - Lock two connections
-+ * @a:		connection A to lock or NULL
-+ * @b:		connection B to lock or NULL
-+ *
-+ * Lock two connections at once. As we need to have a stable locking order, we
-+ * always lock the connection with lower memory address first.
-+ */
-+static inline void kdbus_conn_lock2(struct kdbus_conn *a, struct kdbus_conn *b)
-+{
-+	if (a < b) {
-+		if (a)
-+			mutex_lock(&a->lock);
-+		if (b && b != a)
-+			mutex_lock_nested(&b->lock, !!a);
-+	} else {
-+		if (b)
-+			mutex_lock(&b->lock);
-+		if (a && a != b)
-+			mutex_lock_nested(&a->lock, !!b);
-+	}
-+}
-+
-+/**
-+ * kdbus_conn_unlock2() - Unlock two connections
-+ * @a:		connection A to unlock or NULL
-+ * @b:		connection B to unlock or NULL
-+ *
-+ * Unlock two connections at once. See kdbus_conn_lock2().
-+ */
-+static inline void kdbus_conn_unlock2(struct kdbus_conn *a,
-+				      struct kdbus_conn *b)
-+{
-+	if (a)
-+		mutex_unlock(&a->lock);
-+	if (b && b != a)
-+		mutex_unlock(&b->lock);
-+}
-+
-+/**
-+ * kdbus_conn_assert_active() - lockdep assert on active lock
-+ * @conn:	connection that shall be active
-+ *
-+ * This verifies via lockdep that the caller holds an active reference to the
-+ * given connection.
-+ */
-+static inline void kdbus_conn_assert_active(struct kdbus_conn *conn)
-+{
-+	lockdep_assert_held(conn);
-+}
-+
-+#endif
-diff --git a/ipc/kdbus/domain.c b/ipc/kdbus/domain.c
-new file mode 100644
-index 0000000..ac9f760
---- /dev/null
-+++ b/ipc/kdbus/domain.c
-@@ -0,0 +1,296 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/fs.h>
-+#include <linux/idr.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+
-+#include "bus.h"
-+#include "domain.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "limits.h"
-+#include "util.h"
-+
-+static void kdbus_domain_control_free(struct kdbus_node *node)
-+{
-+	kfree(node);
-+}
-+
-+static struct kdbus_node *kdbus_domain_control_new(struct kdbus_domain *domain,
-+						   unsigned int access)
-+{
-+	struct kdbus_node *node;
-+	int ret;
-+
-+	node = kzalloc(sizeof(*node), GFP_KERNEL);
-+	if (!node)
-+		return ERR_PTR(-ENOMEM);
-+
-+	kdbus_node_init(node, KDBUS_NODE_CONTROL);
-+
-+	node->free_cb = kdbus_domain_control_free;
-+	node->mode = domain->node.mode;
-+	node->mode = S_IRUSR | S_IWUSR;
-+	if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
-+		node->mode |= S_IRGRP | S_IWGRP;
-+	if (access & KDBUS_MAKE_ACCESS_WORLD)
-+		node->mode |= S_IROTH | S_IWOTH;
-+
-+	ret = kdbus_node_link(node, &domain->node, "control");
-+	if (ret < 0)
-+		goto exit_free;
-+
-+	return node;
-+
-+exit_free:
-+	kdbus_node_deactivate(node);
-+	kdbus_node_unref(node);
-+	return ERR_PTR(ret);
-+}
-+
-+static void kdbus_domain_free(struct kdbus_node *node)
-+{
-+	struct kdbus_domain *domain =
-+		container_of(node, struct kdbus_domain, node);
-+
-+	put_user_ns(domain->user_namespace);
-+	ida_destroy(&domain->user_ida);
-+	idr_destroy(&domain->user_idr);
-+	kfree(domain);
-+}
-+
-+/**
-+ * kdbus_domain_new() - create a new domain
-+ * @access:		The access mode for this node (KDBUS_MAKE_ACCESS_*)
-+ *
-+ * Return: a new kdbus_domain on success, ERR_PTR on failure
-+ */
-+struct kdbus_domain *kdbus_domain_new(unsigned int access)
-+{
-+	struct kdbus_domain *d;
-+	int ret;
-+
-+	d = kzalloc(sizeof(*d), GFP_KERNEL);
-+	if (!d)
-+		return ERR_PTR(-ENOMEM);
-+
-+	kdbus_node_init(&d->node, KDBUS_NODE_DOMAIN);
-+
-+	d->node.free_cb = kdbus_domain_free;
-+	d->node.mode = S_IRUSR | S_IXUSR;
-+	if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
-+		d->node.mode |= S_IRGRP | S_IXGRP;
-+	if (access & KDBUS_MAKE_ACCESS_WORLD)
-+		d->node.mode |= S_IROTH | S_IXOTH;
-+
-+	mutex_init(&d->lock);
-+	idr_init(&d->user_idr);
-+	ida_init(&d->user_ida);
-+
-+	/* Pin user namespace so we can guarantee domain-unique bus names. */
-+	d->user_namespace = get_user_ns(current_user_ns());
-+
-+	ret = kdbus_node_link(&d->node, NULL, NULL);
-+	if (ret < 0)
-+		goto exit_unref;
-+
-+	return d;
-+
-+exit_unref:
-+	kdbus_node_deactivate(&d->node);
-+	kdbus_node_unref(&d->node);
-+	return ERR_PTR(ret);
-+}
-+
-+/**
-+ * kdbus_domain_ref() - take a domain reference
-+ * @domain:		Domain
-+ *
-+ * Return: the domain itself
-+ */
-+struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain)
-+{
-+	if (domain)
-+		kdbus_node_ref(&domain->node);
-+	return domain;
-+}
-+
-+/**
-+ * kdbus_domain_unref() - drop a domain reference
-+ * @domain:		Domain
-+ *
-+ * When the last reference is dropped, the domain internal structure
-+ * is freed.
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain)
-+{
-+	if (domain)
-+		kdbus_node_unref(&domain->node);
-+	return NULL;
-+}
-+
-+/**
-+ * kdbus_domain_populate() - populate static domain nodes
-+ * @domain:	domain to populate
-+ * @access:	KDBUS_MAKE_ACCESS_* access restrictions for new nodes
-+ *
-+ * Allocate and activate static sub-nodes of the given domain. This will fail if
-+ * you call it on a non-active node or if the domain was already populated.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_domain_populate(struct kdbus_domain *domain, unsigned int access)
-+{
-+	struct kdbus_node *control;
-+
-+	/*
-+	 * Create a control-node for this domain. We drop our own reference
-+	 * immediately, effectively causing the node to be deactivated and
-+	 * released when the parent domain is.
-+	 */
-+	control = kdbus_domain_control_new(domain, access);
-+	if (IS_ERR(control))
-+		return PTR_ERR(control);
-+
-+	kdbus_node_activate(control);
-+	kdbus_node_unref(control);
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_user_lookup() - lookup a kdbus_user object
-+ * @domain:		domain of the user
-+ * @uid:		uid of the user; INVALID_UID for an anon user
-+ *
-+ * Lookup the kdbus user accounting object for the given domain. If INVALID_UID
-+ * is passed, a new anonymous user is created which is private to the caller.
-+ *
-+ * Return: The user object is returned, ERR_PTR on failure.
-+ */
-+struct kdbus_user *kdbus_user_lookup(struct kdbus_domain *domain, kuid_t uid)
-+{
-+	struct kdbus_user *u = NULL, *old = NULL;
-+	int ret;
-+
-+	mutex_lock(&domain->lock);
-+
-+	if (uid_valid(uid)) {
-+		old = idr_find(&domain->user_idr, __kuid_val(uid));
-+		/*
-+		 * If the object is about to be destroyed, ignore it and
-+		 * replace the slot in the IDR later on.
-+		 */
-+		if (old && kref_get_unless_zero(&old->kref)) {
-+			mutex_unlock(&domain->lock);
-+			return old;
-+		}
-+	}
-+
-+	u = kzalloc(sizeof(*u), GFP_KERNEL);
-+	if (!u) {
-+		ret = -ENOMEM;
-+		goto exit;
-+	}
-+
-+	kref_init(&u->kref);
-+	u->domain = kdbus_domain_ref(domain);
-+	u->uid = uid;
-+	atomic_set(&u->buses, 0);
-+	atomic_set(&u->connections, 0);
-+
-+	if (uid_valid(uid)) {
-+		if (old) {
-+			idr_replace(&domain->user_idr, u, __kuid_val(uid));
-+			old->uid = INVALID_UID; /* mark old as removed */
-+		} else {
-+			ret = idr_alloc(&domain->user_idr, u, __kuid_val(uid),
-+					__kuid_val(uid) + 1, GFP_KERNEL);
-+			if (ret < 0)
-+				goto exit;
-+		}
-+	}
-+
-+	/*
-+	 * Allocate the smallest possible index for this user; used
-+	 * in arrays for accounting user quota in receiver queues.
-+	 */
-+	ret = ida_simple_get(&domain->user_ida, 1, 0, GFP_KERNEL);
-+	if (ret < 0)
-+		goto exit;
-+
-+	u->id = ret;
-+	mutex_unlock(&domain->lock);
-+	return u;
-+
-+exit:
-+	if (u) {
-+		if (uid_valid(u->uid))
-+			idr_remove(&domain->user_idr, __kuid_val(u->uid));
-+		kdbus_domain_unref(u->domain);
-+		kfree(u);
-+	}
-+	mutex_unlock(&domain->lock);
-+	return ERR_PTR(ret);
-+}
-+
-+static void __kdbus_user_free(struct kref *kref)
-+{
-+	struct kdbus_user *user = container_of(kref, struct kdbus_user, kref);
-+
-+	WARN_ON(atomic_read(&user->buses) > 0);
-+	WARN_ON(atomic_read(&user->connections) > 0);
-+
-+	mutex_lock(&user->domain->lock);
-+	ida_simple_remove(&user->domain->user_ida, user->id);
-+	if (uid_valid(user->uid))
-+		idr_remove(&user->domain->user_idr, __kuid_val(user->uid));
-+	mutex_unlock(&user->domain->lock);
-+
-+	kdbus_domain_unref(user->domain);
-+	kfree(user);
-+}
-+
-+/**
-+ * kdbus_user_ref() - take a user reference
-+ * @u:		User
-+ *
-+ * Return: @u is returned
-+ */
-+struct kdbus_user *kdbus_user_ref(struct kdbus_user *u)
-+{
-+	if (u)
-+		kref_get(&u->kref);
-+	return u;
-+}
-+
-+/**
-+ * kdbus_user_unref() - drop a user reference
-+ * @u:		User
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_user *kdbus_user_unref(struct kdbus_user *u)
-+{
-+	if (u)
-+		kref_put(&u->kref, __kdbus_user_free);
-+	return NULL;
-+}
-diff --git a/ipc/kdbus/domain.h b/ipc/kdbus/domain.h
-new file mode 100644
-index 0000000..447a2bd
---- /dev/null
-+++ b/ipc/kdbus/domain.h
-@@ -0,0 +1,77 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_DOMAIN_H
-+#define __KDBUS_DOMAIN_H
-+
-+#include <linux/fs.h>
-+#include <linux/idr.h>
-+#include <linux/kref.h>
-+#include <linux/user_namespace.h>
-+
-+#include "node.h"
-+
-+/**
-+ * struct kdbus_domain - domain for buses
-+ * @node:		Underlying API node
-+ * @lock:		Domain data lock
-+ * @last_id:		Last used object id
-+ * @user_idr:		Set of all users indexed by UID
-+ * @user_ida:		Set of all users to compute small indices
-+ * @user_namespace:	User namespace, pinned at creation time
-+ * @dentry:		Root dentry of VFS mount (don't use outside of kdbusfs)
-+ */
-+struct kdbus_domain {
-+	struct kdbus_node node;
-+	struct mutex lock;
-+	atomic64_t last_id;
-+	struct idr user_idr;
-+	struct ida user_ida;
-+	struct user_namespace *user_namespace;
-+	struct dentry *dentry;
-+};
-+
-+/**
-+ * struct kdbus_user - resource accounting for users
-+ * @kref:		Reference counter
-+ * @domain:		Domain of the user
-+ * @id:			Index of this user
-+ * @uid:		UID of the user
-+ * @buses:		Number of buses the user has created
-+ * @connections:	Number of connections the user has created
-+ */
-+struct kdbus_user {
-+	struct kref kref;
-+	struct kdbus_domain *domain;
-+	unsigned int id;
-+	kuid_t uid;
-+	atomic_t buses;
-+	atomic_t connections;
-+};
-+
-+#define kdbus_domain_from_node(_node) \
-+	container_of((_node), struct kdbus_domain, node)
-+
-+struct kdbus_domain *kdbus_domain_new(unsigned int access);
-+struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain);
-+struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain);
-+int kdbus_domain_populate(struct kdbus_domain *domain, unsigned int access);
-+
-+#define KDBUS_USER_KERNEL_ID 0 /* ID 0 is reserved for kernel accounting */
-+
-+struct kdbus_user *kdbus_user_lookup(struct kdbus_domain *domain, kuid_t uid);
-+struct kdbus_user *kdbus_user_ref(struct kdbus_user *u);
-+struct kdbus_user *kdbus_user_unref(struct kdbus_user *u);
-+
-+#endif
-diff --git a/ipc/kdbus/endpoint.c b/ipc/kdbus/endpoint.c
-new file mode 100644
-index 0000000..44e7a20
---- /dev/null
-+++ b/ipc/kdbus/endpoint.c
-@@ -0,0 +1,303 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/fs.h>
-+#include <linux/idr.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <linux/uio.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "domain.h"
-+#include "endpoint.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "message.h"
-+#include "policy.h"
-+
-+static void kdbus_ep_free(struct kdbus_node *node)
-+{
-+	struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
-+
-+	WARN_ON(!list_empty(&ep->conn_list));
-+
-+	kdbus_policy_db_clear(&ep->policy_db);
-+	kdbus_bus_unref(ep->bus);
-+	kdbus_user_unref(ep->user);
-+	kfree(ep);
-+}
-+
-+static void kdbus_ep_release(struct kdbus_node *node, bool was_active)
-+{
-+	struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
-+
-+	/* disconnect all connections to this endpoint */
-+	for (;;) {
-+		struct kdbus_conn *conn;
-+
-+		mutex_lock(&ep->lock);
-+		conn = list_first_entry_or_null(&ep->conn_list,
-+						struct kdbus_conn,
-+						ep_entry);
-+		if (!conn) {
-+			mutex_unlock(&ep->lock);
-+			break;
-+		}
-+
-+		/* take reference, release lock, disconnect without lock */
-+		kdbus_conn_ref(conn);
-+		mutex_unlock(&ep->lock);
-+
-+		kdbus_conn_disconnect(conn, false);
-+		kdbus_conn_unref(conn);
-+	}
-+}
-+
-+/**
-+ * kdbus_ep_new() - create a new endpoint
-+ * @bus:		The bus this endpoint will be created for
-+ * @name:		The name of the endpoint
-+ * @access:		The access flags for this node (KDBUS_MAKE_ACCESS_*)
-+ * @uid:		The uid of the node
-+ * @gid:		The gid of the node
-+ * @is_custom:		Whether this is a custom endpoint
-+ *
-+ * This function will create a new endpoint with the given
-+ * name and properties for a given bus.
-+ *
-+ * Return: a new kdbus_ep on success, ERR_PTR on failure.
-+ */
-+struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
-+			      unsigned int access, kuid_t uid, kgid_t gid,
-+			      bool is_custom)
-+{
-+	struct kdbus_ep *e;
-+	int ret;
-+
-+	/*
-+	 * Validate only custom endpoint names; default endpoints
-+	 * with a "bus" name are created when the bus is created.
-+	 */
-+	if (is_custom) {
-+		ret = kdbus_verify_uid_prefix(name, bus->domain->user_namespace,
-+					      uid);
-+		if (ret < 0)
-+			return ERR_PTR(ret);
-+	}
-+
-+	e = kzalloc(sizeof(*e), GFP_KERNEL);
-+	if (!e)
-+		return ERR_PTR(-ENOMEM);
-+
-+	kdbus_node_init(&e->node, KDBUS_NODE_ENDPOINT);
-+
-+	e->node.free_cb = kdbus_ep_free;
-+	e->node.release_cb = kdbus_ep_release;
-+	e->node.uid = uid;
-+	e->node.gid = gid;
-+	e->node.mode = S_IRUSR | S_IWUSR;
-+	if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
-+		e->node.mode |= S_IRGRP | S_IWGRP;
-+	if (access & KDBUS_MAKE_ACCESS_WORLD)
-+		e->node.mode |= S_IROTH | S_IWOTH;
-+
-+	mutex_init(&e->lock);
-+	INIT_LIST_HEAD(&e->conn_list);
-+	kdbus_policy_db_init(&e->policy_db);
-+	e->bus = kdbus_bus_ref(bus);
-+
-+	ret = kdbus_node_link(&e->node, &bus->node, name);
-+	if (ret < 0)
-+		goto exit_unref;
-+
-+	/*
-+	 * Transactions on custom endpoints are never accounted on the global
-+	 * user limits. Instead, for each custom endpoint, we create a custom,
-+	 * unique user, which all transactions are accounted on. Regardless of
-+	 * the user using that endpoint, it is always accounted on the same
-+	 * user-object. This budget is not shared with ordinary users on
-+	 * non-custom endpoints.
-+	 */
-+	if (is_custom) {
-+		e->user = kdbus_user_lookup(bus->domain, INVALID_UID);
-+		if (IS_ERR(e->user)) {
-+			ret = PTR_ERR(e->user);
-+			e->user = NULL;
-+			goto exit_unref;
-+		}
-+	}
-+
-+	return e;
-+
-+exit_unref:
-+	kdbus_node_deactivate(&e->node);
-+	kdbus_node_unref(&e->node);
-+	return ERR_PTR(ret);
-+}
-+
-+/**
-+ * kdbus_ep_ref() - increase the reference counter of a kdbus_ep
-+ * @ep:			The endpoint to reference
-+ *
-+ * Every user of an endpoint, except for its creator, must add a reference to
-+ * the kdbus_ep instance using this function.
-+ *
-+ * Return: the ep itself
-+ */
-+struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep)
-+{
-+	if (ep)
-+		kdbus_node_ref(&ep->node);
-+	return ep;
-+}
-+
-+/**
-+ * kdbus_ep_unref() - decrease the reference counter of a kdbus_ep
-+ * @ep:		The ep to unref
-+ *
-+ * Release a reference. If the reference count drops to 0, the ep will be
-+ * freed.
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep)
-+{
-+	if (ep)
-+		kdbus_node_unref(&ep->node);
-+	return NULL;
-+}
-+
-+/**
-+ * kdbus_ep_is_privileged() - check whether a file is privileged
-+ * @ep:		endpoint to operate on
-+ * @file:	file to test
-+ *
-+ * Return: True if @file is privileged in the domain of @ep.
-+ */
-+bool kdbus_ep_is_privileged(struct kdbus_ep *ep, struct file *file)
-+{
-+	return !ep->user &&
-+		file_ns_capable(file, ep->bus->domain->user_namespace,
-+				CAP_IPC_OWNER);
-+}
-+
-+/**
-+ * kdbus_ep_is_owner() - check whether a file should be treated as bus owner
-+ * @ep:		endpoint to operate on
-+ * @file:	file to test
-+ *
-+ * Return: True if @file should be treated as bus owner on @ep
-+ */
-+bool kdbus_ep_is_owner(struct kdbus_ep *ep, struct file *file)
-+{
-+	return !ep->user &&
-+		(uid_eq(file->f_cred->euid, ep->bus->node.uid) ||
-+		 kdbus_ep_is_privileged(ep, file));
-+}
-+
-+/**
-+ * kdbus_cmd_ep_make() - handle KDBUS_CMD_ENDPOINT_MAKE
-+ * @bus:		bus to operate on
-+ * @argp:		command payload
-+ *
-+ * Return: NULL or newly created endpoint on success, ERR_PTR on failure.
-+ */
-+struct kdbus_ep *kdbus_cmd_ep_make(struct kdbus_bus *bus, void __user *argp)
-+{
-+	const char *item_make_name;
-+	struct kdbus_ep *ep = NULL;
-+	struct kdbus_cmd *cmd;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+		{ .type = KDBUS_ITEM_MAKE_NAME, .mandatory = true },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+				 KDBUS_MAKE_ACCESS_GROUP |
-+				 KDBUS_MAKE_ACCESS_WORLD,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret < 0)
-+		return ERR_PTR(ret);
-+	if (ret > 0)
-+		return NULL;
-+
-+	item_make_name = argv[1].item->str;
-+
-+	ep = kdbus_ep_new(bus, item_make_name, cmd->flags,
-+			  current_euid(), current_egid(), true);
-+	if (IS_ERR(ep)) {
-+		ret = PTR_ERR(ep);
-+		ep = NULL;
-+		goto exit;
-+	}
-+
-+	if (!kdbus_node_activate(&ep->node)) {
-+		ret = -ESHUTDOWN;
-+		goto exit;
-+	}
-+
-+exit:
-+	ret = kdbus_args_clear(&args, ret);
-+	if (ret < 0) {
-+		if (ep) {
-+			kdbus_node_deactivate(&ep->node);
-+			kdbus_ep_unref(ep);
-+		}
-+		return ERR_PTR(ret);
-+	}
-+	return ep;
-+}
-+
-+/**
-+ * kdbus_cmd_ep_update() - handle KDBUS_CMD_ENDPOINT_UPDATE
-+ * @ep:			endpoint to operate on
-+ * @argp:		command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_ep_update(struct kdbus_ep *ep, void __user *argp)
-+{
-+	struct kdbus_cmd *cmd;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+		{ .type = KDBUS_ITEM_NAME, .multiple = true },
-+		{ .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret != 0)
-+		return ret;
-+
-+	ret = kdbus_policy_set(&ep->policy_db, args.items, args.items_size,
-+			       0, true, ep);
-+	return kdbus_args_clear(&args, ret);
-+}
-diff --git a/ipc/kdbus/endpoint.h b/ipc/kdbus/endpoint.h
-new file mode 100644
-index 0000000..e0da59f
---- /dev/null
-+++ b/ipc/kdbus/endpoint.h
-@@ -0,0 +1,70 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_ENDPOINT_H
-+#define __KDBUS_ENDPOINT_H
-+
-+#include <linux/list.h>
-+#include <linux/mutex.h>
-+#include <linux/uidgid.h>
-+#include "node.h"
-+#include "policy.h"
-+
-+struct kdbus_bus;
-+struct kdbus_user;
-+
-+/**
-+ * struct kdbus_ep - endpoint to access a bus
-+ * @node:		The kdbus node
-+ * @lock:		Endpoint data lock
-+ * @bus:		Bus behind this endpoint
-+ * @user:		Custom endpoints account against an anonymous user
-+ * @policy_db:		Uploaded policy
-+ * @conn_list:		Connections of this endpoint
-+ *
-+ * An endpoint offers access to a bus; the default endpoint node name is "bus".
-+ * Additional custom endpoints to the same bus can be created and they can
-+ * carry their own policies/filters.
-+ */
-+struct kdbus_ep {
-+	struct kdbus_node node;
-+	struct mutex lock;
-+
-+	/* static */
-+	struct kdbus_bus *bus;
-+	struct kdbus_user *user;
-+
-+	/* protected by own locks */
-+	struct kdbus_policy_db policy_db;
-+
-+	/* protected by ep->lock */
-+	struct list_head conn_list;
-+};
-+
-+#define kdbus_ep_from_node(_node) \
-+	container_of((_node), struct kdbus_ep, node)
-+
-+struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
-+			      unsigned int access, kuid_t uid, kgid_t gid,
-+			      bool is_custom);
-+struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep);
-+struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep);
-+
-+bool kdbus_ep_is_privileged(struct kdbus_ep *ep, struct file *file);
-+bool kdbus_ep_is_owner(struct kdbus_ep *ep, struct file *file);
-+
-+struct kdbus_ep *kdbus_cmd_ep_make(struct kdbus_bus *bus, void __user *argp);
-+int kdbus_cmd_ep_update(struct kdbus_ep *ep, void __user *argp);
-+
-+#endif
-diff --git a/ipc/kdbus/fs.c b/ipc/kdbus/fs.c
-new file mode 100644
-index 0000000..09c4809
---- /dev/null
-+++ b/ipc/kdbus/fs.c
-@@ -0,0 +1,508 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/dcache.h>
-+#include <linux/fs.h>
-+#include <linux/fsnotify.h>
-+#include <linux/init.h>
-+#include <linux/ipc_namespace.h>
-+#include <linux/magic.h>
-+#include <linux/module.h>
-+#include <linux/mount.h>
-+#include <linux/mutex.h>
-+#include <linux/namei.h>
-+#include <linux/pagemap.h>
-+#include <linux/sched.h>
-+#include <linux/slab.h>
-+
-+#include "bus.h"
-+#include "domain.h"
-+#include "endpoint.h"
-+#include "fs.h"
-+#include "handle.h"
-+#include "node.h"
-+
-+#define kdbus_node_from_dentry(_dentry) \
-+	((struct kdbus_node *)(_dentry)->d_fsdata)
-+
-+static struct inode *fs_inode_get(struct super_block *sb,
-+				  struct kdbus_node *node);
-+
-+/*
-+ * Directory Management
-+ */
-+
-+static inline unsigned char kdbus_dt_type(struct kdbus_node *node)
-+{
-+	switch (node->type) {
-+	case KDBUS_NODE_DOMAIN:
-+	case KDBUS_NODE_BUS:
-+		return DT_DIR;
-+	case KDBUS_NODE_CONTROL:
-+	case KDBUS_NODE_ENDPOINT:
-+		return DT_REG;
-+	}
-+
-+	return DT_UNKNOWN;
-+}
-+
-+static int fs_dir_fop_iterate(struct file *file, struct dir_context *ctx)
-+{
-+	struct dentry *dentry = file->f_path.dentry;
-+	struct kdbus_node *parent = kdbus_node_from_dentry(dentry);
-+	struct kdbus_node *old, *next = file->private_data;
-+
-+	/*
-+	 * kdbusfs directory iterator (modelled after sysfs/kernfs)
-+	 * When iterating kdbusfs directories, we iterate all children of the
-+	 * parent kdbus_node object. We use ctx->pos to store the hash of the
-+	 * child and file->private_data to store a reference to the next node
-+	 * object. If ctx->pos is not modified via llseek while you iterate a
-+	 * directory, then we use the file->private_data node pointer to
-+	 * directly access the next node in the tree.
-+	 * However, if you directly seek on the directory, we have to find the
-+	 * closest node to that position and cannot use our node pointer. This
-+	 * means iterating the rb-tree to find the closest match and start over
-+	 * from there.
-+	 * Note that hash values are not necessarily unique. Therefore, llseek
-+	 * is not guaranteed to seek to the same node that you got when you
-+	 * retrieved the position. Seeking to 0, 1, 2 and >=INT_MAX is safe,
-+	 * though. We could use the inode-number as position, but this would
-+	 * require another rb-tree for fast access. Kernfs and others already
-+	 * ignore those conflicts, so we should be fine, too.
-+	 */
-+
-+	if (!dir_emit_dots(file, ctx))
-+		return 0;
-+
-+	/* acquire @next; if deactivated, or seek detected, find next node */
-+	old = next;
-+	if (next && ctx->pos == next->hash) {
-+		if (kdbus_node_acquire(next))
-+			kdbus_node_ref(next);
-+		else
-+			next = kdbus_node_next_child(parent, next);
-+	} else {
-+		next = kdbus_node_find_closest(parent, ctx->pos);
-+	}
-+	kdbus_node_unref(old);
-+
-+	while (next) {
-+		/* emit @next */
-+		file->private_data = next;
-+		ctx->pos = next->hash;
-+
-+		kdbus_node_release(next);
-+
-+		if (!dir_emit(ctx, next->name, strlen(next->name), next->id,
-+			      kdbus_dt_type(next)))
-+			return 0;
-+
-+		/* find next node after @next */
-+		old = next;
-+		next = kdbus_node_next_child(parent, next);
-+		kdbus_node_unref(old);
-+	}
-+
-+	file->private_data = NULL;
-+	ctx->pos = INT_MAX;
-+
-+	return 0;
-+}
-+
-+static loff_t fs_dir_fop_llseek(struct file *file, loff_t offset, int whence)
-+{
-+	struct inode *inode = file_inode(file);
-+	loff_t ret;
-+
-+	/* protect f_pos against fop_iterate */
-+	mutex_lock(&inode->i_mutex);
-+	ret = generic_file_llseek(file, offset, whence);
-+	mutex_unlock(&inode->i_mutex);
-+
-+	return ret;
-+}
-+
-+static int fs_dir_fop_release(struct inode *inode, struct file *file)
-+{
-+	kdbus_node_unref(file->private_data);
-+	return 0;
-+}
-+
-+static const struct file_operations fs_dir_fops = {
-+	.read		= generic_read_dir,
-+	.iterate	= fs_dir_fop_iterate,
-+	.llseek		= fs_dir_fop_llseek,
-+	.release	= fs_dir_fop_release,
-+};
-+
-+static struct dentry *fs_dir_iop_lookup(struct inode *dir,
-+					struct dentry *dentry,
-+					unsigned int flags)
-+{
-+	struct dentry *dnew = NULL;
-+	struct kdbus_node *parent;
-+	struct kdbus_node *node;
-+	struct inode *inode;
-+
-+	parent = kdbus_node_from_dentry(dentry->d_parent);
-+	if (!kdbus_node_acquire(parent))
-+		return NULL;
-+
-+	/* returns reference to _acquired_ child node */
-+	node = kdbus_node_find_child(parent, dentry->d_name.name);
-+	if (node) {
-+		dentry->d_fsdata = node;
-+		inode = fs_inode_get(dir->i_sb, node);
-+		if (IS_ERR(inode))
-+			dnew = ERR_CAST(inode);
-+		else
-+			dnew = d_splice_alias(inode, dentry);
-+
-+		kdbus_node_release(node);
-+	}
-+
-+	kdbus_node_release(parent);
-+	return dnew;
-+}
-+
-+static const struct inode_operations fs_dir_iops = {
-+	.permission	= generic_permission,
-+	.lookup		= fs_dir_iop_lookup,
-+};
-+
-+/*
-+ * Inode Management
-+ */
-+
-+static const struct inode_operations fs_inode_iops = {
-+	.permission	= generic_permission,
-+};
-+
-+static struct inode *fs_inode_get(struct super_block *sb,
-+				  struct kdbus_node *node)
-+{
-+	struct inode *inode;
-+
-+	inode = iget_locked(sb, node->id);
-+	if (!inode)
-+		return ERR_PTR(-ENOMEM);
-+	if (!(inode->i_state & I_NEW))
-+		return inode;
-+
-+	inode->i_private = kdbus_node_ref(node);
-+	inode->i_mapping->a_ops = &empty_aops;
-+	inode->i_mode = node->mode & S_IALLUGO;
-+	inode->i_atime = inode->i_ctime = inode->i_mtime = CURRENT_TIME;
-+	inode->i_uid = node->uid;
-+	inode->i_gid = node->gid;
-+
-+	switch (node->type) {
-+	case KDBUS_NODE_DOMAIN:
-+	case KDBUS_NODE_BUS:
-+		inode->i_mode |= S_IFDIR;
-+		inode->i_op = &fs_dir_iops;
-+		inode->i_fop = &fs_dir_fops;
-+		set_nlink(inode, 2);
-+		break;
-+	case KDBUS_NODE_CONTROL:
-+	case KDBUS_NODE_ENDPOINT:
-+		inode->i_mode |= S_IFREG;
-+		inode->i_op = &fs_inode_iops;
-+		inode->i_fop = &kdbus_handle_ops;
-+		break;
-+	}
-+
-+	unlock_new_inode(inode);
-+
-+	return inode;
-+}
-+
-+/*
-+ * Superblock Management
-+ */
-+
-+static int fs_super_dop_revalidate(struct dentry *dentry, unsigned int flags)
-+{
-+	struct kdbus_node *node;
-+
-+	/* Force lookup on negatives */
-+	if (!dentry->d_inode)
-+		return 0;
-+
-+	node = kdbus_node_from_dentry(dentry);
-+
-+	/* see whether the node has been removed */
-+	if (!kdbus_node_is_active(node))
-+		return 0;
-+
-+	return 1;
-+}
-+
-+static void fs_super_dop_release(struct dentry *dentry)
-+{
-+	kdbus_node_unref(dentry->d_fsdata);
-+}
-+
-+static const struct dentry_operations fs_super_dops = {
-+	.d_revalidate	= fs_super_dop_revalidate,
-+	.d_release	= fs_super_dop_release,
-+};
-+
-+static void fs_super_sop_evict_inode(struct inode *inode)
-+{
-+	struct kdbus_node *node = kdbus_node_from_inode(inode);
-+
-+	truncate_inode_pages_final(&inode->i_data);
-+	clear_inode(inode);
-+	kdbus_node_unref(node);
-+}
-+
-+static const struct super_operations fs_super_sops = {
-+	.statfs		= simple_statfs,
-+	.drop_inode	= generic_delete_inode,
-+	.evict_inode	= fs_super_sop_evict_inode,
-+};
-+
-+static int fs_super_fill(struct super_block *sb)
-+{
-+	struct kdbus_domain *domain = sb->s_fs_info;
-+	struct inode *inode;
-+	int ret;
-+
-+	sb->s_blocksize = PAGE_CACHE_SIZE;
-+	sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
-+	sb->s_magic = KDBUS_SUPER_MAGIC;
-+	sb->s_maxbytes = MAX_LFS_FILESIZE;
-+	sb->s_op = &fs_super_sops;
-+	sb->s_time_gran = 1;
-+
-+	inode = fs_inode_get(sb, &domain->node);
-+	if (IS_ERR(inode))
-+		return PTR_ERR(inode);
-+
-+	sb->s_root = d_make_root(inode);
-+	if (!sb->s_root) {
-+		/* d_make_root iput()s the inode on failure */
-+		return -ENOMEM;
-+	}
-+
-+	/* sb holds domain reference */
-+	sb->s_root->d_fsdata = &domain->node;
-+	sb->s_d_op = &fs_super_dops;
-+
-+	/* sb holds root reference */
-+	domain->dentry = sb->s_root;
-+
-+	if (!kdbus_node_activate(&domain->node))
-+		return -ESHUTDOWN;
-+
-+	ret = kdbus_domain_populate(domain, KDBUS_MAKE_ACCESS_WORLD);
-+	if (ret < 0)
-+		return ret;
-+
-+	sb->s_flags |= MS_ACTIVE;
-+	return 0;
-+}
-+
-+static void fs_super_kill(struct super_block *sb)
-+{
-+	struct kdbus_domain *domain = sb->s_fs_info;
-+
-+	if (domain) {
-+		kdbus_node_deactivate(&domain->node);
-+		domain->dentry = NULL;
-+	}
-+
-+	kill_anon_super(sb);
-+	kdbus_domain_unref(domain);
-+}
-+
-+static int fs_super_set(struct super_block *sb, void *data)
-+{
-+	int ret;
-+
-+	ret = set_anon_super(sb, data);
-+	if (!ret)
-+		sb->s_fs_info = data;
-+
-+	return ret;
-+}
-+
-+static struct dentry *fs_super_mount(struct file_system_type *fs_type,
-+				     int flags, const char *dev_name,
-+				     void *data)
-+{
-+	struct kdbus_domain *domain;
-+	struct super_block *sb;
-+	int ret;
-+
-+	domain = kdbus_domain_new(KDBUS_MAKE_ACCESS_WORLD);
-+	if (IS_ERR(domain))
-+		return ERR_CAST(domain);
-+
-+	sb = sget(fs_type, NULL, fs_super_set, flags, domain);
-+	if (IS_ERR(sb)) {
-+		kdbus_node_deactivate(&domain->node);
-+		kdbus_domain_unref(domain);
-+		return ERR_CAST(sb);
-+	}
-+
-+	WARN_ON(sb->s_fs_info != domain);
-+	WARN_ON(sb->s_root);
-+
-+	ret = fs_super_fill(sb);
-+	if (ret < 0) {
-+		/* calls into ->kill_sb() when done */
-+		deactivate_locked_super(sb);
-+		return ERR_PTR(ret);
-+	}
-+
-+	return dget(sb->s_root);
-+}
-+
-+static struct file_system_type fs_type = {
-+	.name		= KBUILD_MODNAME "fs",
-+	.owner		= THIS_MODULE,
-+	.mount		= fs_super_mount,
-+	.kill_sb	= fs_super_kill,
-+	.fs_flags	= FS_USERNS_MOUNT,
-+};
-+
-+/**
-+ * kdbus_fs_init() - register kdbus filesystem
-+ *
-+ * This registers a filesystem with the VFS layer. The filesystem is called
-+ * `KBUILD_MODNAME "fs"', which usually resolves to `kdbusfs'. The naming
-+ * scheme allows you to set KBUILD_MODNAME to "kdbus2" and get an
-+ * independent filesystem for developers.
-+ *
-+ * Each mount of the kdbusfs filesystem has a kdbus_domain attached.
-+ * Operations on this mount will only affect the attached domain. On each mount
-+ * a new domain is automatically created and used for this mount exclusively.
-+ * If you want to share a domain across multiple mounts, you need to bind-mount
-+ * it.
-+ *
-+ * Mounts of kdbusfs (with a different domain each) are unrelated to each other
-+ * and will never have any effect on any domain but their own.
-+ *
-+ * Return: 0 on success, negative error otherwise.
-+ */
-+int kdbus_fs_init(void)
-+{
-+	return register_filesystem(&fs_type);
-+}
-+
-+/**
-+ * kdbus_fs_exit() - unregister kdbus filesystem
-+ *
-+ * This does the reverse of kdbus_fs_init(). It unregisters the kdbusfs
-+ * filesystem from VFS and cleans up any allocated resources.
-+ */
-+void kdbus_fs_exit(void)
-+{
-+	unregister_filesystem(&fs_type);
-+}
-+
-+/* acquire domain of @node, making sure all ancestors are active */
-+static struct kdbus_domain *fs_acquire_domain(struct kdbus_node *node)
-+{
-+	struct kdbus_domain *domain;
-+	struct kdbus_node *iter;
-+
-+	/* caller must guarantee that @node is linked */
-+	for (iter = node; iter->parent; iter = iter->parent)
-+		if (!kdbus_node_is_active(iter->parent))
-+			return NULL;
-+
-+	/* root nodes are always domains */
-+	if (WARN_ON(iter->type != KDBUS_NODE_DOMAIN))
-+		return NULL;
-+
-+	domain = kdbus_domain_from_node(iter);
-+	if (!kdbus_node_acquire(&domain->node))
-+		return NULL;
-+
-+	return domain;
-+}
-+
-+/**
-+ * kdbus_fs_flush() - flush dcache entries of a node
-+ * @node:		Node to flush entries of
-+ *
-+ * This flushes all VFS filesystem cache entries for a node and all its
-+ * children. This should be called whenever a node is destroyed during
-+ * runtime. It will flush the cache entries so the linked objects can be
-+ * deallocated.
-+ *
-+ * This is a no-op if you call it on active nodes (they really should stay in
-+ * cache) or on nodes with deactivated parents (flushing the parent is enough).
-+ * Furthermore, there is no need to call it on nodes whose lifetime is bound to
-+ * their parents'. In those cases, the parent-flush will always also flush the
-+ * children.
-+ */
-+void kdbus_fs_flush(struct kdbus_node *node)
-+{
-+	struct dentry *dentry, *parent_dentry = NULL;
-+	struct kdbus_domain *domain;
-+	struct qstr name;
-+
-+	/* active nodes should remain in cache */
-+	if (!kdbus_node_is_deactivated(node))
-+		return;
-+
-+	/* nodes that were never linked were never instantiated */
-+	if (!node->parent)
-+		return;
-+
-+	/* acquire domain and verify all ancestors are active */
-+	domain = fs_acquire_domain(node);
-+	if (!domain)
-+		return;
-+
-+	switch (node->type) {
-+	case KDBUS_NODE_ENDPOINT:
-+		if (WARN_ON(!node->parent || !node->parent->name))
-+			goto exit;
-+
-+		name.name = node->parent->name;
-+		name.len = strlen(node->parent->name);
-+		parent_dentry = d_hash_and_lookup(domain->dentry, &name);
-+		if (IS_ERR_OR_NULL(parent_dentry))
-+			goto exit;
-+
-+		/* fallthrough */
-+	case KDBUS_NODE_BUS:
-+		if (WARN_ON(!node->name))
-+			goto exit;
-+
-+		name.name = node->name;
-+		name.len = strlen(node->name);
-+		dentry = d_hash_and_lookup(parent_dentry ? : domain->dentry,
-+					   &name);
-+		if (!IS_ERR_OR_NULL(dentry)) {
-+			d_invalidate(dentry);
-+			dput(dentry);
-+		}
-+
-+		dput(parent_dentry);
-+		break;
-+
-+	default:
-+		/* all other types are bound to their parent lifetime */
-+		break;
-+	}
-+
-+exit:
-+	kdbus_node_release(&domain->node);
-+}
-diff --git a/ipc/kdbus/fs.h b/ipc/kdbus/fs.h
-new file mode 100644
-index 0000000..62f7d6a
---- /dev/null
-+++ b/ipc/kdbus/fs.h
-@@ -0,0 +1,28 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUSFS_H
-+#define __KDBUSFS_H
-+
-+#include <linux/kernel.h>
-+
-+struct kdbus_node;
-+
-+int kdbus_fs_init(void);
-+void kdbus_fs_exit(void);
-+void kdbus_fs_flush(struct kdbus_node *node);
-+
-+#define kdbus_node_from_inode(_inode) \
-+	((struct kdbus_node *)(_inode)->i_private)
-+
-+#endif
-diff --git a/ipc/kdbus/handle.c b/ipc/kdbus/handle.c
-new file mode 100644
-index 0000000..fc60932
---- /dev/null
-+++ b/ipc/kdbus/handle.c
-@@ -0,0 +1,691 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/file.h>
-+#include <linux/fs.h>
-+#include <linux/idr.h>
-+#include <linux/init.h>
-+#include <linux/kdev_t.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/poll.h>
-+#include <linux/rwsem.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <linux/syscalls.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "endpoint.h"
-+#include "fs.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "match.h"
-+#include "message.h"
-+#include "names.h"
-+#include "domain.h"
-+#include "policy.h"
-+
-+static int kdbus_args_verify(struct kdbus_args *args)
-+{
-+	struct kdbus_item *item;
-+	size_t i;
-+	int ret;
-+
-+	KDBUS_ITEMS_FOREACH(item, args->items, args->items_size) {
-+		struct kdbus_arg *arg = NULL;
-+
-+		if (!KDBUS_ITEM_VALID(item, args->items, args->items_size))
-+			return -EINVAL;
-+
-+		for (i = 0; i < args->argc; ++i)
-+			if (args->argv[i].type == item->type)
-+				break;
-+		if (i >= args->argc)
-+			return -EINVAL;
-+
-+		arg = &args->argv[i];
-+
-+		ret = kdbus_item_validate(item);
-+		if (ret < 0)
-+			return ret;
-+
-+		if (arg->item && !arg->multiple)
-+			return -EINVAL;
-+
-+		arg->item = item;
-+	}
-+
-+	if (!KDBUS_ITEMS_END(item, args->items, args->items_size))
-+		return -EINVAL;
-+
-+	return 0;
-+}
-+
-+static int kdbus_args_negotiate(struct kdbus_args *args)
-+{
-+	struct kdbus_item __user *user;
-+	struct kdbus_item *negotiation;
-+	size_t i, j, num;
-+
-+	/*
-+	 * If KDBUS_FLAG_NEGOTIATE is set, we overwrite the flags field with
-+	 * the set of supported flags. Furthermore, if a KDBUS_ITEM_NEGOTIATE
-+	 * item is passed, we iterate its payload (array of u64, each set to an
-+	 * item type) and clear all unsupported item-types to 0.
-+	 * The caller might do this recursively, if other flags or objects are
-+	 * embedded in the payload itself.
-+	 */
-+
-+	if (args->cmd->flags & KDBUS_FLAG_NEGOTIATE) {
-+		if (put_user(args->allowed_flags & ~KDBUS_FLAG_NEGOTIATE,
-+			     &args->user->flags))
-+			return -EFAULT;
-+	}
-+
-+	if (args->argc < 1 || args->argv[0].type != KDBUS_ITEM_NEGOTIATE ||
-+	    !args->argv[0].item)
-+		return 0;
-+
-+	negotiation = args->argv[0].item;
-+	user = (struct kdbus_item __user *)
-+		((u8 __user *)args->user +
-+		 ((u8 *)negotiation - (u8 *)args->cmd));
-+	num = KDBUS_ITEM_PAYLOAD_SIZE(negotiation) / sizeof(u64);
-+
-+	for (i = 0; i < num; ++i) {
-+		for (j = 0; j < args->argc; ++j)
-+			if (negotiation->data64[i] == args->argv[j].type)
-+				break;
-+
-+		if (j < args->argc)
-+			continue;
-+
-+		/* this item is not supported, clear it out */
-+		negotiation->data64[i] = 0;
-+		if (put_user(negotiation->data64[i], &user->data64[i]))
-+			return -EFAULT;
-+	}
-+
-+	return 0;
-+}
-+
-+/**
-+ * __kdbus_args_parse() - parse payload of kdbus command
-+ * @args:		object to parse data into
-+ * @is_cmd:		whether this is a command or msg payload
-+ * @argp:		user-space location of command payload to parse
-+ * @type_size:		overall size of command payload to parse
-+ * @items_offset:	offset of items array in command payload
-+ * @out:		output variable to store pointer to copied payload
-+ *
-+ * This parses the ioctl payload at user-space location @argp into @args. @args
-+ * must be pre-initialized by the caller to reflect the supported flags and
-+ * items of this command. This parser will then copy the command payload into
-+ * kernel-space, verify correctness and consistency and cache pointers to parsed
-+ * items and other data in @args.
-+ *
-+ * If this function succeeded, you must call kdbus_args_clear() to release
-+ * allocated resources before destroying @args.
-+ *
-+ * This can also be used to import kdbus_msg objects. In that case, @is_cmd must
-+ * be set to 'false' and the 'return_flags' field will not be touched (as it
-+ * doesn't exist on kdbus_msg).
-+ *
-+ * Return: On failure a negative error code is returned. Otherwise, 1 is
-+ * returned if negotiation was requested, 0 if not.
-+ */
-+int __kdbus_args_parse(struct kdbus_args *args, bool is_cmd, void __user *argp,
-+		       size_t type_size, size_t items_offset, void **out)
-+{
-+	u64 user_size;
-+	int ret, i;
-+
-+	ret = kdbus_copy_from_user(&user_size, argp, sizeof(user_size));
-+	if (ret < 0)
-+		return ret;
-+
-+	if (user_size < type_size)
-+		return -EINVAL;
-+	if (user_size > KDBUS_CMD_MAX_SIZE)
-+		return -EMSGSIZE;
-+
-+	if (user_size <= sizeof(args->cmd_buf)) {
-+		if (copy_from_user(args->cmd_buf, argp, user_size))
-+			return -EFAULT;
-+		args->cmd = (void*)args->cmd_buf;
-+	} else {
-+		args->cmd = memdup_user(argp, user_size);
-+		if (IS_ERR(args->cmd))
-+			return PTR_ERR(args->cmd);
-+	}
-+
-+	if (args->cmd->size != user_size) {
-+		ret = -EINVAL;
-+		goto error;
-+	}
-+
-+	if (is_cmd)
-+		args->cmd->return_flags = 0;
-+	args->user = argp;
-+	args->items = (void *)((u8 *)args->cmd + items_offset);
-+	args->items_size = args->cmd->size - items_offset;
-+	args->is_cmd = is_cmd;
-+
-+	if (args->cmd->flags & ~args->allowed_flags) {
-+		ret = -EINVAL;
-+		goto error;
-+	}
-+
-+	ret = kdbus_args_verify(args);
-+	if (ret < 0)
-+		goto error;
-+
-+	ret = kdbus_args_negotiate(args);
-+	if (ret < 0)
-+		goto error;
-+
-+	/* mandatory items must be given (but not on negotiation) */
-+	if (!(args->cmd->flags & KDBUS_FLAG_NEGOTIATE)) {
-+		for (i = 0; i < args->argc; ++i)
-+			if (args->argv[i].mandatory && !args->argv[i].item) {
-+				ret = -EINVAL;
-+				goto error;
-+			}
-+	}
-+
-+	*out = args->cmd;
-+	return !!(args->cmd->flags & KDBUS_FLAG_NEGOTIATE);
-+
-+error:
-+	return kdbus_args_clear(args, ret);
-+}
-+
-+/**
-+ * kdbus_args_clear() - release allocated command resources
-+ * @args:	object to release resources of
-+ * @ret:	return value of this command
-+ *
-+ * This frees all allocated resources on @args and copies the command result
-+ * flags into user-space. @ret is usually returned unchanged by this function,
-+ * so it can be used in the final 'return' statement of the command handler.
-+ *
-+ * Return: -EFAULT if return values cannot be copied into user-space, otherwise
-+ *         @ret is returned unchanged.
-+ */
-+int kdbus_args_clear(struct kdbus_args *args, int ret)
-+{
-+	if (!args)
-+		return ret;
-+
-+	if (!IS_ERR_OR_NULL(args->cmd)) {
-+		if (args->is_cmd && put_user(args->cmd->return_flags,
-+					     &args->user->return_flags))
-+			ret = -EFAULT;
-+		if (args->cmd != (void*)args->cmd_buf)
-+			kfree(args->cmd);
-+		args->cmd = NULL;
-+	}
-+
-+	return ret;
-+}
-+
-+/**
-+ * enum kdbus_handle_type - type a handle can be of
-+ * @KDBUS_HANDLE_NONE:		no type set, yet
-+ * @KDBUS_HANDLE_BUS_OWNER:	bus owner
-+ * @KDBUS_HANDLE_EP_OWNER:	endpoint owner
-+ * @KDBUS_HANDLE_CONNECTED:	endpoint connection after HELLO
-+ */
-+enum kdbus_handle_type {
-+	KDBUS_HANDLE_NONE,
-+	KDBUS_HANDLE_BUS_OWNER,
-+	KDBUS_HANDLE_EP_OWNER,
-+	KDBUS_HANDLE_CONNECTED,
-+};
-+
-+/**
-+ * struct kdbus_handle - handle to the kdbus system
-+ * @lock:		handle lock
-+ * @type:		type of this handle (KDBUS_HANDLE_*)
-+ * @bus_owner:		bus this handle owns
-+ * @ep_owner:		endpoint this handle owns
-+ * @conn:		connection this handle owns
-+ */
-+struct kdbus_handle {
-+	struct mutex lock;
-+
-+	enum kdbus_handle_type type;
-+	union {
-+		struct kdbus_bus *bus_owner;
-+		struct kdbus_ep *ep_owner;
-+		struct kdbus_conn *conn;
-+	};
-+};
-+
-+static int kdbus_handle_open(struct inode *inode, struct file *file)
-+{
-+	struct kdbus_handle *handle;
-+	struct kdbus_node *node;
-+	int ret;
-+
-+	node = kdbus_node_from_inode(inode);
-+	if (!kdbus_node_acquire(node))
-+		return -ESHUTDOWN;
-+
-+	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
-+	if (!handle) {
-+		ret = -ENOMEM;
-+		goto exit;
-+	}
-+
-+	mutex_init(&handle->lock);
-+	handle->type = KDBUS_HANDLE_NONE;
-+
-+	file->private_data = handle;
-+	ret = 0;
-+
-+exit:
-+	kdbus_node_release(node);
-+	return ret;
-+}
-+
-+static int kdbus_handle_release(struct inode *inode, struct file *file)
-+{
-+	struct kdbus_handle *handle = file->private_data;
-+
-+	switch (handle->type) {
-+	case KDBUS_HANDLE_BUS_OWNER:
-+		if (handle->bus_owner) {
-+			kdbus_node_deactivate(&handle->bus_owner->node);
-+			kdbus_bus_unref(handle->bus_owner);
-+		}
-+		break;
-+	case KDBUS_HANDLE_EP_OWNER:
-+		if (handle->ep_owner) {
-+			kdbus_node_deactivate(&handle->ep_owner->node);
-+			kdbus_ep_unref(handle->ep_owner);
-+		}
-+		break;
-+	case KDBUS_HANDLE_CONNECTED:
-+		kdbus_conn_disconnect(handle->conn, false);
-+		kdbus_conn_unref(handle->conn);
-+		break;
-+	case KDBUS_HANDLE_NONE:
-+		/* nothing to clean up */
-+		break;
-+	}
-+
-+	kfree(handle);
-+
-+	return 0;
-+}
-+
-+static long kdbus_handle_ioctl_control(struct file *file, unsigned int cmd,
-+				       void __user *argp)
-+{
-+	struct kdbus_handle *handle = file->private_data;
-+	struct kdbus_node *node = file_inode(file)->i_private;
-+	struct kdbus_domain *domain;
-+	int ret = 0;
-+
-+	if (!kdbus_node_acquire(node))
-+		return -ESHUTDOWN;
-+
-+	/*
-+	 * The parent of control-nodes is always a domain, make sure to pin it
-+	 * so the parent is actually valid.
-+	 */
-+	domain = kdbus_domain_from_node(node->parent);
-+	if (!kdbus_node_acquire(&domain->node)) {
-+		kdbus_node_release(node);
-+		return -ESHUTDOWN;
-+	}
-+
-+	switch (cmd) {
-+	case KDBUS_CMD_BUS_MAKE: {
-+		struct kdbus_bus *bus;
-+
-+		bus = kdbus_cmd_bus_make(domain, argp);
-+		if (IS_ERR_OR_NULL(bus)) {
-+			ret = PTR_ERR_OR_ZERO(bus);
-+			break;
-+		}
-+
-+		handle->bus_owner = bus;
-+		ret = KDBUS_HANDLE_BUS_OWNER;
-+		break;
-+	}
-+
-+	default:
-+		ret = -EBADFD;
-+		break;
-+	}
-+
-+	kdbus_node_release(&domain->node);
-+	kdbus_node_release(node);
-+	return ret;
-+}
-+
-+static long kdbus_handle_ioctl_ep(struct file *file, unsigned int cmd,
-+				  void __user *buf)
-+{
-+	struct kdbus_handle *handle = file->private_data;
-+	struct kdbus_node *node = file_inode(file)->i_private;
-+	struct kdbus_ep *ep, *file_ep = kdbus_ep_from_node(node);
-+	struct kdbus_bus *bus = file_ep->bus;
-+	struct kdbus_conn *conn;
-+	int ret = 0;
-+
-+	if (!kdbus_node_acquire(node))
-+		return -ESHUTDOWN;
-+
-+	switch (cmd) {
-+	case KDBUS_CMD_ENDPOINT_MAKE: {
-+		/* creating custom endpoints is a privileged operation */
-+		if (!kdbus_ep_is_owner(file_ep, file)) {
-+			ret = -EPERM;
-+			break;
-+		}
-+
-+		ep = kdbus_cmd_ep_make(bus, buf);
-+		if (IS_ERR_OR_NULL(ep)) {
-+			ret = PTR_ERR_OR_ZERO(ep);
-+			break;
-+		}
-+
-+		handle->ep_owner = ep;
-+		ret = KDBUS_HANDLE_EP_OWNER;
-+		break;
-+	}
-+
-+	case KDBUS_CMD_HELLO:
-+		conn = kdbus_cmd_hello(file_ep, file, buf);
-+		if (IS_ERR_OR_NULL(conn)) {
-+			ret = PTR_ERR_OR_ZERO(conn);
-+			break;
-+		}
-+
-+		handle->conn = conn;
-+		ret = KDBUS_HANDLE_CONNECTED;
-+		break;
-+
-+	default:
-+		ret = -EBADFD;
-+		break;
-+	}
-+
-+	kdbus_node_release(node);
-+	return ret;
-+}
-+
-+static long kdbus_handle_ioctl_ep_owner(struct file *file, unsigned int command,
-+					void __user *buf)
-+{
-+	struct kdbus_handle *handle = file->private_data;
-+	struct kdbus_ep *ep = handle->ep_owner;
-+	int ret;
-+
-+	if (!kdbus_node_acquire(&ep->node))
-+		return -ESHUTDOWN;
-+
-+	switch (command) {
-+	case KDBUS_CMD_ENDPOINT_UPDATE:
-+		ret = kdbus_cmd_ep_update(ep, buf);
-+		break;
-+	default:
-+		ret = -EBADFD;
-+		break;
-+	}
-+
-+	kdbus_node_release(&ep->node);
-+	return ret;
-+}
-+
-+static long kdbus_handle_ioctl_connected(struct file *file,
-+					 unsigned int command, void __user *buf)
-+{
-+	struct kdbus_handle *handle = file->private_data;
-+	struct kdbus_conn *conn = handle->conn;
-+	struct kdbus_conn *release_conn = NULL;
-+	int ret;
-+
-+	release_conn = conn;
-+	ret = kdbus_conn_acquire(release_conn);
-+	if (ret < 0)
-+		return ret;
-+
-+	switch (command) {
-+	case KDBUS_CMD_BYEBYE:
-+		/*
-+		 * BYEBYE is special; we must not acquire a connection when
-+		 * calling into kdbus_conn_disconnect() or we will deadlock,
-+		 * because kdbus_conn_disconnect() will wait for all acquired
-+		 * references to be dropped.
-+		 */
-+		kdbus_conn_release(release_conn);
-+		release_conn = NULL;
-+		ret = kdbus_cmd_byebye_unlocked(conn, buf);
-+		break;
-+	case KDBUS_CMD_NAME_ACQUIRE:
-+		ret = kdbus_cmd_name_acquire(conn, buf);
-+		break;
-+	case KDBUS_CMD_NAME_RELEASE:
-+		ret = kdbus_cmd_name_release(conn, buf);
-+		break;
-+	case KDBUS_CMD_LIST:
-+		ret = kdbus_cmd_list(conn, buf);
-+		break;
-+	case KDBUS_CMD_CONN_INFO:
-+		ret = kdbus_cmd_conn_info(conn, buf);
-+		break;
-+	case KDBUS_CMD_BUS_CREATOR_INFO:
-+		ret = kdbus_cmd_bus_creator_info(conn, buf);
-+		break;
-+	case KDBUS_CMD_UPDATE:
-+		ret = kdbus_cmd_update(conn, buf);
-+		break;
-+	case KDBUS_CMD_MATCH_ADD:
-+		ret = kdbus_cmd_match_add(conn, buf);
-+		break;
-+	case KDBUS_CMD_MATCH_REMOVE:
-+		ret = kdbus_cmd_match_remove(conn, buf);
-+		break;
-+	case KDBUS_CMD_SEND:
-+		ret = kdbus_cmd_send(conn, file, buf);
-+		break;
-+	case KDBUS_CMD_RECV:
-+		ret = kdbus_cmd_recv(conn, buf);
-+		break;
-+	case KDBUS_CMD_FREE:
-+		ret = kdbus_cmd_free(conn, buf);
-+		break;
-+	default:
-+		ret = -EBADFD;
-+		break;
-+	}
-+
-+	kdbus_conn_release(release_conn);
-+	return ret;
-+}
-+
-+static long kdbus_handle_ioctl(struct file *file, unsigned int cmd,
-+			       unsigned long arg)
-+{
-+	struct kdbus_handle *handle = file->private_data;
-+	struct kdbus_node *node = kdbus_node_from_inode(file_inode(file));
-+	void __user *argp = (void __user *)arg;
-+	long ret = -EBADFD;
-+
-+	switch (cmd) {
-+	case KDBUS_CMD_BUS_MAKE:
-+	case KDBUS_CMD_ENDPOINT_MAKE:
-+	case KDBUS_CMD_HELLO:
-+		mutex_lock(&handle->lock);
-+		if (handle->type == KDBUS_HANDLE_NONE) {
-+			if (node->type == KDBUS_NODE_CONTROL)
-+				ret = kdbus_handle_ioctl_control(file, cmd,
-+								 argp);
-+			else if (node->type == KDBUS_NODE_ENDPOINT)
-+				ret = kdbus_handle_ioctl_ep(file, cmd, argp);
-+
-+			if (ret > 0) {
-+				/*
-+				 * The data given via open() is not sufficient
-+				 * to set up a kdbus handle. Hence, we require
-+				 * the user to perform a setup ioctl. This setup
-+				 * can only be performed once and defines the
-+				 * type of the handle. The different setup
-+				 * ioctls are locked against each other so they
-+				 * cannot race. Once the handle type is set,
-+				 * the type-dependent ioctls are enabled. To
-+				 * improve performance, we don't lock those via
-+				 * handle->lock. Instead, we issue a
-+				 * write-barrier before performing the
-+				 * type-change, which pairs with smp_rmb() in
-+				 * all handlers that access the type field. This
-+				 * guarantees the handle is fully set up, if
-+				 * handle->type is set. If handle->type is
-+				 * unset, you must not make any assumptions
-+				 * without taking handle->lock.
-+				 * Note that handle->type is only set once. It
-+				 * will never change afterwards.
-+				 */
-+				smp_wmb();
-+				handle->type = ret;
-+			}
-+		}
-+		mutex_unlock(&handle->lock);
-+		break;
-+
-+	case KDBUS_CMD_ENDPOINT_UPDATE:
-+	case KDBUS_CMD_BYEBYE:
-+	case KDBUS_CMD_NAME_ACQUIRE:
-+	case KDBUS_CMD_NAME_RELEASE:
-+	case KDBUS_CMD_LIST:
-+	case KDBUS_CMD_CONN_INFO:
-+	case KDBUS_CMD_BUS_CREATOR_INFO:
-+	case KDBUS_CMD_UPDATE:
-+	case KDBUS_CMD_MATCH_ADD:
-+	case KDBUS_CMD_MATCH_REMOVE:
-+	case KDBUS_CMD_SEND:
-+	case KDBUS_CMD_RECV:
-+	case KDBUS_CMD_FREE: {
-+		enum kdbus_handle_type type;
-+
-+		/*
-+		 * This read-barrier pairs with smp_wmb() of the handle setup.
-+		 * It guarantees the handle is fully written, in case the
-+		 * type has been set. It allows us to access the handle without
-+		 * taking handle->lock, given the guarantee that the type is
-+		 * only ever set once, and stays constant afterwards.
-+		 * Furthermore, the handle object itself is not modified in any
-+		 * way after the type is set. That is, the type-field is the
-+		 * last field that is written on any handle. If it has not been
-+		 * set, we must not access the handle here.
-+		 */
-+		type = handle->type;
-+		smp_rmb();
-+
-+		if (type == KDBUS_HANDLE_EP_OWNER)
-+			ret = kdbus_handle_ioctl_ep_owner(file, cmd, argp);
-+		else if (type == KDBUS_HANDLE_CONNECTED)
-+			ret = kdbus_handle_ioctl_connected(file, cmd, argp);
-+
-+		break;
-+	}
-+	default:
-+		ret = -ENOTTY;
-+		break;
-+	}
-+
-+	return ret < 0 ? ret : 0;
-+}
-+
-+static unsigned int kdbus_handle_poll(struct file *file,
-+				      struct poll_table_struct *wait)
-+{
-+	struct kdbus_handle *handle = file->private_data;
-+	enum kdbus_handle_type type;
-+	unsigned int mask = POLLOUT | POLLWRNORM;
-+
-+	/*
-+	 * This pairs with smp_wmb() during handle setup. It guarantees that
-+	 * _iff_ the handle type is set, handle->conn is valid. Furthermore,
-+	 * _iff_ the type is set, the handle object is constant and never
-+	 * changed again. If it's not set, we must not access the handle but
-+	 * bail out. We also must assume no setup has taken place, yet.
-+	 */
-+	type = handle->type;
-+	smp_rmb();
-+
-+	/* Only a connected endpoint can read/write data */
-+	if (type != KDBUS_HANDLE_CONNECTED)
-+		return POLLERR | POLLHUP;
-+
-+	poll_wait(file, &handle->conn->wait, wait);
-+
-+	/*
-+	 * Verify the connection hasn't been deactivated _after_ adding the
-+	 * wait-queue. This guarantees, that if the connection is deactivated
-+	 * after we checked it, the waitqueue is signaled and we're called
-+	 * again.
-+	 */
-+	if (!kdbus_conn_active(handle->conn))
-+		return POLLERR | POLLHUP;
-+
-+	if (!list_empty(&handle->conn->queue.msg_list) ||
-+	    atomic_read(&handle->conn->lost_count) > 0)
-+		mask |= POLLIN | POLLRDNORM;
-+
-+	return mask;
-+}
-+
-+static int kdbus_handle_mmap(struct file *file, struct vm_area_struct *vma)
-+{
-+	struct kdbus_handle *handle = file->private_data;
-+	enum kdbus_handle_type type;
-+	int ret = -EBADFD;
-+
-+	/*
-+	 * This pairs with smp_wmb() during handle setup. It guarantees that
-+	 * _iff_ the handle type is set, handle->conn is valid. Furthermore,
-+	 * _iff_ the type is set, the handle object is constant and never
-+	 * changed again. If it's not set, we must not access the handle but
-+	 * bail out. We also must assume no setup has taken place, yet.
-+	 */
-+	type = handle->type;
-+	smp_rmb();
-+
-+	/* Only connected handles have a pool we can map */
-+	if (type == KDBUS_HANDLE_CONNECTED)
-+		ret = kdbus_pool_mmap(handle->conn->pool, vma);
-+
-+	return ret;
-+}
-+
-+const struct file_operations kdbus_handle_ops = {
-+	.owner =		THIS_MODULE,
-+	.open =			kdbus_handle_open,
-+	.release =		kdbus_handle_release,
-+	.poll =			kdbus_handle_poll,
-+	.llseek =		noop_llseek,
-+	.unlocked_ioctl =	kdbus_handle_ioctl,
-+	.mmap =			kdbus_handle_mmap,
-+#ifdef CONFIG_COMPAT
-+	.compat_ioctl =		kdbus_handle_ioctl,
-+#endif
-+};
-diff --git a/ipc/kdbus/handle.h b/ipc/kdbus/handle.h
-new file mode 100644
-index 0000000..5dde2c1
---- /dev/null
-+++ b/ipc/kdbus/handle.h
-@@ -0,0 +1,103 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_HANDLE_H
-+#define __KDBUS_HANDLE_H
-+
-+#include <linux/fs.h>
-+#include <uapi/linux/kdbus.h>
-+
-+extern const struct file_operations kdbus_handle_ops;
-+
-+/**
-+ * struct kdbus_arg - information and state of a single ioctl command item
-+ * @type:		item type
-+ * @item:		set by the parser to the first found item of this type
-+ * @multiple:		whether multiple items of this type are allowed
-+ * @mandatory:		whether at least one item of this type is required
-+ *
-+ * This structure describes a single item in an ioctl command payload. The
-+ * caller has to pre-fill the type and flags, the parser will then use this
-+ * information to verify the ioctl payload. @item is set by the parser to point
-+ * to the first occurrence of the item.
-+ */
-+struct kdbus_arg {
-+	u64 type;
-+	struct kdbus_item *item;
-+	bool multiple : 1;
-+	bool mandatory : 1;
-+};
-+
-+/**
-+ * struct kdbus_args - information and state of ioctl command parser
-+ * @allowed_flags:	set of flags this command supports
-+ * @argc:		number of items in @argv
-+ * @argv:		array of items this command supports
-+ * @user:		set by parser to user-space location of current command
-+ * @cmd:		set by parser to kernel copy of command payload
-+ * @cmd_buf:		inline buf to avoid kmalloc() on small cmds
-+ * @items:		points to item array in @cmd
-+ * @items_size:		size of @items in bytes
-+ * @is_cmd:		whether this is a command-payload or msg-payload
-+ *
-+ * This structure is used to parse ioctl command payloads on each invocation.
-+ * The ioctl handler has to pre-fill the flags and allowed items before passing
-+ * the object to kdbus_args_parse(). The parser will copy the command payload
-+ * into kernel-space and verify the correctness of the data.
-+ *
-+ * We use a 256-byte buffer for small command payloads, to be allocated on
-+ * stack on syscall entrance.
-+ */
-+struct kdbus_args {
-+	u64 allowed_flags;
-+	size_t argc;
-+	struct kdbus_arg *argv;
-+
-+	struct kdbus_cmd __user *user;
-+	struct kdbus_cmd *cmd;
-+	u8 cmd_buf[256];
-+
-+	struct kdbus_item *items;
-+	size_t items_size;
-+	bool is_cmd : 1;
-+};
-+
-+int __kdbus_args_parse(struct kdbus_args *args, bool is_cmd, void __user *argp,
-+		       size_t type_size, size_t items_offset, void **out);
-+int kdbus_args_clear(struct kdbus_args *args, int ret);
-+
-+#define kdbus_args_parse(_args, _argp, _v)                              \
-+	({                                                              \
-+		BUILD_BUG_ON(offsetof(typeof(**(_v)), size) !=          \
-+			     offsetof(struct kdbus_cmd, size));         \
-+		BUILD_BUG_ON(offsetof(typeof(**(_v)), flags) !=         \
-+			     offsetof(struct kdbus_cmd, flags));        \
-+		BUILD_BUG_ON(offsetof(typeof(**(_v)), return_flags) !=  \
-+			     offsetof(struct kdbus_cmd, return_flags)); \
-+		__kdbus_args_parse((_args), 1, (_argp), sizeof(**(_v)), \
-+				   offsetof(typeof(**(_v)), items),     \
-+				   (void **)(_v));                      \
-+	})
-+
-+#define kdbus_args_parse_msg(_args, _argp, _v)                          \
-+	({                                                              \
-+		BUILD_BUG_ON(offsetof(typeof(**(_v)), size) !=          \
-+			     offsetof(struct kdbus_cmd, size));         \
-+		BUILD_BUG_ON(offsetof(typeof(**(_v)), flags) !=         \
-+			     offsetof(struct kdbus_cmd, flags));        \
-+		__kdbus_args_parse((_args), 0, (_argp), sizeof(**(_v)), \
-+				   offsetof(typeof(**(_v)), items),     \
-+				   (void **)(_v));                      \
-+	})
-+
-+#endif
-diff --git a/ipc/kdbus/item.c b/ipc/kdbus/item.c
-new file mode 100644
-index 0000000..ce78dba
---- /dev/null
-+++ b/ipc/kdbus/item.c
-@@ -0,0 +1,293 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/ctype.h>
-+#include <linux/fs.h>
-+#include <linux/string.h>
-+
-+#include "item.h"
-+#include "limits.h"
-+#include "util.h"
-+
-+/*
-+ * This verifies the string at position @str with size @size is properly
-+ * zero-terminated and does not contain a 0-byte except at the end.
-+ */
-+static bool kdbus_str_valid(const char *str, size_t size)
-+{
-+	return size > 0 && memchr(str, '\0', size) == str + size - 1;
-+}
-+
-+/**
-+ * kdbus_item_validate_name() - validate an item containing a name
-+ * @item:		Item to validate
-+ *
-+ * Return: zero on success or a negative error code on failure
-+ */
-+int kdbus_item_validate_name(const struct kdbus_item *item)
-+{
-+	const char *name = item->str;
-+	unsigned int i;
-+	size_t len;
-+
-+	if (item->size < KDBUS_ITEM_HEADER_SIZE + 2)
-+		return -EINVAL;
-+
-+	if (item->size > KDBUS_ITEM_HEADER_SIZE +
-+			 KDBUS_SYSNAME_MAX_LEN + 1)
-+		return -ENAMETOOLONG;
-+
-+	if (!kdbus_str_valid(name, KDBUS_ITEM_PAYLOAD_SIZE(item)))
-+		return -EINVAL;
-+
-+	len = strlen(name);
-+	if (len == 0)
-+		return -EINVAL;
-+
-+	for (i = 0; i < len; i++) {
-+		if (isalpha(name[i]))
-+			continue;
-+		if (isdigit(name[i]))
-+			continue;
-+		if (name[i] == '_')
-+			continue;
-+		if (i > 0 && i + 1 < len && (name[i] == '-' || name[i] == '.'))
-+			continue;
-+
-+		return -EINVAL;
-+	}
-+
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_item_validate() - validate a single item
-+ * @item:	item to validate
-+ *
-+ * Return: 0 if item is valid, negative error code if not.
-+ */
-+int kdbus_item_validate(const struct kdbus_item *item)
-+{
-+	size_t payload_size = KDBUS_ITEM_PAYLOAD_SIZE(item);
-+	size_t l;
-+	int ret;
-+
-+	BUILD_BUG_ON(KDBUS_ITEM_HEADER_SIZE !=
-+		     sizeof(struct kdbus_item_header));
-+
-+	if (item->size < KDBUS_ITEM_HEADER_SIZE)
-+		return -EINVAL;
-+
-+	switch (item->type) {
-+	case KDBUS_ITEM_NEGOTIATE:
-+		if (payload_size % sizeof(u64) != 0)
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_PAYLOAD_VEC:
-+	case KDBUS_ITEM_PAYLOAD_OFF:
-+		if (payload_size != sizeof(struct kdbus_vec))
-+			return -EINVAL;
-+		if (item->vec.size == 0 || item->vec.size > SIZE_MAX)
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_PAYLOAD_MEMFD:
-+		if (payload_size != sizeof(struct kdbus_memfd))
-+			return -EINVAL;
-+		if (item->memfd.size == 0 || item->memfd.size > SIZE_MAX)
-+			return -EINVAL;
-+		if (item->memfd.fd < 0)
-+			return -EBADF;
-+		break;
-+
-+	case KDBUS_ITEM_FDS:
-+		if (payload_size % sizeof(int) != 0)
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_CANCEL_FD:
-+		if (payload_size != sizeof(int))
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_BLOOM_PARAMETER:
-+		if (payload_size != sizeof(struct kdbus_bloom_parameter))
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_BLOOM_FILTER:
-+		/* followed by the bloom-mask, depends on the bloom-size */
-+		if (payload_size < sizeof(struct kdbus_bloom_filter))
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_BLOOM_MASK:
-+		/* size depends on bloom-size of bus */
-+		break;
-+
-+	case KDBUS_ITEM_CONN_DESCRIPTION:
-+	case KDBUS_ITEM_MAKE_NAME:
-+		ret = kdbus_item_validate_name(item);
-+		if (ret < 0)
-+			return ret;
-+		break;
-+
-+	case KDBUS_ITEM_ATTACH_FLAGS_SEND:
-+	case KDBUS_ITEM_ATTACH_FLAGS_RECV:
-+	case KDBUS_ITEM_ID:
-+	case KDBUS_ITEM_DST_ID:
-+		if (payload_size != sizeof(u64))
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_TIMESTAMP:
-+		if (payload_size != sizeof(struct kdbus_timestamp))
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_CREDS:
-+		if (payload_size != sizeof(struct kdbus_creds))
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_AUXGROUPS:
-+		if (payload_size % sizeof(u32) != 0)
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_NAME:
-+	case KDBUS_ITEM_DST_NAME:
-+	case KDBUS_ITEM_PID_COMM:
-+	case KDBUS_ITEM_TID_COMM:
-+	case KDBUS_ITEM_EXE:
-+	case KDBUS_ITEM_CMDLINE:
-+	case KDBUS_ITEM_CGROUP:
-+	case KDBUS_ITEM_SECLABEL:
-+		if (!kdbus_str_valid(item->str, payload_size))
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_CAPS:
-+		if (payload_size < sizeof(u32))
-+			return -EINVAL;
-+		if (payload_size < sizeof(u32) +
-+		    4 * CAP_TO_INDEX(item->caps.last_cap) * sizeof(u32))
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_AUDIT:
-+		if (payload_size != sizeof(struct kdbus_audit))
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_POLICY_ACCESS:
-+		if (payload_size != sizeof(struct kdbus_policy_access))
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_NAME_ADD:
-+	case KDBUS_ITEM_NAME_REMOVE:
-+	case KDBUS_ITEM_NAME_CHANGE:
-+		if (payload_size < sizeof(struct kdbus_notify_name_change))
-+			return -EINVAL;
-+		l = payload_size - offsetof(struct kdbus_notify_name_change,
-+					    name);
-+		if (l > 0 && !kdbus_str_valid(item->name_change.name, l))
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_ID_ADD:
-+	case KDBUS_ITEM_ID_REMOVE:
-+		if (payload_size != sizeof(struct kdbus_notify_id_change))
-+			return -EINVAL;
-+		break;
-+
-+	case KDBUS_ITEM_REPLY_TIMEOUT:
-+	case KDBUS_ITEM_REPLY_DEAD:
-+		if (payload_size != 0)
-+			return -EINVAL;
-+		break;
-+
-+	default:
-+		break;
-+	}
-+
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_items_validate() - validate items passed by user-space
-+ * @items:		items to validate
-+ * @items_size:		size of @items in bytes
-+ *
-+ * This verifies that the passed items pointer is consistent and valid.
-+ * Furthermore, each item is checked for:
-+ *  - valid "size" value
-+ *  - payload is of expected type
-+ *  - payload is fully included in the item
-+ *  - string payloads are zero-terminated
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_items_validate(const struct kdbus_item *items, size_t items_size)
-+{
-+	const struct kdbus_item *item;
-+	int ret;
-+
-+	KDBUS_ITEMS_FOREACH(item, items, items_size) {
-+		if (!KDBUS_ITEM_VALID(item, items, items_size))
-+			return -EINVAL;
-+
-+		ret = kdbus_item_validate(item);
-+		if (ret < 0)
-+			return ret;
-+	}
-+
-+	if (!KDBUS_ITEMS_END(item, items, items_size))
-+		return -EINVAL;
-+
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_item_set() - Set item content
-+ * @item:	The item to modify
-+ * @type:	The item type to set (KDBUS_ITEM_*)
-+ * @data:	Data to copy to item->data, may be %NULL
-+ * @len:	Number of bytes in @data
-+ *
-+ * This sets type, size and data fields of an item. If @data is NULL, the data
-+ * memory is cleared.
-+ *
-+ * Note that you must align your @data memory to 8 bytes. Trailing padding (in
-+ * case @len is not 8byte aligned) is cleared by this call.
-+ *
-+ * Return: Pointer to the following item.
-+ */
-+struct kdbus_item *kdbus_item_set(struct kdbus_item *item, u64 type,
-+				  const void *data, size_t len)
-+{
-+	item->type = type;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + len;
-+
-+	if (data) {
-+		memcpy(item->data, data, len);
-+		memset(item->data + len, 0, KDBUS_ALIGN8(len) - len);
-+	} else {
-+		memset(item->data, 0, KDBUS_ALIGN8(len));
-+	}
-+
-+	return KDBUS_ITEM_NEXT(item);
-+}
-diff --git a/ipc/kdbus/item.h b/ipc/kdbus/item.h
-new file mode 100644
-index 0000000..3a7e6cc
---- /dev/null
-+++ b/ipc/kdbus/item.h
-@@ -0,0 +1,61 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_ITEM_H
-+#define __KDBUS_ITEM_H
-+
-+#include <linux/kernel.h>
-+#include <uapi/linux/kdbus.h>
-+
-+#include "util.h"
-+
-+/* generic access and iterators over a stream of items */
-+#define KDBUS_ITEM_NEXT(_i) (typeof(_i))((u8 *)(_i) + KDBUS_ALIGN8((_i)->size))
-+#define KDBUS_ITEMS_SIZE(_h, _is) ((_h)->size - offsetof(typeof(*(_h)), _is))
-+#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
-+#define KDBUS_ITEM_SIZE(_s) KDBUS_ALIGN8(KDBUS_ITEM_HEADER_SIZE + (_s))
-+#define KDBUS_ITEM_PAYLOAD_SIZE(_i) ((_i)->size - KDBUS_ITEM_HEADER_SIZE)
-+
-+#define KDBUS_ITEMS_FOREACH(_i, _is, _s)				\
-+	for ((_i) = (_is);						\
-+	     ((u8 *)(_i) < (u8 *)(_is) + (_s)) &&			\
-+	       ((u8 *)(_i) >= (u8 *)(_is));				\
-+	     (_i) = KDBUS_ITEM_NEXT(_i))
-+
-+#define KDBUS_ITEM_VALID(_i, _is, _s)					\
-+	((_i)->size >= KDBUS_ITEM_HEADER_SIZE &&			\
-+	 (u8 *)(_i) + (_i)->size > (u8 *)(_i) &&			\
-+	 (u8 *)(_i) + (_i)->size <= (u8 *)(_is) + (_s) &&		\
-+	 (u8 *)(_i) >= (u8 *)(_is))
-+
-+#define KDBUS_ITEMS_END(_i, _is, _s)					\
-+	((u8 *)(_i) == ((u8 *)(_is) + KDBUS_ALIGN8(_s)))
-+
-+/**
-+ * struct kdbus_item_header - Describes the fixed part of an item
-+ * @size:	The total size of the item
-+ * @type:	The item type, one of KDBUS_ITEM_*
-+ */
-+struct kdbus_item_header {
-+	u64 size;
-+	u64 type;
-+};
-+
-+int kdbus_item_validate_name(const struct kdbus_item *item);
-+int kdbus_item_validate(const struct kdbus_item *item);
-+int kdbus_items_validate(const struct kdbus_item *items, size_t items_size);
-+struct kdbus_item *kdbus_item_set(struct kdbus_item *item, u64 type,
-+				  const void *data, size_t len);
-+
-+#endif
-diff --git a/ipc/kdbus/limits.h b/ipc/kdbus/limits.h
-new file mode 100644
-index 0000000..c54925a
---- /dev/null
-+++ b/ipc/kdbus/limits.h
-@@ -0,0 +1,61 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_DEFAULTS_H
-+#define __KDBUS_DEFAULTS_H
-+
-+#include <linux/kernel.h>
-+
-+/* maximum size of message header and items */
-+#define KDBUS_MSG_MAX_SIZE		SZ_8K
-+
-+/* maximum number of memfd items per message */
-+#define KDBUS_MSG_MAX_MEMFD_ITEMS	16
-+
-+/* max size of ioctl command data */
-+#define KDBUS_CMD_MAX_SIZE		SZ_32K
-+
-+/* maximum number of inflight fds in a target queue per user */
-+#define KDBUS_CONN_MAX_FDS_PER_USER	16
-+
-+/* maximum message payload size */
-+#define KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE		SZ_2M
-+
-+/* maximum size of bloom bit field in bytes */
-+#define KDBUS_BUS_BLOOM_MAX_SIZE		SZ_4K
-+
-+/* maximum length of well-known bus name */
-+#define KDBUS_NAME_MAX_LEN			255
-+
-+/* maximum length of bus, domain, ep name */
-+#define KDBUS_SYSNAME_MAX_LEN			63
-+
-+/* maximum number of matches per connection */
-+#define KDBUS_MATCH_MAX				256
-+
-+/* maximum number of queued messages from the same individual user */
-+#define KDBUS_CONN_MAX_MSGS			256
-+
-+/* maximum number of well-known names per connection */
-+#define KDBUS_CONN_MAX_NAMES			256
-+
-+/* maximum number of queued requests waiting for a reply */
-+#define KDBUS_CONN_MAX_REQUESTS_PENDING		128
-+
-+/* maximum number of connections per user in one domain */
-+#define KDBUS_USER_MAX_CONN			1024
-+
-+/* maximum number of buses per user in one domain */
-+#define KDBUS_USER_MAX_BUSES			16
-+
-+#endif
-diff --git a/ipc/kdbus/main.c b/ipc/kdbus/main.c
-new file mode 100644
-index 0000000..1ad4dc8
---- /dev/null
-+++ b/ipc/kdbus/main.c
-@@ -0,0 +1,114 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#define pr_fmt(fmt)    KBUILD_MODNAME ": " fmt
-+#include <linux/fs.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+
-+#include "util.h"
-+#include "fs.h"
-+#include "handle.h"
-+#include "metadata.h"
-+#include "node.h"
-+
-+/*
-+ * This is a simplified outline of the internal kdbus object relations, for
-+ * those interested in the inner life of the driver implementation.
-+ *
-+ * From a mount point's (domain's) perspective:
-+ *
-+ * struct kdbus_domain
-+ *   |» struct kdbus_user *user (many, owned)
-+ *   '» struct kdbus_node node (embedded)
-+ *       |» struct kdbus_node children (many, referenced)
-+ *       |» struct kdbus_node *parent (pinned)
-+ *       '» struct kdbus_bus (many, pinned)
-+ *           |» struct kdbus_node node (embedded)
-+ *           '» struct kdbus_ep (many, pinned)
-+ *               |» struct kdbus_node node (embedded)
-+ *               |» struct kdbus_bus *bus (pinned)
-+ *               |» struct kdbus_conn conn_list (many, pinned)
-+ *               |   |» struct kdbus_ep *ep (pinned)
-+ *               |   |» struct kdbus_name_entry *activator_of (owned)
-+ *               |   |» struct kdbus_match_db *match_db (owned)
-+ *               |   |» struct kdbus_meta *meta (owned)
-+ *               |   |» struct kdbus_match_db *match_db (owned)
-+ *               |   |    '» struct kdbus_match_entry (many, owned)
-+ *               |   |
-+ *               |   |» struct kdbus_pool *pool (owned)
-+ *               |   |    '» struct kdbus_pool_slice *slices (many, owned)
-+ *               |   |       '» struct kdbus_pool *pool (pinned)
-+ *               |   |
-+ *               |   |» struct kdbus_user *user (pinned)
-+ *               |   `» struct kdbus_queue_entry entries (many, embedded)
-+ *               |        |» struct kdbus_pool_slice *slice (pinned)
-+ *               |        |» struct kdbus_conn_reply *reply (owned)
-+ *               |        '» struct kdbus_user *user (pinned)
-+ *               |
-+ *               '» struct kdbus_user *user (pinned)
-+ *                   '» struct kdbus_policy_db policy_db (embedded)
-+ *                        |» struct kdbus_policy_db_entry (many, owned)
-+ *                        |   |» struct kdbus_conn (pinned)
-+ *                        |   '» struct kdbus_ep (pinned)
-+ *                        |
-+ *                        '» struct kdbus_policy_db_cache_entry (many, owned)
-+ *                            '» struct kdbus_conn (pinned)
-+ *
-+ * For the life-time of a file descriptor derived from calling open() on a file
-+ * inside the mount point:
-+ *
-+ * struct kdbus_handle
-+ *  |» struct kdbus_meta *meta (owned)
-+ *  |» struct kdbus_ep *ep (pinned)
-+ *  |» struct kdbus_conn *conn (owned)
-+ *  '» struct kdbus_ep *ep (owned)
-+ */
-+
-+/* kdbus mount-point /sys/fs/kdbus */
-+static struct kobject *kdbus_dir;
-+
-+static int __init kdbus_init(void)
-+{
-+	int ret;
-+
-+	kdbus_dir = kobject_create_and_add(KBUILD_MODNAME, fs_kobj);
-+	if (!kdbus_dir)
-+		return -ENOMEM;
-+
-+	ret = kdbus_fs_init();
-+	if (ret < 0) {
-+		pr_err("cannot register filesystem: %d\n", ret);
-+		goto exit_dir;
-+	}
-+
-+	pr_info("initialized\n");
-+	return 0;
-+
-+exit_dir:
-+	kobject_put(kdbus_dir);
-+	return ret;
-+}
-+
-+static void __exit kdbus_exit(void)
-+{
-+	kdbus_fs_exit();
-+	kobject_put(kdbus_dir);
-+	ida_destroy(&kdbus_node_ida);
-+}
-+
-+module_init(kdbus_init);
-+module_exit(kdbus_exit);
-+MODULE_LICENSE("GPL");
-+MODULE_DESCRIPTION("D-Bus, powerful, easy to use interprocess communication");
-+MODULE_ALIAS_FS(KBUILD_MODNAME "fs");
-diff --git a/ipc/kdbus/match.c b/ipc/kdbus/match.c
-new file mode 100644
-index 0000000..4ee6a1f
---- /dev/null
-+++ b/ipc/kdbus/match.c
-@@ -0,0 +1,546 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/fs.h>
-+#include <linux/hash.h>
-+#include <linux/init.h>
-+#include <linux/mutex.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "endpoint.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "match.h"
-+#include "message.h"
-+#include "names.h"
-+
-+/**
-+ * struct kdbus_match_db - message filters
-+ * @entries_list:	List of matches
-+ * @mdb_rwlock:		Match data lock
-+ * @entries_count:	Number of entries in database
-+ */
-+struct kdbus_match_db {
-+	struct list_head entries_list;
-+	struct rw_semaphore mdb_rwlock;
-+	unsigned int entries_count;
-+};
-+
-+/**
-+ * struct kdbus_match_entry - a match database entry
-+ * @cookie:		User-supplied cookie to lookup the entry
-+ * @list_entry:		The list entry element for the db list
-+ * @rules_list:		The list head for tracking rules of this entry
-+ */
-+struct kdbus_match_entry {
-+	u64 cookie;
-+	struct list_head list_entry;
-+	struct list_head rules_list;
-+};
-+
-+/**
-+ * struct kdbus_bloom_mask - mask to match against filter
-+ * @generations:	Number of generations carried
-+ * @data:		Array of bloom bit fields
-+ */
-+struct kdbus_bloom_mask {
-+	u64 generations;
-+	u64 *data;
-+};
-+
-+/**
-+ * struct kdbus_match_rule - a rule appended to a match entry
-+ * @type:		An item type to match against
-+ * @bloom_mask:		Bloom mask to match a message's filter against, used
-+ *			with KDBUS_ITEM_BLOOM_MASK
-+ * @name:		Name to match against, used with KDBUS_ITEM_NAME,
-+ *			KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}
-+ * @old_id:		ID to match against, used with
-+ *			KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
-+ *			KDBUS_ITEM_ID_REMOVE
-+ * @new_id:		ID to match against, used with
-+ *			KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
-+ *			KDBUS_ITEM_ID_REMOVE
-+ * @src_id:		ID to match against, used with KDBUS_ITEM_ID
-+ * @dst_id:		Message destination ID, used with KDBUS_ITEM_DST_ID
-+ * @rules_entry:	Entry in the entry's rules list
-+ */
-+struct kdbus_match_rule {
-+	u64 type;
-+	union {
-+		struct kdbus_bloom_mask bloom_mask;
-+		struct {
-+			char *name;
-+			u64 old_id;
-+			u64 new_id;
-+		};
-+		u64 src_id;
-+		u64 dst_id;
-+	};
-+	struct list_head rules_entry;
-+};
-+
-+static void kdbus_match_rule_free(struct kdbus_match_rule *rule)
-+{
-+	if (!rule)
-+		return;
-+
-+	switch (rule->type) {
-+	case KDBUS_ITEM_BLOOM_MASK:
-+		kfree(rule->bloom_mask.data);
-+		break;
-+
-+	case KDBUS_ITEM_NAME:
-+	case KDBUS_ITEM_NAME_ADD:
-+	case KDBUS_ITEM_NAME_REMOVE:
-+	case KDBUS_ITEM_NAME_CHANGE:
-+		kfree(rule->name);
-+		break;
-+
-+	case KDBUS_ITEM_ID:
-+	case KDBUS_ITEM_DST_ID:
-+	case KDBUS_ITEM_ID_ADD:
-+	case KDBUS_ITEM_ID_REMOVE:
-+		break;
-+
-+	default:
-+		BUG();
-+	}
-+
-+	list_del(&rule->rules_entry);
-+	kfree(rule);
-+}
-+
-+static void kdbus_match_entry_free(struct kdbus_match_entry *entry)
-+{
-+	struct kdbus_match_rule *r, *tmp;
-+
-+	if (!entry)
-+		return;
-+
-+	list_for_each_entry_safe(r, tmp, &entry->rules_list, rules_entry)
-+		kdbus_match_rule_free(r);
-+
-+	list_del(&entry->list_entry);
-+	kfree(entry);
-+}
-+
-+/**
-+ * kdbus_match_db_free() - free match db resources
-+ * @mdb:		The match database
-+ */
-+void kdbus_match_db_free(struct kdbus_match_db *mdb)
-+{
-+	struct kdbus_match_entry *entry, *tmp;
-+
-+	if (!mdb)
-+		return;
-+
-+	list_for_each_entry_safe(entry, tmp, &mdb->entries_list, list_entry)
-+		kdbus_match_entry_free(entry);
-+
-+	kfree(mdb);
-+}
-+
-+/**
-+ * kdbus_match_db_new() - create a new match database
-+ *
-+ * Return: a new kdbus_match_db on success, ERR_PTR on failure.
-+ */
-+struct kdbus_match_db *kdbus_match_db_new(void)
-+{
-+	struct kdbus_match_db *d;
-+
-+	d = kzalloc(sizeof(*d), GFP_KERNEL);
-+	if (!d)
-+		return ERR_PTR(-ENOMEM);
-+
-+	init_rwsem(&d->mdb_rwlock);
-+	INIT_LIST_HEAD(&d->entries_list);
-+
-+	return d;
-+}
-+
-+static bool kdbus_match_bloom(const struct kdbus_bloom_filter *filter,
-+			      const struct kdbus_bloom_mask *mask,
-+			      const struct kdbus_conn *conn)
-+{
-+	size_t n = conn->ep->bus->bloom.size / sizeof(u64);
-+	const u64 *m;
-+	size_t i;
-+
-+	/*
-+	 * The message's filter carries a generation identifier, the
-+	 * match's mask possibly carries an array of multiple generations
-+	 * of the mask. Select the mask with the closest match of the
-+	 * filter's generation.
-+	 */
-+	m = mask->data + (min(filter->generation, mask->generations - 1) * n);
-+
-+	/*
-+	 * The message's filter contains the message's properties,
-+	 * the match's mask contains the properties to look for in the
-+	 * message. Check the mask bit field against the filter bit field,
-+	 * if the message possibly carries the properties the connection
-+	 * has subscribed to.
-+	 */
-+	for (i = 0; i < n; i++)
-+		if ((filter->data[i] & m[i]) != m[i])
-+			return false;
-+
-+	return true;
-+}
-+
-+static bool kdbus_match_rule_conn(const struct kdbus_match_rule *r,
-+				  struct kdbus_conn *c,
-+				  const struct kdbus_staging *s)
-+{
-+	lockdep_assert_held(&c->ep->bus->name_registry->rwlock);
-+
-+	switch (r->type) {
-+	case KDBUS_ITEM_BLOOM_MASK:
-+		return kdbus_match_bloom(s->bloom_filter, &r->bloom_mask, c);
-+	case KDBUS_ITEM_ID:
-+		return r->src_id == c->id || r->src_id == KDBUS_MATCH_ID_ANY;
-+	case KDBUS_ITEM_DST_ID:
-+		return r->dst_id == s->msg->dst_id ||
-+		       r->dst_id == KDBUS_MATCH_ID_ANY;
-+	case KDBUS_ITEM_NAME:
-+		return kdbus_conn_has_name(c, r->name);
-+	default:
-+		return false;
-+	}
-+}
-+
-+static bool kdbus_match_rule_kernel(const struct kdbus_match_rule *r,
-+				    const struct kdbus_staging *s)
-+{
-+	struct kdbus_item *n = s->notify;
-+
-+	if (WARN_ON(!n) || n->type != r->type)
-+		return false;
-+
-+	switch (r->type) {
-+	case KDBUS_ITEM_ID_ADD:
-+		return r->new_id == KDBUS_MATCH_ID_ANY ||
-+		       r->new_id == n->id_change.id;
-+	case KDBUS_ITEM_ID_REMOVE:
-+		return r->old_id == KDBUS_MATCH_ID_ANY ||
-+		       r->old_id == n->id_change.id;
-+	case KDBUS_ITEM_NAME_ADD:
-+	case KDBUS_ITEM_NAME_CHANGE:
-+	case KDBUS_ITEM_NAME_REMOVE:
-+		return (r->old_id == KDBUS_MATCH_ID_ANY ||
-+		        r->old_id == n->name_change.old_id.id) &&
-+		       (r->new_id == KDBUS_MATCH_ID_ANY ||
-+		        r->new_id == n->name_change.new_id.id) &&
-+		       (!r->name || !strcmp(r->name, n->name_change.name));
-+	default:
-+		return false;
-+	}
-+}
-+
-+static bool kdbus_match_rules(const struct kdbus_match_entry *entry,
-+			      struct kdbus_conn *c,
-+			      const struct kdbus_staging *s)
-+{
-+	struct kdbus_match_rule *r;
-+
-+	list_for_each_entry(r, &entry->rules_list, rules_entry)
-+		if ((c && !kdbus_match_rule_conn(r, c, s)) ||
-+		    (!c && !kdbus_match_rule_kernel(r, s)))
-+			return false;
-+
-+	return true;
-+}
-+
-+/**
-+ * kdbus_match_db_match_msg() - match a msg object against the database entries
-+ * @mdb:		The match database
-+ * @conn_src:		The connection object originating the message
-+ * @staging:		Staging object containing the message to match against
-+ *
-+ * This function will walk through all the database entries previously uploaded
-+ * with kdbus_cmd_match_add(). As soon as any of them has an all-satisfied rule
-+ * set, this function will return true.
-+ *
-+ * The caller must hold the registry lock of conn_src->ep->bus, in case conn_src
-+ * is non-NULL.
-+ *
-+ * Return: true if there was a matching database entry, false otherwise.
-+ */
-+bool kdbus_match_db_match_msg(struct kdbus_match_db *mdb,
-+			      struct kdbus_conn *conn_src,
-+			      const struct kdbus_staging *staging)
-+{
-+	struct kdbus_match_entry *entry;
-+	bool matched = false;
-+
-+	down_read(&mdb->mdb_rwlock);
-+	list_for_each_entry(entry, &mdb->entries_list, list_entry) {
-+		matched = kdbus_match_rules(entry, conn_src, staging);
-+		if (matched)
-+			break;
-+	}
-+	up_read(&mdb->mdb_rwlock);
-+
-+	return matched;
-+}
-+
-+static int kdbus_match_db_remove_unlocked(struct kdbus_match_db *mdb,
-+					  u64 cookie)
-+{
-+	struct kdbus_match_entry *entry, *tmp;
-+	bool found = false;
-+
-+	list_for_each_entry_safe(entry, tmp, &mdb->entries_list, list_entry)
-+		if (entry->cookie == cookie) {
-+			kdbus_match_entry_free(entry);
-+			--mdb->entries_count;
-+			found = true;
-+		}
-+
-+	return found ? 0 : -EBADSLT;
-+}
-+
-+/**
-+ * kdbus_cmd_match_add() - handle KDBUS_CMD_MATCH_ADD
-+ * @conn:		connection to operate on
-+ * @argp:		command payload
-+ *
-+ * One call to this function (or one ioctl(KDBUS_CMD_MATCH_ADD), respectively,
-+ * adds one new database entry with n rules attached to it. Each rule is
-+ * described with a kdbus_item, and an entry is considered matching if all
-+ * its rules are satisfied.
-+ *
-+ * The items attached to a kdbus_cmd_match struct have the following mapping:
-+ *
-+ * KDBUS_ITEM_BLOOM_MASK:	A bloom mask
-+ * KDBUS_ITEM_NAME:		A connection's source name
-+ * KDBUS_ITEM_ID:		A connection ID
-+ * KDBUS_ITEM_DST_ID:		A connection ID
-+ * KDBUS_ITEM_NAME_ADD:
-+ * KDBUS_ITEM_NAME_REMOVE:
-+ * KDBUS_ITEM_NAME_CHANGE:	Well-known name changes, carry
-+ *				kdbus_notify_name_change
-+ * KDBUS_ITEM_ID_ADD:
-+ * KDBUS_ITEM_ID_REMOVE:	Connection ID changes, carry
-+ *				kdbus_notify_id_change
-+ *
-+ * For kdbus_notify_{id,name}_change structs, only the ID and name fields
-+ * are looked at when adding an entry. The flags are unused.
-+ *
-+ * Also note that KDBUS_ITEM_BLOOM_MASK, KDBUS_ITEM_NAME, KDBUS_ITEM_ID,
-+ * and KDBUS_ITEM_DST_ID are used to match messages from userspace, while the
-+ * others apply to kernel-generated notifications.
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_match_add(struct kdbus_conn *conn, void __user *argp)
-+{
-+	struct kdbus_match_db *mdb = conn->match_db;
-+	struct kdbus_match_entry *entry = NULL;
-+	struct kdbus_cmd_match *cmd;
-+	struct kdbus_item *item;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+		{ .type = KDBUS_ITEM_BLOOM_MASK, .multiple = true },
-+		{ .type = KDBUS_ITEM_NAME, .multiple = true },
-+		{ .type = KDBUS_ITEM_ID, .multiple = true },
-+		{ .type = KDBUS_ITEM_DST_ID, .multiple = true },
-+		{ .type = KDBUS_ITEM_NAME_ADD, .multiple = true },
-+		{ .type = KDBUS_ITEM_NAME_REMOVE, .multiple = true },
-+		{ .type = KDBUS_ITEM_NAME_CHANGE, .multiple = true },
-+		{ .type = KDBUS_ITEM_ID_ADD, .multiple = true },
-+		{ .type = KDBUS_ITEM_ID_REMOVE, .multiple = true },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+				 KDBUS_MATCH_REPLACE,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	if (!kdbus_conn_is_ordinary(conn))
-+		return -EOPNOTSUPP;
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret != 0)
-+		return ret;
-+
-+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
-+	if (!entry) {
-+		ret = -ENOMEM;
-+		goto exit;
-+	}
-+
-+	entry->cookie = cmd->cookie;
-+	INIT_LIST_HEAD(&entry->list_entry);
-+	INIT_LIST_HEAD(&entry->rules_list);
-+
-+	KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
-+		struct kdbus_match_rule *rule;
-+		size_t size = item->size - offsetof(struct kdbus_item, data);
-+
-+		rule = kzalloc(sizeof(*rule), GFP_KERNEL);
-+		if (!rule) {
-+			ret = -ENOMEM;
-+			goto exit;
-+		}
-+
-+		rule->type = item->type;
-+		INIT_LIST_HEAD(&rule->rules_entry);
-+
-+		switch (item->type) {
-+		case KDBUS_ITEM_BLOOM_MASK: {
-+			u64 bsize = conn->ep->bus->bloom.size;
-+			u64 generations;
-+			u64 remainder;
-+
-+			generations = div64_u64_rem(size, bsize, &remainder);
-+			if (size < bsize || remainder > 0) {
-+				ret = -EDOM;
-+				break;
-+			}
-+
-+			rule->bloom_mask.data = kmemdup(item->data,
-+							size, GFP_KERNEL);
-+			if (!rule->bloom_mask.data) {
-+				ret = -ENOMEM;
-+				break;
-+			}
-+
-+			rule->bloom_mask.generations = generations;
-+			break;
-+		}
-+
-+		case KDBUS_ITEM_NAME:
-+			if (!kdbus_name_is_valid(item->str, false)) {
-+				ret = -EINVAL;
-+				break;
-+			}
-+
-+			rule->name = kstrdup(item->str, GFP_KERNEL);
-+			if (!rule->name)
-+				ret = -ENOMEM;
-+
-+			break;
-+
-+		case KDBUS_ITEM_ID:
-+			rule->src_id = item->id;
-+			break;
-+
-+		case KDBUS_ITEM_DST_ID:
-+			rule->dst_id = item->id;
-+			break;
-+
-+		case KDBUS_ITEM_NAME_ADD:
-+		case KDBUS_ITEM_NAME_REMOVE:
-+		case KDBUS_ITEM_NAME_CHANGE:
-+			rule->old_id = item->name_change.old_id.id;
-+			rule->new_id = item->name_change.new_id.id;
-+
-+			if (size > sizeof(struct kdbus_notify_name_change)) {
-+				rule->name = kstrdup(item->name_change.name,
-+						     GFP_KERNEL);
-+				if (!rule->name)
-+					ret = -ENOMEM;
-+			}
-+
-+			break;
-+
-+		case KDBUS_ITEM_ID_ADD:
-+		case KDBUS_ITEM_ID_REMOVE:
-+			if (item->type == KDBUS_ITEM_ID_ADD)
-+				rule->new_id = item->id_change.id;
-+			else
-+				rule->old_id = item->id_change.id;
-+
-+			break;
-+		}
-+
-+		if (ret < 0) {
-+			kdbus_match_rule_free(rule);
-+			goto exit;
-+		}
-+
-+		list_add_tail(&rule->rules_entry, &entry->rules_list);
-+	}
-+
-+	down_write(&mdb->mdb_rwlock);
-+
-+	/* Remove any entry that has the same cookie as the current one. */
-+	if (cmd->flags & KDBUS_MATCH_REPLACE)
-+		kdbus_match_db_remove_unlocked(mdb, entry->cookie);
-+
-+	/*
-+	 * If the above removal caught any entry, there will be room for the
-+	 * new one.
-+	 */
-+	if (++mdb->entries_count > KDBUS_MATCH_MAX) {
-+		--mdb->entries_count;
-+		ret = -EMFILE;
-+	} else {
-+		list_add_tail(&entry->list_entry, &mdb->entries_list);
-+		entry = NULL;
-+	}
-+
-+	up_write(&mdb->mdb_rwlock);
-+
-+exit:
-+	kdbus_match_entry_free(entry);
-+	return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_match_remove() - handle KDBUS_CMD_MATCH_REMOVE
-+ * @conn:		connection to operate on
-+ * @argp:		command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_match_remove(struct kdbus_conn *conn, void __user *argp)
-+{
-+	struct kdbus_cmd_match *cmd;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	if (!kdbus_conn_is_ordinary(conn))
-+		return -EOPNOTSUPP;
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret != 0)
-+		return ret;
-+
-+	down_write(&conn->match_db->mdb_rwlock);
-+	ret = kdbus_match_db_remove_unlocked(conn->match_db, cmd->cookie);
-+	up_write(&conn->match_db->mdb_rwlock);
-+
-+	return kdbus_args_clear(&args, ret);
-+}
-diff --git a/ipc/kdbus/match.h b/ipc/kdbus/match.h
-new file mode 100644
-index 0000000..ceb492f
---- /dev/null
-+++ b/ipc/kdbus/match.h
-@@ -0,0 +1,35 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_MATCH_H
-+#define __KDBUS_MATCH_H
-+
-+struct kdbus_conn;
-+struct kdbus_match_db;
-+struct kdbus_staging;
-+
-+struct kdbus_match_db *kdbus_match_db_new(void);
-+void kdbus_match_db_free(struct kdbus_match_db *db);
-+int kdbus_match_db_add(struct kdbus_conn *conn,
-+		       struct kdbus_cmd_match *cmd);
-+int kdbus_match_db_remove(struct kdbus_conn *conn,
-+			  struct kdbus_cmd_match *cmd);
-+bool kdbus_match_db_match_msg(struct kdbus_match_db *db,
-+			      struct kdbus_conn *conn_src,
-+			      const struct kdbus_staging *staging);
-+
-+int kdbus_cmd_match_add(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_match_remove(struct kdbus_conn *conn, void __user *argp);
-+
-+#endif
-diff --git a/ipc/kdbus/message.c b/ipc/kdbus/message.c
-new file mode 100644
-index 0000000..ae565cd
---- /dev/null
-+++ b/ipc/kdbus/message.c
-@@ -0,0 +1,1040 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/capability.h>
-+#include <linux/cgroup.h>
-+#include <linux/cred.h>
-+#include <linux/file.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/sched.h>
-+#include <linux/shmem_fs.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <net/sock.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "domain.h"
-+#include "endpoint.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "match.h"
-+#include "message.h"
-+#include "names.h"
-+#include "policy.h"
-+
-+static const char * const zeros = "\0\0\0\0\0\0\0";
-+
-+static struct kdbus_gaps *kdbus_gaps_new(size_t n_memfds, size_t n_fds)
-+{
-+	size_t size_offsets, size_memfds, size_fds, size;
-+	struct kdbus_gaps *gaps;
-+
-+	size_offsets = n_memfds * sizeof(*gaps->memfd_offsets);
-+	size_memfds = n_memfds * sizeof(*gaps->memfd_files);
-+	size_fds = n_fds * sizeof(*gaps->fd_files);
-+	size = sizeof(*gaps) + size_offsets + size_memfds + size_fds;
-+
-+	gaps = kzalloc(size, GFP_KERNEL);
-+	if (!gaps)
-+		return ERR_PTR(-ENOMEM);
-+
-+	kref_init(&gaps->kref);
-+	gaps->n_memfds = 0; /* we reserve n_memfds, but don't enforce them */
-+	gaps->memfd_offsets = (void *)(gaps + 1);
-+	gaps->memfd_files = (void *)((u8 *)gaps->memfd_offsets + size_offsets);
-+	gaps->n_fds = 0; /* we reserve n_fds, but don't enforce them */
-+	gaps->fd_files = (void *)((u8 *)gaps->memfd_files + size_memfds);
-+
-+	return gaps;
-+}
-+
-+static void kdbus_gaps_free(struct kref *kref)
-+{
-+	struct kdbus_gaps *gaps = container_of(kref, struct kdbus_gaps, kref);
-+	size_t i;
-+
-+	for (i = 0; i < gaps->n_fds; ++i)
-+		if (gaps->fd_files[i])
-+			fput(gaps->fd_files[i]);
-+	for (i = 0; i < gaps->n_memfds; ++i)
-+		if (gaps->memfd_files[i])
-+			fput(gaps->memfd_files[i]);
-+
-+	kfree(gaps);
-+}
-+
-+/**
-+ * kdbus_gaps_ref() - gain reference
-+ * @gaps:	gaps object
-+ *
-+ * Return: @gaps is returned
-+ */
-+struct kdbus_gaps *kdbus_gaps_ref(struct kdbus_gaps *gaps)
-+{
-+	if (gaps)
-+		kref_get(&gaps->kref);
-+	return gaps;
-+}
-+
-+/**
-+ * kdbus_gaps_unref() - drop reference
-+ * @gaps:	gaps object
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_gaps *kdbus_gaps_unref(struct kdbus_gaps *gaps)
-+{
-+	if (gaps)
-+		kref_put(&gaps->kref, kdbus_gaps_free);
-+	return NULL;
-+}
-+
-+/**
-+ * kdbus_gaps_install() - install file-descriptors
-+ * @gaps:		gaps object, or NULL
-+ * @slice:		pool slice that contains the message
-+ * @out_incomplete:	output variable to note incomplete fds
-+ *
-+ * This function installs all file-descriptors of @gaps into the current
-+ * process and copies the file-descriptor numbers into the target pool slice.
-+ *
-+ * If the file-descriptors were only partially installed, then @out_incomplete
-+ * will be set to true. Otherwise, it's set to false.
-+ *
-+ * Return: 0 on success, negative error code on failure
-+ */
-+int kdbus_gaps_install(struct kdbus_gaps *gaps, struct kdbus_pool_slice *slice,
-+		       bool *out_incomplete)
-+{
-+	bool incomplete_fds = false;
-+	struct kvec kvec;
-+	size_t i, n_fds;
-+	int ret, *fds;
-+
-+	if (!gaps) {
-+		/* nothing to do */
-+		*out_incomplete = incomplete_fds;
-+		return 0;
-+	}
-+
-+	n_fds = gaps->n_fds + gaps->n_memfds;
-+	if (n_fds < 1) {
-+		/* nothing to do */
-+		*out_incomplete = incomplete_fds;
-+		return 0;
-+	}
-+
-+	fds = kmalloc_array(n_fds, sizeof(*fds), GFP_TEMPORARY);
-+	n_fds = 0;
-+	if (!fds)
-+		return -ENOMEM;
-+
-+	/* 1) allocate fds and copy them over */
-+
-+	if (gaps->n_fds > 0) {
-+		for (i = 0; i < gaps->n_fds; ++i) {
-+			int fd;
-+
-+			fd = get_unused_fd_flags(O_CLOEXEC);
-+			if (fd < 0)
-+				incomplete_fds = true;
-+
-+			WARN_ON(!gaps->fd_files[i]);
-+
-+			fds[n_fds++] = fd < 0 ? -1 : fd;
-+		}
-+
-+		/*
-+		 * The file-descriptor array can only be present once per
-+		 * message. Hence, prepare all fds and then copy them over with
-+		 * a single kvec.
-+		 */
-+
-+		WARN_ON(!gaps->fd_offset);
-+
-+		kvec.iov_base = fds;
-+		kvec.iov_len = gaps->n_fds * sizeof(*fds);
-+		ret = kdbus_pool_slice_copy_kvec(slice, gaps->fd_offset,
-+						 &kvec, 1, kvec.iov_len);
-+		if (ret < 0)
-+			goto exit;
-+	}
-+
-+	for (i = 0; i < gaps->n_memfds; ++i) {
-+		int memfd;
-+
-+		memfd = get_unused_fd_flags(O_CLOEXEC);
-+		if (memfd < 0) {
-+			incomplete_fds = true;
-+			/* memfds are initialized to -1, skip copying it */
-+			continue;
-+		}
-+
-+		fds[n_fds++] = memfd;
-+
-+		/*
-+		 * memfds have to be copied individually as they each are put
-+		 * into a separate item. This should not be an issue, though,
-+		 * as usually there is no need to send more than one memfd per
-+		 * message.
-+		 */
-+
-+		WARN_ON(!gaps->memfd_offsets[i]);
-+		WARN_ON(!gaps->memfd_files[i]);
-+
-+		kvec.iov_base = &memfd;
-+		kvec.iov_len = sizeof(memfd);
-+		ret = kdbus_pool_slice_copy_kvec(slice, gaps->memfd_offsets[i],
-+						 &kvec, 1, kvec.iov_len);
-+		if (ret < 0)
-+			goto exit;
-+	}
-+
-+	/* 2) install fds now that everything was successful */
-+
-+	for (i = 0; i < gaps->n_fds; ++i)
-+		if (fds[i] >= 0)
-+			fd_install(fds[i], get_file(gaps->fd_files[i]));
-+	for (i = 0; i < gaps->n_memfds; ++i)
-+		if (fds[gaps->n_fds + i] >= 0)
-+			fd_install(fds[gaps->n_fds + i],
-+				   get_file(gaps->memfd_files[i]));
-+
-+	ret = 0;
-+
-+exit:
-+	if (ret < 0)
-+		for (i = 0; i < n_fds; ++i)
-+			put_unused_fd(fds[i]);
-+	kfree(fds);
-+	*out_incomplete = incomplete_fds;
-+	return ret;
-+}
-+
-+static struct file *kdbus_get_fd(int fd)
-+{
-+	struct file *f, *ret;
-+	struct inode *inode;
-+	struct socket *sock;
-+
-+	if (fd < 0)
-+		return ERR_PTR(-EBADF);
-+
-+	f = fget_raw(fd);
-+	if (!f)
-+		return ERR_PTR(-EBADF);
-+
-+	inode = file_inode(f);
-+	sock = S_ISSOCK(inode->i_mode) ? SOCKET_I(inode) : NULL;
-+
-+	if (f->f_mode & FMODE_PATH)
-+		ret = f; /* O_PATH is always allowed */
-+	else if (f->f_op == &kdbus_handle_ops)
-+		ret = ERR_PTR(-EOPNOTSUPP); /* disallow kdbus-fd over kdbus */
-+	else if (sock && sock->sk && sock->ops && sock->ops->family == PF_UNIX)
-+		ret = ERR_PTR(-EOPNOTSUPP); /* disallow UDS over kdbus */
-+	else
-+		ret = f; /* all others are allowed */
-+
-+	if (f != ret)
-+		fput(f);
-+
-+	return ret;
-+}
-+
-+static struct file *kdbus_get_memfd(const struct kdbus_memfd *memfd)
-+{
-+	const int m = F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL;
-+	struct file *f, *ret;
-+	int s;
-+
-+	if (memfd->fd < 0)
-+		return ERR_PTR(-EBADF);
-+
-+	f = fget(memfd->fd);
-+	if (!f)
-+		return ERR_PTR(-EBADF);
-+
-+	s = shmem_get_seals(f);
-+	if (s < 0)
-+		ret = ERR_PTR(-EMEDIUMTYPE);
-+	else if ((s & m) != m)
-+		ret = ERR_PTR(-ETXTBSY);
-+	else if (memfd->start + memfd->size > (u64)i_size_read(file_inode(f)))
-+		ret = ERR_PTR(-EFAULT);
-+	else
-+		ret = f;
-+
-+	if (f != ret)
-+		fput(f);
-+
-+	return ret;
-+}
-+
-+static int kdbus_msg_examine(struct kdbus_msg *msg, struct kdbus_bus *bus,
-+			     struct kdbus_cmd_send *cmd, size_t *out_n_memfds,
-+			     size_t *out_n_fds, size_t *out_n_parts)
-+{
-+	struct kdbus_item *item, *fds = NULL, *bloom = NULL, *dstname = NULL;
-+	u64 n_parts, n_memfds, n_fds, vec_size;
-+
-+	/*
-+	 * Step 1:
-+	 * Validate the message and command parameters.
-+	 */
-+
-+	/* KDBUS_PAYLOAD_KERNEL is reserved to kernel messages */
-+	if (msg->payload_type == KDBUS_PAYLOAD_KERNEL)
-+		return -EINVAL;
-+
-+	if (msg->dst_id == KDBUS_DST_ID_BROADCAST) {
-+		/* broadcasts must be marked as signals */
-+		if (!(msg->flags & KDBUS_MSG_SIGNAL))
-+			return -EBADMSG;
-+		/* broadcasts cannot have timeouts */
-+		if (msg->timeout_ns > 0)
-+			return -ENOTUNIQ;
-+	}
-+
-+	if (msg->flags & KDBUS_MSG_EXPECT_REPLY) {
-+		/* if you expect a reply, you must specify a timeout */
-+		if (msg->timeout_ns == 0)
-+			return -EINVAL;
-+		/* signals cannot have replies */
-+		if (msg->flags & KDBUS_MSG_SIGNAL)
-+			return -ENOTUNIQ;
-+	} else {
-+		/* must expect reply if sent as synchronous call */
-+		if (cmd->flags & KDBUS_SEND_SYNC_REPLY)
-+			return -EINVAL;
-+		/* cannot mark replies as signal */
-+		if (msg->cookie_reply && (msg->flags & KDBUS_MSG_SIGNAL))
-+			return -EINVAL;
-+	}
-+
-+	/*
-+	 * Step 2:
-+	 * Validate all passed items. While at it, collect some statistics that
-+	 * are required to allocate state objects later on.
-+	 *
-+	 * Generic item validation has already been done via
-+	 * kdbus_item_validate(). Furthermore, the number of items is naturally
-+	 * limited by the maximum message size. Hence, only non-generic item
-+	 * checks are performed here (mainly integer overflow tests).
-+	 */
-+
-+	n_parts = 0;
-+	n_memfds = 0;
-+	n_fds = 0;
-+	vec_size = 0;
-+
-+	KDBUS_ITEMS_FOREACH(item, msg->items, KDBUS_ITEMS_SIZE(msg, items)) {
-+		switch (item->type) {
-+		case KDBUS_ITEM_PAYLOAD_VEC: {
-+			void __force __user *ptr = KDBUS_PTR(item->vec.address);
-+			u64 size = item->vec.size;
-+
-+			if (vec_size + size < vec_size)
-+				return -EMSGSIZE;
-+			if (vec_size + size > KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE)
-+				return -EMSGSIZE;
-+			if (ptr && unlikely(!access_ok(VERIFY_READ, ptr, size)))
-+				return -EFAULT;
-+
-+			if (ptr || size % 8) /* data or padding */
-+				++n_parts;
-+			break;
-+		}
-+		case KDBUS_ITEM_PAYLOAD_MEMFD: {
-+			u64 start = item->memfd.start;
-+			u64 size = item->memfd.size;
-+
-+			if (start + size < start)
-+				return -EMSGSIZE;
-+			if (n_memfds >= KDBUS_MSG_MAX_MEMFD_ITEMS)
-+				return -E2BIG;
-+
-+			++n_memfds;
-+			if (size % 8) /* vec-padding required */
-+				++n_parts;
-+			break;
-+		}
-+		case KDBUS_ITEM_FDS: {
-+			if (fds)
-+				return -EEXIST;
-+
-+			fds = item;
-+			n_fds = KDBUS_ITEM_PAYLOAD_SIZE(item) / sizeof(int);
-+			if (n_fds > KDBUS_CONN_MAX_FDS_PER_USER)
-+				return -EMFILE;
-+
-+			break;
-+		}
-+		case KDBUS_ITEM_BLOOM_FILTER: {
-+			u64 bloom_size;
-+
-+			if (bloom)
-+				return -EEXIST;
-+
-+			bloom = item;
-+			bloom_size = KDBUS_ITEM_PAYLOAD_SIZE(item) -
-+				     offsetof(struct kdbus_bloom_filter, data);
-+			if (!KDBUS_IS_ALIGNED8(bloom_size))
-+				return -EFAULT;
-+			if (bloom_size != bus->bloom.size)
-+				return -EDOM;
-+
-+			break;
-+		}
-+		case KDBUS_ITEM_DST_NAME: {
-+			if (dstname)
-+				return -EEXIST;
-+
-+			dstname = item;
-+			if (!kdbus_name_is_valid(item->str, false))
-+				return -EINVAL;
-+			if (msg->dst_id == KDBUS_DST_ID_BROADCAST)
-+				return -EBADMSG;
-+
-+			break;
-+		}
-+		default:
-+			return -EINVAL;
-+		}
-+	}
-+
-+	/*
-+	 * Step 3:
-+	 * Validate that required items were actually passed, and that no item
-+	 * contradicts the message flags.
-+	 */
-+
-+	/* bloom filters must be attached _iff_ it's a signal */
-+	if (!(msg->flags & KDBUS_MSG_SIGNAL) != !bloom)
-+		return -EBADMSG;
-+	/* destination name is required if no ID is given */
-+	if (msg->dst_id == KDBUS_DST_ID_NAME && !dstname)
-+		return -EDESTADDRREQ;
-+	/* cannot send file-descriptors attached to broadcasts */
-+	if (msg->dst_id == KDBUS_DST_ID_BROADCAST && fds)
-+		return -ENOTUNIQ;
-+
-+	*out_n_memfds = n_memfds;
-+	*out_n_fds = n_fds;
-+	*out_n_parts = n_parts;
-+
-+	return 0;
-+}
-+
-+static bool kdbus_staging_merge_vecs(struct kdbus_staging *staging,
-+				     struct kdbus_item **prev_item,
-+				     struct iovec **prev_vec,
-+				     const struct kdbus_item *merge)
-+{
-+	void __user *ptr = (void __user *)KDBUS_PTR(merge->vec.address);
-+	u64 padding = merge->vec.size % 8;
-+	struct kdbus_item *prev = *prev_item;
-+	struct iovec *vec = *prev_vec;
-+
-+	/* XXX: merging is disabled so far */
-+	if (0 && prev && prev->type == KDBUS_ITEM_PAYLOAD_OFF &&
-+	    !merge->vec.address == !prev->vec.address) {
-+		/*
-+		 * If we merge two VECs, we can always drop the second
-+		 * PAYLOAD_VEC item. Hence, include its size in the previous
-+		 * one.
-+		 */
-+		prev->vec.size += merge->vec.size;
-+
-+		if (ptr) {
-+			/*
-+			 * If we merge two data VECs, we need two iovecs to copy
-+			 * the data. But the items can be easily merged by
-+			 * summing their lengths.
-+			 */
-+			vec = &staging->parts[staging->n_parts++];
-+			vec->iov_len = merge->vec.size;
-+			vec->iov_base = ptr;
-+			staging->n_payload += vec->iov_len;
-+		} else if (padding) {
-+			/*
-+			 * If we merge two 0-vecs with the second 0-vec
-+			 * requiring padding, we need to insert an iovec to copy
-+			 * the 0-padding. We try merging it with the previous
-+			 * 0-padding iovec. This might end up with an
-+			 * iov_len==0, in which case we simply drop the iovec.
-+			 */
-+			if (vec) {
-+				staging->n_payload -= vec->iov_len;
-+				vec->iov_len = prev->vec.size % 8;
-+				if (!vec->iov_len) {
-+					--staging->n_parts;
-+					vec = NULL;
-+				} else {
-+					staging->n_payload += vec->iov_len;
-+				}
-+			} else {
-+				vec = &staging->parts[staging->n_parts++];
-+				vec->iov_len = padding;
-+				vec->iov_base = (char __user *)zeros;
-+				staging->n_payload += vec->iov_len;
-+			}
-+		} else {
-+			/*
-+			 * If we merge two 0-vecs with the second 0-vec having
-+			 * no padding, we know the padding of the first stays
-+			 * the same. Hence, @vec needs no adjustment.
-+			 */
-+		}
-+
-+		/* successfully merged with previous item */
-+		merge = prev;
-+	} else {
-+		/*
-+		 * If we cannot merge the payload item with the previous one,
-+		 * we simply insert a new iovec for the data/padding.
-+		 */
-+		if (ptr) {
-+			vec = &staging->parts[staging->n_parts++];
-+			vec->iov_len = merge->vec.size;
-+			vec->iov_base = ptr;
-+			staging->n_payload += vec->iov_len;
-+		} else if (padding) {
-+			vec = &staging->parts[staging->n_parts++];
-+			vec->iov_len = padding;
-+			vec->iov_base = (char __user *)zeros;
-+			staging->n_payload += vec->iov_len;
-+		} else {
-+			vec = NULL;
-+		}
-+	}
-+
-+	*prev_item = (struct kdbus_item *)merge;
-+	*prev_vec = vec;
-+
-+	return merge == prev;
-+}
-+
-+static int kdbus_staging_import(struct kdbus_staging *staging)
-+{
-+	struct kdbus_item *it, *item, *last, *prev_payload;
-+	struct kdbus_gaps *gaps = staging->gaps;
-+	struct kdbus_msg *msg = staging->msg;
-+	struct iovec *part, *prev_part;
-+	bool drop_item;
-+
-+	drop_item = false;
-+	last = NULL;
-+	prev_payload = NULL;
-+	prev_part = NULL;
-+
-+	/*
-+	 * We modify msg->items along the way; make sure to use @item as offset
-+	 * to the next item (instead of the iterator @it).
-+	 */
-+	for (it = item = msg->items;
-+	     it >= msg->items &&
-+	             (u8 *)it < (u8 *)msg + msg->size &&
-+	             (u8 *)it + it->size <= (u8 *)msg + msg->size; ) {
-+		/*
-+		 * If we dropped items along the way, move current item to
-+		 * front. We must not access @it afterwards, but use @item
-+		 * instead!
-+		 */
-+		if (it != item)
-+			memmove(item, it, it->size);
-+		it = (void *)((u8 *)it + KDBUS_ALIGN8(item->size));
-+
-+		switch (item->type) {
-+		case KDBUS_ITEM_PAYLOAD_VEC: {
-+			size_t offset = staging->n_payload;
-+
-+			if (kdbus_staging_merge_vecs(staging, &prev_payload,
-+						     &prev_part, item)) {
-+				drop_item = true;
-+			} else if (item->vec.address) {
-+				/* real offset is patched later on */
-+				item->type = KDBUS_ITEM_PAYLOAD_OFF;
-+				item->vec.offset = offset;
-+			} else {
-+				item->type = KDBUS_ITEM_PAYLOAD_OFF;
-+				item->vec.offset = ~0ULL;
-+			}
-+
-+			break;
-+		}
-+		case KDBUS_ITEM_PAYLOAD_MEMFD: {
-+			struct file *f;
-+
-+			f = kdbus_get_memfd(&item->memfd);
-+			if (IS_ERR(f))
-+				return PTR_ERR(f);
-+
-+			gaps->memfd_files[gaps->n_memfds] = f;
-+			gaps->memfd_offsets[gaps->n_memfds] =
-+					(u8 *)&item->memfd.fd - (u8 *)msg;
-+			++gaps->n_memfds;
-+
-+			/* memfds cannot be merged */
-+			prev_payload = item;
-+			prev_part = NULL;
-+
-+			/* insert padding to make following VECs aligned */
-+			if (item->memfd.size % 8) {
-+				part = &staging->parts[staging->n_parts++];
-+				part->iov_len = item->memfd.size % 8;
-+				part->iov_base = (char __user *)zeros;
-+				staging->n_payload += part->iov_len;
-+			}
-+
-+			break;
-+		}
-+		case KDBUS_ITEM_FDS: {
-+			size_t i, n_fds;
-+
-+			n_fds = KDBUS_ITEM_PAYLOAD_SIZE(item) / sizeof(int);
-+			for (i = 0; i < n_fds; ++i) {
-+				struct file *f;
-+
-+				f = kdbus_get_fd(item->fds[i]);
-+				if (IS_ERR(f))
-+					return PTR_ERR(f);
-+
-+				gaps->fd_files[gaps->n_fds++] = f;
-+			}
-+
-+			gaps->fd_offset = (u8 *)item->fds - (u8 *)msg;
-+
-+			break;
-+		}
-+		case KDBUS_ITEM_BLOOM_FILTER:
-+			staging->bloom_filter = &item->bloom_filter;
-+			break;
-+		case KDBUS_ITEM_DST_NAME:
-+			staging->dst_name = item->str;
-+			break;
-+		}
-+
-+		/* drop item if we merged it with a previous one */
-+		if (drop_item) {
-+			drop_item = false;
-+		} else {
-+			last = item;
-+			item = KDBUS_ITEM_NEXT(item);
-+		}
-+	}
-+
-+	/* adjust message size regarding dropped items */
-+	msg->size = offsetof(struct kdbus_msg, items);
-+	if (last)
-+		msg->size += ((u8 *)last - (u8 *)msg->items) + last->size;
-+
-+	return 0;
-+}
-+
-+static void kdbus_staging_reserve(struct kdbus_staging *staging)
-+{
-+	struct iovec *part;
-+
-+	part = &staging->parts[staging->n_parts++];
-+	part->iov_base = (void __user *)zeros;
-+	part->iov_len = 0;
-+}
-+
-+static struct kdbus_staging *kdbus_staging_new(struct kdbus_bus *bus,
-+					       size_t n_parts,
-+					       size_t msg_extra_size)
-+{
-+	const size_t reserved_parts = 5; /* see below for explanation */
-+	struct kdbus_staging *staging;
-+	int ret;
-+
-+	n_parts += reserved_parts;
-+
-+	staging = kzalloc(sizeof(*staging) + n_parts * sizeof(*staging->parts) +
-+			  msg_extra_size, GFP_TEMPORARY);
-+	if (!staging)
-+		return ERR_PTR(-ENOMEM);
-+
-+	staging->msg_seqnum = atomic64_inc_return(&bus->last_message_id);
-+	staging->n_parts = 0; /* we reserve n_parts, but don't enforce them */
-+	staging->parts = (void *)(staging + 1);
-+
-+	if (msg_extra_size) /* if requested, allocate message, too */
-+		staging->msg = (void *)((u8 *)staging->parts +
-+				        n_parts * sizeof(*staging->parts));
-+
-+	staging->meta_proc = kdbus_meta_proc_new();
-+	if (IS_ERR(staging->meta_proc)) {
-+		ret = PTR_ERR(staging->meta_proc);
-+		staging->meta_proc = NULL;
-+		goto error;
-+	}
-+
-+	staging->meta_conn = kdbus_meta_conn_new();
-+	if (IS_ERR(staging->meta_conn)) {
-+		ret = PTR_ERR(staging->meta_conn);
-+		staging->meta_conn = NULL;
-+		goto error;
-+	}
-+
-+	/*
-+	 * Prepare iovecs to copy the message into the target pool. We use the
-+	 * following iovecs:
-+	 *   * iovec to copy "kdbus_msg.size"
-+	 *   * iovec to copy "struct kdbus_msg" (minus size) plus items
-+	 *   * iovec for possible padding after the items
-+	 *   * iovec for metadata items
-+	 *   * iovec for possible padding after the metadata items
-+	 *
-+	 * Make sure to update @reserved_parts if you add more parts here.
-+	 */
-+
-+	kdbus_staging_reserve(staging); /* msg.size */
-+	kdbus_staging_reserve(staging); /* msg (minus msg.size) plus items */
-+	kdbus_staging_reserve(staging); /* msg padding */
-+	kdbus_staging_reserve(staging); /* meta */
-+	kdbus_staging_reserve(staging); /* meta padding */
-+
-+	return staging;
-+
-+error:
-+	kdbus_staging_free(staging);
-+	return ERR_PTR(ret);
-+}
-+
-+struct kdbus_staging *kdbus_staging_new_kernel(struct kdbus_bus *bus,
-+					       u64 dst, u64 cookie_timeout,
-+					       size_t it_size, size_t it_type)
-+{
-+	struct kdbus_staging *staging;
-+	size_t size;
-+
-+	size = offsetof(struct kdbus_msg, items) +
-+	       KDBUS_ITEM_HEADER_SIZE + it_size;
-+
-+	staging = kdbus_staging_new(bus, 0, KDBUS_ALIGN8(size));
-+	if (IS_ERR(staging))
-+		return ERR_CAST(staging);
-+
-+	staging->msg->size = size;
-+	staging->msg->flags = (dst == KDBUS_DST_ID_BROADCAST) ?
-+							KDBUS_MSG_SIGNAL : 0;
-+	staging->msg->dst_id = dst;
-+	staging->msg->src_id = KDBUS_SRC_ID_KERNEL;
-+	staging->msg->payload_type = KDBUS_PAYLOAD_KERNEL;
-+	staging->msg->cookie_reply = cookie_timeout;
-+	staging->notify = staging->msg->items;
-+	staging->notify->size = KDBUS_ITEM_HEADER_SIZE + it_size;
-+	staging->notify->type = it_type;
-+
-+	return staging;
-+}
-+
-+struct kdbus_staging *kdbus_staging_new_user(struct kdbus_bus *bus,
-+					     struct kdbus_cmd_send *cmd,
-+					     struct kdbus_msg *msg)
-+{
-+	const size_t reserved_parts = 1; /* see below for explanation */
-+	size_t n_memfds, n_fds, n_parts;
-+	struct kdbus_staging *staging;
-+	int ret;
-+
-+	/*
-+	 * Examine user-supplied message and figure out how many resources we
-+	 * need to allocate in our staging area. This requires us to iterate
-+	 * the message twice, but saves us from re-allocating our resources
-+	 * all the time.
-+	 */
-+
-+	ret = kdbus_msg_examine(msg, bus, cmd, &n_memfds, &n_fds, &n_parts);
-+	if (ret < 0)
-+		return ERR_PTR(ret);
-+
-+	n_parts += reserved_parts;
-+
-+	/*
-+	 * Allocate staging area with the number of required resources. Make
-+	 * sure that we have enough iovecs for all required parts pre-allocated
-+	 * so this will hopefully be the only memory allocation for this
-+	 * message transaction.
-+	 */
-+
-+	staging = kdbus_staging_new(bus, n_parts, 0);
-+	if (IS_ERR(staging))
-+		return ERR_CAST(staging);
-+
-+	staging->msg = msg;
-+
-+	/*
-+	 * If the message contains memfds or fd items, we need to remember some
-+	 * state so we can fill in the requested information at RECV time.
-+	 * File-descriptors cannot be passed at SEND time. Hence, allocate a
-+	 * gaps-object to remember that state. That gaps object is linked to
-+	 * from the staging area, but will also be linked to from the message
-+	 * queue of each peer. Hence, each receiver owns a reference to it, and
-+	 * it will later be used to fill the 'gaps' in the message that couldn't
-+	 * be filled at SEND time.
-+	 * Note that the 'gaps' object is read-only once the staging-allocator
-+	 * returns. There might be connections receiving a queued message while
-+	 * the sender still broadcasts the message to other receivers.
-+	 */
-+
-+	if (n_memfds > 0 || n_fds > 0) {
-+		staging->gaps = kdbus_gaps_new(n_memfds, n_fds);
-+		if (IS_ERR(staging->gaps)) {
-+			ret = PTR_ERR(staging->gaps);
-+			staging->gaps = NULL;
-+			kdbus_staging_free(staging);
-+			return ERR_PTR(ret);
-+		}
-+	}
-+
-+	/*
-+	 * kdbus_staging_new() already reserves parts for message setup. For
-+	 * user-supplied messages, we add the following iovecs:
-+	 *   ... variable number of iovecs for payload ...
-+	 *   * final iovec for possible padding of payload
-+	 *
-+	 * Make sure to update @reserved_parts if you add more parts here.
-+	 */
-+
-+	ret = kdbus_staging_import(staging); /* payload */
-+	kdbus_staging_reserve(staging); /* payload padding */
-+
-+	if (ret < 0)
-+		goto error;
-+
-+	return staging;
-+
-+error:
-+	kdbus_staging_free(staging);
-+	return ERR_PTR(ret);
-+}
-+
-+struct kdbus_staging *kdbus_staging_free(struct kdbus_staging *staging)
-+{
-+	if (!staging)
-+		return NULL;
-+
-+	kdbus_meta_conn_unref(staging->meta_conn);
-+	kdbus_meta_proc_unref(staging->meta_proc);
-+	kdbus_gaps_unref(staging->gaps);
-+	kfree(staging);
-+
-+	return NULL;
-+}
-+
-+static int kdbus_staging_collect_metadata(struct kdbus_staging *staging,
-+					  struct kdbus_conn *src,
-+					  struct kdbus_conn *dst,
-+					  u64 *out_attach)
-+{
-+	u64 attach;
-+	int ret;
-+
-+	if (src)
-+		attach = kdbus_meta_msg_mask(src, dst);
-+	else
-+		attach = KDBUS_ATTACH_TIMESTAMP; /* metadata for kernel msgs */
-+
-+	if (src && !src->meta_fake) {
-+		ret = kdbus_meta_proc_collect(staging->meta_proc, attach);
-+		if (ret < 0)
-+			return ret;
-+	}
-+
-+	ret = kdbus_meta_conn_collect(staging->meta_conn, src,
-+				      staging->msg_seqnum, attach);
-+	if (ret < 0)
-+		return ret;
-+
-+	*out_attach = attach;
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_staging_emit() - emit linearized message in target pool
-+ * @staging:		staging object to create message from
-+ * @src:		sender of the message (or NULL)
-+ * @dst:		target connection to allocate message for
-+ *
-+ * This allocates a pool-slice for @dst and copies the message provided by
-+ * @staging into it. The new slice is then returned to the caller for further
-+ * processing. It's not linked into any queue, yet.
-+ *
-+ * Return: Newly allocated slice or ERR_PTR on failure.
-+ */
-+struct kdbus_pool_slice *kdbus_staging_emit(struct kdbus_staging *staging,
-+					    struct kdbus_conn *src,
-+					    struct kdbus_conn *dst)
-+{
-+	struct kdbus_item *item, *meta_items = NULL;
-+	struct kdbus_pool_slice *slice = NULL;
-+	size_t off, size, meta_size;
-+	struct iovec *v;
-+	u64 attach, msg_size;
-+	int ret;
-+
-+	/*
-+	 * Step 1:
-+	 * Collect metadata from @src depending on the attach-flags allowed for
-+	 * @dst. Translate it into the namespaces pinned by @dst.
-+	 */
-+
-+	ret = kdbus_staging_collect_metadata(staging, src, dst, &attach);
-+	if (ret < 0)
-+		goto error;
-+
-+	ret = kdbus_meta_emit(staging->meta_proc, NULL, staging->meta_conn,
-+			      dst, attach, &meta_items, &meta_size);
-+	if (ret < 0)
-+		goto error;
-+
-+	/*
-+	 * Step 2:
-+	 * Setup iovecs for the message. See kdbus_staging_new() for allocation
-+	 * of those iovecs. All reserved iovecs have been initialized with
-+	 * iov_len=0 + iov_base=zeros. Furthermore, the iovecs to copy the
-+	 * actual message payload have already been initialized and need not be
-+	 * touched.
-+	 */
-+
-+	v = staging->parts;
-+	msg_size = staging->msg->size;
-+
-+	/* msg.size */
-+	v->iov_len = sizeof(msg_size);
-+	v->iov_base = (void __user *)&msg_size;
-+	++v;
-+
-+	/* msg (after msg.size) plus items */
-+	v->iov_len = staging->msg->size - sizeof(staging->msg->size);
-+	v->iov_base = (void __user *)((u8 *)staging->msg +
-+				      sizeof(staging->msg->size));
-+	++v;
-+
-+	/* padding after msg */
-+	v->iov_len = KDBUS_ALIGN8(staging->msg->size) - staging->msg->size;
-+	v->iov_base = (void __user *)zeros;
-+	++v;
-+
-+	if (meta_size > 0) {
-+		/* metadata items */
-+		v->iov_len = meta_size;
-+		v->iov_base = (void __user *)meta_items;
-+		++v;
-+
-+		/* padding after metadata */
-+		v->iov_len = KDBUS_ALIGN8(meta_size) - meta_size;
-+		v->iov_base = (void __user *)zeros;
-+		++v;
-+
-+		msg_size = KDBUS_ALIGN8(msg_size) + meta_size;
-+	} else {
-+		/* metadata items */
-+		v->iov_len = 0;
-+		v->iov_base = (void __user *)zeros;
-+		++v;
-+
-+		/* padding after metadata */
-+		v->iov_len = 0;
-+		v->iov_base = (void __user *)zeros;
-+		++v;
-+	}
-+
-+	/* ... payload iovecs are already filled in ... */
-+
-+	/* compute overall size and fill in padding after payload */
-+	size = KDBUS_ALIGN8(msg_size);
-+
-+	if (staging->n_payload > 0) {
-+		size += staging->n_payload;
-+
-+		v = &staging->parts[staging->n_parts - 1];
-+		v->iov_len = KDBUS_ALIGN8(size) - size;
-+		v->iov_base = (void __user *)zeros;
-+
-+		size = KDBUS_ALIGN8(size);
-+	}
-+
-+	/*
-+	 * Step 3:
-+	 * The PAYLOAD_OFF items in the message contain a relative 'offset'
-+	 * field that tells the receiver where to find the actual payload. This
-+	 * offset is relative to the start of the message, and as such depends
-+	 * on the size of the metadata items we inserted. This size is variable
-+	 * and changes for each peer we send the message to. Hence, we remember
-+	 * the last relative offset that was used to calculate the 'offset'
-+	 * fields. For each message, we re-calculate it and patch all items, in
-+	 * case it changed.
-+	 */
-+
-+	off = KDBUS_ALIGN8(msg_size);
-+
-+	if (off != staging->i_payload) {
-+		KDBUS_ITEMS_FOREACH(item, staging->msg->items,
-+				    KDBUS_ITEMS_SIZE(staging->msg, items)) {
-+			if (item->type != KDBUS_ITEM_PAYLOAD_OFF)
-+				continue;
-+
-+			item->vec.offset -= staging->i_payload;
-+			item->vec.offset += off;
-+		}
-+
-+		staging->i_payload = off;
-+	}
-+
-+	/*
-+	 * Step 4:
-+	 * Allocate pool slice and copy over all data. Make sure to properly
-+	 * account against the user quota.
-+	 */
-+
-+	ret = kdbus_conn_quota_inc(dst, src ? src->user : NULL, size,
-+				   staging->gaps ? staging->gaps->n_fds : 0);
-+	if (ret < 0)
-+		goto error;
-+
-+	slice = kdbus_pool_slice_alloc(dst->pool, size, true);
-+	if (IS_ERR(slice)) {
-+		ret = PTR_ERR(slice);
-+		slice = NULL;
-+		goto error;
-+	}
-+
-+	WARN_ON(kdbus_pool_slice_size(slice) != size);
-+
-+	ret = kdbus_pool_slice_copy_iovec(slice, 0, staging->parts,
-+					  staging->n_parts, size);
-+	if (ret < 0)
-+		goto error;
-+
-+	/* all done, return slice to caller */
-+	goto exit;
-+
-+error:
-+	if (slice)
-+		kdbus_conn_quota_dec(dst, src ? src->user : NULL, size,
-+				     staging->gaps ? staging->gaps->n_fds : 0);
-+	kdbus_pool_slice_release(slice);
-+	slice = ERR_PTR(ret);
-+exit:
-+	kfree(meta_items);
-+	return slice;
-+}
-diff --git a/ipc/kdbus/message.h b/ipc/kdbus/message.h
-new file mode 100644
-index 0000000..298f9c9
---- /dev/null
-+++ b/ipc/kdbus/message.h
-@@ -0,0 +1,120 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_MESSAGE_H
-+#define __KDBUS_MESSAGE_H
-+
-+#include <linux/fs.h>
-+#include <linux/kref.h>
-+#include <uapi/linux/kdbus.h>
-+
-+struct kdbus_bus;
-+struct kdbus_conn;
-+struct kdbus_meta_conn;
-+struct kdbus_meta_proc;
-+struct kdbus_pool_slice;
-+
-+/**
-+ * struct kdbus_gaps - gaps in message to be filled later
-+ * @kref:		Reference counter
-+ * @n_memfds:		Number of memfds
-+ * @memfd_offsets:	Offsets of kdbus_memfd items in target slice
-+ * @n_fds:		Number of fds
-+ * @fd_files:		Array of sent fds
-+ * @fd_offset:		Offset of fd-array in target slice
-+ *
-+ * The 'gaps' object is used to track data that is needed to fill gaps in a
-+ * message at RECV time. Usually, we try to compile the whole message at SEND
-+ * time. This has the advantage that we don't have to cache any information and
-+ * can keep the memory consumption small. Furthermore, all copy operations can
-+ * be combined into a single function call, which speeds up transactions
-+ * considerably.
-+ * However, things like file-descriptors can only be fully installed at RECV
-+ * time. The gaps object tracks this data and pins it until a message is
-+ * received. The gaps object is shared between all receivers of the same
-+ * message.
-+ */
-+struct kdbus_gaps {
-+	struct kref kref;
-+
-+	/* state tracking for KDBUS_ITEM_PAYLOAD_MEMFD entries */
-+	size_t n_memfds;
-+	u64 *memfd_offsets;
-+	struct file **memfd_files;
-+
-+	/* state tracking for KDBUS_ITEM_FDS */
-+	size_t n_fds;
-+	struct file **fd_files;
-+	u64 fd_offset;
-+};
-+
-+struct kdbus_gaps *kdbus_gaps_ref(struct kdbus_gaps *gaps);
-+struct kdbus_gaps *kdbus_gaps_unref(struct kdbus_gaps *gaps);
-+int kdbus_gaps_install(struct kdbus_gaps *gaps, struct kdbus_pool_slice *slice,
-+		       bool *out_incomplete);
-+
-+/**
-+ * struct kdbus_staging - staging area to import messages
-+ * @msg:		User-supplied message
-+ * @gaps:		Gaps-object created during import (or NULL if empty)
-+ * @msg_seqnum:		Message sequence number
-+ * @notify_entry:	Entry into list of kernel-generated notifications
-+ * @i_payload:		Current relative index of start of payload
-+ * @n_payload:		Total number of bytes needed for payload
-+ * @n_parts:		Number of parts
-+ * @parts:		Array of iovecs that make up the whole message
-+ * @meta_proc:		Process metadata of the sender (or NULL if empty)
-+ * @meta_conn:		Connection metadata of the sender (or NULL if empty)
-+ * @bloom_filter:	Pointer to the bloom-item in @msg, or NULL
-+ * @dst_name:		Pointer to the dst-name-item in @msg, or NULL
-+ * @notify:		Pointer to the notification item in @msg, or NULL
-+ *
-+ * The kdbus_staging object is a temporary staging area to import user-supplied
-+ * messages into the kernel. It is only used during SEND and dropped once the
-+ * message is queued. Any data that cannot be collected during SEND is
-+ * collected in a kdbus_gaps object and attached to the message queue.
-+ */
-+struct kdbus_staging {
-+	struct kdbus_msg *msg;
-+	struct kdbus_gaps *gaps;
-+	u64 msg_seqnum;
-+	struct list_head notify_entry;
-+
-+	/* crafted iovecs to copy the message */
-+	size_t i_payload;
-+	size_t n_payload;
-+	size_t n_parts;
-+	struct iovec *parts;
-+
-+	/* metadata state */
-+	struct kdbus_meta_proc *meta_proc;
-+	struct kdbus_meta_conn *meta_conn;
-+
-+	/* cached pointers into @msg */
-+	const struct kdbus_bloom_filter *bloom_filter;
-+	const char *dst_name;
-+	struct kdbus_item *notify;
-+};
-+
-+struct kdbus_staging *kdbus_staging_new_kernel(struct kdbus_bus *bus,
-+					       u64 dst, u64 cookie_timeout,
-+					       size_t it_size, size_t it_type);
-+struct kdbus_staging *kdbus_staging_new_user(struct kdbus_bus *bus,
-+					     struct kdbus_cmd_send *cmd,
-+					     struct kdbus_msg *msg);
-+struct kdbus_staging *kdbus_staging_free(struct kdbus_staging *staging);
-+struct kdbus_pool_slice *kdbus_staging_emit(struct kdbus_staging *staging,
-+					    struct kdbus_conn *src,
-+					    struct kdbus_conn *dst);
-+
-+#endif
-diff --git a/ipc/kdbus/metadata.c b/ipc/kdbus/metadata.c
-new file mode 100644
-index 0000000..71ca475
---- /dev/null
-+++ b/ipc/kdbus/metadata.c
-@@ -0,0 +1,1347 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/audit.h>
-+#include <linux/capability.h>
-+#include <linux/cgroup.h>
-+#include <linux/cred.h>
-+#include <linux/file.h>
-+#include <linux/fs_struct.h>
-+#include <linux/init.h>
-+#include <linux/kref.h>
-+#include <linux/mutex.h>
-+#include <linux/sched.h>
-+#include <linux/security.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <linux/uidgid.h>
-+#include <linux/uio.h>
-+#include <linux/user_namespace.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "endpoint.h"
-+#include "item.h"
-+#include "message.h"
-+#include "metadata.h"
-+#include "names.h"
-+
-+/**
-+ * struct kdbus_meta_proc - Process metadata
-+ * @kref:		Reference counting
-+ * @lock:		Object lock
-+ * @collected:		Bitmask of collected items
-+ * @valid:		Bitmask of collected and valid items
-+ * @cred:		Credentials
-+ * @pid:		PID of process
-+ * @tgid:		TGID of process
-+ * @ppid:		PPID of process
-+ * @tid_comm:		TID comm line
-+ * @pid_comm:		PID comm line
-+ * @exe_path:		Executable path
-+ * @root_path:		Root-FS path
-+ * @cmdline:		Command-line
-+ * @cgroup:		Full cgroup path
-+ * @seclabel:		Seclabel
-+ * @audit_loginuid:	Audit login-UID
-+ * @audit_sessionid:	Audit session-ID
-+ */
-+struct kdbus_meta_proc {
-+	struct kref kref;
-+	struct mutex lock;
-+	u64 collected;
-+	u64 valid;
-+
-+	/* KDBUS_ITEM_CREDS */
-+	/* KDBUS_ITEM_AUXGROUPS */
-+	/* KDBUS_ITEM_CAPS */
-+	const struct cred *cred;
-+
-+	/* KDBUS_ITEM_PIDS */
-+	struct pid *pid;
-+	struct pid *tgid;
-+	struct pid *ppid;
-+
-+	/* KDBUS_ITEM_TID_COMM */
-+	char tid_comm[TASK_COMM_LEN];
-+	/* KDBUS_ITEM_PID_COMM */
-+	char pid_comm[TASK_COMM_LEN];
-+
-+	/* KDBUS_ITEM_EXE */
-+	struct path exe_path;
-+	struct path root_path;
-+
-+	/* KDBUS_ITEM_CMDLINE */
-+	char *cmdline;
-+
-+	/* KDBUS_ITEM_CGROUP */
-+	char *cgroup;
-+
-+	/* KDBUS_ITEM_SECLABEL */
-+	char *seclabel;
-+
-+	/* KDBUS_ITEM_AUDIT */
-+	kuid_t audit_loginuid;
-+	unsigned int audit_sessionid;
-+};
-+
-+/**
-+ * struct kdbus_meta_conn - Connection metadata
-+ * @kref:		Reference counting
-+ * @lock:		Object lock
-+ * @collected:		Bitmask of collected items
-+ * @valid:		Bitmask of collected and valid items
-+ * @ts:			Timestamp values
-+ * @owned_names_items:	Serialized items for owned names
-+ * @owned_names_size:	Size of @owned_names_items
-+ * @conn_description:	Connection description
-+ */
-+struct kdbus_meta_conn {
-+	struct kref kref;
-+	struct mutex lock;
-+	u64 collected;
-+	u64 valid;
-+
-+	/* KDBUS_ITEM_TIMESTAMP */
-+	struct kdbus_timestamp ts;
-+
-+	/* KDBUS_ITEM_OWNED_NAME */
-+	struct kdbus_item *owned_names_items;
-+	size_t owned_names_size;
-+
-+	/* KDBUS_ITEM_CONN_DESCRIPTION */
-+	char *conn_description;
-+};
-+
-+/* fixed size equivalent of "kdbus_caps" */
-+struct kdbus_meta_caps {
-+	u32 last_cap;
-+	struct {
-+		u32 caps[_KERNEL_CAPABILITY_U32S];
-+	} set[4];
-+};
-+
-+/**
-+ * kdbus_meta_proc_new() - Create process metadata object
-+ *
-+ * Return: Pointer to new object on success, ERR_PTR on failure.
-+ */
-+struct kdbus_meta_proc *kdbus_meta_proc_new(void)
-+{
-+	struct kdbus_meta_proc *mp;
-+
-+	mp = kzalloc(sizeof(*mp), GFP_KERNEL);
-+	if (!mp)
-+		return ERR_PTR(-ENOMEM);
-+
-+	kref_init(&mp->kref);
-+	mutex_init(&mp->lock);
-+
-+	return mp;
-+}
-+
-+static void kdbus_meta_proc_free(struct kref *kref)
-+{
-+	struct kdbus_meta_proc *mp = container_of(kref, struct kdbus_meta_proc,
-+						  kref);
-+
-+	path_put(&mp->exe_path);
-+	path_put(&mp->root_path);
-+	if (mp->cred)
-+		put_cred(mp->cred);
-+	put_pid(mp->ppid);
-+	put_pid(mp->tgid);
-+	put_pid(mp->pid);
-+
-+	kfree(mp->seclabel);
-+	kfree(mp->cmdline);
-+	kfree(mp->cgroup);
-+	kfree(mp);
-+}
-+
-+/**
-+ * kdbus_meta_proc_ref() - Gain reference
-+ * @mp:		Process metadata object
-+ *
-+ * Return: @mp is returned
-+ */
-+struct kdbus_meta_proc *kdbus_meta_proc_ref(struct kdbus_meta_proc *mp)
-+{
-+	if (mp)
-+		kref_get(&mp->kref);
-+	return mp;
-+}
-+
-+/**
-+ * kdbus_meta_proc_unref() - Drop reference
-+ * @mp:		Process metadata object
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_meta_proc *kdbus_meta_proc_unref(struct kdbus_meta_proc *mp)
-+{
-+	if (mp)
-+		kref_put(&mp->kref, kdbus_meta_proc_free);
-+	return NULL;
-+}
-+
-+static void kdbus_meta_proc_collect_pids(struct kdbus_meta_proc *mp)
-+{
-+	struct task_struct *parent;
-+
-+	mp->pid = get_pid(task_pid(current));
-+	mp->tgid = get_pid(task_tgid(current));
-+
-+	rcu_read_lock();
-+	parent = rcu_dereference(current->real_parent);
-+	mp->ppid = get_pid(task_tgid(parent));
-+	rcu_read_unlock();
-+
-+	mp->valid |= KDBUS_ATTACH_PIDS;
-+}
-+
-+static void kdbus_meta_proc_collect_tid_comm(struct kdbus_meta_proc *mp)
-+{
-+	get_task_comm(mp->tid_comm, current);
-+	mp->valid |= KDBUS_ATTACH_TID_COMM;
-+}
-+
-+static void kdbus_meta_proc_collect_pid_comm(struct kdbus_meta_proc *mp)
-+{
-+	get_task_comm(mp->pid_comm, current->group_leader);
-+	mp->valid |= KDBUS_ATTACH_PID_COMM;
-+}
-+
-+static void kdbus_meta_proc_collect_exe(struct kdbus_meta_proc *mp)
-+{
-+	struct file *exe_file;
-+
-+	rcu_read_lock();
-+	exe_file = rcu_dereference(current->mm->exe_file);
-+	if (exe_file) {
-+		mp->exe_path = exe_file->f_path;
-+		path_get(&mp->exe_path);
-+		get_fs_root(current->fs, &mp->root_path);
-+		mp->valid |= KDBUS_ATTACH_EXE;
-+	}
-+	rcu_read_unlock();
-+}
-+
-+static int kdbus_meta_proc_collect_cmdline(struct kdbus_meta_proc *mp)
-+{
-+	struct mm_struct *mm = current->mm;
-+	char *cmdline;
-+
-+	if (!mm->arg_end)
-+		return 0;
-+
-+	cmdline = strndup_user((const char __user *)mm->arg_start,
-+			       mm->arg_end - mm->arg_start);
-+	if (IS_ERR(cmdline))
-+		return PTR_ERR(cmdline);
-+
-+	mp->cmdline = cmdline;
-+	mp->valid |= KDBUS_ATTACH_CMDLINE;
-+
-+	return 0;
-+}
-+
-+static int kdbus_meta_proc_collect_cgroup(struct kdbus_meta_proc *mp)
-+{
-+#ifdef CONFIG_CGROUPS
-+	void *page;
-+	char *s;
-+
-+	page = (void *)__get_free_page(GFP_TEMPORARY);
-+	if (!page)
-+		return -ENOMEM;
-+
-+	s = task_cgroup_path(current, page, PAGE_SIZE);
-+	if (s) {
-+		mp->cgroup = kstrdup(s, GFP_KERNEL);
-+		if (!mp->cgroup) {
-+			free_page((unsigned long)page);
-+			return -ENOMEM;
-+		}
-+	}
-+
-+	free_page((unsigned long)page);
-+	mp->valid |= KDBUS_ATTACH_CGROUP;
-+#endif
-+
-+	return 0;
-+}
-+
-+static int kdbus_meta_proc_collect_seclabel(struct kdbus_meta_proc *mp)
-+{
-+#ifdef CONFIG_SECURITY
-+	char *ctx = NULL;
-+	u32 sid, len;
-+	int ret;
-+
-+	security_task_getsecid(current, &sid);
-+	ret = security_secid_to_secctx(sid, &ctx, &len);
-+	if (ret < 0) {
-+		/*
-+		 * EOPNOTSUPP means no security module is active,
-+		 * let's skip adding the seclabel then. This effectively
-+		 * drops the SECLABEL item.
-+		 */
-+		return (ret == -EOPNOTSUPP) ? 0 : ret;
-+	}
-+
-+	mp->seclabel = kstrdup(ctx, GFP_KERNEL);
-+	security_release_secctx(ctx, len);
-+	if (!mp->seclabel)
-+		return -ENOMEM;
-+
-+	mp->valid |= KDBUS_ATTACH_SECLABEL;
-+#endif
-+
-+	return 0;
-+}
-+
-+static void kdbus_meta_proc_collect_audit(struct kdbus_meta_proc *mp)
-+{
-+#ifdef CONFIG_AUDITSYSCALL
-+	mp->audit_loginuid = audit_get_loginuid(current);
-+	mp->audit_sessionid = audit_get_sessionid(current);
-+	mp->valid |= KDBUS_ATTACH_AUDIT;
-+#endif
-+}
-+
-+/**
-+ * kdbus_meta_proc_collect() - Collect process metadata
-+ * @mp:		Process metadata object
-+ * @what:	Attach flags to collect
-+ *
-+ * This collects process metadata from current and saves it in @mp.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_meta_proc_collect(struct kdbus_meta_proc *mp, u64 what)
-+{
-+	int ret;
-+
-+	if (!mp || !(what & (KDBUS_ATTACH_CREDS |
-+			     KDBUS_ATTACH_PIDS |
-+			     KDBUS_ATTACH_AUXGROUPS |
-+			     KDBUS_ATTACH_TID_COMM |
-+			     KDBUS_ATTACH_PID_COMM |
-+			     KDBUS_ATTACH_EXE |
-+			     KDBUS_ATTACH_CMDLINE |
-+			     KDBUS_ATTACH_CGROUP |
-+			     KDBUS_ATTACH_CAPS |
-+			     KDBUS_ATTACH_SECLABEL |
-+			     KDBUS_ATTACH_AUDIT)))
-+		return 0;
-+
-+	mutex_lock(&mp->lock);
-+
-+	/* creds, auxgrps and caps share "struct cred" as context */
-+	{
-+		const u64 m_cred = KDBUS_ATTACH_CREDS |
-+				   KDBUS_ATTACH_AUXGROUPS |
-+				   KDBUS_ATTACH_CAPS;
-+
-+		if ((what & m_cred) && !(mp->collected & m_cred)) {
-+			mp->cred = get_current_cred();
-+			mp->valid |= m_cred;
-+			mp->collected |= m_cred;
-+		}
-+	}
-+
-+	if ((what & KDBUS_ATTACH_PIDS) &&
-+	    !(mp->collected & KDBUS_ATTACH_PIDS)) {
-+		kdbus_meta_proc_collect_pids(mp);
-+		mp->collected |= KDBUS_ATTACH_PIDS;
-+	}
-+
-+	if ((what & KDBUS_ATTACH_TID_COMM) &&
-+	    !(mp->collected & KDBUS_ATTACH_TID_COMM)) {
-+		kdbus_meta_proc_collect_tid_comm(mp);
-+		mp->collected |= KDBUS_ATTACH_TID_COMM;
-+	}
-+
-+	if ((what & KDBUS_ATTACH_PID_COMM) &&
-+	    !(mp->collected & KDBUS_ATTACH_PID_COMM)) {
-+		kdbus_meta_proc_collect_pid_comm(mp);
-+		mp->collected |= KDBUS_ATTACH_PID_COMM;
-+	}
-+
-+	if ((what & KDBUS_ATTACH_EXE) &&
-+	    !(mp->collected & KDBUS_ATTACH_EXE)) {
-+		kdbus_meta_proc_collect_exe(mp);
-+		mp->collected |= KDBUS_ATTACH_EXE;
-+	}
-+
-+	if ((what & KDBUS_ATTACH_CMDLINE) &&
-+	    !(mp->collected & KDBUS_ATTACH_CMDLINE)) {
-+		ret = kdbus_meta_proc_collect_cmdline(mp);
-+		if (ret < 0)
-+			goto exit_unlock;
-+		mp->collected |= KDBUS_ATTACH_CMDLINE;
-+	}
-+
-+	if ((what & KDBUS_ATTACH_CGROUP) &&
-+	    !(mp->collected & KDBUS_ATTACH_CGROUP)) {
-+		ret = kdbus_meta_proc_collect_cgroup(mp);
-+		if (ret < 0)
-+			goto exit_unlock;
-+		mp->collected |= KDBUS_ATTACH_CGROUP;
-+	}
-+
-+	if ((what & KDBUS_ATTACH_SECLABEL) &&
-+	    !(mp->collected & KDBUS_ATTACH_SECLABEL)) {
-+		ret = kdbus_meta_proc_collect_seclabel(mp);
-+		if (ret < 0)
-+			goto exit_unlock;
-+		mp->collected |= KDBUS_ATTACH_SECLABEL;
-+	}
-+
-+	if ((what & KDBUS_ATTACH_AUDIT) &&
-+	    !(mp->collected & KDBUS_ATTACH_AUDIT)) {
-+		kdbus_meta_proc_collect_audit(mp);
-+		mp->collected |= KDBUS_ATTACH_AUDIT;
-+	}
-+
-+	ret = 0;
-+
-+exit_unlock:
-+	mutex_unlock(&mp->lock);
-+	return ret;
-+}
-+
-+/**
-+ * kdbus_meta_fake_new() - Create fake metadata object
-+ *
-+ * Return: Pointer to new object on success, ERR_PTR on failure.
-+ */
-+struct kdbus_meta_fake *kdbus_meta_fake_new(void)
-+{
-+	struct kdbus_meta_fake *mf;
-+
-+	mf = kzalloc(sizeof(*mf), GFP_KERNEL);
-+	if (!mf)
-+		return ERR_PTR(-ENOMEM);
-+
-+	return mf;
-+}
-+
-+/**
-+ * kdbus_meta_fake_free() - Free fake metadata object
-+ * @mf:		Fake metadata object
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_meta_fake *kdbus_meta_fake_free(struct kdbus_meta_fake *mf)
-+{
-+	if (mf) {
-+		put_pid(mf->ppid);
-+		put_pid(mf->tgid);
-+		put_pid(mf->pid);
-+		kfree(mf->seclabel);
-+		kfree(mf);
-+	}
-+
-+	return NULL;
-+}
-+
-+/**
-+ * kdbus_meta_fake_collect() - Fill fake metadata from faked credentials
-+ * @mf:		Fake metadata object
-+ * @creds:	Creds to set, may be %NULL
-+ * @pids:	PIDs to set, may be %NULL
-+ * @seclabel:	Seclabel to set, may be %NULL
-+ *
-+ * This function takes information stored in @creds, @pids and @seclabel and
-+ * resolves them to kernel-representations, if possible. This call uses the
-+ * current task's namespaces to resolve the given information.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_meta_fake_collect(struct kdbus_meta_fake *mf,
-+			    const struct kdbus_creds *creds,
-+			    const struct kdbus_pids *pids,
-+			    const char *seclabel)
-+{
-+	if (mf->valid)
-+		return -EALREADY;
-+
-+	if (creds) {
-+		struct user_namespace *ns = current_user_ns();
-+
-+		mf->uid		= make_kuid(ns, creds->uid);
-+		mf->euid	= make_kuid(ns, creds->euid);
-+		mf->suid	= make_kuid(ns, creds->suid);
-+		mf->fsuid	= make_kuid(ns, creds->fsuid);
-+
-+		mf->gid		= make_kgid(ns, creds->gid);
-+		mf->egid	= make_kgid(ns, creds->egid);
-+		mf->sgid	= make_kgid(ns, creds->sgid);
-+		mf->fsgid	= make_kgid(ns, creds->fsgid);
-+
-+		if ((creds->uid   != (uid_t)-1 && !uid_valid(mf->uid))   ||
-+		    (creds->euid  != (uid_t)-1 && !uid_valid(mf->euid))  ||
-+		    (creds->suid  != (uid_t)-1 && !uid_valid(mf->suid))  ||
-+		    (creds->fsuid != (uid_t)-1 && !uid_valid(mf->fsuid)) ||
-+		    (creds->gid   != (gid_t)-1 && !gid_valid(mf->gid))   ||
-+		    (creds->egid  != (gid_t)-1 && !gid_valid(mf->egid))  ||
-+		    (creds->sgid  != (gid_t)-1 && !gid_valid(mf->sgid))  ||
-+		    (creds->fsgid != (gid_t)-1 && !gid_valid(mf->fsgid)))
-+			return -EINVAL;
-+
-+		mf->valid |= KDBUS_ATTACH_CREDS;
-+	}
-+
-+	if (pids) {
-+		mf->pid = get_pid(find_vpid(pids->tid));
-+		mf->tgid = get_pid(find_vpid(pids->pid));
-+		mf->ppid = get_pid(find_vpid(pids->ppid));
-+
-+		if ((pids->tid != 0 && !mf->pid) ||
-+		    (pids->pid != 0 && !mf->tgid) ||
-+		    (pids->ppid != 0 && !mf->ppid)) {
-+			put_pid(mf->pid);
-+			put_pid(mf->tgid);
-+			put_pid(mf->ppid);
-+			mf->pid = NULL;
-+			mf->tgid = NULL;
-+			mf->ppid = NULL;
-+			return -EINVAL;
-+		}
-+
-+		mf->valid |= KDBUS_ATTACH_PIDS;
-+	}
-+
-+	if (seclabel) {
-+		mf->seclabel = kstrdup(seclabel, GFP_KERNEL);
-+		if (!mf->seclabel)
-+			return -ENOMEM;
-+
-+		mf->valid |= KDBUS_ATTACH_SECLABEL;
-+	}
-+
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_meta_conn_new() - Create connection metadata object
-+ *
-+ * Return: Pointer to new object on success, ERR_PTR on failure.
-+ */
-+struct kdbus_meta_conn *kdbus_meta_conn_new(void)
-+{
-+	struct kdbus_meta_conn *mc;
-+
-+	mc = kzalloc(sizeof(*mc), GFP_KERNEL);
-+	if (!mc)
-+		return ERR_PTR(-ENOMEM);
-+
-+	kref_init(&mc->kref);
-+	mutex_init(&mc->lock);
-+
-+	return mc;
-+}
-+
-+static void kdbus_meta_conn_free(struct kref *kref)
-+{
-+	struct kdbus_meta_conn *mc =
-+		container_of(kref, struct kdbus_meta_conn, kref);
-+
-+	kfree(mc->conn_description);
-+	kfree(mc->owned_names_items);
-+	kfree(mc);
-+}
-+
-+/**
-+ * kdbus_meta_conn_ref() - Gain reference
-+ * @mc:		Connection metadata object
-+ */
-+struct kdbus_meta_conn *kdbus_meta_conn_ref(struct kdbus_meta_conn *mc)
-+{
-+	if (mc)
-+		kref_get(&mc->kref);
-+	return mc;
-+}
-+
-+/**
-+ * kdbus_meta_conn_unref() - Drop reference
-+ * @mc:		Connection metadata object
-+ */
-+struct kdbus_meta_conn *kdbus_meta_conn_unref(struct kdbus_meta_conn *mc)
-+{
-+	if (mc)
-+		kref_put(&mc->kref, kdbus_meta_conn_free);
-+	return NULL;
-+}
-+
-+static void kdbus_meta_conn_collect_timestamp(struct kdbus_meta_conn *mc,
-+					      u64 msg_seqnum)
-+{
-+	mc->ts.monotonic_ns = ktime_get_ns();
-+	mc->ts.realtime_ns = ktime_get_real_ns();
-+
-+	if (msg_seqnum)
-+		mc->ts.seqnum = msg_seqnum;
-+
-+	mc->valid |= KDBUS_ATTACH_TIMESTAMP;
-+}
-+
-+static int kdbus_meta_conn_collect_names(struct kdbus_meta_conn *mc,
-+					 struct kdbus_conn *conn)
-+{
-+	const struct kdbus_name_owner *owner;
-+	struct kdbus_item *item;
-+	size_t slen, size;
-+
-+	lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
-+
-+	size = 0;
-+	/* open-code length calculation to avoid final padding */
-+	list_for_each_entry(owner, &conn->names_list, conn_entry)
-+		if (!(owner->flags & KDBUS_NAME_IN_QUEUE))
-+			size = KDBUS_ALIGN8(size) + KDBUS_ITEM_HEADER_SIZE +
-+				sizeof(struct kdbus_name) +
-+				strlen(owner->name->name) + 1;
-+
-+	if (!size)
-+		return 0;
-+
-+	/* make sure we include zeroed padding for convenience helpers */
-+	item = kmalloc(KDBUS_ALIGN8(size), GFP_KERNEL);
-+	if (!item)
-+		return -ENOMEM;
-+
-+	mc->owned_names_items = item;
-+	mc->owned_names_size = size;
-+
-+	list_for_each_entry(owner, &conn->names_list, conn_entry) {
-+		if (owner->flags & KDBUS_NAME_IN_QUEUE)
-+			continue;
-+
-+		slen = strlen(owner->name->name) + 1;
-+		kdbus_item_set(item, KDBUS_ITEM_OWNED_NAME, NULL,
-+			       sizeof(struct kdbus_name) + slen);
-+		item->name.flags = owner->flags;
-+		memcpy(item->name.name, owner->name->name, slen);
-+		item = KDBUS_ITEM_NEXT(item);
-+	}
-+
-+	/* sanity check: the buffer should be completely written now */
-+	WARN_ON((u8 *)item !=
-+			(u8 *)mc->owned_names_items + KDBUS_ALIGN8(size));
-+
-+	mc->valid |= KDBUS_ATTACH_NAMES;
-+	return 0;
-+}
-+
-+static int kdbus_meta_conn_collect_description(struct kdbus_meta_conn *mc,
-+					       struct kdbus_conn *conn)
-+{
-+	if (!conn->description)
-+		return 0;
-+
-+	mc->conn_description = kstrdup(conn->description, GFP_KERNEL);
-+	if (!mc->conn_description)
-+		return -ENOMEM;
-+
-+	mc->valid |= KDBUS_ATTACH_CONN_DESCRIPTION;
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_meta_conn_collect() - Collect connection metadata
-+ * @mc:		Connection metadata object
-+ * @conn:	Connection to collect data from
-+ * @msg_seqnum:	Sequence number of the message to send
-+ * @what:	Attach flags to collect
-+ *
-+ * This collects connection metadata from @msg_seqnum and @conn and saves it
-+ * in @mc.
-+ *
-+ * If KDBUS_ATTACH_NAMES is set in @what and @conn is non-NULL, the caller must
-+ * hold the name-registry read-lock of conn->ep->bus->name_registry.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_meta_conn_collect(struct kdbus_meta_conn *mc,
-+			    struct kdbus_conn *conn,
-+			    u64 msg_seqnum, u64 what)
-+{
-+	int ret;
-+
-+	if (!mc || !(what & (KDBUS_ATTACH_TIMESTAMP |
-+			     KDBUS_ATTACH_NAMES |
-+			     KDBUS_ATTACH_CONN_DESCRIPTION)))
-+		return 0;
-+
-+	mutex_lock(&mc->lock);
-+
-+	if (msg_seqnum && (what & KDBUS_ATTACH_TIMESTAMP) &&
-+	    !(mc->collected & KDBUS_ATTACH_TIMESTAMP)) {
-+		kdbus_meta_conn_collect_timestamp(mc, msg_seqnum);
-+		mc->collected |= KDBUS_ATTACH_TIMESTAMP;
-+	}
-+
-+	if (conn && (what & KDBUS_ATTACH_NAMES) &&
-+	    !(mc->collected & KDBUS_ATTACH_NAMES)) {
-+		ret = kdbus_meta_conn_collect_names(mc, conn);
-+		if (ret < 0)
-+			goto exit_unlock;
-+		mc->collected |= KDBUS_ATTACH_NAMES;
-+	}
-+
-+	if (conn && (what & KDBUS_ATTACH_CONN_DESCRIPTION) &&
-+	    !(mc->collected & KDBUS_ATTACH_CONN_DESCRIPTION)) {
-+		ret = kdbus_meta_conn_collect_description(mc, conn);
-+		if (ret < 0)
-+			goto exit_unlock;
-+		mc->collected |= KDBUS_ATTACH_CONN_DESCRIPTION;
-+	}
-+
-+	ret = 0;
-+
-+exit_unlock:
-+	mutex_unlock(&mc->lock);
-+	return ret;
-+}
-+
-+static void kdbus_meta_export_caps(struct kdbus_meta_caps *out,
-+				   const struct kdbus_meta_proc *mp,
-+				   struct user_namespace *user_ns)
-+{
-+	struct user_namespace *iter;
-+	const struct cred *cred = mp->cred;
-+	bool parent = false, owner = false;
-+	int i;
-+
-+	/*
-+	 * This translates the effective capabilities of 'cred' into the given
-+	 * user-namespace. If the given user-namespace is a child-namespace of
-+	 * the user-namespace of 'cred', the mask can be copied verbatim. If
-+	 * not, the mask is cleared.
-+	 * There's one exception: If 'cred' is the owner of any user-namespace
-+	 * in the path between the given user-namespace and the user-namespace
-+	 * of 'cred', then it has all effective capabilities set. This means,
-+	 * the user who created a user-namespace always has all effective
-+	 * capabilities in any child namespaces. Note that this is based on the
-+	 * uid of the namespace creator, not the task hierarchy.
-+	 */
-+	for (iter = user_ns; iter; iter = iter->parent) {
-+		if (iter == cred->user_ns) {
-+			parent = true;
-+			break;
-+		}
-+
-+		if (iter == &init_user_ns)
-+			break;
-+
-+		if ((iter->parent == cred->user_ns) &&
-+		    uid_eq(iter->owner, cred->euid)) {
-+			owner = true;
-+			break;
-+		}
-+	}
-+
-+	out->last_cap = CAP_LAST_CAP;
-+
-+	CAP_FOR_EACH_U32(i) {
-+		if (parent) {
-+			out->set[0].caps[i] = cred->cap_inheritable.cap[i];
-+			out->set[1].caps[i] = cred->cap_permitted.cap[i];
-+			out->set[2].caps[i] = cred->cap_effective.cap[i];
-+			out->set[3].caps[i] = cred->cap_bset.cap[i];
-+		} else if (owner) {
-+			out->set[0].caps[i] = 0U;
-+			out->set[1].caps[i] = ~0U;
-+			out->set[2].caps[i] = ~0U;
-+			out->set[3].caps[i] = ~0U;
-+		} else {
-+			out->set[0].caps[i] = 0U;
-+			out->set[1].caps[i] = 0U;
-+			out->set[2].caps[i] = 0U;
-+			out->set[3].caps[i] = 0U;
-+		}
-+	}
-+
-+	/* clear unused bits */
-+	for (i = 0; i < 4; i++)
-+		out->set[i].caps[CAP_TO_INDEX(CAP_LAST_CAP)] &=
-+					CAP_LAST_U32_VALID_MASK;
-+}
-+
-+/* This is equivalent to from_kuid_munged(), but maps INVALID_UID to itself */
-+static uid_t kdbus_from_kuid_keep(struct user_namespace *ns, kuid_t uid)
-+{
-+	return uid_valid(uid) ? from_kuid_munged(ns, uid) : ((uid_t)-1);
-+}
-+
-+/* This is equivalent to from_kgid_munged(), but maps INVALID_GID to itself */
-+static gid_t kdbus_from_kgid_keep(struct user_namespace *ns, kgid_t gid)
-+{
-+	return gid_valid(gid) ? from_kgid_munged(ns, gid) : ((gid_t)-1);
-+}
-+
-+struct kdbus_meta_staging {
-+	const struct kdbus_meta_proc *mp;
-+	const struct kdbus_meta_fake *mf;
-+	const struct kdbus_meta_conn *mc;
-+	const struct kdbus_conn *conn;
-+	u64 mask;
-+
-+	void *exe;
-+	const char *exe_path;
-+};
-+
-+static size_t kdbus_meta_measure(struct kdbus_meta_staging *staging)
-+{
-+	const struct kdbus_meta_proc *mp = staging->mp;
-+	const struct kdbus_meta_fake *mf = staging->mf;
-+	const struct kdbus_meta_conn *mc = staging->mc;
-+	const u64 mask = staging->mask;
-+	size_t size = 0;
-+
-+	/* process metadata */
-+
-+	if (mf && (mask & KDBUS_ATTACH_CREDS))
-+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_creds));
-+	else if (mp && (mask & KDBUS_ATTACH_CREDS))
-+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_creds));
-+
-+	if (mf && (mask & KDBUS_ATTACH_PIDS))
-+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_pids));
-+	else if (mp && (mask & KDBUS_ATTACH_PIDS))
-+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_pids));
-+
-+	if (mp && (mask & KDBUS_ATTACH_AUXGROUPS))
-+		size += KDBUS_ITEM_SIZE(mp->cred->group_info->ngroups *
-+					sizeof(u64));
-+
-+	if (mp && (mask & KDBUS_ATTACH_TID_COMM))
-+		size += KDBUS_ITEM_SIZE(strlen(mp->tid_comm) + 1);
-+
-+	if (mp && (mask & KDBUS_ATTACH_PID_COMM))
-+		size += KDBUS_ITEM_SIZE(strlen(mp->pid_comm) + 1);
-+
-+	if (staging->exe_path && (mask & KDBUS_ATTACH_EXE))
-+		size += KDBUS_ITEM_SIZE(strlen(staging->exe_path) + 1);
-+
-+	if (mp && (mask & KDBUS_ATTACH_CMDLINE))
-+		size += KDBUS_ITEM_SIZE(strlen(mp->cmdline) + 1);
-+
-+	if (mp && (mask & KDBUS_ATTACH_CGROUP))
-+		size += KDBUS_ITEM_SIZE(strlen(mp->cgroup) + 1);
-+
-+	if (mp && (mask & KDBUS_ATTACH_CAPS))
-+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_meta_caps));
-+
-+	if (mf && (mask & KDBUS_ATTACH_SECLABEL))
-+		size += KDBUS_ITEM_SIZE(strlen(mf->seclabel) + 1);
-+	else if (mp && (mask & KDBUS_ATTACH_SECLABEL))
-+		size += KDBUS_ITEM_SIZE(strlen(mp->seclabel) + 1);
-+
-+	if (mp && (mask & KDBUS_ATTACH_AUDIT))
-+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_audit));
-+
-+	/* connection metadata */
-+
-+	if (mc && (mask & KDBUS_ATTACH_NAMES))
-+		size += KDBUS_ALIGN8(mc->owned_names_size);
-+
-+	if (mc && (mask & KDBUS_ATTACH_CONN_DESCRIPTION))
-+		size += KDBUS_ITEM_SIZE(strlen(mc->conn_description) + 1);
-+
-+	if (mc && (mask & KDBUS_ATTACH_TIMESTAMP))
-+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_timestamp));
-+
-+	return size;
-+}
-+
-+static struct kdbus_item *kdbus_write_head(struct kdbus_item **iter,
-+					   u64 type, u64 size)
-+{
-+	struct kdbus_item *item = *iter;
-+	size_t padding;
-+
-+	item->type = type;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + size;
-+
-+	/* clear padding */
-+	padding = KDBUS_ALIGN8(item->size) - item->size;
-+	if (padding)
-+		memset(item->data + size, 0, padding);
-+
-+	*iter = KDBUS_ITEM_NEXT(item);
-+	return item;
-+}
-+
-+static struct kdbus_item *kdbus_write_full(struct kdbus_item **iter,
-+					   u64 type, u64 size, const void *data)
-+{
-+	struct kdbus_item *item;
-+
-+	item = kdbus_write_head(iter, type, size);
-+	memcpy(item->data, data, size);
-+	return item;
-+}
-+
-+static size_t kdbus_meta_write(struct kdbus_meta_staging *staging, void *mem,
-+			       size_t size)
-+{
-+	struct user_namespace *user_ns = staging->conn->cred->user_ns;
-+	struct pid_namespace *pid_ns = ns_of_pid(staging->conn->pid);
-+	struct kdbus_item *item = NULL, *items = mem;
-+	u8 *end, *owned_names_end = NULL;
-+
-+	/* process metadata */
-+
-+	if (staging->mf && (staging->mask & KDBUS_ATTACH_CREDS)) {
-+		const struct kdbus_meta_fake *mf = staging->mf;
-+
-+		item = kdbus_write_head(&items, KDBUS_ITEM_CREDS,
-+					sizeof(struct kdbus_creds));
-+		item->creds = (struct kdbus_creds){
-+			.uid	= kdbus_from_kuid_keep(user_ns, mf->uid),
-+			.euid	= kdbus_from_kuid_keep(user_ns, mf->euid),
-+			.suid	= kdbus_from_kuid_keep(user_ns, mf->suid),
-+			.fsuid	= kdbus_from_kuid_keep(user_ns, mf->fsuid),
-+			.gid	= kdbus_from_kgid_keep(user_ns, mf->gid),
-+			.egid	= kdbus_from_kgid_keep(user_ns, mf->egid),
-+			.sgid	= kdbus_from_kgid_keep(user_ns, mf->sgid),
-+			.fsgid	= kdbus_from_kgid_keep(user_ns, mf->fsgid),
-+		};
-+	} else if (staging->mp && (staging->mask & KDBUS_ATTACH_CREDS)) {
-+		const struct cred *c = staging->mp->cred;
-+
-+		item = kdbus_write_head(&items, KDBUS_ITEM_CREDS,
-+					sizeof(struct kdbus_creds));
-+		item->creds = (struct kdbus_creds){
-+			.uid	= kdbus_from_kuid_keep(user_ns, c->uid),
-+			.euid	= kdbus_from_kuid_keep(user_ns, c->euid),
-+			.suid	= kdbus_from_kuid_keep(user_ns, c->suid),
-+			.fsuid	= kdbus_from_kuid_keep(user_ns, c->fsuid),
-+			.gid	= kdbus_from_kgid_keep(user_ns, c->gid),
-+			.egid	= kdbus_from_kgid_keep(user_ns, c->egid),
-+			.sgid	= kdbus_from_kgid_keep(user_ns, c->sgid),
-+			.fsgid	= kdbus_from_kgid_keep(user_ns, c->fsgid),
-+		};
-+	}
-+
-+	if (staging->mf && (staging->mask & KDBUS_ATTACH_PIDS)) {
-+		item = kdbus_write_head(&items, KDBUS_ITEM_PIDS,
-+					sizeof(struct kdbus_pids));
-+		item->pids = (struct kdbus_pids){
-+			.pid = pid_nr_ns(staging->mf->tgid, pid_ns),
-+			.tid = pid_nr_ns(staging->mf->pid, pid_ns),
-+			.ppid = pid_nr_ns(staging->mf->ppid, pid_ns),
-+		};
-+	} else if (staging->mp && (staging->mask & KDBUS_ATTACH_PIDS)) {
-+		item = kdbus_write_head(&items, KDBUS_ITEM_PIDS,
-+					sizeof(struct kdbus_pids));
-+		item->pids = (struct kdbus_pids){
-+			.pid = pid_nr_ns(staging->mp->tgid, pid_ns),
-+			.tid = pid_nr_ns(staging->mp->pid, pid_ns),
-+			.ppid = pid_nr_ns(staging->mp->ppid, pid_ns),
-+		};
-+	}
-+
-+	if (staging->mp && (staging->mask & KDBUS_ATTACH_AUXGROUPS)) {
-+		const struct group_info *info = staging->mp->cred->group_info;
-+		size_t i;
-+
-+		item = kdbus_write_head(&items, KDBUS_ITEM_AUXGROUPS,
-+					info->ngroups * sizeof(u64));
-+		for (i = 0; i < info->ngroups; ++i)
-+			item->data64[i] = from_kgid_munged(user_ns,
-+							   GROUP_AT(info, i));
-+	}
-+
-+	if (staging->mp && (staging->mask & KDBUS_ATTACH_TID_COMM))
-+		item = kdbus_write_full(&items, KDBUS_ITEM_TID_COMM,
-+					strlen(staging->mp->tid_comm) + 1,
-+					staging->mp->tid_comm);
-+
-+	if (staging->mp && (staging->mask & KDBUS_ATTACH_PID_COMM))
-+		item = kdbus_write_full(&items, KDBUS_ITEM_PID_COMM,
-+					strlen(staging->mp->pid_comm) + 1,
-+					staging->mp->pid_comm);
-+
-+	if (staging->exe_path && (staging->mask & KDBUS_ATTACH_EXE))
-+		item = kdbus_write_full(&items, KDBUS_ITEM_EXE,
-+					strlen(staging->exe_path) + 1,
-+					staging->exe_path);
-+
-+	if (staging->mp && (staging->mask & KDBUS_ATTACH_CMDLINE))
-+		item = kdbus_write_full(&items, KDBUS_ITEM_CMDLINE,
-+					strlen(staging->mp->cmdline) + 1,
-+					staging->mp->cmdline);
-+
-+	if (staging->mp && (staging->mask & KDBUS_ATTACH_CGROUP))
-+		item = kdbus_write_full(&items, KDBUS_ITEM_CGROUP,
-+					strlen(staging->mp->cgroup) + 1,
-+					staging->mp->cgroup);
-+
-+	if (staging->mp && (staging->mask & KDBUS_ATTACH_CAPS)) {
-+		item = kdbus_write_head(&items, KDBUS_ITEM_CAPS,
-+					sizeof(struct kdbus_meta_caps));
-+		kdbus_meta_export_caps((void*)&item->caps, staging->mp,
-+				       user_ns);
-+	}
-+
-+	if (staging->mf && (staging->mask & KDBUS_ATTACH_SECLABEL))
-+		item = kdbus_write_full(&items, KDBUS_ITEM_SECLABEL,
-+					strlen(staging->mf->seclabel) + 1,
-+					staging->mf->seclabel);
-+	else if (staging->mp && (staging->mask & KDBUS_ATTACH_SECLABEL))
-+		item = kdbus_write_full(&items, KDBUS_ITEM_SECLABEL,
-+					strlen(staging->mp->seclabel) + 1,
-+					staging->mp->seclabel);
-+
-+	if (staging->mp && (staging->mask & KDBUS_ATTACH_AUDIT)) {
-+		item = kdbus_write_head(&items, KDBUS_ITEM_AUDIT,
-+					sizeof(struct kdbus_audit));
-+		item->audit = (struct kdbus_audit){
-+			.loginuid = from_kuid(user_ns,
-+					      staging->mp->audit_loginuid),
-+			.sessionid = staging->mp->audit_sessionid,
-+		};
-+	}
-+
-+	/* connection metadata */
-+
-+	if (staging->mc && (staging->mask & KDBUS_ATTACH_NAMES)) {
-+		memcpy(items, staging->mc->owned_names_items,
-+		       KDBUS_ALIGN8(staging->mc->owned_names_size));
-+		owned_names_end = (u8 *)items + staging->mc->owned_names_size;
-+		items = (void *)KDBUS_ALIGN8((unsigned long)owned_names_end);
-+	}
-+
-+	if (staging->mc && (staging->mask & KDBUS_ATTACH_CONN_DESCRIPTION))
-+		item = kdbus_write_full(&items, KDBUS_ITEM_CONN_DESCRIPTION,
-+				strlen(staging->mc->conn_description) + 1,
-+				staging->mc->conn_description);
-+
-+	if (staging->mc && (staging->mask & KDBUS_ATTACH_TIMESTAMP))
-+		item = kdbus_write_full(&items, KDBUS_ITEM_TIMESTAMP,
-+					sizeof(staging->mc->ts),
-+					&staging->mc->ts);
-+
-+	/*
-+	 * Return real size (minus trailing padding). In case of 'owned_names'
-+	 * we cannot deduce it from item->size, so treat it specially.
-+	 */
-+
-+	if (items == (void *)KDBUS_ALIGN8((unsigned long)owned_names_end))
-+		end = owned_names_end;
-+	else if (item)
-+		end = (u8 *)item + item->size;
-+	else
-+		end = mem;
-+
-+	WARN_ON((u8 *)items - (u8 *)mem != size);
-+	WARN_ON((void *)KDBUS_ALIGN8((unsigned long)end) != (void *)items);
-+
-+	return end - (u8 *)mem;
-+}
-+
-+int kdbus_meta_emit(struct kdbus_meta_proc *mp,
-+		    struct kdbus_meta_fake *mf,
-+		    struct kdbus_meta_conn *mc,
-+		    struct kdbus_conn *conn,
-+		    u64 mask,
-+		    struct kdbus_item **out_items,
-+		    size_t *out_size)
-+{
-+	struct kdbus_meta_staging staging = {};
-+	struct kdbus_item *items = NULL;
-+	size_t size = 0;
-+	int ret;
-+
-+	if (WARN_ON(mf && mp))
-+		mp = NULL;
-+
-+	staging.mp = mp;
-+	staging.mf = mf;
-+	staging.mc = mc;
-+	staging.conn = conn;
-+
-+	/* get mask of valid items */
-+	if (mf)
-+		staging.mask |= mf->valid;
-+	if (mp) {
-+		mutex_lock(&mp->lock);
-+		staging.mask |= mp->valid;
-+		mutex_unlock(&mp->lock);
-+	}
-+	if (mc) {
-+		mutex_lock(&mc->lock);
-+		staging.mask |= mc->valid;
-+		mutex_unlock(&mc->lock);
-+	}
-+
-+	staging.mask &= mask;
-+
-+	if (!staging.mask) { /* bail out if nothing to do */
-+		ret = 0;
-+		goto exit;
-+	}
-+
-+	/* EXE is special as it needs a temporary page to assemble */
-+	if (mp && (staging.mask & KDBUS_ATTACH_EXE)) {
-+		struct path p;
-+
-+		/*
-+		 * XXX: We need access to __d_path() so we can write the path
-+		 * relative to conn->root_path. Once upstream, we need
-+		 * EXPORT_SYMBOL(__d_path) or an equivalent of d_path() that
-+		 * takes the root path directly. Until then, we drop this item
-+		 * if the root-paths differ.
-+		 */
-+
-+		get_fs_root(current->fs, &p);
-+		if (path_equal(&p, &conn->root_path)) {
-+			staging.exe = (void *)__get_free_page(GFP_TEMPORARY);
-+			if (!staging.exe) {
-+				path_put(&p);
-+				ret = -ENOMEM;
-+				goto exit;
-+			}
-+
-+			staging.exe_path = d_path(&mp->exe_path, staging.exe,
-+						  PAGE_SIZE);
-+			if (IS_ERR(staging.exe_path)) {
-+				path_put(&p);
-+				ret = PTR_ERR(staging.exe_path);
-+				goto exit;
-+			}
-+		}
-+		path_put(&p);
-+	}
-+
-+	size = kdbus_meta_measure(&staging);
-+	if (!size) { /* bail out if nothing to do */
-+		ret = 0;
-+		goto exit;
-+	}
-+
-+	items = kmalloc(size, GFP_KERNEL);
-+	if (!items) {
-+		ret = -ENOMEM;
-+		goto exit;
-+	}
-+
-+	size = kdbus_meta_write(&staging, items, size);
-+	if (!size) {
-+		kfree(items);
-+		items = NULL;
-+	}
-+
-+	ret = 0;
-+
-+exit:
-+	if (staging.exe)
-+		free_page((unsigned long)staging.exe);
-+	if (ret >= 0) {
-+		*out_items = items;
-+		*out_size = size;
-+	}
-+	return ret;
-+}
-+
-+enum {
-+	KDBUS_META_PROC_NONE,
-+	KDBUS_META_PROC_NORMAL,
-+};
-+
-+/**
-+ * kdbus_proc_permission() - check /proc permissions on target pid
-+ * @pid_ns:		namespace we operate in
-+ * @cred:		credentials of requestor
-+ * @target:		target process
-+ *
-+ * This checks whether a process with credentials @cred can access information
-+ * of @target in the namespace @pid_ns. This tries to follow /proc permissions,
-+ * but is slightly more restrictive.
-+ *
-+ * Return: The /proc access level (KDBUS_META_PROC_*) is returned.
-+ */
-+static unsigned int kdbus_proc_permission(const struct pid_namespace *pid_ns,
-+					  const struct cred *cred,
-+					  struct pid *target)
-+{
-+	if (pid_ns->hide_pid < 1)
-+		return KDBUS_META_PROC_NORMAL;
-+
-+	/* XXX: we need groups_search() exported for aux-groups */
-+	if (gid_eq(cred->egid, pid_ns->pid_gid))
-+		return KDBUS_META_PROC_NORMAL;
-+
-+	/*
-+	 * XXX: If ptrace_may_access(PTRACE_MODE_READ) is granted, you can
-+	 * override hide_pid. However, ptrace_may_access() only supports
-+	 * checking 'current', hence, we cannot use this here. But we
-+	 * simply decide to not support this override, so no need to worry.
-+	 */
-+
-+	return KDBUS_META_PROC_NONE;
-+}
-+
-+/**
-+ * kdbus_meta_proc_mask() - calculate which metadata would be visible to
-+ *			    a connection via /proc
-+ * @prv_pid:		pid of metadata provider
-+ * @req_pid:		pid of metadata requestor
-+ * @req_cred:		credentials of metadata requestor
-+ * @wanted:		metadata that is requested
-+ *
-+ * This checks which metadata items of @prv_pid can be read via /proc by the
-+ * requestor @req_pid.
-+ *
-+ * Return: Set of metadata flags the requestor can see (limited by @wanted).
-+ */
-+static u64 kdbus_meta_proc_mask(struct pid *prv_pid,
-+				struct pid *req_pid,
-+				const struct cred *req_cred,
-+				u64 wanted)
-+{
-+	struct pid_namespace *prv_ns, *req_ns;
-+	unsigned int proc;
-+
-+	prv_ns = ns_of_pid(prv_pid);
-+	req_ns = ns_of_pid(req_pid);
-+
-+	/*
-+	 * If the sender is not visible in the receiver namespace, then the
-+	 * receiver cannot access the sender via its own procfs. Hence, we do
-+	 * not attach any additional metadata.
-+	 */
-+	if (!pid_nr_ns(prv_pid, req_ns))
-+		return 0;
-+
-+	/*
-+	 * If the pid-namespace of the receiver has hide_pid set, it cannot see
-+	 * any process but its own. We shortcut this /proc permission check if
-+	 * provider and requestor are the same. If not, we perform rather
-+	 * expensive /proc permission checks.
-+	 */
-+	if (prv_pid == req_pid)
-+		proc = KDBUS_META_PROC_NORMAL;
-+	else
-+		proc = kdbus_proc_permission(req_ns, req_cred, prv_pid);
-+
-+	/* you need /proc access to read standard process attributes */
-+	if (proc < KDBUS_META_PROC_NORMAL)
-+		wanted &= ~(KDBUS_ATTACH_TID_COMM |
-+			    KDBUS_ATTACH_PID_COMM |
-+			    KDBUS_ATTACH_SECLABEL |
-+			    KDBUS_ATTACH_CMDLINE |
-+			    KDBUS_ATTACH_CGROUP |
-+			    KDBUS_ATTACH_AUDIT |
-+			    KDBUS_ATTACH_CAPS |
-+			    KDBUS_ATTACH_EXE);
-+
-+	/* clear all non-/proc flags */
-+	return wanted & (KDBUS_ATTACH_TID_COMM |
-+			 KDBUS_ATTACH_PID_COMM |
-+			 KDBUS_ATTACH_SECLABEL |
-+			 KDBUS_ATTACH_CMDLINE |
-+			 KDBUS_ATTACH_CGROUP |
-+			 KDBUS_ATTACH_AUDIT |
-+			 KDBUS_ATTACH_CAPS |
-+			 KDBUS_ATTACH_EXE);
-+}
-+
-+/**
-+ * kdbus_meta_get_mask() - calculate attach flags mask for metadata request
-+ * @prv_pid:		pid of metadata provider
-+ * @prv_mask:		mask of metadata the provider grants unchecked
-+ * @req_pid:		pid of metadata requestor
-+ * @req_cred:		credentials of metadata requestor
-+ * @req_mask:		mask of metadata that is requested
-+ *
-+ * This calculates the metadata items that the requestor @req_pid can access
-+ * from the metadata provider @prv_pid. This permission check consists of
-+ * several different parts:
-+ *  - Providers can grant metadata items unchecked. Regardless of their type,
-+ *    they're always granted to the requestor. This mask is passed as @prv_mask.
-+ *  - Basic items (credentials and connection metadata) are granted implicitly
-+ *    to everyone. They're publicly available to any bus-user that can see the
-+ *    provider.
-+ *  - Process credentials that are not granted implicitly follow the same
-+ *    permission checks as /proc. This means, we always assume a requestor
-+ *    process has access to their *own* /proc mount, if they have access to
-+ *    kdbusfs.
-+ *
-+ * Return: Mask of metadata that is granted.
-+ */
-+static u64 kdbus_meta_get_mask(struct pid *prv_pid, u64 prv_mask,
-+			       struct pid *req_pid,
-+			       const struct cred *req_cred, u64 req_mask)
-+{
-+	u64 missing, impl_mask, proc_mask = 0;
-+
-+	/*
-+	 * Connection metadata and basic unix process credentials are
-+	 * transmitted implicitly, and cannot be suppressed. Both are required
-+	 * to perform user-space policies on the receiver-side. Furthermore,
-+	 * connection metadata is public state, anyway, and unix credentials
-+	 * are needed for UDS-compatibility. We extend them slightly by
-+	 * auxiliary groups and additional uids/gids/pids.
-+	 */
-+	impl_mask = /* connection metadata */
-+		    KDBUS_ATTACH_CONN_DESCRIPTION |
-+		    KDBUS_ATTACH_TIMESTAMP |
-+		    KDBUS_ATTACH_NAMES |
-+		    /* credentials and pids */
-+		    KDBUS_ATTACH_AUXGROUPS |
-+		    KDBUS_ATTACH_CREDS |
-+		    KDBUS_ATTACH_PIDS;
-+
-+	/*
-+	 * Calculate the set of metadata that is not granted implicitly nor by
-+	 * the sender, but still requested by the receiver. If any are left,
-+	 * perform rather expensive /proc access checks for them.
-+	 */
-+	missing = req_mask & ~((prv_mask | impl_mask) & req_mask);
-+	if (missing)
-+		proc_mask = kdbus_meta_proc_mask(prv_pid, req_pid, req_cred,
-+						 missing);
-+
-+	return (prv_mask | impl_mask | proc_mask) & req_mask;
-+}
-+
-+/**
-+ */
-+u64 kdbus_meta_info_mask(const struct kdbus_conn *conn, u64 mask)
-+{
-+	return kdbus_meta_get_mask(conn->pid,
-+				   atomic64_read(&conn->attach_flags_send),
-+				   task_pid(current),
-+				   current_cred(),
-+				   mask);
-+}
-+
-+/**
-+ */
-+u64 kdbus_meta_msg_mask(const struct kdbus_conn *snd,
-+			const struct kdbus_conn *rcv)
-+{
-+	return kdbus_meta_get_mask(task_pid(current),
-+				   atomic64_read(&snd->attach_flags_send),
-+				   rcv->pid,
-+				   rcv->cred,
-+				   atomic64_read(&rcv->attach_flags_recv));
-+}
-diff --git a/ipc/kdbus/metadata.h b/ipc/kdbus/metadata.h
-new file mode 100644
-index 0000000..dba7cc7
---- /dev/null
-+++ b/ipc/kdbus/metadata.h
-@@ -0,0 +1,86 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_METADATA_H
-+#define __KDBUS_METADATA_H
-+
-+#include <linux/kernel.h>
-+
-+struct kdbus_conn;
-+struct kdbus_pool_slice;
-+
-+struct kdbus_meta_proc;
-+struct kdbus_meta_conn;
-+
-+/**
-+ * struct kdbus_meta_fake - Fake metadata
-+ * @valid:		Bitmask of collected and valid items
-+ * @uid:		UID of process
-+ * @euid:		EUID of process
-+ * @suid:		SUID of process
-+ * @fsuid:		FSUID of process
-+ * @gid:		GID of process
-+ * @egid:		EGID of process
-+ * @sgid:		SGID of process
-+ * @fsgid:		FSGID of process
-+ * @pid:		PID of process
-+ * @tgid:		TGID of process
-+ * @ppid:		PPID of process
-+ * @seclabel:		Seclabel
-+ */
-+struct kdbus_meta_fake {
-+	u64 valid;
-+
-+	/* KDBUS_ITEM_CREDS */
-+	kuid_t uid, euid, suid, fsuid;
-+	kgid_t gid, egid, sgid, fsgid;
-+
-+	/* KDBUS_ITEM_PIDS */
-+	struct pid *pid, *tgid, *ppid;
-+
-+	/* KDBUS_ITEM_SECLABEL */
-+	char *seclabel;
-+};
-+
-+struct kdbus_meta_proc *kdbus_meta_proc_new(void);
-+struct kdbus_meta_proc *kdbus_meta_proc_ref(struct kdbus_meta_proc *mp);
-+struct kdbus_meta_proc *kdbus_meta_proc_unref(struct kdbus_meta_proc *mp);
-+int kdbus_meta_proc_collect(struct kdbus_meta_proc *mp, u64 what);
-+
-+struct kdbus_meta_fake *kdbus_meta_fake_new(void);
-+struct kdbus_meta_fake *kdbus_meta_fake_free(struct kdbus_meta_fake *mf);
-+int kdbus_meta_fake_collect(struct kdbus_meta_fake *mf,
-+			    const struct kdbus_creds *creds,
-+			    const struct kdbus_pids *pids,
-+			    const char *seclabel);
-+
-+struct kdbus_meta_conn *kdbus_meta_conn_new(void);
-+struct kdbus_meta_conn *kdbus_meta_conn_ref(struct kdbus_meta_conn *mc);
-+struct kdbus_meta_conn *kdbus_meta_conn_unref(struct kdbus_meta_conn *mc);
-+int kdbus_meta_conn_collect(struct kdbus_meta_conn *mc,
-+			    struct kdbus_conn *conn,
-+			    u64 msg_seqnum, u64 what);
-+
-+int kdbus_meta_emit(struct kdbus_meta_proc *mp,
-+		    struct kdbus_meta_fake *mf,
-+		    struct kdbus_meta_conn *mc,
-+		    struct kdbus_conn *conn,
-+		    u64 mask,
-+		    struct kdbus_item **out_items,
-+		    size_t *out_size);
-+u64 kdbus_meta_info_mask(const struct kdbus_conn *conn, u64 mask);
-+u64 kdbus_meta_msg_mask(const struct kdbus_conn *snd,
-+			const struct kdbus_conn *rcv);
-+
-+#endif
-diff --git a/ipc/kdbus/names.c b/ipc/kdbus/names.c
-new file mode 100644
-index 0000000..bf44ca3
---- /dev/null
-+++ b/ipc/kdbus/names.c
-@@ -0,0 +1,854 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/ctype.h>
-+#include <linux/fs.h>
-+#include <linux/hash.h>
-+#include <linux/idr.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/rwsem.h>
-+#include <linux/sched.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <linux/uio.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "endpoint.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "names.h"
-+#include "notify.h"
-+#include "policy.h"
-+
-+#define KDBUS_NAME_SAVED_MASK (KDBUS_NAME_ALLOW_REPLACEMENT |	\
-+			       KDBUS_NAME_QUEUE)
-+
-+static bool kdbus_name_owner_is_used(struct kdbus_name_owner *owner)
-+{
-+	return !list_empty(&owner->name_entry) ||
-+	       owner == owner->name->activator;
-+}
-+
-+static struct kdbus_name_owner *
-+kdbus_name_owner_new(struct kdbus_conn *conn, struct kdbus_name_entry *name,
-+		     u64 flags)
-+{
-+	struct kdbus_name_owner *owner;
-+
-+	kdbus_conn_assert_active(conn);
-+
-+	if (conn->name_count >= KDBUS_CONN_MAX_NAMES)
-+		return ERR_PTR(-E2BIG);
-+
-+	owner = kmalloc(sizeof(*owner), GFP_KERNEL);
-+	if (!owner)
-+		return ERR_PTR(-ENOMEM);
-+
-+	owner->flags = flags & KDBUS_NAME_SAVED_MASK;
-+	owner->conn = conn;
-+	owner->name = name;
-+	list_add_tail(&owner->conn_entry, &conn->names_list);
-+	INIT_LIST_HEAD(&owner->name_entry);
-+
-+	++conn->name_count;
-+	return owner;
-+}
-+
-+static void kdbus_name_owner_free(struct kdbus_name_owner *owner)
-+{
-+	if (!owner)
-+		return;
-+
-+	WARN_ON(kdbus_name_owner_is_used(owner));
-+	--owner->conn->name_count;
-+	list_del(&owner->conn_entry);
-+	kfree(owner);
-+}
-+
-+static struct kdbus_name_owner *
-+kdbus_name_owner_find(struct kdbus_name_entry *name, struct kdbus_conn *conn)
-+{
-+	struct kdbus_name_owner *owner;
-+
-+	/*
-+	 * Use conn->names_list over name->queue to make sure boundaries of
-+	 * this linear search are controlled by the connection itself.
-+	 * Furthermore, this will find normal owners as well as activators
-+	 * without any additional code.
-+	 */
-+	list_for_each_entry(owner, &conn->names_list, conn_entry)
-+		if (owner->name == name)
-+			return owner;
-+
-+	return NULL;
-+}
-+
-+static bool kdbus_name_entry_is_used(struct kdbus_name_entry *name)
-+{
-+	return !list_empty(&name->queue) || name->activator;
-+}
-+
-+static struct kdbus_name_owner *
-+kdbus_name_entry_first(struct kdbus_name_entry *name)
-+{
-+	return list_first_entry_or_null(&name->queue, struct kdbus_name_owner,
-+					name_entry);
-+}
-+
-+static struct kdbus_name_entry *
-+kdbus_name_entry_new(struct kdbus_name_registry *r, u32 hash,
-+		     const char *name_str)
-+{
-+	struct kdbus_name_entry *name;
-+	size_t namelen;
-+
-+	lockdep_assert_held(&r->rwlock);
-+
-+	namelen = strlen(name_str);
-+
-+	name = kmalloc(sizeof(*name) + namelen + 1, GFP_KERNEL);
-+	if (!name)
-+		return ERR_PTR(-ENOMEM);
-+
-+	name->name_id = ++r->name_seq_last;
-+	name->activator = NULL;
-+	INIT_LIST_HEAD(&name->queue);
-+	hash_add(r->entries_hash, &name->hentry, hash);
-+	memcpy(name->name, name_str, namelen + 1);
-+
-+	return name;
-+}
-+
-+static void kdbus_name_entry_free(struct kdbus_name_entry *name)
-+{
-+	if (!name)
-+		return;
-+
-+	WARN_ON(kdbus_name_entry_is_used(name));
-+	hash_del(&name->hentry);
-+	kfree(name);
-+}
-+
-+static struct kdbus_name_entry *
-+kdbus_name_entry_find(struct kdbus_name_registry *r, u32 hash,
-+		      const char *name_str)
-+{
-+	struct kdbus_name_entry *name;
-+
-+	lockdep_assert_held(&r->rwlock);
-+
-+	hash_for_each_possible(r->entries_hash, name, hentry, hash)
-+		if (!strcmp(name->name, name_str))
-+			return name;
-+
-+	return NULL;
-+}
-+
-+/**
-+ * kdbus_name_registry_new() - create a new name registry
-+ *
-+ * Return: a new kdbus_name_registry on success, ERR_PTR on failure.
-+ */
-+struct kdbus_name_registry *kdbus_name_registry_new(void)
-+{
-+	struct kdbus_name_registry *r;
-+
-+	r = kmalloc(sizeof(*r), GFP_KERNEL);
-+	if (!r)
-+		return ERR_PTR(-ENOMEM);
-+
-+	hash_init(r->entries_hash);
-+	init_rwsem(&r->rwlock);
-+	r->name_seq_last = 0;
-+
-+	return r;
-+}
-+
-+/**
-+ * kdbus_name_registry_free() - free name registry
-+ * @r:		name registry to free, or NULL
-+ *
-+ * Free a name registry and cleanup all internal objects. This is a no-op if
-+ * you pass NULL as registry.
-+ */
-+void kdbus_name_registry_free(struct kdbus_name_registry *r)
-+{
-+	if (!r)
-+		return;
-+
-+	WARN_ON(!hash_empty(r->entries_hash));
-+	kfree(r);
-+}
-+
-+/**
-+ * kdbus_name_lookup_unlocked() - lookup name in registry
-+ * @reg:		name registry
-+ * @name:		name to lookup
-+ *
-+ * This looks up @name in the given name-registry and returns the
-+ * kdbus_name_entry object. The caller must hold the registry-lock and must not
-+ * access the returned object after releasing the lock.
-+ *
-+ * Return: Pointer to name-entry, or NULL if not found.
-+ */
-+struct kdbus_name_entry *
-+kdbus_name_lookup_unlocked(struct kdbus_name_registry *reg, const char *name)
-+{
-+	return kdbus_name_entry_find(reg, kdbus_strhash(name), name);
-+}
-+
-+static int kdbus_name_become_activator(struct kdbus_name_owner *owner,
-+				       u64 *return_flags)
-+{
-+	if (kdbus_name_owner_is_used(owner))
-+		return -EALREADY;
-+	if (owner->name->activator)
-+		return -EEXIST;
-+
-+	owner->name->activator = owner;
-+	owner->flags |= KDBUS_NAME_ACTIVATOR;
-+
-+	if (kdbus_name_entry_first(owner->name)) {
-+		owner->flags |= KDBUS_NAME_IN_QUEUE;
-+	} else {
-+		owner->flags |= KDBUS_NAME_PRIMARY;
-+		kdbus_notify_name_change(owner->conn->ep->bus,
-+					 KDBUS_ITEM_NAME_ADD,
-+					 0, owner->conn->id,
-+					 0, owner->flags,
-+					 owner->name->name);
-+	}
-+
-+	if (return_flags)
-+		*return_flags = owner->flags | KDBUS_NAME_ACQUIRED;
-+
-+	return 0;
-+}
-+
-+static int kdbus_name_update(struct kdbus_name_owner *owner, u64 flags,
-+			     u64 *return_flags)
-+{
-+	struct kdbus_name_owner *primary, *activator;
-+	struct kdbus_name_entry *name;
-+	struct kdbus_bus *bus;
-+	u64 nflags = 0;
-+	int ret = 0;
-+
-+	name = owner->name;
-+	bus = owner->conn->ep->bus;
-+	primary = kdbus_name_entry_first(name);
-+	activator = name->activator;
-+
-+	/* cannot be activator and acquire a name */
-+	if (owner == activator)
-+		return -EUCLEAN;
-+
-+	/* update saved flags */
-+	owner->flags = flags & KDBUS_NAME_SAVED_MASK;
-+
-+	if (!primary) {
-+		/*
-+		 * No primary owner (but maybe an activator). Take over the
-+		 * name.
-+		 */
-+
-+		list_add(&owner->name_entry, &name->queue);
-+		owner->flags |= KDBUS_NAME_PRIMARY;
-+		nflags |= KDBUS_NAME_ACQUIRED;
-+
-+		/* move messages to new owner on activation */
-+		if (activator) {
-+			kdbus_conn_move_messages(owner->conn, activator->conn,
-+						 name->name_id);
-+			kdbus_notify_name_change(bus, KDBUS_ITEM_NAME_CHANGE,
-+					activator->conn->id, owner->conn->id,
-+					activator->flags, owner->flags,
-+					name->name);
-+			activator->flags &= ~KDBUS_NAME_PRIMARY;
-+			activator->flags |= KDBUS_NAME_IN_QUEUE;
-+		} else {
-+			kdbus_notify_name_change(bus, KDBUS_ITEM_NAME_ADD,
-+						 0, owner->conn->id,
-+						 0, owner->flags,
-+						 name->name);
-+		}
-+
-+	} else if (owner == primary) {
-+		/*
-+		 * Already the primary owner of the name, flags were already
-+		 * updated. Nothing to do.
-+		 */
-+
-+		owner->flags |= KDBUS_NAME_PRIMARY;
-+
-+	} else if ((primary->flags & KDBUS_NAME_ALLOW_REPLACEMENT) &&
-+		   (flags & KDBUS_NAME_REPLACE_EXISTING)) {
-+		/*
-+		 * We're not the primary owner but can replace it. Move us
-+		 * ahead of the primary owner and acquire the name (possibly
-+		 * skipping queued owners ahead of us).
-+		 */
-+
-+		list_del_init(&owner->name_entry);
-+		list_add(&owner->name_entry, &name->queue);
-+		owner->flags |= KDBUS_NAME_PRIMARY;
-+		nflags |= KDBUS_NAME_ACQUIRED;
-+
-+		kdbus_notify_name_change(bus, KDBUS_ITEM_NAME_CHANGE,
-+					 primary->conn->id, owner->conn->id,
-+					 primary->flags, owner->flags,
-+					 name->name);
-+
-+		/* requeue old primary, or drop if queueing not wanted */
-+		if (primary->flags & KDBUS_NAME_QUEUE) {
-+			primary->flags &= ~KDBUS_NAME_PRIMARY;
-+			primary->flags |= KDBUS_NAME_IN_QUEUE;
-+		} else {
-+			list_del_init(&primary->name_entry);
-+			kdbus_name_owner_free(primary);
-+		}
-+
-+	} else if (flags & KDBUS_NAME_QUEUE) {
-+		/*
-+		 * Name is already occupied and we cannot take it over, but
-+		 * queuing is allowed. Put us silently on the queue, if not
-+		 * already there.
-+		 */
-+
-+		owner->flags |= KDBUS_NAME_IN_QUEUE;
-+		if (!kdbus_name_owner_is_used(owner)) {
-+			list_add_tail(&owner->name_entry, &name->queue);
-+			nflags |= KDBUS_NAME_ACQUIRED;
-+		}
-+	} else if (kdbus_name_owner_is_used(owner)) {
-+		/*
-+		 * Already queued on name, but re-queueing was not requested.
-+		 * Make sure to unlink it from the name, the caller is
-+		 * responsible for releasing it.
-+		 */
-+
-+		list_del_init(&owner->name_entry);
-+	} else {
-+		/*
-+		 * Name is already claimed and queueing is not requested.
-+		 * Return error to the caller.
-+		 */
-+
-+		ret = -EEXIST;
-+	}
-+
-+	if (return_flags)
-+		*return_flags = owner->flags | nflags;
-+
-+	return ret;
-+}
-+
-+int kdbus_name_acquire(struct kdbus_name_registry *reg,
-+		       struct kdbus_conn *conn, const char *name_str,
-+		       u64 flags, u64 *return_flags)
-+{
-+	struct kdbus_name_entry *name = NULL;
-+	struct kdbus_name_owner *owner = NULL;
-+	u32 hash;
-+	int ret;
-+
-+	kdbus_conn_assert_active(conn);
-+
-+	down_write(&reg->rwlock);
-+
-+	/*
-+	 * Verify the connection has access to the name. Do this before testing
-+	 * for double-acquisitions and other errors to make sure we do not leak
-+	 * information about this name through possible custom endpoints.
-+	 */
-+	if (!kdbus_conn_policy_own_name(conn, current_cred(), name_str)) {
-+		ret = -EPERM;
-+		goto exit;
-+	}
-+
-+	/*
-+	 * Lookup the name entry. If it already exists, search for an owner
-+	 * entry as we might already own that name. If either does not exist,
-+	 * we will allocate a fresh one.
-+	 */
-+	hash = kdbus_strhash(name_str);
-+	name = kdbus_name_entry_find(reg, hash, name_str);
-+	if (name) {
-+		owner = kdbus_name_owner_find(name, conn);
-+	} else {
-+		name = kdbus_name_entry_new(reg, hash, name_str);
-+		if (IS_ERR(name)) {
-+			ret = PTR_ERR(name);
-+			name = NULL;
-+			goto exit;
-+		}
-+	}
-+
-+	/* create name owner object if not already queued */
-+	if (!owner) {
-+		owner = kdbus_name_owner_new(conn, name, flags);
-+		if (IS_ERR(owner)) {
-+			ret = PTR_ERR(owner);
-+			owner = NULL;
-+			goto exit;
-+		}
-+	}
-+
-+	if (flags & KDBUS_NAME_ACTIVATOR)
-+		ret = kdbus_name_become_activator(owner, return_flags);
-+	else
-+		ret = kdbus_name_update(owner, flags, return_flags);
-+	if (ret < 0)
-+		goto exit;
-+
-+exit:
-+	if (owner && !kdbus_name_owner_is_used(owner))
-+		kdbus_name_owner_free(owner);
-+	if (name && !kdbus_name_entry_is_used(name))
-+		kdbus_name_entry_free(name);
-+	up_write(&reg->rwlock);
-+	kdbus_notify_flush(conn->ep->bus);
-+	return ret;
-+}
-+
-+static void kdbus_name_release_unlocked(struct kdbus_name_owner *owner)
-+{
-+	struct kdbus_name_owner *primary, *next;
-+	struct kdbus_name_entry *name;
-+
-+	name = owner->name;
-+	primary = kdbus_name_entry_first(name);
-+
-+	list_del_init(&owner->name_entry);
-+	if (owner == name->activator)
-+		name->activator = NULL;
-+
-+	if (!primary || owner == primary) {
-+		next = kdbus_name_entry_first(name);
-+		if (!next)
-+			next = name->activator;
-+
-+		if (next) {
-+			/* hand to next in queue */
-+			next->flags &= ~KDBUS_NAME_IN_QUEUE;
-+			next->flags |= KDBUS_NAME_PRIMARY;
-+			if (next == name->activator)
-+				kdbus_conn_move_messages(next->conn,
-+							 owner->conn,
-+							 name->name_id);
-+
-+			kdbus_notify_name_change(owner->conn->ep->bus,
-+					KDBUS_ITEM_NAME_CHANGE,
-+					owner->conn->id, next->conn->id,
-+					owner->flags, next->flags,
-+					name->name);
-+		} else {
-+			kdbus_notify_name_change(owner->conn->ep->bus,
-+						 KDBUS_ITEM_NAME_REMOVE,
-+						 owner->conn->id, 0,
-+						 owner->flags, 0,
-+						 name->name);
-+		}
-+	}
-+
-+	kdbus_name_owner_free(owner);
-+	if (!kdbus_name_entry_is_used(name))
-+		kdbus_name_entry_free(name);
-+}
-+
-+static int kdbus_name_release(struct kdbus_name_registry *reg,
-+			      struct kdbus_conn *conn,
-+			      const char *name_str)
-+{
-+	struct kdbus_name_owner *owner;
-+	struct kdbus_name_entry *name;
-+	int ret = 0;
-+
-+	down_write(&reg->rwlock);
-+	name = kdbus_name_entry_find(reg, kdbus_strhash(name_str), name_str);
-+	if (name) {
-+		owner = kdbus_name_owner_find(name, conn);
-+		if (owner)
-+			kdbus_name_release_unlocked(owner);
-+		else
-+			ret = -EADDRINUSE;
-+	} else {
-+		ret = -ESRCH;
-+	}
-+	up_write(&reg->rwlock);
-+
-+	kdbus_notify_flush(conn->ep->bus);
-+	return ret;
-+}
-+
-+/**
-+ * kdbus_name_release_all() - remove all name entries of a given connection
-+ * @reg:		name registry
-+ * @conn:		connection
-+ */
-+void kdbus_name_release_all(struct kdbus_name_registry *reg,
-+			    struct kdbus_conn *conn)
-+{
-+	struct kdbus_name_owner *owner;
-+
-+	down_write(&reg->rwlock);
-+
-+	while ((owner = list_first_entry_or_null(&conn->names_list,
-+						 struct kdbus_name_owner,
-+						 conn_entry)))
-+		kdbus_name_release_unlocked(owner);
-+
-+	up_write(&reg->rwlock);
-+
-+	kdbus_notify_flush(conn->ep->bus);
-+}
-+
-+/**
-+ * kdbus_name_is_valid() - check if a name is valid
-+ * @p:			The name to check
-+ * @allow_wildcard:	Whether or not to allow a wildcard name
-+ *
-+ * A name is valid if all of the following criteria are met:
-+ *
-+ *  - The name has two or more elements separated by a period ('.') character.
-+ *  - All elements must contain at least one character.
-+ *  - Each element must only contain the ASCII characters "[A-Z][a-z][0-9]_-"
-+ *    and must not begin with a digit.
-+ *  - The name must not exceed KDBUS_NAME_MAX_LEN.
-+ *  - If @allow_wildcard is true, the name may end in '.*'
-+ */
-+bool kdbus_name_is_valid(const char *p, bool allow_wildcard)
-+{
-+	bool dot, found_dot = false;
-+	const char *q;
-+
-+	for (dot = true, q = p; *q; q++) {
-+		if (*q == '.') {
-+			if (dot)
-+				return false;
-+
-+			found_dot = true;
-+			dot = true;
-+		} else {
-+			bool good;
-+
-+			good = isalpha(*q) || (!dot && isdigit(*q)) ||
-+				*q == '_' || *q == '-' ||
-+				(allow_wildcard && dot &&
-+					*q == '*' && *(q + 1) == '\0');
-+
-+			if (!good)
-+				return false;
-+
-+			dot = false;
-+		}
-+	}
-+
-+	if (q - p > KDBUS_NAME_MAX_LEN)
-+		return false;
-+
-+	if (dot)
-+		return false;
-+
-+	if (!found_dot)
-+		return false;
-+
-+	return true;
-+}
-+
-+/**
-+ * kdbus_cmd_name_acquire() - handle KDBUS_CMD_NAME_ACQUIRE
-+ * @conn:		connection to operate on
-+ * @argp:		command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_name_acquire(struct kdbus_conn *conn, void __user *argp)
-+{
-+	const char *item_name;
-+	struct kdbus_cmd *cmd;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+		{ .type = KDBUS_ITEM_NAME, .mandatory = true },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+				 KDBUS_NAME_REPLACE_EXISTING |
-+				 KDBUS_NAME_ALLOW_REPLACEMENT |
-+				 KDBUS_NAME_QUEUE,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	if (!kdbus_conn_is_ordinary(conn))
-+		return -EOPNOTSUPP;
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret != 0)
-+		return ret;
-+
-+	item_name = argv[1].item->str;
-+	if (!kdbus_name_is_valid(item_name, false)) {
-+		ret = -EINVAL;
-+		goto exit;
-+	}
-+
-+	ret = kdbus_name_acquire(conn->ep->bus->name_registry, conn, item_name,
-+				 cmd->flags, &cmd->return_flags);
-+
-+exit:
-+	return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_name_release() - handle KDBUS_CMD_NAME_RELEASE
-+ * @conn:		connection to operate on
-+ * @argp:		command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_name_release(struct kdbus_conn *conn, void __user *argp)
-+{
-+	struct kdbus_cmd *cmd;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+		{ .type = KDBUS_ITEM_NAME, .mandatory = true },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	if (!kdbus_conn_is_ordinary(conn))
-+		return -EOPNOTSUPP;
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret != 0)
-+		return ret;
-+
-+	ret = kdbus_name_release(conn->ep->bus->name_registry, conn,
-+				 argv[1].item->str);
-+	return kdbus_args_clear(&args, ret);
-+}
-+
-+static int kdbus_list_write(struct kdbus_conn *conn,
-+			    struct kdbus_conn *c,
-+			    struct kdbus_pool_slice *slice,
-+			    size_t *pos,
-+			    struct kdbus_name_owner *o,
-+			    bool write)
-+{
-+	struct kvec kvec[4];
-+	size_t cnt = 0;
-+	int ret;
-+
-+	/* info header */
-+	struct kdbus_info info = {
-+		.size = 0,
-+		.id = c->id,
-+		.flags = c->flags,
-+	};
-+
-+	/* fake the header of a kdbus_name item */
-+	struct {
-+		u64 size;
-+		u64 type;
-+		u64 flags;
-+	} h = {};
-+
-+	if (o && !kdbus_conn_policy_see_name_unlocked(conn, current_cred(),
-+						      o->name->name))
-+		return 0;
-+
-+	kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &info.size);
-+
-+	/* append name */
-+	if (o) {
-+		size_t slen = strlen(o->name->name) + 1;
-+
-+		h.size = offsetof(struct kdbus_item, name.name) + slen;
-+		h.type = KDBUS_ITEM_OWNED_NAME;
-+		h.flags = o->flags;
-+
-+		kdbus_kvec_set(&kvec[cnt++], &h, sizeof(h), &info.size);
-+		kdbus_kvec_set(&kvec[cnt++], o->name->name, slen, &info.size);
-+		cnt += !!kdbus_kvec_pad(&kvec[cnt], &info.size);
-+	}
-+
-+	if (write) {
-+		ret = kdbus_pool_slice_copy_kvec(slice, *pos, kvec,
-+						 cnt, info.size);
-+		if (ret < 0)
-+			return ret;
-+	}
-+
-+	*pos += info.size;
-+	return 0;
-+}
-+
-+static int kdbus_list_all(struct kdbus_conn *conn, u64 flags,
-+			  struct kdbus_pool_slice *slice,
-+			  size_t *pos, bool write)
-+{
-+	struct kdbus_conn *c;
-+	size_t p = *pos;
-+	int ret, i;
-+
-+	hash_for_each(conn->ep->bus->conn_hash, i, c, hentry) {
-+		bool added = false;
-+
-+		/* skip monitors */
-+		if (kdbus_conn_is_monitor(c))
-+			continue;
-+
-+		/* all names the connection owns */
-+		if (flags & (KDBUS_LIST_NAMES |
-+			     KDBUS_LIST_ACTIVATORS |
-+			     KDBUS_LIST_QUEUED)) {
-+			struct kdbus_name_owner *o;
-+
-+			list_for_each_entry(o, &c->names_list, conn_entry) {
-+				if (o->flags & KDBUS_NAME_ACTIVATOR) {
-+					if (!(flags & KDBUS_LIST_ACTIVATORS))
-+						continue;
-+
-+					ret = kdbus_list_write(conn, c, slice,
-+							       &p, o, write);
-+					if (ret < 0) {
-+						mutex_unlock(&c->lock);
-+						return ret;
-+					}
-+
-+					added = true;
-+				} else if (o->flags & KDBUS_NAME_IN_QUEUE) {
-+					if (!(flags & KDBUS_LIST_QUEUED))
-+						continue;
-+
-+					ret = kdbus_list_write(conn, c, slice,
-+							       &p, o, write);
-+					if (ret < 0) {
-+						mutex_unlock(&c->lock);
-+						return ret;
-+					}
-+
-+					added = true;
-+				} else if (flags & KDBUS_LIST_NAMES) {
-+					ret = kdbus_list_write(conn, c, slice,
-+							       &p, o, write);
-+					if (ret < 0) {
-+						mutex_unlock(&c->lock);
-+						return ret;
-+					}
-+
-+					added = true;
-+				}
-+			}
-+		}
-+
-+		/* nothing added so far, just add the unique ID */
-+		if (!added && (flags & KDBUS_LIST_UNIQUE)) {
-+			ret = kdbus_list_write(conn, c, slice, &p, NULL, write);
-+			if (ret < 0)
-+				return ret;
-+		}
-+	}
-+
-+	*pos = p;
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_cmd_list() - handle KDBUS_CMD_LIST
-+ * @conn:		connection to operate on
-+ * @argp:		command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_list(struct kdbus_conn *conn, void __user *argp)
-+{
-+	struct kdbus_name_registry *reg = conn->ep->bus->name_registry;
-+	struct kdbus_pool_slice *slice = NULL;
-+	struct kdbus_cmd_list *cmd;
-+	size_t pos, size;
-+	int ret;
-+
-+	struct kdbus_arg argv[] = {
-+		{ .type = KDBUS_ITEM_NEGOTIATE },
-+	};
-+	struct kdbus_args args = {
-+		.allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+				 KDBUS_LIST_UNIQUE |
-+				 KDBUS_LIST_NAMES |
-+				 KDBUS_LIST_ACTIVATORS |
-+				 KDBUS_LIST_QUEUED,
-+		.argv = argv,
-+		.argc = ARRAY_SIZE(argv),
-+	};
-+
-+	ret = kdbus_args_parse(&args, argp, &cmd);
-+	if (ret != 0)
-+		return ret;
-+
-+	/* lock order: domain -> bus -> ep -> names -> conn */
-+	down_read(&reg->rwlock);
-+	down_read(&conn->ep->bus->conn_rwlock);
-+	down_read(&conn->ep->policy_db.entries_rwlock);
-+
-+	/* size of records */
-+	size = 0;
-+	ret = kdbus_list_all(conn, cmd->flags, NULL, &size, false);
-+	if (ret < 0)
-+		goto exit_unlock;
-+
-+	if (size == 0) {
-+		kdbus_pool_publish_empty(conn->pool, &cmd->offset,
-+					 &cmd->list_size);
-+	} else {
-+		slice = kdbus_pool_slice_alloc(conn->pool, size, false);
-+		if (IS_ERR(slice)) {
-+			ret = PTR_ERR(slice);
-+			slice = NULL;
-+			goto exit_unlock;
-+		}
-+
-+		/* copy the records */
-+		pos = 0;
-+		ret = kdbus_list_all(conn, cmd->flags, slice, &pos, true);
-+		if (ret < 0)
-+			goto exit_unlock;
-+
-+		WARN_ON(pos != size);
-+		kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->list_size);
-+	}
-+
-+	if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
-+	    kdbus_member_set_user(&cmd->list_size, argp,
-+				  typeof(*cmd), list_size))
-+		ret = -EFAULT;
-+
-+exit_unlock:
-+	up_read(&conn->ep->policy_db.entries_rwlock);
-+	up_read(&conn->ep->bus->conn_rwlock);
-+	up_read(&reg->rwlock);
-+	kdbus_pool_slice_release(slice);
-+	return kdbus_args_clear(&args, ret);
-+}
-diff --git a/ipc/kdbus/names.h b/ipc/kdbus/names.h
-new file mode 100644
-index 0000000..edac59d
---- /dev/null
-+++ b/ipc/kdbus/names.h
-@@ -0,0 +1,105 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_NAMES_H
-+#define __KDBUS_NAMES_H
-+
-+#include <linux/hashtable.h>
-+#include <linux/rwsem.h>
-+
-+struct kdbus_name_entry;
-+struct kdbus_name_owner;
-+struct kdbus_name_registry;
-+
-+/**
-+ * struct kdbus_name_registry - names registered for a bus
-+ * @entries_hash:	Map of entries
-+ * @rwlock:		Registry data lock
-+ * @name_seq_last:	Last used sequence number to assign to a name entry
-+ */
-+struct kdbus_name_registry {
-+	DECLARE_HASHTABLE(entries_hash, 8);
-+	struct rw_semaphore rwlock;
-+	u64 name_seq_last;
-+};
-+
-+/**
-+ * struct kdbus_name_entry - well-known name entry
-+ * @name_id:		sequence number of name entry to be able to uniquely
-+ *			identify a name over its registration lifetime
-+ * @activator:		activator of this name, or NULL
-+ * @queue:		list of queued owners
-+ * @hentry:		entry in registry map
-+ * @name:		well-known name
-+ */
-+struct kdbus_name_entry {
-+	u64 name_id;
-+	struct kdbus_name_owner *activator;
-+	struct list_head queue;
-+	struct hlist_node hentry;
-+	char name[];
-+};
-+
-+/**
-+ * struct kdbus_name_owner - owner of a well-known name
-+ * @flags:		KDBUS_NAME_* flags of this owner
-+ * @conn:		connection owning the name
-+ * @name:		name that is owned
-+ * @conn_entry:		link into @conn
-+ * @name_entry:		link into @name
-+ */
-+struct kdbus_name_owner {
-+	u64 flags;
-+	struct kdbus_conn *conn;
-+	struct kdbus_name_entry *name;
-+	struct list_head conn_entry;
-+	struct list_head name_entry;
-+};
-+
-+bool kdbus_name_is_valid(const char *p, bool allow_wildcard);
-+
-+struct kdbus_name_registry *kdbus_name_registry_new(void);
-+void kdbus_name_registry_free(struct kdbus_name_registry *reg);
-+
-+struct kdbus_name_entry *
-+kdbus_name_lookup_unlocked(struct kdbus_name_registry *reg, const char *name);
-+
-+int kdbus_name_acquire(struct kdbus_name_registry *reg,
-+		       struct kdbus_conn *conn, const char *name,
-+		       u64 flags, u64 *return_flags);
-+void kdbus_name_release_all(struct kdbus_name_registry *reg,
-+			    struct kdbus_conn *conn);
-+
-+int kdbus_cmd_name_acquire(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_name_release(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_list(struct kdbus_conn *conn, void __user *argp);
-+
-+/**
-+ * kdbus_name_get_owner() - get current owner of a name
-+ * @name:	name to get current owner of
-+ *
-+ * This returns a pointer to the current owner of a name (or its activator if
-+ * there is no owner). The caller must make sure @name is valid and does not
-+ * vanish.
-+ *
-+ * Return: Pointer to current owner or NULL if there is none.
-+ */
-+static inline struct kdbus_name_owner *
-+kdbus_name_get_owner(struct kdbus_name_entry *name)
-+{
-+	return list_first_entry_or_null(&name->queue, struct kdbus_name_owner,
-+					name_entry) ? : name->activator;
-+}
-+
-+#endif
-diff --git a/ipc/kdbus/node.c b/ipc/kdbus/node.c
-new file mode 100644
-index 0000000..89f58bc
---- /dev/null
-+++ b/ipc/kdbus/node.c
-@@ -0,0 +1,897 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/atomic.h>
-+#include <linux/fs.h>
-+#include <linux/idr.h>
-+#include <linux/kdev_t.h>
-+#include <linux/rbtree.h>
-+#include <linux/rwsem.h>
-+#include <linux/sched.h>
-+#include <linux/slab.h>
-+#include <linux/wait.h>
-+
-+#include "bus.h"
-+#include "domain.h"
-+#include "endpoint.h"
-+#include "fs.h"
-+#include "handle.h"
-+#include "node.h"
-+#include "util.h"
-+
-+/**
-+ * DOC: kdbus nodes
-+ *
-+ * Nodes unify lifetime management across exposed kdbus objects and provide a
-+ * hierarchy. Each kdbus object that might be exposed to user-space has a
-+ * kdbus_node object embedded and is linked into the hierarchy. Each node can
-+ * have any number (0-n) of child nodes linked. Each child retains a reference
-+ * to its parent node. For root-nodes, the parent is NULL.
-+ *
-+ * Each node object goes through a bunch of states during its lifetime:
-+ *     * NEW
-+ *       * LINKED    (can be skipped by NEW->FREED transition)
-+ *         * ACTIVE  (can be skipped by LINKED->INACTIVE transition)
-+ *       * INACTIVE
-+ *       * DRAINED
-+ *     * FREED
-+ *
-+ * Each node is allocated by the caller and initialized via kdbus_node_init().
-+ * This never fails and sets the object into state NEW. From now on, ref-counts
-+ * on the node manage its lifetime. During init, the ref-count is set to 1. Once
-+ * it drops to 0, the node goes to state FREED and the node->free_cb() callback
-+ * is called to deallocate any memory.
-+ *
-+ * After initializing a node, you usually link it into the hierarchy. You need
-+ * to provide a parent node and a name. The node will be linked as child to the
-+ * parent and a globally unique ID is assigned to the child. The name of the
-+ * child must be unique for all children of this parent. Otherwise, linking the
-+ * child will fail with -EEXIST.
-+ * Note that the child is not marked active, yet. Admittedly, it prevents any
-+ * other node from being linked with the same name (thus, it reserves that
-+ * name), but any child-lookup (via name or unique ID) will never return this
-+ * child unless it has been marked active.
-+ *
-+ * Once successfully linked, you can use kdbus_node_activate() to activate a
-+ * child. This will mark the child active. This state can be skipped by directly
-+ * deactivating the child via kdbus_node_deactivate() (see below).
-+ * By activating a child, you enable any lookups on this child to succeed from
-+ * now on. Furthermore, any code that got its hands on a reference to the node,
-+ * can from now on "acquire" the node.
-+ *
-+ *     Active References (or: 'acquiring' and 'releasing' a node)
-+ *     In addition to normal object references, nodes support something we call
-+ *     "active references". An active reference can be acquired via
-+ *     kdbus_node_acquire() and released via kdbus_node_release(). A caller
-+ *     _must_ own a normal object reference whenever calling those functions.
-+ *     Unlike object references, acquiring an active reference can fail (by
-+ *     returning 'false' from kdbus_node_acquire()). An active reference can
-+ *     only be acquired if the node is marked active. If it is not marked
-+ *     active, yet, or if it was already deactivated, no more active references
-+ *     can be acquired, ever!
-+ *     Active references are used to track tasks working on a node. Whenever a
-+ *     task enters kernel-space to perform an action on a node, it acquires an
-+ *     active reference, performs the action and releases the reference again.
-+ *     While holding an active reference, the node is guaranteed to stay active.
-+ *     If the node is deactivated in parallel, the node is marked as
-+ *     deactivated, then we wait for all active references to be dropped, before
-+ *     we finally proceed with any cleanups. That is, if you hold an active
-+ *     reference to a node, any resources that are bound to the "active" state
-+ *     are guaranteed to stay accessible until you release your reference.
-+ *
-+ *     Active-references are very similar to rw-locks, where acquiring a node is
-+ *     equal to try-read-lock and releasing to read-unlock. Deactivating a node
-+ *     means write-lock and never releasing it again.
-+ *     Unlike rw-locks, the 'active reference' concept is more versatile and
-+ *     avoids unusual rw-lock usage (never releasing a write-lock..).
-+ *
-+ *     It is safe to acquire multiple active-references recursively. But you
-+ *     need to check the return value of kdbus_node_acquire() on _each_ call. It
-+ *     may stop granting references at _any_ time.
-+ *
-+ *     You're free to perform any operations you want while holding an active
-+ *     reference, except sleeping for an indefinite period. Sleeping for a fixed
-+ *     amount of time is fine, but you usually should not wait on wait-queues
-+ *     without a timeout.
-+ *     For example, if you wait for I/O to happen, you should gather all data
-+ *     and schedule the I/O operation, then release your active reference and
-+ *     wait for it to complete. Then try to acquire a new reference. If it
-+ *     fails, perform any cleanup (the node is now dead). Otherwise, you can
-+ *     finish your operation.
-+ *
-+ * All nodes can be deactivated via kdbus_node_deactivate() at any time. You can
-+ * call this multiple times, even in parallel or on nodes that were never
-+ * linked, and it will just work. The only restriction is, you must not hold an
-+ * active reference when calling kdbus_node_deactivate().
-+ * By deactivating a node, it is immediately marked inactive. Then, we wait for
-+ * all active references to be released (called 'draining' the node). This
-+ * shouldn't take very long as we don't perform long-lasting operations while
-+ * holding an active reference. Note that once the node is marked inactive, no
-+ * new active references can be acquired.
-+ * Once all active references are dropped, the node is considered 'drained'. Now
-+ * kdbus_node_deactivate() is called on each child of the node before we
-+ * continue deactivating our node. That is, once all children are entirely
-+ * deactivated, we call ->release_cb() of our node. ->release_cb() can release
-+ * any resources on that node which are bound to the "active" state of a node.
-+ * When done, we unlink the node from its parent rb-tree, mark it as
-+ * 'drained' and return.
-+ * If kdbus_node_deactivate() is called multiple times (even in parallel), all
-+ * but one caller will just wait until the node is fully deactivated. That is,
-+ * one random caller of kdbus_node_deactivate() is selected to call
-+ * ->release_cb() and clean up the node. Only once all this is done, all other
-+ * callers will return from kdbus_node_deactivate(). That is, it doesn't matter
-+ * whether you're the selected caller or not, it will only return after
-+ * everything is fully done.
-+ *
-+ * When a node is activated, we acquire a normal object reference to the node.
-+ * This reference is dropped after deactivation is fully done (and only iff the
-+ * node really was activated). This allows callers to link+activate a child node
-+ * and then drop all refs. The node will be deactivated together with the
-+ * parent, and then be freed when this reference is dropped.
-+ *
-+ * Currently, nodes provide a bunch of resources that external code can use
-+ * directly. This includes:
-+ *
-+ *     * node->waitq: Each node has its own wait-queue that is used to manage
-+ *                    the 'active' state. When a node is deactivated, we wait on
-+ *                    this queue until all active refs are dropped. Analogously,
-+ *                    when you release an active reference on a deactivated
-+ *                    node, and the active ref-count drops to 0, we wake up a
-+ *                    single thread on this queue. Furthermore, once the
-+ *                    ->release_cb() callback finished, we wake up all waiters.
-+ *                    The node-owner is free to re-use this wait-queue for other
-+ *                    purposes. As node-management uses this queue only during
-+ *                    deactivation, it is usually totally fine to re-use the
-+ *                    queue for other, preferably low-overhead, use-cases.
-+ *
-+ *     * node->type: This field defines the type of the owner of this node. It
-+ *                   must be set during node initialization and must remain
-+ *                   constant. The node management never looks at this value,
-+ *                   but external users might use it to gain access to the owner
-+ *                   object of a node.
-+ *                   It is totally up to the owner of the node to define what
-+ *                   their type means. Usually it means you can access the
-+ *                   parent structure via container_of(), as long as you hold an
-+ *                   active reference to the node.
-+ *
-+ *     * node->free_cb:    callback after all references are dropped
-+ *       node->release_cb: callback during node deactivation
-+ *                         These fields must be set by the node owner during
-+ *                         node initialization. They must remain constant. If
-+ *                         NULL, they're skipped.
-+ *
-+ *     * node->mode: filesystem access modes
-+ *       node->uid:  filesystem owner uid
-+ *       node->gid:  filesystem owner gid
-+ *                   These fields must be set by the node owner during node
-+ *                   initialization. They must remain constant and may be
-+ *                   accessed by other callers to properly initialize
-+ *                   filesystem nodes.
-+ *
-+ *     * node->id: This is an unsigned 32bit integer allocated by an IDA. It is
-+ *                 always kept as small as possible during allocation and is
-+ *                 globally unique across all nodes allocated by this module. 0
-+ *                 is reserved as "not assigned" and is the default.
-+ *                 The ID is assigned during kdbus_node_link() and is kept until
-+ *                 the object is freed. Thus, the ID surpasses the active
-+ *                 lifetime of a node. As long as you hold an object reference
-+ *                 to a node (and the node was linked once), the ID is valid and
-+ *                 unique.
-+ *
-+ *     * node->name: name of this node
-+ *       node->hash: 31bit hash-value of @name (range [2..INT_MAX-1])
-+ *                   These values follow the same lifetime rules as node->id.
-+ *                   They're initialized when the node is linked and then remain
-+ *                   constant until the last object reference is dropped.
-+ *                   Unlike the id, the name is only unique across all siblings
-+ *                   and only until the node is deactivated. Currently, the name
-+ *                   is even unique if linked but not activated, yet. This might
-+ *                   change in the future, though. Code should not rely on this.
-+ *
-+ *     * node->lock:     lock to protect node->children, node->rb, node->parent
-+ *     * node->parent: Reference to parent node. This is set during LINK time
-+ *                     and is dropped during destruction. You must not access
-+ *                     it unless you hold an active reference to the node or if
-+ *                     you know the node is dead.
-+ *     * node->children: rb-tree of all linked children of this node. You must
-+ *                       not access this directly, but use one of the iterator
-+ *                       or lookup helpers.
-+ */
-+
-+/*
-+ * Bias values track states of "active references". They're all negative. If a
-+ * node is active, its active-ref-counter is >=0 and tracks all active
-+ * references. Once a node is deactivated, we add NODE_BIAS. This means, the
-+ * counter is now negative but still counts the active references. Once it drops
-+ * to exactly NODE_BIAS, we know all active references were dropped. Exactly one
-+ * thread will change it to NODE_RELEASE now, perform cleanup and then put it
-+ * into NODE_DRAINED. Once drained, all other threads that tried deactivating
-+ * the node will now be woken up (thus, they wait until the node is fully done).
-+ * The initial state during node-setup is NODE_NEW. If a node is directly
-+ * deactivated without having ever been active, it is put into
-+ * NODE_RELEASE_DIRECT instead of NODE_BIAS. This tracks this one-bit state
-+ * across node-deactivation. The task putting it into NODE_RELEASE now knows
-+ * whether the node was active before or not.
-+ *
-+ * Some archs implement atomic_sub(v) with atomic_add(-v), so reserve INT_MIN
-+ * to avoid overflows if multiplied by -1.
-+ */
-+#define KDBUS_NODE_BIAS			(INT_MIN + 5)
-+#define KDBUS_NODE_RELEASE_DIRECT	(KDBUS_NODE_BIAS - 1)
-+#define KDBUS_NODE_RELEASE		(KDBUS_NODE_BIAS - 2)
-+#define KDBUS_NODE_DRAINED		(KDBUS_NODE_BIAS - 3)
-+#define KDBUS_NODE_NEW			(KDBUS_NODE_BIAS - 4)
-+
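The biased counter described above can be modelled in user space. The following is a minimal sketch, assuming C11 atomics in place of the kernel's atomic_t; the demo_* names are hypothetical and not part of the patch, and the locking that the patch does under node->lock is left out, so the mark-inactive step is only safe single-threaded here.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same layout as the patch: all state values sit just above INT_MIN. */
#define NODE_BIAS           (INT_MIN + 5)
#define NODE_RELEASE_DIRECT (NODE_BIAS - 1)
#define NODE_NEW            (NODE_BIAS - 4)

/* Hypothetical stand-in for the "active" field of struct kdbus_node. */
struct demo_node {
	atomic_int active;
};

static void demo_init(struct demo_node *n)
{
	atomic_init(&n->active, NODE_NEW);
}

/* NEW -> active with zero references; returns false if not NEW. */
static bool demo_activate(struct demo_node *n)
{
	int expected = NODE_NEW;

	return atomic_compare_exchange_strong(&n->active, &expected, 0);
}

/* Increment unless negative, in the spirit of atomic_inc_unless_negative(). */
static bool demo_acquire(struct demo_node *n)
{
	int v = atomic_load(&n->active);

	while (v >= 0) {
		/* on failure, v is reloaded and the sign check repeats */
		if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
			return true;
	}
	return false;
}

/* Returns true when the last reference of a deactivated node is dropped. */
static bool demo_release(struct demo_node *n)
{
	return atomic_fetch_sub(&n->active, 1) - 1 == NODE_BIAS;
}

/* Mark inactive; outstanding references stay encoded above NODE_BIAS. */
static void demo_mark_inactive(struct demo_node *n)
{
	int v = atomic_load(&n->active);

	if (v >= 0)
		atomic_fetch_add(&n->active, NODE_BIAS);
	else if (v == NODE_NEW)
		atomic_store(&n->active, NODE_RELEASE_DIRECT);
}

/* Active references still held; 0 for NEW/RELEASE/DRAINED style values. */
static int demo_pending(struct demo_node *n)
{
	int v = atomic_load(&n->active);

	if (v >= 0)
		return v;
	return v >= NODE_BIAS ? v - NODE_BIAS : 0;
}
```

Because the bias is added rather than stored, the number of outstanding references survives the transition to the inactive state, which is what lets the last releaser detect that it should wake the waiter.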
-+/* global unique ID mapping for kdbus nodes */
-+DEFINE_IDA(kdbus_node_ida);
-+
-+/**
-+ * kdbus_node_name_hash() - hash a name
-+ * @name:	The string to hash
-+ *
-+ * This computes the hash of @name. It is guaranteed to be in the range
-+ * [2..INT_MAX-1]. The values 0, 1 and INT_MAX are unused as they are reserved
-+ * for the filesystem code.
-+ *
-+ * Return: hash value of the passed string
-+ */
-+static unsigned int kdbus_node_name_hash(const char *name)
-+{
-+	unsigned int hash;
-+
-+	/* reserve hash numbers 0, 1 and >=INT_MAX for magic directories */
-+	hash = kdbus_strhash(name) & INT_MAX;
-+	if (hash < 2)
-+		hash += 2;
-+	if (hash >= INT_MAX)
-+		hash = INT_MAX - 1;
-+
-+	return hash;
-+}
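The range folding in kdbus_node_name_hash() can be checked in isolation. This is a small sketch, assuming the raw hash is passed in directly because kdbus_strhash() is defined elsewhere in the patch; demo_clamp_hash() is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/*
 * Fold an arbitrary 32-bit value into [2..INT_MAX-1], mirroring the two
 * range checks in kdbus_node_name_hash().
 */
static unsigned int demo_clamp_hash(unsigned int raw)
{
	unsigned int hash = raw & (unsigned int)INT_MAX; /* keep 31 bits */

	if (hash < 2)
		hash += 2;                               /* 0 -> 2, 1 -> 3 */
	if (hash >= (unsigned int)INT_MAX)
		hash = (unsigned int)INT_MAX - 1;        /* top value folded down */

	assert(hash >= 2 && hash <= (unsigned int)INT_MAX - 1);
	return hash;
}
```

Note that the folding is not injective: raw values 0 and 2 both map to 2, which is harmless since names with equal hashes are still told apart by the string comparison.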
-+
-+/**
-+ * kdbus_node_name_compare() - compare a name with a node's name
-+ * @hash:	hash of the string to compare the node with
-+ * @name:	name to compare the node with
-+ * @node:	node to compare the name with
-+ *
-+ * Return: 0 if @name and @hash exactly match the information in @node, or
-+ * an integer less than or greater than zero if @name is found, respectively,
-+ * to be less than or be greater than the string stored in @node.
-+ */
-+static int kdbus_node_name_compare(unsigned int hash, const char *name,
-+				   const struct kdbus_node *node)
-+{
-+	if (hash != node->hash)
-+		return hash - node->hash;
-+
-+	return strcmp(name, node->name);
-+}
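The comparison above orders children by hash first and by name second. Returning the unsigned difference as an int works there because both hashes are 31-bit values. The sketch below is a hypothetical variant (demo_entry and demo_compare are not part of the patch) that uses explicit comparisons instead, so it does not depend on that range.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the hash and name fields of a node. */
struct demo_entry {
	unsigned int hash;
	const char *name;
};

/* Order by hash first; fall back to the name only for equal hashes. */
static int demo_compare(unsigned int hash, const char *name,
			const struct demo_entry *e)
{
	if (hash != e->hash)
		return hash < e->hash ? -1 : 1;

	return strcmp(name, e->name);
}
```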
-+
-+/**
-+ * kdbus_node_init() - initialize a kdbus_node
-+ * @node:	Pointer to the node to initialize
-+ * @type:	The type the node will have (KDBUS_NODE_*)
-+ *
-+ * The caller is responsible for allocating @node and initializing it to zero.
-+ * Once this call returns, you must use the kdbus_node_ref() and
-+ * kdbus_node_unref() functions to manage this node.
-+ */
-+void kdbus_node_init(struct kdbus_node *node, unsigned int type)
-+{
-+	atomic_set(&node->refcnt, 1);
-+	mutex_init(&node->lock);
-+	node->id = 0;
-+	node->type = type;
-+	RB_CLEAR_NODE(&node->rb);
-+	node->children = RB_ROOT;
-+	init_waitqueue_head(&node->waitq);
-+	atomic_set(&node->active, KDBUS_NODE_NEW);
-+}
-+
-+/**
-+ * kdbus_node_link() - link a node into the nodes system
-+ * @node:	Pointer to the node to link
-+ * @parent:	Pointer to a parent node, may be %NULL
-+ * @name:	The name of the node (or NULL if root node)
-+ *
-+ * This links a node into the hierarchy. This must not be called multiple times.
-+ * If @parent is NULL, the node becomes a new root node.
-+ *
-+ * This call will fail if @name is not unique across all its siblings or if no
-+ * ID could be allocated. You must not activate a node if linking failed! It is
-+ * safe to deactivate it, though.
-+ *
-+ * Once you linked a node, you must call kdbus_node_deactivate() before you drop
-+ * the last reference (even if you never activate the node).
-+ *
-+ * Return: 0 on success, negative error code otherwise.
-+ */
-+int kdbus_node_link(struct kdbus_node *node, struct kdbus_node *parent,
-+		    const char *name)
-+{
-+	int ret;
-+
-+	if (WARN_ON(node->type != KDBUS_NODE_DOMAIN && !parent))
-+		return -EINVAL;
-+
-+	if (WARN_ON(parent && !name))
-+		return -EINVAL;
-+
-+	if (name) {
-+		node->name = kstrdup(name, GFP_KERNEL);
-+		if (!node->name)
-+			return -ENOMEM;
-+
-+		node->hash = kdbus_node_name_hash(name);
-+	}
-+
-+	ret = ida_simple_get(&kdbus_node_ida, 1, 0, GFP_KERNEL);
-+	if (ret < 0)
-+		return ret;
-+
-+	node->id = ret;
-+	ret = 0;
-+
-+	if (parent) {
-+		struct rb_node **n, *prev;
-+
-+		if (!kdbus_node_acquire(parent))
-+			return -ESHUTDOWN;
-+
-+		mutex_lock(&parent->lock);
-+
-+		n = &parent->children.rb_node;
-+		prev = NULL;
-+
-+		while (*n) {
-+			struct kdbus_node *pos;
-+			int result;
-+
-+			pos = kdbus_node_from_rb(*n);
-+			prev = *n;
-+			result = kdbus_node_name_compare(node->hash,
-+							 node->name,
-+							 pos);
-+			if (result == 0) {
-+				ret = -EEXIST;
-+				goto exit_unlock;
-+			}
-+
-+			if (result < 0)
-+				n = &pos->rb.rb_left;
-+			else
-+				n = &pos->rb.rb_right;
-+		}
-+
-+		/* add new node and rebalance the tree */
-+		rb_link_node(&node->rb, prev, n);
-+		rb_insert_color(&node->rb, &parent->children);
-+		node->parent = kdbus_node_ref(parent);
-+
-+exit_unlock:
-+		mutex_unlock(&parent->lock);
-+		kdbus_node_release(parent);
-+	}
-+
-+	return ret;
-+}
-+
-+/**
-+ * kdbus_node_ref() - Acquire object reference
-+ * @node:	node to acquire reference to (or NULL)
-+ *
-+ * This acquires a new reference to @node. You must already own a reference when
-+ * calling this!
-+ * If @node is NULL, this is a no-op.
-+ *
-+ * Return: @node is returned
-+ */
-+struct kdbus_node *kdbus_node_ref(struct kdbus_node *node)
-+{
-+	if (node)
-+		atomic_inc(&node->refcnt);
-+	return node;
-+}
-+
-+/**
-+ * kdbus_node_unref() - Drop object reference
-+ * @node:	node to drop reference to (or NULL)
-+ *
-+ * This drops an object reference to @node. You must not access the node if you
-+ * no longer own a reference.
-+ * If the ref-count drops to 0, the object will be destroyed (->free_cb will be
-+ * called).
-+ *
-+ * If you linked or activated the node, you must deactivate the node before you
-+ * drop your last reference! If you didn't link or activate the node, you can
-+ * drop any reference you want.
-+ *
-+ * Note that this calls into ->free_cb() and thus _might_ sleep. The ->free_cb()
-+ * callbacks must not acquire any outer locks, though. So you can safely drop
-+ * references while holding locks.
-+ *
-+ * If @node is NULL, this is a no-op.
-+ *
-+ * Return: This always returns NULL
-+ */
-+struct kdbus_node *kdbus_node_unref(struct kdbus_node *node)
-+{
-+	if (node && atomic_dec_and_test(&node->refcnt)) {
-+		struct kdbus_node safe = *node;
-+
-+		WARN_ON(atomic_read(&node->active) != KDBUS_NODE_DRAINED);
-+		WARN_ON(!RB_EMPTY_NODE(&node->rb));
-+
-+		if (node->free_cb)
-+			node->free_cb(node);
-+		if (safe.id > 0)
-+			ida_simple_remove(&kdbus_node_ida, safe.id);
-+
-+		kfree(safe.name);
-+
-+		/*
-+		 * kdbusfs relies on the parent to be available even after the
-+		 * node was deactivated and unlinked. Therefore, we pin it
-+		 * until a node is destroyed.
-+		 */
-+		kdbus_node_unref(safe.parent);
-+	}
-+
-+	return NULL;
-+}
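kdbus_node_unref() copies the node into a local before calling ->free_cb(), because the callback may free the memory that embeds the node. A minimal user-space sketch of that snapshot pattern follows, assuming a plain int ref-count instead of atomic_t; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_obj {
	int refcnt;
	char *name;                         /* owned by the ref-count layer */
	void (*free_cb)(struct demo_obj *); /* owned by the embedding object */
};

static int demo_cleanups;   /* counts final cleanups, for demonstration */

static void demo_free_cb(struct demo_obj *obj)
{
	free(obj);   /* after this, obj must not be dereferenced */
}

static struct demo_obj *demo_new(const char *name)
{
	struct demo_obj *obj = calloc(1, sizeof(*obj));
	size_t len = strlen(name) + 1;

	if (!obj)
		return NULL;
	obj->name = malloc(len);
	if (!obj->name) {
		free(obj);
		return NULL;
	}
	memcpy(obj->name, name, len);
	obj->refcnt = 1;
	obj->free_cb = demo_free_cb;
	return obj;
}

/* Always returns NULL so callers can write obj = demo_unref(obj). */
static struct demo_obj *demo_unref(struct demo_obj *obj)
{
	if (obj && --obj->refcnt == 0) {
		struct demo_obj safe = *obj;   /* snapshot before free_cb */

		if (obj->free_cb)
			obj->free_cb(obj);

		free(safe.name);               /* use the copy, not obj */
		demo_cleanups++;
	}
	return NULL;
}
```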
-+
-+/**
-+ * kdbus_node_is_active() - test whether a node is active
-+ * @node:	node to test
-+ *
-+ * This checks whether @node is active. That means, @node was linked and
-+ * activated by the node owner and hasn't been deactivated, yet. If, and only
-+ * if, a node is active, kdbus_node_acquire() will be able to acquire active
-+ * references.
-+ *
-+ * Note that this function does not give any lifetime guarantees. After this
-+ * call returns, the node might be deactivated immediately. Normally, what you
-+ * want is to acquire a real active reference via kdbus_node_acquire().
-+ *
-+ * Return: true if @node is active, false otherwise
-+ */
-+bool kdbus_node_is_active(struct kdbus_node *node)
-+{
-+	return atomic_read(&node->active) >= 0;
-+}
-+
-+/**
-+ * kdbus_node_is_deactivated() - test whether a node was already deactivated
-+ * @node:	node to test
-+ *
-+ * This checks whether kdbus_node_deactivate() was called on @node. Note that
-+ * this might be true even if you never deactivated the node directly, but only
-+ * one of its ancestors.
-+ *
-+ * Note that even if this returns 'false', the node might get deactivated
-+ * immediately after the call returns.
-+ *
-+ * Return: true if @node was already deactivated, false if not
-+ */
-+bool kdbus_node_is_deactivated(struct kdbus_node *node)
-+{
-+	int v;
-+
-+	v = atomic_read(&node->active);
-+	return v != KDBUS_NODE_NEW && v < 0;
-+}
-+
-+/**
-+ * kdbus_node_activate() - activate a node
-+ * @node:	node to activate
-+ *
-+ * This marks @node as active if, and only if, the node was neither activated
-+ * nor deactivated yet, and the parent is still active. Any but the first call
-+ * to kdbus_node_activate() is a no-op.
-+ * If you called kdbus_node_deactivate() before, then even the first call to
-+ * kdbus_node_activate() will be a no-op.
-+ *
-+ * This call doesn't give any lifetime guarantees. The node might get
-+ * deactivated immediately after this call returns. Or the parent might already
-+ * be deactivated, which will make this call a no-op.
-+ *
-+ * If this call successfully activated a node, it will take an object reference
-+ * to it. This reference is dropped after the node is deactivated. Therefore,
-+ * the object owner can safely drop their reference to @node iff they know that
-+ * its parent node will get deactivated at some point. Once the parent node is
-+ * deactivated, it will deactivate all its children and thus drop this reference
-+ * again.
-+ *
-+ * Return: True if this call successfully activated the node, otherwise false.
-+ *         Note that this might return false, even if the node is still active
-+ *         (eg., if you called this a second time).
-+ */
-+bool kdbus_node_activate(struct kdbus_node *node)
-+{
-+	bool res = false;
-+
-+	mutex_lock(&node->lock);
-+	if (atomic_read(&node->active) == KDBUS_NODE_NEW) {
-+		atomic_sub(KDBUS_NODE_NEW, &node->active);
-+		/* activated nodes have ref +1 */
-+		kdbus_node_ref(node);
-+		res = true;
-+	}
-+	mutex_unlock(&node->lock);
-+
-+	return res;
-+}
-+
-+/**
-+ * kdbus_node_deactivate() - deactivate a node
-+ * @node:	The node to deactivate.
-+ *
-+ * This function recursively deactivates this node and all its children. It
-+ * returns only once all children and the node itself were recursively disabled
-+ * (even if you call this function multiple times in parallel).
-+ *
-+ * It is safe to call this function on _any_ node that was initialized _any_
-+ * number of times.
-+ *
-+ * This call may sleep, as it waits for all active references to be dropped.
-+ */
-+void kdbus_node_deactivate(struct kdbus_node *node)
-+{
-+	struct kdbus_node *pos, *child;
-+	struct rb_node *rb;
-+	int v_pre, v_post;
-+
-+	pos = node;
-+
-+	/*
-+	 * To avoid recursion, we perform back-tracking while deactivating
-+	 * nodes. For each node we enter, we first mark the active-counter as
-+	 * deactivated by adding BIAS. If the node has children, we set the first
-+	 * child as current position and start over. If the node has no
-+	 * children, we drain the node by waiting for all active refs to be
-+	 * dropped and then releasing the node.
-+	 *
-+	 * After the node is released, we set its parent as current position
-+	 * and start over. If the current position was the initial node, we're
-+	 * done.
-+	 *
-+	 * Note that this function can be called in parallel by multiple
-+	 * callers. We make sure that each node is only released once, and any
-+	 * racing caller will wait until the other thread fully released that
-+	 * node.
-+	 */
-+
-+	for (;;) {
-+		/*
-+		 * Add BIAS to node->active to mark it as inactive. If it was
-+		 * never active before, immediately mark it as RELEASE_DIRECT
-+		 * so we remember this state.
-+		 * We cannot remember v_pre as we might iterate into the
-+		 * children, overwriting v_pre, before we can release our node.
-+		 */
-+		mutex_lock(&pos->lock);
-+		v_pre = atomic_read(&pos->active);
-+		if (v_pre >= 0)
-+			atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
-+		else if (v_pre == KDBUS_NODE_NEW)
-+			atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
-+		mutex_unlock(&pos->lock);
-+
-+		/* wait until all active references were dropped */
-+		wait_event(pos->waitq,
-+			   atomic_read(&pos->active) <= KDBUS_NODE_BIAS);
-+
-+		mutex_lock(&pos->lock);
-+		/* recurse into first child if any */
-+		rb = rb_first(&pos->children);
-+		if (rb) {
-+			child = kdbus_node_ref(kdbus_node_from_rb(rb));
-+			mutex_unlock(&pos->lock);
-+			pos = child;
-+			continue;
-+		}
-+
-+		/* mark object as RELEASE */
-+		v_post = atomic_read(&pos->active);
-+		if (v_post == KDBUS_NODE_BIAS ||
-+		    v_post == KDBUS_NODE_RELEASE_DIRECT)
-+			atomic_set(&pos->active, KDBUS_NODE_RELEASE);
-+		mutex_unlock(&pos->lock);
-+
-+		/*
-+		 * If this is the thread that marked the object as RELEASE, we
-+		 * perform the actual release. Otherwise, we wait until the
-+		 * release is done and the node is marked as DRAINED.
-+		 */
-+		if (v_post == KDBUS_NODE_BIAS ||
-+		    v_post == KDBUS_NODE_RELEASE_DIRECT) {
-+			if (pos->release_cb)
-+				pos->release_cb(pos, v_post == KDBUS_NODE_BIAS);
-+
-+			if (pos->parent) {
-+				mutex_lock(&pos->parent->lock);
-+				if (!RB_EMPTY_NODE(&pos->rb)) {
-+					rb_erase(&pos->rb,
-+						 &pos->parent->children);
-+					RB_CLEAR_NODE(&pos->rb);
-+				}
-+				mutex_unlock(&pos->parent->lock);
-+			}
-+
-+			/* mark as DRAINED */
-+			atomic_set(&pos->active, KDBUS_NODE_DRAINED);
-+			wake_up_all(&pos->waitq);
-+
-+			/* drop VFS cache */
-+			kdbus_fs_flush(pos);
-+
-+			/*
-+			 * If the node was activated and someone added BIAS to
-+			 * it to deactivate it, we, and only we, are responsible
-+			 * for releasing the extra ref-count that was taken once
-+			 * in kdbus_node_activate().
-+			 * If the node was never activated, no-one ever added
-+			 * BIAS, but instead skipped that state and
-+			 * immediately went to NODE_RELEASE_DIRECT. In that case
-+			 * we must not drop the reference.
-+			 */
-+			if (v_post == KDBUS_NODE_BIAS)
-+				kdbus_node_unref(pos);
-+		} else {
-+			/* wait until object is DRAINED */
-+			wait_event(pos->waitq,
-+			    atomic_read(&pos->active) == KDBUS_NODE_DRAINED);
-+		}
-+
-+		/*
-+		 * We're done with the current node. Continue on its parent
-+		 * again, which will try deactivating its next child, or itself
-+		 * if no child is left.
-+		 * If we've reached our initial node again, we are done and
-+		 * can safely return.
-+		 */
-+		if (pos == node)
-+			break;
-+
-+		child = pos;
-+		pos = pos->parent;
-+		kdbus_node_unref(child);
-+	}
-+}
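The back-tracking loop in kdbus_node_deactivate() is a children-first traversal that uses parent pointers instead of recursion: descend while a child exists, release the leaf, unlink it, then step back to the parent. The sketch below shows only that control flow, assuming a simple sibling list instead of an rb-tree and no locking, ref-counting or waiting; the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct demo_tnode {
	struct demo_tnode *parent;
	struct demo_tnode *first_child;
	struct demo_tnode *next_sibling;
	int id;
};

static void demo_add_child(struct demo_tnode *parent, struct demo_tnode *child)
{
	child->parent = parent;
	child->next_sibling = parent->first_child;
	parent->first_child = child;
}

/* Remove n from its parent's child list; the parent pointer is kept. */
static void demo_unlink(struct demo_tnode *n)
{
	struct demo_tnode **link;

	if (!n->parent)
		return;
	for (link = &n->parent->first_child; *link;
	     link = &(*link)->next_sibling) {
		if (*link == n) {
			*link = n->next_sibling;
			break;
		}
	}
	n->next_sibling = NULL;
}

/*
 * Release the subtree at root children-first without recursion. The ids
 * are recorded in release order; the number of released nodes is returned.
 */
static size_t demo_teardown(struct demo_tnode *root, int *order, size_t cap)
{
	struct demo_tnode *pos = root;
	size_t n = 0;

	for (;;) {
		if (pos->first_child) {      /* descend into first child */
			pos = pos->first_child;
			continue;
		}

		if (n < cap)                 /* no children left: release */
			order[n] = pos->id;
		n++;
		demo_unlink(pos);

		if (pos == root)             /* back at the start: done */
			break;
		pos = pos->parent;           /* back-track */
	}
	return n;
}
```

Since every released node is unlinked before the loop returns to its parent, the parent's first child is always the next unreleased one, so no per-node iteration state is needed.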
-+
-+/**
-+ * kdbus_node_acquire() - Acquire an active ref on a node
-+ * @node:	The node
-+ *
-+ * This acquires an active-reference to @node. This will only succeed if the
-+ * node is active. You must release this active reference via
-+ * kdbus_node_release() again.
-+ *
-+ * See the introduction to "active references" for more details.
-+ *
-+ * Return: %true if @node was non-NULL and active
-+ */
-+bool kdbus_node_acquire(struct kdbus_node *node)
-+{
-+	return node && atomic_inc_unless_negative(&node->active);
-+}
-+
-+/**
-+ * kdbus_node_release() - Release an active ref on a node
-+ * @node:	The node
-+ *
-+ * This releases an active reference that was previously acquired via
-+ * kdbus_node_acquire(). See kdbus_node_acquire() for details.
-+ */
-+void kdbus_node_release(struct kdbus_node *node)
-+{
-+	if (node && atomic_dec_return(&node->active) == KDBUS_NODE_BIAS)
-+		wake_up(&node->waitq);
-+}
-+
-+/**
-+ * kdbus_node_find_child() - Find child by name
-+ * @node:	parent node to search through
-+ * @name:	name of child node
-+ *
-+ * This searches through all children of @node for a child-node with name @name.
-+ * If not found, or if the child is deactivated, NULL is returned. Otherwise,
-+ * the child is acquired and a new reference is returned.
-+ *
-+ * If you're done with the child, you need to release it and drop your
-+ * reference.
-+ *
-+ * This function does not acquire the parent node. However, if the parent was
-+ * already deactivated, then kdbus_node_deactivate() will, at some point, also
-+ * deactivate the child. Therefore, we can rely on the explicit ordering during
-+ * deactivation.
-+ *
-+ * Return: Reference to acquired child node, or NULL if not found / not active.
-+ */
-+struct kdbus_node *kdbus_node_find_child(struct kdbus_node *node,
-+					 const char *name)
-+{
-+	struct kdbus_node *child;
-+	struct rb_node *rb;
-+	unsigned int hash;
-+	int ret;
-+
-+	hash = kdbus_node_name_hash(name);
-+
-+	mutex_lock(&node->lock);
-+	rb = node->children.rb_node;
-+	while (rb) {
-+		child = kdbus_node_from_rb(rb);
-+		ret = kdbus_node_name_compare(hash, name, child);
-+		if (ret < 0)
-+			rb = rb->rb_left;
-+		else if (ret > 0)
-+			rb = rb->rb_right;
-+		else
-+			break;
-+	}
-+	if (rb && kdbus_node_acquire(child))
-+		kdbus_node_ref(child);
-+	else
-+		child = NULL;
-+	mutex_unlock(&node->lock);
-+
-+	return child;
-+}
-+
-+static struct kdbus_node *node_find_closest_unlocked(struct kdbus_node *node,
-+						     unsigned int hash,
-+						     const char *name)
-+{
-+	struct kdbus_node *n, *pos = NULL;
-+	struct rb_node *rb;
-+	int res;
-+
-+	/*
-+	 * Find the closest child with ``node->hash >= hash'', or, if @name is
-+	 * valid, ``node->name >= name'' (where '>=' is the lex. order).
-+	 */
-+
-+	rb = node->children.rb_node;
-+	while (rb) {
-+		n = kdbus_node_from_rb(rb);
-+
-+		if (name)
-+			res = kdbus_node_name_compare(hash, name, n);
-+		else
-+			res = hash - n->hash;
-+
-+		if (res <= 0) {
-+			rb = rb->rb_left;
-+			pos = n;
-+		} else { /* ``hash > n->hash'', ``name > n->name'' */
-+			rb = rb->rb_right;
-+		}
-+	}
-+
-+	return pos;
-+}
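node_find_closest_unlocked() is a lower-bound search: it remembers the last node that compared greater than or equal to the key while walking left, and discards candidates while walking right. A minimal sketch of the same walk on a plain binary search tree of integers, with hypothetical demo_* names and no rb-tree balancing:

```c
#include <assert.h>
#include <stddef.h>

struct demo_bst {
	struct demo_bst *left, *right;
	unsigned int key;
};

/* Return the left-most node with node->key >= key, or NULL if none. */
static struct demo_bst *demo_lower_bound(struct demo_bst *root,
					 unsigned int key)
{
	struct demo_bst *best = NULL;

	while (root) {
		if (key <= root->key) {
			best = root;         /* candidate; try to improve */
			root = root->left;
		} else {
			root = root->right;  /* too small, go right */
		}
	}
	return best;
}
```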
-+
-+/**
-+ * kdbus_node_find_closest() - Find closest child-match
-+ * @node:	parent node to search through
-+ * @hash:	hash value to find closest match for
-+ *
-+ * Find the closest child of @node with a hash greater than or equal to @hash.
-+ * The closest match is the left-most child of @node with this property. Which
-+ * means, it is the first child with that hash returned by
-+ * kdbus_node_next_child(), if you'd iterate the whole parent node.
-+ *
-+ * Return: Reference to acquired child, or NULL if none found.
-+ */
-+struct kdbus_node *kdbus_node_find_closest(struct kdbus_node *node,
-+					   unsigned int hash)
-+{
-+	struct kdbus_node *child;
-+	struct rb_node *rb;
-+
-+	mutex_lock(&node->lock);
-+
-+	child = node_find_closest_unlocked(node, hash, NULL);
-+	while (child && !kdbus_node_acquire(child)) {
-+		rb = rb_next(&child->rb);
-+		if (rb)
-+			child = kdbus_node_from_rb(rb);
-+		else
-+			child = NULL;
-+	}
-+	kdbus_node_ref(child);
-+
-+	mutex_unlock(&node->lock);
-+
-+	return child;
-+}
-+
-+/**
-+ * kdbus_node_next_child() - Acquire next child
-+ * @node:	parent node
-+ * @prev:	previous child-node position or NULL
-+ *
-+ * This function returns a reference to the next active child of @node, after
-+ * the passed position @prev. If @prev is NULL, a reference to the first active
-+ * child is returned. If no more active children are found, NULL is returned.
-+ *
-+ * This function acquires the next child it returns. If you're done with the
-+ * returned pointer, you need to release _and_ unref it.
-+ *
-+ * The passed in pointer @prev is not modified by this function, and it does
-+ * *not* have to be active. If @prev was acquired via different means, or if it
-+ * was unlinked from its parent before you pass it in, then this iterator will
-+ * still return the next active child (it will have to search through the
-+ * rb-tree based on the node-name, though).
-+ * However, @prev must not be linked to a different parent than @node!
-+ *
-+ * Return: Reference to next acquired child, or NULL if at the end.
-+ */
-+struct kdbus_node *kdbus_node_next_child(struct kdbus_node *node,
-+					 struct kdbus_node *prev)
-+{
-+	struct kdbus_node *pos = NULL;
-+	struct rb_node *rb;
-+
-+	mutex_lock(&node->lock);
-+
-+	if (!prev) {
-+		/*
-+		 * New iteration; find first node in rb-tree and try to acquire
-+		 * it. If we got it, directly return it as first element.
-+		 * Otherwise, the loop below will find the next active node.
-+		 */
-+		rb = rb_first(&node->children);
-+		if (!rb)
-+			goto exit;
-+		pos = kdbus_node_from_rb(rb);
-+		if (kdbus_node_acquire(pos))
-+			goto exit;
-+	} else if (RB_EMPTY_NODE(&prev->rb)) {
-+		/*
-+		 * The current iterator is no longer linked to the rb-tree. Use
-+		 * its hash value and name to find the next _higher_ node and
-+		 * acquire it. If we got it, return it as next element.
-+		 * Otherwise, the loop below will find the next active node.
-+		 */
-+		pos = node_find_closest_unlocked(node, prev->hash, prev->name);
-+		if (!pos)
-+			goto exit;
-+		if (kdbus_node_acquire(pos))
-+			goto exit;
-+	} else {
-+		/*
-+		 * The current iterator is still linked to the parent. Set it
-+		 * as current position and use the loop below to find the next
-+		 * active element.
-+		 */
-+		pos = prev;
-+	}
-+
-+	/* @pos was already returned or is inactive; find next active node */
-+	do {
-+		rb = rb_next(&pos->rb);
-+		if (rb)
-+			pos = kdbus_node_from_rb(rb);
-+		else
-+			pos = NULL;
-+	} while (pos && !kdbus_node_acquire(pos));
-+
-+exit:
-+	/* @pos is NULL or acquired. Take ref if non-NULL and return it */
-+	kdbus_node_ref(pos);
-+	mutex_unlock(&node->lock);
-+	return pos;
-+}
-diff --git a/ipc/kdbus/node.h b/ipc/kdbus/node.h
-new file mode 100644
-index 0000000..970e02b
---- /dev/null
-+++ b/ipc/kdbus/node.h
-@@ -0,0 +1,86 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_NODE_H
-+#define __KDBUS_NODE_H
-+
-+#include <linux/atomic.h>
-+#include <linux/kernel.h>
-+#include <linux/mutex.h>
-+#include <linux/wait.h>
-+
-+struct kdbus_node;
-+
-+enum kdbus_node_type {
-+	KDBUS_NODE_DOMAIN,
-+	KDBUS_NODE_CONTROL,
-+	KDBUS_NODE_BUS,
-+	KDBUS_NODE_ENDPOINT,
-+};
-+
-+typedef void (*kdbus_node_free_t) (struct kdbus_node *node);
-+typedef void (*kdbus_node_release_t) (struct kdbus_node *node, bool was_active);
-+
-+struct kdbus_node {
-+	atomic_t refcnt;
-+	atomic_t active;
-+	wait_queue_head_t waitq;
-+
-+	/* static members */
-+	unsigned int type;
-+	kdbus_node_free_t free_cb;
-+	kdbus_node_release_t release_cb;
-+	umode_t mode;
-+	kuid_t uid;
-+	kgid_t gid;
-+
-+	/* valid once linked */
-+	char *name;
-+	unsigned int hash;
-+	unsigned int id;
-+	struct kdbus_node *parent; /* may be NULL */
-+
-+	/* valid iff active */
-+	struct mutex lock;
-+	struct rb_node rb;
-+	struct rb_root children;
-+};
-+
-+#define kdbus_node_from_rb(_node) rb_entry((_node), struct kdbus_node, rb)
-+
-+extern struct ida kdbus_node_ida;
-+
-+void kdbus_node_init(struct kdbus_node *node, unsigned int type);
-+
-+int kdbus_node_link(struct kdbus_node *node, struct kdbus_node *parent,
-+		    const char *name);
-+
-+struct kdbus_node *kdbus_node_ref(struct kdbus_node *node);
-+struct kdbus_node *kdbus_node_unref(struct kdbus_node *node);
-+
-+bool kdbus_node_is_active(struct kdbus_node *node);
-+bool kdbus_node_is_deactivated(struct kdbus_node *node);
-+bool kdbus_node_activate(struct kdbus_node *node);
-+void kdbus_node_deactivate(struct kdbus_node *node);
-+
-+bool kdbus_node_acquire(struct kdbus_node *node);
-+void kdbus_node_release(struct kdbus_node *node);
-+
-+struct kdbus_node *kdbus_node_find_child(struct kdbus_node *node,
-+					 const char *name);
-+struct kdbus_node *kdbus_node_find_closest(struct kdbus_node *node,
-+					   unsigned int hash);
-+struct kdbus_node *kdbus_node_next_child(struct kdbus_node *node,
-+					 struct kdbus_node *prev);
-+
-+#endif
-diff --git a/ipc/kdbus/notify.c b/ipc/kdbus/notify.c
-new file mode 100644
-index 0000000..375758c
---- /dev/null
-+++ b/ipc/kdbus/notify.c
-@@ -0,0 +1,204 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/fs.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/spinlock.h>
-+#include <linux/sched.h>
-+#include <linux/slab.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "domain.h"
-+#include "endpoint.h"
-+#include "item.h"
-+#include "message.h"
-+#include "notify.h"
-+
-+static inline void kdbus_notify_add_tail(struct kdbus_staging *staging,
-+					 struct kdbus_bus *bus)
-+{
-+	spin_lock(&bus->notify_lock);
-+	list_add_tail(&staging->notify_entry, &bus->notify_list);
-+	spin_unlock(&bus->notify_lock);
-+}
-+
-+static int kdbus_notify_reply(struct kdbus_bus *bus, u64 id,
-+			      u64 cookie, u64 msg_type)
-+{
-+	struct kdbus_staging *s;
-+
-+	s = kdbus_staging_new_kernel(bus, id, cookie, 0, msg_type);
-+	if (IS_ERR(s))
-+		return PTR_ERR(s);
-+
-+	kdbus_notify_add_tail(s, bus);
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_notify_reply_timeout() - queue a timeout reply
-+ * @bus:		Bus which queues the messages
-+ * @id:			The destination's connection ID
-+ * @cookie:		The cookie to set in the reply.
-+ *
-+ * Queues a message that has a KDBUS_ITEM_REPLY_TIMEOUT item attached.
-+ *
-+ * Return: 0 on success, negative errno on failure.
-+ */
-+int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie)
-+{
-+	return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_TIMEOUT);
-+}
-+
-+/**
-+ * kdbus_notify_reply_dead() - queue a 'dead' reply
-+ * @bus:		Bus which queues the messages
-+ * @id:			The destination's connection ID
-+ * @cookie:		The cookie to set in the reply.
-+ *
-+ * Queues a message that has a KDBUS_ITEM_REPLY_DEAD item attached.
-+ *
-+ * Return: 0 on success, negative errno on failure.
-+ */
-+int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie)
-+{
-+	return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_DEAD);
-+}
-+
-+/**
-+ * kdbus_notify_name_change() - queue a notification about a name owner change
-+ * @bus:		Bus which queues the messages
-+ * @type:		The type of the notification; KDBUS_ITEM_NAME_ADD,
-+ *			KDBUS_ITEM_NAME_CHANGE or KDBUS_ITEM_NAME_REMOVE
-+ * @old_id:		The id of the connection that used to own the name
-+ * @new_id:		The id of the new owner connection
-+ * @old_flags:		The flags to pass in the KDBUS_ITEM flags field for
-+ *			the old owner
-+ * @new_flags:		The flags to pass in the KDBUS_ITEM flags field for
-+ *			the new owner
-+ * @name:		The name that was removed or assigned to a new owner
-+ *
-+ * Return: 0 on success, negative errno on failure.
-+ */
-+int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
-+			     u64 old_id, u64 new_id,
-+			     u64 old_flags, u64 new_flags,
-+			     const char *name)
-+{
-+	size_t name_len, extra_size;
-+	struct kdbus_staging *s;
-+
-+	name_len = strlen(name) + 1;
-+	extra_size = sizeof(struct kdbus_notify_name_change) + name_len;
-+
-+	s = kdbus_staging_new_kernel(bus, KDBUS_DST_ID_BROADCAST, 0,
-+				     extra_size, type);
-+	if (IS_ERR(s))
-+		return PTR_ERR(s);
-+
-+	s->notify->name_change.old_id.id = old_id;
-+	s->notify->name_change.old_id.flags = old_flags;
-+	s->notify->name_change.new_id.id = new_id;
-+	s->notify->name_change.new_id.flags = new_flags;
-+	memcpy(s->notify->name_change.name, name, name_len);
-+
-+	kdbus_notify_add_tail(s, bus);
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_notify_id_change() - queue a notification about a unique ID change
-+ * @bus:		Bus which queues the messages
-+ * @type:		The type of the notification; KDBUS_ITEM_ID_ADD or
-+ *			KDBUS_ITEM_ID_REMOVE
-+ * @id:			The id of the connection that was added or removed
-+ * @flags:		The flags to pass in the KDBUS_ITEM flags field
-+ *
-+ * Return: 0 on success, negative errno on failure.
-+ */
-+int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags)
-+{
-+	struct kdbus_staging *s;
-+	size_t extra_size;
-+
-+	extra_size = sizeof(struct kdbus_notify_id_change);
-+	s = kdbus_staging_new_kernel(bus, KDBUS_DST_ID_BROADCAST, 0,
-+				     extra_size, type);
-+	if (IS_ERR(s))
-+		return PTR_ERR(s);
-+
-+	s->notify->id_change.id = id;
-+	s->notify->id_change.flags = flags;
-+
-+	kdbus_notify_add_tail(s, bus);
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_notify_flush() - send a list of collected messages
-+ * @bus:		Bus which queues the messages
-+ *
-+ * The list is empty after sending the messages.
-+ */
-+void kdbus_notify_flush(struct kdbus_bus *bus)
-+{
-+	LIST_HEAD(notify_list);
-+	struct kdbus_staging *s, *tmp;
-+
-+	mutex_lock(&bus->notify_flush_lock);
-+	down_read(&bus->name_registry->rwlock);
-+
-+	spin_lock(&bus->notify_lock);
-+	list_splice_init(&bus->notify_list, &notify_list);
-+	spin_unlock(&bus->notify_lock);
-+
-+	list_for_each_entry_safe(s, tmp, &notify_list, notify_entry) {
-+		if (s->msg->dst_id != KDBUS_DST_ID_BROADCAST) {
-+			struct kdbus_conn *conn;
-+
-+			conn = kdbus_bus_find_conn_by_id(bus, s->msg->dst_id);
-+			if (conn) {
-+				kdbus_bus_eavesdrop(bus, NULL, s);
-+				kdbus_conn_entry_insert(NULL, conn, s, NULL,
-+							NULL);
-+				kdbus_conn_unref(conn);
-+			}
-+		} else {
-+			kdbus_bus_broadcast(bus, NULL, s);
-+		}
-+
-+		list_del(&s->notify_entry);
-+		kdbus_staging_free(s);
-+	}
-+
-+	up_read(&bus->name_registry->rwlock);
-+	mutex_unlock(&bus->notify_flush_lock);
-+}
-+
-+/**
-+ * kdbus_notify_free() - free a list of collected messages
-+ * @bus:		Bus which queues the messages
-+ */
-+void kdbus_notify_free(struct kdbus_bus *bus)
-+{
-+	struct kdbus_staging *s, *tmp;
-+
-+	list_for_each_entry_safe(s, tmp, &bus->notify_list, notify_entry) {
-+		list_del(&s->notify_entry);
-+		kdbus_staging_free(s);
-+	}
-+}
-diff --git a/ipc/kdbus/notify.h b/ipc/kdbus/notify.h
-new file mode 100644
-index 0000000..03df464
---- /dev/null
-+++ b/ipc/kdbus/notify.h
-@@ -0,0 +1,30 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_NOTIFY_H
-+#define __KDBUS_NOTIFY_H
-+
-+struct kdbus_bus;
-+
-+int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags);
-+int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie);
-+int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie);
-+int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
-+			     u64 old_id, u64 new_id,
-+			     u64 old_flags, u64 new_flags,
-+			     const char *name);
-+void kdbus_notify_flush(struct kdbus_bus *bus);
-+void kdbus_notify_free(struct kdbus_bus *bus);
-+
-+#endif
-diff --git a/ipc/kdbus/policy.c b/ipc/kdbus/policy.c
-new file mode 100644
-index 0000000..f2618e15
---- /dev/null
-+++ b/ipc/kdbus/policy.c
-@@ -0,0 +1,489 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/dcache.h>
-+#include <linux/fs.h>
-+#include <linux/init.h>
-+#include <linux/mutex.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "domain.h"
-+#include "item.h"
-+#include "names.h"
-+#include "policy.h"
-+
-+#define KDBUS_POLICY_HASH_SIZE	64
-+
-+/**
-+ * struct kdbus_policy_db_entry_access - a database entry access item
-+ * @type:		One of KDBUS_POLICY_ACCESS_* types
-+ * @access:		Access to grant. One of KDBUS_POLICY_*
-+ * @uid:		For KDBUS_POLICY_ACCESS_USER, the global uid
-+ * @gid:		For KDBUS_POLICY_ACCESS_GROUP, the global gid
-+ * @list:		List entry item for the entry's list
-+ *
-+ * This is the internal version of struct kdbus_policy_db_access.
-+ */
-+struct kdbus_policy_db_entry_access {
-+	u8 type;		/* USER, GROUP, WORLD */
-+	u8 access;		/* OWN, TALK, SEE */
-+	union {
-+		kuid_t uid;	/* global uid */
-+		kgid_t gid;	/* global gid */
-+	};
-+	struct list_head list;
-+};
-+
-+/**
-+ * struct kdbus_policy_db_entry - a policy database entry
-+ * @name:		The name to match the policy entry against
-+ * @hentry:		The hash entry for the database's entries_hash
-+ * @access_list:	List head for keeping track of the entry's
-+ *			access items.
-+ * @owner:		The owner of this entry. Can be a kdbus_conn or
-+ *			a kdbus_ep object.
-+ * @wildcard:		The name is a wildcard, such as one ending in '.*'
-+ */
-+struct kdbus_policy_db_entry {
-+	char *name;
-+	struct hlist_node hentry;
-+	struct list_head access_list;
-+	const void *owner;
-+	bool wildcard:1;
-+};
-+
-+static void kdbus_policy_entry_free(struct kdbus_policy_db_entry *e)
-+{
-+	struct kdbus_policy_db_entry_access *a, *tmp;
-+
-+	list_for_each_entry_safe(a, tmp, &e->access_list, list) {
-+		list_del(&a->list);
-+		kfree(a);
-+	}
-+
-+	kfree(e->name);
-+	kfree(e);
-+}
-+
-+static unsigned int kdbus_strnhash(const char *str, size_t len)
-+{
-+	unsigned long hash = init_name_hash();
-+
-+	while (len--)
-+		hash = partial_name_hash(*str++, hash);
-+
-+	return end_name_hash(hash);
-+}
-+
-+static const struct kdbus_policy_db_entry *
-+kdbus_policy_lookup(struct kdbus_policy_db *db, const char *name, u32 hash)
-+{
-+	struct kdbus_policy_db_entry *e;
-+	const char *dot;
-+	size_t len;
-+
-+	/* find exact match */
-+	hash_for_each_possible(db->entries_hash, e, hentry, hash)
-+		if (strcmp(e->name, name) == 0 && !e->wildcard)
-+			return e;
-+
-+	/* find wildcard match */
-+
-+	dot = strrchr(name, '.');
-+	if (!dot)
-+		return NULL;
-+
-+	len = dot - name;
-+	hash = kdbus_strnhash(name, len);
-+
-+	hash_for_each_possible(db->entries_hash, e, hentry, hash)
-+		if (e->wildcard && !strncmp(e->name, name, len) &&
-+		    !e->name[len])
-+			return e;
-+
-+	return NULL;
-+}
-+
-+/**
-+ * kdbus_policy_db_clear() - release all memory from a policy db
-+ * @db:		The policy database
-+ */
-+void kdbus_policy_db_clear(struct kdbus_policy_db *db)
-+{
-+	struct kdbus_policy_db_entry *e;
-+	struct hlist_node *tmp;
-+	unsigned int i;
-+
-+	/* purge entries */
-+	down_write(&db->entries_rwlock);
-+	hash_for_each_safe(db->entries_hash, i, tmp, e, hentry) {
-+		hash_del(&e->hentry);
-+		kdbus_policy_entry_free(e);
-+	}
-+	up_write(&db->entries_rwlock);
-+}
-+
-+/**
-+ * kdbus_policy_db_init() - initialize a new policy database
-+ * @db:		The location of the database
-+ *
-+ * This initializes a new policy-db. The underlying memory must have been
-+ * cleared to zero by the caller.
-+ */
-+void kdbus_policy_db_init(struct kdbus_policy_db *db)
-+{
-+	hash_init(db->entries_hash);
-+	init_rwsem(&db->entries_rwlock);
-+}
-+
-+/**
-+ * kdbus_policy_query_unlocked() - Query the policy database
-+ * @db:		Policy database
-+ * @cred:	Credentials to test against
-+ * @name:	Name to query
-+ * @hash:	Hash value of @name
-+ *
-+ * Same as kdbus_policy_query() but requires the caller to lock the policy
-+ * database against concurrent writes.
-+ *
-+ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
-+ */
-+int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
-+				const struct cred *cred, const char *name,
-+				unsigned int hash)
-+{
-+	struct kdbus_policy_db_entry_access *a;
-+	const struct kdbus_policy_db_entry *e;
-+	int i, highest = -EPERM;
-+
-+	e = kdbus_policy_lookup(db, name, hash);
-+	if (!e)
-+		return -EPERM;
-+
-+	list_for_each_entry(a, &e->access_list, list) {
-+		if ((int)a->access <= highest)
-+			continue;
-+
-+		switch (a->type) {
-+		case KDBUS_POLICY_ACCESS_USER:
-+			if (uid_eq(cred->euid, a->uid))
-+				highest = a->access;
-+			break;
-+		case KDBUS_POLICY_ACCESS_GROUP:
-+			if (gid_eq(cred->egid, a->gid)) {
-+				highest = a->access;
-+				break;
-+			}
-+
-+			for (i = 0; i < cred->group_info->ngroups; i++) {
-+				kgid_t gid = GROUP_AT(cred->group_info, i);
-+
-+				if (gid_eq(gid, a->gid)) {
-+					highest = a->access;
-+					break;
-+				}
-+			}
-+
-+			break;
-+		case KDBUS_POLICY_ACCESS_WORLD:
-+			highest = a->access;
-+			break;
-+		}
-+
-+		/* OWN is the highest possible policy */
-+		if (highest >= KDBUS_POLICY_OWN)
-+			break;
-+	}
-+
-+	return highest;
-+}
-+
-+/**
-+ * kdbus_policy_query() - Query the policy database
-+ * @db:		Policy database
-+ * @cred:	Credentials to test against
-+ * @name:	Name to query
-+ * @hash:	Hash value of @name
-+ *
-+ * Query the policy database @db for the access rights of @cred to the name
-+ * @name. The access rights of @cred are returned, or -EPERM if no access is
-+ * granted.
-+ *
-+ * This call effectively searches for the highest access-right granted to
-+ * @cred. The caller should really cache those as policy lookups are rather
-+ * expensive.
-+ *
-+ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
-+ */
-+int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
-+		       const char *name, unsigned int hash)
-+{
-+	int ret;
-+
-+	down_read(&db->entries_rwlock);
-+	ret = kdbus_policy_query_unlocked(db, cred, name, hash);
-+	up_read(&db->entries_rwlock);
-+
-+	return ret;
-+}
-+
-+static void __kdbus_policy_remove_owner(struct kdbus_policy_db *db,
-+					const void *owner)
-+{
-+	struct kdbus_policy_db_entry *e;
-+	struct hlist_node *tmp;
-+	int i;
-+
-+	hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
-+		if (e->owner == owner) {
-+			hash_del(&e->hentry);
-+			kdbus_policy_entry_free(e);
-+		}
-+}
-+
-+/**
-+ * kdbus_policy_remove_owner() - remove all entries related to a connection
-+ * @db:		The policy database
-+ * @owner:	The connection whose items to remove
-+ */
-+void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
-+			       const void *owner)
-+{
-+	down_write(&db->entries_rwlock);
-+	__kdbus_policy_remove_owner(db, owner);
-+	up_write(&db->entries_rwlock);
-+}
-+
-+/*
-+ * Convert user provided policy access to internal kdbus policy
-+ * access
-+ */
-+static struct kdbus_policy_db_entry_access *
-+kdbus_policy_make_access(const struct kdbus_policy_access *uaccess)
-+{
-+	int ret;
-+	struct kdbus_policy_db_entry_access *a;
-+
-+	a = kzalloc(sizeof(*a), GFP_KERNEL);
-+	if (!a)
-+		return ERR_PTR(-ENOMEM);
-+
-+	ret = -EINVAL;
-+	switch (uaccess->access) {
-+	case KDBUS_POLICY_SEE:
-+	case KDBUS_POLICY_TALK:
-+	case KDBUS_POLICY_OWN:
-+		a->access = uaccess->access;
-+		break;
-+	default:
-+		goto err;
-+	}
-+
-+	switch (uaccess->type) {
-+	case KDBUS_POLICY_ACCESS_USER:
-+		a->uid = make_kuid(current_user_ns(), uaccess->id);
-+		if (!uid_valid(a->uid))
-+			goto err;
-+
-+		break;
-+	case KDBUS_POLICY_ACCESS_GROUP:
-+		a->gid = make_kgid(current_user_ns(), uaccess->id);
-+		if (!gid_valid(a->gid))
-+			goto err;
-+
-+		break;
-+	case KDBUS_POLICY_ACCESS_WORLD:
-+		break;
-+	default:
-+		goto err;
-+	}
-+
-+	a->type = uaccess->type;
-+
-+	return a;
-+
-+err:
-+	kfree(a);
-+	return ERR_PTR(ret);
-+}
-+
-+/**
-+ * kdbus_policy_set() - set a connection's policy rules
-+ * @db:				The policy database
-+ * @items:			A list of kdbus_item elements that contain both
-+ *				names and access rules to set.
-+ * @items_size:			The total size of the items.
-+ * @max_policies:		The maximum number of policy entries to allow.
-+ *				Pass 0 for no limit.
-+ * @allow_wildcards:		Boolean value whether wildcard entries (such
-+ *				as those ending in '.*') should be allowed.
-+ * @owner:			The owner of the new policy items.
-+ *
-+ * This function sets a new set of policies for a given owner. The names and
-+ * access rules are gathered by walking the list of items passed in as
-+ * argument. An item of type KDBUS_ITEM_NAME is expected before any number of
-+ * KDBUS_ITEM_POLICY_ACCESS items. If there are more repetitions of this
-+ * pattern than denoted in @max_policies, -E2BIG is returned.
-+ *
-+ * In order to allow atomic replacement of rules, the function first removes
-+ * all entries that have been created for the given owner previously.
-+ *
-+ * Callers to this function must make sure that the owner is a custom
-+ * endpoint, or if the endpoint is a default endpoint, then it must be
-+ * either a policy holder or an activator.
-+ *
-+ * Return: 0 on success, negative errno on failure.
-+ */
-+int kdbus_policy_set(struct kdbus_policy_db *db,
-+		     const struct kdbus_item *items,
-+		     size_t items_size,
-+		     size_t max_policies,
-+		     bool allow_wildcards,
-+		     const void *owner)
-+{
-+	struct kdbus_policy_db_entry_access *a;
-+	struct kdbus_policy_db_entry *e, *p;
-+	const struct kdbus_item *item;
-+	struct hlist_node *tmp;
-+	HLIST_HEAD(entries);
-+	HLIST_HEAD(restore);
-+	size_t count = 0;
-+	int i, ret = 0;
-+	u32 hash;
-+
-+	/* Walk the list of items and look for new policies */
-+	e = NULL;
-+	KDBUS_ITEMS_FOREACH(item, items, items_size) {
-+		switch (item->type) {
-+		case KDBUS_ITEM_NAME: {
-+			size_t len;
-+
-+			if (max_policies && ++count > max_policies) {
-+				ret = -E2BIG;
-+				goto exit;
-+			}
-+
-+			if (!kdbus_name_is_valid(item->str, true)) {
-+				ret = -EINVAL;
-+				goto exit;
-+			}
-+
-+			e = kzalloc(sizeof(*e), GFP_KERNEL);
-+			if (!e) {
-+				ret = -ENOMEM;
-+				goto exit;
-+			}
-+
-+			INIT_LIST_HEAD(&e->access_list);
-+			e->owner = owner;
-+			hlist_add_head(&e->hentry, &entries);
-+
-+			e->name = kstrdup(item->str, GFP_KERNEL);
-+			if (!e->name) {
-+				ret = -ENOMEM;
-+				goto exit;
-+			}
-+
-+			/*
-+			 * If a supplied name ends with an '.*', cut off that
-+			 * part, only store anything before it, and mark the
-+			 * entry as wildcard.
-+			 */
-+			len = strlen(e->name);
-+			if (len > 2 &&
-+			    e->name[len - 3] == '.' &&
-+			    e->name[len - 2] == '*') {
-+				if (!allow_wildcards) {
-+					ret = -EINVAL;
-+					goto exit;
-+				}
-+
-+				e->name[len - 3] = '\0';
-+				e->wildcard = true;
-+			}
-+
-+			break;
-+		}
-+
-+		case KDBUS_ITEM_POLICY_ACCESS:
-+			if (!e) {
-+				ret = -EINVAL;
-+				goto exit;
-+			}
-+
-+			a = kdbus_policy_make_access(&item->policy_access);
-+			if (IS_ERR(a)) {
-+				ret = PTR_ERR(a);
-+				goto exit;
-+			}
-+
-+			list_add_tail(&a->list, &e->access_list);
-+			break;
-+		}
-+	}
-+
-+	down_write(&db->entries_rwlock);
-+
-+	/* remember previous entries to restore in case of failure */
-+	hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
-+		if (e->owner == owner) {
-+			hash_del(&e->hentry);
-+			hlist_add_head(&e->hentry, &restore);
-+		}
-+
-+	hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
-+		/* prevent duplicates */
-+		hash = kdbus_strhash(e->name);
-+		hash_for_each_possible(db->entries_hash, p, hentry, hash)
-+			if (strcmp(e->name, p->name) == 0 &&
-+			    e->wildcard == p->wildcard) {
-+				ret = -EEXIST;
-+				goto restore;
-+			}
-+
-+		hlist_del(&e->hentry);
-+		hash_add(db->entries_hash, &e->hentry, hash);
-+	}
-+
-+restore:
-+	/* if we failed, flush all entries we added so far */
-+	if (ret < 0)
-+		__kdbus_policy_remove_owner(db, owner);
-+
-+	/* if we failed, restore entries, otherwise release them */
-+	hlist_for_each_entry_safe(e, tmp, &restore, hentry) {
-+		hlist_del(&e->hentry);
-+		if (ret < 0) {
-+			hash = kdbus_strhash(e->name);
-+			hash_add(db->entries_hash, &e->hentry, hash);
-+		} else {
-+			kdbus_policy_entry_free(e);
-+		}
-+	}
-+
-+	up_write(&db->entries_rwlock);
-+
-+exit:
-+	hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
-+		hlist_del(&e->hentry);
-+		kdbus_policy_entry_free(e);
-+	}
-+
-+	return ret;
-+}
-diff --git a/ipc/kdbus/policy.h b/ipc/kdbus/policy.h
-new file mode 100644
-index 0000000..15dd7bc
---- /dev/null
-+++ b/ipc/kdbus/policy.h
-@@ -0,0 +1,51 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_POLICY_H
-+#define __KDBUS_POLICY_H
-+
-+#include <linux/hashtable.h>
-+#include <linux/rwsem.h>
-+
-+struct kdbus_conn;
-+struct kdbus_item;
-+
-+/**
-+ * struct kdbus_policy_db - policy database
-+ * @entries_hash:	Hashtable of entries
-+ * @entries_rwlock:	Read/write semaphore protecting the database's entries
-+ */
-+struct kdbus_policy_db {
-+	DECLARE_HASHTABLE(entries_hash, 6);
-+	struct rw_semaphore entries_rwlock;
-+};
-+
-+void kdbus_policy_db_init(struct kdbus_policy_db *db);
-+void kdbus_policy_db_clear(struct kdbus_policy_db *db);
-+
-+int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
-+				const struct cred *cred, const char *name,
-+				unsigned int hash);
-+int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
-+		       const char *name, unsigned int hash);
-+
-+void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
-+			       const void *owner);
-+int kdbus_policy_set(struct kdbus_policy_db *db,
-+		     const struct kdbus_item *items,
-+		     size_t items_size,
-+		     size_t max_policies,
-+		     bool allow_wildcards,
-+		     const void *owner);
-+
-+#endif
-diff --git a/ipc/kdbus/pool.c b/ipc/kdbus/pool.c
-new file mode 100644
-index 0000000..63ccd55
---- /dev/null
-+++ b/ipc/kdbus/pool.c
-@@ -0,0 +1,728 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/aio.h>
-+#include <linux/file.h>
-+#include <linux/fs.h>
-+#include <linux/highmem.h>
-+#include <linux/init.h>
-+#include <linux/mm.h>
-+#include <linux/module.h>
-+#include <linux/pagemap.h>
-+#include <linux/rbtree.h>
-+#include <linux/sched.h>
-+#include <linux/shmem_fs.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <linux/uio.h>
-+
-+#include "pool.h"
-+#include "util.h"
-+
-+/**
-+ * struct kdbus_pool - the receiver's buffer
-+ * @f:			The backing shmem file
-+ * @size:		The size of the file
-+ * @accounted_size:	Currently accounted memory in bytes
-+ * @lock:		Pool data lock
-+ * @slices:		All slices sorted by address
-+ * @slices_busy:	Tree of allocated slices
-+ * @slices_free:	Tree of free slices
-+ *
-+ * The receiver's buffer, managed as a pool of allocated and free
-+ * slices containing the queued messages.
-+ *
-+ * Messages sent with KDBUS_CMD_SEND are copied directly by the
-+ * sending process into the receiver's pool.
-+ *
-+ * Messages received with KDBUS_CMD_RECV just return the offset
-+ * to the data placed in the pool.
-+ *
-+ * The internally allocated memory needs to be returned by the receiver
-+ * with KDBUS_CMD_FREE.
-+ */
-+struct kdbus_pool {
-+	struct file *f;
-+	size_t size;
-+	size_t accounted_size;
-+	struct mutex lock;
-+
-+	struct list_head slices;
-+	struct rb_root slices_busy;
-+	struct rb_root slices_free;
-+};
-+
-+/**
-+ * struct kdbus_pool_slice - allocated element in kdbus_pool
-+ * @pool:		Pool this slice belongs to
-+ * @off:		Offset of slice in the shmem file
-+ * @size:		Size of slice
-+ * @entry:		Entry in "all slices" list
-+ * @rb_node:		Entry in free or busy tree
-+ * @free:		Unused slice
-+ * @accounted:		Accounted as queue slice
-+ * @ref_kernel:		Kernel holds a reference
-+ * @ref_user:		Userspace holds a reference
-+ *
-+ * The pool has one or more slices, always spanning the entire size of the
-+ * pool.
-+ *
-+ * Every slice is an element in a list sorted by the buffer address, to
-+ * provide access to the next neighbor slice.
-+ *
-+ * Every slice is a member of either the busy or the free tree. The free
-+ * tree is organized by slice size, the busy tree organized by buffer
-+ * offset.
-+ */
-+struct kdbus_pool_slice {
-+	struct kdbus_pool *pool;
-+	size_t off;
-+	size_t size;
-+
-+	struct list_head entry;
-+	struct rb_node rb_node;
-+
-+	bool free:1;
-+	bool accounted:1;
-+	bool ref_kernel:1;
-+	bool ref_user:1;
-+};
-+
-+static struct kdbus_pool_slice *kdbus_pool_slice_new(struct kdbus_pool *pool,
-+						     size_t off, size_t size)
-+{
-+	struct kdbus_pool_slice *slice;
-+
-+	slice = kzalloc(sizeof(*slice), GFP_KERNEL);
-+	if (!slice)
-+		return NULL;
-+
-+	slice->pool = pool;
-+	slice->off = off;
-+	slice->size = size;
-+	slice->free = true;
-+	return slice;
-+}
-+
-+/* insert a slice into the free tree */
-+static void kdbus_pool_add_free_slice(struct kdbus_pool *pool,
-+				      struct kdbus_pool_slice *slice)
-+{
-+	struct rb_node **n;
-+	struct rb_node *pn = NULL;
-+
-+	n = &pool->slices_free.rb_node;
-+	while (*n) {
-+		struct kdbus_pool_slice *pslice;
-+
-+		pn = *n;
-+		pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
-+		if (slice->size < pslice->size)
-+			n = &pn->rb_left;
-+		else
-+			n = &pn->rb_right;
-+	}
-+
-+	rb_link_node(&slice->rb_node, pn, n);
-+	rb_insert_color(&slice->rb_node, &pool->slices_free);
-+}
-+
-+/* insert a slice into the busy tree */
-+static void kdbus_pool_add_busy_slice(struct kdbus_pool *pool,
-+				      struct kdbus_pool_slice *slice)
-+{
-+	struct rb_node **n;
-+	struct rb_node *pn = NULL;
-+
-+	n = &pool->slices_busy.rb_node;
-+	while (*n) {
-+		struct kdbus_pool_slice *pslice;
-+
-+		pn = *n;
-+		pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
-+		if (slice->off < pslice->off)
-+			n = &pn->rb_left;
-+		else if (slice->off > pslice->off)
-+			n = &pn->rb_right;
-+		else
-+			BUG();
-+	}
-+
-+	rb_link_node(&slice->rb_node, pn, n);
-+	rb_insert_color(&slice->rb_node, &pool->slices_busy);
-+}
-+
-+static struct kdbus_pool_slice *kdbus_pool_find_slice(struct kdbus_pool *pool,
-+						      size_t off)
-+{
-+	struct rb_node *n;
-+
-+	n = pool->slices_busy.rb_node;
-+	while (n) {
-+		struct kdbus_pool_slice *s;
-+
-+		s = rb_entry(n, struct kdbus_pool_slice, rb_node);
-+		if (off < s->off)
-+			n = n->rb_left;
-+		else if (off > s->off)
-+			n = n->rb_right;
-+		else
-+			return s;
-+	}
-+
-+	return NULL;
-+}
-+
-+/**
-+ * kdbus_pool_slice_alloc() - allocate memory from a pool
-+ * @pool:	The receiver's pool
-+ * @size:	The number of bytes to allocate
-+ * @accounted:	Whether this slice should be accounted for
-+ *
-+ * The returned slice must later be passed to kdbus_pool_slice_release() to
-+ * free the allocated memory. This function does not copy any data into the
-+ * new slice; it only reserves the space. The requested size is rounded up
-+ * with KDBUS_ALIGN8().
-+ *
-+ * Return: the allocated slice on success, ERR_PTR on failure.
-+ */
-+struct kdbus_pool_slice *kdbus_pool_slice_alloc(struct kdbus_pool *pool,
-+						size_t size, bool accounted)
-+{
-+	size_t slice_size = KDBUS_ALIGN8(size);
-+	struct rb_node *n, *found = NULL;
-+	struct kdbus_pool_slice *s;
-+	int ret = 0;
-+
-+	if (WARN_ON(!size))
-+		return ERR_PTR(-EINVAL);
-+
-+	/* search a free slice with the closest matching size */
-+	mutex_lock(&pool->lock);
-+	n = pool->slices_free.rb_node;
-+	while (n) {
-+		s = rb_entry(n, struct kdbus_pool_slice, rb_node);
-+		if (slice_size < s->size) {
-+			found = n;
-+			n = n->rb_left;
-+		} else if (slice_size > s->size) {
-+			n = n->rb_right;
-+		} else {
-+			found = n;
-+			break;
-+		}
-+	}
-+
-+	/* no slice with the minimum size found in the pool */
-+	if (!found) {
-+		ret = -EXFULL;
-+		goto exit_unlock;
-+	}
-+
-+	/* no exact match, use the closest one */
-+	if (!n) {
-+		struct kdbus_pool_slice *s_new;
-+
-+		s = rb_entry(found, struct kdbus_pool_slice, rb_node);
-+
-+		/* split-off the remainder of the size to its own slice */
-+		s_new = kdbus_pool_slice_new(pool, s->off + slice_size,
-+					     s->size - slice_size);
-+		if (!s_new) {
-+			ret = -ENOMEM;
-+			goto exit_unlock;
-+		}
-+
-+		list_add(&s_new->entry, &s->entry);
-+		kdbus_pool_add_free_slice(pool, s_new);
-+
-+		/* adjust our size now that we split-off another slice */
-+		s->size = slice_size;
-+	}
-+
-+	/* move slice from free to the busy tree */
-+	rb_erase(found, &pool->slices_free);
-+	kdbus_pool_add_busy_slice(pool, s);
-+
-+	WARN_ON(s->ref_kernel || s->ref_user);
-+
-+	s->ref_kernel = true;
-+	s->free = false;
-+	s->accounted = accounted;
-+	if (accounted)
-+		pool->accounted_size += s->size;
-+	mutex_unlock(&pool->lock);
-+
-+	return s;
-+
-+exit_unlock:
-+	mutex_unlock(&pool->lock);
-+	return ERR_PTR(ret);
-+}
-+
-+static void __kdbus_pool_slice_release(struct kdbus_pool_slice *slice)
-+{
-+	struct kdbus_pool *pool = slice->pool;
-+
-+	/* don't free the slice if either has a reference */
-+	if (slice->ref_kernel || slice->ref_user)
-+		return;
-+
-+	if (WARN_ON(slice->free))
-+		return;
-+
-+	rb_erase(&slice->rb_node, &pool->slices_busy);
-+
-+	/* merge with the next free slice */
-+	if (!list_is_last(&slice->entry, &pool->slices)) {
-+		struct kdbus_pool_slice *s;
-+
-+		s = list_entry(slice->entry.next,
-+			       struct kdbus_pool_slice, entry);
-+		if (s->free) {
-+			rb_erase(&s->rb_node, &pool->slices_free);
-+			list_del(&s->entry);
-+			slice->size += s->size;
-+			kfree(s);
-+		}
-+	}
-+
-+	/* merge with previous free slice */
-+	if (pool->slices.next != &slice->entry) {
-+		struct kdbus_pool_slice *s;
-+
-+		s = list_entry(slice->entry.prev,
-+			       struct kdbus_pool_slice, entry);
-+		if (s->free) {
-+			rb_erase(&s->rb_node, &pool->slices_free);
-+			list_del(&slice->entry);
-+			s->size += slice->size;
-+			kfree(slice);
-+			slice = s;
-+		}
-+	}
-+
-+	slice->free = true;
-+	kdbus_pool_add_free_slice(pool, slice);
-+}
-+
-+/**
-+ * kdbus_pool_slice_release() - drop kernel-reference on allocated slice
-+ * @slice:		Slice allocated from the pool
-+ *
-+ * This releases the kernel-reference on the given slice. If the
-+ * kernel-reference and the user-reference on a slice are dropped, the slice is
-+ * returned to the pool.
-+ *
-+ * So far, we do not implement full ref-counting on slices. The kernel and
-+ * user-space can each hold exactly one reference to a slice. Once both are
-+ * dropped, the slice is released.
-+ */
-+void kdbus_pool_slice_release(struct kdbus_pool_slice *slice)
-+{
-+	struct kdbus_pool *pool;
-+
-+	if (!slice)
-+		return;
-+
-+	/* @slice may be freed, so keep local ptr to @pool */
-+	pool = slice->pool;
-+
-+	mutex_lock(&pool->lock);
-+	/* kernel must own a ref to @slice to drop it */
-+	WARN_ON(!slice->ref_kernel);
-+	slice->ref_kernel = false;
-+	/* no longer kernel-owned, de-account slice */
-+	if (slice->accounted && !WARN_ON(pool->accounted_size < slice->size))
-+		pool->accounted_size -= slice->size;
-+	__kdbus_pool_slice_release(slice);
-+	mutex_unlock(&pool->lock);
-+}
-+
-+/**
-+ * kdbus_pool_release_offset() - release a public offset
-+ * @pool:		pool to operate on
-+ * @off:		offset to release
-+ *
-+ * This should be called whenever user-space frees a slice given to them. It
-+ * verifies the slice is available and public, and then drops it. It ensures
-+ * correct locking and barriers against queues.
-+ *
-+ * Return: 0 on success, -ENXIO if the offset is invalid or not public.
-+ */
-+int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off)
-+{
-+	struct kdbus_pool_slice *slice;
-+	int ret = 0;
-+
-+	/* 'pool->size' is used as dummy offset for empty slices */
-+	if (off == pool->size)
-+		return 0;
-+
-+	mutex_lock(&pool->lock);
-+	slice = kdbus_pool_find_slice(pool, off);
-+	if (slice && slice->ref_user) {
-+		slice->ref_user = false;
-+		__kdbus_pool_slice_release(slice);
-+	} else {
-+		ret = -ENXIO;
-+	}
-+	mutex_unlock(&pool->lock);
-+
-+	return ret;
-+}
-+
-+/**
-+ * kdbus_pool_publish_empty() - publish empty slice to user-space
-+ * @pool:		pool to operate on
-+ * @off:		output storage for offset, or NULL
-+ * @size:		output storage for size, or NULL
-+ *
-+ * This is the same as kdbus_pool_slice_publish(), but uses a dummy slice with
-+ * size 0. The returned offset points to the end of the pool and is never
-+ * returned on real slices.
-+ */
-+void kdbus_pool_publish_empty(struct kdbus_pool *pool, u64 *off, u64 *size)
-+{
-+	if (off)
-+		*off = pool->size;
-+	if (size)
-+		*size = 0;
-+}
-+
-+/**
-+ * kdbus_pool_slice_publish() - publish slice to user-space
-+ * @slice:		The slice
-+ * @out_offset:		Output storage for offset, or NULL
-+ * @out_size:		Output storage for size, or NULL
-+ *
-+ * This prepares a slice to be published to user-space.
-+ *
-+ * This call combines the following operations:
-+ *   * the memory region is flushed so the user's memory view is consistent
-+ *   * the slice is marked as referenced by user-space, so user-space has to
-+ *     call KDBUS_CMD_FREE to release it
-+ *   * the offset and size of the slice are written to the given output
-+ *     arguments, if non-NULL
-+ */
-+void kdbus_pool_slice_publish(struct kdbus_pool_slice *slice,
-+			      u64 *out_offset, u64 *out_size)
-+{
-+	mutex_lock(&slice->pool->lock);
-+	/* kernel must own a ref to @slice to gain a user-space ref */
-+	WARN_ON(!slice->ref_kernel);
-+	slice->ref_user = true;
-+	mutex_unlock(&slice->pool->lock);
-+
-+	if (out_offset)
-+		*out_offset = slice->off;
-+	if (out_size)
-+		*out_size = slice->size;
-+}
-+
-+/**
-+ * kdbus_pool_slice_offset() - Get a slice's offset inside the pool
-+ * @slice:	Slice to return the offset of
-+ *
-+ * Return: The internal offset of @slice inside the pool.
-+ */
-+off_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice)
-+{
-+	return slice->off;
-+}
-+
-+/**
-+ * kdbus_pool_slice_size() - get size of a pool slice
-+ * @slice:	slice to query
-+ *
-+ * Return: size of the given slice
-+ */
-+size_t kdbus_pool_slice_size(const struct kdbus_pool_slice *slice)
-+{
-+	return slice->size;
-+}
-+
-+/**
-+ * kdbus_pool_new() - create a new pool
-+ * @name:		Name of the (deleted) file which shows up in
-+ *			/proc, used for debugging
-+ * @size:		Maximum size of the pool
-+ *
-+ * Return: a new kdbus_pool on success, ERR_PTR on failure.
-+ */
-+struct kdbus_pool *kdbus_pool_new(const char *name, size_t size)
-+{
-+	struct kdbus_pool_slice *s;
-+	struct kdbus_pool *p;
-+	struct file *f;
-+	char *n = NULL;
-+	int ret;
-+
-+	p = kzalloc(sizeof(*p), GFP_KERNEL);
-+	if (!p)
-+		return ERR_PTR(-ENOMEM);
-+
-+	if (name) {
-+		n = kasprintf(GFP_KERNEL, KBUILD_MODNAME "-conn:%s", name);
-+		if (!n) {
-+			ret = -ENOMEM;
-+			goto exit_free;
-+		}
-+	}
-+
-+	f = shmem_file_setup(n ?: KBUILD_MODNAME "-conn", size, 0);
-+	kfree(n);
-+
-+	if (IS_ERR(f)) {
-+		ret = PTR_ERR(f);
-+		goto exit_free;
-+	}
-+
-+	ret = get_write_access(file_inode(f));
-+	if (ret < 0)
-+		goto exit_put_shmem;
-+
-+	/* allocate first slice spanning the entire pool */
-+	s = kdbus_pool_slice_new(p, 0, size);
-+	if (!s) {
-+		ret = -ENOMEM;
-+		goto exit_put_write;
-+	}
-+
-+	p->f = f;
-+	p->size = size;
-+	p->slices_free = RB_ROOT;
-+	p->slices_busy = RB_ROOT;
-+	mutex_init(&p->lock);
-+
-+	INIT_LIST_HEAD(&p->slices);
-+	list_add(&s->entry, &p->slices);
-+
-+	kdbus_pool_add_free_slice(p, s);
-+	return p;
-+
-+exit_put_write:
-+	put_write_access(file_inode(f));
-+exit_put_shmem:
-+	fput(f);
-+exit_free:
-+	kfree(p);
-+	return ERR_PTR(ret);
-+}
-+
-+/**
-+ * kdbus_pool_free() - destroy pool
-+ * @pool:		The receiver's pool
-+ */
-+void kdbus_pool_free(struct kdbus_pool *pool)
-+{
-+	struct kdbus_pool_slice *s, *tmp;
-+
-+	if (!pool)
-+		return;
-+
-+	list_for_each_entry_safe(s, tmp, &pool->slices, entry) {
-+		list_del(&s->entry);
-+		kfree(s);
-+	}
-+
-+	put_write_access(file_inode(pool->f));
-+	fput(pool->f);
-+	kfree(pool);
-+}
-+
-+/**
-+ * kdbus_pool_accounted() - retrieve accounting information
-+ * @pool:		pool to query
-+ * @size:		output for overall pool size
-+ * @acc:		output for currently accounted size
-+ *
-+ * This returns accounting information of the pool. Note that the data might
-+ * change after the function returns, as the pool lock is dropped. You need to
-+ * protect the data via other means, if you need reliable accounting.
-+ */
-+void kdbus_pool_accounted(struct kdbus_pool *pool, size_t *size, size_t *acc)
-+{
-+	mutex_lock(&pool->lock);
-+	if (size)
-+		*size = pool->size;
-+	if (acc)
-+		*acc = pool->accounted_size;
-+	mutex_unlock(&pool->lock);
-+}
-+
-+/**
-+ * kdbus_pool_slice_copy_iovec() - copy user memory to a slice
-+ * @slice:		The slice to write to
-+ * @off:		Offset in the slice to write to
-+ * @iov:		iovec array, pointing to data to copy
-+ * @iov_len:		Number of elements in @iov
-+ * @total_len:		Total number of bytes described in members of @iov
-+ *
-+ * User memory referenced by @iov will be copied into @slice at offset @off.
-+ *
-+ * Return: the number of bytes copied, negative errno on failure.
-+ */
-+ssize_t
-+kdbus_pool_slice_copy_iovec(const struct kdbus_pool_slice *slice, loff_t off,
-+			    struct iovec *iov, size_t iov_len, size_t total_len)
-+{
-+	struct iov_iter iter;
-+	ssize_t len;
-+
-+	if (WARN_ON(off + total_len > slice->size))
-+		return -EFAULT;
-+
-+	off += slice->off;
-+	iov_iter_init(&iter, WRITE, iov, iov_len, total_len);
-+	len = vfs_iter_write(slice->pool->f, &iter, &off);
-+
-+	return (len >= 0 && len != total_len) ? -EFAULT : len;
-+}
-+
-+/**
-+ * kdbus_pool_slice_copy_kvec() - copy kernel memory to a slice
-+ * @slice:		The slice to write to
-+ * @off:		Offset in the slice to write to
-+ * @kvec:		kvec array, pointing to data to copy
-+ * @kvec_len:		Number of elements in @kvec
-+ * @total_len:		Total number of bytes described in members of @kvec
-+ *
-+ * Kernel memory referenced by @kvec will be copied into @slice at offset @off.
-+ *
-+ * Return: the number of bytes copied, negative errno on failure.
-+ */
-+ssize_t kdbus_pool_slice_copy_kvec(const struct kdbus_pool_slice *slice,
-+				   loff_t off, struct kvec *kvec,
-+				   size_t kvec_len, size_t total_len)
-+{
-+	struct iov_iter iter;
-+	mm_segment_t old_fs;
-+	ssize_t len;
-+
-+	if (WARN_ON(off + total_len > slice->size))
-+		return -EFAULT;
-+
-+	off += slice->off;
-+	iov_iter_kvec(&iter, WRITE | ITER_KVEC, kvec, kvec_len, total_len);
-+
-+	old_fs = get_fs();
-+	set_fs(get_ds());
-+	len = vfs_iter_write(slice->pool->f, &iter, &off);
-+	set_fs(old_fs);
-+
-+	return (len >= 0 && len != total_len) ? -EFAULT : len;
-+}
-+
-+/**
-+ * kdbus_pool_slice_copy() - copy data from one slice into another
-+ * @slice_dst:		destination slice
-+ * @slice_src:		source slice
-+ *
-+ * Return: 0 on success, negative error number on failure.
-+ */
-+int kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice_dst,
-+			  const struct kdbus_pool_slice *slice_src)
-+{
-+	struct file *f_src = slice_src->pool->f;
-+	struct file *f_dst = slice_dst->pool->f;
-+	struct inode *i_dst = file_inode(f_dst);
-+	struct address_space *mapping_dst = f_dst->f_mapping;
-+	const struct address_space_operations *aops = mapping_dst->a_ops;
-+	unsigned long len = slice_src->size;
-+	loff_t off_src = slice_src->off;
-+	loff_t off_dst = slice_dst->off;
-+	mm_segment_t old_fs;
-+	int ret = 0;
-+
-+	if (WARN_ON(slice_src->size != slice_dst->size) ||
-+	    WARN_ON(slice_src->free || slice_dst->free))
-+		return -EINVAL;
-+
-+	mutex_lock(&i_dst->i_mutex);
-+	old_fs = get_fs();
-+	set_fs(get_ds());
-+	while (len > 0) {
-+		unsigned long page_off;
-+		unsigned long copy_len;
-+		char __user *kaddr;
-+		struct page *page;
-+		ssize_t n_read;
-+		void *fsdata;
-+		long status;
-+
-+		page_off = off_dst & (PAGE_CACHE_SIZE - 1);
-+		copy_len = min_t(unsigned long,
-+				 PAGE_CACHE_SIZE - page_off, len);
-+
-+		status = aops->write_begin(f_dst, mapping_dst, off_dst,
-+					   copy_len, 0, &page, &fsdata);
-+		if (unlikely(status < 0)) {
-+			ret = status;
-+			break;
-+		}
-+
-+		kaddr = (char __force __user *)kmap(page) + page_off;
-+		n_read = __vfs_read(f_src, kaddr, copy_len, &off_src);
-+		kunmap(page);
-+		mark_page_accessed(page);
-+		flush_dcache_page(page);
-+
-+		if (unlikely(n_read != copy_len)) {
-+			ret = -EFAULT;
-+			break;
-+		}
-+
-+		status = aops->write_end(f_dst, mapping_dst, off_dst,
-+					 copy_len, copy_len, page, fsdata);
-+		if (unlikely(status != copy_len)) {
-+			ret = -EFAULT;
-+			break;
-+		}
-+
-+		off_dst += copy_len;
-+		len -= copy_len;
-+	}
-+	set_fs(old_fs);
-+	mutex_unlock(&i_dst->i_mutex);
-+
-+	return ret;
-+}
-+
-+/**
-+ * kdbus_pool_mmap() - map the pool into the process
-+ * @pool:		The receiver's pool
-+ * @vma:		passed by mmap() syscall
-+ *
-+ * Return: the result of the mmap() call, negative errno on failure.
-+ */
-+int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma)
-+{
-+	/* deny write access to the pool */
-+	if (vma->vm_flags & VM_WRITE)
-+		return -EPERM;
-+	vma->vm_flags &= ~VM_MAYWRITE;
-+
-+	/* do not allow to map more than the size of the file */
-+	if ((vma->vm_end - vma->vm_start) > pool->size)
-+		return -EFAULT;
-+
-+	/* replace the connection file with our shmem file */
-+	if (vma->vm_file)
-+		fput(vma->vm_file);
-+	vma->vm_file = get_file(pool->f);
-+
-+	return pool->f->f_op->mmap(pool->f, vma);
-+}
-diff --git a/ipc/kdbus/pool.h b/ipc/kdbus/pool.h
-new file mode 100644
-index 0000000..a903821
---- /dev/null
-+++ b/ipc/kdbus/pool.h
-@@ -0,0 +1,46 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_POOL_H
-+#define __KDBUS_POOL_H
-+
-+#include <linux/uio.h>
-+
-+struct kdbus_pool;
-+struct kdbus_pool_slice;
-+
-+struct kdbus_pool *kdbus_pool_new(const char *name, size_t size);
-+void kdbus_pool_free(struct kdbus_pool *pool);
-+void kdbus_pool_accounted(struct kdbus_pool *pool, size_t *size, size_t *acc);
-+int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma);
-+int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off);
-+void kdbus_pool_publish_empty(struct kdbus_pool *pool, u64 *off, u64 *size);
-+
-+struct kdbus_pool_slice *kdbus_pool_slice_alloc(struct kdbus_pool *pool,
-+						size_t size, bool accounted);
-+void kdbus_pool_slice_release(struct kdbus_pool_slice *slice);
-+void kdbus_pool_slice_publish(struct kdbus_pool_slice *slice,
-+			      u64 *out_offset, u64 *out_size);
-+off_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice);
-+size_t kdbus_pool_slice_size(const struct kdbus_pool_slice *slice);
-+int kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice_dst,
-+			  const struct kdbus_pool_slice *slice_src);
-+ssize_t kdbus_pool_slice_copy_kvec(const struct kdbus_pool_slice *slice,
-+				   loff_t off, struct kvec *kvec,
-+				   size_t kvec_count, size_t total_len);
-+ssize_t kdbus_pool_slice_copy_iovec(const struct kdbus_pool_slice *slice,
-+				    loff_t off, struct iovec *iov,
-+				    size_t iov_count, size_t total_len);
-+
-+#endif
-diff --git a/ipc/kdbus/queue.c b/ipc/kdbus/queue.c
-new file mode 100644
-index 0000000..f9c44d7
---- /dev/null
-+++ b/ipc/kdbus/queue.c
-@@ -0,0 +1,363 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/audit.h>
-+#include <linux/file.h>
-+#include <linux/fs.h>
-+#include <linux/hashtable.h>
-+#include <linux/idr.h>
-+#include <linux/init.h>
-+#include <linux/math64.h>
-+#include <linux/mm.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/poll.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/syscalls.h>
-+#include <linux/uio.h>
-+
-+#include "util.h"
-+#include "domain.h"
-+#include "connection.h"
-+#include "item.h"
-+#include "message.h"
-+#include "metadata.h"
-+#include "queue.h"
-+#include "reply.h"
-+
-+/**
-+ * kdbus_queue_init() - initialize data structure related to a queue
-+ * @queue:	The queue to initialize
-+ */
-+void kdbus_queue_init(struct kdbus_queue *queue)
-+{
-+	INIT_LIST_HEAD(&queue->msg_list);
-+	queue->msg_prio_queue = RB_ROOT;
-+}
-+
-+/**
-+ * kdbus_queue_peek() - Retrieves an entry from a queue
-+ * @queue:		The queue
-+ * @priority:		The minimum priority of the entry to peek
-+ * @use_priority:	Boolean flag whether or not to peek by priority
-+ *
-+ * Look for an entry in a queue, either by priority, or the oldest one (FIFO).
-+ * The entry is not freed or removed from the queue's lists.
-+ *
-+ * Return: the peeked queue entry on success, NULL if no suitable msg is found
-+ */
-+struct kdbus_queue_entry *kdbus_queue_peek(struct kdbus_queue *queue,
-+					   s64 priority, bool use_priority)
-+{
-+	struct kdbus_queue_entry *e;
-+
-+	if (list_empty(&queue->msg_list))
-+		return NULL;
-+
-+	if (use_priority) {
-+		/* get next entry with highest priority */
-+		e = rb_entry(queue->msg_prio_highest,
-+			     struct kdbus_queue_entry, prio_node);
-+
-+		/* no entry with the requested priority */
-+		if (e->priority > priority)
-+			return NULL;
-+	} else {
-+		/* ignore the priority, return the next entry in the list */
-+		e = list_first_entry(&queue->msg_list,
-+				     struct kdbus_queue_entry, entry);
-+	}
-+
-+	return e;
-+}
-+
-+static void kdbus_queue_entry_link(struct kdbus_queue_entry *entry)
-+{
-+	struct kdbus_queue *queue = &entry->conn->queue;
-+	struct rb_node **n, *pn = NULL;
-+	bool highest = true;
-+
-+	lockdep_assert_held(&entry->conn->lock);
-+	if (WARN_ON(!list_empty(&entry->entry)))
-+		return;
-+
-+	/* sort into priority entry tree */
-+	n = &queue->msg_prio_queue.rb_node;
-+	while (*n) {
-+		struct kdbus_queue_entry *e;
-+
-+		pn = *n;
-+		e = rb_entry(pn, struct kdbus_queue_entry, prio_node);
-+
-+		/* existing node for this priority, add to its list */
-+		if (likely(entry->priority == e->priority)) {
-+			list_add_tail(&entry->prio_entry, &e->prio_entry);
-+			goto prio_done;
-+		}
-+
-+		if (entry->priority < e->priority) {
-+			n = &pn->rb_left;
-+		} else {
-+			n = &pn->rb_right;
-+			highest = false;
-+		}
-+	}
-+
-+	/* cache highest-priority entry */
-+	if (highest)
-+		queue->msg_prio_highest = &entry->prio_node;
-+
-+	/* new node for this priority */
-+	rb_link_node(&entry->prio_node, pn, n);
-+	rb_insert_color(&entry->prio_node, &queue->msg_prio_queue);
-+	INIT_LIST_HEAD(&entry->prio_entry);
-+
-+prio_done:
-+	/* add to unsorted fifo list */
-+	list_add_tail(&entry->entry, &queue->msg_list);
-+}
-+
-+static void kdbus_queue_entry_unlink(struct kdbus_queue_entry *entry)
-+{
-+	struct kdbus_queue *queue = &entry->conn->queue;
-+
-+	lockdep_assert_held(&entry->conn->lock);
-+	if (list_empty(&entry->entry))
-+		return;
-+
-+	list_del_init(&entry->entry);
-+
-+	if (list_empty(&entry->prio_entry)) {
-+		/*
-+		 * Single entry for this priority, update cached
-+		 * highest-priority entry, remove the tree node.
-+		 */
-+		if (queue->msg_prio_highest == &entry->prio_node)
-+			queue->msg_prio_highest = rb_next(&entry->prio_node);
-+
-+		rb_erase(&entry->prio_node, &queue->msg_prio_queue);
-+	} else {
-+		struct kdbus_queue_entry *q;
-+
-+		/*
-+		 * Multiple entries for this priority entry, get next one in
-+		 * the list. Update cached highest-priority entry, store the
-+		 * new one as the tree node.
-+		 */
-+		q = list_first_entry(&entry->prio_entry,
-+				     struct kdbus_queue_entry, prio_entry);
-+		list_del(&entry->prio_entry);
-+
-+		if (queue->msg_prio_highest == &entry->prio_node)
-+			queue->msg_prio_highest = &q->prio_node;
-+
-+		rb_replace_node(&entry->prio_node, &q->prio_node,
-+				&queue->msg_prio_queue);
-+	}
-+}
-+
-+/**
-+ * kdbus_queue_entry_new() - allocate a queue entry
-+ * @src:	source connection, or NULL
-+ * @dst:	destination connection
-+ * @s:		staging object carrying the message
-+ *
-+ * Allocates a queue entry based on a given msg and allocates space for
-+ * the message payload and the requested metadata in the connection's pool.
-+ * The entry is not actually added to the queue's lists at this point.
-+ *
-+ * Return: the allocated entry on success, or an ERR_PTR on failure.
-+ */
-+struct kdbus_queue_entry *kdbus_queue_entry_new(struct kdbus_conn *src,
-+						struct kdbus_conn *dst,
-+						struct kdbus_staging *s)
-+{
-+	struct kdbus_queue_entry *entry;
-+	int ret;
-+
-+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
-+	if (!entry)
-+		return ERR_PTR(-ENOMEM);
-+
-+	INIT_LIST_HEAD(&entry->entry);
-+	entry->priority = s->msg->priority;
-+	entry->conn = kdbus_conn_ref(dst);
-+	entry->gaps = kdbus_gaps_ref(s->gaps);
-+
-+	entry->slice = kdbus_staging_emit(s, src, dst);
-+	if (IS_ERR(entry->slice)) {
-+		ret = PTR_ERR(entry->slice);
-+		entry->slice = NULL;
-+		goto error;
-+	}
-+
-+	entry->user = src ? kdbus_user_ref(src->user) : NULL;
-+	return entry;
-+
-+error:
-+	kdbus_queue_entry_free(entry);
-+	return ERR_PTR(ret);
-+}
-+
-+/**
-+ * kdbus_queue_entry_free() - free resources of an entry
-+ * @entry:	The entry to free
-+ *
-+ * Removes resources allocated by a queue entry, along with the entry itself.
-+ * Note that the entry's slice is not freed at this point.
-+ */
-+void kdbus_queue_entry_free(struct kdbus_queue_entry *entry)
-+{
-+	if (!entry)
-+		return;
-+
-+	lockdep_assert_held(&entry->conn->lock);
-+
-+	kdbus_queue_entry_unlink(entry);
-+	kdbus_reply_unref(entry->reply);
-+
-+	if (entry->slice) {
-+		kdbus_conn_quota_dec(entry->conn, entry->user,
-+				     kdbus_pool_slice_size(entry->slice),
-+				     entry->gaps ? entry->gaps->n_fds : 0);
-+		kdbus_pool_slice_release(entry->slice);
-+	}
-+
-+	kdbus_user_unref(entry->user);
-+	kdbus_gaps_unref(entry->gaps);
-+	kdbus_conn_unref(entry->conn);
-+	kfree(entry);
-+}
-+
-+/**
-+ * kdbus_queue_entry_install() - install message components into the
-+ *				 receiver's process
-+ * @entry:		The queue entry to install
-+ * @return_flags:	Pointer to store the return flags for userspace
-+ * @install_fds:	Whether or not to install associated file descriptors
-+ *
-+ * Return: 0 on success.
-+ */
-+int kdbus_queue_entry_install(struct kdbus_queue_entry *entry,
-+			      u64 *return_flags, bool install_fds)
-+{
-+	bool incomplete_fds = false;
-+	int ret;
-+
-+	lockdep_assert_held(&entry->conn->lock);
-+
-+	ret = kdbus_gaps_install(entry->gaps, entry->slice, &incomplete_fds);
-+	if (ret < 0)
-+		return ret;
-+
-+	if (incomplete_fds)
-+		*return_flags |= KDBUS_RECV_RETURN_INCOMPLETE_FDS;
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_queue_entry_enqueue() - enqueue an entry
-+ * @entry:		entry to enqueue
-+ * @reply:		reply to link to this entry (or NULL if none)
-+ *
-+ * This enqueues an unqueued entry into the message queue of the linked
-+ * connection. It also binds a reply object to the entry so we can remember it
-+ * when the message is moved.
-+ *
-+ * Once this call returns (and the connection lock is released), this entry can
-+ * be dequeued by the target connection. Note that the entry will not be removed
-+ * from the queue until it is destroyed.
-+ */
-+void kdbus_queue_entry_enqueue(struct kdbus_queue_entry *entry,
-+			       struct kdbus_reply *reply)
-+{
-+	lockdep_assert_held(&entry->conn->lock);
-+
-+	if (WARN_ON(entry->reply) || WARN_ON(!list_empty(&entry->entry)))
-+		return;
-+
-+	entry->reply = kdbus_reply_ref(reply);
-+	kdbus_queue_entry_link(entry);
-+}
-+
-+/**
-+ * kdbus_queue_entry_move() - move queue entry
-+ * @e:		queue entry to move
-+ * @dst:	destination connection to queue the entry on
-+ *
-+ * This moves a queue entry onto a different connection. It allocates a new
-+ * slice on the target connection and copies the message over. If the copy
-+ * succeeded, we move the entry from its current connection to @dst.
-+ *
-+ * On failure, the entry is left untouched.
-+ *
-+ * The queue entry must be queued right now, and after the call succeeds it will
-+ * be queued on the destination, but no longer on the source.
-+ *
-+ * The caller must hold the connection lock of the source *and* destination.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_queue_entry_move(struct kdbus_queue_entry *e,
-+			   struct kdbus_conn *dst)
-+{
-+	struct kdbus_pool_slice *slice = NULL;
-+	struct kdbus_conn *src = e->conn;
-+	size_t size, fds;
-+	int ret;
-+
-+	lockdep_assert_held(&src->lock);
-+	lockdep_assert_held(&dst->lock);
-+
-+	if (WARN_ON(list_empty(&e->entry)))
-+		return -EINVAL;
-+	if (src == dst)
-+		return 0;
-+
-+	size = kdbus_pool_slice_size(e->slice);
-+	fds = e->gaps ? e->gaps->n_fds : 0;
-+
-+	ret = kdbus_conn_quota_inc(dst, e->user, size, fds);
-+	if (ret < 0)
-+		return ret;
-+
-+	slice = kdbus_pool_slice_alloc(dst->pool, size, true);
-+	if (IS_ERR(slice)) {
-+		ret = PTR_ERR(slice);
-+		slice = NULL;
-+		goto error;
-+	}
-+
-+	ret = kdbus_pool_slice_copy(slice, e->slice);
-+	if (ret < 0)
-+		goto error;
-+
-+	kdbus_queue_entry_unlink(e);
-+	kdbus_conn_quota_dec(src, e->user, size, fds);
-+	kdbus_pool_slice_release(e->slice);
-+	kdbus_conn_unref(e->conn);
-+
-+	e->slice = slice;
-+	e->conn = kdbus_conn_ref(dst);
-+	kdbus_queue_entry_link(e);
-+
-+	return 0;
-+
-+error:
-+	kdbus_pool_slice_release(slice);
-+	kdbus_conn_quota_dec(dst, e->user, size, fds);
-+	return ret;
-+}
-diff --git a/ipc/kdbus/queue.h b/ipc/kdbus/queue.h
-new file mode 100644
-index 0000000..bf686d1
---- /dev/null
-+++ b/ipc/kdbus/queue.h
-@@ -0,0 +1,84 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_QUEUE_H
-+#define __KDBUS_QUEUE_H
-+
-+#include <linux/list.h>
-+#include <linux/rbtree.h>
-+
-+struct kdbus_conn;
-+struct kdbus_pool_slice;
-+struct kdbus_reply;
-+struct kdbus_staging;
-+struct kdbus_user;
-+
-+/**
-+ * struct kdbus_queue - a connection's message queue
-+ * @msg_list:		List head for kdbus_queue_entry objects
-+ * @msg_prio_queue:	RB tree root for messages, sorted by priority
-+ * @msg_prio_highest:	Link to the RB node referencing the message with the
-+ *			highest priority in the tree.
-+ */
-+struct kdbus_queue {
-+	struct list_head msg_list;
-+	struct rb_root msg_prio_queue;
-+	struct rb_node *msg_prio_highest;
-+};
-+
-+/**
-+ * struct kdbus_queue_entry - messages waiting to be read
-+ * @entry:		Entry in the connection's list
-+ * @prio_node:		Entry in the priority queue tree
-+ * @prio_entry:		Queue tree node entry in the list of one priority
-+ * @priority:		Message priority
-+ * @dst_name_id:	The sequence number of the name this message is
-+ *			addressed to, 0 for messages sent to an ID
-+ * @conn:		Connection this entry is queued on
-+ * @gaps:		Gaps object to fill message gaps at RECV time
-+ * @user:		User used for accounting
-+ * @slice:		Slice in the receiver's pool for the message
-+ * @reply:		The reply block if a reply to this message is expected
-+ */
-+struct kdbus_queue_entry {
-+	struct list_head entry;
-+	struct rb_node prio_node;
-+	struct list_head prio_entry;
-+
-+	s64 priority;
-+	u64 dst_name_id;
-+
-+	struct kdbus_conn *conn;
-+	struct kdbus_gaps *gaps;
-+	struct kdbus_user *user;
-+	struct kdbus_pool_slice *slice;
-+	struct kdbus_reply *reply;
-+};
-+
-+void kdbus_queue_init(struct kdbus_queue *queue);
-+struct kdbus_queue_entry *kdbus_queue_peek(struct kdbus_queue *queue,
-+					   s64 priority, bool use_priority);
-+
-+struct kdbus_queue_entry *kdbus_queue_entry_new(struct kdbus_conn *src,
-+						struct kdbus_conn *dst,
-+						struct kdbus_staging *s);
-+void kdbus_queue_entry_free(struct kdbus_queue_entry *entry);
-+int kdbus_queue_entry_install(struct kdbus_queue_entry *entry,
-+			      u64 *return_flags, bool install_fds);
-+void kdbus_queue_entry_enqueue(struct kdbus_queue_entry *entry,
-+			       struct kdbus_reply *reply);
-+int kdbus_queue_entry_move(struct kdbus_queue_entry *entry,
-+			   struct kdbus_conn *dst);
-+
-+#endif /* __KDBUS_QUEUE_H */
-diff --git a/ipc/kdbus/reply.c b/ipc/kdbus/reply.c
-new file mode 100644
-index 0000000..e6791d8
---- /dev/null
-+++ b/ipc/kdbus/reply.c
-@@ -0,0 +1,252 @@
-+#include <linux/init.h>
-+#include <linux/mm.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/slab.h>
-+#include <linux/uio.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "endpoint.h"
-+#include "message.h"
-+#include "metadata.h"
-+#include "names.h"
-+#include "domain.h"
-+#include "item.h"
-+#include "notify.h"
-+#include "policy.h"
-+#include "reply.h"
-+#include "util.h"
-+
-+/**
-+ * kdbus_reply_new() - Allocate and set up a new kdbus_reply object
-+ * @reply_src:		The connection a reply is expected from
-+ * @reply_dst:		The connection this reply object belongs to
-+ * @msg:		Message associated with the reply
-+ * @name_entry:		Name entry used to send the message
-+ * @sync:		Whether or not to make this reply synchronous
-+ *
-+ * Allocate and fill a new kdbus_reply object.
-+ *
-+ * Return: New kdbus_reply object on success, ERR_PTR on error.
-+ */
-+struct kdbus_reply *kdbus_reply_new(struct kdbus_conn *reply_src,
-+				    struct kdbus_conn *reply_dst,
-+				    const struct kdbus_msg *msg,
-+				    struct kdbus_name_entry *name_entry,
-+				    bool sync)
-+{
-+	struct kdbus_reply *r;
-+	int ret;
-+
-+	if (atomic_inc_return(&reply_dst->request_count) >
-+	    KDBUS_CONN_MAX_REQUESTS_PENDING) {
-+		ret = -EMLINK;
-+		goto exit_dec_request_count;
-+	}
-+
-+	r = kzalloc(sizeof(*r), GFP_KERNEL);
-+	if (!r) {
-+		ret = -ENOMEM;
-+		goto exit_dec_request_count;
-+	}
-+
-+	kref_init(&r->kref);
-+	INIT_LIST_HEAD(&r->entry);
-+	r->reply_src = kdbus_conn_ref(reply_src);
-+	r->reply_dst = kdbus_conn_ref(reply_dst);
-+	r->cookie = msg->cookie;
-+	r->name_id = name_entry ? name_entry->name_id : 0;
-+	r->deadline_ns = msg->timeout_ns;
-+
-+	if (sync) {
-+		r->sync = true;
-+		r->waiting = true;
-+	}
-+
-+	return r;
-+
-+exit_dec_request_count:
-+	atomic_dec(&reply_dst->request_count);
-+	return ERR_PTR(ret);
-+}
-+
-+static void __kdbus_reply_free(struct kref *kref)
-+{
-+	struct kdbus_reply *reply =
-+		container_of(kref, struct kdbus_reply, kref);
-+
-+	atomic_dec(&reply->reply_dst->request_count);
-+	kdbus_conn_unref(reply->reply_src);
-+	kdbus_conn_unref(reply->reply_dst);
-+	kfree(reply);
-+}
-+
-+/**
-+ * kdbus_reply_ref() - Increase reference on kdbus_reply
-+ * @r:		The reply, may be %NULL
-+ *
-+ * Return: The reply object with an extra reference
-+ */
-+struct kdbus_reply *kdbus_reply_ref(struct kdbus_reply *r)
-+{
-+	if (r)
-+		kref_get(&r->kref);
-+	return r;
-+}
-+
-+/**
-+ * kdbus_reply_unref() - Decrease reference on kdbus_reply
-+ * @r:		The reply, may be %NULL
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_reply *kdbus_reply_unref(struct kdbus_reply *r)
-+{
-+	if (r)
-+		kref_put(&r->kref, __kdbus_reply_free);
-+	return NULL;
-+}
-+
-+/**
-+ * kdbus_reply_link() - Link reply object into target connection
-+ * @r:		Reply to link
-+ */
-+void kdbus_reply_link(struct kdbus_reply *r)
-+{
-+	if (WARN_ON(!list_empty(&r->entry)))
-+		return;
-+
-+	list_add(&r->entry, &r->reply_dst->reply_list);
-+	kdbus_reply_ref(r);
-+}
-+
-+/**
-+ * kdbus_reply_unlink() - Unlink reply object from target connection
-+ * @r:		Reply to unlink
-+ */
-+void kdbus_reply_unlink(struct kdbus_reply *r)
-+{
-+	if (!list_empty(&r->entry)) {
-+		list_del_init(&r->entry);
-+		kdbus_reply_unref(r);
-+	}
-+}
-+
-+/**
-+ * kdbus_sync_reply_wakeup() - Wake a synchronously blocking reply
-+ * @reply:	The reply object
-+ * @err:	Error code to set on the remote side
-+ *
-+ * Wake up remote peer (method origin) with the appropriate synchronous reply
-+ * code.
-+ */
-+void kdbus_sync_reply_wakeup(struct kdbus_reply *reply, int err)
-+{
-+	if (WARN_ON(!reply->sync))
-+		return;
-+
-+	reply->waiting = false;
-+	reply->err = err;
-+	wake_up_interruptible(&reply->reply_dst->wait);
-+}
-+
-+/**
-+ * kdbus_reply_find() - Find the corresponding reply object
-+ * @replying:	The replying connection or NULL
-+ * @reply_dst:	The connection the reply will be sent to
-+ *		(method origin)
-+ * @cookie:	The cookie of the requesting message
-+ *
-+ * Lookup a reply object that should be sent as a reply by
-+ * @replying to @reply_dst with the given cookie.
-+ *
-+ * Callers must take the @reply_dst lock.
-+ *
-+ * Return: the corresponding reply object or NULL if not found
-+ */
-+struct kdbus_reply *kdbus_reply_find(struct kdbus_conn *replying,
-+				     struct kdbus_conn *reply_dst,
-+				     u64 cookie)
-+{
-+	struct kdbus_reply *r;
-+
-+	list_for_each_entry(r, &reply_dst->reply_list, entry) {
-+		if (r->cookie == cookie &&
-+		    (!replying || r->reply_src == replying))
-+			return r;
-+	}
-+
-+	return NULL;
-+}
-+
-+/**
-+ * kdbus_reply_list_scan_work() - Worker callback to scan the replies of a
-+ *				  connection for exceeded timeouts
-+ * @work:		Work struct of the connection to scan
-+ *
-+ * Walk the list of replies stored with a connection and look for entries
-+ * that have exceeded their timeout. If such an entry is found, a timeout
-+ * notification is sent to the waiting peer, and the reply is removed from
-+ * the list.
-+ *
-+ * The work is rescheduled to the nearest timeout found during the list
-+ * iteration.
-+ */
-+void kdbus_reply_list_scan_work(struct work_struct *work)
-+{
-+	struct kdbus_conn *conn =
-+		container_of(work, struct kdbus_conn, work.work);
-+	struct kdbus_reply *reply, *reply_tmp;
-+	u64 deadline = ~0ULL;
-+	u64 now;
-+
-+	now = ktime_get_ns();
-+
-+	mutex_lock(&conn->lock);
-+	if (!kdbus_conn_active(conn)) {
-+		mutex_unlock(&conn->lock);
-+		return;
-+	}
-+
-+	list_for_each_entry_safe(reply, reply_tmp, &conn->reply_list, entry) {
-+		/*
-+		 * If the reply block is waiting for synchronous I/O,
-+		 * the timeout is handled by wait_event_*_timeout(),
-+		 * so we don't have to care for it here.
-+		 */
-+		if (reply->sync && !reply->interrupted)
-+			continue;
-+
-+		WARN_ON(reply->reply_dst != conn);
-+
-+		if (reply->deadline_ns > now) {
-+			/* remember next timeout */
-+			if (deadline > reply->deadline_ns)
-+				deadline = reply->deadline_ns;
-+
-+			continue;
-+		}
-+
-+		/*
-+		 * A zero deadline means the connection died, was
-+		 * cleaned up already and the notification was sent.
-+		 * Don't send notifications for reply trackers that were
-+		 * left in an interrupted syscall state.
-+		 */
-+		if (reply->deadline_ns != 0 && !reply->interrupted)
-+			kdbus_notify_reply_timeout(conn->ep->bus, conn->id,
-+						   reply->cookie);
-+
-+		kdbus_reply_unlink(reply);
-+	}
-+
-+	/* rearm delayed work with next timeout */
-+	if (deadline != ~0ULL)
-+		schedule_delayed_work(&conn->work,
-+				      nsecs_to_jiffies(deadline - now));
-+
-+	mutex_unlock(&conn->lock);
-+
-+	kdbus_notify_flush(conn->ep->bus);
-+}
-diff --git a/ipc/kdbus/reply.h b/ipc/kdbus/reply.h
-new file mode 100644
-index 0000000..68d5232
---- /dev/null
-+++ b/ipc/kdbus/reply.h
-@@ -0,0 +1,68 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_REPLY_H
-+#define __KDBUS_REPLY_H
-+
-+/**
-+ * struct kdbus_reply - an entry of kdbus_conn's list of replies
-+ * @kref:		Ref-count of this object
-+ * @entry:		The entry of the connection's reply_list
-+ * @reply_src:		The connection the reply will be sent from
-+ * @reply_dst:		The connection the reply will be sent to
-+ * @queue_entry:	The queue entry item that is prepared by the replying
-+ *			connection
-+ * @deadline_ns:	The deadline of the reply, in nanoseconds
-+ * @cookie:		The cookie of the requesting message
-+ * @name_id:		ID of the well-known name the original msg was sent to
-+ * @sync:		The reply block is waiting for synchronous I/O
-+ * @waiting:		The condition to synchronously wait for
-+ * @interrupted:	The sync reply was left in an interrupted state
-+ * @err:		The error code for the synchronous reply
-+ */
-+struct kdbus_reply {
-+	struct kref kref;
-+	struct list_head entry;
-+	struct kdbus_conn *reply_src;
-+	struct kdbus_conn *reply_dst;
-+	struct kdbus_queue_entry *queue_entry;
-+	u64 deadline_ns;
-+	u64 cookie;
-+	u64 name_id;
-+	bool sync:1;
-+	bool waiting:1;
-+	bool interrupted:1;
-+	int err;
-+};
-+
-+struct kdbus_reply *kdbus_reply_new(struct kdbus_conn *reply_src,
-+				    struct kdbus_conn *reply_dst,
-+				    const struct kdbus_msg *msg,
-+				    struct kdbus_name_entry *name_entry,
-+				    bool sync);
-+
-+struct kdbus_reply *kdbus_reply_ref(struct kdbus_reply *r);
-+struct kdbus_reply *kdbus_reply_unref(struct kdbus_reply *r);
-+
-+void kdbus_reply_link(struct kdbus_reply *r);
-+void kdbus_reply_unlink(struct kdbus_reply *r);
-+
-+struct kdbus_reply *kdbus_reply_find(struct kdbus_conn *replying,
-+				     struct kdbus_conn *reply_dst,
-+				     u64 cookie);
-+
-+void kdbus_sync_reply_wakeup(struct kdbus_reply *reply, int err);
-+void kdbus_reply_list_scan_work(struct work_struct *work);
-+
-+#endif /* __KDBUS_REPLY_H */
-diff --git a/ipc/kdbus/util.c b/ipc/kdbus/util.c
-new file mode 100644
-index 0000000..72b1883
---- /dev/null
-+++ b/ipc/kdbus/util.c
-@@ -0,0 +1,156 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/capability.h>
-+#include <linux/cred.h>
-+#include <linux/ctype.h>
-+#include <linux/err.h>
-+#include <linux/file.h>
-+#include <linux/slab.h>
-+#include <linux/string.h>
-+#include <linux/uaccess.h>
-+#include <linux/uio.h>
-+#include <linux/user_namespace.h>
-+
-+#include "limits.h"
-+#include "util.h"
-+
-+/**
-+ * kdbus_copy_from_user() - copy aligned data from user-space
-+ * @dest:	target buffer in kernel memory
-+ * @user_ptr:	user-provided source buffer
-+ * @size:	memory size to copy from user
-+ *
-+ * This copies @size bytes from @user_ptr into the kernel, just like
-+ * copy_from_user() does. But we enforce an 8-byte alignment and reject any
-+ * unaligned user-space pointers.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size)
-+{
-+	if (!KDBUS_IS_ALIGNED8((uintptr_t)user_ptr))
-+		return -EFAULT;
-+
-+	if (copy_from_user(dest, user_ptr, size))
-+		return -EFAULT;
-+
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_verify_uid_prefix() - verify UID prefix of a user-supplied name
-+ * @name:	user-supplied name to verify
-+ * @user_ns:	user-namespace to act in
-+ * @kuid:	Kernel internal uid of user
-+ *
-+ * This verifies that the user-supplied name @name has their UID as prefix. This
-+ * is the default name-spacing policy we enforce on user-supplied names for
-+ * public kdbus entities like buses and endpoints.
-+ *
-+ * The user must supply names prefixed with "<UID>-", where the UID is
-+ * interpreted in the user-namespace of the domain. If the user fails to supply
-+ * such a prefixed name, we reject it.
-+ *
-+ * Return: 0 on success, negative error code on failure
-+ */
-+int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
-+			    kuid_t kuid)
-+{
-+	uid_t uid;
-+	char prefix[16];
-+
-+	/*
-+	 * The kuid must have a mapping into the userns of the domain;
-+	 * otherwise, do not allow creation of buses or endpoints.
-+	 */
-+	uid = from_kuid(user_ns, kuid);
-+	if (uid == (uid_t) -1)
-+		return -EINVAL;
-+
-+	snprintf(prefix, sizeof(prefix), "%u-", uid);
-+	if (strncmp(name, prefix, strlen(prefix)) != 0)
-+		return -EINVAL;
-+
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_sanitize_attach_flags() - Sanitize attach flags from user-space
-+ * @flags:		Attach flags provided by userspace
-+ * @attach_flags:	A pointer where to store the valid attach flags
-+ *
-+ * Convert attach-flags provided by user-space into a valid mask. If the mask
-+ * is invalid, an error is returned. The sanitized attach flags are stored in
-+ * the output parameter.
-+ *
-+ * Return: 0 on success, negative error on failure.
-+ */
-+int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags)
-+{
-+	/* 'any' degrades to 'all' for compatibility */
-+	if (flags == _KDBUS_ATTACH_ANY)
-+		flags = _KDBUS_ATTACH_ALL;
-+
-+	/* reject unknown attach flags */
-+	if (flags & ~_KDBUS_ATTACH_ALL)
-+		return -EINVAL;
-+
-+	*attach_flags = flags;
-+	return 0;
-+}
-+
-+/**
-+ * kdbus_kvec_set - helper utility to assemble kvec arrays
-+ * @kvec:	kvec entry to use
-+ * @src:	Source address to set in @kvec
-+ * @len:	Number of bytes in @src
-+ * @total_len:	Pointer to total length variable
-+ *
-+ * Set @src and @len in @kvec, and increase @total_len by @len.
-+ */
-+void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len)
-+{
-+	kvec->iov_base = src;
-+	kvec->iov_len = len;
-+	*total_len += len;
-+}
-+
-+static const char * const zeros = "\0\0\0\0\0\0\0";
-+
-+/**
-+ * kdbus_kvec_pad - conditionally write a padding kvec
-+ * @kvec:	kvec entry to use
-+ * @len:	Total length used for kvec array
-+ *
-+ * Check if the current total byte length of the array in @len is aligned to
-+ * 8 bytes. If it isn't, fill @kvec with padding information and increase @len
-+ * by the number of bytes stored in @kvec.
-+ *
-+ * Return: the number of added padding bytes.
-+ */
-+size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len)
-+{
-+	size_t pad = KDBUS_ALIGN8(*len) - *len;
-+
-+	if (!pad)
-+		return 0;
-+
-+	kvec->iov_base = (void *)zeros;
-+	kvec->iov_len = pad;
-+
-+	*len += pad;
-+
-+	return pad;
-+}
-diff --git a/ipc/kdbus/util.h b/ipc/kdbus/util.h
-new file mode 100644
-index 0000000..5297166
---- /dev/null
-+++ b/ipc/kdbus/util.h
-@@ -0,0 +1,73 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_UTIL_H
-+#define __KDBUS_UTIL_H
-+
-+#include <linux/dcache.h>
-+#include <linux/ioctl.h>
-+
-+#include <uapi/linux/kdbus.h>
-+
-+/* all exported addresses are 64 bit */
-+#define KDBUS_PTR(addr) ((void __user *)(uintptr_t)(addr))
-+
-+/* all exported sizes are 64 bit and data aligned to 64 bit */
-+#define KDBUS_ALIGN8(s) ALIGN((s), 8)
-+#define KDBUS_IS_ALIGNED8(s) (IS_ALIGNED(s, 8))
-+
-+/**
-+ * kdbus_member_set_user - write a structure member to user memory
-+ * @_s:		Variable to copy from
-+ * @_b:		Buffer to write to
-+ * @_t:		Structure type
-+ * @_m:		Member name in the passed structure
-+ *
-+ * Return: the result of copy_to_user()
-+ */
-+#define kdbus_member_set_user(_s, _b, _t, _m)				\
-+({									\
-+	u64 __user *_sz =						\
-+		(void __user *)((u8 __user *)(_b) + offsetof(_t, _m));	\
-+	copy_to_user(_sz, _s, FIELD_SIZEOF(_t, _m));			\
-+})
-+
-+/**
-+ * kdbus_strhash - calculate a hash
-+ * @str:	String
-+ *
-+ * Return: hash value
-+ */
-+static inline unsigned int kdbus_strhash(const char *str)
-+{
-+	unsigned long hash = init_name_hash();
-+
-+	while (*str)
-+		hash = partial_name_hash(*str++, hash);
-+
-+	return end_name_hash(hash);
-+}
-+
-+int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
-+			    kuid_t kuid);
-+int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags);
-+
-+int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size);
-+
-+struct kvec;
-+
-+void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len);
-+size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len);
-+
-+#endif
-diff --git a/samples/Kconfig b/samples/Kconfig
-index 224ebb4..a4c6b2f 100644
---- a/samples/Kconfig
-+++ b/samples/Kconfig
-@@ -55,6 +55,13 @@ config SAMPLE_KDB
- 	  Build an example of how to dynamically add the hello
- 	  command to the kdb shell.
- 
-+config SAMPLE_KDBUS
-+	bool "Build kdbus API example"
-+	depends on KDBUS
-+	help
-+	  Build an example of how the kdbus API can be used from
-+	  userspace.
-+
- config SAMPLE_RPMSG_CLIENT
- 	tristate "Build rpmsg client sample -- loadable modules only"
- 	depends on RPMSG && m
-diff --git a/samples/Makefile b/samples/Makefile
-index f00257b..f0ad51e 100644
---- a/samples/Makefile
-+++ b/samples/Makefile
-@@ -1,4 +1,5 @@
- # Makefile for Linux samples code
- 
- obj-$(CONFIG_SAMPLES)	+= kobject/ kprobes/ trace_events/ livepatch/ \
--			   hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/
-+			   hw_breakpoint/ kfifo/ kdb/ kdbus/ hidraw/ rpmsg/ \
-+			   seccomp/
-diff --git a/samples/kdbus/.gitignore b/samples/kdbus/.gitignore
-new file mode 100644
-index 0000000..ee07d98
---- /dev/null
-+++ b/samples/kdbus/.gitignore
-@@ -0,0 +1 @@
-+kdbus-workers
-diff --git a/samples/kdbus/Makefile b/samples/kdbus/Makefile
-new file mode 100644
-index 0000000..137f842
---- /dev/null
-+++ b/samples/kdbus/Makefile
-@@ -0,0 +1,9 @@
-+# kbuild trick to avoid linker error. Can be omitted if a module is built.
-+obj- := dummy.o
-+
-+hostprogs-$(CONFIG_SAMPLE_KDBUS) += kdbus-workers
-+
-+always := $(hostprogs-y)
-+
-+HOSTCFLAGS_kdbus-workers.o += -I$(objtree)/usr/include
-+HOSTLOADLIBES_kdbus-workers := -lrt
-diff --git a/samples/kdbus/kdbus-api.h b/samples/kdbus/kdbus-api.h
-new file mode 100644
-index 0000000..7f3abae
---- /dev/null
-+++ b/samples/kdbus/kdbus-api.h
-@@ -0,0 +1,114 @@
-+#ifndef KDBUS_API_H
-+#define KDBUS_API_H
-+
-+#include <sys/ioctl.h>
-+#include <linux/kdbus.h>
-+
-+#define KDBUS_ALIGN8(l) (((l) + 7) & ~7)
-+#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
-+#define KDBUS_ITEM_SIZE(s) KDBUS_ALIGN8((s) + KDBUS_ITEM_HEADER_SIZE)
-+#define KDBUS_ITEM_NEXT(item) \
-+	(typeof(item))((uint8_t *)(item) + KDBUS_ALIGN8((item)->size))
-+#define KDBUS_FOREACH(iter, first, _size)				\
-+	for ((iter) = (first);						\
-+	     ((uint8_t *)(iter) < (uint8_t *)(first) + (_size)) &&	\
-+	       ((uint8_t *)(iter) >= (uint8_t *)(first));		\
-+	     (iter) = (void *)((uint8_t *)(iter) + KDBUS_ALIGN8((iter)->size)))
-+
-+static inline int kdbus_cmd_bus_make(int control_fd, struct kdbus_cmd *cmd)
-+{
-+	int ret = ioctl(control_fd, KDBUS_CMD_BUS_MAKE, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_endpoint_make(int bus_fd, struct kdbus_cmd *cmd)
-+{
-+	int ret = ioctl(bus_fd, KDBUS_CMD_ENDPOINT_MAKE, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_endpoint_update(int ep_fd, struct kdbus_cmd *cmd)
-+{
-+	int ret = ioctl(ep_fd, KDBUS_CMD_ENDPOINT_UPDATE, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_hello(int bus_fd, struct kdbus_cmd_hello *cmd)
-+{
-+	int ret = ioctl(bus_fd, KDBUS_CMD_HELLO, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_update(int fd, struct kdbus_cmd *cmd)
-+{
-+	int ret = ioctl(fd, KDBUS_CMD_UPDATE, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_byebye(int conn_fd, struct kdbus_cmd *cmd)
-+{
-+	int ret = ioctl(conn_fd, KDBUS_CMD_BYEBYE, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_free(int conn_fd, struct kdbus_cmd_free *cmd)
-+{
-+	int ret = ioctl(conn_fd, KDBUS_CMD_FREE, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_conn_info(int conn_fd, struct kdbus_cmd_info *cmd)
-+{
-+	int ret = ioctl(conn_fd, KDBUS_CMD_CONN_INFO, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_bus_creator_info(int conn_fd, struct kdbus_cmd_info *cmd)
-+{
-+	int ret = ioctl(conn_fd, KDBUS_CMD_BUS_CREATOR_INFO, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_list(int fd, struct kdbus_cmd_list *cmd)
-+{
-+	int ret = ioctl(fd, KDBUS_CMD_LIST, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_send(int conn_fd, struct kdbus_cmd_send *cmd)
-+{
-+	int ret = ioctl(conn_fd, KDBUS_CMD_SEND, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_recv(int conn_fd, struct kdbus_cmd_recv *cmd)
-+{
-+	int ret = ioctl(conn_fd, KDBUS_CMD_RECV, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_name_acquire(int conn_fd, struct kdbus_cmd *cmd)
-+{
-+	int ret = ioctl(conn_fd, KDBUS_CMD_NAME_ACQUIRE, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_name_release(int conn_fd, struct kdbus_cmd *cmd)
-+{
-+	int ret = ioctl(conn_fd, KDBUS_CMD_NAME_RELEASE, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_match_add(int conn_fd, struct kdbus_cmd_match *cmd)
-+{
-+	int ret = ioctl(conn_fd, KDBUS_CMD_MATCH_ADD, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_match_remove(int conn_fd, struct kdbus_cmd_match *cmd)
-+{
-+	int ret = ioctl(conn_fd, KDBUS_CMD_MATCH_REMOVE, cmd);
-+	return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+#endif /* KDBUS_API_H */
-diff --git a/samples/kdbus/kdbus-workers.c b/samples/kdbus/kdbus-workers.c
-new file mode 100644
-index 0000000..5a6dfdc
---- /dev/null
-+++ b/samples/kdbus/kdbus-workers.c
-@@ -0,0 +1,1346 @@
-+/*
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+/*
-+ * Example: Workers
-+ * This program computes prime-numbers based on the sieve of Eratosthenes. The
-+ * master sets up a shared memory region and spawns workers which clear out the
-+ * non-primes. The master reacts to keyboard input and to client-requests to
-+ * control what each worker does. Note that this is in no way meant as an
-+ * efficient way to compute primes. It only serves as an example of how a
-+ * master/worker concept can be implemented with kdbus control messages.
-+ *
-+ * The main process is called the 'master'. It creates a new, private bus which
-+ * will be used between the master and its workers to communicate. The master
-+ * then spawns a fixed number of workers. Whenever a worker dies (detected via
-+ * SIGCHLD), the master spawns a new worker. When done, the master waits for all
-+ * workers to exit, prints a status report and exits itself.
-+ *
-+ * The master process does *not* keep track of its workers. Instead, this
-+ * example implements a PULL model. That is, the master acquires a well-known
-+ * name on the bus which each worker uses to request tasks from the master. If
-+ * there are no more tasks, the master will return an empty task-list, which
-+ * causes a worker to exit immediately.
-+ *
-+ * As tasks can be computationally expensive, we support cancellation. Whenever
-+ * the master process is interrupted, it will drop its well-known name on the
-+ * bus. This causes kdbus to broadcast a name-change notification. The workers
-+ * check for broadcast messages regularly and will exit if they receive one.
-+ *
-+ * This example consists of 4 objects:
-+ *  * master: The master object contains the context of the master process. This
-+ *            process manages the prime-context, spawns workers and assigns
-+ *            prime-ranges to each worker to compute.
-+ *            The master does not do any prime-computations itself.
-+ *  * child:  The child object contains the context of a worker. It inherits the
-+ *            prime context from its parent (the master) and then creates a new
-+ *            bus context to request prime-ranges to compute.
-+ *  * prime:  The "prime" object is used to abstract how we compute primes. When
-+ *            allocated, it prepares a memory region to hold 1 bit for each
-+ *            natural number up to a fixed maximum ('MAX_PRIMES').
-+ *            The memory region is backed by a memfd which we share between
-+ *            processes. Each worker is assigned a range of natural numbers
-+ *            and clears their multiples from the memory region. The
-+ *            master process is responsible for distributing all natural numbers
-+ *            up to the fixed maximum to its workers.
-+ *  * bus:    The bus object is an abstraction of the kdbus API. It is pretty
-+ *            straightforward and only manages the connection-fd plus the
-+ *            memory-mapped pool in a single object.
-+ *
-+ * This example is in reversed order, which should make it easier to read
-+ * top-down, but requires some forward-declarations. Just ignore those.
-+ */
-+
-+#include <stdio.h>
-+#include <stdlib.h>
-+#include <sys/syscall.h>
-+
-+/* glibc < 2.7 does not ship sys/signalfd.h */
-+/* we require kernels with __NR_memfd_create */
-+#if __GLIBC__ >= 2 && __GLIBC_MINOR__ >= 7 && defined(__NR_memfd_create)
-+
-+#include <ctype.h>
-+#include <errno.h>
-+#include <fcntl.h>
-+#include <linux/memfd.h>
-+#include <signal.h>
-+#include <stdbool.h>
-+#include <stddef.h>
-+#include <stdint.h>
-+#include <string.h>
-+#include <sys/mman.h>
-+#include <sys/poll.h>
-+#include <sys/signalfd.h>
-+#include <sys/time.h>
-+#include <sys/wait.h>
-+#include <time.h>
-+#include <unistd.h>
-+#include "kdbus-api.h"
-+
-+/* FORWARD DECLARATIONS */
-+
-+#define POOL_SIZE (16 * 1024 * 1024)
-+#define MAX_PRIMES (2UL << 24)
-+#define WORKER_COUNT (16)
-+#define PRIME_STEPS (65536 * 4)
-+
-+static const char *arg_busname = "example-workers";
-+static const char *arg_modname = "kdbus";
-+static const char *arg_master = "org.freedesktop.master";
-+
-+static int err_assert(int r_errno, const char *msg, const char *func, int line,
-+		      const char *file)
-+{
-+	r_errno = (r_errno != 0) ? -abs(r_errno) : -EFAULT;
-+	if (r_errno < 0) {
-+		errno = -r_errno;
-+		fprintf(stderr, "ERR: %s: %m (%s:%d in %s)\n",
-+			msg, func, line, file);
-+	}
-+	return r_errno;
-+}
-+
-+#define err_r(_r, _msg) err_assert((_r), (_msg), __func__, __LINE__, __FILE__)
-+#define err(_msg) err_r(errno, (_msg))
-+
-+struct prime;
-+struct bus;
-+struct master;
-+struct child;
-+
-+struct prime {
-+	int fd;
-+	uint8_t *area;
-+	size_t max;
-+	size_t done;
-+	size_t status;
-+};
-+
-+static int prime_new(struct prime **out);
-+static void prime_free(struct prime *p);
-+static bool prime_done(struct prime *p);
-+static void prime_consume(struct prime *p, size_t amount);
-+static int prime_run(struct prime *p, struct bus *cancel, size_t number);
-+static void prime_print(struct prime *p);
-+
-+struct bus {
-+	int fd;
-+	uint8_t *pool;
-+};
-+
-+static int bus_open_connection(struct bus **out, uid_t uid, const char *name,
-+			       uint64_t recv_flags);
-+static void bus_close_connection(struct bus *b);
-+static void bus_poool_free_slice(struct bus *b, uint64_t offset);
-+static int bus_acquire_name(struct bus *b, const char *name);
-+static int bus_install_name_loss_match(struct bus *b, const char *name);
-+static int bus_poll(struct bus *b);
-+static int bus_make(uid_t uid, const char *name);
-+
-+struct master {
-+	size_t n_workers;
-+	size_t max_workers;
-+
-+	int signal_fd;
-+	int control_fd;
-+
-+	struct prime *prime;
-+	struct bus *bus;
-+};
-+
-+static int master_new(struct master **out);
-+static void master_free(struct master *m);
-+static int master_run(struct master *m);
-+static int master_poll(struct master *m);
-+static int master_handle_stdin(struct master *m);
-+static int master_handle_signal(struct master *m);
-+static int master_handle_bus(struct master *m);
-+static int master_reply(struct master *m, const struct kdbus_msg *msg);
-+static int master_waitpid(struct master *m);
-+static int master_spawn(struct master *m);
-+
-+struct child {
-+	struct bus *bus;
-+	struct prime *prime;
-+};
-+
-+static int child_new(struct child **out, struct prime *p);
-+static void child_free(struct child *c);
-+static int child_run(struct child *c);
-+
-+/* END OF FORWARD DECLARATIONS */
-+
-+/*
-+ * This is the main entrypoint of this example. It is pretty straightforward. We
-+ * create a master object, run the computation, print a status report and then
-+ * exit. Nothing particularly interesting here, so let's look into the master
-+ * object...
-+ */
-+int main(int argc, char **argv)
-+{
-+	struct master *m = NULL;
-+	int r;
-+
-+	r = master_new(&m);
-+	if (r < 0)
-+		goto out;
-+
-+	r = master_run(m);
-+	if (r < 0)
-+		goto out;
-+
-+	if (0)
-+		prime_print(m->prime);
-+
-+out:
-+	master_free(m);
-+	if (r < 0 && r != -EINTR)
-+		fprintf(stderr, "failed\n");
-+	else
-+		fprintf(stderr, "done\n");
-+	return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
-+}
-+
-+/*
-+ * ...this will allocate a new master context. It keeps track of the current
-+ * number of children/workers that are running, manages a signalfd to track
-+ * SIGCHLD, and creates a private kdbus bus. Afterwards, it opens its connection
-+ * to the bus and acquires a well-known name (arg_master).
-+ */
-+static int master_new(struct master **out)
-+{
-+	struct master *m;
-+	sigset_t smask;
-+	int r;
-+
-+	m = calloc(1, sizeof(*m));
-+	if (!m)
-+		return err("cannot allocate master");
-+
-+	m->max_workers = WORKER_COUNT;
-+	m->signal_fd = -1;
-+	m->control_fd = -1;
-+
-+	/* Block SIGINT and SIGCHLD signals */
-+	sigemptyset(&smask);
-+	sigaddset(&smask, SIGINT);
-+	sigaddset(&smask, SIGCHLD);
-+	sigprocmask(SIG_BLOCK, &smask, NULL);
-+
-+	m->signal_fd = signalfd(-1, &smask, SFD_CLOEXEC);
-+	if (m->signal_fd < 0) {
-+		r = err("cannot create signalfd");
-+		goto error;
-+	}
-+
-+	r = prime_new(&m->prime);
-+	if (r < 0)
-+		goto error;
-+
-+	m->control_fd = bus_make(getuid(), arg_busname);
-+	if (m->control_fd < 0) {
-+		r = m->control_fd;
-+		goto error;
-+	}
-+
-+	/*
-+	 * Open a bus connection for the master, and require each received
-+	 * message to have a metadata item of type KDBUS_ITEM_PIDS attached.
-+	 * The current UID is needed to compute the name of the bus node to
-+	 * connect to.
-+	 */
-+	r = bus_open_connection(&m->bus, getuid(),
-+				arg_busname, KDBUS_ATTACH_PIDS);
-+	if (r < 0)
-+		goto error;
-+
-+	/*
-+	 * Acquire a well-known name on the bus, so children can address
-+	 * messages to the master by using KDBUS_DST_ID_NAME as the
-+	 * destination ID.
-+	 */
-+	r = bus_acquire_name(m->bus, arg_master);
-+	if (r < 0)
-+		goto error;
-+
-+	*out = m;
-+	return 0;
-+
-+error:
-+	master_free(m);
-+	return r;
-+}
-+
-+/* pretty straightforward destructor of a master object */
-+static void master_free(struct master *m)
-+{
-+	if (!m)
-+		return;
-+
-+	bus_close_connection(m->bus);
-+	if (m->control_fd >= 0)
-+		close(m->control_fd);
-+	prime_free(m->prime);
-+	if (m->signal_fd >= 0)
-+		close(m->signal_fd);
-+	free(m);
-+}
-+
-+static int master_run(struct master *m)
-+{
-+	int res, r = 0;
-+
-+	while (!prime_done(m->prime)) {
-+		while (m->n_workers < m->max_workers) {
-+			r = master_spawn(m);
-+			if (r < 0)
-+				break;
-+		}
-+
-+		r = master_poll(m);
-+		if (r < 0)
-+			break;
-+	}
-+
-+	if (r < 0) {
-+		bus_close_connection(m->bus);
-+		m->bus = NULL;
-+	}
-+
-+	while (m->n_workers > 0) {
-+		res = master_poll(m);
-+		if (res < 0) {
-+			if (m->bus) {
-+				bus_close_connection(m->bus);
-+				m->bus = NULL;
-+			}
-+			r = res;
-+		}
-+	}
-+
-+	return r == -EINTR ? 0 : r;
-+}
-+
-+static int master_poll(struct master *m)
-+{
-+	struct pollfd fds[3] = {};
-+	int r = 0, n = 0;
-+
-+	/*
-+	 * Add stdin, the signalfd and the connection owner file descriptor to
-+	 * the pollfd table, and handle incoming traffic on the latter in
-+	 * master_handle_bus().
-+	 */
-+	fds[n].fd = STDIN_FILENO;
-+	fds[n++].events = POLLIN;
-+	fds[n].fd = m->signal_fd;
-+	fds[n++].events = POLLIN;
-+	if (m->bus) {
-+		fds[n].fd = m->bus->fd;
-+		fds[n++].events = POLLIN;
-+	}
-+
-+	r = poll(fds, n, -1);
-+	if (r < 0)
-+		return err("poll() failed");
-+
-+	if (fds[0].revents & POLLIN)
-+		r = master_handle_stdin(m);
-+	else if (fds[0].revents)
-+		r = err("ERR/HUP on stdin");
-+	if (r < 0)
-+		return r;
-+
-+	if (fds[1].revents & POLLIN)
-+		r = master_handle_signal(m);
-+	else if (fds[1].revents)
-+		r = err("ERR/HUP on signalfd");
-+	if (r < 0)
-+		return r;
-+
-+	if (fds[2].revents & POLLIN)
-+		r = master_handle_bus(m);
-+	else if (fds[2].revents)
-+		r = err("ERR/HUP on bus");
-+
-+	return r;
-+}
-+
-+static int master_handle_stdin(struct master *m)
-+{
-+	char buf[128];
-+	ssize_t l;
-+	int r = 0;
-+
-+	l = read(STDIN_FILENO, buf, sizeof(buf));
-+	if (l < 0)
-+		return err("cannot read stdin");
-+	if (l == 0)
-+		return err_r(-EINVAL, "EOF on stdin");
-+
-+	while (l-- > 0) {
-+		switch (buf[l]) {
-+		case 'q':
-+			/* quit */
-+			r = -EINTR;
-+			break;
-+		case '\n':
-+		case ' ':
-+			/* ignore */
-+			break;
-+		default:
-+			if (isgraph(buf[l]))
-+				fprintf(stderr, "invalid input '%c'\n", buf[l]);
-+			else
-+				fprintf(stderr, "invalid input 0x%x\n", buf[l]);
-+			break;
-+		}
-+	}
-+
-+	return r;
-+}
-+
-+static int master_handle_signal(struct master *m)
-+{
-+	struct signalfd_siginfo val;
-+	ssize_t l;
-+
-+	l = read(m->signal_fd, &val, sizeof(val));
-+	if (l < 0)
-+		return err("cannot read signalfd");
-+	if (l != sizeof(val))
-+		return err_r(-EINVAL, "invalid data from signalfd");
-+
-+	switch (val.ssi_signo) {
-+	case SIGCHLD:
-+		return master_waitpid(m);
-+	case SIGINT:
-+		return err_r(-EINTR, "interrupted");
-+	default:
-+		return err_r(-EINVAL, "caught invalid signal");
-+	}
-+}
-+
-+static int master_handle_bus(struct master *m)
-+{
-+	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
-+	const struct kdbus_msg *msg = NULL;
-+	const struct kdbus_item *item;
-+	const struct kdbus_vec *vec = NULL;
-+	int r = 0;
-+
-+	/*
-+	 * To receive a message, the KDBUS_CMD_RECV ioctl is used.
-+	 * It takes an argument of type 'struct kdbus_cmd_recv', which
-+	 * will contain information on the received message when the call
-+	 * returns. See kdbus.message(7).
-+	 */
-+	r = kdbus_cmd_recv(m->bus->fd, &recv);
-+	/*
-+	 * EAGAIN is returned when there is no message waiting on this
-+	 * connection. This is not an error - simply bail out.
-+	 */
-+	if (r == -EAGAIN)
-+		return 0;
-+	if (r < 0)
-+		return err_r(r, "cannot receive message");
-+
-+	/*
-+	 * Messages received by a connection are stored inside the connection's
-+	 * pool, at an offset that has been returned in the 'recv' command
-+	 * struct above. The value describes the relative offset from the
-+	 * start address of the pool. A message is described with
-+	 * 'struct kdbus_msg'. See kdbus.message(7).
-+	 */
-+	msg = (void *)(m->bus->pool + recv.msg.offset);
-+
-+	/*
-+	 * A message describes its actual payload in an array of items.
-+	 * KDBUS_FOREACH() is a simple iterator that walks such an array.
-+	 * struct kdbus_msg has a field to denote its total size, which is
-+	 * needed to determine the number of items in the array.
-+	 */
-+	KDBUS_FOREACH(item, msg->items,
-+		      msg->size - offsetof(struct kdbus_msg, items)) {
-+		/*
-+		 * An item of type PAYLOAD_OFF describes in-line memory
-+		 * stored in the pool at a described offset. That offset is
-+		 * relative to the start address of the message header.
-+		 * This example program only expects one single item of that
-+		 * type, remembers the struct kdbus_vec member of the item
-+		 * when it sees it, and bails out if there is more than one
-+		 * of them.
-+		 */
-+		if (item->type == KDBUS_ITEM_PAYLOAD_OFF) {
-+			if (vec) {
-+				r = err_r(-EEXIST,
-+					  "message with multiple vecs");
-+				break;
-+			}
-+			vec = &item->vec;
-+			if (vec->size != 1) {
-+				r = err_r(-EINVAL, "invalid message size");
-+				break;
-+			}
-+
-+		/*
-+		 * MEMFDs are transported as items of type PAYLOAD_MEMFD.
-+		 * If such an item is attached, a new file descriptor was
-+		 * installed into the task when KDBUS_CMD_RECV was called, and
-+		 * its number is stored in item->memfd.fd.
-+		 * Implementers *must* handle this item type and close the
-+		 * file descriptor when no longer needed in order to prevent
-+		 * file descriptor exhaustion. This example program just bails
-+		 * out with an error in this case, as memfds are not expected
-+		 * in this context.
-+		 */
-+		} else if (item->type == KDBUS_ITEM_PAYLOAD_MEMFD) {
-+			r = err_r(-EINVAL, "message with memfd");
-+			break;
-+		}
-+	}
-+	if (r < 0)
-+		goto exit;
-+	if (!vec) {
-+		r = err_r(-EINVAL, "empty message");
-+		goto exit;
-+	}
-+
-+	switch (*((const uint8_t *)msg + vec->offset)) {
-+	case 'r': {
-+		r = master_reply(m, msg);
-+		break;
-+	}
-+	default:
-+		r = err_r(-EINVAL, "invalid message type");
-+		break;
-+	}
-+
-+exit:
-+	/*
-+	 * We are done with the memory slice that was given to us through
-+	 * recv.msg.offset. Tell the kernel it can use it for other content
-+	 * in the future. See kdbus.pool(7).
-+	 */
-+	bus_poool_free_slice(m->bus, recv.msg.offset);
-+	return r;
-+}
-+
-+static int master_reply(struct master *m, const struct kdbus_msg *msg)
-+{
-+	struct kdbus_cmd_send cmd;
-+	struct kdbus_item *item;
-+	struct kdbus_msg *reply;
-+	size_t size, status, p[2];
-+	int r;
-+
-+	/*
-+	 * This function sends a message over kdbus. To do this, it uses the
-+	 * KDBUS_CMD_SEND ioctl, which takes a command struct argument of type
-+	 * 'struct kdbus_cmd_send'. This struct stores a pointer to the actual
-+	 * message to send. See kdbus.message(7).
-+	 */
-+	p[0] = m->prime->done;
-+	p[1] = prime_done(m->prime) ? 0 : PRIME_STEPS;
-+
-+	size = sizeof(*reply);
-+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+	/* Prepare the message to send */
-+	reply = alloca(size);
-+	memset(reply, 0, size);
-+	reply->size = size;
-+
-+	/* Each message has a cookie that can be used to send replies */
-+	reply->cookie = 1;
-+
-+	/* The payload_type is arbitrary, but it must be non-zero */
-+	reply->payload_type = 0xdeadbeef;
-+
-+	/*
-+	 * We are sending a reply. Let the kernel know the cookie of the
-+	 * message we are replying to.
-+	 */
-+	reply->cookie_reply = msg->cookie;
-+
-+	/*
-+	 * Messages can either be directed to a well-known name (stored as
-+	 * string) or to a unique name (stored as number). This example does
-+	 * the latter. If the message would be directed to a well-known name
-+	 * instead, the message's dst_id field would be set to
-+	 * KDBUS_DST_ID_NAME, and the name would be attached in an item of type
-+	 * KDBUS_ITEM_DST_NAME. See below for an example, and also refer to
-+	 * kdbus.message(7).
-+	 */
-+	reply->dst_id = msg->src_id;
-+
-+	/* Our message has exactly one item to store its payload */
-+	item = reply->items;
-+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+	item->vec.address = (uintptr_t)p;
-+	item->vec.size = sizeof(p);
-+
-+	/*
-+	 * Now prepare the command struct, and reference the message we want
-+	 * to send.
-+	 */
-+	memset(&cmd, 0, sizeof(cmd));
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)reply;
-+
-+	/*
-+	 * Finally, employ the command on the connection owner
-+	 * file descriptor.
-+	 */
-+	r = kdbus_cmd_send(m->bus->fd, &cmd);
-+	if (r < 0)
-+		return err_r(r, "cannot send reply");
-+
-+	if (p[1]) {
-+		prime_consume(m->prime, p[1]);
-+		status = m->prime->done * 10000 / m->prime->max;
-+		if (status != m->prime->status) {
-+			m->prime->status = status;
-+			fprintf(stderr, "status: %7.3lf%%\n",
-+				(double)status / 100);
-+		}
-+	}
-+
-+	return 0;
-+}
-+
-+static int master_waitpid(struct master *m)
-+{
-+	pid_t pid;
-+	int r;
-+
-+	while ((pid = waitpid(-1, &r, WNOHANG)) > 0) {
-+		if (m->n_workers > 0)
-+			--m->n_workers;
-+		if (!WIFEXITED(r))
-+			r = err_r(-EINVAL, "child died unexpectedly");
-+		else if (WEXITSTATUS(r) != 0)
-+			r = err_r(-WEXITSTATUS(r), "child failed");
-+	}
-+
-+	return r;
-+}
-+
-+static int master_spawn(struct master *m)
-+{
-+	struct child *c = NULL;
-+	struct prime *p = NULL;
-+	pid_t pid;
-+	int r;
-+
-+	/* Spawn off one child and call child_run() inside it */
-+
-+	pid = fork();
-+	if (pid < 0)
-+		return err("cannot fork");
-+	if (pid > 0) {
-+		/* parent */
-+		++m->n_workers;
-+		return 0;
-+	}
-+
-+	/* child */
-+
-+	p = m->prime;
-+	m->prime = NULL;
-+	master_free(m);
-+
-+	r = child_new(&c, p);
-+	if (r < 0)
-+		goto exit;
-+
-+	r = child_run(c);
-+
-+exit:
-+	child_free(c);
-+	exit(abs(r));
-+}
-+
-+static int child_new(struct child **out, struct prime *p)
-+{
-+	struct child *c;
-+	int r;
-+
-+	c = calloc(1, sizeof(*c));
-+	if (!c)
-+		return err("cannot allocate child");
-+
-+	c->prime = p;
-+
-+	/*
-+	 * Open a connection to the bus and require each received message to
-+	 * carry a list of the well-known names the sending connection currently
-+	 * owns. The current UID is needed in order to determine the name of the
-+	 * bus node to connect to.
-+	 */
-+	r = bus_open_connection(&c->bus, getuid(),
-+				arg_busname, KDBUS_ATTACH_NAMES);
-+	if (r < 0)
-+		goto error;
-+
-+	/*
-+	 * Install a kdbus match so the child's connection gets notified when
-+	 * the master loses its well-known name.
-+	 */
-+	r = bus_install_name_loss_match(c->bus, arg_master);
-+	if (r < 0)
-+		goto error;
-+
-+	*out = c;
-+	return 0;
-+
-+error:
-+	child_free(c);
-+	return r;
-+}
-+
-+static void child_free(struct child *c)
-+{
-+	if (!c)
-+		return;
-+
-+	bus_close_connection(c->bus);
-+	prime_free(c->prime);
-+	free(c);
-+}
-+
-+static int child_run(struct child *c)
-+{
-+	struct kdbus_cmd_send cmd;
-+	struct kdbus_item *item;
-+	struct kdbus_vec *vec = NULL;
-+	struct kdbus_msg *msg;
-+	struct timespec spec;
-+	size_t n, steps, size;
-+	int r = 0;
-+
-+	/*
-+	 * Let's send a message to the master and ask for work. To do this,
-+	 * we use the KDBUS_CMD_SEND ioctl, which takes an argument of type
-+	 * 'struct kdbus_cmd_send'. This struct stores a pointer to the actual
-+	 * message to send. See kdbus.message(7).
-+	 */
-+	size = sizeof(*msg);
-+	size += KDBUS_ITEM_SIZE(strlen(arg_master) + 1);
-+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+	msg = alloca(size);
-+	memset(msg, 0, size);
-+	msg->size = size;
-+
-+	/*
-+	 * Tell the kernel that we expect a reply to this message. This means
-+	 * that
-+	 *
-+	 * a) The remote peer will gain temporary permission to talk to us
-+	 *    even if it would not be allowed to normally.
-+	 *
-+	 * b) A timeout value is required.
-+	 *
-+	 *    For asynchronous send commands, if no reply is received, we will
-+	 *    get a kernel notification with an item of type
-+	 *    KDBUS_ITEM_REPLY_TIMEOUT attached.
-+	 *
-+	 *    For synchronous send commands (which this example does), the
-+	 *    ioctl will block until a reply is received or the timeout is
-+	 *    exceeded.
-+	 */
-+	msg->flags = KDBUS_MSG_EXPECT_REPLY;
-+
-+	/* Set our cookie. Replies must use this cookie to send their reply. */
-+	msg->cookie = 1;
-+
-+	/* The payload_type is arbitrary, but it must be non-zero */
-+	msg->payload_type = 0xdeadbeef;
-+
-+	/*
-+	 * We are sending our message to the current owner of a well-known
-+	 * name. This makes an item of type KDBUS_ITEM_DST_NAME mandatory.
-+	 */
-+	msg->dst_id = KDBUS_DST_ID_NAME;
-+
-+	/*
-+	 * Set the reply timeout to 5 seconds. Timeouts are always set in
-+	 * absolute timestamps, based on CLOCK_MONOTONIC. See kdbus.message(7).
-+	 */
-+	clock_gettime(CLOCK_MONOTONIC_COARSE, &spec);
-+	msg->timeout_ns += (5 + spec.tv_sec) * 1000ULL * 1000ULL * 1000ULL;
-+	msg->timeout_ns += spec.tv_nsec;
-+
-+	/*
-+	 * Fill the appended items. First, set the well-known name of the
-+	 * destination we want to talk to.
-+	 */
-+	item = msg->items;
-+	item->type = KDBUS_ITEM_DST_NAME;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + strlen(arg_master) + 1;
-+	strcpy(item->str, arg_master);
-+
-+	/*
-+	 * The 2nd item contains a vector to memory we want to send. It
-+	 * can be content of any type. In our case, we're sending a one-byte
-+	 * string only. The memory referenced by this item will be copied into
-+	 * the pool of the receiver connection, and does not need to be valid
-+	 * after the command is employed.
-+	 */
-+	item = KDBUS_ITEM_NEXT(item);
-+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+	item->vec.address = (uintptr_t)"r";
-+	item->vec.size = 1;
-+
-+	/* Set up the command struct and reference the message we prepared */
-+	memset(&cmd, 0, sizeof(cmd));
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)msg;
-+
-+	/*
-+	 * The send command knows a mode in which it will block until a
-+	 * reply to a message is received. This example uses that mode.
-+	 * The pool offset to the received reply will be stored in the command
-+	 * struct after the send command returned. See below.
-+	 */
-+	cmd.flags = KDBUS_SEND_SYNC_REPLY;
-+
-+	/*
-+	 * Finally, employ the command on the connection owner
-+	 * file descriptor.
-+	 */
-+	r = kdbus_cmd_send(c->bus->fd, &cmd);
-+	if (r == -ESRCH || r == -EPIPE || r == -ECONNRESET)
-+		return 0;
-+	if (r < 0)
-+		return err_r(r, "cannot send request to master");
-+
-+	/*
-+	 * The command was sent with the KDBUS_SEND_SYNC_REPLY flag set,
-+	 * and returned successfully, which means that cmd.reply.offset now
-+	 * points to a message inside our connection's pool where the reply
-+	 * is found. This is equivalent to receiving the reply with
-+	 * KDBUS_CMD_RECV, but it doesn't require waiting for the reply with
-+	 * poll() and also saves the ioctl to receive the message.
-+	 */
-+	msg = (void *)(c->bus->pool + cmd.reply.offset);
-+
-+	/*
-+	 * A message describes its actual payload in an array of items.
-+	 * KDBUS_FOREACH() is a simple iterator that walks such an array.
-+	 * struct kdbus_msg has a field to denote its total size, which is
-+	 * needed to determine the number of items in the array.
-+	 */
-+	KDBUS_FOREACH(item, msg->items,
-+		      msg->size - offsetof(struct kdbus_msg, items)) {
-+		/*
-+		 * An item of type PAYLOAD_OFF describes in-line memory
-+		 * stored in the pool at a described offset. That offset is
-+		 * relative to the start address of the message header.
-+		 * This example program only expects one single item of that
-+		 * type, remembers the struct kdbus_vec member of the item
-+		 * when it sees it, and bails out if there is more than one
-+		 * of them.
-+		 */
-+		if (item->type == KDBUS_ITEM_PAYLOAD_OFF) {
-+			if (vec) {
-+				r = err_r(-EEXIST,
-+					  "message with multiple vecs");
-+				break;
-+			}
-+			vec = &item->vec;
-+			if (vec->size != 2 * sizeof(size_t)) {
-+				r = err_r(-EINVAL, "invalid message size");
-+				break;
-+			}
-+		/*
-+		 * MEMFDs are transported as items of type PAYLOAD_MEMFD.
-+		 * If such an item is attached, a new file descriptor was
-+		 * installed into the task when KDBUS_CMD_RECV was called, and
-+		 * its number is stored in item->memfd.fd.
-+		 * Implementers *must* handle this item type and close the
-+		 * file descriptor when no longer needed in order to prevent
-+		 * file descriptor exhaustion. This example program just bails
-+		 * out with an error in this case, as memfds are not expected
-+		 * in this context.
-+		 */
-+		} else if (item->type == KDBUS_ITEM_PAYLOAD_MEMFD) {
-+			r = err_r(-EINVAL, "message with memfd");
-+			break;
-+		}
-+	}
-+	if (r < 0)
-+		goto exit;
-+	if (!vec) {
-+		r = err_r(-EINVAL, "empty message");
-+		goto exit;
-+	}
-+
-+	n = ((size_t *)((const uint8_t *)msg + vec->offset))[0];
-+	steps = ((size_t *)((const uint8_t *)msg + vec->offset))[1];
-+
-+	while (steps-- > 0) {
-+		++n;
-+		r = prime_run(c->prime, c->bus, n);
-+		if (r < 0)
-+			break;
-+		r = bus_poll(c->bus);
-+		if (r != 0) {
-+			r = r < 0 ? r : -EINTR;
-+			break;
-+		}
-+	}
-+
-+exit:
-+	/*
-+	 * We are done with the memory slice that was given to us through
-+	 * cmd.reply.offset. Tell the kernel it can use it for other content
-+	 * in the future. See kdbus.pool(7).
-+	 */
-+	bus_poool_free_slice(c->bus, cmd.reply.offset);
-+	return r;
-+}
-+
-+/*
-+ * Prime Computation
-+ *
-+ */
-+
-+static int prime_new(struct prime **out)
-+{
-+	struct prime *p;
-+	int r;
-+
-+	p = calloc(1, sizeof(*p));
-+	if (!p)
-+		return err("cannot allocate prime memory");
-+
-+	p->fd = -1;
-+	p->area = MAP_FAILED;
-+	p->max = MAX_PRIMES;
-+
-+	/*
-+	 * Prepare and map a memfd to store the bit-fields for the number
-+	 * ranges we want to perform the prime detection on.
-+	 */
-+	p->fd = syscall(__NR_memfd_create, "prime-area", MFD_CLOEXEC);
-+	if (p->fd < 0) {
-+		r = err("cannot create memfd");
-+		goto error;
-+	}
-+
-+	r = ftruncate(p->fd, p->max / 8 + 1);
-+	if (r < 0) {
-+		r = err("cannot ftruncate area");
-+		goto error;
-+	}
-+
-+	p->area = mmap(NULL, p->max / 8 + 1, PROT_READ | PROT_WRITE,
-+		       MAP_SHARED, p->fd, 0);
-+	if (p->area == MAP_FAILED) {
-+		r = err("cannot mmap memfd");
-+		goto error;
-+	}
-+
-+	*out = p;
-+	return 0;
-+
-+error:
-+	prime_free(p);
-+	return r;
-+}
-+
-+static void prime_free(struct prime *p)
-+{
-+	if (!p)
-+		return;
-+
-+	if (p->area != MAP_FAILED)
-+		munmap(p->area, p->max / 8 + 1);
-+	if (p->fd >= 0)
-+		close(p->fd);
-+	free(p);
-+}
-+
-+static bool prime_done(struct prime *p)
-+{
-+	return p->done >= p->max;
-+}
-+
-+static void prime_consume(struct prime *p, size_t amount)
-+{
-+	p->done += amount;
-+}
-+
-+static int prime_run(struct prime *p, struct bus *cancel, size_t number)
-+{
-+	size_t i, n = 0;
-+	int r;
-+
-+	if (number < 2 || number > 65535)
-+		return 0;
-+
-+	for (i = number * number;
-+	     i < p->max && i > number;
-+	     i += number) {
-+		p->area[i / 8] |= 1 << (i % 8);
-+
-+		if (!(++n % (1 << 20))) {
-+			r = bus_poll(cancel);
-+			if (r != 0)
-+				return r < 0 ? r : -EINTR;
-+		}
-+	}
-+
-+	return 0;
-+}
-+
-+static void prime_print(struct prime *p)
-+{
-+	size_t i, l = 0;
-+
-+	fprintf(stderr, "PRIMES:");
-+	for (i = 0; i < p->max; ++i) {
-+		if (!(p->area[i / 8] & (1 << (i % 8))))
-+			fprintf(stderr, "%c%7zu", !(l++ % 16) ? '\n' : ' ', i);
-+	}
-+	fprintf(stderr, "\nEND\n");
-+}
-+
-+static int bus_open_connection(struct bus **out, uid_t uid, const char *name,
-+			       uint64_t recv_flags)
-+{
-+	struct kdbus_cmd_hello hello;
-+	char path[128];
-+	struct bus *b;
-+	int r;
-+
-+	/*
-+	 * The 'bus' object is our representation of a kdbus connection which
-+	 * stores two details: the connection owner file descriptor, and the
-+	 * mmap()ed memory of its associated pool. See kdbus.connection(7) and
-+	 * kdbus.pool(7).
-+	 */
-+	b = calloc(1, sizeof(*b));
-+	if (!b)
-+		return err("cannot allocate bus memory");
-+
-+	b->fd = -1;
-+	b->pool = MAP_FAILED;
-+
-+	/* Compute the name of the bus node to connect to. */
-+	snprintf(path, sizeof(path), "/sys/fs/%s/%lu-%s/bus",
-+		 arg_modname, (unsigned long)uid, name);
-+	b->fd = open(path, O_RDWR | O_CLOEXEC);
-+	if (b->fd < 0) {
-+		r = err("cannot open bus");
-+		goto error;
-+	}
-+
-+	/*
-+	 * To make a connection to the bus, the KDBUS_CMD_HELLO ioctl is used.
-+	 * It takes an argument of type 'struct kdbus_cmd_hello'.
-+	 */
-+	memset(&hello, 0, sizeof(hello));
-+	hello.size = sizeof(hello);
-+
-+	/*
-+	 * Specify a mask of metadata attach flags, describing metadata items
-+	 * that this new connection allows to be sent.
-+	 */
-+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+
-+	/*
-+	 * Specify a mask of metadata attach flags, describing metadata items
-+	 * that this new connection wants to receive along with each message.
-+	 */
-+	hello.attach_flags_recv = recv_flags;
-+
-+	/*
-+	 * A connection may choose the size of its pool, but the number has to
-+	 * comply with two rules: a) it must be greater than 0, and b) it must
-+	 * be a multiple of PAGE_SIZE. See kdbus.pool(7).
-+	 */
-+	hello.pool_size = POOL_SIZE;
-+
-+	/*
-+	 * Now employ the command on the file descriptor opened above.
-+	 * This command will turn the file descriptor into a connection-owner
-+	 * file descriptor that controls the life-time of the connection; once
-+	 * it's closed, the connection is shut down.
-+	 */
-+	r = kdbus_cmd_hello(b->fd, &hello);
-+	if (r < 0) {
-+		err_r(r, "HELLO failed");
-+		goto error;
-+	}
-+
-+	bus_poool_free_slice(b, hello.offset);
-+
-+	/*
-+	 * Map the pool of the connection. Its size has been set in the
-+	 * command struct above. See kdbus.pool(7).
-+	 */
-+	b->pool = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, b->fd, 0);
-+	if (b->pool == MAP_FAILED) {
-+		r = err("cannot mmap pool");
-+		goto error;
-+	}
-+
-+	*out = b;
-+	return 0;
-+
-+error:
-+	bus_close_connection(b);
-+	return r;
-+}
-+
-+static void bus_close_connection(struct bus *b)
-+{
-+	if (!b)
-+		return;
-+
-+	/*
-+	 * A bus connection is closed by simply calling close() on the
-+	 * connection owner file descriptor. The unique name and all owned
-+	 * well-known names of the connection will disappear.
-+	 * See kdbus.connection(7).
-+	 */
-+	if (b->pool != MAP_FAILED)
-+		munmap(b->pool, POOL_SIZE);
-+	if (b->fd >= 0)
-+		close(b->fd);
-+	free(b);
-+}
-+
-+static void bus_poool_free_slice(struct bus *b, uint64_t offset)
-+{
-+	struct kdbus_cmd_free cmd = {
-+		.size = sizeof(cmd),
-+		.offset = offset,
-+	};
-+	int r;
-+
-+	/*
-+	 * Once we're done with a piece of pool memory that was returned
-+	 * by a command, we have to call the KDBUS_CMD_FREE ioctl on it so it
-+	 * can be reused. The command takes an argument of type
-+	 * 'struct kdbus_cmd_free', in which the pool offset of the slice to
-+	 * free is stored. The ioctl is employed on the connection owner
-+	 * file descriptor. See kdbus.pool(7).
-+	 */
-+	r = kdbus_cmd_free(b->fd, &cmd);
-+	if (r < 0)
-+		err_r(r, "cannot free pool slice");
-+}
-+
-+static int bus_acquire_name(struct bus *b, const char *name)
-+{
-+	struct kdbus_item *item;
-+	struct kdbus_cmd *cmd;
-+	size_t size;
-+	int r;
-+
-+	/*
-+	 * This function acquires a well-known name on the bus through the
-+	 * KDBUS_CMD_NAME_ACQUIRE ioctl. This ioctl takes an argument of type
-+	 * 'struct kdbus_cmd', which is assembled below. See kdbus.name(7).
-+	 */
-+	size = sizeof(*cmd);
-+	size += KDBUS_ITEM_SIZE(strlen(name) + 1);
-+
-+	cmd = alloca(size);
-+	memset(cmd, 0, size);
-+	cmd->size = size;
-+
-+	/*
-+	 * The command requires an item of type KDBUS_ITEM_NAME, and its
-+	 * content must be a valid bus name.
-+	 */
-+	item = cmd->items;
-+	item->type = KDBUS_ITEM_NAME;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
-+	strcpy(item->str, name);
-+
-+	/*
-+	 * Employ the command on the connection owner file descriptor.
-+	 */
-+	r = kdbus_cmd_name_acquire(b->fd, cmd);
-+	if (r < 0)
-+		return err_r(r, "cannot acquire name");
-+
-+	return 0;
-+}
-+
-+static int bus_install_name_loss_match(struct bus *b, const char *name)
-+{
-+	struct kdbus_cmd_match *match;
-+	struct kdbus_item *item;
-+	size_t size;
-+	int r;
-+
-+	/*
-+	 * In order to install a match for signal messages, we have to
-+	 * assemble a 'struct kdbus_cmd_match' and use it along with the
-+	 * KDBUS_CMD_MATCH_ADD ioctl. See kdbus.match(7).
-+	 */
-+	size = sizeof(*match);
-+	size += KDBUS_ITEM_SIZE(sizeof(item->name_change) + strlen(name) + 1);
-+
-+	match = alloca(size);
-+	memset(match, 0, size);
-+	match->size = size;
-+
-+	/*
-+	 * A match is comprised of many 'rules', each of which describes a
-+	 * mandatory detail of the message. All rules of a match must be
-+	 * satisfied in order to make a message pass.
-+	 */
-+	item = match->items;
-+
-+	/*
-+	 * In this case, we're interested in notifications that inform us
-+	 * about a well-known name being removed from the bus.
-+	 */
-+	item->type = KDBUS_ITEM_NAME_REMOVE;
-+	item->size = KDBUS_ITEM_HEADER_SIZE +
-+			sizeof(item->name_change) + strlen(name) + 1;
-+
-+	/*
-+	 * We could limit the match further and require a specific unique-ID
-+	 * to be the new or the old owner of the name. In this case, however,
-+	 * we don't, and allow 'any' id.
-+	 */
-+	item->name_change.old_id.id = KDBUS_MATCH_ID_ANY;
-+	item->name_change.new_id.id = KDBUS_MATCH_ID_ANY;
-+
-+	/* Copy in the well-known name we're interested in */
-+	strcpy(item->name_change.name, name);
-+
-+	/*
-+	 * Add the match through the KDBUS_CMD_MATCH_ADD ioctl, employed on
-+	 * the connection owner fd.
-+	 */
-+	r = kdbus_cmd_match_add(b->fd, match);
-+	if (r < 0)
-+		return err_r(r, "cannot add match");
-+
-+	return 0;
-+}
-+
-+static int bus_poll(struct bus *b)
-+{
-+	struct pollfd fds[1] = {};
-+	int r;
-+
-+	/*
-+	 * A connection endpoint supports poll() and will wake-up the
-+	 * task with POLLIN set once a message has arrived.
-+	 */
-+	fds[0].fd = b->fd;
-+	fds[0].events = POLLIN;
-+	r = poll(fds, sizeof(fds) / sizeof(*fds), 0);
-+	if (r < 0)
-+		return err("cannot poll bus");
-+	return !!(fds[0].revents & POLLIN);
-+}
-+
-+static int bus_make(uid_t uid, const char *name)
-+{
-+	struct kdbus_item *item;
-+	struct kdbus_cmd *make;
-+	char path[128], busname[128];
-+	size_t size;
-+	int r, fd;
-+
-+	/*
-+	 * Compute the full path to the 'control' node. 'arg_modname' may be
-+	 * set to a different value than 'kdbus' for development purposes.
-+	 * The 'control' node is the primary entry point to kdbus that must be
-+	 * used in order to create a bus. See kdbus(7) and kdbus.bus(7).
-+	 */
-+	snprintf(path, sizeof(path), "/sys/fs/%s/control", arg_modname);
-+
-+	/*
-+	 * Compute the bus name. A valid bus name must always be prefixed with
-+	 * the EUID of the currently running process in order to avoid name
-+	 * conflicts. See kdbus.bus(7).
-+	 */
-+	snprintf(busname, sizeof(busname), "%lu-%s", (unsigned long)uid, name);
-+
-+	fd = open(path, O_RDWR | O_CLOEXEC);
-+	if (fd < 0)
-+		return err("cannot open control file");
-+
-+	/*
-+	 * The KDBUS_CMD_BUS_MAKE ioctl takes an argument of type
-+	 * 'struct kdbus_cmd', and expects at least two items attached to
-+	 * it: one to describe the bloom parameters to be propagated to
-+	 * connections of the bus, and the name of the bus that was computed
-+	 * above. Assemble this struct now, and fill it with values.
-+	 */
-+	size = sizeof(*make);
-+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_parameter));
-+	size += KDBUS_ITEM_SIZE(strlen(busname) + 1);
-+
-+	make = alloca(size);
-+	memset(make, 0, size);
-+	make->size = size;
-+
-+	/*
-+	 * Each item has a 'type' and 'size' field, and must be stored at an
-+	 * 8-byte aligned address. The KDBUS_ITEM_NEXT macro is used to advance
-+	 * the pointer. See kdbus.item(7) for more details.
-+	 */
-+	item = make->items;
-+	item->type = KDBUS_ITEM_BLOOM_PARAMETER;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(item->bloom_parameter);
-+	item->bloom_parameter.size = 8;
-+	item->bloom_parameter.n_hash = 1;
-+
-+	/* The name of the new bus is stored in the next item. */
-+	item = KDBUS_ITEM_NEXT(item);
-+	item->type = KDBUS_ITEM_MAKE_NAME;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + strlen(busname) + 1;
-+	strcpy(item->str, busname);
-+
-+	/*
-+	 * Now create the bus via the KDBUS_CMD_BUS_MAKE ioctl and return the
-+	 * fd that was used back to the caller of this function. This fd is now
-+	 * called a 'bus owner file descriptor', and it controls the life-time
-+	 * of the newly created bus; once the file descriptor is closed, the
-+	 * bus goes away, and all connections are shut down. See kdbus.bus(7).
-+	 */
-+	r = kdbus_cmd_bus_make(fd, make);
-+	if (r < 0) {
-+		err_r(r, "cannot make bus");
-+		close(fd);
-+		return r;
-+	}
-+
-+	return fd;
-+}
-+
-+#else
-+
-+#warning "Skipping compilation due to unsupported libc version"
-+
-+int main(int argc, char **argv)
-+{
-+	fprintf(stderr,
-+		"Compilation of %s was skipped due to unsupported libc.\n",
-+		argv[0]);
-+
-+	return EXIT_FAILURE;
-+}
-+
-+#endif /* libc sanity check */
-diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
-index 95abddc..b57100c 100644
---- a/tools/testing/selftests/Makefile
-+++ b/tools/testing/selftests/Makefile
-@@ -5,6 +5,7 @@ TARGETS += exec
- TARGETS += firmware
- TARGETS += ftrace
- TARGETS += kcmp
-+TARGETS += kdbus
- TARGETS += memfd
- TARGETS += memory-hotplug
- TARGETS += mount
-diff --git a/tools/testing/selftests/kdbus/.gitignore b/tools/testing/selftests/kdbus/.gitignore
-new file mode 100644
-index 0000000..d3ef42f
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/.gitignore
-@@ -0,0 +1 @@
-+kdbus-test
-diff --git a/tools/testing/selftests/kdbus/Makefile b/tools/testing/selftests/kdbus/Makefile
-new file mode 100644
-index 0000000..8f36cb5
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/Makefile
-@@ -0,0 +1,49 @@
-+CFLAGS += -I../../../../usr/include/
-+CFLAGS += -I../../../../samples/kdbus/
-+CFLAGS += -I../../../../include/uapi/
-+CFLAGS += -std=gnu99
-+CFLAGS += -DKBUILD_MODNAME=\"kdbus\" -D_GNU_SOURCE
-+LDLIBS = -pthread -lcap -lm
-+
-+OBJS= \
-+	kdbus-enum.o		\
-+	kdbus-util.o		\
-+	kdbus-test.o		\
-+	kdbus-test.o		\
-+	test-activator.o	\
-+	test-benchmark.o	\
-+	test-bus.o		\
-+	test-chat.o		\
-+	test-connection.o	\
-+	test-daemon.o		\
-+	test-endpoint.o		\
-+	test-fd.o		\
-+	test-free.o		\
-+	test-match.o		\
-+	test-message.o		\
-+	test-metadata-ns.o	\
-+	test-monitor.o		\
-+	test-names.o		\
-+	test-policy.o		\
-+	test-policy-ns.o	\
-+	test-policy-priv.o	\
-+	test-sync.o		\
-+	test-timeout.o
-+
-+all: kdbus-test
-+
-+include ../lib.mk
-+
-+%.o: %.c kdbus-enum.h kdbus-test.h kdbus-util.h
-+	$(CC) $(CFLAGS) -c $< -o $@
-+
-+kdbus-test: $(OBJS)
-+	$(CC) $(CFLAGS) $^ $(LDLIBS) -o $@
-+
-+TEST_PROGS := kdbus-test
-+
-+run_tests:
-+	./kdbus-test --tap
-+
-+clean:
-+	rm -f *.o kdbus-test
-diff --git a/tools/testing/selftests/kdbus/kdbus-enum.c b/tools/testing/selftests/kdbus/kdbus-enum.c
-new file mode 100644
-index 0000000..4f1e579
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/kdbus-enum.c
-@@ -0,0 +1,94 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+struct kdbus_enum_table {
-+	long long id;
-+	const char *name;
-+};
-+
-+#define TABLE(what) static struct kdbus_enum_table kdbus_table_##what[]
-+#define ENUM(_id) { .id = _id, .name = STRINGIFY(_id) }
-+#define LOOKUP(what)							\
-+	const char *enum_##what(long long id)				\
-+	{								\
-+		for (size_t i = 0; i < ELEMENTSOF(kdbus_table_##what); i++) \
-+			if (id == kdbus_table_##what[i].id)		\
-+				return kdbus_table_##what[i].name;	\
-+		return "UNKNOWN";					\
-+	}
-+
-+TABLE(CMD) = {
-+	ENUM(KDBUS_CMD_BUS_MAKE),
-+	ENUM(KDBUS_CMD_ENDPOINT_MAKE),
-+	ENUM(KDBUS_CMD_HELLO),
-+	ENUM(KDBUS_CMD_SEND),
-+	ENUM(KDBUS_CMD_RECV),
-+	ENUM(KDBUS_CMD_LIST),
-+	ENUM(KDBUS_CMD_NAME_RELEASE),
-+	ENUM(KDBUS_CMD_CONN_INFO),
-+	ENUM(KDBUS_CMD_MATCH_ADD),
-+	ENUM(KDBUS_CMD_MATCH_REMOVE),
-+};
-+LOOKUP(CMD);
-+
-+TABLE(MSG) = {
-+	ENUM(_KDBUS_ITEM_NULL),
-+	ENUM(KDBUS_ITEM_PAYLOAD_VEC),
-+	ENUM(KDBUS_ITEM_PAYLOAD_OFF),
-+	ENUM(KDBUS_ITEM_PAYLOAD_MEMFD),
-+	ENUM(KDBUS_ITEM_FDS),
-+	ENUM(KDBUS_ITEM_BLOOM_PARAMETER),
-+	ENUM(KDBUS_ITEM_BLOOM_FILTER),
-+	ENUM(KDBUS_ITEM_DST_NAME),
-+	ENUM(KDBUS_ITEM_MAKE_NAME),
-+	ENUM(KDBUS_ITEM_ATTACH_FLAGS_SEND),
-+	ENUM(KDBUS_ITEM_ATTACH_FLAGS_RECV),
-+	ENUM(KDBUS_ITEM_ID),
-+	ENUM(KDBUS_ITEM_NAME),
-+	ENUM(KDBUS_ITEM_TIMESTAMP),
-+	ENUM(KDBUS_ITEM_CREDS),
-+	ENUM(KDBUS_ITEM_PIDS),
-+	ENUM(KDBUS_ITEM_AUXGROUPS),
-+	ENUM(KDBUS_ITEM_OWNED_NAME),
-+	ENUM(KDBUS_ITEM_TID_COMM),
-+	ENUM(KDBUS_ITEM_PID_COMM),
-+	ENUM(KDBUS_ITEM_EXE),
-+	ENUM(KDBUS_ITEM_CMDLINE),
-+	ENUM(KDBUS_ITEM_CGROUP),
-+	ENUM(KDBUS_ITEM_CAPS),
-+	ENUM(KDBUS_ITEM_SECLABEL),
-+	ENUM(KDBUS_ITEM_AUDIT),
-+	ENUM(KDBUS_ITEM_CONN_DESCRIPTION),
-+	ENUM(KDBUS_ITEM_NAME_ADD),
-+	ENUM(KDBUS_ITEM_NAME_REMOVE),
-+	ENUM(KDBUS_ITEM_NAME_CHANGE),
-+	ENUM(KDBUS_ITEM_ID_ADD),
-+	ENUM(KDBUS_ITEM_ID_REMOVE),
-+	ENUM(KDBUS_ITEM_REPLY_TIMEOUT),
-+	ENUM(KDBUS_ITEM_REPLY_DEAD),
-+};
-+LOOKUP(MSG);
-+
-+TABLE(PAYLOAD) = {
-+	ENUM(KDBUS_PAYLOAD_KERNEL),
-+	ENUM(KDBUS_PAYLOAD_DBUS),
-+};
-+LOOKUP(PAYLOAD);
-diff --git a/tools/testing/selftests/kdbus/kdbus-enum.h b/tools/testing/selftests/kdbus/kdbus-enum.h
-new file mode 100644
-index 0000000..ed28cca
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/kdbus-enum.h
-@@ -0,0 +1,15 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#pragma once
-+
-+const char *enum_CMD(long long id);
-+const char *enum_MSG(long long id);
-+const char *enum_MATCH(long long id);
-+const char *enum_PAYLOAD(long long id);
-diff --git a/tools/testing/selftests/kdbus/kdbus-test.c b/tools/testing/selftests/kdbus/kdbus-test.c
-new file mode 100644
-index 0000000..db57381
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/kdbus-test.c
-@@ -0,0 +1,905 @@
-+#include <errno.h>
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <time.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <assert.h>
-+#include <getopt.h>
-+#include <stdbool.h>
-+#include <signal.h>
-+#include <sys/mount.h>
-+#include <sys/prctl.h>
-+#include <sys/wait.h>
-+#include <sys/syscall.h>
-+#include <sys/eventfd.h>
-+#include <linux/sched.h>
-+
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+enum {
-+	TEST_CREATE_BUS		= 1 << 0,
-+	TEST_CREATE_CONN	= 1 << 1,
-+};
-+
-+struct kdbus_test {
-+	const char *name;
-+	const char *desc;
-+	int (*func)(struct kdbus_test_env *env);
-+	unsigned int flags;
-+};
-+
-+struct kdbus_test_args {
-+	bool mntns;
-+	bool pidns;
-+	bool userns;
-+	char *uid_map;
-+	char *gid_map;
-+	int loop;
-+	int wait;
-+	int fork;
-+	int tap_output;
-+	char *module;
-+	char *root;
-+	char *test;
-+	char *busname;
-+};
-+
-+static const struct kdbus_test tests[] = {
-+	{
-+		.name	= "bus-make",
-+		.desc	= "bus make functions",
-+		.func	= kdbus_test_bus_make,
-+		.flags	= 0,
-+	},
-+	{
-+		.name	= "hello",
-+		.desc	= "the HELLO command",
-+		.func	= kdbus_test_hello,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "byebye",
-+		.desc	= "the BYEBYE command",
-+		.func	= kdbus_test_byebye,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "chat",
-+		.desc	= "a chat pattern",
-+		.func	= kdbus_test_chat,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "daemon",
-+		.desc	= "a simple daemon",
-+		.func	= kdbus_test_daemon,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "fd-passing",
-+		.desc	= "file descriptor passing",
-+		.func	= kdbus_test_fd_passing,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "endpoint",
-+		.desc	= "custom endpoint",
-+		.func	= kdbus_test_custom_endpoint,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "monitor",
-+		.desc	= "monitor functionality",
-+		.func	= kdbus_test_monitor,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "name-basics",
-+		.desc	= "basic name registry functions",
-+		.func	= kdbus_test_name_basic,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "name-conflict",
-+		.desc	= "name registry conflict details",
-+		.func	= kdbus_test_name_conflict,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "name-queue",
-+		.desc	= "queuing of names",
-+		.func	= kdbus_test_name_queue,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "name-takeover",
-+		.desc	= "takeover of names",
-+		.func	= kdbus_test_name_takeover,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "message-basic",
-+		.desc	= "basic message handling",
-+		.func	= kdbus_test_message_basic,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "message-prio",
-+		.desc	= "handling of messages with priority",
-+		.func	= kdbus_test_message_prio,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "message-quota",
-+		.desc	= "message quotas are enforced",
-+		.func	= kdbus_test_message_quota,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "memory-access",
-+		.desc	= "memory access",
-+		.func	= kdbus_test_memory_access,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "timeout",
-+		.desc	= "timeout",
-+		.func	= kdbus_test_timeout,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "sync-byebye",
-+		.desc	= "synchronous replies vs. BYEBYE",
-+		.func	= kdbus_test_sync_byebye,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "sync-reply",
-+		.desc	= "synchronous replies",
-+		.func	= kdbus_test_sync_reply,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "message-free",
-+		.desc	= "freeing of memory",
-+		.func	= kdbus_test_free,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "connection-info",
-+		.desc	= "retrieving connection information",
-+		.func	= kdbus_test_conn_info,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "connection-update",
-+		.desc	= "updating connection information",
-+		.func	= kdbus_test_conn_update,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "writable-pool",
-+		.desc	= "verifying pools are never writable",
-+		.func	= kdbus_test_writable_pool,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "policy",
-+		.desc	= "policy",
-+		.func	= kdbus_test_policy,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "policy-priv",
-+		.desc	= "unprivileged bus access",
-+		.func	= kdbus_test_policy_priv,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "policy-ns",
-+		.desc	= "policy in user namespaces",
-+		.func	= kdbus_test_policy_ns,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "metadata-ns",
-+		.desc	= "metadata in different namespaces",
-+		.func	= kdbus_test_metadata_ns,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "match-id-add",
-+		.desc	= "adding of matches by id",
-+		.func	= kdbus_test_match_id_add,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "match-id-remove",
-+		.desc	= "removing of matches by id",
-+		.func	= kdbus_test_match_id_remove,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "match-replace",
-+		.desc	= "replace of matches with the same cookie",
-+		.func	= kdbus_test_match_replace,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "match-name-add",
-+		.desc	= "adding of matches by name",
-+		.func	= kdbus_test_match_name_add,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "match-name-remove",
-+		.desc	= "removing of matches by name",
-+		.func	= kdbus_test_match_name_remove,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "match-name-change",
-+		.desc	= "matching for name changes",
-+		.func	= kdbus_test_match_name_change,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "match-bloom",
-+		.desc	= "matching with bloom filters",
-+		.func	= kdbus_test_match_bloom,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "activator",
-+		.desc	= "activator connections",
-+		.func	= kdbus_test_activator,
-+		.flags	= TEST_CREATE_BUS | TEST_CREATE_CONN,
-+	},
-+	{
-+		.name	= "benchmark",
-+		.desc	= "benchmark",
-+		.func	= kdbus_test_benchmark,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "benchmark-nomemfds",
-+		.desc	= "benchmark without using memfds",
-+		.func	= kdbus_test_benchmark_nomemfds,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+	{
-+		.name	= "benchmark-uds",
-+		.desc	= "benchmark comparison to UDS",
-+		.func	= kdbus_test_benchmark_uds,
-+		.flags	= TEST_CREATE_BUS,
-+	},
-+};
-+
-+#define N_TESTS ((int) (sizeof(tests) / sizeof(tests[0])))
-+
-+static int test_prepare_env(const struct kdbus_test *t,
-+			    const struct kdbus_test_args *args,
-+			    struct kdbus_test_env *env)
-+{
-+	if (t->flags & TEST_CREATE_BUS) {
-+		char *s;
-+		char *n = NULL;
-+		int ret;
-+
-+		asprintf(&s, "%s/control", args->root);
-+
-+		env->control_fd = open(s, O_RDWR);
-+		free(s);
-+		ASSERT_RETURN(env->control_fd >= 0);
-+
-+		if (!args->busname) {
-+			n = unique_name("test-bus");
-+			ASSERT_RETURN(n);
-+		}
-+
-+		ret = kdbus_create_bus(env->control_fd,
-+				       args->busname ?: n,
-+				       _KDBUS_ATTACH_ALL, &s);
-+		free(n);
-+		ASSERT_RETURN(ret == 0);
-+
-+		asprintf(&env->buspath, "%s/%s/bus", args->root, s);
-+		free(s);
-+	}
-+
-+	if (t->flags & TEST_CREATE_CONN) {
-+		env->conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+		ASSERT_RETURN(env->conn);
-+	}
-+
-+	env->root = args->root;
-+	env->module = args->module;
-+
-+	return 0;
-+}
-+
-+void test_unprepare_env(const struct kdbus_test *t, struct kdbus_test_env *env)
-+{
-+	if (env->conn) {
-+		kdbus_conn_free(env->conn);
-+		env->conn = NULL;
-+	}
-+
-+	if (env->control_fd >= 0) {
-+		close(env->control_fd);
-+		env->control_fd = -1;
-+	}
-+
-+	if (env->buspath) {
-+		free(env->buspath);
-+		env->buspath = NULL;
-+	}
-+}
-+
-+static int test_run(const struct kdbus_test *t,
-+		    const struct kdbus_test_args *kdbus_args,
-+		    int wait)
-+{
-+	int ret;
-+	struct kdbus_test_env env = {};
-+
-+	ret = test_prepare_env(t, kdbus_args, &env);
-+	if (ret != TEST_OK)
-+		return ret;
-+
-+	if (wait > 0) {
-+		printf("Sleeping %d seconds before running test ...\n", wait);
-+		sleep(wait);
-+	}
-+
-+	ret = t->func(&env);
-+	test_unprepare_env(t, &env);
-+	return ret;
-+}
-+
-+static int test_run_forked(const struct kdbus_test *t,
-+			   const struct kdbus_test_args *kdbus_args,
-+			   int wait)
-+{
-+	int ret;
-+	pid_t pid;
-+
-+	pid = fork();
-+	if (pid < 0) {
-+		return TEST_ERR;
-+	} else if (pid == 0) {
-+		ret = test_run(t, kdbus_args, wait);
-+		_exit(ret);
-+	}
-+
-+	pid = waitpid(pid, &ret, 0);
-+	if (pid <= 0)
-+		return TEST_ERR;
-+	else if (!WIFEXITED(ret))
-+		return TEST_ERR;
-+	else
-+		return WEXITSTATUS(ret);
-+}
-+
-+static void print_test_result(int ret)
-+{
-+	switch (ret) {
-+	case TEST_OK:
-+		printf("OK");
-+		break;
-+	case TEST_SKIP:
-+		printf("SKIPPED");
-+		break;
-+	case TEST_ERR:
-+		printf("ERROR");
-+		break;
-+	}
-+}
-+
-+static int start_all_tests(struct kdbus_test_args *kdbus_args)
-+{
-+	int ret;
-+	unsigned int fail_cnt = 0;
-+	unsigned int skip_cnt = 0;
-+	unsigned int ok_cnt = 0;
-+	unsigned int i;
-+
-+	if (kdbus_args->tap_output) {
-+		printf("1..%d\n", N_TESTS);
-+		fflush(stdout);
-+	}
-+
-+	kdbus_util_verbose = false;
-+
-+	for (i = 0; i < N_TESTS; i++) {
-+		const struct kdbus_test *t = tests + i;
-+
-+		if (!kdbus_args->tap_output) {
-+			unsigned int n;
-+
-+			printf("Testing %s (%s) ", t->desc, t->name);
-+			for (n = 0; n < 60 - strlen(t->desc) - strlen(t->name); n++)
-+				printf(".");
-+			printf(" ");
-+		}
-+
-+		ret = test_run_forked(t, kdbus_args, 0);
-+		switch (ret) {
-+		case TEST_OK:
-+			ok_cnt++;
-+			break;
-+		case TEST_SKIP:
-+			skip_cnt++;
-+			break;
-+		case TEST_ERR:
-+			fail_cnt++;
-+			break;
-+		}
-+
-+		if (kdbus_args->tap_output) {
-+			printf("%sok %d - %s%s (%s)\n",
-+			       (ret == TEST_ERR) ? "not " : "", i + 1,
-+			       (ret == TEST_SKIP) ? "# SKIP " : "",
-+			       t->desc, t->name);
-+			fflush(stdout);
-+		} else {
-+			print_test_result(ret);
-+			printf("\n");
-+		}
-+	}
-+
-+	if (kdbus_args->tap_output)
-+		printf("Failed %u/%d tests, %.2f%% okay\n", fail_cnt, N_TESTS,
-+		       100.0 - (fail_cnt * 100.0) / ((float) N_TESTS));
-+	else
-+		printf("\nSUMMARY: %u tests passed, %u skipped, %u failed\n",
-+		       ok_cnt, skip_cnt, fail_cnt);
-+
-+	return fail_cnt > 0 ? TEST_ERR : TEST_OK;
-+}
-+
-+static int start_one_test(struct kdbus_test_args *kdbus_args)
-+{
-+	int i, ret;
-+	bool test_found = false;
-+
-+	for (i = 0; i < N_TESTS; i++) {
-+		const struct kdbus_test *t = tests + i;
-+
-+		if (strcmp(t->name, kdbus_args->test))
-+			continue;
-+
-+		do {
-+			test_found = true;
-+			if (kdbus_args->fork)
-+				ret = test_run_forked(t, kdbus_args,
-+						      kdbus_args->wait);
-+			else
-+				ret = test_run(t, kdbus_args,
-+					       kdbus_args->wait);
-+
-+			printf("Testing %s: ", t->desc);
-+			print_test_result(ret);
-+			printf("\n");
-+
-+			if (ret != TEST_OK)
-+				break;
-+		} while (kdbus_args->loop);
-+
-+		return ret;
-+	}
-+
-+	if (!test_found) {
-+		printf("Unknown test-id '%s'\n", kdbus_args->test);
-+		return TEST_ERR;
-+	}
-+
-+	return TEST_OK;
-+}
-+
-+static void usage(const char *argv0)
-+{
-+	unsigned int i, j;
-+
-+	printf("Usage: %s [options]\n"
-+	       "Options:\n"
-+	       "\t-a, --tap		Output test results in TAP format\n"
-+	       "\t-m, --module <module>	Kdbus module name\n"
-+	       "\t-x, --loop		Run in a loop\n"
-+	       "\t-f, --fork		Fork before running a test\n"
-+	       "\t-h, --help		Print this help\n"
-+	       "\t-r, --root <root>	Toplevel of the kdbus hierarchy\n"
-+	       "\t-t, --test <test-id>	Run one specific test only, in verbose mode\n"
-+	       "\t-b, --bus <busname>	Instead of generating a random bus name, take <busname>.\n"
-+	       "\t-w, --wait <secs>	Wait <secs> before actually starting test\n"
-+	       "\t    --mntns		New mount namespace\n"
-+	       "\t    --pidns		New PID namespace\n"
-+	       "\t    --userns		New user namespace\n"
-+	       "\t    --uidmap uid_map	UID map for user namespace\n"
-+	       "\t    --gidmap gid_map	GID map for user namespace\n"
-+	       "\n", argv0);
-+
-+	printf("By default, all tests are run once, and a summary is printed.\n"
-+	       "Available tests for --test:\n\n");
-+
-+	for (i = 0; i < N_TESTS; i++) {
-+		const struct kdbus_test *t = tests + i;
-+
-+		printf("\t%s", t->name);
-+
-+		for (j = 0; j < 24 - strlen(t->name); j++)
-+			printf(" ");
-+
-+		printf("Test %s\n", t->desc);
-+	}
-+
-+	printf("\n");
-+	printf("Note that some tests may, if run specifically by --test, "
-+	       "behave differently, and not terminate by themselves.\n");
-+
-+	exit(EXIT_FAILURE);
-+}
-+
-+void print_kdbus_test_args(struct kdbus_test_args *args)
-+{
-+	if (args->userns || args->pidns || args->mntns)
-+		printf("# Starting tests in new %s%s%s namespaces%s\n",
-+			args->mntns ? "MOUNT " : "",
-+			args->pidns ? "PID " : "",
-+			args->userns ? "USER " : "",
-+			args->mntns ? ", kdbusfs will be remounted" : "");
-+	else
-+		printf("# Starting tests in the same namespaces\n");
-+}
-+
-+void print_metadata_support(void)
-+{
-+	bool no_meta_audit, no_meta_cgroups, no_meta_seclabel;
-+
-+	/*
-+	 * KDBUS_ATTACH_CGROUP, KDBUS_ATTACH_AUDIT and
-+	 * KDBUS_ATTACH_SECLABEL
-+	 */
-+	no_meta_audit = !config_auditsyscall_is_enabled();
-+	no_meta_cgroups = !config_cgroups_is_enabled();
-+	no_meta_seclabel = !config_security_is_enabled();
-+
-+	if (no_meta_audit | no_meta_cgroups | no_meta_seclabel)
-+		printf("# Starting tests without %s%s%s metadata support\n",
-+		       no_meta_audit ? "AUDIT " : "",
-+		       no_meta_cgroups ? "CGROUP " : "",
-+		       no_meta_seclabel ? "SECLABEL " : "");
-+	else
-+		printf("# Starting tests with full metadata support\n");
-+}
-+
-+int run_tests(struct kdbus_test_args *kdbus_args)
-+{
-+	int ret;
-+	static char control[4096];
-+
-+	snprintf(control, sizeof(control), "%s/control", kdbus_args->root);
-+
-+	if (access(control, W_OK) < 0) {
-+		printf("Unable to locate control node at '%s'.\n",
-+			control);
-+		return TEST_ERR;
-+	}
-+
-+	if (kdbus_args->test) {
-+		ret = start_one_test(kdbus_args);
-+	} else {
-+		do {
-+			ret = start_all_tests(kdbus_args);
-+			if (ret != TEST_OK)
-+				break;
-+		} while (kdbus_args->loop);
-+	}
-+
-+	return ret;
-+}
-+
-+static void nop_handler(int sig) {}
-+
-+static int test_prepare_mounts(struct kdbus_test_args *kdbus_args)
-+{
-+	int ret;
-+	char kdbusfs[64] = {'\0'};
-+
-+	snprintf(kdbusfs, sizeof(kdbusfs), "%sfs", kdbus_args->module);
-+
-+	/* make current mount slave */
-+	ret = mount(NULL, "/", NULL, MS_SLAVE|MS_REC, NULL);
-+	if (ret < 0) {
-+		ret = -errno;
-+		printf("error mount() root: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	/* Remount procfs since we need it in our tests */
-+	if (kdbus_args->pidns) {
-+		ret = mount("proc", "/proc", "proc",
-+			    MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL);
-+		if (ret < 0) {
-+			ret = -errno;
-+			printf("error mount() /proc : %d (%m)\n", ret);
-+			return ret;
-+		}
-+	}
-+
-+	/* Remount kdbusfs */
-+	ret = mount(kdbusfs, kdbus_args->root, kdbusfs,
-+		    MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL);
-+	if (ret < 0) {
-+		ret = -errno;
-+		printf("error mount() %s :%d (%m)\n", kdbusfs, ret);
-+		return ret;
-+	}
-+
-+	return 0;
-+}
-+
-+int run_tests_in_namespaces(struct kdbus_test_args *kdbus_args)
-+{
-+	int ret;
-+	int efd = -1;
-+	int status;
-+	pid_t pid, rpid;
-+	struct sigaction oldsa;
-+	struct sigaction sa = {
-+		.sa_handler = nop_handler,
-+		.sa_flags = SA_NOCLDSTOP,
-+	};
-+
-+	efd = eventfd(0, EFD_CLOEXEC);
-+	if (efd < 0) {
-+		ret = -errno;
-+		printf("eventfd() failed: %d (%m)\n", ret);
-+		return TEST_ERR;
-+	}
-+
-+	ret = sigaction(SIGCHLD, &sa, &oldsa);
-+	if (ret < 0) {
-+		ret = -errno;
-+		printf("sigaction() failed: %d (%m)\n", ret);
-+		return TEST_ERR;
-+	}
-+
-+	/* setup namespaces */
-+	pid = syscall(__NR_clone, SIGCHLD|
-+		      (kdbus_args->userns ? CLONE_NEWUSER : 0) |
-+		      (kdbus_args->mntns ? CLONE_NEWNS : 0) |
-+		      (kdbus_args->pidns ? CLONE_NEWPID : 0), NULL);
-+	if (pid < 0) {
-+		printf("clone() failed: %d (%m)\n", -errno);
-+		return TEST_ERR;
-+	}
-+
-+	if (pid == 0) {
-+		eventfd_t event_status = 0;
-+
-+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
-+		if (ret < 0) {
-+			ret = -errno;
-+			printf("error prctl(): %d (%m)\n", ret);
-+			_exit(TEST_ERR);
-+		}
-+
-+		/* reset the child's signal handlers */
-+		ret = sigaction(SIGCHLD, &oldsa, NULL);
-+		if (ret < 0) {
-+			ret = -errno;
-+			printf("sigaction() failed: %d (%m)\n", ret);
-+			_exit(TEST_ERR);
-+		}
-+
-+		ret = eventfd_read(efd, &event_status);
-+		if (ret < 0 || event_status != 1) {
-+			printf("error eventfd_read()\n");
-+			_exit(TEST_ERR);
-+		}
-+
-+		if (kdbus_args->mntns) {
-+			ret = test_prepare_mounts(kdbus_args);
-+			if (ret < 0) {
-+				printf("error preparing mounts\n");
-+				_exit(TEST_ERR);
-+			}
-+		}
-+
-+		ret = run_tests(kdbus_args);
-+		_exit(ret);
-+	}
-+
-+	/* Setup userns mapping */
-+	if (kdbus_args->userns) {
-+		ret = userns_map_uid_gid(pid, kdbus_args->uid_map,
-+					 kdbus_args->gid_map);
-+		if (ret < 0) {
-+			printf("error mapping uid and gid in userns\n");
-+			eventfd_write(efd, 2);
-+			return TEST_ERR;
-+		}
-+	}
-+
-+	ret = eventfd_write(efd, 1);
-+	if (ret < 0) {
-+		ret = -errno;
-+		printf("error eventfd_write(): %d (%m)\n", ret);
-+		return TEST_ERR;
-+	}
-+
-+	rpid = waitpid(pid, &status, 0);
-+	ASSERT_RETURN_VAL(rpid == pid, TEST_ERR);
-+
-+	close(efd);
-+
-+	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
-+		return TEST_ERR;
-+
-+	return TEST_OK;
-+}
-+
-+int start_tests(struct kdbus_test_args *kdbus_args)
-+{
-+	int ret;
-+	bool namespaces;
-+	static char fspath[4096];
-+
-+	namespaces = (kdbus_args->mntns || kdbus_args->pidns ||
-+		      kdbus_args->userns);
-+
-+	/* for pidns we need mntns set */
-+	if (kdbus_args->pidns && !kdbus_args->mntns) {
-+		printf("Failed: please set both pid and mnt namespaces\n");
-+		return TEST_ERR;
-+	}
-+
-+	if (kdbus_args->userns) {
-+		if (!config_user_ns_is_enabled()) {
-+			printf("User namespace not supported\n");
-+			return TEST_ERR;
-+		}
-+
-+		if (!kdbus_args->uid_map || !kdbus_args->gid_map) {
-+			printf("Failed: please specify both uid and gid mappings\n");
-+			return TEST_ERR;
-+		}
-+	}
-+
-+	print_kdbus_test_args(kdbus_args);
-+	print_metadata_support();
-+
-+	/* setup kdbus paths */
-+	if (!kdbus_args->module)
-+		kdbus_args->module = "kdbus";
-+
-+	if (!kdbus_args->root) {
-+		snprintf(fspath, sizeof(fspath), "/sys/fs/%s",
-+			 kdbus_args->module);
-+		kdbus_args->root = fspath;
-+	}
-+
-+	/* Start tests */
-+	if (namespaces)
-+		ret = run_tests_in_namespaces(kdbus_args);
-+	else
-+		ret = run_tests(kdbus_args);
-+
-+	return ret;
-+}
-+
-+int main(int argc, char *argv[])
-+{
-+	int t, ret = 0;
-+	struct kdbus_test_args *kdbus_args;
-+	enum {
-+		ARG_MNTNS = 0x100,
-+		ARG_PIDNS,
-+		ARG_USERNS,
-+		ARG_UIDMAP,
-+		ARG_GIDMAP,
-+	};
-+
-+	kdbus_args = malloc(sizeof(*kdbus_args));
-+	if (!kdbus_args) {
-+		printf("unable to malloc() kdbus_args\n");
-+		return EXIT_FAILURE;
-+	}
-+
-+	memset(kdbus_args, 0, sizeof(*kdbus_args));
-+
-+	static const struct option options[] = {
-+		{ "loop",	no_argument,		NULL, 'x' },
-+		{ "help",	no_argument,		NULL, 'h' },
-+		{ "root",	required_argument,	NULL, 'r' },
-+		{ "test",	required_argument,	NULL, 't' },
-+		{ "bus",	required_argument,	NULL, 'b' },
-+		{ "wait",	required_argument,	NULL, 'w' },
-+		{ "fork",	no_argument,		NULL, 'f' },
-+		{ "module",	required_argument,	NULL, 'm' },
-+		{ "tap",	no_argument,		NULL, 'a' },
-+		{ "mntns",	no_argument,		NULL, ARG_MNTNS },
-+		{ "pidns",	no_argument,		NULL, ARG_PIDNS },
-+		{ "userns",	no_argument,		NULL, ARG_USERNS },
-+		{ "uidmap",	required_argument,	NULL, ARG_UIDMAP },
-+		{ "gidmap",	required_argument,	NULL, ARG_GIDMAP },
-+		{}
-+	};
-+
-+	srand(time(NULL));
-+
-+	while ((t = getopt_long(argc, argv, "hxfm:r:t:b:w:a", options, NULL)) >= 0) {
-+		switch (t) {
-+		case 'x':
-+			kdbus_args->loop = 1;
-+			break;
-+
-+		case 'm':
-+			kdbus_args->module = optarg;
-+			break;
-+
-+		case 'r':
-+			kdbus_args->root = optarg;
-+			break;
-+
-+		case 't':
-+			kdbus_args->test = optarg;
-+			break;
-+
-+		case 'b':
-+			kdbus_args->busname = optarg;
-+			break;
-+
-+		case 'w':
-+			kdbus_args->wait = strtol(optarg, NULL, 10);
-+			break;
-+
-+		case 'f':
-+			kdbus_args->fork = 1;
-+			break;
-+
-+		case 'a':
-+			kdbus_args->tap_output = 1;
-+			break;
-+
-+		case ARG_MNTNS:
-+			kdbus_args->mntns = true;
-+			break;
-+
-+		case ARG_PIDNS:
-+			kdbus_args->pidns = true;
-+			break;
-+
-+		case ARG_USERNS:
-+			kdbus_args->userns = true;
-+			break;
-+
-+		case ARG_UIDMAP:
-+			kdbus_args->uid_map = optarg;
-+			break;
-+
-+		case ARG_GIDMAP:
-+			kdbus_args->gid_map = optarg;
-+			break;
-+
-+		default:
-+		case 'h':
-+			usage(argv[0]);
-+		}
-+	}
-+
-+	ret = start_tests(kdbus_args);
-+	if (ret == TEST_ERR)
-+		return EXIT_FAILURE;
-+
-+	free(kdbus_args);
-+
-+	return 0;
-+}
-diff --git a/tools/testing/selftests/kdbus/kdbus-test.h b/tools/testing/selftests/kdbus/kdbus-test.h
-new file mode 100644
-index 0000000..ee937f9
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/kdbus-test.h
-@@ -0,0 +1,84 @@
-+#ifndef _TEST_KDBUS_H_
-+#define _TEST_KDBUS_H_
-+
-+struct kdbus_test_env {
-+	char *buspath;
-+	const char *root;
-+	const char *module;
-+	int control_fd;
-+	struct kdbus_conn *conn;
-+};
-+
-+enum {
-+	TEST_OK,
-+	TEST_SKIP,
-+	TEST_ERR,
-+};
-+
-+#define ASSERT_RETURN_VAL(cond, val)		\
-+	if (!(cond)) {			\
-+		fprintf(stderr,	"Assertion '%s' failed in %s(), %s:%d\n", \
-+			#cond, __func__, __FILE__, __LINE__);	\
-+		return val;	\
-+	}
-+
-+#define ASSERT_EXIT_VAL(cond, val)		\
-+	if (!(cond)) {			\
-+		fprintf(stderr, "Assertion '%s' failed in %s(), %s:%d\n", \
-+			#cond, __func__, __FILE__, __LINE__);	\
-+		_exit(val);	\
-+	}
-+
-+#define ASSERT_BREAK(cond)		\
-+	if (!(cond)) {			\
-+		fprintf(stderr, "Assertion '%s' failed in %s(), %s:%d\n", \
-+			#cond, __func__, __FILE__, __LINE__);	\
-+		break; \
-+	}
-+
-+#define ASSERT_RETURN(cond)		\
-+	ASSERT_RETURN_VAL(cond, TEST_ERR)
-+
-+#define ASSERT_EXIT(cond)		\
-+	ASSERT_EXIT_VAL(cond, EXIT_FAILURE)
-+
-+int kdbus_test_activator(struct kdbus_test_env *env);
-+int kdbus_test_benchmark(struct kdbus_test_env *env);
-+int kdbus_test_benchmark_nomemfds(struct kdbus_test_env *env);
-+int kdbus_test_benchmark_uds(struct kdbus_test_env *env);
-+int kdbus_test_bus_make(struct kdbus_test_env *env);
-+int kdbus_test_byebye(struct kdbus_test_env *env);
-+int kdbus_test_chat(struct kdbus_test_env *env);
-+int kdbus_test_conn_info(struct kdbus_test_env *env);
-+int kdbus_test_conn_update(struct kdbus_test_env *env);
-+int kdbus_test_daemon(struct kdbus_test_env *env);
-+int kdbus_test_custom_endpoint(struct kdbus_test_env *env);
-+int kdbus_test_fd_passing(struct kdbus_test_env *env);
-+int kdbus_test_free(struct kdbus_test_env *env);
-+int kdbus_test_hello(struct kdbus_test_env *env);
-+int kdbus_test_match_bloom(struct kdbus_test_env *env);
-+int kdbus_test_match_id_add(struct kdbus_test_env *env);
-+int kdbus_test_match_id_remove(struct kdbus_test_env *env);
-+int kdbus_test_match_replace(struct kdbus_test_env *env);
-+int kdbus_test_match_name_add(struct kdbus_test_env *env);
-+int kdbus_test_match_name_change(struct kdbus_test_env *env);
-+int kdbus_test_match_name_remove(struct kdbus_test_env *env);
-+int kdbus_test_message_basic(struct kdbus_test_env *env);
-+int kdbus_test_message_prio(struct kdbus_test_env *env);
-+int kdbus_test_message_quota(struct kdbus_test_env *env);
-+int kdbus_test_memory_access(struct kdbus_test_env *env);
-+int kdbus_test_metadata_ns(struct kdbus_test_env *env);
-+int kdbus_test_monitor(struct kdbus_test_env *env);
-+int kdbus_test_name_basic(struct kdbus_test_env *env);
-+int kdbus_test_name_conflict(struct kdbus_test_env *env);
-+int kdbus_test_name_queue(struct kdbus_test_env *env);
-+int kdbus_test_name_takeover(struct kdbus_test_env *env);
-+int kdbus_test_policy(struct kdbus_test_env *env);
-+int kdbus_test_policy_ns(struct kdbus_test_env *env);
-+int kdbus_test_policy_priv(struct kdbus_test_env *env);
-+int kdbus_test_sync_byebye(struct kdbus_test_env *env);
-+int kdbus_test_sync_reply(struct kdbus_test_env *env);
-+int kdbus_test_timeout(struct kdbus_test_env *env);
-+int kdbus_test_writable_pool(struct kdbus_test_env *env);
-+
-+#endif /* _TEST_KDBUS_H_ */
-diff --git a/tools/testing/selftests/kdbus/kdbus-util.c b/tools/testing/selftests/kdbus/kdbus-util.c
-new file mode 100644
-index 0000000..82fa89b
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/kdbus-util.c
-@@ -0,0 +1,1612 @@
-+/*
-+ * Copyright (C) 2013-2015 Daniel Mack
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2014-2015 Djalal Harouni
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <stdio.h>
-+#include <stdarg.h>
-+#include <string.h>
-+#include <time.h>
-+#include <inttypes.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <stdbool.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <poll.h>
-+#include <grp.h>
-+#include <sys/capability.h>
-+#include <sys/mman.h>
-+#include <sys/stat.h>
-+#include <sys/time.h>
-+#include <linux/unistd.h>
-+#include <linux/memfd.h>
-+
-+#ifndef __NR_memfd_create
-+  #ifdef __x86_64__
-+    #define __NR_memfd_create 319
-+  #elif defined __arm__
-+    #define __NR_memfd_create 385
-+  #else
-+    #define __NR_memfd_create 356
-+  #endif
-+#endif
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+#ifndef F_ADD_SEALS
-+#define F_LINUX_SPECIFIC_BASE	1024
-+#define F_ADD_SEALS     (F_LINUX_SPECIFIC_BASE + 9)
-+#define F_GET_SEALS     (F_LINUX_SPECIFIC_BASE + 10)
-+
-+#define F_SEAL_SEAL     0x0001  /* prevent further seals from being set */
-+#define F_SEAL_SHRINK   0x0002  /* prevent file from shrinking */
-+#define F_SEAL_GROW     0x0004  /* prevent file from growing */
-+#define F_SEAL_WRITE    0x0008  /* prevent writes */
-+#endif
-+
-+int kdbus_util_verbose = true;
-+
-+int kdbus_sysfs_get_parameter_mask(const char *path, uint64_t *mask)
-+{
-+	int ret;
-+	FILE *file;
-+	unsigned long long value;
-+
-+	file = fopen(path, "r");
-+	if (!file) {
-+		ret = -errno;
-+		kdbus_printf("--- error fopen(): %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	ret = fscanf(file, "%llu", &value);
-+	if (ret != 1) {
-+		if (ferror(file))
-+			ret = -errno;
-+		else
-+			ret = -EIO;
-+
-+		kdbus_printf("--- error fscanf(): %d\n", ret);
-+		fclose(file);
-+		return ret;
-+	}
-+
-+	*mask = (uint64_t)value;
-+
-+	fclose(file);
-+
-+	return 0;
-+}
-+
-+int kdbus_sysfs_set_parameter_mask(const char *path, uint64_t mask)
-+{
-+	int ret;
-+	FILE *file;
-+
-+	file = fopen(path, "w");
-+	if (!file) {
-+		ret = -errno;
-+		kdbus_printf("--- error fopen(): %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	ret = fprintf(file, "%llu", (unsigned long long)mask);
-+	if (ret <= 0) {
-+		ret = -EIO;
-+		kdbus_printf("--- error fprintf(): %d\n", ret);
-+	}
-+
-+	fclose(file);
-+
-+	return ret > 0 ? 0 : ret;
-+}
-+
-+int kdbus_create_bus(int control_fd, const char *name,
-+		     uint64_t owner_meta, char **path)
-+{
-+	struct {
-+		struct kdbus_cmd cmd;
-+
-+		/* bloom size item */
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			struct kdbus_bloom_parameter bloom;
-+		} bp;
-+
-+		/* owner metadata items */
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			uint64_t flags;
-+		} attach;
-+
-+		/* name item */
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			char str[64];
-+		} name;
-+	} bus_make;
-+	int ret;
-+
-+	memset(&bus_make, 0, sizeof(bus_make));
-+	bus_make.bp.size = sizeof(bus_make.bp);
-+	bus_make.bp.type = KDBUS_ITEM_BLOOM_PARAMETER;
-+	bus_make.bp.bloom.size = 64;
-+	bus_make.bp.bloom.n_hash = 1;
-+
-+	snprintf(bus_make.name.str, sizeof(bus_make.name.str),
-+		 "%u-%s", getuid(), name);
-+
-+	bus_make.attach.type = KDBUS_ITEM_ATTACH_FLAGS_SEND;
-+	bus_make.attach.size = sizeof(bus_make.attach);
-+	bus_make.attach.flags = owner_meta;
-+
-+	bus_make.name.type = KDBUS_ITEM_MAKE_NAME;
-+	bus_make.name.size = KDBUS_ITEM_HEADER_SIZE +
-+			     strlen(bus_make.name.str) + 1;
-+
-+	bus_make.cmd.flags = KDBUS_MAKE_ACCESS_WORLD;
-+	bus_make.cmd.size = sizeof(bus_make.cmd) +
-+			     bus_make.bp.size +
-+			     bus_make.attach.size +
-+			     bus_make.name.size;
-+
-+	kdbus_printf("Creating bus with name >%s< on control fd %d ...\n",
-+		     name, control_fd);
-+
-+	ret = kdbus_cmd_bus_make(control_fd, &bus_make.cmd);
-+	if (ret < 0) {
-+		kdbus_printf("--- error when making bus: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	if (ret == 0 && path)
-+		*path = strdup(bus_make.name.str);
-+
-+	return ret;
-+}
-+
-+struct kdbus_conn *
-+kdbus_hello(const char *path, uint64_t flags,
-+	    const struct kdbus_item *item, size_t item_size)
-+{
-+	struct kdbus_cmd_free cmd_free = {};
-+	int fd, ret;
-+	struct {
-+		struct kdbus_cmd_hello hello;
-+
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			char str[16];
-+		} conn_name;
-+
-+		uint8_t extra_items[item_size];
-+	} h;
-+	struct kdbus_conn *conn;
-+
-+	memset(&h, 0, sizeof(h));
-+
-+	if (item_size > 0)
-+		memcpy(h.extra_items, item, item_size);
-+
-+	kdbus_printf("-- opening bus connection %s\n", path);
-+	fd = open(path, O_RDWR|O_CLOEXEC);
-+	if (fd < 0) {
-+		kdbus_printf("--- error %d (%m)\n", fd);
-+		return NULL;
-+	}
-+
-+	h.hello.flags = flags | KDBUS_HELLO_ACCEPT_FD;
-+	h.hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+	h.hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
-+	h.conn_name.type = KDBUS_ITEM_CONN_DESCRIPTION;
-+	strcpy(h.conn_name.str, "this-is-my-name");
-+	h.conn_name.size = KDBUS_ITEM_HEADER_SIZE + strlen(h.conn_name.str) + 1;
-+
-+	h.hello.size = sizeof(h);
-+	h.hello.pool_size = POOL_SIZE;
-+
-+	ret = kdbus_cmd_hello(fd, (struct kdbus_cmd_hello *) &h.hello);
-+	if (ret < 0) {
-+		kdbus_printf("--- error when saying hello: %d (%m)\n", ret);
-+		return NULL;
-+	}
-+	kdbus_printf("-- Our peer ID for %s: %llu -- bus uuid: '%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x'\n",
-+		     path, (unsigned long long)h.hello.id,
-+		     h.hello.id128[0],  h.hello.id128[1],  h.hello.id128[2],
-+		     h.hello.id128[3],  h.hello.id128[4],  h.hello.id128[5],
-+		     h.hello.id128[6],  h.hello.id128[7],  h.hello.id128[8],
-+		     h.hello.id128[9],  h.hello.id128[10], h.hello.id128[11],
-+		     h.hello.id128[12], h.hello.id128[13], h.hello.id128[14],
-+		     h.hello.id128[15]);
-+
-+	cmd_free.size = sizeof(cmd_free);
-+	cmd_free.offset = h.hello.offset;
-+	kdbus_cmd_free(fd, &cmd_free);
-+
-+	conn = malloc(sizeof(*conn));
-+	if (!conn) {
-+		kdbus_printf("unable to malloc()!?\n");
-+		return NULL;
-+	}
-+
-+	conn->buf = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, fd, 0);
-+	if (conn->buf == MAP_FAILED) {
-+		free(conn);
-+		close(fd);
-+		kdbus_printf("--- error mmap (%m)\n");
-+		return NULL;
-+	}
-+
-+	conn->fd = fd;
-+	conn->id = h.hello.id;
-+	return conn;
-+}
-+
-+struct kdbus_conn *
-+kdbus_hello_registrar(const char *path, const char *name,
-+		      const struct kdbus_policy_access *access,
-+		      size_t num_access, uint64_t flags)
-+{
-+	struct kdbus_item *item, *items;
-+	size_t i, size;
-+
-+	size = KDBUS_ITEM_SIZE(strlen(name) + 1) +
-+		num_access * KDBUS_ITEM_SIZE(sizeof(*access));
-+
-+	items = alloca(size);
-+
-+	item = items;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
-+	item->type = KDBUS_ITEM_NAME;
-+	strcpy(item->str, name);
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	for (i = 0; i < num_access; i++) {
-+		item->size = KDBUS_ITEM_HEADER_SIZE +
-+			     sizeof(struct kdbus_policy_access);
-+		item->type = KDBUS_ITEM_POLICY_ACCESS;
-+
-+		item->policy_access.type = access[i].type;
-+		item->policy_access.access = access[i].access;
-+		item->policy_access.id = access[i].id;
-+
-+		item = KDBUS_ITEM_NEXT(item);
-+	}
-+
-+	return kdbus_hello(path, flags, items, size);
-+}
-+
-+struct kdbus_conn *kdbus_hello_activator(const char *path, const char *name,
-+				   const struct kdbus_policy_access *access,
-+				   size_t num_access)
-+{
-+	return kdbus_hello_registrar(path, name, access, num_access,
-+				     KDBUS_HELLO_ACTIVATOR);
-+}
-+
-+bool kdbus_item_in_message(struct kdbus_msg *msg, uint64_t type)
-+{
-+	const struct kdbus_item *item;
-+
-+	KDBUS_ITEM_FOREACH(item, msg, items)
-+		if (item->type == type)
-+			return true;
-+
-+	return false;
-+}
-+
-+int kdbus_bus_creator_info(struct kdbus_conn *conn,
-+			   uint64_t flags,
-+			   uint64_t *offset)
-+{
-+	struct kdbus_cmd_info *cmd;
-+	size_t size = sizeof(*cmd);
-+	int ret;
-+
-+	cmd = alloca(size);
-+	memset(cmd, 0, size);
-+	cmd->size = size;
-+	cmd->attach_flags = flags;
-+
-+	ret = kdbus_cmd_bus_creator_info(conn->fd, cmd);
-+	if (ret < 0) {
-+		kdbus_printf("--- error when requesting info: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	if (offset)
-+		*offset = cmd->offset;
-+	else
-+		kdbus_free(conn, cmd->offset);
-+
-+	return 0;
-+}
-+
-+int kdbus_conn_info(struct kdbus_conn *conn, uint64_t id,
-+		    const char *name, uint64_t flags,
-+		    uint64_t *offset)
-+{
-+	struct kdbus_cmd_info *cmd;
-+	size_t size = sizeof(*cmd);
-+	struct kdbus_info *info;
-+	int ret;
-+
-+	if (name)
-+		size += KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
-+
-+	cmd = alloca(size);
-+	memset(cmd, 0, size);
-+	cmd->size = size;
-+	cmd->attach_flags = flags;
-+
-+	if (name) {
-+		cmd->items[0].size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
-+		cmd->items[0].type = KDBUS_ITEM_NAME;
-+		strcpy(cmd->items[0].str, name);
-+	} else {
-+		cmd->id = id;
-+	}
-+
-+	ret = kdbus_cmd_conn_info(conn->fd, cmd);
-+	if (ret < 0) {
-+		kdbus_printf("--- error when requesting info: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	info = (struct kdbus_info *) (conn->buf + cmd->offset);
-+	if (info->size != cmd->info_size) {
-+		kdbus_printf("%s(): size mismatch: %d != %d\n", __func__,
-+				(int) info->size, (int) cmd->info_size);
-+		return -EIO;
-+	}
-+
-+	if (offset)
-+		*offset = cmd->offset;
-+	else
-+		kdbus_free(conn, cmd->offset);
-+
-+	return 0;
-+}
-+
-+void kdbus_conn_free(struct kdbus_conn *conn)
-+{
-+	if (!conn)
-+		return;
-+
-+	if (conn->buf)
-+		munmap(conn->buf, POOL_SIZE);
-+
-+	if (conn->fd >= 0)
-+		close(conn->fd);
-+
-+	free(conn);
-+}
-+
-+int sys_memfd_create(const char *name, __u64 size)
-+{
-+	int ret, fd;
-+
-+	fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
-+	if (fd < 0)
-+		return fd;
-+
-+	ret = ftruncate(fd, size);
-+	if (ret < 0) {
-+		close(fd);
-+		return ret;
-+	}
-+
-+	return fd;
-+}
-+
-+int sys_memfd_seal_set(int fd)
-+{
-+	return fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK |
-+			 F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL);
-+}
-+
-+off_t sys_memfd_get_size(int fd, off_t *size)
-+{
-+	struct stat stat;
-+	int ret;
-+
-+	ret = fstat(fd, &stat);
-+	if (ret < 0) {
-+		kdbus_printf("stat() failed: %m\n");
-+		return ret;
-+	}
-+
-+	*size = stat.st_size;
-+	return 0;
-+}
-+
-+static int __kdbus_msg_send(const struct kdbus_conn *conn,
-+			    const char *name,
-+			    uint64_t cookie,
-+			    uint64_t flags,
-+			    uint64_t timeout,
-+			    int64_t priority,
-+			    uint64_t dst_id,
-+			    uint64_t cmd_flags,
-+			    int cancel_fd)
-+{
-+	struct kdbus_cmd_send *cmd = NULL;
-+	struct kdbus_msg *msg = NULL;
-+	const char ref1[1024 * 128 + 3] = "0123456789_0";
-+	const char ref2[] = "0123456789_1";
-+	struct kdbus_item *item;
-+	struct timespec now;
-+	uint64_t size;
-+	int memfd = -1;
-+	int ret;
-+
-+	size = sizeof(*msg) + 3 * KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+	if (dst_id == KDBUS_DST_ID_BROADCAST)
-+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
-+	else {
-+		memfd = sys_memfd_create("my-name-is-nice", 1024 * 1024);
-+		if (memfd < 0) {
-+			kdbus_printf("failed to create memfd: %m\n");
-+			return memfd;
-+		}
-+
-+		if (write(memfd, "kdbus memfd 1234567", 19) != 19) {
-+			ret = -errno;
-+			kdbus_printf("writing to memfd failed: %m\n");
-+			goto out;
-+		}
-+
-+		ret = sys_memfd_seal_set(memfd);
-+		if (ret < 0) {
-+			ret = -errno;
-+			kdbus_printf("memfd sealing failed: %m\n");
-+			goto out;
-+		}
-+
-+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
-+	}
-+
-+	if (name)
-+		size += KDBUS_ITEM_SIZE(strlen(name) + 1);
-+
-+	msg = malloc(size);
-+	if (!msg) {
-+		ret = -errno;
-+		kdbus_printf("unable to malloc()!?\n");
-+		goto out;
-+	}
-+
-+	if (dst_id == KDBUS_DST_ID_BROADCAST)
-+		flags |= KDBUS_MSG_SIGNAL;
-+
-+	memset(msg, 0, size);
-+	msg->flags = flags;
-+	msg->priority = priority;
-+	msg->size = size;
-+	msg->src_id = conn->id;
-+	msg->dst_id = name ? 0 : dst_id;
-+	msg->cookie = cookie;
-+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+	if (timeout) {
-+		ret = clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
-+		if (ret < 0)
-+			goto out;
-+
-+		msg->timeout_ns = now.tv_sec * 1000000000ULL +
-+				  now.tv_nsec + timeout;
-+	}
-+
-+	item = msg->items;
-+
-+	if (name) {
-+		item->type = KDBUS_ITEM_DST_NAME;
-+		item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
-+		strcpy(item->str, name);
-+		item = KDBUS_ITEM_NEXT(item);
-+	}
-+
-+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+	item->vec.address = (uintptr_t)&ref1;
-+	item->vec.size = sizeof(ref1);
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	/* data padding for ref1 */
-+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+	item->vec.address = (uintptr_t)NULL;
-+	item->vec.size =  KDBUS_ALIGN8(sizeof(ref1)) - sizeof(ref1);
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+	item->vec.address = (uintptr_t)&ref2;
-+	item->vec.size = sizeof(ref2);
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	if (dst_id == KDBUS_DST_ID_BROADCAST) {
-+		item->type = KDBUS_ITEM_BLOOM_FILTER;
-+		item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
-+		item->bloom_filter.generation = 0;
-+	} else {
-+		item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
-+		item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_memfd);
-+		item->memfd.size = 16;
-+		item->memfd.fd = memfd;
-+	}
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	size = sizeof(*cmd);
-+	if (cancel_fd != -1)
-+		size += KDBUS_ITEM_SIZE(sizeof(cancel_fd));
-+
-+	cmd = malloc(size);
-+	if (!cmd) {
-+		ret = -errno;
-+		kdbus_printf("unable to malloc()!?\n");
-+		goto out;
-+	}
-+
-+	cmd->size = size;
-+	cmd->flags = cmd_flags;
-+	cmd->msg_address = (uintptr_t)msg;
-+
-+	item = cmd->items;
-+
-+	if (cancel_fd != -1) {
-+		item->type = KDBUS_ITEM_CANCEL_FD;
-+		item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(cancel_fd);
-+		item->fds[0] = cancel_fd;
-+		item = KDBUS_ITEM_NEXT(item);
-+	}
-+
-+	ret = kdbus_cmd_send(conn->fd, cmd);
-+	if (ret < 0) {
-+		kdbus_printf("error sending message: %d (%m)\n", ret);
-+		goto out;
-+	}
-+
-+	if (cmd_flags & KDBUS_SEND_SYNC_REPLY) {
-+		struct kdbus_msg *reply;
-+
-+		kdbus_printf("SYNC REPLY @offset %llu:\n", cmd->reply.offset);
-+		reply = (struct kdbus_msg *)(conn->buf + cmd->reply.offset);
-+		kdbus_msg_dump(conn, reply);
-+
-+		kdbus_msg_free(reply);
-+
-+		ret = kdbus_free(conn, cmd->reply.offset);
-+		if (ret < 0)
-+			goto out;
-+	}
-+
-+out:
-+	free(msg);
-+	free(cmd);
-+
-+	if (memfd >= 0)
-+		close(memfd);
-+
-+	return ret < 0 ? ret : 0;
-+}
-+
-+int kdbus_msg_send(const struct kdbus_conn *conn, const char *name,
-+		   uint64_t cookie, uint64_t flags, uint64_t timeout,
-+		   int64_t priority, uint64_t dst_id)
-+{
-+	return __kdbus_msg_send(conn, name, cookie, flags, timeout, priority,
-+				dst_id, 0, -1);
-+}
-+
-+int kdbus_msg_send_sync(const struct kdbus_conn *conn, const char *name,
-+			uint64_t cookie, uint64_t flags, uint64_t timeout,
-+			int64_t priority, uint64_t dst_id, int cancel_fd)
-+{
-+	return __kdbus_msg_send(conn, name, cookie, flags, timeout, priority,
-+				dst_id, KDBUS_SEND_SYNC_REPLY, cancel_fd);
-+}
-+
-+int kdbus_msg_send_reply(const struct kdbus_conn *conn,
-+			 uint64_t reply_cookie,
-+			 uint64_t dst_id)
-+{
-+	struct kdbus_cmd_send cmd = {};
-+	struct kdbus_msg *msg;
-+	const char ref1[1024 * 128 + 3] = "0123456789_0";
-+	struct kdbus_item *item;
-+	uint64_t size;
-+	int ret;
-+
-+	size = sizeof(struct kdbus_msg);
-+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+	msg = malloc(size);
-+	if (!msg) {
-+		kdbus_printf("unable to malloc()!?\n");
-+		return -ENOMEM;
-+	}
-+
-+	memset(msg, 0, size);
-+	msg->size = size;
-+	msg->src_id = conn->id;
-+	msg->dst_id = dst_id;
-+	msg->cookie_reply = reply_cookie;
-+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+	item = msg->items;
-+
-+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+	item->vec.address = (uintptr_t)&ref1;
-+	item->vec.size = sizeof(ref1);
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)msg;
-+
-+	ret = kdbus_cmd_send(conn->fd, &cmd);
-+	if (ret < 0)
-+		kdbus_printf("error sending message: %d (%m)\n", ret);
-+
-+	free(msg);
-+
-+	return ret;
-+}
-+
-+static char *msg_id(uint64_t id, char *buf)
-+{
-+	if (id == 0)
-+		return "KERNEL";
-+	if (id == ~0ULL)
-+		return "BROADCAST";
-+	sprintf(buf, "%llu", (unsigned long long)id);
-+	return buf;
-+}
-+
-+int kdbus_msg_dump(const struct kdbus_conn *conn, const struct kdbus_msg *msg)
-+{
-+	const struct kdbus_item *item = msg->items;
-+	char buf_src[32];
-+	char buf_dst[32];
-+	uint64_t timeout = 0;
-+	uint64_t cookie_reply = 0;
-+	int ret = 0;
-+
-+	if (msg->flags & KDBUS_MSG_EXPECT_REPLY)
-+		timeout = msg->timeout_ns;
-+	else
-+		cookie_reply = msg->cookie_reply;
-+
-+	kdbus_printf("MESSAGE: %s (%llu bytes) flags=0x%08llx, %s → %s, "
-+		     "cookie=%llu, timeout=%llu cookie_reply=%llu priority=%lli\n",
-+		enum_PAYLOAD(msg->payload_type), (unsigned long long)msg->size,
-+		(unsigned long long)msg->flags,
-+		msg_id(msg->src_id, buf_src), msg_id(msg->dst_id, buf_dst),
-+		(unsigned long long)msg->cookie, (unsigned long long)timeout,
-+		(unsigned long long)cookie_reply, (long long)msg->priority);
-+
-+	KDBUS_ITEM_FOREACH(item, msg, items) {
-+		if (item->size < KDBUS_ITEM_HEADER_SIZE) {
-+			kdbus_printf("  +%s (%llu bytes) invalid data record\n",
-+				     enum_MSG(item->type), item->size);
-+			ret = -EINVAL;
-+			break;
-+		}
-+
-+		switch (item->type) {
-+		case KDBUS_ITEM_PAYLOAD_OFF: {
-+			char *s;
-+
-+			if (item->vec.offset == ~0ULL)
-+				s = "[\\0-bytes]";
-+			else
-+				s = (char *)msg + item->vec.offset;
-+
-+			kdbus_printf("  +%s (%llu bytes) off=%llu size=%llu '%s'\n",
-+			       enum_MSG(item->type), item->size,
-+			       (unsigned long long)item->vec.offset,
-+			       (unsigned long long)item->vec.size, s);
-+			break;
-+		}
-+
-+		case KDBUS_ITEM_FDS: {
-+			int i, n = (item->size - KDBUS_ITEM_HEADER_SIZE) /
-+					sizeof(int);
-+
-+			kdbus_printf("  +%s (%llu bytes, %d fds)\n",
-+			       enum_MSG(item->type), item->size, n);
-+
-+			for (i = 0; i < n; i++)
-+				kdbus_printf("    fd[%d] = %d\n",
-+					     i, item->fds[i]);
-+
-+			break;
-+		}
-+
-+		case KDBUS_ITEM_PAYLOAD_MEMFD: {
-+			char *buf;
-+			off_t size;
-+
-+			buf = mmap(NULL, item->memfd.size, PROT_READ,
-+				   MAP_PRIVATE, item->memfd.fd, 0);
-+			if (buf == MAP_FAILED) {
-+				kdbus_printf("mmap() fd=%i size=%llu failed: %m\n",
-+					     item->memfd.fd, item->memfd.size);
-+				break;
-+			}
-+
-+			if (sys_memfd_get_size(item->memfd.fd, &size) < 0) {
-+				kdbus_printf("KDBUS_CMD_MEMFD_SIZE_GET failed: %m\n");
-+				break;
-+			}
-+
-+			kdbus_printf("  +%s (%llu bytes) fd=%i size=%llu filesize=%llu '%s'\n",
-+			       enum_MSG(item->type), item->size, item->memfd.fd,
-+			       (unsigned long long)item->memfd.size,
-+			       (unsigned long long)size, buf);
-+			munmap(buf, item->memfd.size);
-+			break;
-+		}
-+
-+		case KDBUS_ITEM_CREDS:
-+			kdbus_printf("  +%s (%llu bytes) uid=%lld, euid=%lld, suid=%lld, fsuid=%lld, "
-+							"gid=%lld, egid=%lld, sgid=%lld, fsgid=%lld\n",
-+				enum_MSG(item->type), item->size,
-+				item->creds.uid, item->creds.euid,
-+				item->creds.suid, item->creds.fsuid,
-+				item->creds.gid, item->creds.egid,
-+				item->creds.sgid, item->creds.fsgid);
-+			break;
-+
-+		case KDBUS_ITEM_PIDS:
-+			kdbus_printf("  +%s (%llu bytes) pid=%lld, tid=%lld, ppid=%lld\n",
-+				enum_MSG(item->type), item->size,
-+				item->pids.pid, item->pids.tid,
-+				item->pids.ppid);
-+			break;
-+
-+		case KDBUS_ITEM_AUXGROUPS: {
-+			int i, n;
-+
-+			kdbus_printf("  +%s (%llu bytes)\n",
-+				     enum_MSG(item->type), item->size);
-+			n = (item->size - KDBUS_ITEM_HEADER_SIZE) /
-+				sizeof(uint64_t);
-+
-+			for (i = 0; i < n; i++)
-+				kdbus_printf("    gid[%d] = %lld\n",
-+					     i, item->data64[i]);
-+			break;
-+		}
-+
-+		case KDBUS_ITEM_NAME:
-+		case KDBUS_ITEM_PID_COMM:
-+		case KDBUS_ITEM_TID_COMM:
-+		case KDBUS_ITEM_EXE:
-+		case KDBUS_ITEM_CGROUP:
-+		case KDBUS_ITEM_SECLABEL:
-+		case KDBUS_ITEM_DST_NAME:
-+		case KDBUS_ITEM_CONN_DESCRIPTION:
-+			kdbus_printf("  +%s (%llu bytes) '%s' (%zu)\n",
-+				     enum_MSG(item->type), item->size,
-+				     item->str, strlen(item->str));
-+			break;
-+
-+		case KDBUS_ITEM_OWNED_NAME: {
-+			kdbus_printf("  +%s (%llu bytes) '%s' (%zu) flags=0x%08llx\n",
-+				     enum_MSG(item->type), item->size,
-+				     item->name.name, strlen(item->name.name),
-+				     item->name.flags);
-+			break;
-+		}
-+
-+		case KDBUS_ITEM_CMDLINE: {
-+			size_t size = item->size - KDBUS_ITEM_HEADER_SIZE;
-+			const char *str = item->str;
-+			int count = 0;
-+
-+			kdbus_printf("  +%s (%llu bytes) ",
-+				     enum_MSG(item->type), item->size);
-+			while (size) {
-+				kdbus_printf("'%s' ", str);
-+				size -= strlen(str) + 1;
-+				str += strlen(str) + 1;
-+				count++;
-+			}
-+
-+			kdbus_printf("(%d string%s)\n",
-+				     count, (count == 1) ? "" : "s");
-+			break;
-+		}
-+
-+		case KDBUS_ITEM_AUDIT:
-+			kdbus_printf("  +%s (%llu bytes) loginuid=%u sessionid=%u\n",
-+			       enum_MSG(item->type), item->size,
-+			       item->audit.loginuid, item->audit.sessionid);
-+			break;
-+
-+		case KDBUS_ITEM_CAPS: {
-+			const uint32_t *cap;
-+			int n, i;
-+
-+			kdbus_printf("  +%s (%llu bytes) len=%llu bytes, last_cap %d\n",
-+				     enum_MSG(item->type), item->size,
-+				     (unsigned long long)item->size -
-+					KDBUS_ITEM_HEADER_SIZE,
-+				     (int) item->caps.last_cap);
-+
-+			cap = item->caps.caps;
-+			n = (item->size - offsetof(struct kdbus_item, caps.caps))
-+				/ 4 / sizeof(uint32_t);
-+
-+			kdbus_printf("    CapInh=");
-+			for (i = 0; i < n; i++)
-+				kdbus_printf("%08x", cap[(0 * n) + (n - i - 1)]);
-+
-+			kdbus_printf(" CapPrm=");
-+			for (i = 0; i < n; i++)
-+				kdbus_printf("%08x", cap[(1 * n) + (n - i - 1)]);
-+
-+			kdbus_printf(" CapEff=");
-+			for (i = 0; i < n; i++)
-+				kdbus_printf("%08x", cap[(2 * n) + (n - i - 1)]);
-+
-+			kdbus_printf(" CapBnd=");
-+			for (i = 0; i < n; i++)
-+				kdbus_printf("%08x", cap[(3 * n) + (n - i - 1)]);
-+			kdbus_printf("\n");
-+			break;
-+		}
-+
-+		case KDBUS_ITEM_TIMESTAMP:
-+			kdbus_printf("  +%s (%llu bytes) seq=%llu realtime=%lluns monotonic=%lluns\n",
-+			       enum_MSG(item->type), item->size,
-+			       (unsigned long long)item->timestamp.seqnum,
-+			       (unsigned long long)item->timestamp.realtime_ns,
-+			       (unsigned long long)item->timestamp.monotonic_ns);
-+			break;
-+
-+		case KDBUS_ITEM_REPLY_TIMEOUT:
-+			kdbus_printf("  +%s (%llu bytes) cookie=%llu\n",
-+			       enum_MSG(item->type), item->size,
-+			       msg->cookie_reply);
-+			break;
-+
-+		case KDBUS_ITEM_NAME_ADD:
-+		case KDBUS_ITEM_NAME_REMOVE:
-+		case KDBUS_ITEM_NAME_CHANGE:
-+			kdbus_printf("  +%s (%llu bytes) '%s', old id=%lld, now id=%lld, old_flags=0x%llx new_flags=0x%llx\n",
-+				enum_MSG(item->type),
-+				(unsigned long long) item->size,
-+				item->name_change.name,
-+				item->name_change.old_id.id,
-+				item->name_change.new_id.id,
-+				item->name_change.old_id.flags,
-+				item->name_change.new_id.flags);
-+			break;
-+
-+		case KDBUS_ITEM_ID_ADD:
-+		case KDBUS_ITEM_ID_REMOVE:
-+			kdbus_printf("  +%s (%llu bytes) id=%llu flags=%llu\n",
-+			       enum_MSG(item->type),
-+			       (unsigned long long) item->size,
-+			       (unsigned long long) item->id_change.id,
-+			       (unsigned long long) item->id_change.flags);
-+			break;
-+
-+		default:
-+			kdbus_printf("  +%s (%llu bytes)\n",
-+				     enum_MSG(item->type), item->size);
-+			break;
-+		}
-+	}
-+
-+	if ((char *)item - ((char *)msg + msg->size) >= 8) {
-+		kdbus_printf("invalid padding at end of message\n");
-+		ret = -EINVAL;
-+	}
-+
-+	kdbus_printf("\n");
-+
-+	return ret;
-+}
-+
-+void kdbus_msg_free(struct kdbus_msg *msg)
-+{
-+	const struct kdbus_item *item;
-+	int nfds, i;
-+
-+	if (!msg)
-+		return;
-+
-+	KDBUS_ITEM_FOREACH(item, msg, items) {
-+		switch (item->type) {
-+		/* close all memfds */
-+		case KDBUS_ITEM_PAYLOAD_MEMFD:
-+			close(item->memfd.fd);
-+			break;
-+		case KDBUS_ITEM_FDS:
-+			nfds = (item->size - KDBUS_ITEM_HEADER_SIZE) /
-+				sizeof(int);
-+
-+			for (i = 0; i < nfds; i++)
-+				close(item->fds[i]);
-+
-+			break;
-+		}
-+	}
-+}
-+
-+int kdbus_msg_recv(struct kdbus_conn *conn,
-+		   struct kdbus_msg **msg_out,
-+		   uint64_t *offset)
-+{
-+	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
-+	struct kdbus_msg *msg;
-+	int ret;
-+
-+	ret = kdbus_cmd_recv(conn->fd, &recv);
-+	if (ret < 0)
-+		return ret;
-+
-+	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
-+	ret = kdbus_msg_dump(conn, msg);
-+	if (ret < 0) {
-+		kdbus_msg_free(msg);
-+		return ret;
-+	}
-+
-+	if (msg_out) {
-+		*msg_out = msg;
-+
-+		if (offset)
-+			*offset = recv.msg.offset;
-+	} else {
-+		kdbus_msg_free(msg);
-+
-+		ret = kdbus_free(conn, recv.msg.offset);
-+		if (ret < 0)
-+			return ret;
-+	}
-+
-+	return 0;
-+}
-+
-+/*
-+ * Returns: 0 on success, negative errno on failure.
-+ *
-+ * We must return -ETIMEDOUT, -ECONNRESET, -EAGAIN and other errors.
-+ * We must return the result of kdbus_msg_recv()
-+ */
-+int kdbus_msg_recv_poll(struct kdbus_conn *conn,
-+			int timeout_ms,
-+			struct kdbus_msg **msg_out,
-+			uint64_t *offset)
-+{
-+	int ret;
-+
-+	do {
-+		struct timeval before, after, diff;
-+		struct pollfd fd;
-+
-+		fd.fd = conn->fd;
-+		fd.events = POLLIN | POLLPRI | POLLHUP;
-+		fd.revents = 0;
-+
-+		gettimeofday(&before, NULL);
-+		ret = poll(&fd, 1, timeout_ms);
-+		gettimeofday(&after, NULL);
-+
-+		if (ret == 0) {
-+			ret = -ETIMEDOUT;
-+			break;
-+		}
-+
-+		if (ret > 0) {
-+			if (fd.revents & POLLIN)
-+				ret = kdbus_msg_recv(conn, msg_out, offset);
-+
-+			if (fd.revents & (POLLHUP | POLLERR))
-+				ret = -ECONNRESET;
-+		}
-+
-+		if (ret == 0 || ret != -EAGAIN)
-+			break;
-+
-+		timersub(&after, &before, &diff);
-+		timeout_ms -= diff.tv_sec * 1000UL +
-+			      diff.tv_usec / 1000UL;
-+	} while (timeout_ms > 0);
-+
-+	return ret;
-+}
-+
-+int kdbus_free(const struct kdbus_conn *conn, uint64_t offset)
-+{
-+	struct kdbus_cmd_free cmd_free = {};
-+	int ret;
-+
-+	cmd_free.size = sizeof(cmd_free);
-+	cmd_free.offset = offset;
-+	cmd_free.flags = 0;
-+
-+	ret = kdbus_cmd_free(conn->fd, &cmd_free);
-+	if (ret < 0) {
-+		kdbus_printf("KDBUS_CMD_FREE failed: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	return 0;
-+}
-+
-+int kdbus_name_acquire(struct kdbus_conn *conn,
-+		       const char *name, uint64_t *flags)
-+{
-+	struct kdbus_cmd *cmd_name;
-+	size_t name_len = strlen(name) + 1;
-+	uint64_t size = sizeof(*cmd_name) + KDBUS_ITEM_SIZE(name_len);
-+	struct kdbus_item *item;
-+	int ret;
-+
-+	cmd_name = alloca(size);
-+
-+	memset(cmd_name, 0, size);
-+
-+	item = cmd_name->items;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + name_len;
-+	item->type = KDBUS_ITEM_NAME;
-+	strcpy(item->str, name);
-+
-+	cmd_name->size = size;
-+	if (flags)
-+		cmd_name->flags = *flags;
-+
-+	ret = kdbus_cmd_name_acquire(conn->fd, cmd_name);
-+	if (ret < 0) {
-+		kdbus_printf("error acquiring name: %s\n", strerror(-ret));
-+		return ret;
-+	}
-+
-+	kdbus_printf("%s(): flags after call: 0x%llx\n", __func__,
-+		     cmd_name->return_flags);
-+
-+	if (flags)
-+		*flags = cmd_name->return_flags;
-+
-+	return 0;
-+}
-+
-+int kdbus_name_release(struct kdbus_conn *conn, const char *name)
-+{
-+	struct kdbus_cmd *cmd_name;
-+	size_t name_len = strlen(name) + 1;
-+	uint64_t size = sizeof(*cmd_name) + KDBUS_ITEM_SIZE(name_len);
-+	struct kdbus_item *item;
-+	int ret;
-+
-+	cmd_name = alloca(size);
-+
-+	memset(cmd_name, 0, size);
-+
-+	item = cmd_name->items;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + name_len;
-+	item->type = KDBUS_ITEM_NAME;
-+	strcpy(item->str, name);
-+
-+	cmd_name->size = size;
-+
-+	kdbus_printf("conn %lld giving up name '%s'\n",
-+		     (unsigned long long) conn->id, name);
-+
-+	ret = kdbus_cmd_name_release(conn->fd, cmd_name);
-+	if (ret < 0) {
-+		kdbus_printf("error releasing name: %s\n", strerror(-ret));
-+		return ret;
-+	}
-+
-+	return 0;
-+}
-+
-+int kdbus_list(struct kdbus_conn *conn, uint64_t flags)
-+{
-+	struct kdbus_cmd_list cmd_list = {};
-+	struct kdbus_info *list, *name;
-+	int ret;
-+
-+	cmd_list.size = sizeof(cmd_list);
-+	cmd_list.flags = flags;
-+
-+	ret = kdbus_cmd_list(conn->fd, &cmd_list);
-+	if (ret < 0) {
-+		kdbus_printf("error listing names: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	kdbus_printf("REGISTRY:\n");
-+	list = (struct kdbus_info *)(conn->buf + cmd_list.offset);
-+
-+	KDBUS_FOREACH(name, list, cmd_list.list_size) {
-+		uint64_t flags = 0;
-+		struct kdbus_item *item;
-+		const char *n = "MISSING-NAME";
-+
-+		if (name->size == sizeof(struct kdbus_cmd))
-+			continue;
-+
-+		KDBUS_ITEM_FOREACH(item, name, items)
-+			if (item->type == KDBUS_ITEM_OWNED_NAME) {
-+				n = item->name.name;
-+				flags = item->name.flags;
-+
-+				kdbus_printf("%8llu flags=0x%08llx conn=0x%08llx '%s'\n",
-+					     name->id,
-+					     (unsigned long long) flags,
-+					     name->flags, n);
-+			}
-+	}
-+	kdbus_printf("\n");
-+
-+	ret = kdbus_free(conn, cmd_list.offset);
-+
-+	return ret;
-+}
-+
-+int kdbus_conn_update_attach_flags(struct kdbus_conn *conn,
-+				   uint64_t attach_flags_send,
-+				   uint64_t attach_flags_recv)
-+{
-+	int ret;
-+	size_t size;
-+	struct kdbus_cmd *update;
-+	struct kdbus_item *item;
-+
-+	size = sizeof(struct kdbus_cmd);
-+	size += KDBUS_ITEM_SIZE(sizeof(uint64_t)) * 2;
-+
-+	update = malloc(size);
-+	if (!update) {
-+		kdbus_printf("error malloc: %m\n");
-+		return -ENOMEM;
-+	}
-+
-+	memset(update, 0, size);
-+	update->size = size;
-+
-+	item = update->items;
-+
-+	item->type = KDBUS_ITEM_ATTACH_FLAGS_SEND;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(uint64_t);
-+	item->data64[0] = attach_flags_send;
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	item->type = KDBUS_ITEM_ATTACH_FLAGS_RECV;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(uint64_t);
-+	item->data64[0] = attach_flags_recv;
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	ret = kdbus_cmd_update(conn->fd, update);
-+	if (ret < 0)
-+		kdbus_printf("error conn update: %d (%m)\n", ret);
-+
-+	free(update);
-+
-+	return ret;
-+}
-+
-+int kdbus_conn_update_policy(struct kdbus_conn *conn, const char *name,
-+			     const struct kdbus_policy_access *access,
-+			     size_t num_access)
-+{
-+	struct kdbus_cmd *update;
-+	struct kdbus_item *item;
-+	size_t i, size;
-+	int ret;
-+
-+	size = sizeof(struct kdbus_cmd);
-+	size += KDBUS_ITEM_SIZE(strlen(name) + 1);
-+	size += num_access * KDBUS_ITEM_SIZE(sizeof(struct kdbus_policy_access));
-+
-+	update = malloc(size);
-+	if (!update) {
-+		kdbus_printf("error malloc: %m\n");
-+		return -ENOMEM;
-+	}
-+
-+	memset(update, 0, size);
-+	update->size = size;
-+
-+	item = update->items;
-+
-+	item->type = KDBUS_ITEM_NAME;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
-+	strcpy(item->str, name);
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	for (i = 0; i < num_access; i++) {
-+		item->size = KDBUS_ITEM_HEADER_SIZE +
-+			     sizeof(struct kdbus_policy_access);
-+		item->type = KDBUS_ITEM_POLICY_ACCESS;
-+
-+		item->policy_access.type = access[i].type;
-+		item->policy_access.access = access[i].access;
-+		item->policy_access.id = access[i].id;
-+
-+		item = KDBUS_ITEM_NEXT(item);
-+	}
-+
-+	ret = kdbus_cmd_update(conn->fd, update);
-+	if (ret < 0)
-+		kdbus_printf("error conn update: %d (%m)\n", ret);
-+
-+	free(update);
-+
-+	return ret;
-+}
-+
-+int kdbus_add_match_id(struct kdbus_conn *conn, uint64_t cookie,
-+		       uint64_t type, uint64_t id)
-+{
-+	struct {
-+		struct kdbus_cmd_match cmd;
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			struct kdbus_notify_id_change chg;
-+		} item;
-+	} buf;
-+	int ret;
-+
-+	memset(&buf, 0, sizeof(buf));
-+
-+	buf.cmd.size = sizeof(buf);
-+	buf.cmd.cookie = cookie;
-+	buf.item.size = sizeof(buf.item);
-+	buf.item.type = type;
-+	buf.item.chg.id = id;
-+
-+	ret = kdbus_cmd_match_add(conn->fd, &buf.cmd);
-+	if (ret < 0)
-+		kdbus_printf("--- error adding conn match: %d (%m)\n", ret);
-+
-+	return ret;
-+}
-+
-+int kdbus_add_match_empty(struct kdbus_conn *conn)
-+{
-+	struct {
-+		struct kdbus_cmd_match cmd;
-+		struct kdbus_item item;
-+	} buf;
-+	int ret;
-+
-+	memset(&buf, 0, sizeof(buf));
-+
-+	buf.item.size = sizeof(uint64_t) * 3;
-+	buf.item.type = KDBUS_ITEM_ID;
-+	buf.item.id = KDBUS_MATCH_ID_ANY;
-+
-+	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
-+
-+	ret = kdbus_cmd_match_add(conn->fd, &buf.cmd);
-+	if (ret < 0)
-+		kdbus_printf("--- error adding conn match: %d (%m)\n", ret);
-+
-+	return ret;
-+}
-+
-+static int all_ids_are_mapped(const char *path)
-+{
-+	int ret;
-+	FILE *file;
-+	uint32_t inside_id, length;
-+
-+	file = fopen(path, "r");
-+	if (!file) {
-+		ret = -errno;
-+		kdbus_printf("error fopen() %s: %d (%m)\n",
-+			     path, ret);
-+		return ret;
-+	}
-+
-+	ret = fscanf(file, "%u\t%*u\t%u", &inside_id, &length);
-+	if (ret != 2) {
-+		if (ferror(file))
-+			ret = -errno;
-+		else
-+			ret = -EIO;
-+
-+		kdbus_printf("--- error fscanf(): %d\n", ret);
-+		fclose(file);
-+		return ret;
-+	}
-+
-+	fclose(file);
-+
-+	/*
-+	 * A length of 4294967295, i.e. the invalid uid (uid_t) -1,
-+	 * means the full ID range is mapped, so all uids/gids
-+	 * are available.
-+	if (inside_id == 0 && length == (uid_t) -1)
-+		return 1;
-+
-+	return 0;
-+}
-+
-+int all_uids_gids_are_mapped(void)
-+{
-+	int ret;
-+
-+	ret = all_ids_are_mapped("/proc/self/uid_map");
-+	if (ret <= 0) {
-+		kdbus_printf("--- error not all uids are mapped\n");
-+		return 0;
-+	}
-+
-+	ret = all_ids_are_mapped("/proc/self/gid_map");
-+	if (ret <= 0) {
-+		kdbus_printf("--- error not all gids are mapped\n");
-+		return 0;
-+	}
-+
-+	return 1;
-+}
-+
-+int drop_privileges(uid_t uid, gid_t gid)
-+{
-+	int ret;
-+
-+	ret = setgroups(0, NULL);
-+	if (ret < 0) {
-+		ret = -errno;
-+		kdbus_printf("error setgroups: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	ret = setresgid(gid, gid, gid);
-+	if (ret < 0) {
-+		ret = -errno;
-+		kdbus_printf("error setresgid: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	ret = setresuid(uid, uid, uid);
-+	if (ret < 0) {
-+		ret = -errno;
-+		kdbus_printf("error setresuid: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	return ret;
-+}
-+
-+uint64_t now(clockid_t clock)
-+{
-+	struct timespec spec;
-+
-+	clock_gettime(clock, &spec);
-+	return spec.tv_sec * 1000ULL * 1000ULL * 1000ULL + spec.tv_nsec;
-+}
-+
-+char *unique_name(const char *prefix)
-+{
-+	unsigned int i;
-+	uint64_t u_now;
-+	char n[17];
-+	char *str;
-+	int r;
-+
-+	/*
-+	 * This returns a random string which is guaranteed to be
-+	 * globally unique across all calls to unique_name(). We
-+	 * compose the string as:
-+	 *   <prefix>-<random>-<time>
-+	 * With:
-+	 *   <prefix>: string provided by the caller
-+	 *   <random>: a random alpha string of 16 characters
-+	 *   <time>: the current time in nanoseconds since last boot
-+	 *
-+	 * The <random> part makes the string always look vastly different,
-+	 * the <time> part makes sure no two calls return the same string.
-+	 */
-+
-+	u_now = now(CLOCK_MONOTONIC);
-+
-+	for (i = 0; i < sizeof(n) - 1; ++i)
-+		n[i] = 'a' + (rand() % ('z' - 'a'));
-+	n[sizeof(n) - 1] = 0;
-+
-+	r = asprintf(&str, "%s-%s-%" PRIu64, prefix, n, u_now);
-+	if (r < 0)
-+		return NULL;
-+
-+	return str;
-+}
-+
-+static int do_userns_map_id(pid_t pid,
-+			    const char *map_file,
-+			    const char *map_id)
-+{
-+	int ret;
-+	int fd;
-+	char *map;
-+	unsigned int i;
-+
-+	map = strndupa(map_id, strlen(map_id));
-+	if (!map) {
-+		ret = -errno;
-+		kdbus_printf("error strndupa %s: %d (%m)\n",
-+			map_file, ret);
-+		return ret;
-+	}
-+
-+	for (i = 0; i < strlen(map); i++)
-+		if (map[i] == ',')
-+			map[i] = '\n';
-+
-+	fd = open(map_file, O_RDWR);
-+	if (fd < 0) {
-+		ret = -errno;
-+		kdbus_printf("error open %s: %d (%m)\n",
-+			map_file, ret);
-+		return ret;
-+	}
-+
-+	ret = write(fd, map, strlen(map));
-+	if (ret < 0) {
-+		ret = -errno;
-+		kdbus_printf("error write to %s: %d (%m)\n",
-+			     map_file, ret);
-+		goto out;
-+	}
-+
-+	ret = 0;
-+
-+out:
-+	close(fd);
-+	return ret;
-+}
-+
-+int userns_map_uid_gid(pid_t pid,
-+		       const char *map_uid,
-+		       const char *map_gid)
-+{
-+	int fd, ret;
-+	char file_id[128] = {'\0'};
-+
-+	snprintf(file_id, sizeof(file_id), "/proc/%ld/uid_map",
-+		 (long) pid);
-+
-+	ret = do_userns_map_id(pid, file_id, map_uid);
-+	if (ret < 0)
-+		return ret;
-+
-+	snprintf(file_id, sizeof(file_id), "/proc/%ld/setgroups",
-+		 (long) pid);
-+
-+	fd = open(file_id, O_WRONLY);
-+	if (fd >= 0) {
-+		write(fd, "deny\n", 5);
-+		close(fd);
-+	}
-+
-+	snprintf(file_id, sizeof(file_id), "/proc/%ld/gid_map",
-+		 (long) pid);
-+
-+	return do_userns_map_id(pid, file_id, map_gid);
-+}
-+
-+static int do_cap_get_flag(cap_t caps, cap_value_t cap)
-+{
-+	int ret;
-+	cap_flag_value_t flag_set;
-+
-+	ret = cap_get_flag(caps, cap, CAP_EFFECTIVE, &flag_set);
-+	if (ret < 0) {
-+		ret = -errno;
-+		kdbus_printf("error cap_get_flag(): %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	return (flag_set == CAP_SET);
-+}
-+
-+/*
-+ * Returns:
-+ *  1 in case all the requested effective capabilities are set.
-+ *  0 in case we do not have the requested capabilities. This value
-+ *    will be used to abort tests with TEST_SKIP
-+ *  Negative errno on failure.
-+ *
-+ *  Terminate args with a negative value.
-+ */
-+int test_is_capable(int cap, ...)
-+{
-+	int ret;
-+	va_list ap;
-+	cap_t caps;
-+
-+	caps = cap_get_proc();
-+	if (!caps) {
-+		ret = -errno;
-+		kdbus_printf("error cap_get_proc(): %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	ret = do_cap_get_flag(caps, (cap_value_t)cap);
-+	if (ret <= 0)
-+		goto out;
-+
-+	va_start(ap, cap);
-+	while ((cap = va_arg(ap, int)) > 0) {
-+		ret = do_cap_get_flag(caps, (cap_value_t)cap);
-+		if (ret <= 0)
-+			break;
-+	}
-+	va_end(ap);
-+
-+out:
-+	cap_free(caps);
-+	return ret;
-+}
-+
-+int config_user_ns_is_enabled(void)
-+{
-+	return (access("/proc/self/uid_map", F_OK) == 0);
-+}
-+
-+int config_auditsyscall_is_enabled(void)
-+{
-+	return (access("/proc/self/loginuid", F_OK) == 0);
-+}
-+
-+int config_cgroups_is_enabled(void)
-+{
-+	return (access("/proc/self/cgroup", F_OK) == 0);
-+}
-+
-+int config_security_is_enabled(void)
-+{
-+	int fd;
-+	int ret;
-+	char buf[128];
-+
-+	/* CONFIG_SECURITY is disabled */
-+	if (access("/proc/self/attr/current", F_OK) != 0)
-+		return 0;
-+
-+	/*
-+	 * Only if read() fails with EINVAL do we assume
-+	 * that SECLABEL and LSM are disabled
-+	 */
-+	fd = open("/proc/self/attr/current", O_RDONLY|O_CLOEXEC);
-+	if (fd < 0)
-+		return 1;
-+
-+	ret = read(fd, buf, sizeof(buf));
-+	if (ret == -1 && errno == EINVAL)
-+		ret = 0;
-+	else
-+		ret = 1;
-+
-+	close(fd);
-+
-+	return ret;
-+}
-diff --git a/tools/testing/selftests/kdbus/kdbus-util.h b/tools/testing/selftests/kdbus/kdbus-util.h
-new file mode 100644
-index 0000000..e1e18b9
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/kdbus-util.h
-@@ -0,0 +1,218 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Daniel Mack
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#pragma once
-+
-+#define BIT(X) (1 << (X))
-+
-+#include <time.h>
-+#include <stdbool.h>
-+#include <linux/kdbus.h>
-+
-+#define _STRINGIFY(x) #x
-+#define STRINGIFY(x) _STRINGIFY(x)
-+#define ELEMENTSOF(x) (sizeof(x)/sizeof((x)[0]))
-+
-+#define KDBUS_PTR(addr) ((void *)(uintptr_t)(addr))
-+
-+#define KDBUS_ALIGN8(l) (((l) + 7) & ~7)
-+#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
-+#define KDBUS_ITEM_SIZE(s) KDBUS_ALIGN8((s) + KDBUS_ITEM_HEADER_SIZE)
-+
-+#define KDBUS_ITEM_NEXT(item) \
-+	(typeof(item))((uint8_t *)(item) + KDBUS_ALIGN8((item)->size))
-+#define KDBUS_ITEM_FOREACH(item, head, first)				\
-+	for ((item) = (head)->first;					\
-+	     ((uint8_t *)(item) < (uint8_t *)(head) + (head)->size) &&	\
-+	       ((uint8_t *)(item) >= (uint8_t *)(head));		\
-+	     (item) = KDBUS_ITEM_NEXT(item))
-+#define KDBUS_FOREACH(iter, first, _size)				\
-+	for ((iter) = (first);						\
-+	     ((uint8_t *)(iter) < (uint8_t *)(first) + (_size)) &&	\
-+	       ((uint8_t *)(iter) >= (uint8_t *)(first));		\
-+	     (iter) = (void *)((uint8_t *)(iter) + KDBUS_ALIGN8((iter)->size)))
-+
-+#define _KDBUS_ATTACH_BITS_SET_NR (__builtin_popcountll(_KDBUS_ATTACH_ALL))
-+
-+/* Sum of KDBUS_ITEM_* that reflects _KDBUS_ATTACH_ALL */
-+#define KDBUS_ATTACH_ITEMS_TYPE_SUM					\
-+	((((_KDBUS_ATTACH_BITS_SET_NR - 1) *				\
-+	((_KDBUS_ATTACH_BITS_SET_NR - 1) + 1)) / 2) +			\
-+	(_KDBUS_ITEM_ATTACH_BASE * _KDBUS_ATTACH_BITS_SET_NR))
-+
-+#define POOL_SIZE (16 * 1024LU * 1024LU)
-+
-+#define UNPRIV_UID 65534
-+#define UNPRIV_GID 65534
-+
-+/* Dump as user of process, useful for user namespace testing */
-+#define SUID_DUMP_USER	1
-+
-+extern int kdbus_util_verbose;
-+
-+#define kdbus_printf(X...) \
-+	if (kdbus_util_verbose) \
-+		printf(X)
-+
-+#define RUN_UNPRIVILEGED(child_uid, child_gid, _child_, _parent_) ({	\
-+		pid_t pid, rpid;					\
-+		int ret;						\
-+									\
-+		pid = fork();						\
-+		if (pid == 0) {						\
-+			ret = drop_privileges(child_uid, child_gid);	\
-+			ASSERT_EXIT_VAL(ret == 0, ret);			\
-+									\
-+			_child_;					\
-+			_exit(0);					\
-+		} else if (pid > 0) {					\
-+			_parent_;					\
-+			rpid = waitpid(pid, &ret, 0);			\
-+			ASSERT_RETURN(rpid == pid);			\
-+			ASSERT_RETURN(WIFEXITED(ret));			\
-+			ASSERT_RETURN(WEXITSTATUS(ret) == 0);		\
-+			ret = TEST_OK;					\
-+		} else {						\
-+			ret = pid;					\
-+		}							\
-+									\
-+		ret;							\
-+	})
-+
-+#define RUN_UNPRIVILEGED_CONN(_var_, _bus_, _code_)			\
-+	RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({			\
-+		struct kdbus_conn *_var_;				\
-+		_var_ = kdbus_hello(_bus_, 0, NULL, 0);			\
-+		ASSERT_EXIT(_var_);					\
-+		_code_;							\
-+		kdbus_conn_free(_var_);					\
-+	}), ({ 0; }))
-+
-+#define RUN_CLONE_CHILD(clone_ret, flags, _setup_, _child_body_,	\
-+			_parent_setup_, _parent_body_) ({		\
-+	pid_t pid, rpid;						\
-+	int ret;							\
-+	int efd = -1;							\
-+									\
-+	_setup_;							\
-+	efd = eventfd(0, EFD_CLOEXEC);					\
-+	ASSERT_RETURN(efd >= 0);					\
-+	*(clone_ret) = 0;						\
-+	pid = syscall(__NR_clone, flags, NULL);				\
-+	if (pid == 0) {							\
-+		eventfd_t event_status = 0;				\
-+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);			\
-+		ASSERT_EXIT(ret == 0);					\
-+		ret = eventfd_read(efd, &event_status);			\
-+		if (ret < 0 || event_status != 1) {			\
-+			kdbus_printf("error eventfd_read()\n");		\
-+			_exit(EXIT_FAILURE);				\
-+		}							\
-+		_child_body_;						\
-+		_exit(0);						\
-+	} else if (pid > 0) {						\
-+		_parent_setup_;						\
-+		ret = eventfd_write(efd, 1);				\
-+		ASSERT_RETURN(ret >= 0);				\
-+		_parent_body_;						\
-+		rpid = waitpid(pid, &ret, 0);				\
-+		ASSERT_RETURN(rpid == pid);				\
-+		ASSERT_RETURN(WIFEXITED(ret));				\
-+		ASSERT_RETURN(WEXITSTATUS(ret) == 0);			\
-+		ret = TEST_OK;						\
-+	} else {							\
-+		ret = -errno;						\
-+		*(clone_ret) = -errno;					\
-+	}								\
-+	close(efd);							\
-+	ret;								\
-+})
-+
-+/* Enum telling the parent whether it should drop privileges */
-+enum kdbus_drop_parent {
-+	DO_NOT_DROP,
-+	DROP_SAME_UNPRIV,
-+	DROP_OTHER_UNPRIV,
-+};
-+
-+struct kdbus_conn {
-+	int fd;
-+	uint64_t id;
-+	unsigned char *buf;
-+};
-+
-+int kdbus_sysfs_get_parameter_mask(const char *path, uint64_t *mask);
-+int kdbus_sysfs_set_parameter_mask(const char *path, uint64_t mask);
-+
-+int sys_memfd_create(const char *name, __u64 size);
-+int sys_memfd_seal_set(int fd);
-+off_t sys_memfd_get_size(int fd, off_t *size);
-+
-+int kdbus_list(struct kdbus_conn *conn, uint64_t flags);
-+int kdbus_name_release(struct kdbus_conn *conn, const char *name);
-+int kdbus_name_acquire(struct kdbus_conn *conn, const char *name,
-+		       uint64_t *flags);
-+void kdbus_msg_free(struct kdbus_msg *msg);
-+int kdbus_msg_recv(struct kdbus_conn *conn,
-+		   struct kdbus_msg **msg, uint64_t *offset);
-+int kdbus_msg_recv_poll(struct kdbus_conn *conn, int timeout_ms,
-+			struct kdbus_msg **msg_out, uint64_t *offset);
-+int kdbus_free(const struct kdbus_conn *conn, uint64_t offset);
-+int kdbus_msg_dump(const struct kdbus_conn *conn,
-+		   const struct kdbus_msg *msg);
-+int kdbus_create_bus(int control_fd, const char *name,
-+		     uint64_t owner_meta, char **path);
-+int kdbus_msg_send(const struct kdbus_conn *conn, const char *name,
-+		   uint64_t cookie, uint64_t flags, uint64_t timeout,
-+		   int64_t priority, uint64_t dst_id);
-+int kdbus_msg_send_sync(const struct kdbus_conn *conn, const char *name,
-+			uint64_t cookie, uint64_t flags, uint64_t timeout,
-+			int64_t priority, uint64_t dst_id, int cancel_fd);
-+int kdbus_msg_send_reply(const struct kdbus_conn *conn,
-+			 uint64_t reply_cookie,
-+			 uint64_t dst_id);
-+struct kdbus_conn *kdbus_hello(const char *path, uint64_t hello_flags,
-+			       const struct kdbus_item *item,
-+			       size_t item_size);
-+struct kdbus_conn *kdbus_hello_registrar(const char *path, const char *name,
-+					 const struct kdbus_policy_access *access,
-+					 size_t num_access, uint64_t flags);
-+struct kdbus_conn *kdbus_hello_activator(const char *path, const char *name,
-+					 const struct kdbus_policy_access *access,
-+					 size_t num_access);
-+bool kdbus_item_in_message(struct kdbus_msg *msg, uint64_t type);
-+int kdbus_bus_creator_info(struct kdbus_conn *conn,
-+			   uint64_t flags,
-+			   uint64_t *offset);
-+int kdbus_conn_info(struct kdbus_conn *conn, uint64_t id,
-+		    const char *name, uint64_t flags, uint64_t *offset);
-+void kdbus_conn_free(struct kdbus_conn *conn);
-+int kdbus_conn_update_attach_flags(struct kdbus_conn *conn,
-+				   uint64_t attach_flags_send,
-+				   uint64_t attach_flags_recv);
-+int kdbus_conn_update_policy(struct kdbus_conn *conn, const char *name,
-+			     const struct kdbus_policy_access *access,
-+			     size_t num_access);
-+
-+int kdbus_add_match_id(struct kdbus_conn *conn, uint64_t cookie,
-+		       uint64_t type, uint64_t id);
-+int kdbus_add_match_empty(struct kdbus_conn *conn);
-+
-+int all_uids_gids_are_mapped(void);
-+int drop_privileges(uid_t uid, gid_t gid);
-+uint64_t now(clockid_t clock);
-+char *unique_name(const char *prefix);
-+
-+int userns_map_uid_gid(pid_t pid, const char *map_uid, const char *map_gid);
-+int test_is_capable(int cap, ...);
-+int config_user_ns_is_enabled(void);
-+int config_auditsyscall_is_enabled(void);
-+int config_cgroups_is_enabled(void);
-+int config_security_is_enabled(void);
-diff --git a/tools/testing/selftests/kdbus/test-activator.c b/tools/testing/selftests/kdbus/test-activator.c
-new file mode 100644
-index 0000000..3d1b763
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-activator.c
-@@ -0,0 +1,318 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stdbool.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <poll.h>
-+#include <sys/capability.h>
-+#include <sys/types.h>
-+#include <sys/wait.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+static int kdbus_starter_poll(struct kdbus_conn *conn)
-+{
-+	int ret;
-+	struct pollfd fd;
-+
-+	fd.fd = conn->fd;
-+	fd.events = POLLIN | POLLPRI | POLLHUP;
-+	fd.revents = 0;
-+
-+	ret = poll(&fd, 1, 100);
-+	if (ret == 0)
-+		return -ETIMEDOUT;
-+	else if (ret > 0) {
-+		if (fd.revents & POLLIN)
-+			return 0;
-+
-+		if (fd.revents & (POLLHUP | POLLERR))
-+			ret = -ECONNRESET;
-+	}
-+
-+	return ret;
-+}
-+
-+/* Ensure that kdbus activator logic is safe */
-+static int kdbus_priv_activator(struct kdbus_test_env *env)
-+{
-+	int ret;
-+	struct kdbus_msg *msg = NULL;
-+	uint64_t cookie = 0xdeadbeef;
-+	uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
-+	struct kdbus_conn *activator;
-+	struct kdbus_conn *service;
-+	struct kdbus_conn *client;
-+	struct kdbus_conn *holder;
-+	struct kdbus_policy_access *access;
-+
-+	access = (struct kdbus_policy_access[]){
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = getuid(),
-+			.access = KDBUS_POLICY_OWN,
-+		},
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = getuid(),
-+			.access = KDBUS_POLICY_TALK,
-+		},
-+	};
-+
-+	activator = kdbus_hello_activator(env->buspath, "foo.priv.activator",
-+					  access, 2);
-+	ASSERT_RETURN(activator);
-+
-+	service = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(service);
-+
-+	client = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(client);
-+
-+	/*
-+	 * Make sure that other users can't TALK to the activator
-+	 */
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		/* Try to talk using the ID */
-+		ret = kdbus_msg_send(unpriv, NULL, 0xdeadbeef, 0, 0,
-+				     0, activator->id);
-+		ASSERT_EXIT(ret == -ENXIO);
-+
-+		/* Try to talk to the name */
-+		ret = kdbus_msg_send(unpriv, "foo.priv.activator",
-+				     0xdeadbeef, 0, 0, 0,
-+				     KDBUS_DST_ID_NAME);
-+		ASSERT_EXIT(ret == -EPERM);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/*
-+	 * Make sure that we did not receive anything, so the
-+	 * service will not be started automatically
-+	 */
-+
-+	ret = kdbus_starter_poll(activator);
-+	ASSERT_RETURN(ret == -ETIMEDOUT);
-+
-+	/*
-+	 * Now try to emulate the starter/service logic and
-+	 * acquire the name.
-+	 */
-+
-+	cookie++;
-+	ret = kdbus_msg_send(service, "foo.priv.activator", cookie,
-+			     0, 0, 0, KDBUS_DST_ID_NAME);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_starter_poll(activator);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Policies are still checked, access denied */
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "foo.priv.activator",
-+					 &flags);
-+		ASSERT_RETURN(ret == -EPERM);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ret = kdbus_name_acquire(service, "foo.priv.activator",
-+				 &flags);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* We read our previous starter message */
-+
-+	ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Try to talk, we still fail */
-+
-+	cookie++;
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		/* Try to talk to the name */
-+		ret = kdbus_msg_send(unpriv, "foo.priv.activator",
-+				     cookie, 0, 0, 0,
-+				     KDBUS_DST_ID_NAME);
-+		ASSERT_EXIT(ret == -EPERM);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/* Still nothing to read */
-+
-+	ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
-+	ASSERT_RETURN(ret == -ETIMEDOUT);
-+
-+	/* We receive everything now */
-+
-+	cookie++;
-+	ret = kdbus_msg_send(client, "foo.priv.activator", cookie,
-+			     0, 0, 0, KDBUS_DST_ID_NAME);
-+	ASSERT_RETURN(ret == 0);
-+	ret = kdbus_msg_recv_poll(service, 100, &msg, NULL);
-+	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
-+
-+	kdbus_msg_free(msg);
-+
-+	/* Policies default to deny TALK now */
-+	kdbus_conn_free(activator);
-+
-+	cookie++;
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		/* Try to talk to the name */
-+		ret = kdbus_msg_send(unpriv, "foo.priv.activator",
-+				     cookie, 0, 0, 0,
-+				     KDBUS_DST_ID_NAME);
-+		ASSERT_EXIT(ret == -EPERM);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
-+	ASSERT_RETURN(ret == -ETIMEDOUT);
-+
-+	/* Same user is able to TALK */
-+	cookie++;
-+	ret = kdbus_msg_send(client, "foo.priv.activator", cookie,
-+			     0, 0, 0, KDBUS_DST_ID_NAME);
-+	ASSERT_RETURN(ret == 0);
-+	ret = kdbus_msg_recv_poll(service, 100, &msg, NULL);
-+	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
-+
-+	kdbus_msg_free(msg);
-+
-+	access = (struct kdbus_policy_access []){
-+		{
-+			.type = KDBUS_POLICY_ACCESS_WORLD,
-+			.id = getuid(),
-+			.access = KDBUS_POLICY_TALK,
-+		},
-+	};
-+
-+	holder = kdbus_hello_registrar(env->buspath, "foo.priv.activator",
-+				       access, 1, KDBUS_HELLO_POLICY_HOLDER);
-+	ASSERT_RETURN(holder);
-+
-+	/* Now we are able to TALK to the name */
-+
-+	cookie++;
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		/* Try to talk to the name */
-+		ret = kdbus_msg_send(unpriv, "foo.priv.activator",
-+				     cookie, 0, 0, 0,
-+				     KDBUS_DST_ID_NAME);
-+		ASSERT_EXIT(ret == 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "foo.priv.activator",
-+					 &flags);
-+		ASSERT_RETURN(ret == -EPERM);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	kdbus_conn_free(service);
-+	kdbus_conn_free(client);
-+	kdbus_conn_free(holder);
-+
-+	return 0;
-+}
-+
-+int kdbus_test_activator(struct kdbus_test_env *env)
-+{
-+	int ret;
-+	struct kdbus_conn *activator;
-+	struct pollfd fds[2];
-+	bool activator_done = false;
-+	struct kdbus_policy_access access[2];
-+
-+	access[0].type = KDBUS_POLICY_ACCESS_USER;
-+	access[0].id = getuid();
-+	access[0].access = KDBUS_POLICY_OWN;
-+
-+	access[1].type = KDBUS_POLICY_ACCESS_WORLD;
-+	access[1].access = KDBUS_POLICY_TALK;
-+
-+	activator = kdbus_hello_activator(env->buspath, "foo.test.activator",
-+					  access, 2);
-+	ASSERT_RETURN(activator);
-+
-+	ret = kdbus_add_match_empty(env->conn);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_list(env->conn, KDBUS_LIST_NAMES |
-+				    KDBUS_LIST_UNIQUE |
-+				    KDBUS_LIST_ACTIVATORS |
-+				    KDBUS_LIST_QUEUED);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_send(env->conn, "foo.test.activator", 0xdeafbeef,
-+			     0, 0, 0, KDBUS_DST_ID_NAME);
-+	ASSERT_RETURN(ret == 0);
-+
-+	fds[0].fd = activator->fd;
-+	fds[1].fd = env->conn->fd;
-+
-+	kdbus_printf("-- entering poll loop ...\n");
-+
-+	for (;;) {
-+		int i, nfds = sizeof(fds) / sizeof(fds[0]);
-+
-+		for (i = 0; i < nfds; i++) {
-+			fds[i].events = POLLIN | POLLPRI;
-+			fds[i].revents = 0;
-+		}
-+
-+		ret = poll(fds, nfds, 3000);
-+		ASSERT_RETURN(ret >= 0);
-+
-+		ret = kdbus_list(env->conn, KDBUS_LIST_NAMES);
-+		ASSERT_RETURN(ret == 0);
-+
-+		if ((fds[0].revents & POLLIN) && !activator_done) {
-+			uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
-+
-+			kdbus_printf("Starter was called back!\n");
-+
-+			ret = kdbus_name_acquire(env->conn,
-+						 "foo.test.activator", &flags);
-+			ASSERT_RETURN(ret == 0);
-+
-+			activator_done = true;
-+		}
-+
-+		if (fds[1].revents & POLLIN) {
-+			kdbus_msg_recv(env->conn, NULL, NULL);
-+			break;
-+		}
-+	}
-+
-+	/* Check if all uids/gids are mapped */
-+	if (!all_uids_gids_are_mapped())
-+		return TEST_SKIP;
-+
-+	/* Check capabilities only now, so that the previous tests still run */
-+	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	if (!ret)
-+		return TEST_SKIP;
-+
-+	ret = kdbus_priv_activator(env);
-+	ASSERT_RETURN(ret == 0);
-+
-+	kdbus_conn_free(activator);
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-benchmark.c b/tools/testing/selftests/kdbus/test-benchmark.c
-new file mode 100644
-index 0000000..8a9744b
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-benchmark.c
-@@ -0,0 +1,451 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <locale.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <stdbool.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <poll.h>
-+#include <sys/time.h>
-+#include <sys/mman.h>
-+#include <sys/socket.h>
-+#include <math.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+#define SERVICE_NAME "foo.bar.echo"
-+
-+/*
-+ * To have a benchmark comparison with a unix socket, set:
-+ * use_memfd	= false;
-+ * compare_uds	= true;
-+ * attach_none	= true;		do not attach metadata
-+ */
-+
-+static bool use_memfd = true;		/* transmit memfd? */
-+static bool compare_uds = false;		/* unix-socket comparison? */
-+static bool attach_none = false;		/* clear attach-flags? */
-+static char stress_payload[8192];
-+
-+struct stats {
-+	uint64_t count;
-+	uint64_t latency_acc;
-+	uint64_t latency_low;
-+	uint64_t latency_high;
-+	uint64_t latency_avg;
-+	uint64_t latency_ssquares;
-+};
-+
-+static struct stats stats;
-+
-+static void reset_stats(void)
-+{
-+	stats.count = 0;
-+	stats.latency_acc = 0;
-+	stats.latency_low = UINT64_MAX;
-+	stats.latency_high = 0;
-+	stats.latency_avg = 0;
-+	stats.latency_ssquares = 0;
-+}
-+
-+static void dump_stats(bool is_uds)
-+{
-+	if (stats.count > 0) {
-+		kdbus_printf("stats %s: %'llu packets processed, latency (nsecs) min/max/avg/dev %'7llu // %'7llu // %'7llu // %'7.f\n",
-+			     is_uds ? " (UNIX)" : "(KDBUS)",
-+			     (unsigned long long) stats.count,
-+			     (unsigned long long) stats.latency_low,
-+			     (unsigned long long) stats.latency_high,
-+			     (unsigned long long) stats.latency_avg,
-+			     sqrt(stats.latency_ssquares / stats.count));
-+	} else {
-+		kdbus_printf("*** no packets received. bus stuck?\n");
-+	}
-+}
-+
-+static void add_stats(uint64_t prev)
-+{
-+	uint64_t diff, latency_avg_prev;
-+
-+	diff = now(CLOCK_THREAD_CPUTIME_ID) - prev;
-+
-+	stats.count++;
-+	stats.latency_acc += diff;
-+
-+	/* see Welford62 */
-+	latency_avg_prev = stats.latency_avg;
-+	stats.latency_avg = stats.latency_acc / stats.count;
-+	stats.latency_ssquares += (diff - latency_avg_prev) * (diff - stats.latency_avg);
-+
-+	if (stats.latency_low > diff)
-+		stats.latency_low = diff;
-+
-+	if (stats.latency_high < diff)
-+		stats.latency_high = diff;
-+}
-+
-+static int setup_simple_kdbus_msg(struct kdbus_conn *conn,
-+				  uint64_t dst_id,
-+				  struct kdbus_msg **msg_out)
-+{
-+	struct kdbus_msg *msg;
-+	struct kdbus_item *item;
-+	uint64_t size;
-+
-+	size = sizeof(struct kdbus_msg);
-+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+	msg = malloc(size);
-+	ASSERT_RETURN_VAL(msg, -ENOMEM);
-+
-+	memset(msg, 0, size);
-+	msg->size = size;
-+	msg->src_id = conn->id;
-+	msg->dst_id = dst_id;
-+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+	item = msg->items;
-+
-+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+	item->vec.address = (uintptr_t) stress_payload;
-+	item->vec.size = sizeof(stress_payload);
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	*msg_out = msg;
-+
-+	return 0;
-+}
-+
-+static int setup_memfd_kdbus_msg(struct kdbus_conn *conn,
-+				 uint64_t dst_id,
-+				 off_t *memfd_item_offset,
-+				 struct kdbus_msg **msg_out)
-+{
-+	struct kdbus_msg *msg;
-+	struct kdbus_item *item;
-+	uint64_t size;
-+
-+	size = sizeof(struct kdbus_msg);
-+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
-+
-+	msg = malloc(size);
-+	ASSERT_RETURN_VAL(msg, -ENOMEM);
-+
-+	memset(msg, 0, size);
-+	msg->size = size;
-+	msg->src_id = conn->id;
-+	msg->dst_id = dst_id;
-+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+	item = msg->items;
-+
-+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+	item->vec.address = (uintptr_t) stress_payload;
-+	item->vec.size = sizeof(stress_payload);
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_memfd);
-+	item->memfd.size = sizeof(uint64_t);
-+
-+	*memfd_item_offset = (unsigned char *)item - (unsigned char *)msg;
-+	*msg_out = msg;
-+
-+	return 0;
-+}
-+
-+static int
-+send_echo_request(struct kdbus_conn *conn, uint64_t dst_id,
-+		  void *kdbus_msg, off_t memfd_item_offset)
-+{
-+	struct kdbus_cmd_send cmd = {};
-+	int memfd = -1;
-+	int ret;
-+
-+	if (use_memfd) {
-+		uint64_t now_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+		struct kdbus_item *item = memfd_item_offset + kdbus_msg;
-+		memfd = sys_memfd_create("memfd-name", 0);
-+		ASSERT_RETURN_VAL(memfd >= 0, memfd);
-+
-+		ret = write(memfd, &now_ns, sizeof(now_ns));
-+		ASSERT_RETURN_VAL(ret == sizeof(now_ns), -EAGAIN);
-+
-+		ret = sys_memfd_seal_set(memfd);
-+		ASSERT_RETURN_VAL(ret == 0, -errno);
-+
-+		item->memfd.fd = memfd;
-+	}
-+
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)kdbus_msg;
-+
-+	ret = kdbus_cmd_send(conn->fd, &cmd);
-+	ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+	close(memfd);
-+
-+	return 0;
-+}
-+
-+static int
-+handle_echo_reply(struct kdbus_conn *conn, uint64_t send_ns)
-+{
-+	int ret;
-+	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
-+	struct kdbus_msg *msg;
-+	const struct kdbus_item *item;
-+	bool has_memfd = false;
-+
-+	ret = kdbus_cmd_recv(conn->fd, &recv);
-+	if (ret == -EAGAIN)
-+		return ret;
-+
-+	ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+	if (!use_memfd)
-+		goto out;
-+
-+	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
-+
-+	KDBUS_ITEM_FOREACH(item, msg, items) {
-+		switch (item->type) {
-+		case KDBUS_ITEM_PAYLOAD_MEMFD: {
-+			char *buf;
-+
-+			buf = mmap(NULL, item->memfd.size, PROT_READ,
-+				   MAP_PRIVATE, item->memfd.fd, 0);
-+			ASSERT_RETURN_VAL(buf != MAP_FAILED, -EINVAL);
-+			ASSERT_RETURN_VAL(item->memfd.size == sizeof(uint64_t),
-+					  -EINVAL);
-+
-+			add_stats(*(uint64_t*)buf);
-+			munmap(buf, item->memfd.size);
-+			close(item->memfd.fd);
-+			has_memfd = true;
-+			break;
-+		}
-+
-+		case KDBUS_ITEM_PAYLOAD_OFF:
-+			/* ignore */
-+			break;
-+		}
-+	}
-+
-+out:
-+	if (!has_memfd)
-+		add_stats(send_ns);
-+
-+	ret = kdbus_free(conn, recv.msg.offset);
-+	ASSERT_RETURN_VAL(ret == 0, -errno);
-+
-+	return 0;
-+}
-+
-+static int benchmark(struct kdbus_test_env *env)
-+{
-+	static char buf[sizeof(stress_payload)];
-+	struct kdbus_msg *kdbus_msg = NULL;
-+	off_t memfd_cached_offset = 0;
-+	int ret;
-+	struct kdbus_conn *conn_a, *conn_b;
-+	struct pollfd fds[2];
-+	uint64_t start, send_ns, now_ns, diff;
-+	unsigned int i;
-+	int uds[2];
-+
-+	setlocale(LC_ALL, "");
-+
-+	for (i = 0; i < sizeof(stress_payload); i++)
-+		stress_payload[i] = i;
-+
-+	/* setup kdbus pair */
-+
-+	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn_a && conn_b);
-+
-+	ret = kdbus_add_match_empty(conn_a);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_add_match_empty(conn_b);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_name_acquire(conn_a, SERVICE_NAME, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	if (attach_none) {
-+		ret = kdbus_conn_update_attach_flags(conn_a,
-+						     _KDBUS_ATTACH_ALL,
-+						     0);
-+		ASSERT_RETURN(ret == 0);
-+	}
-+
-+	/* setup UDS pair */
-+
-+	ret = socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_NONBLOCK, 0, uds);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* setup a kdbus msg now */
-+	if (use_memfd) {
-+		ret = setup_memfd_kdbus_msg(conn_b, conn_a->id,
-+					    &memfd_cached_offset,
-+					    &kdbus_msg);
-+		ASSERT_RETURN(ret == 0);
-+	} else {
-+		ret = setup_simple_kdbus_msg(conn_b, conn_a->id, &kdbus_msg);
-+		ASSERT_RETURN(ret == 0);
-+	}
-+
-+	/* start benchmark */
-+
-+	kdbus_printf("-- entering poll loop ...\n");
-+
-+	do {
-+		/* run kdbus benchmark */
-+		fds[0].fd = conn_a->fd;
-+		fds[1].fd = conn_b->fd;
-+
-+		/* cancel any pending message */
-+		handle_echo_reply(conn_a, 0);
-+
-+		start = now(CLOCK_THREAD_CPUTIME_ID);
-+		reset_stats();
-+
-+		send_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+		ret = send_echo_request(conn_b, conn_a->id,
-+					kdbus_msg, memfd_cached_offset);
-+		ASSERT_RETURN(ret == 0);
-+
-+		while (1) {
-+			unsigned int nfds = sizeof(fds) / sizeof(fds[0]);
-+			unsigned int i;
-+
-+			for (i = 0; i < nfds; i++) {
-+				fds[i].events = POLLIN | POLLPRI | POLLHUP;
-+				fds[i].revents = 0;
-+			}
-+
-+			ret = poll(fds, nfds, 10);
-+			if (ret < 0)
-+				break;
-+
-+			if (fds[0].revents & POLLIN) {
-+				ret = handle_echo_reply(conn_a, send_ns);
-+				ASSERT_RETURN(ret == 0);
-+
-+				send_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+				ret = send_echo_request(conn_b, conn_a->id,
-+							kdbus_msg,
-+							memfd_cached_offset);
-+				ASSERT_RETURN(ret == 0);
-+			}
-+
-+			now_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+			diff = now_ns - start;
-+			if (diff > 1000000000ULL) {
-+				start = now_ns;
-+
-+				dump_stats(false);
-+				break;
-+			}
-+		}
-+
-+		if (!compare_uds)
-+			continue;
-+
-+		/* run unix-socket benchmark as comparison */
-+
-+		fds[0].fd = uds[0];
-+		fds[1].fd = uds[1];
-+
-+		/* cancel any pending message */
-+		read(uds[1], buf, sizeof(buf));
-+
-+		start = now(CLOCK_THREAD_CPUTIME_ID);
-+		reset_stats();
-+
-+		send_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+		ret = write(uds[0], stress_payload, sizeof(stress_payload));
-+		ASSERT_RETURN(ret == sizeof(stress_payload));
-+
-+		while (1) {
-+			unsigned int nfds = sizeof(fds) / sizeof(fds[0]);
-+			unsigned int i;
-+
-+			for (i = 0; i < nfds; i++) {
-+				fds[i].events = POLLIN | POLLPRI | POLLHUP;
-+				fds[i].revents = 0;
-+			}
-+
-+			ret = poll(fds, nfds, 10);
-+			if (ret < 0)
-+				break;
-+
-+			if (fds[1].revents & POLLIN) {
-+				ret = read(uds[1], buf, sizeof(buf));
-+				ASSERT_RETURN(ret == sizeof(buf));
-+
-+				add_stats(send_ns);
-+
-+				send_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+				ret = write(uds[0], buf, sizeof(buf));
-+				ASSERT_RETURN(ret == sizeof(buf));
-+			}
-+
-+			now_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+			diff = now_ns - start;
-+			if (diff > 1000000000ULL) {
-+				start = now_ns;
-+
-+				dump_stats(true);
-+				break;
-+			}
-+		}
-+
-+	} while (kdbus_util_verbose);
-+
-+	kdbus_printf("-- closing bus connections\n");
-+
-+	free(kdbus_msg);
-+
-+	kdbus_conn_free(conn_a);
-+	kdbus_conn_free(conn_b);
-+
-+	return (stats.count > 1) ? TEST_OK : TEST_ERR;
-+}
-+
-+int kdbus_test_benchmark(struct kdbus_test_env *env)
-+{
-+	use_memfd = true;
-+	attach_none = false;
-+	compare_uds = false;
-+	return benchmark(env);
-+}
-+
-+int kdbus_test_benchmark_nomemfds(struct kdbus_test_env *env)
-+{
-+	use_memfd = false;
-+	attach_none = false;
-+	compare_uds = false;
-+	return benchmark(env);
-+}
-+
-+int kdbus_test_benchmark_uds(struct kdbus_test_env *env)
-+{
-+	use_memfd = false;
-+	attach_none = true;
-+	compare_uds = true;
-+	return benchmark(env);
-+}
-diff --git a/tools/testing/selftests/kdbus/test-bus.c b/tools/testing/selftests/kdbus/test-bus.c
-new file mode 100644
-index 0000000..762fb30
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-bus.c
-@@ -0,0 +1,175 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <limits.h>
-+#include <sys/mman.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+static struct kdbus_item *kdbus_get_item(struct kdbus_info *info,
-+					 uint64_t type)
-+{
-+	struct kdbus_item *item;
-+
-+	KDBUS_ITEM_FOREACH(item, info, items)
-+		if (item->type == type)
-+			return item;
-+
-+	return NULL;
-+}
-+
-+static int test_bus_creator_info(const char *bus_path)
-+{
-+	int ret;
-+	uint64_t offset;
-+	struct kdbus_conn *conn;
-+	struct kdbus_info *info;
-+	struct kdbus_item *item;
-+	char *tmp, *busname;
-+
-+	/* extract the bus-name from @bus_path */
-+	tmp = strdup(bus_path);
-+	ASSERT_RETURN(tmp);
-+	busname = strrchr(tmp, '/');
-+	ASSERT_RETURN(busname);
-+	*busname = 0;
-+	busname = strrchr(tmp, '/');
-+	ASSERT_RETURN(busname);
-+	++busname;
-+
-+	conn = kdbus_hello(bus_path, 0, NULL, 0);
-+	ASSERT_RETURN(conn);
-+
-+	ret = kdbus_bus_creator_info(conn, _KDBUS_ATTACH_ALL, &offset);
-+	ASSERT_RETURN(ret == 0);
-+
-+	info = (struct kdbus_info *)(conn->buf + offset);
-+
-+	item = kdbus_get_item(info, KDBUS_ITEM_MAKE_NAME);
-+	ASSERT_RETURN(item);
-+	ASSERT_RETURN(!strcmp(item->str, busname));
-+
-+	ret = kdbus_free(conn, offset);
-+	ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+	free(tmp);
-+	kdbus_conn_free(conn);
-+	return 0;
-+}
-+
-+int kdbus_test_bus_make(struct kdbus_test_env *env)
-+{
-+	struct {
-+		struct kdbus_cmd cmd;
-+
-+		/* bloom size item */
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			struct kdbus_bloom_parameter bloom;
-+		} bs;
-+
-+		/* name item */
-+		uint64_t n_size;
-+		uint64_t n_type;
-+		char name[64];
-+	} bus_make;
-+	char s[PATH_MAX], *name;
-+	int ret, control_fd2;
-+	uid_t uid;
-+
-+	name = unique_name("");
-+	ASSERT_RETURN(name);
-+
-+	snprintf(s, sizeof(s), "%s/control", env->root);
-+	env->control_fd = open(s, O_RDWR|O_CLOEXEC);
-+	ASSERT_RETURN(env->control_fd >= 0);
-+
-+	control_fd2 = open(s, O_RDWR|O_CLOEXEC);
-+	ASSERT_RETURN(control_fd2 >= 0);
-+
-+	memset(&bus_make, 0, sizeof(bus_make));
-+
-+	bus_make.bs.size = sizeof(bus_make.bs);
-+	bus_make.bs.type = KDBUS_ITEM_BLOOM_PARAMETER;
-+	bus_make.bs.bloom.size = 64;
-+	bus_make.bs.bloom.n_hash = 1;
-+
-+	bus_make.n_type = KDBUS_ITEM_MAKE_NAME;
-+
-+	uid = getuid();
-+
-+	/* missing uid prefix */
-+	snprintf(bus_make.name, sizeof(bus_make.name), "foo");
-+	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
-+	bus_make.cmd.size = sizeof(struct kdbus_cmd) +
-+			    sizeof(bus_make.bs) + bus_make.n_size;
-+	ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	/* non alphanumeric character */
-+	snprintf(bus_make.name, sizeof(bus_make.name), "%u-blah@123", uid);
-+	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
-+	bus_make.cmd.size = sizeof(struct kdbus_cmd) +
-+			    sizeof(bus_make.bs) + bus_make.n_size;
-+	ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	/* '-' at the end */
-+	snprintf(bus_make.name, sizeof(bus_make.name), "%u-blah-", uid);
-+	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
-+	bus_make.cmd.size = sizeof(struct kdbus_cmd) +
-+			    sizeof(bus_make.bs) + bus_make.n_size;
-+	ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	/* create a new bus */
-+	snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-1", uid, name);
-+	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
-+	bus_make.cmd.size = sizeof(struct kdbus_cmd) +
-+			    sizeof(bus_make.bs) + bus_make.n_size;
-+	ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_cmd_bus_make(control_fd2, &bus_make.cmd);
-+	ASSERT_RETURN(ret == -EEXIST);
-+
-+	snprintf(s, sizeof(s), "%s/%u-%s-1/bus", env->root, uid, name);
-+	ASSERT_RETURN(access(s, F_OK) == 0);
-+
-+	ret = test_bus_creator_info(s);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* can't use the same fd for bus make twice, even though a different
-+	 * bus name is used
-+	 */
-+	snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-2", uid, name);
-+	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
-+	bus_make.cmd.size = sizeof(struct kdbus_cmd) +
-+			    sizeof(bus_make.bs) + bus_make.n_size;
-+	ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
-+	ASSERT_RETURN(ret == -EBADFD);
-+
-+	/* create a new bus, with different fd and different bus name */
-+	snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-2", uid, name);
-+	bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
-+	bus_make.cmd.size = sizeof(struct kdbus_cmd) +
-+			    sizeof(bus_make.bs) + bus_make.n_size;
-+	ret = kdbus_cmd_bus_make(control_fd2, &bus_make.cmd);
-+	ASSERT_RETURN(ret == 0);
-+
-+	close(control_fd2);
-+	free(name);
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-chat.c b/tools/testing/selftests/kdbus/test-chat.c
-new file mode 100644
-index 0000000..41e5b53
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-chat.c
-@@ -0,0 +1,124 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <poll.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+int kdbus_test_chat(struct kdbus_test_env *env)
-+{
-+	int ret, cookie;
-+	struct kdbus_conn *conn_a, *conn_b;
-+	struct pollfd fds[2];
-+	uint64_t flags;
-+	int count;
-+
-+	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn_a && conn_b);
-+
-+	flags = KDBUS_NAME_ALLOW_REPLACEMENT;
-+	ret = kdbus_name_acquire(conn_a, "foo.bar.test", &flags);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_name_acquire(conn_a, "foo.bar.baz", NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	flags = KDBUS_NAME_QUEUE;
-+	ret = kdbus_name_acquire(conn_b, "foo.bar.baz", &flags);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_name_acquire(conn_a, "foo.bar.double", NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	flags = 0;
-+	ret = kdbus_name_acquire(conn_a, "foo.bar.double", &flags);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(!(flags & KDBUS_NAME_ACQUIRED));
-+
-+	ret = kdbus_name_release(conn_a, "foo.bar.double");
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_name_release(conn_a, "foo.bar.double");
-+	ASSERT_RETURN(ret == -ESRCH);
-+
-+	ret = kdbus_list(conn_b, KDBUS_LIST_UNIQUE |
-+				 KDBUS_LIST_NAMES  |
-+				 KDBUS_LIST_QUEUED |
-+				 KDBUS_LIST_ACTIVATORS);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_add_match_empty(conn_a);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_add_match_empty(conn_b);
-+	ASSERT_RETURN(ret == 0);
-+
-+	cookie = 0;
-+	ret = kdbus_msg_send(conn_b, NULL, 0xc0000000 | cookie, 0, 0, 0,
-+			     KDBUS_DST_ID_BROADCAST);
-+	ASSERT_RETURN(ret == 0);
-+
-+	fds[0].fd = conn_a->fd;
-+	fds[1].fd = conn_b->fd;
-+
-+	kdbus_printf("-- entering poll loop ...\n");
-+
-+	for (count = 0;; count++) {
-+		int i, nfds = sizeof(fds) / sizeof(fds[0]);
-+
-+		for (i = 0; i < nfds; i++) {
-+			fds[i].events = POLLIN | POLLPRI | POLLHUP;
-+			fds[i].revents = 0;
-+		}
-+
-+		ret = poll(fds, nfds, 3000);
-+		ASSERT_RETURN(ret >= 0);
-+
-+		if (fds[0].revents & POLLIN) {
-+			if (count > 2)
-+				kdbus_name_release(conn_a, "foo.bar.baz");
-+
-+			ret = kdbus_msg_recv(conn_a, NULL, NULL);
-+			ASSERT_RETURN(ret == 0);
-+			ret = kdbus_msg_send(conn_a, NULL,
-+					     0xc0000000 | cookie++,
-+					     0, 0, 0, conn_b->id);
-+			ASSERT_RETURN(ret == 0);
-+		}
-+
-+		if (fds[1].revents & POLLIN) {
-+			ret = kdbus_msg_recv(conn_b, NULL, NULL);
-+			ASSERT_RETURN(ret == 0);
-+			ret = kdbus_msg_send(conn_b, NULL,
-+					     0xc0000000 | cookie++,
-+					     0, 0, 0, conn_a->id);
-+			ASSERT_RETURN(ret == 0);
-+		}
-+
-+		ret = kdbus_list(conn_b, KDBUS_LIST_UNIQUE |
-+					 KDBUS_LIST_NAMES  |
-+					 KDBUS_LIST_QUEUED |
-+					 KDBUS_LIST_ACTIVATORS);
-+		ASSERT_RETURN(ret == 0);
-+
-+		if (count > 10)
-+			break;
-+	}
-+
-+	kdbus_printf("-- closing bus connections\n");
-+	kdbus_conn_free(conn_a);
-+	kdbus_conn_free(conn_b);
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-connection.c b/tools/testing/selftests/kdbus/test-connection.c
-new file mode 100644
-index 0000000..4688ce8
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-connection.c
-@@ -0,0 +1,597 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <limits.h>
-+#include <sys/types.h>
-+#include <sys/capability.h>
-+#include <sys/mman.h>
-+#include <sys/syscall.h>
-+#include <sys/wait.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+int kdbus_test_hello(struct kdbus_test_env *env)
-+{
-+	struct kdbus_cmd_free cmd_free = {};
-+	struct kdbus_cmd_hello hello;
-+	int fd, ret;
-+
-+	memset(&hello, 0, sizeof(hello));
-+
-+	fd = open(env->buspath, O_RDWR|O_CLOEXEC);
-+	ASSERT_RETURN(fd >= 0);
-+
-+	hello.flags = KDBUS_HELLO_ACCEPT_FD;
-+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+	hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
-+	hello.size = sizeof(struct kdbus_cmd_hello);
-+	hello.pool_size = POOL_SIZE;
-+
-+	/* an unaligned hello must result in -EFAULT */
-+	ret = kdbus_cmd_hello(fd, (struct kdbus_cmd_hello *) ((char *) &hello + 1));
-+	ASSERT_RETURN(ret == -EFAULT);
-+
-+	/* a size too small to hold the struct must return -EINVAL */
-+	hello.size = 1;
-+	hello.flags = KDBUS_HELLO_ACCEPT_FD;
-+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+	ret = kdbus_cmd_hello(fd, &hello);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	hello.size = sizeof(struct kdbus_cmd_hello);
-+
-+	/* check faulty flags */
-+	hello.flags = 1ULL << 32;
-+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+	ret = kdbus_cmd_hello(fd, &hello);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	/* check for faulty pool sizes */
-+	hello.pool_size = 0;
-+	hello.flags = KDBUS_HELLO_ACCEPT_FD;
-+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+	ret = kdbus_cmd_hello(fd, &hello);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	hello.pool_size = 4097;
-+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+	ret = kdbus_cmd_hello(fd, &hello);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	hello.pool_size = POOL_SIZE;
-+
-+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+	hello.offset = (__u64)-1;
-+
-+	/* success test */
-+	ret = kdbus_cmd_hello(fd, &hello);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* The kernel should have returned some items */
-+	ASSERT_RETURN(hello.offset != (__u64)-1);
-+	cmd_free.size = sizeof(cmd_free);
-+	cmd_free.offset = hello.offset;
-+	ret = kdbus_cmd_free(fd, &cmd_free);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	close(fd);
-+
-+	fd = open(env->buspath, O_RDWR|O_CLOEXEC);
-+	ASSERT_RETURN(fd >= 0);
-+
-+	/* no ACTIVATOR flag without a name */
-+	hello.flags = KDBUS_HELLO_ACTIVATOR;
-+	ret = kdbus_cmd_hello(fd, &hello);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	close(fd);
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_byebye(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *conn;
-+	struct kdbus_cmd_recv cmd_recv = { .size = sizeof(cmd_recv) };
-+	struct kdbus_cmd cmd_byebye = { .size = sizeof(cmd_byebye) };
-+	int ret;
-+
-+	/* create a 2nd connection */
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn != NULL);
-+
-+	ret = kdbus_add_match_empty(conn);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_add_match_empty(env->conn);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* send over 1st connection */
-+	ret = kdbus_msg_send(env->conn, NULL, 0, 0, 0, 0,
-+			     KDBUS_DST_ID_BROADCAST);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* say byebye on the 2nd, which must fail */
-+	ret = kdbus_cmd_byebye(conn->fd, &cmd_byebye);
-+	ASSERT_RETURN(ret == -EBUSY);
-+
-+	/* receive the message */
-+	ret = kdbus_cmd_recv(conn->fd, &cmd_recv);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_free(conn, cmd_recv.msg.offset);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* and try again */
-+	ret = kdbus_cmd_byebye(conn->fd, &cmd_byebye);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* a 2nd try should result in -ECONNRESET */
-+	ret = kdbus_cmd_byebye(conn->fd, &cmd_byebye);
-+	ASSERT_RETURN(ret == -ECONNRESET);
-+
-+	kdbus_conn_free(conn);
-+
-+	return TEST_OK;
-+}
-+
-+/* Get only the first item */
-+static struct kdbus_item *kdbus_get_item(struct kdbus_info *info,
-+					 uint64_t type)
-+{
-+	struct kdbus_item *item;
-+
-+	KDBUS_ITEM_FOREACH(item, info, items)
-+		if (item->type == type)
-+			return item;
-+
-+	return NULL;
-+}
-+
-+static unsigned int kdbus_count_item(struct kdbus_info *info,
-+				     uint64_t type)
-+{
-+	unsigned int i = 0;
-+	const struct kdbus_item *item;
-+
-+	KDBUS_ITEM_FOREACH(item, info, items)
-+		if (item->type == type)
-+			i++;
-+
-+	return i;
-+}
-+
-+static int kdbus_fuzz_conn_info(struct kdbus_test_env *env, int capable)
-+{
-+	int ret;
-+	unsigned int cnt = 0;
-+	uint64_t offset = 0;
-+	struct kdbus_info *info;
-+	struct kdbus_conn *conn;
-+	struct kdbus_conn *privileged;
-+	const struct kdbus_item *item;
-+	uint64_t valid_flags = KDBUS_ATTACH_NAMES |
-+			       KDBUS_ATTACH_CREDS |
-+			       KDBUS_ATTACH_PIDS |
-+			       KDBUS_ATTACH_CONN_DESCRIPTION;
-+
-+	uint64_t invalid_flags = KDBUS_ATTACH_NAMES	|
-+				 KDBUS_ATTACH_CREDS	|
-+				 KDBUS_ATTACH_PIDS	|
-+				 KDBUS_ATTACH_CAPS	|
-+				 KDBUS_ATTACH_CGROUP	|
-+				 KDBUS_ATTACH_CONN_DESCRIPTION;
-+
-+	struct kdbus_creds cached_creds;
-+	uid_t ruid, euid, suid;
-+	gid_t rgid, egid, sgid;
-+
-+	getresuid(&ruid, &euid, &suid);
-+	getresgid(&rgid, &egid, &sgid);
-+
-+	cached_creds.uid = ruid;
-+	cached_creds.euid = euid;
-+	cached_creds.suid = suid;
-+	cached_creds.fsuid = ruid;
-+
-+	cached_creds.gid = rgid;
-+	cached_creds.egid = egid;
-+	cached_creds.sgid = sgid;
-+	cached_creds.fsgid = rgid;
-+
-+	struct kdbus_pids cached_pids = {
-+		.pid	= getpid(),
-+		.tid	= syscall(SYS_gettid),
-+		.ppid	= getppid(),
-+	};
-+
-+	ret = kdbus_conn_info(env->conn, env->conn->id, NULL,
-+			      valid_flags, &offset);
-+	ASSERT_RETURN(ret == 0);
-+
-+	info = (struct kdbus_info *)(env->conn->buf + offset);
-+	ASSERT_RETURN(info->id == env->conn->id);
-+
-+	/* We do not have any well-known name */
-+	item = kdbus_get_item(info, KDBUS_ITEM_NAME);
-+	ASSERT_RETURN(item == NULL);
-+
-+	item = kdbus_get_item(info, KDBUS_ITEM_CONN_DESCRIPTION);
-+	if (valid_flags & KDBUS_ATTACH_CONN_DESCRIPTION) {
-+		ASSERT_RETURN(item);
-+	} else {
-+		ASSERT_RETURN(item == NULL);
-+	}
-+
-+	kdbus_free(env->conn, offset);
-+
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn);
-+
-+	privileged = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(privileged);
-+
-+	ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
-+	ASSERT_RETURN(ret == 0);
-+
-+	info = (struct kdbus_info *)(conn->buf + offset);
-+	ASSERT_RETURN(info->id == conn->id);
-+
-+	/* We do not have any well-known name */
-+	item = kdbus_get_item(info, KDBUS_ITEM_NAME);
-+	ASSERT_RETURN(item == NULL);
-+
-+	cnt = kdbus_count_item(info, KDBUS_ITEM_CREDS);
-+	if (valid_flags & KDBUS_ATTACH_CREDS) {
-+		ASSERT_RETURN(cnt == 1);
-+
-+		item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
-+		ASSERT_RETURN(item);
-+
-+		/* Compare received items with cached creds */
-+		ASSERT_RETURN(memcmp(&item->creds, &cached_creds,
-+				      sizeof(struct kdbus_creds)) == 0);
-+	} else {
-+		ASSERT_RETURN(cnt == 0);
-+	}
-+
-+	item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
-+	if (valid_flags & KDBUS_ATTACH_PIDS) {
-+		ASSERT_RETURN(item);
-+
-+		/* Compare item->pids with cached PIDs */
-+		ASSERT_RETURN(item->pids.pid == cached_pids.pid &&
-+			      item->pids.tid == cached_pids.tid &&
-+			      item->pids.ppid == cached_pids.ppid);
-+	} else {
-+		ASSERT_RETURN(item == NULL);
-+	}
-+
-+	/* We did not request KDBUS_ITEM_CAPS */
-+	item = kdbus_get_item(info, KDBUS_ITEM_CAPS);
-+	ASSERT_RETURN(item == NULL);
-+
-+	kdbus_free(conn, offset);
-+
-+	ret = kdbus_name_acquire(conn, "com.example.a", NULL);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
-+	ASSERT_RETURN(ret == 0);
-+
-+	info = (struct kdbus_info *)(conn->buf + offset);
-+	ASSERT_RETURN(info->id == conn->id);
-+
-+	item = kdbus_get_item(info, KDBUS_ITEM_OWNED_NAME);
-+	if (valid_flags & KDBUS_ATTACH_NAMES) {
-+		ASSERT_RETURN(item && !strcmp(item->name.name, "com.example.a"));
-+	} else {
-+		ASSERT_RETURN(item == NULL);
-+	}
-+
-+	kdbus_free(conn, offset);
-+
-+	ret = kdbus_conn_info(conn, 0, "com.example.a", valid_flags, &offset);
-+	ASSERT_RETURN(ret == 0);
-+
-+	info = (struct kdbus_info *)(conn->buf + offset);
-+	ASSERT_RETURN(info->id == conn->id);
-+
-+	kdbus_free(conn, offset);
-+
-+	/* does not have the necessary caps to drop to unprivileged */
-+	if (!capable)
-+		goto continue_test;
-+
-+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
-+		ret = kdbus_conn_info(conn, conn->id, NULL,
-+				      valid_flags, &offset);
-+		ASSERT_EXIT(ret == 0);
-+
-+		info = (struct kdbus_info *)(conn->buf + offset);
-+		ASSERT_EXIT(info->id == conn->id);
-+
-+		if (valid_flags & KDBUS_ATTACH_NAMES) {
-+			item = kdbus_get_item(info, KDBUS_ITEM_OWNED_NAME);
-+			ASSERT_EXIT(item &&
-+				    strcmp(item->name.name,
-+				           "com.example.a") == 0);
-+		}
-+
-+		if (valid_flags & KDBUS_ATTACH_CREDS) {
-+			item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
-+			ASSERT_EXIT(item);
-+
-+			/* Compare received items with cached creds */
-+			ASSERT_EXIT(memcmp(&item->creds, &cached_creds,
-+				    sizeof(struct kdbus_creds)) == 0);
-+		}
-+
-+		if (valid_flags & KDBUS_ATTACH_PIDS) {
-+			item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
-+			ASSERT_EXIT(item);
-+
-+			/*
-+			 * Compare item->pids with cached pids of
-+			 * privileged one.
-+			 *
-+			 * cmd_info will always return cached pids.
-+			 */
-+			ASSERT_EXIT(item->pids.pid == cached_pids.pid &&
-+				    item->pids.tid == cached_pids.tid);
-+		}
-+
-+		kdbus_free(conn, offset);
-+
-+		/*
-+		 * Use invalid_flags and make sure that userspace
-+		 * does not play with us.
-+		 */
-+		ret = kdbus_conn_info(conn, conn->id, NULL,
-+				      invalid_flags, &offset);
-+		ASSERT_EXIT(ret == 0);
-+
-+		/*
-+		 * Make sure that we return only one creds item and
-+		 * it points to the cached creds.
-+		 */
-+		cnt = kdbus_count_item(info, KDBUS_ITEM_CREDS);
-+		if (invalid_flags & KDBUS_ATTACH_CREDS) {
-+			ASSERT_EXIT(cnt == 1);
-+
-+			item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
-+			ASSERT_EXIT(item);
-+
-+			/* Compare received items with cached creds */
-+			ASSERT_EXIT(memcmp(&item->creds, &cached_creds,
-+				    sizeof(struct kdbus_creds)) == 0);
-+		} else {
-+			ASSERT_EXIT(cnt == 0);
-+		}
-+
-+		if (invalid_flags & KDBUS_ATTACH_PIDS) {
-+			cnt = kdbus_count_item(info, KDBUS_ITEM_PIDS);
-+			ASSERT_EXIT(cnt == 1);
-+
-+			item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
-+			ASSERT_EXIT(item);
-+
-+			/* Compare item->pids with cached pids */
-+			ASSERT_EXIT(item->pids.pid == cached_pids.pid &&
-+				    item->pids.tid == cached_pids.tid);
-+		}
-+
-+		cnt = kdbus_count_item(info, KDBUS_ITEM_CGROUP);
-+		if (invalid_flags & KDBUS_ATTACH_CGROUP) {
-+			ASSERT_EXIT(cnt == 1);
-+		} else {
-+			ASSERT_EXIT(cnt == 0);
-+		}
-+
-+		cnt = kdbus_count_item(info, KDBUS_ITEM_CAPS);
-+		if (invalid_flags & KDBUS_ATTACH_CAPS) {
-+			ASSERT_EXIT(cnt == 1);
-+		} else {
-+			ASSERT_EXIT(cnt == 0);
-+		}
-+
-+		kdbus_free(conn, offset);
-+	}),
-+	({ 0; }));
-+	ASSERT_RETURN(ret == 0);
-+
-+continue_test:
-+
-+	/* A second name */
-+	ret = kdbus_name_acquire(conn, "com.example.b", NULL);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
-+	ASSERT_RETURN(ret == 0);
-+
-+	info = (struct kdbus_info *)(conn->buf + offset);
-+	ASSERT_RETURN(info->id == conn->id);
-+
-+	cnt = kdbus_count_item(info, KDBUS_ITEM_OWNED_NAME);
-+	if (valid_flags & KDBUS_ATTACH_NAMES) {
-+		ASSERT_RETURN(cnt == 2);
-+	} else {
-+		ASSERT_RETURN(cnt == 0);
-+	}
-+
-+	kdbus_free(conn, offset);
-+
-+	ASSERT_RETURN(ret == 0);
-+
-+	return 0;
-+}
-+
-+int kdbus_test_conn_info(struct kdbus_test_env *env)
-+{
-+	int ret;
-+	int have_caps;
-+	struct {
-+		struct kdbus_cmd_info cmd_info;
-+
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			char str[64];
-+		} name;
-+	} buf;
-+
-+	buf.cmd_info.size = sizeof(struct kdbus_cmd_info);
-+	buf.cmd_info.flags = 0;
-+	buf.cmd_info.attach_flags = 0;
-+	buf.cmd_info.id = env->conn->id;
-+
-+	ret = kdbus_conn_info(env->conn, env->conn->id, NULL, 0, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* try to pass a name that is longer than the buffer's size */
-+	buf.name.size = KDBUS_ITEM_HEADER_SIZE + 1;
-+	buf.name.type = KDBUS_ITEM_NAME;
-+	strcpy(buf.name.str, "foo.bar.bla");
-+
-+	buf.cmd_info.id = 0;
-+	buf.cmd_info.size = sizeof(buf.cmd_info) + buf.name.size;
-+	ret = kdbus_cmd_conn_info(env->conn->fd, (struct kdbus_cmd_info *) &buf);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	/* Pass a non-existent name */
-+	ret = kdbus_conn_info(env->conn, 0, "non.existent.name", 0, NULL);
-+	ASSERT_RETURN(ret == -ESRCH);
-+
-+	if (!all_uids_gids_are_mapped())
-+		return TEST_SKIP;
-+
-+	/* Test for caps here, so we run the previous test */
-+	have_caps = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
-+	ASSERT_RETURN(have_caps >= 0);
-+
-+	ret = kdbus_fuzz_conn_info(env, have_caps);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Now if we have skipped some tests then let the user know */
-+	if (!have_caps)
-+		return TEST_SKIP;
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_conn_update(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *conn;
-+	struct kdbus_msg *msg;
-+	int found = 0;
-+	int ret;
-+
-+	/*
-+	 * kdbus_hello() sets all attach flags. Receive a message by this
-+	 * connection, and make sure a timestamp item (just to pick one) is
-+	 * present.
-+	 */
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn);
-+
-+	ret = kdbus_msg_send(env->conn, NULL, 0x12345678, 0, 0, 0, conn->id);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(conn, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	found = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
-+	ASSERT_RETURN(found == 1);
-+
-+	kdbus_msg_free(msg);
-+
-+	/*
-+	 * Now, modify the attach flags and repeat the action. The item must
-+	 * now be missing.
-+	 */
-+	found = 0;
-+
-+	ret = kdbus_conn_update_attach_flags(conn,
-+					     _KDBUS_ATTACH_ALL,
-+					     _KDBUS_ATTACH_ALL &
-+					     ~KDBUS_ATTACH_TIMESTAMP);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_send(env->conn, NULL, 0x12345678, 0, 0, 0, conn->id);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(conn, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	found = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
-+	ASSERT_RETURN(found == 0);
-+
-+	/* Provide a bogus attach_flags value */
-+	ret = kdbus_conn_update_attach_flags(conn,
-+					     _KDBUS_ATTACH_ALL + 1,
-+					     _KDBUS_ATTACH_ALL);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	kdbus_msg_free(msg);
-+
-+	kdbus_conn_free(conn);
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_writable_pool(struct kdbus_test_env *env)
-+{
-+	struct kdbus_cmd_free cmd_free = {};
-+	struct kdbus_cmd_hello hello;
-+	int fd, ret;
-+	void *map;
-+
-+	fd = open(env->buspath, O_RDWR | O_CLOEXEC);
-+	ASSERT_RETURN(fd >= 0);
-+
-+	memset(&hello, 0, sizeof(hello));
-+	hello.flags = KDBUS_HELLO_ACCEPT_FD;
-+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+	hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
-+	hello.size = sizeof(struct kdbus_cmd_hello);
-+	hello.pool_size = POOL_SIZE;
-+	hello.offset = (__u64)-1;
-+
-+	/* success test */
-+	ret = kdbus_cmd_hello(fd, &hello);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* The kernel should have returned some items */
-+	ASSERT_RETURN(hello.offset != (__u64)-1);
-+	cmd_free.size = sizeof(cmd_free);
-+	cmd_free.offset = hello.offset;
-+	ret = kdbus_cmd_free(fd, &cmd_free);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/* pools cannot be mapped writable */
-+	map = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-+	ASSERT_RETURN(map == MAP_FAILED);
-+
-+	/* pools can always be mapped readable */
-+	map = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, fd, 0);
-+	ASSERT_RETURN(map != MAP_FAILED);
-+
-+	/* make sure we cannot change protection masks to writable */
-+	ret = mprotect(map, POOL_SIZE, PROT_READ | PROT_WRITE);
-+	ASSERT_RETURN(ret < 0);
-+
-+	munmap(map, POOL_SIZE);
-+	close(fd);
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-daemon.c b/tools/testing/selftests/kdbus/test-daemon.c
-new file mode 100644
-index 0000000..8bc2386
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-daemon.c
-@@ -0,0 +1,65 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <poll.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+int kdbus_test_daemon(struct kdbus_test_env *env)
-+{
-+	struct pollfd fds[2];
-+	int count;
-+	int ret;
-+
-+	/* This test doesn't make any sense in non-interactive mode */
-+	if (!kdbus_util_verbose)
-+		return TEST_OK;
-+
-+	printf("Created connection %llu on bus '%s'\n",
-+		(unsigned long long) env->conn->id, env->buspath);
-+
-+	ret = kdbus_name_acquire(env->conn, "com.example.kdbus-test", NULL);
-+	ASSERT_RETURN(ret == 0);
-+	printf("  Acquired name: com.example.kdbus-test\n");
-+
-+	fds[0].fd = env->conn->fd;
-+	fds[1].fd = STDIN_FILENO;
-+
-+	printf("Monitoring connections:\n");
-+
-+	for (count = 0;; count++) {
-+		int i, nfds = sizeof(fds) / sizeof(fds[0]);
-+
-+		for (i = 0; i < nfds; i++) {
-+			fds[i].events = POLLIN | POLLPRI | POLLHUP;
-+			fds[i].revents = 0;
-+		}
-+
-+		ret = poll(fds, nfds, -1);
-+		if (ret <= 0)
-+			break;
-+
-+		if (fds[0].revents & POLLIN) {
-+			ret = kdbus_msg_recv(env->conn, NULL, NULL);
-+			ASSERT_RETURN(ret == 0);
-+		}
-+
-+		/* stdin */
-+		if (fds[1].revents & POLLIN)
-+			break;
-+	}
-+
-+	printf("Closing bus connection\n");
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-endpoint.c b/tools/testing/selftests/kdbus/test-endpoint.c
-new file mode 100644
-index 0000000..34a7be4
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-endpoint.c
-@@ -0,0 +1,352 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <libgen.h>
-+#include <sys/capability.h>
-+#include <sys/wait.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+#define KDBUS_SYSNAME_MAX_LEN			63
-+
-+static int install_name_add_match(struct kdbus_conn *conn, const char *name)
-+{
-+	struct {
-+		struct kdbus_cmd_match cmd;
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			struct kdbus_notify_name_change chg;
-+		} item;
-+		char name[64];
-+	} buf;
-+	int ret;
-+
-+	/* install the match rule */
-+	memset(&buf, 0, sizeof(buf));
-+	buf.item.type = KDBUS_ITEM_NAME_ADD;
-+	buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
-+	buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
-+	strncpy(buf.name, name, sizeof(buf.name) - 1);
-+	buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
-+	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
-+
-+	ret = kdbus_cmd_match_add(conn->fd, &buf.cmd);
-+	if (ret < 0)
-+		return ret;
-+
-+	return 0;
-+}
-+
-+static int create_endpoint(const char *buspath, uid_t uid, const char *name,
-+			   uint64_t flags)
-+{
-+	struct {
-+		struct kdbus_cmd cmd;
-+
-+		/* name item */
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			/* max should be KDBUS_SYSNAME_MAX_LEN */
-+			char str[128];
-+		} name;
-+	} ep_make;
-+	int fd, ret;
-+
-+	fd = open(buspath, O_RDWR);
-+	if (fd < 0)
-+		return fd;
-+
-+	memset(&ep_make, 0, sizeof(ep_make));
-+
-+	snprintf(ep_make.name.str,
-+		 /* Use the KDBUS_SYSNAME_MAX_LEN or sizeof(str) */
-+		 KDBUS_SYSNAME_MAX_LEN > strlen(name) ?
-+		 KDBUS_SYSNAME_MAX_LEN : sizeof(ep_make.name.str),
-+		 "%u-%s", uid, name);
-+
-+	ep_make.name.type = KDBUS_ITEM_MAKE_NAME;
-+	ep_make.name.size = KDBUS_ITEM_HEADER_SIZE +
-+			    strlen(ep_make.name.str) + 1;
-+
-+	ep_make.cmd.flags = flags;
-+	ep_make.cmd.size = sizeof(ep_make.cmd) + ep_make.name.size;
-+
-+	ret = kdbus_cmd_endpoint_make(fd, &ep_make.cmd);
-+	if (ret < 0) {
-+		kdbus_printf("error creating endpoint: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	return fd;
-+}
-+
-+static int unpriv_test_custom_ep(const char *buspath)
-+{
-+	int ret, ep_fd1, ep_fd2;
-+	char *ep1, *ep2, *tmp1, *tmp2;
-+
-+	tmp1 = strdup(buspath);
-+	tmp2 = strdup(buspath);
-+	ASSERT_RETURN(tmp1 && tmp2);
-+
-+	ret = asprintf(&ep1, "%s/%u-%s", dirname(tmp1), getuid(), "apps1");
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ret = asprintf(&ep2, "%s/%u-%s", dirname(tmp2), getuid(), "apps2");
-+	ASSERT_RETURN(ret >= 0);
-+
-+	free(tmp1);
-+	free(tmp2);
-+
-+	/* endpoint only accessible to current uid */
-+	ep_fd1 = create_endpoint(buspath, getuid(), "apps1", 0);
-+	ASSERT_RETURN(ep_fd1 >= 0);
-+
-+	/* endpoint world accessible */
-+	ep_fd2 = create_endpoint(buspath, getuid(), "apps2",
-+				  KDBUS_MAKE_ACCESS_WORLD);
-+	ASSERT_RETURN(ep_fd2 >= 0);
-+
-+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_UID, ({
-+		int ep_fd;
-+		struct kdbus_conn *ep_conn;
-+
-+		/*
-+		 * Make sure that we are not able to create custom
-+		 * endpoints
-+		 */
-+		ep_fd = create_endpoint(buspath, getuid(),
-+					"unpriv_costum_ep", 0);
-+		ASSERT_EXIT(ep_fd == -EPERM);
-+
-+		/*
-+		 * Endpoint "apps1" is only accessible to the user
-+		 * that owns the endpoint. Access is denied by the VFS.
-+		 */
-+		ep_conn = kdbus_hello(ep1, 0, NULL, 0);
-+		ASSERT_EXIT(!ep_conn && errno == EACCES);
-+
-+		/* Endpoint "apps2" world accessible */
-+		ep_conn = kdbus_hello(ep2, 0, NULL, 0);
-+		ASSERT_EXIT(ep_conn);
-+
-+		kdbus_conn_free(ep_conn);
-+
-+		_exit(EXIT_SUCCESS);
-+	}),
-+	({ 0; }));
-+	ASSERT_RETURN(ret == 0);
-+
-+	close(ep_fd1);
-+	close(ep_fd2);
-+	free(ep1);
-+	free(ep2);
-+
-+	return 0;
-+}
-+
-+static int update_endpoint(int fd, const char *name)
-+{
-+	int len = strlen(name) + 1;
-+	struct {
-+		struct kdbus_cmd cmd;
-+
-+		/* name item */
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			char str[KDBUS_ALIGN8(len)];
-+		} name;
-+
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			struct kdbus_policy_access access;
-+		} access;
-+	} ep_update;
-+	int ret;
-+
-+	memset(&ep_update, 0, sizeof(ep_update));
-+
-+	ep_update.name.size = KDBUS_ITEM_HEADER_SIZE + len;
-+	ep_update.name.type = KDBUS_ITEM_NAME;
-+	strncpy(ep_update.name.str, name, sizeof(ep_update.name.str) - 1);
-+
-+	ep_update.access.size = sizeof(ep_update.access);
-+	ep_update.access.type = KDBUS_ITEM_POLICY_ACCESS;
-+	ep_update.access.access.type = KDBUS_POLICY_ACCESS_WORLD;
-+	ep_update.access.access.access = KDBUS_POLICY_SEE;
-+
-+	ep_update.cmd.size = sizeof(ep_update);
-+
-+	ret = kdbus_cmd_endpoint_update(fd, &ep_update.cmd);
-+	if (ret < 0) {
-+		kdbus_printf("error updating endpoint: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	return 0;
-+}
-+
-+int kdbus_test_custom_endpoint(struct kdbus_test_env *env)
-+{
-+	char *ep, *tmp;
-+	int ret, ep_fd;
-+	struct kdbus_msg *msg;
-+	struct kdbus_conn *ep_conn;
-+	struct kdbus_conn *reader;
-+	const char *name = "foo.bar.baz";
-+	const char *epname = "foo";
-+	char fake_ep[KDBUS_SYSNAME_MAX_LEN + 1] = {'\0'};
-+
-+	memset(fake_ep, 'X', sizeof(fake_ep) - 1);
-+
-+	/* Try to create a custom endpoint with a long name */
-+	ret = create_endpoint(env->buspath, getuid(), fake_ep, 0);
-+	ASSERT_RETURN(ret == -ENAMETOOLONG);
-+
-+	/* Try to create a custom endpoint with a different uid */
-+	ret = create_endpoint(env->buspath, getuid() + 1, "foobar", 0);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	/* create a custom endpoint, and open a connection on it */
-+	ep_fd = create_endpoint(env->buspath, getuid(), "foo", 0);
-+	ASSERT_RETURN(ep_fd >= 0);
-+
-+	tmp = strdup(env->buspath);
-+	ASSERT_RETURN(tmp);
-+
-+	ret = asprintf(&ep, "%s/%u-%s", dirname(tmp), getuid(), epname);
-+	free(tmp);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/* Register a connection that listens to broadcasts */
-+	reader = kdbus_hello(ep, 0, NULL, 0);
-+	ASSERT_RETURN(reader);
-+
-+	/* Register for kernel notifications */
-+	ret = kdbus_add_match_id(reader, 0x1, KDBUS_ITEM_ID_ADD,
-+				 KDBUS_MATCH_ID_ANY);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_add_match_id(reader, 0x2, KDBUS_ITEM_ID_REMOVE,
-+				 KDBUS_MATCH_ID_ANY);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = install_name_add_match(reader, name);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Monitor connections are not supported on custom endpoints */
-+	ep_conn = kdbus_hello(ep, KDBUS_HELLO_MONITOR, NULL, 0);
-+	ASSERT_RETURN(!ep_conn && errno == EOPNOTSUPP);
-+
-+	ep_conn = kdbus_hello(ep, 0, NULL, 0);
-+	ASSERT_RETURN(ep_conn);
-+
-+	/* Check that the reader got the IdAdd notification */
-+	ret = kdbus_msg_recv(reader, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_ADD);
-+	ASSERT_RETURN(msg->items[0].id_change.id == ep_conn->id);
-+	kdbus_msg_free(msg);
-+
-+	/*
-+	 * Add a name add match on the endpoint connection, acquire name from
-+	 * the unfiltered connection, and make sure the filtered connection
-+	 * did not get the notification on the name owner change. Also, the
-+	 * endpoint connection may not be able to call conn_info, neither on
-+	 * the name nor on the ID.
-+	 */
-+	ret = install_name_add_match(ep_conn, name);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_name_acquire(env->conn, name, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(ep_conn, NULL, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	ret = kdbus_conn_info(ep_conn, 0, name, 0, NULL);
-+	ASSERT_RETURN(ret == -ESRCH);
-+
-+	ret = kdbus_conn_info(ep_conn, 0, "random.crappy.name", 0, NULL);
-+	ASSERT_RETURN(ret == -ESRCH);
-+
-+	ret = kdbus_conn_info(ep_conn, env->conn->id, NULL, 0, NULL);
-+	ASSERT_RETURN(ret == -ENXIO);
-+
-+	ret = kdbus_conn_info(ep_conn, 0x0fffffffffffffffULL, NULL, 0, NULL);
-+	ASSERT_RETURN(ret == -ENXIO);
-+
-+	/* Check that the reader did not receive the name notification */
-+	ret = kdbus_msg_recv(reader, NULL, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	/*
-+	 * Release the name again, update the custom endpoint policy,
-+	 * and try again. This time, the connection on the custom endpoint
-+	 * should have gotten it.
-+	 */
-+	ret = kdbus_name_release(env->conn, name);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Check that the reader did not receive the name notification */
-+	ret = kdbus_msg_recv(reader, NULL, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	ret = update_endpoint(ep_fd, name);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_name_acquire(env->conn, name, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(ep_conn, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_ADD);
-+	ASSERT_RETURN(msg->items[0].name_change.old_id.id == 0);
-+	ASSERT_RETURN(msg->items[0].name_change.new_id.id == env->conn->id);
-+	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
-+	kdbus_msg_free(msg);
-+
-+	ret = kdbus_msg_recv(reader, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
-+
-+	kdbus_msg_free(msg);
-+
-+	ret = kdbus_conn_info(ep_conn, 0, name, 0, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_conn_info(ep_conn, env->conn->id, NULL, 0, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* If we have privileges, test custom endpoints */
-+	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/*
-+	 * All uids/gids are mapped and we have the necessary caps
-+	 */
-+	if (ret && all_uids_gids_are_mapped()) {
-+		ret = unpriv_test_custom_ep(env->buspath);
-+		ASSERT_RETURN(ret == 0);
-+	}
-+
-+	kdbus_conn_free(reader);
-+	kdbus_conn_free(ep_conn);
-+	close(ep_fd);
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-fd.c b/tools/testing/selftests/kdbus/test-fd.c
-new file mode 100644
-index 0000000..2ae0f5a
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-fd.c
-@@ -0,0 +1,789 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stdbool.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <sys/types.h>
-+#include <sys/mman.h>
-+#include <sys/socket.h>
-+#include <sys/wait.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+#define KDBUS_MSG_MAX_ITEMS     128
-+#define KDBUS_USER_MAX_CONN	256
-+
-+/* maximum number of inflight fds in a target queue per user */
-+#define KDBUS_CONN_MAX_FDS_PER_USER	16
-+
-+/* maximum number of memfd items per message */
-+#define KDBUS_MSG_MAX_MEMFD_ITEMS       16
-+
-+static int make_msg_payload_dbus(uint64_t src_id, uint64_t dst_id,
-+				 uint64_t msg_size,
-+				 struct kdbus_msg **msg_dbus)
-+{
-+	struct kdbus_msg *msg;
-+
-+	msg = malloc(msg_size);
-+	ASSERT_RETURN_VAL(msg, -ENOMEM);
-+
-+	memset(msg, 0, msg_size);
-+	msg->size = msg_size;
-+	msg->src_id = src_id;
-+	msg->dst_id = dst_id;
-+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+	*msg_dbus = msg;
-+
-+	return 0;
-+}
-+
-+static void make_item_memfds(struct kdbus_item *item,
-+			     int *memfds, size_t memfd_size)
-+{
-+	size_t i;
-+
-+	for (i = 0; i < memfd_size; i++) {
-+		item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
-+		item->size = KDBUS_ITEM_HEADER_SIZE +
-+			     sizeof(struct kdbus_memfd);
-+		item->memfd.fd = memfds[i];
-+		item->memfd.size = sizeof(uint64_t); /* const size */
-+		item = KDBUS_ITEM_NEXT(item);
-+	}
-+}
-+
-+static void make_item_fds(struct kdbus_item *item,
-+			  int *fd_array, size_t fd_size)
-+{
-+	size_t i;
-+	item->type = KDBUS_ITEM_FDS;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + (sizeof(int) * fd_size);
-+
-+	for (i = 0; i < fd_size; i++)
-+		item->fds[i] = fd_array[i];
-+}
-+
-+static int memfd_write(const char *name, void *buf, size_t bufsize)
-+{
-+	ssize_t ret;
-+	int memfd;
-+
-+	memfd = sys_memfd_create(name, 0);
-+	ASSERT_RETURN_VAL(memfd >= 0, memfd);
-+
-+	ret = write(memfd, buf, bufsize);
-+	ASSERT_RETURN_VAL(ret == (ssize_t)bufsize, -EAGAIN);
-+
-+	ret = sys_memfd_seal_set(memfd);
-+	ASSERT_RETURN_VAL(ret == 0, -errno);
-+
-+	return memfd;
-+}
-+
-+static int send_memfds(struct kdbus_conn *conn, uint64_t dst_id,
-+		       int *memfds_array, size_t memfd_count)
-+{
-+	struct kdbus_cmd_send cmd = {};
-+	struct kdbus_item *item;
-+	struct kdbus_msg *msg;
-+	uint64_t size;
-+	int ret;
-+
-+	size = sizeof(struct kdbus_msg);
-+	size += memfd_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
-+
-+	if (dst_id == KDBUS_DST_ID_BROADCAST)
-+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
-+
-+	ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
-+	ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+	item = msg->items;
-+
-+	if (dst_id == KDBUS_DST_ID_BROADCAST) {
-+		item->type = KDBUS_ITEM_BLOOM_FILTER;
-+		item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
-+		item = KDBUS_ITEM_NEXT(item);
-+
-+		msg->flags |= KDBUS_MSG_SIGNAL;
-+	}
-+
-+	make_item_memfds(item, memfds_array, memfd_count);
-+
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)msg;
-+
-+	ret = kdbus_cmd_send(conn->fd, &cmd);
-+	if (ret < 0) {
-+		kdbus_printf("error sending message: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	free(msg);
-+	return 0;
-+}
-+
-+static int send_fds(struct kdbus_conn *conn, uint64_t dst_id,
-+		    int *fd_array, size_t fd_count)
-+{
-+	struct kdbus_cmd_send cmd = {};
-+	struct kdbus_item *item;
-+	struct kdbus_msg *msg;
-+	uint64_t size;
-+	int ret;
-+
-+	size = sizeof(struct kdbus_msg);
-+	size += KDBUS_ITEM_SIZE(sizeof(int) * fd_count);
-+
-+	if (dst_id == KDBUS_DST_ID_BROADCAST)
-+		size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
-+
-+	ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
-+	ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+	item = msg->items;
-+
-+	if (dst_id == KDBUS_DST_ID_BROADCAST) {
-+		item->type = KDBUS_ITEM_BLOOM_FILTER;
-+		item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
-+		item = KDBUS_ITEM_NEXT(item);
-+
-+		msg->flags |= KDBUS_MSG_SIGNAL;
-+	}
-+
-+	make_item_fds(item, fd_array, fd_count);
-+
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)msg;
-+
-+	ret = kdbus_cmd_send(conn->fd, &cmd);
-+	if (ret < 0) {
-+		kdbus_printf("error sending message: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	free(msg);
-+	return ret;
-+}
-+
-+static int send_fds_memfds(struct kdbus_conn *conn, uint64_t dst_id,
-+			   int *fds_array, size_t fd_count,
-+			   int *memfds_array, size_t memfd_count)
-+{
-+	struct kdbus_cmd_send cmd = {};
-+	struct kdbus_item *item;
-+	struct kdbus_msg *msg;
-+	uint64_t size;
-+	int ret;
-+
-+	size = sizeof(struct kdbus_msg);
-+	size += memfd_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
-+	size += KDBUS_ITEM_SIZE(sizeof(int) * fd_count);
-+
-+	ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
-+	ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+	item = msg->items;
-+
-+	make_item_fds(item, fds_array, fd_count);
-+	item = KDBUS_ITEM_NEXT(item);
-+	make_item_memfds(item, memfds_array, memfd_count);
-+
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)msg;
-+
-+	ret = kdbus_cmd_send(conn->fd, &cmd);
-+	if (ret < 0) {
-+		kdbus_printf("error sending message: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	free(msg);
-+	return ret;
-+}
-+
-+/* Return the number of received fds */
-+static unsigned int kdbus_item_get_nfds(struct kdbus_msg *msg)
-+{
-+	unsigned int fds = 0;
-+	const struct kdbus_item *item;
-+
-+	KDBUS_ITEM_FOREACH(item, msg, items) {
-+		switch (item->type) {
-+		case KDBUS_ITEM_FDS: {
-+			fds += (item->size - KDBUS_ITEM_HEADER_SIZE) /
-+				sizeof(int);
-+			break;
-+		}
-+
-+		case KDBUS_ITEM_PAYLOAD_MEMFD:
-+			fds++;
-+			break;
-+
-+		default:
-+			break;
-+		}
-+	}
-+
-+	return fds;
-+}
-+
-+static struct kdbus_msg *
-+get_kdbus_msg_with_fd(struct kdbus_conn *conn_src,
-+		      uint64_t dst_id, uint64_t cookie, int fd)
-+{
-+	int ret;
-+	uint64_t size;
-+	struct kdbus_item *item;
-+	struct kdbus_msg *msg;
-+
-+	size = sizeof(struct kdbus_msg);
-+	if (fd >= 0)
-+		size += KDBUS_ITEM_SIZE(sizeof(int));
-+
-+	ret = make_msg_payload_dbus(conn_src->id, dst_id, size, &msg);
-+	ASSERT_RETURN_VAL(ret == 0, NULL);
-+
-+	msg->cookie = cookie;
-+
-+	if (fd >= 0) {
-+		item = msg->items;
-+
-+		make_item_fds(item, (int *)&fd, 1);
-+	}
-+
-+	return msg;
-+}
-+
-+static int kdbus_test_no_fds(struct kdbus_test_env *env,
-+			     int *fds, int *memfd)
-+{
-+	pid_t pid;
-+	int ret, status;
-+	uint64_t cookie;
-+	int connfd1, connfd2;
-+	struct kdbus_msg *msg, *msg_sync_reply;
-+	struct kdbus_cmd_hello hello;
-+	struct kdbus_conn *conn_src, *conn_dst, *conn_dummy;
-+	struct kdbus_cmd_send cmd = {};
-+	struct kdbus_cmd_free cmd_free = {};
-+
-+	conn_src = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn_src);
-+
-+	connfd1 = open(env->buspath, O_RDWR|O_CLOEXEC);
-+	ASSERT_RETURN(connfd1 >= 0);
-+
-+	connfd2 = open(env->buspath, O_RDWR|O_CLOEXEC);
-+	ASSERT_RETURN(connfd2 >= 0);
-+
-+	/*
-+	 * Create connections without KDBUS_HELLO_ACCEPT_FD
-+	 * to test if send fd operations are blocked
-+	 */
-+	conn_dst = malloc(sizeof(*conn_dst));
-+	ASSERT_RETURN(conn_dst);
-+
-+	conn_dummy = malloc(sizeof(*conn_dummy));
-+	ASSERT_RETURN(conn_dummy);
-+
-+	memset(&hello, 0, sizeof(hello));
-+	hello.size = sizeof(struct kdbus_cmd_hello);
-+	hello.pool_size = POOL_SIZE;
-+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+
-+	ret = kdbus_cmd_hello(connfd1, &hello);
-+	ASSERT_RETURN(ret == 0);
-+
-+	cmd_free.size = sizeof(cmd_free);
-+	cmd_free.offset = hello.offset;
-+	ret = kdbus_cmd_free(connfd1, &cmd_free);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	conn_dst->fd = connfd1;
-+	conn_dst->id = hello.id;
-+
-+	memset(&hello, 0, sizeof(hello));
-+	hello.size = sizeof(struct kdbus_cmd_hello);
-+	hello.pool_size = POOL_SIZE;
-+	hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+
-+	ret = kdbus_cmd_hello(connfd2, &hello);
-+	ASSERT_RETURN(ret == 0);
-+
-+	cmd_free.size = sizeof(cmd_free);
-+	cmd_free.offset = hello.offset;
-+	ret = kdbus_cmd_free(connfd2, &cmd_free);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	conn_dummy->fd = connfd2;
-+	conn_dummy->id = hello.id;
-+
-+	conn_dst->buf = mmap(NULL, POOL_SIZE, PROT_READ,
-+			     MAP_SHARED, connfd1, 0);
-+	ASSERT_RETURN(conn_dst->buf != MAP_FAILED);
-+
-+	conn_dummy->buf = mmap(NULL, POOL_SIZE, PROT_READ,
-+			       MAP_SHARED, connfd2, 0);
-+	ASSERT_RETURN(conn_dummy->buf != MAP_FAILED);
-+
-+	/*
-+	 * Send fds to a connection that does not accept fd passing
-+	 */
-+	ret = send_fds(conn_src, conn_dst->id, fds, 1);
-+	ASSERT_RETURN(ret == -ECOMM);
-+
-+	/*
-+	 * memfds are kdbus payload
-+	 */
-+	ret = send_memfds(conn_src, conn_dst->id, memfd, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv_poll(conn_dst, 100, NULL, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	cookie = time(NULL);
-+
-+	pid = fork();
-+	ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+	if (pid == 0) {
-+		struct timespec now;
-+
-+		/*
-+		 * A sync send/reply to a connection that does not
-+		 * accept fds should fail if it contains an fd
-+		 */
-+		msg_sync_reply = get_kdbus_msg_with_fd(conn_dst,
-+						       conn_dummy->id,
-+						       cookie, fds[0]);
-+		ASSERT_EXIT(msg_sync_reply);
-+
-+		ret = clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
-+		ASSERT_EXIT(ret == 0);
-+
-+		msg_sync_reply->timeout_ns = now.tv_sec * 1000000000ULL +
-+					     now.tv_nsec + 100000000ULL;
-+		msg_sync_reply->flags = KDBUS_MSG_EXPECT_REPLY;
-+
-+		memset(&cmd, 0, sizeof(cmd));
-+		cmd.size = sizeof(cmd);
-+		cmd.msg_address = (uintptr_t)msg_sync_reply;
-+		cmd.flags = KDBUS_SEND_SYNC_REPLY;
-+
-+		ret = kdbus_cmd_send(conn_dst->fd, &cmd);
-+		ASSERT_EXIT(ret == -ECOMM);
-+
-+		/*
-+		 * Now send a normal message, but the sync reply
-+		 * will fail since it contains an fd that the
-+		 * original sender does not want.
-+		 *
-+		 * The original sender will fail with -EREMOTEIO
-+		 */
-+		cookie++;
-+		ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
-+					  KDBUS_MSG_EXPECT_REPLY,
-+					  5000000000ULL, 0, conn_src->id, -1);
-+		ASSERT_EXIT(ret == -EREMOTEIO);
-+
-+		cookie++;
-+		ret = kdbus_msg_recv_poll(conn_dst, 100, &msg, NULL);
-+		ASSERT_EXIT(ret == 0);
-+		ASSERT_EXIT(msg->cookie == cookie);
-+
-+		free(msg_sync_reply);
-+		kdbus_msg_free(msg);
-+
-+		_exit(EXIT_SUCCESS);
-+	}
-+
-+	ret = kdbus_msg_recv_poll(conn_dummy, 100, NULL, NULL);
-+	ASSERT_RETURN(ret == -ETIMEDOUT);
-+
-+	cookie++;
-+	ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
-+	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
-+
-+	kdbus_msg_free(msg);
-+
-+	/*
-+	 * Try to reply with a kdbus connection handle, this should
-+	 * fail with -EOPNOTSUPP
-+	 */
-+	msg_sync_reply = get_kdbus_msg_with_fd(conn_src,
-+					       conn_dst->id,
-+					       cookie, conn_dst->fd);
-+	ASSERT_RETURN(msg_sync_reply);
-+
-+	msg_sync_reply->cookie_reply = cookie;
-+
-+	memset(&cmd, 0, sizeof(cmd));
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)msg_sync_reply;
-+
-+	ret = kdbus_cmd_send(conn_src->fd, &cmd);
-+	ASSERT_RETURN(ret == -EOPNOTSUPP);
-+
-+	free(msg_sync_reply);
-+
-+	/*
-+	 * Try to reply with a normal fd, this should fail even
-+	 * if the response is a sync reply
-+	 *
-+	 * From the sender's view we fail with -ECOMM
-+	 */
-+	msg_sync_reply = get_kdbus_msg_with_fd(conn_src,
-+					       conn_dst->id,
-+					       cookie, fds[0]);
-+	ASSERT_RETURN(msg_sync_reply);
-+
-+	msg_sync_reply->cookie_reply = cookie;
-+
-+	memset(&cmd, 0, sizeof(cmd));
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)msg_sync_reply;
-+
-+	ret = kdbus_cmd_send(conn_src->fd, &cmd);
-+	ASSERT_RETURN(ret == -ECOMM);
-+
-+	free(msg_sync_reply);
-+
-+	/*
-+	 * Resend another normal message and check if the queue
-+	 * is clear
-+	 */
-+	cookie++;
-+	ret = kdbus_msg_send(conn_src, NULL, cookie, 0, 0, 0,
-+			     conn_dst->id);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = waitpid(pid, &status, 0);
-+	ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+	kdbus_conn_free(conn_dummy);
-+	kdbus_conn_free(conn_dst);
-+	kdbus_conn_free(conn_src);
-+
-+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+static int kdbus_send_multiple_fds(struct kdbus_conn *conn_src,
-+				   struct kdbus_conn *conn_dst)
-+{
-+	int ret, i;
-+	unsigned int nfds;
-+	int fds[KDBUS_CONN_MAX_FDS_PER_USER + 1];
-+	int memfds[KDBUS_MSG_MAX_ITEMS + 1];
-+	struct kdbus_msg *msg;
-+	uint64_t dummy_value;
-+
-+	dummy_value = time(NULL);
-+
-+	for (i = 0; i < KDBUS_CONN_MAX_FDS_PER_USER + 1; i++) {
-+		fds[i] = open("/dev/null", O_RDWR|O_CLOEXEC);
-+		ASSERT_RETURN_VAL(fds[i] >= 0, -errno);
-+	}
-+
-+	/* Send KDBUS_CONN_MAX_FDS_PER_USER with one more fd */
-+	ret = send_fds(conn_src, conn_dst->id, fds,
-+		       KDBUS_CONN_MAX_FDS_PER_USER + 1);
-+	ASSERT_RETURN(ret == -EMFILE);
-+
-+	/* Retry with the correct KDBUS_CONN_MAX_FDS_PER_USER */
-+	ret = send_fds(conn_src, conn_dst->id, fds,
-+		       KDBUS_CONN_MAX_FDS_PER_USER);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Check we got the right number of fds */
-+	nfds = kdbus_item_get_nfds(msg);
-+	ASSERT_RETURN(nfds == KDBUS_CONN_MAX_FDS_PER_USER);
-+
-+	kdbus_msg_free(msg);
-+
-+	for (i = 0; i < KDBUS_MSG_MAX_ITEMS + 1; i++, dummy_value++) {
-+		memfds[i] = memfd_write("memfd-name",
-+					&dummy_value,
-+					sizeof(dummy_value));
-+		ASSERT_RETURN_VAL(memfds[i] >= 0, memfds[i]);
-+	}
-+
-+	/* Send KDBUS_MSG_MAX_ITEMS with one more memfd */
-+	ret = send_memfds(conn_src, conn_dst->id,
-+			  memfds, KDBUS_MSG_MAX_ITEMS + 1);
-+	ASSERT_RETURN(ret == -E2BIG);
-+
-+	ret = send_memfds(conn_src, conn_dst->id,
-+			  memfds, KDBUS_MSG_MAX_MEMFD_ITEMS + 1);
-+	ASSERT_RETURN(ret == -E2BIG);
-+
-+	/* Retry with the correct KDBUS_MSG_MAX_MEMFD_ITEMS */
-+	ret = send_memfds(conn_src, conn_dst->id,
-+			  memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Check we got the right number of fds */
-+	nfds = kdbus_item_get_nfds(msg);
-+	ASSERT_RETURN(nfds == KDBUS_MSG_MAX_MEMFD_ITEMS);
-+
-+	kdbus_msg_free(msg);
-+
-+
-+	/*
-+	 * Combine multiple KDBUS_CONN_MAX_FDS_PER_USER+1 fds and
-+	 * 10 memfds
-+	 */
-+	ret = send_fds_memfds(conn_src, conn_dst->id,
-+			      fds, KDBUS_CONN_MAX_FDS_PER_USER + 1,
-+			      memfds, 10);
-+	ASSERT_RETURN(ret == -EMFILE);
-+
-+	ret = kdbus_msg_recv(conn_dst, NULL, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	/*
-+	 * Combine multiple KDBUS_CONN_MAX_FDS_PER_USER fds and
-+	 * (128 - 1) + 1 memfds, all fds take one item, while each
-+	 * memfd takes one item
-+	 */
-+	ret = send_fds_memfds(conn_src, conn_dst->id,
-+			      fds, KDBUS_CONN_MAX_FDS_PER_USER,
-+			      memfds, (KDBUS_MSG_MAX_ITEMS - 1) + 1);
-+	ASSERT_RETURN(ret == -E2BIG);
-+
-+	ret = send_fds_memfds(conn_src, conn_dst->id,
-+			      fds, KDBUS_CONN_MAX_FDS_PER_USER,
-+			      memfds, KDBUS_MSG_MAX_MEMFD_ITEMS + 1);
-+	ASSERT_RETURN(ret == -E2BIG);
-+
-+	ret = kdbus_msg_recv(conn_dst, NULL, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	/*
-+	 * Send KDBUS_CONN_MAX_FDS_PER_USER fds +
-+	 * KDBUS_MSG_MAX_MEMFD_ITEMS memfds
-+	 */
-+	ret = send_fds_memfds(conn_src, conn_dst->id,
-+			      fds, KDBUS_CONN_MAX_FDS_PER_USER,
-+			      memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Check we got the right number of fds */
-+	nfds = kdbus_item_get_nfds(msg);
-+	ASSERT_RETURN(nfds == KDBUS_CONN_MAX_FDS_PER_USER +
-+			      KDBUS_MSG_MAX_MEMFD_ITEMS);
-+
-+	kdbus_msg_free(msg);
-+
-+
-+	/*
-+	 * Re-send fds + memfds, close them, but do not receive them
-+	 * and try to queue more
-+	 */
-+	ret = send_fds_memfds(conn_src, conn_dst->id,
-+			      fds, KDBUS_CONN_MAX_FDS_PER_USER,
-+			      memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* close old references and get new ones */
-+	for (i = 0; i < KDBUS_CONN_MAX_FDS_PER_USER + 1; i++) {
-+		close(fds[i]);
-+		fds[i] = open("/dev/null", O_RDWR|O_CLOEXEC);
-+		ASSERT_RETURN_VAL(fds[i] >= 0, -errno);
-+	}
-+
-+	/* should fail since we already have fds in the queue */
-+	ret = send_fds(conn_src, conn_dst->id, fds,
-+		       KDBUS_CONN_MAX_FDS_PER_USER);
-+	ASSERT_RETURN(ret == -EMFILE);
-+
-+	/* This should succeed */
-+	ret = send_memfds(conn_src, conn_dst->id,
-+			  memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	nfds = kdbus_item_get_nfds(msg);
-+	ASSERT_RETURN(nfds == KDBUS_CONN_MAX_FDS_PER_USER +
-+			      KDBUS_MSG_MAX_MEMFD_ITEMS);
-+
-+	kdbus_msg_free(msg);
-+
-+	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	nfds = kdbus_item_get_nfds(msg);
-+	ASSERT_RETURN(nfds == KDBUS_MSG_MAX_MEMFD_ITEMS);
-+
-+	kdbus_msg_free(msg);
-+
-+	ret = kdbus_msg_recv(conn_dst, NULL, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	for (i = 0; i < KDBUS_CONN_MAX_FDS_PER_USER + 1; i++)
-+		close(fds[i]);
-+
-+	for (i = 0; i < KDBUS_MSG_MAX_ITEMS + 1; i++)
-+		close(memfds[i]);
-+
-+	return 0;
-+}
-+
-+int kdbus_test_fd_passing(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *conn_src, *conn_dst;
-+	const char *str = "stackenblocken";
-+	const struct kdbus_item *item;
-+	struct kdbus_msg *msg;
-+	unsigned int i;
-+	uint64_t now;
-+	int fds_conn[2];
-+	int sock_pair[2];
-+	int fds[2];
-+	int memfd;
-+	int ret;
-+
-+	now = (uint64_t) time(NULL);
-+
-+	/* create two connections */
-+	conn_src = kdbus_hello(env->buspath, 0, NULL, 0);
-+	conn_dst = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn_src && conn_dst);
-+
-+	fds_conn[0] = conn_src->fd;
-+	fds_conn[1] = conn_dst->fd;
-+
-+	ret = socketpair(AF_UNIX, SOCK_STREAM, 0, sock_pair);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Setup memfd */
-+	memfd = memfd_write("memfd-name", &now, sizeof(now));
-+	ASSERT_RETURN(memfd >= 0);
-+
-+	/* Setup pipes */
-+	ret = pipe(fds);
-+	ASSERT_RETURN(ret == 0);
-+
-+	i = write(fds[1], str, strlen(str));
-+	ASSERT_RETURN(i == strlen(str));
-+
-+	/*
-+	 * Try to pass the handle of a connection as message payload.
-+	 * This must fail.
-+	 */
-+	ret = send_fds(conn_src, conn_dst->id, fds_conn, 2);
-+	ASSERT_RETURN(ret == -ENOTSUP);
-+
-+	ret = send_fds(conn_dst, conn_src->id, fds_conn, 2);
-+	ASSERT_RETURN(ret == -ENOTSUP);
-+
-+	ret = send_fds(conn_src, conn_dst->id, sock_pair, 2);
-+	ASSERT_RETURN(ret == -ENOTSUP);
-+
-+	/*
-+	 * Send fds and memfds to a connection that does not accept fds
-+	 */
-+	ret = kdbus_test_no_fds(env, fds, (int *)&memfd);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Try to broadcast file descriptors. This must fail. */
-+	ret = send_fds(conn_src, KDBUS_DST_ID_BROADCAST, fds, 1);
-+	ASSERT_RETURN(ret == -ENOTUNIQ);
-+
-+	/* Try to broadcast memfd. This must succeed. */
-+	ret = send_memfds(conn_src, KDBUS_DST_ID_BROADCAST, (int *)&memfd, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Open code this loop */
-+loop_send_fds:
-+
-+	/*
-+	 * Send the read end of the pipe and close it.
-+	 */
-+	ret = send_fds(conn_src, conn_dst->id, fds, 1);
-+	ASSERT_RETURN(ret == 0);
-+	close(fds[0]);
-+
-+	ret = kdbus_msg_recv(conn_dst, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	KDBUS_ITEM_FOREACH(item, msg, items) {
-+		if (item->type == KDBUS_ITEM_FDS) {
-+			char tmp[14];
-+			int nfds = (item->size - KDBUS_ITEM_HEADER_SIZE) /
-+					sizeof(int);
-+			ASSERT_RETURN(nfds == 1);
-+
-+			i = read(item->fds[0], tmp, sizeof(tmp));
-+			if (i != 0) {
-+				ASSERT_RETURN(i == sizeof(tmp));
-+				ASSERT_RETURN(memcmp(tmp, str, sizeof(tmp)) == 0);
-+
-+				/* Write EOF */
-+				close(fds[1]);
-+
-+				/*
-+				 * Resend the read end of the pipe,
-+				 * the receiver still holds a reference
-+				 * to it...
-+				 */
-+				goto loop_send_fds;
-+			}
-+
-+			/* Got EOF */
-+
-+			/*
-+			 * Close the last reference to the read end
-+			 * of the pipe, other references are
-+			 * automatically closed just after send.
-+			 */
-+			close(item->fds[0]);
-+		}
-+	}
-+
-+	/*
-+	 * Try to resend the read end of the pipe. Must fail with
-+	 * -EBADF since both the sender and receiver closed their
-+	 * references to it. We assume the above since sender and
-+	 * receiver are in the same process.
-+	 */
-+	ret = send_fds(conn_src, conn_dst->id, fds, 1);
-+	ASSERT_RETURN(ret == -EBADF);
-+
-+	/* Then we clear out any received data... */
-+	kdbus_msg_free(msg);
-+
-+	ret = kdbus_send_multiple_fds(conn_src, conn_dst);
-+	ASSERT_RETURN(ret == 0);
-+
-+	close(sock_pair[0]);
-+	close(sock_pair[1]);
-+	close(memfd);
-+
-+	kdbus_conn_free(conn_src);
-+	kdbus_conn_free(conn_dst);
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-free.c b/tools/testing/selftests/kdbus/test-free.c
-new file mode 100644
-index 0000000..f666da3
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-free.c
-@@ -0,0 +1,64 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+static int sample_ioctl_call(struct kdbus_test_env *env)
-+{
-+	int ret;
-+	struct kdbus_cmd_list cmd_list = {
-+		.flags = KDBUS_LIST_QUEUED,
-+		.size = sizeof(cmd_list),
-+	};
-+
-+	ret = kdbus_cmd_list(env->conn->fd, &cmd_list);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* DON'T FREE THIS SLICE OF MEMORY! */
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_free(struct kdbus_test_env *env)
-+{
-+	int ret;
-+	struct kdbus_cmd_free cmd_free = {};
-+
-+	/* free an unallocated buffer */
-+	cmd_free.size = sizeof(cmd_free);
-+	cmd_free.flags = 0;
-+	cmd_free.offset = 0;
-+	ret = kdbus_cmd_free(env->conn->fd, &cmd_free);
-+	ASSERT_RETURN(ret == -ENXIO);
-+
-+	/* free a buffer out of the pool's bounds */
-+	cmd_free.size = sizeof(cmd_free);
-+	cmd_free.offset = POOL_SIZE + 1;
-+	ret = kdbus_cmd_free(env->conn->fd, &cmd_free);
-+	ASSERT_RETURN(ret == -ENXIO);
-+
-+	/*
-+	 * The user application is responsible for freeing the allocated
-+	 * memory with the KDBUS_CMD_FREE ioctl, so let's test what happens
-+	 * if we forget about it.
-+	 */
-+
-+	ret = sample_ioctl_call(env);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = sample_ioctl_call(env);
-+	ASSERT_RETURN(ret == 0);
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-match.c b/tools/testing/selftests/kdbus/test-match.c
-new file mode 100644
-index 0000000..2360dc1
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-match.c
-@@ -0,0 +1,441 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+int kdbus_test_match_id_add(struct kdbus_test_env *env)
-+{
-+	struct {
-+		struct kdbus_cmd_match cmd;
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			struct kdbus_notify_id_change chg;
-+		} item;
-+	} buf;
-+	struct kdbus_conn *conn;
-+	struct kdbus_msg *msg;
-+	int ret;
-+
-+	memset(&buf, 0, sizeof(buf));
-+
-+	buf.cmd.size = sizeof(buf);
-+	buf.cmd.cookie = 0xdeafbeefdeaddead;
-+	buf.item.size = sizeof(buf.item);
-+	buf.item.type = KDBUS_ITEM_ID_ADD;
-+	buf.item.chg.id = KDBUS_MATCH_ID_ANY;
-+
-+	/* match on id add */
-+	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* create 2nd connection */
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn != NULL);
-+
-+	/* 1st connection should have received a notification */
-+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_ADD);
-+	ASSERT_RETURN(msg->items[0].id_change.id == conn->id);
-+
-+	kdbus_conn_free(conn);
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_match_id_remove(struct kdbus_test_env *env)
-+{
-+	struct {
-+		struct kdbus_cmd_match cmd;
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			struct kdbus_notify_id_change chg;
-+		} item;
-+	} buf;
-+	struct kdbus_conn *conn;
-+	struct kdbus_msg *msg;
-+	size_t id;
-+	int ret;
-+
-+	/* create 2nd connection */
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn != NULL);
-+	id = conn->id;
-+
-+	memset(&buf, 0, sizeof(buf));
-+	buf.cmd.size = sizeof(buf);
-+	buf.cmd.cookie = 0xdeafbeefdeaddead;
-+	buf.item.size = sizeof(buf.item);
-+	buf.item.type = KDBUS_ITEM_ID_REMOVE;
-+	buf.item.chg.id = id;
-+
-+	/* register match on 2nd connection */
-+	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* remove 2nd connection again */
-+	kdbus_conn_free(conn);
-+
-+	/* 1st connection should have received a notification */
-+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_REMOVE);
-+	ASSERT_RETURN(msg->items[0].id_change.id == id);
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_match_replace(struct kdbus_test_env *env)
-+{
-+	struct {
-+		struct kdbus_cmd_match cmd;
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			struct kdbus_notify_id_change chg;
-+		} item;
-+	} buf;
-+	struct kdbus_conn *conn;
-+	struct kdbus_msg *msg;
-+	size_t id;
-+	int ret;
-+
-+	/* add a match to id_add */
-+	ASSERT_RETURN(kdbus_test_match_id_add(env) == TEST_OK);
-+
-+	/* do a replace of the match from id_add to id_remove */
-+	memset(&buf, 0, sizeof(buf));
-+
-+	buf.cmd.size = sizeof(buf);
-+	buf.cmd.cookie = 0xdeafbeefdeaddead;
-+	buf.cmd.flags = KDBUS_MATCH_REPLACE;
-+	buf.item.size = sizeof(buf.item);
-+	buf.item.type = KDBUS_ITEM_ID_REMOVE;
-+	buf.item.chg.id = KDBUS_MATCH_ID_ANY;
-+
-+	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+
-+	/* create 2nd connection */
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn != NULL);
-+	id = conn->id;
-+
-+	/* 1st connection should _not_ have received a notification */
-+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+	ASSERT_RETURN(ret != 0);
-+
-+	/* remove 2nd connection */
-+	kdbus_conn_free(conn);
-+
-+	/* 1st connection should _now_ have received a notification */
-+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_REMOVE);
-+	ASSERT_RETURN(msg->items[0].id_change.id == id);
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_match_name_add(struct kdbus_test_env *env)
-+{
-+	struct {
-+		struct kdbus_cmd_match cmd;
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			struct kdbus_notify_name_change chg;
-+		} item;
-+		char name[64];
-+	} buf;
-+	struct kdbus_msg *msg;
-+	char *name;
-+	int ret;
-+
-+	name = "foo.bla.blaz";
-+
-+	/* install the match rule */
-+	memset(&buf, 0, sizeof(buf));
-+	buf.item.type = KDBUS_ITEM_NAME_ADD;
-+	buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
-+	buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
-+	strncpy(buf.name, name, sizeof(buf.name) - 1);
-+	buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
-+	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
-+
-+	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* acquire the name */
-+	ret = kdbus_name_acquire(env->conn, name, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* we should have received a notification */
-+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_ADD);
-+	ASSERT_RETURN(msg->items[0].name_change.old_id.id == 0);
-+	ASSERT_RETURN(msg->items[0].name_change.new_id.id == env->conn->id);
-+	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_match_name_remove(struct kdbus_test_env *env)
-+{
-+	struct {
-+		struct kdbus_cmd_match cmd;
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			struct kdbus_notify_name_change chg;
-+		} item;
-+		char name[64];
-+	} buf;
-+	struct kdbus_msg *msg;
-+	char *name;
-+	int ret;
-+
-+	name = "foo.bla.blaz";
-+
-+	/* acquire the name */
-+	ret = kdbus_name_acquire(env->conn, name, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* install the match rule */
-+	memset(&buf, 0, sizeof(buf));
-+	buf.item.type = KDBUS_ITEM_NAME_REMOVE;
-+	buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
-+	buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
-+	strncpy(buf.name, name, sizeof(buf.name) - 1);
-+	buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
-+	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
-+
-+	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* release the name again */
-+	ret = kdbus_name_release(env->conn, name);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* we should have received a notification */
-+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_REMOVE);
-+	ASSERT_RETURN(msg->items[0].name_change.old_id.id == env->conn->id);
-+	ASSERT_RETURN(msg->items[0].name_change.new_id.id == 0);
-+	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_match_name_change(struct kdbus_test_env *env)
-+{
-+	struct {
-+		struct kdbus_cmd_match cmd;
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			struct kdbus_notify_name_change chg;
-+		} item;
-+		char name[64];
-+	} buf;
-+	struct kdbus_conn *conn;
-+	struct kdbus_msg *msg;
-+	uint64_t flags;
-+	char *name = "foo.bla.baz";
-+	int ret;
-+
-+	/* acquire the name */
-+	ret = kdbus_name_acquire(env->conn, name, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* install the match rule */
-+	memset(&buf, 0, sizeof(buf));
-+	buf.item.type = KDBUS_ITEM_NAME_CHANGE;
-+	buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
-+	buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
-+	strncpy(buf.name, name, sizeof(buf.name) - 1);
-+	buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
-+	buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
-+
-+	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* create a 2nd connection */
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn != NULL);
-+
-+	/* allow the new connection to own the same name */
-+	/* queue the 2nd connection as waiting owner */
-+	flags = KDBUS_NAME_QUEUE;
-+	ret = kdbus_name_acquire(conn, name, &flags);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(flags & KDBUS_NAME_IN_QUEUE);
-+
-+	/* release name from 1st connection */
-+	ret = kdbus_name_release(env->conn, name);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* we should have received a notification */
-+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_CHANGE);
-+	ASSERT_RETURN(msg->items[0].name_change.old_id.id == env->conn->id);
-+	ASSERT_RETURN(msg->items[0].name_change.new_id.id == conn->id);
-+	ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
-+
-+	kdbus_conn_free(conn);
-+
-+	return TEST_OK;
-+}
-+
-+static int send_bloom_filter(const struct kdbus_conn *conn,
-+			     uint64_t cookie,
-+			     const uint8_t *filter,
-+			     size_t filter_size,
-+			     uint64_t filter_generation)
-+{
-+	struct kdbus_cmd_send cmd = {};
-+	struct kdbus_msg *msg;
-+	struct kdbus_item *item;
-+	uint64_t size;
-+	int ret;
-+
-+	size = sizeof(struct kdbus_msg);
-+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + filter_size;
-+
-+	msg = alloca(size);
-+
-+	memset(msg, 0, size);
-+	msg->size = size;
-+	msg->src_id = conn->id;
-+	msg->dst_id = KDBUS_DST_ID_BROADCAST;
-+	msg->flags = KDBUS_MSG_SIGNAL;
-+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+	msg->cookie = cookie;
-+
-+	item = msg->items;
-+	item->type = KDBUS_ITEM_BLOOM_FILTER;
-+	item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) +
-+				filter_size;
-+
-+	item->bloom_filter.generation = filter_generation;
-+	memcpy(item->bloom_filter.data, filter, filter_size);
-+
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)msg;
-+
-+	ret = kdbus_cmd_send(conn->fd, &cmd);
-+	if (ret < 0) {
-+		kdbus_printf("error sending message: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	return 0;
-+}
-+
-+int kdbus_test_match_bloom(struct kdbus_test_env *env)
-+{
-+	struct {
-+		struct kdbus_cmd_match cmd;
-+		struct {
-+			uint64_t size;
-+			uint64_t type;
-+			uint8_t data_gen0[64];
-+			uint8_t data_gen1[64];
-+		} item;
-+	} buf;
-+	struct kdbus_conn *conn;
-+	struct kdbus_msg *msg;
-+	uint64_t cookie = 0xf000f00f;
-+	uint8_t filter[64];
-+	int ret;
-+
-+	/* install the match rule */
-+	memset(&buf, 0, sizeof(buf));
-+	buf.cmd.size = sizeof(buf);
-+
-+	buf.item.size = sizeof(buf.item);
-+	buf.item.type = KDBUS_ITEM_BLOOM_MASK;
-+	buf.item.data_gen0[0] = 0x55;
-+	buf.item.data_gen0[63] = 0x80;
-+
-+	buf.item.data_gen1[1] = 0xaa;
-+	buf.item.data_gen1[9] = 0x02;
-+
-+	ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* create a 2nd connection */
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn != NULL);
-+
-+	/* a message with a 0'ed out filter must not reach the other peer */
-+	memset(filter, 0, sizeof(filter));
-+	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	/* now set the filter to the connection's mask and expect success */
-+	filter[0] = 0x55;
-+	filter[63] = 0x80;
-+	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->cookie == cookie);
-+
-+	/* broaden the filter and try again. this should also succeed. */
-+	filter[0] = 0xff;
-+	filter[8] = 0xff;
-+	filter[63] = 0xff;
-+	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->cookie == cookie);
-+
-+	/* the same filter must not match against bloom generation 1 */
-+	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	/* set a different filter and try again */
-+	filter[1] = 0xaa;
-+	filter[9] = 0x02;
-+	ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->cookie == cookie);
-+
-+	kdbus_conn_free(conn);
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-message.c b/tools/testing/selftests/kdbus/test-message.c
-new file mode 100644
-index 0000000..563dc85
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-message.c
-@@ -0,0 +1,734 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <time.h>
-+#include <stdbool.h>
-+#include <sys/eventfd.h>
-+#include <sys/types.h>
-+#include <sys/wait.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+/* maximum number of queued messages from the same individual user */
-+#define KDBUS_CONN_MAX_MSGS			256
-+
-+/* maximum number of queued requests waiting for a reply */
-+#define KDBUS_CONN_MAX_REQUESTS_PENDING		128
-+
-+/* maximum message payload size */
-+#define KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE		(2 * 1024UL * 1024UL)
-+
-+int kdbus_test_message_basic(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *conn;
-+	struct kdbus_conn *sender;
-+	struct kdbus_msg *msg;
-+	uint64_t cookie = 0x1234abcd5678eeff;
-+	uint64_t offset;
-+	int ret;
-+
-+	sender = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(sender != NULL);
-+
-+	/* create a 2nd connection */
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn != NULL);
-+
-+	ret = kdbus_add_match_empty(conn);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_add_match_empty(sender);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* send over 1st connection */
-+	ret = kdbus_msg_send(sender, NULL, cookie, 0, 0, 0,
-+			     KDBUS_DST_ID_BROADCAST);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Make sure that we do get our own broadcasts */
-+	ret = kdbus_msg_recv(sender, &msg, &offset);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->cookie == cookie);
-+
-+	kdbus_msg_free(msg);
-+
-+	/* ... and receive on the 2nd */
-+	ret = kdbus_msg_recv_poll(conn, 100, &msg, &offset);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->cookie == cookie);
-+
-+	kdbus_msg_free(msg);
-+
-+	/* Msgs that expect a reply must have timeout and cookie */
-+	ret = kdbus_msg_send(sender, NULL, 0, KDBUS_MSG_EXPECT_REPLY,
-+			     0, 0, conn->id);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	/* Faked replies with a valid reply cookie are rejected */
-+	ret = kdbus_msg_send_reply(conn, time(NULL) ^ cookie, sender->id);
-+	ASSERT_RETURN(ret == -EBADSLT);
-+
-+	ret = kdbus_free(conn, offset);
-+	ASSERT_RETURN(ret == 0);
-+
-+	kdbus_conn_free(sender);
-+	kdbus_conn_free(conn);
-+
-+	return TEST_OK;
-+}
-+
-+static int msg_recv_prio(struct kdbus_conn *conn,
-+			 int64_t requested_prio,
-+			 int64_t expected_prio)
-+{
-+	struct kdbus_cmd_recv recv = {
-+		.size = sizeof(recv),
-+		.flags = KDBUS_RECV_USE_PRIORITY,
-+		.priority = requested_prio,
-+	};
-+	struct kdbus_msg *msg;
-+	int ret;
-+
-+	ret = kdbus_cmd_recv(conn->fd, &recv);
-+	if (ret < 0) {
-+		kdbus_printf("error receiving message: %d (%m)\n", -errno);
-+		return ret;
-+	}
-+
-+	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
-+	kdbus_msg_dump(conn, msg);
-+
-+	if (msg->priority != expected_prio) {
-+		kdbus_printf("expected message prio %lld, got %lld\n",
-+			     (long long) expected_prio,
-+			     (long long) msg->priority);
-+		return -EINVAL;
-+	}
-+
-+	kdbus_msg_free(msg);
-+	ret = kdbus_free(conn, recv.msg.offset);
-+	if (ret < 0)
-+		return ret;
-+
-+	return 0;
-+}
-+
-+int kdbus_test_message_prio(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *a, *b;
-+	uint64_t cookie = 0;
-+
-+	a = kdbus_hello(env->buspath, 0, NULL, 0);
-+	b = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(a && b);
-+
-+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,   25, a->id) == 0);
-+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -600, a->id) == 0);
-+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,   10, a->id) == 0);
-+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,  -35, a->id) == 0);
-+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -100, a->id) == 0);
-+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,   20, a->id) == 0);
-+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,  -15, a->id) == 0);
-+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -800, a->id) == 0);
-+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -150, a->id) == 0);
-+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,   10, a->id) == 0);
-+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -800, a->id) == 0);
-+	ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0,  -10, a->id) == 0);
-+
-+	ASSERT_RETURN(msg_recv_prio(a, -200, -800) == 0);
-+	ASSERT_RETURN(msg_recv_prio(a, -100, -800) == 0);
-+	ASSERT_RETURN(msg_recv_prio(a, -400, -600) == 0);
-+	ASSERT_RETURN(msg_recv_prio(a, -400, -600) == -EAGAIN);
-+	ASSERT_RETURN(msg_recv_prio(a, 10, -150) == 0);
-+	ASSERT_RETURN(msg_recv_prio(a, 10, -100) == 0);
-+
-+	kdbus_printf("--- get priority (all)\n");
-+	ASSERT_RETURN(kdbus_msg_recv(a, NULL, NULL) == 0);
-+
-+	kdbus_conn_free(a);
-+	kdbus_conn_free(b);
-+
-+	return TEST_OK;
-+}
-+
-+static int kdbus_test_notify_kernel_quota(struct kdbus_test_env *env)
-+{
-+	int ret;
-+	unsigned int i;
-+	struct kdbus_conn *conn;
-+	struct kdbus_conn *reader;
-+	struct kdbus_msg *msg = NULL;
-+	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
-+
-+	reader = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(reader);
-+
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn);
-+
-+	/* Register for ID signals */
-+	ret = kdbus_add_match_id(reader, 0x1, KDBUS_ITEM_ID_ADD,
-+				 KDBUS_MATCH_ID_ANY);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_add_match_id(reader, 0x2, KDBUS_ITEM_ID_REMOVE,
-+				 KDBUS_MATCH_ID_ANY);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Each iteration two notifications: add and remove ID */
-+	for (i = 0; i < KDBUS_CONN_MAX_MSGS / 2; i++) {
-+		struct kdbus_conn *notifier;
-+
-+		notifier = kdbus_hello(env->buspath, 0, NULL, 0);
-+		ASSERT_RETURN(notifier);
-+
-+		kdbus_conn_free(notifier);
-+	}
-+
-+	/*
-+	 * Now the reader queue is full with kernel notifications,
-+	 * but as a user we still have room to push our messages.
-+	 */
-+	ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0, 0, reader->id);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* More ID kernel notifications that will be lost */
-+	kdbus_conn_free(conn);
-+
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn);
-+
-+	kdbus_conn_free(conn);
-+
-+	/*
-+	 * We lost only 3 packets, since only signal msgs are
-+	 * accounted: the connection ID add/remove notifications.
-+	 */
-+	ret = kdbus_cmd_recv(reader->fd, &recv);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(recv.return_flags & KDBUS_RECV_RETURN_DROPPED_MSGS);
-+	ASSERT_RETURN(recv.dropped_msgs == 3);
-+
-+	msg = (struct kdbus_msg *)(reader->buf + recv.msg.offset);
-+	kdbus_msg_free(msg);
-+
-+	/* Read our queue */
-+	for (i = 0; i < KDBUS_CONN_MAX_MSGS - 1; i++) {
-+		memset(&recv, 0, sizeof(recv));
-+		recv.size = sizeof(recv);
-+
-+		ret = kdbus_cmd_recv(reader->fd, &recv);
-+		ASSERT_RETURN(ret == 0);
-+		ASSERT_RETURN(!(recv.return_flags &
-+			        KDBUS_RECV_RETURN_DROPPED_MSGS));
-+
-+		msg = (struct kdbus_msg *)(reader->buf + recv.msg.offset);
-+		kdbus_msg_free(msg);
-+	}
-+
-+	ret = kdbus_msg_recv(reader, NULL, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(reader, NULL, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	kdbus_conn_free(reader);
-+
-+	return 0;
-+}
-+
-+/* Return the number of messages successfully sent */
-+static int kdbus_fill_conn_queue(struct kdbus_conn *conn_src,
-+				 uint64_t dst_id,
-+				 unsigned int max_msgs)
-+{
-+	unsigned int i;
-+	uint64_t cookie = 0;
-+	size_t size;
-+	struct kdbus_cmd_send cmd = {};
-+	struct kdbus_msg *msg;
-+	int ret;
-+
-+	size = sizeof(struct kdbus_msg);
-+	msg = malloc(size);
-+	ASSERT_RETURN_VAL(msg, -ENOMEM);
-+
-+	memset(msg, 0, size);
-+	msg->size = size;
-+	msg->src_id = conn_src->id;
-+	msg->dst_id = dst_id;
-+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)msg;
-+
-+	for (i = 0; i < max_msgs; i++) {
-+		msg->cookie = cookie++;
-+		ret = kdbus_cmd_send(conn_src->fd, &cmd);
-+		if (ret < 0)
-+			break;
-+	}
-+
-+	free(msg);
-+
-+	return i;
-+}
-+
-+static int kdbus_test_activator_quota(struct kdbus_test_env *env)
-+{
-+	int ret;
-+	unsigned int i;
-+	unsigned int activator_msgs_count = 0;
-+	uint64_t cookie = time(NULL);
-+	struct kdbus_conn *conn;
-+	struct kdbus_conn *sender;
-+	struct kdbus_conn *activator;
-+	struct kdbus_msg *msg;
-+	uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
-+	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
-+	struct kdbus_policy_access access = {
-+		.type = KDBUS_POLICY_ACCESS_USER,
-+		.id = geteuid(),
-+		.access = KDBUS_POLICY_OWN,
-+	};
-+
-+	activator = kdbus_hello_activator(env->buspath, "foo.test.activator",
-+					  &access, 1);
-+	ASSERT_RETURN(activator);
-+
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	sender = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn && sender);
-+
-+	ret = kdbus_list(sender, KDBUS_LIST_NAMES |
-+				 KDBUS_LIST_UNIQUE |
-+				 KDBUS_LIST_ACTIVATORS |
-+				 KDBUS_LIST_QUEUED);
-+	ASSERT_RETURN(ret == 0);
-+
-+	for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
-+		ret = kdbus_msg_send(sender, "foo.test.activator",
-+				     cookie++, 0, 0, 0,
-+				     KDBUS_DST_ID_NAME);
-+		if (ret < 0)
-+			break;
-+		activator_msgs_count++;
-+	}
-+
-+	/* we must have sent at least one message */
-+	ASSERT_RETURN_VAL(i > 0, -errno);
-+	ASSERT_RETURN(ret == -ENOBUFS);
-+
-+	/* Good, activator queue is full now */
-+
-+	/* ENXIO on direct send (activators can never be addressed by ID) */
-+	ret = kdbus_msg_send(conn, NULL, cookie++, 0, 0, 0, activator->id);
-+	ASSERT_RETURN(ret == -ENXIO);
-+
-+	/* can't queue more */
-+	ret = kdbus_msg_send(conn, "foo.test.activator", cookie++,
-+			     0, 0, 0, KDBUS_DST_ID_NAME);
-+	ASSERT_RETURN(ret == -ENOBUFS);
-+
-+	/* no match installed, so the broadcast will not inc dropped_msgs */
-+	ret = kdbus_msg_send(sender, NULL, cookie++, 0, 0, 0,
-+			     KDBUS_DST_ID_BROADCAST);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Check activator queue */
-+	ret = kdbus_cmd_recv(activator->fd, &recv);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(recv.dropped_msgs == 0);
-+
-+	activator_msgs_count--;
-+
-+	msg = (struct kdbus_msg *)(activator->buf + recv.msg.offset);
-+	kdbus_msg_free(msg);
-+
-+
-+	/* Stage 1) of the test: check the pool memory quota */
-+
-+	/* Consume the connection pool memory */
-+	for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
-+		ret = kdbus_msg_send(sender, NULL,
-+				     cookie++, 0, 0, 0, conn->id);
-+		if (ret < 0)
-+			break;
-+	}
-+
-+	/* consume one message, so later at least one can be moved */
-+	memset(&recv, 0, sizeof(recv));
-+	recv.size = sizeof(recv);
-+	ret = kdbus_cmd_recv(conn->fd, &recv);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(recv.dropped_msgs == 0);
-+	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
-+	kdbus_msg_free(msg);
-+
-+	/* Try to acquire the name now */
-+	ret = kdbus_name_acquire(conn, "foo.test.activator", &flags);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* try to read messages and see if we have lost some */
-+	memset(&recv, 0, sizeof(recv));
-+	recv.size = sizeof(recv);
-+	ret = kdbus_cmd_recv(conn->fd, &recv);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(recv.dropped_msgs != 0);
-+
-+	/* number of dropped msgs < received ones (at least one was moved) */
-+	ASSERT_RETURN(recv.dropped_msgs < activator_msgs_count);
-+
-+	/* Deduct the number of dropped msgs from the activator msgs */
-+	activator_msgs_count -= recv.dropped_msgs;
-+
-+	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
-+	kdbus_msg_free(msg);
-+
-+	/*
-+	 * Release the name and hand it back to activator, now
-+	 * we should have 'activator_msgs_count' msgs again in
-+	 * the activator queue
-+	 */
-+	ret = kdbus_name_release(conn, "foo.test.activator");
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* make sure that we got our previous activator msgs */
-+	ret = kdbus_msg_recv(activator, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->src_id == sender->id);
-+
-+	activator_msgs_count--;
-+
-+	kdbus_msg_free(msg);
-+
-+
-+	/* Stage 2) of the test: check the max message quota */
-+
-+	/* Empty conn queue */
-+	for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
-+		ret = kdbus_msg_recv(conn, NULL, NULL);
-+		if (ret == -EAGAIN)
-+			break;
-+	}
-+
-+	/* fill queue with max msgs quota */
-+	ret = kdbus_fill_conn_queue(sender, conn->id, KDBUS_CONN_MAX_MSGS);
-+	ASSERT_RETURN(ret == KDBUS_CONN_MAX_MSGS);
-+
-+	/* This one is lost but it is not accounted */
-+	ret = kdbus_msg_send(sender, NULL,
-+			     cookie++, 0, 0, 0, conn->id);
-+	ASSERT_RETURN(ret == -ENOBUFS);
-+
-+	/* Acquire the name again */
-+	ret = kdbus_name_acquire(conn, "foo.test.activator", &flags);
-+	ASSERT_RETURN(ret == 0);
-+
-+	memset(&recv, 0, sizeof(recv));
-+	recv.size = sizeof(recv);
-+
-+	/*
-+	 * Try to read messages and make sure that we have lost all
-+	 * the activator messages due to quota checks. Our queue is
-+	 * already full.
-+	 */
-+	ret = kdbus_cmd_recv(conn->fd, &recv);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(recv.dropped_msgs == activator_msgs_count);
-+
-+	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
-+	kdbus_msg_free(msg);
-+
-+	kdbus_conn_free(sender);
-+	kdbus_conn_free(conn);
-+	kdbus_conn_free(activator);
-+
-+	return 0;
-+}
-+
-+static int kdbus_test_expected_reply_quota(struct kdbus_test_env *env)
-+{
-+	int ret;
-+	unsigned int i, n;
-+	unsigned int count;
-+	uint64_t cookie = 0x1234abcd5678eeff;
-+	struct kdbus_conn *conn;
-+	struct kdbus_conn *connections[9];
-+
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn);
-+
-+	for (i = 0; i < 9; i++) {
-+		connections[i] = kdbus_hello(env->buspath, 0, NULL, 0);
-+		ASSERT_RETURN(connections[i]);
-+	}
-+
-+	count = 0;
-+	/* Send 16 messages to each of 8 different connections */
-+	for (i = 0; i < 8; i++) {
-+		for (n = 0; n < 16; n++) {
-+			ret = kdbus_msg_send(conn, NULL, cookie++,
-+					     KDBUS_MSG_EXPECT_REPLY,
-+					     100000000ULL, 0,
-+					     connections[i]->id);
-+			if (ret < 0)
-+				break;
-+
-+			count++;
-+		}
-+	}
-+
-+	/*
-+	 * We should have queued exactly
-+	 * KDBUS_CONN_MAX_REQUESTS_PENDING method calls
-+	 */
-+	ASSERT_RETURN(count == KDBUS_CONN_MAX_REQUESTS_PENDING);
-+
-+	/*
-+	 * Now try to send a message to the last connection,
-+	 * if we have reached KDBUS_CONN_MAX_REQUESTS_PENDING
-+	 * no further requests are allowed
-+	 */
-+	ret = kdbus_msg_send(conn, NULL, cookie++, KDBUS_MSG_EXPECT_REPLY,
-+			     1000000000ULL, 0, connections[8]->id);
-+	ASSERT_RETURN(ret == -EMLINK);
-+
-+	for (i = 0; i < 9; i++)
-+		kdbus_conn_free(connections[i]);
-+
-+	kdbus_conn_free(conn);
-+
-+	return 0;
-+}
-+
-+int kdbus_test_pool_quota(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *a, *b, *c;
-+	struct kdbus_cmd_send cmd = {};
-+	struct kdbus_item *item;
-+	struct kdbus_msg *recv_msg;
-+	struct kdbus_msg *msg;
-+	uint64_t cookie = time(NULL);
-+	uint64_t size;
-+	unsigned int i;
-+	char *payload;
-+	int ret;
-+
-+	/* just a guard */
-+	if (POOL_SIZE <= KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE ||
-+	    POOL_SIZE % KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE != 0)
-+		return 0;
-+
-+	payload = calloc(KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE, sizeof(char));
-+	ASSERT_RETURN_VAL(payload, -ENOMEM);
-+
-+	a = kdbus_hello(env->buspath, 0, NULL, 0);
-+	b = kdbus_hello(env->buspath, 0, NULL, 0);
-+	c = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(a && b && c);
-+
-+	size = sizeof(struct kdbus_msg);
-+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+	msg = malloc(size);
-+	ASSERT_RETURN_VAL(msg, -ENOMEM);
-+
-+	memset(msg, 0, size);
-+	msg->size = size;
-+	msg->src_id = a->id;
-+	msg->dst_id = c->id;
-+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+	item = msg->items;
-+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+	item->vec.address = (uintptr_t)payload;
-+	item->vec.size = KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE;
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)msg;
-+
-+	/*
-+	 * Send 2097248 bytes per message. A user is only allowed to get
-+	 * 33% of half of the free space of the pool; the already used
-+	 * space is accounted as free space.
-+	 */
-+	size += KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE;
-+	for (i = size; i < (POOL_SIZE / 2 / 3); i += size) {
-+		msg->cookie = cookie++;
-+
-+		ret = kdbus_cmd_send(a->fd, &cmd);
-+		ASSERT_RETURN_VAL(ret == 0, ret);
-+	}
-+
-+	/* Try to get more than 33% */
-+	msg->cookie = cookie++;
-+	ret = kdbus_cmd_send(a->fd, &cmd);
-+	ASSERT_RETURN(ret == -ENOBUFS);
-+
-+	/* We still can pass small messages */
-+	ret = kdbus_msg_send(b, NULL, cookie++, 0, 0, 0, c->id);
-+	ASSERT_RETURN(ret == 0);
-+
-+	for (i = size; i < (POOL_SIZE / 2 / 3); i += size) {
-+		ret = kdbus_msg_recv(c, &recv_msg, NULL);
-+		ASSERT_RETURN(ret == 0);
-+		ASSERT_RETURN(recv_msg->src_id == a->id);
-+
-+		kdbus_msg_free(recv_msg);
-+	}
-+
-+	ret = kdbus_msg_recv(c, &recv_msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(recv_msg->src_id == b->id);
-+
-+	kdbus_msg_free(recv_msg);
-+
-+	ret = kdbus_msg_recv(c, NULL, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	free(msg);
-+	free(payload);
-+
-+	kdbus_conn_free(c);
-+	kdbus_conn_free(b);
-+	kdbus_conn_free(a);
-+
-+	return 0;
-+}
-+
-+int kdbus_test_message_quota(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *a, *b;
-+	uint64_t cookie = 0;
-+	int ret;
-+	int i;
-+
-+	ret = kdbus_test_activator_quota(env);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_test_notify_kernel_quota(env);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_test_pool_quota(env);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_test_expected_reply_quota(env);
-+	ASSERT_RETURN(ret == 0);
-+
-+	a = kdbus_hello(env->buspath, 0, NULL, 0);
-+	b = kdbus_hello(env->buspath, 0, NULL, 0);
-+
-+	ret = kdbus_fill_conn_queue(b, a->id, KDBUS_CONN_MAX_MSGS);
-+	ASSERT_RETURN(ret == KDBUS_CONN_MAX_MSGS);
-+
-+	ret = kdbus_msg_send(b, NULL, ++cookie, 0, 0, 0, a->id);
-+	ASSERT_RETURN(ret == -ENOBUFS);
-+
-+	for (i = 0; i < KDBUS_CONN_MAX_MSGS; ++i) {
-+		ret = kdbus_msg_recv(a, NULL, NULL);
-+		ASSERT_RETURN(ret == 0);
-+	}
-+
-+	ret = kdbus_msg_recv(a, NULL, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	ret = kdbus_fill_conn_queue(b, a->id, KDBUS_CONN_MAX_MSGS + 1);
-+	ASSERT_RETURN(ret == KDBUS_CONN_MAX_MSGS);
-+
-+	ret = kdbus_msg_send(b, NULL, ++cookie, 0, 0, 0, a->id);
-+	ASSERT_RETURN(ret == -ENOBUFS);
-+
-+	kdbus_conn_free(a);
-+	kdbus_conn_free(b);
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_memory_access(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *a, *b;
-+	struct kdbus_cmd_send cmd = {};
-+	struct kdbus_item *item;
-+	struct kdbus_msg *msg;
-+	uint64_t test_addr = 0;
-+	char line[256];
-+	uint64_t size;
-+	FILE *f;
-+	int ret;
-+
-+	/*
-+	 * Search in /proc/kallsyms for the address of a kernel symbol that
-+	 * should always be there, regardless of the config. Use that address
-+	 * in a PAYLOAD_VEC item and make sure it's inaccessible.
-+	 */
-+
-+	f = fopen("/proc/kallsyms", "r");
-+	if (!f)
-+		return TEST_SKIP;
-+
-+	while (fgets(line, sizeof(line), f)) {
-+		char *s = line;
-+
-+		if (!strsep(&s, " "))
-+			continue;
-+
-+		if (!strsep(&s, " "))
-+			continue;
-+
-+		if (!strncmp(s, "mutex_lock", 10)) {
-+			test_addr = strtoull(line, NULL, 16);
-+			break;
-+		}
-+	}
-+
-+	fclose(f);
-+
-+	if (!test_addr)
-+		return TEST_SKIP;
-+
-+	a = kdbus_hello(env->buspath, 0, NULL, 0);
-+	b = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(a && b);
-+
-+	size = sizeof(struct kdbus_msg);
-+	size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+	msg = alloca(size);
-+	ASSERT_RETURN_VAL(msg, -ENOMEM);
-+
-+	memset(msg, 0, size);
-+	msg->size = size;
-+	msg->src_id = a->id;
-+	msg->dst_id = b->id;
-+	msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+	item = msg->items;
-+	item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+	item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+	item->vec.address = test_addr;
-+	item->vec.size = sizeof(void*);
-+	item = KDBUS_ITEM_NEXT(item);
-+
-+	cmd.size = sizeof(cmd);
-+	cmd.msg_address = (uintptr_t)msg;
-+
-+	ret = kdbus_cmd_send(a->fd, &cmd);
-+	ASSERT_RETURN(ret == -EFAULT);
-+
-+	kdbus_conn_free(b);
-+	kdbus_conn_free(a);
-+
-+	return 0;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-metadata-ns.c b/tools/testing/selftests/kdbus/test-metadata-ns.c
-new file mode 100644
-index 0000000..1f6edc0
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-metadata-ns.c
-@@ -0,0 +1,500 @@
-+/*
-+ * Test metadata in new namespaces. Even if our tests can run
-+ * in a namespaced setup, this test is necessary so we can inspect
-+ * metadata on the same kdbusfs but between multiple namespaces
-+ */
-+
-+#include <stdio.h>
-+#include <string.h>
-+#include <sched.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <signal.h>
-+#include <sys/wait.h>
-+#include <sys/prctl.h>
-+#include <sys/eventfd.h>
-+#include <sys/syscall.h>
-+#include <sys/capability.h>
-+#include <linux/sched.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+static const struct kdbus_creds privileged_creds = {};
-+
-+static const struct kdbus_creds unmapped_creds = {
-+	.uid	= UNPRIV_UID,
-+	.euid	= UNPRIV_UID,
-+	.suid	= UNPRIV_UID,
-+	.fsuid	= UNPRIV_UID,
-+	.gid	= UNPRIV_GID,
-+	.egid	= UNPRIV_GID,
-+	.sgid	= UNPRIV_GID,
-+	.fsgid	= UNPRIV_GID,
-+};
-+
-+static const struct kdbus_pids unmapped_pids = {};
-+
-+/* Get only the first item */
-+static struct kdbus_item *kdbus_get_item(struct kdbus_msg *msg,
-+					 uint64_t type)
-+{
-+	struct kdbus_item *item;
-+
-+	KDBUS_ITEM_FOREACH(item, msg, items)
-+		if (item->type == type)
-+			return item;
-+
-+	return NULL;
-+}
-+
-+static int kdbus_match_kdbus_creds(struct kdbus_msg *msg,
-+				   const struct kdbus_creds *expected_creds)
-+{
-+	struct kdbus_item *item;
-+
-+	item = kdbus_get_item(msg, KDBUS_ITEM_CREDS);
-+	ASSERT_RETURN(item);
-+
-+	ASSERT_RETURN(memcmp(&item->creds, expected_creds,
-+			     sizeof(struct kdbus_creds)) == 0);
-+
-+	return 0;
-+}
-+
-+static int kdbus_match_kdbus_pids(struct kdbus_msg *msg,
-+				  const struct kdbus_pids *expected_pids)
-+{
-+	struct kdbus_item *item;
-+
-+	item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
-+	ASSERT_RETURN(item);
-+
-+	ASSERT_RETURN(memcmp(&item->pids, expected_pids,
-+			     sizeof(struct kdbus_pids)) == 0);
-+
-+	return 0;
-+}
-+
-+static int __kdbus_clone_userns_test(const char *bus,
-+				     struct kdbus_conn *conn,
-+				     uint64_t grandpa_pid,
-+				     int signal_fd)
-+{
-+	int clone_ret;
-+	int ret;
-+	struct kdbus_msg *msg = NULL;
-+	const struct kdbus_item *item;
-+	uint64_t cookie = time(NULL) ^ 0xdeadbeef;
-+	struct kdbus_conn *unpriv_conn = NULL;
-+	struct kdbus_pids parent_pids = {
-+		.pid = getppid(),
-+		.tid = getppid(),
-+		.ppid = grandpa_pid,
-+	};
-+
-+	ret = drop_privileges(UNPRIV_UID, UNPRIV_GID);
-+	ASSERT_EXIT(ret == 0);
-+
-+	unpriv_conn = kdbus_hello(bus, 0, NULL, 0);
-+	ASSERT_EXIT(unpriv_conn);
-+
-+	ret = kdbus_add_match_empty(unpriv_conn);
-+	ASSERT_EXIT(ret == 0);
-+
-+	/*
-+	 * ping privileged connection from this new unprivileged
-+	 * one
-+	 */
-+
-+	ret = kdbus_msg_send(unpriv_conn, NULL, cookie, 0, 0,
-+			     0, conn->id);
-+	ASSERT_EXIT(ret == 0);
-+
-+	/*
-+	 * Since we just dropped privileges, the dumpable flag
-+	 * was just cleared which makes the /proc/$clone_child/uid_map
-+	 * to be owned by root, hence any userns uid mapping will fail
-+	 * with -EPERM since the mapping will be done by uid 65534.
-+	 *
-+	 * To avoid this set the dumpable flag again which makes
-+	 * procfs update the /proc/$clone_child/ inodes owner to 65534.
-+	 *
-+	 * Using this we will be able write to /proc/$clone_child/uid_map
-+	 * as uid 65534 and map the uid 65534 to 0 inside the user namespace.
-+	 */
-+	ret = prctl(PR_SET_DUMPABLE, SUID_DUMP_USER);
-+	ASSERT_EXIT(ret == 0);
-+
-+	/* Make child privileged in its new userns and run tests */
-+
-+	ret = RUN_CLONE_CHILD(&clone_ret,
-+			      SIGCHLD | CLONE_NEWUSER | CLONE_NEWPID,
-+	({ 0;  /* Clone setup, nothing */ }),
-+	({
-+		eventfd_t event_status = 0;
-+		struct kdbus_conn *userns_conn;
-+
-+		/* ping connection from the new user namespace */
-+		userns_conn = kdbus_hello(bus, 0, NULL, 0);
-+		ASSERT_EXIT(userns_conn);
-+
-+		ret = kdbus_add_match_empty(userns_conn);
-+		ASSERT_EXIT(ret == 0);
-+
-+		cookie++;
-+		ret = kdbus_msg_send(userns_conn, NULL, cookie,
-+				     0, 0, 0, conn->id);
-+		ASSERT_EXIT(ret == 0);
-+
-+		/* Parent did send */
-+		ret = eventfd_read(signal_fd, &event_status);
-+		ASSERT_RETURN(ret >= 0 && event_status == 1);
-+
-+		/*
-+		 * Receive from privileged connection
-+		 */
-+		kdbus_printf("Privileged → unprivileged/privileged "
-+			     "in its userns "
-+			     "(different userns and pidns):\n");
-+		ret = kdbus_msg_recv_poll(userns_conn, 300, &msg, NULL);
-+		ASSERT_EXIT(ret == 0);
-+		ASSERT_EXIT(msg->dst_id == userns_conn->id);
-+
-+		item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
-+		ASSERT_EXIT(item);
-+
-+		/* uid/gid not mapped, so we have unpriv cached creds */
-+		ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
-+		ASSERT_EXIT(ret == 0);
-+
-+		/*
-+		 * Different pid namespaces. This is the child pidns
-+		 * so it should not see its parent kdbus_pids
-+		 */
-+		ret = kdbus_match_kdbus_pids(msg, &unmapped_pids);
-+		ASSERT_EXIT(ret == 0);
-+
-+		kdbus_msg_free(msg);
-+
-+
-+		/*
-+		 * Receive broadcast from privileged connection
-+		 */
-+		kdbus_printf("Privileged → unprivileged/privileged "
-+			     "in its userns "
-+			     "(different userns and pidns):\n");
-+		ret = kdbus_msg_recv_poll(userns_conn, 300, &msg, NULL);
-+		ASSERT_EXIT(ret == 0);
-+		ASSERT_EXIT(msg->dst_id == KDBUS_DST_ID_BROADCAST);
-+
-+		item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
-+		ASSERT_EXIT(item);
-+
-+		/* uid/gid not mapped, so we have unpriv cached creds */
-+		ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
-+		ASSERT_EXIT(ret == 0);
-+
-+		/*
-+		 * Different pid namespaces. This is the child pidns
-+		 * so it should not see its parent kdbus_pids
-+		 */
-+		ret = kdbus_match_kdbus_pids(msg, &unmapped_pids);
-+		ASSERT_EXIT(ret == 0);
-+
-+		kdbus_msg_free(msg);
-+
-+		kdbus_conn_free(userns_conn);
-+	}),
-+	({
-+		/* Parent setup: map child uid/gid */
-+		ret = userns_map_uid_gid(pid, "0 65534 1", "0 65534 1");
-+		ASSERT_EXIT(ret == 0);
-+	}),
-+	({ 0; }));
-+	/* Unprivileged was not able to create user namespace */
-+	if (clone_ret == -EPERM) {
-+		kdbus_printf("-- CLONE_NEWUSER TEST Failed for "
-+			     "uid: %u\n -- Make sure that your kernel "
-+			     "does not allow CLONE_NEWUSER for "
-+			     "unprivileged users\n", UNPRIV_UID);
-+		ret = 0;
-+		goto out;
-+	}
-+
-+	ASSERT_EXIT(ret == 0);
-+
-+
-+	/*
-+	 * Receive from privileged connection
-+	 */
-+	kdbus_printf("\nPrivileged → unprivileged (same namespaces):\n");
-+	ret = kdbus_msg_recv_poll(unpriv_conn, 300, &msg, NULL);
-+
-+	ASSERT_EXIT(ret == 0);
-+	ASSERT_EXIT(msg->dst_id == unpriv_conn->id);
-+
-+	/* will get the privileged creds */
-+	ret = kdbus_match_kdbus_creds(msg, &privileged_creds);
-+	ASSERT_EXIT(ret == 0);
-+
-+	/* Same pidns so will get the kdbus_pids */
-+	ret = kdbus_match_kdbus_pids(msg, &parent_pids);
-+	ASSERT_RETURN(ret == 0);
-+
-+	kdbus_msg_free(msg);
-+
-+
-+	/*
-+	 * Receive broadcast from privileged connection
-+	 */
-+	kdbus_printf("\nPrivileged → unprivileged (same namespaces):\n");
-+	ret = kdbus_msg_recv_poll(unpriv_conn, 300, &msg, NULL);
-+
-+	ASSERT_EXIT(ret == 0);
-+	ASSERT_EXIT(msg->dst_id == KDBUS_DST_ID_BROADCAST);
-+
-+	/* will get the privileged creds */
-+	ret = kdbus_match_kdbus_creds(msg, &privileged_creds);
-+	ASSERT_EXIT(ret == 0);
-+
-+	ret = kdbus_match_kdbus_pids(msg, &parent_pids);
-+	ASSERT_RETURN(ret == 0);
-+
-+	kdbus_msg_free(msg);
-+
-+out:
-+	kdbus_conn_free(unpriv_conn);
-+
-+	return ret;
-+}
-+
-+static int kdbus_clone_userns_test(const char *bus,
-+				   struct kdbus_conn *conn)
-+{
-+	int ret, status, efd;
-+	pid_t pid, ppid;
-+	uint64_t unpriv_conn_id, userns_conn_id;
-+	struct kdbus_msg *msg;
-+	const struct kdbus_item *item;
-+	struct kdbus_pids expected_pids;
-+	struct kdbus_conn *monitor;
-+
-+	kdbus_printf("STARTING TEST 'metadata-ns'.\n");
-+
-+	monitor = kdbus_hello(bus, KDBUS_HELLO_MONITOR, NULL, 0);
-+	ASSERT_EXIT(monitor);
-+
-+	/*
-+	 * parent will signal to child that is in its
-+	 * userns to read its queue
-+	 */
-+	efd = eventfd(0, EFD_CLOEXEC);
-+	ASSERT_RETURN_VAL(efd >= 0, efd);
-+
-+	ppid = getppid();
-+
-+	pid = fork();
-+	ASSERT_RETURN_VAL(pid >= 0, -errno);
-+
-+	if (pid == 0) {
-+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
-+		ASSERT_EXIT_VAL(ret == 0, -errno);
-+
-+		ret = __kdbus_clone_userns_test(bus, conn, ppid, efd);
-+		_exit(ret);
-+	}
-+
-+
-+	/* Phase 1) privileged receives from unprivileged */
-+
-+	/*
-+	 * Receive from the unprivileged child
-+	 */
-+	kdbus_printf("\nUnprivileged → privileged (same namespaces):\n");
-+	ret = kdbus_msg_recv_poll(conn, 300, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	unpriv_conn_id = msg->src_id;
-+
-+	/* Unprivileged user */
-+	ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Set the expected creds_pids */
-+	expected_pids = (struct kdbus_pids) {
-+		.pid = pid,
-+		.tid = pid,
-+		.ppid = getpid(),
-+	};
-+	ret = kdbus_match_kdbus_pids(msg, &expected_pids);
-+	ASSERT_RETURN(ret == 0);
-+
-+	kdbus_msg_free(msg);
-+
-+
-+	/*
-+	 * Receive from the unprivileged child that is in its own
-+	 * userns and pidns
-+	 */
-+
-+	kdbus_printf("\nUnprivileged/privileged in its userns → privileged "
-+		     "(different userns and pidns)\n");
-+	ret = kdbus_msg_recv_poll(conn, 300, &msg, NULL);
-+	if (ret == -ETIMEDOUT)
-+		/* perhaps unprivileged userns is not allowed */
-+		goto wait;
-+
-+	ASSERT_RETURN(ret == 0);
-+
-+	userns_conn_id = msg->src_id;
-+
-+	item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
-+	ASSERT_RETURN(item);
-+
-+	/*
-+	 * Compare received items, creds must be translated into
-+	 * the receiver user namespace, so the user is unprivileged
-+	 */
-+	ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * We should have the kdbus_pids since we are the parent
-+	 * pidns
-+	 */
-+	item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
-+	ASSERT_RETURN(item);
-+
-+	ASSERT_RETURN(memcmp(&item->pids, &unmapped_pids,
-+			     sizeof(struct kdbus_pids)) != 0);
-+
-+	/*
-+	 * Parent pid of the unprivileged/privileged in its userns
-+	 * is the unprivileged child pid that was forked here.
-+	 */
-+	ASSERT_RETURN((uint64_t)pid == item->pids.ppid);
-+
-+	kdbus_msg_free(msg);
-+
-+
-+	/* Phase 2) Privileged connection sends now 3 packets */
-+
-+	/*
-+	 * Sending to unprivileged connections a unicast
-+	 */
-+	ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
-+			     0, unpriv_conn_id);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* signal to child that is in its userns */
-+	ret = eventfd_write(efd, 1);
-+	ASSERT_EXIT(ret == 0);
-+
-+	/*
-+	 * Sending to unprivileged/privileged in its userns
-+	 * connections a unicast
-+	 */
-+	ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
-+			     0, userns_conn_id);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Sending to unprivileged connections a broadcast
-+	 */
-+	ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
-+			     0, KDBUS_DST_ID_BROADCAST);
-+	ASSERT_RETURN(ret == 0);
-+
-+
-+wait:
-+	ret = waitpid(pid, &status, 0);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ASSERT_RETURN(WIFEXITED(status));
-+	ASSERT_RETURN(!WEXITSTATUS(status));
-+
-+	/* Dump monitor queue */
-+	kdbus_printf("\n\nMonitor queue:\n");
-+	for (;;) {
-+		ret = kdbus_msg_recv_poll(monitor, 100, &msg, NULL);
-+		if (ret < 0)
-+			break;
-+
-+		if (msg->payload_type == KDBUS_PAYLOAD_DBUS) {
-+			/*
-+			 * Parent pidns should see all the
-+			 * pids
-+			 */
-+			item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
-+			ASSERT_RETURN(item);
-+
-+			ASSERT_RETURN(item->pids.pid != 0 &&
-+				      item->pids.tid != 0 &&
-+				      item->pids.ppid != 0);
-+		}
-+
-+		kdbus_msg_free(msg);
-+	}
-+
-+	kdbus_conn_free(monitor);
-+	close(efd);
-+
-+	return 0;
-+}
-+
-+int kdbus_test_metadata_ns(struct kdbus_test_env *env)
-+{
-+	int ret;
-+	struct kdbus_conn *holder, *conn;
-+	struct kdbus_policy_access policy_access = {
-+		/* Allow world so we can inspect metadata in namespace */
-+		.type = KDBUS_POLICY_ACCESS_WORLD,
-+		.id = geteuid(),
-+		.access = KDBUS_POLICY_TALK,
-+	};
-+
-+	/*
-+	 * We require user-namespaces and all uids/gids
-+	 * should be mapped (we can just require the necessary ones)
-+	 */
-+	if (!config_user_ns_is_enabled() ||
-+	    !all_uids_gids_are_mapped())
-+		return TEST_SKIP;
-+
-+	ret = test_is_capable(CAP_SETUID, CAP_SETGID, CAP_SYS_ADMIN, -1);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/* not enough privileges, skip the test */
-+	if (!ret)
-+		return TEST_SKIP;
-+
-+	holder = kdbus_hello_registrar(env->buspath, "com.example.metadata",
-+				       &policy_access, 1,
-+				       KDBUS_HELLO_POLICY_HOLDER);
-+	ASSERT_RETURN(holder);
-+
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn);
-+
-+	ret = kdbus_add_match_empty(conn);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_name_acquire(conn, "com.example.metadata", NULL);
-+	ASSERT_EXIT(ret >= 0);
-+
-+	ret = kdbus_clone_userns_test(env->buspath, conn);
-+	ASSERT_RETURN(ret == 0);
-+
-+	kdbus_conn_free(holder);
-+	kdbus_conn_free(conn);
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-monitor.c b/tools/testing/selftests/kdbus/test-monitor.c
-new file mode 100644
-index 0000000..e00d738
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-monitor.c
-@@ -0,0 +1,176 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <stdbool.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <signal.h>
-+#include <sys/time.h>
-+#include <sys/mman.h>
-+#include <sys/capability.h>
-+#include <sys/wait.h>
-+
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+int kdbus_test_monitor(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *monitor, *conn;
-+	unsigned int cookie = 0xdeadbeef;
-+	struct kdbus_msg *msg;
-+	uint64_t offset = 0;
-+	int ret;
-+
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn);
-+
-+	/* add matches to make sure the monitor does not trigger an item add or
-+	 * remove on connect and disconnect, respectively.
-+	 */
-+	ret = kdbus_add_match_id(conn, 0x1, KDBUS_ITEM_ID_ADD,
-+				 KDBUS_MATCH_ID_ANY);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_add_match_id(conn, 0x2, KDBUS_ITEM_ID_REMOVE,
-+				 KDBUS_MATCH_ID_ANY);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* register a monitor */
-+	monitor = kdbus_hello(env->buspath, KDBUS_HELLO_MONITOR, NULL, 0);
-+	ASSERT_RETURN(monitor);
-+
-+	/* make sure we did not receive a monitor connect notification */
-+	ret = kdbus_msg_recv(conn, &msg, &offset);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	/* check that a monitor cannot acquire a name */
-+	ret = kdbus_name_acquire(monitor, "foo.bar.baz", NULL);
-+	ASSERT_RETURN(ret == -EOPNOTSUPP);
-+
-+	ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0,  0, conn->id);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* the recipient should have gotten the message */
-+	ret = kdbus_msg_recv(conn, &msg, &offset);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->cookie == cookie);
-+	kdbus_msg_free(msg);
-+	kdbus_free(conn, offset);
-+
-+	/* and so should the monitor */
-+	ret = kdbus_msg_recv(monitor, &msg, &offset);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->cookie == cookie);
-+
-+	kdbus_msg_free(msg);
-+	kdbus_free(monitor, offset);
-+
-+	/* Installing matches for monitors must fail */
-+	ret = kdbus_add_match_empty(monitor);
-+	ASSERT_RETURN(ret == -EOPNOTSUPP);
-+
-+	cookie++;
-+	ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
-+			     KDBUS_DST_ID_BROADCAST);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* The monitor should get the message. */
-+	ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->cookie == cookie);
-+
-+	kdbus_msg_free(msg);
-+	kdbus_free(monitor, offset);
-+
-+	/*
-+	 * Since we are the only monitor, update the attach flags
-+	 * and indicate that we are not interested in attach flags recv
-+	 */
-+
-+	ret = kdbus_conn_update_attach_flags(monitor,
-+					     _KDBUS_ATTACH_ALL,
-+					     0);
-+	ASSERT_RETURN(ret == 0);
-+
-+	cookie++;
-+	ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
-+			     KDBUS_DST_ID_BROADCAST);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->cookie == cookie);
-+
-+	ret = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
-+	ASSERT_RETURN(ret == 0);
-+
-+	kdbus_msg_free(msg);
-+	kdbus_free(monitor, offset);
-+
-+	/*
-+	 * Now we are interested in KDBUS_ITEM_TIMESTAMP and
-+	 * KDBUS_ITEM_CREDS
-+	 */
-+	ret = kdbus_conn_update_attach_flags(monitor,
-+					     _KDBUS_ATTACH_ALL,
-+					     KDBUS_ATTACH_TIMESTAMP |
-+					     KDBUS_ATTACH_CREDS);
-+	ASSERT_RETURN(ret == 0);
-+
-+	cookie++;
-+	ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
-+			     KDBUS_DST_ID_BROADCAST);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->cookie == cookie);
-+
-+	ret = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
-+	ASSERT_RETURN(ret == 1);
-+
-+	ret = kdbus_item_in_message(msg, KDBUS_ITEM_CREDS);
-+	ASSERT_RETURN(ret == 1);
-+
-+	/* the KDBUS_ITEM_PID_COMM was not requested */
-+	ret = kdbus_item_in_message(msg, KDBUS_ITEM_PID_COMM);
-+	ASSERT_RETURN(ret == 0);
-+
-+	kdbus_msg_free(msg);
-+	kdbus_free(monitor, offset);
-+
-+	kdbus_conn_free(monitor);
-+	/* make sure we did not receive a monitor disconnect notification */
-+	ret = kdbus_msg_recv(conn, &msg, &offset);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	kdbus_conn_free(conn);
-+
-+	/* Make sure that monitor as unprivileged is not allowed */
-+	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	if (ret && all_uids_gids_are_mapped()) {
-+		ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_UID, ({
-+			monitor = kdbus_hello(env->buspath,
-+					      KDBUS_HELLO_MONITOR,
-+					      NULL, 0);
-+			ASSERT_EXIT(!monitor && errno == EPERM);
-+
-+			_exit(EXIT_SUCCESS);
-+		}),
-+		({ 0; }));
-+		ASSERT_RETURN(ret == 0);
-+	}
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-names.c b/tools/testing/selftests/kdbus/test-names.c
-new file mode 100644
-index 0000000..e400dc8
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-names.c
-@@ -0,0 +1,272 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <limits.h>
-+#include <getopt.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+struct test_name {
-+	const char *name;
-+	__u64 owner_id;
-+	__u64 flags;
-+};
-+
-+static bool conn_test_names(const struct kdbus_conn *conn,
-+			    const struct test_name *tests,
-+			    unsigned int n_tests)
-+{
-+	struct kdbus_cmd_list cmd_list = {};
-+	struct kdbus_info *name, *list;
-+	unsigned int i;
-+	int ret;
-+
-+	cmd_list.size = sizeof(cmd_list);
-+	cmd_list.flags = KDBUS_LIST_NAMES |
-+			 KDBUS_LIST_ACTIVATORS |
-+			 KDBUS_LIST_QUEUED;
-+
-+	ret = kdbus_cmd_list(conn->fd, &cmd_list);
-+	ASSERT_RETURN(ret == 0);
-+
-+	list = (struct kdbus_info *)(conn->buf + cmd_list.offset);
-+
-+	for (i = 0; i < n_tests; i++) {
-+		const struct test_name *t = tests + i;
-+		bool found = false;
-+
-+		KDBUS_FOREACH(name, list, cmd_list.list_size) {
-+			struct kdbus_item *item;
-+
-+			KDBUS_ITEM_FOREACH(item, name, items) {
-+				if (item->type != KDBUS_ITEM_OWNED_NAME ||
-+				    strcmp(item->name.name, t->name) != 0)
-+					continue;
-+
-+				if (t->owner_id == name->id &&
-+				    t->flags == item->name.flags) {
-+					found = true;
-+					break;
-+				}
-+			}
-+		}
-+
-+		if (!found)
-+			return false;
-+	}
-+
-+	return true;
-+}
-+
-+static bool conn_is_name_primary_owner(const struct kdbus_conn *conn,
-+				       const char *needle)
-+{
-+	struct test_name t = {
-+		.name = needle,
-+		.owner_id = conn->id,
-+		.flags = KDBUS_NAME_PRIMARY,
-+	};
-+
-+	return conn_test_names(conn, &t, 1);
-+}
-+
-+int kdbus_test_name_basic(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *conn;
-+	char *name, *dot_name, *invalid_name, *wildcard_name;
-+	int ret;
-+
-+	name = "foo.bla.blaz";
-+	dot_name = ".bla.blaz";
-+	invalid_name = "foo";
-+	wildcard_name = "foo.bla.bl.*";
-+
-+	/* create a 2nd connection */
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn != NULL);
-+
-+	/* acquire the name "foo.bar.xxx" */
-+	ret = kdbus_name_acquire(conn, "foo.bar.xxx", NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Name is not valid, must fail */
-+	ret = kdbus_name_acquire(env->conn, dot_name, NULL);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	ret = kdbus_name_acquire(env->conn, invalid_name, NULL);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	ret = kdbus_name_acquire(env->conn, wildcard_name, NULL);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	/* check that we can acquire a name */
-+	ret = kdbus_name_acquire(env->conn, name, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = conn_is_name_primary_owner(env->conn, name);
-+	ASSERT_RETURN(ret == true);
-+
-+	/* ... and release it again */
-+	ret = kdbus_name_release(env->conn, name);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = conn_is_name_primary_owner(env->conn, name);
-+	ASSERT_RETURN(ret == false);
-+
-+	/* check that we can't release it again */
-+	ret = kdbus_name_release(env->conn, name);
-+	ASSERT_RETURN(ret == -ESRCH);
-+
-+	/* check that we can't release a name that we don't own */
-+	ret = kdbus_name_release(env->conn, "foo.bar.xxx");
-+	ASSERT_RETURN(ret == -EADDRINUSE);
-+
-+	/* Name is not valid, must fail */
-+	ret = kdbus_name_release(env->conn, dot_name);
-+	ASSERT_RETURN(ret == -ESRCH);
-+
-+	ret = kdbus_name_release(env->conn, invalid_name);
-+	ASSERT_RETURN(ret == -ESRCH);
-+
-+	ret = kdbus_name_release(env->conn, wildcard_name);
-+	ASSERT_RETURN(ret == -ESRCH);
-+
-+	kdbus_conn_free(conn);
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_name_conflict(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *conn;
-+	char *name;
-+	int ret;
-+
-+	name = "foo.bla.blaz";
-+
-+	/* create a 2nd connection */
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn != NULL);
-+
-+	/* allow the new connection to own the same name */
-+	/* acquire name from the 1st connection */
-+	ret = kdbus_name_acquire(env->conn, name, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = conn_is_name_primary_owner(env->conn, name);
-+	ASSERT_RETURN(ret == true);
-+
-+	/* check that we also can't acquire it again from the 2nd connection */
-+	ret = kdbus_name_acquire(conn, name, NULL);
-+	ASSERT_RETURN(ret == -EEXIST);
-+
-+	kdbus_conn_free(conn);
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_name_queue(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *conn;
-+	struct test_name t[2];
-+	const char *name;
-+	uint64_t flags;
-+	int ret;
-+
-+	name = "foo.bla.blaz";
-+
-+	flags = 0;
-+
-+	/* create a 2nd connection */
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn != NULL);
-+
-+	/* allow the new connection to own the same name */
-+	/* acquire name from the 1st connection */
-+	ret = kdbus_name_acquire(env->conn, name, &flags);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = conn_is_name_primary_owner(env->conn, name);
-+	ASSERT_RETURN(ret == true);
-+
-+	/* queue the 2nd connection as waiting owner */
-+	flags = KDBUS_NAME_QUEUE;
-+	ret = kdbus_name_acquire(conn, name, &flags);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(flags & KDBUS_NAME_IN_QUEUE);
-+
-+	t[0].name = name;
-+	t[0].owner_id = env->conn->id;
-+	t[0].flags = KDBUS_NAME_PRIMARY;
-+	t[1].name = name;
-+	t[1].owner_id = conn->id;
-+	t[1].flags = KDBUS_NAME_QUEUE | KDBUS_NAME_IN_QUEUE;
-+	ret = conn_test_names(conn, t, 2);
-+	ASSERT_RETURN(ret == true);
-+
-+	/* release name from 1st connection */
-+	ret = kdbus_name_release(env->conn, name);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* now the name should be owned by the 2nd connection */
-+	t[0].name = name;
-+	t[0].owner_id = conn->id;
-+	t[0].flags = KDBUS_NAME_PRIMARY | KDBUS_NAME_QUEUE;
-+	ret = conn_test_names(conn, t, 1);
-+	ASSERT_RETURN(ret == true);
-+
-+	kdbus_conn_free(conn);
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_name_takeover(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *conn;
-+	struct test_name t;
-+	const char *name;
-+	uint64_t flags;
-+	int ret;
-+
-+	name = "foo.bla.blaz";
-+
-+	flags = KDBUS_NAME_ALLOW_REPLACEMENT;
-+
-+	/* create a 2nd connection */
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn != NULL);
-+
-+	/* acquire name for 1st connection */
-+	ret = kdbus_name_acquire(env->conn, name, &flags);
-+	ASSERT_RETURN(ret == 0);
-+
-+	t.name = name;
-+	t.owner_id = env->conn->id;
-+	t.flags = KDBUS_NAME_ALLOW_REPLACEMENT | KDBUS_NAME_PRIMARY;
-+	ret = conn_test_names(conn, &t, 1);
-+	ASSERT_RETURN(ret == true);
-+
-+	/* now steal name with 2nd connection */
-+	flags = KDBUS_NAME_REPLACE_EXISTING;
-+	ret = kdbus_name_acquire(conn, name, &flags);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(flags & KDBUS_NAME_ACQUIRED);
-+
-+	ret = conn_is_name_primary_owner(conn, name);
-+	ASSERT_RETURN(ret == true);
-+
-+	kdbus_conn_free(conn);
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-policy-ns.c b/tools/testing/selftests/kdbus/test-policy-ns.c
-new file mode 100644
-index 0000000..3437012
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-policy-ns.c
-@@ -0,0 +1,632 @@
-+/*
-+ * Test metadata and policies in new namespaces. Even if our tests
-+ * can run in a namespaced setup, this test is necessary so we can
-+ * inspect policies on the same kdbusfs but between multiple
-+ * namespaces.
-+ *
-+ * Copyright (C) 2014-2015 Djalal Harouni
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <pthread.h>
-+#include <sched.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <stdint.h>
-+#include <stdbool.h>
-+#include <unistd.h>
-+#include <errno.h>
-+#include <signal.h>
-+#include <sys/wait.h>
-+#include <sys/prctl.h>
-+#include <sys/eventfd.h>
-+#include <sys/syscall.h>
-+#include <sys/capability.h>
-+#include <linux/sched.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+#define MAX_CONN	64
-+#define POLICY_NAME	"foo.test.policy-test"
-+
-+#define KDBUS_CONN_MAX_MSGS_PER_USER            16
-+
-+/**
-+ * Note: this test can be used to inspect policy_db->talk_access_hash
-+ *
-+ * The purpose of these tests:
-+ * 1) Check KDBUS_POLICY_TALK
-+ * 2) Check the cache state: kdbus_policy_db->talk_access_hash
-+ * Should be extended
-+ */
-+
-+/**
-+ * Check a list of connections against conn_db[0]
-+ * conn_db[0] will own the name "foo.test.policy-test" and the
-+ * policy holder connection for this name will update the policy
-+ * entries, so different use cases can be tested.
-+ */
-+static struct kdbus_conn **conn_db;
-+
-+static void *kdbus_recv_echo(void *ptr)
-+{
-+	int ret;
-+	struct kdbus_conn *conn = ptr;
-+
-+	ret = kdbus_msg_recv_poll(conn, 200, NULL, NULL);
-+
-+	return (void *)(long)ret;
-+}
-+
-+/* Trigger kdbus_policy_set() */
-+static int kdbus_set_policy_talk(struct kdbus_conn *conn,
-+				 const char *name,
-+				 uid_t id, unsigned int type)
-+{
-+	int ret;
-+	struct kdbus_policy_access access = {
-+		.type = type,
-+		.id = id,
-+		.access = KDBUS_POLICY_TALK,
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn, name, &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	return TEST_OK;
-+}
-+
-+/* return TEST_OK or TEST_ERR on failure */
-+static int kdbus_register_same_activator(char *bus, const char *name,
-+					 struct kdbus_conn **c)
-+{
-+	int ret;
-+	struct kdbus_conn *activator;
-+
-+	activator = kdbus_hello_activator(bus, name, NULL, 0);
-+	if (activator) {
-+		*c = activator;
-+		fprintf(stderr, "--- error was able to register name twice '%s'.\n",
-+			name);
-+		return TEST_ERR;
-+	}
-+
-+	ret = -errno;
-+	/* -EEXIST means test succeeded */
-+	if (ret == -EEXIST)
-+		return TEST_OK;
-+
-+	return TEST_ERR;
-+}
-+
-+/* return TEST_OK or TEST_ERR on failure */
-+static int kdbus_register_policy_holder(char *bus, const char *name,
-+					struct kdbus_conn **conn)
-+{
-+	struct kdbus_conn *c;
-+	struct kdbus_policy_access access[2];
-+
-+	access[0].type = KDBUS_POLICY_ACCESS_USER;
-+	access[0].access = KDBUS_POLICY_OWN;
-+	access[0].id = geteuid();
-+
-+	access[1].type = KDBUS_POLICY_ACCESS_WORLD;
-+	access[1].access = KDBUS_POLICY_TALK;
-+	access[1].id = geteuid();
-+
-+	c = kdbus_hello_registrar(bus, name, access, 2,
-+				  KDBUS_HELLO_POLICY_HOLDER);
-+	ASSERT_RETURN(c);
-+
-+	*conn = c;
-+
-+	return TEST_OK;
-+}
-+
-+/**
-+ * Create new threads for receiving from multiple senders,
-+ * The 'conn_db' will be populated by newly created connections.
-+ * Caller should free all allocated connections.
-+ *
-+ * return 0 on success, negative errno on failure.
-+ */
-+static int kdbus_recv_in_threads(const char *bus, const char *name,
-+				 struct kdbus_conn **conn_db)
-+{
-+	int ret;
-+	bool pool_full = false;
-+	unsigned int sent_packets = 0;
-+	unsigned int lost_packets = 0;
-+	unsigned int i, tid;
-+	unsigned long dst_id;
-+	unsigned long cookie = 1;
-+	unsigned int thread_nr = MAX_CONN - 1;
-+	pthread_t thread_id[MAX_CONN - 1] = {'\0'};
-+
-+	dst_id = name ? KDBUS_DST_ID_NAME : conn_db[0]->id;
-+
-+	for (tid = 0, i = 1; tid < thread_nr; tid++, i++) {
-+		ret = pthread_create(&thread_id[tid], NULL,
-+				     kdbus_recv_echo, (void *)conn_db[0]);
-+		if (ret < 0) {
-+			ret = -errno;
-+			kdbus_printf("error pthread_create: %d (%m)\n",
-+				      ret);
-+			break;
-+		}
-+
-+		/* just free before re-using */
-+		kdbus_conn_free(conn_db[i]);
-+		conn_db[i] = NULL;
-+
-+		/* We need to create connections here */
-+		conn_db[i] = kdbus_hello(bus, 0, NULL, 0);
-+		if (!conn_db[i]) {
-+			ret = -errno;
-+			break;
-+		}
-+
-+		ret = kdbus_add_match_empty(conn_db[i]);
-+		if (ret < 0)
-+			break;
-+
-+		ret = kdbus_msg_send(conn_db[i], name, cookie++,
-+				     0, 0, 0, dst_id);
-+		if (ret < 0) {
-+			/*
-+			 * Receivers are not reading their messages,
-+			 * not scheduled ?!
-+			 *
-+			 * So set the pool full here, perhaps the
-+			 * connection pool or queue was full, later
-+			 * recheck receivers errors
-+			 */
-+			if (ret == -ENOBUFS || ret == -EXFULL)
-+				pool_full = true;
-+			break;
-+		}
-+
-+		sent_packets++;
-+	}
-+
-+	for (tid = 0; tid < thread_nr; tid++) {
-+		int thread_ret = 0;
-+
-+		if (thread_id[tid]) {
-+			pthread_join(thread_id[tid], (void *)&thread_ret);
-+			if (thread_ret < 0) {
-+				/* Update only if send did not fail */
-+				if (ret == 0)
-+					ret = thread_ret;
-+
-+				lost_packets++;
-+			}
-+		}
-+	}
-+
-+	/*
-+	 * If sending failed with -ENOBUFS or -EXFULL, then
-+	 * lost_packets should be set and sent_packets should be
-+	 * at least KDBUS_CONN_MAX_MSGS_PER_USER
-+	 */
-+	if (pool_full) {
-+		ASSERT_RETURN(lost_packets > 0);
-+
-+		/*
-+		 * We should at least send KDBUS_CONN_MAX_MSGS_PER_USER
-+		 *
-+		 * For every send operation we create a thread to
-+		 * recv the packet, so we keep the queue clean
-+		 */
-+		ASSERT_RETURN(sent_packets >= KDBUS_CONN_MAX_MSGS_PER_USER);
-+
-+		/*
-+		 * Set ret to zero since we only failed due to
-+		 * the receiving threads that have not been
-+		 * scheduled
-+		 */
-+		ret = 0;
-+	}
-+
-+	return ret;
-+}
-+
-+/* Return: TEST_OK or TEST_ERR on failure */
-+static int kdbus_normal_test(const char *bus, const char *name,
-+			     struct kdbus_conn **conn_db)
-+{
-+	int ret;
-+
-+	ret = kdbus_recv_in_threads(bus, name, conn_db);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	return TEST_OK;
-+}
-+
-+static int kdbus_fork_test_by_id(const char *bus,
-+				 struct kdbus_conn **conn_db,
-+				 int parent_status, int child_status)
-+{
-+	int ret;
-+	pid_t pid;
-+	uint64_t cookie = 0x9876ecba;
-+	struct kdbus_msg *msg = NULL;
-+	uint64_t offset = 0;
-+	int status = 0;
-+
-+	/*
-+	 * If the child_status is not EXIT_SUCCESS, then we expect
-+	 * that sending from the child will fail, thus receiving
-+	 * from parent must error with -ETIMEDOUT, and vice versa.
-+	 */
-+	bool parent_timedout = !!child_status;
-+	bool child_timedout = !!parent_status;
-+
-+	pid = fork();
-+	ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+	if (pid == 0) {
-+		struct kdbus_conn *conn_src;
-+
-+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
-+		ASSERT_EXIT(ret == 0);
-+
-+		ret = drop_privileges(65534, 65534);
-+		ASSERT_EXIT(ret == 0);
-+
-+		conn_src = kdbus_hello(bus, 0, NULL, 0);
-+		ASSERT_EXIT(conn_src);
-+
-+		ret = kdbus_add_match_empty(conn_src);
-+		ASSERT_EXIT(ret == 0);
-+
-+		/*
-+		 * child_status is always checked against send
-+		 * operations, in case it fails always return
-+		 * EXIT_FAILURE.
-+		 */
-+		ret = kdbus_msg_send(conn_src, NULL, cookie,
-+				     0, 0, 0, conn_db[0]->id);
-+		ASSERT_EXIT(ret == child_status);
-+
-+		ret = kdbus_msg_recv_poll(conn_src, 100, NULL, NULL);
-+
-+		kdbus_conn_free(conn_src);
-+
-+		/*
-+		 * Child kdbus_msg_recv_poll() should timeout since
-+		 * the parent_status was set to a non EXIT_SUCCESS
-+		 * value.
-+		 */
-+		if (child_timedout)
-+			_exit(ret == -ETIMEDOUT ? EXIT_SUCCESS : EXIT_FAILURE);
-+
-+		_exit(ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
-+	}
-+
-+	ret = kdbus_msg_recv_poll(conn_db[0], 100, &msg, &offset);
-+	/*
-+	 * If parent_timedout is set then this should fail with
-+	 * -ETIMEDOUT since the child_status was set to a non
-+	 * EXIT_SUCCESS value. Otherwise, assume
-+	 * that kdbus_msg_recv_poll() has succeeded.
-+	 */
-+	if (parent_timedout) {
-+		ASSERT_RETURN_VAL(ret == -ETIMEDOUT, TEST_ERR);
-+
-+		/* timed out, no need to continue; we don't have the
-+		 * child connection ID, so just terminate. */
-+		goto out;
-+	} else {
-+		ASSERT_RETURN_VAL(ret == 0, ret);
-+	}
-+
-+	ret = kdbus_msg_send(conn_db[0], NULL, ++cookie,
-+			     0, 0, 0, msg->src_id);
-+	/*
-+	 * parent_status is checked against send operations,
-+	 * on failures always return TEST_ERR.
-+	 */
-+	ASSERT_RETURN_VAL(ret == parent_status, TEST_ERR);
-+
-+	kdbus_msg_free(msg);
-+	kdbus_free(conn_db[0], offset);
-+
-+out:
-+	ret = waitpid(pid, &status, 0);
-+	ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+/*
-+ * Return: TEST_OK, TEST_ERR or TEST_SKIP
-+ * we return TEST_OK only if the children return with the expected
-+ * 'expected_status' that is specified as an argument.
-+ */
-+static int kdbus_fork_test(const char *bus, const char *name,
-+			   struct kdbus_conn **conn_db, int expected_status)
-+{
-+	pid_t pid;
-+	int ret = 0;
-+	int status = 0;
-+
-+	pid = fork();
-+	ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+	if (pid == 0) {
-+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
-+		ASSERT_EXIT(ret == 0);
-+
-+		ret = drop_privileges(65534, 65534);
-+		ASSERT_EXIT(ret == 0);
-+
-+		ret = kdbus_recv_in_threads(bus, name, conn_db);
-+		_exit(ret == expected_status ? EXIT_SUCCESS : EXIT_FAILURE);
-+	}
-+
-+	ret = waitpid(pid, &status, 0);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+/* Return EXIT_SUCCESS, EXIT_FAILURE or negative errno */
-+static int __kdbus_clone_userns_test(const char *bus,
-+				     const char *name,
-+				     struct kdbus_conn **conn_db,
-+				     int expected_status)
-+{
-+	int efd;
-+	pid_t pid;
-+	int ret = 0;
-+	unsigned int uid = 65534;
-+	int status;
-+
-+	ret = drop_privileges(uid, uid);
-+	ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+	/*
-+	 * Since we just dropped privileges, the dumpable flag was just
-+	 * cleared which makes the /proc/$clone_child/uid_map to be
-+	 * owned by root, hence any userns uid mapping will fail with
-+	 * -EPERM since the mapping will be done by uid 65534.
-+	 *
-+	 * To avoid this set the dumpable flag again which makes procfs
-+	 * update the /proc/$clone_child/ inodes owner to 65534.
-+	 *
-+	 * Using this we will be able to write to /proc/$clone_child/uid_map
-+	 * as uid 65534 and map the uid 65534 to 0 inside the user
-+	 * namespace.
-+	 */
-+	ret = prctl(PR_SET_DUMPABLE, SUID_DUMP_USER);
-+	ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+	/* sync parent/child */
-+	efd = eventfd(0, EFD_CLOEXEC);
-+	ASSERT_RETURN_VAL(efd >= 0, efd);
-+
-+	pid = syscall(__NR_clone, SIGCHLD|CLONE_NEWUSER, NULL);
-+	if (pid < 0) {
-+		ret = -errno;
-+		kdbus_printf("error clone: %d (%m)\n", ret);
-+		/*
-+		 * Normal user not allowed to create userns,
-+		 * so nothing to worry about ?
-+		 */
-+		if (ret == -EPERM) {
-+			kdbus_printf("-- CLONE_NEWUSER TEST Failed for uid: %u\n"
-+				"-- Make sure that your kernel does not allow "
-+				"CLONE_NEWUSER for unprivileged users\n"
-+				"-- Upstream Commit: "
-+				"https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eaf563e\n",
-+				uid);
-+			ret = 0;
-+		}
-+
-+		return ret;
-+	}
-+
-+	if (pid == 0) {
-+		struct kdbus_conn *conn_src;
-+		eventfd_t event_status = 0;
-+
-+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
-+		ASSERT_EXIT(ret == 0);
-+
-+		ret = eventfd_read(efd, &event_status);
-+		ASSERT_EXIT(ret >= 0 && event_status == 1);
-+
-+		/* ping connection from the new user namespace */
-+		conn_src = kdbus_hello(bus, 0, NULL, 0);
-+		ASSERT_EXIT(conn_src);
-+
-+		ret = kdbus_add_match_empty(conn_src);
-+		ASSERT_EXIT(ret == 0);
-+
-+		ret = kdbus_msg_send(conn_src, name, 0xabcd1234,
-+				     0, 0, 0, KDBUS_DST_ID_NAME);
-+		kdbus_conn_free(conn_src);
-+
-+		_exit(ret == expected_status ? EXIT_SUCCESS : EXIT_FAILURE);
-+	}
-+
-+	ret = userns_map_uid_gid(pid, "0 65534 1", "0 65534 1");
-+	ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+	/* Tell child we are ready */
-+	ret = eventfd_write(efd, 1);
-+	ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+	ret = waitpid(pid, &status, 0);
-+	ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+	close(efd);
-+
-+	return status == EXIT_SUCCESS ? TEST_OK : TEST_ERR;
-+}
-+
-+static int kdbus_clone_userns_test(const char *bus,
-+				   const char *name,
-+				   struct kdbus_conn **conn_db,
-+				   int expected_status)
-+{
-+	pid_t pid;
-+	int ret = 0;
-+	int status;
-+
-+	pid = fork();
-+	ASSERT_RETURN_VAL(pid >= 0, -errno);
-+
-+	if (pid == 0) {
-+		ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
-+		if (ret < 0)
-+			_exit(EXIT_FAILURE);
-+
-+		ret = __kdbus_clone_userns_test(bus, name, conn_db,
-+						expected_status);
-+		_exit(ret);
-+	}
-+
-+	/*
-+	 * Receive in the original (root privileged) user namespace,
-+	 * must fail with -ETIMEDOUT.
-+	 */
-+	ret = kdbus_msg_recv_poll(conn_db[0], 100, NULL, NULL);
-+	ASSERT_RETURN_VAL(ret == -ETIMEDOUT, ret);
-+
-+	ret = waitpid(pid, &status, 0);
-+	ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+int kdbus_test_policy_ns(struct kdbus_test_env *env)
-+{
-+	int i;
-+	int ret;
-+	struct kdbus_conn *activator = NULL;
-+	struct kdbus_conn *policy_holder = NULL;
-+	char *bus = env->buspath;
-+
-+	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/* not enough privileges, skip the test */
-+	if (!ret)
-+		return TEST_SKIP;
-+
-+	/* we require user-namespaces */
-+	if (access("/proc/self/uid_map", F_OK) != 0)
-+		return TEST_SKIP;
-+
-+	/* uids/gids must be mapped */
-+	if (!all_uids_gids_are_mapped())
-+		return TEST_SKIP;
-+
-+	conn_db = calloc(MAX_CONN, sizeof(struct kdbus_conn *));
-+	ASSERT_RETURN(conn_db);
-+
-+	memset(conn_db, 0, MAX_CONN * sizeof(struct kdbus_conn *));
-+
-+	conn_db[0] = kdbus_hello(bus, 0, NULL, 0);
-+	ASSERT_RETURN(conn_db[0]);
-+
-+	ret = kdbus_add_match_empty(conn_db[0]);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_fork_test_by_id(bus, conn_db, -EPERM, -EPERM);
-+	ASSERT_EXIT(ret == 0);
-+
-+	ret = kdbus_register_policy_holder(bus, POLICY_NAME,
-+					   &policy_holder);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Try to register the same name with an activator */
-+	ret = kdbus_register_same_activator(bus, POLICY_NAME,
-+					    &activator);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Acquire POLICY_NAME */
-+	ret = kdbus_name_acquire(conn_db[0], POLICY_NAME, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_normal_test(bus, POLICY_NAME, conn_db);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_list(conn_db[0], KDBUS_LIST_NAMES |
-+				     KDBUS_LIST_UNIQUE |
-+				     KDBUS_LIST_ACTIVATORS |
-+				     KDBUS_LIST_QUEUED);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_fork_test(bus, POLICY_NAME, conn_db, EXIT_SUCCESS);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Child connections are able to talk to conn_db[0] since
-+	 * the current POLICY_NAME TALK type is KDBUS_POLICY_ACCESS_WORLD,
-+	 * so expect EXIT_SUCCESS when sending from the child. Since
-+	 * the child's connection does not own any well-known name,
-+	 * the parent connection conn_db[0] would normally fail with
-+	 * -EPERM, but since it is a privileged bus user the TALK is
-+	 * allowed.
-+	 */
-+	ret = kdbus_fork_test_by_id(bus, conn_db,
-+				    EXIT_SUCCESS, EXIT_SUCCESS);
-+	ASSERT_EXIT(ret == 0);
-+
-+	/*
-+	 * Connections that can talk are perhaps being destroyed now.
-+	 * Restrict the policy and purge cache entries where the
-+	 * conn_db[0] is the destination.
-+	 *
-+	 * Now only connections with uid == 0 are allowed to talk.
-+	 */
-+	ret = kdbus_set_policy_talk(policy_holder, POLICY_NAME,
-+				    geteuid(), KDBUS_POLICY_ACCESS_USER);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Testing connections (FORK+DROP) again:
-+	 * after setting the policy, re-check the connections;
-+	 * we expect the children to fail with -EPERM.
-+	 */
-+	ret = kdbus_fork_test(bus, POLICY_NAME, conn_db, -EPERM);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Now expect both parent and child to fail.
-+	 *
-+	 * Child should fail with -EPERM since we just restricted
-+	 * the POLICY_NAME TALK to uid 0 and its uid is 65534.
-+	 *
-+	 * Since the parent's connection will timeout when receiving
-+	 * from the child, we never continue. FWIW just put -EPERM.
-+	 */
-+	ret = kdbus_fork_test_by_id(bus, conn_db, -EPERM, -EPERM);
-+	ASSERT_EXIT(ret == 0);
-+
-+	/* Check if the name can be reached in a new userns */
-+	ret = kdbus_clone_userns_test(bus, POLICY_NAME, conn_db, -EPERM);
-+	ASSERT_RETURN(ret == 0);
-+
-+	for (i = 0; i < MAX_CONN; i++)
-+		kdbus_conn_free(conn_db[i]);
-+
-+	kdbus_conn_free(activator);
-+	kdbus_conn_free(policy_holder);
-+
-+	free(conn_db);
-+
-+	return ret;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-policy-priv.c b/tools/testing/selftests/kdbus/test-policy-priv.c
-new file mode 100644
-index 0000000..0208638
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-policy-priv.c
-@@ -0,0 +1,1285 @@
-+#include <errno.h>
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stdint.h>
-+#include <stdbool.h>
-+#include <unistd.h>
-+#include <time.h>
-+#include <sys/capability.h>
-+#include <sys/eventfd.h>
-+#include <sys/wait.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+static int test_policy_priv_by_id(const char *bus,
-+				  struct kdbus_conn *conn_dst,
-+				  bool drop_second_user,
-+				  int parent_status,
-+				  int child_status)
-+{
-+	int ret = 0;
-+	uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
-+
-+	ASSERT_RETURN(conn_dst);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, bus, ({
-+		ret = kdbus_msg_send(unpriv, NULL,
-+				     expected_cookie, 0, 0, 0,
-+				     conn_dst->id);
-+		ASSERT_EXIT(ret == child_status);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ret = kdbus_msg_recv_poll(conn_dst, 300, NULL, NULL);
-+	ASSERT_RETURN(ret == parent_status);
-+
-+	return 0;
-+}
-+
-+static int test_policy_priv_by_broadcast(const char *bus,
-+					 struct kdbus_conn *conn_dst,
-+					 int drop_second_user,
-+					 int parent_status,
-+					 int child_status)
-+{
-+	int efd;
-+	int ret = 0;
-+	eventfd_t event_status = 0;
-+	struct kdbus_msg *msg = NULL;
-+	uid_t second_uid = UNPRIV_UID;
-+	gid_t second_gid = UNPRIV_GID;
-+	struct kdbus_conn *child_2 = conn_dst;
-+	uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
-+
-+	/* Drop to another unprivileged user other than UNPRIV_UID */
-+	if (drop_second_user == DROP_OTHER_UNPRIV) {
-+		second_uid = UNPRIV_UID - 1;
-+		second_gid = UNPRIV_GID - 1;
-+	}
-+
-+	/* child will signal parent to send broadcast */
-+	efd = eventfd(0, EFD_CLOEXEC);
-+	ASSERT_RETURN_VAL(efd >= 0, efd);
-+
-+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
-+		struct kdbus_conn *child;
-+
-+		child = kdbus_hello(bus, 0, NULL, 0);
-+		ASSERT_EXIT(child);
-+
-+		ret = kdbus_add_match_empty(child);
-+		ASSERT_EXIT(ret == 0);
-+
-+		/* signal parent */
-+		ret = eventfd_write(efd, 1);
-+		ASSERT_EXIT(ret == 0);
-+
-+		/* Use a somewhat longer timeout */
-+		ret = kdbus_msg_recv_poll(child, 500, &msg, NULL);
-+		ASSERT_EXIT(ret == child_status);
-+
-+		/*
-+		 * If we expect the child to get the broadcast
-+		 * message, then check the received cookie.
-+		 */
-+		if (ret == 0) {
-+			ASSERT_EXIT(expected_cookie == msg->cookie);
-+		}
-+
-+		/* Use expected_cookie since 'msg' might be NULL */
-+		ret = kdbus_msg_send(child, NULL, expected_cookie + 1,
-+				     0, 0, 0, KDBUS_DST_ID_BROADCAST);
-+		ASSERT_EXIT(ret == 0);
-+
-+		kdbus_msg_free(msg);
-+		kdbus_conn_free(child);
-+	}),
-+	({
-+		if (drop_second_user == DO_NOT_DROP) {
-+			ASSERT_RETURN(child_2);
-+
-+			ret = eventfd_read(efd, &event_status);
-+			ASSERT_RETURN(ret >= 0 && event_status == 1);
-+
-+			ret = kdbus_msg_send(child_2, NULL,
-+					     expected_cookie, 0, 0, 0,
-+					     KDBUS_DST_ID_BROADCAST);
-+			ASSERT_RETURN(ret == 0);
-+
-+			/* drop own broadcast */
-+			ret = kdbus_msg_recv(child_2, &msg, NULL);
-+			ASSERT_RETURN(ret == 0);
-+			ASSERT_RETURN(msg->src_id == child_2->id);
-+			kdbus_msg_free(msg);
-+
-+			/* Use a somewhat longer timeout */
-+			ret = kdbus_msg_recv_poll(child_2, 1000,
-+						  &msg, NULL);
-+			ASSERT_RETURN(ret == parent_status);
-+
-+			/*
-+			 * Check returned cookie in case we expect
-+			 * success.
-+			 */
-+			if (ret == 0) {
-+				ASSERT_RETURN(msg->cookie ==
-+					      expected_cookie + 1);
-+			}
-+
-+			kdbus_msg_free(msg);
-+		} else {
-+			/*
-+			 * Two unprivileged users will try to
-+			 * communicate using broadcast.
-+			 */
-+			ret = RUN_UNPRIVILEGED(second_uid, second_gid, ({
-+				child_2 = kdbus_hello(bus, 0, NULL, 0);
-+				ASSERT_EXIT(child_2);
-+
-+				ret = kdbus_add_match_empty(child_2);
-+				ASSERT_EXIT(ret == 0);
-+
-+				ret = eventfd_read(efd, &event_status);
-+				ASSERT_EXIT(ret >= 0 && event_status == 1);
-+
-+				ret = kdbus_msg_send(child_2, NULL,
-+						expected_cookie, 0, 0, 0,
-+						KDBUS_DST_ID_BROADCAST);
-+				ASSERT_EXIT(ret == 0);
-+
-+				/* drop own broadcast */
-+				ret = kdbus_msg_recv(child_2, &msg, NULL);
-+				ASSERT_EXIT(ret == 0);
-+				ASSERT_EXIT(msg->src_id == child_2->id);
-+				kdbus_msg_free(msg);
-+
-+				/* Use a somewhat longer timeout */
-+				ret = kdbus_msg_recv_poll(child_2, 1000,
-+							  &msg, NULL);
-+				ASSERT_EXIT(ret == parent_status);
-+
-+				/*
-+				 * Check returned cookie in case we expect
-+				 * success.
-+				 */
-+				if (ret == 0) {
-+					ASSERT_EXIT(msg->cookie ==
-+						    expected_cookie + 1);
-+				}
-+
-+				kdbus_msg_free(msg);
-+				kdbus_conn_free(child_2);
-+			}),
-+			({ 0; }));
-+			ASSERT_RETURN(ret == 0);
-+		}
-+	}));
-+	ASSERT_RETURN(ret == 0);
-+
-+	close(efd);
-+
-+	return ret;
-+}
-+
-+static void nosig(int sig)
-+{
-+}
-+
-+static int test_priv_before_policy_upload(struct kdbus_test_env *env)
-+{
-+	int ret = 0;
-+	struct kdbus_conn *conn;
-+
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn);
-+
-+	/*
-+	 * Make sure unprivileged bus user cannot acquire names
-+	 * before registering any policy holder.
-+	 */
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+		ASSERT_EXIT(ret < 0);
-+	}));
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Make sure unprivileged bus users cannot talk by default
-+	 * to privileged ones, unless a policy holder that allows
-+	 * this was uploaded.
-+	 */
-+
-+	ret = test_policy_priv_by_id(env->buspath, conn, false,
-+				     -ETIMEDOUT, -EPERM);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Activate matching for a privileged connection */
-+	ret = kdbus_add_match_empty(conn);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * First make sure that BROADCAST with msg flag
-+	 * KDBUS_MSG_EXPECT_REPLY will fail with -ENOTUNIQ
-+	 */
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_msg_send(unpriv, NULL, 0xdeadbeef,
-+				     KDBUS_MSG_EXPECT_REPLY,
-+				     5000000000ULL, 0,
-+				     KDBUS_DST_ID_BROADCAST);
-+		ASSERT_EXIT(ret == -ENOTUNIQ);
-+	}));
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Test broadcast with a privileged connection.
-+	 *
-+	 * The first unprivileged receiver should not get the
-+	 * broadcast message sent by the privileged connection,
-+	 * since there is no TALK policy that allows the
-+	 * unprivileged to TALK to the privileged connection. It
-+	 * will fail with -ETIMEDOUT
-+	 *
-+	 * Then second case:
-+	 * The privileged connection should get the broadcast
-+	 * message from the unprivileged one. Since the receiver is
-+	 * a privileged bus user and it has default TALK access to
-+	 * all connections it will receive those.
-+	 */
-+
-+	ret = test_policy_priv_by_broadcast(env->buspath, conn,
-+					    DO_NOT_DROP,
-+					    0, -ETIMEDOUT);
-+	ASSERT_RETURN(ret == 0);
-+
-+
-+	/*
-+	 * Test broadcast with two unprivileged connections running
-+	 * under the same user.
-+	 *
-+	 * Both connections should succeed.
-+	 */
-+
-+	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
-+					    DROP_SAME_UNPRIV, 0, 0);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Test broadcast with two unprivileged connections running
-+	 * under different users.
-+	 *
-+	 * Both connections will fail with -ETIMEDOUT.
-+	 */
-+
-+	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
-+					    DROP_OTHER_UNPRIV,
-+					    -ETIMEDOUT, -ETIMEDOUT);
-+	ASSERT_RETURN(ret == 0);
-+
-+	kdbus_conn_free(conn);
-+
-+	return ret;
-+}
-+
-+static int test_broadcast_after_policy_upload(struct kdbus_test_env *env)
-+{
-+	int ret;
-+	int efd;
-+	eventfd_t event_status = 0;
-+	struct kdbus_msg *msg = NULL;
-+	struct kdbus_conn *owner_a, *owner_b;
-+	struct kdbus_conn *holder_a, *holder_b;
-+	struct kdbus_policy_access access = {};
-+	uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
-+
-+	owner_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(owner_a);
-+
-+	ret = kdbus_name_acquire(owner_a, "com.example.broadcastA", NULL);
-+	ASSERT_EXIT(ret >= 0);
-+
-+	/*
-+	 * Make sure unprivileged bus users cannot talk by default
-+	 * to privileged ones, unless a policy holder that allows
-+	 * this was uploaded.
-+	 */
-+
-+	++expected_cookie;
-+	ret = test_policy_priv_by_id(env->buspath, owner_a, false,
-+				     -ETIMEDOUT, -EPERM);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Make sure that a privileged connection won't receive broadcasts
-+	 * unless it installs a match. It will fail with -ETIMEDOUT.
-+	 *
-+	 * At same time check that the unprivileged connection will
-+	 * not receive the broadcast message from the privileged one
-+	 * since the privileged one owns a name with a restricted
-+	 * policy TALK (actually the TALK policy is still not
-+	 * registered so we fail by default), thus the unprivileged
-+	 * receiver is not able to TALK to that name.
-+	 */
-+
-+	/* Activate matching for a privileged connection */
-+	ret = kdbus_add_match_empty(owner_a);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Redo the previous test. The privileged conn owner_a is
-+	 * able to TALK to any connection so it will receive the
-+	 * broadcast message now.
-+	 */
-+
-+	ret = test_policy_priv_by_broadcast(env->buspath, owner_a,
-+					    DO_NOT_DROP,
-+					    0, -ETIMEDOUT);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Test that broadcasts between two unprivileged connections
-+	 * running under the same user still succeed.
-+	 */
-+
-+	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
-+					    DROP_SAME_UNPRIV, 0, 0);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Test broadcast with two unprivileged connections running
-+	 * under different users.
-+	 *
-+	 * Both connections will fail with -ETIMEDOUT.
-+	 */
-+
-+	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
-+					    DROP_OTHER_UNPRIV,
-+					    -ETIMEDOUT, -ETIMEDOUT);
-+	ASSERT_RETURN(ret == 0);
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_USER,
-+		.id = geteuid(),
-+		.access = KDBUS_POLICY_OWN,
-+	};
-+
-+	holder_a = kdbus_hello_registrar(env->buspath,
-+					 "com.example.broadcastA",
-+					 &access, 1,
-+					 KDBUS_HELLO_POLICY_HOLDER);
-+	ASSERT_RETURN(holder_a);
-+
-+	holder_b = kdbus_hello_registrar(env->buspath,
-+					 "com.example.broadcastB",
-+					 &access, 1,
-+					 KDBUS_HELLO_POLICY_HOLDER);
-+	ASSERT_RETURN(holder_b);
-+
-+	/* Free connections and their received messages and restart */
-+	kdbus_conn_free(owner_a);
-+
-+	owner_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(owner_a);
-+
-+	/* Activate matching for a privileged connection */
-+	ret = kdbus_add_match_empty(owner_a);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_name_acquire(owner_a, "com.example.broadcastA", NULL);
-+	ASSERT_EXIT(ret >= 0);
-+
-+	owner_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(owner_b);
-+
-+	ret = kdbus_name_acquire(owner_b, "com.example.broadcastB", NULL);
-+	ASSERT_EXIT(ret >= 0);
-+
-+	/* Activate matching for a privileged connection */
-+	ret = kdbus_add_match_empty(owner_b);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Test that even if "com.example.broadcastA" and
-+	 * "com.example.broadcastB" do have a TALK access by default
-+	 * they are able to signal each other using broadcast due to
-+	 * the fact they are privileged connections, they receive
-+	 * all broadcasts if the match allows it.
-+	 */
-+
-+	++expected_cookie;
-+	ret = kdbus_msg_send(owner_a, NULL, expected_cookie, 0,
-+			     0, 0, KDBUS_DST_ID_BROADCAST);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv_poll(owner_a, 100, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->cookie == expected_cookie);
-+
-+	/* Check src ID */
-+	ASSERT_RETURN(msg->src_id == owner_a->id);
-+
-+	kdbus_msg_free(msg);
-+
-+	ret = kdbus_msg_recv_poll(owner_b, 100, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+	ASSERT_RETURN(msg->cookie == expected_cookie);
-+
-+	/* Check src ID */
-+	ASSERT_RETURN(msg->src_id == owner_a->id);
-+
-+	kdbus_msg_free(msg);
-+
-+	/* Release name "com.example.broadcastB" */
-+
-+	ret = kdbus_name_release(owner_b, "com.example.broadcastB");
-+	ASSERT_EXIT(ret >= 0);
-+
-+	/* KDBUS_POLICY_OWN for unprivileged connections */
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_WORLD,
-+		.id = geteuid(),
-+		.access = KDBUS_POLICY_OWN,
-+	};
-+
-+	/* Update the policy so unprivileged connections can own the name */
-+
-+	ret = kdbus_conn_update_policy(holder_b,
-+				       "com.example.broadcastB",
-+				       &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Send broadcasts from an unprivileged connection that
-+	 * owns a name "com.example.broadcastB".
-+	 *
-+	 * We'll have four destinations here:
-+	 *
-+	 * 1) owner_a: privileged connection that owns
-+	 * "com.example.broadcastA". It will receive the broadcast
-+	 * since it is privileged, has default TALK access to all
-+	 * connections, and is subscribed to the match.
-+	 * Will succeed.
-+	 *
-+	 * 2) owner_b: privileged connection (running under a different
-+	 * uid) that does not own names, but with an empty broadcast
-+	 * match, so it will receive broadcasts since it has default
-+	 * TALK access to all connections.
-+	 *
-+	 * 3) unpriv_a: unprivileged connection that does not own any
-+	 * name. It will receive the broadcast since it is running
-+	 * under the same user as the one broadcasting and did install
-+	 * matches. It should get the message.
-+	 *
-+	 * 4) unpriv_b: unprivileged connection that is not interested
-+	 * in broadcast messages, so it did not install broadcast
-+	 * matches. Should fail with -ETIMEDOUT.
-+	 */
-+
-+	++expected_cookie;
-+	efd = eventfd(0, EFD_CLOEXEC);
-+	ASSERT_RETURN_VAL(efd >= 0, efd);
-+
-+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
-+		struct kdbus_conn *unpriv_owner;
-+		struct kdbus_conn *unpriv_a, *unpriv_b;
-+
-+		unpriv_owner = kdbus_hello(env->buspath, 0, NULL, 0);
-+		ASSERT_EXIT(unpriv_owner);
-+
-+		unpriv_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+		ASSERT_EXIT(unpriv_a);
-+
-+		unpriv_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+		ASSERT_EXIT(unpriv_b);
-+
-+		ret = kdbus_name_acquire(unpriv_owner,
-+					 "com.example.broadcastB",
-+					 NULL);
-+		ASSERT_EXIT(ret >= 0);
-+
-+		ret = kdbus_add_match_empty(unpriv_a);
-+		ASSERT_EXIT(ret == 0);
-+
-+		/* Signal that we are doing broadcasts */
-+		ret = eventfd_write(efd, 1);
-+		ASSERT_EXIT(ret == 0);
-+
-+		/*
-+		 * Do broadcast from a connection that owns the
-+		 * name "com.example.broadcastB".
-+		 */
-+		ret = kdbus_msg_send(unpriv_owner, NULL,
-+				     expected_cookie,
-+				     0, 0, 0,
-+				     KDBUS_DST_ID_BROADCAST);
-+		ASSERT_EXIT(ret == 0);
-+
-+		/*
-+		 * Unprivileged connection running under the same
-+		 * user. It should succeed.
-+		 */
-+		ret = kdbus_msg_recv_poll(unpriv_a, 300, &msg, NULL);
-+		ASSERT_EXIT(ret == 0 && msg->cookie == expected_cookie);
-+
-+		/*
-+		 * Did not install matches, not interested in
-+		 * broadcasts
-+		 */
-+		ret = kdbus_msg_recv_poll(unpriv_b, 300, NULL, NULL);
-+		ASSERT_EXIT(ret == -ETIMEDOUT);
-+	}),
-+	({
-+		ret = eventfd_read(efd, &event_status);
-+		ASSERT_RETURN(ret >= 0 && event_status == 1);
-+
-+		/*
-+		 * owner_a is privileged and has installed a match, so
-+		 * it receives the broadcast even though the TALK access
-+		 * of its name "com.example.broadcastA" is restricted.
-+		 */
-+		ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
-+		ASSERT_RETURN(ret == 0);
-+
-+		/* confirm the received cookie */
-+		ASSERT_RETURN(msg->cookie == expected_cookie);
-+
-+		kdbus_msg_free(msg);
-+
-+		/*
-+		 * owner_b got the broadcast from an unprivileged
-+		 * connection.
-+		 */
-+		ret = kdbus_msg_recv_poll(owner_b, 300, &msg, NULL);
-+		ASSERT_RETURN(ret == 0);
-+
-+		/* confirm the received cookie */
-+		ASSERT_RETURN(msg->cookie == expected_cookie);
-+
-+		kdbus_msg_free(msg);
-+
-+	}));
-+	ASSERT_RETURN(ret == 0);
-+
-+	close(efd);
-+
-+	/*
-+	 * Test broadcast with two unprivileged connections running
-+	 * under different users.
-+	 *
-+	 * Both connections will fail with -ETIMEDOUT.
-+	 */
-+
-+	ret = test_policy_priv_by_broadcast(env->buspath, NULL,
-+					    DROP_OTHER_UNPRIV,
-+					    -ETIMEDOUT, -ETIMEDOUT);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* Drop broadcasts received by the privileged connections */
-+	ret = kdbus_msg_recv_poll(owner_a, 100, NULL, NULL);
-+	ret = kdbus_msg_recv_poll(owner_a, 100, NULL, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(owner_a, NULL, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	ret = kdbus_msg_recv_poll(owner_b, 100, NULL, NULL);
-+	ret = kdbus_msg_recv_poll(owner_b, 100, NULL, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_msg_recv(owner_b, NULL, NULL);
-+	ASSERT_RETURN(ret == -EAGAIN);
-+
-+	/*
-+	 * Perform last tests, allow others to talk to name
-+	 * "com.example.broadcastA". So now receiving broadcasts
-+	 * from it should succeed since the TALK policy allows it.
-+	 */
-+
-+	/* KDBUS_POLICY_TALK for unprivileged connections */
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_WORLD,
-+		.id = geteuid(),
-+		.access = KDBUS_POLICY_TALK,
-+	};
-+
-+	ret = kdbus_conn_update_policy(holder_a,
-+				       "com.example.broadcastA",
-+				       &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * The unprivileged connection is able to TALK to
-+	 * "com.example.broadcastA" now, so it will receive its broadcasts.
-+	 */
-+	ret = test_policy_priv_by_broadcast(env->buspath, owner_a,
-+					    DO_NOT_DROP, 0, 0);
-+	ASSERT_RETURN(ret == 0);
-+
-+	++expected_cookie;
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "com.example.broadcastB",
-+					 NULL);
-+		ASSERT_EXIT(ret >= 0);
-+		ret = kdbus_msg_send(unpriv, NULL, expected_cookie,
-+				     0, 0, 0, KDBUS_DST_ID_BROADCAST);
-+		ASSERT_EXIT(ret == 0);
-+	}));
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* owner_a is privileged, so it will get the broadcast now. */
-+	ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* confirm the received cookie */
-+	ASSERT_RETURN(msg->cookie == expected_cookie);
-+
-+	kdbus_msg_free(msg);
-+
-+	/*
-+	 * owner_a released name "com.example.broadcastA". It should
-+	 * receive broadcasts since it is still privileged and has
-+	 * the right match.
-+	 *
-+	 * Unprivileged connection will own a name and will try to
-+	 * signal to the privileged connection.
-+	 */
-+
-+	ret = kdbus_name_release(owner_a, "com.example.broadcastA");
-+	ASSERT_EXIT(ret >= 0);
-+
-+	++expected_cookie;
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "com.example.broadcastB",
-+					 NULL);
-+		ASSERT_EXIT(ret >= 0);
-+		ret = kdbus_msg_send(unpriv, NULL, expected_cookie,
-+				     0, 0, 0, KDBUS_DST_ID_BROADCAST);
-+		ASSERT_EXIT(ret == 0);
-+	}));
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* owner_a will get the broadcast now. */
-+	ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/* confirm the received cookie */
-+	ASSERT_RETURN(msg->cookie == expected_cookie);
-+
-+	kdbus_msg_free(msg);
-+
-+	kdbus_conn_free(owner_a);
-+	kdbus_conn_free(owner_b);
-+	kdbus_conn_free(holder_a);
-+	kdbus_conn_free(holder_b);
-+
-+	return 0;
-+}
-+
-+static int test_policy_priv(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *conn_a, *conn_b, *conn, *owner;
-+	struct kdbus_policy_access access, *acc;
-+	sigset_t sset;
-+	size_t num;
-+	int ret;
-+
-+	/*
-+	 * Make sure we have CAP_SETUID/SETGID so we can drop privileges
-+	 */
-+
-+	ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	if (!ret)
-+		return TEST_SKIP;
-+
-+	/* make sure that uids and gids are mapped */
-+	if (!all_uids_gids_are_mapped())
-+		return TEST_SKIP;
-+
-+	/*
-+	 * Setup:
-+	 *  conn_a: policy holder for com.example.a
-+	 *  conn_b: name holder of com.example.b
-+	 */
-+
-+	signal(SIGUSR1, nosig);
-+	sigemptyset(&sset);
-+	sigaddset(&sset, SIGUSR1);
-+	sigprocmask(SIG_BLOCK, &sset, NULL);
-+
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn);
-+
-+	/*
-+	 * Before registering any policy holder, make sure that the
-+	 * bus is secure by default. This test is necessary: it catches
-+	 * several cases where old D-Bus was vulnerable.
-+	 */
-+
-+	ret = test_priv_before_policy_upload(env);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Make sure unprivileged users are not able to register policy
-+	 * holders.
-+	 */
-+
-+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
-+		struct kdbus_conn *holder;
-+
-+		holder = kdbus_hello_registrar(env->buspath,
-+					       "com.example.a", NULL, 0,
-+					       KDBUS_HELLO_POLICY_HOLDER);
-+		ASSERT_EXIT(holder == NULL && errno == EPERM);
-+	}),
-+	({ 0; }));
-+	ASSERT_RETURN(ret == 0);
-+
-+
-+	/* Register policy holder */
-+
-+	conn_a = kdbus_hello_registrar(env->buspath, "com.example.a",
-+				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
-+	ASSERT_RETURN(conn_a);
-+
-+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn_b);
-+
-+	ret = kdbus_name_acquire(conn_b, "com.example.b", NULL);
-+	ASSERT_EXIT(ret >= 0);
-+
-+	/*
-+	 * Make sure bus-owners can always acquire names.
-+	 */
-+	ret = kdbus_name_acquire(conn, "com.example.a", NULL);
-+	ASSERT_EXIT(ret >= 0);
-+
-+	kdbus_conn_free(conn);
-+
-+	/*
-+	 * Make sure unprivileged users cannot acquire names with default
-+	 * policy assigned.
-+	 */
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+		ASSERT_EXIT(ret < 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/*
-+	 * Make sure unprivileged users can acquire names if we make them
-+	 * world-accessible.
-+	 */
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_WORLD,
-+		.id = 0,
-+		.access = KDBUS_POLICY_OWN,
-+	};
-+
-+	/*
-+	 * Make sure unprivileged/normal connections are not able
-+	 * to update policies
-+	 */
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_conn_update_policy(unpriv, "com.example.a",
-+					       &access, 1);
-+		ASSERT_EXIT(ret == -EOPNOTSUPP);
-+	}));
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+		ASSERT_EXIT(ret >= 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/*
-+	 * Make sure unprivileged users can acquire names if we make them
-+	 * gid-accessible. But only if the gid matches.
-+	 */
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_GROUP,
-+		.id = UNPRIV_GID,
-+		.access = KDBUS_POLICY_OWN,
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+		ASSERT_EXIT(ret >= 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_GROUP,
-+		.id = 1,
-+		.access = KDBUS_POLICY_OWN,
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+		ASSERT_EXIT(ret < 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/*
-+	 * Make sure unprivileged users can acquire names if we make them
-+	 * uid-accessible. But only if the uid matches.
-+	 */
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_USER,
-+		.id = UNPRIV_UID,
-+		.access = KDBUS_POLICY_OWN,
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+		ASSERT_EXIT(ret >= 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_USER,
-+		.id = 1,
-+		.access = KDBUS_POLICY_OWN,
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+		ASSERT_EXIT(ret < 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/*
-+	 * Make sure unprivileged users cannot acquire names if no owner-policy
-+	 * matches, even if SEE/TALK policies match.
-+	 */
-+
-+	num = 4;
-+	acc = (struct kdbus_policy_access[]){
-+		{
-+			.type = KDBUS_POLICY_ACCESS_GROUP,
-+			.id = UNPRIV_GID,
-+			.access = KDBUS_POLICY_SEE,
-+		},
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = UNPRIV_UID,
-+			.access = KDBUS_POLICY_TALK,
-+		},
-+		{
-+			.type = KDBUS_POLICY_ACCESS_WORLD,
-+			.id = 0,
-+			.access = KDBUS_POLICY_TALK,
-+		},
-+		{
-+			.type = KDBUS_POLICY_ACCESS_WORLD,
-+			.id = 0,
-+			.access = KDBUS_POLICY_SEE,
-+		},
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", acc, num);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+		ASSERT_EXIT(ret < 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/*
-+	 * Make sure unprivileged users can acquire names if the only matching
-+	 * policy is somewhere in the middle.
-+	 */
-+
-+	num = 5;
-+	acc = (struct kdbus_policy_access[]){
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = 1,
-+			.access = KDBUS_POLICY_OWN,
-+		},
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = 2,
-+			.access = KDBUS_POLICY_OWN,
-+		},
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = UNPRIV_UID,
-+			.access = KDBUS_POLICY_OWN,
-+		},
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = 3,
-+			.access = KDBUS_POLICY_OWN,
-+		},
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = 4,
-+			.access = KDBUS_POLICY_OWN,
-+		},
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", acc, num);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+		ASSERT_EXIT(ret >= 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/*
-+	 * Clear policies
-+	 */
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", NULL, 0);
-+	ASSERT_RETURN(ret == 0);
-+
-+	/*
-+	 * Make sure privileged bus users can _always_ talk to others.
-+	 */
-+
-+	conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn);
-+
-+	ret = kdbus_msg_send(conn, "com.example.b", 0xdeadbeef, 0, 0, 0, 0);
-+	ASSERT_EXIT(ret >= 0);
-+
-+	ret = kdbus_msg_recv_poll(conn_b, 300, NULL, NULL);
-+	ASSERT_EXIT(ret >= 0);
-+
-+	kdbus_conn_free(conn);
-+
-+	/*
-+	 * Make sure unprivileged bus users cannot talk by default.
-+	 */
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+				     0, 0);
-+		ASSERT_EXIT(ret == -EPERM);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/*
-+	 * Make sure unprivileged bus users can talk to equals, even without
-+	 * policy.
-+	 */
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_USER,
-+		.id = UNPRIV_UID,
-+		.access = KDBUS_POLICY_OWN,
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.c", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		struct kdbus_conn *owner;
-+
-+		owner = kdbus_hello(env->buspath, 0, NULL, 0);
-+		ASSERT_RETURN(owner);
-+
-+		ret = kdbus_name_acquire(owner, "com.example.c", NULL);
-+		ASSERT_EXIT(ret >= 0);
-+
-+		ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
-+				     0, 0);
-+		ASSERT_EXIT(ret >= 0);
-+		ret = kdbus_msg_recv_poll(owner, 100, NULL, NULL);
-+		ASSERT_EXIT(ret >= 0);
-+
-+		kdbus_conn_free(owner);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/*
-+	 * Make sure unprivileged bus users can talk to privileged users if a
-+	 * suitable UID policy is set.
-+	 */
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_USER,
-+		.id = UNPRIV_UID,
-+		.access = KDBUS_POLICY_TALK,
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+				     0, 0);
-+		ASSERT_EXIT(ret >= 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
-+	ASSERT_EXIT(ret >= 0);
-+
-+	/*
-+	 * Make sure unprivileged bus users can talk to privileged users if a
-+	 * suitable GID policy is set.
-+	 */
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_GROUP,
-+		.id = UNPRIV_GID,
-+		.access = KDBUS_POLICY_TALK,
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+				     0, 0);
-+		ASSERT_EXIT(ret >= 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
-+	ASSERT_EXIT(ret >= 0);
-+
-+	/*
-+	 * Make sure unprivileged bus users can talk to privileged users if a
-+	 * suitable WORLD policy is set.
-+	 */
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_WORLD,
-+		.id = 0,
-+		.access = KDBUS_POLICY_TALK,
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+				     0, 0);
-+		ASSERT_EXIT(ret >= 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
-+	ASSERT_EXIT(ret >= 0);
-+
-+	/*
-+	 * Make sure unprivileged bus users cannot talk to privileged users if
-+	 * no suitable policy is set.
-+	 */
-+
-+	num = 5;
-+	acc = (struct kdbus_policy_access[]){
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = 0,
-+			.access = KDBUS_POLICY_OWN,
-+		},
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = 1,
-+			.access = KDBUS_POLICY_TALK,
-+		},
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = UNPRIV_UID,
-+			.access = KDBUS_POLICY_SEE,
-+		},
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = 3,
-+			.access = KDBUS_POLICY_TALK,
-+		},
-+		{
-+			.type = KDBUS_POLICY_ACCESS_USER,
-+			.id = 4,
-+			.access = KDBUS_POLICY_TALK,
-+		},
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.b", acc, num);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+				     0, 0);
-+		ASSERT_EXIT(ret == -EPERM);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/*
-+	 * Make sure unprivileged bus users can talk to privileged users if a
-+	 * suitable OWN privilege overwrites TALK.
-+	 */
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_WORLD,
-+		.id = 0,
-+		.access = KDBUS_POLICY_OWN,
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+				     0, 0);
-+		ASSERT_EXIT(ret >= 0);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
-+	ASSERT_EXIT(ret >= 0);
-+
-+	/*
-+	 * Make sure the TALK cache is reset correctly when policies are
-+	 * updated.
-+	 */
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_WORLD,
-+		.id = 0,
-+		.access = KDBUS_POLICY_TALK,
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+				     0, 0);
-+		ASSERT_EXIT(ret >= 0);
-+
-+		ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
-+		ASSERT_EXIT(ret >= 0);
-+
-+		ret = kdbus_conn_update_policy(conn_a, "com.example.b",
-+					       NULL, 0);
-+		ASSERT_RETURN(ret == 0);
-+
-+		ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+				     0, 0);
-+		ASSERT_EXIT(ret == -EPERM);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+	/*
-+	 * Make sure the TALK cache is reset correctly when policy holders
-+	 * disconnect.
-+	 */
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_WORLD,
-+		.id = 0,
-+		.access = KDBUS_POLICY_OWN,
-+	};
-+
-+	conn = kdbus_hello_registrar(env->buspath, "com.example.c",
-+				     NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
-+	ASSERT_RETURN(conn);
-+
-+	ret = kdbus_conn_update_policy(conn, "com.example.c", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	owner = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(owner);
-+
-+	ret = kdbus_name_acquire(owner, "com.example.c", NULL);
-+	ASSERT_RETURN(ret >= 0);
-+
-+	ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
-+		struct kdbus_conn *unpriv;
-+
-+		/* wait for parent to be finished */
-+		sigemptyset(&sset);
-+		ret = sigsuspend(&sset);
-+		ASSERT_RETURN(ret == -1 && errno == EINTR);
-+
-+		unpriv = kdbus_hello(env->buspath, 0, NULL, 0);
-+		ASSERT_RETURN(unpriv);
-+
-+		ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
-+				     0, 0);
-+		ASSERT_EXIT(ret >= 0);
-+
-+		ret = kdbus_msg_recv_poll(owner, 100, NULL, NULL);
-+		ASSERT_EXIT(ret >= 0);
-+
-+		/* free policy holder */
-+		kdbus_conn_free(conn);
-+
-+		ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
-+				     0, 0);
-+		ASSERT_EXIT(ret == -EPERM);
-+
-+		kdbus_conn_free(unpriv);
-+	}), ({
-+		/* make sure policy holder is only valid in child */
-+		kdbus_conn_free(conn);
-+		kill(pid, SIGUSR1);
-+	}));
-+	ASSERT_RETURN(ret >= 0);
-+
-+
-+	/*
-+	 * The following tests are necessary.
-+	 */
-+
-+	ret = test_broadcast_after_policy_upload(env);
-+	ASSERT_RETURN(ret == 0);
-+
-+	kdbus_conn_free(owner);
-+
-+	/*
-+	 * cleanup resources
-+	 */
-+
-+	kdbus_conn_free(conn_b);
-+	kdbus_conn_free(conn_a);
-+
-+	return TEST_OK;
-+}
-+
-+int kdbus_test_policy_priv(struct kdbus_test_env *env)
-+{
-+	pid_t pid;
-+	int ret;
-+
-+	/* make sure to exit() if a child returns from fork() */
-+	pid = getpid();
-+	ret = test_policy_priv(env);
-+	if (pid != getpid())
-+		exit(1);
-+
-+	return ret;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-policy.c b/tools/testing/selftests/kdbus/test-policy.c
-new file mode 100644
-index 0000000..96d20d5
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-policy.c
-@@ -0,0 +1,80 @@
-+#include <errno.h>
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stdint.h>
-+#include <stdbool.h>
-+#include <unistd.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+int kdbus_test_policy(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *conn_a, *conn_b;
-+	struct kdbus_policy_access access;
-+	int ret;
-+
-+	/* Invalid name */
-+	conn_a = kdbus_hello_registrar(env->buspath, ".example.a",
-+				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
-+	ASSERT_RETURN(conn_a == NULL);
-+
-+	conn_a = kdbus_hello_registrar(env->buspath, "example",
-+				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
-+	ASSERT_RETURN(conn_a == NULL);
-+
-+	conn_a = kdbus_hello_registrar(env->buspath, "com.example.a",
-+				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
-+	ASSERT_RETURN(conn_a);
-+
-+	conn_b = kdbus_hello_registrar(env->buspath, "com.example.b",
-+				       NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
-+	ASSERT_RETURN(conn_b);
-+
-+	/*
-+	 * Verify there cannot be any duplicate entries, except for specific vs.
-+	 * wildcard entries.
-+	 */
-+
-+	access = (struct kdbus_policy_access){
-+		.type = KDBUS_POLICY_ACCESS_USER,
-+		.id = geteuid(),
-+		.access = KDBUS_POLICY_SEE,
-+	};
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_conn_update_policy(conn_b, "com.example.a", &access, 1);
-+	ASSERT_RETURN(ret == -EEXIST);
-+
-+	ret = kdbus_conn_update_policy(conn_b, "com.example.a.*", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.a.*", &access, 1);
-+	ASSERT_RETURN(ret == -EEXIST);
-+
-+	ret = kdbus_conn_update_policy(conn_a, "com.example.*", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_conn_update_policy(conn_b, "com.example.a", &access, 1);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = kdbus_conn_update_policy(conn_b, "com.example.*", &access, 1);
-+	ASSERT_RETURN(ret == -EEXIST);
-+
-+	/* Invalid name */
-+	ret = kdbus_conn_update_policy(conn_b, ".example.*", &access, 1);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	ret = kdbus_conn_update_policy(conn_b, "example", &access, 1);
-+	ASSERT_RETURN(ret == -EINVAL);
-+
-+	kdbus_conn_free(conn_b);
-+	kdbus_conn_free(conn_a);
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-sync.c b/tools/testing/selftests/kdbus/test-sync.c
-new file mode 100644
-index 0000000..0655a54
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-sync.c
-@@ -0,0 +1,369 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <pthread.h>
-+#include <stdbool.h>
-+#include <signal.h>
-+#include <sys/wait.h>
-+#include <sys/eventfd.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+static struct kdbus_conn *conn_a, *conn_b;
-+static unsigned int cookie = 0xdeadbeef;
-+
-+static void nop_handler(int sig) {}
-+
-+static int interrupt_sync(struct kdbus_conn *conn_src,
-+			  struct kdbus_conn *conn_dst)
-+{
-+	pid_t pid;
-+	int ret, status;
-+	struct kdbus_msg *msg = NULL;
-+	struct sigaction sa = {
-+		.sa_handler = nop_handler,
-+		.sa_flags = SA_NOCLDSTOP|SA_RESTART,
-+	};
-+
-+	cookie++;
-+	pid = fork();
-+	ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+	if (pid == 0) {
-+		ret = sigaction(SIGINT, &sa, NULL);
-+		ASSERT_EXIT(ret == 0);
-+
-+		ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
-+					  KDBUS_MSG_EXPECT_REPLY,
-+					  100000000ULL, 0, conn_src->id, -1);
-+		ASSERT_EXIT(ret == -ETIMEDOUT);
-+
-+		_exit(EXIT_SUCCESS);
-+	}
-+
-+	ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
-+	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
-+
-+	kdbus_msg_free(msg);
-+
-+	ret = kill(pid, SIGINT);
-+	ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+	ret = waitpid(pid, &status, 0);
-+	ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+	if (WIFSIGNALED(status))
-+		return TEST_ERR;
-+
-+	ret = kdbus_msg_recv_poll(conn_src, 100, NULL, NULL);
-+	ASSERT_RETURN(ret == -ETIMEDOUT);
-+
-+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+static int close_epipe_sync(const char *bus)
-+{
-+	pid_t pid;
-+	int ret, status;
-+	struct kdbus_conn *conn_src;
-+	struct kdbus_conn *conn_dst;
-+	struct kdbus_msg *msg = NULL;
-+
-+	conn_src = kdbus_hello(bus, 0, NULL, 0);
-+	ASSERT_RETURN(conn_src);
-+
-+	ret = kdbus_add_match_empty(conn_src);
-+	ASSERT_RETURN(ret == 0);
-+
-+	conn_dst = kdbus_hello(bus, 0, NULL, 0);
-+	ASSERT_RETURN(conn_dst);
-+
-+	cookie++;
-+	pid = fork();
-+	ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+	if (pid == 0) {
-+		uint64_t dst_id;
-+
-+		/* close our reference */
-+		dst_id = conn_dst->id;
-+		kdbus_conn_free(conn_dst);
-+
-+		ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
-+		ASSERT_EXIT(ret == 0 && msg->cookie == cookie);
-+		ASSERT_EXIT(msg->src_id == dst_id);
-+
-+		cookie++;
-+		ret = kdbus_msg_send_sync(conn_src, NULL, cookie,
-+					  KDBUS_MSG_EXPECT_REPLY,
-+					  100000000ULL, 0, dst_id, -1);
-+		ASSERT_EXIT(ret == -EPIPE);
-+
-+		_exit(EXIT_SUCCESS);
-+	}
-+
-+	ret = kdbus_msg_send(conn_dst, NULL, cookie, 0, 0, 0,
-+			     KDBUS_DST_ID_BROADCAST);
-+	ASSERT_RETURN(ret == 0);
-+
-+	cookie++;
-+	ret = kdbus_msg_recv_poll(conn_dst, 100, &msg, NULL);
-+	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
-+
-+	kdbus_msg_free(msg);
-+
-+	/* destroy connection */
-+	kdbus_conn_free(conn_dst);
-+	kdbus_conn_free(conn_src);
-+
-+	ret = waitpid(pid, &status, 0);
-+	ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+	if (!WIFEXITED(status))
-+		return TEST_ERR;
-+
-+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+static int cancel_fd_sync(struct kdbus_conn *conn_src,
-+			  struct kdbus_conn *conn_dst)
-+{
-+	pid_t pid;
-+	int cancel_fd;
-+	int ret, status;
-+	uint64_t counter = 1;
-+	struct kdbus_msg *msg = NULL;
-+
-+	cancel_fd = eventfd(0, 0);
-+	ASSERT_RETURN_VAL(cancel_fd >= 0, cancel_fd);
-+
-+	cookie++;
-+	pid = fork();
-+	ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+	if (pid == 0) {
-+		ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
-+					  KDBUS_MSG_EXPECT_REPLY,
-+					  100000000ULL, 0, conn_src->id,
-+					  cancel_fd);
-+		ASSERT_EXIT(ret == -ECANCELED);
-+
-+		_exit(EXIT_SUCCESS);
-+	}
-+
-+	ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
-+	ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
-+
-+	kdbus_msg_free(msg);
-+
-+	ret = write(cancel_fd, &counter, sizeof(counter));
-+	ASSERT_RETURN(ret == sizeof(counter));
-+
-+	ret = waitpid(pid, &status, 0);
-+	ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+	if (WIFSIGNALED(status))
-+		return TEST_ERR;
-+
-+	return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+static int no_cancel_sync(struct kdbus_conn *conn_src,
-+			  struct kdbus_conn *conn_dst)
-+{
-+	pid_t pid;
-+	int cancel_fd;
-+	int ret, status;
-+	struct kdbus_msg *msg = NULL;
-+
-+	/* pass eventfd, but never signal it so it shouldn't have any effect */
-+
-+	cancel_fd = eventfd(0, 0);
-+	ASSERT_RETURN_VAL(cancel_fd >= 0, cancel_fd);
-+
-+	cookie++;
-+	pid = fork();
-+	ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+	if (pid == 0) {
-+		ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
-+					  KDBUS_MSG_EXPECT_REPLY,
-+					  100000000ULL, 0, conn_src->id,
-+					  cancel_fd);
-+		ASSERT_EXIT(ret == 0);
-+
-+		_exit(EXIT_SUCCESS);
-+	}
-+
-+	ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
-+	ASSERT_RETURN_VAL(ret == 0 && msg->cookie == cookie, -1);
-+
-+	kdbus_msg_free(msg);
-+
-+	ret = kdbus_msg_send_reply(conn_src, cookie, conn_dst->id);
-+	ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+	ret = waitpid(pid, &status, 0);
-+	ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+	if (WIFSIGNALED(status))
-+		return -1;
-+
-+	return (status == EXIT_SUCCESS) ? 0 : -1;
-+}
-+
-+static void *run_thread_reply(void *data)
-+{
-+	int ret;
-+	unsigned long status = TEST_OK;
-+
-+	ret = kdbus_msg_recv_poll(conn_a, 3000, NULL, NULL);
-+	if (ret < 0)
-+		goto exit_thread;
-+
-+	kdbus_printf("Thread received message, sending reply ...\n");
-+
-+	/* using an unknown cookie must fail */
-+	ret = kdbus_msg_send_reply(conn_a, ~cookie, conn_b->id);
-+	if (ret != -EBADSLT) {
-+		status = TEST_ERR;
-+		goto exit_thread;
-+	}
-+
-+	ret = kdbus_msg_send_reply(conn_a, cookie, conn_b->id);
-+	if (ret != 0) {
-+		status = TEST_ERR;
-+		goto exit_thread;
-+	}
-+
-+exit_thread:
-+	pthread_exit(NULL);
-+	return (void *) status;
-+}
-+
-+int kdbus_test_sync_reply(struct kdbus_test_env *env)
-+{
-+	unsigned long status;
-+	pthread_t thread;
-+	int ret;
-+
-+	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn_a && conn_b);
-+
-+	pthread_create(&thread, NULL, run_thread_reply, NULL);
-+
-+	ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
-+				  KDBUS_MSG_EXPECT_REPLY,
-+				  5000000000ULL, 0, conn_a->id, -1);
-+
-+	pthread_join(thread, (void *) &status);
-+	ASSERT_RETURN(status == 0);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = interrupt_sync(conn_a, conn_b);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = close_epipe_sync(env->buspath);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = cancel_fd_sync(conn_a, conn_b);
-+	ASSERT_RETURN(ret == 0);
-+
-+	ret = no_cancel_sync(conn_a, conn_b);
-+	ASSERT_RETURN(ret == 0);
-+
-+	kdbus_printf("-- closing bus connections\n");
-+
-+	kdbus_conn_free(conn_a);
-+	kdbus_conn_free(conn_b);
-+
-+	return TEST_OK;
-+}
-+
-+#define BYEBYE_ME ((void*)0L)
-+#define BYEBYE_THEM ((void*)1L)
-+
-+static void *run_thread_byebye(void *data)
-+{
-+	struct kdbus_cmd cmd_byebye = { .size = sizeof(cmd_byebye) };
-+	int ret;
-+
-+	ret = kdbus_msg_recv_poll(conn_a, 3000, NULL, NULL);
-+	if (ret == 0) {
-+		kdbus_printf("Thread received message, invoking BYEBYE ...\n");
-+		kdbus_msg_recv(conn_a, NULL, NULL);
-+		if (data == BYEBYE_ME)
-+			kdbus_cmd_byebye(conn_b->fd, &cmd_byebye);
-+		else if (data == BYEBYE_THEM)
-+			kdbus_cmd_byebye(conn_a->fd, &cmd_byebye);
-+	}
-+
-+	pthread_exit(NULL);
-+	return NULL;
-+}
-+
-+int kdbus_test_sync_byebye(struct kdbus_test_env *env)
-+{
-+	pthread_t thread;
-+	int ret;
-+
-+	/*
-+	 * This sends a synchronous message to a thread, which waits until it
-+	 * received the message and then invokes BYEBYE on the *ORIGINAL*
-+	 * connection. That is, on the same connection that synchronously waits
-+	 * for a reply.
-+	 * This should properly wake the connection up and cause ECONNRESET as
-+	 * the connection is disconnected now.
-+	 *
-+	 * The second time, we do the same but invoke BYEBYE on the *TARGET*
-+	 * connection. This should also wake up the synchronous sender as the
-+	 * reply cannot be sent by a disconnected target.
-+	 */
-+
-+	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn_a && conn_b);
-+
-+	pthread_create(&thread, NULL, run_thread_byebye, BYEBYE_ME);
-+
-+	ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
-+				  KDBUS_MSG_EXPECT_REPLY,
-+				  5000000000ULL, 0, conn_a->id, -1);
-+
-+	ASSERT_RETURN(ret == -ECONNRESET);
-+
-+	pthread_join(thread, NULL);
-+
-+	kdbus_conn_free(conn_a);
-+	kdbus_conn_free(conn_b);
-+
-+	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn_a && conn_b);
-+
-+	pthread_create(&thread, NULL, run_thread_byebye, BYEBYE_THEM);
-+
-+	ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
-+				  KDBUS_MSG_EXPECT_REPLY,
-+				  5000000000ULL, 0, conn_a->id, -1);
-+
-+	ASSERT_RETURN(ret == -EPIPE);
-+
-+	pthread_join(thread, NULL);
-+
-+	kdbus_conn_free(conn_a);
-+	kdbus_conn_free(conn_b);
-+
-+	return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-timeout.c b/tools/testing/selftests/kdbus/test-timeout.c
-new file mode 100644
-index 0000000..cfd1930
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-timeout.c
-@@ -0,0 +1,99 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <poll.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+int timeout_msg_recv(struct kdbus_conn *conn, uint64_t *expected)
-+{
-+	struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
-+	struct kdbus_msg *msg;
-+	int ret;
-+
-+	ret = kdbus_cmd_recv(conn->fd, &recv);
-+	if (ret < 0) {
-+		kdbus_printf("error receiving message: %d (%m)\n", ret);
-+		return ret;
-+	}
-+
-+	msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
-+
-+	ASSERT_RETURN_VAL(msg->payload_type == KDBUS_PAYLOAD_KERNEL, -EINVAL);
-+	ASSERT_RETURN_VAL(msg->src_id == KDBUS_SRC_ID_KERNEL, -EINVAL);
-+	ASSERT_RETURN_VAL(msg->dst_id == conn->id, -EINVAL);
-+
-+	*expected &= ~(1ULL << msg->cookie_reply);
-+	kdbus_printf("Got message timeout for cookie %llu\n",
-+		     msg->cookie_reply);
-+
-+	ret = kdbus_free(conn, recv.msg.offset);
-+	if (ret < 0)
-+		return ret;
-+
-+	return 0;
-+}
-+
-+int kdbus_test_timeout(struct kdbus_test_env *env)
-+{
-+	struct kdbus_conn *conn_a, *conn_b;
-+	struct pollfd fd;
-+	int ret, i, n_msgs = 4;
-+	uint64_t expected = 0;
-+	uint64_t cookie = 0xdeadbeef;
-+
-+	conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+	conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+	ASSERT_RETURN(conn_a && conn_b);
-+
-+	fd.fd = conn_b->fd;
-+
-+	/*
-+	 * send messages that expect a reply (within 100 msec),
-+	 * but never answer it.
-+	 */
-+	for (i = 0; i < n_msgs; i++, cookie++) {
-+		kdbus_printf("Sending message with cookie %llu ...\n",
-+			     (unsigned long long)cookie);
-+		ASSERT_RETURN(kdbus_msg_send(conn_b, NULL, cookie,
-+			      KDBUS_MSG_EXPECT_REPLY,
-+			      (i + 1) * 100ULL * 1000000ULL, 0,
-+			      conn_a->id) == 0);
-+		expected |= 1ULL << cookie;
-+	}
-+
-+	for (;;) {
-+		fd.events = POLLIN | POLLPRI | POLLHUP;
-+		fd.revents = 0;
-+
-+		ret = poll(&fd, 1, (n_msgs + 1) * 100);
-+		if (ret == 0)
-+			kdbus_printf("--- timeout\n");
-+		if (ret <= 0)
-+			break;
-+
-+		if (fd.revents & POLLIN)
-+			ASSERT_RETURN(!timeout_msg_recv(conn_b, &expected));
-+
-+		if (expected == 0)
-+			break;
-+	}
-+
-+	ASSERT_RETURN(expected == 0);
-+
-+	kdbus_conn_free(conn_a);
-+	kdbus_conn_free(conn_b);
-+
-+	return TEST_OK;
-+}
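
The cancel_fd_sync() and no_cancel_sync() tests in test-sync.c above pass an eventfd to the synchronous send and either signal it (expecting -ECANCELED) or leave it untouched. The following is a minimal userspace sketch of that cancellation pattern using only eventfd(2) and poll(2). It does not use kdbus at all: wait_or_cancel() and signal_cancel() are hypothetical helper names introduced here for illustration, and the real wait happens inside the kernel rather than in a poll() loop like this one.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of kdbus): block for up to timeout_ms
 * until cancel_fd is signalled. Returns -ECANCELED if the eventfd became
 * readable, -ETIMEDOUT if the timeout expired first, or a negative errno
 * value if poll() itself failed.
 */
static int wait_or_cancel(int cancel_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = cancel_fd, .events = POLLIN };
	int ret;

	/* Retry on signal interruption; this restarts the full timeout. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;
	if (ret == 0)
		return -ETIMEDOUT;

	return (pfd.revents & POLLIN) ? -ECANCELED : -EIO;
}

/*
 * Hypothetical helper: signal cancellation by adding 1 to the eventfd
 * counter. The value written to an eventfd is an 8-byte integer.
 */
static int signal_cancel(int cancel_fd)
{
	uint64_t counter = 1;
	ssize_t n = write(cancel_fd, &counter, sizeof(counter));

	return (n == (ssize_t)sizeof(counter)) ? 0 : -1;
}
```

With a descriptor from eventfd(0, 0), wait_or_cancel() times out while nothing has been written and reports cancellation once signal_cancel() has run, which mirrors the two outcomes the tests distinguish.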



* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-11-10  0:58 Mike Pagano
From: Mike Pagano @ 2015-11-10  0:58 UTC
  To: gentoo-commits

commit:     decadf545e156ac9100bb99a7dc63d44bbfb1c08
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Tue Nov 10 00:58:38 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Tue Nov 10 00:58:38 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=decadf54

Linux patch 4.2.6

 0000_README            |    4 +
 1005_linux-4.2.6.patch | 3380 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 3384 insertions(+)

diff --git a/0000_README b/0000_README
index cf9d964..8190b77 100644
--- a/0000_README
+++ b/0000_README
@@ -63,6 +63,10 @@ Patch:  1004_linux-4.2.5.patch
 From:   http://www.kernel.org
 Desc:   Linux 4.2.5
 
+Patch:  1005_linux-4.2.6.patch
+From:   http://www.kernel.org
+Desc:   Linux 4.2.6
+
 Patch:  1500_XATTR_USER_PREFIX.patch
 From:   https://bugs.gentoo.org/show_bug.cgi?id=470644
 Desc:   Support for namespace user.pax.* on tmpfs.

diff --git a/1005_linux-4.2.6.patch b/1005_linux-4.2.6.patch
new file mode 100644
index 0000000..39cc395
--- /dev/null
+++ b/1005_linux-4.2.6.patch
@@ -0,0 +1,3380 @@
+diff --git a/Makefile b/Makefile
+index 96076dcad18e..9ef37399b4e8 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 5
++SUBLEVEL = 6
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+ 
+diff --git a/arch/arm/boot/dts/am57xx-beagle-x15.dts b/arch/arm/boot/dts/am57xx-beagle-x15.dts
+index a63bf78191ea..03385fabf839 100644
+--- a/arch/arm/boot/dts/am57xx-beagle-x15.dts
++++ b/arch/arm/boot/dts/am57xx-beagle-x15.dts
+@@ -415,11 +415,12 @@
+ 				/* SMPS9 unused */
+ 
+ 				ldo1_reg: ldo1 {
+-					/* VDD_SD  */
++					/* VDD_SD / VDDSHV8  */
+ 					regulator-name = "ldo1";
+ 					regulator-min-microvolt = <1800000>;
+ 					regulator-max-microvolt = <3300000>;
+ 					regulator-boot-on;
++					regulator-always-on;
+ 				};
+ 
+ 				ldo2_reg: ldo2 {
+diff --git a/arch/arm/boot/dts/armada-385-db-ap.dts b/arch/arm/boot/dts/armada-385-db-ap.dts
+index 89f5a95954ed..4047621b137e 100644
+--- a/arch/arm/boot/dts/armada-385-db-ap.dts
++++ b/arch/arm/boot/dts/armada-385-db-ap.dts
+@@ -46,7 +46,7 @@
+ 
+ / {
+ 	model = "Marvell Armada 385 Access Point Development Board";
+-	compatible = "marvell,a385-db-ap", "marvell,armada385", "marvell,armada38x";
++	compatible = "marvell,a385-db-ap", "marvell,armada385", "marvell,armada380";
+ 
+ 	chosen {
+ 		stdout-path = "serial1:115200n8";
+diff --git a/arch/arm/boot/dts/berlin2q.dtsi b/arch/arm/boot/dts/berlin2q.dtsi
+index 63a48490e2f9..d4dbd28d348c 100644
+--- a/arch/arm/boot/dts/berlin2q.dtsi
++++ b/arch/arm/boot/dts/berlin2q.dtsi
+@@ -152,7 +152,7 @@
+ 		};
+ 
+ 		usb_phy2: phy@a2f400 {
+-			compatible = "marvell,berlin2-usb-phy";
++			compatible = "marvell,berlin2cd-usb-phy";
+ 			reg = <0xa2f400 0x128>;
+ 			#phy-cells = <0>;
+ 			resets = <&chip_rst 0x104 14>;
+@@ -170,7 +170,7 @@
+ 		};
+ 
+ 		usb_phy0: phy@b74000 {
+-			compatible = "marvell,berlin2-usb-phy";
++			compatible = "marvell,berlin2cd-usb-phy";
+ 			reg = <0xb74000 0x128>;
+ 			#phy-cells = <0>;
+ 			resets = <&chip_rst 0x104 12>;
+@@ -178,7 +178,7 @@
+ 		};
+ 
+ 		usb_phy1: phy@b78000 {
+-			compatible = "marvell,berlin2-usb-phy";
++			compatible = "marvell,berlin2cd-usb-phy";
+ 			reg = <0xb78000 0x128>;
+ 			#phy-cells = <0>;
+ 			resets = <&chip_rst 0x104 13>;
+diff --git a/arch/arm/boot/dts/exynos5420-peach-pit.dts b/arch/arm/boot/dts/exynos5420-peach-pit.dts
+index 8f4d76c5e11c..1b95da79293c 100644
+--- a/arch/arm/boot/dts/exynos5420-peach-pit.dts
++++ b/arch/arm/boot/dts/exynos5420-peach-pit.dts
+@@ -915,6 +915,11 @@
+ 	};
+ };
+ 
++&pmu_system_controller {
++	assigned-clocks = <&pmu_system_controller 0>;
++	assigned-clock-parents = <&clock CLK_FIN_PLL>;
++};
++
+ &rtc {
+ 	status = "okay";
+ 	clocks = <&clock CLK_RTC>, <&max77802 MAX77802_CLK_32K_AP>;
+diff --git a/arch/arm/boot/dts/exynos5800-peach-pi.dts b/arch/arm/boot/dts/exynos5800-peach-pi.dts
+index 7d5b386b5ae6..8f40c7e549bd 100644
+--- a/arch/arm/boot/dts/exynos5800-peach-pi.dts
++++ b/arch/arm/boot/dts/exynos5800-peach-pi.dts
+@@ -878,6 +878,11 @@
+ 	};
+ };
+ 
++&pmu_system_controller {
++	assigned-clocks = <&pmu_system_controller 0>;
++	assigned-clock-parents = <&clock CLK_FIN_PLL>;
++};
++
+ &rtc {
+ 	status = "okay";
+ 	clocks = <&clock CLK_RTC>, <&max77802 MAX77802_CLK_32K_AP>;
+diff --git a/arch/arm/boot/dts/imx7d.dtsi b/arch/arm/boot/dts/imx7d.dtsi
+index c42cf8db0451..9accbae15374 100644
+--- a/arch/arm/boot/dts/imx7d.dtsi
++++ b/arch/arm/boot/dts/imx7d.dtsi
+@@ -340,10 +340,10 @@
+ 				status = "disabled";
+ 			};
+ 
+-			uart2: serial@30870000 {
++			uart2: serial@30890000 {
+ 				compatible = "fsl,imx7d-uart",
+ 					     "fsl,imx6q-uart";
+-				reg = <0x30870000 0x10000>;
++				reg = <0x30890000 0x10000>;
+ 				interrupts = <GIC_SPI 27 IRQ_TYPE_LEVEL_HIGH>;
+ 				clocks = <&clks IMX7D_UART2_ROOT_CLK>,
+ 					<&clks IMX7D_UART2_ROOT_CLK>;
+diff --git a/arch/arm/boot/dts/ste-hrefv60plus.dtsi b/arch/arm/boot/dts/ste-hrefv60plus.dtsi
+index 810cda743b6d..9c2387b34d0c 100644
+--- a/arch/arm/boot/dts/ste-hrefv60plus.dtsi
++++ b/arch/arm/boot/dts/ste-hrefv60plus.dtsi
+@@ -56,7 +56,7 @@
+ 					/* VMMCI level-shifter enable */
+ 					default_hrefv60_cfg2 {
+ 						pins = "GPIO169_D22";
+-						ste,config = <&gpio_out_lo>;
++						ste,config = <&gpio_out_hi>;
+ 					};
+ 					/* VMMCI level-shifter voltage select */
+ 					default_hrefv60_cfg3 {
+diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
+index bfb915d05665..dd5fc1e36384 100644
+--- a/arch/arm/kvm/Kconfig
++++ b/arch/arm/kvm/Kconfig
+@@ -21,6 +21,7 @@ config KVM
+ 	depends on MMU && OF
+ 	select PREEMPT_NOTIFIERS
+ 	select ANON_INODES
++	select ARM_GIC
+ 	select HAVE_KVM_CPU_RELAX_INTERCEPT
+ 	select HAVE_KVM_ARCH_TLB_FLUSH_ALL
+ 	select KVM_MMIO
+diff --git a/arch/arm/mach-exynos/pm_domains.c b/arch/arm/mach-exynos/pm_domains.c
+index 4a87e86dec45..7c21760f590f 100644
+--- a/arch/arm/mach-exynos/pm_domains.c
++++ b/arch/arm/mach-exynos/pm_domains.c
+@@ -200,15 +200,15 @@ no_clk:
+ 		args.args_count = 0;
+ 		child_domain = of_genpd_get_from_provider(&args);
+ 		if (IS_ERR(child_domain))
+-			goto next_pd;
++			continue;
+ 
+ 		if (of_parse_phandle_with_args(np, "power-domains",
+ 					 "#power-domain-cells", 0, &args) != 0)
+-			goto next_pd;
++			continue;
+ 
+ 		parent_domain = of_genpd_get_from_provider(&args);
+ 		if (IS_ERR(parent_domain))
+-			goto next_pd;
++			continue;
+ 
+ 		if (pm_genpd_add_subdomain(parent_domain, child_domain))
+ 			pr_warn("%s failed to add subdomain: %s\n",
+@@ -216,8 +216,6 @@ no_clk:
+ 		else
+ 			pr_info("%s has as child subdomain: %s.\n",
+ 				parent_domain->name, child_domain->name);
+-next_pd:
+-		of_node_put(np);
+ 	}
+ 
+ 	return 0;
+diff --git a/arch/arm/plat-orion/common.c b/arch/arm/plat-orion/common.c
+index 2235081a04ee..8861c367d061 100644
+--- a/arch/arm/plat-orion/common.c
++++ b/arch/arm/plat-orion/common.c
+@@ -495,7 +495,7 @@ void __init orion_ge00_switch_init(struct dsa_platform_data *d, int irq)
+ 
+ 	d->netdev = &orion_ge00.dev;
+ 	for (i = 0; i < d->nr_chips; i++)
+-		d->chip[i].host_dev = &orion_ge00_shared.dev;
++		d->chip[i].host_dev = &orion_ge_mvmdio.dev;
+ 	orion_switch_device.dev.platform_data = d;
+ 
+ 	platform_device_register(&orion_switch_device);
+diff --git a/arch/arm/vdso/vdsomunge.c b/arch/arm/vdso/vdsomunge.c
+index aedec81d1198..f6455273b2f8 100644
+--- a/arch/arm/vdso/vdsomunge.c
++++ b/arch/arm/vdso/vdsomunge.c
+@@ -45,7 +45,6 @@
+  * it does.
+  */
+ 
+-#include <byteswap.h>
+ #include <elf.h>
+ #include <errno.h>
+ #include <fcntl.h>
+@@ -59,6 +58,16 @@
+ #include <sys/types.h>
+ #include <unistd.h>
+ 
++#define swab16(x) \
++	((((x) & 0x00ff) << 8) | \
++	 (((x) & 0xff00) >> 8))
++
++#define swab32(x) \
++	((((x) & 0x000000ff) << 24) | \
++	 (((x) & 0x0000ff00) <<  8) | \
++	 (((x) & 0x00ff0000) >>  8) | \
++	 (((x) & 0xff000000) >> 24))
++
+ #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+ #define HOST_ORDER ELFDATA2LSB
+ #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+@@ -104,17 +113,17 @@ static void cleanup(void)
+ 
+ static Elf32_Word read_elf_word(Elf32_Word word, bool swap)
+ {
+-	return swap ? bswap_32(word) : word;
++	return swap ? swab32(word) : word;
+ }
+ 
+ static Elf32_Half read_elf_half(Elf32_Half half, bool swap)
+ {
+-	return swap ? bswap_16(half) : half;
++	return swap ? swab16(half) : half;
+ }
+ 
+ static void write_elf_word(Elf32_Word val, Elf32_Word *dst, bool swap)
+ {
+-	*dst = swap ? bswap_32(val) : val;
++	*dst = swap ? swab32(val) : val;
+ }
+ 
+ int main(int argc, char **argv)
+diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
+index 7922c2e710ca..7ac3920b1356 100644
+--- a/arch/arm64/kernel/armv8_deprecated.c
++++ b/arch/arm64/kernel/armv8_deprecated.c
+@@ -279,22 +279,24 @@ static void register_insn_emulation_sysctl(struct ctl_table *table)
+  */
+ #define __user_swpX_asm(data, addr, res, temp, B)		\
+ 	__asm__ __volatile__(					\
+-	"	mov		%w2, %w1\n"			\
+-	"0:	ldxr"B"		%w1, [%3]\n"			\
+-	"1:	stxr"B"		%w0, %w2, [%3]\n"		\
++	"0:	ldxr"B"		%w2, [%3]\n"			\
++	"1:	stxr"B"		%w0, %w1, [%3]\n"		\
+ 	"	cbz		%w0, 2f\n"			\
+ 	"	mov		%w0, %w4\n"			\
++	"	b		3f\n"				\
+ 	"2:\n"							\
++	"	mov		%w1, %w2\n"			\
++	"3:\n"							\
+ 	"	.pushsection	 .fixup,\"ax\"\n"		\
+ 	"	.align		2\n"				\
+-	"3:	mov		%w0, %w5\n"			\
+-	"	b		2b\n"				\
++	"4:	mov		%w0, %w5\n"			\
++	"	b		3b\n"				\
+ 	"	.popsection"					\
+ 	"	.pushsection	 __ex_table,\"a\"\n"		\
+ 	"	.align		3\n"				\
+-	"	.quad		0b, 3b\n"			\
+-	"	.quad		1b, 3b\n"			\
+-	"	.popsection"					\
++	"	.quad		0b, 4b\n"			\
++	"	.quad		1b, 4b\n"			\
++	"	.popsection\n"					\
+ 	: "=&r" (res), "+r" (data), "=&r" (temp)		\
+ 	: "r" (addr), "i" (-EAGAIN), "i" (-EFAULT)		\
+ 	: "memory")
+diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
+index 407991bf79f5..ccb6078ed9f2 100644
+--- a/arch/arm64/kernel/stacktrace.c
++++ b/arch/arm64/kernel/stacktrace.c
+@@ -48,11 +48,7 @@ int notrace unwind_frame(struct stackframe *frame)
+ 
+ 	frame->sp = fp + 0x10;
+ 	frame->fp = *(unsigned long *)(fp);
+-	/*
+-	 * -4 here because we care about the PC at time of bl,
+-	 * not where the return will go.
+-	 */
+-	frame->pc = *(unsigned long *)(fp + 8) - 4;
++	frame->pc = *(unsigned long *)(fp + 8);
+ 
+ 	return 0;
+ }
+diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
+index 8297d502217e..44ca4143b013 100644
+--- a/arch/arm64/kernel/suspend.c
++++ b/arch/arm64/kernel/suspend.c
+@@ -80,17 +80,21 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
+ 	if (ret == 0) {
+ 		/*
+ 		 * We are resuming from reset with TTBR0_EL1 set to the
+-		 * idmap to enable the MMU; restore the active_mm mappings in
+-		 * TTBR0_EL1 unless the active_mm == &init_mm, in which case
+-		 * the thread entered cpu_suspend with TTBR0_EL1 set to
+-		 * reserved TTBR0 page tables and should be restored as such.
++		 * idmap to enable the MMU; set the TTBR0 to the reserved
++		 * page tables to prevent speculative TLB allocations, flush
++		 * the local tlb and set the default tcr_el1.t0sz so that
++		 * the TTBR0 address space set-up is properly restored.
++		 * If the current active_mm != &init_mm we entered cpu_suspend
++		 * with mappings in TTBR0 that must be restored, so we switch
++		 * them back to complete the address space configuration
++		 * restoration before returning.
+ 		 */
+-		if (mm == &init_mm)
+-			cpu_set_reserved_ttbr0();
+-		else
+-			cpu_switch_mm(mm->pgd, mm);
+-
++		cpu_set_reserved_ttbr0();
+ 		flush_tlb_all();
++		cpu_set_default_tcr_t0sz();
++
++		if (mm != &init_mm)
++			cpu_switch_mm(mm->pgd, mm);
+ 
+ 		/*
+ 		 * Restore per-cpu offset before any kernel
+diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
+index caffb10e7aa3..5607693f35cf 100644
+--- a/arch/powerpc/kernel/rtas.c
++++ b/arch/powerpc/kernel/rtas.c
+@@ -1041,6 +1041,9 @@ asmlinkage int ppc_rtas(struct rtas_args __user *uargs)
+ 	if (!capable(CAP_SYS_ADMIN))
+ 		return -EPERM;
+ 
++	if (!rtas.entry)
++		return -EINVAL;
++
+ 	if (copy_from_user(&args, uargs, 3 * sizeof(u32)) != 0)
+ 		return -EFAULT;
+ 
+diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
+index 557232f758b6..5610b185d1e9 100644
+--- a/arch/um/kernel/trap.c
++++ b/arch/um/kernel/trap.c
+@@ -220,7 +220,7 @@ unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user,
+ 		show_regs(container_of(regs, struct pt_regs, regs));
+ 		panic("Segfault with no mm");
+ 	}
+-	else if (!is_user && address < TASK_SIZE) {
++	else if (!is_user && address > PAGE_SIZE && address < TASK_SIZE) {
+ 		show_regs(container_of(regs, struct pt_regs, regs));
+ 		panic("Kernel tried to access user memory at addr 0x%lx, ip 0x%lx",
+ 		       address, ip);
+diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
+index 7d69afd8b6fa..16edc0f169fa 100644
+--- a/arch/x86/boot/compressed/eboot.c
++++ b/arch/x86/boot/compressed/eboot.c
+@@ -667,6 +667,7 @@ setup_gop32(struct screen_info *si, efi_guid_t *proto,
+ 		bool conout_found = false;
+ 		void *dummy = NULL;
+ 		u32 h = handles[i];
++		u32 current_fb_base;
+ 
+ 		status = efi_call_early(handle_protocol, h,
+ 					proto, (void **)&gop32);
+@@ -678,7 +679,7 @@ setup_gop32(struct screen_info *si, efi_guid_t *proto,
+ 		if (status == EFI_SUCCESS)
+ 			conout_found = true;
+ 
+-		status = __gop_query32(gop32, &info, &size, &fb_base);
++		status = __gop_query32(gop32, &info, &size, &current_fb_base);
+ 		if (status == EFI_SUCCESS && (!first_gop || conout_found)) {
+ 			/*
+ 			 * Systems that use the UEFI Console Splitter may
+@@ -692,6 +693,7 @@ setup_gop32(struct screen_info *si, efi_guid_t *proto,
+ 			pixel_format = info->pixel_format;
+ 			pixel_info = info->pixel_information;
+ 			pixels_per_scan_line = info->pixels_per_scan_line;
++			fb_base = current_fb_base;
+ 
+ 			/*
+ 			 * Once we've found a GOP supporting ConOut,
+@@ -770,6 +772,7 @@ setup_gop64(struct screen_info *si, efi_guid_t *proto,
+ 		bool conout_found = false;
+ 		void *dummy = NULL;
+ 		u64 h = handles[i];
++		u32 current_fb_base;
+ 
+ 		status = efi_call_early(handle_protocol, h,
+ 					proto, (void **)&gop64);
+@@ -781,7 +784,7 @@ setup_gop64(struct screen_info *si, efi_guid_t *proto,
+ 		if (status == EFI_SUCCESS)
+ 			conout_found = true;
+ 
+-		status = __gop_query64(gop64, &info, &size, &fb_base);
++		status = __gop_query64(gop64, &info, &size, &current_fb_base);
+ 		if (status == EFI_SUCCESS && (!first_gop || conout_found)) {
+ 			/*
+ 			 * Systems that use the UEFI Console Splitter may
+@@ -795,6 +798,7 @@ setup_gop64(struct screen_info *si, efi_guid_t *proto,
+ 			pixel_format = info->pixel_format;
+ 			pixel_info = info->pixel_information;
+ 			pixels_per_scan_line = info->pixels_per_scan_line;
++			fb_base = current_fb_base;
+ 
+ 			/*
+ 			 * Once we've found a GOP supporting ConOut,
+diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
+index 5880b482d83c..11b46d91f4e5 100644
+--- a/arch/x86/kernel/apic/io_apic.c
++++ b/arch/x86/kernel/apic/io_apic.c
+@@ -2547,7 +2547,9 @@ void __init setup_ioapic_dest(void)
+ 			mask = apic->target_cpus();
+ 
+ 		chip = irq_data_get_irq_chip(idata);
+-		chip->irq_set_affinity(idata, mask, false);
++		/* Might be lapic_chip for irq 0 */
++		if (chip->irq_set_affinity)
++			chip->irq_set_affinity(idata, mask, false);
+ 	}
+ }
+ #endif
+diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
+index 777ad2f03160..3cebc65221a2 100644
+--- a/arch/x86/xen/enlighten.c
++++ b/arch/x86/xen/enlighten.c
+@@ -33,7 +33,7 @@
+ #include <linux/memblock.h>
+ #include <linux/edd.h>
+ 
+-#ifdef CONFIG_KEXEC_CORE
++#ifdef CONFIG_KEXEC
+ #include <linux/kexec.h>
+ #endif
+ 
+@@ -1804,7 +1804,7 @@ static struct notifier_block xen_hvm_cpu_notifier = {
+ 	.notifier_call	= xen_hvm_cpu_notify,
+ };
+ 
+-#ifdef CONFIG_KEXEC_CORE
++#ifdef CONFIG_KEXEC
+ static void xen_hvm_shutdown(void)
+ {
+ 	native_machine_shutdown();
+@@ -1838,7 +1838,7 @@ static void __init xen_hvm_guest_init(void)
+ 	x86_init.irqs.intr_init = xen_init_IRQ;
+ 	xen_hvm_init_time_ops();
+ 	xen_hvm_init_mmu_ops();
+-#ifdef CONFIG_KEXEC_CORE
++#ifdef CONFIG_KEXEC
+ 	machine_ops.shutdown = xen_hvm_shutdown;
+ 	machine_ops.crash_shutdown = xen_hvm_crash_shutdown;
+ #endif
+diff --git a/block/blk-core.c b/block/blk-core.c
+index 627ed0c593fb..1955ed3a1fa9 100644
+--- a/block/blk-core.c
++++ b/block/blk-core.c
+@@ -578,7 +578,7 @@ void blk_cleanup_queue(struct request_queue *q)
+ 		q->queue_lock = &q->__queue_lock;
+ 	spin_unlock_irq(lock);
+ 
+-	bdi_destroy(&q->backing_dev_info);
++	bdi_unregister(&q->backing_dev_info);
+ 
+ 	/* @q is and will stay empty, shutdown and put */
+ 	blk_put_queue(q);
+diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
+index 9115c6d59948..273519894951 100644
+--- a/block/blk-mq-tag.c
++++ b/block/blk-mq-tag.c
+@@ -628,6 +628,7 @@ void blk_mq_free_tags(struct blk_mq_tags *tags)
+ {
+ 	bt_free(&tags->bitmap_tags);
+ 	bt_free(&tags->breserved_tags);
++	free_cpumask_var(tags->cpumask);
+ 	kfree(tags);
+ }
+ 
+diff --git a/block/blk-mq.c b/block/blk-mq.c
+index c69902695136..4d6ff5259a61 100644
+--- a/block/blk-mq.c
++++ b/block/blk-mq.c
+@@ -2263,10 +2263,8 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
+ 	int i;
+ 
+ 	for (i = 0; i < set->nr_hw_queues; i++) {
+-		if (set->tags[i]) {
++		if (set->tags[i])
+ 			blk_mq_free_rq_map(set, set->tags[i], i);
+-			free_cpumask_var(set->tags[i]->cpumask);
+-		}
+ 	}
+ 
+ 	kfree(set->tags);
+diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
+index 6264b382d4d1..145ddb6c6d31 100644
+--- a/block/blk-sysfs.c
++++ b/block/blk-sysfs.c
+@@ -502,6 +502,7 @@ static void blk_release_queue(struct kobject *kobj)
+ 	struct request_queue *q =
+ 		container_of(kobj, struct request_queue, kobj);
+ 
++	bdi_exit(&q->backing_dev_info);
+ 	blkcg_exit_queue(q);
+ 
+ 	if (q->elevator) {
+diff --git a/crypto/ablkcipher.c b/crypto/ablkcipher.c
+index b788f169cc98..b4ffc5be1a93 100644
+--- a/crypto/ablkcipher.c
++++ b/crypto/ablkcipher.c
+@@ -706,7 +706,7 @@ struct crypto_ablkcipher *crypto_alloc_ablkcipher(const char *alg_name,
+ err:
+ 		if (err != -EAGAIN)
+ 			break;
+-		if (signal_pending(current)) {
++		if (fatal_signal_pending(current)) {
+ 			err = -EINTR;
+ 			break;
+ 		}
+diff --git a/crypto/algapi.c b/crypto/algapi.c
+index 3c079b7f23f6..b603b34ce8a8 100644
+--- a/crypto/algapi.c
++++ b/crypto/algapi.c
+@@ -335,7 +335,7 @@ static void crypto_wait_for_test(struct crypto_larval *larval)
+ 		crypto_alg_tested(larval->alg.cra_driver_name, 0);
+ 	}
+ 
+-	err = wait_for_completion_interruptible(&larval->completion);
++	err = wait_for_completion_killable(&larval->completion);
+ 	WARN_ON(err);
+ 
+ out:
+diff --git a/crypto/api.c b/crypto/api.c
+index afe4610afc4b..bbc147cb5dec 100644
+--- a/crypto/api.c
++++ b/crypto/api.c
+@@ -172,7 +172,7 @@ static struct crypto_alg *crypto_larval_wait(struct crypto_alg *alg)
+ 	struct crypto_larval *larval = (void *)alg;
+ 	long timeout;
+ 
+-	timeout = wait_for_completion_interruptible_timeout(
++	timeout = wait_for_completion_killable_timeout(
+ 		&larval->completion, 60 * HZ);
+ 
+ 	alg = larval->adult;
+@@ -445,7 +445,7 @@ struct crypto_tfm *crypto_alloc_base(const char *alg_name, u32 type, u32 mask)
+ err:
+ 		if (err != -EAGAIN)
+ 			break;
+-		if (signal_pending(current)) {
++		if (fatal_signal_pending(current)) {
+ 			err = -EINTR;
+ 			break;
+ 		}
+@@ -562,7 +562,7 @@ void *crypto_alloc_tfm(const char *alg_name,
+ err:
+ 		if (err != -EAGAIN)
+ 			break;
+-		if (signal_pending(current)) {
++		if (fatal_signal_pending(current)) {
+ 			err = -EINTR;
+ 			break;
+ 		}
+diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
+index 08ea2867fc8a..d59fb4eeed2b 100644
+--- a/crypto/crypto_user.c
++++ b/crypto/crypto_user.c
+@@ -376,7 +376,7 @@ static struct crypto_alg *crypto_user_skcipher_alg(const char *name, u32 type,
+ 		err = PTR_ERR(alg);
+ 		if (err != -EAGAIN)
+ 			break;
+-		if (signal_pending(current)) {
++		if (fatal_signal_pending(current)) {
+ 			err = -EINTR;
+ 			break;
+ 		}
+diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
+index 7920c2741b47..cf91c114ed9f 100644
+--- a/drivers/block/nvme-core.c
++++ b/drivers/block/nvme-core.c
+@@ -597,6 +597,7 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
+ 	struct nvme_iod *iod = ctx;
+ 	struct request *req = iod_get_private(iod);
+ 	struct nvme_cmd_info *cmd_rq = blk_mq_rq_to_pdu(req);
++	bool requeue = false;
+ 
+ 	u16 status = le16_to_cpup(&cqe->status) >> 1;
+ 
+@@ -605,12 +606,13 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
+ 		    && (jiffies - req->start_time) < req->timeout) {
+ 			unsigned long flags;
+ 
++			requeue = true;
+ 			blk_mq_requeue_request(req);
+ 			spin_lock_irqsave(req->q->queue_lock, flags);
+ 			if (!blk_queue_stopped(req->q))
+ 				blk_mq_kick_requeue_list(req->q);
+ 			spin_unlock_irqrestore(req->q->queue_lock, flags);
+-			return;
++			goto release_iod;
+ 		}
+ 		if (req->cmd_type == REQ_TYPE_DRV_PRIV) {
+ 			if (cmd_rq->ctx == CMD_CTX_CANCELLED)
+@@ -631,7 +633,7 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
+ 		dev_warn(nvmeq->dev->dev,
+ 			"completing aborted command with status:%04x\n",
+ 			status);
+-
++ release_iod:
+ 	if (iod->nents) {
+ 		dma_unmap_sg(nvmeq->dev->dev, iod->sg, iod->nents,
+ 			rq_data_dir(req) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
+@@ -644,7 +646,8 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
+ 	}
+ 	nvme_free_iod(nvmeq->dev, iod);
+ 
+-	blk_mq_complete_request(req);
++	if (likely(!requeue))
++		blk_mq_complete_request(req);
+ }
+ 
+ /* length is in bytes.  gfp flags indicates whether we may sleep. */
+@@ -1764,7 +1767,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
+ 
+ 	length = (io.nblocks + 1) << ns->lba_shift;
+ 	meta_len = (io.nblocks + 1) * ns->ms;
+-	metadata = (void __user *)(unsigned long)io.metadata;
++	metadata = (void __user *)(uintptr_t)io.metadata;
+ 	write = io.opcode & 1;
+ 
+ 	if (ns->ext) {
+@@ -1804,7 +1807,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
+ 	c.rw.metadata = cpu_to_le64(meta_dma);
+ 
+ 	status = __nvme_submit_sync_cmd(ns->queue, &c, NULL,
+-			(void __user *)io.addr, length, NULL, 0);
++			(void __user *)(uintptr_t)io.addr, length, NULL, 0);
+  unmap:
+ 	if (meta) {
+ 		if (status == NVME_SC_SUCCESS && !write) {
+@@ -1846,7 +1849,7 @@ static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
+ 		timeout = msecs_to_jiffies(cmd.timeout_ms);
+ 
+ 	status = __nvme_submit_sync_cmd(ns ? ns->queue : dev->admin_q, &c,
+-			NULL, (void __user *)cmd.addr, cmd.data_len,
++			NULL, (void __user *)(uintptr_t)cmd.addr, cmd.data_len,
+ 			&cmd.result, timeout);
+ 	if (status >= 0) {
+ 		if (put_user(cmd.result, &ucmd->result))
+diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
+index 324bf35ec4dd..017b7d58ae06 100644
+--- a/drivers/block/rbd.c
++++ b/drivers/block/rbd.c
+@@ -96,6 +96,8 @@ static int atomic_dec_return_safe(atomic_t *v)
+ #define RBD_MINORS_PER_MAJOR		256
+ #define RBD_SINGLE_MAJOR_PART_SHIFT	4
+ 
++#define RBD_MAX_PARENT_CHAIN_LEN	16
++
+ #define RBD_SNAP_DEV_NAME_PREFIX	"snap_"
+ #define RBD_MAX_SNAP_NAME_LEN	\
+ 			(NAME_MAX - (sizeof (RBD_SNAP_DEV_NAME_PREFIX) - 1))
+@@ -426,7 +428,7 @@ static ssize_t rbd_add_single_major(struct bus_type *bus, const char *buf,
+ 				    size_t count);
+ static ssize_t rbd_remove_single_major(struct bus_type *bus, const char *buf,
+ 				       size_t count);
+-static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping);
++static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth);
+ static void rbd_spec_put(struct rbd_spec *spec);
+ 
+ static int rbd_dev_id_to_minor(int dev_id)
+@@ -3819,6 +3821,9 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
+ 	q->limits.discard_zeroes_data = 1;
+ 
+ 	blk_queue_merge_bvec(q, rbd_merge_bvec);
++	if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
++		q->backing_dev_info.capabilities |= BDI_CAP_STABLE_WRITES;
++
+ 	disk->queue = q;
+ 
+ 	q->queuedata = rbd_dev;
+@@ -5169,44 +5174,51 @@ out_err:
+ 	return ret;
+ }
+ 
+-static int rbd_dev_probe_parent(struct rbd_device *rbd_dev)
++/*
++ * @depth is rbd_dev_image_probe() -> rbd_dev_probe_parent() ->
++ * rbd_dev_image_probe() recursion depth, which means it's also the
++ * length of the already discovered part of the parent chain.
++ */
++static int rbd_dev_probe_parent(struct rbd_device *rbd_dev, int depth)
+ {
+ 	struct rbd_device *parent = NULL;
+-	struct rbd_spec *parent_spec;
+-	struct rbd_client *rbdc;
+ 	int ret;
+ 
+ 	if (!rbd_dev->parent_spec)
+ 		return 0;
+-	/*
+-	 * We need to pass a reference to the client and the parent
+-	 * spec when creating the parent rbd_dev.  Images related by
+-	 * parent/child relationships always share both.
+-	 */
+-	parent_spec = rbd_spec_get(rbd_dev->parent_spec);
+-	rbdc = __rbd_get_client(rbd_dev->rbd_client);
+ 
+-	ret = -ENOMEM;
+-	parent = rbd_dev_create(rbdc, parent_spec, NULL);
+-	if (!parent)
++	if (++depth > RBD_MAX_PARENT_CHAIN_LEN) {
++		pr_info("parent chain is too long (%d)\n", depth);
++		ret = -EINVAL;
+ 		goto out_err;
++	}
+ 
+-	ret = rbd_dev_image_probe(parent, false);
++	parent = rbd_dev_create(rbd_dev->rbd_client, rbd_dev->parent_spec,
++				NULL);
++	if (!parent) {
++		ret = -ENOMEM;
++		goto out_err;
++	}
++
++	/*
++	 * Images related by parent/child relationships always share
++	 * rbd_client and spec/parent_spec, so bump their refcounts.
++	 */
++	__rbd_get_client(rbd_dev->rbd_client);
++	rbd_spec_get(rbd_dev->parent_spec);
++
++	ret = rbd_dev_image_probe(parent, depth);
+ 	if (ret < 0)
+ 		goto out_err;
++
+ 	rbd_dev->parent = parent;
+ 	atomic_set(&rbd_dev->parent_ref, 1);
+-
+ 	return 0;
++
+ out_err:
+-	if (parent) {
+-		rbd_dev_unparent(rbd_dev);
++	rbd_dev_unparent(rbd_dev);
++	if (parent)
+ 		rbd_dev_destroy(parent);
+-	} else {
+-		rbd_put_client(rbdc);
+-		rbd_spec_put(parent_spec);
+-	}
+-
+ 	return ret;
+ }
+ 
+@@ -5324,7 +5336,7 @@ static void rbd_dev_image_release(struct rbd_device *rbd_dev)
+  * parent), initiate a watch on its header object before using that
+  * object to get detailed information about the rbd image.
+  */
+-static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
++static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
+ {
+ 	int ret;
+ 
+@@ -5342,7 +5354,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
+ 	if (ret)
+ 		goto err_out_format;
+ 
+-	if (mapping) {
++	if (!depth) {
+ 		ret = rbd_dev_header_watch_sync(rbd_dev);
+ 		if (ret) {
+ 			if (ret == -ENOENT)
+@@ -5363,7 +5375,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
+ 	 * Otherwise this is a parent image, identified by pool, image
+ 	 * and snap ids - need to fill in names for those ids.
+ 	 */
+-	if (mapping)
++	if (!depth)
+ 		ret = rbd_spec_fill_snap_id(rbd_dev);
+ 	else
+ 		ret = rbd_spec_fill_names(rbd_dev);
+@@ -5385,12 +5397,12 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
+ 		 * Need to warn users if this image is the one being
+ 		 * mapped and has a parent.
+ 		 */
+-		if (mapping && rbd_dev->parent_spec)
++		if (!depth && rbd_dev->parent_spec)
+ 			rbd_warn(rbd_dev,
+ 				 "WARNING: kernel layering is EXPERIMENTAL!");
+ 	}
+ 
+-	ret = rbd_dev_probe_parent(rbd_dev);
++	ret = rbd_dev_probe_parent(rbd_dev, depth);
+ 	if (ret)
+ 		goto err_out_probe;
+ 
+@@ -5401,7 +5413,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
+ err_out_probe:
+ 	rbd_dev_unprobe(rbd_dev);
+ err_out_watch:
+-	if (mapping)
++	if (!depth)
+ 		rbd_dev_header_unwatch_sync(rbd_dev);
+ out_header_name:
+ 	kfree(rbd_dev->header_name);
+@@ -5464,7 +5476,7 @@ static ssize_t do_rbd_add(struct bus_type *bus,
+ 	spec = NULL;		/* rbd_dev now owns this */
+ 	rbd_opts = NULL;	/* rbd_dev now owns this */
+ 
+-	rc = rbd_dev_image_probe(rbd_dev, true);
++	rc = rbd_dev_image_probe(rbd_dev, 0);
+ 	if (rc < 0)
+ 		goto err_out_rbd_dev;
+ 
+diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
+index 7a8a73f1fc04..d68b08ae4be1 100644
+--- a/drivers/block/xen-blkfront.c
++++ b/drivers/block/xen-blkfront.c
+@@ -1984,7 +1984,8 @@ static void blkback_changed(struct xenbus_device *dev,
+ 			break;
+ 		/* Missed the backend's Closing state -- fallthrough */
+ 	case XenbusStateClosing:
+-		blkfront_closing(info);
++		if (info)
++			blkfront_closing(info);
+ 		break;
+ 	}
+ }
+diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c
+index 7d9879e166cf..395cb7f9f5a4 100644
+--- a/drivers/bus/arm-ccn.c
++++ b/drivers/bus/arm-ccn.c
+@@ -1188,7 +1188,8 @@ static int arm_ccn_pmu_cpu_notifier(struct notifier_block *nb,
+ 			break;
+ 		perf_pmu_migrate_context(&dt->pmu, cpu, target);
+ 		cpumask_set_cpu(target, &dt->cpu);
+-		WARN_ON(irq_set_affinity(ccn->irq, &dt->cpu) != 0);
++		if (ccn->irq)
++			WARN_ON(irq_set_affinity(ccn->irq, &dt->cpu) != 0);
+ 	default:
+ 		break;
+ 	}
+diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c
+index c0eaf0973bd2..779b6ff0c7ad 100644
+--- a/drivers/clk/clkdev.c
++++ b/drivers/clk/clkdev.c
+@@ -333,7 +333,8 @@ int clk_add_alias(const char *alias, const char *alias_dev_name,
+ 	if (IS_ERR(r))
+ 		return PTR_ERR(r);
+ 
+-	l = clkdev_create(r, alias, "%s", alias_dev_name);
++	l = clkdev_create(r, alias, alias_dev_name ? "%s" : NULL,
++			  alias_dev_name);
+ 	clk_put(r);
+ 
+ 	return l ? 0 : -ENODEV;
+diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
+index fcb929ec5304..aba2117a80c1 100644
+--- a/drivers/cpufreq/intel_pstate.c
++++ b/drivers/cpufreq/intel_pstate.c
+@@ -766,6 +766,11 @@ static inline void intel_pstate_sample(struct cpudata *cpu)
+ 	local_irq_save(flags);
+ 	rdmsrl(MSR_IA32_APERF, aperf);
+ 	rdmsrl(MSR_IA32_MPERF, mperf);
++	if (cpu->prev_mperf == mperf) {
++		local_irq_restore(flags);
++		return;
++	}
++
+ 	tsc = native_read_tsc();
+ 	local_irq_restore(flags);
+ 
+diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
+index ca7831168298..91cf71008e11 100644
+--- a/drivers/edac/sb_edac.c
++++ b/drivers/edac/sb_edac.c
+@@ -1648,6 +1648,7 @@ static int sbridge_mci_bind_devs(struct mem_ctl_info *mci,
+ {
+ 	struct sbridge_pvt *pvt = mci->pvt_info;
+ 	struct pci_dev *pdev;
++	u8 saw_chan_mask = 0;
+ 	int i;
+ 
+ 	for (i = 0; i < sbridge_dev->n_devs; i++) {
+@@ -1681,6 +1682,7 @@ static int sbridge_mci_bind_devs(struct mem_ctl_info *mci,
+ 		{
+ 			int id = pdev->device - PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD0;
+ 			pvt->pci_tad[id] = pdev;
++			saw_chan_mask |= 1 << id;
+ 		}
+ 			break;
+ 		case PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_DDRIO:
+@@ -1701,10 +1703,8 @@ static int sbridge_mci_bind_devs(struct mem_ctl_info *mci,
+ 	    !pvt-> pci_tad || !pvt->pci_ras  || !pvt->pci_ta)
+ 		goto enodev;
+ 
+-	for (i = 0; i < NUM_CHANNELS; i++) {
+-		if (!pvt->pci_tad[i])
+-			goto enodev;
+-	}
++	if (saw_chan_mask != 0x0f)
++		goto enodev;
+ 	return 0;
+ 
+ enodev:
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+index f7b49d5ce4b8..e3305a5aedfd 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+@@ -1583,6 +1583,7 @@ struct amdgpu_pm {
+ 	u8                      fan_max_rpm;
+ 	/* dpm */
+ 	bool                    dpm_enabled;
++	bool                    sysfs_initialized;
+ 	struct amdgpu_dpm       dpm;
+ 	const struct firmware	*fw;	/* SMC firmware */
+ 	uint32_t                fw_version;
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+index ed13baa7c976..91c7556a365a 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+@@ -693,6 +693,9 @@ int amdgpu_pm_sysfs_init(struct amdgpu_device *adev)
+ {
+ 	int ret;
+ 
++	if (adev->pm.sysfs_initialized)
++		return 0;
++
+ 	if (adev->pm.funcs->get_temperature == NULL)
+ 		return 0;
+ 	adev->pm.int_hwmon_dev = hwmon_device_register_with_groups(adev->dev,
+@@ -721,6 +724,8 @@ int amdgpu_pm_sysfs_init(struct amdgpu_device *adev)
+ 		return ret;
+ 	}
+ 
++	adev->pm.sysfs_initialized = true;
++
+ 	return 0;
+ }
+ 
+diff --git a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
+index 9745ed3a9aef..7e9154c7f1db 100644
+--- a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
++++ b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
+@@ -2997,6 +2997,9 @@ static int kv_dpm_late_init(void *handle)
+ 	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ 	int ret;
+ 
++	if (!amdgpu_dpm)
++		return 0;
++
+ 	/* init the sysfs and debugfs files late */
+ 	ret = amdgpu_pm_sysfs_init(adev);
+ 	if (ret)
+diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
+index fed748311b92..4e8d72d40af4 100644
+--- a/drivers/gpu/drm/drm_crtc.c
++++ b/drivers/gpu/drm/drm_crtc.c
+@@ -4221,7 +4221,7 @@ drm_property_create_blob(struct drm_device *dev, size_t length,
+ 	struct drm_property_blob *blob;
+ 	int ret;
+ 
+-	if (!length)
++	if (!length || length > ULONG_MAX - sizeof(struct drm_property_blob))
+ 		return ERR_PTR(-EINVAL);
+ 
+ 	blob = kzalloc(sizeof(struct drm_property_blob)+length, GFP_KERNEL);
+@@ -4573,7 +4573,7 @@ int drm_mode_createblob_ioctl(struct drm_device *dev,
+ 	 * not associated with any file_priv. */
+ 	mutex_lock(&dev->mode_config.blob_lock);
+ 	out_resp->blob_id = blob->base.id;
+-	list_add_tail(&file_priv->blobs, &blob->head_file);
++	list_add_tail(&blob->head_file, &file_priv->blobs);
+ 	mutex_unlock(&dev->mode_config.blob_lock);
+ 
+ 	return 0;
+diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
+index 27a2426c3daa..1f94219f3e0e 100644
+--- a/drivers/gpu/drm/drm_dp_mst_topology.c
++++ b/drivers/gpu/drm/drm_dp_mst_topology.c
+@@ -1193,17 +1193,18 @@ static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device(struct drm_dp_mst_
+ 
+ 		list_for_each_entry(port, &mstb->ports, next) {
+ 			if (port->port_num == port_num) {
+-				if (!port->mstb) {
++				mstb = port->mstb;
++				if (!mstb) {
+ 					DRM_ERROR("failed to lookup MSTB with lct %d, rad %02x\n", lct, rad[0]);
+-					return NULL;
++					goto out;
+ 				}
+ 
+-				mstb = port->mstb;
+ 				break;
+ 			}
+ 		}
+ 	}
+ 	kref_get(&mstb->kref);
++out:
+ 	mutex_unlock(&mgr->lock);
+ 	return mstb;
+ }
+diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
+index 8fd431bcdfd3..a96b9006a51e 100644
+--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
++++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
+@@ -804,7 +804,10 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
+  * Also note, that the object created here is not currently a "first class"
+  * object, in that several ioctls are banned. These are the CPU access
+  * ioctls: mmap(), pwrite and pread. In practice, you are expected to use
+- * direct access via your pointer rather than use those ioctls.
++ * direct access via your pointer rather than use those ioctls. Another
++ * restriction is that we do not allow userptr surfaces to be pinned to the
++ * hardware and so we reject any attempt to create a framebuffer out of a
++ * userptr.
+  *
+  * If you think this is a good interface to use to pass GPU memory between
+  * drivers, please use dma-buf instead. In fact, wherever possible use
+diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
+index 107c6c0519fd..10b1b657d32a 100644
+--- a/drivers/gpu/drm/i915/intel_display.c
++++ b/drivers/gpu/drm/i915/intel_display.c
+@@ -1729,6 +1729,8 @@ static void i9xx_enable_pll(struct intel_crtc *crtc)
+ 			   I915_READ(DPLL(!crtc->pipe)) | DPLL_DVO_2X_MODE);
+ 	}
+ 
++	I915_WRITE(reg, dpll);
++
+ 	/* Wait for the clocks to stabilize. */
+ 	POSTING_READ(reg);
+ 	udelay(150);
+@@ -14070,6 +14072,11 @@ static int intel_user_framebuffer_create_handle(struct drm_framebuffer *fb,
+ 	struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
+ 	struct drm_i915_gem_object *obj = intel_fb->obj;
+ 
++	if (obj->userptr.mm) {
++		DRM_DEBUG("attempting to use a userptr for a framebuffer, denied\n");
++		return -EINVAL;
++	}
++
+ 	return drm_gem_handle_create(file, &obj->base, handle);
+ }
+ 
+diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
+index 7f2161a1ff5d..504728b401b6 100644
+--- a/drivers/gpu/drm/i915/intel_lrc.c
++++ b/drivers/gpu/drm/i915/intel_lrc.c
+@@ -1250,6 +1250,7 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
+ 	if (flush_domains) {
+ 		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+ 		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
++		flags |= PIPE_CONTROL_FLUSH_ENABLE;
+ 	}
+ 
+ 	if (invalidate_domains) {
+diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
+index 3817a6f00d9e..ba672aa980e1 100644
+--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
++++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
+@@ -342,6 +342,7 @@ gen7_render_ring_flush(struct intel_engine_cs *ring,
+ 	if (flush_domains) {
+ 		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+ 		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
++		flags |= PIPE_CONTROL_FLUSH_ENABLE;
+ 	}
+ 	if (invalidate_domains) {
+ 		flags |= PIPE_CONTROL_TLB_INVALIDATE;
+@@ -412,6 +413,7 @@ gen8_render_ring_flush(struct intel_engine_cs *ring,
+ 	if (flush_domains) {
+ 		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+ 		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
++		flags |= PIPE_CONTROL_FLUSH_ENABLE;
+ 	}
+ 	if (invalidate_domains) {
+ 		flags |= PIPE_CONTROL_TLB_INVALIDATE;
+diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
+index af1ee517f372..0b2239423a37 100644
+--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
++++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
+@@ -227,11 +227,12 @@ nouveau_gem_info(struct drm_file *file_priv, struct drm_gem_object *gem,
+ 	struct nouveau_bo *nvbo = nouveau_gem_object(gem);
+ 	struct nvkm_vma *vma;
+ 
+-	if (nvbo->bo.mem.mem_type == TTM_PL_TT)
++	if (is_power_of_2(nvbo->valid_domains))
++		rep->domain = nvbo->valid_domains;
++	else if (nvbo->bo.mem.mem_type == TTM_PL_TT)
+ 		rep->domain = NOUVEAU_GEM_DOMAIN_GART;
+ 	else
+ 		rep->domain = NOUVEAU_GEM_DOMAIN_VRAM;
+-
+ 	rep->offset = nvbo->bo.offset;
+ 	if (cli->vm) {
+ 		vma = nouveau_bo_vma_find(nvbo, cli->vm);
+diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c b/drivers/gpu/drm/radeon/atombios_encoders.c
+index 65adb9c72377..bb292143997e 100644
+--- a/drivers/gpu/drm/radeon/atombios_encoders.c
++++ b/drivers/gpu/drm/radeon/atombios_encoders.c
+@@ -237,6 +237,7 @@ void radeon_atom_backlight_init(struct radeon_encoder *radeon_encoder,
+ 	backlight_update_status(bd);
+ 
+ 	DRM_INFO("radeon atom DIG backlight initialized\n");
++	rdev->mode_info.bl_encoder = radeon_encoder;
+ 
+ 	return;
+ 
+@@ -1624,9 +1625,14 @@ radeon_atom_encoder_dpms_avivo(struct drm_encoder *encoder, int mode)
+ 		} else
+ 			atom_execute_table(rdev->mode_info.atom_context, index, (uint32_t *)&args);
+ 		if (radeon_encoder->devices & (ATOM_DEVICE_LCD_SUPPORT)) {
+-			struct radeon_encoder_atom_dig *dig = radeon_encoder->enc_priv;
++			if (rdev->mode_info.bl_encoder) {
++				struct radeon_encoder_atom_dig *dig = radeon_encoder->enc_priv;
+ 
+-			atombios_set_backlight_level(radeon_encoder, dig->backlight_level);
++				atombios_set_backlight_level(radeon_encoder, dig->backlight_level);
++			} else {
++				args.ucAction = ATOM_LCD_BLON;
++				atom_execute_table(rdev->mode_info.atom_context, index, (uint32_t *)&args);
++			}
+ 		}
+ 		break;
+ 	case DRM_MODE_DPMS_STANDBY:
+@@ -1706,8 +1712,13 @@ radeon_atom_encoder_dpms_dig(struct drm_encoder *encoder, int mode)
+ 			if (ASIC_IS_DCE4(rdev))
+ 				atombios_dig_encoder_setup(encoder, ATOM_ENCODER_CMD_DP_VIDEO_ON, 0);
+ 		}
+-		if (radeon_encoder->devices & (ATOM_DEVICE_LCD_SUPPORT))
+-			atombios_set_backlight_level(radeon_encoder, dig->backlight_level);
++		if (radeon_encoder->devices & (ATOM_DEVICE_LCD_SUPPORT)) {
++			if (rdev->mode_info.bl_encoder)
++				atombios_set_backlight_level(radeon_encoder, dig->backlight_level);
++			else
++				atombios_dig_transmitter_setup(encoder,
++							       ATOM_TRANSMITTER_ACTION_LCD_BLON, 0, 0);
++		}
+ 		if (ext_encoder)
+ 			atombios_external_encoder_setup(encoder, ext_encoder, ATOM_ENABLE);
+ 		break;
+diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
+index f03b7eb15233..b6cbd816537e 100644
+--- a/drivers/gpu/drm/radeon/radeon.h
++++ b/drivers/gpu/drm/radeon/radeon.h
+@@ -1658,6 +1658,7 @@ struct radeon_pm {
+ 	u8                      fan_max_rpm;
+ 	/* dpm */
+ 	bool                    dpm_enabled;
++	bool                    sysfs_initialized;
+ 	struct radeon_dpm       dpm;
+ };
+ 
+diff --git a/drivers/gpu/drm/radeon/radeon_encoders.c b/drivers/gpu/drm/radeon/radeon_encoders.c
+index ef99917f000d..c6ee80216cf4 100644
+--- a/drivers/gpu/drm/radeon/radeon_encoders.c
++++ b/drivers/gpu/drm/radeon/radeon_encoders.c
+@@ -194,7 +194,6 @@ static void radeon_encoder_add_backlight(struct radeon_encoder *radeon_encoder,
+ 			radeon_atom_backlight_init(radeon_encoder, connector);
+ 		else
+ 			radeon_legacy_backlight_init(radeon_encoder, connector);
+-		rdev->mode_info.bl_encoder = radeon_encoder;
+ 	}
+ }
+ 
+diff --git a/drivers/gpu/drm/radeon/radeon_legacy_encoders.c b/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
+index 45715307db71..30de43366eae 100644
+--- a/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
++++ b/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
+@@ -441,6 +441,7 @@ void radeon_legacy_backlight_init(struct radeon_encoder *radeon_encoder,
+ 	backlight_update_status(bd);
+ 
+ 	DRM_INFO("radeon legacy LVDS backlight initialized\n");
++	rdev->mode_info.bl_encoder = radeon_encoder;
+ 
+ 	return;
+ 
+diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
+index 948c33105801..91764320c56f 100644
+--- a/drivers/gpu/drm/radeon/radeon_pm.c
++++ b/drivers/gpu/drm/radeon/radeon_pm.c
+@@ -720,10 +720,14 @@ static umode_t hwmon_attributes_visible(struct kobject *kobj,
+ 	struct radeon_device *rdev = dev_get_drvdata(dev);
+ 	umode_t effective_mode = attr->mode;
+ 
+-	/* Skip limit attributes if DPM is not enabled */
++	/* Skip attributes if DPM is not enabled */
+ 	if (rdev->pm.pm_method != PM_METHOD_DPM &&
+ 	    (attr == &sensor_dev_attr_temp1_crit.dev_attr.attr ||
+-	     attr == &sensor_dev_attr_temp1_crit_hyst.dev_attr.attr))
++	     attr == &sensor_dev_attr_temp1_crit_hyst.dev_attr.attr ||
++	     attr == &sensor_dev_attr_pwm1.dev_attr.attr ||
++	     attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr ||
++	     attr == &sensor_dev_attr_pwm1_max.dev_attr.attr ||
++	     attr == &sensor_dev_attr_pwm1_min.dev_attr.attr))
+ 		return 0;
+ 
+ 	/* Skip fan attributes if fan is not present */
+@@ -1529,19 +1533,23 @@ int radeon_pm_late_init(struct radeon_device *rdev)
+ 
+ 	if (rdev->pm.pm_method == PM_METHOD_DPM) {
+ 		if (rdev->pm.dpm_enabled) {
+-			ret = device_create_file(rdev->dev, &dev_attr_power_dpm_state);
+-			if (ret)
+-				DRM_ERROR("failed to create device file for dpm state\n");
+-			ret = device_create_file(rdev->dev, &dev_attr_power_dpm_force_performance_level);
+-			if (ret)
+-				DRM_ERROR("failed to create device file for dpm state\n");
+-			/* XXX: these are noops for dpm but are here for backwards compat */
+-			ret = device_create_file(rdev->dev, &dev_attr_power_profile);
+-			if (ret)
+-				DRM_ERROR("failed to create device file for power profile\n");
+-			ret = device_create_file(rdev->dev, &dev_attr_power_method);
+-			if (ret)
+-				DRM_ERROR("failed to create device file for power method\n");
++			if (!rdev->pm.sysfs_initialized) {
++				ret = device_create_file(rdev->dev, &dev_attr_power_dpm_state);
++				if (ret)
++					DRM_ERROR("failed to create device file for dpm state\n");
++				ret = device_create_file(rdev->dev, &dev_attr_power_dpm_force_performance_level);
++				if (ret)
++					DRM_ERROR("failed to create device file for dpm state\n");
++				/* XXX: these are noops for dpm but are here for backwards compat */
++				ret = device_create_file(rdev->dev, &dev_attr_power_profile);
++				if (ret)
++					DRM_ERROR("failed to create device file for power profile\n");
++				ret = device_create_file(rdev->dev, &dev_attr_power_method);
++				if (ret)
++					DRM_ERROR("failed to create device file for power method\n");
++				if (!ret)
++					rdev->pm.sysfs_initialized = true;
++			}
+ 
+ 			mutex_lock(&rdev->pm.mutex);
+ 			ret = radeon_dpm_late_enable(rdev);
+@@ -1557,7 +1565,8 @@ int radeon_pm_late_init(struct radeon_device *rdev)
+ 			}
+ 		}
+ 	} else {
+-		if (rdev->pm.num_power_states > 1) {
++		if ((rdev->pm.num_power_states > 1) &&
++		    (!rdev->pm.sysfs_initialized)) {
+ 			/* where's the best place to put these? */
+ 			ret = device_create_file(rdev->dev, &dev_attr_power_profile);
+ 			if (ret)
+@@ -1565,6 +1574,8 @@ int radeon_pm_late_init(struct radeon_device *rdev)
+ 			ret = device_create_file(rdev->dev, &dev_attr_power_method);
+ 			if (ret)
+ 				DRM_ERROR("failed to create device file for power method\n");
++			if (!ret)
++				rdev->pm.sysfs_initialized = true;
+ 		}
+ 	}
+ 	return ret;
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+index 620bb5cf617c..15a8d7746fd2 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+@@ -1458,6 +1458,9 @@ static void __exit vmwgfx_exit(void)
+ 	drm_pci_exit(&driver, &vmw_pci_driver);
+ }
+ 
++MODULE_INFO(vmw_patch, "ed7d78b2");
++MODULE_INFO(vmw_patch, "54c12bc3");
++
+ module_init(vmwgfx_init);
+ module_exit(vmwgfx_exit);
+ 
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+index d26a6daa9719..d8896ed41b9e 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+@@ -636,7 +636,8 @@ extern int vmw_user_dmabuf_alloc(struct vmw_private *dev_priv,
+ 				 uint32_t size,
+ 				 bool shareable,
+ 				 uint32_t *handle,
+-				 struct vmw_dma_buffer **p_dma_buf);
++				 struct vmw_dma_buffer **p_dma_buf,
++				 struct ttm_base_object **p_base);
+ extern int vmw_user_dmabuf_reference(struct ttm_object_file *tfile,
+ 				     struct vmw_dma_buffer *dma_buf,
+ 				     uint32_t *handle);
+@@ -650,7 +651,8 @@ extern uint32_t vmw_dmabuf_validate_node(struct ttm_buffer_object *bo,
+ 					 uint32_t cur_validate_node);
+ extern void vmw_dmabuf_validate_clear(struct ttm_buffer_object *bo);
+ extern int vmw_user_dmabuf_lookup(struct ttm_object_file *tfile,
+-				  uint32_t id, struct vmw_dma_buffer **out);
++				  uint32_t id, struct vmw_dma_buffer **out,
++				  struct ttm_base_object **base);
+ extern int vmw_stream_claim_ioctl(struct drm_device *dev, void *data,
+ 				  struct drm_file *file_priv);
+ extern int vmw_stream_unref_ioctl(struct drm_device *dev, void *data,
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+index 97ad3bcb99a7..aee1c6ccc52d 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+@@ -887,7 +887,8 @@ static int vmw_translate_mob_ptr(struct vmw_private *dev_priv,
+ 	struct vmw_relocation *reloc;
+ 	int ret;
+ 
+-	ret = vmw_user_dmabuf_lookup(sw_context->fp->tfile, handle, &vmw_bo);
++	ret = vmw_user_dmabuf_lookup(sw_context->fp->tfile, handle, &vmw_bo,
++				     NULL);
+ 	if (unlikely(ret != 0)) {
+ 		DRM_ERROR("Could not find or use MOB buffer.\n");
+ 		ret = -EINVAL;
+@@ -949,7 +950,8 @@ static int vmw_translate_guest_ptr(struct vmw_private *dev_priv,
+ 	struct vmw_relocation *reloc;
+ 	int ret;
+ 
+-	ret = vmw_user_dmabuf_lookup(sw_context->fp->tfile, handle, &vmw_bo);
++	ret = vmw_user_dmabuf_lookup(sw_context->fp->tfile, handle, &vmw_bo,
++				     NULL);
+ 	if (unlikely(ret != 0)) {
+ 		DRM_ERROR("Could not find or use GMR region.\n");
+ 		ret = -EINVAL;
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
+index 87e39f68e9d0..e1898982b44a 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
+@@ -484,7 +484,7 @@ int vmw_overlay_ioctl(struct drm_device *dev, void *data,
+ 		goto out_unlock;
+ 	}
+ 
+-	ret = vmw_user_dmabuf_lookup(tfile, arg->handle, &buf);
++	ret = vmw_user_dmabuf_lookup(tfile, arg->handle, &buf, NULL);
+ 	if (ret)
+ 		goto out_unlock;
+ 
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+index 210ef15b1d09..c5b4c47e86d6 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+@@ -356,7 +356,7 @@ int vmw_user_lookup_handle(struct vmw_private *dev_priv,
+ 	}
+ 
+ 	*out_surf = NULL;
+-	ret = vmw_user_dmabuf_lookup(tfile, handle, out_buf);
++	ret = vmw_user_dmabuf_lookup(tfile, handle, out_buf, NULL);
+ 	return ret;
+ }
+ 
+@@ -483,7 +483,8 @@ int vmw_user_dmabuf_alloc(struct vmw_private *dev_priv,
+ 			  uint32_t size,
+ 			  bool shareable,
+ 			  uint32_t *handle,
+-			  struct vmw_dma_buffer **p_dma_buf)
++			  struct vmw_dma_buffer **p_dma_buf,
++			  struct ttm_base_object **p_base)
+ {
+ 	struct vmw_user_dma_buffer *user_bo;
+ 	struct ttm_buffer_object *tmp;
+@@ -517,6 +518,10 @@ int vmw_user_dmabuf_alloc(struct vmw_private *dev_priv,
+ 	}
+ 
+ 	*p_dma_buf = &user_bo->dma;
++	if (p_base) {
++		*p_base = &user_bo->prime.base;
++		kref_get(&(*p_base)->refcount);
++	}
+ 	*handle = user_bo->prime.base.hash.key;
+ 
+ out_no_base_object:
+@@ -633,6 +638,7 @@ int vmw_user_dmabuf_synccpu_ioctl(struct drm_device *dev, void *data,
+ 	struct vmw_dma_buffer *dma_buf;
+ 	struct vmw_user_dma_buffer *user_bo;
+ 	struct ttm_object_file *tfile = vmw_fpriv(file_priv)->tfile;
++	struct ttm_base_object *buffer_base;
+ 	int ret;
+ 
+ 	if ((arg->flags & (drm_vmw_synccpu_read | drm_vmw_synccpu_write)) == 0
+@@ -645,7 +651,8 @@ int vmw_user_dmabuf_synccpu_ioctl(struct drm_device *dev, void *data,
+ 
+ 	switch (arg->op) {
+ 	case drm_vmw_synccpu_grab:
+-		ret = vmw_user_dmabuf_lookup(tfile, arg->handle, &dma_buf);
++		ret = vmw_user_dmabuf_lookup(tfile, arg->handle, &dma_buf,
++					     &buffer_base);
+ 		if (unlikely(ret != 0))
+ 			return ret;
+ 
+@@ -653,6 +660,7 @@ int vmw_user_dmabuf_synccpu_ioctl(struct drm_device *dev, void *data,
+ 				       dma);
+ 		ret = vmw_user_dmabuf_synccpu_grab(user_bo, tfile, arg->flags);
+ 		vmw_dmabuf_unreference(&dma_buf);
++		ttm_base_object_unref(&buffer_base);
+ 		if (unlikely(ret != 0 && ret != -ERESTARTSYS &&
+ 			     ret != -EBUSY)) {
+ 			DRM_ERROR("Failed synccpu grab on handle 0x%08x.\n",
+@@ -694,7 +702,8 @@ int vmw_dmabuf_alloc_ioctl(struct drm_device *dev, void *data,
+ 		return ret;
+ 
+ 	ret = vmw_user_dmabuf_alloc(dev_priv, vmw_fpriv(file_priv)->tfile,
+-				    req->size, false, &handle, &dma_buf);
++				    req->size, false, &handle, &dma_buf,
++				    NULL);
+ 	if (unlikely(ret != 0))
+ 		goto out_no_dmabuf;
+ 
+@@ -723,7 +732,8 @@ int vmw_dmabuf_unref_ioctl(struct drm_device *dev, void *data,
+ }
+ 
+ int vmw_user_dmabuf_lookup(struct ttm_object_file *tfile,
+-			   uint32_t handle, struct vmw_dma_buffer **out)
++			   uint32_t handle, struct vmw_dma_buffer **out,
++			   struct ttm_base_object **p_base)
+ {
+ 	struct vmw_user_dma_buffer *vmw_user_bo;
+ 	struct ttm_base_object *base;
+@@ -745,7 +755,10 @@ int vmw_user_dmabuf_lookup(struct ttm_object_file *tfile,
+ 	vmw_user_bo = container_of(base, struct vmw_user_dma_buffer,
+ 				   prime.base);
+ 	(void)ttm_bo_reference(&vmw_user_bo->dma.base);
+-	ttm_base_object_unref(&base);
++	if (p_base)
++		*p_base = base;
++	else
++		ttm_base_object_unref(&base);
+ 	*out = &vmw_user_bo->dma;
+ 
+ 	return 0;
+@@ -1006,7 +1019,7 @@ int vmw_dumb_create(struct drm_file *file_priv,
+ 
+ 	ret = vmw_user_dmabuf_alloc(dev_priv, vmw_fpriv(file_priv)->tfile,
+ 				    args->size, false, &args->handle,
+-				    &dma_buf);
++				    &dma_buf, NULL);
+ 	if (unlikely(ret != 0))
+ 		goto out_no_dmabuf;
+ 
+@@ -1034,7 +1047,7 @@ int vmw_dumb_map_offset(struct drm_file *file_priv,
+ 	struct vmw_dma_buffer *out_buf;
+ 	int ret;
+ 
+-	ret = vmw_user_dmabuf_lookup(tfile, handle, &out_buf);
++	ret = vmw_user_dmabuf_lookup(tfile, handle, &out_buf, NULL);
+ 	if (ret != 0)
+ 		return -EINVAL;
+ 
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
+index 6a4584a43aa6..d2751ada19b1 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
+@@ -470,7 +470,7 @@ int vmw_shader_define_ioctl(struct drm_device *dev, void *data,
+ 
+ 	if (arg->buffer_handle != SVGA3D_INVALID_ID) {
+ 		ret = vmw_user_dmabuf_lookup(tfile, arg->buffer_handle,
+-					     &buffer);
++					     &buffer, NULL);
+ 		if (unlikely(ret != 0)) {
+ 			DRM_ERROR("Could not find buffer for shader "
+ 				  "creation.\n");
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+index 4ecdbf3e59da..17a4107639b2 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+@@ -43,6 +43,7 @@ struct vmw_user_surface {
+ 	struct vmw_surface srf;
+ 	uint32_t size;
+ 	struct drm_master *master;
++	struct ttm_base_object *backup_base;
+ };
+ 
+ /**
+@@ -652,6 +653,8 @@ static void vmw_user_surface_base_release(struct ttm_base_object **p_base)
+ 	struct vmw_resource *res = &user_srf->srf.res;
+ 
+ 	*p_base = NULL;
++	if (user_srf->backup_base)
++		ttm_base_object_unref(&user_srf->backup_base);
+ 	vmw_resource_unreference(&res);
+ }
+ 
+@@ -846,7 +849,8 @@ int vmw_surface_define_ioctl(struct drm_device *dev, void *data,
+ 					    res->backup_size,
+ 					    true,
+ 					    &backup_handle,
+-					    &res->backup);
++					    &res->backup,
++					    &user_srf->backup_base);
+ 		if (unlikely(ret != 0)) {
+ 			vmw_resource_unreference(&res);
+ 			goto out_unlock;
+@@ -1309,7 +1313,8 @@ int vmw_gb_surface_define_ioctl(struct drm_device *dev, void *data,
+ 
+ 	if (req->buffer_handle != SVGA3D_INVALID_ID) {
+ 		ret = vmw_user_dmabuf_lookup(tfile, req->buffer_handle,
+-					     &res->backup);
++					     &res->backup,
++					     &user_srf->backup_base);
+ 	} else if (req->drm_surface_flags &
+ 		   drm_vmw_surface_flag_create_buffer)
+ 		ret = vmw_user_dmabuf_alloc(dev_priv, tfile,
+@@ -1317,7 +1322,8 @@ int vmw_gb_surface_define_ioctl(struct drm_device *dev, void *data,
+ 					    req->drm_surface_flags &
+ 					    drm_vmw_surface_flag_shareable,
+ 					    &backup_handle,
+-					    &res->backup);
++					    &res->backup,
++					    &user_srf->backup_base);
+ 
+ 	if (unlikely(ret != 0)) {
+ 		vmw_resource_unreference(&res);
+diff --git a/drivers/i2c/busses/i2c-mv64xxx.c b/drivers/i2c/busses/i2c-mv64xxx.c
+index 30059c1df2a3..5801227b97ab 100644
+--- a/drivers/i2c/busses/i2c-mv64xxx.c
++++ b/drivers/i2c/busses/i2c-mv64xxx.c
+@@ -669,8 +669,6 @@ mv64xxx_i2c_can_offload(struct mv64xxx_i2c_data *drv_data)
+ 	struct i2c_msg *msgs = drv_data->msgs;
+ 	int num = drv_data->num_msgs;
+ 
+-	return false;
+-
+ 	if (!drv_data->offload_enabled)
+ 		return false;
+ 
+diff --git a/drivers/iio/accel/st_accel_core.c b/drivers/iio/accel/st_accel_core.c
+index 4002e6410444..c472477f9a7d 100644
+--- a/drivers/iio/accel/st_accel_core.c
++++ b/drivers/iio/accel/st_accel_core.c
+@@ -149,8 +149,6 @@
+ #define ST_ACCEL_4_BDU_MASK			0x40
+ #define ST_ACCEL_4_DRDY_IRQ_ADDR		0x21
+ #define ST_ACCEL_4_DRDY_IRQ_INT1_MASK		0x04
+-#define ST_ACCEL_4_IG1_EN_ADDR			0x21
+-#define ST_ACCEL_4_IG1_EN_MASK			0x08
+ #define ST_ACCEL_4_MULTIREAD_BIT		true
+ 
+ /* CUSTOM VALUES FOR SENSOR 5 */
+@@ -484,10 +482,6 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = {
+ 		.drdy_irq = {
+ 			.addr = ST_ACCEL_4_DRDY_IRQ_ADDR,
+ 			.mask_int1 = ST_ACCEL_4_DRDY_IRQ_INT1_MASK,
+-			.ig1 = {
+-				.en_addr = ST_ACCEL_4_IG1_EN_ADDR,
+-				.en_mask = ST_ACCEL_4_IG1_EN_MASK,
+-			},
+ 		},
+ 		.multi_read_bit = ST_ACCEL_4_MULTIREAD_BIT,
+ 		.bootime = 2, /* guess */
+diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
+index 3a972ebf3c0d..8be73524aabd 100644
+--- a/drivers/infiniband/core/cm.c
++++ b/drivers/infiniband/core/cm.c
+@@ -873,6 +873,11 @@ retest:
+ 	case IB_CM_SIDR_REQ_RCVD:
+ 		spin_unlock_irq(&cm_id_priv->lock);
+ 		cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT);
++		spin_lock_irq(&cm.lock);
++		if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node))
++			rb_erase(&cm_id_priv->sidr_id_node,
++				 &cm.remote_sidr_table);
++		spin_unlock_irq(&cm.lock);
+ 		break;
+ 	case IB_CM_REQ_SENT:
+ 	case IB_CM_MRA_REQ_RCVD:
+@@ -3112,7 +3117,10 @@ int ib_send_cm_sidr_rep(struct ib_cm_id *cm_id,
+ 	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+ 
+ 	spin_lock_irqsave(&cm.lock, flags);
+-	rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
++	if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node)) {
++		rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
++		RB_CLEAR_NODE(&cm_id_priv->sidr_id_node);
++	}
+ 	spin_unlock_irqrestore(&cm.lock, flags);
+ 	return 0;
+ 
+diff --git a/drivers/input/mouse/alps.c b/drivers/input/mouse/alps.c
+index 4d246861d692..41e6cb501e6a 100644
+--- a/drivers/input/mouse/alps.c
++++ b/drivers/input/mouse/alps.c
+@@ -100,7 +100,7 @@ static const struct alps_nibble_commands alps_v6_nibble_commands[] = {
+ #define ALPS_FOUR_BUTTONS	0x40	/* 4 direction button present */
+ #define ALPS_PS2_INTERLEAVED	0x80	/* 3-byte PS/2 packet interleaved with
+ 					   6-byte ALPS packet */
+-#define ALPS_DELL		0x100	/* device is a Dell laptop */
++#define ALPS_STICK_BITS		0x100	/* separate stick button bits */
+ #define ALPS_BUTTONPAD		0x200	/* device is a clickpad */
+ 
+ static const struct alps_model_info alps_model_data[] = {
+@@ -159,6 +159,43 @@ static const struct alps_protocol_info alps_v8_protocol_data = {
+ 	ALPS_PROTO_V8, 0x18, 0x18, 0
+ };
+ 
++/*
++ * Some v2 models report the stick buttons in separate bits
++ */
++static const struct dmi_system_id alps_dmi_has_separate_stick_buttons[] = {
++#if defined(CONFIG_DMI) && defined(CONFIG_X86)
++	{
++		/* Extrapolated from other entries */
++		.matches = {
++			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
++			DMI_MATCH(DMI_PRODUCT_NAME, "Latitude D420"),
++		},
++	},
++	{
++		/* Reported-by: Hans de Bruin <jmdebruin@xmsnet.nl> */
++		.matches = {
++			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
++			DMI_MATCH(DMI_PRODUCT_NAME, "Latitude D430"),
++		},
++	},
++	{
++		/* Reported-by: Hans de Goede <hdegoede@redhat.com> */
++		.matches = {
++			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
++			DMI_MATCH(DMI_PRODUCT_NAME, "Latitude D620"),
++		},
++	},
++	{
++		/* Extrapolated from other entries */
++		.matches = {
++			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
++			DMI_MATCH(DMI_PRODUCT_NAME, "Latitude D630"),
++		},
++	},
++#endif
++	{ }
++};
++
+ static void alps_set_abs_params_st(struct alps_data *priv,
+ 				   struct input_dev *dev1);
+ static void alps_set_abs_params_semi_mt(struct alps_data *priv,
+@@ -253,9 +290,8 @@ static void alps_process_packet_v1_v2(struct psmouse *psmouse)
+ 		return;
+ 	}
+ 
+-	/* Dell non interleaved V2 dualpoint has separate stick button bits */
+-	if (priv->proto_version == ALPS_PROTO_V2 &&
+-	    priv->flags == (ALPS_DELL | ALPS_PASS | ALPS_DUALPOINT)) {
++	/* Some models have separate stick button bits */
++	if (priv->flags & ALPS_STICK_BITS) {
+ 		left |= packet[0] & 1;
+ 		right |= packet[0] & 2;
+ 		middle |= packet[0] & 4;
+@@ -2552,8 +2588,6 @@ static int alps_set_protocol(struct psmouse *psmouse,
+ 	priv->byte0 = protocol->byte0;
+ 	priv->mask0 = protocol->mask0;
+ 	priv->flags = protocol->flags;
+-	if (dmi_name_in_vendors("Dell"))
+-		priv->flags |= ALPS_DELL;
+ 
+ 	priv->x_max = 2000;
+ 	priv->y_max = 1400;
+@@ -2568,6 +2602,8 @@ static int alps_set_protocol(struct psmouse *psmouse,
+ 		priv->set_abs_params = alps_set_abs_params_st;
+ 		priv->x_max = 1023;
+ 		priv->y_max = 767;
++		if (dmi_check_system(alps_dmi_has_separate_stick_buttons))
++			priv->flags |= ALPS_STICK_BITS;
+ 		break;
+ 
+ 	case ALPS_PROTO_V3:
+diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
+index 658ee39e6569..1b10e5fd6ef6 100644
+--- a/drivers/iommu/amd_iommu.c
++++ b/drivers/iommu/amd_iommu.c
+@@ -1974,8 +1974,8 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
+ static void clear_dte_entry(u16 devid)
+ {
+ 	/* remove entry from the device table seen by the hardware */
+-	amd_iommu_dev_table[devid].data[0] = IOMMU_PTE_P | IOMMU_PTE_TV;
+-	amd_iommu_dev_table[devid].data[1] = 0;
++	amd_iommu_dev_table[devid].data[0]  = IOMMU_PTE_P | IOMMU_PTE_TV;
++	amd_iommu_dev_table[devid].data[1] &= DTE_FLAG_MASK;
+ 
+ 	amd_iommu_apply_erratum_63(devid);
+ }
+diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
+index f65908841be0..c9b64722f623 100644
+--- a/drivers/iommu/amd_iommu_types.h
++++ b/drivers/iommu/amd_iommu_types.h
+@@ -295,6 +295,7 @@
+ #define IOMMU_PTE_IR (1ULL << 61)
+ #define IOMMU_PTE_IW (1ULL << 62)
+ 
++#define DTE_FLAG_MASK	(0x3ffULL << 32)
+ #define DTE_FLAG_IOTLB	(0x01UL << 32)
+ #define DTE_FLAG_GV	(0x01ULL << 55)
+ #define DTE_GLX_SHIFT	(56)
+diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
+index f7b875bb70d4..c3b8a5b9f035 100644
+--- a/drivers/iommu/amd_iommu_v2.c
++++ b/drivers/iommu/amd_iommu_v2.c
+@@ -516,6 +516,13 @@ static void do_fault(struct work_struct *work)
+ 		goto out;
+ 	}
+ 
++	if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))) {
++		/* handle_mm_fault would BUG_ON() */
++		up_read(&mm->mmap_sem);
++		handle_fault_error(fault);
++		goto out;
++	}
++
+ 	ret = handle_mm_fault(mm, vma, address, write);
+ 	if (ret & VM_FAULT_ERROR) {
+ 		/* failed to service fault */
+diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
+index 7553cb90627f..bd1b8ad8af44 100644
+--- a/drivers/iommu/intel-iommu.c
++++ b/drivers/iommu/intel-iommu.c
+@@ -2109,15 +2109,19 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
+ 				return -ENOMEM;
+ 			/* It is large page*/
+ 			if (largepage_lvl > 1) {
++				unsigned long nr_superpages, end_pfn;
++
+ 				pteval |= DMA_PTE_LARGE_PAGE;
+ 				lvl_pages = lvl_to_nr_pages(largepage_lvl);
++
++				nr_superpages = sg_res / lvl_pages;
++				end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
++
+ 				/*
+ 				 * Ensure that old small page tables are
+-				 * removed to make room for superpage,
+-				 * if they exist.
++				 * removed to make room for superpage(s).
+ 				 */
+-				dma_pte_free_pagetable(domain, iov_pfn,
+-						       iov_pfn + lvl_pages - 1);
++				dma_pte_free_pagetable(domain, iov_pfn, end_pfn);
+ 			} else {
+ 				pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
+ 			}
+diff --git a/drivers/irqchip/irq-tegra.c b/drivers/irqchip/irq-tegra.c
+index f67bbd80433e..ab5353a96a82 100644
+--- a/drivers/irqchip/irq-tegra.c
++++ b/drivers/irqchip/irq-tegra.c
+@@ -215,6 +215,7 @@ static struct irq_chip tegra_ictlr_chip = {
+ 	.irq_unmask		= tegra_unmask,
+ 	.irq_retrigger		= tegra_retrigger,
+ 	.irq_set_wake		= tegra_set_wake,
++	.irq_set_type		= irq_chip_set_type_parent,
+ 	.flags			= IRQCHIP_MASK_ON_SUSPEND,
+ #ifdef CONFIG_SMP
+ 	.irq_set_affinity	= irq_chip_set_affinity_parent,
+diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
+index 20cc36b01b77..0a17d1b91a81 100644
+--- a/drivers/md/dm-cache-metadata.c
++++ b/drivers/md/dm-cache-metadata.c
+@@ -634,10 +634,10 @@ static int __commit_transaction(struct dm_cache_metadata *cmd,
+ 
+ 	disk_super = dm_block_data(sblock);
+ 
++	disk_super->flags = cpu_to_le32(cmd->flags);
+ 	if (mutator)
+ 		update_flags(disk_super, mutator);
+ 
+-	disk_super->flags = cpu_to_le32(cmd->flags);
+ 	disk_super->mapping_root = cpu_to_le64(cmd->root);
+ 	disk_super->hint_root = cpu_to_le64(cmd->hint_root);
+ 	disk_super->discard_root = cpu_to_le64(cmd->discard_root);
+diff --git a/drivers/md/md.c b/drivers/md/md.c
+index e25f00f0138a..95e7b72a164a 100644
+--- a/drivers/md/md.c
++++ b/drivers/md/md.c
+@@ -8030,8 +8030,7 @@ static int remove_and_add_spares(struct mddev *mddev,
+ 		       !test_bit(Bitmap_sync, &rdev->flags)))
+ 			continue;
+ 
+-		if (rdev->saved_raid_disk < 0)
+-			rdev->recovery_offset = 0;
++		rdev->recovery_offset = 0;
+ 		if (mddev->pers->
+ 		    hot_add_disk(mddev, rdev) == 0) {
+ 			if (sysfs_link_rdev(mddev, rdev))
+diff --git a/drivers/md/persistent-data/dm-btree-remove.c b/drivers/md/persistent-data/dm-btree-remove.c
+index 4222f774cf36..1dac15d1697c 100644
+--- a/drivers/md/persistent-data/dm-btree-remove.c
++++ b/drivers/md/persistent-data/dm-btree-remove.c
+@@ -301,11 +301,16 @@ static void redistribute3(struct dm_btree_info *info, struct btree_node *parent,
+ {
+ 	int s;
+ 	uint32_t max_entries = le32_to_cpu(left->header.max_entries);
+-	unsigned target = (nr_left + nr_center + nr_right) / 3;
+-	BUG_ON(target > max_entries);
++	unsigned total = nr_left + nr_center + nr_right;
++	unsigned target_right = total / 3;
++	unsigned remainder = (target_right * 3) != total;
++	unsigned target_left = target_right + remainder;
++
++	BUG_ON(target_left > max_entries);
++	BUG_ON(target_right > max_entries);
+ 
+ 	if (nr_left < nr_right) {
+-		s = nr_left - target;
++		s = nr_left - target_left;
+ 
+ 		if (s < 0 && nr_center < -s) {
+ 			/* not enough in central node */
+@@ -316,10 +321,10 @@ static void redistribute3(struct dm_btree_info *info, struct btree_node *parent,
+ 		} else
+ 			shift(left, center, s);
+ 
+-		shift(center, right, target - nr_right);
++		shift(center, right, target_right - nr_right);
+ 
+ 	} else {
+-		s = target - nr_right;
++		s = target_right - nr_right;
+ 		if (s > 0 && nr_center < s) {
+ 			/* not enough in central node */
+ 			shift(center, right, nr_center);
+@@ -329,7 +334,7 @@ static void redistribute3(struct dm_btree_info *info, struct btree_node *parent,
+ 		} else
+ 			shift(center, right, s);
+ 
+-		shift(left, center, nr_left - target);
++		shift(left, center, nr_left - target_left);
+ 	}
+ 
+ 	*key_ptr(parent, c->index) = center->keys[0];
+diff --git a/drivers/md/persistent-data/dm-btree.c b/drivers/md/persistent-data/dm-btree.c
+index c7726cebc495..d6e47033b5e0 100644
+--- a/drivers/md/persistent-data/dm-btree.c
++++ b/drivers/md/persistent-data/dm-btree.c
+@@ -523,7 +523,7 @@ static int btree_split_beneath(struct shadow_spine *s, uint64_t key)
+ 
+ 	r = new_block(s->info, &right);
+ 	if (r < 0) {
+-		/* FIXME: put left */
++		unlock_block(s->info, left);
+ 		return r;
+ 	}
+ 
+diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
+index 967a4ed73929..d10d3008227e 100644
+--- a/drivers/md/raid1.c
++++ b/drivers/md/raid1.c
+@@ -2249,7 +2249,7 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
+ 		bio_trim(wbio, sector - r1_bio->sector, sectors);
+ 		wbio->bi_iter.bi_sector += rdev->data_offset;
+ 		wbio->bi_bdev = rdev->bdev;
+-		if (submit_bio_wait(WRITE, wbio) == 0)
++		if (submit_bio_wait(WRITE, wbio) < 0)
+ 			/* failure! */
+ 			ok = rdev_set_badblocks(rdev, sector,
+ 						sectors, 0)
+diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
+index 38c58e19cfce..d4b70d90de9c 100644
+--- a/drivers/md/raid10.c
++++ b/drivers/md/raid10.c
+@@ -2580,7 +2580,7 @@ static int narrow_write_error(struct r10bio *r10_bio, int i)
+ 				   choose_data_offset(r10_bio, rdev) +
+ 				   (sector - r10_bio->sector));
+ 		wbio->bi_bdev = rdev->bdev;
+-		if (submit_bio_wait(WRITE, wbio) == 0)
++		if (submit_bio_wait(WRITE, wbio) < 0)
+ 			/* Failure! */
+ 			ok = rdev_set_badblocks(rdev, sector,
+ 						sectors, 0)
+diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
+index f757023fc458..0d4f7b1b7f73 100644
+--- a/drivers/md/raid5.c
++++ b/drivers/md/raid5.c
+@@ -3505,6 +3505,7 @@ returnbi:
+ 		}
+ 	if (!discard_pending &&
+ 	    test_bit(R5_Discard, &sh->dev[sh->pd_idx].flags)) {
++		int hash;
+ 		clear_bit(R5_Discard, &sh->dev[sh->pd_idx].flags);
+ 		clear_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags);
+ 		if (sh->qd_idx >= 0) {
+@@ -3518,16 +3519,17 @@ returnbi:
+ 		 * no updated data, so remove it from hash list and the stripe
+ 		 * will be reinitialized
+ 		 */
+-		spin_lock_irq(&conf->device_lock);
+ unhash:
++		hash = sh->hash_lock_index;
++		spin_lock_irq(conf->hash_locks + hash);
+ 		remove_hash(sh);
++		spin_unlock_irq(conf->hash_locks + hash);
+ 		if (head_sh->batch_head) {
+ 			sh = list_first_entry(&sh->batch_list,
+ 					      struct stripe_head, batch_list);
+ 			if (sh != head_sh)
+ 					goto unhash;
+ 		}
+-		spin_unlock_irq(&conf->device_lock);
+ 		sh = head_sh;
+ 
+ 		if (test_bit(STRIPE_SYNC_REQUESTED, &sh->state))
+diff --git a/drivers/media/dvb-frontends/m88ds3103.c b/drivers/media/dvb-frontends/m88ds3103.c
+index e9b2d2b69b1d..377fb6991ab3 100644
+--- a/drivers/media/dvb-frontends/m88ds3103.c
++++ b/drivers/media/dvb-frontends/m88ds3103.c
+@@ -18,6 +18,27 @@
+ 
+ static struct dvb_frontend_ops m88ds3103_ops;
+ 
++/* write single register with mask */
++static int m88ds3103_update_bits(struct m88ds3103_dev *dev,
++				u8 reg, u8 mask, u8 val)
++{
++	int ret;
++	u8 tmp;
++
++	/* no need for read if whole reg is written */
++	if (mask != 0xff) {
++		ret = regmap_bulk_read(dev->regmap, reg, &tmp, 1);
++		if (ret)
++			return ret;
++
++		val &= mask;
++		tmp &= ~mask;
++		val |= tmp;
++	}
++
++	return regmap_bulk_write(dev->regmap, reg, &val, 1);
++}
++
+ /* write reg val table using reg addr auto increment */
+ static int m88ds3103_wr_reg_val_tab(struct m88ds3103_dev *dev,
+ 		const struct m88ds3103_reg_val *tab, int tab_len)
+@@ -394,10 +415,10 @@ static int m88ds3103_set_frontend(struct dvb_frontend *fe)
+ 			u8tmp2 = 0x00; /* 0b00 */
+ 			break;
+ 		}
+-		ret = regmap_update_bits(dev->regmap, 0x22, 0xc0, u8tmp1 << 6);
++		ret = m88ds3103_update_bits(dev, 0x22, 0xc0, u8tmp1 << 6);
+ 		if (ret)
+ 			goto err;
+-		ret = regmap_update_bits(dev->regmap, 0x24, 0xc0, u8tmp2 << 6);
++		ret = m88ds3103_update_bits(dev, 0x24, 0xc0, u8tmp2 << 6);
+ 		if (ret)
+ 			goto err;
+ 	}
+@@ -455,13 +476,13 @@ static int m88ds3103_set_frontend(struct dvb_frontend *fe)
+ 			if (ret)
+ 				goto err;
+ 		}
+-		ret = regmap_update_bits(dev->regmap, 0x9d, 0x08, 0x08);
++		ret = m88ds3103_update_bits(dev, 0x9d, 0x08, 0x08);
+ 		if (ret)
+ 			goto err;
+ 		ret = regmap_write(dev->regmap, 0xf1, 0x01);
+ 		if (ret)
+ 			goto err;
+-		ret = regmap_update_bits(dev->regmap, 0x30, 0x80, 0x80);
++		ret = m88ds3103_update_bits(dev, 0x30, 0x80, 0x80);
+ 		if (ret)
+ 			goto err;
+ 	}
+@@ -498,7 +519,7 @@ static int m88ds3103_set_frontend(struct dvb_frontend *fe)
+ 	switch (dev->cfg->ts_mode) {
+ 	case M88DS3103_TS_SERIAL:
+ 	case M88DS3103_TS_SERIAL_D7:
+-		ret = regmap_update_bits(dev->regmap, 0x29, 0x20, u8tmp1);
++		ret = m88ds3103_update_bits(dev, 0x29, 0x20, u8tmp1);
+ 		if (ret)
+ 			goto err;
+ 		u8tmp1 = 0;
+@@ -567,11 +588,11 @@ static int m88ds3103_set_frontend(struct dvb_frontend *fe)
+ 	if (ret)
+ 		goto err;
+ 
+-	ret = regmap_update_bits(dev->regmap, 0x4d, 0x02, dev->cfg->spec_inv << 1);
++	ret = m88ds3103_update_bits(dev, 0x4d, 0x02, dev->cfg->spec_inv << 1);
+ 	if (ret)
+ 		goto err;
+ 
+-	ret = regmap_update_bits(dev->regmap, 0x30, 0x10, dev->cfg->agc_inv << 4);
++	ret = m88ds3103_update_bits(dev, 0x30, 0x10, dev->cfg->agc_inv << 4);
+ 	if (ret)
+ 		goto err;
+ 
+@@ -625,13 +646,13 @@ static int m88ds3103_init(struct dvb_frontend *fe)
+ 	dev->warm = false;
+ 
+ 	/* wake up device from sleep */
+-	ret = regmap_update_bits(dev->regmap, 0x08, 0x01, 0x01);
++	ret = m88ds3103_update_bits(dev, 0x08, 0x01, 0x01);
+ 	if (ret)
+ 		goto err;
+-	ret = regmap_update_bits(dev->regmap, 0x04, 0x01, 0x00);
++	ret = m88ds3103_update_bits(dev, 0x04, 0x01, 0x00);
+ 	if (ret)
+ 		goto err;
+-	ret = regmap_update_bits(dev->regmap, 0x23, 0x10, 0x00);
++	ret = m88ds3103_update_bits(dev, 0x23, 0x10, 0x00);
+ 	if (ret)
+ 		goto err;
+ 
+@@ -749,18 +770,18 @@ static int m88ds3103_sleep(struct dvb_frontend *fe)
+ 		utmp = 0x29;
+ 	else
+ 		utmp = 0x27;
+-	ret = regmap_update_bits(dev->regmap, utmp, 0x01, 0x00);
++	ret = m88ds3103_update_bits(dev, utmp, 0x01, 0x00);
+ 	if (ret)
+ 		goto err;
+ 
+ 	/* sleep */
+-	ret = regmap_update_bits(dev->regmap, 0x08, 0x01, 0x00);
++	ret = m88ds3103_update_bits(dev, 0x08, 0x01, 0x00);
+ 	if (ret)
+ 		goto err;
+-	ret = regmap_update_bits(dev->regmap, 0x04, 0x01, 0x01);
++	ret = m88ds3103_update_bits(dev, 0x04, 0x01, 0x01);
+ 	if (ret)
+ 		goto err;
+-	ret = regmap_update_bits(dev->regmap, 0x23, 0x10, 0x10);
++	ret = m88ds3103_update_bits(dev, 0x23, 0x10, 0x10);
+ 	if (ret)
+ 		goto err;
+ 
+@@ -992,12 +1013,12 @@ static int m88ds3103_set_tone(struct dvb_frontend *fe,
+ 	}
+ 
+ 	utmp = tone << 7 | dev->cfg->envelope_mode << 5;
+-	ret = regmap_update_bits(dev->regmap, 0xa2, 0xe0, utmp);
++	ret = m88ds3103_update_bits(dev, 0xa2, 0xe0, utmp);
+ 	if (ret)
+ 		goto err;
+ 
+ 	utmp = 1 << 2;
+-	ret = regmap_update_bits(dev->regmap, 0xa1, reg_a1_mask, utmp);
++	ret = m88ds3103_update_bits(dev, 0xa1, reg_a1_mask, utmp);
+ 	if (ret)
+ 		goto err;
+ 
+@@ -1047,7 +1068,7 @@ static int m88ds3103_set_voltage(struct dvb_frontend *fe,
+ 	voltage_dis ^= dev->cfg->lnb_en_pol;
+ 
+ 	utmp = voltage_dis << 1 | voltage_sel << 0;
+-	ret = regmap_update_bits(dev->regmap, 0xa2, 0x03, utmp);
++	ret = m88ds3103_update_bits(dev, 0xa2, 0x03, utmp);
+ 	if (ret)
+ 		goto err;
+ 
+@@ -1080,7 +1101,7 @@ static int m88ds3103_diseqc_send_master_cmd(struct dvb_frontend *fe,
+ 	}
+ 
+ 	utmp = dev->cfg->envelope_mode << 5;
+-	ret = regmap_update_bits(dev->regmap, 0xa2, 0xe0, utmp);
++	ret = m88ds3103_update_bits(dev, 0xa2, 0xe0, utmp);
+ 	if (ret)
+ 		goto err;
+ 
+@@ -1115,12 +1136,12 @@ static int m88ds3103_diseqc_send_master_cmd(struct dvb_frontend *fe,
+ 	} else {
+ 		dev_dbg(&client->dev, "diseqc tx timeout\n");
+ 
+-		ret = regmap_update_bits(dev->regmap, 0xa1, 0xc0, 0x40);
++		ret = m88ds3103_update_bits(dev, 0xa1, 0xc0, 0x40);
+ 		if (ret)
+ 			goto err;
+ 	}
+ 
+-	ret = regmap_update_bits(dev->regmap, 0xa2, 0xc0, 0x80);
++	ret = m88ds3103_update_bits(dev, 0xa2, 0xc0, 0x80);
+ 	if (ret)
+ 		goto err;
+ 
+@@ -1152,7 +1173,7 @@ static int m88ds3103_diseqc_send_burst(struct dvb_frontend *fe,
+ 	}
+ 
+ 	utmp = dev->cfg->envelope_mode << 5;
+-	ret = regmap_update_bits(dev->regmap, 0xa2, 0xe0, utmp);
++	ret = m88ds3103_update_bits(dev, 0xa2, 0xe0, utmp);
+ 	if (ret)
+ 		goto err;
+ 
+@@ -1194,12 +1215,12 @@ static int m88ds3103_diseqc_send_burst(struct dvb_frontend *fe,
+ 	} else {
+ 		dev_dbg(&client->dev, "diseqc tx timeout\n");
+ 
+-		ret = regmap_update_bits(dev->regmap, 0xa1, 0xc0, 0x40);
++		ret = m88ds3103_update_bits(dev, 0xa1, 0xc0, 0x40);
+ 		if (ret)
+ 			goto err;
+ 	}
+ 
+-	ret = regmap_update_bits(dev->regmap, 0xa2, 0xc0, 0x80);
++	ret = m88ds3103_update_bits(dev, 0xa2, 0xc0, 0x80);
+ 	if (ret)
+ 		goto err;
+ 
+@@ -1435,13 +1456,13 @@ static int m88ds3103_probe(struct i2c_client *client,
+ 		goto err_kfree;
+ 
+ 	/* sleep */
+-	ret = regmap_update_bits(dev->regmap, 0x08, 0x01, 0x00);
++	ret = m88ds3103_update_bits(dev, 0x08, 0x01, 0x00);
+ 	if (ret)
+ 		goto err_kfree;
+-	ret = regmap_update_bits(dev->regmap, 0x04, 0x01, 0x01);
++	ret = m88ds3103_update_bits(dev, 0x04, 0x01, 0x01);
+ 	if (ret)
+ 		goto err_kfree;
+-	ret = regmap_update_bits(dev->regmap, 0x23, 0x10, 0x10);
++	ret = m88ds3103_update_bits(dev, 0x23, 0x10, 0x10);
+ 	if (ret)
+ 		goto err_kfree;
+ 
+diff --git a/drivers/media/dvb-frontends/si2168.c b/drivers/media/dvb-frontends/si2168.c
+index 25e238c370e5..cb6a49b8c1ce 100644
+--- a/drivers/media/dvb-frontends/si2168.c
++++ b/drivers/media/dvb-frontends/si2168.c
+@@ -502,6 +502,10 @@ static int si2168_init(struct dvb_frontend *fe)
+ 		/* firmware is in the new format */
+ 		for (remaining = fw->size; remaining > 0; remaining -= 17) {
+ 			len = fw->data[fw->size - remaining];
++			if (len > SI2168_ARGLEN) {
++				ret = -EINVAL;
++				break;
++			}
+ 			memcpy(cmd.args, &fw->data[(fw->size - remaining) + 1], len);
+ 			cmd.wlen = len;
+ 			cmd.rlen = 1;
+diff --git a/drivers/media/tuners/si2157.c b/drivers/media/tuners/si2157.c
+index a6245ef379c4..416c865eb595 100644
+--- a/drivers/media/tuners/si2157.c
++++ b/drivers/media/tuners/si2157.c
+@@ -166,6 +166,10 @@ static int si2157_init(struct dvb_frontend *fe)
+ 
+ 	for (remaining = fw->size; remaining > 0; remaining -= 17) {
+ 		len = fw->data[fw->size - remaining];
++		if (len > SI2157_ARGLEN) {
++			dev_err(&client->dev, "Bad firmware length\n");
++			goto err_release_firmware;
++		}
+ 		memcpy(cmd.args, &fw->data[(fw->size - remaining) + 1], len);
+ 		cmd.wlen = len;
+ 		cmd.rlen = 1;
+diff --git a/drivers/media/usb/dvb-usb-v2/rtl28xxu.c b/drivers/media/usb/dvb-usb-v2/rtl28xxu.c
+index c3cac4c12fb3..197a4f2e54d2 100644
+--- a/drivers/media/usb/dvb-usb-v2/rtl28xxu.c
++++ b/drivers/media/usb/dvb-usb-v2/rtl28xxu.c
+@@ -34,6 +34,14 @@ static int rtl28xxu_ctrl_msg(struct dvb_usb_device *d, struct rtl28xxu_req *req)
+ 	unsigned int pipe;
+ 	u8 requesttype;
+ 
++	mutex_lock(&d->usb_mutex);
++
++	if (req->size > sizeof(dev->buf)) {
++		dev_err(&d->intf->dev, "too large message %u\n", req->size);
++		ret = -EINVAL;
++		goto err_mutex_unlock;
++	}
++
+ 	if (req->index & CMD_WR_FLAG) {
+ 		/* write */
+ 		memcpy(dev->buf, req->data, req->size);
+@@ -50,14 +58,17 @@ static int rtl28xxu_ctrl_msg(struct dvb_usb_device *d, struct rtl28xxu_req *req)
+ 	dvb_usb_dbg_usb_control_msg(d->udev, 0, requesttype, req->value,
+ 			req->index, dev->buf, req->size);
+ 	if (ret < 0)
+-		goto err;
++		goto err_mutex_unlock;
+ 
+ 	/* read request, copy returned data to return buf */
+ 	if (requesttype == (USB_TYPE_VENDOR | USB_DIR_IN))
+ 		memcpy(req->data, dev->buf, req->size);
+ 
++	mutex_unlock(&d->usb_mutex);
++
+ 	return 0;
+-err:
++err_mutex_unlock:
++	mutex_unlock(&d->usb_mutex);
+ 	dev_dbg(&d->intf->dev, "failed=%d\n", ret);
+ 	return ret;
+ }
+diff --git a/drivers/media/usb/dvb-usb-v2/rtl28xxu.h b/drivers/media/usb/dvb-usb-v2/rtl28xxu.h
+index 9f6115a2ee01..138062960a73 100644
+--- a/drivers/media/usb/dvb-usb-v2/rtl28xxu.h
++++ b/drivers/media/usb/dvb-usb-v2/rtl28xxu.h
+@@ -71,7 +71,7 @@
+ 
+ 
+ struct rtl28xxu_dev {
+-	u8 buf[28];
++	u8 buf[128];
+ 	u8 chip_id;
+ 	u8 tuner;
+ 	char *tuner_name;
+diff --git a/drivers/mmc/card/mmc_test.c b/drivers/mmc/card/mmc_test.c
+index b78cf5d403a3..7fc9174d4619 100644
+--- a/drivers/mmc/card/mmc_test.c
++++ b/drivers/mmc/card/mmc_test.c
+@@ -2263,15 +2263,12 @@ static int mmc_test_profile_sglen_r_nonblock_perf(struct mmc_test_card *test)
+ /*
+  * eMMC hardware reset.
+  */
+-static int mmc_test_hw_reset(struct mmc_test_card *test)
++static int mmc_test_reset(struct mmc_test_card *test)
+ {
+ 	struct mmc_card *card = test->card;
+ 	struct mmc_host *host = card->host;
+ 	int err;
+ 
+-	if (!mmc_card_mmc(card) || !mmc_can_reset(card))
+-		return RESULT_UNSUP_CARD;
+-
+ 	err = mmc_hw_reset(host);
+ 	if (!err)
+ 		return RESULT_OK;
+@@ -2605,8 +2602,8 @@ static const struct mmc_test_case mmc_test_cases[] = {
+ 	},
+ 
+ 	{
+-		.name = "eMMC hardware reset",
+-		.run = mmc_test_hw_reset,
++		.name = "Reset test",
++		.run = mmc_test_reset,
+ 	},
+ };
+ 
+diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
+index e726903170a8..f6cd995dbe92 100644
+--- a/drivers/mmc/core/mmc.c
++++ b/drivers/mmc/core/mmc.c
+@@ -1924,7 +1924,6 @@ EXPORT_SYMBOL(mmc_can_reset);
+ static int mmc_reset(struct mmc_host *host)
+ {
+ 	struct mmc_card *card = host->card;
+-	u32 status;
+ 
+ 	if (!(host->caps & MMC_CAP_HW_RESET) || !host->ops->hw_reset)
+ 		return -EOPNOTSUPP;
+@@ -1937,12 +1936,6 @@ static int mmc_reset(struct mmc_host *host)
+ 
+ 	host->ops->hw_reset(host);
+ 
+-	/* If the reset has happened, then a status command will fail */
+-	if (!mmc_send_status(card, &status)) {
+-		mmc_host_clk_release(host);
+-		return -ENOSYS;
+-	}
+-
+ 	/* Set initial state and call mmc_set_ios */
+ 	mmc_set_initial_state(host);
+ 	mmc_host_clk_release(host);
+diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
+index eff0e5325e6a..bfddc9efd6cc 100644
+--- a/drivers/net/wireless/ath/ath9k/init.c
++++ b/drivers/net/wireless/ath/ath9k/init.c
+@@ -874,6 +874,7 @@ static void ath9k_set_hw_capab(struct ath_softc *sc, struct ieee80211_hw *hw)
+ 	hw->max_rate_tries = 10;
+ 	hw->sta_data_size = sizeof(struct ath_node);
+ 	hw->vif_data_size = sizeof(struct ath_vif);
++	hw->extra_tx_headroom = 4;
+ 
+ 	hw->wiphy->available_antennas_rx = BIT(ah->caps.max_rxchains) - 1;
+ 	hw->wiphy->available_antennas_tx = BIT(ah->caps.max_txchains) - 1;
+diff --git a/drivers/net/wireless/iwlwifi/dvm/lib.c b/drivers/net/wireless/iwlwifi/dvm/lib.c
+index 1d2223df5cb0..e7d3566c714b 100644
+--- a/drivers/net/wireless/iwlwifi/dvm/lib.c
++++ b/drivers/net/wireless/iwlwifi/dvm/lib.c
+@@ -1022,7 +1022,7 @@ static void iwlagn_wowlan_program_keys(struct ieee80211_hw *hw,
+ 			u8 *pn = seq.ccmp.pn;
+ 
+ 			ieee80211_get_key_rx_seq(key, i, &seq);
+-			aes_sc->pn = cpu_to_le64(
++			aes_sc[i].pn = cpu_to_le64(
+ 					(u64)pn[5] |
+ 					((u64)pn[4] << 8) |
+ 					((u64)pn[3] << 16) |
+diff --git a/drivers/net/wireless/iwlwifi/iwl-7000.c b/drivers/net/wireless/iwlwifi/iwl-7000.c
+index cc35f796d406..d7acbd147bd1 100644
+--- a/drivers/net/wireless/iwlwifi/iwl-7000.c
++++ b/drivers/net/wireless/iwlwifi/iwl-7000.c
+@@ -348,6 +348,6 @@ const struct iwl_cfg iwl7265d_n_cfg = {
+ };
+ 
+ MODULE_FIRMWARE(IWL7260_MODULE_FIRMWARE(IWL7260_UCODE_API_OK));
+-MODULE_FIRMWARE(IWL3160_MODULE_FIRMWARE(IWL3160_UCODE_API_OK));
++MODULE_FIRMWARE(IWL3160_MODULE_FIRMWARE(IWL7260_UCODE_API_OK));
+ MODULE_FIRMWARE(IWL7265_MODULE_FIRMWARE(IWL7260_UCODE_API_OK));
+ MODULE_FIRMWARE(IWL7265D_MODULE_FIRMWARE(IWL7260_UCODE_API_OK));
+diff --git a/drivers/net/wireless/iwlwifi/mvm/d3.c b/drivers/net/wireless/iwlwifi/mvm/d3.c
+index 4165d104e4c3..f60b89baab7a 100644
+--- a/drivers/net/wireless/iwlwifi/mvm/d3.c
++++ b/drivers/net/wireless/iwlwifi/mvm/d3.c
+@@ -274,18 +274,13 @@ static void iwl_mvm_wowlan_program_keys(struct ieee80211_hw *hw,
+ 		break;
+ 	case WLAN_CIPHER_SUITE_CCMP:
+ 		if (sta) {
+-			u8 *pn = seq.ccmp.pn;
++			u64 pn64;
+ 
+ 			aes_sc = data->rsc_tsc->all_tsc_rsc.aes.unicast_rsc;
+ 			aes_tx_sc = &data->rsc_tsc->all_tsc_rsc.aes.tsc;
+ 
+-			ieee80211_get_key_tx_seq(key, &seq);
+-			aes_tx_sc->pn = cpu_to_le64((u64)pn[5] |
+-						    ((u64)pn[4] << 8) |
+-						    ((u64)pn[3] << 16) |
+-						    ((u64)pn[2] << 24) |
+-						    ((u64)pn[1] << 32) |
+-						    ((u64)pn[0] << 40));
++			pn64 = atomic64_read(&key->tx_pn);
++			aes_tx_sc->pn = cpu_to_le64(pn64);
+ 		} else {
+ 			aes_sc = data->rsc_tsc->all_tsc_rsc.aes.multicast_rsc;
+ 		}
+@@ -298,12 +293,12 @@ static void iwl_mvm_wowlan_program_keys(struct ieee80211_hw *hw,
+ 			u8 *pn = seq.ccmp.pn;
+ 
+ 			ieee80211_get_key_rx_seq(key, i, &seq);
+-			aes_sc->pn = cpu_to_le64((u64)pn[5] |
+-						 ((u64)pn[4] << 8) |
+-						 ((u64)pn[3] << 16) |
+-						 ((u64)pn[2] << 24) |
+-						 ((u64)pn[1] << 32) |
+-						 ((u64)pn[0] << 40));
++			aes_sc[i].pn = cpu_to_le64((u64)pn[5] |
++						   ((u64)pn[4] << 8) |
++						   ((u64)pn[3] << 16) |
++						   ((u64)pn[2] << 24) |
++						   ((u64)pn[1] << 32) |
++						   ((u64)pn[0] << 40));
+ 		}
+ 		data->use_rsc_tsc = true;
+ 		break;
+@@ -1446,15 +1441,15 @@ static void iwl_mvm_d3_update_gtks(struct ieee80211_hw *hw,
+ 
+ 		switch (key->cipher) {
+ 		case WLAN_CIPHER_SUITE_CCMP:
+-			iwl_mvm_aes_sc_to_seq(&sc->aes.tsc, &seq);
+ 			iwl_mvm_set_aes_rx_seq(sc->aes.unicast_rsc, key);
++			atomic64_set(&key->tx_pn, le64_to_cpu(sc->aes.tsc.pn));
+ 			break;
+ 		case WLAN_CIPHER_SUITE_TKIP:
+ 			iwl_mvm_tkip_sc_to_seq(&sc->tkip.tsc, &seq);
+ 			iwl_mvm_set_tkip_rx_seq(sc->tkip.unicast_rsc, key);
++			ieee80211_set_key_tx_seq(key, &seq);
+ 			break;
+ 		}
+-		ieee80211_set_key_tx_seq(key, &seq);
+ 
+ 		/* that's it for this key */
+ 		return;
+diff --git a/drivers/net/wireless/iwlwifi/mvm/fw.c b/drivers/net/wireless/iwlwifi/mvm/fw.c
+index eb10c5ee4a14..b49367e1cfd2 100644
+--- a/drivers/net/wireless/iwlwifi/mvm/fw.c
++++ b/drivers/net/wireless/iwlwifi/mvm/fw.c
+@@ -364,7 +364,7 @@ int iwl_run_init_mvm_ucode(struct iwl_mvm *mvm, bool read_nvm)
+ 	 * abort after reading the nvm in case RF Kill is on, we will complete
+ 	 * the init seq later when RF kill will switch to off
+ 	 */
+-	if (iwl_mvm_is_radio_killed(mvm)) {
++	if (iwl_mvm_is_radio_hw_killed(mvm)) {
+ 		IWL_DEBUG_RF_KILL(mvm,
+ 				  "jump over all phy activities due to RF kill\n");
+ 		iwl_remove_notification(&mvm->notif_wait, &calib_wait);
+@@ -397,7 +397,7 @@ int iwl_run_init_mvm_ucode(struct iwl_mvm *mvm, bool read_nvm)
+ 	ret = iwl_wait_notification(&mvm->notif_wait, &calib_wait,
+ 			MVM_UCODE_CALIB_TIMEOUT);
+ 
+-	if (ret && iwl_mvm_is_radio_killed(mvm)) {
++	if (ret && iwl_mvm_is_radio_hw_killed(mvm)) {
+ 		IWL_DEBUG_RF_KILL(mvm, "RFKILL while calibrating.\n");
+ 		ret = 1;
+ 	}
+diff --git a/drivers/net/wireless/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
+index dfdab38e2d4a..f82019c0c4c0 100644
+--- a/drivers/net/wireless/iwlwifi/mvm/mac80211.c
++++ b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
+@@ -2373,6 +2373,7 @@ static void iwl_mvm_stop_ap_ibss(struct ieee80211_hw *hw,
+ 		iwl_mvm_remove_time_event(mvm, mvmvif,
+ 					  &mvmvif->time_event_data);
+ 		RCU_INIT_POINTER(mvm->csa_vif, NULL);
++		mvmvif->csa_countdown = false;
+ 	}
+ 
+ 	if (rcu_access_pointer(mvm->csa_tx_blocked_vif) == vif) {
+diff --git a/drivers/net/wireless/iwlwifi/mvm/mvm.h b/drivers/net/wireless/iwlwifi/mvm/mvm.h
+index 2d4bad5fe825..4a6f1627ae43 100644
+--- a/drivers/net/wireless/iwlwifi/mvm/mvm.h
++++ b/drivers/net/wireless/iwlwifi/mvm/mvm.h
+@@ -848,6 +848,11 @@ static inline bool iwl_mvm_is_radio_killed(struct iwl_mvm *mvm)
+ 	       test_bit(IWL_MVM_STATUS_HW_CTKILL, &mvm->status);
+ }
+ 
++static inline bool iwl_mvm_is_radio_hw_killed(struct iwl_mvm *mvm)
++{
++	return test_bit(IWL_MVM_STATUS_HW_RFKILL, &mvm->status);
++}
++
+ /* Must be called with rcu_read_lock() held and it can only be
+  * released when mvmsta is not needed anymore.
+  */
+diff --git a/drivers/net/wireless/iwlwifi/mvm/ops.c b/drivers/net/wireless/iwlwifi/mvm/ops.c
+index e4fa50075ffd..61c2b0ad5db7 100644
+--- a/drivers/net/wireless/iwlwifi/mvm/ops.c
++++ b/drivers/net/wireless/iwlwifi/mvm/ops.c
+@@ -582,6 +582,7 @@ iwl_op_mode_mvm_start(struct iwl_trans *trans, const struct iwl_cfg *cfg,
+ 	ieee80211_unregister_hw(mvm->hw);
+ 	iwl_mvm_leds_exit(mvm);
+  out_free:
++	flush_delayed_work(&mvm->fw_dump_wk);
+ 	iwl_phy_db_free(mvm->phy_db);
+ 	kfree(mvm->scan_cmd);
+ 	if (!cfg->no_power_up_nic_in_init || !mvm->nvm_file_name)
+diff --git a/drivers/net/wireless/iwlwifi/pcie/drv.c b/drivers/net/wireless/iwlwifi/pcie/drv.c
+index 9f65c1cff1b1..865d578dee82 100644
+--- a/drivers/net/wireless/iwlwifi/pcie/drv.c
++++ b/drivers/net/wireless/iwlwifi/pcie/drv.c
+@@ -414,6 +414,11 @@ static const struct pci_device_id iwl_hw_card_ids[] = {
+ 	{IWL_PCI_DEVICE(0x095A, 0x5590, iwl7265_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x095B, 0x5290, iwl7265_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x095A, 0x5490, iwl7265_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x095A, 0x5F10, iwl7265_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x095B, 0x5212, iwl7265_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x095B, 0x520A, iwl7265_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x095A, 0x9000, iwl7265_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x095A, 0x9400, iwl7265_2ac_cfg)},
+ 
+ /* 8000 Series */
+ 	{IWL_PCI_DEVICE(0x24F3, 0x0010, iwl8260_2ac_cfg)},
+diff --git a/drivers/net/wireless/rtlwifi/pci.h b/drivers/net/wireless/rtlwifi/pci.h
+index d4567d12e07e..5da6703942d9 100644
+--- a/drivers/net/wireless/rtlwifi/pci.h
++++ b/drivers/net/wireless/rtlwifi/pci.h
+@@ -247,6 +247,8 @@ struct rtl_pci {
+ 	/* MSI support */
+ 	bool msi_support;
+ 	bool using_msi;
++	/* interrupt clear before set */
++	bool int_clear;
+ };
+ 
+ struct mp_adapter {
+diff --git a/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c b/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c
+index b7f18e2155eb..6e9418ed90c2 100644
+--- a/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c
++++ b/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c
+@@ -2253,11 +2253,28 @@ void rtl8821ae_set_qos(struct ieee80211_hw *hw, int aci)
+ 	}
+ }
+ 
++static void rtl8821ae_clear_interrupt(struct ieee80211_hw *hw)
++{
++	struct rtl_priv *rtlpriv = rtl_priv(hw);
++	u32 tmp = rtl_read_dword(rtlpriv, REG_HISR);
++
++	rtl_write_dword(rtlpriv, REG_HISR, tmp);
++
++	tmp = rtl_read_dword(rtlpriv, REG_HISRE);
++	rtl_write_dword(rtlpriv, REG_HISRE, tmp);
++
++	tmp = rtl_read_dword(rtlpriv, REG_HSISR);
++	rtl_write_dword(rtlpriv, REG_HSISR, tmp);
++}
++
+ void rtl8821ae_enable_interrupt(struct ieee80211_hw *hw)
+ {
+ 	struct rtl_priv *rtlpriv = rtl_priv(hw);
+ 	struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
+ 
++	if (!rtlpci->int_clear)
++		rtl8821ae_clear_interrupt(hw);/*clear it here first*/
++
+ 	rtl_write_dword(rtlpriv, REG_HIMR, rtlpci->irq_mask[0] & 0xFFFFFFFF);
+ 	rtl_write_dword(rtlpriv, REG_HIMRE, rtlpci->irq_mask[1] & 0xFFFFFFFF);
+ 	rtlpci->irq_enabled = true;
+diff --git a/drivers/net/wireless/rtlwifi/rtl8821ae/sw.c b/drivers/net/wireless/rtlwifi/rtl8821ae/sw.c
+index a4988121e1ab..8ee141a55bc5 100644
+--- a/drivers/net/wireless/rtlwifi/rtl8821ae/sw.c
++++ b/drivers/net/wireless/rtlwifi/rtl8821ae/sw.c
+@@ -96,6 +96,7 @@ int rtl8821ae_init_sw_vars(struct ieee80211_hw *hw)
+ 
+ 	rtl8821ae_bt_reg_init(hw);
+ 	rtlpci->msi_support = rtlpriv->cfg->mod_params->msi_support;
++	rtlpci->int_clear = rtlpriv->cfg->mod_params->int_clear;
+ 	rtlpriv->btcoexist.btc_ops = rtl_btc_get_ops_pointer();
+ 
+ 	rtlpriv->dm.dm_initialgain_enable = 1;
+@@ -167,6 +168,7 @@ int rtl8821ae_init_sw_vars(struct ieee80211_hw *hw)
+ 	rtlpriv->psc.swctrl_lps = rtlpriv->cfg->mod_params->swctrl_lps;
+ 	rtlpriv->psc.fwctrl_lps = rtlpriv->cfg->mod_params->fwctrl_lps;
+ 	rtlpci->msi_support = rtlpriv->cfg->mod_params->msi_support;
++	rtlpci->msi_support = rtlpriv->cfg->mod_params->int_clear;
+ 	if (rtlpriv->cfg->mod_params->disable_watchdog)
+ 		pr_info("watchdog disabled\n");
+ 	rtlpriv->psc.reg_fwctrl_lps = 3;
+@@ -308,6 +310,7 @@ static struct rtl_mod_params rtl8821ae_mod_params = {
+ 	.swctrl_lps = false,
+ 	.fwctrl_lps = true,
+ 	.msi_support = true,
++	.int_clear = true,
+ 	.debug = DBG_EMERG,
+ 	.disable_watchdog = 0,
+ };
+@@ -437,6 +440,7 @@ module_param_named(fwlps, rtl8821ae_mod_params.fwctrl_lps, bool, 0444);
+ module_param_named(msi, rtl8821ae_mod_params.msi_support, bool, 0444);
+ module_param_named(disable_watchdog, rtl8821ae_mod_params.disable_watchdog,
+ 		   bool, 0444);
++module_param_named(int_clear, rtl8821ae_mod_params.int_clear, bool, 0444);
+ MODULE_PARM_DESC(swenc, "Set to 1 for software crypto (default 0)\n");
+ MODULE_PARM_DESC(ips, "Set to 0 to not use link power save (default 1)\n");
+ MODULE_PARM_DESC(swlps, "Set to 1 to use SW control power save (default 0)\n");
+@@ -444,6 +448,7 @@ MODULE_PARM_DESC(fwlps, "Set to 1 to use FW control power save (default 1)\n");
+ MODULE_PARM_DESC(msi, "Set to 1 to use MSI interrupts mode (default 1)\n");
+ MODULE_PARM_DESC(debug, "Set debug level (0-5) (default 0)");
+ MODULE_PARM_DESC(disable_watchdog, "Set to 1 to disable the watchdog (default 0)\n");
++MODULE_PARM_DESC(int_clear, "Set to 1 to disable interrupt clear before set (default 0)\n");
+ 
+ static SIMPLE_DEV_PM_OPS(rtlwifi_pm_ops, rtl_pci_suspend, rtl_pci_resume);
+ 
+diff --git a/drivers/net/wireless/rtlwifi/wifi.h b/drivers/net/wireless/rtlwifi/wifi.h
+index 2b770b5e2620..0a3570aa6651 100644
+--- a/drivers/net/wireless/rtlwifi/wifi.h
++++ b/drivers/net/wireless/rtlwifi/wifi.h
+@@ -2234,6 +2234,9 @@ struct rtl_mod_params {
+ 
+ 	/* default 0: 1 means disable */
+ 	bool disable_watchdog;
++
++	/* default 0: 1 means do not disable interrupts */
++	bool int_clear;
+ };
+ 
+ struct rtl_hal_usbint_cfg {
+diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
+index 312f23a8429c..92618686604c 100644
+--- a/drivers/pci/pci-sysfs.c
++++ b/drivers/pci/pci-sysfs.c
+@@ -216,7 +216,7 @@ static ssize_t numa_node_store(struct device *dev,
+ 	if (ret)
+ 		return ret;
+ 
+-	if (!node_online(node))
++	if (node >= MAX_NUMNODES || !node_online(node))
+ 		return -EINVAL;
+ 
+ 	add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
+diff --git a/drivers/pinctrl/intel/pinctrl-baytrail.c b/drivers/pinctrl/intel/pinctrl-baytrail.c
+index 2062c224e32f..b2602210784d 100644
+--- a/drivers/pinctrl/intel/pinctrl-baytrail.c
++++ b/drivers/pinctrl/intel/pinctrl-baytrail.c
+@@ -146,7 +146,7 @@ struct byt_gpio_pin_context {
+ struct byt_gpio {
+ 	struct gpio_chip		chip;
+ 	struct platform_device		*pdev;
+-	spinlock_t			lock;
++	raw_spinlock_t			lock;
+ 	void __iomem			*reg_base;
+ 	struct pinctrl_gpio_range	*range;
+ 	struct byt_gpio_pin_context	*saved_context;
+@@ -174,11 +174,11 @@ static void byt_gpio_clear_triggering(struct byt_gpio *vg, unsigned offset)
+ 	unsigned long flags;
+ 	u32 value;
+ 
+-	spin_lock_irqsave(&vg->lock, flags);
++	raw_spin_lock_irqsave(&vg->lock, flags);
+ 	value = readl(reg);
+ 	value &= ~(BYT_TRIG_POS | BYT_TRIG_NEG | BYT_TRIG_LVL);
+ 	writel(value, reg);
+-	spin_unlock_irqrestore(&vg->lock, flags);
++	raw_spin_unlock_irqrestore(&vg->lock, flags);
+ }
+ 
+ static u32 byt_get_gpio_mux(struct byt_gpio *vg, unsigned offset)
+@@ -201,6 +201,9 @@ static int byt_gpio_request(struct gpio_chip *chip, unsigned offset)
+ 	struct byt_gpio *vg = to_byt_gpio(chip);
+ 	void __iomem *reg = byt_gpio_reg(chip, offset, BYT_CONF0_REG);
+ 	u32 value, gpio_mux;
++	unsigned long flags;
++
++	raw_spin_lock_irqsave(&vg->lock, flags);
+ 
+ 	/*
+ 	 * In most cases, func pin mux 000 means GPIO function.
+@@ -214,18 +217,16 @@ static int byt_gpio_request(struct gpio_chip *chip, unsigned offset)
+ 	value = readl(reg) & BYT_PIN_MUX;
+ 	gpio_mux = byt_get_gpio_mux(vg, offset);
+ 	if (WARN_ON(gpio_mux != value)) {
+-		unsigned long flags;
+-
+-		spin_lock_irqsave(&vg->lock, flags);
+ 		value = readl(reg) & ~BYT_PIN_MUX;
+ 		value |= gpio_mux;
+ 		writel(value, reg);
+-		spin_unlock_irqrestore(&vg->lock, flags);
+ 
+ 		dev_warn(&vg->pdev->dev,
+ 			 "pin %u forcibly re-configured as GPIO\n", offset);
+ 	}
+ 
++	raw_spin_unlock_irqrestore(&vg->lock, flags);
++
+ 	pm_runtime_get(&vg->pdev->dev);
+ 
+ 	return 0;
+@@ -250,7 +251,7 @@ static int byt_irq_type(struct irq_data *d, unsigned type)
+ 	if (offset >= vg->chip.ngpio)
+ 		return -EINVAL;
+ 
+-	spin_lock_irqsave(&vg->lock, flags);
++	raw_spin_lock_irqsave(&vg->lock, flags);
+ 	value = readl(reg);
+ 
+ 	WARN(value & BYT_DIRECT_IRQ_EN,
+@@ -269,7 +270,7 @@ static int byt_irq_type(struct irq_data *d, unsigned type)
+ 	else if (type & IRQ_TYPE_LEVEL_MASK)
+ 		__irq_set_handler_locked(d->irq, handle_level_irq);
+ 
+-	spin_unlock_irqrestore(&vg->lock, flags);
++	raw_spin_unlock_irqrestore(&vg->lock, flags);
+ 
+ 	return 0;
+ }
+@@ -277,7 +278,15 @@ static int byt_irq_type(struct irq_data *d, unsigned type)
+ static int byt_gpio_get(struct gpio_chip *chip, unsigned offset)
+ {
+ 	void __iomem *reg = byt_gpio_reg(chip, offset, BYT_VAL_REG);
+-	return readl(reg) & BYT_LEVEL;
++	struct byt_gpio *vg = to_byt_gpio(chip);
++	unsigned long flags;
++	u32 val;
++
++	raw_spin_lock_irqsave(&vg->lock, flags);
++	val = readl(reg);
++	raw_spin_unlock_irqrestore(&vg->lock, flags);
++
++	return val & BYT_LEVEL;
+ }
+ 
+ static void byt_gpio_set(struct gpio_chip *chip, unsigned offset, int value)
+@@ -287,7 +296,7 @@ static void byt_gpio_set(struct gpio_chip *chip, unsigned offset, int value)
+ 	unsigned long flags;
+ 	u32 old_val;
+ 
+-	spin_lock_irqsave(&vg->lock, flags);
++	raw_spin_lock_irqsave(&vg->lock, flags);
+ 
+ 	old_val = readl(reg);
+ 
+@@ -296,7 +305,7 @@ static void byt_gpio_set(struct gpio_chip *chip, unsigned offset, int value)
+ 	else
+ 		writel(old_val & ~BYT_LEVEL, reg);
+ 
+-	spin_unlock_irqrestore(&vg->lock, flags);
++	raw_spin_unlock_irqrestore(&vg->lock, flags);
+ }
+ 
+ static int byt_gpio_direction_input(struct gpio_chip *chip, unsigned offset)
+@@ -306,13 +315,13 @@ static int byt_gpio_direction_input(struct gpio_chip *chip, unsigned offset)
+ 	unsigned long flags;
+ 	u32 value;
+ 
+-	spin_lock_irqsave(&vg->lock, flags);
++	raw_spin_lock_irqsave(&vg->lock, flags);
+ 
+ 	value = readl(reg) | BYT_DIR_MASK;
+ 	value &= ~BYT_INPUT_EN;		/* active low */
+ 	writel(value, reg);
+ 
+-	spin_unlock_irqrestore(&vg->lock, flags);
++	raw_spin_unlock_irqrestore(&vg->lock, flags);
+ 
+ 	return 0;
+ }
+@@ -326,7 +335,7 @@ static int byt_gpio_direction_output(struct gpio_chip *chip,
+ 	unsigned long flags;
+ 	u32 reg_val;
+ 
+-	spin_lock_irqsave(&vg->lock, flags);
++	raw_spin_lock_irqsave(&vg->lock, flags);
+ 
+ 	/*
+ 	 * Before making any direction modifications, do a check if gpio
+@@ -345,7 +354,7 @@ static int byt_gpio_direction_output(struct gpio_chip *chip,
+ 	else
+ 		writel(reg_val & ~BYT_LEVEL, reg);
+ 
+-	spin_unlock_irqrestore(&vg->lock, flags);
++	raw_spin_unlock_irqrestore(&vg->lock, flags);
+ 
+ 	return 0;
+ }
+@@ -354,18 +363,19 @@ static void byt_gpio_dbg_show(struct seq_file *s, struct gpio_chip *chip)
+ {
+ 	struct byt_gpio *vg = to_byt_gpio(chip);
+ 	int i;
+-	unsigned long flags;
+ 	u32 conf0, val, offs;
+ 
+-	spin_lock_irqsave(&vg->lock, flags);
+-
+ 	for (i = 0; i < vg->chip.ngpio; i++) {
+ 		const char *pull_str = NULL;
+ 		const char *pull = NULL;
++		unsigned long flags;
+ 		const char *label;
+ 		offs = vg->range->pins[i] * 16;
++
++		raw_spin_lock_irqsave(&vg->lock, flags);
+ 		conf0 = readl(vg->reg_base + offs + BYT_CONF0_REG);
+ 		val = readl(vg->reg_base + offs + BYT_VAL_REG);
++		raw_spin_unlock_irqrestore(&vg->lock, flags);
+ 
+ 		label = gpiochip_is_requested(chip, i);
+ 		if (!label)
+@@ -418,7 +428,6 @@ static void byt_gpio_dbg_show(struct seq_file *s, struct gpio_chip *chip)
+ 
+ 		seq_puts(s, "\n");
+ 	}
+-	spin_unlock_irqrestore(&vg->lock, flags);
+ }
+ 
+ static void byt_gpio_irq_handler(unsigned irq, struct irq_desc *desc)
+@@ -450,8 +459,10 @@ static void byt_irq_ack(struct irq_data *d)
+ 	unsigned offset = irqd_to_hwirq(d);
+ 	void __iomem *reg;
+ 
++	raw_spin_lock(&vg->lock);
+ 	reg = byt_gpio_reg(&vg->chip, offset, BYT_INT_STAT_REG);
+ 	writel(BIT(offset % 32), reg);
++	raw_spin_unlock(&vg->lock);
+ }
+ 
+ static void byt_irq_unmask(struct irq_data *d)
+@@ -463,9 +474,9 @@ static void byt_irq_unmask(struct irq_data *d)
+ 	void __iomem *reg;
+ 	u32 value;
+ 
+-	spin_lock_irqsave(&vg->lock, flags);
+-
+ 	reg = byt_gpio_reg(&vg->chip, offset, BYT_CONF0_REG);
++
++	raw_spin_lock_irqsave(&vg->lock, flags);
+ 	value = readl(reg);
+ 
+ 	switch (irqd_get_trigger_type(d)) {
+@@ -486,7 +497,7 @@ static void byt_irq_unmask(struct irq_data *d)
+ 
+ 	writel(value, reg);
+ 
+-	spin_unlock_irqrestore(&vg->lock, flags);
++	raw_spin_unlock_irqrestore(&vg->lock, flags);
+ }
+ 
+ static void byt_irq_mask(struct irq_data *d)
+@@ -578,7 +589,7 @@ static int byt_gpio_probe(struct platform_device *pdev)
+ 	if (IS_ERR(vg->reg_base))
+ 		return PTR_ERR(vg->reg_base);
+ 
+-	spin_lock_init(&vg->lock);
++	raw_spin_lock_init(&vg->lock);
+ 
+ 	gc = &vg->chip;
+ 	gc->label = dev_name(&pdev->dev);
+diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
+index 454536c49315..9c780740fb82 100644
+--- a/drivers/scsi/mvsas/mv_sas.c
++++ b/drivers/scsi/mvsas/mv_sas.c
+@@ -887,6 +887,8 @@ static void mvs_slot_free(struct mvs_info *mvi, u32 rx_desc)
+ static void mvs_slot_task_free(struct mvs_info *mvi, struct sas_task *task,
+ 			  struct mvs_slot_info *slot, u32 slot_idx)
+ {
++	if (!slot)
++		return;
+ 	if (!slot->task)
+ 		return;
+ 	if (!sas_protocol_ata(task->task_proto))
+diff --git a/drivers/staging/iio/accel/sca3000_ring.c b/drivers/staging/iio/accel/sca3000_ring.c
+index 23685e74917e..bd2c69f85949 100644
+--- a/drivers/staging/iio/accel/sca3000_ring.c
++++ b/drivers/staging/iio/accel/sca3000_ring.c
+@@ -116,7 +116,7 @@ static int sca3000_read_first_n_hw_rb(struct iio_buffer *r,
+ 	if (ret)
+ 		goto error_ret;
+ 
+-	for (i = 0; i < num_read; i++)
++	for (i = 0; i < num_read / sizeof(u16); i++)
+ 		*(((u16 *)rx) + i) = be16_to_cpup((__be16 *)rx + i);
+ 
+ 	if (copy_to_user(buf, rx, num_read))
+diff --git a/drivers/staging/iio/adc/mxs-lradc.c b/drivers/staging/iio/adc/mxs-lradc.c
+index d7c5223f1c3e..2931ea9b75d1 100644
+--- a/drivers/staging/iio/adc/mxs-lradc.c
++++ b/drivers/staging/iio/adc/mxs-lradc.c
+@@ -919,11 +919,12 @@ static int mxs_lradc_read_raw(struct iio_dev *iio_dev,
+ 	case IIO_CHAN_INFO_OFFSET:
+ 		if (chan->type == IIO_TEMP) {
+ 			/* The calculated value from the ADC is in Kelvin, we
+-			 * want Celsius for hwmon so the offset is
+-			 * -272.15 * scale
++			 * want Celsius for hwmon so the offset is -273.15
++			 * The offset is applied before scaling so it is
++			 * actually -273.15 * 4 / 1.012 = -1079.644268
+ 			 */
+-			*val = -1075;
+-			*val2 = 691699;
++			*val = -1079;
++			*val2 = 644268;
+ 
+ 			return IIO_VAL_INT_PLUS_MICRO;
+ 		}
+diff --git a/drivers/thermal/samsung/exynos_tmu.c b/drivers/thermal/samsung/exynos_tmu.c
+index c96ff10b869e..af68d06fd193 100644
+--- a/drivers/thermal/samsung/exynos_tmu.c
++++ b/drivers/thermal/samsung/exynos_tmu.c
+@@ -933,7 +933,7 @@ static void exynos4412_tmu_set_emulation(struct exynos_tmu_data *data,
+ 
+ 	if (data->soc == SOC_ARCH_EXYNOS5260)
+ 		emul_con = EXYNOS5260_EMUL_CON;
+-	if (data->soc == SOC_ARCH_EXYNOS5433)
++	else if (data->soc == SOC_ARCH_EXYNOS5433)
+ 		emul_con = EXYNOS5433_TMU_EMUL_CON;
+ 	else if (data->soc == SOC_ARCH_EXYNOS7)
+ 		emul_con = EXYNOS7_TMU_REG_EMUL_CON;
+diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
+index 21d01a491405..e508939daea3 100644
+--- a/drivers/tty/serial/8250/8250_dma.c
++++ b/drivers/tty/serial/8250/8250_dma.c
+@@ -80,10 +80,6 @@ int serial8250_tx_dma(struct uart_8250_port *p)
+ 		return 0;
+ 
+ 	dma->tx_size = CIRC_CNT_TO_END(xmit->head, xmit->tail, UART_XMIT_SIZE);
+-	if (dma->tx_size < p->port.fifosize) {
+-		ret = -EINVAL;
+-		goto err;
+-	}
+ 
+ 	desc = dmaengine_prep_slave_single(dma->txchan,
+ 					   dma->tx_addr + xmit->tail,
+diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
+index c79d33676672..c47d3e480586 100644
+--- a/drivers/usb/host/xhci-pci.c
++++ b/drivers/usb/host/xhci-pci.c
+@@ -147,6 +147,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
+ 	if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
+ 		pdev->device == PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI) {
+ 		xhci->quirks |= XHCI_SPURIOUS_REBOOT;
++		xhci->quirks |= XHCI_SPURIOUS_WAKEUP;
+ 	}
+ 	if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
+ 		(pdev->device == PCI_DEVICE_ID_INTEL_SUNRISEPOINT_LP_XHCI ||
+diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
+index 8aadf3def901..63041c1e5a9f 100644
+--- a/drivers/usb/host/xhci-ring.c
++++ b/drivers/usb/host/xhci-ring.c
+@@ -2239,6 +2239,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
+ 	u32 trb_comp_code;
+ 	int ret = 0;
+ 	int td_num = 0;
++	bool handling_skipped_tds = false;
+ 
+ 	slot_id = TRB_TO_SLOT_ID(le32_to_cpu(event->flags));
+ 	xdev = xhci->devs[slot_id];
+@@ -2372,6 +2373,10 @@ static int handle_tx_event(struct xhci_hcd *xhci,
+ 		ep->skip = true;
+ 		xhci_dbg(xhci, "Miss service interval error, set skip flag\n");
+ 		goto cleanup;
++	case COMP_PING_ERR:
++		ep->skip = true;
++		xhci_dbg(xhci, "No Ping response error, Skip one Isoc TD\n");
++		goto cleanup;
+ 	default:
+ 		if (xhci_is_vendor_info_code(xhci, trb_comp_code)) {
+ 			status = 0;
+@@ -2508,13 +2513,18 @@ static int handle_tx_event(struct xhci_hcd *xhci,
+ 						 ep, &status);
+ 
+ cleanup:
++
++
++		handling_skipped_tds = ep->skip &&
++			trb_comp_code != COMP_MISSED_INT &&
++			trb_comp_code != COMP_PING_ERR;
++
+ 		/*
+-		 * Do not update event ring dequeue pointer if ep->skip is set.
+-		 * Will roll back to continue process missed tds.
++		 * Do not update event ring dequeue pointer if we're in a loop
++		 * processing missed tds.
+ 		 */
+-		if (trb_comp_code == COMP_MISSED_INT || !ep->skip) {
++		if (!handling_skipped_tds)
+ 			inc_deq(xhci, xhci->event_ring);
+-		}
+ 
+ 		if (ret) {
+ 			urb = td->urb;
+@@ -2549,7 +2559,7 @@ cleanup:
+ 	 * Process them as short transfer until reach the td pointed by
+ 	 * the event.
+ 	 */
+-	} while (ep->skip && trb_comp_code != COMP_MISSED_INT);
++	} while (handling_skipped_tds);
+ 
+ 	return 0;
+ }
+diff --git a/drivers/usb/serial/qcserial.c b/drivers/usb/serial/qcserial.c
+index ebcec8cda858..f49d262e926b 100644
+--- a/drivers/usb/serial/qcserial.c
++++ b/drivers/usb/serial/qcserial.c
+@@ -153,6 +153,8 @@ static const struct usb_device_id id_table[] = {
+ 	{DEVICE_SWI(0x1199, 0x9056)},	/* Sierra Wireless Modem */
+ 	{DEVICE_SWI(0x1199, 0x9060)},	/* Sierra Wireless Modem */
+ 	{DEVICE_SWI(0x1199, 0x9061)},	/* Sierra Wireless Modem */
++	{DEVICE_SWI(0x1199, 0x9070)},	/* Sierra Wireless MC74xx/EM74xx */
++	{DEVICE_SWI(0x1199, 0x9071)},	/* Sierra Wireless MC74xx/EM74xx */
+ 	{DEVICE_SWI(0x413c, 0x81a2)},	/* Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card */
+ 	{DEVICE_SWI(0x413c, 0x81a3)},	/* Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card */
+ 	{DEVICE_SWI(0x413c, 0x81a4)},	/* Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card */
+diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
+index 1aaf89300621..92f394927f24 100644
+--- a/drivers/video/console/fbcon.c
++++ b/drivers/video/console/fbcon.c
+@@ -1093,6 +1093,7 @@ static void fbcon_init(struct vc_data *vc, int init)
+ 		con_copy_unimap(vc, svc);
+ 
+ 	ops = info->fbcon_par;
++	ops->cur_blink_jiffies = msecs_to_jiffies(vc->vc_cur_blink_ms);
+ 	p->con_rotate = initial_rotation;
+ 	set_blitting_type(vc, info);
+ 
+diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
+index f490b6155091..641d3dc4f31e 100644
+--- a/fs/btrfs/ioctl.c
++++ b/fs/btrfs/ioctl.c
+@@ -4649,7 +4649,7 @@ locked:
+ 
+ 	if (bctl->flags & ~(BTRFS_BALANCE_ARGS_MASK | BTRFS_BALANCE_TYPE_MASK)) {
+ 		ret = -EINVAL;
+-		goto out_bargs;
++		goto out_bctl;
+ 	}
+ 
+ do_balance:
+@@ -4663,12 +4663,15 @@ do_balance:
+ 	need_unlock = false;
+ 
+ 	ret = btrfs_balance(bctl, bargs);
++	bctl = NULL;
+ 
+ 	if (arg) {
+ 		if (copy_to_user(arg, bargs, sizeof(*bargs)))
+ 			ret = -EFAULT;
+ 	}
+ 
++out_bctl:
++	kfree(bctl);
+ out_bargs:
+ 	kfree(bargs);
+ out_unlock:
+diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
+index 84d693d37428..871fcb67be97 100644
+--- a/fs/overlayfs/copy_up.c
++++ b/fs/overlayfs/copy_up.c
+@@ -81,11 +81,11 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
+ 	if (len == 0)
+ 		return 0;
+ 
+-	old_file = ovl_path_open(old, O_RDONLY);
++	old_file = ovl_path_open(old, O_LARGEFILE | O_RDONLY);
+ 	if (IS_ERR(old_file))
+ 		return PTR_ERR(old_file);
+ 
+-	new_file = ovl_path_open(new, O_WRONLY);
++	new_file = ovl_path_open(new, O_LARGEFILE | O_WRONLY);
+ 	if (IS_ERR(new_file)) {
+ 		error = PTR_ERR(new_file);
+ 		goto out_fput;
+@@ -267,7 +267,7 @@ out:
+ 
+ out_cleanup:
+ 	ovl_cleanup(wdir, newdentry);
+-	goto out;
++	goto out2;
+ }
+ 
+ /*
+diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
+index d9da5a4e9382..ec0c2a050043 100644
+--- a/fs/overlayfs/inode.c
++++ b/fs/overlayfs/inode.c
+@@ -363,6 +363,9 @@ struct inode *ovl_d_select_inode(struct dentry *dentry, unsigned file_flags)
+ 		ovl_path_upper(dentry, &realpath);
+ 	}
+ 
++	if (realpath.dentry->d_flags & DCACHE_OP_SELECT_INODE)
++		return realpath.dentry->d_op->d_select_inode(realpath.dentry, file_flags);
++
+ 	return d_backing_inode(realpath.dentry);
+ }
+ 
+diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
+index 79073d68b475..e38ee0fed24a 100644
+--- a/fs/overlayfs/super.c
++++ b/fs/overlayfs/super.c
+@@ -544,6 +544,7 @@ static void ovl_put_super(struct super_block *sb)
+ 	mntput(ufs->upper_mnt);
+ 	for (i = 0; i < ufs->numlower; i++)
+ 		mntput(ufs->lower_mnt[i]);
++	kfree(ufs->lower_mnt);
+ 
+ 	kfree(ufs->config.lowerdir);
+ 	kfree(ufs->config.upperdir);
+@@ -1048,6 +1049,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
+ 		oe->lowerstack[i].dentry = stack[i].dentry;
+ 		oe->lowerstack[i].mnt = ufs->lower_mnt[i];
+ 	}
++	kfree(stack);
+ 
+ 	root_dentry->d_fsdata = oe;
+ 
+diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
+index 0fe9df983ab7..fe0ab983859b 100644
+--- a/include/linux/backing-dev.h
++++ b/include/linux/backing-dev.h
+@@ -18,13 +18,17 @@
+ #include <linux/slab.h>
+ 
+ int __must_check bdi_init(struct backing_dev_info *bdi);
+-void bdi_destroy(struct backing_dev_info *bdi);
++void bdi_exit(struct backing_dev_info *bdi);
+ 
+ __printf(3, 4)
+ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
+ 		const char *fmt, ...);
+ int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
++void bdi_unregister(struct backing_dev_info *bdi);
++
+ int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
++void bdi_destroy(struct backing_dev_info *bdi);
++
+ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
+ 			bool range_cyclic, enum wb_reason reason);
+ void wb_start_background_writeback(struct bdi_writeback *wb);
+diff --git a/include/linux/omap-dma.h b/include/linux/omap-dma.h
+index e5a70132a240..88fa8af2b937 100644
+--- a/include/linux/omap-dma.h
++++ b/include/linux/omap-dma.h
+@@ -17,7 +17,7 @@
+ 
+ #include <linux/platform_device.h>
+ 
+-#define INT_DMA_LCD			25
++#define INT_DMA_LCD			(NR_IRQS_LEGACY + 25)
+ 
+ #define OMAP1_DMA_TOUT_IRQ		(1 << 0)
+ #define OMAP_DMA_DROP_IRQ		(1 << 1)
+diff --git a/include/sound/soc.h b/include/sound/soc.h
+index 93df8bf9d54a..334d0d292020 100644
+--- a/include/sound/soc.h
++++ b/include/sound/soc.h
+@@ -86,7 +86,7 @@
+ 	.access = SNDRV_CTL_ELEM_ACCESS_TLV_READ | \
+ 	SNDRV_CTL_ELEM_ACCESS_READWRITE, \
+ 	.tlv.p  = (tlv_array),\
+-	.info = snd_soc_info_volsw, \
++	.info = snd_soc_info_volsw_sx, \
+ 	.get = snd_soc_get_volsw_sx,\
+ 	.put = snd_soc_put_volsw_sx, \
+ 	.private_value = (unsigned long)&(struct soc_mixer_control) \
+@@ -156,7 +156,7 @@
+ 	.access = SNDRV_CTL_ELEM_ACCESS_TLV_READ | \
+ 	SNDRV_CTL_ELEM_ACCESS_READWRITE, \
+ 	.tlv.p  = (tlv_array), \
+-	.info = snd_soc_info_volsw, \
++	.info = snd_soc_info_volsw_sx, \
+ 	.get = snd_soc_get_volsw_sx, \
+ 	.put = snd_soc_put_volsw_sx, \
+ 	.private_value = (unsigned long)&(struct soc_mixer_control) \
+@@ -573,6 +573,8 @@ int snd_soc_put_enum_double(struct snd_kcontrol *kcontrol,
+ 	struct snd_ctl_elem_value *ucontrol);
+ int snd_soc_info_volsw(struct snd_kcontrol *kcontrol,
+ 	struct snd_ctl_elem_info *uinfo);
++int snd_soc_info_volsw_sx(struct snd_kcontrol *kcontrol,
++			  struct snd_ctl_elem_info *uinfo);
+ #define snd_soc_info_bool_ext		snd_ctl_boolean_mono_info
+ int snd_soc_get_volsw(struct snd_kcontrol *kcontrol,
+ 	struct snd_ctl_elem_value *ucontrol);
+diff --git a/include/sound/wm8904.h b/include/sound/wm8904.h
+index 898be3a8db9a..6d8f8fba3341 100644
+--- a/include/sound/wm8904.h
++++ b/include/sound/wm8904.h
+@@ -119,7 +119,7 @@
+ #define WM8904_MIC_REGS  2
+ #define WM8904_GPIO_REGS 4
+ #define WM8904_DRC_REGS  4
+-#define WM8904_EQ_REGS   25
++#define WM8904_EQ_REGS   24
+ 
+ /**
+  * DRC configurations are specified with a label and a set of register
+diff --git a/kernel/module.c b/kernel/module.c
+index b86b7bf1be38..8f051a106676 100644
+--- a/kernel/module.c
++++ b/kernel/module.c
+@@ -1063,11 +1063,15 @@ void symbol_put_addr(void *addr)
+ 	if (core_kernel_text(a))
+ 		return;
+ 
+-	/* module_text_address is safe here: we're supposed to have reference
+-	 * to module from symbol_get, so it can't go away. */
++	/*
++	 * Even though we hold a reference on the module; we still need to
++	 * disable preemption in order to safely traverse the data structure.
++	 */
++	preempt_disable();
+ 	modaddr = __module_text_address(a);
+ 	BUG_ON(!modaddr);
+ 	module_put(modaddr);
++	preempt_enable();
+ }
+ EXPORT_SYMBOL_GPL(symbol_put_addr);
+ 
+diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
+index 0a17af35670a..da7f8266913b 100644
+--- a/kernel/sched/deadline.c
++++ b/kernel/sched/deadline.c
+@@ -1066,8 +1066,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
+ 		int target = find_later_rq(p);
+ 
+ 		if (target != -1 &&
+-				dl_time_before(p->dl.deadline,
+-					cpu_rq(target)->dl.earliest_dl.curr))
++				(dl_time_before(p->dl.deadline,
++					cpu_rq(target)->dl.earliest_dl.curr) ||
++				(cpu_rq(target)->dl.dl_nr_running == 0)))
+ 			cpu = target;
+ 	}
+ 	rcu_read_unlock();
+@@ -1417,7 +1418,8 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
+ 
+ 		later_rq = cpu_rq(cpu);
+ 
+-		if (!dl_time_before(task->dl.deadline,
++		if (later_rq->dl.dl_nr_running &&
++		    !dl_time_before(task->dl.deadline,
+ 					later_rq->dl.earliest_dl.curr)) {
+ 			/*
+ 			 * Target rq has tasks of equal or earlier deadline,
+diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
+index 3f34496244e9..96969012f242 100644
+--- a/kernel/trace/trace_stack.c
++++ b/kernel/trace/trace_stack.c
+@@ -94,6 +94,12 @@ check_stack(unsigned long ip, unsigned long *stack)
+ 	local_irq_save(flags);
+ 	arch_spin_lock(&max_stack_lock);
+ 
++	/*
++	 * RCU may not be watching, make it see us.
++	 * The stack trace code uses rcu_sched.
++	 */
++	rcu_irq_enter();
++
+ 	/* In case another CPU set the tracer_frame on us */
+ 	if (unlikely(!frame_size))
+ 		this_size -= tracer_frame;
+@@ -174,6 +180,7 @@ check_stack(unsigned long ip, unsigned long *stack)
+ 	}
+ 
+  out:
++	rcu_irq_exit();
+ 	arch_spin_unlock(&max_stack_lock);
+ 	local_irq_restore(flags);
+ }
+diff --git a/lib/fault-inject.c b/lib/fault-inject.c
+index f1cdeb024d17..6a823a53e357 100644
+--- a/lib/fault-inject.c
++++ b/lib/fault-inject.c
+@@ -44,7 +44,7 @@ static void fail_dump(struct fault_attr *attr)
+ 		printk(KERN_NOTICE "FAULT_INJECTION: forcing a failure.\n"
+ 		       "name %pd, interval %lu, probability %lu, "
+ 		       "space %d, times %d\n", attr->dname,
+-		       attr->probability, attr->interval,
++		       attr->interval, attr->probability,
+ 		       atomic_read(&attr->space),
+ 		       atomic_read(&attr->times));
+ 		if (attr->verbose > 1)
+diff --git a/mm/backing-dev.c b/mm/backing-dev.c
+index dac5bf59309d..dc07d8866d9a 100644
+--- a/mm/backing-dev.c
++++ b/mm/backing-dev.c
+@@ -823,7 +823,7 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi)
+ 	synchronize_rcu_expedited();
+ }
+ 
+-void bdi_destroy(struct backing_dev_info *bdi)
++void bdi_unregister(struct backing_dev_info *bdi)
+ {
+ 	/* make sure nobody finds us on the bdi_list anymore */
+ 	bdi_remove_from_list(bdi);
+@@ -835,9 +835,19 @@ void bdi_destroy(struct backing_dev_info *bdi)
+ 		device_unregister(bdi->dev);
+ 		bdi->dev = NULL;
+ 	}
++}
+ 
++void bdi_exit(struct backing_dev_info *bdi)
++{
++	WARN_ON_ONCE(bdi->dev);
+ 	wb_exit(&bdi->wb);
+ }
++
++void bdi_destroy(struct backing_dev_info *bdi)
++{
++	bdi_unregister(bdi);
++	bdi_exit(bdi);
++}
+ EXPORT_SYMBOL(bdi_destroy);
+ 
+ /*
+diff --git a/mm/filemap.c b/mm/filemap.c
+index 1283fc825458..3fd68ee183c6 100644
+--- a/mm/filemap.c
++++ b/mm/filemap.c
+@@ -2488,6 +2488,11 @@ again:
+ 			break;
+ 		}
+ 
++		if (fatal_signal_pending(current)) {
++			status = -EINTR;
++			break;
++		}
++
+ 		status = a_ops->write_begin(file, mapping, pos, bytes, flags,
+ 						&page, &fsdata);
+ 		if (unlikely(status < 0))
+@@ -2525,10 +2530,6 @@ again:
+ 		written += copied;
+ 
+ 		balance_dirty_pages_ratelimited(mapping);
+-		if (fatal_signal_pending(current)) {
+-			status = -EINTR;
+-			break;
+-		}
+ 	} while (iov_iter_count(i));
+ 
+ 	return written ? written : status;
+diff --git a/mm/huge_memory.c b/mm/huge_memory.c
+index 097c7a4bfbd9..da0ac6a0445f 100644
+--- a/mm/huge_memory.c
++++ b/mm/huge_memory.c
+@@ -2132,7 +2132,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
+ 	for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
+ 	     _pte++, address += PAGE_SIZE) {
+ 		pte_t pteval = *_pte;
+-		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
++		if (pte_none(pteval) || (pte_present(pteval) &&
++			is_zero_pfn(pte_pfn(pteval)))) {
+ 			if (++none_or_zero <= khugepaged_max_ptes_none)
+ 				continue;
+ 			else
+diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
+index 3ea8b7de9633..58d9a8167dd2 100644
+--- a/net/mac80211/debugfs.c
++++ b/net/mac80211/debugfs.c
+@@ -148,7 +148,7 @@ static ssize_t hwflags_read(struct file *file, char __user *user_buf,
+ 
+ 	for (i = 0; i < NUM_IEEE80211_HW_FLAGS; i++) {
+ 		if (test_bit(i, local->hw.flags))
+-			pos += scnprintf(pos, end - pos, "%s",
++			pos += scnprintf(pos, end - pos, "%s\n",
+ 					 hw_flag_names[i]);
+ 	}
+ 
+diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c
+index a1fe5377a2b3..5a30ce6e8c90 100644
+--- a/net/netfilter/ipset/ip_set_list_set.c
++++ b/net/netfilter/ipset/ip_set_list_set.c
+@@ -297,7 +297,7 @@ list_set_uadd(struct ip_set *set, void *value, const struct ip_set_ext *ext,
+ 	      ip_set_timeout_expired(ext_timeout(n, set))))
+ 		n =  NULL;
+ 
+-	e = kzalloc(set->dsize, GFP_KERNEL);
++	e = kzalloc(set->dsize, GFP_ATOMIC);
+ 	if (!e)
+ 		return -ENOMEM;
+ 	e->id = d->id;
+diff --git a/sound/hda/ext/hdac_ext_bus.c b/sound/hda/ext/hdac_ext_bus.c
+index 0aa5d9eb6c3f..d85aa1a75188 100644
+--- a/sound/hda/ext/hdac_ext_bus.c
++++ b/sound/hda/ext/hdac_ext_bus.c
+@@ -19,6 +19,7 @@
+ 
+ #include <linux/module.h>
+ #include <linux/slab.h>
++#include <linux/io.h>
+ #include <sound/hdaudio_ext.h>
+ 
+ MODULE_DESCRIPTION("HDA extended core");
+diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c
+index d1a2cb65e27c..ca374462d7e5 100644
+--- a/sound/pci/hda/hda_codec.c
++++ b/sound/pci/hda/hda_codec.c
+@@ -3438,10 +3438,8 @@ int snd_hda_codec_build_pcms(struct hda_codec *codec)
+ 	int dev, err;
+ 
+ 	err = snd_hda_codec_parse_pcms(codec);
+-	if (err < 0) {
+-		snd_hda_codec_reset(codec);
++	if (err < 0)
+ 		return err;
+-	}
+ 
+ 	/* attach a new PCM streams */
+ 	list_for_each_entry(cpcm, &codec->pcm_list_head, list) {
+diff --git a/sound/pci/hda/patch_conexant.c b/sound/pci/hda/patch_conexant.c
+index ca03c40609fc..2f0ec7c45fc7 100644
+--- a/sound/pci/hda/patch_conexant.c
++++ b/sound/pci/hda/patch_conexant.c
+@@ -819,6 +819,7 @@ static const struct snd_pci_quirk cxt5066_fixups[] = {
+ 	SND_PCI_QUIRK(0x17aa, 0x21da, "Lenovo X220", CXT_PINCFG_LENOVO_TP410),
+ 	SND_PCI_QUIRK(0x17aa, 0x21db, "Lenovo X220-tablet", CXT_PINCFG_LENOVO_TP410),
+ 	SND_PCI_QUIRK(0x17aa, 0x38af, "Lenovo IdeaPad Z560", CXT_FIXUP_MUTE_LED_EAPD),
++	SND_PCI_QUIRK(0x17aa, 0x390b, "Lenovo G50-80", CXT_FIXUP_STEREO_DMIC),
+ 	SND_PCI_QUIRK(0x17aa, 0x3975, "Lenovo U300s", CXT_FIXUP_STEREO_DMIC),
+ 	SND_PCI_QUIRK(0x17aa, 0x3977, "Lenovo IdeaPad U310", CXT_FIXUP_STEREO_DMIC),
+ 	SND_PCI_QUIRK(0x17aa, 0x397b, "Lenovo S205", CXT_FIXUP_STEREO_DMIC),
+diff --git a/sound/soc/soc-ops.c b/sound/soc/soc-ops.c
+index 100d92b5b77e..05977ae1ff2a 100644
+--- a/sound/soc/soc-ops.c
++++ b/sound/soc/soc-ops.c
+@@ -207,6 +207,34 @@ int snd_soc_info_volsw(struct snd_kcontrol *kcontrol,
+ EXPORT_SYMBOL_GPL(snd_soc_info_volsw);
+ 
+ /**
++ * snd_soc_info_volsw_sx - Mixer info callback for SX TLV controls
++ * @kcontrol: mixer control
++ * @uinfo: control element information
++ *
++ * Callback to provide information about a single mixer control, or a double
++ * mixer control that spans 2 registers of the SX TLV type. SX TLV controls
++ * have a range that represents both positive and negative values either side
++ * of zero but without a sign bit.
++ *
++ * Returns 0 for success.
++ */
++int snd_soc_info_volsw_sx(struct snd_kcontrol *kcontrol,
++			  struct snd_ctl_elem_info *uinfo)
++{
++	struct soc_mixer_control *mc =
++		(struct soc_mixer_control *)kcontrol->private_value;
++
++	snd_soc_info_volsw(kcontrol, uinfo);
++	/* Max represents the number of levels in an SX control not the
++	 * maximum value, so add the minimum value back on
++	 */
++	uinfo->value.integer.max += mc->min;
++
++	return 0;
++}
++EXPORT_SYMBOL_GPL(snd_soc_info_volsw_sx);
++
++/**
+  * snd_soc_get_volsw - single mixer get callback
+  * @kcontrol: mixer control
+  * @ucontrol: control element information
+diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
+index 21c14244f4c4..d7ea8e20dae4 100644
+--- a/virt/kvm/irqchip.c
++++ b/virt/kvm/irqchip.c
+@@ -213,11 +213,15 @@ int kvm_set_irq_routing(struct kvm *kvm,
+ 			goto out;
+ 
+ 		r = -EINVAL;
+-		if (ue->flags)
++		if (ue->flags) {
++			kfree(e);
+ 			goto out;
++		}
+ 		r = setup_routing_entry(new, e, ue);
+-		if (r)
++		if (r) {
++			kfree(e);
+ 			goto out;
++		}
+ 		++ue;
+ 	}
+ 



* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-12-11 14:31 Mike Pagano
From: Mike Pagano @ 2015-12-11 14:31 UTC
  To: gentoo-commits

commit:     42c0079efa9fff84d8c63842c360060aad2f75cb
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Fri Dec 11 14:31:16 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Fri Dec 11 14:31:16 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=42c0079e

Linux patch 4.2.7

 0000_README            |    4 +
 1006_linux-4.2.7.patch | 4131 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 4135 insertions(+)

diff --git a/0000_README b/0000_README
index 8190b77..2299001 100644
--- a/0000_README
+++ b/0000_README
@@ -67,6 +67,10 @@ Patch:  1005_linux-4.2.6.patch
 From:   http://www.kernel.org
 Desc:   Linux 4.2.6
 
+Patch:  1006_linux-4.2.7.patch
+From:   http://www.kernel.org
+Desc:   Linux 4.2.7
+
 Patch:  1500_XATTR_USER_PREFIX.patch
 From:   https://bugs.gentoo.org/show_bug.cgi?id=470644
 Desc:   Support for namespace user.pax.* on tmpfs.

diff --git a/1006_linux-4.2.7.patch b/1006_linux-4.2.7.patch
new file mode 100644
index 0000000..35ba2e4
--- /dev/null
+++ b/1006_linux-4.2.7.patch
@@ -0,0 +1,4131 @@
+diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt
+index 0815eac5b185..e12f3448846a 100644
+--- a/Documentation/devicetree/bindings/usb/dwc3.txt
++++ b/Documentation/devicetree/bindings/usb/dwc3.txt
+@@ -35,6 +35,8 @@ Optional properties:
+ 			LTSSM during USB3 Compliance mode.
+  - snps,dis_u3_susphy_quirk: when set core will disable USB3 suspend phy.
+  - snps,dis_u2_susphy_quirk: when set core will disable USB2 suspend phy.
++ - snps,dis_enblslpm_quirk: when set clears the enblslpm in GUSB2PHYCFG,
++			disabling the suspend signal to the PHY.
+  - snps,is-utmi-l1-suspend: true when DWC3 asserts output signal
+ 			utmi_l1_suspend_n, false when asserts utmi_sleep_n
+  - snps,hird-threshold: HIRD threshold
+diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
+index 6f7fafde0884..3e2844eca266 100644
+--- a/Documentation/filesystems/proc.txt
++++ b/Documentation/filesystems/proc.txt
+@@ -140,7 +140,8 @@ Table 1-1: Process specific entries in /proc
+  stat		Process status
+  statm		Process memory status information
+  status		Process status in human readable form
+- wchan		If CONFIG_KALLSYMS is set, a pre-decoded wchan
++ wchan		Present with CONFIG_KALLSYMS=y: it shows the kernel function
++		symbol the task is blocked in - or "0" if not blocked.
+  pagemap	Page table
+  stack		Report full stack trace, enable via CONFIG_STACKTRACE
+  smaps		a extension based on maps, showing the memory consumption of
+@@ -310,7 +311,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
+   blocked       bitmap of blocked signals
+   sigign        bitmap of ignored signals
+   sigcatch      bitmap of caught signals
+-  wchan         address where process went to sleep
++  0		(place holder, used to be the wchan address, use /proc/PID/wchan instead)
+   0             (place holder)
+   0             (place holder)
+   exit_signal   signal to send to parent thread on exit
+diff --git a/Makefile b/Makefile
+index 9ef37399b4e8..f5014eaf2532 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 6
++SUBLEVEL = 7
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+ 
+diff --git a/arch/arm/boot/dts/imx27.dtsi b/arch/arm/boot/dts/imx27.dtsi
+index b69be5c499cf..8c603fdf9da1 100644
+--- a/arch/arm/boot/dts/imx27.dtsi
++++ b/arch/arm/boot/dts/imx27.dtsi
+@@ -477,7 +477,10 @@
+ 				compatible = "fsl,imx27-usb";
+ 				reg = <0x10024000 0x200>;
+ 				interrupts = <56>;
+-				clocks = <&clks IMX27_CLK_USB_IPG_GATE>;
++				clocks = <&clks IMX27_CLK_USB_IPG_GATE>,
++					<&clks IMX27_CLK_USB_AHB_GATE>,
++					<&clks IMX27_CLK_USB_DIV>;
++				clock-names = "ipg", "ahb", "per";
+ 				fsl,usbmisc = <&usbmisc 0>;
+ 				status = "disabled";
+ 			};
+@@ -486,7 +489,10 @@
+ 				compatible = "fsl,imx27-usb";
+ 				reg = <0x10024200 0x200>;
+ 				interrupts = <54>;
+-				clocks = <&clks IMX27_CLK_USB_IPG_GATE>;
++				clocks = <&clks IMX27_CLK_USB_IPG_GATE>,
++					<&clks IMX27_CLK_USB_AHB_GATE>,
++					<&clks IMX27_CLK_USB_DIV>;
++				clock-names = "ipg", "ahb", "per";
+ 				fsl,usbmisc = <&usbmisc 1>;
+ 				dr_mode = "host";
+ 				status = "disabled";
+@@ -496,7 +502,10 @@
+ 				compatible = "fsl,imx27-usb";
+ 				reg = <0x10024400 0x200>;
+ 				interrupts = <55>;
+-				clocks = <&clks IMX27_CLK_USB_IPG_GATE>;
++				clocks = <&clks IMX27_CLK_USB_IPG_GATE>,
++					<&clks IMX27_CLK_USB_AHB_GATE>,
++					<&clks IMX27_CLK_USB_DIV>;
++				clock-names = "ipg", "ahb", "per";
+ 				fsl,usbmisc = <&usbmisc 2>;
+ 				dr_mode = "host";
+ 				status = "disabled";
+@@ -506,7 +515,6 @@
+ 				#index-cells = <1>;
+ 				compatible = "fsl,imx27-usbmisc";
+ 				reg = <0x10024600 0x200>;
+-				clocks = <&clks IMX27_CLK_USB_AHB_GATE>;
+ 			};
+ 
+ 			sahara2: sahara@10025000 {
+diff --git a/arch/arm/boot/dts/omap5-uevm.dts b/arch/arm/boot/dts/omap5-uevm.dts
+index 5771a149ce4a..23d645daeac1 100644
+--- a/arch/arm/boot/dts/omap5-uevm.dts
++++ b/arch/arm/boot/dts/omap5-uevm.dts
+@@ -31,6 +31,24 @@
+ 		regulator-max-microvolt = <3000000>;
+ 	};
+ 
++	mmc3_pwrseq: sdhci0_pwrseq {
++		compatible = "mmc-pwrseq-simple";
++		clocks = <&clk32kgaudio>;
++		clock-names = "ext_clock";
++	};
++
++	vmmcsdio_fixed: fixedregulator-mmcsdio {
++		compatible = "regulator-fixed";
++		regulator-name = "vmmcsdio_fixed";
++		regulator-min-microvolt = <1800000>;
++		regulator-max-microvolt = <1800000>;
++		gpio = <&gpio5 12 GPIO_ACTIVE_HIGH>;	/* gpio140 WLAN_EN */
++		enable-active-high;
++		startup-delay-us = <70000>;
++		pinctrl-names = "default";
++		pinctrl-0 = <&wlan_pins>;
++	};
++
+ 	/* HS USB Host PHY on PORT 2 */
+ 	hsusb2_phy: hsusb2_phy {
+ 		compatible = "usb-nop-xceiv";
+@@ -197,12 +215,20 @@
+ 		>;
+ 	};
+ 
+-	mcspi4_pins: pinmux_mcspi4_pins {
++	mmc3_pins: pinmux_mmc3_pins {
++		pinctrl-single,pins = <
++			OMAP5_IOPAD(0x01a4, PIN_INPUT_PULLUP | MUX_MODE0) /* wlsdio_clk */
++			OMAP5_IOPAD(0x01a6, PIN_INPUT_PULLUP | MUX_MODE0) /* wlsdio_cmd */
++			OMAP5_IOPAD(0x01a8, PIN_INPUT_PULLUP | MUX_MODE0) /* wlsdio_data0 */
++			OMAP5_IOPAD(0x01aa, PIN_INPUT_PULLUP | MUX_MODE0) /* wlsdio_data1 */
++			OMAP5_IOPAD(0x01ac, PIN_INPUT_PULLUP | MUX_MODE0) /* wlsdio_data2 */
++			OMAP5_IOPAD(0x01ae, PIN_INPUT_PULLUP | MUX_MODE0) /* wlsdio_data3 */
++		>;
++	};
++
++	wlan_pins: pinmux_wlan_pins {
+ 		pinctrl-single,pins = <
+-			0x164 (PIN_INPUT | MUX_MODE1)		/*  mcspi4_clk */
+-			0x168 (PIN_INPUT | MUX_MODE1)		/*  mcspi4_simo */
+-			0x16a (PIN_INPUT | MUX_MODE1)		/*  mcspi4_somi */
+-			0x16c (PIN_INPUT | MUX_MODE1)		/*  mcspi4_cs0 */
++			OMAP5_IOPAD(0x1bc, PIN_OUTPUT | MUX_MODE6) /* mcspi1_clk.gpio5_140 */
+ 		>;
+ 	};
+ 
+@@ -276,6 +302,12 @@
+ 			0x1A (PIN_OUTPUT | MUX_MODE0) /* fref_clk1_out, USB hub clk */
+ 		>;
+ 	};
++
++	wlcore_irq_pin: pinmux_wlcore_irq_pin {
++		pinctrl-single,pins = <
++			OMAP5_IOPAD(0x040, WAKEUP_EN | PIN_INPUT_PULLUP | MUX_MODE6)	/* llia_wakereqin.gpio1_wk14 */
++		>;
++	};
+ };
+ 
+ &mmc1 {
+@@ -290,8 +322,25 @@
+ };
+ 
+ &mmc3 {
++	vmmc-supply = <&vmmcsdio_fixed>;
++	mmc-pwrseq = <&mmc3_pwrseq>;
+ 	bus-width = <4>;
+-	ti,non-removable;
++	non-removable;
++	cap-power-off-card;
++	pinctrl-names = "default";
++	pinctrl-0 = <&mmc3_pins &wlcore_irq_pin>;
++	interrupts-extended = <&gic GIC_SPI 94 IRQ_TYPE_LEVEL_HIGH
++			       &omap5_pmx_core 0x168>;
++
++	#address-cells = <1>;
++	#size-cells = <0>;
++	wlcore: wlcore@2 {
++		compatible = "ti,wl1271";
++		reg = <2>;
++		interrupt-parent = <&gpio1>;
++		interrupts = <14 IRQ_TYPE_LEVEL_HIGH>;	/* gpio 14 */
++		ref-clock-frequency = <26000000>;
++	};
+ };
+ 
+ &mmc4 {
+@@ -591,11 +640,6 @@
+ 	pinctrl-0 = <&mcspi3_pins>;
+ };
+ 
+-&mcspi4 {
+-	pinctrl-names = "default";
+-	pinctrl-0 = <&mcspi4_pins>;
+-};
+-
+ &uart1 {
+ 	pinctrl-names = "default";
+ 	pinctrl-0 = <&uart1_pins>;
+diff --git a/arch/arm/boot/dts/sama5d4.dtsi b/arch/arm/boot/dts/sama5d4.dtsi
+index 3ee22ee13c5a..1ba10e495f21 100644
+--- a/arch/arm/boot/dts/sama5d4.dtsi
++++ b/arch/arm/boot/dts/sama5d4.dtsi
+@@ -939,11 +939,11 @@
+ 				reg = <0xf8018000 0x4000>;
+ 				interrupts = <33 IRQ_TYPE_LEVEL_HIGH 6>;
+ 				dmas = <&dma1
+-					(AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1))
+-					AT91_XDMAC_DT_PERID(4)>,
++					(AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1)
++					| AT91_XDMAC_DT_PERID(4))>,
+ 				       <&dma1
+-					(AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1))
+-					AT91_XDMAC_DT_PERID(5)>;
++					(AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1)
++					| AT91_XDMAC_DT_PERID(5))>;
+ 				dma-names = "tx", "rx";
+ 				pinctrl-names = "default";
+ 				pinctrl-0 = <&pinctrl_i2c1>;
+diff --git a/arch/arm/boot/dts/sun6i-a31-hummingbird.dts b/arch/arm/boot/dts/sun6i-a31-hummingbird.dts
+index d0cfadac0691..18f26ca4e375 100644
+--- a/arch/arm/boot/dts/sun6i-a31-hummingbird.dts
++++ b/arch/arm/boot/dts/sun6i-a31-hummingbird.dts
+@@ -184,18 +184,18 @@
+ 				regulator-name = "vcc-3v0";
+ 			};
+ 
+-			vdd_cpu: dcdc2 {
++			vdd_gpu: dcdc2 {
+ 				regulator-always-on;
+ 				regulator-min-microvolt = <700000>;
+ 				regulator-max-microvolt = <1320000>;
+-				regulator-name = "vdd-cpu";
++				regulator-name = "vdd-gpu";
+ 			};
+ 
+-			vdd_gpu: dcdc3 {
++			vdd_cpu: dcdc3 {
+ 				regulator-always-on;
+ 				regulator-min-microvolt = <700000>;
+ 				regulator-max-microvolt = <1320000>;
+-				regulator-name = "vdd-gpu";
++				regulator-name = "vdd-cpu";
+ 			};
+ 
+ 			vdd_sys_dll: dcdc4 {
+diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
+index 873dbfcc7dc9..56fc339571f9 100644
+--- a/arch/arm/common/edma.c
++++ b/arch/arm/common/edma.c
+@@ -406,7 +406,8 @@ static irqreturn_t dma_irq_handler(int irq, void *data)
+ 					BIT(slot));
+ 			if (edma_cc[ctlr]->intr_data[channel].callback)
+ 				edma_cc[ctlr]->intr_data[channel].callback(
+-					channel, EDMA_DMA_COMPLETE,
++					EDMA_CTLR_CHAN(ctlr, channel),
++					EDMA_DMA_COMPLETE,
+ 					edma_cc[ctlr]->intr_data[channel].data);
+ 		}
+ 	} while (sh_ipr);
+@@ -460,7 +461,8 @@ static irqreturn_t dma_ccerr_handler(int irq, void *data)
+ 					if (edma_cc[ctlr]->intr_data[k].
+ 								callback) {
+ 						edma_cc[ctlr]->intr_data[k].
+-						callback(k,
++						callback(
++						EDMA_CTLR_CHAN(ctlr, k),
+ 						EDMA_DMA_CC_ERROR,
+ 						edma_cc[ctlr]->intr_data
+ 						[k].data);
+diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
+index 53c15dec7af6..6a9851ea6a60 100644
+--- a/arch/arm/include/asm/irq.h
++++ b/arch/arm/include/asm/irq.h
+@@ -35,6 +35,11 @@ extern void (*handle_arch_irq)(struct pt_regs *);
+ extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
+ #endif
+ 
++static inline int nr_legacy_irqs(void)
++{
++	return NR_IRQS_LEGACY;
++}
++
+ #endif
+ 
+ #endif
+diff --git a/arch/arm/mach-at91/pm_suspend.S b/arch/arm/mach-at91/pm_suspend.S
+index 0d95f488b47a..a25defda3d22 100644
+--- a/arch/arm/mach-at91/pm_suspend.S
++++ b/arch/arm/mach-at91/pm_suspend.S
+@@ -80,6 +80,8 @@ tmp2	.req	r5
+  *	@r2: base address of second SDRAM Controller or 0 if not present
+  *	@r3: pm information
+  */
++/* at91_pm_suspend_in_sram must be 8-byte aligned per the requirements of fncpy() */
++	.align 3
+ ENTRY(at91_pm_suspend_in_sram)
+ 	/* Save registers on stack */
+ 	stmfd	sp!, {r4 - r12, lr}
+diff --git a/arch/arm/mach-pxa/include/mach/pxa27x.h b/arch/arm/mach-pxa/include/mach/pxa27x.h
+index 599b925a657c..1a4291936c58 100644
+--- a/arch/arm/mach-pxa/include/mach/pxa27x.h
++++ b/arch/arm/mach-pxa/include/mach/pxa27x.h
+@@ -19,7 +19,7 @@
+ #define ARB_CORE_PARK		(1<<24)	   /* Be parked with core when idle */
+ #define ARB_LOCK_FLAG		(1<<23)	   /* Only Locking masters gain access to the bus */
+ 
+-extern int __init pxa27x_set_pwrmode(unsigned int mode);
++extern int pxa27x_set_pwrmode(unsigned int mode);
+ extern void pxa27x_cpu_pm_enter(suspend_state_t state);
+ 
+ #endif /* __MACH_PXA27x_H */
+diff --git a/arch/arm/mach-pxa/pxa27x.c b/arch/arm/mach-pxa/pxa27x.c
+index b5abdeb5bb2d..aa97547099fb 100644
+--- a/arch/arm/mach-pxa/pxa27x.c
++++ b/arch/arm/mach-pxa/pxa27x.c
+@@ -84,7 +84,7 @@ EXPORT_SYMBOL_GPL(pxa27x_configure_ac97reset);
+  */
+ static unsigned int pwrmode = PWRMODE_SLEEP;
+ 
+-int __init pxa27x_set_pwrmode(unsigned int mode)
++int pxa27x_set_pwrmode(unsigned int mode)
+ {
+ 	switch (mode) {
+ 	case PWRMODE_SLEEP:
+diff --git a/arch/arm/mach-tegra/board-paz00.c b/arch/arm/mach-tegra/board-paz00.c
+index fbe74c6806f3..49d1110cff53 100644
+--- a/arch/arm/mach-tegra/board-paz00.c
++++ b/arch/arm/mach-tegra/board-paz00.c
+@@ -39,8 +39,8 @@ static struct platform_device wifi_rfkill_device = {
+ static struct gpiod_lookup_table wifi_gpio_lookup = {
+ 	.dev_id = "rfkill_gpio",
+ 	.table = {
+-		GPIO_LOOKUP_IDX("tegra-gpio", 25, NULL, 0, 0),
+-		GPIO_LOOKUP_IDX("tegra-gpio", 85, NULL, 1, 0),
++		GPIO_LOOKUP("tegra-gpio", 25, "reset", 0),
++		GPIO_LOOKUP("tegra-gpio", 85, "shutdown", 0),
+ 		{ },
+ 	},
+ };
+diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
+index cba12f34ff77..25ecc6afec4c 100644
+--- a/arch/arm/mm/dma-mapping.c
++++ b/arch/arm/mm/dma-mapping.c
+@@ -1413,12 +1413,19 @@ static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+ 	unsigned long uaddr = vma->vm_start;
+ 	unsigned long usize = vma->vm_end - vma->vm_start;
+ 	struct page **pages = __iommu_get_pages(cpu_addr, attrs);
++	unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
++	unsigned long off = vma->vm_pgoff;
+ 
+ 	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
+ 
+ 	if (!pages)
+ 		return -ENXIO;
+ 
++	if (off >= nr_pages || (usize >> PAGE_SHIFT) > nr_pages - off)
++		return -ENXIO;
++
++	pages += off;
++
+ 	do {
+ 		int ret = vm_insert_page(vma, uaddr, *pages++);
+ 		if (ret) {
+diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
+index bbb251b14746..8b9bf54105b3 100644
+--- a/arch/arm64/include/asm/irq.h
++++ b/arch/arm64/include/asm/irq.h
+@@ -21,4 +21,9 @@ static inline void acpi_irq_init(void)
+ }
+ #define acpi_irq_init acpi_irq_init
+ 
++static inline int nr_legacy_irqs(void)
++{
++	return 0;
++}
++
+ #endif
+diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
+index d6dd9fdbc3be..d4264bb0a409 100644
+--- a/arch/arm64/include/asm/ptrace.h
++++ b/arch/arm64/include/asm/ptrace.h
+@@ -83,14 +83,14 @@
+ #define compat_sp	regs[13]
+ #define compat_lr	regs[14]
+ #define compat_sp_hyp	regs[15]
+-#define compat_sp_irq	regs[16]
+-#define compat_lr_irq	regs[17]
+-#define compat_sp_svc	regs[18]
+-#define compat_lr_svc	regs[19]
+-#define compat_sp_abt	regs[20]
+-#define compat_lr_abt	regs[21]
+-#define compat_sp_und	regs[22]
+-#define compat_lr_und	regs[23]
++#define compat_lr_irq	regs[16]
++#define compat_sp_irq	regs[17]
++#define compat_lr_svc	regs[18]
++#define compat_sp_svc	regs[19]
++#define compat_lr_abt	regs[20]
++#define compat_sp_abt	regs[21]
++#define compat_lr_und	regs[22]
++#define compat_sp_und	regs[23]
+ #define compat_r8_fiq	regs[24]
+ #define compat_r9_fiq	regs[25]
+ #define compat_r10_fiq	regs[26]
+diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
+index 98073332e2d0..4d77757b5894 100644
+--- a/arch/arm64/kernel/vmlinux.lds.S
++++ b/arch/arm64/kernel/vmlinux.lds.S
+@@ -60,9 +60,12 @@ PECOFF_FILE_ALIGNMENT = 0x200;
+ #define PECOFF_EDATA_PADDING
+ #endif
+ 
+-#ifdef CONFIG_DEBUG_ALIGN_RODATA
++#if defined(CONFIG_DEBUG_ALIGN_RODATA)
+ #define ALIGN_DEBUG_RO			. = ALIGN(1<<SECTION_SHIFT);
+ #define ALIGN_DEBUG_RO_MIN(min)		ALIGN_DEBUG_RO
++#elif defined(CONFIG_DEBUG_RODATA)
++#define ALIGN_DEBUG_RO			. = ALIGN(1<<PAGE_SHIFT);
++#define ALIGN_DEBUG_RO_MIN(min)		ALIGN_DEBUG_RO
+ #else
+ #define ALIGN_DEBUG_RO
+ #define ALIGN_DEBUG_RO_MIN(min)		. = ALIGN(min);
+diff --git a/arch/mips/ath79/setup.c b/arch/mips/ath79/setup.c
+index 1ba21204ebe0..9a0013703579 100644
+--- a/arch/mips/ath79/setup.c
++++ b/arch/mips/ath79/setup.c
+@@ -216,9 +216,9 @@ void __init plat_mem_setup(void)
+ 					   AR71XX_RESET_SIZE);
+ 	ath79_pll_base = ioremap_nocache(AR71XX_PLL_BASE,
+ 					 AR71XX_PLL_SIZE);
++	ath79_detect_sys_type();
+ 	ath79_ddr_ctrl_init();
+ 
+-	ath79_detect_sys_type();
+ 	if (mips_machtype != ATH79_MACH_GENERIC_OF)
+ 		detect_memory_region(0, ATH79_MEM_SIZE_MIN, ATH79_MEM_SIZE_MAX);
+ 
+diff --git a/arch/mips/include/asm/cdmm.h b/arch/mips/include/asm/cdmm.h
+index 16e22ce9719f..85dc4ce401ad 100644
+--- a/arch/mips/include/asm/cdmm.h
++++ b/arch/mips/include/asm/cdmm.h
+@@ -84,6 +84,17 @@ void mips_cdmm_driver_unregister(struct mips_cdmm_driver *);
+ 	module_driver(__mips_cdmm_driver, mips_cdmm_driver_register, \
+ 			mips_cdmm_driver_unregister)
+ 
++/*
++ * builtin_mips_cdmm_driver() - Helper macro for drivers that don't do anything
++ * special in init and have no exit. This eliminates some boilerplate. Each
++ * driver may only use this macro once, and calling it replaces device_initcall
++ * (or in some cases, the legacy __initcall). This is meant to be a direct
++ * parallel of module_mips_cdmm_driver() above but without the __exit stuff that
++ * is not used for builtin cases.
++ */
++#define builtin_mips_cdmm_driver(__mips_cdmm_driver) \
++	builtin_driver(__mips_cdmm_driver, mips_cdmm_driver_register)
++
+ /* drivers/tty/mips_ejtag_fdc.c */
+ 
+ #ifdef CONFIG_MIPS_EJTAG_FDC_EARLYCON
+diff --git a/arch/mips/kvm/emulate.c b/arch/mips/kvm/emulate.c
+index d5fa3eaf39a1..41b1b090f56f 100644
+--- a/arch/mips/kvm/emulate.c
++++ b/arch/mips/kvm/emulate.c
+@@ -1581,7 +1581,7 @@ enum emulation_result kvm_mips_emulate_cache(uint32_t inst, uint32_t *opc,
+ 
+ 	base = (inst >> 21) & 0x1f;
+ 	op_inst = (inst >> 16) & 0x1f;
+-	offset = inst & 0xffff;
++	offset = (int16_t)inst;
+ 	cache = (inst >> 16) & 0x3;
+ 	op = (inst >> 18) & 0x7;
+ 
+diff --git a/arch/mips/kvm/locore.S b/arch/mips/kvm/locore.S
+index c567240386a0..d1ee95a7f7dd 100644
+--- a/arch/mips/kvm/locore.S
++++ b/arch/mips/kvm/locore.S
+@@ -165,9 +165,11 @@ FEXPORT(__kvm_mips_vcpu_run)
+ 
+ FEXPORT(__kvm_mips_load_asid)
+ 	/* Set the ASID for the Guest Kernel */
+-	INT_SLL	t0, t0, 1	/* with kseg0 @ 0x40000000, kernel */
+-			        /* addresses shift to 0x80000000 */
+-	bltz	t0, 1f		/* If kernel */
++	PTR_L	t0, VCPU_COP0(k1)
++	LONG_L	t0, COP0_STATUS(t0)
++	andi	t0, KSU_USER | ST0_ERL | ST0_EXL
++	xori	t0, KSU_USER
++	bnez	t0, 1f		/* If kernel */
+ 	 INT_ADDIU t1, k1, VCPU_GUEST_KERNEL_ASID  /* (BD)  */
+ 	INT_ADDIU t1, k1, VCPU_GUEST_USER_ASID    /* else user */
+ 1:
+@@ -482,9 +484,11 @@ __kvm_mips_return_to_guest:
+ 	mtc0	t0, CP0_EPC
+ 
+ 	/* Set the ASID for the Guest Kernel */
+-	INT_SLL	t0, t0, 1	/* with kseg0 @ 0x40000000, kernel */
+-				/* addresses shift to 0x80000000 */
+-	bltz	t0, 1f		/* If kernel */
++	PTR_L	t0, VCPU_COP0(k1)
++	LONG_L	t0, COP0_STATUS(t0)
++	andi	t0, KSU_USER | ST0_ERL | ST0_EXL
++	xori	t0, KSU_USER
++	bnez	t0, 1f		/* If kernel */
+ 	 INT_ADDIU t1, k1, VCPU_GUEST_KERNEL_ASID  /* (BD)  */
+ 	INT_ADDIU t1, k1, VCPU_GUEST_USER_ASID    /* else user */
+ 1:
+diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
+index cd4c129ce743..bafb32b4c6b4 100644
+--- a/arch/mips/kvm/mips.c
++++ b/arch/mips/kvm/mips.c
+@@ -278,7 +278,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+ 
+ 	if (!gebase) {
+ 		err = -ENOMEM;
+-		goto out_free_cpu;
++		goto out_uninit_cpu;
+ 	}
+ 	kvm_debug("Allocated %d bytes for KVM Exception Handlers @ %p\n",
+ 		  ALIGN(size, PAGE_SIZE), gebase);
+@@ -342,6 +342,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+ out_free_gebase:
+ 	kfree(gebase);
+ 
++out_uninit_cpu:
++	kvm_vcpu_uninit(vcpu);
++
+ out_free_cpu:
+ 	kfree(vcpu);
+ 
+diff --git a/arch/mips/lantiq/clk.c b/arch/mips/lantiq/clk.c
+index 3fc2e6d70c77..a0706fd4ce0a 100644
+--- a/arch/mips/lantiq/clk.c
++++ b/arch/mips/lantiq/clk.c
+@@ -99,6 +99,23 @@ int clk_set_rate(struct clk *clk, unsigned long rate)
+ }
+ EXPORT_SYMBOL(clk_set_rate);
+ 
++long clk_round_rate(struct clk *clk, unsigned long rate)
++{
++	if (unlikely(!clk_good(clk)))
++		return 0;
++	if (clk->rates && *clk->rates) {
++		unsigned long *r = clk->rates;
++
++		while (*r && (*r != rate))
++			r++;
++		if (!*r) {
++			return clk->rate;
++		}
++	}
++	return rate;
++}
++EXPORT_SYMBOL(clk_round_rate);
++
+ int clk_enable(struct clk *clk)
+ {
+ 	if (unlikely(!clk_good(clk)))
+diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
+index c98d89708e99..cbee788d9625 100644
+--- a/arch/s390/kvm/interrupt.c
++++ b/arch/s390/kvm/interrupt.c
+@@ -1051,8 +1051,7 @@ static int __inject_extcall(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
+ 				   src_id, 0, 2);
+ 
+ 	/* sending vcpu invalid */
+-	if (src_id >= KVM_MAX_VCPUS ||
+-	    kvm_get_vcpu(vcpu->kvm, src_id) == NULL)
++	if (kvm_get_vcpu_by_id(vcpu->kvm, src_id) == NULL)
+ 		return -EINVAL;
+ 
+ 	if (sclp.has_sigpif)
+@@ -1131,6 +1130,10 @@ static int __inject_sigp_emergency(struct kvm_vcpu *vcpu,
+ 	trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_INT_EMERGENCY,
+ 				   irq->u.emerg.code, 0, 2);
+ 
++	/* sending vcpu invalid */
++	if (kvm_get_vcpu_by_id(vcpu->kvm, irq->u.emerg.code) == NULL)
++		return -EINVAL;
++
+ 	set_bit(irq->u.emerg.code, li->sigp_emerg_pending);
+ 	set_bit(IRQ_PEND_EXT_EMERGENCY, &li->pending_irqs);
+ 	atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
+diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
+index f32f843a3631..4a001c1b5a1a 100644
+--- a/arch/s390/kvm/kvm-s390.c
++++ b/arch/s390/kvm/kvm-s390.c
+@@ -289,12 +289,16 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+ 		r = 0;
+ 		break;
+ 	case KVM_CAP_S390_VECTOR_REGISTERS:
+-		if (MACHINE_HAS_VX) {
++		mutex_lock(&kvm->lock);
++		if (atomic_read(&kvm->online_vcpus)) {
++			r = -EBUSY;
++		} else if (MACHINE_HAS_VX) {
+ 			set_kvm_facility(kvm->arch.model.fac->mask, 129);
+ 			set_kvm_facility(kvm->arch.model.fac->list, 129);
+ 			r = 0;
+ 		} else
+ 			r = -EINVAL;
++		mutex_unlock(&kvm->lock);
+ 		break;
+ 	case KVM_CAP_S390_USER_STSI:
+ 		kvm->arch.user_stsi = 1;
+@@ -1037,7 +1041,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+ 	if (!kvm->arch.sca)
+ 		goto out_err;
+ 	spin_lock(&kvm_lock);
+-	sca_offset = (sca_offset + 16) & 0x7f0;
++	sca_offset += 16;
++	if (sca_offset + sizeof(struct sca_block) > PAGE_SIZE)
++		sca_offset = 0;
+ 	kvm->arch.sca = (struct sca_block *) ((char *) kvm->arch.sca + sca_offset);
+ 	spin_unlock(&kvm_lock);
+ 
+diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
+index 72e58bd2bee7..7171056fc24d 100644
+--- a/arch/s390/kvm/sigp.c
++++ b/arch/s390/kvm/sigp.c
+@@ -294,12 +294,8 @@ static int handle_sigp_dst(struct kvm_vcpu *vcpu, u8 order_code,
+ 			   u16 cpu_addr, u32 parameter, u64 *status_reg)
+ {
+ 	int rc;
+-	struct kvm_vcpu *dst_vcpu;
++	struct kvm_vcpu *dst_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, cpu_addr);
+ 
+-	if (cpu_addr >= KVM_MAX_VCPUS)
+-		return SIGP_CC_NOT_OPERATIONAL;
+-
+-	dst_vcpu = kvm_get_vcpu(vcpu->kvm, cpu_addr);
+ 	if (!dst_vcpu)
+ 		return SIGP_CC_NOT_OPERATIONAL;
+ 
+@@ -481,7 +477,7 @@ int kvm_s390_handle_sigp_pei(struct kvm_vcpu *vcpu)
+ 	trace_kvm_s390_handle_sigp_pei(vcpu, order_code, cpu_addr);
+ 
+ 	if (order_code == SIGP_EXTERNAL_CALL) {
+-		dest_vcpu = kvm_get_vcpu(vcpu->kvm, cpu_addr);
++		dest_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, cpu_addr);
+ 		BUG_ON(dest_vcpu == NULL);
+ 
+ 		kvm_s390_vcpu_wakeup(dest_vcpu);
+diff --git a/arch/tile/kernel/usb.c b/arch/tile/kernel/usb.c
+index f0da5a237e94..9f1e05e12255 100644
+--- a/arch/tile/kernel/usb.c
++++ b/arch/tile/kernel/usb.c
+@@ -22,6 +22,7 @@
+ #include <linux/platform_device.h>
+ #include <linux/usb/tilegx.h>
+ #include <linux/init.h>
++#include <linux/module.h>
+ #include <linux/types.h>
+ 
+ static u64 ehci_dmamask = DMA_BIT_MASK(32);
+diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h
+index ccffa53750a8..39bcefc20de7 100644
+--- a/arch/x86/include/asm/i8259.h
++++ b/arch/x86/include/asm/i8259.h
+@@ -60,6 +60,7 @@ struct legacy_pic {
+ 	void (*mask_all)(void);
+ 	void (*restore_mask)(void);
+ 	void (*init)(int auto_eoi);
++	int (*probe)(void);
+ 	int (*irq_pending)(unsigned int irq);
+ 	void (*make_irq)(unsigned int irq);
+ };
+diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
+index e16466ec473c..e9cd7befcb76 100644
+--- a/arch/x86/include/asm/kvm_emulate.h
++++ b/arch/x86/include/asm/kvm_emulate.h
+@@ -112,6 +112,16 @@ struct x86_emulate_ops {
+ 			struct x86_exception *fault);
+ 
+ 	/*
++	 * read_phys: Read bytes of standard (non-emulated/special) memory.
++	 *            Used for descriptor reading.
++	 *  @addr:  [IN ] Physical address from which to read.
++	 *  @val:   [OUT] Value read from memory.
++	 *  @bytes: [IN ] Number of bytes to read from memory.
++	 */
++	int (*read_phys)(struct x86_emulate_ctxt *ctxt, unsigned long addr,
++			void *val, unsigned int bytes);
++
++	/*
+ 	 * write_std: Write bytes of standard (non-emulated/special) memory.
+ 	 *            Used for descriptor writing.
+ 	 *  @addr:  [IN ] Linear address to which to write.
+diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
+index b5d7640abc5d..8a4add8e4639 100644
+--- a/arch/x86/include/uapi/asm/svm.h
++++ b/arch/x86/include/uapi/asm/svm.h
+@@ -100,6 +100,7 @@
+ 	{ SVM_EXIT_EXCP_BASE + UD_VECTOR,       "UD excp" }, \
+ 	{ SVM_EXIT_EXCP_BASE + PF_VECTOR,       "PF excp" }, \
+ 	{ SVM_EXIT_EXCP_BASE + NM_VECTOR,       "NM excp" }, \
++	{ SVM_EXIT_EXCP_BASE + AC_VECTOR,       "AC excp" }, \
+ 	{ SVM_EXIT_EXCP_BASE + MC_VECTOR,       "MC excp" }, \
+ 	{ SVM_EXIT_INTR,        "interrupt" }, \
+ 	{ SVM_EXIT_NMI,         "nmi" }, \
+diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
+index 2683f36e4e0a..ea4ba83ca0cf 100644
+--- a/arch/x86/kernel/apic/vector.c
++++ b/arch/x86/kernel/apic/vector.c
+@@ -360,7 +360,11 @@ int __init arch_probe_nr_irqs(void)
+ 	if (nr < nr_irqs)
+ 		nr_irqs = nr;
+ 
+-	return nr_legacy_irqs();
++	/*
++	 * We don't know if PIC is present at this point so we need to do
++	 * probe() to get the right number of legacy IRQs.
++	 */
++	return legacy_pic->probe();
+ }
+ 
+ #ifdef	CONFIG_X86_IO_APIC
+diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
+index cb9e5df42dd2..e4f929d97c42 100644
+--- a/arch/x86/kernel/cpu/common.c
++++ b/arch/x86/kernel/cpu/common.c
+@@ -272,10 +272,9 @@ __setup("nosmap", setup_disable_smap);
+ 
+ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
+ {
+-	unsigned long eflags;
++	unsigned long eflags = native_save_fl();
+ 
+ 	/* This should have been cleared long ago */
+-	raw_local_save_flags(eflags);
+ 	BUG_ON(eflags & X86_EFLAGS_AC);
+ 
+ 	if (cpu_has(c, X86_FEATURE_SMAP)) {
+diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
+index 50ec9af1bd51..6545e6ddbfb1 100644
+--- a/arch/x86/kernel/fpu/signal.c
++++ b/arch/x86/kernel/fpu/signal.c
+@@ -385,20 +385,19 @@ fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
+  */
+ void fpu__init_prepare_fx_sw_frame(void)
+ {
+-	int fsave_header_size = sizeof(struct fregs_state);
+ 	int size = xstate_size + FP_XSTATE_MAGIC2_SIZE;
+ 
+-	if (config_enabled(CONFIG_X86_32))
+-		size += fsave_header_size;
+-
+ 	fx_sw_reserved.magic1 = FP_XSTATE_MAGIC1;
+ 	fx_sw_reserved.extended_size = size;
+ 	fx_sw_reserved.xfeatures = xfeatures_mask;
+ 	fx_sw_reserved.xstate_size = xstate_size;
+ 
+-	if (config_enabled(CONFIG_IA32_EMULATION)) {
++	if (config_enabled(CONFIG_IA32_EMULATION) ||
++	    config_enabled(CONFIG_X86_32)) {
++		int fsave_header_size = sizeof(struct fregs_state);
++
+ 		fx_sw_reserved_ia32 = fx_sw_reserved;
+-		fx_sw_reserved_ia32.extended_size += fsave_header_size;
++		fx_sw_reserved_ia32.extended_size = size + fsave_header_size;
+ 	}
+ }
+ 
+diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
+index 62fc001c7846..2c4ac072a702 100644
+--- a/arch/x86/kernel/fpu/xstate.c
++++ b/arch/x86/kernel/fpu/xstate.c
+@@ -402,7 +402,6 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
+ 	if (!boot_cpu_has(X86_FEATURE_XSAVE))
+ 		return NULL;
+ 
+-	xsave = &current->thread.fpu.state.xsave;
+ 	/*
+ 	 * We should not ever be requesting features that we
+ 	 * have not enabled.  Remember that pcntxt_mask is
+diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
+index 1d40ca8a73f2..ffdc0e860390 100644
+--- a/arch/x86/kernel/head_64.S
++++ b/arch/x86/kernel/head_64.S
+@@ -65,6 +65,9 @@ startup_64:
+ 	 * tables and then reload them.
+ 	 */
+ 
++	/* Sanitize CPU configuration */
++	call verify_cpu
++
+ 	/*
+ 	 * Compute the delta between the address I am compiled to run at and the
+ 	 * address I am actually running at.
+@@ -174,6 +177,9 @@ ENTRY(secondary_startup_64)
+ 	 * after the boot processor executes this code.
+ 	 */
+ 
++	/* Sanitize CPU configuration */
++	call verify_cpu
++
+ 	movq	$(init_level4_pgt - __START_KERNEL_map), %rax
+ 1:
+ 
+@@ -288,6 +294,8 @@ ENTRY(secondary_startup_64)
+ 	pushq	%rax		# target address in negative space
+ 	lretq
+ 
++#include "verify_cpu.S"
++
+ #ifdef CONFIG_HOTPLUG_CPU
+ /*
+  * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
+diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
+index 16cb827a5b27..be22f5a2192e 100644
+--- a/arch/x86/kernel/i8259.c
++++ b/arch/x86/kernel/i8259.c
+@@ -295,16 +295,11 @@ static void unmask_8259A(void)
+ 	raw_spin_unlock_irqrestore(&i8259A_lock, flags);
+ }
+ 
+-static void init_8259A(int auto_eoi)
++static int probe_8259A(void)
+ {
+ 	unsigned long flags;
+ 	unsigned char probe_val = ~(1 << PIC_CASCADE_IR);
+ 	unsigned char new_val;
+-
+-	i8259A_auto_eoi = auto_eoi;
+-
+-	raw_spin_lock_irqsave(&i8259A_lock, flags);
+-
+ 	/*
+ 	 * Check to see if we have a PIC.
+ 	 * Mask all except the cascade and read
+@@ -312,16 +307,28 @@ static void init_8259A(int auto_eoi)
+ 	 * have a PIC, we will read 0xff as opposed to the
+ 	 * value we wrote.
+ 	 */
++	raw_spin_lock_irqsave(&i8259A_lock, flags);
++
+ 	outb(0xff, PIC_SLAVE_IMR);	/* mask all of 8259A-2 */
+ 	outb(probe_val, PIC_MASTER_IMR);
+ 	new_val = inb(PIC_MASTER_IMR);
+ 	if (new_val != probe_val) {
+ 		printk(KERN_INFO "Using NULL legacy PIC\n");
+ 		legacy_pic = &null_legacy_pic;
+-		raw_spin_unlock_irqrestore(&i8259A_lock, flags);
+-		return;
+ 	}
+ 
++	raw_spin_unlock_irqrestore(&i8259A_lock, flags);
++	return nr_legacy_irqs();
++}
++
++static void init_8259A(int auto_eoi)
++{
++	unsigned long flags;
++
++	i8259A_auto_eoi = auto_eoi;
++
++	raw_spin_lock_irqsave(&i8259A_lock, flags);
++
+ 	outb(0xff, PIC_MASTER_IMR);	/* mask all of 8259A-1 */
+ 
+ 	/*
+@@ -379,6 +386,10 @@ static int legacy_pic_irq_pending_noop(unsigned int irq)
+ {
+ 	return 0;
+ }
++static int legacy_pic_probe(void)
++{
++	return 0;
++}
+ 
+ struct legacy_pic null_legacy_pic = {
+ 	.nr_legacy_irqs = 0,
+@@ -388,6 +399,7 @@ struct legacy_pic null_legacy_pic = {
+ 	.mask_all = legacy_pic_noop,
+ 	.restore_mask = legacy_pic_noop,
+ 	.init = legacy_pic_int_noop,
++	.probe = legacy_pic_probe,
+ 	.irq_pending = legacy_pic_irq_pending_noop,
+ 	.make_irq = legacy_pic_uint_noop,
+ };
+@@ -400,6 +412,7 @@ struct legacy_pic default_legacy_pic = {
+ 	.mask_all = mask_8259A,
+ 	.restore_mask = unmask_8259A,
+ 	.init = init_8259A,
++	.probe = probe_8259A,
+ 	.irq_pending = i8259A_irq_pending,
+ 	.make_irq = make_8259A_irq,
+ };
+diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
+index 80f874bf999e..1e6f70f1f251 100644
+--- a/arch/x86/kernel/setup.c
++++ b/arch/x86/kernel/setup.c
+@@ -1198,6 +1198,14 @@ void __init setup_arch(char **cmdline_p)
+ 	clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
+ 			swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
+ 			KERNEL_PGD_PTRS);
++
++	/*
++	 * sync back low identity map too.  It is used for example
++	 * in the 32-bit EFI stub.
++	 */
++	clone_pgd_range(initial_page_table,
++			swapper_pg_dir     + KERNEL_PGD_BOUNDARY,
++			min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
+ #endif
+ 
+ 	tboot_probe();
+diff --git a/arch/x86/kernel/verify_cpu.S b/arch/x86/kernel/verify_cpu.S
+index b9242bacbe59..4cf401f581e7 100644
+--- a/arch/x86/kernel/verify_cpu.S
++++ b/arch/x86/kernel/verify_cpu.S
+@@ -34,10 +34,11 @@
+ #include <asm/msr-index.h>
+ 
+ verify_cpu:
+-	pushfl				# Save caller passed flags
+-	pushl	$0			# Kill any dangerous flags
+-	popfl
++	pushf				# Save caller passed flags
++	push	$0			# Kill any dangerous flags
++	popf
+ 
++#ifndef __x86_64__
+ 	pushfl				# standard way to check for cpuid
+ 	popl	%eax
+ 	movl	%eax,%ebx
+@@ -48,6 +49,7 @@ verify_cpu:
+ 	popl	%eax
+ 	cmpl	%eax,%ebx
+ 	jz	verify_cpu_no_longmode	# cpu has no cpuid
++#endif
+ 
+ 	movl	$0x0,%eax		# See if cpuid 1 is implemented
+ 	cpuid
+@@ -130,10 +132,10 @@ verify_cpu_sse_test:
+ 	jmp	verify_cpu_sse_test	# try again
+ 
+ verify_cpu_no_longmode:
+-	popfl				# Restore caller passed flags
++	popf				# Restore caller passed flags
+ 	movl $1,%eax
+ 	ret
+ verify_cpu_sse_ok:
+-	popfl				# Restore caller passed flags
++	popf				# Restore caller passed flags
+ 	xorl %eax, %eax
+ 	ret
+diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
+index 2392541a96e6..f17c342355f6 100644
+--- a/arch/x86/kvm/emulate.c
++++ b/arch/x86/kvm/emulate.c
+@@ -2272,8 +2272,8 @@ static int emulator_has_longmode(struct x86_emulate_ctxt *ctxt)
+ #define GET_SMSTATE(type, smbase, offset)				  \
+ 	({								  \
+ 	 type __val;							  \
+-	 int r = ctxt->ops->read_std(ctxt, smbase + offset, &__val,       \
+-				     sizeof(__val), NULL);		  \
++	 int r = ctxt->ops->read_phys(ctxt, smbase + offset, &__val,      \
++				      sizeof(__val));			  \
+ 	 if (r != X86EMUL_CONTINUE)					  \
+ 		 return X86EMUL_UNHANDLEABLE;				  \
+ 	 __val;								  \
+@@ -2484,17 +2484,36 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
+ 
+ 	/*
+ 	 * Get back to real mode, to prepare a safe state in which to load
+-	 * CR0/CR3/CR4/EFER.  Also this will ensure that addresses passed
+-	 * to read_std/write_std are not virtual.
+-	 *
+-	 * CR4.PCIDE must be zero, because it is a 64-bit mode only feature.
++	 * CR0/CR3/CR4/EFER.  It's all a bit more complicated if the vCPU
++	 * supports long mode.
+ 	 */
++	cr4 = ctxt->ops->get_cr(ctxt, 4);
++	if (emulator_has_longmode(ctxt)) {
++		struct desc_struct cs_desc;
++
++		/* Zero CR4.PCIDE before CR0.PG.  */
++		if (cr4 & X86_CR4_PCIDE) {
++			ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PCIDE);
++			cr4 &= ~X86_CR4_PCIDE;
++		}
++
++		/* A 32-bit code segment is required to clear EFER.LMA.  */
++		memset(&cs_desc, 0, sizeof(cs_desc));
++		cs_desc.type = 0xb;
++		cs_desc.s = cs_desc.g = cs_desc.p = 1;
++		ctxt->ops->set_segment(ctxt, 0, &cs_desc, 0, VCPU_SREG_CS);
++	}
++
++	/* For the 64-bit case, this will clear EFER.LMA.  */
+ 	cr0 = ctxt->ops->get_cr(ctxt, 0);
+ 	if (cr0 & X86_CR0_PE)
+ 		ctxt->ops->set_cr(ctxt, 0, cr0 & ~(X86_CR0_PG | X86_CR0_PE));
+-	cr4 = ctxt->ops->get_cr(ctxt, 4);
++
++	/* Now clear CR4.PAE (which must be done before clearing EFER.LME).  */
+ 	if (cr4 & X86_CR4_PAE)
+ 		ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PAE);
++
++	/* And finally go back to 32-bit mode.  */
+ 	efer = 0;
+ 	ctxt->ops->set_msr(ctxt, MSR_EFER, efer);
+ 
+@@ -4455,7 +4474,7 @@ static const struct opcode twobyte_table[256] = {
+ 	F(DstMem | SrcReg | Src2CL | ModRM, em_shld), N, N,
+ 	/* 0xA8 - 0xAF */
+ 	I(Stack | Src2GS, em_push_sreg), I(Stack | Src2GS, em_pop_sreg),
+-	II(No64 | EmulateOnUD | ImplicitOps, em_rsm, rsm),
++	II(EmulateOnUD | ImplicitOps, em_rsm, rsm),
+ 	F(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_bts),
+ 	F(DstMem | SrcReg | Src2ImmByte | ModRM, em_shrd),
+ 	F(DstMem | SrcReg | Src2CL | ModRM, em_shrd),
+diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
+index 2a5ca97c263b..236e346584c3 100644
+--- a/arch/x86/kvm/lapic.c
++++ b/arch/x86/kvm/lapic.c
+@@ -348,6 +348,8 @@ void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
+ 	struct kvm_lapic *apic = vcpu->arch.apic;
+ 
+ 	__kvm_apic_update_irr(pir, apic->regs);
++
++	kvm_make_request(KVM_REQ_EVENT, vcpu);
+ }
+ EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
+ 
+diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
+index 2d32b67a1043..00da6e85a27f 100644
+--- a/arch/x86/kvm/svm.c
++++ b/arch/x86/kvm/svm.c
+@@ -1085,7 +1085,7 @@ static u64 svm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+ 	return target_tsc - tsc;
+ }
+ 
+-static void init_vmcb(struct vcpu_svm *svm, bool init_event)
++static void init_vmcb(struct vcpu_svm *svm)
+ {
+ 	struct vmcb_control_area *control = &svm->vmcb->control;
+ 	struct vmcb_save_area *save = &svm->vmcb->save;
+@@ -1106,6 +1106,7 @@ static void init_vmcb(struct vcpu_svm *svm, bool init_event)
+ 	set_exception_intercept(svm, PF_VECTOR);
+ 	set_exception_intercept(svm, UD_VECTOR);
+ 	set_exception_intercept(svm, MC_VECTOR);
++	set_exception_intercept(svm, AC_VECTOR);
+ 
+ 	set_intercept(svm, INTERCEPT_INTR);
+ 	set_intercept(svm, INTERCEPT_NMI);
+@@ -1156,8 +1157,7 @@ static void init_vmcb(struct vcpu_svm *svm, bool init_event)
+ 	init_sys_seg(&save->ldtr, SEG_TYPE_LDT);
+ 	init_sys_seg(&save->tr, SEG_TYPE_BUSY_TSS16);
+ 
+-	if (!init_event)
+-		svm_set_efer(&svm->vcpu, 0);
++	svm_set_efer(&svm->vcpu, 0);
+ 	save->dr6 = 0xffff0ff0;
+ 	kvm_set_rflags(&svm->vcpu, 2);
+ 	save->rip = 0x0000fff0;
+@@ -1211,7 +1211,7 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+ 		if (kvm_vcpu_is_reset_bsp(&svm->vcpu))
+ 			svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
+ 	}
+-	init_vmcb(svm, init_event);
++	init_vmcb(svm);
+ 
+ 	kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
+ 	kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
+@@ -1267,7 +1267,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
+ 	clear_page(svm->vmcb);
+ 	svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
+ 	svm->asid_generation = 0;
+-	init_vmcb(svm, false);
++	init_vmcb(svm);
+ 
+ 	svm_init_osvw(&svm->vcpu);
+ 
+@@ -1795,6 +1795,12 @@ static int ud_interception(struct vcpu_svm *svm)
+ 	return 1;
+ }
+ 
++static int ac_interception(struct vcpu_svm *svm)
++{
++	kvm_queue_exception_e(&svm->vcpu, AC_VECTOR, 0);
++	return 1;
++}
++
+ static void svm_fpu_activate(struct kvm_vcpu *vcpu)
+ {
+ 	struct vcpu_svm *svm = to_svm(vcpu);
+@@ -1889,7 +1895,7 @@ static int shutdown_interception(struct vcpu_svm *svm)
+ 	 * so reinitialize it.
+ 	 */
+ 	clear_page(svm->vmcb);
+-	init_vmcb(svm, false);
++	init_vmcb(svm);
+ 
+ 	kvm_run->exit_reason = KVM_EXIT_SHUTDOWN;
+ 	return 0;
+@@ -3369,6 +3375,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
+ 	[SVM_EXIT_EXCP_BASE + PF_VECTOR]	= pf_interception,
+ 	[SVM_EXIT_EXCP_BASE + NM_VECTOR]	= nm_interception,
+ 	[SVM_EXIT_EXCP_BASE + MC_VECTOR]	= mc_interception,
++	[SVM_EXIT_EXCP_BASE + AC_VECTOR]	= ac_interception,
+ 	[SVM_EXIT_INTR]				= intr_interception,
+ 	[SVM_EXIT_NMI]				= nmi_interception,
+ 	[SVM_EXIT_SMI]				= nop_on_interception,
+diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
+index aa9e8229571d..e77d75b8772a 100644
+--- a/arch/x86/kvm/vmx.c
++++ b/arch/x86/kvm/vmx.c
+@@ -1567,7 +1567,7 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
+ 	u32 eb;
+ 
+ 	eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR) |
+-	     (1u << NM_VECTOR) | (1u << DB_VECTOR);
++	     (1u << NM_VECTOR) | (1u << DB_VECTOR) | (1u << AC_VECTOR);
+ 	if ((vcpu->guest_debug &
+ 	     (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP)) ==
+ 	    (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP))
+@@ -4780,8 +4780,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+ 	vmx_set_cr0(vcpu, cr0); /* enter rmode */
+ 	vmx->vcpu.arch.cr0 = cr0;
+ 	vmx_set_cr4(vcpu, 0);
+-	if (!init_event)
+-		vmx_set_efer(vcpu, 0);
++	vmx_set_efer(vcpu, 0);
+ 	vmx_fpu_activate(vcpu);
+ 	update_exception_bitmap(vcpu);
+ 
+@@ -5118,6 +5117,9 @@ static int handle_exception(struct kvm_vcpu *vcpu)
+ 		return handle_rmode_exception(vcpu, ex_no, error_code);
+ 
+ 	switch (ex_no) {
++	case AC_VECTOR:
++		kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
++		return 1;
+ 	case DB_VECTOR:
+ 		dr6 = vmcs_readl(EXIT_QUALIFICATION);
+ 		if (!(vcpu->guest_debug &
+diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
+index 373328b71599..2781e2b0201d 100644
+--- a/arch/x86/kvm/x86.c
++++ b/arch/x86/kvm/x86.c
+@@ -621,7 +621,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+ 	if ((cr0 ^ old_cr0) & update_bits)
+ 		kvm_mmu_reset_context(vcpu);
+ 
+-	if ((cr0 ^ old_cr0) & X86_CR0_CD)
++	if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
++	    kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
++	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
+ 		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
+ 
+ 	return 0;
+@@ -4260,6 +4262,15 @@ static int kvm_read_guest_virt_system(struct x86_emulate_ctxt *ctxt,
+ 	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, exception);
+ }
+ 
++static int kvm_read_guest_phys_system(struct x86_emulate_ctxt *ctxt,
++		unsigned long addr, void *val, unsigned int bytes)
++{
++	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
++	int r = kvm_vcpu_read_guest(vcpu, addr, val, bytes);
++
++	return r < 0 ? X86EMUL_IO_NEEDED : X86EMUL_CONTINUE;
++}
++
+ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
+ 				       gva_t addr, void *val,
+ 				       unsigned int bytes,
+@@ -4995,6 +5006,7 @@ static const struct x86_emulate_ops emulate_ops = {
+ 	.write_gpr           = emulator_write_gpr,
+ 	.read_std            = kvm_read_guest_virt_system,
+ 	.write_std           = kvm_write_guest_virt_system,
++	.read_phys           = kvm_read_guest_phys_system,
+ 	.fetch               = kvm_fetch_guest_virt,
+ 	.read_emulated       = emulator_read_emulated,
+ 	.write_emulated      = emulator_write_emulated,
+diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
+index db1b0bc5017c..c28f6185f8a4 100644
+--- a/arch/x86/mm/mpx.c
++++ b/arch/x86/mm/mpx.c
+@@ -622,6 +622,29 @@ static unsigned long mpx_bd_entry_to_bt_addr(struct mm_struct *mm,
+ }
+ 
+ /*
++ * We only want to do a 4-byte get_user() on 32-bit.  Otherwise,
++ * we might run off the end of the bounds table if we are on
++ * a 64-bit kernel and try to get 8 bytes.
++ */
++int get_user_bd_entry(struct mm_struct *mm, unsigned long *bd_entry_ret,
++		long __user *bd_entry_ptr)
++{
++	u32 bd_entry_32;
++	int ret;
++
++	if (is_64bit_mm(mm))
++		return get_user(*bd_entry_ret, bd_entry_ptr);
++
++	/*
++	 * Note that get_user() uses the type of the *pointer* to
++	 * establish the size of the get, not the destination.
++	 */
++	ret = get_user(bd_entry_32, (u32 __user *)bd_entry_ptr);
++	*bd_entry_ret = bd_entry_32;
++	return ret;
++}
++
++/*
+  * Get the base of bounds tables pointed by specific bounds
+  * directory entry.
+  */
+@@ -641,7 +664,7 @@ static int get_bt_addr(struct mm_struct *mm,
+ 		int need_write = 0;
+ 
+ 		pagefault_disable();
+-		ret = get_user(bd_entry, bd_entry_ptr);
++		ret = get_user_bd_entry(mm, &bd_entry, bd_entry_ptr);
+ 		pagefault_enable();
+ 		if (!ret)
+ 			break;
+@@ -736,11 +759,23 @@ static unsigned long mpx_get_bt_entry_offset_bytes(struct mm_struct *mm,
+  */
+ static inline unsigned long bd_entry_virt_space(struct mm_struct *mm)
+ {
+-	unsigned long long virt_space = (1ULL << boot_cpu_data.x86_virt_bits);
+-	if (is_64bit_mm(mm))
+-		return virt_space / MPX_BD_NR_ENTRIES_64;
+-	else
+-		return virt_space / MPX_BD_NR_ENTRIES_32;
++	unsigned long long virt_space;
++	unsigned long long GB = (1ULL << 30);
++
++	/*
++	 * This covers 32-bit emulation as well as 32-bit kernels
++	 * running on 64-bit hardware.
++	 */
++	if (!is_64bit_mm(mm))
++		return (4ULL * GB) / MPX_BD_NR_ENTRIES_32;
++
++	/*
++	 * 'x86_virt_bits' returns what the hardware is capable
++	 * of, and returns the full >32-bit address space when
++	 * running 32-bit kernels on 64-bit hardware.
++	 */
++	virt_space = (1ULL << boot_cpu_data.x86_virt_bits);
++	return virt_space / MPX_BD_NR_ENTRIES_64;
+ }
+ 
+ /*
+diff --git a/drivers/bluetooth/ath3k.c b/drivers/bluetooth/ath3k.c
+index e527a3e13939..fa893c3ec408 100644
+--- a/drivers/bluetooth/ath3k.c
++++ b/drivers/bluetooth/ath3k.c
+@@ -93,6 +93,7 @@ static const struct usb_device_id ath3k_table[] = {
+ 	{ USB_DEVICE(0x04CA, 0x300f) },
+ 	{ USB_DEVICE(0x04CA, 0x3010) },
+ 	{ USB_DEVICE(0x0930, 0x0219) },
++	{ USB_DEVICE(0x0930, 0x021c) },
+ 	{ USB_DEVICE(0x0930, 0x0220) },
+ 	{ USB_DEVICE(0x0930, 0x0227) },
+ 	{ USB_DEVICE(0x0b05, 0x17d0) },
+@@ -104,6 +105,7 @@ static const struct usb_device_id ath3k_table[] = {
+ 	{ USB_DEVICE(0x0CF3, 0x311F) },
+ 	{ USB_DEVICE(0x0cf3, 0x3121) },
+ 	{ USB_DEVICE(0x0CF3, 0x817a) },
++	{ USB_DEVICE(0x0CF3, 0x817b) },
+ 	{ USB_DEVICE(0x0cf3, 0xe003) },
+ 	{ USB_DEVICE(0x0CF3, 0xE004) },
+ 	{ USB_DEVICE(0x0CF3, 0xE005) },
+@@ -153,6 +155,7 @@ static const struct usb_device_id ath3k_blist_tbl[] = {
+ 	{ USB_DEVICE(0x04ca, 0x300f), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x04ca, 0x3010), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0930, 0x0219), .driver_info = BTUSB_ATH3012 },
++	{ USB_DEVICE(0x0930, 0x021c), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0930, 0x0220), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0930, 0x0227), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0b05, 0x17d0), .driver_info = BTUSB_ATH3012 },
+@@ -164,6 +167,7 @@ static const struct usb_device_id ath3k_blist_tbl[] = {
+ 	{ USB_DEVICE(0x0cf3, 0x311F), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0cf3, 0x3121), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0CF3, 0x817a), .driver_info = BTUSB_ATH3012 },
++	{ USB_DEVICE(0x0CF3, 0x817b), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0cf3, 0xe004), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0cf3, 0xe005), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0cf3, 0xe006), .driver_info = BTUSB_ATH3012 },
+diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
+index b4cf8d9c9dac..7d9b09f4158c 100644
+--- a/drivers/bluetooth/btusb.c
++++ b/drivers/bluetooth/btusb.c
+@@ -192,6 +192,7 @@ static const struct usb_device_id blacklist_table[] = {
+ 	{ USB_DEVICE(0x04ca, 0x300f), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x04ca, 0x3010), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0930, 0x0219), .driver_info = BTUSB_ATH3012 },
++	{ USB_DEVICE(0x0930, 0x021c), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0930, 0x0220), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0930, 0x0227), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0b05, 0x17d0), .driver_info = BTUSB_ATH3012 },
+@@ -203,6 +204,7 @@ static const struct usb_device_id blacklist_table[] = {
+ 	{ USB_DEVICE(0x0cf3, 0x311f), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0cf3, 0x3121), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0cf3, 0x817a), .driver_info = BTUSB_ATH3012 },
++	{ USB_DEVICE(0x0cf3, 0x817b), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0cf3, 0xe003), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0cf3, 0xe004), .driver_info = BTUSB_ATH3012 },
+ 	{ USB_DEVICE(0x0cf3, 0xe005), .driver_info = BTUSB_ATH3012 },
+diff --git a/drivers/clk/bcm/clk-iproc-pll.c b/drivers/clk/bcm/clk-iproc-pll.c
+index 2dda4e8295a9..d679ab869653 100644
+--- a/drivers/clk/bcm/clk-iproc-pll.c
++++ b/drivers/clk/bcm/clk-iproc-pll.c
+@@ -345,8 +345,8 @@ static unsigned long iproc_pll_recalc_rate(struct clk_hw *hw,
+ 	struct iproc_pll *pll = clk->pll;
+ 	const struct iproc_pll_ctrl *ctrl = pll->ctrl;
+ 	u32 val;
+-	u64 ndiv;
+-	unsigned int ndiv_int, ndiv_frac, pdiv;
++	u64 ndiv, ndiv_int, ndiv_frac;
++	unsigned int pdiv;
+ 
+ 	if (parent_rate == 0)
+ 		return 0;
+@@ -366,22 +366,19 @@ static unsigned long iproc_pll_recalc_rate(struct clk_hw *hw,
+ 	val = readl(pll->pll_base + ctrl->ndiv_int.offset);
+ 	ndiv_int = (val >> ctrl->ndiv_int.shift) &
+ 		bit_mask(ctrl->ndiv_int.width);
+-	ndiv = (u64)ndiv_int << ctrl->ndiv_int.shift;
++	ndiv = ndiv_int << 20;
+ 
+ 	if (ctrl->flags & IPROC_CLK_PLL_HAS_NDIV_FRAC) {
+ 		val = readl(pll->pll_base + ctrl->ndiv_frac.offset);
+ 		ndiv_frac = (val >> ctrl->ndiv_frac.shift) &
+ 			bit_mask(ctrl->ndiv_frac.width);
+-
+-		if (ndiv_frac != 0)
+-			ndiv = ((u64)ndiv_int << ctrl->ndiv_int.shift) |
+-				ndiv_frac;
++		ndiv += ndiv_frac;
+ 	}
+ 
+ 	val = readl(pll->pll_base + ctrl->pdiv.offset);
+ 	pdiv = (val >> ctrl->pdiv.shift) & bit_mask(ctrl->pdiv.width);
+ 
+-	clk->rate = (ndiv * parent_rate) >> ctrl->ndiv_int.shift;
++	clk->rate = (ndiv * parent_rate) >> 20;
+ 
+ 	if (pdiv == 0)
+ 		clk->rate *= 2;
+diff --git a/drivers/clk/versatile/clk-icst.c b/drivers/clk/versatile/clk-icst.c
+index bc96f103bd7c..9064636a867f 100644
+--- a/drivers/clk/versatile/clk-icst.c
++++ b/drivers/clk/versatile/clk-icst.c
+@@ -156,8 +156,10 @@ struct clk *icst_clk_register(struct device *dev,
+ 	icst->lockreg = base + desc->lock_offset;
+ 
+ 	clk = clk_register(dev, &icst->hw);
+-	if (IS_ERR(clk))
++	if (IS_ERR(clk)) {
++		kfree(pclone);
+ 		kfree(icst);
++	}
+ 
+ 	return clk;
+ }
+diff --git a/drivers/mfd/twl6040.c b/drivers/mfd/twl6040.c
+index c5265c1262c5..6aacd205a774 100644
+--- a/drivers/mfd/twl6040.c
++++ b/drivers/mfd/twl6040.c
+@@ -647,6 +647,8 @@ static int twl6040_probe(struct i2c_client *client,
+ 
+ 	twl6040->clk32k = devm_clk_get(&client->dev, "clk32k");
+ 	if (IS_ERR(twl6040->clk32k)) {
++		if (PTR_ERR(twl6040->clk32k) == -EPROBE_DEFER)
++			return -EPROBE_DEFER;
+ 		dev_info(&client->dev, "clk32k is not handled\n");
+ 		twl6040->clk32k = NULL;
+ 	}
+diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
+index a98dd4f1b0e3..cbbb1c93386d 100644
+--- a/drivers/net/bonding/bond_main.c
++++ b/drivers/net/bonding/bond_main.c
+@@ -1751,6 +1751,7 @@ err_undo_flags:
+ 					    slave_dev->dev_addr))
+ 			eth_hw_addr_random(bond_dev);
+ 		if (bond_dev->type != ARPHRD_ETHER) {
++			dev_close(bond_dev);
+ 			ether_setup(bond_dev);
+ 			bond_dev->flags |= IFF_MASTER;
+ 			bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
+index aede704605c6..141c2a42d7ed 100644
+--- a/drivers/net/can/dev.c
++++ b/drivers/net/can/dev.c
+@@ -915,7 +915,7 @@ static int can_fill_info(struct sk_buff *skb, const struct net_device *dev)
+ 	     nla_put(skb, IFLA_CAN_BITTIMING_CONST,
+ 		     sizeof(*priv->bittiming_const), priv->bittiming_const)) ||
+ 
+-	    nla_put(skb, IFLA_CAN_CLOCK, sizeof(cm), &priv->clock) ||
++	    nla_put(skb, IFLA_CAN_CLOCK, sizeof(priv->clock), &priv->clock) ||
+ 	    nla_put_u32(skb, IFLA_CAN_STATE, state) ||
+ 	    nla_put(skb, IFLA_CAN_CTRLMODE, sizeof(cm), &cm) ||
+ 	    nla_put_u32(skb, IFLA_CAN_RESTART_MS, priv->restart_ms) ||
+diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
+index 7b92e911a616..f10834be48a5 100644
+--- a/drivers/net/can/sja1000/sja1000.c
++++ b/drivers/net/can/sja1000/sja1000.c
+@@ -218,6 +218,9 @@ static void sja1000_start(struct net_device *dev)
+ 	priv->write_reg(priv, SJA1000_RXERR, 0x0);
+ 	priv->read_reg(priv, SJA1000_ECC);
+ 
++	/* clear interrupt flags */
++	priv->read_reg(priv, SJA1000_IR);
++
+ 	/* leave reset mode */
+ 	set_normal_mode(dev);
+ }
+diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
+index a4473d8ff4fa..f672dba345f7 100644
+--- a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
++++ b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
+@@ -1595,7 +1595,7 @@ static void xgbe_dev_xmit(struct xgbe_channel *channel)
+ 				  packet->rdesc_count, 1);
+ 
+ 	/* Make sure ownership is written to the descriptor */
+-	dma_wmb();
++	smp_wmb();
+ 
+ 	ring->cur = cur_index + 1;
+ 	if (!packet->skb->xmit_more ||
+diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+index aae9d5ecd182..dde0486667e0 100644
+--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
++++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+@@ -1807,6 +1807,7 @@ static int xgbe_tx_poll(struct xgbe_channel *channel)
+ 	struct netdev_queue *txq;
+ 	int processed = 0;
+ 	unsigned int tx_packets = 0, tx_bytes = 0;
++	unsigned int cur;
+ 
+ 	DBGPR("-->xgbe_tx_poll\n");
+ 
+@@ -1814,10 +1815,15 @@ static int xgbe_tx_poll(struct xgbe_channel *channel)
+ 	if (!ring)
+ 		return 0;
+ 
++	cur = ring->cur;
++
++	/* Be sure we get ring->cur before accessing descriptor data */
++	smp_rmb();
++
+ 	txq = netdev_get_tx_queue(netdev, channel->queue_index);
+ 
+ 	while ((processed < XGBE_TX_DESC_MAX_PROC) &&
+-	       (ring->dirty != ring->cur)) {
++	       (ring->dirty != cur)) {
+ 		rdata = XGBE_GET_DESC_DATA(ring, ring->dirty);
+ 		rdesc = rdata->rdesc;
+ 
+diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
+index de63266de16b..5d1dde3f3540 100644
+--- a/drivers/net/ethernet/freescale/fec_main.c
++++ b/drivers/net/ethernet/freescale/fec_main.c
+@@ -1775,7 +1775,7 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
+ 	int ret = 0;
+ 
+ 	ret = pm_runtime_get_sync(dev);
+-	if (IS_ERR_VALUE(ret))
++	if (ret < 0)
+ 		return ret;
+ 
+ 	fep->mii_timeout = 0;
+@@ -1811,11 +1811,13 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
+ 	struct fec_enet_private *fep = bus->priv;
+ 	struct device *dev = &fep->pdev->dev;
+ 	unsigned long time_left;
+-	int ret = 0;
++	int ret;
+ 
+ 	ret = pm_runtime_get_sync(dev);
+-	if (IS_ERR_VALUE(ret))
++	if (ret < 0)
+ 		return ret;
++	else
++		ret = 0;
+ 
+ 	fep->mii_timeout = 0;
+ 	reinit_completion(&fep->mdio_done);
+@@ -2866,7 +2868,7 @@ fec_enet_open(struct net_device *ndev)
+ 	int ret;
+ 
+ 	ret = pm_runtime_get_sync(&fep->pdev->dev);
+-	if (IS_ERR_VALUE(ret))
++	if (ret < 0)
+ 		return ret;
+ 
+ 	pinctrl_pm_select_default_state(&fep->pdev->dev);
+diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
+index 09ec32e33076..7e788073c154 100644
+--- a/drivers/net/ethernet/marvell/mvneta.c
++++ b/drivers/net/ethernet/marvell/mvneta.c
+@@ -949,7 +949,7 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
+ 	/* Set CPU queue access map - all CPUs have access to all RX
+ 	 * queues and to all TX queues
+ 	 */
+-	for (cpu = 0; cpu < CONFIG_NR_CPUS; cpu++)
++	for_each_present_cpu(cpu)
+ 		mvreg_write(pp, MVNETA_CPU_MAP(cpu),
+ 			    (MVNETA_CPU_RXQ_ACCESS_ALL_MASK |
+ 			     MVNETA_CPU_TXQ_ACCESS_ALL_MASK));
+@@ -1533,12 +1533,16 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
+ 		}
+ 
+ 		skb = build_skb(data, pp->frag_size > PAGE_SIZE ? 0 : pp->frag_size);
+-		if (!skb)
+-			goto err_drop_frame;
+ 
++		/* After refill old buffer has to be unmapped regardless
++		 * the skb is successfully built or not.
++		 */
+ 		dma_unmap_single(dev->dev.parent, phys_addr,
+ 				 MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
+ 
++		if (!skb)
++			goto err_drop_frame;
++
+ 		rcvd_pkts++;
+ 		rcvd_bytes += rx_bytes;
+ 
+diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
+index 0a3202047569..2177e56ed0be 100644
+--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
++++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
+@@ -2398,7 +2398,7 @@ int mlx4_multi_func_init(struct mlx4_dev *dev)
+ 			}
+ 		}
+ 
+-		memset(&priv->mfunc.master.cmd_eqe, 0, dev->caps.eqe_size);
++		memset(&priv->mfunc.master.cmd_eqe, 0, sizeof(struct mlx4_eqe));
+ 		priv->mfunc.master.cmd_eqe.type = MLX4_EVENT_TYPE_CMD;
+ 		INIT_WORK(&priv->mfunc.master.comm_work,
+ 			  mlx4_master_comm_channel);
+diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
+index 8e81e53c370e..ad8f95df4310 100644
+--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
++++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
+@@ -196,7 +196,7 @@ static void slave_event(struct mlx4_dev *dev, u8 slave, struct mlx4_eqe *eqe)
+ 		return;
+ 	}
+ 
+-	memcpy(s_eqe, eqe, dev->caps.eqe_size - 1);
++	memcpy(s_eqe, eqe, sizeof(struct mlx4_eqe) - 1);
+ 	s_eqe->slave_id = slave;
+ 	/* ensure all information is written before setting the ownersip bit */
+ 	dma_wmb();
+diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
+index b1a4ea21c91c..4dd18f4bb5ae 100644
+--- a/drivers/net/ethernet/sfc/ef10.c
++++ b/drivers/net/ethernet/sfc/ef10.c
+@@ -1809,7 +1809,9 @@ static void efx_ef10_tx_write(struct efx_tx_queue *tx_queue)
+ 	unsigned int write_ptr;
+ 	efx_qword_t *txd;
+ 
+-	BUG_ON(tx_queue->write_count == tx_queue->insert_count);
++	tx_queue->xmit_more_available = false;
++	if (unlikely(tx_queue->write_count == tx_queue->insert_count))
++		return;
+ 
+ 	do {
+ 		write_ptr = tx_queue->write_count & tx_queue->ptr_mask;
+diff --git a/drivers/net/ethernet/sfc/farch.c b/drivers/net/ethernet/sfc/farch.c
+index f08266f0eca2..5a1c5a8f278a 100644
+--- a/drivers/net/ethernet/sfc/farch.c
++++ b/drivers/net/ethernet/sfc/farch.c
+@@ -321,7 +321,9 @@ void efx_farch_tx_write(struct efx_tx_queue *tx_queue)
+ 	unsigned write_ptr;
+ 	unsigned old_write_count = tx_queue->write_count;
+ 
+-	BUG_ON(tx_queue->write_count == tx_queue->insert_count);
++	tx_queue->xmit_more_available = false;
++	if (unlikely(tx_queue->write_count == tx_queue->insert_count))
++		return;
+ 
+ 	do {
+ 		write_ptr = tx_queue->write_count & tx_queue->ptr_mask;
+diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
+index 47d1e3a96522..b8e8ce1caf0f 100644
+--- a/drivers/net/ethernet/sfc/net_driver.h
++++ b/drivers/net/ethernet/sfc/net_driver.h
+@@ -219,6 +219,7 @@ struct efx_tx_buffer {
+  * @tso_packets: Number of packets via the TSO xmit path
+  * @pushes: Number of times the TX push feature has been used
+  * @pio_packets: Number of times the TX PIO feature has been used
++ * @xmit_more_available: Are any packets waiting to be pushed to the NIC
+  * @empty_read_count: If the completion path has seen the queue as empty
+  *	and the transmission path has not yet checked this, the value of
+  *	@read_count bitwise-added to %EFX_EMPTY_COUNT_VALID; otherwise 0.
+@@ -253,6 +254,7 @@ struct efx_tx_queue {
+ 	unsigned int tso_packets;
+ 	unsigned int pushes;
+ 	unsigned int pio_packets;
++	bool xmit_more_available;
+ 	/* Statistics to supplement MAC stats */
+ 	unsigned long tx_packets;
+ 
+diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
+index 1833a0146571..67f6afaa022f 100644
+--- a/drivers/net/ethernet/sfc/tx.c
++++ b/drivers/net/ethernet/sfc/tx.c
+@@ -431,8 +431,20 @@ finish_packet:
+ 	efx_tx_maybe_stop_queue(tx_queue);
+ 
+ 	/* Pass off to hardware */
+-	if (!skb->xmit_more || netif_xmit_stopped(tx_queue->core_txq))
++	if (!skb->xmit_more || netif_xmit_stopped(tx_queue->core_txq)) {
++		struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);
++
++		/* There could be packets left on the partner queue if those
++		 * SKBs had skb->xmit_more set. If we do not push those they
++		 * could be left for a long time and cause a netdev watchdog.
++		 */
++		if (txq2->xmit_more_available)
++			efx_nic_push_buffers(txq2);
++
+ 		efx_nic_push_buffers(tx_queue);
++	} else {
++		tx_queue->xmit_more_available = skb->xmit_more;
++	}
+ 
+ 	tx_queue->tx_packets++;
+ 
+@@ -722,6 +734,7 @@ void efx_init_tx_queue(struct efx_tx_queue *tx_queue)
+ 	tx_queue->read_count = 0;
+ 	tx_queue->old_read_count = 0;
+ 	tx_queue->empty_read_count = 0 | EFX_EMPTY_COUNT_VALID;
++	tx_queue->xmit_more_available = false;
+ 
+ 	/* Set up TX descriptor ring */
+ 	efx_nic_init_tx(tx_queue);
+@@ -747,6 +760,7 @@ void efx_fini_tx_queue(struct efx_tx_queue *tx_queue)
+ 
+ 		++tx_queue->read_count;
+ 	}
++	tx_queue->xmit_more_available = false;
+ 	netdev_tx_reset_queue(tx_queue->core_txq);
+ }
+ 
+@@ -1302,8 +1316,20 @@ static int efx_enqueue_skb_tso(struct efx_tx_queue *tx_queue,
+ 	efx_tx_maybe_stop_queue(tx_queue);
+ 
+ 	/* Pass off to hardware */
+-	if (!skb->xmit_more || netif_xmit_stopped(tx_queue->core_txq))
++	if (!skb->xmit_more || netif_xmit_stopped(tx_queue->core_txq)) {
++		struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);
++
++		/* There could be packets left on the partner queue if those
++		 * SKBs had skb->xmit_more set. If we do not push those they
++		 * could be left for a long time and cause a netdev watchdog.
++		 */
++		if (txq2->xmit_more_available)
++			efx_nic_push_buffers(txq2);
++
+ 		efx_nic_push_buffers(tx_queue);
++	} else {
++		tx_queue->xmit_more_available = skb->xmit_more;
++	}
+ 
+ 	tx_queue->tso_bursts++;
+ 	return NETDEV_TX_OK;
+diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+index 771cda2a48b2..2e51b816a7e8 100644
+--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
++++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+@@ -721,10 +721,13 @@ static int stmmac_get_ts_info(struct net_device *dev,
+ {
+ 	struct stmmac_priv *priv = netdev_priv(dev);
+ 
+-	if ((priv->hwts_tx_en) && (priv->hwts_rx_en)) {
++	if ((priv->dma_cap.time_stamp || priv->dma_cap.atime_stamp)) {
+ 
+-		info->so_timestamping = SOF_TIMESTAMPING_TX_HARDWARE |
++		info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE |
++					SOF_TIMESTAMPING_TX_HARDWARE |
++					SOF_TIMESTAMPING_RX_SOFTWARE |
+ 					SOF_TIMESTAMPING_RX_HARDWARE |
++					SOF_TIMESTAMPING_SOFTWARE |
+ 					SOF_TIMESTAMPING_RAW_HARDWARE;
+ 
+ 		if (priv->ptp_clock)
+diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
+index 248478c6f6e4..197c93937c2d 100644
+--- a/drivers/net/macvtap.c
++++ b/drivers/net/macvtap.c
+@@ -137,7 +137,7 @@ static const struct proto_ops macvtap_socket_ops;
+ #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
+ 		      NETIF_F_TSO6 | NETIF_F_UFO)
+ #define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO)
+-#define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG)
++#define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG | NETIF_F_FRAGLIST)
+ 
+ static struct macvlan_dev *macvtap_get_vlan_rcu(const struct net_device *dev)
+ {
+diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
+index 2ed75060da50..5e0b43283bce 100644
+--- a/drivers/net/ppp/pppoe.c
++++ b/drivers/net/ppp/pppoe.c
+@@ -589,7 +589,7 @@ static int pppoe_release(struct socket *sock)
+ 
+ 	po = pppox_sk(sk);
+ 
+-	if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
++	if (po->pppoe_dev) {
+ 		dev_put(po->pppoe_dev);
+ 		po->pppoe_dev = NULL;
+ 	}
+diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
+index 64a60afbe50c..8f1738c3b3c5 100644
+--- a/drivers/net/usb/qmi_wwan.c
++++ b/drivers/net/usb/qmi_wwan.c
+@@ -765,6 +765,10 @@ static const struct usb_device_id products[] = {
+ 	{QMI_FIXED_INTF(0x1199, 0x9056, 8)},	/* Sierra Wireless Modem */
+ 	{QMI_FIXED_INTF(0x1199, 0x9057, 8)},
+ 	{QMI_FIXED_INTF(0x1199, 0x9061, 8)},	/* Sierra Wireless Modem */
++	{QMI_FIXED_INTF(0x1199, 0x9070, 8)},	/* Sierra Wireless MC74xx/EM74xx */
++	{QMI_FIXED_INTF(0x1199, 0x9070, 10)},	/* Sierra Wireless MC74xx/EM74xx */
++	{QMI_FIXED_INTF(0x1199, 0x9071, 8)},	/* Sierra Wireless MC74xx/EM74xx */
++	{QMI_FIXED_INTF(0x1199, 0x9071, 10)},	/* Sierra Wireless MC74xx/EM74xx */
+ 	{QMI_FIXED_INTF(0x1bbb, 0x011e, 4)},	/* Telekom Speedstick LTE II (Alcatel One Touch L100V LTE) */
+ 	{QMI_FIXED_INTF(0x1bbb, 0x0203, 2)},	/* Alcatel L800MA */
+ 	{QMI_FIXED_INTF(0x2357, 0x0201, 4)},	/* TP-LINK HSUPA Modem MA180 */
+diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
+index 0d3c474ff76d..a5ea8a984c53 100644
+--- a/drivers/net/wireless/ath/ath10k/mac.c
++++ b/drivers/net/wireless/ath/ath10k/mac.c
+@@ -2070,7 +2070,8 @@ static void ath10k_peer_assoc_h_ht(struct ath10k *ar,
+ 	enum ieee80211_band band;
+ 	const u8 *ht_mcs_mask;
+ 	const u16 *vht_mcs_mask;
+-	int i, n, max_nss;
++	int i, n;
++	u8 max_nss;
+ 	u32 stbc;
+ 
+ 	lockdep_assert_held(&ar->conf_mutex);
+@@ -2155,7 +2156,7 @@ static void ath10k_peer_assoc_h_ht(struct ath10k *ar,
+ 			arg->peer_ht_rates.rates[i] = i;
+ 	} else {
+ 		arg->peer_ht_rates.num_rates = n;
+-		arg->peer_num_spatial_streams = max_nss;
++		arg->peer_num_spatial_streams = min(sta->rx_nss, max_nss);
+ 	}
+ 
+ 	ath10k_dbg(ar, ATH10K_DBG_MAC, "mac ht peer %pM mcs cnt %d nss %d\n",
+@@ -4021,7 +4022,7 @@ static int ath10k_config(struct ieee80211_hw *hw, u32 changed)
+ 
+ static u32 get_nss_from_chainmask(u16 chain_mask)
+ {
+-	if ((chain_mask & 0x15) == 0x15)
++	if ((chain_mask & 0xf) == 0xf)
+ 		return 4;
+ 	else if ((chain_mask & 0x7) == 0x7)
+ 		return 3;
+diff --git a/drivers/net/wireless/iwlwifi/pcie/drv.c b/drivers/net/wireless/iwlwifi/pcie/drv.c
+index 865d578dee82..fd6aef7d4496 100644
+--- a/drivers/net/wireless/iwlwifi/pcie/drv.c
++++ b/drivers/net/wireless/iwlwifi/pcie/drv.c
+@@ -423,14 +423,21 @@ static const struct pci_device_id iwl_hw_card_ids[] = {
+ /* 8000 Series */
+ 	{IWL_PCI_DEVICE(0x24F3, 0x0010, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x1010, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x0130, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x1130, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x0132, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x1132, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x0110, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x01F0, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x0012, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x1012, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x1110, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x0050, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x0250, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x1050, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x0150, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x1150, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F4, 0x0030, iwl8260_2ac_cfg)},
+-	{IWL_PCI_DEVICE(0x24F4, 0x1130, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F4, 0x1030, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0xC010, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0xC110, iwl8260_2ac_cfg)},
+@@ -438,18 +445,28 @@ static const struct pci_device_id iwl_hw_card_ids[] = {
+ 	{IWL_PCI_DEVICE(0x24F3, 0xC050, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0xD050, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x8010, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x8110, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x9010, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x9110, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F4, 0x8030, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F4, 0x9030, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x8130, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x9130, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x8132, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x9132, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x8050, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x8150, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x9050, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x9150, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x0004, iwl8260_2n_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x0044, iwl8260_2n_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F5, 0x0010, iwl4165_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F6, 0x0030, iwl4165_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x0810, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x0910, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x0850, iwl8260_2ac_cfg)},
+ 	{IWL_PCI_DEVICE(0x24F3, 0x0950, iwl8260_2ac_cfg)},
++	{IWL_PCI_DEVICE(0x24F3, 0x0930, iwl8260_2ac_cfg)},
+ #endif /* CONFIG_IWLMVM */
+ 
+ 	{0}
+diff --git a/drivers/net/wireless/iwlwifi/pcie/trans.c b/drivers/net/wireless/iwlwifi/pcie/trans.c
+index 9e144e71da0b..dab9b91b3f3d 100644
+--- a/drivers/net/wireless/iwlwifi/pcie/trans.c
++++ b/drivers/net/wireless/iwlwifi/pcie/trans.c
+@@ -592,10 +592,8 @@ static int iwl_pcie_prepare_card_hw(struct iwl_trans *trans)
+ 
+ 		do {
+ 			ret = iwl_pcie_set_hw_ready(trans);
+-			if (ret >= 0) {
+-				ret = 0;
+-				goto out;
+-			}
++			if (ret >= 0)
++				return 0;
+ 
+ 			usleep_range(200, 1000);
+ 			t += 200;
+@@ -605,10 +603,6 @@ static int iwl_pcie_prepare_card_hw(struct iwl_trans *trans)
+ 
+ 	IWL_ERR(trans, "Couldn't prepare the card\n");
+ 
+-out:
+-	iwl_clear_bit(trans, CSR_DBG_LINK_PWR_MGMT_REG,
+-		      CSR_RESET_LINK_PWR_MGMT_DISABLED);
+-
+ 	return ret;
+ }
+ 
+diff --git a/drivers/net/wireless/mwifiex/debugfs.c b/drivers/net/wireless/mwifiex/debugfs.c
+index 5a0636d43a1b..5583856fc5c4 100644
+--- a/drivers/net/wireless/mwifiex/debugfs.c
++++ b/drivers/net/wireless/mwifiex/debugfs.c
+@@ -731,7 +731,7 @@ mwifiex_rdeeprom_read(struct file *file, char __user *ubuf,
+ 		(struct mwifiex_private *) file->private_data;
+ 	unsigned long addr = get_zeroed_page(GFP_KERNEL);
+ 	char *buf = (char *) addr;
+-	int pos = 0, ret = 0, i;
++	int pos, ret, i;
+ 	u8 value[MAX_EEPROM_DATA];
+ 
+ 	if (!buf)
+@@ -739,7 +739,7 @@ mwifiex_rdeeprom_read(struct file *file, char __user *ubuf,
+ 
+ 	if (saved_offset == -1) {
+ 		/* No command has been given */
+-		pos += snprintf(buf, PAGE_SIZE, "0");
++		pos = snprintf(buf, PAGE_SIZE, "0");
+ 		goto done;
+ 	}
+ 
+@@ -748,17 +748,17 @@ mwifiex_rdeeprom_read(struct file *file, char __user *ubuf,
+ 				  (u16) saved_bytes, value);
+ 	if (ret) {
+ 		ret = -EINVAL;
+-		goto done;
++		goto out_free;
+ 	}
+ 
+-	pos += snprintf(buf, PAGE_SIZE, "%d %d ", saved_offset, saved_bytes);
++	pos = snprintf(buf, PAGE_SIZE, "%d %d ", saved_offset, saved_bytes);
+ 
+ 	for (i = 0; i < saved_bytes; i++)
+-		pos += snprintf(buf + strlen(buf), PAGE_SIZE, "%d ", value[i]);
+-
+-	ret = simple_read_from_buffer(ubuf, count, ppos, buf, pos);
++		pos += scnprintf(buf + pos, PAGE_SIZE - pos, "%d ", value[i]);
+ 
+ done:
++	ret = simple_read_from_buffer(ubuf, count, ppos, buf, pos);
++out_free:
+ 	free_page(addr);
+ 	return ret;
+ }
+diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+index a9c9a077c77d..bc3d907fd20f 100644
+--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
++++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+@@ -680,7 +680,7 @@ void lnet_debug_peer(lnet_nid_t nid);
+ static inline void
+ lnet_peer_set_alive(lnet_peer_t *lp)
+ {
+-	lp->lp_last_alive = lp->lp_last_query = get_seconds();
++	lp->lp_last_alive = lp->lp_last_query = jiffies;
+ 	if (!lp->lp_alive)
+ 		lnet_notify_locked(lp, 0, 1, lp->lp_last_alive);
+ }
+diff --git a/drivers/staging/rtl8712/usb_intf.c b/drivers/staging/rtl8712/usb_intf.c
+index f8b5b332e7c3..943a0e204532 100644
+--- a/drivers/staging/rtl8712/usb_intf.c
++++ b/drivers/staging/rtl8712/usb_intf.c
+@@ -144,6 +144,7 @@ static struct usb_device_id rtl871x_usb_id_tbl[] = {
+ 	{USB_DEVICE(0x0DF6, 0x0058)},
+ 	{USB_DEVICE(0x0DF6, 0x0049)},
+ 	{USB_DEVICE(0x0DF6, 0x004C)},
++	{USB_DEVICE(0x0DF6, 0x006C)},
+ 	{USB_DEVICE(0x0DF6, 0x0064)},
+ 	/* Skyworth */
+ 	{USB_DEVICE(0x14b2, 0x3300)},
+diff --git a/drivers/tty/mips_ejtag_fdc.c b/drivers/tty/mips_ejtag_fdc.c
+index 358323c83b4f..43a2ba0c0fe9 100644
+--- a/drivers/tty/mips_ejtag_fdc.c
++++ b/drivers/tty/mips_ejtag_fdc.c
+@@ -1045,38 +1045,6 @@ err_destroy_ports:
+ 	return ret;
+ }
+ 
+-static int mips_ejtag_fdc_tty_remove(struct mips_cdmm_device *dev)
+-{
+-	struct mips_ejtag_fdc_tty *priv = mips_cdmm_get_drvdata(dev);
+-	struct mips_ejtag_fdc_tty_port *dport;
+-	int nport;
+-	unsigned int cfg;
+-
+-	if (priv->irq >= 0) {
+-		raw_spin_lock_irq(&priv->lock);
+-		cfg = mips_ejtag_fdc_read(priv, REG_FDCFG);
+-		/* Disable interrupts */
+-		cfg &= ~(REG_FDCFG_TXINTTHRES | REG_FDCFG_RXINTTHRES);
+-		cfg |= REG_FDCFG_TXINTTHRES_DISABLED;
+-		cfg |= REG_FDCFG_RXINTTHRES_DISABLED;
+-		mips_ejtag_fdc_write(priv, REG_FDCFG, cfg);
+-		raw_spin_unlock_irq(&priv->lock);
+-	} else {
+-		priv->removing = true;
+-		del_timer_sync(&priv->poll_timer);
+-	}
+-	kthread_stop(priv->thread);
+-	if (dev->cpu == 0)
+-		mips_ejtag_fdc_con.tty_drv = NULL;
+-	tty_unregister_driver(priv->driver);
+-	for (nport = 0; nport < NUM_TTY_CHANNELS; nport++) {
+-		dport = &priv->ports[nport];
+-		tty_port_destroy(&dport->port);
+-	}
+-	put_tty_driver(priv->driver);
+-	return 0;
+-}
+-
+ static int mips_ejtag_fdc_tty_cpu_down(struct mips_cdmm_device *dev)
+ {
+ 	struct mips_ejtag_fdc_tty *priv = mips_cdmm_get_drvdata(dev);
+@@ -1149,12 +1117,11 @@ static struct mips_cdmm_driver mips_ejtag_fdc_tty_driver = {
+ 		.name	= "mips_ejtag_fdc",
+ 	},
+ 	.probe		= mips_ejtag_fdc_tty_probe,
+-	.remove		= mips_ejtag_fdc_tty_remove,
+ 	.cpu_down	= mips_ejtag_fdc_tty_cpu_down,
+ 	.cpu_up		= mips_ejtag_fdc_tty_cpu_up,
+ 	.id_table	= mips_ejtag_fdc_tty_ids,
+ };
+-module_mips_cdmm_driver(mips_ejtag_fdc_tty_driver);
++builtin_mips_cdmm_driver(mips_ejtag_fdc_tty_driver);
+ 
+ static int __init mips_ejtag_fdc_init_console(void)
+ {
+diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
+index afc1879f66e0..dedac8ab85bf 100644
+--- a/drivers/tty/n_tty.c
++++ b/drivers/tty/n_tty.c
+@@ -169,7 +169,7 @@ static inline int tty_copy_to_user(struct tty_struct *tty,
+ {
+ 	struct n_tty_data *ldata = tty->disc_data;
+ 
+-	tty_audit_add_data(tty, to, n, ldata->icanon);
++	tty_audit_add_data(tty, from, n, ldata->icanon);
+ 	return copy_to_user(to, from, n);
+ }
+ 
+diff --git a/drivers/tty/tty_audit.c b/drivers/tty/tty_audit.c
+index 90ca082935f6..3d245cd3d8e6 100644
+--- a/drivers/tty/tty_audit.c
++++ b/drivers/tty/tty_audit.c
+@@ -265,7 +265,7 @@ static struct tty_audit_buf *tty_audit_buf_get(struct tty_struct *tty,
+  *
+  *	Audit @data of @size from @tty, if necessary.
+  */
+-void tty_audit_add_data(struct tty_struct *tty, unsigned char *data,
++void tty_audit_add_data(struct tty_struct *tty, const void *data,
+ 			size_t size, unsigned icanon)
+ {
+ 	struct tty_audit_buf *buf;
+diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
+index 774df354af55..1aa028638120 100644
+--- a/drivers/tty/tty_io.c
++++ b/drivers/tty/tty_io.c
+@@ -1279,18 +1279,22 @@ int tty_send_xchar(struct tty_struct *tty, char ch)
+ 	int	was_stopped = tty->stopped;
+ 
+ 	if (tty->ops->send_xchar) {
++		down_read(&tty->termios_rwsem);
+ 		tty->ops->send_xchar(tty, ch);
++		up_read(&tty->termios_rwsem);
+ 		return 0;
+ 	}
+ 
+ 	if (tty_write_lock(tty, 0) < 0)
+ 		return -ERESTARTSYS;
+ 
++	down_read(&tty->termios_rwsem);
+ 	if (was_stopped)
+ 		start_tty(tty);
+ 	tty->ops->write(tty, &ch, 1);
+ 	if (was_stopped)
+ 		stop_tty(tty);
++	up_read(&tty->termios_rwsem);
+ 	tty_write_unlock(tty);
+ 	return 0;
+ }
+diff --git a/drivers/tty/tty_ioctl.c b/drivers/tty/tty_ioctl.c
+index 5232fb60b0b1..043e332e7423 100644
+--- a/drivers/tty/tty_ioctl.c
++++ b/drivers/tty/tty_ioctl.c
+@@ -1142,16 +1142,12 @@ int n_tty_ioctl_helper(struct tty_struct *tty, struct file *file,
+ 			spin_unlock_irq(&tty->flow_lock);
+ 			break;
+ 		case TCIOFF:
+-			down_read(&tty->termios_rwsem);
+ 			if (STOP_CHAR(tty) != __DISABLED_CHAR)
+ 				retval = tty_send_xchar(tty, STOP_CHAR(tty));
+-			up_read(&tty->termios_rwsem);
+ 			break;
+ 		case TCION:
+-			down_read(&tty->termios_rwsem);
+ 			if (START_CHAR(tty) != __DISABLED_CHAR)
+ 				retval = tty_send_xchar(tty, START_CHAR(tty));
+-			up_read(&tty->termios_rwsem);
+ 			break;
+ 		default:
+ 			return -EINVAL;
+diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
+index fa774323ebda..846ceb91ec14 100644
+--- a/drivers/usb/chipidea/ci_hdrc_imx.c
++++ b/drivers/usb/chipidea/ci_hdrc_imx.c
+@@ -68,6 +68,12 @@ struct ci_hdrc_imx_data {
+ 	struct imx_usbmisc_data *usbmisc_data;
+ 	bool supports_runtime_pm;
+ 	bool in_lpm;
++	/* SoC before i.mx6 (except imx23/imx28) needs three clks */
++	bool need_three_clks;
++	struct clk *clk_ipg;
++	struct clk *clk_ahb;
++	struct clk *clk_per;
++	/* --------------------------------- */
+ };
+ 
+ /* Common functions shared by usbmisc drivers */
+@@ -119,6 +125,102 @@ static struct imx_usbmisc_data *usbmisc_get_init_data(struct device *dev)
+ }
+ 
+ /* End of common functions shared by usbmisc drivers*/
++static int imx_get_clks(struct device *dev)
++{
++	struct ci_hdrc_imx_data *data = dev_get_drvdata(dev);
++	int ret = 0;
++
++	data->clk_ipg = devm_clk_get(dev, "ipg");
++	if (IS_ERR(data->clk_ipg)) {
++		/* If the platform only needs one clocks */
++		data->clk = devm_clk_get(dev, NULL);
++		if (IS_ERR(data->clk)) {
++			ret = PTR_ERR(data->clk);
++			dev_err(dev,
++				"Failed to get clks, err=%ld,%ld\n",
++				PTR_ERR(data->clk), PTR_ERR(data->clk_ipg));
++			return ret;
++		}
++		return ret;
++	}
++
++	data->clk_ahb = devm_clk_get(dev, "ahb");
++	if (IS_ERR(data->clk_ahb)) {
++		ret = PTR_ERR(data->clk_ahb);
++		dev_err(dev,
++			"Failed to get ahb clock, err=%d\n", ret);
++		return ret;
++	}
++
++	data->clk_per = devm_clk_get(dev, "per");
++	if (IS_ERR(data->clk_per)) {
++		ret = PTR_ERR(data->clk_per);
++		dev_err(dev,
++			"Failed to get per clock, err=%d\n", ret);
++		return ret;
++	}
++
++	data->need_three_clks = true;
++	return ret;
++}
++
++static int imx_prepare_enable_clks(struct device *dev)
++{
++	struct ci_hdrc_imx_data *data = dev_get_drvdata(dev);
++	int ret = 0;
++
++	if (data->need_three_clks) {
++		ret = clk_prepare_enable(data->clk_ipg);
++		if (ret) {
++			dev_err(dev,
++				"Failed to prepare/enable ipg clk, err=%d\n",
++				ret);
++			return ret;
++		}
++
++		ret = clk_prepare_enable(data->clk_ahb);
++		if (ret) {
++			dev_err(dev,
++				"Failed to prepare/enable ahb clk, err=%d\n",
++				ret);
++			clk_disable_unprepare(data->clk_ipg);
++			return ret;
++		}
++
++		ret = clk_prepare_enable(data->clk_per);
++		if (ret) {
++			dev_err(dev,
++				"Failed to prepare/enable per clk, err=%d\n",
++				ret);
++			clk_disable_unprepare(data->clk_ahb);
++			clk_disable_unprepare(data->clk_ipg);
++			return ret;
++		}
++	} else {
++		ret = clk_prepare_enable(data->clk);
++		if (ret) {
++			dev_err(dev,
++				"Failed to prepare/enable clk, err=%d\n",
++				ret);
++			return ret;
++		}
++	}
++
++	return ret;
++}
++
++static void imx_disable_unprepare_clks(struct device *dev)
++{
++	struct ci_hdrc_imx_data *data = dev_get_drvdata(dev);
++
++	if (data->need_three_clks) {
++		clk_disable_unprepare(data->clk_per);
++		clk_disable_unprepare(data->clk_ahb);
++		clk_disable_unprepare(data->clk_ipg);
++	} else {
++		clk_disable_unprepare(data->clk);
++	}
++}
+ 
+ static int ci_hdrc_imx_probe(struct platform_device *pdev)
+ {
+@@ -137,23 +239,18 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
+ 	if (!data)
+ 		return -ENOMEM;
+ 
++	platform_set_drvdata(pdev, data);
+ 	data->usbmisc_data = usbmisc_get_init_data(&pdev->dev);
+ 	if (IS_ERR(data->usbmisc_data))
+ 		return PTR_ERR(data->usbmisc_data);
+ 
+-	data->clk = devm_clk_get(&pdev->dev, NULL);
+-	if (IS_ERR(data->clk)) {
+-		dev_err(&pdev->dev,
+-			"Failed to get clock, err=%ld\n", PTR_ERR(data->clk));
+-		return PTR_ERR(data->clk);
+-	}
++	ret = imx_get_clks(&pdev->dev);
++	if (ret)
++		return ret;
+ 
+-	ret = clk_prepare_enable(data->clk);
+-	if (ret) {
+-		dev_err(&pdev->dev,
+-			"Failed to prepare or enable clock, err=%d\n", ret);
++	ret = imx_prepare_enable_clks(&pdev->dev);
++	if (ret)
+ 		return ret;
+-	}
+ 
+ 	data->phy = devm_usb_get_phy_by_phandle(&pdev->dev, "fsl,usbphy", 0);
+ 	if (IS_ERR(data->phy)) {
+@@ -196,8 +293,6 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
+ 		goto disable_device;
+ 	}
+ 
+-	platform_set_drvdata(pdev, data);
+-
+ 	if (data->supports_runtime_pm) {
+ 		pm_runtime_set_active(&pdev->dev);
+ 		pm_runtime_enable(&pdev->dev);
+@@ -210,7 +305,7 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
+ disable_device:
+ 	ci_hdrc_remove_device(data->ci_pdev);
+ err_clk:
+-	clk_disable_unprepare(data->clk);
++	imx_disable_unprepare_clks(&pdev->dev);
+ 	return ret;
+ }
+ 
+@@ -224,7 +319,7 @@ static int ci_hdrc_imx_remove(struct platform_device *pdev)
+ 		pm_runtime_put_noidle(&pdev->dev);
+ 	}
+ 	ci_hdrc_remove_device(data->ci_pdev);
+-	clk_disable_unprepare(data->clk);
++	imx_disable_unprepare_clks(&pdev->dev);
+ 
+ 	return 0;
+ }
+@@ -236,7 +331,7 @@ static int imx_controller_suspend(struct device *dev)
+ 
+ 	dev_dbg(dev, "at %s\n", __func__);
+ 
+-	clk_disable_unprepare(data->clk);
++	imx_disable_unprepare_clks(dev);
+ 	data->in_lpm = true;
+ 
+ 	return 0;
+@@ -254,7 +349,7 @@ static int imx_controller_resume(struct device *dev)
+ 		return 0;
+ 	}
+ 
+-	ret = clk_prepare_enable(data->clk);
++	ret = imx_prepare_enable_clks(dev);
+ 	if (ret)
+ 		return ret;
+ 
+@@ -269,7 +364,7 @@ static int imx_controller_resume(struct device *dev)
+ 	return 0;
+ 
+ clk_disable:
+-	clk_disable_unprepare(data->clk);
++	imx_disable_unprepare_clks(dev);
+ 	return ret;
+ }
+ 
+diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
+index 6e53c24fa1cb..92937c14f818 100644
+--- a/drivers/usb/chipidea/udc.c
++++ b/drivers/usb/chipidea/udc.c
+@@ -1730,6 +1730,22 @@ static int ci_udc_start(struct usb_gadget *gadget,
+ 	return retval;
+ }
+ 
++static void ci_udc_stop_for_otg_fsm(struct ci_hdrc *ci)
++{
++	if (!ci_otg_is_fsm_mode(ci))
++		return;
++
++	mutex_lock(&ci->fsm.lock);
++	if (ci->fsm.otg->state == OTG_STATE_A_PERIPHERAL) {
++		ci->fsm.a_bidl_adis_tmout = 1;
++		ci_hdrc_otg_fsm_start(ci);
++	} else if (ci->fsm.otg->state == OTG_STATE_B_PERIPHERAL) {
++		ci->fsm.protocol = PROTO_UNDEF;
++		ci->fsm.otg->state = OTG_STATE_UNDEFINED;
++	}
++	mutex_unlock(&ci->fsm.lock);
++}
++
+ /**
+  * ci_udc_stop: unregister a gadget driver
+  */
+@@ -1754,6 +1770,7 @@ static int ci_udc_stop(struct usb_gadget *gadget)
+ 	ci->driver = NULL;
+ 	spin_unlock_irqrestore(&ci->lock, flags);
+ 
++	ci_udc_stop_for_otg_fsm(ci);
+ 	return 0;
+ }
+ 
+diff --git a/drivers/usb/class/usblp.c b/drivers/usb/class/usblp.c
+index f38e875a3fb1..8218ba7eb263 100644
+--- a/drivers/usb/class/usblp.c
++++ b/drivers/usb/class/usblp.c
+@@ -873,11 +873,11 @@ static int usblp_wwait(struct usblp *usblp, int nonblock)
+ 
+ 	add_wait_queue(&usblp->wwait, &waita);
+ 	for (;;) {
+-		set_current_state(TASK_INTERRUPTIBLE);
+ 		if (mutex_lock_interruptible(&usblp->mut)) {
+ 			rc = -EINTR;
+ 			break;
+ 		}
++		set_current_state(TASK_INTERRUPTIBLE);
+ 		rc = usblp_wtest(usblp, nonblock);
+ 		mutex_unlock(&usblp->mut);
+ 		if (rc <= 0)
+diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
+index ff5773c66b84..c0566ecd9977 100644
+--- a/drivers/usb/dwc3/core.c
++++ b/drivers/usb/dwc3/core.c
+@@ -490,6 +490,9 @@ static int dwc3_phy_setup(struct dwc3 *dwc)
+ 	if (dwc->dis_u2_susphy_quirk)
+ 		reg &= ~DWC3_GUSB2PHYCFG_SUSPHY;
+ 
++	if (dwc->dis_enblslpm_quirk)
++		reg &= ~DWC3_GUSB2PHYCFG_ENBLSLPM;
++
+ 	dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
+ 
+ 	return 0;
+@@ -509,12 +512,18 @@ static int dwc3_core_init(struct dwc3 *dwc)
+ 
+ 	reg = dwc3_readl(dwc->regs, DWC3_GSNPSID);
+ 	/* This should read as U3 followed by revision number */
+-	if ((reg & DWC3_GSNPSID_MASK) != 0x55330000) {
++	if ((reg & DWC3_GSNPSID_MASK) == 0x55330000) {
++		/* Detected DWC_usb3 IP */
++		dwc->revision = reg;
++	} else if ((reg & DWC3_GSNPSID_MASK) == 0x33310000) {
++		/* Detected DWC_usb31 IP */
++		dwc->revision = dwc3_readl(dwc->regs, DWC3_VER_NUMBER);
++		dwc->revision |= DWC3_REVISION_IS_DWC31;
++	} else {
+ 		dev_err(dwc->dev, "this is not a DesignWare USB3 DRD Core\n");
+ 		ret = -ENODEV;
+ 		goto err0;
+ 	}
+-	dwc->revision = reg;
+ 
+ 	/*
+ 	 * Write Linux Version Code to our GUID register so it's easy to figure
+@@ -881,6 +890,8 @@ static int dwc3_probe(struct platform_device *pdev)
+ 				"snps,dis_u3_susphy_quirk");
+ 		dwc->dis_u2_susphy_quirk = of_property_read_bool(node,
+ 				"snps,dis_u2_susphy_quirk");
++		dwc->dis_enblslpm_quirk = device_property_read_bool(dev,
++				"snps,dis_enblslpm_quirk");
+ 
+ 		dwc->tx_de_emphasis_quirk = of_property_read_bool(node,
+ 				"snps,tx_de_emphasis_quirk");
+@@ -911,6 +922,7 @@ static int dwc3_probe(struct platform_device *pdev)
+ 		dwc->rx_detect_poll_quirk = pdata->rx_detect_poll_quirk;
+ 		dwc->dis_u3_susphy_quirk = pdata->dis_u3_susphy_quirk;
+ 		dwc->dis_u2_susphy_quirk = pdata->dis_u2_susphy_quirk;
++		dwc->dis_enblslpm_quirk = pdata->dis_enblslpm_quirk;
+ 
+ 		dwc->tx_de_emphasis_quirk = pdata->tx_de_emphasis_quirk;
+ 		if (pdata->tx_de_emphasis)
+diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
+index 044778884585..6e53ce9ce320 100644
+--- a/drivers/usb/dwc3/core.h
++++ b/drivers/usb/dwc3/core.h
+@@ -108,6 +108,9 @@
+ #define DWC3_GPRTBIMAP_FS0	0xc188
+ #define DWC3_GPRTBIMAP_FS1	0xc18c
+ 
++#define DWC3_VER_NUMBER		0xc1a0
++#define DWC3_VER_TYPE		0xc1a4
++
+ #define DWC3_GUSB2PHYCFG(n)	(0xc200 + (n * 0x04))
+ #define DWC3_GUSB2I2CCTL(n)	(0xc240 + (n * 0x04))
+ 
+@@ -175,6 +178,7 @@
+ #define DWC3_GUSB2PHYCFG_PHYSOFTRST	(1 << 31)
+ #define DWC3_GUSB2PHYCFG_SUSPHY		(1 << 6)
+ #define DWC3_GUSB2PHYCFG_ULPI_UTMI	(1 << 4)
++#define DWC3_GUSB2PHYCFG_ENBLSLPM	(1 << 8)
+ 
+ /* Global USB2 PHY Vendor Control Register */
+ #define DWC3_GUSB2PHYACC_NEWREGREQ	(1 << 25)
+@@ -712,6 +716,8 @@ struct dwc3_scratchpad_array {
+  * @rx_detect_poll_quirk: set if we enable rx_detect to polling lfps quirk
+  * @dis_u3_susphy_quirk: set if we disable usb3 suspend phy
+  * @dis_u2_susphy_quirk: set if we disable usb2 suspend phy
++ * @dis_enblslpm_quirk: set if we clear enblslpm in GUSB2PHYCFG,
++ *                      disabling the suspend signal to the PHY.
+  * @tx_de_emphasis_quirk: set if we enable Tx de-emphasis quirk
+  * @tx_de_emphasis: Tx de-emphasis value
+  * 	0	- -6dB de-emphasis
+@@ -766,6 +772,14 @@ struct dwc3 {
+ 	u32			num_event_buffers;
+ 	u32			u1u2;
+ 	u32			maximum_speed;
++
++	/*
++	 * All 3.1 IP version constants are greater than the 3.0 IP
++	 * version constants. This works for most version checks in
++	 * dwc3. However, in the future, this may not apply as
++	 * features may be developed on newer versions of the 3.0 IP
++	 * that are not in the 3.1 IP.
++	 */
+ 	u32			revision;
+ 
+ #define DWC3_REVISION_173A	0x5533173a
+@@ -788,6 +802,13 @@ struct dwc3 {
+ #define DWC3_REVISION_270A	0x5533270a
+ #define DWC3_REVISION_280A	0x5533280a
+ 
++/*
++ * NOTICE: we're using bit 31 as a "is usb 3.1" flag. This is really
++ * just so dwc31 revisions are always larger than dwc3.
++ */
++#define DWC3_REVISION_IS_DWC31		0x80000000
++#define DWC3_USB31_REVISION_110A	(0x3131302a | DWC3_REVISION_IS_DWC31)
++
+ 	enum dwc3_ep0_next	ep0_next_event;
+ 	enum dwc3_ep0_state	ep0state;
+ 	enum dwc3_link_state	link_state;
+@@ -841,6 +862,7 @@ struct dwc3 {
+ 	unsigned		rx_detect_poll_quirk:1;
+ 	unsigned		dis_u3_susphy_quirk:1;
+ 	unsigned		dis_u2_susphy_quirk:1;
++	unsigned		dis_enblslpm_quirk:1;
+ 
+ 	unsigned		tx_de_emphasis_quirk:1;
+ 	unsigned		tx_de_emphasis:2;
+diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
+index 27e4fc896e9d..04b87ebe6f94 100644
+--- a/drivers/usb/dwc3/dwc3-pci.c
++++ b/drivers/usb/dwc3/dwc3-pci.c
+@@ -27,6 +27,8 @@
+ #include "platform_data.h"
+ 
+ #define PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3	0xabcd
++#define PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3_AXI 0xabce
++#define PCI_DEVICE_ID_SYNOPSYS_HAPSUSB31 0xabcf
+ #define PCI_DEVICE_ID_INTEL_BYT		0x0f37
+ #define PCI_DEVICE_ID_INTEL_MRFLD	0x119e
+ #define PCI_DEVICE_ID_INTEL_BSW		0x22B7
+@@ -100,6 +102,22 @@ static int dwc3_pci_quirks(struct pci_dev *pdev)
+ 		}
+ 	}
+ 
++	if (pdev->vendor == PCI_VENDOR_ID_SYNOPSYS &&
++	    (pdev->device == PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3 ||
++	     pdev->device == PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3_AXI ||
++	     pdev->device == PCI_DEVICE_ID_SYNOPSYS_HAPSUSB31)) {
++
++		struct dwc3_platform_data pdata;
++
++		memset(&pdata, 0, sizeof(pdata));
++		pdata.usb3_lpm_capable = true;
++		pdata.has_lpm_erratum = true;
++		pdata.dis_enblslpm_quirk = true;
++
++		return platform_device_add_data(pci_get_drvdata(pdev), &pdata,
++						sizeof(pdata));
++	}
++
+ 	return 0;
+ }
+ 
+@@ -172,6 +190,14 @@ static const struct pci_device_id dwc3_pci_id_table[] = {
+ 		PCI_DEVICE(PCI_VENDOR_ID_SYNOPSYS,
+ 				PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3),
+ 	},
++	{
++		PCI_DEVICE(PCI_VENDOR_ID_SYNOPSYS,
++				PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3_AXI),
++	},
++	{
++		PCI_DEVICE(PCI_VENDOR_ID_SYNOPSYS,
++				PCI_DEVICE_ID_SYNOPSYS_HAPSUSB31),
++	},
+ 	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_BSW), },
+ 	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_BYT), },
+ 	{ PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MRFLD), },
+diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
+index 333a7c0078fc..6fbf461d523c 100644
+--- a/drivers/usb/dwc3/gadget.c
++++ b/drivers/usb/dwc3/gadget.c
+@@ -1859,27 +1859,32 @@ static int dwc3_cleanup_done_reqs(struct dwc3 *dwc, struct dwc3_ep *dep,
+ 	unsigned int		i;
+ 	int			ret;
+ 
+-	req = next_request(&dep->req_queued);
+-	if (!req) {
+-		WARN_ON_ONCE(1);
+-		return 1;
+-	}
+-	i = 0;
+ 	do {
+-		slot = req->start_slot + i;
+-		if ((slot == DWC3_TRB_NUM - 1) &&
++		req = next_request(&dep->req_queued);
++		if (!req) {
++			WARN_ON_ONCE(1);
++			return 1;
++		}
++		i = 0;
++		do {
++			slot = req->start_slot + i;
++			if ((slot == DWC3_TRB_NUM - 1) &&
+ 				usb_endpoint_xfer_isoc(dep->endpoint.desc))
+-			slot++;
+-		slot %= DWC3_TRB_NUM;
+-		trb = &dep->trb_pool[slot];
++				slot++;
++			slot %= DWC3_TRB_NUM;
++			trb = &dep->trb_pool[slot];
++
++			ret = __dwc3_cleanup_done_trbs(dwc, dep, req, trb,
++					event, status);
++			if (ret)
++				break;
++		} while (++i < req->request.num_mapped_sgs);
++
++		dwc3_gadget_giveback(dep, req, status);
+ 
+-		ret = __dwc3_cleanup_done_trbs(dwc, dep, req, trb,
+-				event, status);
+ 		if (ret)
+ 			break;
+-	} while (++i < req->request.num_mapped_sgs);
+-
+-	dwc3_gadget_giveback(dep, req, status);
++	} while (1);
+ 
+ 	if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+ 			list_empty(&dep->req_queued)) {
+@@ -2709,12 +2714,34 @@ int dwc3_gadget_init(struct dwc3 *dwc)
+ 	}
+ 
+ 	dwc->gadget.ops			= &dwc3_gadget_ops;
+-	dwc->gadget.max_speed		= USB_SPEED_SUPER;
+ 	dwc->gadget.speed		= USB_SPEED_UNKNOWN;
+ 	dwc->gadget.sg_supported	= true;
+ 	dwc->gadget.name		= "dwc3-gadget";
+ 
+ 	/*
++	 * FIXME We might be setting max_speed to <SUPER, however versions
++	 * <2.20a of dwc3 have an issue with metastability (documented
++	 * elsewhere in this driver) which tells us we can't set max speed to
++	 * anything lower than SUPER.
++	 *
++	 * Because gadget.max_speed is only used by composite.c and function
++	 * drivers (i.e. it won't go into dwc3's registers) we are allowing this
++	 * to happen so we avoid sending SuperSpeed Capability descriptor
++	 * together with our BOS descriptor as that could confuse host into
++	 * thinking we can handle super speed.
++	 *
++	 * Note that, in fact, we won't even support GetBOS requests when speed
++	 * is less than super speed because we don't have means, yet, to tell
++	 * composite.c that we are USB 2.0 + LPM ECN.
++	 */
++	if (dwc->revision < DWC3_REVISION_220A)
++		dwc3_trace(trace_dwc3_gadget,
++				"Changing max_speed on rev %08x\n",
++				dwc->revision);
++
++	dwc->gadget.max_speed		= dwc->maximum_speed;
++
++	/*
+ 	 * Per databook, DWC3 needs buffer size to be aligned to MaxPacketSize
+ 	 * on ep out.
+ 	 */
+diff --git a/drivers/usb/dwc3/platform_data.h b/drivers/usb/dwc3/platform_data.h
+index d3614ecbb9ca..db2938002260 100644
+--- a/drivers/usb/dwc3/platform_data.h
++++ b/drivers/usb/dwc3/platform_data.h
+@@ -42,6 +42,7 @@ struct dwc3_platform_data {
+ 	unsigned rx_detect_poll_quirk:1;
+ 	unsigned dis_u3_susphy_quirk:1;
+ 	unsigned dis_u2_susphy_quirk:1;
++	unsigned dis_enblslpm_quirk:1;
+ 
+ 	unsigned tx_de_emphasis_quirk:1;
+ 	unsigned tx_de_emphasis:2;
+diff --git a/drivers/usb/gadget/udc/atmel_usba_udc.c b/drivers/usb/gadget/udc/atmel_usba_udc.c
+index 4095cce05e6a..35fff450bdc8 100644
+--- a/drivers/usb/gadget/udc/atmel_usba_udc.c
++++ b/drivers/usb/gadget/udc/atmel_usba_udc.c
+@@ -1634,7 +1634,7 @@ static irqreturn_t usba_udc_irq(int irq, void *devid)
+ 	spin_lock(&udc->lock);
+ 
+ 	int_enb = usba_int_enb_get(udc);
+-	status = usba_readl(udc, INT_STA) & int_enb;
++	status = usba_readl(udc, INT_STA) & (int_enb | USBA_HIGH_SPEED);
+ 	DBG(DBG_INT, "irq, status=%#08x\n", status);
+ 
+ 	if (status & USBA_DET_SUSPEND) {
+diff --git a/drivers/usb/gadget/udc/net2280.c b/drivers/usb/gadget/udc/net2280.c
+index 2bee912ca65b..baa0191666aa 100644
+--- a/drivers/usb/gadget/udc/net2280.c
++++ b/drivers/usb/gadget/udc/net2280.c
+@@ -1846,7 +1846,7 @@ static void defect7374_disable_data_eps(struct net2280 *dev)
+ 
+ 	for (i = 1; i < 5; i++) {
+ 		ep = &dev->ep[i];
+-		writel(0, &ep->cfg->ep_cfg);
++		writel(i, &ep->cfg->ep_cfg);
+ 	}
+ 
+ 	/* CSROUT, CSRIN, PCIOUT, PCIIN, STATIN, RCIN */
+diff --git a/drivers/usb/host/ehci-orion.c b/drivers/usb/host/ehci-orion.c
+index bfcbb9aa8816..ee8d5faa0194 100644
+--- a/drivers/usb/host/ehci-orion.c
++++ b/drivers/usb/host/ehci-orion.c
+@@ -224,7 +224,8 @@ static int ehci_orion_drv_probe(struct platform_device *pdev)
+ 	priv->phy = devm_phy_optional_get(&pdev->dev, "usb");
+ 	if (IS_ERR(priv->phy)) {
+ 		err = PTR_ERR(priv->phy);
+-		goto err_phy_get;
++		if (err != -ENOSYS)
++			goto err_phy_get;
+ 	} else {
+ 		err = phy_init(priv->phy);
+ 		if (err)
+diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
+index d7b9f484d4e9..6062996d35a6 100644
+--- a/drivers/usb/host/xhci.c
++++ b/drivers/usb/host/xhci.c
+@@ -175,6 +175,16 @@ int xhci_reset(struct xhci_hcd *xhci)
+ 	command |= CMD_RESET;
+ 	writel(command, &xhci->op_regs->command);
+ 
++	/* Existing Intel xHCI controllers require a delay of 1 ms
++	 * after setting the CMD_RESET bit and before accessing any
++	 * HC registers. This allows the HC to complete the
++	 * reset operation and be ready for HC register access.
++	 * Without this delay, a subsequent HC register access
++	 * may, in rare cases, result in a system hang.
++	 */
++	if (xhci->quirks & XHCI_INTEL_HOST)
++		udelay(1000);
++
+ 	ret = xhci_handshake(&xhci->op_regs->command,
+ 			CMD_RESET, 0, 10 * 1000 * 1000);
+ 	if (ret)
+diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c
+index 514a6cdaeff6..2fe6d263eb6b 100644
+--- a/drivers/usb/musb/musb_core.c
++++ b/drivers/usb/musb/musb_core.c
+@@ -132,7 +132,7 @@ static inline struct musb *dev_to_musb(struct device *dev)
+ /*-------------------------------------------------------------------------*/
+ 
+ #ifndef CONFIG_BLACKFIN
+-static int musb_ulpi_read(struct usb_phy *phy, u32 offset)
++static int musb_ulpi_read(struct usb_phy *phy, u32 reg)
+ {
+ 	void __iomem *addr = phy->io_priv;
+ 	int	i = 0;
+@@ -151,7 +151,7 @@ static int musb_ulpi_read(struct usb_phy *phy, u32 offset)
+ 	 * ULPICarKitControlDisableUTMI after clearing POWER_SUSPENDM.
+ 	 */
+ 
+-	musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)offset);
++	musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)reg);
+ 	musb_writeb(addr, MUSB_ULPI_REG_CONTROL,
+ 			MUSB_ULPI_REG_REQ | MUSB_ULPI_RDN_WR);
+ 
+@@ -176,7 +176,7 @@ out:
+ 	return ret;
+ }
+ 
+-static int musb_ulpi_write(struct usb_phy *phy, u32 offset, u32 data)
++static int musb_ulpi_write(struct usb_phy *phy, u32 val, u32 reg)
+ {
+ 	void __iomem *addr = phy->io_priv;
+ 	int	i = 0;
+@@ -191,8 +191,8 @@ static int musb_ulpi_write(struct usb_phy *phy, u32 offset, u32 data)
+ 	power &= ~MUSB_POWER_SUSPENDM;
+ 	musb_writeb(addr, MUSB_POWER, power);
+ 
+-	musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)offset);
+-	musb_writeb(addr, MUSB_ULPI_REG_DATA, (u8)data);
++	musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)reg);
++	musb_writeb(addr, MUSB_ULPI_REG_DATA, (u8)val);
+ 	musb_writeb(addr, MUSB_ULPI_REG_CONTROL, MUSB_ULPI_REG_REQ);
+ 
+ 	while (!(musb_readb(addr, MUSB_ULPI_REG_CONTROL)
+diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
+index 7c8eb4c4c175..4021846139c9 100644
+--- a/drivers/usb/serial/option.c
++++ b/drivers/usb/serial/option.c
+@@ -162,6 +162,7 @@ static void option_instat_callback(struct urb *urb);
+ #define NOVATELWIRELESS_PRODUCT_HSPA_EMBEDDED_HIGHSPEED	0x9001
+ #define NOVATELWIRELESS_PRODUCT_E362		0x9010
+ #define NOVATELWIRELESS_PRODUCT_E371		0x9011
++#define NOVATELWIRELESS_PRODUCT_U620L		0x9022
+ #define NOVATELWIRELESS_PRODUCT_G2		0xA010
+ #define NOVATELWIRELESS_PRODUCT_MC551		0xB001
+ 
+@@ -357,6 +358,7 @@ static void option_instat_callback(struct urb *urb);
+ /* This is the 4G XS Stick W14 a.k.a. Mobilcom Debitel Surf-Stick *
+  * It seems to contain a Qualcomm QSC6240/6290 chipset            */
+ #define FOUR_G_SYSTEMS_PRODUCT_W14		0x9603
++#define FOUR_G_SYSTEMS_PRODUCT_W100		0x9b01
+ 
+ /* iBall 3.5G connect wireless modem */
+ #define IBALL_3_5G_CONNECT			0x9605
+@@ -522,6 +524,11 @@ static const struct option_blacklist_info four_g_w14_blacklist = {
+ 	.sendsetup = BIT(0) | BIT(1),
+ };
+ 
++static const struct option_blacklist_info four_g_w100_blacklist = {
++	.sendsetup = BIT(1) | BIT(2),
++	.reserved = BIT(3),
++};
++
+ static const struct option_blacklist_info alcatel_x200_blacklist = {
+ 	.sendsetup = BIT(0) | BIT(1),
+ 	.reserved = BIT(4),
+@@ -1060,6 +1067,7 @@ static const struct usb_device_id option_ids[] = {
+ 	{ USB_DEVICE_AND_INTERFACE_INFO(NOVATELWIRELESS_VENDOR_ID, NOVATELWIRELESS_PRODUCT_MC551, 0xff, 0xff, 0xff) },
+ 	{ USB_DEVICE_AND_INTERFACE_INFO(NOVATELWIRELESS_VENDOR_ID, NOVATELWIRELESS_PRODUCT_E362, 0xff, 0xff, 0xff) },
+ 	{ USB_DEVICE_AND_INTERFACE_INFO(NOVATELWIRELESS_VENDOR_ID, NOVATELWIRELESS_PRODUCT_E371, 0xff, 0xff, 0xff) },
++	{ USB_DEVICE_AND_INTERFACE_INFO(NOVATELWIRELESS_VENDOR_ID, NOVATELWIRELESS_PRODUCT_U620L, 0xff, 0x00, 0x00) },
+ 
+ 	{ USB_DEVICE(AMOI_VENDOR_ID, AMOI_PRODUCT_H01) },
+ 	{ USB_DEVICE(AMOI_VENDOR_ID, AMOI_PRODUCT_H01A) },
+@@ -1653,6 +1661,9 @@ static const struct usb_device_id option_ids[] = {
+ 	{ USB_DEVICE(LONGCHEER_VENDOR_ID, FOUR_G_SYSTEMS_PRODUCT_W14),
+   	  .driver_info = (kernel_ulong_t)&four_g_w14_blacklist
+   	},
++	{ USB_DEVICE(LONGCHEER_VENDOR_ID, FOUR_G_SYSTEMS_PRODUCT_W100),
++	  .driver_info = (kernel_ulong_t)&four_g_w100_blacklist
++	},
+ 	{ USB_DEVICE_INTERFACE_CLASS(LONGCHEER_VENDOR_ID, SPEEDUP_PRODUCT_SU9800, 0xff) },
+ 	{ USB_DEVICE(LONGCHEER_VENDOR_ID, ZOOM_PRODUCT_4597) },
+ 	{ USB_DEVICE(LONGCHEER_VENDOR_ID, IBALL_3_5G_CONNECT) },
+diff --git a/drivers/usb/serial/qcserial.c b/drivers/usb/serial/qcserial.c
+index f49d262e926b..514fa91cf74e 100644
+--- a/drivers/usb/serial/qcserial.c
++++ b/drivers/usb/serial/qcserial.c
+@@ -22,6 +22,8 @@
+ #define DRIVER_AUTHOR "Qualcomm Inc"
+ #define DRIVER_DESC "Qualcomm USB Serial driver"
+ 
++#define QUECTEL_EC20_PID	0x9215
++
+ /* standard device layouts supported by this driver */
+ enum qcserial_layouts {
+ 	QCSERIAL_G2K = 0,	/* Gobi 2000 */
+@@ -169,6 +171,38 @@ static const struct usb_device_id id_table[] = {
+ };
+ MODULE_DEVICE_TABLE(usb, id_table);
+ 
++static int handle_quectel_ec20(struct device *dev, int ifnum)
++{
++	int altsetting = 0;
++
++	/*
++	 * Quectel EC20 Mini PCIe LTE module layout:
++	 * 0: DM/DIAG (use libqcdm from ModemManager for communication)
++	 * 1: NMEA
++	 * 2: AT-capable modem port
++	 * 3: Modem interface
++	 * 4: NDIS
++	 */
++	switch (ifnum) {
++	case 0:
++		dev_dbg(dev, "Quectel EC20 DM/DIAG interface found\n");
++		break;
++	case 1:
++		dev_dbg(dev, "Quectel EC20 NMEA GPS interface found\n");
++		break;
++	case 2:
++	case 3:
++		dev_dbg(dev, "Quectel EC20 Modem port found\n");
++		break;
++	case 4:
++		/* Don't claim the QMI/net interface */
++		altsetting = -1;
++		break;
++	}
++
++	return altsetting;
++}
++
+ static int qcprobe(struct usb_serial *serial, const struct usb_device_id *id)
+ {
+ 	struct usb_host_interface *intf = serial->interface->cur_altsetting;
+@@ -178,6 +212,10 @@ static int qcprobe(struct usb_serial *serial, const struct usb_device_id *id)
+ 	__u8 ifnum;
+ 	int altsetting = -1;
+ 
++	/* we only support vendor specific functions */
++	if (intf->desc.bInterfaceClass != USB_CLASS_VENDOR_SPEC)
++		goto done;
++
+ 	nintf = serial->dev->actconfig->desc.bNumInterfaces;
+ 	dev_dbg(dev, "Num Interfaces = %d\n", nintf);
+ 	ifnum = intf->desc.bInterfaceNumber;
+@@ -237,6 +275,12 @@ static int qcprobe(struct usb_serial *serial, const struct usb_device_id *id)
+ 			altsetting = -1;
+ 		break;
+ 	case QCSERIAL_G2K:
++		/* handle non-standard layouts */
++		if (nintf == 5 && id->idProduct == QUECTEL_EC20_PID) {
++			altsetting = handle_quectel_ec20(dev, ifnum);
++			goto done;
++		}
++
+ 		/*
+ 		 * Gobi 2K+ USB layout:
+ 		 * 0: QMI/net
+@@ -297,29 +341,39 @@ static int qcprobe(struct usb_serial *serial, const struct usb_device_id *id)
+ 		break;
+ 	case QCSERIAL_HWI:
+ 		/*
+-		 * Huawei layout:
+-		 * 0: AT-capable modem port
+-		 * 1: DM/DIAG
+-		 * 2: AT-capable modem port
+-		 * 3: CCID-compatible PCSC interface
+-		 * 4: QMI/net
+-		 * 5: NMEA
++		 * Huawei devices map functions by subclass + protocol
++		 * instead of interface numbers. The protocol identifies
++		 * a specific function, while the subclass indicates a
++		 * specific firmware source.
++		 *
++		 * This is a blacklist of functions known to be
++		 * non-serial.  The rest are assumed to be serial and
++		 * will be handled by this driver.
+ 		 */
+-		switch (ifnum) {
+-		case 0:
+-		case 2:
+-			dev_dbg(dev, "Modem port found\n");
+-			break;
+-		case 1:
+-			dev_dbg(dev, "DM/DIAG interface found\n");
+-			break;
+-		case 5:
+-			dev_dbg(dev, "NMEA GPS interface found\n");
+-			break;
+-		default:
+-			/* don't claim any unsupported interface */
++		switch (intf->desc.bInterfaceProtocol) {
++			/* QMI combined (qmi_wwan) */
++		case 0x07:
++		case 0x37:
++		case 0x67:
++			/* QMI data (qmi_wwan) */
++		case 0x08:
++		case 0x38:
++		case 0x68:
++			/* QMI control (qmi_wwan) */
++		case 0x09:
++		case 0x39:
++		case 0x69:
++			/* NCM like (huawei_cdc_ncm) */
++		case 0x16:
++		case 0x46:
++		case 0x76:
+ 			altsetting = -1;
+ 			break;
++		default:
++			dev_dbg(dev, "Huawei type serial port found (%02x/%02x/%02x)\n",
++				intf->desc.bInterfaceClass,
++				intf->desc.bInterfaceSubClass,
++				intf->desc.bInterfaceProtocol);
+ 		}
+ 		break;
+ 	default:
+diff --git a/drivers/usb/serial/ti_usb_3410_5052.c b/drivers/usb/serial/ti_usb_3410_5052.c
+index e9da41d9fe7f..2694df2f4559 100644
+--- a/drivers/usb/serial/ti_usb_3410_5052.c
++++ b/drivers/usb/serial/ti_usb_3410_5052.c
+@@ -159,6 +159,7 @@ static const struct usb_device_id ti_id_table_3410[] = {
+ 	{ USB_DEVICE(ABBOTT_VENDOR_ID, ABBOTT_STEREO_PLUG_ID) },
+ 	{ USB_DEVICE(ABBOTT_VENDOR_ID, ABBOTT_STRIP_PORT_ID) },
+ 	{ USB_DEVICE(TI_VENDOR_ID, FRI2_PRODUCT_ID) },
++	{ USB_DEVICE(HONEYWELL_VENDOR_ID, HONEYWELL_HGI80_PRODUCT_ID) },
+ 	{ }	/* terminator */
+ };
+ 
+@@ -191,6 +192,7 @@ static const struct usb_device_id ti_id_table_combined[] = {
+ 	{ USB_DEVICE(ABBOTT_VENDOR_ID, ABBOTT_PRODUCT_ID) },
+ 	{ USB_DEVICE(ABBOTT_VENDOR_ID, ABBOTT_STRIP_PORT_ID) },
+ 	{ USB_DEVICE(TI_VENDOR_ID, FRI2_PRODUCT_ID) },
++	{ USB_DEVICE(HONEYWELL_VENDOR_ID, HONEYWELL_HGI80_PRODUCT_ID) },
+ 	{ }	/* terminator */
+ };
+ 
+diff --git a/drivers/usb/serial/ti_usb_3410_5052.h b/drivers/usb/serial/ti_usb_3410_5052.h
+index 4a2423e84d55..98f35c656c02 100644
+--- a/drivers/usb/serial/ti_usb_3410_5052.h
++++ b/drivers/usb/serial/ti_usb_3410_5052.h
+@@ -56,6 +56,10 @@
+ #define ABBOTT_PRODUCT_ID		ABBOTT_STEREO_PLUG_ID
+ #define ABBOTT_STRIP_PORT_ID		0x3420
+ 
++/* Honeywell vendor and product IDs */
++#define HONEYWELL_VENDOR_ID		0x10ac
++#define HONEYWELL_HGI80_PRODUCT_ID	0x0102  /* Honeywell HGI80 */
++
+ /* Commands */
+ #define TI_GET_VERSION			0x01
+ #define TI_GET_PORT_STATUS		0x02
+diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
+index 96093ae369a5..cdc3d3360764 100644
+--- a/drivers/xen/events/events_base.c
++++ b/drivers/xen/events/events_base.c
+@@ -39,6 +39,7 @@
+ #include <asm/irq.h>
+ #include <asm/idle.h>
+ #include <asm/io_apic.h>
++#include <asm/i8259.h>
+ #include <asm/xen/pci.h>
+ #include <xen/page.h>
+ #endif
+@@ -420,7 +421,7 @@ static int __must_check xen_allocate_irq_gsi(unsigned gsi)
+ 		return xen_allocate_irq_dynamic();
+ 
+ 	/* Legacy IRQ descriptors are already allocated by the arch. */
+-	if (gsi < NR_IRQS_LEGACY)
++	if (gsi < nr_legacy_irqs())
+ 		irq = gsi;
+ 	else
+ 		irq = irq_alloc_desc_at(gsi, -1);
+@@ -446,7 +447,7 @@ static void xen_free_irq(unsigned irq)
+ 	kfree(info);
+ 
+ 	/* Legacy IRQ descriptors are managed by the arch. */
+-	if (irq < NR_IRQS_LEGACY)
++	if (irq < nr_legacy_irqs())
+ 		return;
+ 
+ 	irq_free_desc(irq);
+diff --git a/fs/proc/array.c b/fs/proc/array.c
+index ce065cf3104f..57fde2dfd4af 100644
+--- a/fs/proc/array.c
++++ b/fs/proc/array.c
+@@ -372,7 +372,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
+ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
+ 			struct pid *pid, struct task_struct *task, int whole)
+ {
+-	unsigned long vsize, eip, esp, wchan = ~0UL;
++	unsigned long vsize, eip, esp, wchan = 0;
+ 	int priority, nice;
+ 	int tty_pgrp = -1, tty_nr = 0;
+ 	sigset_t sigign, sigcatch;
+@@ -504,7 +504,19 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
+ 	seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
+ 	seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
+ 	seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
+-	seq_put_decimal_ull(m, ' ', wchan);
++
++	/*
++	 * We used to output the absolute kernel address, but that's an
++	 * information leak - so instead we show a 0/1 flag here, to signal
++	 * to user-space whether there's a wchan field in /proc/PID/wchan.
++	 *
++	 * This works with older implementations of procps as well.
++	 */
++	if (wchan)
++		seq_puts(m, " 1");
++	else
++		seq_puts(m, " 0");
++
+ 	seq_put_decimal_ull(m, ' ', 0);
+ 	seq_put_decimal_ull(m, ' ', 0);
+ 	seq_put_decimal_ll(m, ' ', task->exit_signal);
+diff --git a/fs/proc/base.c b/fs/proc/base.c
+index aa50d1ac28fc..83a43c131e9d 100644
+--- a/fs/proc/base.c
++++ b/fs/proc/base.c
+@@ -430,13 +430,10 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
+ 
+ 	wchan = get_wchan(task);
+ 
+-	if (lookup_symbol_name(wchan, symname) < 0) {
+-		if (!ptrace_may_access(task, PTRACE_MODE_READ))
+-			return 0;
+-		seq_printf(m, "%lu", wchan);
+-	} else {
++	if (wchan && ptrace_may_access(task, PTRACE_MODE_READ) && !lookup_symbol_name(wchan, symname))
+ 		seq_printf(m, "%s", symname);
+-	}
++	else
++		seq_putc(m, '0');
+ 
+ 	return 0;
+ }
+diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
+index 05e99b8ef465..053f122b592d 100644
+--- a/include/linux/kvm_host.h
++++ b/include/linux/kvm_host.h
+@@ -436,6 +436,17 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
+ 	     (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
+ 	     idx++)
+ 
++static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
++{
++	struct kvm_vcpu *vcpu;
++	int i;
++
++	kvm_for_each_vcpu(i, vcpu, kvm)
++		if (vcpu->vcpu_id == id)
++			return vcpu;
++	return NULL;
++}
++
+ #define kvm_for_each_memslot(memslot, slots)	\
+ 	for (memslot = &slots->memslots[0];	\
+ 	      memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
+diff --git a/include/linux/tty.h b/include/linux/tty.h
+index ad6c8913aa3e..342a760d5729 100644
+--- a/include/linux/tty.h
++++ b/include/linux/tty.h
+@@ -605,7 +605,7 @@ extern void n_tty_inherit_ops(struct tty_ldisc_ops *ops);
+ 
+ /* tty_audit.c */
+ #ifdef CONFIG_AUDIT
+-extern void tty_audit_add_data(struct tty_struct *tty, unsigned char *data,
++extern void tty_audit_add_data(struct tty_struct *tty, const void *data,
+ 			       size_t size, unsigned icanon);
+ extern void tty_audit_exit(void);
+ extern void tty_audit_fork(struct signal_struct *sig);
+@@ -613,8 +613,8 @@ extern void tty_audit_tiocsti(struct tty_struct *tty, char ch);
+ extern void tty_audit_push(struct tty_struct *tty);
+ extern int tty_audit_push_current(void);
+ #else
+-static inline void tty_audit_add_data(struct tty_struct *tty,
+-		unsigned char *data, size_t size, unsigned icanon)
++static inline void tty_audit_add_data(struct tty_struct *tty, const void *data,
++				      size_t size, unsigned icanon)
+ {
+ }
+ static inline void tty_audit_tiocsti(struct tty_struct *tty, char ch)
+diff --git a/include/net/inet_common.h b/include/net/inet_common.h
+index 279f83591971..109e3ee9108c 100644
+--- a/include/net/inet_common.h
++++ b/include/net/inet_common.h
+@@ -41,7 +41,8 @@ int inet_recv_error(struct sock *sk, struct msghdr *msg, int len,
+ 
+ static inline void inet_ctl_sock_destroy(struct sock *sk)
+ {
+-	sock_release(sk->sk_socket);
++	if (sk)
++		sock_release(sk->sk_socket);
+ }
+ 
+ #endif
+diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
+index 5fa643b4e891..ff6d78ff68df 100644
+--- a/include/net/ip_fib.h
++++ b/include/net/ip_fib.h
+@@ -306,7 +306,7 @@ void fib_flush_external(struct net *net);
+ 
+ /* Exported by fib_semantics.c */
+ int ip_fib_check_default(__be32 gw, struct net_device *dev);
+-int fib_sync_down_dev(struct net_device *dev, unsigned long event);
++int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force);
+ int fib_sync_down_addr(struct net *net, __be32 local);
+ int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
+ void fib_select_multipath(struct fib_result *res);
+diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
+index f1a117f8cad2..0bec4588c3c8 100644
+--- a/net/bluetooth/hidp/core.c
++++ b/net/bluetooth/hidp/core.c
+@@ -401,6 +401,20 @@ static void hidp_idle_timeout(unsigned long arg)
+ {
+ 	struct hidp_session *session = (struct hidp_session *) arg;
+ 
++	/* The HIDP user-space API only contains calls to add and remove
++	 * devices. There is no way to forward events of any kind. Therefore,
++	 * we have to forcefully disconnect a device on idle-timeouts. This is
++	 * unfortunate and weird API design, but it is spec-compliant and
++	 * required for backwards-compatibility. Hence, on idle-timeout, we
++	 * signal driver-detach events, so poll() will be woken up with an
++	 * error-condition on both sockets.
++	 */
++
++	session->intr_sock->sk->sk_err = EUNATCH;
++	session->ctrl_sock->sk->sk_err = EUNATCH;
++	wake_up_interruptible(sk_sleep(session->intr_sock->sk));
++	wake_up_interruptible(sk_sleep(session->ctrl_sock->sk));
++
+ 	hidp_session_terminate(session);
+ }
+ 
+diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
+index 92720f3fe573..e32a9e4910da 100644
+--- a/net/bluetooth/mgmt.c
++++ b/net/bluetooth/mgmt.c
+@@ -3090,6 +3090,11 @@ static int unpair_device(struct sock *sk, struct hci_dev *hdev, void *data,
+ 	} else {
+ 		u8 addr_type;
+ 
++		if (cp->addr.type == BDADDR_LE_PUBLIC)
++			addr_type = ADDR_LE_DEV_PUBLIC;
++		else
++			addr_type = ADDR_LE_DEV_RANDOM;
++
+ 		conn = hci_conn_hash_lookup_ba(hdev, LE_LINK,
+ 					       &cp->addr.bdaddr);
+ 		if (conn) {
+@@ -3105,13 +3110,10 @@ static int unpair_device(struct sock *sk, struct hci_dev *hdev, void *data,
+ 			 */
+ 			if (!cp->disconnect)
+ 				conn = NULL;
++		} else {
++			hci_conn_params_del(hdev, &cp->addr.bdaddr, addr_type);
+ 		}
+ 
+-		if (cp->addr.type == BDADDR_LE_PUBLIC)
+-			addr_type = ADDR_LE_DEV_PUBLIC;
+-		else
+-			addr_type = ADDR_LE_DEV_RANDOM;
+-
+ 		hci_remove_irk(hdev, &cp->addr.bdaddr, addr_type);
+ 
+ 		err = hci_remove_ltk(hdev, &cp->addr.bdaddr, addr_type);
+diff --git a/net/core/dst.c b/net/core/dst.c
+index 002144bea935..cc4a086ae09c 100644
+--- a/net/core/dst.c
++++ b/net/core/dst.c
+@@ -287,7 +287,7 @@ void dst_release(struct dst_entry *dst)
+ 		if (unlikely(newrefcnt < 0))
+ 			net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
+ 					     __func__, dst, newrefcnt);
+-		if (unlikely(dst->flags & DST_NOCACHE) && !newrefcnt)
++		if (!newrefcnt && unlikely(dst->flags & DST_NOCACHE))
+ 			call_rcu(&dst->rcu_head, dst_destroy_rcu);
+ 	}
+ }
+diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
+index 6bbc54940eb4..d7116cf4eba4 100644
+--- a/net/ipv4/fib_frontend.c
++++ b/net/ipv4/fib_frontend.c
+@@ -1063,9 +1063,10 @@ static void nl_fib_lookup_exit(struct net *net)
+ 	net->ipv4.fibnl = NULL;
+ }
+ 
+-static void fib_disable_ip(struct net_device *dev, unsigned long event)
++static void fib_disable_ip(struct net_device *dev, unsigned long event,
++			   bool force)
+ {
+-	if (fib_sync_down_dev(dev, event))
++	if (fib_sync_down_dev(dev, event, force))
+ 		fib_flush(dev_net(dev));
+ 	rt_cache_flush(dev_net(dev));
+ 	arp_ifdown(dev);
+@@ -1093,7 +1094,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
+ 			/* Last address was deleted from this interface.
+ 			 * Disable IP.
+ 			 */
+-			fib_disable_ip(dev, event);
++			fib_disable_ip(dev, event, true);
+ 		} else {
+ 			rt_cache_flush(dev_net(dev));
+ 		}
+@@ -1110,7 +1111,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
+ 	unsigned int flags;
+ 
+ 	if (event == NETDEV_UNREGISTER) {
+-		fib_disable_ip(dev, event);
++		fib_disable_ip(dev, event, true);
+ 		rt_flush_dev(dev);
+ 		return NOTIFY_DONE;
+ 	}
+@@ -1131,14 +1132,14 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
+ 		rt_cache_flush(net);
+ 		break;
+ 	case NETDEV_DOWN:
+-		fib_disable_ip(dev, event);
++		fib_disable_ip(dev, event, false);
+ 		break;
+ 	case NETDEV_CHANGE:
+ 		flags = dev_get_flags(dev);
+ 		if (flags & (IFF_RUNNING | IFF_LOWER_UP))
+ 			fib_sync_up(dev, RTNH_F_LINKDOWN);
+ 		else
+-			fib_sync_down_dev(dev, event);
++			fib_sync_down_dev(dev, event, false);
+ 		/* fall through */
+ 	case NETDEV_CHANGEMTU:
+ 		rt_cache_flush(net);
+diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
+index 3a06586b170c..71bad5c82445 100644
+--- a/net/ipv4/fib_semantics.c
++++ b/net/ipv4/fib_semantics.c
+@@ -1132,7 +1132,13 @@ int fib_sync_down_addr(struct net *net, __be32 local)
+ 	return ret;
+ }
+ 
+-int fib_sync_down_dev(struct net_device *dev, unsigned long event)
++/* Event              force Flags           Description
++ * NETDEV_CHANGE      0     LINKDOWN        Carrier OFF, not for scope host
++ * NETDEV_DOWN        0     LINKDOWN|DEAD   Link down, not for scope host
++ * NETDEV_DOWN        1     LINKDOWN|DEAD   Last address removed
++ * NETDEV_UNREGISTER  1     LINKDOWN|DEAD   Device removed
++ */
++int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force)
+ {
+ 	int ret = 0;
+ 	int scope = RT_SCOPE_NOWHERE;
+@@ -1141,8 +1147,7 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event)
+ 	struct hlist_head *head = &fib_info_devhash[hash];
+ 	struct fib_nh *nh;
+ 
+-	if (event == NETDEV_UNREGISTER ||
+-	    event == NETDEV_DOWN)
++	if (force)
+ 		scope = -1;
+ 
+ 	hlist_for_each_entry(nh, head, nh_hash) {
+@@ -1291,6 +1296,13 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
+ 	if (!(dev->flags & IFF_UP))
+ 		return 0;
+ 
++	if (nh_flags & RTNH_F_DEAD) {
++		unsigned int flags = dev_get_flags(dev);
++
++		if (flags & (IFF_RUNNING | IFF_LOWER_UP))
++			nh_flags |= RTNH_F_LINKDOWN;
++	}
++
+ 	prev_fi = NULL;
+ 	hash = fib_devindex_hashfn(dev->ifindex);
+ 	head = &fib_info_devhash[hash];
+diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
+index b0c6258ffb79..ea3aedb7dd0e 100644
+--- a/net/ipv4/fib_trie.c
++++ b/net/ipv4/fib_trie.c
+@@ -1561,7 +1561,7 @@ static struct key_vector *leaf_walk_rcu(struct key_vector **tn, t_key key)
+ 	do {
+ 		/* record parent and next child index */
+ 		pn = n;
+-		cindex = key ? get_index(key, pn) : 0;
++		cindex = (key > pn->key) ? get_index(key, pn) : 0;
+ 
+ 		if (cindex >> pn->bits)
+ 			break;
+diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
+index 5aa46d4b44ef..5a8ee3282550 100644
+--- a/net/ipv4/gre_offload.c
++++ b/net/ipv4/gre_offload.c
+@@ -36,7 +36,8 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
+ 				  SKB_GSO_TCP_ECN |
+ 				  SKB_GSO_GRE |
+ 				  SKB_GSO_GRE_CSUM |
+-				  SKB_GSO_IPIP)))
++				  SKB_GSO_IPIP |
++				  SKB_GSO_SIT)))
+ 		goto out;
+ 
+ 	if (!skb->encapsulation)
+diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
+index 3a2c0162c3ba..df28693f32e1 100644
+--- a/net/ipv4/ipmr.c
++++ b/net/ipv4/ipmr.c
+@@ -1683,8 +1683,8 @@ static inline int ipmr_forward_finish(struct sock *sk, struct sk_buff *skb)
+ {
+ 	struct ip_options *opt = &(IPCB(skb)->opt);
+ 
+-	IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);
+-	IP_ADD_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTOCTETS, skb->len);
++	IP_INC_STATS(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);
++	IP_ADD_STATS(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTOCTETS, skb->len);
+ 
+ 	if (unlikely(opt->optlen))
+ 		ip_forward_options(skb);
+@@ -1746,7 +1746,7 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
+ 		 * to blackhole.
+ 		 */
+ 
+-		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
++		IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
+ 		ip_rt_put(rt);
+ 		goto out_free;
+ 	}
+diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
+index 0330ab2e2b63..a1442c5a3e0c 100644
+--- a/net/ipv4/sysctl_net_ipv4.c
++++ b/net/ipv4/sysctl_net_ipv4.c
+@@ -47,14 +47,14 @@ static void set_local_port_range(struct net *net, int range[2])
+ {
+ 	bool same_parity = !((range[0] ^ range[1]) & 1);
+ 
+-	write_seqlock(&net->ipv4.ip_local_ports.lock);
++	write_seqlock_bh(&net->ipv4.ip_local_ports.lock);
+ 	if (same_parity && !net->ipv4.ip_local_ports.warned) {
+ 		net->ipv4.ip_local_ports.warned = true;
+ 		pr_err_ratelimited("ip_local_port_range: prefer different parity for start/end values.\n");
+ 	}
+ 	net->ipv4.ip_local_ports.range[0] = range[0];
+ 	net->ipv4.ip_local_ports.range[1] = range[1];
+-	write_sequnlock(&net->ipv4.ip_local_ports.lock);
++	write_sequnlock_bh(&net->ipv4.ip_local_ports.lock);
+ }
+ 
+ /* Validate changes from /proc interface. */
+diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
+index b7dedd9d36d8..747a4c47e070 100644
+--- a/net/ipv4/tcp_output.c
++++ b/net/ipv4/tcp_output.c
+@@ -3406,7 +3406,7 @@ static int tcp_xmit_probe_skb(struct sock *sk, int urgent, int mib)
+ 	 */
+ 	tcp_init_nondata_skb(skb, tp->snd_una - !urgent, TCPHDR_ACK);
+ 	skb_mstamp_get(&skb->skb_mstamp);
+-	NET_INC_STATS_BH(sock_net(sk), mib);
++	NET_INC_STATS(sock_net(sk), mib);
+ 	return tcp_transmit_skb(sk, skb, 0, GFP_ATOMIC);
+ }
+ 
+diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
+index 21c2c818df3b..c8c1fea06003 100644
+--- a/net/ipv6/addrconf.c
++++ b/net/ipv6/addrconf.c
+@@ -411,6 +411,7 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
+ 	if (err) {
+ 		ipv6_mc_destroy_dev(ndev);
+ 		del_timer(&ndev->regen_timer);
++		snmp6_unregister_dev(ndev);
+ 		goto err_release;
+ 	}
+ 	/* protected by rtnl_lock */
+diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
+index ac35a28599be..85c4b2fff504 100644
+--- a/net/ipv6/sit.c
++++ b/net/ipv6/sit.c
+@@ -1394,34 +1394,20 @@ static int ipip6_tunnel_init(struct net_device *dev)
+ 	return 0;
+ }
+ 
+-static int __net_init ipip6_fb_tunnel_init(struct net_device *dev)
++static void __net_init ipip6_fb_tunnel_init(struct net_device *dev)
+ {
+ 	struct ip_tunnel *tunnel = netdev_priv(dev);
+ 	struct iphdr *iph = &tunnel->parms.iph;
+ 	struct net *net = dev_net(dev);
+ 	struct sit_net *sitn = net_generic(net, sit_net_id);
+ 
+-	tunnel->dev = dev;
+-	tunnel->net = dev_net(dev);
+-
+ 	iph->version		= 4;
+ 	iph->protocol		= IPPROTO_IPV6;
+ 	iph->ihl		= 5;
+ 	iph->ttl		= 64;
+ 
+-	dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
+-	if (!dev->tstats)
+-		return -ENOMEM;
+-
+-	tunnel->dst_cache = alloc_percpu(struct ip_tunnel_dst);
+-	if (!tunnel->dst_cache) {
+-		free_percpu(dev->tstats);
+-		return -ENOMEM;
+-	}
+-
+ 	dev_hold(dev);
+ 	rcu_assign_pointer(sitn->tunnels_wc[0], tunnel);
+-	return 0;
+ }
+ 
+ static int ipip6_validate(struct nlattr *tb[], struct nlattr *data[])
+@@ -1831,23 +1817,19 @@ static int __net_init sit_init_net(struct net *net)
+ 	 */
+ 	sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
+ 
+-	err = ipip6_fb_tunnel_init(sitn->fb_tunnel_dev);
+-	if (err)
+-		goto err_dev_free;
+-
+-	ipip6_tunnel_clone_6rd(sitn->fb_tunnel_dev, sitn);
+ 	err = register_netdev(sitn->fb_tunnel_dev);
+ 	if (err)
+ 		goto err_reg_dev;
+ 
++	ipip6_tunnel_clone_6rd(sitn->fb_tunnel_dev, sitn);
++	ipip6_fb_tunnel_init(sitn->fb_tunnel_dev);
++
+ 	t = netdev_priv(sitn->fb_tunnel_dev);
+ 
+ 	strcpy(t->parms.name, sitn->fb_tunnel_dev->name);
+ 	return 0;
+ 
+ err_reg_dev:
+-	dev_put(sitn->fb_tunnel_dev);
+-err_dev_free:
+ 	ipip6_dev_free(sitn->fb_tunnel_dev);
+ err_alloc_dev:
+ 	return err;
+diff --git a/net/irda/irlmp.c b/net/irda/irlmp.c
+index a26c401ef4a4..43964594aa12 100644
+--- a/net/irda/irlmp.c
++++ b/net/irda/irlmp.c
+@@ -1839,7 +1839,7 @@ static void *irlmp_seq_hb_idx(struct irlmp_iter_state *iter, loff_t *off)
+ 	for (element = hashbin_get_first(iter->hashbin);
+ 	     element != NULL;
+ 	     element = hashbin_get_next(iter->hashbin)) {
+-		if (!off || *off-- == 0) {
++		if (!off || (*off)-- == 0) {
+ 			/* NB: hashbin left locked */
+ 			return element;
+ 		}
+diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
+index 9b2cc278ac2a..33bf779df350 100644
+--- a/net/mac80211/mlme.c
++++ b/net/mac80211/mlme.c
+@@ -3378,7 +3378,7 @@ static void ieee80211_rx_mgmt_beacon(struct ieee80211_sub_if_data *sdata,
+ 
+ 	if (ifmgd->rssi_min_thold != ifmgd->rssi_max_thold &&
+ 	    ifmgd->count_beacon_signal >= IEEE80211_SIGNAL_AVE_MIN_COUNT) {
+-		int sig = ifmgd->ave_beacon_signal;
++		int sig = ifmgd->ave_beacon_signal / 16;
+ 		int last_sig = ifmgd->last_ave_beacon_signal;
+ 		struct ieee80211_event event = {
+ 			.type = RSSI_EVENT,
+@@ -4999,6 +4999,25 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
+ 		return 0;
+ 	}
+ 
++	if (ifmgd->assoc_data &&
++	    ether_addr_equal(ifmgd->assoc_data->bss->bssid, req->bssid)) {
++		sdata_info(sdata,
++			   "aborting association with %pM by local choice (Reason: %u=%s)\n",
++			   req->bssid, req->reason_code,
++			   ieee80211_get_reason_code_string(req->reason_code));
++
++		drv_mgd_prepare_tx(sdata->local, sdata);
++		ieee80211_send_deauth_disassoc(sdata, req->bssid,
++					       IEEE80211_STYPE_DEAUTH,
++					       req->reason_code, tx,
++					       frame_buf);
++		ieee80211_destroy_assoc_data(sdata, false);
++		ieee80211_report_disconnect(sdata, frame_buf,
++					    sizeof(frame_buf), true,
++					    req->reason_code);
++		return 0;
++	}
++
+ 	if (ifmgd->associated &&
+ 	    ether_addr_equal(ifmgd->associated->bssid, req->bssid)) {
+ 		sdata_info(sdata,
+diff --git a/net/mac80211/trace.h b/net/mac80211/trace.h
+index 6f14591d8ca9..0b13bfa6f32f 100644
+--- a/net/mac80211/trace.h
++++ b/net/mac80211/trace.h
+@@ -33,11 +33,11 @@
+ 			__field(u32, chan_width)					\
+ 			__field(u32, center_freq1)					\
+ 			__field(u32, center_freq2)
+-#define CHANDEF_ASSIGN(c)								\
+-			__entry->control_freq = (c)->chan ? (c)->chan->center_freq : 0;	\
+-			__entry->chan_width = (c)->width;				\
+-			__entry->center_freq1 = (c)->center_freq1;			\
+-			__entry->center_freq2 = (c)->center_freq2;
++#define CHANDEF_ASSIGN(c)							\
++			__entry->control_freq = (c) ? ((c)->chan ? (c)->chan->center_freq : 0) : 0;	\
++			__entry->chan_width = (c) ? (c)->width : 0;			\
++			__entry->center_freq1 = (c) ? (c)->center_freq1 : 0;		\
++			__entry->center_freq2 = (c) ? (c)->center_freq2 : 0;
+ #define CHANDEF_PR_FMT	" control:%d MHz width:%d center: %d/%d MHz"
+ #define CHANDEF_PR_ARG	__entry->control_freq, __entry->chan_width,			\
+ 			__entry->center_freq1, __entry->center_freq2
+diff --git a/net/mac80211/util.c b/net/mac80211/util.c
+index 43e5aadd7a89..f5fa8c09cb42 100644
+--- a/net/mac80211/util.c
++++ b/net/mac80211/util.c
+@@ -2984,6 +2984,13 @@ ieee80211_extend_noa_desc(struct ieee80211_noa_data *data, u32 tsf, int i)
+ 	if (end > 0)
+ 		return false;
+ 
++	/* One shot NOA  */
++	if (data->count[i] == 1)
++		return false;
++
++	if (data->desc[i].interval == 0)
++		return false;
++
+ 	/* End time is in the past, check for repetitions */
+ 	skip = DIV_ROUND_UP(-end, data->desc[i].interval);
+ 	if (data->count[i] < 255) {
+diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
+index a133d16eb053..8b158f71bff6 100644
+--- a/net/netlink/af_netlink.c
++++ b/net/netlink/af_netlink.c
+@@ -2346,7 +2346,7 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname,
+ 		int pos, idx, shift;
+ 
+ 		err = 0;
+-		netlink_table_grab();
++		netlink_lock_table();
+ 		for (pos = 0; pos * 8 < nlk->ngroups; pos += sizeof(u32)) {
+ 			if (len - pos < sizeof(u32))
+ 				break;
+@@ -2361,7 +2361,7 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname,
+ 		}
+ 		if (put_user(ALIGN(nlk->ngroups / 8, sizeof(u32)), optlen))
+ 			err = -EFAULT;
+-		netlink_table_ungrab();
++		netlink_unlock_table();
+ 		break;
+ 	}
+ 	default:
+diff --git a/net/nfc/nci/hci.c b/net/nfc/nci/hci.c
+index 609f92283d1b..30b09f04c142 100644
+--- a/net/nfc/nci/hci.c
++++ b/net/nfc/nci/hci.c
+@@ -101,6 +101,20 @@ struct nci_hcp_packet {
+ #define NCI_HCP_MSG_GET_CMD(header)  (header & 0x3f)
+ #define NCI_HCP_MSG_GET_PIPE(header) (header & 0x7f)
+ 
++static int nci_hci_result_to_errno(u8 result)
++{
++	switch (result) {
++	case NCI_HCI_ANY_OK:
++		return 0;
++	case NCI_HCI_ANY_E_REG_PAR_UNKNOWN:
++		return -EOPNOTSUPP;
++	case NCI_HCI_ANY_E_TIMEOUT:
++		return -ETIME;
++	default:
++		return -1;
++	}
++}
++
+ /* HCI core */
+ static void nci_hci_reset_pipes(struct nci_hci_dev *hdev)
+ {
+@@ -146,18 +160,18 @@ static int nci_hci_send_data(struct nci_dev *ndev, u8 pipe,
+ 	if (!conn_info)
+ 		return -EPROTO;
+ 
+-	skb = nci_skb_alloc(ndev, 2 + conn_info->max_pkt_payload_len +
++	i = 0;
++	skb = nci_skb_alloc(ndev, conn_info->max_pkt_payload_len +
+ 			    NCI_DATA_HDR_SIZE, GFP_KERNEL);
+ 	if (!skb)
+ 		return -ENOMEM;
+ 
+-	skb_reserve(skb, 2 + NCI_DATA_HDR_SIZE);
++	skb_reserve(skb, NCI_DATA_HDR_SIZE + 2);
+ 	*skb_push(skb, 1) = data_type;
+ 
+-	i = 0;
+-	len = conn_info->max_pkt_payload_len;
+-
+ 	do {
++		len = conn_info->max_pkt_payload_len;
++
+ 		/* If last packet add NCI_HFP_NO_CHAINING */
+ 		if (i + conn_info->max_pkt_payload_len -
+ 		    (skb->len + 1) >= data_len) {
+@@ -177,9 +191,15 @@ static int nci_hci_send_data(struct nci_dev *ndev, u8 pipe,
+ 			return r;
+ 
+ 		i += len;
++
+ 		if (i < data_len) {
+-			skb_trim(skb, 0);
+-			skb_pull(skb, len);
++			skb = nci_skb_alloc(ndev,
++					    conn_info->max_pkt_payload_len +
++					    NCI_DATA_HDR_SIZE, GFP_KERNEL);
++			if (!skb)
++				return -ENOMEM;
++
++			skb_reserve(skb, NCI_DATA_HDR_SIZE + 1);
+ 		}
+ 	} while (i < data_len);
+ 
+@@ -212,7 +232,8 @@ int nci_hci_send_cmd(struct nci_dev *ndev, u8 gate, u8 cmd,
+ 		     const u8 *param, size_t param_len,
+ 		     struct sk_buff **skb)
+ {
+-	struct nci_conn_info    *conn_info;
++	struct nci_hcp_message *message;
++	struct nci_conn_info   *conn_info;
+ 	struct nci_data data;
+ 	int r;
+ 	u8 pipe = ndev->hci_dev->gate2pipe[gate];
+@@ -232,9 +253,15 @@ int nci_hci_send_cmd(struct nci_dev *ndev, u8 gate, u8 cmd,
+ 
+ 	r = nci_request(ndev, nci_hci_send_data_req, (unsigned long)&data,
+ 			msecs_to_jiffies(NCI_DATA_TIMEOUT));
+-
+-	if (r == NCI_STATUS_OK && skb)
+-		*skb = conn_info->rx_skb;
++	if (r == NCI_STATUS_OK) {
++		message = (struct nci_hcp_message *)conn_info->rx_skb->data;
++		r = nci_hci_result_to_errno(
++			NCI_HCP_MSG_GET_CMD(message->header));
++		skb_pull(conn_info->rx_skb, NCI_HCI_HCP_MESSAGE_HEADER_LEN);
++
++		if (!r && skb)
++			*skb = conn_info->rx_skb;
++	}
+ 
+ 	return r;
+ }
+@@ -328,9 +355,6 @@ static void nci_hci_resp_received(struct nci_dev *ndev, u8 pipe,
+ 	struct nci_conn_info    *conn_info;
+ 	u8 status = result;
+ 
+-	if (result != NCI_HCI_ANY_OK)
+-		goto exit;
+-
+ 	conn_info = ndev->hci_dev->conn_info;
+ 	if (!conn_info) {
+ 		status = NCI_STATUS_REJECTED;
+@@ -340,7 +364,7 @@ static void nci_hci_resp_received(struct nci_dev *ndev, u8 pipe,
+ 	conn_info->rx_skb = skb;
+ 
+ exit:
+-	nci_req_complete(ndev, status);
++	nci_req_complete(ndev, NCI_STATUS_OK);
+ }
+ 
+ /* Receive hcp message for pipe, with type and cmd.
+@@ -378,7 +402,7 @@ static void nci_hci_msg_rx_work(struct work_struct *work)
+ 	u8 pipe, type, instruction;
+ 
+ 	while ((skb = skb_dequeue(&hdev->msg_rx_queue)) != NULL) {
+-		pipe = skb->data[0];
++		pipe = NCI_HCP_MSG_GET_PIPE(skb->data[0]);
+ 		skb_pull(skb, NCI_HCI_HCP_PACKET_HEADER_LEN);
+ 		message = (struct nci_hcp_message *)skb->data;
+ 		type = NCI_HCP_MSG_GET_TYPE(message->header);
+@@ -395,7 +419,7 @@ void nci_hci_data_received_cb(void *context,
+ {
+ 	struct nci_dev *ndev = (struct nci_dev *)context;
+ 	struct nci_hcp_packet *packet;
+-	u8 pipe, type, instruction;
++	u8 pipe, type;
+ 	struct sk_buff *hcp_skb;
+ 	struct sk_buff *frag_skb;
+ 	int msg_len;
+@@ -415,7 +439,7 @@ void nci_hci_data_received_cb(void *context,
+ 
+ 	/* it's the last fragment. Does it need re-aggregation? */
+ 	if (skb_queue_len(&ndev->hci_dev->rx_hcp_frags)) {
+-		pipe = packet->header & NCI_HCI_FRAGMENT;
++		pipe = NCI_HCP_MSG_GET_PIPE(packet->header);
+ 		skb_queue_tail(&ndev->hci_dev->rx_hcp_frags, skb);
+ 
+ 		msg_len = 0;
+@@ -434,7 +458,7 @@ void nci_hci_data_received_cb(void *context,
+ 		*skb_put(hcp_skb, NCI_HCI_HCP_PACKET_HEADER_LEN) = pipe;
+ 
+ 		skb_queue_walk(&ndev->hci_dev->rx_hcp_frags, frag_skb) {
+-		       msg_len = frag_skb->len - NCI_HCI_HCP_PACKET_HEADER_LEN;
++			msg_len = frag_skb->len - NCI_HCI_HCP_PACKET_HEADER_LEN;
+ 			memcpy(skb_put(hcp_skb, msg_len), frag_skb->data +
+ 			       NCI_HCI_HCP_PACKET_HEADER_LEN, msg_len);
+ 		}
+@@ -452,11 +476,10 @@ void nci_hci_data_received_cb(void *context,
+ 	packet = (struct nci_hcp_packet *)hcp_skb->data;
+ 	type = NCI_HCP_MSG_GET_TYPE(packet->message.header);
+ 	if (type == NCI_HCI_HCP_RESPONSE) {
+-		pipe = packet->header;
+-		instruction = NCI_HCP_MSG_GET_CMD(packet->message.header);
+-		skb_pull(hcp_skb, NCI_HCI_HCP_PACKET_HEADER_LEN +
+-			 NCI_HCI_HCP_MESSAGE_HEADER_LEN);
+-		nci_hci_hcp_message_rx(ndev, pipe, type, instruction, hcp_skb);
++		pipe = NCI_HCP_MSG_GET_PIPE(packet->header);
++		skb_pull(hcp_skb, NCI_HCI_HCP_PACKET_HEADER_LEN);
++		nci_hci_hcp_message_rx(ndev, pipe, type,
++				       NCI_STATUS_OK, hcp_skb);
+ 	} else {
+ 		skb_queue_tail(&ndev->hci_dev->msg_rx_queue, hcp_skb);
+ 		schedule_work(&ndev->hci_dev->msg_rx_work);
+@@ -488,6 +511,7 @@ EXPORT_SYMBOL(nci_hci_open_pipe);
+ int nci_hci_set_param(struct nci_dev *ndev, u8 gate, u8 idx,
+ 		      const u8 *param, size_t param_len)
+ {
++	struct nci_hcp_message *message;
+ 	struct nci_conn_info *conn_info;
+ 	struct nci_data data;
+ 	int r;
+@@ -520,6 +544,12 @@ int nci_hci_set_param(struct nci_dev *ndev, u8 gate, u8 idx,
+ 	r = nci_request(ndev, nci_hci_send_data_req,
+ 			(unsigned long)&data,
+ 			msecs_to_jiffies(NCI_DATA_TIMEOUT));
++	if (r == NCI_STATUS_OK) {
++		message = (struct nci_hcp_message *)conn_info->rx_skb->data;
++		r = nci_hci_result_to_errno(
++			NCI_HCP_MSG_GET_CMD(message->header));
++		skb_pull(conn_info->rx_skb, NCI_HCI_HCP_MESSAGE_HEADER_LEN);
++	}
+ 
+ 	kfree(tmp);
+ 	return r;
+@@ -529,6 +559,7 @@ EXPORT_SYMBOL(nci_hci_set_param);
+ int nci_hci_get_param(struct nci_dev *ndev, u8 gate, u8 idx,
+ 		      struct sk_buff **skb)
+ {
++	struct nci_hcp_message *message;
+ 	struct nci_conn_info    *conn_info;
+ 	struct nci_data data;
+ 	int r;
+@@ -553,8 +584,15 @@ int nci_hci_get_param(struct nci_dev *ndev, u8 gate, u8 idx,
+ 	r = nci_request(ndev, nci_hci_send_data_req, (unsigned long)&data,
+ 			msecs_to_jiffies(NCI_DATA_TIMEOUT));
+ 
+-	if (r == NCI_STATUS_OK)
+-		*skb = conn_info->rx_skb;
++	if (r == NCI_STATUS_OK) {
++		message = (struct nci_hcp_message *)conn_info->rx_skb->data;
++		r = nci_hci_result_to_errno(
++			NCI_HCP_MSG_GET_CMD(message->header));
++		skb_pull(conn_info->rx_skb, NCI_HCI_HCP_MESSAGE_HEADER_LEN);
++
++		if (!r && skb)
++			*skb = conn_info->rx_skb;
++	}
+ 
+ 	return r;
+ }
+diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
+index 7851b1222a36..71cb085e16fd 100644
+--- a/net/packet/af_packet.c
++++ b/net/packet/af_packet.c
+@@ -2784,22 +2784,40 @@ static int packet_release(struct socket *sock)
+  *	Attach a packet hook.
+  */
+ 
+-static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 proto)
++static int packet_do_bind(struct sock *sk, const char *name, int ifindex,
++			  __be16 proto)
+ {
+ 	struct packet_sock *po = pkt_sk(sk);
+ 	struct net_device *dev_curr;
+ 	__be16 proto_curr;
+ 	bool need_rehook;
++	struct net_device *dev = NULL;
++	int ret = 0;
++	bool unlisted = false;
+ 
+-	if (po->fanout) {
+-		if (dev)
+-			dev_put(dev);
+-
++	if (po->fanout)
+ 		return -EINVAL;
+-	}
+ 
+ 	lock_sock(sk);
+ 	spin_lock(&po->bind_lock);
++	rcu_read_lock();
++
++	if (name) {
++		dev = dev_get_by_name_rcu(sock_net(sk), name);
++		if (!dev) {
++			ret = -ENODEV;
++			goto out_unlock;
++		}
++	} else if (ifindex) {
++		dev = dev_get_by_index_rcu(sock_net(sk), ifindex);
++		if (!dev) {
++			ret = -ENODEV;
++			goto out_unlock;
++		}
++	}
++
++	if (dev)
++		dev_hold(dev);
+ 
+ 	proto_curr = po->prot_hook.type;
+ 	dev_curr = po->prot_hook.dev;
+@@ -2807,14 +2825,29 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 proto)
+ 	need_rehook = proto_curr != proto || dev_curr != dev;
+ 
+ 	if (need_rehook) {
+-		unregister_prot_hook(sk, true);
++		if (po->running) {
++			rcu_read_unlock();
++			__unregister_prot_hook(sk, true);
++			rcu_read_lock();
++			dev_curr = po->prot_hook.dev;
++			if (dev)
++				unlisted = !dev_get_by_index_rcu(sock_net(sk),
++								 dev->ifindex);
++		}
+ 
+ 		po->num = proto;
+ 		po->prot_hook.type = proto;
+-		po->prot_hook.dev = dev;
+ 
+-		po->ifindex = dev ? dev->ifindex : 0;
+-		packet_cached_dev_assign(po, dev);
++		if (unlikely(unlisted)) {
++			dev_put(dev);
++			po->prot_hook.dev = NULL;
++			po->ifindex = -1;
++			packet_cached_dev_reset(po);
++		} else {
++			po->prot_hook.dev = dev;
++			po->ifindex = dev ? dev->ifindex : 0;
++			packet_cached_dev_assign(po, dev);
++		}
+ 	}
+ 	if (dev_curr)
+ 		dev_put(dev_curr);
+@@ -2822,7 +2855,7 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 proto)
+ 	if (proto == 0 || !need_rehook)
+ 		goto out_unlock;
+ 
+-	if (!dev || (dev->flags & IFF_UP)) {
++	if (!unlisted && (!dev || (dev->flags & IFF_UP))) {
+ 		register_prot_hook(sk);
+ 	} else {
+ 		sk->sk_err = ENETDOWN;
+@@ -2831,9 +2864,10 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 proto)
+ 	}
+ 
+ out_unlock:
++	rcu_read_unlock();
+ 	spin_unlock(&po->bind_lock);
+ 	release_sock(sk);
+-	return 0;
++	return ret;
+ }
+ 
+ /*
+@@ -2845,8 +2879,6 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr,
+ {
+ 	struct sock *sk = sock->sk;
+ 	char name[15];
+-	struct net_device *dev;
+-	int err = -ENODEV;
+ 
+ 	/*
+ 	 *	Check legality
+@@ -2856,19 +2888,13 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr,
+ 		return -EINVAL;
+ 	strlcpy(name, uaddr->sa_data, sizeof(name));
+ 
+-	dev = dev_get_by_name(sock_net(sk), name);
+-	if (dev)
+-		err = packet_do_bind(sk, dev, pkt_sk(sk)->num);
+-	return err;
++	return packet_do_bind(sk, name, 0, pkt_sk(sk)->num);
+ }
+ 
+ static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
+ {
+ 	struct sockaddr_ll *sll = (struct sockaddr_ll *)uaddr;
+ 	struct sock *sk = sock->sk;
+-	struct net_device *dev = NULL;
+-	int err;
+-
+ 
+ 	/*
+ 	 *	Check legality
+@@ -2879,16 +2905,8 @@ static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len
+ 	if (sll->sll_family != AF_PACKET)
+ 		return -EINVAL;
+ 
+-	if (sll->sll_ifindex) {
+-		err = -ENODEV;
+-		dev = dev_get_by_index(sock_net(sk), sll->sll_ifindex);
+-		if (dev == NULL)
+-			goto out;
+-	}
+-	err = packet_do_bind(sk, dev, sll->sll_protocol ? : pkt_sk(sk)->num);
+-
+-out:
+-	return err;
++	return packet_do_bind(sk, NULL, sll->sll_ifindex,
++			      sll->sll_protocol ? : pkt_sk(sk)->num);
+ }
+ 
+ static struct proto packet_proto = {
+diff --git a/net/rds/connection.c b/net/rds/connection.c
+index da6da57e5f36..9d66705f9d41 100644
+--- a/net/rds/connection.c
++++ b/net/rds/connection.c
+@@ -187,6 +187,12 @@ new_conn:
+ 		}
+ 	}
+ 
++	if (trans == NULL) {
++		kmem_cache_free(rds_conn_slab, conn);
++		conn = ERR_PTR(-ENODEV);
++		goto out;
++	}
++
+ 	conn->c_trans = trans;
+ 
+ 	ret = trans->conn_alloc(conn, gfp);
+diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c
+index fbc5ef88bc0e..27a992154804 100644
+--- a/net/rds/tcp_recv.c
++++ b/net/rds/tcp_recv.c
+@@ -214,8 +214,15 @@ static int rds_tcp_data_recv(read_descriptor_t *desc, struct sk_buff *skb,
+ 			}
+ 
+ 			to_copy = min(tc->t_tinc_data_rem, left);
+-			pskb_pull(clone, offset);
+-			pskb_trim(clone, to_copy);
++			if (!pskb_pull(clone, offset) ||
++			    pskb_trim(clone, to_copy)) {
++				pr_warn("rds_tcp_data_recv: pull/trim failed "
++					"left %zu data_rem %zu skb_len %d\n",
++					left, tc->t_tinc_data_rem, skb->len);
++				kfree_skb(clone);
++				desc->error = -ENOMEM;
++				goto out;
++			}
+ 			skb_queue_tail(&tinc->ti_skb_list, clone);
+ 
+ 			rdsdebug("skb %p data %p len %d off %u to_copy %zu -> "
+diff --git a/net/tipc/msg.c b/net/tipc/msg.c
+index 08b4cc7d496d..b3a393104b17 100644
+--- a/net/tipc/msg.c
++++ b/net/tipc/msg.c
+@@ -121,7 +121,7 @@ int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf)
+ {
+ 	struct sk_buff *head = *headbuf;
+ 	struct sk_buff *frag = *buf;
+-	struct sk_buff *tail;
++	struct sk_buff *tail = NULL;
+ 	struct tipc_msg *msg;
+ 	u32 fragid;
+ 	int delta;
+@@ -141,9 +141,15 @@ int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf)
+ 		if (unlikely(skb_unclone(frag, GFP_ATOMIC)))
+ 			goto err;
+ 		head = *headbuf = frag;
+-		skb_frag_list_init(head);
+-		TIPC_SKB_CB(head)->tail = NULL;
+ 		*buf = NULL;
++		TIPC_SKB_CB(head)->tail = NULL;
++		if (skb_is_nonlinear(head)) {
++			skb_walk_frags(head, tail) {
++				TIPC_SKB_CB(head)->tail = tail;
++			}
++		} else {
++			skb_frag_list_init(head);
++		}
+ 		return 0;
+ 	}
+ 
+diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
+index 66deebc66aa1..f8dfee5072c0 100644
+--- a/net/tipc/udp_media.c
++++ b/net/tipc/udp_media.c
+@@ -48,6 +48,7 @@
+ #include <linux/tipc_netlink.h>
+ #include "core.h"
+ #include "bearer.h"
++#include "msg.h"
+ 
+ /* IANA assigned UDP port */
+ #define UDP_PORT_DEFAULT	6118
+@@ -216,6 +217,10 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb)
+ {
+ 	struct udp_bearer *ub;
+ 	struct tipc_bearer *b;
++	int usr = msg_user(buf_msg(skb));
++
++	if ((usr == LINK_PROTOCOL) || (usr == NAME_DISTRIBUTOR))
++		skb_linearize(skb);
+ 
+ 	ub = rcu_dereference_sk_user_data(sk);
+ 	if (!ub) {
+diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
+index 76b41578a838..d059cf31d754 100644
+--- a/net/wireless/nl80211.c
++++ b/net/wireless/nl80211.c
+@@ -3408,12 +3408,6 @@ static int nl80211_start_ap(struct sk_buff *skb, struct genl_info *info)
+ 					   wdev->iftype))
+ 		return -EINVAL;
+ 
+-	if (info->attrs[NL80211_ATTR_ACL_POLICY]) {
+-		params.acl = parse_acl_data(&rdev->wiphy, info);
+-		if (IS_ERR(params.acl))
+-			return PTR_ERR(params.acl);
+-	}
+-
+ 	if (info->attrs[NL80211_ATTR_SMPS_MODE]) {
+ 		params.smps_mode =
+ 			nla_get_u8(info->attrs[NL80211_ATTR_SMPS_MODE]);
+@@ -3437,6 +3431,12 @@ static int nl80211_start_ap(struct sk_buff *skb, struct genl_info *info)
+ 		params.smps_mode = NL80211_SMPS_OFF;
+ 	}
+ 
++	if (info->attrs[NL80211_ATTR_ACL_POLICY]) {
++		params.acl = parse_acl_data(&rdev->wiphy, info);
++		if (IS_ERR(params.acl))
++			return PTR_ERR(params.acl);
++	}
++
+ 	wdev_lock(wdev);
+ 	err = rdev_start_ap(rdev, dev, &params);
+ 	if (!err) {
+diff --git a/sound/usb/midi.c b/sound/usb/midi.c
+index 417ebb11cf48..bec63e0d2605 100644
+--- a/sound/usb/midi.c
++++ b/sound/usb/midi.c
+@@ -174,6 +174,8 @@ struct snd_usb_midi_in_endpoint {
+ 		u8 running_status_length;
+ 	} ports[0x10];
+ 	u8 seen_f5;
++	bool in_sysex;
++	u8 last_cin;
+ 	u8 error_resubmit;
+ 	int current_port;
+ };
+@@ -468,6 +470,39 @@ static void snd_usbmidi_maudio_broken_running_status_input(
+ }
+ 
+ /*
++ * QinHeng CH345 is buggy: every second packet inside a SysEx has not CIN 4
++ * but the previously seen CIN, but still with three data bytes.
++ */
++static void ch345_broken_sysex_input(struct snd_usb_midi_in_endpoint *ep,
++				     uint8_t *buffer, int buffer_length)
++{
++	unsigned int i, cin, length;
++
++	for (i = 0; i + 3 < buffer_length; i += 4) {
++		if (buffer[i] == 0 && i > 0)
++			break;
++		cin = buffer[i] & 0x0f;
++		if (ep->in_sysex &&
++		    cin == ep->last_cin &&
++		    (buffer[i + 1 + (cin == 0x6)] & 0x80) == 0)
++			cin = 0x4;
++#if 0
++		if (buffer[i + 1] == 0x90) {
++			/*
++			 * Either a corrupted running status or a real note-on
++			 * message; impossible to detect reliably.
++			 */
++		}
++#endif
++		length = snd_usbmidi_cin_length[cin];
++		snd_usbmidi_input_data(ep, 0, &buffer[i + 1], length);
++		ep->in_sysex = cin == 0x4;
++		if (!ep->in_sysex)
++			ep->last_cin = cin;
++	}
++}
++
++/*
+  * CME protocol: like the standard protocol, but SysEx commands are sent as a
+  * single USB packet preceded by a 0x0F byte.
+  */
+@@ -660,6 +695,12 @@ static struct usb_protocol_ops snd_usbmidi_cme_ops = {
+ 	.output_packet = snd_usbmidi_output_standard_packet,
+ };
+ 
++static struct usb_protocol_ops snd_usbmidi_ch345_broken_sysex_ops = {
++	.input = ch345_broken_sysex_input,
++	.output = snd_usbmidi_standard_output,
++	.output_packet = snd_usbmidi_output_standard_packet,
++};
++
+ /*
+  * AKAI MPD16 protocol:
+  *
+@@ -1341,6 +1382,7 @@ static int snd_usbmidi_out_endpoint_create(struct snd_usb_midi *umidi,
+ 		 * Various chips declare a packet size larger than 4 bytes, but
+ 		 * do not actually work with larger packets:
+ 		 */
++	case USB_ID(0x0a67, 0x5011): /* Medeli DD305 */
+ 	case USB_ID(0x0a92, 0x1020): /* ESI M4U */
+ 	case USB_ID(0x1430, 0x474b): /* RedOctane GH MIDI INTERFACE */
+ 	case USB_ID(0x15ca, 0x0101): /* Textech USB Midi Cable */
+@@ -2375,6 +2417,10 @@ int snd_usbmidi_create(struct snd_card *card,
+ 
+ 		err = snd_usbmidi_detect_per_port_endpoints(umidi, endpoints);
+ 		break;
++	case QUIRK_MIDI_CH345:
++		umidi->usb_protocol_ops = &snd_usbmidi_ch345_broken_sysex_ops;
++		err = snd_usbmidi_detect_per_port_endpoints(umidi, endpoints);
++		break;
+ 	default:
+ 		dev_err(&umidi->dev->dev, "invalid quirk type %d\n",
+ 			quirk->type);
+diff --git a/sound/usb/quirks-table.h b/sound/usb/quirks-table.h
+index e4756651a52c..ecc2a4ea014d 100644
+--- a/sound/usb/quirks-table.h
++++ b/sound/usb/quirks-table.h
+@@ -2820,6 +2820,17 @@ YAMAHA_DEVICE(0x7010, "UB99"),
+ 	.idProduct = 0x1020,
+ },
+ 
++/* QinHeng devices */
++{
++	USB_DEVICE(0x1a86, 0x752d),
++	.driver_info = (unsigned long) &(const struct snd_usb_audio_quirk) {
++		.vendor_name = "QinHeng",
++		.product_name = "CH345",
++		.ifnum = 1,
++		.type = QUIRK_MIDI_CH345
++	}
++},
++
+ /* KeithMcMillen Stringport */
+ {
+ 	USB_DEVICE(0x1f38, 0x0001),
+diff --git a/sound/usb/quirks.c b/sound/usb/quirks.c
+index 00ebc0ca008e..eef9b8e4b949 100644
+--- a/sound/usb/quirks.c
++++ b/sound/usb/quirks.c
+@@ -535,6 +535,7 @@ int snd_usb_create_quirk(struct snd_usb_audio *chip,
+ 		[QUIRK_MIDI_CME] = create_any_midi_quirk,
+ 		[QUIRK_MIDI_AKAI] = create_any_midi_quirk,
+ 		[QUIRK_MIDI_FTDI] = create_any_midi_quirk,
++		[QUIRK_MIDI_CH345] = create_any_midi_quirk,
+ 		[QUIRK_AUDIO_STANDARD_INTERFACE] = create_standard_audio_quirk,
+ 		[QUIRK_AUDIO_FIXED_ENDPOINT] = create_fixed_stream_quirk,
+ 		[QUIRK_AUDIO_EDIROL_UAXX] = create_uaxx_quirk,
+@@ -1271,6 +1272,7 @@ u64 snd_usb_interface_dsd_format_quirks(struct snd_usb_audio *chip,
+ 	case USB_ID(0x20b1, 0x000a): /* Gustard DAC-X20U */
+ 	case USB_ID(0x20b1, 0x2009): /* DIYINHK DSD DXD 384kHz USB to I2S/DSD */
+ 	case USB_ID(0x20b1, 0x2023): /* JLsounds I2SoverUSB */
++	case USB_ID(0x20b1, 0x3023): /* Aune X1S 32BIT/384 DSD DAC */
+ 		if (fp->altsetting == 3)
+ 			return SNDRV_PCM_FMTBIT_DSD_U32_BE;
+ 		break;
+diff --git a/sound/usb/usbaudio.h b/sound/usb/usbaudio.h
+index 91d0380431b4..991aa84491cd 100644
+--- a/sound/usb/usbaudio.h
++++ b/sound/usb/usbaudio.h
+@@ -94,6 +94,7 @@ enum quirk_type {
+ 	QUIRK_MIDI_AKAI,
+ 	QUIRK_MIDI_US122L,
+ 	QUIRK_MIDI_FTDI,
++	QUIRK_MIDI_CH345,
+ 	QUIRK_AUDIO_STANDARD_INTERFACE,
+ 	QUIRK_AUDIO_FIXED_ENDPOINT,
+ 	QUIRK_AUDIO_EDIROL_UAXX,


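Note on the net/irda/irlmp.c hunk in the patch above: the change from `*off-- == 0` to `(*off)-- == 0` in irlmp_seq_hb_idx() is an operator-precedence fix. Postfix `--` binds tighter than unary `*`, so the old expression reads the value and then moves the pointer, while the patched one decrements the offset counter itself. The following is a minimal sketch of that difference only. The names seek_old, seek_new and slots are hypothetical, and a plain counter loop stands in for the hashbin_get_first()/hashbin_get_next() iteration.

```c
#include <stddef.h>

/*
 * Pre-patch form of the test. "*off--" parses as "*(off--)": the value is
 * read, then the POINTER is decremented, so the next iteration looks at
 * whatever sits just before the offset in memory.
 */
static int seek_old(const long *off, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (!off || *off-- == 0)
			return i;	/* index of the element "found" */
	return -1;
}

/*
 * Patched form. "(*off)--" reads the remaining offset and then decrements
 * the counter itself, so exactly *off elements are skipped.
 */
static int seek_new(long *off, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (!off || (*off)-- == 0)
			return i;
	return -1;
}
```

For illustration, with `long slots[3] = { 0, 0, 2 };` the call `seek_old(&slots[2], 3)` returns 1, because the second pass reads the neighbouring slots[1] instead of a decremented count. With `long want = 2;` the call `seek_new(&want, 3)` returns 2 and leaves `want` at -1. The array is used here only so that the stray read stays inside valid memory. In the kernel the old form would read before the caller's loff_t.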

* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-12-15 11:15 Mike Pagano
From: Mike Pagano @ 2015-12-15 11:15 UTC
  To: gentoo-commits

commit:     3a9d9184e1f0d412574eabf24e5cd3586f69d3e9
Author:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Tue Dec 15 11:15:05 2015 +0000
Commit:     Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Tue Dec 15 11:15:05 2015 +0000
URL:        https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=3a9d9184

Linux patch 4.2.8

 0000_README            |    4 +
 1007_linux-4.2.8.patch | 3882 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 3886 insertions(+)

diff --git a/0000_README b/0000_README
index 2299001..5645178 100644
--- a/0000_README
+++ b/0000_README
@@ -71,6 +71,10 @@ Patch:  1006_linux-4.2.7.patch
 From:   http://www.kernel.org
 Desc:   Linux 4.2.7
 
+Patch:  1007_linux-4.2.8.patch
+From:   http://www.kernel.org
+Desc:   Linux 4.2.8
+
 Patch:  1500_XATTR_USER_PREFIX.patch
 From:   https://bugs.gentoo.org/show_bug.cgi?id=470644
 Desc:   Support for namespace user.pax.* on tmpfs.

diff --git a/1007_linux-4.2.8.patch b/1007_linux-4.2.8.patch
new file mode 100644
index 0000000..7aca417
--- /dev/null
+++ b/1007_linux-4.2.8.patch
@@ -0,0 +1,3882 @@
+diff --git a/Makefile b/Makefile
+index f5014eaf2532..06b988951ccb 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 7
++SUBLEVEL = 8
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+ 
+diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
+index 017b7d58ae06..55f8a6a706fc 100644
+--- a/drivers/block/rbd.c
++++ b/drivers/block/rbd.c
+@@ -3439,6 +3439,7 @@ static void rbd_queue_workfn(struct work_struct *work)
+ 		goto err_rq;
+ 	}
+ 	img_request->rq = rq;
++	snapc = NULL; /* img_request consumes a ref */
+ 
+ 	if (op_type == OBJ_OP_DISCARD)
+ 		result = rbd_img_request_fill(img_request, OBJ_REQUEST_NODATA,
+diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
+index f51d376d10ba..c2f5117fd8cb 100644
+--- a/drivers/firewire/ohci.c
++++ b/drivers/firewire/ohci.c
+@@ -3675,6 +3675,11 @@ static int pci_probe(struct pci_dev *dev,
+ 
+ 	reg_write(ohci, OHCI1394_IsoXmitIntMaskSet, ~0);
+ 	ohci->it_context_support = reg_read(ohci, OHCI1394_IsoXmitIntMaskSet);
++	/* JMicron JMB38x often shows 0 at first read, just ignore it */
++	if (!ohci->it_context_support) {
++		ohci_notice(ohci, "overriding IsoXmitIntMask\n");
++		ohci->it_context_support = 0xf;
++	}
+ 	reg_write(ohci, OHCI1394_IsoXmitIntMaskClear, ~0);
+ 	ohci->it_context_mask = ohci->it_context_support;
+ 	ohci->n_it = hweight32(ohci->it_context_mask);
+diff --git a/drivers/media/pci/cobalt/Kconfig b/drivers/media/pci/cobalt/Kconfig
+index 6a1c0089bb62..4ecf171d14a2 100644
+--- a/drivers/media/pci/cobalt/Kconfig
++++ b/drivers/media/pci/cobalt/Kconfig
+@@ -1,6 +1,6 @@
+ config VIDEO_COBALT
+ 	tristate "Cisco Cobalt support"
+-	depends on VIDEO_V4L2 && I2C && MEDIA_CONTROLLER
++	depends on VIDEO_V4L2 && I2C && VIDEO_V4L2_SUBDEV_API
+ 	depends on PCI_MSI && MTD_COMPLEX_MAPPINGS && GPIOLIB
+ 	depends on SND
+ 	select I2C_ALGOBIT
+diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+index 3b90afb8c293..6f2a748524f3 100644
+--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
++++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+@@ -1325,7 +1325,12 @@ err_disable_device:
+ static void nicvf_remove(struct pci_dev *pdev)
+ {
+ 	struct net_device *netdev = pci_get_drvdata(pdev);
+-	struct nicvf *nic = netdev_priv(netdev);
++	struct nicvf *nic;
++
++	if (!netdev)
++		return;
++
++	nic = netdev_priv(netdev);
+ 
+ 	unregister_netdev(netdev);
+ 	nicvf_unregister_interrupts(nic);
+diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+index 731423ca575d..8bead97373ab 100644
+--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
++++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+@@ -4934,26 +4934,41 @@ static void rem_slave_counters(struct mlx4_dev *dev, int slave)
+ 	struct res_counter *counter;
+ 	struct res_counter *tmp;
+ 	int err;
+-	int index;
++	int *counters_arr = NULL;
++	int i, j;
+ 
+ 	err = move_all_busy(dev, slave, RES_COUNTER);
+ 	if (err)
+ 		mlx4_warn(dev, "rem_slave_counters: Could not move all counters - too busy for slave %d\n",
+ 			  slave);
+ 
+-	spin_lock_irq(mlx4_tlock(dev));
+-	list_for_each_entry_safe(counter, tmp, counter_list, com.list) {
+-		if (counter->com.owner == slave) {
+-			index = counter->com.res_id;
+-			rb_erase(&counter->com.node,
+-				 &tracker->res_tree[RES_COUNTER]);
+-			list_del(&counter->com.list);
+-			kfree(counter);
+-			__mlx4_counter_free(dev, index);
++	counters_arr = kmalloc_array(dev->caps.max_counters,
++				     sizeof(*counters_arr), GFP_KERNEL);
++	if (!counters_arr)
++		return;
++
++	do {
++		i = 0;
++		j = 0;
++		spin_lock_irq(mlx4_tlock(dev));
++		list_for_each_entry_safe(counter, tmp, counter_list, com.list) {
++			if (counter->com.owner == slave) {
++				counters_arr[i++] = counter->com.res_id;
++				rb_erase(&counter->com.node,
++					 &tracker->res_tree[RES_COUNTER]);
++				list_del(&counter->com.list);
++				kfree(counter);
++			}
++		}
++		spin_unlock_irq(mlx4_tlock(dev));
++
++		while (j < i) {
++			__mlx4_counter_free(dev, counters_arr[j++]);
+ 			mlx4_release_resource(dev, slave, RES_COUNTER, 1, 0);
+ 		}
+-	}
+-	spin_unlock_irq(mlx4_tlock(dev));
++	} while (i);
++
++	kfree(counters_arr);
+ }
+ 
+ static void rem_slave_xrcdns(struct mlx4_dev *dev, int slave)
+diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
+index a83263743665..2b7550c43f78 100644
+--- a/drivers/net/ethernet/via/via-rhine.c
++++ b/drivers/net/ethernet/via/via-rhine.c
+@@ -2134,10 +2134,11 @@ static int rhine_rx(struct net_device *dev, int limit)
+ 			}
+ 
+ 			skb_put(skb, pkt_len);
+-			skb->protocol = eth_type_trans(skb, dev);
+ 
+ 			rhine_rx_vlan_tag(skb, desc, data_size);
+ 
++			skb->protocol = eth_type_trans(skb, dev);
++
+ 			netif_receive_skb(skb);
+ 
+ 			u64_stats_update_begin(&rp->rx_stats.syncp);
+diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
+index 9c71295f2fef..85e640440bd9 100644
+--- a/drivers/net/phy/broadcom.c
++++ b/drivers/net/phy/broadcom.c
+@@ -675,7 +675,7 @@ static struct mdio_device_id __maybe_unused broadcom_tbl[] = {
+ 	{ PHY_ID_BCM5461, 0xfffffff0 },
+ 	{ PHY_ID_BCM54616S, 0xfffffff0 },
+ 	{ PHY_ID_BCM5464, 0xfffffff0 },
+-	{ PHY_ID_BCM5482, 0xfffffff0 },
++	{ PHY_ID_BCM5481, 0xfffffff0 },
+ 	{ PHY_ID_BCM5482, 0xfffffff0 },
+ 	{ PHY_ID_BCM50610, 0xfffffff0 },
+ 	{ PHY_ID_BCM50610M, 0xfffffff0 },
+diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
+index 8f1738c3b3c5..de27f510c0f3 100644
+--- a/drivers/net/usb/qmi_wwan.c
++++ b/drivers/net/usb/qmi_wwan.c
+@@ -775,6 +775,7 @@ static const struct usb_device_id products[] = {
+ 	{QMI_FIXED_INTF(0x2357, 0x9000, 4)},	/* TP-LINK MA260 */
+ 	{QMI_FIXED_INTF(0x1bc7, 0x1200, 5)},	/* Telit LE920 */
+ 	{QMI_FIXED_INTF(0x1bc7, 0x1201, 2)},	/* Telit LE920 */
++	{QMI_FIXED_INTF(0x1c9e, 0x9b01, 3)},	/* XS Stick W100-2 from 4G Systems */
+ 	{QMI_FIXED_INTF(0x0b3c, 0xc000, 4)},	/* Olivetti Olicard 100 */
+ 	{QMI_FIXED_INTF(0x0b3c, 0xc001, 4)},	/* Olivetti Olicard 120 */
+ 	{QMI_FIXED_INTF(0x0b3c, 0xc002, 4)},	/* Olivetti Olicard 140 */
+diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
+index aac314e14188..bb25b8d00570 100644
+--- a/fs/btrfs/ctree.h
++++ b/fs/btrfs/ctree.h
+@@ -3404,7 +3404,7 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans,
+ int btrfs_free_extent(struct btrfs_trans_handle *trans,
+ 		      struct btrfs_root *root,
+ 		      u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
+-		      u64 owner, u64 offset, int no_quota);
++		      u64 owner, u64 offset);
+ 
+ int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len,
+ 			       int delalloc);
+@@ -3417,7 +3417,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
+ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
+ 			 struct btrfs_root *root,
+ 			 u64 bytenr, u64 num_bytes, u64 parent,
+-			 u64 root_objectid, u64 owner, u64 offset, int no_quota);
++			 u64 root_objectid, u64 owner, u64 offset);
+ 
+ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
+ 				   struct btrfs_root *root);
+diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
+index ac3e81da6d4e..7832031fef68 100644
+--- a/fs/btrfs/delayed-ref.c
++++ b/fs/btrfs/delayed-ref.c
+@@ -197,6 +197,119 @@ static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
+ 		trans->delayed_ref_updates--;
+ }
+ 
++static bool merge_ref(struct btrfs_trans_handle *trans,
++		      struct btrfs_delayed_ref_root *delayed_refs,
++		      struct btrfs_delayed_ref_head *head,
++		      struct btrfs_delayed_ref_node *ref,
++		      u64 seq)
++{
++	struct btrfs_delayed_ref_node *next;
++	bool done = false;
++
++	next = list_first_entry(&head->ref_list, struct btrfs_delayed_ref_node,
++				list);
++	while (!done && &next->list != &head->ref_list) {
++		int mod;
++		struct btrfs_delayed_ref_node *next2;
++
++		next2 = list_next_entry(next, list);
++
++		if (next == ref)
++			goto next;
++
++		if (seq && next->seq >= seq)
++			goto next;
++
++		if (next->type != ref->type)
++			goto next;
++
++		if ((ref->type == BTRFS_TREE_BLOCK_REF_KEY ||
++		     ref->type == BTRFS_SHARED_BLOCK_REF_KEY) &&
++		    comp_tree_refs(btrfs_delayed_node_to_tree_ref(ref),
++				   btrfs_delayed_node_to_tree_ref(next),
++				   ref->type))
++			goto next;
++		if ((ref->type == BTRFS_EXTENT_DATA_REF_KEY ||
++		     ref->type == BTRFS_SHARED_DATA_REF_KEY) &&
++		    comp_data_refs(btrfs_delayed_node_to_data_ref(ref),
++				   btrfs_delayed_node_to_data_ref(next)))
++			goto next;
++
++		if (ref->action == next->action) {
++			mod = next->ref_mod;
++		} else {
++			if (ref->ref_mod < next->ref_mod) {
++				swap(ref, next);
++				done = true;
++			}
++			mod = -next->ref_mod;
++		}
++
++		drop_delayed_ref(trans, delayed_refs, head, next);
++		ref->ref_mod += mod;
++		if (ref->ref_mod == 0) {
++			drop_delayed_ref(trans, delayed_refs, head, ref);
++			done = true;
++		} else {
++			/*
++			 * Can't have multiples of the same ref on a tree block.
++			 */
++			WARN_ON(ref->type == BTRFS_TREE_BLOCK_REF_KEY ||
++				ref->type == BTRFS_SHARED_BLOCK_REF_KEY);
++		}
++next:
++		next = next2;
++	}
++
++	return done;
++}
++
++void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
++			      struct btrfs_fs_info *fs_info,
++			      struct btrfs_delayed_ref_root *delayed_refs,
++			      struct btrfs_delayed_ref_head *head)
++{
++	struct btrfs_delayed_ref_node *ref;
++	u64 seq = 0;
++
++	assert_spin_locked(&head->lock);
++
++	if (list_empty(&head->ref_list))
++		return;
++
++	/* We don't have too many refs to merge for data. */
++	if (head->is_data)
++		return;
++
++	spin_lock(&fs_info->tree_mod_seq_lock);
++	if (!list_empty(&fs_info->tree_mod_seq_list)) {
++		struct seq_list *elem;
++
++		elem = list_first_entry(&fs_info->tree_mod_seq_list,
++					struct seq_list, list);
++		seq = elem->seq;
++	}
++	spin_unlock(&fs_info->tree_mod_seq_lock);
++
++	ref = list_first_entry(&head->ref_list, struct btrfs_delayed_ref_node,
++			       list);
++	while (&ref->list != &head->ref_list) {
++		if (seq && ref->seq >= seq)
++			goto next;
++
++		if (merge_ref(trans, delayed_refs, head, ref, seq)) {
++			if (list_empty(&head->ref_list))
++				break;
++			ref = list_first_entry(&head->ref_list,
++					       struct btrfs_delayed_ref_node,
++					       list);
++			continue;
++		}
++next:
++		ref = list_next_entry(ref, list);
++	}
++}
++
+ int btrfs_check_delayed_seq(struct btrfs_fs_info *fs_info,
+ 			    struct btrfs_delayed_ref_root *delayed_refs,
+ 			    u64 seq)
+@@ -292,8 +405,7 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans,
+ 	exist = list_entry(href->ref_list.prev, struct btrfs_delayed_ref_node,
+ 			   list);
+ 	/* No need to compare bytenr nor is_head */
+-	if (exist->type != ref->type || exist->no_quota != ref->no_quota ||
+-	    exist->seq != ref->seq)
++	if (exist->type != ref->type || exist->seq != ref->seq)
+ 		goto add_tail;
+ 
+ 	if ((exist->type == BTRFS_TREE_BLOCK_REF_KEY ||
+@@ -524,7 +636,7 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+ 		     struct btrfs_delayed_ref_head *head_ref,
+ 		     struct btrfs_delayed_ref_node *ref, u64 bytenr,
+ 		     u64 num_bytes, u64 parent, u64 ref_root, int level,
+-		     int action, int no_quota)
++		     int action)
+ {
+ 	struct btrfs_delayed_tree_ref *full_ref;
+ 	struct btrfs_delayed_ref_root *delayed_refs;
+@@ -546,7 +658,6 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+ 	ref->action = action;
+ 	ref->is_head = 0;
+ 	ref->in_tree = 1;
+-	ref->no_quota = no_quota;
+ 	ref->seq = seq;
+ 
+ 	full_ref = btrfs_delayed_node_to_tree_ref(ref);
+@@ -579,7 +690,7 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
+ 		     struct btrfs_delayed_ref_head *head_ref,
+ 		     struct btrfs_delayed_ref_node *ref, u64 bytenr,
+ 		     u64 num_bytes, u64 parent, u64 ref_root, u64 owner,
+-		     u64 offset, int action, int no_quota)
++		     u64 offset, int action)
+ {
+ 	struct btrfs_delayed_data_ref *full_ref;
+ 	struct btrfs_delayed_ref_root *delayed_refs;
+@@ -602,7 +713,6 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
+ 	ref->action = action;
+ 	ref->is_head = 0;
+ 	ref->in_tree = 1;
+-	ref->no_quota = no_quota;
+ 	ref->seq = seq;
+ 
+ 	full_ref = btrfs_delayed_node_to_data_ref(ref);
+@@ -633,17 +743,13 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+ 			       struct btrfs_trans_handle *trans,
+ 			       u64 bytenr, u64 num_bytes, u64 parent,
+ 			       u64 ref_root,  int level, int action,
+-			       struct btrfs_delayed_extent_op *extent_op,
+-			       int no_quota)
++			       struct btrfs_delayed_extent_op *extent_op)
+ {
+ 	struct btrfs_delayed_tree_ref *ref;
+ 	struct btrfs_delayed_ref_head *head_ref;
+ 	struct btrfs_delayed_ref_root *delayed_refs;
+ 	struct btrfs_qgroup_extent_record *record = NULL;
+ 
+-	if (!is_fstree(ref_root) || !fs_info->quota_enabled)
+-		no_quota = 0;
+-
+ 	BUG_ON(extent_op && extent_op->is_data);
+ 	ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS);
+ 	if (!ref)
+@@ -672,8 +778,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+ 					bytenr, num_bytes, action, 0);
+ 
+ 	add_delayed_tree_ref(fs_info, trans, head_ref, &ref->node, bytenr,
+-				   num_bytes, parent, ref_root, level, action,
+-				   no_quota);
++			     num_bytes, parent, ref_root, level, action);
+ 	spin_unlock(&delayed_refs->lock);
+ 
+ 	return 0;
+@@ -694,17 +799,13 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
+ 			       u64 bytenr, u64 num_bytes,
+ 			       u64 parent, u64 ref_root,
+ 			       u64 owner, u64 offset, int action,
+-			       struct btrfs_delayed_extent_op *extent_op,
+-			       int no_quota)
++			       struct btrfs_delayed_extent_op *extent_op)
+ {
+ 	struct btrfs_delayed_data_ref *ref;
+ 	struct btrfs_delayed_ref_head *head_ref;
+ 	struct btrfs_delayed_ref_root *delayed_refs;
+ 	struct btrfs_qgroup_extent_record *record = NULL;
+ 
+-	if (!is_fstree(ref_root) || !fs_info->quota_enabled)
+-		no_quota = 0;
+-
+ 	BUG_ON(extent_op && !extent_op->is_data);
+ 	ref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep, GFP_NOFS);
+ 	if (!ref)
+@@ -740,7 +841,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
+ 
+ 	add_delayed_data_ref(fs_info, trans, head_ref, &ref->node, bytenr,
+ 				   num_bytes, parent, ref_root, owner, offset,
+-				   action, no_quota);
++				   action);
+ 	spin_unlock(&delayed_refs->lock);
+ 
+ 	return 0;
+diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
+index 13fb5e6090fe..930887a4275f 100644
+--- a/fs/btrfs/delayed-ref.h
++++ b/fs/btrfs/delayed-ref.h
+@@ -68,7 +68,6 @@ struct btrfs_delayed_ref_node {
+ 
+ 	unsigned int action:8;
+ 	unsigned int type:8;
+-	unsigned int no_quota:1;
+ 	/* is this node still in the rbtree? */
+ 	unsigned int is_head:1;
+ 	unsigned int in_tree:1;
+@@ -233,15 +232,13 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+ 			       struct btrfs_trans_handle *trans,
+ 			       u64 bytenr, u64 num_bytes, u64 parent,
+ 			       u64 ref_root, int level, int action,
+-			       struct btrfs_delayed_extent_op *extent_op,
+-			       int no_quota);
++			       struct btrfs_delayed_extent_op *extent_op);
+ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
+ 			       struct btrfs_trans_handle *trans,
+ 			       u64 bytenr, u64 num_bytes,
+ 			       u64 parent, u64 ref_root,
+ 			       u64 owner, u64 offset, int action,
+-			       struct btrfs_delayed_extent_op *extent_op,
+-			       int no_quota);
++			       struct btrfs_delayed_extent_op *extent_op);
+ int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
+ 				struct btrfs_trans_handle *trans,
+ 				u64 bytenr, u64 num_bytes,
+diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
+index 07204bf601ed..5d870c4eac05 100644
+--- a/fs/btrfs/extent-tree.c
++++ b/fs/btrfs/extent-tree.c
+@@ -95,8 +95,7 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
+ 				     struct btrfs_root *root,
+ 				     u64 parent, u64 root_objectid,
+ 				     u64 flags, struct btrfs_disk_key *key,
+-				     int level, struct btrfs_key *ins,
+-				     int no_quota);
++				     int level, struct btrfs_key *ins);
+ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
+ 			  struct btrfs_root *extent_root, u64 flags,
+ 			  int force);
+@@ -1941,8 +1940,7 @@ int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
+ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
+ 			 struct btrfs_root *root,
+ 			 u64 bytenr, u64 num_bytes, u64 parent,
+-			 u64 root_objectid, u64 owner, u64 offset,
+-			 int no_quota)
++			 u64 root_objectid, u64 owner, u64 offset)
+ {
+ 	int ret;
+ 	struct btrfs_fs_info *fs_info = root->fs_info;
+@@ -1954,12 +1952,12 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
+ 		ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
+ 					num_bytes,
+ 					parent, root_objectid, (int)owner,
+-					BTRFS_ADD_DELAYED_REF, NULL, no_quota);
++					BTRFS_ADD_DELAYED_REF, NULL);
+ 	} else {
+ 		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
+ 					num_bytes,
+ 					parent, root_objectid, owner, offset,
+-					BTRFS_ADD_DELAYED_REF, NULL, no_quota);
++					BTRFS_ADD_DELAYED_REF, NULL);
+ 	}
+ 	return ret;
+ }
+@@ -1980,15 +1978,11 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
+ 	u64 num_bytes = node->num_bytes;
+ 	u64 refs;
+ 	int ret;
+-	int no_quota = node->no_quota;
+ 
+ 	path = btrfs_alloc_path();
+ 	if (!path)
+ 		return -ENOMEM;
+ 
+-	if (!is_fstree(root_objectid) || !root->fs_info->quota_enabled)
+-		no_quota = 1;
+-
+ 	path->reada = 1;
+ 	path->leave_spinning = 1;
+ 	/* this will setup the path even if it fails to insert the back ref */
+@@ -2223,8 +2217,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
+ 						parent, ref_root,
+ 						extent_op->flags_to_set,
+ 						&extent_op->key,
+-						ref->level, &ins,
+-						node->no_quota);
++						ref->level, &ins);
+ 	} else if (node->action == BTRFS_ADD_DELAYED_REF) {
+ 		ret = __btrfs_inc_extent_ref(trans, root, node,
+ 					     parent, ref_root,
+@@ -2365,7 +2358,21 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
+ 			}
+ 		}
+ 
++		/*
++		 * We need to try and merge add/drops of the same ref since we
++		 * can run into issues with relocate dropping the implicit ref
++		 * and then it being added back again before the drop can
++		 * finish.  If we merged anything we need to re-loop so we can
++		 * get a good ref.
++		 * Or we can get node references of the same type that weren't
++		 * merged when created due to bumps in the tree mod seq, and
++		 * we need to merge them to prevent adding an inline extent
++		 * backref before dropping it (triggering a BUG_ON at
++		 * insert_inline_extent_backref()).
++		 */
+ 		spin_lock(&locked_ref->lock);
++		btrfs_merge_delayed_refs(trans, fs_info, delayed_refs,
++					 locked_ref);
+ 
+ 		/*
+ 		 * locked_ref is the head node, so we have to go one
+@@ -3038,7 +3045,7 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
+ 	int level;
+ 	int ret = 0;
+ 	int (*process_func)(struct btrfs_trans_handle *, struct btrfs_root *,
+-			    u64, u64, u64, u64, u64, u64, int);
++			    u64, u64, u64, u64, u64, u64);
+ 
+ 
+ 	if (btrfs_test_is_dummy_root(root))
+@@ -3079,15 +3086,14 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
+ 			key.offset -= btrfs_file_extent_offset(buf, fi);
+ 			ret = process_func(trans, root, bytenr, num_bytes,
+ 					   parent, ref_root, key.objectid,
+-					   key.offset, 1);
++					   key.offset);
+ 			if (ret)
+ 				goto fail;
+ 		} else {
+ 			bytenr = btrfs_node_blockptr(buf, i);
+ 			num_bytes = root->nodesize;
+ 			ret = process_func(trans, root, bytenr, num_bytes,
+-					   parent, ref_root, level - 1, 0,
+-					   1);
++					   parent, ref_root, level - 1, 0);
+ 			if (ret)
+ 				goto fail;
+ 		}
+@@ -6137,7 +6143,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
+ 	int extent_slot = 0;
+ 	int found_extent = 0;
+ 	int num_to_del = 1;
+-	int no_quota = node->no_quota;
+ 	u32 item_size;
+ 	u64 refs;
+ 	u64 bytenr = node->bytenr;
+@@ -6146,9 +6151,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
+ 	bool skinny_metadata = btrfs_fs_incompat(root->fs_info,
+ 						 SKINNY_METADATA);
+ 
+-	if (!info->quota_enabled || !is_fstree(root_objectid))
+-		no_quota = 1;
+-
+ 	path = btrfs_alloc_path();
+ 	if (!path)
+ 		return -ENOMEM;
+@@ -6474,7 +6476,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
+ 					buf->start, buf->len,
+ 					parent, root->root_key.objectid,
+ 					btrfs_header_level(buf),
+-					BTRFS_DROP_DELAYED_REF, NULL, 0);
++					BTRFS_DROP_DELAYED_REF, NULL);
+ 		BUG_ON(ret); /* -ENOMEM */
+ 	}
+ 
+@@ -6522,7 +6524,7 @@ out:
+ /* Can return -ENOMEM */
+ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+ 		      u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
+-		      u64 owner, u64 offset, int no_quota)
++		      u64 owner, u64 offset)
+ {
+ 	int ret;
+ 	struct btrfs_fs_info *fs_info = root->fs_info;
+@@ -6545,13 +6547,13 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+ 		ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
+ 					num_bytes,
+ 					parent, root_objectid, (int)owner,
+-					BTRFS_DROP_DELAYED_REF, NULL, no_quota);
++					BTRFS_DROP_DELAYED_REF, NULL);
+ 	} else {
+ 		ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
+ 						num_bytes,
+ 						parent, root_objectid, owner,
+ 						offset, BTRFS_DROP_DELAYED_REF,
+-						NULL, no_quota);
++						NULL);
+ 	}
+ 	return ret;
+ }
+@@ -7333,8 +7335,7 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
+ 				     struct btrfs_root *root,
+ 				     u64 parent, u64 root_objectid,
+ 				     u64 flags, struct btrfs_disk_key *key,
+-				     int level, struct btrfs_key *ins,
+-				     int no_quota)
++				     int level, struct btrfs_key *ins)
+ {
+ 	int ret;
+ 	struct btrfs_fs_info *fs_info = root->fs_info;
+@@ -7424,7 +7425,7 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
+ 	ret = btrfs_add_delayed_data_ref(root->fs_info, trans, ins->objectid,
+ 					 ins->offset, 0,
+ 					 root_objectid, owner, offset,
+-					 BTRFS_ADD_DELAYED_EXTENT, NULL, 0);
++					 BTRFS_ADD_DELAYED_EXTENT, NULL);
+ 	return ret;
+ }
+ 
+@@ -7641,7 +7642,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
+ 						 ins.objectid, ins.offset,
+ 						 parent, root_objectid, level,
+ 						 BTRFS_ADD_DELAYED_EXTENT,
+-						 extent_op, 0);
++						 extent_op);
+ 		if (ret)
+ 			goto out_free_delayed;
+ 	}
+@@ -8189,7 +8190,7 @@ skip:
+ 			}
+ 		}
+ 		ret = btrfs_free_extent(trans, root, bytenr, blocksize, parent,
+-				root->root_key.objectid, level - 1, 0, 0);
++				root->root_key.objectid, level - 1, 0);
+ 		BUG_ON(ret); /* -ENOMEM */
+ 	}
+ 	btrfs_tree_unlock(next);
+diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
+index b823fac91c92..5e314856a58c 100644
+--- a/fs/btrfs/file.c
++++ b/fs/btrfs/file.c
+@@ -756,8 +756,16 @@ next_slot:
+ 		}
+ 
+ 		btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
+-		if (key.objectid > ino ||
+-		    key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
++
++		if (key.objectid > ino)
++			break;
++		if (WARN_ON_ONCE(key.objectid < ino) ||
++		    key.type < BTRFS_EXTENT_DATA_KEY) {
++			ASSERT(del_nr == 0);
++			path->slots[0]++;
++			goto next_slot;
++		}
++		if (key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
+ 			break;
+ 
+ 		fi = btrfs_item_ptr(leaf, path->slots[0],
+@@ -776,8 +784,8 @@ next_slot:
+ 				btrfs_file_extent_inline_len(leaf,
+ 						     path->slots[0], fi);
+ 		} else {
+-			WARN_ON(1);
+-			extent_end = search_start;
++			/* can't happen */
++			BUG();
+ 		}
+ 
+ 		/*
+@@ -847,7 +855,7 @@ next_slot:
+ 						disk_bytenr, num_bytes, 0,
+ 						root->root_key.objectid,
+ 						new_key.objectid,
+-						start - extent_offset, 1);
++						start - extent_offset);
+ 				BUG_ON(ret); /* -ENOMEM */
+ 			}
+ 			key.offset = start;
+@@ -925,7 +933,7 @@ delete_extent_item:
+ 						disk_bytenr, num_bytes, 0,
+ 						root->root_key.objectid,
+ 						key.objectid, key.offset -
+-						extent_offset, 0);
++						extent_offset);
+ 				BUG_ON(ret); /* -ENOMEM */
+ 				inode_sub_bytes(inode,
+ 						extent_end - key.offset);
+@@ -1204,7 +1212,7 @@ again:
+ 
+ 		ret = btrfs_inc_extent_ref(trans, root, bytenr, num_bytes, 0,
+ 					   root->root_key.objectid,
+-					   ino, orig_offset, 1);
++					   ino, orig_offset);
+ 		BUG_ON(ret); /* -ENOMEM */
+ 
+ 		if (split == start) {
+@@ -1231,7 +1239,7 @@ again:
+ 		del_nr++;
+ 		ret = btrfs_free_extent(trans, root, bytenr, num_bytes,
+ 					0, root->root_key.objectid,
+-					ino, orig_offset, 0);
++					ino, orig_offset);
+ 		BUG_ON(ret); /* -ENOMEM */
+ 	}
+ 	other_start = 0;
+@@ -1248,7 +1256,7 @@ again:
+ 		del_nr++;
+ 		ret = btrfs_free_extent(trans, root, bytenr, num_bytes,
+ 					0, root->root_key.objectid,
+-					ino, orig_offset, 0);
++					ino, orig_offset);
+ 		BUG_ON(ret); /* -ENOMEM */
+ 	}
+ 	if (del_nr == 0) {
+@@ -1868,8 +1876,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
+ 	struct btrfs_log_ctx ctx;
+ 	int ret = 0;
+ 	bool full_sync = 0;
+-	const u64 len = end - start + 1;
++	u64 len;
+ 
++	/*
++	 * The range length can be represented by u64, we have to do the typecasts
++	 * to avoid signed overflow if it's [0, LLONG_MAX] eg. from fsync()
++	 */
++	len = (u64)end - (u64)start + 1;
+ 	trace_btrfs_sync_file(file, datasync);
+ 
+ 	/*
+@@ -2057,8 +2070,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
+ 			}
+ 		}
+ 		if (!full_sync) {
+-			ret = btrfs_wait_ordered_range(inode, start,
+-						       end - start + 1);
++			ret = btrfs_wait_ordered_range(inode, start, len);
+ 			if (ret) {
+ 				btrfs_end_transaction(trans, root);
+ 				goto out;
+diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
+index b54e63038b96..9aabff2102f8 100644
+--- a/fs/btrfs/inode.c
++++ b/fs/btrfs/inode.c
+@@ -1294,8 +1294,14 @@ next_slot:
+ 		num_bytes = 0;
+ 		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
+ 
+-		if (found_key.objectid > ino ||
+-		    found_key.type > BTRFS_EXTENT_DATA_KEY ||
++		if (found_key.objectid > ino)
++			break;
++		if (WARN_ON_ONCE(found_key.objectid < ino) ||
++		    found_key.type < BTRFS_EXTENT_DATA_KEY) {
++			path->slots[0]++;
++			goto next_slot;
++		}
++		if (found_key.type > BTRFS_EXTENT_DATA_KEY ||
+ 		    found_key.offset > end)
+ 			break;
+ 
+@@ -2569,7 +2575,7 @@ again:
+ 	ret = btrfs_inc_extent_ref(trans, root, new->bytenr,
+ 			new->disk_len, 0,
+ 			backref->root_id, backref->inum,
+-			new->file_pos, 0);	/* start - extent_offset */
++			new->file_pos);	/* start - extent_offset */
+ 	if (ret) {
+ 		btrfs_abort_transaction(trans, root, ret);
+ 		goto out_free_path;
+@@ -4184,6 +4190,47 @@ static int truncate_space_check(struct btrfs_trans_handle *trans,
+ 
+ }
+ 
++static int truncate_inline_extent(struct inode *inode,
++				  struct btrfs_path *path,
++				  struct btrfs_key *found_key,
++				  const u64 item_end,
++				  const u64 new_size)
++{
++	struct extent_buffer *leaf = path->nodes[0];
++	int slot = path->slots[0];
++	struct btrfs_file_extent_item *fi;
++	u32 size = (u32)(new_size - found_key->offset);
++	struct btrfs_root *root = BTRFS_I(inode)->root;
++
++	fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
++
++	if (btrfs_file_extent_compression(leaf, fi) != BTRFS_COMPRESS_NONE) {
++		loff_t offset = new_size;
++		loff_t page_end = ALIGN(offset, PAGE_CACHE_SIZE);
++
++		/*
++		 * Zero out the remaining of the last page of our inline extent,
++		 * instead of directly truncating our inline extent here - that
++		 * would be much more complex (decompressing all the data, then
++		 * compressing the truncated data, which might be bigger than
++		 * the size of the inline extent, resize the extent, etc).
++		 * We release the path because to get the page we might need to
++		 * read the extent item from disk (data not in the page cache).
++		 */
++		btrfs_release_path(path);
++		return btrfs_truncate_page(inode, offset, page_end - offset, 0);
++	}
++
++	btrfs_set_file_extent_ram_bytes(leaf, fi, size);
++	size = btrfs_file_extent_calc_inline_size(size);
++	btrfs_truncate_item(root, path, size, 1);
++
++	if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
++		inode_sub_bytes(inode, item_end + 1 - new_size);
++
++	return 0;
++}
++
+ /*
+  * this can truncate away extent items, csum items and directory items.
+  * It starts at a high offset and removes keys until it can't find
+@@ -4378,27 +4425,40 @@ search_again:
+ 			 * special encodings
+ 			 */
+ 			if (!del_item &&
+-			    btrfs_file_extent_compression(leaf, fi) == 0 &&
+ 			    btrfs_file_extent_encryption(leaf, fi) == 0 &&
+ 			    btrfs_file_extent_other_encoding(leaf, fi) == 0) {
+-				u32 size = new_size - found_key.offset;
+-
+-				if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
+-					inode_sub_bytes(inode, item_end + 1 -
+-							new_size);
+ 
+ 				/*
+-				 * update the ram bytes to properly reflect
+-				 * the new size of our item
++				 * Need to release path in order to truncate a
++				 * compressed extent. So delete any accumulated
++				 * extent items so far.
+ 				 */
+-				btrfs_set_file_extent_ram_bytes(leaf, fi, size);
+-				size =
+-				    btrfs_file_extent_calc_inline_size(size);
+-				btrfs_truncate_item(root, path, size, 1);
++				if (btrfs_file_extent_compression(leaf, fi) !=
++				    BTRFS_COMPRESS_NONE && pending_del_nr) {
++					err = btrfs_del_items(trans, root, path,
++							      pending_del_slot,
++							      pending_del_nr);
++					if (err) {
++						btrfs_abort_transaction(trans,
++									root,
++									err);
++						goto error;
++					}
++					pending_del_nr = 0;
++				}
++
++				err = truncate_inline_extent(inode, path,
++							     &found_key,
++							     item_end,
++							     new_size);
++				if (err) {
++					btrfs_abort_transaction(trans,
++								root, err);
++					goto error;
++				}
+ 			} else if (test_bit(BTRFS_ROOT_REF_COWS,
+ 					    &root->state)) {
+-				inode_sub_bytes(inode, item_end + 1 -
+-						found_key.offset);
++				inode_sub_bytes(inode, item_end + 1 - new_size);
+ 			}
+ 		}
+ delete:
+@@ -4428,7 +4488,7 @@ delete:
+ 			ret = btrfs_free_extent(trans, root, extent_start,
+ 						extent_num_bytes, 0,
+ 						btrfs_header_owner(leaf),
+-						ino, extent_offset, 0);
++						ino, extent_offset);
+ 			BUG_ON(ret);
+ 			if (btrfs_should_throttle_delayed_refs(trans, root))
+ 				btrfs_async_run_delayed_refs(root,
+diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
+index 641d3dc4f31e..be4e53c61dd9 100644
+--- a/fs/btrfs/ioctl.c
++++ b/fs/btrfs/ioctl.c
+@@ -3195,41 +3195,6 @@ out:
+ 	return ret;
+ }
+ 
+-/* Helper to check and see if this root currently has a ref on the given disk
+- * bytenr.  If it does then we need to update the quota for this root.  This
+- * doesn't do anything if quotas aren't enabled.
+- */
+-static int check_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+-		     u64 disko)
+-{
+-	struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem);
+-	struct ulist *roots;
+-	struct ulist_iterator uiter;
+-	struct ulist_node *root_node = NULL;
+-	int ret;
+-
+-	if (!root->fs_info->quota_enabled)
+-		return 1;
+-
+-	btrfs_get_tree_mod_seq(root->fs_info, &tree_mod_seq_elem);
+-	ret = btrfs_find_all_roots(trans, root->fs_info, disko,
+-				   tree_mod_seq_elem.seq, &roots);
+-	if (ret < 0)
+-		goto out;
+-	ret = 0;
+-	ULIST_ITER_INIT(&uiter);
+-	while ((root_node = ulist_next(roots, &uiter))) {
+-		if (root_node->val == root->objectid) {
+-			ret = 1;
+-			break;
+-		}
+-	}
+-	ulist_free(roots);
+-out:
+-	btrfs_put_tree_mod_seq(root->fs_info, &tree_mod_seq_elem);
+-	return ret;
+-}
+-
+ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
+ 				     struct inode *inode,
+ 				     u64 endoff,
+@@ -3320,6 +3285,150 @@ static void clone_update_extent_map(struct inode *inode,
+ 			&BTRFS_I(inode)->runtime_flags);
+ }
+ 
++/*
++ * Make sure we do not end up inserting an inline extent into a file that
++ * already has other (non-inline) extents. If a file has an inline extent it can
++ * not have any other extents and the (single) inline extent must start at the
++ * file offset 0. Failing to respect these rules will lead to file corruption,
++ * resulting in EIO errors on read/write operations, hitting BUG_ON's in mm, etc
++ *
++ * We can have extents that have been already written to disk or we can have
++ * dirty ranges still in delalloc, in which case the extent maps and items are
++ * created only when we run delalloc, and the delalloc ranges might fall outside
++ * the range we are currently locking in the inode's io tree. So we check the
++ * inode's i_size because of that (i_size updates are done while holding the
++ * i_mutex, which we are holding here).
++ * We also check to see if the inode has a size not greater than "datal" but has
++ * extents beyond it, due to an fallocate with FALLOC_FL_KEEP_SIZE (and we are
++ * protected against such concurrent fallocate calls by the i_mutex).
++ *
++ * If the file has no extents but a size greater than datal, do not allow the
++ * copy because we would need to turn the inline extent into a non-inline one
++ * (even with NO_HOLES enabled). If we find our destination inode only has one
++ * inline extent, just overwrite it with the source inline extent if its size
++ * is less than the source extent's size, or we could copy the source inline
++ * extent's data into the destination inode's inline extent if the latter is
++ * greater than the former.
++ */
++static int clone_copy_inline_extent(struct inode *src,
++				    struct inode *dst,
++				    struct btrfs_trans_handle *trans,
++				    struct btrfs_path *path,
++				    struct btrfs_key *new_key,
++				    const u64 drop_start,
++				    const u64 datal,
++				    const u64 skip,
++				    const u64 size,
++				    char *inline_data)
++{
++	struct btrfs_root *root = BTRFS_I(dst)->root;
++	const u64 aligned_end = ALIGN(new_key->offset + datal,
++				      root->sectorsize);
++	int ret;
++	struct btrfs_key key;
++
++	if (new_key->offset > 0)
++		return -EOPNOTSUPP;
++
++	key.objectid = btrfs_ino(dst);
++	key.type = BTRFS_EXTENT_DATA_KEY;
++	key.offset = 0;
++	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
++	if (ret < 0) {
++		return ret;
++	} else if (ret > 0) {
++		if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
++			ret = btrfs_next_leaf(root, path);
++			if (ret < 0)
++				return ret;
++			else if (ret > 0)
++				goto copy_inline_extent;
++		}
++		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
++		if (key.objectid == btrfs_ino(dst) &&
++		    key.type == BTRFS_EXTENT_DATA_KEY) {
++			ASSERT(key.offset > 0);
++			return -EOPNOTSUPP;
++		}
++	} else if (i_size_read(dst) <= datal) {
++		struct btrfs_file_extent_item *ei;
++		u64 ext_len;
++
++		/*
++		 * If the file size is <= datal, make sure there are no other
++		 * extents following (can happen due to an fallocate call with
++		 * the flag FALLOC_FL_KEEP_SIZE).
++		 */
++		ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
++				    struct btrfs_file_extent_item);
++		/*
++		 * If it's an inline extent, it can not have other extents
++		 * following it.
++		 */
++		if (btrfs_file_extent_type(path->nodes[0], ei) ==
++		    BTRFS_FILE_EXTENT_INLINE)
++			goto copy_inline_extent;
++
++		ext_len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
++		if (ext_len > aligned_end)
++			return -EOPNOTSUPP;
++
++		ret = btrfs_next_item(root, path);
++		if (ret < 0) {
++			return ret;
++		} else if (ret == 0) {
++			btrfs_item_key_to_cpu(path->nodes[0], &key,
++					      path->slots[0]);
++			if (key.objectid == btrfs_ino(dst) &&
++			    key.type == BTRFS_EXTENT_DATA_KEY)
++				return -EOPNOTSUPP;
++		}
++	}
++
++copy_inline_extent:
++	/*
++	 * We have no extent items, or we have an extent at offset 0 which may
++	 * or may not be inlined. All these cases are dealt with the same way.
++	 */
++	if (i_size_read(dst) > datal) {
++		/*
++		 * If the destination inode has an inline extent...
++		 * This would require copying the data from the source inline
++		 * extent into the beginning of the destination's inline extent.
++		 * But this is really complex, both extents can be compressed
++		 * or just one of them, which would require decompressing and
++		 * re-compressing data (which could increase the new compressed
++		 * size, not allowing the compressed data to fit anymore in an
++		 * inline extent).
++		 * So just don't support this case for now (it should be rare,
++		 * we are not really saving space when cloning inline extents).
++		 */
++		return -EOPNOTSUPP;
++	}
++
++	btrfs_release_path(path);
++	ret = btrfs_drop_extents(trans, root, dst, drop_start, aligned_end, 1);
++	if (ret)
++		return ret;
++	ret = btrfs_insert_empty_item(trans, root, path, new_key, size);
++	if (ret)
++		return ret;
++
++	if (skip) {
++		const u32 start = btrfs_file_extent_calc_inline_size(0);
++
++		memmove(inline_data + start, inline_data + start + skip, datal);
++	}
++
++	write_extent_buffer(path->nodes[0], inline_data,
++			    btrfs_item_ptr_offset(path->nodes[0],
++						  path->slots[0]),
++			    size);
++	inode_add_bytes(dst, datal);
++
++	return 0;
++}
++
+ /**
+  * btrfs_clone() - clone a range from inode file to another
+  *
+@@ -3344,9 +3453,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
+ 	u32 nritems;
+ 	int slot;
+ 	int ret;
+-	int no_quota;
+ 	const u64 len = olen_aligned;
+-	u64 last_disko = 0;
+ 	u64 last_dest_end = destoff;
+ 
+ 	ret = -ENOMEM;
+@@ -3392,7 +3499,6 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
+ 
+ 		nritems = btrfs_header_nritems(path->nodes[0]);
+ process_slot:
+-		no_quota = 1;
+ 		if (path->slots[0] >= nritems) {
+ 			ret = btrfs_next_leaf(BTRFS_I(src)->root, path);
+ 			if (ret < 0)
+@@ -3544,35 +3650,13 @@ process_slot:
+ 				btrfs_set_file_extent_num_bytes(leaf, extent,
+ 								datal);
+ 
+-				/*
+-				 * We need to look up the roots that point at
+-				 * this bytenr and see if the new root does.  If
+-				 * it does not we need to make sure we update
+-				 * quotas appropriately.
+-				 */
+-				if (disko && root != BTRFS_I(src)->root &&
+-				    disko != last_disko) {
+-					no_quota = check_ref(trans, root,
+-							     disko);
+-					if (no_quota < 0) {
+-						btrfs_abort_transaction(trans,
+-									root,
+-									ret);
+-						btrfs_end_transaction(trans,
+-								      root);
+-						ret = no_quota;
+-						goto out;
+-					}
+-				}
+-
+ 				if (disko) {
+ 					inode_add_bytes(inode, datal);
+ 					ret = btrfs_inc_extent_ref(trans, root,
+ 							disko, diskl, 0,
+ 							root->root_key.objectid,
+ 							btrfs_ino(inode),
+-							new_key.offset - datao,
+-							no_quota);
++							new_key.offset - datao);
+ 					if (ret) {
+ 						btrfs_abort_transaction(trans,
+ 									root,
+@@ -3586,21 +3670,6 @@ process_slot:
+ 			} else if (type == BTRFS_FILE_EXTENT_INLINE) {
+ 				u64 skip = 0;
+ 				u64 trim = 0;
+-				u64 aligned_end = 0;
+-
+-				/*
+-				 * Don't copy an inline extent into an offset
+-				 * greater than zero. Having an inline extent
+-				 * at such an offset results in chaos as btrfs
+-				 * isn't prepared for such cases. Just skip
+-				 * this case for the same reasons as commented
+-				 * at btrfs_ioctl_clone().
+-				 */
+-				if (last_dest_end > 0) {
+-					ret = -EOPNOTSUPP;
+-					btrfs_end_transaction(trans, root);
+-					goto out;
+-				}
+ 
+ 				if (off > key.offset) {
+ 					skip = off - key.offset;
+@@ -3618,42 +3687,22 @@ process_slot:
+ 				size -= skip + trim;
+ 				datal -= skip + trim;
+ 
+-				aligned_end = ALIGN(new_key.offset + datal,
+-						    root->sectorsize);
+-				ret = btrfs_drop_extents(trans, root, inode,
+-							 drop_start,
+-							 aligned_end,
+-							 1);
++				ret = clone_copy_inline_extent(src, inode,
++							       trans, path,
++							       &new_key,
++							       drop_start,
++							       datal,
++							       skip, size, buf);
+ 				if (ret) {
+ 					if (ret != -EOPNOTSUPP)
+ 						btrfs_abort_transaction(trans,
+-							root, ret);
+-					btrfs_end_transaction(trans, root);
+-					goto out;
+-				}
+-
+-				ret = btrfs_insert_empty_item(trans, root, path,
+-							      &new_key, size);
+-				if (ret) {
+-					btrfs_abort_transaction(trans, root,
+-								ret);
++									root,
++									ret);
+ 					btrfs_end_transaction(trans, root);
+ 					goto out;
+ 				}
+-
+-				if (skip) {
+-					u32 start =
+-					  btrfs_file_extent_calc_inline_size(0);
+-					memmove(buf+start, buf+start+skip,
+-						datal);
+-				}
+-
+ 				leaf = path->nodes[0];
+ 				slot = path->slots[0];
+-				write_extent_buffer(leaf, buf,
+-					    btrfs_item_ptr_offset(leaf, slot),
+-					    size);
+-				inode_add_bytes(inode, datal);
+ 			}
+ 
+ 			/* If we have an implicit hole (NO_HOLES feature). */
+diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
+index 88cbb5995667..3a828a33cd67 100644
+--- a/fs/btrfs/relocation.c
++++ b/fs/btrfs/relocation.c
+@@ -1716,7 +1716,7 @@ int replace_file_extents(struct btrfs_trans_handle *trans,
+ 		ret = btrfs_inc_extent_ref(trans, root, new_bytenr,
+ 					   num_bytes, parent,
+ 					   btrfs_header_owner(leaf),
+-					   key.objectid, key.offset, 1);
++					   key.objectid, key.offset);
+ 		if (ret) {
+ 			btrfs_abort_transaction(trans, root, ret);
+ 			break;
+@@ -1724,7 +1724,7 @@ int replace_file_extents(struct btrfs_trans_handle *trans,
+ 
+ 		ret = btrfs_free_extent(trans, root, bytenr, num_bytes,
+ 					parent, btrfs_header_owner(leaf),
+-					key.objectid, key.offset, 1);
++					key.objectid, key.offset);
+ 		if (ret) {
+ 			btrfs_abort_transaction(trans, root, ret);
+ 			break;
+@@ -1900,23 +1900,21 @@ again:
+ 
+ 		ret = btrfs_inc_extent_ref(trans, src, old_bytenr, blocksize,
+ 					path->nodes[level]->start,
+-					src->root_key.objectid, level - 1, 0,
+-					1);
++					src->root_key.objectid, level - 1, 0);
+ 		BUG_ON(ret);
+ 		ret = btrfs_inc_extent_ref(trans, dest, new_bytenr, blocksize,
+ 					0, dest->root_key.objectid, level - 1,
+-					0, 1);
++					0);
+ 		BUG_ON(ret);
+ 
+ 		ret = btrfs_free_extent(trans, src, new_bytenr, blocksize,
+ 					path->nodes[level]->start,
+-					src->root_key.objectid, level - 1, 0,
+-					1);
++					src->root_key.objectid, level - 1, 0);
+ 		BUG_ON(ret);
+ 
+ 		ret = btrfs_free_extent(trans, dest, old_bytenr, blocksize,
+ 					0, dest->root_key.objectid, level - 1,
+-					0, 1);
++					0);
+ 		BUG_ON(ret);
+ 
+ 		btrfs_unlock_up_safe(path, 0);
+@@ -2746,7 +2744,7 @@ static int do_relocation(struct btrfs_trans_handle *trans,
+ 						node->eb->start, blocksize,
+ 						upper->eb->start,
+ 						btrfs_header_owner(upper->eb),
+-						node->level, 0, 1);
++						node->level, 0);
+ 			BUG_ON(ret);
+ 
+ 			ret = btrfs_drop_subtree(trans, root, eb, upper->eb);
+diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
+index aa72bfd28f7d..890933b61267 100644
+--- a/fs/btrfs/send.c
++++ b/fs/btrfs/send.c
+@@ -2351,8 +2351,14 @@ static int send_subvol_begin(struct send_ctx *sctx)
+ 	}
+ 
+ 	TLV_PUT_STRING(sctx, BTRFS_SEND_A_PATH, name, namelen);
+-	TLV_PUT_UUID(sctx, BTRFS_SEND_A_UUID,
+-			sctx->send_root->root_item.uuid);
++
++	if (!btrfs_is_empty_uuid(sctx->send_root->root_item.received_uuid))
++		TLV_PUT_UUID(sctx, BTRFS_SEND_A_UUID,
++			    sctx->send_root->root_item.received_uuid);
++	else
++		TLV_PUT_UUID(sctx, BTRFS_SEND_A_UUID,
++			    sctx->send_root->root_item.uuid);
++
+ 	TLV_PUT_U64(sctx, BTRFS_SEND_A_CTRANSID,
+ 		    le64_to_cpu(sctx->send_root->root_item.ctransid));
+ 	if (parent_root) {
+diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
+index 9c45431e69ab..7639695075dd 100644
+--- a/fs/btrfs/tree-log.c
++++ b/fs/btrfs/tree-log.c
+@@ -700,7 +700,7 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
+ 				ret = btrfs_inc_extent_ref(trans, root,
+ 						ins.objectid, ins.offset,
+ 						0, root->root_key.objectid,
+-						key->objectid, offset, 0);
++						key->objectid, offset);
+ 				if (ret)
+ 					goto out;
+ 			} else {
+diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
+index 6f518c90e1c1..1fcd7b6e7564 100644
+--- a/fs/btrfs/xattr.c
++++ b/fs/btrfs/xattr.c
+@@ -313,8 +313,10 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size)
+ 		/* check to make sure this item is what we want */
+ 		if (found_key.objectid != key.objectid)
+ 			break;
+-		if (found_key.type != BTRFS_XATTR_ITEM_KEY)
++		if (found_key.type > BTRFS_XATTR_ITEM_KEY)
+ 			break;
++		if (found_key.type < BTRFS_XATTR_ITEM_KEY)
++			goto next;
+ 
+ 		di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
+ 		if (verify_dir_item(root, leaf, di))
+diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
+index 6aa07af67603..df45a818c570 100644
+--- a/fs/ceph/mds_client.c
++++ b/fs/ceph/mds_client.c
+@@ -1935,7 +1935,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
+ 
+ 	len = sizeof(*head) +
+ 		pathlen1 + pathlen2 + 2*(1 + sizeof(u32) + sizeof(u64)) +
+-		sizeof(struct timespec);
++		sizeof(struct ceph_timespec);
+ 
+ 	/* calculate (max) length for cap releases */
+ 	len += sizeof(struct ceph_mds_request_release) *
+diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
+index c711be8d6a3c..9c8d23316da1 100644
+--- a/fs/debugfs/inode.c
++++ b/fs/debugfs/inode.c
+@@ -271,8 +271,12 @@ static struct dentry *start_creating(const char *name, struct dentry *parent)
+ 		dput(dentry);
+ 		dentry = ERR_PTR(-EEXIST);
+ 	}
+-	if (IS_ERR(dentry))
++
++	if (IS_ERR(dentry)) {
+ 		mutex_unlock(&d_inode(parent)->i_mutex);
++		simple_release_fs(&debugfs_mount, &debugfs_mount_count);
++	}
++
+ 	return dentry;
+ }
+ 
+diff --git a/fs/ext4/crypto.c b/fs/ext4/crypto.c
+index 45731558138c..54a5169327a3 100644
+--- a/fs/ext4/crypto.c
++++ b/fs/ext4/crypto.c
+@@ -296,7 +296,6 @@ static int ext4_page_crypto(struct ext4_crypto_ctx *ctx,
+ 	else
+ 		res = crypto_ablkcipher_encrypt(req);
+ 	if (res == -EINPROGRESS || res == -EBUSY) {
+-		BUG_ON(req->base.data != &ecr);
+ 		wait_for_completion(&ecr.completion);
+ 		res = ecr.res;
+ 	}
+diff --git a/fs/ext4/crypto_fname.c b/fs/ext4/crypto_fname.c
+index 7dc4eb55913c..f9d53c2bd756 100644
+--- a/fs/ext4/crypto_fname.c
++++ b/fs/ext4/crypto_fname.c
+@@ -121,7 +121,6 @@ static int ext4_fname_encrypt(struct inode *inode,
+ 	ablkcipher_request_set_crypt(req, &src_sg, &dst_sg, ciphertext_len, iv);
+ 	res = crypto_ablkcipher_encrypt(req);
+ 	if (res == -EINPROGRESS || res == -EBUSY) {
+-		BUG_ON(req->base.data != &ecr);
+ 		wait_for_completion(&ecr.completion);
+ 		res = ecr.res;
+ 	}
+@@ -183,7 +182,6 @@ static int ext4_fname_decrypt(struct inode *inode,
+ 	ablkcipher_request_set_crypt(req, &src_sg, &dst_sg, iname->len, iv);
+ 	res = crypto_ablkcipher_decrypt(req);
+ 	if (res == -EINPROGRESS || res == -EBUSY) {
+-		BUG_ON(req->base.data != &ecr);
+ 		wait_for_completion(&ecr.completion);
+ 		res = ecr.res;
+ 	}
+diff --git a/fs/ext4/crypto_key.c b/fs/ext4/crypto_key.c
+index 442d24e8efc0..9bad1132ac8f 100644
+--- a/fs/ext4/crypto_key.c
++++ b/fs/ext4/crypto_key.c
+@@ -71,7 +71,6 @@ static int ext4_derive_key_aes(char deriving_key[EXT4_AES_128_ECB_KEY_SIZE],
+ 				     EXT4_AES_256_XTS_KEY_SIZE, NULL);
+ 	res = crypto_ablkcipher_encrypt(req);
+ 	if (res == -EINPROGRESS || res == -EBUSY) {
+-		BUG_ON(req->base.data != &ecr);
+ 		wait_for_completion(&ecr.completion);
+ 		res = ecr.res;
+ 	}
+@@ -208,7 +207,12 @@ retry:
+ 		goto out;
+ 	}
+ 	crypt_info->ci_keyring_key = keyring_key;
+-	BUG_ON(keyring_key->type != &key_type_logon);
++	if (keyring_key->type != &key_type_logon) {
++		printk_once(KERN_WARNING
++			    "ext4: key type must be logon\n");
++		res = -ENOKEY;
++		goto out;
++	}
+ 	ukp = ((struct user_key_payload *)keyring_key->payload.data);
+ 	if (ukp->datalen != sizeof(struct ext4_encryption_key)) {
+ 		res = -EINVAL;
+@@ -217,7 +221,13 @@ retry:
+ 	master_key = (struct ext4_encryption_key *)ukp->data;
+ 	BUILD_BUG_ON(EXT4_AES_128_ECB_KEY_SIZE !=
+ 		     EXT4_KEY_DERIVATION_NONCE_SIZE);
+-	BUG_ON(master_key->size != EXT4_AES_256_XTS_KEY_SIZE);
++	if (master_key->size != EXT4_AES_256_XTS_KEY_SIZE) {
++		printk_once(KERN_WARNING
++			    "ext4: key size incorrect: %d\n",
++			    master_key->size);
++		res = -ENOKEY;
++		goto out;
++	}
+ 	res = ext4_derive_key_aes(ctx.nonce, master_key->raw,
+ 				  raw_key);
+ got_key:
+diff --git a/fs/ext4/crypto_policy.c b/fs/ext4/crypto_policy.c
+index 02c4e5df7afb..f92fa93e67f1 100644
+--- a/fs/ext4/crypto_policy.c
++++ b/fs/ext4/crypto_policy.c
+@@ -137,7 +137,8 @@ int ext4_is_child_context_consistent_with_parent(struct inode *parent,
+ 
+ 	if ((parent == NULL) || (child == NULL)) {
+ 		pr_err("parent %p child %p\n", parent, child);
+-		BUG_ON(1);
++		WARN_ON(1);	/* Should never happen */
++		return 0;
+ 	}
+ 	/* no restrictions if the parent directory is not encrypted */
+ 	if (!ext4_encrypted_inode(parent))
+diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
+index d41843181818..e770c1ee4613 100644
+--- a/fs/ext4/ext4_jbd2.c
++++ b/fs/ext4/ext4_jbd2.c
+@@ -88,13 +88,13 @@ int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle)
+ 		return 0;
+ 	}
+ 
++	err = handle->h_err;
+ 	if (!handle->h_transaction) {
+-		err = jbd2_journal_stop(handle);
+-		return handle->h_err ? handle->h_err : err;
++		rc = jbd2_journal_stop(handle);
++		return err ? err : rc;
+ 	}
+ 
+ 	sb = handle->h_transaction->t_journal->j_private;
+-	err = handle->h_err;
+ 	rc = jbd2_journal_stop(handle);
+ 
+ 	if (!err)
+diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
+index 5602450f03f6..89e96f99dae7 100644
+--- a/fs/ext4/page-io.c
++++ b/fs/ext4/page-io.c
+@@ -425,6 +425,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
+ 	struct buffer_head *bh, *head;
+ 	int ret = 0;
+ 	int nr_submitted = 0;
++	int nr_to_submit = 0;
+ 
+ 	blocksize = 1 << inode->i_blkbits;
+ 
+@@ -477,11 +478,13 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
+ 			unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);
+ 		}
+ 		set_buffer_async_write(bh);
++		nr_to_submit++;
+ 	} while ((bh = bh->b_this_page) != head);
+ 
+ 	bh = head = page_buffers(page);
+ 
+-	if (ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode)) {
++	if (ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode) &&
++	    nr_to_submit) {
+ 		data_page = ext4_encrypt(inode, page);
+ 		if (IS_ERR(data_page)) {
+ 			ret = PTR_ERR(data_page);
+diff --git a/fs/ext4/super.c b/fs/ext4/super.c
+index a5e8c744e962..bc24d1b44b8f 100644
+--- a/fs/ext4/super.c
++++ b/fs/ext4/super.c
+@@ -397,9 +397,13 @@ static void ext4_handle_error(struct super_block *sb)
+ 		smp_wmb();
+ 		sb->s_flags |= MS_RDONLY;
+ 	}
+-	if (test_opt(sb, ERRORS_PANIC))
++	if (test_opt(sb, ERRORS_PANIC)) {
++		if (EXT4_SB(sb)->s_journal &&
++		  !(EXT4_SB(sb)->s_journal->j_flags & JBD2_REC_ERR))
++			return;
+ 		panic("EXT4-fs (device %s): panic forced after error\n",
+ 			sb->s_id);
++	}
+ }
+ 
+ #define ext4_error_ratelimit(sb)					\
+@@ -588,8 +592,12 @@ void __ext4_abort(struct super_block *sb, const char *function,
+ 			jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO);
+ 		save_error_info(sb, function, line);
+ 	}
+-	if (test_opt(sb, ERRORS_PANIC))
++	if (test_opt(sb, ERRORS_PANIC)) {
++		if (EXT4_SB(sb)->s_journal &&
++		  !(EXT4_SB(sb)->s_journal->j_flags & JBD2_REC_ERR))
++			return;
+ 		panic("EXT4-fs panic from previous error\n");
++	}
+ }
+ 
+ void __ext4_msg(struct super_block *sb,
+diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
+index 2721513adb1f..fd2787a39b87 100644
+--- a/fs/jbd2/journal.c
++++ b/fs/jbd2/journal.c
+@@ -2071,8 +2071,12 @@ static void __journal_abort_soft (journal_t *journal, int errno)
+ 
+ 	__jbd2_journal_abort_hard(journal);
+ 
+-	if (errno)
++	if (errno) {
+ 		jbd2_journal_update_sb_errno(journal);
++		write_lock(&journal->j_state_lock);
++		journal->j_flags |= JBD2_REC_ERR;
++		write_unlock(&journal->j_state_lock);
++	}
+ }
+ 
+ /**
+diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
+index 4afbe13321cb..f27cc76ed5e6 100644
+--- a/fs/nfs/inode.c
++++ b/fs/nfs/inode.c
+@@ -1816,7 +1816,11 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
+ 		if ((long)fattr->gencount - (long)nfsi->attr_gencount > 0)
+ 			nfsi->attr_gencount = fattr->gencount;
+ 	}
+-	invalid &= ~NFS_INO_INVALID_ATTR;
++
++	/* Don't declare attrcache up to date if there were no attrs! */
++	if (fattr->valid != 0)
++		invalid &= ~NFS_INO_INVALID_ATTR;
++
+ 	/* Don't invalidate the data if we were to blame */
+ 	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)
+ 				|| S_ISLNK(inode->i_mode)))
+diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
+index 3aa6a9ba5113..199648d5fcc5 100644
+--- a/fs/nfs/nfs4client.c
++++ b/fs/nfs/nfs4client.c
+@@ -33,7 +33,7 @@ static int nfs_get_cb_ident_idr(struct nfs_client *clp, int minorversion)
+ 		return ret;
+ 	idr_preload(GFP_KERNEL);
+ 	spin_lock(&nn->nfs_client_lock);
+-	ret = idr_alloc(&nn->cb_ident_idr, clp, 0, 0, GFP_NOWAIT);
++	ret = idr_alloc(&nn->cb_ident_idr, clp, 1, 0, GFP_NOWAIT);
+ 	if (ret >= 0)
+ 		clp->cl_cb_ident = ret;
+ 	spin_unlock(&nn->nfs_client_lock);
+diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
+index 75189cd34583..5ea13286e2b0 100644
+--- a/fs/nfsd/nfs4state.c
++++ b/fs/nfsd/nfs4state.c
+@@ -765,16 +765,68 @@ void nfs4_unhash_stid(struct nfs4_stid *s)
+ 	s->sc_type = 0;
+ }
+ 
+-static void
++/**
++ * nfs4_get_existing_delegation - Discover if this delegation already exists
++ * @clp:     a pointer to the nfs4_client we're granting a delegation to
++ * @fp:      a pointer to the nfs4_file we're granting a delegation on
++ *
++ * Return:
++ *      On success: 0 if an existing delegation was not found.
++ *
++ *      On error: -EAGAIN if one was previously granted to this nfs4_client
++ *                 for this nfs4_file.
++ *
++ */
++
++static int
++nfs4_get_existing_delegation(struct nfs4_client *clp, struct nfs4_file *fp)
++{
++	struct nfs4_delegation *searchdp = NULL;
++	struct nfs4_client *searchclp = NULL;
++
++	lockdep_assert_held(&state_lock);
++	lockdep_assert_held(&fp->fi_lock);
++
++	list_for_each_entry(searchdp, &fp->fi_delegations, dl_perfile) {
++		searchclp = searchdp->dl_stid.sc_client;
++		if (clp == searchclp) {
++			return -EAGAIN;
++		}
++	}
++	return 0;
++}
++
++/**
++ * hash_delegation_locked - Add a delegation to the appropriate lists
++ * @dp:     a pointer to the nfs4_delegation we are adding.
++ * @fp:     a pointer to the nfs4_file we're granting a delegation on
++ *
++ * Return:
++ *      On success: 0 if the delegation was successfully hashed.
++ *
++ *      On error: -EAGAIN if one was previously granted to this
++ *                 nfs4_client for this nfs4_file. Delegation is not hashed.
++ *
++ */
++
++static int
+ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
+ {
++	int status;
++	struct nfs4_client *clp = dp->dl_stid.sc_client;
++
+ 	lockdep_assert_held(&state_lock);
+ 	lockdep_assert_held(&fp->fi_lock);
+ 
++	status = nfs4_get_existing_delegation(clp, fp);
++	if (status)
++		return status;
++	++fp->fi_delegees;
+ 	atomic_inc(&dp->dl_stid.sc_count);
+ 	dp->dl_stid.sc_type = NFS4_DELEG_STID;
+ 	list_add(&dp->dl_perfile, &fp->fi_delegations);
+-	list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
++	list_add(&dp->dl_perclnt, &clp->cl_delegations);
++	return 0;
+ }
+ 
+ static bool
+@@ -3351,6 +3403,7 @@ static void init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp,
+ 	stp->st_access_bmap = 0;
+ 	stp->st_deny_bmap = 0;
+ 	stp->st_openstp = NULL;
++	init_rwsem(&stp->st_rwsem);
+ 	spin_lock(&oo->oo_owner.so_client->cl_lock);
+ 	list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
+ 	spin_lock(&fp->fi_lock);
+@@ -3939,6 +3992,18 @@ static struct file_lock *nfs4_alloc_init_lease(struct nfs4_file *fp, int flag)
+ 	return fl;
+ }
+ 
++/**
++ * nfs4_setlease - Obtain a delegation by requesting lease from vfs layer
++ * @dp:   a pointer to the nfs4_delegation we're adding.
++ *
++ * Return:
++ *      On success: 0.
++ *
++ *      On error: -EAGAIN if there was an existing delegation.
++ *                 nonzero if there is an error in other cases.
++ *
++ */
++
+ static int nfs4_setlease(struct nfs4_delegation *dp)
+ {
+ 	struct nfs4_file *fp = dp->dl_stid.sc_file;
+@@ -3970,16 +4035,19 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
+ 		goto out_unlock;
+ 	/* Race breaker */
+ 	if (fp->fi_deleg_file) {
+-		status = 0;
+-		++fp->fi_delegees;
+-		hash_delegation_locked(dp, fp);
++		status = hash_delegation_locked(dp, fp);
+ 		goto out_unlock;
+ 	}
+ 	fp->fi_deleg_file = filp;
+-	fp->fi_delegees = 1;
+-	hash_delegation_locked(dp, fp);
++	fp->fi_delegees = 0;
++	status = hash_delegation_locked(dp, fp);
+ 	spin_unlock(&fp->fi_lock);
+ 	spin_unlock(&state_lock);
++	if (status) {
++		/* Should never happen, this is a new fi_deleg_file  */
++		WARN_ON_ONCE(1);
++		goto out_fput;
++	}
+ 	return 0;
+ out_unlock:
+ 	spin_unlock(&fp->fi_lock);
+@@ -3999,6 +4067,15 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh,
+ 	if (fp->fi_had_conflict)
+ 		return ERR_PTR(-EAGAIN);
+ 
++	spin_lock(&state_lock);
++	spin_lock(&fp->fi_lock);
++	status = nfs4_get_existing_delegation(clp, fp);
++	spin_unlock(&fp->fi_lock);
++	spin_unlock(&state_lock);
++
++	if (status)
++		return ERR_PTR(status);
++
+ 	dp = alloc_init_deleg(clp, fh, odstate);
+ 	if (!dp)
+ 		return ERR_PTR(-ENOMEM);
+@@ -4017,9 +4094,7 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh,
+ 		status = -EAGAIN;
+ 		goto out_unlock;
+ 	}
+-	++fp->fi_delegees;
+-	hash_delegation_locked(dp, fp);
+-	status = 0;
++	status = hash_delegation_locked(dp, fp);
+ out_unlock:
+ 	spin_unlock(&fp->fi_lock);
+ 	spin_unlock(&state_lock);
+@@ -4180,15 +4255,20 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
+ 	 */
+ 	if (stp) {
+ 		/* Stateid was found, this is an OPEN upgrade */
++		down_read(&stp->st_rwsem);
+ 		status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open);
+-		if (status)
++		if (status) {
++			up_read(&stp->st_rwsem);
+ 			goto out;
++		}
+ 	} else {
+ 		stp = open->op_stp;
+ 		open->op_stp = NULL;
+ 		init_open_stateid(stp, fp, open);
++		down_read(&stp->st_rwsem);
+ 		status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
+ 		if (status) {
++			up_read(&stp->st_rwsem);
+ 			release_open_stateid(stp);
+ 			goto out;
+ 		}
+@@ -4200,6 +4280,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
+ 	}
+ 	update_stateid(&stp->st_stid.sc_stateid);
+ 	memcpy(&open->op_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
++	up_read(&stp->st_rwsem);
+ 
+ 	if (nfsd4_has_session(&resp->cstate)) {
+ 		if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) {
+@@ -4814,10 +4895,13 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_
+ 		 * revoked delegations are kept only for free_stateid.
+ 		 */
+ 		return nfserr_bad_stateid;
++	down_write(&stp->st_rwsem);
+ 	status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate));
+-	if (status)
+-		return status;
+-	return nfs4_check_fh(current_fh, &stp->st_stid);
++	if (status == nfs_ok)
++		status = nfs4_check_fh(current_fh, &stp->st_stid);
++	if (status != nfs_ok)
++		up_write(&stp->st_rwsem);
++	return status;
+ }
+ 
+ /* 
+@@ -4864,6 +4948,7 @@ static __be32 nfs4_preprocess_confirmed_seqid_op(struct nfsd4_compound_state *cs
+ 		return status;
+ 	oo = openowner(stp->st_stateowner);
+ 	if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) {
++		up_write(&stp->st_rwsem);
+ 		nfs4_put_stid(&stp->st_stid);
+ 		return nfserr_bad_stateid;
+ 	}
+@@ -4894,11 +4979,14 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ 		goto out;
+ 	oo = openowner(stp->st_stateowner);
+ 	status = nfserr_bad_stateid;
+-	if (oo->oo_flags & NFS4_OO_CONFIRMED)
++	if (oo->oo_flags & NFS4_OO_CONFIRMED) {
++		up_write(&stp->st_rwsem);
+ 		goto put_stateid;
++	}
+ 	oo->oo_flags |= NFS4_OO_CONFIRMED;
+ 	update_stateid(&stp->st_stid.sc_stateid);
+ 	memcpy(&oc->oc_resp_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
++	up_write(&stp->st_rwsem);
+ 	dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n",
+ 		__func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid));
+ 
+@@ -4977,6 +5065,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp,
+ 	memcpy(&od->od_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
+ 	status = nfs_ok;
+ put_stateid:
++	up_write(&stp->st_rwsem);
+ 	nfs4_put_stid(&stp->st_stid);
+ out:
+ 	nfsd4_bump_seqid(cstate, status);
+@@ -5030,6 +5119,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ 		goto out; 
+ 	update_stateid(&stp->st_stid.sc_stateid);
+ 	memcpy(&close->cl_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
++	up_write(&stp->st_rwsem);
+ 
+ 	nfsd4_close_open_stateid(stp);
+ 
+@@ -5260,6 +5350,7 @@ init_lock_stateid(struct nfs4_ol_stateid *stp, struct nfs4_lockowner *lo,
+ 	stp->st_access_bmap = 0;
+ 	stp->st_deny_bmap = open_stp->st_deny_bmap;
+ 	stp->st_openstp = open_stp;
++	init_rwsem(&stp->st_rwsem);
+ 	list_add(&stp->st_locks, &open_stp->st_locks);
+ 	list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids);
+ 	spin_lock(&fp->fi_lock);
+@@ -5428,6 +5519,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ 					&open_stp, nn);
+ 		if (status)
+ 			goto out;
++		up_write(&open_stp->st_rwsem);
+ 		open_sop = openowner(open_stp->st_stateowner);
+ 		status = nfserr_bad_stateid;
+ 		if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid,
+@@ -5435,6 +5527,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ 			goto out;
+ 		status = lookup_or_create_lock_state(cstate, open_stp, lock,
+ 							&lock_stp, &new);
++		if (status == nfs_ok)
++			down_write(&lock_stp->st_rwsem);
+ 	} else {
+ 		status = nfs4_preprocess_seqid_op(cstate,
+ 				       lock->lk_old_lock_seqid,
+@@ -5540,6 +5634,8 @@ out:
+ 		    seqid_mutating_err(ntohl(status)))
+ 			lock_sop->lo_owner.so_seqid++;
+ 
++		up_write(&lock_stp->st_rwsem);
++
+ 		/*
+ 		 * If this is a new, never-before-used stateid, and we are
+ 		 * returning an error, then just go ahead and release it.
+@@ -5710,6 +5806,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ fput:
+ 	fput(filp);
+ put_stateid:
++	up_write(&stp->st_rwsem);
+ 	nfs4_put_stid(&stp->st_stid);
+ out:
+ 	nfsd4_bump_seqid(cstate, status);
+diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
+index 4874ce515fc1..fada614d6db1 100644
+--- a/fs/nfsd/state.h
++++ b/fs/nfsd/state.h
+@@ -534,15 +534,16 @@ struct nfs4_file {
+  * Better suggestions welcome.
+  */
+ struct nfs4_ol_stateid {
+-	struct nfs4_stid    st_stid; /* must be first field */
+-	struct list_head              st_perfile;
+-	struct list_head              st_perstateowner;
+-	struct list_head              st_locks;
+-	struct nfs4_stateowner      * st_stateowner;
+-	struct nfs4_clnt_odstate    * st_clnt_odstate;
+-	unsigned char                 st_access_bmap;
+-	unsigned char                 st_deny_bmap;
+-	struct nfs4_ol_stateid         * st_openstp;
++	struct nfs4_stid		st_stid;
++	struct list_head		st_perfile;
++	struct list_head		st_perstateowner;
++	struct list_head		st_locks;
++	struct nfs4_stateowner		*st_stateowner;
++	struct nfs4_clnt_odstate	*st_clnt_odstate;
++	unsigned char			st_access_bmap;
++	unsigned char			st_deny_bmap;
++	struct nfs4_ol_stateid		*st_openstp;
++	struct rw_semaphore		st_rwsem;
+ };
+ 
+ static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s)
+diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
+index 6e6abb93fda5..ff040125c190 100644
+--- a/fs/ocfs2/namei.c
++++ b/fs/ocfs2/namei.c
+@@ -365,6 +365,8 @@ static int ocfs2_mknod(struct inode *dir,
+ 		mlog_errno(status);
+ 		goto leave;
+ 	}
++	/* update inode->i_mode after mask with "umask". */
++	inode->i_mode = mode;
+ 
+ 	handle = ocfs2_start_trans(osb, ocfs2_mknod_credits(osb->sb,
+ 							    S_ISDIR(mode),
+diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
+index 82806c60aa42..e4b464983322 100644
+--- a/include/linux/ipv6.h
++++ b/include/linux/ipv6.h
+@@ -224,7 +224,7 @@ struct ipv6_pinfo {
+ 	struct ipv6_ac_socklist	*ipv6_ac_list;
+ 	struct ipv6_fl_socklist __rcu *ipv6_fl_list;
+ 
+-	struct ipv6_txoptions	*opt;
++	struct ipv6_txoptions __rcu	*opt;
+ 	struct sk_buff		*pktoptions;
+ 	struct sk_buff		*rxpmtu;
+ 	struct inet6_cork	cork;
+diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
+index eb1cebed3f36..c90c9b70e568 100644
+--- a/include/linux/jbd2.h
++++ b/include/linux/jbd2.h
+@@ -1007,6 +1007,7 @@ struct journal_s
+ #define JBD2_ABORT_ON_SYNCDATA_ERR	0x040	/* Abort the journal on file
+ 						 * data write error in ordered
+ 						 * mode */
++#define JBD2_REC_ERR	0x080	/* The errno in the sb has been recorded */
+ 
+ /*
+  * Function declarations for the journaling transaction and buffer
+diff --git a/include/net/af_unix.h b/include/net/af_unix.h
+index cb1b9bbda332..49c7683e1096 100644
+--- a/include/net/af_unix.h
++++ b/include/net/af_unix.h
+@@ -62,6 +62,7 @@ struct unix_sock {
+ #define UNIX_GC_CANDIDATE	0
+ #define UNIX_GC_MAYBE_CYCLE	1
+ 	struct socket_wq	peer_wq;
++	wait_queue_t		peer_wake;
+ };
+ 
+ static inline struct unix_sock *unix_sk(struct sock *sk)
+diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
+index 3b76849c190f..75a888c254e4 100644
+--- a/include/net/ip6_fib.h
++++ b/include/net/ip6_fib.h
+@@ -165,7 +165,8 @@ static inline void rt6_update_expires(struct rt6_info *rt0, int timeout)
+ 
+ static inline u32 rt6_get_cookie(const struct rt6_info *rt)
+ {
+-	if (rt->rt6i_flags & RTF_PCPU || unlikely(rt->dst.flags & DST_NOCACHE))
++	if (rt->rt6i_flags & RTF_PCPU ||
++	    (unlikely(rt->dst.flags & DST_NOCACHE) && rt->dst.from))
+ 		rt = (struct rt6_info *)(rt->dst.from);
+ 
+ 	return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;
+diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
+index b8529aa1dae7..b0f7445c0fdc 100644
+--- a/include/net/ip6_tunnel.h
++++ b/include/net/ip6_tunnel.h
+@@ -83,11 +83,12 @@ static inline void ip6tunnel_xmit(struct sock *sk, struct sk_buff *skb,
+ 	err = ip6_local_out_sk(sk, skb);
+ 
+ 	if (net_xmit_eval(err) == 0) {
+-		struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
++		struct pcpu_sw_netstats *tstats = get_cpu_ptr(dev->tstats);
+ 		u64_stats_update_begin(&tstats->syncp);
+ 		tstats->tx_bytes += pkt_len;
+ 		tstats->tx_packets++;
+ 		u64_stats_update_end(&tstats->syncp);
++		put_cpu_ptr(tstats);
+ 	} else {
+ 		stats->tx_errors++;
+ 		stats->tx_aborted_errors++;
+diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
+index d8214cb88bbc..9c2897e56ee1 100644
+--- a/include/net/ip_tunnels.h
++++ b/include/net/ip_tunnels.h
+@@ -207,12 +207,13 @@ static inline void iptunnel_xmit_stats(int err,
+ 				       struct pcpu_sw_netstats __percpu *stats)
+ {
+ 	if (err > 0) {
+-		struct pcpu_sw_netstats *tstats = this_cpu_ptr(stats);
++		struct pcpu_sw_netstats *tstats = get_cpu_ptr(stats);
+ 
+ 		u64_stats_update_begin(&tstats->syncp);
+ 		tstats->tx_bytes += err;
+ 		tstats->tx_packets++;
+ 		u64_stats_update_end(&tstats->syncp);
++		put_cpu_ptr(tstats);
+ 	} else if (err < 0) {
+ 		err_stats->tx_errors++;
+ 		err_stats->tx_aborted_errors++;
+diff --git a/include/net/ipv6.h b/include/net/ipv6.h
+index 82dbdb092a5d..177a89689095 100644
+--- a/include/net/ipv6.h
++++ b/include/net/ipv6.h
+@@ -205,6 +205,7 @@ extern rwlock_t ip6_ra_lock;
+  */
+ 
+ struct ipv6_txoptions {
++	atomic_t		refcnt;
+ 	/* Length of this structure */
+ 	int			tot_len;
+ 
+@@ -217,7 +218,7 @@ struct ipv6_txoptions {
+ 	struct ipv6_opt_hdr	*dst0opt;
+ 	struct ipv6_rt_hdr	*srcrt;	/* Routing Header */
+ 	struct ipv6_opt_hdr	*dst1opt;
+-
++	struct rcu_head		rcu;
+ 	/* Option buffer, as read by IPV6_PKTOPTIONS, starts here. */
+ };
+ 
+@@ -252,6 +253,24 @@ struct ipv6_fl_socklist {
+ 	struct rcu_head			rcu;
+ };
+ 
++static inline struct ipv6_txoptions *txopt_get(const struct ipv6_pinfo *np)
++{
++	struct ipv6_txoptions *opt;
++
++	rcu_read_lock();
++	opt = rcu_dereference(np->opt);
++	if (opt && !atomic_inc_not_zero(&opt->refcnt))
++		opt = NULL;
++	rcu_read_unlock();
++	return opt;
++}
++
++static inline void txopt_put(struct ipv6_txoptions *opt)
++{
++	if (opt && atomic_dec_and_test(&opt->refcnt))
++		kfree_rcu(opt, rcu);
++}
++
+ struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk, __be32 label);
+ struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions *opt_space,
+ 					 struct ip6_flowlabel *fl,
+@@ -490,6 +509,7 @@ struct ip6_create_arg {
+ 	u32 user;
+ 	const struct in6_addr *src;
+ 	const struct in6_addr *dst;
++	int iif;
+ 	u8 ecn;
+ };
+ 
+diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
+index 2738f6f87908..49dda3835061 100644
+--- a/include/net/sch_generic.h
++++ b/include/net/sch_generic.h
+@@ -61,6 +61,9 @@ struct Qdisc {
+ 				      */
+ #define TCQ_F_WARN_NONWC	(1 << 16)
+ #define TCQ_F_CPUSTATS		0x20 /* run using percpu statistics */
++#define TCQ_F_NOPARENT		0x40 /* root of its hierarchy :
++				      * qdisc_tree_decrease_qlen() should stop.
++				      */
+ 	u32			limit;
+ 	const struct Qdisc_ops	*ops;
+ 	struct qdisc_size_table	__rcu *stab;
+diff --git a/include/net/switchdev.h b/include/net/switchdev.h
+index d5671f118bfc..0b9197975603 100644
+--- a/include/net/switchdev.h
++++ b/include/net/switchdev.h
+@@ -268,7 +268,7 @@ static inline int switchdev_port_fdb_dump(struct sk_buff *skb,
+ 					  struct net_device *filter_dev,
+ 					  int idx)
+ {
+-	return -EOPNOTSUPP;
++       return idx;
+ }
+ 
+ #endif
+diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
+index cb31229a6fa4..34265a1ddb51 100644
+--- a/kernel/bpf/arraymap.c
++++ b/kernel/bpf/arraymap.c
+@@ -104,7 +104,7 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
+ 		/* all elements already exist */
+ 		return -EEXIST;
+ 
+-	memcpy(array->value + array->elem_size * index, value, array->elem_size);
++	memcpy(array->value + array->elem_size * index, value, map->value_size);
+ 	return 0;
+ }
+ 
+diff --git a/net/core/neighbour.c b/net/core/neighbour.c
+index 84195dacb8b6..ecdb1717ef3a 100644
+--- a/net/core/neighbour.c
++++ b/net/core/neighbour.c
+@@ -2210,7 +2210,7 @@ static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn,
+ 	ndm->ndm_pad2    = 0;
+ 	ndm->ndm_flags	 = pn->flags | NTF_PROXY;
+ 	ndm->ndm_type	 = RTN_UNICAST;
+-	ndm->ndm_ifindex = pn->dev->ifindex;
++	ndm->ndm_ifindex = pn->dev ? pn->dev->ifindex : 0;
+ 	ndm->ndm_state	 = NUD_NONE;
+ 
+ 	if (nla_put(skb, NDA_DST, tbl->key_len, pn->key))
+@@ -2285,7 +2285,7 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
+ 		if (h > s_h)
+ 			s_idx = 0;
+ 		for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
+-			if (dev_net(n->dev) != net)
++			if (pneigh_net(n) != net)
+ 				continue;
+ 			if (idx < s_idx)
+ 				goto next;
+diff --git a/net/core/scm.c b/net/core/scm.c
+index 3b6899b7d810..8a1741b14302 100644
+--- a/net/core/scm.c
++++ b/net/core/scm.c
+@@ -305,6 +305,8 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
+ 			err = put_user(cmlen, &cm->cmsg_len);
+ 		if (!err) {
+ 			cmlen = CMSG_SPACE(i*sizeof(int));
++			if (msg->msg_controllen < cmlen)
++				cmlen = msg->msg_controllen;
+ 			msg->msg_control += cmlen;
+ 			msg->msg_controllen -= cmlen;
+ 		}
+diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
+index 5165571f397a..a0490508d213 100644
+--- a/net/dccp/ipv6.c
++++ b/net/dccp/ipv6.c
+@@ -202,7 +202,9 @@ static int dccp_v6_send_response(struct sock *sk, struct request_sock *req)
+ 	security_req_classify_flow(req, flowi6_to_flowi(&fl6));
+ 
+ 
+-	final_p = fl6_update_dst(&fl6, np->opt, &final);
++	rcu_read_lock();
++	final_p = fl6_update_dst(&fl6, rcu_dereference(np->opt), &final);
++	rcu_read_unlock();
+ 
+ 	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
+ 	if (IS_ERR(dst)) {
+@@ -219,7 +221,10 @@ static int dccp_v6_send_response(struct sock *sk, struct request_sock *req)
+ 							 &ireq->ir_v6_loc_addr,
+ 							 &ireq->ir_v6_rmt_addr);
+ 		fl6.daddr = ireq->ir_v6_rmt_addr;
+-		err = ip6_xmit(sk, skb, &fl6, np->opt, np->tclass);
++		rcu_read_lock();
++		err = ip6_xmit(sk, skb, &fl6, rcu_dereference(np->opt),
++			       np->tclass);
++		rcu_read_unlock();
+ 		err = net_xmit_eval(err);
+ 	}
+ 
+@@ -415,6 +420,7 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk,
+ {
+ 	struct inet_request_sock *ireq = inet_rsk(req);
+ 	struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
++	struct ipv6_txoptions *opt;
+ 	struct inet_sock *newinet;
+ 	struct dccp6_sock *newdp6;
+ 	struct sock *newsk;
+@@ -534,13 +540,15 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk,
+ 	 * Yes, keeping reference count would be much more clever, but we make
+ 	 * one more one thing there: reattach optmem to newsk.
+ 	 */
+-	if (np->opt != NULL)
+-		newnp->opt = ipv6_dup_options(newsk, np->opt);
+-
++	opt = rcu_dereference(np->opt);
++	if (opt) {
++		opt = ipv6_dup_options(newsk, opt);
++		RCU_INIT_POINTER(newnp->opt, opt);
++	}
+ 	inet_csk(newsk)->icsk_ext_hdr_len = 0;
+-	if (newnp->opt != NULL)
+-		inet_csk(newsk)->icsk_ext_hdr_len = (newnp->opt->opt_nflen +
+-						     newnp->opt->opt_flen);
++	if (opt)
++		inet_csk(newsk)->icsk_ext_hdr_len = opt->opt_nflen +
++						    opt->opt_flen;
+ 
+ 	dccp_sync_mss(newsk, dst_mtu(dst));
+ 
+@@ -793,6 +801,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
+ 	struct ipv6_pinfo *np = inet6_sk(sk);
+ 	struct dccp_sock *dp = dccp_sk(sk);
+ 	struct in6_addr *saddr = NULL, *final_p, final;
++	struct ipv6_txoptions *opt;
+ 	struct flowi6 fl6;
+ 	struct dst_entry *dst;
+ 	int addr_type;
+@@ -892,7 +901,8 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
+ 	fl6.fl6_sport = inet->inet_sport;
+ 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
+ 
+-	final_p = fl6_update_dst(&fl6, np->opt, &final);
++	opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
++	final_p = fl6_update_dst(&fl6, opt, &final);
+ 
+ 	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
+ 	if (IS_ERR(dst)) {
+@@ -912,9 +922,8 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
+ 	__ip6_dst_store(sk, dst, NULL, NULL);
+ 
+ 	icsk->icsk_ext_hdr_len = 0;
+-	if (np->opt != NULL)
+-		icsk->icsk_ext_hdr_len = (np->opt->opt_flen +
+-					  np->opt->opt_nflen);
++	if (opt)
++		icsk->icsk_ext_hdr_len = opt->opt_flen + opt->opt_nflen;
+ 
+ 	inet->inet_dport = usin->sin6_port;
+ 
+diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
+index df28693f32e1..c3bfebd501ed 100644
+--- a/net/ipv4/ipmr.c
++++ b/net/ipv4/ipmr.c
+@@ -134,7 +134,7 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
+ 			      struct mfc_cache *c, struct rtmsg *rtm);
+ static void mroute_netlink_event(struct mr_table *mrt, struct mfc_cache *mfc,
+ 				 int cmd);
+-static void mroute_clean_tables(struct mr_table *mrt);
++static void mroute_clean_tables(struct mr_table *mrt, bool all);
+ static void ipmr_expire_process(unsigned long arg);
+ 
+ #ifdef CONFIG_IP_MROUTE_MULTIPLE_TABLES
+@@ -351,7 +351,7 @@ static struct mr_table *ipmr_new_table(struct net *net, u32 id)
+ static void ipmr_free_table(struct mr_table *mrt)
+ {
+ 	del_timer_sync(&mrt->ipmr_expire_timer);
+-	mroute_clean_tables(mrt);
++	mroute_clean_tables(mrt, true);
+ 	kfree(mrt);
+ }
+ 
+@@ -1209,7 +1209,7 @@ static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
+  *	Close the multicast socket, and clear the vif tables etc
+  */
+ 
+-static void mroute_clean_tables(struct mr_table *mrt)
++static void mroute_clean_tables(struct mr_table *mrt, bool all)
+ {
+ 	int i;
+ 	LIST_HEAD(list);
+@@ -1218,8 +1218,9 @@ static void mroute_clean_tables(struct mr_table *mrt)
+ 	/* Shut down all active vif entries */
+ 
+ 	for (i = 0; i < mrt->maxvif; i++) {
+-		if (!(mrt->vif_table[i].flags & VIFF_STATIC))
+-			vif_delete(mrt, i, 0, &list);
++		if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
++			continue;
++		vif_delete(mrt, i, 0, &list);
+ 	}
+ 	unregister_netdevice_many(&list);
+ 
+@@ -1227,7 +1228,7 @@ static void mroute_clean_tables(struct mr_table *mrt)
+ 
+ 	for (i = 0; i < MFC_LINES; i++) {
+ 		list_for_each_entry_safe(c, next, &mrt->mfc_cache_array[i], list) {
+-			if (c->mfc_flags & MFC_STATIC)
++			if (!all && (c->mfc_flags & MFC_STATIC))
+ 				continue;
+ 			list_del_rcu(&c->list);
+ 			mroute_netlink_event(mrt, c, RTM_DELROUTE);
+@@ -1262,7 +1263,7 @@ static void mrtsock_destruct(struct sock *sk)
+ 						    NETCONFA_IFINDEX_ALL,
+ 						    net->ipv4.devconf_all);
+ 			RCU_INIT_POINTER(mrt->mroute_sk, NULL);
+-			mroute_clean_tables(mrt);
++			mroute_clean_tables(mrt, false);
+ 		}
+ 	}
+ 	rtnl_unlock();
+diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
+index 728f5b3d3c64..77730b43469d 100644
+--- a/net/ipv4/tcp_input.c
++++ b/net/ipv4/tcp_input.c
+@@ -4434,19 +4434,34 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int
+ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
+ {
+ 	struct sk_buff *skb;
++	int err = -ENOMEM;
++	int data_len = 0;
+ 	bool fragstolen;
+ 
+ 	if (size == 0)
+ 		return 0;
+ 
+-	skb = alloc_skb(size, sk->sk_allocation);
++	if (size > PAGE_SIZE) {
++		int npages = min_t(size_t, size >> PAGE_SHIFT, MAX_SKB_FRAGS);
++
++		data_len = npages << PAGE_SHIFT;
++		size = data_len + (size & ~PAGE_MASK);
++	}
++	skb = alloc_skb_with_frags(size - data_len, data_len,
++				   PAGE_ALLOC_COSTLY_ORDER,
++				   &err, sk->sk_allocation);
+ 	if (!skb)
+ 		goto err;
+ 
++	skb_put(skb, size - data_len);
++	skb->data_len = data_len;
++	skb->len = size;
++
+ 	if (tcp_try_rmem_schedule(sk, skb, skb->truesize))
+ 		goto err_free;
+ 
+-	if (memcpy_from_msg(skb_put(skb, size), msg, size))
++	err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size);
++	if (err)
+ 		goto err_free;
+ 
+ 	TCP_SKB_CB(skb)->seq = tcp_sk(sk)->rcv_nxt;
+@@ -4462,7 +4477,8 @@ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
+ err_free:
+ 	kfree_skb(skb);
+ err:
+-	return -ENOMEM;
++	return err;
++
+ }
+ 
+ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
+@@ -5620,6 +5636,7 @@ discard:
+ 		}
+ 
+ 		tp->rcv_nxt = TCP_SKB_CB(skb)->seq + 1;
++		tp->copied_seq = tp->rcv_nxt;
+ 		tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;
+ 
+ 		/* RFC1323: The window in SYN & SYN/ACK segments is
+diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
+index 0ea2e1c5d395..569c63894472 100644
+--- a/net/ipv4/tcp_ipv4.c
++++ b/net/ipv4/tcp_ipv4.c
+@@ -922,7 +922,8 @@ int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
+ 	}
+ 
+ 	md5sig = rcu_dereference_protected(tp->md5sig_info,
+-					   sock_owned_by_user(sk));
++					   sock_owned_by_user(sk) ||
++					   lockdep_is_held(&sk->sk_lock.slock));
+ 	if (!md5sig) {
+ 		md5sig = kmalloc(sizeof(*md5sig), gfp);
+ 		if (!md5sig)
+diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
+index 5b752f58a900..1e63c8fe1db8 100644
+--- a/net/ipv4/tcp_timer.c
++++ b/net/ipv4/tcp_timer.c
+@@ -176,6 +176,18 @@ static int tcp_write_timeout(struct sock *sk)
+ 		syn_set = true;
+ 	} else {
+ 		if (retransmits_timed_out(sk, sysctl_tcp_retries1, 0, 0)) {
++			/* Some middle-boxes may black-hole Fast Open _after_
++			 * the handshake. Therefore we conservatively disable
++			 * Fast Open on this path on recurring timeouts with
++			 * few or zero bytes acked after Fast Open.
++			 */
++			if (tp->syn_data_acked &&
++			    tp->bytes_acked <= tp->rx_opt.mss_clamp) {
++				tcp_fastopen_cache_set(sk, 0, NULL, true, 0);
++				if (icsk->icsk_retransmits == sysctl_tcp_retries1)
++					NET_INC_STATS_BH(sock_net(sk),
++							 LINUX_MIB_TCPFASTOPENACTIVEFAIL);
++			}
+ 			/* Black hole detection */
+ 			tcp_mtu_probing(icsk, sk);
+ 
+diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
+index 7de52b65173f..d87519efc3bd 100644
+--- a/net/ipv6/af_inet6.c
++++ b/net/ipv6/af_inet6.c
+@@ -426,9 +426,11 @@ void inet6_destroy_sock(struct sock *sk)
+ 
+ 	/* Free tx options */
+ 
+-	opt = xchg(&np->opt, NULL);
+-	if (opt)
+-		sock_kfree_s(sk, opt, opt->tot_len);
++	opt = xchg((__force struct ipv6_txoptions **)&np->opt, NULL);
++	if (opt) {
++		atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
++		txopt_put(opt);
++	}
+ }
+ EXPORT_SYMBOL_GPL(inet6_destroy_sock);
+ 
+@@ -657,7 +659,10 @@ int inet6_sk_rebuild_header(struct sock *sk)
+ 		fl6.fl6_sport = inet->inet_sport;
+ 		security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
+ 
+-		final_p = fl6_update_dst(&fl6, np->opt, &final);
++		rcu_read_lock();
++		final_p = fl6_update_dst(&fl6, rcu_dereference(np->opt),
++					 &final);
++		rcu_read_unlock();
+ 
+ 		dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
+ 		if (IS_ERR(dst)) {
+diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
+index b10a88986a98..13ca4cf5616f 100644
+--- a/net/ipv6/datagram.c
++++ b/net/ipv6/datagram.c
+@@ -167,8 +167,10 @@ ipv4_connected:
+ 
+ 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
+ 
+-	opt = flowlabel ? flowlabel->opt : np->opt;
++	rcu_read_lock();
++	opt = flowlabel ? flowlabel->opt : rcu_dereference(np->opt);
+ 	final_p = fl6_update_dst(&fl6, opt, &final);
++	rcu_read_unlock();
+ 
+ 	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
+ 	err = 0;
+diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
+index a7bbbe45570b..adbd6958c398 100644
+--- a/net/ipv6/exthdrs.c
++++ b/net/ipv6/exthdrs.c
+@@ -727,6 +727,7 @@ ipv6_dup_options(struct sock *sk, struct ipv6_txoptions *opt)
+ 			*((char **)&opt2->dst1opt) += dif;
+ 		if (opt2->srcrt)
+ 			*((char **)&opt2->srcrt) += dif;
++		atomic_set(&opt2->refcnt, 1);
+ 	}
+ 	return opt2;
+ }
+@@ -790,7 +791,7 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
+ 		return ERR_PTR(-ENOBUFS);
+ 
+ 	memset(opt2, 0, tot_len);
+-
++	atomic_set(&opt2->refcnt, 1);
+ 	opt2->tot_len = tot_len;
+ 	p = (char *)(opt2 + 1);
+ 
+diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
+index 6927f3fb5597..9beed302eb36 100644
+--- a/net/ipv6/inet6_connection_sock.c
++++ b/net/ipv6/inet6_connection_sock.c
+@@ -77,7 +77,9 @@ struct dst_entry *inet6_csk_route_req(struct sock *sk,
+ 	memset(fl6, 0, sizeof(*fl6));
+ 	fl6->flowi6_proto = IPPROTO_TCP;
+ 	fl6->daddr = ireq->ir_v6_rmt_addr;
+-	final_p = fl6_update_dst(fl6, np->opt, &final);
++	rcu_read_lock();
++	final_p = fl6_update_dst(fl6, rcu_dereference(np->opt), &final);
++	rcu_read_unlock();
+ 	fl6->saddr = ireq->ir_v6_loc_addr;
+ 	fl6->flowi6_oif = ireq->ir_iif;
+ 	fl6->flowi6_mark = ireq->ir_mark;
+@@ -207,7 +209,9 @@ static struct dst_entry *inet6_csk_route_socket(struct sock *sk,
+ 	fl6->fl6_dport = inet->inet_dport;
+ 	security_sk_classify_flow(sk, flowi6_to_flowi(fl6));
+ 
+-	final_p = fl6_update_dst(fl6, np->opt, &final);
++	rcu_read_lock();
++	final_p = fl6_update_dst(fl6, rcu_dereference(np->opt), &final);
++	rcu_read_unlock();
+ 
+ 	dst = __inet6_csk_dst_check(sk, np->dst_cookie);
+ 	if (!dst) {
+@@ -240,7 +244,8 @@ int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl_unused
+ 	/* Restore final destination back after routing done */
+ 	fl6.daddr = sk->sk_v6_daddr;
+ 
+-	res = ip6_xmit(sk, skb, &fl6, np->opt, np->tclass);
++	res = ip6_xmit(sk, skb, &fl6, rcu_dereference(np->opt),
++		       np->tclass);
+ 	rcu_read_unlock();
+ 	return res;
+ }
+diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
+index 5f36266b1f5e..a7aef4b52d65 100644
+--- a/net/ipv6/ip6mr.c
++++ b/net/ipv6/ip6mr.c
+@@ -118,7 +118,7 @@ static void mr6_netlink_event(struct mr6_table *mrt, struct mfc6_cache *mfc,
+ 			      int cmd);
+ static int ip6mr_rtm_dumproute(struct sk_buff *skb,
+ 			       struct netlink_callback *cb);
+-static void mroute_clean_tables(struct mr6_table *mrt);
++static void mroute_clean_tables(struct mr6_table *mrt, bool all);
+ static void ipmr_expire_process(unsigned long arg);
+ 
+ #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES
+@@ -335,7 +335,7 @@ static struct mr6_table *ip6mr_new_table(struct net *net, u32 id)
+ static void ip6mr_free_table(struct mr6_table *mrt)
+ {
+ 	del_timer_sync(&mrt->ipmr_expire_timer);
+-	mroute_clean_tables(mrt);
++	mroute_clean_tables(mrt, true);
+ 	kfree(mrt);
+ }
+ 
+@@ -1543,7 +1543,7 @@ static int ip6mr_mfc_add(struct net *net, struct mr6_table *mrt,
+  *	Close the multicast socket, and clear the vif tables etc
+  */
+ 
+-static void mroute_clean_tables(struct mr6_table *mrt)
++static void mroute_clean_tables(struct mr6_table *mrt, bool all)
+ {
+ 	int i;
+ 	LIST_HEAD(list);
+@@ -1553,8 +1553,9 @@ static void mroute_clean_tables(struct mr6_table *mrt)
+ 	 *	Shut down all active vif entries
+ 	 */
+ 	for (i = 0; i < mrt->maxvif; i++) {
+-		if (!(mrt->vif6_table[i].flags & VIFF_STATIC))
+-			mif6_delete(mrt, i, &list);
++		if (!all && (mrt->vif6_table[i].flags & VIFF_STATIC))
++			continue;
++		mif6_delete(mrt, i, &list);
+ 	}
+ 	unregister_netdevice_many(&list);
+ 
+@@ -1563,7 +1564,7 @@ static void mroute_clean_tables(struct mr6_table *mrt)
+ 	 */
+ 	for (i = 0; i < MFC6_LINES; i++) {
+ 		list_for_each_entry_safe(c, next, &mrt->mfc6_cache_array[i], list) {
+-			if (c->mfc_flags & MFC_STATIC)
++			if (!all && (c->mfc_flags & MFC_STATIC))
+ 				continue;
+ 			write_lock_bh(&mrt_lock);
+ 			list_del(&c->list);
+@@ -1626,7 +1627,7 @@ int ip6mr_sk_done(struct sock *sk)
+ 						     net->ipv6.devconf_all);
+ 			write_unlock_bh(&mrt_lock);
+ 
+-			mroute_clean_tables(mrt);
++			mroute_clean_tables(mrt, false);
+ 			err = 0;
+ 			break;
+ 		}
+diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
+index 63e6956917c9..4449ad1f8114 100644
+--- a/net/ipv6/ipv6_sockglue.c
++++ b/net/ipv6/ipv6_sockglue.c
+@@ -111,7 +111,8 @@ struct ipv6_txoptions *ipv6_update_options(struct sock *sk,
+ 			icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
+ 		}
+ 	}
+-	opt = xchg(&inet6_sk(sk)->opt, opt);
++	opt = xchg((__force struct ipv6_txoptions **)&inet6_sk(sk)->opt,
++		   opt);
+ 	sk_dst_reset(sk);
+ 
+ 	return opt;
+@@ -231,9 +232,12 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
+ 				sk->sk_socket->ops = &inet_dgram_ops;
+ 				sk->sk_family = PF_INET;
+ 			}
+-			opt = xchg(&np->opt, NULL);
+-			if (opt)
+-				sock_kfree_s(sk, opt, opt->tot_len);
++			opt = xchg((__force struct ipv6_txoptions **)&np->opt,
++				   NULL);
++			if (opt) {
++				atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
++				txopt_put(opt);
++			}
+ 			pktopt = xchg(&np->pktoptions, NULL);
+ 			kfree_skb(pktopt);
+ 
+@@ -403,7 +407,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
+ 		if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
+ 			break;
+ 
+-		opt = ipv6_renew_options(sk, np->opt, optname,
++		opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
++		opt = ipv6_renew_options(sk, opt, optname,
+ 					 (struct ipv6_opt_hdr __user *)optval,
+ 					 optlen);
+ 		if (IS_ERR(opt)) {
+@@ -432,8 +437,10 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
+ 		retv = 0;
+ 		opt = ipv6_update_options(sk, opt);
+ sticky_done:
+-		if (opt)
+-			sock_kfree_s(sk, opt, opt->tot_len);
++		if (opt) {
++			atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
++			txopt_put(opt);
++		}
+ 		break;
+ 	}
+ 
+@@ -486,6 +493,7 @@ sticky_done:
+ 			break;
+ 
+ 		memset(opt, 0, sizeof(*opt));
++		atomic_set(&opt->refcnt, 1);
+ 		opt->tot_len = sizeof(*opt) + optlen;
+ 		retv = -EFAULT;
+ 		if (copy_from_user(opt+1, optval, optlen))
+@@ -502,8 +510,10 @@ update:
+ 		retv = 0;
+ 		opt = ipv6_update_options(sk, opt);
+ done:
+-		if (opt)
+-			sock_kfree_s(sk, opt, opt->tot_len);
++		if (opt) {
++			atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
++			txopt_put(opt);
++		}
+ 		break;
+ 	}
+ 	case IPV6_UNICAST_HOPS:
+@@ -1110,10 +1120,11 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
+ 	case IPV6_RTHDR:
+ 	case IPV6_DSTOPTS:
+ 	{
++		struct ipv6_txoptions *opt;
+ 
+ 		lock_sock(sk);
+-		len = ipv6_getsockopt_sticky(sk, np->opt,
+-					     optname, optval, len);
++		opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
++		len = ipv6_getsockopt_sticky(sk, opt, optname, optval, len);
+ 		release_sock(sk);
+ 		/* check if ipv6_getsockopt_sticky() returns err code */
+ 		if (len < 0)
+diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
+index 083b2927fc67..41e3b5ee8d0b 100644
+--- a/net/ipv6/mcast.c
++++ b/net/ipv6/mcast.c
+@@ -1651,7 +1651,6 @@ out:
+ 	if (!err) {
+ 		ICMP6MSGOUT_INC_STATS(net, idev, ICMPV6_MLD2_REPORT);
+ 		ICMP6_INC_STATS(net, idev, ICMP6_MIB_OUTMSGS);
+-		IP6_UPD_PO_STATS(net, idev, IPSTATS_MIB_OUTMCAST, payload_len);
+ 	} else {
+ 		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
+ 	}
+@@ -2014,7 +2013,6 @@ out:
+ 	if (!err) {
+ 		ICMP6MSGOUT_INC_STATS(net, idev, type);
+ 		ICMP6_INC_STATS(net, idev, ICMP6_MIB_OUTMSGS);
+-		IP6_UPD_PO_STATS(net, idev, IPSTATS_MIB_OUTMCAST, full_len);
+ 	} else
+ 		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
+ 
+diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
+index 6d02498172c1..2a4682c847b0 100644
+--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
++++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
+@@ -190,7 +190,7 @@ static void nf_ct_frag6_expire(unsigned long data)
+ /* Creation primitives. */
+ static inline struct frag_queue *fq_find(struct net *net, __be32 id,
+ 					 u32 user, struct in6_addr *src,
+-					 struct in6_addr *dst, u8 ecn)
++					 struct in6_addr *dst, int iif, u8 ecn)
+ {
+ 	struct inet_frag_queue *q;
+ 	struct ip6_create_arg arg;
+@@ -200,6 +200,7 @@ static inline struct frag_queue *fq_find(struct net *net, __be32 id,
+ 	arg.user = user;
+ 	arg.src = src;
+ 	arg.dst = dst;
++	arg.iif = iif;
+ 	arg.ecn = ecn;
+ 
+ 	local_bh_disable();
+@@ -603,7 +604,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
+ 	fhdr = (struct frag_hdr *)skb_transport_header(clone);
+ 
+ 	fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr,
+-		     ip6_frag_ecn(hdr));
++		     skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr));
+ 	if (fq == NULL) {
+ 		pr_debug("Can't find and can't create new queue\n");
+ 		goto ret_orig;
+diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
+index ca4700cb26c4..92d532967c90 100644
+--- a/net/ipv6/raw.c
++++ b/net/ipv6/raw.c
+@@ -731,6 +731,7 @@ static int raw6_getfrag(void *from, char *to, int offset, int len, int odd,
+ 
+ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+ {
++	struct ipv6_txoptions *opt_to_free = NULL;
+ 	struct ipv6_txoptions opt_space;
+ 	DECLARE_SOCKADDR(struct sockaddr_in6 *, sin6, msg->msg_name);
+ 	struct in6_addr *daddr, *final_p, final;
+@@ -837,8 +838,10 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+ 		if (!(opt->opt_nflen|opt->opt_flen))
+ 			opt = NULL;
+ 	}
+-	if (!opt)
+-		opt = np->opt;
++	if (!opt) {
++		opt = txopt_get(np);
++		opt_to_free = opt;
++		}
+ 	if (flowlabel)
+ 		opt = fl6_merge_options(&opt_space, flowlabel, opt);
+ 	opt = ipv6_fixup_options(&opt_space, opt);
+@@ -904,6 +907,7 @@ done:
+ 	dst_release(dst);
+ out:
+ 	fl6_sock_release(flowlabel);
++	txopt_put(opt_to_free);
+ 	return err < 0 ? err : len;
+ do_confirm:
+ 	dst_confirm(dst);
+diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
+index f1159bb76e0a..04013a910ce5 100644
+--- a/net/ipv6/reassembly.c
++++ b/net/ipv6/reassembly.c
+@@ -108,7 +108,10 @@ bool ip6_frag_match(const struct inet_frag_queue *q, const void *a)
+ 	return	fq->id == arg->id &&
+ 		fq->user == arg->user &&
+ 		ipv6_addr_equal(&fq->saddr, arg->src) &&
+-		ipv6_addr_equal(&fq->daddr, arg->dst);
++		ipv6_addr_equal(&fq->daddr, arg->dst) &&
++		(arg->iif == fq->iif ||
++		 !(ipv6_addr_type(arg->dst) & (IPV6_ADDR_MULTICAST |
++					       IPV6_ADDR_LINKLOCAL)));
+ }
+ EXPORT_SYMBOL(ip6_frag_match);
+ 
+@@ -180,7 +183,7 @@ static void ip6_frag_expire(unsigned long data)
+ 
+ static struct frag_queue *
+ fq_find(struct net *net, __be32 id, const struct in6_addr *src,
+-	const struct in6_addr *dst, u8 ecn)
++	const struct in6_addr *dst, int iif, u8 ecn)
+ {
+ 	struct inet_frag_queue *q;
+ 	struct ip6_create_arg arg;
+@@ -190,6 +193,7 @@ fq_find(struct net *net, __be32 id, const struct in6_addr *src,
+ 	arg.user = IP6_DEFRAG_LOCAL_DELIVER;
+ 	arg.src = src;
+ 	arg.dst = dst;
++	arg.iif = iif;
+ 	arg.ecn = ecn;
+ 
+ 	hash = inet6_hash_frag(id, src, dst);
+@@ -551,7 +555,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
+ 	}
+ 
+ 	fq = fq_find(net, fhdr->identification, &hdr->saddr, &hdr->daddr,
+-		     ip6_frag_ecn(hdr));
++		     skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr));
+ 	if (fq) {
+ 		int ret;
+ 
+diff --git a/net/ipv6/route.c b/net/ipv6/route.c
+index dd6ebba5846c..8478719ef500 100644
+--- a/net/ipv6/route.c
++++ b/net/ipv6/route.c
+@@ -401,6 +401,14 @@ static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
+ 	}
+ }
+ 
++static bool __rt6_check_expired(const struct rt6_info *rt)
++{
++	if (rt->rt6i_flags & RTF_EXPIRES)
++		return time_after(jiffies, rt->dst.expires);
++	else
++		return false;
++}
++
+ static bool rt6_check_expired(const struct rt6_info *rt)
+ {
+ 	if (rt->rt6i_flags & RTF_EXPIRES) {
+@@ -1255,7 +1263,8 @@ static struct dst_entry *rt6_check(struct rt6_info *rt, u32 cookie)
+ 
+ static struct dst_entry *rt6_dst_from_check(struct rt6_info *rt, u32 cookie)
+ {
+-	if (rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK &&
++	if (!__rt6_check_expired(rt) &&
++	    rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK &&
+ 	    rt6_check((struct rt6_info *)(rt->dst.from), cookie))
+ 		return &rt->dst;
+ 	else
+@@ -1275,7 +1284,8 @@ static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie)
+ 
+ 	rt6_dst_from_metrics_check(rt);
+ 
+-	if ((rt->rt6i_flags & RTF_PCPU) || unlikely(dst->flags & DST_NOCACHE))
++	if (rt->rt6i_flags & RTF_PCPU ||
++	    (unlikely(dst->flags & DST_NOCACHE) && rt->dst.from))
+ 		return rt6_dst_from_check(rt, cookie);
+ 	else
+ 		return rt6_check(rt, cookie);
+@@ -1326,6 +1336,12 @@ static void rt6_do_update_pmtu(struct rt6_info *rt, u32 mtu)
+ 	rt6_update_expires(rt, net->ipv6.sysctl.ip6_rt_mtu_expires);
+ }
+ 
++static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
++{
++	return !(rt->rt6i_flags & RTF_CACHE) &&
++		(rt->rt6i_flags & RTF_PCPU || rt->rt6i_node);
++}
++
+ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
+ 				 const struct ipv6hdr *iph, u32 mtu)
+ {
+@@ -1339,7 +1355,7 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
+ 	if (mtu >= dst_mtu(dst))
+ 		return;
+ 
+-	if (rt6->rt6i_flags & RTF_CACHE) {
++	if (!rt6_cache_allowed_for_pmtu(rt6)) {
+ 		rt6_do_update_pmtu(rt6, mtu);
+ 	} else {
+ 		const struct in6_addr *daddr, *saddr;
+diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
+index 0909f4e0d53c..f30bfdcdea54 100644
+--- a/net/ipv6/syncookies.c
++++ b/net/ipv6/syncookies.c
+@@ -225,7 +225,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
+ 		memset(&fl6, 0, sizeof(fl6));
+ 		fl6.flowi6_proto = IPPROTO_TCP;
+ 		fl6.daddr = ireq->ir_v6_rmt_addr;
+-		final_p = fl6_update_dst(&fl6, np->opt, &final);
++		final_p = fl6_update_dst(&fl6, rcu_dereference(np->opt), &final);
+ 		fl6.saddr = ireq->ir_v6_loc_addr;
+ 		fl6.flowi6_oif = sk->sk_bound_dev_if;
+ 		fl6.flowi6_mark = ireq->ir_mark;
+diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
+index 7a6cea5e4274..45e473ee340b 100644
+--- a/net/ipv6/tcp_ipv6.c
++++ b/net/ipv6/tcp_ipv6.c
+@@ -120,6 +120,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
+ 	struct ipv6_pinfo *np = inet6_sk(sk);
+ 	struct tcp_sock *tp = tcp_sk(sk);
+ 	struct in6_addr *saddr = NULL, *final_p, final;
++	struct ipv6_txoptions *opt;
+ 	struct flowi6 fl6;
+ 	struct dst_entry *dst;
+ 	int addr_type;
+@@ -235,7 +236,8 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
+ 	fl6.fl6_dport = usin->sin6_port;
+ 	fl6.fl6_sport = inet->inet_sport;
+ 
+-	final_p = fl6_update_dst(&fl6, np->opt, &final);
++	opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
++	final_p = fl6_update_dst(&fl6, opt, &final);
+ 
+ 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
+ 
+@@ -263,9 +265,9 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
+ 		tcp_fetch_timewait_stamp(sk, dst);
+ 
+ 	icsk->icsk_ext_hdr_len = 0;
+-	if (np->opt)
+-		icsk->icsk_ext_hdr_len = (np->opt->opt_flen +
+-					  np->opt->opt_nflen);
++	if (opt)
++		icsk->icsk_ext_hdr_len = opt->opt_flen +
++					 opt->opt_nflen;
+ 
+ 	tp->rx_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr);
+ 
+@@ -461,7 +463,8 @@ static int tcp_v6_send_synack(struct sock *sk, struct dst_entry *dst,
+ 			fl6->flowlabel = ip6_flowlabel(ipv6_hdr(ireq->pktopts));
+ 
+ 		skb_set_queue_mapping(skb, queue_mapping);
+-		err = ip6_xmit(sk, skb, fl6, np->opt, np->tclass);
++		err = ip6_xmit(sk, skb, fl6, rcu_dereference(np->opt),
++			       np->tclass);
+ 		err = net_xmit_eval(err);
+ 	}
+ 
+@@ -991,6 +994,7 @@ static struct sock *tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
+ 	struct inet_request_sock *ireq;
+ 	struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
+ 	struct tcp6_sock *newtcp6sk;
++	struct ipv6_txoptions *opt;
+ 	struct inet_sock *newinet;
+ 	struct tcp_sock *newtp;
+ 	struct sock *newsk;
+@@ -1126,13 +1130,15 @@ static struct sock *tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
+ 	   but we make one more one thing there: reattach optmem
+ 	   to newsk.
+ 	 */
+-	if (np->opt)
+-		newnp->opt = ipv6_dup_options(newsk, np->opt);
+-
++	opt = rcu_dereference(np->opt);
++	if (opt) {
++		opt = ipv6_dup_options(newsk, opt);
++		RCU_INIT_POINTER(newnp->opt, opt);
++	}
+ 	inet_csk(newsk)->icsk_ext_hdr_len = 0;
+-	if (newnp->opt)
+-		inet_csk(newsk)->icsk_ext_hdr_len = (newnp->opt->opt_nflen +
+-						     newnp->opt->opt_flen);
++	if (opt)
++		inet_csk(newsk)->icsk_ext_hdr_len = opt->opt_nflen +
++						    opt->opt_flen;
+ 
+ 	tcp_ca_openreq_child(newsk, dst);
+ 
+diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
+index e51fc3eee6db..7333f3575fc5 100644
+--- a/net/ipv6/udp.c
++++ b/net/ipv6/udp.c
+@@ -1107,6 +1107,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+ 	DECLARE_SOCKADDR(struct sockaddr_in6 *, sin6, msg->msg_name);
+ 	struct in6_addr *daddr, *final_p, final;
+ 	struct ipv6_txoptions *opt = NULL;
++	struct ipv6_txoptions *opt_to_free = NULL;
+ 	struct ip6_flowlabel *flowlabel = NULL;
+ 	struct flowi6 fl6;
+ 	struct dst_entry *dst;
+@@ -1260,8 +1261,10 @@ do_udp_sendmsg:
+ 			opt = NULL;
+ 		connected = 0;
+ 	}
+-	if (!opt)
+-		opt = np->opt;
++	if (!opt) {
++		opt = txopt_get(np);
++		opt_to_free = opt;
++	}
+ 	if (flowlabel)
+ 		opt = fl6_merge_options(&opt_space, flowlabel, opt);
+ 	opt = ipv6_fixup_options(&opt_space, opt);
+@@ -1370,6 +1373,7 @@ release_dst:
+ out:
+ 	dst_release(dst);
+ 	fl6_sock_release(flowlabel);
++	txopt_put(opt_to_free);
+ 	if (!err)
+ 		return len;
+ 	/*
+diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
+index d1ded3777815..0ce9da948ad7 100644
+--- a/net/l2tp/l2tp_ip6.c
++++ b/net/l2tp/l2tp_ip6.c
+@@ -486,6 +486,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+ 	DECLARE_SOCKADDR(struct sockaddr_l2tpip6 *, lsa, msg->msg_name);
+ 	struct in6_addr *daddr, *final_p, final;
+ 	struct ipv6_pinfo *np = inet6_sk(sk);
++	struct ipv6_txoptions *opt_to_free = NULL;
+ 	struct ipv6_txoptions *opt = NULL;
+ 	struct ip6_flowlabel *flowlabel = NULL;
+ 	struct dst_entry *dst = NULL;
+@@ -575,8 +576,10 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+ 			opt = NULL;
+ 	}
+ 
+-	if (opt == NULL)
+-		opt = np->opt;
++	if (!opt) {
++		opt = txopt_get(np);
++		opt_to_free = opt;
++	}
+ 	if (flowlabel)
+ 		opt = fl6_merge_options(&opt_space, flowlabel, opt);
+ 	opt = ipv6_fixup_options(&opt_space, opt);
+@@ -631,6 +634,7 @@ done:
+ 	dst_release(dst);
+ out:
+ 	fl6_sock_release(flowlabel);
++	txopt_put(opt_to_free);
+ 
+ 	return err < 0 ? err : len;
+ 
+diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
+index 71cb085e16fd..71d671c06952 100644
+--- a/net/packet/af_packet.c
++++ b/net/packet/af_packet.c
+@@ -1622,6 +1622,20 @@ static void fanout_release(struct sock *sk)
+ 		kfree_rcu(po->rollover, rcu);
+ }
+ 
++static bool packet_extra_vlan_len_allowed(const struct net_device *dev,
++					  struct sk_buff *skb)
++{
++	/* Earlier code assumed this would be a VLAN pkt, double-check
++	 * this now that we have the actual packet in hand. We can only
++	 * do this check on Ethernet devices.
++	 */
++	if (unlikely(dev->type != ARPHRD_ETHER))
++		return false;
++
++	skb_reset_mac_header(skb);
++	return likely(eth_hdr(skb)->h_proto == htons(ETH_P_8021Q));
++}
++
+ static const struct proto_ops packet_ops;
+ 
+ static const struct proto_ops packet_ops_spkt;
+@@ -1783,18 +1797,10 @@ retry:
+ 		goto retry;
+ 	}
+ 
+-	if (len > (dev->mtu + dev->hard_header_len + extra_len)) {
+-		/* Earlier code assumed this would be a VLAN pkt,
+-		 * double-check this now that we have the actual
+-		 * packet in hand.
+-		 */
+-		struct ethhdr *ehdr;
+-		skb_reset_mac_header(skb);
+-		ehdr = eth_hdr(skb);
+-		if (ehdr->h_proto != htons(ETH_P_8021Q)) {
+-			err = -EMSGSIZE;
+-			goto out_unlock;
+-		}
++	if (len > (dev->mtu + dev->hard_header_len + extra_len) &&
++	    !packet_extra_vlan_len_allowed(dev, skb)) {
++		err = -EMSGSIZE;
++		goto out_unlock;
+ 	}
+ 
+ 	skb->protocol = proto;
+@@ -2213,6 +2219,15 @@ static bool ll_header_truncated(const struct net_device *dev, int len)
+ 	return false;
+ }
+ 
++static void tpacket_set_protocol(const struct net_device *dev,
++				 struct sk_buff *skb)
++{
++	if (dev->type == ARPHRD_ETHER) {
++		skb_reset_mac_header(skb);
++		skb->protocol = eth_hdr(skb)->h_proto;
++	}
++}
++
+ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
+ 		void *frame, struct net_device *dev, int size_max,
+ 		__be16 proto, unsigned char *addr, int hlen)
+@@ -2249,8 +2264,6 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
+ 	skb_reserve(skb, hlen);
+ 	skb_reset_network_header(skb);
+ 
+-	if (!packet_use_direct_xmit(po))
+-		skb_probe_transport_header(skb, 0);
+ 	if (unlikely(po->tp_tx_has_off)) {
+ 		int off_min, off_max, off;
+ 		off_min = po->tp_hdrlen - sizeof(struct sockaddr_ll);
+@@ -2296,6 +2309,8 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
+ 				dev->hard_header_len);
+ 		if (unlikely(err))
+ 			return err;
++		if (!skb->protocol)
++			tpacket_set_protocol(dev, skb);
+ 
+ 		data += dev->hard_header_len;
+ 		to_write -= dev->hard_header_len;
+@@ -2330,6 +2345,8 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
+ 		len = ((to_write > len_max) ? len_max : to_write);
+ 	}
+ 
++	skb_probe_transport_header(skb, 0);
++
+ 	return tp_len;
+ }
+ 
+@@ -2374,12 +2391,13 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
+ 	if (unlikely(!(dev->flags & IFF_UP)))
+ 		goto out_put;
+ 
+-	reserve = dev->hard_header_len + VLAN_HLEN;
++	if (po->sk.sk_socket->type == SOCK_RAW)
++		reserve = dev->hard_header_len;
+ 	size_max = po->tx_ring.frame_size
+ 		- (po->tp_hdrlen - sizeof(struct sockaddr_ll));
+ 
+-	if (size_max > dev->mtu + reserve)
+-		size_max = dev->mtu + reserve;
++	if (size_max > dev->mtu + reserve + VLAN_HLEN)
++		size_max = dev->mtu + reserve + VLAN_HLEN;
+ 
+ 	do {
+ 		ph = packet_current_frame(po, &po->tx_ring,
+@@ -2406,18 +2424,10 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
+ 		tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto,
+ 					  addr, hlen);
+ 		if (likely(tp_len >= 0) &&
+-		    tp_len > dev->mtu + dev->hard_header_len) {
+-			struct ethhdr *ehdr;
+-			/* Earlier code assumed this would be a VLAN pkt,
+-			 * double-check this now that we have the actual
+-			 * packet in hand.
+-			 */
++		    tp_len > dev->mtu + reserve &&
++		    !packet_extra_vlan_len_allowed(dev, skb))
++			tp_len = -EMSGSIZE;
+ 
+-			skb_reset_mac_header(skb);
+-			ehdr = eth_hdr(skb);
+-			if (ehdr->h_proto != htons(ETH_P_8021Q))
+-				tp_len = -EMSGSIZE;
+-		}
+ 		if (unlikely(tp_len < 0)) {
+ 			if (po->tp_loss) {
+ 				__packet_set_status(po, ph,
+@@ -2638,18 +2648,10 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
+ 
+ 	sock_tx_timestamp(sk, &skb_shinfo(skb)->tx_flags);
+ 
+-	if (!gso_type && (len > dev->mtu + reserve + extra_len)) {
+-		/* Earlier code assumed this would be a VLAN pkt,
+-		 * double-check this now that we have the actual
+-		 * packet in hand.
+-		 */
+-		struct ethhdr *ehdr;
+-		skb_reset_mac_header(skb);
+-		ehdr = eth_hdr(skb);
+-		if (ehdr->h_proto != htons(ETH_P_8021Q)) {
+-			err = -EMSGSIZE;
+-			goto out_free;
+-		}
++	if (!gso_type && (len > dev->mtu + reserve + extra_len) &&
++	    !packet_extra_vlan_len_allowed(dev, skb)) {
++		err = -EMSGSIZE;
++		goto out_free;
+ 	}
+ 
+ 	skb->protocol = proto;
+@@ -2680,8 +2682,8 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
+ 		len += vnet_hdr_len;
+ 	}
+ 
+-	if (!packet_use_direct_xmit(po))
+-		skb_probe_transport_header(skb, reserve);
++	skb_probe_transport_header(skb, reserve);
++
+ 	if (unlikely(extra_len == 4))
+ 		skb->no_fcs = 1;
+ 
+diff --git a/net/rds/connection.c b/net/rds/connection.c
+index 9d66705f9d41..da6da57e5f36 100644
+--- a/net/rds/connection.c
++++ b/net/rds/connection.c
+@@ -187,12 +187,6 @@ new_conn:
+ 		}
+ 	}
+ 
+-	if (trans == NULL) {
+-		kmem_cache_free(rds_conn_slab, conn);
+-		conn = ERR_PTR(-ENODEV);
+-		goto out;
+-	}
+-
+ 	conn->c_trans = trans;
+ 
+ 	ret = trans->conn_alloc(conn, gfp);
+diff --git a/net/rds/send.c b/net/rds/send.c
+index e9430f537f9c..7b30c0f3180d 100644
+--- a/net/rds/send.c
++++ b/net/rds/send.c
+@@ -986,11 +986,13 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
+ 		release_sock(sk);
+ 	}
+ 
+-	/* racing with another thread binding seems ok here */
++	lock_sock(sk);
+ 	if (daddr == 0 || rs->rs_bound_addr == 0) {
++		release_sock(sk);
+ 		ret = -ENOTCONN; /* XXX not a great errno */
+ 		goto out;
+ 	}
++	release_sock(sk);
+ 
+ 	/* size of rm including all sgs */
+ 	ret = rds_rm_size(msg, payload_len);
+diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
+index f06aa01d60fd..1a0aa2a7cfeb 100644
+--- a/net/sched/sch_api.c
++++ b/net/sched/sch_api.c
+@@ -253,7 +253,8 @@ int qdisc_set_default(const char *name)
+ }
+ 
+ /* We know handle. Find qdisc among all qdisc's attached to device
+-   (root qdisc, all its children, children of children etc.)
++ * (root qdisc, all its children, children of children etc.)
++ * Note: caller either uses rtnl or rcu_read_lock()
+  */
+ 
+ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
+@@ -264,7 +265,7 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
+ 	    root->handle == handle)
+ 		return root;
+ 
+-	list_for_each_entry(q, &root->list, list) {
++	list_for_each_entry_rcu(q, &root->list, list) {
+ 		if (q->handle == handle)
+ 			return q;
+ 	}
+@@ -277,15 +278,18 @@ void qdisc_list_add(struct Qdisc *q)
+ 		struct Qdisc *root = qdisc_dev(q)->qdisc;
+ 
+ 		WARN_ON_ONCE(root == &noop_qdisc);
+-		list_add_tail(&q->list, &root->list);
++		ASSERT_RTNL();
++		list_add_tail_rcu(&q->list, &root->list);
+ 	}
+ }
+ EXPORT_SYMBOL(qdisc_list_add);
+ 
+ void qdisc_list_del(struct Qdisc *q)
+ {
+-	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS))
+-		list_del(&q->list);
++	if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
++		ASSERT_RTNL();
++		list_del_rcu(&q->list);
++	}
+ }
+ EXPORT_SYMBOL(qdisc_list_del);
+ 
+@@ -750,14 +754,18 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
+ 	if (n == 0)
+ 		return;
+ 	drops = max_t(int, n, 0);
++	rcu_read_lock();
+ 	while ((parentid = sch->parent)) {
+ 		if (TC_H_MAJ(parentid) == TC_H_MAJ(TC_H_INGRESS))
+-			return;
++			break;
+ 
++		if (sch->flags & TCQ_F_NOPARENT)
++			break;
++		/* TODO: perform the search on a per txq basis */
+ 		sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
+ 		if (sch == NULL) {
+-			WARN_ON(parentid != TC_H_ROOT);
+-			return;
++			WARN_ON_ONCE(parentid != TC_H_ROOT);
++			break;
+ 		}
+ 		cops = sch->ops->cl_ops;
+ 		if (cops->qlen_notify) {
+@@ -768,6 +776,7 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
+ 		sch->q.qlen -= n;
+ 		__qdisc_qstats_drop(sch, drops);
+ 	}
++	rcu_read_unlock();
+ }
+ EXPORT_SYMBOL(qdisc_tree_decrease_qlen);
+ 
+@@ -941,7 +950,7 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
+ 		}
+ 		lockdep_set_class(qdisc_lock(sch), &qdisc_tx_lock);
+ 		if (!netif_is_multiqueue(dev))
+-			sch->flags |= TCQ_F_ONETXQUEUE;
++			sch->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+ 	}
+ 
+ 	sch->handle = handle;
+diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
+index 6efca30894aa..b453270be3fd 100644
+--- a/net/sched/sch_generic.c
++++ b/net/sched/sch_generic.c
+@@ -743,7 +743,7 @@ static void attach_one_default_qdisc(struct net_device *dev,
+ 			return;
+ 		}
+ 		if (!netif_is_multiqueue(dev))
+-			qdisc->flags |= TCQ_F_ONETXQUEUE;
++			qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+ 	}
+ 	dev_queue->qdisc_sleeping = qdisc;
+ }
+diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
+index f3cbaecd283a..3e82f047caaf 100644
+--- a/net/sched/sch_mq.c
++++ b/net/sched/sch_mq.c
+@@ -63,7 +63,7 @@ static int mq_init(struct Qdisc *sch, struct nlattr *opt)
+ 		if (qdisc == NULL)
+ 			goto err;
+ 		priv->qdiscs[ntx] = qdisc;
+-		qdisc->flags |= TCQ_F_ONETXQUEUE;
++		qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+ 	}
+ 
+ 	sch->flags |= TCQ_F_MQROOT;
+@@ -156,7 +156,7 @@ static int mq_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
+ 
+ 	*old = dev_graft_qdisc(dev_queue, new);
+ 	if (new)
+-		new->flags |= TCQ_F_ONETXQUEUE;
++		new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+ 	if (dev->flags & IFF_UP)
+ 		dev_activate(dev);
+ 	return 0;
+diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
+index 3811a745452c..ad70ecf57ce7 100644
+--- a/net/sched/sch_mqprio.c
++++ b/net/sched/sch_mqprio.c
+@@ -132,7 +132,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
+ 			goto err;
+ 		}
+ 		priv->qdiscs[i] = qdisc;
+-		qdisc->flags |= TCQ_F_ONETXQUEUE;
++		qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+ 	}
+ 
+ 	/* If the mqprio options indicate that hardware should own
+@@ -209,7 +209,7 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
+ 	*old = dev_graft_qdisc(dev_queue, new);
+ 
+ 	if (new)
+-		new->flags |= TCQ_F_ONETXQUEUE;
++		new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+ 
+ 	if (dev->flags & IFF_UP)
+ 		dev_activate(dev);
+diff --git a/net/sctp/auth.c b/net/sctp/auth.c
+index 4f15b7d730e1..1543e39f47c3 100644
+--- a/net/sctp/auth.c
++++ b/net/sctp/auth.c
+@@ -809,8 +809,8 @@ int sctp_auth_ep_set_hmacs(struct sctp_endpoint *ep,
+ 	if (!has_sha1)
+ 		return -EINVAL;
+ 
+-	memcpy(ep->auth_hmacs_list->hmac_ids, &hmacs->shmac_idents[0],
+-		hmacs->shmac_num_idents * sizeof(__u16));
++	for (i = 0; i < hmacs->shmac_num_idents; i++)
++		ep->auth_hmacs_list->hmac_ids[i] = htons(hmacs->shmac_idents[i]);
+ 	ep->auth_hmacs_list->param_hdr.length = htons(sizeof(sctp_paramhdr_t) +
+ 				hmacs->shmac_num_idents * sizeof(__u16));
+ 	return 0;
+diff --git a/net/sctp/socket.c b/net/sctp/socket.c
+index 17bef01b9aa3..3ec88be0faec 100644
+--- a/net/sctp/socket.c
++++ b/net/sctp/socket.c
+@@ -7375,6 +7375,13 @@ struct proto sctp_prot = {
+ 
+ #if IS_ENABLED(CONFIG_IPV6)
+ 
++#include <net/transp_v6.h>
++static void sctp_v6_destroy_sock(struct sock *sk)
++{
++	sctp_destroy_sock(sk);
++	inet6_destroy_sock(sk);
++}
++
+ struct proto sctpv6_prot = {
+ 	.name		= "SCTPv6",
+ 	.owner		= THIS_MODULE,
+@@ -7384,7 +7391,7 @@ struct proto sctpv6_prot = {
+ 	.accept		= sctp_accept,
+ 	.ioctl		= sctp_ioctl,
+ 	.init		= sctp_init_sock,
+-	.destroy	= sctp_destroy_sock,
++	.destroy	= sctp_v6_destroy_sock,
+ 	.shutdown	= sctp_shutdown,
+ 	.setsockopt	= sctp_setsockopt,
+ 	.getsockopt	= sctp_getsockopt,
+diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
+index 94f658235fb4..128b0982c96b 100644
+--- a/net/unix/af_unix.c
++++ b/net/unix/af_unix.c
+@@ -326,6 +326,118 @@ found:
+ 	return s;
+ }
+ 
++/* Support code for asymmetrically connected dgram sockets
++ *
++ * If a datagram socket is connected to a socket not itself connected
++ * to the first socket (eg, /dev/log), clients may only enqueue more
++ * messages if the present receive queue of the server socket is not
++ * "too large". This means there's a second writeability condition
++ * poll and sendmsg need to test. The dgram recv code will do a wake
++ * up on the peer_wait wait queue of a socket upon reception of a
++ * datagram which needs to be propagated to sleeping would-be writers
++ * since these might not have sent anything so far. This can't be
++ * accomplished via poll_wait because the lifetime of the server
++ * socket might be less than that of its clients if these break their
++ * association with it or if the server socket is closed while clients
++ * are still connected to it and there's no way to inform "a polling
++ * implementation" that it should let go of a certain wait queue
++ *
++ * In order to propagate a wake up, a wait_queue_t of the client
++ * socket is enqueued on the peer_wait queue of the server socket
++ * whose wake function does a wake_up on the ordinary client socket
++ * wait queue. This connection is established whenever a write (or
++ * poll for write) hit the flow control condition and broken when the
++ * association to the server socket is dissolved or after a wake up
++ * was relayed.
++ */
++
++static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
++				      void *key)
++{
++	struct unix_sock *u;
++	wait_queue_head_t *u_sleep;
++
++	u = container_of(q, struct unix_sock, peer_wake);
++
++	__remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
++			    q);
++	u->peer_wake.private = NULL;
++
++	/* relaying can only happen while the wq still exists */
++	u_sleep = sk_sleep(&u->sk);
++	if (u_sleep)
++		wake_up_interruptible_poll(u_sleep, key);
++
++	return 0;
++}
++
++static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)
++{
++	struct unix_sock *u, *u_other;
++	int rc;
++
++	u = unix_sk(sk);
++	u_other = unix_sk(other);
++	rc = 0;
++	spin_lock(&u_other->peer_wait.lock);
++
++	if (!u->peer_wake.private) {
++		u->peer_wake.private = other;
++		__add_wait_queue(&u_other->peer_wait, &u->peer_wake);
++
++		rc = 1;
++	}
++
++	spin_unlock(&u_other->peer_wait.lock);
++	return rc;
++}
++
++static void unix_dgram_peer_wake_disconnect(struct sock *sk,
++					    struct sock *other)
++{
++	struct unix_sock *u, *u_other;
++
++	u = unix_sk(sk);
++	u_other = unix_sk(other);
++	spin_lock(&u_other->peer_wait.lock);
++
++	if (u->peer_wake.private == other) {
++		__remove_wait_queue(&u_other->peer_wait, &u->peer_wake);
++		u->peer_wake.private = NULL;
++	}
++
++	spin_unlock(&u_other->peer_wait.lock);
++}
++
++static void unix_dgram_peer_wake_disconnect_wakeup(struct sock *sk,
++						   struct sock *other)
++{
++	unix_dgram_peer_wake_disconnect(sk, other);
++	wake_up_interruptible_poll(sk_sleep(sk),
++				   POLLOUT |
++				   POLLWRNORM |
++				   POLLWRBAND);
++}
++
++/* preconditions:
++ *	- unix_peer(sk) == other
++ *	- association is stable
++ */
++static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
++{
++	int connected;
++
++	connected = unix_dgram_peer_wake_connect(sk, other);
++
++	if (unix_recvq_full(other))
++		return 1;
++
++	if (connected)
++		unix_dgram_peer_wake_disconnect(sk, other);
++
++	return 0;
++}
++
+ static inline int unix_writable(struct sock *sk)
+ {
+ 	return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
+@@ -430,6 +542,8 @@ static void unix_release_sock(struct sock *sk, int embrion)
+ 			skpair->sk_state_change(skpair);
+ 			sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
+ 		}
++
++		unix_dgram_peer_wake_disconnect(sk, skpair);
+ 		sock_put(skpair); /* It may now die */
+ 		unix_peer(sk) = NULL;
+ 	}
+@@ -440,6 +554,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
+ 		if (state == TCP_LISTEN)
+ 			unix_release_sock(skb->sk, 1);
+ 		/* passed fds are erased in the kfree_skb hook	      */
++		UNIXCB(skb).consumed = skb->len;
+ 		kfree_skb(skb);
+ 	}
+ 
+@@ -664,6 +779,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
+ 	INIT_LIST_HEAD(&u->link);
+ 	mutex_init(&u->readlock); /* single task reading lock */
+ 	init_waitqueue_head(&u->peer_wait);
++	init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
+ 	unix_insert_socket(unix_sockets_unbound(sk), sk);
+ out:
+ 	if (sk == NULL)
+@@ -1031,6 +1147,8 @@ restart:
+ 	if (unix_peer(sk)) {
+ 		struct sock *old_peer = unix_peer(sk);
+ 		unix_peer(sk) = other;
++		unix_dgram_peer_wake_disconnect_wakeup(sk, old_peer);
++
+ 		unix_state_double_unlock(sk, other);
+ 
+ 		if (other != old_peer)
+@@ -1432,6 +1550,14 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen
+ 	return err;
+ }
+ 
++static bool unix_passcred_enabled(const struct socket *sock,
++				  const struct sock *other)
++{
++	return test_bit(SOCK_PASSCRED, &sock->flags) ||
++	       !other->sk_socket ||
++	       test_bit(SOCK_PASSCRED, &other->sk_socket->flags);
++}
++
+ /*
+  * Some apps rely on write() giving SCM_CREDENTIALS
+  * We include credentials if source or destination socket
+@@ -1442,14 +1568,41 @@ static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock,
+ {
+ 	if (UNIXCB(skb).pid)
+ 		return;
+-	if (test_bit(SOCK_PASSCRED, &sock->flags) ||
+-	    !other->sk_socket ||
+-	    test_bit(SOCK_PASSCRED, &other->sk_socket->flags)) {
++	if (unix_passcred_enabled(sock, other)) {
+ 		UNIXCB(skb).pid  = get_pid(task_tgid(current));
+ 		current_uid_gid(&UNIXCB(skb).uid, &UNIXCB(skb).gid);
+ 	}
+ }
+ 
++static int maybe_init_creds(struct scm_cookie *scm,
++			    struct socket *socket,
++			    const struct sock *other)
++{
++	int err;
++	struct msghdr msg = { .msg_controllen = 0 };
++
++	err = scm_send(socket, &msg, scm, false);
++	if (err)
++		return err;
++
++	if (unix_passcred_enabled(socket, other)) {
++		scm->pid = get_pid(task_tgid(current));
++		current_uid_gid(&scm->creds.uid, &scm->creds.gid);
++	}
++	return err;
++}
++
++static bool unix_skb_scm_eq(struct sk_buff *skb,
++			    struct scm_cookie *scm)
++{
++	const struct unix_skb_parms *u = &UNIXCB(skb);
++
++	return u->pid == scm->pid &&
++	       uid_eq(u->uid, scm->creds.uid) &&
++	       gid_eq(u->gid, scm->creds.gid) &&
++	       unix_secdata_eq(scm, skb);
++}
++
+ /*
+  *	Send AF_UNIX data.
+  */
+@@ -1470,6 +1623,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
+ 	struct scm_cookie scm;
+ 	int max_level;
+ 	int data_len = 0;
++	int sk_locked;
+ 
+ 	wait_for_unix_gc();
+ 	err = scm_send(sock, msg, &scm, false);
+@@ -1548,12 +1702,14 @@ restart:
+ 		goto out_free;
+ 	}
+ 
++	sk_locked = 0;
+ 	unix_state_lock(other);
++restart_locked:
+ 	err = -EPERM;
+ 	if (!unix_may_send(sk, other))
+ 		goto out_unlock;
+ 
+-	if (sock_flag(other, SOCK_DEAD)) {
++	if (unlikely(sock_flag(other, SOCK_DEAD))) {
+ 		/*
+ 		 *	Check with 1003.1g - what should
+ 		 *	datagram error
+@@ -1561,10 +1717,14 @@ restart:
+ 		unix_state_unlock(other);
+ 		sock_put(other);
+ 
++		if (!sk_locked)
++			unix_state_lock(sk);
++
+ 		err = 0;
+-		unix_state_lock(sk);
+ 		if (unix_peer(sk) == other) {
+ 			unix_peer(sk) = NULL;
++			unix_dgram_peer_wake_disconnect_wakeup(sk, other);
++
+ 			unix_state_unlock(sk);
+ 
+ 			unix_dgram_disconnected(sk, other);
+@@ -1590,21 +1750,38 @@ restart:
+ 			goto out_unlock;
+ 	}
+ 
+-	if (unix_peer(other) != sk && unix_recvq_full(other)) {
+-		if (!timeo) {
+-			err = -EAGAIN;
+-			goto out_unlock;
++	if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
++		if (timeo) {
++			timeo = unix_wait_for_peer(other, timeo);
++
++			err = sock_intr_errno(timeo);
++			if (signal_pending(current))
++				goto out_free;
++
++			goto restart;
+ 		}
+ 
+-		timeo = unix_wait_for_peer(other, timeo);
++		if (!sk_locked) {
++			unix_state_unlock(other);
++			unix_state_double_lock(sk, other);
++		}
+ 
+-		err = sock_intr_errno(timeo);
+-		if (signal_pending(current))
+-			goto out_free;
++		if (unix_peer(sk) != other ||
++		    unix_dgram_peer_wake_me(sk, other)) {
++			err = -EAGAIN;
++			sk_locked = 1;
++			goto out_unlock;
++		}
+ 
+-		goto restart;
++		if (!sk_locked) {
++			sk_locked = 1;
++			goto restart_locked;
++		}
+ 	}
+ 
++	if (unlikely(sk_locked))
++		unix_state_unlock(sk);
++
+ 	if (sock_flag(other, SOCK_RCVTSTAMP))
+ 		__net_timestamp(skb);
+ 	maybe_add_creds(skb, sock, other);
+@@ -1618,6 +1795,8 @@ restart:
+ 	return len;
+ 
+ out_unlock:
++	if (sk_locked)
++		unix_state_unlock(sk);
+ 	unix_state_unlock(other);
+ out_free:
+ 	kfree_skb(skb);
+@@ -1739,8 +1918,10 @@ out_err:
+ static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page,
+ 				    int offset, size_t size, int flags)
+ {
+-	int err = 0;
+-	bool send_sigpipe = true;
++	int err;
++	bool send_sigpipe = false;
++	bool init_scm = true;
++	struct scm_cookie scm;
+ 	struct sock *other, *sk = socket->sk;
+ 	struct sk_buff *skb, *newskb = NULL, *tail = NULL;
+ 
+@@ -1758,7 +1939,7 @@ alloc_skb:
+ 		newskb = sock_alloc_send_pskb(sk, 0, 0, flags & MSG_DONTWAIT,
+ 					      &err, 0);
+ 		if (!newskb)
+-			return err;
++			goto err;
+ 	}
+ 
+ 	/* we must acquire readlock as we modify already present
+@@ -1767,12 +1948,12 @@ alloc_skb:
+ 	err = mutex_lock_interruptible(&unix_sk(other)->readlock);
+ 	if (err) {
+ 		err = flags & MSG_DONTWAIT ? -EAGAIN : -ERESTARTSYS;
+-		send_sigpipe = false;
+ 		goto err;
+ 	}
+ 
+ 	if (sk->sk_shutdown & SEND_SHUTDOWN) {
+ 		err = -EPIPE;
++		send_sigpipe = true;
+ 		goto err_unlock;
+ 	}
+ 
+@@ -1781,23 +1962,34 @@ alloc_skb:
+ 	if (sock_flag(other, SOCK_DEAD) ||
+ 	    other->sk_shutdown & RCV_SHUTDOWN) {
+ 		err = -EPIPE;
++		send_sigpipe = true;
+ 		goto err_state_unlock;
+ 	}
+ 
++	if (init_scm) {
++		err = maybe_init_creds(&scm, socket, other);
++		if (err)
++			goto err_state_unlock;
++		init_scm = false;
++	}
++
+ 	skb = skb_peek_tail(&other->sk_receive_queue);
+ 	if (tail && tail == skb) {
+ 		skb = newskb;
+-	} else if (!skb) {
+-		if (newskb)
++	} else if (!skb || !unix_skb_scm_eq(skb, &scm)) {
++		if (newskb) {
+ 			skb = newskb;
+-		else
++		} else {
++			tail = skb;
+ 			goto alloc_skb;
++		}
+ 	} else if (newskb) {
+ 		/* this is fast path, we don't necessarily need to
+ 		 * call to kfree_skb even though with newskb == NULL
+ 		 * this - does no harm
+ 		 */
+ 		consume_skb(newskb);
++		newskb = NULL;
+ 	}
+ 
+ 	if (skb_append_pagefrags(skb, page, offset, size)) {
+@@ -1810,14 +2002,20 @@ alloc_skb:
+ 	skb->truesize += size;
+ 	atomic_add(size, &sk->sk_wmem_alloc);
+ 
+-	if (newskb)
++	if (newskb) {
++		err = unix_scm_to_skb(&scm, skb, false);
++		if (err)
++			goto err_state_unlock;
++		spin_lock(&other->sk_receive_queue.lock);
+ 		__skb_queue_tail(&other->sk_receive_queue, newskb);
++		spin_unlock(&other->sk_receive_queue.lock);
++	}
+ 
+ 	unix_state_unlock(other);
+ 	mutex_unlock(&unix_sk(other)->readlock);
+ 
+ 	other->sk_data_ready(other);
+-
++	scm_destroy(&scm);
+ 	return size;
+ 
+ err_state_unlock:
+@@ -1828,6 +2026,8 @@ err:
+ 	kfree_skb(newskb);
+ 	if (send_sigpipe && !(flags & MSG_NOSIGNAL))
+ 		send_sig(SIGPIPE, current, 0);
++	if (!init_scm)
++		scm_destroy(&scm);
+ 	return err;
+ }
+ 
+@@ -2071,6 +2271,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state)
+ 
+ 	do {
+ 		int chunk;
++		bool drop_skb;
+ 		struct sk_buff *skb, *last;
+ 
+ 		unix_state_lock(sk);
+@@ -2130,10 +2331,7 @@ unlock:
+ 
+ 		if (check_creds) {
+ 			/* Never glue messages from different writers */
+-			if ((UNIXCB(skb).pid  != scm.pid) ||
+-			    !uid_eq(UNIXCB(skb).uid, scm.creds.uid) ||
+-			    !gid_eq(UNIXCB(skb).gid, scm.creds.gid) ||
+-			    !unix_secdata_eq(&scm, skb))
++			if (!unix_skb_scm_eq(skb, &scm))
+ 				break;
+ 		} else if (test_bit(SOCK_PASSCRED, &sock->flags)) {
+ 			/* Copy credentials */
+@@ -2151,7 +2349,11 @@ unlock:
+ 		}
+ 
+ 		chunk = min_t(unsigned int, unix_skb_len(skb) - skip, size);
++		skb_get(skb);
+ 		chunk = state->recv_actor(skb, skip, chunk, state);
++		drop_skb = !unix_skb_len(skb);
++		/* skb is only safe to use if !drop_skb */
++		consume_skb(skb);
+ 		if (chunk < 0) {
+ 			if (copied == 0)
+ 				copied = -EFAULT;
+@@ -2160,6 +2362,18 @@ unlock:
+ 		copied += chunk;
+ 		size -= chunk;
+ 
++		if (drop_skb) {
++			/* the skb was touched by a concurrent reader;
++			 * we should not expect anything from this skb
++			 * anymore and assume it invalid - we can be
++			 * sure it was dropped from the socket queue
++			 *
++			 * let's report a short read
++			 */
++			err = 0;
++			break;
++		}
++
+ 		/* Mark read part of skb as used */
+ 		if (!(flags & MSG_PEEK)) {
+ 			UNIXCB(skb).consumed += chunk;
+@@ -2453,14 +2667,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
+ 		return mask;
+ 
+ 	writable = unix_writable(sk);
+-	other = unix_peer_get(sk);
+-	if (other) {
+-		if (unix_peer(other) != sk) {
+-			sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
+-			if (unix_recvq_full(other))
+-				writable = 0;
+-		}
+-		sock_put(other);
++	if (writable) {
++		unix_state_lock(sk);
++
++		other = unix_peer(sk);
++		if (other && unix_peer(other) != sk &&
++		    unix_recvq_full(other) &&
++		    unix_dgram_peer_wake_me(sk, other))
++			writable = 0;
++
++		unix_state_unlock(sk);
+ 	}
+ 
+ 	if (writable)
+diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c
+index a97db5fc8a15..9d1f91db57e6 100644
+--- a/sound/pci/hda/patch_hdmi.c
++++ b/sound/pci/hda/patch_hdmi.c
+@@ -48,8 +48,9 @@ MODULE_PARM_DESC(static_hdmi_pcm, "Don't restrict PCM parameters per ELD info");
+ #define is_haswell(codec)  ((codec)->core.vendor_id == 0x80862807)
+ #define is_broadwell(codec)    ((codec)->core.vendor_id == 0x80862808)
+ #define is_skylake(codec) ((codec)->core.vendor_id == 0x80862809)
++#define is_broxton(codec) ((codec)->core.vendor_id == 0x8086280a)
+ #define is_haswell_plus(codec) (is_haswell(codec) || is_broadwell(codec) \
+-					|| is_skylake(codec))
++				|| is_skylake(codec) || is_broxton(codec))
+ 
+ #define is_valleyview(codec) ((codec)->core.vendor_id == 0x80862882)
+ #define is_cherryview(codec) ((codec)->core.vendor_id == 0x80862883)
+diff --git a/tools/net/Makefile b/tools/net/Makefile
+index ee577ea03ba5..ddf888010652 100644
+--- a/tools/net/Makefile
++++ b/tools/net/Makefile
+@@ -4,6 +4,9 @@ CC = gcc
+ LEX = flex
+ YACC = bison
+ 
++CFLAGS += -Wall -O2
++CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
++
+ %.yacc.c: %.y
+ 	$(YACC) -o $@ -d $<
+ 
+@@ -12,15 +15,13 @@ YACC = bison
+ 
+ all : bpf_jit_disasm bpf_dbg bpf_asm
+ 
+-bpf_jit_disasm : CFLAGS = -Wall -O2 -DPACKAGE='bpf_jit_disasm'
++bpf_jit_disasm : CFLAGS += -DPACKAGE='bpf_jit_disasm'
+ bpf_jit_disasm : LDLIBS = -lopcodes -lbfd -ldl
+ bpf_jit_disasm : bpf_jit_disasm.o
+ 
+-bpf_dbg : CFLAGS = -Wall -O2
+ bpf_dbg : LDLIBS = -lreadline
+ bpf_dbg : bpf_dbg.o
+ 
+-bpf_asm : CFLAGS = -Wall -O2 -I.
+ bpf_asm : LDLIBS =
+ bpf_asm : bpf_asm.o bpf_exp.yacc.o bpf_exp.lex.o
+ bpf_exp.lex.o : bpf_exp.yacc.c



