* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-08-19 14:58 Mike Pagano
From: Mike Pagano @ 2015-08-19 14:58 UTC
To: gentoo-commits
commit: 1e94ed6b2554ee5b2a7770481aa5d64ad6a2332b
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Wed Aug 19 14:58:06 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Wed Aug 19 14:58:06 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=1e94ed6b
Patch to enable link security restrictions by default.
Patch to disable Windows 8 compatibility for some Lenovo ThinkPads.
Patch to ensure that /dev/root doesn't appear in /proc/mounts when booting without an initramfs.
Patch to not lock when UMH is waiting on current thread spawned by linuxrc. (bug #481344)
fbcondecor bootsplash patch.
Kernel patch that enables gcc < v4.9 optimizations for additional CPUs.
Add patch to support namespace user.pax.* on tmpfs, bug #470644.
0000_README | 36 +
1500_XATTR_USER_PREFIX.patch | 54 +
...ble-link-security-restrictions-by-default.patch | 22 +
2700_ThinkPad-30-brightness-control-fix.patch | 67 +
2900_dev-root-proc-mount-fix.patch | 38 +
2905_2disk-resume-image-fix.patch | 24 +
4200_fbcondecor-3.19.patch | 2119 ++
4567_distro-Gentoo-Kconfig.patch | 39 +-
...able-additional-cpu-optimizations-for-gcc.patch | 327 +
...-additional-cpu-optimizations-for-gcc-4.9.patch | 402 +
5015_kdbus-8-12-2015.patch | 34349 +++++++++++++++++++
11 files changed, 37446 insertions(+), 31 deletions(-)
diff --git a/0000_README b/0000_README
index 9018993..9022e99 100644
--- a/0000_README
+++ b/0000_README
@@ -43,6 +43,42 @@ EXPERIMENTAL
Individual Patch Descriptions:
--------------------------------------------------------------------------
+Patch: 1500_XATTR_USER_PREFIX.patch
+From: https://bugs.gentoo.org/show_bug.cgi?id=470644
+Desc: Support for namespace user.pax.* on tmpfs.
+
+Patch: 1510_fs-enable-link-security-restrictions-by-default.patch
+From: http://sources.debian.net/src/linux/3.16.7-ckt4-3/debian/patches/debian/fs-enable-link-security-restrictions-by-default.patch/
+Desc: Enable link security restrictions by default.
+
+Patch: 2700_ThinkPad-30-brightness-control-fix.patch
+From: Seth Forshee <seth.forshee@canonical.com>
+Desc: ACPI: Disable Windows 8 compatibility for some Lenovo ThinkPads.
+
+Patch: 2900_dev-root-proc-mount-fix.patch
+From: https://bugs.gentoo.org/show_bug.cgi?id=438380
+Desc: Ensure that /dev/root doesn't appear in /proc/mounts when booting without an initramfs.
+
+Patch: 2905_s2disk-resume-image-fix.patch
+From: Al Viro <viro <at> ZenIV.linux.org.uk>
+Desc: Do not lock when UMH is waiting on current thread spawned by linuxrc. (bug #481344)
+
+Patch: 4200_fbcondecor-3.19.patch
+From: http://www.mepiscommunity.org/fbcondecor
+Desc: Bootsplash ported by Marco. (Bug #539616)
+
Patch: 4567_distro-Gentoo-Kconfig.patch
From: Tom Wijsman <TomWij@gentoo.org>
Desc: Add Gentoo Linux support config settings and defaults.
+
+Patch: 5000_enable-additional-cpu-optimizations-for-gcc.patch
+From: https://github.com/graysky2/kernel_gcc_patch/
+Desc: Kernel patch enables gcc < v4.9 optimizations for additional CPUs.
+
+Patch: 5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
+From: https://github.com/graysky2/kernel_gcc_patch/
+Desc: Kernel patch enables gcc >= v4.9 optimizations for additional CPUs.
+
+Patch: 5015_kdbus-8-12-2015.patch
+From: https://lkml.org
+Desc: Kernel-level IPC implementation
diff --git a/1500_XATTR_USER_PREFIX.patch b/1500_XATTR_USER_PREFIX.patch
new file mode 100644
index 0000000..cc15cd5
--- /dev/null
+++ b/1500_XATTR_USER_PREFIX.patch
@@ -0,0 +1,54 @@
+From: Anthony G. Basile <blueness@gentoo.org>
+
+This patch adds support for a restricted user-controlled namespace on
+tmpfs filesystem used to house PaX flags. The namespace must be of the
+form user.pax.* and its value cannot exceed a size of 8 bytes.
+
+This is needed on all Gentoo systems, not only hardened ones, so that XATTR_PAX flags
+are preserved for users who might build packages using portage on
+a tmpfs system with a non-hardened kernel and then switch to a
+hardened kernel with XATTR_PAX enabled.
+
+The namespace is added to any user with Extended Attribute support
+enabled for tmpfs. Users who do not enable xattrs will not have
+the XATTR_PAX flags preserved.
+
+diff --git a/include/uapi/linux/xattr.h b/include/uapi/linux/xattr.h
+index e4629b9..6958086 100644
+--- a/include/uapi/linux/xattr.h
++++ b/include/uapi/linux/xattr.h
+@@ -63,5 +63,9 @@
+ #define XATTR_POSIX_ACL_DEFAULT "posix_acl_default"
+ #define XATTR_NAME_POSIX_ACL_DEFAULT XATTR_SYSTEM_PREFIX XATTR_POSIX_ACL_DEFAULT
+
++/* User namespace */
++#define XATTR_PAX_PREFIX XATTR_USER_PREFIX "pax."
++#define XATTR_PAX_FLAGS_SUFFIX "flags"
++#define XATTR_NAME_PAX_FLAGS XATTR_PAX_PREFIX XATTR_PAX_FLAGS_SUFFIX
+
+ #endif /* _UAPI_LINUX_XATTR_H */
+diff --git a/mm/shmem.c b/mm/shmem.c
+index 1c44af7..f23bb1b 100644
+--- a/mm/shmem.c
++++ b/mm/shmem.c
+@@ -2201,6 +2201,7 @@ static const struct xattr_handler *shmem_xattr_handlers[] = {
+ static int shmem_xattr_validate(const char *name)
+ {
+ struct { const char *prefix; size_t len; } arr[] = {
++ { XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN},
+ { XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN },
+ { XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN }
+ };
+@@ -2256,6 +2257,12 @@ static int shmem_setxattr(struct dentry *dentry, const char *name,
+ if (err)
+ return err;
+
++ if (!strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN)) {
++ if (strcmp(name, XATTR_NAME_PAX_FLAGS))
++ return -EOPNOTSUPP;
++ if (size > 8)
++ return -EINVAL;
++ }
+ return simple_xattr_set(&info->xattrs, name, value, size, flags);
+ }
+
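The check added to shmem_setxattr() above can be modelled in userspace roughly as follows. This is a minimal sketch, not kernel code: pax_xattr_check() is a hypothetical name, and the earlier prefix validation done by shmem_xattr_validate() is not modelled. Note that, as written, the patch accepts only the single name user.pax.flags rather than every name under user.pax.*.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* String values as in include/uapi/linux/xattr.h plus the patch above. */
#define XATTR_USER_PREFIX     "user."
#define XATTR_USER_PREFIX_LEN (sizeof(XATTR_USER_PREFIX) - 1)
#define XATTR_NAME_PAX_FLAGS  XATTR_USER_PREFIX "pax." "flags"

/*
 * Hypothetical userspace helper (assumption, not a kernel function).
 * Returns 0 if the user.* rule added by the patch would let the request
 * through to simple_xattr_set(), or a negative errno otherwise.
 */
int pax_xattr_check(const char *name, size_t size)
{
	assert(name != NULL);

	if (strncmp(name, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN) != 0)
		return 0;		/* not in user.*: the new rule does not apply */
	if (strcmp(name, XATTR_NAME_PAX_FLAGS) != 0)
		return -EOPNOTSUPP;	/* only user.pax.flags is accepted */
	if (size > 8)
		return -EINVAL;		/* value limited to 8 bytes */
	return 0;
}
```

Under this model, setting user.pax.flags with a value of up to 8 bytes succeeds, while any other user.* attribute on tmpfs is rejected with EOPNOTSUPP.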
diff --git a/1510_fs-enable-link-security-restrictions-by-default.patch b/1510_fs-enable-link-security-restrictions-by-default.patch
new file mode 100644
index 0000000..639fb3c
--- /dev/null
+++ b/1510_fs-enable-link-security-restrictions-by-default.patch
@@ -0,0 +1,22 @@
+From: Ben Hutchings <ben@decadent.org.uk>
+Subject: fs: Enable link security restrictions by default
+Date: Fri, 02 Nov 2012 05:32:06 +0000
+Bug-Debian: https://bugs.debian.org/609455
+Forwarded: not-needed
+
+This reverts commit 561ec64ae67ef25cac8d72bb9c4bfc955edfd415
+('VFS: don't do protected {sym,hard}links by default').
+
+--- a/fs/namei.c
++++ b/fs/namei.c
+@@ -651,8 +651,8 @@ static inline void put_link(struct namei
+ path_put(link);
+ }
+
+-int sysctl_protected_symlinks __read_mostly = 0;
+-int sysctl_protected_hardlinks __read_mostly = 0;
++int sysctl_protected_symlinks __read_mostly = 1;
++int sysctl_protected_hardlinks __read_mostly = 1;
+
+ /**
+ * may_follow_link - Check symlink following for unsafe situations
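The two defaults changed above are exposed at run time as the sysctls fs.protected_symlinks and fs.protected_hardlinks. A minimal userspace sketch for checking what a running kernel actually uses might look like this; read_sysctl_int() and report_link_protection() are hypothetical helper names, and the sketch assumes procfs is mounted at /proc.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a sysctl file.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 */
int read_sysctl_int(const char *path, int *out)
{
	FILE *f;
	int ok;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%d", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Print the current value of both link-protection sysctls. */
void report_link_protection(void)
{
	static const char *const paths[] = {
		"/proc/sys/fs/protected_symlinks",
		"/proc/sys/fs/protected_hardlinks",
	};
	size_t i;
	int value;

	for (i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
		if (read_sysctl_int(paths[i], &value) == 0)
			printf("%s = %d\n", paths[i], value);
		else
			printf("%s: not readable\n", paths[i]);
	}
}
```

With the patch applied, both values would be expected to read 1 unless an init script or sysctl.conf overrides them.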
diff --git a/2700_ThinkPad-30-brightness-control-fix.patch b/2700_ThinkPad-30-brightness-control-fix.patch
new file mode 100644
index 0000000..b548c6d
--- /dev/null
+++ b/2700_ThinkPad-30-brightness-control-fix.patch
@@ -0,0 +1,67 @@
+diff --git a/drivers/acpi/blacklist.c b/drivers/acpi/blacklist.c
+index cb96296..6c242ed 100644
+--- a/drivers/acpi/blacklist.c
++++ b/drivers/acpi/blacklist.c
+@@ -269,6 +276,61 @@ static struct dmi_system_id acpi_osi_dmi_table[] __initdata = {
+ },
+
+ /*
++ * The following Lenovo models have a broken workaround in the
++ * acpi_video backlight implementation to meet the Windows 8
++ * requirement of 101 backlight levels. Reverting to pre-Win8
++ * behavior fixes the problem.
++ */
++ {
++ .callback = dmi_disable_osi_win8,
++ .ident = "Lenovo ThinkPad L430",
++ .matches = {
++ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++ DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad L430"),
++ },
++ },
++ {
++ .callback = dmi_disable_osi_win8,
++ .ident = "Lenovo ThinkPad T430s",
++ .matches = {
++ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++ DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T430s"),
++ },
++ },
++ {
++ .callback = dmi_disable_osi_win8,
++ .ident = "Lenovo ThinkPad T530",
++ .matches = {
++ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++ DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad T530"),
++ },
++ },
++ {
++ .callback = dmi_disable_osi_win8,
++ .ident = "Lenovo ThinkPad W530",
++ .matches = {
++ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++ DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad W530"),
++ },
++ },
++ {
++ .callback = dmi_disable_osi_win8,
++ .ident = "Lenovo ThinkPad X1 Carbon",
++ .matches = {
++ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++ DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad X1 Carbon"),
++ },
++ },
++ {
++ .callback = dmi_disable_osi_win8,
++ .ident = "Lenovo ThinkPad X230",
++ .matches = {
++ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++ DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad X230"),
++ },
++ },
++
++ /*
+ * BIOS invocation of _OSI(Linux) is almost always a BIOS bug.
+ * Linux ignores it, except for the machines enumerated below.
+ */
+
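The table entries above are matched against the firmware's DMI strings. DMI_MATCH() performs a substring comparison, so the lookup can be sketched in userspace as below. The function name win8_osi_quirk_applies() and the flat version list are assumptions made for illustration; the kernel walks struct dmi_system_id entries instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Product versions listed in the patch above. */
static const char *const win8_osi_quirk_versions[] = {
	"ThinkPad L430", "ThinkPad T430s", "ThinkPad T530",
	"ThinkPad W530", "ThinkPad X1 Carbon", "ThinkPad X230",
};

/*
 * Hypothetical userspace model of the table lookup. Both fields are
 * compared with strstr() because DMI_MATCH() matches substrings.
 */
bool win8_osi_quirk_applies(const char *sys_vendor, const char *product_version)
{
	size_t i;
	size_t n = sizeof(win8_osi_quirk_versions) /
		   sizeof(win8_osi_quirk_versions[0]);

	assert(sys_vendor != NULL && product_version != NULL);

	if (!strstr(sys_vendor, "LENOVO"))
		return false;
	for (i = 0; i < n; i++)
		if (strstr(product_version, win8_osi_quirk_versions[i]))
			return true;
	return false;
}
```

One consequence of substring matching is that a version string such as "ThinkPad T530i" also matches the "ThinkPad T530" entry, while "ThinkPad T430" does not match "ThinkPad T430s".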
diff --git a/2900_dev-root-proc-mount-fix.patch b/2900_dev-root-proc-mount-fix.patch
new file mode 100644
index 0000000..60af1eb
--- /dev/null
+++ b/2900_dev-root-proc-mount-fix.patch
@@ -0,0 +1,38 @@
+--- a/init/do_mounts.c 2015-08-19 10:27:16.753852576 -0400
++++ b/init/do_mounts.c 2015-08-19 10:34:25.473850353 -0400
+@@ -490,7 +490,11 @@ void __init change_floppy(char *fmt, ...
+ va_start(args, fmt);
+ vsprintf(buf, fmt, args);
+ va_end(args);
+- fd = sys_open("/dev/root", O_RDWR | O_NDELAY, 0);
++ if (saved_root_name[0])
++ fd = sys_open(saved_root_name, O_RDWR | O_NDELAY, 0);
++ else
++ fd = sys_open("/dev/root", O_RDWR | O_NDELAY, 0);
++
+ if (fd >= 0) {
+ sys_ioctl(fd, FDEJECT, 0);
+ sys_close(fd);
+@@ -534,11 +538,17 @@ void __init mount_root(void)
+ #endif
+ #ifdef CONFIG_BLOCK
+ {
+- int err = create_dev("/dev/root", ROOT_DEV);
+-
+- if (err < 0)
+- pr_emerg("Failed to create /dev/root: %d\n", err);
+- mount_block_root("/dev/root", root_mountflags);
++ if (saved_root_name[0] == '/') {
++ int err = create_dev(saved_root_name, ROOT_DEV);
++ if (err < 0)
++ pr_emerg("Failed to create %s: %d\n", saved_root_name, err);
++ mount_block_root(saved_root_name, root_mountflags);
++ } else {
++ int err = create_dev("/dev/root", ROOT_DEV);
++ if (err < 0)
++ pr_emerg("Failed to create /dev/root: %d\n", err);
++ mount_block_root("/dev/root", root_mountflags);
++ }
+ }
+ #endif
+ }
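The branch added to mount_root() above boils down to choosing which device node name to create and mount. A minimal sketch of that choice, assuming saved_root_name holds the string passed via root= on the kernel command line; root_node_name() is a hypothetical helper, not a kernel function:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the branch added to mount_root():
 * use the root= name when it is an absolute path, otherwise fall
 * back to /dev/root.
 */
const char *root_node_name(const char *saved_root_name)
{
	assert(saved_root_name != NULL);
	return saved_root_name[0] == '/' ? saved_root_name : "/dev/root";
}
```

Under this model, root=/dev/sda2 yields /dev/sda2, so that name is what shows up in /proc/mounts, while forms that do not start with a slash (an empty string, or a PARTUUID= specification) still go through /dev/root. The change_floppy() hunk uses a slightly looser test and only checks that the name is non-empty.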
diff --git a/2905_2disk-resume-image-fix.patch b/2905_2disk-resume-image-fix.patch
new file mode 100644
index 0000000..7e95d29
--- /dev/null
+++ b/2905_2disk-resume-image-fix.patch
@@ -0,0 +1,24 @@
+diff --git a/kernel/kmod.c b/kernel/kmod.c
+index fb32636..d968882 100644
+--- a/kernel/kmod.c
++++ b/kernel/kmod.c
+@@ -575,7 +575,8 @@
+ call_usermodehelper_freeinfo(sub_info);
+ return -EINVAL;
+ }
+- helper_lock();
++ if (!(current->flags & PF_FREEZER_SKIP))
++ helper_lock();
+ if (!khelper_wq || usermodehelper_disabled) {
+ retval = -EBUSY;
+ goto out;
+@@ -611,7 +612,8 @@ wait_done:
+ out:
+ call_usermodehelper_freeinfo(sub_info);
+ unlock:
+- helper_unlock();
++ if (!(current->flags & PF_FREEZER_SKIP))
++ helper_unlock();
+ return retval;
+ }
+ EXPORT_SYMBOL(call_usermodehelper_exec);
diff --git a/4200_fbcondecor-3.19.patch b/4200_fbcondecor-3.19.patch
new file mode 100644
index 0000000..29c379f
--- /dev/null
+++ b/4200_fbcondecor-3.19.patch
@@ -0,0 +1,2119 @@
+diff --git a/Documentation/fb/00-INDEX b/Documentation/fb/00-INDEX
+index fe85e7c..2230930 100644
+--- a/Documentation/fb/00-INDEX
++++ b/Documentation/fb/00-INDEX
+@@ -23,6 +23,8 @@ ep93xx-fb.txt
+ - info on the driver for EP93xx LCD controller.
+ fbcon.txt
+ - intro to and usage guide for the framebuffer console (fbcon).
++fbcondecor.txt
++ - info on the Framebuffer Console Decoration
+ framebuffer.txt
+ - introduction to frame buffer devices.
+ gxfb.txt
+diff --git a/Documentation/fb/fbcondecor.txt b/Documentation/fb/fbcondecor.txt
+new file mode 100644
+index 0000000..3388c61
+--- /dev/null
++++ b/Documentation/fb/fbcondecor.txt
+@@ -0,0 +1,207 @@
++What is it?
++-----------
++
++The framebuffer decorations are a kernel feature which allows displaying a
++background picture on selected consoles.
++
++What do I need to get it to work?
++---------------------------------
++
++To get fbcondecor up-and-running you will have to:
++ 1) get a copy of splashutils [1] or a similar program
++ 2) get some fbcondecor themes
++ 3) build the kernel helper program
++ 4) build your kernel with the FB_CON_DECOR option enabled.
++
++To get fbcondecor operational right after fbcon initialization is finished, you
++will have to include a theme and the kernel helper into your initramfs image.
++Please refer to splashutils documentation for instructions on how to do that.
++
++[1] The splashutils package can be downloaded from:
++ http://github.com/alanhaggai/fbsplash
++
++The userspace helper
++--------------------
++
++The userspace fbcondecor helper (by default: /sbin/fbcondecor_helper) is called by the
++kernel whenever an important event occurs and the kernel needs some kind of
++job to be carried out. Important events include console switches and video
++mode switches (the kernel requests background images and configuration
++parameters for the current console). The fbcondecor helper must be accessible at
++all times. If it's not, fbcondecor will be switched off automatically.
++
++It's possible to set the path to the fbcondecor helper by writing it to
++/proc/sys/kernel/fbcondecor.
++
++*****************************************************************************
++
++The information below is mostly technical stuff. There's probably no need to
++read it unless you plan to develop a userspace helper.
++
++The fbcondecor protocol
++-----------------------
++
++The fbcondecor protocol defines a communication interface between the kernel and
++the userspace fbcondecor helper.
++
++The kernel side is responsible for:
++
++ * rendering console text, using an image as a background (instead of a
++ standard solid color fbcon uses),
++ * accepting commands from the user via ioctls on the fbcondecor device,
++ * calling the userspace helper to set things up as soon as the fb subsystem
++ is initialized.
++
++The userspace helper is responsible for everything else, including parsing
++configuration files, decompressing the image files whenever the kernel needs
++it, and communicating with the kernel if necessary.
++
++The fbcondecor protocol specifies how communication is done in both ways:
++kernel->userspace and userspace->kernel.
++
++Kernel -> Userspace
++-------------------
++
++The kernel communicates with the userspace helper by calling it and specifying
++the task to be done in a series of arguments.
++
++The arguments follow the pattern:
++<fbcondecor protocol version> <command> <parameters>
++
++All commands defined in fbcondecor protocol v2 have the following parameters:
++ virtual console
++ framebuffer number
++ theme
++
++Fbcondecor protocol v1 specified an additional 'fbcondecor mode' after the
++framebuffer number. Fbcondecor protocol v1 is deprecated and should not be used.
++
++Fbcondecor protocol v2 specifies the following commands:
++
++getpic
++------
++ The kernel issues this command to request image data. It's up to the
++ userspace helper to find a background image appropriate for the specified
++ theme and the current resolution. The userspace helper should respond by
++ issuing the FBIOCONDECOR_SETPIC ioctl.
++
++init
++----
++ The kernel issues this command after the fbcondecor device is created and
++ the fbcondecor interface is initialized. Upon receiving 'init', the userspace
++ helper should parse the kernel command line (/proc/cmdline) or otherwise
++ decide whether fbcondecor is to be activated.
++
++ To activate fbcondecor on the first console the helper should issue the
++ FBIOCONDECOR_SETCFG, FBIOCONDECOR_SETPIC and FBIOCONDECOR_SETSTATE commands,
++ in the above-mentioned order.
++
++ When the userspace helper is called in an early phase of the boot process
++ (right after the initialization of fbcon), no filesystems will be mounted.
++ The helper program should mount sysfs and then create the appropriate
++ framebuffer, fbcondecor and tty0 devices (if they don't already exist) to get
++ current display settings and to be able to communicate with the kernel side.
++ It should probably also mount the procfs to be able to parse the kernel
++ command line parameters.
++
++ Note that the console sem is not held when the kernel calls fbcondecor_helper
++ with the 'init' command. The fbcondecor helper should perform all ioctls with
++ origin set to FBCON_DECOR_IO_ORIG_USER.
++
++modechange
++----------
++ The kernel issues this command on a mode change. The helper's response should
++ be similar to the response to the 'init' command. Note that this time the
++ console sem is held and all ioctls must be performed with origin set to
++ FBCON_DECOR_IO_ORIG_KERNEL.
++
++
++Userspace -> Kernel
++-------------------
++
++Userspace programs can communicate with fbcondecor via ioctls on the
++fbcondecor device. These ioctls are to be used by both the userspace helper
++(called only by the kernel) and userspace configuration tools (run by the users).
++
++The fbcondecor helper should set the origin field to FBCON_DECOR_IO_ORIG_KERNEL
++when doing the appropriate ioctls. All userspace configuration tools should
++use FBCON_DECOR_IO_ORIG_USER. Failure to set the appropriate value in the origin
++field when performing ioctls from the kernel helper will most likely result
++in a console deadlock.
++
++FBCON_DECOR_IO_ORIG_KERNEL instructs fbcondecor not to try to acquire the console
++semaphore. Not surprisingly, FBCON_DECOR_IO_ORIG_USER instructs it to acquire
++the console sem.
++
++The framebuffer console decoration provides the following ioctls (all defined in
++linux/fb.h):
++
++FBIOCONDECOR_SETPIC
++description: loads a background picture for a virtual console
++argument: struct fbcon_decor_iowrapper*; data: struct fb_image*
++notes:
++If called for consoles other than the current foreground one, the picture data
++will be ignored.
++
++If the current virtual console is running in an 8-bpp mode, the cmap substruct
++of fb_image has to be filled appropriately: start should be set to 16 (first
++16 colors are reserved for fbcon), len to a value <= 240 and red, green and
++blue should point to valid cmap data. The transp field is ignored. The fields
++dx, dy, bg_color, fg_color in fb_image are ignored as well.
++
++FBIOCONDECOR_SETCFG
++description: sets the fbcondecor config for a virtual console
++argument: struct fbcon_decor_iowrapper*; data: struct vc_decor*
++notes: The structure has to be filled with valid data.
++
++FBIOCONDECOR_GETCFG
++description: gets the fbcondecor config for a virtual console
++argument: struct fbcon_decor_iowrapper*; data: struct vc_decor*
++
++FBIOCONDECOR_SETSTATE
++description: sets the fbcondecor state for a virtual console
++argument: struct fbcon_decor_iowrapper*; data: unsigned int*
++ values: 0 = disabled, 1 = enabled.
++
++FBIOCONDECOR_GETSTATE
++description: gets the fbcondecor state for a virtual console
++argument: struct fbcon_decor_iowrapper*; data: unsigned int*
++ values: as in FBIOCONDECOR_SETSTATE
++
++Info on used structures:
++
++Definition of struct vc_decor can be found in linux/console_decor.h. It's
++heavily commented. Note that the 'theme' field should point to a string
++no longer than FBCON_DECOR_THEME_LEN. When FBIOCONDECOR_GETCFG call is
++performed, the theme field should point to a char buffer of length
++FBCON_DECOR_THEME_LEN.
++
++Definition of struct fbcon_decor_iowrapper can be found in linux/fb.h.
++The fields in this struct have the following meaning:
++
++vc:
++Virtual console number.
++
++origin:
++Specifies if the ioctl is performed as a response to a kernel request. The
++fbcondecor helper should set this field to FBCON_DECOR_IO_ORIG_KERNEL, userspace
++programs should set it to FBCON_DECOR_IO_ORIG_USER. This field is necessary to
++avoid console semaphore deadlocks.
++
++data:
++Pointer to a data structure appropriate for the performed ioctl. Type of
++the data struct is specified in the ioctls description.
++
++*****************************************************************************
++
++Credit
++------
++
++Original 'bootsplash' project & implementation by:
++ Volker Poplawski <volker@poplawski.de>, Stefan Reinauer <stepan@suse.de>,
++ Steffen Winterfeldt <snwint@suse.de>, Michael Schroeder <mls@suse.de>,
++ Ken Wimer <wimer@suse.de>.
++
++Fbcondecor, fbcondecor protocol design, current implementation & docs by:
++ Michal Januszewski <michalj+fbcondecor@gmail.com>
++
+diff --git a/drivers/Makefile b/drivers/Makefile
+index 7183b6a..d576148 100644
+--- a/drivers/Makefile
++++ b/drivers/Makefile
+@@ -17,6 +17,10 @@ obj-y += pwm/
+ obj-$(CONFIG_PCI) += pci/
+ obj-$(CONFIG_PARISC) += parisc/
+ obj-$(CONFIG_RAPIDIO) += rapidio/
++# tty/ comes before char/ so that the VT console is the boot-time
++# default.
++obj-y += tty/
++obj-y += char/
+ obj-y += video/
+ obj-y += idle/
+
+@@ -42,11 +46,6 @@ obj-$(CONFIG_REGULATOR) += regulator/
+ # reset controllers early, since gpu drivers might rely on them to initialize
+ obj-$(CONFIG_RESET_CONTROLLER) += reset/
+
+-# tty/ comes before char/ so that the VT console is the boot-time
+-# default.
+-obj-y += tty/
+-obj-y += char/
+-
+ # iommu/ comes before gpu as gpu are using iommu controllers
+ obj-$(CONFIG_IOMMU_SUPPORT) += iommu/
+
+diff --git a/drivers/video/console/Kconfig b/drivers/video/console/Kconfig
+index fe1cd01..6d2e87a 100644
+--- a/drivers/video/console/Kconfig
++++ b/drivers/video/console/Kconfig
+@@ -126,6 +126,19 @@ config FRAMEBUFFER_CONSOLE_ROTATION
+ such that other users of the framebuffer will remain normally
+ oriented.
+
++config FB_CON_DECOR
++ bool "Support for the Framebuffer Console Decorations"
++ depends on FRAMEBUFFER_CONSOLE=y && !FB_TILEBLITTING
++ default n
++ ---help---
++ This option enables support for framebuffer console decorations which
++ makes it possible to display images in the background of the system
++ consoles. Note that userspace utilities are necessary in order to take
++ advantage of these features. Refer to Documentation/fb/fbcondecor.txt
++ for more information.
++
++ If unsure, say N.
++
+ config STI_CONSOLE
+ bool "STI text console"
+ depends on PARISC
+diff --git a/drivers/video/console/Makefile b/drivers/video/console/Makefile
+index 43bfa48..cc104b6f 100644
+--- a/drivers/video/console/Makefile
++++ b/drivers/video/console/Makefile
+@@ -16,4 +16,5 @@ obj-$(CONFIG_FRAMEBUFFER_CONSOLE) += fbcon_rotate.o fbcon_cw.o fbcon_ud.o \
+ fbcon_ccw.o
+ endif
+
++obj-$(CONFIG_FB_CON_DECOR) += fbcondecor.o cfbcondecor.o
+ obj-$(CONFIG_FB_STI) += sticore.o
+diff --git a/drivers/video/console/bitblit.c b/drivers/video/console/bitblit.c
+index 61b182b..984384b 100644
+--- a/drivers/video/console/bitblit.c
++++ b/drivers/video/console/bitblit.c
+@@ -18,6 +18,7 @@
+ #include <linux/console.h>
+ #include <asm/types.h>
+ #include "fbcon.h"
++#include "fbcondecor.h"
+
+ /*
+ * Accelerated handlers.
+@@ -55,6 +56,13 @@ static void bit_bmove(struct vc_data *vc, struct fb_info *info, int sy,
+ area.height = height * vc->vc_font.height;
+ area.width = width * vc->vc_font.width;
+
++ if (fbcon_decor_active(info, vc)) {
++ area.sx += vc->vc_decor.tx;
++ area.sy += vc->vc_decor.ty;
++ area.dx += vc->vc_decor.tx;
++ area.dy += vc->vc_decor.ty;
++ }
++
+ info->fbops->fb_copyarea(info, &area);
+ }
+
+@@ -380,11 +388,15 @@ static void bit_cursor(struct vc_data *vc, struct fb_info *info, int mode,
+ cursor.image.depth = 1;
+ cursor.rop = ROP_XOR;
+
+- if (info->fbops->fb_cursor)
+- err = info->fbops->fb_cursor(info, &cursor);
++ if (fbcon_decor_active(info, vc)) {
++ fbcon_decor_cursor(info, &cursor);
++ } else {
++ if (info->fbops->fb_cursor)
++ err = info->fbops->fb_cursor(info, &cursor);
+
+- if (err)
+- soft_cursor(info, &cursor);
++ if (err)
++ soft_cursor(info, &cursor);
++ }
+
+ ops->cursor_reset = 0;
+ }
+diff --git a/drivers/video/console/cfbcondecor.c b/drivers/video/console/cfbcondecor.c
+new file mode 100644
+index 0000000..a2b4497
+--- /dev/null
++++ b/drivers/video/console/cfbcondecor.c
+@@ -0,0 +1,471 @@
++/*
++ * linux/drivers/video/cfbcon_decor.c -- Framebuffer decor render functions
++ *
++ * Copyright (C) 2004 Michal Januszewski <michalj+fbcondecor@gmail.com>
++ *
++ * Code based upon "Bootsplash" (C) 2001-2003
++ * Volker Poplawski <volker@poplawski.de>,
++ * Stefan Reinauer <stepan@suse.de>,
++ * Steffen Winterfeldt <snwint@suse.de>,
++ * Michael Schroeder <mls@suse.de>,
++ * Ken Wimer <wimer@suse.de>.
++ *
++ * This file is subject to the terms and conditions of the GNU General Public
++ * License. See the file COPYING in the main directory of this archive for
++ * more details.
++ */
++#include <linux/module.h>
++#include <linux/types.h>
++#include <linux/fb.h>
++#include <linux/selection.h>
++#include <linux/slab.h>
++#include <linux/vt_kern.h>
++#include <asm/irq.h>
++
++#include "fbcon.h"
++#include "fbcondecor.h"
++
++#define parse_pixel(shift,bpp,type) \
++ do { \
++ if (d & (0x80 >> (shift))) \
++ dd2[(shift)] = fgx; \
++ else \
++ dd2[(shift)] = transparent ? *(type *)decor_src : bgx; \
++ decor_src += (bpp); \
++ } while (0) \
++
++extern int get_color(struct vc_data *vc, struct fb_info *info,
++ u16 c, int is_fg);
++
++void fbcon_decor_fix_pseudo_pal(struct fb_info *info, struct vc_data *vc)
++{
++ int i, j, k;
++ int minlen = min(min(info->var.red.length, info->var.green.length),
++ info->var.blue.length);
++ u32 col;
++
++ for (j = i = 0; i < 16; i++) {
++ k = color_table[i];
++
++ col = ((vc->vc_palette[j++] >> (8-minlen))
++ << info->var.red.offset);
++ col |= ((vc->vc_palette[j++] >> (8-minlen))
++ << info->var.green.offset);
++ col |= ((vc->vc_palette[j++] >> (8-minlen))
++ << info->var.blue.offset);
++ ((u32 *)info->pseudo_palette)[k] = col;
++ }
++}
++
++void fbcon_decor_renderc(struct fb_info *info, int ypos, int xpos, int height,
++ int width, u8* src, u32 fgx, u32 bgx, u8 transparent)
++{
++ unsigned int x, y;
++ u32 dd;
++ int bytespp = ((info->var.bits_per_pixel + 7) >> 3);
++ unsigned int d = ypos * info->fix.line_length + xpos * bytespp;
++ unsigned int ds = (ypos * info->var.xres + xpos) * bytespp;
++ u16 dd2[4];
++
++ u8* decor_src = (u8 *)(info->bgdecor.data + ds);
++ u8* dst = (u8 *)(info->screen_base + d);
++
++ if ((ypos + height) > info->var.yres || (xpos + width) > info->var.xres)
++ return;
++
++ for (y = 0; y < height; y++) {
++ switch (info->var.bits_per_pixel) {
++
++ case 32:
++ for (x = 0; x < width; x++) {
++
++ if ((x & 7) == 0)
++ d = *src++;
++ if (d & 0x80)
++ dd = fgx;
++ else
++ dd = transparent ?
++ *(u32 *)decor_src : bgx;
++
++ d <<= 1;
++ decor_src += 4;
++ fb_writel(dd, dst);
++ dst += 4;
++ }
++ break;
++ case 24:
++ for (x = 0; x < width; x++) {
++
++ if ((x & 7) == 0)
++ d = *src++;
++ if (d & 0x80)
++ dd = fgx;
++ else
++ dd = transparent ?
++ (*(u32 *)decor_src & 0xffffff) : bgx;
++
++ d <<= 1;
++ decor_src += 3;
++#ifdef __LITTLE_ENDIAN
++ fb_writew(dd & 0xffff, dst);
++ dst += 2;
++ fb_writeb((dd >> 16), dst);
++#else
++ fb_writew(dd >> 8, dst);
++ dst += 2;
++ fb_writeb(dd & 0xff, dst);
++#endif
++ dst++;
++ }
++ break;
++ case 16:
++ for (x = 0; x < width; x += 2) {
++ if ((x & 7) == 0)
++ d = *src++;
++
++ parse_pixel(0, 2, u16);
++ parse_pixel(1, 2, u16);
++#ifdef __LITTLE_ENDIAN
++ dd = dd2[0] | (dd2[1] << 16);
++#else
++ dd = dd2[1] | (dd2[0] << 16);
++#endif
++ d <<= 2;
++ fb_writel(dd, dst);
++ dst += 4;
++ }
++ break;
++
++ case 8:
++ for (x = 0; x < width; x += 4) {
++ if ((x & 7) == 0)
++ d = *src++;
++
++ parse_pixel(0, 1, u8);
++ parse_pixel(1, 1, u8);
++ parse_pixel(2, 1, u8);
++ parse_pixel(3, 1, u8);
++
++#ifdef __LITTLE_ENDIAN
++ dd = dd2[0] | (dd2[1] << 8) | (dd2[2] << 16) | (dd2[3] << 24);
++#else
++ dd = dd2[3] | (dd2[2] << 8) | (dd2[1] << 16) | (dd2[0] << 24);
++#endif
++ d <<= 4;
++ fb_writel(dd, dst);
++ dst += 4;
++ }
++ }
++
++ dst += info->fix.line_length - width * bytespp;
++ decor_src += (info->var.xres - width) * bytespp;
++ }
++}
++
++#define cc2cx(a) \
++ ((info->fix.visual == FB_VISUAL_TRUECOLOR || \
++ info->fix.visual == FB_VISUAL_DIRECTCOLOR) ? \
++ ((u32*)info->pseudo_palette)[a] : a)
++
++void fbcon_decor_putcs(struct vc_data *vc, struct fb_info *info,
++ const unsigned short *s, int count, int yy, int xx)
++{
++ unsigned short charmask = vc->vc_hi_font_mask ? 0x1ff : 0xff;
++ struct fbcon_ops *ops = info->fbcon_par;
++ int fg_color, bg_color, transparent;
++ u8 *src;
++ u32 bgx, fgx;
++ u16 c = scr_readw(s);
++
++ fg_color = get_color(vc, info, c, 1);
++ bg_color = get_color(vc, info, c, 0);
++
++ /* Don't paint the background image if console is blanked */
++ transparent = ops->blank_state ? 0 :
++ (vc->vc_decor.bg_color == bg_color);
++
++ xx = xx * vc->vc_font.width + vc->vc_decor.tx;
++ yy = yy * vc->vc_font.height + vc->vc_decor.ty;
++
++ fgx = cc2cx(fg_color);
++ bgx = cc2cx(bg_color);
++
++ while (count--) {
++ c = scr_readw(s++);
++ src = vc->vc_font.data + (c & charmask) * vc->vc_font.height *
++ ((vc->vc_font.width + 7) >> 3);
++
++ fbcon_decor_renderc(info, yy, xx, vc->vc_font.height,
++ vc->vc_font.width, src, fgx, bgx, transparent);
++ xx += vc->vc_font.width;
++ }
++}
++
++void fbcon_decor_cursor(struct fb_info *info, struct fb_cursor *cursor)
++{
++ int i;
++ unsigned int dsize, s_pitch;
++ struct fbcon_ops *ops = info->fbcon_par;
++ struct vc_data* vc;
++ u8 *src;
++
++ /* we really don't need any cursors while the console is blanked */
++ if (info->state != FBINFO_STATE_RUNNING || ops->blank_state)
++ return;
++
++ vc = vc_cons[ops->currcon].d;
++
++ src = kmalloc(64 + sizeof(struct fb_image), GFP_ATOMIC);
++ if (!src)
++ return;
++
++ s_pitch = (cursor->image.width + 7) >> 3;
++ dsize = s_pitch * cursor->image.height;
++ if (cursor->enable) {
++ switch (cursor->rop) {
++ case ROP_XOR:
++ for (i = 0; i < dsize; i++)
++ src[i] = cursor->image.data[i] ^ cursor->mask[i];
++ break;
++ case ROP_COPY:
++ default:
++ for (i = 0; i < dsize; i++)
++ src[i] = cursor->image.data[i] & cursor->mask[i];
++ break;
++ }
++ } else
++ memcpy(src, cursor->image.data, dsize);
++
++ fbcon_decor_renderc(info,
++ cursor->image.dy + vc->vc_decor.ty,
++ cursor->image.dx + vc->vc_decor.tx,
++ cursor->image.height,
++ cursor->image.width,
++ (u8*)src,
++ cc2cx(cursor->image.fg_color),
++ cc2cx(cursor->image.bg_color),
++ cursor->image.bg_color == vc->vc_decor.bg_color);
++
++ kfree(src);
++}
++
++static void decorset(u8 *dst, int height, int width, int dstbytes,
++ u32 bgx, int bpp)
++{
++ int i;
++
++ if (bpp == 8)
++ bgx |= bgx << 8;
++ if (bpp == 16 || bpp == 8)
++ bgx |= bgx << 16;
++
++ while (height-- > 0) {
++ u8 *p = dst;
++
++ switch (bpp) {
++
++ case 32:
++ for (i=0; i < width; i++) {
++ fb_writel(bgx, p); p += 4;
++ }
++ break;
++ case 24:
++ for (i=0; i < width; i++) {
++#ifdef __LITTLE_ENDIAN
++ fb_writew((bgx & 0xffff),(u16*)p); p += 2;
++ fb_writeb((bgx >> 16),p++);
++#else
++ fb_writew((bgx >> 8),(u16*)p); p += 2;
++ fb_writeb((bgx & 0xff),p++);
++#endif
++ }
++ case 16:
++ for (i=0; i < width/4; i++) {
++ fb_writel(bgx,p); p += 4;
++ fb_writel(bgx,p); p += 4;
++ }
++ if (width & 2) {
++ fb_writel(bgx,p); p += 4;
++ }
++ if (width & 1)
++ fb_writew(bgx,(u16*)p);
++ break;
++ case 8:
++ for (i=0; i < width/4; i++) {
++ fb_writel(bgx,p); p += 4;
++ }
++
++ if (width & 2) {
++ fb_writew(bgx,p); p += 2;
++ }
++ if (width & 1)
++ fb_writeb(bgx,(u8*)p);
++ break;
++
++ }
++ dst += dstbytes;
++ }
++}
++
++void fbcon_decor_copy(u8 *dst, u8 *src, int height, int width, int linebytes,
++ int srclinebytes, int bpp)
++{
++ int i;
++
++ while (height-- > 0) {
++ u32 *p = (u32 *)dst;
++ u32 *q = (u32 *)src;
++
++ switch (bpp) {
++
++ case 32:
++ for (i=0; i < width; i++)
++ fb_writel(*q++, p++);
++ break;
++ case 24:
++ for (i=0; i < (width*3/4); i++)
++ fb_writel(*q++, p++);
++ if ((width*3) % 4) {
++ if (width & 2) {
++ fb_writeb(*(u8*)q, (u8*)p);
++ } else if (width & 1) {
++ fb_writew(*(u16*)q, (u16*)p);
++ fb_writeb(*(u8*)((u16*)q+1),(u8*)((u16*)p+2));
++ }
++ }
++ break;
++ case 16:
++ for (i=0; i < width/4; i++) {
++ fb_writel(*q++, p++);
++ fb_writel(*q++, p++);
++ }
++ if (width & 2)
++ fb_writel(*q++, p++);
++ if (width & 1)
++ fb_writew(*(u16*)q, (u16*)p);
++ break;
++ case 8:
++ for (i=0; i < width/4; i++)
++ fb_writel(*q++, p++);
++
++ if (width & 2) {
++ fb_writew(*(u16*)q, (u16*)p);
++ q = (u32*) ((u16*)q + 1);
++ p = (u32*) ((u16*)p + 1);
++ }
++ if (width & 1)
++ fb_writeb(*(u8*)q, (u8*)p);
++ break;
++ }
++
++ dst += linebytes;
++ src += srclinebytes;
++ }
++}
++
++static void decorfill(struct fb_info *info, int sy, int sx, int height,
++ int width)
++{
++ int bytespp = ((info->var.bits_per_pixel + 7) >> 3);
++ int d = sy * info->fix.line_length + sx * bytespp;
++ int ds = (sy * info->var.xres + sx) * bytespp;
++
++ fbcon_decor_copy((u8 *)(info->screen_base + d), (u8 *)(info->bgdecor.data + ds),
++ height, width, info->fix.line_length, info->var.xres * bytespp,
++ info->var.bits_per_pixel);
++}
++
++void fbcon_decor_clear(struct vc_data *vc, struct fb_info *info, int sy, int sx,
++ int height, int width)
++{
++ int bgshift = (vc->vc_hi_font_mask) ? 13 : 12;
++ struct fbcon_ops *ops = info->fbcon_par;
++ u8 *dst;
++ int transparent, bg_color = attr_bgcol_ec(bgshift, vc, info);
++
++ transparent = (vc->vc_decor.bg_color == bg_color);
++ sy = sy * vc->vc_font.height + vc->vc_decor.ty;
++ sx = sx * vc->vc_font.width + vc->vc_decor.tx;
++ height *= vc->vc_font.height;
++ width *= vc->vc_font.width;
++
++ /* Don't paint the background image if console is blanked */
++ if (transparent && !ops->blank_state) {
++ decorfill(info, sy, sx, height, width);
++ } else {
++ dst = (u8 *)(info->screen_base + sy * info->fix.line_length +
++ sx * ((info->var.bits_per_pixel + 7) >> 3));
++ decorset(dst, height, width, info->fix.line_length, cc2cx(bg_color),
++ info->var.bits_per_pixel);
++ }
++}
++
++void fbcon_decor_clear_margins(struct vc_data *vc, struct fb_info *info,
++ int bottom_only)
++{
++ unsigned int tw = vc->vc_cols*vc->vc_font.width;
++ unsigned int th = vc->vc_rows*vc->vc_font.height;
++
++ if (!bottom_only) {
++ /* top margin */
++ decorfill(info, 0, 0, vc->vc_decor.ty, info->var.xres);
++ /* left margin */
++ decorfill(info, vc->vc_decor.ty, 0, th, vc->vc_decor.tx);
++ /* right margin */
++ decorfill(info, vc->vc_decor.ty, vc->vc_decor.tx + tw, th,
++ info->var.xres - vc->vc_decor.tx - tw);
++ }
++ decorfill(info, vc->vc_decor.ty + th, 0,
++ info->var.yres - vc->vc_decor.ty - th, info->var.xres);
++}
++
++void fbcon_decor_bmove_redraw(struct vc_data *vc, struct fb_info *info, int y,
++ int sx, int dx, int width)
++{
++ u16 *d = (u16 *) (vc->vc_origin + vc->vc_size_row * y + dx * 2);
++ u16 *s = d + (dx - sx);
++ u16 *start = d;
++ u16 *ls = d;
++ u16 *le = d + width;
++ u16 c;
++ int x = dx;
++ u16 attr = 1;
++
++ do {
++ c = scr_readw(d);
++ if (attr != (c & 0xff00)) {
++ attr = c & 0xff00;
++ if (d > start) {
++ fbcon_decor_putcs(vc, info, start, d - start, y, x);
++ x += d - start;
++ start = d;
++ }
++ }
++ if (s >= ls && s < le && c == scr_readw(s)) {
++ if (d > start) {
++ fbcon_decor_putcs(vc, info, start, d - start, y, x);
++ x += d - start + 1;
++ start = d + 1;
++ } else {
++ x++;
++ start++;
++ }
++ }
++ s++;
++ d++;
++ } while (d < le);
++ if (d > start)
++ fbcon_decor_putcs(vc, info, start, d - start, y, x);
++}
++
++void fbcon_decor_blank(struct vc_data *vc, struct fb_info *info, int blank)
++{
++ if (blank) {
++ decorset((u8 *)info->screen_base, info->var.yres, info->var.xres,
++ info->fix.line_length, 0, info->var.bits_per_pixel);
++ } else {
++ update_screen(vc);
++ fbcon_decor_clear_margins(vc, info, 0);
++ }
++}
++
+diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
+index f447734..da50d61 100644
+--- a/drivers/video/console/fbcon.c
++++ b/drivers/video/console/fbcon.c
+@@ -79,6 +79,7 @@
+ #include <asm/irq.h>
+
+ #include "fbcon.h"
++#include "../console/fbcondecor.h"
+
+ #ifdef FBCONDEBUG
+ # define DPRINTK(fmt, args...) printk(KERN_DEBUG "%s: " fmt, __func__ , ## args)
+@@ -94,7 +95,7 @@ enum {
+
+ static struct display fb_display[MAX_NR_CONSOLES];
+
+-static signed char con2fb_map[MAX_NR_CONSOLES];
++signed char con2fb_map[MAX_NR_CONSOLES];
+ static signed char con2fb_map_boot[MAX_NR_CONSOLES];
+
+ static int logo_lines;
+@@ -286,7 +287,7 @@ static inline int fbcon_is_inactive(struct vc_data *vc, struct fb_info *info)
+ !vt_force_oops_output(vc);
+ }
+
+-static int get_color(struct vc_data *vc, struct fb_info *info,
++int get_color(struct vc_data *vc, struct fb_info *info,
+ u16 c, int is_fg)
+ {
+ int depth = fb_get_color_depth(&info->var, &info->fix);
+@@ -551,6 +552,9 @@ static int do_fbcon_takeover(int show_logo)
+ info_idx = -1;
+ } else {
+ fbcon_has_console_bind = 1;
++#ifdef CONFIG_FB_CON_DECOR
++ fbcon_decor_init();
++#endif
+ }
+
+ return err;
+@@ -1007,6 +1011,12 @@ static const char *fbcon_startup(void)
+ rows = FBCON_SWAP(ops->rotate, info->var.yres, info->var.xres);
+ cols /= vc->vc_font.width;
+ rows /= vc->vc_font.height;
++
++ if (fbcon_decor_active(info, vc)) {
++ cols = vc->vc_decor.twidth / vc->vc_font.width;
++ rows = vc->vc_decor.theight / vc->vc_font.height;
++ }
++
+ vc_resize(vc, cols, rows);
+
+ DPRINTK("mode: %s\n", info->fix.id);
+@@ -1036,7 +1046,7 @@ static void fbcon_init(struct vc_data *vc, int init)
+ cap = info->flags;
+
+ if (vc != svc || logo_shown == FBCON_LOGO_DONTSHOW ||
+- (info->fix.type == FB_TYPE_TEXT))
++ (info->fix.type == FB_TYPE_TEXT) || fbcon_decor_active(info, vc))
+ logo = 0;
+
+ if (var_to_display(p, &info->var, info))
+@@ -1260,6 +1270,11 @@ static void fbcon_clear(struct vc_data *vc, int sy, int sx, int height,
+ fbcon_clear_margins(vc, 0);
+ }
+
++ if (fbcon_decor_active(info, vc)) {
++ fbcon_decor_clear(vc, info, sy, sx, height, width);
++ return;
++ }
++
+ /* Split blits that cross physical y_wrap boundary */
+
+ y_break = p->vrows - p->yscroll;
+@@ -1279,10 +1294,15 @@ static void fbcon_putcs(struct vc_data *vc, const unsigned short *s,
+ struct display *p = &fb_display[vc->vc_num];
+ struct fbcon_ops *ops = info->fbcon_par;
+
+- if (!fbcon_is_inactive(vc, info))
+- ops->putcs(vc, info, s, count, real_y(p, ypos), xpos,
+- get_color(vc, info, scr_readw(s), 1),
+- get_color(vc, info, scr_readw(s), 0));
++ if (!fbcon_is_inactive(vc, info)) {
++
++ if (fbcon_decor_active(info, vc))
++ fbcon_decor_putcs(vc, info, s, count, ypos, xpos);
++ else
++ ops->putcs(vc, info, s, count, real_y(p, ypos), xpos,
++ get_color(vc, info, scr_readw(s), 1),
++ get_color(vc, info, scr_readw(s), 0));
++ }
+ }
+
+ static void fbcon_putc(struct vc_data *vc, int c, int ypos, int xpos)
+@@ -1298,8 +1318,13 @@ static void fbcon_clear_margins(struct vc_data *vc, int bottom_only)
+ struct fb_info *info = registered_fb[con2fb_map[vc->vc_num]];
+ struct fbcon_ops *ops = info->fbcon_par;
+
+- if (!fbcon_is_inactive(vc, info))
+- ops->clear_margins(vc, info, bottom_only);
++ if (!fbcon_is_inactive(vc, info)) {
++ if (fbcon_decor_active(info, vc)) {
++ fbcon_decor_clear_margins(vc, info, bottom_only);
++ } else {
++ ops->clear_margins(vc, info, bottom_only);
++ }
++ }
+ }
+
+ static void fbcon_cursor(struct vc_data *vc, int mode)
+@@ -1819,7 +1844,7 @@ static int fbcon_scroll(struct vc_data *vc, int t, int b, int dir,
+ count = vc->vc_rows;
+ if (softback_top)
+ fbcon_softback_note(vc, t, count);
+- if (logo_shown >= 0)
++ if (logo_shown >= 0 || fbcon_decor_active(info, vc))
+ goto redraw_up;
+ switch (p->scrollmode) {
+ case SCROLL_MOVE:
+@@ -1912,6 +1937,8 @@ static int fbcon_scroll(struct vc_data *vc, int t, int b, int dir,
+ count = vc->vc_rows;
+ if (logo_shown >= 0)
+ goto redraw_down;
++ if (fbcon_decor_active(info, vc))
++ goto redraw_down;
+ switch (p->scrollmode) {
+ case SCROLL_MOVE:
+ fbcon_redraw_blit(vc, info, p, b - 1, b - t - count,
+@@ -2060,6 +2087,13 @@ static void fbcon_bmove_rec(struct vc_data *vc, struct display *p, int sy, int s
+ }
+ return;
+ }
++
++ if (fbcon_decor_active(info, vc) && sy == dy && height == 1) {
++ /* must use slower redraw bmove to keep background pic intact */
++ fbcon_decor_bmove_redraw(vc, info, sy, sx, dx, width);
++ return;
++ }
++
+ ops->bmove(vc, info, real_y(p, sy), sx, real_y(p, dy), dx,
+ height, width);
+ }
+@@ -2130,8 +2164,8 @@ static int fbcon_resize(struct vc_data *vc, unsigned int width,
+ var.yres = virt_h * virt_fh;
+ x_diff = info->var.xres - var.xres;
+ y_diff = info->var.yres - var.yres;
+- if (x_diff < 0 || x_diff > virt_fw ||
+- y_diff < 0 || y_diff > virt_fh) {
++ if ((x_diff < 0 || x_diff > virt_fw ||
++ y_diff < 0 || y_diff > virt_fh) && !vc->vc_decor.state) {
+ const struct fb_videomode *mode;
+
+ DPRINTK("attempting resize %ix%i\n", var.xres, var.yres);
+@@ -2167,6 +2201,21 @@ static int fbcon_switch(struct vc_data *vc)
+
+ info = registered_fb[con2fb_map[vc->vc_num]];
+ ops = info->fbcon_par;
++ prev_console = ops->currcon;
++ if (prev_console != -1)
++ old_info = registered_fb[con2fb_map[prev_console]];
++
++#ifdef CONFIG_FB_CON_DECOR
++ if (!fbcon_decor_active_vc(vc) && info->fix.visual == FB_VISUAL_DIRECTCOLOR) {
++ struct vc_data *vc_curr = vc_cons[prev_console].d;
++ if (vc_curr && fbcon_decor_active_vc(vc_curr)) {
++ /* Clear the screen to avoid displaying funky colors during
++ * palette updates. */
++ memset((u8*)info->screen_base + info->fix.line_length * info->var.yoffset,
++ 0, info->var.yres * info->fix.line_length);
++ }
++ }
++#endif
+
+ if (softback_top) {
+ if (softback_lines)
+@@ -2185,9 +2234,6 @@ static int fbcon_switch(struct vc_data *vc)
+ logo_shown = FBCON_LOGO_CANSHOW;
+ }
+
+- prev_console = ops->currcon;
+- if (prev_console != -1)
+- old_info = registered_fb[con2fb_map[prev_console]];
+ /*
+ * FIXME: If we have multiple fbdev's loaded, we need to
+ * update all info->currcon. Perhaps, we can place this
+@@ -2231,6 +2277,18 @@ static int fbcon_switch(struct vc_data *vc)
+ fbcon_del_cursor_timer(old_info);
+ }
+
++ if (fbcon_decor_active_vc(vc)) {
++ struct vc_data *vc_curr = vc_cons[prev_console].d;
++
++ if (!vc_curr->vc_decor.theme ||
++ strcmp(vc->vc_decor.theme, vc_curr->vc_decor.theme) ||
++ (fbcon_decor_active_nores(info, vc_curr) &&
++ !fbcon_decor_active(info, vc_curr))) {
++ fbcon_decor_disable(vc, 0);
++ fbcon_decor_call_helper("modechange", vc->vc_num);
++ }
++ }
++
+ if (fbcon_is_inactive(vc, info) ||
+ ops->blank_state != FB_BLANK_UNBLANK)
+ fbcon_del_cursor_timer(info);
+@@ -2339,15 +2397,20 @@ static int fbcon_blank(struct vc_data *vc, int blank, int mode_switch)
+ }
+ }
+
+- if (!fbcon_is_inactive(vc, info)) {
++ if (!fbcon_is_inactive(vc, info)) {
+ if (ops->blank_state != blank) {
+ ops->blank_state = blank;
+ fbcon_cursor(vc, blank ? CM_ERASE : CM_DRAW);
+ ops->cursor_flash = (!blank);
+
+- if (!(info->flags & FBINFO_MISC_USEREVENT))
+- if (fb_blank(info, blank))
+- fbcon_generic_blank(vc, info, blank);
++ if (!(info->flags & FBINFO_MISC_USEREVENT)) {
++ if (fb_blank(info, blank)) {
++ if (fbcon_decor_active(info, vc))
++ fbcon_decor_blank(vc, info, blank);
++ else
++ fbcon_generic_blank(vc, info, blank);
++ }
++ }
+ }
+
+ if (!blank)
+@@ -2522,13 +2585,22 @@ static int fbcon_do_set_font(struct vc_data *vc, int w, int h,
+ }
+
+ if (resize) {
++ /* reset wrap/pan */
+ int cols, rows;
+
+ cols = FBCON_SWAP(ops->rotate, info->var.xres, info->var.yres);
+ rows = FBCON_SWAP(ops->rotate, info->var.yres, info->var.xres);
++
++ if (fbcon_decor_active(info, vc)) {
++ info->var.xoffset = info->var.yoffset = p->yscroll = 0;
++ cols = vc->vc_decor.twidth;
++ rows = vc->vc_decor.theight;
++ }
+ cols /= w;
+ rows /= h;
++
+ vc_resize(vc, cols, rows);
++
+ if (CON_IS_VISIBLE(vc) && softback_buf)
+ fbcon_update_softback(vc);
+ } else if (CON_IS_VISIBLE(vc)
+@@ -2657,7 +2729,11 @@ static int fbcon_set_palette(struct vc_data *vc, unsigned char *table)
+ int i, j, k, depth;
+ u8 val;
+
+- if (fbcon_is_inactive(vc, info))
++ if (fbcon_is_inactive(vc, info)
++#ifdef CONFIG_FB_CON_DECOR
++ || vc->vc_num != fg_console
++#endif
++ )
+ return -EINVAL;
+
+ if (!CON_IS_VISIBLE(vc))
+@@ -2683,14 +2759,56 @@ static int fbcon_set_palette(struct vc_data *vc, unsigned char *table)
+ } else
+ fb_copy_cmap(fb_default_cmap(1 << depth), &palette_cmap);
+
+- return fb_set_cmap(&palette_cmap, info);
++ if (fbcon_decor_active(info, vc_cons[fg_console].d) &&
++ info->fix.visual == FB_VISUAL_DIRECTCOLOR) {
++
++ u16 *red, *green, *blue;
++ int minlen = min(min(info->var.red.length, info->var.green.length),
++ info->var.blue.length);
++ int h;
++
++ struct fb_cmap cmap = {
++ .start = 0,
++ .len = (1 << minlen),
++ .red = NULL,
++ .green = NULL,
++ .blue = NULL,
++ .transp = NULL
++ };
++
++ red = kmalloc(256 * sizeof(u16) * 3, GFP_KERNEL);
++
++ if (!red)
++ goto out;
++
++ green = red + 256;
++ blue = green + 256;
++ cmap.red = red;
++ cmap.green = green;
++ cmap.blue = blue;
++
++ for (i = 0; i < cmap.len; i++) {
++ red[i] = green[i] = blue[i] = (0xffff * i)/(cmap.len-1);
++ }
++
++ h = fb_set_cmap(&cmap, info);
++ fbcon_decor_fix_pseudo_pal(info, vc_cons[fg_console].d);
++ kfree(red);
++
++ return h;
++
++ } else if (fbcon_decor_active(info, vc_cons[fg_console].d) &&
++ info->var.bits_per_pixel == 8 && info->bgdecor.cmap.red != NULL)
++ fb_set_cmap(&info->bgdecor.cmap, info);
++
++out: return fb_set_cmap(&palette_cmap, info);
+ }
+
+ static u16 *fbcon_screen_pos(struct vc_data *vc, int offset)
+ {
+ unsigned long p;
+ int line;
+-
++
+ if (vc->vc_num != fg_console || !softback_lines)
+ return (u16 *) (vc->vc_origin + offset);
+ line = offset / vc->vc_size_row;
+@@ -2909,7 +3027,14 @@ static void fbcon_modechanged(struct fb_info *info)
+ rows = FBCON_SWAP(ops->rotate, info->var.yres, info->var.xres);
+ cols /= vc->vc_font.width;
+ rows /= vc->vc_font.height;
+- vc_resize(vc, cols, rows);
++
++ if (!fbcon_decor_active_nores(info, vc)) {
++ vc_resize(vc, cols, rows);
++ } else {
++ fbcon_decor_disable(vc, 0);
++ fbcon_decor_call_helper("modechange", vc->vc_num);
++ }
++
+ updatescrollmode(p, info, vc);
+ scrollback_max = 0;
+ scrollback_current = 0;
+@@ -2954,7 +3079,9 @@ static void fbcon_set_all_vcs(struct fb_info *info)
+ rows = FBCON_SWAP(ops->rotate, info->var.yres, info->var.xres);
+ cols /= vc->vc_font.width;
+ rows /= vc->vc_font.height;
+- vc_resize(vc, cols, rows);
++ if (!fbcon_decor_active_nores(info, vc)) {
++ vc_resize(vc, cols, rows);
++ }
+ }
+
+ if (fg != -1)
+@@ -3596,6 +3723,7 @@ static void fbcon_exit(void)
+ }
+ }
+
++ fbcon_decor_exit();
+ fbcon_has_exited = 1;
+ }
+
+diff --git a/drivers/video/console/fbcondecor.c b/drivers/video/console/fbcondecor.c
+new file mode 100644
+index 0000000..babc8c5
+--- /dev/null
++++ b/drivers/video/console/fbcondecor.c
+@@ -0,0 +1,555 @@
++/*
++ * linux/drivers/video/console/fbcondecor.c -- Framebuffer console decorations
++ *
++ * Copyright (C) 2004-2009 Michal Januszewski <michalj+fbcondecor@gmail.com>
++ *
++ * Code based upon "Bootsplash" (C) 2001-2003
++ * Volker Poplawski <volker@poplawski.de>,
++ * Stefan Reinauer <stepan@suse.de>,
++ * Steffen Winterfeldt <snwint@suse.de>,
++ * Michael Schroeder <mls@suse.de>,
++ * Ken Wimer <wimer@suse.de>.
++ *
++ * Compat ioctl support by Thorsten Klein <TK@Thorsten-Klein.de>.
++ *
++ * This file is subject to the terms and conditions of the GNU General Public
++ * License. See the file COPYING in the main directory of this archive for
++ * more details.
++ *
++ */
++#include <linux/module.h>
++#include <linux/kernel.h>
++#include <linux/string.h>
++#include <linux/types.h>
++#include <linux/fb.h>
++#include <linux/vt_kern.h>
++#include <linux/vmalloc.h>
++#include <linux/unistd.h>
++#include <linux/syscalls.h>
++#include <linux/init.h>
++#include <linux/proc_fs.h>
++#include <linux/workqueue.h>
++#include <linux/kmod.h>
++#include <linux/miscdevice.h>
++#include <linux/device.h>
++#include <linux/fs.h>
++#include <linux/compat.h>
++#include <linux/console.h>
++
++#include <asm/uaccess.h>
++#include <asm/irq.h>
++
++#include "fbcon.h"
++#include "fbcondecor.h"
++
++extern signed char con2fb_map[];
++static int fbcon_decor_enable(struct vc_data *vc);
++char fbcon_decor_path[KMOD_PATH_LEN] = "/sbin/fbcondecor_helper";
++static int initialized = 0;
++
++int fbcon_decor_call_helper(char* cmd, unsigned short vc)
++{
++ char *envp[] = {
++ "HOME=/",
++ "PATH=/sbin:/bin",
++ NULL
++ };
++
++ char tfb[5];
++ char tcons[5];
++ unsigned char fb = (int) con2fb_map[vc];
++
++ char *argv[] = {
++ fbcon_decor_path,
++ "2",
++ cmd,
++ tcons,
++ tfb,
++ vc_cons[vc].d->vc_decor.theme,
++ NULL
++ };
++
++ snprintf(tfb,5,"%d",fb);
++ snprintf(tcons,5,"%d",vc);
++
++ return call_usermodehelper(fbcon_decor_path, argv, envp, UMH_WAIT_EXEC);
++}
++
++/* Disables fbcondecor on a virtual console; called with console sem held. */
++int fbcon_decor_disable(struct vc_data *vc, unsigned char redraw)
++{
++ struct fb_info* info;
++
++ if (!vc->vc_decor.state)
++ return -EINVAL;
++
++ info = registered_fb[(int) con2fb_map[vc->vc_num]];
++
++ if (info == NULL)
++ return -EINVAL;
++
++ vc->vc_decor.state = 0;
++ vc_resize(vc, info->var.xres / vc->vc_font.width,
++ info->var.yres / vc->vc_font.height);
++
++ if (fg_console == vc->vc_num && redraw) {
++ redraw_screen(vc, 0);
++ update_region(vc, vc->vc_origin +
++ vc->vc_size_row * vc->vc_top,
++ vc->vc_size_row * (vc->vc_bottom - vc->vc_top) / 2);
++ }
++
++ printk(KERN_INFO "fbcondecor: switched decor state to 'off' on console %d\n",
++ vc->vc_num);
++
++ return 0;
++}
++
++/* Enables fbcondecor on a virtual console; called with console sem held. */
++static int fbcon_decor_enable(struct vc_data *vc)
++{
++ struct fb_info* info;
++
++ info = registered_fb[(int) con2fb_map[vc->vc_num]];
++
++ if (vc->vc_decor.twidth == 0 || vc->vc_decor.theight == 0 ||
++ info == NULL || vc->vc_decor.state || (!info->bgdecor.data &&
++ vc->vc_num == fg_console))
++ return -EINVAL;
++
++ vc->vc_decor.state = 1;
++ vc_resize(vc, vc->vc_decor.twidth / vc->vc_font.width,
++ vc->vc_decor.theight / vc->vc_font.height);
++
++ if (fg_console == vc->vc_num) {
++ redraw_screen(vc, 0);
++ update_region(vc, vc->vc_origin +
++ vc->vc_size_row * vc->vc_top,
++ vc->vc_size_row * (vc->vc_bottom - vc->vc_top) / 2);
++ fbcon_decor_clear_margins(vc, info, 0);
++ }
++
++ printk(KERN_INFO "fbcondecor: switched decor state to 'on' on console %d\n",
++ vc->vc_num);
++
++ return 0;
++}
++
++static inline int fbcon_decor_ioctl_dosetstate(struct vc_data *vc, unsigned int state, unsigned char origin)
++{
++ int ret;
++
++// if (origin == FBCON_DECOR_IO_ORIG_USER)
++ console_lock();
++ if (!state)
++ ret = fbcon_decor_disable(vc, 1);
++ else
++ ret = fbcon_decor_enable(vc);
++// if (origin == FBCON_DECOR_IO_ORIG_USER)
++ console_unlock();
++
++ return ret;
++}
++
++static inline void fbcon_decor_ioctl_dogetstate(struct vc_data *vc, unsigned int *state)
++{
++ *state = vc->vc_decor.state;
++}
++
++static int fbcon_decor_ioctl_dosetcfg(struct vc_data *vc, struct vc_decor *cfg, unsigned char origin)
++{
++ struct fb_info *info;
++ int len;
++ char *tmp;
++
++ info = registered_fb[(int) con2fb_map[vc->vc_num]];
++
++ if (info == NULL || !cfg->twidth || !cfg->theight ||
++ cfg->tx + cfg->twidth > info->var.xres ||
++ cfg->ty + cfg->theight > info->var.yres)
++ return -EINVAL;
++
++ len = strlen_user(cfg->theme);
++ if (!len || len > FBCON_DECOR_THEME_LEN)
++ return -EINVAL;
++ tmp = kmalloc(len, GFP_KERNEL);
++ if (!tmp)
++ return -ENOMEM;
++ if (copy_from_user(tmp, (void __user *)cfg->theme, len))
++ return -EFAULT;
++ cfg->theme = tmp;
++ cfg->state = 0;
++
++ /* If this ioctl is a response to a request from kernel, the console sem
++ * is already held; we also don't need to disable decor because either the
++ * new config and background picture will be successfully loaded, and the
++ * decor will stay on, or in case of a failure it'll be turned off in fbcon. */
++// if (origin == FBCON_DECOR_IO_ORIG_USER) {
++ console_lock();
++ if (vc->vc_decor.state)
++ fbcon_decor_disable(vc, 1);
++// }
++
++ if (vc->vc_decor.theme)
++ kfree(vc->vc_decor.theme);
++
++ vc->vc_decor = *cfg;
++
++// if (origin == FBCON_DECOR_IO_ORIG_USER)
++ console_unlock();
++
++ printk(KERN_INFO "fbcondecor: console %d using theme '%s'\n",
++ vc->vc_num, vc->vc_decor.theme);
++ return 0;
++}
++
++static int fbcon_decor_ioctl_dogetcfg(struct vc_data *vc, struct vc_decor *decor)
++{
++ char __user *tmp;
++
++ tmp = decor->theme;
++ *decor = vc->vc_decor;
++ decor->theme = tmp;
++
++ if (vc->vc_decor.theme) {
++ if (copy_to_user(tmp, vc->vc_decor.theme, strlen(vc->vc_decor.theme) + 1))
++ return -EFAULT;
++ } else
++ if (put_user(0, tmp))
++ return -EFAULT;
++
++ return 0;
++}
++
++static int fbcon_decor_ioctl_dosetpic(struct vc_data *vc, struct fb_image *img, unsigned char origin)
++{
++ struct fb_info *info;
++ int len;
++ u8 *tmp;
++
++ if (vc->vc_num != fg_console)
++ return -EINVAL;
++
++ info = registered_fb[(int) con2fb_map[vc->vc_num]];
++
++ if (info == NULL)
++ return -EINVAL;
++
++ if (img->width != info->var.xres || img->height != info->var.yres) {
++ printk(KERN_ERR "fbcondecor: picture dimensions mismatch\n");
++ printk(KERN_ERR "%dx%d vs %dx%d\n", img->width, img->height, info->var.xres, info->var.yres);
++ return -EINVAL;
++ }
++
++ if (img->depth != info->var.bits_per_pixel) {
++ printk(KERN_ERR "fbcondecor: picture depth mismatch\n");
++ return -EINVAL;
++ }
++
++ if (img->depth == 8) {
++ if (!img->cmap.len || !img->cmap.red || !img->cmap.green ||
++ !img->cmap.blue)
++ return -EINVAL;
++
++ tmp = vmalloc(img->cmap.len * 3 * 2);
++ if (!tmp)
++ return -ENOMEM;
++
++ if (copy_from_user(tmp,
++ (void __user*)img->cmap.red, (img->cmap.len << 1)) ||
++ copy_from_user(tmp + (img->cmap.len << 1),
++ (void __user*)img->cmap.green, (img->cmap.len << 1)) ||
++ copy_from_user(tmp + (img->cmap.len << 2),
++ (void __user*)img->cmap.blue, (img->cmap.len << 1))) {
++ vfree(tmp);
++ return -EFAULT;
++ }
++
++ img->cmap.transp = NULL;
++ img->cmap.red = (u16*)tmp;
++ img->cmap.green = img->cmap.red + img->cmap.len;
++ img->cmap.blue = img->cmap.green + img->cmap.len;
++ } else {
++ img->cmap.red = NULL;
++ }
++
++ len = ((img->depth + 7) >> 3) * img->width * img->height;
++
++ /*
++ * Allocate an additional byte so that we never go outside of the
++ * buffer boundaries in the rendering functions in a 24 bpp mode.
++ */
++ tmp = vmalloc(len + 1);
++
++ if (!tmp)
++ goto out;
++
++ if (copy_from_user(tmp, (void __user*)img->data, len))
++ goto out;
++
++ img->data = tmp;
++
++ /* If this ioctl is a response to a request from kernel, the console sem
++ * is already held. */
++// if (origin == FBCON_DECOR_IO_ORIG_USER)
++ console_lock();
++
++ if (info->bgdecor.data)
++ vfree((u8*)info->bgdecor.data);
++ if (info->bgdecor.cmap.red)
++ vfree(info->bgdecor.cmap.red);
++
++ info->bgdecor = *img;
++
++ if (fbcon_decor_active_vc(vc) && fg_console == vc->vc_num) {
++ redraw_screen(vc, 0);
++ update_region(vc, vc->vc_origin +
++ vc->vc_size_row * vc->vc_top,
++ vc->vc_size_row * (vc->vc_bottom - vc->vc_top) / 2);
++ fbcon_decor_clear_margins(vc, info, 0);
++ }
++
++// if (origin == FBCON_DECOR_IO_ORIG_USER)
++ console_unlock();
++
++ return 0;
++
++out: if (img->cmap.red)
++ vfree(img->cmap.red);
++
++ if (tmp)
++ vfree(tmp);
++ return -ENOMEM;
++}
++
++static long fbcon_decor_ioctl(struct file *filp, u_int cmd, u_long arg)
++{
++ struct fbcon_decor_iowrapper __user *wrapper = (void __user*) arg;
++ struct vc_data *vc = NULL;
++ unsigned short vc_num = 0;
++ unsigned char origin = 0;
++ void __user *data = NULL;
++
++ if (!access_ok(VERIFY_READ, wrapper,
++ sizeof(struct fbcon_decor_iowrapper)))
++ return -EFAULT;
++
++ __get_user(vc_num, &wrapper->vc);
++ __get_user(origin, &wrapper->origin);
++ __get_user(data, &wrapper->data);
++
++ if (!vc_cons_allocated(vc_num))
++ return -EINVAL;
++
++ vc = vc_cons[vc_num].d;
++
++ switch (cmd) {
++ case FBIOCONDECOR_SETPIC:
++ {
++ struct fb_image img;
++ if (copy_from_user(&img, (struct fb_image __user *)data, sizeof(struct fb_image)))
++ return -EFAULT;
++
++ return fbcon_decor_ioctl_dosetpic(vc, &img, origin);
++ }
++ case FBIOCONDECOR_SETCFG:
++ {
++ struct vc_decor cfg;
++ if (copy_from_user(&cfg, (struct vc_decor __user *)data, sizeof(struct vc_decor)))
++ return -EFAULT;
++
++ return fbcon_decor_ioctl_dosetcfg(vc, &cfg, origin);
++ }
++ case FBIOCONDECOR_GETCFG:
++ {
++ int rval;
++ struct vc_decor cfg;
++
++ if (copy_from_user(&cfg, (struct vc_decor __user *)data, sizeof(struct vc_decor)))
++ return -EFAULT;
++
++ rval = fbcon_decor_ioctl_dogetcfg(vc, &cfg);
++
++ if (copy_to_user(data, &cfg, sizeof(struct vc_decor)))
++ return -EFAULT;
++ return rval;
++ }
++ case FBIOCONDECOR_SETSTATE:
++ {
++ unsigned int state = 0;
++ if (get_user(state, (unsigned int __user *)data))
++ return -EFAULT;
++ return fbcon_decor_ioctl_dosetstate(vc, state, origin);
++ }
++ case FBIOCONDECOR_GETSTATE:
++ {
++ unsigned int state = 0;
++ fbcon_decor_ioctl_dogetstate(vc, &state);
++ return put_user(state, (unsigned int __user *)data);
++ }
++
++ default:
++ return -ENOIOCTLCMD;
++ }
++}
++
++#ifdef CONFIG_COMPAT
++
++static long fbcon_decor_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) {
++
++ struct fbcon_decor_iowrapper32 __user *wrapper = (void __user *)arg;
++ struct vc_data *vc = NULL;
++ unsigned short vc_num = 0;
++ unsigned char origin = 0;
++ compat_uptr_t data_compat = 0;
++ void __user *data = NULL;
++
++ if (!access_ok(VERIFY_READ, wrapper,
++ sizeof(struct fbcon_decor_iowrapper32)))
++ return -EFAULT;
++
++ __get_user(vc_num, &wrapper->vc);
++ __get_user(origin, &wrapper->origin);
++ __get_user(data_compat, &wrapper->data);
++ data = compat_ptr(data_compat);
++
++ if (!vc_cons_allocated(vc_num))
++ return -EINVAL;
++
++ vc = vc_cons[vc_num].d;
++
++ switch (cmd) {
++ case FBIOCONDECOR_SETPIC32:
++ {
++ struct fb_image32 img_compat;
++ struct fb_image img;
++
++ if (copy_from_user(&img_compat, (struct fb_image32 __user *)data, sizeof(struct fb_image32)))
++ return -EFAULT;
++
++ fb_image_from_compat(img, img_compat);
++
++ return fbcon_decor_ioctl_dosetpic(vc, &img, origin);
++ }
++
++ case FBIOCONDECOR_SETCFG32:
++ {
++ struct vc_decor32 cfg_compat;
++ struct vc_decor cfg;
++
++ if (copy_from_user(&cfg_compat, (struct vc_decor32 __user *)data, sizeof(struct vc_decor32)))
++ return -EFAULT;
++
++ vc_decor_from_compat(cfg, cfg_compat);
++
++ return fbcon_decor_ioctl_dosetcfg(vc, &cfg, origin);
++ }
++
++ case FBIOCONDECOR_GETCFG32:
++ {
++ int rval;
++ struct vc_decor32 cfg_compat;
++ struct vc_decor cfg;
++
++ if (copy_from_user(&cfg_compat, (struct vc_decor32 __user *)data, sizeof(struct vc_decor32)))
++ return -EFAULT;
++ cfg.theme = compat_ptr(cfg_compat.theme);
++
++ rval = fbcon_decor_ioctl_dogetcfg(vc, &cfg);
++
++ vc_decor_to_compat(cfg_compat, cfg);
++
++ if (copy_to_user((struct vc_decor32 __user *)data, &cfg_compat, sizeof(struct vc_decor32)))
++ return -EFAULT;
++ return rval;
++ }
++
++ case FBIOCONDECOR_SETSTATE32:
++ {
++ compat_uint_t state_compat = 0;
++ unsigned int state = 0;
++
++ if (get_user(state_compat, (compat_uint_t __user *)data))
++ return -EFAULT;
++
++ state = (unsigned int)state_compat;
++
++ return fbcon_decor_ioctl_dosetstate(vc, state, origin);
++ }
++
++ case FBIOCONDECOR_GETSTATE32:
++ {
++ compat_uint_t state_compat = 0;
++ unsigned int state = 0;
++
++ fbcon_decor_ioctl_dogetstate(vc, &state);
++ state_compat = (compat_uint_t)state;
++
++ return put_user(state_compat, (compat_uint_t __user *)data);
++ }
++
++ default:
++ return -ENOIOCTLCMD;
++ }
++}
++#else
++ #define fbcon_decor_compat_ioctl NULL
++#endif
++
++static struct file_operations fbcon_decor_ops = {
++ .owner = THIS_MODULE,
++ .unlocked_ioctl = fbcon_decor_ioctl,
++ .compat_ioctl = fbcon_decor_compat_ioctl
++};
++
++static struct miscdevice fbcon_decor_dev = {
++ .minor = MISC_DYNAMIC_MINOR,
++ .name = "fbcondecor",
++ .fops = &fbcon_decor_ops
++};
++
++void fbcon_decor_reset(void)
++{
++ int i;
++
++ for (i = 0; i < num_registered_fb; i++) {
++ registered_fb[i]->bgdecor.data = NULL;
++ registered_fb[i]->bgdecor.cmap.red = NULL;
++ }
++
++ for (i = 0; i < MAX_NR_CONSOLES && vc_cons[i].d; i++) {
++ vc_cons[i].d->vc_decor.state = vc_cons[i].d->vc_decor.twidth =
++ vc_cons[i].d->vc_decor.theight = 0;
++ vc_cons[i].d->vc_decor.theme = NULL;
++ }
++
++ return;
++}
++
++int fbcon_decor_init(void)
++{
++ int i;
++
++ fbcon_decor_reset();
++
++ if (initialized)
++ return 0;
++
++ i = misc_register(&fbcon_decor_dev);
++ if (i) {
++ printk(KERN_ERR "fbcondecor: failed to register device\n");
++ return i;
++ }
++
++ fbcon_decor_call_helper("init", 0);
++ initialized = 1;
++ return 0;
++}
++
++int fbcon_decor_exit(void)
++{
++ fbcon_decor_reset();
++ return 0;
++}
++
++EXPORT_SYMBOL(fbcon_decor_path);
+diff --git a/drivers/video/console/fbcondecor.h b/drivers/video/console/fbcondecor.h
+new file mode 100644
+index 0000000..3b3724b
+--- /dev/null
++++ b/drivers/video/console/fbcondecor.h
+@@ -0,0 +1,78 @@
++/*
++ * linux/drivers/video/console/fbcondecor.h -- Framebuffer Console Decoration headers
++ *
++ * Copyright (C) 2004 Michal Januszewski <michalj+fbcondecor@gmail.com>
++ *
++ */
++
++#ifndef __FBCON_DECOR_H
++#define __FBCON_DECOR_H
++
++#ifndef _LINUX_FB_H
++#include <linux/fb.h>
++#endif
++
++/* This is needed for vc_cons in fbcmap.c */
++#include <linux/vt_kern.h>
++
++struct fb_cursor;
++struct fb_info;
++struct vc_data;
++
++#ifdef CONFIG_FB_CON_DECOR
++/* fbcondecor.c */
++int fbcon_decor_init(void);
++int fbcon_decor_exit(void);
++int fbcon_decor_call_helper(char* cmd, unsigned short cons);
++int fbcon_decor_disable(struct vc_data *vc, unsigned char redraw);
++
++/* cfbcondecor.c */
++void fbcon_decor_putcs(struct vc_data *vc, struct fb_info *info, const unsigned short *s, int count, int yy, int xx);
++void fbcon_decor_cursor(struct fb_info *info, struct fb_cursor *cursor);
++void fbcon_decor_clear(struct vc_data *vc, struct fb_info *info, int sy, int sx, int height, int width);
++void fbcon_decor_clear_margins(struct vc_data *vc, struct fb_info *info, int bottom_only);
++void fbcon_decor_blank(struct vc_data *vc, struct fb_info *info, int blank);
++void fbcon_decor_bmove_redraw(struct vc_data *vc, struct fb_info *info, int y, int sx, int dx, int width);
++void fbcon_decor_copy(u8 *dst, u8 *src, int height, int width, int linebytes, int srclinebytes, int bpp);
++void fbcon_decor_fix_pseudo_pal(struct fb_info *info, struct vc_data *vc);
++
++/* vt.c */
++void acquire_console_sem(void);
++void release_console_sem(void);
++void do_unblank_screen(int entering_gfx);
++
++/* struct vc_data *y */
++#define fbcon_decor_active_vc(y) (y->vc_decor.state && y->vc_decor.theme)
++
++/* struct fb_info *x, struct vc_data *y */
++#define fbcon_decor_active_nores(x,y) (x->bgdecor.data && fbcon_decor_active_vc(y))
++
++/* struct fb_info *x, struct vc_data *y */
++#define fbcon_decor_active(x,y) (fbcon_decor_active_nores(x,y) && \
++ x->bgdecor.width == x->var.xres && \
++ x->bgdecor.height == x->var.yres && \
++ x->bgdecor.depth == x->var.bits_per_pixel)
++
++
++#else /* CONFIG_FB_CON_DECOR */
++
++static inline void fbcon_decor_putcs(struct vc_data *vc, struct fb_info *info, const unsigned short *s, int count, int yy, int xx) {}
++static inline void fbcon_decor_putc(struct vc_data *vc, struct fb_info *info, int c, int ypos, int xpos) {}
++static inline void fbcon_decor_cursor(struct fb_info *info, struct fb_cursor *cursor) {}
++static inline void fbcon_decor_clear(struct vc_data *vc, struct fb_info *info, int sy, int sx, int height, int width) {}
++static inline void fbcon_decor_clear_margins(struct vc_data *vc, struct fb_info *info, int bottom_only) {}
++static inline void fbcon_decor_blank(struct vc_data *vc, struct fb_info *info, int blank) {}
++static inline void fbcon_decor_bmove_redraw(struct vc_data *vc, struct fb_info *info, int y, int sx, int dx, int width) {}
++static inline void fbcon_decor_fix_pseudo_pal(struct fb_info *info, struct vc_data *vc) {}
++static inline int fbcon_decor_call_helper(char* cmd, unsigned short cons) { return 0; }
++static inline int fbcon_decor_init(void) { return 0; }
++static inline int fbcon_decor_exit(void) { return 0; }
++static inline int fbcon_decor_disable(struct vc_data *vc, unsigned char redraw) { return 0; }
++
++#define fbcon_decor_active_vc(y) (0)
++#define fbcon_decor_active_nores(x,y) (0)
++#define fbcon_decor_active(x,y) (0)
++
++#endif /* CONFIG_FB_CON_DECOR */
++
++#endif /* __FBCON_DECOR_H */
+diff --git a/drivers/video/fbdev/Kconfig b/drivers/video/fbdev/Kconfig
+index e1f4727..2952e33 100644
+--- a/drivers/video/fbdev/Kconfig
++++ b/drivers/video/fbdev/Kconfig
+@@ -1204,7 +1204,6 @@ config FB_MATROX
+ select FB_CFB_FILLRECT
+ select FB_CFB_COPYAREA
+ select FB_CFB_IMAGEBLIT
+- select FB_TILEBLITTING
+ select FB_MACMODES if PPC_PMAC
+ ---help---
+ Say Y here if you have a Matrox Millennium, Matrox Millennium II,
+diff --git a/drivers/video/fbdev/core/fbcmap.c b/drivers/video/fbdev/core/fbcmap.c
+index f89245b..05e036c 100644
+--- a/drivers/video/fbdev/core/fbcmap.c
++++ b/drivers/video/fbdev/core/fbcmap.c
+@@ -17,6 +17,8 @@
+ #include <linux/slab.h>
+ #include <linux/uaccess.h>
+
++#include "../../console/fbcondecor.h"
++
+ static u16 red2[] __read_mostly = {
+ 0x0000, 0xaaaa
+ };
+@@ -249,14 +251,17 @@ int fb_set_cmap(struct fb_cmap *cmap, struct fb_info *info)
+ if (transp)
+ htransp = *transp++;
+ if (info->fbops->fb_setcolreg(start++,
+- hred, hgreen, hblue,
++ hred, hgreen, hblue,
+ htransp, info))
+ break;
+ }
+ }
+- if (rc == 0)
++ if (rc == 0) {
+ fb_copy_cmap(cmap, &info->cmap);
+-
++ if (fbcon_decor_active(info, vc_cons[fg_console].d) &&
++ info->fix.visual == FB_VISUAL_DIRECTCOLOR)
++ fbcon_decor_fix_pseudo_pal(info, vc_cons[fg_console].d);
++ }
+ return rc;
+ }
+
+diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
+index b6d5008..d6703f2 100644
+--- a/drivers/video/fbdev/core/fbmem.c
++++ b/drivers/video/fbdev/core/fbmem.c
+@@ -1250,15 +1250,6 @@ struct fb_fix_screeninfo32 {
+ u16 reserved[3];
+ };
+
+-struct fb_cmap32 {
+- u32 start;
+- u32 len;
+- compat_caddr_t red;
+- compat_caddr_t green;
+- compat_caddr_t blue;
+- compat_caddr_t transp;
+-};
+-
+ static int fb_getput_cmap(struct fb_info *info, unsigned int cmd,
+ unsigned long arg)
+ {
+diff --git a/include/linux/console_decor.h b/include/linux/console_decor.h
+new file mode 100644
+index 0000000..04b8d80
+--- /dev/null
++++ b/include/linux/console_decor.h
+@@ -0,0 +1,46 @@
++#ifndef _LINUX_CONSOLE_DECOR_H_
++#define _LINUX_CONSOLE_DECOR_H_ 1
++
++/* A structure used by the framebuffer console decorations (drivers/video/console/fbcondecor.c) */
++struct vc_decor {
++ __u8 bg_color; /* The color that is to be treated as transparent */
++ __u8 state; /* Current decor state: 0 = off, 1 = on */
++ __u16 tx, ty; /* Top left corner coordinates of the text field */
++ __u16 twidth, theight; /* Width and height of the text field */
++ char* theme;
++};
++
++#ifdef __KERNEL__
++#ifdef CONFIG_COMPAT
++#include <linux/compat.h>
++
++struct vc_decor32 {
++ __u8 bg_color; /* The color that is to be treated as transparent */
++ __u8 state; /* Current decor state: 0 = off, 1 = on */
++ __u16 tx, ty; /* Top left corner coordinates of the text field */
++ __u16 twidth, theight; /* Width and height of the text field */
++ compat_uptr_t theme;
++};
++
++#define vc_decor_from_compat(to, from) \
++ (to).bg_color = (from).bg_color; \
++ (to).state = (from).state; \
++ (to).tx = (from).tx; \
++ (to).ty = (from).ty; \
++ (to).twidth = (from).twidth; \
++ (to).theight = (from).theight; \
++ (to).theme = compat_ptr((from).theme)
++
++#define vc_decor_to_compat(to, from) \
++ (to).bg_color = (from).bg_color; \
++ (to).state = (from).state; \
++ (to).tx = (from).tx; \
++ (to).ty = (from).ty; \
++ (to).twidth = (from).twidth; \
++ (to).theight = (from).theight; \
++ (to).theme = ptr_to_compat((from).theme)
++
++#endif /* CONFIG_COMPAT */
++#endif /* __KERNEL__ */
++
++#endif
+diff --git a/include/linux/console_struct.h b/include/linux/console_struct.h
+index 7f0c329..98f5d60 100644
+--- a/include/linux/console_struct.h
++++ b/include/linux/console_struct.h
+@@ -19,6 +19,7 @@
+ struct vt_struct;
+
+ #define NPAR 16
++#include <linux/console_decor.h>
+
+ struct vc_data {
+ struct tty_port port; /* Upper level data */
+@@ -107,6 +108,8 @@ struct vc_data {
+ unsigned long vc_uni_pagedir;
+ unsigned long *vc_uni_pagedir_loc; /* [!] Location of uni_pagedir variable for this console */
+ bool vc_panic_force_write; /* when oops/panic this VC can accept forced output/blanking */
++
++ struct vc_decor vc_decor;
+ /* additional information is in vt_kern.h */
+ };
+
+diff --git a/include/linux/fb.h b/include/linux/fb.h
+index fe6ac95..1e36b03 100644
+--- a/include/linux/fb.h
++++ b/include/linux/fb.h
+@@ -219,6 +219,34 @@ struct fb_deferred_io {
+ };
+ #endif
+
++#ifdef __KERNEL__
++#ifdef CONFIG_COMPAT
++struct fb_image32 {
++ __u32 dx; /* Where to place image */
++ __u32 dy;
++ __u32 width; /* Size of image */
++ __u32 height;
++ __u32 fg_color; /* Only used when a mono bitmap */
++ __u32 bg_color;
++ __u8 depth; /* Depth of the image */
++ const compat_uptr_t data; /* Pointer to image data */
++ struct fb_cmap32 cmap; /* color map info */
++};
++
++#define fb_image_from_compat(to, from) \
++ (to).dx = (from).dx; \
++ (to).dy = (from).dy; \
++ (to).width = (from).width; \
++ (to).height = (from).height; \
++ (to).fg_color = (from).fg_color; \
++ (to).bg_color = (from).bg_color; \
++ (to).depth = (from).depth; \
++ (to).data = compat_ptr((from).data); \
++ fb_cmap_from_compat((to).cmap, (from).cmap)
++
++#endif /* CONFIG_COMPAT */
++#endif /* __KERNEL__ */
++
+ /*
+ * Frame buffer operations
+ *
+@@ -489,6 +517,9 @@ struct fb_info {
+ #define FBINFO_STATE_SUSPENDED 1
+ u32 state; /* Hardware state i.e suspend */
+ void *fbcon_par; /* fbcon use-only private area */
++
++ struct fb_image bgdecor;
++
+ /* From here on everything is device dependent */
+ void *par;
+ /* we need the PCI or similar aperture base/size not
+diff --git a/include/uapi/linux/fb.h b/include/uapi/linux/fb.h
+index fb795c3..dc77a03 100644
+--- a/include/uapi/linux/fb.h
++++ b/include/uapi/linux/fb.h
+@@ -8,6 +8,25 @@
+
+ #define FB_MAX 32 /* sufficient for now */
+
++struct fbcon_decor_iowrapper
++{
++ unsigned short vc; /* Virtual console */
++ unsigned char origin; /* Point of origin of the request */
++ void *data;
++};
++
++#ifdef __KERNEL__
++#ifdef CONFIG_COMPAT
++#include <linux/compat.h>
++struct fbcon_decor_iowrapper32
++{
++ unsigned short vc; /* Virtual console */
++ unsigned char origin; /* Point of origin of the request */
++ compat_uptr_t data;
++};
++#endif /* CONFIG_COMPAT */
++#endif /* __KERNEL__ */
++
+ /* ioctls
+ 0x46 is 'F' */
+ #define FBIOGET_VSCREENINFO 0x4600
+@@ -35,6 +54,25 @@
+ #define FBIOGET_DISPINFO 0x4618
+ #define FBIO_WAITFORVSYNC _IOW('F', 0x20, __u32)
+
++#define FBIOCONDECOR_SETCFG _IOWR('F', 0x19, struct fbcon_decor_iowrapper)
++#define FBIOCONDECOR_GETCFG _IOR('F', 0x1A, struct fbcon_decor_iowrapper)
++#define FBIOCONDECOR_SETSTATE _IOWR('F', 0x1B, struct fbcon_decor_iowrapper)
++#define FBIOCONDECOR_GETSTATE _IOR('F', 0x1C, struct fbcon_decor_iowrapper)
++#define FBIOCONDECOR_SETPIC _IOWR('F', 0x1D, struct fbcon_decor_iowrapper)
++#ifdef __KERNEL__
++#ifdef CONFIG_COMPAT
++#define FBIOCONDECOR_SETCFG32 _IOWR('F', 0x19, struct fbcon_decor_iowrapper32)
++#define FBIOCONDECOR_GETCFG32 _IOR('F', 0x1A, struct fbcon_decor_iowrapper32)
++#define FBIOCONDECOR_SETSTATE32 _IOWR('F', 0x1B, struct fbcon_decor_iowrapper32)
++#define FBIOCONDECOR_GETSTATE32 _IOR('F', 0x1C, struct fbcon_decor_iowrapper32)
++#define FBIOCONDECOR_SETPIC32 _IOWR('F', 0x1D, struct fbcon_decor_iowrapper32)
++#endif /* CONFIG_COMPAT */
++#endif /* __KERNEL__ */
++
++#define FBCON_DECOR_THEME_LEN 128 /* Maximum length of a theme name */
++#define FBCON_DECOR_IO_ORIG_KERNEL 0 /* Kernel ioctl origin */
++#define FBCON_DECOR_IO_ORIG_USER 1 /* User ioctl origin */
++
+ #define FB_TYPE_PACKED_PIXELS 0 /* Packed Pixels */
+ #define FB_TYPE_PLANES 1 /* Non interleaved planes */
+ #define FB_TYPE_INTERLEAVED_PLANES 2 /* Interleaved planes */
+@@ -277,6 +315,29 @@ struct fb_var_screeninfo {
+ __u32 reserved[4]; /* Reserved for future compatibility */
+ };
+
++#ifdef __KERNEL__
++#ifdef CONFIG_COMPAT
++struct fb_cmap32 {
++ __u32 start;
++ __u32 len; /* Number of entries */
++ compat_uptr_t red; /* Red values */
++ compat_uptr_t green;
++ compat_uptr_t blue;
++ compat_uptr_t transp; /* transparency, can be NULL */
++};
++
++#define fb_cmap_from_compat(to, from) \
++ (to).start = (from).start; \
++ (to).len = (from).len; \
++ (to).red = compat_ptr((from).red); \
++ (to).green = compat_ptr((from).green); \
++ (to).blue = compat_ptr((from).blue); \
++ (to).transp = compat_ptr((from).transp)
++
++#endif /* CONFIG_COMPAT */
++#endif /* __KERNEL__ */
++
++
+ struct fb_cmap {
+ __u32 start; /* First entry */
+ __u32 len; /* Number of entries */
+diff --git a/kernel/sysctl.c b/kernel/sysctl.c
+index 74f5b58..6386ab0 100644
+--- a/kernel/sysctl.c
++++ b/kernel/sysctl.c
+@@ -146,6 +146,10 @@ static const int cap_last_cap = CAP_LAST_CAP;
+ static unsigned long hung_task_timeout_max = (LONG_MAX/HZ);
+ #endif
+
++#ifdef CONFIG_FB_CON_DECOR
++extern char fbcon_decor_path[];
++#endif
++
+ #ifdef CONFIG_INOTIFY_USER
+ #include <linux/inotify.h>
+ #endif
+@@ -255,6 +259,15 @@ static struct ctl_table sysctl_base_table[] = {
+ .mode = 0555,
+ .child = dev_table,
+ },
++#ifdef CONFIG_FB_CON_DECOR
++ {
++ .procname = "fbcondecor",
++ .data = &fbcon_decor_path,
++ .maxlen = KMOD_PATH_LEN,
++ .mode = 0644,
++ .proc_handler = &proc_dostring,
++ },
++#endif
+ { }
+ };
+
diff --git a/4567_distro-Gentoo-Kconfig.patch b/4567_distro-Gentoo-Kconfig.patch
index c7af596..652e2a7 100644
--- a/4567_distro-Gentoo-Kconfig.patch
+++ b/4567_distro-Gentoo-Kconfig.patch
@@ -1,5 +1,5 @@
---- a/Kconfig
-+++ b/Kconfig
+--- a/Kconfig 2014-04-02 09:45:05.389224541 -0400
++++ b/Kconfig 2014-04-02 09:45:39.269224273 -0400
@@ -8,4 +8,6 @@ config SRCARCH
string
option env="SRCARCH"
@@ -7,9 +7,9 @@
+source "distro/Kconfig"
+
source "arch/$SRCARCH/Kconfig"
---- /dev/null
-+++ b/distro/Kconfig
-@@ -0,0 +1,131 @@
+--- 1969-12-31 19:00:00.000000000 -0500
++++ b/distro/Kconfig 2014-04-02 09:57:03.539218861 -0400
+@@ -0,0 +1,108 @@
+menu "Gentoo Linux"
+
+config GENTOO_LINUX
@@ -30,7 +30,7 @@
+
+ depends on GENTOO_LINUX
+ default y if GENTOO_LINUX
-+
++
+ select DEVTMPFS
+ select TMPFS
+
@@ -51,29 +51,7 @@
+ boot process; if not available, it causes sysfs and udev to malfunction.
+
+ To ensure Gentoo Linux boots, it is best to leave this setting enabled;
-+ if you run a custom setup, you could consider whether to disable this.
-+
-+config GENTOO_LINUX_PORTAGE
-+ bool "Select options required by Portage features"
-+
-+ depends on GENTOO_LINUX
-+ default y if GENTOO_LINUX
-+
-+ select CGROUPS
-+ select NAMESPACES
-+ select IPC_NS
-+ select NET_NS
-+
-+ help
-+ This enables options required by various Portage FEATURES.
-+ Currently this selects:
-+
-+ CGROUPS (required for FEATURES=cgroup)
-+ IPC_NS (required for FEATURES=ipc-sandbox)
-+ NET_NS (required for FEATURES=network-sandbox)
-+
-+ It is highly recommended that you leave this enabled as these FEATURES
-+ are, or will soon be, enabled by default.
++ if you run a custom setup, you could consider whether to disable this.
+
+menu "Support for init systems, system and service managers"
+ visible if GENTOO_LINUX
@@ -109,13 +87,12 @@
+ select AUTOFS4_FS
+ select BLK_DEV_BSG
+ select CGROUPS
-+ select DEVPTS_MULTIPLE_INSTANCES
+ select EPOLL
+ select FANOTIFY
+ select FHANDLE
+ select INOTIFY_USER
+ select NET
-+ select NET_NS
++ select NET_NS
+ select PROC_FS
+ select SIGNALFD
+ select SYSFS
diff --git a/5000_enable-additional-cpu-optimizations-for-gcc.patch b/5000_enable-additional-cpu-optimizations-for-gcc.patch
new file mode 100644
index 0000000..f7ab6f0
--- /dev/null
+++ b/5000_enable-additional-cpu-optimizations-for-gcc.patch
@@ -0,0 +1,327 @@
+This patch has been tested on and is known to work with kernel versions from 3.2
+up to the latest git version (pulled on 12/14/2013).
+
+This patch will expand the number of microarchitectures to include new
+processors including: AMD K10-family, AMD Family 10h (Barcelona), AMD Family
+14h (Bobcat), AMD Family 15h (Bulldozer), AMD Family 15h (Piledriver), AMD
+Family 16h (Jaguar), Intel 1st Gen Core i3/i5/i7 (Nehalem), Intel 2nd Gen Core
+i3/i5/i7 (Sandybridge), Intel 3rd Gen Core i3/i5/i7 (Ivybridge), and Intel 4th
+Gen Core i3/i5/i7 (Haswell). It also offers the compiler the 'native' flag.
+
+Small but real speed increases are measurable using a make endpoint comparing
+a generic kernel to one built with one of the respective microarchs.
+
+See the following experimental evidence supporting this statement:
+https://github.com/graysky2/kernel_gcc_patch
+
+REQUIREMENTS
+linux version >=3.15
+gcc version <4.9
+
+---
+diff -uprN a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
+--- a/arch/x86/include/asm/module.h 2013-11-03 18:41:51.000000000 -0500
++++ b/arch/x86/include/asm/module.h 2013-12-15 06:21:24.351122516 -0500
+@@ -15,6 +15,16 @@
+ #define MODULE_PROC_FAMILY "586MMX "
+ #elif defined CONFIG_MCORE2
+ #define MODULE_PROC_FAMILY "CORE2 "
++#elif defined CONFIG_MNATIVE
++#define MODULE_PROC_FAMILY "NATIVE "
++#elif defined CONFIG_MCOREI7
++#define MODULE_PROC_FAMILY "COREI7 "
++#elif defined CONFIG_MCOREI7AVX
++#define MODULE_PROC_FAMILY "COREI7AVX "
++#elif defined CONFIG_MCOREAVXI
++#define MODULE_PROC_FAMILY "COREAVXI "
++#elif defined CONFIG_MCOREAVX2
++#define MODULE_PROC_FAMILY "COREAVX2 "
+ #elif defined CONFIG_MATOM
+ #define MODULE_PROC_FAMILY "ATOM "
+ #elif defined CONFIG_M686
+@@ -33,6 +43,18 @@
+ #define MODULE_PROC_FAMILY "K7 "
+ #elif defined CONFIG_MK8
+ #define MODULE_PROC_FAMILY "K8 "
++#elif defined CONFIG_MK10
++#define MODULE_PROC_FAMILY "K10 "
++#elif defined CONFIG_MBARCELONA
++#define MODULE_PROC_FAMILY "BARCELONA "
++#elif defined CONFIG_MBOBCAT
++#define MODULE_PROC_FAMILY "BOBCAT "
++#elif defined CONFIG_MBULLDOZER
++#define MODULE_PROC_FAMILY "BULLDOZER "
++#elif defined CONFIG_MPILEDRIVER
++#define MODULE_PROC_FAMILY "PILEDRIVER "
++#elif defined CONFIG_MJAGUAR
++#define MODULE_PROC_FAMILY "JAGUAR "
+ #elif defined CONFIG_MELAN
+ #define MODULE_PROC_FAMILY "ELAN "
+ #elif defined CONFIG_MCRUSOE
+diff -uprN a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
+--- a/arch/x86/Kconfig.cpu 2013-11-03 18:41:51.000000000 -0500
++++ b/arch/x86/Kconfig.cpu 2013-12-15 06:21:24.351122516 -0500
+@@ -139,7 +139,7 @@ config MPENTIUM4
+
+
+ config MK6
+- bool "K6/K6-II/K6-III"
++ bool "AMD K6/K6-II/K6-III"
+ depends on X86_32
+ ---help---
+ Select this for an AMD K6-family processor. Enables use of
+@@ -147,7 +147,7 @@ config MK6
+ flags to GCC.
+
+ config MK7
+- bool "Athlon/Duron/K7"
++ bool "AMD Athlon/Duron/K7"
+ depends on X86_32
+ ---help---
+ Select this for an AMD Athlon K7-family processor. Enables use of
+@@ -155,12 +155,55 @@ config MK7
+ flags to GCC.
+
+ config MK8
+- bool "Opteron/Athlon64/Hammer/K8"
++ bool "AMD Opteron/Athlon64/Hammer/K8"
+ ---help---
+ Select this for an AMD Opteron or Athlon64 Hammer-family processor.
+ Enables use of some extended instructions, and passes appropriate
+ optimization flags to GCC.
+
++config MK10
++ bool "AMD 61xx/7x50/PhenomX3/X4/II/K10"
++ ---help---
++ Select this for an AMD 61xx Eight-Core Magny-Cours, Athlon X2 7x50,
++ Phenom X3/X4/II, Athlon II X2/X3/X4, or Turion II-family processor.
++ Enables use of some extended instructions, and passes appropriate
++ optimization flags to GCC.
++
++config MBARCELONA
++ bool "AMD Barcelona"
++ ---help---
++ Select this for AMD Barcelona and newer processors.
++
++ Enables -march=barcelona
++
++config MBOBCAT
++ bool "AMD Bobcat"
++ ---help---
++ Select this for AMD Bobcat processors.
++
++ Enables -march=btver1
++
++config MBULLDOZER
++ bool "AMD Bulldozer"
++ ---help---
++ Select this for AMD Bulldozer processors.
++
++ Enables -march=bdver1
++
++config MPILEDRIVER
++ bool "AMD Piledriver"
++ ---help---
++ Select this for AMD Piledriver processors.
++
++ Enables -march=bdver2
++
++config MJAGUAR
++ bool "AMD Jaguar"
++ ---help---
++ Select this for AMD Jaguar processors.
++
++ Enables -march=btver2
++
+ config MCRUSOE
+ bool "Crusoe"
+ depends on X86_32
+@@ -251,8 +294,17 @@ config MPSC
+ using the cpu family field
+ in /proc/cpuinfo. Family 15 is an older Xeon, Family 6 a newer one.
+
++config MATOM
++ bool "Intel Atom"
++ ---help---
++
++ Select this for the Intel Atom platform. Intel Atom CPUs have an
++ in-order pipelining architecture and thus can benefit from
++ accordingly optimized code. Use a recent GCC with specific Atom
++ support in order to fully benefit from selecting this option.
++
+ config MCORE2
+- bool "Core 2/newer Xeon"
++ bool "Intel Core 2"
+ ---help---
+
+ Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and
+@@ -260,14 +312,40 @@ config MCORE2
+ family in /proc/cpuinfo. Newer ones have 6 and older ones 15
+ (not a typo)
+
+-config MATOM
+- bool "Intel Atom"
++ Enables -march=core2
++
++config MCOREI7
++ bool "Intel Core i7"
+ ---help---
+
+- Select this for the Intel Atom platform. Intel Atom CPUs have an
+- in-order pipelining architecture and thus can benefit from
+- accordingly optimized code. Use a recent GCC with specific Atom
+- support in order to fully benefit from selecting this option.
++ Select this for the Intel Nehalem platform. Intel Nehalem processors
++ include Core i3, i5, i7, Xeon: 34xx, 35xx, 55xx, 56xx, 75xx processors.
++
++ Enables -march=corei7
++
++config MCOREI7AVX
++ bool "Intel Core 2nd Gen AVX"
++ ---help---
++
++ Select this for 2nd Gen Core processors including Sandy Bridge.
++
++ Enables -march=corei7-avx
++
++config MCOREAVXI
++ bool "Intel Core 3rd Gen AVX"
++ ---help---
++
++ Select this for 3rd Gen Core processors including Ivy Bridge.
++
++ Enables -march=core-avx-i
++
++config MCOREAVX2
++ bool "Intel Core AVX2"
++ ---help---
++
++ Select this for AVX2 enabled processors including Haswell.
++
++ Enables -march=core-avx2
+
+ config GENERIC_CPU
+ bool "Generic-x86-64"
+@@ -276,6 +354,19 @@ config GENERIC_CPU
+ Generic x86-64 CPU.
+ Run equally well on all x86-64 CPUs.
+
++config MNATIVE
++ bool "Native optimizations autodetected by GCC"
++ ---help---
++
++ GCC 4.2 and above support -march=native, which automatically detects
++ the optimum settings to use based on your processor. -march=native
++ also detects and applies additional settings beyond -march specific
++ to your CPU (e.g. -msse4). Unless you have a specific reason not to
++ (e.g. distcc cross-compiling), you should probably be using
++ -march=native rather than anything listed below.
++
++ Enables -march=native
++
+ endchoice
+
+ config X86_GENERIC
+@@ -300,7 +391,7 @@ config X86_INTERNODE_CACHE_SHIFT
+ config X86_L1_CACHE_SHIFT
+ int
+ default "7" if MPENTIUM4 || MPSC
+- default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
++ default "6" if MK7 || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MPENTIUMM || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MVIAC7 || X86_GENERIC || MNATIVE || GENERIC_CPU
+ default "4" if MELAN || M486 || MGEODEGX1
+ default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
+
+@@ -331,11 +422,11 @@ config X86_ALIGNMENT_16
+
+ config X86_INTEL_USERCOPY
+ def_bool y
+- depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
++ depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || MNATIVE || X86_GENERIC || MK8 || MK7 || MK10 || MBARCELONA || MEFFICEON || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2
+
+ config X86_USE_PPRO_CHECKSUM
+ def_bool y
+- depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
++ depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MK10 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MATOM || MNATIVE
+
+ config X86_USE_3DNOW
+ def_bool y
+@@ -363,17 +454,17 @@ config X86_P6_NOP
+
+ config X86_TSC
+ def_bool y
+- depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) || X86_64
++ depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MCOREI7 || MCOREI7AVX || MATOM) || X86_64 || MNATIVE
+
+ config X86_CMPXCHG64
+ def_bool y
+- depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM
++ depends on X86_PAE || X86_64 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM || MNATIVE
+
+ # this should be set for all -march=.. options where the compiler
+ # generates cmov.
+ config X86_CMOV
+ def_bool y
+- depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM || MGEODE_LX)
++ depends on (MK8 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MK7 || MCORE2 || MCOREI7 || MCOREI7AVX || MCOREAVXI || MCOREAVX2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MNATIVE || MATOM || MGEODE_LX)
+
+ config X86_MINIMUM_CPU_FAMILY
+ int
+diff -uprN a/arch/x86/Makefile b/arch/x86/Makefile
+--- a/arch/x86/Makefile 2013-11-03 18:41:51.000000000 -0500
++++ b/arch/x86/Makefile 2013-12-15 06:21:24.354455723 -0500
+@@ -61,11 +61,26 @@ else
+ KBUILD_CFLAGS += $(call cc-option,-mno-sse -mpreferred-stack-boundary=3)
+
+ # FIXME - should be integrated in Makefile.cpu (Makefile_32.cpu)
++ cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
+ cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
++ cflags-$(CONFIG_MK10) += $(call cc-option,-march=amdfam10)
++ cflags-$(CONFIG_MBARCELONA) += $(call cc-option,-march=barcelona)
++ cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1)
++ cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1)
++ cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2)
++ cflags-$(CONFIG_MJAGUAR) += $(call cc-option,-march=btver2)
+ cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
+
+ cflags-$(CONFIG_MCORE2) += \
+- $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
++ $(call cc-option,-march=core2,$(call cc-option,-mtune=core2))
++ cflags-$(CONFIG_MCOREI7) += \
++ $(call cc-option,-march=corei7,$(call cc-option,-mtune=corei7))
++ cflags-$(CONFIG_MCOREI7AVX) += \
++ $(call cc-option,-march=corei7-avx,$(call cc-option,-mtune=corei7-avx))
++ cflags-$(CONFIG_MCOREAVXI) += \
++ $(call cc-option,-march=core-avx-i,$(call cc-option,-mtune=core-avx-i))
++ cflags-$(CONFIG_MCOREAVX2) += \
++ $(call cc-option,-march=core-avx2,$(call cc-option,-mtune=core-avx2))
+ cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
+ $(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
+diff -uprN a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
+--- a/arch/x86/Makefile_32.cpu 2013-11-03 18:41:51.000000000 -0500
++++ b/arch/x86/Makefile_32.cpu 2013-12-15 06:21:24.354455723 -0500
+@@ -23,7 +23,14 @@ cflags-$(CONFIG_MK6) += -march=k6
+ # Please note, that patches that add -march=athlon-xp and friends are pointless.
+ # They make zero difference whatsosever to performance at this time.
+ cflags-$(CONFIG_MK7) += -march=athlon
++cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
+ cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8,-march=athlon)
++cflags-$(CONFIG_MK10) += $(call cc-option,-march=amdfam10,-march=athlon)
++cflags-$(CONFIG_MBARCELONA) += $(call cc-option,-march=barcelona,-march=athlon)
++cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1,-march=athlon)
++cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1,-march=athlon)
++cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2,-march=athlon)
++cflags-$(CONFIG_MJAGUAR) += $(call cc-option,-march=btver2,-march=athlon)
+ cflags-$(CONFIG_MCRUSOE) += -march=i686 $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
+ cflags-$(CONFIG_MEFFICEON) += -march=i686 $(call tune,pentium3) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
+ cflags-$(CONFIG_MWINCHIPC6) += $(call cc-option,-march=winchip-c6,-march=i586)
+@@ -32,6 +39,10 @@ cflags-$(CONFIG_MCYRIXIII) += $(call cc-
+ cflags-$(CONFIG_MVIAC3_2) += $(call cc-option,-march=c3-2,-march=i686)
+ cflags-$(CONFIG_MVIAC7) += -march=i686
+ cflags-$(CONFIG_MCORE2) += -march=i686 $(call tune,core2)
++cflags-$(CONFIG_MCOREI7) += -march=i686 $(call tune,corei7)
++cflags-$(CONFIG_MCOREI7AVX) += -march=i686 $(call tune,corei7-avx)
++cflags-$(CONFIG_MCOREAVXI) += -march=i686 $(call tune,core-avx-i)
++cflags-$(CONFIG_MCOREAVX2) += -march=i686 $(call tune,core-avx2)
+ cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom,$(call cc-option,-march=core2,-march=i686)) \
+ $(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
diff --git a/5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch b/5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
new file mode 100644
index 0000000..c4efd06
--- /dev/null
+++ b/5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
@@ -0,0 +1,402 @@
+WARNING - this version of the patch works with version 4.9+ of gcc and with
+kernel version 3.15.x+ and should NOT be applied when compiling on older
+versions due to name changes of the flags with the 4.9 release of gcc.
+Use the older version of this patch hosted on the same github for older
+versions of gcc. For example:
+
+corei7 --> nehalem
+corei7-avx --> sandybridge
+core-avx-i --> ivybridge
+core-avx2 --> haswell
+
+For more, see: https://gcc.gnu.org/gcc-4.9/changes.html
+
+It also changes 'atom' to 'bonnell' in accordance with the gcc v4.9 changes.
+Note that upstream is using the deprecated 'march=atom' flags when I believe it
+should use the newer 'march=bonnell' flag for atom processors.
+
+I have made that change to this patch set as well. See the following kernel
+bug report to see if I'm right: https://bugzilla.kernel.org/show_bug.cgi?id=77461
+
+This patch will expand the number of microarchitectures to include newer
+processors including: AMD K10-family, AMD Family 10h (Barcelona), AMD Family
+14h (Bobcat), AMD Family 15h (Bulldozer), AMD Family 15h (Piledriver), AMD
+Family 16h (Jaguar), Intel 1st Gen Core i3/i5/i7 (Nehalem), Intel 1.5 Gen Core
+i3/i5/i7 (Westmere), Intel 2nd Gen Core i3/i5/i7 (Sandybridge), Intel 3rd Gen
+Core i3/i5/i7 (Ivybridge), Intel 4th Gen Core i3/i5/i7 (Haswell), Intel 5th
+Gen Core i3/i5/i7 (Broadwell), and the low power Silvermont series of Atom
+processors (Silvermont). It also offers the compiler the 'native' flag.
+
+Small but real speed increases are measurable using a make endpoint comparing
+a generic kernel to one built with one of the respective microarchs.
+
+See the following experimental evidence supporting this statement:
+https://github.com/graysky2/kernel_gcc_patch
+
+REQUIREMENTS
+linux version >=3.15
+gcc version >=4.9
+
+--- a/arch/x86/include/asm/module.h 2014-06-16 16:44:27.000000000 -0400
++++ b/arch/x86/include/asm/module.h 2015-03-07 03:27:32.556672424 -0500
+@@ -15,6 +15,22 @@
+ #define MODULE_PROC_FAMILY "586MMX "
+ #elif defined CONFIG_MCORE2
+ #define MODULE_PROC_FAMILY "CORE2 "
++#elif defined CONFIG_MNATIVE
++#define MODULE_PROC_FAMILY "NATIVE "
++#elif defined CONFIG_MNEHALEM
++#define MODULE_PROC_FAMILY "NEHALEM "
++#elif defined CONFIG_MWESTMERE
++#define MODULE_PROC_FAMILY "WESTMERE "
++#elif defined CONFIG_MSILVERMONT
++#define MODULE_PROC_FAMILY "SILVERMONT "
++#elif defined CONFIG_MSANDYBRIDGE
++#define MODULE_PROC_FAMILY "SANDYBRIDGE "
++#elif defined CONFIG_MIVYBRIDGE
++#define MODULE_PROC_FAMILY "IVYBRIDGE "
++#elif defined CONFIG_MHASWELL
++#define MODULE_PROC_FAMILY "HASWELL "
++#elif defined CONFIG_MBROADWELL
++#define MODULE_PROC_FAMILY "BROADWELL "
+ #elif defined CONFIG_MATOM
+ #define MODULE_PROC_FAMILY "ATOM "
+ #elif defined CONFIG_M686
+@@ -33,6 +49,20 @@
+ #define MODULE_PROC_FAMILY "K7 "
+ #elif defined CONFIG_MK8
+ #define MODULE_PROC_FAMILY "K8 "
++#elif defined CONFIG_MK8SSE3
++#define MODULE_PROC_FAMILY "K8SSE3 "
++#elif defined CONFIG_MK10
++#define MODULE_PROC_FAMILY "K10 "
++#elif defined CONFIG_MBARCELONA
++#define MODULE_PROC_FAMILY "BARCELONA "
++#elif defined CONFIG_MBOBCAT
++#define MODULE_PROC_FAMILY "BOBCAT "
++#elif defined CONFIG_MBULLDOZER
++#define MODULE_PROC_FAMILY "BULLDOZER "
++#elif defined CONFIG_MPILEDRIVER
++#define MODULE_PROC_FAMILY "PILEDRIVER "
++#elif defined CONFIG_MJAGUAR
++#define MODULE_PROC_FAMILY "JAGUAR "
+ #elif defined CONFIG_MELAN
+ #define MODULE_PROC_FAMILY "ELAN "
+ #elif defined CONFIG_MCRUSOE
+--- a/arch/x86/Kconfig.cpu 2014-06-16 16:44:27.000000000 -0400
++++ b/arch/x86/Kconfig.cpu 2015-03-07 03:32:14.337713226 -0500
+@@ -137,9 +137,8 @@ config MPENTIUM4
+ -Paxville
+ -Dempsey
+
+-
+ config MK6
+- bool "K6/K6-II/K6-III"
++ bool "AMD K6/K6-II/K6-III"
+ depends on X86_32
+ ---help---
+ Select this for an AMD K6-family processor. Enables use of
+@@ -147,7 +146,7 @@ config MK6
+ flags to GCC.
+
+ config MK7
+- bool "Athlon/Duron/K7"
++ bool "AMD Athlon/Duron/K7"
+ depends on X86_32
+ ---help---
+ Select this for an AMD Athlon K7-family processor. Enables use of
+@@ -155,12 +154,62 @@ config MK7
+ flags to GCC.
+
+ config MK8
+- bool "Opteron/Athlon64/Hammer/K8"
++ bool "AMD Opteron/Athlon64/Hammer/K8"
+ ---help---
+ Select this for an AMD Opteron or Athlon64 Hammer-family processor.
+ Enables use of some extended instructions, and passes appropriate
+ optimization flags to GCC.
+
++config MK8SSE3
++ bool "AMD Opteron/Athlon64/Hammer/K8 with SSE3"
++ ---help---
++ Select this for improved AMD Opteron or Athlon64 Hammer-family processors.
++ Enables use of some extended instructions, and passes appropriate
++ optimization flags to GCC.
++
++config MK10
++ bool "AMD 61xx/7x50/PhenomX3/X4/II/K10"
++ ---help---
++ Select this for an AMD 61xx Eight-Core Magny-Cours, Athlon X2 7x50,
++ Phenom X3/X4/II, Athlon II X2/X3/X4, or Turion II-family processor.
++ Enables use of some extended instructions, and passes appropriate
++ optimization flags to GCC.
++
++config MBARCELONA
++ bool "AMD Barcelona"
++ ---help---
++ Select this for AMD Barcelona and newer processors.
++
++ Enables -march=barcelona
++
++config MBOBCAT
++ bool "AMD Bobcat"
++ ---help---
++ Select this for AMD Bobcat processors.
++
++ Enables -march=btver1
++
++config MBULLDOZER
++ bool "AMD Bulldozer"
++ ---help---
++ Select this for AMD Bulldozer processors.
++
++ Enables -march=bdver1
++
++config MPILEDRIVER
++ bool "AMD Piledriver"
++ ---help---
++ Select this for AMD Piledriver processors.
++
++ Enables -march=bdver2
++
++config MJAGUAR
++ bool "AMD Jaguar"
++ ---help---
++ Select this for AMD Jaguar processors.
++
++ Enables -march=btver2
++
+ config MCRUSOE
+ bool "Crusoe"
+ depends on X86_32
+@@ -251,8 +300,17 @@ config MPSC
+ using the cpu family field
+ in /proc/cpuinfo. Family 15 is an older Xeon, Family 6 a newer one.
+
++config MATOM
++ bool "Intel Atom"
++ ---help---
++
++ Select this for the Intel Atom platform. Intel Atom CPUs have an
++ in-order pipelining architecture and thus can benefit from
++ accordingly optimized code. Use a recent GCC with specific Atom
++ support in order to fully benefit from selecting this option.
++
+ config MCORE2
+- bool "Core 2/newer Xeon"
++ bool "Intel Core 2"
+ ---help---
+
+ Select this for Intel Core 2 and newer Core 2 Xeons (Xeon 51xx and
+@@ -260,14 +318,63 @@ config MCORE2
+ family in /proc/cpuinfo. Newer ones have 6 and older ones 15
+ (not a typo)
+
+-config MATOM
+- bool "Intel Atom"
++ Enables -march=core2
++
++config MNEHALEM
++ bool "Intel Nehalem"
+ ---help---
+
+- Select this for the Intel Atom platform. Intel Atom CPUs have an
+- in-order pipelining architecture and thus can benefit from
+- accordingly optimized code. Use a recent GCC with specific Atom
+- support in order to fully benefit from selecting this option.
++ Select this for 1st Gen Core processors in the Nehalem family.
++
++ Enables -march=nehalem
++
++config MWESTMERE
++ bool "Intel Westmere"
++ ---help---
++
++ Select this for the Intel Westmere (formerly Nehalem-C) family.
++
++ Enables -march=westmere
++
++config MSILVERMONT
++ bool "Intel Silvermont"
++ ---help---
++
++ Select this for the Intel Silvermont platform.
++
++ Enables -march=silvermont
++
++config MSANDYBRIDGE
++ bool "Intel Sandy Bridge"
++ ---help---
++
++ Select this for 2nd Gen Core processors in the Sandy Bridge family.
++
++ Enables -march=sandybridge
++
++config MIVYBRIDGE
++ bool "Intel Ivy Bridge"
++ ---help---
++
++ Select this for 3rd Gen Core processors in the Ivy Bridge family.
++
++ Enables -march=ivybridge
++
++config MHASWELL
++ bool "Intel Haswell"
++ ---help---
++
++ Select this for 4th Gen Core processors in the Haswell family.
++
++ Enables -march=haswell
++
++config MBROADWELL
++ bool "Intel Broadwell"
++ ---help---
++
++ Select this for 5th Gen Core processors in the Broadwell family.
++
++ Enables -march=broadwell
+
+ config GENERIC_CPU
+ bool "Generic-x86-64"
+@@ -276,6 +383,19 @@ config GENERIC_CPU
+ Generic x86-64 CPU.
+ Run equally well on all x86-64 CPUs.
+
++config MNATIVE
++ bool "Native optimizations autodetected by GCC"
++ ---help---
++
++ GCC 4.2 and above support -march=native, which automatically detects
++ the optimum settings to use based on your processor. -march=native
++ also detects and applies additional settings beyond -march specific
++ to your CPU (e.g. -msse4). Unless you have a specific reason not to
++ (e.g. distcc cross-compiling), you should probably be using
++ -march=native rather than anything listed above.
++
++ Enables -march=native
++
+ endchoice
+
+ config X86_GENERIC
+@@ -300,7 +420,7 @@ config X86_INTERNODE_CACHE_SHIFT
+ config X86_L1_CACHE_SHIFT
+ int
+ default "7" if MPENTIUM4 || MPSC
+- default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
++ default "6" if MK7 || MK8 || MK8SSE3 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MPENTIUMM || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MNATIVE || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
+ default "4" if MELAN || M486 || MGEODEGX1
+ default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
+
+@@ -331,11 +451,11 @@ config X86_ALIGNMENT_16
+
+ config X86_INTEL_USERCOPY
+ def_bool y
+- depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK7 || MEFFICEON || MCORE2
++ depends on MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M586MMX || X86_GENERIC || MK8 || MK8SSE3 || MK7 || MEFFICEON || MCORE2 || MK10 || MBARCELONA || MNEHALEM || MWESTMERE || MSILVERMONT || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MNATIVE
+
+ config X86_USE_PPRO_CHECKSUM
+ def_bool y
+- depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MATOM
++ depends on MWINCHIP3D || MWINCHIPC6 || MCYRIXIII || MK7 || MK6 || MK10 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MK8 || MK8SSE3 || MVIAC3_2 || MVIAC7 || MEFFICEON || MGEODE_LX || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MATOM || MNATIVE
+
+ config X86_USE_3DNOW
+ def_bool y
+@@ -359,17 +479,17 @@ config X86_P6_NOP
+
+ config X86_TSC
+ def_bool y
+- depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MATOM) || X86_64
++ depends on (MWINCHIP3D || MCRUSOE || MEFFICEON || MCYRIXIII || MK7 || MK6 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || MK8 || MK8SSE3 || MVIAC3_2 || MVIAC7 || MGEODEGX1 || MGEODE_LX || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MNATIVE || MATOM) || X86_64
+
+ config X86_CMPXCHG64
+ def_bool y
+- depends on X86_PAE || X86_64 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM
++ depends on X86_PAE || X86_64 || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MATOM || MNATIVE
+
+ # this should be set for all -march=.. options where the compiler
+ # generates cmov.
+ config X86_CMOV
+ def_bool y
+- depends on (MK8 || MK7 || MCORE2 || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MATOM || MGEODE_LX)
++ depends on (MK8 || MK8SSE3 || MK10 || MBARCELONA || MBOBCAT || MBULLDOZER || MPILEDRIVER || MJAGUAR || MK7 || MCORE2 || MNEHALEM || MWESTMERE || MSILVERMONT || MSANDYBRIDGE || MIVYBRIDGE || MHASWELL || MBROADWELL || MPENTIUM4 || MPENTIUMM || MPENTIUMIII || MPENTIUMII || M686 || MVIAC3_2 || MVIAC7 || MCRUSOE || MEFFICEON || X86_64 || MNATIVE || MATOM || MGEODE_LX)
+
+ config X86_MINIMUM_CPU_FAMILY
+ int
+--- a/arch/x86/Makefile 2014-06-16 16:44:27.000000000 -0400
++++ b/arch/x86/Makefile 2015-03-07 03:33:27.650843211 -0500
+@@ -92,13 +92,35 @@ else
+ KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=3)
+
+ # FIXME - should be integrated in Makefile.cpu (Makefile_32.cpu)
++ cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
+ cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8)
++ cflags-$(CONFIG_MK8SSE3) += $(call cc-option,-march=k8-sse3,-mtune=k8)
++ cflags-$(CONFIG_MK10) += $(call cc-option,-march=amdfam10)
++ cflags-$(CONFIG_MBARCELONA) += $(call cc-option,-march=barcelona)
++ cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1)
++ cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1)
++ cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2)
++ cflags-$(CONFIG_MJAGUAR) += $(call cc-option,-march=btver2)
+ cflags-$(CONFIG_MPSC) += $(call cc-option,-march=nocona)
+
+ cflags-$(CONFIG_MCORE2) += \
+- $(call cc-option,-march=core2,$(call cc-option,-mtune=generic))
+- cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom) \
+- $(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
++ $(call cc-option,-march=core2,$(call cc-option,-mtune=core2))
++ cflags-$(CONFIG_MNEHALEM) += \
++ $(call cc-option,-march=nehalem,$(call cc-option,-mtune=nehalem))
++ cflags-$(CONFIG_MWESTMERE) += \
++ $(call cc-option,-march=westmere,$(call cc-option,-mtune=westmere))
++ cflags-$(CONFIG_MSILVERMONT) += \
++ $(call cc-option,-march=silvermont,$(call cc-option,-mtune=silvermont))
++ cflags-$(CONFIG_MSANDYBRIDGE) += \
++ $(call cc-option,-march=sandybridge,$(call cc-option,-mtune=sandybridge))
++ cflags-$(CONFIG_MIVYBRIDGE) += \
++ $(call cc-option,-march=ivybridge,$(call cc-option,-mtune=ivybridge))
++ cflags-$(CONFIG_MHASWELL) += \
++ $(call cc-option,-march=haswell,$(call cc-option,-mtune=haswell))
++ cflags-$(CONFIG_MBROADWELL) += \
++ $(call cc-option,-march=broadwell,$(call cc-option,-mtune=broadwell))
++ cflags-$(CONFIG_MATOM) += $(call cc-option,-march=bonnell) \
++ $(call cc-option,-mtune=bonnell,$(call cc-option,-mtune=generic))
+ cflags-$(CONFIG_GENERIC_CPU) += $(call cc-option,-mtune=generic)
+ KBUILD_CFLAGS += $(cflags-y)
+
+--- a/arch/x86/Makefile_32.cpu 2014-06-16 16:44:27.000000000 -0400
++++ b/arch/x86/Makefile_32.cpu 2015-03-07 03:34:15.203586024 -0500
+@@ -23,7 +23,15 @@ cflags-$(CONFIG_MK6) += -march=k6
+ # Please note, that patches that add -march=athlon-xp and friends are pointless.
+ # They make zero difference whatsosever to performance at this time.
+ cflags-$(CONFIG_MK7) += -march=athlon
++cflags-$(CONFIG_MNATIVE) += $(call cc-option,-march=native)
+ cflags-$(CONFIG_MK8) += $(call cc-option,-march=k8,-march=athlon)
++cflags-$(CONFIG_MK8SSE3) += $(call cc-option,-march=k8-sse3,-march=athlon)
++cflags-$(CONFIG_MK10) += $(call cc-option,-march=amdfam10,-march=athlon)
++cflags-$(CONFIG_MBARCELONA) += $(call cc-option,-march=barcelona,-march=athlon)
++cflags-$(CONFIG_MBOBCAT) += $(call cc-option,-march=btver1,-march=athlon)
++cflags-$(CONFIG_MBULLDOZER) += $(call cc-option,-march=bdver1,-march=athlon)
++cflags-$(CONFIG_MPILEDRIVER) += $(call cc-option,-march=bdver2,-march=athlon)
++cflags-$(CONFIG_MJAGUAR) += $(call cc-option,-march=btver2,-march=athlon)
+ cflags-$(CONFIG_MCRUSOE) += -march=i686 $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
+ cflags-$(CONFIG_MEFFICEON) += -march=i686 $(call tune,pentium3) $(align)-functions=0 $(align)-jumps=0 $(align)-loops=0
+ cflags-$(CONFIG_MWINCHIPC6) += $(call cc-option,-march=winchip-c6,-march=i586)
+@@ -32,8 +40,15 @@ cflags-$(CONFIG_MCYRIXIII) += $(call cc-
+ cflags-$(CONFIG_MVIAC3_2) += $(call cc-option,-march=c3-2,-march=i686)
+ cflags-$(CONFIG_MVIAC7) += -march=i686
+ cflags-$(CONFIG_MCORE2) += -march=i686 $(call tune,core2)
+-cflags-$(CONFIG_MATOM) += $(call cc-option,-march=atom,$(call cc-option,-march=core2,-march=i686)) \
+- $(call cc-option,-mtune=atom,$(call cc-option,-mtune=generic))
++cflags-$(CONFIG_MNEHALEM) += -march=i686 $(call tune,nehalem)
++cflags-$(CONFIG_MWESTMERE) += -march=i686 $(call tune,westmere)
++cflags-$(CONFIG_MSILVERMONT) += -march=i686 $(call tune,silvermont)
++cflags-$(CONFIG_MSANDYBRIDGE) += -march=i686 $(call tune,sandybridge)
++cflags-$(CONFIG_MIVYBRIDGE) += -march=i686 $(call tune,ivybridge)
++cflags-$(CONFIG_MHASWELL) += -march=i686 $(call tune,haswell)
++cflags-$(CONFIG_MBROADWELL) += -march=i686 $(call tune,broadwell)
++cflags-$(CONFIG_MATOM) += $(call cc-option,-march=bonnell,$(call cc-option,-march=core2,-march=i686)) \
++ $(call cc-option,-mtune=bonnell,$(call cc-option,-mtune=generic))
+
+ # AMD Elan support
+ cflags-$(CONFIG_MELAN) += -march=i486
+
diff --git a/5015_kdbus-8-12-2015.patch b/5015_kdbus-8-12-2015.patch
new file mode 100644
index 0000000..4e018f2
--- /dev/null
+++ b/5015_kdbus-8-12-2015.patch
@@ -0,0 +1,34349 @@
+diff --git a/Documentation/Makefile b/Documentation/Makefile
+index bc05482..e2127a7 100644
+--- a/Documentation/Makefile
++++ b/Documentation/Makefile
+@@ -1,4 +1,4 @@
+ subdir-y := accounting auxdisplay blackfin connector \
+- filesystems filesystems ia64 laptops mic misc-devices \
++ filesystems filesystems ia64 kdbus laptops mic misc-devices \
+ networking pcmcia prctl ptp spi timers vDSO video4linux \
+ watchdog
+diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
+index 51f4221..ec7c81b 100644
+--- a/Documentation/ioctl/ioctl-number.txt
++++ b/Documentation/ioctl/ioctl-number.txt
+@@ -292,6 +292,7 @@ Code Seq#(hex) Include File Comments
+ 0x92 00-0F drivers/usb/mon/mon_bin.c
+ 0x93 60-7F linux/auto_fs.h
+ 0x94 all fs/btrfs/ioctl.h
++0x95 all uapi/linux/kdbus.h kdbus IPC driver
+ 0x97 00-7F fs/ceph/ioctl.h Ceph file system
+ 0x99 00-0F 537-Addinboard driver
+ <mailto:buk@buks.ipn.de>
+diff --git a/Documentation/kdbus/.gitignore b/Documentation/kdbus/.gitignore
+new file mode 100644
+index 0000000..b4a77cc
+--- /dev/null
++++ b/Documentation/kdbus/.gitignore
+@@ -0,0 +1,2 @@
++*.7
++*.html
+diff --git a/Documentation/kdbus/Makefile b/Documentation/kdbus/Makefile
+new file mode 100644
+index 0000000..8caffe5
+--- /dev/null
++++ b/Documentation/kdbus/Makefile
+@@ -0,0 +1,44 @@
++DOCS := \
++ kdbus.xml \
++ kdbus.bus.xml \
++ kdbus.connection.xml \
++ kdbus.endpoint.xml \
++ kdbus.fs.xml \
++ kdbus.item.xml \
++ kdbus.match.xml \
++ kdbus.message.xml \
++ kdbus.name.xml \
++ kdbus.policy.xml \
++ kdbus.pool.xml
++
++XMLFILES := $(addprefix $(obj)/,$(DOCS))
++MANFILES := $(patsubst %.xml, %.7, $(XMLFILES))
++HTMLFILES := $(patsubst %.xml, %.html, $(XMLFILES))
++
++XMLTO_ARGS := -m $(srctree)/$(src)/stylesheet.xsl --skip-validation
++
++quiet_cmd_db2man = MAN $@
++ cmd_db2man = xmlto man $(XMLTO_ARGS) -o $(obj) $<
++%.7: %.xml
++ @(which xmlto > /dev/null 2>&1) || \
++ (echo "*** You need to install xmlto ***"; \
++ exit 1)
++ $(call cmd,db2man)
++
++quiet_cmd_db2html = HTML $@
++ cmd_db2html = xmlto html-nochunks $(XMLTO_ARGS) -o $(obj) $<
++%.html: %.xml
++ @(which xmlto > /dev/null 2>&1) || \
++ (echo "*** You need to install xmlto ***"; \
++ exit 1)
++ $(call cmd,db2html)
++
++mandocs: $(MANFILES)
++
++htmldocs: $(HTMLFILES)
++
++clean-files := $(MANFILES) $(HTMLFILES)
++
++# we don't support other %docs targets right now
++%docs:
++ @true
+diff --git a/Documentation/kdbus/kdbus.bus.xml b/Documentation/kdbus/kdbus.bus.xml
+new file mode 100644
+index 0000000..83f1198
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.bus.xml
+@@ -0,0 +1,344 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.bus">
++
++ <refentryinfo>
++ <title>kdbus.bus</title>
++ <productname>kdbus.bus</productname>
++ </refentryinfo>
++
++ <refmeta>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </refmeta>
++
++ <refnamediv>
++ <refname>kdbus.bus</refname>
++ <refpurpose>kdbus bus</refpurpose>
++ </refnamediv>
++
++ <refsect1>
++ <title>Description</title>
++
++ <para>
++ A bus is a resource that is shared between connections in order to
++ transmit messages (see
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>).
++ Each bus is independent, and operations on the bus will not have any
++ effect on other buses. A bus is a management entity that controls the
++ addresses of its connections, their policies and message transactions
++ performed via this bus.
++ </para>
++ <para>
++ Each bus is bound to the mount instance it was created on. It has a
++ custom name that is unique across all buses of a domain. In
++ <citerefentry>
++ <refentrytitle>kdbus.fs</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ a bus is presented as a directory. No operations can be performed on
++ the bus itself; instead you need to perform the operations on an endpoint
++ associated with the bus. Endpoints are accessible as files underneath the
++ bus directory. A default endpoint called <constant>bus</constant> is
++ provided on each bus.
++ </para>
++ <para>
++ Bus names may be chosen freely except for one restriction: the name must
++ be prefixed with the numeric effective UID of the creator and a dash. This
++ is required to avoid namespace clashes between different users. When
++ creating a bus, the name that is passed in must be properly formatted, or
++ the kernel will refuse creation of the bus. Example:
++ <literal>1047-foobar</literal> is an acceptable name for a bus
++ registered by a user with UID 1047. However,
++ <literal>1024-foobar</literal> is not, and neither is
++ <literal>foobar</literal>. The UID must be provided in the
++ user-namespace of the bus owner.
++ </para>
++ <para>
++ To create a new bus, you need to open the control file of a domain and
++ employ the <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl. The control
++ file descriptor that was used to issue
++ <constant>KDBUS_CMD_BUS_MAKE</constant> must not previously have been
++ used for any other control-ioctl and must be kept open for the entire
++ life-time of the created bus. Closing it will immediately clean up the
++ entire bus and all its associated resources and endpoints. Every control
++ file descriptor can only be used to create a single new bus; from that
++ point on, it is not used for any further communication until the final
++ <citerefentry>
++ <refentrytitle>close</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ .
++ </para>
++ <para>
++ Each bus will generate a random, 128-bit UUID upon creation. This UUID
++ will be returned to creators of connections through
++ <varname>kdbus_cmd_hello.id128</varname> and can be used to uniquely
++ identify buses, even across different machines or containers. The UUID
++ will have its variant bits set to <literal>DCE</literal>, and denote
++ version 4 (random). For more details on UUIDs, see <ulink
++ url="https://en.wikipedia.org/wiki/Universally_unique_identifier">
++ the Wikipedia article on UUIDs</ulink>.
++ </para>
++
++ </refsect1>
++
++ <refsect1>
++ <title>Creating buses</title>
++ <para>
++ To create a new bus, the <constant>KDBUS_CMD_BUS_MAKE</constant>
++ command is used. It takes a <type>struct kdbus_cmd</type> argument.
++ </para>
++ <programlisting>
++struct kdbus_cmd {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>The flags for creation.</para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_MAKE_ACCESS_GROUP</constant></term>
++ <listitem>
++ <para>Make the bus file group-accessible.</para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_MAKE_ACCESS_WORLD</constant></term>
++ <listitem>
++ <para>Make the bus file world-accessible.</para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++ <listitem>
++ <para>
++ Requests a set of valid flags for this ioctl. When this bit is
++ set, no action is taken; the ioctl will return
++ <errorcode>0</errorcode>, and the <varname>flags</varname>
++ field will have all bits set that are valid for this command.
++ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++ cleared by the operation.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ The following items (see
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>)
++ are expected for <constant>KDBUS_CMD_BUS_MAKE</constant>.
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_MAKE_NAME</constant></term>
++ <listitem>
++ <para>
++ Contains a null-terminated string that identifies the
++ bus. The name must be unique across the kdbus domain and
++ must start with the effective UID of the caller, followed by
++ a '<literal>-</literal>' (dash). This item is mandatory.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_BLOOM_PARAMETER</constant></term>
++ <listitem>
++ <para>
++ Bus-wide bloom parameters passed in a
++ <type>struct kdbus_bloom_parameter</type>. These settings are
++ copied back to new connections verbatim. This item is
++ mandatory. See
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for a more detailed description of this item.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant></term>
++ <listitem>
++ <para>
++ An optional item that contains a set of attach flags that are
++ returned to connections when they query the bus creator
++ metadata. If not set, no metadata is returned.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++ <listitem><para>
++ With this item, programs can <emphasis>probe</emphasis> the
++ kernel for known item types. See
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ Unrecognized items are rejected, and the ioctl will fail with
++ <varname>errno</varname> set to <constant>EINVAL</constant>.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Return value</title>
++ <para>
++ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++ on error, <errorcode>-1</errorcode> is returned, and
++ <varname>errno</varname> is set to indicate the error.
++ If the issued ioctl is illegal for the file descriptor used,
++ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++ </para>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_BUS_MAKE</constant> may fail with the following
++ errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EBADMSG</constant></term>
++ <listitem><para>
++ A mandatory item is missing.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ The flags supplied in the <constant>struct kdbus_cmd</constant>
++ are invalid or the supplied name does not start with the current
++ UID and a '<literal>-</literal>' (dash).
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EEXIST</constant></term>
++ <listitem><para>
++ A bus of that name already exists.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ESHUTDOWN</constant></term>
++ <listitem><para>
++ The kdbus mount instance for the bus was already shut down.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EMFILE</constant></term>
++ <listitem><para>
++ The maximum number of buses for the current user is exhausted.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++ </refsect1>
++
++ <refsect1>
++ <title>See Also</title>
++ <simplelist type="inline">
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.fs</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ </simplelist>
++ </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.connection.xml b/Documentation/kdbus/kdbus.connection.xml
+new file mode 100644
+index 0000000..4bb5f30
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.connection.xml
+@@ -0,0 +1,1244 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.connection">
++
++ <refentryinfo>
++ <title>kdbus.connection</title>
++ <productname>kdbus.connection</productname>
++ </refentryinfo>
++
++ <refmeta>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </refmeta>
++
++ <refnamediv>
++ <refname>kdbus.connection</refname>
++ <refpurpose>kdbus connection</refpurpose>
++ </refnamediv>
++
++ <refsect1>
++ <title>Description</title>
++
++ <para>
++ Connections are identified by their <emphasis>connection ID</emphasis>,
++ internally implemented as a <type>uint64_t</type> counter.
++ The IDs of every newly created bus start at <constant>1</constant>, and
++ every new connection will increment the counter by <constant>1</constant>.
++ The IDs are not reused.
++ </para>
++ <para>
++ In higher level tools, the user visible representation of a connection is
++ defined by the D-Bus protocol specification as
++ <constant>":1.&lt;ID&gt;"</constant>.
++ </para>
++ <para>
++ Messages with a specific <type>uint64_t</type> destination ID are
++ directly delivered to the connection with the corresponding ID. Signal
++ messages (see
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>)
++ may be addressed to the special destination ID
++ <constant>KDBUS_DST_ID_BROADCAST</constant> (~0ULL) and will then
++ potentially be delivered to all currently active connections on the bus.
++ However, in order to receive any signal messages, clients must subscribe
++ to them by installing a match (see
++ <citerefentry>
++ <refentrytitle>kdbus.match</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>).
++ </para>
++ <para>
++ Messages synthesized and sent directly by the kernel will carry the
++ special source ID <constant>KDBUS_SRC_ID_KERNEL</constant> (0).
++ </para>
++ <para>
++ In addition to the unique <type>uint64_t</type> connection ID,
++ established connections can request the ownership of
++ <emphasis>well-known names</emphasis>, under which they can be found and
++ addressed by other bus clients. A well-known name is associated with one
++ and only one connection at a time. See
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ on name acquisition, the name registry, and the validity of names.
++ </para>
++ <para>
++ Messages can specify the special destination ID
++ <constant>KDBUS_DST_ID_NAME</constant> (0) and carry a well-known name
++ in the message data. Such a message is delivered to the destination
++ connection which owns that well-known name.
++ </para>
++
++ <programlisting><![CDATA[
++ +-------------------------------------------------------------------------+
++ | +---------------+ +---------------------------+ |
++ | | Connection | | Message | -----------------+ |
++ | | :1.22 | --> | src: 22 | | |
++ | | | | dst: 25 | | |
++ | | | | | | |
++ | | | | | | |
++ | | | +---------------------------+ | |
++ | | | | |
++ | | | <--------------------------------------+ | |
++ | +---------------+ | | |
++ | | | |
++ | +---------------+ +---------------------------+ | | |
++ | | Connection | | Message | -----+ | |
++ | | :1.25 | --> | src: 25 | | |
++ | | | | dst: 0xffffffffffffffff | -------------+ | |
++ | | | | (KDBUS_DST_ID_BROADCAST) | | | |
++ | | | | | ---------+ | | |
++ | | | +---------------------------+ | | | |
++ | | | | | | |
++ | | | <--------------------------------------------------+ |
++ | +---------------+ | | |
++ | | | |
++ | +---------------+ +---------------------------+ | | |
++ | | Connection | | Message | --+ | | |
++ | | :1.55 | --> | src: 55 | | | | |
++ | | | | dst: 0 / org.foo.bar | | | | |
++ | | | | | | | | |
++ | | | | | | | | |
++ | | | +---------------------------+ | | | |
++ | | | | | | |
++ | | | <------------------------------------------+ | |
++ | +---------------+ | | |
++ | | | |
++ | +---------------+ | | |
++ | | Connection | | | |
++ | | :1.81 | | | |
++ | | org.foo.bar | | | |
++ | | | | | |
++ | | | | | |
++ | | | <-----------------------------------+ | |
++ | | | | |
++ | | | <----------------------------------------------+ |
++ | +---------------+ |
++ +-------------------------------------------------------------------------+
++ ]]></programlisting>
++ </refsect1>
++
++ <refsect1>
++ <title>Privileged connections</title>
++ <para>
++ A connection is considered <emphasis>privileged</emphasis> if the user
++ it was created by is the same that created the bus, or if the creating
++ task had <constant>CAP_IPC_OWNER</constant> set when it called
++ <constant>KDBUS_CMD_HELLO</constant> (see below).
++ </para>
++ <para>
++ Privileged connections have permission to employ certain restricted
++ functions and commands, which are explained below and in other kdbus
++ man-pages.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Activator and policy holder connection</title>
++ <para>
++ An <emphasis>activator</emphasis> connection is a placeholder for a
++ <emphasis>well-known name</emphasis>. Messages sent to such a connection
++ can be used to start an implementer connection, which will then get all
++ the messages from the activator copied over. An activator connection
++ cannot be used to send any message.
++ </para>
++ <para>
++ A <emphasis>policy holder</emphasis> connection only installs a policy
++ for one or more names. These policy entries are kept active as long as
++ the connection is alive, and are removed once it terminates. Such a
++ policy connection type can be used to deploy restrictions for names that
++ are not yet active on the bus. A policy holder connection cannot be used
++ to send any message.
++ </para>
++ <para>
++ The creation of activator or policy holder connections is restricted to
++ privileged users on the bus (see above).
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Monitor connections</title>
++ <para>
++ Monitors are eavesdropping connections that receive all the traffic on the
++ bus, but are invisible to other connections. Such connections have all
++ properties of any other, regular connection, except for the following
++ details:
++ </para>
++
++ <itemizedlist>
++ <listitem><para>
++ They will get every message sent over the bus, both unicasts and
++ broadcasts.
++ </para></listitem>
++
++ <listitem><para>
++ Installing matches for signal messages is neither necessary
++ nor allowed.
++ </para></listitem>
++
++ <listitem><para>
++ They cannot send messages or be directly addressed as receiver.
++ </para></listitem>
++
++ <listitem><para>
++ They cannot own well-known names. Therefore, they also can't operate as
++ activators.
++ </para></listitem>
++
++ <listitem><para>
++ Their creation and destruction will not cause
++ <constant>KDBUS_ITEM_ID_{ADD,REMOVE}</constant> notifications (see
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>).
++ </para></listitem>
++
++ <listitem><para>
++ They are not listed with their unique name in name registry dumps
++ (see <constant>KDBUS_CMD_NAME_LIST</constant> in
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>), so other connections cannot detect the presence of
++ a monitor.
++ </para></listitem>
++ </itemizedlist>
++ <para>
++ The creation of monitor connections is restricted to privileged users on
++ the bus (see above).
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Creating connections</title>
++ <para>
++ A connection to a bus is created by opening an endpoint file (see
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>)
++ of a bus and becoming an active client with the
++ <constant>KDBUS_CMD_HELLO</constant> ioctl. Every connection has a unique
++ identifier on the bus and can address messages to every other connection
++ on the same bus by using the peer's connection ID as the destination.
++ </para>
++ <para>
++ The <constant>KDBUS_CMD_HELLO</constant> ioctl takes a <type>struct
++ kdbus_cmd_hello</type> as argument.
++ </para>
++
++ <programlisting>
++struct kdbus_cmd_hello {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __u64 attach_flags_send;
++ __u64 attach_flags_recv;
++ __u64 bus_flags;
++ __u64 id;
++ __u64 pool_size;
++ __u64 offset;
++ __u8 id128[16];
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem>
++ <para>Flags to apply to this connection</para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_HELLO_ACCEPT_FD</constant></term>
++ <listitem>
++ <para>
++ When this flag is set, the connection can be sent file
++ descriptors as message payload of unicast messages. If it's
++ not set, an attempt to send file descriptors will result in
++ <constant>-ECOMM</constant> on the sender's side.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_HELLO_ACTIVATOR</constant></term>
++ <listitem>
++ <para>
++ Make this connection an activator (see above). With this bit
++ set, an item of type <constant>KDBUS_ITEM_NAME</constant> has
++ to be attached. This item describes the well-known name this
++ connection should be an activator for.
++ A connection cannot be an activator and a policy holder at
++ the same time, so this bit is not allowed together with
++ <constant>KDBUS_HELLO_POLICY_HOLDER</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_HELLO_POLICY_HOLDER</constant></term>
++ <listitem>
++ <para>
++ Make this connection a policy holder (see above). With this
++ bit set, an item of type <constant>KDBUS_ITEM_NAME</constant>
++ has to be attached. This item describes the well-known name
++ this connection should hold a policy for.
++ A connection cannot be an activator and a policy holder at
++ the same time, so this bit is not allowed together with
++ <constant>KDBUS_HELLO_ACTIVATOR</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_HELLO_MONITOR</constant></term>
++ <listitem>
++ <para>
++ Make this connection a monitor connection (see above).
++ </para>
++ <para>
++ This flag can only be set by privileged bus connections. See
++ below for more information.
++ A connection cannot be a monitor and an activator or a policy
++ holder at the same time, so this bit is not allowed
++ together with <constant>KDBUS_HELLO_ACTIVATOR</constant> or
++ <constant>KDBUS_HELLO_POLICY_HOLDER</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++ <listitem>
++ <para>
++ Requests a set of valid flags for this ioctl. When this bit is
++ set, no action is taken; the ioctl will return
++ <errorcode>0</errorcode>, and the <varname>flags</varname>
++ field will have all bits set that are valid for this command.
++ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++ cleared by the operation.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>attach_flags_send</varname></term>
++ <listitem><para>
++ Set the bits for metadata this connection permits to be sent to the
++ receiving peer. Only metadata items that are both allowed to be sent
++ by the sender and that are requested by the receiver will be attached
++ to the message.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>attach_flags_recv</varname></term>
++ <listitem><para>
++ Request the attachment of metadata for each message received by this
++ connection. See
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for information about metadata, and
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ regarding items in general.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>bus_flags</varname></term>
++ <listitem><para>
++ Upon successful completion of the ioctl, this member will contain the
++ flags of the bus it connected to.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>id</varname></term>
++ <listitem><para>
++ Upon successful completion of the command, this member will contain
++ the numerical ID of the new connection.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>pool_size</varname></term>
++ <listitem><para>
++ The size of the communication pool, in bytes. The pool can be
++ accessed by calling
++ <citerefentry>
++ <refentrytitle>mmap</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ on the file descriptor that was used to issue the
++ <constant>KDBUS_CMD_HELLO</constant> ioctl.
++ The pool size of a connection must be greater than
++ <constant>0</constant> and a multiple of
++ <constant>PAGE_SIZE</constant>. See
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>offset</varname></term>
++ <listitem><para>
++ The kernel will return the offset in the pool where returned details
++ will be stored. See below.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>id128</varname></term>
++ <listitem><para>
++ Upon successful completion of the ioctl, this member will contain the
++ <emphasis>128-bit UUID</emphasis> of the connected bus.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ Variable list of items containing optional additional information.
++ The following items are currently expected/valid:
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_CONN_DESCRIPTION</constant></term>
++ <listitem>
++ <para>
++ Contains a string that describes this connection, so it can
++ be identified later.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NAME</constant></term>
++ <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
++ <listitem>
++ <para>
++ For activators and policy holders only, combinations of
++ these two items describe policy access entries. See
++ <citerefentry>
++ <refentrytitle>kdbus.policy</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for further details.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_CREDS</constant></term>
++ <term><constant>KDBUS_ITEM_PIDS</constant></term>
++ <term><constant>KDBUS_ITEM_SECLABEL</constant></term>
++ <listitem>
++ <para>
++ Privileged bus users may submit these types in order to
++ create connections with faked credentials. This information
++ will be returned when peer information is queried by
++ <constant>KDBUS_CMD_CONN_INFO</constant>. See below for more
++ information on retrieving information on connections.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++ <listitem><para>
++ With this item, programs can <emphasis>probe</emphasis> the
++ kernel for known item types. See
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ Unrecognized items are rejected, and the ioctl will fail with
++ <varname>errno</varname> set to <constant>EINVAL</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ At the offset returned in the <varname>offset</varname> field of
++ <type>struct kdbus_cmd_hello</type>, the kernel will store items
++ of the following types:
++ </para>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_BLOOM_PARAMETER</constant></term>
++ <listitem>
++ <para>
++ Bloom filter parameter as defined by the bus creator.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ The offset in the pool has to be freed with the
++ <constant>KDBUS_CMD_FREE</constant> ioctl. See
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for further information.
++ </para>
++ </refsect1>
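The constraints on KDBUS_CMD_HELLO described above (a non-zero, page-aligned pool size, and at most one of the activator, policy holder and monitor roles) can be checked in user space before issuing the ioctl. The following is a minimal sketch under stated assumptions, not the kernel's implementation: the helper name hello_precheck is hypothetical, the flag values are illustrative stand-ins for the real UAPI header, and the order of the checks is a choice made here. The error codes follow the "Return value" section of this page.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative values only; the authoritative ones come from the kdbus UAPI header. */
#define KDBUS_HELLO_ACCEPT_FD     (1ULL << 0)
#define KDBUS_HELLO_ACTIVATOR     (1ULL << 1)
#define KDBUS_HELLO_POLICY_HOLDER (1ULL << 2)
#define KDBUS_HELLO_MONITOR       (1ULL << 3)

/*
 * Hypothetical helper: mirrors the documented HELLO constraints.
 * Returns 0 if the request looks acceptable, or a negative errno value.
 */
int hello_precheck(uint64_t flags, uint64_t pool_size, uint64_t page_size)
{
	const uint64_t role_mask = KDBUS_HELLO_ACTIVATOR |
				   KDBUS_HELLO_POLICY_HOLDER |
				   KDBUS_HELLO_MONITOR;
	uint64_t roles = flags & role_mask;

	/* The pool size must be greater than 0 and a multiple of the page size. */
	if (page_size == 0 || pool_size == 0 || pool_size % page_size != 0)
		return -EFAULT;

	/* More than one role bit set: roles & (roles - 1) is non-zero. */
	if (roles & (roles - 1))
		return -EINVAL;

	return 0;
}
```

A caller would typically pass the value of sysconf(_SC_PAGESIZE) as page_size and only proceed to the ioctl when the helper returns 0. The kernel remains the final authority; this check only catches the mistakes listed on this page early.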
++
++ <refsect1>
++ <title>Retrieving information on a connection</title>
++ <para>
++ The <constant>KDBUS_CMD_CONN_INFO</constant> ioctl can be used to
++ retrieve credentials and properties of the initial creator of a
++ connection. This ioctl uses the following struct.
++ </para>
++
++ <programlisting>
++struct kdbus_cmd_info {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __u64 id;
++ __u64 attach_flags;
++ __u64 offset;
++ __u64 info_size;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>
++ Currently, no flags are supported.
++ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++ valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
++ and the <varname>flags</varname> field is set to
++ <constant>0</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>id</varname></term>
++ <listitem><para>
++ The numerical ID of the connection for which information is to be
++ retrieved. If set to a non-zero value, the
++ <constant>KDBUS_ITEM_OWNED_NAME</constant> item is ignored.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>attach_flags</varname></term>
++ <listitem><para>
++ Specifies which metadata items should be attached to the answer. See
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>offset</varname></term>
++ <listitem><para>
++ When the ioctl returns, this field will contain the offset of the
++ connection information inside the caller's pool. See
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for further information.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>info_size</varname></term>
++ <listitem><para>
++ The kernel will return the size of the returned information, so
++ applications can optionally
++ <citerefentry>
++ <refentrytitle>mmap</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ specific parts of the pool. See
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for further information.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ The following items are expected for
++ <constant>KDBUS_CMD_CONN_INFO</constant>.
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_OWNED_NAME</constant></term>
++ <listitem>
++ <para>
++ Contains the well-known name of the connection to look up.
++ This item is mandatory if the <varname>id</varname> field is
++ set to 0.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++ <listitem><para>
++ With this item, programs can <emphasis>probe</emphasis> the
++ kernel for known item types. See
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ <para>
++ Unrecognized items are rejected, and the ioctl will fail with
++ <varname>errno</varname> set to <constant>EINVAL</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ When the ioctl returns, the following struct will be stored in the
++ caller's pool at <varname>offset</varname>. The fields in this struct
++ are described below.
++ </para>
++
++ <programlisting>
++struct kdbus_info {
++ __u64 size;
++ __u64 id;
++ __u64 flags;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>id</varname></term>
++ <listitem><para>
++ The connection's unique ID.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>
++ The connection's flags as specified when it was created.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ Depending on the <varname>attach_flags</varname> field in
++ <type>struct kdbus_cmd_info</type>, items of types
++ <constant>KDBUS_ITEM_OWNED_NAME</constant> and
++ <constant>KDBUS_ITEM_CONN_DESCRIPTION</constant> may follow here.
++ <constant>KDBUS_ITEM_NEGOTIATE</constant> is also allowed.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ Once the caller is finished with parsing the return buffer, it needs to
++ employ the <constant>KDBUS_CMD_FREE</constant> command for the offset, in
++ order to free the buffer part. See
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for further information.
++ </para>
++ </refsect1>
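The lookup rule for KDBUS_CMD_CONN_INFO described above (a non-zero id takes precedence and the KDBUS_ITEM_OWNED_NAME item is ignored; with id set to 0 the name item is mandatory) can be summarised in a few lines of C. This is only a sketch of the documented rule: the enum and the function name conn_info_lookup_mode are hypothetical, and it does not validate the name's syntax, which the kernel additionally rejects with EINVAL.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for illustration. */
enum lookup_mode {
	LOOKUP_BY_ID   = 1,
	LOOKUP_BY_NAME = 2,
};

/*
 * Decide how a CONN_INFO request would be resolved, following the
 * description of the id field and the KDBUS_ITEM_OWNED_NAME item.
 * owned_name is NULL when no such item is attached.
 */
int conn_info_lookup_mode(uint64_t id, const char *owned_name)
{
	if (id != 0)
		return LOOKUP_BY_ID;	/* any name item is ignored */

	if (owned_name != NULL && owned_name[0] != '\0')
		return LOOKUP_BY_NAME;

	return -EINVAL;			/* neither an ID nor a name */
}
```

For example, a request with id 0 and the name "org.example.Service" (a made-up name) resolves by name, while a request with neither is rejected the same way the ioctl is documented to fail.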
++
++ <refsect1>
++ <title>Getting information about a connection's bus creator</title>
++ <para>
++ The <constant>KDBUS_CMD_BUS_CREATOR_INFO</constant> ioctl takes the same
++ struct as <constant>KDBUS_CMD_CONN_INFO</constant>, but is used to
++ retrieve information about the creator of the bus the connection is
++ attached to. The metadata returned by this call is collected during the
++ creation of the bus and is never altered afterwards, so it provides
++ pristine information on the task that created the bus, at the moment when
++ it did so.
++ </para>
++ <para>
++ In response to this call, a slice in the connection's pool is allocated
++ and filled with an object of type <type>struct kdbus_info</type>,
++ pointed to by the ioctl's <varname>offset</varname> field.
++ </para>
++
++ <programlisting>
++struct kdbus_info {
++ __u64 size;
++ __u64 id;
++ __u64 flags;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>id</varname></term>
++ <listitem><para>
++ The bus ID.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>
++ The bus flags as specified when it was created.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ Metadata information is stored in items here. The item list
++ contains a <constant>KDBUS_ITEM_MAKE_NAME</constant> item that
++ indicates the bus name of the calling connection.
++ <constant>KDBUS_ITEM_NEGOTIATE</constant> is allowed to probe
++ for known item types.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ Once the caller is finished with parsing the return buffer, it needs to
++ employ the <constant>KDBUS_CMD_FREE</constant> command for the offset, in
++ order to free the buffer part. See
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for further information.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Updating connection details</title>
++ <para>
++ Some of a connection's details can be updated with the
++ <constant>KDBUS_CMD_CONN_UPDATE</constant> ioctl, using the file
++ descriptor that was used to create the connection. The update command
++ uses the following struct.
++ </para>
++
++ <programlisting>
++struct kdbus_cmd {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>
++ Currently, no flags are supported.
++ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++ valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
++ and the <varname>flags</varname> field is set to
++ <constant>0</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ Items to describe the connection details to be updated. The
++ following item types are supported.
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant></term>
++ <listitem>
++ <para>
++ Supply a new set of metadata items that this connection
++ permits to be sent along with messages.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_ATTACH_FLAGS_RECV</constant></term>
++ <listitem>
++ <para>
++ Supply a new set of metadata items that this connection
++ requests to be attached to each message.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NAME</constant></term>
++ <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
++ <listitem>
++ <para>
++ Policy holder connections may supply a new set of policy
++ information with these items. For other connection types,
++ <constant>EOPNOTSUPP</constant> is returned in
++ <varname>errno</varname>.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++ <listitem><para>
++ With this item, programs can <emphasis>probe</emphasis> the
++ kernel for known item types. See
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ Unrecognized items are rejected, and the ioctl will fail with
++ <varname>errno</varname> set to <constant>EINVAL</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect1>
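Every command struct on this page starts with a size field that covers the fixed header plus all attached items. The sketch below shows how that value could be computed for struct kdbus_cmd. It rests on assumptions that are not stated in this section: that each item begins with a 64-bit size and a 64-bit type, and that items are padded to 8-byte boundaries, as described in kdbus.item(7). The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed part of struct kdbus_cmd as listed above; items[] follows it. */
struct cmd_header {
	uint64_t size;
	uint64_t flags;
	uint64_t return_flags;
};

/* Assumed item header: size and type, followed by the payload. */
struct item_header {
	uint64_t size;
	uint64_t type;
};

/* Round n up to the next multiple of 8 (assumed item alignment). */
uint64_t align8(uint64_t n)
{
	return (n + 7) & ~(uint64_t)7;
}

/* Total command size for `count` items with the given payload sizes in bytes. */
uint64_t cmd_total_size(const uint64_t *payload_sizes, size_t count)
{
	uint64_t total = sizeof(struct cmd_header);

	for (size_t i = 0; i < count; i++)
		total += align8(sizeof(struct item_header) + payload_sizes[i]);

	return total;
}
```

Under these assumptions, a KDBUS_CMD_CONN_UPDATE request carrying a single KDBUS_ITEM_ATTACH_FLAGS_SEND item with a 64-bit payload would use a size of 24 + 24 = 48 bytes.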
++
++ <refsect1>
++ <title>Termination of connections</title>
++ <para>
++ A connection can be terminated by simply calling
++ <citerefentry>
++ <refentrytitle>close</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ on its file descriptor. All pending incoming messages will be discarded,
++ and the memory allocated by the pool will be freed.
++ </para>
++
++ <para>
++ An alternative way of closing down a connection is via the
++ <constant>KDBUS_CMD_BYEBYE</constant> ioctl. This ioctl will succeed only
++ if the message queue of the connection is empty at the time of closing;
++ otherwise, the ioctl will fail with <varname>errno</varname> set to
++ <constant>EBUSY</constant>. When this ioctl returns
++ successfully, the connection has been terminated and won't accept any new
++ messages from remote peers. This way, a connection can be terminated
++ race-free, without losing any messages. The ioctl takes an argument of
++ type <type>struct kdbus_cmd</type>.
++ </para>
++
++ <programlisting>
++struct kdbus_cmd {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>
++ Currently, no flags are supported.
++ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++ valid flags. If set, the ioctl will fail with
++ <varname>errno</varname> set to <constant>EPROTO</constant>, and
++ the <varname>flags</varname> field is set to <constant>0</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ The following item types are supported.
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++ <listitem><para>
++ With this item, programs can <emphasis>probe</emphasis> the
++ kernel for known item types. See
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ Unrecognized items are rejected, and the ioctl will fail with
++ <varname>errno</varname> set to <constant>EINVAL</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect1>
++
++ <refsect1>
++ <title>Return value</title>
++ <para>
++ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++ on error, <errorcode>-1</errorcode> is returned, and
++ <varname>errno</varname> is set to indicate the error.
++ If the issued ioctl is illegal for the file descriptor used,
++ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++ </para>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_HELLO</constant> may fail with the following
++ errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EFAULT</constant></term>
++ <listitem><para>
++ The supplied pool size was 0 or not a multiple of the page size.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ The flags supplied in <type>struct kdbus_cmd_hello</type>
++ are invalid.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ An illegal combination of
++ <constant>KDBUS_HELLO_MONITOR</constant>,
++ <constant>KDBUS_HELLO_ACTIVATOR</constant> and
++ <constant>KDBUS_HELLO_POLICY_HOLDER</constant> was passed in
++ <varname>flags</varname>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ An invalid set of items was supplied.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ECONNREFUSED</constant></term>
++ <listitem><para>
++ The attach_flags_send field did not satisfy the requirements of
++ the bus.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EPERM</constant></term>
++ <listitem><para>
++ A <constant>KDBUS_ITEM_CREDS</constant> item was supplied, but the
++ current user is not privileged.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ESHUTDOWN</constant></term>
++ <listitem><para>
++ The bus you were trying to connect to has already been shut down.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EMFILE</constant></term>
++ <listitem><para>
++ The maximum number of connections on the bus has been reached.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EOPNOTSUPP</constant></term>
++ <listitem><para>
++ The endpoint does not support the connection flags supplied in
++ <type>struct kdbus_cmd_hello</type>.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_BYEBYE</constant> may fail with the following
++ errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EALREADY</constant></term>
++ <listitem><para>
++ The connection has already been shut down.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EBUSY</constant></term>
++ <listitem><para>
++ There are still messages queued up in the connection's pool.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_CONN_INFO</constant> may fail with the following
++ errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ Invalid flags, or neither an ID nor a name was provided, or the
++ name is invalid.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ESRCH</constant></term>
++ <listitem><para>
++ Connection lookup by name failed.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ENXIO</constant></term>
++ <listitem><para>
++ No connection with the provided connection ID found.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_CONN_UPDATE</constant> may fail with the following
++ errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ Illegal flags or items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ Wildcards submitted in policy entries, or illegal sequence
++ of policy items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EOPNOTSUPP</constant></term>
++ <listitem><para>
++ Operation not supported by connection.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>E2BIG</constant></term>
++ <listitem><para>
++ Too many policy items attached.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++ </refsect1>
++
++ <refsect1>
++ <title>See Also</title>
++ <simplelist type="inline">
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.policy</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ </simplelist>
++ </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.endpoint.xml b/Documentation/kdbus/kdbus.endpoint.xml
+new file mode 100644
+index 0000000..6632485
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.endpoint.xml
+@@ -0,0 +1,429 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.endpoint">
++
++ <refentryinfo>
++ <title>kdbus.endpoint</title>
++ <productname>kdbus.endpoint</productname>
++ </refentryinfo>
++
++ <refmeta>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </refmeta>
++
++ <refnamediv>
++ <refname>kdbus.endpoint</refname>
++ <refpurpose>kdbus endpoint</refpurpose>
++ </refnamediv>
++
++ <refsect1>
++ <title>Description</title>
++
++ <para>
++ Endpoints are entry points to a bus (see
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>).
++ By default, each bus has a default
++ endpoint called 'bus'. The bus owner has the ability to create custom
++ endpoints with specific names, permissions, and policy databases
++ (see below). An endpoint is presented as a file underneath the directory
++ of the parent bus.
++ </para>
++ <para>
++ To create a custom endpoint, open the default endpoint
++ (<literal>bus</literal>) and use the
++ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> ioctl with
++ <type>struct kdbus_cmd</type>. Custom endpoints always have a policy
++ database that, by default, forbids any operation. You have to explicitly
++ install policy entries to allow any operation on this endpoint.
++ </para>
++ <para>
++ Once <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> has succeeded, the new
++ endpoint will appear in the filesystem
++ (<citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>), and the used file descriptor will manage the
++ newly created endpoint resource. It cannot be used to manage further
++ resources and must be kept open as long as the endpoint is needed. The
++ endpoint will be terminated as soon as the file descriptor is closed.
++ </para>
++ <para>
++ Endpoint names may be chosen freely except for one restriction: the name
++ must be prefixed with the numeric effective UID of the creator and a dash.
++ This is required to avoid namespace clashes between different users. When
++ creating an endpoint, the name that is passed in must be properly
++ formatted or the kernel will refuse creation of the endpoint. Example:
++ <literal>1047-my-endpoint</literal> is an acceptable name for an
++ endpoint registered by a user with UID 1047. However,
++ <literal>1024-my-endpoint</literal> is not, and neither is
++ <literal>my-endpoint</literal>. The UID must be provided in the
++ user-namespace of the bus.
++ </para>
++ <para>
++ To create connections to a bus, use <constant>KDBUS_CMD_HELLO</constant>
++ on a file descriptor returned by <function>open()</function> on an
++ endpoint node. See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for further details.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Creating custom endpoints</title>
++ <para>
++ To create a new endpoint, the
++ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> command is used. Along with
++ the endpoint's name, which will be used to expose the endpoint in the
++ <citerefentry>
++ <refentrytitle>kdbus.fs</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>,
++ the command also optionally takes items to set up the endpoint's
++ <citerefentry>
++ <refentrytitle>kdbus.policy</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> takes a
++ <type>struct kdbus_cmd</type> argument.
++ </para>
++ <programlisting>
++struct kdbus_cmd {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>The flags for creation.</para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_MAKE_ACCESS_GROUP</constant></term>
++ <listitem>
++ <para>Make the endpoint file group-accessible.</para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_MAKE_ACCESS_WORLD</constant></term>
++ <listitem>
++ <para>Make the endpoint file world-accessible.</para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++ <listitem>
++ <para>
++ Requests a set of valid flags for this ioctl. When this bit is
++ set, no action is taken; the ioctl will return
++ <errorcode>0</errorcode>, and the <varname>flags</varname>
++ field will have all bits set that are valid for this command.
++ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++ cleared by the operation.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ The following items are expected for
++ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>.
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_MAKE_NAME</constant></term>
++ <listitem>
++ <para>Contains a string to identify the endpoint name.</para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NAME</constant></term>
++ <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
++ <listitem>
++ <para>
++ These items are used to set the policy attached to the
++ endpoint. For more details on bus and endpoint policies, see
++ <citerefentry>
++ <refentrytitle>kdbus.policy</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ <para>
++ Unrecognized items are rejected, and the ioctl will fail with
++ <varname>errno</varname> set to <constant>EINVAL</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
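++ <para>
++ As an illustration, the buffer passed to
++ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> could be laid out as
++ sketched below. All example_ names and the EXAMPLE_ITEM_MAKE_NAME value
++ are placeholders chosen for this sketch; real programs would take the
++ struct definitions and item type values from the kdbus UAPI header. The
++ sketch only assembles the memory layout and does not issue the ioctl.
++ </para>
++ <programlisting><![CDATA[
```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder value; the real one comes from the kdbus UAPI header. */
#define EXAMPLE_ITEM_MAKE_NAME 0x1000ULL

/* Local mirrors of the fixed-size headers described in this document. */
struct example_cmd_hdr {
	uint64_t size;
	uint64_t flags;
	uint64_t return_flags;
};

struct example_item_hdr {
	uint64_t size;
	uint64_t type;
};

/*
 * Allocate a command buffer that carries a single name item.
 * The caller owns the returned buffer and releases it with free().
 */
static uint8_t *example_endpoint_make_cmd(const char *name, uint64_t flags)
{
	struct example_cmd_hdr cmd;
	struct example_item_hdr item;
	size_t name_len = strlen(name) + 1;	/* include the terminator */
	uint8_t *buf;

	item.size = sizeof(item) + name_len;
	item.type = EXAMPLE_ITEM_MAKE_NAME;

	/* Only one item follows, so no padding is added after it. */
	cmd.size = sizeof(cmd) + item.size;
	cmd.flags = flags;
	cmd.return_flags = 0;

	buf = calloc(1, (size_t)cmd.size);
	if (!buf)
		return NULL;

	memcpy(buf, &cmd, sizeof(cmd));
	memcpy(buf + sizeof(cmd), &item, sizeof(item));
	memcpy(buf + sizeof(cmd) + sizeof(item), name, name_len);
	return buf;
}
```
++ ]]></programlisting>
++ <para>
++ Policy items, if any, would be appended after the name item, with each
++ preceding item padded to an 8-byte boundary.
++ </para>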
++ </refsect1>
++
++ <refsect1>
++ <title>Updating endpoints</title>
++ <para>
++ To update an existing endpoint, the
++ <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant> command is used on the file
++ descriptor that was used to create the endpoint, using
++ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>. The only relevant detail of
++ the endpoint that can be updated is the policy. When the command is
++ employed, the policy of the endpoint is <emphasis>replaced</emphasis>
++ atomically with the new set of rules.
++ The command takes a <type>struct kdbus_cmd</type> argument.
++ </para>
++ <programlisting>
++struct kdbus_cmd {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>
++ Unused for this command.
++ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++ valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
++ and the <varname>flags</varname> field is set to
++ <constant>0</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ The following items are expected for
++ <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant>.
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NAME</constant></term>
++ <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
++ <listitem>
++ <para>
++ These items are used to set the policy attached to the
++ endpoint. For more details on bus and endpoint policies, see
++ <citerefentry>
++ <refentrytitle>kdbus.policy</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ Existing policy is atomically replaced with the new rules
++ provided.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++ <listitem><para>
++ With this item, programs can <emphasis>probe</emphasis> the
++ kernel for known item types. See
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ <para>
++ Unrecognized items are rejected, and the ioctl will fail with
++ <varname>errno</varname> set to <constant>EINVAL</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect1>
++
++ <refsect1>
++ <title>Return value</title>
++ <para>
++ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++ on error, <errorcode>-1</errorcode> is returned, and
++ <varname>errno</varname> is set to indicate the error.
++ If the issued ioctl is illegal for the file descriptor used,
++ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++ </para>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> may fail with the
++ following errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ The flags supplied in the <type>struct kdbus_cmd</type>
++ are invalid.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ Illegal combination of <constant>KDBUS_ITEM_NAME</constant> and
++ <constant>KDBUS_ITEM_POLICY_ACCESS</constant> was provided.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EEXIST</constant></term>
++ <listitem><para>
++ An endpoint of that name already exists.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EPERM</constant></term>
++ <listitem><para>
++ The calling user is not privileged. See
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for information about privileged users.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant> may fail with the
++ following errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ The flags supplied in <type>struct kdbus_cmd</type>
++ are invalid.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ Illegal combination of <constant>KDBUS_ITEM_NAME</constant> and
++ <constant>KDBUS_ITEM_POLICY_ACCESS</constant> was provided.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++ </refsect1>
++
++ <refsect1>
++ <title>See Also</title>
++ <simplelist type="inline">
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.fs</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ </simplelist>
++ </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.fs.xml b/Documentation/kdbus/kdbus.fs.xml
+new file mode 100644
+index 0000000..8c2a90e
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.fs.xml
+@@ -0,0 +1,124 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus_fs">
++
++ <refentryinfo>
++ <title>kdbus.fs</title>
++ <productname>kdbus.fs</productname>
++ </refentryinfo>
++
++ <refmeta>
++ <refentrytitle>kdbus.fs</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </refmeta>
++
++ <refnamediv>
++ <refname>kdbus.fs</refname>
++ <refpurpose>kdbus file system</refpurpose>
++ </refnamediv>
++
++ <refsect1>
++ <title>File-system Layout</title>
++
++ <para>
++ The <emphasis>kdbusfs</emphasis> pseudo filesystem provides access to
++ kdbus entities, such as <emphasis>buses</emphasis> and
++ <emphasis>endpoints</emphasis>. Each time the filesystem is mounted,
++ a new, isolated kdbus instance is created, which is independent from the
++ other instances.
++ </para>
++ <para>
++ The system-wide standard mount point for <emphasis>kdbusfs</emphasis> is
++ <constant>/sys/fs/kdbus</constant>.
++ </para>
++
++ <para>
++ Buses are represented as directories in the file system layout, whereas
++ endpoints are exposed as files inside these directories. At the top-level,
++ a <emphasis>control</emphasis> node is present, which can be opened to
++ create new buses via the <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl.
++ Each <emphasis>bus</emphasis> shows a default endpoint called
++ <varname>bus</varname>, which can be opened to either create a connection
++ with the <constant>KDBUS_CMD_HELLO</constant> ioctl, or to create new
++ custom endpoints for the bus with
++ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>. See
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>,
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry> and
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++
++ <para>The following shows an example layout of the
++ <emphasis>kdbusfs</emphasis> filesystem:</para>
++
++<programlisting>
++ /sys/fs/kdbus/ ; mount-point
++ |-- 0-system ; bus directory
++ | |-- bus ; default endpoint
++ | `-- 1017-custom ; custom endpoint
++ |-- 1000-user ; bus directory
++ | |-- bus ; default endpoint
++ | |-- 1000-service-A ; custom endpoint
++ | `-- 1000-service-B ; custom endpoint
++ `-- control ; control file
++</programlisting>
++ </refsect1>
++
++ <refsect1>
++ <title>Mounting instances</title>
++ <para>
++ In order to get a new and separate kdbus environment, a new instance
++ of <emphasis>kdbusfs</emphasis> can be mounted like this:
++ </para>
++<programlisting>
++ # mount -t kdbusfs kdbusfs /tmp/new_kdbus/
++</programlisting>
++ </refsect1>
++
++ <refsect1>
++ <title>See Also</title>
++ <simplelist type="inline">
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>mount</refentrytitle>
++ <manvolnum>8</manvolnum>
++ </citerefentry>
++ </member>
++ </simplelist>
++ </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.item.xml b/Documentation/kdbus/kdbus.item.xml
+new file mode 100644
+index 0000000..ee09dfa
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.item.xml
+@@ -0,0 +1,839 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus">
++
++ <refentryinfo>
++ <title>kdbus.item</title>
++ <productname>kdbus item</productname>
++ </refentryinfo>
++
++ <refmeta>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </refmeta>
++
++ <refnamediv>
++ <refname>kdbus.item</refname>
++ <refpurpose>kdbus item structure, layout and usage</refpurpose>
++ </refnamediv>
++
++ <refsect1>
++ <title>Description</title>
++
++ <para>
++ To flexibly augment transport structures, data blobs of type
++ <type>struct kdbus_item</type> can be attached to the structs passed
++ into the ioctls. Some ioctls make items of certain types mandatory,
++ others are optional. Items that are unsupported by ioctls they are
++ attached to will cause the ioctl to fail with <varname>errno</varname>
++ set to <constant>EINVAL</constant>.
++ Items are also used for information stored in a connection's
++ <emphasis>pool</emphasis>, such as received messages, name lists or
++ requested connection or bus owner information. Depending on the type of
++ an item, its total size is either fixed or variable.
++ </para>
++
++ <refsect2>
++ <title>Chaining items</title>
++ <para>
++ Whenever items are used as part of the kdbus kernel API, they are
++ embedded in structs that themselves include a size field containing
++ the overall size of the structure.
++ This allows multiple items to be chained up, and an item iterator
++ (see below) is capable of detecting the end of an item chain.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Alignment</title>
++ <para>
++ The kernel expects all items to be aligned to 8-byte boundaries.
++ Unaligned items will cause the ioctl they are used with to fail
++ with <varname>errno</varname> set to <constant>EINVAL</constant>.
++ An item that has an unaligned size itself hence needs to be padded
++ if it is followed by another item.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Iterating items</title>
++ <para>
++ A simple iterator would iterate over the items until the items have
++ reached the embedding structure's overall size. An example
++ implementation is shown below.
++ </para>
++
++ <programlisting><![CDATA[
++#define KDBUS_ALIGN8(val) (((val) + 7) & ~7)
++
++#define KDBUS_ITEM_NEXT(item) \
++ (typeof(item))((uint8_t *)(item) + KDBUS_ALIGN8((item)->size))
++
++#define KDBUS_ITEM_FOREACH(item, head, first) \
++ for ((item) = (head)->first; \
++ ((uint8_t *)(item) < (uint8_t *)(head) + (head)->size) && \
++ ((uint8_t *)(item) >= (uint8_t *)(head)); \
++ (item) = KDBUS_ITEM_NEXT(item))
++ ]]></programlisting>
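++ <para>
++ The same walk can also be written as a plain function that checks each
++ item's size before advancing. This is a sketch under assumptions: the
++ names example_item_hdr and example_count_items() are hypothetical, and
++ the bounds checks go beyond what the macros above do.
++ </para>
++ <programlisting><![CDATA[
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KDBUS_ALIGN8(val) (((val) + 7) & ~(uint64_t)7)

/* Local mirror of the fixed part of an item: size and type. */
struct example_item_hdr {
	uint64_t size;
	uint64_t type;
};

/*
 * Count the items stored in the 'len' bytes starting at 'items'.
 * Returns -1 if an item header is truncated or reports a size that is
 * smaller than the header or larger than the remaining space.
 */
static int example_count_items(const uint8_t *items, uint64_t len)
{
	uint64_t off = 0;
	int count = 0;

	while (off < len) {
		struct example_item_hdr hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, items + off, sizeof(hdr));

		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;

		count++;
		/* The next item starts at the next 8-byte boundary. */
		off += KDBUS_ALIGN8(hdr.size);
	}

	return count;
}
```
++ ]]></programlisting>
++ <para>
++ For a chain made of a 17-byte item, padded to 24 bytes, followed by a
++ 16-byte item, the function visits two items within an area of 40 bytes.
++ </para>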
++ </refsect2>
++ </refsect1>
++
++ <refsect1>
++ <title>Item layout</title>
++ <para>
++ A <type>struct kdbus_item</type> consists of a
++ <varname>size</varname> field, describing its overall size, and a
++ <varname>type</varname> field, both 64 bits wide. They are followed by
++ a union to store information that is specific to the item's type.
++ The struct layout is shown below.
++ </para>
++
++ <programlisting>
++struct kdbus_item {
++ __u64 size;
++ __u64 type;
++ /* item payload - see below */
++ union {
++ __u8 data[0];
++ __u32 data32[0];
++ __u64 data64[0];
++ char str[0];
++
++ __u64 id;
++ struct kdbus_vec vec;
++ struct kdbus_creds creds;
++ struct kdbus_pids pids;
++ struct kdbus_audit audit;
++ struct kdbus_caps caps;
++ struct kdbus_timestamp timestamp;
++ struct kdbus_name name;
++ struct kdbus_bloom_parameter bloom_parameter;
++ struct kdbus_bloom_filter bloom_filter;
++ struct kdbus_memfd memfd;
++ int fds[0];
++ struct kdbus_notify_name_change name_change;
++ struct kdbus_notify_id_change id_change;
++ struct kdbus_policy_access policy_access;
++ };
++};
++ </programlisting>
++
++ <para>
++ <type>struct kdbus_item</type> should never be used to allocate
++ an item instance, as its size may grow in future releases of the API.
++ Instead, it should be manually assembled by storing the
++ <varname>size</varname>, <varname>type</varname> and payload to a
++ struct of its own.
++ </para>
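++ <para>
++ A minimal sketch of such a manually assembled item is shown below for an
++ item that carries a single 64-bit ID. The struct name, the helper
++ example_make_id_item() and the EXAMPLE_ITEM_ID value are placeholders
++ for illustration; the real type value comes from the kdbus UAPI header.
++ </para>
++ <programlisting><![CDATA[
```c
#include <stdint.h>

/* Placeholder value; the real one comes from the kdbus UAPI header. */
#define EXAMPLE_ITEM_ID 0x1001ULL

/* A struct of its own: item header followed by exactly one payload. */
struct example_id_item {
	uint64_t size;
	uint64_t type;
	uint64_t id;
};

/* Fill in size, type and payload for an item carrying 'id'. */
static struct example_id_item example_make_id_item(uint64_t id)
{
	struct example_id_item item = {
		.size = sizeof(item),
		.type = EXAMPLE_ITEM_ID,
		.id = id,
	};

	return item;
}
```
++ ]]></programlisting>
++ <para>
++ Because the struct holds three 64-bit fields, its size is 24 bytes and
++ already a multiple of 8, so no padding is needed before a following item.
++ </para>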
++ </refsect1>
++
++ <refsect1>
++ <title>Item types</title>
++
++ <refsect2>
++ <title>Negotiation item</title>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++ <listitem><para>
++ When this item is attached to any ioctl, programs can
++ <emphasis>probe</emphasis> the kernel for known item types.
++ The item carries an array of <type>uint64_t</type> values in
++ <varname>item.data64</varname>, each set to an item type to
++ probe. The kernel will reset each member of this array that is
++ not recognized as a valid item type to <constant>0</constant>.
++ This way, users can negotiate kernel features at start-up to
++ keep newer userspace compatible with older kernels. This item
++ is never attached by the kernel in response to any command.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++
++ <refsect2>
++ <title>Command specific items</title>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_PAYLOAD_VEC</constant></term>
++ <term><constant>KDBUS_ITEM_PAYLOAD_OFF</constant></term>
++ <listitem><para>
++ Messages are directly copied by the sending process into the
++ receiver's
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ This way, two peers can exchange data by effectively doing a
++ single-copy from one process to another; the kernel will not buffer
++ the data anywhere else. <constant>KDBUS_ITEM_PAYLOAD_VEC</constant>
++ is used when <emphasis>sending</emphasis> a message. The item
++ references a memory address where the payload data can be found.
++ <constant>KDBUS_ITEM_PAYLOAD_OFF</constant> is used when messages
++ are <emphasis>received</emphasis>, and the
++ <varname>offset</varname> value describes the offset inside the
++ receiving connection's
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ where the message payload can be found. See
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on passing of payload data along with a
++ message.
++ <programlisting>
++struct kdbus_vec {
++ __u64 size;
++ union {
++ __u64 address;
++ __u64 offset;
++ };
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant></term>
++ <listitem><para>
++ Transports a file descriptor of a <emphasis>memfd</emphasis> in
++ <type>struct kdbus_memfd</type> in <varname>item.memfd</varname>.
++ The <varname>size</varname> field has to match the actual size of
++ the memfd that was specified when it was created. The
++ <varname>start</varname> parameter denotes the offset inside the
++ memfd at which the referenced payload starts. See
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on passing of payload data along with a
++ message.
++ <programlisting>
++struct kdbus_memfd {
++ __u64 start;
++ __u64 size;
++ int fd;
++ __u32 __pad;
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_FDS</constant></term>
++ <listitem><para>
++ Contains an array of <emphasis>file descriptors</emphasis>.
++ When used with <constant>KDBUS_CMD_SEND</constant>, the values of
++ this array must be filled with valid file descriptor numbers.
++ When received as an item attached to a message, the array will
++ contain the numbers of the installed file descriptors, or
++ <constant>-1</constant> in case an error occurred.
++ In either case, the number of entries in the array is derived from
++ the item's total size. See
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++
++ <refsect2>
++ <title>Items specific to some commands</title>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_CANCEL_FD</constant></term>
++ <listitem><para>
++ Transports a file descriptor that can be used to cancel a
++ synchronous <constant>KDBUS_CMD_SEND</constant> operation by
++ writing to it. The file descriptor is stored in
++ <varname>item.fds[0]</varname>. The item may only contain one
++ file descriptor. See
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on this item and how to use it.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_BLOOM_PARAMETER</constant></term>
++ <listitem><para>
++ Contains a set of <emphasis>bloom parameters</emphasis> as
++ <type>struct kdbus_bloom_parameter</type> in
++ <varname>item.bloom_parameter</varname>.
++ The item is passed from userspace to kernel during the
++ <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl, and returned
++ verbatim when <constant>KDBUS_CMD_HELLO</constant> is called.
++ The kernel does not use the bloom parameters, but they need to
++ be known by each connection on the bus in order to define the
++ bloom filter hash details. See
++ <citerefentry>
++ <refentrytitle>kdbus.match</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on matching and bloom filters.
++ <programlisting>
++struct kdbus_bloom_parameter {
++ __u64 size;
++ __u64 n_hash;
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_BLOOM_FILTER</constant></term>
++ <listitem><para>
++ Carries a <emphasis>bloom filter</emphasis> as
++ <type>struct kdbus_bloom_filter</type> in
++ <varname>item.bloom_filter</varname>. It is mandatory to send this
++ item attached to a <type>struct kdbus_msg</type>, in case the
++ message is a signal. This item is never transported from kernel to
++ userspace. See
++ <citerefentry>
++ <refentrytitle>kdbus.match</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on matching and bloom filters.
++ <programlisting>
++struct kdbus_bloom_filter {
++ __u64 generation;
++ __u64 data[0];
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_BLOOM_MASK</constant></term>
++ <listitem><para>
++ Transports a <emphasis>bloom mask</emphasis> as binary data blob
++ stored in <varname>item.data</varname>. This item is used to
++ describe a match into a connection's match database. See
++ <citerefentry>
++ <refentrytitle>kdbus.match</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on matching and bloom filters.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_DST_NAME</constant></term>
++ <listitem><para>
++ Contains a <emphasis>well-known name</emphasis> to send a
++ message to, as null-terminated string in
++ <varname>item.str</varname>. This item is used with
++ <constant>KDBUS_CMD_SEND</constant>. See
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on how to send a message.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_MAKE_NAME</constant></term>
++ <listitem><para>
++ Contains a <emphasis>bus name</emphasis> or
++ <emphasis>endpoint name</emphasis>, stored as null-terminated
++ string in <varname>item.str</varname>. This item is sent from
++ userspace to kernel when buses or endpoints are created, and
++ returned back to userspace when the bus creator information is
++ queried. See
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ and
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant></term>
++ <term><constant>KDBUS_ITEM_ATTACH_FLAGS_RECV</constant></term>
++ <listitem><para>
++ Contains a set of <emphasis>attach flags</emphasis> at
++ <emphasis>send</emphasis> or <emphasis>receive</emphasis> time. See
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>,
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry> and
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on attach flags.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_ID</constant></term>
++ <listitem><para>
++ Transports the <emphasis>numerical ID</emphasis> of
++ a connection as a <type>uint64_t</type> value in
++ <varname>item.id</varname>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NAME</constant></term>
++ <listitem><para>
++ Transports a name associated with the
++ <emphasis>name registry</emphasis> as a null-terminated string in
++ <type>struct kdbus_name</type> in
++ <varname>item.name</varname>. The <varname>flags</varname>
++ field contains the flags of the name. See
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on how to access the name registry of a bus.
++ <programlisting>
++struct kdbus_name {
++ __u64 flags;
++ char name[0];
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++
++ <refsect2>
++ <title>Items attached by the kernel as metadata</title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_TIMESTAMP</constant></term>
++ <listitem><para>
++ Contains both the <emphasis>monotonic</emphasis> and the
++ <emphasis>realtime</emphasis> timestamp, taken when the message
++ was processed on the kernel side.
++ Stored as <type>struct kdbus_timestamp</type> in
++ <varname>item.timestamp</varname>.
++ <programlisting>
++struct kdbus_timestamp {
++ __u64 seqnum;
++ __u64 monotonic_ns;
++ __u64 realtime_ns;
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_CREDS</constant></term>
++ <listitem><para>
++ Contains a set of <emphasis>user</emphasis> and
++ <emphasis>group</emphasis> information as 32-bit values, in the
++ usual four flavors: real, effective, saved and filesystem related.
++ Stored as <type>struct kdbus_creds</type> in
++ <varname>item.creds</varname>.
++ <programlisting>
++struct kdbus_creds {
++ __u32 uid;
++ __u32 euid;
++ __u32 suid;
++ __u32 fsuid;
++ __u32 gid;
++ __u32 egid;
++ __u32 sgid;
++ __u32 fsgid;
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_PIDS</constant></term>
++ <listitem><para>
++ Contains the <emphasis>PID</emphasis>, <emphasis>TID</emphasis>
++ and <emphasis>parent PID (PPID)</emphasis> of a remote peer.
++ Stored as <type>struct kdbus_pids</type> in
++ <varname>item.pids</varname>.
++ <programlisting>
++struct kdbus_pids {
++ __u64 pid;
++ __u64 tid;
++ __u64 ppid;
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_AUXGROUPS</constant></term>
++ <listitem><para>
++ Contains the <emphasis>auxiliary (supplementary) groups</emphasis>
++ a remote peer is a member of, stored as array of
++ <type>uint32_t</type> values in <varname>item.data32</varname>.
++ The array length can be determined by looking at the item's total
++ size, subtracting the size of the header and dividing the
++ remainder by <constant>sizeof(uint32_t)</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_OWNED_NAME</constant></term>
++ <listitem><para>
++ Contains a <emphasis>well-known name</emphasis> currently owned
++ by a connection. The name is stored as null-terminated string in
++ <varname>item.str</varname>. Its length can also be derived from
++ the item's total size.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_TID_COMM</constant> [*]</term>
++ <listitem><para>
++ Contains the <emphasis>comm</emphasis> string of a task's
++ <emphasis>TID</emphasis> (thread ID), stored as null-terminated
++ string in <varname>item.str</varname>. Its length can also be
++ derived from the item's total size. Receivers of this item should
++ not use its contents for any kind of security measures. See below.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_PID_COMM</constant> [*]</term>
++ <listitem><para>
++ Contains the <emphasis>comm</emphasis> string of a task's
++ <emphasis>PID</emphasis> (process ID), stored as null-terminated
++ string in <varname>item.str</varname>. Its length can also be
++ derived from the item's total size. Receivers of this item should
++ not use its contents for any kind of security measures. See below.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_EXE</constant> [*]</term>
++ <listitem><para>
++ Contains the <emphasis>path to the executable</emphasis> of a task,
++ stored as null-terminated string in <varname>item.str</varname>. Its
++ length can also be derived from the item's total size. Receivers of
++ this item should not use its contents for any kind of security
++ measures. See below.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_CMDLINE</constant> [*]</term>
++ <listitem><para>
++ Contains the <emphasis>command line arguments</emphasis> of a
++ task, stored as an <emphasis>array</emphasis> of null-terminated
++ strings in <varname>item.str</varname>. The total length of all
++ strings in the array can be derived from the item's total size.
++ Receivers of this item should not use its contents for any kind
++ of security measures. See below.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_CGROUP</constant></term>
++ <listitem><para>
++ Contains the <emphasis>cgroup path</emphasis> of a task, stored
++ as a null-terminated string in <varname>item.str</varname>. Its
++ length can also be derived from the item's total size.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_CAPS</constant></term>
++ <listitem><para>
++ Contains sets of <emphasis>capabilities</emphasis>, stored as
++ <type>struct kdbus_caps</type> in <varname>item.caps</varname>.
++ As the item size may increase in the future, programs should
++ take <varname>item.caps.last_cap</varname> into account and derive
++ the number of sets and rows from the item size and the reported
++ number of valid capability bits.
++ <programlisting>
++struct kdbus_caps {
++ __u32 last_cap;
++ __u32 caps[0];
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_SECLABEL</constant></term>
++ <listitem><para>
++ Contains the <emphasis>LSM label</emphasis> of a task, stored as
++ a null-terminated string in <varname>item.str</varname>. Its length
++ can also be derived from the item's total size.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_AUDIT</constant></term>
++ <listitem><para>
++ Contains the audit <emphasis>sessionid</emphasis> and
++ <emphasis>loginuid</emphasis> of a task, stored as
++ <type>struct kdbus_audit</type> in
++ <varname>item.audit</varname>.
++ <programlisting>
++struct kdbus_audit {
++ __u32 sessionid;
++ __u32 loginuid;
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_CONN_DESCRIPTION</constant></term>
++ <listitem><para>
++ Contains the <emphasis>connection description</emphasis>, as set
++ by <constant>KDBUS_CMD_HELLO</constant> or
++ <constant>KDBUS_CMD_CONN_UPDATE</constant>, stored as
++ a null-terminated string in <varname>item.str</varname>. Its length
++ can also be derived from the item's total size.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
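++ <para>
++ As a rough sketch of the size derivation described for
++ <constant>KDBUS_ITEM_CAPS</constant> above, the helpers below compute
++ the number of rows and sets. The names <function>caps_rows</function>
++ and <function>caps_sets</function> are hypothetical, and the sketch
++ assumes 32 capability bits per <type>__u32</type> row and that
++ <varname>caps</varname> holds a whole number of equally sized sets.
++ </para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not part of any kdbus header.
 * Assumption: one __u32 row carries 32 capability bits, so last_cap
 * (the highest valid capability number) determines the rows per set.
 */
static size_t caps_rows(uint32_t last_cap)
{
    return (size_t)last_cap / 32 + 1;
}

/*
 * payload_size is the size of struct kdbus_caps inside the item, that is,
 * last_cap plus the caps[] array, without the item header.
 * Assumption: caps[] holds a whole number of sets of caps_rows() words each.
 */
static size_t caps_sets(size_t payload_size, uint32_t last_cap)
{
    assert(payload_size >= sizeof(uint32_t));

    size_t words = (payload_size - sizeof(uint32_t)) / sizeof(uint32_t);
    return words / caps_rows(last_cap);
}
```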
++
++ <para>
++ All metadata is automatically translated into the
++ <emphasis>namespaces</emphasis> of the task that receives them. See
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information.
++ </para>
++
++ <para>
++ [*] Note that the content stored in metadata items of type
++ <constant>KDBUS_ITEM_TID_COMM</constant>,
++ <constant>KDBUS_ITEM_PID_COMM</constant>,
++ <constant>KDBUS_ITEM_EXE</constant> and
++ <constant>KDBUS_ITEM_CMDLINE</constant>
++ can easily be tampered with by the sending tasks. Therefore, they should
++ <emphasis>not</emphasis> be used for any sort of security-relevant
++ assumptions. The only reason they are transmitted is to let
++ receivers know about details that were set when metadata was
++ collected, even though the task they were collected from is not
++ active any longer when the items are received.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Items used for policy entries, matches and notifications</title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
++ <listitem><para>
++ This item describes a <emphasis>policy access</emphasis> entry to
++ access the policy database of a
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry> or
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ Please refer to
++ <citerefentry>
++ <refentrytitle>kdbus.policy</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on the policy database and how to access it.
++ <programlisting>
++struct kdbus_policy_access {
++ __u64 type;
++ __u64 access;
++ __u64 id;
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_ID_ADD</constant></term>
++ <term><constant>KDBUS_ITEM_ID_REMOVE</constant></term>
++ <listitem><para>
++ This item is sent as an attachment to a
++ <emphasis>kernel notification</emphasis> and indicates that a
++ new connection was created on the bus, or that a connection was
++ disconnected, respectively. It stores a
++ <type>struct kdbus_notify_id_change</type> in
++ <varname>item.id_change</varname>.
++ The <varname>id</varname> field contains the numeric ID of the
++ connection that was added or removed, and <varname>flags</varname>
++ is set to the connection flags, as passed by
++ <constant>KDBUS_CMD_HELLO</constant>. See
++ <citerefentry>
++ <refentrytitle>kdbus.match</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ and
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on matches and notification messages.
++ <programlisting>
++struct kdbus_notify_id_change {
++ __u64 id;
++ __u64 flags;
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NAME_ADD</constant></term>
++ <term><constant>KDBUS_ITEM_NAME_REMOVE</constant></term>
++ <term><constant>KDBUS_ITEM_NAME_CHANGE</constant></term>
++ <listitem><para>
++ This item is sent as an attachment to a
++ <emphasis>kernel notification</emphasis> and indicates that a
++ <emphasis>well-known name</emphasis> appeared, disappeared or
++ transferred to another owner on the bus. It stores a
++ <type>struct kdbus_notify_name_change</type> in
++ <varname>item.name_change</varname>.
++ <varname>old_id</varname> describes the former owner of the name
++ and is set to <constant>0</constant> values in case of
++ <constant>KDBUS_ITEM_NAME_ADD</constant>.
++ <varname>new_id</varname> describes the new owner of the name and
++ is set to <constant>0</constant> values in case of
++ <constant>KDBUS_ITEM_NAME_REMOVE</constant>.
++ The <varname>name</varname> field contains the well-known name the
++ notification is about, as a null-terminated string. See
++ <citerefentry>
++ <refentrytitle>kdbus.match</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ and
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on matches and notification messages.
++ <programlisting>
++struct kdbus_notify_name_change {
++ struct kdbus_notify_id_change old_id;
++ struct kdbus_notify_id_change new_id;
++ char name[0];
++};
++ </programlisting>
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_REPLY_TIMEOUT</constant></term>
++ <listitem><para>
++ This item is sent as an attachment to a
++ <emphasis>kernel notification</emphasis>. It informs the receiver
++ that an expected reply to a message was not received in time.
++ The remote peer ID and the message cookie are stored in the message
++ header. See
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information about messages, timeouts and notifications.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_REPLY_DEAD</constant></term>
++ <listitem><para>
++ This item is sent as an attachment to a
++ <emphasis>kernel notification</emphasis>. It informs the receiver
++ that a remote connection a reply is expected from was disconnected
++ before that reply was sent. The remote peer ID and the message
++ cookie are stored in the message header. See
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information about messages, timeouts and notifications.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++ </refsect1>
++
++ <refsect1>
++ <title>See Also</title>
++ <simplelist type="inline">
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.fs</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>memfd_create</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ </simplelist>
++ </refsect1>
++
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.match.xml b/Documentation/kdbus/kdbus.match.xml
+new file mode 100644
+index 0000000..ae38e04
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.match.xml
+@@ -0,0 +1,555 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.match">
++
++ <refentryinfo>
++ <title>kdbus.match</title>
++ <productname>kdbus.match</productname>
++ </refentryinfo>
++
++ <refmeta>
++ <refentrytitle>kdbus.match</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </refmeta>
++
++ <refnamediv>
++ <refname>kdbus.match</refname>
++ <refpurpose>kdbus match</refpurpose>
++ </refnamediv>
++
++ <refsect1>
++ <title>Description</title>
++
++ <para>
++ kdbus connections can install matches in order to subscribe to signal
++ messages sent on the bus. Such signal messages can be either directed
++ to a single connection (by setting a specific connection ID in
++ <varname>struct kdbus_msg.dst_id</varname> or by sending it to a
++ well-known name), or to potentially <emphasis>all</emphasis> currently
++ active connections on the bus (by setting
++ <varname>struct kdbus_msg.dst_id</varname> to
++ <constant>KDBUS_DST_ID_BROADCAST</constant>).
++ A signal message always has the <constant>KDBUS_MSG_SIGNAL</constant>
++ bit set in the <varname>flags</varname> bitfield.
++ Also, signal messages can originate from either the kernel (called
++ <emphasis>notifications</emphasis>), or from other bus connections.
++ In either case, a bus connection needs to have a suitable
++ <emphasis>match</emphasis> installed in order to receive any signal
++ message. Without any rules installed in the connection, no signal message
++ will be received.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Matches for signal messages from other connections</title>
++ <para>
++ Matches for messages from other connections (not kernel notifications)
++ are implemented as bloom filters (see below). The sender adds certain
++ properties of the message as elements to a bloom filter bit field, and
++ sends that along with the signal message.
++
++ The receiving connection adds the message properties it is interested in
++ as elements to a bloom mask bit field, and uploads the mask as a match rule,
++ possibly along with some other rules to further limit the match.
++
++ The kernel will match the signal message's bloom filter against the
++ connection's bloom mask (simply by AND-ing them), and will decide whether
++ the message should be delivered to a connection.
++ </para>
++ <para>
++ The kernel has no notion of any specific properties of the signal message;
++ all it sees are the bit fields of the bloom filter and the mask to match
++ against. The use of bloom filters allows simple and efficient matching,
++ without exposing any message properties or internals to the kernel side.
++ Clients need to deal with the fact that they might receive signal messages
++ which they did not subscribe to, as the bloom filter might allow
++ false-positives to pass the filter.
++
++ To allow the future extension of the set of elements in the bloom filter,
++ the filter specifies a <emphasis>generation</emphasis> number. A later
++ generation must always contain all elements of the set of the previous
++ generation, but can add new elements to the set. The match rules mask can
++ carry an array with all previous generations of masks individually stored.
++ When the filter and mask are matched by the kernel, the mask with the
++ closest matching generation is selected as the index into the mask array.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Bloom filters</title>
++ <para>
++ Bloom filters allow checking whether a given word is present in a
++ dictionary. This allows a connection to set up a mask for the information
++ it is interested in; it will then be delivered signal messages that have
++ a matching filter.
++
++ For general information, see
++ <ulink url="https://en.wikipedia.org/wiki/Bloom_filter">the Wikipedia
++ article on bloom filters</ulink>.
++ </para>
++ <para>
++ The size of the bloom filter is defined per bus when it is created, in
++ <varname>kdbus_bloom_parameter.size</varname>. All bloom filters attached
++ to signal messages on the bus must match this size, and all bloom filter
++ matches uploaded by connections must also match the size, or a multiple
++ thereof (see below).
++
++ The calculation of the mask has to be done in userspace applications. The
++ kernel just checks the bitmasks to decide whether or not to let the
++ message pass. All bits in the mask must match the filter in a bit-wise
++ <emphasis>AND</emphasis> logic, but the mask may have more bits set than
++ the filter. Consequently, false positive matches are expected to happen,
++ and programs must deal with that fact by checking the contents of the
++ payload again at receive time.
++ </para>
++ <para>
++ Masks are entities that are always passed to the kernel as part of a
++ match (with an item of type <constant>KDBUS_ITEM_BLOOM_MASK</constant>),
++ and filters can be attached to signals, with an item of type
++ <constant>KDBUS_ITEM_BLOOM_FILTER</constant>. For a filter to match, all
++ its bits have to be set in the match mask as well.
++ </para>
++ <para>
++ For example, consider a bus that has a bloom size of 8 bytes, and the
++ following mask/filter combinations:
++ </para>
++ <programlisting><![CDATA[
++ filter 0x0101010101010101
++ mask 0x0101010101010101
++ -> matches
++
++ filter 0x0303030303030303
++ mask 0x0101010101010101
++ -> doesn't match
++
++ filter 0x0101010101010101
++ mask 0x0303030303030303
++ -> matches
++ ]]></programlisting>
++
++ <para>
++ Hence, in order to catch all messages, a mask filled with
++ <constant>0xff</constant> bytes can be installed as a wildcard match rule.
++ </para>
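++ <para>
++ The rule and the three examples above can be sketched in userspace as
++ follows. <function>bloom_matches</function> is a hypothetical helper
++ that follows the description in this section (every bit set in the
++ filter must also be set in the mask); it is not taken from the kernel
++ sources and has not been checked against them.
++ </para>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative only: bloom_matches() is a hypothetical helper, not a kdbus
 * API. It applies the rule as stated in this section: every bit set in the
 * filter must also be set in the mask.
 */
static bool bloom_matches(const uint64_t *filter, const uint64_t *mask,
                          size_t n_words)
{
    assert(filter != NULL && mask != NULL);

    for (size_t i = 0; i < n_words; i++) {
        /* A filter bit that is missing from the mask rules the message out. */
        if ((filter[i] & mask[i]) != filter[i])
            return false;
    }
    return true;
}
```

++ <para>
++ With a bloom size of 8 bytes, <varname>n_words</varname> is 1, and a
++ mask of all <constant>0xff</constant> bytes accepts any filter.
++ </para>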
++
++ <refsect2>
++ <title>Generations</title>
++
++ <para>
++ Uploaded matches may contain multiple masks, each of which has to be as
++ large as the bloom filter size defined by the bus. Each mask block is
++ called a <emphasis>generation</emphasis>, starting at index 0.
++
++ At match time, when a signal is about to be delivered, a bloom mask
++ generation is passed, which denotes which of the bloom masks the filter
++ should be matched against. This allows programs to provide backward
++ compatible masks at upload time, while older clients can still match
++ against older versions of filters.
++ </para>
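++ <para>
++ A minimal sketch of how a mask block could be picked for a given filter
++ generation. <function>select_mask</function> is a hypothetical helper,
++ and it assumes that the "closest matching generation" is the filter's
++ generation clamped to the newest generation the match carries.
++ </para>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the mask block for a filter's generation.
 * masks points to n_generations consecutive blocks of words_per_mask
 * 64-bit words each, generation 0 first.
 * Assumption: a filter newer than any uploaded mask falls back to the
 * newest mask available.
 */
static const uint64_t *select_mask(const uint64_t *masks, size_t n_generations,
                                   size_t words_per_mask,
                                   uint64_t filter_generation)
{
    assert(n_generations > 0);

    size_t index = filter_generation < n_generations
                       ? (size_t)filter_generation
                       : n_generations - 1;
    return masks + index * words_per_mask;
}
```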
++ </refsect2>
++ </refsect1>
++
++ <refsect1>
++ <title>Matches for kernel notifications</title>
++ <para>
++ To receive kernel generated notifications (see
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>),
++ a connection must install match rules that are different from
++ the bloom filter matches described in the section above. They can be
++ filtered by the connection ID that caused the notification to be sent, by
++ one of the names it currently owns, or by the type of the notification
++ (ID/name add/remove/change).
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Adding a match</title>
++ <para>
++ To add a match, the <constant>KDBUS_CMD_MATCH_ADD</constant> ioctl is
++ used, which takes a <type>struct kdbus_cmd_match</type>, described
++ below, as its argument.
++
++ Note that each of the items attached to this command will internally
++ create one match <emphasis>rule</emphasis>, and the collection of them,
++ which is submitted as one block via the ioctl, is called a
++ <emphasis>match</emphasis>. To allow a message to pass, all rules of a
++ match have to be satisfied. Hence, adding more items to the command will
++ only narrow the possibility of a match to effectively let the message
++ pass, and will decrease the chance that the connection's process will be
++ woken up needlessly.
++
++ Multiple matches can be installed per connection. As long as one of them
++ has a set of rules that allows the message to pass, that match is
++ decisive.
++ </para>
++
++ <programlisting>
++struct kdbus_cmd_match {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __u64 cookie;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>Flags to control the behavior of the ioctl.</para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_MATCH_REPLACE</constant></term>
++ <listitem>
++ <para>
++ Remove any existing matches with the supplied cookie before
++ installing the new one.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++ <listitem>
++ <para>
++ Requests a set of valid flags for this ioctl. When this bit is
++ set, no action is taken; the ioctl will return
++ <errorcode>0</errorcode>, and the <varname>flags</varname>
++ field will have all bits set that are valid for this command.
++ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++ cleared by the operation.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>cookie</varname></term>
++ <listitem><para>
++ A cookie which identifies the match, so it can be referred to when
++ removing it.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ Items to define the actual rules of the matches. The following item
++ types are expected. Each item will create one new match rule.
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_BLOOM_MASK</constant></term>
++ <listitem>
++ <para>
++ An item that carries the bloom filter mask to match against
++ in its data field. The payload size must match the bloom
++ filter size that was specified when the bus was created.
++ See the "Bloom filters" section above for more information on
++ bloom filters.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NAME</constant></term>
++ <listitem>
++ <para>
++ When used as part of kernel notifications, this item specifies
++ a name that is acquired, lost or that changed its owner (see
++ below). When used as part of a match for user-generated signal
++ messages, it specifies a name that the sending connection must
++ own at the time of sending the signal.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_ID</constant></term>
++ <listitem>
++ <para>
++ Specify a sender connection's ID that will match this rule.
++ For kernel notifications, this specifies the ID of a
++ connection that was added to or removed from the bus.
++ For user-generated signals, it specifies the ID of the
++ connection that sent the signal message.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NAME_ADD</constant></term>
++ <term><constant>KDBUS_ITEM_NAME_REMOVE</constant></term>
++ <term><constant>KDBUS_ITEM_NAME_CHANGE</constant></term>
++ <listitem>
++ <para>
++ These items request delivery of kernel notifications that
++ describe a name acquisition, loss, or change. The details
++ are stored in the item's
++ <varname>kdbus_notify_name_change</varname> member.
++ All information specified must be matched in order to make
++ the message pass. Use
++ <constant>KDBUS_MATCH_ID_ANY</constant> to
++ match against any unique connection ID.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_ID_ADD</constant></term>
++ <term><constant>KDBUS_ITEM_ID_REMOVE</constant></term>
++ <listitem>
++ <para>
++ These items request delivery of kernel notifications that are
++ generated when a connection is created or terminated.
++ <type>struct kdbus_notify_id_change</type> is used to
++ store the actual match information. This item can be used to
++ monitor one particular connection ID, or, when the ID field
++ is set to <constant>KDBUS_MATCH_ID_ANY</constant>,
++ all of them.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
++ <listitem><para>
++ With this item, programs can <emphasis>probe</emphasis> the
++ kernel for known item types. See
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ Unrecognized items are rejected, and the ioctl will fail with
++ <varname>errno</varname> set to <constant>EINVAL</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ Refer to
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on message types.
++ </para>
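++ <para>
++ Before issuing <constant>KDBUS_CMD_MATCH_ADD</constant>, a program might
++ check the payload size of its <constant>KDBUS_ITEM_BLOOM_MASK</constant>
++ item against the size rule given in the "Bloom filters" section.
++ The sketch below does that in userspace;
++ <function>bloom_mask_generations</function> is a hypothetical helper,
++ and its use of <constant>EDOM</constant> merely mirrors the error the
++ ioctl is documented to return for an illegal bloom filter size.
++ </para>

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical userspace pre-check, not a kdbus API. It mirrors the size
 * rule from the "Bloom filters" section: a bloom mask payload must be the
 * bus's bloom size or a multiple thereof.
 * Returns the number of generations carried, or -EDOM for an illegal size.
 */
static int bloom_mask_generations(uint64_t payload_size, uint64_t bloom_size)
{
    if (bloom_size == 0 || payload_size == 0 ||
        payload_size % bloom_size != 0)
        return -EDOM;

    return (int)(payload_size / bloom_size);
}
```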
++ </refsect1>
++
++ <refsect1>
++ <title>Removing a match</title>
++ <para>
++ Matches can be removed with the
++ <constant>KDBUS_CMD_MATCH_REMOVE</constant> ioctl, which takes
++ <type>struct kdbus_cmd_match</type> as argument, but the usage of its
++ fields differs slightly from that of
++ <constant>KDBUS_CMD_MATCH_ADD</constant>.
++ </para>
++
++ <programlisting>
++struct kdbus_cmd_match {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __u64 cookie;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>cookie</varname></term>
++ <listitem><para>
++ The cookie of the match, as it was passed when the match was added.
++ All matches that have this cookie will be removed.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>
++ No flags are supported for this use case.
++ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++ valid flags. If set, the ioctl will fail with
++ <errorcode>-1</errorcode>, <varname>errno</varname> is set to
++ <constant>EPROTO</constant>, and the <varname>flags</varname> field
++ is set to <constant>0</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ No items are supported for this use case, but
++ <constant>KDBUS_ITEM_NEGOTIATE</constant> is allowed nevertheless.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect1>
++
++ <refsect1>
++ <title>Return value</title>
++ <para>
++ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++ on error, <errorcode>-1</errorcode> is returned, and
++ <varname>errno</varname> is set to indicate the error.
++ If the issued ioctl is illegal for the file descriptor used,
++ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++ </para>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_MATCH_ADD</constant> may fail with the following
++ errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ Illegal flags or items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EDOM</constant></term>
++ <listitem><para>
++ Illegal bloom filter size.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EMFILE</constant></term>
++ <listitem><para>
++ Too many matches for this connection.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_MATCH_REMOVE</constant> may fail with the following
++ errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ Illegal flags.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EBADSLT</constant></term>
++ <listitem><para>
++ A match entry with the given cookie could not be found.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++ </refsect1>
++
++ <refsect1>
++ <title>See Also</title>
++ <simplelist type="inline">
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.match</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.fs</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ </simplelist>
++ </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.message.xml b/Documentation/kdbus/kdbus.message.xml
+new file mode 100644
+index 0000000..0115d9d
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.message.xml
+@@ -0,0 +1,1276 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.message">
++
++ <refentryinfo>
++ <title>kdbus.message</title>
++ <productname>kdbus.message</productname>
++ </refentryinfo>
++
++ <refmeta>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </refmeta>
++
++ <refnamediv>
++ <refname>kdbus.message</refname>
++ <refpurpose>kdbus message</refpurpose>
++ </refnamediv>
++
++ <refsect1>
++ <title>Description</title>
++
++ <para>
++ A kdbus message is used to exchange information between two connections
++ on a bus, or to transport notifications from the kernel to one or many
++ connections. This document describes the layout of messages, how payload
++ is added to them and how they are sent and received.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Message layout</title>
++
++ <para>The layout of a message is shown below.</para>
++
++ <programlisting>
++ +-------------------------------------------------------------------------+
++ | Message |
++ | +---------------------------------------------------------------------+ |
++ | | Header | |
++ | | size: overall message size, including the data records | |
++ | | destination: connection ID of the receiver | |
++ | | source: connection ID of the sender (set by kernel) | |
++ | | payload_type: "DBusDBus" textual identifier stored as uint64_t | |
++ | +---------------------------------------------------------------------+ |
++ | +---------------------------------------------------------------------+ |
++ | | Data Record | |
++ | | size: overall record size (without padding) | |
++ | | type: type of data | |
++ | | data: reference to data (address or file descriptor) | |
++ | +---------------------------------------------------------------------+ |
++ | +---------------------------------------------------------------------+ |
++ | | padding bytes to the next 8 byte alignment | |
++ | +---------------------------------------------------------------------+ |
++ | +---------------------------------------------------------------------+ |
++ | | Data Record | |
++ | | size: overall record size (without padding) | |
++ | | ... | |
++ | +---------------------------------------------------------------------+ |
++ | +---------------------------------------------------------------------+ |
++ | | padding bytes to the next 8 byte alignment | |
++ | +---------------------------------------------------------------------+ |
++ | +---------------------------------------------------------------------+ |
++ | | Data Record | |
++ | | size: overall record size | |
++ | | ... | |
++ | +---------------------------------------------------------------------+ |
++ | ... further data records ... |
++ +-------------------------------------------------------------------------+
++ </programlisting>
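++ <para>
++ The layout above implies that a reader steps from one data record to
++ the next by rounding each record size up to a multiple of 8. A minimal
++ sketch of such a walk follows; <type>struct record_header</type>,
++ <function>align8</function> and <function>count_records</function> are
++ illustrative names, and the header is assumed to consist of two 64-bit
++ fields (size and type) as drawn in the diagram.
++ </para>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative record header; field names follow the diagram above. */
struct record_header {
    uint64_t size;  /* header plus data, without trailing padding */
    uint64_t type;
};

/* Round a size up to the next multiple of 8. */
static uint64_t align8(uint64_t n)
{
    return (n + 7) & ~UINT64_C(7);
}

/*
 * Count the records in buf[0..len), stepping over the padding after each.
 * Returns -1 if a record claims a size that does not fit the buffer.
 */
static int count_records(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    int n = 0;

    while (off < len) {
        struct record_header hdr;

        if (len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + off, sizeof(hdr));  /* avoid unaligned access */
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;
        n++;
        /* Stop if this record, with its padding, reaches the buffer end. */
        if (align8(hdr.size) >= len - off)
            break;
        off += (size_t)align8(hdr.size);
    }
    return n;
}
```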
++ </refsect1>
++
++ <refsect1>
++ <title>Message payload</title>
++
++ <para>
++ When connecting to the bus, receivers request a memory pool of a given
++ size, large enough to hold the entire backlog of data enqueued for the
++ connection. The pool is internally backed by a shared memory file which
++ can be <function>mmap()</function>ed by the receiver. See
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information.
++ </para>
++
++ <para>
++ Message payload must be described in items attached to a message when
++ it is sent. A receiver can access the payload by looking at the items
++ that are attached to a message in its pool. The following items are used.
++ </para>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_PAYLOAD_VEC</constant></term>
++ <listitem>
++ <para>
++ This item references a piece of memory on the sender side which is
++ directly copied into the receiver's pool. This way, two peers can
++ exchange data by effectively doing a single-copy from one process
++ to another; the kernel will not buffer the data anywhere else.
++ This item is never found in a message received by a connection.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_PAYLOAD_OFF</constant></term>
++ <listitem>
++ <para>
++ This item is attached to messages on the receiving side and points
++ to a memory area inside the receiver's pool. The
++ <varname>offset</varname> variable in the item denotes the memory
++ location relative to the message itself.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant></term>
++ <listitem>
++ <para>
++ Messages can reference <emphasis>memfd</emphasis> files which
++ contain the data. memfd files are tmpfs-backed files that allow
++ sealing of the content of the file, which prevents all writable
++ access to the file content.
++ </para>
++ <para>
++ Only memfds that have
++ <constant>(F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_SEAL)
++ </constant>
++ set are accepted as payload data, which enforces reliable passing of
++ data. The receiver can assume that neither the sender nor anyone
++ else can alter the content after the message is sent. If those
++ seals are not set on the memfd, the ioctl will fail with
++ <errorcode>-1</errorcode>, and <varname>errno</varname> will be
++ set to <constant>ETXTBSY</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_FDS</constant></term>
++ <listitem>
++ <para>
++ Messages can transport regular file descriptors via
++ <constant>KDBUS_ITEM_FDS</constant>. This item carries an array
++ of <type>int</type> values in <varname>item.fd</varname>. The
++ maximum number of file descriptors in the item is
++ <constant>253</constant>, and only one item of this type is
++ accepted per message. All passed values must be valid file
++ descriptors; the open count of each file descriptor is increased
++ by installing it to the receiver's task. This item can only be
++ used for directed messages, not for broadcasts, and only to
++ remote peers that have opted-in for receiving file descriptors
++ at connection time (<constant>KDBUS_HELLO_ACCEPT_FD</constant>).
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ The sender must not make any assumptions on the type in which data is
++ received by the remote peer. The kernel is free to re-pack multiple
++ <constant>KDBUS_ITEM_PAYLOAD_VEC</constant> and
++ <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant> payloads. For instance, the
++ kernel may decide to merge multiple <constant>VECs</constant> into a
++ single <constant>VEC</constant>, inline <constant>MEMFD</constant>
++ payloads into memory, or merge all passed <constant>VECs</constant> into a
++ single <constant>MEMFD</constant>. However, the kernel preserves the order
++ of passed data. This means that the order of all <constant>VEC</constant>
++ and <constant>MEMFD</constant> items is not changed with respect to each
++ other. In other words: All passed <constant>VEC</constant> and
++ <constant>MEMFD</constant> data payloads are treated as a single stream
++ of data that may be received by the remote peer in a different set of
++ chunks than it was sent as.
++ </para>
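As a rough illustration of the sealing requirement for memfd payloads, the following sketch creates a memfd with the standard Linux calls memfd_create(2) and fcntl(2) and applies all four seals. It does not talk to kdbus at all; the helper names `make_sealed_memfd` and `is_fully_sealed` and the memfd name "payload" are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* The seal set the text requires on every memfd payload. */
#define REQUIRED_SEALS \
	(F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL)

/*
 * Create a memfd holding data[0..len) and seal it completely.
 * Returns the file descriptor, or -1 on error.
 */
static int make_sealed_memfd(const void *data, size_t len)
{
	int fd = memfd_create("payload", MFD_CLOEXEC | MFD_ALLOW_SEALING);

	if (fd < 0)
		return -1;

	if (write(fd, data, len) != (ssize_t)len ||
	    fcntl(fd, F_ADD_SEALS, REQUIRED_SEALS) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Return 1 if all four required seals are set on fd, 0 otherwise. */
static int is_fully_sealed(int fd)
{
	int seals = fcntl(fd, F_GET_SEALS);

	return seals >= 0 && (seals & REQUIRED_SEALS) == REQUIRED_SEALS;
}
```

Once sealed this way, further writes to the descriptor fail, which is the property a receiver relies on.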
++ </refsect1>
++
++ <refsect1>
++ <title>Sending messages</title>
++
++ <para>
++ Messages are passed to the kernel with the
++ <constant>KDBUS_CMD_SEND</constant> ioctl. Depending on the destination
++ address of the message, the kernel delivers the message to the specific
++ destination connection, or to some subset of all connections on the same
++ bus. Sending messages across buses is not possible. Messages are always
++ queued in the memory pool of the destination connection (see above).
++ </para>
++
++ <para>
++ The <constant>KDBUS_CMD_SEND</constant> ioctl uses a
++ <type>struct kdbus_cmd_send</type> to describe the message
++ transfer.
++ </para>
++ <programlisting>
++struct kdbus_cmd_send {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __u64 msg_address;
++ struct kdbus_msg_info reply;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>Flags for message delivery</para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_SEND_SYNC_REPLY</constant></term>
++ <listitem>
++ <para>
++ By default, all calls to kdbus are considered asynchronous,
++ non-blocking. However, as there are many use cases that need
++ to wait for a remote peer to answer a method call, there's a
++ way to send a message and wait for a reply in a synchronous
++ fashion. This is what the
++ <constant>KDBUS_SEND_SYNC_REPLY</constant> controls. The
++ <constant>KDBUS_CMD_SEND</constant> ioctl will block until the
++ reply has arrived, the timeout limit is reached, the remote
++ connection is shut down, or the call is interrupted by a signal
++ before any reply arrives; see
++ <citerefentry>
++ <refentrytitle>signal</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++
++ The offset of the reply message in the sender's pool is stored
++ in <varname>reply</varname> when the ioctl has returned without
++ error. Hence, there is no need for another
++ <constant>KDBUS_CMD_RECV</constant> ioctl or anything else to
++ receive the reply.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++ <listitem>
++ <para>
++ Request a set of valid flags for this ioctl. When this bit is
++ set, no action is taken; the ioctl will fail with
++ <errorcode>-1</errorcode>, and <varname>errno</varname>
++ is set to <constant>EPROTO</constant>.
++ Once the ioctl has returned, the <varname>flags</varname>
++ field will have all bits set that the kernel recognizes as
++ valid for this command.
++ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++ cleared by the operation.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>msg_address</varname></term>
++ <listitem><para>
++ In this field, users have to provide a pointer to a message
++ (<type>struct kdbus_msg</type>) to send. See below for a
++ detailed description.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>reply</varname></term>
++ <listitem><para>
++ Only used for synchronous replies. See description of
++ <type>struct kdbus_cmd_recv</type> for more details.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ The following items are currently recognized.
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_CANCEL_FD</constant></term>
++ <listitem>
++ <para>
++ When this optional item is passed in, and the call is
++ executed as a SYNC call, the passed-in file descriptor can be
++ used as an alternative cancellation point. The kernel will call
++ <citerefentry>
++ <refentrytitle>poll</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ on this file descriptor, and once it reports any incoming
++ bytes, the blocking send operation will be canceled; the
++ blocking, synchronous ioctl call will return
++ <errorcode>-1</errorcode>, and <varname>errno</varname> will
++ be set to <errorname>ECANCELED</errorname>.
++ Any type of file descriptor on which
++ <citerefentry>
++ <refentrytitle>poll</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ can be called can be used as payload to this item; for
++ example, an eventfd can be used for this purpose, see
++ <citerefentry>
++ <refentrytitle>eventfd</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>.
++ For asynchronous message sending, this item is allowed but
++ ignored.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ <para>
++ Unrecognized items are rejected, and the ioctl will fail with
++ <varname>errno</varname> set to <constant>EINVAL</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
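The cancellation mechanism described for KDBUS_ITEM_CANCEL_FD relies on ordinary poll(2) readiness. The sketch below shows only that part, using an eventfd: a second thread or a signal handler would call the trigger function, and the descriptor then reports POLLIN. The `cancel_fd_*` helper names are hypothetical, and attaching the descriptor to an actual send command is not shown.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an eventfd suitable for use as a cancellation descriptor. */
static int cancel_fd_create(void)
{
	return eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
}

/* Make the descriptor readable; returns 0 on success, -1 on error. */
static int cancel_fd_trigger(int fd)
{
	uint64_t one = 1;

	return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Non-blocking readiness check, the same kind of poll() test the text
 * says the kernel performs. Returns 1 if readable, 0 if not, -1 on error.
 */
static int cancel_fd_is_signalled(int fd)
{
	struct pollfd p = { .fd = fd, .events = POLLIN };
	int r = poll(&p, 1, 0);

	if (r < 0)
		return -1;
	return (r > 0 && (p.revents & POLLIN)) ? 1 : 0;
}
```

An eventfd is convenient here because a single 8-byte write is enough to make it readable, and it stays readable until it is read.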
++
++ <para>
++ The message referenced by the <varname>msg_address</varname> above has
++ the following layout.
++ </para>
++
++ <programlisting>
++struct kdbus_msg {
++ __u64 size;
++ __u64 flags;
++ __s64 priority;
++ __u64 dst_id;
++ __u64 src_id;
++ __u64 payload_type;
++ __u64 cookie;
++ __u64 timeout_ns;
++ __u64 cookie_reply;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>Flags to describe message details.</para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_MSG_EXPECT_REPLY</constant></term>
++ <listitem>
++ <para>
++ Expect a reply to this message from the remote peer. With
++ this bit set, the timeout_ns field must be set to a non-zero
++ number of nanoseconds in which the receiving peer is expected
++ to reply. If such a reply is not received in time, the sender
++ will be notified with a timeout message (see below). The
++ value must be an absolute value, in nanoseconds and based on
++ <constant>CLOCK_MONOTONIC</constant>.
++ </para><para>
++ For a message to be accepted as reply, it must be a direct
++ message to the original sender (not a broadcast and not a
++ signal message), and its
++ <varname>kdbus_msg.cookie_reply</varname> must match the
++ previous message's <varname>kdbus_msg.cookie</varname>.
++ </para><para>
++ Expected replies also temporarily open the policy of the
++ sending connection, so the other peer is allowed to respond
++ within the given time window.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_MSG_NO_AUTO_START</constant></term>
++ <listitem>
++ <para>
++ By default, when a message is sent to an activator
++ connection, the activator is notified and will start an
++ implementer. This flag inhibits that behavior. With this bit
++ set, and the remote being an activator, the ioctl will fail
++ with <varname>errno</varname> set to
++ <constant>EADDRNOTAVAIL</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++ <listitem>
++ <para>
++ Requests a set of valid flags for this ioctl. When this bit is
++ set, no action is taken; the ioctl will return
++ <errorcode>0</errorcode>, and the <varname>flags</varname>
++ field will have all bits set that are valid for this command.
++ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++ cleared by the operation.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>priority</varname></term>
++ <listitem><para>
++ The priority of this message. Receiving messages (see below) may
++ optionally be constrained to messages of a minimal priority. This
++ allows for use cases where timing critical data is interleaved with
++ control data on the same connection. If unused, the priority field
++ should be set to <constant>0</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>dst_id</varname></term>
++ <listitem><para>
++ The numeric ID of the destination connection, or
++ <constant>KDBUS_DST_ID_BROADCAST</constant>
++ (~0ULL) to address every peer on the bus, or
++ <constant>KDBUS_DST_ID_NAME</constant> (0) to look
++ it up dynamically from the bus' name registry.
++ In the latter case, an item of type
++ <constant>KDBUS_ITEM_DST_NAME</constant> is mandatory.
++ Also see
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>src_id</varname></term>
++ <listitem><para>
++ Upon return of the ioctl, this member will contain the sending
++ connection's numerical ID. Should be 0 at send time.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>payload_type</varname></term>
++ <listitem><para>
++ Type of the payload in the actual data records. Currently, only
++ <constant>KDBUS_PAYLOAD_DBUS</constant> is accepted as input value
++ of this field. When receiving messages that are generated by the
++ kernel (notifications), this field will contain
++ <constant>KDBUS_PAYLOAD_KERNEL</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>cookie</varname></term>
++ <listitem><para>
++ Cookie of this message, for later recognition. Also, when replying
++ to a message (see above), the <varname>cookie_reply</varname>
++ field must match this value.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>timeout_ns</varname></term>
++ <listitem><para>
++ If the message sent requires a reply from the remote peer (see above),
++ this field contains the timeout in absolute nanoseconds based on
++ <constant>CLOCK_MONOTONIC</constant>. Also see
++ <citerefentry>
++ <refentrytitle>clock_gettime</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>cookie_reply</varname></term>
++ <listitem><para>
++ If the message sent is a reply to another message, this field must
++ match the cookie of the formerly received message.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ A dynamically sized list of items to contain additional information.
++ The following items are expected/valid:
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_PAYLOAD_VEC</constant></term>
++ <term><constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant></term>
++ <term><constant>KDBUS_ITEM_FDS</constant></term>
++ <listitem>
++ <para>
++ Actual data records containing the payload. See section
++ "Message payload".
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_BLOOM_FILTER</constant></term>
++ <listitem>
++ <para>
++ Bloom filter for matches (see below).
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ITEM_DST_NAME</constant></term>
++ <listitem>
++ <para>
++ Well-known name to send this message to. Required if
++ <varname>dst_id</varname> is set to
++ <constant>KDBUS_DST_ID_NAME</constant>.
++ If a connection holding the given name can't be found,
++ the ioctl will fail with <varname>errno</varname> set to
++ <constant>ESRCH</constant>.
++ </para>
++ <para>
++ For messages to a unique name (ID), this item is optional. If
++ present, the kernel will make sure the name owner matches the
++ given unique name. This allows programs to tie the message
++ sending to the condition that a name is currently owned by a
++ certain unique name.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++ </variablelist>
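Because timeout_ns is an absolute CLOCK_MONOTONIC value rather than a duration, a sender usually has to add its desired wait time to the current clock reading. A minimal sketch of that arithmetic follows; the helper names `monotonic_now_ns` and `reply_deadline_ns` are hypothetical and nothing here is kdbus-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds, or 0 on failure. */
static uint64_t monotonic_now_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/*
 * Convert a relative wait time into the absolute deadline that the
 * text describes for the timeout_ns field.
 */
static uint64_t reply_deadline_ns(uint64_t relative_ns)
{
	return monotonic_now_ns() + relative_ns;
}
```

For a five second reply window, the sender would store `reply_deadline_ns(5000000000ULL)` in the timeout field right before sending.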
++
++ <para>
++ The message will be augmented by the requested metadata items when
++ queued into the receiver's pool. See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ and
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on metadata.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Receiving messages</title>
++
++ <para>
++ Messages are received by the client with the
++ <constant>KDBUS_CMD_RECV</constant> ioctl. The endpoint file of the bus
++ supports <function>poll()/epoll()/select()</function>; when new messages
++ are available on the connection's file descriptor,
++ <constant>POLLIN</constant> is reported. For compatibility reasons,
++ <constant>POLLOUT</constant> is always reported as well. Note, however,
++ that the latter does not guarantee that a message can in fact be sent, as
++ this depends on how many pending messages the receiver has in its pool.
++ </para>
++
++ <para>
++ With the <constant>KDBUS_CMD_RECV</constant> ioctl, a
++ <type>struct kdbus_cmd_recv</type> is used.
++ </para>
++
++ <programlisting>
++struct kdbus_cmd_recv {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __s64 priority;
++ __u64 dropped_msgs;
++ struct kdbus_msg_info msg;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>Flags to control the receive command.</para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_RECV_PEEK</constant></term>
++ <listitem>
++ <para>
++ Just return the location of the next message. Do not install
++ file descriptors or anything else. This is usually used to
++ determine the sender of the next queued message.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_RECV_DROP</constant></term>
++ <listitem>
++ <para>
++ Drop the next message without doing anything else with it,
++ and free the pool slice. This is a short-cut for
++ <constant>KDBUS_RECV_PEEK</constant> and
++ <constant>KDBUS_CMD_FREE</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_RECV_USE_PRIORITY</constant></term>
++ <listitem>
++ <para>
++ Dequeue messages ordered by their priority, and filter
++ them by the priority field (see below).
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++ <listitem>
++ <para>
++ Request a set of valid flags for this ioctl. When this bit is
++ set, no action is taken; the ioctl will fail with
++ <errorcode>-1</errorcode>, and <varname>errno</varname>
++ is set to <constant>EPROTO</constant>.
++ Once the ioctl has returned, the <varname>flags</varname>
++ field will have all bits set that the kernel recognizes as
++ valid for this command.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. If the <varname>dropped_msgs</varname>
++ field is non-zero, <constant>KDBUS_RECV_RETURN_DROPPED_MSGS</constant>
++ is set. If a file descriptor could not be installed, the
++ <constant>KDBUS_RECV_RETURN_INCOMPLETE_FDS</constant> flag is set.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>priority</varname></term>
++ <listitem><para>
++ With <constant>KDBUS_RECV_USE_PRIORITY</constant> set in
++ <varname>flags</varname>, messages will be dequeued ordered by their
++ priority, starting with the highest value. Also, messages will be
++ filtered by the value given in this field, so the returned message
++ will at least have the requested priority. If no such message is
++ waiting in the queue, the ioctl will fail, and
++ <varname>errno</varname> will be set to <constant>EAGAIN</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>dropped_msgs</varname></term>
++ <listitem><para>
++ Whenever a message with <constant>KDBUS_MSG_SIGNAL</constant> is sent
++ but cannot be queued on a peer (e.g., as it contains FDs but the peer
++ does not support FDs, or there is no space left in the peer's pool)
++ the 'dropped_msgs' counter of the peer is incremented. On the next
++ RECV ioctl, the 'dropped_msgs' field is copied into the ioctl struct
++ and cleared on the peer. If it was non-zero, the
++ <constant>KDBUS_RECV_RETURN_DROPPED_MSGS</constant> flag will be set
++ in <varname>return_flags</varname>. Note that this will only happen
++ if the ioctl succeeded or failed with <constant>EAGAIN</constant>. In
++ other error cases, the 'dropped_msgs' field of the peer is left
++ untouched.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>msg</varname></term>
++ <listitem><para>
++ Embedded struct containing information on the received message when
++ this command succeeded (see below).
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem><para>
++ Items to specify further details for the receive command.
++ Currently unused, and all items will be rejected with
++ <varname>errno</varname> set to <constant>EINVAL</constant>.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ Both <type>struct kdbus_cmd_recv</type> and
++ <type>struct kdbus_cmd_send</type> embed
++ <type>struct kdbus_msg_info</type>.
++ For the <constant>KDBUS_CMD_SEND</constant> ioctl, it is used to catch
++ synchronous replies, if one was requested, and is unused otherwise.
++ </para>
++
++ <programlisting>
++struct kdbus_msg_info {
++ __u64 offset;
++ __u64 msg_size;
++ __u64 return_flags;
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>offset</varname></term>
++ <listitem><para>
++ Upon return of the ioctl, this field contains the offset in the
++ receiver's memory pool. The memory must be freed with
++ <constant>KDBUS_CMD_FREE</constant>. See
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for further details.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>msg_size</varname></term>
++ <listitem><para>
++ Upon successful return of the ioctl, this field contains the size of
++ the allocated slice at offset <varname>offset</varname>.
++ It is the combination of the size of the stored
++ <type>struct kdbus_msg</type> object plus all appended VECs.
++ You can use it in combination with <varname>offset</varname> to map
++ a single message, instead of mapping the entire pool. See
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for further details.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem>
++ <para>
++ Kernel-provided return flags. Currently, the following flags are
++ defined.
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_RECV_RETURN_INCOMPLETE_FDS</constant></term>
++ <listitem>
++ <para>
++ The message contained memfds or file descriptors, and the
++ kernel failed to install one or more of them at receive time.
++ Most probably that happened because the maximum number of
++ file descriptors for the receiver's task was exceeded.
++ In such cases, the message is still delivered, so this is not
++ a fatal condition. File descriptor numbers inside the
++ <constant>KDBUS_ITEM_FDS</constant> item or memfd files
++ referenced by <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant>
++ items which could not be installed will be set to
++ <constant>-1</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ Unless <constant>KDBUS_RECV_DROP</constant> was passed, the
++ <varname>offset</varname> field contains the location of the new message
++ inside the receiver's pool after the <constant>KDBUS_CMD_RECV</constant>
++ ioctl has returned. The message is stored as <type>struct kdbus_msg</type>
++ at this offset, and can be interpreted with the semantics described above.
++ </para>
++ <para>
++ Also, if the connection allowed for file descriptors to be passed
++ (<constant>KDBUS_HELLO_ACCEPT_FD</constant>), and if the message contained
++ any, they will be installed into the receiving process when the
++ <constant>KDBUS_CMD_RECV</constant> ioctl is called.
++ <emphasis>memfds</emphasis> may always be part of the message payload.
++ The receiving task is obliged to close all file descriptors appropriately
++ once no longer needed. If <constant>KDBUS_RECV_PEEK</constant> is set, no
++ file descriptors are installed. This allows for peeking at a message,
++ looking at its metadata only and dropping it via
++ <constant>KDBUS_RECV_DROP</constant>, without installing any of the file
++ descriptors into the receiving process.
++ </para>
++ <para>
++ The caller is obliged to call the <constant>KDBUS_CMD_FREE</constant>
++ ioctl with the returned offset when the memory is no longer needed.
++ </para>
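Mapping a single message instead of the whole pool needs a little arithmetic, because mmap(2) only accepts page-aligned file offsets while a message offset need not be page-aligned. The sketch below computes the window to map and where the message starts inside it. `struct map_window` and `slice_window` are hypothetical names, and the page size is passed in explicitly (it would normally come from sysconf(_SC_PAGESIZE)) and is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Describes the page-aligned region that covers one message. */
struct map_window {
	uint64_t map_offset; /* page-aligned offset to pass to mmap() */
	uint64_t map_length; /* number of bytes to map */
	uint64_t delta;      /* start of the message inside the mapping */
};

/*
 * Compute the mapping window for a message located at 'offset' with
 * length 'msg_size'. 'page_size' must be a power of two.
 */
static struct map_window slice_window(uint64_t offset, uint64_t msg_size,
				      uint64_t page_size)
{
	struct map_window w;

	w.map_offset = offset & ~(page_size - 1);
	w.delta = offset - w.map_offset;
	w.map_length = w.delta + msg_size;
	return w;
}
```

After mapping `map_length` bytes at `map_offset`, the message header is found `delta` bytes into the returned mapping.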
++ </refsect1>
++
++ <refsect1>
++ <title>Notifications</title>
++ <para>
++ A kernel notification is a regular kdbus message with the following
++ details.
++ </para>
++
++ <itemizedlist>
++ <listitem><para>
++ kdbus_msg.src_id == <constant>KDBUS_SRC_ID_KERNEL</constant>
++ </para></listitem>
++ <listitem><para>
++ kdbus_msg.dst_id == <constant>KDBUS_DST_ID_BROADCAST</constant>
++ </para></listitem>
++ <listitem><para>
++ kdbus_msg.payload_type == <constant>KDBUS_PAYLOAD_KERNEL</constant>
++ </para></listitem>
++ <listitem><para>
++ Has exactly one of the items attached that are described below.
++ </para></listitem>
++ <listitem><para>
++ Always has a timestamp item (<constant>KDBUS_ITEM_TIMESTAMP</constant>)
++ attached.
++ </para></listitem>
++ </itemizedlist>
++
++ <para>
++ The kernel will notify its users of the following events.
++ </para>
++
++ <itemizedlist>
++ <listitem><para>
++ When connection <emphasis>A</emphasis> is terminated while connection
++ <emphasis>B</emphasis> is waiting for a reply from it, connection
++ <emphasis>B</emphasis> is notified with a message with an item of
++ type <constant>KDBUS_ITEM_REPLY_DEAD</constant>.
++ </para></listitem>
++
++ <listitem><para>
++ When connection <emphasis>A</emphasis> does not receive a reply from
++ connection <emphasis>B</emphasis> within the specified timeout window,
++ connection <emphasis>A</emphasis> will receive a message with an
++ item of type <constant>KDBUS_ITEM_REPLY_TIMEOUT</constant>.
++ </para></listitem>
++
++ <listitem><para>
++ When an ordinary connection (not a monitor) is created on or removed
++ from a bus, messages with an item of type
++ <constant>KDBUS_ITEM_ID_ADD</constant> or
++ <constant>KDBUS_ITEM_ID_REMOVE</constant>, respectively, are delivered
++ to all bus members that match these messages through their match
++ database. Eavesdroppers (monitor connections) do not cause such
++ notifications to be sent. They are invisible on the bus.
++ </para></listitem>
++
++ <listitem><para>
++ When a connection gains or loses ownership of a name, messages with an
++ item of type <constant>KDBUS_ITEM_NAME_ADD</constant>,
++ <constant>KDBUS_ITEM_NAME_REMOVE</constant> or
++ <constant>KDBUS_ITEM_NAME_CHANGE</constant> are delivered to all bus
++ members that match these messages through their match database.
++ </para></listitem>
++ </itemizedlist>
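The header conditions that identify a kernel notification can be checked with a small predicate. The sketch below is an illustration only: the struct mirrors the fixed fields of struct kdbus_msg as listed earlier, all names carry an EX_ or ex_ prefix to mark them as local stand-ins, and the numeric values for the kernel source ID and the kernel payload type are assumptions that should be replaced by the constants from the real kdbus header. Only the broadcast value (~0ULL) is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value stated in the dst_id description. */
#define EX_DST_ID_BROADCAST (~0ULL)
/* Assumed values, not stated in the text; use the real header instead. */
#define EX_SRC_ID_KERNEL 0ULL
#define EX_PAYLOAD_KERNEL 0ULL

/* Fixed-size part of the message header, without the trailing items. */
struct ex_msg_header {
	uint64_t size;
	uint64_t flags;
	int64_t priority;
	uint64_t dst_id;
	uint64_t src_id;
	uint64_t payload_type;
	uint64_t cookie;
	uint64_t timeout_ns;
	uint64_t cookie_reply;
};

/* True if the header fields match the notification criteria above. */
static bool is_kernel_notification(const struct ex_msg_header *m)
{
	return m->src_id == EX_SRC_ID_KERNEL &&
	       m->dst_id == EX_DST_ID_BROADCAST &&
	       m->payload_type == EX_PAYLOAD_KERNEL;
}
```

A receiver would still need to walk the attached items to find out which event the notification reports.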
++ </refsect1>
++
++ <refsect1>
++ <title>Return value</title>
++ <para>
++ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++ on error, <errorcode>-1</errorcode> is returned, and
++ <varname>errno</varname> is set to indicate the error.
++ If the issued ioctl is illegal for the file descriptor used,
++ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++ </para>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_SEND</constant> may fail with the following
++ errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EOPNOTSUPP</constant></term>
++ <listitem><para>
++ The connection is not an ordinary connection, or the passed
++ file descriptors in the <constant>KDBUS_ITEM_FDS</constant> item are
++ either kdbus handles or unix domain sockets. Both are currently
++ unsupported.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ The submitted payload type is
++ <constant>KDBUS_PAYLOAD_KERNEL</constant>,
++ <constant>KDBUS_MSG_EXPECT_REPLY</constant> was set without timeout
++ or cookie values, <constant>KDBUS_SEND_SYNC_REPLY</constant> was
++ set without <constant>KDBUS_MSG_EXPECT_REPLY</constant>, an invalid
++ item was supplied, <constant>src_id</constant> was non-zero and was
++ different from the current connection's ID, a supplied memfd had a
++ size of 0, or a string was not properly null-terminated.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ENOTUNIQ</constant></term>
++ <listitem><para>
++ The supplied destination is
++ <constant>KDBUS_DST_ID_BROADCAST</constant> and either
++ file descriptors were passed, or
++ <constant>KDBUS_MSG_EXPECT_REPLY</constant> was set,
++ or a timeout was given.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>E2BIG</constant></term>
++ <listitem><para>
++ Too many items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EMSGSIZE</constant></term>
++ <listitem><para>
++ The size of the message header and items or the payload vector
++ is excessive.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EEXIST</constant></term>
++ <listitem><para>
++ Multiple <constant>KDBUS_ITEM_FDS</constant>,
++ <constant>KDBUS_ITEM_BLOOM_FILTER</constant> or
++ <constant>KDBUS_ITEM_DST_NAME</constant> items were supplied.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EBADF</constant></term>
++ <listitem><para>
++ The supplied <constant>KDBUS_ITEM_FDS</constant> or
++ <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant> items
++ contained an illegal file descriptor.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EMEDIUMTYPE</constant></term>
++ <listitem><para>
++ The supplied memfd is not a sealed kdbus memfd.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EMFILE</constant></term>
++ <listitem><para>
++ Too many file descriptors inside a
++ <constant>KDBUS_ITEM_FDS</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EBADMSG</constant></term>
++ <listitem><para>
++ An item had an illegal size, both a <constant>dst_id</constant> and a
++ <constant>KDBUS_ITEM_DST_NAME</constant> were given, or both a name
++ and a bloom filter were given.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ETXTBSY</constant></term>
++ <listitem><para>
++ The supplied kdbus memfd file cannot be sealed or the seal
++ was removed, because it is shared with other processes or
++ still mapped with
++ <citerefentry>
++ <refentrytitle>mmap</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ECOMM</constant></term>
++ <listitem><para>
++ A peer does not accept the file descriptors addressed to it.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EFAULT</constant></term>
++ <listitem><para>
++ The supplied bloom filter size was not 64-bit aligned, or supplied
++ memory could not be accessed by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EDOM</constant></term>
++ <listitem><para>
++ The supplied bloom filter size did not match the bloom filter
++ size of the bus.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EDESTADDRREQ</constant></term>
++ <listitem><para>
++ <constant>dst_id</constant> was set to
++ <constant>KDBUS_DST_ID_NAME</constant>, but no
++ <constant>KDBUS_ITEM_DST_NAME</constant> was attached.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ESRCH</constant></term>
++ <listitem><para>
++ The name to look up was not found in the name registry.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EADDRNOTAVAIL</constant></term>
++ <listitem><para>
++ <constant>KDBUS_MSG_NO_AUTO_START</constant> was given but the
++ destination connection is an activator.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ENXIO</constant></term>
++ <listitem><para>
++ The passed numeric destination connection ID couldn't be found,
++ or is not connected.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ECONNRESET</constant></term>
++ <listitem><para>
++ The destination connection is no longer active.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ETIMEDOUT</constant></term>
++ <listitem><para>
++ Timeout while synchronously waiting for a reply.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EINTR</constant></term>
++ <listitem><para>
++ Interrupted system call while synchronously waiting for a reply.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EPIPE</constant></term>
++ <listitem><para>
++ When sending a message, a synchronous reply from the receiving
++ connection was expected but the connection died before answering.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ENOBUFS</constant></term>
++ <listitem><para>
++ Too many pending messages on the receiver side.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EREMCHG</constant></term>
++ <listitem><para>
++ Both a well-known name and a unique name (ID) were given, but
++ the name is not currently owned by that connection.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EXFULL</constant></term>
++ <listitem><para>
++ The memory pool of the receiver is full.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EREMOTEIO</constant></term>
++ <listitem><para>
++ While synchronously waiting for a reply, the remote peer
++ failed with an I/O error.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_RECV</constant> may fail with the following
++ errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EOPNOTSUPP</constant></term>
++ <listitem><para>
++ The connection is not an ordinary connection, or the passed
++ file descriptors are either kdbus handles or unix domain
++ sockets. Both are currently unsupported.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ Invalid flags or offset.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EAGAIN</constant></term>
++ <listitem><para>
++ No message found in the queue.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++ </refsect1>
++
++ <refsect1>
++ <title>See Also</title>
++ <simplelist type="inline">
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.fs</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>clock_gettime</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>ioctl</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>poll</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>select</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>epoll</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>eventfd</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>memfd_create</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ </simplelist>
++ </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.name.xml b/Documentation/kdbus/kdbus.name.xml
+new file mode 100644
+index 0000000..3f5f6a6
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.name.xml
+@@ -0,0 +1,711 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.name">
++
++ <refentryinfo>
++ <title>kdbus.name</title>
++ <productname>kdbus.name</productname>
++ </refentryinfo>
++
++ <refmeta>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </refmeta>
++
++ <refnamediv>
++ <refname>kdbus.name</refname>
++ <refpurpose>kdbus name</refpurpose>
++ </refnamediv>
++
++ <refsect1>
++ <title>Description</title>
++ <para>
++ Each
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ instantiates a name registry to resolve well-known names into unique
++ connection IDs for message delivery. The registry will be queried when a
++ message is sent with <varname>kdbus_msg.dst_id</varname> set to
++ <constant>KDBUS_DST_ID_NAME</constant>, or when a registry dump is
++ requested with <constant>KDBUS_CMD_LIST</constant>.
++ </para>
++
++ <para>
++ All of the below is subject to policy rules for <emphasis>SEE</emphasis>
++ and <emphasis>OWN</emphasis> permissions. See
++ <citerefentry>
++ <refentrytitle>kdbus.policy</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Name validity</title>
++ <para>
++ A name has to comply with the following rules in order to be considered
++ valid.
++ </para>
++
++ <itemizedlist>
++ <listitem>
++ <para>
++ The name has two or more elements separated by a
++ '<literal>.</literal>' (period) character.
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ All elements must contain at least one character.
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ Each element must only contain the ASCII characters
++ <literal>[A-Z][a-z][0-9]_</literal> and must not begin with a
++ digit.
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ The name must contain at least one '<literal>.</literal>' (period)
++ character (and thus at least two elements).
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ The name must not begin with a '<literal>.</literal>' (period)
++ character.
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ The name must not exceed <constant>255</constant> characters in
++ length.
++ </para>
++ </listitem>
++ </itemizedlist>
++ </refsect1>
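The rules listed under "Name validity" can be sketched as a small userspace check. This is only an illustration of the rules as written in this page, not the kernel's own validator; the function name name_is_valid_sketch() and the macro NAME_MAX_LEN_SKETCH are hypothetical and do not exist in kdbus.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the 255-character limit quoted in the rules above. */
#define NAME_MAX_LEN_SKETCH 255

/* True for the ASCII characters [A-Z][a-z][0-9]_ */
static bool is_name_char(char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
	       (c >= '0' && c <= '9') || c == '_';
}

/* Hypothetical helper: checks a name against the documented rules. */
bool name_is_valid_sketch(const char *name)
{
	size_t len = strlen(name);
	size_t elements = 0;
	bool at_element_start = true;

	if (len == 0 || len > NAME_MAX_LEN_SKETCH)
		return false;

	for (size_t i = 0; i < len; i++) {
		char c = name[i];

		if (c == '.') {
			/* Leading '.' or ".." would yield an empty element. */
			if (at_element_start)
				return false;
			at_element_start = true;
			continue;
		}

		if (!is_name_char(c))
			return false;

		if (at_element_start) {
			/* Elements must not begin with a digit. */
			if (c >= '0' && c <= '9')
				return false;
			elements++;
			at_element_start = false;
		}
	}

	/* A trailing '.' leaves an empty last element. */
	if (at_element_start)
		return false;

	/* At least two elements, that is, at least one period. */
	return elements >= 2;
}
```

A caller would typically run such a check before placing the string into a KDBUS_ITEM_NAME item, so that obviously malformed names are caught before the ioctl is issued.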
++
++ <refsect1>
++ <title>Acquiring a name</title>
++ <para>
++ To acquire a name, a client uses the
++ <constant>KDBUS_CMD_NAME_ACQUIRE</constant> ioctl with
++ <type>struct kdbus_cmd</type> as argument.
++ </para>
++
++ <programlisting>
++struct kdbus_cmd {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>Flags to control details in the name acquisition.</para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_NAME_REPLACE_EXISTING</constant></term>
++ <listitem>
++ <para>
++ Acquiring a name that is already present usually fails,
++ unless this flag is set in the call, and
++ <constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant> (see below)
++ was set when the current owner of the name acquired it, or
++ if the current owner is an activator connection (see
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>).
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant></term>
++ <listitem>
++ <para>
++ Allow other connections to take over this name. When this
++ happens, the former owner of the name will be notified
++ of the name loss.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_NAME_QUEUE</constant></term>
++ <listitem>
++ <para>
++ A name that is already acquired by a connection cannot be
++ acquired again (unless the
++ <constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant> flag was
++ set during acquisition; see above).
++ However, a connection can put itself in a queue of
++ connections waiting for the name to be released. Once that
++ happens, the first connection in that queue becomes the new
++ owner and is notified accordingly.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++ <listitem>
++ <para>
++ Request a set of valid flags for this ioctl. When this bit is
++ set, no action is taken; the ioctl will fail with
++ <errorcode>-1</errorcode>, and <varname>errno</varname>
++ is set to <constant>EPROTO</constant>.
++ Once the ioctl has returned, the <varname>flags</varname>
++ field will have all bits set that the kernel recognizes as
++ valid for this command.
++ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++ cleared by the operation.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem>
++ <para>
++ Flags returned by the kernel. Currently, the following may be
++ returned by the kernel.
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_NAME_IN_QUEUE</constant></term>
++ <listitem>
++ <para>
++ The name was not acquired yet, but the connection was
++ placed in the queue of peers waiting for the name.
++ This can only happen if <constant>KDBUS_NAME_QUEUE</constant>
++ was set in the <varname>flags</varname> member (see above).
++ The connection will receive a name owner change notification
++ once the current owner has given up the name and its
++ ownership was transferred.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ Items to submit the name. Currently, one item of type
++ <constant>KDBUS_ITEM_NAME</constant> is expected and allowed, and
++ the contained string must be a valid bus name.
++ <constant>KDBUS_ITEM_NEGOTIATE</constant> may be used to probe for
++ valid item types. See
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for a detailed description of how this item is used.
++ </para>
++ <para>
++ Unrecognized items are rejected, and the ioctl will fail with
++ <varname>errno</varname> set to <errorname>EINVAL</errorname>.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect1>
++
++ <refsect1>
++ <title>Releasing a name</title>
++ <para>
++ A connection may release a name explicitly with the
++ <constant>KDBUS_CMD_NAME_RELEASE</constant> ioctl. If the connection was
++ an implementer of an activatable name, its pending messages are moved
++ back to the activator. If there are any connections queued up as waiters
++ for the name, the first one in the queue (the oldest entry) will become
++ the new owner. The same happens implicitly for all names once a
++ connection terminates. See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on connections.
++ </para>
++ <para>
++ The <constant>KDBUS_CMD_NAME_RELEASE</constant> ioctl uses the same data
++ structure as the acquisition call
++ (<constant>KDBUS_CMD_NAME_ACQUIRE</constant>),
++ but with slightly different field usage.
++ </para>
++
++ <programlisting>
++struct kdbus_cmd {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>
++ Flags to the command. Currently unused.
++ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++ valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
++ and the <varname>flags</varname> field is set to
++ <constant>0</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ Items to submit the name. Currently, one item of type
++ <constant>KDBUS_ITEM_NAME</constant> is expected and allowed, and
++ the contained string must be a valid bus name.
++ <constant>KDBUS_ITEM_NEGOTIATE</constant> may be used to probe for
++ valid item types. See
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for a detailed description of how this item is used.
++ </para>
++ <para>
++ Unrecognized items are rejected, and the ioctl will fail with
++ <varname>errno</varname> set to <constant>EINVAL</constant>.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect1>
++
++ <refsect1>
++ <title>Dumping the name registry</title>
++ <para>
++ A connection may request a complete or filtered dump of currently active
++ bus names with the <constant>KDBUS_CMD_LIST</constant> ioctl, which
++ takes a <type>struct kdbus_cmd_list</type> as argument.
++ </para>
++
++ <programlisting>
++struct kdbus_cmd_list {
++ __u64 flags;
++ __u64 return_flags;
++ __u64 offset;
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem>
++ <para>
++ Any combination of flags to specify which names should be dumped.
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_LIST_UNIQUE</constant></term>
++ <listitem>
++ <para>
++ List the unique (numeric) IDs of all connections, whether they
++ own a name or not.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_LIST_NAMES</constant></term>
++ <listitem>
++ <para>
++ List well-known names stored in the database which are
++ actively owned by a real connection (not an activator).
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_LIST_ACTIVATORS</constant></term>
++ <listitem>
++ <para>
++ List names that are owned by an activator.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_LIST_QUEUED</constant></term>
++ <listitem>
++ <para>
++ List connections that are not yet owning a name but are
++ waiting for it to become available.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++ <listitem>
++ <para>
++ Request a set of valid flags for this ioctl. When this bit is
++ set, no action is taken; the ioctl will fail with
++ <errorcode>-1</errorcode>, and <varname>errno</varname>
++ is set to <constant>EPROTO</constant>.
++ Once the ioctl has returned, the <varname>flags</varname>
++ field will have all bits set that the kernel recognizes as
++ valid for this command.
++ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++ cleared by the operation.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>offset</varname></term>
++ <listitem><para>
++ When the ioctl returns successfully, the offset to the name registry
++ dump inside the connection's pool will be stored in this field.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ The returned list of names is stored in a <type>struct kdbus_list</type>
++ that in turn contains an array of type <type>struct kdbus_info</type>.
++ The array size in bytes is given as <varname>list_size</varname>.
++ The fields inside <type>struct kdbus_info</type> are described next.
++ </para>
++
++ <programlisting>
++struct kdbus_info {
++ __u64 size;
++ __u64 id;
++ __u64 flags;
++ struct kdbus_item items[0];
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>id</varname></term>
++ <listitem><para>
++ The owning connection's unique ID.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>
++ The flags of the owning connection.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem>
++ <para>
++ Items containing the actual name. Currently, one item of type
++ <constant>KDBUS_ITEM_OWNED_NAME</constant> will be attached,
++ including the name's flags. In that item, the flags field of the
++ name may carry the following bits:
++ </para>
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant></term>
++ <listitem>
++ <para>
++ Other connections are allowed to take over this name from the
++ connection that owns it.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_NAME_IN_QUEUE</constant></term>
++ <listitem>
++ <para>
++ When retrieving a list of currently acquired names in the
++ registry, this flag indicates whether the connection
++ actually owns the name or is currently waiting for it to
++ become available.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_NAME_ACTIVATOR</constant></term>
++ <listitem>
++ <para>
++ An activator connection owns a name as a placeholder for an
++ implementer, which is started on demand by programs as soon
++ as the first message arrives. There's some more information
++ on this topic in
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para>
++ <para>
++ In contrast to
++ <constant>KDBUS_NAME_REPLACE_EXISTING</constant>,
++ when a name is taken over from an activator connection, all
++ the messages that have been queued in the activator
++ connection will be moved over to the new owner. The activator
++ connection will still be tracked for the name and will take
++ control again if the implementer connection terminates.
++ </para>
++ <para>
++ This flag cannot be used when acquiring a name, but is
++ implicitly set through <constant>KDBUS_CMD_HELLO</constant>
++ with <constant>KDBUS_HELLO_ACTIVATOR</constant> set in
++ <varname>kdbus_cmd_hello.conn_flags</varname>.
++ </para>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
++ <listitem>
++ <para>
++ Requests a set of valid flags for this ioctl. When this bit is
++ set, no action is taken; the ioctl will return
++ <errorcode>0</errorcode>, and the <varname>flags</varname>
++ field will have all bits set that are valid for this command.
++ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
++ cleared by the operation.
++ </para>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ The returned buffer must be freed with the
++ <constant>KDBUS_CMD_FREE</constant> ioctl when the user is finished with
++ it. See
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Return value</title>
++ <para>
++ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++ on error, <errorcode>-1</errorcode> is returned, and
++ <varname>errno</varname> is set to indicate the error.
++ If the issued ioctl is illegal for the file descriptor used,
++ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++ </para>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_NAME_ACQUIRE</constant> may fail with the following
++ errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ Illegal command flags, illegal name provided, or an activator
++ tried to acquire a second name.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EPERM</constant></term>
++ <listitem><para>
++ Policy prohibited name ownership.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EALREADY</constant></term>
++ <listitem><para>
++ Connection already owns that name.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EEXIST</constant></term>
++ <listitem><para>
++ The name already exists and cannot be taken over.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>E2BIG</constant></term>
++ <listitem><para>
++ The maximum number of well-known names per connection is exhausted.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_NAME_RELEASE</constant>
++ may fail with the following errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ Invalid command flags, or invalid name provided.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ESRCH</constant></term>
++ <listitem><para>
++ Name is not found in the registry.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EADDRINUSE</constant></term>
++ <listitem><para>
++ Name is owned by a different connection and can't be released.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_LIST</constant> may fail with the following
++ errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ Invalid command flags.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>ENOBUFS</constant></term>
++ <listitem><para>
++ No available memory in the connection's pool.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++ </refsect1>
++
++ <refsect1>
++ <title>See Also</title>
++ <simplelist type="inline">
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.policy</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ </simplelist>
++ </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.policy.xml b/Documentation/kdbus/kdbus.policy.xml
+new file mode 100644
+index 0000000..6732416
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.policy.xml
+@@ -0,0 +1,406 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.policy">
++
++ <refentryinfo>
++ <title>kdbus.policy</title>
++ <productname>kdbus.policy</productname>
++ </refentryinfo>
++
++ <refmeta>
++ <refentrytitle>kdbus.policy</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </refmeta>
++
++ <refnamediv>
++ <refname>kdbus.policy</refname>
++ <refpurpose>kdbus policy</refpurpose>
++ </refnamediv>
++
++ <refsect1>
++ <title>Description</title>
++
++ <para>
++ A kdbus policy restricts the possibilities of connections to own, see and
++ talk to well-known names. A policy can be associated with a bus (through a
++ policy holder connection) or a custom endpoint. kdbus stores its policy
++ information in a database that can be accessed through the following
++ ioctl commands:
++ </para>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_CMD_HELLO</constant></term>
++ <listitem><para>
++ When creating, or updating, a policy holder connection. See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_CMD_ENDPOINT_MAKE</constant></term>
++ <term><constant>KDBUS_CMD_ENDPOINT_UPDATE</constant></term>
++ <listitem><para>
++ When creating, or updating, a bus custom endpoint. See
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ In all cases, the name and policy access information is stored in items
++ of type <constant>KDBUS_ITEM_NAME</constant> and
++ <constant>KDBUS_ITEM_POLICY_ACCESS</constant>. For this transport, the
++ following rules apply.
++ </para>
++
++ <itemizedlist>
++ <listitem>
++ <para>
++ An item of type <constant>KDBUS_ITEM_NAME</constant> must be followed
++ by at least one <constant>KDBUS_ITEM_POLICY_ACCESS</constant> item.
++ </para>
++ </listitem>
++
++ <listitem>
++ <para>
++ An item of type <constant>KDBUS_ITEM_NAME</constant> can be followed
++ by an arbitrary number of
++ <constant>KDBUS_ITEM_POLICY_ACCESS</constant> items.
++ </para>
++ </listitem>
++
++ <listitem>
++ <para>
++ An arbitrary number of groups of names and access levels can be given.
++ </para>
++ </listitem>
++ </itemizedlist>
++
++ <para>
++ Names passed in items of type <constant>KDBUS_ITEM_NAME</constant> must
++ comply with the rules for valid names. See
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information.
++
++ The payload of an item of type
++ <constant>KDBUS_ITEM_POLICY_ACCESS</constant> is defined by the following
++ struct. For more information on the layout of items, please refer to
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para>
++
++ <programlisting>
++struct kdbus_policy_access {
++ __u64 type;
++ __u64 access;
++ __u64 id;
++};
++ </programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>type</varname></term>
++ <listitem>
++ <para>
++ One of the following.
++ </para>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_POLICY_ACCESS_USER</constant></term>
++ <listitem><para>
++ Grant access to a user with the UID stored in the
++ <varname>id</varname> field.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_POLICY_ACCESS_GROUP</constant></term>
++ <listitem><para>
++ Grant access to members of the group with the GID stored in the
++ <varname>id</varname> field.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_POLICY_ACCESS_WORLD</constant></term>
++ <listitem><para>
++ Grant access to everyone. The <varname>id</varname> field
++ is ignored.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>access</varname></term>
++ <listitem>
++ <para>
++ The access to grant. One of the following.
++ </para>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_POLICY_SEE</constant></term>
++ <listitem><para>
++ Allow the name to be seen.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_POLICY_TALK</constant></term>
++ <listitem><para>
++ Allow the name to be talked to.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_POLICY_OWN</constant></term>
++ <listitem><para>
++ Allow the name to be owned.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>id</varname></term>
++ <listitem><para>
++ For <constant>KDBUS_POLICY_ACCESS_USER</constant>, stores the UID.
++ For <constant>KDBUS_POLICY_ACCESS_GROUP</constant>, stores the GID.
++ </para></listitem>
++ </varlistentry>
++
++ </variablelist>
++
++ <para>
++ All endpoints of buses have an empty policy database by default.
++ Therefore, unless policy rules are added, all operations are denied by
++ default. Also see
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Wildcard names</title>
++ <para>
++ Policy holder connections may upload names that contain the wildcard
++ suffix (<literal>".*"</literal>). Such a policy entry is effective for
++ every well-known name that extends the provided name by exactly one more
++ level.
++
++ For example, the name <literal>"foo.bar.*"</literal> matches both
++ <literal>"foo.bar.baz"</literal> and
++ <literal>"foo.bar.bazbaz"</literal>, but not
++ <literal>"foo.bar.baz.baz"</literal>.
++
++ This allows connections to take control over multiple names that the
++ policy holder doesn't need to know about when uploading the policy.
++
++ Such wildcard entries are not allowed for custom endpoints.
++ </para>
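++ <para>
++ The matching rule described above can be illustrated with a minimal
++ sketch. The helper <function>policy_entry_matches()</function> is
++ hypothetical and not part of the kdbus API; it assumes that one level
++ is a non-empty run of characters that contains no dot.
++ </para>
++<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the kdbus API: returns true if the
 * policy entry `entry` applies to the well-known name `name`.
 * An entry ending in ".*" matches names that extend it by exactly one
 * more non-empty, dot-free level; any other entry must match exactly.
 */
static bool policy_entry_matches(const char *entry, const char *name)
{
	size_t entry_len, prefix_len;
	const char *rest;

	assert(entry != NULL && name != NULL);

	entry_len = strlen(entry);
	if (entry_len < 2 || strcmp(entry + entry_len - 2, ".*") != 0)
		return strcmp(entry, name) == 0;

	/* Keep the trailing dot: "foo.bar.*" yields the prefix "foo.bar." */
	prefix_len = entry_len - 1;
	if (strncmp(entry, name, prefix_len) != 0)
		return false;

	/* Exactly one more level: something must follow, without a dot. */
	rest = name + prefix_len;
	return *rest != '\0' && strchr(rest, '.') == NULL;
}
```
++]]></programlisting>
++ <para>
++ With this helper, <literal>"foo.bar.*"</literal> matches
++ <literal>"foo.bar.baz"</literal> but neither
++ <literal>"foo.bar"</literal> nor
++ <literal>"foo.bar.baz.baz"</literal>.
++ </para>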
++ </refsect1>
++
++ <refsect1>
++ <title>Privileged connections</title>
++ <para>
++ The policy database is overruled when action is taken by a privileged
++ connection. Please refer to
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information on what makes a connection privileged.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Examples</title>
++ <para>
++ For instance, a set of policy rules may look like this:
++ </para>
++
++ <programlisting>
++KDBUS_ITEM_NAME: str='org.foo.bar'
++KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, ID=1000
++KDBUS_ITEM_POLICY_ACCESS: type=USER, access=TALK, ID=1001
++KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=SEE
++
++KDBUS_ITEM_NAME: str='org.blah.baz'
++KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, ID=0
++KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=TALK
++ </programlisting>
++
++ <para>
++ That means that 'org.foo.bar' may only be owned by UID 1000, but every
++ user on the bus is allowed to see the name. However, only UID 1001 may
++ actually send a message to the connection and receive a reply from it.
++
++ The second rule allows 'org.blah.baz' to be owned by UID 0 only, but
++ every user may talk to it.
++ </para>
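++ <para>
++ As a rough illustration, the rules above can be modelled as a lookup
++ table. The types and the function <function>rule_allows()</function>
++ are hypothetical stand-ins, not the kdbus data structures. The sketch
++ assumes that access levels are ordered (SEE, then TALK, then OWN) and
++ that a higher grant implies the lower ones, which the text above does
++ not state. Group rules are not modelled.
++ </para>
++<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the policy access types and levels. */
enum access_type { ACCESS_USER, ACCESS_GROUP, ACCESS_WORLD };
enum access_level { LEVEL_SEE, LEVEL_TALK, LEVEL_OWN };

struct rule {
	const char *name;
	enum access_type type;
	enum access_level access;
	uint64_t id;		/* UID for ACCESS_USER, unused for WORLD */
};

/* The example policy from the listing above. */
static const struct rule example_rules[] = {
	{ "org.foo.bar",  ACCESS_USER,  LEVEL_OWN,  1000 },
	{ "org.foo.bar",  ACCESS_USER,  LEVEL_TALK, 1001 },
	{ "org.foo.bar",  ACCESS_WORLD, LEVEL_SEE,  0 },
	{ "org.blah.baz", ACCESS_USER,  LEVEL_OWN,  0 },
	{ "org.blah.baz", ACCESS_WORLD, LEVEL_TALK, 0 },
};

/*
 * Returns true if some rule for `name` grants `uid` at least `wanted`.
 * Assumption: a higher level implies the lower ones. With no matching
 * rule, the request is denied.
 */
static bool rule_allows(const char *name, uint64_t uid,
			enum access_level wanted)
{
	size_t i, n = sizeof(example_rules) / sizeof(example_rules[0]);

	assert(name != NULL);

	for (i = 0; i < n; i++) {
		const struct rule *r = &example_rules[i];

		if (strcmp(r->name, name) != 0)
			continue;
		if (r->type == ACCESS_GROUP)
			continue;	/* group membership is not modelled */
		if (r->type == ACCESS_USER && r->id != uid)
			continue;
		if (r->access >= wanted)
			return true;
	}
	return false;
}
```
++]]></programlisting>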
++ </refsect1>
++
++ <refsect1>
++ <title>TALK access and multiple well-known names per connection</title>
++ <para>
++ Note that TALK access is checked against all names of a connection. For
++ example, if a connection owns both <constant>'org.foo.bar'</constant> and
++ <constant>'org.blah.baz'</constant>, and the policy database allows
++ <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
++ permission is also granted to <constant>'org.foo.bar'</constant>. That
++ might sound illogical, but after all, we allow messages to be directed to
++ either the ID or a well-known name, and policy is applied to the
++ connection, not the name. In other words, the effective TALK policy for a
++ connection is the most permissive of all names the connection owns.
++
++ For broadcast messages, the receiver needs TALK permissions to the sender
++ to receive the broadcast.
++ </para>
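++ <para>
++ The "most permissive of all names" rule can be sketched as follows.
++ The structure <type>struct name_policy</type> and the function
++ <function>world_may_talk_to()</function> are hypothetical and heavily
++ simplified: they only record whether WORLD was granted TALK access to
++ a name, and ignore user and group rules.
++ </para>
++<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified policy entry: WORLD TALK access per name. */
struct name_policy {
	const char *name;
	bool world_may_talk;
};

/*
 * TALK access is checked against every name a connection owns, so the
 * connection is reachable by WORLD as soon as one owned name allows it.
 */
static bool world_may_talk_to(const struct name_policy *db, size_t n_db,
			      const char *const *owned, size_t n_owned)
{
	size_t i, j;

	assert(db != NULL || n_db == 0);
	assert(owned != NULL || n_owned == 0);

	for (i = 0; i < n_owned; i++)
		for (j = 0; j < n_db; j++)
			if (db[j].world_may_talk &&
			    strcmp(db[j].name, owned[i]) == 0)
				return true;
	return false;
}
```
++]]></programlisting>
++ <para>
++ With the example policy, a connection owning both
++ <constant>'org.foo.bar'</constant> and
++ <constant>'org.blah.baz'</constant> is reachable by WORLD, while a
++ connection owning only <constant>'org.foo.bar'</constant> is not.
++ </para>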
++ <para>
++ Both the endpoint and the bus policy databases are consulted to allow
++ name registry listing, owning a well-known name and message delivery.
++ If either check fails, the operation fails with
++ <varname>errno</varname> set to <constant>EPERM</constant>.
++
++ As a best practice, connections that own names with restricted TALK
++ access should not install matches. This avoids cases where the sent
++ message may pass the bloom filter due to false-positives and may also
++ satisfy the policy rules.
++
++ Also see
++ <citerefentry>
++ <refentrytitle>kdbus.match</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Implicit policies</title>
++ <para>
++ Depending on the type of the endpoint, a set of implicit rules that
++ override installed policies might be enforced.
++
++ On default endpoints, the following set is enforced and checked before
++ any user-supplied policy is checked.
++ </para>
++
++ <itemizedlist>
++ <listitem>
++ <para>
++ Privileged connections always override any installed policy. Those
++ connections could easily install their own policies, so there is no
++ reason to enforce installed policies.
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ Connections can always talk to connections of the same user. This
++ includes broadcast messages.
++ </para>
++ </listitem>
++ </itemizedlist>
++
++ <para>
++ Custom endpoints have stricter policies. The following rules apply:
++ </para>
++
++ <itemizedlist>
++ <listitem>
++ <para>
++ Policy rules are always enforced, even if the connection is a
++ privileged connection.
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ Policy rules are always enforced for <constant>TALK</constant> access,
++ even if both ends are running under the same user. This includes
++ broadcast messages.
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ To restrict the set of names that can be seen, endpoint policies can
++ install <constant>SEE</constant> policies.
++ </para>
++ </listitem>
++ </itemizedlist>
++ </refsect1>
++
++ <refsect1>
++ <title>See Also</title>
++ <simplelist type="inline">
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.fs</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ </simplelist>
++ </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.pool.xml b/Documentation/kdbus/kdbus.pool.xml
+new file mode 100644
+index 0000000..a9e16f1
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.pool.xml
+@@ -0,0 +1,326 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus.pool">
++
++ <refentryinfo>
++ <title>kdbus.pool</title>
++ <productname>kdbus.pool</productname>
++ </refentryinfo>
++
++ <refmeta>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </refmeta>
++
++ <refnamediv>
++ <refname>kdbus.pool</refname>
++ <refpurpose>kdbus pool</refpurpose>
++ </refnamediv>
++
++ <refsect1>
++ <title>Description</title>
++ <para>
++ A pool for data received from the kernel is installed for every
++ <emphasis>connection</emphasis> of the <emphasis>bus</emphasis>, and
++ is sized according to the information stored in the
++ <varname>pool_size</varname> member of <type>struct kdbus_cmd_hello</type>
++ when <constant>KDBUS_CMD_HELLO</constant> is employed. Internally, the
++ pool is segmented into <emphasis>slices</emphasis>, each referenced by its
++ <emphasis>offset</emphasis> in the pool, expressed in <type>bytes</type>.
++ See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more information about <constant>KDBUS_CMD_HELLO</constant>.
++ </para>
++
++ <para>
++ The pool is written to by the kernel when one of the following
++ <emphasis>ioctls</emphasis> is issued:
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_CMD_HELLO</constant></term>
++ <listitem><para>
++ ... to receive details about the bus the connection was made to
++ </para></listitem>
++ </varlistentry>
++ <varlistentry>
++ <term><constant>KDBUS_CMD_RECV</constant></term>
++ <listitem><para>
++ ... to receive a message
++ </para></listitem>
++ </varlistentry>
++ <varlistentry>
++ <term><constant>KDBUS_CMD_LIST</constant></term>
++ <listitem><para>
++ ... to dump the name registry
++ </para></listitem>
++ </varlistentry>
++ <varlistentry>
++ <term><constant>KDBUS_CMD_CONN_INFO</constant></term>
++ <listitem><para>
++ ... to retrieve information on a connection
++ </para></listitem>
++ </varlistentry>
++ <varlistentry>
++ <term><constant>KDBUS_CMD_BUS_CREATOR_INFO</constant></term>
++ <listitem><para>
++ ... to retrieve information about a connection's bus creator
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++
++ </para>
++ <para>
++ The <varname>offset</varname> fields returned by any of the
++ aforementioned ioctls describe offsets inside the pool. In order to make
++ the slice available for subsequent calls,
++ <constant>KDBUS_CMD_FREE</constant> has to be called on that offset
++ (see below). Otherwise, the pool will fill up, and the connection won't
++ be able to receive any more information through its pool.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Pool slice allocation</title>
++ <para>
++ Pool slices are allocated by the kernel in order to report information
++ back to a task, such as messages or a returned name list.
++ Allocation of pool slices cannot be initiated by userspace. See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ and
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for examples of commands that use the <emphasis>pool</emphasis> to
++ return data.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Accessing the pool memory</title>
++ <para>
++ Memory in the pool is read-only for userspace and may only be written
++ to by the kernel. To read from the pool memory, the caller is expected to
++ <citerefentry>
++ <refentrytitle>mmap</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ the buffer into its task, like this:
++ </para>
++ <programlisting>
++uint8_t *buf = mmap(NULL, size, PROT_READ, MAP_SHARED, conn_fd, 0);
++ </programlisting>
++
++ <para>
++ In order to map the entire pool, the <varname>size</varname> parameter in
++ the example above should be set to the value of the
++ <varname>pool_size</varname> member of
++ <type>struct kdbus_cmd_hello</type> when
++ <constant>KDBUS_CMD_HELLO</constant> was employed to create the
++ connection (see above).
++ </para>
++
++ <para>
++ The <emphasis>file descriptor</emphasis> used to map the memory must be
++ the one that was used to create the <emphasis>connection</emphasis>.
++ In other words, the one that was used to call
++ <constant>KDBUS_CMD_HELLO</constant>. See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++
++ <para>
++ Alternatively, instead of mapping the entire pool buffer, only parts
++ of it can be mapped. Every kdbus command that returns an
++ <emphasis>offset</emphasis> (see above) also reports a
++ <emphasis>size</emphasis> along with it, so a program can be written
++ to map only the portion of the pool needed to access a specific
++ <emphasis>slice</emphasis>.
++ </para>
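++ <para>
++ A partial mapping could look like the following sketch. Since
++ <function>mmap()</function> requires the file offset to be a multiple
++ of the page size, the slice offset is rounded down and the difference
++ is added back to the returned pointer. The names
++ <function>compute_slice_window()</function> and
++ <function>map_slice()</function> are hypothetical helpers, not part of
++ the kdbus API.
++ </para>
++<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Page-aligned window that covers one slice of the pool. */
struct slice_window {
	uint64_t map_offset;	/* offset to pass to mmap() */
	uint64_t map_len;	/* length to pass to mmap() and munmap() */
	uint64_t delta;		/* slice start relative to the mapping */
};

static struct slice_window compute_slice_window(uint64_t offset,
						uint64_t size,
						uint64_t page_size)
{
	struct slice_window w;

	assert(page_size != 0);

	w.map_offset = offset - (offset % page_size);
	w.delta = offset - w.map_offset;
	w.map_len = w.delta + size;
	return w;
}

/*
 * Maps only the slice at `offset` with length `size`. On success,
 * returns a pointer to the slice data and stores the values needed for
 * munmap() in *base_out and *len_out. Returns NULL on error.
 */
static const uint8_t *map_slice(int conn_fd, uint64_t offset, uint64_t size,
				void **base_out, size_t *len_out)
{
	long page = sysconf(_SC_PAGESIZE);
	struct slice_window w;
	void *base;

	if (page <= 0)
		return NULL;

	w = compute_slice_window(offset, size, (uint64_t)page);
	base = mmap(NULL, (size_t)w.map_len, PROT_READ, MAP_SHARED,
		    conn_fd, (off_t)w.map_offset);
	if (base == MAP_FAILED)
		return NULL;

	*base_out = base;
	*len_out = (size_t)w.map_len;
	return (const uint8_t *)base + w.delta;
}
```
++]]></programlisting>
++ <para>
++ The caller later passes the stored base pointer and length to
++ <function>munmap()</function>, not the pointer to the slice data.
++ </para>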
++
++ <para>
++ When access to the pool memory is no longer needed, programs should
++ call <function>munmap()</function> on the pointer returned by
++ <function>mmap()</function>.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Freeing pool slices</title>
++ <para>
++ The <constant>KDBUS_CMD_FREE</constant> ioctl is used to free a slice
++ inside the pool, describing an offset that was returned in an
++ <varname>offset</varname> field of another ioctl struct.
++ The <constant>KDBUS_CMD_FREE</constant> command takes a
++ <type>struct kdbus_cmd_free</type> as argument.
++ </para>
++
++<programlisting>
++struct kdbus_cmd_free {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __u64 offset;
++ struct kdbus_item items[0];
++};
++</programlisting>
++
++ <para>The fields in this struct are described below.</para>
++
++ <variablelist>
++ <varlistentry>
++ <term><varname>size</varname></term>
++ <listitem><para>
++ The overall size of the struct, including its items.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>flags</varname></term>
++ <listitem><para>
++ Currently unused.
++ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
++ valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
++ and the <varname>flags</varname> field is set to
++ <constant>0</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>return_flags</varname></term>
++ <listitem><para>
++ Flags returned by the kernel. Currently unused and always set to
++ <constant>0</constant> by the kernel.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>offset</varname></term>
++ <listitem><para>
++ The offset to free, as returned by other ioctls that allocated
++ memory for returned information.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><varname>items</varname></term>
++ <listitem><para>
++ Items to specify further details for the free command.
++ Currently unused.
++ Unrecognized items are rejected, and the ioctl will fail with
++ <varname>errno</varname> set to <constant>EINVAL</constant>.
++ All items except for
++ <constant>KDBUS_ITEM_NEGOTIATE</constant> (see
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ ) will be rejected.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
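++ <para>
++ The following sketch shows how such a command could be prepared when
++ no items are passed. Because the kdbus header may not be available,
++ it uses a local stand-in, <type>struct cmd_free_sketch</type>, that
++ mirrors only the fixed fields listed above; this type and
++ <function>make_cmd_free()</function> are assumptions made for
++ illustration.
++ </para>
++<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Local stand-in that mirrors the fixed part of struct kdbus_cmd_free.
 * The real definition comes from the kdbus header and ends with a
 * variable-length items array, which is left out here.
 */
struct cmd_free_sketch {
	uint64_t size;
	uint64_t flags;
	uint64_t return_flags;
	uint64_t offset;
};

static_assert(sizeof(struct cmd_free_sketch) == 4 * sizeof(uint64_t),
	      "unexpected padding in struct cmd_free_sketch");

/* Prepares a free command for the slice at `offset`, without items. */
static struct cmd_free_sketch make_cmd_free(uint64_t offset)
{
	struct cmd_free_sketch cmd;

	memset(&cmd, 0, sizeof(cmd));	/* flags and return_flags stay 0 */
	cmd.size = sizeof(cmd);		/* overall size, no items appended */
	cmd.offset = offset;
	return cmd;
}

/*
 * With the real header, the prepared command would then be passed to
 * ioctl() on the connection file descriptor together with
 * KDBUS_CMD_FREE, checking for a return value of -1 and errno.
 */
```
++]]></programlisting>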
++ </refsect1>
++
++ <refsect1>
++ <title>Return value</title>
++ <para>
++ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
++ on error, <errorcode>-1</errorcode> is returned, and
++ <varname>errno</varname> is set to indicate the error.
++ If the issued ioctl is illegal for the file descriptor used,
++ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
++ </para>
++
++ <refsect2>
++ <title>
++ <constant>KDBUS_CMD_FREE</constant> may fail with the following
++ errors
++ </title>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>ENXIO</constant></term>
++ <listitem><para>
++ No pool slice found at given offset.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ Invalid flags provided.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>EINVAL</constant></term>
++ <listitem><para>
++ The offset is valid, but the user is not allowed to free the slice.
++ This happens, for example, if the offset was retrieved with
++ <constant>KDBUS_RECV_PEEK</constant>.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++ </refsect2>
++ </refsect1>
++
++ <refsect1>
++ <title>See Also</title>
++ <simplelist type="inline">
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>mmap</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>munmap</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ </simplelist>
++ </refsect1>
++</refentry>
+diff --git a/Documentation/kdbus/kdbus.xml b/Documentation/kdbus/kdbus.xml
+new file mode 100644
+index 0000000..d8e7400
+--- /dev/null
++++ b/Documentation/kdbus/kdbus.xml
+@@ -0,0 +1,1012 @@
++<?xml version='1.0'?> <!--*-nxml-*-->
++<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
++ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
++
++<refentry id="kdbus">
++
++ <refentryinfo>
++ <title>kdbus</title>
++ <productname>kdbus</productname>
++ </refentryinfo>
++
++ <refmeta>
++ <refentrytitle>kdbus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </refmeta>
++
++ <refnamediv>
++ <refname>kdbus</refname>
++ <refpurpose>Kernel Message Bus</refpurpose>
++ </refnamediv>
++
++ <refsect1>
++ <title>Synopsis</title>
++ <para>
++ kdbus is an inter-process communication bus system controlled by the
++ kernel. It provides user-space with an API to create buses and send
++ unicast and multicast messages to one, or many, peers connected to the
++ same bus. It does not enforce any layout on the transmitted data, but
++ only provides the transport layer used for message interchange between
++ peers.
++ </para>
++ <para>
++ This set of man-pages gives a comprehensive overview of the kernel-level
++ API, with all ioctl commands, associated structs and bit masks. However,
++ most people will not use this API level directly, but rather let one of
++ the high-level abstraction libraries help them integrate D-Bus
++ functionality into their applications.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>Description</title>
++ <para>
++ kdbus provides a pseudo filesystem called <emphasis>kdbusfs</emphasis>,
++ which is usually mounted on <filename>/sys/fs/kdbus</filename>. Bus
++ primitives can be accessed as files and sub-directories underneath this
++ mount-point. Any advanced operations are done via
++ <function>ioctl()</function> on files created by
++ <emphasis>kdbusfs</emphasis>. Multiple mount-points of
++ <emphasis>kdbusfs</emphasis> are independent of each other. This allows
++ namespacing of kdbus by mounting a new instance of
++ <emphasis>kdbusfs</emphasis> in a new mount-namespace. kdbus calls these
++ mount instances domains and each bus belongs to exactly one domain.
++ </para>
++
++ <para>
++ kdbus was designed as a transport layer for D-Bus, but is in no way
++ limited to, nor controlled by, the D-Bus protocol specification. The D-Bus
++ protocol is one possible application layer on top of kdbus.
++ </para>
++
++ <para>
++ For the general D-Bus protocol specification, its payload format, its
++ marshaling, and its communication semantics, please refer to the
++ <ulink url="http://dbus.freedesktop.org/doc/dbus-specification.html">
++ D-Bus specification</ulink>.
++ </para>
++
++ </refsect1>
++
++ <refsect1>
++ <title>Terminology</title>
++
++ <refsect2>
++ <title>Domain</title>
++ <para>
++ A domain is a <emphasis>kdbusfs</emphasis> mount-point containing all
++ the bus primitives. Each domain is independent, and separate domains
++ do not affect each other.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Bus</title>
++ <para>
++ A bus is a named object inside a domain. Clients exchange messages
++ over a bus. Separate buses have no connection to each other;
++ messages can only be exchanged on the same bus. The default endpoint of
++ a bus, to which clients establish connections, is the "bus" file
++ /sys/fs/kdbus/&lt;bus name&gt;/bus.
++ Common operating system setups create one "system bus" per system,
++ and one "user bus" for every logged-in user. Applications or services
++ may create their own private buses. The kernel driver does not
++ distinguish between different bus types; they are all handled the same
++ way. See
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Endpoint</title>
++ <para>
++ An endpoint provides a file to talk to a bus. Opening an endpoint
++ creates a new connection to the bus to which the endpoint belongs. All
++ endpoints have unique names and are accessible as files underneath the
++ directory of a bus, e.g., /sys/fs/kdbus/&lt;bus&gt;/&lt;endpoint&gt;.
++ Every bus has a default endpoint called "bus".
++ A bus can optionally offer additional endpoints with custom names
++ to provide restricted access to the bus. Custom endpoints carry
++ additional policy which can be used to create sandboxes with
++ locked-down, limited, filtered access to a bus. See
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Connection</title>
++ <para>
++ A connection to a bus is created by opening an endpoint file of a
++ bus. Every ordinary client connection has a unique identifier on the
++ bus and can address messages to every other connection on the same
++ bus by using the peer's connection ID as the destination. See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Pool</title>
++ <para>
++ Each connection allocates a piece of shmem-backed memory that is
++ used to receive messages and answers to ioctl commands from the kernel.
++ It is never used to send anything to the kernel. In order to access that
++ memory, an application must mmap() it into its address space. See
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Well-known Name</title>
++ <para>
++ A connection can, in addition to its implicit unique connection ID,
++ request the ownership of a textual well-known name. Well-known names are
++ noted in reverse-domain notation, such as com.example.service1. A
++ connection that offers a service on a bus is usually reached by its
++ well-known name. An analogy of connection ID and well-known name is an
++ IP address and a DNS name associated with that address. See
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Message</title>
++ <para>
++ Connections can exchange messages with other connections by addressing
++ the peers with their connection ID or well-known name. A message
++ consists of a message header with information on how to route the
++ message, and the message payload, which is a logical byte stream of
++ arbitrary size. Messages can carry additional file descriptors to be
++ passed from one connection to another, just like passing file
++ descriptors over UNIX domain sockets. Every connection can specify which
++ set of metadata the kernel should attach to the message when it is
++ delivered to the receiving connection. Metadata contains information
++ like: system time stamps, UID, GID, TID, proc-starttime, well-known
++ names, process comm, process exe, process argv, cgroup, capabilities,
++ seclabel, audit session, loginuid and the connection's human-readable
++ name. See
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Item</title>
++ <para>
++ The API of kdbus implements the notion of items, submitted through and
++ returned by most ioctls, and stored inside data structures in the
++ connection's pool. See
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Broadcast, signal, filter, match</title>
++ <para>
++ Signals are messages that a receiver opts in for by installing a blob of
++ bytes, called a 'match'. Signal messages must always carry a
++ counter-part blob, called a 'filter', and signals are only delivered to
++ peers which have a match that white-lists the message's filter. Senders
++ of signal messages can use either a single connection ID as receiver,
++ or the special connection ID
++ <constant>KDBUS_DST_ID_BROADCAST</constant> to potentially send it to
++ all connections of a bus, following the logic described above. See
++ <citerefentry>
++ <refentrytitle>kdbus.match</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ and
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Policy</title>
++ <para>
++ A policy is a set of rules that define which connections can see, talk
++ to, or register a well-known name on the bus. A policy is attached to
++ buses and custom endpoints, and modified by policy holder connections or
++ owners of custom endpoints. See
++ <citerefentry>
++ <refentrytitle>kdbus.policy</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Privileged bus users</title>
++ <para>
++ A user connecting to the bus is considered privileged if it is either
++ the creator of the bus, or if it has the CAP_IPC_OWNER capability flag
++ set. See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++ </refsect2>
++ </refsect1>
++
++ <refsect1>
++ <title>Bus Layout</title>
++
++ <para>
++ A <emphasis>bus</emphasis> provides and defines an environment that peers
++ can connect to for message interchange. A bus is created via the kdbus
++ control interface and can be modified by the bus creator. It applies the
++ policy that controls all bus operations. The bus creator itself does not
++ participate as a peer. To establish a peer
++ <emphasis>connection</emphasis>, you have to open one of the
++ <emphasis>endpoints</emphasis> of a bus. Each bus provides a default
++ endpoint, but further endpoints can be created on-demand. Endpoints are
++ used to apply additional policies for all connections on this endpoint.
++ Thus, they provide additional filters to further restrict access of
++ specific connections to the bus.
++ </para>
++
++ <para>
++ The following shows an example bus layout:
++ </para>
++
++ <programlisting><![CDATA[
++                                Bus Creator
++                                     |
++                                     |
++                                  +-----+
++                                  | Bus |
++                                  +-----+
++                                     |
++                  __________________/ \__________________
++                 /                                       \
++                 |                                       |
++            +----------+                            +----------+
++            | Endpoint |                            | Endpoint |
++            +----------+                            +----------+
++       _________/|\_________                   _________/|\_________
++      /          |          \                 /          |          \
++      |          |          |                 |          |          |
++      |          |          |                 |          |          |
++  Connection Connection Connection        Connection Connection Connection
++ ]]></programlisting>
++
++ </refsect1>
++
++ <refsect1>
++ <title>Data structures and interconnections</title>
++ <programlisting><![CDATA[
++  +--------------------------------------------------------------------------+
++  | Domain (Mount Point)                                                     |
++  | /sys/fs/kdbus/control                                                    |
++  | +----------------------------------------------------------------------+ |
++  | | Bus (System Bus)                                                     | |
++  | | /sys/fs/kdbus/0-system/                                              | |
++  | | +-------------------------------+ +--------------------------------+ | |
++  | | | Endpoint                      | | Endpoint                       | | |
++  | | | /sys/fs/kdbus/0-system/bus    | | /sys/fs/kdbus/0-system/ep.app  | | |
++  | | +-------------------------------+ +--------------------------------+ | |
++  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
++  | | | Connection   | | Connection   | | Connection   | | Connection    | | |
++  | | |    :1.22     | |    :1.25     | |    :1.55     | |    :1.81      | | |
++  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
++  | +----------------------------------------------------------------------+ |
++  |                                                                          |
++  | +----------------------------------------------------------------------+ |
++  | | Bus (User Bus for UID 2702)                                          | |
++  | | /sys/fs/kdbus/2702-user/                                             | |
++  | | +-------------------------------+ +--------------------------------+ | |
++  | | | Endpoint                      | | Endpoint                       | | |
++  | | | /sys/fs/kdbus/2702-user/bus   | | /sys/fs/kdbus/2702-user/ep.app | | |
++  | | +-------------------------------+ +--------------------------------+ | |
++  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
++  | | | Connection   | | Connection   | | Connection   | | Connection    | | |
++  | | |    :1.22     | |    :1.25     | |    :1.55     | |    :1.81      | | |
++  | | +--------------+ +--------------+ +--------------+ +---------------+ | |
++  | +----------------------------------------------------------------------+ |
++  +--------------------------------------------------------------------------+
++ ]]></programlisting>
++ </refsect1>
++
++ <refsect1>
++ <title>Metadata</title>
++
++ <refsect2>
++ <title>When metadata is collected</title>
++ <para>
++ kdbus records data about the system in certain situations. Such metadata
++ can refer to the currently active process (creds, PIDs, current user
++ groups, process names and its executable path, cgroup membership,
++ capabilities, security label and audit information), connection
++ information (description string, currently owned names) and time stamps.
++ </para>
++ <para>
++ Metadata is collected at the following times.
++ </para>
++
++ <itemizedlist>
++ <listitem><para>
++ When a bus is created (<constant>KDBUS_CMD_MAKE</constant>),
++ information about the calling task is collected. This data is returned
++ by the kernel via the <constant>KDBUS_CMD_BUS_CREATOR_INFO</constant>
++ call.
++ </para></listitem>
++
++ <listitem>
++ <para>
++ When a connection is created (<constant>KDBUS_CMD_HELLO</constant>),
++ information about the calling task is collected. Alternatively, a
++ privileged connection may provide 'faked' information about
++ credentials, PIDs and security labels which will be stored instead.
++ This data is returned by the kernel as information on a connection
++ (<constant>KDBUS_CMD_CONN_INFO</constant>). Only metadata that a
++ connection allowed to be sent (by setting its bit in
++ <varname>attach_flags_send</varname>) will be exported in this way.
++ </para>
++ </listitem>
++
++ <listitem>
++ <para>
++ When a message is sent (<constant>KDBUS_CMD_SEND</constant>),
++ information about the sending task and the sending connection is
++ collected. This metadata will be attached to the message when it
++ arrives in the receiver's pool. If the connection sending the
++ message installed faked credentials (see
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>),
++ the message will not be augmented by any information about the
++ currently sending task. Note that only metadata that was requested
++ by the receiving connection will be collected and attached to
++ messages.
++ </para>
++ </listitem>
++ </itemizedlist>
++
++ <para>
++ Which metadata items are actually delivered depends on the following
++ sets and masks:
++ </para>
++
++ <itemizedlist>
++ <listitem><para>
++ (a) the system-wide kmod creds mask
++ (module parameter <varname>attach_flags_mask</varname>)
++ </para></listitem>
++
++ <listitem><para>
++ (b) the per-connection send creds mask, set by the connecting client
++ </para></listitem>
++
++ <listitem><para>
++ (c) the per-connection receive creds mask, set by the connecting
++ client
++ </para></listitem>
++
++ <listitem><para>
++ (d) the per-bus minimal creds mask, set by the bus creator
++ </para></listitem>
++
++ <listitem><para>
++ (e) the per-bus owner creds mask, set by the bus creator
++ </para></listitem>
++
++ <listitem><para>
++ (f) the mask specified when querying creds of a bus peer
++ </para></listitem>
++
++ <listitem><para>
++ (g) the mask specified when querying creds of a bus owner
++ </para></listitem>
++ </itemizedlist>
++
++ <para>
++ With the following rules:
++ </para>
++
++ <itemizedlist>
++ <listitem>
++ <para>
++ [1] The creds attached to messages are determined as
++ <constant>a &amp; b &amp; c</constant>.
++ </para>
++ </listitem>
++
++ <listitem>
++ <para>
++ [2] When connecting to a bus (<constant>KDBUS_CMD_HELLO</constant>)
++ and <constant>(~b &amp; d) != 0</constant>, the call will fail with
++ <errorcode>-1</errorcode>, and <varname>errno</varname> is set to
++ <constant>ECONNREFUSED</constant>.
++ </para>
++ </listitem>
++
++ <listitem>
++ <para>
++ [3] When querying creds of a bus peer, the creds returned are
++ <constant>a &amp; b &amp; f</constant>.
++ </para>
++ </listitem>
++
++ <listitem>
++ <para>
++ [4] When querying creds of a bus owner, the creds returned are
++ <constant>a & e & g</constant>.
++ </para>
++ </listitem>
++ </itemizedlist>
++
++ <para>
++ Hence, a program might not always receive all the metadata items that
++ it requested. Code must be written so that it can cope with this fact.
++ </para>
++ </refsect2>
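The mask rules [1] to [4] above can be sketched in a few lines of C. This is only an illustration of the bitwise arithmetic, not kernel code: struct attach_masks and the four helper functions are hypothetical names, and the three KDBUS_ATTACH_* values are local copies of the definitions in include/uapi/linux/kdbus.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of three attach flags from include/uapi/linux/kdbus.h. */
#define KDBUS_ATTACH_TIMESTAMP (1ULL << 0)
#define KDBUS_ATTACH_CREDS     (1ULL << 1)
#define KDBUS_ATTACH_PIDS      (1ULL << 2)

/* Hypothetical container for masks (a) to (e); not a kdbus type. */
struct attach_masks {
	uint64_t kmod;        /* (a) system-wide attach_flags_mask */
	uint64_t conn_send;   /* (b) per-connection send mask */
	uint64_t conn_recv;   /* (c) per-connection receive mask */
	uint64_t bus_minimal; /* (d) per-bus minimal mask */
	uint64_t bus_owner;   /* (e) per-bus owner mask */
};

/* Rule [1]: metadata attached to a message. */
static uint64_t msg_mask(const struct attach_masks *m)
{
	return m->kmod & m->conn_send & m->conn_recv;
}

/* Rule [2]: HELLO is refused if (b) lacks a bit that (d) requires. */
static bool hello_refused(const struct attach_masks *m)
{
	return (~m->conn_send & m->bus_minimal) != 0;
}

/* Rule [3]: f is the mask passed when querying a bus peer. */
static uint64_t peer_mask(const struct attach_masks *m, uint64_t f)
{
	return m->kmod & m->conn_send & f;
}

/* Rule [4]: g is the mask passed when querying the bus owner. */
static uint64_t owner_mask(const struct attach_masks *m, uint64_t g)
{
	return m->kmod & m->bus_owner & g;
}
```

Note the explicit parentheses in hello_refused(): in C, != binds more tightly than &, so the condition of rule [2] has to be written as (~b & d) != 0 to test the masked value.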
++
++ <refsect2>
++ <title>Benefits and heads-up</title>
++ <para>
++ Attaching metadata to messages has two major benefits.
++
++ <itemizedlist>
++ <listitem>
++ <para>
++ Metadata attached to messages is gathered at the moment when the
++ other side calls <constant>KDBUS_CMD_SEND</constant>, or,
++ respectively, when the kernel notification is generated. There is
++ no need for the receiving peer to retrieve information about the
++ task in a second step. This closes a race gap that would otherwise
++ be inherent.
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ As metadata is delivered along with messages in the same data
++ blob, no extra calls to kernel functions etc. are needed to gather
++ them.
++ </para>
++ </listitem>
++ </itemizedlist>
++
++ Note, however, that collecting metadata comes at a performance cost, so
++ developers should carefully assess which metadata they really need to
++ opt in for. As a best practice, data that is not needed as part
++ of a message should not be requested by the connection in the first
++ place (see <varname>attach_flags_recv</varname> in
++ <constant>KDBUS_CMD_HELLO</constant>).
++ </para>
++ </refsect2>
++
++ <refsect2>
++ <title>Attach flags for metadata items</title>
++ <para>
++ To tell the kernel which metadata to attach as items to the
++ aforementioned commands, a bitmask is used. In such a bitmask, the
++ following <emphasis>attach flags</emphasis> are currently supported.
++ Both the <varname>attach_flags_recv</varname> and
++ <varname>attach_flags_send</varname> fields of
++ <type>struct kdbus_cmd_hello</type>, as well as the payload of the
++ <constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant> and
++ <constant>KDBUS_ITEM_ATTACH_FLAGS_RECV</constant> items follow this
++ scheme.
++ </para>
++
++ <variablelist>
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_TIMESTAMP</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_TIMESTAMP</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_CREDS</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_CREDS</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_PIDS</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_PIDS</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_AUXGROUPS</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_AUXGROUPS</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_NAMES</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_OWNED_NAME</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_TID_COMM</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_TID_COMM</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_PID_COMM</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_PID_COMM</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_EXE</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_EXE</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_CMDLINE</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_CMDLINE</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_CGROUP</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_CGROUP</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_CAPS</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_CAPS</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_SECLABEL</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_SECLABEL</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_AUDIT</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_AUDIT</constant>.
++ </para></listitem>
++ </varlistentry>
++
++ <varlistentry>
++ <term><constant>KDBUS_ATTACH_CONN_DESCRIPTION</constant></term>
++ <listitem><para>
++ Requests the attachment of an item of type
++ <constant>KDBUS_ITEM_CONN_DESCRIPTION</constant>.
++ </para></listitem>
++ </varlistentry>
++ </variablelist>
++
++ <para>
++ Please refer to
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for detailed information about the layout and payload of items and
++ what each kind of metadata should be used for.
++ </para>
++ </refsect2>
++ </refsect1>
++
++ <refsect1>
++ <title>The ioctl interface</title>
++
++ <para>
++ As stated in the 'synopsis' section above, application developers are
++ strongly encouraged to use kdbus through one of the high-level D-Bus
++ abstraction libraries, rather than using the low-level API directly.
++ </para>
++
++ <para>
++ kdbus on the kernel level exposes its functions exclusively through
++ <citerefentry>
++ <refentrytitle>ioctl</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>,
++ employed on file descriptors returned by
++ <citerefentry>
++ <refentrytitle>open</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ on pseudo files exposed by
++ <citerefentry>
++ <refentrytitle>kdbus.fs</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para>
++ <para>
++ The following table lists all the ioctls, along with the command
++ structs they must be used with.
++ </para>
++
++ <informaltable frame="none">
++ <tgroup cols="3" colsep="1">
++ <thead>
++ <row>
++ <entry>ioctl signature</entry>
++ <entry>command</entry>
++ <entry>transported struct</entry>
++ </row>
++ </thead>
++ <tbody>
++ <row>
++ <entry><constant>0x40189500</constant></entry>
++ <entry><constant>KDBUS_CMD_BUS_MAKE</constant></entry>
++ <entry><type>struct kdbus_cmd *</type></entry>
++ </row><row>
++ <entry><constant>0x40189510</constant></entry>
++ <entry><constant>KDBUS_CMD_ENDPOINT_MAKE</constant></entry>
++ <entry><type>struct kdbus_cmd *</type></entry>
++ </row><row>
++ <entry><constant>0xc0609580</constant></entry>
++ <entry><constant>KDBUS_CMD_HELLO</constant></entry>
++ <entry><type>struct kdbus_cmd_hello *</type></entry>
++ </row><row>
++ <entry><constant>0x40189582</constant></entry>
++ <entry><constant>KDBUS_CMD_BYEBYE</constant></entry>
++ <entry><type>struct kdbus_cmd *</type></entry>
++ </row><row>
++ <entry><constant>0x40389590</constant></entry>
++ <entry><constant>KDBUS_CMD_SEND</constant></entry>
++ <entry><type>struct kdbus_cmd_send *</type></entry>
++ </row><row>
++ <entry><constant>0x80409591</constant></entry>
++ <entry><constant>KDBUS_CMD_RECV</constant></entry>
++ <entry><type>struct kdbus_cmd_recv *</type></entry>
++ </row><row>
++ <entry><constant>0x40209583</constant></entry>
++ <entry><constant>KDBUS_CMD_FREE</constant></entry>
++ <entry><type>struct kdbus_cmd_free *</type></entry>
++ </row><row>
++ <entry><constant>0x401895a0</constant></entry>
++ <entry><constant>KDBUS_CMD_NAME_ACQUIRE</constant></entry>
++ <entry><type>struct kdbus_cmd *</type></entry>
++ </row><row>
++ <entry><constant>0x401895a1</constant></entry>
++ <entry><constant>KDBUS_CMD_NAME_RELEASE</constant></entry>
++ <entry><type>struct kdbus_cmd *</type></entry>
++ </row><row>
++ <entry><constant>0x80289586</constant></entry>
++ <entry><constant>KDBUS_CMD_LIST</constant></entry>
++ <entry><type>struct kdbus_cmd_list *</type></entry>
++ </row><row>
++ <entry><constant>0x80309584</constant></entry>
++ <entry><constant>KDBUS_CMD_CONN_INFO</constant></entry>
++ <entry><type>struct kdbus_cmd_info *</type></entry>
++ </row><row>
++ <entry><constant>0x40209551</constant></entry>
++ <entry><constant>KDBUS_CMD_UPDATE</constant></entry>
++ <entry><type>struct kdbus_cmd *</type></entry>
++ </row><row>
++ <entry><constant>0x80309585</constant></entry>
++ <entry><constant>KDBUS_CMD_BUS_CREATOR_INFO</constant></entry>
++ <entry><type>struct kdbus_cmd_info *</type></entry>
++ </row><row>
++ <entry><constant>0x40189511</constant></entry>
++ <entry><constant>KDBUS_CMD_ENDPOINT_UPDATE</constant></entry>
++ <entry><type>struct kdbus_cmd *</type></entry>
++ </row><row>
++ <entry><constant>0x402095b0</constant></entry>
++ <entry><constant>KDBUS_CMD_MATCH_ADD</constant></entry>
++ <entry><type>struct kdbus_cmd_match *</type></entry>
++ </row><row>
++ <entry><constant>0x402095b1</constant></entry>
++ <entry><constant>KDBUS_CMD_MATCH_REMOVE</constant></entry>
++ <entry><type>struct kdbus_cmd_match *</type></entry>
++ </row>
++ </tbody>
++ </tgroup>
++ </informaltable>
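The signatures in the table above can be taken apart into direction, argument size, magic and command number. The sketch below assumes the common asm-generic ioctl encoding (8 bits number, 8 bits type, 14 bits size, 2 bits direction); some architectures use a different layout, and on Linux the _IOC_DIR(), _IOC_SIZE(), _IOC_TYPE() and _IOC_NR() macros from <linux/ioctl.h> should be preferred. struct ioctl_fields and decode_ioctl() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper type; not part of the kdbus API. */
struct ioctl_fields {
	unsigned int dir;  /* 1 = kernel reads the struct, 2 = kernel writes it, 3 = both */
	unsigned int size; /* size of the transported struct, in bytes */
	unsigned int type; /* magic; KDBUS_IOCTL_MAGIC is 0x95 */
	unsigned int nr;   /* command number */
};

/* Split a 32-bit ioctl signature, assuming the asm-generic bit layout. */
static struct ioctl_fields decode_ioctl(uint32_t sig)
{
	struct ioctl_fields f;

	f.nr   = sig & 0xff;
	f.type = (sig >> 8) & 0xff;
	f.size = (sig >> 16) & 0x3fff;
	f.dir  = (sig >> 30) & 0x3;
	return f;
}
```

Under that assumption, 0x40189500 (KDBUS_CMD_BUS_MAKE) decodes to a write-direction command with number 0x00, magic 0x95 and a 24-byte argument, while 0xc0609580 (KDBUS_CMD_HELLO) is a read/write command with number 0x80 and a 96-byte argument.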
++
++ <para>
++ Depending on the type of <emphasis>kdbusfs</emphasis> node that was
++ opened and what ioctls have been executed on a file descriptor before,
++ a different sub-set of ioctl commands is allowed.
++ </para>
++
++ <itemizedlist>
++ <listitem>
++ <para>
++ On a file descriptor resulting from opening a
++ <emphasis>control node</emphasis>, only the
++ <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl may be executed.
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ On a file descriptor resulting from opening a
++ <emphasis>bus endpoint node</emphasis>, only the
++ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> and
++ <constant>KDBUS_CMD_HELLO</constant> ioctls may be executed.
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ A file descriptor that was used to create a bus
++ (via <constant>KDBUS_CMD_BUS_MAKE</constant>) is called a
++ <emphasis>bus owner</emphasis> file descriptor. The bus will be
++ active as long as the file descriptor is kept open.
++ A bus owner file descriptor cannot be used to
++ issue any further ioctls. As soon as
++ <citerefentry>
++ <refentrytitle>close</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ is called on it, the bus will be shut down, along with all associated
++ endpoints and connections. See
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ A file descriptor that was used to create an endpoint
++ (via <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>) is called an
++ <emphasis>endpoint owner</emphasis> file descriptor. The endpoint
++ will be active as long as the file descriptor is kept open.
++ An endpoint owner file descriptor can only be used
++ to update details of an endpoint through the
++ <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant> ioctl. As soon as
++ <citerefentry>
++ <refentrytitle>close</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ is called on it, the endpoint will be removed from the bus, and all
++ connections that are connected to the bus through it are shut down.
++ See
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ for more details.
++ </para>
++ </listitem>
++ <listitem>
++ <para>
++ A file descriptor that was used to create a connection
++ (via <constant>KDBUS_CMD_HELLO</constant>) is called a
++ <emphasis>connection owner</emphasis> file descriptor. The connection
++ will be active as long as the file descriptor is kept open.
++ A connection owner file descriptor may be used to
++ issue any of the following ioctls.
++ </para>
++
++ <itemizedlist>
++ <listitem><para>
++ <constant>KDBUS_CMD_UPDATE</constant> to tweak details of the
++ connection. See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++
++ <listitem><para>
++ <constant>KDBUS_CMD_BYEBYE</constant> to shut down a connection
++ without losing messages. See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++
++ <listitem><para>
++ <constant>KDBUS_CMD_FREE</constant> to free a slice of memory in
++ the pool. See
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++
++ <listitem><para>
++ <constant>KDBUS_CMD_CONN_INFO</constant> to retrieve information
++ on other connections on the bus. See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++
++ <listitem><para>
++ <constant>KDBUS_CMD_BUS_CREATOR_INFO</constant> to retrieve
++ information on the bus creator. See
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++
++ <listitem><para>
++ <constant>KDBUS_CMD_LIST</constant> to retrieve a list of
++ currently active well-known names and unique IDs on the bus. See
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++
++ <listitem><para>
++ <constant>KDBUS_CMD_SEND</constant> and
++ <constant>KDBUS_CMD_RECV</constant> to send or receive a message.
++ See
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++
++ <listitem><para>
++ <constant>KDBUS_CMD_NAME_ACQUIRE</constant> and
++ <constant>KDBUS_CMD_NAME_RELEASE</constant> to acquire or release
++ a well-known name on the bus. See
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++
++ <listitem><para>
++ <constant>KDBUS_CMD_MATCH_ADD</constant> and
++ <constant>KDBUS_CMD_MATCH_REMOVE</constant> to add or remove
++ a match for signal messages. See
++ <citerefentry>
++ <refentrytitle>kdbus.match</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>.
++ </para></listitem>
++ </itemizedlist>
++ </listitem>
++ </itemizedlist>
++
++ <para>
++ These ioctls, along with the structs they transport, are explained in
++ detail in the other documents linked to in the "See Also" section below.
++ </para>
++ </refsect1>
++
++ <refsect1>
++ <title>See Also</title>
++ <simplelist type="inline">
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.bus</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.connection</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.endpoint</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.fs</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.item</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.message</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.name</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>kdbus.pool</refentrytitle>
++ <manvolnum>7</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>ioctl</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>mmap</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>open</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <citerefentry>
++ <refentrytitle>close</refentrytitle>
++ <manvolnum>2</manvolnum>
++ </citerefentry>
++ </member>
++ <member>
++ <ulink url="http://freedesktop.org/wiki/Software/dbus">D-Bus</ulink>
++ </member>
++ </simplelist>
++ </refsect1>
++
++</refentry>
+diff --git a/Documentation/kdbus/stylesheet.xsl b/Documentation/kdbus/stylesheet.xsl
+new file mode 100644
+index 0000000..52565ea
+--- /dev/null
++++ b/Documentation/kdbus/stylesheet.xsl
+@@ -0,0 +1,16 @@
++<?xml version="1.0" encoding="UTF-8"?>
++<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="1.0">
++ <param name="chunk.quietly">1</param>
++ <param name="funcsynopsis.style">ansi</param>
++ <param name="funcsynopsis.tabular.threshold">80</param>
++ <param name="callout.graphics">0</param>
++ <param name="paper.type">A4</param>
++ <param name="generate.section.toc.level">2</param>
++ <param name="use.id.as.filename">1</param>
++ <param name="citerefentry.link">1</param>
++ <strip-space elements="*"/>
++ <template name="generate.citerefentry.link">
++ <value-of select="refentrytitle"/>
++ <text>.html</text>
++ </template>
++</stylesheet>
+diff --git a/MAINTAINERS b/MAINTAINERS
+index d8afd29..02f7668 100644
+--- a/MAINTAINERS
++++ b/MAINTAINERS
+@@ -5585,6 +5585,19 @@ S: Maintained
+ F: Documentation/kbuild/kconfig-language.txt
+ F: scripts/kconfig/
+
++KDBUS
++M: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++M: Daniel Mack <daniel@zonque.org>
++M: David Herrmann <dh.herrmann@googlemail.com>
++M: Djalal Harouni <tixxdz@opendz.org>
++L: linux-kernel@vger.kernel.org
++S: Maintained
++F: ipc/kdbus/*
++F: samples/kdbus/*
++F: Documentation/kdbus/*
++F: include/uapi/linux/kdbus.h
++F: tools/testing/selftests/kdbus/
++
+ KDUMP
+ M: Vivek Goyal <vgoyal@redhat.com>
+ M: Haren Myneni <hbabu@us.ibm.com>
+diff --git a/Makefile b/Makefile
+index f5c8983..a1c8d57 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1343,6 +1343,7 @@ $(help-board-dirs): help-%:
+ %docs: scripts_basic FORCE
+ $(Q)$(MAKE) $(build)=scripts build_docproc
+ $(Q)$(MAKE) $(build)=Documentation/DocBook $@
++ $(Q)$(MAKE) $(build)=Documentation/kdbus $@
+
+ else # KBUILD_EXTMOD
+
+diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
+index 1a0006a..4842a98 100644
+--- a/include/uapi/linux/Kbuild
++++ b/include/uapi/linux/Kbuild
+@@ -215,6 +215,7 @@ header-y += ixjuser.h
+ header-y += jffs2.h
+ header-y += joystick.h
+ header-y += kcmp.h
++header-y += kdbus.h
+ header-y += kdev_t.h
+ header-y += kd.h
+ header-y += kernelcapi.h
+diff --git a/include/uapi/linux/kdbus.h b/include/uapi/linux/kdbus.h
+new file mode 100644
+index 0000000..4fc44cb
+--- /dev/null
++++ b/include/uapi/linux/kdbus.h
+@@ -0,0 +1,984 @@
++/*
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef _UAPI_KDBUS_H_
++#define _UAPI_KDBUS_H_
++
++#include <linux/ioctl.h>
++#include <linux/types.h>
++
++#define KDBUS_IOCTL_MAGIC 0x95
++#define KDBUS_SRC_ID_KERNEL (0)
++#define KDBUS_DST_ID_NAME (0)
++#define KDBUS_MATCH_ID_ANY (~0ULL)
++#define KDBUS_DST_ID_BROADCAST (~0ULL)
++#define KDBUS_FLAG_NEGOTIATE (1ULL << 63)
++
++/**
++ * struct kdbus_notify_id_change - name registry change message
++ * @id: New or former owner of the name
++ * @flags: flags field from KDBUS_HELLO_*
++ *
++ * Sent from kernel to userspace when the owner or activator of
++ * a well-known name changes.
++ *
++ * Attached to:
++ * KDBUS_ITEM_ID_ADD
++ * KDBUS_ITEM_ID_REMOVE
++ */
++struct kdbus_notify_id_change {
++ __u64 id;
++ __u64 flags;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_notify_name_change - name registry change message
++ * @old_id: ID and flags of former owner of a name
++ * @new_id: ID and flags of new owner of a name
++ * @name: Well-known name
++ *
++ * Sent from kernel to userspace when the owner or activator of
++ * a well-known name changes.
++ *
++ * Attached to:
++ * KDBUS_ITEM_NAME_ADD
++ * KDBUS_ITEM_NAME_REMOVE
++ * KDBUS_ITEM_NAME_CHANGE
++ */
++struct kdbus_notify_name_change {
++ struct kdbus_notify_id_change old_id;
++ struct kdbus_notify_id_change new_id;
++ char name[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_creds - process credentials
++ * @uid: User ID
++ * @euid: Effective UID
++ * @suid: Saved UID
++ * @fsuid: Filesystem UID
++ * @gid: Group ID
++ * @egid: Effective GID
++ * @sgid: Saved GID
++ * @fsgid: Filesystem GID
++ *
++ * Attached to:
++ * KDBUS_ITEM_CREDS
++ */
++struct kdbus_creds {
++ __u64 uid;
++ __u64 euid;
++ __u64 suid;
++ __u64 fsuid;
++ __u64 gid;
++ __u64 egid;
++ __u64 sgid;
++ __u64 fsgid;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_pids - process identifiers
++ * @pid: Process ID
++ * @tid: Thread ID
++ * @ppid: Parent process ID
++ *
++ * The PID, TID and parent PID of a process.
++ *
++ * Attached to:
++ * KDBUS_ITEM_PIDS
++ */
++struct kdbus_pids {
++ __u64 pid;
++ __u64 tid;
++ __u64 ppid;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_caps - process capabilities
++ * @last_cap: Highest currently known capability bit
++ * @caps: Variable number of 32-bit capabilities flags
++ *
++ * Contains a variable number of 32-bit capabilities flags.
++ *
++ * Attached to:
++ * KDBUS_ITEM_CAPS
++ */
++struct kdbus_caps {
++ __u32 last_cap;
++ __u32 caps[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_audit - audit information
++ * @sessionid: The audit session ID
++ * @loginuid: The audit login uid
++ *
++ * Attached to:
++ * KDBUS_ITEM_AUDIT
++ */
++struct kdbus_audit {
++ __u32 sessionid;
++ __u32 loginuid;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_timestamp
++ * @seqnum: Global per-domain message sequence number
++ * @monotonic_ns: Monotonic timestamp, in nanoseconds
++ * @realtime_ns: Realtime timestamp, in nanoseconds
++ *
++ * Attached to:
++ * KDBUS_ITEM_TIMESTAMP
++ */
++struct kdbus_timestamp {
++ __u64 seqnum;
++ __u64 monotonic_ns;
++ __u64 realtime_ns;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_vec - I/O vector for kdbus payload items
++ * @size: The size of the vector
++ * @address: Memory address of data buffer
++ * @offset: Offset in the in-message payload memory,
++ * relative to the message head
++ *
++ * Attached to:
++ * KDBUS_ITEM_PAYLOAD_VEC, KDBUS_ITEM_PAYLOAD_OFF
++ */
++struct kdbus_vec {
++ __u64 size;
++ union {
++ __u64 address;
++ __u64 offset;
++ };
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_bloom_parameter - bus-wide bloom parameters
++ * @size: Size of the bit field in bytes (m / 8)
++ * @n_hash: Number of hash functions used (k)
++ */
++struct kdbus_bloom_parameter {
++ __u64 size;
++ __u64 n_hash;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_bloom_filter - bloom filter containing n elements
++ * @generation: Generation of the element set in the filter
++ * @data: Bit field, multiple of 8 bytes
++ */
++struct kdbus_bloom_filter {
++ __u64 generation;
++ __u64 data[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_memfd - a kdbus memfd
++ * @start: The offset into the memfd where the segment starts
++ * @size: The size of the memfd segment
++ * @fd: The file descriptor number
++ * @__pad: Padding to ensure proper alignment and size
++ *
++ * Attached to:
++ * KDBUS_ITEM_PAYLOAD_MEMFD
++ */
++struct kdbus_memfd {
++ __u64 start;
++ __u64 size;
++ int fd;
++ __u32 __pad;
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_name - a registered well-known name with its flags
++ * @flags: Flags from KDBUS_NAME_*
++ * @name: Well-known name
++ *
++ * Attached to:
++ * KDBUS_ITEM_OWNED_NAME
++ */
++struct kdbus_name {
++ __u64 flags;
++ char name[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_policy_access_type - permissions of a policy record
++ * @_KDBUS_POLICY_ACCESS_NULL: Uninitialized/invalid
++ * @KDBUS_POLICY_ACCESS_USER: Grant access to a uid
++ * @KDBUS_POLICY_ACCESS_GROUP: Grant access to a gid
++ * @KDBUS_POLICY_ACCESS_WORLD: World-accessible
++ */
++enum kdbus_policy_access_type {
++ _KDBUS_POLICY_ACCESS_NULL,
++ KDBUS_POLICY_ACCESS_USER,
++ KDBUS_POLICY_ACCESS_GROUP,
++ KDBUS_POLICY_ACCESS_WORLD,
++};
++
++/**
++ * enum kdbus_policy_type - mode flags
++ * @KDBUS_POLICY_OWN: Allow to own a well-known name
++ * Implies KDBUS_POLICY_TALK and KDBUS_POLICY_SEE
++ * @KDBUS_POLICY_TALK: Allow communication to a well-known name
++ * Implies KDBUS_POLICY_SEE
++ * @KDBUS_POLICY_SEE: Allow to see a well-known name
++ */
++enum kdbus_policy_type {
++ KDBUS_POLICY_SEE = 0,
++ KDBUS_POLICY_TALK,
++ KDBUS_POLICY_OWN,
++};
++
++/**
++ * struct kdbus_policy_access - policy access item
++ * @type: One of KDBUS_POLICY_ACCESS_* types
++ * @access: Access to grant
++ * @id: For KDBUS_POLICY_ACCESS_USER, the uid
++ * For KDBUS_POLICY_ACCESS_GROUP, the gid
++ */
++struct kdbus_policy_access {
++ __u64 type; /* USER, GROUP, WORLD */
++ __u64 access; /* OWN, TALK, SEE */
++ __u64 id; /* uid, gid, 0 */
++} __attribute__((__aligned__(8)));
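The comment on enum kdbus_policy_type says that OWN implies TALK and SEE, and that TALK implies SEE. Because the enum values increase in that order (SEE = 0, TALK = 1, OWN = 2), one way to express the implication is a simple numeric comparison. The sketch below is an assumption about how a reader might model this, not a statement about how the kernel evaluates policy; policy_covers() and the POLICY_* constants are hypothetical local names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the values of enum kdbus_policy_type. */
enum { POLICY_SEE = 0, POLICY_TALK = 1, POLICY_OWN = 2 };

/* Hypothetical helper: does a granted access level cover a required one? */
static bool policy_covers(uint64_t granted, uint64_t required)
{
	return granted >= required;
}
```

With this model, a record granting OWN also satisfies a check for TALK or SEE, whereas a record granting only SEE does not satisfy a check for TALK.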
++
++/**
++ * enum kdbus_attach_flags - flags for metadata attachments
++ * @KDBUS_ATTACH_TIMESTAMP: Timestamp
++ * @KDBUS_ATTACH_CREDS: Credentials
++ * @KDBUS_ATTACH_PIDS: PIDs
++ * @KDBUS_ATTACH_AUXGROUPS: Auxiliary groups
++ * @KDBUS_ATTACH_NAMES: Well-known names
++ * @KDBUS_ATTACH_TID_COMM: The "comm" process identifier of the TID
++ * @KDBUS_ATTACH_PID_COMM: The "comm" process identifier of the PID
++ * @KDBUS_ATTACH_EXE: The path of the executable
++ * @KDBUS_ATTACH_CMDLINE: The process command line
++ * @KDBUS_ATTACH_CGROUP: The cgroup membership
++ * @KDBUS_ATTACH_CAPS: The process capabilities
++ * @KDBUS_ATTACH_SECLABEL: The security label
++ * @KDBUS_ATTACH_AUDIT: The audit IDs
++ * @KDBUS_ATTACH_CONN_DESCRIPTION: The human-readable connection name
++ * @_KDBUS_ATTACH_ALL: All of the above
++ * @_KDBUS_ATTACH_ANY: Wildcard match to enable any kind of
++ * metadata.
++ */
++enum kdbus_attach_flags {
++ KDBUS_ATTACH_TIMESTAMP = 1ULL << 0,
++ KDBUS_ATTACH_CREDS = 1ULL << 1,
++ KDBUS_ATTACH_PIDS = 1ULL << 2,
++ KDBUS_ATTACH_AUXGROUPS = 1ULL << 3,
++ KDBUS_ATTACH_NAMES = 1ULL << 4,
++ KDBUS_ATTACH_TID_COMM = 1ULL << 5,
++ KDBUS_ATTACH_PID_COMM = 1ULL << 6,
++ KDBUS_ATTACH_EXE = 1ULL << 7,
++ KDBUS_ATTACH_CMDLINE = 1ULL << 8,
++ KDBUS_ATTACH_CGROUP = 1ULL << 9,
++ KDBUS_ATTACH_CAPS = 1ULL << 10,
++ KDBUS_ATTACH_SECLABEL = 1ULL << 11,
++ KDBUS_ATTACH_AUDIT = 1ULL << 12,
++ KDBUS_ATTACH_CONN_DESCRIPTION = 1ULL << 13,
++ _KDBUS_ATTACH_ALL = (1ULL << 14) - 1,
++ _KDBUS_ATTACH_ANY = ~0ULL
++};
++
++/**
++ * enum kdbus_item_type - item types to chain data in a list
++ * @_KDBUS_ITEM_NULL: Uninitialized/invalid
++ * @_KDBUS_ITEM_USER_BASE: Start of user items
++ * @KDBUS_ITEM_NEGOTIATE: Negotiate supported items
++ * @KDBUS_ITEM_PAYLOAD_VEC: Vector to data
++ * @KDBUS_ITEM_PAYLOAD_OFF: Data at returned offset to message head
++ * @KDBUS_ITEM_PAYLOAD_MEMFD: Data as sealed memfd
++ * @KDBUS_ITEM_FDS: Attached file descriptors
++ * @KDBUS_ITEM_CANCEL_FD: FD used to cancel a synchronous
++ * operation by writing to it from
++ * userspace
++ * @KDBUS_ITEM_BLOOM_PARAMETER: Bus-wide bloom parameters, used with
++ * KDBUS_CMD_BUS_MAKE, carries a
++ * struct kdbus_bloom_parameter
++ * @KDBUS_ITEM_BLOOM_FILTER: Bloom filter carried with a message,
++ * used to match against a bloom mask of a
++ * connection, carries a struct
++ * kdbus_bloom_filter
++ * @KDBUS_ITEM_BLOOM_MASK: Bloom mask used to match against a
++ * message's bloom filter
++ * @KDBUS_ITEM_DST_NAME: Destination's well-known name
++ * @KDBUS_ITEM_MAKE_NAME: Name of domain, bus, endpoint
++ * @KDBUS_ITEM_ATTACH_FLAGS_SEND: Attach-flags, used for updating which
++ * metadata a connection opts in to send
++ * @KDBUS_ITEM_ATTACH_FLAGS_RECV: Attach-flags, used for updating which
++ * metadata a connection requests to
++ * receive for each received message
++ * @KDBUS_ITEM_ID: Connection ID
++ * @KDBUS_ITEM_NAME: Well-known name with flags
++ * @_KDBUS_ITEM_ATTACH_BASE: Start of metadata attach items
++ * @KDBUS_ITEM_TIMESTAMP: Timestamp
++ * @KDBUS_ITEM_CREDS: Process credentials
++ * @KDBUS_ITEM_PIDS: Process identifiers
++ * @KDBUS_ITEM_AUXGROUPS: Auxiliary process groups
++ * @KDBUS_ITEM_OWNED_NAME: A name owned by the associated
++ * connection
++ * @KDBUS_ITEM_TID_COMM: Thread ID "comm" identifier
++ * (Don't trust this, see below.)
++ * @KDBUS_ITEM_PID_COMM: Process ID "comm" identifier
++ * (Don't trust this, see below.)
++ * @KDBUS_ITEM_EXE: The path of the executable
++ * (Don't trust this, see below.)
++ * @KDBUS_ITEM_CMDLINE: The process command line
++ * (Don't trust this, see below.)
++ * @KDBUS_ITEM_CGROUP: The cgroup membership
++ * @KDBUS_ITEM_CAPS: The process capabilities
++ * @KDBUS_ITEM_SECLABEL: The security label
++ * @KDBUS_ITEM_AUDIT: The audit IDs
++ * @KDBUS_ITEM_CONN_DESCRIPTION: The connection's human-readable name
++ * (debugging)
++ * @_KDBUS_ITEM_POLICY_BASE: Start of policy items
++ * @KDBUS_ITEM_POLICY_ACCESS: Policy access block
++ * @_KDBUS_ITEM_KERNEL_BASE: Start of kernel-generated message items
++ * @KDBUS_ITEM_NAME_ADD: Notification in kdbus_notify_name_change
++ * @KDBUS_ITEM_NAME_REMOVE: Notification in kdbus_notify_name_change
++ * @KDBUS_ITEM_NAME_CHANGE: Notification in kdbus_notify_name_change
++ * @KDBUS_ITEM_ID_ADD: Notification in kdbus_notify_id_change
++ * @KDBUS_ITEM_ID_REMOVE: Notification in kdbus_notify_id_change
++ * @KDBUS_ITEM_REPLY_TIMEOUT: Timeout has been reached
++ * @KDBUS_ITEM_REPLY_DEAD: Destination died
++ *
++ * N.B: The process and thread COMM fields, as well as the CMDLINE and
++ * EXE fields may be altered by unprivileged processes and should
++ * hence *not* be used for security decisions. Peers should make use of
++ * these items only for informational purposes, such as generating log
++ * records.
++ */
++enum kdbus_item_type {
++ _KDBUS_ITEM_NULL,
++ _KDBUS_ITEM_USER_BASE,
++ KDBUS_ITEM_NEGOTIATE = _KDBUS_ITEM_USER_BASE,
++ KDBUS_ITEM_PAYLOAD_VEC,
++ KDBUS_ITEM_PAYLOAD_OFF,
++ KDBUS_ITEM_PAYLOAD_MEMFD,
++ KDBUS_ITEM_FDS,
++ KDBUS_ITEM_CANCEL_FD,
++ KDBUS_ITEM_BLOOM_PARAMETER,
++ KDBUS_ITEM_BLOOM_FILTER,
++ KDBUS_ITEM_BLOOM_MASK,
++ KDBUS_ITEM_DST_NAME,
++ KDBUS_ITEM_MAKE_NAME,
++ KDBUS_ITEM_ATTACH_FLAGS_SEND,
++ KDBUS_ITEM_ATTACH_FLAGS_RECV,
++ KDBUS_ITEM_ID,
++ KDBUS_ITEM_NAME,
++ KDBUS_ITEM_DST_ID,
++
++ /* keep these item types in sync with KDBUS_ATTACH_* flags */
++ _KDBUS_ITEM_ATTACH_BASE = 0x1000,
++ KDBUS_ITEM_TIMESTAMP = _KDBUS_ITEM_ATTACH_BASE,
++ KDBUS_ITEM_CREDS,
++ KDBUS_ITEM_PIDS,
++ KDBUS_ITEM_AUXGROUPS,
++ KDBUS_ITEM_OWNED_NAME,
++ KDBUS_ITEM_TID_COMM,
++ KDBUS_ITEM_PID_COMM,
++ KDBUS_ITEM_EXE,
++ KDBUS_ITEM_CMDLINE,
++ KDBUS_ITEM_CGROUP,
++ KDBUS_ITEM_CAPS,
++ KDBUS_ITEM_SECLABEL,
++ KDBUS_ITEM_AUDIT,
++ KDBUS_ITEM_CONN_DESCRIPTION,
++
++ _KDBUS_ITEM_POLICY_BASE = 0x2000,
++ KDBUS_ITEM_POLICY_ACCESS = _KDBUS_ITEM_POLICY_BASE,
++
++ _KDBUS_ITEM_KERNEL_BASE = 0x8000,
++ KDBUS_ITEM_NAME_ADD = _KDBUS_ITEM_KERNEL_BASE,
++ KDBUS_ITEM_NAME_REMOVE,
++ KDBUS_ITEM_NAME_CHANGE,
++ KDBUS_ITEM_ID_ADD,
++ KDBUS_ITEM_ID_REMOVE,
++ KDBUS_ITEM_REPLY_TIMEOUT,
++ KDBUS_ITEM_REPLY_DEAD,
++};
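The "keep these item types in sync" comment in the enum above means that the metadata item type for a given attach flag is _KDBUS_ITEM_ATTACH_BASE plus the bit position of that flag. A minimal sketch of that mapping follows; attach_flag_to_item_type(), ATTACH_BASE and ATTACH_COUNT are hypothetical names, and the constants mirror the values shown in this header.

```c
#include <stdint.h>

#define ATTACH_BASE  0x1000ULL /* mirrors _KDBUS_ITEM_ATTACH_BASE */
#define ATTACH_COUNT 14        /* flags 1ULL << 0 up to 1ULL << 13 */

/*
 * Hypothetical helper: map a single KDBUS_ATTACH_* bit to the matching
 * KDBUS_ITEM_* type. Returns 0 (_KDBUS_ITEM_NULL) unless exactly one
 * known flag bit is set.
 */
static uint64_t attach_flag_to_item_type(uint64_t flag)
{
	unsigned int bit;

	for (bit = 0; bit < ATTACH_COUNT; bit++)
		if (flag == (1ULL << bit))
			return ATTACH_BASE + bit;
	return 0;
}
```

For example, KDBUS_ATTACH_CREDS (1ULL << 1) maps to 0x1001, which is the value of KDBUS_ITEM_CREDS in the enum above.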
++
++/**
++ * struct kdbus_item - chain of data blocks
++ * @size: Overall data record size
++ * @type: Kdbus_item type of data
++ * @data: Generic bytes
++ * @data32: Generic 32 bit array
++ * @data64: Generic 64 bit array
++ * @str: Generic string
++ * @id: Connection ID
++ * @vec: KDBUS_ITEM_PAYLOAD_VEC
++ * @creds: KDBUS_ITEM_CREDS
++ * @audit: KDBUS_ITEM_AUDIT
++ * @timestamp: KDBUS_ITEM_TIMESTAMP
++ * @name: KDBUS_ITEM_NAME
++ * @bloom_parameter: KDBUS_ITEM_BLOOM_PARAMETER
++ * @bloom_filter: KDBUS_ITEM_BLOOM_FILTER
++ * @memfd: KDBUS_ITEM_PAYLOAD_MEMFD
++ * @name_change: KDBUS_ITEM_NAME_ADD
++ * KDBUS_ITEM_NAME_REMOVE
++ * KDBUS_ITEM_NAME_CHANGE
++ * @id_change: KDBUS_ITEM_ID_ADD
++ * KDBUS_ITEM_ID_REMOVE
++ * @policy_access: KDBUS_ITEM_POLICY_ACCESS
++ */
++struct kdbus_item {
++ __u64 size;
++ __u64 type;
++ union {
++ __u8 data[0];
++ __u32 data32[0];
++ __u64 data64[0];
++ char str[0];
++
++ __u64 id;
++ struct kdbus_vec vec;
++ struct kdbus_creds creds;
++ struct kdbus_pids pids;
++ struct kdbus_audit audit;
++ struct kdbus_caps caps;
++ struct kdbus_timestamp timestamp;
++ struct kdbus_name name;
++ struct kdbus_bloom_parameter bloom_parameter;
++ struct kdbus_bloom_filter bloom_filter;
++ struct kdbus_memfd memfd;
++ int fds[0];
++ struct kdbus_notify_name_change name_change;
++ struct kdbus_notify_id_change id_change;
++ struct kdbus_policy_access policy_access;
++ };
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_msg_flags - type of message
++ * @KDBUS_MSG_EXPECT_REPLY: Expect a reply message, used for
++ * method calls. The userspace-supplied
++ * cookie identifies the message and the
++ * respective reply carries the cookie
++ * in cookie_reply
++ * @KDBUS_MSG_NO_AUTO_START: Do not start a service if the addressed
++ * name is not currently active. This flag is
++ * not looked at by the kernel but only
++ * serves as a hint for userspace implementations.
++ * @KDBUS_MSG_SIGNAL: Treat this message as a signal
++ */
++enum kdbus_msg_flags {
++ KDBUS_MSG_EXPECT_REPLY = 1ULL << 0,
++ KDBUS_MSG_NO_AUTO_START = 1ULL << 1,
++ KDBUS_MSG_SIGNAL = 1ULL << 2,
++};
++
++/**
++ * enum kdbus_payload_type - type of payload carried by message
++ * @KDBUS_PAYLOAD_KERNEL: Kernel-generated simple message
++ * @KDBUS_PAYLOAD_DBUS: D-Bus marshalling "DBusDBus"
++ *
++ * Any payload-type is accepted. Common types will get added here once
++ * established.
++ */
++enum kdbus_payload_type {
++ KDBUS_PAYLOAD_KERNEL,
++ KDBUS_PAYLOAD_DBUS = 0x4442757344427573ULL,
++};
++
++/**
++ * struct kdbus_msg - the representation of a kdbus message
++ * @size: Total size of the message
++ * @flags: Message flags (KDBUS_MSG_*), userspace → kernel
++ * @priority: Message queue priority value
++ * @dst_id: 64-bit ID of the destination connection
++ * @src_id: 64-bit ID of the source connection
++ * @payload_type: Payload type (KDBUS_PAYLOAD_*)
++ * @cookie: Userspace-supplied cookie, for the connection
++ * to identify its messages
++ * @timeout_ns: The time to wait for a message reply from the peer.
++ * If there is no reply, and the send command is
++ * executed asynchronously, a kernel-generated message
++ * with an attached KDBUS_ITEM_REPLY_TIMEOUT item
++ * is sent to @src_id. For a synchronously executed send
++ * command, the value denotes the maximum time the call
++ * blocks to wait for a reply. The timeout is expected in
++ * nanoseconds and as absolute CLOCK_MONOTONIC value.
++ * @cookie_reply: A reply to the requesting message with the same
++ * cookie. The requesting connection can match its
++ * request and the reply with this value
++ * @items: A list of kdbus_items containing the message payload
++ */
++struct kdbus_msg {
++ __u64 size;
++ __u64 flags;
++ __s64 priority;
++ __u64 dst_id;
++ __u64 src_id;
++ __u64 payload_type;
++ __u64 cookie;
++ union {
++ __u64 timeout_ns;
++ __u64 cookie_reply;
++ };
++ struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_msg_info - returned message container
++ * @offset: Offset of kdbus_msg slice in pool
++ * @msg_size: Copy of the kdbus_msg.size field
++ * @return_flags: Command return flags, kernel → userspace
++ */
++struct kdbus_msg_info {
++ __u64 offset;
++ __u64 msg_size;
++ __u64 return_flags;
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_send_flags - flags for sending messages
++ * @KDBUS_SEND_SYNC_REPLY: Wait for destination connection to
++ * reply to this message. The
++ * KDBUS_CMD_SEND ioctl() will block
++ * until the reply is received, and
++ * reply in struct kdbus_cmd_send will
++ * yield the offset in the sender's pool
++ * where the reply can be found.
++ * This flag is only valid if
++ * @KDBUS_MSG_EXPECT_REPLY is set as well.
++ */
++enum kdbus_send_flags {
++ KDBUS_SEND_SYNC_REPLY = 1ULL << 0,
++};
++
++/**
++ * struct kdbus_cmd_send - send message
++ * @size: Overall size of this structure
++ * @flags: Flags to change send behavior (KDBUS_SEND_*)
++ * @return_flags: Command return flags, kernel → userspace
++ * @msg_address: Storage address of the kdbus_msg to send
++ * @reply: Storage for message reply if KDBUS_SEND_SYNC_REPLY
++ * was given
++ * @items: Additional items for this command
++ */
++struct kdbus_cmd_send {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __u64 msg_address;
++ struct kdbus_msg_info reply;
++ struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_recv_flags - flags for de-queuing messages
++ * @KDBUS_RECV_PEEK: Return the next queued message without
++ * actually de-queuing it, and without installing
++ * any file descriptors or other resources. It is
++ * usually used to determine the activating
++ * connection of a bus name.
++ * @KDBUS_RECV_DROP: Drop and free the next queued message and all
++ * its resources without actually receiving it.
++ * @KDBUS_RECV_USE_PRIORITY: Only de-queue messages with the specified or
++ * higher priority (lowest values); if not set,
++ * the priority value is ignored.
++ */
++enum kdbus_recv_flags {
++ KDBUS_RECV_PEEK = 1ULL << 0,
++ KDBUS_RECV_DROP = 1ULL << 1,
++ KDBUS_RECV_USE_PRIORITY = 1ULL << 2,
++};
++
++/**
++ * enum kdbus_recv_return_flags - return flags for message receive commands
++ * @KDBUS_RECV_RETURN_INCOMPLETE_FDS: One or more file descriptors could not
++ * be installed. These descriptors in
++ * KDBUS_ITEM_FDS will carry the value -1.
++ * @KDBUS_RECV_RETURN_DROPPED_MSGS: There have been dropped messages since
++ * the last time a message was received.
++ * The 'dropped_msgs' counter contains the
++ * number of messages dropped due to pool
++ * overflows or other missed broadcasts.
++ */
++enum kdbus_recv_return_flags {
++ KDBUS_RECV_RETURN_INCOMPLETE_FDS = 1ULL << 0,
++ KDBUS_RECV_RETURN_DROPPED_MSGS = 1ULL << 1,
++};
++
++/**
++ * struct kdbus_cmd_recv - struct to de-queue a buffered message
++ * @size: Overall size of this object
++ * @flags: KDBUS_RECV_* flags, userspace → kernel
++ * @return_flags: Command return flags, kernel → userspace
++ * @priority: Minimum priority of the messages to de-queue. Lowest
++ * values have the highest priority.
++ * @dropped_msgs: In case there were any dropped messages since the last
++ * time a message was received, this will be set to the
++ * number of lost messages and
++ * KDBUS_RECV_RETURN_DROPPED_MSGS will be set in
++ * 'return_flags'. This can only happen if the ioctl
++ * returns 0 or EAGAIN.
++ * @msg: Return storage for received message.
++ * @items: Additional items for this command.
++ *
++ * This struct is used with the KDBUS_CMD_RECV ioctl.
++ */
++struct kdbus_cmd_recv {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __s64 priority;
++ __u64 dropped_msgs;
++ struct kdbus_msg_info msg;
++ struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_cmd_free - struct to free a slice of memory in the pool
++ * @size: Overall size of this structure
++ * @flags: Flags for the free command, userspace → kernel
++ * @return_flags: Command return flags, kernel → userspace
++ * @offset: The offset of the memory slice, as returned by other
++ * ioctls
++ * @items: Additional items to modify the behavior
++ *
++ * This struct is used with the KDBUS_CMD_FREE ioctl.
++ */
++struct kdbus_cmd_free {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __u64 offset;
++ struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_hello_flags - flags for struct kdbus_cmd_hello
++ * @KDBUS_HELLO_ACCEPT_FD: The connection allows the reception of
++ * any passed file descriptors
++ * @KDBUS_HELLO_ACTIVATOR: Special-purpose connection which registers
++ * a well-known name for a process to be started
++ * when traffic arrives
++ * @KDBUS_HELLO_POLICY_HOLDER: Special-purpose connection which registers
++ * policy entries for a name. The provided name
++ * is not activated and not registered with the
++ * name database, it only allows unprivileged
++ * connections to acquire a name, talk or discover
++ * a service
++ * @KDBUS_HELLO_MONITOR: Special-purpose connection to monitor
++ * bus traffic
++ */
++enum kdbus_hello_flags {
++ KDBUS_HELLO_ACCEPT_FD = 1ULL << 0,
++ KDBUS_HELLO_ACTIVATOR = 1ULL << 1,
++ KDBUS_HELLO_POLICY_HOLDER = 1ULL << 2,
++ KDBUS_HELLO_MONITOR = 1ULL << 3,
++};
++
++/**
++ * struct kdbus_cmd_hello - struct to say hello to kdbus
++ * @size: The total size of the structure
++ * @flags: Connection flags (KDBUS_HELLO_*), userspace → kernel
++ * @return_flags: Command return flags, kernel → userspace
++ * @attach_flags_send: Mask of metadata to attach to each message sent
++ * off by this connection (KDBUS_ATTACH_*)
++ * @attach_flags_recv: Mask of metadata to attach to each message received
++ * by the new connection (KDBUS_ATTACH_*)
++ * @bus_flags: The flags field copied verbatim from the original
++ * KDBUS_CMD_BUS_MAKE ioctl. It's intended to be useful
++ * to do negotiation of features of the payload that is
++ * transferred (kernel → userspace)
++ * @id: The ID of this connection (kernel → userspace)
++ * @pool_size: Size of the connection's buffer where the received
++ * messages are placed
++ * @offset: Pool offset where items are returned to report
++ * additional information about the bus and the newly
++ * created connection.
++ * @items_size: Size of buffer returned in the pool slice at @offset.
++ * @id128: Unique 128-bit ID of the bus (kernel → userspace)
++ * @items: A list of items
++ *
++ * This struct is used with the KDBUS_CMD_HELLO ioctl.
++ */
++struct kdbus_cmd_hello {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __u64 attach_flags_send;
++ __u64 attach_flags_recv;
++ __u64 bus_flags;
++ __u64 id;
++ __u64 pool_size;
++ __u64 offset;
++ __u64 items_size;
++ __u8 id128[16];
++ struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_info - connection information
++ * @size: total size of the struct
++ * @id: 64bit object ID
++ * @flags: object creation flags
++ * @items: list of items
++ *
++ * Note that the user is responsible for freeing the allocated memory with
++ * the KDBUS_CMD_FREE ioctl.
++ */
++struct kdbus_info {
++ __u64 size;
++ __u64 id;
++ __u64 flags;
++ struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_list_flags - what to include into the returned list
++ * @KDBUS_LIST_UNIQUE: active connections
++ * @KDBUS_LIST_ACTIVATORS: activator connections
++ * @KDBUS_LIST_NAMES: known well-known names
++ * @KDBUS_LIST_QUEUED: queued-up names
++ */
++enum kdbus_list_flags {
++ KDBUS_LIST_UNIQUE = 1ULL << 0,
++ KDBUS_LIST_NAMES = 1ULL << 1,
++ KDBUS_LIST_ACTIVATORS = 1ULL << 2,
++ KDBUS_LIST_QUEUED = 1ULL << 3,
++};
++
++/**
++ * struct kdbus_cmd_list - list connections
++ * @size: overall size of this object
++ * @flags: flags for the query (KDBUS_LIST_*), userspace → kernel
++ * @return_flags: command return flags, kernel → userspace
++ * @offset: Offset in the caller's pool buffer where an array of
++ * kdbus_info objects is stored.
++ * The user must use KDBUS_CMD_FREE to free the
++ * allocated memory.
++ * @list_size: size of returned list in bytes
++ * @items: Items for the command. Reserved for future use.
++ *
++ * This structure is used with the KDBUS_CMD_LIST ioctl.
++ */
++struct kdbus_cmd_list {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __u64 offset;
++ __u64 list_size;
++ struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * struct kdbus_cmd_info - struct used for KDBUS_CMD_CONN_INFO ioctl
++ * @size: The total size of the struct
++ * @flags: Flags for this ioctl, userspace → kernel
++ * @return_flags: Command return flags, kernel → userspace
++ * @id: The 64-bit ID of the connection. If set to zero, passing
++ * @name is required. kdbus will look up the name to
++ * determine the ID in this case.
++ * @attach_flags: Set of attach flags to specify the set of information
++ * to receive, userspace → kernel
++ * @offset: Returned offset in the caller's pool buffer where the
++ * kdbus_info struct result is stored. The user must
++ * use KDBUS_CMD_FREE to free the allocated memory.
++ * @info_size: Output buffer to report size of data at @offset.
++ * @items: The optional item list, containing the
++ * well-known name to look up as a KDBUS_ITEM_NAME.
++ * Only needed in case @id is zero.
++ *
++ * On success, the KDBUS_CMD_CONN_INFO ioctl will return 0 and @offset will
++ * tell the user the offset in the connection pool buffer at which to find the
++ * result in a struct kdbus_info.
++ */
++struct kdbus_cmd_info {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __u64 id;
++ __u64 attach_flags;
++ __u64 offset;
++ __u64 info_size;
++ struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_cmd_match_flags - flags to control the KDBUS_CMD_MATCH_ADD ioctl
++ * @KDBUS_MATCH_REPLACE: If entries with the supplied cookie already
++ * exist, remove them before installing the new
++ * matches.
++ */
++enum kdbus_cmd_match_flags {
++ KDBUS_MATCH_REPLACE = 1ULL << 0,
++};
++
++/**
++ * struct kdbus_cmd_match - struct to add or remove matches
++ * @size: The total size of the struct
++ * @flags: Flags for match command (KDBUS_MATCH_*),
++ * userspace → kernel
++ * @return_flags: Command return flags, kernel → userspace
++ * @cookie: Userspace supplied cookie. When removing, the cookie
++ * identifies the match to remove
++ * @items: A list of items for additional information
++ *
++ * This structure is used with the KDBUS_CMD_MATCH_ADD and
++ * KDBUS_CMD_MATCH_REMOVE ioctl.
++ */
++struct kdbus_cmd_match {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ __u64 cookie;
++ struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * enum kdbus_make_flags - Flags for KDBUS_CMD_{BUS,ENDPOINT}_MAKE
++ * @KDBUS_MAKE_ACCESS_GROUP: Make the bus or endpoint node group-accessible
++ * @KDBUS_MAKE_ACCESS_WORLD: Make the bus or endpoint node world-accessible
++ */
++enum kdbus_make_flags {
++ KDBUS_MAKE_ACCESS_GROUP = 1ULL << 0,
++ KDBUS_MAKE_ACCESS_WORLD = 1ULL << 1,
++};
++
++/**
++ * enum kdbus_name_flags - flags for KDBUS_CMD_NAME_ACQUIRE
++ * @KDBUS_NAME_REPLACE_EXISTING: Try to replace name of other connections
++ * @KDBUS_NAME_ALLOW_REPLACEMENT: Allow the replacement of the name
++ * @KDBUS_NAME_QUEUE: Name should be queued if busy
++ * @KDBUS_NAME_IN_QUEUE: Name is queued
++ * @KDBUS_NAME_ACTIVATOR: Name is owned by an activator connection
++ * @KDBUS_NAME_PRIMARY: Primary owner of the name
++ * @KDBUS_NAME_ACQUIRED: Name was acquired/queued _now_
++ */
++enum kdbus_name_flags {
++ KDBUS_NAME_REPLACE_EXISTING = 1ULL << 0,
++ KDBUS_NAME_ALLOW_REPLACEMENT = 1ULL << 1,
++ KDBUS_NAME_QUEUE = 1ULL << 2,
++ KDBUS_NAME_IN_QUEUE = 1ULL << 3,
++ KDBUS_NAME_ACTIVATOR = 1ULL << 4,
++ KDBUS_NAME_PRIMARY = 1ULL << 5,
++ KDBUS_NAME_ACQUIRED = 1ULL << 6,
++};
++
++/**
++ * struct kdbus_cmd - generic ioctl payload
++ * @size: Overall size of this structure
++ * @flags: Flags for this ioctl, userspace → kernel
++ * @return_flags: Ioctl return flags, kernel → userspace
++ * @items: Additional items to modify the behavior
++ *
++ * This is a generic ioctl payload object. It's used by all ioctls that only
++ * take flags and items as input.
++ */
++struct kdbus_cmd {
++ __u64 size;
++ __u64 flags;
++ __u64 return_flags;
++ struct kdbus_item items[0];
++} __attribute__((__aligned__(8)));
++
++/**
++ * Ioctl API
++ *
++ * KDBUS_CMD_BUS_MAKE: After opening the "control" node, this command
++ * creates a new bus with the specified
++ * name. The bus is immediately shut down and
++ * cleaned up when the opened file descriptor is
++ * closed.
++ *
++ * KDBUS_CMD_ENDPOINT_MAKE: Creates a new named special endpoint to talk to
++ * the bus. Such endpoints usually carry a more
++ * restrictive policy and grant restricted access
++ * to specific applications.
++ * KDBUS_CMD_ENDPOINT_UPDATE: Update the properties of a custom endpoint. Used
++ * to update the policy.
++ *
++ * KDBUS_CMD_HELLO: By opening the bus node, a connection is
++ * created. After a HELLO the opened connection
++ * becomes an active peer on the bus.
++ * KDBUS_CMD_UPDATE: Update the properties of a connection. Used to
++ * update the metadata subscription mask and
++ * policy.
++ * KDBUS_CMD_BYEBYE: Disconnect a connection. If there are no
++ * messages queued up in the connection's pool,
++ * the call succeeds, and the handle is rendered
++ * unusable. Otherwise, -EBUSY is returned without
++ * any further side-effects.
++ * KDBUS_CMD_FREE: Release the allocated memory in the receiver's
++ * pool.
++ * KDBUS_CMD_CONN_INFO: Retrieve credentials and properties of the
++ * initial creator of the connection. The data was
++ * stored at registration time and does not
++ * necessarily represent the connected process or
++ * the actual state of the process.
++ * KDBUS_CMD_BUS_CREATOR_INFO: Retrieve information of the creator of the bus
++ * a connection is attached to.
++ *
++ * KDBUS_CMD_SEND: Send a message and pass data from userspace to
++ * the kernel.
++ * KDBUS_CMD_RECV: Receive a message from the kernel which is
++ * placed in the receiver's pool.
++ *
++ * KDBUS_CMD_NAME_ACQUIRE: Request a well-known bus name to associate with
++ * the connection. Well-known names are used to
++ * address a peer on the bus.
++ * KDBUS_CMD_NAME_RELEASE: Release a well-known name the connection
++ * currently owns.
++ * KDBUS_CMD_LIST: Retrieve the list of all currently registered
++ * well-known and unique names.
++ *
++ * KDBUS_CMD_MATCH_ADD: Install a match selecting which broadcast messages should
++ * be delivered to the connection.
++ * KDBUS_CMD_MATCH_REMOVE: Remove a current match for broadcast messages.
++ */
++enum kdbus_ioctl_type {
++ /* bus owner (00-0f) */
++ KDBUS_CMD_BUS_MAKE = _IOW(KDBUS_IOCTL_MAGIC, 0x00,
++ struct kdbus_cmd),
++
++ /* endpoint owner (10-1f) */
++ KDBUS_CMD_ENDPOINT_MAKE = _IOW(KDBUS_IOCTL_MAGIC, 0x10,
++ struct kdbus_cmd),
++ KDBUS_CMD_ENDPOINT_UPDATE = _IOW(KDBUS_IOCTL_MAGIC, 0x11,
++ struct kdbus_cmd),
++
++ /* connection owner (80-ff) */
++ KDBUS_CMD_HELLO = _IOWR(KDBUS_IOCTL_MAGIC, 0x80,
++ struct kdbus_cmd_hello),
++ KDBUS_CMD_UPDATE = _IOW(KDBUS_IOCTL_MAGIC, 0x81,
++ struct kdbus_cmd),
++ KDBUS_CMD_BYEBYE = _IOW(KDBUS_IOCTL_MAGIC, 0x82,
++ struct kdbus_cmd),
++ KDBUS_CMD_FREE = _IOW(KDBUS_IOCTL_MAGIC, 0x83,
++ struct kdbus_cmd_free),
++ KDBUS_CMD_CONN_INFO = _IOR(KDBUS_IOCTL_MAGIC, 0x84,
++ struct kdbus_cmd_info),
++ KDBUS_CMD_BUS_CREATOR_INFO = _IOR(KDBUS_IOCTL_MAGIC, 0x85,
++ struct kdbus_cmd_info),
++ KDBUS_CMD_LIST = _IOR(KDBUS_IOCTL_MAGIC, 0x86,
++ struct kdbus_cmd_list),
++
++ KDBUS_CMD_SEND = _IOW(KDBUS_IOCTL_MAGIC, 0x90,
++ struct kdbus_cmd_send),
++ KDBUS_CMD_RECV = _IOR(KDBUS_IOCTL_MAGIC, 0x91,
++ struct kdbus_cmd_recv),
++
++ KDBUS_CMD_NAME_ACQUIRE = _IOW(KDBUS_IOCTL_MAGIC, 0xa0,
++ struct kdbus_cmd),
++ KDBUS_CMD_NAME_RELEASE = _IOW(KDBUS_IOCTL_MAGIC, 0xa1,
++ struct kdbus_cmd),
++
++ KDBUS_CMD_MATCH_ADD = _IOW(KDBUS_IOCTL_MAGIC, 0xb0,
++ struct kdbus_cmd_match),
++ KDBUS_CMD_MATCH_REMOVE = _IOW(KDBUS_IOCTL_MAGIC, 0xb1,
++ struct kdbus_cmd_match),
++};
++
++#endif /* _UAPI_KDBUS_H_ */
+diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
+index 7b1425a..ce2ac5a 100644
+--- a/include/uapi/linux/magic.h
++++ b/include/uapi/linux/magic.h
+@@ -76,4 +76,6 @@
+ #define BTRFS_TEST_MAGIC 0x73727279
+ #define NSFS_MAGIC 0x6e736673
+
++#define KDBUS_SUPER_MAGIC 0x44427573
++
+ #endif /* __LINUX_MAGIC_H__ */
+diff --git a/init/Kconfig b/init/Kconfig
+index dc24dec..9388071 100644
+--- a/init/Kconfig
++++ b/init/Kconfig
+@@ -261,6 +261,19 @@ config POSIX_MQUEUE_SYSCTL
+ depends on SYSCTL
+ default y
+
++config KDBUS
++ tristate "kdbus interprocess communication"
++ depends on TMPFS
++ help
++ D-Bus is a system for low-latency, low-overhead, easy to use
++ interprocess communication (IPC).
++
++ See the man-pages and HTML files in Documentation/kdbus/
++ that are generated by 'make mandocs' and 'make htmldocs'.
++
++ If you have an ordinary machine, select M here. The module
++ will be called kdbus.
++
+ config CROSS_MEMORY_ATTACH
+ bool "Enable process_vm_readv/writev syscalls"
+ depends on MMU
+diff --git a/ipc/Makefile b/ipc/Makefile
+index 86c7300..68ec416 100644
+--- a/ipc/Makefile
++++ b/ipc/Makefile
+@@ -9,4 +9,4 @@ obj_mq-$(CONFIG_COMPAT) += compat_mq.o
+ obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)
+ obj-$(CONFIG_IPC_NS) += namespace.o
+ obj-$(CONFIG_POSIX_MQUEUE_SYSCTL) += mq_sysctl.o
+-
++obj-$(CONFIG_KDBUS) += kdbus/
+diff --git a/ipc/kdbus/Makefile b/ipc/kdbus/Makefile
+new file mode 100644
+index 0000000..66663a1
+--- /dev/null
++++ b/ipc/kdbus/Makefile
+@@ -0,0 +1,33 @@
++#
++# By setting KDBUS_EXT=2, the kdbus module will be built as kdbus2.ko, and
++# KBUILD_MODNAME=kdbus2. This has the effect that all exported objects have
++# different names than usual (kdbus2fs, /sys/fs/kdbus2/) and you can run
++# your test-infrastructure against the kdbus2.ko, while running your system
++# on kdbus.ko.
++#
++# To just build the module, use:
++# make KDBUS_EXT=2 M=ipc/kdbus
++#
++
++kdbus$(KDBUS_EXT)-y := \
++ bus.o \
++ connection.o \
++ endpoint.o \
++ fs.o \
++ handle.o \
++ item.o \
++ main.o \
++ match.o \
++ message.o \
++ metadata.o \
++ names.o \
++ node.o \
++ notify.o \
++ domain.o \
++ policy.o \
++ pool.o \
++ reply.o \
++ queue.o \
++ util.o
++
++obj-$(CONFIG_KDBUS) += kdbus$(KDBUS_EXT).o
+diff --git a/ipc/kdbus/bus.c b/ipc/kdbus/bus.c
+new file mode 100644
+index 0000000..a67f825
+--- /dev/null
++++ b/ipc/kdbus/bus.c
+@@ -0,0 +1,514 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/fs.h>
++#include <linux/hashtable.h>
++#include <linux/init.h>
++#include <linux/module.h>
++#include <linux/random.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <linux/uio.h>
++
++#include "bus.h"
++#include "notify.h"
++#include "connection.h"
++#include "domain.h"
++#include "endpoint.h"
++#include "handle.h"
++#include "item.h"
++#include "match.h"
++#include "message.h"
++#include "metadata.h"
++#include "names.h"
++#include "policy.h"
++#include "util.h"
++
++static void kdbus_bus_free(struct kdbus_node *node)
++{
++ struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
++
++ WARN_ON(!list_empty(&bus->monitors_list));
++ WARN_ON(!hash_empty(bus->conn_hash));
++
++ kdbus_notify_free(bus);
++
++ kdbus_user_unref(bus->creator);
++ kdbus_name_registry_free(bus->name_registry);
++ kdbus_domain_unref(bus->domain);
++ kdbus_policy_db_clear(&bus->policy_db);
++ kdbus_meta_proc_unref(bus->creator_meta);
++ kfree(bus);
++}
++
++static void kdbus_bus_release(struct kdbus_node *node, bool was_active)
++{
++ struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
++
++ if (was_active)
++ atomic_dec(&bus->creator->buses);
++}
++
++static struct kdbus_bus *kdbus_bus_new(struct kdbus_domain *domain,
++ const char *name,
++ struct kdbus_bloom_parameter *bloom,
++ const u64 *pattach_owner,
++ u64 flags, kuid_t uid, kgid_t gid)
++{
++ struct kdbus_bus *b;
++ u64 attach_owner;
++ int ret;
++
++ if (bloom->size < 8 || bloom->size > KDBUS_BUS_BLOOM_MAX_SIZE ||
++ !KDBUS_IS_ALIGNED8(bloom->size) || bloom->n_hash < 1)
++ return ERR_PTR(-EINVAL);
++
++ ret = kdbus_sanitize_attach_flags(pattach_owner ? *pattach_owner : 0,
++ &attach_owner);
++ if (ret < 0)
++ return ERR_PTR(ret);
++
++ ret = kdbus_verify_uid_prefix(name, domain->user_namespace, uid);
++ if (ret < 0)
++ return ERR_PTR(ret);
++
++ b = kzalloc(sizeof(*b), GFP_KERNEL);
++ if (!b)
++ return ERR_PTR(-ENOMEM);
++
++ kdbus_node_init(&b->node, KDBUS_NODE_BUS);
++
++ b->node.free_cb = kdbus_bus_free;
++ b->node.release_cb = kdbus_bus_release;
++ b->node.uid = uid;
++ b->node.gid = gid;
++ b->node.mode = S_IRUSR | S_IXUSR;
++
++ if (flags & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
++ b->node.mode |= S_IRGRP | S_IXGRP;
++ if (flags & KDBUS_MAKE_ACCESS_WORLD)
++ b->node.mode |= S_IROTH | S_IXOTH;
++
++ b->id = atomic64_inc_return(&domain->last_id);
++ b->bus_flags = flags;
++ b->attach_flags_owner = attach_owner;
++ generate_random_uuid(b->id128);
++ b->bloom = *bloom;
++ b->domain = kdbus_domain_ref(domain);
++
++ kdbus_policy_db_init(&b->policy_db);
++
++ init_rwsem(&b->conn_rwlock);
++ hash_init(b->conn_hash);
++ INIT_LIST_HEAD(&b->monitors_list);
++
++ INIT_LIST_HEAD(&b->notify_list);
++ spin_lock_init(&b->notify_lock);
++ mutex_init(&b->notify_flush_lock);
++
++ ret = kdbus_node_link(&b->node, &domain->node, name);
++ if (ret < 0)
++ goto exit_unref;
++
++ /* cache the metadata/credentials of the creator */
++ b->creator_meta = kdbus_meta_proc_new();
++ if (IS_ERR(b->creator_meta)) {
++ ret = PTR_ERR(b->creator_meta);
++ b->creator_meta = NULL;
++ goto exit_unref;
++ }
++
++ ret = kdbus_meta_proc_collect(b->creator_meta,
++ KDBUS_ATTACH_CREDS |
++ KDBUS_ATTACH_PIDS |
++ KDBUS_ATTACH_AUXGROUPS |
++ KDBUS_ATTACH_TID_COMM |
++ KDBUS_ATTACH_PID_COMM |
++ KDBUS_ATTACH_EXE |
++ KDBUS_ATTACH_CMDLINE |
++ KDBUS_ATTACH_CGROUP |
++ KDBUS_ATTACH_CAPS |
++ KDBUS_ATTACH_SECLABEL |
++ KDBUS_ATTACH_AUDIT);
++ if (ret < 0)
++ goto exit_unref;
++
++ b->name_registry = kdbus_name_registry_new();
++ if (IS_ERR(b->name_registry)) {
++ ret = PTR_ERR(b->name_registry);
++ b->name_registry = NULL;
++ goto exit_unref;
++ }
++
++ /*
++ * Bus-limits of the creator are accounted on its real UID, just like
++ * all other per-user limits.
++ */
++ b->creator = kdbus_user_lookup(domain, current_uid());
++ if (IS_ERR(b->creator)) {
++ ret = PTR_ERR(b->creator);
++ b->creator = NULL;
++ goto exit_unref;
++ }
++
++ return b;
++
++exit_unref:
++ kdbus_node_deactivate(&b->node);
++ kdbus_node_unref(&b->node);
++ return ERR_PTR(ret);
++}
++
++/**
++ * kdbus_bus_ref() - increase the reference counter of a kdbus_bus
++ * @bus: The bus to reference
++ *
++ * Every user of a bus, except for its creator, must add a reference to the
++ * kdbus_bus using this function.
++ *
++ * Return: the bus itself
++ */
++struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus)
++{
++ if (bus)
++ kdbus_node_ref(&bus->node);
++ return bus;
++}
++
++/**
++ * kdbus_bus_unref() - decrease the reference counter of a kdbus_bus
++ * @bus: The bus to unref
++ *
++ * Release a reference. If the reference count drops to 0, the bus will be
++ * freed.
++ *
++ * Return: NULL
++ */
++struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus)
++{
++ if (bus)
++ kdbus_node_unref(&bus->node);
++ return NULL;
++}
++
++/**
++ * kdbus_bus_find_conn_by_id() - find a connection with a given id
++ * @bus: The bus to look for the connection
++ * @id: The 64-bit connection id
++ *
++ * Looks up a connection with a given id. The returned connection
++ * is ref'ed, and needs to be unref'ed by the user. Returns NULL if
++ * the connection can't be found.
++ */
++struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id)
++{
++ struct kdbus_conn *conn, *found = NULL;
++
++ down_read(&bus->conn_rwlock);
++ hash_for_each_possible(bus->conn_hash, conn, hentry, id)
++ if (conn->id == id) {
++ found = kdbus_conn_ref(conn);
++ break;
++ }
++ up_read(&bus->conn_rwlock);
++
++ return found;
++}
++
++/**
++ * kdbus_bus_broadcast() - send a message to all subscribed connections
++ * @bus: The bus the connections are connected to
++ * @conn_src: The source connection, may be %NULL for kernel notifications
++ * @staging: Staging object containing the message to send
++ *
++ * Send message to all connections that are currently active on the bus.
++ * Connections must still have matches installed in order to let the message
++ * pass.
++ *
++ * The caller must hold the name-registry lock of @bus.
++ */
++void kdbus_bus_broadcast(struct kdbus_bus *bus,
++ struct kdbus_conn *conn_src,
++ struct kdbus_staging *staging)
++{
++ struct kdbus_conn *conn_dst;
++ unsigned int i;
++ int ret;
++
++ lockdep_assert_held(&bus->name_registry->rwlock);
++
++ /*
++ * Make sure broadcasts are queued on monitors before we send them out to
++ * anyone else. Otherwise, connections might react to broadcasts before
++ * the monitor gets the broadcast queued. In the worst case, the
++ * monitor sees a reaction to the broadcast before the broadcast itself.
++ * We don't give ordering guarantees across connections (and monitors
++ * can re-construct order via sequence numbers), but we should at least
++ * try to avoid re-ordering for monitors.
++ */
++ kdbus_bus_eavesdrop(bus, conn_src, staging);
++
++ down_read(&bus->conn_rwlock);
++ hash_for_each(bus->conn_hash, i, conn_dst, hentry) {
++ if (!kdbus_conn_is_ordinary(conn_dst))
++ continue;
++
++ /*
++ * Check if there is a match for the kmsg object in
++ * the destination connection match db
++ */
++ if (!kdbus_match_db_match_msg(conn_dst->match_db, conn_src,
++ staging))
++ continue;
++
++ if (conn_src) {
++ /*
++ * Anyone can send broadcasts, as they have no
++ * destination. But a receiver needs TALK access to
++ * the sender in order to receive broadcasts.
++ */
++ if (!kdbus_conn_policy_talk(conn_dst, NULL, conn_src))
++ continue;
++ } else {
++ /*
++ * Check if there is a policy db that prevents the
++ * destination connection from receiving this kernel
++ * notification
++ */
++ if (!kdbus_conn_policy_see_notification(conn_dst, NULL,
++ staging->msg))
++ continue;
++ }
++
++ ret = kdbus_conn_entry_insert(conn_src, conn_dst, staging,
++ NULL, NULL);
++ if (ret < 0)
++ kdbus_conn_lost_message(conn_dst);
++ }
++ up_read(&bus->conn_rwlock);
++}
++
++/**
++ * kdbus_bus_eavesdrop() - send a message to all subscribed monitors
++ * @bus: The bus the monitors are connected to
++ * @conn_src: The source connection, may be %NULL for kernel notifications
++ * @staging: Staging object containing the message to send
++ *
++ * Send message to all monitors that are currently active on the bus. Monitors
++ * must still have matches installed in order to let the message pass.
++ *
++ * The caller must hold the name-registry lock of @bus.
++ */
++void kdbus_bus_eavesdrop(struct kdbus_bus *bus,
++ struct kdbus_conn *conn_src,
++ struct kdbus_staging *staging)
++{
++ struct kdbus_conn *conn_dst;
++ int ret;
++
++ /*
++ * Monitor connections get all messages; ignore possible errors
++ * when sending messages to monitor connections.
++ */
++
++ lockdep_assert_held(&bus->name_registry->rwlock);
++
++ down_read(&bus->conn_rwlock);
++ list_for_each_entry(conn_dst, &bus->monitors_list, monitor_entry) {
++ ret = kdbus_conn_entry_insert(conn_src, conn_dst, staging,
++ NULL, NULL);
++ if (ret < 0)
++ kdbus_conn_lost_message(conn_dst);
++ }
++ up_read(&bus->conn_rwlock);
++}
++
++/**
++ * kdbus_cmd_bus_make() - handle KDBUS_CMD_BUS_MAKE
++ * @domain: domain to operate on
++ * @argp: command payload
++ *
++ * Return: NULL or newly created bus on success, ERR_PTR on failure.
++ */
++struct kdbus_bus *kdbus_cmd_bus_make(struct kdbus_domain *domain,
++ void __user *argp)
++{
++ struct kdbus_bus *bus = NULL;
++ struct kdbus_cmd *cmd;
++ struct kdbus_ep *ep = NULL;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ { .type = KDBUS_ITEM_MAKE_NAME, .mandatory = true },
++ { .type = KDBUS_ITEM_BLOOM_PARAMETER, .mandatory = true },
++ { .type = KDBUS_ITEM_ATTACH_FLAGS_SEND },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
++ KDBUS_MAKE_ACCESS_GROUP |
++ KDBUS_MAKE_ACCESS_WORLD,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret < 0)
++ return ERR_PTR(ret);
++ if (ret > 0)
++ return NULL;
++
++ bus = kdbus_bus_new(domain,
++ argv[1].item->str, &argv[2].item->bloom_parameter,
++ argv[3].item ? argv[3].item->data64 : NULL,
++ cmd->flags, current_euid(), current_egid());
++ if (IS_ERR(bus)) {
++ ret = PTR_ERR(bus);
++ bus = NULL;
++ goto exit;
++ }
++
++ if (atomic_inc_return(&bus->creator->buses) > KDBUS_USER_MAX_BUSES) {
++ atomic_dec(&bus->creator->buses);
++ ret = -EMFILE;
++ goto exit;
++ }
++
++ if (!kdbus_node_activate(&bus->node)) {
++ atomic_dec(&bus->creator->buses);
++ ret = -ESHUTDOWN;
++ goto exit;
++ }
++
++ ep = kdbus_ep_new(bus, "bus", cmd->flags, bus->node.uid, bus->node.gid,
++ false);
++ if (IS_ERR(ep)) {
++ ret = PTR_ERR(ep);
++ ep = NULL;
++ goto exit;
++ }
++
++ if (!kdbus_node_activate(&ep->node)) {
++ ret = -ESHUTDOWN;
++ goto exit;
++ }
++
++ /*
++ * Drop our own reference, effectively causing the endpoint to be
++ * deactivated and released when the parent bus is.
++ */
++ ep = kdbus_ep_unref(ep);
++
++exit:
++ ret = kdbus_args_clear(&args, ret);
++ if (ret < 0) {
++ if (ep) {
++ kdbus_node_deactivate(&ep->node);
++ kdbus_ep_unref(ep);
++ }
++ if (bus) {
++ kdbus_node_deactivate(&bus->node);
++ kdbus_bus_unref(bus);
++ }
++ return ERR_PTR(ret);
++ }
++ return bus;
++}
++
++/**
++ * kdbus_cmd_bus_creator_info() - handle KDBUS_CMD_BUS_CREATOR_INFO
++ * @conn: connection to operate on
++ * @argp: command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn, void __user *argp)
++{
++ struct kdbus_cmd_info *cmd;
++ struct kdbus_bus *bus = conn->ep->bus;
++ struct kdbus_pool_slice *slice = NULL;
++ struct kdbus_item *meta_items = NULL;
++ struct kdbus_item_header item_hdr;
++ struct kdbus_info info = {};
++ size_t meta_size, name_len, cnt = 0;
++ struct kvec kvec[6];
++ u64 attach_flags, size = 0;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret != 0)
++ return ret;
++
++ ret = kdbus_sanitize_attach_flags(cmd->attach_flags, &attach_flags);
++ if (ret < 0)
++ goto exit;
++
++ attach_flags &= bus->attach_flags_owner;
++
++ ret = kdbus_meta_emit(bus->creator_meta, NULL, NULL, conn,
++ attach_flags, &meta_items, &meta_size);
++ if (ret < 0)
++ goto exit;
++
++ name_len = strlen(bus->node.name) + 1;
++ info.id = bus->id;
++ info.flags = bus->bus_flags;
++ item_hdr.type = KDBUS_ITEM_MAKE_NAME;
++ item_hdr.size = KDBUS_ITEM_HEADER_SIZE + name_len;
++
++ kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &size);
++ kdbus_kvec_set(&kvec[cnt++], &item_hdr, sizeof(item_hdr), &size);
++ kdbus_kvec_set(&kvec[cnt++], bus->node.name, name_len, &size);
++ cnt += !!kdbus_kvec_pad(&kvec[cnt], &size);
++ if (meta_size > 0) {
++ kdbus_kvec_set(&kvec[cnt++], meta_items, meta_size, &size);
++ cnt += !!kdbus_kvec_pad(&kvec[cnt], &size);
++ }
++
++ info.size = size;
++
++ slice = kdbus_pool_slice_alloc(conn->pool, size, false);
++ if (IS_ERR(slice)) {
++ ret = PTR_ERR(slice);
++ slice = NULL;
++ goto exit;
++ }
++
++ ret = kdbus_pool_slice_copy_kvec(slice, 0, kvec, cnt, size);
++ if (ret < 0)
++ goto exit;
++
++ kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->info_size);
++
++ if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
++ kdbus_member_set_user(&cmd->info_size, argp,
++ typeof(*cmd), info_size))
++ ret = -EFAULT;
++
++exit:
++ kdbus_pool_slice_release(slice);
++ kfree(meta_items);
++ return kdbus_args_clear(&args, ret);
++}
+diff --git a/ipc/kdbus/bus.h b/ipc/kdbus/bus.h
+new file mode 100644
+index 0000000..8c2acae
+--- /dev/null
++++ b/ipc/kdbus/bus.h
+@@ -0,0 +1,101 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_BUS_H
++#define __KDBUS_BUS_H
++
++#include <linux/hashtable.h>
++#include <linux/list.h>
++#include <linux/mutex.h>
++#include <linux/rwsem.h>
++#include <linux/spinlock.h>
++#include <uapi/linux/kdbus.h>
++
++#include "metadata.h"
++#include "names.h"
++#include "node.h"
++#include "policy.h"
++
++struct kdbus_conn;
++struct kdbus_domain;
++struct kdbus_staging;
++struct kdbus_user;
++
++/**
++ * struct kdbus_bus - bus in a domain
++ * @node: kdbus_node
++ * @id: ID of this bus in the domain
++ * @bus_flags: Simple pass-through flags from userspace to userspace
++ * @attach_flags_owner: KDBUS_ATTACH_* flags of bus creator that other
++ * connections can see or query
++ * @id128: Unique random 128 bit ID of this bus
++ * @bloom: Bloom parameters
++ * @domain: Domain of this bus
++ * @creator: Creator of the bus
++ * @creator_meta: Meta information about the bus creator
++ * @last_message_id: Last used message id
++ * @policy_db: Policy database for this bus
++ * @name_registry: Name registry of this bus
++ * @conn_rwlock: Read/Write lock for all lists of child connections
++ * @conn_hash: Map of connection IDs
++ * @monitors_list: Connections that monitor this bus
++ * @notify_list: List of pending kernel-generated messages
++ * @notify_lock: Notification list lock
++ * @notify_flush_lock: Notification flushing lock
++ */
++struct kdbus_bus {
++ struct kdbus_node node;
++
++ /* static */
++ u64 id;
++ u64 bus_flags;
++ u64 attach_flags_owner;
++ u8 id128[16];
++ struct kdbus_bloom_parameter bloom;
++ struct kdbus_domain *domain;
++ struct kdbus_user *creator;
++ struct kdbus_meta_proc *creator_meta;
++
++ /* protected by own locks */
++ atomic64_t last_message_id;
++ struct kdbus_policy_db policy_db;
++ struct kdbus_name_registry *name_registry;
++
++ /* protected by conn_rwlock */
++ struct rw_semaphore conn_rwlock;
++ DECLARE_HASHTABLE(conn_hash, 8);
++ struct list_head monitors_list;
++
++ /* protected by notify_lock */
++ struct list_head notify_list;
++ spinlock_t notify_lock;
++ struct mutex notify_flush_lock;
++};
++
++struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus);
++struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus);
++
++struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id);
++void kdbus_bus_broadcast(struct kdbus_bus *bus,
++ struct kdbus_conn *conn_src,
++ struct kdbus_staging *staging);
++void kdbus_bus_eavesdrop(struct kdbus_bus *bus,
++ struct kdbus_conn *conn_src,
++ struct kdbus_staging *staging);
++
++struct kdbus_bus *kdbus_cmd_bus_make(struct kdbus_domain *domain,
++ void __user *argp);
++int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn, void __user *argp);
++
++#endif
+diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
+new file mode 100644
+index 0000000..ef63d65
+--- /dev/null
++++ b/ipc/kdbus/connection.c
+@@ -0,0 +1,2227 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/audit.h>
++#include <linux/file.h>
++#include <linux/fs.h>
++#include <linux/fs_struct.h>
++#include <linux/hashtable.h>
++#include <linux/idr.h>
++#include <linux/init.h>
++#include <linux/math64.h>
++#include <linux/mm.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/path.h>
++#include <linux/poll.h>
++#include <linux/sched.h>
++#include <linux/shmem_fs.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/syscalls.h>
++#include <linux/uio.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "endpoint.h"
++#include "handle.h"
++#include "match.h"
++#include "message.h"
++#include "metadata.h"
++#include "names.h"
++#include "domain.h"
++#include "item.h"
++#include "notify.h"
++#include "policy.h"
++#include "pool.h"
++#include "reply.h"
++#include "util.h"
++#include "queue.h"
++
++#define KDBUS_CONN_ACTIVE_BIAS (INT_MIN + 2)
++#define KDBUS_CONN_ACTIVE_NEW (INT_MIN + 1)
++
++static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep,
++ struct file *file,
++ struct kdbus_cmd_hello *hello,
++ const char *name,
++ const struct kdbus_creds *creds,
++ const struct kdbus_pids *pids,
++ const char *seclabel,
++ const char *conn_description)
++{
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ static struct lock_class_key __key;
++#endif
++ struct kdbus_pool_slice *slice = NULL;
++ struct kdbus_bus *bus = ep->bus;
++ struct kdbus_conn *conn;
++ u64 attach_flags_send;
++ u64 attach_flags_recv;
++ u64 items_size = 0;
++ bool is_policy_holder;
++ bool is_activator;
++ bool is_monitor;
++ bool privileged;
++ bool owner;
++ struct kvec kvec;
++ int ret;
++
++ struct {
++ u64 size;
++ u64 type;
++ struct kdbus_bloom_parameter bloom;
++ } bloom_item;
++
++ privileged = kdbus_ep_is_privileged(ep, file);
++ owner = kdbus_ep_is_owner(ep, file);
++
++ is_monitor = hello->flags & KDBUS_HELLO_MONITOR;
++ is_activator = hello->flags & KDBUS_HELLO_ACTIVATOR;
++ is_policy_holder = hello->flags & KDBUS_HELLO_POLICY_HOLDER;
++
++ if (!hello->pool_size || !IS_ALIGNED(hello->pool_size, PAGE_SIZE))
++ return ERR_PTR(-EINVAL);
++ if (is_monitor + is_activator + is_policy_holder > 1)
++ return ERR_PTR(-EINVAL);
++ if (name && !is_activator && !is_policy_holder)
++ return ERR_PTR(-EINVAL);
++ if (!name && (is_activator || is_policy_holder))
++ return ERR_PTR(-EINVAL);
++ if (name && !kdbus_name_is_valid(name, true))
++ return ERR_PTR(-EINVAL);
++ if (is_monitor && ep->user)
++ return ERR_PTR(-EOPNOTSUPP);
++ if (!owner && (is_activator || is_policy_holder || is_monitor))
++ return ERR_PTR(-EPERM);
++ if (!owner && (creds || pids || seclabel))
++ return ERR_PTR(-EPERM);
++
++ ret = kdbus_sanitize_attach_flags(hello->attach_flags_send,
++ &attach_flags_send);
++ if (ret < 0)
++ return ERR_PTR(ret);
++
++ ret = kdbus_sanitize_attach_flags(hello->attach_flags_recv,
++ &attach_flags_recv);
++ if (ret < 0)
++ return ERR_PTR(ret);
++
++ conn = kzalloc(sizeof(*conn), GFP_KERNEL);
++ if (!conn)
++ return ERR_PTR(-ENOMEM);
++
++ kref_init(&conn->kref);
++ atomic_set(&conn->active, KDBUS_CONN_ACTIVE_NEW);
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ lockdep_init_map(&conn->dep_map, "s_active", &__key, 0);
++#endif
++ mutex_init(&conn->lock);
++ INIT_LIST_HEAD(&conn->names_list);
++ INIT_LIST_HEAD(&conn->reply_list);
++ atomic_set(&conn->request_count, 0);
++ atomic_set(&conn->lost_count, 0);
++ INIT_DELAYED_WORK(&conn->work, kdbus_reply_list_scan_work);
++ conn->cred = get_cred(file->f_cred);
++ conn->pid = get_pid(task_pid(current));
++ get_fs_root(current->fs, &conn->root_path);
++ init_waitqueue_head(&conn->wait);
++ kdbus_queue_init(&conn->queue);
++ conn->privileged = privileged;
++ conn->owner = owner;
++ conn->ep = kdbus_ep_ref(ep);
++ conn->id = atomic64_inc_return(&bus->domain->last_id);
++ conn->flags = hello->flags;
++ atomic64_set(&conn->attach_flags_send, attach_flags_send);
++ atomic64_set(&conn->attach_flags_recv, attach_flags_recv);
++ INIT_LIST_HEAD(&conn->monitor_entry);
++
++ if (conn_description) {
++ conn->description = kstrdup(conn_description, GFP_KERNEL);
++ if (!conn->description) {
++ ret = -ENOMEM;
++ goto exit_unref;
++ }
++ }
++
++ conn->pool = kdbus_pool_new(conn->description, hello->pool_size);
++ if (IS_ERR(conn->pool)) {
++ ret = PTR_ERR(conn->pool);
++ conn->pool = NULL;
++ goto exit_unref;
++ }
++
++ conn->match_db = kdbus_match_db_new();
++ if (IS_ERR(conn->match_db)) {
++ ret = PTR_ERR(conn->match_db);
++ conn->match_db = NULL;
++ goto exit_unref;
++ }
++
++ /* return properties of this connection to the caller */
++ hello->bus_flags = bus->bus_flags;
++ hello->id = conn->id;
++
++ BUILD_BUG_ON(sizeof(bus->id128) != sizeof(hello->id128));
++ memcpy(hello->id128, bus->id128, sizeof(hello->id128));
++
++ /* privileged processes can impersonate somebody else */
++ if (creds || pids || seclabel) {
++ conn->meta_fake = kdbus_meta_fake_new();
++ if (IS_ERR(conn->meta_fake)) {
++ ret = PTR_ERR(conn->meta_fake);
++ conn->meta_fake = NULL;
++ goto exit_unref;
++ }
++
++ ret = kdbus_meta_fake_collect(conn->meta_fake,
++ creds, pids, seclabel);
++ if (ret < 0)
++ goto exit_unref;
++ } else {
++ conn->meta_proc = kdbus_meta_proc_new();
++ if (IS_ERR(conn->meta_proc)) {
++ ret = PTR_ERR(conn->meta_proc);
++ conn->meta_proc = NULL;
++ goto exit_unref;
++ }
++
++ ret = kdbus_meta_proc_collect(conn->meta_proc,
++ KDBUS_ATTACH_CREDS |
++ KDBUS_ATTACH_PIDS |
++ KDBUS_ATTACH_AUXGROUPS |
++ KDBUS_ATTACH_TID_COMM |
++ KDBUS_ATTACH_PID_COMM |
++ KDBUS_ATTACH_EXE |
++ KDBUS_ATTACH_CMDLINE |
++ KDBUS_ATTACH_CGROUP |
++ KDBUS_ATTACH_CAPS |
++ KDBUS_ATTACH_SECLABEL |
++ KDBUS_ATTACH_AUDIT);
++ if (ret < 0)
++ goto exit_unref;
++ }
++
++ /*
++ * Account the connection against the current user (UID), or for
++ * custom endpoints use the anonymous user assigned to the endpoint.
++ * Note that limits are always accounted against the real UID, not
++ * the effective UID (cred->user always points to the accounting of
++ * cred->uid, not cred->euid).
++ * In case the caller is privileged, we allow changing the accounting
++ * to the faked user.
++ */
++ if (ep->user) {
++ conn->user = kdbus_user_ref(ep->user);
++ } else {
++ kuid_t uid;
++
++ if (conn->meta_fake && uid_valid(conn->meta_fake->uid) &&
++ conn->privileged)
++ uid = conn->meta_fake->uid;
++ else
++ uid = conn->cred->uid;
++
++ conn->user = kdbus_user_lookup(ep->bus->domain, uid);
++ if (IS_ERR(conn->user)) {
++ ret = PTR_ERR(conn->user);
++ conn->user = NULL;
++ goto exit_unref;
++ }
++ }
++
++ if (atomic_inc_return(&conn->user->connections) > KDBUS_USER_MAX_CONN) {
++ /* decremented by destructor as conn->user is valid */
++ ret = -EMFILE;
++ goto exit_unref;
++ }
++
++ bloom_item.size = sizeof(bloom_item);
++ bloom_item.type = KDBUS_ITEM_BLOOM_PARAMETER;
++ bloom_item.bloom = bus->bloom;
++ kdbus_kvec_set(&kvec, &bloom_item, bloom_item.size, &items_size);
++
++ slice = kdbus_pool_slice_alloc(conn->pool, items_size, false);
++ if (IS_ERR(slice)) {
++ ret = PTR_ERR(slice);
++ slice = NULL;
++ goto exit_unref;
++ }
++
++ ret = kdbus_pool_slice_copy_kvec(slice, 0, &kvec, 1, items_size);
++ if (ret < 0)
++ goto exit_unref;
++
++ kdbus_pool_slice_publish(slice, &hello->offset, &hello->items_size);
++ kdbus_pool_slice_release(slice);
++
++ return conn;
++
++exit_unref:
++ kdbus_pool_slice_release(slice);
++ kdbus_conn_unref(conn);
++ return ERR_PTR(ret);
++}
++
++static void __kdbus_conn_free(struct kref *kref)
++{
++ struct kdbus_conn *conn = container_of(kref, struct kdbus_conn, kref);
++
++ WARN_ON(kdbus_conn_active(conn));
++ WARN_ON(delayed_work_pending(&conn->work));
++ WARN_ON(!list_empty(&conn->queue.msg_list));
++ WARN_ON(!list_empty(&conn->names_list));
++ WARN_ON(!list_empty(&conn->reply_list));
++
++ if (conn->user) {
++ atomic_dec(&conn->user->connections);
++ kdbus_user_unref(conn->user);
++ }
++
++ kdbus_meta_fake_free(conn->meta_fake);
++ kdbus_meta_proc_unref(conn->meta_proc);
++ kdbus_match_db_free(conn->match_db);
++ kdbus_pool_free(conn->pool);
++ kdbus_ep_unref(conn->ep);
++ path_put(&conn->root_path);
++ put_pid(conn->pid);
++ put_cred(conn->cred);
++ kfree(conn->description);
++ kfree(conn->quota);
++ kfree(conn);
++}
++
++/**
++ * kdbus_conn_ref() - take a connection reference
++ * @conn: Connection, may be %NULL
++ *
++ * Return: the connection itself
++ */
++struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn)
++{
++ if (conn)
++ kref_get(&conn->kref);
++ return conn;
++}
++
++/**
++ * kdbus_conn_unref() - drop a connection reference
++ * @conn: Connection (may be NULL)
++ *
++ * When the last reference is dropped, the connection's internal structure
++ * is freed.
++ *
++ * Return: NULL
++ */
++struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn)
++{
++ if (conn)
++ kref_put(&conn->kref, __kdbus_conn_free);
++ return NULL;
++}
++
++/**
++ * kdbus_conn_active() - connection is not disconnected
++ * @conn: Connection to check
++ *
++ * Return true if the connection has not been disconnected yet. Note that a
++ * connection might be disconnected asynchronously, unless you hold the
++ * connection lock. If that's not suitable for you, see kdbus_conn_acquire() to
++ * suppress connection shutdown for a short period.
++ *
++ * Return: true if the connection is still active
++ */
++bool kdbus_conn_active(const struct kdbus_conn *conn)
++{
++ return atomic_read(&conn->active) >= 0;
++}
++
++/**
++ * kdbus_conn_acquire() - acquire an active connection reference
++ * @conn: Connection
++ *
++ * Users can close a connection via KDBUS_BYEBYE (or by destroying the
++ * endpoint/bus/...) at any time. Whenever this happens, we should deny any
++ * user-visible action on this connection and signal ECONNRESET instead.
++ * To avoid testing for connection availability every time you take the
++ * connection-lock, you can acquire a connection for short periods.
++ *
++ * By calling kdbus_conn_acquire(), you gain an "active reference" to the
++ * connection. You must also hold a regular reference at any time! As long as
++ * you hold the active-ref, the connection will not be shut down. However, if
++ * the connection was shut down, you can never acquire an active-ref again.
++ *
++ * kdbus_conn_disconnect() disables the connection and then waits for all active
++ * references to be dropped. It will also wake up any pending operation.
++ * However, you must not sleep for an indefinite period while holding an
++ * active-reference. Otherwise, kdbus_conn_disconnect() might stall. If you need
++ * to sleep for an indefinite period, either release the reference and try to
++ * acquire it again after waking up, or make kdbus_conn_disconnect() wake up
++ * your wait-queue.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_conn_acquire(struct kdbus_conn *conn)
++{
++ if (!atomic_inc_unless_negative(&conn->active))
++ return -ECONNRESET;
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
++#endif
++
++ return 0;
++}
++
++/**
++ * kdbus_conn_release() - release an active connection reference
++ * @conn: Connection
++ *
++ * This releases an active reference that has been acquired via
++ * kdbus_conn_acquire(). If the connection was already disabled and this is the
++ * last active-ref that is dropped, the disconnect-waiter will be woken up and
++ * properly close the connection.
++ */
++void kdbus_conn_release(struct kdbus_conn *conn)
++{
++ int v;
++
++ if (!conn)
++ return;
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ rwsem_release(&conn->dep_map, 1, _RET_IP_);
++#endif
++
++ v = atomic_dec_return(&conn->active);
++ if (v != KDBUS_CONN_ACTIVE_BIAS)
++ return;
++
++ wake_up_all(&conn->wait);
++}
++
++static int kdbus_conn_connect(struct kdbus_conn *conn, const char *name)
++{
++ struct kdbus_ep *ep = conn->ep;
++ struct kdbus_bus *bus = ep->bus;
++ int ret;
++
++ if (WARN_ON(atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_NEW))
++ return -EALREADY;
++
++ /* make sure the ep-node is active while we add our connection */
++ if (!kdbus_node_acquire(&ep->node))
++ return -ESHUTDOWN;
++
++ /* lock order: domain -> bus -> ep -> names -> conn */
++ mutex_lock(&ep->lock);
++ down_write(&bus->conn_rwlock);
++
++ /* link into monitor list */
++ if (kdbus_conn_is_monitor(conn))
++ list_add_tail(&conn->monitor_entry, &bus->monitors_list);
++
++ /* link into bus and endpoint */
++ list_add_tail(&conn->ep_entry, &ep->conn_list);
++ hash_add(bus->conn_hash, &conn->hentry, conn->id);
++
++ /* enable lookups and acquire active ref */
++ atomic_set(&conn->active, 1);
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
++#endif
++
++ up_write(&bus->conn_rwlock);
++ mutex_unlock(&ep->lock);
++
++ kdbus_node_release(&ep->node);
++
++ /*
++ * Notify subscribers about the new active connection, unless it is
++ * a monitor. Monitors are invisible on the bus, can't be addressed
++ * directly, and won't cause any notifications.
++ */
++ if (!kdbus_conn_is_monitor(conn)) {
++ ret = kdbus_notify_id_change(bus, KDBUS_ITEM_ID_ADD,
++ conn->id, conn->flags);
++ if (ret < 0)
++ goto exit_disconnect;
++ }
++
++ if (kdbus_conn_is_activator(conn)) {
++ u64 flags = KDBUS_NAME_ACTIVATOR;
++
++ if (WARN_ON(!name)) {
++ ret = -EINVAL;
++ goto exit_disconnect;
++ }
++
++ ret = kdbus_name_acquire(bus->name_registry, conn, name,
++ flags, NULL);
++ if (ret < 0)
++ goto exit_disconnect;
++ }
++
++ kdbus_conn_release(conn);
++ kdbus_notify_flush(bus);
++ return 0;
++
++exit_disconnect:
++ kdbus_conn_release(conn);
++ kdbus_conn_disconnect(conn, false);
++ return ret;
++}
++
++/**
++ * kdbus_conn_disconnect() - disconnect a connection
++ * @conn: The connection to disconnect
++ * @ensure_queue_empty: Flag to indicate if the call should fail in
++ * case the connection's message list is not
++ * empty
++ *
++ * If @ensure_queue_empty is true, and the connection has pending messages,
++ * -EBUSY is returned.
++ *
++ * Return: 0 on success, negative errno on failure
++ */
++int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty)
++{
++ struct kdbus_queue_entry *entry, *tmp;
++ struct kdbus_bus *bus = conn->ep->bus;
++ struct kdbus_reply *r, *r_tmp;
++ struct kdbus_conn *c;
++ int i, v;
++
++ mutex_lock(&conn->lock);
++ v = atomic_read(&conn->active);
++ if (v == KDBUS_CONN_ACTIVE_NEW) {
++ /* was never connected */
++ mutex_unlock(&conn->lock);
++ return 0;
++ }
++ if (v < 0) {
++ /* already dead */
++ mutex_unlock(&conn->lock);
++ return -ECONNRESET;
++ }
++ if (ensure_queue_empty && !list_empty(&conn->queue.msg_list)) {
++ /* still busy */
++ mutex_unlock(&conn->lock);
++ return -EBUSY;
++ }
++
++ atomic_add(KDBUS_CONN_ACTIVE_BIAS, &conn->active);
++ mutex_unlock(&conn->lock);
++
++ wake_up_interruptible(&conn->wait);
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ rwsem_acquire(&conn->dep_map, 0, 0, _RET_IP_);
++ if (atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_BIAS)
++ lock_contended(&conn->dep_map, _RET_IP_);
++#endif
++
++ wait_event(conn->wait,
++ atomic_read(&conn->active) == KDBUS_CONN_ACTIVE_BIAS);
++
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ lock_acquired(&conn->dep_map, _RET_IP_);
++ rwsem_release(&conn->dep_map, 1, _RET_IP_);
++#endif
++
++ cancel_delayed_work_sync(&conn->work);
++ kdbus_policy_remove_owner(&conn->ep->bus->policy_db, conn);
++
++ /* lock order: domain -> bus -> ep -> names -> conn */
++ mutex_lock(&conn->ep->lock);
++ down_write(&bus->conn_rwlock);
++
++ /* remove from bus and endpoint */
++ hash_del(&conn->hentry);
++ list_del(&conn->monitor_entry);
++ list_del(&conn->ep_entry);
++
++ up_write(&bus->conn_rwlock);
++ mutex_unlock(&conn->ep->lock);
++
++ /*
++ * Remove all names associated with this connection; this possibly
++ * moves queued messages back to the activator connection.
++ */
++ kdbus_name_release_all(bus->name_registry, conn);
++
++ /* if we die while other connections wait for our reply, notify them */
++ mutex_lock(&conn->lock);
++ list_for_each_entry_safe(entry, tmp, &conn->queue.msg_list, entry) {
++ if (entry->reply)
++ kdbus_notify_reply_dead(bus,
++ entry->reply->reply_dst->id,
++ entry->reply->cookie);
++ kdbus_queue_entry_free(entry);
++ }
++
++ list_for_each_entry_safe(r, r_tmp, &conn->reply_list, entry)
++ kdbus_reply_unlink(r);
++ mutex_unlock(&conn->lock);
++
++ /* lock order: domain -> bus -> ep -> names -> conn */
++ down_read(&bus->conn_rwlock);
++ hash_for_each(bus->conn_hash, i, c, hentry) {
++ mutex_lock(&c->lock);
++ list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
++ if (r->reply_src != conn)
++ continue;
++
++ if (r->sync)
++ kdbus_sync_reply_wakeup(r, -EPIPE);
++ else
++ /* send a 'connection dead' notification */
++ kdbus_notify_reply_dead(bus, c->id, r->cookie);
++
++ kdbus_reply_unlink(r);
++ }
++ mutex_unlock(&c->lock);
++ }
++ up_read(&bus->conn_rwlock);
++
++ if (!kdbus_conn_is_monitor(conn))
++ kdbus_notify_id_change(bus, KDBUS_ITEM_ID_REMOVE,
++ conn->id, conn->flags);
++
++ kdbus_notify_flush(bus);
++
++ return 0;
++}
++
++/**
++ * kdbus_conn_has_name() - check if a connection owns a name
++ * @conn: Connection
++ * @name: Well-known name to check for
++ *
++ * The caller must hold the registry lock of conn->ep->bus.
++ *
++ * Return: true if the name is currently owned by the connection
++ */
++bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name)
++{
++ struct kdbus_name_owner *owner;
++
++ lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
++
++ list_for_each_entry(owner, &conn->names_list, conn_entry)
++ if (!(owner->flags & KDBUS_NAME_IN_QUEUE) &&
++ !strcmp(name, owner->name->name))
++ return true;
++
++ return false;
++}
++
++struct kdbus_quota {
++ u32 memory;
++ u16 msgs;
++ u8 fds;
++};
++
++/**
++ * kdbus_conn_quota_inc() - increase quota accounting
++ * @c: connection owning the quota tracking
++ * @u: user to account for (or NULL for kernel accounting)
++ * @memory: size of memory to account for
++ * @fds: number of FDs to account for
++ *
++ * This call manages the quotas on resource @c. That is, it's used if other
++ * users want to use the resources of connection @c, which so far only concerns
++ * the receive queue of the destination.
++ *
++ * This increases the quota-accounting for user @u by @memory bytes and @fds
++ * file descriptors. If the user has already reached the quota limits, this call
++ * will not do any accounting but return a negative error code indicating the
++ * failure.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_conn_quota_inc(struct kdbus_conn *c, struct kdbus_user *u,
++ size_t memory, size_t fds)
++{
++ struct kdbus_quota *quota;
++ size_t available, accounted;
++ unsigned int id;
++
++ /*
++ * Pool Layout:
++ * 50% of a pool is always owned by the connection. It is reserved for
++ * kernel queries, handling received messages and other tasks that are
++ * under control of the pool owner. The other 50% of the pool is used
++ * as the incoming queue.
++ * As we optionally support user-space based policies, we need fair
++ * allocation schemes. Furthermore, resource utilization should be
++ * maximized, so only minimal resources stay reserved. However, we need
++ * to adapt to a dynamic number of users, as we cannot know how many
++ * users will talk to a connection. Therefore, the current allocation
++ * works like this:
++ * We limit the number of bytes in a destination's pool per sending
++ * user. The space available for a user is 33% of the unused pool space
++ * (whereas the space used by the user itself is also treated as
++ * 'unused'). This way, we favor users coming first, but keep enough
++ * pool space available for any following users. Given that messages are
++ * dequeued in FIFO order, this should balance nicely if the number of
++ * users grows. At the same time, this algorithm guarantees that the
++ * space available to a connection is reduced dynamically, the more
++ * concurrent users talk to a connection.
++ */
++
++ /* per-user accounting is expensive, so we keep state small */
++ BUILD_BUG_ON(sizeof(quota->memory) != 4);
++ BUILD_BUG_ON(sizeof(quota->msgs) != 2);
++ BUILD_BUG_ON(sizeof(quota->fds) != 1);
++ BUILD_BUG_ON(KDBUS_CONN_MAX_MSGS > U16_MAX);
++ BUILD_BUG_ON(KDBUS_CONN_MAX_FDS_PER_USER > U8_MAX);
++
++ id = u ? u->id : KDBUS_USER_KERNEL_ID;
++ if (id >= c->n_quota) {
++ unsigned int users;
++
++ users = max(KDBUS_ALIGN8(id) + 8, id);
++ quota = krealloc(c->quota, users * sizeof(*quota),
++ GFP_KERNEL | __GFP_ZERO);
++ if (!quota)
++ return -ENOMEM;
++
++ c->n_quota = users;
++ c->quota = quota;
++ }
++
++ quota = &c->quota[id];
++ kdbus_pool_accounted(c->pool, &available, &accounted);
++
++ /* half the pool is _always_ reserved for the pool owner */
++ available /= 2;
++
++ /*
++ * Pool owner slices are un-accounted slices; they can claim more
++ * than 50% of the queue. However, the slices we're dealing with here
++ * belong to the incoming queue, hence they are 'accounted' slices
++ * to which the 50%-limit applies.
++ */
++ if (available < accounted)
++ return -ENOBUFS;
++
++ /* 1/3 of the remaining space (including your own memory) */
++ available = (available - accounted + quota->memory) / 3;
++
++ if (available < quota->memory ||
++ available - quota->memory < memory ||
++ quota->memory + memory > U32_MAX)
++ return -ENOBUFS;
++ if (quota->msgs >= KDBUS_CONN_MAX_MSGS)
++ return -ENOBUFS;
++ if (quota->fds + fds < quota->fds ||
++ quota->fds + fds > KDBUS_CONN_MAX_FDS_PER_USER)
++ return -EMFILE;
++
++ quota->memory += memory;
++ quota->fds += fds;
++ ++quota->msgs;
++ return 0;
++}
++
++/**
++ * kdbus_conn_quota_dec() - decrease quota accounting
++ * @c: connection owning the quota tracking
++ * @u: user which was accounted for (or NULL for kernel accounting)
++ * @memory: size of memory which was accounted for
++ * @fds: number of FDs which were accounted for
++ *
++ * This does the reverse of kdbus_conn_quota_inc(). You have to release any
++ * accounted resources that you called kdbus_conn_quota_inc() for. However, you
++ * must not call kdbus_conn_quota_dec() if the accounting failed (that is,
++ * kdbus_conn_quota_inc() failed).
++ */
++void kdbus_conn_quota_dec(struct kdbus_conn *c, struct kdbus_user *u,
++ size_t memory, size_t fds)
++{
++ struct kdbus_quota *quota;
++ unsigned int id;
++
++ id = u ? u->id : KDBUS_USER_KERNEL_ID;
++ if (WARN_ON(id >= c->n_quota))
++ return;
++
++ quota = &c->quota[id];
++
++ if (!WARN_ON(quota->msgs == 0))
++ --quota->msgs;
++ if (!WARN_ON(quota->memory < memory))
++ quota->memory -= memory;
++ if (!WARN_ON(quota->fds < fds))
++ quota->fds -= fds;
++}
++
++/**
++ * kdbus_conn_lost_message() - handle lost messages
++ * @c: connection that lost a message
++ *
++ * kdbus is reliable. That means, we try hard to never lose messages. However,
++ * memory is limited, so we cannot rely on transmissions to never fail.
++ * Therefore, we use quota-limits to let callers know if their unicast message
++ * cannot be transmitted to a peer. This works fine for unicasts, but for
++ * broadcasts we cannot make the caller handle the transmission failure.
++ * Instead, we must let the destination know that it couldn't receive a
++ * broadcast.
++ * As this is an unlikely scenario, we keep it simple. A single lost-counter
++ * remembers the number of lost messages since the last call to RECV. The next
++ * message retrieval will notify the connection that it lost messages since the
++ * last message retrieval and thus should resync its state.
++ */
++void kdbus_conn_lost_message(struct kdbus_conn *c)
++{
++ if (atomic_inc_return(&c->lost_count) == 1)
++ wake_up_interruptible(&c->wait);
++}
++
++/* Callers should take the conn_dst lock */
++static struct kdbus_queue_entry *
++kdbus_conn_entry_make(struct kdbus_conn *conn_src,
++ struct kdbus_conn *conn_dst,
++ struct kdbus_staging *staging)
++{
++ /* The remote connection was disconnected */
++ if (!kdbus_conn_active(conn_dst))
++ return ERR_PTR(-ECONNRESET);
++
++ /*
++ * If the connection does not accept file descriptors but the message
++ * has some attached, refuse it.
++ *
++ * If this is a monitor connection, accept the message. In that
++ * case, all file descriptors will be set to -1 at receive time.
++ */
++ if (!kdbus_conn_is_monitor(conn_dst) &&
++ !(conn_dst->flags & KDBUS_HELLO_ACCEPT_FD) &&
++ staging->gaps && staging->gaps->n_fds > 0)
++ return ERR_PTR(-ECOMM);
++
++ return kdbus_queue_entry_new(conn_src, conn_dst, staging);
++}
++
++/*
++ * When synchronously responding to a message, allocate a queue entry
++ * and attach it to the reply tracking object.
++ * The connection's queue will never get to see it.
++ */
++static int kdbus_conn_entry_sync_attach(struct kdbus_conn *conn_dst,
++ struct kdbus_staging *staging,
++ struct kdbus_reply *reply_wake)
++{
++ struct kdbus_queue_entry *entry;
++ int remote_ret, ret = 0;
++
++ mutex_lock(&reply_wake->reply_dst->lock);
++
++ /*
++ * If we are still waiting then proceed, allocate a queue
++ * entry and attach it to the reply object
++ */
++ if (reply_wake->waiting) {
++ entry = kdbus_conn_entry_make(reply_wake->reply_src, conn_dst,
++ staging);
++ if (IS_ERR(entry))
++ ret = PTR_ERR(entry);
++ else
++ /* Attach the entry to the reply object */
++ reply_wake->queue_entry = entry;
++ } else {
++ ret = -ECONNRESET;
++ }
++
++ /*
++ * Update the reply object and wake up remote peer only
++ * on appropriate return codes
++ *
++ * * -ECOMM: if the replying connection failed with -ECOMM
++ * then wake up the remote peer with -EREMOTEIO
++ *
++ * We do this to differentiate between -ECOMM errors
++ * from the original sender perspective:
++ * -ECOMM error during the sync send and
++ * -ECOMM error during the sync reply, this last
++ * one is rewritten to -EREMOTEIO
++ *
++ * * Wake up on all other return codes.
++ */
++ remote_ret = ret;
++
++ if (ret == -ECOMM)
++ remote_ret = -EREMOTEIO;
++
++ kdbus_sync_reply_wakeup(reply_wake, remote_ret);
++ kdbus_reply_unlink(reply_wake);
++ mutex_unlock(&reply_wake->reply_dst->lock);
++
++ return ret;
++}
++
++/**
++ * kdbus_conn_entry_insert() - enqueue a message into the receiver's pool
++ * @conn_src: The sending connection
++ * @conn_dst: The connection to queue into
++ * @staging: Message to send
++ * @reply: The reply tracker to attach to the queue entry
++ * @name: Destination name this msg is sent to, or NULL
++ *
++ * Return: 0 on success, negative error code otherwise.
++ */
++int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
++ struct kdbus_conn *conn_dst,
++ struct kdbus_staging *staging,
++ struct kdbus_reply *reply,
++ const struct kdbus_name_entry *name)
++{
++ struct kdbus_queue_entry *entry;
++ int ret;
++
++ kdbus_conn_lock2(conn_src, conn_dst);
++
++ entry = kdbus_conn_entry_make(conn_src, conn_dst, staging);
++ if (IS_ERR(entry)) {
++ ret = PTR_ERR(entry);
++ goto exit_unlock;
++ }
++
++ if (reply) {
++ kdbus_reply_link(reply);
++ if (!reply->sync)
++ schedule_delayed_work(&conn_src->work, 0);
++ }
++
++ /*
++ * Record the sequence number of the registered name; it will
++ * be remembered by the queue, in case messages addressed to a
++ * name need to be moved from or to an activator.
++ */
++ if (name)
++ entry->dst_name_id = name->name_id;
++
++ kdbus_queue_entry_enqueue(entry, reply);
++ wake_up_interruptible(&conn_dst->wait);
++
++ ret = 0;
++
++exit_unlock:
++ kdbus_conn_unlock2(conn_src, conn_dst);
++ return ret;
++}
++
++static int kdbus_conn_wait_reply(struct kdbus_conn *conn_src,
++ struct kdbus_cmd_send *cmd_send,
++ struct file *ioctl_file,
++ struct file *cancel_fd,
++ struct kdbus_reply *reply_wait,
++ ktime_t expire)
++{
++ struct kdbus_queue_entry *entry;
++ struct poll_wqueues pwq = {};
++ int ret;
++
++ if (WARN_ON(!reply_wait))
++ return -EIO;
++
++ /*
++ * Block until the reply arrives. reply_wait is left untouched
++ * by the timeout scans that might be conducted for other,
++ * asynchronous replies of conn_src.
++ */
++
++ poll_initwait(&pwq);
++ poll_wait(ioctl_file, &conn_src->wait, &pwq.pt);
++
++ for (;;) {
++ /*
++ * Any of the following conditions will stop our synchronously
++ * blocking SEND command:
++ *
++ * a) The origin sender closed its connection
++ * b) The remote peer answered, setting reply_wait->waiting = 0
++ * c) The cancel FD was written to
++ * d) A signal was received
++ * e) The specified timeout was reached, and none of the above
++ * conditions kicked in.
++ */
++
++ /*
++ * We have already acquired an active reference when
++ * entering here, but another thread may call
++ * KDBUS_CMD_BYEBYE which does not acquire an active
++ * reference, therefore kdbus_conn_disconnect() will
++ * not wait for us.
++ */
++ if (!kdbus_conn_active(conn_src)) {
++ ret = -ECONNRESET;
++ break;
++ }
++
++ /*
++ * After the replying peer unsets the waiting variable,
++ * it will wake us up.
++ */
++ if (!reply_wait->waiting) {
++ ret = reply_wait->err;
++ break;
++ }
++
++ if (cancel_fd) {
++ unsigned int r;
++
++ r = cancel_fd->f_op->poll(cancel_fd, &pwq.pt);
++ if (r & POLLIN) {
++ ret = -ECANCELED;
++ break;
++ }
++ }
++
++ if (signal_pending(current)) {
++ ret = -EINTR;
++ break;
++ }
++
++ if (!poll_schedule_timeout(&pwq, TASK_INTERRUPTIBLE,
++ &expire, 0)) {
++ ret = -ETIMEDOUT;
++ break;
++ }
++
++ /*
++ * Reset the poll worker func, so the waitqueues are not
++ * added to the poll table again. We just reuse what we've
++ * collected earlier for further iterations.
++ */
++ init_poll_funcptr(&pwq.pt, NULL);
++ }
++
++ poll_freewait(&pwq);
++
++ if (ret == -EINTR) {
++ /*
++ * Interrupted system call. Unref the reply object, and pass
++ * the return value down the chain. Mark the reply as
++ * interrupted, so the cleanup work can remove it, but do not
++ * unlink it from the list. Once the syscall restarts, we'll
++ * pick it up and wait on it again.
++ */
++ mutex_lock(&conn_src->lock);
++ reply_wait->interrupted = true;
++ schedule_delayed_work(&conn_src->work, 0);
++ mutex_unlock(&conn_src->lock);
++
++ return -ERESTARTSYS;
++ }
++
++ mutex_lock(&conn_src->lock);
++ reply_wait->waiting = false;
++ entry = reply_wait->queue_entry;
++ if (entry) {
++ ret = kdbus_queue_entry_install(entry,
++ &cmd_send->reply.return_flags,
++ true);
++ kdbus_pool_slice_publish(entry->slice, &cmd_send->reply.offset,
++ &cmd_send->reply.msg_size);
++ kdbus_queue_entry_free(entry);
++ }
++ kdbus_reply_unlink(reply_wait);
++ mutex_unlock(&conn_src->lock);
++
++ return ret;
++}
++
++static int kdbus_pin_dst(struct kdbus_bus *bus,
++ struct kdbus_staging *staging,
++ struct kdbus_name_entry **out_name,
++ struct kdbus_conn **out_dst)
++{
++ const struct kdbus_msg *msg = staging->msg;
++ struct kdbus_name_owner *owner = NULL;
++ struct kdbus_name_entry *name = NULL;
++ struct kdbus_conn *dst = NULL;
++ int ret;
++
++ lockdep_assert_held(&bus->name_registry->rwlock);
++
++ if (!staging->dst_name) {
++ dst = kdbus_bus_find_conn_by_id(bus, msg->dst_id);
++ if (!dst)
++ return -ENXIO;
++
++ if (!kdbus_conn_is_ordinary(dst)) {
++ ret = -ENXIO;
++ goto error;
++ }
++ } else {
++ name = kdbus_name_lookup_unlocked(bus->name_registry,
++ staging->dst_name);
++ if (name)
++ owner = kdbus_name_get_owner(name);
++ if (!owner)
++ return -ESRCH;
++
++ /*
++ * If both a name and a connection ID are given as destination
++ * of a message, check that the currently owning connection of
++ * the name matches the specified ID.
++ * This way, we allow userspace to send the message to a
++ * specific connection by ID only if the connection currently
++ * owns the given name.
++ */
++ if (msg->dst_id != KDBUS_DST_ID_NAME &&
++ msg->dst_id != owner->conn->id)
++ return -EREMCHG;
++
++ if ((msg->flags & KDBUS_MSG_NO_AUTO_START) &&
++ kdbus_conn_is_activator(owner->conn))
++ return -EADDRNOTAVAIL;
++
++ dst = kdbus_conn_ref(owner->conn);
++ }
++
++ *out_name = name;
++ *out_dst = dst;
++ return 0;
++
++error:
++ kdbus_conn_unref(dst);
++ return ret;
++}
++
++static int kdbus_conn_reply(struct kdbus_conn *src,
++ struct kdbus_staging *staging)
++{
++ const struct kdbus_msg *msg = staging->msg;
++ struct kdbus_name_entry *name = NULL;
++ struct kdbus_reply *reply, *wake = NULL;
++ struct kdbus_conn *dst = NULL;
++ struct kdbus_bus *bus = src->ep->bus;
++ int ret;
++
++ if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
++ WARN_ON(msg->flags & KDBUS_MSG_EXPECT_REPLY) ||
++ WARN_ON(msg->flags & KDBUS_MSG_SIGNAL))
++ return -EINVAL;
++
++ /* name-registry must be locked for lookup *and* collecting data */
++ down_read(&bus->name_registry->rwlock);
++
++ /* find and pin destination */
++
++ ret = kdbus_pin_dst(bus, staging, &name, &dst);
++ if (ret < 0)
++ goto exit;
++
++ mutex_lock(&dst->lock);
++ reply = kdbus_reply_find(src, dst, msg->cookie_reply);
++ if (reply) {
++ if (reply->sync)
++ wake = kdbus_reply_ref(reply);
++ kdbus_reply_unlink(reply);
++ }
++ mutex_unlock(&dst->lock);
++
++ if (!reply) {
++ ret = -EBADSLT;
++ goto exit;
++ }
++
++ /* send message */
++
++ kdbus_bus_eavesdrop(bus, src, staging);
++
++ if (wake)
++ ret = kdbus_conn_entry_sync_attach(dst, staging, wake);
++ else
++ ret = kdbus_conn_entry_insert(src, dst, staging, NULL, name);
++
++exit:
++ up_read(&bus->name_registry->rwlock);
++ kdbus_reply_unref(wake);
++ kdbus_conn_unref(dst);
++ return ret;
++}
++
++static struct kdbus_reply *kdbus_conn_call(struct kdbus_conn *src,
++ struct kdbus_staging *staging,
++ ktime_t exp)
++{
++ const struct kdbus_msg *msg = staging->msg;
++ struct kdbus_name_entry *name = NULL;
++ struct kdbus_reply *wait = NULL;
++ struct kdbus_conn *dst = NULL;
++ struct kdbus_bus *bus = src->ep->bus;
++ int ret;
++
++ if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
++ WARN_ON(msg->flags & KDBUS_MSG_SIGNAL) ||
++ WARN_ON(!(msg->flags & KDBUS_MSG_EXPECT_REPLY)))
++ return ERR_PTR(-EINVAL);
++
++ /* resume previous wait-context, if available */
++
++ mutex_lock(&src->lock);
++ wait = kdbus_reply_find(NULL, src, msg->cookie);
++ if (wait) {
++ if (wait->interrupted) {
++ kdbus_reply_ref(wait);
++ wait->interrupted = false;
++ } else {
++ wait = NULL;
++ }
++ }
++ mutex_unlock(&src->lock);
++
++ if (wait)
++ return wait;
++
++ if (ktime_compare(ktime_get(), exp) >= 0)
++ return ERR_PTR(-ETIMEDOUT);
++
++ /* name-registry must be locked for lookup *and* collecting data */
++ down_read(&bus->name_registry->rwlock);
++
++ /* find and pin destination */
++
++ ret = kdbus_pin_dst(bus, staging, &name, &dst);
++ if (ret < 0)
++ goto exit;
++
++ if (!kdbus_conn_policy_talk(src, current_cred(), dst)) {
++ ret = -EPERM;
++ goto exit;
++ }
++
++ wait = kdbus_reply_new(dst, src, msg, name, true);
++ if (IS_ERR(wait)) {
++ ret = PTR_ERR(wait);
++ wait = NULL;
++ goto exit;
++ }
++
++ /* send message */
++
++ kdbus_bus_eavesdrop(bus, src, staging);
++
++ ret = kdbus_conn_entry_insert(src, dst, staging, wait, name);
++ if (ret < 0)
++ goto exit;
++
++ ret = 0;
++
++exit:
++ up_read(&bus->name_registry->rwlock);
++ if (ret < 0) {
++ kdbus_reply_unref(wait);
++ wait = ERR_PTR(ret);
++ }
++ kdbus_conn_unref(dst);
++ return wait;
++}
++
++static int kdbus_conn_unicast(struct kdbus_conn *src,
++ struct kdbus_staging *staging)
++{
++ const struct kdbus_msg *msg = staging->msg;
++ struct kdbus_name_entry *name = NULL;
++ struct kdbus_reply *wait = NULL;
++ struct kdbus_conn *dst = NULL;
++ struct kdbus_bus *bus = src->ep->bus;
++ bool is_signal = (msg->flags & KDBUS_MSG_SIGNAL);
++ int ret = 0;
++
++ if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
++ WARN_ON(!(msg->flags & KDBUS_MSG_EXPECT_REPLY) &&
++ msg->cookie_reply != 0))
++ return -EINVAL;
++
++ /* name-registry must be locked for lookup *and* collecting data */
++ down_read(&bus->name_registry->rwlock);
++
++ /* find and pin destination */
++
++ ret = kdbus_pin_dst(bus, staging, &name, &dst);
++ if (ret < 0)
++ goto exit;
++
++ if (is_signal) {
++ /* like broadcasts we eavesdrop even if the msg is dropped */
++ kdbus_bus_eavesdrop(bus, src, staging);
++
++ /* drop silently if peer is not interested or not privileged */
++ if (!kdbus_match_db_match_msg(dst->match_db, src, staging) ||
++ !kdbus_conn_policy_talk(dst, NULL, src))
++ goto exit;
++ } else if (!kdbus_conn_policy_talk(src, current_cred(), dst)) {
++ ret = -EPERM;
++ goto exit;
++ } else if (msg->flags & KDBUS_MSG_EXPECT_REPLY) {
++ wait = kdbus_reply_new(dst, src, msg, name, false);
++ if (IS_ERR(wait)) {
++ ret = PTR_ERR(wait);
++ wait = NULL;
++ goto exit;
++ }
++ }
++
++ /* send message */
++
++ if (!is_signal)
++ kdbus_bus_eavesdrop(bus, src, staging);
++
++ ret = kdbus_conn_entry_insert(src, dst, staging, wait, name);
++ if (ret < 0 && !is_signal)
++ goto exit;
++
++ /* signals are treated like broadcasts, recv-errors are ignored */
++ ret = 0;
++
++exit:
++ up_read(&bus->name_registry->rwlock);
++ kdbus_reply_unref(wait);
++ kdbus_conn_unref(dst);
++ return ret;
++}
++
++/**
++ * kdbus_conn_move_messages() - move messages from one connection to another
++ * @conn_dst: Connection to copy to
++ * @conn_src: Connection to copy from
++ * @name_id: Filter for the sequence number of the registered
++ * name, 0 means no filtering.
++ *
++ * Move all messages from one connection to another. This is used when
++ * an implementer connection is taking over/giving back a well-known name
++ * from/to an activator connection.
++ */
++void kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
++ struct kdbus_conn *conn_src,
++ u64 name_id)
++{
++ struct kdbus_queue_entry *e, *e_tmp;
++ struct kdbus_reply *r, *r_tmp;
++ struct kdbus_bus *bus;
++ struct kdbus_conn *c;
++ LIST_HEAD(msg_list);
++ int i, ret = 0;
++
++ if (WARN_ON(conn_src == conn_dst))
++ return;
++
++ bus = conn_src->ep->bus;
++
++ /* lock order: domain -> bus -> ep -> names -> conn */
++ down_read(&bus->conn_rwlock);
++ hash_for_each(bus->conn_hash, i, c, hentry) {
++ if (c == conn_src || c == conn_dst)
++ continue;
++
++ mutex_lock(&c->lock);
++ list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
++ if (r->reply_src != conn_src)
++ continue;
++
++ /* filter messages for a specific name */
++ if (name_id > 0 && r->name_id != name_id)
++ continue;
++
++ kdbus_conn_unref(r->reply_src);
++ r->reply_src = kdbus_conn_ref(conn_dst);
++ }
++ mutex_unlock(&c->lock);
++ }
++ up_read(&bus->conn_rwlock);
++
++ kdbus_conn_lock2(conn_src, conn_dst);
++ list_for_each_entry_safe(e, e_tmp, &conn_src->queue.msg_list, entry) {
++ /* filter messages for a specific name */
++ if (name_id > 0 && e->dst_name_id != name_id)
++ continue;
++
++ if (!(conn_dst->flags & KDBUS_HELLO_ACCEPT_FD) &&
++ e->gaps && e->gaps->n_fds > 0) {
++ kdbus_conn_lost_message(conn_dst);
++ kdbus_queue_entry_free(e);
++ continue;
++ }
++
++ ret = kdbus_queue_entry_move(e, conn_dst);
++ if (ret < 0) {
++ kdbus_conn_lost_message(conn_dst);
++ kdbus_queue_entry_free(e);
++ continue;
++ }
++ }
++ kdbus_conn_unlock2(conn_src, conn_dst);
++
++ /* wake up poll() */
++ wake_up_interruptible(&conn_dst->wait);
++}
++
++/* query the policy-database for all names of @whom */
++static bool kdbus_conn_policy_query_all(struct kdbus_conn *conn,
++ const struct cred *conn_creds,
++ struct kdbus_policy_db *db,
++ struct kdbus_conn *whom,
++ unsigned int access)
++{
++ struct kdbus_name_owner *owner;
++ bool pass = false;
++ int res;
++
++ lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
++
++ down_read(&db->entries_rwlock);
++ mutex_lock(&whom->lock);
++
++ list_for_each_entry(owner, &whom->names_list, conn_entry) {
++ if (owner->flags & KDBUS_NAME_IN_QUEUE)
++ continue;
++
++ res = kdbus_policy_query_unlocked(db,
++ conn_creds ? : conn->cred,
++ owner->name->name,
++ kdbus_strhash(owner->name->name));
++ if (res >= (int)access) {
++ pass = true;
++ break;
++ }
++ }
++
++ mutex_unlock(&whom->lock);
++ up_read(&db->entries_rwlock);
++
++ return pass;
++}
++
++/**
++ * kdbus_conn_policy_own_name() - verify a connection can own the given name
++ * @conn: Connection
++ * @conn_creds: Credentials of @conn to use for policy check
++ * @name: Name
++ *
++ * This verifies that @conn is allowed to acquire the well-known name @name.
++ *
++ * Return: true if allowed, false if not.
++ */
++bool kdbus_conn_policy_own_name(struct kdbus_conn *conn,
++ const struct cred *conn_creds,
++ const char *name)
++{
++ unsigned int hash = kdbus_strhash(name);
++ int res;
++
++ if (!conn_creds)
++ conn_creds = conn->cred;
++
++ if (conn->ep->user) {
++ res = kdbus_policy_query(&conn->ep->policy_db, conn_creds,
++ name, hash);
++ if (res < KDBUS_POLICY_OWN)
++ return false;
++ }
++
++ if (conn->owner)
++ return true;
++
++ res = kdbus_policy_query(&conn->ep->bus->policy_db, conn_creds,
++ name, hash);
++ return res >= KDBUS_POLICY_OWN;
++}
++
++/**
++ * kdbus_conn_policy_talk() - verify a connection can talk to a given peer
++ * @conn: Connection that tries to talk
++ * @conn_creds: Credentials of @conn to use for policy check
++ * @to: Connection that is talked to
++ *
++ * This verifies that @conn is allowed to talk to @to.
++ *
++ * Return: true if allowed, false if not.
++ */
++bool kdbus_conn_policy_talk(struct kdbus_conn *conn,
++ const struct cred *conn_creds,
++ struct kdbus_conn *to)
++{
++ if (!conn_creds)
++ conn_creds = conn->cred;
++
++ if (conn->ep->user &&
++ !kdbus_conn_policy_query_all(conn, conn_creds, &conn->ep->policy_db,
++ to, KDBUS_POLICY_TALK))
++ return false;
++
++ if (conn->owner)
++ return true;
++ if (uid_eq(conn_creds->euid, to->cred->uid))
++ return true;
++
++ return kdbus_conn_policy_query_all(conn, conn_creds,
++ &conn->ep->bus->policy_db, to,
++ KDBUS_POLICY_TALK);
++}
++
++/**
++ * kdbus_conn_policy_see_name_unlocked() - verify a connection can see a given
++ * name
++ * @conn: Connection
++ * @conn_creds: Credentials of @conn to use for policy check
++ * @name: Name
++ *
++ * This verifies that @conn is allowed to see the well-known name @name. Caller
++ * must hold policy-lock.
++ *
++ * Return: true if allowed, false if not.
++ */
++bool kdbus_conn_policy_see_name_unlocked(struct kdbus_conn *conn,
++ const struct cred *conn_creds,
++ const char *name)
++{
++ int res;
++
++ /*
++ * By default, all names are visible on a bus. SEE policies can only be
++ * installed on custom endpoints, where by default no name is visible.
++ */
++ if (!conn->ep->user)
++ return true;
++
++ res = kdbus_policy_query_unlocked(&conn->ep->policy_db,
++ conn_creds ? : conn->cred,
++ name, kdbus_strhash(name));
++ return res >= KDBUS_POLICY_SEE;
++}
++
++static bool kdbus_conn_policy_see_name(struct kdbus_conn *conn,
++ const struct cred *conn_creds,
++ const char *name)
++{
++ bool res;
++
++ down_read(&conn->ep->policy_db.entries_rwlock);
++ res = kdbus_conn_policy_see_name_unlocked(conn, conn_creds, name);
++ up_read(&conn->ep->policy_db.entries_rwlock);
++
++ return res;
++}
++
++static bool kdbus_conn_policy_see(struct kdbus_conn *conn,
++ const struct cred *conn_creds,
++ struct kdbus_conn *whom)
++{
++ /*
++ * By default, all names are visible on a bus, so a connection can
++ * always see other connections. SEE policies can only be installed on
++ * custom endpoints, where by default no name is visible and we hide
++ * peers from each other, unless you see at least _one_ name of the
++ * peer.
++ */
++ return !conn->ep->user ||
++ kdbus_conn_policy_query_all(conn, conn_creds,
++ &conn->ep->policy_db, whom,
++ KDBUS_POLICY_SEE);
++}
++
++/**
++ * kdbus_conn_policy_see_notification() - verify a connection is allowed to
++ * receive a given kernel notification
++ * @conn: Connection
++ * @conn_creds: Credentials of @conn to use for policy check
++ * @msg: Notification message
++ *
++ * This checks whether @conn is allowed to see the kernel notification.
++ *
++ * Return: true if allowed, false if not.
++ */
++bool kdbus_conn_policy_see_notification(struct kdbus_conn *conn,
++ const struct cred *conn_creds,
++ const struct kdbus_msg *msg)
++{
++ /*
++ * Depending on the notification type, broadcast kernel notifications
++ * have to be filtered:
++ *
++ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}: This notification is forwarded
++ * to a peer if, and only if, that peer can see the name this
++ * notification is for.
++ *
++ * KDBUS_ITEM_ID_{ADD,REMOVE}: Notifications for ID changes are
++ * broadcast to everyone, to allow tracking peers.
++ */
++
++ switch (msg->items[0].type) {
++ case KDBUS_ITEM_NAME_ADD:
++ case KDBUS_ITEM_NAME_REMOVE:
++ case KDBUS_ITEM_NAME_CHANGE:
++ return kdbus_conn_policy_see_name(conn, conn_creds,
++ msg->items[0].name_change.name);
++
++ case KDBUS_ITEM_ID_ADD:
++ case KDBUS_ITEM_ID_REMOVE:
++ return true;
++
++ default:
++ WARN(1, "Invalid type for notification broadcast: %llu\n",
++ (unsigned long long)msg->items[0].type);
++ return false;
++ }
++}
++
++/**
++ * kdbus_cmd_hello() - handle KDBUS_CMD_HELLO
++ * @ep: Endpoint to operate on
++ * @file: File this connection is opened on
++ * @argp: Command payload
++ *
++ * Return: NULL or newly created connection on success, ERR_PTR on failure.
++ */
++struct kdbus_conn *kdbus_cmd_hello(struct kdbus_ep *ep, struct file *file,
++ void __user *argp)
++{
++ struct kdbus_cmd_hello *cmd;
++ struct kdbus_conn *c = NULL;
++ const char *item_name;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ { .type = KDBUS_ITEM_NAME },
++ { .type = KDBUS_ITEM_CREDS },
++ { .type = KDBUS_ITEM_PIDS },
++ { .type = KDBUS_ITEM_SECLABEL },
++ { .type = KDBUS_ITEM_CONN_DESCRIPTION },
++ { .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
++ KDBUS_HELLO_ACCEPT_FD |
++ KDBUS_HELLO_ACTIVATOR |
++ KDBUS_HELLO_POLICY_HOLDER |
++ KDBUS_HELLO_MONITOR,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret < 0)
++ return ERR_PTR(ret);
++ if (ret > 0)
++ return NULL;
++
++ item_name = argv[1].item ? argv[1].item->str : NULL;
++
++ c = kdbus_conn_new(ep, file, cmd, item_name,
++ argv[2].item ? &argv[2].item->creds : NULL,
++ argv[3].item ? &argv[3].item->pids : NULL,
++ argv[4].item ? argv[4].item->str : NULL,
++ argv[5].item ? argv[5].item->str : NULL);
++ if (IS_ERR(c)) {
++ ret = PTR_ERR(c);
++ c = NULL;
++ goto exit;
++ }
++
++ ret = kdbus_conn_connect(c, item_name);
++ if (ret < 0)
++ goto exit;
++
++ if (kdbus_conn_is_activator(c) || kdbus_conn_is_policy_holder(c)) {
++ ret = kdbus_conn_acquire(c);
++ if (ret < 0)
++ goto exit;
++
++ ret = kdbus_policy_set(&c->ep->bus->policy_db, args.items,
++ args.items_size, 1,
++ kdbus_conn_is_policy_holder(c), c);
++ kdbus_conn_release(c);
++ if (ret < 0)
++ goto exit;
++ }
++
++ if (copy_to_user(argp, cmd, sizeof(*cmd)))
++ ret = -EFAULT;
++
++exit:
++ ret = kdbus_args_clear(&args, ret);
++ if (ret < 0) {
++ if (c) {
++ kdbus_conn_disconnect(c, false);
++ kdbus_conn_unref(c);
++ }
++ return ERR_PTR(ret);
++ }
++ return c;
++}
++
++/**
++ * kdbus_cmd_byebye_unlocked() - handle KDBUS_CMD_BYEBYE
++ * @conn: connection to operate on
++ * @argp: command payload
++ *
++ * The caller must not hold any active reference to @conn or this will deadlock.
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_byebye_unlocked(struct kdbus_conn *conn, void __user *argp)
++{
++ struct kdbus_cmd *cmd;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ if (!kdbus_conn_is_ordinary(conn))
++ return -EOPNOTSUPP;
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret != 0)
++ return ret;
++
++ ret = kdbus_conn_disconnect(conn, true);
++ return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_conn_info() - handle KDBUS_CMD_CONN_INFO
++ * @conn: connection to operate on
++ * @argp: command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_conn_info(struct kdbus_conn *conn, void __user *argp)
++{
++ struct kdbus_meta_conn *conn_meta = NULL;
++ struct kdbus_pool_slice *slice = NULL;
++ struct kdbus_name_entry *entry = NULL;
++ struct kdbus_name_owner *owner = NULL;
++ struct kdbus_conn *owner_conn = NULL;
++ struct kdbus_item *meta_items = NULL;
++ struct kdbus_info info = {};
++ struct kdbus_cmd_info *cmd;
++ struct kdbus_bus *bus = conn->ep->bus;
++ struct kvec kvec[3];
++ size_t meta_size, cnt = 0;
++ const char *name;
++ u64 attach_flags, size = 0;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ { .type = KDBUS_ITEM_NAME },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret != 0)
++ return ret;
++
++ /* registry must be held throughout lookup *and* collecting data */
++ down_read(&bus->name_registry->rwlock);
++
++ ret = kdbus_sanitize_attach_flags(cmd->attach_flags, &attach_flags);
++ if (ret < 0)
++ goto exit;
++
++ name = argv[1].item ? argv[1].item->str : NULL;
++
++ if (name) {
++ entry = kdbus_name_lookup_unlocked(bus->name_registry, name);
++ if (entry)
++ owner = kdbus_name_get_owner(entry);
++ if (!owner ||
++ !kdbus_conn_policy_see_name(conn, current_cred(), name) ||
++ (cmd->id != 0 && owner->conn->id != cmd->id)) {
++ /* pretend a name doesn't exist if you cannot see it */
++ ret = -ESRCH;
++ goto exit;
++ }
++
++ owner_conn = kdbus_conn_ref(owner->conn);
++ } else if (cmd->id > 0) {
++ owner_conn = kdbus_bus_find_conn_by_id(bus, cmd->id);
++ if (!owner_conn || !kdbus_conn_policy_see(conn, current_cred(),
++ owner_conn)) {
++ /* pretend an id doesn't exist if you cannot see it */
++ ret = -ENXIO;
++ goto exit;
++ }
++ } else {
++ ret = -EINVAL;
++ goto exit;
++ }
++
++ attach_flags &= atomic64_read(&owner_conn->attach_flags_send);
++
++ conn_meta = kdbus_meta_conn_new();
++ if (IS_ERR(conn_meta)) {
++ ret = PTR_ERR(conn_meta);
++ conn_meta = NULL;
++ goto exit;
++ }
++
++ ret = kdbus_meta_conn_collect(conn_meta, owner_conn, 0, attach_flags);
++ if (ret < 0)
++ goto exit;
++
++ ret = kdbus_meta_emit(owner_conn->meta_proc, owner_conn->meta_fake,
++ conn_meta, conn, attach_flags,
++ &meta_items, &meta_size);
++ if (ret < 0)
++ goto exit;
++
++ info.id = owner_conn->id;
++ info.flags = owner_conn->flags;
++
++ kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &size);
++ if (meta_size > 0) {
++ kdbus_kvec_set(&kvec[cnt++], meta_items, meta_size, &size);
++ cnt += !!kdbus_kvec_pad(&kvec[cnt], &size);
++ }
++
++ info.size = size;
++
++ slice = kdbus_pool_slice_alloc(conn->pool, size, false);
++ if (IS_ERR(slice)) {
++ ret = PTR_ERR(slice);
++ slice = NULL;
++ goto exit;
++ }
++
++ ret = kdbus_pool_slice_copy_kvec(slice, 0, kvec, cnt, size);
++ if (ret < 0)
++ goto exit;
++
++ kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->info_size);
++
++ if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
++ kdbus_member_set_user(&cmd->info_size, argp,
++ typeof(*cmd), info_size)) {
++ ret = -EFAULT;
++ goto exit;
++ }
++
++ ret = 0;
++
++exit:
++ up_read(&bus->name_registry->rwlock);
++ kdbus_pool_slice_release(slice);
++ kfree(meta_items);
++ kdbus_meta_conn_unref(conn_meta);
++ kdbus_conn_unref(owner_conn);
++ return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_update() - handle KDBUS_CMD_UPDATE
++ * @conn: connection to operate on
++ * @argp: command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_update(struct kdbus_conn *conn, void __user *argp)
++{
++ struct kdbus_item *item_policy;
++ u64 *item_attach_send = NULL;
++ u64 *item_attach_recv = NULL;
++ struct kdbus_cmd *cmd;
++ u64 attach_send;
++ u64 attach_recv;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ { .type = KDBUS_ITEM_ATTACH_FLAGS_SEND },
++ { .type = KDBUS_ITEM_ATTACH_FLAGS_RECV },
++ { .type = KDBUS_ITEM_NAME, .multiple = true },
++ { .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret != 0)
++ return ret;
++
++ item_attach_send = argv[1].item ? &argv[1].item->data64[0] : NULL;
++ item_attach_recv = argv[2].item ? &argv[2].item->data64[0] : NULL;
++ item_policy = argv[3].item ? : argv[4].item;
++
++ if (item_attach_send) {
++ if (!kdbus_conn_is_ordinary(conn) &&
++ !kdbus_conn_is_monitor(conn)) {
++ ret = -EOPNOTSUPP;
++ goto exit;
++ }
++
++ ret = kdbus_sanitize_attach_flags(*item_attach_send,
++ &attach_send);
++ if (ret < 0)
++ goto exit;
++ }
++
++ if (item_attach_recv) {
++ if (!kdbus_conn_is_ordinary(conn) &&
++ !kdbus_conn_is_monitor(conn) &&
++ !kdbus_conn_is_activator(conn)) {
++ ret = -EOPNOTSUPP;
++ goto exit;
++ }
++
++ ret = kdbus_sanitize_attach_flags(*item_attach_recv,
++ &attach_recv);
++ if (ret < 0)
++ goto exit;
++ }
++
++ if (item_policy && !kdbus_conn_is_policy_holder(conn)) {
++ ret = -EOPNOTSUPP;
++ goto exit;
++ }
++
++ /* now that we verified the input, update the connection */
++
++ if (item_policy) {
++ ret = kdbus_policy_set(&conn->ep->bus->policy_db, cmd->items,
++ KDBUS_ITEMS_SIZE(cmd, items),
++ 1, true, conn);
++ if (ret < 0)
++ goto exit;
++ }
++
++ if (item_attach_send)
++ atomic64_set(&conn->attach_flags_send, attach_send);
++
++ if (item_attach_recv)
++ atomic64_set(&conn->attach_flags_recv, attach_recv);
++
++exit:
++ return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_send() - handle KDBUS_CMD_SEND
++ * @conn: connection to operate on
++ * @f: file this command was called on
++ * @argp: command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_send(struct kdbus_conn *conn, struct file *f, void __user *argp)
++{
++ struct kdbus_cmd_send *cmd;
++ struct kdbus_staging *staging = NULL;
++ struct kdbus_msg *msg = NULL;
++ struct file *cancel_fd = NULL;
++ int ret, ret2;
++
++ /* command arguments */
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ { .type = KDBUS_ITEM_CANCEL_FD },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
++ KDBUS_SEND_SYNC_REPLY,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ /* message arguments */
++ struct kdbus_arg msg_argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ { .type = KDBUS_ITEM_PAYLOAD_VEC, .multiple = true },
++ { .type = KDBUS_ITEM_PAYLOAD_MEMFD, .multiple = true },
++ { .type = KDBUS_ITEM_FDS },
++ { .type = KDBUS_ITEM_BLOOM_FILTER },
++ { .type = KDBUS_ITEM_DST_NAME },
++ };
++ struct kdbus_args msg_args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
++ KDBUS_MSG_EXPECT_REPLY |
++ KDBUS_MSG_NO_AUTO_START |
++ KDBUS_MSG_SIGNAL,
++ .argv = msg_argv,
++ .argc = ARRAY_SIZE(msg_argv),
++ };
++
++ if (!kdbus_conn_is_ordinary(conn))
++ return -EOPNOTSUPP;
++
++ /* make sure to parse both, @cmd and @msg on negotiation */
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret < 0)
++ goto exit;
++ else if (ret > 0 && !cmd->msg_address) /* negotiation without msg */
++ goto exit;
++
++ ret2 = kdbus_args_parse_msg(&msg_args, KDBUS_PTR(cmd->msg_address),
++ &msg);
++ if (ret2 < 0) { /* cannot parse message */
++ ret = ret2;
++ goto exit;
++ } else if (ret2 > 0 && !ret) { /* msg-negot implies cmd-negot */
++ ret = -EINVAL;
++ goto exit;
++ } else if (ret > 0) { /* negotiation */
++ goto exit;
++ }
++
++ /* here we parsed both, @cmd and @msg, and neither wants negotiation */
++
++ cmd->reply.return_flags = 0;
++ kdbus_pool_publish_empty(conn->pool, &cmd->reply.offset,
++ &cmd->reply.msg_size);
++
++ if (argv[1].item) {
++ cancel_fd = fget(argv[1].item->fds[0]);
++ if (!cancel_fd) {
++ ret = -EBADF;
++ goto exit;
++ }
++
++ if (!cancel_fd->f_op->poll) {
++ ret = -EINVAL;
++ goto exit;
++ }
++ }
++
++ /* patch-in the source of this message */
++ if (msg->src_id > 0 && msg->src_id != conn->id) {
++ ret = -EINVAL;
++ goto exit;
++ }
++ msg->src_id = conn->id;
++
++ staging = kdbus_staging_new_user(conn->ep->bus, cmd, msg);
++ if (IS_ERR(staging)) {
++ ret = PTR_ERR(staging);
++ staging = NULL;
++ goto exit;
++ }
++
++ if (msg->dst_id == KDBUS_DST_ID_BROADCAST) {
++ down_read(&conn->ep->bus->name_registry->rwlock);
++ kdbus_bus_broadcast(conn->ep->bus, conn, staging);
++ up_read(&conn->ep->bus->name_registry->rwlock);
++ } else if (cmd->flags & KDBUS_SEND_SYNC_REPLY) {
++ struct kdbus_reply *r;
++ ktime_t exp;
++
++ exp = ns_to_ktime(msg->timeout_ns);
++ r = kdbus_conn_call(conn, staging, exp);
++ if (IS_ERR(r)) {
++ ret = PTR_ERR(r);
++ goto exit;
++ }
++
++ ret = kdbus_conn_wait_reply(conn, cmd, f, cancel_fd, r, exp);
++ kdbus_reply_unref(r);
++ if (ret < 0)
++ goto exit;
++ } else if ((msg->flags & KDBUS_MSG_EXPECT_REPLY) ||
++ msg->cookie_reply == 0) {
++ ret = kdbus_conn_unicast(conn, staging);
++ if (ret < 0)
++ goto exit;
++ } else {
++ ret = kdbus_conn_reply(conn, staging);
++ if (ret < 0)
++ goto exit;
++ }
++
++ if (kdbus_member_set_user(&cmd->reply, argp, typeof(*cmd), reply))
++ ret = -EFAULT;
++
++exit:
++ if (cancel_fd)
++ fput(cancel_fd);
++ kdbus_staging_free(staging);
++ ret = kdbus_args_clear(&msg_args, ret);
++ return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_recv() - handle KDBUS_CMD_RECV
++ * @conn: connection to operate on
++ * @argp: command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_recv(struct kdbus_conn *conn, void __user *argp)
++{
++ struct kdbus_queue_entry *entry;
++ struct kdbus_cmd_recv *cmd;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
++ KDBUS_RECV_PEEK |
++ KDBUS_RECV_DROP |
++ KDBUS_RECV_USE_PRIORITY,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ if (!kdbus_conn_is_ordinary(conn) &&
++ !kdbus_conn_is_monitor(conn) &&
++ !kdbus_conn_is_activator(conn))
++ return -EOPNOTSUPP;
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret != 0)
++ return ret;
++
++ cmd->dropped_msgs = 0;
++ cmd->msg.return_flags = 0;
++ kdbus_pool_publish_empty(conn->pool, &cmd->msg.offset,
++ &cmd->msg.msg_size);
++
++ /* DROP+priority is not reliable, so prevent it */
++ if ((cmd->flags & KDBUS_RECV_DROP) &&
++ (cmd->flags & KDBUS_RECV_USE_PRIORITY)) {
++ ret = -EINVAL;
++ goto exit;
++ }
++
++ mutex_lock(&conn->lock);
++
++ entry = kdbus_queue_peek(&conn->queue, cmd->priority,
++ cmd->flags & KDBUS_RECV_USE_PRIORITY);
++ if (!entry) {
++ mutex_unlock(&conn->lock);
++ ret = -EAGAIN;
++ } else if (cmd->flags & KDBUS_RECV_DROP) {
++ struct kdbus_reply *reply = kdbus_reply_ref(entry->reply);
++
++ kdbus_queue_entry_free(entry);
++
++ mutex_unlock(&conn->lock);
++
++ if (reply) {
++ mutex_lock(&reply->reply_dst->lock);
++ if (!list_empty(&reply->entry)) {
++ kdbus_reply_unlink(reply);
++ if (reply->sync)
++ kdbus_sync_reply_wakeup(reply, -EPIPE);
++ else
++ kdbus_notify_reply_dead(conn->ep->bus,
++ reply->reply_dst->id,
++ reply->cookie);
++ }
++ mutex_unlock(&reply->reply_dst->lock);
++ kdbus_notify_flush(conn->ep->bus);
++ }
++
++ kdbus_reply_unref(reply);
++ } else {
++ bool install_fds;
++
++ /*
++ * PEEK just returns the location of the next message. Do not
++ * install FDs nor memfds nor anything else. The only
++ * information of interest should be the message header and
++ * metadata. Any FD numbers in the payload are undefined for
++ * PEEK'ed messages.
++ * Also make sure to never install fds into a connection that
++ * has refused to receive any. Ordinary connections will not get
++ * messages with FDs queued (the receiver will get -ECOMM), but
++ * eavesdroppers might.
++ */
++ install_fds = (conn->flags & KDBUS_HELLO_ACCEPT_FD) &&
++ !(cmd->flags & KDBUS_RECV_PEEK);
++
++ ret = kdbus_queue_entry_install(entry,
++ &cmd->msg.return_flags,
++ install_fds);
++ if (ret < 0) {
++ mutex_unlock(&conn->lock);
++ goto exit;
++ }
++
++ kdbus_pool_slice_publish(entry->slice, &cmd->msg.offset,
++ &cmd->msg.msg_size);
++
++ if (!(cmd->flags & KDBUS_RECV_PEEK))
++ kdbus_queue_entry_free(entry);
++
++ mutex_unlock(&conn->lock);
++ }
++
++ cmd->dropped_msgs = atomic_xchg(&conn->lost_count, 0);
++ if (cmd->dropped_msgs > 0)
++ cmd->return_flags |= KDBUS_RECV_RETURN_DROPPED_MSGS;
++
++ if (kdbus_member_set_user(&cmd->msg, argp, typeof(*cmd), msg) ||
++ kdbus_member_set_user(&cmd->dropped_msgs, argp, typeof(*cmd),
++ dropped_msgs))
++ ret = -EFAULT;
++
++exit:
++ return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_free() - handle KDBUS_CMD_FREE
++ * @conn: connection to operate on
++ * @argp: command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_free(struct kdbus_conn *conn, void __user *argp)
++{
++ struct kdbus_cmd_free *cmd;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ if (!kdbus_conn_is_ordinary(conn) &&
++ !kdbus_conn_is_monitor(conn) &&
++ !kdbus_conn_is_activator(conn))
++ return -EOPNOTSUPP;
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret != 0)
++ return ret;
++
++ ret = kdbus_pool_release_offset(conn->pool, cmd->offset);
++
++ return kdbus_args_clear(&args, ret);
++}
+diff --git a/ipc/kdbus/connection.h b/ipc/kdbus/connection.h
+new file mode 100644
+index 0000000..1ad0820
+--- /dev/null
++++ b/ipc/kdbus/connection.h
+@@ -0,0 +1,260 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_CONNECTION_H
++#define __KDBUS_CONNECTION_H
++
++#include <linux/atomic.h>
++#include <linux/kref.h>
++#include <linux/lockdep.h>
++#include <linux/path.h>
++
++#include "limits.h"
++#include "metadata.h"
++#include "pool.h"
++#include "queue.h"
++#include "util.h"
++
++#define KDBUS_HELLO_SPECIAL_CONN (KDBUS_HELLO_ACTIVATOR | \
++ KDBUS_HELLO_POLICY_HOLDER | \
++ KDBUS_HELLO_MONITOR)
++
++struct kdbus_name_entry;
++struct kdbus_quota;
++struct kdbus_staging;
++
++/**
++ * struct kdbus_conn - connection to a bus
++ * @kref: Reference count
++ * @active: Active references to the connection
++ * @id: Connection ID
++ * @flags: KDBUS_HELLO_* flags
++ * @attach_flags_send: KDBUS_ATTACH_* flags for sending
++ * @attach_flags_recv: KDBUS_ATTACH_* flags for receiving
++ * @description: Human-readable connection description, used for
++ * debugging. This field is only set when the
++ * connection is created.
++ * @ep: The endpoint this connection belongs to
++ * @lock: Connection data lock
++ * @hentry: Entry in ID <-> connection map
++ * @ep_entry: Entry in endpoint
++ * @monitor_entry: Entry in monitor, if the connection is a monitor
++ * @reply_list: List of connections this connection should
++ * reply to
++ * @work: Delayed work to handle
++ * timeouts
++ * @match_db: Subscription filter to broadcast messages
++ * @meta_proc: Process metadata of connection creator, or NULL
++ * @meta_fake: Faked metadata, or NULL
++ * @pool: The user's buffer to receive messages
++ * @user: Owner of the connection
++ * @cred: The credentials of the connection at creation time
++ * @pid: Pid at creation time
++ * @root_path: Root path at creation time
++ * @request_count: Number of pending requests issued by this
++ * connection that are waiting for replies from
++ * other peers
++ * @lost_count: Number of lost broadcast messages
++ * @wait: Wake up this endpoint
++ * @queue: The message queue associated with this connection
++ * @quota: Array of per-user quota indexed by user->id
++ * @n_quota: Number of elements in quota array
++ * @names_list: List of well-known names
++ * @name_count: Number of owned well-known names
++ * @privileged: Whether this connection is privileged on the domain
++ * @owner: Owned by the same user as the bus owner
++ */
++struct kdbus_conn {
++ struct kref kref;
++ atomic_t active;
++#ifdef CONFIG_DEBUG_LOCK_ALLOC
++ struct lockdep_map dep_map;
++#endif
++ u64 id;
++ u64 flags;
++ atomic64_t attach_flags_send;
++ atomic64_t attach_flags_recv;
++ const char *description;
++ struct kdbus_ep *ep;
++ struct mutex lock;
++ struct hlist_node hentry;
++ struct list_head ep_entry;
++ struct list_head monitor_entry;
++ struct list_head reply_list;
++ struct delayed_work work;
++ struct kdbus_match_db *match_db;
++ struct kdbus_meta_proc *meta_proc;
++ struct kdbus_meta_fake *meta_fake;
++ struct kdbus_pool *pool;
++ struct kdbus_user *user;
++ const struct cred *cred;
++ struct pid *pid;
++ struct path root_path;
++ atomic_t request_count;
++ atomic_t lost_count;
++ wait_queue_head_t wait;
++ struct kdbus_queue queue;
++
++ struct kdbus_quota *quota;
++ unsigned int n_quota;
++
++ /* protected by registry->rwlock */
++ struct list_head names_list;
++ unsigned int name_count;
++
++ bool privileged:1;
++ bool owner:1;
++};
++
++struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn);
++struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn);
++bool kdbus_conn_active(const struct kdbus_conn *conn);
++int kdbus_conn_acquire(struct kdbus_conn *conn);
++void kdbus_conn_release(struct kdbus_conn *conn);
++int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty);
++bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name);
++int kdbus_conn_quota_inc(struct kdbus_conn *c, struct kdbus_user *u,
++ size_t memory, size_t fds);
++void kdbus_conn_quota_dec(struct kdbus_conn *c, struct kdbus_user *u,
++ size_t memory, size_t fds);
++void kdbus_conn_lost_message(struct kdbus_conn *c);
++int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
++ struct kdbus_conn *conn_dst,
++ struct kdbus_staging *staging,
++ struct kdbus_reply *reply,
++ const struct kdbus_name_entry *name);
++void kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
++ struct kdbus_conn *conn_src,
++ u64 name_id);
++
++/* policy */
++bool kdbus_conn_policy_own_name(struct kdbus_conn *conn,
++ const struct cred *conn_creds,
++ const char *name);
++bool kdbus_conn_policy_talk(struct kdbus_conn *conn,
++ const struct cred *conn_creds,
++ struct kdbus_conn *to);
++bool kdbus_conn_policy_see_name_unlocked(struct kdbus_conn *conn,
++ const struct cred *curr_creds,
++ const char *name);
++bool kdbus_conn_policy_see_notification(struct kdbus_conn *conn,
++ const struct cred *curr_creds,
++ const struct kdbus_msg *msg);
++
++/* command dispatcher */
++struct kdbus_conn *kdbus_cmd_hello(struct kdbus_ep *ep, struct file *file,
++ void __user *argp);
++int kdbus_cmd_byebye_unlocked(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_conn_info(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_update(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_send(struct kdbus_conn *conn, struct file *f, void __user *argp);
++int kdbus_cmd_recv(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_free(struct kdbus_conn *conn, void __user *argp);
++
++/**
++ * kdbus_conn_is_ordinary() - Check if connection is ordinary
++ * @conn: The connection to check
++ *
++ * Return: Non-zero if the connection is an ordinary connection
++ */
++static inline int kdbus_conn_is_ordinary(const struct kdbus_conn *conn)
++{
++ return !(conn->flags & KDBUS_HELLO_SPECIAL_CONN);
++}
++
++/**
++ * kdbus_conn_is_activator() - Check if connection is an activator
++ * @conn: The connection to check
++ *
++ * Return: Non-zero if the connection is an activator
++ */
++static inline int kdbus_conn_is_activator(const struct kdbus_conn *conn)
++{
++ return conn->flags & KDBUS_HELLO_ACTIVATOR;
++}
++
++/**
++ * kdbus_conn_is_policy_holder() - Check if connection is a policy holder
++ * @conn: The connection to check
++ *
++ * Return: Non-zero if the connection is a policy holder
++ */
++static inline int kdbus_conn_is_policy_holder(const struct kdbus_conn *conn)
++{
++ return conn->flags & KDBUS_HELLO_POLICY_HOLDER;
++}
++
++/**
++ * kdbus_conn_is_monitor() - Check if connection is a monitor
++ * @conn: The connection to check
++ *
++ * Return: Non-zero if the connection is a monitor
++ */
++static inline int kdbus_conn_is_monitor(const struct kdbus_conn *conn)
++{
++ return conn->flags & KDBUS_HELLO_MONITOR;
++}
++
++/**
++ * kdbus_conn_lock2() - Lock two connections
++ * @a: connection A to lock or NULL
++ * @b: connection B to lock or NULL
++ *
++ * Lock two connections at once. As we need to have a stable locking order, we
++ * always lock the connection with lower memory address first.
++ */
++static inline void kdbus_conn_lock2(struct kdbus_conn *a, struct kdbus_conn *b)
++{
++ if (a < b) {
++ if (a)
++ mutex_lock(&a->lock);
++ if (b && b != a)
++ mutex_lock_nested(&b->lock, !!a);
++ } else {
++ if (b)
++ mutex_lock(&b->lock);
++ if (a && a != b)
++ mutex_lock_nested(&a->lock, !!b);
++ }
++}
++
++/**
++ * kdbus_conn_unlock2() - Unlock two connections
++ * @a: connection A to unlock or NULL
++ * @b: connection B to unlock or NULL
++ *
++ * Unlock two connections at once. See kdbus_conn_lock2().
++ */
++static inline void kdbus_conn_unlock2(struct kdbus_conn *a,
++ struct kdbus_conn *b)
++{
++ if (a)
++ mutex_unlock(&a->lock);
++ if (b && b != a)
++ mutex_unlock(&b->lock);
++}
++
++/**
++ * kdbus_conn_assert_active() - lockdep assert on active lock
++ * @conn: connection that shall be active
++ *
++ * This verifies via lockdep that the caller holds an active reference to the
++ * given connection.
++ */
++static inline void kdbus_conn_assert_active(struct kdbus_conn *conn)
++{
++ lockdep_assert_held(conn);
++}
++
++#endif
+diff --git a/ipc/kdbus/domain.c b/ipc/kdbus/domain.c
+new file mode 100644
+index 0000000..ac9f760
+--- /dev/null
++++ b/ipc/kdbus/domain.c
+@@ -0,0 +1,296 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/fs.h>
++#include <linux/idr.h>
++#include <linux/init.h>
++#include <linux/module.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++
++#include "bus.h"
++#include "domain.h"
++#include "handle.h"
++#include "item.h"
++#include "limits.h"
++#include "util.h"
++
++static void kdbus_domain_control_free(struct kdbus_node *node)
++{
++ kfree(node);
++}
++
++static struct kdbus_node *kdbus_domain_control_new(struct kdbus_domain *domain,
++ unsigned int access)
++{
++ struct kdbus_node *node;
++ int ret;
++
++ node = kzalloc(sizeof(*node), GFP_KERNEL);
++ if (!node)
++ return ERR_PTR(-ENOMEM);
++
++ kdbus_node_init(node, KDBUS_NODE_CONTROL);
++
++ node->free_cb = kdbus_domain_control_free;
++ node->mode = domain->node.mode;
++ node->mode = S_IRUSR | S_IWUSR;
++ if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
++ node->mode |= S_IRGRP | S_IWGRP;
++ if (access & KDBUS_MAKE_ACCESS_WORLD)
++ node->mode |= S_IROTH | S_IWOTH;
++
++ ret = kdbus_node_link(node, &domain->node, "control");
++ if (ret < 0)
++ goto exit_free;
++
++ return node;
++
++exit_free:
++ kdbus_node_deactivate(node);
++ kdbus_node_unref(node);
++ return ERR_PTR(ret);
++}
++
++static void kdbus_domain_free(struct kdbus_node *node)
++{
++ struct kdbus_domain *domain =
++ container_of(node, struct kdbus_domain, node);
++
++ put_user_ns(domain->user_namespace);
++ ida_destroy(&domain->user_ida);
++ idr_destroy(&domain->user_idr);
++ kfree(domain);
++}
++
++/**
++ * kdbus_domain_new() - create a new domain
++ * @access: The access mode for this node (KDBUS_MAKE_ACCESS_*)
++ *
++ * Return: a new kdbus_domain on success, ERR_PTR on failure
++ */
++struct kdbus_domain *kdbus_domain_new(unsigned int access)
++{
++ struct kdbus_domain *d;
++ int ret;
++
++ d = kzalloc(sizeof(*d), GFP_KERNEL);
++ if (!d)
++ return ERR_PTR(-ENOMEM);
++
++ kdbus_node_init(&d->node, KDBUS_NODE_DOMAIN);
++
++ d->node.free_cb = kdbus_domain_free;
++ d->node.mode = S_IRUSR | S_IXUSR;
++ if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
++ d->node.mode |= S_IRGRP | S_IXGRP;
++ if (access & KDBUS_MAKE_ACCESS_WORLD)
++ d->node.mode |= S_IROTH | S_IXOTH;
++
++ mutex_init(&d->lock);
++ idr_init(&d->user_idr);
++ ida_init(&d->user_ida);
++
++ /* Pin user namespace so we can guarantee domain-unique bus names. */
++ d->user_namespace = get_user_ns(current_user_ns());
++
++ ret = kdbus_node_link(&d->node, NULL, NULL);
++ if (ret < 0)
++ goto exit_unref;
++
++ return d;
++
++exit_unref:
++ kdbus_node_deactivate(&d->node);
++ kdbus_node_unref(&d->node);
++ return ERR_PTR(ret);
++}
++
++/**
++ * kdbus_domain_ref() - take a domain reference
++ * @domain: Domain
++ *
++ * Return: the domain itself
++ */
++struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain)
++{
++ if (domain)
++ kdbus_node_ref(&domain->node);
++ return domain;
++}
++
++/**
++ * kdbus_domain_unref() - drop a domain reference
++ * @domain: Domain
++ *
++ * When the last reference is dropped, the domain internal structure
++ * is freed.
++ *
++ * Return: NULL
++ */
++struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain)
++{
++ if (domain)
++ kdbus_node_unref(&domain->node);
++ return NULL;
++}
++
++/**
++ * kdbus_domain_populate() - populate static domain nodes
++ * @domain: domain to populate
++ * @access: KDBUS_MAKE_ACCESS_* access restrictions for new nodes
++ *
++ * Allocate and activate static sub-nodes of the given domain. This will fail if
++ * you call it on a non-active node or if the domain was already populated.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_domain_populate(struct kdbus_domain *domain, unsigned int access)
++{
++ struct kdbus_node *control;
++
++ /*
++ * Create a control-node for this domain. We drop our own reference
++ * immediately, effectively causing the node to be deactivated and
++ * released when the parent domain is.
++ */
++ control = kdbus_domain_control_new(domain, access);
++ if (IS_ERR(control))
++ return PTR_ERR(control);
++
++ kdbus_node_activate(control);
++ kdbus_node_unref(control);
++ return 0;
++}
++
++/**
++ * kdbus_user_lookup() - lookup a kdbus_user object
++ * @domain: domain of the user
++ * @uid: uid of the user; INVALID_UID for an anon user
++ *
++ * Lookup the kdbus user accounting object for the given domain. If INVALID_UID
++ * is passed, a new anonymous user is created which is private to the caller.
++ *
++ * Return: the user object on success, ERR_PTR on failure.
++ */
++struct kdbus_user *kdbus_user_lookup(struct kdbus_domain *domain, kuid_t uid)
++{
++ struct kdbus_user *u = NULL, *old = NULL;
++ int ret;
++
++ mutex_lock(&domain->lock);
++
++ if (uid_valid(uid)) {
++ old = idr_find(&domain->user_idr, __kuid_val(uid));
++ /*
++ * If the object is about to be destroyed, ignore it and
++ * replace the slot in the IDR later on.
++ */
++ if (old && kref_get_unless_zero(&old->kref)) {
++ mutex_unlock(&domain->lock);
++ return old;
++ }
++ }
++
++ u = kzalloc(sizeof(*u), GFP_KERNEL);
++ if (!u) {
++ ret = -ENOMEM;
++ goto exit;
++ }
++
++ kref_init(&u->kref);
++ u->domain = kdbus_domain_ref(domain);
++ u->uid = uid;
++ atomic_set(&u->buses, 0);
++ atomic_set(&u->connections, 0);
++
++ if (uid_valid(uid)) {
++ if (old) {
++ idr_replace(&domain->user_idr, u, __kuid_val(uid));
++ old->uid = INVALID_UID; /* mark old as removed */
++ } else {
++ ret = idr_alloc(&domain->user_idr, u, __kuid_val(uid),
++ __kuid_val(uid) + 1, GFP_KERNEL);
++ if (ret < 0)
++ goto exit;
++ }
++ }
++
++ /*
++ * Allocate the smallest possible index for this user; used
++ * in arrays for accounting user quota in receiver queues.
++ */
++ ret = ida_simple_get(&domain->user_ida, 1, 0, GFP_KERNEL);
++ if (ret < 0)
++ goto exit;
++
++ u->id = ret;
++ mutex_unlock(&domain->lock);
++ return u;
++
++exit:
++ if (u) {
++ if (uid_valid(u->uid))
++ idr_remove(&domain->user_idr, __kuid_val(u->uid));
++ kdbus_domain_unref(u->domain);
++ kfree(u);
++ }
++ mutex_unlock(&domain->lock);
++ return ERR_PTR(ret);
++}
++
++static void __kdbus_user_free(struct kref *kref)
++{
++ struct kdbus_user *user = container_of(kref, struct kdbus_user, kref);
++
++ WARN_ON(atomic_read(&user->buses) > 0);
++ WARN_ON(atomic_read(&user->connections) > 0);
++
++ mutex_lock(&user->domain->lock);
++ ida_simple_remove(&user->domain->user_ida, user->id);
++ if (uid_valid(user->uid))
++ idr_remove(&user->domain->user_idr, __kuid_val(user->uid));
++ mutex_unlock(&user->domain->lock);
++
++ kdbus_domain_unref(user->domain);
++ kfree(user);
++}
++
++/**
++ * kdbus_user_ref() - take a user reference
++ * @u: User
++ *
++ * Return: @u is returned
++ */
++struct kdbus_user *kdbus_user_ref(struct kdbus_user *u)
++{
++ if (u)
++ kref_get(&u->kref);
++ return u;
++}
++
++/**
++ * kdbus_user_unref() - drop a user reference
++ * @u: User
++ *
++ * Return: NULL
++ */
++struct kdbus_user *kdbus_user_unref(struct kdbus_user *u)
++{
++ if (u)
++ kref_put(&u->kref, __kdbus_user_free);
++ return NULL;
++}
+diff --git a/ipc/kdbus/domain.h b/ipc/kdbus/domain.h
+new file mode 100644
+index 0000000..447a2bd
+--- /dev/null
++++ b/ipc/kdbus/domain.h
+@@ -0,0 +1,77 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_DOMAIN_H
++#define __KDBUS_DOMAIN_H
++
++#include <linux/fs.h>
++#include <linux/idr.h>
++#include <linux/kref.h>
++#include <linux/user_namespace.h>
++
++#include "node.h"
++
++/**
++ * struct kdbus_domain - domain for buses
++ * @node: Underlying API node
++ * @lock: Domain data lock
++ * @last_id: Last used object id
++ * @user_idr: Set of all users indexed by UID
++ * @user_ida: Set of all users to compute small indices
++ * @user_namespace: User namespace, pinned at creation time
++ * @dentry: Root dentry of VFS mount (don't use outside of kdbusfs)
++ */
++struct kdbus_domain {
++ struct kdbus_node node;
++ struct mutex lock;
++ atomic64_t last_id;
++ struct idr user_idr;
++ struct ida user_ida;
++ struct user_namespace *user_namespace;
++ struct dentry *dentry;
++};
++
++/**
++ * struct kdbus_user - resource accounting for users
++ * @kref: Reference counter
++ * @domain: Domain of the user
++ * @id: Index of this user
++ * @uid: UID of the user
++ * @buses: Number of buses the user has created
++ * @connections: Number of connections the user has created
++ */
++struct kdbus_user {
++ struct kref kref;
++ struct kdbus_domain *domain;
++ unsigned int id;
++ kuid_t uid;
++ atomic_t buses;
++ atomic_t connections;
++};
++
++#define kdbus_domain_from_node(_node) \
++ container_of((_node), struct kdbus_domain, node)
++
++struct kdbus_domain *kdbus_domain_new(unsigned int access);
++struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain);
++struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain);
++int kdbus_domain_populate(struct kdbus_domain *domain, unsigned int access);
++
++#define KDBUS_USER_KERNEL_ID 0 /* ID 0 is reserved for kernel accounting */
++
++struct kdbus_user *kdbus_user_lookup(struct kdbus_domain *domain, kuid_t uid);
++struct kdbus_user *kdbus_user_ref(struct kdbus_user *u);
++struct kdbus_user *kdbus_user_unref(struct kdbus_user *u);
++
++#endif
+diff --git a/ipc/kdbus/endpoint.c b/ipc/kdbus/endpoint.c
+new file mode 100644
+index 0000000..44e7a20
+--- /dev/null
++++ b/ipc/kdbus/endpoint.c
+@@ -0,0 +1,303 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/fs.h>
++#include <linux/idr.h>
++#include <linux/init.h>
++#include <linux/module.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <linux/uio.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "domain.h"
++#include "endpoint.h"
++#include "handle.h"
++#include "item.h"
++#include "message.h"
++#include "policy.h"
++
++static void kdbus_ep_free(struct kdbus_node *node)
++{
++ struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
++
++ WARN_ON(!list_empty(&ep->conn_list));
++
++ kdbus_policy_db_clear(&ep->policy_db);
++ kdbus_bus_unref(ep->bus);
++ kdbus_user_unref(ep->user);
++ kfree(ep);
++}
++
++static void kdbus_ep_release(struct kdbus_node *node, bool was_active)
++{
++ struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
++
++ /* disconnect all connections to this endpoint */
++ for (;;) {
++ struct kdbus_conn *conn;
++
++ mutex_lock(&ep->lock);
++ conn = list_first_entry_or_null(&ep->conn_list,
++ struct kdbus_conn,
++ ep_entry);
++ if (!conn) {
++ mutex_unlock(&ep->lock);
++ break;
++ }
++
++ /* take reference, release lock, disconnect without lock */
++ kdbus_conn_ref(conn);
++ mutex_unlock(&ep->lock);
++
++ kdbus_conn_disconnect(conn, false);
++ kdbus_conn_unref(conn);
++ }
++}
++
++/**
++ * kdbus_ep_new() - create a new endpoint
++ * @bus: The bus this endpoint will be created for
++ * @name: The name of the endpoint
++ * @access: The access flags for this node (KDBUS_MAKE_ACCESS_*)
++ * @uid: The uid of the node
++ * @gid: The gid of the node
++ * @is_custom: Whether this is a custom endpoint
++ *
++ * This function will create a new endpoint with the given
++ * name and properties for a given bus.
++ *
++ * Return: a new kdbus_ep on success, ERR_PTR on failure.
++ */
++struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
++ unsigned int access, kuid_t uid, kgid_t gid,
++ bool is_custom)
++{
++ struct kdbus_ep *e;
++ int ret;
++
++ /*
++ * Validate only custom endpoint names; default endpoints
++ * with a "bus" name are created when the bus is created.
++ */
++ if (is_custom) {
++ ret = kdbus_verify_uid_prefix(name, bus->domain->user_namespace,
++ uid);
++ if (ret < 0)
++ return ERR_PTR(ret);
++ }
++
++ e = kzalloc(sizeof(*e), GFP_KERNEL);
++ if (!e)
++ return ERR_PTR(-ENOMEM);
++
++ kdbus_node_init(&e->node, KDBUS_NODE_ENDPOINT);
++
++ e->node.free_cb = kdbus_ep_free;
++ e->node.release_cb = kdbus_ep_release;
++ e->node.uid = uid;
++ e->node.gid = gid;
++ e->node.mode = S_IRUSR | S_IWUSR;
++ if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
++ e->node.mode |= S_IRGRP | S_IWGRP;
++ if (access & KDBUS_MAKE_ACCESS_WORLD)
++ e->node.mode |= S_IROTH | S_IWOTH;
++
++ mutex_init(&e->lock);
++ INIT_LIST_HEAD(&e->conn_list);
++ kdbus_policy_db_init(&e->policy_db);
++ e->bus = kdbus_bus_ref(bus);
++
++ ret = kdbus_node_link(&e->node, &bus->node, name);
++ if (ret < 0)
++ goto exit_unref;
++
++ /*
++ * Transactions on custom endpoints are never accounted on the global
++ * user limits. Instead, for each custom endpoint, we create a custom,
++ * unique user, which all transactions are accounted on. Regardless of
++ * the user using that endpoint, it is always accounted on the same
++ * user-object. This budget is not shared with ordinary users on
++ * non-custom endpoints.
++ */
++ if (is_custom) {
++ e->user = kdbus_user_lookup(bus->domain, INVALID_UID);
++ if (IS_ERR(e->user)) {
++ ret = PTR_ERR(e->user);
++ e->user = NULL;
++ goto exit_unref;
++ }
++ }
++
++ return e;
++
++exit_unref:
++ kdbus_node_deactivate(&e->node);
++ kdbus_node_unref(&e->node);
++ return ERR_PTR(ret);
++}
++
++/**
++ * kdbus_ep_ref() - increase the reference counter of a kdbus_ep
++ * @ep: The endpoint to reference
++ *
++ * Every user of an endpoint, except for its creator, must add a reference to
++ * the kdbus_ep instance using this function.
++ *
++ * Return: the ep itself
++ */
++struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep)
++{
++ if (ep)
++ kdbus_node_ref(&ep->node);
++ return ep;
++}
++
++/**
++ * kdbus_ep_unref() - decrease the reference counter of a kdbus_ep
++ * @ep: The ep to unref
++ *
++ * Release a reference. If the reference count drops to 0, the ep will be
++ * freed.
++ *
++ * Return: NULL
++ */
++struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep)
++{
++ if (ep)
++ kdbus_node_unref(&ep->node);
++ return NULL;
++}
++
++/**
++ * kdbus_ep_is_privileged() - check whether a file is privileged
++ * @ep: endpoint to operate on
++ * @file: file to test
++ *
++ * Return: True if @file is privileged in the domain of @ep.
++ */
++bool kdbus_ep_is_privileged(struct kdbus_ep *ep, struct file *file)
++{
++ return !ep->user &&
++ file_ns_capable(file, ep->bus->domain->user_namespace,
++ CAP_IPC_OWNER);
++}
++
++/**
++ * kdbus_ep_is_owner() - check whether a file should be treated as bus owner
++ * @ep: endpoint to operate on
++ * @file: file to test
++ *
++ * Return: True if @file should be treated as bus owner on @ep
++ */
++bool kdbus_ep_is_owner(struct kdbus_ep *ep, struct file *file)
++{
++ return !ep->user &&
++ (uid_eq(file->f_cred->euid, ep->bus->node.uid) ||
++ kdbus_ep_is_privileged(ep, file));
++}
++
++/**
++ * kdbus_cmd_ep_make() - handle KDBUS_CMD_ENDPOINT_MAKE
++ * @bus: bus to operate on
++ * @argp: command payload
++ *
++ * Return: NULL or newly created endpoint on success, ERR_PTR on failure.
++ */
++struct kdbus_ep *kdbus_cmd_ep_make(struct kdbus_bus *bus, void __user *argp)
++{
++ const char *item_make_name;
++ struct kdbus_ep *ep = NULL;
++ struct kdbus_cmd *cmd;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ { .type = KDBUS_ITEM_MAKE_NAME, .mandatory = true },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
++ KDBUS_MAKE_ACCESS_GROUP |
++ KDBUS_MAKE_ACCESS_WORLD,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret < 0)
++ return ERR_PTR(ret);
++ if (ret > 0)
++ return NULL;
++
++ item_make_name = argv[1].item->str;
++
++ ep = kdbus_ep_new(bus, item_make_name, cmd->flags,
++ current_euid(), current_egid(), true);
++ if (IS_ERR(ep)) {
++ ret = PTR_ERR(ep);
++ ep = NULL;
++ goto exit;
++ }
++
++ if (!kdbus_node_activate(&ep->node)) {
++ ret = -ESHUTDOWN;
++ goto exit;
++ }
++
++exit:
++ ret = kdbus_args_clear(&args, ret);
++ if (ret < 0) {
++ if (ep) {
++ kdbus_node_deactivate(&ep->node);
++ kdbus_ep_unref(ep);
++ }
++ return ERR_PTR(ret);
++ }
++ return ep;
++}
++
++/**
++ * kdbus_cmd_ep_update() - handle KDBUS_CMD_ENDPOINT_UPDATE
++ * @ep: endpoint to operate on
++ * @argp: command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_ep_update(struct kdbus_ep *ep, void __user *argp)
++{
++ struct kdbus_cmd *cmd;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ { .type = KDBUS_ITEM_NAME, .multiple = true },
++ { .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret != 0)
++ return ret;
++
++ ret = kdbus_policy_set(&ep->policy_db, args.items, args.items_size,
++ 0, true, ep);
++ return kdbus_args_clear(&args, ret);
++}
+diff --git a/ipc/kdbus/endpoint.h b/ipc/kdbus/endpoint.h
+new file mode 100644
+index 0000000..e0da59f
+--- /dev/null
++++ b/ipc/kdbus/endpoint.h
+@@ -0,0 +1,70 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_ENDPOINT_H
++#define __KDBUS_ENDPOINT_H
++
++#include <linux/list.h>
++#include <linux/mutex.h>
++#include <linux/uidgid.h>
++#include "node.h"
++#include "policy.h"
++
++struct kdbus_bus;
++struct kdbus_user;
++
++/**
++ * struct kdbus_ep - endpoint to access a bus
++ * @node: The kdbus node
++ * @lock: Endpoint data lock
++ * @bus: Bus behind this endpoint
++ * @user: Custom endpoints account against an anonymous user
++ * @policy_db: Uploaded policy
++ * @conn_list: Connections of this endpoint
++ *
++ * An endpoint offers access to a bus; the default endpoint node name is "bus".
++ * Additional custom endpoints to the same bus can be created and they can
++ * carry their own policies/filters.
++ */
++struct kdbus_ep {
++ struct kdbus_node node;
++ struct mutex lock;
++
++ /* static */
++ struct kdbus_bus *bus;
++ struct kdbus_user *user;
++
++ /* protected by own locks */
++ struct kdbus_policy_db policy_db;
++
++ /* protected by ep->lock */
++ struct list_head conn_list;
++};
++
++#define kdbus_ep_from_node(_node) \
++ container_of((_node), struct kdbus_ep, node)
++
++struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
++ unsigned int access, kuid_t uid, kgid_t gid,
++ bool policy);
++struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep);
++struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep);
++
++bool kdbus_ep_is_privileged(struct kdbus_ep *ep, struct file *file);
++bool kdbus_ep_is_owner(struct kdbus_ep *ep, struct file *file);
++
++struct kdbus_ep *kdbus_cmd_ep_make(struct kdbus_bus *bus, void __user *argp);
++int kdbus_cmd_ep_update(struct kdbus_ep *ep, void __user *argp);
++
++#endif
+diff --git a/ipc/kdbus/fs.c b/ipc/kdbus/fs.c
+new file mode 100644
+index 0000000..09c4809
+--- /dev/null
++++ b/ipc/kdbus/fs.c
+@@ -0,0 +1,508 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/dcache.h>
++#include <linux/fs.h>
++#include <linux/fsnotify.h>
++#include <linux/init.h>
++#include <linux/ipc_namespace.h>
++#include <linux/magic.h>
++#include <linux/module.h>
++#include <linux/mount.h>
++#include <linux/mutex.h>
++#include <linux/namei.h>
++#include <linux/pagemap.h>
++#include <linux/sched.h>
++#include <linux/slab.h>
++
++#include "bus.h"
++#include "domain.h"
++#include "endpoint.h"
++#include "fs.h"
++#include "handle.h"
++#include "node.h"
++
++#define kdbus_node_from_dentry(_dentry) \
++ ((struct kdbus_node *)(_dentry)->d_fsdata)
++
++static struct inode *fs_inode_get(struct super_block *sb,
++ struct kdbus_node *node);
++
++/*
++ * Directory Management
++ */
++
++static inline unsigned char kdbus_dt_type(struct kdbus_node *node)
++{
++ switch (node->type) {
++ case KDBUS_NODE_DOMAIN:
++ case KDBUS_NODE_BUS:
++ return DT_DIR;
++ case KDBUS_NODE_CONTROL:
++ case KDBUS_NODE_ENDPOINT:
++ return DT_REG;
++ }
++
++ return DT_UNKNOWN;
++}
++
++static int fs_dir_fop_iterate(struct file *file, struct dir_context *ctx)
++{
++ struct dentry *dentry = file->f_path.dentry;
++ struct kdbus_node *parent = kdbus_node_from_dentry(dentry);
++ struct kdbus_node *old, *next = file->private_data;
++
++ /*
++ * kdbusfs directory iterator (modelled after sysfs/kernfs)
++ * When iterating kdbusfs directories, we iterate all children of the
++ * parent kdbus_node object. We use ctx->pos to store the hash of the
++ * child and file->private_data to store a reference to the next node
++ * object. If ctx->pos is not modified via llseek while you iterate a
++ * directory, then we use the file->private_data node pointer to
++ * directly access the next node in the tree.
++ * However, if you directly seek on the directory, we have to find the
++ * closest node to that position and cannot use our node pointer. This
++ * means iterating the rb-tree to find the closest match and start over
++ * from there.
++ * Note that hash values are not necessarily unique. Therefore, llseek
++ * is not guaranteed to seek to the same node that you got when you
++ * retrieved the position. Seeking to 0, 1, 2 and >=INT_MAX is safe,
++ * though. We could use the inode-number as position, but this would
++ * require another rb-tree for fast access. Kernfs and others already
++ * ignore those conflicts, so we should be fine, too.
++ */
++
++ if (!dir_emit_dots(file, ctx))
++ return 0;
++
++ /* acquire @next; if deactivated, or seek detected, find next node */
++ old = next;
++ if (next && ctx->pos == next->hash) {
++ if (kdbus_node_acquire(next))
++ kdbus_node_ref(next);
++ else
++ next = kdbus_node_next_child(parent, next);
++ } else {
++ next = kdbus_node_find_closest(parent, ctx->pos);
++ }
++ kdbus_node_unref(old);
++
++ while (next) {
++ /* emit @next */
++ file->private_data = next;
++ ctx->pos = next->hash;
++
++ kdbus_node_release(next);
++
++ if (!dir_emit(ctx, next->name, strlen(next->name), next->id,
++ kdbus_dt_type(next)))
++ return 0;
++
++ /* find next node after @next */
++ old = next;
++ next = kdbus_node_next_child(parent, next);
++ kdbus_node_unref(old);
++ }
++
++ file->private_data = NULL;
++ ctx->pos = INT_MAX;
++
++ return 0;
++}
++
++static loff_t fs_dir_fop_llseek(struct file *file, loff_t offset, int whence)
++{
++ struct inode *inode = file_inode(file);
++ loff_t ret;
++
++ /* protect f_pos against fop_iterate */
++ mutex_lock(&inode->i_mutex);
++ ret = generic_file_llseek(file, offset, whence);
++ mutex_unlock(&inode->i_mutex);
++
++ return ret;
++}
++
++static int fs_dir_fop_release(struct inode *inode, struct file *file)
++{
++ kdbus_node_unref(file->private_data);
++ return 0;
++}
++
++static const struct file_operations fs_dir_fops = {
++ .read = generic_read_dir,
++ .iterate = fs_dir_fop_iterate,
++ .llseek = fs_dir_fop_llseek,
++ .release = fs_dir_fop_release,
++};
++
++static struct dentry *fs_dir_iop_lookup(struct inode *dir,
++ struct dentry *dentry,
++ unsigned int flags)
++{
++ struct dentry *dnew = NULL;
++ struct kdbus_node *parent;
++ struct kdbus_node *node;
++ struct inode *inode;
++
++ parent = kdbus_node_from_dentry(dentry->d_parent);
++ if (!kdbus_node_acquire(parent))
++ return NULL;
++
++ /* returns reference to _acquired_ child node */
++ node = kdbus_node_find_child(parent, dentry->d_name.name);
++ if (node) {
++ dentry->d_fsdata = node;
++ inode = fs_inode_get(dir->i_sb, node);
++ if (IS_ERR(inode))
++ dnew = ERR_CAST(inode);
++ else
++ dnew = d_splice_alias(inode, dentry);
++
++ kdbus_node_release(node);
++ }
++
++ kdbus_node_release(parent);
++ return dnew;
++}
++
++static const struct inode_operations fs_dir_iops = {
++ .permission = generic_permission,
++ .lookup = fs_dir_iop_lookup,
++};
++
++/*
++ * Inode Management
++ */
++
++static const struct inode_operations fs_inode_iops = {
++ .permission = generic_permission,
++};
++
++static struct inode *fs_inode_get(struct super_block *sb,
++ struct kdbus_node *node)
++{
++ struct inode *inode;
++
++ inode = iget_locked(sb, node->id);
++ if (!inode)
++ return ERR_PTR(-ENOMEM);
++ if (!(inode->i_state & I_NEW))
++ return inode;
++
++ inode->i_private = kdbus_node_ref(node);
++ inode->i_mapping->a_ops = &empty_aops;
++ inode->i_mode = node->mode & S_IALLUGO;
++ inode->i_atime = inode->i_ctime = inode->i_mtime = CURRENT_TIME;
++ inode->i_uid = node->uid;
++ inode->i_gid = node->gid;
++
++ switch (node->type) {
++ case KDBUS_NODE_DOMAIN:
++ case KDBUS_NODE_BUS:
++ inode->i_mode |= S_IFDIR;
++ inode->i_op = &fs_dir_iops;
++ inode->i_fop = &fs_dir_fops;
++ set_nlink(inode, 2);
++ break;
++ case KDBUS_NODE_CONTROL:
++ case KDBUS_NODE_ENDPOINT:
++ inode->i_mode |= S_IFREG;
++ inode->i_op = &fs_inode_iops;
++ inode->i_fop = &kdbus_handle_ops;
++ break;
++ }
++
++ unlock_new_inode(inode);
++
++ return inode;
++}
++
++/*
++ * Superblock Management
++ */
++
++static int fs_super_dop_revalidate(struct dentry *dentry, unsigned int flags)
++{
++ struct kdbus_node *node;
++
++ /* Force lookup on negatives */
++ if (!dentry->d_inode)
++ return 0;
++
++ node = kdbus_node_from_dentry(dentry);
++
++ /* see whether the node has been removed */
++ if (!kdbus_node_is_active(node))
++ return 0;
++
++ return 1;
++}
++
++static void fs_super_dop_release(struct dentry *dentry)
++{
++ kdbus_node_unref(dentry->d_fsdata);
++}
++
++static const struct dentry_operations fs_super_dops = {
++ .d_revalidate = fs_super_dop_revalidate,
++ .d_release = fs_super_dop_release,
++};
++
++static void fs_super_sop_evict_inode(struct inode *inode)
++{
++ struct kdbus_node *node = kdbus_node_from_inode(inode);
++
++ truncate_inode_pages_final(&inode->i_data);
++ clear_inode(inode);
++ kdbus_node_unref(node);
++}
++
++static const struct super_operations fs_super_sops = {
++ .statfs = simple_statfs,
++ .drop_inode = generic_delete_inode,
++ .evict_inode = fs_super_sop_evict_inode,
++};
++
++static int fs_super_fill(struct super_block *sb)
++{
++ struct kdbus_domain *domain = sb->s_fs_info;
++ struct inode *inode;
++ int ret;
++
++ sb->s_blocksize = PAGE_CACHE_SIZE;
++ sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
++ sb->s_magic = KDBUS_SUPER_MAGIC;
++ sb->s_maxbytes = MAX_LFS_FILESIZE;
++ sb->s_op = &fs_super_sops;
++ sb->s_time_gran = 1;
++
++ inode = fs_inode_get(sb, &domain->node);
++ if (IS_ERR(inode))
++ return PTR_ERR(inode);
++
++ sb->s_root = d_make_root(inode);
++ if (!sb->s_root) {
++ /* d_make_root iput()s the inode on failure */
++ return -ENOMEM;
++ }
++
++ /* sb holds domain reference */
++ sb->s_root->d_fsdata = &domain->node;
++ sb->s_d_op = &fs_super_dops;
++
++ /* sb holds root reference */
++ domain->dentry = sb->s_root;
++
++ if (!kdbus_node_activate(&domain->node))
++ return -ESHUTDOWN;
++
++ ret = kdbus_domain_populate(domain, KDBUS_MAKE_ACCESS_WORLD);
++ if (ret < 0)
++ return ret;
++
++ sb->s_flags |= MS_ACTIVE;
++ return 0;
++}
++
++static void fs_super_kill(struct super_block *sb)
++{
++ struct kdbus_domain *domain = sb->s_fs_info;
++
++ if (domain) {
++ kdbus_node_deactivate(&domain->node);
++ domain->dentry = NULL;
++ }
++
++ kill_anon_super(sb);
++ kdbus_domain_unref(domain);
++}
++
++static int fs_super_set(struct super_block *sb, void *data)
++{
++ int ret;
++
++ ret = set_anon_super(sb, data);
++ if (!ret)
++ sb->s_fs_info = data;
++
++ return ret;
++}
++
++static struct dentry *fs_super_mount(struct file_system_type *fs_type,
++ int flags, const char *dev_name,
++ void *data)
++{
++ struct kdbus_domain *domain;
++ struct super_block *sb;
++ int ret;
++
++ domain = kdbus_domain_new(KDBUS_MAKE_ACCESS_WORLD);
++ if (IS_ERR(domain))
++ return ERR_CAST(domain);
++
++ sb = sget(fs_type, NULL, fs_super_set, flags, domain);
++ if (IS_ERR(sb)) {
++ kdbus_node_deactivate(&domain->node);
++ kdbus_domain_unref(domain);
++ return ERR_CAST(sb);
++ }
++
++ WARN_ON(sb->s_fs_info != domain);
++ WARN_ON(sb->s_root);
++
++ ret = fs_super_fill(sb);
++ if (ret < 0) {
++ /* calls into ->kill_sb() when done */
++ deactivate_locked_super(sb);
++ return ERR_PTR(ret);
++ }
++
++ return dget(sb->s_root);
++}
++
++static struct file_system_type fs_type = {
++ .name = KBUILD_MODNAME "fs",
++ .owner = THIS_MODULE,
++ .mount = fs_super_mount,
++ .kill_sb = fs_super_kill,
++ .fs_flags = FS_USERNS_MOUNT,
++};
++
++/**
++ * kdbus_fs_init() - register kdbus filesystem
++ *
++ * This registers a filesystem with the VFS layer. The filesystem is called
++ * `KBUILD_MODNAME "fs"', which usually resolves to `kdbusfs'. The naming
++ * scheme allows setting KBUILD_MODNAME to "kdbus2" to get an
++ * independent filesystem for developers.
++ *
++ * Each mount of the kdbusfs filesystem has a kdbus_domain attached.
++ * Operations on this mount will only affect the attached domain. On each mount
++ * a new domain is automatically created and used for this mount exclusively.
++ * If you want to share a domain across multiple mounts, you need to bind-mount
++ * it.
++ *
++ * Mounts of kdbusfs (with a different domain each) are unrelated to each other
++ * and will never have any effect on any domain but their own.
++ *
++ * Return: 0 on success, negative error otherwise.
++ */
++int kdbus_fs_init(void)
++{
++ return register_filesystem(&fs_type);
++}
++
++/**
++ * kdbus_fs_exit() - unregister kdbus filesystem
++ *
++ * This does the reverse to kdbus_fs_init(). It unregisters the kdbusfs
++ * filesystem from VFS and cleans up any allocated resources.
++ */
++void kdbus_fs_exit(void)
++{
++ unregister_filesystem(&fs_type);
++}
++
++/* acquire domain of @node, making sure all ancestors are active */
++static struct kdbus_domain *fs_acquire_domain(struct kdbus_node *node)
++{
++ struct kdbus_domain *domain;
++ struct kdbus_node *iter;
++
++ /* caller must guarantee that @node is linked */
++ for (iter = node; iter->parent; iter = iter->parent)
++ if (!kdbus_node_is_active(iter->parent))
++ return NULL;
++
++ /* root nodes are always domains */
++ if (WARN_ON(iter->type != KDBUS_NODE_DOMAIN))
++ return NULL;
++
++ domain = kdbus_domain_from_node(iter);
++ if (!kdbus_node_acquire(&domain->node))
++ return NULL;
++
++ return domain;
++}
++
++/**
++ * kdbus_fs_flush() - flush dcache entries of a node
++ * @node: Node to flush entries of
++ *
++ * This flushes all VFS filesystem cache entries for a node and all its
++ * children. This should be called whenever a node is destroyed during
++ * runtime. It will flush the cache entries so the linked objects can be
++ * deallocated.
++ *
++ * This is a no-op if you call it on active nodes (they really should stay in
++ * cache) or on nodes with deactivated parents (flushing the parent is enough).
++ * Furthermore, there is no need to call it on nodes whose lifetime is bound to
++ * their parents'. In those cases, the parent-flush will always also flush the
++ * children.
++ */
++void kdbus_fs_flush(struct kdbus_node *node)
++{
++ struct dentry *dentry, *parent_dentry = NULL;
++ struct kdbus_domain *domain;
++ struct qstr name;
++
++ /* active nodes should remain in cache */
++ if (!kdbus_node_is_deactivated(node))
++ return;
++
++ /* nodes that were never linked were never instantiated */
++ if (!node->parent)
++ return;
++
++ /* acquire domain and verify all ancestors are active */
++ domain = fs_acquire_domain(node);
++ if (!domain)
++ return;
++
++ switch (node->type) {
++ case KDBUS_NODE_ENDPOINT:
++ if (WARN_ON(!node->parent || !node->parent->name))
++ goto exit;
++
++ name.name = node->parent->name;
++ name.len = strlen(node->parent->name);
++ parent_dentry = d_hash_and_lookup(domain->dentry, &name);
++ if (IS_ERR_OR_NULL(parent_dentry))
++ goto exit;
++
++ /* fallthrough */
++ case KDBUS_NODE_BUS:
++ if (WARN_ON(!node->name))
++ goto exit;
++
++ name.name = node->name;
++ name.len = strlen(node->name);
++ dentry = d_hash_and_lookup(parent_dentry ? : domain->dentry,
++ &name);
++ if (!IS_ERR_OR_NULL(dentry)) {
++ d_invalidate(dentry);
++ dput(dentry);
++ }
++
++ dput(parent_dentry);
++ break;
++
++ default:
++ /* all other types are bound to their parent lifetime */
++ break;
++ }
++
++exit:
++ kdbus_node_release(&domain->node);
++}
+diff --git a/ipc/kdbus/fs.h b/ipc/kdbus/fs.h
+new file mode 100644
+index 0000000..62f7d6a
+--- /dev/null
++++ b/ipc/kdbus/fs.h
+@@ -0,0 +1,28 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUSFS_H
++#define __KDBUSFS_H
++
++#include <linux/kernel.h>
++
++struct kdbus_node;
++
++int kdbus_fs_init(void);
++void kdbus_fs_exit(void);
++void kdbus_fs_flush(struct kdbus_node *node);
++
++#define kdbus_node_from_inode(_inode) \
++ ((struct kdbus_node *)(_inode)->i_private)
++
++#endif
+diff --git a/ipc/kdbus/handle.c b/ipc/kdbus/handle.c
+new file mode 100644
+index 0000000..fc60932
+--- /dev/null
++++ b/ipc/kdbus/handle.c
+@@ -0,0 +1,691 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/file.h>
++#include <linux/fs.h>
++#include <linux/idr.h>
++#include <linux/init.h>
++#include <linux/kdev_t.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/poll.h>
++#include <linux/rwsem.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <linux/syscalls.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "endpoint.h"
++#include "fs.h"
++#include "handle.h"
++#include "item.h"
++#include "match.h"
++#include "message.h"
++#include "names.h"
++#include "domain.h"
++#include "policy.h"
++
++static int kdbus_args_verify(struct kdbus_args *args)
++{
++ struct kdbus_item *item;
++ size_t i;
++ int ret;
++
++ KDBUS_ITEMS_FOREACH(item, args->items, args->items_size) {
++ struct kdbus_arg *arg = NULL;
++
++ if (!KDBUS_ITEM_VALID(item, args->items, args->items_size))
++ return -EINVAL;
++
++ for (i = 0; i < args->argc; ++i)
++ if (args->argv[i].type == item->type)
++ break;
++ if (i >= args->argc)
++ return -EINVAL;
++
++ arg = &args->argv[i];
++
++ ret = kdbus_item_validate(item);
++ if (ret < 0)
++ return ret;
++
++ if (arg->item && !arg->multiple)
++ return -EINVAL;
++
++ arg->item = item;
++ }
++
++ if (!KDBUS_ITEMS_END(item, args->items, args->items_size))
++ return -EINVAL;
++
++ return 0;
++}
++
++static int kdbus_args_negotiate(struct kdbus_args *args)
++{
++ struct kdbus_item __user *user;
++ struct kdbus_item *negotiation;
++ size_t i, j, num;
++
++ /*
++ * If KDBUS_FLAG_NEGOTIATE is set, we overwrite the flags field with
++ * the set of supported flags. Furthermore, if a KDBUS_ITEM_NEGOTIATE
++ * item is passed, we iterate its payload (array of u64, each set to an
++ * item type) and clear all unsupported item-types to 0.
++ * The caller might do this recursively, if other flags or objects are
++ * embedded in the payload itself.
++ */
++
++ if (args->cmd->flags & KDBUS_FLAG_NEGOTIATE) {
++ if (put_user(args->allowed_flags & ~KDBUS_FLAG_NEGOTIATE,
++ &args->user->flags))
++ return -EFAULT;
++ }
++
++ if (args->argc < 1 || args->argv[0].type != KDBUS_ITEM_NEGOTIATE ||
++ !args->argv[0].item)
++ return 0;
++
++ negotiation = args->argv[0].item;
++ user = (struct kdbus_item __user *)
++ ((u8 __user *)args->user +
++ ((u8 *)negotiation - (u8 *)args->cmd));
++ num = KDBUS_ITEM_PAYLOAD_SIZE(negotiation) / sizeof(u64);
++
++ for (i = 0; i < num; ++i) {
++ for (j = 0; j < args->argc; ++j)
++ if (negotiation->data64[i] == args->argv[j].type)
++ break;
++
++ if (j < args->argc)
++ continue;
++
++ /* this item is not supported, clear it out */
++ negotiation->data64[i] = 0;
++ if (put_user(negotiation->data64[i], &user->data64[i]))
++ return -EFAULT;
++ }
++
++ return 0;
++}
++
++/**
++ * __kdbus_args_parse() - parse payload of kdbus command
++ * @args: object to parse data into
++ * @is_cmd: whether this is a command or msg payload
++ * @argp: user-space location of command payload to parse
++ * @type_size: overall size of command payload to parse
++ * @items_offset: offset of items array in command payload
++ * @out: output variable to store pointer to copied payload
++ *
++ * This parses the ioctl payload at user-space location @argp into @args. @args
++ * must be pre-initialized by the caller to reflect the supported flags and
++ * items of this command. This parser will then copy the command payload into
++ * kernel-space, verify correctness and consistency and cache pointers to parsed
++ * items and other data in @args.
++ *
++ * If this function succeeded, you must call kdbus_args_clear() to release
++ * allocated resources before destroying @args.
++ *
++ * This can also be used to import kdbus_msg objects. In that case, @is_cmd must
++ * be set to 'false' and the 'return_flags' field will not be touched (as it
++ * doesn't exist on kdbus_msg).
++ *
++ * Return: On failure a negative error code is returned. Otherwise, 1 is
++ * returned if negotiation was requested, 0 if not.
++ */
++int __kdbus_args_parse(struct kdbus_args *args, bool is_cmd, void __user *argp,
++ size_t type_size, size_t items_offset, void **out)
++{
++ u64 user_size;
++ int ret, i;
++
++ ret = kdbus_copy_from_user(&user_size, argp, sizeof(user_size));
++ if (ret < 0)
++ return ret;
++
++ if (user_size < type_size)
++ return -EINVAL;
++ if (user_size > KDBUS_CMD_MAX_SIZE)
++ return -EMSGSIZE;
++
++ if (user_size <= sizeof(args->cmd_buf)) {
++ if (copy_from_user(args->cmd_buf, argp, user_size))
++ return -EFAULT;
++ args->cmd = (void*)args->cmd_buf;
++ } else {
++ args->cmd = memdup_user(argp, user_size);
++ if (IS_ERR(args->cmd))
++ return PTR_ERR(args->cmd);
++ }
++
++ if (args->cmd->size != user_size) {
++ ret = -EINVAL;
++ goto error;
++ }
++
++ if (is_cmd)
++ args->cmd->return_flags = 0;
++ args->user = argp;
++ args->items = (void *)((u8 *)args->cmd + items_offset);
++ args->items_size = args->cmd->size - items_offset;
++ args->is_cmd = is_cmd;
++
++ if (args->cmd->flags & ~args->allowed_flags) {
++ ret = -EINVAL;
++ goto error;
++ }
++
++ ret = kdbus_args_verify(args);
++ if (ret < 0)
++ goto error;
++
++ ret = kdbus_args_negotiate(args);
++ if (ret < 0)
++ goto error;
++
++ /* mandatory items must be given (but not on negotiation) */
++ if (!(args->cmd->flags & KDBUS_FLAG_NEGOTIATE)) {
++ for (i = 0; i < args->argc; ++i)
++ if (args->argv[i].mandatory && !args->argv[i].item) {
++ ret = -EINVAL;
++ goto error;
++ }
++ }
++
++ *out = args->cmd;
++ return !!(args->cmd->flags & KDBUS_FLAG_NEGOTIATE);
++
++error:
++ return kdbus_args_clear(args, ret);
++}
++
++/**
++ * kdbus_args_clear() - release allocated command resources
++ * @args: object to release resources of
++ * @ret: return value of this command
++ *
++ * This frees all allocated resources on @args and copies the command result
++ * flags into user-space. @ret is usually returned unchanged by this function,
++ * so it can be used in the final 'return' statement of the command handler.
++ *
++ * Return: -EFAULT if return values cannot be copied into user-space, otherwise
++ * @ret is returned unchanged.
++ */
++int kdbus_args_clear(struct kdbus_args *args, int ret)
++{
++ if (!args)
++ return ret;
++
++ if (!IS_ERR_OR_NULL(args->cmd)) {
++ if (args->is_cmd && put_user(args->cmd->return_flags,
++ &args->user->return_flags))
++ ret = -EFAULT;
++ if (args->cmd != (void*)args->cmd_buf)
++ kfree(args->cmd);
++ args->cmd = NULL;
++ }
++
++ return ret;
++}
++
++/**
++ * enum kdbus_handle_type - type a handle can be of
++ * @KDBUS_HANDLE_NONE: no type set, yet
++ * @KDBUS_HANDLE_BUS_OWNER: bus owner
++ * @KDBUS_HANDLE_EP_OWNER: endpoint owner
++ * @KDBUS_HANDLE_CONNECTED: endpoint connection after HELLO
++ */
++enum kdbus_handle_type {
++ KDBUS_HANDLE_NONE,
++ KDBUS_HANDLE_BUS_OWNER,
++ KDBUS_HANDLE_EP_OWNER,
++ KDBUS_HANDLE_CONNECTED,
++};
++
++/**
++ * struct kdbus_handle - handle to the kdbus system
++ * @lock: handle lock
++ * @type: type of this handle (KDBUS_HANDLE_*)
++ * @bus_owner: bus this handle owns
++ * @ep_owner: endpoint this handle owns
++ * @conn: connection this handle owns
++ */
++struct kdbus_handle {
++ struct mutex lock;
++
++ enum kdbus_handle_type type;
++ union {
++ struct kdbus_bus *bus_owner;
++ struct kdbus_ep *ep_owner;
++ struct kdbus_conn *conn;
++ };
++};
++
++static int kdbus_handle_open(struct inode *inode, struct file *file)
++{
++ struct kdbus_handle *handle;
++ struct kdbus_node *node;
++ int ret;
++
++ node = kdbus_node_from_inode(inode);
++ if (!kdbus_node_acquire(node))
++ return -ESHUTDOWN;
++
++ handle = kzalloc(sizeof(*handle), GFP_KERNEL);
++ if (!handle) {
++ ret = -ENOMEM;
++ goto exit;
++ }
++
++ mutex_init(&handle->lock);
++ handle->type = KDBUS_HANDLE_NONE;
++
++ file->private_data = handle;
++ ret = 0;
++
++exit:
++ kdbus_node_release(node);
++ return ret;
++}
++
++static int kdbus_handle_release(struct inode *inode, struct file *file)
++{
++ struct kdbus_handle *handle = file->private_data;
++
++ switch (handle->type) {
++ case KDBUS_HANDLE_BUS_OWNER:
++ if (handle->bus_owner) {
++ kdbus_node_deactivate(&handle->bus_owner->node);
++ kdbus_bus_unref(handle->bus_owner);
++ }
++ break;
++ case KDBUS_HANDLE_EP_OWNER:
++ if (handle->ep_owner) {
++ kdbus_node_deactivate(&handle->ep_owner->node);
++ kdbus_ep_unref(handle->ep_owner);
++ }
++ break;
++ case KDBUS_HANDLE_CONNECTED:
++ kdbus_conn_disconnect(handle->conn, false);
++ kdbus_conn_unref(handle->conn);
++ break;
++ case KDBUS_HANDLE_NONE:
++ /* nothing to clean up */
++ break;
++ }
++
++ kfree(handle);
++
++ return 0;
++}
++
++static long kdbus_handle_ioctl_control(struct file *file, unsigned int cmd,
++ void __user *argp)
++{
++ struct kdbus_handle *handle = file->private_data;
++ struct kdbus_node *node = file_inode(file)->i_private;
++ struct kdbus_domain *domain;
++ int ret = 0;
++
++ if (!kdbus_node_acquire(node))
++ return -ESHUTDOWN;
++
++ /*
++ * The parent of control-nodes is always a domain, make sure to pin it
++ * so the parent is actually valid.
++ */
++ domain = kdbus_domain_from_node(node->parent);
++ if (!kdbus_node_acquire(&domain->node)) {
++ kdbus_node_release(node);
++ return -ESHUTDOWN;
++ }
++
++ switch (cmd) {
++ case KDBUS_CMD_BUS_MAKE: {
++ struct kdbus_bus *bus;
++
++ bus = kdbus_cmd_bus_make(domain, argp);
++ if (IS_ERR_OR_NULL(bus)) {
++ ret = PTR_ERR_OR_ZERO(bus);
++ break;
++ }
++
++ handle->bus_owner = bus;
++ ret = KDBUS_HANDLE_BUS_OWNER;
++ break;
++ }
++
++ default:
++ ret = -EBADFD;
++ break;
++ }
++
++ kdbus_node_release(&domain->node);
++ kdbus_node_release(node);
++ return ret;
++}
++
++static long kdbus_handle_ioctl_ep(struct file *file, unsigned int cmd,
++ void __user *buf)
++{
++ struct kdbus_handle *handle = file->private_data;
++ struct kdbus_node *node = file_inode(file)->i_private;
++ struct kdbus_ep *ep, *file_ep = kdbus_ep_from_node(node);
++ struct kdbus_bus *bus = file_ep->bus;
++ struct kdbus_conn *conn;
++ int ret = 0;
++
++ if (!kdbus_node_acquire(node))
++ return -ESHUTDOWN;
++
++ switch (cmd) {
++ case KDBUS_CMD_ENDPOINT_MAKE: {
++ /* creating custom endpoints is a privileged operation */
++ if (!kdbus_ep_is_owner(file_ep, file)) {
++ ret = -EPERM;
++ break;
++ }
++
++ ep = kdbus_cmd_ep_make(bus, buf);
++ if (IS_ERR_OR_NULL(ep)) {
++ ret = PTR_ERR_OR_ZERO(ep);
++ break;
++ }
++
++ handle->ep_owner = ep;
++ ret = KDBUS_HANDLE_EP_OWNER;
++ break;
++ }
++
++ case KDBUS_CMD_HELLO:
++ conn = kdbus_cmd_hello(file_ep, file, buf);
++ if (IS_ERR_OR_NULL(conn)) {
++ ret = PTR_ERR_OR_ZERO(conn);
++ break;
++ }
++
++ handle->conn = conn;
++ ret = KDBUS_HANDLE_CONNECTED;
++ break;
++
++ default:
++ ret = -EBADFD;
++ break;
++ }
++
++ kdbus_node_release(node);
++ return ret;
++}
++
++static long kdbus_handle_ioctl_ep_owner(struct file *file, unsigned int command,
++ void __user *buf)
++{
++ struct kdbus_handle *handle = file->private_data;
++ struct kdbus_ep *ep = handle->ep_owner;
++ int ret;
++
++ if (!kdbus_node_acquire(&ep->node))
++ return -ESHUTDOWN;
++
++ switch (command) {
++ case KDBUS_CMD_ENDPOINT_UPDATE:
++ ret = kdbus_cmd_ep_update(ep, buf);
++ break;
++ default:
++ ret = -EBADFD;
++ break;
++ }
++
++ kdbus_node_release(&ep->node);
++ return ret;
++}
++
++static long kdbus_handle_ioctl_connected(struct file *file,
++ unsigned int command, void __user *buf)
++{
++ struct kdbus_handle *handle = file->private_data;
++ struct kdbus_conn *conn = handle->conn;
++ struct kdbus_conn *release_conn = NULL;
++ int ret;
++
++ release_conn = conn;
++ ret = kdbus_conn_acquire(release_conn);
++ if (ret < 0)
++ return ret;
++
++ switch (command) {
++ case KDBUS_CMD_BYEBYE:
++ /*
++ * BYEBYE is special; we must not acquire a connection when
++ * calling into kdbus_conn_disconnect() or we will deadlock,
++ * because kdbus_conn_disconnect() will wait for all acquired
++ * references to be dropped.
++ */
++ kdbus_conn_release(release_conn);
++ release_conn = NULL;
++ ret = kdbus_cmd_byebye_unlocked(conn, buf);
++ break;
++ case KDBUS_CMD_NAME_ACQUIRE:
++ ret = kdbus_cmd_name_acquire(conn, buf);
++ break;
++ case KDBUS_CMD_NAME_RELEASE:
++ ret = kdbus_cmd_name_release(conn, buf);
++ break;
++ case KDBUS_CMD_LIST:
++ ret = kdbus_cmd_list(conn, buf);
++ break;
++ case KDBUS_CMD_CONN_INFO:
++ ret = kdbus_cmd_conn_info(conn, buf);
++ break;
++ case KDBUS_CMD_BUS_CREATOR_INFO:
++ ret = kdbus_cmd_bus_creator_info(conn, buf);
++ break;
++ case KDBUS_CMD_UPDATE:
++ ret = kdbus_cmd_update(conn, buf);
++ break;
++ case KDBUS_CMD_MATCH_ADD:
++ ret = kdbus_cmd_match_add(conn, buf);
++ break;
++ case KDBUS_CMD_MATCH_REMOVE:
++ ret = kdbus_cmd_match_remove(conn, buf);
++ break;
++ case KDBUS_CMD_SEND:
++ ret = kdbus_cmd_send(conn, file, buf);
++ break;
++ case KDBUS_CMD_RECV:
++ ret = kdbus_cmd_recv(conn, buf);
++ break;
++ case KDBUS_CMD_FREE:
++ ret = kdbus_cmd_free(conn, buf);
++ break;
++ default:
++ ret = -EBADFD;
++ break;
++ }
++
++ kdbus_conn_release(release_conn);
++ return ret;
++}
++
++static long kdbus_handle_ioctl(struct file *file, unsigned int cmd,
++ unsigned long arg)
++{
++ struct kdbus_handle *handle = file->private_data;
++ struct kdbus_node *node = kdbus_node_from_inode(file_inode(file));
++ void __user *argp = (void __user *)arg;
++ long ret = -EBADFD;
++
++ switch (cmd) {
++ case KDBUS_CMD_BUS_MAKE:
++ case KDBUS_CMD_ENDPOINT_MAKE:
++ case KDBUS_CMD_HELLO:
++ mutex_lock(&handle->lock);
++ if (handle->type == KDBUS_HANDLE_NONE) {
++ if (node->type == KDBUS_NODE_CONTROL)
++ ret = kdbus_handle_ioctl_control(file, cmd,
++ argp);
++ else if (node->type == KDBUS_NODE_ENDPOINT)
++ ret = kdbus_handle_ioctl_ep(file, cmd, argp);
++
++ if (ret > 0) {
++ /*
++ * The data given via open() is not sufficient
++ * to set up a kdbus handle. Hence, we require
++ * the user to perform a setup ioctl. This setup
++ * can only be performed once and defines the
++ * type of the handle. The different setup
++ * ioctls are locked against each other so they
++ * cannot race. Once the handle type is set,
++ * the type-dependent ioctls are enabled. To
++ * improve performance, we don't lock those via
++ * handle->lock. Instead, we issue a
++ * write-barrier before performing the
++ * type-change, which pairs with smp_rmb() in
++ * all handlers that access the type field. This
++ * guarantees the handle is fully set up if
++ * handle->type is set. If handle->type is
++ * unset, you must not make any assumptions
++ * without taking handle->lock.
++ * Note that handle->type is only set once. It
++ * will never change afterwards.
++ */
++ smp_wmb();
++ handle->type = ret;
++ }
++ }
++ mutex_unlock(&handle->lock);
++ break;
++
++ case KDBUS_CMD_ENDPOINT_UPDATE:
++ case KDBUS_CMD_BYEBYE:
++ case KDBUS_CMD_NAME_ACQUIRE:
++ case KDBUS_CMD_NAME_RELEASE:
++ case KDBUS_CMD_LIST:
++ case KDBUS_CMD_CONN_INFO:
++ case KDBUS_CMD_BUS_CREATOR_INFO:
++ case KDBUS_CMD_UPDATE:
++ case KDBUS_CMD_MATCH_ADD:
++ case KDBUS_CMD_MATCH_REMOVE:
++ case KDBUS_CMD_SEND:
++ case KDBUS_CMD_RECV:
++ case KDBUS_CMD_FREE: {
++ enum kdbus_handle_type type;
++
++ /*
++ * This read-barrier pairs with smp_wmb() of the handle setup.
++ * It guarantees the handle is fully written, in case the
++ * type has been set. It allows us to access the handle without
++ * taking handle->lock, given the guarantee that the type is
++ * only ever set once, and stays constant afterwards.
++ * Furthermore, the handle object itself is not modified in any
++ * way after the type is set. That is, the type-field is the
++ * last field that is written on any handle. If it has not been
++ * set, we must not access the handle here.
++ */
++ type = handle->type;
++ smp_rmb();
++
++ if (type == KDBUS_HANDLE_EP_OWNER)
++ ret = kdbus_handle_ioctl_ep_owner(file, cmd, argp);
++ else if (type == KDBUS_HANDLE_CONNECTED)
++ ret = kdbus_handle_ioctl_connected(file, cmd, argp);
++
++ break;
++ }
++ default:
++ ret = -ENOTTY;
++ break;
++ }
++
++ return ret < 0 ? ret : 0;
++}
++
++static unsigned int kdbus_handle_poll(struct file *file,
++ struct poll_table_struct *wait)
++{
++ struct kdbus_handle *handle = file->private_data;
++ enum kdbus_handle_type type;
++ unsigned int mask = POLLOUT | POLLWRNORM;
++
++ /*
++ * This pairs with smp_wmb() during handle setup. It guarantees that
++ * _iff_ the handle type is set, handle->conn is valid. Furthermore,
++ * _iff_ the type is set, the handle object is constant and never
++ * changed again. If it's not set, we must not access the handle but
++ * bail out. We also must assume no setup has taken place, yet.
++ */
++ type = handle->type;
++ smp_rmb();
++
++ /* Only a connected endpoint can read/write data */
++ if (type != KDBUS_HANDLE_CONNECTED)
++ return POLLERR | POLLHUP;
++
++ poll_wait(file, &handle->conn->wait, wait);
++
++ /*
++ * Verify the connection hasn't been deactivated _after_ adding the
++ * wait-queue. This guarantees that if the connection is deactivated
++ * after we checked it, the waitqueue is signaled and we're called
++ * again.
++ */
++ if (!kdbus_conn_active(handle->conn))
++ return POLLERR | POLLHUP;
++
++ if (!list_empty(&handle->conn->queue.msg_list) ||
++ atomic_read(&handle->conn->lost_count) > 0)
++ mask |= POLLIN | POLLRDNORM;
++
++ return mask;
++}
++
++static int kdbus_handle_mmap(struct file *file, struct vm_area_struct *vma)
++{
++ struct kdbus_handle *handle = file->private_data;
++ enum kdbus_handle_type type;
++ int ret = -EBADFD;
++
++ /*
++ * This pairs with smp_wmb() during handle setup. It guarantees that
++ * _iff_ the handle type is set, handle->conn is valid. Furthermore,
++ * _iff_ the type is set, the handle object is constant and never
++ * changed again. If it's not set, we must not access the handle but
++ * bail out. We also must assume no setup has taken place, yet.
++ */
++ type = handle->type;
++ smp_rmb();
++
++ /* Only connected handles have a pool we can map */
++ if (type == KDBUS_HANDLE_CONNECTED)
++ ret = kdbus_pool_mmap(handle->conn->pool, vma);
++
++ return ret;
++}
++
++const struct file_operations kdbus_handle_ops = {
++ .owner = THIS_MODULE,
++ .open = kdbus_handle_open,
++ .release = kdbus_handle_release,
++ .poll = kdbus_handle_poll,
++ .llseek = noop_llseek,
++ .unlocked_ioctl = kdbus_handle_ioctl,
++ .mmap = kdbus_handle_mmap,
++#ifdef CONFIG_COMPAT
++ .compat_ioctl = kdbus_handle_ioctl,
++#endif
++};
+diff --git a/ipc/kdbus/handle.h b/ipc/kdbus/handle.h
+new file mode 100644
+index 0000000..5dde2c1
+--- /dev/null
++++ b/ipc/kdbus/handle.h
+@@ -0,0 +1,103 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_HANDLE_H
++#define __KDBUS_HANDLE_H
++
++#include <linux/fs.h>
++#include <uapi/linux/kdbus.h>
++
++extern const struct file_operations kdbus_handle_ops;
++
++/**
++ * kdbus_arg - information and state of a single ioctl command item
++ * @type: item type
++ * @item: set by the parser to the first found item of this type
++ * @multiple: whether multiple items of this type are allowed
++ * @mandatory: whether at least one item of this type is required
++ *
++ * This structure describes a single item in an ioctl command payload. The
++ * caller has to pre-fill the type and flags; the parser will then use this
++ * information to verify the ioctl payload. @item is set by the parser to point
++ * to the first occurrence of the item.
++ */
++struct kdbus_arg {
++ u64 type;
++ struct kdbus_item *item;
++ bool multiple : 1;
++ bool mandatory : 1;
++};
++
++/**
++ * kdbus_args - information and state of ioctl command parser
++ * @allowed_flags: set of flags this command supports
++ * @argc: number of items in @argv
++ * @argv: array of items this command supports
++ * @user: set by parser to user-space location of current command
++ * @cmd: set by parser to kernel copy of command payload
++ * @cmd_buf: inline buf to avoid kmalloc() on small cmds
++ * @items: points to item array in @cmd
++ * @items_size: size of @items in bytes
++ * @is_cmd: whether this is a command-payload or msg-payload
++ *
++ * This structure is used to parse ioctl command payloads on each invocation.
++ * The ioctl handler has to pre-fill the flags and allowed items before passing
++ * the object to kdbus_args_parse(). The parser will copy the command payload
++ * into kernel-space and verify the correctness of the data.
++ *
++ * We use a 256-byte buffer for small command payloads, to be allocated on
++ * stack on syscall entrance.
++ */
++struct kdbus_args {
++ u64 allowed_flags;
++ size_t argc;
++ struct kdbus_arg *argv;
++
++ struct kdbus_cmd __user *user;
++ struct kdbus_cmd *cmd;
++ u8 cmd_buf[256];
++
++ struct kdbus_item *items;
++ size_t items_size;
++ bool is_cmd : 1;
++};
++
++int __kdbus_args_parse(struct kdbus_args *args, bool is_cmd, void __user *argp,
++ size_t type_size, size_t items_offset, void **out);
++int kdbus_args_clear(struct kdbus_args *args, int ret);
++
++#define kdbus_args_parse(_args, _argp, _v) \
++ ({ \
++ BUILD_BUG_ON(offsetof(typeof(**(_v)), size) != \
++ offsetof(struct kdbus_cmd, size)); \
++ BUILD_BUG_ON(offsetof(typeof(**(_v)), flags) != \
++ offsetof(struct kdbus_cmd, flags)); \
++ BUILD_BUG_ON(offsetof(typeof(**(_v)), return_flags) != \
++ offsetof(struct kdbus_cmd, return_flags)); \
++ __kdbus_args_parse((_args), 1, (_argp), sizeof(**(_v)), \
++ offsetof(typeof(**(_v)), items), \
++ (void **)(_v)); \
++ })
++
++#define kdbus_args_parse_msg(_args, _argp, _v) \
++ ({ \
++ BUILD_BUG_ON(offsetof(typeof(**(_v)), size) != \
++ offsetof(struct kdbus_cmd, size)); \
++ BUILD_BUG_ON(offsetof(typeof(**(_v)), flags) != \
++ offsetof(struct kdbus_cmd, flags)); \
++ __kdbus_args_parse((_args), 0, (_argp), sizeof(**(_v)), \
++ offsetof(typeof(**(_v)), items), \
++ (void **)(_v)); \
++ })
++
++#endif
+diff --git a/ipc/kdbus/item.c b/ipc/kdbus/item.c
+new file mode 100644
+index 0000000..ce78dba
+--- /dev/null
++++ b/ipc/kdbus/item.c
+@@ -0,0 +1,293 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/ctype.h>
++#include <linux/fs.h>
++#include <linux/string.h>
++
++#include "item.h"
++#include "limits.h"
++#include "util.h"
++
++/*
++ * This verifies the string at position @str with size @size is properly
++ * zero-terminated and does not contain a 0-byte except at the end.
++ */
++static bool kdbus_str_valid(const char *str, size_t size)
++{
++ return size > 0 && memchr(str, '\0', size) == str + size - 1;
++}
++
++/**
++ * kdbus_item_validate_name() - validate an item containing a name
++ * @item: Item to validate
++ *
++ * Return: zero on success or a negative error code on failure
++ */
++int kdbus_item_validate_name(const struct kdbus_item *item)
++{
++ const char *name = item->str;
++ unsigned int i;
++ size_t len;
++
++ if (item->size < KDBUS_ITEM_HEADER_SIZE + 2)
++ return -EINVAL;
++
++ if (item->size > KDBUS_ITEM_HEADER_SIZE +
++ KDBUS_SYSNAME_MAX_LEN + 1)
++ return -ENAMETOOLONG;
++
++ if (!kdbus_str_valid(name, KDBUS_ITEM_PAYLOAD_SIZE(item)))
++ return -EINVAL;
++
++ len = strlen(name);
++ if (len == 0)
++ return -EINVAL;
++
++ for (i = 0; i < len; i++) {
++ if (isalpha(name[i]))
++ continue;
++ if (isdigit(name[i]))
++ continue;
++ if (name[i] == '_')
++ continue;
++ if (i > 0 && i + 1 < len && (name[i] == '-' || name[i] == '.'))
++ continue;
++
++ return -EINVAL;
++ }
++
++ return 0;
++}
++
++/**
++ * kdbus_item_validate() - validate a single item
++ * @item: item to validate
++ *
++ * Return: 0 if item is valid, negative error code if not.
++ */
++int kdbus_item_validate(const struct kdbus_item *item)
++{
++ size_t payload_size = KDBUS_ITEM_PAYLOAD_SIZE(item);
++ size_t l;
++ int ret;
++
++ BUILD_BUG_ON(KDBUS_ITEM_HEADER_SIZE !=
++ sizeof(struct kdbus_item_header));
++
++ if (item->size < KDBUS_ITEM_HEADER_SIZE)
++ return -EINVAL;
++
++ switch (item->type) {
++ case KDBUS_ITEM_NEGOTIATE:
++ if (payload_size % sizeof(u64) != 0)
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_PAYLOAD_VEC:
++ case KDBUS_ITEM_PAYLOAD_OFF:
++ if (payload_size != sizeof(struct kdbus_vec))
++ return -EINVAL;
++ if (item->vec.size == 0 || item->vec.size > SIZE_MAX)
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_PAYLOAD_MEMFD:
++ if (payload_size != sizeof(struct kdbus_memfd))
++ return -EINVAL;
++ if (item->memfd.size == 0 || item->memfd.size > SIZE_MAX)
++ return -EINVAL;
++ if (item->memfd.fd < 0)
++ return -EBADF;
++ break;
++
++ case KDBUS_ITEM_FDS:
++ if (payload_size % sizeof(int) != 0)
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_CANCEL_FD:
++ if (payload_size != sizeof(int))
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_BLOOM_PARAMETER:
++ if (payload_size != sizeof(struct kdbus_bloom_parameter))
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_BLOOM_FILTER:
++ /* followed by the bloom-mask, depends on the bloom-size */
++ if (payload_size < sizeof(struct kdbus_bloom_filter))
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_BLOOM_MASK:
++ /* size depends on bloom-size of bus */
++ break;
++
++ case KDBUS_ITEM_CONN_DESCRIPTION:
++ case KDBUS_ITEM_MAKE_NAME:
++ ret = kdbus_item_validate_name(item);
++ if (ret < 0)
++ return ret;
++ break;
++
++ case KDBUS_ITEM_ATTACH_FLAGS_SEND:
++ case KDBUS_ITEM_ATTACH_FLAGS_RECV:
++ case KDBUS_ITEM_ID:
++ case KDBUS_ITEM_DST_ID:
++ if (payload_size != sizeof(u64))
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_TIMESTAMP:
++ if (payload_size != sizeof(struct kdbus_timestamp))
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_CREDS:
++ if (payload_size != sizeof(struct kdbus_creds))
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_AUXGROUPS:
++ if (payload_size % sizeof(u32) != 0)
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_NAME:
++ case KDBUS_ITEM_DST_NAME:
++ case KDBUS_ITEM_PID_COMM:
++ case KDBUS_ITEM_TID_COMM:
++ case KDBUS_ITEM_EXE:
++ case KDBUS_ITEM_CMDLINE:
++ case KDBUS_ITEM_CGROUP:
++ case KDBUS_ITEM_SECLABEL:
++ if (!kdbus_str_valid(item->str, payload_size))
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_CAPS:
++ if (payload_size < sizeof(u32))
++ return -EINVAL;
++ if (payload_size < sizeof(u32) +
++ 4 * CAP_TO_INDEX(item->caps.last_cap) * sizeof(u32))
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_AUDIT:
++ if (payload_size != sizeof(struct kdbus_audit))
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_POLICY_ACCESS:
++ if (payload_size != sizeof(struct kdbus_policy_access))
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_NAME_ADD:
++ case KDBUS_ITEM_NAME_REMOVE:
++ case KDBUS_ITEM_NAME_CHANGE:
++ if (payload_size < sizeof(struct kdbus_notify_name_change))
++ return -EINVAL;
++ l = payload_size - offsetof(struct kdbus_notify_name_change,
++ name);
++ if (l > 0 && !kdbus_str_valid(item->name_change.name, l))
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_ID_ADD:
++ case KDBUS_ITEM_ID_REMOVE:
++ if (payload_size != sizeof(struct kdbus_notify_id_change))
++ return -EINVAL;
++ break;
++
++ case KDBUS_ITEM_REPLY_TIMEOUT:
++ case KDBUS_ITEM_REPLY_DEAD:
++ if (payload_size != 0)
++ return -EINVAL;
++ break;
++
++ default:
++ break;
++ }
++
++ return 0;
++}
++
++/**
++ * kdbus_items_validate() - validate items passed by user-space
++ * @items: items to validate
++ * @items_size: size of @items in bytes
++ *
++ * This verifies that the passed items pointer is consistent and valid.
++ * Furthermore, each item is checked for:
++ * - valid "size" value
++ * - payload is of expected type
++ * - payload is fully included in the item
++ * - string payloads are zero-terminated
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_items_validate(const struct kdbus_item *items, size_t items_size)
++{
++ const struct kdbus_item *item;
++ int ret;
++
++ KDBUS_ITEMS_FOREACH(item, items, items_size) {
++ if (!KDBUS_ITEM_VALID(item, items, items_size))
++ return -EINVAL;
++
++ ret = kdbus_item_validate(item);
++ if (ret < 0)
++ return ret;
++ }
++
++ if (!KDBUS_ITEMS_END(item, items, items_size))
++ return -EINVAL;
++
++ return 0;
++}
++
++/**
++ * kdbus_item_set() - Set item content
++ * @item: The item to modify
++ * @type: The item type to set (KDBUS_ITEM_*)
++ * @data: Data to copy to item->data, may be %NULL
++ * @len: Number of bytes in @data
++ *
++ * This sets type, size and data fields of an item. If @data is NULL, the data
++ * memory is cleared.
++ *
++ * Note that you must align your @data memory to 8 bytes. Trailing padding (in
++ * case @len is not 8-byte aligned) is cleared by this call.
++ *
++ * Return: Pointer to the following item.
++ */
++struct kdbus_item *kdbus_item_set(struct kdbus_item *item, u64 type,
++ const void *data, size_t len)
++{
++ item->type = type;
++ item->size = KDBUS_ITEM_HEADER_SIZE + len;
++
++ if (data) {
++ memcpy(item->data, data, len);
++ memset(item->data + len, 0, KDBUS_ALIGN8(len) - len);
++ } else {
++ memset(item->data, 0, KDBUS_ALIGN8(len));
++ }
++
++ return KDBUS_ITEM_NEXT(item);
++}
+diff --git a/ipc/kdbus/item.h b/ipc/kdbus/item.h
+new file mode 100644
+index 0000000..3a7e6cc
+--- /dev/null
++++ b/ipc/kdbus/item.h
+@@ -0,0 +1,61 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_ITEM_H
++#define __KDBUS_ITEM_H
++
++#include <linux/kernel.h>
++#include <uapi/linux/kdbus.h>
++
++#include "util.h"
++
++/* generic access and iterators over a stream of items */
++#define KDBUS_ITEM_NEXT(_i) (typeof(_i))((u8 *)(_i) + KDBUS_ALIGN8((_i)->size))
++#define KDBUS_ITEMS_SIZE(_h, _is) ((_h)->size - offsetof(typeof(*(_h)), _is))
++#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
++#define KDBUS_ITEM_SIZE(_s) KDBUS_ALIGN8(KDBUS_ITEM_HEADER_SIZE + (_s))
++#define KDBUS_ITEM_PAYLOAD_SIZE(_i) ((_i)->size - KDBUS_ITEM_HEADER_SIZE)
++
++#define KDBUS_ITEMS_FOREACH(_i, _is, _s) \
++ for ((_i) = (_is); \
++ ((u8 *)(_i) < (u8 *)(_is) + (_s)) && \
++ ((u8 *)(_i) >= (u8 *)(_is)); \
++ (_i) = KDBUS_ITEM_NEXT(_i))
++
++#define KDBUS_ITEM_VALID(_i, _is, _s) \
++ ((_i)->size >= KDBUS_ITEM_HEADER_SIZE && \
++ (u8 *)(_i) + (_i)->size > (u8 *)(_i) && \
++ (u8 *)(_i) + (_i)->size <= (u8 *)(_is) + (_s) && \
++ (u8 *)(_i) >= (u8 *)(_is))
++
++#define KDBUS_ITEMS_END(_i, _is, _s) \
++ ((u8 *)(_i) == ((u8 *)(_is) + KDBUS_ALIGN8(_s)))
++
++/**
++ * struct kdbus_item_header - Describes the fixed part of an item
++ * @size: The total size of the item
++ * @type: The item type, one of KDBUS_ITEM_*
++ */
++struct kdbus_item_header {
++ u64 size;
++ u64 type;
++};
++
++int kdbus_item_validate_name(const struct kdbus_item *item);
++int kdbus_item_validate(const struct kdbus_item *item);
++int kdbus_items_validate(const struct kdbus_item *items, size_t items_size);
++struct kdbus_item *kdbus_item_set(struct kdbus_item *item, u64 type,
++ const void *data, size_t len);
++
++#endif
+diff --git a/ipc/kdbus/limits.h b/ipc/kdbus/limits.h
+new file mode 100644
+index 0000000..c54925a
+--- /dev/null
++++ b/ipc/kdbus/limits.h
+@@ -0,0 +1,61 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_DEFAULTS_H
++#define __KDBUS_DEFAULTS_H
++
++#include <linux/kernel.h>
++
++/* maximum size of message header and items */
++#define KDBUS_MSG_MAX_SIZE SZ_8K
++
++/* maximum number of memfd items per message */
++#define KDBUS_MSG_MAX_MEMFD_ITEMS 16
++
++/* max size of ioctl command data */
++#define KDBUS_CMD_MAX_SIZE SZ_32K
++
++/* maximum number of inflight fds in a target queue per user */
++#define KDBUS_CONN_MAX_FDS_PER_USER 16
++
++/* maximum message payload size */
++#define KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE SZ_2M
++
++/* maximum size of bloom bit field in bytes */
++#define KDBUS_BUS_BLOOM_MAX_SIZE SZ_4K
++
++/* maximum length of well-known bus name */
++#define KDBUS_NAME_MAX_LEN 255
++
++/* maximum length of bus, domain, ep name */
++#define KDBUS_SYSNAME_MAX_LEN 63
++
++/* maximum number of matches per connection */
++#define KDBUS_MATCH_MAX 256
++
++/* maximum number of queued messages from the same individual user */
++#define KDBUS_CONN_MAX_MSGS 256
++
++/* maximum number of well-known names per connection */
++#define KDBUS_CONN_MAX_NAMES 256
++
++/* maximum number of queued requests waiting for a reply */
++#define KDBUS_CONN_MAX_REQUESTS_PENDING 128
++
++/* maximum number of connections per user in one domain */
++#define KDBUS_USER_MAX_CONN 1024
++
++/* maximum number of buses per user in one domain */
++#define KDBUS_USER_MAX_BUSES 16
++
++#endif
+diff --git a/ipc/kdbus/main.c b/ipc/kdbus/main.c
+new file mode 100644
+index 0000000..1ad4dc8
+--- /dev/null
++++ b/ipc/kdbus/main.c
+@@ -0,0 +1,114 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
++#include <linux/fs.h>
++#include <linux/init.h>
++#include <linux/module.h>
++
++#include "util.h"
++#include "fs.h"
++#include "handle.h"
++#include "metadata.h"
++#include "node.h"
++
++/*
++ * This is a simplified outline of the internal kdbus object relations, for
++ * those interested in the inner life of the driver implementation.
++ *
++ * From a mount point's (domain's) perspective:
++ *
++ * struct kdbus_domain
++ * |» struct kdbus_user *user (many, owned)
++ * '» struct kdbus_node node (embedded)
++ * |» struct kdbus_node children (many, referenced)
++ * |» struct kdbus_node *parent (pinned)
++ * '» struct kdbus_bus (many, pinned)
++ * |» struct kdbus_node node (embedded)
++ * '» struct kdbus_ep (many, pinned)
++ * |» struct kdbus_node node (embedded)
++ * |» struct kdbus_bus *bus (pinned)
++ * |» struct kdbus_conn conn_list (many, pinned)
++ * | |» struct kdbus_ep *ep (pinned)
++ * | |» struct kdbus_name_entry *activator_of (owned)
++ * | |» struct kdbus_match_db *match_db (owned)
++ * | |» struct kdbus_meta *meta (owned)
++ * | |» struct kdbus_match_db *match_db (owned)
++ * | | '» struct kdbus_match_entry (many, owned)
++ * | |
++ * | |» struct kdbus_pool *pool (owned)
++ * | | '» struct kdbus_pool_slice *slices (many, owned)
++ * | | '» struct kdbus_pool *pool (pinned)
++ * | |
++ * | |» struct kdbus_user *user (pinned)
++ * | `» struct kdbus_queue_entry entries (many, embedded)
++ * | |» struct kdbus_pool_slice *slice (pinned)
++ * | |» struct kdbus_conn_reply *reply (owned)
++ * | '» struct kdbus_user *user (pinned)
++ * |
++ * '» struct kdbus_user *user (pinned)
++ * '» struct kdbus_policy_db policy_db (embedded)
++ * |» struct kdbus_policy_db_entry (many, owned)
++ * | |» struct kdbus_conn (pinned)
++ * | '» struct kdbus_ep (pinned)
++ * |
++ * '» struct kdbus_policy_db_cache_entry (many, owned)
++ * '» struct kdbus_conn (pinned)
++ *
++ * For the life-time of a file descriptor derived from calling open() on a file
++ * inside the mount point:
++ *
++ * struct kdbus_handle
++ * |» struct kdbus_meta *meta (owned)
++ * |» struct kdbus_ep *ep (pinned)
++ * |» struct kdbus_conn *conn (owned)
++ * '» struct kdbus_ep *ep (owned)
++ */
++
++/* kdbus mount-point /sys/fs/kdbus */
++static struct kobject *kdbus_dir;
++
++static int __init kdbus_init(void)
++{
++ int ret;
++
++ kdbus_dir = kobject_create_and_add(KBUILD_MODNAME, fs_kobj);
++ if (!kdbus_dir)
++ return -ENOMEM;
++
++ ret = kdbus_fs_init();
++ if (ret < 0) {
++ pr_err("cannot register filesystem: %d\n", ret);
++ goto exit_dir;
++ }
++
++ pr_info("initialized\n");
++ return 0;
++
++exit_dir:
++ kobject_put(kdbus_dir);
++ return ret;
++}
++
++static void __exit kdbus_exit(void)
++{
++ kdbus_fs_exit();
++ kobject_put(kdbus_dir);
++ ida_destroy(&kdbus_node_ida);
++}
++
++module_init(kdbus_init);
++module_exit(kdbus_exit);
++MODULE_LICENSE("GPL");
++MODULE_DESCRIPTION("D-Bus, powerful, easy to use interprocess communication");
++MODULE_ALIAS_FS(KBUILD_MODNAME "fs");
+diff --git a/ipc/kdbus/match.c b/ipc/kdbus/match.c
+new file mode 100644
+index 0000000..4ee6a1f
+--- /dev/null
++++ b/ipc/kdbus/match.c
+@@ -0,0 +1,546 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/fs.h>
++#include <linux/hash.h>
++#include <linux/init.h>
++#include <linux/mutex.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "endpoint.h"
++#include "handle.h"
++#include "item.h"
++#include "match.h"
++#include "message.h"
++#include "names.h"
++
++/**
++ * struct kdbus_match_db - message filters
++ * @entries_list: List of matches
++ * @mdb_rwlock: Match data lock
++ * @entries_count: Number of entries in database
++ */
++struct kdbus_match_db {
++ struct list_head entries_list;
++ struct rw_semaphore mdb_rwlock;
++ unsigned int entries_count;
++};
++
++/**
++ * struct kdbus_match_entry - a match database entry
++ * @cookie: User-supplied cookie to lookup the entry
++ * @list_entry: The list entry element for the db list
++ * @rules_list: The list head for tracking rules of this entry
++ */
++struct kdbus_match_entry {
++ u64 cookie;
++ struct list_head list_entry;
++ struct list_head rules_list;
++};
++
++/**
++ * struct kdbus_bloom_mask - mask to match against filter
++ * @generations: Number of generations carried
++ * @data: Array of bloom bit fields
++ */
++struct kdbus_bloom_mask {
++ u64 generations;
++ u64 *data;
++};
++
++/**
++ * struct kdbus_match_rule - a rule appended to a match entry
++ * @type: An item type to match against
++ * @bloom_mask: Bloom mask to match a message's filter against, used
++ * with KDBUS_ITEM_BLOOM_MASK
++ * @name: Name to match against, used with KDBUS_ITEM_NAME,
++ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}
++ * @old_id: ID to match against, used with
++ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
++ * KDBUS_ITEM_ID_REMOVE
++ * @new_id: ID to match against, used with
++ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
++ * KDBUS_ITEM_ID_ADD
++ * @src_id: ID to match against, used with KDBUS_ITEM_ID
++ * @dst_id: Message destination ID, used with KDBUS_ITEM_DST_ID
++ * @rules_entry: Entry in the entry's rules list
++ */
++struct kdbus_match_rule {
++ u64 type;
++ union {
++ struct kdbus_bloom_mask bloom_mask;
++ struct {
++ char *name;
++ u64 old_id;
++ u64 new_id;
++ };
++ u64 src_id;
++ u64 dst_id;
++ };
++ struct list_head rules_entry;
++};
++
++static void kdbus_match_rule_free(struct kdbus_match_rule *rule)
++{
++ if (!rule)
++ return;
++
++ switch (rule->type) {
++ case KDBUS_ITEM_BLOOM_MASK:
++ kfree(rule->bloom_mask.data);
++ break;
++
++ case KDBUS_ITEM_NAME:
++ case KDBUS_ITEM_NAME_ADD:
++ case KDBUS_ITEM_NAME_REMOVE:
++ case KDBUS_ITEM_NAME_CHANGE:
++ kfree(rule->name);
++ break;
++
++ case KDBUS_ITEM_ID:
++ case KDBUS_ITEM_DST_ID:
++ case KDBUS_ITEM_ID_ADD:
++ case KDBUS_ITEM_ID_REMOVE:
++ break;
++
++ default:
++ BUG();
++ }
++
++ list_del(&rule->rules_entry);
++ kfree(rule);
++}
++
++static void kdbus_match_entry_free(struct kdbus_match_entry *entry)
++{
++ struct kdbus_match_rule *r, *tmp;
++
++ if (!entry)
++ return;
++
++ list_for_each_entry_safe(r, tmp, &entry->rules_list, rules_entry)
++ kdbus_match_rule_free(r);
++
++ list_del(&entry->list_entry);
++ kfree(entry);
++}
++
++/**
++ * kdbus_match_db_free() - free match db resources
++ * @mdb: The match database
++ */
++void kdbus_match_db_free(struct kdbus_match_db *mdb)
++{
++ struct kdbus_match_entry *entry, *tmp;
++
++ if (!mdb)
++ return;
++
++ list_for_each_entry_safe(entry, tmp, &mdb->entries_list, list_entry)
++ kdbus_match_entry_free(entry);
++
++ kfree(mdb);
++}
++
++/**
++ * kdbus_match_db_new() - create a new match database
++ *
++ * Return: a new kdbus_match_db on success, ERR_PTR on failure.
++ */
++struct kdbus_match_db *kdbus_match_db_new(void)
++{
++ struct kdbus_match_db *d;
++
++ d = kzalloc(sizeof(*d), GFP_KERNEL);
++ if (!d)
++ return ERR_PTR(-ENOMEM);
++
++ init_rwsem(&d->mdb_rwlock);
++ INIT_LIST_HEAD(&d->entries_list);
++
++ return d;
++}
++
++static bool kdbus_match_bloom(const struct kdbus_bloom_filter *filter,
++ const struct kdbus_bloom_mask *mask,
++ const struct kdbus_conn *conn)
++{
++ size_t n = conn->ep->bus->bloom.size / sizeof(u64);
++ const u64 *m;
++ size_t i;
++
++ /*
++ * The message's filter carries a generation identifier, the
++ * match's mask possibly carries an array of multiple generations
++ * of the mask. Select the mask with the closest match of the
++ * filter's generation.
++ */
++ m = mask->data + (min(filter->generation, mask->generations - 1) * n);
++
++ /*
++ * The message's filter contains the message's properties,
++ * the match's mask contains the properties to look for in the
++ * message. Check the mask bit field against the filter bit field
++ * to see whether the message possibly carries the properties the
++ * connection has subscribed to.
++ */
++ for (i = 0; i < n; i++)
++ if ((filter->data[i] & m[i]) != m[i])
++ return false;
++
++ return true;
++}
++
++static bool kdbus_match_rule_conn(const struct kdbus_match_rule *r,
++ struct kdbus_conn *c,
++ const struct kdbus_staging *s)
++{
++ lockdep_assert_held(&c->ep->bus->name_registry->rwlock);
++
++ switch (r->type) {
++ case KDBUS_ITEM_BLOOM_MASK:
++ return kdbus_match_bloom(s->bloom_filter, &r->bloom_mask, c);
++ case KDBUS_ITEM_ID:
++ return r->src_id == c->id || r->src_id == KDBUS_MATCH_ID_ANY;
++ case KDBUS_ITEM_DST_ID:
++ return r->dst_id == s->msg->dst_id ||
++ r->dst_id == KDBUS_MATCH_ID_ANY;
++ case KDBUS_ITEM_NAME:
++ return kdbus_conn_has_name(c, r->name);
++ default:
++ return false;
++ }
++}
++
++static bool kdbus_match_rule_kernel(const struct kdbus_match_rule *r,
++ const struct kdbus_staging *s)
++{
++ struct kdbus_item *n = s->notify;
++
++ if (WARN_ON(!n) || n->type != r->type)
++ return false;
++
++ switch (r->type) {
++ case KDBUS_ITEM_ID_ADD:
++ return r->new_id == KDBUS_MATCH_ID_ANY ||
++ r->new_id == n->id_change.id;
++ case KDBUS_ITEM_ID_REMOVE:
++ return r->old_id == KDBUS_MATCH_ID_ANY ||
++ r->old_id == n->id_change.id;
++ case KDBUS_ITEM_NAME_ADD:
++ case KDBUS_ITEM_NAME_CHANGE:
++ case KDBUS_ITEM_NAME_REMOVE:
++ return (r->old_id == KDBUS_MATCH_ID_ANY ||
++ r->old_id == n->name_change.old_id.id) &&
++ (r->new_id == KDBUS_MATCH_ID_ANY ||
++ r->new_id == n->name_change.new_id.id) &&
++ (!r->name || !strcmp(r->name, n->name_change.name));
++ default:
++ return false;
++ }
++}
++
++static bool kdbus_match_rules(const struct kdbus_match_entry *entry,
++ struct kdbus_conn *c,
++ const struct kdbus_staging *s)
++{
++ struct kdbus_match_rule *r;
++
++ list_for_each_entry(r, &entry->rules_list, rules_entry)
++ if ((c && !kdbus_match_rule_conn(r, c, s)) ||
++ (!c && !kdbus_match_rule_kernel(r, s)))
++ return false;
++
++ return true;
++}
++
++/**
++ * kdbus_match_db_match_msg() - match a msg object against the database entries
++ * @mdb: The match database
++ * @conn_src: The connection object originating the message
++ * @staging: Staging object containing the message to match against
++ *
++ * This function will walk through all the database entries previously uploaded
++ * with kdbus_match_db_add(). As soon as any of them has an all-satisfied rule
++ * set, this function will return true.
++ *
++ * The caller must hold the registry lock of conn_src->ep->bus if conn_src
++ * is non-NULL.
++ *
++ * Return: true if there was a matching database entry, false otherwise.
++ */
++bool kdbus_match_db_match_msg(struct kdbus_match_db *mdb,
++ struct kdbus_conn *conn_src,
++ const struct kdbus_staging *staging)
++{
++ struct kdbus_match_entry *entry;
++ bool matched = false;
++
++ down_read(&mdb->mdb_rwlock);
++ list_for_each_entry(entry, &mdb->entries_list, list_entry) {
++ matched = kdbus_match_rules(entry, conn_src, staging);
++ if (matched)
++ break;
++ }
++ up_read(&mdb->mdb_rwlock);
++
++ return matched;
++}
++
++static int kdbus_match_db_remove_unlocked(struct kdbus_match_db *mdb,
++ u64 cookie)
++{
++ struct kdbus_match_entry *entry, *tmp;
++ bool found = false;
++
++ list_for_each_entry_safe(entry, tmp, &mdb->entries_list, list_entry)
++ if (entry->cookie == cookie) {
++ kdbus_match_entry_free(entry);
++ --mdb->entries_count;
++ found = true;
++ }
++
++ return found ? 0 : -EBADSLT;
++}
++
++/**
++ * kdbus_cmd_match_add() - handle KDBUS_CMD_MATCH_ADD
++ * @conn: connection to operate on
++ * @argp: command payload
++ *
++ * One call to this function (or one ioctl(KDBUS_CMD_MATCH_ADD), respectively)
++ * adds one new database entry with n rules attached to it. Each rule is
++ * described with a kdbus_item, and an entry is considered matching if all
++ * its rules are satisfied.
++ *
++ * The items attached to a kdbus_cmd_match struct have the following mapping:
++ *
++ * KDBUS_ITEM_BLOOM_MASK: A bloom mask
++ * KDBUS_ITEM_NAME: A connection's source name
++ * KDBUS_ITEM_ID: A connection ID
++ * KDBUS_ITEM_DST_ID: A connection ID
++ * KDBUS_ITEM_NAME_ADD:
++ * KDBUS_ITEM_NAME_REMOVE:
++ * KDBUS_ITEM_NAME_CHANGE: Well-known name changes, carry
++ * kdbus_notify_name_change
++ * KDBUS_ITEM_ID_ADD:
++ * KDBUS_ITEM_ID_REMOVE: Connection ID changes, carry
++ * kdbus_notify_id_change
++ *
++ * For kdbus_notify_{id,name}_change structs, only the ID and name fields
++ * are looked at when adding an entry. The flags are unused.
++ *
++ * Also note that KDBUS_ITEM_BLOOM_MASK, KDBUS_ITEM_NAME, KDBUS_ITEM_ID,
++ * and KDBUS_ITEM_DST_ID are used to match messages from userspace, while the
++ * others apply to kernel-generated notifications.
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_match_add(struct kdbus_conn *conn, void __user *argp)
++{
++ struct kdbus_match_db *mdb = conn->match_db;
++ struct kdbus_match_entry *entry = NULL;
++ struct kdbus_cmd_match *cmd;
++ struct kdbus_item *item;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ { .type = KDBUS_ITEM_BLOOM_MASK, .multiple = true },
++ { .type = KDBUS_ITEM_NAME, .multiple = true },
++ { .type = KDBUS_ITEM_ID, .multiple = true },
++ { .type = KDBUS_ITEM_DST_ID, .multiple = true },
++ { .type = KDBUS_ITEM_NAME_ADD, .multiple = true },
++ { .type = KDBUS_ITEM_NAME_REMOVE, .multiple = true },
++ { .type = KDBUS_ITEM_NAME_CHANGE, .multiple = true },
++ { .type = KDBUS_ITEM_ID_ADD, .multiple = true },
++ { .type = KDBUS_ITEM_ID_REMOVE, .multiple = true },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
++ KDBUS_MATCH_REPLACE,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ if (!kdbus_conn_is_ordinary(conn))
++ return -EOPNOTSUPP;
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret != 0)
++ return ret;
++
++ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
++ if (!entry) {
++ ret = -ENOMEM;
++ goto exit;
++ }
++
++ entry->cookie = cmd->cookie;
++ INIT_LIST_HEAD(&entry->list_entry);
++ INIT_LIST_HEAD(&entry->rules_list);
++
++ KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
++ struct kdbus_match_rule *rule;
++ size_t size = item->size - offsetof(struct kdbus_item, data);
++
++ rule = kzalloc(sizeof(*rule), GFP_KERNEL);
++ if (!rule) {
++ ret = -ENOMEM;
++ goto exit;
++ }
++
++ rule->type = item->type;
++ INIT_LIST_HEAD(&rule->rules_entry);
++
++ switch (item->type) {
++ case KDBUS_ITEM_BLOOM_MASK: {
++ u64 bsize = conn->ep->bus->bloom.size;
++ u64 generations;
++ u64 remainder;
++
++ generations = div64_u64_rem(size, bsize, &remainder);
++ if (size < bsize || remainder > 0) {
++ ret = -EDOM;
++ break;
++ }
++
++ rule->bloom_mask.data = kmemdup(item->data,
++ size, GFP_KERNEL);
++ if (!rule->bloom_mask.data) {
++ ret = -ENOMEM;
++ break;
++ }
++
++ rule->bloom_mask.generations = generations;
++ break;
++ }
++
++ case KDBUS_ITEM_NAME:
++ if (!kdbus_name_is_valid(item->str, false)) {
++ ret = -EINVAL;
++ break;
++ }
++
++ rule->name = kstrdup(item->str, GFP_KERNEL);
++ if (!rule->name)
++ ret = -ENOMEM;
++
++ break;
++
++ case KDBUS_ITEM_ID:
++ rule->src_id = item->id;
++ break;
++
++ case KDBUS_ITEM_DST_ID:
++ rule->dst_id = item->id;
++ break;
++
++ case KDBUS_ITEM_NAME_ADD:
++ case KDBUS_ITEM_NAME_REMOVE:
++ case KDBUS_ITEM_NAME_CHANGE:
++ rule->old_id = item->name_change.old_id.id;
++ rule->new_id = item->name_change.new_id.id;
++
++ if (size > sizeof(struct kdbus_notify_name_change)) {
++ rule->name = kstrdup(item->name_change.name,
++ GFP_KERNEL);
++ if (!rule->name)
++ ret = -ENOMEM;
++ }
++
++ break;
++
++ case KDBUS_ITEM_ID_ADD:
++ case KDBUS_ITEM_ID_REMOVE:
++ if (item->type == KDBUS_ITEM_ID_ADD)
++ rule->new_id = item->id_change.id;
++ else
++ rule->old_id = item->id_change.id;
++
++ break;
++ }
++
++ if (ret < 0) {
++ kdbus_match_rule_free(rule);
++ goto exit;
++ }
++
++ list_add_tail(&rule->rules_entry, &entry->rules_list);
++ }
++
++ down_write(&mdb->mdb_rwlock);
++
++ /* Remove any entry that has the same cookie as the current one. */
++ if (cmd->flags & KDBUS_MATCH_REPLACE)
++ kdbus_match_db_remove_unlocked(mdb, entry->cookie);
++
++ /*
++ * If the above removal caught any entry, there will be room for the
++ * new one.
++ */
++ if (++mdb->entries_count > KDBUS_MATCH_MAX) {
++ --mdb->entries_count;
++ ret = -EMFILE;
++ } else {
++ list_add_tail(&entry->list_entry, &mdb->entries_list);
++ entry = NULL;
++ }
++
++ up_write(&mdb->mdb_rwlock);
++
++exit:
++ kdbus_match_entry_free(entry);
++ return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_match_remove() - handle KDBUS_CMD_MATCH_REMOVE
++ * @conn: connection to operate on
++ * @argp: command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_match_remove(struct kdbus_conn *conn, void __user *argp)
++{
++ struct kdbus_cmd_match *cmd;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ if (!kdbus_conn_is_ordinary(conn))
++ return -EOPNOTSUPP;
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret != 0)
++ return ret;
++
++ down_write(&conn->match_db->mdb_rwlock);
++ ret = kdbus_match_db_remove_unlocked(conn->match_db, cmd->cookie);
++ up_write(&conn->match_db->mdb_rwlock);
++
++ return kdbus_args_clear(&args, ret);
++}
+diff --git a/ipc/kdbus/match.h b/ipc/kdbus/match.h
+new file mode 100644
+index 0000000..ceb492f
+--- /dev/null
++++ b/ipc/kdbus/match.h
+@@ -0,0 +1,35 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_MATCH_H
++#define __KDBUS_MATCH_H
++
++struct kdbus_conn;
++struct kdbus_match_db;
++struct kdbus_staging;
++
++struct kdbus_match_db *kdbus_match_db_new(void);
++void kdbus_match_db_free(struct kdbus_match_db *db);
++int kdbus_match_db_add(struct kdbus_conn *conn,
++ struct kdbus_cmd_match *cmd);
++int kdbus_match_db_remove(struct kdbus_conn *conn,
++ struct kdbus_cmd_match *cmd);
++bool kdbus_match_db_match_msg(struct kdbus_match_db *db,
++ struct kdbus_conn *conn_src,
++ const struct kdbus_staging *staging);
++
++int kdbus_cmd_match_add(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_match_remove(struct kdbus_conn *conn, void __user *argp);
++
++#endif
+diff --git a/ipc/kdbus/message.c b/ipc/kdbus/message.c
+new file mode 100644
+index 0000000..ae565cd
+--- /dev/null
++++ b/ipc/kdbus/message.c
+@@ -0,0 +1,1040 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/capability.h>
++#include <linux/cgroup.h>
++#include <linux/cred.h>
++#include <linux/file.h>
++#include <linux/init.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/sched.h>
++#include <linux/shmem_fs.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <net/sock.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "domain.h"
++#include "endpoint.h"
++#include "handle.h"
++#include "item.h"
++#include "match.h"
++#include "message.h"
++#include "names.h"
++#include "policy.h"
++
++static const char * const zeros = "\0\0\0\0\0\0\0";
++
++static struct kdbus_gaps *kdbus_gaps_new(size_t n_memfds, size_t n_fds)
++{
++ size_t size_offsets, size_memfds, size_fds, size;
++ struct kdbus_gaps *gaps;
++
++ size_offsets = n_memfds * sizeof(*gaps->memfd_offsets);
++ size_memfds = n_memfds * sizeof(*gaps->memfd_files);
++ size_fds = n_fds * sizeof(*gaps->fd_files);
++ size = sizeof(*gaps) + size_offsets + size_memfds + size_fds;
++
++ gaps = kzalloc(size, GFP_KERNEL);
++ if (!gaps)
++ return ERR_PTR(-ENOMEM);
++
++ kref_init(&gaps->kref);
++ gaps->n_memfds = 0; /* we reserve n_memfds, but don't enforce them */
++ gaps->memfd_offsets = (void *)(gaps + 1);
++ gaps->memfd_files = (void *)((u8 *)gaps->memfd_offsets + size_offsets);
++ gaps->n_fds = 0; /* we reserve n_fds, but don't enforce them */
++ gaps->fd_files = (void *)((u8 *)gaps->memfd_files + size_memfds);
++
++ return gaps;
++}
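The single-allocation layout used by kdbus_gaps_new() above (header struct followed by its arrays in one block) can be sketched in userspace roughly as follows. The struct and function names are hypothetical stand-ins, and calloc() replaces kzalloc().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct kdbus_gaps; field names are illustrative. */
struct gaps_sketch {
	size_t n_offsets;	/* entries in use, starts at 0 */
	size_t n_files;		/* entries in use, starts at 0 */
	uint64_t *offsets;	/* points into the trailing storage */
	void **files;		/* points right behind @offsets */
};

static struct gaps_sketch *gaps_sketch_new(size_t max_offsets, size_t max_files)
{
	size_t size_offsets = max_offsets * sizeof(uint64_t);
	size_t size_files = max_files * sizeof(void *);
	struct gaps_sketch *g;

	/* callers are assumed to bound both counts, so the sum cannot overflow */
	g = calloc(1, sizeof(*g) + size_offsets + size_files);
	if (!g)
		return NULL;

	/* carve the arrays out of the memory that follows the header */
	g->offsets = (void *)(g + 1);
	g->files = (void *)((uint8_t *)g->offsets + size_offsets);
	return g;
}
```

One free() releases the header and both arrays, which mirrors the single kfree() in kdbus_gaps_free().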
++
++static void kdbus_gaps_free(struct kref *kref)
++{
++ struct kdbus_gaps *gaps = container_of(kref, struct kdbus_gaps, kref);
++ size_t i;
++
++ for (i = 0; i < gaps->n_fds; ++i)
++ if (gaps->fd_files[i])
++ fput(gaps->fd_files[i]);
++ for (i = 0; i < gaps->n_memfds; ++i)
++ if (gaps->memfd_files[i])
++ fput(gaps->memfd_files[i]);
++
++ kfree(gaps);
++}
++
++/**
++ * kdbus_gaps_ref() - gain reference
++ * @gaps: gaps object
++ *
++ * Return: @gaps is returned
++ */
++struct kdbus_gaps *kdbus_gaps_ref(struct kdbus_gaps *gaps)
++{
++ if (gaps)
++ kref_get(&gaps->kref);
++ return gaps;
++}
++
++/**
++ * kdbus_gaps_unref() - drop reference
++ * @gaps: gaps object
++ *
++ * Return: NULL
++ */
++struct kdbus_gaps *kdbus_gaps_unref(struct kdbus_gaps *gaps)
++{
++ if (gaps)
++ kref_put(&gaps->kref, kdbus_gaps_free);
++ return NULL;
++}
++
++/**
++ * kdbus_gaps_install() - install file-descriptors
++ * @gaps: gaps object, or NULL
++ * @slice: pool slice that contains the message
++ * @out_incomplete: output variable to note incomplete fds
++ *
++ * This function installs all file-descriptors of @gaps into the current
++ * process and copies the file-descriptor numbers into the target pool slice.
++ *
++ * If the file-descriptors were only partially installed, then @out_incomplete
++ * will be set to true. Otherwise, it's set to false.
++ *
++ * Return: 0 on success, negative error code on failure
++ */
++int kdbus_gaps_install(struct kdbus_gaps *gaps, struct kdbus_pool_slice *slice,
++ bool *out_incomplete)
++{
++ bool incomplete_fds = false;
++ struct kvec kvec;
++ size_t i, n_fds;
++ int ret, *fds;
++
++ if (!gaps) {
++ /* nothing to do */
++ *out_incomplete = incomplete_fds;
++ return 0;
++ }
++
++ n_fds = gaps->n_fds + gaps->n_memfds;
++ if (n_fds < 1) {
++ /* nothing to do */
++ *out_incomplete = incomplete_fds;
++ return 0;
++ }
++
++ fds = kmalloc_array(n_fds, sizeof(*fds), GFP_TEMPORARY);
++ n_fds = 0;
++ if (!fds)
++ return -ENOMEM;
++
++ /* 1) allocate fds and copy them over */
++
++ if (gaps->n_fds > 0) {
++ for (i = 0; i < gaps->n_fds; ++i) {
++ int fd;
++
++ fd = get_unused_fd_flags(O_CLOEXEC);
++ if (fd < 0)
++ incomplete_fds = true;
++
++ WARN_ON(!gaps->fd_files[i]);
++
++ fds[n_fds++] = fd < 0 ? -1 : fd;
++ }
++
++ /*
++ * The file-descriptor array can only be present once per
++ * message. Hence, prepare all fds and then copy them over with
++ * a single kvec.
++ */
++
++ WARN_ON(!gaps->fd_offset);
++
++ kvec.iov_base = fds;
++ kvec.iov_len = gaps->n_fds * sizeof(*fds);
++ ret = kdbus_pool_slice_copy_kvec(slice, gaps->fd_offset,
++ &kvec, 1, kvec.iov_len);
++ if (ret < 0)
++ goto exit;
++ }
++
++ for (i = 0; i < gaps->n_memfds; ++i) {
++ int memfd;
++
++ memfd = get_unused_fd_flags(O_CLOEXEC);
++ if (memfd < 0) {
++ incomplete_fds = true;
++ /* memfds are initialized to -1, skip copying it */
++ continue;
++ }
++
++ fds[n_fds++] = memfd;
++
++ /*
++ * memfds have to be copied individually as they each are put
++ * into a separate item. This should not be an issue, though,
++ * as usually there is no need to send more than one memfd per
++ * message.
++ */
++
++ WARN_ON(!gaps->memfd_offsets[i]);
++ WARN_ON(!gaps->memfd_files[i]);
++
++ kvec.iov_base = &memfd;
++ kvec.iov_len = sizeof(memfd);
++ ret = kdbus_pool_slice_copy_kvec(slice, gaps->memfd_offsets[i],
++ &kvec, 1, kvec.iov_len);
++ if (ret < 0)
++ goto exit;
++ }
++
++ /* 2) install fds now that everything was successful */
++
++ for (i = 0; i < gaps->n_fds; ++i)
++ if (fds[i] >= 0)
++ fd_install(fds[i], get_file(gaps->fd_files[i]));
++ for (i = 0; i < gaps->n_memfds; ++i)
++ if (fds[gaps->n_fds + i] >= 0)
++ fd_install(fds[gaps->n_fds + i],
++ get_file(gaps->memfd_files[i]));
++
++ ret = 0;
++
++exit:
++ if (ret < 0)
++ for (i = 0; i < n_fds; ++i)
++ put_unused_fd(fds[i]);
++ kfree(fds);
++ *out_incomplete = incomplete_fds;
++ return ret;
++}
++
++static struct file *kdbus_get_fd(int fd)
++{
++ struct file *f, *ret;
++ struct inode *inode;
++ struct socket *sock;
++
++ if (fd < 0)
++ return ERR_PTR(-EBADF);
++
++ f = fget_raw(fd);
++ if (!f)
++ return ERR_PTR(-EBADF);
++
++ inode = file_inode(f);
++ sock = S_ISSOCK(inode->i_mode) ? SOCKET_I(inode) : NULL;
++
++ if (f->f_mode & FMODE_PATH)
++ ret = f; /* O_PATH is always allowed */
++ else if (f->f_op == &kdbus_handle_ops)
++ ret = ERR_PTR(-EOPNOTSUPP); /* disallow kdbus-fd over kdbus */
++ else if (sock && sock->sk && sock->ops && sock->ops->family == PF_UNIX)
++ ret = ERR_PTR(-EOPNOTSUPP); /* disallow UDS over kdbus */
++ else
++ ret = f; /* all others are allowed */
++
++ if (f != ret)
++ fput(f);
++
++ return ret;
++}
++
++static struct file *kdbus_get_memfd(const struct kdbus_memfd *memfd)
++{
++ const int m = F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL;
++ struct file *f, *ret;
++ int s;
++
++ if (memfd->fd < 0)
++ return ERR_PTR(-EBADF);
++
++ f = fget(memfd->fd);
++ if (!f)
++ return ERR_PTR(-EBADF);
++
++ s = shmem_get_seals(f);
++ if (s < 0)
++ ret = ERR_PTR(-EMEDIUMTYPE);
++ else if ((s & m) != m)
++ ret = ERR_PTR(-ETXTBSY);
++ else if (memfd->start + memfd->size > (u64)i_size_read(file_inode(f)))
++ ret = ERR_PTR(-EFAULT);
++ else
++ ret = f;
++
++ if (f != ret)
++ fput(f);
++
++ return ret;
++}
++
++static int kdbus_msg_examine(struct kdbus_msg *msg, struct kdbus_bus *bus,
++ struct kdbus_cmd_send *cmd, size_t *out_n_memfds,
++ size_t *out_n_fds, size_t *out_n_parts)
++{
++ struct kdbus_item *item, *fds = NULL, *bloom = NULL, *dstname = NULL;
++ u64 n_parts, n_memfds, n_fds, vec_size;
++
++ /*
++ * Step 1:
++ * Validate the message and command parameters.
++ */
++
++ /* KDBUS_PAYLOAD_KERNEL is reserved for kernel messages */
++ if (msg->payload_type == KDBUS_PAYLOAD_KERNEL)
++ return -EINVAL;
++
++ if (msg->dst_id == KDBUS_DST_ID_BROADCAST) {
++ /* broadcasts must be marked as signals */
++ if (!(msg->flags & KDBUS_MSG_SIGNAL))
++ return -EBADMSG;
++ /* broadcasts cannot have timeouts */
++ if (msg->timeout_ns > 0)
++ return -ENOTUNIQ;
++ }
++
++ if (msg->flags & KDBUS_MSG_EXPECT_REPLY) {
++ /* if you expect a reply, you must specify a timeout */
++ if (msg->timeout_ns == 0)
++ return -EINVAL;
++ /* signals cannot have replies */
++ if (msg->flags & KDBUS_MSG_SIGNAL)
++ return -ENOTUNIQ;
++ } else {
++ /* must expect reply if sent as synchronous call */
++ if (cmd->flags & KDBUS_SEND_SYNC_REPLY)
++ return -EINVAL;
++ /* cannot mark replies as signal */
++ if (msg->cookie_reply && (msg->flags & KDBUS_MSG_SIGNAL))
++ return -EINVAL;
++ }
++
++ /*
++ * Step 2:
++ * Validate all passed items. While at it, collect some statistics that
++ * are required to allocate state objects later on.
++ *
++ * Generic item validation has already been done via
++ * kdbus_item_validate(). Furthermore, the number of items is naturally
++ * limited by the maximum message size. Hence, only non-generic item
++ * checks are performed here (mainly integer overflow tests).
++ */
++
++ n_parts = 0;
++ n_memfds = 0;
++ n_fds = 0;
++ vec_size = 0;
++
++ KDBUS_ITEMS_FOREACH(item, msg->items, KDBUS_ITEMS_SIZE(msg, items)) {
++ switch (item->type) {
++ case KDBUS_ITEM_PAYLOAD_VEC: {
++ void __force __user *ptr = KDBUS_PTR(item->vec.address);
++ u64 size = item->vec.size;
++
++ if (vec_size + size < vec_size)
++ return -EMSGSIZE;
++ if (vec_size + size > KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE)
++ return -EMSGSIZE;
++ if (ptr && unlikely(!access_ok(VERIFY_READ, ptr, size)))
++ return -EFAULT;
++
++ if (ptr || size % 8) /* data or padding */
++ ++n_parts;
++ break;
++ }
++ case KDBUS_ITEM_PAYLOAD_MEMFD: {
++ u64 start = item->memfd.start;
++ u64 size = item->memfd.size;
++
++ if (start + size < start)
++ return -EMSGSIZE;
++ if (n_memfds >= KDBUS_MSG_MAX_MEMFD_ITEMS)
++ return -E2BIG;
++
++ ++n_memfds;
++ if (size % 8) /* vec-padding required */
++ ++n_parts;
++ break;
++ }
++ case KDBUS_ITEM_FDS: {
++ if (fds)
++ return -EEXIST;
++
++ fds = item;
++ n_fds = KDBUS_ITEM_PAYLOAD_SIZE(item) / sizeof(int);
++ if (n_fds > KDBUS_CONN_MAX_FDS_PER_USER)
++ return -EMFILE;
++
++ break;
++ }
++ case KDBUS_ITEM_BLOOM_FILTER: {
++ u64 bloom_size;
++
++ if (bloom)
++ return -EEXIST;
++
++ bloom = item;
++ bloom_size = KDBUS_ITEM_PAYLOAD_SIZE(item) -
++ offsetof(struct kdbus_bloom_filter, data);
++ if (!KDBUS_IS_ALIGNED8(bloom_size))
++ return -EFAULT;
++ if (bloom_size != bus->bloom.size)
++ return -EDOM;
++
++ break;
++ }
++ case KDBUS_ITEM_DST_NAME: {
++ if (dstname)
++ return -EEXIST;
++
++ dstname = item;
++ if (!kdbus_name_is_valid(item->str, false))
++ return -EINVAL;
++ if (msg->dst_id == KDBUS_DST_ID_BROADCAST)
++ return -EBADMSG;
++
++ break;
++ }
++ default:
++ return -EINVAL;
++ }
++ }
++
++ /*
++ * Step 3:
++ * Validate that required items were actually passed, and that no item
++ * contradicts the message flags.
++ */
++
++ /* bloom filters must be attached _iff_ it's a signal */
++ if (!(msg->flags & KDBUS_MSG_SIGNAL) != !bloom)
++ return -EBADMSG;
++ /* destination name is required if no ID is given */
++ if (msg->dst_id == KDBUS_DST_ID_NAME && !dstname)
++ return -EDESTADDRREQ;
++ /* cannot send file-descriptors attached to broadcasts */
++ if (msg->dst_id == KDBUS_DST_ID_BROADCAST && fds)
++ return -ENOTUNIQ;
++
++ *out_n_memfds = n_memfds;
++ *out_n_fds = n_fds;
++ *out_n_parts = n_parts;
++
++ return 0;
++}
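The memfd range validation is split between the wraparound test in kdbus_msg_examine() above and the file-size comparison in kdbus_get_memfd(). A minimal userspace sketch combining both ideas, with a hypothetical helper name and the file size passed in as a plain limit:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of kdbus: returns true if the byte range
 * [start, start + size) neither wraps around nor exceeds @limit.
 */
static bool range_fits(uint64_t start, uint64_t size, uint64_t limit)
{
	/* unsigned addition wraps on overflow; a wrapped sum is below @start */
	if (start + size < start)
		return false;

	return start + size <= limit;
}
```

Doing the wraparound test first matters: without it, a huge size could wrap the sum to a small value and slip past the limit comparison.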
++
++static bool kdbus_staging_merge_vecs(struct kdbus_staging *staging,
++ struct kdbus_item **prev_item,
++ struct iovec **prev_vec,
++ const struct kdbus_item *merge)
++{
++ void __user *ptr = (void __user *)KDBUS_PTR(merge->vec.address);
++ u64 padding = merge->vec.size % 8;
++ struct kdbus_item *prev = *prev_item;
++ struct iovec *vec = *prev_vec;
++
++ /* XXX: merging is disabled so far */
++ if (0 && prev && prev->type == KDBUS_ITEM_PAYLOAD_OFF &&
++ !merge->vec.address == !prev->vec.address) {
++ /*
++ * If we merge two VECs, we can always drop the second
++ * PAYLOAD_VEC item. Hence, include its size in the previous
++ * one.
++ */
++ prev->vec.size += merge->vec.size;
++
++ if (ptr) {
++ /*
++ * If we merge two data VECs, we need two iovecs to copy
++ * the data. But the items can be easily merged by
++ * summing their lengths.
++ */
++ vec = &staging->parts[staging->n_parts++];
++ vec->iov_len = merge->vec.size;
++ vec->iov_base = ptr;
++ staging->n_payload += vec->iov_len;
++ } else if (padding) {
++ /*
++ * If we merge two 0-vecs with the second 0-vec
++ * requiring padding, we need to insert an iovec to copy
++ * the 0-padding. We try merging it with the previous
++ * 0-padding iovec. This might end up with an
++ * iov_len==0, in which case we simply drop the iovec.
++ */
++ if (vec) {
++ staging->n_payload -= vec->iov_len;
++ vec->iov_len = prev->vec.size % 8;
++ if (!vec->iov_len) {
++ --staging->n_parts;
++ vec = NULL;
++ } else {
++ staging->n_payload += vec->iov_len;
++ }
++ } else {
++ vec = &staging->parts[staging->n_parts++];
++ vec->iov_len = padding;
++ vec->iov_base = (char __user *)zeros;
++ staging->n_payload += vec->iov_len;
++ }
++ } else {
++ /*
++ * If we merge two 0-vecs with the second 0-vec having
++ * no padding, we know the padding of the first stays
++ * the same. Hence, @vec needs no adjustment.
++ */
++ }
++
++ /* successfully merged with previous item */
++ merge = prev;
++ } else {
++ /*
++ * If we cannot merge the payload item with the previous one,
++ * we simply insert a new iovec for the data/padding.
++ */
++ if (ptr) {
++ vec = &staging->parts[staging->n_parts++];
++ vec->iov_len = merge->vec.size;
++ vec->iov_base = ptr;
++ staging->n_payload += vec->iov_len;
++ } else if (padding) {
++ vec = &staging->parts[staging->n_parts++];
++ vec->iov_len = padding;
++ vec->iov_base = (char __user *)zeros;
++ staging->n_payload += vec->iov_len;
++ } else {
++ vec = NULL;
++ }
++ }
++
++ *prev_item = (struct kdbus_item *)merge;
++ *prev_vec = vec;
++
++ return merge == prev;
++}
++
++static int kdbus_staging_import(struct kdbus_staging *staging)
++{
++ struct kdbus_item *it, *item, *last, *prev_payload;
++ struct kdbus_gaps *gaps = staging->gaps;
++ struct kdbus_msg *msg = staging->msg;
++ struct iovec *part, *prev_part;
++ bool drop_item;
++
++ drop_item = false;
++ last = NULL;
++ prev_payload = NULL;
++ prev_part = NULL;
++
++ /*
++ * We modify msg->items along the way; make sure to use @item as offset
++ * to the next item (instead of the iterator @it).
++ */
++ for (it = item = msg->items;
++ it >= msg->items &&
++ (u8 *)it < (u8 *)msg + msg->size &&
++ (u8 *)it + it->size <= (u8 *)msg + msg->size; ) {
++ /*
++ * If we dropped items along the way, move current item to
++ * front. We must not access @it afterwards, but use @item
++ * instead!
++ */
++ if (it != item)
++ memmove(item, it, it->size);
++ it = (void *)((u8 *)it + KDBUS_ALIGN8(item->size));
++
++ switch (item->type) {
++ case KDBUS_ITEM_PAYLOAD_VEC: {
++ size_t offset = staging->n_payload;
++
++ if (kdbus_staging_merge_vecs(staging, &prev_payload,
++ &prev_part, item)) {
++ drop_item = true;
++ } else if (item->vec.address) {
++ /* real offset is patched later on */
++ item->type = KDBUS_ITEM_PAYLOAD_OFF;
++ item->vec.offset = offset;
++ } else {
++ item->type = KDBUS_ITEM_PAYLOAD_OFF;
++ item->vec.offset = ~0ULL;
++ }
++
++ break;
++ }
++ case KDBUS_ITEM_PAYLOAD_MEMFD: {
++ struct file *f;
++
++ f = kdbus_get_memfd(&item->memfd);
++ if (IS_ERR(f))
++ return PTR_ERR(f);
++
++ gaps->memfd_files[gaps->n_memfds] = f;
++ gaps->memfd_offsets[gaps->n_memfds] =
++ (u8 *)&item->memfd.fd - (u8 *)msg;
++ ++gaps->n_memfds;
++
++ /* memfds cannot be merged */
++ prev_payload = item;
++ prev_part = NULL;
++
++ /* insert padding to make following VECs aligned */
++ if (item->memfd.size % 8) {
++ part = &staging->parts[staging->n_parts++];
++ part->iov_len = item->memfd.size % 8;
++ part->iov_base = (char __user *)zeros;
++ staging->n_payload += part->iov_len;
++ }
++
++ break;
++ }
++ case KDBUS_ITEM_FDS: {
++ size_t i, n_fds;
++
++ n_fds = KDBUS_ITEM_PAYLOAD_SIZE(item) / sizeof(int);
++ for (i = 0; i < n_fds; ++i) {
++ struct file *f;
++
++ f = kdbus_get_fd(item->fds[i]);
++ if (IS_ERR(f))
++ return PTR_ERR(f);
++
++ gaps->fd_files[gaps->n_fds++] = f;
++ }
++
++ gaps->fd_offset = (u8 *)item->fds - (u8 *)msg;
++
++ break;
++ }
++ case KDBUS_ITEM_BLOOM_FILTER:
++ staging->bloom_filter = &item->bloom_filter;
++ break;
++ case KDBUS_ITEM_DST_NAME:
++ staging->dst_name = item->str;
++ break;
++ }
++
++ /* drop item if we merged it with a previous one */
++ if (drop_item) {
++ drop_item = false;
++ } else {
++ last = item;
++ item = KDBUS_ITEM_NEXT(item);
++ }
++ }
++
++ /* adjust message size regarding dropped items */
++ msg->size = offsetof(struct kdbus_msg, items);
++ if (last)
++ msg->size += ((u8 *)last - (u8 *)msg->items) + last->size;
++
++ return 0;
++}
++
++static void kdbus_staging_reserve(struct kdbus_staging *staging)
++{
++ struct iovec *part;
++
++ part = &staging->parts[staging->n_parts++];
++ part->iov_base = (void __user *)zeros;
++ part->iov_len = 0;
++}
++
++static struct kdbus_staging *kdbus_staging_new(struct kdbus_bus *bus,
++ size_t n_parts,
++ size_t msg_extra_size)
++{
++ const size_t reserved_parts = 5; /* see below for explanation */
++ struct kdbus_staging *staging;
++ int ret;
++
++ n_parts += reserved_parts;
++
++ staging = kzalloc(sizeof(*staging) + n_parts * sizeof(*staging->parts) +
++ msg_extra_size, GFP_TEMPORARY);
++ if (!staging)
++ return ERR_PTR(-ENOMEM);
++
++ staging->msg_seqnum = atomic64_inc_return(&bus->last_message_id);
++ staging->n_parts = 0; /* we reserve n_parts, but don't enforce them */
++ staging->parts = (void *)(staging + 1);
++
++ if (msg_extra_size) /* if requested, allocate message, too */
++ staging->msg = (void *)((u8 *)staging->parts +
++ n_parts * sizeof(*staging->parts));
++
++ staging->meta_proc = kdbus_meta_proc_new();
++ if (IS_ERR(staging->meta_proc)) {
++ ret = PTR_ERR(staging->meta_proc);
++ staging->meta_proc = NULL;
++ goto error;
++ }
++
++ staging->meta_conn = kdbus_meta_conn_new();
++ if (IS_ERR(staging->meta_conn)) {
++ ret = PTR_ERR(staging->meta_conn);
++ staging->meta_conn = NULL;
++ goto error;
++ }
++
++ /*
++ * Prepare iovecs to copy the message into the target pool. We use the
++ * following iovecs:
++ * * iovec to copy "kdbus_msg.size"
++ * * iovec to copy "struct kdbus_msg" (minus size) plus items
++ * * iovec for possible padding after the items
++ * * iovec for metadata items
++ * * iovec for possible padding after the metadata items
++ *
++ * Make sure to update @reserved_parts if you add more parts here.
++ */
++
++ kdbus_staging_reserve(staging); /* msg.size */
++ kdbus_staging_reserve(staging); /* msg (minus msg.size) plus items */
++ kdbus_staging_reserve(staging); /* msg padding */
++ kdbus_staging_reserve(staging); /* meta */
++ kdbus_staging_reserve(staging); /* meta padding */
++
++ return staging;
++
++error:
++ kdbus_staging_free(staging);
++ return ERR_PTR(ret);
++}
++
++struct kdbus_staging *kdbus_staging_new_kernel(struct kdbus_bus *bus,
++ u64 dst, u64 cookie_timeout,
++ size_t it_size, size_t it_type)
++{
++ struct kdbus_staging *staging;
++ size_t size;
++
++ size = offsetof(struct kdbus_msg, items) +
++ KDBUS_ITEM_HEADER_SIZE + it_size;
++
++ staging = kdbus_staging_new(bus, 0, KDBUS_ALIGN8(size));
++ if (IS_ERR(staging))
++ return ERR_CAST(staging);
++
++ staging->msg->size = size;
++ staging->msg->flags = (dst == KDBUS_DST_ID_BROADCAST) ?
++ KDBUS_MSG_SIGNAL : 0;
++ staging->msg->dst_id = dst;
++ staging->msg->src_id = KDBUS_SRC_ID_KERNEL;
++ staging->msg->payload_type = KDBUS_PAYLOAD_KERNEL;
++ staging->msg->cookie_reply = cookie_timeout;
++ staging->notify = staging->msg->items;
++ staging->notify->size = KDBUS_ITEM_HEADER_SIZE + it_size;
++ staging->notify->type = it_type;
++
++ return staging;
++}
++
++struct kdbus_staging *kdbus_staging_new_user(struct kdbus_bus *bus,
++ struct kdbus_cmd_send *cmd,
++ struct kdbus_msg *msg)
++{
++ const size_t reserved_parts = 1; /* see below for explanation */
++ size_t n_memfds, n_fds, n_parts;
++ struct kdbus_staging *staging;
++ int ret;
++
++ /*
++ * Examine user-supplied message and figure out how many resources we
++ * need to allocate in our staging area. This requires us to iterate
++ * the message twice, but saves us from re-allocating our resources
++ * all the time.
++ */
++
++ ret = kdbus_msg_examine(msg, bus, cmd, &n_memfds, &n_fds, &n_parts);
++ if (ret < 0)
++ return ERR_PTR(ret);
++
++ n_parts += reserved_parts;
++
++ /*
++ * Allocate staging area with the number of required resources. Make
++ * sure that we have enough iovecs for all required parts pre-allocated
++ * so this will hopefully be the only memory allocation for this
++ * message transaction.
++ */
++
++ staging = kdbus_staging_new(bus, n_parts, 0);
++ if (IS_ERR(staging))
++ return ERR_CAST(staging);
++
++ staging->msg = msg;
++
++ /*
++ * If the message contains memfds or fd items, we need to remember some
++ * state so we can fill in the requested information at RECV time.
++ * File-descriptors cannot be passed at SEND time. Hence, allocate a
++ * gaps-object to remember that state. That gaps object is linked to
++ * from the staging area, but will also be linked to from the message
++ * queue of each peer. Hence, each receiver owns a reference to it, and
++ * it will later be used to fill the 'gaps' in the message that couldn't be
++ * filled at SEND time.
++ * Note that the 'gaps' object is read-only once the staging-allocator
++ * returns. There might be connections receiving a queued message while
++ * the sender still broadcasts the message to other receivers.
++ */
++
++ if (n_memfds > 0 || n_fds > 0) {
++ staging->gaps = kdbus_gaps_new(n_memfds, n_fds);
++ if (IS_ERR(staging->gaps)) {
++ ret = PTR_ERR(staging->gaps);
++ staging->gaps = NULL;
++ kdbus_staging_free(staging);
++ return ERR_PTR(ret);
++ }
++ }
++
++ /*
++ * kdbus_staging_new() already reserves parts for message setup. For
++ * user-supplied messages, we add the following iovecs:
++ * ... variable number of iovecs for payload ...
++ * * final iovec for possible padding of payload
++ *
++ * Make sure to update @reserved_parts if you add more parts here.
++ */
++
++ ret = kdbus_staging_import(staging); /* payload */
++ kdbus_staging_reserve(staging); /* payload padding */
++
++ if (ret < 0)
++ goto error;
++
++ return staging;
++
++error:
++ kdbus_staging_free(staging);
++ return ERR_PTR(ret);
++}
++
++struct kdbus_staging *kdbus_staging_free(struct kdbus_staging *staging)
++{
++ if (!staging)
++ return NULL;
++
++ kdbus_meta_conn_unref(staging->meta_conn);
++ kdbus_meta_proc_unref(staging->meta_proc);
++ kdbus_gaps_unref(staging->gaps);
++ kfree(staging);
++
++ return NULL;
++}
++
++static int kdbus_staging_collect_metadata(struct kdbus_staging *staging,
++ struct kdbus_conn *src,
++ struct kdbus_conn *dst,
++ u64 *out_attach)
++{
++ u64 attach;
++ int ret;
++
++ if (src)
++ attach = kdbus_meta_msg_mask(src, dst);
++ else
++ attach = KDBUS_ATTACH_TIMESTAMP; /* metadata for kernel msgs */
++
++ if (src && !src->meta_fake) {
++ ret = kdbus_meta_proc_collect(staging->meta_proc, attach);
++ if (ret < 0)
++ return ret;
++ }
++
++ ret = kdbus_meta_conn_collect(staging->meta_conn, src,
++ staging->msg_seqnum, attach);
++ if (ret < 0)
++ return ret;
++
++ *out_attach = attach;
++ return 0;
++}
++
++/**
++ * kdbus_staging_emit() - emit linearized message in target pool
++ * @staging: staging object to create message from
++ * @src: sender of the message (or NULL)
++ * @dst: target connection to allocate message for
++ *
++ * This allocates a pool-slice for @dst and copies the message provided by
++ * @staging into it. The new slice is then returned to the caller for further
++ * processing. It's not linked into any queue, yet.
++ *
++ * Return: Newly allocated slice or ERR_PTR on failure.
++ */
++struct kdbus_pool_slice *kdbus_staging_emit(struct kdbus_staging *staging,
++ struct kdbus_conn *src,
++ struct kdbus_conn *dst)
++{
++ struct kdbus_item *item, *meta_items = NULL;
++ struct kdbus_pool_slice *slice = NULL;
++ size_t off, size, meta_size;
++ struct iovec *v;
++ u64 attach, msg_size;
++ int ret;
++
++ /*
++ * Step 1:
++ * Collect metadata from @src depending on the attach-flags allowed for
++ * @dst. Translate it into the namespaces pinned by @dst.
++ */
++
++ ret = kdbus_staging_collect_metadata(staging, src, dst, &attach);
++ if (ret < 0)
++ goto error;
++
++ ret = kdbus_meta_emit(staging->meta_proc, NULL, staging->meta_conn,
++ dst, attach, &meta_items, &meta_size);
++ if (ret < 0)
++ goto error;
++
++ /*
++ * Step 2:
++ * Set up iovecs for the message. See kdbus_staging_new() for allocation
++ * of those iovecs. All reserved iovecs have been initialized with
++ * iov_len=0 + iov_base=zeros. Furthermore, the iovecs to copy the
++ * actual message payload have already been initialized and need not be
++ * touched.
++ */
++
++ v = staging->parts;
++ msg_size = staging->msg->size;
++
++ /* msg.size */
++ v->iov_len = sizeof(msg_size);
++ v->iov_base = (void __user *)&msg_size;
++ ++v;
++
++ /* msg (after msg.size) plus items */
++ v->iov_len = staging->msg->size - sizeof(staging->msg->size);
++ v->iov_base = (void __user *)((u8 *)staging->msg +
++ sizeof(staging->msg->size));
++ ++v;
++
++ /* padding after msg */
++ v->iov_len = KDBUS_ALIGN8(staging->msg->size) - staging->msg->size;
++ v->iov_base = (void __user *)zeros;
++ ++v;
++
++ if (meta_size > 0) {
++ /* metadata items */
++ v->iov_len = meta_size;
++ v->iov_base = (void __user *)meta_items;
++ ++v;
++
++ /* padding after metadata */
++ v->iov_len = KDBUS_ALIGN8(meta_size) - meta_size;
++ v->iov_base = (void __user *)zeros;
++ ++v;
++
++ msg_size = KDBUS_ALIGN8(msg_size) + meta_size;
++ } else {
++ /* metadata items */
++ v->iov_len = 0;
++ v->iov_base = (void __user *)zeros;
++ ++v;
++
++ /* padding after metadata */
++ v->iov_len = 0;
++ v->iov_base = (void __user *)zeros;
++ ++v;
++ }
++
++ /* ... payload iovecs are already filled in ... */
++
++ /* compute overall size and fill in padding after payload */
++ size = KDBUS_ALIGN8(msg_size);
++
++ if (staging->n_payload > 0) {
++ size += staging->n_payload;
++
++ v = &staging->parts[staging->n_parts - 1];
++ v->iov_len = KDBUS_ALIGN8(size) - size;
++ v->iov_base = (void __user *)zeros;
++
++ size = KDBUS_ALIGN8(size);
++ }
++
++ /*
++ * Step 3:
++ * The PAYLOAD_OFF items in the message contain a relative 'offset'
++ * field that tells the receiver where to find the actual payload. This
++ * offset is relative to the start of the message, and as such depends
++ * on the size of the metadata items we inserted. This size is variable
++ * and changes for each peer we send the message to. Hence, we remember
++ * the last relative offset that was used to calculate the 'offset'
++ * fields. For each message, we re-calculate it and patch all items, in
++ * case it changed.
++ */
++
++ off = KDBUS_ALIGN8(msg_size);
++
++ if (off != staging->i_payload) {
++ KDBUS_ITEMS_FOREACH(item, staging->msg->items,
++ KDBUS_ITEMS_SIZE(staging->msg, items)) {
++ if (item->type != KDBUS_ITEM_PAYLOAD_OFF)
++ continue;
++
++ item->vec.offset -= staging->i_payload;
++ item->vec.offset += off;
++ }
++
++ staging->i_payload = off;
++ }
++
++ /*
++ * Step 4:
++ * Allocate pool slice and copy over all data. Make sure to properly
++ * account the allocation against the user quota.
++ */
++
++ ret = kdbus_conn_quota_inc(dst, src ? src->user : NULL, size,
++ staging->gaps ? staging->gaps->n_fds : 0);
++ if (ret < 0)
++ goto error;
++
++ slice = kdbus_pool_slice_alloc(dst->pool, size, true);
++ if (IS_ERR(slice)) {
++ ret = PTR_ERR(slice);
++ slice = NULL;
++ goto error;
++ }
++
++ WARN_ON(kdbus_pool_slice_size(slice) != size);
++
++ ret = kdbus_pool_slice_copy_iovec(slice, 0, staging->parts,
++ staging->n_parts, size);
++ if (ret < 0)
++ goto error;
++
++ /* all done, return slice to caller */
++ goto exit;
++
++error:
++ if (slice)
++ kdbus_conn_quota_dec(dst, src ? src->user : NULL, size,
++ staging->gaps ? staging->gaps->n_fds : 0);
++ kdbus_pool_slice_release(slice);
++ slice = ERR_PTR(ret);
++exit:
++ kfree(meta_items);
++ return slice;
++}
+diff --git a/ipc/kdbus/message.h b/ipc/kdbus/message.h
+new file mode 100644
+index 0000000..298f9c9
+--- /dev/null
++++ b/ipc/kdbus/message.h
+@@ -0,0 +1,120 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_MESSAGE_H
++#define __KDBUS_MESSAGE_H
++
++#include <linux/fs.h>
++#include <linux/kref.h>
++#include <uapi/linux/kdbus.h>
++
++struct kdbus_bus;
++struct kdbus_conn;
++struct kdbus_meta_conn;
++struct kdbus_meta_proc;
++struct kdbus_pool_slice;
++
++/**
++ * struct kdbus_gaps - gaps in message to be filled later
++ * @kref: Reference counter
++ * @n_memfds: Number of memfds
++ * @memfd_offsets: Offsets of kdbus_memfd items in target slice
++ * @n_fds: Number of fds
++ * @fd_files: Array of sent fds
++ * @fd_offset: Offset of fd-array in target slice
++ *
++ * The 'gaps' object is used to track data that is needed to fill gaps in a
++ * message at RECV time. Usually, we try to compile the whole message at SEND
++ * time. This has the advantage that we don't have to cache any information and
++ * can keep the memory consumption small. Furthermore, all copy operations can
++ * be combined into a single function call, which speeds up transactions
++ * considerably.
++ * However, things like file-descriptors can only be fully installed at RECV
++ * time. The gaps object tracks this data and pins it until a message is
++ * received. The gaps object is shared between all receivers of the same
++ * message.
++ */
++struct kdbus_gaps {
++ struct kref kref;
++
++ /* state tracking for KDBUS_ITEM_PAYLOAD_MEMFD entries */
++ size_t n_memfds;
++ u64 *memfd_offsets;
++ struct file **memfd_files;
++
++ /* state tracking for KDBUS_ITEM_FDS */
++ size_t n_fds;
++ struct file **fd_files;
++ u64 fd_offset;
++};
++
++struct kdbus_gaps *kdbus_gaps_ref(struct kdbus_gaps *gaps);
++struct kdbus_gaps *kdbus_gaps_unref(struct kdbus_gaps *gaps);
++int kdbus_gaps_install(struct kdbus_gaps *gaps, struct kdbus_pool_slice *slice,
++ bool *out_incomplete);
++
++/**
++ * struct kdbus_staging - staging area to import messages
++ * @msg: User-supplied message
++ * @gaps: Gaps-object created during import (or NULL if empty)
++ * @msg_seqnum: Message sequence number
++ * @notify_entry: Entry into list of kernel-generated notifications
++ * @i_payload: Current relative index of start of payload
++ * @n_payload: Total number of bytes needed for payload
++ * @n_parts: Number of parts
++ * @parts: Array of iovecs that make up the whole message
++ * @meta_proc: Process metadata of the sender (or NULL if empty)
++ * @meta_conn: Connection metadata of the sender (or NULL if empty)
++ * @bloom_filter: Pointer to the bloom-item in @msg, or NULL
++ * @dst_name: Pointer to the dst-name-item in @msg, or NULL
++ * @notify: Pointer to the notification item in @msg, or NULL
++ *
++ * The kdbus_staging object is a temporary staging area to import user-supplied
++ * messages into the kernel. It is only used during SEND and dropped once the
++ * message is queued. Any data that cannot be collected during SEND is
++ * collected in a kdbus_gaps object and attached to the message queue.
++ */
++struct kdbus_staging {
++ struct kdbus_msg *msg;
++ struct kdbus_gaps *gaps;
++ u64 msg_seqnum;
++ struct list_head notify_entry;
++
++ /* crafted iovecs to copy the message */
++ size_t i_payload;
++ size_t n_payload;
++ size_t n_parts;
++ struct iovec *parts;
++
++ /* metadata state */
++ struct kdbus_meta_proc *meta_proc;
++ struct kdbus_meta_conn *meta_conn;
++
++ /* cached pointers into @msg */
++ const struct kdbus_bloom_filter *bloom_filter;
++ const char *dst_name;
++ struct kdbus_item *notify;
++};
++
++struct kdbus_staging *kdbus_staging_new_kernel(struct kdbus_bus *bus,
++ u64 dst, u64 cookie_timeout,
++ size_t it_size, size_t it_type);
++struct kdbus_staging *kdbus_staging_new_user(struct kdbus_bus *bus,
++ struct kdbus_cmd_send *cmd,
++ struct kdbus_msg *msg);
++struct kdbus_staging *kdbus_staging_free(struct kdbus_staging *staging);
++struct kdbus_pool_slice *kdbus_staging_emit(struct kdbus_staging *staging,
++ struct kdbus_conn *src,
++ struct kdbus_conn *dst);
++
++#endif
+diff --git a/ipc/kdbus/metadata.c b/ipc/kdbus/metadata.c
+new file mode 100644
+index 0000000..71ca475
+--- /dev/null
++++ b/ipc/kdbus/metadata.c
+@@ -0,0 +1,1347 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/audit.h>
++#include <linux/capability.h>
++#include <linux/cgroup.h>
++#include <linux/cred.h>
++#include <linux/file.h>
++#include <linux/fs_struct.h>
++#include <linux/init.h>
++#include <linux/kref.h>
++#include <linux/mutex.h>
++#include <linux/sched.h>
++#include <linux/security.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <linux/uidgid.h>
++#include <linux/uio.h>
++#include <linux/user_namespace.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "endpoint.h"
++#include "item.h"
++#include "message.h"
++#include "metadata.h"
++#include "names.h"
++
++/**
++ * struct kdbus_meta_proc - Process metadata
++ * @kref: Reference counting
++ * @lock: Object lock
++ * @collected: Bitmask of collected items
++ * @valid: Bitmask of collected and valid items
++ * @cred: Credentials
++ * @pid: PID of process
++ * @tgid: TGID of process
++ * @ppid: PPID of process
++ * @tid_comm: TID comm line
++ * @pid_comm: PID comm line
++ * @exe_path: Executable path
++ * @root_path: Root-FS path
++ * @cmdline: Command-line
++ * @cgroup: Full cgroup path
++ * @seclabel: Seclabel
++ * @audit_loginuid: Audit login-UID
++ * @audit_sessionid: Audit session-ID
++ */
++struct kdbus_meta_proc {
++ struct kref kref;
++ struct mutex lock;
++ u64 collected;
++ u64 valid;
++
++ /* KDBUS_ITEM_CREDS */
++ /* KDBUS_ITEM_AUXGROUPS */
++ /* KDBUS_ITEM_CAPS */
++ const struct cred *cred;
++
++ /* KDBUS_ITEM_PIDS */
++ struct pid *pid;
++ struct pid *tgid;
++ struct pid *ppid;
++
++ /* KDBUS_ITEM_TID_COMM */
++ char tid_comm[TASK_COMM_LEN];
++ /* KDBUS_ITEM_PID_COMM */
++ char pid_comm[TASK_COMM_LEN];
++
++ /* KDBUS_ITEM_EXE */
++ struct path exe_path;
++ struct path root_path;
++
++ /* KDBUS_ITEM_CMDLINE */
++ char *cmdline;
++
++ /* KDBUS_ITEM_CGROUP */
++ char *cgroup;
++
++ /* KDBUS_ITEM_SECLABEL */
++ char *seclabel;
++
++ /* KDBUS_ITEM_AUDIT */
++ kuid_t audit_loginuid;
++ unsigned int audit_sessionid;
++};
++
++/**
++ * struct kdbus_meta_conn - Connection metadata
++ * @kref: Reference counting
++ * @lock: Object lock
++ * @collected: Bitmask of collected items
++ * @valid: Bitmask of collected and valid items
++ * @ts: Timestamp values
++ * @owned_names_items: Serialized items for owned names
++ * @owned_names_size: Size of @owned_names_items
++ * @conn_description: Connection description
++ */
++struct kdbus_meta_conn {
++ struct kref kref;
++ struct mutex lock;
++ u64 collected;
++ u64 valid;
++
++ /* KDBUS_ITEM_TIMESTAMP */
++ struct kdbus_timestamp ts;
++
++ /* KDBUS_ITEM_OWNED_NAME */
++ struct kdbus_item *owned_names_items;
++ size_t owned_names_size;
++
++ /* KDBUS_ITEM_CONN_DESCRIPTION */
++ char *conn_description;
++};
++
++/* fixed size equivalent of "kdbus_caps" */
++struct kdbus_meta_caps {
++ u32 last_cap;
++ struct {
++ u32 caps[_KERNEL_CAPABILITY_U32S];
++ } set[4];
++};
++
++/**
++ * kdbus_meta_proc_new() - Create process metadata object
++ *
++ * Return: Pointer to new object on success, ERR_PTR on failure.
++ */
++struct kdbus_meta_proc *kdbus_meta_proc_new(void)
++{
++ struct kdbus_meta_proc *mp;
++
++ mp = kzalloc(sizeof(*mp), GFP_KERNEL);
++ if (!mp)
++ return ERR_PTR(-ENOMEM);
++
++ kref_init(&mp->kref);
++ mutex_init(&mp->lock);
++
++ return mp;
++}
++
++static void kdbus_meta_proc_free(struct kref *kref)
++{
++ struct kdbus_meta_proc *mp = container_of(kref, struct kdbus_meta_proc,
++ kref);
++
++ path_put(&mp->exe_path);
++ path_put(&mp->root_path);
++ if (mp->cred)
++ put_cred(mp->cred);
++ put_pid(mp->ppid);
++ put_pid(mp->tgid);
++ put_pid(mp->pid);
++
++ kfree(mp->seclabel);
++ kfree(mp->cmdline);
++ kfree(mp->cgroup);
++ kfree(mp);
++}
++
++/**
++ * kdbus_meta_proc_ref() - Gain reference
++ * @mp: Process metadata object
++ *
++ * Return: @mp is returned
++ */
++struct kdbus_meta_proc *kdbus_meta_proc_ref(struct kdbus_meta_proc *mp)
++{
++ if (mp)
++ kref_get(&mp->kref);
++ return mp;
++}
++
++/**
++ * kdbus_meta_proc_unref() - Drop reference
++ * @mp: Process metadata object
++ *
++ * Return: NULL
++ */
++struct kdbus_meta_proc *kdbus_meta_proc_unref(struct kdbus_meta_proc *mp)
++{
++ if (mp)
++ kref_put(&mp->kref, kdbus_meta_proc_free);
++ return NULL;
++}
++
++static void kdbus_meta_proc_collect_pids(struct kdbus_meta_proc *mp)
++{
++ struct task_struct *parent;
++
++ mp->pid = get_pid(task_pid(current));
++ mp->tgid = get_pid(task_tgid(current));
++
++ rcu_read_lock();
++ parent = rcu_dereference(current->real_parent);
++ mp->ppid = get_pid(task_tgid(parent));
++ rcu_read_unlock();
++
++ mp->valid |= KDBUS_ATTACH_PIDS;
++}
++
++static void kdbus_meta_proc_collect_tid_comm(struct kdbus_meta_proc *mp)
++{
++ get_task_comm(mp->tid_comm, current);
++ mp->valid |= KDBUS_ATTACH_TID_COMM;
++}
++
++static void kdbus_meta_proc_collect_pid_comm(struct kdbus_meta_proc *mp)
++{
++ get_task_comm(mp->pid_comm, current->group_leader);
++ mp->valid |= KDBUS_ATTACH_PID_COMM;
++}
++
++static void kdbus_meta_proc_collect_exe(struct kdbus_meta_proc *mp)
++{
++ struct file *exe_file;
++
++ rcu_read_lock();
++ exe_file = rcu_dereference(current->mm->exe_file);
++ if (exe_file) {
++ mp->exe_path = exe_file->f_path;
++ path_get(&mp->exe_path);
++ get_fs_root(current->fs, &mp->root_path);
++ mp->valid |= KDBUS_ATTACH_EXE;
++ }
++ rcu_read_unlock();
++}
++
++static int kdbus_meta_proc_collect_cmdline(struct kdbus_meta_proc *mp)
++{
++ struct mm_struct *mm = current->mm;
++ char *cmdline;
++
++ if (!mm->arg_end)
++ return 0;
++
++ cmdline = strndup_user((const char __user *)mm->arg_start,
++ mm->arg_end - mm->arg_start);
++ if (IS_ERR(cmdline))
++ return PTR_ERR(cmdline);
++
++ mp->cmdline = cmdline;
++ mp->valid |= KDBUS_ATTACH_CMDLINE;
++
++ return 0;
++}
++
++static int kdbus_meta_proc_collect_cgroup(struct kdbus_meta_proc *mp)
++{
++#ifdef CONFIG_CGROUPS
++ void *page;
++ char *s;
++
++ page = (void *)__get_free_page(GFP_TEMPORARY);
++ if (!page)
++ return -ENOMEM;
++
++ s = task_cgroup_path(current, page, PAGE_SIZE);
++ if (s) {
++ mp->cgroup = kstrdup(s, GFP_KERNEL);
++ if (!mp->cgroup) {
++ free_page((unsigned long)page);
++ return -ENOMEM;
++ }
++ }
++
++ free_page((unsigned long)page);
++ mp->valid |= KDBUS_ATTACH_CGROUP;
++#endif
++
++ return 0;
++}
++
++static int kdbus_meta_proc_collect_seclabel(struct kdbus_meta_proc *mp)
++{
++#ifdef CONFIG_SECURITY
++ char *ctx = NULL;
++ u32 sid, len;
++ int ret;
++
++ security_task_getsecid(current, &sid);
++ ret = security_secid_to_secctx(sid, &ctx, &len);
++ if (ret < 0) {
++ /*
++ * EOPNOTSUPP means no security module is active,
++ * let's skip adding the seclabel then. This effectively
++ * drops the SECLABEL item.
++ */
++ return (ret == -EOPNOTSUPP) ? 0 : ret;
++ }
++
++ mp->seclabel = kstrdup(ctx, GFP_KERNEL);
++ security_release_secctx(ctx, len);
++ if (!mp->seclabel)
++ return -ENOMEM;
++
++ mp->valid |= KDBUS_ATTACH_SECLABEL;
++#endif
++
++ return 0;
++}
++
++static void kdbus_meta_proc_collect_audit(struct kdbus_meta_proc *mp)
++{
++#ifdef CONFIG_AUDITSYSCALL
++ mp->audit_loginuid = audit_get_loginuid(current);
++ mp->audit_sessionid = audit_get_sessionid(current);
++ mp->valid |= KDBUS_ATTACH_AUDIT;
++#endif
++}
++
++/**
++ * kdbus_meta_proc_collect() - Collect process metadata
++ * @mp: Process metadata object
++ * @what: Attach flags to collect
++ *
++ * This collects process metadata from current and saves it in @mp.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_meta_proc_collect(struct kdbus_meta_proc *mp, u64 what)
++{
++ int ret;
++
++ if (!mp || !(what & (KDBUS_ATTACH_CREDS |
++ KDBUS_ATTACH_PIDS |
++ KDBUS_ATTACH_AUXGROUPS |
++ KDBUS_ATTACH_TID_COMM |
++ KDBUS_ATTACH_PID_COMM |
++ KDBUS_ATTACH_EXE |
++ KDBUS_ATTACH_CMDLINE |
++ KDBUS_ATTACH_CGROUP |
++ KDBUS_ATTACH_CAPS |
++ KDBUS_ATTACH_SECLABEL |
++ KDBUS_ATTACH_AUDIT)))
++ return 0;
++
++ mutex_lock(&mp->lock);
++
++ /* creds, auxgrps and caps share "struct cred" as context */
++ {
++ const u64 m_cred = KDBUS_ATTACH_CREDS |
++ KDBUS_ATTACH_AUXGROUPS |
++ KDBUS_ATTACH_CAPS;
++
++ if ((what & m_cred) && !(mp->collected & m_cred)) {
++ mp->cred = get_current_cred();
++ mp->valid |= m_cred;
++ mp->collected |= m_cred;
++ }
++ }
++
++ if ((what & KDBUS_ATTACH_PIDS) &&
++ !(mp->collected & KDBUS_ATTACH_PIDS)) {
++ kdbus_meta_proc_collect_pids(mp);
++ mp->collected |= KDBUS_ATTACH_PIDS;
++ }
++
++ if ((what & KDBUS_ATTACH_TID_COMM) &&
++ !(mp->collected & KDBUS_ATTACH_TID_COMM)) {
++ kdbus_meta_proc_collect_tid_comm(mp);
++ mp->collected |= KDBUS_ATTACH_TID_COMM;
++ }
++
++ if ((what & KDBUS_ATTACH_PID_COMM) &&
++ !(mp->collected & KDBUS_ATTACH_PID_COMM)) {
++ kdbus_meta_proc_collect_pid_comm(mp);
++ mp->collected |= KDBUS_ATTACH_PID_COMM;
++ }
++
++ if ((what & KDBUS_ATTACH_EXE) &&
++ !(mp->collected & KDBUS_ATTACH_EXE)) {
++ kdbus_meta_proc_collect_exe(mp);
++ mp->collected |= KDBUS_ATTACH_EXE;
++ }
++
++ if ((what & KDBUS_ATTACH_CMDLINE) &&
++ !(mp->collected & KDBUS_ATTACH_CMDLINE)) {
++ ret = kdbus_meta_proc_collect_cmdline(mp);
++ if (ret < 0)
++ goto exit_unlock;
++ mp->collected |= KDBUS_ATTACH_CMDLINE;
++ }
++
++ if ((what & KDBUS_ATTACH_CGROUP) &&
++ !(mp->collected & KDBUS_ATTACH_CGROUP)) {
++ ret = kdbus_meta_proc_collect_cgroup(mp);
++ if (ret < 0)
++ goto exit_unlock;
++ mp->collected |= KDBUS_ATTACH_CGROUP;
++ }
++
++ if ((what & KDBUS_ATTACH_SECLABEL) &&
++ !(mp->collected & KDBUS_ATTACH_SECLABEL)) {
++ ret = kdbus_meta_proc_collect_seclabel(mp);
++ if (ret < 0)
++ goto exit_unlock;
++ mp->collected |= KDBUS_ATTACH_SECLABEL;
++ }
++
++ if ((what & KDBUS_ATTACH_AUDIT) &&
++ !(mp->collected & KDBUS_ATTACH_AUDIT)) {
++ kdbus_meta_proc_collect_audit(mp);
++ mp->collected |= KDBUS_ATTACH_AUDIT;
++ }
++
++ ret = 0;
++
++exit_unlock:
++ mutex_unlock(&mp->lock);
++ return ret;
++}
++
++/**
++ * kdbus_meta_fake_new() - Create fake metadata object
++ *
++ * Return: Pointer to new object on success, ERR_PTR on failure.
++ */
++struct kdbus_meta_fake *kdbus_meta_fake_new(void)
++{
++ struct kdbus_meta_fake *mf;
++
++ mf = kzalloc(sizeof(*mf), GFP_KERNEL);
++ if (!mf)
++ return ERR_PTR(-ENOMEM);
++
++ return mf;
++}
++
++/**
++ * kdbus_meta_fake_free() - Free fake metadata object
++ * @mf: Fake metadata object
++ *
++ * Return: NULL
++ */
++struct kdbus_meta_fake *kdbus_meta_fake_free(struct kdbus_meta_fake *mf)
++{
++ if (mf) {
++ put_pid(mf->ppid);
++ put_pid(mf->tgid);
++ put_pid(mf->pid);
++ kfree(mf->seclabel);
++ kfree(mf);
++ }
++
++ return NULL;
++}
++
++/**
++ * kdbus_meta_fake_collect() - Fill fake metadata from faked credentials
++ * @mf: Fake metadata object
++ * @creds: Creds to set, may be %NULL
++ * @pids: PIDs to set, may be %NULL
++ * @seclabel: Seclabel to set, may be %NULL
++ *
++ * This function takes information stored in @creds, @pids and @seclabel and
++ * resolves them to kernel-representations, if possible. This call uses the
++ * current task's namespaces to resolve the given information.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_meta_fake_collect(struct kdbus_meta_fake *mf,
++ const struct kdbus_creds *creds,
++ const struct kdbus_pids *pids,
++ const char *seclabel)
++{
++ if (mf->valid)
++ return -EALREADY;
++
++ if (creds) {
++ struct user_namespace *ns = current_user_ns();
++
++ mf->uid = make_kuid(ns, creds->uid);
++ mf->euid = make_kuid(ns, creds->euid);
++ mf->suid = make_kuid(ns, creds->suid);
++ mf->fsuid = make_kuid(ns, creds->fsuid);
++
++ mf->gid = make_kgid(ns, creds->gid);
++ mf->egid = make_kgid(ns, creds->egid);
++ mf->sgid = make_kgid(ns, creds->sgid);
++ mf->fsgid = make_kgid(ns, creds->fsgid);
++
++ if ((creds->uid != (uid_t)-1 && !uid_valid(mf->uid)) ||
++ (creds->euid != (uid_t)-1 && !uid_valid(mf->euid)) ||
++ (creds->suid != (uid_t)-1 && !uid_valid(mf->suid)) ||
++ (creds->fsuid != (uid_t)-1 && !uid_valid(mf->fsuid)) ||
++ (creds->gid != (gid_t)-1 && !gid_valid(mf->gid)) ||
++ (creds->egid != (gid_t)-1 && !gid_valid(mf->egid)) ||
++ (creds->sgid != (gid_t)-1 && !gid_valid(mf->sgid)) ||
++ (creds->fsgid != (gid_t)-1 && !gid_valid(mf->fsgid)))
++ return -EINVAL;
++
++ mf->valid |= KDBUS_ATTACH_CREDS;
++ }
++
++ if (pids) {
++ mf->pid = get_pid(find_vpid(pids->tid));
++ mf->tgid = get_pid(find_vpid(pids->pid));
++ mf->ppid = get_pid(find_vpid(pids->ppid));
++
++ if ((pids->tid != 0 && !mf->pid) ||
++ (pids->pid != 0 && !mf->tgid) ||
++ (pids->ppid != 0 && !mf->ppid)) {
++ put_pid(mf->pid);
++ put_pid(mf->tgid);
++ put_pid(mf->ppid);
++ mf->pid = NULL;
++ mf->tgid = NULL;
++ mf->ppid = NULL;
++ return -EINVAL;
++ }
++
++ mf->valid |= KDBUS_ATTACH_PIDS;
++ }
++
++ if (seclabel) {
++ mf->seclabel = kstrdup(seclabel, GFP_KERNEL);
++ if (!mf->seclabel)
++ return -ENOMEM;
++
++ mf->valid |= KDBUS_ATTACH_SECLABEL;
++ }
++
++ return 0;
++}
++
++/**
++ * kdbus_meta_conn_new() - Create connection metadata object
++ *
++ * Return: Pointer to new object on success, ERR_PTR on failure.
++ */
++struct kdbus_meta_conn *kdbus_meta_conn_new(void)
++{
++ struct kdbus_meta_conn *mc;
++
++ mc = kzalloc(sizeof(*mc), GFP_KERNEL);
++ if (!mc)
++ return ERR_PTR(-ENOMEM);
++
++ kref_init(&mc->kref);
++ mutex_init(&mc->lock);
++
++ return mc;
++}
++
++static void kdbus_meta_conn_free(struct kref *kref)
++{
++ struct kdbus_meta_conn *mc =
++ container_of(kref, struct kdbus_meta_conn, kref);
++
++ kfree(mc->conn_description);
++ kfree(mc->owned_names_items);
++ kfree(mc);
++}
++
++/**
++ * kdbus_meta_conn_ref() - Gain reference
++ * @mc: Connection metadata object
++ */
++struct kdbus_meta_conn *kdbus_meta_conn_ref(struct kdbus_meta_conn *mc)
++{
++ if (mc)
++ kref_get(&mc->kref);
++ return mc;
++}
++
++/**
++ * kdbus_meta_conn_unref() - Drop reference
++ * @mc: Connection metadata object
++ */
++struct kdbus_meta_conn *kdbus_meta_conn_unref(struct kdbus_meta_conn *mc)
++{
++ if (mc)
++ kref_put(&mc->kref, kdbus_meta_conn_free);
++ return NULL;
++}
++
++static void kdbus_meta_conn_collect_timestamp(struct kdbus_meta_conn *mc,
++ u64 msg_seqnum)
++{
++ mc->ts.monotonic_ns = ktime_get_ns();
++ mc->ts.realtime_ns = ktime_get_real_ns();
++
++ if (msg_seqnum)
++ mc->ts.seqnum = msg_seqnum;
++
++ mc->valid |= KDBUS_ATTACH_TIMESTAMP;
++}
++
++static int kdbus_meta_conn_collect_names(struct kdbus_meta_conn *mc,
++ struct kdbus_conn *conn)
++{
++ const struct kdbus_name_owner *owner;
++ struct kdbus_item *item;
++ size_t slen, size;
++
++ lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
++
++ size = 0;
++ /* open-code length calculation to avoid final padding */
++ list_for_each_entry(owner, &conn->names_list, conn_entry)
++ if (!(owner->flags & KDBUS_NAME_IN_QUEUE))
++ size = KDBUS_ALIGN8(size) + KDBUS_ITEM_HEADER_SIZE +
++ sizeof(struct kdbus_name) +
++ strlen(owner->name->name) + 1;
++
++ if (!size)
++ return 0;
++
++ /* make sure we include zeroed padding for convenience helpers */
++ item = kmalloc(KDBUS_ALIGN8(size), GFP_KERNEL);
++ if (!item)
++ return -ENOMEM;
++
++ mc->owned_names_items = item;
++ mc->owned_names_size = size;
++
++ list_for_each_entry(owner, &conn->names_list, conn_entry) {
++ if (owner->flags & KDBUS_NAME_IN_QUEUE)
++ continue;
++
++ slen = strlen(owner->name->name) + 1;
++ kdbus_item_set(item, KDBUS_ITEM_OWNED_NAME, NULL,
++ sizeof(struct kdbus_name) + slen);
++ item->name.flags = owner->flags;
++ memcpy(item->name.name, owner->name->name, slen);
++ item = KDBUS_ITEM_NEXT(item);
++ }
++
++ /* sanity check: the buffer should be completely written now */
++ WARN_ON((u8 *)item !=
++ (u8 *)mc->owned_names_items + KDBUS_ALIGN8(size));
++
++ mc->valid |= KDBUS_ATTACH_NAMES;
++ return 0;
++}
++
++static int kdbus_meta_conn_collect_description(struct kdbus_meta_conn *mc,
++ struct kdbus_conn *conn)
++{
++ if (!conn->description)
++ return 0;
++
++ mc->conn_description = kstrdup(conn->description, GFP_KERNEL);
++ if (!mc->conn_description)
++ return -ENOMEM;
++
++ mc->valid |= KDBUS_ATTACH_CONN_DESCRIPTION;
++ return 0;
++}
++
++/**
++ * kdbus_meta_conn_collect() - Collect connection metadata
++ * @mc: Connection metadata object
++ * @conn: Connection to collect data from
++ * @msg_seqnum: Sequence number of the message to send
++ * @what: Attach flags to collect
++ *
++ * This collects connection metadata from @msg_seqnum and @conn and saves it
++ * in @mc.
++ *
++ * If KDBUS_ATTACH_NAMES is set in @what and @conn is non-NULL, the caller must
++ * hold the name-registry read-lock of conn->ep->bus->name_registry.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_meta_conn_collect(struct kdbus_meta_conn *mc,
++ struct kdbus_conn *conn,
++ u64 msg_seqnum, u64 what)
++{
++ int ret;
++
++ if (!mc || !(what & (KDBUS_ATTACH_TIMESTAMP |
++ KDBUS_ATTACH_NAMES |
++ KDBUS_ATTACH_CONN_DESCRIPTION)))
++ return 0;
++
++ mutex_lock(&mc->lock);
++
++ if (msg_seqnum && (what & KDBUS_ATTACH_TIMESTAMP) &&
++ !(mc->collected & KDBUS_ATTACH_TIMESTAMP)) {
++ kdbus_meta_conn_collect_timestamp(mc, msg_seqnum);
++ mc->collected |= KDBUS_ATTACH_TIMESTAMP;
++ }
++
++ if (conn && (what & KDBUS_ATTACH_NAMES) &&
++ !(mc->collected & KDBUS_ATTACH_NAMES)) {
++ ret = kdbus_meta_conn_collect_names(mc, conn);
++ if (ret < 0)
++ goto exit_unlock;
++ mc->collected |= KDBUS_ATTACH_NAMES;
++ }
++
++ if (conn && (what & KDBUS_ATTACH_CONN_DESCRIPTION) &&
++ !(mc->collected & KDBUS_ATTACH_CONN_DESCRIPTION)) {
++ ret = kdbus_meta_conn_collect_description(mc, conn);
++ if (ret < 0)
++ goto exit_unlock;
++ mc->collected |= KDBUS_ATTACH_CONN_DESCRIPTION;
++ }
++
++ ret = 0;
++
++exit_unlock:
++ mutex_unlock(&mc->lock);
++ return ret;
++}
++
++static void kdbus_meta_export_caps(struct kdbus_meta_caps *out,
++ const struct kdbus_meta_proc *mp,
++ struct user_namespace *user_ns)
++{
++ struct user_namespace *iter;
++ const struct cred *cred = mp->cred;
++ bool parent = false, owner = false;
++ int i;
++
++ /*
++ * This translates the effective capabilities of 'cred' into the given
++ * user-namespace. If the given user-namespace is a child-namespace of
++ * the user-namespace of 'cred', the mask can be copied verbatim. If
++ * not, the mask is cleared.
++ * There's one exception: If 'cred' is the owner of any user-namespace
++ * in the path between the given user-namespace and the user-namespace
++ * of 'cred', then it has all effective capabilities set. This means,
++ * the user who created a user-namespace always has all effective
++ * capabilities in any child namespaces. Note that this is based on the
++ * uid of the namespace creator, not the task hierarchy.
++ */
++ for (iter = user_ns; iter; iter = iter->parent) {
++ if (iter == cred->user_ns) {
++ parent = true;
++ break;
++ }
++
++ if (iter == &init_user_ns)
++ break;
++
++ if ((iter->parent == cred->user_ns) &&
++ uid_eq(iter->owner, cred->euid)) {
++ owner = true;
++ break;
++ }
++ }
++
++ out->last_cap = CAP_LAST_CAP;
++
++ CAP_FOR_EACH_U32(i) {
++ if (parent) {
++ out->set[0].caps[i] = cred->cap_inheritable.cap[i];
++ out->set[1].caps[i] = cred->cap_permitted.cap[i];
++ out->set[2].caps[i] = cred->cap_effective.cap[i];
++ out->set[3].caps[i] = cred->cap_bset.cap[i];
++ } else if (owner) {
++ out->set[0].caps[i] = 0U;
++ out->set[1].caps[i] = ~0U;
++ out->set[2].caps[i] = ~0U;
++ out->set[3].caps[i] = ~0U;
++ } else {
++ out->set[0].caps[i] = 0U;
++ out->set[1].caps[i] = 0U;
++ out->set[2].caps[i] = 0U;
++ out->set[3].caps[i] = 0U;
++ }
++ }
++
++ /* clear unused bits */
++ for (i = 0; i < 4; i++)
++ out->set[i].caps[CAP_TO_INDEX(CAP_LAST_CAP)] &=
++ CAP_LAST_U32_VALID_MASK;
++}
++
++/* This is equivalent to from_kuid_munged(), but maps INVALID_UID to itself */
++static uid_t kdbus_from_kuid_keep(struct user_namespace *ns, kuid_t uid)
++{
++ return uid_valid(uid) ? from_kuid_munged(ns, uid) : ((uid_t)-1);
++}
++
++/* This is equivalent to from_kgid_munged(), but maps INVALID_GID to itself */
++static gid_t kdbus_from_kgid_keep(struct user_namespace *ns, kgid_t gid)
++{
++ return gid_valid(gid) ? from_kgid_munged(ns, gid) : ((gid_t)-1);
++}
++
++struct kdbus_meta_staging {
++ const struct kdbus_meta_proc *mp;
++ const struct kdbus_meta_fake *mf;
++ const struct kdbus_meta_conn *mc;
++ const struct kdbus_conn *conn;
++ u64 mask;
++
++ void *exe;
++ const char *exe_path;
++};
++
++static size_t kdbus_meta_measure(struct kdbus_meta_staging *staging)
++{
++ const struct kdbus_meta_proc *mp = staging->mp;
++ const struct kdbus_meta_fake *mf = staging->mf;
++ const struct kdbus_meta_conn *mc = staging->mc;
++ const u64 mask = staging->mask;
++ size_t size = 0;
++
++ /* process metadata */
++
++ if (mf && (mask & KDBUS_ATTACH_CREDS))
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_creds));
++ else if (mp && (mask & KDBUS_ATTACH_CREDS))
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_creds));
++
++ if (mf && (mask & KDBUS_ATTACH_PIDS))
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_pids));
++ else if (mp && (mask & KDBUS_ATTACH_PIDS))
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_pids));
++
++ if (mp && (mask & KDBUS_ATTACH_AUXGROUPS))
++ size += KDBUS_ITEM_SIZE(mp->cred->group_info->ngroups *
++ sizeof(u64));
++
++ if (mp && (mask & KDBUS_ATTACH_TID_COMM))
++ size += KDBUS_ITEM_SIZE(strlen(mp->tid_comm) + 1);
++
++ if (mp && (mask & KDBUS_ATTACH_PID_COMM))
++ size += KDBUS_ITEM_SIZE(strlen(mp->pid_comm) + 1);
++
++ if (staging->exe_path && (mask & KDBUS_ATTACH_EXE))
++ size += KDBUS_ITEM_SIZE(strlen(staging->exe_path) + 1);
++
++ if (mp && (mask & KDBUS_ATTACH_CMDLINE))
++ size += KDBUS_ITEM_SIZE(strlen(mp->cmdline) + 1);
++
++ if (mp && (mask & KDBUS_ATTACH_CGROUP))
++ size += KDBUS_ITEM_SIZE(strlen(mp->cgroup) + 1);
++
++ if (mp && (mask & KDBUS_ATTACH_CAPS))
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_meta_caps));
++
++ if (mf && (mask & KDBUS_ATTACH_SECLABEL))
++ size += KDBUS_ITEM_SIZE(strlen(mf->seclabel) + 1);
++ else if (mp && (mask & KDBUS_ATTACH_SECLABEL))
++ size += KDBUS_ITEM_SIZE(strlen(mp->seclabel) + 1);
++
++ if (mp && (mask & KDBUS_ATTACH_AUDIT))
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_audit));
++
++ /* connection metadata */
++
++ if (mc && (mask & KDBUS_ATTACH_NAMES))
++ size += KDBUS_ALIGN8(mc->owned_names_size);
++
++ if (mc && (mask & KDBUS_ATTACH_CONN_DESCRIPTION))
++ size += KDBUS_ITEM_SIZE(strlen(mc->conn_description) + 1);
++
++ if (mc && (mask & KDBUS_ATTACH_TIMESTAMP))
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_timestamp));
++
++ return size;
++}
++
++static struct kdbus_item *kdbus_write_head(struct kdbus_item **iter,
++ u64 type, u64 size)
++{
++ struct kdbus_item *item = *iter;
++ size_t padding;
++
++ item->type = type;
++ item->size = KDBUS_ITEM_HEADER_SIZE + size;
++
++ /* clear padding */
++ padding = KDBUS_ALIGN8(item->size) - item->size;
++ if (padding)
++ memset(item->data + size, 0, padding);
++
++ *iter = KDBUS_ITEM_NEXT(item);
++ return item;
++}
++
++static struct kdbus_item *kdbus_write_full(struct kdbus_item **iter,
++ u64 type, u64 size, const void *data)
++{
++ struct kdbus_item *item;
++
++ item = kdbus_write_head(iter, type, size);
++ memcpy(item->data, data, size);
++ return item;
++}
++
++static size_t kdbus_meta_write(struct kdbus_meta_staging *staging, void *mem,
++ size_t size)
++{
++ struct user_namespace *user_ns = staging->conn->cred->user_ns;
++ struct pid_namespace *pid_ns = ns_of_pid(staging->conn->pid);
++ struct kdbus_item *item = NULL, *items = mem;
++ u8 *end, *owned_names_end = NULL;
++
++ /* process metadata */
++
++ if (staging->mf && (staging->mask & KDBUS_ATTACH_CREDS)) {
++ const struct kdbus_meta_fake *mf = staging->mf;
++
++ item = kdbus_write_head(&items, KDBUS_ITEM_CREDS,
++ sizeof(struct kdbus_creds));
++ item->creds = (struct kdbus_creds){
++ .uid = kdbus_from_kuid_keep(user_ns, mf->uid),
++ .euid = kdbus_from_kuid_keep(user_ns, mf->euid),
++ .suid = kdbus_from_kuid_keep(user_ns, mf->suid),
++ .fsuid = kdbus_from_kuid_keep(user_ns, mf->fsuid),
++ .gid = kdbus_from_kgid_keep(user_ns, mf->gid),
++ .egid = kdbus_from_kgid_keep(user_ns, mf->egid),
++ .sgid = kdbus_from_kgid_keep(user_ns, mf->sgid),
++ .fsgid = kdbus_from_kgid_keep(user_ns, mf->fsgid),
++ };
++ } else if (staging->mp && (staging->mask & KDBUS_ATTACH_CREDS)) {
++ const struct cred *c = staging->mp->cred;
++
++ item = kdbus_write_head(&items, KDBUS_ITEM_CREDS,
++ sizeof(struct kdbus_creds));
++ item->creds = (struct kdbus_creds){
++ .uid = kdbus_from_kuid_keep(user_ns, c->uid),
++ .euid = kdbus_from_kuid_keep(user_ns, c->euid),
++ .suid = kdbus_from_kuid_keep(user_ns, c->suid),
++ .fsuid = kdbus_from_kuid_keep(user_ns, c->fsuid),
++ .gid = kdbus_from_kgid_keep(user_ns, c->gid),
++ .egid = kdbus_from_kgid_keep(user_ns, c->egid),
++ .sgid = kdbus_from_kgid_keep(user_ns, c->sgid),
++ .fsgid = kdbus_from_kgid_keep(user_ns, c->fsgid),
++ };
++ }
++
++ if (staging->mf && (staging->mask & KDBUS_ATTACH_PIDS)) {
++ item = kdbus_write_head(&items, KDBUS_ITEM_PIDS,
++ sizeof(struct kdbus_pids));
++ item->pids = (struct kdbus_pids){
++ .pid = pid_nr_ns(staging->mf->tgid, pid_ns),
++ .tid = pid_nr_ns(staging->mf->pid, pid_ns),
++ .ppid = pid_nr_ns(staging->mf->ppid, pid_ns),
++ };
++ } else if (staging->mp && (staging->mask & KDBUS_ATTACH_PIDS)) {
++ item = kdbus_write_head(&items, KDBUS_ITEM_PIDS,
++ sizeof(struct kdbus_pids));
++ item->pids = (struct kdbus_pids){
++ .pid = pid_nr_ns(staging->mp->tgid, pid_ns),
++ .tid = pid_nr_ns(staging->mp->pid, pid_ns),
++ .ppid = pid_nr_ns(staging->mp->ppid, pid_ns),
++ };
++ }
++
++ if (staging->mp && (staging->mask & KDBUS_ATTACH_AUXGROUPS)) {
++ const struct group_info *info = staging->mp->cred->group_info;
++ size_t i;
++
++ item = kdbus_write_head(&items, KDBUS_ITEM_AUXGROUPS,
++ info->ngroups * sizeof(u64));
++ for (i = 0; i < info->ngroups; ++i)
++ item->data64[i] = from_kgid_munged(user_ns,
++ GROUP_AT(info, i));
++ }
++
++ if (staging->mp && (staging->mask & KDBUS_ATTACH_TID_COMM))
++ item = kdbus_write_full(&items, KDBUS_ITEM_TID_COMM,
++ strlen(staging->mp->tid_comm) + 1,
++ staging->mp->tid_comm);
++
++ if (staging->mp && (staging->mask & KDBUS_ATTACH_PID_COMM))
++ item = kdbus_write_full(&items, KDBUS_ITEM_PID_COMM,
++ strlen(staging->mp->pid_comm) + 1,
++ staging->mp->pid_comm);
++
++ if (staging->exe_path && (staging->mask & KDBUS_ATTACH_EXE))
++ item = kdbus_write_full(&items, KDBUS_ITEM_EXE,
++ strlen(staging->exe_path) + 1,
++ staging->exe_path);
++
++ if (staging->mp && (staging->mask & KDBUS_ATTACH_CMDLINE))
++ item = kdbus_write_full(&items, KDBUS_ITEM_CMDLINE,
++ strlen(staging->mp->cmdline) + 1,
++ staging->mp->cmdline);
++
++ if (staging->mp && (staging->mask & KDBUS_ATTACH_CGROUP))
++ item = kdbus_write_full(&items, KDBUS_ITEM_CGROUP,
++ strlen(staging->mp->cgroup) + 1,
++ staging->mp->cgroup);
++
++ if (staging->mp && (staging->mask & KDBUS_ATTACH_CAPS)) {
++ item = kdbus_write_head(&items, KDBUS_ITEM_CAPS,
++ sizeof(struct kdbus_meta_caps));
++ kdbus_meta_export_caps((void*)&item->caps, staging->mp,
++ user_ns);
++ }
++
++ if (staging->mf && (staging->mask & KDBUS_ATTACH_SECLABEL))
++ item = kdbus_write_full(&items, KDBUS_ITEM_SECLABEL,
++ strlen(staging->mf->seclabel) + 1,
++ staging->mf->seclabel);
++ else if (staging->mp && (staging->mask & KDBUS_ATTACH_SECLABEL))
++ item = kdbus_write_full(&items, KDBUS_ITEM_SECLABEL,
++ strlen(staging->mp->seclabel) + 1,
++ staging->mp->seclabel);
++
++ if (staging->mp && (staging->mask & KDBUS_ATTACH_AUDIT)) {
++ item = kdbus_write_head(&items, KDBUS_ITEM_AUDIT,
++ sizeof(struct kdbus_audit));
++ item->audit = (struct kdbus_audit){
++ .loginuid = from_kuid(user_ns,
++ staging->mp->audit_loginuid),
++ .sessionid = staging->mp->audit_sessionid,
++ };
++ }
++
++ /* connection metadata */
++
++ if (staging->mc && (staging->mask & KDBUS_ATTACH_NAMES)) {
++ memcpy(items, staging->mc->owned_names_items,
++ KDBUS_ALIGN8(staging->mc->owned_names_size));
++ owned_names_end = (u8 *)items + staging->mc->owned_names_size;
++ items = (void *)KDBUS_ALIGN8((unsigned long)owned_names_end);
++ }
++
++ if (staging->mc && (staging->mask & KDBUS_ATTACH_CONN_DESCRIPTION))
++ item = kdbus_write_full(&items, KDBUS_ITEM_CONN_DESCRIPTION,
++ strlen(staging->mc->conn_description) + 1,
++ staging->mc->conn_description);
++
++ if (staging->mc && (staging->mask & KDBUS_ATTACH_TIMESTAMP))
++ item = kdbus_write_full(&items, KDBUS_ITEM_TIMESTAMP,
++ sizeof(staging->mc->ts),
++ &staging->mc->ts);
++
++ /*
++ * Return real size (minus trailing padding). In case of 'owned_names'
++ * we cannot deduce it from item->size, so treat it specially.
++ */
++
++ if (items == (void *)KDBUS_ALIGN8((unsigned long)owned_names_end))
++ end = owned_names_end;
++ else if (item)
++ end = (u8 *)item + item->size;
++ else
++ end = mem;
++
++ WARN_ON((u8 *)items - (u8 *)mem != size);
++ WARN_ON((void *)KDBUS_ALIGN8((unsigned long)end) != (void *)items);
++
++ return end - (u8 *)mem;
++}
++
++int kdbus_meta_emit(struct kdbus_meta_proc *mp,
++ struct kdbus_meta_fake *mf,
++ struct kdbus_meta_conn *mc,
++ struct kdbus_conn *conn,
++ u64 mask,
++ struct kdbus_item **out_items,
++ size_t *out_size)
++{
++ struct kdbus_meta_staging staging = {};
++ struct kdbus_item *items = NULL;
++ size_t size = 0;
++ int ret;
++
++ if (WARN_ON(mf && mp))
++ mp = NULL;
++
++ staging.mp = mp;
++ staging.mf = mf;
++ staging.mc = mc;
++ staging.conn = conn;
++
++ /* get mask of valid items */
++ if (mf)
++ staging.mask |= mf->valid;
++ if (mp) {
++ mutex_lock(&mp->lock);
++ staging.mask |= mp->valid;
++ mutex_unlock(&mp->lock);
++ }
++ if (mc) {
++ mutex_lock(&mc->lock);
++ staging.mask |= mc->valid;
++ mutex_unlock(&mc->lock);
++ }
++
++ staging.mask &= mask;
++
++ if (!staging.mask) { /* bail out if nothing to do */
++ ret = 0;
++ goto exit;
++ }
++
++ /* EXE is special as it needs a temporary page to assemble */
++ if (mp && (staging.mask & KDBUS_ATTACH_EXE)) {
++ struct path p;
++
++ /*
++ * XXX: We need access to __d_path() so we can write the path
++ * relative to conn->root_path. Once upstream, we need
++ * EXPORT_SYMBOL(__d_path) or an equivalent of d_path() that
++ * takes the root path directly. Until then, we drop this item
++ * if the root-paths differ.
++ */
++
++ get_fs_root(current->fs, &p);
++ if (path_equal(&p, &conn->root_path)) {
++ staging.exe = (void *)__get_free_page(GFP_TEMPORARY);
++ if (!staging.exe) {
++ path_put(&p);
++ ret = -ENOMEM;
++ goto exit;
++ }
++
++ staging.exe_path = d_path(&mp->exe_path, staging.exe,
++ PAGE_SIZE);
++ if (IS_ERR(staging.exe_path)) {
++ path_put(&p);
++ ret = PTR_ERR(staging.exe_path);
++ goto exit;
++ }
++ }
++ path_put(&p);
++ }
++
++ size = kdbus_meta_measure(&staging);
++ if (!size) { /* bail out if nothing to do */
++ ret = 0;
++ goto exit;
++ }
++
++ items = kmalloc(size, GFP_KERNEL);
++ if (!items) {
++ ret = -ENOMEM;
++ goto exit;
++ }
++
++ size = kdbus_meta_write(&staging, items, size);
++ if (!size) {
++ kfree(items);
++ items = NULL;
++ }
++
++ ret = 0;
++
++exit:
++ if (staging.exe)
++ free_page((unsigned long)staging.exe);
++ if (ret >= 0) {
++ *out_items = items;
++ *out_size = size;
++ }
++ return ret;
++}
++
++enum {
++ KDBUS_META_PROC_NONE,
++ KDBUS_META_PROC_NORMAL,
++};
++
++/**
++ * kdbus_proc_permission() - check /proc permissions on target pid
++ * @pid_ns: namespace we operate in
++ * @cred: credentials of requestor
++ * @target: target process
++ *
++ * This checks whether a process with credentials @cred can access information
++ * of @target in the namespace @pid_ns. This tries to follow /proc permissions,
++ * but is slightly more restrictive.
++ *
++ * Return: The /proc access level (KDBUS_META_PROC_*) is returned.
++ */
++static unsigned int kdbus_proc_permission(const struct pid_namespace *pid_ns,
++ const struct cred *cred,
++ struct pid *target)
++{
++ if (pid_ns->hide_pid < 1)
++ return KDBUS_META_PROC_NORMAL;
++
++ /* XXX: we need groups_search() exported for aux-groups */
++ if (gid_eq(cred->egid, pid_ns->pid_gid))
++ return KDBUS_META_PROC_NORMAL;
++
++ /*
++ * XXX: If ptrace_may_access(PTRACE_MODE_READ) is granted, you can
++ * override hide_pid. However, ptrace_may_access() only supports
++ * checking 'current', hence, we cannot use this here. But we
++ * simply decide to not support this override, so no need to worry.
++ */
++
++ return KDBUS_META_PROC_NONE;
++}
++
++/**
++ * kdbus_meta_proc_mask() - calculate which metadata would be visible to
++ * a connection via /proc
++ * @prv_pid: pid of metadata provider
++ * @req_pid: pid of metadata requestor
++ * @req_cred: credentials of metadata requestor
++ * @wanted: metadata that is requested
++ *
++ * This checks which metadata items of @prv_pid can be read via /proc by the
++ * requestor @req_pid.
++ *
++ * Return: Set of metadata flags the requestor can see (limited by @wanted).
++ */
++static u64 kdbus_meta_proc_mask(struct pid *prv_pid,
++ struct pid *req_pid,
++ const struct cred *req_cred,
++ u64 wanted)
++{
++ struct pid_namespace *prv_ns, *req_ns;
++ unsigned int proc;
++
++ prv_ns = ns_of_pid(prv_pid);
++ req_ns = ns_of_pid(req_pid);
++
++ /*
++ * If the sender is not visible in the receiver namespace, then the
++ * receiver cannot access the sender via its own procfs. Hence, we do
++ * not attach any additional metadata.
++ */
++ if (!pid_nr_ns(prv_pid, req_ns))
++ return 0;
++
++ /*
++ * If the pid-namespace of the receiver has hide_pid set, it cannot see
++ * any process but its own. We shortcut this /proc permission check if
++ * provider and requestor are the same. If not, we perform rather
++ * expensive /proc permission checks.
++ */
++ if (prv_pid == req_pid)
++ proc = KDBUS_META_PROC_NORMAL;
++ else
++ proc = kdbus_proc_permission(req_ns, req_cred, prv_pid);
++
++ /* you need /proc access to read standard process attributes */
++ if (proc < KDBUS_META_PROC_NORMAL)
++ wanted &= ~(KDBUS_ATTACH_TID_COMM |
++ KDBUS_ATTACH_PID_COMM |
++ KDBUS_ATTACH_SECLABEL |
++ KDBUS_ATTACH_CMDLINE |
++ KDBUS_ATTACH_CGROUP |
++ KDBUS_ATTACH_AUDIT |
++ KDBUS_ATTACH_CAPS |
++ KDBUS_ATTACH_EXE);
++
++ /* clear all non-/proc flags */
++ return wanted & (KDBUS_ATTACH_TID_COMM |
++ KDBUS_ATTACH_PID_COMM |
++ KDBUS_ATTACH_SECLABEL |
++ KDBUS_ATTACH_CMDLINE |
++ KDBUS_ATTACH_CGROUP |
++ KDBUS_ATTACH_AUDIT |
++ KDBUS_ATTACH_CAPS |
++ KDBUS_ATTACH_EXE);
++}
++
++/**
++ * kdbus_meta_get_mask() - calculate attach flags mask for metadata request
++ * @prv_pid: pid of metadata provider
++ * @prv_mask: mask of metadata the provider grants unchecked
++ * @req_pid: pid of metadata requestor
++ * @req_cred: credentials of metadata requestor
++ * @req_mask: mask of metadata that is requested
++ *
++ * This calculates the metadata items that the requestor @req_pid can access
++ * from the metadata provider @prv_pid. This permission check consists of
++ * several different parts:
++ * - Providers can grant metadata items unchecked. Regardless of their type,
++ * they're always granted to the requestor. This mask is passed as @prv_mask.
++ * - Basic items (credentials and connection metadata) are granted implicitly
++ * to everyone. They're publicly available to any bus-user that can see the
++ * provider.
++ * - Process credentials that are not granted implicitly follow the same
++ * permission checks as /proc. This means, we always assume a requestor
++ * process has access to their *own* /proc mount, if they have access to
++ * kdbusfs.
++ *
++ * Return: Mask of metadata that is granted.
++ */
++static u64 kdbus_meta_get_mask(struct pid *prv_pid, u64 prv_mask,
++ struct pid *req_pid,
++ const struct cred *req_cred, u64 req_mask)
++{
++ u64 missing, impl_mask, proc_mask = 0;
++
++ /*
++ * Connection metadata and basic unix process credentials are
++ * transmitted implicitly, and cannot be suppressed. Both are required
++ * to perform user-space policies on the receiver-side. Furthermore,
++ * connection metadata is public state, anyway, and unix credentials
++ * are needed for UDS-compatibility. We extend them slightly by
++ * auxiliary groups and additional uids/gids/pids.
++ */
++ impl_mask = /* connection metadata */
++ KDBUS_ATTACH_CONN_DESCRIPTION |
++ KDBUS_ATTACH_TIMESTAMP |
++ KDBUS_ATTACH_NAMES |
++ /* credentials and pids */
++ KDBUS_ATTACH_AUXGROUPS |
++ KDBUS_ATTACH_CREDS |
++ KDBUS_ATTACH_PIDS;
++
++ /*
++ * Calculate the set of metadata that is not granted implicitly nor by
++ * the sender, but still requested by the receiver. If any are left,
++ * perform rather expensive /proc access checks for them.
++ */
++ missing = req_mask & ~((prv_mask | impl_mask) & req_mask);
++ if (missing)
++ proc_mask = kdbus_meta_proc_mask(prv_pid, req_pid, req_cred,
++ missing);
++
++ return (prv_mask | impl_mask | proc_mask) & req_mask;
++}
++
++/**
++ */
++u64 kdbus_meta_info_mask(const struct kdbus_conn *conn, u64 mask)
++{
++ return kdbus_meta_get_mask(conn->pid,
++ atomic64_read(&conn->attach_flags_send),
++ task_pid(current),
++ current_cred(),
++ mask);
++}
++
++/**
++ */
++u64 kdbus_meta_msg_mask(const struct kdbus_conn *snd,
++ const struct kdbus_conn *rcv)
++{
++ return kdbus_meta_get_mask(task_pid(current),
++ atomic64_read(&snd->attach_flags_send),
++ rcv->pid,
++ rcv->cred,
++ atomic64_read(&rcv->attach_flags_recv));
++}
+diff --git a/ipc/kdbus/metadata.h b/ipc/kdbus/metadata.h
+new file mode 100644
+index 0000000..dba7cc7
+--- /dev/null
++++ b/ipc/kdbus/metadata.h
+@@ -0,0 +1,86 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_METADATA_H
++#define __KDBUS_METADATA_H
++
++#include <linux/kernel.h>
++
++struct kdbus_conn;
++struct kdbus_pool_slice;
++
++struct kdbus_meta_proc;
++struct kdbus_meta_conn;
++
++/**
++ * struct kdbus_meta_fake - Fake metadata
++ * @valid: Bitmask of collected and valid items
++ * @uid: UID of process
++ * @euid: EUID of process
++ * @suid: SUID of process
++ * @fsuid: FSUID of process
++ * @gid: GID of process
++ * @egid: EGID of process
++ * @sgid: SGID of process
++ * @fsgid: FSGID of process
++ * @pid: PID of process
++ * @tgid: TGID of process
++ * @ppid: PPID of process
++ * @seclabel: Seclabel
++ */
++struct kdbus_meta_fake {
++ u64 valid;
++
++ /* KDBUS_ITEM_CREDS */
++ kuid_t uid, euid, suid, fsuid;
++ kgid_t gid, egid, sgid, fsgid;
++
++ /* KDBUS_ITEM_PIDS */
++ struct pid *pid, *tgid, *ppid;
++
++ /* KDBUS_ITEM_SECLABEL */
++ char *seclabel;
++};
++
++struct kdbus_meta_proc *kdbus_meta_proc_new(void);
++struct kdbus_meta_proc *kdbus_meta_proc_ref(struct kdbus_meta_proc *mp);
++struct kdbus_meta_proc *kdbus_meta_proc_unref(struct kdbus_meta_proc *mp);
++int kdbus_meta_proc_collect(struct kdbus_meta_proc *mp, u64 what);
++
++struct kdbus_meta_fake *kdbus_meta_fake_new(void);
++struct kdbus_meta_fake *kdbus_meta_fake_free(struct kdbus_meta_fake *mf);
++int kdbus_meta_fake_collect(struct kdbus_meta_fake *mf,
++ const struct kdbus_creds *creds,
++ const struct kdbus_pids *pids,
++ const char *seclabel);
++
++struct kdbus_meta_conn *kdbus_meta_conn_new(void);
++struct kdbus_meta_conn *kdbus_meta_conn_ref(struct kdbus_meta_conn *mc);
++struct kdbus_meta_conn *kdbus_meta_conn_unref(struct kdbus_meta_conn *mc);
++int kdbus_meta_conn_collect(struct kdbus_meta_conn *mc,
++ struct kdbus_conn *conn,
++ u64 msg_seqnum, u64 what);
++
++int kdbus_meta_emit(struct kdbus_meta_proc *mp,
++ struct kdbus_meta_fake *mf,
++ struct kdbus_meta_conn *mc,
++ struct kdbus_conn *conn,
++ u64 mask,
++ struct kdbus_item **out_items,
++ size_t *out_size);
++u64 kdbus_meta_info_mask(const struct kdbus_conn *conn, u64 mask);
++u64 kdbus_meta_msg_mask(const struct kdbus_conn *snd,
++ const struct kdbus_conn *rcv);
++
++#endif
+diff --git a/ipc/kdbus/names.c b/ipc/kdbus/names.c
+new file mode 100644
+index 0000000..bf44ca3
+--- /dev/null
++++ b/ipc/kdbus/names.c
+@@ -0,0 +1,854 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/ctype.h>
++#include <linux/fs.h>
++#include <linux/hash.h>
++#include <linux/idr.h>
++#include <linux/init.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/rwsem.h>
++#include <linux/sched.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <linux/uio.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "endpoint.h"
++#include "handle.h"
++#include "item.h"
++#include "names.h"
++#include "notify.h"
++#include "policy.h"
++
++#define KDBUS_NAME_SAVED_MASK (KDBUS_NAME_ALLOW_REPLACEMENT | \
++ KDBUS_NAME_QUEUE)
++
++static bool kdbus_name_owner_is_used(struct kdbus_name_owner *owner)
++{
++ return !list_empty(&owner->name_entry) ||
++ owner == owner->name->activator;
++}
++
++static struct kdbus_name_owner *
++kdbus_name_owner_new(struct kdbus_conn *conn, struct kdbus_name_entry *name,
++ u64 flags)
++{
++ struct kdbus_name_owner *owner;
++
++ kdbus_conn_assert_active(conn);
++
++ if (conn->name_count >= KDBUS_CONN_MAX_NAMES)
++ return ERR_PTR(-E2BIG);
++
++ owner = kmalloc(sizeof(*owner), GFP_KERNEL);
++ if (!owner)
++ return ERR_PTR(-ENOMEM);
++
++ owner->flags = flags & KDBUS_NAME_SAVED_MASK;
++ owner->conn = conn;
++ owner->name = name;
++ list_add_tail(&owner->conn_entry, &conn->names_list);
++ INIT_LIST_HEAD(&owner->name_entry);
++
++ ++conn->name_count;
++ return owner;
++}
++
++static void kdbus_name_owner_free(struct kdbus_name_owner *owner)
++{
++ if (!owner)
++ return;
++
++ WARN_ON(kdbus_name_owner_is_used(owner));
++ --owner->conn->name_count;
++ list_del(&owner->conn_entry);
++ kfree(owner);
++}
++
++static struct kdbus_name_owner *
++kdbus_name_owner_find(struct kdbus_name_entry *name, struct kdbus_conn *conn)
++{
++ struct kdbus_name_owner *owner;
++
++ /*
++ * Use conn->names_list over name->queue to make sure boundaries of
++ * this linear search are controlled by the connection itself.
++ * Furthermore, this will find normal owners as well as activators
++ * without any additional code.
++ */
++ list_for_each_entry(owner, &conn->names_list, conn_entry)
++ if (owner->name == name)
++ return owner;
++
++ return NULL;
++}
++
++static bool kdbus_name_entry_is_used(struct kdbus_name_entry *name)
++{
++ return !list_empty(&name->queue) || name->activator;
++}
++
++static struct kdbus_name_owner *
++kdbus_name_entry_first(struct kdbus_name_entry *name)
++{
++ return list_first_entry_or_null(&name->queue, struct kdbus_name_owner,
++ name_entry);
++}
++
++static struct kdbus_name_entry *
++kdbus_name_entry_new(struct kdbus_name_registry *r, u32 hash,
++ const char *name_str)
++{
++ struct kdbus_name_entry *name;
++ size_t namelen;
++
++ lockdep_assert_held(&r->rwlock);
++
++ namelen = strlen(name_str);
++
++ name = kmalloc(sizeof(*name) + namelen + 1, GFP_KERNEL);
++ if (!name)
++ return ERR_PTR(-ENOMEM);
++
++ name->name_id = ++r->name_seq_last;
++ name->activator = NULL;
++ INIT_LIST_HEAD(&name->queue);
++ hash_add(r->entries_hash, &name->hentry, hash);
++ memcpy(name->name, name_str, namelen + 1);
++
++ return name;
++}
++
++static void kdbus_name_entry_free(struct kdbus_name_entry *name)
++{
++ if (!name)
++ return;
++
++ WARN_ON(kdbus_name_entry_is_used(name));
++ hash_del(&name->hentry);
++ kfree(name);
++}
++
++static struct kdbus_name_entry *
++kdbus_name_entry_find(struct kdbus_name_registry *r, u32 hash,
++ const char *name_str)
++{
++ struct kdbus_name_entry *name;
++
++ lockdep_assert_held(&r->rwlock);
++
++ hash_for_each_possible(r->entries_hash, name, hentry, hash)
++ if (!strcmp(name->name, name_str))
++ return name;
++
++ return NULL;
++}
++
++/**
++ * kdbus_name_registry_new() - create a new name registry
++ *
++ * Return: a new kdbus_name_registry on success, ERR_PTR on failure.
++ */
++struct kdbus_name_registry *kdbus_name_registry_new(void)
++{
++ struct kdbus_name_registry *r;
++
++ r = kmalloc(sizeof(*r), GFP_KERNEL);
++ if (!r)
++ return ERR_PTR(-ENOMEM);
++
++ hash_init(r->entries_hash);
++ init_rwsem(&r->rwlock);
++ r->name_seq_last = 0;
++
++ return r;
++}
++
++/**
++ * kdbus_name_registry_free() - free name registry
++ * @r: name registry to free, or NULL
++ *
++ * Free a name registry and cleanup all internal objects. This is a no-op if
++ * you pass NULL as registry.
++ */
++void kdbus_name_registry_free(struct kdbus_name_registry *r)
++{
++ if (!r)
++ return;
++
++ WARN_ON(!hash_empty(r->entries_hash));
++ kfree(r);
++}
++
++/**
++ * kdbus_name_lookup_unlocked() - lookup name in registry
++ * @reg: name registry
++ * @name: name to lookup
++ *
++ * This looks up @name in the given name-registry and returns the
++ * kdbus_name_entry object. The caller must hold the registry-lock and must not
++ * access the returned object after releasing the lock.
++ *
++ * Return: Pointer to name-entry, or NULL if not found.
++ */
++struct kdbus_name_entry *
++kdbus_name_lookup_unlocked(struct kdbus_name_registry *reg, const char *name)
++{
++ return kdbus_name_entry_find(reg, kdbus_strhash(name), name);
++}
++
++static int kdbus_name_become_activator(struct kdbus_name_owner *owner,
++ u64 *return_flags)
++{
++ if (kdbus_name_owner_is_used(owner))
++ return -EALREADY;
++ if (owner->name->activator)
++ return -EEXIST;
++
++ owner->name->activator = owner;
++ owner->flags |= KDBUS_NAME_ACTIVATOR;
++
++ if (kdbus_name_entry_first(owner->name)) {
++ owner->flags |= KDBUS_NAME_IN_QUEUE;
++ } else {
++ owner->flags |= KDBUS_NAME_PRIMARY;
++ kdbus_notify_name_change(owner->conn->ep->bus,
++ KDBUS_ITEM_NAME_ADD,
++ 0, owner->conn->id,
++ 0, owner->flags,
++ owner->name->name);
++ }
++
++ if (return_flags)
++ *return_flags = owner->flags | KDBUS_NAME_ACQUIRED;
++
++ return 0;
++}
++
++static int kdbus_name_update(struct kdbus_name_owner *owner, u64 flags,
++ u64 *return_flags)
++{
++ struct kdbus_name_owner *primary, *activator;
++ struct kdbus_name_entry *name;
++ struct kdbus_bus *bus;
++ u64 nflags = 0;
++ int ret = 0;
++
++ name = owner->name;
++ bus = owner->conn->ep->bus;
++ primary = kdbus_name_entry_first(name);
++ activator = name->activator;
++
++ /* cannot be activator and acquire a name */
++ if (owner == activator)
++ return -EUCLEAN;
++
++ /* update saved flags */
++ owner->flags = flags & KDBUS_NAME_SAVED_MASK;
++
++ if (!primary) {
++ /*
++ * No primary owner (but maybe an activator). Take over the
++ * name.
++ */
++
++ list_add(&owner->name_entry, &name->queue);
++ owner->flags |= KDBUS_NAME_PRIMARY;
++ nflags |= KDBUS_NAME_ACQUIRED;
++
++ /* move messages to new owner on activation */
++ if (activator) {
++ kdbus_conn_move_messages(owner->conn, activator->conn,
++ name->name_id);
++ kdbus_notify_name_change(bus, KDBUS_ITEM_NAME_CHANGE,
++ activator->conn->id, owner->conn->id,
++ activator->flags, owner->flags,
++ name->name);
++ activator->flags &= ~KDBUS_NAME_PRIMARY;
++ activator->flags |= KDBUS_NAME_IN_QUEUE;
++ } else {
++ kdbus_notify_name_change(bus, KDBUS_ITEM_NAME_ADD,
++ 0, owner->conn->id,
++ 0, owner->flags,
++ name->name);
++ }
++
++ } else if (owner == primary) {
++ /*
++ * Already the primary owner of the name, flags were already
++ * updated. Nothing to do.
++ */
++
++ owner->flags |= KDBUS_NAME_PRIMARY;
++
++ } else if ((primary->flags & KDBUS_NAME_ALLOW_REPLACEMENT) &&
++ (flags & KDBUS_NAME_REPLACE_EXISTING)) {
++ /*
++ * We're not the primary owner but can replace it. Move us
++ * ahead of the primary owner and acquire the name (possibly
++ * skipping queued owners ahead of us).
++ */
++
++ list_del_init(&owner->name_entry);
++ list_add(&owner->name_entry, &name->queue);
++ owner->flags |= KDBUS_NAME_PRIMARY;
++ nflags |= KDBUS_NAME_ACQUIRED;
++
++ kdbus_notify_name_change(bus, KDBUS_ITEM_NAME_CHANGE,
++ primary->conn->id, owner->conn->id,
++ primary->flags, owner->flags,
++ name->name);
++
++ /* requeue old primary, or drop if queueing not wanted */
++ if (primary->flags & KDBUS_NAME_QUEUE) {
++ primary->flags &= ~KDBUS_NAME_PRIMARY;
++ primary->flags |= KDBUS_NAME_IN_QUEUE;
++ } else {
++ list_del_init(&primary->name_entry);
++ kdbus_name_owner_free(primary);
++ }
++
++ } else if (flags & KDBUS_NAME_QUEUE) {
++ /*
++ * Name is already occupied and we cannot take it over, but
++ * queuing is allowed. Put us silently on the queue, if not
++ * already there.
++ */
++
++ owner->flags |= KDBUS_NAME_IN_QUEUE;
++ if (!kdbus_name_owner_is_used(owner)) {
++ list_add_tail(&owner->name_entry, &name->queue);
++ nflags |= KDBUS_NAME_ACQUIRED;
++ }
++ } else if (kdbus_name_owner_is_used(owner)) {
++ /*
++ * Already queued on name, but re-queueing was not requested.
++ * Make sure to unlink it from the name, the caller is
++ * responsible for releasing it.
++ */
++
++ list_del_init(&owner->name_entry);
++ } else {
++ /*
++ * Name is already claimed and queueing is not requested.
++ * Return error to the caller.
++ */
++
++ ret = -EEXIST;
++ }
++
++ if (return_flags)
++ *return_flags = owner->flags | nflags;
++
++ return ret;
++}
++
++int kdbus_name_acquire(struct kdbus_name_registry *reg,
++ struct kdbus_conn *conn, const char *name_str,
++ u64 flags, u64 *return_flags)
++{
++ struct kdbus_name_entry *name = NULL;
++ struct kdbus_name_owner *owner = NULL;
++ u32 hash;
++ int ret;
++
++ kdbus_conn_assert_active(conn);
++
++ down_write(&reg->rwlock);
++
++ /*
++ * Verify the connection has access to the name. Do this before testing
++ * for double-acquisitions and other errors to make sure we do not leak
++ * information about this name through possible custom endpoints.
++ */
++ if (!kdbus_conn_policy_own_name(conn, current_cred(), name_str)) {
++ ret = -EPERM;
++ goto exit;
++ }
++
++ /*
++ * Lookup the name entry. If it already exists, search for an owner
++ * entry as we might already own that name. If either does not exist,
++ * we will allocate a fresh one.
++ */
++ hash = kdbus_strhash(name_str);
++ name = kdbus_name_entry_find(reg, hash, name_str);
++ if (name) {
++ owner = kdbus_name_owner_find(name, conn);
++ } else {
++ name = kdbus_name_entry_new(reg, hash, name_str);
++ if (IS_ERR(name)) {
++ ret = PTR_ERR(name);
++ name = NULL;
++ goto exit;
++ }
++ }
++
++ /* create name owner object if not already queued */
++ if (!owner) {
++ owner = kdbus_name_owner_new(conn, name, flags);
++ if (IS_ERR(owner)) {
++ ret = PTR_ERR(owner);
++ owner = NULL;
++ goto exit;
++ }
++ }
++
++ if (flags & KDBUS_NAME_ACTIVATOR)
++ ret = kdbus_name_become_activator(owner, return_flags);
++ else
++ ret = kdbus_name_update(owner, flags, return_flags);
++ if (ret < 0)
++ goto exit;
++
++exit:
++ if (owner && !kdbus_name_owner_is_used(owner))
++ kdbus_name_owner_free(owner);
++ if (name && !kdbus_name_entry_is_used(name))
++ kdbus_name_entry_free(name);
++ up_write(&reg->rwlock);
++ kdbus_notify_flush(conn->ep->bus);
++ return ret;
++}
++
++static void kdbus_name_release_unlocked(struct kdbus_name_owner *owner)
++{
++ struct kdbus_name_owner *primary, *next;
++ struct kdbus_name_entry *name;
++
++ name = owner->name;
++ primary = kdbus_name_entry_first(name);
++
++ list_del_init(&owner->name_entry);
++ if (owner == name->activator)
++ name->activator = NULL;
++
++ if (!primary || owner == primary) {
++ next = kdbus_name_entry_first(name);
++ if (!next)
++ next = name->activator;
++
++ if (next) {
++ /* hand to next in queue */
++ next->flags &= ~KDBUS_NAME_IN_QUEUE;
++ next->flags |= KDBUS_NAME_PRIMARY;
++ if (next == name->activator)
++ kdbus_conn_move_messages(next->conn,
++ owner->conn,
++ name->name_id);
++
++ kdbus_notify_name_change(owner->conn->ep->bus,
++ KDBUS_ITEM_NAME_CHANGE,
++ owner->conn->id, next->conn->id,
++ owner->flags, next->flags,
++ name->name);
++ } else {
++ kdbus_notify_name_change(owner->conn->ep->bus,
++ KDBUS_ITEM_NAME_REMOVE,
++ owner->conn->id, 0,
++ owner->flags, 0,
++ name->name);
++ }
++ }
++
++ kdbus_name_owner_free(owner);
++ if (!kdbus_name_entry_is_used(name))
++ kdbus_name_entry_free(name);
++}
++
++static int kdbus_name_release(struct kdbus_name_registry *reg,
++ struct kdbus_conn *conn,
++ const char *name_str)
++{
++ struct kdbus_name_owner *owner;
++ struct kdbus_name_entry *name;
++ int ret = 0;
++
++ down_write(&reg->rwlock);
++ name = kdbus_name_entry_find(reg, kdbus_strhash(name_str), name_str);
++ if (name) {
++ owner = kdbus_name_owner_find(name, conn);
++ if (owner)
++ kdbus_name_release_unlocked(owner);
++ else
++ ret = -EADDRINUSE;
++ } else {
++ ret = -ESRCH;
++ }
++ up_write(&reg->rwlock);
++
++ kdbus_notify_flush(conn->ep->bus);
++ return ret;
++}
++
++/**
++ * kdbus_name_release_all() - remove all name entries of a given connection
++ * @reg: name registry
++ * @conn: connection
++ */
++void kdbus_name_release_all(struct kdbus_name_registry *reg,
++ struct kdbus_conn *conn)
++{
++ struct kdbus_name_owner *owner;
++
++ down_write(&reg->rwlock);
++
++ while ((owner = list_first_entry_or_null(&conn->names_list,
++ struct kdbus_name_owner,
++ conn_entry)))
++ kdbus_name_release_unlocked(owner);
++
++ up_write(&reg->rwlock);
++
++ kdbus_notify_flush(conn->ep->bus);
++}
++
++/**
++ * kdbus_name_is_valid() - check if a name is valid
++ * @p: The name to check
++ * @allow_wildcard: Whether or not to allow a wildcard name
++ *
++ * A name is valid if all of the following criteria are met:
++ *
++ * - The name has two or more elements separated by a period ('.') character.
++ * - All elements must contain at least one character.
++ * - Each element must only contain the ASCII characters "[A-Z][a-z][0-9]_-"
++ * and must not begin with a digit.
++ * - The name must not exceed KDBUS_NAME_MAX_LEN.
++ * - If @allow_wildcard is true, the name may end in '.*'
++ */
++bool kdbus_name_is_valid(const char *p, bool allow_wildcard)
++{
++ bool dot, found_dot = false;
++ const char *q;
++
++ for (dot = true, q = p; *q; q++) {
++ if (*q == '.') {
++ if (dot)
++ return false;
++
++ found_dot = true;
++ dot = true;
++ } else {
++ bool good;
++
++ good = isalpha(*q) || (!dot && isdigit(*q)) ||
++ *q == '_' || *q == '-' ||
++ (allow_wildcard && dot &&
++ *q == '*' && *(q + 1) == '\0');
++
++ if (!good)
++ return false;
++
++ dot = false;
++ }
++ }
++
++ if (q - p > KDBUS_NAME_MAX_LEN)
++ return false;
++
++ if (dot)
++ return false;
++
++ if (!found_dot)
++ return false;
++
++ return true;
++}
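The validation rules documented above can be exercised outside the kernel. The following is a minimal user-space sketch of the same state machine, not the kernel code itself: `sketch_name_is_valid()` and `SKETCH_NAME_MAX_LEN` are hypothetical names, and the limit of 255 is an assumption standing in for KDBUS_NAME_MAX_LEN, whose value is not shown in this part of the patch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit for illustration; stands in for KDBUS_NAME_MAX_LEN. */
#define SKETCH_NAME_MAX_LEN 255

/* Hypothetical user-space port of the rules listed in the kernel-doc. */
static bool sketch_name_is_valid(const char *p, bool allow_wildcard)
{
	bool dot = true;        /* true while at the start of an element */
	bool found_dot = false; /* true once one separator was seen */
	const char *q;

	assert(p != NULL);

	for (q = p; *q; q++) {
		unsigned char c = (unsigned char)*q;

		if (c == '.') {
			/* empty element: leading dot or two dots in a row */
			if (dot)
				return false;
			found_dot = true;
			dot = true;
		} else {
			/* digits are rejected as first character of an element */
			bool good = isalpha(c) || (!dot && isdigit(c)) ||
				    c == '_' || c == '-' ||
				    (allow_wildcard && dot &&
				     c == '*' && q[1] == '\0');

			if (!good)
				return false;
			dot = false;
		}
	}

	if ((size_t)(q - p) > SKETCH_NAME_MAX_LEN)
		return false;

	/* reject a trailing dot (or empty string) and single-element names */
	return !dot && found_dot;
}
```

With these rules, "org.freedesktop.DBus" is accepted, while "org", "org..x", "org.x." and "org.1x" are rejected. "org.*" is accepted only when wildcards are allowed.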
++
++/**
++ * kdbus_cmd_name_acquire() - handle KDBUS_CMD_NAME_ACQUIRE
++ * @conn: connection to operate on
++ * @argp: command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_name_acquire(struct kdbus_conn *conn, void __user *argp)
++{
++ const char *item_name;
++ struct kdbus_cmd *cmd;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ { .type = KDBUS_ITEM_NAME, .mandatory = true },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
++ KDBUS_NAME_REPLACE_EXISTING |
++ KDBUS_NAME_ALLOW_REPLACEMENT |
++ KDBUS_NAME_QUEUE,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ if (!kdbus_conn_is_ordinary(conn))
++ return -EOPNOTSUPP;
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret != 0)
++ return ret;
++
++ item_name = argv[1].item->str;
++ if (!kdbus_name_is_valid(item_name, false)) {
++ ret = -EINVAL;
++ goto exit;
++ }
++
++ ret = kdbus_name_acquire(conn->ep->bus->name_registry, conn, item_name,
++ cmd->flags, &cmd->return_flags);
++
++exit:
++ return kdbus_args_clear(&args, ret);
++}
++
++/**
++ * kdbus_cmd_name_release() - handle KDBUS_CMD_NAME_RELEASE
++ * @conn: connection to operate on
++ * @argp: command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_name_release(struct kdbus_conn *conn, void __user *argp)
++{
++ struct kdbus_cmd *cmd;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ { .type = KDBUS_ITEM_NAME, .mandatory = true },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ if (!kdbus_conn_is_ordinary(conn))
++ return -EOPNOTSUPP;
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret != 0)
++ return ret;
++
++ ret = kdbus_name_release(conn->ep->bus->name_registry, conn,
++ argv[1].item->str);
++ return kdbus_args_clear(&args, ret);
++}
++
++static int kdbus_list_write(struct kdbus_conn *conn,
++ struct kdbus_conn *c,
++ struct kdbus_pool_slice *slice,
++ size_t *pos,
++ struct kdbus_name_owner *o,
++ bool write)
++{
++ struct kvec kvec[4];
++ size_t cnt = 0;
++ int ret;
++
++ /* info header */
++ struct kdbus_info info = {
++ .size = 0,
++ .id = c->id,
++ .flags = c->flags,
++ };
++
++ /* fake the header of a kdbus_name item */
++ struct {
++ u64 size;
++ u64 type;
++ u64 flags;
++ } h = {};
++
++ if (o && !kdbus_conn_policy_see_name_unlocked(conn, current_cred(),
++ o->name->name))
++ return 0;
++
++ kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &info.size);
++
++ /* append name */
++ if (o) {
++ size_t slen = strlen(o->name->name) + 1;
++
++ h.size = offsetof(struct kdbus_item, name.name) + slen;
++ h.type = KDBUS_ITEM_OWNED_NAME;
++ h.flags = o->flags;
++
++ kdbus_kvec_set(&kvec[cnt++], &h, sizeof(h), &info.size);
++ kdbus_kvec_set(&kvec[cnt++], o->name->name, slen, &info.size);
++ cnt += !!kdbus_kvec_pad(&kvec[cnt], &info.size);
++ }
++
++ if (write) {
++ ret = kdbus_pool_slice_copy_kvec(slice, *pos, kvec,
++ cnt, info.size);
++ if (ret < 0)
++ return ret;
++ }
++
++ *pos += info.size;
++ return 0;
++}
++
++static int kdbus_list_all(struct kdbus_conn *conn, u64 flags,
++ struct kdbus_pool_slice *slice,
++ size_t *pos, bool write)
++{
++ struct kdbus_conn *c;
++ size_t p = *pos;
++ int ret, i;
++
++ hash_for_each(conn->ep->bus->conn_hash, i, c, hentry) {
++ bool added = false;
++
++ /* skip monitors */
++ if (kdbus_conn_is_monitor(c))
++ continue;
++
++ /* all names the connection owns */
++ if (flags & (KDBUS_LIST_NAMES |
++ KDBUS_LIST_ACTIVATORS |
++ KDBUS_LIST_QUEUED)) {
++ struct kdbus_name_owner *o;
++
++ list_for_each_entry(o, &c->names_list, conn_entry) {
++ if (o->flags & KDBUS_NAME_ACTIVATOR) {
++ if (!(flags & KDBUS_LIST_ACTIVATORS))
++ continue;
++
++ ret = kdbus_list_write(conn, c, slice,
++ &p, o, write);
++ if (ret < 0) {
++ mutex_unlock(&c->lock);
++ return ret;
++ }
++
++ added = true;
++ } else if (o->flags & KDBUS_NAME_IN_QUEUE) {
++ if (!(flags & KDBUS_LIST_QUEUED))
++ continue;
++
++ ret = kdbus_list_write(conn, c, slice,
++ &p, o, write);
++ if (ret < 0) {
++ mutex_unlock(&c->lock);
++ return ret;
++ }
++
++ added = true;
++ } else if (flags & KDBUS_LIST_NAMES) {
++ ret = kdbus_list_write(conn, c, slice,
++ &p, o, write);
++ if (ret < 0) {
++ mutex_unlock(&c->lock);
++ return ret;
++ }
++
++ added = true;
++ }
++ }
++ }
++
++ /* nothing added so far, just add the unique ID */
++ if (!added && (flags & KDBUS_LIST_UNIQUE)) {
++ ret = kdbus_list_write(conn, c, slice, &p, NULL, write);
++ if (ret < 0)
++ return ret;
++ }
++ }
++
++ *pos = p;
++ return 0;
++}
++
++/**
++ * kdbus_cmd_list() - handle KDBUS_CMD_LIST
++ * @conn: connection to operate on
++ * @argp: command payload
++ *
++ * Return: >=0 on success, negative error code on failure.
++ */
++int kdbus_cmd_list(struct kdbus_conn *conn, void __user *argp)
++{
++ struct kdbus_name_registry *reg = conn->ep->bus->name_registry;
++ struct kdbus_pool_slice *slice = NULL;
++ struct kdbus_cmd_list *cmd;
++ size_t pos, size;
++ int ret;
++
++ struct kdbus_arg argv[] = {
++ { .type = KDBUS_ITEM_NEGOTIATE },
++ };
++ struct kdbus_args args = {
++ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
++ KDBUS_LIST_UNIQUE |
++ KDBUS_LIST_NAMES |
++ KDBUS_LIST_ACTIVATORS |
++ KDBUS_LIST_QUEUED,
++ .argv = argv,
++ .argc = ARRAY_SIZE(argv),
++ };
++
++ ret = kdbus_args_parse(&args, argp, &cmd);
++ if (ret != 0)
++ return ret;
++
++ /* lock order: domain -> bus -> ep -> names -> conn */
++ down_read(&reg->rwlock);
++ down_read(&conn->ep->bus->conn_rwlock);
++ down_read(&conn->ep->policy_db.entries_rwlock);
++
++ /* size of records */
++ size = 0;
++ ret = kdbus_list_all(conn, cmd->flags, NULL, &size, false);
++ if (ret < 0)
++ goto exit_unlock;
++
++ if (size == 0) {
++ kdbus_pool_publish_empty(conn->pool, &cmd->offset,
++ &cmd->list_size);
++ } else {
++ slice = kdbus_pool_slice_alloc(conn->pool, size, false);
++ if (IS_ERR(slice)) {
++ ret = PTR_ERR(slice);
++ slice = NULL;
++ goto exit_unlock;
++ }
++
++ /* copy the records */
++ pos = 0;
++ ret = kdbus_list_all(conn, cmd->flags, slice, &pos, true);
++ if (ret < 0)
++ goto exit_unlock;
++
++ WARN_ON(pos != size);
++ kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->list_size);
++ }
++
++ if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
++ kdbus_member_set_user(&cmd->list_size, argp,
++ typeof(*cmd), list_size))
++ ret = -EFAULT;
++
++exit_unlock:
++ up_read(&conn->ep->policy_db.entries_rwlock);
++ up_read(&conn->ep->bus->conn_rwlock);
++ up_read(&reg->rwlock);
++ kdbus_pool_slice_release(slice);
++ return kdbus_args_clear(&args, ret);
++}
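kdbus_cmd_list() walks the records twice: once with write=false to compute the total size, then again with write=true to copy into a slice of exactly that size. A minimal user-space sketch of this measure-then-write pattern follows. All `sketch_*` names are hypothetical, and a malloc() buffer and 8-byte padding are assumptions standing in for the pool slice and the padding applied by kdbus_kvec_pad().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Round up to a multiple of 8 (assumed record alignment). */
#define SKETCH_ALIGN8(n) (((n) + 7u) & ~(size_t)7u)

/* Account for one record; copy it only in the write pass. */
static void sketch_emit(char *buf, size_t *pos, const char *name, bool write)
{
	size_t len = strlen(name) + 1;
	size_t padded = SKETCH_ALIGN8(len);

	if (write) {
		memcpy(buf + *pos, name, len);
		memset(buf + *pos + len, 0, padded - len);
	}

	*pos += padded;
}

/* Same traversal for both passes, so both agree on the size. */
static void sketch_list_all(char *buf, size_t *pos,
			    const char *const *names, size_t n, bool write)
{
	size_t i, p = *pos;

	for (i = 0; i < n; i++)
		sketch_emit(buf, &p, names[i], write);

	*pos = p;
}

/* Returns a malloc()ed buffer holding all records, or NULL on failure. */
static char *sketch_list(const char *const *names, size_t n, size_t *size_out)
{
	size_t size = 0, pos = 0;
	char *buf;

	/* pass 1: measure only */
	sketch_list_all(NULL, &size, names, n, false);

	buf = malloc(size ? size : 1);
	if (!buf)
		return NULL;

	/* pass 2: copy the records */
	sketch_list_all(buf, &pos, names, n, true);
	assert(pos == size);

	*size_out = size;
	return buf;
}
```

The pattern is only safe if nothing changes between the two passes. In the kernel code this is why the registry, connection and policy locks are held across both calls to kdbus_list_all().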
+diff --git a/ipc/kdbus/names.h b/ipc/kdbus/names.h
+new file mode 100644
+index 0000000..edac59d
+--- /dev/null
++++ b/ipc/kdbus/names.h
+@@ -0,0 +1,105 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_NAMES_H
++#define __KDBUS_NAMES_H
++
++#include <linux/hashtable.h>
++#include <linux/rwsem.h>
++
++struct kdbus_name_entry;
++struct kdbus_name_owner;
++struct kdbus_name_registry;
++
++/**
++ * struct kdbus_name_registry - names registered for a bus
++ * @entries_hash: Map of entries
++ * @rwlock: Registry data lock
++ * @name_seq_last: Last used sequence number to assign to a name entry
++ */
++struct kdbus_name_registry {
++ DECLARE_HASHTABLE(entries_hash, 8);
++ struct rw_semaphore rwlock;
++ u64 name_seq_last;
++};
++
++/**
++ * struct kdbus_name_entry - well-known name entry
++ * @name_id: sequence number of name entry to be able to uniquely
++ * identify a name over its registration lifetime
++ * @activator: activator of this name, or NULL
++ * @queue: list of queued owners
++ * @hentry: entry in registry map
++ * @name: well-known name
++ */
++struct kdbus_name_entry {
++ u64 name_id;
++ struct kdbus_name_owner *activator;
++ struct list_head queue;
++ struct hlist_node hentry;
++ char name[];
++};
++
++/**
++ * struct kdbus_name_owner - owner of a well-known name
++ * @flags: KDBUS_NAME_* flags of this owner
++ * @conn: connection owning the name
++ * @name: name that is owned
++ * @conn_entry: link into @conn
++ * @name_entry: link into @name
++ */
++struct kdbus_name_owner {
++ u64 flags;
++ struct kdbus_conn *conn;
++ struct kdbus_name_entry *name;
++ struct list_head conn_entry;
++ struct list_head name_entry;
++};
++
++bool kdbus_name_is_valid(const char *p, bool allow_wildcard);
++
++struct kdbus_name_registry *kdbus_name_registry_new(void);
++void kdbus_name_registry_free(struct kdbus_name_registry *reg);
++
++struct kdbus_name_entry *
++kdbus_name_lookup_unlocked(struct kdbus_name_registry *reg, const char *name);
++
++int kdbus_name_acquire(struct kdbus_name_registry *reg,
++ struct kdbus_conn *conn, const char *name,
++ u64 flags, u64 *return_flags);
++void kdbus_name_release_all(struct kdbus_name_registry *reg,
++ struct kdbus_conn *conn);
++
++int kdbus_cmd_name_acquire(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_name_release(struct kdbus_conn *conn, void __user *argp);
++int kdbus_cmd_list(struct kdbus_conn *conn, void __user *argp);
++
++/**
++ * kdbus_name_get_owner() - get current owner of a name
++ * @name: name to get current owner of
++ *
++ * This returns a pointer to the current owner of a name (or its activator if
++ * there is no owner). The caller must make sure @name is valid and does not
++ * vanish.
++ *
++ * Return: Pointer to current owner or NULL if there is none.
++ */
++static inline struct kdbus_name_owner *
++kdbus_name_get_owner(struct kdbus_name_entry *name)
++{
++ return list_first_entry_or_null(&name->queue, struct kdbus_name_owner,
++ name_entry) ? : name->activator;
++}
++
++#endif
+diff --git a/ipc/kdbus/node.c b/ipc/kdbus/node.c
+new file mode 100644
+index 0000000..89f58bc
+--- /dev/null
++++ b/ipc/kdbus/node.c
+@@ -0,0 +1,897 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/atomic.h>
++#include <linux/fs.h>
++#include <linux/idr.h>
++#include <linux/kdev_t.h>
++#include <linux/rbtree.h>
++#include <linux/rwsem.h>
++#include <linux/sched.h>
++#include <linux/slab.h>
++#include <linux/wait.h>
++
++#include "bus.h"
++#include "domain.h"
++#include "endpoint.h"
++#include "fs.h"
++#include "handle.h"
++#include "node.h"
++#include "util.h"
++
++/**
++ * DOC: kdbus nodes
++ *
++ * Nodes unify lifetime management across exposed kdbus objects and provide a
++ * hierarchy. Each kdbus object that might be exposed to user-space has a
++ * kdbus_node object embedded and is linked into the hierarchy. Each node can
++ * have any number (0-n) of child nodes linked. Each child retains a reference
++ * to its parent node. For root-nodes, the parent is NULL.
++ *
++ * Each node object goes through a bunch of states during its lifetime:
++ * * NEW
++ * * LINKED (can be skipped by NEW->FREED transition)
++ * * ACTIVE (can be skipped by LINKED->INACTIVE transition)
++ * * INACTIVE
++ * * DRAINED
++ * * FREED
++ *
++ * Each node is allocated by the caller and initialized via kdbus_node_init().
++ * This never fails and sets the object into state NEW. From now on, ref-counts
++ * on the node manage its lifetime. During init, the ref-count is set to 1. Once
++ * it drops to 0, the node goes to state FREED and the node->free_cb() callback
++ * is called to deallocate any memory.
++ *
++ * After initializing a node, you usually link it into the hierarchy. You need
++ * to provide a parent node and a name. The node will be linked as child to the
++ * parent and a globally unique ID is assigned to the child. The name of the
++ * child must be unique for all children of this parent. Otherwise, linking the
++ * child will fail with -EEXIST.
++ * Note that the child is not marked active, yet. Admittedly, it prevents any
++ * other node from being linked with the same name (thus, it reserves that
++ * name), but any child-lookup (via name or unique ID) will never return this
++ * child unless it has been marked active.
++ *
++ * Once successfully linked, you can use kdbus_node_activate() to activate a
++ * child. This will mark the child active. This state can be skipped by directly
++ * deactivating the child via kdbus_node_deactivate() (see below).
++ * By activating a child, you enable any lookups on this child to succeed from
++ * now on. Furthermore, any code that got its hands on a reference to the node,
++ * can from now on "acquire" the node.
++ *
++ * Active References (or: 'acquiring' and 'releasing' a node)
++ * In addition to normal object references, nodes support something we call
++ * "active references". An active reference can be acquired via
++ * kdbus_node_acquire() and released via kdbus_node_release(). A caller
++ * _must_ own a normal object reference whenever calling those functions.
++ * Unlike object references, acquiring an active reference can fail (by
++ * returning 'false' from kdbus_node_acquire()). An active reference can
++ * only be acquired if the node is marked active. If it is not marked
++ * active, yet, or if it was already deactivated, no more active references
++ * can be acquired, ever!
++ * Active references are used to track tasks working on a node. Whenever a
++ * task enters kernel-space to perform an action on a node, it acquires an
++ * active reference, performs the action and releases the reference again.
++ * While holding an active reference, the node is guaranteed to stay active.
++ * If the node is deactivated in parallel, the node is marked as
++ * deactivated, then we wait for all active references to be dropped, before
++ * we finally proceed with any cleanups. That is, if you hold an active
++ * reference to a node, any resources that are bound to the "active" state
++ * are guaranteed to stay accessible until you release your reference.
++ *
++ * Active-references are very similar to rw-locks, where acquiring a node is
++ * equal to try-read-lock and releasing to read-unlock. Deactivating a node
++ * means write-lock and never releasing it again.
++ * Unlike rw-locks, the 'active reference' concept is more versatile and
++ * avoids unusual rw-lock usage (never releasing a write-lock..).
++ *
++ * It is safe to acquire multiple active-references recursively. But you
++ * need to check the return value of kdbus_node_acquire() on _each_ call. It
++ * may stop granting references at _any_ time.
++ *
++ * You're free to perform any operations you want while holding an active
++ * reference, except sleeping for an indefinite period. Sleeping for a fixed
++ * amount of time is fine, but you usually should not wait on wait-queues
++ * without a timeout.
++ * For example, if you wait for I/O to happen, you should gather all data
++ * and schedule the I/O operation, then release your active reference and
++ * wait for it to complete. Then try to acquire a new reference. If it
++ * fails, perform any cleanup (the node is now dead). Otherwise, you can
++ * finish your operation.
++ *
++ * All nodes can be deactivated via kdbus_node_deactivate() at any time. You can
++ * call this multiple times, even in parallel or on nodes that were never
++ * linked, and it will just work. The only restriction is, you must not hold an
++ * active reference when calling kdbus_node_deactivate().
++ * By deactivating a node, it is immediately marked inactive. Then, we wait for
++ * all active references to be released (called 'draining' the node). This
++ * shouldn't take very long as we don't perform long-lasting operations while
++ * holding an active reference. Note that once the node is marked inactive, no
++ * new active references can be acquired.
++ * Once all active references are dropped, the node is considered 'drained'. Now
++ * kdbus_node_deactivate() is called on each child of the node before we
++ * continue deactivating our node. That is, once all children are entirely
++ * deactivated, we call ->release_cb() of our node. ->release_cb() can release
++ * any resources on that node which are bound to the "active" state of a node.
++ * When done, we unlink the node from its parent rb-tree, mark it as
++ * 'released' and return.
++ * If kdbus_node_deactivate() is called multiple times (even in parallel), all
++ * but one caller will just wait until the node is fully deactivated. That is,
++ * one random caller of kdbus_node_deactivate() is selected to call
++ * ->release_cb() and cleanup the node. Only once all this is done, all other
++ * callers will return from kdbus_node_deactivate(). That is, it doesn't matter
++ * whether you're the selected caller or not, it will only return after
++ * everything is fully done.
++ *
++ * When a node is activated, we acquire a normal object reference to the node.
++ * This reference is dropped after deactivation is fully done (and only iff the
++ * node really was activated). This allows callers to link+activate a child node
++ * and then drop all refs. The node will be deactivated together with the
++ * parent, and then be freed when this reference is dropped.
++ *
++ * Currently, nodes provide a bunch of resources that external code can use
++ * directly. This includes:
++ *
++ * * node->waitq: Each node has its own wait-queue that is used to manage
++ * the 'active' state. When a node is deactivated, we wait on
++ * this queue until all active refs are dropped. Analogously,
++ * when you release an active reference on a deactivated
++ * node, and the active ref-count drops to 0, we wake up a
++ * single thread on this queue. Furthermore, once the
++ * ->release_cb() callback finished, we wake up all waiters.
++ * The node-owner is free to re-use this wait-queue for other
++ * purposes. As node-management uses this queue only during
++ * deactivation, it is usually totally fine to re-use the
++ * queue for other, preferably low-overhead, use-cases.
++ *
++ * * node->type: This field defines the type of the owner of this node. It
++ * must be set during node initialization and must remain
++ * constant. The node management never looks at this value,
++ * but external users might use it to gain access to the owner
++ * object of a node.
++ * It is totally up to the owner of the node to define what
++ * their type means. Usually it means you can access the
++ * parent structure via container_of(), as long as you hold an
++ * active reference to the node.
++ *
++ * * node->free_cb: callback after all references are dropped
++ * node->release_cb: callback during node deactivation
++ * These fields must be set by the node owner during
++ * node initialization. They must remain constant. If
++ * NULL, they're skipped.
++ *
++ * * node->mode: filesystem access modes
++ * node->uid: filesystem owner uid
++ * node->gid: filesystem owner gid
++ * These fields must be set by the node owner during node
++ * initialization. They must remain constant and may be
++ * accessed by other callers to properly initialize
++ * filesystem nodes.
++ *
++ * * node->id: This is an unsigned 32bit integer allocated by an IDA. It is
++ * always kept as small as possible during allocation and is
++ * globally unique across all nodes allocated by this module. 0
++ * is reserved as "not assigned" and is the default.
++ * The ID is assigned during kdbus_node_link() and is kept until
++ * the object is freed. Thus, the ID surpasses the active
++ * lifetime of a node. As long as you hold an object reference
++ * to a node (and the node was linked once), the ID is valid and
++ * unique.
++ *
++ * * node->name: name of this node
++ * node->hash: 31bit hash-value of @name (range [2..INT_MAX-1])
++ * These values follow the same lifetime rules as node->id.
++ * They're initialized when the node is linked and then remain
++ * constant until the last object reference is dropped.
++ * Unlike the id, the name is only unique across all siblings
++ * and only until the node is deactivated. Currently, the name
++ * is even unique if linked but not activated, yet. This might
++ * change in the future, though. Code should not rely on this.
++ *
++ * * node->lock: lock to protect node->children, node->rb, node->parent
++ * * node->parent: Reference to parent node. This is set during LINK time
++ * and is dropped during destruction. You must not access
++ * it unless you hold an active reference to the node or if
++ * you know the node is dead.
++ * * node->children: rb-tree of all linked children of this node. You must
++ * not access this directly, but use one of the iterator
++ * or lookup helpers.
++ */
++
++/*
++ * Bias values track states of "active references". They're all negative. If a
++ * node is active, its active-ref-counter is >=0 and tracks all active
++ * references. Once a node is deactivated, we subtract NODE_BIAS. This means, the
++ * counter is now negative but still counts the active references. Once it drops
++ * to exactly NODE_BIAS, we know all active references were dropped. Exactly one
++ * thread will change it to NODE_RELEASE now, perform cleanup and then put it
++ * into NODE_DRAINED. Once drained, all other threads that tried deactivating
++ * the node will now be woken up (thus, they wait until the node is fully done).
++ * The initial state during node-setup is NODE_NEW. If a node is directly
++ * deactivated without having ever been active, it is put into
++ * NODE_RELEASE_DIRECT instead of NODE_BIAS. This tracks this one-bit state
++ * across node-deactivation. The task putting it into NODE_RELEASE now knows
++ * whether the node was active before or not.
++ *
++ * Some archs implement atomic_sub(v) with atomic_add(-v), so reserve INT_MIN
++ * to avoid overflows if multiplied by -1.
++ */
++#define KDBUS_NODE_BIAS (INT_MIN + 5)
++#define KDBUS_NODE_RELEASE_DIRECT (KDBUS_NODE_BIAS - 1)
++#define KDBUS_NODE_RELEASE (KDBUS_NODE_BIAS - 2)
++#define KDBUS_NODE_DRAINED (KDBUS_NODE_BIAS - 3)
++#define KDBUS_NODE_NEW (KDBUS_NODE_BIAS - 4)
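The bias scheme described in the comment above can be modeled in user space with C11 atomics. This is a simplified sketch under stated assumptions, not the kernel implementation: all `sketch_*` names are hypothetical, only the active and deactivated states are modeled (NEW, RELEASE_DIRECT, RELEASE and DRAINED are left out), and the wait-queue is replaced by a boolean return value that tells the caller when the node has drained.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Same value as KDBUS_NODE_BIAS; negative and far below any real count. */
#define SKETCH_BIAS (INT_MIN + 5)

struct sketch_node {
	atomic_int active; /* >= 0: active; < 0: deactivated, still counting */
};

/* Try to take an active reference; fails once the counter is negative. */
static bool sketch_acquire(struct sketch_node *n)
{
	int v = atomic_load(&n->active);

	while (v >= 0 && v < INT_MAX) {
		/* on failure, v is reloaded with the current value */
		if (atomic_compare_exchange_weak(&n->active, &v, v + 1))
			return true;
	}

	return false;
}

/*
 * Drop an active reference. Returns true if this was the last reference
 * of an already deactivated node, i.e. the counter reached the bias.
 */
static bool sketch_release(struct sketch_node *n)
{
	return atomic_fetch_sub(&n->active, 1) - 1 == SKETCH_BIAS;
}

/*
 * Deactivate an active node by adding the (negative) bias. Returns true
 * if no active references were held, i.e. the node is drained already.
 */
static bool sketch_deactivate(struct sketch_node *n)
{
	return atomic_fetch_add(&n->active, SKETCH_BIAS) + SKETCH_BIAS ==
	       SKETCH_BIAS;
}
```

Because the bias is added to the live counter rather than stored over it, references taken before deactivation are still counted afterwards, and a single atomic value encodes both the state and the number of outstanding references.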
++
++/* global unique ID mapping for kdbus nodes */
++DEFINE_IDA(kdbus_node_ida);
++
++/**
++ * kdbus_node_name_hash() - hash a name
++ * @name: The string to hash
++ *
++ * This computes the hash of @name. It is guaranteed to be in the range
++ * [2..INT_MAX-1]. The values 0, 1 and INT_MAX are unused as they are reserved
++ * for the filesystem code.
++ *
++ * Return: hash value of the passed string
++ */
++static unsigned int kdbus_node_name_hash(const char *name)
++{
++ unsigned int hash;
++
++ /* reserve hash numbers 0, 1 and >=INT_MAX for magic directories */
++ hash = kdbus_strhash(name) & INT_MAX;
++ if (hash < 2)
++ hash += 2;
++ if (hash >= INT_MAX)
++ hash = INT_MAX - 1;
++
++ return hash;
++}
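The clamping step can be checked in isolation. The sketch below takes an already computed raw hash instead of calling kdbus_strhash(), whose definition is not part of this section; `sketch_clamp_hash()` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Map an arbitrary 32-bit hash into the range [2..INT_MAX-1]. */
static unsigned int sketch_clamp_hash(unsigned int raw)
{
	/* drop the top bit so the value fits a non-negative int */
	unsigned int hash = raw & (unsigned int)INT_MAX;

	/* 0 and 1 are reserved: shift them up to 2 and 3 */
	if (hash < 2)
		hash += 2;

	/* INT_MAX is reserved as well: fold it onto INT_MAX - 1 */
	if (hash >= (unsigned int)INT_MAX)
		hash = INT_MAX - 1;

	assert(hash >= 2 && hash < (unsigned int)INT_MAX);
	return hash;
}
```

The mapping is not injective (raw values 0 and 2 both yield 2, for example). That is harmless here because kdbus_node_name_compare() falls back to strcmp() whenever two hashes are equal.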
++
++/**
++ * kdbus_node_name_compare() - compare a name with a node's name
++ * @hash: hash of the string to compare the node with
++ * @name: name to compare the node with
++ * @node: node to compare the name with
++ *
++ * Return: 0 if @name and @hash exactly match the information in @node, or
++ * an integer less than or greater than zero if @name is found, respectively,
++ * to be less than or be greater than the string stored in @node.
++ */
++static int kdbus_node_name_compare(unsigned int hash, const char *name,
++ const struct kdbus_node *node)
++{
++ if (hash != node->hash)
++ return hash - node->hash;
++
++ return strcmp(name, node->name);
++}
++
++/**
++ * kdbus_node_init() - initialize a kdbus_node
++ * @node: Pointer to the node to initialize
++ * @type: The type the node will have (KDBUS_NODE_*)
++ *
++ * The caller is responsible for allocating @node and initializing it to zero.
++ * Once this call returns, you must use the node_ref() and node_unref()
++ * functions to manage this node.
++ */
++void kdbus_node_init(struct kdbus_node *node, unsigned int type)
++{
++ atomic_set(&node->refcnt, 1);
++ mutex_init(&node->lock);
++ node->id = 0;
++ node->type = type;
++ RB_CLEAR_NODE(&node->rb);
++ node->children = RB_ROOT;
++ init_waitqueue_head(&node->waitq);
++ atomic_set(&node->active, KDBUS_NODE_NEW);
++}
++
++/**
++ * kdbus_node_link() - link a node into the nodes system
++ * @node: Pointer to the node to link
++ * @parent: Pointer to a parent node, may be %NULL
++ * @name: The name of the node (or NULL if root node)
++ *
++ * This links a node into the hierarchy. This must not be called multiple times.
++ * If @parent is NULL, the node becomes a new root node.
++ *
++ * This call will fail if @name is not unique across all its siblings or if no
++ * ID could be allocated. You must not activate a node if linking failed! It is
++ * safe to deactivate it, though.
++ *
++ * Once you linked a node, you must call kdbus_node_deactivate() before you drop
++ * the last reference (even if you never activate the node).
++ *
++ * Return: 0 on success, negative error code otherwise.
++ */
++int kdbus_node_link(struct kdbus_node *node, struct kdbus_node *parent,
++ const char *name)
++{
++ int ret;
++
++ if (WARN_ON(node->type != KDBUS_NODE_DOMAIN && !parent))
++ return -EINVAL;
++
++ if (WARN_ON(parent && !name))
++ return -EINVAL;
++
++ if (name) {
++ node->name = kstrdup(name, GFP_KERNEL);
++ if (!node->name)
++ return -ENOMEM;
++
++ node->hash = kdbus_node_name_hash(name);
++ }
++
++ ret = ida_simple_get(&kdbus_node_ida, 1, 0, GFP_KERNEL);
++ if (ret < 0)
++ return ret;
++
++ node->id = ret;
++ ret = 0;
++
++ if (parent) {
++ struct rb_node **n, *prev;
++
++ if (!kdbus_node_acquire(parent))
++ return -ESHUTDOWN;
++
++ mutex_lock(&parent->lock);
++
++ n = &parent->children.rb_node;
++ prev = NULL;
++
++ while (*n) {
++ struct kdbus_node *pos;
++ int result;
++
++ pos = kdbus_node_from_rb(*n);
++ prev = *n;
++ result = kdbus_node_name_compare(node->hash,
++ node->name,
++ pos);
++ if (result == 0) {
++ ret = -EEXIST;
++ goto exit_unlock;
++ }
++
++ if (result < 0)
++ n = &pos->rb.rb_left;
++ else
++ n = &pos->rb.rb_right;
++ }
++
++ /* add new node and rebalance the tree */
++ rb_link_node(&node->rb, prev, n);
++ rb_insert_color(&node->rb, &parent->children);
++ node->parent = kdbus_node_ref(parent);
++
++exit_unlock:
++ mutex_unlock(&parent->lock);
++ kdbus_node_release(parent);
++ }
++
++ return ret;
++}
++
++/**
++ * kdbus_node_ref() - Acquire object reference
++ * @node: node to acquire reference to (or NULL)
++ *
++ * This acquires a new reference to @node. You must already own a reference when
++ * calling this!
++ * If @node is NULL, this is a no-op.
++ *
++ * Return: @node is returned
++ */
++struct kdbus_node *kdbus_node_ref(struct kdbus_node *node)
++{
++ if (node)
++ atomic_inc(&node->refcnt);
++ return node;
++}
++
++/**
++ * kdbus_node_unref() - Drop object reference
++ * @node: node to drop reference to (or NULL)
++ *
++ * This drops an object reference to @node. You must not access the node if you
++ * no longer own a reference.
++ * If the ref-count drops to 0, the object will be destroyed (->free_cb will be
++ * called).
++ *
++ * If you linked or activated the node, you must deactivate the node before you
++ * drop your last reference! If you didn't link or activate the node, you can
++ * drop any reference you want.
++ *
++ * Note that this calls into ->free_cb() and thus _might_ sleep. The ->free_cb()
++ * callbacks must not acquire any outer locks, though. So you can safely drop
++ * references while holding locks.
++ *
++ * If @node is NULL, this is a no-op.
++ *
++ * Return: This always returns NULL
++ */
++struct kdbus_node *kdbus_node_unref(struct kdbus_node *node)
++{
++ if (node && atomic_dec_and_test(&node->refcnt)) {
++ struct kdbus_node safe = *node;
++
++ WARN_ON(atomic_read(&node->active) != KDBUS_NODE_DRAINED);
++ WARN_ON(!RB_EMPTY_NODE(&node->rb));
++
++ if (node->free_cb)
++ node->free_cb(node);
++ if (safe.id > 0)
++ ida_simple_remove(&kdbus_node_ida, safe.id);
++
++ kfree(safe.name);
++
++ /*
++ * kdbusfs relies on the parent to be available even after the
++ * node was deactivated and unlinked. Therefore, we pin it
++ * until a node is destroyed.
++ */
++ kdbus_node_unref(safe.parent);
++ }
++
++ return NULL;
++}
++
++/**
++ * kdbus_node_is_active() - test whether a node is active
++ * @node: node to test
++ *
++ * This checks whether @node is active. That means, @node was linked and
++ * activated by the node owner and hasn't been deactivated, yet. If, and only
++ * if, a node is active, kdbus_node_acquire() will be able to acquire active
++ * references.
++ *
++ * Note that this function does not give any lifetime guarantees. After this
++ * call returns, the node might be deactivated immediately. Normally, what you
++ * want is to acquire a real active reference via kdbus_node_acquire().
++ *
++ * Return: true if @node is active, false otherwise
++ */
++bool kdbus_node_is_active(struct kdbus_node *node)
++{
++ return atomic_read(&node->active) >= 0;
++}
++
++/**
++ * kdbus_node_is_deactivated() - test whether a node was already deactivated
++ * @node: node to test
++ *
++ * This checks whether kdbus_node_deactivate() was called on @node. Note that
++ * this might be true even if you never deactivated the node directly, but only
++ * one of its ancestors.
++ *
++ * Note that even if this returns 'false', the node might get deactivated
++ * immediately after the call returns.
++ *
++ * Return: true if @node was already deactivated, false if not
++ */
++bool kdbus_node_is_deactivated(struct kdbus_node *node)
++{
++ int v;
++
++ v = atomic_read(&node->active);
++ return v != KDBUS_NODE_NEW && v < 0;
++}
++
++/**
++ * kdbus_node_activate() - activate a node
++ * @node: node to activate
++ *
++ * This marks @node as active if, and only if, the node was neither activated
++ * nor deactivated yet, and the parent is still active. Any but the first call
++ * to kdbus_node_activate() is a no-op.
++ * If you called kdbus_node_deactivate() before, then even the first call to
++ * kdbus_node_activate() will be a no-op.
++ *
++ * This call doesn't give any lifetime guarantees. The node might get
++ * deactivated immediately after this call returns. Or the parent might already
++ * be deactivated, which will make this call a no-op.
++ *
++ * If this call successfully activated a node, it will take an object reference
++ * to it. This reference is dropped after the node is deactivated. Therefore,
++ * the object owner can safely drop their reference to @node iff they know that
++ * its parent node will get deactivated at some point. Once the parent node is
++ * deactivated, it will deactivate all its children and thus drop this reference
++ * again.
++ *
++ * Return: True if this call successfully activated the node, otherwise false.
++ * Note that this might return false, even if the node is still active
++ * (eg., if you called this a second time).
++ */
++bool kdbus_node_activate(struct kdbus_node *node)
++{
++ bool res = false;
++
++ mutex_lock(&node->lock);
++ if (atomic_read(&node->active) == KDBUS_NODE_NEW) {
++ atomic_sub(KDBUS_NODE_NEW, &node->active);
++ /* activated nodes have ref +1 */
++ kdbus_node_ref(node);
++ res = true;
++ }
++ mutex_unlock(&node->lock);
++
++ return res;
++}
++
++/**
++ * kdbus_node_deactivate() - deactivate a node
++ * @node: The node to deactivate.
++ *
++ * This function recursively deactivates this node and all its children. It
++ * returns only once all children and the node itself were recursively disabled
++ * (even if you call this function multiple times in parallel).
++ *
++ * It is safe to call this function on _any_ node that was initialized _any_
++ * number of times.
++ *
++ * This call may sleep, as it waits for all active references to be dropped.
++ */
++void kdbus_node_deactivate(struct kdbus_node *node)
++{
++ struct kdbus_node *pos, *child;
++ struct rb_node *rb;
++ int v_pre, v_post;
++
++ pos = node;
++
++ /*
++ * To avoid recursion, we perform back-tracking while deactivating
++ * nodes. For each node we enter, we first mark the active-counter as
++ * deactivated by adding BIAS. If the node has children, we set the first
++ * child as current position and start over. If the node has no
++ * children, we drain the node by waiting for all active refs to be
++ * dropped and then releasing the node.
++ *
++ * After the node is released, we set its parent as current position
++ * and start over. If the current position was the initial node, we're
++ * done.
++ *
++ * Note that this function can be called in parallel by multiple
++ * callers. We make sure that each node is only released once, and any
++ * racing caller will wait until the other thread fully released that
++ * node.
++ */
++
++ for (;;) {
++ /*
++ * Add BIAS to node->active to mark it as inactive. If it was
++ * never active before, immediately mark it as RELEASE_DIRECT
++ * so we remember this state.
++ * We cannot remember v_pre as we might iterate into the
++ * children, overwriting v_pre, before we can release our node.
++ */
++ mutex_lock(&pos->lock);
++ v_pre = atomic_read(&pos->active);
++ if (v_pre >= 0)
++ atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
++ else if (v_pre == KDBUS_NODE_NEW)
++ atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
++ mutex_unlock(&pos->lock);
++
++ /* wait until all active references were dropped */
++ wait_event(pos->waitq,
++ atomic_read(&pos->active) <= KDBUS_NODE_BIAS);
++
++ mutex_lock(&pos->lock);
++ /* recurse into first child if any */
++ rb = rb_first(&pos->children);
++ if (rb) {
++ child = kdbus_node_ref(kdbus_node_from_rb(rb));
++ mutex_unlock(&pos->lock);
++ pos = child;
++ continue;
++ }
++
++ /* mark object as RELEASE */
++ v_post = atomic_read(&pos->active);
++ if (v_post == KDBUS_NODE_BIAS ||
++ v_post == KDBUS_NODE_RELEASE_DIRECT)
++ atomic_set(&pos->active, KDBUS_NODE_RELEASE);
++ mutex_unlock(&pos->lock);
++
++ /*
++ * If this is the thread that marked the object as RELEASE, we
++ * perform the actual release. Otherwise, we wait until the
++ * release is done and the node is marked as DRAINED.
++ */
++ if (v_post == KDBUS_NODE_BIAS ||
++ v_post == KDBUS_NODE_RELEASE_DIRECT) {
++ if (pos->release_cb)
++ pos->release_cb(pos, v_post == KDBUS_NODE_BIAS);
++
++ if (pos->parent) {
++ mutex_lock(&pos->parent->lock);
++ if (!RB_EMPTY_NODE(&pos->rb)) {
++ rb_erase(&pos->rb,
++ &pos->parent->children);
++ RB_CLEAR_NODE(&pos->rb);
++ }
++ mutex_unlock(&pos->parent->lock);
++ }
++
++ /* mark as DRAINED */
++ atomic_set(&pos->active, KDBUS_NODE_DRAINED);
++ wake_up_all(&pos->waitq);
++
++ /* drop VFS cache */
++ kdbus_fs_flush(pos);
++
++ /*
++ * If the node was activated and someone added BIAS to
++ * it to deactivate it, we, and only we, are
++ * responsible for releasing the extra ref-count that was
++ * taken once in kdbus_node_activate().
++ * If the node was never activated, no-one ever
++ * added BIAS, but instead skipped that state and
++ * immediately went to NODE_RELEASE_DIRECT. In that case
++ * we must not drop the reference.
++ */
++ if (v_post == KDBUS_NODE_BIAS)
++ kdbus_node_unref(pos);
++ } else {
++ /* wait until object is DRAINED */
++ wait_event(pos->waitq,
++ atomic_read(&pos->active) == KDBUS_NODE_DRAINED);
++ }
++
++ /*
++ * We're done with the current node. Continue on its parent
++ * again, which will try deactivating its next child, or itself
++ * if no child is left.
++ * If we've reached our initial node again, we are done and
++ * can safely return.
++ */
++ if (pos == node)
++ break;
++
++ child = pos;
++ pos = pos->parent;
++ kdbus_node_unref(child);
++ }
++}
++
++/**
++ * kdbus_node_acquire() - Acquire an active ref on a node
++ * @node: The node
++ *
++ * This acquires an active-reference to @node. This will only succeed if the
++ * node is active. You must release this active reference via
++ * kdbus_node_release() again.
++ *
++ * See the introduction to "active references" for more details.
++ *
++ * Return: %true if @node was non-NULL and active
++ */
++bool kdbus_node_acquire(struct kdbus_node *node)
++{
++ return node && atomic_inc_unless_negative(&node->active);
++}
++
++/**
++ * kdbus_node_release() - Release an active ref on a node
++ * @node: The node
++ *
++ * This releases an active reference that was previously acquired via
++ * kdbus_node_acquire(). See kdbus_node_acquire() for details.
++ */
++void kdbus_node_release(struct kdbus_node *node)
++{
++ if (node && atomic_dec_return(&node->active) == KDBUS_NODE_BIAS)
++ wake_up(&node->waitq);
++}
++
++/**
++ * kdbus_node_find_child() - Find child by name
++ * @node: parent node to search through
++ * @name: name of child node
++ *
++ * This searches through all children of @node for a child-node with name @name.
++ * If not found, or if the child is deactivated, NULL is returned. Otherwise,
++ * the child is acquired and a new reference is returned.
++ *
++ * If you're done with the child, you need to release it and drop your
++ * reference.
++ *
++ * This function does not acquire the parent node. However, if the parent was
++ * already deactivated, then kdbus_node_deactivate() will, at some point, also
++ * deactivate the child. Therefore, we can rely on the explicit ordering during
++ * deactivation.
++ *
++ * Return: Reference to acquired child node, or NULL if not found / not active.
++ */
++struct kdbus_node *kdbus_node_find_child(struct kdbus_node *node,
++ const char *name)
++{
++ struct kdbus_node *child;
++ struct rb_node *rb;
++ unsigned int hash;
++ int ret;
++
++ hash = kdbus_node_name_hash(name);
++
++ mutex_lock(&node->lock);
++ rb = node->children.rb_node;
++ while (rb) {
++ child = kdbus_node_from_rb(rb);
++ ret = kdbus_node_name_compare(hash, name, child);
++ if (ret < 0)
++ rb = rb->rb_left;
++ else if (ret > 0)
++ rb = rb->rb_right;
++ else
++ break;
++ }
++ if (rb && kdbus_node_acquire(child))
++ kdbus_node_ref(child);
++ else
++ child = NULL;
++ mutex_unlock(&node->lock);
++
++ return child;
++}
++
++static struct kdbus_node *node_find_closest_unlocked(struct kdbus_node *node,
++ unsigned int hash,
++ const char *name)
++{
++ struct kdbus_node *n, *pos = NULL;
++ struct rb_node *rb;
++ int res;
++
++ /*
++ * Find the closest child with ``node->hash >= hash'', or, if @name is
++ * valid, ``node->name >= name'' (where '>=' is the lex. order).
++ */
++
++ rb = node->children.rb_node;
++ while (rb) {
++ n = kdbus_node_from_rb(rb);
++
++ if (name)
++ res = kdbus_node_name_compare(hash, name, n);
++ else
++ res = hash - n->hash;
++
++ if (res <= 0) {
++ rb = rb->rb_left;
++ pos = n;
++ } else { /* ``hash > n->hash'', ``name > n->name'' */
++ rb = rb->rb_right;
++ }
++ }
++
++ return pos;
++}
++
++/**
++ * kdbus_node_find_closest() - Find closest child-match
++ * @node: parent node to search through
++ * @hash: hash value to find closest match for
++ *
++ * Find the closest child of @node with a hash greater than or equal to @hash.
++ * The closest match is the left-most child of @node with this property. Which
++ * means, it is the first child with that hash returned by
++ * kdbus_node_next_child(), if you'd iterate the whole parent node.
++ *
++ * Return: Reference to acquired child, or NULL if none found.
++ */
++struct kdbus_node *kdbus_node_find_closest(struct kdbus_node *node,
++ unsigned int hash)
++{
++ struct kdbus_node *child;
++ struct rb_node *rb;
++
++ mutex_lock(&node->lock);
++
++ child = node_find_closest_unlocked(node, hash, NULL);
++ while (child && !kdbus_node_acquire(child)) {
++ rb = rb_next(&child->rb);
++ if (rb)
++ child = kdbus_node_from_rb(rb);
++ else
++ child = NULL;
++ }
++ kdbus_node_ref(child);
++
++ mutex_unlock(&node->lock);
++
++ return child;
++}
++
++/**
++ * kdbus_node_next_child() - Acquire next child
++ * @node: parent node
++ * @prev: previous child-node position or NULL
++ *
++ * This function returns a reference to the next active child of @node, after
++ * the passed position @prev. If @prev is NULL, a reference to the first active
++ * child is returned. If no more active children are found, NULL is returned.
++ *
++ * This function acquires the next child it returns. If you're done with the
++ * returned pointer, you need to release _and_ unref it.
++ *
++ * The passed in pointer @prev is not modified by this function, and it does
++ * *not* have to be active. If @prev was acquired via different means, or if it
++ * was unlinked from its parent before you pass it in, then this iterator will
++ * still return the next active child (it will have to search through the
++ * rb-tree based on the node-name, though).
++ * However, @prev must not be linked to a different parent than @node!
++ *
++ * Return: Reference to next acquired child, or NULL if at the end.
++ */
++struct kdbus_node *kdbus_node_next_child(struct kdbus_node *node,
++ struct kdbus_node *prev)
++{
++ struct kdbus_node *pos = NULL;
++ struct rb_node *rb;
++
++ mutex_lock(&node->lock);
++
++ if (!prev) {
++ /*
++ * New iteration; find first node in rb-tree and try to acquire
++ * it. If we got it, directly return it as first element.
++ * Otherwise, the loop below will find the next active node.
++ */
++ rb = rb_first(&node->children);
++ if (!rb)
++ goto exit;
++ pos = kdbus_node_from_rb(rb);
++ if (kdbus_node_acquire(pos))
++ goto exit;
++ } else if (RB_EMPTY_NODE(&prev->rb)) {
++ /*
++ * The current iterator is no longer linked to the rb-tree. Use
++ * its hash value and name to find the next _higher_ node and
++ * acquire it. If we got it, return it as next element.
++ * Otherwise, the loop below will find the next active node.
++ */
++ pos = node_find_closest_unlocked(node, prev->hash, prev->name);
++ if (!pos)
++ goto exit;
++ if (kdbus_node_acquire(pos))
++ goto exit;
++ } else {
++ /*
++ * The current iterator is still linked to the parent. Set it
++ * as current position and use the loop below to find the next
++ * active element.
++ */
++ pos = prev;
++ }
++
++ /* @pos was already returned or is inactive; find next active node */
++ do {
++ rb = rb_next(&pos->rb);
++ if (rb)
++ pos = kdbus_node_from_rb(rb);
++ else
++ pos = NULL;
++ } while (pos && !kdbus_node_acquire(pos));
++
++exit:
++ /* @pos is NULL or acquired. Take ref if non-NULL and return it */
++ kdbus_node_ref(pos);
++ mutex_unlock(&node->lock);
++ return pos;
++}
+diff --git a/ipc/kdbus/node.h b/ipc/kdbus/node.h
+new file mode 100644
+index 0000000..970e02b
+--- /dev/null
++++ b/ipc/kdbus/node.h
+@@ -0,0 +1,86 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_NODE_H
++#define __KDBUS_NODE_H
++
++#include <linux/atomic.h>
++#include <linux/kernel.h>
++#include <linux/mutex.h>
++#include <linux/wait.h>
++
++struct kdbus_node;
++
++enum kdbus_node_type {
++ KDBUS_NODE_DOMAIN,
++ KDBUS_NODE_CONTROL,
++ KDBUS_NODE_BUS,
++ KDBUS_NODE_ENDPOINT,
++};
++
++typedef void (*kdbus_node_free_t) (struct kdbus_node *node);
++typedef void (*kdbus_node_release_t) (struct kdbus_node *node, bool was_active);
++
++struct kdbus_node {
++ atomic_t refcnt;
++ atomic_t active;
++ wait_queue_head_t waitq;
++
++ /* static members */
++ unsigned int type;
++ kdbus_node_free_t free_cb;
++ kdbus_node_release_t release_cb;
++ umode_t mode;
++ kuid_t uid;
++ kgid_t gid;
++
++ /* valid once linked */
++ char *name;
++ unsigned int hash;
++ unsigned int id;
++ struct kdbus_node *parent; /* may be NULL */
++
++ /* valid iff active */
++ struct mutex lock;
++ struct rb_node rb;
++ struct rb_root children;
++};
++
++#define kdbus_node_from_rb(_node) rb_entry((_node), struct kdbus_node, rb)
++
++extern struct ida kdbus_node_ida;
++
++void kdbus_node_init(struct kdbus_node *node, unsigned int type);
++
++int kdbus_node_link(struct kdbus_node *node, struct kdbus_node *parent,
++ const char *name);
++
++struct kdbus_node *kdbus_node_ref(struct kdbus_node *node);
++struct kdbus_node *kdbus_node_unref(struct kdbus_node *node);
++
++bool kdbus_node_is_active(struct kdbus_node *node);
++bool kdbus_node_is_deactivated(struct kdbus_node *node);
++bool kdbus_node_activate(struct kdbus_node *node);
++void kdbus_node_deactivate(struct kdbus_node *node);
++
++bool kdbus_node_acquire(struct kdbus_node *node);
++void kdbus_node_release(struct kdbus_node *node);
++
++struct kdbus_node *kdbus_node_find_child(struct kdbus_node *node,
++ const char *name);
++struct kdbus_node *kdbus_node_find_closest(struct kdbus_node *node,
++ unsigned int hash);
++struct kdbus_node *kdbus_node_next_child(struct kdbus_node *node,
++ struct kdbus_node *prev);
++
++#endif
+diff --git a/ipc/kdbus/notify.c b/ipc/kdbus/notify.c
+new file mode 100644
+index 0000000..375758c
+--- /dev/null
++++ b/ipc/kdbus/notify.c
+@@ -0,0 +1,204 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/fs.h>
++#include <linux/init.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/spinlock.h>
++#include <linux/sched.h>
++#include <linux/slab.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "domain.h"
++#include "endpoint.h"
++#include "item.h"
++#include "message.h"
++#include "notify.h"
++
++static inline void kdbus_notify_add_tail(struct kdbus_staging *staging,
++ struct kdbus_bus *bus)
++{
++ spin_lock(&bus->notify_lock);
++ list_add_tail(&staging->notify_entry, &bus->notify_list);
++ spin_unlock(&bus->notify_lock);
++}
++
++static int kdbus_notify_reply(struct kdbus_bus *bus, u64 id,
++ u64 cookie, u64 msg_type)
++{
++ struct kdbus_staging *s;
++
++ s = kdbus_staging_new_kernel(bus, id, cookie, 0, msg_type);
++ if (IS_ERR(s))
++ return PTR_ERR(s);
++
++ kdbus_notify_add_tail(s, bus);
++ return 0;
++}
++
++/**
++ * kdbus_notify_reply_timeout() - queue a timeout reply
++ * @bus: Bus which queues the messages
++ * @id: The destination's connection ID
++ * @cookie: The cookie to set in the reply.
++ *
++ * Queues a message that has a KDBUS_ITEM_REPLY_TIMEOUT item attached.
++ *
++ * Return: 0 on success, negative errno on failure.
++ */
++int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie)
++{
++ return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_TIMEOUT);
++}
++
++/**
++ * kdbus_notify_reply_dead() - queue a 'dead' reply
++ * @bus: Bus which queues the messages
++ * @id: The destination's connection ID
++ * @cookie: The cookie to set in the reply.
++ *
++ * Queues a message that has a KDBUS_ITEM_REPLY_DEAD item attached.
++ *
++ * Return: 0 on success, negative errno on failure.
++ */
++int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie)
++{
++ return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_DEAD);
++}
++
++/**
++ * kdbus_notify_name_change() - queue a notification about a name owner change
++ * @bus: Bus which queues the messages
++ * @type: The type of the notification; KDBUS_ITEM_NAME_ADD,
++ * KDBUS_ITEM_NAME_CHANGE or KDBUS_ITEM_NAME_REMOVE
++ * @old_id: The id of the connection that used to own the name
++ * @new_id: The id of the new owner connection
++ * @old_flags: The flags to pass in the KDBUS_ITEM flags field for
++ * the old owner
++ * @new_flags: The flags to pass in the KDBUS_ITEM flags field for
++ * the new owner
++ * @name: The name that was removed or assigned to a new owner
++ *
++ * Return: 0 on success, negative errno on failure.
++ */
++int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
++ u64 old_id, u64 new_id,
++ u64 old_flags, u64 new_flags,
++ const char *name)
++{
++ size_t name_len, extra_size;
++ struct kdbus_staging *s;
++
++ name_len = strlen(name) + 1;
++ extra_size = sizeof(struct kdbus_notify_name_change) + name_len;
++
++ s = kdbus_staging_new_kernel(bus, KDBUS_DST_ID_BROADCAST, 0,
++ extra_size, type);
++ if (IS_ERR(s))
++ return PTR_ERR(s);
++
++ s->notify->name_change.old_id.id = old_id;
++ s->notify->name_change.old_id.flags = old_flags;
++ s->notify->name_change.new_id.id = new_id;
++ s->notify->name_change.new_id.flags = new_flags;
++ memcpy(s->notify->name_change.name, name, name_len);
++
++ kdbus_notify_add_tail(s, bus);
++ return 0;
++}
++
++/**
++ * kdbus_notify_id_change() - queue a notification about a unique ID change
++ * @bus: Bus which queues the messages
++ * @type: The type of the notification; KDBUS_ITEM_ID_ADD or
++ * KDBUS_ITEM_ID_REMOVE
++ * @id: The id of the connection that was added or removed
++ * @flags: The flags to pass in the KDBUS_ITEM flags field
++ *
++ * Return: 0 on success, negative errno on failure.
++ */
++int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags)
++{
++ struct kdbus_staging *s;
++ size_t extra_size;
++
++ extra_size = sizeof(struct kdbus_notify_id_change);
++ s = kdbus_staging_new_kernel(bus, KDBUS_DST_ID_BROADCAST, 0,
++ extra_size, type);
++ if (IS_ERR(s))
++ return PTR_ERR(s);
++
++ s->notify->id_change.id = id;
++ s->notify->id_change.flags = flags;
++
++ kdbus_notify_add_tail(s, bus);
++ return 0;
++}
++
++/**
++ * kdbus_notify_flush() - send a list of collected messages
++ * @bus: Bus which queues the messages
++ *
++ * The list is empty after sending the messages.
++ */
++void kdbus_notify_flush(struct kdbus_bus *bus)
++{
++ LIST_HEAD(notify_list);
++ struct kdbus_staging *s, *tmp;
++
++ mutex_lock(&bus->notify_flush_lock);
++ down_read(&bus->name_registry->rwlock);
++
++ spin_lock(&bus->notify_lock);
++ list_splice_init(&bus->notify_list, &notify_list);
++ spin_unlock(&bus->notify_lock);
++
++ list_for_each_entry_safe(s, tmp, &notify_list, notify_entry) {
++ if (s->msg->dst_id != KDBUS_DST_ID_BROADCAST) {
++ struct kdbus_conn *conn;
++
++ conn = kdbus_bus_find_conn_by_id(bus, s->msg->dst_id);
++ if (conn) {
++ kdbus_bus_eavesdrop(bus, NULL, s);
++ kdbus_conn_entry_insert(NULL, conn, s, NULL,
++ NULL);
++ kdbus_conn_unref(conn);
++ }
++ } else {
++ kdbus_bus_broadcast(bus, NULL, s);
++ }
++
++ list_del(&s->notify_entry);
++ kdbus_staging_free(s);
++ }
++
++ up_read(&bus->name_registry->rwlock);
++ mutex_unlock(&bus->notify_flush_lock);
++}
++
++/**
++ * kdbus_notify_free() - free a list of collected messages
++ * @bus: Bus which queues the messages
++ */
++void kdbus_notify_free(struct kdbus_bus *bus)
++{
++ struct kdbus_staging *s, *tmp;
++
++ list_for_each_entry_safe(s, tmp, &bus->notify_list, notify_entry) {
++ list_del(&s->notify_entry);
++ kdbus_staging_free(s);
++ }
++}
+diff --git a/ipc/kdbus/notify.h b/ipc/kdbus/notify.h
+new file mode 100644
+index 0000000..03df464
+--- /dev/null
++++ b/ipc/kdbus/notify.h
+@@ -0,0 +1,30 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_NOTIFY_H
++#define __KDBUS_NOTIFY_H
++
++struct kdbus_bus;
++
++int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags);
++int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie);
++int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie);
++int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
++ u64 old_id, u64 new_id,
++ u64 old_flags, u64 new_flags,
++ const char *name);
++void kdbus_notify_flush(struct kdbus_bus *bus);
++void kdbus_notify_free(struct kdbus_bus *bus);
++
++#endif
+diff --git a/ipc/kdbus/policy.c b/ipc/kdbus/policy.c
+new file mode 100644
+index 0000000..f2618e15
+--- /dev/null
++++ b/ipc/kdbus/policy.c
+@@ -0,0 +1,489 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/dcache.h>
++#include <linux/fs.h>
++#include <linux/init.h>
++#include <linux/mutex.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "domain.h"
++#include "item.h"
++#include "names.h"
++#include "policy.h"
++
++#define KDBUS_POLICY_HASH_SIZE 64
++
++/**
++ * struct kdbus_policy_db_entry_access - a database entry access item
++ * @type: One of KDBUS_POLICY_ACCESS_* types
++ * @access: Access to grant. One of KDBUS_POLICY_*
++ * @uid: For KDBUS_POLICY_ACCESS_USER, the global uid
++ * @gid: For KDBUS_POLICY_ACCESS_GROUP, the global gid
++ * @list: List entry item for the entry's list
++ *
++ * This is the internal version of struct kdbus_policy_db_access.
++ */
++struct kdbus_policy_db_entry_access {
++ u8 type; /* USER, GROUP, WORLD */
++ u8 access; /* OWN, TALK, SEE */
++ union {
++ kuid_t uid; /* global uid */
++ kgid_t gid; /* global gid */
++ };
++ struct list_head list;
++};
++
++/**
++ * struct kdbus_policy_db_entry - a policy database entry
++ * @name: The name to match the policy entry against
++ * @hentry: The hash entry for the database's entries_hash
++ * @access_list: List head for keeping track of the entry's
++ * access items.
++ * @owner: The owner of this entry. Can be a kdbus_conn or
++ * a kdbus_ep object.
++ * @wildcard: The name is a wildcard, such as one ending in '.*'
++ */
++struct kdbus_policy_db_entry {
++ char *name;
++ struct hlist_node hentry;
++ struct list_head access_list;
++ const void *owner;
++ bool wildcard:1;
++};
++
++static void kdbus_policy_entry_free(struct kdbus_policy_db_entry *e)
++{
++ struct kdbus_policy_db_entry_access *a, *tmp;
++
++ list_for_each_entry_safe(a, tmp, &e->access_list, list) {
++ list_del(&a->list);
++ kfree(a);
++ }
++
++ kfree(e->name);
++ kfree(e);
++}
++
++static unsigned int kdbus_strnhash(const char *str, size_t len)
++{
++ unsigned long hash = init_name_hash();
++
++ while (len--)
++ hash = partial_name_hash(*str++, hash);
++
++ return end_name_hash(hash);
++}
++
++static const struct kdbus_policy_db_entry *
++kdbus_policy_lookup(struct kdbus_policy_db *db, const char *name, u32 hash)
++{
++ struct kdbus_policy_db_entry *e;
++ const char *dot;
++ size_t len;
++
++ /* find exact match */
++ hash_for_each_possible(db->entries_hash, e, hentry, hash)
++ if (strcmp(e->name, name) == 0 && !e->wildcard)
++ return e;
++
++ /* find wildcard match */
++
++ dot = strrchr(name, '.');
++ if (!dot)
++ return NULL;
++
++ len = dot - name;
++ hash = kdbus_strnhash(name, len);
++
++ hash_for_each_possible(db->entries_hash, e, hentry, hash)
++ if (e->wildcard && !strncmp(e->name, name, len) &&
++ !e->name[len])
++ return e;
++
++ return NULL;
++}
++
++/**
++ * kdbus_policy_db_clear() - release all memory from a policy db
++ * @db: The policy database
++ */
++void kdbus_policy_db_clear(struct kdbus_policy_db *db)
++{
++ struct kdbus_policy_db_entry *e;
++ struct hlist_node *tmp;
++ unsigned int i;
++
++ /* purge entries */
++ down_write(&db->entries_rwlock);
++ hash_for_each_safe(db->entries_hash, i, tmp, e, hentry) {
++ hash_del(&e->hentry);
++ kdbus_policy_entry_free(e);
++ }
++ up_write(&db->entries_rwlock);
++}
++
++/**
++ * kdbus_policy_db_init() - initialize a new policy database
++ * @db: The location of the database
++ *
++ * This initializes a new policy-db. The underlying memory must have been
++ * cleared to zero by the caller.
++ */
++void kdbus_policy_db_init(struct kdbus_policy_db *db)
++{
++ hash_init(db->entries_hash);
++ init_rwsem(&db->entries_rwlock);
++}
++
++/**
++ * kdbus_policy_query_unlocked() - Query the policy database
++ * @db: Policy database
++ * @cred: Credentials to test against
++ * @name: Name to query
++ * @hash: Hash value of @name
++ *
++ * Same as kdbus_policy_query() but requires the caller to lock the policy
++ * database against concurrent writes.
++ *
++ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
++ */
++int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
++ const struct cred *cred, const char *name,
++ unsigned int hash)
++{
++ struct kdbus_policy_db_entry_access *a;
++ const struct kdbus_policy_db_entry *e;
++ int i, highest = -EPERM;
++
++ e = kdbus_policy_lookup(db, name, hash);
++ if (!e)
++ return -EPERM;
++
++ list_for_each_entry(a, &e->access_list, list) {
++ if ((int)a->access <= highest)
++ continue;
++
++ switch (a->type) {
++ case KDBUS_POLICY_ACCESS_USER:
++ if (uid_eq(cred->euid, a->uid))
++ highest = a->access;
++ break;
++ case KDBUS_POLICY_ACCESS_GROUP:
++ if (gid_eq(cred->egid, a->gid)) {
++ highest = a->access;
++ break;
++ }
++
++ for (i = 0; i < cred->group_info->ngroups; i++) {
++ kgid_t gid = GROUP_AT(cred->group_info, i);
++
++ if (gid_eq(gid, a->gid)) {
++ highest = a->access;
++ break;
++ }
++ }
++
++ break;
++ case KDBUS_POLICY_ACCESS_WORLD:
++ highest = a->access;
++ break;
++ }
++
++ /* OWN is the highest possible policy */
++ if (highest >= KDBUS_POLICY_OWN)
++ break;
++ }
++
++ return highest;
++}
++
++/**
++ * kdbus_policy_query() - Query the policy database
++ * @db: Policy database
++ * @cred: Credentials to test against
++ * @name: Name to query
++ * @hash: Hash value of @name
++ *
++ * Query the policy database @db for the access rights of @cred to the name
++ * @name. The access rights of @cred are returned, or -EPERM if no access is
++ * granted.
++ *
++ * This call effectively searches for the highest access-right granted to
++ * @cred. The caller should really cache those as policy lookups are rather
++ * expensive.
++ *
++ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
++ */
++int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
++ const char *name, unsigned int hash)
++{
++ int ret;
++
++ down_read(&db->entries_rwlock);
++ ret = kdbus_policy_query_unlocked(db, cred, name, hash);
++ up_read(&db->entries_rwlock);
++
++ return ret;
++}
++
++static void __kdbus_policy_remove_owner(struct kdbus_policy_db *db,
++ const void *owner)
++{
++ struct kdbus_policy_db_entry *e;
++ struct hlist_node *tmp;
++ int i;
++
++ hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
++ if (e->owner == owner) {
++ hash_del(&e->hentry);
++ kdbus_policy_entry_free(e);
++ }
++}
++
++/**
++ * kdbus_policy_remove_owner() - remove all entries related to a connection
++ * @db: The policy database
++ * @owner: The connection whose items to remove
++ */
++void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
++ const void *owner)
++{
++ down_write(&db->entries_rwlock);
++ __kdbus_policy_remove_owner(db, owner);
++ up_write(&db->entries_rwlock);
++}
++
++/*
++ * Convert user provided policy access to internal kdbus policy
++ * access
++ */
++static struct kdbus_policy_db_entry_access *
++kdbus_policy_make_access(const struct kdbus_policy_access *uaccess)
++{
++ int ret;
++ struct kdbus_policy_db_entry_access *a;
++
++ a = kzalloc(sizeof(*a), GFP_KERNEL);
++ if (!a)
++ return ERR_PTR(-ENOMEM);
++
++ ret = -EINVAL;
++ switch (uaccess->access) {
++ case KDBUS_POLICY_SEE:
++ case KDBUS_POLICY_TALK:
++ case KDBUS_POLICY_OWN:
++ a->access = uaccess->access;
++ break;
++ default:
++ goto err;
++ }
++
++ switch (uaccess->type) {
++ case KDBUS_POLICY_ACCESS_USER:
++ a->uid = make_kuid(current_user_ns(), uaccess->id);
++ if (!uid_valid(a->uid))
++ goto err;
++
++ break;
++ case KDBUS_POLICY_ACCESS_GROUP:
++ a->gid = make_kgid(current_user_ns(), uaccess->id);
++ if (!gid_valid(a->gid))
++ goto err;
++
++ break;
++ case KDBUS_POLICY_ACCESS_WORLD:
++ break;
++ default:
++ goto err;
++ }
++
++ a->type = uaccess->type;
++
++ return a;
++
++err:
++ kfree(a);
++ return ERR_PTR(ret);
++}
++
++/**
++ * kdbus_policy_set() - set a connection's policy rules
++ * @db: The policy database
++ * @items: A list of kdbus_item elements that contain both
++ * names and access rules to set.
++ * @items_size: The total size of the items.
++ * @max_policies: The maximum number of policy entries to allow.
++ * Pass 0 for no limit.
++ * @allow_wildcards: Whether wildcard entries (such as names
++ * ending in '.*') should be allowed.
++ * @owner: The owner of the new policy items.
++ *
++ * This function sets a new set of policies for a given owner. The names and
++ * access rules are gathered by walking the list of items passed in as
++ * argument. An item of type KDBUS_ITEM_NAME is expected before any number of
++ * KDBUS_ITEM_POLICY_ACCESS items. If there are more repetitions of this
++ * pattern than allowed by @max_policies, -E2BIG is returned.
++ *
++ * In order to allow atomic replacement of rules, the function first removes
++ * all entries that have been created for the given owner previously.
++ *
++ * Callers to this function must make sure that the owner is a custom
++ * endpoint, or if the endpoint is a default endpoint, then it must be
++ * either a policy holder or an activator.
++ *
++ * Return: 0 on success, negative errno on failure.
++ */
++int kdbus_policy_set(struct kdbus_policy_db *db,
++ const struct kdbus_item *items,
++ size_t items_size,
++ size_t max_policies,
++ bool allow_wildcards,
++ const void *owner)
++{
++ struct kdbus_policy_db_entry_access *a;
++ struct kdbus_policy_db_entry *e, *p;
++ const struct kdbus_item *item;
++ struct hlist_node *tmp;
++ HLIST_HEAD(entries);
++ HLIST_HEAD(restore);
++ size_t count = 0;
++ int i, ret = 0;
++ u32 hash;
++
++ /* Walk the list of items and look for new policies */
++ e = NULL;
++ KDBUS_ITEMS_FOREACH(item, items, items_size) {
++ switch (item->type) {
++ case KDBUS_ITEM_NAME: {
++ size_t len;
++
++ if (max_policies && ++count > max_policies) {
++ ret = -E2BIG;
++ goto exit;
++ }
++
++ if (!kdbus_name_is_valid(item->str, true)) {
++ ret = -EINVAL;
++ goto exit;
++ }
++
++ e = kzalloc(sizeof(*e), GFP_KERNEL);
++ if (!e) {
++ ret = -ENOMEM;
++ goto exit;
++ }
++
++ INIT_LIST_HEAD(&e->access_list);
++ e->owner = owner;
++ hlist_add_head(&e->hentry, &entries);
++
++ e->name = kstrdup(item->str, GFP_KERNEL);
++ if (!e->name) {
++ ret = -ENOMEM;
++ goto exit;
++ }
++
++ /*
++ * If a supplied name ends with '.*', cut off that
++ * part, only store anything before it, and mark the
++ * entry as wildcard.
++ */
++ len = strlen(e->name);
++ if (len > 2 &&
++ e->name[len - 3] == '.' &&
++ e->name[len - 2] == '*') {
++ if (!allow_wildcards) {
++ ret = -EINVAL;
++ goto exit;
++ }
++
++ e->name[len - 3] = '\0';
++ e->wildcard = true;
++ }
++
++ break;
++ }
++
++ case KDBUS_ITEM_POLICY_ACCESS:
++ if (!e) {
++ ret = -EINVAL;
++ goto exit;
++ }
++
++ a = kdbus_policy_make_access(&item->policy_access);
++ if (IS_ERR(a)) {
++ ret = PTR_ERR(a);
++ goto exit;
++ }
++
++ list_add_tail(&a->list, &e->access_list);
++ break;
++ }
++ }
++
++ down_write(&db->entries_rwlock);
++
++ /* remember previous entries to restore in case of failure */
++ hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
++ if (e->owner == owner) {
++ hash_del(&e->hentry);
++ hlist_add_head(&e->hentry, &restore);
++ }
++
++ hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
++ /* prevent duplicates */
++ hash = kdbus_strhash(e->name);
++ hash_for_each_possible(db->entries_hash, p, hentry, hash)
++ if (strcmp(e->name, p->name) == 0 &&
++ e->wildcard == p->wildcard) {
++ ret = -EEXIST;
++ goto restore;
++ }
++
++ hlist_del(&e->hentry);
++ hash_add(db->entries_hash, &e->hentry, hash);
++ }
++
++restore:
++ /* if we failed, flush all entries we added so far */
++ if (ret < 0)
++ __kdbus_policy_remove_owner(db, owner);
++
++ /* if we failed, restore entries, otherwise release them */
++ hlist_for_each_entry_safe(e, tmp, &restore, hentry) {
++ hlist_del(&e->hentry);
++ if (ret < 0) {
++ hash = kdbus_strhash(e->name);
++ hash_add(db->entries_hash, &e->hentry, hash);
++ } else {
++ kdbus_policy_entry_free(e);
++ }
++ }
++
++ up_write(&db->entries_rwlock);
++
++exit:
++ hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
++ hlist_del(&e->hentry);
++ kdbus_policy_entry_free(e);
++ }
++
++ return ret;
++}
+diff --git a/ipc/kdbus/policy.h b/ipc/kdbus/policy.h
+new file mode 100644
+index 0000000..15dd7bc
+--- /dev/null
++++ b/ipc/kdbus/policy.h
+@@ -0,0 +1,51 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_POLICY_H
++#define __KDBUS_POLICY_H
++
++#include <linux/hashtable.h>
++#include <linux/rwsem.h>
++
++struct kdbus_conn;
++struct kdbus_item;
++
++/**
++ * struct kdbus_policy_db - policy database
++ * @entries_hash: Hashtable of entries
++ * @entries_rwlock: Read/write semaphore protecting the database's entries
++ */
++struct kdbus_policy_db {
++ DECLARE_HASHTABLE(entries_hash, 6);
++ struct rw_semaphore entries_rwlock;
++};
++
++void kdbus_policy_db_init(struct kdbus_policy_db *db);
++void kdbus_policy_db_clear(struct kdbus_policy_db *db);
++
++int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
++ const struct cred *cred, const char *name,
++ unsigned int hash);
++int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
++ const char *name, unsigned int hash);
++
++void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
++ const void *owner);
++int kdbus_policy_set(struct kdbus_policy_db *db,
++ const struct kdbus_item *items,
++ size_t items_size,
++ size_t max_policies,
++ bool allow_wildcards,
++ const void *owner);
++
++#endif
+diff --git a/ipc/kdbus/pool.c b/ipc/kdbus/pool.c
+new file mode 100644
+index 0000000..63ccd55
+--- /dev/null
++++ b/ipc/kdbus/pool.c
+@@ -0,0 +1,728 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/aio.h>
++#include <linux/file.h>
++#include <linux/fs.h>
++#include <linux/highmem.h>
++#include <linux/init.h>
++#include <linux/mm.h>
++#include <linux/module.h>
++#include <linux/pagemap.h>
++#include <linux/rbtree.h>
++#include <linux/sched.h>
++#include <linux/shmem_fs.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/uaccess.h>
++#include <linux/uio.h>
++
++#include "pool.h"
++#include "util.h"
++
++/**
++ * struct kdbus_pool - the receiver's buffer
++ * @f: The backing shmem file
++ * @size: The size of the file
++ * @accounted_size: Currently accounted memory in bytes
++ * @lock: Pool data lock
++ * @slices: All slices sorted by address
++ * @slices_busy: Tree of allocated slices
++ * @slices_free: Tree of free slices
++ *
++ * The receiver's buffer, managed as a pool of allocated and free
++ * slices containing the queued messages.
++ *
++ * Messages sent with KDBUS_CMD_SEND are copied directly by the
++ * sending process into the receiver's pool.
++ *
++ * Messages received with KDBUS_CMD_RECV just return the offset
++ * to the data placed in the pool.
++ *
++ * The internally allocated memory needs to be returned by the receiver
++ * with KDBUS_CMD_FREE.
++ */
++struct kdbus_pool {
++ struct file *f;
++ size_t size;
++ size_t accounted_size;
++ struct mutex lock;
++
++ struct list_head slices;
++ struct rb_root slices_busy;
++ struct rb_root slices_free;
++};
++
++/**
++ * struct kdbus_pool_slice - allocated element in kdbus_pool
++ * @pool: Pool this slice belongs to
++ * @off: Offset of slice in the shmem file
++ * @size: Size of slice
++ * @entry: Entry in "all slices" list
++ * @rb_node: Entry in free or busy tree
++ * @free: Unused slice
++ * @accounted: Accounted as queue slice
++ * @ref_kernel: Kernel holds a reference
++ * @ref_user: Userspace holds a reference
++ *
++ * The pool has one or more slices, always spanning the entire size of the
++ * pool.
++ *
++ * Every slice is an element in a list sorted by the buffer address, to
++ * provide access to the next neighbor slice.
++ *
++ * Every slice is a member of either the busy or the free tree. The free
++ * tree is organized by slice size, the busy tree organized by buffer
++ * offset.
++ */
++struct kdbus_pool_slice {
++ struct kdbus_pool *pool;
++ size_t off;
++ size_t size;
++
++ struct list_head entry;
++ struct rb_node rb_node;
++
++ bool free:1;
++ bool accounted:1;
++ bool ref_kernel:1;
++ bool ref_user:1;
++};
++
++static struct kdbus_pool_slice *kdbus_pool_slice_new(struct kdbus_pool *pool,
++ size_t off, size_t size)
++{
++ struct kdbus_pool_slice *slice;
++
++ slice = kzalloc(sizeof(*slice), GFP_KERNEL);
++ if (!slice)
++ return NULL;
++
++ slice->pool = pool;
++ slice->off = off;
++ slice->size = size;
++ slice->free = true;
++ return slice;
++}
++
++/* insert a slice into the free tree */
++static void kdbus_pool_add_free_slice(struct kdbus_pool *pool,
++ struct kdbus_pool_slice *slice)
++{
++ struct rb_node **n;
++ struct rb_node *pn = NULL;
++
++ n = &pool->slices_free.rb_node;
++ while (*n) {
++ struct kdbus_pool_slice *pslice;
++
++ pn = *n;
++ pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
++ if (slice->size < pslice->size)
++ n = &pn->rb_left;
++ else
++ n = &pn->rb_right;
++ }
++
++ rb_link_node(&slice->rb_node, pn, n);
++ rb_insert_color(&slice->rb_node, &pool->slices_free);
++}
++
++/* insert a slice into the busy tree */
++static void kdbus_pool_add_busy_slice(struct kdbus_pool *pool,
++ struct kdbus_pool_slice *slice)
++{
++ struct rb_node **n;
++ struct rb_node *pn = NULL;
++
++ n = &pool->slices_busy.rb_node;
++ while (*n) {
++ struct kdbus_pool_slice *pslice;
++
++ pn = *n;
++ pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
++ if (slice->off < pslice->off)
++ n = &pn->rb_left;
++ else if (slice->off > pslice->off)
++ n = &pn->rb_right;
++ else
++ BUG();
++ }
++
++ rb_link_node(&slice->rb_node, pn, n);
++ rb_insert_color(&slice->rb_node, &pool->slices_busy);
++}
++
++static struct kdbus_pool_slice *kdbus_pool_find_slice(struct kdbus_pool *pool,
++ size_t off)
++{
++ struct rb_node *n;
++
++ n = pool->slices_busy.rb_node;
++ while (n) {
++ struct kdbus_pool_slice *s;
++
++ s = rb_entry(n, struct kdbus_pool_slice, rb_node);
++ if (off < s->off)
++ n = n->rb_left;
++ else if (off > s->off)
++ n = n->rb_right;
++ else
++ return s;
++ }
++
++ return NULL;
++}
++
++/**
++ * kdbus_pool_slice_alloc() - allocate memory from a pool
++ * @pool: The receiver's pool
++ * @size: The number of bytes to allocate
++ * @accounted: Whether this slice should be accounted for
++ *
++ * The returned slice is passed to kdbus_pool_slice_release() to free the
++ * allocated memory. The slice starts out with a kernel reference held; data
++ * is copied into it separately, e.g. via kdbus_pool_slice_copy_kvec() or
++ * kdbus_pool_slice_copy_iovec().
++ *
++ * Return: the allocated slice on success, ERR_PTR on failure.
++ */
++struct kdbus_pool_slice *kdbus_pool_slice_alloc(struct kdbus_pool *pool,
++ size_t size, bool accounted)
++{
++ size_t slice_size = KDBUS_ALIGN8(size);
++ struct rb_node *n, *found = NULL;
++ struct kdbus_pool_slice *s;
++ int ret = 0;
++
++ if (WARN_ON(!size))
++ return ERR_PTR(-EINVAL);
++
++ /* search a free slice with the closest matching size */
++ mutex_lock(&pool->lock);
++ n = pool->slices_free.rb_node;
++ while (n) {
++ s = rb_entry(n, struct kdbus_pool_slice, rb_node);
++ if (slice_size < s->size) {
++ found = n;
++ n = n->rb_left;
++ } else if (slice_size > s->size) {
++ n = n->rb_right;
++ } else {
++ found = n;
++ break;
++ }
++ }
++
++ /* no slice with the minimum size found in the pool */
++ if (!found) {
++ ret = -EXFULL;
++ goto exit_unlock;
++ }
++
++ /* no exact match, use the closest one */
++ if (!n) {
++ struct kdbus_pool_slice *s_new;
++
++ s = rb_entry(found, struct kdbus_pool_slice, rb_node);
++
++ /* split-off the remainder of the size to its own slice */
++ s_new = kdbus_pool_slice_new(pool, s->off + slice_size,
++ s->size - slice_size);
++ if (!s_new) {
++ ret = -ENOMEM;
++ goto exit_unlock;
++ }
++
++ list_add(&s_new->entry, &s->entry);
++ kdbus_pool_add_free_slice(pool, s_new);
++
++ /* adjust our size now that we split-off another slice */
++ s->size = slice_size;
++ }
++
++ /* move slice from free to the busy tree */
++ rb_erase(found, &pool->slices_free);
++ kdbus_pool_add_busy_slice(pool, s);
++
++ WARN_ON(s->ref_kernel || s->ref_user);
++
++ s->ref_kernel = true;
++ s->free = false;
++ s->accounted = accounted;
++ if (accounted)
++ pool->accounted_size += s->size;
++ mutex_unlock(&pool->lock);
++
++ return s;
++
++exit_unlock:
++ mutex_unlock(&pool->lock);
++ return ERR_PTR(ret);
++}
++
++static void __kdbus_pool_slice_release(struct kdbus_pool_slice *slice)
++{
++ struct kdbus_pool *pool = slice->pool;
++
++ /* don't free the slice while kernel or user-space holds a reference */
++ if (slice->ref_kernel || slice->ref_user)
++ return;
++
++ if (WARN_ON(slice->free))
++ return;
++
++ rb_erase(&slice->rb_node, &pool->slices_busy);
++
++ /* merge with the next free slice */
++ if (!list_is_last(&slice->entry, &pool->slices)) {
++ struct kdbus_pool_slice *s;
++
++ s = list_entry(slice->entry.next,
++ struct kdbus_pool_slice, entry);
++ if (s->free) {
++ rb_erase(&s->rb_node, &pool->slices_free);
++ list_del(&s->entry);
++ slice->size += s->size;
++ kfree(s);
++ }
++ }
++
++ /* merge with previous free slice */
++ if (pool->slices.next != &slice->entry) {
++ struct kdbus_pool_slice *s;
++
++ s = list_entry(slice->entry.prev,
++ struct kdbus_pool_slice, entry);
++ if (s->free) {
++ rb_erase(&s->rb_node, &pool->slices_free);
++ list_del(&slice->entry);
++ s->size += slice->size;
++ kfree(slice);
++ slice = s;
++ }
++ }
++
++ slice->free = true;
++ kdbus_pool_add_free_slice(pool, slice);
++}
++
++/**
++ * kdbus_pool_slice_release() - drop kernel-reference on allocated slice
++ * @slice: Slice allocated from the pool
++ *
++ * This releases the kernel-reference on the given slice. If the
++ * kernel-reference and the user-reference on a slice are dropped, the slice is
++ * returned to the pool.
++ *
++ * So far, we do not implement full ref-counting on slices. The kernel and
++ * user-space can each hold at most one reference to a slice. Once both have
++ * been dropped, the slice is released.
++ */
++void kdbus_pool_slice_release(struct kdbus_pool_slice *slice)
++{
++ struct kdbus_pool *pool;
++
++ if (!slice)
++ return;
++
++ /* @slice may be freed, so keep local ptr to @pool */
++ pool = slice->pool;
++
++ mutex_lock(&pool->lock);
++ /* kernel must own a ref to @slice to drop it */
++ WARN_ON(!slice->ref_kernel);
++ slice->ref_kernel = false;
++ /* no longer kernel-owned, de-account slice */
++ if (slice->accounted && !WARN_ON(pool->accounted_size < slice->size))
++ pool->accounted_size -= slice->size;
++ __kdbus_pool_slice_release(slice);
++ mutex_unlock(&pool->lock);
++}
++
++/**
++ * kdbus_pool_release_offset() - release a public offset
++ * @pool: pool to operate on
++ * @off: offset to release
++ *
++ * This should be called whenever user-space frees a slice given to them. It
++ * verifies the slice is available and public, and then drops it. It ensures
++ * correct locking and barriers against queues.
++ *
++ * Return: 0 on success, -ENXIO if the offset is invalid or not public.
++ */
++int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off)
++{
++ struct kdbus_pool_slice *slice;
++ int ret = 0;
++
++ /* 'pool->size' is used as dummy offset for empty slices */
++ if (off == pool->size)
++ return 0;
++
++ mutex_lock(&pool->lock);
++ slice = kdbus_pool_find_slice(pool, off);
++ if (slice && slice->ref_user) {
++ slice->ref_user = false;
++ __kdbus_pool_slice_release(slice);
++ } else {
++ ret = -ENXIO;
++ }
++ mutex_unlock(&pool->lock);
++
++ return ret;
++}
++
++/**
++ * kdbus_pool_publish_empty() - publish empty slice to user-space
++ * @pool: pool to operate on
++ * @off: output storage for offset, or NULL
++ * @size: output storage for size, or NULL
++ *
++ * This is the same as kdbus_pool_slice_publish(), but uses a dummy slice with
++ * size 0. The returned offset points to the end of the pool and is never
++ * returned on real slices.
++ */
++void kdbus_pool_publish_empty(struct kdbus_pool *pool, u64 *off, u64 *size)
++{
++ if (off)
++ *off = pool->size;
++ if (size)
++ *size = 0;
++}
++
++/**
++ * kdbus_pool_slice_publish() - publish slice to user-space
++ * @slice: The slice
++ * @out_offset: Output storage for offset, or NULL
++ * @out_size: Output storage for size, or NULL
++ *
++ * This prepares a slice to be published to user-space.
++ *
++ * This call combines the following operations:
++ * * the memory region is flushed so the user's memory view is consistent
++ * * the slice is marked as referenced by user-space, so user-space has to
++ * call KDBUS_CMD_FREE to release it
++ * * the offset and size of the slice are written to the given output
++ * arguments, if non-NULL
++ */
++void kdbus_pool_slice_publish(struct kdbus_pool_slice *slice,
++ u64 *out_offset, u64 *out_size)
++{
++ mutex_lock(&slice->pool->lock);
++ /* kernel must own a ref to @slice to gain a user-space ref */
++ WARN_ON(!slice->ref_kernel);
++ slice->ref_user = true;
++ mutex_unlock(&slice->pool->lock);
++
++ if (out_offset)
++ *out_offset = slice->off;
++ if (out_size)
++ *out_size = slice->size;
++}
++
++/**
++ * kdbus_pool_slice_offset() - Get a slice's offset inside the pool
++ * @slice: Slice to return the offset of
++ *
++ * Return: The internal offset of @slice inside the pool.
++ */
++off_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice)
++{
++ return slice->off;
++}
++
++/**
++ * kdbus_pool_slice_size() - get size of a pool slice
++ * @slice: slice to query
++ *
++ * Return: size of the given slice
++ */
++size_t kdbus_pool_slice_size(const struct kdbus_pool_slice *slice)
++{
++ return slice->size;
++}
++
++/**
++ * kdbus_pool_new() - create a new pool
++ * @name: Name of the (deleted) file which shows up in
++ * /proc, used for debugging
++ * @size: Maximum size of the pool
++ *
++ * Return: a new kdbus_pool on success, ERR_PTR on failure.
++ */
++struct kdbus_pool *kdbus_pool_new(const char *name, size_t size)
++{
++ struct kdbus_pool_slice *s;
++ struct kdbus_pool *p;
++ struct file *f;
++ char *n = NULL;
++ int ret;
++
++ p = kzalloc(sizeof(*p), GFP_KERNEL);
++ if (!p)
++ return ERR_PTR(-ENOMEM);
++
++ if (name) {
++ n = kasprintf(GFP_KERNEL, KBUILD_MODNAME "-conn:%s", name);
++ if (!n) {
++ ret = -ENOMEM;
++ goto exit_free;
++ }
++ }
++
++ f = shmem_file_setup(n ?: KBUILD_MODNAME "-conn", size, 0);
++ kfree(n);
++
++ if (IS_ERR(f)) {
++ ret = PTR_ERR(f);
++ goto exit_free;
++ }
++
++ ret = get_write_access(file_inode(f));
++ if (ret < 0)
++ goto exit_put_shmem;
++
++ /* allocate first slice spanning the entire pool */
++ s = kdbus_pool_slice_new(p, 0, size);
++ if (!s) {
++ ret = -ENOMEM;
++ goto exit_put_write;
++ }
++
++ p->f = f;
++ p->size = size;
++ p->slices_free = RB_ROOT;
++ p->slices_busy = RB_ROOT;
++ mutex_init(&p->lock);
++
++ INIT_LIST_HEAD(&p->slices);
++ list_add(&s->entry, &p->slices);
++
++ kdbus_pool_add_free_slice(p, s);
++ return p;
++
++exit_put_write:
++ put_write_access(file_inode(f));
++exit_put_shmem:
++ fput(f);
++exit_free:
++ kfree(p);
++ return ERR_PTR(ret);
++}
++
++/**
++ * kdbus_pool_free() - destroy pool
++ * @pool: The receiver's pool
++ */
++void kdbus_pool_free(struct kdbus_pool *pool)
++{
++ struct kdbus_pool_slice *s, *tmp;
++
++ if (!pool)
++ return;
++
++ list_for_each_entry_safe(s, tmp, &pool->slices, entry) {
++ list_del(&s->entry);
++ kfree(s);
++ }
++
++ put_write_access(file_inode(pool->f));
++ fput(pool->f);
++ kfree(pool);
++}
++
++/**
++ * kdbus_pool_accounted() - retrieve accounting information
++ * @pool: pool to query
++ * @size: output for overall pool size
++ * @acc: output for currently accounted size
++ *
++ * This returns accounting information of the pool. Note that the data might
++ * change after the function returns, as the pool lock is dropped. You need to
++ * protect the data via other means, if you need reliable accounting.
++ */
++void kdbus_pool_accounted(struct kdbus_pool *pool, size_t *size, size_t *acc)
++{
++ mutex_lock(&pool->lock);
++ if (size)
++ *size = pool->size;
++ if (acc)
++ *acc = pool->accounted_size;
++ mutex_unlock(&pool->lock);
++}
++
++/**
++ * kdbus_pool_slice_copy_iovec() - copy user memory to a slice
++ * @slice: The slice to write to
++ * @off: Offset in the slice to write to
++ * @iov: iovec array, pointing to data to copy
++ * @iov_len: Number of elements in @iov
++ * @total_len: Total number of bytes described in members of @iov
++ *
++ * User memory referenced by @iov will be copied into @slice at offset @off.
++ *
++ * Return: the number of bytes copied, negative errno on failure.
++ */
++ssize_t
++kdbus_pool_slice_copy_iovec(const struct kdbus_pool_slice *slice, loff_t off,
++ struct iovec *iov, size_t iov_len, size_t total_len)
++{
++ struct iov_iter iter;
++ ssize_t len;
++
++ if (WARN_ON(off + total_len > slice->size))
++ return -EFAULT;
++
++ off += slice->off;
++ iov_iter_init(&iter, WRITE, iov, iov_len, total_len);
++ len = vfs_iter_write(slice->pool->f, &iter, &off);
++
++ return (len >= 0 && len != total_len) ? -EFAULT : len;
++}
++
++/**
++ * kdbus_pool_slice_copy_kvec() - copy kernel memory to a slice
++ * @slice: The slice to write to
++ * @off: Offset in the slice to write to
++ * @kvec: kvec array, pointing to data to copy
++ * @kvec_len: Number of elements in @kvec
++ * @total_len: Total number of bytes described in members of @kvec
++ *
++ * Kernel memory referenced by @kvec will be copied into @slice at offset @off.
++ *
++ * Return: the number of bytes copied, negative errno on failure.
++ */
++ssize_t kdbus_pool_slice_copy_kvec(const struct kdbus_pool_slice *slice,
++ loff_t off, struct kvec *kvec,
++ size_t kvec_len, size_t total_len)
++{
++ struct iov_iter iter;
++ mm_segment_t old_fs;
++ ssize_t len;
++
++ if (WARN_ON(off + total_len > slice->size))
++ return -EFAULT;
++
++ off += slice->off;
++ iov_iter_kvec(&iter, WRITE | ITER_KVEC, kvec, kvec_len, total_len);
++
++ old_fs = get_fs();
++ set_fs(get_ds());
++ len = vfs_iter_write(slice->pool->f, &iter, &off);
++ set_fs(old_fs);
++
++ return (len >= 0 && len != total_len) ? -EFAULT : len;
++}
++
++/**
++ * kdbus_pool_slice_copy() - copy data from one slice into another
++ * @slice_dst: destination slice
++ * @slice_src: source slice
++ *
++ * Return: 0 on success, negative error number on failure.
++ */
++int kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice_dst,
++ const struct kdbus_pool_slice *slice_src)
++{
++ struct file *f_src = slice_src->pool->f;
++ struct file *f_dst = slice_dst->pool->f;
++ struct inode *i_dst = file_inode(f_dst);
++ struct address_space *mapping_dst = f_dst->f_mapping;
++ const struct address_space_operations *aops = mapping_dst->a_ops;
++ unsigned long len = slice_src->size;
++ loff_t off_src = slice_src->off;
++ loff_t off_dst = slice_dst->off;
++ mm_segment_t old_fs;
++ int ret = 0;
++
++ if (WARN_ON(slice_src->size != slice_dst->size) ||
++ WARN_ON(slice_src->free || slice_dst->free))
++ return -EINVAL;
++
++ mutex_lock(&i_dst->i_mutex);
++ old_fs = get_fs();
++ set_fs(get_ds());
++ while (len > 0) {
++ unsigned long page_off;
++ unsigned long copy_len;
++ char __user *kaddr;
++ struct page *page;
++ ssize_t n_read;
++ void *fsdata;
++ long status;
++
++ page_off = off_dst & (PAGE_CACHE_SIZE - 1);
++ copy_len = min_t(unsigned long,
++ PAGE_CACHE_SIZE - page_off, len);
++
++ status = aops->write_begin(f_dst, mapping_dst, off_dst,
++ copy_len, 0, &page, &fsdata);
++ if (unlikely(status < 0)) {
++ ret = status;
++ break;
++ }
++
++ kaddr = (char __force __user *)kmap(page) + page_off;
++ n_read = __vfs_read(f_src, kaddr, copy_len, &off_src);
++ kunmap(page);
++ mark_page_accessed(page);
++ flush_dcache_page(page);
++
++ if (unlikely(n_read != copy_len)) {
++ ret = -EFAULT;
++ break;
++ }
++
++ status = aops->write_end(f_dst, mapping_dst, off_dst,
++ copy_len, copy_len, page, fsdata);
++ if (unlikely(status != copy_len)) {
++ ret = -EFAULT;
++ break;
++ }
++
++ off_dst += copy_len;
++ len -= copy_len;
++ }
++ set_fs(old_fs);
++ mutex_unlock(&i_dst->i_mutex);
++
++ return ret;
++}
++
++/**
++ * kdbus_pool_mmap() - map the pool into the process
++ * @pool: The receiver's pool
++ * @vma: passed by mmap() syscall
++ *
++ * Return: the result of the mmap() call, negative errno on failure.
++ */
++int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma)
++{
++ /* deny write access to the pool */
++ if (vma->vm_flags & VM_WRITE)
++ return -EPERM;
++ vma->vm_flags &= ~VM_MAYWRITE;
++
++ /* do not allow mapping more than the size of the file */
++ if ((vma->vm_end - vma->vm_start) > pool->size)
++ return -EFAULT;
++
++ /* replace the connection file with our shmem file */
++ if (vma->vm_file)
++ fput(vma->vm_file);
++ vma->vm_file = get_file(pool->f);
++
++ return pool->f->f_op->mmap(pool->f, vma);
++}
+diff --git a/ipc/kdbus/pool.h b/ipc/kdbus/pool.h
+new file mode 100644
+index 0000000..a903821
+--- /dev/null
++++ b/ipc/kdbus/pool.h
+@@ -0,0 +1,46 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_POOL_H
++#define __KDBUS_POOL_H
++
++#include <linux/uio.h>
++
++struct kdbus_pool;
++struct kdbus_pool_slice;
++
++struct kdbus_pool *kdbus_pool_new(const char *name, size_t size);
++void kdbus_pool_free(struct kdbus_pool *pool);
++void kdbus_pool_accounted(struct kdbus_pool *pool, size_t *size, size_t *acc);
++int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma);
++int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off);
++void kdbus_pool_publish_empty(struct kdbus_pool *pool, u64 *off, u64 *size);
++
++struct kdbus_pool_slice *kdbus_pool_slice_alloc(struct kdbus_pool *pool,
++ size_t size, bool accounted);
++void kdbus_pool_slice_release(struct kdbus_pool_slice *slice);
++void kdbus_pool_slice_publish(struct kdbus_pool_slice *slice,
++ u64 *out_offset, u64 *out_size);
++off_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice);
++size_t kdbus_pool_slice_size(const struct kdbus_pool_slice *slice);
++int kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice_dst,
++ const struct kdbus_pool_slice *slice_src);
++ssize_t kdbus_pool_slice_copy_kvec(const struct kdbus_pool_slice *slice,
++ loff_t off, struct kvec *kvec,
++ size_t kvec_count, size_t total_len);
++ssize_t kdbus_pool_slice_copy_iovec(const struct kdbus_pool_slice *slice,
++ loff_t off, struct iovec *iov,
++ size_t iov_count, size_t total_len);
++
++#endif
+diff --git a/ipc/kdbus/queue.c b/ipc/kdbus/queue.c
+new file mode 100644
+index 0000000..f9c44d7
+--- /dev/null
++++ b/ipc/kdbus/queue.c
+@@ -0,0 +1,363 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/audit.h>
++#include <linux/file.h>
++#include <linux/fs.h>
++#include <linux/hashtable.h>
++#include <linux/idr.h>
++#include <linux/init.h>
++#include <linux/math64.h>
++#include <linux/mm.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/poll.h>
++#include <linux/sched.h>
++#include <linux/sizes.h>
++#include <linux/slab.h>
++#include <linux/syscalls.h>
++#include <linux/uio.h>
++
++#include "util.h"
++#include "domain.h"
++#include "connection.h"
++#include "item.h"
++#include "message.h"
++#include "metadata.h"
++#include "queue.h"
++#include "reply.h"
++
++/**
++ * kdbus_queue_init() - initialize data structure related to a queue
++ * @queue: The queue to initialize
++ */
++void kdbus_queue_init(struct kdbus_queue *queue)
++{
++ INIT_LIST_HEAD(&queue->msg_list);
++ queue->msg_prio_queue = RB_ROOT;
++}
++
++/**
++ * kdbus_queue_peek() - Retrieves an entry from a queue
++ * @queue: The queue
++ * @priority: The minimum priority of the entry to peek
++ * @use_priority: Boolean flag whether or not to peek by priority
++ *
++ * Look for an entry in a queue, either by priority, or the oldest one (FIFO).
++ * The entry is neither freed nor removed from any of the queue's lists.
++ *
++ * Return: the peeked queue entry on success, NULL if no suitable msg is found
++ */
++struct kdbus_queue_entry *kdbus_queue_peek(struct kdbus_queue *queue,
++ s64 priority, bool use_priority)
++{
++ struct kdbus_queue_entry *e;
++
++ if (list_empty(&queue->msg_list))
++ return NULL;
++
++ if (use_priority) {
++ /* get next entry with highest priority */
++ e = rb_entry(queue->msg_prio_highest,
++ struct kdbus_queue_entry, prio_node);
++
++ /* no entry with the requested priority */
++ if (e->priority > priority)
++ return NULL;
++ } else {
++ /* ignore the priority, return the next entry in the queue */
++ e = list_first_entry(&queue->msg_list,
++ struct kdbus_queue_entry, entry);
++ }
++
++ return e;
++}
++
++static void kdbus_queue_entry_link(struct kdbus_queue_entry *entry)
++{
++ struct kdbus_queue *queue = &entry->conn->queue;
++ struct rb_node **n, *pn = NULL;
++ bool highest = true;
++
++ lockdep_assert_held(&entry->conn->lock);
++ if (WARN_ON(!list_empty(&entry->entry)))
++ return;
++
++ /* sort into priority entry tree */
++ n = &queue->msg_prio_queue.rb_node;
++ while (*n) {
++ struct kdbus_queue_entry *e;
++
++ pn = *n;
++ e = rb_entry(pn, struct kdbus_queue_entry, prio_node);
++
++ /* existing node for this priority, add to its list */
++ if (likely(entry->priority == e->priority)) {
++ list_add_tail(&entry->prio_entry, &e->prio_entry);
++ goto prio_done;
++ }
++
++ if (entry->priority < e->priority) {
++ n = &pn->rb_left;
++ } else {
++ n = &pn->rb_right;
++ highest = false;
++ }
++ }
++
++ /* cache highest-priority entry */
++ if (highest)
++ queue->msg_prio_highest = &entry->prio_node;
++
++ /* new node for this priority */
++ rb_link_node(&entry->prio_node, pn, n);
++ rb_insert_color(&entry->prio_node, &queue->msg_prio_queue);
++ INIT_LIST_HEAD(&entry->prio_entry);
++
++prio_done:
++ /* add to unsorted fifo list */
++ list_add_tail(&entry->entry, &queue->msg_list);
++}
++
++static void kdbus_queue_entry_unlink(struct kdbus_queue_entry *entry)
++{
++ struct kdbus_queue *queue = &entry->conn->queue;
++
++ lockdep_assert_held(&entry->conn->lock);
++ if (list_empty(&entry->entry))
++ return;
++
++ list_del_init(&entry->entry);
++
++ if (list_empty(&entry->prio_entry)) {
++ /*
++ * Single entry for this priority, update cached
++ * highest-priority entry, remove the tree node.
++ */
++ if (queue->msg_prio_highest == &entry->prio_node)
++ queue->msg_prio_highest = rb_next(&entry->prio_node);
++
++ rb_erase(&entry->prio_node, &queue->msg_prio_queue);
++ } else {
++ struct kdbus_queue_entry *q;
++
++ /*
++ * Multiple entries for this priority entry, get next one in
++ * the list. Update cached highest-priority entry, store the
++ * new one as the tree node.
++ */
++ q = list_first_entry(&entry->prio_entry,
++ struct kdbus_queue_entry, prio_entry);
++ list_del(&entry->prio_entry);
++
++ if (queue->msg_prio_highest == &entry->prio_node)
++ queue->msg_prio_highest = &q->prio_node;
++
++ rb_replace_node(&entry->prio_node, &q->prio_node,
++ &queue->msg_prio_queue);
++ }
++}
++
++/**
++ * kdbus_queue_entry_new() - allocate a queue entry
++ * @src: source connection, or NULL
++ * @dst: destination connection
++ * @s: staging object carrying the message
++ *
++ * Allocates a queue entry based on a given msg and allocates space for
++ * the message payload and the requested metadata in the connection's pool.
++ * The entry is not actually added to the queue's lists at this point.
++ *
++ * Return: the allocated entry on success, or an ERR_PTR on failure.
++ */
++struct kdbus_queue_entry *kdbus_queue_entry_new(struct kdbus_conn *src,
++ struct kdbus_conn *dst,
++ struct kdbus_staging *s)
++{
++ struct kdbus_queue_entry *entry;
++ int ret;
++
++ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
++ if (!entry)
++ return ERR_PTR(-ENOMEM);
++
++ INIT_LIST_HEAD(&entry->entry);
++ entry->priority = s->msg->priority;
++ entry->conn = kdbus_conn_ref(dst);
++ entry->gaps = kdbus_gaps_ref(s->gaps);
++
++ entry->slice = kdbus_staging_emit(s, src, dst);
++ if (IS_ERR(entry->slice)) {
++ ret = PTR_ERR(entry->slice);
++ entry->slice = NULL;
++ goto error;
++ }
++
++ entry->user = src ? kdbus_user_ref(src->user) : NULL;
++ return entry;
++
++error:
++ kdbus_queue_entry_free(entry);
++ return ERR_PTR(ret);
++}
++
++/**
++ * kdbus_queue_entry_free() - free resources of an entry
++ * @entry: The entry to free
++ *
++ * Removes resources allocated by a queue entry, along with the entry itself.
++ * Note that the entry's slice is not freed at this point.
++ */
++void kdbus_queue_entry_free(struct kdbus_queue_entry *entry)
++{
++ if (!entry)
++ return;
++
++ lockdep_assert_held(&entry->conn->lock);
++
++ kdbus_queue_entry_unlink(entry);
++ kdbus_reply_unref(entry->reply);
++
++ if (entry->slice) {
++ kdbus_conn_quota_dec(entry->conn, entry->user,
++ kdbus_pool_slice_size(entry->slice),
++ entry->gaps ? entry->gaps->n_fds : 0);
++ kdbus_pool_slice_release(entry->slice);
++ }
++
++ kdbus_user_unref(entry->user);
++ kdbus_gaps_unref(entry->gaps);
++ kdbus_conn_unref(entry->conn);
++ kfree(entry);
++}
++
++/**
++ * kdbus_queue_entry_install() - install message components into the
++ * receiver's process
++ * @entry: The queue entry to install
++ * @return_flags: Pointer to store the return flags for userspace
++ * @install_fds: Whether or not to install associated file descriptors
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_queue_entry_install(struct kdbus_queue_entry *entry,
++ u64 *return_flags, bool install_fds)
++{
++ bool incomplete_fds = false;
++ int ret;
++
++ lockdep_assert_held(&entry->conn->lock);
++
++ ret = kdbus_gaps_install(entry->gaps, entry->slice, &incomplete_fds);
++ if (ret < 0)
++ return ret;
++
++ if (incomplete_fds)
++ *return_flags |= KDBUS_RECV_RETURN_INCOMPLETE_FDS;
++ return 0;
++}
++
++/**
++ * kdbus_queue_entry_enqueue() - enqueue an entry
++ * @entry: entry to enqueue
++ * @reply: reply to link to this entry (or NULL if none)
++ *
++ * This enqueues an unqueued entry into the message queue of the linked
++ * connection. It also binds a reply object to the entry so we can remember it
++ * when the message is moved.
++ *
++ * Once this call returns (and the connection lock is released), this entry can
++ * be dequeued by the target connection. Note that the entry will not be removed
++ * from the queue until it is destroyed.
++ */
++void kdbus_queue_entry_enqueue(struct kdbus_queue_entry *entry,
++ struct kdbus_reply *reply)
++{
++ lockdep_assert_held(&entry->conn->lock);
++
++ if (WARN_ON(entry->reply) || WARN_ON(!list_empty(&entry->entry)))
++ return;
++
++ entry->reply = kdbus_reply_ref(reply);
++ kdbus_queue_entry_link(entry);
++}
++
++/**
++ * kdbus_queue_entry_move() - move queue entry
++ * @e: queue entry to move
++ * @dst: destination connection to queue the entry on
++ *
++ * This moves a queue entry onto a different connection. It allocates a new
++ * slice on the target connection and copies the message over. If the copy
++ * succeeded, we move the entry from its current connection to @dst.
++ *
++ * On failure, the entry is left untouched.
++ *
++ * The queue entry must be queued right now, and after the call succeeds it will
++ * be queued on the destination, but no longer on the source.
++ *
++ * The caller must hold the connection lock of the source *and* destination.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_queue_entry_move(struct kdbus_queue_entry *e,
++ struct kdbus_conn *dst)
++{
++ struct kdbus_pool_slice *slice = NULL;
++ struct kdbus_conn *src = e->conn;
++ size_t size, fds;
++ int ret;
++
++ lockdep_assert_held(&src->lock);
++ lockdep_assert_held(&dst->lock);
++
++ if (WARN_ON(list_empty(&e->entry)))
++ return -EINVAL;
++ if (src == dst)
++ return 0;
++
++ size = kdbus_pool_slice_size(e->slice);
++ fds = e->gaps ? e->gaps->n_fds : 0;
++
++ ret = kdbus_conn_quota_inc(dst, e->user, size, fds);
++ if (ret < 0)
++ return ret;
++
++ slice = kdbus_pool_slice_alloc(dst->pool, size, true);
++ if (IS_ERR(slice)) {
++ ret = PTR_ERR(slice);
++ slice = NULL;
++ goto error;
++ }
++
++ ret = kdbus_pool_slice_copy(slice, e->slice);
++ if (ret < 0)
++ goto error;
++
++ kdbus_queue_entry_unlink(e);
++ kdbus_conn_quota_dec(src, e->user, size, fds);
++ kdbus_pool_slice_release(e->slice);
++ kdbus_conn_unref(e->conn);
++
++ e->slice = slice;
++ e->conn = kdbus_conn_ref(dst);
++ kdbus_queue_entry_link(e);
++
++ return 0;
++
++error:
++ kdbus_pool_slice_release(slice);
++ kdbus_conn_quota_dec(dst, e->user, size, fds);
++ return ret;
++}
+diff --git a/ipc/kdbus/queue.h b/ipc/kdbus/queue.h
+new file mode 100644
+index 0000000..bf686d1
+--- /dev/null
++++ b/ipc/kdbus/queue.h
+@@ -0,0 +1,84 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_QUEUE_H
++#define __KDBUS_QUEUE_H
++
++#include <linux/list.h>
++#include <linux/rbtree.h>
++
++struct kdbus_conn;
++struct kdbus_pool_slice;
++struct kdbus_reply;
++struct kdbus_staging;
++struct kdbus_user;
++
++/**
++ * struct kdbus_queue - a connection's message queue
++ * @msg_list: List head for kdbus_queue_entry objects
++ * @msg_prio_queue: RB tree root for messages, sorted by priority
++ * @msg_prio_highest: Link to the RB node referencing the message with the
++ * highest priority in the tree.
++ */
++struct kdbus_queue {
++ struct list_head msg_list;
++ struct rb_root msg_prio_queue;
++ struct rb_node *msg_prio_highest;
++};
++
++/**
++ * struct kdbus_queue_entry - messages waiting to be read
++ * @entry: Entry in the connection's list
++ * @prio_node: Entry in the priority queue tree
++ * @prio_entry: Queue tree node entry in the list of one priority
++ * @priority: Message priority
++ * @dst_name_id: The sequence number of the name this message is
++ * addressed to, 0 for messages sent to an ID
++ * @conn: Connection this entry is queued on
++ * @gaps: Gaps object to fill message gaps at RECV time
++ * @user: User used for accounting
++ * @slice: Slice in the receiver's pool for the message
++ * @reply: The reply block if a reply to this message is expected
++ */
++struct kdbus_queue_entry {
++ struct list_head entry;
++ struct rb_node prio_node;
++ struct list_head prio_entry;
++
++ s64 priority;
++ u64 dst_name_id;
++
++ struct kdbus_conn *conn;
++ struct kdbus_gaps *gaps;
++ struct kdbus_user *user;
++ struct kdbus_pool_slice *slice;
++ struct kdbus_reply *reply;
++};
++
++void kdbus_queue_init(struct kdbus_queue *queue);
++struct kdbus_queue_entry *kdbus_queue_peek(struct kdbus_queue *queue,
++ s64 priority, bool use_priority);
++
++struct kdbus_queue_entry *kdbus_queue_entry_new(struct kdbus_conn *src,
++ struct kdbus_conn *dst,
++ struct kdbus_staging *s);
++void kdbus_queue_entry_free(struct kdbus_queue_entry *entry);
++int kdbus_queue_entry_install(struct kdbus_queue_entry *entry,
++ u64 *return_flags, bool install_fds);
++void kdbus_queue_entry_enqueue(struct kdbus_queue_entry *entry,
++ struct kdbus_reply *reply);
++int kdbus_queue_entry_move(struct kdbus_queue_entry *entry,
++ struct kdbus_conn *dst);
++
++#endif /* __KDBUS_QUEUE_H */
+diff --git a/ipc/kdbus/reply.c b/ipc/kdbus/reply.c
+new file mode 100644
+index 0000000..e6791d8
+--- /dev/null
++++ b/ipc/kdbus/reply.c
+@@ -0,0 +1,252 @@
++#include <linux/init.h>
++#include <linux/mm.h>
++#include <linux/module.h>
++#include <linux/mutex.h>
++#include <linux/slab.h>
++#include <linux/uio.h>
++
++#include "bus.h"
++#include "connection.h"
++#include "endpoint.h"
++#include "message.h"
++#include "metadata.h"
++#include "names.h"
++#include "domain.h"
++#include "item.h"
++#include "notify.h"
++#include "policy.h"
++#include "reply.h"
++#include "util.h"
++
++/**
++ * kdbus_reply_new() - Allocate and set up a new kdbus_reply object
++ * @reply_src: The connection a reply is expected from
++ * @reply_dst: The connection this reply object belongs to
++ * @msg: Message associated with the reply
++ * @name_entry: Name entry used to send the message
++ * @sync: Whether or not to make this reply synchronous
++ *
++ * Allocate and fill a new kdbus_reply object.
++ *
++ * Return: New kdbus_reply object on success, ERR_PTR on error.
++ */
++struct kdbus_reply *kdbus_reply_new(struct kdbus_conn *reply_src,
++ struct kdbus_conn *reply_dst,
++ const struct kdbus_msg *msg,
++ struct kdbus_name_entry *name_entry,
++ bool sync)
++{
++ struct kdbus_reply *r;
++ int ret;
++
++ if (atomic_inc_return(&reply_dst->request_count) >
++ KDBUS_CONN_MAX_REQUESTS_PENDING) {
++ ret = -EMLINK;
++ goto exit_dec_request_count;
++ }
++
++ r = kzalloc(sizeof(*r), GFP_KERNEL);
++ if (!r) {
++ ret = -ENOMEM;
++ goto exit_dec_request_count;
++ }
++
++ kref_init(&r->kref);
++ INIT_LIST_HEAD(&r->entry);
++ r->reply_src = kdbus_conn_ref(reply_src);
++ r->reply_dst = kdbus_conn_ref(reply_dst);
++ r->cookie = msg->cookie;
++ r->name_id = name_entry ? name_entry->name_id : 0;
++ r->deadline_ns = msg->timeout_ns;
++
++ if (sync) {
++ r->sync = true;
++ r->waiting = true;
++ }
++
++ return r;
++
++exit_dec_request_count:
++ atomic_dec(&reply_dst->request_count);
++ return ERR_PTR(ret);
++}
++
++static void __kdbus_reply_free(struct kref *kref)
++{
++ struct kdbus_reply *reply =
++ container_of(kref, struct kdbus_reply, kref);
++
++ atomic_dec(&reply->reply_dst->request_count);
++ kdbus_conn_unref(reply->reply_src);
++ kdbus_conn_unref(reply->reply_dst);
++ kfree(reply);
++}
++
++/**
++ * kdbus_reply_ref() - Increase reference on kdbus_reply
++ * @r: The reply, may be %NULL
++ *
++ * Return: The reply object with an extra reference
++ */
++struct kdbus_reply *kdbus_reply_ref(struct kdbus_reply *r)
++{
++ if (r)
++ kref_get(&r->kref);
++ return r;
++}
++
++/**
++ * kdbus_reply_unref() - Decrease reference on kdbus_reply
++ * @r: The reply, may be %NULL
++ *
++ * Return: NULL
++ */
++struct kdbus_reply *kdbus_reply_unref(struct kdbus_reply *r)
++{
++ if (r)
++ kref_put(&r->kref, __kdbus_reply_free);
++ return NULL;
++}
++
++/**
++ * kdbus_reply_link() - Link reply object into target connection
++ * @r: Reply to link
++ */
++void kdbus_reply_link(struct kdbus_reply *r)
++{
++ if (WARN_ON(!list_empty(&r->entry)))
++ return;
++
++ list_add(&r->entry, &r->reply_dst->reply_list);
++ kdbus_reply_ref(r);
++}
++
++/**
++ * kdbus_reply_unlink() - Unlink reply object from target connection
++ * @r: Reply to unlink
++ */
++void kdbus_reply_unlink(struct kdbus_reply *r)
++{
++ if (!list_empty(&r->entry)) {
++ list_del_init(&r->entry);
++ kdbus_reply_unref(r);
++ }
++}
++
++/**
++ * kdbus_sync_reply_wakeup() - Wake a synchronously blocking reply
++ * @reply: The reply object
++ * @err: Error code to set on the remote side
++ *
++ * Wake up remote peer (method origin) with the appropriate synchronous reply
++ * code.
++ */
++void kdbus_sync_reply_wakeup(struct kdbus_reply *reply, int err)
++{
++ if (WARN_ON(!reply->sync))
++ return;
++
++ reply->waiting = false;
++ reply->err = err;
++ wake_up_interruptible(&reply->reply_dst->wait);
++}
++
++/**
++ * kdbus_reply_find() - Find the corresponding reply object
++ * @replying: The replying connection or NULL
++ * @reply_dst: The connection the reply will be sent to
++ * (method origin)
++ * @cookie: The cookie of the requesting message
++ *
++ * Lookup a reply object that should be sent as a reply by
++ * @replying to @reply_dst with the given cookie.
++ *
++ * Callers must take the @reply_dst lock.
++ *
++ * Return: the corresponding reply object or NULL if not found
++ */
++struct kdbus_reply *kdbus_reply_find(struct kdbus_conn *replying,
++ struct kdbus_conn *reply_dst,
++ u64 cookie)
++{
++ struct kdbus_reply *r;
++
++ list_for_each_entry(r, &reply_dst->reply_list, entry) {
++ if (r->cookie == cookie &&
++ (!replying || r->reply_src == replying))
++ return r;
++ }
++
++ return NULL;
++}
++
++/**
++ * kdbus_reply_list_scan_work() - Worker callback to scan the replies of a
++ * connection for exceeded timeouts
++ * @work: Work struct of the connection to scan
++ *
++ * Walk the list of replies stored with a connection and look for entries
++ * that have exceeded their timeout. If such an entry is found, a timeout
++ * notification is sent to the waiting peer, and the reply is removed from
++ * the list.
++ *
++ * The work is rescheduled to the nearest timeout found during the list
++ * iteration.
++ */
++void kdbus_reply_list_scan_work(struct work_struct *work)
++{
++ struct kdbus_conn *conn =
++ container_of(work, struct kdbus_conn, work.work);
++ struct kdbus_reply *reply, *reply_tmp;
++ u64 deadline = ~0ULL;
++ u64 now;
++
++ now = ktime_get_ns();
++
++ mutex_lock(&conn->lock);
++ if (!kdbus_conn_active(conn)) {
++ mutex_unlock(&conn->lock);
++ return;
++ }
++
++ list_for_each_entry_safe(reply, reply_tmp, &conn->reply_list, entry) {
++ /*
++ * If the reply block is waiting for synchronous I/O,
++ * the timeout is handled by wait_event_*_timeout(),
++ * so we don't have to care for it here.
++ */
++ if (reply->sync && !reply->interrupted)
++ continue;
++
++ WARN_ON(reply->reply_dst != conn);
++
++ if (reply->deadline_ns > now) {
++ /* remember next timeout */
++ if (deadline > reply->deadline_ns)
++ deadline = reply->deadline_ns;
++
++ continue;
++ }
++
++ /*
++ * A zero deadline means the connection died, was
++ * cleaned up already and the notification was sent.
++ * Don't send notifications for reply trackers that were
++ * left in an interrupted syscall state.
++ */
++ if (reply->deadline_ns != 0 && !reply->interrupted)
++ kdbus_notify_reply_timeout(conn->ep->bus, conn->id,
++ reply->cookie);
++
++ kdbus_reply_unlink(reply);
++ }
++
++ /* rearm delayed work with next timeout */
++ if (deadline != ~0ULL)
++ schedule_delayed_work(&conn->work,
++ nsecs_to_jiffies(deadline - now));
++
++ mutex_unlock(&conn->lock);
++
++ kdbus_notify_flush(conn->ep->bus);
++}
+diff --git a/ipc/kdbus/reply.h b/ipc/kdbus/reply.h
+new file mode 100644
+index 0000000..68d5232
+--- /dev/null
++++ b/ipc/kdbus/reply.h
+@@ -0,0 +1,68 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_REPLY_H
++#define __KDBUS_REPLY_H
++
++/**
++ * struct kdbus_reply - an entry of kdbus_conn's list of replies
++ * @kref: Ref-count of this object
++ * @entry: The entry of the connection's reply_list
++ * @reply_src: The connection the reply will be sent from
++ * @reply_dst: The connection the reply will be sent to
++ * @queue_entry: The queue entry item that is prepared by the replying
++ * connection
++ * @deadline_ns: The deadline of the reply, in nanoseconds
++ * @cookie: The cookie of the requesting message
++ * @name_id: ID of the well-known name the original msg was sent to
++ * @sync: The reply block is waiting for synchronous I/O
++ * @waiting: The condition to synchronously wait for
++ * @interrupted: The sync reply was left in an interrupted state
++ * @err: The error code for the synchronous reply
++ */
++struct kdbus_reply {
++ struct kref kref;
++ struct list_head entry;
++ struct kdbus_conn *reply_src;
++ struct kdbus_conn *reply_dst;
++ struct kdbus_queue_entry *queue_entry;
++ u64 deadline_ns;
++ u64 cookie;
++ u64 name_id;
++ bool sync:1;
++ bool waiting:1;
++ bool interrupted:1;
++ int err;
++};
++
++struct kdbus_reply *kdbus_reply_new(struct kdbus_conn *reply_src,
++ struct kdbus_conn *reply_dst,
++ const struct kdbus_msg *msg,
++ struct kdbus_name_entry *name_entry,
++ bool sync);
++
++struct kdbus_reply *kdbus_reply_ref(struct kdbus_reply *r);
++struct kdbus_reply *kdbus_reply_unref(struct kdbus_reply *r);
++
++void kdbus_reply_link(struct kdbus_reply *r);
++void kdbus_reply_unlink(struct kdbus_reply *r);
++
++struct kdbus_reply *kdbus_reply_find(struct kdbus_conn *replying,
++ struct kdbus_conn *reply_dst,
++ u64 cookie);
++
++void kdbus_sync_reply_wakeup(struct kdbus_reply *reply, int err);
++void kdbus_reply_list_scan_work(struct work_struct *work);
++
++#endif /* __KDBUS_REPLY_H */
+diff --git a/ipc/kdbus/util.c b/ipc/kdbus/util.c
+new file mode 100644
+index 0000000..72b1883
+--- /dev/null
++++ b/ipc/kdbus/util.c
+@@ -0,0 +1,156 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <linux/capability.h>
++#include <linux/cred.h>
++#include <linux/ctype.h>
++#include <linux/err.h>
++#include <linux/file.h>
++#include <linux/slab.h>
++#include <linux/string.h>
++#include <linux/uaccess.h>
++#include <linux/uio.h>
++#include <linux/user_namespace.h>
++
++#include "limits.h"
++#include "util.h"
++
++/**
++ * kdbus_copy_from_user() - copy aligned data from user-space
++ * @dest: target buffer in kernel memory
++ * @user_ptr: user-provided source buffer
++ * @size: memory size to copy from user
++ *
++ * This copies @size bytes from @user_ptr into the kernel, just like
++ * copy_from_user() does. But we enforce an 8-byte alignment and reject any
++ * unaligned user-space pointers.
++ *
++ * Return: 0 on success, negative error code on failure.
++ */
++int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size)
++{
++ if (!KDBUS_IS_ALIGNED8((uintptr_t)user_ptr))
++ return -EFAULT;
++
++ if (copy_from_user(dest, user_ptr, size))
++ return -EFAULT;
++
++ return 0;
++}
++
++/**
++ * kdbus_verify_uid_prefix() - verify UID prefix of a user-supplied name
++ * @name: user-supplied name to verify
++ * @user_ns: user-namespace to act in
++ * @kuid: Kernel internal uid of user
++ *
++ * This verifies that the user-supplied name @name has their UID as prefix. This
++ * is the default name-spacing policy we enforce on user-supplied names for
++ * public kdbus entities like buses and endpoints.
++ *
++ * The user must supply names prefixed with "<UID>-", where the UID is
++ * interpreted in the user-namespace of the domain. If the user fails to supply
++ * such a prefixed name, we reject it.
++ *
++ * Return: 0 on success, negative error code on failure
++ */
++int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
++ kuid_t kuid)
++{
++ uid_t uid;
++ char prefix[16];
++
++ /*
++ * The kuid must have a mapping into the userns of the domain
++ * otherwise do not allow creation of buses nor endpoints.
++ */
++ uid = from_kuid(user_ns, kuid);
++ if (uid == (uid_t) -1)
++ return -EINVAL;
++
++ snprintf(prefix, sizeof(prefix), "%u-", uid);
++ if (strncmp(name, prefix, strlen(prefix)) != 0)
++ return -EINVAL;
++
++ return 0;
++}
++
++/**
++ * kdbus_sanitize_attach_flags() - Sanitize attach flags from user-space
++ * @flags: Attach flags provided by userspace
++ * @attach_flags: A pointer where to store the valid attach flags
++ *
++ * Convert attach-flags provided by user-space into a valid mask. If the mask
++ * is invalid, an error is returned. The sanitized attach flags are stored in
++ * the output parameter.
++ *
++ * Return: 0 on success, negative error on failure.
++ */
++int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags)
++{
++ /* 'any' degrades to 'all' for compatibility */
++ if (flags == _KDBUS_ATTACH_ANY)
++ flags = _KDBUS_ATTACH_ALL;
++
++ /* reject unknown attach flags */
++ if (flags & ~_KDBUS_ATTACH_ALL)
++ return -EINVAL;
++
++ *attach_flags = flags;
++ return 0;
++}
++
++/**
++ * kdbus_kvec_set - helper utility to assemble kvec arrays
++ * @kvec: kvec entry to use
++ * @src: Source address to set in @kvec
++ * @len: Number of bytes in @src
++ * @total_len: Pointer to total length variable
++ *
++ * Set @src and @len in @kvec, and increase @total_len by @len.
++ */
++void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len)
++{
++ kvec->iov_base = src;
++ kvec->iov_len = len;
++ *total_len += len;
++}
++
++static const char * const zeros = "\0\0\0\0\0\0\0";
++
++/**
++ * kdbus_kvec_pad - conditionally write a padding kvec
++ * @kvec: kvec entry to use
++ * @len: Total length used for kvec array
++ *
++ * Check if the current total byte length of the array in @len is aligned to
++ * 8 bytes. If it isn't, fill @kvec with padding information and increase @len
++ * by the number of bytes stored in @kvec.
++ *
++ * Return: the number of added padding bytes.
++ */
++size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len)
++{
++ size_t pad = KDBUS_ALIGN8(*len) - *len;
++
++ if (!pad)
++ return 0;
++
++ kvec->iov_base = (void *)zeros;
++ kvec->iov_len = pad;
++
++ *len += pad;
++
++ return pad;
++}
+diff --git a/ipc/kdbus/util.h b/ipc/kdbus/util.h
+new file mode 100644
+index 0000000..5297166
+--- /dev/null
++++ b/ipc/kdbus/util.h
+@@ -0,0 +1,73 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
++ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ * Copyright (C) 2013-2015 Linux Foundation
++ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#ifndef __KDBUS_UTIL_H
++#define __KDBUS_UTIL_H
++
++#include <linux/dcache.h>
++#include <linux/ioctl.h>
++
++#include <uapi/linux/kdbus.h>
++
++/* all exported addresses are 64 bit */
++#define KDBUS_PTR(addr) ((void __user *)(uintptr_t)(addr))
++
++/* all exported sizes are 64 bit and data aligned to 64 bit */
++#define KDBUS_ALIGN8(s) ALIGN((s), 8)
++#define KDBUS_IS_ALIGNED8(s) (IS_ALIGNED(s, 8))
++
++/**
++ * kdbus_member_set_user - write a structure member to user memory
++ * @_s: Variable to copy from
++ * @_b: Buffer to write to
++ * @_t: Structure type
++ * @_m: Member name in the passed structure
++ *
++ * Return: the result of copy_to_user()
++ */
++#define kdbus_member_set_user(_s, _b, _t, _m) \
++({ \
++ u64 __user *_sz = \
++ (void __user *)((u8 __user *)(_b) + offsetof(_t, _m)); \
++ copy_to_user(_sz, _s, FIELD_SIZEOF(_t, _m)); \
++})
++
++/**
++ * kdbus_strhash - calculate a hash
++ * @str: String
++ *
++ * Return: hash value
++ */
++static inline unsigned int kdbus_strhash(const char *str)
++{
++ unsigned long hash = init_name_hash();
++
++ while (*str)
++ hash = partial_name_hash(*str++, hash);
++
++ return end_name_hash(hash);
++}
++
++int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
++ kuid_t kuid);
++int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags);
++
++int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size);
++
++struct kvec;
++
++void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len);
++size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len);
++
++#endif
+diff --git a/samples/Kconfig b/samples/Kconfig
+index 224ebb4..a4c6b2f 100644
+--- a/samples/Kconfig
++++ b/samples/Kconfig
+@@ -55,6 +55,13 @@ config SAMPLE_KDB
+ Build an example of how to dynamically add the hello
+ command to the kdb shell.
+
++config SAMPLE_KDBUS
++ bool "Build kdbus API example"
++ depends on KDBUS
++ help
++ Build an example of how the kdbus API can be used from
++ userspace.
++
+ config SAMPLE_RPMSG_CLIENT
+ tristate "Build rpmsg client sample -- loadable modules only"
+ depends on RPMSG && m
+diff --git a/samples/Makefile b/samples/Makefile
+index f00257b..f0ad51e 100644
+--- a/samples/Makefile
++++ b/samples/Makefile
+@@ -1,4 +1,5 @@
+ # Makefile for Linux samples code
+
+ obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ trace_events/ livepatch/ \
+- hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/
++ hw_breakpoint/ kfifo/ kdb/ kdbus/ hidraw/ rpmsg/ \
++ seccomp/
+diff --git a/samples/kdbus/.gitignore b/samples/kdbus/.gitignore
+new file mode 100644
+index 0000000..ee07d98
+--- /dev/null
++++ b/samples/kdbus/.gitignore
+@@ -0,0 +1 @@
++kdbus-workers
+diff --git a/samples/kdbus/Makefile b/samples/kdbus/Makefile
+new file mode 100644
+index 0000000..137f842
+--- /dev/null
++++ b/samples/kdbus/Makefile
+@@ -0,0 +1,9 @@
++# kbuild trick to avoid linker error. Can be omitted if a module is built.
++obj- := dummy.o
++
++hostprogs-$(CONFIG_SAMPLE_KDBUS) += kdbus-workers
++
++always := $(hostprogs-y)
++
++HOSTCFLAGS_kdbus-workers.o += -I$(objtree)/usr/include
++HOSTLOADLIBES_kdbus-workers := -lrt
+diff --git a/samples/kdbus/kdbus-api.h b/samples/kdbus/kdbus-api.h
+new file mode 100644
+index 0000000..7f3abae
+--- /dev/null
++++ b/samples/kdbus/kdbus-api.h
+@@ -0,0 +1,114 @@
++#ifndef KDBUS_API_H
++#define KDBUS_API_H
++
++#include <sys/ioctl.h>
++#include <linux/kdbus.h>
++
++#define KDBUS_ALIGN8(l) (((l) + 7) & ~7)
++#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
++#define KDBUS_ITEM_SIZE(s) KDBUS_ALIGN8((s) + KDBUS_ITEM_HEADER_SIZE)
++#define KDBUS_ITEM_NEXT(item) \
++ (typeof(item))((uint8_t *)(item) + KDBUS_ALIGN8((item)->size))
++#define KDBUS_FOREACH(iter, first, _size) \
++ for ((iter) = (first); \
++ ((uint8_t *)(iter) < (uint8_t *)(first) + (_size)) && \
++ ((uint8_t *)(iter) >= (uint8_t *)(first)); \
++ (iter) = (void *)((uint8_t *)(iter) + KDBUS_ALIGN8((iter)->size)))
++
++static inline int kdbus_cmd_bus_make(int control_fd, struct kdbus_cmd *cmd)
++{
++ int ret = ioctl(control_fd, KDBUS_CMD_BUS_MAKE, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_endpoint_make(int bus_fd, struct kdbus_cmd *cmd)
++{
++ int ret = ioctl(bus_fd, KDBUS_CMD_ENDPOINT_MAKE, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_endpoint_update(int ep_fd, struct kdbus_cmd *cmd)
++{
++ int ret = ioctl(ep_fd, KDBUS_CMD_ENDPOINT_UPDATE, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_hello(int bus_fd, struct kdbus_cmd_hello *cmd)
++{
++ int ret = ioctl(bus_fd, KDBUS_CMD_HELLO, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_update(int fd, struct kdbus_cmd *cmd)
++{
++ int ret = ioctl(fd, KDBUS_CMD_UPDATE, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_byebye(int conn_fd, struct kdbus_cmd *cmd)
++{
++ int ret = ioctl(conn_fd, KDBUS_CMD_BYEBYE, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_free(int conn_fd, struct kdbus_cmd_free *cmd)
++{
++ int ret = ioctl(conn_fd, KDBUS_CMD_FREE, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_conn_info(int conn_fd, struct kdbus_cmd_info *cmd)
++{
++ int ret = ioctl(conn_fd, KDBUS_CMD_CONN_INFO, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_bus_creator_info(int conn_fd, struct kdbus_cmd_info *cmd)
++{
++ int ret = ioctl(conn_fd, KDBUS_CMD_BUS_CREATOR_INFO, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_list(int fd, struct kdbus_cmd_list *cmd)
++{
++ int ret = ioctl(fd, KDBUS_CMD_LIST, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_send(int conn_fd, struct kdbus_cmd_send *cmd)
++{
++ int ret = ioctl(conn_fd, KDBUS_CMD_SEND, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_recv(int conn_fd, struct kdbus_cmd_recv *cmd)
++{
++ int ret = ioctl(conn_fd, KDBUS_CMD_RECV, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_name_acquire(int conn_fd, struct kdbus_cmd *cmd)
++{
++ int ret = ioctl(conn_fd, KDBUS_CMD_NAME_ACQUIRE, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_name_release(int conn_fd, struct kdbus_cmd *cmd)
++{
++ int ret = ioctl(conn_fd, KDBUS_CMD_NAME_RELEASE, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_match_add(int conn_fd, struct kdbus_cmd_match *cmd)
++{
++ int ret = ioctl(conn_fd, KDBUS_CMD_MATCH_ADD, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++static inline int kdbus_cmd_match_remove(int conn_fd, struct kdbus_cmd_match *cmd)
++{
++ int ret = ioctl(conn_fd, KDBUS_CMD_MATCH_REMOVE, cmd);
++ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
++}
++
++#endif /* KDBUS_API_H */
+diff --git a/samples/kdbus/kdbus-workers.c b/samples/kdbus/kdbus-workers.c
+new file mode 100644
+index 0000000..5a6dfdc
+--- /dev/null
++++ b/samples/kdbus/kdbus-workers.c
+@@ -0,0 +1,1346 @@
++/*
++ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++/*
++ * Example: Workers
++ * This program computes prime-numbers based on the sieve of Eratosthenes. The
++ * master sets up a shared memory region and spawns workers which clear out the
++ * non-primes. The master reacts to keyboard input and to client-requests to
++ * control what each worker does. Note that this is in no way meant as an
++ * efficient way to compute primes. It should only serve as an example of how a
++ * master/worker concept can be implemented with kdbus for control messages.
++ *
++ * The main process is called the 'master'. It creates a new, private bus which
++ * will be used between the master and its workers to communicate. The master
++ * then spawns a fixed number of workers. Whenever a worker dies (detected via
++ * SIGCHLD), the master spawns a new worker. When done, the master waits for all
++ * workers to exit, prints a status report and exits itself.
++ *
++ * The master process does *not* keep track of its workers. Instead, this
++ * example implements a PULL model. That is, the master acquires a well-known
++ * name on the bus which each worker uses to request tasks from the master. If
++ * there are no more tasks, the master will return an empty task-list, which
++ * causes a worker to exit immediately.
++ *
++ * As tasks can be computationally expensive, we support cancellation. Whenever
++ * the master process is interrupted, it will drop its well-known name on the
++ * bus. This causes kdbus to broadcast a name-change notification. The workers
++ * check for broadcast messages regularly and will exit if they receive one.
++ *
++ * This example consists of 4 objects:
++ * * master: The master object contains the context of the master process. This
++ * process manages the prime-context, spawns workers and assigns
++ * prime-ranges to each worker to compute.
++ * The master does not do any prime-computations itself.
++ * * child: The child object contains the context of a worker. It inherits the
++ * prime context from its parent (the master) and then creates a new
++ * bus context to request prime-ranges to compute.
++ * * prime: The "prime" object is used to abstract how we compute primes. When
++ * allocated, it prepares a memory region to hold 1 bit for each
++ * natural number up to a fixed maximum ('MAX_PRIMES').
++ * The memory region is backed by a memfd which we share between
++ * processes. Each worker is assigned a range of natural numbers
++ * and marks their multiples as non-prime in the memory region. The
++ * master process is responsible for distributing all natural numbers
++ * up to the fixed maximum to its workers.
++ * * bus: The bus object is an abstraction of the kdbus API. It is pretty
++ * straightforward and only manages the connection-fd plus the
++ * memory-mapped pool in a single object.
++ *
++ * This example is in reversed order, which should make it easier to read
++ * top-down, but requires some forward-declarations. Just ignore those.
++ */
++
++#include <stdio.h>
++#include <stdlib.h>
++#include <sys/syscall.h>
++
++/* glibc < 2.7 does not ship sys/signalfd.h */
++/* we require kernels with __NR_memfd_create */
++#if __GLIBC__ >= 2 && __GLIBC_MINOR__ >= 7 && defined(__NR_memfd_create)
++
++#include <ctype.h>
++#include <errno.h>
++#include <fcntl.h>
++#include <linux/memfd.h>
++#include <signal.h>
++#include <stdbool.h>
++#include <stddef.h>
++#include <stdint.h>
++#include <string.h>
++#include <sys/mman.h>
++#include <sys/poll.h>
++#include <sys/signalfd.h>
++#include <sys/time.h>
++#include <sys/wait.h>
++#include <time.h>
++#include <unistd.h>
++#include "kdbus-api.h"
++
++/* FORWARD DECLARATIONS */
++
++#define POOL_SIZE (16 * 1024 * 1024)
++#define MAX_PRIMES (2UL << 24)
++#define WORKER_COUNT (16)
++#define PRIME_STEPS (65536 * 4)
++
++static const char *arg_busname = "example-workers";
++static const char *arg_modname = "kdbus";
++static const char *arg_master = "org.freedesktop.master";
++
++static int err_assert(int r_errno, const char *msg, const char *func, int line,
++ const char *file)
++{
++ r_errno = (r_errno != 0) ? -abs(r_errno) : -EFAULT;
++ if (r_errno < 0) {
++ errno = -r_errno;
++ fprintf(stderr, "ERR: %s: %m (%s:%d in %s)\n",
++ msg, func, line, file);
++ }
++ return r_errno;
++}
++
++#define err_r(_r, _msg) err_assert((_r), (_msg), __func__, __LINE__, __FILE__)
++#define err(_msg) err_r(errno, (_msg))
++
++struct prime;
++struct bus;
++struct master;
++struct child;
++
++struct prime {
++ int fd;
++ uint8_t *area;
++ size_t max;
++ size_t done;
++ size_t status;
++};
++
++static int prime_new(struct prime **out);
++static void prime_free(struct prime *p);
++static bool prime_done(struct prime *p);
++static void prime_consume(struct prime *p, size_t amount);
++static int prime_run(struct prime *p, struct bus *cancel, size_t number);
++static void prime_print(struct prime *p);
++
++struct bus {
++ int fd;
++ uint8_t *pool;
++};
++
++static int bus_open_connection(struct bus **out, uid_t uid, const char *name,
++ uint64_t recv_flags);
++static void bus_close_connection(struct bus *b);
++static void bus_poool_free_slice(struct bus *b, uint64_t offset);
++static int bus_acquire_name(struct bus *b, const char *name);
++static int bus_install_name_loss_match(struct bus *b, const char *name);
++static int bus_poll(struct bus *b);
++static int bus_make(uid_t uid, const char *name);
++
++struct master {
++ size_t n_workers;
++ size_t max_workers;
++
++ int signal_fd;
++ int control_fd;
++
++ struct prime *prime;
++ struct bus *bus;
++};
++
++static int master_new(struct master **out);
++static void master_free(struct master *m);
++static int master_run(struct master *m);
++static int master_poll(struct master *m);
++static int master_handle_stdin(struct master *m);
++static int master_handle_signal(struct master *m);
++static int master_handle_bus(struct master *m);
++static int master_reply(struct master *m, const struct kdbus_msg *msg);
++static int master_waitpid(struct master *m);
++static int master_spawn(struct master *m);
++
++struct child {
++ struct bus *bus;
++ struct prime *prime;
++};
++
++static int child_new(struct child **out, struct prime *p);
++static void child_free(struct child *c);
++static int child_run(struct child *c);
++
++/* END OF FORWARD DECLARATIONS */
++
++/*
++ * This is the main entrypoint of this example. It is pretty straightforward. We
++ * create a master object, run the computation, print a status report and then
++ * exit. Nothing particularly interesting here, so let's look into the master
++ * object...
++ */
++int main(int argc, char **argv)
++{
++ struct master *m = NULL;
++ int r;
++
++ r = master_new(&m);
++ if (r < 0)
++ goto out;
++
++ r = master_run(m);
++ if (r < 0)
++ goto out;
++
++ if (0)
++ prime_print(m->prime);
++
++out:
++ master_free(m);
++ if (r < 0 && r != -EINTR)
++ fprintf(stderr, "failed\n");
++ else
++ fprintf(stderr, "done\n");
++ return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
++}
++
++/*
++ * ...this will allocate a new master context. It keeps track of the current
++ * number of children/workers that are running, manages a signalfd to track
++ * SIGCHLD, and creates a private kdbus bus. Afterwards, it opens its connection
++ * to the bus and acquires a well-known name (arg_master).
++ */
++static int master_new(struct master **out)
++{
++ struct master *m;
++ sigset_t smask;
++ int r;
++
++ m = calloc(1, sizeof(*m));
++ if (!m)
++ return err("cannot allocate master");
++
++ m->max_workers = WORKER_COUNT;
++ m->signal_fd = -1;
++ m->control_fd = -1;
++
++ /* Block SIGINT and SIGCHLD signals */
++ sigemptyset(&smask);
++ sigaddset(&smask, SIGINT);
++ sigaddset(&smask, SIGCHLD);
++ sigprocmask(SIG_BLOCK, &smask, NULL);
++
++ m->signal_fd = signalfd(-1, &smask, SFD_CLOEXEC);
++ if (m->signal_fd < 0) {
++ r = err("cannot create signalfd");
++ goto error;
++ }
++
++ r = prime_new(&m->prime);
++ if (r < 0)
++ goto error;
++
++ m->control_fd = bus_make(getuid(), arg_busname);
++ if (m->control_fd < 0) {
++ r = m->control_fd;
++ goto error;
++ }
++
++ /*
++ * Open a bus connection for the master, and require each received
++ * message to have a metadata item of type KDBUS_ITEM_PIDS attached.
++ * The current UID is needed to compute the name of the bus node to
++ * connect to.
++ */
++ r = bus_open_connection(&m->bus, getuid(),
++ arg_busname, KDBUS_ATTACH_PIDS);
++ if (r < 0)
++ goto error;
++
++ /*
++ * Acquire a well-known name on the bus, so children can address
++ * messages to the master using KDBUS_DST_ID_NAME as destination-ID
++ * of messages.
++ */
++ r = bus_acquire_name(m->bus, arg_master);
++ if (r < 0)
++ goto error;
++
++ *out = m;
++ return 0;
++
++error:
++ master_free(m);
++ return r;
++}
++
++/* pretty straightforward destructor of a master object */
++static void master_free(struct master *m)
++{
++ if (!m)
++ return;
++
++ bus_close_connection(m->bus);
++ if (m->control_fd >= 0)
++ close(m->control_fd);
++ prime_free(m->prime);
++ if (m->signal_fd >= 0)
++ close(m->signal_fd);
++ free(m);
++}
++
++static int master_run(struct master *m)
++{
++ int res, r = 0;
++
++ while (!prime_done(m->prime)) {
++ while (m->n_workers < m->max_workers) {
++ r = master_spawn(m);
++ if (r < 0)
++ break;
++ }
++
++ r = master_poll(m);
++ if (r < 0)
++ break;
++ }
++
++ if (r < 0) {
++ bus_close_connection(m->bus);
++ m->bus = NULL;
++ }
++
++ while (m->n_workers > 0) {
++ res = master_poll(m);
++ if (res < 0) {
++ if (m->bus) {
++ bus_close_connection(m->bus);
++ m->bus = NULL;
++ }
++ r = res;
++ }
++ }
++
++ return r == -EINTR ? 0 : r;
++}
++
++static int master_poll(struct master *m)
++{
++ struct pollfd fds[3] = {};
++ int r = 0, n = 0;
++
++ /*
++ * Add stdin, the signalfd and the connection owner file descriptor to
++ * the pollfd table, and handle incoming traffic on the latter in
++ * master_handle_bus().
++ */
++ fds[n].fd = STDIN_FILENO;
++ fds[n++].events = POLLIN;
++ fds[n].fd = m->signal_fd;
++ fds[n++].events = POLLIN;
++ if (m->bus) {
++ fds[n].fd = m->bus->fd;
++ fds[n++].events = POLLIN;
++ }
++
++ r = poll(fds, n, -1);
++ if (r < 0)
++ return err("poll() failed");
++
++ if (fds[0].revents & POLLIN)
++ r = master_handle_stdin(m);
++ else if (fds[0].revents)
++ r = err("ERR/HUP on stdin");
++ if (r < 0)
++ return r;
++
++ if (fds[1].revents & POLLIN)
++ r = master_handle_signal(m);
++ else if (fds[1].revents)
++ r = err("ERR/HUP on signalfd");
++ if (r < 0)
++ return r;
++
++ if (fds[2].revents & POLLIN)
++ r = master_handle_bus(m);
++ else if (fds[2].revents)
++ r = err("ERR/HUP on bus");
++
++ return r;
++}
++
++static int master_handle_stdin(struct master *m)
++{
++ char buf[128];
++ ssize_t l;
++ int r = 0;
++
++ l = read(STDIN_FILENO, buf, sizeof(buf));
++ if (l < 0)
++ return err("cannot read stdin");
++ if (l == 0)
++ return err_r(-EINVAL, "EOF on stdin");
++
++ while (l-- > 0) {
++ switch (buf[l]) {
++ case 'q':
++ /* quit */
++ r = -EINTR;
++ break;
++ case '\n':
++ case ' ':
++ /* ignore */
++ break;
++ default:
++ if (isgraph(buf[l]))
++ fprintf(stderr, "invalid input '%c'\n", buf[l]);
++ else
++ fprintf(stderr, "invalid input 0x%x\n", buf[l]);
++ break;
++ }
++ }
++
++ return r;
++}
++
++static int master_handle_signal(struct master *m)
++{
++ struct signalfd_siginfo val;
++ ssize_t l;
++
++ l = read(m->signal_fd, &val, sizeof(val));
++ if (l < 0)
++ return err("cannot read signalfd");
++ if (l != sizeof(val))
++ return err_r(-EINVAL, "invalid data from signalfd");
++
++ switch (val.ssi_signo) {
++ case SIGCHLD:
++ return master_waitpid(m);
++ case SIGINT:
++ return err_r(-EINTR, "interrupted");
++ default:
++ return err_r(-EINVAL, "caught invalid signal");
++ }
++}
++
++static int master_handle_bus(struct master *m)
++{
++ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
++ const struct kdbus_msg *msg = NULL;
++ const struct kdbus_item *item;
++ const struct kdbus_vec *vec = NULL;
++ int r = 0;
++
++ /*
++ * To receive a message, the KDBUS_CMD_RECV ioctl is used.
++ * It takes an argument of type 'struct kdbus_cmd_recv', which
++ * will contain information on the received message when the call
++ * returns. See kdbus.message(7).
++ */
++ r = kdbus_cmd_recv(m->bus->fd, &recv);
++ /*
++ * EAGAIN is returned when there is no message waiting on this
++ * connection. This is not an error - simply bail out.
++ */
++ if (r == -EAGAIN)
++ return 0;
++ if (r < 0)
++ return err_r(r, "cannot receive message");
++
++ /*
++ * Messages received by a connection are stored inside the connection's
++ * pool, at an offset that has been returned in the 'recv' command
++ * struct above. The value describes the relative offset from the
++ * start address of the pool. A message is described with
++ * 'struct kdbus_msg'. See kdbus.message(7).
++ */
++ msg = (void *)(m->bus->pool + recv.msg.offset);
++
++ /*
++ * A message describes its actual payload in an array of items.
++ * KDBUS_FOREACH() is a simple iterator that walks such an array.
++ * struct kdbus_msg has a field to denote its total size, which is
++ * needed to determine the number of items in the array.
++ */
++ KDBUS_FOREACH(item, msg->items,
++ msg->size - offsetof(struct kdbus_msg, items)) {
++ /*
++ * An item of type PAYLOAD_OFF describes in-line memory
++ * stored in the pool at a described offset. That offset is
++ * relative to the start address of the message header.
++ * This example program only expects one single item of that
++ * type, remembers the struct kdbus_vec member of the item
++ * when it sees it, and bails out if there is more than one
++ * of them.
++ */
++ if (item->type == KDBUS_ITEM_PAYLOAD_OFF) {
++ if (vec) {
++ r = err_r(-EEXIST,
++ "message with multiple vecs");
++ break;
++ }
++ vec = &item->vec;
++ if (vec->size != 1) {
++ r = err_r(-EINVAL, "invalid message size");
++ break;
++ }
++
++ /*
++ * MEMFDs are transported as items of type PAYLOAD_MEMFD.
++ * If such an item is attached, a new file descriptor was
++ * installed into the task when KDBUS_CMD_RECV was called, and
++ * its number is stored in item->memfd.fd.
++ * Implementers *must* handle this item type and close the
++ * file descriptor when no longer needed in order to prevent
++ * file descriptor exhaustion. This example program just bails
++ * out with an error in this case, as memfds are not expected
++ * in this context.
++ */
++ } else if (item->type == KDBUS_ITEM_PAYLOAD_MEMFD) {
++ r = err_r(-EINVAL, "message with memfd");
++ break;
++ }
++ }
++ if (r < 0)
++ goto exit;
++ if (!vec) {
++ r = err_r(-EINVAL, "empty message");
++ goto exit;
++ }
++
++ switch (*((const uint8_t *)msg + vec->offset)) {
++ case 'r': {
++ r = master_reply(m, msg);
++ break;
++ }
++ default:
++ r = err_r(-EINVAL, "invalid message type");
++ break;
++ }
++
++exit:
++ /*
++ * We are done with the memory slice that was given to us through
++ * recv.msg.offset. Tell the kernel it can use it for other content
++ * in the future. See kdbus.pool(7).
++ */
++ bus_poool_free_slice(m->bus, recv.msg.offset);
++ return r;
++}
++
++static int master_reply(struct master *m, const struct kdbus_msg *msg)
++{
++ struct kdbus_cmd_send cmd;
++ struct kdbus_item *item;
++ struct kdbus_msg *reply;
++ size_t size, status, p[2];
++ int r;
++
++ /*
++ * This function sends a message over kdbus. To do this, it uses the
++ * KDBUS_CMD_SEND ioctl, which takes a command struct argument of type
++ * 'struct kdbus_cmd_send'. This struct stores a pointer to the actual
++ * message to send. See kdbus.message(7).
++ */
++ p[0] = m->prime->done;
++ p[1] = prime_done(m->prime) ? 0 : PRIME_STEPS;
++
++ size = sizeof(*reply);
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++ /* Prepare the message to send */
++ reply = alloca(size);
++ memset(reply, 0, size);
++ reply->size = size;
++
++ /* Each message has a cookie that can be used to send replies */
++ reply->cookie = 1;
++
++ /* The payload_type is arbitrary, but it must be non-zero */
++ reply->payload_type = 0xdeadbeef;
++
++ /*
++ * We are sending a reply. Let the kernel know the cookie of the
++ * message we are replying to.
++ */
++ reply->cookie_reply = msg->cookie;
++
++ /*
++ * Messages can either be directed to a well-known name (stored as
++ * string) or to a unique name (stored as number). This example does
++ * the latter. If the message were directed to a well-known name
++ * instead, the message's dst_id field would be set to
++ * KDBUS_DST_ID_NAME, and the name would be attached in an item of type
++ * KDBUS_ITEM_DST_NAME. See below for an example, and also refer to
++ * kdbus.message(7).
++ */
++ reply->dst_id = msg->src_id;
++
++ /* Our message has exactly one item to store its payload */
++ item = reply->items;
++ item->type = KDBUS_ITEM_PAYLOAD_VEC;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++ item->vec.address = (uintptr_t)p;
++ item->vec.size = sizeof(p);
++
++ /*
++ * Now prepare the command struct, and reference the message we want
++ * to send.
++ */
++ memset(&cmd, 0, sizeof(cmd));
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)reply;
++
++ /*
++ * Finally, employ the command on the connection owner
++ * file descriptor.
++ */
++ r = kdbus_cmd_send(m->bus->fd, &cmd);
++ if (r < 0)
++ return err_r(r, "cannot send reply");
++
++ if (p[1]) {
++ prime_consume(m->prime, p[1]);
++ status = m->prime->done * 10000 / m->prime->max;
++ if (status != m->prime->status) {
++ m->prime->status = status;
++ fprintf(stderr, "status: %7.3lf%%\n",
++ (double)status / 100);
++ }
++ }
++
++ return 0;
++}
++
++static int master_waitpid(struct master *m)
++{
++ pid_t pid;
++ int r = 0;
++
++ while ((pid = waitpid(-1, &r, WNOHANG)) > 0) {
++ if (m->n_workers > 0)
++ --m->n_workers;
++ if (!WIFEXITED(r))
++ r = err_r(-EINVAL, "child died unexpectedly");
++ else if (WEXITSTATUS(r) != 0)
++ r = err_r(-WEXITSTATUS(r), "child failed");
++ }
++
++ return r;
++}
++
++static int master_spawn(struct master *m)
++{
++ struct child *c = NULL;
++ struct prime *p = NULL;
++ pid_t pid;
++ int r;
++
++ /* Spawn off one child and call child_run() inside it */
++
++ pid = fork();
++ if (pid < 0)
++ return err("cannot fork");
++ if (pid > 0) {
++ /* parent */
++ ++m->n_workers;
++ return 0;
++ }
++
++ /* child */
++
++ p = m->prime;
++ m->prime = NULL;
++ master_free(m);
++
++ r = child_new(&c, p);
++ if (r < 0)
++ goto exit;
++
++ r = child_run(c);
++
++exit:
++ child_free(c);
++ exit(abs(r));
++}
++
++static int child_new(struct child **out, struct prime *p)
++{
++ struct child *c;
++ int r;
++
++ c = calloc(1, sizeof(*c));
++ if (!c)
++ return err("cannot allocate child");
++
++ c->prime = p;
++
++ /*
++ * Open a connection to the bus and require each received message to
++ * carry a list of the well-known names the sending connection currently
++ * owns. The current UID is needed in order to determine the name of the
++ * bus node to connect to.
++ */
++ r = bus_open_connection(&c->bus, getuid(),
++ arg_busname, KDBUS_ATTACH_NAMES);
++ if (r < 0)
++ goto error;
++
++ /*
++ * Install a kdbus match so the child's connection gets notified when
++ * the master loses its well-known name.
++ */
++ r = bus_install_name_loss_match(c->bus, arg_master);
++ if (r < 0)
++ goto error;
++
++ *out = c;
++ return 0;
++
++error:
++ child_free(c);
++ return r;
++}
++
++static void child_free(struct child *c)
++{
++ if (!c)
++ return;
++
++ bus_close_connection(c->bus);
++ prime_free(c->prime);
++ free(c);
++}
++
++static int child_run(struct child *c)
++{
++ struct kdbus_cmd_send cmd;
++ struct kdbus_item *item;
++ struct kdbus_vec *vec = NULL;
++ struct kdbus_msg *msg;
++ struct timespec spec;
++ size_t n, steps, size;
++ int r = 0;
++
++ /*
++ * Let's send a message to the master and ask for work. To do this,
++ * we use the KDBUS_CMD_SEND ioctl, which takes an argument of type
++ * 'struct kdbus_cmd_send'. This struct stores a pointer to the actual
++ * message to send. See kdbus.message(7).
++ */
++ size = sizeof(*msg);
++ size += KDBUS_ITEM_SIZE(strlen(arg_master) + 1);
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++ msg = alloca(size);
++ memset(msg, 0, size);
++ msg->size = size;
++
++ /*
++ * Tell the kernel that we expect a reply to this message. This means
++ * that
++ *
++ * a) The remote peer will gain temporary permission to talk to us
++ * even if it would not be allowed to normally.
++ *
++ * b) A timeout value is required.
++ *
++ * For asynchronous send commands, if no reply is received, we will
++ * get a kernel notification with an item of type
++ * KDBUS_ITEM_REPLY_TIMEOUT attached.
++ *
++ * For synchronous send commands (which this example does), the
++ * ioctl will block until a reply is received or the timeout is
++ * exceeded.
++ */
++ msg->flags = KDBUS_MSG_EXPECT_REPLY;
++
++ /* Set our cookie. Replies must use this cookie to send their reply. */
++ msg->cookie = 1;
++
++ /* The payload_type is arbitrary, but it must be non-zero */
++ msg->payload_type = 0xdeadbeef;
++
++ /*
++ * We are sending our message to the current owner of a well-known
++ * name. This makes an item of type KDBUS_ITEM_DST_NAME mandatory.
++ */
++ msg->dst_id = KDBUS_DST_ID_NAME;
++
++ /*
++ * Set the reply timeout to 5 seconds. Timeouts are always set in
++ * absolute timestamps, based on CLOCK_MONOTONIC. See kdbus.message(7).
++ */
++ clock_gettime(CLOCK_MONOTONIC_COARSE, &spec);
++ msg->timeout_ns += (5 + spec.tv_sec) * 1000ULL * 1000ULL * 1000ULL;
++ msg->timeout_ns += spec.tv_nsec;
++
++ /*
++ * Fill the appended items. First, set the well-known name of the
++ * destination we want to talk to.
++ */
++ item = msg->items;
++ item->type = KDBUS_ITEM_DST_NAME;
++ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(arg_master) + 1;
++ strcpy(item->str, arg_master);
++
++ /*
++ * The 2nd item contains a vector to memory we want to send. It
++ * can be content of any type. In our case, we're sending a one-byte
++ * string only. The memory referenced by this item will be copied into
++ * the pool of the receiver connection, and does not need to be valid
++ * after the command is employed.
++ */
++ item = KDBUS_ITEM_NEXT(item);
++ item->type = KDBUS_ITEM_PAYLOAD_VEC;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++ item->vec.address = (uintptr_t)"r";
++ item->vec.size = 1;
++
++ /* Set up the command struct and reference the message we prepared */
++ memset(&cmd, 0, sizeof(cmd));
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)msg;
++
++ /*
++ * The send command knows a mode in which it will block until a
++ * reply to a message is received. This example uses that mode.
++ * The pool offset to the received reply will be stored in the command
++ * struct after the send command returned. See below.
++ */
++ cmd.flags = KDBUS_SEND_SYNC_REPLY;
++
++ /*
++ * Finally, employ the command on the connection owner
++ * file descriptor.
++ */
++ r = kdbus_cmd_send(c->bus->fd, &cmd);
++ if (r == -ESRCH || r == -EPIPE || r == -ECONNRESET)
++ return 0;
++ if (r < 0)
++ return err_r(r, "cannot send request to master");
++
++ /*
++ * The command was sent with the KDBUS_SEND_SYNC_REPLY flag set,
++ * and returned successfully, which means that cmd.reply.offset now
++ * points to a message inside our connection's pool where the reply
++ * is found. This is equivalent to receiving the reply with
++ * KDBUS_CMD_RECV, but it doesn't require waiting for the reply with
++ * poll() and also saves the ioctl to receive the message.
++ */
++ msg = (void *)(c->bus->pool + cmd.reply.offset);
++
++ /*
++ * A message describes its actual payload in an array of items.
++ * KDBUS_FOREACH() is a simple iterator that walks such an array.
++ * struct kdbus_msg has a field to denote its total size, which is
++ * needed to determine the number of items in the array.
++ */
++ KDBUS_FOREACH(item, msg->items,
++ msg->size - offsetof(struct kdbus_msg, items)) {
++ /*
++ * An item of type PAYLOAD_OFF describes in-line memory
++ * stored in the pool at a described offset. That offset is
++ * relative to the start address of the message header.
++ * This example program only expects one single item of that
++ * type, remembers the struct kdbus_vec member of the item
++ * when it sees it, and bails out if there is more than one
++ * of them.
++ */
++ if (item->type == KDBUS_ITEM_PAYLOAD_OFF) {
++ if (vec) {
++ r = err_r(-EEXIST,
++ "message with multiple vecs");
++ break;
++ }
++ vec = &item->vec;
++ if (vec->size != 2 * sizeof(size_t)) {
++ r = err_r(-EINVAL, "invalid message size");
++ break;
++ }
++ /*
++ * MEMFDs are transported as items of type PAYLOAD_MEMFD.
++ * If such an item is attached, a new file descriptor was
++ * installed into the task when KDBUS_CMD_RECV was called, and
++ * its number is stored in item->memfd.fd.
++ * Implementers *must* handle this item type and close the
++ * file descriptor when no longer needed in order to prevent
++ * file descriptor exhaustion. This example program just bails
++ * out with an error in this case, as memfds are not expected
++ * in this context.
++ */
++ } else if (item->type == KDBUS_ITEM_PAYLOAD_MEMFD) {
++ r = err_r(-EINVAL, "message with memfd");
++ break;
++ }
++ }
++ if (r < 0)
++ goto exit;
++ if (!vec) {
++ r = err_r(-EINVAL, "empty message");
++ goto exit;
++ }
++
++ n = ((size_t *)((const uint8_t *)msg + vec->offset))[0];
++ steps = ((size_t *)((const uint8_t *)msg + vec->offset))[1];
++
++ while (steps-- > 0) {
++ ++n;
++ r = prime_run(c->prime, c->bus, n);
++ if (r < 0)
++ break;
++ r = bus_poll(c->bus);
++ if (r != 0) {
++ r = r < 0 ? r : -EINTR;
++ break;
++ }
++ }
++
++exit:
++ /*
++ * We are done with the memory slice that was given to us through
++ * cmd.reply.offset. Tell the kernel it can use it for other content
++ * in the future. See kdbus.pool(7).
++ */
++ bus_poool_free_slice(c->bus, cmd.reply.offset);
++ return r;
++}
++
++/*
++ * Prime Computation
++ *
++ */
++
++static int prime_new(struct prime **out)
++{
++ struct prime *p;
++ int r;
++
++ p = calloc(1, sizeof(*p));
++ if (!p)
++ return err("cannot allocate prime memory");
++
++ p->fd = -1;
++ p->area = MAP_FAILED;
++ p->max = MAX_PRIMES;
++
++ /*
++ * Prepare and map a memfd to store the bit-fields for the number
++ * ranges we want to perform the prime detection on.
++ */
++ p->fd = syscall(__NR_memfd_create, "prime-area", MFD_CLOEXEC);
++ if (p->fd < 0) {
++ r = err("cannot create memfd");
++ goto error;
++ }
++
++ r = ftruncate(p->fd, p->max / 8 + 1);
++ if (r < 0) {
++ r = err("cannot ftruncate area");
++ goto error;
++ }
++
++ p->area = mmap(NULL, p->max / 8 + 1, PROT_READ | PROT_WRITE,
++ MAP_SHARED, p->fd, 0);
++ if (p->area == MAP_FAILED) {
++ r = err("cannot mmap memfd");
++ goto error;
++ }
++
++ *out = p;
++ return 0;
++
++error:
++ prime_free(p);
++ return r;
++}
++
++static void prime_free(struct prime *p)
++{
++ if (!p)
++ return;
++
++ if (p->area != MAP_FAILED)
++ munmap(p->area, p->max / 8 + 1);
++ if (p->fd >= 0)
++ close(p->fd);
++ free(p);
++}
++
++static bool prime_done(struct prime *p)
++{
++ return p->done >= p->max;
++}
++
++static void prime_consume(struct prime *p, size_t amount)
++{
++ p->done += amount;
++}
++
++static int prime_run(struct prime *p, struct bus *cancel, size_t number)
++{
++ size_t i, n = 0;
++ int r;
++
++ if (number < 2 || number > 65535)
++ return 0;
++
++ for (i = number * number;
++ i < p->max && i > number;
++ i += number) {
++ p->area[i / 8] |= 1 << (i % 8);
++
++ if (!(++n % (1 << 20))) {
++ r = bus_poll(cancel);
++ if (r != 0)
++ return r < 0 ? r : -EINTR;
++ }
++ }
++
++ return 0;
++}
++
++static void prime_print(struct prime *p)
++{
++ size_t i, l = 0;
++
++ fprintf(stderr, "PRIMES:");
++ for (i = 0; i < p->max; ++i) {
++ if (!(p->area[i / 8] & (1 << (i % 8))))
++ fprintf(stderr, "%c%7zu", !(l++ % 16) ? '\n' : ' ', i);
++ }
++ fprintf(stderr, "\nEND\n");
++}
++
++static int bus_open_connection(struct bus **out, uid_t uid, const char *name,
++ uint64_t recv_flags)
++{
++ struct kdbus_cmd_hello hello;
++ char path[128];
++ struct bus *b;
++ int r;
++
++ /*
++ * The 'bus' object is our representation of a kdbus connection which
++ * stores two details: the connection owner file descriptor, and the
++ * mmap()ed memory of its associated pool. See kdbus.connection(7) and
++ * kdbus.pool(7).
++ */
++ b = calloc(1, sizeof(*b));
++ if (!b)
++ return err("cannot allocate bus memory");
++
++ b->fd = -1;
++ b->pool = MAP_FAILED;
++
++ /* Compute the name of the bus node to connect to. */
++ snprintf(path, sizeof(path), "/sys/fs/%s/%lu-%s/bus",
++ arg_modname, (unsigned long)uid, name);
++ b->fd = open(path, O_RDWR | O_CLOEXEC);
++ if (b->fd < 0) {
++ r = err("cannot open bus");
++ goto error;
++ }
++
++ /*
++ * To make a connection to the bus, the KDBUS_CMD_HELLO ioctl is used.
++ * It takes an argument of type 'struct kdbus_cmd_hello'.
++ */
++ memset(&hello, 0, sizeof(hello));
++ hello.size = sizeof(hello);
++
++ /*
++ * Specify a mask of metadata attach flags, describing metadata items
++ * that this new connection allows to be sent.
++ */
++ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++
++ /*
++ * Specify a mask of metadata attach flags, describing metadata items
++ * that this new connection wants to receive along with each message.
++ */
++ hello.attach_flags_recv = recv_flags;
++
++ /*
++ * A connection may choose the size of its pool, but the number has to
++ * comply with two rules: a) it must be greater than 0, and b) it must
++ * be a multiple of PAGE_SIZE. See kdbus.pool(7).
++ */
++ hello.pool_size = POOL_SIZE;
++
++ /*
++ * Now employ the command on the file descriptor opened above.
++ * This command will turn the file descriptor into a connection-owner
++ * file descriptor that controls the life-time of the connection; once
++ * it's closed, the connection is shut down.
++ */
++ r = kdbus_cmd_hello(b->fd, &hello);
++ if (r < 0) {
++ err_r(r, "HELLO failed");
++ goto error;
++ }
++
++ bus_poool_free_slice(b, hello.offset);
++
++ /*
++ * Map the pool of the connection. Its size has been set in the
++ * command struct above. See kdbus.pool(7).
++ */
++ b->pool = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, b->fd, 0);
++ if (b->pool == MAP_FAILED) {
++ r = err("cannot mmap pool");
++ goto error;
++ }
++
++ *out = b;
++ return 0;
++
++error:
++ bus_close_connection(b);
++ return r;
++}
++
++static void bus_close_connection(struct bus *b)
++{
++ if (!b)
++ return;
++
++ /*
++ * A bus connection is closed by simply calling close() on the
++ * connection owner file descriptor. The unique name and all owned
++ * well-known names of the connection will disappear.
++ * See kdbus.connection(7).
++ */
++ if (b->pool != MAP_FAILED)
++ munmap(b->pool, POOL_SIZE);
++ if (b->fd >= 0)
++ close(b->fd);
++ free(b);
++}
++
++static void bus_poool_free_slice(struct bus *b, uint64_t offset)
++{
++ struct kdbus_cmd_free cmd = {
++ .size = sizeof(cmd),
++ .offset = offset,
++ };
++ int r;
++
++ /*
++ * Once we're done with a piece of pool memory that was returned
++ * by a command, we have to call the KDBUS_CMD_FREE ioctl on it so it
++ * can be reused. The command takes an argument of type
++ * 'struct kdbus_cmd_free', in which the pool offset of the slice to
++ * free is stored. The ioctl is employed on the connection owner
++ * file descriptor. See kdbus.pool(7).
++ */
++ r = kdbus_cmd_free(b->fd, &cmd);
++ if (r < 0)
++ err_r(r, "cannot free pool slice");
++}
++
++static int bus_acquire_name(struct bus *b, const char *name)
++{
++ struct kdbus_item *item;
++ struct kdbus_cmd *cmd;
++ size_t size;
++ int r;
++
++ /*
++ * This function acquires a well-known name on the bus through the
++ * KDBUS_CMD_NAME_ACQUIRE ioctl. This ioctl takes an argument of type
++ * 'struct kdbus_cmd', which is assembled below. See kdbus.name(7).
++ */
++ size = sizeof(*cmd);
++ size += KDBUS_ITEM_SIZE(strlen(name) + 1);
++
++ cmd = alloca(size);
++ memset(cmd, 0, size);
++ cmd->size = size;
++
++ /*
++ * The command requires an item of type KDBUS_ITEM_NAME, and its
++ * content must be a valid bus name.
++ */
++ item = cmd->items;
++ item->type = KDBUS_ITEM_NAME;
++ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
++ strcpy(item->str, name);
++
++ /*
++ * Employ the command on the connection owner file descriptor.
++ */
++ r = kdbus_cmd_name_acquire(b->fd, cmd);
++ if (r < 0)
++ return err_r(r, "cannot acquire name");
++
++ return 0;
++}
++
++static int bus_install_name_loss_match(struct bus *b, const char *name)
++{
++ struct kdbus_cmd_match *match;
++ struct kdbus_item *item;
++ size_t size;
++ int r;
++
++ /*
++ * In order to install a match for signal messages, we have to
++ * assemble a 'struct kdbus_cmd_match' and use it along with the
++ * KDBUS_CMD_MATCH_ADD ioctl. See kdbus.match(7).
++ */
++ size = sizeof(*match);
++ size += KDBUS_ITEM_SIZE(sizeof(item->name_change) + strlen(name) + 1);
++
++ match = alloca(size);
++ memset(match, 0, size);
++ match->size = size;
++
++ /*
++ * A match is comprised of many 'rules', each of which describes a
++ * mandatory detail of the message. All rules of a match must be
++ * satisfied in order to make a message pass.
++ */
++ item = match->items;
++
++ /*
++ * In this case, we're interested in notifications that inform us
++ * about a well-known name being removed from the bus.
++ */
++ item->type = KDBUS_ITEM_NAME_REMOVE;
++ item->size = KDBUS_ITEM_HEADER_SIZE +
++ sizeof(item->name_change) + strlen(name) + 1;
++
++ /*
++ * We could limit the match further and require a specific unique-ID
++ * to be the new or the old owner of the name. In this case, however,
++ * we don't, and allow 'any' id.
++ */
++ item->name_change.old_id.id = KDBUS_MATCH_ID_ANY;
++ item->name_change.new_id.id = KDBUS_MATCH_ID_ANY;
++
++ /* Copy in the well-known name we're interested in */
++ strcpy(item->name_change.name, name);
++
++ /*
++ * Add the match through the KDBUS_CMD_MATCH_ADD ioctl, employed on
++ * the connection owner fd.
++ */
++ r = kdbus_cmd_match_add(b->fd, match);
++ if (r < 0)
++ return err_r(r, "cannot add match");
++
++ return 0;
++}
++
++static int bus_poll(struct bus *b)
++{
++ struct pollfd fds[1] = {};
++ int r;
++
++ /*
++ * A connection endpoint supports poll() and will wake up the
++ * task with POLLIN set once a message has arrived.
++ */
++ fds[0].fd = b->fd;
++ fds[0].events = POLLIN;
++ r = poll(fds, sizeof(fds) / sizeof(*fds), 0);
++ if (r < 0)
++ return err("cannot poll bus");
++ return !!(fds[0].revents & POLLIN);
++}
++
++static int bus_make(uid_t uid, const char *name)
++{
++ struct kdbus_item *item;
++ struct kdbus_cmd *make;
++ char path[128], busname[128];
++ size_t size;
++ int r, fd;
++
++ /*
++ * Compute the full path to the 'control' node. 'arg_modname' may be
++ * set to a different value than 'kdbus' for development purposes.
++ * The 'control' node is the primary entry point to kdbus that must be
++ * used in order to create a bus. See kdbus(7) and kdbus.bus(7).
++ */
++ snprintf(path, sizeof(path), "/sys/fs/%s/control", arg_modname);
++
++ /*
++ * Compute the bus name. A valid bus name must always be prefixed with
++ * the EUID of the currently running process in order to avoid name
++ * conflicts. See kdbus.bus(7).
++ */
++ snprintf(busname, sizeof(busname), "%lu-%s", (unsigned long)uid, name);
++
++ fd = open(path, O_RDWR | O_CLOEXEC);
++ if (fd < 0)
++ return err("cannot open control file");
++
++ /*
++ * The KDBUS_CMD_BUS_MAKE ioctl takes an argument of type
++ * 'struct kdbus_cmd', and expects at least two items attached to
++ * it: one to describe the bloom parameters to be propagated to
++ * connections of the bus, and the name of the bus that was computed
++ * above. Assemble this struct now, and fill it with values.
++ */
++ size = sizeof(*make);
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_parameter));
++ size += KDBUS_ITEM_SIZE(strlen(busname) + 1);
++
++ make = alloca(size);
++ memset(make, 0, size);
++ make->size = size;
++
++ /*
++ * Each item has a 'type' and 'size' field, and must be stored at an
++ * 8-byte aligned address. The KDBUS_ITEM_NEXT macro is used to advance
++ * the pointer. See kdbus.item(7) for more details.
++ */
++ item = make->items;
++ item->type = KDBUS_ITEM_BLOOM_PARAMETER;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(item->bloom_parameter);
++ item->bloom_parameter.size = 8;
++ item->bloom_parameter.n_hash = 1;
++
++ /* The name of the new bus is stored in the next item. */
++ item = KDBUS_ITEM_NEXT(item);
++ item->type = KDBUS_ITEM_MAKE_NAME;
++ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(busname) + 1;
++ strcpy(item->str, busname);
++
++ /*
++ * Now create the bus via the KDBUS_CMD_BUS_MAKE ioctl and return the
++ * fd that was used back to the caller of this function. This fd is now
++ * called a 'bus owner file descriptor', and it controls the life-time
++ * of the newly created bus; once the file descriptor is closed, the
++ * bus goes away, and all connections are shut down. See kdbus.bus(7).
++ */
++ r = kdbus_cmd_bus_make(fd, make);
++ if (r < 0) {
++ err_r(r, "cannot make bus");
++ close(fd);
++ return r;
++ }
++
++ return fd;
++}
++
++#else
++
++#warning "Skipping compilation due to unsupported libc version"
++
++int main(int argc, char **argv)
++{
++ fprintf(stderr,
++ "Compilation of %s was skipped due to unsupported libc.\n",
++ argv[0]);
++
++ return EXIT_FAILURE;
++}
++
++#endif /* libc sanity check */
+diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
+index 95abddc..b57100c 100644
+--- a/tools/testing/selftests/Makefile
++++ b/tools/testing/selftests/Makefile
+@@ -5,6 +5,7 @@ TARGETS += exec
+ TARGETS += firmware
+ TARGETS += ftrace
+ TARGETS += kcmp
++TARGETS += kdbus
+ TARGETS += memfd
+ TARGETS += memory-hotplug
+ TARGETS += mount
+diff --git a/tools/testing/selftests/kdbus/.gitignore b/tools/testing/selftests/kdbus/.gitignore
+new file mode 100644
+index 0000000..d3ef42f
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/.gitignore
+@@ -0,0 +1 @@
++kdbus-test
+diff --git a/tools/testing/selftests/kdbus/Makefile b/tools/testing/selftests/kdbus/Makefile
+new file mode 100644
+index 0000000..8f36cb5
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/Makefile
+@@ -0,0 +1,49 @@
++CFLAGS += -I../../../../usr/include/
++CFLAGS += -I../../../../samples/kdbus/
++CFLAGS += -I../../../../include/uapi/
++CFLAGS += -std=gnu99
++CFLAGS += -DKBUILD_MODNAME=\"kdbus\" -D_GNU_SOURCE
++LDLIBS = -pthread -lcap -lm
++
++OBJS= \
++ kdbus-enum.o \
++ kdbus-util.o \
++ kdbus-test.o \
++ kdbus-test.o \
++ test-activator.o \
++ test-benchmark.o \
++ test-bus.o \
++ test-chat.o \
++ test-connection.o \
++ test-daemon.o \
++ test-endpoint.o \
++ test-fd.o \
++ test-free.o \
++ test-match.o \
++ test-message.o \
++ test-metadata-ns.o \
++ test-monitor.o \
++ test-names.o \
++ test-policy.o \
++ test-policy-ns.o \
++ test-policy-priv.o \
++ test-sync.o \
++ test-timeout.o
++
++all: kdbus-test
++
++include ../lib.mk
++
++%.o: %.c kdbus-enum.h kdbus-test.h kdbus-util.h
++ $(CC) $(CFLAGS) -c $< -o $@
++
++kdbus-test: $(OBJS)
++ $(CC) $(CFLAGS) $^ $(LDLIBS) -o $@
++
++TEST_PROGS := kdbus-test
++
++run_tests:
++ ./kdbus-test --tap
++
++clean:
++ rm -f *.o kdbus-test
+diff --git a/tools/testing/selftests/kdbus/kdbus-enum.c b/tools/testing/selftests/kdbus/kdbus-enum.c
+new file mode 100644
+index 0000000..4f1e579
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/kdbus-enum.c
+@@ -0,0 +1,94 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++struct kdbus_enum_table {
++ long long id;
++ const char *name;
++};
++
++#define TABLE(what) static struct kdbus_enum_table kdbus_table_##what[]
++#define ENUM(_id) { .id = _id, .name = STRINGIFY(_id) }
++#define LOOKUP(what) \
++ const char *enum_##what(long long id) \
++ { \
++ for (size_t i = 0; i < ELEMENTSOF(kdbus_table_##what); i++) \
++ if (id == kdbus_table_##what[i].id) \
++ return kdbus_table_##what[i].name; \
++ return "UNKNOWN"; \
++ }
++
++TABLE(CMD) = {
++ ENUM(KDBUS_CMD_BUS_MAKE),
++ ENUM(KDBUS_CMD_ENDPOINT_MAKE),
++ ENUM(KDBUS_CMD_HELLO),
++ ENUM(KDBUS_CMD_SEND),
++ ENUM(KDBUS_CMD_RECV),
++ ENUM(KDBUS_CMD_LIST),
++ ENUM(KDBUS_CMD_NAME_RELEASE),
++ ENUM(KDBUS_CMD_CONN_INFO),
++ ENUM(KDBUS_CMD_MATCH_ADD),
++ ENUM(KDBUS_CMD_MATCH_REMOVE),
++};
++LOOKUP(CMD);
++
++TABLE(MSG) = {
++ ENUM(_KDBUS_ITEM_NULL),
++ ENUM(KDBUS_ITEM_PAYLOAD_VEC),
++ ENUM(KDBUS_ITEM_PAYLOAD_OFF),
++ ENUM(KDBUS_ITEM_PAYLOAD_MEMFD),
++ ENUM(KDBUS_ITEM_FDS),
++ ENUM(KDBUS_ITEM_BLOOM_PARAMETER),
++ ENUM(KDBUS_ITEM_BLOOM_FILTER),
++ ENUM(KDBUS_ITEM_DST_NAME),
++ ENUM(KDBUS_ITEM_MAKE_NAME),
++ ENUM(KDBUS_ITEM_ATTACH_FLAGS_SEND),
++ ENUM(KDBUS_ITEM_ATTACH_FLAGS_RECV),
++ ENUM(KDBUS_ITEM_ID),
++ ENUM(KDBUS_ITEM_NAME),
++ ENUM(KDBUS_ITEM_TIMESTAMP),
++ ENUM(KDBUS_ITEM_CREDS),
++ ENUM(KDBUS_ITEM_PIDS),
++ ENUM(KDBUS_ITEM_AUXGROUPS),
++ ENUM(KDBUS_ITEM_OWNED_NAME),
++ ENUM(KDBUS_ITEM_TID_COMM),
++ ENUM(KDBUS_ITEM_PID_COMM),
++ ENUM(KDBUS_ITEM_EXE),
++ ENUM(KDBUS_ITEM_CMDLINE),
++ ENUM(KDBUS_ITEM_CGROUP),
++ ENUM(KDBUS_ITEM_CAPS),
++ ENUM(KDBUS_ITEM_SECLABEL),
++ ENUM(KDBUS_ITEM_AUDIT),
++ ENUM(KDBUS_ITEM_CONN_DESCRIPTION),
++ ENUM(KDBUS_ITEM_NAME_ADD),
++ ENUM(KDBUS_ITEM_NAME_REMOVE),
++ ENUM(KDBUS_ITEM_NAME_CHANGE),
++ ENUM(KDBUS_ITEM_ID_ADD),
++ ENUM(KDBUS_ITEM_ID_REMOVE),
++ ENUM(KDBUS_ITEM_REPLY_TIMEOUT),
++ ENUM(KDBUS_ITEM_REPLY_DEAD),
++};
++LOOKUP(MSG);
++
++TABLE(PAYLOAD) = {
++ ENUM(KDBUS_PAYLOAD_KERNEL),
++ ENUM(KDBUS_PAYLOAD_DBUS),
++};
++LOOKUP(PAYLOAD);
+diff --git a/tools/testing/selftests/kdbus/kdbus-enum.h b/tools/testing/selftests/kdbus/kdbus-enum.h
+new file mode 100644
+index 0000000..ed28cca
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/kdbus-enum.h
+@@ -0,0 +1,15 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#pragma once
++
++const char *enum_CMD(long long id);
++const char *enum_MSG(long long id);
++const char *enum_MATCH(long long id);
++const char *enum_PAYLOAD(long long id);
+diff --git a/tools/testing/selftests/kdbus/kdbus-test.c b/tools/testing/selftests/kdbus/kdbus-test.c
+new file mode 100644
+index 0000000..db57381
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/kdbus-test.c
+@@ -0,0 +1,905 @@
++#include <errno.h>
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <time.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <assert.h>
++#include <getopt.h>
++#include <stdbool.h>
++#include <signal.h>
++#include <sys/mount.h>
++#include <sys/prctl.h>
++#include <sys/wait.h>
++#include <sys/syscall.h>
++#include <sys/eventfd.h>
++#include <linux/sched.h>
++
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++enum {
++ TEST_CREATE_BUS = 1 << 0,
++ TEST_CREATE_CONN = 1 << 1,
++};
++
++struct kdbus_test {
++ const char *name;
++ const char *desc;
++ int (*func)(struct kdbus_test_env *env);
++ unsigned int flags;
++};
++
++struct kdbus_test_args {
++ bool mntns;
++ bool pidns;
++ bool userns;
++ char *uid_map;
++ char *gid_map;
++ int loop;
++ int wait;
++ int fork;
++ int tap_output;
++ char *module;
++ char *root;
++ char *test;
++ char *busname;
++};
++
++static const struct kdbus_test tests[] = {
++ {
++ .name = "bus-make",
++ .desc = "bus make functions",
++ .func = kdbus_test_bus_make,
++ .flags = 0,
++ },
++ {
++ .name = "hello",
++ .desc = "the HELLO command",
++ .func = kdbus_test_hello,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "byebye",
++ .desc = "the BYEBYE command",
++ .func = kdbus_test_byebye,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "chat",
++ .desc = "a chat pattern",
++ .func = kdbus_test_chat,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "daemon",
++ .desc = "a simple daemon",
++ .func = kdbus_test_daemon,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "fd-passing",
++ .desc = "file descriptor passing",
++ .func = kdbus_test_fd_passing,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "endpoint",
++ .desc = "custom endpoint",
++ .func = kdbus_test_custom_endpoint,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "monitor",
++ .desc = "monitor functionality",
++ .func = kdbus_test_monitor,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "name-basics",
++ .desc = "basic name registry functions",
++ .func = kdbus_test_name_basic,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "name-conflict",
++ .desc = "name registry conflict details",
++ .func = kdbus_test_name_conflict,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "name-queue",
++ .desc = "queuing of names",
++ .func = kdbus_test_name_queue,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "name-takeover",
++ .desc = "takeover of names",
++ .func = kdbus_test_name_takeover,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "message-basic",
++ .desc = "basic message handling",
++ .func = kdbus_test_message_basic,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "message-prio",
++ .desc = "handling of messages with priority",
++ .func = kdbus_test_message_prio,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "message-quota",
++ .desc = "message quotas are enforced",
++ .func = kdbus_test_message_quota,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "memory-access",
++ .desc = "memory access",
++ .func = kdbus_test_memory_access,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "timeout",
++ .desc = "timeout",
++ .func = kdbus_test_timeout,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "sync-byebye",
++ .desc = "synchronous replies vs. BYEBYE",
++ .func = kdbus_test_sync_byebye,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "sync-reply",
++ .desc = "synchronous replies",
++ .func = kdbus_test_sync_reply,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "message-free",
++ .desc = "freeing of memory",
++ .func = kdbus_test_free,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "connection-info",
++ .desc = "retrieving connection information",
++ .func = kdbus_test_conn_info,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "connection-update",
++ .desc = "updating connection information",
++ .func = kdbus_test_conn_update,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "writable-pool",
++ .desc = "verifying pools are never writable",
++ .func = kdbus_test_writable_pool,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "policy",
++ .desc = "policy",
++ .func = kdbus_test_policy,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "policy-priv",
++ .desc = "unprivileged bus access",
++ .func = kdbus_test_policy_priv,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "policy-ns",
++ .desc = "policy in user namespaces",
++ .func = kdbus_test_policy_ns,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "metadata-ns",
++ .desc = "metadata in different namespaces",
++ .func = kdbus_test_metadata_ns,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "match-id-add",
++ .desc = "adding of matches by id",
++ .func = kdbus_test_match_id_add,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "match-id-remove",
++ .desc = "removing of matches by id",
++ .func = kdbus_test_match_id_remove,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "match-replace",
++ .desc = "replace of matches with the same cookie",
++ .func = kdbus_test_match_replace,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "match-name-add",
++ .desc = "adding of matches by name",
++ .func = kdbus_test_match_name_add,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "match-name-remove",
++ .desc = "removing of matches by name",
++ .func = kdbus_test_match_name_remove,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "match-name-change",
++ .desc = "matching for name changes",
++ .func = kdbus_test_match_name_change,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "match-bloom",
++ .desc = "matching with bloom filters",
++ .func = kdbus_test_match_bloom,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "activator",
++ .desc = "activator connections",
++ .func = kdbus_test_activator,
++ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
++ },
++ {
++ .name = "benchmark",
++ .desc = "benchmark",
++ .func = kdbus_test_benchmark,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "benchmark-nomemfds",
++ .desc = "benchmark without using memfds",
++ .func = kdbus_test_benchmark_nomemfds,
++ .flags = TEST_CREATE_BUS,
++ },
++ {
++ .name = "benchmark-uds",
++ .desc = "benchmark comparison to UDS",
++ .func = kdbus_test_benchmark_uds,
++ .flags = TEST_CREATE_BUS,
++ },
++};
++
++#define N_TESTS ((int) (sizeof(tests) / sizeof(tests[0])))
++
++static int test_prepare_env(const struct kdbus_test *t,
++ const struct kdbus_test_args *args,
++ struct kdbus_test_env *env)
++{
++ if (t->flags & TEST_CREATE_BUS) {
++ char *s;
++ char *n = NULL;
++ int ret;
++
++ asprintf(&s, "%s/control", args->root);
++
++ env->control_fd = open(s, O_RDWR);
++ free(s);
++ ASSERT_RETURN(env->control_fd >= 0);
++
++ if (!args->busname) {
++ n = unique_name("test-bus");
++ ASSERT_RETURN(n);
++ }
++
++ ret = kdbus_create_bus(env->control_fd,
++ args->busname ?: n,
++ _KDBUS_ATTACH_ALL, &s);
++ free(n);
++ ASSERT_RETURN(ret == 0);
++
++ asprintf(&env->buspath, "%s/%s/bus", args->root, s);
++ free(s);
++ }
++
++ if (t->flags & TEST_CREATE_CONN) {
++ env->conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(env->conn);
++ }
++
++ env->root = args->root;
++ env->module = args->module;
++
++ return 0;
++}
++
++void test_unprepare_env(const struct kdbus_test *t, struct kdbus_test_env *env)
++{
++ if (env->conn) {
++ kdbus_conn_free(env->conn);
++ env->conn = NULL;
++ }
++
++ if (env->control_fd >= 0) {
++ close(env->control_fd);
++ env->control_fd = -1;
++ }
++
++ if (env->buspath) {
++ free(env->buspath);
++ env->buspath = NULL;
++ }
++}
++
++static int test_run(const struct kdbus_test *t,
++ const struct kdbus_test_args *kdbus_args,
++ int wait)
++{
++ int ret;
++ struct kdbus_test_env env = {};
++
++ ret = test_prepare_env(t, kdbus_args, &env);
++ if (ret != TEST_OK)
++ return ret;
++
++ if (wait > 0) {
++ printf("Sleeping %d seconds before running test ...\n", wait);
++ sleep(wait);
++ }
++
++ ret = t->func(&env);
++ test_unprepare_env(t, &env);
++ return ret;
++}
++
++static int test_run_forked(const struct kdbus_test *t,
++ const struct kdbus_test_args *kdbus_args,
++ int wait)
++{
++ int ret;
++ pid_t pid;
++
++ pid = fork();
++ if (pid < 0) {
++ return TEST_ERR;
++ } else if (pid == 0) {
++ ret = test_run(t, kdbus_args, wait);
++ _exit(ret);
++ }
++
++ pid = waitpid(pid, &ret, 0);
++ if (pid <= 0)
++ return TEST_ERR;
++ else if (!WIFEXITED(ret))
++ return TEST_ERR;
++ else
++ return WEXITSTATUS(ret);
++}
++
++static void print_test_result(int ret)
++{
++ switch (ret) {
++ case TEST_OK:
++ printf("OK");
++ break;
++ case TEST_SKIP:
++ printf("SKIPPED");
++ break;
++ case TEST_ERR:
++ printf("ERROR");
++ break;
++ }
++}
++
++static int start_all_tests(struct kdbus_test_args *kdbus_args)
++{
++ int ret;
++ unsigned int fail_cnt = 0;
++ unsigned int skip_cnt = 0;
++ unsigned int ok_cnt = 0;
++ unsigned int i;
++
++ if (kdbus_args->tap_output) {
++ printf("1..%d\n", N_TESTS);
++ fflush(stdout);
++ }
++
++ kdbus_util_verbose = false;
++
++ for (i = 0; i < N_TESTS; i++) {
++ const struct kdbus_test *t = tests + i;
++
++ if (!kdbus_args->tap_output) {
++ unsigned int n;
++
++ printf("Testing %s (%s) ", t->desc, t->name);
++ for (n = 0; n < 60 - strlen(t->desc) - strlen(t->name); n++)
++ printf(".");
++ printf(" ");
++ }
++
++ ret = test_run_forked(t, kdbus_args, 0);
++ switch (ret) {
++ case TEST_OK:
++ ok_cnt++;
++ break;
++ case TEST_SKIP:
++ skip_cnt++;
++ break;
++ case TEST_ERR:
++ fail_cnt++;
++ break;
++ }
++
++ if (kdbus_args->tap_output) {
++ printf("%sok %u - %s%s (%s)\n",
++ (ret == TEST_ERR) ? "not " : "", i + 1,
++ (ret == TEST_SKIP) ? "# SKIP " : "",
++ t->desc, t->name);
++ fflush(stdout);
++ } else {
++ print_test_result(ret);
++ printf("\n");
++ }
++ }
++
++ if (kdbus_args->tap_output)
++ printf("Failed %u/%d tests, %.2f%% okay\n", fail_cnt, N_TESTS,
++ 100.0 - (fail_cnt * 100.0) / ((float) N_TESTS));
++ else
++ printf("\nSUMMARY: %u tests passed, %u skipped, %u failed\n",
++ ok_cnt, skip_cnt, fail_cnt);
++
++ return fail_cnt > 0 ? TEST_ERR : TEST_OK;
++}
++
++static int start_one_test(struct kdbus_test_args *kdbus_args)
++{
++ int i, ret;
++ bool test_found = false;
++
++ for (i = 0; i < N_TESTS; i++) {
++ const struct kdbus_test *t = tests + i;
++
++ if (strcmp(t->name, kdbus_args->test))
++ continue;
++
++ do {
++ test_found = true;
++ if (kdbus_args->fork)
++ ret = test_run_forked(t, kdbus_args,
++ kdbus_args->wait);
++ else
++ ret = test_run(t, kdbus_args,
++ kdbus_args->wait);
++
++ printf("Testing %s: ", t->desc);
++ print_test_result(ret);
++ printf("\n");
++
++ if (ret != TEST_OK)
++ break;
++ } while (kdbus_args->loop);
++
++ return ret;
++ }
++
++ if (!test_found) {
++ printf("Unknown test-id '%s'\n", kdbus_args->test);
++ return TEST_ERR;
++ }
++
++ return TEST_OK;
++}
++
++static void usage(const char *argv0)
++{
++ unsigned int i, j;
++
++ printf("Usage: %s [options]\n"
++ "Options:\n"
++ "\t-a, --tap Output test results in TAP format\n"
++ "\t-m, --module <module> Kdbus module name\n"
++ "\t-x, --loop Run in a loop\n"
++ "\t-f, --fork Fork before running a test\n"
++ "\t-h, --help Print this help\n"
++ "\t-r, --root <root> Toplevel of the kdbus hierarchy\n"
++ "\t-t, --test <test-id> Run one specific test only, in verbose mode\n"
++ "\t-b, --bus <busname> Instead of generating a random bus name, take <busname>.\n"
++ "\t-w, --wait <secs> Wait <secs> before actually starting test\n"
++ "\t --mntns New mount namespace\n"
++ "\t --pidns New PID namespace\n"
++ "\t --userns New user namespace\n"
++ "\t --uidmap uid_map UID map for user namespace\n"
++ "\t --gidmap gid_map GID map for user namespace\n"
++ "\n", argv0);
++
++ printf("By default, all tests are run once, and a summary is printed.\n"
++ "Available tests for --test:\n\n");
++
++ for (i = 0; i < N_TESTS; i++) {
++ const struct kdbus_test *t = tests + i;
++
++ printf("\t%s", t->name);
++
++ for (j = 0; j < 24 - strlen(t->name); j++)
++ printf(" ");
++
++ printf("Test %s\n", t->desc);
++ }
++
++ printf("\n");
++ printf("Note that some tests may, if run specifically by --test, "
++ "behave differently, and not terminate by themselves.\n");
++
++ exit(EXIT_FAILURE);
++}
++
++void print_kdbus_test_args(struct kdbus_test_args *args)
++{
++ if (args->userns || args->pidns || args->mntns)
++ printf("# Starting tests in new %s%s%s namespaces%s\n",
++ args->mntns ? "MOUNT " : "",
++ args->pidns ? "PID " : "",
++ args->userns ? "USER " : "",
++ args->mntns ? ", kdbusfs will be remounted" : "");
++ else
++ printf("# Starting tests in the same namespaces\n");
++}
++
++void print_metadata_support(void)
++{
++ bool no_meta_audit, no_meta_cgroups, no_meta_seclabel;
++
++ /*
++ * KDBUS_ATTACH_CGROUP, KDBUS_ATTACH_AUDIT and
++ * KDBUS_ATTACH_SECLABEL
++ */
++ no_meta_audit = !config_auditsyscall_is_enabled();
++ no_meta_cgroups = !config_cgroups_is_enabled();
++ no_meta_seclabel = !config_security_is_enabled();
++
++ if (no_meta_audit | no_meta_cgroups | no_meta_seclabel)
++ printf("# Starting tests without %s%s%s metadata support\n",
++ no_meta_audit ? "AUDIT " : "",
++ no_meta_cgroups ? "CGROUP " : "",
++ no_meta_seclabel ? "SECLABEL " : "");
++ else
++ printf("# Starting tests with full metadata support\n");
++}
++
++int run_tests(struct kdbus_test_args *kdbus_args)
++{
++ int ret;
++ static char control[4096];
++
++ snprintf(control, sizeof(control), "%s/control", kdbus_args->root);
++
++ if (access(control, W_OK) < 0) {
++ printf("Unable to locate control node at '%s'.\n",
++ control);
++ return TEST_ERR;
++ }
++
++ if (kdbus_args->test) {
++ ret = start_one_test(kdbus_args);
++ } else {
++ do {
++ ret = start_all_tests(kdbus_args);
++ if (ret != TEST_OK)
++ break;
++ } while (kdbus_args->loop);
++ }
++
++ return ret;
++}
++
++static void nop_handler(int sig) {}
++
++static int test_prepare_mounts(struct kdbus_test_args *kdbus_args)
++{
++ int ret;
++ char kdbusfs[64] = {'\0'};
++
++ snprintf(kdbusfs, sizeof(kdbusfs), "%sfs", kdbus_args->module);
++
++ /* make current mount slave */
++ ret = mount(NULL, "/", NULL, MS_SLAVE|MS_REC, NULL);
++ if (ret < 0) {
++ ret = -errno;
++ printf("error mount() root: %d (%m)\n", ret);
++ return ret;
++ }
++
++ /* Remount procfs since we need it in our tests */
++ if (kdbus_args->pidns) {
++ ret = mount("proc", "/proc", "proc",
++ MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL);
++ if (ret < 0) {
++ ret = -errno;
++ printf("error mount() /proc : %d (%m)\n", ret);
++ return ret;
++ }
++ }
++
++ /* Remount kdbusfs */
++ ret = mount(kdbusfs, kdbus_args->root, kdbusfs,
++ MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL);
++ if (ret < 0) {
++ ret = -errno;
++ printf("error mount() %s :%d (%m)\n", kdbusfs, ret);
++ return ret;
++ }
++
++ return 0;
++}
++
++int run_tests_in_namespaces(struct kdbus_test_args *kdbus_args)
++{
++ int ret;
++ int efd = -1;
++ int status;
++ pid_t pid, rpid;
++ struct sigaction oldsa;
++ struct sigaction sa = {
++ .sa_handler = nop_handler,
++ .sa_flags = SA_NOCLDSTOP,
++ };
++
++ efd = eventfd(0, EFD_CLOEXEC);
++ if (efd < 0) {
++ ret = -errno;
++ printf("eventfd() failed: %d (%m)\n", ret);
++ return TEST_ERR;
++ }
++
++ ret = sigaction(SIGCHLD, &sa, &oldsa);
++ if (ret < 0) {
++ ret = -errno;
++ printf("sigaction() failed: %d (%m)\n", ret);
++ return TEST_ERR;
++ }
++
++ /* setup namespaces */
++ pid = syscall(__NR_clone, SIGCHLD|
++ (kdbus_args->userns ? CLONE_NEWUSER : 0) |
++ (kdbus_args->mntns ? CLONE_NEWNS : 0) |
++ (kdbus_args->pidns ? CLONE_NEWPID : 0), NULL);
++ if (pid < 0) {
++ printf("clone() failed: %d (%m)\n", -errno);
++ return TEST_ERR;
++ }
++
++ if (pid == 0) {
++ eventfd_t event_status = 0;
++
++ ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
++ if (ret < 0) {
++ ret = -errno;
++ printf("error prctl(): %d (%m)\n", ret);
++ _exit(TEST_ERR);
++ }
++
++ /* reset sighandlers of children */
++ ret = sigaction(SIGCHLD, &oldsa, NULL);
++ if (ret < 0) {
++ ret = -errno;
++ printf("sigaction() failed: %d (%m)\n", ret);
++ _exit(TEST_ERR);
++ }
++
++ ret = eventfd_read(efd, &event_status);
++ if (ret < 0 || event_status != 1) {
++ printf("error eventfd_read()\n");
++ _exit(TEST_ERR);
++ }
++
++ if (kdbus_args->mntns) {
++ ret = test_prepare_mounts(kdbus_args);
++ if (ret < 0) {
++ printf("error preparing mounts\n");
++ _exit(TEST_ERR);
++ }
++ }
++
++ ret = run_tests(kdbus_args);
++ _exit(ret);
++ }
++
++ /* Setup userns mapping */
++ if (kdbus_args->userns) {
++ ret = userns_map_uid_gid(pid, kdbus_args->uid_map,
++ kdbus_args->gid_map);
++ if (ret < 0) {
++ printf("error mapping uid and gid in userns\n");
++ eventfd_write(efd, 2);
++ return TEST_ERR;
++ }
++ }
++
++ ret = eventfd_write(efd, 1);
++ if (ret < 0) {
++ ret = -errno;
++ printf("error eventfd_write(): %d (%m)\n", ret);
++ return TEST_ERR;
++ }
++
++ rpid = waitpid(pid, &status, 0);
++ ASSERT_RETURN_VAL(rpid == pid, TEST_ERR);
++
++ close(efd);
++
++ if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
++ return TEST_ERR;
++
++ return TEST_OK;
++}
++
++int start_tests(struct kdbus_test_args *kdbus_args)
++{
++ int ret;
++ bool namespaces;
++ static char fspath[4096];
++
++ namespaces = (kdbus_args->mntns || kdbus_args->pidns ||
++ kdbus_args->userns);
++
++ /* for pidns we need mntns set */
++ if (kdbus_args->pidns && !kdbus_args->mntns) {
++ printf("Failed: please set both pid and mnt namespaces\n");
++ return TEST_ERR;
++ }
++
++ if (kdbus_args->userns) {
++ if (!config_user_ns_is_enabled()) {
++ printf("User namespace not supported\n");
++ return TEST_ERR;
++ }
++
++ if (!kdbus_args->uid_map || !kdbus_args->gid_map) {
++ printf("Failed: please specify both uid and gid mapping\n");
++ return TEST_ERR;
++ }
++ }
++
++ print_kdbus_test_args(kdbus_args);
++ print_metadata_support();
++
++ /* setup kdbus paths */
++ if (!kdbus_args->module)
++ kdbus_args->module = "kdbus";
++
++ if (!kdbus_args->root) {
++ snprintf(fspath, sizeof(fspath), "/sys/fs/%s",
++ kdbus_args->module);
++ kdbus_args->root = fspath;
++ }
++
++ /* Start tests */
++ if (namespaces)
++ ret = run_tests_in_namespaces(kdbus_args);
++ else
++ ret = run_tests(kdbus_args);
++
++ return ret;
++}
++
++int main(int argc, char *argv[])
++{
++ int t, ret = 0;
++ struct kdbus_test_args *kdbus_args;
++ enum {
++ ARG_MNTNS = 0x100,
++ ARG_PIDNS,
++ ARG_USERNS,
++ ARG_UIDMAP,
++ ARG_GIDMAP,
++ };
++
++ kdbus_args = malloc(sizeof(*kdbus_args));
++ if (!kdbus_args) {
++ printf("unable to malloc() kdbus_args\n");
++ return EXIT_FAILURE;
++ }
++
++ memset(kdbus_args, 0, sizeof(*kdbus_args));
++
++ static const struct option options[] = {
++ { "loop", no_argument, NULL, 'x' },
++ { "help", no_argument, NULL, 'h' },
++ { "root", required_argument, NULL, 'r' },
++ { "test", required_argument, NULL, 't' },
++ { "bus", required_argument, NULL, 'b' },
++ { "wait", required_argument, NULL, 'w' },
++ { "fork", no_argument, NULL, 'f' },
++ { "module", required_argument, NULL, 'm' },
++ { "tap", no_argument, NULL, 'a' },
++ { "mntns", no_argument, NULL, ARG_MNTNS },
++ { "pidns", no_argument, NULL, ARG_PIDNS },
++ { "userns", no_argument, NULL, ARG_USERNS },
++ { "uidmap", required_argument, NULL, ARG_UIDMAP },
++ { "gidmap", required_argument, NULL, ARG_GIDMAP },
++ {}
++ };
++
++ srand(time(NULL));
++
++ while ((t = getopt_long(argc, argv, "hxfm:r:t:b:w:a", options, NULL)) >= 0) {
++ switch (t) {
++ case 'x':
++ kdbus_args->loop = 1;
++ break;
++
++ case 'm':
++ kdbus_args->module = optarg;
++ break;
++
++ case 'r':
++ kdbus_args->root = optarg;
++ break;
++
++ case 't':
++ kdbus_args->test = optarg;
++ break;
++
++ case 'b':
++ kdbus_args->busname = optarg;
++ break;
++
++ case 'w':
++ kdbus_args->wait = strtol(optarg, NULL, 10);
++ break;
++
++ case 'f':
++ kdbus_args->fork = 1;
++ break;
++
++ case 'a':
++ kdbus_args->tap_output = 1;
++ break;
++
++ case ARG_MNTNS:
++ kdbus_args->mntns = true;
++ break;
++
++ case ARG_PIDNS:
++ kdbus_args->pidns = true;
++ break;
++
++ case ARG_USERNS:
++ kdbus_args->userns = true;
++ break;
++
++ case ARG_UIDMAP:
++ kdbus_args->uid_map = optarg;
++ break;
++
++ case ARG_GIDMAP:
++ kdbus_args->gid_map = optarg;
++ break;
++
++ default:
++ case 'h':
++ usage(argv[0]);
++ }
++ }
++
++ ret = start_tests(kdbus_args);
++ if (ret == TEST_ERR)
++ return EXIT_FAILURE;
++
++ free(kdbus_args);
++
++ return 0;
++}
+diff --git a/tools/testing/selftests/kdbus/kdbus-test.h b/tools/testing/selftests/kdbus/kdbus-test.h
+new file mode 100644
+index 0000000..ee937f9
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/kdbus-test.h
+@@ -0,0 +1,84 @@
++#ifndef _TEST_KDBUS_H_
++#define _TEST_KDBUS_H_
++
++struct kdbus_test_env {
++ char *buspath;
++ const char *root;
++ const char *module;
++ int control_fd;
++ struct kdbus_conn *conn;
++};
++
++enum {
++ TEST_OK,
++ TEST_SKIP,
++ TEST_ERR,
++};
++
++#define ASSERT_RETURN_VAL(cond, val) \
++ if (!(cond)) { \
++ fprintf(stderr, "Assertion '%s' failed in %s(), %s:%d\n", \
++ #cond, __func__, __FILE__, __LINE__); \
++ return val; \
++ }
++
++#define ASSERT_EXIT_VAL(cond, val) \
++ if (!(cond)) { \
++ fprintf(stderr, "Assertion '%s' failed in %s(), %s:%d\n", \
++ #cond, __func__, __FILE__, __LINE__); \
++ _exit(val); \
++ }
++
++#define ASSERT_BREAK(cond) \
++ if (!(cond)) { \
++ fprintf(stderr, "Assertion '%s' failed in %s(), %s:%d\n", \
++ #cond, __func__, __FILE__, __LINE__); \
++ break; \
++ }
++
++#define ASSERT_RETURN(cond) \
++ ASSERT_RETURN_VAL(cond, TEST_ERR)
++
++#define ASSERT_EXIT(cond) \
++ ASSERT_EXIT_VAL(cond, EXIT_FAILURE)
++
++int kdbus_test_activator(struct kdbus_test_env *env);
++int kdbus_test_benchmark(struct kdbus_test_env *env);
++int kdbus_test_benchmark_nomemfds(struct kdbus_test_env *env);
++int kdbus_test_benchmark_uds(struct kdbus_test_env *env);
++int kdbus_test_bus_make(struct kdbus_test_env *env);
++int kdbus_test_byebye(struct kdbus_test_env *env);
++int kdbus_test_chat(struct kdbus_test_env *env);
++int kdbus_test_conn_info(struct kdbus_test_env *env);
++int kdbus_test_conn_update(struct kdbus_test_env *env);
++int kdbus_test_daemon(struct kdbus_test_env *env);
++int kdbus_test_custom_endpoint(struct kdbus_test_env *env);
++int kdbus_test_fd_passing(struct kdbus_test_env *env);
++int kdbus_test_free(struct kdbus_test_env *env);
++int kdbus_test_hello(struct kdbus_test_env *env);
++int kdbus_test_match_bloom(struct kdbus_test_env *env);
++int kdbus_test_match_id_add(struct kdbus_test_env *env);
++int kdbus_test_match_id_remove(struct kdbus_test_env *env);
++int kdbus_test_match_replace(struct kdbus_test_env *env);
++int kdbus_test_match_name_add(struct kdbus_test_env *env);
++int kdbus_test_match_name_change(struct kdbus_test_env *env);
++int kdbus_test_match_name_remove(struct kdbus_test_env *env);
++int kdbus_test_message_basic(struct kdbus_test_env *env);
++int kdbus_test_message_prio(struct kdbus_test_env *env);
++int kdbus_test_message_quota(struct kdbus_test_env *env);
++int kdbus_test_memory_access(struct kdbus_test_env *env);
++int kdbus_test_metadata_ns(struct kdbus_test_env *env);
++int kdbus_test_monitor(struct kdbus_test_env *env);
++int kdbus_test_name_basic(struct kdbus_test_env *env);
++int kdbus_test_name_conflict(struct kdbus_test_env *env);
++int kdbus_test_name_queue(struct kdbus_test_env *env);
++int kdbus_test_name_takeover(struct kdbus_test_env *env);
++int kdbus_test_policy(struct kdbus_test_env *env);
++int kdbus_test_policy_ns(struct kdbus_test_env *env);
++int kdbus_test_policy_priv(struct kdbus_test_env *env);
++int kdbus_test_sync_byebye(struct kdbus_test_env *env);
++int kdbus_test_sync_reply(struct kdbus_test_env *env);
++int kdbus_test_timeout(struct kdbus_test_env *env);
++int kdbus_test_writable_pool(struct kdbus_test_env *env);
++
++#endif /* _TEST_KDBUS_H_ */
+diff --git a/tools/testing/selftests/kdbus/kdbus-util.c b/tools/testing/selftests/kdbus/kdbus-util.c
+new file mode 100644
+index 0000000..82fa89b
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/kdbus-util.c
+@@ -0,0 +1,1612 @@
++/*
++ * Copyright (C) 2013-2015 Daniel Mack
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2014-2015 Djalal Harouni
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <stdio.h>
++#include <stdarg.h>
++#include <string.h>
++#include <time.h>
++#include <inttypes.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <stdbool.h>
++#include <errno.h>
++#include <assert.h>
++#include <poll.h>
++#include <grp.h>
++#include <sys/capability.h>
++#include <sys/mman.h>
++#include <sys/stat.h>
++#include <sys/time.h>
++#include <linux/unistd.h>
++#include <linux/memfd.h>
++
++#ifndef __NR_memfd_create
++ #ifdef __x86_64__
++ #define __NR_memfd_create 319
++ #elif defined __arm__
++ #define __NR_memfd_create 385
++ #else
++ #define __NR_memfd_create 356
++ #endif
++#endif
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++#ifndef F_ADD_SEALS
++#define F_LINUX_SPECIFIC_BASE 1024
++#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
++#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
++
++#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
++#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
++#define F_SEAL_GROW 0x0004 /* prevent file from growing */
++#define F_SEAL_WRITE 0x0008 /* prevent writes */
++#endif
++
++int kdbus_util_verbose = true;
++
++int kdbus_sysfs_get_parameter_mask(const char *path, uint64_t *mask)
++{
++ int ret;
++ FILE *file;
++ unsigned long long value;
++
++ file = fopen(path, "r");
++ if (!file) {
++ ret = -errno;
++ kdbus_printf("--- error fopen(): %d (%m)\n", ret);
++ return ret;
++ }
++
++ ret = fscanf(file, "%llu", &value);
++ if (ret != 1) {
++ if (ferror(file))
++ ret = -errno;
++ else
++ ret = -EIO;
++
++ kdbus_printf("--- error fscanf(): %d\n", ret);
++ fclose(file);
++ return ret;
++ }
++
++ *mask = (uint64_t)value;
++
++ fclose(file);
++
++ return 0;
++}
++
++int kdbus_sysfs_set_parameter_mask(const char *path, uint64_t mask)
++{
++ int ret;
++ FILE *file;
++
++ file = fopen(path, "w");
++ if (!file) {
++ ret = -errno;
++ kdbus_printf("--- error fopen(): %d (%m)\n", ret);
++ return ret;
++ }
++
++ ret = fprintf(file, "%llu", (unsigned long long)mask);
++ if (ret <= 0) {
++ ret = -EIO;
++ kdbus_printf("--- error fprintf(): %d\n", ret);
++ }
++
++ fclose(file);
++
++ return ret > 0 ? 0 : ret;
++}
++
++int kdbus_create_bus(int control_fd, const char *name,
++ uint64_t owner_meta, char **path)
++{
++ struct {
++ struct kdbus_cmd cmd;
++
++ /* bloom size item */
++ struct {
++ uint64_t size;
++ uint64_t type;
++ struct kdbus_bloom_parameter bloom;
++ } bp;
++
++ /* owner metadata items */
++ struct {
++ uint64_t size;
++ uint64_t type;
++ uint64_t flags;
++ } attach;
++
++ /* name item */
++ struct {
++ uint64_t size;
++ uint64_t type;
++ char str[64];
++ } name;
++ } bus_make;
++ int ret;
++
++ memset(&bus_make, 0, sizeof(bus_make));
++ bus_make.bp.size = sizeof(bus_make.bp);
++ bus_make.bp.type = KDBUS_ITEM_BLOOM_PARAMETER;
++ bus_make.bp.bloom.size = 64;
++ bus_make.bp.bloom.n_hash = 1;
++
++ snprintf(bus_make.name.str, sizeof(bus_make.name.str),
++ "%u-%s", getuid(), name);
++
++ bus_make.attach.type = KDBUS_ITEM_ATTACH_FLAGS_SEND;
++ bus_make.attach.size = sizeof(bus_make.attach);
++ bus_make.attach.flags = owner_meta;
++
++ bus_make.name.type = KDBUS_ITEM_MAKE_NAME;
++ bus_make.name.size = KDBUS_ITEM_HEADER_SIZE +
++ strlen(bus_make.name.str) + 1;
++
++ bus_make.cmd.flags = KDBUS_MAKE_ACCESS_WORLD;
++ bus_make.cmd.size = sizeof(bus_make.cmd) +
++ bus_make.bp.size +
++ bus_make.attach.size +
++ bus_make.name.size;
++
++ kdbus_printf("Creating bus with name >%s< on control fd %d ...\n",
++ name, control_fd);
++
++ ret = kdbus_cmd_bus_make(control_fd, &bus_make.cmd);
++ if (ret < 0) {
++ kdbus_printf("--- error when making bus: %d (%m)\n", ret);
++ return ret;
++ }
++
++ if (ret == 0 && path)
++ *path = strdup(bus_make.name.str);
++
++ return ret;
++}
++
++struct kdbus_conn *
++kdbus_hello(const char *path, uint64_t flags,
++ const struct kdbus_item *item, size_t item_size)
++{
++ struct kdbus_cmd_free cmd_free = {};
++ int fd, ret;
++ struct {
++ struct kdbus_cmd_hello hello;
++
++ struct {
++ uint64_t size;
++ uint64_t type;
++ char str[16];
++ } conn_name;
++
++ uint8_t extra_items[item_size];
++ } h;
++ struct kdbus_conn *conn;
++
++ memset(&h, 0, sizeof(h));
++
++ if (item_size > 0)
++ memcpy(h.extra_items, item, item_size);
++
++ kdbus_printf("-- opening bus connection %s\n", path);
++ fd = open(path, O_RDWR|O_CLOEXEC);
++ if (fd < 0) {
++ kdbus_printf("--- error %d (%m)\n", fd);
++ return NULL;
++ }
++
++ h.hello.flags = flags | KDBUS_HELLO_ACCEPT_FD;
++ h.hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++ h.hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
++ h.conn_name.type = KDBUS_ITEM_CONN_DESCRIPTION;
++ strcpy(h.conn_name.str, "this-is-my-name");
++ h.conn_name.size = KDBUS_ITEM_HEADER_SIZE + strlen(h.conn_name.str) + 1;
++
++ h.hello.size = sizeof(h);
++ h.hello.pool_size = POOL_SIZE;
++
++ ret = kdbus_cmd_hello(fd, (struct kdbus_cmd_hello *) &h.hello);
++ if (ret < 0) {
++ kdbus_printf("--- error when saying hello: %d (%m)\n", ret);
++ return NULL;
++ }
++ kdbus_printf("-- Our peer ID for %s: %llu -- bus uuid: '%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x'\n",
++ path, (unsigned long long)h.hello.id,
++ h.hello.id128[0], h.hello.id128[1], h.hello.id128[2],
++ h.hello.id128[3], h.hello.id128[4], h.hello.id128[5],
++ h.hello.id128[6], h.hello.id128[7], h.hello.id128[8],
++ h.hello.id128[9], h.hello.id128[10], h.hello.id128[11],
++ h.hello.id128[12], h.hello.id128[13], h.hello.id128[14],
++ h.hello.id128[15]);
++
++ cmd_free.size = sizeof(cmd_free);
++ cmd_free.offset = h.hello.offset;
++ kdbus_cmd_free(fd, &cmd_free);
++
++ conn = malloc(sizeof(*conn));
++ if (!conn) {
++ kdbus_printf("unable to malloc()!?\n");
++ return NULL;
++ }
++
++ conn->buf = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, fd, 0);
++ if (conn->buf == MAP_FAILED) {
++ free(conn);
++ close(fd);
++ kdbus_printf("--- error mmap (%m)\n");
++ return NULL;
++ }
++
++ conn->fd = fd;
++ conn->id = h.hello.id;
++ return conn;
++}
++
++struct kdbus_conn *
++kdbus_hello_registrar(const char *path, const char *name,
++ const struct kdbus_policy_access *access,
++ size_t num_access, uint64_t flags)
++{
++ struct kdbus_item *item, *items;
++ size_t i, size;
++
++ size = KDBUS_ITEM_SIZE(strlen(name) + 1) +
++ num_access * KDBUS_ITEM_SIZE(sizeof(*access));
++
++ items = alloca(size);
++
++ item = items;
++ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
++ item->type = KDBUS_ITEM_NAME;
++ strcpy(item->str, name);
++ item = KDBUS_ITEM_NEXT(item);
++
++ for (i = 0; i < num_access; i++) {
++ item->size = KDBUS_ITEM_HEADER_SIZE +
++ sizeof(struct kdbus_policy_access);
++ item->type = KDBUS_ITEM_POLICY_ACCESS;
++
++ item->policy_access.type = access[i].type;
++ item->policy_access.access = access[i].access;
++ item->policy_access.id = access[i].id;
++
++ item = KDBUS_ITEM_NEXT(item);
++ }
++
++ return kdbus_hello(path, flags, items, size);
++}
++
++struct kdbus_conn *kdbus_hello_activator(const char *path, const char *name,
++ const struct kdbus_policy_access *access,
++ size_t num_access)
++{
++ return kdbus_hello_registrar(path, name, access, num_access,
++ KDBUS_HELLO_ACTIVATOR);
++}
++
++bool kdbus_item_in_message(struct kdbus_msg *msg, uint64_t type)
++{
++ const struct kdbus_item *item;
++
++ KDBUS_ITEM_FOREACH(item, msg, items)
++ if (item->type == type)
++ return true;
++
++ return false;
++}
++
++int kdbus_bus_creator_info(struct kdbus_conn *conn,
++ uint64_t flags,
++ uint64_t *offset)
++{
++ struct kdbus_cmd_info *cmd;
++ size_t size = sizeof(*cmd);
++ int ret;
++
++ cmd = alloca(size);
++ memset(cmd, 0, size);
++ cmd->size = size;
++ cmd->attach_flags = flags;
++
++ ret = kdbus_cmd_bus_creator_info(conn->fd, cmd);
++ if (ret < 0) {
++ kdbus_printf("--- error when requesting info: %d (%m)\n", ret);
++ return ret;
++ }
++
++ if (offset)
++ *offset = cmd->offset;
++ else
++ kdbus_free(conn, cmd->offset);
++
++ return 0;
++}
++
++int kdbus_conn_info(struct kdbus_conn *conn, uint64_t id,
++ const char *name, uint64_t flags,
++ uint64_t *offset)
++{
++ struct kdbus_cmd_info *cmd;
++ size_t size = sizeof(*cmd);
++ struct kdbus_info *info;
++ int ret;
++
++ if (name)
++ size += KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
++
++ cmd = alloca(size);
++ memset(cmd, 0, size);
++ cmd->size = size;
++ cmd->attach_flags = flags;
++
++ if (name) {
++ cmd->items[0].size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
++ cmd->items[0].type = KDBUS_ITEM_NAME;
++ strcpy(cmd->items[0].str, name);
++ } else {
++ cmd->id = id;
++ }
++
++ ret = kdbus_cmd_conn_info(conn->fd, cmd);
++ if (ret < 0) {
++ kdbus_printf("--- error when requesting info: %d (%m)\n", ret);
++ return ret;
++ }
++
++ info = (struct kdbus_info *) (conn->buf + cmd->offset);
++ if (info->size != cmd->info_size) {
++ kdbus_printf("%s(): size mismatch: %d != %d\n", __func__,
++ (int) info->size, (int) cmd->info_size);
++ return -EIO;
++ }
++
++ if (offset)
++ *offset = cmd->offset;
++ else
++ kdbus_free(conn, cmd->offset);
++
++ return 0;
++}
++
++void kdbus_conn_free(struct kdbus_conn *conn)
++{
++ if (!conn)
++ return;
++
++ if (conn->buf)
++ munmap(conn->buf, POOL_SIZE);
++
++ if (conn->fd >= 0)
++ close(conn->fd);
++
++ free(conn);
++}
++
++int sys_memfd_create(const char *name, __u64 size)
++{
++ int ret, fd;
++
++ fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
++ if (fd < 0)
++ return fd;
++
++ ret = ftruncate(fd, size);
++ if (ret < 0) {
++ close(fd);
++ return ret;
++ }
++
++ return fd;
++}
++
++int sys_memfd_seal_set(int fd)
++{
++ return fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK |
++ F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL);
++}
++
++off_t sys_memfd_get_size(int fd, off_t *size)
++{
++ struct stat stat;
++ int ret;
++
++ ret = fstat(fd, &stat);
++ if (ret < 0) {
++ kdbus_printf("fstat() failed: %m\n");
++ return ret;
++ }
++
++ *size = stat.st_size;
++ return 0;
++}
++
++static int __kdbus_msg_send(const struct kdbus_conn *conn,
++ const char *name,
++ uint64_t cookie,
++ uint64_t flags,
++ uint64_t timeout,
++ int64_t priority,
++ uint64_t dst_id,
++ uint64_t cmd_flags,
++ int cancel_fd)
++{
++ struct kdbus_cmd_send *cmd = NULL;
++ struct kdbus_msg *msg = NULL;
++ const char ref1[1024 * 128 + 3] = "0123456789_0";
++ const char ref2[] = "0123456789_1";
++ struct kdbus_item *item;
++ struct timespec now;
++ uint64_t size;
++ int memfd = -1;
++ int ret;
++
++ size = sizeof(*msg) + 3 * KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++ if (dst_id == KDBUS_DST_ID_BROADCAST)
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
++ else {
++ memfd = sys_memfd_create("my-name-is-nice", 1024 * 1024);
++ if (memfd < 0) {
++ kdbus_printf("failed to create memfd: %m\n");
++ return memfd;
++ }
++
++ if (write(memfd, "kdbus memfd 1234567", 19) != 19) {
++ ret = -errno;
++ kdbus_printf("writing to memfd failed: %m\n");
++ goto out;
++ }
++
++ ret = sys_memfd_seal_set(memfd);
++ if (ret < 0) {
++ ret = -errno;
++ kdbus_printf("memfd sealing failed: %m\n");
++ goto out;
++ }
++
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
++ }
++
++ if (name)
++ size += KDBUS_ITEM_SIZE(strlen(name) + 1);
++
++ msg = malloc(size);
++ if (!msg) {
++ ret = -errno;
++ kdbus_printf("unable to malloc()!?\n");
++ goto out;
++ }
++
++ if (dst_id == KDBUS_DST_ID_BROADCAST)
++ flags |= KDBUS_MSG_SIGNAL;
++
++ memset(msg, 0, size);
++ msg->flags = flags;
++ msg->priority = priority;
++ msg->size = size;
++ msg->src_id = conn->id;
++ msg->dst_id = name ? 0 : dst_id;
++ msg->cookie = cookie;
++ msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++ if (timeout) {
++ ret = clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
++ if (ret < 0)
++ goto out;
++
++ msg->timeout_ns = now.tv_sec * 1000000000ULL +
++ now.tv_nsec + timeout;
++ }
++
++ item = msg->items;
++
++ if (name) {
++ item->type = KDBUS_ITEM_DST_NAME;
++ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
++ strcpy(item->str, name);
++ item = KDBUS_ITEM_NEXT(item);
++ }
++
++ item->type = KDBUS_ITEM_PAYLOAD_VEC;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++ item->vec.address = (uintptr_t)&ref1;
++ item->vec.size = sizeof(ref1);
++ item = KDBUS_ITEM_NEXT(item);
++
++ /* data padding for ref1 */
++ item->type = KDBUS_ITEM_PAYLOAD_VEC;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++ item->vec.address = (uintptr_t)NULL;
++ item->vec.size = KDBUS_ALIGN8(sizeof(ref1)) - sizeof(ref1);
++ item = KDBUS_ITEM_NEXT(item);
++
++ item->type = KDBUS_ITEM_PAYLOAD_VEC;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++ item->vec.address = (uintptr_t)&ref2;
++ item->vec.size = sizeof(ref2);
++ item = KDBUS_ITEM_NEXT(item);
++
++ if (dst_id == KDBUS_DST_ID_BROADCAST) {
++ item->type = KDBUS_ITEM_BLOOM_FILTER;
++ item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
++ item->bloom_filter.generation = 0;
++ } else {
++ item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_memfd);
++ item->memfd.size = 16;
++ item->memfd.fd = memfd;
++ }
++ item = KDBUS_ITEM_NEXT(item);
++
++ size = sizeof(*cmd);
++ if (cancel_fd != -1)
++ size += KDBUS_ITEM_SIZE(sizeof(cancel_fd));
++
++ cmd = malloc(size);
++ if (!cmd) {
++ ret = -errno;
++ kdbus_printf("unable to malloc()!?\n");
++ goto out;
++ }
++
++ cmd->size = size;
++ cmd->flags = cmd_flags;
++ cmd->msg_address = (uintptr_t)msg;
++
++ item = cmd->items;
++
++ if (cancel_fd != -1) {
++ item->type = KDBUS_ITEM_CANCEL_FD;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(cancel_fd);
++ item->fds[0] = cancel_fd;
++ item = KDBUS_ITEM_NEXT(item);
++ }
++
++ ret = kdbus_cmd_send(conn->fd, cmd);
++ if (ret < 0) {
++ kdbus_printf("error sending message: %d (%m)\n", ret);
++ goto out;
++ }
++
++ if (cmd_flags & KDBUS_SEND_SYNC_REPLY) {
++ struct kdbus_msg *reply;
++
++ kdbus_printf("SYNC REPLY @offset %llu:\n", cmd->reply.offset);
++ reply = (struct kdbus_msg *)(conn->buf + cmd->reply.offset);
++ kdbus_msg_dump(conn, reply);
++
++ kdbus_msg_free(reply);
++
++ ret = kdbus_free(conn, cmd->reply.offset);
++ if (ret < 0)
++ goto out;
++ }
++
++out:
++ free(msg);
++ free(cmd);
++
++ if (memfd >= 0)
++ close(memfd);
++
++ return ret < 0 ? ret : 0;
++}
++
++int kdbus_msg_send(const struct kdbus_conn *conn, const char *name,
++ uint64_t cookie, uint64_t flags, uint64_t timeout,
++ int64_t priority, uint64_t dst_id)
++{
++ return __kdbus_msg_send(conn, name, cookie, flags, timeout, priority,
++ dst_id, 0, -1);
++}
++
++int kdbus_msg_send_sync(const struct kdbus_conn *conn, const char *name,
++ uint64_t cookie, uint64_t flags, uint64_t timeout,
++ int64_t priority, uint64_t dst_id, int cancel_fd)
++{
++ return __kdbus_msg_send(conn, name, cookie, flags, timeout, priority,
++ dst_id, KDBUS_SEND_SYNC_REPLY, cancel_fd);
++}
++
++int kdbus_msg_send_reply(const struct kdbus_conn *conn,
++ uint64_t reply_cookie,
++ uint64_t dst_id)
++{
++ struct kdbus_cmd_send cmd = {};
++ struct kdbus_msg *msg;
++ const char ref1[1024 * 128 + 3] = "0123456789_0";
++ struct kdbus_item *item;
++ uint64_t size;
++ int ret;
++
++ size = sizeof(struct kdbus_msg);
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++ msg = malloc(size);
++ if (!msg) {
++ kdbus_printf("unable to malloc()!?\n");
++ return -ENOMEM;
++ }
++
++ memset(msg, 0, size);
++ msg->size = size;
++ msg->src_id = conn->id;
++ msg->dst_id = dst_id;
++ msg->cookie_reply = reply_cookie;
++ msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++ item = msg->items;
++
++ item->type = KDBUS_ITEM_PAYLOAD_VEC;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++ item->vec.address = (uintptr_t)&ref1;
++ item->vec.size = sizeof(ref1);
++ item = KDBUS_ITEM_NEXT(item);
++
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)msg;
++
++ ret = kdbus_cmd_send(conn->fd, &cmd);
++ if (ret < 0)
++ kdbus_printf("error sending message: %d (%m)\n", ret);
++
++ free(msg);
++
++ return ret;
++}
++
++static char *msg_id(uint64_t id, char *buf)
++{
++ if (id == 0)
++ return "KERNEL";
++ if (id == ~0ULL)
++ return "BROADCAST";
++ sprintf(buf, "%llu", (unsigned long long)id);
++ return buf;
++}
++
++int kdbus_msg_dump(const struct kdbus_conn *conn, const struct kdbus_msg *msg)
++{
++ const struct kdbus_item *item = msg->items;
++ char buf_src[32];
++ char buf_dst[32];
++ uint64_t timeout = 0;
++ uint64_t cookie_reply = 0;
++ int ret = 0;
++
++ if (msg->flags & KDBUS_MSG_EXPECT_REPLY)
++ timeout = msg->timeout_ns;
++ else
++ cookie_reply = msg->cookie_reply;
++
++ kdbus_printf("MESSAGE: %s (%llu bytes) flags=0x%08llx, %s → %s, "
++ "cookie=%llu, timeout=%llu cookie_reply=%llu priority=%lli\n",
++ enum_PAYLOAD(msg->payload_type), (unsigned long long)msg->size,
++ (unsigned long long)msg->flags,
++ msg_id(msg->src_id, buf_src), msg_id(msg->dst_id, buf_dst),
++ (unsigned long long)msg->cookie, (unsigned long long)timeout,
++ (unsigned long long)cookie_reply, (long long)msg->priority);
++
++ KDBUS_ITEM_FOREACH(item, msg, items) {
++ if (item->size < KDBUS_ITEM_HEADER_SIZE) {
++ kdbus_printf(" +%s (%llu bytes) invalid data record\n",
++ enum_MSG(item->type), item->size);
++ ret = -EINVAL;
++ break;
++ }
++
++ switch (item->type) {
++ case KDBUS_ITEM_PAYLOAD_OFF: {
++ char *s;
++
++ if (item->vec.offset == ~0ULL)
++ s = "[\\0-bytes]";
++ else
++ s = (char *)msg + item->vec.offset;
++
++ kdbus_printf(" +%s (%llu bytes) off=%llu size=%llu '%s'\n",
++ enum_MSG(item->type), item->size,
++ (unsigned long long)item->vec.offset,
++ (unsigned long long)item->vec.size, s);
++ break;
++ }
++
++ case KDBUS_ITEM_FDS: {
++ int i, n = (item->size - KDBUS_ITEM_HEADER_SIZE) /
++ sizeof(int);
++
++ kdbus_printf(" +%s (%llu bytes, %d fds)\n",
++ enum_MSG(item->type), item->size, n);
++
++ for (i = 0; i < n; i++)
++ kdbus_printf(" fd[%d] = %d\n",
++ i, item->fds[i]);
++
++ break;
++ }
++
++ case KDBUS_ITEM_PAYLOAD_MEMFD: {
++ char *buf;
++ off_t size;
++
++ buf = mmap(NULL, item->memfd.size, PROT_READ,
++ MAP_PRIVATE, item->memfd.fd, 0);
++ if (buf == MAP_FAILED) {
++ kdbus_printf("mmap() fd=%i size=%llu failed: %m\n",
++ item->memfd.fd, item->memfd.size);
++ break;
++ }
++
++ if (sys_memfd_get_size(item->memfd.fd, &size) < 0) {
++ kdbus_printf("sys_memfd_get_size() failed: %m\n");
++ break;
++ }
++
++ kdbus_printf(" +%s (%llu bytes) fd=%i size=%llu filesize=%llu '%s'\n",
++ enum_MSG(item->type), item->size, item->memfd.fd,
++ (unsigned long long)item->memfd.size,
++ (unsigned long long)size, buf);
++ munmap(buf, item->memfd.size);
++ break;
++ }
++
++ case KDBUS_ITEM_CREDS:
++ kdbus_printf(" +%s (%llu bytes) uid=%lld, euid=%lld, suid=%lld, fsuid=%lld, "
++ "gid=%lld, egid=%lld, sgid=%lld, fsgid=%lld\n",
++ enum_MSG(item->type), item->size,
++ item->creds.uid, item->creds.euid,
++ item->creds.suid, item->creds.fsuid,
++ item->creds.gid, item->creds.egid,
++ item->creds.sgid, item->creds.fsgid);
++ break;
++
++ case KDBUS_ITEM_PIDS:
++ kdbus_printf(" +%s (%llu bytes) pid=%lld, tid=%lld, ppid=%lld\n",
++ enum_MSG(item->type), item->size,
++ item->pids.pid, item->pids.tid,
++ item->pids.ppid);
++ break;
++
++ case KDBUS_ITEM_AUXGROUPS: {
++ int i, n;
++
++ kdbus_printf(" +%s (%llu bytes)\n",
++ enum_MSG(item->type), item->size);
++ n = (item->size - KDBUS_ITEM_HEADER_SIZE) /
++ sizeof(uint64_t);
++
++ for (i = 0; i < n; i++)
++ kdbus_printf(" gid[%d] = %lld\n",
++ i, item->data64[i]);
++ break;
++ }
++
++ case KDBUS_ITEM_NAME:
++ case KDBUS_ITEM_PID_COMM:
++ case KDBUS_ITEM_TID_COMM:
++ case KDBUS_ITEM_EXE:
++ case KDBUS_ITEM_CGROUP:
++ case KDBUS_ITEM_SECLABEL:
++ case KDBUS_ITEM_DST_NAME:
++ case KDBUS_ITEM_CONN_DESCRIPTION:
++ kdbus_printf(" +%s (%llu bytes) '%s' (%zu)\n",
++ enum_MSG(item->type), item->size,
++ item->str, strlen(item->str));
++ break;
++
++ case KDBUS_ITEM_OWNED_NAME: {
++ kdbus_printf(" +%s (%llu bytes) '%s' (%zu) flags=0x%08llx\n",
++ enum_MSG(item->type), item->size,
++ item->name.name, strlen(item->name.name),
++ item->name.flags);
++ break;
++ }
++
++ case KDBUS_ITEM_CMDLINE: {
++ size_t size = item->size - KDBUS_ITEM_HEADER_SIZE;
++ const char *str = item->str;
++ int count = 0;
++
++ kdbus_printf(" +%s (%llu bytes) ",
++ enum_MSG(item->type), item->size);
++ while (size) {
++ kdbus_printf("'%s' ", str);
++ size -= strlen(str) + 1;
++ str += strlen(str) + 1;
++ count++;
++ }
++
++ kdbus_printf("(%d string%s)\n",
++ count, (count == 1) ? "" : "s");
++ break;
++ }
++
++ case KDBUS_ITEM_AUDIT:
++ kdbus_printf(" +%s (%llu bytes) loginuid=%u sessionid=%u\n",
++ enum_MSG(item->type), item->size,
++ item->audit.loginuid, item->audit.sessionid);
++ break;
++
++ case KDBUS_ITEM_CAPS: {
++ const uint32_t *cap;
++ int n, i;
++
++ kdbus_printf(" +%s (%llu bytes) len=%llu bytes, last_cap %d\n",
++ enum_MSG(item->type), item->size,
++ (unsigned long long)item->size -
++ KDBUS_ITEM_HEADER_SIZE,
++ (int) item->caps.last_cap);
++
++ cap = item->caps.caps;
++ n = (item->size - offsetof(struct kdbus_item, caps.caps))
++ / 4 / sizeof(uint32_t);
++
++ kdbus_printf(" CapInh=");
++ for (i = 0; i < n; i++)
++ kdbus_printf("%08x", cap[(0 * n) + (n - i - 1)]);
++
++ kdbus_printf(" CapPrm=");
++ for (i = 0; i < n; i++)
++ kdbus_printf("%08x", cap[(1 * n) + (n - i - 1)]);
++
++ kdbus_printf(" CapEff=");
++ for (i = 0; i < n; i++)
++ kdbus_printf("%08x", cap[(2 * n) + (n - i - 1)]);
++
++ kdbus_printf(" CapBnd=");
++ for (i = 0; i < n; i++)
++ kdbus_printf("%08x", cap[(3 * n) + (n - i - 1)]);
++ kdbus_printf("\n");
++ break;
++ }
++
++ case KDBUS_ITEM_TIMESTAMP:
++ kdbus_printf(" +%s (%llu bytes) seq=%llu realtime=%lluns monotonic=%lluns\n",
++ enum_MSG(item->type), item->size,
++ (unsigned long long)item->timestamp.seqnum,
++ (unsigned long long)item->timestamp.realtime_ns,
++ (unsigned long long)item->timestamp.monotonic_ns);
++ break;
++
++ case KDBUS_ITEM_REPLY_TIMEOUT:
++ kdbus_printf(" +%s (%llu bytes) cookie=%llu\n",
++ enum_MSG(item->type), item->size,
++ msg->cookie_reply);
++ break;
++
++ case KDBUS_ITEM_NAME_ADD:
++ case KDBUS_ITEM_NAME_REMOVE:
++ case KDBUS_ITEM_NAME_CHANGE:
++ kdbus_printf(" +%s (%llu bytes) '%s', old id=%lld, now id=%lld, old_flags=0x%llx new_flags=0x%llx\n",
++ enum_MSG(item->type),
++ (unsigned long long) item->size,
++ item->name_change.name,
++ item->name_change.old_id.id,
++ item->name_change.new_id.id,
++ item->name_change.old_id.flags,
++ item->name_change.new_id.flags);
++ break;
++
++ case KDBUS_ITEM_ID_ADD:
++ case KDBUS_ITEM_ID_REMOVE:
++ kdbus_printf(" +%s (%llu bytes) id=%llu flags=%llu\n",
++ enum_MSG(item->type),
++ (unsigned long long) item->size,
++ (unsigned long long) item->id_change.id,
++ (unsigned long long) item->id_change.flags);
++ break;
++
++ default:
++ kdbus_printf(" +%s (%llu bytes)\n",
++ enum_MSG(item->type), item->size);
++ break;
++ }
++ }
++
++ if ((char *)item - ((char *)msg + msg->size) >= 8) {
++ kdbus_printf("invalid padding at end of message\n");
++ ret = -EINVAL;
++ }
++
++ kdbus_printf("\n");
++
++ return ret;
++}
++
++void kdbus_msg_free(struct kdbus_msg *msg)
++{
++ const struct kdbus_item *item;
++ int nfds, i;
++
++ if (!msg)
++ return;
++
++ KDBUS_ITEM_FOREACH(item, msg, items) {
++ switch (item->type) {
++ /* close all memfds */
++ case KDBUS_ITEM_PAYLOAD_MEMFD:
++ close(item->memfd.fd);
++ break;
++ case KDBUS_ITEM_FDS:
++ nfds = (item->size - KDBUS_ITEM_HEADER_SIZE) /
++ sizeof(int);
++
++ for (i = 0; i < nfds; i++)
++ close(item->fds[i]);
++
++ break;
++ }
++ }
++}
++
++int kdbus_msg_recv(struct kdbus_conn *conn,
++ struct kdbus_msg **msg_out,
++ uint64_t *offset)
++{
++ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
++ struct kdbus_msg *msg;
++ int ret;
++
++ ret = kdbus_cmd_recv(conn->fd, &recv);
++ if (ret < 0)
++ return ret;
++
++ msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++ ret = kdbus_msg_dump(conn, msg);
++ if (ret < 0) {
++ kdbus_msg_free(msg);
++ return ret;
++ }
++
++ if (msg_out) {
++ *msg_out = msg;
++
++ if (offset)
++ *offset = recv.msg.offset;
++ } else {
++ kdbus_msg_free(msg);
++
++ ret = kdbus_free(conn, recv.msg.offset);
++ if (ret < 0)
++ return ret;
++ }
++
++ return 0;
++}
++
++/*
++ * Returns: 0 on success, negative errno on failure.
++ *
++ * We must return -ETIMEDOUT, -ECONNRESET, -EAGAIN and other errors.
++ * We must return the result of kdbus_msg_recv()
++ */
++int kdbus_msg_recv_poll(struct kdbus_conn *conn,
++ int timeout_ms,
++ struct kdbus_msg **msg_out,
++ uint64_t *offset)
++{
++ int ret;
++
++ do {
++ struct timeval before, after, diff;
++ struct pollfd fd;
++
++ fd.fd = conn->fd;
++ fd.events = POLLIN | POLLPRI | POLLHUP;
++ fd.revents = 0;
++
++ gettimeofday(&before, NULL);
++ ret = poll(&fd, 1, timeout_ms);
++ gettimeofday(&after, NULL);
++
++ if (ret == 0) {
++ ret = -ETIMEDOUT;
++ break;
++ }
++
++ if (ret > 0) {
++ if (fd.revents & POLLIN)
++ ret = kdbus_msg_recv(conn, msg_out, offset);
++
++ if (fd.revents & (POLLHUP | POLLERR))
++ ret = -ECONNRESET;
++ }
++
++ if (ret == 0 || ret != -EAGAIN)
++ break;
++
++ timersub(&after, &before, &diff);
++ timeout_ms -= diff.tv_sec * 1000UL +
++ diff.tv_usec / 1000UL;
++ } while (timeout_ms > 0);
++
++ return ret;
++}
++
++int kdbus_free(const struct kdbus_conn *conn, uint64_t offset)
++{
++ struct kdbus_cmd_free cmd_free = {};
++ int ret;
++
++ cmd_free.size = sizeof(cmd_free);
++ cmd_free.offset = offset;
++ cmd_free.flags = 0;
++
++ ret = kdbus_cmd_free(conn->fd, &cmd_free);
++ if (ret < 0) {
++ kdbus_printf("KDBUS_CMD_FREE failed: %d (%m)\n", ret);
++ return ret;
++ }
++
++ return 0;
++}
++
++int kdbus_name_acquire(struct kdbus_conn *conn,
++ const char *name, uint64_t *flags)
++{
++ struct kdbus_cmd *cmd_name;
++ size_t name_len = strlen(name) + 1;
++ uint64_t size = sizeof(*cmd_name) + KDBUS_ITEM_SIZE(name_len);
++ struct kdbus_item *item;
++ int ret;
++
++ cmd_name = alloca(size);
++
++ memset(cmd_name, 0, size);
++
++ item = cmd_name->items;
++ item->size = KDBUS_ITEM_HEADER_SIZE + name_len;
++ item->type = KDBUS_ITEM_NAME;
++ strcpy(item->str, name);
++
++ cmd_name->size = size;
++ if (flags)
++ cmd_name->flags = *flags;
++
++ ret = kdbus_cmd_name_acquire(conn->fd, cmd_name);
++ if (ret < 0) {
++ kdbus_printf("error acquiring name: %s\n", strerror(-ret));
++ return ret;
++ }
++
++ kdbus_printf("%s(): flags after call: 0x%llx\n", __func__,
++ cmd_name->return_flags);
++
++ if (flags)
++ *flags = cmd_name->return_flags;
++
++ return 0;
++}
++
++int kdbus_name_release(struct kdbus_conn *conn, const char *name)
++{
++ struct kdbus_cmd *cmd_name;
++ size_t name_len = strlen(name) + 1;
++ uint64_t size = sizeof(*cmd_name) + KDBUS_ITEM_SIZE(name_len);
++ struct kdbus_item *item;
++ int ret;
++
++ cmd_name = alloca(size);
++
++ memset(cmd_name, 0, size);
++
++ item = cmd_name->items;
++ item->size = KDBUS_ITEM_HEADER_SIZE + name_len;
++ item->type = KDBUS_ITEM_NAME;
++ strcpy(item->str, name);
++
++ cmd_name->size = size;
++
++ kdbus_printf("conn %lld giving up name '%s'\n",
++ (unsigned long long) conn->id, name);
++
++ ret = kdbus_cmd_name_release(conn->fd, cmd_name);
++ if (ret < 0) {
++ kdbus_printf("error releasing name: %s\n", strerror(-ret));
++ return ret;
++ }
++
++ return 0;
++}
++
++int kdbus_list(struct kdbus_conn *conn, uint64_t flags)
++{
++ struct kdbus_cmd_list cmd_list = {};
++ struct kdbus_info *list, *name;
++ int ret;
++
++ cmd_list.size = sizeof(cmd_list);
++ cmd_list.flags = flags;
++
++ ret = kdbus_cmd_list(conn->fd, &cmd_list);
++ if (ret < 0) {
++ kdbus_printf("error listing names: %d (%m)\n", ret);
++ return ret;
++ }
++
++ kdbus_printf("REGISTRY:\n");
++ list = (struct kdbus_info *)(conn->buf + cmd_list.offset);
++
++ KDBUS_FOREACH(name, list, cmd_list.list_size) {
++ uint64_t flags = 0;
++ struct kdbus_item *item;
++ const char *n = "MISSING-NAME";
++
++ if (name->size == sizeof(struct kdbus_cmd))
++ continue;
++
++ KDBUS_ITEM_FOREACH(item, name, items)
++ if (item->type == KDBUS_ITEM_OWNED_NAME) {
++ n = item->name.name;
++ flags = item->name.flags;
++
++ kdbus_printf("%8llu flags=0x%08llx conn=0x%08llx '%s'\n",
++ name->id,
++ (unsigned long long) flags,
++ name->flags, n);
++ }
++ }
++ kdbus_printf("\n");
++
++ ret = kdbus_free(conn, cmd_list.offset);
++
++ return ret;
++}
++
++int kdbus_conn_update_attach_flags(struct kdbus_conn *conn,
++ uint64_t attach_flags_send,
++ uint64_t attach_flags_recv)
++{
++ int ret;
++ size_t size;
++ struct kdbus_cmd *update;
++ struct kdbus_item *item;
++
++ size = sizeof(struct kdbus_cmd);
++ size += KDBUS_ITEM_SIZE(sizeof(uint64_t)) * 2;
++
++ update = malloc(size);
++ if (!update) {
++ kdbus_printf("error malloc: %m\n");
++ return -ENOMEM;
++ }
++
++ memset(update, 0, size);
++ update->size = size;
++
++ item = update->items;
++
++ item->type = KDBUS_ITEM_ATTACH_FLAGS_SEND;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(uint64_t);
++ item->data64[0] = attach_flags_send;
++ item = KDBUS_ITEM_NEXT(item);
++
++ item->type = KDBUS_ITEM_ATTACH_FLAGS_RECV;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(uint64_t);
++ item->data64[0] = attach_flags_recv;
++ item = KDBUS_ITEM_NEXT(item);
++
++ ret = kdbus_cmd_update(conn->fd, update);
++ if (ret < 0)
++ kdbus_printf("error conn update: %d (%m)\n", ret);
++
++ free(update);
++
++ return ret;
++}
++
++int kdbus_conn_update_policy(struct kdbus_conn *conn, const char *name,
++ const struct kdbus_policy_access *access,
++ size_t num_access)
++{
++ struct kdbus_cmd *update;
++ struct kdbus_item *item;
++ size_t i, size;
++ int ret;
++
++ size = sizeof(struct kdbus_cmd);
++ size += KDBUS_ITEM_SIZE(strlen(name) + 1);
++ size += num_access * KDBUS_ITEM_SIZE(sizeof(struct kdbus_policy_access));
++
++ update = malloc(size);
++ if (!update) {
++ kdbus_printf("error malloc: %m\n");
++ return -ENOMEM;
++ }
++
++ memset(update, 0, size);
++ update->size = size;
++
++ item = update->items;
++
++ item->type = KDBUS_ITEM_NAME;
++ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
++ strcpy(item->str, name);
++ item = KDBUS_ITEM_NEXT(item);
++
++ for (i = 0; i < num_access; i++) {
++ item->size = KDBUS_ITEM_HEADER_SIZE +
++ sizeof(struct kdbus_policy_access);
++ item->type = KDBUS_ITEM_POLICY_ACCESS;
++
++ item->policy_access.type = access[i].type;
++ item->policy_access.access = access[i].access;
++ item->policy_access.id = access[i].id;
++
++ item = KDBUS_ITEM_NEXT(item);
++ }
++
++ ret = kdbus_cmd_update(conn->fd, update);
++ if (ret < 0)
++ kdbus_printf("error conn update: %d (%m)\n", ret);
++
++ free(update);
++
++ return ret;
++}
++
++int kdbus_add_match_id(struct kdbus_conn *conn, uint64_t cookie,
++ uint64_t type, uint64_t id)
++{
++ struct {
++ struct kdbus_cmd_match cmd;
++ struct {
++ uint64_t size;
++ uint64_t type;
++ struct kdbus_notify_id_change chg;
++ } item;
++ } buf;
++ int ret;
++
++ memset(&buf, 0, sizeof(buf));
++
++ buf.cmd.size = sizeof(buf);
++ buf.cmd.cookie = cookie;
++ buf.item.size = sizeof(buf.item);
++ buf.item.type = type;
++ buf.item.chg.id = id;
++
++ ret = kdbus_cmd_match_add(conn->fd, &buf.cmd);
++ if (ret < 0)
++ kdbus_printf("--- error adding conn match: %d (%m)\n", ret);
++
++ return ret;
++}
++
++int kdbus_add_match_empty(struct kdbus_conn *conn)
++{
++ struct {
++ struct kdbus_cmd_match cmd;
++ struct kdbus_item item;
++ } buf;
++ int ret;
++
++ memset(&buf, 0, sizeof(buf));
++
++ buf.item.size = sizeof(uint64_t) * 3;
++ buf.item.type = KDBUS_ITEM_ID;
++ buf.item.id = KDBUS_MATCH_ID_ANY;
++
++ buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
++
++ ret = kdbus_cmd_match_add(conn->fd, &buf.cmd);
++ if (ret < 0)
++ kdbus_printf("--- error adding conn match: %d (%m)\n", ret);
++
++ return ret;
++}
++
++static int all_ids_are_mapped(const char *path)
++{
++ int ret;
++ FILE *file;
++ uint32_t inside_id, length;
++
++ file = fopen(path, "r");
++ if (!file) {
++ ret = -errno;
++ kdbus_printf("error fopen() %s: %d (%m)\n",
++ path, ret);
++ return ret;
++ }
++
++ ret = fscanf(file, "%u\t%*u\t%u", &inside_id, &length);
++ if (ret != 2) {
++ if (ferror(file))
++ ret = -errno;
++ else
++ ret = -EIO;
++
++ kdbus_printf("--- error fscanf(): %d\n", ret);
++ fclose(file);
++ return ret;
++ }
++
++ fclose(file);
++
++ /*
++ * If the mapping starts at id 0 and its length is 4294967295,
++ * i.e. the invalid uid (uid_t) -1, then all uids/gids are mapped
++ */
++ if (inside_id == 0 && length == (uid_t) -1)
++ return 1;
++
++ return 0;
++}
++
++int all_uids_gids_are_mapped(void)
++{
++ int ret;
++
++ ret = all_ids_are_mapped("/proc/self/uid_map");
++ if (ret <= 0) {
++ kdbus_printf("--- error not all uids are mapped\n");
++ return 0;
++ }
++
++ ret = all_ids_are_mapped("/proc/self/gid_map");
++ if (ret <= 0) {
++ kdbus_printf("--- error not all gids are mapped\n");
++ return 0;
++ }
++
++ return 1;
++}
++
++int drop_privileges(uid_t uid, gid_t gid)
++{
++ int ret;
++
++ ret = setgroups(0, NULL);
++ if (ret < 0) {
++ ret = -errno;
++ kdbus_printf("error setgroups: %d (%m)\n", ret);
++ return ret;
++ }
++
++ ret = setresgid(gid, gid, gid);
++ if (ret < 0) {
++ ret = -errno;
++ kdbus_printf("error setresgid: %d (%m)\n", ret);
++ return ret;
++ }
++
++ ret = setresuid(uid, uid, uid);
++ if (ret < 0) {
++ ret = -errno;
++ kdbus_printf("error setresuid: %d (%m)\n", ret);
++ return ret;
++ }
++
++ return ret;
++}
++
++uint64_t now(clockid_t clock)
++{
++ struct timespec spec;
++
++ clock_gettime(clock, &spec);
++ return spec.tv_sec * 1000ULL * 1000ULL * 1000ULL + spec.tv_nsec;
++}
++
++char *unique_name(const char *prefix)
++{
++ unsigned int i;
++ uint64_t u_now;
++ char n[17];
++ char *str;
++ int r;
++
++ /*
++ * This returns a random string which is guaranteed to be
++ * globally unique across all calls to unique_name(). We
++ * compose the string as:
++ * <prefix>-<random>-<time>
++ * With:
++ * <prefix>: string provided by the caller
++ * <random>: a random alpha string of 16 characters
++ * <time>: the current time in nano-seconds since last boot
++ *
++ * The <random> part makes the string always look vastly different,
++ * the <time> part makes sure no two calls return the same string.
++ */
++
++ u_now = now(CLOCK_MONOTONIC);
++
++ for (i = 0; i < sizeof(n) - 1; ++i)
++ n[i] = 'a' + (rand() % ('z' - 'a'));
++ n[sizeof(n) - 1] = 0;
++
++ r = asprintf(&str, "%s-%s-%" PRIu64, prefix, n, u_now);
++ if (r < 0)
++ return NULL;
++
++ return str;
++}
++
++static int do_userns_map_id(pid_t pid,
++ const char *map_file,
++ const char *map_id)
++{
++ int ret;
++ int fd;
++ char *map;
++ unsigned int i;
++
++ map = strndupa(map_id, strlen(map_id));
++ if (!map) {
++ ret = -errno;
++ kdbus_printf("error strndupa %s: %d (%m)\n",
++ map_file, ret);
++ return ret;
++ }
++
++ for (i = 0; i < strlen(map); i++)
++ if (map[i] == ',')
++ map[i] = '\n';
++
++ fd = open(map_file, O_RDWR);
++ if (fd < 0) {
++ ret = -errno;
++ kdbus_printf("error open %s: %d (%m)\n",
++ map_file, ret);
++ return ret;
++ }
++
++ ret = write(fd, map, strlen(map));
++ if (ret < 0) {
++ ret = -errno;
++ kdbus_printf("error write to %s: %d (%m)\n",
++ map_file, ret);
++ goto out;
++ }
++
++ ret = 0;
++
++out:
++ close(fd);
++ return ret;
++}
++
++int userns_map_uid_gid(pid_t pid,
++ const char *map_uid,
++ const char *map_gid)
++{
++ int fd, ret;
++ char file_id[128] = {'\0'};
++
++ snprintf(file_id, sizeof(file_id), "/proc/%ld/uid_map",
++ (long) pid);
++
++ ret = do_userns_map_id(pid, file_id, map_uid);
++ if (ret < 0)
++ return ret;
++
++ snprintf(file_id, sizeof(file_id), "/proc/%ld/setgroups",
++ (long) pid);
++
++ fd = open(file_id, O_WRONLY);
++ if (fd >= 0) {
++ write(fd, "deny\n", 5);
++ close(fd);
++ }
++
++ snprintf(file_id, sizeof(file_id), "/proc/%ld/gid_map",
++ (long) pid);
++
++ return do_userns_map_id(pid, file_id, map_gid);
++}
++
++static int do_cap_get_flag(cap_t caps, cap_value_t cap)
++{
++ int ret;
++ cap_flag_value_t flag_set;
++
++ ret = cap_get_flag(caps, cap, CAP_EFFECTIVE, &flag_set);
++ if (ret < 0) {
++ ret = -errno;
++ kdbus_printf("error cap_get_flag(): %d (%m)\n", ret);
++ return ret;
++ }
++
++ return (flag_set == CAP_SET);
++}
++
++/*
++ * Returns:
++ * 1 in case all the requested effective capabilities are set.
++ * 0 in case we do not have the requested capabilities. This value
++ * will be used to abort tests with TEST_SKIP
++ * Negative errno on failure.
++ *
++ * Terminate args with a negative value.
++ */
++int test_is_capable(int cap, ...)
++{
++ int ret;
++ va_list ap;
++ cap_t caps;
++
++ caps = cap_get_proc();
++ if (!caps) {
++ ret = -errno;
++ kdbus_printf("error cap_get_proc(): %d (%m)\n", ret);
++ return ret;
++ }
++
++ ret = do_cap_get_flag(caps, (cap_value_t)cap);
++ if (ret <= 0)
++ goto out;
++
++ va_start(ap, cap);
++ while ((cap = va_arg(ap, int)) > 0) {
++ ret = do_cap_get_flag(caps, (cap_value_t)cap);
++ if (ret <= 0)
++ break;
++ }
++ va_end(ap);
++
++out:
++ cap_free(caps);
++ return ret;
++}
++
++int config_user_ns_is_enabled(void)
++{
++ return (access("/proc/self/uid_map", F_OK) == 0);
++}
++
++int config_auditsyscall_is_enabled(void)
++{
++ return (access("/proc/self/loginuid", F_OK) == 0);
++}
++
++int config_cgroups_is_enabled(void)
++{
++ return (access("/proc/self/cgroup", F_OK) == 0);
++}
++
++int config_security_is_enabled(void)
++{
++ int fd;
++ int ret;
++ char buf[128];
++
++ /* CONFIG_SECURITY is disabled */
++ if (access("/proc/self/attr/current", F_OK) != 0)
++ return 0;
++
++ /*
++ * Only if read() fails with EINVAL do we assume that
++ * SECLABEL and LSM are disabled
++ */
++ fd = open("/proc/self/attr/current", O_RDONLY|O_CLOEXEC);
++ if (fd < 0)
++ return 1;
++
++ ret = read(fd, buf, sizeof(buf));
++ if (ret == -1 && errno == EINVAL)
++ ret = 0;
++ else
++ ret = 1;
++
++ close(fd);
++
++ return ret;
++}
+diff --git a/tools/testing/selftests/kdbus/kdbus-util.h b/tools/testing/selftests/kdbus/kdbus-util.h
+new file mode 100644
+index 0000000..e1e18b9
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/kdbus-util.h
+@@ -0,0 +1,218 @@
++/*
++ * Copyright (C) 2013-2015 Kay Sievers
++ * Copyright (C) 2013-2015 Daniel Mack
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#pragma once
++
++#define BIT(X) (1 << (X))
++
++#include <time.h>
++#include <stdbool.h>
++#include <linux/kdbus.h>
++
++#define _STRINGIFY(x) #x
++#define STRINGIFY(x) _STRINGIFY(x)
++#define ELEMENTSOF(x) (sizeof(x)/sizeof((x)[0]))
++
++#define KDBUS_PTR(addr) ((void *)(uintptr_t)(addr))
++
++#define KDBUS_ALIGN8(l) (((l) + 7) & ~7)
++#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
++#define KDBUS_ITEM_SIZE(s) KDBUS_ALIGN8((s) + KDBUS_ITEM_HEADER_SIZE)
++
++#define KDBUS_ITEM_NEXT(item) \
++ (typeof(item))((uint8_t *)(item) + KDBUS_ALIGN8((item)->size))
++#define KDBUS_ITEM_FOREACH(item, head, first) \
++ for ((item) = (head)->first; \
++ ((uint8_t *)(item) < (uint8_t *)(head) + (head)->size) && \
++ ((uint8_t *)(item) >= (uint8_t *)(head)); \
++ (item) = KDBUS_ITEM_NEXT(item))
++#define KDBUS_FOREACH(iter, first, _size) \
++ for ((iter) = (first); \
++ ((uint8_t *)(iter) < (uint8_t *)(first) + (_size)) && \
++ ((uint8_t *)(iter) >= (uint8_t *)(first)); \
++ (iter) = (void *)((uint8_t *)(iter) + KDBUS_ALIGN8((iter)->size)))
++
++#define _KDBUS_ATTACH_BITS_SET_NR (__builtin_popcountll(_KDBUS_ATTACH_ALL))
++
++/* Sum of KDBUS_ITEM_* that reflects _KDBUS_ATTACH_ALL */
++#define KDBUS_ATTACH_ITEMS_TYPE_SUM \
++ ((((_KDBUS_ATTACH_BITS_SET_NR - 1) * \
++ ((_KDBUS_ATTACH_BITS_SET_NR - 1) + 1)) / 2) + \
++ (_KDBUS_ITEM_ATTACH_BASE * _KDBUS_ATTACH_BITS_SET_NR))
++
++#define POOL_SIZE (16 * 1024LU * 1024LU)
++
++#define UNPRIV_UID 65534
++#define UNPRIV_GID 65534
++
++/* Dump as user of process, useful for user namespace testing */
++#define SUID_DUMP_USER 1
++
++extern int kdbus_util_verbose;
++
++#define kdbus_printf(X...) \
++ if (kdbus_util_verbose) \
++ printf(X)
++
++#define RUN_UNPRIVILEGED(child_uid, child_gid, _child_, _parent_) ({ \
++ pid_t pid, rpid; \
++ int ret; \
++ \
++ pid = fork(); \
++ if (pid == 0) { \
++ ret = drop_privileges(child_uid, child_gid); \
++ ASSERT_EXIT_VAL(ret == 0, ret); \
++ \
++ _child_; \
++ _exit(0); \
++ } else if (pid > 0) { \
++ _parent_; \
++ rpid = waitpid(pid, &ret, 0); \
++ ASSERT_RETURN(rpid == pid); \
++ ASSERT_RETURN(WIFEXITED(ret)); \
++ ASSERT_RETURN(WEXITSTATUS(ret) == 0); \
++ ret = TEST_OK; \
++ } else { \
++ ret = pid; \
++ } \
++ \
++ ret; \
++ })
++
++#define RUN_UNPRIVILEGED_CONN(_var_, _bus_, _code_) \
++ RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({ \
++ struct kdbus_conn *_var_; \
++ _var_ = kdbus_hello(_bus_, 0, NULL, 0); \
++ ASSERT_EXIT(_var_); \
++ _code_; \
++ kdbus_conn_free(_var_); \
++ }), ({ 0; }))
++
++#define RUN_CLONE_CHILD(clone_ret, flags, _setup_, _child_body_, \
++ _parent_setup_, _parent_body_) ({ \
++ pid_t pid, rpid; \
++ int ret; \
++ int efd = -1; \
++ \
++ _setup_; \
++ efd = eventfd(0, EFD_CLOEXEC); \
++ ASSERT_RETURN(efd >= 0); \
++ *(clone_ret) = 0; \
++ pid = syscall(__NR_clone, flags, NULL); \
++ if (pid == 0) { \
++ eventfd_t event_status = 0; \
++ ret = prctl(PR_SET_PDEATHSIG, SIGKILL); \
++ ASSERT_EXIT(ret == 0); \
++ ret = eventfd_read(efd, &event_status); \
++ if (ret < 0 || event_status != 1) { \
++ kdbus_printf("error eventfd_read()\n"); \
++ _exit(EXIT_FAILURE); \
++ } \
++ _child_body_; \
++ _exit(0); \
++ } else if (pid > 0) { \
++ _parent_setup_; \
++ ret = eventfd_write(efd, 1); \
++ ASSERT_RETURN(ret >= 0); \
++ _parent_body_; \
++ rpid = waitpid(pid, &ret, 0); \
++ ASSERT_RETURN(rpid == pid); \
++ ASSERT_RETURN(WIFEXITED(ret)); \
++ ASSERT_RETURN(WEXITSTATUS(ret) == 0); \
++ ret = TEST_OK; \
++ } else { \
++ ret = -errno; \
++ *(clone_ret) = -errno; \
++ } \
++ close(efd); \
++ ret; \
++})
++
++/* Enum telling whether the parent should drop privileges */
++enum kdbus_drop_parent {
++ DO_NOT_DROP,
++ DROP_SAME_UNPRIV,
++ DROP_OTHER_UNPRIV,
++};
++
++struct kdbus_conn {
++ int fd;
++ uint64_t id;
++ unsigned char *buf;
++};
++
++int kdbus_sysfs_get_parameter_mask(const char *path, uint64_t *mask);
++int kdbus_sysfs_set_parameter_mask(const char *path, uint64_t mask);
++
++int sys_memfd_create(const char *name, __u64 size);
++int sys_memfd_seal_set(int fd);
++off_t sys_memfd_get_size(int fd, off_t *size);
++
++int kdbus_list(struct kdbus_conn *conn, uint64_t flags);
++int kdbus_name_release(struct kdbus_conn *conn, const char *name);
++int kdbus_name_acquire(struct kdbus_conn *conn, const char *name,
++ uint64_t *flags);
++void kdbus_msg_free(struct kdbus_msg *msg);
++int kdbus_msg_recv(struct kdbus_conn *conn,
++ struct kdbus_msg **msg, uint64_t *offset);
++int kdbus_msg_recv_poll(struct kdbus_conn *conn, int timeout_ms,
++ struct kdbus_msg **msg_out, uint64_t *offset);
++int kdbus_free(const struct kdbus_conn *conn, uint64_t offset);
++int kdbus_msg_dump(const struct kdbus_conn *conn,
++ const struct kdbus_msg *msg);
++int kdbus_create_bus(int control_fd, const char *name,
++ uint64_t owner_meta, char **path);
++int kdbus_msg_send(const struct kdbus_conn *conn, const char *name,
++ uint64_t cookie, uint64_t flags, uint64_t timeout,
++ int64_t priority, uint64_t dst_id);
++int kdbus_msg_send_sync(const struct kdbus_conn *conn, const char *name,
++ uint64_t cookie, uint64_t flags, uint64_t timeout,
++ int64_t priority, uint64_t dst_id, int cancel_fd);
++int kdbus_msg_send_reply(const struct kdbus_conn *conn,
++ uint64_t reply_cookie,
++ uint64_t dst_id);
++struct kdbus_conn *kdbus_hello(const char *path, uint64_t hello_flags,
++ const struct kdbus_item *item,
++ size_t item_size);
++struct kdbus_conn *kdbus_hello_registrar(const char *path, const char *name,
++ const struct kdbus_policy_access *access,
++ size_t num_access, uint64_t flags);
++struct kdbus_conn *kdbus_hello_activator(const char *path, const char *name,
++ const struct kdbus_policy_access *access,
++ size_t num_access);
++bool kdbus_item_in_message(struct kdbus_msg *msg, uint64_t type);
++int kdbus_bus_creator_info(struct kdbus_conn *conn,
++ uint64_t flags,
++ uint64_t *offset);
++int kdbus_conn_info(struct kdbus_conn *conn, uint64_t id,
++ const char *name, uint64_t flags, uint64_t *offset);
++void kdbus_conn_free(struct kdbus_conn *conn);
++int kdbus_conn_update_attach_flags(struct kdbus_conn *conn,
++ uint64_t attach_flags_send,
++ uint64_t attach_flags_recv);
++int kdbus_conn_update_policy(struct kdbus_conn *conn, const char *name,
++ const struct kdbus_policy_access *access,
++ size_t num_access);
++
++int kdbus_add_match_id(struct kdbus_conn *conn, uint64_t cookie,
++ uint64_t type, uint64_t id);
++int kdbus_add_match_empty(struct kdbus_conn *conn);
++
++int all_uids_gids_are_mapped(void);
++int drop_privileges(uid_t uid, gid_t gid);
++uint64_t now(clockid_t clock);
++char *unique_name(const char *prefix);
++
++int userns_map_uid_gid(pid_t pid, const char *map_uid, const char *map_gid);
++int test_is_capable(int cap, ...);
++int config_user_ns_is_enabled(void);
++int config_auditsyscall_is_enabled(void);
++int config_cgroups_is_enabled(void);
++int config_security_is_enabled(void);
+diff --git a/tools/testing/selftests/kdbus/test-activator.c b/tools/testing/selftests/kdbus/test-activator.c
+new file mode 100644
+index 0000000..3d1b763
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-activator.c
+@@ -0,0 +1,318 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stdbool.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <poll.h>
++#include <sys/capability.h>
++#include <sys/types.h>
++#include <sys/wait.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++static int kdbus_starter_poll(struct kdbus_conn *conn)
++{
++ int ret;
++ struct pollfd fd;
++
++ fd.fd = conn->fd;
++ fd.events = POLLIN | POLLPRI | POLLHUP;
++ fd.revents = 0;
++
++ ret = poll(&fd, 1, 100);
++ if (ret == 0)
++ return -ETIMEDOUT;
++ else if (ret > 0) {
++ if (fd.revents & POLLIN)
++ return 0;
++
++ if (fd.revents & (POLLHUP | POLLERR))
++ ret = -ECONNRESET;
++ }
++
++ return ret;
++}
++
++/* Ensure that kdbus activator logic is safe */
++static int kdbus_priv_activator(struct kdbus_test_env *env)
++{
++ int ret;
++ struct kdbus_msg *msg = NULL;
++ uint64_t cookie = 0xdeadbeef;
++ uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
++ struct kdbus_conn *activator;
++ struct kdbus_conn *service;
++ struct kdbus_conn *client;
++ struct kdbus_conn *holder;
++ struct kdbus_policy_access *access;
++
++ access = (struct kdbus_policy_access[]){
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = getuid(),
++ .access = KDBUS_POLICY_OWN,
++ },
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = getuid(),
++ .access = KDBUS_POLICY_TALK,
++ },
++ };
++
++ activator = kdbus_hello_activator(env->buspath, "foo.priv.activator",
++ access, 2);
++ ASSERT_RETURN(activator);
++
++ service = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(service);
++
++ client = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(client);
++
++ /*
++ * Make sure that other users can't TALK to the activator
++ */
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ /* Try to talk using the ID */
++ ret = kdbus_msg_send(unpriv, NULL, 0xdeadbeef, 0, 0,
++ 0, activator->id);
++ ASSERT_EXIT(ret == -ENXIO);
++
++ /* Try to talk to the name */
++ ret = kdbus_msg_send(unpriv, "foo.priv.activator",
++ 0xdeadbeef, 0, 0, 0,
++ KDBUS_DST_ID_NAME);
++ ASSERT_EXIT(ret == -EPERM);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ /*
++ * Make sure that we did not receive anything, so the
++ * service will not be started automatically
++ */
++
++ ret = kdbus_starter_poll(activator);
++ ASSERT_RETURN(ret == -ETIMEDOUT);
++
++ /*
++ * Now try to emulate the starter/service logic and
++ * acquire the name.
++ */
++
++ cookie++;
++ ret = kdbus_msg_send(service, "foo.priv.activator", cookie,
++ 0, 0, 0, KDBUS_DST_ID_NAME);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_starter_poll(activator);
++ ASSERT_RETURN(ret == 0);
++
++ /* Policies are still checked, access denied */
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "foo.priv.activator",
++ &flags);
++ ASSERT_EXIT(ret == -EPERM);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ ret = kdbus_name_acquire(service, "foo.priv.activator",
++ &flags);
++ ASSERT_RETURN(ret == 0);
++
++ /* We read our previous starter message */
++
++ ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* Try to talk, we still fail */
++
++ cookie++;
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ /* Try to talk to the name */
++ ret = kdbus_msg_send(unpriv, "foo.priv.activator",
++ cookie, 0, 0, 0,
++ KDBUS_DST_ID_NAME);
++ ASSERT_EXIT(ret == -EPERM);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ /* Still nothing to read */
++
++ ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
++ ASSERT_RETURN(ret == -ETIMEDOUT);
++
++ /* We receive everything now */
++
++ cookie++;
++ ret = kdbus_msg_send(client, "foo.priv.activator", cookie,
++ 0, 0, 0, KDBUS_DST_ID_NAME);
++ ASSERT_RETURN(ret == 0);
++ ret = kdbus_msg_recv_poll(service, 100, &msg, NULL);
++ ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
++
++ kdbus_msg_free(msg);
++
++ /* Policies default to deny TALK now */
++ kdbus_conn_free(activator);
++
++ cookie++;
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ /* Try to talk to the name */
++ ret = kdbus_msg_send(unpriv, "foo.priv.activator",
++ cookie, 0, 0, 0,
++ KDBUS_DST_ID_NAME);
++ ASSERT_EXIT(ret == -EPERM);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
++ ASSERT_RETURN(ret == -ETIMEDOUT);
++
++ /* Same user is able to TALK */
++ cookie++;
++ ret = kdbus_msg_send(client, "foo.priv.activator", cookie,
++ 0, 0, 0, KDBUS_DST_ID_NAME);
++ ASSERT_RETURN(ret == 0);
++ ret = kdbus_msg_recv_poll(service, 100, &msg, NULL);
++ ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
++
++ kdbus_msg_free(msg);
++
++ access = (struct kdbus_policy_access []){
++ {
++ .type = KDBUS_POLICY_ACCESS_WORLD,
++ .id = getuid(),
++ .access = KDBUS_POLICY_TALK,
++ },
++ };
++
++ holder = kdbus_hello_registrar(env->buspath, "foo.priv.activator",
++ access, 1, KDBUS_HELLO_POLICY_HOLDER);
++ ASSERT_RETURN(holder);
++
++ /* Now we are able to TALK to the name */
++
++ cookie++;
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ /* Try to talk to the name */
++ ret = kdbus_msg_send(unpriv, "foo.priv.activator",
++ cookie, 0, 0, 0,
++ KDBUS_DST_ID_NAME);
++ ASSERT_EXIT(ret == 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "foo.priv.activator",
++ &flags);
++ ASSERT_EXIT(ret == -EPERM);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ kdbus_conn_free(service);
++ kdbus_conn_free(client);
++ kdbus_conn_free(holder);
++
++ return 0;
++}
++
++int kdbus_test_activator(struct kdbus_test_env *env)
++{
++ int ret;
++ struct kdbus_conn *activator;
++ struct pollfd fds[2];
++ bool activator_done = false;
++ struct kdbus_policy_access access[2];
++
++ access[0].type = KDBUS_POLICY_ACCESS_USER;
++ access[0].id = getuid();
++ access[0].access = KDBUS_POLICY_OWN;
++
++ access[1].type = KDBUS_POLICY_ACCESS_WORLD;
++ access[1].access = KDBUS_POLICY_TALK;
++
++ activator = kdbus_hello_activator(env->buspath, "foo.test.activator",
++ access, 2);
++ ASSERT_RETURN(activator);
++
++ ret = kdbus_add_match_empty(env->conn);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_list(env->conn, KDBUS_LIST_NAMES |
++ KDBUS_LIST_UNIQUE |
++ KDBUS_LIST_ACTIVATORS |
++ KDBUS_LIST_QUEUED);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_send(env->conn, "foo.test.activator", 0xdeafbeef,
++ 0, 0, 0, KDBUS_DST_ID_NAME);
++ ASSERT_RETURN(ret == 0);
++
++ fds[0].fd = activator->fd;
++ fds[1].fd = env->conn->fd;
++
++ kdbus_printf("-- entering poll loop ...\n");
++
++ for (;;) {
++ int i, nfds = sizeof(fds) / sizeof(fds[0]);
++
++ for (i = 0; i < nfds; i++) {
++ fds[i].events = POLLIN | POLLPRI;
++ fds[i].revents = 0;
++ }
++
++ ret = poll(fds, nfds, 3000);
++ ASSERT_RETURN(ret >= 0);
++
++ ret = kdbus_list(env->conn, KDBUS_LIST_NAMES);
++ ASSERT_RETURN(ret == 0);
++
++ if ((fds[0].revents & POLLIN) && !activator_done) {
++ uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
++
++ kdbus_printf("Starter was called back!\n");
++
++ ret = kdbus_name_acquire(env->conn,
++ "foo.test.activator", &flags);
++ ASSERT_RETURN(ret == 0);
++
++ activator_done = true;
++ }
++
++ if (fds[1].revents & POLLIN) {
++ kdbus_msg_recv(env->conn, NULL, NULL);
++ break;
++ }
++ }
++
++ /* Check if all uids/gids are mapped */
++ if (!all_uids_gids_are_mapped())
++ return TEST_SKIP;
++
++ /* Check capabilities only now, so that the previous tests always run */
++ ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
++ ASSERT_RETURN(ret >= 0);
++
++ if (!ret)
++ return TEST_SKIP;
++
++ ret = kdbus_priv_activator(env);
++ ASSERT_RETURN(ret == 0);
++
++ kdbus_conn_free(activator);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-benchmark.c b/tools/testing/selftests/kdbus/test-benchmark.c
+new file mode 100644
+index 0000000..8a9744b
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-benchmark.c
+@@ -0,0 +1,451 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <locale.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <stdbool.h>
++#include <errno.h>
++#include <assert.h>
++#include <poll.h>
++#include <sys/time.h>
++#include <sys/mman.h>
++#include <sys/socket.h>
++#include <math.h>
++
++#include "kdbus-api.h"
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++#define SERVICE_NAME "foo.bar.echo"
++
++/*
++ * To have a benchmark comparison with unix sockets, set:
++ * use_memfd = false;
++ * compare_uds = true;
++ * attach_none = true; do not attach metadata
++ */
++
++static bool use_memfd = true; /* transmit memfd? */
++static bool compare_uds = false; /* unix-socket comparison? */
++static bool attach_none = false; /* clear attach-flags? */
++static char stress_payload[8192];
++
++struct stats {
++ uint64_t count;
++ uint64_t latency_acc;
++ uint64_t latency_low;
++ uint64_t latency_high;
++ uint64_t latency_avg;
++ uint64_t latency_ssquares;
++};
++
++static struct stats stats;
++
++static void reset_stats(void)
++{
++ stats.count = 0;
++ stats.latency_acc = 0;
++ stats.latency_low = UINT64_MAX;
++ stats.latency_high = 0;
++ stats.latency_avg = 0;
++ stats.latency_ssquares = 0;
++}
++
++static void dump_stats(bool is_uds)
++{
++ if (stats.count > 0) {
++ kdbus_printf("stats %s: %'llu packets processed, latency (nsecs) min/max/avg/dev %'7llu // %'7llu // %'7llu // %'7.f\n",
++ is_uds ? " (UNIX)" : "(KDBUS)",
++ (unsigned long long) stats.count,
++ (unsigned long long) stats.latency_low,
++ (unsigned long long) stats.latency_high,
++ (unsigned long long) stats.latency_avg,
++ sqrt(stats.latency_ssquares / stats.count));
++ } else {
++ kdbus_printf("*** no packets received. bus stuck?\n");
++ }
++}
++
++static void add_stats(uint64_t prev)
++{
++ uint64_t diff, latency_avg_prev;
++
++ diff = now(CLOCK_THREAD_CPUTIME_ID) - prev;
++
++ stats.count++;
++ stats.latency_acc += diff;
++
++ /* see Welford62 */
++ latency_avg_prev = stats.latency_avg;
++ stats.latency_avg = stats.latency_acc / stats.count;
++ stats.latency_ssquares += (diff - latency_avg_prev) * (diff - stats.latency_avg);
++
++ if (stats.latency_low > diff)
++ stats.latency_low = diff;
++
++ if (stats.latency_high < diff)
++ stats.latency_high = diff;
++}
++
++static int setup_simple_kdbus_msg(struct kdbus_conn *conn,
++ uint64_t dst_id,
++ struct kdbus_msg **msg_out)
++{
++ struct kdbus_msg *msg;
++ struct kdbus_item *item;
++ uint64_t size;
++
++ size = sizeof(struct kdbus_msg);
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++ msg = malloc(size);
++ ASSERT_RETURN_VAL(msg, -ENOMEM);
++
++ memset(msg, 0, size);
++ msg->size = size;
++ msg->src_id = conn->id;
++ msg->dst_id = dst_id;
++ msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++ item = msg->items;
++
++ item->type = KDBUS_ITEM_PAYLOAD_VEC;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++ item->vec.address = (uintptr_t) stress_payload;
++ item->vec.size = sizeof(stress_payload);
++ item = KDBUS_ITEM_NEXT(item);
++
++ *msg_out = msg;
++
++ return 0;
++}
++
++static int setup_memfd_kdbus_msg(struct kdbus_conn *conn,
++ uint64_t dst_id,
++ off_t *memfd_item_offset,
++ struct kdbus_msg **msg_out)
++{
++ struct kdbus_msg *msg;
++ struct kdbus_item *item;
++ uint64_t size;
++
++ size = sizeof(struct kdbus_msg);
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
++
++ msg = malloc(size);
++ ASSERT_RETURN_VAL(msg, -ENOMEM);
++
++ memset(msg, 0, size);
++ msg->size = size;
++ msg->src_id = conn->id;
++ msg->dst_id = dst_id;
++ msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++ item = msg->items;
++
++ item->type = KDBUS_ITEM_PAYLOAD_VEC;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++ item->vec.address = (uintptr_t) stress_payload;
++ item->vec.size = sizeof(stress_payload);
++ item = KDBUS_ITEM_NEXT(item);
++
++ item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_memfd);
++ item->memfd.size = sizeof(uint64_t);
++
++ *memfd_item_offset = (unsigned char *)item - (unsigned char *)msg;
++ *msg_out = msg;
++
++ return 0;
++}
++
++static int
++send_echo_request(struct kdbus_conn *conn, uint64_t dst_id,
++ void *kdbus_msg, off_t memfd_item_offset)
++{
++ struct kdbus_cmd_send cmd = {};
++ int memfd = -1;
++ int ret;
++
++ if (use_memfd) {
++ uint64_t now_ns = now(CLOCK_THREAD_CPUTIME_ID);
++ struct kdbus_item *item = memfd_item_offset + kdbus_msg;
++ memfd = sys_memfd_create("memfd-name", 0);
++ ASSERT_RETURN_VAL(memfd >= 0, memfd);
++
++ ret = write(memfd, &now_ns, sizeof(now_ns));
++ ASSERT_RETURN_VAL(ret == sizeof(now_ns), -EAGAIN);
++
++ ret = sys_memfd_seal_set(memfd);
++ ASSERT_RETURN_VAL(ret == 0, -errno);
++
++ item->memfd.fd = memfd;
++ }
++
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)kdbus_msg;
++
++ ret = kdbus_cmd_send(conn->fd, &cmd);
++ ASSERT_RETURN_VAL(ret == 0, ret);
++
++ close(memfd);
++
++ return 0;
++}
++
++static int
++handle_echo_reply(struct kdbus_conn *conn, uint64_t send_ns)
++{
++ int ret;
++ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
++ struct kdbus_msg *msg;
++ const struct kdbus_item *item;
++ bool has_memfd = false;
++
++ ret = kdbus_cmd_recv(conn->fd, &recv);
++ if (ret == -EAGAIN)
++ return ret;
++
++ ASSERT_RETURN_VAL(ret == 0, ret);
++
++ if (!use_memfd)
++ goto out;
++
++ msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++
++ KDBUS_ITEM_FOREACH(item, msg, items) {
++ switch (item->type) {
++ case KDBUS_ITEM_PAYLOAD_MEMFD: {
++ char *buf;
++
++ buf = mmap(NULL, item->memfd.size, PROT_READ,
++ MAP_PRIVATE, item->memfd.fd, 0);
++ ASSERT_RETURN_VAL(buf != MAP_FAILED, -EINVAL);
++ ASSERT_RETURN_VAL(item->memfd.size == sizeof(uint64_t),
++ -EINVAL);
++
++ add_stats(*(uint64_t*)buf);
++ munmap(buf, item->memfd.size);
++ close(item->memfd.fd);
++ has_memfd = true;
++ break;
++ }
++
++ case KDBUS_ITEM_PAYLOAD_OFF:
++ /* ignore */
++ break;
++ }
++ }
++
++out:
++ if (!has_memfd)
++ add_stats(send_ns);
++
++ ret = kdbus_free(conn, recv.msg.offset);
++ ASSERT_RETURN_VAL(ret == 0, -errno);
++
++ return 0;
++}
++
++static int benchmark(struct kdbus_test_env *env)
++{
++ static char buf[sizeof(stress_payload)];
++ struct kdbus_msg *kdbus_msg = NULL;
++ off_t memfd_cached_offset = 0;
++ int ret;
++ struct kdbus_conn *conn_a, *conn_b;
++ struct pollfd fds[2];
++ uint64_t start, send_ns, now_ns, diff;
++ unsigned int i;
++ int uds[2];
++
++ setlocale(LC_ALL, "");
++
++ for (i = 0; i < sizeof(stress_payload); i++)
++ stress_payload[i] = i;
++
++ /* setup kdbus pair */
++
++ conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
++ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn_a && conn_b);
++
++ ret = kdbus_add_match_empty(conn_a);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_add_match_empty(conn_b);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_name_acquire(conn_a, SERVICE_NAME, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ if (attach_none) {
++ ret = kdbus_conn_update_attach_flags(conn_a,
++ _KDBUS_ATTACH_ALL,
++ 0);
++ ASSERT_RETURN(ret == 0);
++ }
++
++ /* setup UDS pair */
++
++ ret = socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_NONBLOCK, 0, uds);
++ ASSERT_RETURN(ret == 0);
++
++ /* setup a kdbus msg now */
++ if (use_memfd) {
++ ret = setup_memfd_kdbus_msg(conn_b, conn_a->id,
++ &memfd_cached_offset,
++ &kdbus_msg);
++ ASSERT_RETURN(ret == 0);
++ } else {
++ ret = setup_simple_kdbus_msg(conn_b, conn_a->id, &kdbus_msg);
++ ASSERT_RETURN(ret == 0);
++ }
++
++ /* start benchmark */
++
++ kdbus_printf("-- entering poll loop ...\n");
++
++ do {
++ /* run kdbus benchmark */
++ fds[0].fd = conn_a->fd;
++ fds[1].fd = conn_b->fd;
++
++ /* cancel any pending message */
++ handle_echo_reply(conn_a, 0);
++
++ start = now(CLOCK_THREAD_CPUTIME_ID);
++ reset_stats();
++
++ send_ns = now(CLOCK_THREAD_CPUTIME_ID);
++ ret = send_echo_request(conn_b, conn_a->id,
++ kdbus_msg, memfd_cached_offset);
++ ASSERT_RETURN(ret == 0);
++
++ while (1) {
++ unsigned int nfds = sizeof(fds) / sizeof(fds[0]);
++ unsigned int i;
++
++ for (i = 0; i < nfds; i++) {
++ fds[i].events = POLLIN | POLLPRI | POLLHUP;
++ fds[i].revents = 0;
++ }
++
++ ret = poll(fds, nfds, 10);
++ if (ret < 0)
++ break;
++
++ if (fds[0].revents & POLLIN) {
++ ret = handle_echo_reply(conn_a, send_ns);
++ ASSERT_RETURN(ret == 0);
++
++ send_ns = now(CLOCK_THREAD_CPUTIME_ID);
++ ret = send_echo_request(conn_b, conn_a->id,
++ kdbus_msg,
++ memfd_cached_offset);
++ ASSERT_RETURN(ret == 0);
++ }
++
++ now_ns = now(CLOCK_THREAD_CPUTIME_ID);
++ diff = now_ns - start;
++ if (diff > 1000000000ULL) {
++ start = now_ns;
++
++ dump_stats(false);
++ break;
++ }
++ }
++
++ if (!compare_uds)
++ continue;
++
++ /* run unix-socket benchmark as comparison */
++
++ fds[0].fd = uds[0];
++ fds[1].fd = uds[1];
++
++ /* cancel any pending message */
++ read(uds[1], buf, sizeof(buf));
++
++ start = now(CLOCK_THREAD_CPUTIME_ID);
++ reset_stats();
++
++ send_ns = now(CLOCK_THREAD_CPUTIME_ID);
++ ret = write(uds[0], stress_payload, sizeof(stress_payload));
++ ASSERT_RETURN(ret == sizeof(stress_payload));
++
++ while (1) {
++ unsigned int nfds = sizeof(fds) / sizeof(fds[0]);
++ unsigned int i;
++
++ for (i = 0; i < nfds; i++) {
++ fds[i].events = POLLIN | POLLPRI | POLLHUP;
++ fds[i].revents = 0;
++ }
++
++ ret = poll(fds, nfds, 10);
++ if (ret < 0)
++ break;
++
++ if (fds[1].revents & POLLIN) {
++ ret = read(uds[1], buf, sizeof(buf));
++ ASSERT_RETURN(ret == sizeof(buf));
++
++ add_stats(send_ns);
++
++ send_ns = now(CLOCK_THREAD_CPUTIME_ID);
++ ret = write(uds[0], buf, sizeof(buf));
++ ASSERT_RETURN(ret == sizeof(buf));
++ }
++
++ now_ns = now(CLOCK_THREAD_CPUTIME_ID);
++ diff = now_ns - start;
++ if (diff > 1000000000ULL) {
++ start = now_ns;
++
++ dump_stats(true);
++ break;
++ }
++ }
++
++ } while (kdbus_util_verbose);
++
++ kdbus_printf("-- closing bus connections\n");
++
++ free(kdbus_msg);
++
++ kdbus_conn_free(conn_a);
++ kdbus_conn_free(conn_b);
++
++ return (stats.count > 1) ? TEST_OK : TEST_ERR;
++}
++
++int kdbus_test_benchmark(struct kdbus_test_env *env)
++{
++ use_memfd = true;
++ attach_none = false;
++ compare_uds = false;
++ return benchmark(env);
++}
++
++int kdbus_test_benchmark_nomemfds(struct kdbus_test_env *env)
++{
++ use_memfd = false;
++ attach_none = false;
++ compare_uds = false;
++ return benchmark(env);
++}
++
++int kdbus_test_benchmark_uds(struct kdbus_test_env *env)
++{
++ use_memfd = false;
++ attach_none = true;
++ compare_uds = true;
++ return benchmark(env);
++}
+diff --git a/tools/testing/selftests/kdbus/test-bus.c b/tools/testing/selftests/kdbus/test-bus.c
+new file mode 100644
+index 0000000..762fb30
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-bus.c
+@@ -0,0 +1,175 @@
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <limits.h>
++#include <sys/mman.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++static struct kdbus_item *kdbus_get_item(struct kdbus_info *info,
++ uint64_t type)
++{
++ struct kdbus_item *item;
++
++ KDBUS_ITEM_FOREACH(item, info, items)
++ if (item->type == type)
++ return item;
++
++ return NULL;
++}
++
++static int test_bus_creator_info(const char *bus_path)
++{
++ int ret;
++ uint64_t offset;
++ struct kdbus_conn *conn;
++ struct kdbus_info *info;
++ struct kdbus_item *item;
++ char *tmp, *busname;
++
++ /* extract the bus-name from @bus_path */
++ tmp = strdup(bus_path);
++ ASSERT_RETURN(tmp);
++ busname = strrchr(tmp, '/');
++ ASSERT_RETURN(busname);
++ *busname = 0;
++ busname = strrchr(tmp, '/');
++ ASSERT_RETURN(busname);
++ ++busname;
++
++ conn = kdbus_hello(bus_path, 0, NULL, 0);
++ ASSERT_RETURN(conn);
++
++ ret = kdbus_bus_creator_info(conn, _KDBUS_ATTACH_ALL, &offset);
++ ASSERT_RETURN(ret == 0);
++
++ info = (struct kdbus_info *)(conn->buf + offset);
++
++ item = kdbus_get_item(info, KDBUS_ITEM_MAKE_NAME);
++ ASSERT_RETURN(item);
++ ASSERT_RETURN(!strcmp(item->str, busname));
++
++ ret = kdbus_free(conn, offset);
++ ASSERT_RETURN_VAL(ret == 0, ret);
++
++ free(tmp);
++ kdbus_conn_free(conn);
++ return 0;
++}
++
++int kdbus_test_bus_make(struct kdbus_test_env *env)
++{
++ struct {
++ struct kdbus_cmd cmd;
++
++ /* bloom size item */
++ struct {
++ uint64_t size;
++ uint64_t type;
++ struct kdbus_bloom_parameter bloom;
++ } bs;
++
++ /* name item */
++ uint64_t n_size;
++ uint64_t n_type;
++ char name[64];
++ } bus_make;
++ char s[PATH_MAX], *name;
++ int ret, control_fd2;
++ uid_t uid;
++
++ name = unique_name("");
++ ASSERT_RETURN(name);
++
++ snprintf(s, sizeof(s), "%s/control", env->root);
++ env->control_fd = open(s, O_RDWR|O_CLOEXEC);
++ ASSERT_RETURN(env->control_fd >= 0);
++
++ control_fd2 = open(s, O_RDWR|O_CLOEXEC);
++ ASSERT_RETURN(control_fd2 >= 0);
++
++ memset(&bus_make, 0, sizeof(bus_make));
++
++ bus_make.bs.size = sizeof(bus_make.bs);
++ bus_make.bs.type = KDBUS_ITEM_BLOOM_PARAMETER;
++ bus_make.bs.bloom.size = 64;
++ bus_make.bs.bloom.n_hash = 1;
++
++ bus_make.n_type = KDBUS_ITEM_MAKE_NAME;
++
++ uid = getuid();
++
++ /* missing uid prefix */
++ snprintf(bus_make.name, sizeof(bus_make.name), "foo");
++ bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
++ bus_make.cmd.size = sizeof(struct kdbus_cmd) +
++ sizeof(bus_make.bs) + bus_make.n_size;
++ ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ /* non alphanumeric character */
++ snprintf(bus_make.name, sizeof(bus_make.name), "%u-blah@123", uid);
++ bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
++ bus_make.cmd.size = sizeof(struct kdbus_cmd) +
++ sizeof(bus_make.bs) + bus_make.n_size;
++ ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ /* '-' at the end */
++ snprintf(bus_make.name, sizeof(bus_make.name), "%u-blah-", uid);
++ bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
++ bus_make.cmd.size = sizeof(struct kdbus_cmd) +
++ sizeof(bus_make.bs) + bus_make.n_size;
++ ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ /* create a new bus */
++ snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-1", uid, name);
++ bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
++ bus_make.cmd.size = sizeof(struct kdbus_cmd) +
++ sizeof(bus_make.bs) + bus_make.n_size;
++ ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_cmd_bus_make(control_fd2, &bus_make.cmd);
++ ASSERT_RETURN(ret == -EEXIST);
++
++ snprintf(s, sizeof(s), "%s/%u-%s-1/bus", env->root, uid, name);
++ ASSERT_RETURN(access(s, F_OK) == 0);
++
++ ret = test_bus_creator_info(s);
++ ASSERT_RETURN(ret == 0);
++
++ /* can't use the same fd for bus make twice, even though a different
++ * bus name is used
++ */
++ snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-2", uid, name);
++ bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
++ bus_make.cmd.size = sizeof(struct kdbus_cmd) +
++ sizeof(bus_make.bs) + bus_make.n_size;
++ ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
++ ASSERT_RETURN(ret == -EBADFD);
++
++ /* create a new bus, with different fd and different bus name */
++ snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-2", uid, name);
++ bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
++ bus_make.cmd.size = sizeof(struct kdbus_cmd) +
++ sizeof(bus_make.bs) + bus_make.n_size;
++ ret = kdbus_cmd_bus_make(control_fd2, &bus_make.cmd);
++ ASSERT_RETURN(ret == 0);
++
++ close(control_fd2);
++ free(name);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-chat.c b/tools/testing/selftests/kdbus/test-chat.c
+new file mode 100644
+index 0000000..41e5b53
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-chat.c
+@@ -0,0 +1,124 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <poll.h>
++#include <stdbool.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++int kdbus_test_chat(struct kdbus_test_env *env)
++{
++ int ret, cookie;
++ struct kdbus_conn *conn_a, *conn_b;
++ struct pollfd fds[2];
++ uint64_t flags;
++ int count;
++
++ conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
++ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn_a && conn_b);
++
++ flags = KDBUS_NAME_ALLOW_REPLACEMENT;
++ ret = kdbus_name_acquire(conn_a, "foo.bar.test", &flags);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_name_acquire(conn_a, "foo.bar.baz", NULL);
++ ASSERT_RETURN(ret == 0);
++
++ flags = KDBUS_NAME_QUEUE;
++ ret = kdbus_name_acquire(conn_b, "foo.bar.baz", &flags);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_name_acquire(conn_a, "foo.bar.double", NULL);
++ ASSERT_RETURN(ret == 0);
++
++ flags = 0;
++ ret = kdbus_name_acquire(conn_a, "foo.bar.double", &flags);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(!(flags & KDBUS_NAME_ACQUIRED));
++
++ ret = kdbus_name_release(conn_a, "foo.bar.double");
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_name_release(conn_a, "foo.bar.double");
++ ASSERT_RETURN(ret == -ESRCH);
++
++ ret = kdbus_list(conn_b, KDBUS_LIST_UNIQUE |
++ KDBUS_LIST_NAMES |
++ KDBUS_LIST_QUEUED |
++ KDBUS_LIST_ACTIVATORS);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_add_match_empty(conn_a);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_add_match_empty(conn_b);
++ ASSERT_RETURN(ret == 0);
++
++ cookie = 0;
++ ret = kdbus_msg_send(conn_b, NULL, 0xc0000000 | cookie, 0, 0, 0,
++ KDBUS_DST_ID_BROADCAST);
++ ASSERT_RETURN(ret == 0);
++
++ fds[0].fd = conn_a->fd;
++ fds[1].fd = conn_b->fd;
++
++ kdbus_printf("-- entering poll loop ...\n");
++
++ for (count = 0;; count++) {
++ int i, nfds = sizeof(fds) / sizeof(fds[0]);
++
++ for (i = 0; i < nfds; i++) {
++ fds[i].events = POLLIN | POLLPRI | POLLHUP;
++ fds[i].revents = 0;
++ }
++
++ ret = poll(fds, nfds, 3000);
++ ASSERT_RETURN(ret >= 0);
++
++ if (fds[0].revents & POLLIN) {
++ if (count > 2)
++ kdbus_name_release(conn_a, "foo.bar.baz");
++
++ ret = kdbus_msg_recv(conn_a, NULL, NULL);
++ ASSERT_RETURN(ret == 0);
++ ret = kdbus_msg_send(conn_a, NULL,
++ 0xc0000000 | cookie++,
++ 0, 0, 0, conn_b->id);
++ ASSERT_RETURN(ret == 0);
++ }
++
++ if (fds[1].revents & POLLIN) {
++ ret = kdbus_msg_recv(conn_b, NULL, NULL);
++ ASSERT_RETURN(ret == 0);
++ ret = kdbus_msg_send(conn_b, NULL,
++ 0xc0000000 | cookie++,
++ 0, 0, 0, conn_a->id);
++ ASSERT_RETURN(ret == 0);
++ }
++
++ ret = kdbus_list(conn_b, KDBUS_LIST_UNIQUE |
++ KDBUS_LIST_NAMES |
++ KDBUS_LIST_QUEUED |
++ KDBUS_LIST_ACTIVATORS);
++ ASSERT_RETURN(ret == 0);
++
++ if (count > 10)
++ break;
++ }
++
++ kdbus_printf("-- closing bus connections\n");
++ kdbus_conn_free(conn_a);
++ kdbus_conn_free(conn_b);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-connection.c b/tools/testing/selftests/kdbus/test-connection.c
+new file mode 100644
+index 0000000..4688ce8
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-connection.c
+@@ -0,0 +1,597 @@
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <limits.h>
++#include <sys/types.h>
++#include <sys/capability.h>
++#include <sys/mman.h>
++#include <sys/syscall.h>
++#include <sys/wait.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++int kdbus_test_hello(struct kdbus_test_env *env)
++{
++ struct kdbus_cmd_free cmd_free = {};
++ struct kdbus_cmd_hello hello;
++ int fd, ret;
++
++ memset(&hello, 0, sizeof(hello));
++
++ fd = open(env->buspath, O_RDWR|O_CLOEXEC);
++ ASSERT_RETURN(fd >= 0);
++
++ hello.flags = KDBUS_HELLO_ACCEPT_FD;
++ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++ hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
++ hello.size = sizeof(struct kdbus_cmd_hello);
++ hello.pool_size = POOL_SIZE;
++
++ /* an unaligned hello must result in -EFAULT */
++ ret = kdbus_cmd_hello(fd, (struct kdbus_cmd_hello *) ((char *) &hello + 1));
++ ASSERT_RETURN(ret == -EFAULT);
++
++ /* a size smaller than the command header must return -EINVAL */
++ hello.size = 1;
++ hello.flags = KDBUS_HELLO_ACCEPT_FD;
++ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++ ret = kdbus_cmd_hello(fd, &hello);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ hello.size = sizeof(struct kdbus_cmd_hello);
++
++ /* check faulty flags */
++ hello.flags = 1ULL << 32;
++ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++ ret = kdbus_cmd_hello(fd, &hello);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ /* check for faulty pool sizes */
++ hello.pool_size = 0;
++ hello.flags = KDBUS_HELLO_ACCEPT_FD;
++ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++ ret = kdbus_cmd_hello(fd, &hello);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ hello.pool_size = 4097;
++ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++ ret = kdbus_cmd_hello(fd, &hello);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ hello.pool_size = POOL_SIZE;
++
++ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++ hello.offset = (__u64)-1;
++
++ /* success test */
++ ret = kdbus_cmd_hello(fd, &hello);
++ ASSERT_RETURN(ret == 0);
++
++ /* The kernel should have returned some items */
++ ASSERT_RETURN(hello.offset != (__u64)-1);
++ cmd_free.size = sizeof(cmd_free);
++ cmd_free.offset = hello.offset;
++ ret = kdbus_cmd_free(fd, &cmd_free);
++ ASSERT_RETURN(ret >= 0);
++
++ close(fd);
++
++ fd = open(env->buspath, O_RDWR|O_CLOEXEC);
++ ASSERT_RETURN(fd >= 0);
++
++ /* no ACTIVATOR flag without a name */
++ hello.flags = KDBUS_HELLO_ACTIVATOR;
++ ret = kdbus_cmd_hello(fd, &hello);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ close(fd);
++
++ return TEST_OK;
++}
++
++int kdbus_test_byebye(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *conn;
++ struct kdbus_cmd_recv cmd_recv = { .size = sizeof(cmd_recv) };
++ struct kdbus_cmd cmd_byebye = { .size = sizeof(cmd_byebye) };
++ int ret;
++
++ /* create a 2nd connection */
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn != NULL);
++
++ ret = kdbus_add_match_empty(conn);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_add_match_empty(env->conn);
++ ASSERT_RETURN(ret == 0);
++
++ /* send over 1st connection */
++ ret = kdbus_msg_send(env->conn, NULL, 0, 0, 0, 0,
++ KDBUS_DST_ID_BROADCAST);
++ ASSERT_RETURN(ret == 0);
++
++ /* say byebye on the 2nd, which must fail */
++ ret = kdbus_cmd_byebye(conn->fd, &cmd_byebye);
++ ASSERT_RETURN(ret == -EBUSY);
++
++ /* receive the message */
++ ret = kdbus_cmd_recv(conn->fd, &cmd_recv);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_free(conn, cmd_recv.msg.offset);
++ ASSERT_RETURN(ret == 0);
++
++ /* and try again */
++ ret = kdbus_cmd_byebye(conn->fd, &cmd_byebye);
++ ASSERT_RETURN(ret == 0);
++
++ /* a 2nd try should result in -ECONNRESET */
++ ret = kdbus_cmd_byebye(conn->fd, &cmd_byebye);
++ ASSERT_RETURN(ret == -ECONNRESET);
++
++ kdbus_conn_free(conn);
++
++ return TEST_OK;
++}
++
++/* Get only the first item */
++static struct kdbus_item *kdbus_get_item(struct kdbus_info *info,
++ uint64_t type)
++{
++ struct kdbus_item *item;
++
++ KDBUS_ITEM_FOREACH(item, info, items)
++ if (item->type == type)
++ return item;
++
++ return NULL;
++}
++
++static unsigned int kdbus_count_item(struct kdbus_info *info,
++ uint64_t type)
++{
++ unsigned int i = 0;
++ const struct kdbus_item *item;
++
++ KDBUS_ITEM_FOREACH(item, info, items)
++ if (item->type == type)
++ i++;
++
++ return i;
++}
++
++static int kdbus_fuzz_conn_info(struct kdbus_test_env *env, int capable)
++{
++ int ret;
++ unsigned int cnt = 0;
++ uint64_t offset = 0;
++ struct kdbus_info *info;
++ struct kdbus_conn *conn;
++ struct kdbus_conn *privileged;
++ const struct kdbus_item *item;
++ uint64_t valid_flags = KDBUS_ATTACH_NAMES |
++ KDBUS_ATTACH_CREDS |
++ KDBUS_ATTACH_PIDS |
++ KDBUS_ATTACH_CONN_DESCRIPTION;
++
++ uint64_t invalid_flags = KDBUS_ATTACH_NAMES |
++ KDBUS_ATTACH_CREDS |
++ KDBUS_ATTACH_PIDS |
++ KDBUS_ATTACH_CAPS |
++ KDBUS_ATTACH_CGROUP |
++ KDBUS_ATTACH_CONN_DESCRIPTION;
++
++ struct kdbus_creds cached_creds;
++ uid_t ruid, euid, suid;
++ gid_t rgid, egid, sgid;
++
++ getresuid(&ruid, &euid, &suid);
++ getresgid(&rgid, &egid, &sgid);
++
++ cached_creds.uid = ruid;
++ cached_creds.euid = euid;
++ cached_creds.suid = suid;
++ cached_creds.fsuid = ruid;
++
++ cached_creds.gid = rgid;
++ cached_creds.egid = egid;
++ cached_creds.sgid = sgid;
++ cached_creds.fsgid = rgid;
++
++ struct kdbus_pids cached_pids = {
++ .pid = getpid(),
++ .tid = syscall(SYS_gettid),
++ .ppid = getppid(),
++ };
++
++ ret = kdbus_conn_info(env->conn, env->conn->id, NULL,
++ valid_flags, &offset);
++ ASSERT_RETURN(ret == 0);
++
++ info = (struct kdbus_info *)(env->conn->buf + offset);
++ ASSERT_RETURN(info->id == env->conn->id);
++
++ /* We do not have any well-known name */
++ item = kdbus_get_item(info, KDBUS_ITEM_NAME);
++ ASSERT_RETURN(item == NULL);
++
++ item = kdbus_get_item(info, KDBUS_ITEM_CONN_DESCRIPTION);
++ if (valid_flags & KDBUS_ATTACH_CONN_DESCRIPTION) {
++ ASSERT_RETURN(item);
++ } else {
++ ASSERT_RETURN(item == NULL);
++ }
++
++ kdbus_free(env->conn, offset);
++
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn);
++
++ privileged = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(privileged);
++
++ ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
++ ASSERT_RETURN(ret == 0);
++
++ info = (struct kdbus_info *)(conn->buf + offset);
++ ASSERT_RETURN(info->id == conn->id);
++
++ /* We do not have any well-known name */
++ item = kdbus_get_item(info, KDBUS_ITEM_NAME);
++ ASSERT_RETURN(item == NULL);
++
++ cnt = kdbus_count_item(info, KDBUS_ITEM_CREDS);
++ if (valid_flags & KDBUS_ATTACH_CREDS) {
++ ASSERT_RETURN(cnt == 1);
++
++ item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
++ ASSERT_RETURN(item);
++
++ /* Compare received items with cached creds */
++ ASSERT_RETURN(memcmp(&item->creds, &cached_creds,
++ sizeof(struct kdbus_creds)) == 0);
++ } else {
++ ASSERT_RETURN(cnt == 0);
++ }
++
++ item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
++ if (valid_flags & KDBUS_ATTACH_PIDS) {
++ ASSERT_RETURN(item);
++
++ /* Compare item->pids with cached PIDs */
++ ASSERT_RETURN(item->pids.pid == cached_pids.pid &&
++ item->pids.tid == cached_pids.tid &&
++ item->pids.ppid == cached_pids.ppid);
++ } else {
++ ASSERT_RETURN(item == NULL);
++ }
++
++ /* We did not request KDBUS_ITEM_CAPS */
++ item = kdbus_get_item(info, KDBUS_ITEM_CAPS);
++ ASSERT_RETURN(item == NULL);
++
++ kdbus_free(conn, offset);
++
++ ret = kdbus_name_acquire(conn, "com.example.a", NULL);
++ ASSERT_RETURN(ret >= 0);
++
++ ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
++ ASSERT_RETURN(ret == 0);
++
++ info = (struct kdbus_info *)(conn->buf + offset);
++ ASSERT_RETURN(info->id == conn->id);
++
++ item = kdbus_get_item(info, KDBUS_ITEM_OWNED_NAME);
++ if (valid_flags & KDBUS_ATTACH_NAMES) {
++ ASSERT_RETURN(item && !strcmp(item->name.name, "com.example.a"));
++ } else {
++ ASSERT_RETURN(item == NULL);
++ }
++
++ kdbus_free(conn, offset);
++
++ ret = kdbus_conn_info(conn, 0, "com.example.a", valid_flags, &offset);
++ ASSERT_RETURN(ret == 0);
++
++ info = (struct kdbus_info *)(conn->buf + offset);
++ ASSERT_RETURN(info->id == conn->id);
++
++ kdbus_free(conn, offset);
++
++ /* does not have the necessary caps to drop to unprivileged */
++ if (!capable)
++ goto continue_test;
++
++ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
++ ret = kdbus_conn_info(conn, conn->id, NULL,
++ valid_flags, &offset);
++ ASSERT_EXIT(ret == 0);
++
++ info = (struct kdbus_info *)(conn->buf + offset);
++ ASSERT_EXIT(info->id == conn->id);
++
++ if (valid_flags & KDBUS_ATTACH_NAMES) {
++ item = kdbus_get_item(info, KDBUS_ITEM_OWNED_NAME);
++ ASSERT_EXIT(item &&
++ strcmp(item->name.name,
++ "com.example.a") == 0);
++ }
++
++ if (valid_flags & KDBUS_ATTACH_CREDS) {
++ item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
++ ASSERT_EXIT(item);
++
++ /* Compare received items with cached creds */
++ ASSERT_EXIT(memcmp(&item->creds, &cached_creds,
++ sizeof(struct kdbus_creds)) == 0);
++ }
++
++ if (valid_flags & KDBUS_ATTACH_PIDS) {
++ item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
++ ASSERT_EXIT(item);
++
++ /*
++ * Compare item->pids with cached pids of
++ * privileged one.
++ *
++ * cmd_info will always return cached pids.
++ */
++ ASSERT_EXIT(item->pids.pid == cached_pids.pid &&
++ item->pids.tid == cached_pids.tid);
++ }
++
++ kdbus_free(conn, offset);
++
++ /*
++ * Use invalid_flags and make sure that userspace
++ * does not play with us.
++ */
++ ret = kdbus_conn_info(conn, conn->id, NULL,
++ invalid_flags, &offset);
++ ASSERT_EXIT(ret == 0);
++
++ /*
++ * Make sure that we return only one creds item and
++ * it points to the cached creds.
++ */
++ cnt = kdbus_count_item(info, KDBUS_ITEM_CREDS);
++ if (invalid_flags & KDBUS_ATTACH_CREDS) {
++ ASSERT_EXIT(cnt == 1);
++
++ item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
++ ASSERT_EXIT(item);
++
++ /* Compare received items with cached creds */
++ ASSERT_EXIT(memcmp(&item->creds, &cached_creds,
++ sizeof(struct kdbus_creds)) == 0);
++ } else {
++ ASSERT_EXIT(cnt == 0);
++ }
++
++ if (invalid_flags & KDBUS_ATTACH_PIDS) {
++ cnt = kdbus_count_item(info, KDBUS_ITEM_PIDS);
++ ASSERT_EXIT(cnt == 1);
++
++ item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
++ ASSERT_EXIT(item);
++
++ /* Compare item->pids with cached pids */
++ ASSERT_EXIT(item->pids.pid == cached_pids.pid &&
++ item->pids.tid == cached_pids.tid);
++ }
++
++ cnt = kdbus_count_item(info, KDBUS_ITEM_CGROUP);
++ if (invalid_flags & KDBUS_ATTACH_CGROUP) {
++ ASSERT_EXIT(cnt == 1);
++ } else {
++ ASSERT_EXIT(cnt == 0);
++ }
++
++ cnt = kdbus_count_item(info, KDBUS_ITEM_CAPS);
++ if (invalid_flags & KDBUS_ATTACH_CAPS) {
++ ASSERT_EXIT(cnt == 1);
++ } else {
++ ASSERT_EXIT(cnt == 0);
++ }
++
++ kdbus_free(conn, offset);
++ }),
++ ({ 0; }));
++ ASSERT_RETURN(ret == 0);
++
++continue_test:
++
++ /* A second name */
++ ret = kdbus_name_acquire(conn, "com.example.b", NULL);
++ ASSERT_RETURN(ret >= 0);
++
++ ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
++ ASSERT_RETURN(ret == 0);
++
++ info = (struct kdbus_info *)(conn->buf + offset);
++ ASSERT_RETURN(info->id == conn->id);
++
++ cnt = kdbus_count_item(info, KDBUS_ITEM_OWNED_NAME);
++ if (valid_flags & KDBUS_ATTACH_NAMES) {
++ ASSERT_RETURN(cnt == 2);
++ } else {
++ ASSERT_RETURN(cnt == 0);
++ }
++
++ kdbus_free(conn, offset);
++
++ ASSERT_RETURN(ret == 0);
++
++ return 0;
++}
++
++int kdbus_test_conn_info(struct kdbus_test_env *env)
++{
++ int ret;
++ int have_caps;
++ struct {
++ struct kdbus_cmd_info cmd_info;
++
++ struct {
++ uint64_t size;
++ uint64_t type;
++ char str[64];
++ } name;
++ } buf;
++
++ buf.cmd_info.size = sizeof(struct kdbus_cmd_info);
++ buf.cmd_info.flags = 0;
++ buf.cmd_info.attach_flags = 0;
++ buf.cmd_info.id = env->conn->id;
++
++ ret = kdbus_conn_info(env->conn, env->conn->id, NULL, 0, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* try to pass a name that is longer than the buffer's size */
++ buf.name.size = KDBUS_ITEM_HEADER_SIZE + 1;
++ buf.name.type = KDBUS_ITEM_NAME;
++ strcpy(buf.name.str, "foo.bar.bla");
++
++ buf.cmd_info.id = 0;
++ buf.cmd_info.size = sizeof(buf.cmd_info) + buf.name.size;
++ ret = kdbus_cmd_conn_info(env->conn->fd, (struct kdbus_cmd_info *) &buf);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ /* Pass a non-existent name */
++ ret = kdbus_conn_info(env->conn, 0, "non.existent.name", 0, NULL);
++ ASSERT_RETURN(ret == -ESRCH);
++
++ if (!all_uids_gids_are_mapped())
++ return TEST_SKIP;
++
++ /* Test for caps here, so we run the previous test */
++ have_caps = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
++ ASSERT_RETURN(have_caps >= 0);
++
++ ret = kdbus_fuzz_conn_info(env, have_caps);
++ ASSERT_RETURN(ret == 0);
++
++ /* Now if we have skipped some tests then let the user know */
++ if (!have_caps)
++ return TEST_SKIP;
++
++ return TEST_OK;
++}
++
++int kdbus_test_conn_update(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *conn;
++ struct kdbus_msg *msg;
++ int found = 0;
++ int ret;
++
++ /*
++ * kdbus_hello() sets all attach flags. Receive a message by this
++ * connection, and make sure a timestamp item (just to pick one) is
++ * present.
++ */
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn);
++
++ ret = kdbus_msg_send(env->conn, NULL, 0x12345678, 0, 0, 0, conn->id);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(conn, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ found = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
++ ASSERT_RETURN(found == 1);
++
++ kdbus_msg_free(msg);
++
++ /*
++ * Now, modify the attach flags and repeat the action. The item must
++ * now be missing.
++ */
++ found = 0;
++
++ ret = kdbus_conn_update_attach_flags(conn,
++ _KDBUS_ATTACH_ALL,
++ _KDBUS_ATTACH_ALL &
++ ~KDBUS_ATTACH_TIMESTAMP);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_send(env->conn, NULL, 0x12345678, 0, 0, 0, conn->id);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(conn, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ found = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
++ ASSERT_RETURN(found == 0);
++
++ /* Provide a bogus attach_flags value */
++ ret = kdbus_conn_update_attach_flags(conn,
++ _KDBUS_ATTACH_ALL + 1,
++ _KDBUS_ATTACH_ALL);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ kdbus_msg_free(msg);
++
++ kdbus_conn_free(conn);
++
++ return TEST_OK;
++}
++
++int kdbus_test_writable_pool(struct kdbus_test_env *env)
++{
++ struct kdbus_cmd_free cmd_free = {};
++ struct kdbus_cmd_hello hello;
++ int fd, ret;
++ void *map;
++
++ fd = open(env->buspath, O_RDWR | O_CLOEXEC);
++ ASSERT_RETURN(fd >= 0);
++
++ memset(&hello, 0, sizeof(hello));
++ hello.flags = KDBUS_HELLO_ACCEPT_FD;
++ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++ hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
++ hello.size = sizeof(struct kdbus_cmd_hello);
++ hello.pool_size = POOL_SIZE;
++ hello.offset = (__u64)-1;
++
++ /* success test */
++ ret = kdbus_cmd_hello(fd, &hello);
++ ASSERT_RETURN(ret == 0);
++
++ /* The kernel should have returned some items */
++ ASSERT_RETURN(hello.offset != (__u64)-1);
++ cmd_free.size = sizeof(cmd_free);
++ cmd_free.offset = hello.offset;
++ ret = kdbus_cmd_free(fd, &cmd_free);
++ ASSERT_RETURN(ret >= 0);
++
++ /* pools cannot be mapped writable */
++ map = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
++ ASSERT_RETURN(map == MAP_FAILED);
++
++ /* pools can always be mapped readable */
++ map = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, fd, 0);
++ ASSERT_RETURN(map != MAP_FAILED);
++
++ /* make sure we cannot change protection masks to writable */
++ ret = mprotect(map, POOL_SIZE, PROT_READ | PROT_WRITE);
++ ASSERT_RETURN(ret < 0);
++
++ munmap(map, POOL_SIZE);
++ close(fd);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-daemon.c b/tools/testing/selftests/kdbus/test-daemon.c
+new file mode 100644
+index 0000000..8bc2386
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-daemon.c
+@@ -0,0 +1,65 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <poll.h>
++#include <stdbool.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++int kdbus_test_daemon(struct kdbus_test_env *env)
++{
++ struct pollfd fds[2];
++ int count;
++ int ret;
++
++ /* This test doesn't make any sense in non-interactive mode */
++ if (!kdbus_util_verbose)
++ return TEST_OK;
++
++ printf("Created connection %llu on bus '%s'\n",
++ (unsigned long long) env->conn->id, env->buspath);
++
++ ret = kdbus_name_acquire(env->conn, "com.example.kdbus-test", NULL);
++ ASSERT_RETURN(ret == 0);
++ printf(" Acquired name: com.example.kdbus-test\n");
++
++ fds[0].fd = env->conn->fd;
++ fds[1].fd = STDIN_FILENO;
++
++ printf("Monitoring connections:\n");
++
++ for (count = 0;; count++) {
++ int i, nfds = sizeof(fds) / sizeof(fds[0]);
++
++ for (i = 0; i < nfds; i++) {
++ fds[i].events = POLLIN | POLLPRI | POLLHUP;
++ fds[i].revents = 0;
++ }
++
++ ret = poll(fds, nfds, -1);
++ if (ret <= 0)
++ break;
++
++ if (fds[0].revents & POLLIN) {
++ ret = kdbus_msg_recv(env->conn, NULL, NULL);
++ ASSERT_RETURN(ret == 0);
++ }
++
++ /* stdin */
++ if (fds[1].revents & POLLIN)
++ break;
++ }
++
++ printf("Closing bus connection\n");
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-endpoint.c b/tools/testing/selftests/kdbus/test-endpoint.c
+new file mode 100644
+index 0000000..34a7be4
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-endpoint.c
+@@ -0,0 +1,352 @@
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <libgen.h>
++#include <sys/capability.h>
++#include <sys/wait.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++#define KDBUS_SYSNAME_MAX_LEN 63
++
++static int install_name_add_match(struct kdbus_conn *conn, const char *name)
++{
++ struct {
++ struct kdbus_cmd_match cmd;
++ struct {
++ uint64_t size;
++ uint64_t type;
++ struct kdbus_notify_name_change chg;
++ } item;
++ char name[64];
++ } buf;
++ int ret;
++
++ /* install the match rule */
++ memset(&buf, 0, sizeof(buf));
++ buf.item.type = KDBUS_ITEM_NAME_ADD;
++ buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
++ buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
++ strncpy(buf.name, name, sizeof(buf.name) - 1);
++ buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
++ buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
++
++ ret = kdbus_cmd_match_add(conn->fd, &buf.cmd);
++ if (ret < 0)
++ return ret;
++
++ return 0;
++}
++
++static int create_endpoint(const char *buspath, uid_t uid, const char *name,
++ uint64_t flags)
++{
++ struct {
++ struct kdbus_cmd cmd;
++
++ /* name item */
++ struct {
++ uint64_t size;
++ uint64_t type;
++ /* max should be KDBUS_SYSNAME_MAX_LEN */
++ char str[128];
++ } name;
++ } ep_make;
++ int fd, ret;
++
++ fd = open(buspath, O_RDWR);
++ if (fd < 0)
++ return fd;
++
++ memset(&ep_make, 0, sizeof(ep_make));
++
++ snprintf(ep_make.name.str,
++ /* Use the KDBUS_SYSNAME_MAX_LEN or sizeof(str) */
++ KDBUS_SYSNAME_MAX_LEN > strlen(name) ?
++ KDBUS_SYSNAME_MAX_LEN : sizeof(ep_make.name.str),
++ "%u-%s", uid, name);
++
++ ep_make.name.type = KDBUS_ITEM_MAKE_NAME;
++ ep_make.name.size = KDBUS_ITEM_HEADER_SIZE +
++ strlen(ep_make.name.str) + 1;
++
++ ep_make.cmd.flags = flags;
++ ep_make.cmd.size = sizeof(ep_make.cmd) + ep_make.name.size;
++
++ ret = kdbus_cmd_endpoint_make(fd, &ep_make.cmd);
++ if (ret < 0) {
++ kdbus_printf("error creating endpoint: %d (%m)\n", ret);
++ return ret;
++ }
++
++ return fd;
++}
++
++static int unpriv_test_custom_ep(const char *buspath)
++{
++ int ret, ep_fd1, ep_fd2;
++ char *ep1, *ep2, *tmp1, *tmp2;
++
++ tmp1 = strdup(buspath);
++ tmp2 = strdup(buspath);
++ ASSERT_RETURN(tmp1 && tmp2);
++
++ ret = asprintf(&ep1, "%s/%u-%s", dirname(tmp1), getuid(), "apps1");
++ ASSERT_RETURN(ret >= 0);
++
++ ret = asprintf(&ep2, "%s/%u-%s", dirname(tmp2), getuid(), "apps2");
++ ASSERT_RETURN(ret >= 0);
++
++ free(tmp1);
++ free(tmp2);
++
++ /* endpoint only accessible to current uid */
++ ep_fd1 = create_endpoint(buspath, getuid(), "apps1", 0);
++ ASSERT_RETURN(ep_fd1 >= 0);
++
++ /* endpoint world accessible */
++ ep_fd2 = create_endpoint(buspath, getuid(), "apps2",
++ KDBUS_MAKE_ACCESS_WORLD);
++ ASSERT_RETURN(ep_fd2 >= 0);
++
++ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_UID, ({
++ int ep_fd;
++ struct kdbus_conn *ep_conn;
++
++ /*
++ * Make sure that we are not able to create custom
++ * endpoints
++ */
++ ep_fd = create_endpoint(buspath, getuid(),
++ "unpriv_costum_ep", 0);
++ ASSERT_EXIT(ep_fd == -EPERM);
++
++ /*
++ * Endpoint "apps1" is only accessible to the user
++ * that owns the endpoint. Access is denied by the VFS.
++ */
++ ep_conn = kdbus_hello(ep1, 0, NULL, 0);
++ ASSERT_EXIT(!ep_conn && errno == EACCES);
++
++ /* Endpoint "apps2" world accessible */
++ ep_conn = kdbus_hello(ep2, 0, NULL, 0);
++ ASSERT_EXIT(ep_conn);
++
++ kdbus_conn_free(ep_conn);
++
++ _exit(EXIT_SUCCESS);
++ }),
++ ({ 0; }));
++ ASSERT_RETURN(ret == 0);
++
++ close(ep_fd1);
++ close(ep_fd2);
++ free(ep1);
++ free(ep2);
++
++ return 0;
++}
++
++static int update_endpoint(int fd, const char *name)
++{
++ int len = strlen(name) + 1;
++ struct {
++ struct kdbus_cmd cmd;
++
++ /* name item */
++ struct {
++ uint64_t size;
++ uint64_t type;
++ char str[KDBUS_ALIGN8(len)];
++ } name;
++
++ struct {
++ uint64_t size;
++ uint64_t type;
++ struct kdbus_policy_access access;
++ } access;
++ } ep_update;
++ int ret;
++
++ memset(&ep_update, 0, sizeof(ep_update));
++
++ ep_update.name.size = KDBUS_ITEM_HEADER_SIZE + len;
++ ep_update.name.type = KDBUS_ITEM_NAME;
++ strncpy(ep_update.name.str, name, sizeof(ep_update.name.str) - 1);
++
++ ep_update.access.size = sizeof(ep_update.access);
++ ep_update.access.type = KDBUS_ITEM_POLICY_ACCESS;
++ ep_update.access.access.type = KDBUS_POLICY_ACCESS_WORLD;
++ ep_update.access.access.access = KDBUS_POLICY_SEE;
++
++ ep_update.cmd.size = sizeof(ep_update);
++
++ ret = kdbus_cmd_endpoint_update(fd, &ep_update.cmd);
++ if (ret < 0) {
++ kdbus_printf("error updating endpoint: %d (%m)\n", ret);
++ return ret;
++ }
++
++ return 0;
++}
++
++int kdbus_test_custom_endpoint(struct kdbus_test_env *env)
++{
++ char *ep, *tmp;
++ int ret, ep_fd;
++ struct kdbus_msg *msg;
++ struct kdbus_conn *ep_conn;
++ struct kdbus_conn *reader;
++ const char *name = "foo.bar.baz";
++ const char *epname = "foo";
++ char fake_ep[KDBUS_SYSNAME_MAX_LEN + 1] = {'\0'};
++
++ memset(fake_ep, 'X', sizeof(fake_ep) - 1);
++
++ /* Try to create a custom endpoint with a long name */
++ ret = create_endpoint(env->buspath, getuid(), fake_ep, 0);
++ ASSERT_RETURN(ret == -ENAMETOOLONG);
++
++ /* Try to create a custom endpoint with a different uid */
++ ret = create_endpoint(env->buspath, getuid() + 1, "foobar", 0);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ /* create a custom endpoint, and open a connection on it */
++ ep_fd = create_endpoint(env->buspath, getuid(), "foo", 0);
++ ASSERT_RETURN(ep_fd >= 0);
++
++ tmp = strdup(env->buspath);
++ ASSERT_RETURN(tmp);
++
++ ret = asprintf(&ep, "%s/%u-%s", dirname(tmp), getuid(), epname);
++ free(tmp);
++ ASSERT_RETURN(ret >= 0);
++
++ /* Register a connection that listens to broadcasts */
++ reader = kdbus_hello(ep, 0, NULL, 0);
++ ASSERT_RETURN(reader);
++
++ /* Register for kernel signals */
++ ret = kdbus_add_match_id(reader, 0x1, KDBUS_ITEM_ID_ADD,
++ KDBUS_MATCH_ID_ANY);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_add_match_id(reader, 0x2, KDBUS_ITEM_ID_REMOVE,
++ KDBUS_MATCH_ID_ANY);
++ ASSERT_RETURN(ret == 0);
++
++ ret = install_name_add_match(reader, name);
++ ASSERT_RETURN(ret == 0);
++
++ /* Monitor connections are not supported on custom endpoints */
++ ep_conn = kdbus_hello(ep, KDBUS_HELLO_MONITOR, NULL, 0);
++ ASSERT_RETURN(!ep_conn && errno == EOPNOTSUPP);
++
++ ep_conn = kdbus_hello(ep, 0, NULL, 0);
++ ASSERT_RETURN(ep_conn);
++
++ /* Check that the reader got the IdAdd notification */
++ ret = kdbus_msg_recv(reader, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_ADD);
++ ASSERT_RETURN(msg->items[0].id_change.id == ep_conn->id);
++ kdbus_msg_free(msg);
++
++ /*
++ * Add a name add match on the endpoint connection, acquire name from
++ * the unfiltered connection, and make sure the filtered connection
++ * did not get the notification on the name owner change. Also, the
++ * endpoint connection may not be able to call conn_info, neither on
++ * the name nor on the ID.
++ */
++ ret = install_name_add_match(ep_conn, name);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_name_acquire(env->conn, name, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(ep_conn, NULL, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ ret = kdbus_conn_info(ep_conn, 0, name, 0, NULL);
++ ASSERT_RETURN(ret == -ESRCH);
++
++ ret = kdbus_conn_info(ep_conn, 0, "random.crappy.name", 0, NULL);
++ ASSERT_RETURN(ret == -ESRCH);
++
++ ret = kdbus_conn_info(ep_conn, env->conn->id, NULL, 0, NULL);
++ ASSERT_RETURN(ret == -ENXIO);
++
++ ret = kdbus_conn_info(ep_conn, 0x0fffffffffffffffULL, NULL, 0, NULL);
++ ASSERT_RETURN(ret == -ENXIO);
++
++ /* Check that the reader did not receive the name notification */
++ ret = kdbus_msg_recv(reader, NULL, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ /*
++ * Release the name again, update the custom endpoint policy,
++ * and try again. This time, the connection on the custom endpoint
++ * should have gotten it.
++ */
++ ret = kdbus_name_release(env->conn, name);
++ ASSERT_RETURN(ret == 0);
++
++ /* Check that the reader did not receive the name notification */
++ ret = kdbus_msg_recv(reader, NULL, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ ret = update_endpoint(ep_fd, name);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_name_acquire(env->conn, name, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(ep_conn, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_ADD);
++ ASSERT_RETURN(msg->items[0].name_change.old_id.id == 0);
++ ASSERT_RETURN(msg->items[0].name_change.new_id.id == env->conn->id);
++ ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
++ kdbus_msg_free(msg);
++
++ ret = kdbus_msg_recv(reader, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
++
++ kdbus_msg_free(msg);
++
++ ret = kdbus_conn_info(ep_conn, 0, name, 0, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_conn_info(ep_conn, env->conn->id, NULL, 0, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* If we have privileges, test custom endpoints */
++ ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
++ ASSERT_RETURN(ret >= 0);
++
++ /*
++ * All uids/gids are mapped and we have the necessary caps
++ */
++ if (ret && all_uids_gids_are_mapped()) {
++ ret = unpriv_test_custom_ep(env->buspath);
++ ASSERT_RETURN(ret == 0);
++ }
++
++ kdbus_conn_free(reader);
++ kdbus_conn_free(ep_conn);
++ close(ep_fd);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-fd.c b/tools/testing/selftests/kdbus/test-fd.c
+new file mode 100644
+index 0000000..2ae0f5a
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-fd.c
+@@ -0,0 +1,789 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stdbool.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <sys/types.h>
++#include <sys/mman.h>
++#include <sys/socket.h>
++#include <sys/wait.h>
++
++#include "kdbus-api.h"
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++#define KDBUS_MSG_MAX_ITEMS 128
++#define KDBUS_USER_MAX_CONN 256
++
++/* maximum number of inflight fds in a target queue per user */
++#define KDBUS_CONN_MAX_FDS_PER_USER 16
++
++/* maximum number of memfd items per message */
++#define KDBUS_MSG_MAX_MEMFD_ITEMS 16
++
++static int make_msg_payload_dbus(uint64_t src_id, uint64_t dst_id,
++ uint64_t msg_size,
++ struct kdbus_msg **msg_dbus)
++{
++ struct kdbus_msg *msg;
++
++ msg = malloc(msg_size);
++ ASSERT_RETURN_VAL(msg, -ENOMEM);
++
++ memset(msg, 0, msg_size);
++ msg->size = msg_size;
++ msg->src_id = src_id;
++ msg->dst_id = dst_id;
++ msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++ *msg_dbus = msg;
++
++ return 0;
++}
++
++static void make_item_memfds(struct kdbus_item *item,
++ int *memfds, size_t memfd_size)
++{
++ size_t i;
++
++ for (i = 0; i < memfd_size; i++) {
++ item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
++ item->size = KDBUS_ITEM_HEADER_SIZE +
++ sizeof(struct kdbus_memfd);
++ item->memfd.fd = memfds[i];
++ item->memfd.size = sizeof(uint64_t); /* const size */
++ item = KDBUS_ITEM_NEXT(item);
++ }
++}
++
++static void make_item_fds(struct kdbus_item *item,
++ int *fd_array, size_t fd_size)
++{
++ size_t i;
++ item->type = KDBUS_ITEM_FDS;
++ item->size = KDBUS_ITEM_HEADER_SIZE + (sizeof(int) * fd_size);
++
++ for (i = 0; i < fd_size; i++)
++ item->fds[i] = fd_array[i];
++}
++
++static int memfd_write(const char *name, void *buf, size_t bufsize)
++{
++ ssize_t ret;
++ int memfd;
++
++ memfd = sys_memfd_create(name, 0);
++ ASSERT_RETURN_VAL(memfd >= 0, memfd);
++
++ ret = write(memfd, buf, bufsize);
++ ASSERT_RETURN_VAL(ret == (ssize_t)bufsize, -EAGAIN);
++
++ ret = sys_memfd_seal_set(memfd);
++ ASSERT_RETURN_VAL(ret == 0, -errno);
++
++ return memfd;
++}
++
++static int send_memfds(struct kdbus_conn *conn, uint64_t dst_id,
++ int *memfds_array, size_t memfd_count)
++{
++ struct kdbus_cmd_send cmd = {};
++ struct kdbus_item *item;
++ struct kdbus_msg *msg;
++ uint64_t size;
++ int ret;
++
++ size = sizeof(struct kdbus_msg);
++ size += memfd_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
++
++ if (dst_id == KDBUS_DST_ID_BROADCAST)
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
++
++ ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
++ ASSERT_RETURN_VAL(ret == 0, ret);
++
++ item = msg->items;
++
++ if (dst_id == KDBUS_DST_ID_BROADCAST) {
++ item->type = KDBUS_ITEM_BLOOM_FILTER;
++ item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
++ item = KDBUS_ITEM_NEXT(item);
++
++ msg->flags |= KDBUS_MSG_SIGNAL;
++ }
++
++ make_item_memfds(item, memfds_array, memfd_count);
++
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)msg;
++
++ ret = kdbus_cmd_send(conn->fd, &cmd);
++ if (ret < 0) {
++ kdbus_printf("error sending message: %d (%m)\n", ret);
++ return ret;
++ }
++
++ free(msg);
++ return 0;
++}
++
++static int send_fds(struct kdbus_conn *conn, uint64_t dst_id,
++ int *fd_array, size_t fd_count)
++{
++ struct kdbus_cmd_send cmd = {};
++ struct kdbus_item *item;
++ struct kdbus_msg *msg;
++ uint64_t size;
++ int ret;
++
++ size = sizeof(struct kdbus_msg);
++ size += KDBUS_ITEM_SIZE(sizeof(int) * fd_count);
++
++ if (dst_id == KDBUS_DST_ID_BROADCAST)
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
++
++ ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
++ ASSERT_RETURN_VAL(ret == 0, ret);
++
++ item = msg->items;
++
++ if (dst_id == KDBUS_DST_ID_BROADCAST) {
++ item->type = KDBUS_ITEM_BLOOM_FILTER;
++ item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
++ item = KDBUS_ITEM_NEXT(item);
++
++ msg->flags |= KDBUS_MSG_SIGNAL;
++ }
++
++ make_item_fds(item, fd_array, fd_count);
++
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)msg;
++
++ ret = kdbus_cmd_send(conn->fd, &cmd);
++ if (ret < 0) {
++ kdbus_printf("error sending message: %d (%m)\n", ret);
++ return ret;
++ }
++
++ free(msg);
++ return ret;
++}
++
++static int send_fds_memfds(struct kdbus_conn *conn, uint64_t dst_id,
++ int *fds_array, size_t fd_count,
++ int *memfds_array, size_t memfd_count)
++{
++ struct kdbus_cmd_send cmd = {};
++ struct kdbus_item *item;
++ struct kdbus_msg *msg;
++ uint64_t size;
++ int ret;
++
++ size = sizeof(struct kdbus_msg);
++ size += memfd_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
++ size += KDBUS_ITEM_SIZE(sizeof(int) * fd_count);
++
++ ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
++ ASSERT_RETURN_VAL(ret == 0, ret);
++
++ item = msg->items;
++
++ make_item_fds(item, fds_array, fd_count);
++ item = KDBUS_ITEM_NEXT(item);
++ make_item_memfds(item, memfds_array, memfd_count);
++
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)msg;
++
++ ret = kdbus_cmd_send(conn->fd, &cmd);
++ if (ret < 0) {
++ kdbus_printf("error sending message: %d (%m)\n", ret);
++ return ret;
++ }
++
++ free(msg);
++ return ret;
++}
++
++/* Return the number of received fds */
++static unsigned int kdbus_item_get_nfds(struct kdbus_msg *msg)
++{
++ unsigned int fds = 0;
++ const struct kdbus_item *item;
++
++ KDBUS_ITEM_FOREACH(item, msg, items) {
++ switch (item->type) {
++ case KDBUS_ITEM_FDS: {
++ fds += (item->size - KDBUS_ITEM_HEADER_SIZE) /
++ sizeof(int);
++ break;
++ }
++
++ case KDBUS_ITEM_PAYLOAD_MEMFD:
++ fds++;
++ break;
++
++ default:
++ break;
++ }
++ }
++
++ return fds;
++}
++
++static struct kdbus_msg *
++get_kdbus_msg_with_fd(struct kdbus_conn *conn_src,
++ uint64_t dst_id, uint64_t cookie, int fd)
++{
++ int ret;
++ uint64_t size;
++ struct kdbus_item *item;
++ struct kdbus_msg *msg;
++
++ size = sizeof(struct kdbus_msg);
++ if (fd >= 0)
++ size += KDBUS_ITEM_SIZE(sizeof(int));
++
++ ret = make_msg_payload_dbus(conn_src->id, dst_id, size, &msg);
++ ASSERT_RETURN_VAL(ret == 0, NULL);
++
++ msg->cookie = cookie;
++
++ if (fd >= 0) {
++ item = msg->items;
++
++ make_item_fds(item, (int *)&fd, 1);
++ }
++
++ return msg;
++}
++
++static int kdbus_test_no_fds(struct kdbus_test_env *env,
++ int *fds, int *memfd)
++{
++ pid_t pid;
++ int ret, status;
++ uint64_t cookie;
++ int connfd1, connfd2;
++ struct kdbus_msg *msg, *msg_sync_reply;
++ struct kdbus_cmd_hello hello;
++ struct kdbus_conn *conn_src, *conn_dst, *conn_dummy;
++ struct kdbus_cmd_send cmd = {};
++ struct kdbus_cmd_free cmd_free = {};
++
++ conn_src = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn_src);
++
++ connfd1 = open(env->buspath, O_RDWR|O_CLOEXEC);
++ ASSERT_RETURN(connfd1 >= 0);
++
++ connfd2 = open(env->buspath, O_RDWR|O_CLOEXEC);
++ ASSERT_RETURN(connfd2 >= 0);
++
++ /*
++ * Create connections without KDBUS_HELLO_ACCEPT_FD
++ * to test if send fd operations are blocked
++ */
++ conn_dst = malloc(sizeof(*conn_dst));
++ ASSERT_RETURN(conn_dst);
++
++ conn_dummy = malloc(sizeof(*conn_dummy));
++ ASSERT_RETURN(conn_dummy);
++
++ memset(&hello, 0, sizeof(hello));
++ hello.size = sizeof(struct kdbus_cmd_hello);
++ hello.pool_size = POOL_SIZE;
++ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++
++ ret = kdbus_cmd_hello(connfd1, &hello);
++ ASSERT_RETURN(ret == 0);
++
++ cmd_free.size = sizeof(cmd_free);
++ cmd_free.offset = hello.offset;
++ ret = kdbus_cmd_free(connfd1, &cmd_free);
++ ASSERT_RETURN(ret >= 0);
++
++ conn_dst->fd = connfd1;
++ conn_dst->id = hello.id;
++
++ memset(&hello, 0, sizeof(hello));
++ hello.size = sizeof(struct kdbus_cmd_hello);
++ hello.pool_size = POOL_SIZE;
++ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
++
++ ret = kdbus_cmd_hello(connfd2, &hello);
++ ASSERT_RETURN(ret == 0);
++
++ cmd_free.size = sizeof(cmd_free);
++ cmd_free.offset = hello.offset;
++ ret = kdbus_cmd_free(connfd2, &cmd_free);
++ ASSERT_RETURN(ret >= 0);
++
++ conn_dummy->fd = connfd2;
++ conn_dummy->id = hello.id;
++
++ conn_dst->buf = mmap(NULL, POOL_SIZE, PROT_READ,
++ MAP_SHARED, connfd1, 0);
++ ASSERT_RETURN(conn_dst->buf != MAP_FAILED);
++
++ conn_dummy->buf = mmap(NULL, POOL_SIZE, PROT_READ,
++ MAP_SHARED, connfd2, 0);
++ ASSERT_RETURN(conn_dummy->buf != MAP_FAILED);
++
++ /*
++ * Send fds to a connection that does not accept fd passing
++ */
++ ret = send_fds(conn_src, conn_dst->id, fds, 1);
++ ASSERT_RETURN(ret == -ECOMM);
++
++ /*
++ * memfds are kdbus payload, so they are still accepted
++ */
++ ret = send_memfds(conn_src, conn_dst->id, memfd, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv_poll(conn_dst, 100, NULL, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ cookie = time(NULL);
++
++ pid = fork();
++ ASSERT_RETURN_VAL(pid >= 0, pid);
++
++ if (pid == 0) {
++ struct timespec now;
++
++ /*
++ * A sync send/reply to a connection that does not
++ * accept fds should fail if it contains an fd
++ */
++ msg_sync_reply = get_kdbus_msg_with_fd(conn_dst,
++ conn_dummy->id,
++ cookie, fds[0]);
++ ASSERT_EXIT(msg_sync_reply);
++
++ ret = clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
++ ASSERT_EXIT(ret == 0);
++
++ msg_sync_reply->timeout_ns = now.tv_sec * 1000000000ULL +
++ now.tv_nsec + 100000000ULL;
++ msg_sync_reply->flags = KDBUS_MSG_EXPECT_REPLY;
++
++ memset(&cmd, 0, sizeof(cmd));
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)msg_sync_reply;
++ cmd.flags = KDBUS_SEND_SYNC_REPLY;
++
++ ret = kdbus_cmd_send(conn_dst->fd, &cmd);
++ ASSERT_EXIT(ret == -ECOMM);
++
++ /*
++ * Now send a normal message, but the sync reply
++ * will fail since it contains an fd that the
++ * original sender does not want.
++ *
++ * The original sender will fail with -EREMOTEIO
++ */
++ cookie++;
++ ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
++ KDBUS_MSG_EXPECT_REPLY,
++ 5000000000ULL, 0, conn_src->id, -1);
++ ASSERT_EXIT(ret == -EREMOTEIO);
++
++ cookie++;
++ ret = kdbus_msg_recv_poll(conn_dst, 100, &msg, NULL);
++ ASSERT_EXIT(ret == 0);
++ ASSERT_EXIT(msg->cookie == cookie);
++
++ free(msg_sync_reply);
++ kdbus_msg_free(msg);
++
++ _exit(EXIT_SUCCESS);
++ }
++
++ ret = kdbus_msg_recv_poll(conn_dummy, 100, NULL, NULL);
++ ASSERT_RETURN(ret == -ETIMEDOUT);
++
++ cookie++;
++ ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
++ ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
++
++ kdbus_msg_free(msg);
++
++ /*
++ * Try to reply with a kdbus connection handle, this should
++ * fail with -EOPNOTSUPP
++ */
++ msg_sync_reply = get_kdbus_msg_with_fd(conn_src,
++ conn_dst->id,
++ cookie, conn_dst->fd);
++ ASSERT_RETURN(msg_sync_reply);
++
++ msg_sync_reply->cookie_reply = cookie;
++
++ memset(&cmd, 0, sizeof(cmd));
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)msg_sync_reply;
++
++ ret = kdbus_cmd_send(conn_src->fd, &cmd);
++ ASSERT_RETURN(ret == -EOPNOTSUPP);
++
++ free(msg_sync_reply);
++
++ /*
++ * Try to reply with a normal fd, this should fail even
++ * if the response is a sync reply
++ *
++ * From the sender's view we fail with -ECOMM
++ */
++ msg_sync_reply = get_kdbus_msg_with_fd(conn_src,
++ conn_dst->id,
++ cookie, fds[0]);
++ ASSERT_RETURN(msg_sync_reply);
++
++ msg_sync_reply->cookie_reply = cookie;
++
++ memset(&cmd, 0, sizeof(cmd));
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)msg_sync_reply;
++
++ ret = kdbus_cmd_send(conn_src->fd, &cmd);
++ ASSERT_RETURN(ret == -ECOMM);
++
++ free(msg_sync_reply);
++
++ /*
++ * Resend another normal message and check if the queue
++ * is clear
++ */
++ cookie++;
++ ret = kdbus_msg_send(conn_src, NULL, cookie, 0, 0, 0,
++ conn_dst->id);
++ ASSERT_RETURN(ret == 0);
++
++ ret = waitpid(pid, &status, 0);
++ ASSERT_RETURN_VAL(ret >= 0, ret);
++
++ kdbus_conn_free(conn_dummy);
++ kdbus_conn_free(conn_dst);
++ kdbus_conn_free(conn_src);
++
++ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++static int kdbus_send_multiple_fds(struct kdbus_conn *conn_src,
++ struct kdbus_conn *conn_dst)
++{
++ int ret, i;
++ unsigned int nfds;
++ int fds[KDBUS_CONN_MAX_FDS_PER_USER + 1];
++ int memfds[KDBUS_MSG_MAX_ITEMS + 1];
++ struct kdbus_msg *msg;
++ uint64_t dummy_value;
++
++ dummy_value = time(NULL);
++
++ for (i = 0; i < KDBUS_CONN_MAX_FDS_PER_USER + 1; i++) {
++ fds[i] = open("/dev/null", O_RDWR|O_CLOEXEC);
++ ASSERT_RETURN_VAL(fds[i] >= 0, -errno);
++ }
++
++ /* Send KDBUS_CONN_MAX_FDS_PER_USER with one more fd */
++ ret = send_fds(conn_src, conn_dst->id, fds,
++ KDBUS_CONN_MAX_FDS_PER_USER + 1);
++ ASSERT_RETURN(ret == -EMFILE);
++
++ /* Retry with the correct KDBUS_CONN_MAX_FDS_PER_USER */
++ ret = send_fds(conn_src, conn_dst->id, fds,
++ KDBUS_CONN_MAX_FDS_PER_USER);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(conn_dst, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* Check we got the right number of fds */
++ nfds = kdbus_item_get_nfds(msg);
++ ASSERT_RETURN(nfds == KDBUS_CONN_MAX_FDS_PER_USER);
++
++ kdbus_msg_free(msg);
++
++ for (i = 0; i < KDBUS_MSG_MAX_ITEMS + 1; i++, dummy_value++) {
++ memfds[i] = memfd_write("memfd-name",
++ &dummy_value,
++ sizeof(dummy_value));
++ ASSERT_RETURN_VAL(memfds[i] >= 0, memfds[i]);
++ }
++
++ /* Send KDBUS_MSG_MAX_ITEMS with one more memfd */
++ ret = send_memfds(conn_src, conn_dst->id,
++ memfds, KDBUS_MSG_MAX_ITEMS + 1);
++ ASSERT_RETURN(ret == -E2BIG);
++
++ ret = send_memfds(conn_src, conn_dst->id,
++ memfds, KDBUS_MSG_MAX_MEMFD_ITEMS + 1);
++ ASSERT_RETURN(ret == -E2BIG);
++
++ /* Retry with the correct KDBUS_MSG_MAX_MEMFD_ITEMS */
++ ret = send_memfds(conn_src, conn_dst->id,
++ memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(conn_dst, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* Check we got the right number of fds */
++ nfds = kdbus_item_get_nfds(msg);
++ ASSERT_RETURN(nfds == KDBUS_MSG_MAX_MEMFD_ITEMS);
++
++ kdbus_msg_free(msg);
++
++
++ /*
++ * Combine multiple KDBUS_CONN_MAX_FDS_PER_USER+1 fds and
++ * 10 memfds
++ */
++ ret = send_fds_memfds(conn_src, conn_dst->id,
++ fds, KDBUS_CONN_MAX_FDS_PER_USER + 1,
++ memfds, 10);
++ ASSERT_RETURN(ret == -EMFILE);
++
++ ret = kdbus_msg_recv(conn_dst, NULL, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ /*
++ * Combine multiple KDBUS_CONN_MAX_FDS_PER_USER fds and
++ * (128 - 1) + 1 memfds, all fds take one item, while each
++ * memfd takes one item
++ */
++ ret = send_fds_memfds(conn_src, conn_dst->id,
++ fds, KDBUS_CONN_MAX_FDS_PER_USER,
++ memfds, (KDBUS_MSG_MAX_ITEMS - 1) + 1);
++ ASSERT_RETURN(ret == -E2BIG);
++
++ ret = send_fds_memfds(conn_src, conn_dst->id,
++ fds, KDBUS_CONN_MAX_FDS_PER_USER,
++ memfds, KDBUS_MSG_MAX_MEMFD_ITEMS + 1);
++ ASSERT_RETURN(ret == -E2BIG);
++
++ ret = kdbus_msg_recv(conn_dst, NULL, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ /*
++ * Send KDBUS_CONN_MAX_FDS_PER_USER fds +
++ * KDBUS_MSG_MAX_MEMFD_ITEMS memfds
++ */
++ ret = send_fds_memfds(conn_src, conn_dst->id,
++ fds, KDBUS_CONN_MAX_FDS_PER_USER,
++ memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(conn_dst, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* Check we got the right number of fds */
++ nfds = kdbus_item_get_nfds(msg);
++ ASSERT_RETURN(nfds == KDBUS_CONN_MAX_FDS_PER_USER +
++ KDBUS_MSG_MAX_MEMFD_ITEMS);
++
++ kdbus_msg_free(msg);
++
++
++ /*
++ * Re-send fds + memfds, close them, but do not receive them
++ * and try to queue more
++ */
++ ret = send_fds_memfds(conn_src, conn_dst->id,
++ fds, KDBUS_CONN_MAX_FDS_PER_USER,
++ memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
++ ASSERT_RETURN(ret == 0);
++
++ /* close the old references and get new ones */
++ for (i = 0; i < KDBUS_CONN_MAX_FDS_PER_USER + 1; i++) {
++ close(fds[i]);
++ fds[i] = open("/dev/null", O_RDWR|O_CLOEXEC);
++ ASSERT_RETURN_VAL(fds[i] >= 0, -errno);
++ }
++
++ /* should fail since we already have fds in the queue */
++ ret = send_fds(conn_src, conn_dst->id, fds,
++ KDBUS_CONN_MAX_FDS_PER_USER);
++ ASSERT_RETURN(ret == -EMFILE);
++
++ /* This should succeed */
++ ret = send_memfds(conn_src, conn_dst->id,
++ memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(conn_dst, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ nfds = kdbus_item_get_nfds(msg);
++ ASSERT_RETURN(nfds == KDBUS_CONN_MAX_FDS_PER_USER +
++ KDBUS_MSG_MAX_MEMFD_ITEMS);
++
++ kdbus_msg_free(msg);
++
++ ret = kdbus_msg_recv(conn_dst, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ nfds = kdbus_item_get_nfds(msg);
++ ASSERT_RETURN(nfds == KDBUS_MSG_MAX_MEMFD_ITEMS);
++
++ kdbus_msg_free(msg);
++
++ ret = kdbus_msg_recv(conn_dst, NULL, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ for (i = 0; i < KDBUS_CONN_MAX_FDS_PER_USER + 1; i++)
++ close(fds[i]);
++
++ for (i = 0; i < KDBUS_MSG_MAX_ITEMS + 1; i++)
++ close(memfds[i]);
++
++ return 0;
++}
++
++int kdbus_test_fd_passing(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *conn_src, *conn_dst;
++ const char *str = "stackenblocken";
++ const struct kdbus_item *item;
++ struct kdbus_msg *msg;
++ unsigned int i;
++ uint64_t now;
++ int fds_conn[2];
++ int sock_pair[2];
++ int fds[2];
++ int memfd;
++ int ret;
++
++ now = (uint64_t) time(NULL);
++
++ /* create two connections */
++ conn_src = kdbus_hello(env->buspath, 0, NULL, 0);
++ conn_dst = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn_src && conn_dst);
++
++ fds_conn[0] = conn_src->fd;
++ fds_conn[1] = conn_dst->fd;
++
++ ret = socketpair(AF_UNIX, SOCK_STREAM, 0, sock_pair);
++ ASSERT_RETURN(ret == 0);
++
++ /* Setup memfd */
++ memfd = memfd_write("memfd-name", &now, sizeof(now));
++ ASSERT_RETURN(memfd >= 0);
++
++ /* Setup pipes */
++ ret = pipe(fds);
++ ASSERT_RETURN(ret == 0);
++
++ i = write(fds[1], str, strlen(str));
++ ASSERT_RETURN(i == strlen(str));
++
++ /*
++ * Try to pass the handle of a connection as message payload.
++ * This must fail.
++ */
++ ret = send_fds(conn_src, conn_dst->id, fds_conn, 2);
++ ASSERT_RETURN(ret == -ENOTSUP);
++
++ ret = send_fds(conn_dst, conn_src->id, fds_conn, 2);
++ ASSERT_RETURN(ret == -ENOTSUP);
++
++ ret = send_fds(conn_src, conn_dst->id, sock_pair, 2);
++ ASSERT_RETURN(ret == -ENOTSUP);
++
++ /*
++ * Send fds and memfds to a connection that does not accept fds
++ */
++ ret = kdbus_test_no_fds(env, fds, (int *)&memfd);
++ ASSERT_RETURN(ret == 0);
++
++ /* Try to broadcast file descriptors. This must fail. */
++ ret = send_fds(conn_src, KDBUS_DST_ID_BROADCAST, fds, 1);
++ ASSERT_RETURN(ret == -ENOTUNIQ);
++
++ /* Try to broadcast memfd. This must succeed. */
++ ret = send_memfds(conn_src, KDBUS_DST_ID_BROADCAST, (int *)&memfd, 1);
++ ASSERT_RETURN(ret == 0);
++
++ /* Open code this loop */
++loop_send_fds:
++
++ /*
++ * Send the read end of the pipe and close it.
++ */
++ ret = send_fds(conn_src, conn_dst->id, fds, 1);
++ ASSERT_RETURN(ret == 0);
++ close(fds[0]);
++
++ ret = kdbus_msg_recv(conn_dst, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ KDBUS_ITEM_FOREACH(item, msg, items) {
++ if (item->type == KDBUS_ITEM_FDS) {
++ char tmp[14];
++ int nfds = (item->size - KDBUS_ITEM_HEADER_SIZE) /
++ sizeof(int);
++ ASSERT_RETURN(nfds == 1);
++
++ i = read(item->fds[0], tmp, sizeof(tmp));
++ if (i != 0) {
++ ASSERT_RETURN(i == sizeof(tmp));
++ ASSERT_RETURN(memcmp(tmp, str, sizeof(tmp)) == 0);
++
++ /* Write EOF */
++ close(fds[1]);
++
++ /*
++ * Resend the read end of the pipe,
++ * the receiver still holds a reference
++ * to it...
++ */
++ goto loop_send_fds;
++ }
++
++ /* Got EOF */
++
++ /*
++ * Close the last reference to the read end
++ * of the pipe, other references are
++ * automatically closed just after send.
++ */
++ close(item->fds[0]);
++ }
++ }
++
++ /*
++ * Try to resend the read end of the pipe. Must fail with
++ * -EBADF since both the sender and receiver closed their
++ * references to it. We assume the above since sender and
++ * receiver are in the same process.
++ */
++ ret = send_fds(conn_src, conn_dst->id, fds, 1);
++ ASSERT_RETURN(ret == -EBADF);
++
++ /* Then we clear out any received data... */
++ kdbus_msg_free(msg);
++
++ ret = kdbus_send_multiple_fds(conn_src, conn_dst);
++ ASSERT_RETURN(ret == 0);
++
++ close(sock_pair[0]);
++ close(sock_pair[1]);
++ close(memfd);
++
++ kdbus_conn_free(conn_src);
++ kdbus_conn_free(conn_dst);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-free.c b/tools/testing/selftests/kdbus/test-free.c
+new file mode 100644
+index 0000000..f666da3
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-free.c
+@@ -0,0 +1,64 @@
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++static int sample_ioctl_call(struct kdbus_test_env *env)
++{
++ int ret;
++ struct kdbus_cmd_list cmd_list = {
++ .flags = KDBUS_LIST_QUEUED,
++ .size = sizeof(cmd_list),
++ };
++
++ ret = kdbus_cmd_list(env->conn->fd, &cmd_list);
++ ASSERT_RETURN(ret == 0);
++
++ /* DON'T FREE THIS SLICE OF MEMORY! */
++
++ return TEST_OK;
++}
++
++int kdbus_test_free(struct kdbus_test_env *env)
++{
++ int ret;
++ struct kdbus_cmd_free cmd_free = {};
++
++ /* free an unallocated buffer */
++ cmd_free.size = sizeof(cmd_free);
++ cmd_free.flags = 0;
++ cmd_free.offset = 0;
++ ret = kdbus_cmd_free(env->conn->fd, &cmd_free);
++ ASSERT_RETURN(ret == -ENXIO);
++
++ /* free a buffer out of the pool's bounds */
++ cmd_free.size = sizeof(cmd_free);
++ cmd_free.offset = POOL_SIZE + 1;
++ ret = kdbus_cmd_free(env->conn->fd, &cmd_free);
++ ASSERT_RETURN(ret == -ENXIO);
++
++ /*
++ * The user application is responsible for freeing the allocated
++ * memory with the KDBUS_CMD_FREE ioctl, so let's test what happens
++ * if we forget about it.
++ */
++
++ ret = sample_ioctl_call(env);
++ ASSERT_RETURN(ret == 0);
++
++ ret = sample_ioctl_call(env);
++ ASSERT_RETURN(ret == 0);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-match.c b/tools/testing/selftests/kdbus/test-match.c
+new file mode 100644
+index 0000000..2360dc1
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-match.c
+@@ -0,0 +1,441 @@
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++int kdbus_test_match_id_add(struct kdbus_test_env *env)
++{
++ struct {
++ struct kdbus_cmd_match cmd;
++ struct {
++ uint64_t size;
++ uint64_t type;
++ struct kdbus_notify_id_change chg;
++ } item;
++ } buf;
++ struct kdbus_conn *conn;
++ struct kdbus_msg *msg;
++ int ret;
++
++ memset(&buf, 0, sizeof(buf));
++
++ buf.cmd.size = sizeof(buf);
++ buf.cmd.cookie = 0xdeafbeefdeaddead;
++ buf.item.size = sizeof(buf.item);
++ buf.item.type = KDBUS_ITEM_ID_ADD;
++ buf.item.chg.id = KDBUS_MATCH_ID_ANY;
++
++ /* match on id add */
++ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++ ASSERT_RETURN(ret == 0);
++
++ /* create 2nd connection */
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn != NULL);
++
++ /* 1st connection should have received a notification */
++ ret = kdbus_msg_recv(env->conn, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_ADD);
++ ASSERT_RETURN(msg->items[0].id_change.id == conn->id);
++
++ kdbus_conn_free(conn);
++
++ return TEST_OK;
++}
++
++int kdbus_test_match_id_remove(struct kdbus_test_env *env)
++{
++ struct {
++ struct kdbus_cmd_match cmd;
++ struct {
++ uint64_t size;
++ uint64_t type;
++ struct kdbus_notify_id_change chg;
++ } item;
++ } buf;
++ struct kdbus_conn *conn;
++ struct kdbus_msg *msg;
++ size_t id;
++ int ret;
++
++ /* create 2nd connection */
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn != NULL);
++ id = conn->id;
++
++ memset(&buf, 0, sizeof(buf));
++ buf.cmd.size = sizeof(buf);
++ buf.cmd.cookie = 0xdeafbeefdeaddead;
++ buf.item.size = sizeof(buf.item);
++ buf.item.type = KDBUS_ITEM_ID_REMOVE;
++ buf.item.chg.id = id;
++
++ /* register match on 2nd connection */
++ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++ ASSERT_RETURN(ret == 0);
++
++ /* remove 2nd connection again */
++ kdbus_conn_free(conn);
++
++ /* 1st connection should have received a notification */
++ ret = kdbus_msg_recv(env->conn, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_REMOVE);
++ ASSERT_RETURN(msg->items[0].id_change.id == id);
++
++ return TEST_OK;
++}
++
++int kdbus_test_match_replace(struct kdbus_test_env *env)
++{
++ struct {
++ struct kdbus_cmd_match cmd;
++ struct {
++ uint64_t size;
++ uint64_t type;
++ struct kdbus_notify_id_change chg;
++ } item;
++ } buf;
++ struct kdbus_conn *conn;
++ struct kdbus_msg *msg;
++ size_t id;
++ int ret;
++
++ /* add a match to id_add */
++ ASSERT_RETURN(kdbus_test_match_id_add(env) == TEST_OK);
++
++ /* do a replace of the match from id_add to id_remove */
++ memset(&buf, 0, sizeof(buf));
++
++ buf.cmd.size = sizeof(buf);
++ buf.cmd.cookie = 0xdeafbeefdeaddead;
++ buf.cmd.flags = KDBUS_MATCH_REPLACE;
++ buf.item.size = sizeof(buf.item);
++ buf.item.type = KDBUS_ITEM_ID_REMOVE;
++ buf.item.chg.id = KDBUS_MATCH_ID_ANY;
++
++ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++ ASSERT_RETURN(ret == 0);
++ /* create 2nd connection */
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn != NULL);
++ id = conn->id;
++
++ /* 1st connection should _not_ have received a notification */
++ ret = kdbus_msg_recv(env->conn, &msg, NULL);
++ ASSERT_RETURN(ret != 0);
++
++ /* remove 2nd connection */
++ kdbus_conn_free(conn);
++
++ /* 1st connection should _now_ have received a notification */
++ ret = kdbus_msg_recv(env->conn, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_REMOVE);
++ ASSERT_RETURN(msg->items[0].id_change.id == id);
++
++ return TEST_OK;
++}
++
++int kdbus_test_match_name_add(struct kdbus_test_env *env)
++{
++ struct {
++ struct kdbus_cmd_match cmd;
++ struct {
++ uint64_t size;
++ uint64_t type;
++ struct kdbus_notify_name_change chg;
++ } item;
++ char name[64];
++ } buf;
++ struct kdbus_msg *msg;
++ char *name;
++ int ret;
++
++ name = "foo.bla.blaz";
++
++ /* install the match rule */
++ memset(&buf, 0, sizeof(buf));
++ buf.item.type = KDBUS_ITEM_NAME_ADD;
++ buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
++ buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
++ strncpy(buf.name, name, sizeof(buf.name) - 1);
++ buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
++ buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
++
++ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++ ASSERT_RETURN(ret == 0);
++
++ /* acquire the name */
++ ret = kdbus_name_acquire(env->conn, name, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* we should have received a notification */
++ ret = kdbus_msg_recv(env->conn, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_ADD);
++ ASSERT_RETURN(msg->items[0].name_change.old_id.id == 0);
++ ASSERT_RETURN(msg->items[0].name_change.new_id.id == env->conn->id);
++ ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
++
++ return TEST_OK;
++}
++
++int kdbus_test_match_name_remove(struct kdbus_test_env *env)
++{
++ struct {
++ struct kdbus_cmd_match cmd;
++ struct {
++ uint64_t size;
++ uint64_t type;
++ struct kdbus_notify_name_change chg;
++ } item;
++ char name[64];
++ } buf;
++ struct kdbus_msg *msg;
++ char *name;
++ int ret;
++
++ name = "foo.bla.blaz";
++
++ /* acquire the name */
++ ret = kdbus_name_acquire(env->conn, name, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* install the match rule */
++ memset(&buf, 0, sizeof(buf));
++ buf.item.type = KDBUS_ITEM_NAME_REMOVE;
++ buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
++ buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
++ strncpy(buf.name, name, sizeof(buf.name) - 1);
++ buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
++ buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
++
++ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++ ASSERT_RETURN(ret == 0);
++
++ /* release the name again */
++ ret = kdbus_name_release(env->conn, name);
++ ASSERT_RETURN(ret == 0);
++
++ /* we should have received a notification */
++ ret = kdbus_msg_recv(env->conn, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_REMOVE);
++ ASSERT_RETURN(msg->items[0].name_change.old_id.id == env->conn->id);
++ ASSERT_RETURN(msg->items[0].name_change.new_id.id == 0);
++ ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
++
++ return TEST_OK;
++}
++
++int kdbus_test_match_name_change(struct kdbus_test_env *env)
++{
++ struct {
++ struct kdbus_cmd_match cmd;
++ struct {
++ uint64_t size;
++ uint64_t type;
++ struct kdbus_notify_name_change chg;
++ } item;
++ char name[64];
++ } buf;
++ struct kdbus_conn *conn;
++ struct kdbus_msg *msg;
++ uint64_t flags;
++ char *name = "foo.bla.baz";
++ int ret;
++
++ /* acquire the name */
++ ret = kdbus_name_acquire(env->conn, name, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* install the match rule */
++ memset(&buf, 0, sizeof(buf));
++ buf.item.type = KDBUS_ITEM_NAME_CHANGE;
++ buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
++ buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
++ strncpy(buf.name, name, sizeof(buf.name) - 1);
++ buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
++ buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
++
++ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++ ASSERT_RETURN(ret == 0);
++
++ /* create a 2nd connection */
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn != NULL);
++
++ /* allow the new connection to own the same name */
++ /* queue the 2nd connection as waiting owner */
++ flags = KDBUS_NAME_QUEUE;
++ ret = kdbus_name_acquire(conn, name, &flags);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(flags & KDBUS_NAME_IN_QUEUE);
++
++ /* release name from 1st connection */
++ ret = kdbus_name_release(env->conn, name);
++ ASSERT_RETURN(ret == 0);
++
++ /* we should have received a notification */
++ ret = kdbus_msg_recv(env->conn, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_CHANGE);
++ ASSERT_RETURN(msg->items[0].name_change.old_id.id == env->conn->id);
++ ASSERT_RETURN(msg->items[0].name_change.new_id.id == conn->id);
++ ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
++
++ kdbus_conn_free(conn);
++
++ return TEST_OK;
++}
++
++static int send_bloom_filter(const struct kdbus_conn *conn,
++ uint64_t cookie,
++ const uint8_t *filter,
++ size_t filter_size,
++ uint64_t filter_generation)
++{
++ struct kdbus_cmd_send cmd = {};
++ struct kdbus_msg *msg;
++ struct kdbus_item *item;
++ uint64_t size;
++ int ret;
++
++ size = sizeof(struct kdbus_msg);
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + filter_size;
++
++ msg = alloca(size);
++
++ memset(msg, 0, size);
++ msg->size = size;
++ msg->src_id = conn->id;
++ msg->dst_id = KDBUS_DST_ID_BROADCAST;
++ msg->flags = KDBUS_MSG_SIGNAL;
++ msg->payload_type = KDBUS_PAYLOAD_DBUS;
++ msg->cookie = cookie;
++
++ item = msg->items;
++ item->type = KDBUS_ITEM_BLOOM_FILTER;
++ item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) +
++ filter_size;
++
++ item->bloom_filter.generation = filter_generation;
++ memcpy(item->bloom_filter.data, filter, filter_size);
++
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)msg;
++
++ ret = kdbus_cmd_send(conn->fd, &cmd);
++ if (ret < 0) {
++ kdbus_printf("error sending message: %d (%m)\n", ret);
++ return ret;
++ }
++
++ return 0;
++}
++
++int kdbus_test_match_bloom(struct kdbus_test_env *env)
++{
++ struct {
++ struct kdbus_cmd_match cmd;
++ struct {
++ uint64_t size;
++ uint64_t type;
++ uint8_t data_gen0[64];
++ uint8_t data_gen1[64];
++ } item;
++ } buf;
++ struct kdbus_conn *conn;
++ struct kdbus_msg *msg;
++ uint64_t cookie = 0xf000f00f;
++ uint8_t filter[64];
++ int ret;
++
++ /* install the match rule */
++ memset(&buf, 0, sizeof(buf));
++ buf.cmd.size = sizeof(buf);
++
++ buf.item.size = sizeof(buf.item);
++ buf.item.type = KDBUS_ITEM_BLOOM_MASK;
++ buf.item.data_gen0[0] = 0x55;
++ buf.item.data_gen0[63] = 0x80;
++
++ buf.item.data_gen1[1] = 0xaa;
++ buf.item.data_gen1[9] = 0x02;
++
++ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
++ ASSERT_RETURN(ret == 0);
++
++ /* create a 2nd connection */
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn != NULL);
++
++ /* a message with a 0'ed out filter must not reach the other peer */
++ memset(filter, 0, sizeof(filter));
++ ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(env->conn, &msg, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ /* now set the filter to the connection's mask and expect success */
++ filter[0] = 0x55;
++ filter[63] = 0x80;
++ ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(env->conn, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->cookie == cookie);
++
++ /* broaden the filter and try again. this should also succeed. */
++ filter[0] = 0xff;
++ filter[8] = 0xff;
++ filter[63] = 0xff;
++ ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(env->conn, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->cookie == cookie);
++
++ /* the same filter must not match against bloom generation 1 */
++ ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(env->conn, &msg, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ /* set a different filter and try again */
++ filter[1] = 0xaa;
++ filter[9] = 0x02;
++ ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(env->conn, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->cookie == cookie);
++
++ kdbus_conn_free(conn);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-message.c b/tools/testing/selftests/kdbus/test-message.c
+new file mode 100644
+index 0000000..563dc85
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-message.c
+@@ -0,0 +1,734 @@
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <time.h>
++#include <stdbool.h>
++#include <sys/eventfd.h>
++#include <sys/types.h>
++#include <sys/wait.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++/* maximum number of queued messages from the same individual user */
++#define KDBUS_CONN_MAX_MSGS 256
++
++/* maximum number of queued requests waiting for a reply */
++#define KDBUS_CONN_MAX_REQUESTS_PENDING 128
++
++/* maximum message payload size */
++#define KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE (2 * 1024UL * 1024UL)
++
++int kdbus_test_message_basic(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *conn;
++ struct kdbus_conn *sender;
++ struct kdbus_msg *msg;
++ uint64_t cookie = 0x1234abcd5678eeff;
++ uint64_t offset;
++ int ret;
++
++ sender = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(sender != NULL);
++
++ /* create a 2nd connection */
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn != NULL);
++
++ ret = kdbus_add_match_empty(conn);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_add_match_empty(sender);
++ ASSERT_RETURN(ret == 0);
++
++ /* send over 1st connection */
++ ret = kdbus_msg_send(sender, NULL, cookie, 0, 0, 0,
++ KDBUS_DST_ID_BROADCAST);
++ ASSERT_RETURN(ret == 0);
++
++ /* Make sure that we do get our own broadcasts */
++ ret = kdbus_msg_recv(sender, &msg, &offset);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->cookie == cookie);
++
++ kdbus_msg_free(msg);
++
++ /* ... and receive on the 2nd */
++ ret = kdbus_msg_recv_poll(conn, 100, &msg, &offset);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->cookie == cookie);
++
++ kdbus_msg_free(msg);
++
++ /* Msgs that expect a reply must have timeout and cookie */
++ ret = kdbus_msg_send(sender, NULL, 0, KDBUS_MSG_EXPECT_REPLY,
++ 0, 0, conn->id);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ /* Faked replies with a valid reply cookie are rejected */
++ ret = kdbus_msg_send_reply(conn, time(NULL) ^ cookie, sender->id);
++ ASSERT_RETURN(ret == -EBADSLT);
++
++ ret = kdbus_free(conn, offset);
++ ASSERT_RETURN(ret == 0);
++
++ kdbus_conn_free(sender);
++ kdbus_conn_free(conn);
++
++ return TEST_OK;
++}
++
++static int msg_recv_prio(struct kdbus_conn *conn,
++ int64_t requested_prio,
++ int64_t expected_prio)
++{
++ struct kdbus_cmd_recv recv = {
++ .size = sizeof(recv),
++ .flags = KDBUS_RECV_USE_PRIORITY,
++ .priority = requested_prio,
++ };
++ struct kdbus_msg *msg;
++ int ret;
++
++ ret = kdbus_cmd_recv(conn->fd, &recv);
++ if (ret < 0) {
++ kdbus_printf("error receiving message: %d (%m)\n", -errno);
++ return ret;
++ }
++
++ msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++ kdbus_msg_dump(conn, msg);
++
++ if (msg->priority != expected_prio) {
++ kdbus_printf("expected message prio %lld, got %lld\n",
++ (long long) expected_prio,
++ (long long) msg->priority);
++ return -EINVAL;
++ }
++
++ kdbus_msg_free(msg);
++ ret = kdbus_free(conn, recv.msg.offset);
++ if (ret < 0)
++ return ret;
++
++ return 0;
++}
++
++int kdbus_test_message_prio(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *a, *b;
++ uint64_t cookie = 0;
++
++ a = kdbus_hello(env->buspath, 0, NULL, 0);
++ b = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(a && b);
++
++ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, 25, a->id) == 0);
++ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -600, a->id) == 0);
++ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, 10, a->id) == 0);
++ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -35, a->id) == 0);
++ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -100, a->id) == 0);
++ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, 20, a->id) == 0);
++ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -15, a->id) == 0);
++ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -800, a->id) == 0);
++ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -150, a->id) == 0);
++ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, 10, a->id) == 0);
++ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -800, a->id) == 0);
++ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -10, a->id) == 0);
++
++ ASSERT_RETURN(msg_recv_prio(a, -200, -800) == 0);
++ ASSERT_RETURN(msg_recv_prio(a, -100, -800) == 0);
++ ASSERT_RETURN(msg_recv_prio(a, -400, -600) == 0);
++ ASSERT_RETURN(msg_recv_prio(a, -400, -600) == -EAGAIN);
++ ASSERT_RETURN(msg_recv_prio(a, 10, -150) == 0);
++ ASSERT_RETURN(msg_recv_prio(a, 10, -100) == 0);
++
++ kdbus_printf("--- get priority (all)\n");
++ ASSERT_RETURN(kdbus_msg_recv(a, NULL, NULL) == 0);
++
++ kdbus_conn_free(a);
++ kdbus_conn_free(b);
++
++ return TEST_OK;
++}
++
++static int kdbus_test_notify_kernel_quota(struct kdbus_test_env *env)
++{
++ int ret;
++ unsigned int i;
++ struct kdbus_conn *conn;
++ struct kdbus_conn *reader;
++ struct kdbus_msg *msg = NULL;
++ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
++
++ reader = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(reader);
++
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn);
++
++ /* Register for ID signals */
++ ret = kdbus_add_match_id(reader, 0x1, KDBUS_ITEM_ID_ADD,
++ KDBUS_MATCH_ID_ANY);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_add_match_id(reader, 0x2, KDBUS_ITEM_ID_REMOVE,
++ KDBUS_MATCH_ID_ANY);
++ ASSERT_RETURN(ret == 0);
++
++ /* Each iteration two notifications: add and remove ID */
++ for (i = 0; i < KDBUS_CONN_MAX_MSGS / 2; i++) {
++ struct kdbus_conn *notifier;
++
++ notifier = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(notifier);
++
++ kdbus_conn_free(notifier);
++ }
++
++ /*
++ * Now the reader queue is full with kernel notifications,
++ * but as a user we still have room to push our messages.
++ */
++ ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0, 0, reader->id);
++ ASSERT_RETURN(ret == 0);
++
++ /* More ID kernel notifications that will be lost */
++ kdbus_conn_free(conn);
++
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn);
++
++ kdbus_conn_free(conn);
++
++ /*
++ * We lost only 3 packets, since only signal msgs are accounted: the
++ * connection ID add/remove notifications sent after the queue filled up.
++ */
++ ret = kdbus_cmd_recv(reader->fd, &recv);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(recv.return_flags & KDBUS_RECV_RETURN_DROPPED_MSGS);
++ ASSERT_RETURN(recv.dropped_msgs == 3);
++
++ msg = (struct kdbus_msg *)(reader->buf + recv.msg.offset);
++ kdbus_msg_free(msg);
++
++ /* Read our queue */
++ for (i = 0; i < KDBUS_CONN_MAX_MSGS - 1; i++) {
++ memset(&recv, 0, sizeof(recv));
++ recv.size = sizeof(recv);
++
++ ret = kdbus_cmd_recv(reader->fd, &recv);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(!(recv.return_flags &
++ KDBUS_RECV_RETURN_DROPPED_MSGS));
++
++ msg = (struct kdbus_msg *)(reader->buf + recv.msg.offset);
++ kdbus_msg_free(msg);
++ }
++
++ ret = kdbus_msg_recv(reader, NULL, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(reader, NULL, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ kdbus_conn_free(reader);
++
++ return 0;
++}
++
++/* Return the number of messages successfully sent */
++static int kdbus_fill_conn_queue(struct kdbus_conn *conn_src,
++ uint64_t dst_id,
++ unsigned int max_msgs)
++{
++ unsigned int i;
++ uint64_t cookie = 0;
++ size_t size;
++ struct kdbus_cmd_send cmd = {};
++ struct kdbus_msg *msg;
++ int ret;
++
++ size = sizeof(struct kdbus_msg);
++ msg = malloc(size);
++ ASSERT_RETURN_VAL(msg, -ENOMEM);
++
++ memset(msg, 0, size);
++ msg->size = size;
++ msg->src_id = conn_src->id;
++ msg->dst_id = dst_id;
++ msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)msg;
++
++ for (i = 0; i < max_msgs; i++) {
++ msg->cookie = cookie++;
++ ret = kdbus_cmd_send(conn_src->fd, &cmd);
++ if (ret < 0)
++ break;
++ }
++
++ free(msg);
++
++ return i;
++}
++
++static int kdbus_test_activator_quota(struct kdbus_test_env *env)
++{
++ int ret;
++ unsigned int i;
++ unsigned int activator_msgs_count = 0;
++ uint64_t cookie = time(NULL);
++ struct kdbus_conn *conn;
++ struct kdbus_conn *sender;
++ struct kdbus_conn *activator;
++ struct kdbus_msg *msg;
++ uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
++ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
++ struct kdbus_policy_access access = {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = geteuid(),
++ .access = KDBUS_POLICY_OWN,
++ };
++
++ activator = kdbus_hello_activator(env->buspath, "foo.test.activator",
++ &access, 1);
++ ASSERT_RETURN(activator);
++
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ sender = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn && sender);
++
++ ret = kdbus_list(sender, KDBUS_LIST_NAMES |
++ KDBUS_LIST_UNIQUE |
++ KDBUS_LIST_ACTIVATORS |
++ KDBUS_LIST_QUEUED);
++ ASSERT_RETURN(ret == 0);
++
++ for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
++ ret = kdbus_msg_send(sender, "foo.test.activator",
++ cookie++, 0, 0, 0,
++ KDBUS_DST_ID_NAME);
++ if (ret < 0)
++ break;
++ activator_msgs_count++;
++ }
++
++ /* we must have at least sent one message */
++ ASSERT_RETURN_VAL(i > 0, -errno);
++ ASSERT_RETURN(ret == -ENOBUFS);
++
++ /* Good, activator queue is full now */
++
++ /* ENXIO on direct send (activators can never be addressed by ID) */
++ ret = kdbus_msg_send(conn, NULL, cookie++, 0, 0, 0, activator->id);
++ ASSERT_RETURN(ret == -ENXIO);
++
++ /* can't queue more */
++ ret = kdbus_msg_send(conn, "foo.test.activator", cookie++,
++ 0, 0, 0, KDBUS_DST_ID_NAME);
++ ASSERT_RETURN(ret == -ENOBUFS);
++
++ /* no match installed, so the broadcast will not inc dropped_msgs */
++ ret = kdbus_msg_send(sender, NULL, cookie++, 0, 0, 0,
++ KDBUS_DST_ID_BROADCAST);
++ ASSERT_RETURN(ret == 0);
++
++ /* Check activator queue */
++ ret = kdbus_cmd_recv(activator->fd, &recv);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(recv.dropped_msgs == 0);
++
++ activator_msgs_count--;
++
++ msg = (struct kdbus_msg *)(activator->buf + recv.msg.offset);
++ kdbus_msg_free(msg);
++
++
++ /* Stage 1) of the test: check the pool memory quota */
++
++ /* Consume the connection pool memory */
++ for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
++ ret = kdbus_msg_send(sender, NULL,
++ cookie++, 0, 0, 0, conn->id);
++ if (ret < 0)
++ break;
++ }
++
++ /* consume one message, so later at least one can be moved */
++ memset(&recv, 0, sizeof(recv));
++ recv.size = sizeof(recv);
++ ret = kdbus_cmd_recv(conn->fd, &recv);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(recv.dropped_msgs == 0);
++ msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++ kdbus_msg_free(msg);
++
++ /* Try to acquire the name now */
++ ret = kdbus_name_acquire(conn, "foo.test.activator", &flags);
++ ASSERT_RETURN(ret == 0);
++
++ /* try to read messages and see if we have lost some */
++ memset(&recv, 0, sizeof(recv));
++ recv.size = sizeof(recv);
++ ret = kdbus_cmd_recv(conn->fd, &recv);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(recv.dropped_msgs != 0);
++
++ /* number of dropped msgs < received ones (at least one was moved) */
++ ASSERT_RETURN(recv.dropped_msgs < activator_msgs_count);
++
++ /* Deduct the number of dropped msgs from the activator msgs */
++ activator_msgs_count -= recv.dropped_msgs;
++
++ msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++ kdbus_msg_free(msg);
++
++ /*
++ * Release the name and hand it back to activator, now
++ * we should have 'activator_msgs_count' msgs again in
++ * the activator queue
++ */
++ ret = kdbus_name_release(conn, "foo.test.activator");
++ ASSERT_RETURN(ret == 0);
++
++ /* make sure that we got our previous activator msgs */
++ ret = kdbus_msg_recv(activator, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->src_id == sender->id);
++
++ activator_msgs_count--;
++
++ kdbus_msg_free(msg);
++
++
++ /* Stage 2) of the test: check the max message quota */
++
++ /* Empty conn queue */
++ for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
++ ret = kdbus_msg_recv(conn, NULL, NULL);
++ if (ret == -EAGAIN)
++ break;
++ }
++
++ /* fill queue with max msgs quota */
++ ret = kdbus_fill_conn_queue(sender, conn->id, KDBUS_CONN_MAX_MSGS);
++ ASSERT_RETURN(ret == KDBUS_CONN_MAX_MSGS);
++
++ /* This one is lost but it is not accounted */
++ ret = kdbus_msg_send(sender, NULL,
++ cookie++, 0, 0, 0, conn->id);
++ ASSERT_RETURN(ret == -ENOBUFS);
++
++ /* Acquire the name again */
++ ret = kdbus_name_acquire(conn, "foo.test.activator", &flags);
++ ASSERT_RETURN(ret == 0);
++
++ memset(&recv, 0, sizeof(recv));
++ recv.size = sizeof(recv);
++
++ /*
++ * Try to read messages and make sure that we have lost all
++ * the activator messages due to quota checks. Our queue is
++ * already full.
++ */
++ ret = kdbus_cmd_recv(conn->fd, &recv);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(recv.dropped_msgs == activator_msgs_count);
++
++ msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++ kdbus_msg_free(msg);
++
++ kdbus_conn_free(sender);
++ kdbus_conn_free(conn);
++ kdbus_conn_free(activator);
++
++ return 0;
++}
++
++static int kdbus_test_expected_reply_quota(struct kdbus_test_env *env)
++{
++ int ret;
++ unsigned int i, n;
++ unsigned int count;
++ uint64_t cookie = 0x1234abcd5678eeff;
++ struct kdbus_conn *conn;
++ struct kdbus_conn *connections[9];
++
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn);
++
++ for (i = 0; i < 9; i++) {
++ connections[i] = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(connections[i]);
++ }
++
++ count = 0;
++ /* Send 16 messages to 8 different connections */
++ for (i = 0; i < 8; i++) {
++ for (n = 0; n < 16; n++) {
++ ret = kdbus_msg_send(conn, NULL, cookie++,
++ KDBUS_MSG_EXPECT_REPLY,
++ 100000000ULL, 0,
++ connections[i]->id);
++ if (ret < 0)
++ break;
++
++ count++;
++ }
++ }
++
++ /*
++ * We should have queued exactly
++ * KDBUS_CONN_MAX_REQUESTS_PENDING method calls
++ */
++ ASSERT_RETURN(count == KDBUS_CONN_MAX_REQUESTS_PENDING);
++
++ /*
++ * Now try to send a message to the last connection,
++ * if we have reached KDBUS_CONN_MAX_REQUESTS_PENDING
++ * no further requests are allowed
++ */
++ ret = kdbus_msg_send(conn, NULL, cookie++, KDBUS_MSG_EXPECT_REPLY,
++ 1000000000ULL, 0, connections[8]->id);
++ ASSERT_RETURN(ret == -EMLINK);
++
++ for (i = 0; i < 9; i++)
++ kdbus_conn_free(connections[i]);
++
++ kdbus_conn_free(conn);
++
++ return 0;
++}
++
++int kdbus_test_pool_quota(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *a, *b, *c;
++ struct kdbus_cmd_send cmd = {};
++ struct kdbus_item *item;
++ struct kdbus_msg *recv_msg;
++ struct kdbus_msg *msg;
++ uint64_t cookie = time(NULL);
++ uint64_t size;
++ unsigned int i;
++ char *payload;
++ int ret;
++
++ /* just a guard */
++ if (POOL_SIZE <= KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE ||
++ POOL_SIZE % KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE != 0)
++ return 0;
++
++ payload = calloc(KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE, sizeof(char));
++ ASSERT_RETURN_VAL(payload, -ENOMEM);
++
++ a = kdbus_hello(env->buspath, 0, NULL, 0);
++ b = kdbus_hello(env->buspath, 0, NULL, 0);
++ c = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(a && b && c);
++
++ size = sizeof(struct kdbus_msg);
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++ msg = malloc(size);
++ ASSERT_RETURN_VAL(msg, -ENOMEM);
++
++ memset(msg, 0, size);
++ msg->size = size;
++ msg->src_id = a->id;
++ msg->dst_id = c->id;
++ msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++ item = msg->items;
++ item->type = KDBUS_ITEM_PAYLOAD_VEC;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++ item->vec.address = (uintptr_t)payload;
++ item->vec.size = KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE;
++ item = KDBUS_ITEM_NEXT(item);
++
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)msg;
++
++ /*
++ * Send 2097248 bytes per message. A user is only allowed to get 33%
++ * of half of the free space of the pool; the already used space is
++ * accounted as free space.
++ */
++ size += KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE;
++ for (i = size; i < (POOL_SIZE / 2 / 3); i += size) {
++ msg->cookie = cookie++;
++
++ ret = kdbus_cmd_send(a->fd, &cmd);
++ ASSERT_RETURN_VAL(ret == 0, ret);
++ }
++
++ /* Try to get more than 33% */
++ msg->cookie = cookie++;
++ ret = kdbus_cmd_send(a->fd, &cmd);
++ ASSERT_RETURN(ret == -ENOBUFS);
++
++ /* We still can pass small messages */
++ ret = kdbus_msg_send(b, NULL, cookie++, 0, 0, 0, c->id);
++ ASSERT_RETURN(ret == 0);
++
++ for (i = size; i < (POOL_SIZE / 2 / 3); i += size) {
++ ret = kdbus_msg_recv(c, &recv_msg, NULL);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(recv_msg->src_id == a->id);
++
++ kdbus_msg_free(recv_msg);
++ }
++
++ ret = kdbus_msg_recv(c, &recv_msg, NULL);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(recv_msg->src_id == b->id);
++
++ kdbus_msg_free(recv_msg);
++
++ ret = kdbus_msg_recv(c, NULL, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ free(msg);
++ free(payload);
++
++ kdbus_conn_free(c);
++ kdbus_conn_free(b);
++ kdbus_conn_free(a);
++
++ return 0;
++}
++
++int kdbus_test_message_quota(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *a, *b;
++ uint64_t cookie = 0;
++ int ret;
++ int i;
++
++ ret = kdbus_test_activator_quota(env);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_test_notify_kernel_quota(env);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_test_pool_quota(env);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_test_expected_reply_quota(env);
++ ASSERT_RETURN(ret == 0);
++
++ a = kdbus_hello(env->buspath, 0, NULL, 0);
++ b = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(a && b);
++ ret = kdbus_fill_conn_queue(b, a->id, KDBUS_CONN_MAX_MSGS);
++ ASSERT_RETURN(ret == KDBUS_CONN_MAX_MSGS);
++
++ ret = kdbus_msg_send(b, NULL, ++cookie, 0, 0, 0, a->id);
++ ASSERT_RETURN(ret == -ENOBUFS);
++
++ for (i = 0; i < KDBUS_CONN_MAX_MSGS; ++i) {
++ ret = kdbus_msg_recv(a, NULL, NULL);
++ ASSERT_RETURN(ret == 0);
++ }
++
++ ret = kdbus_msg_recv(a, NULL, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ ret = kdbus_fill_conn_queue(b, a->id, KDBUS_CONN_MAX_MSGS + 1);
++ ASSERT_RETURN(ret == KDBUS_CONN_MAX_MSGS);
++
++ ret = kdbus_msg_send(b, NULL, ++cookie, 0, 0, 0, a->id);
++ ASSERT_RETURN(ret == -ENOBUFS);
++
++ kdbus_conn_free(a);
++ kdbus_conn_free(b);
++
++ return TEST_OK;
++}
++
++int kdbus_test_memory_access(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *a, *b;
++ struct kdbus_cmd_send cmd = {};
++ struct kdbus_item *item;
++ struct kdbus_msg *msg;
++ uint64_t test_addr = 0;
++ char line[256];
++ uint64_t size;
++ FILE *f;
++ int ret;
++
++ /*
++ * Search in /proc/kallsyms for the address of a kernel symbol that
++ * should always be there, regardless of the config. Use that address
++ * in a PAYLOAD_VEC item and make sure it's inaccessible.
++ */
++
++ f = fopen("/proc/kallsyms", "r");
++ if (!f)
++ return TEST_SKIP;
++
++ while (fgets(line, sizeof(line), f)) {
++ char *s = line;
++
++ if (!strsep(&s, " "))
++ continue;
++
++ if (!strsep(&s, " "))
++ continue;
++
++ if (!strncmp(s, "mutex_lock", 10)) {
++ test_addr = strtoull(line, NULL, 16);
++ break;
++ }
++ }
++
++ fclose(f);
++
++ if (!test_addr)
++ return TEST_SKIP;
++
++ a = kdbus_hello(env->buspath, 0, NULL, 0);
++ b = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(a && b);
++
++ size = sizeof(struct kdbus_msg);
++ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
++
++ msg = alloca(size);
++ ASSERT_RETURN_VAL(msg, -ENOMEM);
++
++ memset(msg, 0, size);
++ msg->size = size;
++ msg->src_id = a->id;
++ msg->dst_id = b->id;
++ msg->payload_type = KDBUS_PAYLOAD_DBUS;
++
++ item = msg->items;
++ item->type = KDBUS_ITEM_PAYLOAD_VEC;
++ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
++ item->vec.address = test_addr;
++ item->vec.size = sizeof(void*);
++ item = KDBUS_ITEM_NEXT(item);
++
++ cmd.size = sizeof(cmd);
++ cmd.msg_address = (uintptr_t)msg;
++
++ ret = kdbus_cmd_send(a->fd, &cmd);
++ ASSERT_RETURN(ret == -EFAULT);
++
++ kdbus_conn_free(b);
++ kdbus_conn_free(a);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-metadata-ns.c b/tools/testing/selftests/kdbus/test-metadata-ns.c
+new file mode 100644
+index 0000000..1f6edc0
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-metadata-ns.c
+@@ -0,0 +1,500 @@
++/*
++ * Test metadata in new namespaces. Even if our tests can run
++ * in a namespaced setup, this test is necessary so we can inspect
++ * metadata on the same kdbusfs but between multiple namespaces
++ */
++
++#include <stdio.h>
++#include <string.h>
++#include <sched.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <signal.h>
++#include <sys/wait.h>
++#include <sys/prctl.h>
++#include <sys/eventfd.h>
++#include <sys/syscall.h>
++#include <sys/capability.h>
++#include <linux/sched.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++static const struct kdbus_creds privileged_creds = {};
++
++static const struct kdbus_creds unmapped_creds = {
++ .uid = UNPRIV_UID,
++ .euid = UNPRIV_UID,
++ .suid = UNPRIV_UID,
++ .fsuid = UNPRIV_UID,
++ .gid = UNPRIV_GID,
++ .egid = UNPRIV_GID,
++ .sgid = UNPRIV_GID,
++ .fsgid = UNPRIV_GID,
++};
++
++static const struct kdbus_pids unmapped_pids = {};
++
++/* Get only the first item */
++static struct kdbus_item *kdbus_get_item(struct kdbus_msg *msg,
++ uint64_t type)
++{
++ struct kdbus_item *item;
++
++ KDBUS_ITEM_FOREACH(item, msg, items)
++ if (item->type == type)
++ return item;
++
++ return NULL;
++}
++
++static int kdbus_match_kdbus_creds(struct kdbus_msg *msg,
++ const struct kdbus_creds *expected_creds)
++{
++ struct kdbus_item *item;
++
++ item = kdbus_get_item(msg, KDBUS_ITEM_CREDS);
++ ASSERT_RETURN(item);
++
++ ASSERT_RETURN(memcmp(&item->creds, expected_creds,
++ sizeof(struct kdbus_creds)) == 0);
++
++ return 0;
++}
++
++static int kdbus_match_kdbus_pids(struct kdbus_msg *msg,
++ const struct kdbus_pids *expected_pids)
++{
++ struct kdbus_item *item;
++
++ item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
++ ASSERT_RETURN(item);
++
++ ASSERT_RETURN(memcmp(&item->pids, expected_pids,
++ sizeof(struct kdbus_pids)) == 0);
++
++ return 0;
++}
++
++static int __kdbus_clone_userns_test(const char *bus,
++ struct kdbus_conn *conn,
++ uint64_t grandpa_pid,
++ int signal_fd)
++{
++ int clone_ret;
++ int ret;
++ struct kdbus_msg *msg = NULL;
++ const struct kdbus_item *item;
++ uint64_t cookie = time(NULL) ^ 0xdeadbeef;
++ struct kdbus_conn *unpriv_conn = NULL;
++ struct kdbus_pids parent_pids = {
++ .pid = getppid(),
++ .tid = getppid(),
++ .ppid = grandpa_pid,
++ };
++
++ ret = drop_privileges(UNPRIV_UID, UNPRIV_GID);
++ ASSERT_EXIT(ret == 0);
++
++ unpriv_conn = kdbus_hello(bus, 0, NULL, 0);
++ ASSERT_EXIT(unpriv_conn);
++
++ ret = kdbus_add_match_empty(unpriv_conn);
++ ASSERT_EXIT(ret == 0);
++
++ /*
++ * ping privileged connection from this new unprivileged
++ * one
++ */
++
++ ret = kdbus_msg_send(unpriv_conn, NULL, cookie, 0, 0,
++ 0, conn->id);
++ ASSERT_EXIT(ret == 0);
++
++ /*
++ * Since we just dropped privileges, the dumpable flag
++ * was cleared, which makes /proc/$clone_child/uid_map
++ * owned by root. Any userns uid mapping would then fail
++ * with -EPERM, since the mapping is done by uid 65534.
++ *
++ * To avoid this, set the dumpable flag again, which makes
++ * procfs update the owner of the /proc/$clone_child/ inodes to 65534.
++ *
++ * With this we are able to write to /proc/$clone_child/uid_map
++ * as uid 65534 and map uid 65534 to 0 inside the user namespace.
++ */
++ ret = prctl(PR_SET_DUMPABLE, SUID_DUMP_USER);
++ ASSERT_EXIT(ret == 0);
++
++ /* Make child privileged in its new userns and run tests */
++
++ ret = RUN_CLONE_CHILD(&clone_ret,
++ SIGCHLD | CLONE_NEWUSER | CLONE_NEWPID,
++ ({ 0; /* Clone setup, nothing */ }),
++ ({
++ eventfd_t event_status = 0;
++ struct kdbus_conn *userns_conn;
++
++ /* ping connection from the new user namespace */
++ userns_conn = kdbus_hello(bus, 0, NULL, 0);
++ ASSERT_EXIT(userns_conn);
++
++ ret = kdbus_add_match_empty(userns_conn);
++ ASSERT_EXIT(ret == 0);
++
++ cookie++;
++ ret = kdbus_msg_send(userns_conn, NULL, cookie,
++ 0, 0, 0, conn->id);
++ ASSERT_EXIT(ret == 0);
++
++ /* Parent did send */
++ ret = eventfd_read(signal_fd, &event_status);
++ ASSERT_RETURN(ret >= 0 && event_status == 1);
++
++ /*
++ * Receive from privileged connection
++ */
++ kdbus_printf("Privileged → unprivileged/privileged "
++ "in its userns "
++ "(different userns and pidns):\n");
++ ret = kdbus_msg_recv_poll(userns_conn, 300, &msg, NULL);
++ ASSERT_EXIT(ret == 0);
++ ASSERT_EXIT(msg->dst_id == userns_conn->id);
++
++ item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
++ ASSERT_EXIT(item);
++
++ /* uid/gid not mapped, so we have unpriv cached creds */
++ ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
++ ASSERT_EXIT(ret == 0);
++
++ /*
++ * Different pid namespaces. This is the child pidns
++ * so it should not see its parent kdbus_pids
++ */
++ ret = kdbus_match_kdbus_pids(msg, &unmapped_pids);
++ ASSERT_EXIT(ret == 0);
++
++ kdbus_msg_free(msg);
++
++
++ /*
++ * Receive broadcast from privileged connection
++ */
++ kdbus_printf("Privileged → unprivileged/privileged "
++ "in its userns "
++ "(different userns and pidns):\n");
++ ret = kdbus_msg_recv_poll(userns_conn, 300, &msg, NULL);
++ ASSERT_EXIT(ret == 0);
++ ASSERT_EXIT(msg->dst_id == KDBUS_DST_ID_BROADCAST);
++
++ item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
++ ASSERT_EXIT(item);
++
++ /* uid/gid not mapped, so we have unpriv cached creds */
++ ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
++ ASSERT_EXIT(ret == 0);
++
++ /*
++ * Different pid namespaces. This is the child pidns
++ * so it should not see its parent kdbus_pids
++ */
++ ret = kdbus_match_kdbus_pids(msg, &unmapped_pids);
++ ASSERT_EXIT(ret == 0);
++
++ kdbus_msg_free(msg);
++
++ kdbus_conn_free(userns_conn);
++ }),
++ ({
++ /* Parent setup map child uid/gid */
++ ret = userns_map_uid_gid(pid, "0 65534 1", "0 65534 1");
++ ASSERT_EXIT(ret == 0);
++ }),
++ ({ 0; }));
++ /* Unprivileged was not able to create user namespace */
++ if (clone_ret == -EPERM) {
++ kdbus_printf("-- CLONE_NEWUSER TEST Failed for "
++ "uid: %u\n -- Make sure that your kernel "
++ "does not allow CLONE_NEWUSER for "
++ "unprivileged users\n", UNPRIV_UID);
++ ret = 0;
++ goto out;
++ }
++
++ ASSERT_EXIT(ret == 0);
++
++
++ /*
++ * Receive from privileged connection
++ */
++ kdbus_printf("\nPrivileged → unprivileged (same namespaces):\n");
++ ret = kdbus_msg_recv_poll(unpriv_conn, 300, &msg, NULL);
++
++ ASSERT_EXIT(ret == 0);
++ ASSERT_EXIT(msg->dst_id == unpriv_conn->id);
++
++ /* will get the privileged creds */
++ ret = kdbus_match_kdbus_creds(msg, &privileged_creds);
++ ASSERT_EXIT(ret == 0);
++
++ /* Same pidns so will get the kdbus_pids */
++ ret = kdbus_match_kdbus_pids(msg, &parent_pids);
++ ASSERT_RETURN(ret == 0);
++
++ kdbus_msg_free(msg);
++
++
++ /*
++ * Receive broadcast from privileged connection
++ */
++ kdbus_printf("\nPrivileged → unprivileged (same namespaces):\n");
++ ret = kdbus_msg_recv_poll(unpriv_conn, 300, &msg, NULL);
++
++ ASSERT_EXIT(ret == 0);
++ ASSERT_EXIT(msg->dst_id == KDBUS_DST_ID_BROADCAST);
++
++ /* will get the privileged creds */
++ ret = kdbus_match_kdbus_creds(msg, &privileged_creds);
++ ASSERT_EXIT(ret == 0);
++
++ ret = kdbus_match_kdbus_pids(msg, &parent_pids);
++ ASSERT_RETURN(ret == 0);
++
++ kdbus_msg_free(msg);
++
++out:
++ kdbus_conn_free(unpriv_conn);
++
++ return ret;
++}
++
++static int kdbus_clone_userns_test(const char *bus,
++ struct kdbus_conn *conn)
++{
++ int ret, status, efd;
++ pid_t pid, ppid;
++ uint64_t unpriv_conn_id, userns_conn_id;
++ struct kdbus_msg *msg;
++ const struct kdbus_item *item;
++ struct kdbus_pids expected_pids;
++ struct kdbus_conn *monitor;
++
++ kdbus_printf("STARTING TEST 'metadata-ns'.\n");
++
++ monitor = kdbus_hello(bus, KDBUS_HELLO_MONITOR, NULL, 0);
++ ASSERT_EXIT(monitor);
++
++ /*
++ * parent will signal to child that is in its
++ * userns to read its queue
++ */
++ efd = eventfd(0, EFD_CLOEXEC);
++ ASSERT_RETURN_VAL(efd >= 0, efd);
++
++ ppid = getppid();
++
++ pid = fork();
++ ASSERT_RETURN_VAL(pid >= 0, -errno);
++
++ if (pid == 0) {
++ ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
++ ASSERT_EXIT_VAL(ret == 0, -errno);
++
++ ret = __kdbus_clone_userns_test(bus, conn, ppid, efd);
++ _exit(ret);
++ }
++
++
++ /* Phase 1) privileged receives from unprivileged */
++
++ /*
++ * Receive from the unprivileged child
++ */
++ kdbus_printf("\nUnprivileged → privileged (same namespaces):\n");
++ ret = kdbus_msg_recv_poll(conn, 300, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ unpriv_conn_id = msg->src_id;
++
++ /* Unprivileged user */
++ ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
++ ASSERT_RETURN(ret == 0);
++
++ /* Set the expected creds_pids */
++ expected_pids = (struct kdbus_pids) {
++ .pid = pid,
++ .tid = pid,
++ .ppid = getpid(),
++ };
++ ret = kdbus_match_kdbus_pids(msg, &expected_pids);
++ ASSERT_RETURN(ret == 0);
++
++ kdbus_msg_free(msg);
++
++
++ /*
++ * Receive from the unprivileged connection that is in its own
++ * userns and pidns
++ */
++
++ kdbus_printf("\nUnprivileged/privileged in its userns → privileged "
++ "(different userns and pidns)\n");
++ ret = kdbus_msg_recv_poll(conn, 300, &msg, NULL);
++ if (ret == -ETIMEDOUT)
++ /* perhaps unprivileged userns is not allowed */
++ goto wait;
++
++ ASSERT_RETURN(ret == 0);
++
++ userns_conn_id = msg->src_id;
++
++ item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
++ ASSERT_RETURN(item);
++
++ /*
++ * Compare received items, creds must be translated into
++ * the receiver user namespace, so the user is unprivileged
++ */
++ ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * We should have the kdbus_pids since we are the parent
++ * pidns
++ */
++ item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
++ ASSERT_RETURN(item);
++
++ ASSERT_RETURN(memcmp(&item->pids, &unmapped_pids,
++ sizeof(struct kdbus_pids)) != 0);
++
++ /*
++ * Parent pid of the unprivileged/privileged in its userns
++ * is the unprivileged child pid that was forked here.
++ */
++ ASSERT_RETURN((uint64_t)pid == item->pids.ppid);
++
++ kdbus_msg_free(msg);
++
++
++ /* Phase 2) Privileged connection now sends 3 packets */
++
++ /*
++ * Send a unicast to the unprivileged connection
++ */
++ ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
++ 0, unpriv_conn_id);
++ ASSERT_RETURN(ret == 0);
++
++ /* signal to child that is in its userns */
++ ret = eventfd_write(efd, 1);
++ ASSERT_EXIT(ret == 0);
++
++ /*
++ * Send a unicast to the connection that is
++ * unprivileged/privileged in its userns
++ */
++ ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
++ 0, userns_conn_id);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Send a broadcast to the unprivileged connections
++ */
++ ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
++ 0, KDBUS_DST_ID_BROADCAST);
++ ASSERT_RETURN(ret == 0);
++
++
++wait:
++ ret = waitpid(pid, &status, 0);
++ ASSERT_RETURN(ret >= 0);
++
++ ASSERT_RETURN(WIFEXITED(status));
++ ASSERT_RETURN(!WEXITSTATUS(status));
++
++ /* Dump monitor queue */
++ kdbus_printf("\n\nMonitor queue:\n");
++ for (;;) {
++ ret = kdbus_msg_recv_poll(monitor, 100, &msg, NULL);
++ if (ret < 0)
++ break;
++
++ if (msg->payload_type == KDBUS_PAYLOAD_DBUS) {
++ /*
++ * Parent pidns should see all the
++ * pids
++ */
++ item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
++ ASSERT_RETURN(item);
++
++ ASSERT_RETURN(item->pids.pid != 0 &&
++ item->pids.tid != 0 &&
++ item->pids.ppid != 0);
++ }
++
++ kdbus_msg_free(msg);
++ }
++
++ kdbus_conn_free(monitor);
++ close(efd);
++
++ return 0;
++}
++
++int kdbus_test_metadata_ns(struct kdbus_test_env *env)
++{
++ int ret;
++ struct kdbus_conn *holder, *conn;
++ struct kdbus_policy_access policy_access = {
++ /* Allow world so we can inspect metadata in namespace */
++ .type = KDBUS_POLICY_ACCESS_WORLD,
++ .id = geteuid(),
++ .access = KDBUS_POLICY_TALK,
++ };
++
++ /*
++ * We require user-namespaces and all uids/gids
++ * should be mapped (we can just require the necessary ones)
++ */
++ if (!config_user_ns_is_enabled() ||
++ !all_uids_gids_are_mapped())
++ return TEST_SKIP;
++
++ ret = test_is_capable(CAP_SETUID, CAP_SETGID, CAP_SYS_ADMIN, -1);
++ ASSERT_RETURN(ret >= 0);
++
++ /* not enough privileges, skip the test */
++ if (!ret)
++ return TEST_SKIP;
++
++ holder = kdbus_hello_registrar(env->buspath, "com.example.metadata",
++ &policy_access, 1,
++ KDBUS_HELLO_POLICY_HOLDER);
++ ASSERT_RETURN(holder);
++
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn);
++
++ ret = kdbus_add_match_empty(conn);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_name_acquire(conn, "com.example.metadata", NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ ret = kdbus_clone_userns_test(env->buspath, conn);
++ ASSERT_RETURN(ret == 0);
++
++ kdbus_conn_free(holder);
++ kdbus_conn_free(conn);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-monitor.c b/tools/testing/selftests/kdbus/test-monitor.c
+new file mode 100644
+index 0000000..e00d738
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-monitor.c
+@@ -0,0 +1,176 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <stdbool.h>
++#include <errno.h>
++#include <assert.h>
++#include <signal.h>
++#include <sys/time.h>
++#include <sys/mman.h>
++#include <sys/capability.h>
++#include <sys/wait.h>
++
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++int kdbus_test_monitor(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *monitor, *conn;
++ unsigned int cookie = 0xdeadbeef;
++ struct kdbus_msg *msg;
++ uint64_t offset = 0;
++ int ret;
++
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn);
++
++ /* add matches to make sure the monitor does not trigger an item add or
++ * remove on connect and disconnect, respectively.
++ */
++ ret = kdbus_add_match_id(conn, 0x1, KDBUS_ITEM_ID_ADD,
++ KDBUS_MATCH_ID_ANY);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_add_match_id(conn, 0x2, KDBUS_ITEM_ID_REMOVE,
++ KDBUS_MATCH_ID_ANY);
++ ASSERT_RETURN(ret == 0);
++
++ /* register a monitor */
++ monitor = kdbus_hello(env->buspath, KDBUS_HELLO_MONITOR, NULL, 0);
++ ASSERT_RETURN(monitor);
++
++ /* make sure we did not receive a monitor connect notification */
++ ret = kdbus_msg_recv(conn, &msg, &offset);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ /* check that a monitor cannot acquire a name */
++ ret = kdbus_name_acquire(monitor, "foo.bar.baz", NULL);
++ ASSERT_RETURN(ret == -EOPNOTSUPP);
++
++ ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0, conn->id);
++ ASSERT_RETURN(ret == 0);
++
++ /* the recipient should have gotten the message */
++ ret = kdbus_msg_recv(conn, &msg, &offset);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->cookie == cookie);
++ kdbus_msg_free(msg);
++ kdbus_free(conn, offset);
++
++ /* and so should the monitor */
++ ret = kdbus_msg_recv(monitor, &msg, &offset);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->cookie == cookie);
++
++ kdbus_msg_free(msg);
++ kdbus_free(monitor, offset);
++
++ /* Installing matches for monitors must fail */
++ ret = kdbus_add_match_empty(monitor);
++ ASSERT_RETURN(ret == -EOPNOTSUPP);
++
++ cookie++;
++ ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
++ KDBUS_DST_ID_BROADCAST);
++ ASSERT_RETURN(ret == 0);
++
++ /* The monitor should get the message. */
++ ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->cookie == cookie);
++
++ kdbus_msg_free(msg);
++ kdbus_free(monitor, offset);
++
++ /*
++ * Since we are the only monitor, update the attach flags
++ * and state that we are not interested in any attach flags on receive
++ */
++
++ ret = kdbus_conn_update_attach_flags(monitor,
++ _KDBUS_ATTACH_ALL,
++ 0);
++ ASSERT_RETURN(ret == 0);
++
++ cookie++;
++ ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
++ KDBUS_DST_ID_BROADCAST);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->cookie == cookie);
++
++ ret = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
++ ASSERT_RETURN(ret == 0);
++
++ kdbus_msg_free(msg);
++ kdbus_free(monitor, offset);
++
++ /*
++ * Now we are interested in KDBUS_ITEM_TIMESTAMP and
++ * KDBUS_ITEM_CREDS
++ */
++ ret = kdbus_conn_update_attach_flags(monitor,
++ _KDBUS_ATTACH_ALL,
++ KDBUS_ATTACH_TIMESTAMP |
++ KDBUS_ATTACH_CREDS);
++ ASSERT_RETURN(ret == 0);
++
++ cookie++;
++ ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
++ KDBUS_DST_ID_BROADCAST);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->cookie == cookie);
++
++ ret = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
++ ASSERT_RETURN(ret == 1);
++
++ ret = kdbus_item_in_message(msg, KDBUS_ITEM_CREDS);
++ ASSERT_RETURN(ret == 1);
++
++ /* the KDBUS_ITEM_PID_COMM was not requested */
++ ret = kdbus_item_in_message(msg, KDBUS_ITEM_PID_COMM);
++ ASSERT_RETURN(ret == 0);
++
++ kdbus_msg_free(msg);
++ kdbus_free(monitor, offset);
++
++ kdbus_conn_free(monitor);
++ /* make sure we did not receive a monitor disconnect notification */
++ ret = kdbus_msg_recv(conn, &msg, &offset);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ kdbus_conn_free(conn);
++
++ /* Make sure that monitor as unprivileged is not allowed */
++ ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
++ ASSERT_RETURN(ret >= 0);
++
++ if (ret && all_uids_gids_are_mapped()) {
++ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
++ monitor = kdbus_hello(env->buspath,
++ KDBUS_HELLO_MONITOR,
++ NULL, 0);
++ ASSERT_EXIT(!monitor && errno == EPERM);
++
++ _exit(EXIT_SUCCESS);
++ }),
++ ({ 0; }));
++ ASSERT_RETURN(ret == 0);
++ }
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-names.c b/tools/testing/selftests/kdbus/test-names.c
+new file mode 100644
+index 0000000..e400dc8
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-names.c
+@@ -0,0 +1,272 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <limits.h>
++#include <getopt.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++#include "kdbus-test.h"
++
++struct test_name {
++ const char *name;
++ __u64 owner_id;
++ __u64 flags;
++};
++
++static bool conn_test_names(const struct kdbus_conn *conn,
++ const struct test_name *tests,
++ unsigned int n_tests)
++{
++ struct kdbus_cmd_list cmd_list = {};
++ struct kdbus_info *name, *list;
++ unsigned int i;
++ int ret;
++
++ cmd_list.size = sizeof(cmd_list);
++ cmd_list.flags = KDBUS_LIST_NAMES |
++ KDBUS_LIST_ACTIVATORS |
++ KDBUS_LIST_QUEUED;
++
++ ret = kdbus_cmd_list(conn->fd, &cmd_list);
++ ASSERT_RETURN_VAL(ret == 0, false);
++
++ list = (struct kdbus_info *)(conn->buf + cmd_list.offset);
++
++ for (i = 0; i < n_tests; i++) {
++ const struct test_name *t = tests + i;
++ bool found = false;
++
++ KDBUS_FOREACH(name, list, cmd_list.list_size) {
++ struct kdbus_item *item;
++
++ KDBUS_ITEM_FOREACH(item, name, items) {
++ if (item->type != KDBUS_ITEM_OWNED_NAME ||
++ strcmp(item->name.name, t->name) != 0)
++ continue;
++
++ if (t->owner_id == name->id &&
++ t->flags == item->name.flags) {
++ found = true;
++ break;
++ }
++ }
++ }
++
++ if (!found)
++ return false;
++ }
++
++ return true;
++}
++
++static bool conn_is_name_primary_owner(const struct kdbus_conn *conn,
++ const char *needle)
++{
++ struct test_name t = {
++ .name = needle,
++ .owner_id = conn->id,
++ .flags = KDBUS_NAME_PRIMARY,
++ };
++
++ return conn_test_names(conn, &t, 1);
++}
++
++int kdbus_test_name_basic(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *conn;
++ char *name, *dot_name, *invalid_name, *wildcard_name;
++ int ret;
++
++ name = "foo.bla.blaz";
++ dot_name = ".bla.blaz";
++ invalid_name = "foo";
++ wildcard_name = "foo.bla.bl.*";
++
++ /* create a 2nd connection */
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn != NULL);
++
++ /* acquire the name "foo.bar.xxx" */
++ ret = kdbus_name_acquire(conn, "foo.bar.xxx", NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* Name is not valid, must fail */
++ ret = kdbus_name_acquire(env->conn, dot_name, NULL);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ ret = kdbus_name_acquire(env->conn, invalid_name, NULL);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ ret = kdbus_name_acquire(env->conn, wildcard_name, NULL);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ /* check that we can acquire a name */
++ ret = kdbus_name_acquire(env->conn, name, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ret = conn_is_name_primary_owner(env->conn, name);
++ ASSERT_RETURN(ret == true);
++
++ /* ... and release it again */
++ ret = kdbus_name_release(env->conn, name);
++ ASSERT_RETURN(ret == 0);
++
++ ret = conn_is_name_primary_owner(env->conn, name);
++ ASSERT_RETURN(ret == false);
++
++ /* check that we can't release it again */
++ ret = kdbus_name_release(env->conn, name);
++ ASSERT_RETURN(ret == -ESRCH);
++
++ /* check that we can't release a name that we don't own */
++ ret = kdbus_name_release(env->conn, "foo.bar.xxx");
++ ASSERT_RETURN(ret == -EADDRINUSE);
++
++ /* Name is not valid, must fail */
++ ret = kdbus_name_release(env->conn, dot_name);
++ ASSERT_RETURN(ret == -ESRCH);
++
++ ret = kdbus_name_release(env->conn, invalid_name);
++ ASSERT_RETURN(ret == -ESRCH);
++
++ ret = kdbus_name_release(env->conn, wildcard_name);
++ ASSERT_RETURN(ret == -ESRCH);
++
++ kdbus_conn_free(conn);
++
++ return TEST_OK;
++}
++
++int kdbus_test_name_conflict(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *conn;
++ char *name;
++ int ret;
++
++ name = "foo.bla.blaz";
++
++ /* create a 2nd connection */
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn != NULL);
++
++ /* allow the new connection to own the same name */
++ /* acquire name from the 1st connection */
++ ret = kdbus_name_acquire(env->conn, name, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ret = conn_is_name_primary_owner(env->conn, name);
++ ASSERT_RETURN(ret == true);
++
++ /* check that we can't acquire it from the 2nd connection */
++ ret = kdbus_name_acquire(conn, name, NULL);
++ ASSERT_RETURN(ret == -EEXIST);
++
++ kdbus_conn_free(conn);
++
++ return TEST_OK;
++}
++
++int kdbus_test_name_queue(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *conn;
++ struct test_name t[2];
++ const char *name;
++ uint64_t flags;
++ int ret;
++
++ name = "foo.bla.blaz";
++
++ flags = 0;
++
++ /* create a 2nd connection */
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn != NULL);
++
++ /* allow the new connection to own the same name */
++ /* acquire name from the 1st connection */
++ ret = kdbus_name_acquire(env->conn, name, &flags);
++ ASSERT_RETURN(ret == 0);
++
++ ret = conn_is_name_primary_owner(env->conn, name);
++ ASSERT_RETURN(ret == true);
++
++ /* queue the 2nd connection as waiting owner */
++ flags = KDBUS_NAME_QUEUE;
++ ret = kdbus_name_acquire(conn, name, &flags);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(flags & KDBUS_NAME_IN_QUEUE);
++
++ t[0].name = name;
++ t[0].owner_id = env->conn->id;
++ t[0].flags = KDBUS_NAME_PRIMARY;
++ t[1].name = name;
++ t[1].owner_id = conn->id;
++ t[1].flags = KDBUS_NAME_QUEUE | KDBUS_NAME_IN_QUEUE;
++ ret = conn_test_names(conn, t, 2);
++ ASSERT_RETURN(ret == true);
++
++ /* release name from 1st connection */
++ ret = kdbus_name_release(env->conn, name);
++ ASSERT_RETURN(ret == 0);
++
++ /* now the name should be owned by the 2nd connection */
++ t[0].name = name;
++ t[0].owner_id = conn->id;
++ t[0].flags = KDBUS_NAME_PRIMARY | KDBUS_NAME_QUEUE;
++ ret = conn_test_names(conn, t, 1);
++ ASSERT_RETURN(ret == true);
++
++ kdbus_conn_free(conn);
++
++ return TEST_OK;
++}
++
++int kdbus_test_name_takeover(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *conn;
++ struct test_name t;
++ const char *name;
++ uint64_t flags;
++ int ret;
++
++ name = "foo.bla.blaz";
++
++ flags = KDBUS_NAME_ALLOW_REPLACEMENT;
++
++ /* create a 2nd connection */
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn != NULL);
++
++ /* acquire name for 1st connection */
++ ret = kdbus_name_acquire(env->conn, name, &flags);
++ ASSERT_RETURN(ret == 0);
++
++ t.name = name;
++ t.owner_id = env->conn->id;
++ t.flags = KDBUS_NAME_ALLOW_REPLACEMENT | KDBUS_NAME_PRIMARY;
++ ret = conn_test_names(conn, &t, 1);
++ ASSERT_RETURN(ret == true);
++
++ /* now steal name with 2nd connection */
++ flags = KDBUS_NAME_REPLACE_EXISTING;
++ ret = kdbus_name_acquire(conn, name, &flags);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(flags & KDBUS_NAME_ACQUIRED);
++
++ ret = conn_is_name_primary_owner(conn, name);
++ ASSERT_RETURN(ret == true);
++
++ kdbus_conn_free(conn);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-policy-ns.c b/tools/testing/selftests/kdbus/test-policy-ns.c
+new file mode 100644
+index 0000000..3437012
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-policy-ns.c
+@@ -0,0 +1,632 @@
++/*
++ * Test metadata and policies in new namespaces. Even if our tests
++ * can run in a namespaced setup, this test is necessary so we can
++ * inspect policies on the same kdbusfs but between multiple
++ * namespaces.
++ *
++ * Copyright (C) 2014-2015 Djalal Harouni
++ *
++ * kdbus is free software; you can redistribute it and/or modify it under
++ * the terms of the GNU Lesser General Public License as published by the
++ * Free Software Foundation; either version 2.1 of the License, or (at
++ * your option) any later version.
++ */
++
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <pthread.h>
++#include <sched.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <stdint.h>
++#include <stdbool.h>
++#include <unistd.h>
++#include <errno.h>
++#include <signal.h>
++#include <sys/wait.h>
++#include <sys/prctl.h>
++#include <sys/eventfd.h>
++#include <sys/syscall.h>
++#include <sys/capability.h>
++#include <linux/sched.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++#define MAX_CONN 64
++#define POLICY_NAME "foo.test.policy-test"
++
++#define KDBUS_CONN_MAX_MSGS_PER_USER 16
++
++/**
++ * Note: this test can be used to inspect policy_db->talk_access_hash
++ *
++ * The purpose of these tests:
++ * 1) Check KDBUS_POLICY_TALK
++ * 2) Check the cache state: kdbus_policy_db->talk_access_hash
++ * Should be extended
++ */
++
++/**
++ * Check a list of connections against conn_db[0]
++ * conn_db[0] will own the name "foo.test.policy-test" and the
++ * policy holder connection for this name will update the policy
++ * entries, so different use cases can be tested.
++ */
++static struct kdbus_conn **conn_db;
++
++static void *kdbus_recv_echo(void *ptr)
++{
++ int ret;
++ struct kdbus_conn *conn = ptr;
++
++ ret = kdbus_msg_recv_poll(conn, 200, NULL, NULL);
++
++ return (void *)(long)ret;
++}
++
++/* Trigger kdbus_policy_set() */
++static int kdbus_set_policy_talk(struct kdbus_conn *conn,
++ const char *name,
++ uid_t id, unsigned int type)
++{
++ int ret;
++ struct kdbus_policy_access access = {
++ .type = type,
++ .id = id,
++ .access = KDBUS_POLICY_TALK,
++ };
++
++ ret = kdbus_conn_update_policy(conn, name, &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ return TEST_OK;
++}
++
++/* return TEST_OK or TEST_ERR on failure */
++static int kdbus_register_same_activator(char *bus, const char *name,
++ struct kdbus_conn **c)
++{
++ int ret;
++ struct kdbus_conn *activator;
++
++ activator = kdbus_hello_activator(bus, name, NULL, 0);
++ if (activator) {
++ *c = activator;
++ fprintf(stderr, "--- error was able to register name twice '%s'.\n",
++ name);
++ return TEST_ERR;
++ }
++
++ ret = -errno;
++ /* -EEXIST means test succeeded */
++ if (ret == -EEXIST)
++ return TEST_OK;
++
++ return TEST_ERR;
++}
++
++/* return TEST_OK or TEST_ERR on failure */
++static int kdbus_register_policy_holder(char *bus, const char *name,
++ struct kdbus_conn **conn)
++{
++ struct kdbus_conn *c;
++ struct kdbus_policy_access access[2];
++
++ access[0].type = KDBUS_POLICY_ACCESS_USER;
++ access[0].access = KDBUS_POLICY_OWN;
++ access[0].id = geteuid();
++
++ access[1].type = KDBUS_POLICY_ACCESS_WORLD;
++ access[1].access = KDBUS_POLICY_TALK;
++ access[1].id = geteuid();
++
++ c = kdbus_hello_registrar(bus, name, access, 2,
++ KDBUS_HELLO_POLICY_HOLDER);
++ ASSERT_RETURN(c);
++
++ *conn = c;
++
++ return TEST_OK;
++}
++
++/**
++ * Create new threads for receiving from multiple senders.
++ * The 'conn_db' will be populated by newly created connections.
++ * Caller should free all allocated connections.
++ *
++ * return 0 on success, negative errno on failure.
++ */
++static int kdbus_recv_in_threads(const char *bus, const char *name,
++ struct kdbus_conn **conn_db)
++{
++ int ret;
++ bool pool_full = false;
++ unsigned int sent_packets = 0;
++ unsigned int lost_packets = 0;
++ unsigned int i, tid;
++ unsigned long dst_id;
++ unsigned long cookie = 1;
++ unsigned int thread_nr = MAX_CONN - 1;
++ pthread_t thread_id[MAX_CONN - 1] = {0};
++
++ dst_id = name ? KDBUS_DST_ID_NAME : conn_db[0]->id;
++
++ for (tid = 0, i = 1; tid < thread_nr; tid++, i++) {
++ ret = pthread_create(&thread_id[tid], NULL,
++ kdbus_recv_echo, (void *)conn_db[0]);
++ if (ret != 0) {
++ ret = -ret;
++ kdbus_printf("error pthread_create: %d (%s)\n",
++ ret, strerror(-ret));
++ break;
++ }
++
++ /* just free before re-using */
++ kdbus_conn_free(conn_db[i]);
++ conn_db[i] = NULL;
++
++ /* We need to create connections here */
++ conn_db[i] = kdbus_hello(bus, 0, NULL, 0);
++ if (!conn_db[i]) {
++ ret = -errno;
++ break;
++ }
++
++ ret = kdbus_add_match_empty(conn_db[i]);
++ if (ret < 0)
++ break;
++
++ ret = kdbus_msg_send(conn_db[i], name, cookie++,
++ 0, 0, 0, dst_id);
++ if (ret < 0) {
++ /*
++ * Receivers are not reading their messages,
++ * not scheduled ?!
++ *
++ * So set the pool full here, perhaps the
++ * connection pool or queue was full, later
++ * recheck receivers errors
++ */
++ if (ret == -ENOBUFS || ret == -EXFULL)
++ pool_full = true;
++ break;
++ }
++
++ sent_packets++;
++ }
++
++ for (tid = 0; tid < thread_nr; tid++) {
++ void *thread_ret = NULL;
++
++ if (thread_id[tid]) {
++ pthread_join(thread_id[tid], &thread_ret);
++ if ((long)thread_ret < 0) {
++ /* Update only if send did not fail */
++ if (ret == 0)
++ ret = (long)thread_ret;
++
++ lost_packets++;
++ }
++ }
++ }
++
++ /*
++ * If sending failed with -ENOBUFS or -EXFULL, then
++ * lost_packets should be set and sent_packets should be
++ * at least KDBUS_CONN_MAX_MSGS_PER_USER
++ */
++ if (pool_full) {
++ ASSERT_RETURN(lost_packets > 0);
++
++ /*
++ * We should at least send KDBUS_CONN_MAX_MSGS_PER_USER
++ *
++ * For every send operation we create a thread to
++ * recv the packet, so we keep the queue clean
++ */
++ ASSERT_RETURN(sent_packets >= KDBUS_CONN_MAX_MSGS_PER_USER);
++
++ /*
++ * Set ret to zero since we only failed due to
++ * the receiving threads that have not been
++ * scheduled
++ */
++ ret = 0;
++ }
++
++ return ret;
++}
++
++/* Return: TEST_OK or TEST_ERR on failure */
++static int kdbus_normal_test(const char *bus, const char *name,
++ struct kdbus_conn **conn_db)
++{
++ int ret;
++
++ ret = kdbus_recv_in_threads(bus, name, conn_db);
++ ASSERT_RETURN(ret >= 0);
++
++ return TEST_OK;
++}
++
++static int kdbus_fork_test_by_id(const char *bus,
++ struct kdbus_conn **conn_db,
++ int parent_status, int child_status)
++{
++ int ret;
++ pid_t pid;
++ uint64_t cookie = 0x9876ecba;
++ struct kdbus_msg *msg = NULL;
++ uint64_t offset = 0;
++ int status = 0;
++
++ /*
++ * If the child_status is not EXIT_SUCCESS, then we expect
++ * that sending from the child will fail, thus receiving
++ * from parent must error with -ETIMEDOUT, and vice versa.
++ */
++ bool parent_timedout = !!child_status;
++ bool child_timedout = !!parent_status;
++
++ pid = fork();
++ ASSERT_RETURN_VAL(pid >= 0, pid);
++
++ if (pid == 0) {
++ struct kdbus_conn *conn_src;
++
++ ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
++ ASSERT_EXIT(ret == 0);
++
++ ret = drop_privileges(65534, 65534);
++ ASSERT_EXIT(ret == 0);
++
++ conn_src = kdbus_hello(bus, 0, NULL, 0);
++ ASSERT_EXIT(conn_src);
++
++ ret = kdbus_add_match_empty(conn_src);
++ ASSERT_EXIT(ret == 0);
++
++ /*
++ * child_status is always checked against send
++ * operations, in case it fails always return
++ * EXIT_FAILURE.
++ */
++ ret = kdbus_msg_send(conn_src, NULL, cookie,
++ 0, 0, 0, conn_db[0]->id);
++ ASSERT_EXIT(ret == child_status);
++
++ ret = kdbus_msg_recv_poll(conn_src, 100, NULL, NULL);
++
++ kdbus_conn_free(conn_src);
++
++ /*
++ * Child kdbus_msg_recv_poll() should time out since
++ * the parent_status was set to a non EXIT_SUCCESS
++ * value.
++ */
++ if (child_timedout)
++ _exit(ret == -ETIMEDOUT ? EXIT_SUCCESS : EXIT_FAILURE);
++
++ _exit(ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
++ }
++
++ ret = kdbus_msg_recv_poll(conn_db[0], 100, &msg, &offset);
++ /*
++ * If parent_timedout is set then this should fail with
++ * -ETIMEDOUT since the child_status was set to a non
++ * EXIT_SUCCESS value. Otherwise, assume
++ * that kdbus_msg_recv_poll() has succeeded.
++ */
++ if (parent_timedout) {
++ ASSERT_RETURN_VAL(ret == -ETIMEDOUT, TEST_ERR);
++
++ /* Timed out, no need to continue: we don't have the
++ * child connection ID, so just terminate. */
++ goto out;
++ } else {
++ ASSERT_RETURN_VAL(ret == 0, ret);
++ }
++
++ ret = kdbus_msg_send(conn_db[0], NULL, ++cookie,
++ 0, 0, 0, msg->src_id);
++ /*
++ * parent_status is checked against send operations,
++ * on failures always return TEST_ERR.
++ */
++ ASSERT_RETURN_VAL(ret == parent_status, TEST_ERR);
++
++ kdbus_msg_free(msg);
++ kdbus_free(conn_db[0], offset);
++
++out:
++ ret = waitpid(pid, &status, 0);
++ ASSERT_RETURN_VAL(ret >= 0, ret);
++
++ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++/*
++ * Return: TEST_OK, TEST_ERR or TEST_SKIP
++ * we return TEST_OK only if the children return with the expected
++ * 'expected_status' that is specified as an argument.
++ */
++static int kdbus_fork_test(const char *bus, const char *name,
++ struct kdbus_conn **conn_db, int expected_status)
++{
++ pid_t pid;
++ int ret = 0;
++ int status = 0;
++
++ pid = fork();
++ ASSERT_RETURN_VAL(pid >= 0, pid);
++
++ if (pid == 0) {
++ ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
++ ASSERT_EXIT(ret == 0);
++
++ ret = drop_privileges(65534, 65534);
++ ASSERT_EXIT(ret == 0);
++
++ ret = kdbus_recv_in_threads(bus, name, conn_db);
++ _exit(ret == expected_status ? EXIT_SUCCESS : EXIT_FAILURE);
++ }
++
++ ret = waitpid(pid, &status, 0);
++ ASSERT_RETURN(ret >= 0);
++
++ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++/* Return EXIT_SUCCESS, EXIT_FAILURE or negative errno */
++static int __kdbus_clone_userns_test(const char *bus,
++ const char *name,
++ struct kdbus_conn **conn_db,
++ int expected_status)
++{
++ int efd;
++ pid_t pid;
++ int ret = 0;
++ unsigned int uid = 65534;
++ int status;
++
++ ret = drop_privileges(uid, uid);
++ ASSERT_RETURN_VAL(ret == 0, ret);
++
++ /*
++ * Since we just dropped privileges, the dumpable flag was
++ * cleared, which makes /proc/$clone_child/uid_map owned by
++ * root. Hence any userns uid mapping will fail with -EPERM,
++ * since the mapping will be done by uid 65534.
++ *
++ * To avoid this, set the dumpable flag again, which makes procfs
++ * update the owner of the /proc/$clone_child/ inodes to 65534.
++ *
++ * This way we are able to write to /proc/$clone_child/uid_map
++ * as uid 65534 and map uid 65534 to 0 inside the user
++ * namespace.
++ */
++ ret = prctl(PR_SET_DUMPABLE, SUID_DUMP_USER);
++ ASSERT_RETURN_VAL(ret == 0, ret);
++
++ /* sync parent/child */
++ efd = eventfd(0, EFD_CLOEXEC);
++ ASSERT_RETURN_VAL(efd >= 0, efd);
++
++ pid = syscall(__NR_clone, SIGCHLD|CLONE_NEWUSER, NULL);
++ if (pid < 0) {
++ ret = -errno;
++ kdbus_printf("error clone: %d (%m)\n", ret);
++ /*
++ * Normal user not allowed to create userns,
++ * so nothing to worry about ?
++ */
++ if (ret == -EPERM) {
++ kdbus_printf("-- CLONE_NEWUSER TEST Failed for uid: %u\n"
++ "-- Make sure that your kernel does not allow "
++ "CLONE_NEWUSER for unprivileged users\n"
++ "-- Upstream Commit: "
++ "https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eaf563e\n",
++ uid);
++ ret = 0;
++ }
++
++ return ret;
++ }
++
++ if (pid == 0) {
++ struct kdbus_conn *conn_src;
++ eventfd_t event_status = 0;
++
++ ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
++ ASSERT_EXIT(ret == 0);
++
++ ret = eventfd_read(efd, &event_status);
++ ASSERT_EXIT(ret >= 0 && event_status == 1);
++
++ /* ping connection from the new user namespace */
++ conn_src = kdbus_hello(bus, 0, NULL, 0);
++ ASSERT_EXIT(conn_src);
++
++ ret = kdbus_add_match_empty(conn_src);
++ ASSERT_EXIT(ret == 0);
++
++ ret = kdbus_msg_send(conn_src, name, 0xabcd1234,
++ 0, 0, 0, KDBUS_DST_ID_NAME);
++ kdbus_conn_free(conn_src);
++
++ _exit(ret == expected_status ? EXIT_SUCCESS : EXIT_FAILURE);
++ }
++
++ ret = userns_map_uid_gid(pid, "0 65534 1", "0 65534 1");
++ ASSERT_RETURN_VAL(ret == 0, ret);
++
++ /* Tell child we are ready */
++ ret = eventfd_write(efd, 1);
++ ASSERT_RETURN_VAL(ret == 0, ret);
++
++ ret = waitpid(pid, &status, 0);
++ ASSERT_RETURN_VAL(ret >= 0, ret);
++
++ close(efd);
++
++ return status == EXIT_SUCCESS ? TEST_OK : TEST_ERR;
++}
++
++static int kdbus_clone_userns_test(const char *bus,
++ const char *name,
++ struct kdbus_conn **conn_db,
++ int expected_status)
++{
++ pid_t pid;
++ int ret = 0;
++ int status;
++
++ pid = fork();
++ ASSERT_RETURN_VAL(pid >= 0, -errno);
++
++ if (pid == 0) {
++ ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
++ if (ret < 0)
++ _exit(EXIT_FAILURE);
++
++ ret = __kdbus_clone_userns_test(bus, name, conn_db,
++ expected_status);
++ _exit(ret);
++ }
++
++ /*
++ * Receive in the original (root privileged) user namespace,
++ * must fail with -ETIMEDOUT.
++ */
++ ret = kdbus_msg_recv_poll(conn_db[0], 100, NULL, NULL);
++ ASSERT_RETURN_VAL(ret == -ETIMEDOUT, ret);
++
++ ret = waitpid(pid, &status, 0);
++ ASSERT_RETURN_VAL(ret >= 0, ret);
++
++ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++int kdbus_test_policy_ns(struct kdbus_test_env *env)
++{
++ int i;
++ int ret;
++ struct kdbus_conn *activator = NULL;
++ struct kdbus_conn *policy_holder = NULL;
++ char *bus = env->buspath;
++
++ ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
++ ASSERT_RETURN(ret >= 0);
++
++ /* not enough privileges, SKIP test */
++ if (!ret)
++ return TEST_SKIP;
++
++ /* we require user-namespaces */
++ if (access("/proc/self/uid_map", F_OK) != 0)
++ return TEST_SKIP;
++
++ /* uids/gids must be mapped */
++ if (!all_uids_gids_are_mapped())
++ return TEST_SKIP;
++
++ conn_db = calloc(MAX_CONN, sizeof(struct kdbus_conn *));
++ ASSERT_RETURN(conn_db);
++
++ memset(conn_db, 0, MAX_CONN * sizeof(struct kdbus_conn *));
++
++ conn_db[0] = kdbus_hello(bus, 0, NULL, 0);
++ ASSERT_RETURN(conn_db[0]);
++
++ ret = kdbus_add_match_empty(conn_db[0]);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_fork_test_by_id(bus, conn_db, -EPERM, -EPERM);
++ ASSERT_EXIT(ret == 0);
++
++ ret = kdbus_register_policy_holder(bus, POLICY_NAME,
++ &policy_holder);
++ ASSERT_RETURN(ret == 0);
++
++ /* Try to register the same name with an activator */
++ ret = kdbus_register_same_activator(bus, POLICY_NAME,
++ &activator);
++ ASSERT_RETURN(ret == 0);
++
++ /* Acquire POLICY_NAME */
++ ret = kdbus_name_acquire(conn_db[0], POLICY_NAME, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_normal_test(bus, POLICY_NAME, conn_db);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_list(conn_db[0], KDBUS_LIST_NAMES |
++ KDBUS_LIST_UNIQUE |
++ KDBUS_LIST_ACTIVATORS |
++ KDBUS_LIST_QUEUED);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_fork_test(bus, POLICY_NAME, conn_db, EXIT_SUCCESS);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * children connections are able to talk to conn_db[0] since
++ * current POLICY_NAME TALK type is KDBUS_POLICY_ACCESS_WORLD,
++ * so expect EXIT_SUCCESS when sending from child. However,
++ * since the child's connection does not own any well-known
++ * name, the parent connection conn_db[0] should fail with
++ * -EPERM but since it is a privileged bus user the TALK is
++ * allowed.
++ */
++ ret = kdbus_fork_test_by_id(bus, conn_db,
++ EXIT_SUCCESS, EXIT_SUCCESS);
++ ASSERT_EXIT(ret == 0);
++
++ /*
++ * Connections that can talk are perhaps being destroyed now.
++ * Restrict the policy and purge cache entries where the
++ * conn_db[0] is the destination.
++ *
++ * Now only connections with uid == 0 are allowed to talk.
++ */
++ ret = kdbus_set_policy_talk(policy_holder, POLICY_NAME,
++ geteuid(), KDBUS_POLICY_ACCESS_USER);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Testing connections (FORK+DROP) again:
++ * After setting the policy re-check connections
++ * we expect the children to fail with -EPERM
++ */
++ ret = kdbus_fork_test(bus, POLICY_NAME, conn_db, -EPERM);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Now expect both parent and child to fail.
++ *
++ * Child should fail with -EPERM since we just restricted
++ * the POLICY_NAME TALK to uid 0 and its uid is 65534.
++ *
++ * Since the parent's connection will timeout when receiving
++ * from the child, we never continue. FWIW just put -EPERM.
++ */
++ ret = kdbus_fork_test_by_id(bus, conn_db, -EPERM, -EPERM);
++ ASSERT_EXIT(ret == 0);
++
++ /* Check if the name can be reached in a new userns */
++ ret = kdbus_clone_userns_test(bus, POLICY_NAME, conn_db, -EPERM);
++ ASSERT_RETURN(ret == 0);
++
++ for (i = 0; i < MAX_CONN; i++)
++ kdbus_conn_free(conn_db[i]);
++
++ kdbus_conn_free(activator);
++ kdbus_conn_free(policy_holder);
++
++ free(conn_db);
++
++ return ret;
++}
+diff --git a/tools/testing/selftests/kdbus/test-policy-priv.c b/tools/testing/selftests/kdbus/test-policy-priv.c
+new file mode 100644
+index 0000000..0208638
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-policy-priv.c
+@@ -0,0 +1,1285 @@
++#include <errno.h>
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stdint.h>
++#include <stdbool.h>
++#include <unistd.h>
++#include <time.h>
++#include <sys/capability.h>
++#include <sys/eventfd.h>
++#include <sys/wait.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++static int test_policy_priv_by_id(const char *bus,
++ struct kdbus_conn *conn_dst,
++ bool drop_second_user,
++ int parent_status,
++ int child_status)
++{
++ int ret = 0;
++ uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
++
++ ASSERT_RETURN(conn_dst);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, bus, ({
++ ret = kdbus_msg_send(unpriv, NULL,
++ expected_cookie, 0, 0, 0,
++ conn_dst->id);
++ ASSERT_EXIT(ret == child_status);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ ret = kdbus_msg_recv_poll(conn_dst, 300, NULL, NULL);
++ ASSERT_RETURN(ret == parent_status);
++
++ return 0;
++}
++
++static int test_policy_priv_by_broadcast(const char *bus,
++ struct kdbus_conn *conn_dst,
++ int drop_second_user,
++ int parent_status,
++ int child_status)
++{
++ int efd;
++ int ret = 0;
++ eventfd_t event_status = 0;
++ struct kdbus_msg *msg = NULL;
++ uid_t second_uid = UNPRIV_UID;
++ gid_t second_gid = UNPRIV_GID;
++ struct kdbus_conn *child_2 = conn_dst;
++ uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
++
++ /* Drop to another unprivileged user other than UNPRIV_UID */
++ if (drop_second_user == DROP_OTHER_UNPRIV) {
++ second_uid = UNPRIV_UID - 1;
++ second_gid = UNPRIV_GID - 1;
++ }
++
++ /* child will signal parent to send broadcast */
++ efd = eventfd(0, EFD_CLOEXEC);
++ ASSERT_RETURN_VAL(efd >= 0, efd);
++
++ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
++ struct kdbus_conn *child;
++
++ child = kdbus_hello(bus, 0, NULL, 0);
++ ASSERT_EXIT(child);
++
++ ret = kdbus_add_match_empty(child);
++ ASSERT_EXIT(ret == 0);
++
++ /* signal parent */
++ ret = eventfd_write(efd, 1);
++ ASSERT_EXIT(ret == 0);
++
++ /* Use a somewhat longer timeout */
++ ret = kdbus_msg_recv_poll(child, 500, &msg, NULL);
++ ASSERT_EXIT(ret == child_status);
++
++ /*
++ * If we expect the child to get the broadcast
++ * message, then check the received cookie.
++ */
++ if (ret == 0) {
++ ASSERT_EXIT(expected_cookie == msg->cookie);
++ }
++
++ /* Use expected_cookie since 'msg' might be NULL */
++ ret = kdbus_msg_send(child, NULL, expected_cookie + 1,
++ 0, 0, 0, KDBUS_DST_ID_BROADCAST);
++ ASSERT_EXIT(ret == 0);
++
++ kdbus_msg_free(msg);
++ kdbus_conn_free(child);
++ }),
++ ({
++ if (drop_second_user == DO_NOT_DROP) {
++ ASSERT_RETURN(child_2);
++
++ ret = eventfd_read(efd, &event_status);
++ ASSERT_RETURN(ret >= 0 && event_status == 1);
++
++ ret = kdbus_msg_send(child_2, NULL,
++ expected_cookie, 0, 0, 0,
++ KDBUS_DST_ID_BROADCAST);
++ ASSERT_RETURN(ret == 0);
++
++ /* drop own broadcast */
++ ret = kdbus_msg_recv(child_2, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->src_id == child_2->id);
++ kdbus_msg_free(msg);
++
++ /* Use a somewhat longer timeout */
++ ret = kdbus_msg_recv_poll(child_2, 1000,
++ &msg, NULL);
++ ASSERT_RETURN(ret == parent_status);
++
++ /*
++ * Check returned cookie in case we expect
++ * success.
++ */
++ if (ret == 0) {
++ ASSERT_RETURN(msg->cookie ==
++ expected_cookie + 1);
++ }
++
++ kdbus_msg_free(msg);
++ } else {
++ /*
++ * Two unprivileged users will try to
++ * communicate using broadcast.
++ */
++ ret = RUN_UNPRIVILEGED(second_uid, second_gid, ({
++ child_2 = kdbus_hello(bus, 0, NULL, 0);
++ ASSERT_EXIT(child_2);
++
++ ret = kdbus_add_match_empty(child_2);
++ ASSERT_EXIT(ret == 0);
++
++ ret = eventfd_read(efd, &event_status);
++ ASSERT_EXIT(ret >= 0 && event_status == 1);
++
++ ret = kdbus_msg_send(child_2, NULL,
++ expected_cookie, 0, 0, 0,
++ KDBUS_DST_ID_BROADCAST);
++ ASSERT_EXIT(ret == 0);
++
++ /* drop own broadcast */
++ ret = kdbus_msg_recv(child_2, &msg, NULL);
++ ASSERT_EXIT(ret == 0);
++ ASSERT_EXIT(msg->src_id == child_2->id);
++ kdbus_msg_free(msg);
++
++ /* Use a somewhat longer timeout */
++ ret = kdbus_msg_recv_poll(child_2, 1000,
++ &msg, NULL);
++ ASSERT_EXIT(ret == parent_status);
++
++ /*
++ * Check returned cookie in case we expect
++ * success.
++ */
++ if (ret == 0) {
++ ASSERT_EXIT(msg->cookie ==
++ expected_cookie + 1);
++ }
++
++ kdbus_msg_free(msg);
++ kdbus_conn_free(child_2);
++ }),
++ ({ 0; }));
++ ASSERT_RETURN(ret == 0);
++ }
++ }));
++ ASSERT_RETURN(ret == 0);
++
++ close(efd);
++
++ return ret;
++}
++
++static void nosig(int sig)
++{
++}
++
++static int test_priv_before_policy_upload(struct kdbus_test_env *env)
++{
++ int ret = 0;
++ struct kdbus_conn *conn;
++
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn);
++
++ /*
++ * Make sure unprivileged bus user cannot acquire names
++ * before registering any policy holder.
++ */
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++ ASSERT_EXIT(ret < 0);
++ }));
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Make sure unprivileged bus users cannot talk by default
++ * to privileged ones, unless a policy holder that allows
++ * this was uploaded.
++ */
++
++ ret = test_policy_priv_by_id(env->buspath, conn, false,
++ -ETIMEDOUT, -EPERM);
++ ASSERT_RETURN(ret == 0);
++
++ /* Activate matching for a privileged connection */
++ ret = kdbus_add_match_empty(conn);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * First make sure that BROADCAST with msg flag
++ * KDBUS_MSG_EXPECT_REPLY will fail with -ENOTUNIQ
++ */
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_msg_send(unpriv, NULL, 0xdeadbeef,
++ KDBUS_MSG_EXPECT_REPLY,
++ 5000000000ULL, 0,
++ KDBUS_DST_ID_BROADCAST);
++ ASSERT_EXIT(ret == -ENOTUNIQ);
++ }));
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Test broadcast with a privileged connection.
++ *
++ * The first unprivileged receiver should not get the
++ * broadcast message sent by the privileged connection,
++ * since there is no TALK policy that allows the
++ * unprivileged to TALK to the privileged connection. It
++ * will fail with -ETIMEDOUT
++ *
++ * Then second case:
++ * The privileged connection should get the broadcast
++ * message from the unprivileged one. Since the receiver is
++ * a privileged bus user and it has default TALK access to
++ * all connections it will receive those.
++ */
++
++ ret = test_policy_priv_by_broadcast(env->buspath, conn,
++ DO_NOT_DROP,
++ 0, -ETIMEDOUT);
++ ASSERT_RETURN(ret == 0);
++
++
++ /*
++ * Test broadcast with two unprivileged connections running
++ * under the same user.
++ *
++ * Both connections should succeed.
++ */
++
++ ret = test_policy_priv_by_broadcast(env->buspath, NULL,
++ DROP_SAME_UNPRIV, 0, 0);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Test broadcast with two unprivileged connections running
++ * under different users.
++ *
++ * Both connections will fail with -ETIMEDOUT.
++ */
++
++ ret = test_policy_priv_by_broadcast(env->buspath, NULL,
++ DROP_OTHER_UNPRIV,
++ -ETIMEDOUT, -ETIMEDOUT);
++ ASSERT_RETURN(ret == 0);
++
++ kdbus_conn_free(conn);
++
++ return ret;
++}
++
++static int test_broadcast_after_policy_upload(struct kdbus_test_env *env)
++{
++ int ret;
++ int efd;
++ eventfd_t event_status = 0;
++ struct kdbus_msg *msg = NULL;
++ struct kdbus_conn *owner_a, *owner_b;
++ struct kdbus_conn *holder_a, *holder_b;
++ struct kdbus_policy_access access = {};
++ uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
++
++ owner_a = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(owner_a);
++
++ ret = kdbus_name_acquire(owner_a, "com.example.broadcastA", NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ /*
++ * Make sure unprivileged bus users cannot talk by default
++ * to privileged ones, unless a policy holder that allows
++ * this was uploaded.
++ */
++
++ ++expected_cookie;
++ ret = test_policy_priv_by_id(env->buspath, owner_a, false,
++ -ETIMEDOUT, -EPERM);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Make sure that privileged won't receive broadcasts unless
++ * it installs a match. It will fail with -ETIMEDOUT
++ *
++ * At the same time check that the unprivileged connection will
++ * not receive the broadcast message from the privileged one
++ * since the privileged one owns a name with a restricted
++ * policy TALK (actually the TALK policy is still not
++ * registered so we fail by default), thus the unprivileged
++ * receiver is not able to TALK to that name.
++ */
++
++ /* Activate matching for a privileged connection */
++ ret = kdbus_add_match_empty(owner_a);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Redo the previous test. The privileged conn owner_a is
++ * able to TALK to any connection so it will receive the
++ * broadcast message now.
++ */
++
++ ret = test_policy_priv_by_broadcast(env->buspath, owner_a,
++ DO_NOT_DROP,
++ 0, -ETIMEDOUT);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Test that broadcasts between two unprivileged connections
++ * running under the same user still succeed.
++ */
++
++ ret = test_policy_priv_by_broadcast(env->buspath, NULL,
++ DROP_SAME_UNPRIV, 0, 0);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Test broadcast with two unprivileged connections running
++ * under different users.
++ *
++ * Both connections will fail with -ETIMEDOUT.
++ */
++
++ ret = test_policy_priv_by_broadcast(env->buspath, NULL,
++ DROP_OTHER_UNPRIV,
++ -ETIMEDOUT, -ETIMEDOUT);
++ ASSERT_RETURN(ret == 0);
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = geteuid(),
++ .access = KDBUS_POLICY_OWN,
++ };
++
++ holder_a = kdbus_hello_registrar(env->buspath,
++ "com.example.broadcastA",
++ &access, 1,
++ KDBUS_HELLO_POLICY_HOLDER);
++ ASSERT_RETURN(holder_a);
++
++ holder_b = kdbus_hello_registrar(env->buspath,
++ "com.example.broadcastB",
++ &access, 1,
++ KDBUS_HELLO_POLICY_HOLDER);
++ ASSERT_RETURN(holder_b);
++
++ /* Free connections and their received messages and restart */
++ kdbus_conn_free(owner_a);
++
++ owner_a = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(owner_a);
++
++ /* Activate matching for a privileged connection */
++ ret = kdbus_add_match_empty(owner_a);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_name_acquire(owner_a, "com.example.broadcastA", NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ owner_b = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(owner_b);
++
++ ret = kdbus_name_acquire(owner_b, "com.example.broadcastB", NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ /* Activate matching for a privileged connection */
++ ret = kdbus_add_match_empty(owner_b);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Test that even if "com.example.broadcastA" and
++ * "com.example.broadcastB" do not have TALK access by default
++ * they are able to signal each other using broadcast due to
++ * the fact they are privileged connections, they receive
++ * all broadcasts if the match allows it.
++ */
++
++ ++expected_cookie;
++ ret = kdbus_msg_send(owner_a, NULL, expected_cookie, 0,
++ 0, 0, KDBUS_DST_ID_BROADCAST);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv_poll(owner_a, 100, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->cookie == expected_cookie);
++
++ /* Check src ID */
++ ASSERT_RETURN(msg->src_id == owner_a->id);
++
++ kdbus_msg_free(msg);
++
++ ret = kdbus_msg_recv_poll(owner_b, 100, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++ ASSERT_RETURN(msg->cookie == expected_cookie);
++
++ /* Check src ID */
++ ASSERT_RETURN(msg->src_id == owner_a->id);
++
++ kdbus_msg_free(msg);
++
++ /* Release name "com.example.broadcastB" */
++
++ ret = kdbus_name_release(owner_b, "com.example.broadcastB");
++ ASSERT_EXIT(ret >= 0);
++
++ /* KDBUS_POLICY_OWN for unprivileged connections */
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_WORLD,
++ .id = geteuid(),
++ .access = KDBUS_POLICY_OWN,
++ };
++
++ /* Update the policy so unprivileged will own the name */
++
++ ret = kdbus_conn_update_policy(holder_b,
++ "com.example.broadcastB",
++ &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Send broadcasts from an unprivileged connection that
++ * owns a name "com.example.broadcastB".
++ *
++ * We'll have four destinations here:
++ *
++ * 1) owner_a: privileged connection that owns
++ * "com.example.broadcastA". It will receive the broadcast
++ * since it is privileged, has default TALK access to all
++ * connections, and is subscribed to the match.
++ * Will succeed.
++ *
++ * 2) owner_b: privileged connection (running under a different
++ * uid) that does not own names, but has an empty broadcast
++ * match, so it will receive broadcasts since it has default
++ * TALK access to all connections.
++ *
++ * 3) unpriv_a: unpriv connection that does not own any name.
++ * It will receive the broadcast since it is running under
++ * the same user as the one broadcasting and did install
++ * matches. It should get the message.
++ *
++ * 4) unpriv_b: unpriv connection that is not interested in
++ * broadcast messages, so it did not install broadcast
++ * matches. Should fail with -ETIMEDOUT.
++ */
++
++ ++expected_cookie;
++ efd = eventfd(0, EFD_CLOEXEC);
++ ASSERT_RETURN_VAL(efd >= 0, efd);
++
++ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
++ struct kdbus_conn *unpriv_owner;
++ struct kdbus_conn *unpriv_a, *unpriv_b;
++
++ unpriv_owner = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_EXIT(unpriv_owner);
++
++ unpriv_a = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_EXIT(unpriv_a);
++
++ unpriv_b = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_EXIT(unpriv_b);
++
++ ret = kdbus_name_acquire(unpriv_owner,
++ "com.example.broadcastB",
++ NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ ret = kdbus_add_match_empty(unpriv_a);
++ ASSERT_EXIT(ret == 0);
++
++ /* Signal that we are doing broadcasts */
++ ret = eventfd_write(efd, 1);
++ ASSERT_EXIT(ret == 0);
++
++ /*
++ * Do broadcast from a connection that owns the
++ * name "com.example.broadcastB".
++ */
++ ret = kdbus_msg_send(unpriv_owner, NULL,
++ expected_cookie,
++ 0, 0, 0,
++ KDBUS_DST_ID_BROADCAST);
++ ASSERT_EXIT(ret == 0);
++
++ /*
++ * Unprivileged connection running under the same
++ * user. It should succeed.
++ */
++ ret = kdbus_msg_recv_poll(unpriv_a, 300, &msg, NULL);
++ ASSERT_EXIT(ret == 0 && msg->cookie == expected_cookie);
++
++ /*
++ * Did not install matches, not interested in
++ * broadcasts
++ */
++ ret = kdbus_msg_recv_poll(unpriv_b, 300, NULL, NULL);
++ ASSERT_EXIT(ret == -ETIMEDOUT);
++ }),
++ ({
++ ret = eventfd_read(efd, &event_status);
++ ASSERT_RETURN(ret >= 0 && event_status == 1);
++
++ /*
++ * owner_a must receive the broadcast even though it
++ * owns name "com.example.broadcastA": it is privileged
++ * and has default TALK access to all connections.
++ */
++ ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* confirm the received cookie */
++ ASSERT_RETURN(msg->cookie == expected_cookie);
++
++ kdbus_msg_free(msg);
++
++ /*
++ * owner_b got the broadcast from an unprivileged
++ * connection.
++ */
++ ret = kdbus_msg_recv_poll(owner_b, 300, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* confirm the received cookie */
++ ASSERT_RETURN(msg->cookie == expected_cookie);
++
++ kdbus_msg_free(msg);
++
++ }));
++ ASSERT_RETURN(ret == 0);
++
++ close(efd);
++
++ /*
++ * Test broadcast with two unprivileged connections running
++ * under different users.
++ *
++ * Both connections will fail with -ETIMEDOUT.
++ */
++
++ ret = test_policy_priv_by_broadcast(env->buspath, NULL,
++ DROP_OTHER_UNPRIV,
++ -ETIMEDOUT, -ETIMEDOUT);
++ ASSERT_RETURN(ret == 0);
++
++ /* Drop received broadcasts by privileged */
++ ret = kdbus_msg_recv_poll(owner_a, 100, NULL, NULL);
++ ret = kdbus_msg_recv_poll(owner_a, 100, NULL, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(owner_a, NULL, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ ret = kdbus_msg_recv_poll(owner_b, 100, NULL, NULL);
++ ret = kdbus_msg_recv_poll(owner_b, 100, NULL, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_recv(owner_b, NULL, NULL);
++ ASSERT_RETURN(ret == -EAGAIN);
++
++ /*
++ * Perform last tests, allow others to talk to name
++ * "com.example.broadcastA". So now receiving broadcasts
++ * from it should succeed since the TALK policy allows it.
++ */
++
++ /* KDBUS_POLICY_TALK for all connections (world access) */
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_WORLD,
++ .id = geteuid(),
++ .access = KDBUS_POLICY_TALK,
++ };
++
++ ret = kdbus_conn_update_policy(holder_a,
++ "com.example.broadcastA",
++ &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Unprivileged is able to TALK to "com.example.broadcastA"
++ * now so it will receive its broadcasts
++ */
++ ret = test_policy_priv_by_broadcast(env->buspath, owner_a,
++ DO_NOT_DROP, 0, 0);
++ ASSERT_RETURN(ret == 0);
++
++ ++expected_cookie;
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "com.example.broadcastB",
++ NULL);
++ ASSERT_EXIT(ret >= 0);
++ ret = kdbus_msg_send(unpriv, NULL, expected_cookie,
++ 0, 0, 0, KDBUS_DST_ID_BROADCAST);
++ ASSERT_EXIT(ret == 0);
++ }));
++ ASSERT_RETURN(ret == 0);
++
++ /* owner_a is privileged, so it will get the broadcast now. */
++ ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* confirm the received cookie */
++ ASSERT_RETURN(msg->cookie == expected_cookie);
++
++ kdbus_msg_free(msg);
++
++ /*
++ * owner_a released name "com.example.broadcastA". It should
++ * receive broadcasts since it is still privileged and has
++ * the right match.
++ *
++ * Unprivileged connection will own a name and will try to
++ * signal to the privileged connection.
++ */
++
++ ret = kdbus_name_release(owner_a, "com.example.broadcastA");
++ ASSERT_EXIT(ret >= 0);
++
++ ++expected_cookie;
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "com.example.broadcastB",
++ NULL);
++ ASSERT_EXIT(ret >= 0);
++ ret = kdbus_msg_send(unpriv, NULL, expected_cookie,
++ 0, 0, 0, KDBUS_DST_ID_BROADCAST);
++ ASSERT_EXIT(ret == 0);
++ }));
++ ASSERT_RETURN(ret == 0);
++
++ /* owner_a will get the broadcast now. */
++ ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
++ ASSERT_RETURN(ret == 0);
++
++ /* confirm the received cookie */
++ ASSERT_RETURN(msg->cookie == expected_cookie);
++
++ kdbus_msg_free(msg);
++
++ kdbus_conn_free(owner_a);
++ kdbus_conn_free(owner_b);
++ kdbus_conn_free(holder_a);
++ kdbus_conn_free(holder_b);
++
++ return 0;
++}
++
++static int test_policy_priv(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *conn_a, *conn_b, *conn, *owner;
++ struct kdbus_policy_access access, *acc;
++ sigset_t sset;
++ size_t num;
++ int ret;
++
++ /*
++ * Make sure we have CAP_SETUID/SETGID so we can drop privileges
++ */
++
++ ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
++ ASSERT_RETURN(ret >= 0);
++
++ if (!ret)
++ return TEST_SKIP;
++
++ /* make sure that uids and gids are mapped */
++ if (!all_uids_gids_are_mapped())
++ return TEST_SKIP;
++
++ /*
++ * Setup:
++ * conn_a: policy holder for com.example.a
++ * conn_b: name holder of com.example.b
++ */
++
++ signal(SIGUSR1, nosig);
++ sigemptyset(&sset);
++ sigaddset(&sset, SIGUSR1);
++ sigprocmask(SIG_BLOCK, &sset, NULL);
++
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn);
++
++ /*
++ * Before registering any policy holder, make sure that the
++ * bus is secure by default. This test is necessary: it catches
++ * several cases where old D-Bus was vulnerable.
++ */
++
++ ret = test_priv_before_policy_upload(env);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Make sure unprivileged users are not able to register policy
++ * holders
++ */
++
++ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
++ struct kdbus_conn *holder;
++
++ holder = kdbus_hello_registrar(env->buspath,
++ "com.example.a", NULL, 0,
++ KDBUS_HELLO_POLICY_HOLDER);
++ ASSERT_EXIT(holder == NULL && errno == EPERM);
++ }),
++ ({ 0; }));
++ ASSERT_RETURN(ret == 0);
++
++
++ /* Register policy holder */
++
++ conn_a = kdbus_hello_registrar(env->buspath, "com.example.a",
++ NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
++ ASSERT_RETURN(conn_a);
++
++ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn_b);
++
++ ret = kdbus_name_acquire(conn_b, "com.example.b", NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ /*
++ * Make sure bus-owners can always acquire names.
++ */
++ ret = kdbus_name_acquire(conn, "com.example.a", NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ kdbus_conn_free(conn);
++
++ /*
++ * Make sure unprivileged users cannot acquire names with default
++ * policy assigned.
++ */
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++ ASSERT_EXIT(ret < 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ /*
++ * Make sure unprivileged users can acquire names if we make them
++ * world-accessible.
++ */
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_WORLD,
++ .id = 0,
++ .access = KDBUS_POLICY_OWN,
++ };
++
++ /*
++ * Make sure unprivileged/normal connections are not able
++ * to update policies
++ */
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_conn_update_policy(unpriv, "com.example.a",
++ &access, 1);
++ ASSERT_EXIT(ret == -EOPNOTSUPP);
++ }));
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++ ASSERT_EXIT(ret >= 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ /*
++ * Make sure unprivileged users can acquire names if we make them
++ * gid-accessible. But only if the gid matches.
++ */
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_GROUP,
++ .id = UNPRIV_GID,
++ .access = KDBUS_POLICY_OWN,
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++ ASSERT_EXIT(ret >= 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_GROUP,
++ .id = 1,
++ .access = KDBUS_POLICY_OWN,
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++ ASSERT_EXIT(ret < 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ /*
++ * Make sure unprivileged users can acquire names if we make them
++ * uid-accessible. But only if the uid matches.
++ */
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = UNPRIV_UID,
++ .access = KDBUS_POLICY_OWN,
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++ ASSERT_EXIT(ret >= 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = 1,
++ .access = KDBUS_POLICY_OWN,
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++ ASSERT_EXIT(ret < 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ /*
++ * Make sure unprivileged users cannot acquire names if no owner-policy
++ * matches, even if SEE/TALK policies match.
++ */
++
++ num = 4;
++ acc = (struct kdbus_policy_access[]){
++ {
++ .type = KDBUS_POLICY_ACCESS_GROUP,
++ .id = UNPRIV_GID,
++ .access = KDBUS_POLICY_SEE,
++ },
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = UNPRIV_UID,
++ .access = KDBUS_POLICY_TALK,
++ },
++ {
++ .type = KDBUS_POLICY_ACCESS_WORLD,
++ .id = 0,
++ .access = KDBUS_POLICY_TALK,
++ },
++ {
++ .type = KDBUS_POLICY_ACCESS_WORLD,
++ .id = 0,
++ .access = KDBUS_POLICY_SEE,
++ },
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.a", acc, num);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++ ASSERT_EXIT(ret < 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ /*
++ * Make sure unprivileged users can acquire names if the only matching
++ * policy is somewhere in the middle.
++ */
++
++ num = 5;
++ acc = (struct kdbus_policy_access[]){
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = 1,
++ .access = KDBUS_POLICY_OWN,
++ },
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = 2,
++ .access = KDBUS_POLICY_OWN,
++ },
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = UNPRIV_UID,
++ .access = KDBUS_POLICY_OWN,
++ },
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = 3,
++ .access = KDBUS_POLICY_OWN,
++ },
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = 4,
++ .access = KDBUS_POLICY_OWN,
++ },
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.a", acc, num);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
++ ASSERT_EXIT(ret >= 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ /*
++ * Clear policies
++ */
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.a", NULL, 0);
++ ASSERT_RETURN(ret == 0);
++
++ /*
++ * Make sure privileged bus users can _always_ talk to others.
++ */
++
++ conn = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn);
++
++ ret = kdbus_msg_send(conn, "com.example.b", 0xdeadbeef, 0, 0, 0, 0);
++ ASSERT_EXIT(ret >= 0);
++
++ ret = kdbus_msg_recv_poll(conn_b, 300, NULL, NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ kdbus_conn_free(conn);
++
++ /*
++ * Make sure unprivileged bus users cannot talk by default.
++ */
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++ 0, 0);
++ ASSERT_EXIT(ret == -EPERM);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ /*
++ * Make sure unprivileged bus users can talk to equals, even without
++ * policy.
++ */
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = UNPRIV_UID,
++ .access = KDBUS_POLICY_OWN,
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.c", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ struct kdbus_conn *owner;
++
++ owner = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(owner);
++
++ ret = kdbus_name_acquire(owner, "com.example.c", NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
++ 0, 0);
++ ASSERT_EXIT(ret >= 0);
++ ret = kdbus_msg_recv_poll(owner, 100, NULL, NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ kdbus_conn_free(owner);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ /*
++ * Make sure unprivileged bus users can talk to privileged users if a
++ * suitable UID policy is set.
++ */
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = UNPRIV_UID,
++ .access = KDBUS_POLICY_TALK,
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++ 0, 0);
++ ASSERT_EXIT(ret >= 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ /*
++ * Make sure unprivileged bus users can talk to privileged users if a
++ * suitable GID policy is set.
++ */
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_GROUP,
++ .id = UNPRIV_GID,
++ .access = KDBUS_POLICY_TALK,
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++ 0, 0);
++ ASSERT_EXIT(ret >= 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ /*
++ * Make sure unprivileged bus users can talk to privileged users if a
++ * suitable WORLD policy is set.
++ */
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_WORLD,
++ .id = 0,
++ .access = KDBUS_POLICY_TALK,
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++ 0, 0);
++ ASSERT_EXIT(ret >= 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ /*
++ * Make sure unprivileged bus users cannot talk to privileged users if
++ * no suitable policy is set.
++ */
++
++ num = 5;
++ acc = (struct kdbus_policy_access[]){
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = 0,
++ .access = KDBUS_POLICY_OWN,
++ },
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = 1,
++ .access = KDBUS_POLICY_TALK,
++ },
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = UNPRIV_UID,
++ .access = KDBUS_POLICY_SEE,
++ },
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = 3,
++ .access = KDBUS_POLICY_TALK,
++ },
++ {
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = 4,
++ .access = KDBUS_POLICY_TALK,
++ },
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.b", acc, num);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++ 0, 0);
++ ASSERT_EXIT(ret == -EPERM);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ /*
++ * Make sure unprivileged bus users can talk to privileged users if a
++ * suitable OWN privilege overwrites TALK.
++ */
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_WORLD,
++ .id = 0,
++ .access = KDBUS_POLICY_OWN,
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++ 0, 0);
++ ASSERT_EXIT(ret >= 0);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ /*
++ * Make sure the TALK cache is reset correctly when policies are
++ * updated.
++ */
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_WORLD,
++ .id = 0,
++ .access = KDBUS_POLICY_TALK,
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
++ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++ 0, 0);
++ ASSERT_EXIT(ret >= 0);
++
++ ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.b",
++ NULL, 0);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
++ 0, 0);
++ ASSERT_EXIT(ret == -EPERM);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++ /*
++ * Make sure the TALK cache is reset correctly when policy holders
++ * disconnect.
++ */
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_WORLD,
++ .id = 0,
++ .access = KDBUS_POLICY_OWN,
++ };
++
++ conn = kdbus_hello_registrar(env->buspath, "com.example.c",
++ NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
++ ASSERT_RETURN(conn);
++
++ ret = kdbus_conn_update_policy(conn, "com.example.c", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ owner = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(owner);
++
++ ret = kdbus_name_acquire(owner, "com.example.c", NULL);
++ ASSERT_RETURN(ret >= 0);
++
++ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
++ struct kdbus_conn *unpriv;
++
++ /* wait for parent to be finished */
++ sigemptyset(&sset);
++ ret = sigsuspend(&sset);
++ ASSERT_RETURN(ret == -1 && errno == EINTR);
++
++ unpriv = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(unpriv);
++
++ ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
++ 0, 0);
++ ASSERT_EXIT(ret >= 0);
++
++ ret = kdbus_msg_recv_poll(owner, 100, NULL, NULL);
++ ASSERT_EXIT(ret >= 0);
++
++ /* free policy holder */
++ kdbus_conn_free(conn);
++
++ ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
++ 0, 0);
++ ASSERT_EXIT(ret == -EPERM);
++
++ kdbus_conn_free(unpriv);
++ }), ({
++ /* make sure policy holder is only valid in child */
++ kdbus_conn_free(conn);
++ kill(pid, SIGUSR1);
++ }));
++ ASSERT_RETURN(ret >= 0);
++
++
++ /*
++ * The following tests are necessary.
++ */
++
++ ret = test_broadcast_after_policy_upload(env);
++ ASSERT_RETURN(ret == 0);
++
++ kdbus_conn_free(owner);
++
++ /*
++ * cleanup resources
++ */
++
++ kdbus_conn_free(conn_b);
++ kdbus_conn_free(conn_a);
++
++ return TEST_OK;
++}
++
++int kdbus_test_policy_priv(struct kdbus_test_env *env)
++{
++ pid_t pid;
++ int ret;
++
++ /* make sure to exit() if a child returns from fork() */
++ pid = getpid();
++ ret = test_policy_priv(env);
++ if (pid != getpid())
++ exit(1);
++
++ return ret;
++}
+diff --git a/tools/testing/selftests/kdbus/test-policy.c b/tools/testing/selftests/kdbus/test-policy.c
+new file mode 100644
+index 0000000..96d20d5
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-policy.c
+@@ -0,0 +1,80 @@
++#include <errno.h>
++#include <stdio.h>
++#include <string.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stdint.h>
++#include <stdbool.h>
++#include <unistd.h>
++
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++int kdbus_test_policy(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *conn_a, *conn_b;
++ struct kdbus_policy_access access;
++ int ret;
++
++ /* Invalid name */
++ conn_a = kdbus_hello_registrar(env->buspath, ".example.a",
++ NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
++ ASSERT_RETURN(conn_a == NULL);
++
++ conn_a = kdbus_hello_registrar(env->buspath, "example",
++ NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
++ ASSERT_RETURN(conn_a == NULL);
++
++ conn_a = kdbus_hello_registrar(env->buspath, "com.example.a",
++ NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
++ ASSERT_RETURN(conn_a);
++
++ conn_b = kdbus_hello_registrar(env->buspath, "com.example.b",
++ NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
++ ASSERT_RETURN(conn_b);
++
++ /*
++ * Verify there cannot be any duplicate entries, except for specific vs.
++ * wildcard entries.
++ */
++
++ access = (struct kdbus_policy_access){
++ .type = KDBUS_POLICY_ACCESS_USER,
++ .id = geteuid(),
++ .access = KDBUS_POLICY_SEE,
++ };
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_conn_update_policy(conn_b, "com.example.a", &access, 1);
++ ASSERT_RETURN(ret == -EEXIST);
++
++ ret = kdbus_conn_update_policy(conn_b, "com.example.a.*", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.a.*", &access, 1);
++ ASSERT_RETURN(ret == -EEXIST);
++
++ ret = kdbus_conn_update_policy(conn_a, "com.example.*", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_conn_update_policy(conn_b, "com.example.a", &access, 1);
++ ASSERT_RETURN(ret == 0);
++
++ ret = kdbus_conn_update_policy(conn_b, "com.example.*", &access, 1);
++ ASSERT_RETURN(ret == -EEXIST);
++
++ /* Invalid name */
++ ret = kdbus_conn_update_policy(conn_b, ".example.*", &access, 1);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ ret = kdbus_conn_update_policy(conn_b, "example", &access, 1);
++ ASSERT_RETURN(ret == -EINVAL);
++
++ kdbus_conn_free(conn_b);
++ kdbus_conn_free(conn_a);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-sync.c b/tools/testing/selftests/kdbus/test-sync.c
+new file mode 100644
+index 0000000..0655a54
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-sync.c
+@@ -0,0 +1,369 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <pthread.h>
++#include <stdbool.h>
++#include <signal.h>
++#include <sys/wait.h>
++#include <sys/eventfd.h>
++
++#include "kdbus-api.h"
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++static struct kdbus_conn *conn_a, *conn_b;
++static unsigned int cookie = 0xdeadbeef;
++
++static void nop_handler(int sig) {}
++
++static int interrupt_sync(struct kdbus_conn *conn_src,
++ struct kdbus_conn *conn_dst)
++{
++ pid_t pid;
++ int ret, status;
++ struct kdbus_msg *msg = NULL;
++ struct sigaction sa = {
++ .sa_handler = nop_handler,
++ .sa_flags = SA_NOCLDSTOP|SA_RESTART,
++ };
++
++ cookie++;
++ pid = fork();
++ ASSERT_RETURN_VAL(pid >= 0, pid);
++
++ if (pid == 0) {
++ ret = sigaction(SIGINT, &sa, NULL);
++ ASSERT_EXIT(ret == 0);
++
++ ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
++ KDBUS_MSG_EXPECT_REPLY,
++ 100000000ULL, 0, conn_src->id, -1);
++ ASSERT_EXIT(ret == -ETIMEDOUT);
++
++ _exit(EXIT_SUCCESS);
++ }
++
++ ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
++ ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
++
++ kdbus_msg_free(msg);
++
++ ret = kill(pid, SIGINT);
++ ASSERT_RETURN_VAL(ret == 0, ret);
++
++ ret = waitpid(pid, &status, 0);
++ ASSERT_RETURN_VAL(ret >= 0, ret);
++
++ if (WIFSIGNALED(status))
++ return TEST_ERR;
++
++ ret = kdbus_msg_recv_poll(conn_src, 100, NULL, NULL);
++ ASSERT_RETURN(ret == -ETIMEDOUT);
++
++ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++static int close_epipe_sync(const char *bus)
++{
++ pid_t pid;
++ int ret, status;
++ struct kdbus_conn *conn_src;
++ struct kdbus_conn *conn_dst;
++ struct kdbus_msg *msg = NULL;
++
++ conn_src = kdbus_hello(bus, 0, NULL, 0);
++ ASSERT_RETURN(conn_src);
++
++ ret = kdbus_add_match_empty(conn_src);
++ ASSERT_RETURN(ret == 0);
++
++ conn_dst = kdbus_hello(bus, 0, NULL, 0);
++ ASSERT_RETURN(conn_dst);
++
++ cookie++;
++ pid = fork();
++ ASSERT_RETURN_VAL(pid >= 0, pid);
++
++ if (pid == 0) {
++ uint64_t dst_id;
++
++ /* close our reference */
++ dst_id = conn_dst->id;
++ kdbus_conn_free(conn_dst);
++
++ ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
++ ASSERT_EXIT(ret == 0 && msg->cookie == cookie);
++ ASSERT_EXIT(msg->src_id == dst_id);
++
++ cookie++;
++ ret = kdbus_msg_send_sync(conn_src, NULL, cookie,
++ KDBUS_MSG_EXPECT_REPLY,
++ 100000000ULL, 0, dst_id, -1);
++ ASSERT_EXIT(ret == -EPIPE);
++
++ _exit(EXIT_SUCCESS);
++ }
++
++ ret = kdbus_msg_send(conn_dst, NULL, cookie, 0, 0, 0,
++ KDBUS_DST_ID_BROADCAST);
++ ASSERT_RETURN(ret == 0);
++
++ cookie++;
++ ret = kdbus_msg_recv_poll(conn_dst, 100, &msg, NULL);
++ ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
++
++ kdbus_msg_free(msg);
++
++ /* destroy connection */
++ kdbus_conn_free(conn_dst);
++ kdbus_conn_free(conn_src);
++
++ ret = waitpid(pid, &status, 0);
++ ASSERT_RETURN_VAL(ret >= 0, ret);
++
++ if (!WIFEXITED(status))
++ return TEST_ERR;
++
++ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++static int cancel_fd_sync(struct kdbus_conn *conn_src,
++ struct kdbus_conn *conn_dst)
++{
++ pid_t pid;
++ int cancel_fd;
++ int ret, status;
++ uint64_t counter = 1;
++ struct kdbus_msg *msg = NULL;
++
++ cancel_fd = eventfd(0, 0);
++ ASSERT_RETURN_VAL(cancel_fd >= 0, cancel_fd);
++
++ cookie++;
++ pid = fork();
++ ASSERT_RETURN_VAL(pid >= 0, pid);
++
++ if (pid == 0) {
++ ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
++ KDBUS_MSG_EXPECT_REPLY,
++ 100000000ULL, 0, conn_src->id,
++ cancel_fd);
++ ASSERT_EXIT(ret == -ECANCELED);
++
++ _exit(EXIT_SUCCESS);
++ }
++
++ ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
++ ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
++
++ kdbus_msg_free(msg);
++
++ ret = write(cancel_fd, &counter, sizeof(counter));
++ ASSERT_RETURN(ret == sizeof(counter));
++
++ ret = waitpid(pid, &status, 0);
++ ASSERT_RETURN_VAL(ret >= 0, ret);
++
++ if (WIFSIGNALED(status))
++ return TEST_ERR;
++
++ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
++}
++
++static int no_cancel_sync(struct kdbus_conn *conn_src,
++ struct kdbus_conn *conn_dst)
++{
++ pid_t pid;
++ int cancel_fd;
++ int ret, status;
++ struct kdbus_msg *msg = NULL;
++
++ /* pass eventfd, but never signal it so it shouldn't have any effect */
++
++ cancel_fd = eventfd(0, 0);
++ ASSERT_RETURN_VAL(cancel_fd >= 0, cancel_fd);
++
++ cookie++;
++ pid = fork();
++ ASSERT_RETURN_VAL(pid >= 0, pid);
++
++ if (pid == 0) {
++ ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
++ KDBUS_MSG_EXPECT_REPLY,
++ 100000000ULL, 0, conn_src->id,
++ cancel_fd);
++ ASSERT_EXIT(ret == 0);
++
++ _exit(EXIT_SUCCESS);
++ }
++
++ ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
++ ASSERT_RETURN_VAL(ret == 0 && msg->cookie == cookie, -1);
++
++ kdbus_msg_free(msg);
++
++ ret = kdbus_msg_send_reply(conn_src, cookie, conn_dst->id);
++ ASSERT_RETURN_VAL(ret >= 0, ret);
++
++ ret = waitpid(pid, &status, 0);
++ ASSERT_RETURN_VAL(ret >= 0, ret);
++
++ if (WIFSIGNALED(status))
++ return -1;
++
++ return (status == EXIT_SUCCESS) ? 0 : -1;
++}
++
++static void *run_thread_reply(void *data)
++{
++ int ret;
++ unsigned long status = TEST_OK;
++
++ ret = kdbus_msg_recv_poll(conn_a, 3000, NULL, NULL);
++ if (ret < 0)
++ goto exit_thread;
++
++ kdbus_printf("Thread received message, sending reply ...\n");
++
++ /* using an unknown cookie must fail */
++ ret = kdbus_msg_send_reply(conn_a, ~cookie, conn_b->id);
++ if (ret != -EBADSLT) {
++ status = TEST_ERR;
++ goto exit_thread;
++ }
++
++ ret = kdbus_msg_send_reply(conn_a, cookie, conn_b->id);
++ if (ret != 0) {
++ status = TEST_ERR;
++ goto exit_thread;
++ }
++
++exit_thread:
++ pthread_exit((void *) status);
++ return (void *) status;
++}
++
++int kdbus_test_sync_reply(struct kdbus_test_env *env)
++{
++ unsigned long status;
++ pthread_t thread;
++ int ret;
++
++ conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
++ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn_a && conn_b);
++
++ pthread_create(&thread, NULL, run_thread_reply, NULL);
++
++ ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
++ KDBUS_MSG_EXPECT_REPLY,
++ 5000000000ULL, 0, conn_a->id, -1);
++
++ pthread_join(thread, (void *) &status);
++ ASSERT_RETURN(status == 0);
++ ASSERT_RETURN(ret == 0);
++
++ ret = interrupt_sync(conn_a, conn_b);
++ ASSERT_RETURN(ret == 0);
++
++ ret = close_epipe_sync(env->buspath);
++ ASSERT_RETURN(ret == 0);
++
++ ret = cancel_fd_sync(conn_a, conn_b);
++ ASSERT_RETURN(ret == 0);
++
++ ret = no_cancel_sync(conn_a, conn_b);
++ ASSERT_RETURN(ret == 0);
++
++ kdbus_printf("-- closing bus connections\n");
++
++ kdbus_conn_free(conn_a);
++ kdbus_conn_free(conn_b);
++
++ return TEST_OK;
++}
++
++#define BYEBYE_ME ((void*)0L)
++#define BYEBYE_THEM ((void*)1L)
++
++static void *run_thread_byebye(void *data)
++{
++ struct kdbus_cmd cmd_byebye = { .size = sizeof(cmd_byebye) };
++ int ret;
++
++ ret = kdbus_msg_recv_poll(conn_a, 3000, NULL, NULL);
++ if (ret == 0) {
++ kdbus_printf("Thread received message, invoking BYEBYE ...\n");
++ kdbus_msg_recv(conn_a, NULL, NULL);
++ if (data == BYEBYE_ME)
++ kdbus_cmd_byebye(conn_b->fd, &cmd_byebye);
++ else if (data == BYEBYE_THEM)
++ kdbus_cmd_byebye(conn_a->fd, &cmd_byebye);
++ }
++
++ pthread_exit(NULL);
++ return NULL;
++}
++
++int kdbus_test_sync_byebye(struct kdbus_test_env *env)
++{
++ pthread_t thread;
++ int ret;
++
++ /*
++ * This sends a synchronous message to a thread, which waits until it
++ * has received the message and then invokes BYEBYE on the *ORIGINAL*
++ * connection. That is, on the same connection that synchronously waits
++ * for a reply.
++ * This should properly wake the connection up and cause ECONNRESET as
++ * the connection is disconnected now.
++ *
++ * The second time, we do the same but invoke BYEBYE on the *TARGET*
++ * connection. This should also wake up the synchronous sender as the
++ * reply cannot be sent by a disconnected target.
++ */
++
++ conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
++ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn_a && conn_b);
++
++ pthread_create(&thread, NULL, run_thread_byebye, BYEBYE_ME);
++
++ ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
++ KDBUS_MSG_EXPECT_REPLY,
++ 5000000000ULL, 0, conn_a->id, -1);
++
++ ASSERT_RETURN(ret == -ECONNRESET);
++
++ pthread_join(thread, NULL);
++
++ kdbus_conn_free(conn_a);
++ kdbus_conn_free(conn_b);
++
++ conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
++ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn_a && conn_b);
++
++ pthread_create(&thread, NULL, run_thread_byebye, BYEBYE_THEM);
++
++ ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
++ KDBUS_MSG_EXPECT_REPLY,
++ 5000000000ULL, 0, conn_a->id, -1);
++
++ ASSERT_RETURN(ret == -EPIPE);
++
++ pthread_join(thread, NULL);
++
++ kdbus_conn_free(conn_a);
++ kdbus_conn_free(conn_b);
++
++ return TEST_OK;
++}
+diff --git a/tools/testing/selftests/kdbus/test-timeout.c b/tools/testing/selftests/kdbus/test-timeout.c
+new file mode 100644
+index 0000000..cfd1930
+--- /dev/null
++++ b/tools/testing/selftests/kdbus/test-timeout.c
+@@ -0,0 +1,99 @@
++#include <stdio.h>
++#include <string.h>
++#include <time.h>
++#include <fcntl.h>
++#include <stdlib.h>
++#include <stddef.h>
++#include <unistd.h>
++#include <stdint.h>
++#include <errno.h>
++#include <assert.h>
++#include <poll.h>
++#include <stdbool.h>
++
++#include "kdbus-api.h"
++#include "kdbus-test.h"
++#include "kdbus-util.h"
++#include "kdbus-enum.h"
++
++int timeout_msg_recv(struct kdbus_conn *conn, uint64_t *expected)
++{
++ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
++ struct kdbus_msg *msg;
++ int ret;
++
++ ret = kdbus_cmd_recv(conn->fd, &recv);
++ if (ret < 0) {
++ kdbus_printf("error receiving message: %d (%m)\n", ret);
++ return ret;
++ }
++
++ msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
++
++ ASSERT_RETURN_VAL(msg->payload_type == KDBUS_PAYLOAD_KERNEL, -EINVAL);
++ ASSERT_RETURN_VAL(msg->src_id == KDBUS_SRC_ID_KERNEL, -EINVAL);
++ ASSERT_RETURN_VAL(msg->dst_id == conn->id, -EINVAL);
++
++ *expected &= ~(1ULL << msg->cookie_reply);
++ kdbus_printf("Got message timeout for cookie %llu\n",
++ msg->cookie_reply);
++
++ ret = kdbus_free(conn, recv.msg.offset);
++ if (ret < 0)
++ return ret;
++
++ return 0;
++}
++
++int kdbus_test_timeout(struct kdbus_test_env *env)
++{
++ struct kdbus_conn *conn_a, *conn_b;
++ struct pollfd fd;
++ int ret, i, n_msgs = 4;
++ uint64_t expected = 0;
++ uint64_t cookie = 0xdeadbeef;
++
++ conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
++ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
++ ASSERT_RETURN(conn_a && conn_b);
++
++ fd.fd = conn_b->fd;
++
++ /*
++ * send messages that expect a reply (within (i + 1) * 100 msec),
++ * but never answer them.
++ */
++ for (i = 0; i < n_msgs; i++, cookie++) {
++ kdbus_printf("Sending message with cookie %llu ...\n",
++ (unsigned long long)cookie);
++ ASSERT_RETURN(kdbus_msg_send(conn_b, NULL, cookie,
++ KDBUS_MSG_EXPECT_REPLY,
++ (i + 1) * 100ULL * 1000000ULL, 0,
++ conn_a->id) == 0);
++ expected |= 1ULL << cookie;
++ }
++
++ for (;;) {
++ fd.events = POLLIN | POLLPRI | POLLHUP;
++ fd.revents = 0;
++
++ ret = poll(&fd, 1, (n_msgs + 1) * 100);
++ if (ret == 0)
++ kdbus_printf("--- timeout\n");
++ if (ret <= 0)
++ break;
++
++ if (fd.revents & POLLIN)
++ ASSERT_RETURN(!timeout_msg_recv(conn_b, &expected));
++
++ if (expected == 0)
++ break;
++ }
++
++ ASSERT_RETURN(expected == 0);
++
++ kdbus_conn_free(conn_a);
++ kdbus_conn_free(conn_b);
++
++ return TEST_OK;
++}
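The timeout test above records outstanding cookies with `expected |= 1ULL << cookie` and clears them with `msg->cookie_reply`, while the cookies start at 0xdeadbeef. In C, shifting a 64-bit value by 64 or more is undefined, so the result depends on the compiler and CPU. A minimal sketch of one way to keep such a bitmask well defined is to index bits by the cookie's offset from the first cookie. The helper names (`pending_mark`, `pending_clear`), the `MAX_PENDING` constant and the `base` parameter are assumptions for illustration only; they are not part of the kdbus selftests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from the kdbus sources. */
#define MAX_PENDING 64	/* one bit per outstanding reply */

/* Set the bit for 'cookie', counted from the first cookie 'base'. */
static inline uint64_t pending_mark(uint64_t mask, uint64_t cookie,
				    uint64_t base)
{
	uint64_t idx = cookie - base;

	/* Also catches cookie < base, which wraps to a huge index. */
	assert(idx < MAX_PENDING);
	return mask | (UINT64_C(1) << idx);
}

/* Clear the bit for 'cookie' once its timeout notification arrived. */
static inline uint64_t pending_clear(uint64_t mask, uint64_t cookie,
				     uint64_t base)
{
	uint64_t idx = cookie - base;

	assert(idx < MAX_PENDING);
	return mask & ~(UINT64_C(1) << idx);
}
```

With four messages sent using cookies base through base + 3, the mask starts at 0xF and reaches 0 once every timeout notification has been cleared, which is the condition the test loop waits for.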
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-02 16:34 Mike Pagano
From: Mike Pagano @ 2015-09-02 16:34 UTC
To: gentoo-commits
commit: eddb8464aeb6e997d68c122086cacafc96684e3b
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Wed Sep 2 16:34:29 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Wed Sep 2 16:34:29 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=eddb8464
workqueue: Make flush_workqueue() available again to non GPL modules
2710_flush-workqueue-non-GPL-availability.patch | 33 +++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/2710_flush-workqueue-non-GPL-availability.patch b/2710_flush-workqueue-non-GPL-availability.patch
new file mode 100644
index 0000000..3e017d4
--- /dev/null
+++ b/2710_flush-workqueue-non-GPL-availability.patch
@@ -0,0 +1,33 @@
+From 1dadafa86a779884f14a6e7a3ddde1a57b0a0a65 Mon Sep 17 00:00:00 2001
+From: Tim Gardner <tim.gardner@canonical.com>
+Date: Tue, 4 Aug 2015 11:26:04 -0600
+Subject: workqueue: Make flush_workqueue() available again to non GPL modules
+
+Commit 37b1ef31a568fc02e53587620226e5f3c66454c8 ("workqueue: move
+flush_scheduled_work() to workqueue.h") moved the exported non GPL
+flush_scheduled_work() from a function to an inline wrapper.
+Unfortunately, it directly calls flush_workqueue() which is a GPL function.
+This has the effect of changing the licensing requirement for this function
+and makes it unavailable to non GPL modules.
+
+See commit ad7b1f841f8a54c6d61ff181451f55b68175e15a ("workqueue: Make
+schedule_work() available again to non GPL modules") for precedent.
+
+Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
+Signed-off-by: Tejun Heo <tj@kernel.org>
+
+diff --git a/kernel/workqueue.c b/kernel/workqueue.c
+index 4c4f061..a413acb 100644
+--- a/kernel/workqueue.c
++++ b/kernel/workqueue.c
+@@ -2614,7 +2614,7 @@ void flush_workqueue(struct workqueue_struct *wq)
+ out_unlock:
+ mutex_unlock(&wq->mutex);
+ }
+-EXPORT_SYMBOL_GPL(flush_workqueue);
++EXPORT_SYMBOL(flush_workqueue);
+
+ /**
+ * drain_workqueue - drain a workqueue
+--
+cgit v0.10.2
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-15 12:31 Mike Pagano
From: Mike Pagano @ 2015-09-15 12:31 UTC
To: gentoo-commits
commit: f5a88481980ca0cbc0f981717e2368c486afa34c
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Tue Sep 15 12:31:35 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Tue Sep 15 12:31:35 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=f5a88481
BFQ v7r9 for 4.2
0000_README | 12 +
...roups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch | 103 +
...introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1 | 7026 ++++++++++++++++++++
...Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch | 1097 +++
4 files changed, 8238 insertions(+)
diff --git a/0000_README b/0000_README
index 9022e99..0f4cdca 100644
--- a/0000_README
+++ b/0000_README
@@ -75,6 +75,18 @@ Patch: 5000_enable-additional-cpu-optimizations-for-gcc.patch
From: https://github.com/graysky2/kernel_gcc_patch/
Desc: Kernel patch enables gcc < v4.9 optimizations for additional CPUs.
+Patch: 5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch
+From: http://algo.ing.unimo.it/people/paolo/disk_sched/
+Desc: BFQ v7r9 patch 1 for 4.2: Build, cgroups and kconfig bits
+
+Patch: 5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1
+From: http://algo.ing.unimo.it/people/paolo/disk_sched/
+Desc: BFQ v7r9 patch 2 for 4.2: BFQ Scheduler
+
+Patch: 5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.0.patch
+From: http://algo.ing.unimo.it/people/paolo/disk_sched/
+Desc: BFQ v7r9 patch 3 for 4.2: Early Queue Merge (EQM)
+
Patch: 5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
From: https://github.com/graysky2/kernel_gcc_patch/
Desc: Kernel patch enables gcc >= v4.9 optimizations for additional CPUs.
diff --git a/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch b/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch
new file mode 100644
index 0000000..fc7ef8e
--- /dev/null
+++ b/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch
@@ -0,0 +1,103 @@
+From f53ecde45f8d40a343aa5b5195e9f0944b7a1a37 Mon Sep 17 00:00:00 2001
+From: Paolo Valente <paolo.valente@unimore.it>
+Date: Tue, 7 Apr 2015 13:39:12 +0200
+Subject: [PATCH 1/3] block: cgroups, kconfig, build bits for BFQ-v7r9-4.2
+
+Update Kconfig.iosched and do the related Makefile changes to include
+kernel configuration options for BFQ. Also increase the number of
+policies supported by the blkio controller so that BFQ can add its
+own.
+
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
+Signed-off-by: Arianna Avanzini <avanzini@google.com>
+---
+ block/Kconfig.iosched | 32 ++++++++++++++++++++++++++++++++
+ block/Makefile | 1 +
+ include/linux/blkdev.h | 2 +-
+ 3 files changed, 34 insertions(+), 1 deletion(-)
+
+diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
+index 421bef9..0ee5f0f 100644
+--- a/block/Kconfig.iosched
++++ b/block/Kconfig.iosched
+@@ -39,6 +39,27 @@ config CFQ_GROUP_IOSCHED
+ ---help---
+ Enable group IO scheduling in CFQ.
+
++config IOSCHED_BFQ
++ tristate "BFQ I/O scheduler"
++ default n
++ ---help---
++ The BFQ I/O scheduler tries to distribute bandwidth among
++ all processes according to their weights.
++ It aims at distributing the bandwidth as desired, independently of
++ the disk parameters and with any workload. It also tries to
++ guarantee low latency to interactive and soft real-time
++ applications. If compiled built-in (saying Y here), BFQ can
++ be configured to support hierarchical scheduling.
++
++config CGROUP_BFQIO
++ bool "BFQ hierarchical scheduling support"
++ depends on CGROUPS && IOSCHED_BFQ=y
++ default n
++ ---help---
++ Enable hierarchical scheduling in BFQ, using the cgroups
++ filesystem interface. The name of the subsystem will be
++ bfqio.
++
+ choice
+ prompt "Default I/O scheduler"
+ default DEFAULT_CFQ
+@@ -52,6 +73,16 @@ choice
+ config DEFAULT_CFQ
+ bool "CFQ" if IOSCHED_CFQ=y
+
++ config DEFAULT_BFQ
++ bool "BFQ" if IOSCHED_BFQ=y
++ help
++ Selects BFQ as the default I/O scheduler which will be
++ used by default for all block devices.
++ The BFQ I/O scheduler aims at distributing the bandwidth
++ as desired, independently of the disk parameters and with
++ any workload. It also tries to guarantee low latency to
++ interactive and soft real-time applications.
++
+ config DEFAULT_NOOP
+ bool "No-op"
+
+@@ -61,6 +92,7 @@ config DEFAULT_IOSCHED
+ string
+ default "deadline" if DEFAULT_DEADLINE
+ default "cfq" if DEFAULT_CFQ
++ default "bfq" if DEFAULT_BFQ
+ default "noop" if DEFAULT_NOOP
+
+ endmenu
+diff --git a/block/Makefile b/block/Makefile
+index 00ecc97..1ed86d5 100644
+--- a/block/Makefile
++++ b/block/Makefile
+@@ -18,6 +18,7 @@ obj-$(CONFIG_BLK_DEV_THROTTLING) += blk-throttle.o
+ obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
+ obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
+ obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
++obj-$(CONFIG_IOSCHED_BFQ) += bfq-iosched.o
+
+ obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
+ obj-$(CONFIG_BLK_CMDLINE_PARSER) += cmdline-parser.o
+diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
+index a622f27..e2b4c03 100644
+--- a/include/linux/blkdev.h
++++ b/include/linux/blkdev.h
+@@ -43,7 +43,7 @@ struct blk_flush_queue;
+ * Maximum number of blkcg policies allowed to be registered concurrently.
+ * Defined here to simplify include dependency.
+ */
+-#define BLKCG_MAX_POLS 2
++#define BLKCG_MAX_POLS 3
+
+ struct request;
+ typedef void (rq_end_io_fn)(struct request *, int);
+--
+2.1.4
+
diff --git a/5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1 b/5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1
new file mode 100644
index 0000000..04dd37c
--- /dev/null
+++ b/5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1
@@ -0,0 +1,7026 @@
+From 152cacc8a71a6cd7fe8cedc1110a378721e66ffa Mon Sep 17 00:00:00 2001
+From: Paolo Valente <paolo.valente@unimore.it>
+Date: Thu, 9 May 2013 19:10:02 +0200
+Subject: [PATCH 2/3] block: introduce the BFQ-v7r9 I/O sched for 4.2
+
+Add the BFQ-v7r9 I/O scheduler to 4.2.
+The general structure is borrowed from CFQ, as much of the code for
+handling I/O contexts. Over time, several useful features have been
+ported from CFQ as well (details in the changelog in README.BFQ). A
+(bfq_)queue is associated with each task doing I/O on a device, and each
+time a scheduling decision has to be made a queue is selected and served
+until it expires.
+
+ - Slices are given in the service domain: tasks are assigned
+ budgets, measured in number of sectors. Once granted the disk, a
+ task must however consume its assigned budget within a configurable
+ maximum time (by default, the maximum possible value of the
+ budgets is automatically computed to comply with this timeout).
+ This allows the desired latency vs "throughput boosting" tradeoff
+ to be set.
+
+ - Budgets are scheduled according to a variant of WF2Q+, implemented
+ using an augmented rb-tree to take eligibility into account while
+ preserving an O(log N) overall complexity.
+
+ - A low-latency tunable is provided; if enabled, both interactive
+ and soft real-time applications are guaranteed a very low latency.
+
+ - Latency guarantees are preserved also in the presence of NCQ.
+
+ - Also with flash-based devices, a high throughput is achieved
+ while still preserving latency guarantees.
+
+ - BFQ features Early Queue Merge (EQM), a sort of fusion of the
+ cooperating-queue-merging and the preemption mechanisms present
+ in CFQ. EQM is in fact a unified mechanism that tries to get a
+ sequential read pattern, and hence a high throughput, with any
+ set of processes performing interleaved I/O over a contiguous
+ sequence of sectors.
+
+ - BFQ supports full hierarchical scheduling, exporting a cgroups
+ interface. Since each node has a full scheduler, each group can
+ be assigned its own weight.
+
+ - If the cgroups interface is not used, only I/O priorities can be
+ assigned to processes, with ioprio values mapped to weights
+ with the relation weight = IOPRIO_BE_NR - ioprio.
+
+ - ioprio classes are served in strict priority order, i.e., lower
+ priority queues are not served as long as there are higher
+ priority queues. Among queues in the same class the bandwidth is
+ distributed in proportion to the weight of each queue. A very
+ thin extra bandwidth is however guaranteed to the Idle class, to
+ prevent it from starving.
+
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
+Signed-off-by: Arianna Avanzini <avanzini@google.com>
+---
+ block/Kconfig.iosched | 6 +-
+ block/bfq-cgroup.c | 1108 +++++++++++++++
+ block/bfq-ioc.c | 36 +
+ block/bfq-iosched.c | 3753 +++++++++++++++++++++++++++++++++++++++++++++++++
+ block/bfq-sched.c | 1197 ++++++++++++++++
+ block/bfq.h | 807 +++++++++++
+ 6 files changed, 6903 insertions(+), 4 deletions(-)
+ create mode 100644 block/bfq-cgroup.c
+ create mode 100644 block/bfq-ioc.c
+ create mode 100644 block/bfq-iosched.c
+ create mode 100644 block/bfq-sched.c
+ create mode 100644 block/bfq.h
+
+diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
+index 0ee5f0f..f78cd1a 100644
+--- a/block/Kconfig.iosched
++++ b/block/Kconfig.iosched
+@@ -51,14 +51,12 @@ config IOSCHED_BFQ
+ applications. If compiled built-in (saying Y here), BFQ can
+ be configured to support hierarchical scheduling.
+
+-config CGROUP_BFQIO
++config BFQ_GROUP_IOSCHED
+ bool "BFQ hierarchical scheduling support"
+ depends on CGROUPS && IOSCHED_BFQ=y
+ default n
+ ---help---
+- Enable hierarchical scheduling in BFQ, using the cgroups
+- filesystem interface. The name of the subsystem will be
+- bfqio.
++ Enable hierarchical scheduling in BFQ, using the blkio controller.
+
+ choice
+ prompt "Default I/O scheduler"
+diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
+new file mode 100644
+index 0000000..c02d65a
+--- /dev/null
++++ b/block/bfq-cgroup.c
+@@ -0,0 +1,1108 @@
++/*
++ * BFQ: CGROUPS support.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ * Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
++ * file.
++ */
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++
++/* bfqg stats flags */
++enum bfqg_stats_flags {
++ BFQG_stats_waiting = 0,
++ BFQG_stats_idling,
++ BFQG_stats_empty,
++};
++
++#define BFQG_FLAG_FNS(name) \
++static void bfqg_stats_mark_##name(struct bfqg_stats *stats) \
++{ \
++ stats->flags |= (1 << BFQG_stats_##name); \
++} \
++static void bfqg_stats_clear_##name(struct bfqg_stats *stats) \
++{ \
++ stats->flags &= ~(1 << BFQG_stats_##name); \
++} \
++static int bfqg_stats_##name(struct bfqg_stats *stats) \
++{ \
++ return (stats->flags & (1 << BFQG_stats_##name)) != 0; \
++} \
++
++BFQG_FLAG_FNS(waiting)
++BFQG_FLAG_FNS(idling)
++BFQG_FLAG_FNS(empty)
++#undef BFQG_FLAG_FNS
++
++/* This should be called with the queue_lock held. */
++static void bfqg_stats_update_group_wait_time(struct bfqg_stats *stats)
++{
++ unsigned long long now;
++
++ if (!bfqg_stats_waiting(stats))
++ return;
++
++ now = sched_clock();
++ if (time_after64(now, stats->start_group_wait_time))
++ blkg_stat_add(&stats->group_wait_time,
++ now - stats->start_group_wait_time);
++ bfqg_stats_clear_waiting(stats);
++}
++
++/* This should be called with the queue_lock held. */
++static void bfqg_stats_set_start_group_wait_time(struct bfq_group *bfqg,
++ struct bfq_group *curr_bfqg)
++{
++ struct bfqg_stats *stats = &bfqg->stats;
++
++ if (bfqg_stats_waiting(stats))
++ return;
++ if (bfqg == curr_bfqg)
++ return;
++ stats->start_group_wait_time = sched_clock();
++ bfqg_stats_mark_waiting(stats);
++}
++
++/* This should be called with the queue_lock held. */
++static void bfqg_stats_end_empty_time(struct bfqg_stats *stats)
++{
++ unsigned long long now;
++
++ if (!bfqg_stats_empty(stats))
++ return;
++
++ now = sched_clock();
++ if (time_after64(now, stats->start_empty_time))
++ blkg_stat_add(&stats->empty_time,
++ now - stats->start_empty_time);
++ bfqg_stats_clear_empty(stats);
++}
++
++static void bfqg_stats_update_dequeue(struct bfq_group *bfqg)
++{
++ blkg_stat_add(&bfqg->stats.dequeue, 1);
++}
++
++static void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg)
++{
++ struct bfqg_stats *stats = &bfqg->stats;
++
++ if (blkg_rwstat_total(&stats->queued))
++ return;
++
++ /*
++ * The group is already marked empty. This can happen if bfqq got a new
++ * request in the parent group and moved to this group while being added
++ * to the service tree. Just ignore the event and move on.
++ */
++ if (bfqg_stats_empty(stats))
++ return;
++
++ stats->start_empty_time = sched_clock();
++ bfqg_stats_mark_empty(stats);
++}
++
++static void bfqg_stats_update_idle_time(struct bfq_group *bfqg)
++{
++ struct bfqg_stats *stats = &bfqg->stats;
++
++ if (bfqg_stats_idling(stats)) {
++ unsigned long long now = sched_clock();
++
++ if (time_after64(now, stats->start_idle_time))
++ blkg_stat_add(&stats->idle_time,
++ now - stats->start_idle_time);
++ bfqg_stats_clear_idling(stats);
++ }
++}
++
++static void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg)
++{
++ struct bfqg_stats *stats = &bfqg->stats;
++
++ stats->start_idle_time = sched_clock();
++ bfqg_stats_mark_idling(stats);
++}
++
++static void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg)
++{
++ struct bfqg_stats *stats = &bfqg->stats;
++
++ blkg_stat_add(&stats->avg_queue_size_sum,
++ blkg_rwstat_total(&stats->queued));
++ blkg_stat_add(&stats->avg_queue_size_samples, 1);
++ bfqg_stats_update_group_wait_time(stats);
++}
++
++static struct blkcg_policy blkcg_policy_bfq;
++
++/*
++ * blk-cgroup policy-related handlers
++ * The following functions help in converting between blk-cgroup
++ * internal structures and BFQ-specific structures.
++ */
++
++static struct bfq_group *pd_to_bfqg(struct blkg_policy_data *pd)
++{
++ return pd ? container_of(pd, struct bfq_group, pd) : NULL;
++}
++
++static struct blkcg_gq *bfqg_to_blkg(struct bfq_group *bfqg)
++{
++ return pd_to_blkg(&bfqg->pd);
++}
++
++static struct bfq_group *blkg_to_bfqg(struct blkcg_gq *blkg)
++{
++ return pd_to_bfqg(blkg_to_pd(blkg, &blkcg_policy_bfq));
++}
++
++/*
++ * bfq_group handlers
++ * The following functions help in navigating the bfq_group hierarchy
++ * by finding the parent of a bfq_group or the bfq_group
++ * associated with a bfq_queue.
++ */
++
++static struct bfq_group *bfqg_parent(struct bfq_group *bfqg)
++{
++ struct blkcg_gq *pblkg = bfqg_to_blkg(bfqg)->parent;
++
++ return pblkg ? blkg_to_bfqg(pblkg) : NULL;
++}
++
++static struct bfq_group *bfqq_group(struct bfq_queue *bfqq)
++{
++ struct bfq_entity *group_entity = bfqq->entity.parent;
++
++ return group_entity ? container_of(group_entity, struct bfq_group,
++ entity) :
++ bfqq->bfqd->root_group;
++}
++
++/*
++ * The following two functions handle get and put of a bfq_group by
++ * wrapping the related blk-cgroup hooks.
++ */
++
++static void bfqg_get(struct bfq_group *bfqg)
++{
++ blkg_get(bfqg_to_blkg(bfqg));
++}
++
++static void bfqg_put(struct bfq_group *bfqg)
++{
++ blkg_put(bfqg_to_blkg(bfqg));
++}
++
++static void bfqg_stats_update_io_add(struct bfq_group *bfqg,
++ struct bfq_queue *bfqq,
++ int rw)
++{
++ blkg_rwstat_add(&bfqg->stats.queued, rw, 1);
++ bfqg_stats_end_empty_time(&bfqg->stats);
++ if (!(bfqq == ((struct bfq_data *)bfqg->bfqd)->in_service_queue))
++ bfqg_stats_set_start_group_wait_time(bfqg, bfqq_group(bfqq));
++}
++
++static void bfqg_stats_update_io_remove(struct bfq_group *bfqg, int rw)
++{
++ blkg_rwstat_add(&bfqg->stats.queued, rw, -1);
++}
++
++static void bfqg_stats_update_io_merged(struct bfq_group *bfqg, int rw)
++{
++ blkg_rwstat_add(&bfqg->stats.merged, rw, 1);
++}
++
++static void bfqg_stats_update_dispatch(struct bfq_group *bfqg,
++ uint64_t bytes, int rw)
++{
++ blkg_stat_add(&bfqg->stats.sectors, bytes >> 9);
++ blkg_rwstat_add(&bfqg->stats.serviced, rw, 1);
++ blkg_rwstat_add(&bfqg->stats.service_bytes, rw, bytes);
++}
++
++static void bfqg_stats_update_completion(struct bfq_group *bfqg,
++ uint64_t start_time, uint64_t io_start_time, int rw)
++{
++ struct bfqg_stats *stats = &bfqg->stats;
++ unsigned long long now = sched_clock();
++
++ if (time_after64(now, io_start_time))
++ blkg_rwstat_add(&stats->service_time, rw, now - io_start_time);
++ if (time_after64(io_start_time, start_time))
++ blkg_rwstat_add(&stats->wait_time, rw,
++ io_start_time - start_time);
++}
++
++/* @stats = 0 */
++static void bfqg_stats_reset(struct bfqg_stats *stats)
++{
++ if (!stats)
++ return;
++
++ /* queued stats shouldn't be cleared */
++ blkg_rwstat_reset(&stats->service_bytes);
++ blkg_rwstat_reset(&stats->serviced);
++ blkg_rwstat_reset(&stats->merged);
++ blkg_rwstat_reset(&stats->service_time);
++ blkg_rwstat_reset(&stats->wait_time);
++ blkg_stat_reset(&stats->time);
++ blkg_stat_reset(&stats->unaccounted_time);
++ blkg_stat_reset(&stats->avg_queue_size_sum);
++ blkg_stat_reset(&stats->avg_queue_size_samples);
++ blkg_stat_reset(&stats->dequeue);
++ blkg_stat_reset(&stats->group_wait_time);
++ blkg_stat_reset(&stats->idle_time);
++ blkg_stat_reset(&stats->empty_time);
++}
++
++/* @to += @from */
++static void bfqg_stats_merge(struct bfqg_stats *to, struct bfqg_stats *from)
++{
++ if (!to || !from)
++ return;
++
++ /* queued stats are deliberately left out of the merge */
++ blkg_rwstat_merge(&to->service_bytes, &from->service_bytes);
++ blkg_rwstat_merge(&to->serviced, &from->serviced);
++ blkg_rwstat_merge(&to->merged, &from->merged);
++ blkg_rwstat_merge(&to->service_time, &from->service_time);
++ blkg_rwstat_merge(&to->wait_time, &from->wait_time);
++ blkg_stat_merge(&to->time, &from->time);
++ blkg_stat_merge(&to->unaccounted_time, &from->unaccounted_time);
++ blkg_stat_merge(&to->avg_queue_size_sum, &from->avg_queue_size_sum);
++ blkg_stat_merge(&to->avg_queue_size_samples, &from->avg_queue_size_samples);
++ blkg_stat_merge(&to->dequeue, &from->dequeue);
++ blkg_stat_merge(&to->group_wait_time, &from->group_wait_time);
++ blkg_stat_merge(&to->idle_time, &from->idle_time);
++ blkg_stat_merge(&to->empty_time, &from->empty_time);
++}
++
++/*
++ * Transfer @bfqg's stats to its parent's dead_stats so that the ancestors'
++ * recursive stats can still account for the amount used by this bfqg after
++ * it's gone.
++ */
++static void bfqg_stats_xfer_dead(struct bfq_group *bfqg)
++{
++ struct bfq_group *parent;
++
++ if (!bfqg) /* root_group */
++ return;
++
++ parent = bfqg_parent(bfqg);
++
++ lockdep_assert_held(bfqg_to_blkg(bfqg)->q->queue_lock);
++
++ if (unlikely(!parent))
++ return;
++
++ bfqg_stats_merge(&parent->dead_stats, &bfqg->stats);
++ bfqg_stats_merge(&parent->dead_stats, &bfqg->dead_stats);
++ bfqg_stats_reset(&bfqg->stats);
++ bfqg_stats_reset(&bfqg->dead_stats);
++}
++
++static void bfq_init_entity(struct bfq_entity *entity,
++ struct bfq_group *bfqg)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++ entity->weight = entity->new_weight;
++ entity->orig_weight = entity->new_weight;
++ if (bfqq) {
++ bfqq->ioprio = bfqq->new_ioprio;
++ bfqq->ioprio_class = bfqq->new_ioprio_class;
++ bfqg_get(bfqg);
++ }
++ entity->parent = bfqg->my_entity;
++ entity->sched_data = &bfqg->sched_data;
++}
++
++static void bfqg_stats_init(struct bfqg_stats *stats)
++{
++ blkg_rwstat_init(&stats->service_bytes);
++ blkg_rwstat_init(&stats->serviced);
++ blkg_rwstat_init(&stats->merged);
++ blkg_rwstat_init(&stats->service_time);
++ blkg_rwstat_init(&stats->wait_time);
++ blkg_rwstat_init(&stats->queued);
++
++ blkg_stat_init(&stats->sectors);
++ blkg_stat_init(&stats->time);
++
++ blkg_stat_init(&stats->unaccounted_time);
++ blkg_stat_init(&stats->avg_queue_size_sum);
++ blkg_stat_init(&stats->avg_queue_size_samples);
++ blkg_stat_init(&stats->dequeue);
++ blkg_stat_init(&stats->group_wait_time);
++ blkg_stat_init(&stats->idle_time);
++ blkg_stat_init(&stats->empty_time);
++}
++
++static struct bfq_group_data *cpd_to_bfqgd(struct blkcg_policy_data *cpd)
++{
++ return cpd ? container_of(cpd, struct bfq_group_data, pd) : NULL;
++}
++
++static struct bfq_group_data *blkcg_to_bfqgd(struct blkcg *blkcg)
++{
++ return cpd_to_bfqgd(blkcg_to_cpd(blkcg, &blkcg_policy_bfq));
++}
++
++static void bfq_cpd_init(const struct blkcg *blkcg)
++{
++ struct bfq_group_data *d =
++ cpd_to_bfqgd(blkcg->pd[blkcg_policy_bfq.plid]);
++
++ d->weight = BFQ_DEFAULT_GRP_WEIGHT;
++}
++
++static void bfq_pd_init(struct blkcg_gq *blkg)
++{
++ struct bfq_group *bfqg = blkg_to_bfqg(blkg);
++ struct bfq_data *bfqd = blkg->q->elevator->elevator_data;
++ struct bfq_entity *entity = &bfqg->entity;
++ struct bfq_group_data *d = blkcg_to_bfqgd(blkg->blkcg);
++
++ entity->orig_weight = entity->weight = entity->new_weight = d->weight;
++ entity->my_sched_data = &bfqg->sched_data;
++ bfqg->my_entity = entity; /*
++ * the root_group's will be set to NULL
++ * in bfq_init_queue()
++ */
++ bfqg->bfqd = bfqd;
++ bfqg->active_entities = 0;
++
++ /* if the root_group does not exist, we are handling it right now */
++ if (bfqd->root_group && bfqg != bfqd->root_group)
++ hlist_add_head(&bfqg->bfqd_node, &bfqd->group_list);
++
++ bfqg_stats_init(&bfqg->stats);
++ bfqg_stats_init(&bfqg->dead_stats);
++}
++
++/* offset delta from bfqg->stats to bfqg->dead_stats */
++static const int dead_stats_off_delta = offsetof(struct bfq_group, dead_stats) -
++ offsetof(struct bfq_group, stats);
++
++/* to be used by recursive prfill, sums live and dead stats recursively */
++static u64 bfqg_stat_pd_recursive_sum(struct blkg_policy_data *pd, int off)
++{
++ u64 sum = 0;
++
++ sum += blkg_stat_recursive_sum(pd, off);
++ sum += blkg_stat_recursive_sum(pd, off + dead_stats_off_delta);
++ return sum;
++}
++
++/* to be used by recursive prfill, sums live and dead rwstats recursively */
++static struct blkg_rwstat bfqg_rwstat_pd_recursive_sum(struct blkg_policy_data *pd,
++ int off)
++{
++ struct blkg_rwstat a, b;
++
++ a = blkg_rwstat_recursive_sum(pd, off);
++ b = blkg_rwstat_recursive_sum(pd, off + dead_stats_off_delta);
++ blkg_rwstat_merge(&a, &b);
++ return a;
++}
++
++static void bfq_pd_reset_stats(struct blkcg_gq *blkg)
++{
++ struct bfq_group *bfqg = blkg_to_bfqg(blkg);
++
++ bfqg_stats_reset(&bfqg->stats);
++ bfqg_stats_reset(&bfqg->dead_stats);
++}
++
++static void bfq_group_set_parent(struct bfq_group *bfqg,
++ struct bfq_group *parent)
++{
++ struct bfq_entity *entity;
++
++ BUG_ON(!parent);
++ BUG_ON(!bfqg);
++ BUG_ON(bfqg == parent);
++
++ entity = &bfqg->entity;
++ entity->parent = parent->my_entity;
++ entity->sched_data = &parent->sched_data;
++}
++
++static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
++ struct blkcg *blkcg)
++{
++ struct request_queue *q = bfqd->queue;
++ struct bfq_group *bfqg = NULL, *parent;
++ struct bfq_entity *entity = NULL;
++
++ assert_spin_locked(bfqd->queue->queue_lock);
++
++ /* avoid lookup for the common case where there's no blkcg */
++ if (blkcg == &blkcg_root) {
++ bfqg = bfqd->root_group;
++ } else {
++ struct blkcg_gq *blkg;
++
++ blkg = blkg_lookup_create(blkcg, q);
++ if (!IS_ERR(blkg))
++ bfqg = blkg_to_bfqg(blkg);
++ else /* fallback to root_group */
++ bfqg = bfqd->root_group;
++ }
++
++ BUG_ON(!bfqg);
++
++ /*
++ * Update chain of bfq_groups as we might be handling a leaf group
++ * which, along with some of its relatives, has not been hooked yet
++ * to the private hierarchy of BFQ.
++ */
++ entity = &bfqg->entity;
++ for_each_entity(entity) {
++ bfqg = container_of(entity, struct bfq_group, entity);
++ BUG_ON(!bfqg);
++ if (bfqg != bfqd->root_group) {
++ parent = bfqg_parent(bfqg);
++ if (!parent)
++ parent = bfqd->root_group;
++ BUG_ON(!parent);
++ bfq_group_set_parent(bfqg, parent);
++ }
++ }
++
++ return bfqg;
++}
++
++/**
++ * bfq_bfqq_move - migrate @bfqq to @bfqg.
++ * @bfqd: queue descriptor.
++ * @bfqq: the queue to move.
++ * @entity: @bfqq's entity.
++ * @bfqg: the group to move to.
++ *
++ * Move @bfqq to @bfqg, deactivating it from its old group and reactivating
++ * it on the new one. Avoid putting the entity on the old group idle tree.
++ *
++ * Must be called under the queue lock; the cgroup owning @bfqg must
++ * not disappear (by now this just means that we are called under
++ * rcu_read_lock()).
++ */
++static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ struct bfq_entity *entity, struct bfq_group *bfqg)
++{
++ int busy, resume;
++
++ busy = bfq_bfqq_busy(bfqq);
++ resume = !RB_EMPTY_ROOT(&bfqq->sort_list);
++
++ BUG_ON(resume && !entity->on_st);
++ BUG_ON(busy && !resume && entity->on_st &&
++ bfqq != bfqd->in_service_queue);
++
++ if (busy) {
++ BUG_ON(atomic_read(&bfqq->ref) < 2);
++
++ if (!resume)
++ bfq_del_bfqq_busy(bfqd, bfqq, 0);
++ else
++ bfq_deactivate_bfqq(bfqd, bfqq, 0);
++ } else if (entity->on_st)
++ bfq_put_idle_entity(bfq_entity_service_tree(entity), entity);
++ bfqg_put(bfqq_group(bfqq));
++
++ /*
++ * Here we use a reference to bfqg. We don't need a refcounter
++ * as the cgroup reference will not be dropped, so that its
++ * destroy() callback will not be invoked.
++ */
++ entity->parent = bfqg->my_entity;
++ entity->sched_data = &bfqg->sched_data;
++ bfqg_get(bfqg);
++
++ if (busy) {
++ if (resume)
++ bfq_activate_bfqq(bfqd, bfqq);
++ }
++
++ if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
++ bfq_schedule_dispatch(bfqd);
++}
++
++/**
++ * __bfq_bic_change_cgroup - move @bic to @blkcg.
++ * @bfqd: the queue descriptor.
++ * @bic: the bic to move.
++ * @blkcg: the blk-cgroup to move to.
++ *
++ * Move @bic to @blkcg, assuming that bfqd->queue is locked; the caller
++ * has to make sure that the reference to @blkcg is valid across the call.
++ *
++ * NOTE: an alternative approach might have been to store the current
++ * cgroup in bfqq and getting a reference to it, reducing the lookup
++ * time here, at the price of slightly more complex code.
++ */
++static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
++ struct bfq_io_cq *bic,
++ struct blkcg *blkcg)
++{
++ struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0);
++ struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1);
++ struct bfq_group *bfqg;
++ struct bfq_entity *entity;
++
++ lockdep_assert_held(bfqd->queue->queue_lock);
++
++ bfqg = bfq_find_alloc_group(bfqd, blkcg);
++ if (async_bfqq) {
++ entity = &async_bfqq->entity;
++
++ if (entity->sched_data != &bfqg->sched_data) {
++ bic_set_bfqq(bic, NULL, 0);
++ bfq_log_bfqq(bfqd, async_bfqq,
++ "bic_change_group: %p %d",
++ async_bfqq, atomic_read(&async_bfqq->ref));
++ bfq_put_queue(async_bfqq);
++ }
++ }
++
++ if (sync_bfqq) {
++ entity = &sync_bfqq->entity;
++ if (entity->sched_data != &bfqg->sched_data)
++ bfq_bfqq_move(bfqd, sync_bfqq, entity, bfqg);
++ }
++
++ return bfqg;
++}
++
++static void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
++{
++ struct bfq_data *bfqd = bic_to_bfqd(bic);
++ struct blkcg *blkcg;
++ struct bfq_group *bfqg = NULL;
++ uint64_t id;
++
++ rcu_read_lock();
++ blkcg = bio_blkcg(bio);
++ id = blkcg->css.serial_nr;
++ rcu_read_unlock();
++
++ /*
++ * Check whether blkcg has changed. The condition may trigger
++ * spuriously on a newly created bic but there's no harm.
++ */
++ if (unlikely(!bfqd) || likely(bic->blkcg_id == id))
++ return;
++
++ bfqg = __bfq_bic_change_cgroup(bfqd, bic, blkcg);
++ BUG_ON(!bfqg);
++ bic->blkcg_id = id;
++}
++
++/**
++ * bfq_flush_idle_tree - deactivate any entity on the idle tree of @st.
++ * @st: the service tree being flushed.
++ */
++static void bfq_flush_idle_tree(struct bfq_service_tree *st)
++{
++ struct bfq_entity *entity = st->first_idle;
++
++ for (; entity ; entity = st->first_idle)
++ __bfq_deactivate_entity(entity, 0);
++}
++
++/**
++ * bfq_reparent_leaf_entity - move leaf entity to the root_group.
++ * @bfqd: the device data structure with the root group.
++ * @entity: the entity to move.
++ */
++static void bfq_reparent_leaf_entity(struct bfq_data *bfqd,
++ struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++ BUG_ON(!bfqq);
++ bfq_bfqq_move(bfqd, bfqq, entity, bfqd->root_group);
++ return;
++}
++
++/**
++ * bfq_reparent_active_entities - move to the root group all active
++ * entities.
++ * @bfqd: the device data structure with the root group.
++ * @bfqg: the group to move from.
++ * @st: the service tree with the entities.
++ *
++ * Needs queue_lock to be taken and reference to be valid over the call.
++ */
++static void bfq_reparent_active_entities(struct bfq_data *bfqd,
++ struct bfq_group *bfqg,
++ struct bfq_service_tree *st)
++{
++ struct rb_root *active = &st->active;
++ struct bfq_entity *entity = NULL;
++
++ if (!RB_EMPTY_ROOT(&st->active))
++ entity = bfq_entity_of(rb_first(active));
++
++ for (; entity ; entity = bfq_entity_of(rb_first(active)))
++ bfq_reparent_leaf_entity(bfqd, entity);
++
++ if (bfqg->sched_data.in_service_entity)
++ bfq_reparent_leaf_entity(bfqd,
++ bfqg->sched_data.in_service_entity);
++
++ return;
++}
++
++/**
++ * bfq_pd_offline - take the bfq_group associated with @blkg offline.
++ * @blkg: the blkcg_gq whose bfq_group is being destroyed.
++ *
++ * Destroy the group, making sure that it is not referenced from its parent.
++ * blkio already grabs the queue_lock for us, so no need to use RCU-based magic.
++ */
++static void bfq_pd_offline(struct blkcg_gq *blkg)
++{
++ struct bfq_service_tree *st;
++ struct bfq_group *bfqg = blkg_to_bfqg(blkg);
++ struct bfq_data *bfqd = bfqg->bfqd;
++ struct bfq_entity *entity = bfqg->my_entity;
++ int i;
++
++ if (!entity) /* root group */
++ return;
++
++ /*
++ * Empty all service_trees belonging to this group before
++ * deactivating the group itself.
++ */
++ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++) {
++ st = bfqg->sched_data.service_tree + i;
++
++ /*
++ * The idle tree may still contain bfq_queues belonging
++ * to exited tasks because they never migrated to a different
++ * cgroup from the one being destroyed now. No one else
++ * can access them so it's safe to act without any lock.
++ */
++ bfq_flush_idle_tree(st);
++
++ /*
++ * It may happen that some queues are still active
++ * (busy) upon group destruction (if the corresponding
++ * processes have been forced to terminate). We move
++ * all the leaf entities corresponding to these queues
++ * to the root_group.
++ * Also, it may happen that the group has an entity
++ * in service, which is disconnected from the active
++ * tree: it must be moved, too.
++ * There is no need to put the sync queues, as the
++ * scheduler has taken no reference.
++ */
++ bfq_reparent_active_entities(bfqd, bfqg, st);
++ BUG_ON(!RB_EMPTY_ROOT(&st->active));
++ BUG_ON(!RB_EMPTY_ROOT(&st->idle));
++ }
++ BUG_ON(bfqg->sched_data.next_in_service);
++ BUG_ON(bfqg->sched_data.in_service_entity);
++
++ hlist_del(&bfqg->bfqd_node);
++ __bfq_deactivate_entity(entity, 0);
++ bfq_put_async_queues(bfqd, bfqg);
++ BUG_ON(entity->tree);
++
++ bfqg_stats_xfer_dead(bfqg);
++}
++
++static void bfq_end_wr_async(struct bfq_data *bfqd)
++{
++ struct hlist_node *tmp;
++ struct bfq_group *bfqg;
++
++ hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node)
++ bfq_end_wr_async_queues(bfqd, bfqg);
++ bfq_end_wr_async_queues(bfqd, bfqd->root_group);
++}
++
++/**
++ * bfq_disconnect_groups - disconnect @bfqd from all its groups.
++ * @bfqd: the device descriptor being exited.
++ *
++ * When the device exits we just make sure that no lookup can return
++ * the now unused group structures. They will be deallocated on cgroup
++ * destruction.
++ */
++static void bfq_disconnect_groups(struct bfq_data *bfqd)
++{
++ struct hlist_node *tmp;
++ struct bfq_group *bfqg;
++
++ bfq_log(bfqd, "disconnect_groups beginning");
++ hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node) {
++ hlist_del(&bfqg->bfqd_node);
++
++ __bfq_deactivate_entity(bfqg->my_entity, 0);
++
++ /*
++ * Don't remove from the group hash, just set an
++ * invalid key. No lookups can race with the
++ * assignment as bfqd is being destroyed; this
++ * implies also that new elements cannot be added
++ * to the list.
++ */
++ rcu_assign_pointer(bfqg->bfqd, NULL);
++
++ bfq_log(bfqd, "disconnect_groups: put async for group %p",
++ bfqg);
++ bfq_put_async_queues(bfqd, bfqg);
++ }
++}
++
++static u64 bfqio_cgroup_weight_read(struct cgroup_subsys_state *css,
++ struct cftype *cftype)
++{
++ struct blkcg *blkcg = css_to_blkcg(css);
++ struct bfq_group_data *bfqgd = blkcg_to_bfqgd(blkcg);
++ int ret = -EINVAL;
++
++ spin_lock_irq(&blkcg->lock);
++ ret = bfqgd->weight;
++ spin_unlock_irq(&blkcg->lock);
++
++ return ret;
++}
++
++static int bfqio_cgroup_weight_write(struct cgroup_subsys_state *css,
++ struct cftype *cftype,
++ u64 val)
++{
++ struct blkcg *blkcg = css_to_blkcg(css);
++ struct bfq_group_data *bfqgd = blkcg_to_bfqgd(blkcg);
++ struct blkcg_gq *blkg;
++ int ret = -EINVAL;
++
++ if (val < BFQ_MIN_WEIGHT || val > BFQ_MAX_WEIGHT)
++ return ret;
++
++ ret = 0;
++ spin_lock_irq(&blkcg->lock);
++ bfqgd->weight = (unsigned short)val;
++ hlist_for_each_entry(blkg, &blkcg->blkg_list, blkcg_node) {
++ struct bfq_group *bfqg = blkg_to_bfqg(blkg);
++ if (!bfqg)
++ continue;
++ /*
++ * Setting the prio_changed flag of the entity
++ * to 1 with new_weight == weight would re-set
++ * the value of the weight to its ioprio mapping.
++ * Set the flag only if necessary.
++ */
++ if ((unsigned short)val != bfqg->entity.new_weight) {
++ bfqg->entity.new_weight = (unsigned short)val;
++ /*
++ * Make sure that the above new value has been
++ * stored in bfqg->entity.new_weight before
++ * setting the prio_changed flag. In fact,
++ * this flag may be read asynchronously (in
++ * critical sections protected by a different
++ * lock than that held here), and finding this
++ * flag set may cause the execution of the code
++ * for updating parameters whose value may
++ * depend also on bfqg->entity.new_weight (in
++ * __bfq_entity_update_weight_prio).
++ * This barrier makes sure that the new value
++ * of bfqg->entity.new_weight is correctly
++ * seen in that code.
++ */
++ smp_wmb();
++ bfqg->entity.prio_changed = 1;
++ }
++ }
++ spin_unlock_irq(&blkcg->lock);
++
++ return ret;
++}
++
++static int bfqg_print_stat(struct seq_file *sf, void *v)
++{
++ blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat,
++ &blkcg_policy_bfq, seq_cft(sf)->private, false);
++ return 0;
++}
++
++static int bfqg_print_rwstat(struct seq_file *sf, void *v)
++{
++ blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_rwstat,
++ &blkcg_policy_bfq, seq_cft(sf)->private, true);
++ return 0;
++}
++
++static u64 bfqg_prfill_stat_recursive(struct seq_file *sf,
++ struct blkg_policy_data *pd, int off)
++{
++ u64 sum = bfqg_stat_pd_recursive_sum(pd, off);
++
++ return __blkg_prfill_u64(sf, pd, sum);
++}
++
++static u64 bfqg_prfill_rwstat_recursive(struct seq_file *sf,
++ struct blkg_policy_data *pd, int off)
++{
++ struct blkg_rwstat sum = bfqg_rwstat_pd_recursive_sum(pd, off);
++
++ return __blkg_prfill_rwstat(sf, pd, &sum);
++}
++
++static int bfqg_print_stat_recursive(struct seq_file *sf, void *v)
++{
++ blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
++ bfqg_prfill_stat_recursive, &blkcg_policy_bfq,
++ seq_cft(sf)->private, false);
++ return 0;
++}
++
++static int bfqg_print_rwstat_recursive(struct seq_file *sf, void *v)
++{
++ blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
++ bfqg_prfill_rwstat_recursive, &blkcg_policy_bfq,
++ seq_cft(sf)->private, true);
++ return 0;
++}
++
++static u64 bfqg_prfill_avg_queue_size(struct seq_file *sf,
++ struct blkg_policy_data *pd, int off)
++{
++ struct bfq_group *bfqg = pd_to_bfqg(pd);
++ u64 samples = blkg_stat_read(&bfqg->stats.avg_queue_size_samples);
++ u64 v = 0;
++
++ if (samples) {
++ v = blkg_stat_read(&bfqg->stats.avg_queue_size_sum);
++ v = div64_u64(v, samples);
++ }
++ __blkg_prfill_u64(sf, pd, v);
++ return 0;
++}
++
++/* print avg_queue_size */
++static int bfqg_print_avg_queue_size(struct seq_file *sf, void *v)
++{
++ blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
++ bfqg_prfill_avg_queue_size, &blkcg_policy_bfq,
++ 0, false);
++ return 0;
++}
++
++static struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
++{
++ int ret;
++
++ ret = blkcg_activate_policy(bfqd->queue, &blkcg_policy_bfq);
++ if (ret)
++ return NULL;
++
++ return blkg_to_bfqg(bfqd->queue->root_blkg);
++}
++
++static struct cftype bfqio_files[] = {
++ {
++ .name = "bfq.weight",
++ .read_u64 = bfqio_cgroup_weight_read,
++ .write_u64 = bfqio_cgroup_weight_write,
++ },
++ /* statistics, cover only the tasks in the bfqg */
++ {
++ .name = "bfq.time",
++ .private = offsetof(struct bfq_group, stats.time),
++ .seq_show = bfqg_print_stat,
++ },
++ {
++ .name = "bfq.sectors",
++ .private = offsetof(struct bfq_group, stats.sectors),
++ .seq_show = bfqg_print_stat,
++ },
++ {
++ .name = "bfq.io_service_bytes",
++ .private = offsetof(struct bfq_group, stats.service_bytes),
++ .seq_show = bfqg_print_rwstat,
++ },
++ {
++ .name = "bfq.io_serviced",
++ .private = offsetof(struct bfq_group, stats.serviced),
++ .seq_show = bfqg_print_rwstat,
++ },
++ {
++ .name = "bfq.io_service_time",
++ .private = offsetof(struct bfq_group, stats.service_time),
++ .seq_show = bfqg_print_rwstat,
++ },
++ {
++ .name = "bfq.io_wait_time",
++ .private = offsetof(struct bfq_group, stats.wait_time),
++ .seq_show = bfqg_print_rwstat,
++ },
++ {
++ .name = "bfq.io_merged",
++ .private = offsetof(struct bfq_group, stats.merged),
++ .seq_show = bfqg_print_rwstat,
++ },
++ {
++ .name = "bfq.io_queued",
++ .private = offsetof(struct bfq_group, stats.queued),
++ .seq_show = bfqg_print_rwstat,
++ },
++
++ /* the same statistics which cover the bfqg and its descendants */
++ {
++ .name = "bfq.time_recursive",
++ .private = offsetof(struct bfq_group, stats.time),
++ .seq_show = bfqg_print_stat_recursive,
++ },
++ {
++ .name = "bfq.sectors_recursive",
++ .private = offsetof(struct bfq_group, stats.sectors),
++ .seq_show = bfqg_print_stat_recursive,
++ },
++ {
++ .name = "bfq.io_service_bytes_recursive",
++ .private = offsetof(struct bfq_group, stats.service_bytes),
++ .seq_show = bfqg_print_rwstat_recursive,
++ },
++ {
++ .name = "bfq.io_serviced_recursive",
++ .private = offsetof(struct bfq_group, stats.serviced),
++ .seq_show = bfqg_print_rwstat_recursive,
++ },
++ {
++ .name = "bfq.io_service_time_recursive",
++ .private = offsetof(struct bfq_group, stats.service_time),
++ .seq_show = bfqg_print_rwstat_recursive,
++ },
++ {
++ .name = "bfq.io_wait_time_recursive",
++ .private = offsetof(struct bfq_group, stats.wait_time),
++ .seq_show = bfqg_print_rwstat_recursive,
++ },
++ {
++ .name = "bfq.io_merged_recursive",
++ .private = offsetof(struct bfq_group, stats.merged),
++ .seq_show = bfqg_print_rwstat_recursive,
++ },
++ {
++ .name = "bfq.io_queued_recursive",
++ .private = offsetof(struct bfq_group, stats.queued),
++ .seq_show = bfqg_print_rwstat_recursive,
++ },
++ {
++ .name = "bfq.avg_queue_size",
++ .seq_show = bfqg_print_avg_queue_size,
++ },
++ {
++ .name = "bfq.group_wait_time",
++ .private = offsetof(struct bfq_group, stats.group_wait_time),
++ .seq_show = bfqg_print_stat,
++ },
++ {
++ .name = "bfq.idle_time",
++ .private = offsetof(struct bfq_group, stats.idle_time),
++ .seq_show = bfqg_print_stat,
++ },
++ {
++ .name = "bfq.empty_time",
++ .private = offsetof(struct bfq_group, stats.empty_time),
++ .seq_show = bfqg_print_stat,
++ },
++ {
++ .name = "bfq.dequeue",
++ .private = offsetof(struct bfq_group, stats.dequeue),
++ .seq_show = bfqg_print_stat,
++ },
++ {
++ .name = "bfq.unaccounted_time",
++ .private = offsetof(struct bfq_group, stats.unaccounted_time),
++ .seq_show = bfqg_print_stat,
++ },
++ { } /* terminate */
++};
++
++static struct blkcg_policy blkcg_policy_bfq = {
++ .pd_size = sizeof(struct bfq_group),
++ .cpd_size = sizeof(struct bfq_group_data),
++ .cftypes = bfqio_files,
++ .pd_init_fn = bfq_pd_init,
++ .cpd_init_fn = bfq_cpd_init,
++ .pd_offline_fn = bfq_pd_offline,
++ .pd_reset_stats_fn = bfq_pd_reset_stats,
++};
++
++#else
++
++static void bfq_init_entity(struct bfq_entity *entity,
++ struct bfq_group *bfqg)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ entity->weight = entity->new_weight;
++ entity->orig_weight = entity->new_weight;
++ if (bfqq) {
++ bfqq->ioprio = bfqq->new_ioprio;
++ bfqq->ioprio_class = bfqq->new_ioprio_class;
++ }
++ entity->sched_data = &bfqg->sched_data;
++}
++
++static struct bfq_group *
++bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
++{
++ struct bfq_data *bfqd = bic_to_bfqd(bic);
++ return bfqd->root_group;
++}
++
++static void bfq_bfqq_move(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ struct bfq_entity *entity,
++ struct bfq_group *bfqg)
++{
++}
++
++static void bfq_end_wr_async(struct bfq_data *bfqd)
++{
++ bfq_end_wr_async_queues(bfqd, bfqd->root_group);
++}
++
++static void bfq_disconnect_groups(struct bfq_data *bfqd)
++{
++ bfq_put_async_queues(bfqd, bfqd->root_group);
++}
++
++static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
++ struct blkcg *blkcg)
++{
++ return bfqd->root_group;
++}
++
++static struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
++{
++ struct bfq_group *bfqg;
++ int i;
++
++ bfqg = kmalloc_node(sizeof(*bfqg), GFP_KERNEL | __GFP_ZERO, node);
++ if (!bfqg)
++ return NULL;
++
++ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
++ bfqg->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
++
++ return bfqg;
++}
++#endif
+diff --git a/block/bfq-ioc.c b/block/bfq-ioc.c
+new file mode 100644
+index 0000000..fb7bb8f
+--- /dev/null
++++ b/block/bfq-ioc.c
+@@ -0,0 +1,36 @@
++/*
++ * BFQ: I/O context handling.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ * Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ */
++
++/**
++ * icq_to_bic - convert iocontext queue structure to bfq_io_cq.
++ * @icq: the iocontext queue.
++ */
++static struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
++{
++ /* bic->icq is the first member, %NULL will convert to %NULL */
++ return container_of(icq, struct bfq_io_cq, icq);
++}
++
++/**
++ * bfq_bic_lookup - search @ioc for a bic associated with @bfqd.
++ * @bfqd: the lookup key.
++ * @ioc: the io_context of the process doing I/O.
++ *
++ * Queue lock must be held.
++ */
++static struct bfq_io_cq *bfq_bic_lookup(struct bfq_data *bfqd,
++ struct io_context *ioc)
++{
++ if (ioc)
++ return icq_to_bic(ioc_lookup_icq(ioc, bfqd->queue));
++ return NULL;
++}
+diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
+new file mode 100644
+index 0000000..51d24dd
+--- /dev/null
++++ b/block/bfq-iosched.c
+@@ -0,0 +1,3753 @@
++/*
++ * Budget Fair Queueing (BFQ) disk scheduler.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ * Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
++ * file.
++ *
++ * BFQ is a proportional-share storage-I/O scheduling algorithm based on
++ * the slice-by-slice service scheme of CFQ. But BFQ assigns budgets,
++ * measured in number of sectors, to processes instead of time slices. The
++ * device is not granted to the in-service process for a given time slice,
++ * but until it has exhausted its assigned budget. This change from the time
++ * to the service domain allows BFQ to distribute the device throughput
++ * among processes as desired, without any distortion due to ZBR, workload
++ * fluctuations or other factors. BFQ uses an ad hoc internal scheduler,
++ * called B-WF2Q+, to schedule processes according to their budgets. More
++ * precisely, BFQ schedules queues associated to processes. Thanks to the
++ * accurate policy of B-WF2Q+, BFQ can afford to assign high budgets to
++ * I/O-bound processes issuing sequential requests (to boost the
++ * throughput), and yet guarantee a low latency to interactive and soft
++ * real-time applications.
++ *
++ * BFQ is described in [1], where also a reference to the initial, more
++ * theoretical paper on BFQ can be found. The interested reader can find
++ * in the latter paper full details on the main algorithm, as well as
++ * formulas of the guarantees and formal proofs of all the properties.
++ * With respect to the version of BFQ presented in these papers, this
++ * implementation adds a few more heuristics, such as the one that
++ * guarantees a low latency to soft real-time applications, and a
++ * hierarchical extension based on H-WF2Q+.
++ *
++ * B-WF2Q+ is based on WF2Q+, which is described in [2], together with
++ * H-WF2Q+, while the augmented tree used to implement B-WF2Q+ with O(log N)
++ * complexity derives from the one introduced with EEVDF in [3].
++ *
++ * [1] P. Valente and M. Andreolini, ``Improving Application Responsiveness
++ * with the BFQ Disk I/O Scheduler'',
++ * Proceedings of the 5th Annual International Systems and Storage
++ * Conference (SYSTOR '12), June 2012.
++ *
++ * http://algogroup.unimo.it/people/paolo/disk_sched/bf1-v1-suite-results.pdf
++ *
++ * [2] Jon C.R. Bennett and H. Zhang, ``Hierarchical Packet Fair Queueing
++ * Algorithms,'' IEEE/ACM Transactions on Networking, 5(5):675-689,
++ * Oct 1997.
++ *
++ * http://www.cs.cmu.edu/~hzhang/papers/TON-97-Oct.ps.gz
++ *
++ * [3] I. Stoica and H. Abdel-Wahab, ``Earliest Eligible Virtual Deadline
++ * First: A Flexible and Accurate Mechanism for Proportional Share
++ * Resource Allocation,'' technical report.
++ *
++ * http://www.cs.berkeley.edu/~istoica/papers/eevdf-tr-95.pdf
++ */
++#include <linux/module.h>
++#include <linux/slab.h>
++#include <linux/blkdev.h>
++#include <linux/cgroup.h>
++#include <linux/elevator.h>
++#include <linux/jiffies.h>
++#include <linux/rbtree.h>
++#include <linux/ioprio.h>
++#include "bfq.h"
++#include "blk.h"
++
++/* Expiration time of sync (0) and async (1) requests, in jiffies. */
++static const int bfq_fifo_expire[2] = { HZ / 4, HZ / 8 };
++
++/* Maximum backwards seek, in KiB. */
++static const int bfq_back_max = 16 * 1024;
++
++/* Penalty of a backwards seek, in number of sectors. */
++static const int bfq_back_penalty = 2;
++
++/* Idling period duration, in jiffies. */
++static int bfq_slice_idle = HZ / 125;
++
++/* Minimum number of assigned budgets for which stats are safe to compute. */
++static const int bfq_stats_min_budgets = 194;
++
++/* Default maximum budget values, in sectors and number of requests. */
++static const int bfq_default_max_budget = 16 * 1024;
++static const int bfq_max_budget_async_rq = 4;
++
++/*
++ * Async to sync throughput distribution is controlled as follows:
++ * when an async request is served, the entity is charged the number
++ * of sectors of the request, multiplied by the factor below
++ */
++static const int bfq_async_charge_factor = 10;
++
++/* Default timeout values, in jiffies, approximating CFQ defaults. */
++static const int bfq_timeout_sync = HZ / 8;
++static int bfq_timeout_async = HZ / 25;
++
++struct kmem_cache *bfq_pool;
++
++/* Below this threshold (in ms), we consider thinktime immediate. */
++#define BFQ_MIN_TT 2
++
++/* hw_tag detection: parallel requests threshold and min samples needed. */
++#define BFQ_HW_QUEUE_THRESHOLD 4
++#define BFQ_HW_QUEUE_SAMPLES 32
++
++#define BFQQ_SEEK_THR (sector_t)(8 * 1024)
++#define BFQQ_SEEKY(bfqq) ((bfqq)->seek_mean > BFQQ_SEEK_THR)
++
++/* Min samples used for peak rate estimation (for autotuning). */
++#define BFQ_PEAK_RATE_SAMPLES 32
++
++/* Shift used for peak rate fixed precision calculations. */
++#define BFQ_RATE_SHIFT 16
++
++/*
++ * By default, BFQ computes the duration of the weight raising for
++ * interactive applications automatically, using the following formula:
++ * duration = (R / r) * T, where r is the peak rate of the device, and
++ * R and T are two reference parameters.
++ * In particular, R is the peak rate of the reference device (see below),
++ * and T is a reference time: given the systems that are likely to be
++ * installed on the reference device according to its speed class, T is
++ * about the maximum time needed, under BFQ and while reading two files in
++ * parallel, to load typical large applications on these systems.
++ * In practice, the slower/faster the device at hand is, the more/less it
++ * takes to load applications with respect to the reference device.
++ * Accordingly, the longer/shorter BFQ grants weight raising to interactive
++ * applications.
++ *
++ * BFQ uses four different reference pairs (R, T), depending on:
++ * . whether the device is rotational or non-rotational;
++ * . whether the device is slow, such as old or portable HDDs, as well as
++ * SD cards, or fast, such as newer HDDs and SSDs.
++ *
++ * The device's speed class is dynamically (re)detected in
++ * bfq_update_peak_rate() every time the estimated peak rate is updated.
++ *
++ * In the following definitions, R_slow[0]/R_fast[0] and T_slow[0]/T_fast[0]
++ * are the reference values for a slow/fast rotational device, whereas
++ * R_slow[1]/R_fast[1] and T_slow[1]/T_fast[1] are the reference values for
++ * a slow/fast non-rotational device. Finally, device_speed_thresh are the
++ * thresholds used to switch between speed classes.
++ * Both the reference peak rates and the thresholds are measured in
++ * sectors/usec, left-shifted by BFQ_RATE_SHIFT.
++ */
++static int R_slow[2] = {1536, 10752};
++static int R_fast[2] = {17415, 34791};
++/*
++ * To improve readability, a conversion function is used to initialize the
++ * following arrays, which entails that they can be initialized only in a
++ * function.
++ */
++static int T_slow[2];
++static int T_fast[2];
++static int device_speed_thresh[2];
++
++#define BFQ_SERVICE_TREE_INIT ((struct bfq_service_tree) \
++ { RB_ROOT, RB_ROOT, NULL, NULL, 0, 0 })
++
++#define RQ_BIC(rq) ((struct bfq_io_cq *) (rq)->elv.priv[0])
++#define RQ_BFQQ(rq) ((rq)->elv.priv[1])
++
++static void bfq_schedule_dispatch(struct bfq_data *bfqd);
++
++#include "bfq-ioc.c"
++#include "bfq-sched.c"
++#include "bfq-cgroup.c"
++
++#define bfq_class_idle(bfqq) ((bfqq)->ioprio_class == IOPRIO_CLASS_IDLE)
++#define bfq_class_rt(bfqq) ((bfqq)->ioprio_class == IOPRIO_CLASS_RT)
++
++#define bfq_sample_valid(samples) ((samples) > 80)
++
++/*
++ * We regard a request as SYNC, if either it's a read or has the SYNC bit
++ * set (in which case it could also be a direct WRITE).
++ */
++static int bfq_bio_sync(struct bio *bio)
++{
++ if (bio_data_dir(bio) == READ || (bio->bi_rw & REQ_SYNC))
++ return 1;
++
++ return 0;
++}
++
++/*
++ * Scheduler run of queue, if there are requests pending and no one in the
++ * driver that will restart queueing.
++ */
++static void bfq_schedule_dispatch(struct bfq_data *bfqd)
++{
++ if (bfqd->queued != 0) {
++ bfq_log(bfqd, "schedule dispatch");
++ kblockd_schedule_work(&bfqd->unplug_work);
++ }
++}
++
++/*
++ * Lifted from AS - choose which of rq1 and rq2 is best served now.
++ * We choose the request that is closest to the head right now. Distance
++ * behind the head is penalized and only allowed to a certain extent.
++ */
++static struct request *bfq_choose_req(struct bfq_data *bfqd,
++ struct request *rq1,
++ struct request *rq2,
++ sector_t last)
++{
++ sector_t s1, s2, d1 = 0, d2 = 0;
++ unsigned long back_max;
++#define BFQ_RQ1_WRAP 0x01 /* request 1 wraps */
++#define BFQ_RQ2_WRAP 0x02 /* request 2 wraps */
++ unsigned wrap = 0; /* bit mask: requests behind the disk head? */
++
++ if (!rq1 || rq1 == rq2)
++ return rq2;
++ if (!rq2)
++ return rq1;
++
++ if (rq_is_sync(rq1) && !rq_is_sync(rq2))
++ return rq1;
++ else if (rq_is_sync(rq2) && !rq_is_sync(rq1))
++ return rq2;
++ if ((rq1->cmd_flags & REQ_META) && !(rq2->cmd_flags & REQ_META))
++ return rq1;
++ else if ((rq2->cmd_flags & REQ_META) && !(rq1->cmd_flags & REQ_META))
++ return rq2;
++
++ s1 = blk_rq_pos(rq1);
++ s2 = blk_rq_pos(rq2);
++
++ /*
++ * By definition, 1KiB is 2 sectors.
++ */
++ back_max = bfqd->bfq_back_max * 2;
++
++ /*
++ * Strict one way elevator _except_ in the case where we allow
++ * short backward seeks which are biased as twice the cost of a
++ * similar forward seek.
++ */
++ if (s1 >= last)
++ d1 = s1 - last;
++ else if (s1 + back_max >= last)
++ d1 = (last - s1) * bfqd->bfq_back_penalty;
++ else
++ wrap |= BFQ_RQ1_WRAP;
++
++ if (s2 >= last)
++ d2 = s2 - last;
++ else if (s2 + back_max >= last)
++ d2 = (last - s2) * bfqd->bfq_back_penalty;
++ else
++ wrap |= BFQ_RQ2_WRAP;
++
++ /* Found required data */
++
++ /*
++ * By doing switch() on the bit mask "wrap" we avoid having to
++ * check two variables for all permutations: --> faster!
++ */
++ switch (wrap) {
++ case 0: /* common case for CFQ: rq1 and rq2 not wrapped */
++ if (d1 < d2)
++ return rq1;
++ else if (d2 < d1)
++ return rq2;
++ else {
++ if (s1 >= s2)
++ return rq1;
++ else
++ return rq2;
++ }
++
++ case BFQ_RQ2_WRAP:
++ return rq1;
++ case BFQ_RQ1_WRAP:
++ return rq2;
++ case (BFQ_RQ1_WRAP|BFQ_RQ2_WRAP): /* both rqs wrapped */
++ default:
++ /*
++ * Since both rqs are wrapped,
++ * start with the one that's further behind head
++ * (--> only *one* back seek required),
++ * since back seek takes more time than forward.
++ */
++ if (s1 <= s2)
++ return rq1;
++ else
++ return rq2;
++ }
++}
++
++/*
++ * Tell whether there are active queues or groups with differentiated weights.
++ */
++static bool bfq_differentiated_weights(struct bfq_data *bfqd)
++{
++ /*
++ * For weights to differ, at least one of the trees must contain
++ * at least two nodes.
++ */
++ return (!RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
++ (bfqd->queue_weights_tree.rb_node->rb_left ||
++ bfqd->queue_weights_tree.rb_node->rb_right)
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ ) ||
++ (!RB_EMPTY_ROOT(&bfqd->group_weights_tree) &&
++ (bfqd->group_weights_tree.rb_node->rb_left ||
++ bfqd->group_weights_tree.rb_node->rb_right)
++#endif
++ );
++}
++
++/*
++ * The following function returns true if every queue must receive the
++ * same share of the throughput (this condition is used when deciding
++ * whether idling may be disabled, see the comments in the function
++ * bfq_bfqq_may_idle()).
++ *
++ * Such a scenario occurs when:
++ * 1) all active queues have the same weight,
++ * 2) all active groups at the same level in the groups tree have the same
++ * weight,
++ * 3) all active groups at the same level in the groups tree have the same
++ * number of children.
++ *
++ * Unfortunately, keeping the necessary state for evaluating exactly the
++ * above symmetry conditions would be quite complex and time-consuming.
++ * Therefore this function evaluates, instead, the following stronger
++ * sub-conditions, for which it is much easier to maintain the needed
++ * state:
++ * 1) all active queues have the same weight,
++ * 2) all active groups have the same weight,
++ * 3) all active groups have at most one active child each.
++ * In particular, the last two conditions are always true if hierarchical
++ * support and the cgroups interface are not enabled, thus no state needs
++ * to be maintained in this case.
++ */
++static bool bfq_symmetric_scenario(struct bfq_data *bfqd)
++{
++ return
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ !bfqd->active_numerous_groups &&
++#endif
++ !bfq_differentiated_weights(bfqd);
++}
++
++/*
++ * If the weight-counter tree passed as input contains no counter for
++ * the weight of the input entity, then add that counter; otherwise just
++ * increment the existing counter.
++ *
++ * Note that weight-counter trees contain few nodes in mostly symmetric
++ * scenarios. For example, if all queues have the same weight, then the
++ * weight-counter tree for the queues may contain at most one node.
++ * This holds even if low_latency is on, because weight-raised queues
++ * are not inserted in the tree.
++ * In most scenarios, the rate at which nodes are created/destroyed
++ * should be low too.
++ */
++static void bfq_weights_tree_add(struct bfq_data *bfqd,
++ struct bfq_entity *entity,
++ struct rb_root *root)
++{
++ struct rb_node **new = &(root->rb_node), *parent = NULL;
++
++ /*
++ * Do not insert if the entity is already associated with a
++ * counter, which happens if:
++ * 1) the entity is associated with a queue,
++ * 2) a request arrival has caused the queue to become both
++ * non-weight-raised, and hence change its weight, and
++ * backlogged; in this respect, each of the two events
++ * causes an invocation of this function,
++ * 3) this is the invocation of this function caused by the
++ * second event. This second invocation is actually useless,
++ * and we handle this fact by exiting immediately. More
++ * efficient or clearer solutions might possibly be adopted.
++ */
++ if (entity->weight_counter)
++ return;
++
++ while (*new) {
++ struct bfq_weight_counter *__counter = container_of(*new,
++ struct bfq_weight_counter,
++ weights_node);
++ parent = *new;
++
++ if (entity->weight == __counter->weight) {
++ entity->weight_counter = __counter;
++ goto inc_counter;
++ }
++ if (entity->weight < __counter->weight)
++ new = &((*new)->rb_left);
++ else
++ new = &((*new)->rb_right);
++ }
++
++ entity->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
++ GFP_ATOMIC);
++ entity->weight_counter->weight = entity->weight;
++ rb_link_node(&entity->weight_counter->weights_node, parent, new);
++ rb_insert_color(&entity->weight_counter->weights_node, root);
++
++inc_counter:
++ entity->weight_counter->num_active++;
++}
++
++/*
++ * Decrement the weight counter associated with the entity, and, if the
++ * counter reaches 0, remove the counter from the tree.
++ * See the comments to the function bfq_weights_tree_add() for considerations
++ * about overhead.
++ */
++static void bfq_weights_tree_remove(struct bfq_data *bfqd,
++ struct bfq_entity *entity,
++ struct rb_root *root)
++{
++ if (!entity->weight_counter)
++ return;
++
++ BUG_ON(RB_EMPTY_ROOT(root));
++ BUG_ON(entity->weight_counter->weight != entity->weight);
++
++ BUG_ON(!entity->weight_counter->num_active);
++ entity->weight_counter->num_active--;
++ if (entity->weight_counter->num_active > 0)
++ goto reset_entity_pointer;
++
++ rb_erase(&entity->weight_counter->weights_node, root);
++ kfree(entity->weight_counter);
++
++reset_entity_pointer:
++ entity->weight_counter = NULL;
++}
++
++static struct request *bfq_find_next_rq(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ struct request *last)
++{
++ struct rb_node *rbnext = rb_next(&last->rb_node);
++ struct rb_node *rbprev = rb_prev(&last->rb_node);
++ struct request *next = NULL, *prev = NULL;
++
++ BUG_ON(RB_EMPTY_NODE(&last->rb_node));
++
++ if (rbprev)
++ prev = rb_entry_rq(rbprev);
++
++ if (rbnext)
++ next = rb_entry_rq(rbnext);
++ else {
++ rbnext = rb_first(&bfqq->sort_list);
++ if (rbnext && rbnext != &last->rb_node)
++ next = rb_entry_rq(rbnext);
++ }
++
++ return bfq_choose_req(bfqd, next, prev, blk_rq_pos(last));
++}
++
++/* see the definition of bfq_async_charge_factor for details */
++static unsigned long bfq_serv_to_charge(struct request *rq,
++ struct bfq_queue *bfqq)
++{
++ return blk_rq_sectors(rq) *
++ (1 + ((!bfq_bfqq_sync(bfqq)) * (bfqq->wr_coeff == 1) *
++ bfq_async_charge_factor));
++}
++
++/**
++ * bfq_updated_next_req - update the queue after a new next_rq selection.
++ * @bfqd: the device data the queue belongs to.
++ * @bfqq: the queue to update.
++ *
++ * If the first request of a queue changes we make sure that the queue
++ * has enough budget to serve at least its first request (if the
++ * request has grown). We do this because if the queue has not enough
++ * budget for its first request, it has to go through two dispatch
++ * rounds to actually get it dispatched.
++ */
++static void bfq_updated_next_req(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++ struct bfq_service_tree *st = bfq_entity_service_tree(entity);
++ struct request *next_rq = bfqq->next_rq;
++ unsigned long new_budget;
++
++ if (!next_rq)
++ return;
++
++ if (bfqq == bfqd->in_service_queue)
++ /*
++ * In order not to break guarantees, budgets cannot be
++ * changed after an entity has been selected.
++ */
++ return;
++
++ BUG_ON(entity->tree != &st->active);
++ BUG_ON(entity == entity->sched_data->in_service_entity);
++
++ new_budget = max_t(unsigned long, bfqq->max_budget,
++ bfq_serv_to_charge(next_rq, bfqq));
++ if (entity->budget != new_budget) {
++ entity->budget = new_budget;
++ bfq_log_bfqq(bfqd, bfqq, "updated next rq: new budget %lu",
++ new_budget);
++ bfq_activate_bfqq(bfqd, bfqq);
++ }
++}
++
++static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
++{
++ u64 dur;
++
++ if (bfqd->bfq_wr_max_time > 0)
++ return bfqd->bfq_wr_max_time;
++
++ dur = bfqd->RT_prod;
++ do_div(dur, bfqd->peak_rate);
++
++ return dur;
++}
++
++/* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
++static void bfq_reset_burst_list(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ struct bfq_queue *item;
++ struct hlist_node *n;
++
++ hlist_for_each_entry_safe(item, n, &bfqd->burst_list, burst_list_node)
++ hlist_del_init(&item->burst_list_node);
++ hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
++ bfqd->burst_size = 1;
++}
++
++/* Add bfqq to the list of queues in current burst (see bfq_handle_burst) */
++static void bfq_add_to_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ /* Increment burst size to take into account also bfqq */
++ bfqd->burst_size++;
++
++ if (bfqd->burst_size == bfqd->bfq_large_burst_thresh) {
++ struct bfq_queue *pos, *bfqq_item;
++ struct hlist_node *n;
++
++ /*
++ * Enough queues have been activated shortly after each
++ * other to consider this burst as large.
++ */
++ bfqd->large_burst = true;
++
++ /*
++ * We can now mark all queues in the burst list as
++ * belonging to a large burst.
++ */
++ hlist_for_each_entry(bfqq_item, &bfqd->burst_list,
++ burst_list_node)
++ bfq_mark_bfqq_in_large_burst(bfqq_item);
++ bfq_mark_bfqq_in_large_burst(bfqq);
++
++ /*
++ * From now on, and until the current burst finishes, any
++ * new queue being activated shortly after the last queue
++ * was inserted in the burst can be immediately marked as
++ * belonging to a large burst. So the burst list is not
++ * needed any more. Remove it.
++ */
++ hlist_for_each_entry_safe(pos, n, &bfqd->burst_list,
++ burst_list_node)
++ hlist_del_init(&pos->burst_list_node);
++ } else /* burst not yet large: add bfqq to the burst list */
++ hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
++}
++
++/*
++ * If many queues happen to become active shortly after each other, then,
++ * to help the processes associated to these queues get their job done as
++ * soon as possible, it is usually better to not grant either weight-raising
++ * or device idling to these queues. In this comment we describe, firstly,
++ * the reasons why this fact holds, and, secondly, the next function, which
++ * implements the main steps needed to properly mark these queues so that
++ * they can then be treated in a different way.
++ *
++ * As for the terminology, we say that a queue becomes active, i.e.,
++ * switches from idle to backlogged, either when it is created (as a
++ * consequence of the arrival of an I/O request), or, if already existing,
++ * when a new request for the queue arrives while the queue is idle.
++ * Bursts of activations, i.e., activations of different queues occurring
++ * shortly after each other, are typically caused by services or applications
++ * that spawn or reactivate many parallel threads/processes. Examples are
++ * systemd during boot or git grep.
++ *
++ * These services or applications benefit mostly from a high throughput:
++ * the quicker the requests of the activated queues are cumulatively served,
++ * the sooner the target job of these queues gets completed. As a consequence,
++ * weight-raising any of these queues, which also implies idling the device
++ * for it, is almost always counterproductive: in most cases it just lowers
++ * throughput.
++ *
++ * On the other hand, a burst of activations may also be caused by the start
++ * of an application that does not consist of a lot of parallel I/O-bound
++ * threads. In fact, with a complex application, the burst may be just a
++ * consequence of the fact that several processes need to be executed to
++ * start-up the application. To start an application as quickly as possible,
++ * the best thing to do is to privilege the I/O related to the application
++ * with respect to all other I/O. Therefore, the best strategy to start as
++ * quickly as possible an application that causes a burst of activations is
++ * to weight-raise all the queues activated during the burst. This is the
++ * exact opposite of the best strategy for the other type of bursts.
++ *
++ * In the end, to take the best action for each of the two cases, the two
++ * types of bursts need to be distinguished. Fortunately, this seems
++ * relatively easy to do, by looking at the sizes of the bursts. In
++ * particular, we found a threshold such that bursts with a larger size
++ * than that threshold are apparently caused only by services or commands
++ * such as systemd or git grep. For brevity, hereafter we simply call these
++ * bursts 'large'. BFQ *does not* weight-raise queues whose activations occur
++ * in a large burst. In addition, for each of these queues BFQ performs or
++ * does not perform idling depending on which choice boosts the throughput
++ * most. The exact choice depends on the device and request pattern at
++ * hand.
++ *
++ * Turning back to the next function, it implements all the steps needed
++ * to detect the occurrence of a large burst and to properly mark all the
++ * queues belonging to it (so that they can then be treated in a different
++ * way). This goal is achieved by maintaining a special "burst list" that
++ * holds, temporarily, the queues that belong to the burst in progress. The
++ * list is then used to mark these queues as belonging to a large burst if
++ * the burst does become large. The main steps are the following.
++ *
++ * . when the very first queue is activated, the queue is inserted into the
++ * list (as it could be the first queue in a possible burst)
++ *
++ * . if the current burst has not yet become large, and a queue Q that does
++ * not yet belong to the burst is activated shortly after the last time
++ * at which a new queue entered the burst list, then the function appends
++ * Q to the burst list
++ *
++ * . if, as a consequence of the previous step, the burst size reaches
++ * the large-burst threshold, then
++ *
++ * . all the queues in the burst list are marked as belonging to a
++ * large burst
++ *
++ * . the burst list is deleted; in fact, the burst list already served
++ * its purpose (keeping temporarily track of the queues in a burst,
++ * so as to be able to mark them as belonging to a large burst in the
++ * previous sub-step), and now is not needed any more
++ *
++ * . the device enters a large-burst mode
++ *
++ * . if a queue Q that does not belong to the burst is activated while
++ * the device is in large-burst mode and shortly after the last time
++ * at which a queue either entered the burst list or was marked as
++ * belonging to the current large burst, then Q is immediately marked
++ * as belonging to a large burst.
++ *
++ * . if a queue Q that does not belong to the burst is activated a while
++ * after, i.e., not shortly after, the last time at which a queue
++ * either entered the burst list or was marked as belonging to the
++ * current large burst, then the current burst is deemed finished and:
++ *
++ * . the large-burst mode is reset if set
++ *
++ * . the burst list is emptied
++ *
++ * . Q is inserted in the burst list, as Q may be the first queue
++ * in a possible new burst (then the burst list contains just Q
++ * after this step).
++ */
++static void bfq_handle_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ bool idle_for_long_time)
++{
++ /*
++ * If bfqq happened to be activated in a burst, but has been idle
++ * for at least as long as an interactive queue, then we assume
++ * that, in the overall I/O initiated in the burst, the I/O
++ * associated with bfqq is finished. So bfqq does not need to be
++ * treated as a queue belonging to a burst anymore. Accordingly,
++ * we reset bfqq's in_large_burst flag if set, and remove bfqq
++ * from the burst list if it's there. We do not, however, decrement
++ * burst_size, because the fact that bfqq does not need to belong
++ * to the burst list any more does not invalidate the fact that
++ * bfqq may have been activated during the current burst.
++ */
++ if (idle_for_long_time) {
++ hlist_del_init(&bfqq->burst_list_node);
++ bfq_clear_bfqq_in_large_burst(bfqq);
++ }
++
++ /*
++ * If bfqq is already in the burst list or is part of a large
++ * burst, then there is nothing else to do.
++ */
++ if (!hlist_unhashed(&bfqq->burst_list_node) ||
++ bfq_bfqq_in_large_burst(bfqq))
++ return;
++
++ /*
++ * If bfqq's activation happens late enough, then the current
++ * burst is finished, and related data structures must be reset.
++ *
++ * In this respect, consider the special case where bfqq is the very
++ * first queue being activated. In this case, last_ins_in_burst is
++ * not yet significant when we get here. But it is easy to verify
++ * that, whether or not the following condition is true, bfqq will
++ * end up being inserted into the burst list. In particular the
++ * list will happen to contain only bfqq. And this is exactly what
++ * has to happen, as bfqq may be the first queue in a possible
++ * burst.
++ */
++ if (time_is_before_jiffies(bfqd->last_ins_in_burst +
++ bfqd->bfq_burst_interval)) {
++ bfqd->large_burst = false;
++ bfq_reset_burst_list(bfqd, bfqq);
++ return;
++ }
++
++ /*
++ * If we get here, then bfqq is being activated shortly after the
++ * last queue. So, if the current burst is also large, we can mark
++ * bfqq as belonging to this large burst immediately.
++ */
++ if (bfqd->large_burst) {
++ bfq_mark_bfqq_in_large_burst(bfqq);
++ return;
++ }
++
++ /*
++ * If we get here, then a large-burst state has not yet been
++ * reached, but bfqq is being activated shortly after the last
++ * queue. Then we add bfqq to the burst.
++ */
++ bfq_add_to_burst(bfqd, bfqq);
++}
++
++static void bfq_add_request(struct request *rq)
++{
++ struct bfq_queue *bfqq = RQ_BFQQ(rq);
++ struct bfq_entity *entity = &bfqq->entity;
++ struct bfq_data *bfqd = bfqq->bfqd;
++ struct request *next_rq, *prev;
++ unsigned long old_wr_coeff = bfqq->wr_coeff;
++ bool interactive = false;
++
++ bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
++ bfqq->queued[rq_is_sync(rq)]++;
++ bfqd->queued++;
++
++ elv_rb_add(&bfqq->sort_list, rq);
++
++ /*
++ * Check if this request is a better next-serve candidate.
++ */
++ prev = bfqq->next_rq;
++ next_rq = bfq_choose_req(bfqd, bfqq->next_rq, rq, bfqd->last_position);
++ BUG_ON(!next_rq);
++ bfqq->next_rq = next_rq;
++
++ if (!bfq_bfqq_busy(bfqq)) {
++ bool soft_rt, in_burst,
++ idle_for_long_time = time_is_before_jiffies(
++ bfqq->budget_timeout +
++ bfqd->bfq_wr_min_idle_time);
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqg_stats_update_io_add(bfqq_group(RQ_BFQQ(rq)), bfqq,
++ rq->cmd_flags);
++#endif
++ if (bfq_bfqq_sync(bfqq)) {
++ bool already_in_burst =
++ !hlist_unhashed(&bfqq->burst_list_node) ||
++ bfq_bfqq_in_large_burst(bfqq);
++ bfq_handle_burst(bfqd, bfqq, idle_for_long_time);
++ /*
++ * If bfqq was not already in the current burst,
++ * then, at this point, bfqq either has been
++ * added to the current burst or has caused the
++ * current burst to terminate. In particular, in
++ * the second case, bfqq has become the first
++ * queue in a possible new burst.
++ * In both cases last_ins_in_burst needs to be
++ * moved forward.
++ */
++ if (!already_in_burst)
++ bfqd->last_ins_in_burst = jiffies;
++ }
++
++ in_burst = bfq_bfqq_in_large_burst(bfqq);
++ soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
++ !in_burst &&
++ time_is_before_jiffies(bfqq->soft_rt_next_start);
++ interactive = !in_burst && idle_for_long_time;
++ entity->budget = max_t(unsigned long, bfqq->max_budget,
++ bfq_serv_to_charge(next_rq, bfqq));
++
++ if (!bfq_bfqq_IO_bound(bfqq)) {
++ if (time_before(jiffies,
++ RQ_BIC(rq)->ttime.last_end_request +
++ bfqd->bfq_slice_idle)) {
++ bfqq->requests_within_timer++;
++ if (bfqq->requests_within_timer >=
++ bfqd->bfq_requests_within_timer)
++ bfq_mark_bfqq_IO_bound(bfqq);
++ } else
++ bfqq->requests_within_timer = 0;
++ }
++
++ if (!bfqd->low_latency)
++ goto add_bfqq_busy;
++
++ /*
++ * If the queue:
++ * - is not being boosted,
++ * - has been idle for enough time,
++ * - is not a sync queue or is linked to a bfq_io_cq (it is
++ * shared "by its nature" or it is not shared and its
++ * requests have not been redirected to a shared queue)
++ * start a weight-raising period.
++ */
++ if (old_wr_coeff == 1 && (interactive || soft_rt) &&
++ (!bfq_bfqq_sync(bfqq) || bfqq->bic)) {
++ bfqq->wr_coeff = bfqd->bfq_wr_coeff;
++ if (interactive)
++ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++ else
++ bfqq->wr_cur_max_time =
++ bfqd->bfq_wr_rt_max_time;
++ bfq_log_bfqq(bfqd, bfqq,
++ "wrais starting at %lu, rais_max_time %u",
++ jiffies,
++ jiffies_to_msecs(bfqq->wr_cur_max_time));
++ } else if (old_wr_coeff > 1) {
++ if (interactive)
++ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++ else if (in_burst ||
++ (bfqq->wr_cur_max_time ==
++ bfqd->bfq_wr_rt_max_time &&
++ !soft_rt)) {
++ bfqq->wr_coeff = 1;
++ bfq_log_bfqq(bfqd, bfqq,
++ "wrais ending at %lu, rais_max_time %u",
++ jiffies,
++ jiffies_to_msecs(bfqq->
++ wr_cur_max_time));
++ } else if (time_before(
++ bfqq->last_wr_start_finish +
++ bfqq->wr_cur_max_time,
++ jiffies +
++ bfqd->bfq_wr_rt_max_time) &&
++ soft_rt) {
++ /*
++ *
++ * The remaining weight-raising time is lower
++ * than bfqd->bfq_wr_rt_max_time, which means
++ * that the application is enjoying weight
++ * raising either because it was deemed soft rt
++ * recently, or because it was deemed interactive
++ * long ago.
++ * In both cases, resetting now the current
++ * remaining weight-raising time for the
++ * application to the weight-raising duration
++ * for soft rt applications would not cause any
++ * latency increase for the application (as the
++ * new duration would be higher than the
++ * remaining time).
++ *
++ * In addition, the application is now meeting
++ * the requirements for being deemed soft rt.
++ * In the end we can correctly and safely
++ * (re)charge the weight-raising duration for
++ * the application with the weight-raising
++ * duration for soft rt applications.
++ *
++ * In particular, doing this recharge now, i.e.,
++ * before the weight-raising period for the
++ * application finishes, reduces the probability
++ * of the following negative scenario:
++ * 1) the weight of a soft rt application is
++ * raised at startup (as for any newly
++ * created application),
++ * 2) since the application is not interactive,
++ * at a certain time weight-raising is
++ * stopped for the application,
++ * 3) at that time the application happens to
++ * still have pending requests, and hence
++ * is destined to not have a chance to be
++ * deemed soft rt before these requests are
++ * completed (see the comments to the
++ * function bfq_bfqq_softrt_next_start()
++ * for details on soft rt detection),
++ * 4) these pending requests experience a high
++ * latency because the application is not
++ * weight-raised while they are pending.
++ */
++ bfqq->last_wr_start_finish = jiffies;
++ bfqq->wr_cur_max_time =
++ bfqd->bfq_wr_rt_max_time;
++ }
++ }
++ if (old_wr_coeff != bfqq->wr_coeff)
++ entity->prio_changed = 1;
++add_bfqq_busy:
++ bfqq->last_idle_bklogged = jiffies;
++ bfqq->service_from_backlogged = 0;
++ bfq_clear_bfqq_softrt_update(bfqq);
++ bfq_add_bfqq_busy(bfqd, bfqq);
++ } else {
++ if (bfqd->low_latency && old_wr_coeff == 1 && !rq_is_sync(rq) &&
++ time_is_before_jiffies(
++ bfqq->last_wr_start_finish +
++ bfqd->bfq_wr_min_inter_arr_async)) {
++ bfqq->wr_coeff = bfqd->bfq_wr_coeff;
++ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++
++ bfqd->wr_busy_queues++;
++ entity->prio_changed = 1;
++ bfq_log_bfqq(bfqd, bfqq,
++ "non-idle wrais starting at %lu, rais_max_time %u",
++ jiffies,
++ jiffies_to_msecs(bfqq->wr_cur_max_time));
++ }
++ if (prev != bfqq->next_rq)
++ bfq_updated_next_req(bfqd, bfqq);
++ }
++
++ if (bfqd->low_latency &&
++ (old_wr_coeff == 1 || bfqq->wr_coeff == 1 || interactive))
++ bfqq->last_wr_start_finish = jiffies;
++}
++
++static struct request *bfq_find_rq_fmerge(struct bfq_data *bfqd,
++ struct bio *bio)
++{
++ struct task_struct *tsk = current;
++ struct bfq_io_cq *bic;
++ struct bfq_queue *bfqq;
++
++ bic = bfq_bic_lookup(bfqd, tsk->io_context);
++ if (!bic)
++ return NULL;
++
++ bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
++ if (bfqq)
++ return elv_rb_find(&bfqq->sort_list, bio_end_sector(bio));
++
++ return NULL;
++}
++
++static void bfq_activate_request(struct request_queue *q, struct request *rq)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++
++ bfqd->rq_in_driver++;
++ bfqd->last_position = blk_rq_pos(rq) + blk_rq_sectors(rq);
++ bfq_log(bfqd, "activate_request: new bfqd->last_position %llu",
++ (long long unsigned)bfqd->last_position);
++}
++
++static void bfq_deactivate_request(struct request_queue *q, struct request *rq)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++
++ BUG_ON(bfqd->rq_in_driver == 0);
++ bfqd->rq_in_driver--;
++}
++
++static void bfq_remove_request(struct request *rq)
++{
++ struct bfq_queue *bfqq = RQ_BFQQ(rq);
++ struct bfq_data *bfqd = bfqq->bfqd;
++ const int sync = rq_is_sync(rq);
++
++ if (bfqq->next_rq == rq) {
++ bfqq->next_rq = bfq_find_next_rq(bfqd, bfqq, rq);
++ bfq_updated_next_req(bfqd, bfqq);
++ }
++
++ if (rq->queuelist.prev != &rq->queuelist)
++ list_del_init(&rq->queuelist);
++ BUG_ON(bfqq->queued[sync] == 0);
++ bfqq->queued[sync]--;
++ bfqd->queued--;
++ elv_rb_del(&bfqq->sort_list, rq);
++
++ if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
++ if (bfq_bfqq_busy(bfqq) && bfqq != bfqd->in_service_queue)
++ bfq_del_bfqq_busy(bfqd, bfqq, 1);
++ /*
++ * Remove queue from request-position tree as it is empty.
++ */
++ if (bfqq->pos_root) {
++ rb_erase(&bfqq->pos_node, bfqq->pos_root);
++ bfqq->pos_root = NULL;
++ }
++ }
++
++ if (rq->cmd_flags & REQ_META) {
++ BUG_ON(bfqq->meta_pending == 0);
++ bfqq->meta_pending--;
++ }
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqg_stats_update_io_remove(bfqq_group(bfqq), rq->cmd_flags);
++#endif
++}
++
++static int bfq_merge(struct request_queue *q, struct request **req,
++ struct bio *bio)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct request *__rq;
++
++ __rq = bfq_find_rq_fmerge(bfqd, bio);
++ if (__rq && elv_rq_merge_ok(__rq, bio)) {
++ *req = __rq;
++ return ELEVATOR_FRONT_MERGE;
++ }
++
++ return ELEVATOR_NO_MERGE;
++}
++
++static void bfq_merged_request(struct request_queue *q, struct request *req,
++ int type)
++{
++ if (type == ELEVATOR_FRONT_MERGE &&
++ rb_prev(&req->rb_node) &&
++ blk_rq_pos(req) <
++ blk_rq_pos(container_of(rb_prev(&req->rb_node),
++ struct request, rb_node))) {
++ struct bfq_queue *bfqq = RQ_BFQQ(req);
++ struct bfq_data *bfqd = bfqq->bfqd;
++ struct request *prev, *next_rq;
++
++ /* Reposition request in its sort_list */
++ elv_rb_del(&bfqq->sort_list, req);
++ elv_rb_add(&bfqq->sort_list, req);
++ /* Choose next request to be served for bfqq */
++ prev = bfqq->next_rq;
++ next_rq = bfq_choose_req(bfqd, bfqq->next_rq, req,
++ bfqd->last_position);
++ BUG_ON(!next_rq);
++ bfqq->next_rq = next_rq;
++ }
++}
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++static void bfq_bio_merged(struct request_queue *q, struct request *req,
++ struct bio *bio)
++{
++ bfqg_stats_update_io_merged(bfqq_group(RQ_BFQQ(req)), bio->bi_rw);
++}
++#endif
++
++static void bfq_merged_requests(struct request_queue *q, struct request *rq,
++ struct request *next)
++{
++ struct bfq_queue *bfqq = RQ_BFQQ(rq), *next_bfqq = RQ_BFQQ(next);
++
++ /*
++ * If next and rq belong to the same bfq_queue and next is older
++ * than rq, then reposition rq in the fifo (by substituting next
++ * with rq). Otherwise, if next and rq belong to different
++ * bfq_queues, never reposition rq: in fact, we would have to
++ * reposition it with respect to next's position in its own fifo,
++ * which would most certainly be too expensive with respect to
++ * the benefits.
++ */
++ if (bfqq == next_bfqq &&
++ !list_empty(&rq->queuelist) && !list_empty(&next->queuelist) &&
++ time_before(next->fifo_time, rq->fifo_time)) {
++ list_del_init(&rq->queuelist);
++ list_replace_init(&next->queuelist, &rq->queuelist);
++ rq->fifo_time = next->fifo_time;
++ }
++
++ if (bfqq->next_rq == next)
++ bfqq->next_rq = rq;
++
++ bfq_remove_request(next);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqg_stats_update_io_merged(bfqq_group(bfqq), next->cmd_flags);
++#endif
++}
++
++/* Must be called with bfqq != NULL */
++static void bfq_bfqq_end_wr(struct bfq_queue *bfqq)
++{
++ BUG_ON(!bfqq);
++ if (bfq_bfqq_busy(bfqq))
++ bfqq->bfqd->wr_busy_queues--;
++ bfqq->wr_coeff = 1;
++ bfqq->wr_cur_max_time = 0;
++ /* Trigger a weight change on the next activation of the queue */
++ bfqq->entity.prio_changed = 1;
++}
++
++static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
++ struct bfq_group *bfqg)
++{
++ int i, j;
++
++ for (i = 0; i < 2; i++)
++ for (j = 0; j < IOPRIO_BE_NR; j++)
++ if (bfqg->async_bfqq[i][j])
++ bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
++ if (bfqg->async_idle_bfqq)
++ bfq_bfqq_end_wr(bfqg->async_idle_bfqq);
++}
++
++static void bfq_end_wr(struct bfq_data *bfqd)
++{
++ struct bfq_queue *bfqq;
++
++ spin_lock_irq(bfqd->queue->queue_lock);
++
++ list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
++ bfq_bfqq_end_wr(bfqq);
++ list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
++ bfq_bfqq_end_wr(bfqq);
++ bfq_end_wr_async(bfqd);
++
++ spin_unlock_irq(bfqd->queue->queue_lock);
++}
++
++static int bfq_allow_merge(struct request_queue *q, struct request *rq,
++ struct bio *bio)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct bfq_io_cq *bic;
++
++ /*
++ * Disallow merge of a sync bio into an async request.
++ */
++ if (bfq_bio_sync(bio) && !rq_is_sync(rq))
++ return 0;
++
++ /*
++ * Lookup the bfqq that this bio will be queued with. Allow
++ * merge only if rq is queued there.
++ * Queue lock is held here.
++ */
++ bic = bfq_bic_lookup(bfqd, current->io_context);
++ if (!bic)
++ return 0;
++
++ return bic_to_bfqq(bic, bfq_bio_sync(bio)) == RQ_BFQQ(rq);
++}
++
++static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ if (bfqq) {
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqg_stats_update_avg_queue_size(bfqq_group(bfqq));
++#endif
++ bfq_mark_bfqq_must_alloc(bfqq);
++ bfq_mark_bfqq_budget_new(bfqq);
++ bfq_clear_bfqq_fifo_expire(bfqq);
++
++ bfqd->budgets_assigned = (bfqd->budgets_assigned*7 + 256) / 8;
++
++ bfq_log_bfqq(bfqd, bfqq,
++ "set_in_service_queue, cur-budget = %d",
++ bfqq->entity.budget);
++ }
++
++ bfqd->in_service_queue = bfqq;
++}
++
++/*
++ * Get and set a new queue for service.
++ */
++static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd)
++{
++ struct bfq_queue *bfqq = bfq_get_next_queue(bfqd);
++
++ __bfq_set_in_service_queue(bfqd, bfqq);
++ return bfqq;
++}
++
++/*
++ * If enough samples have been computed, return the current max budget
++ * stored in bfqd, which is dynamically updated according to the
++ * estimated disk peak rate; otherwise return the default max budget
++ */
++static int bfq_max_budget(struct bfq_data *bfqd)
++{
++ if (bfqd->budgets_assigned < bfq_stats_min_budgets)
++ return bfq_default_max_budget;
++ else
++ return bfqd->bfq_max_budget;
++}
++
++/*
++ * Return min budget, which is a fraction of the current or default
++ * max budget (trying with 1/32)
++ */
++static int bfq_min_budget(struct bfq_data *bfqd)
++{
++ if (bfqd->budgets_assigned < bfq_stats_min_budgets)
++ return bfq_default_max_budget / 32;
++ else
++ return bfqd->bfq_max_budget / 32;
++}
++
++static void bfq_arm_slice_timer(struct bfq_data *bfqd)
++{
++ struct bfq_queue *bfqq = bfqd->in_service_queue;
++ struct bfq_io_cq *bic;
++ unsigned long sl;
++
++ BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
++
++ /* Processes have exited, don't wait. */
++ bic = bfqd->in_service_bic;
++ if (!bic || atomic_read(&bic->icq.ioc->active_ref) == 0)
++ return;
++
++ bfq_mark_bfqq_wait_request(bfqq);
++
++ /*
++ * We don't want to idle for seeks, but we do want to allow
++ * fair distribution of slice time for a process doing back-to-back
++ * seeks. So allow a little bit of time for it to submit a new rq.
++ *
++ * To prevent processes with (partly) seeky workloads from
++ * being too ill-treated, grant them a small fraction of the
++ * assigned budget before reducing the waiting time to
++ * BFQ_MIN_TT. This happened to help reduce latency.
++ */
++ sl = bfqd->bfq_slice_idle;
++ /*
++ * Unless the queue is being weight-raised or the scenario is
++ * asymmetric, grant only minimum idle time if the queue either
++ * has been seeky for long enough or has already proved to be
++ * constantly seeky.
++ */
++ if (bfq_sample_valid(bfqq->seek_samples) &&
++ ((BFQQ_SEEKY(bfqq) && bfqq->entity.service >
++ bfq_max_budget(bfqq->bfqd) / 8) ||
++ bfq_bfqq_constantly_seeky(bfqq)) && bfqq->wr_coeff == 1 &&
++ bfq_symmetric_scenario(bfqd))
++ sl = min(sl, msecs_to_jiffies(BFQ_MIN_TT));
++ else if (bfqq->wr_coeff > 1)
++ sl = sl * 3;
++ bfqd->last_idling_start = ktime_get();
++ mod_timer(&bfqd->idle_slice_timer, jiffies + sl);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqg_stats_set_start_idle_time(bfqq_group(bfqq));
++#endif
++ bfq_log(bfqd, "arm idle: %u/%u ms",
++ jiffies_to_msecs(sl), jiffies_to_msecs(bfqd->bfq_slice_idle));
++}
++
++/*
++ * Set the maximum time for the in-service queue to consume its
++ * budget. This prevents seeky processes from lowering the disk
++ * throughput (always guaranteed with a time slice scheme as in CFQ).
++ */
++static void bfq_set_budget_timeout(struct bfq_data *bfqd)
++{
++ struct bfq_queue *bfqq = bfqd->in_service_queue;
++ unsigned int timeout_coeff;
++ if (bfqq->wr_cur_max_time == bfqd->bfq_wr_rt_max_time)
++ timeout_coeff = 1;
++ else
++ timeout_coeff = bfqq->entity.weight / bfqq->entity.orig_weight;
++
++ bfqd->last_budget_start = ktime_get();
++
++ bfq_clear_bfqq_budget_new(bfqq);
++ bfqq->budget_timeout = jiffies +
++ bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] * timeout_coeff;
++
++ bfq_log_bfqq(bfqd, bfqq, "set budget_timeout %u",
++ jiffies_to_msecs(bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] *
++ timeout_coeff));
++}
++
++/*
++ * Move request from internal lists to the request queue dispatch list.
++ */
++static void bfq_dispatch_insert(struct request_queue *q, struct request *rq)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct bfq_queue *bfqq = RQ_BFQQ(rq);
++
++ /*
++ * For consistency, the next instruction should have been executed
++ * after removing the request from the queue and dispatching it.
++ * We instead execute this instruction before bfq_remove_request()
++ * (and hence introduce a temporary inconsistency), for efficiency.
++ * In fact, in a forced_dispatch, this prevents two counters related
++ * to bfqq->dispatched from being uselessly decremented if bfqq
++ * is not in service, and then incremented again after
++ * incrementing bfqq->dispatched.
++ */
++ bfqq->dispatched++;
++ bfq_remove_request(rq);
++ elv_dispatch_sort(q, rq);
++
++ if (bfq_bfqq_sync(bfqq))
++ bfqd->sync_flight++;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqg_stats_update_dispatch(bfqq_group(bfqq), blk_rq_bytes(rq),
++ rq->cmd_flags);
++#endif
++}
++
++/*
++ * Return expired entry, or NULL to just start from scratch in rbtree.
++ */
++static struct request *bfq_check_fifo(struct bfq_queue *bfqq)
++{
++ struct request *rq = NULL;
++
++ if (bfq_bfqq_fifo_expire(bfqq))
++ return NULL;
++
++ bfq_mark_bfqq_fifo_expire(bfqq);
++
++ if (list_empty(&bfqq->fifo))
++ return NULL;
++
++ rq = rq_entry_fifo(bfqq->fifo.next);
++
++ if (time_before(jiffies, rq->fifo_time))
++ return NULL;
++
++ return rq;
++}
++
++static int bfq_bfqq_budget_left(struct bfq_queue *bfqq)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++ return entity->budget - entity->service;
++}
++
++static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ BUG_ON(bfqq != bfqd->in_service_queue);
++
++ __bfq_bfqd_reset_in_service(bfqd);
++
++ if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
++ /*
++ * Overloading budget_timeout field to store the time
++ * at which the queue remains with no backlog; used by
++ * the weight-raising mechanism.
++ */
++ bfqq->budget_timeout = jiffies;
++ bfq_del_bfqq_busy(bfqd, bfqq, 1);
++ } else
++ bfq_activate_bfqq(bfqd, bfqq);
++}
++
++/**
++ * __bfq_bfqq_recalc_budget - try to adapt the budget to the @bfqq behavior.
++ * @bfqd: device data.
++ * @bfqq: queue to update.
++ * @reason: reason for expiration.
++ *
++ * Handle the feedback on @bfqq budget at queue expiration.
++ * See the body for detailed comments.
++ */
++static void __bfq_bfqq_recalc_budget(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ enum bfqq_expiration reason)
++{
++ struct request *next_rq;
++ int budget, min_budget;
++
++ budget = bfqq->max_budget;
++ min_budget = bfq_min_budget(bfqd);
++
++ BUG_ON(bfqq != bfqd->in_service_queue);
++
++ bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last budg %d, budg left %d",
++ bfqq->entity.budget, bfq_bfqq_budget_left(bfqq));
++ bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last max_budg %d, min budg %d",
++ budget, bfq_min_budget(bfqd));
++ bfq_log_bfqq(bfqd, bfqq, "recalc_budg: sync %d, seeky %d",
++ bfq_bfqq_sync(bfqq), BFQQ_SEEKY(bfqd->in_service_queue));
++
++ if (bfq_bfqq_sync(bfqq)) {
++ switch (reason) {
++ /*
++ * Caveat: in all the following cases we trade latency
++ * for throughput.
++ */
++ case BFQ_BFQQ_TOO_IDLE:
++ /*
++ * This is the only case where we may reduce
++ * the budget: if there is no request of the
++ * process still waiting for completion, then
++ * we assume (tentatively) that the timer has
++ * expired because the batch of requests of
++ * the process could have been served with a
++ * smaller budget. Hence, betting that the
++ * process will behave in the same way when it
++ * becomes backlogged again, we reduce its
++ * next budget. As long as we guess right,
++ * this budget cut reduces the latency
++ * experienced by the process.
++ *
++ * However, if there are still outstanding
++ * requests, then the process may not yet have
++ * issued its next request just because it is
++ * still waiting for the completion of some of
++ * the still outstanding ones. So in this
++ * subcase we do not reduce its budget; on the
++ * contrary, we increase it to possibly boost
++ * the throughput, as discussed in the
++ * comments to the BUDGET_TIMEOUT case.
++ */
++ if (bfqq->dispatched > 0) /* still outstanding reqs */
++ budget = min(budget * 2, bfqd->bfq_max_budget);
++ else {
++ if (budget > 5 * min_budget)
++ budget -= 4 * min_budget;
++ else
++ budget = min_budget;
++ }
++ break;
++ case BFQ_BFQQ_BUDGET_TIMEOUT:
++ /*
++ * We double the budget here because: 1) it
++ * gives the chance to boost the throughput if
++ * this is not a seeky process (which may have
++ * bumped into this timeout because of, e.g.,
++ * ZBR), 2) together with charge_full_budget
++ * it helps give seeky processes higher
++ * timestamps, and hence be served less
++ * frequently.
++ */
++ budget = min(budget * 2, bfqd->bfq_max_budget);
++ break;
++ case BFQ_BFQQ_BUDGET_EXHAUSTED:
++ /*
++ * The process still has backlog, and did not
++ * let either the budget timeout or the disk
++ * idling timeout expire. Hence it is not
++ * seeky, has a short thinktime and may be
++ * happy with a higher budget too. So
++ * definitely increase the budget of this good
++ * candidate to boost the disk throughput.
++ */
++ budget = min(budget * 4, bfqd->bfq_max_budget);
++ break;
++ case BFQ_BFQQ_NO_MORE_REQUESTS:
++ /*
++ * Leave the budget unchanged.
++ */
++ default:
++ return;
++ }
++ } else
++ /*
++ * Async queues always get the maximum possible budget
++ * (their ability to dispatch is limited by
++ * @bfqd->bfq_max_budget_async_rq).
++ */
++ budget = bfqd->bfq_max_budget;
++
++ bfqq->max_budget = budget;
++
++ if (bfqd->budgets_assigned >= bfq_stats_min_budgets &&
++ !bfqd->bfq_user_max_budget)
++ bfqq->max_budget = min(bfqq->max_budget, bfqd->bfq_max_budget);
++
++ /*
++ * Make sure that we have enough budget for the next request.
++ * Since the finish time of the bfqq must be kept in sync with
++ * the budget, be sure to call __bfq_bfqq_expire() after the
++ * update.
++ */
++ next_rq = bfqq->next_rq;
++ if (next_rq)
++ bfqq->entity.budget = max_t(unsigned long, bfqq->max_budget,
++ bfq_serv_to_charge(next_rq, bfqq));
++ else
++ bfqq->entity.budget = bfqq->max_budget;
++
++ bfq_log_bfqq(bfqd, bfqq, "head sect: %u, new budget %d",
++ next_rq ? blk_rq_sectors(next_rq) : 0,
++ bfqq->entity.budget);
++}
++
++static unsigned long bfq_calc_max_budget(u64 peak_rate, u64 timeout)
++{
++ unsigned long max_budget;
++
++ /*
++ * The max_budget calculated when autotuning is equal to the
++ * number of sectors transferred in timeout_sync at the
++ * estimated peak rate.
++ */
++ max_budget = (unsigned long)(peak_rate * 1000 *
++ timeout >> BFQ_RATE_SHIFT);
++
++ return max_budget;
++}
++
++/*
++ * In addition to updating the peak rate, checks whether the process
++ * is "slow", and returns 1 if so. This slow flag is used, in addition
++ * to the budget timeout, to reduce the amount of service provided to
++ * seeky processes, and hence reduce their chances to lower the
++ * throughput. See the code for more details.
++ */
++static bool bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ bool compensate, enum bfqq_expiration reason)
++{
++ u64 bw, usecs, expected, timeout;
++ ktime_t delta;
++ int update = 0;
++
++ if (!bfq_bfqq_sync(bfqq) || bfq_bfqq_budget_new(bfqq))
++ return false;
++
++ if (compensate)
++ delta = bfqd->last_idling_start;
++ else
++ delta = ktime_get();
++ delta = ktime_sub(delta, bfqd->last_budget_start);
++ usecs = ktime_to_us(delta);
++
++ /* Don't trust short/unrealistic values. */
++ if (usecs < 100 || usecs >= LONG_MAX)
++ return false;
++
++ /*
++ * Calculate the bandwidth for the last slice. We use a 64 bit
++ * value to store the peak rate, in sectors per usec in fixed
++ * point math. We do so to have enough precision in the estimate
++ * and to avoid overflows.
++ */
++ bw = (u64)bfqq->entity.service << BFQ_RATE_SHIFT;
++ do_div(bw, (unsigned long)usecs);
++
++ timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
++
++ /*
++ * Use only long (> 20ms) intervals to filter out spikes for
++ * the peak rate estimation.
++ */
++ if (usecs > 20000) {
++ if (bw > bfqd->peak_rate ||
++ (!BFQQ_SEEKY(bfqq) &&
++ reason == BFQ_BFQQ_BUDGET_TIMEOUT)) {
++ bfq_log(bfqd, "measured bw =%llu", bw);
++ /*
++ * To smooth oscillations use a low-pass filter with
++ * alpha=7/8, i.e.,
++ * new_rate = (7/8) * old_rate + (1/8) * bw
++ */
++ do_div(bw, 8);
++ if (bw == 0)
++ return false;
++ bfqd->peak_rate *= 7;
++ do_div(bfqd->peak_rate, 8);
++ bfqd->peak_rate += bw;
++ update = 1;
++ bfq_log(bfqd, "new peak_rate=%llu", bfqd->peak_rate);
++ }
++
++ update |= bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES - 1;
++
++ if (bfqd->peak_rate_samples < BFQ_PEAK_RATE_SAMPLES)
++ bfqd->peak_rate_samples++;
++
++ if (bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES &&
++ update) {
++ int dev_type = blk_queue_nonrot(bfqd->queue);
++ if (bfqd->bfq_user_max_budget == 0) {
++ bfqd->bfq_max_budget =
++ bfq_calc_max_budget(bfqd->peak_rate,
++ timeout);
++ bfq_log(bfqd, "new max_budget=%d",
++ bfqd->bfq_max_budget);
++ }
++ if (bfqd->device_speed == BFQ_BFQD_FAST &&
++ bfqd->peak_rate < device_speed_thresh[dev_type]) {
++ bfqd->device_speed = BFQ_BFQD_SLOW;
++ bfqd->RT_prod = R_slow[dev_type] *
++ T_slow[dev_type];
++ } else if (bfqd->device_speed == BFQ_BFQD_SLOW &&
++ bfqd->peak_rate > device_speed_thresh[dev_type]) {
++ bfqd->device_speed = BFQ_BFQD_FAST;
++ bfqd->RT_prod = R_fast[dev_type] *
++ T_fast[dev_type];
++ }
++ }
++ }
++
++ /*
++ * If the process has been served for too short a time
++ * interval to let its possible sequential accesses prevail over
++ * the initial seek time needed to move the disk head to the
++ * first sector it requested, then give the process a chance
++ * and for the moment return false.
++ */
++ if (bfqq->entity.budget <= bfq_max_budget(bfqd) / 8)
++ return false;
++
++ /*
++ * A process is considered ``slow'' (i.e., seeky, so that we
++ * cannot treat it fairly in the service domain, as it would
++ * slow down the other processes too much) if, when a slice
++ * ends for whatever reason, it has received service at a
++ * rate that would not be high enough to complete the budget
++ * before the budget timeout expiration.
++ */
++ expected = bw * 1000 * timeout >> BFQ_RATE_SHIFT;
++
++ /*
++ * Caveat: processes doing IO in the slower disk zones will
++ * tend to be slow(er) even if not seeky. And the estimated
++ * peak rate will actually be an average over the disk
++ * surface. Hence, to not be too harsh with unlucky processes,
++ * we keep a budget/3 margin of safety before declaring a
++ * process slow.
++ */
++ return expected > (4 * bfqq->entity.budget) / 3;
++}
++
++/*
++ * To be deemed as soft real-time, an application must meet two
++ * requirements. First, the application must not require an average
++ * bandwidth higher than the approximate bandwidth required to play back or
++ * record a compressed high-definition video.
++ * The next function is invoked on the completion of the last request of a
++ * batch, to compute the next-start time instant, soft_rt_next_start, such
++ * that, if the next request of the application does not arrive before
++ * soft_rt_next_start, then the above requirement on the bandwidth is met.
++ *
++ * The second requirement is that the request pattern of the application is
++ * isochronous, i.e., that, after issuing a request or a batch of requests,
++ * the application stops issuing new requests until all its pending requests
++ * have been completed. After that, the application may issue a new batch,
++ * and so on.
++ * For this reason the next function is invoked to compute
++ * soft_rt_next_start only for applications that meet this requirement,
++ * whereas soft_rt_next_start is set to infinity for applications that do
++ * not.
++ *
++ * Unfortunately, even a greedy application may happen to behave in an
++ * isochronous way if the CPU load is high. In fact, the application may
++ * stop issuing requests while the CPUs are busy serving other processes,
++ * then restart, then stop again for a while, and so on. In addition, if
++ * the disk achieves a low enough throughput with the request pattern
++ * issued by the application (e.g., because the request pattern is random
++ * and/or the device is slow), then the application may meet the above
++ * bandwidth requirement too. To prevent such a greedy application from
++ * being deemed soft real-time, a further rule is used in the computation of
++ * soft_rt_next_start: soft_rt_next_start must be higher than the current
++ * time plus the maximum time for which the arrival of a request is waited
++ * for when a sync queue becomes idle, namely bfqd->bfq_slice_idle.
++ * This filters out greedy applications, as the latter issue instead their
++ * next request as soon as possible after the last one has been completed
++ * (in contrast, when a batch of requests is completed, a soft real-time
++ * application spends some time processing data).
++ *
++ * Unfortunately, the last filter may easily generate false positives if
++ * only bfqd->bfq_slice_idle is used as a reference time interval and one
++ * or both of the following cases occur:
++ * 1) HZ is so low that the duration of a jiffy is comparable to or higher
++ * than bfqd->bfq_slice_idle. This happens, e.g., on slow devices with
++ * HZ=100.
++ * 2) jiffies, instead of increasing at a constant rate, may stop increasing
++ * for a while, then suddenly 'jump' by several units to recover the lost
++ * increments. This seems to happen, e.g., inside virtual machines.
++ * To address this issue, we do not use as a reference time interval just
++ * bfqd->bfq_slice_idle, but bfqd->bfq_slice_idle plus a few jiffies. In
++ * particular we add the minimum number of jiffies for which the filter
++ * seems to be quite precise also in embedded systems and KVM/QEMU virtual
++ * machines.
++ */
++static unsigned long bfq_bfqq_softrt_next_start(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ return max(bfqq->last_idle_bklogged +
++ HZ * bfqq->service_from_backlogged /
++ bfqd->bfq_wr_max_softrt_rate,
++ jiffies + bfqq->bfqd->bfq_slice_idle + 4);
++}
++
++/*
++ * Return the largest-possible time instant such that, for as long as possible,
++ * the current time will be lower than this time instant according to the macro
++ * time_is_before_jiffies().
++ */
++static unsigned long bfq_infinity_from_now(unsigned long now)
++{
++ return now + ULONG_MAX / 2;
++}
++
++/**
++ * bfq_bfqq_expire - expire a queue.
++ * @bfqd: device owning the queue.
++ * @bfqq: the queue to expire.
++ * @compensate: if true, compensate for the time spent idling.
++ * @reason: the reason causing the expiration.
++ *
++ *
++ * If the process associated with the queue is slow (i.e., seeky), or in
++ * case of budget timeout, or, finally, if it is async, we
++ * artificially charge it an entire budget (independently of the
++ * actual service it received). As a consequence, the queue will get
++ * higher timestamps than the correct ones upon reactivation, and
++ * hence it will be rescheduled as if it had received more service
++ * than what it actually received. In the end, this class of processes
++ * will receive less service in proportion to how slowly they consume
++ * their budgets (and hence how seriously they tend to lower the
++ * throughput).
++ *
++ * In contrast, when a queue expires because it has been idling for
++ * too long or because it exhausted its budget, we do not touch the
++ * amount of service it has received. Hence, when the queue is
++ * reactivated and its timestamps are updated, the latter will be in sync
++ * with the actual service received by the queue until expiration.
++ *
++ * Charging a full budget to the first type of queues and the exact
++ * service to the others has the effect of using the WF2Q+ policy to
++ * schedule the former on a timeslice basis, without violating the
++ * service domain guarantees of the latter.
++ */
++static void bfq_bfqq_expire(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ bool compensate,
++ enum bfqq_expiration reason)
++{
++ bool slow;
++ BUG_ON(bfqq != bfqd->in_service_queue);
++
++ /*
++ * Update disk peak rate for autotuning and check whether the
++ * process is slow (see bfq_update_peak_rate).
++ */
++ slow = bfq_update_peak_rate(bfqd, bfqq, compensate, reason);
++
++ /*
++ * As explained above, 'punish' slow (i.e., seeky), timed-out
++ * and async queues, to favor sequential sync workloads.
++ *
++ * Processes doing I/O in the slower disk zones will tend to be
++ * slow(er) even if not seeky. Hence, since the estimated peak
++ * rate is actually an average over the disk surface, these
++ * processes may timeout just for bad luck. To avoid punishing
++ * them we do not charge a full budget to a process that
++ * succeeded in consuming at least 2/3 of its budget.
++ */
++ if (slow || (reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
++ bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3))
++ bfq_bfqq_charge_full_budget(bfqq);
++
++ bfqq->service_from_backlogged += bfqq->entity.service;
++
++ if (BFQQ_SEEKY(bfqq) && reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
++ !bfq_bfqq_constantly_seeky(bfqq)) {
++ bfq_mark_bfqq_constantly_seeky(bfqq);
++ if (!blk_queue_nonrot(bfqd->queue))
++ bfqd->const_seeky_busy_in_flight_queues++;
++ }
++
++ if (reason == BFQ_BFQQ_TOO_IDLE &&
++ bfqq->entity.service <= 2 * bfqq->entity.budget / 10)
++ bfq_clear_bfqq_IO_bound(bfqq);
++
++ if (bfqd->low_latency && bfqq->wr_coeff == 1)
++ bfqq->last_wr_start_finish = jiffies;
++
++ if (bfqd->low_latency && bfqd->bfq_wr_max_softrt_rate > 0 &&
++ RB_EMPTY_ROOT(&bfqq->sort_list)) {
++ /*
++ * If we get here, and there are no outstanding requests,
++ * then the request pattern is isochronous (see the comments
++ * to the function bfq_bfqq_softrt_next_start()). Hence we
++ * can compute soft_rt_next_start. If, instead, the queue
++ * still has outstanding requests, then we have to wait
++ * for the completion of all the outstanding requests to
++ * discover whether the request pattern is actually
++ * isochronous.
++ */
++ if (bfqq->dispatched == 0)
++ bfqq->soft_rt_next_start =
++ bfq_bfqq_softrt_next_start(bfqd, bfqq);
++ else {
++ /*
++ * The application is still waiting for the
++ * completion of one or more requests:
++ * prevent it from possibly being incorrectly
++ * deemed as soft real-time by setting its
++ * soft_rt_next_start to infinity. In fact,
++ * without this assignment, the application
++ * would be incorrectly deemed as soft
++ * real-time if:
++ * 1) it issued a new request before the
++ * completion of all its in-flight
++ * requests, and
++ * 2) at that time, its soft_rt_next_start
++ * happened to be in the past.
++ */
++ bfqq->soft_rt_next_start =
++ bfq_infinity_from_now(jiffies);
++ /*
++ * Schedule an update of soft_rt_next_start to when
++ * the task may be discovered to be isochronous.
++ */
++ bfq_mark_bfqq_softrt_update(bfqq);
++ }
++ }
++
++ bfq_log_bfqq(bfqd, bfqq,
++ "expire (%d, slow %d, num_disp %d, idle_win %d)", reason,
++ slow, bfqq->dispatched, bfq_bfqq_idle_window(bfqq));
++
++ /*
++ * Increase, decrease or leave budget unchanged according to
++ * reason.
++ */
++ __bfq_bfqq_recalc_budget(bfqd, bfqq, reason);
++ __bfq_bfqq_expire(bfqd, bfqq);
++}
++
++/*
++ * Budget timeout is not implemented through a dedicated timer, but
++ * just checked on request arrivals and completions, as well as on
++ * idle timer expirations.
++ */
++static bool bfq_bfqq_budget_timeout(struct bfq_queue *bfqq)
++{
++ if (bfq_bfqq_budget_new(bfqq) ||
++ time_before(jiffies, bfqq->budget_timeout))
++ return false;
++ return true;
++}
++
++/*
++ * If we expire a queue that is waiting for the arrival of a new
++ * request, we may prevent the fictitious timestamp back-shifting that
++ * allows the guarantees of the queue to be preserved (see [1] for
++ * this tricky aspect). Hence we return true only if this condition
++ * does not hold, or if the queue is slow enough to deserve only to be
++ * kicked off for preserving a high throughput.
++ */
++static bool bfq_may_expire_for_budg_timeout(struct bfq_queue *bfqq)
++{
++ bfq_log_bfqq(bfqq->bfqd, bfqq,
++ "may_budget_timeout: wait_request %d left %d timeout %d",
++ bfq_bfqq_wait_request(bfqq),
++ bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3,
++ bfq_bfqq_budget_timeout(bfqq));
++
++ return (!bfq_bfqq_wait_request(bfqq) ||
++ bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3)
++ &&
++ bfq_bfqq_budget_timeout(bfqq);
++}
++
++/*
++ * For a queue that becomes empty, device idling is allowed only if
++ * this function returns true for that queue. As a consequence, since
++ * device idling plays a critical role for both throughput boosting
++ * and service guarantees, the return value of this function plays a
++ * critical role as well.
++ *
++ * In a nutshell, this function returns true only if idling is
++ * beneficial for throughput or, even if detrimental for throughput,
++ * idling is however necessary to preserve service guarantees (low
++ * latency, desired throughput distribution, ...). In particular, on
++ * NCQ-capable devices, this function tries to return false, so as to
++ * help keep the drives' internal queues full, whenever this helps the
++ * device boost the throughput without causing any service-guarantee
++ * issue.
++ *
++ * In more detail, the return value of this function is obtained by,
++ * first, computing a number of boolean variables that take into
++ * account throughput and service-guarantee issues, and, then,
++ * combining these variables in a logical expression. Most of the
++ * issues taken into account are not trivial. We discuss these issues
++ * while introducing the variables.
++ */
++static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
++{
++ struct bfq_data *bfqd = bfqq->bfqd;
++ bool idling_boosts_thr, idling_boosts_thr_without_issues,
++ all_queues_seeky, on_hdd_and_not_all_queues_seeky,
++ idling_needed_for_service_guarantees,
++ asymmetric_scenario;
++
++ /*
++ * The next variable takes into account the cases where idling
++ * boosts the throughput.
++ *
++ * The value of the variable is computed considering, first, that
++ * idling is virtually always beneficial for the throughput if:
++ * (a) the device is not NCQ-capable, or
++ * (b) regardless of the presence of NCQ, the device is rotational
++ * and the request pattern for bfqq is I/O-bound and sequential.
++ *
++ * Secondly, and in contrast to the above item (b), idling an
++ * NCQ-capable flash-based device would not boost the
++ * throughput even with sequential I/O; rather it would lower
++ * the throughput in proportion to how fast the device
++ * is. Accordingly, the next variable is true if any of the
++ * above conditions (a) and (b) is true, and, in particular,
++ * happens to be false if bfqd is an NCQ-capable flash-based
++ * device.
++ */
++ idling_boosts_thr = !bfqd->hw_tag ||
++ (!blk_queue_nonrot(bfqd->queue) && bfq_bfqq_IO_bound(bfqq) &&
++ bfq_bfqq_idle_window(bfqq));
++
++ /*
++ * The value of the next variable,
++ * idling_boosts_thr_without_issues, is equal to that of
++ * idling_boosts_thr, unless a special case holds. In this
++ * special case, described below, idling may cause problems to
++ * weight-raised queues.
++ *
++ * When the request pool is saturated (e.g., in the presence
++ * of write hogs), if the processes associated with
++ * non-weight-raised queues ask for requests at a lower rate,
++ * then processes associated with weight-raised queues have a
++ * higher probability to get a request from the pool
++ * immediately (or at least soon) when they need one. Thus
++ * they have a higher probability to actually get a fraction
++ * of the device throughput proportional to their high
++ * weight. This is especially true with NCQ-capable drives,
++ * which enqueue several requests in advance, and further
++ * reorder internally-queued requests.
++ *
++ * For this reason, we force to false the value of
++ * idling_boosts_thr_without_issues if there are weight-raised
++ * busy queues. In this case, and if bfqq is not weight-raised,
++ * this guarantees that the device is not idled for bfqq (if,
++ * instead, bfqq is weight-raised, then idling will be
++ * guaranteed by another variable, see below). Combined with
++ * the timestamping rules of BFQ (see [1] for details), this
++ * behavior causes bfqq, and hence any sync non-weight-raised
++ * queue, to get a lower number of requests served, and thus
++ * to ask for a lower number of requests from the request
++ * pool, before the busy weight-raised queues get served
++ * again. This often mitigates starvation problems in the
++ * presence of heavy write workloads and NCQ, thereby
++ * guaranteeing a higher application and system responsiveness
++ * in these hostile scenarios.
++ */
++ idling_boosts_thr_without_issues = idling_boosts_thr &&
++ bfqd->wr_busy_queues == 0;
++
++ /*
++ * There are then two cases where idling must be performed not
++ * for throughput concerns, but to preserve service
++ * guarantees. In the description of these cases, we say, for
++ * short, that a queue is sequential/random if the process
++ * associated with the queue issues sequential/random requests
++ * (in the second case the queue may be tagged as seeky or
++ * even constantly_seeky).
++ *
++ * To introduce the first case, we note that, since
++ * bfq_bfqq_idle_window(bfqq) is false if the device is
++ * NCQ-capable and bfqq is random (see
++ * bfq_update_idle_window()), then, from the above two
++ * assignments it follows that
++ * idling_boosts_thr_without_issues is false if the device is
++ * NCQ-capable and bfqq is random. Therefore, for this case,
++ * device idling would never be allowed if we used just
++ * idling_boosts_thr_without_issues to decide whether to allow
++ * it. And, beneficially, this would imply that throughput
++ * would always be boosted also with random I/O on NCQ-capable
++ * HDDs.
++ *
++ * But we must be careful on this point, to avoid an unfair
++ * treatment for bfqq. In fact, because of the same above
++ * assignments, idling_boosts_thr_without_issues is, on the
++ * other hand, true if 1) the device is an HDD and bfqq is
++ * sequential, and 2) there are no busy weight-raised
++ * queues. As a consequence, if we used just
++ * idling_boosts_thr_without_issues to decide whether to idle
++ * the device, then with an HDD we might easily bump into a
++ * scenario where queues that are sequential and I/O-bound
++ * would enjoy idling, whereas random queues would not. The
++ * latter might then get a low share of the device throughput,
++ * simply because the former would get many requests served
++ * after being set as in service, while the latter would not.
++ *
++ * To address this issue, we start by setting to true a
++ * sentinel variable, on_hdd_and_not_all_queues_seeky, if the
++ * device is rotational and not all queues with pending or
++ * in-flight requests are constantly seeky (i.e., there are
++ * active sequential queues, and bfqq might then be mistreated
++ * if it does not enjoy idling because it is random).
++ */
++ all_queues_seeky = bfq_bfqq_constantly_seeky(bfqq) &&
++ bfqd->busy_in_flight_queues ==
++ bfqd->const_seeky_busy_in_flight_queues;
++
++ on_hdd_and_not_all_queues_seeky =
++ !blk_queue_nonrot(bfqd->queue) && !all_queues_seeky;
++
++ /*
++ * To introduce the second case where idling needs to be
++ * performed to preserve service guarantees, we can note that
++ * allowing the drive to enqueue more than one request at a
++ * time, and hence delegating de facto final scheduling
++ * decisions to the drive's internal scheduler, causes loss of
++ * control on the actual request service order. In particular,
++ * the critical situation is when requests from different
++ * processes happen to be present, at the same time, in the
++ * internal queue(s) of the drive. In such a situation, the
++ * drive, by deciding the service order of the
++ * internally-queued requests, does determine also the actual
++ * throughput distribution among these processes. But the
++ * drive typically has no notion or concern about per-process
++ * throughput distribution, and makes its decisions only on a
++ * per-request basis. Therefore, the service distribution
++ * enforced by the drive's internal scheduler is likely to
++ * coincide with the desired device-throughput distribution
++ * only in a completely symmetric scenario where:
++ * (i) each of these processes must get the same throughput as
++ * the others;
++ * (ii) all these processes have the same I/O pattern
++ * (either sequential or random).
++ * In fact, in such a scenario, the drive will tend to treat
++ * the requests of each of these processes in about the same
++ * way as the requests of the others, and thus to provide
++ * each of these processes with about the same throughput
++ * (which is exactly the desired throughput distribution). In
++ * contrast, in any asymmetric scenario, device idling is
++ * certainly needed to guarantee that bfqq receives its
++ * assigned fraction of the device throughput (see [1] for
++ * details).
++ *
++ * We address this issue by controlling, actually, only the
++ * symmetry sub-condition (i), i.e., provided that
++ * sub-condition (i) holds, idling is not performed,
++ * regardless of whether sub-condition (ii) holds. In other
++ * words, idling is allowed only if sub-condition (i) does not
++ * hold, and the device then tends to be prevented from queueing
++ * many requests, possibly of several processes. The reason
++ * for not controlling also sub-condition (ii) is that, first,
++ * in the case of an HDD, the asymmetry in terms of types of
++ * I/O patterns is already taken into account in the above
++ * sentinel variable
++ * on_hdd_and_not_all_queues_seeky. Secondly, in the case of a
++ * flash-based device, we prefer however to privilege
++ * throughput (and idling lowers throughput for this type of
++ * devices), for the following reasons:
++ * 1) differently from HDDs, the service time of random
++ * requests is not orders of magnitude higher than the service
++ * time of sequential requests; thus, even if processes doing
++ * sequential I/O get a preferential treatment with respect to
++ * others doing random I/O, the consequences are not as
++ * dramatic as with HDDs;
++ * 2) if a process doing random I/O does need strong
++ * throughput guarantees, it is hopefully already being
++ * weight-raised, or the user is likely to have assigned it a
++ * higher weight than the other processes (and thus
++ * sub-condition (i) is likely to be false, which triggers
++ * idling).
++ *
++ * According to the above considerations, the next variable is
++ * true (only) if sub-condition (i) does not hold. To compute the
++ * value of this variable, we not only use the return value of
++ * the function bfq_symmetric_scenario(), but also check
++ * whether bfqq is being weight-raised, because
++ * bfq_symmetric_scenario() does not take weight-raised queues
++ * into account (see comments to
++ * bfq_weights_tree_add()).
++ *
++ * As a side note, it is worth considering that the above
++ * device-idling countermeasures may however fail in the
++ * following unlucky scenario: if idling is (correctly)
++ * disabled in a time period during which all symmetry
++ * sub-conditions hold, and hence the device is allowed to
++ * enqueue many requests, but at some later point in time some
++ * sub-condition ceases to hold, then it may become impossible
++ * to let requests be served in the desired order until all
++ * the requests already queued in the device have been served.
++ */
++ asymmetric_scenario = bfqq->wr_coeff > 1 ||
++ !bfq_symmetric_scenario(bfqd);
++
++ /*
++ * Finally, there is a case where maximizing throughput is the
++ * best choice even if it may cause unfairness toward
++ * bfqq. Such a case is when bfqq became active in a burst of
++ * queue activations. Queues that became active during a large
++ * burst benefit only from throughput, as discussed in the
++ * comments to bfq_handle_burst. Thus, if bfqq became active
++ * in a burst and not idling the device maximizes throughput,
++ * then the device must not be idled, because not idling the
++ * device provides bfqq and all other queues in the burst with
++ * maximum benefit. Combining this and the two cases above, we
++ * can now establish when idling is actually needed to
++ * preserve service guarantees.
++ */
++ idling_needed_for_service_guarantees =
++ (on_hdd_and_not_all_queues_seeky || asymmetric_scenario) &&
++ !bfq_bfqq_in_large_burst(bfqq);
++
++ /*
++ * We now have all the components we need to compute the return
++ * value of the function, which is true only if both the following
++ * conditions hold:
++ * 1) bfqq is sync, because idling makes sense only for sync queues;
++ * 2) idling either boosts the throughput (without issues), or
++ * is necessary to preserve service guarantees.
++ */
++ return bfq_bfqq_sync(bfqq) &&
++ (idling_boosts_thr_without_issues ||
++ idling_needed_for_service_guarantees);
++}
++
++/*
++ * If the in-service queue is empty but the function bfq_bfqq_may_idle
++ * returns true, then:
++ * 1) the queue must remain in service and cannot be expired, and
++ * 2) the device must be idled to wait for the possible arrival of a new
++ * request for the queue.
++ * See the comments to the function bfq_bfqq_may_idle for the reasons
++ * why performing device idling is the best choice to boost the throughput
++ * and preserve service guarantees when bfq_bfqq_may_idle itself
++ * returns true.
++ */
++static bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
++{
++ struct bfq_data *bfqd = bfqq->bfqd;
++
++ return RB_EMPTY_ROOT(&bfqq->sort_list) && bfqd->bfq_slice_idle != 0 &&
++ bfq_bfqq_may_idle(bfqq);
++}
++
++/*
++ * Select a queue for service. If we have a current queue in service,
++ * check whether to continue servicing it, or retrieve and set a new one.
++ */
++static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
++{
++ struct bfq_queue *bfqq;
++ struct request *next_rq;
++ enum bfqq_expiration reason = BFQ_BFQQ_BUDGET_TIMEOUT;
++
++ bfqq = bfqd->in_service_queue;
++ if (!bfqq)
++ goto new_queue;
++
++ bfq_log_bfqq(bfqd, bfqq, "select_queue: already in-service queue");
++
++ if (bfq_may_expire_for_budg_timeout(bfqq) &&
++ !timer_pending(&bfqd->idle_slice_timer) &&
++ !bfq_bfqq_must_idle(bfqq))
++ goto expire;
++
++ next_rq = bfqq->next_rq;
++ /*
++ * If bfqq has requests queued and it has enough budget left to
++ * serve them, keep the queue, otherwise expire it.
++ */
++ if (next_rq) {
++ if (bfq_serv_to_charge(next_rq, bfqq) >
++ bfq_bfqq_budget_left(bfqq)) {
++ reason = BFQ_BFQQ_BUDGET_EXHAUSTED;
++ goto expire;
++ } else {
++ /*
++ * The idle timer may be pending because we may
++ * not disable disk idling even when a new request
++ * arrives.
++ */
++ if (timer_pending(&bfqd->idle_slice_timer)) {
++ /*
++ * If we get here: 1) at least a new request
++ * has arrived but we have not disabled the
++ * timer because the request was too small,
++ * 2) then the block layer has unplugged
++ * the device, causing the dispatch to be
++ * invoked.
++ *
++ * Since the device is unplugged, now the
++ * requests are probably large enough to
++ * provide a reasonable throughput.
++ * So we disable idling.
++ */
++ bfq_clear_bfqq_wait_request(bfqq);
++ del_timer(&bfqd->idle_slice_timer);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqg_stats_update_idle_time(bfqq_group(bfqq));
++#endif
++ }
++ goto keep_queue;
++ }
++ }
++
++ /*
++ * No requests pending. However, if the in-service queue is idling
++ * for a new request, or has requests waiting for a completion and
++ * may idle after their completion, then keep it anyway.
++ */
++ if (timer_pending(&bfqd->idle_slice_timer) ||
++ (bfqq->dispatched != 0 && bfq_bfqq_may_idle(bfqq))) {
++ bfqq = NULL;
++ goto keep_queue;
++ }
++
++ reason = BFQ_BFQQ_NO_MORE_REQUESTS;
++expire:
++ bfq_bfqq_expire(bfqd, bfqq, false, reason);
++new_queue:
++ bfqq = bfq_set_in_service_queue(bfqd);
++ bfq_log(bfqd, "select_queue: new queue %d returned",
++ bfqq ? bfqq->pid : 0);
++keep_queue:
++ return bfqq;
++}
++
++static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++ if (bfqq->wr_coeff > 1) { /* queue is being weight-raised */
++ bfq_log_bfqq(bfqd, bfqq,
++ "raising period dur %u/%u msec, old coeff %u, w %d(%d)",
++ jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
++ jiffies_to_msecs(bfqq->wr_cur_max_time),
++ bfqq->wr_coeff,
++ bfqq->entity.weight, bfqq->entity.orig_weight);
++
++ BUG_ON(bfqq != bfqd->in_service_queue && entity->weight !=
++ entity->orig_weight * bfqq->wr_coeff);
++ if (entity->prio_changed)
++ bfq_log_bfqq(bfqd, bfqq, "WARN: pending prio change");
++
++ /*
++ * If the queue was activated in a burst, or
++ * too much time has elapsed from the beginning
++ * of this weight-raising period, then end weight
++ * raising.
++ */
++ if (bfq_bfqq_in_large_burst(bfqq) ||
++ time_is_before_jiffies(bfqq->last_wr_start_finish +
++ bfqq->wr_cur_max_time)) {
++ bfqq->last_wr_start_finish = jiffies;
++ bfq_log_bfqq(bfqd, bfqq,
++ "wrais ending at %lu, rais_max_time %u",
++ bfqq->last_wr_start_finish,
++ jiffies_to_msecs(bfqq->wr_cur_max_time));
++ bfq_bfqq_end_wr(bfqq);
++ }
++ }
++ /* Update weight both if it must be raised and if it must be lowered */
++ if ((entity->weight > entity->orig_weight) != (bfqq->wr_coeff > 1))
++ __bfq_entity_update_weight_prio(
++ bfq_entity_service_tree(entity),
++ entity);
++}
++
++/*
++ * Dispatch one request from bfqq, moving it to the request queue
++ * dispatch list.
++ */
++static int bfq_dispatch_request(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ int dispatched = 0;
++ struct request *rq;
++ unsigned long service_to_charge;
++
++ BUG_ON(RB_EMPTY_ROOT(&bfqq->sort_list));
++
++ /* Follow expired path, else get first next available. */
++ rq = bfq_check_fifo(bfqq);
++ if (!rq)
++ rq = bfqq->next_rq;
++ service_to_charge = bfq_serv_to_charge(rq, bfqq);
++
++ if (service_to_charge > bfq_bfqq_budget_left(bfqq)) {
++ /*
++ * This may happen if the next rq is chosen in fifo order
++ * instead of sector order. The budget is properly
++ * dimensioned to be always sufficient to serve the next
++ * request only if it is chosen in sector order. The reason
++ * is that it would be quite inefficient and of little use
++ * to always make sure that the budget is large enough to
++ * serve even the possible next rq in fifo order.
++ * In fact, requests are seldom served in fifo order.
++ *
++ * Expire the queue for budget exhaustion, and make sure
++ * that the next act_budget is enough to serve the next
++ * request, even if it comes from the fifo expired path.
++ */
++ bfqq->next_rq = rq;
++ /*
++ * Since this dispatch has failed, make sure that
++ * a new one will be performed
++ */
++ if (!bfqd->rq_in_driver)
++ bfq_schedule_dispatch(bfqd);
++ goto expire;
++ }
++
++ /* Finally, insert request into driver dispatch list. */
++ bfq_bfqq_served(bfqq, service_to_charge);
++ bfq_dispatch_insert(bfqd->queue, rq);
++
++ bfq_update_wr_data(bfqd, bfqq);
++
++ bfq_log_bfqq(bfqd, bfqq,
++ "dispatched %u sec req (%llu), budg left %d",
++ blk_rq_sectors(rq),
++ (long long unsigned)blk_rq_pos(rq),
++ bfq_bfqq_budget_left(bfqq));
++
++ dispatched++;
++
++ if (!bfqd->in_service_bic) {
++ atomic_long_inc(&RQ_BIC(rq)->icq.ioc->refcount);
++ bfqd->in_service_bic = RQ_BIC(rq);
++ }
++
++ if (bfqd->busy_queues > 1 && ((!bfq_bfqq_sync(bfqq) &&
++ dispatched >= bfqd->bfq_max_budget_async_rq) ||
++ bfq_class_idle(bfqq)))
++ goto expire;
++
++ return dispatched;
++
++expire:
++ bfq_bfqq_expire(bfqd, bfqq, false, BFQ_BFQQ_BUDGET_EXHAUSTED);
++ return dispatched;
++}
++
++static int __bfq_forced_dispatch_bfqq(struct bfq_queue *bfqq)
++{
++ int dispatched = 0;
++
++ while (bfqq->next_rq) {
++ bfq_dispatch_insert(bfqq->bfqd->queue, bfqq->next_rq);
++ dispatched++;
++ }
++
++ BUG_ON(!list_empty(&bfqq->fifo));
++ return dispatched;
++}
++
++/*
++ * Drain our current requests.
++ * Used for barriers and when switching io schedulers on-the-fly.
++ */
++static int bfq_forced_dispatch(struct bfq_data *bfqd)
++{
++ struct bfq_queue *bfqq, *n;
++ struct bfq_service_tree *st;
++ int dispatched = 0;
++
++ bfqq = bfqd->in_service_queue;
++ if (bfqq)
++ __bfq_bfqq_expire(bfqd, bfqq);
++
++ /*
++ * Loop through classes, and be careful to leave the scheduler
++ * in a consistent state, as feedback mechanisms and vtime
++ * updates cannot be disabled during the process.
++ */
++ list_for_each_entry_safe(bfqq, n, &bfqd->active_list, bfqq_list) {
++ st = bfq_entity_service_tree(&bfqq->entity);
++
++ dispatched += __bfq_forced_dispatch_bfqq(bfqq);
++ bfqq->max_budget = bfq_max_budget(bfqd);
++
++ bfq_forget_idle(st);
++ }
++
++ BUG_ON(bfqd->busy_queues != 0);
++
++ return dispatched;
++}
++
++static int bfq_dispatch_requests(struct request_queue *q, int force)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct bfq_queue *bfqq;
++ int max_dispatch;
++
++ bfq_log(bfqd, "dispatch requests: %d busy queues", bfqd->busy_queues);
++ if (bfqd->busy_queues == 0)
++ return 0;
++
++ if (unlikely(force))
++ return bfq_forced_dispatch(bfqd);
++
++ bfqq = bfq_select_queue(bfqd);
++ if (!bfqq)
++ return 0;
++
++ if (bfq_class_idle(bfqq))
++ max_dispatch = 1;
++
++ if (!bfq_bfqq_sync(bfqq))
++ max_dispatch = bfqd->bfq_max_budget_async_rq;
++
++ if (!bfq_bfqq_sync(bfqq) && bfqq->dispatched >= max_dispatch) {
++ if (bfqd->busy_queues > 1)
++ return 0;
++ if (bfqq->dispatched >= 4 * max_dispatch)
++ return 0;
++ }
++
++ if (bfqd->sync_flight != 0 && !bfq_bfqq_sync(bfqq))
++ return 0;
++
++ bfq_clear_bfqq_wait_request(bfqq);
++ BUG_ON(timer_pending(&bfqd->idle_slice_timer));
++
++ if (!bfq_dispatch_request(bfqd, bfqq))
++ return 0;
++
++ bfq_log_bfqq(bfqd, bfqq, "dispatched %s request",
++ bfq_bfqq_sync(bfqq) ? "sync" : "async");
++
++ return 1;
++}
++
++/*
++ * Task holds one reference to the queue, dropped when task exits. Each rq
++ * in-flight on this queue also holds a reference, dropped when rq is freed.
++ *
++ * Queue lock must be held here.
++ */
++static void bfq_put_queue(struct bfq_queue *bfqq)
++{
++ struct bfq_data *bfqd = bfqq->bfqd;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ struct bfq_group *bfqg = bfqq_group(bfqq);
++#endif
++
++ BUG_ON(atomic_read(&bfqq->ref) <= 0);
++
++ bfq_log_bfqq(bfqd, bfqq, "put_queue: %p %d", bfqq,
++ atomic_read(&bfqq->ref));
++ if (!atomic_dec_and_test(&bfqq->ref))
++ return;
++
++ BUG_ON(rb_first(&bfqq->sort_list));
++ BUG_ON(bfqq->allocated[READ] + bfqq->allocated[WRITE] != 0);
++ BUG_ON(bfqq->entity.tree);
++ BUG_ON(bfq_bfqq_busy(bfqq));
++ BUG_ON(bfqd->in_service_queue == bfqq);
++
++ if (bfq_bfqq_sync(bfqq))
++ /*
++ * The fact that this queue is being destroyed does not
++ * invalidate the fact that this queue may have been
++ * activated during the current burst. As a consequence,
++ * although the queue does not exist anymore, and hence
++ * needs to be removed from the burst list if there,
++ * the burst size must not be decremented.
++ */
++ hlist_del_init(&bfqq->burst_list_node);
++
++ bfq_log_bfqq(bfqd, bfqq, "put_queue: %p freed", bfqq);
++
++ kmem_cache_free(bfq_pool, bfqq);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqg_put(bfqg);
++#endif
++}
++
++static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ if (bfqq == bfqd->in_service_queue) {
++ __bfq_bfqq_expire(bfqd, bfqq);
++ bfq_schedule_dispatch(bfqd);
++ }
++
++ bfq_log_bfqq(bfqd, bfqq, "exit_bfqq: %p, %d", bfqq,
++ atomic_read(&bfqq->ref));
++
++ bfq_put_queue(bfqq);
++}
++
++static void bfq_init_icq(struct io_cq *icq)
++{
++ struct bfq_io_cq *bic = icq_to_bic(icq);
++
++ bic->ttime.last_end_request = jiffies;
++}
++
++static void bfq_exit_icq(struct io_cq *icq)
++{
++ struct bfq_io_cq *bic = icq_to_bic(icq);
++ struct bfq_data *bfqd = bic_to_bfqd(bic);
++
++ if (bic->bfqq[BLK_RW_ASYNC]) {
++ bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_ASYNC]);
++ bic->bfqq[BLK_RW_ASYNC] = NULL;
++ }
++
++ if (bic->bfqq[BLK_RW_SYNC]) {
++ bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
++ bic->bfqq[BLK_RW_SYNC] = NULL;
++ }
++}
++
++/*
++ * Update the entity prio values; note that the new values will not
++ * be used until the next (re)activation.
++ */
++static void bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
++{
++ struct task_struct *tsk = current;
++ int ioprio_class;
++
++ ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
++ switch (ioprio_class) {
++ default:
++ dev_err(bfqq->bfqd->queue->backing_dev_info.dev,
++ "bfq: bad prio class %d\n", ioprio_class);
++ case IOPRIO_CLASS_NONE:
++ /*
++ * No prio set, inherit CPU scheduling settings.
++ */
++ bfqq->new_ioprio = task_nice_ioprio(tsk);
++ bfqq->new_ioprio_class = task_nice_ioclass(tsk);
++ break;
++ case IOPRIO_CLASS_RT:
++ bfqq->new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
++ bfqq->new_ioprio_class = IOPRIO_CLASS_RT;
++ break;
++ case IOPRIO_CLASS_BE:
++ bfqq->new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
++ bfqq->new_ioprio_class = IOPRIO_CLASS_BE;
++ break;
++ case IOPRIO_CLASS_IDLE:
++ bfqq->new_ioprio_class = IOPRIO_CLASS_IDLE;
++ bfqq->new_ioprio = 7;
++ bfq_clear_bfqq_idle_window(bfqq);
++ break;
++ }
++
++ if (bfqq->new_ioprio < 0 || bfqq->new_ioprio >= IOPRIO_BE_NR) {
++ printk(KERN_CRIT "bfq_set_next_ioprio_data: new_ioprio %d\n",
++ bfqq->new_ioprio);
++ BUG();
++ }
++
++ bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->new_ioprio);
++ bfqq->entity.prio_changed = 1;
++}
++
++static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio)
++{
++ struct bfq_data *bfqd;
++ struct bfq_queue *bfqq, *new_bfqq;
++ unsigned long uninitialized_var(flags);
++ int ioprio = bic->icq.ioc->ioprio;
++
++ bfqd = bfq_get_bfqd_locked(&(bic->icq.q->elevator->elevator_data),
++ &flags);
++ /*
++ * This condition may trigger on a newly created bic; be sure to
++ * drop the lock before returning.
++ */
++ if (unlikely(!bfqd) || likely(bic->ioprio == ioprio))
++ goto out;
++
++ bic->ioprio = ioprio;
++
++ bfqq = bic->bfqq[BLK_RW_ASYNC];
++ if (bfqq) {
++ new_bfqq = bfq_get_queue(bfqd, bio, BLK_RW_ASYNC, bic,
++ GFP_ATOMIC);
++ if (new_bfqq) {
++ bic->bfqq[BLK_RW_ASYNC] = new_bfqq;
++ bfq_log_bfqq(bfqd, bfqq,
++ "check_ioprio_change: bfqq %p %d",
++ bfqq, atomic_read(&bfqq->ref));
++ bfq_put_queue(bfqq);
++ }
++ }
++
++ bfqq = bic->bfqq[BLK_RW_SYNC];
++ if (bfqq)
++ bfq_set_next_ioprio_data(bfqq, bic);
++
++out:
++ bfq_put_bfqd_unlock(bfqd, &flags);
++}
++
++static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ struct bfq_io_cq *bic, pid_t pid, int is_sync)
++{
++ RB_CLEAR_NODE(&bfqq->entity.rb_node);
++ INIT_LIST_HEAD(&bfqq->fifo);
++ INIT_HLIST_NODE(&bfqq->burst_list_node);
++
++ atomic_set(&bfqq->ref, 0);
++ bfqq->bfqd = bfqd;
++
++ if (bic)
++ bfq_set_next_ioprio_data(bfqq, bic);
++
++ if (is_sync) {
++ if (!bfq_class_idle(bfqq))
++ bfq_mark_bfqq_idle_window(bfqq);
++ bfq_mark_bfqq_sync(bfqq);
++ } else
++ bfq_clear_bfqq_sync(bfqq);
++ bfq_mark_bfqq_IO_bound(bfqq);
++
++ /* Tentative initial value to trade off between throughput and latency */
++ bfqq->max_budget = (2 * bfq_max_budget(bfqd)) / 3;
++ bfqq->pid = pid;
++
++ bfqq->wr_coeff = 1;
++ bfqq->last_wr_start_finish = 0;
++ /*
++ * Set to the value for which bfqq will not be deemed as
++ * soft rt when it becomes backlogged.
++ */
++ bfqq->soft_rt_next_start = bfq_infinity_from_now(jiffies);
++}
++
++static struct bfq_queue *bfq_find_alloc_queue(struct bfq_data *bfqd,
++ struct bio *bio, int is_sync,
++ struct bfq_io_cq *bic,
++ gfp_t gfp_mask)
++{
++ struct bfq_group *bfqg;
++ struct bfq_queue *bfqq, *new_bfqq = NULL;
++ struct blkcg *blkcg;
++
++retry:
++ rcu_read_lock();
++
++ blkcg = bio_blkcg(bio);
++ bfqg = bfq_find_alloc_group(bfqd, blkcg);
++ /* bic always exists here */
++ bfqq = bic_to_bfqq(bic, is_sync);
++
++ /*
++ * Always try a new alloc if we fall back to the OOM bfqq
++ * originally, since it should just be a temporary situation.
++ */
++ if (!bfqq || bfqq == &bfqd->oom_bfqq) {
++ bfqq = NULL;
++ if (new_bfqq) {
++ bfqq = new_bfqq;
++ new_bfqq = NULL;
++ } else if (gfp_mask & __GFP_WAIT) {
++ rcu_read_unlock();
++ spin_unlock_irq(bfqd->queue->queue_lock);
++ new_bfqq = kmem_cache_alloc_node(bfq_pool,
++ gfp_mask | __GFP_ZERO,
++ bfqd->queue->node);
++ spin_lock_irq(bfqd->queue->queue_lock);
++ if (new_bfqq)
++ goto retry;
++ } else {
++ bfqq = kmem_cache_alloc_node(bfq_pool,
++ gfp_mask | __GFP_ZERO,
++ bfqd->queue->node);
++ }
++
++ if (bfqq) {
++ bfq_init_bfqq(bfqd, bfqq, bic, current->pid,
++ is_sync);
++ bfq_init_entity(&bfqq->entity, bfqg);
++ bfq_log_bfqq(bfqd, bfqq, "allocated");
++ } else {
++ bfqq = &bfqd->oom_bfqq;
++ bfq_log_bfqq(bfqd, bfqq, "using oom bfqq");
++ }
++ }
++
++ if (new_bfqq)
++ kmem_cache_free(bfq_pool, new_bfqq);
++
++ rcu_read_unlock();
++
++ return bfqq;
++}
++
++static struct bfq_queue **bfq_async_queue_prio(struct bfq_data *bfqd,
++ struct bfq_group *bfqg,
++ int ioprio_class, int ioprio)
++{
++ switch (ioprio_class) {
++ case IOPRIO_CLASS_RT:
++ return &bfqg->async_bfqq[0][ioprio];
++ case IOPRIO_CLASS_NONE:
++ ioprio = IOPRIO_NORM;
++ /* fall through */
++ case IOPRIO_CLASS_BE:
++ return &bfqg->async_bfqq[1][ioprio];
++ case IOPRIO_CLASS_IDLE:
++ return &bfqg->async_idle_bfqq;
++ default:
++ BUG();
++ }
++}
++
++static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
++ struct bio *bio, int is_sync,
++ struct bfq_io_cq *bic, gfp_t gfp_mask)
++{
++ const int ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
++ const int ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
++ struct bfq_queue **async_bfqq = NULL;
++ struct bfq_queue *bfqq = NULL;
++
++ if (!is_sync) {
++ struct blkcg *blkcg;
++ struct bfq_group *bfqg;
++
++ rcu_read_lock();
++ blkcg = bio_blkcg(bio);
++ rcu_read_unlock();
++ bfqg = bfq_find_alloc_group(bfqd, blkcg);
++ async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
++ ioprio);
++ bfqq = *async_bfqq;
++ }
++
++ if (!bfqq)
++ bfqq = bfq_find_alloc_queue(bfqd, bio, is_sync, bic, gfp_mask);
++
++ /*
++ * Pin the queue now that it's allocated; scheduler exit will
++ * prune it.
++ */
++ if (!is_sync && !(*async_bfqq)) {
++ atomic_inc(&bfqq->ref);
++ bfq_log_bfqq(bfqd, bfqq, "get_queue, bfqq not in async: %p, %d",
++ bfqq, atomic_read(&bfqq->ref));
++ *async_bfqq = bfqq;
++ }
++
++ atomic_inc(&bfqq->ref);
++ bfq_log_bfqq(bfqd, bfqq, "get_queue, at end: %p, %d", bfqq,
++ atomic_read(&bfqq->ref));
++ return bfqq;
++}
++
++static void bfq_update_io_thinktime(struct bfq_data *bfqd,
++ struct bfq_io_cq *bic)
++{
++ unsigned long elapsed = jiffies - bic->ttime.last_end_request;
++ unsigned long ttime = min(elapsed, 2UL * bfqd->bfq_slice_idle);
++
++ bic->ttime.ttime_samples = (7*bic->ttime.ttime_samples + 256) / 8;
++ bic->ttime.ttime_total = (7*bic->ttime.ttime_total + 256*ttime) / 8;
++ bic->ttime.ttime_mean = (bic->ttime.ttime_total + 128) /
++ bic->ttime.ttime_samples;
++}
++
++static void bfq_update_io_seektime(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ struct request *rq)
++{
++ sector_t sdist;
++ u64 total;
++
++ if (bfqq->last_request_pos < blk_rq_pos(rq))
++ sdist = blk_rq_pos(rq) - bfqq->last_request_pos;
++ else
++ sdist = bfqq->last_request_pos - blk_rq_pos(rq);
++
++ /*
++ * Don't allow the seek distance to get too large from the
++ * odd fragment, pagein, etc.
++ */
++ if (bfqq->seek_samples == 0) /* first request, not really a seek */
++ sdist = 0;
++ else if (bfqq->seek_samples <= 60) /* second & third seek */
++ sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*1024);
++ else
++ sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*64);
++
++ bfqq->seek_samples = (7*bfqq->seek_samples + 256) / 8;
++ bfqq->seek_total = (7*bfqq->seek_total + (u64)256*sdist) / 8;
++ total = bfqq->seek_total + (bfqq->seek_samples/2);
++ do_div(total, bfqq->seek_samples);
++ bfqq->seek_mean = (sector_t)total;
++
++ bfq_log_bfqq(bfqd, bfqq, "dist=%llu mean=%llu", (u64)sdist,
++ (u64)bfqq->seek_mean);
++}
++
++/*
++ * Disable idle window if the process thinks too long or seeks so much that
++ * it doesn't matter.
++ */
++static void bfq_update_idle_window(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ struct bfq_io_cq *bic)
++{
++ int enable_idle;
++
++ /* Don't idle for async or idle io prio class. */
++ if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
++ return;
++
++ enable_idle = bfq_bfqq_idle_window(bfqq);
++
++ if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
++ bfqd->bfq_slice_idle == 0 ||
++ (bfqd->hw_tag && BFQQ_SEEKY(bfqq) &&
++ bfqq->wr_coeff == 1))
++ enable_idle = 0;
++ else if (bfq_sample_valid(bic->ttime.ttime_samples)) {
++ if (bic->ttime.ttime_mean > bfqd->bfq_slice_idle &&
++ bfqq->wr_coeff == 1)
++ enable_idle = 0;
++ else
++ enable_idle = 1;
++ }
++ bfq_log_bfqq(bfqd, bfqq, "update_idle_window: enable_idle %d",
++ enable_idle);
++
++ if (enable_idle)
++ bfq_mark_bfqq_idle_window(bfqq);
++ else
++ bfq_clear_bfqq_idle_window(bfqq);
++}
++
++/*
++ * Called when a new fs request (rq) is added to bfqq. Check if there's
++ * something we should do about it.
++ */
++static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ struct request *rq)
++{
++ struct bfq_io_cq *bic = RQ_BIC(rq);
++
++ if (rq->cmd_flags & REQ_META)
++ bfqq->meta_pending++;
++
++ bfq_update_io_thinktime(bfqd, bic);
++ bfq_update_io_seektime(bfqd, bfqq, rq);
++ if (!BFQQ_SEEKY(bfqq) && bfq_bfqq_constantly_seeky(bfqq)) {
++ bfq_clear_bfqq_constantly_seeky(bfqq);
++ if (!blk_queue_nonrot(bfqd->queue)) {
++ BUG_ON(!bfqd->const_seeky_busy_in_flight_queues);
++ bfqd->const_seeky_busy_in_flight_queues--;
++ }
++ }
++ if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
++ !BFQQ_SEEKY(bfqq))
++ bfq_update_idle_window(bfqd, bfqq, bic);
++
++ bfq_log_bfqq(bfqd, bfqq,
++ "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
++ bfq_bfqq_idle_window(bfqq), BFQQ_SEEKY(bfqq),
++ (long long unsigned)bfqq->seek_mean);
++
++ bfqq->last_request_pos = blk_rq_pos(rq) + blk_rq_sectors(rq);
++
++ if (bfqq == bfqd->in_service_queue && bfq_bfqq_wait_request(bfqq)) {
++ bool small_req = bfqq->queued[rq_is_sync(rq)] == 1 &&
++ blk_rq_sectors(rq) < 32;
++ bool budget_timeout = bfq_bfqq_budget_timeout(bfqq);
++
++ /*
++ * There is just this request queued: if the request
++ * is small and the queue is not to be expired, then
++ * just exit.
++ *
++ * In this way, if the disk is being idled to wait for
++ * a new request from the in-service queue, we avoid
++ * unplugging the device and committing the disk to serve
++ * just a small request. On the contrary, we wait for
++ * the block layer to decide when to unplug the device:
++ * hopefully, new requests will be merged to this one
++ * quickly, then the device will be unplugged and
++ * larger requests will be dispatched.
++ */
++ if (small_req && !budget_timeout)
++ return;
++
++ /*
++ * A large enough request arrived, or the queue is to
++ * be expired: in both cases disk idling is to be
++ * stopped, so clear wait_request flag and reset
++ * timer.
++ */
++ bfq_clear_bfqq_wait_request(bfqq);
++ del_timer(&bfqd->idle_slice_timer);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqg_stats_update_idle_time(bfqq_group(bfqq));
++#endif
++
++ /*
++ * The queue is not empty, because a new request just
++ * arrived. Hence we can safely expire the queue, in
++ * case of budget timeout, without risking that the
++ * timestamps of the queue are not updated correctly.
++ * See [1] for more details.
++ */
++ if (budget_timeout)
++ bfq_bfqq_expire(bfqd, bfqq, false,
++ BFQ_BFQQ_BUDGET_TIMEOUT);
++
++ /*
++ * Let the request rip immediately, or let a new queue be
++ * selected if bfqq has just been expired.
++ */
++ __blk_run_queue(bfqd->queue);
++ }
++}
++
++static void bfq_insert_request(struct request_queue *q, struct request *rq)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct bfq_queue *bfqq = RQ_BFQQ(rq);
++
++ assert_spin_locked(bfqd->queue->queue_lock);
++
++ bfq_add_request(rq);
++
++ rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
++ list_add_tail(&rq->queuelist, &bfqq->fifo);
++
++ bfq_rq_enqueued(bfqd, bfqq, rq);
++}
++
++static void bfq_update_hw_tag(struct bfq_data *bfqd)
++{
++ bfqd->max_rq_in_driver = max(bfqd->max_rq_in_driver,
++ bfqd->rq_in_driver);
++
++ if (bfqd->hw_tag == 1)
++ return;
++
++ /*
++ * This sample is valid if the number of outstanding requests
++ * is large enough to allow a queueing behavior. Note that the
++ * sum is not exact, as it's not taking into account deactivated
++ * requests.
++ */
++ if (bfqd->rq_in_driver + bfqd->queued < BFQ_HW_QUEUE_THRESHOLD)
++ return;
++
++ if (bfqd->hw_tag_samples++ < BFQ_HW_QUEUE_SAMPLES)
++ return;
++
++ bfqd->hw_tag = bfqd->max_rq_in_driver > BFQ_HW_QUEUE_THRESHOLD;
++ bfqd->max_rq_in_driver = 0;
++ bfqd->hw_tag_samples = 0;
++}
++
++static void bfq_completed_request(struct request_queue *q, struct request *rq)
++{
++ struct bfq_queue *bfqq = RQ_BFQQ(rq);
++ struct bfq_data *bfqd = bfqq->bfqd;
++ bool sync = bfq_bfqq_sync(bfqq);
++
++ bfq_log_bfqq(bfqd, bfqq, "completed one req with %u sects left (%d)",
++ blk_rq_sectors(rq), sync);
++
++ bfq_update_hw_tag(bfqd);
++
++ BUG_ON(!bfqd->rq_in_driver);
++ BUG_ON(!bfqq->dispatched);
++ bfqd->rq_in_driver--;
++ bfqq->dispatched--;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqg_stats_update_completion(bfqq_group(bfqq),
++ rq_start_time_ns(rq),
++ rq_io_start_time_ns(rq), rq->cmd_flags);
++#endif
++
++ if (!bfqq->dispatched && !bfq_bfqq_busy(bfqq)) {
++ bfq_weights_tree_remove(bfqd, &bfqq->entity,
++ &bfqd->queue_weights_tree);
++ if (!blk_queue_nonrot(bfqd->queue)) {
++ BUG_ON(!bfqd->busy_in_flight_queues);
++ bfqd->busy_in_flight_queues--;
++ if (bfq_bfqq_constantly_seeky(bfqq)) {
++ BUG_ON(!bfqd->
++ const_seeky_busy_in_flight_queues);
++ bfqd->const_seeky_busy_in_flight_queues--;
++ }
++ }
++ }
++
++ if (sync) {
++ bfqd->sync_flight--;
++ RQ_BIC(rq)->ttime.last_end_request = jiffies;
++ }
++
++ /*
++ * If we are waiting to discover whether the request pattern of the
++ * task associated with the queue is actually isochronous, and
++ * both requisites for this condition to hold are satisfied, then
++ * compute soft_rt_next_start (see the comments to the function
++ * bfq_bfqq_softrt_next_start()).
++ */
++ if (bfq_bfqq_softrt_update(bfqq) && bfqq->dispatched == 0 &&
++ RB_EMPTY_ROOT(&bfqq->sort_list))
++ bfqq->soft_rt_next_start =
++ bfq_bfqq_softrt_next_start(bfqd, bfqq);
++
++ /*
++ * If this is the in-service queue, check if it needs to be expired,
++ * or if we want to idle in case it has no pending requests.
++ */
++ if (bfqd->in_service_queue == bfqq) {
++ if (bfq_bfqq_budget_new(bfqq))
++ bfq_set_budget_timeout(bfqd);
++
++ if (bfq_bfqq_must_idle(bfqq)) {
++ bfq_arm_slice_timer(bfqd);
++ goto out;
++ } else if (bfq_may_expire_for_budg_timeout(bfqq))
++ bfq_bfqq_expire(bfqd, bfqq, false,
++ BFQ_BFQQ_BUDGET_TIMEOUT);
++ else if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
++ (bfqq->dispatched == 0 ||
++ !bfq_bfqq_may_idle(bfqq)))
++ bfq_bfqq_expire(bfqd, bfqq, false,
++ BFQ_BFQQ_NO_MORE_REQUESTS);
++ }
++
++ if (!bfqd->rq_in_driver)
++ bfq_schedule_dispatch(bfqd);
++
++out:
++ return;
++}
++
++static int __bfq_may_queue(struct bfq_queue *bfqq)
++{
++ if (bfq_bfqq_wait_request(bfqq) && bfq_bfqq_must_alloc(bfqq)) {
++ bfq_clear_bfqq_must_alloc(bfqq);
++ return ELV_MQUEUE_MUST;
++ }
++
++ return ELV_MQUEUE_MAY;
++}
++
++static int bfq_may_queue(struct request_queue *q, int rw)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct task_struct *tsk = current;
++ struct bfq_io_cq *bic;
++ struct bfq_queue *bfqq;
++
++ /*
++ * Don't force setup of a queue from here, as a call to may_queue
++ * does not necessarily imply that a request actually will be
++ * queued. So just look up a possibly existing queue, or return
++ * 'may queue' if that fails.
++ */
++ bic = bfq_bic_lookup(bfqd, tsk->io_context);
++ if (!bic)
++ return ELV_MQUEUE_MAY;
++
++ bfqq = bic_to_bfqq(bic, rw_is_sync(rw));
++ if (bfqq)
++ return __bfq_may_queue(bfqq);
++
++ return ELV_MQUEUE_MAY;
++}
++
++/*
++ * Queue lock held here.
++ */
++static void bfq_put_request(struct request *rq)
++{
++ struct bfq_queue *bfqq = RQ_BFQQ(rq);
++
++ if (bfqq) {
++ const int rw = rq_data_dir(rq);
++
++ BUG_ON(!bfqq->allocated[rw]);
++ bfqq->allocated[rw]--;
++
++ rq->elv.priv[0] = NULL;
++ rq->elv.priv[1] = NULL;
++
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "put_request %p, %d",
++ bfqq, atomic_read(&bfqq->ref));
++ bfq_put_queue(bfqq);
++ }
++}
++
++/*
++ * Allocate bfq data structures associated with this request.
++ */
++static int bfq_set_request(struct request_queue *q, struct request *rq,
++ struct bio *bio, gfp_t gfp_mask)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct bfq_io_cq *bic = icq_to_bic(rq->elv.icq);
++ const int rw = rq_data_dir(rq);
++ const int is_sync = rq_is_sync(rq);
++ struct bfq_queue *bfqq;
++ unsigned long flags;
++
++ might_sleep_if(gfp_mask & __GFP_WAIT);
++
++ bfq_check_ioprio_change(bic, bio);
++
++ spin_lock_irqsave(q->queue_lock, flags);
++
++ if (!bic)
++ goto queue_fail;
++
++ bfq_bic_update_cgroup(bic, bio);
++
++ bfqq = bic_to_bfqq(bic, is_sync);
++ if (!bfqq || bfqq == &bfqd->oom_bfqq) {
++ bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, gfp_mask);
++ bic_set_bfqq(bic, bfqq, is_sync);
++ if (is_sync) {
++ if (bfqd->large_burst)
++ bfq_mark_bfqq_in_large_burst(bfqq);
++ else
++ bfq_clear_bfqq_in_large_burst(bfqq);
++ }
++ }
++
++ bfqq->allocated[rw]++;
++ atomic_inc(&bfqq->ref);
++ bfq_log_bfqq(bfqd, bfqq, "set_request: bfqq %p, %d", bfqq,
++ atomic_read(&bfqq->ref));
++
++ rq->elv.priv[0] = bic;
++ rq->elv.priv[1] = bfqq;
++
++ spin_unlock_irqrestore(q->queue_lock, flags);
++
++ return 0;
++
++queue_fail:
++ bfq_schedule_dispatch(bfqd);
++ spin_unlock_irqrestore(q->queue_lock, flags);
++
++ return 1;
++}
++
++static void bfq_kick_queue(struct work_struct *work)
++{
++ struct bfq_data *bfqd =
++ container_of(work, struct bfq_data, unplug_work);
++ struct request_queue *q = bfqd->queue;
++
++ spin_lock_irq(q->queue_lock);
++ __blk_run_queue(q);
++ spin_unlock_irq(q->queue_lock);
++}
++
++/*
++ * Handler of the expiration of the timer running if the in-service queue
++ * is idling inside its time slice.
++ */
++static void bfq_idle_slice_timer(unsigned long data)
++{
++ struct bfq_data *bfqd = (struct bfq_data *)data;
++ struct bfq_queue *bfqq;
++ unsigned long flags;
++ enum bfqq_expiration reason;
++
++ spin_lock_irqsave(bfqd->queue->queue_lock, flags);
++
++ bfqq = bfqd->in_service_queue;
++ /*
++ * Theoretical race here: the in-service queue can be NULL or
++ * different from the queue that was idling if the timer handler
++ * spins on the queue_lock and a new request arrives for the
++ * current queue and there is a full dispatch cycle that changes
++ * the in-service queue. This can hardly happen, but in the worst
++ * case we just expire a queue too early.
++ */
++ if (bfqq) {
++ bfq_log_bfqq(bfqd, bfqq, "slice_timer expired");
++ if (bfq_bfqq_budget_timeout(bfqq))
++ /*
++ * Here, too, the queue can be safely expired
++ * for budget timeout without wasting
++ * guarantees.
++ */
++ reason = BFQ_BFQQ_BUDGET_TIMEOUT;
++ else if (bfqq->queued[0] == 0 && bfqq->queued[1] == 0)
++ /*
++ * The queue may not be empty upon timer expiration,
++ * because we may not disable the timer when the
++ * first request of the in-service queue arrives
++ * during disk idling.
++ */
++ reason = BFQ_BFQQ_TOO_IDLE;
++ else
++ goto schedule_dispatch;
++
++ bfq_bfqq_expire(bfqd, bfqq, true, reason);
++ }
++
++schedule_dispatch:
++ bfq_schedule_dispatch(bfqd);
++
++ spin_unlock_irqrestore(bfqd->queue->queue_lock, flags);
++}
++
++static void bfq_shutdown_timer_wq(struct bfq_data *bfqd)
++{
++ del_timer_sync(&bfqd->idle_slice_timer);
++ cancel_work_sync(&bfqd->unplug_work);
++}
++
++static void __bfq_put_async_bfqq(struct bfq_data *bfqd,
++ struct bfq_queue **bfqq_ptr)
++{
++ struct bfq_group *root_group = bfqd->root_group;
++ struct bfq_queue *bfqq = *bfqq_ptr;
++
++ bfq_log(bfqd, "put_async_bfqq: %p", bfqq);
++ if (bfqq) {
++ bfq_bfqq_move(bfqd, bfqq, &bfqq->entity, root_group);
++ bfq_log_bfqq(bfqd, bfqq, "put_async_bfqq: putting %p, %d",
++ bfqq, atomic_read(&bfqq->ref));
++ bfq_put_queue(bfqq);
++ *bfqq_ptr = NULL;
++ }
++}
++
++/*
++ * Release all the bfqg references to its async queues. If we are
++ * deallocating the group these queues may still contain requests, so
++ * we reparent them to the root cgroup (i.e., the only one that will
++ * exist for sure until all the requests on a device are gone).
++ */
++static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
++{
++ int i, j;
++
++ for (i = 0; i < 2; i++)
++ for (j = 0; j < IOPRIO_BE_NR; j++)
++ __bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
++
++ __bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
++}
++
++static void bfq_exit_queue(struct elevator_queue *e)
++{
++ struct bfq_data *bfqd = e->elevator_data;
++ struct request_queue *q = bfqd->queue;
++ struct bfq_queue *bfqq, *n;
++
++ bfq_shutdown_timer_wq(bfqd);
++
++ spin_lock_irq(q->queue_lock);
++
++ BUG_ON(bfqd->in_service_queue);
++ list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
++ bfq_deactivate_bfqq(bfqd, bfqq, 0);
++
++ bfq_disconnect_groups(bfqd);
++ spin_unlock_irq(q->queue_lock);
++
++ bfq_shutdown_timer_wq(bfqd);
++
++ synchronize_rcu();
++
++ BUG_ON(timer_pending(&bfqd->idle_slice_timer));
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ blkcg_deactivate_policy(q, &blkcg_policy_bfq);
++#endif
++
++ kfree(bfqd);
++}
++
++static void bfq_init_root_group(struct bfq_group *root_group,
++ struct bfq_data *bfqd)
++{
++ int i;
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ root_group->entity.parent = NULL;
++ root_group->my_entity = NULL;
++ root_group->bfqd = bfqd;
++#endif
++ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
++ root_group->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
++}
++
++static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
++{
++ struct bfq_data *bfqd;
++ struct elevator_queue *eq;
++
++ eq = elevator_alloc(q, e);
++ if (!eq)
++ return -ENOMEM;
++
++ bfqd = kzalloc_node(sizeof(*bfqd), GFP_KERNEL, q->node);
++ if (!bfqd) {
++ kobject_put(&eq->kobj);
++ return -ENOMEM;
++ }
++ eq->elevator_data = bfqd;
++
++ /*
++ * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.
++ * Grab a permanent reference to it, so that the normal code flow
++ * will not attempt to free it.
++ */
++ bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0);
++ atomic_inc(&bfqd->oom_bfqq.ref);
++ bfqd->oom_bfqq.new_ioprio = BFQ_DEFAULT_QUEUE_IOPRIO;
++ bfqd->oom_bfqq.new_ioprio_class = IOPRIO_CLASS_BE;
++ bfqd->oom_bfqq.entity.new_weight =
++ bfq_ioprio_to_weight(bfqd->oom_bfqq.new_ioprio);
++ /*
++ * Trigger weight initialization, according to ioprio, at the
++ * oom_bfqq's first activation. The oom_bfqq's ioprio and ioprio
++ * class won't be changed any more.
++ */
++ bfqd->oom_bfqq.entity.prio_changed = 1;
++
++ bfqd->queue = q;
++
++ spin_lock_irq(q->queue_lock);
++ q->elevator = eq;
++ spin_unlock_irq(q->queue_lock);
++
++ bfqd->root_group = bfq_create_group_hierarchy(bfqd, q->node);
++ if (!bfqd->root_group)
++ goto out_free;
++ bfq_init_root_group(bfqd->root_group, bfqd);
++ bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqd->active_numerous_groups = 0;
++#endif
++
++ init_timer(&bfqd->idle_slice_timer);
++ bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
++ bfqd->idle_slice_timer.data = (unsigned long)bfqd;
++
++ bfqd->queue_weights_tree = RB_ROOT;
++ bfqd->group_weights_tree = RB_ROOT;
++
++ INIT_WORK(&bfqd->unplug_work, bfq_kick_queue);
++
++ INIT_LIST_HEAD(&bfqd->active_list);
++ INIT_LIST_HEAD(&bfqd->idle_list);
++ INIT_HLIST_HEAD(&bfqd->burst_list);
++
++ bfqd->hw_tag = -1;
++
++ bfqd->bfq_max_budget = bfq_default_max_budget;
++
++ bfqd->bfq_fifo_expire[0] = bfq_fifo_expire[0];
++ bfqd->bfq_fifo_expire[1] = bfq_fifo_expire[1];
++ bfqd->bfq_back_max = bfq_back_max;
++ bfqd->bfq_back_penalty = bfq_back_penalty;
++ bfqd->bfq_slice_idle = bfq_slice_idle;
++ bfqd->bfq_class_idle_last_service = 0;
++ bfqd->bfq_max_budget_async_rq = bfq_max_budget_async_rq;
++ bfqd->bfq_timeout[BLK_RW_ASYNC] = bfq_timeout_async;
++ bfqd->bfq_timeout[BLK_RW_SYNC] = bfq_timeout_sync;
++
++ bfqd->bfq_requests_within_timer = 120;
++
++ bfqd->bfq_large_burst_thresh = 11;
++ bfqd->bfq_burst_interval = msecs_to_jiffies(500);
++
++ bfqd->low_latency = true;
++
++ bfqd->bfq_wr_coeff = 20;
++ bfqd->bfq_wr_rt_max_time = msecs_to_jiffies(300);
++ bfqd->bfq_wr_max_time = 0;
++ bfqd->bfq_wr_min_idle_time = msecs_to_jiffies(2000);
++ bfqd->bfq_wr_min_inter_arr_async = msecs_to_jiffies(500);
++ bfqd->bfq_wr_max_softrt_rate = 7000; /*
++ * Approximate rate required
++ * to play back or record a
++ * high-definition compressed
++ * video.
++ */
++ bfqd->wr_busy_queues = 0;
++ bfqd->busy_in_flight_queues = 0;
++ bfqd->const_seeky_busy_in_flight_queues = 0;
++
++ /*
++ * Begin by assuming, optimistically, that the device peak rate is
++ * equal to the highest reference rate.
++ */
++ bfqd->RT_prod = R_fast[blk_queue_nonrot(bfqd->queue)] *
++ T_fast[blk_queue_nonrot(bfqd->queue)];
++ bfqd->peak_rate = R_fast[blk_queue_nonrot(bfqd->queue)];
++ bfqd->device_speed = BFQ_BFQD_FAST;
++
++ return 0;
++
++out_free:
++ kfree(bfqd);
++ kobject_put(&eq->kobj);
++ return -ENOMEM;
++}
++
++static void bfq_slab_kill(void)
++{
++ if (bfq_pool)
++ kmem_cache_destroy(bfq_pool);
++}
++
++static int __init bfq_slab_setup(void)
++{
++ bfq_pool = KMEM_CACHE(bfq_queue, 0);
++ if (!bfq_pool)
++ return -ENOMEM;
++ return 0;
++}
++
++static ssize_t bfq_var_show(unsigned int var, char *page)
++{
++ return sprintf(page, "%d\n", var);
++}
++
++static ssize_t bfq_var_store(unsigned long *var, const char *page,
++ size_t count)
++{
++ unsigned long new_val;
++ int ret = kstrtoul(page, 10, &new_val);
++
++ if (ret == 0)
++ *var = new_val;
++
++ return count;
++}
++
++static ssize_t bfq_wr_max_time_show(struct elevator_queue *e, char *page)
++{
++ struct bfq_data *bfqd = e->elevator_data;
++ return sprintf(page, "%d\n", bfqd->bfq_wr_max_time > 0 ?
++ jiffies_to_msecs(bfqd->bfq_wr_max_time) :
++ jiffies_to_msecs(bfq_wr_duration(bfqd)));
++}
++
++static ssize_t bfq_weights_show(struct elevator_queue *e, char *page)
++{
++ struct bfq_queue *bfqq;
++ struct bfq_data *bfqd = e->elevator_data;
++ ssize_t num_char = 0;
++
++ num_char += sprintf(page + num_char, "Tot reqs queued %d\n\n",
++ bfqd->queued);
++
++ spin_lock_irq(bfqd->queue->queue_lock);
++
++ num_char += sprintf(page + num_char, "Active:\n");
++ list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list) {
++ num_char += sprintf(page + num_char,
++ "pid%d: weight %hu, nr_queued %d %d, dur %d/%u\n",
++ bfqq->pid,
++ bfqq->entity.weight,
++ bfqq->queued[0],
++ bfqq->queued[1],
++ jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
++ jiffies_to_msecs(bfqq->wr_cur_max_time));
++ }
++
++ num_char += sprintf(page + num_char, "Idle:\n");
++ list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list) {
++ num_char += sprintf(page + num_char,
++ "pid%d: weight %hu, dur %d/%u\n",
++ bfqq->pid,
++ bfqq->entity.weight,
++ jiffies_to_msecs(jiffies -
++ bfqq->last_wr_start_finish),
++ jiffies_to_msecs(bfqq->wr_cur_max_time));
++ }
++
++ spin_unlock_irq(bfqd->queue->queue_lock);
++
++ return num_char;
++}
++
++#define SHOW_FUNCTION(__FUNC, __VAR, __CONV) \
++static ssize_t __FUNC(struct elevator_queue *e, char *page) \
++{ \
++ struct bfq_data *bfqd = e->elevator_data; \
++ unsigned int __data = __VAR; \
++ if (__CONV) \
++ __data = jiffies_to_msecs(__data); \
++ return bfq_var_show(__data, (page)); \
++}
++SHOW_FUNCTION(bfq_fifo_expire_sync_show, bfqd->bfq_fifo_expire[1], 1);
++SHOW_FUNCTION(bfq_fifo_expire_async_show, bfqd->bfq_fifo_expire[0], 1);
++SHOW_FUNCTION(bfq_back_seek_max_show, bfqd->bfq_back_max, 0);
++SHOW_FUNCTION(bfq_back_seek_penalty_show, bfqd->bfq_back_penalty, 0);
++SHOW_FUNCTION(bfq_slice_idle_show, bfqd->bfq_slice_idle, 1);
++SHOW_FUNCTION(bfq_max_budget_show, bfqd->bfq_user_max_budget, 0);
++SHOW_FUNCTION(bfq_max_budget_async_rq_show,
++ bfqd->bfq_max_budget_async_rq, 0);
++SHOW_FUNCTION(bfq_timeout_sync_show, bfqd->bfq_timeout[BLK_RW_SYNC], 1);
++SHOW_FUNCTION(bfq_timeout_async_show, bfqd->bfq_timeout[BLK_RW_ASYNC], 1);
++SHOW_FUNCTION(bfq_low_latency_show, bfqd->low_latency, 0);
++SHOW_FUNCTION(bfq_wr_coeff_show, bfqd->bfq_wr_coeff, 0);
++SHOW_FUNCTION(bfq_wr_rt_max_time_show, bfqd->bfq_wr_rt_max_time, 1);
++SHOW_FUNCTION(bfq_wr_min_idle_time_show, bfqd->bfq_wr_min_idle_time, 1);
++SHOW_FUNCTION(bfq_wr_min_inter_arr_async_show, bfqd->bfq_wr_min_inter_arr_async,
++ 1);
++SHOW_FUNCTION(bfq_wr_max_softrt_rate_show, bfqd->bfq_wr_max_softrt_rate, 0);
++#undef SHOW_FUNCTION
++
++#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV) \
++static ssize_t \
++__FUNC(struct elevator_queue *e, const char *page, size_t count) \
++{ \
++ struct bfq_data *bfqd = e->elevator_data; \
++ unsigned long uninitialized_var(__data); \
++ int ret = bfq_var_store(&__data, (page), count); \
++ if (__data < (MIN)) \
++ __data = (MIN); \
++ else if (__data > (MAX)) \
++ __data = (MAX); \
++ if (__CONV) \
++ *(__PTR) = msecs_to_jiffies(__data); \
++ else \
++ *(__PTR) = __data; \
++ return ret; \
++}
++STORE_FUNCTION(bfq_fifo_expire_sync_store, &bfqd->bfq_fifo_expire[1], 1,
++ INT_MAX, 1);
++STORE_FUNCTION(bfq_fifo_expire_async_store, &bfqd->bfq_fifo_expire[0], 1,
++ INT_MAX, 1);
++STORE_FUNCTION(bfq_back_seek_max_store, &bfqd->bfq_back_max, 0, INT_MAX, 0);
++STORE_FUNCTION(bfq_back_seek_penalty_store, &bfqd->bfq_back_penalty, 1,
++ INT_MAX, 0);
++STORE_FUNCTION(bfq_slice_idle_store, &bfqd->bfq_slice_idle, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_max_budget_async_rq_store, &bfqd->bfq_max_budget_async_rq,
++ 1, INT_MAX, 0);
++STORE_FUNCTION(bfq_timeout_async_store, &bfqd->bfq_timeout[BLK_RW_ASYNC], 0,
++ INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_coeff_store, &bfqd->bfq_wr_coeff, 1, INT_MAX, 0);
++STORE_FUNCTION(bfq_wr_max_time_store, &bfqd->bfq_wr_max_time, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_rt_max_time_store, &bfqd->bfq_wr_rt_max_time, 0, INT_MAX,
++ 1);
++STORE_FUNCTION(bfq_wr_min_idle_time_store, &bfqd->bfq_wr_min_idle_time, 0,
++ INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_min_inter_arr_async_store,
++ &bfqd->bfq_wr_min_inter_arr_async, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_max_softrt_rate_store, &bfqd->bfq_wr_max_softrt_rate, 0,
++ INT_MAX, 0);
++#undef STORE_FUNCTION
++
++/* do nothing for the moment */
++static ssize_t bfq_weights_store(struct elevator_queue *e,
++ const char *page, size_t count)
++{
++ return count;
++}
++
++static unsigned long bfq_estimated_max_budget(struct bfq_data *bfqd)
++{
++ u64 timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
++
++ if (bfqd->peak_rate_samples >= BFQ_PEAK_RATE_SAMPLES)
++ return bfq_calc_max_budget(bfqd->peak_rate, timeout);
++ else
++ return bfq_default_max_budget;
++}
++
++static ssize_t bfq_max_budget_store(struct elevator_queue *e,
++ const char *page, size_t count)
++{
++ struct bfq_data *bfqd = e->elevator_data;
++ unsigned long uninitialized_var(__data);
++ int ret = bfq_var_store(&__data, (page), count);
++
++ if (__data == 0)
++ bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
++ else {
++ if (__data > INT_MAX)
++ __data = INT_MAX;
++ bfqd->bfq_max_budget = __data;
++ }
++
++ bfqd->bfq_user_max_budget = __data;
++
++ return ret;
++}
++
++static ssize_t bfq_timeout_sync_store(struct elevator_queue *e,
++ const char *page, size_t count)
++{
++ struct bfq_data *bfqd = e->elevator_data;
++ unsigned long uninitialized_var(__data);
++ int ret = bfq_var_store(&__data, (page), count);
++
++ if (__data < 1)
++ __data = 1;
++ else if (__data > INT_MAX)
++ __data = INT_MAX;
++
++ bfqd->bfq_timeout[BLK_RW_SYNC] = msecs_to_jiffies(__data);
++ if (bfqd->bfq_user_max_budget == 0)
++ bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
++
++ return ret;
++}
++
++static ssize_t bfq_low_latency_store(struct elevator_queue *e,
++ const char *page, size_t count)
++{
++ struct bfq_data *bfqd = e->elevator_data;
++ unsigned long uninitialized_var(__data);
++ int ret = bfq_var_store(&__data, (page), count);
++
++ if (__data > 1)
++ __data = 1;
++ if (__data == 0 && bfqd->low_latency != 0)
++ bfq_end_wr(bfqd);
++ bfqd->low_latency = __data;
++
++ return ret;
++}
++
++#define BFQ_ATTR(name) \
++ __ATTR(name, S_IRUGO|S_IWUSR, bfq_##name##_show, bfq_##name##_store)
++
++static struct elv_fs_entry bfq_attrs[] = {
++ BFQ_ATTR(fifo_expire_sync),
++ BFQ_ATTR(fifo_expire_async),
++ BFQ_ATTR(back_seek_max),
++ BFQ_ATTR(back_seek_penalty),
++ BFQ_ATTR(slice_idle),
++ BFQ_ATTR(max_budget),
++ BFQ_ATTR(max_budget_async_rq),
++ BFQ_ATTR(timeout_sync),
++ BFQ_ATTR(timeout_async),
++ BFQ_ATTR(low_latency),
++ BFQ_ATTR(wr_coeff),
++ BFQ_ATTR(wr_max_time),
++ BFQ_ATTR(wr_rt_max_time),
++ BFQ_ATTR(wr_min_idle_time),
++ BFQ_ATTR(wr_min_inter_arr_async),
++ BFQ_ATTR(wr_max_softrt_rate),
++ BFQ_ATTR(weights),
++ __ATTR_NULL
++};
++
++static struct elevator_type iosched_bfq = {
++ .ops = {
++ .elevator_merge_fn = bfq_merge,
++ .elevator_merged_fn = bfq_merged_request,
++ .elevator_merge_req_fn = bfq_merged_requests,
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ .elevator_bio_merged_fn = bfq_bio_merged,
++#endif
++ .elevator_allow_merge_fn = bfq_allow_merge,
++ .elevator_dispatch_fn = bfq_dispatch_requests,
++ .elevator_add_req_fn = bfq_insert_request,
++ .elevator_activate_req_fn = bfq_activate_request,
++ .elevator_deactivate_req_fn = bfq_deactivate_request,
++ .elevator_completed_req_fn = bfq_completed_request,
++ .elevator_former_req_fn = elv_rb_former_request,
++ .elevator_latter_req_fn = elv_rb_latter_request,
++ .elevator_init_icq_fn = bfq_init_icq,
++ .elevator_exit_icq_fn = bfq_exit_icq,
++ .elevator_set_req_fn = bfq_set_request,
++ .elevator_put_req_fn = bfq_put_request,
++ .elevator_may_queue_fn = bfq_may_queue,
++ .elevator_init_fn = bfq_init_queue,
++ .elevator_exit_fn = bfq_exit_queue,
++ },
++ .icq_size = sizeof(struct bfq_io_cq),
++ .icq_align = __alignof__(struct bfq_io_cq),
++ .elevator_attrs = bfq_attrs,
++ .elevator_name = "bfq",
++ .elevator_owner = THIS_MODULE,
++};
++
++static int __init bfq_init(void)
++{
++ int ret;
++
++ /*
++ * Can be 0 on HZ < 1000 setups.
++ */
++ if (bfq_slice_idle == 0)
++ bfq_slice_idle = 1;
++
++ if (bfq_timeout_async == 0)
++ bfq_timeout_async = 1;
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ ret = blkcg_policy_register(&blkcg_policy_bfq);
++ if (ret)
++ return ret;
++#endif
++
++ ret = -ENOMEM;
++ if (bfq_slab_setup())
++ goto err_pol_unreg;
++
++ /*
++ * Times to load large popular applications for the typical systems
++ * installed on the reference devices (see the comments before the
++ * definitions of the two arrays).
++ */
++ T_slow[0] = msecs_to_jiffies(2600);
++ T_slow[1] = msecs_to_jiffies(1000);
++ T_fast[0] = msecs_to_jiffies(5500);
++ T_fast[1] = msecs_to_jiffies(2000);
++
++ /*
++ * Thresholds that determine the switch between speed classes (see
++ * the comments before the definition of the array).
++ */
++ device_speed_thresh[0] = (R_fast[0] + R_slow[0]) / 2;
++ device_speed_thresh[1] = (R_fast[1] + R_slow[1]) / 2;
++
++ ret = elv_register(&iosched_bfq);
++ if (ret)
++ goto err_pol_unreg;
++
++ pr_info("BFQ I/O-scheduler: v7r9\n");
++
++ return 0;
++
++err_pol_unreg:
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ blkcg_policy_unregister(&blkcg_policy_bfq);
++#endif
++ return ret;
++}
++
++static void __exit bfq_exit(void)
++{
++ elv_unregister(&iosched_bfq);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ blkcg_policy_unregister(&blkcg_policy_bfq);
++#endif
++ bfq_slab_kill();
++}
++
++module_init(bfq_init);
++module_exit(bfq_exit);
++
++MODULE_AUTHOR("Fabio Checconi, Paolo Valente");
++MODULE_LICENSE("GPL");
+diff --git a/block/bfq-sched.c b/block/bfq-sched.c
+new file mode 100644
+index 0000000..9328a1f
+--- /dev/null
++++ b/block/bfq-sched.c
+@@ -0,0 +1,1197 @@
++/*
++ * BFQ: Hierarchical B-WF2Q+ scheduler.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ * Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ */
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++#define for_each_entity(entity) \
++ for (; entity ; entity = entity->parent)
++
++#define for_each_entity_safe(entity, parent) \
++ for (; entity && ({ parent = entity->parent; 1; }); entity = parent)
++
++
++static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
++ int extract,
++ struct bfq_data *bfqd);
++
++static struct bfq_group *bfqq_group(struct bfq_queue *bfqq);
++
++static void bfq_update_budget(struct bfq_entity *next_in_service)
++{
++ struct bfq_entity *bfqg_entity;
++ struct bfq_group *bfqg;
++ struct bfq_sched_data *group_sd;
++
++ BUG_ON(!next_in_service);
++
++ group_sd = next_in_service->sched_data;
++
++ bfqg = container_of(group_sd, struct bfq_group, sched_data);
++ /*
++ * bfq_group's my_entity field is not NULL only if the group
++ * is not the root group. We must not touch the root entity
++ * as it must never become an in-service entity.
++ */
++ bfqg_entity = bfqg->my_entity;
++ if (bfqg_entity)
++ bfqg_entity->budget = next_in_service->budget;
++}
++
++static int bfq_update_next_in_service(struct bfq_sched_data *sd)
++{
++ struct bfq_entity *next_in_service;
++
++ if (sd->in_service_entity)
++ /* will update/requeue at the end of service */
++ return 0;
++
++ /*
++ * NOTE: this can be improved in many ways, such as returning
++ * 1 (and thus propagating upwards the update) only when the
++ * budget changes, or caching the bfqq that will be scheduled
++ * next from this subtree. For now we worry more about
++ * correctness than about performance...
++ */
++ next_in_service = bfq_lookup_next_entity(sd, 0, NULL);
++ sd->next_in_service = next_in_service;
++
++ if (next_in_service)
++ bfq_update_budget(next_in_service);
++
++ return 1;
++}
++
++static void bfq_check_next_in_service(struct bfq_sched_data *sd,
++ struct bfq_entity *entity)
++{
++ BUG_ON(sd->next_in_service != entity);
++}
++#else
++#define for_each_entity(entity) \
++ for (; entity ; entity = NULL)
++
++#define for_each_entity_safe(entity, parent) \
++ for (parent = NULL; entity ; entity = parent)
++
++static int bfq_update_next_in_service(struct bfq_sched_data *sd)
++{
++ return 0;
++}
++
++static void bfq_check_next_in_service(struct bfq_sched_data *sd,
++ struct bfq_entity *entity)
++{
++}
++
++static void bfq_update_budget(struct bfq_entity *next_in_service)
++{
++}
++#endif
++
++/*
++ * Shift for timestamp calculations. This actually limits the maximum
++ * service allowed in one timestamp delta (small shift values increase it),
++ * the maximum total weight that can be used for the queues in the system
++ * (big shift values increase it), and the period of virtual time
++ * wraparounds.
++ */
++#define WFQ_SERVICE_SHIFT 22
++
++/**
++ * bfq_gt - compare two timestamps.
++ * @a: first ts.
++ * @b: second ts.
++ *
++ * Return @a > @b, dealing with wrapping correctly.
++ */
++static int bfq_gt(u64 a, u64 b)
++{
++ return (s64)(a - b) > 0;
++}
++
++static struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = NULL;
++
++ BUG_ON(!entity);
++
++ if (!entity->my_sched_data)
++ bfqq = container_of(entity, struct bfq_queue, entity);
++
++ return bfqq;
++}
++
++
++/**
++ * bfq_delta - map service into the virtual time domain.
++ * @service: amount of service.
++ * @weight: scale factor (weight of an entity or weight sum).
++ */
++static u64 bfq_delta(unsigned long service, unsigned long weight)
++{
++ u64 d = (u64)service << WFQ_SERVICE_SHIFT;
++
++ do_div(d, weight);
++ return d;
++}
++
++/**
++ * bfq_calc_finish - assign the finish time to an entity.
++ * @entity: the entity to act upon.
++ * @service: the service to be charged to the entity.
++ */
++static void bfq_calc_finish(struct bfq_entity *entity, unsigned long service)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++ BUG_ON(entity->weight == 0);
++
++ entity->finish = entity->start +
++ bfq_delta(service, entity->weight);
++
++ if (bfqq) {
++ bfq_log_bfqq(bfqq->bfqd, bfqq,
++ "calc_finish: serv %lu, w %d",
++ service, entity->weight);
++ bfq_log_bfqq(bfqq->bfqd, bfqq,
++ "calc_finish: start %llu, finish %llu, delta %llu",
++ entity->start, entity->finish,
++ bfq_delta(service, entity->weight));
++ }
++}
++
++/**
++ * bfq_entity_of - get an entity from a node.
++ * @node: the node field of the entity.
++ *
++ * Convert a node pointer to the relative entity. This is used only
++ * to simplify the logic of some functions and not as the generic
++ * conversion mechanism because, e.g., in the tree walking functions,
++ * the check for a %NULL value would be redundant.
++ */
++static struct bfq_entity *bfq_entity_of(struct rb_node *node)
++{
++ struct bfq_entity *entity = NULL;
++
++ if (node)
++ entity = rb_entry(node, struct bfq_entity, rb_node);
++
++ return entity;
++}
++
++/**
++ * bfq_extract - remove an entity from a tree.
++ * @root: the tree root.
++ * @entity: the entity to remove.
++ */
++static void bfq_extract(struct rb_root *root, struct bfq_entity *entity)
++{
++ BUG_ON(entity->tree != root);
++
++ entity->tree = NULL;
++ rb_erase(&entity->rb_node, root);
++}
++
++/**
++ * bfq_idle_extract - extract an entity from the idle tree.
++ * @st: the service tree of the owning @entity.
++ * @entity: the entity being removed.
++ */
++static void bfq_idle_extract(struct bfq_service_tree *st,
++ struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ struct rb_node *next;
++
++ BUG_ON(entity->tree != &st->idle);
++
++ if (entity == st->first_idle) {
++ next = rb_next(&entity->rb_node);
++ st->first_idle = bfq_entity_of(next);
++ }
++
++ if (entity == st->last_idle) {
++ next = rb_prev(&entity->rb_node);
++ st->last_idle = bfq_entity_of(next);
++ }
++
++ bfq_extract(&st->idle, entity);
++
++ if (bfqq)
++ list_del(&bfqq->bfqq_list);
++}
++
++/**
++ * bfq_insert - generic tree insertion.
++ * @root: tree root.
++ * @entity: entity to insert.
++ *
++ * This is used for the idle and the active tree, since they are both
++ * ordered by finish time.
++ */
++static void bfq_insert(struct rb_root *root, struct bfq_entity *entity)
++{
++ struct bfq_entity *entry;
++ struct rb_node **node = &root->rb_node;
++ struct rb_node *parent = NULL;
++
++ BUG_ON(entity->tree);
++
++ while (*node) {
++ parent = *node;
++ entry = rb_entry(parent, struct bfq_entity, rb_node);
++
++ if (bfq_gt(entry->finish, entity->finish))
++ node = &parent->rb_left;
++ else
++ node = &parent->rb_right;
++ }
++
++ rb_link_node(&entity->rb_node, parent, node);
++ rb_insert_color(&entity->rb_node, root);
++
++ entity->tree = root;
++}
++
++/**
++ * bfq_update_min - update the min_start field of an entity.
++ * @entity: the entity to update.
++ * @node: one of its children.
++ *
++ * This function is called when @entity may store an invalid value for
++ * min_start due to updates to the active tree. The function assumes
++ * that the subtree rooted at @node (which may be its left or its right
++ * child) has a valid min_start value.
++ */
++static void bfq_update_min(struct bfq_entity *entity, struct rb_node *node)
++{
++ struct bfq_entity *child;
++
++ if (node) {
++ child = rb_entry(node, struct bfq_entity, rb_node);
++ if (bfq_gt(entity->min_start, child->min_start))
++ entity->min_start = child->min_start;
++ }
++}
++
++/**
++ * bfq_update_active_node - recalculate min_start.
++ * @node: the node to update.
++ *
++ * @node may have changed position or one of its children may have moved,
++ * this function updates its min_start value. The left and right subtrees
++ * are assumed to hold a correct min_start value.
++ */
++static void bfq_update_active_node(struct rb_node *node)
++{
++ struct bfq_entity *entity = rb_entry(node, struct bfq_entity, rb_node);
++
++ entity->min_start = entity->start;
++ bfq_update_min(entity, node->rb_right);
++ bfq_update_min(entity, node->rb_left);
++}
++
++/**
++ * bfq_update_active_tree - update min_start for the whole active tree.
++ * @node: the starting node.
++ *
++ * @node must be the deepest modified node after an update. This function
++ * updates its min_start using the values held by its children, assuming
++ * that they did not change, and then updates all the nodes that may have
++ * changed in the path to the root. The only nodes that may have changed
++ * are the ones in the path or their siblings.
++ */
++static void bfq_update_active_tree(struct rb_node *node)
++{
++ struct rb_node *parent;
++
++up:
++ bfq_update_active_node(node);
++
++ parent = rb_parent(node);
++ if (!parent)
++ return;
++
++ if (node == parent->rb_left && parent->rb_right)
++ bfq_update_active_node(parent->rb_right);
++ else if (parent->rb_left)
++ bfq_update_active_node(parent->rb_left);
++
++ node = parent;
++ goto up;
++}
++
++static void bfq_weights_tree_add(struct bfq_data *bfqd,
++ struct bfq_entity *entity,
++ struct rb_root *root);
++
++static void bfq_weights_tree_remove(struct bfq_data *bfqd,
++ struct bfq_entity *entity,
++ struct rb_root *root);
++
++
++/**
++ * bfq_active_insert - insert an entity in the active tree of its
++ * group/device.
++ * @st: the service tree of the entity.
++ * @entity: the entity being inserted.
++ *
++ * The active tree is ordered by finish time, but an extra key is kept
++ * for each node, containing the minimum value for the start times of
++ * its children (and the node itself), so it's possible to search for
++ * the eligible node with the lowest finish time in logarithmic time.
++ */
++static void bfq_active_insert(struct bfq_service_tree *st,
++ struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ struct rb_node *node = &entity->rb_node;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ struct bfq_sched_data *sd = NULL;
++ struct bfq_group *bfqg = NULL;
++ struct bfq_data *bfqd = NULL;
++#endif
++
++ bfq_insert(&st->active, entity);
++
++ if (node->rb_left)
++ node = node->rb_left;
++ else if (node->rb_right)
++ node = node->rb_right;
++
++ bfq_update_active_tree(node);
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ sd = entity->sched_data;
++ bfqg = container_of(sd, struct bfq_group, sched_data);
++ BUG_ON(!bfqg);
++ bfqd = (struct bfq_data *)bfqg->bfqd;
++#endif
++ if (bfqq)
++ list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ else { /* bfq_group */
++ BUG_ON(!bfqd);
++ bfq_weights_tree_add(bfqd, entity, &bfqd->group_weights_tree);
++ }
++ if (bfqg != bfqd->root_group) {
++ BUG_ON(!bfqg);
++ BUG_ON(!bfqd);
++ bfqg->active_entities++;
++ if (bfqg->active_entities == 2)
++ bfqd->active_numerous_groups++;
++ }
++#endif
++}
++
++/**
++ * bfq_ioprio_to_weight - calc a weight from an ioprio.
++ * @ioprio: the ioprio value to convert.
++ */
++static unsigned short bfq_ioprio_to_weight(int ioprio)
++{
++ BUG_ON(ioprio < 0 || ioprio >= IOPRIO_BE_NR);
++ return IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - ioprio;
++}
++
++/**
++ * bfq_weight_to_ioprio - calc an ioprio from a weight.
++ * @weight: the weight value to convert.
++ *
++ * To preserve as much as possible the old only-ioprio user interface,
++ * 0 is used as an escape ioprio value for weights (numerically) equal or
++ * larger than IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF.
++ */
++static unsigned short bfq_weight_to_ioprio(int weight)
++{
++ BUG_ON(weight < BFQ_MIN_WEIGHT || weight > BFQ_MAX_WEIGHT);
++ return IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight < 0 ?
++ 0 : IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight;
++}
++
++static void bfq_get_entity(struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++ if (bfqq) {
++ atomic_inc(&bfqq->ref);
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "get_entity: %p %d",
++ bfqq, atomic_read(&bfqq->ref));
++ }
++}
++
++/**
++ * bfq_find_deepest - find the deepest node that an extraction can modify.
++ * @node: the node being removed.
++ *
++ * Do the first step of an extraction in an rb tree, looking for the
++ * node that will replace @node, and returning the deepest node that
++ * the following modifications to the tree can touch. If @node is the
++ * last node in the tree return %NULL.
++ */
++static struct rb_node *bfq_find_deepest(struct rb_node *node)
++{
++ struct rb_node *deepest;
++
++ if (!node->rb_right && !node->rb_left)
++ deepest = rb_parent(node);
++ else if (!node->rb_right)
++ deepest = node->rb_left;
++ else if (!node->rb_left)
++ deepest = node->rb_right;
++ else {
++ deepest = rb_next(node);
++ if (deepest->rb_right)
++ deepest = deepest->rb_right;
++ else if (rb_parent(deepest) != node)
++ deepest = rb_parent(deepest);
++ }
++
++ return deepest;
++}
++
++/**
++ * bfq_active_extract - remove an entity from the active tree.
++ * @st: the service_tree containing the tree.
++ * @entity: the entity being removed.
++ */
++static void bfq_active_extract(struct bfq_service_tree *st,
++ struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ struct rb_node *node;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ struct bfq_sched_data *sd = NULL;
++ struct bfq_group *bfqg = NULL;
++ struct bfq_data *bfqd = NULL;
++#endif
++
++ node = bfq_find_deepest(&entity->rb_node);
++ bfq_extract(&st->active, entity);
++
++ if (node)
++ bfq_update_active_tree(node);
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ sd = entity->sched_data;
++ bfqg = container_of(sd, struct bfq_group, sched_data);
++ BUG_ON(!bfqg);
++ bfqd = (struct bfq_data *)bfqg->bfqd;
++#endif
++ if (bfqq)
++ list_del(&bfqq->bfqq_list);
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ else { /* bfq_group */
++ BUG_ON(!bfqd);
++ bfq_weights_tree_remove(bfqd, entity,
++ &bfqd->group_weights_tree);
++ }
++ if (bfqg != bfqd->root_group) {
++ BUG_ON(!bfqg);
++ BUG_ON(!bfqd);
++ BUG_ON(!bfqg->active_entities);
++ bfqg->active_entities--;
++ if (bfqg->active_entities == 1) {
++ BUG_ON(!bfqd->active_numerous_groups);
++ bfqd->active_numerous_groups--;
++ }
++ }
++#endif
++}
++
++/**
++ * bfq_idle_insert - insert an entity into the idle tree.
++ * @st: the service tree containing the tree.
++ * @entity: the entity to insert.
++ */
++static void bfq_idle_insert(struct bfq_service_tree *st,
++ struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ struct bfq_entity *first_idle = st->first_idle;
++ struct bfq_entity *last_idle = st->last_idle;
++
++ if (!first_idle || bfq_gt(first_idle->finish, entity->finish))
++ st->first_idle = entity;
++ if (!last_idle || bfq_gt(entity->finish, last_idle->finish))
++ st->last_idle = entity;
++
++ bfq_insert(&st->idle, entity);
++
++ if (bfqq)
++ list_add(&bfqq->bfqq_list, &bfqq->bfqd->idle_list);
++}
++
++/**
++ * bfq_forget_entity - remove an entity from the wfq trees.
++ * @st: the service tree.
++ * @entity: the entity being removed.
++ *
++ * Update the device status and forget everything about @entity, putting
++ * the device reference to it, if it is a queue. Entities belonging to
++ * groups are not refcounted.
++ */
++static void bfq_forget_entity(struct bfq_service_tree *st,
++ struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ struct bfq_sched_data *sd;
++
++ BUG_ON(!entity->on_st);
++
++ entity->on_st = 0;
++ st->wsum -= entity->weight;
++ if (bfqq) {
++ sd = entity->sched_data;
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "forget_entity: %p %d",
++ bfqq, atomic_read(&bfqq->ref));
++ bfq_put_queue(bfqq);
++ }
++}
++
++/**
++ * bfq_put_idle_entity - release the idle tree ref of an entity.
++ * @st: service tree for the entity.
++ * @entity: the entity being released.
++ */
++static void bfq_put_idle_entity(struct bfq_service_tree *st,
++ struct bfq_entity *entity)
++{
++ bfq_idle_extract(st, entity);
++ bfq_forget_entity(st, entity);
++}
++
++/**
++ * bfq_forget_idle - update the idle tree if necessary.
++ * @st: the service tree to act upon.
++ *
++ * To preserve the global O(log N) complexity we only remove one entry here;
++ * as the idle tree will not grow indefinitely this can be done safely.
++ */
++static void bfq_forget_idle(struct bfq_service_tree *st)
++{
++ struct bfq_entity *first_idle = st->first_idle;
++ struct bfq_entity *last_idle = st->last_idle;
++
++ if (RB_EMPTY_ROOT(&st->active) && last_idle &&
++ !bfq_gt(last_idle->finish, st->vtime)) {
++ /*
++ * Forget the whole idle tree, increasing the vtime past
++ * the last finish time of idle entities.
++ */
++ st->vtime = last_idle->finish;
++ }
++
++ if (first_idle && !bfq_gt(first_idle->finish, st->vtime))
++ bfq_put_idle_entity(st, first_idle);
++}
++
++static struct bfq_service_tree *
++__bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
++ struct bfq_entity *entity)
++{
++ struct bfq_service_tree *new_st = old_st;
++
++ if (entity->prio_changed) {
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ unsigned short prev_weight, new_weight;
++ struct bfq_data *bfqd = NULL;
++ struct rb_root *root;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ struct bfq_sched_data *sd;
++ struct bfq_group *bfqg;
++#endif
++
++ if (bfqq)
++ bfqd = bfqq->bfqd;
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ else {
++ sd = entity->my_sched_data;
++ bfqg = container_of(sd, struct bfq_group, sched_data);
++ BUG_ON(!bfqg);
++ bfqd = (struct bfq_data *)bfqg->bfqd;
++ BUG_ON(!bfqd);
++ }
++#endif
++
++ BUG_ON(old_st->wsum < entity->weight);
++ old_st->wsum -= entity->weight;
++
++ if (entity->new_weight != entity->orig_weight) {
++ if (entity->new_weight < BFQ_MIN_WEIGHT ||
++ entity->new_weight > BFQ_MAX_WEIGHT) {
++ printk(KERN_CRIT "update_weight_prio: "
++ "new_weight %d\n",
++ entity->new_weight);
++ BUG();
++ }
++ entity->orig_weight = entity->new_weight;
++ if (bfqq)
++ bfqq->ioprio =
++ bfq_weight_to_ioprio(entity->orig_weight);
++ }
++
++ if (bfqq)
++ bfqq->ioprio_class = bfqq->new_ioprio_class;
++ entity->prio_changed = 0;
++
++ /*
++ * NOTE: here we may be changing the weight too early,
++ * this will cause unfairness. The correct approach
++ * would have required additional complexity to defer
++ * weight changes to the proper time instants (i.e.,
++ * when entity->finish <= old_st->vtime).
++ */
++ new_st = bfq_entity_service_tree(entity);
++
++ prev_weight = entity->weight;
++ new_weight = entity->orig_weight *
++ (bfqq ? bfqq->wr_coeff : 1);
++ /*
++ * If the weight of the entity changes, remove the entity
++ * from its old weight counter (if there is a counter
++ * associated with the entity), and add it to the counter
++ * associated with its new weight.
++ */
++ if (prev_weight != new_weight) {
++ root = bfqq ? &bfqd->queue_weights_tree :
++ &bfqd->group_weights_tree;
++ bfq_weights_tree_remove(bfqd, entity, root);
++ }
++ entity->weight = new_weight;
++ /*
++ * Add the entity to its weights tree only if it is
++ * not associated with a weight-raised queue.
++ */
++ if (prev_weight != new_weight &&
++ (bfqq ? bfqq->wr_coeff == 1 : 1))
++ /* If we get here, root has been initialized. */
++ bfq_weights_tree_add(bfqd, entity, root);
++
++ new_st->wsum += entity->weight;
++
++ if (new_st != old_st)
++ entity->start = new_st->vtime;
++ }
++
++ return new_st;
++}
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++static void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg);
++#endif
++
++/**
++ * bfq_bfqq_served - update the scheduler status after selection for
++ * service.
++ * @bfqq: the queue being served.
++ * @served: bytes to transfer.
++ *
++ * NOTE: this can be optimized, as the timestamps of upper level entities
++ * are synchronized every time a new bfqq is selected for service. For now,
++ * we keep it to better check consistency.
++ */
++static void bfq_bfqq_served(struct bfq_queue *bfqq, int served)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++ struct bfq_service_tree *st;
++
++ for_each_entity(entity) {
++ st = bfq_entity_service_tree(entity);
++
++ entity->service += served;
++ BUG_ON(entity->service > entity->budget);
++ BUG_ON(st->wsum == 0);
++
++ st->vtime += bfq_delta(served, st->wsum);
++ bfq_forget_idle(st);
++ }
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqg_stats_set_start_empty_time(bfqq_group(bfqq));
++#endif
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "bfqq_served %d secs", served);
++}
++
++/**
++ * bfq_bfqq_charge_full_budget - set the service to the entity budget.
++ * @bfqq: the queue that needs a service update.
++ *
++ * When it's not possible to be fair in the service domain, because
++ * a queue is not consuming its budget fast enough (the meaning of
++ * fast depends on the timeout parameter), we charge it a full
++ * budget. In this way we should obtain a sort of time-domain
++ * fairness among all the seeky/slow queues.
++ */
++static void bfq_bfqq_charge_full_budget(struct bfq_queue *bfqq)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "charge_full_budget");
++
++ bfq_bfqq_served(bfqq, entity->budget - entity->service);
++}
++
++/**
++ * __bfq_activate_entity - activate an entity.
++ * @entity: the entity being activated.
++ *
++ * Called whenever an entity is activated, i.e., it is not active and one
++ * of its children receives a new request, or has to be reactivated due to
++ * budget exhaustion. It uses the current budget of the entity (and the
++ * service received if @entity is active) of the queue to calculate its
++ * timestamps.
++ */
++static void __bfq_activate_entity(struct bfq_entity *entity)
++{
++ struct bfq_sched_data *sd = entity->sched_data;
++ struct bfq_service_tree *st = bfq_entity_service_tree(entity);
++
++ if (entity == sd->in_service_entity) {
++ BUG_ON(entity->tree);
++ /*
++ * If we are requeueing the current entity, we must
++ * take care not to charge it for service it has
++ * not received.
++ */
++ bfq_calc_finish(entity, entity->service);
++ entity->start = entity->finish;
++ sd->in_service_entity = NULL;
++ } else if (entity->tree == &st->active) {
++ /*
++ * Requeueing an entity due to a change of some
++ * next_in_service entity below it. We reuse the
++ * old start time.
++ */
++ bfq_active_extract(st, entity);
++ } else if (entity->tree == &st->idle) {
++ /*
++ * Must be on the idle tree, bfq_idle_extract() will
++ * check for that.
++ */
++ bfq_idle_extract(st, entity);
++ entity->start = bfq_gt(st->vtime, entity->finish) ?
++ st->vtime : entity->finish;
++ } else {
++ /*
++ * The finish time of the entity may be invalid, and
++ * it is in the past for sure, otherwise the queue
++ * would have been on the idle tree.
++ */
++ entity->start = st->vtime;
++ st->wsum += entity->weight;
++ bfq_get_entity(entity);
++
++ BUG_ON(entity->on_st);
++ entity->on_st = 1;
++ }
++
++ st = __bfq_entity_update_weight_prio(st, entity);
++ bfq_calc_finish(entity, entity->budget);
++ bfq_active_insert(st, entity);
++}
++
++/**
++ * bfq_activate_entity - activate an entity and its ancestors if necessary.
++ * @entity: the entity to activate.
++ *
++ * Activate @entity and all the entities on the path from it to the root.
++ */
++static void bfq_activate_entity(struct bfq_entity *entity)
++{
++ struct bfq_sched_data *sd;
++
++ for_each_entity(entity) {
++ __bfq_activate_entity(entity);
++
++ sd = entity->sched_data;
++ if (!bfq_update_next_in_service(sd))
++ /*
++ * No need to propagate the activation to the
++ * upper entities, as they will be updated when
++ * the in-service entity is rescheduled.
++ */
++ break;
++ }
++}
++
++/**
++ * __bfq_deactivate_entity - deactivate an entity from its service tree.
++ * @entity: the entity to deactivate.
++ * @requeue: if false, the entity will not be put into the idle tree.
++ *
++ * Deactivate an entity, independently from its previous state. If the
++ * entity was not on a service tree just return, otherwise if it is on
++ * any scheduler tree, extract it from that tree and, if necessary
++ * and if the caller did specify @requeue, put it on the idle tree.
++ *
++ * Return %1 if the caller should update the entity hierarchy, i.e.,
++ * if the entity was in service or if it was the next_in_service for
++ * its sched_data; return %0 otherwise.
++ */
++static int __bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
++{
++ struct bfq_sched_data *sd = entity->sched_data;
++ struct bfq_service_tree *st = bfq_entity_service_tree(entity);
++ int was_in_service = entity == sd->in_service_entity;
++ int ret = 0;
++
++ if (!entity->on_st)
++ return 0;
++
++ BUG_ON(was_in_service && entity->tree);
++
++ if (was_in_service) {
++ bfq_calc_finish(entity, entity->service);
++ sd->in_service_entity = NULL;
++ } else if (entity->tree == &st->active)
++ bfq_active_extract(st, entity);
++ else if (entity->tree == &st->idle)
++ bfq_idle_extract(st, entity);
++ else if (entity->tree)
++ BUG();
++
++ if (was_in_service || sd->next_in_service == entity)
++ ret = bfq_update_next_in_service(sd);
++
++ if (!requeue || !bfq_gt(entity->finish, st->vtime))
++ bfq_forget_entity(st, entity);
++ else
++ bfq_idle_insert(st, entity);
++
++ BUG_ON(sd->in_service_entity == entity);
++ BUG_ON(sd->next_in_service == entity);
++
++ return ret;
++}
++
++/**
++ * bfq_deactivate_entity - deactivate an entity.
++ * @entity: the entity to deactivate.
++ * @requeue: true if the entity can be put on the idle tree
++ */
++static void bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
++{
++ struct bfq_sched_data *sd;
++ struct bfq_entity *parent;
++
++ for_each_entity_safe(entity, parent) {
++ sd = entity->sched_data;
++
++ if (!__bfq_deactivate_entity(entity, requeue))
++ /*
++ * The parent entity is still backlogged, and
++ * we don't need to update it as it is still
++ * in service.
++ */
++ break;
++
++ if (sd->next_in_service)
++ /*
++ * The parent entity is still backlogged and
++ * the budgets on the path towards the root
++ * need to be updated.
++ */
++ goto update;
++
++ /*
++ * If we get here, the parent is no longer backlogged and
++ * we want to propagate the dequeue upwards.
++ */
++ requeue = 1;
++ }
++
++ return;
++
++update:
++ entity = parent;
++ for_each_entity(entity) {
++ __bfq_activate_entity(entity);
++
++ sd = entity->sched_data;
++ if (!bfq_update_next_in_service(sd))
++ break;
++ }
++}
++
++/**
++ * bfq_update_vtime - update vtime if necessary.
++ * @st: the service tree to act upon.
++ *
++ * If necessary update the service tree vtime to have at least one
++ * eligible entity, skipping to its start time. Assumes that the
++ * active tree of the device is not empty.
++ *
++ * NOTE: this hierarchical implementation updates vtimes quite often,
++ * we may end up with reactivated processes getting timestamps after a
++ * vtime skip done because we needed a ->first_active entity on some
++ * intermediate node.
++ */
++static void bfq_update_vtime(struct bfq_service_tree *st)
++{
++ struct bfq_entity *entry;
++ struct rb_node *node = st->active.rb_node;
++
++ entry = rb_entry(node, struct bfq_entity, rb_node);
++ if (bfq_gt(entry->min_start, st->vtime)) {
++ st->vtime = entry->min_start;
++ bfq_forget_idle(st);
++ }
++}
++
++/**
++ * bfq_first_active_entity - find the eligible entity with
++ * the smallest finish time
++ * @st: the service tree to select from.
++ *
++ * This function searches for the first schedulable entity, starting from
++ * the root of the tree and going left every time that side holds a
++ * subtree with at least one eligible (start <= vtime) entity. The path on
++ * the right is followed only if a) the left subtree contains no eligible
++ * entities and b) no eligible entity has been found yet.
++ */
++static struct bfq_entity *bfq_first_active_entity(struct bfq_service_tree *st)
++{
++ struct bfq_entity *entry, *first = NULL;
++ struct rb_node *node = st->active.rb_node;
++
++ while (node) {
++ entry = rb_entry(node, struct bfq_entity, rb_node);
++left:
++ if (!bfq_gt(entry->start, st->vtime))
++ first = entry;
++
++ BUG_ON(bfq_gt(entry->min_start, st->vtime));
++
++ if (node->rb_left) {
++ entry = rb_entry(node->rb_left,
++ struct bfq_entity, rb_node);
++ if (!bfq_gt(entry->min_start, st->vtime)) {
++ node = node->rb_left;
++ goto left;
++ }
++ }
++ if (first)
++ break;
++ node = node->rb_right;
++ }
++
++ BUG_ON(!first && !RB_EMPTY_ROOT(&st->active));
++ return first;
++}
++
++/**
++ * __bfq_lookup_next_entity - return the first eligible entity in @st.
++ * @st: the service tree.
++ *
++ * Update the virtual time in @st and return the first eligible entity
++ * it contains.
++ */
++static struct bfq_entity *__bfq_lookup_next_entity(struct bfq_service_tree *st,
++ bool force)
++{
++ struct bfq_entity *entity, *new_next_in_service = NULL;
++
++ if (RB_EMPTY_ROOT(&st->active))
++ return NULL;
++
++ bfq_update_vtime(st);
++ entity = bfq_first_active_entity(st);
++ BUG_ON(bfq_gt(entity->start, st->vtime));
++
++ /*
++ * If the chosen entity does not match with the sched_data's
++ * next_in_service and we are forcibly serving the IDLE priority
++ * class tree, bubble up budget update.
++ */
++ if (unlikely(force && entity != entity->sched_data->next_in_service)) {
++ new_next_in_service = entity;
++ for_each_entity(new_next_in_service)
++ bfq_update_budget(new_next_in_service);
++ }
++
++ return entity;
++}
++
++/**
++ * bfq_lookup_next_entity - return the first eligible entity in @sd.
++ * @sd: the sched_data.
++ * @extract: if true the returned entity will be also extracted from @sd.
++ *
++ * NOTE: since we cache the next_in_service entity at each level of the
++ * hierarchy, the complexity of the lookup can be decreased with
++ * absolutely no effort just returning the cached next_in_service value;
++ * we prefer to do full lookups to test the consistency of the data
++ * structures.
++ */
++static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
++ int extract,
++ struct bfq_data *bfqd)
++{
++ struct bfq_service_tree *st = sd->service_tree;
++ struct bfq_entity *entity;
++ int i = 0;
++
++ BUG_ON(sd->in_service_entity);
++
++ if (bfqd &&
++ jiffies - bfqd->bfq_class_idle_last_service > BFQ_CL_IDLE_TIMEOUT) {
++ entity = __bfq_lookup_next_entity(st + BFQ_IOPRIO_CLASSES - 1,
++ true);
++ if (entity) {
++ i = BFQ_IOPRIO_CLASSES - 1;
++ bfqd->bfq_class_idle_last_service = jiffies;
++ sd->next_in_service = entity;
++ }
++ }
++ for (; i < BFQ_IOPRIO_CLASSES; i++) {
++ entity = __bfq_lookup_next_entity(st + i, false);
++ if (entity) {
++ if (extract) {
++ bfq_check_next_in_service(sd, entity);
++ bfq_active_extract(st + i, entity);
++ sd->in_service_entity = entity;
++ sd->next_in_service = NULL;
++ }
++ break;
++ }
++ }
++
++ return entity;
++}
++
++/*
++ * Get next queue for service.
++ */
++static struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
++{
++ struct bfq_entity *entity = NULL;
++ struct bfq_sched_data *sd;
++ struct bfq_queue *bfqq;
++
++ BUG_ON(bfqd->in_service_queue);
++
++ if (bfqd->busy_queues == 0)
++ return NULL;
++
++ sd = &bfqd->root_group->sched_data;
++ for (; sd ; sd = entity->my_sched_data) {
++ entity = bfq_lookup_next_entity(sd, 1, bfqd);
++ BUG_ON(!entity);
++ entity->service = 0;
++ }
++
++ bfqq = bfq_entity_to_bfqq(entity);
++ BUG_ON(!bfqq);
++
++ return bfqq;
++}
++
++static void __bfq_bfqd_reset_in_service(struct bfq_data *bfqd)
++{
++ if (bfqd->in_service_bic) {
++ put_io_context(bfqd->in_service_bic->icq.ioc);
++ bfqd->in_service_bic = NULL;
++ }
++
++ bfqd->in_service_queue = NULL;
++ del_timer(&bfqd->idle_slice_timer);
++}
++
++static void bfq_deactivate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ int requeue)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++
++ if (bfqq == bfqd->in_service_queue)
++ __bfq_bfqd_reset_in_service(bfqd);
++
++ bfq_deactivate_entity(entity, requeue);
++}
++
++static void bfq_activate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++
++ bfq_activate_entity(entity);
++}
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++static void bfqg_stats_update_dequeue(struct bfq_group *bfqg);
++#endif
++
++/*
++ * Called when the bfqq no longer has requests pending, remove it from
++ * the service tree.
++ */
++static void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ int requeue)
++{
++ BUG_ON(!bfq_bfqq_busy(bfqq));
++ BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
++
++ bfq_log_bfqq(bfqd, bfqq, "del from busy");
++
++ bfq_clear_bfqq_busy(bfqq);
++
++ BUG_ON(bfqd->busy_queues == 0);
++ bfqd->busy_queues--;
++
++ if (!bfqq->dispatched) {
++ bfq_weights_tree_remove(bfqd, &bfqq->entity,
++ &bfqd->queue_weights_tree);
++ if (!blk_queue_nonrot(bfqd->queue)) {
++ BUG_ON(!bfqd->busy_in_flight_queues);
++ bfqd->busy_in_flight_queues--;
++ if (bfq_bfqq_constantly_seeky(bfqq)) {
++ BUG_ON(!bfqd->
++ const_seeky_busy_in_flight_queues);
++ bfqd->const_seeky_busy_in_flight_queues--;
++ }
++ }
++ }
++ if (bfqq->wr_coeff > 1)
++ bfqd->wr_busy_queues--;
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ bfqg_stats_update_dequeue(bfqq_group(bfqq));
++#endif
++
++ bfq_deactivate_bfqq(bfqd, bfqq, requeue);
++}
++
++/*
++ * Called when an inactive queue receives a new request.
++ */
++static void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ BUG_ON(bfq_bfqq_busy(bfqq));
++ BUG_ON(bfqq == bfqd->in_service_queue);
++
++ bfq_log_bfqq(bfqd, bfqq, "add to busy");
++
++ bfq_activate_bfqq(bfqd, bfqq);
++
++ bfq_mark_bfqq_busy(bfqq);
++ bfqd->busy_queues++;
++
++ if (!bfqq->dispatched) {
++ if (bfqq->wr_coeff == 1)
++ bfq_weights_tree_add(bfqd, &bfqq->entity,
++ &bfqd->queue_weights_tree);
++ if (!blk_queue_nonrot(bfqd->queue)) {
++ bfqd->busy_in_flight_queues++;
++ if (bfq_bfqq_constantly_seeky(bfqq))
++ bfqd->const_seeky_busy_in_flight_queues++;
++ }
++ }
++ if (bfqq->wr_coeff > 1)
++ bfqd->wr_busy_queues++;
++}
+diff --git a/block/bfq.h b/block/bfq.h
+new file mode 100644
+index 0000000..ca5ac20
+--- /dev/null
++++ b/block/bfq.h
+@@ -0,0 +1,807 @@
++/*
++ * BFQ-v7r9 for 4.2.0: data structures and common functions prototypes.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ * Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ */
++
++#ifndef _BFQ_H
++#define _BFQ_H
++
++#include <linux/blktrace_api.h>
++#include <linux/hrtimer.h>
++#include <linux/ioprio.h>
++#include <linux/rbtree.h>
++#include <linux/blk-cgroup.h>
++
++#define BFQ_IOPRIO_CLASSES 3
++#define BFQ_CL_IDLE_TIMEOUT (HZ/5)
++
++#define BFQ_MIN_WEIGHT 1
++#define BFQ_MAX_WEIGHT 1000
++#define BFQ_WEIGHT_CONVERSION_COEFF 10
++
++#define BFQ_DEFAULT_QUEUE_IOPRIO 4
++
++#define BFQ_DEFAULT_GRP_WEIGHT 10
++#define BFQ_DEFAULT_GRP_IOPRIO 0
++#define BFQ_DEFAULT_GRP_CLASS IOPRIO_CLASS_BE
++
++struct bfq_entity;
++
++/**
++ * struct bfq_service_tree - per ioprio_class service tree.
++ * @active: tree for active entities (i.e., those backlogged).
++ * @idle: tree for idle entities (i.e., those not backlogged, with V <= F_i).
++ * @first_idle: idle entity with minimum F_i.
++ * @last_idle: idle entity with maximum F_i.
++ * @vtime: scheduler virtual time.
++ * @wsum: scheduler weight sum; active and idle entities contribute to it.
++ *
++ * Each service tree represents a B-WF2Q+ scheduler on its own. Each
++ * ioprio_class has its own independent scheduler, and so its own
++ * bfq_service_tree. All the fields are protected by the queue lock
++ * of the containing bfqd.
++ */
++struct bfq_service_tree {
++ struct rb_root active;
++ struct rb_root idle;
++
++ struct bfq_entity *first_idle;
++ struct bfq_entity *last_idle;
++
++ u64 vtime;
++ unsigned long wsum;
++};
++
++/**
++ * struct bfq_sched_data - multi-class scheduler.
++ * @in_service_entity: entity in service.
++ * @next_in_service: head-of-the-line entity in the scheduler.
++ * @service_tree: array of service trees, one per ioprio_class.
++ *
++ * bfq_sched_data is the basic scheduler queue. It supports three
++ * ioprio_classes, and can be used either as a toplevel queue or as
++ * an intermediate queue on a hierarchical setup.
++ * @next_in_service points to the active entity of the sched_data
++ * service trees that will be scheduled next.
++ *
++ * The supported ioprio_classes are the same as in CFQ, in descending
++ * priority order, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE.
++ * Requests from higher priority queues are served before all the
++ * requests from lower priority queues; among queues of the same
++ * class, requests are served according to B-WF2Q+.
++ * All the fields are protected by the queue lock of the containing bfqd.
++ */
++struct bfq_sched_data {
++ struct bfq_entity *in_service_entity;
++ struct bfq_entity *next_in_service;
++ struct bfq_service_tree service_tree[BFQ_IOPRIO_CLASSES];
++};
++
++/**
++ * struct bfq_weight_counter - counter of the number of all active entities
++ * with a given weight.
++ * @weight: weight of the entities that this counter refers to.
++ * @num_active: number of active entities with this weight.
++ * @weights_node: weights tree member (see bfq_data's @queue_weights_tree
++ * and @group_weights_tree).
++ */
++struct bfq_weight_counter {
++ short int weight;
++ unsigned int num_active;
++ struct rb_node weights_node;
++};
++
++/**
++ * struct bfq_entity - schedulable entity.
++ * @rb_node: service_tree member.
++ * @weight_counter: pointer to the weight counter associated with this entity.
++ * @on_st: flag, true if the entity is on a tree (either the active or
++ * the idle one of its service_tree).
++ * @finish: B-WF2Q+ finish timestamp (aka F_i).
++ * @start: B-WF2Q+ start timestamp (aka S_i).
++ * @tree: tree the entity is enqueued into; %NULL if not on a tree.
++ * @min_start: minimum start time of the (active) subtree rooted at
++ * this entity; used for O(log N) lookups into active trees.
++ * @service: service received during the last round of service.
++ * @budget: budget used to calculate F_i; F_i = S_i + @budget / @weight.
++ * @weight: weight of the queue
++ * @parent: parent entity, for hierarchical scheduling.
++ * @my_sched_data: for non-leaf nodes in the cgroup hierarchy, the
++ * associated scheduler queue, %NULL on leaf nodes.
++ * @sched_data: the scheduler queue this entity belongs to.
++ * @ioprio: the ioprio in use.
++ * @new_weight: when a weight change is requested, the new weight value.
++ * @orig_weight: original weight, used to implement weight boosting
++ * @prio_changed: flag, true when the user requested a weight, ioprio or
++ * ioprio_class change.
++ *
++ * A bfq_entity is used to represent either a bfq_queue (leaf node in the
++ * cgroup hierarchy) or a bfq_group into the upper level scheduler. Each
++ * entity belongs to the sched_data of the parent group in the cgroup
++ * hierarchy. Non-leaf entities also have their own sched_data, stored
++ * in @my_sched_data.
++ *
++ * Each entity stores independently its priority values; this would
++ * allow different weights on different devices, but this
++ * functionality is not exported to userspace for now. Priorities and
++ * weights are updated lazily, first storing the new values into the
++ * new_* fields, then setting the @prio_changed flag. As soon as
++ * there is a transition in the entity state that allows the priority
++ * update to take place the effective and the requested priority
++ * values are synchronized.
++ *
++ * Unless cgroups are used, the weight value is calculated from the
++ * ioprio to export the same interface as CFQ. When dealing with
++ * ``well-behaved'' queues (i.e., queues that do not spend too much
++ * time to consume their budget and have true sequential behavior, and
++ * when there are no external factors breaking anticipation) the
++ * relative weights at each level of the cgroups hierarchy should be
++ * guaranteed. All the fields are protected by the queue lock of the
++ * containing bfqd.
++ */
++struct bfq_entity {
++ struct rb_node rb_node;
++ struct bfq_weight_counter *weight_counter;
++
++ int on_st;
++
++ u64 finish;
++ u64 start;
++
++ struct rb_root *tree;
++
++ u64 min_start;
++
++ int service, budget;
++ unsigned short weight, new_weight;
++ unsigned short orig_weight;
++
++ struct bfq_entity *parent;
++
++ struct bfq_sched_data *my_sched_data;
++ struct bfq_sched_data *sched_data;
++
++ int prio_changed;
++};
++
++struct bfq_group;
++
++/**
++ * struct bfq_queue - leaf schedulable entity.
++ * @ref: reference counter.
++ * @bfqd: parent bfq_data.
++ * @new_ioprio: when an ioprio change is requested, the new ioprio value.
++ * @ioprio_class: the ioprio_class in use.
++ * @new_ioprio_class: when an ioprio_class change is requested, the new
++ * ioprio_class value.
++ * @new_bfqq: shared bfq_queue if queue is cooperating with
++ * one or more other queues.
++ * @sort_list: sorted list of pending requests.
++ * @next_rq: if fifo isn't expired, next request to serve.
++ * @queued: nr of requests queued in @sort_list.
++ * @allocated: currently allocated requests.
++ * @meta_pending: pending metadata requests.
++ * @fifo: fifo list of requests in sort_list.
++ * @entity: entity representing this queue in the scheduler.
++ * @max_budget: maximum budget allowed from the feedback mechanism.
++ * @budget_timeout: budget expiration (in jiffies).
++ * @dispatched: number of requests on the dispatch list or inside driver.
++ * @flags: status flags.
++ * @bfqq_list: node for active/idle bfqq list inside our bfqd.
++ * @burst_list_node: node for the device's burst list.
++ * @seek_samples: number of seeks sampled
++ * @seek_total: sum of the distances of the seeks sampled
++ * @seek_mean: mean seek distance
++ * @last_request_pos: position of the last request enqueued
++ * @requests_within_timer: number of consecutive pairs of request completion
++ * and arrival, such that the queue becomes idle
++ * after the completion, but the next request arrives
++ * within an idle time slice; used only if the queue's
++ * IO_bound has been cleared.
++ * @pid: pid of the process owning the queue, used for logging purposes.
++ * @last_wr_start_finish: start time of the current weight-raising period if
++ * the @bfq_queue is being weight-raised, otherwise
++ * finish time of the last weight-raising period
++ * @wr_cur_max_time: current max raising time for this queue
++ * @soft_rt_next_start: minimum time instant such that, only if a new
++ * request is enqueued after this time instant in an
++ * idle @bfq_queue with no outstanding requests, then
++ * the task associated with the queue is deemed as
++ * soft real-time (see the comments to the function
++ * bfq_bfqq_softrt_next_start())
++ * @last_idle_bklogged: time of the last transition of the @bfq_queue from
++ * idle to backlogged
++ * @service_from_backlogged: cumulative service received from the @bfq_queue
++ * since the last transition from idle to
++ * backlogged
++ * @bic: pointer to the bfq_io_cq owning the bfq_queue, set to %NULL if the
++ * queue is shared
++ *
++ * A bfq_queue is a leaf request queue; it can be associated with an
++ * io_context or more, if it is async or shared between cooperating
++ * processes. @cgroup holds a reference to the cgroup, to be sure that it
++ * does not disappear while a bfqq still references it (mostly to avoid
++ * races between request issuing and task migration followed by cgroup
++ * destruction).
++ * All the fields are protected by the queue lock of the containing bfqd.
++ */
++struct bfq_queue {
++ atomic_t ref;
++ struct bfq_data *bfqd;
++
++ unsigned short ioprio, new_ioprio;
++ unsigned short ioprio_class, new_ioprio_class;
++
++ /* fields for cooperating queues handling */
++ struct bfq_queue *new_bfqq;
++ struct rb_node pos_node;
++ struct rb_root *pos_root;
++
++ struct rb_root sort_list;
++ struct request *next_rq;
++ int queued[2];
++ int allocated[2];
++ int meta_pending;
++ struct list_head fifo;
++
++ struct bfq_entity entity;
++
++ int max_budget;
++ unsigned long budget_timeout;
++
++ int dispatched;
++
++ unsigned int flags;
++
++ struct list_head bfqq_list;
++
++ struct hlist_node burst_list_node;
++
++ unsigned int seek_samples;
++ u64 seek_total;
++ sector_t seek_mean;
++ sector_t last_request_pos;
++
++ unsigned int requests_within_timer;
++
++ pid_t pid;
++ struct bfq_io_cq *bic;
++
++ /* weight-raising fields */
++ unsigned long wr_cur_max_time;
++ unsigned long soft_rt_next_start;
++ unsigned long last_wr_start_finish;
++ unsigned int wr_coeff;
++ unsigned long last_idle_bklogged;
++ unsigned long service_from_backlogged;
++};
++
++/**
++ * struct bfq_ttime - per process thinktime stats.
++ * @ttime_total: total process thinktime
++ * @ttime_samples: number of thinktime samples
++ * @ttime_mean: average process thinktime
++ */
++struct bfq_ttime {
++ unsigned long last_end_request;
++
++ unsigned long ttime_total;
++ unsigned long ttime_samples;
++ unsigned long ttime_mean;
++};
++
++/**
++ * struct bfq_io_cq - per (request_queue, io_context) structure.
++ * @icq: associated io_cq structure
++ * @bfqq: array of two process queues, the sync and the async
++ * @ttime: associated @bfq_ttime struct
++ * @ioprio: per (request_queue, blkcg) ioprio.
++ * @blkcg_id: id of the blkcg the related io_cq belongs to.
++ */
++struct bfq_io_cq {
++ struct io_cq icq; /* must be the first member */
++ struct bfq_queue *bfqq[2];
++ struct bfq_ttime ttime;
++ int ioprio;
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ uint64_t blkcg_id; /* the current blkcg ID */
++#endif
++};
++
++enum bfq_device_speed {
++ BFQ_BFQD_FAST,
++ BFQ_BFQD_SLOW,
++};
++
++/**
++ * struct bfq_data - per device data structure.
++ * @queue: request queue for the managed device.
++ * @root_group: root bfq_group for the device.
++ * @active_numerous_groups: number of bfq_groups containing more than one
++ * active @bfq_entity.
++ * @queue_weights_tree: rbtree of weight counters of @bfq_queues, sorted by
++ * weight. Used to keep track of whether all @bfq_queues
++ * have the same weight. The tree contains one counter
++ * for each distinct weight associated to some active
++ * and not weight-raised @bfq_queue (see the comments to
++ * the functions bfq_weights_tree_[add|remove] for
++ * further details).
++ * @group_weights_tree: rbtree of non-queue @bfq_entity weight counters, sorted
++ * by weight. Used to keep track of whether all
++ * @bfq_groups have the same weight. The tree contains
++ * one counter for each distinct weight associated to
++ * some active @bfq_group (see the comments to the
++ * functions bfq_weights_tree_[add|remove] for further
++ * details).
++ * @busy_queues: number of bfq_queues containing requests (including the
++ * queue in service, even if it is idling).
++ * @busy_in_flight_queues: number of @bfq_queues containing pending or
++ * in-flight requests, plus the @bfq_queue in
++ * service, even if idle but waiting for the
++ * possible arrival of its next sync request. This
++ * field is updated only if the device is rotational,
++ * but used only if the device is also NCQ-capable.
++ * The reason why the field is updated also for non-
++ * NCQ-capable rotational devices is related to the
++ * fact that the value of @hw_tag may be set also
++ * later than when busy_in_flight_queues may need to
++ * be incremented for the first time(s). Taking also
++ * this possibility into account, to avoid unbalanced
++ * increments/decrements, would imply more overhead
++ * than just updating busy_in_flight_queues
++ * regardless of the value of @hw_tag.
++ * @const_seeky_busy_in_flight_queues: number of constantly-seeky @bfq_queues
++ * (that is, seeky queues that expired
++ * for budget timeout at least once)
++ * containing pending or in-flight
++ * requests, including the in-service
++ * @bfq_queue if constantly seeky. This
++ * field is updated only if the device
++ * is rotational, but used only if the
++ * device is also NCQ-capable (see the
++ * comments to @busy_in_flight_queues).
++ * @wr_busy_queues: number of weight-raised busy @bfq_queues.
++ * @queued: number of queued requests.
++ * @rq_in_driver: number of requests dispatched and waiting for completion.
++ * @sync_flight: number of sync requests in the driver.
++ * @max_rq_in_driver: max number of reqs in driver in the last
++ * @hw_tag_samples completed requests.
++ * @hw_tag_samples: nr of samples used to calculate hw_tag.
++ * @hw_tag: flag set to one if the driver is showing a queueing behavior.
++ * @budgets_assigned: number of budgets assigned.
++ * @idle_slice_timer: timer set when idling for the next sequential request
++ * from the queue in service.
++ * @unplug_work: delayed work to restart dispatching on the request queue.
++ * @in_service_queue: bfq_queue in service.
++ * @in_service_bic: bfq_io_cq (bic) associated with the @in_service_queue.
++ * @last_position: on-disk position of the last served request.
++ * @last_budget_start: beginning of the last budget.
++ * @last_idling_start: beginning of the last idle slice.
++ * @peak_rate: peak transfer rate observed for a budget.
++ * @peak_rate_samples: number of samples used to calculate @peak_rate.
++ * @bfq_max_budget: maximum budget allotted to a bfq_queue before
++ * rescheduling.
++ * @group_list: list of all the bfq_groups active on the device.
++ * @active_list: list of all the bfq_queues active on the device.
++ * @idle_list: list of all the bfq_queues idle on the device.
++ * @bfq_fifo_expire: timeout for async/sync requests; when it expires
++ * requests are served in fifo order.
++ * @bfq_back_penalty: weight of backward seeks wrt forward ones.
++ * @bfq_back_max: maximum allowed backward seek.
++ * @bfq_slice_idle: maximum idling time.
++ * @bfq_user_max_budget: user-configured max budget value
++ * (0 for auto-tuning).
++ * @bfq_max_budget_async_rq: maximum budget (in nr of requests) allotted to
++ * async queues.
++ * @bfq_timeout: timeout for bfq_queues to consume their budget; used to
++ * prevent seeky queues from imposing long latencies on well
++ * behaved ones (this also implies that seeky queues cannot
++ * receive guarantees in the service domain; after a timeout
++ * they are charged for the whole allocated budget, to try
++ * to preserve a behavior reasonably fair among them, but
++ * without service-domain guarantees).
++ * @bfq_coop_thresh: number of queue merges after which a @bfq_queue is
++ * no longer granted any weight-raising.
++ * @bfq_failed_cooperations: number of consecutive failed cooperation
++ * chances after which weight-raising is restored
++ * to a queue subject to more than bfq_coop_thresh
++ * queue merges.
++ * @bfq_requests_within_timer: number of consecutive requests that must be
++ * issued within the idle time slice to set
++ * again idling to a queue which was marked as
++ * non-I/O-bound (see the definition of the
++ * IO_bound flag for further details).
++ * @last_ins_in_burst: last time at which a queue entered the current
++ * burst of queues being activated shortly after
++ * each other; for more details about this and the
++ * following parameters related to a burst of
++ * activations, see the comments to the function
++ * @bfq_handle_burst.
++ * @bfq_burst_interval: reference time interval used to decide whether a
++ * queue has been activated shortly after
++ * @last_ins_in_burst.
++ * @burst_size: number of queues in the current burst of queue activations.
++ * @bfq_large_burst_thresh: maximum burst size above which the current
++ * queue-activation burst is deemed as 'large'.
++ * @large_burst: true if a large queue-activation burst is in progress.
++ * @burst_list: head of the burst list (as for the above fields, more details
++ * in the comments to the function bfq_handle_burst).
++ * @low_latency: if set to true, low-latency heuristics are enabled.
++ * @bfq_wr_coeff: maximum factor by which the weight of a weight-raised
++ * queue is multiplied.
++ * @bfq_wr_max_time: maximum duration of a weight-raising period (jiffies).
++ * @bfq_wr_rt_max_time: maximum duration for soft real-time processes.
++ * @bfq_wr_min_idle_time: minimum idle period after which weight-raising
++ * may be reactivated for a queue (in jiffies).
++ * @bfq_wr_min_inter_arr_async: minimum period between request arrivals
++ * after which weight-raising may be
++ * reactivated for an already busy queue
++ * (in jiffies).
++ * @bfq_wr_max_softrt_rate: max service-rate for a soft real-time queue,
++ * in sectors per second.
++ * @RT_prod: cached value of the product R*T used for computing the maximum
++ * duration of the weight raising automatically.
++ * @device_speed: device-speed class for the low-latency heuristic.
++ * @oom_bfqq: fallback dummy bfqq for extreme OOM conditions.
++ *
++ * All the fields are protected by the @queue lock.
++ */
++struct bfq_data {
++ struct request_queue *queue;
++
++ struct bfq_group *root_group;
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++ int active_numerous_groups;
++#endif
++
++ struct rb_root queue_weights_tree;
++ struct rb_root group_weights_tree;
++
++ int busy_queues;
++ int busy_in_flight_queues;
++ int const_seeky_busy_in_flight_queues;
++ int wr_busy_queues;
++ int queued;
++ int rq_in_driver;
++ int sync_flight;
++
++ int max_rq_in_driver;
++ int hw_tag_samples;
++ int hw_tag;
++
++ int budgets_assigned;
++
++ struct timer_list idle_slice_timer;
++ struct work_struct unplug_work;
++
++ struct bfq_queue *in_service_queue;
++ struct bfq_io_cq *in_service_bic;
++
++ sector_t last_position;
++
++ ktime_t last_budget_start;
++ ktime_t last_idling_start;
++ int peak_rate_samples;
++ u64 peak_rate;
++ int bfq_max_budget;
++
++ struct hlist_head group_list;
++ struct list_head active_list;
++ struct list_head idle_list;
++
++ unsigned int bfq_fifo_expire[2];
++ unsigned int bfq_back_penalty;
++ unsigned int bfq_back_max;
++ unsigned int bfq_slice_idle;
++ u64 bfq_class_idle_last_service;
++
++ int bfq_user_max_budget;
++ int bfq_max_budget_async_rq;
++ unsigned int bfq_timeout[2];
++
++ unsigned int bfq_coop_thresh;
++ unsigned int bfq_failed_cooperations;
++ unsigned int bfq_requests_within_timer;
++
++ unsigned long last_ins_in_burst;
++ unsigned long bfq_burst_interval;
++ int burst_size;
++ unsigned long bfq_large_burst_thresh;
++ bool large_burst;
++ struct hlist_head burst_list;
++
++ bool low_latency;
++
++ /* parameters of the low_latency heuristics */
++ unsigned int bfq_wr_coeff;
++ unsigned int bfq_wr_max_time;
++ unsigned int bfq_wr_rt_max_time;
++ unsigned int bfq_wr_min_idle_time;
++ unsigned long bfq_wr_min_inter_arr_async;
++ unsigned int bfq_wr_max_softrt_rate;
++ u64 RT_prod;
++ enum bfq_device_speed device_speed;
++
++ struct bfq_queue oom_bfqq;
++};
++
++enum bfqq_state_flags {
++ BFQ_BFQQ_FLAG_busy = 0, /* has requests or is in service */
++ BFQ_BFQQ_FLAG_wait_request, /* waiting for a request */
++ BFQ_BFQQ_FLAG_must_alloc, /* must be allowed rq alloc */
++ BFQ_BFQQ_FLAG_fifo_expire, /* FIFO checked in this slice */
++ BFQ_BFQQ_FLAG_idle_window, /* slice idling enabled */
++ BFQ_BFQQ_FLAG_sync, /* synchronous queue */
++ BFQ_BFQQ_FLAG_budget_new, /* no completion with this budget */
++ BFQ_BFQQ_FLAG_IO_bound, /*
++ * bfqq has timed-out at least once
++ * having consumed at most 2/10 of
++ * its budget
++ */
++ BFQ_BFQQ_FLAG_in_large_burst, /*
++ * bfqq activated in a large burst,
++ * see comments to bfq_handle_burst.
++ */
++ BFQ_BFQQ_FLAG_constantly_seeky, /*
++ * bfqq has proved to be slow and
++ * seeky until budget timeout
++ */
++ BFQ_BFQQ_FLAG_softrt_update, /*
++ * may need softrt-next-start
++ * update
++ */
++};
++
++#define BFQ_BFQQ_FNS(name) \
++static void bfq_mark_bfqq_##name(struct bfq_queue *bfqq) \
++{ \
++ (bfqq)->flags |= (1 << BFQ_BFQQ_FLAG_##name); \
++} \
++static void bfq_clear_bfqq_##name(struct bfq_queue *bfqq) \
++{ \
++ (bfqq)->flags &= ~(1 << BFQ_BFQQ_FLAG_##name); \
++} \
++static int bfq_bfqq_##name(const struct bfq_queue *bfqq) \
++{ \
++ return ((bfqq)->flags & (1 << BFQ_BFQQ_FLAG_##name)) != 0; \
++}
++
++BFQ_BFQQ_FNS(busy);
++BFQ_BFQQ_FNS(wait_request);
++BFQ_BFQQ_FNS(must_alloc);
++BFQ_BFQQ_FNS(fifo_expire);
++BFQ_BFQQ_FNS(idle_window);
++BFQ_BFQQ_FNS(sync);
++BFQ_BFQQ_FNS(budget_new);
++BFQ_BFQQ_FNS(IO_bound);
++BFQ_BFQQ_FNS(in_large_burst);
++BFQ_BFQQ_FNS(constantly_seeky);
++BFQ_BFQQ_FNS(softrt_update);
++#undef BFQ_BFQQ_FNS
++
++/* Logging facilities. */
++#define bfq_log_bfqq(bfqd, bfqq, fmt, args...) \
++ blk_add_trace_msg((bfqd)->queue, "bfq%d " fmt, (bfqq)->pid, ##args)
++
++#define bfq_log(bfqd, fmt, args...) \
++ blk_add_trace_msg((bfqd)->queue, "bfq " fmt, ##args)
++
++/* Expiration reasons. */
++enum bfqq_expiration {
++ BFQ_BFQQ_TOO_IDLE = 0, /*
++ * queue has been idling for
++ * too long
++ */
++ BFQ_BFQQ_BUDGET_TIMEOUT, /* budget took too long to be used */
++ BFQ_BFQQ_BUDGET_EXHAUSTED, /* budget consumed */
++ BFQ_BFQQ_NO_MORE_REQUESTS, /* the queue has no more requests */
++};
++
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++
++struct bfqg_stats {
++ /* total bytes transferred */
++ struct blkg_rwstat service_bytes;
++ /* total IOs serviced, post merge */
++ struct blkg_rwstat serviced;
++ /* number of ios merged */
++ struct blkg_rwstat merged;
++ /* total time spent on device in ns, may not be accurate w/ queueing */
++ struct blkg_rwstat service_time;
++ /* total time spent waiting in scheduler queue in ns */
++ struct blkg_rwstat wait_time;
++ /* number of IOs queued up */
++ struct blkg_rwstat queued;
++ /* total sectors transferred */
++ struct blkg_stat sectors;
++ /* total disk time and nr sectors dispatched by this group */
++ struct blkg_stat time;
++ /* time not charged to this cgroup */
++ struct blkg_stat unaccounted_time;
++ /* sum of number of ios queued across all samples */
++ struct blkg_stat avg_queue_size_sum;
++ /* count of samples taken for average */
++ struct blkg_stat avg_queue_size_samples;
++ /* how many times this group has been removed from service tree */
++ struct blkg_stat dequeue;
++ /* total time spent waiting for it to be assigned a timeslice. */
++ struct blkg_stat group_wait_time;
++ /* time spent idling for this blkcg_gq */
++ struct blkg_stat idle_time;
++ /* total time with empty current active q with other requests queued */
++ struct blkg_stat empty_time;
++ /* fields after this shouldn't be cleared on stat reset */
++ uint64_t start_group_wait_time;
++ uint64_t start_idle_time;
++ uint64_t start_empty_time;
++ uint16_t flags;
++};
++
++/*
++ * struct bfq_group_data - per-blkcg storage for the blkio subsystem.
++ *
++ * @pd: @blkcg_policy_data that this structure inherits
++ * @weight: weight of the bfq_group
++ */
++struct bfq_group_data {
++ /* must be the first member */
++ struct blkcg_policy_data pd;
++
++ unsigned short weight;
++};
++
++/**
++ * struct bfq_group - per (device, cgroup) data structure.
++ * @entity: schedulable entity to insert into the parent group sched_data.
++ * @sched_data: own sched_data, to contain child entities (they may be
++ * both bfq_queues and bfq_groups).
++ * @bfqd_node: node to be inserted into the @bfqd->group_list list
++ * of the groups active on the same device; used for cleanup.
++ * @bfqd: the bfq_data for the device this group acts upon.
++ * @async_bfqq: array of async queues for all the tasks belonging to
++ * the group, one queue per ioprio value per ioprio_class,
++ * except for the idle class that has only one queue.
++ * @async_idle_bfqq: async queue for the idle class (ioprio is ignored).
++ * @my_entity: pointer to @entity, %NULL for the toplevel group; used
++ * to avoid too many special cases during group creation/
++ * migration.
++ * @active_entities: number of active entities belonging to the group;
++ * unused for the root group. Used to know whether there
++ * are groups with more than one active @bfq_entity
++ * (see the comments to the function
++ * bfq_bfqq_must_not_expire()).
++ *
++ * Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
++ * there is a set of bfq_groups, each one collecting the lower-level
++ * entities belonging to the group that are acting on the same device.
++ *
++ * Locking works as follows:
++ * o @bfqd is protected by the queue lock, RCU is used to access it
++ * from the readers.
++ * o All the other fields are protected by the @bfqd queue lock.
++ */
++struct bfq_group {
++ /* must be the first member */
++ struct blkg_policy_data pd;
++
++ struct bfq_entity entity;
++ struct bfq_sched_data sched_data;
++
++ struct hlist_node bfqd_node;
++
++ void *bfqd;
++
++ struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
++ struct bfq_queue *async_idle_bfqq;
++
++ struct bfq_entity *my_entity;
++
++ int active_entities;
++
++ struct bfqg_stats stats;
++ struct bfqg_stats dead_stats; /* stats pushed from dead children */
++};
++
++#else
++struct bfq_group {
++ struct bfq_sched_data sched_data;
++
++ struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
++ struct bfq_queue *async_idle_bfqq;
++};
++#endif
++
++static struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity);
++
++static struct bfq_service_tree *
++bfq_entity_service_tree(struct bfq_entity *entity)
++{
++ struct bfq_sched_data *sched_data = entity->sched_data;
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ unsigned int idx = bfqq ? bfqq->ioprio_class - 1 :
++ BFQ_DEFAULT_GRP_CLASS;
++
++ BUG_ON(idx >= BFQ_IOPRIO_CLASSES);
++ BUG_ON(sched_data == NULL);
++
++ return sched_data->service_tree + idx;
++}
++
++static struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync)
++{
++ return bic->bfqq[is_sync];
++}
++
++static void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq,
++ bool is_sync)
++{
++ bic->bfqq[is_sync] = bfqq;
++}
++
++static struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic)
++{
++ return bic->icq.q->elevator->elevator_data;
++}
++
++/**
++ * bfq_get_bfqd_locked - get a lock to a bfqd using an RCU-protected pointer.
++ * @ptr: a pointer to a bfqd.
++ * @flags: storage for the flags to be saved.
++ *
++ * This function allows bfqg->bfqd to be protected by the
++ * queue lock of the bfqd they reference; the pointer is dereferenced
++ * under RCU, so the storage for bfqd is assured to be safe as long
++ * as the RCU read side critical section does not end. After the
++ * bfqd->queue->queue_lock is taken the pointer is rechecked, to be
++ * sure that no other writer accessed it. If we raced with a writer,
++ * the function returns NULL, with the queue unlocked, otherwise it
++ * returns the dereferenced pointer, with the queue locked.
++ */
++static struct bfq_data *bfq_get_bfqd_locked(void **ptr, unsigned long *flags)
++{
++ struct bfq_data *bfqd;
++
++ rcu_read_lock();
++ bfqd = rcu_dereference(*(struct bfq_data **)ptr);
++
++ if (bfqd != NULL) {
++ spin_lock_irqsave(bfqd->queue->queue_lock, *flags);
++ if (ptr == NULL)
++ printk(KERN_CRIT "get_bfqd_locked pointer NULL\n");
++ else if (*ptr == bfqd)
++ goto out;
++ spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
++ }
++
++ bfqd = NULL;
++out:
++ rcu_read_unlock();
++ return bfqd;
++}
++
++static void bfq_put_bfqd_unlock(struct bfq_data *bfqd, unsigned long *flags)
++{
++ spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
++}
++
++static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio);
++static void bfq_put_queue(struct bfq_queue *bfqq);
++static void bfq_dispatch_insert(struct request_queue *q, struct request *rq);
++static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
++ struct bio *bio, int is_sync,
++ struct bfq_io_cq *bic, gfp_t gfp_mask);
++static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
++ struct bfq_group *bfqg);
++static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
++static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
++
++#endif /* _BFQ_H */
+--
+2.1.4
+
diff --git a/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch b/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch
new file mode 100644
index 0000000..dac6db6
--- /dev/null
+++ b/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch
@@ -0,0 +1,1097 @@
+From 75c9c5ea340776c0a9e934581cf63cb963a33fd4 Mon Sep 17 00:00:00 2001
+From: Mauro Andreolini <mauro.andreolini@unimore.it>
+Date: Sun, 6 Sep 2015 16:09:05 +0200
+Subject: [PATCH 3/3] block, bfq: add Early Queue Merge (EQM) to BFQ-v7r9 for
+ 4.2.0
+
+A set of processes may happen to perform interleaved reads, i.e., requests
+whose union would give rise to a sequential read pattern. There are two
+typical cases: in the first case, processes read fixed-size chunks of
+data at a fixed distance from each other, while in the second case processes
+may read variable-size chunks at variable distances. The latter case occurs
+for example with QEMU, which splits the I/O generated by the guest into
+multiple chunks, and lets these chunks be served by a pool of cooperating
+processes, iteratively assigning the next chunk of I/O to the first
+available process. CFQ uses actual queue merging for the first type of
+processes, whereas it uses preemption to get a sequential read pattern out
+of the read requests performed by the second type of processes. In the end
+it uses two different mechanisms to achieve the same goal: boosting the
+throughput with interleaved I/O.
+
+This patch introduces Early Queue Merge (EQM), a unified mechanism to get a
+sequential read pattern with both types of processes. The main idea is
+checking newly arrived requests against the next request of the active queue
+both in case of actual request insert and in case of request merge. By doing
+so, both the types of processes can be handled by just merging their queues.
+EQM is then simpler and more compact than the pair of mechanisms used in
+CFQ.
+
+Finally, EQM also preserves the typical low-latency properties of BFQ, by
+properly restoring the weight-raising state of a queue when it gets back to
+a non-merged state.
+
+Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
+Signed-off-by: Arianna Avanzini <avanzini@google.com>
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
+---
+ block/bfq-cgroup.c | 4 +
+ block/bfq-iosched.c | 684 ++++++++++++++++++++++++++++++++++++++++++++++++++--
+ block/bfq.h | 66 +++++
+ 3 files changed, 740 insertions(+), 14 deletions(-)
+
+diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
+index c02d65a..bc34d7a 100644
+--- a/block/bfq-cgroup.c
++++ b/block/bfq-cgroup.c
+@@ -382,6 +382,7 @@ static void bfq_pd_init(struct blkcg_gq *blkg)
+ */
+ bfqg->bfqd = bfqd;
+ bfqg->active_entities = 0;
++ bfqg->rq_pos_tree = RB_ROOT;
+
+ /* if the root_group does not exist, we are handling it right now */
+ if (bfqd->root_group && bfqg != bfqd->root_group)
+@@ -484,6 +485,8 @@ static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
+ return bfqg;
+ }
+
++static void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
++
+ /**
+ * bfq_bfqq_move - migrate @bfqq to @bfqg.
+ * @bfqd: queue descriptor.
+@@ -531,6 +534,7 @@ static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ bfqg_get(bfqg);
+
+ if (busy) {
++ bfq_pos_tree_add_move(bfqd, bfqq);
+ if (resume)
+ bfq_activate_bfqq(bfqd, bfqq);
+ }
+diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
+index 51d24dd..fcd6eea 100644
+--- a/block/bfq-iosched.c
++++ b/block/bfq-iosched.c
+@@ -296,6 +296,72 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd,
+ }
+ }
+
++static struct bfq_queue *
++bfq_rq_pos_tree_lookup(struct bfq_data *bfqd, struct rb_root *root,
++ sector_t sector, struct rb_node **ret_parent,
++ struct rb_node ***rb_link)
++{
++ struct rb_node **p, *parent;
++ struct bfq_queue *bfqq = NULL;
++
++ parent = NULL;
++ p = &root->rb_node;
++ while (*p) {
++ struct rb_node **n;
++
++ parent = *p;
++ bfqq = rb_entry(parent, struct bfq_queue, pos_node);
++
++ /*
++ * Sort strictly based on sector. Smallest to the left,
++ * largest to the right.
++ */
++ if (sector > blk_rq_pos(bfqq->next_rq))
++ n = &(*p)->rb_right;
++ else if (sector < blk_rq_pos(bfqq->next_rq))
++ n = &(*p)->rb_left;
++ else
++ break;
++ p = n;
++ bfqq = NULL;
++ }
++
++ *ret_parent = parent;
++ if (rb_link)
++ *rb_link = p;
++
++ bfq_log(bfqd, "rq_pos_tree_lookup %llu: returning %d",
++ (long long unsigned)sector,
++ bfqq ? bfqq->pid : 0);
++
++ return bfqq;
++}
++
++static void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ struct rb_node **p, *parent;
++ struct bfq_queue *__bfqq;
++
++ if (bfqq->pos_root) {
++ rb_erase(&bfqq->pos_node, bfqq->pos_root);
++ bfqq->pos_root = NULL;
++ }
++
++ if (bfq_class_idle(bfqq))
++ return;
++ if (!bfqq->next_rq)
++ return;
++
++ bfqq->pos_root = &bfq_bfqq_to_bfqg(bfqq)->rq_pos_tree;
++ __bfqq = bfq_rq_pos_tree_lookup(bfqd, bfqq->pos_root,
++ blk_rq_pos(bfqq->next_rq), &parent, &p);
++ if (!__bfqq) {
++ rb_link_node(&bfqq->pos_node, parent, p);
++ rb_insert_color(&bfqq->pos_node, bfqq->pos_root);
++ } else
++ bfqq->pos_root = NULL;
++}
++
+ /*
+ * Tell whether there are active queues or groups with differentiated weights.
+ */
+@@ -528,6 +594,57 @@ static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
+ return dur;
+ }
+
++static unsigned bfq_bfqq_cooperations(struct bfq_queue *bfqq)
++{
++ return bfqq->bic ? bfqq->bic->cooperations : 0;
++}
++
++static void
++bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
++{
++ if (bic->saved_idle_window)
++ bfq_mark_bfqq_idle_window(bfqq);
++ else
++ bfq_clear_bfqq_idle_window(bfqq);
++ if (bic->saved_IO_bound)
++ bfq_mark_bfqq_IO_bound(bfqq);
++ else
++ bfq_clear_bfqq_IO_bound(bfqq);
++ /* Assuming that the flag in_large_burst is already correctly set */
++ if (bic->wr_time_left && bfqq->bfqd->low_latency &&
++ !bfq_bfqq_in_large_burst(bfqq) &&
++ bic->cooperations < bfqq->bfqd->bfq_coop_thresh) {
++ /*
++ * Start a weight raising period with the duration given by
++ * the wr_time_left snapshot.
++ */
++ if (bfq_bfqq_busy(bfqq))
++ bfqq->bfqd->wr_busy_queues++;
++ bfqq->wr_coeff = bfqq->bfqd->bfq_wr_coeff;
++ bfqq->wr_cur_max_time = bic->wr_time_left;
++ bfqq->last_wr_start_finish = jiffies;
++ bfqq->entity.prio_changed = 1;
++ }
++ /*
++ * Clear wr_time_left to prevent bfq_bfqq_save_state() from
++ * getting confused about the queue's need of a weight-raising
++ * period.
++ */
++ bic->wr_time_left = 0;
++}
++
++static int bfqq_process_refs(struct bfq_queue *bfqq)
++{
++ int process_refs, io_refs;
++
++ lockdep_assert_held(bfqq->bfqd->queue->queue_lock);
++
++ io_refs = bfqq->allocated[READ] + bfqq->allocated[WRITE];
++ process_refs = atomic_read(&bfqq->ref) - io_refs - bfqq->entity.on_st;
++ BUG_ON(process_refs < 0);
++ return process_refs;
++}
++
+ /* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
+ static void bfq_reset_burst_list(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ {
+@@ -764,8 +881,14 @@ static void bfq_add_request(struct request *rq)
+ BUG_ON(!next_rq);
+ bfqq->next_rq = next_rq;
+
++ /*
++ * Adjust priority tree position, if next_rq changes.
++ */
++ if (prev != bfqq->next_rq)
++ bfq_pos_tree_add_move(bfqd, bfqq);
++
+ if (!bfq_bfqq_busy(bfqq)) {
+- bool soft_rt, in_burst,
++ bool soft_rt, coop_or_in_burst,
+ idle_for_long_time = time_is_before_jiffies(
+ bfqq->budget_timeout +
+ bfqd->bfq_wr_min_idle_time);
+@@ -793,11 +916,12 @@ static void bfq_add_request(struct request *rq)
+ bfqd->last_ins_in_burst = jiffies;
+ }
+
+- in_burst = bfq_bfqq_in_large_burst(bfqq);
++ coop_or_in_burst = bfq_bfqq_in_large_burst(bfqq) ||
++ bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh;
+ soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
+- !in_burst &&
++ !coop_or_in_burst &&
+ time_is_before_jiffies(bfqq->soft_rt_next_start);
+- interactive = !in_burst && idle_for_long_time;
++ interactive = !coop_or_in_burst && idle_for_long_time;
+ entity->budget = max_t(unsigned long, bfqq->max_budget,
+ bfq_serv_to_charge(next_rq, bfqq));
+
+@@ -816,6 +940,9 @@ static void bfq_add_request(struct request *rq)
+ if (!bfqd->low_latency)
+ goto add_bfqq_busy;
+
++ if (bfq_bfqq_just_split(bfqq))
++ goto set_prio_changed;
++
+ /*
+ * If the queue:
+ * - is not being boosted,
+@@ -840,7 +967,7 @@ static void bfq_add_request(struct request *rq)
+ } else if (old_wr_coeff > 1) {
+ if (interactive)
+ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
+- else if (in_burst ||
++ else if (coop_or_in_burst ||
+ (bfqq->wr_cur_max_time ==
+ bfqd->bfq_wr_rt_max_time &&
+ !soft_rt)) {
+@@ -905,6 +1032,7 @@ static void bfq_add_request(struct request *rq)
+ bfqd->bfq_wr_rt_max_time;
+ }
+ }
++set_prio_changed:
+ if (old_wr_coeff != bfqq->wr_coeff)
+ entity->prio_changed = 1;
+ add_bfqq_busy:
+@@ -1047,6 +1175,15 @@ static void bfq_merged_request(struct request_queue *q, struct request *req,
+ bfqd->last_position);
+ BUG_ON(!next_rq);
+ bfqq->next_rq = next_rq;
++ /*
++ * If next_rq changes, update both the queue's budget to
++ * fit the new request and the queue's position in its
++ * rq_pos_tree.
++ */
++ if (prev != bfqq->next_rq) {
++ bfq_updated_next_req(bfqd, bfqq);
++ bfq_pos_tree_add_move(bfqd, bfqq);
++ }
+ }
+ }
+
+@@ -1129,11 +1266,343 @@ static void bfq_end_wr(struct bfq_data *bfqd)
+ spin_unlock_irq(bfqd->queue->queue_lock);
+ }
+
++static sector_t bfq_io_struct_pos(void *io_struct, bool request)
++{
++ if (request)
++ return blk_rq_pos(io_struct);
++ else
++ return ((struct bio *)io_struct)->bi_iter.bi_sector;
++}
++
++static int bfq_rq_close_to_sector(void *io_struct, bool request,
++ sector_t sector)
++{
++ return abs64(bfq_io_struct_pos(io_struct, request) - sector) <=
++ BFQQ_SEEK_THR;
++}
++
++static struct bfq_queue *bfqq_find_close(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ sector_t sector)
++{
++ struct rb_root *root = &bfq_bfqq_to_bfqg(bfqq)->rq_pos_tree;
++ struct rb_node *parent, *node;
++ struct bfq_queue *__bfqq;
++
++ if (RB_EMPTY_ROOT(root))
++ return NULL;
++
++ /*
++ * First, if we find a request starting at the end of the last
++ * request, choose it.
++ */
++ __bfqq = bfq_rq_pos_tree_lookup(bfqd, root, sector, &parent, NULL);
++ if (__bfqq)
++ return __bfqq;
++
++ /*
++ * If the exact sector wasn't found, the parent of the NULL leaf
++ * will contain the closest sector (rq_pos_tree sorted by
++ * next_request position).
++ */
++ __bfqq = rb_entry(parent, struct bfq_queue, pos_node);
++ if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
++ return __bfqq;
++
++ if (blk_rq_pos(__bfqq->next_rq) < sector)
++ node = rb_next(&__bfqq->pos_node);
++ else
++ node = rb_prev(&__bfqq->pos_node);
++ if (!node)
++ return NULL;
++
++ __bfqq = rb_entry(node, struct bfq_queue, pos_node);
++ if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
++ return __bfqq;
++
++ return NULL;
++}
++
++static struct bfq_queue *bfq_find_close_cooperator(struct bfq_data *bfqd,
++ struct bfq_queue *cur_bfqq,
++ sector_t sector)
++{
++ struct bfq_queue *bfqq;
++
++ /*
++ * We should notice if some of the queues are cooperating, e.g.
++ * working closely on the same area of the disk. In that case,
++ * we can group them together and avoid wasting time idling.
++ */
++ bfqq = bfqq_find_close(bfqd, cur_bfqq, sector);
++ if (!bfqq || bfqq == cur_bfqq)
++ return NULL;
++
++ return bfqq;
++}
++
++static struct bfq_queue *
++bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
++{
++ int process_refs, new_process_refs;
++ struct bfq_queue *__bfqq;
++
++ /*
++ * If there are no process references on the new_bfqq, then it is
++ * unsafe to follow the ->new_bfqq chain as other bfqq's in the chain
++ * may have dropped their last reference (not just their last process
++ * reference).
++ */
++ if (!bfqq_process_refs(new_bfqq))
++ return NULL;
++
++ /* Avoid a circular list and skip interim queue merges. */
++ while ((__bfqq = new_bfqq->new_bfqq)) {
++ if (__bfqq == bfqq)
++ return NULL;
++ new_bfqq = __bfqq;
++ }
++
++ process_refs = bfqq_process_refs(bfqq);
++ new_process_refs = bfqq_process_refs(new_bfqq);
++ /*
++ * If the process for the bfqq has gone away, there is no
++ * sense in merging the queues.
++ */
++ if (process_refs == 0 || new_process_refs == 0)
++ return NULL;
++
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "scheduling merge with queue %d",
++ new_bfqq->pid);
++
++ /*
++ * Merging is just a redirection: the requests of the process
++ * owning one of the two queues are redirected to the other queue.
++ * The latter queue, in its turn, is set as shared if this is the
++ * first time that the requests of some process are redirected to
++ * it.
++ *
++ * We redirect bfqq to new_bfqq and not the opposite, because we
++ * are in the context of the process owning bfqq, hence we have
++ * the io_cq of this process. So we can immediately configure this
++ * io_cq to redirect the requests of the process to new_bfqq.
++ *
++ * NOTE, even if new_bfqq coincides with the in-service queue, the
++ * io_cq of new_bfqq is not available, because, if the in-service
++ * queue is shared, bfqd->in_service_bic may not point to the
++ * io_cq of the in-service queue.
++ * Redirecting the requests of the process owning bfqq to the
++ * currently in-service queue is in any case the best option, as
++ * we feed the in-service queue with new requests close to the
++ * last request served and, by doing so, hopefully increase the
++ * throughput.
++ */
++ bfqq->new_bfqq = new_bfqq;
++ atomic_add(process_refs, &new_bfqq->ref);
++ return new_bfqq;
++}
++
++static bool bfq_may_be_close_cooperator(struct bfq_queue *bfqq,
++ struct bfq_queue *new_bfqq)
++{
++ if (WARN_ON(bfqq->entity.parent != new_bfqq->entity.parent))
++ return false;
++
++ if (bfq_class_idle(bfqq) || bfq_class_idle(new_bfqq) ||
++ (bfqq->ioprio_class != new_bfqq->ioprio_class))
++ return false;
++
++ /*
++ * If either of the queues has already been detected as seeky,
++ * then merging it with the other queue is unlikely to lead to
++ * sequential I/O.
++ */
++ if (BFQQ_SEEKY(bfqq) || BFQQ_SEEKY(new_bfqq))
++ return false;
++
++ /*
++ * Interleaved I/O is known to be done by (some) applications
++ * only for reads, so it does not make sense to merge async
++ * queues.
++ */
++ if (!bfq_bfqq_sync(bfqq) || !bfq_bfqq_sync(new_bfqq))
++ return false;
++
++ return true;
++}
++
++/*
++ * Attempt to schedule a merge of bfqq with the currently in-service queue
++ * or with a close queue among the scheduled queues.
++ * Return NULL if no merge was scheduled, a pointer to the shared bfq_queue
++ * structure otherwise.
++ *
++ * The OOM queue is not allowed to participate in cooperation: in fact, since
++ * the requests temporarily redirected to the OOM queue could be redirected
++ * again to dedicated queues at any time, the state needed to correctly
++ * handle merging with the OOM queue would be quite complex and expensive
++ * to maintain. Besides, in a condition as critical as out of memory,
++ * the benefits of queue merging are likely to be marginal or even negligible.
++ */
++static struct bfq_queue *
++bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ void *io_struct, bool request)
++{
++ struct bfq_queue *in_service_bfqq, *new_bfqq;
++
++ if (bfqq->new_bfqq)
++ return bfqq->new_bfqq;
++ if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq))
++ return NULL;
++ /* If device has only one backlogged bfq_queue, don't search. */
++ if (bfqd->busy_queues == 1)
++ return NULL;
++
++ in_service_bfqq = bfqd->in_service_queue;
++
++ if (!in_service_bfqq || in_service_bfqq == bfqq ||
++ !bfqd->in_service_bic ||
++ unlikely(in_service_bfqq == &bfqd->oom_bfqq))
++ goto check_scheduled;
++
++ if (bfq_rq_close_to_sector(io_struct, request, bfqd->last_position) &&
++ bfq_may_be_close_cooperator(bfqq, in_service_bfqq)) {
++ new_bfqq = bfq_setup_merge(bfqq, in_service_bfqq);
++ if (new_bfqq)
++ return new_bfqq;
++ }
++ /*
++ * Check whether there is a cooperator among currently scheduled
++ * queues. The only thing we need is that the bio/request is not
++ * NULL, as we need it to establish whether a cooperator exists.
++ */
++check_scheduled:
++ new_bfqq = bfq_find_close_cooperator(bfqd, bfqq,
++ bfq_io_struct_pos(io_struct, request));
++ if (new_bfqq && likely(new_bfqq != &bfqd->oom_bfqq) &&
++ bfq_may_be_close_cooperator(bfqq, new_bfqq))
++ return bfq_setup_merge(bfqq, new_bfqq);
++
++ return NULL;
++}
++
++static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
++{
++ /*
++ * If !bfqq->bic, the queue is already shared or its requests
++ * have already been redirected to a shared queue; both idle window
++ * and weight raising state have already been saved. Do nothing.
++ */
++ if (!bfqq->bic)
++ return;
++ if (bfqq->bic->wr_time_left)
++ /*
++ * This is the queue of a just-started process, and would
++ * deserve weight raising: we set wr_time_left to the full
++ * weight-raising duration to trigger weight-raising when
++ * and if the queue is split and the first request of the
++ * queue is enqueued.
++ */
++ bfqq->bic->wr_time_left = bfq_wr_duration(bfqq->bfqd);
++ else if (bfqq->wr_coeff > 1) {
++ unsigned long wr_duration =
++ jiffies - bfqq->last_wr_start_finish;
++ /*
++ * It may happen that a queue's weight raising period lasts
++ * longer than its wr_cur_max_time, as weight raising is
++ * handled only when a request is enqueued or dispatched (it
++ * does not use any timer). If the weight raising period is
++ * about to end, don't save it.
++ */
++ if (bfqq->wr_cur_max_time <= wr_duration)
++ bfqq->bic->wr_time_left = 0;
++ else
++ bfqq->bic->wr_time_left =
++ bfqq->wr_cur_max_time - wr_duration;
++ /*
++ * The bfq_queue is becoming shared or the requests of the
++ * process owning the queue are being redirected to a shared
++ * queue. Stop the weight raising period of the queue, as in
++ * both cases it should not be owned by an interactive or
++ * soft real-time application.
++ */
++ bfq_bfqq_end_wr(bfqq);
++ } else
++ bfqq->bic->wr_time_left = 0;
++ bfqq->bic->saved_idle_window = bfq_bfqq_idle_window(bfqq);
++ bfqq->bic->saved_IO_bound = bfq_bfqq_IO_bound(bfqq);
++ bfqq->bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
++ bfqq->bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node);
++ bfqq->bic->cooperations++;
++ bfqq->bic->failed_cooperations = 0;
++}
++
++static void bfq_get_bic_reference(struct bfq_queue *bfqq)
++{
++ /*
++ * If bfqq->bic has a non-NULL value, the bic to which it belongs
++ * is about to begin using a shared bfq_queue.
++ */
++ if (bfqq->bic)
++ atomic_long_inc(&bfqq->bic->icq.ioc->refcount);
++}
++
++static void
++bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
++ struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
++{
++ bfq_log_bfqq(bfqd, bfqq, "merging with queue %lu",
++ (long unsigned)new_bfqq->pid);
++ /* Save weight raising and idle window of the merged queues */
++ bfq_bfqq_save_state(bfqq);
++ bfq_bfqq_save_state(new_bfqq);
++ if (bfq_bfqq_IO_bound(bfqq))
++ bfq_mark_bfqq_IO_bound(new_bfqq);
++ bfq_clear_bfqq_IO_bound(bfqq);
++ /*
++ * Grab a reference to the bic, to prevent it from being destroyed
++ * before being possibly touched by a bfq_split_bfqq().
++ */
++ bfq_get_bic_reference(bfqq);
++ bfq_get_bic_reference(new_bfqq);
++ /*
++ * Merge queues (that is, let bic redirect its requests to new_bfqq)
++ */
++ bic_set_bfqq(bic, new_bfqq, 1);
++ bfq_mark_bfqq_coop(new_bfqq);
++ /*
++ * new_bfqq now belongs to at least two bics (it is a shared queue):
++ * set new_bfqq->bic to NULL. bfqq either:
++ * - does not belong to any bic any more, and hence bfqq->bic must
++ * be set to NULL, or
++ * - is a queue whose owning bics have already been redirected to a
++ * different queue, hence the queue is destined to not belong to
++ * any bic soon and bfqq->bic is already NULL (therefore the next
++ * assignment causes no harm).
++ */
++ new_bfqq->bic = NULL;
++ bfqq->bic = NULL;
++ bfq_put_queue(bfqq);
++}
++
++static void bfq_bfqq_increase_failed_cooperations(struct bfq_queue *bfqq)
++{
++ struct bfq_io_cq *bic = bfqq->bic;
++ struct bfq_data *bfqd = bfqq->bfqd;
++
++ if (bic && bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh) {
++ bic->failed_cooperations++;
++ if (bic->failed_cooperations >= bfqd->bfq_failed_cooperations)
++ bic->cooperations = 0;
++ }
++}
++
+ static int bfq_allow_merge(struct request_queue *q, struct request *rq,
+ struct bio *bio)
+ {
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+ struct bfq_io_cq *bic;
++ struct bfq_queue *bfqq, *new_bfqq;
+
+ /*
+ * Disallow merge of a sync bio into an async request.
+@@ -1150,7 +1619,26 @@ static int bfq_allow_merge(struct request_queue *q, struct request *rq,
+ if (!bic)
+ return 0;
+
+- return bic_to_bfqq(bic, bfq_bio_sync(bio)) == RQ_BFQQ(rq);
++ bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
++ /*
++ * We take advantage of this function to perform an early merge
++ * of the queues of possible cooperating processes.
++ */
++ if (bfqq) {
++ new_bfqq = bfq_setup_cooperator(bfqd, bfqq, bio, false);
++ if (new_bfqq) {
++ bfq_merge_bfqqs(bfqd, bic, bfqq, new_bfqq);
++ /*
++ * If we get here, the bio will be queued in the
++ * shared queue, i.e., new_bfqq, so use new_bfqq
++ * to decide whether bio and rq can be merged.
++ */
++ bfqq = new_bfqq;
++ } else
++ bfq_bfqq_increase_failed_cooperations(bfqq);
++ }
++
++ return bfqq == RQ_BFQQ(rq);
+ }
+
+ static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
+@@ -1349,6 +1837,15 @@ static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+
+ __bfq_bfqd_reset_in_service(bfqd);
+
++ /*
++ * If this bfqq is shared between multiple processes, check
++ * to make sure that those processes are still issuing I/Os
++ * within the mean seek distance. If not, it may be time to
++ * break the queues apart again.
++ */
++ if (bfq_bfqq_coop(bfqq) && BFQQ_SEEKY(bfqq))
++ bfq_mark_bfqq_split_coop(bfqq);
++
+ if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
+ /*
+ * Overloading budget_timeout field to store the time
+@@ -1357,8 +1854,13 @@ static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ */
+ bfqq->budget_timeout = jiffies;
+ bfq_del_bfqq_busy(bfqd, bfqq, 1);
+- } else
++ } else {
+ bfq_activate_bfqq(bfqd, bfqq);
++ /*
++ * Resort priority tree of potential close cooperators.
++ */
++ bfq_pos_tree_add_move(bfqd, bfqq);
++ }
+ }
+
+ /**
+@@ -2242,10 +2744,12 @@ static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ /*
+ * If the queue was activated in a burst, or
+ * too much time has elapsed from the beginning
+- * of this weight-raising period, then end weight
+- * raising.
++ * of this weight-raising period, or the queue has
++ * exceeded the acceptable number of cooperations,
++ * then end weight raising.
+ */
+ if (bfq_bfqq_in_large_burst(bfqq) ||
++ bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh ||
+ time_is_before_jiffies(bfqq->last_wr_start_finish +
+ bfqq->wr_cur_max_time)) {
+ bfqq->last_wr_start_finish = jiffies;
+@@ -2474,6 +2978,25 @@ static void bfq_put_queue(struct bfq_queue *bfqq)
+ #endif
+ }
+
++static void bfq_put_cooperator(struct bfq_queue *bfqq)
++{
++ struct bfq_queue *__bfqq, *next;
++
++ /*
++ * If this queue was scheduled to merge with another queue, be
++ * sure to drop the reference taken on that queue (and others in
++ * the merge chain). See bfq_setup_merge and bfq_merge_bfqqs.
++ */
++ __bfqq = bfqq->new_bfqq;
++ while (__bfqq) {
++ if (__bfqq == bfqq)
++ break;
++ next = __bfqq->new_bfqq;
++ bfq_put_queue(__bfqq);
++ __bfqq = next;
++ }
++}
++
+ static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ {
+ if (bfqq == bfqd->in_service_queue) {
+@@ -2484,6 +3007,8 @@ static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ bfq_log_bfqq(bfqd, bfqq, "exit_bfqq: %p, %d", bfqq,
+ atomic_read(&bfqq->ref));
+
++ bfq_put_cooperator(bfqq);
++
+ bfq_put_queue(bfqq);
+ }
+
+@@ -2492,6 +3017,25 @@ static void bfq_init_icq(struct io_cq *icq)
+ struct bfq_io_cq *bic = icq_to_bic(icq);
+
+ bic->ttime.last_end_request = jiffies;
++ /*
++ * A newly created bic indicates that the process has just
++ * started doing I/O, and is probably mapping into memory its
++ * executable and libraries: it definitely needs weight raising.
++ * There is however the possibility that the process performs,
++ * for a while, I/O close to some other process. EQM intercepts
++ * this behavior and may merge the queue corresponding to the
++ * process with some other queue, BEFORE the weight of the queue
++ * is raised. Merged queues are not weight-raised (they are assumed
++ * to belong to processes that benefit only from high throughput).
++ * If the merge is basically the consequence of an accident, then
++ * the queue will be split soon and will get back its old weight.
++ * It is then important to write down somewhere that this queue
++ * does need weight raising, even if it did not manage to get its
++ * weight raised before being merged. To this end, we overload
++ * the field wr_time_left and assign 1 to it, to mark the queue
++ * as needing weight raising.
++ */
++ bic->wr_time_left = 1;
+ }
+
+ static void bfq_exit_icq(struct io_cq *icq)
+@@ -2505,6 +3049,13 @@ static void bfq_exit_icq(struct io_cq *icq)
+ }
+
+ if (bic->bfqq[BLK_RW_SYNC]) {
++ /*
++ * If the bic is using a shared queue, put the reference
++ * taken on the io_context when the bic started using a
++ * shared bfq_queue.
++ */
++ if (bfq_bfqq_coop(bic->bfqq[BLK_RW_SYNC]))
++ put_io_context(icq->ioc);
+ bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
+ bic->bfqq[BLK_RW_SYNC] = NULL;
+ }
+@@ -2809,6 +3360,10 @@ static void bfq_update_idle_window(struct bfq_data *bfqd,
+ if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
+ return;
+
++ /* Idle window just restored, statistics are meaningless. */
++ if (bfq_bfqq_just_split(bfqq))
++ return;
++
+ enable_idle = bfq_bfqq_idle_window(bfqq);
+
+ if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
+@@ -2856,6 +3411,7 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
+ !BFQQ_SEEKY(bfqq))
+ bfq_update_idle_window(bfqd, bfqq, bic);
++ bfq_clear_bfqq_just_split(bfqq);
+
+ bfq_log_bfqq(bfqd, bfqq,
+ "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
+@@ -2920,12 +3476,47 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ static void bfq_insert_request(struct request_queue *q, struct request *rq)
+ {
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+- struct bfq_queue *bfqq = RQ_BFQQ(rq);
++ struct bfq_queue *bfqq = RQ_BFQQ(rq), *new_bfqq;
+
+ assert_spin_locked(bfqd->queue->queue_lock);
+
++ /*
++ * An unplug may trigger a requeue of a request from the device
++ * driver: make sure we are in process context while trying to
++ * merge two bfq_queues.
++ */
++ if (!in_interrupt()) {
++ new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true);
++ if (new_bfqq) {
++ if (bic_to_bfqq(RQ_BIC(rq), 1) != bfqq)
++ new_bfqq = bic_to_bfqq(RQ_BIC(rq), 1);
++ /*
++ * Release the request's reference to the old bfqq
++ * and make sure one is taken to the shared queue.
++ */
++ new_bfqq->allocated[rq_data_dir(rq)]++;
++ bfqq->allocated[rq_data_dir(rq)]--;
++ atomic_inc(&new_bfqq->ref);
++ bfq_put_queue(bfqq);
++ if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq)
++ bfq_merge_bfqqs(bfqd, RQ_BIC(rq),
++ bfqq, new_bfqq);
++ rq->elv.priv[1] = new_bfqq;
++ bfqq = new_bfqq;
++ } else
++ bfq_bfqq_increase_failed_cooperations(bfqq);
++ }
++
+ bfq_add_request(rq);
+
++ /*
++ * Here a newly-created bfq_queue has already started a weight-raising
++ * period: clear wr_time_left to prevent bfq_bfqq_save_state()
++ * from assigning it a full weight-raising period. See the detailed
++ * comments about this field in bfq_init_icq().
++ */
++ if (bfqq->bic)
++ bfqq->bic->wr_time_left = 0;
+ rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
+ list_add_tail(&rq->queuelist, &bfqq->fifo);
+
+@@ -3094,6 +3685,32 @@ static void bfq_put_request(struct request *rq)
+ }
+
+ /*
++ * Returns NULL if a new bfqq should be allocated, or the old bfqq if this
++ * was the last process referring to said bfqq.
++ */
++static struct bfq_queue *
++bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
++{
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "splitting queue");
++
++ put_io_context(bic->icq.ioc);
++
++ if (bfqq_process_refs(bfqq) == 1) {
++ bfqq->pid = current->pid;
++ bfq_clear_bfqq_coop(bfqq);
++ bfq_clear_bfqq_split_coop(bfqq);
++ return bfqq;
++ }
++
++ bic_set_bfqq(bic, NULL, 1);
++
++ bfq_put_cooperator(bfqq);
++
++ bfq_put_queue(bfqq);
++ return NULL;
++}
++
++/*
+ * Allocate bfq data structures associated with this request.
+ */
+ static int bfq_set_request(struct request_queue *q, struct request *rq,
+@@ -3105,6 +3722,7 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
+ const int is_sync = rq_is_sync(rq);
+ struct bfq_queue *bfqq;
+ unsigned long flags;
++ bool split = false;
+
+ might_sleep_if(gfp_mask & __GFP_WAIT);
+
+@@ -3117,15 +3735,30 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
+
+ bfq_bic_update_cgroup(bic, bio);
+
++new_queue:
+ bfqq = bic_to_bfqq(bic, is_sync);
+ if (!bfqq || bfqq == &bfqd->oom_bfqq) {
+ bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, gfp_mask);
+ bic_set_bfqq(bic, bfqq, is_sync);
+- if (is_sync) {
+- if (bfqd->large_burst)
++ if (split && is_sync) {
++ if ((bic->was_in_burst_list && bfqd->large_burst) ||
++ bic->saved_in_large_burst)
+ bfq_mark_bfqq_in_large_burst(bfqq);
+- else
+- bfq_clear_bfqq_in_large_burst(bfqq);
++ else {
++ bfq_clear_bfqq_in_large_burst(bfqq);
++ if (bic->was_in_burst_list)
++ hlist_add_head(&bfqq->burst_list_node,
++ &bfqd->burst_list);
++ }
++ }
++ } else {
++ /* If the queue was seeky for too long, break it apart. */
++ if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq)) {
++ bfq_log_bfqq(bfqd, bfqq, "breaking apart bfqq");
++ bfqq = bfq_split_bfqq(bic, bfqq);
++ split = true;
++ if (!bfqq)
++ goto new_queue;
+ }
+ }
+
+@@ -3137,6 +3770,26 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
+ rq->elv.priv[0] = bic;
+ rq->elv.priv[1] = bfqq;
+
++ /*
++ * If a bfq_queue has only one process reference, it is owned
++ * by only one bfq_io_cq: we can set the bic field of the
++ * bfq_queue to the address of that structure. Also, if the
++ * queue has just been split, mark a flag so that the
++ * information is available to the other scheduler hooks.
++ */
++ if (likely(bfqq != &bfqd->oom_bfqq) && bfqq_process_refs(bfqq) == 1) {
++ bfqq->bic = bic;
++ if (split) {
++ bfq_mark_bfqq_just_split(bfqq);
++ /*
++ * If the queue has just been split from a shared
++ * queue, restore the idle window and the possible
++ * weight raising period.
++ */
++ bfq_bfqq_resume_state(bfqq, bic);
++ }
++ }
++
+ spin_unlock_irqrestore(q->queue_lock, flags);
+
+ return 0;
+@@ -3289,6 +3942,7 @@ static void bfq_init_root_group(struct bfq_group *root_group,
+ root_group->my_entity = NULL;
+ root_group->bfqd = bfqd;
+ #endif
++ root_group->rq_pos_tree = RB_ROOT;
+ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
+ root_group->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
+ }
+@@ -3369,6 +4023,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
+ bfqd->bfq_timeout[BLK_RW_ASYNC] = bfq_timeout_async;
+ bfqd->bfq_timeout[BLK_RW_SYNC] = bfq_timeout_sync;
+
++ bfqd->bfq_coop_thresh = 2;
++ bfqd->bfq_failed_cooperations = 7000;
+ bfqd->bfq_requests_within_timer = 120;
+
+ bfqd->bfq_large_burst_thresh = 11;
+diff --git a/block/bfq.h b/block/bfq.h
+index ca5ac20..320c438 100644
+--- a/block/bfq.h
++++ b/block/bfq.h
+@@ -183,6 +183,8 @@ struct bfq_group;
+ * ioprio_class value.
+ * @new_bfqq: shared bfq_queue if queue is cooperating with
+ * one or more other queues.
++ * @pos_node: request-position tree member (see bfq_group's @rq_pos_tree).
++ * @pos_root: request-position tree root (see bfq_group's @rq_pos_tree).
+ * @sort_list: sorted list of pending requests.
+ * @next_rq: if fifo isn't expired, next request to serve.
+ * @queued: nr of requests queued in @sort_list.
+@@ -304,6 +306,26 @@ struct bfq_ttime {
+ * @ttime: associated @bfq_ttime struct
+ * @ioprio: per (request_queue, blkcg) ioprio.
+ * @blkcg_id: id of the blkcg the related io_cq belongs to.
++ * @wr_time_left: snapshot of the time left before weight raising ends
++ * for the sync queue associated to this process; this
++ * snapshot is taken to remember this value while the weight
++ * raising is suspended because the queue is merged with a
++ * shared queue, and is used to set @raising_cur_max_time
++ * when the queue is split from the shared queue and its
++ * weight is raised again
++ * @saved_idle_window: same purpose as the previous field for the idle
++ * window
++ * @saved_IO_bound: same purpose as the previous two fields for the I/O
++ * bound classification of a queue
++ * @saved_in_large_burst: same purpose as the previous fields for the
++ * value of the field keeping the queue's belonging
++ * to a large burst
++ * @was_in_burst_list: true if the queue belonged to a burst list
++ * before its merge with another cooperating queue
++ * @cooperations: counter of consecutive successful queue merges underwent
++ * by any of the process' @bfq_queues
++ * @failed_cooperations: counter of consecutive failed queue merges of any
++ * of the process' @bfq_queues
+ */
+ struct bfq_io_cq {
+ struct io_cq icq; /* must be the first member */
+@@ -314,6 +336,16 @@ struct bfq_io_cq {
+ #ifdef CONFIG_BFQ_GROUP_IOSCHED
+ uint64_t blkcg_id; /* the current blkcg ID */
+ #endif
++
++ unsigned int wr_time_left;
++ bool saved_idle_window;
++ bool saved_IO_bound;
++
++ bool saved_in_large_burst;
++ bool was_in_burst_list;
++
++ unsigned int cooperations;
++ unsigned int failed_cooperations;
+ };
+
+ enum bfq_device_speed {
+@@ -559,6 +591,9 @@ enum bfqq_state_flags {
+ * may need softrt-next-start
+ * update
+ */
++ BFQ_BFQQ_FLAG_coop, /* bfqq is shared */
++ BFQ_BFQQ_FLAG_split_coop, /* shared bfqq will be split */
++ BFQ_BFQQ_FLAG_just_split, /* queue has just been split */
+ };
+
+ #define BFQ_BFQQ_FNS(name) \
+@@ -585,6 +620,9 @@ BFQ_BFQQ_FNS(budget_new);
+ BFQ_BFQQ_FNS(IO_bound);
+ BFQ_BFQQ_FNS(in_large_burst);
+ BFQ_BFQQ_FNS(constantly_seeky);
++BFQ_BFQQ_FNS(coop);
++BFQ_BFQQ_FNS(split_coop);
++BFQ_BFQQ_FNS(just_split);
+ BFQ_BFQQ_FNS(softrt_update);
+ #undef BFQ_BFQQ_FNS
+
+@@ -679,6 +717,9 @@ struct bfq_group_data {
+ * are groups with more than one active @bfq_entity
+ * (see the comments to the function
+ * bfq_bfqq_must_not_expire()).
++ * @rq_pos_tree: rbtree sorted by next_request position, used when
++ * determining if two or more queues have interleaving
++ * requests (see bfq_find_close_cooperator()).
+ *
+ * Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
+ * there is a set of bfq_groups, each one collecting the lower-level
+@@ -707,6 +748,8 @@ struct bfq_group {
+
+ int active_entities;
+
++ struct rb_root rq_pos_tree;
++
+ struct bfqg_stats stats;
+ struct bfqg_stats dead_stats; /* stats pushed from dead children */
+ };
+@@ -717,6 +760,8 @@ struct bfq_group {
+
+ struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
+ struct bfq_queue *async_idle_bfqq;
++
++ struct rb_root rq_pos_tree;
+ };
+ #endif
+
+@@ -793,6 +838,27 @@ static void bfq_put_bfqd_unlock(struct bfq_data *bfqd, unsigned long *flags)
+ spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
+ }
+
++#ifdef CONFIG_BFQ_GROUP_IOSCHED
++
++static struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq)
++{
++ struct bfq_entity *group_entity = bfqq->entity.parent;
++
++ if (!group_entity)
++ group_entity = &bfqq->bfqd->root_group->entity;
++
++ return container_of(group_entity, struct bfq_group, entity);
++}
++
++#else
++
++static struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq)
++{
++ return bfqq->bfqd->root_group;
++}
++
++#endif
++
+ static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio);
+ static void bfq_put_queue(struct bfq_queue *bfqq);
+ static void bfq_dispatch_insert(struct request_queue *q, struct request *rq);
+--
+2.1.4
+
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-21 22:19 Mike Pagano
From: Mike Pagano @ 2015-09-21 22:19 UTC
To: gentoo-commits
commit: 857fef960f822e5b9d2105502ed3707d4f52df93
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Mon Sep 21 22:19:23 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Mon Sep 21 22:19:23 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=857fef96
Linux patch 4.2.1
0000_README | 4 +
1000_linux-4.2.1.patch | 4522 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 4526 insertions(+)
diff --git a/0000_README b/0000_README
index 0f4cdca..0c6168a 100644
--- a/0000_README
+++ b/0000_README
@@ -43,6 +43,10 @@ EXPERIMENTAL
Individual Patch Descriptions:
--------------------------------------------------------------------------
+Patch: 1000_linux-4.2.1.patch
+From: http://www.kernel.org
+Desc: Linux 4.2.1
+
Patch: 1500_XATTR_USER_PREFIX.patch
From: https://bugs.gentoo.org/show_bug.cgi?id=470644
Desc: Support for namespace user.pax.* on tmpfs.
diff --git a/1000_linux-4.2.1.patch b/1000_linux-4.2.1.patch
new file mode 100644
index 0000000..2be0056
--- /dev/null
+++ b/1000_linux-4.2.1.patch
@@ -0,0 +1,4522 @@
+diff --git a/Documentation/ABI/testing/configfs-usb-gadget-loopback b/Documentation/ABI/testing/configfs-usb-gadget-loopback
+index 9aae5bfb9908..06beefbcf061 100644
+--- a/Documentation/ABI/testing/configfs-usb-gadget-loopback
++++ b/Documentation/ABI/testing/configfs-usb-gadget-loopback
+@@ -5,4 +5,4 @@ Description:
+ The attributes:
+
+ qlen - depth of loopback queue
+- bulk_buflen - buffer length
++ buflen - buffer length
+diff --git a/Documentation/ABI/testing/configfs-usb-gadget-sourcesink b/Documentation/ABI/testing/configfs-usb-gadget-sourcesink
+index 29477c319f61..bc7ff731aa0c 100644
+--- a/Documentation/ABI/testing/configfs-usb-gadget-sourcesink
++++ b/Documentation/ABI/testing/configfs-usb-gadget-sourcesink
+@@ -9,4 +9,4 @@ Description:
+ isoc_maxpacket - 0 - 1023 (fs), 0 - 1024 (hs/ss)
+ isoc_mult - 0..2 (hs/ss only)
+ isoc_maxburst - 0..15 (ss only)
+- qlen - buffer length
++ buflen - buffer length
+diff --git a/Documentation/device-mapper/statistics.txt b/Documentation/device-mapper/statistics.txt
+index 4919b2dfd1b3..6f5ef944ca4c 100644
+--- a/Documentation/device-mapper/statistics.txt
++++ b/Documentation/device-mapper/statistics.txt
+@@ -121,6 +121,10 @@ Messages
+
+ Output format:
+ <region_id>: <start_sector>+<length> <step> <program_id> <aux_data>
++ precise_timestamps histogram:n1,n2,n3,...
++
++ The strings "precise_timestamps" and "histogram" are printed only
++ if they were specified when creating the region.
+
+ @stats_print <region_id> [<starting_line> <number_of_lines>]
+
+diff --git a/Documentation/usb/gadget-testing.txt b/Documentation/usb/gadget-testing.txt
+index 592678009c15..b24d3ef89166 100644
+--- a/Documentation/usb/gadget-testing.txt
++++ b/Documentation/usb/gadget-testing.txt
+@@ -237,9 +237,7 @@ Testing the LOOPBACK function
+ -----------------------------
+
+ device: run the gadget
+-host: test-usb
+-
+-http://www.linux-usb.org/usbtest/testusb.c
++host: test-usb (tools/usb/testusb.c)
+
+ 8. MASS STORAGE function
+ ========================
+@@ -586,9 +584,8 @@ Testing the SOURCESINK function
+ -------------------------------
+
+ device: run the gadget
+-host: test-usb
++host: test-usb (tools/usb/testusb.c)
+
+-http://www.linux-usb.org/usbtest/testusb.c
+
+ 16. UAC1 function
+ =================
+diff --git a/Makefile b/Makefile
+index c3615937df38..a03efc18aa48 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 0
++SUBLEVEL = 1
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+
+diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
+index 1c5021002fe4..ede2526ecf1f 100644
+--- a/arch/arm/Kconfig
++++ b/arch/arm/Kconfig
+@@ -536,6 +536,7 @@ config ARCH_ORION5X
+ select MVEBU_MBUS
+ select PCI
+ select PLAT_ORION_LEGACY
++ select MULTI_IRQ_HANDLER
+ help
+ Support for the following Marvell Orion 5x series SoCs:
+ Orion-1 (5181), Orion-VoIP (5181L), Orion-NAS (5182),
+diff --git a/arch/arm/boot/dts/exynos3250-rinato.dts b/arch/arm/boot/dts/exynos3250-rinato.dts
+index 031853b75528..baa9b2f52009 100644
+--- a/arch/arm/boot/dts/exynos3250-rinato.dts
++++ b/arch/arm/boot/dts/exynos3250-rinato.dts
+@@ -182,7 +182,7 @@
+
+ display-timings {
+ timing-0 {
+- clock-frequency = <0>;
++ clock-frequency = <4600000>;
+ hactive = <320>;
+ vactive = <320>;
+ hfront-porch = <1>;
+diff --git a/arch/arm/boot/dts/rk3288.dtsi b/arch/arm/boot/dts/rk3288.dtsi
+index 22316d00493e..858efd0c861d 100644
+--- a/arch/arm/boot/dts/rk3288.dtsi
++++ b/arch/arm/boot/dts/rk3288.dtsi
+@@ -626,7 +626,7 @@
+ compatible = "rockchip,rk3288-wdt", "snps,dw-wdt";
+ reg = <0xff800000 0x100>;
+ clocks = <&cru PCLK_WDT>;
+- interrupts = <GIC_SPI 111 IRQ_TYPE_LEVEL_HIGH>;
++ interrupts = <GIC_SPI 79 IRQ_TYPE_LEVEL_HIGH>;
+ status = "disabled";
+ };
+
+diff --git a/arch/arm/mach-bcm/bcm63xx_smp.c b/arch/arm/mach-bcm/bcm63xx_smp.c
+index 3f014f18cea5..b8e18cc8f237 100644
+--- a/arch/arm/mach-bcm/bcm63xx_smp.c
++++ b/arch/arm/mach-bcm/bcm63xx_smp.c
+@@ -127,7 +127,7 @@ static int bcm63138_smp_boot_secondary(unsigned int cpu,
+ }
+
+ /* Locate the secondary CPU node */
+- dn = of_get_cpu_node(cpu_logical_map(cpu), NULL);
++ dn = of_get_cpu_node(cpu, NULL);
+ if (!dn) {
+ pr_err("SMP: failed to locate secondary CPU%d node\n", cpu);
+ ret = -ENODEV;
+diff --git a/arch/arm/mach-omap2/clockdomains7xx_data.c b/arch/arm/mach-omap2/clockdomains7xx_data.c
+index 57d5df0c1fbd..7581e036bda6 100644
+--- a/arch/arm/mach-omap2/clockdomains7xx_data.c
++++ b/arch/arm/mach-omap2/clockdomains7xx_data.c
+@@ -331,7 +331,7 @@ static struct clockdomain l4per2_7xx_clkdm = {
+ .dep_bit = DRA7XX_L4PER2_STATDEP_SHIFT,
+ .wkdep_srcs = l4per2_wkup_sleep_deps,
+ .sleepdep_srcs = l4per2_wkup_sleep_deps,
+- .flags = CLKDM_CAN_HWSUP_SWSUP,
++ .flags = CLKDM_CAN_SWSUP,
+ };
+
+ static struct clockdomain mpu0_7xx_clkdm = {
+diff --git a/arch/arm/mach-orion5x/include/mach/irqs.h b/arch/arm/mach-orion5x/include/mach/irqs.h
+index a6fa9d8f12d8..2431d9923427 100644
+--- a/arch/arm/mach-orion5x/include/mach/irqs.h
++++ b/arch/arm/mach-orion5x/include/mach/irqs.h
+@@ -16,42 +16,42 @@
+ /*
+ * Orion Main Interrupt Controller
+ */
+-#define IRQ_ORION5X_BRIDGE 0
+-#define IRQ_ORION5X_DOORBELL_H2C 1
+-#define IRQ_ORION5X_DOORBELL_C2H 2
+-#define IRQ_ORION5X_UART0 3
+-#define IRQ_ORION5X_UART1 4
+-#define IRQ_ORION5X_I2C 5
+-#define IRQ_ORION5X_GPIO_0_7 6
+-#define IRQ_ORION5X_GPIO_8_15 7
+-#define IRQ_ORION5X_GPIO_16_23 8
+-#define IRQ_ORION5X_GPIO_24_31 9
+-#define IRQ_ORION5X_PCIE0_ERR 10
+-#define IRQ_ORION5X_PCIE0_INT 11
+-#define IRQ_ORION5X_USB1_CTRL 12
+-#define IRQ_ORION5X_DEV_BUS_ERR 14
+-#define IRQ_ORION5X_PCI_ERR 15
+-#define IRQ_ORION5X_USB_BR_ERR 16
+-#define IRQ_ORION5X_USB0_CTRL 17
+-#define IRQ_ORION5X_ETH_RX 18
+-#define IRQ_ORION5X_ETH_TX 19
+-#define IRQ_ORION5X_ETH_MISC 20
+-#define IRQ_ORION5X_ETH_SUM 21
+-#define IRQ_ORION5X_ETH_ERR 22
+-#define IRQ_ORION5X_IDMA_ERR 23
+-#define IRQ_ORION5X_IDMA_0 24
+-#define IRQ_ORION5X_IDMA_1 25
+-#define IRQ_ORION5X_IDMA_2 26
+-#define IRQ_ORION5X_IDMA_3 27
+-#define IRQ_ORION5X_CESA 28
+-#define IRQ_ORION5X_SATA 29
+-#define IRQ_ORION5X_XOR0 30
+-#define IRQ_ORION5X_XOR1 31
++#define IRQ_ORION5X_BRIDGE (1 + 0)
++#define IRQ_ORION5X_DOORBELL_H2C (1 + 1)
++#define IRQ_ORION5X_DOORBELL_C2H (1 + 2)
++#define IRQ_ORION5X_UART0 (1 + 3)
++#define IRQ_ORION5X_UART1 (1 + 4)
++#define IRQ_ORION5X_I2C (1 + 5)
++#define IRQ_ORION5X_GPIO_0_7 (1 + 6)
++#define IRQ_ORION5X_GPIO_8_15 (1 + 7)
++#define IRQ_ORION5X_GPIO_16_23 (1 + 8)
++#define IRQ_ORION5X_GPIO_24_31 (1 + 9)
++#define IRQ_ORION5X_PCIE0_ERR (1 + 10)
++#define IRQ_ORION5X_PCIE0_INT (1 + 11)
++#define IRQ_ORION5X_USB1_CTRL (1 + 12)
++#define IRQ_ORION5X_DEV_BUS_ERR (1 + 14)
++#define IRQ_ORION5X_PCI_ERR (1 + 15)
++#define IRQ_ORION5X_USB_BR_ERR (1 + 16)
++#define IRQ_ORION5X_USB0_CTRL (1 + 17)
++#define IRQ_ORION5X_ETH_RX (1 + 18)
++#define IRQ_ORION5X_ETH_TX (1 + 19)
++#define IRQ_ORION5X_ETH_MISC (1 + 20)
++#define IRQ_ORION5X_ETH_SUM (1 + 21)
++#define IRQ_ORION5X_ETH_ERR (1 + 22)
++#define IRQ_ORION5X_IDMA_ERR (1 + 23)
++#define IRQ_ORION5X_IDMA_0 (1 + 24)
++#define IRQ_ORION5X_IDMA_1 (1 + 25)
++#define IRQ_ORION5X_IDMA_2 (1 + 26)
++#define IRQ_ORION5X_IDMA_3 (1 + 27)
++#define IRQ_ORION5X_CESA (1 + 28)
++#define IRQ_ORION5X_SATA (1 + 29)
++#define IRQ_ORION5X_XOR0 (1 + 30)
++#define IRQ_ORION5X_XOR1 (1 + 31)
+
+ /*
+ * Orion General Purpose Pins
+ */
+-#define IRQ_ORION5X_GPIO_START 32
++#define IRQ_ORION5X_GPIO_START 33
+ #define NR_GPIO_IRQS 32
+
+ #define NR_IRQS (IRQ_ORION5X_GPIO_START + NR_GPIO_IRQS)
+diff --git a/arch/arm/mach-orion5x/irq.c b/arch/arm/mach-orion5x/irq.c
+index cd4bac4d7e43..086ecb87d885 100644
+--- a/arch/arm/mach-orion5x/irq.c
++++ b/arch/arm/mach-orion5x/irq.c
+@@ -42,7 +42,7 @@ __exception_irq_entry orion5x_legacy_handle_irq(struct pt_regs *regs)
+ stat = readl_relaxed(MAIN_IRQ_CAUSE);
+ stat &= readl_relaxed(MAIN_IRQ_MASK);
+ if (stat) {
+- unsigned int hwirq = __fls(stat);
++ unsigned int hwirq = 1 + __fls(stat);
+ handle_IRQ(hwirq, regs);
+ return;
+ }
+@@ -51,7 +51,7 @@ __exception_irq_entry orion5x_legacy_handle_irq(struct pt_regs *regs)
+
+ void __init orion5x_init_irq(void)
+ {
+- orion_irq_init(0, MAIN_IRQ_MASK);
++ orion_irq_init(1, MAIN_IRQ_MASK);
+
+ #ifdef CONFIG_MULTI_IRQ_HANDLER
+ set_handle_irq(orion5x_legacy_handle_irq);
+diff --git a/arch/arm/mach-rockchip/platsmp.c b/arch/arm/mach-rockchip/platsmp.c
+index 8fcec1cc101e..01b3e3683ede 100644
+--- a/arch/arm/mach-rockchip/platsmp.c
++++ b/arch/arm/mach-rockchip/platsmp.c
+@@ -72,29 +72,22 @@ static struct reset_control *rockchip_get_core_reset(int cpu)
+ static int pmu_set_power_domain(int pd, bool on)
+ {
+ u32 val = (on) ? 0 : BIT(pd);
++ struct reset_control *rstc = rockchip_get_core_reset(pd);
+ int ret;
+
++ if (IS_ERR(rstc) && read_cpuid_part() != ARM_CPU_PART_CORTEX_A9) {
++ pr_err("%s: could not get reset control for core %d\n",
++ __func__, pd);
++ return PTR_ERR(rstc);
++ }
++
+ /*
+ * We need to soft reset the cpu when we turn off the cpu power domain,
+ * or else the active processors might be stalled when the individual
+ * processor is powered down.
+ */
+- if (read_cpuid_part() != ARM_CPU_PART_CORTEX_A9) {
+- struct reset_control *rstc = rockchip_get_core_reset(pd);
+-
+- if (IS_ERR(rstc)) {
+- pr_err("%s: could not get reset control for core %d\n",
+- __func__, pd);
+- return PTR_ERR(rstc);
+- }
+-
+- if (on)
+- reset_control_deassert(rstc);
+- else
+- reset_control_assert(rstc);
+-
+- reset_control_put(rstc);
+- }
++ if (!IS_ERR(rstc) && !on)
++ reset_control_assert(rstc);
+
+ ret = regmap_update_bits(pmu, PMU_PWRDN_CON, BIT(pd), val);
+ if (ret < 0) {
+@@ -112,6 +105,12 @@ static int pmu_set_power_domain(int pd, bool on)
+ }
+ }
+
++ if (!IS_ERR(rstc)) {
++ if (on)
++ reset_control_deassert(rstc);
++ reset_control_put(rstc);
++ }
++
+ return 0;
+ }
+
+@@ -146,8 +145,12 @@ static int rockchip_boot_secondary(unsigned int cpu, struct task_struct *idle)
+ * the mailbox:
+ * sram_base_addr + 4: 0xdeadbeaf
+ * sram_base_addr + 8: start address for pc
++ * The cpu0 need to wait the other cpus other than cpu0 entering
++ * the wfe state.The wait time is affected by many aspects.
++ * (e.g: cpu frequency, bootrom frequency, sram frequency, ...)
+ * */
+- udelay(10);
++ mdelay(1); /* ensure the cpus other than cpu0 to startup */
++
+ writel(virt_to_phys(secondary_startup), sram_base_addr + 8);
+ writel(0xDEADBEAF, sram_base_addr + 4);
+ dsb_sev();
+diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+index b027a89737b6..c6d601cc9764 100644
+--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
++++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+@@ -421,14 +421,20 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
+ rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+ v = pte & ~HPTE_V_HVLOCK;
+ if (v & HPTE_V_VALID) {
+- u64 pte1;
+-
+- pte1 = be64_to_cpu(hpte[1]);
+ hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
+- rb = compute_tlbie_rb(v, pte1, pte_index);
++ rb = compute_tlbie_rb(v, be64_to_cpu(hpte[1]), pte_index);
+ do_tlbies(kvm, &rb, 1, global_invalidates(kvm, flags), true);
+- /* Read PTE low word after tlbie to get final R/C values */
+- remove_revmap_chain(kvm, pte_index, rev, v, pte1);
++ /*
++ * The reference (R) and change (C) bits in a HPT
++ * entry can be set by hardware at any time up until
++ * the HPTE is invalidated and the TLB invalidation
++ * sequence has completed. This means that when
++ * removing a HPTE, we need to re-read the HPTE after
++ * the invalidation sequence has completed in order to
++ * obtain reliable values of R and C.
++ */
++ remove_revmap_chain(kvm, pte_index, rev, v,
++ be64_to_cpu(hpte[1]));
+ }
+ r = rev->guest_rpte & ~HPTE_GR_RESERVED;
+ note_hpte_modification(kvm, rev);
+diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+index faa86e9c0551..76408cf0ad04 100644
+--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
++++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+@@ -1127,6 +1127,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
+ cmpwi r12, BOOK3S_INTERRUPT_H_DOORBELL
+ bne 3f
+ lbz r0, HSTATE_HOST_IPI(r13)
++ cmpwi r0, 0
+ beq 4f
+ b guest_exit_cont
+ 3:
+diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
+index ca070d260af2..b80512b9ef59 100644
+--- a/arch/s390/kernel/setup.c
++++ b/arch/s390/kernel/setup.c
+@@ -688,7 +688,7 @@ static void __init setup_memory(void)
+ /*
+ * Setup hardware capabilities.
+ */
+-static void __init setup_hwcaps(void)
++static int __init setup_hwcaps(void)
+ {
+ static const int stfl_bits[6] = { 0, 2, 7, 17, 19, 21 };
+ struct cpuid cpu_id;
+@@ -754,9 +754,11 @@ static void __init setup_hwcaps(void)
+ elf_hwcap |= HWCAP_S390_TE;
+
+ /*
+- * Vector extension HWCAP_S390_VXRS is bit 11.
++ * Vector extension HWCAP_S390_VXRS is bit 11. The Vector extension
++ * can be disabled with the "novx" parameter. Use MACHINE_HAS_VX
++ * instead of facility bit 129.
+ */
+- if (test_facility(129))
++ if (MACHINE_HAS_VX)
+ elf_hwcap |= HWCAP_S390_VXRS;
+ get_cpu_id(&cpu_id);
+ add_device_randomness(&cpu_id, sizeof(cpu_id));
+@@ -793,7 +795,9 @@ static void __init setup_hwcaps(void)
+ strcpy(elf_platform, "z13");
+ break;
+ }
++ return 0;
+ }
++arch_initcall(setup_hwcaps);
+
+ /*
+ * Add system information as device randomness
+@@ -881,11 +885,6 @@ void __init setup_arch(char **cmdline_p)
+ cpu_init();
+
+ /*
+- * Setup capabilities (ELF_HWCAP & ELF_PLATFORM).
+- */
+- setup_hwcaps();
+-
+- /*
+ * Create kernel page tables and switch to virtual addressing.
+ */
+ paging_init();
+diff --git a/arch/x86/crypto/ghash-clmulni-intel_glue.c b/arch/x86/crypto/ghash-clmulni-intel_glue.c
+index 64d7cf1b50e1..440df0c7a2ee 100644
+--- a/arch/x86/crypto/ghash-clmulni-intel_glue.c
++++ b/arch/x86/crypto/ghash-clmulni-intel_glue.c
+@@ -294,6 +294,7 @@ static struct ahash_alg ghash_async_alg = {
+ .cra_name = "ghash",
+ .cra_driver_name = "ghash-clmulni",
+ .cra_priority = 400,
++ .cra_ctxsize = sizeof(struct ghash_async_ctx),
+ .cra_flags = CRYPTO_ALG_TYPE_AHASH | CRYPTO_ALG_ASYNC,
+ .cra_blocksize = GHASH_BLOCK_SIZE,
+ .cra_type = &crypto_ahash_type,
+diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
+index e49ee24da85e..9393896717d0 100644
+--- a/arch/x86/kernel/acpi/boot.c
++++ b/arch/x86/kernel/acpi/boot.c
+@@ -445,6 +445,7 @@ static void __init acpi_sci_ioapic_setup(u8 bus_irq, u16 polarity, u16 trigger,
+ polarity = acpi_sci_flags & ACPI_MADT_POLARITY_MASK;
+
+ mp_override_legacy_irq(bus_irq, polarity, trigger, gsi);
++ acpi_penalize_sci_irq(bus_irq, trigger, polarity);
+
+ /*
+ * stash over-ride to indicate we've been here
+diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
+index 844f56c5616d..c93c27df9919 100644
+--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
++++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
+@@ -146,6 +146,27 @@ void mce_intel_hcpu_update(unsigned long cpu)
+ per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
+ }
+
++static void cmci_toggle_interrupt_mode(bool on)
++{
++ unsigned long flags, *owned;
++ int bank;
++ u64 val;
++
++ raw_spin_lock_irqsave(&cmci_discover_lock, flags);
++ owned = this_cpu_ptr(mce_banks_owned);
++ for_each_set_bit(bank, owned, MAX_NR_BANKS) {
++ rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
++
++ if (on)
++ val |= MCI_CTL2_CMCI_EN;
++ else
++ val &= ~MCI_CTL2_CMCI_EN;
++
++ wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
++ }
++ raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
++}
++
+ unsigned long cmci_intel_adjust_timer(unsigned long interval)
+ {
+ if ((this_cpu_read(cmci_backoff_cnt) > 0) &&
+@@ -175,7 +196,7 @@ unsigned long cmci_intel_adjust_timer(unsigned long interval)
+ */
+ if (!atomic_read(&cmci_storm_on_cpus)) {
+ __this_cpu_write(cmci_storm_state, CMCI_STORM_NONE);
+- cmci_reenable();
++ cmci_toggle_interrupt_mode(true);
+ cmci_recheck();
+ }
+ return CMCI_POLL_INTERVAL;
+@@ -186,22 +207,6 @@ unsigned long cmci_intel_adjust_timer(unsigned long interval)
+ }
+ }
+
+-static void cmci_storm_disable_banks(void)
+-{
+- unsigned long flags, *owned;
+- int bank;
+- u64 val;
+-
+- raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+- owned = this_cpu_ptr(mce_banks_owned);
+- for_each_set_bit(bank, owned, MAX_NR_BANKS) {
+- rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+- val &= ~MCI_CTL2_CMCI_EN;
+- wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
+- }
+- raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+-}
+-
+ static bool cmci_storm_detect(void)
+ {
+ unsigned int cnt = __this_cpu_read(cmci_storm_cnt);
+@@ -223,7 +228,7 @@ static bool cmci_storm_detect(void)
+ if (cnt <= CMCI_STORM_THRESHOLD)
+ return false;
+
+- cmci_storm_disable_banks();
++ cmci_toggle_interrupt_mode(false);
+ __this_cpu_write(cmci_storm_state, CMCI_STORM_ACTIVE);
+ r = atomic_add_return(1, &cmci_storm_on_cpus);
+ mce_timer_kick(CMCI_STORM_INTERVAL);
+diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
+index 44171462bd2a..82362ad2f25d 100644
+--- a/arch/x86/kvm/mmu.c
++++ b/arch/x86/kvm/mmu.c
+@@ -357,12 +357,6 @@ static u64 __get_spte_lockless(u64 *sptep)
+ {
+ return ACCESS_ONCE(*sptep);
+ }
+-
+-static bool __check_direct_spte_mmio_pf(u64 spte)
+-{
+- /* It is valid if the spte is zapped. */
+- return spte == 0ull;
+-}
+ #else
+ union split_spte {
+ struct {
+@@ -478,23 +472,6 @@ retry:
+
+ return spte.spte;
+ }
+-
+-static bool __check_direct_spte_mmio_pf(u64 spte)
+-{
+- union split_spte sspte = (union split_spte)spte;
+- u32 high_mmio_mask = shadow_mmio_mask >> 32;
+-
+- /* It is valid if the spte is zapped. */
+- if (spte == 0ull)
+- return true;
+-
+- /* It is valid if the spte is being zapped. */
+- if (sspte.spte_low == 0ull &&
+- (sspte.spte_high & high_mmio_mask) == high_mmio_mask)
+- return true;
+-
+- return false;
+-}
+ #endif
+
+ static bool spte_is_locklessly_modifiable(u64 spte)
+@@ -3299,21 +3276,6 @@ static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
+ return vcpu_match_mmio_gva(vcpu, addr);
+ }
+
+-
+-/*
+- * On direct hosts, the last spte is only allows two states
+- * for mmio page fault:
+- * - It is the mmio spte
+- * - It is zapped or it is being zapped.
+- *
+- * This function completely checks the spte when the last spte
+- * is not the mmio spte.
+- */
+-static bool check_direct_spte_mmio_pf(u64 spte)
+-{
+- return __check_direct_spte_mmio_pf(spte);
+-}
+-
+ static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
+ {
+ struct kvm_shadow_walk_iterator iterator;
+@@ -3356,13 +3318,6 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
+ }
+
+ /*
+- * It's ok if the gva is remapped by other cpus on shadow guest,
+- * it's a BUG if the gfn is not a mmio page.
+- */
+- if (direct && !check_direct_spte_mmio_pf(spte))
+- return RET_MMIO_PF_BUG;
+-
+- /*
+ * If the page table is zapped by other cpus, let CPU fault again on
+ * the address.
+ */
+diff --git a/arch/xtensa/include/asm/traps.h b/arch/xtensa/include/asm/traps.h
+index 677bfcf4ee5d..28f33a8b7f5f 100644
+--- a/arch/xtensa/include/asm/traps.h
++++ b/arch/xtensa/include/asm/traps.h
+@@ -25,30 +25,39 @@ static inline void spill_registers(void)
+ {
+ #if XCHAL_NUM_AREGS > 16
+ __asm__ __volatile__ (
+- " call12 1f\n"
++ " call8 1f\n"
+ " _j 2f\n"
+ " retw\n"
+ " .align 4\n"
+ "1:\n"
++#if XCHAL_NUM_AREGS == 32
++ " _entry a1, 32\n"
++ " addi a8, a0, 3\n"
++ " _entry a1, 16\n"
++ " mov a12, a12\n"
++ " retw\n"
++#else
+ " _entry a1, 48\n"
+- " addi a12, a0, 3\n"
+-#if XCHAL_NUM_AREGS > 32
+- " .rept (" __stringify(XCHAL_NUM_AREGS) " - 32) / 12\n"
++ " call12 1f\n"
++ " retw\n"
++ " .align 4\n"
++ "1:\n"
++ " .rept (" __stringify(XCHAL_NUM_AREGS) " - 16) / 12\n"
+ " _entry a1, 48\n"
+ " mov a12, a0\n"
+ " .endr\n"
+-#endif
+- " _entry a1, 48\n"
++ " _entry a1, 16\n"
+ #if XCHAL_NUM_AREGS % 12 == 0
+- " mov a8, a8\n"
+-#elif XCHAL_NUM_AREGS % 12 == 4
+ " mov a12, a12\n"
+-#elif XCHAL_NUM_AREGS % 12 == 8
++#elif XCHAL_NUM_AREGS % 12 == 4
+ " mov a4, a4\n"
++#elif XCHAL_NUM_AREGS % 12 == 8
++ " mov a8, a8\n"
+ #endif
+ " retw\n"
++#endif
+ "2:\n"
+- : : : "a12", "a13", "memory");
++ : : : "a8", "a9", "memory");
+ #else
+ __asm__ __volatile__ (
+ " mov a12, a12\n"
+diff --git a/arch/xtensa/kernel/entry.S b/arch/xtensa/kernel/entry.S
+index 82bbfa5a05b3..a2a902140c4e 100644
+--- a/arch/xtensa/kernel/entry.S
++++ b/arch/xtensa/kernel/entry.S
+@@ -568,12 +568,13 @@ user_exception_exit:
+ * (if we have restored WSBITS-1 frames).
+ */
+
++2:
+ #if XCHAL_HAVE_THREADPTR
+ l32i a3, a1, PT_THREADPTR
+ wur a3, threadptr
+ #endif
+
+-2: j common_exception_exit
++ j common_exception_exit
+
+ /* This is the kernel exception exit.
+ * We avoided to do a MOVSP when we entered the exception, but we
+@@ -1820,7 +1821,7 @@ ENDPROC(system_call)
+ mov a12, a0
+ .endr
+ #endif
+- _entry a1, 48
++ _entry a1, 16
+ #if XCHAL_NUM_AREGS % 12 == 0
+ mov a8, a8
+ #elif XCHAL_NUM_AREGS % 12 == 4
+@@ -1844,7 +1845,7 @@ ENDPROC(system_call)
+
+ ENTRY(_switch_to)
+
+- entry a1, 16
++ entry a1, 48
+
+ mov a11, a3 # and 'next' (a3)
+
+diff --git a/drivers/acpi/acpi_pnp.c b/drivers/acpi/acpi_pnp.c
+index ff6d8adc9cda..fb765524cc3d 100644
+--- a/drivers/acpi/acpi_pnp.c
++++ b/drivers/acpi/acpi_pnp.c
+@@ -153,6 +153,7 @@ static const struct acpi_device_id acpi_pnp_device_ids[] = {
+ {"AEI0250"}, /* PROLiNK 1456VH ISA PnP K56flex Fax Modem */
+ {"AEI1240"}, /* Actiontec ISA PNP 56K X2 Fax Modem */
+ {"AKY1021"}, /* Rockwell 56K ACF II Fax+Data+Voice Modem */
++ {"ALI5123"}, /* ALi Fast Infrared Controller */
+ {"AZT4001"}, /* AZT3005 PnP SOUND DEVICE */
+ {"BDP3336"}, /* Best Data Products Inc. Smart One 336F PnP Modem */
+ {"BRI0A49"}, /* Boca Complete Ofc Communicator 14.4 Data-FAX */
+diff --git a/drivers/acpi/pci_link.c b/drivers/acpi/pci_link.c
+index cfd7581cc19f..b09ad554430a 100644
+--- a/drivers/acpi/pci_link.c
++++ b/drivers/acpi/pci_link.c
+@@ -826,6 +826,22 @@ void acpi_penalize_isa_irq(int irq, int active)
+ }
+
+ /*
++ * Penalize IRQ used by ACPI SCI. If ACPI SCI pin attributes conflict with
++ * PCI IRQ attributes, mark ACPI SCI as ISA_ALWAYS so it won't be use for
++ * PCI IRQs.
++ */
++void acpi_penalize_sci_irq(int irq, int trigger, int polarity)
++{
++ if (irq >= 0 && irq < ARRAY_SIZE(acpi_irq_penalty)) {
++ if (trigger != ACPI_MADT_TRIGGER_LEVEL ||
++ polarity != ACPI_MADT_POLARITY_ACTIVE_LOW)
++ acpi_irq_penalty[irq] += PIRQ_PENALTY_ISA_ALWAYS;
++ else
++ acpi_irq_penalty[irq] += PIRQ_PENALTY_PCI_USING;
++ }
++}
++
++/*
+ * Over-ride default table to reserve additional IRQs for use by ISA
+ * e.g. acpi_irq_isa=5
+ * Useful for telling ACPI how not to interfere with your ISA sound card.
+diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
+index 7e62751abfac..a46660204e3a 100644
+--- a/drivers/ata/ahci.c
++++ b/drivers/ata/ahci.c
+@@ -351,6 +351,7 @@ static const struct pci_device_id ahci_pci_tbl[] = {
+ /* JMicron 362B and 362C have an AHCI function with IDE class code */
+ { PCI_VDEVICE(JMICRON, 0x2362), board_ahci_ign_iferr },
+ { PCI_VDEVICE(JMICRON, 0x236f), board_ahci_ign_iferr },
++ /* May need to update quirk_jmicron_async_suspend() for additions */
+
+ /* ATI */
+ { PCI_VDEVICE(ATI, 0x4380), board_ahci_sb600 }, /* ATI SB600 */
+@@ -1451,18 +1452,6 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
+ else if (pdev->vendor == 0x177d && pdev->device == 0xa01c)
+ ahci_pci_bar = AHCI_PCI_BAR_CAVIUM;
+
+- /*
+- * The JMicron chip 361/363 contains one SATA controller and one
+- * PATA controller,for powering on these both controllers, we must
+- * follow the sequence one by one, otherwise one of them can not be
+- * powered on successfully, so here we disable the async suspend
+- * method for these chips.
+- */
+- if (pdev->vendor == PCI_VENDOR_ID_JMICRON &&
+- (pdev->device == PCI_DEVICE_ID_JMICRON_JMB363 ||
+- pdev->device == PCI_DEVICE_ID_JMICRON_JMB361))
+- device_disable_async_suspend(&pdev->dev);
+-
+ /* acquire resources */
+ rc = pcim_enable_device(pdev);
+ if (rc)
+diff --git a/drivers/ata/pata_jmicron.c b/drivers/ata/pata_jmicron.c
+index 47e418b8c8ba..4d1a5d2c4287 100644
+--- a/drivers/ata/pata_jmicron.c
++++ b/drivers/ata/pata_jmicron.c
+@@ -143,18 +143,6 @@ static int jmicron_init_one (struct pci_dev *pdev, const struct pci_device_id *i
+ };
+ const struct ata_port_info *ppi[] = { &info, NULL };
+
+- /*
+- * The JMicron chip 361/363 contains one SATA controller and one
+- * PATA controller,for powering on these both controllers, we must
+- * follow the sequence one by one, otherwise one of them can not be
+- * powered on successfully, so here we disable the async suspend
+- * method for these chips.
+- */
+- if (pdev->vendor == PCI_VENDOR_ID_JMICRON &&
+- (pdev->device == PCI_DEVICE_ID_JMICRON_JMB363 ||
+- pdev->device == PCI_DEVICE_ID_JMICRON_JMB361))
+- device_disable_async_suspend(&pdev->dev);
+-
+ return ata_pci_bmdma_init_one(pdev, ppi, &jmicron_sht, NULL, 0);
+ }
+
+diff --git a/drivers/auxdisplay/ks0108.c b/drivers/auxdisplay/ks0108.c
+index 5b93852392b8..0d752851a1ee 100644
+--- a/drivers/auxdisplay/ks0108.c
++++ b/drivers/auxdisplay/ks0108.c
+@@ -139,6 +139,7 @@ static int __init ks0108_init(void)
+
+ ks0108_pardevice = parport_register_device(ks0108_parport, KS0108_NAME,
+ NULL, NULL, NULL, PARPORT_DEV_EXCL, NULL);
++ parport_put_port(ks0108_parport);
+ if (ks0108_pardevice == NULL) {
+ printk(KERN_ERR KS0108_NAME ": ERROR: "
+ "parport didn't register new device\n");
+diff --git a/drivers/base/devres.c b/drivers/base/devres.c
+index c8a53d1e019f..875464690117 100644
+--- a/drivers/base/devres.c
++++ b/drivers/base/devres.c
+@@ -297,10 +297,10 @@ void * devres_get(struct device *dev, void *new_res,
+ if (!dr) {
+ add_dr(dev, &new_dr->node);
+ dr = new_dr;
+- new_dr = NULL;
++ new_res = NULL;
+ }
+ spin_unlock_irqrestore(&dev->devres_lock, flags);
+- devres_free(new_dr);
++ devres_free(new_res);
+
+ return dr->data;
+ }
+diff --git a/drivers/base/platform.c b/drivers/base/platform.c
+index 063f0ab15259..f80aaaf9f610 100644
+--- a/drivers/base/platform.c
++++ b/drivers/base/platform.c
+@@ -375,9 +375,7 @@ int platform_device_add(struct platform_device *pdev)
+
+ while (--i >= 0) {
+ struct resource *r = &pdev->resource[i];
+- unsigned long type = resource_type(r);
+-
+- if (type == IORESOURCE_MEM || type == IORESOURCE_IO)
++ if (r->parent)
+ release_resource(r);
+ }
+
+@@ -408,9 +406,7 @@ void platform_device_del(struct platform_device *pdev)
+
+ for (i = 0; i < pdev->num_resources; i++) {
+ struct resource *r = &pdev->resource[i];
+- unsigned long type = resource_type(r);
+-
+- if (type == IORESOURCE_MEM || type == IORESOURCE_IO)
++ if (r->parent)
+ release_resource(r);
+ }
+ }
+diff --git a/drivers/base/power/clock_ops.c b/drivers/base/power/clock_ops.c
+index acef9f9f759a..652b5a367c1f 100644
+--- a/drivers/base/power/clock_ops.c
++++ b/drivers/base/power/clock_ops.c
+@@ -38,7 +38,7 @@ struct pm_clock_entry {
+ * @dev: The device for the given clock
+ * @ce: PM clock entry corresponding to the clock.
+ */
+-static inline int __pm_clk_enable(struct device *dev, struct pm_clock_entry *ce)
++static inline void __pm_clk_enable(struct device *dev, struct pm_clock_entry *ce)
+ {
+ int ret;
+
+@@ -50,8 +50,6 @@ static inline int __pm_clk_enable(struct device *dev, struct pm_clock_entry *ce)
+ dev_err(dev, "%s: failed to enable clk %p, error %d\n",
+ __func__, ce->clk, ret);
+ }
+-
+- return ret;
+ }
+
+ /**
+diff --git a/drivers/clk/pistachio/clk-pistachio.c b/drivers/clk/pistachio/clk-pistachio.c
+index 8c0fe8828f99..c4ceb5eaf46c 100644
+--- a/drivers/clk/pistachio/clk-pistachio.c
++++ b/drivers/clk/pistachio/clk-pistachio.c
+@@ -159,9 +159,15 @@ PNAME(mux_debug) = { "mips_pll_mux", "rpu_v_pll_mux",
+ "wifi_pll_mux", "bt_pll_mux" };
+ static u32 mux_debug_idx[] = { 0x0, 0x1, 0x2, 0x4, 0x8, 0x10 };
+
+-static unsigned int pistachio_critical_clks[] __initdata = {
+- CLK_MIPS,
+- CLK_PERIPH_SYS,
++static unsigned int pistachio_critical_clks_core[] __initdata = {
++ CLK_MIPS
++};
++
++static unsigned int pistachio_critical_clks_sys[] __initdata = {
++ PERIPH_CLK_SYS,
++ PERIPH_CLK_SYS_BUS,
++ PERIPH_CLK_DDR,
++ PERIPH_CLK_ROM,
+ };
+
+ static void __init pistachio_clk_init(struct device_node *np)
+@@ -193,8 +199,8 @@ static void __init pistachio_clk_init(struct device_node *np)
+
+ pistachio_clk_register_provider(p);
+
+- pistachio_clk_force_enable(p, pistachio_critical_clks,
+- ARRAY_SIZE(pistachio_critical_clks));
++ pistachio_clk_force_enable(p, pistachio_critical_clks_core,
++ ARRAY_SIZE(pistachio_critical_clks_core));
+ }
+ CLK_OF_DECLARE(pistachio_clk, "img,pistachio-clk", pistachio_clk_init);
+
+@@ -261,6 +267,9 @@ static void __init pistachio_clk_periph_init(struct device_node *np)
+ ARRAY_SIZE(pistachio_periph_gates));
+
+ pistachio_clk_register_provider(p);
++
++ pistachio_clk_force_enable(p, pistachio_critical_clks_sys,
++ ARRAY_SIZE(pistachio_critical_clks_sys));
+ }
+ CLK_OF_DECLARE(pistachio_clk_periph, "img,pistachio-clk-periph",
+ pistachio_clk_periph_init);
+diff --git a/drivers/clk/pistachio/clk-pll.c b/drivers/clk/pistachio/clk-pll.c
+index e17dada0dd21..c9b459821084 100644
+--- a/drivers/clk/pistachio/clk-pll.c
++++ b/drivers/clk/pistachio/clk-pll.c
+@@ -65,6 +65,12 @@
+ #define MIN_OUTPUT_FRAC 12000000UL
+ #define MAX_OUTPUT_FRAC 1600000000UL
+
++/* Fractional PLL operating modes */
++enum pll_mode {
++ PLL_MODE_FRAC,
++ PLL_MODE_INT,
++};
++
+ struct pistachio_clk_pll {
+ struct clk_hw hw;
+ void __iomem *base;
+@@ -88,12 +94,10 @@ static inline void pll_lock(struct pistachio_clk_pll *pll)
+ cpu_relax();
+ }
+
+-static inline u32 do_div_round_closest(u64 dividend, u32 divisor)
++static inline u64 do_div_round_closest(u64 dividend, u64 divisor)
+ {
+ dividend += divisor / 2;
+- do_div(dividend, divisor);
+-
+- return dividend;
++ return div64_u64(dividend, divisor);
+ }
+
+ static inline struct pistachio_clk_pll *to_pistachio_pll(struct clk_hw *hw)
+@@ -101,6 +105,29 @@ static inline struct pistachio_clk_pll *to_pistachio_pll(struct clk_hw *hw)
+ return container_of(hw, struct pistachio_clk_pll, hw);
+ }
+
++static inline enum pll_mode pll_frac_get_mode(struct clk_hw *hw)
++{
++ struct pistachio_clk_pll *pll = to_pistachio_pll(hw);
++ u32 val;
++
++ val = pll_readl(pll, PLL_CTRL3) & PLL_FRAC_CTRL3_DSMPD;
++ return val ? PLL_MODE_INT : PLL_MODE_FRAC;
++}
++
++static inline void pll_frac_set_mode(struct clk_hw *hw, enum pll_mode mode)
++{
++ struct pistachio_clk_pll *pll = to_pistachio_pll(hw);
++ u32 val;
++
++ val = pll_readl(pll, PLL_CTRL3);
++ if (mode == PLL_MODE_INT)
++ val |= PLL_FRAC_CTRL3_DSMPD | PLL_FRAC_CTRL3_DACPD;
++ else
++ val &= ~(PLL_FRAC_CTRL3_DSMPD | PLL_FRAC_CTRL3_DACPD);
++
++ pll_writel(pll, val, PLL_CTRL3);
++}
++
+ static struct pistachio_pll_rate_table *
+ pll_get_params(struct pistachio_clk_pll *pll, unsigned long fref,
+ unsigned long fout)
+@@ -136,8 +163,7 @@ static int pll_gf40lp_frac_enable(struct clk_hw *hw)
+ u32 val;
+
+ val = pll_readl(pll, PLL_CTRL3);
+- val &= ~(PLL_FRAC_CTRL3_PD | PLL_FRAC_CTRL3_DACPD |
+- PLL_FRAC_CTRL3_DSMPD | PLL_FRAC_CTRL3_FOUTPOSTDIVPD |
++ val &= ~(PLL_FRAC_CTRL3_PD | PLL_FRAC_CTRL3_FOUTPOSTDIVPD |
+ PLL_FRAC_CTRL3_FOUT4PHASEPD | PLL_FRAC_CTRL3_FOUTVCOPD);
+ pll_writel(pll, val, PLL_CTRL3);
+
+@@ -173,7 +199,7 @@ static int pll_gf40lp_frac_set_rate(struct clk_hw *hw, unsigned long rate,
+ struct pistachio_clk_pll *pll = to_pistachio_pll(hw);
+ struct pistachio_pll_rate_table *params;
+ int enabled = pll_gf40lp_frac_is_enabled(hw);
+- u32 val, vco, old_postdiv1, old_postdiv2;
++ u64 val, vco, old_postdiv1, old_postdiv2;
+ const char *name = __clk_get_name(hw->clk);
+
+ if (rate < MIN_OUTPUT_FRAC || rate > MAX_OUTPUT_FRAC)
+@@ -183,17 +209,21 @@ static int pll_gf40lp_frac_set_rate(struct clk_hw *hw, unsigned long rate,
+ if (!params || !params->refdiv)
+ return -EINVAL;
+
+- vco = params->fref * params->fbdiv / params->refdiv;
++ /* calculate vco */
++ vco = params->fref;
++ vco *= (params->fbdiv << 24) + params->frac;
++ vco = div64_u64(vco, params->refdiv << 24);
++
+ if (vco < MIN_VCO_FRAC_FRAC || vco > MAX_VCO_FRAC_FRAC)
+- pr_warn("%s: VCO %u is out of range %lu..%lu\n", name, vco,
++ pr_warn("%s: VCO %llu is out of range %lu..%lu\n", name, vco,
+ MIN_VCO_FRAC_FRAC, MAX_VCO_FRAC_FRAC);
+
+- val = params->fref / params->refdiv;
++ val = div64_u64(params->fref, params->refdiv);
+ if (val < MIN_PFD)
+- pr_warn("%s: PFD %u is too low (min %lu)\n",
++ pr_warn("%s: PFD %llu is too low (min %lu)\n",
+ name, val, MIN_PFD);
+ if (val > vco / 16)
+- pr_warn("%s: PFD %u is too high (max %u)\n",
++ pr_warn("%s: PFD %llu is too high (max %llu)\n",
+ name, val, vco / 16);
+
+ val = pll_readl(pll, PLL_CTRL1);
+@@ -227,6 +257,12 @@ static int pll_gf40lp_frac_set_rate(struct clk_hw *hw, unsigned long rate,
+ (params->postdiv2 << PLL_FRAC_CTRL2_POSTDIV2_SHIFT);
+ pll_writel(pll, val, PLL_CTRL2);
+
++ /* set operating mode */
++ if (params->frac)
++ pll_frac_set_mode(hw, PLL_MODE_FRAC);
++ else
++ pll_frac_set_mode(hw, PLL_MODE_INT);
++
+ if (enabled)
+ pll_lock(pll);
+
+@@ -237,8 +273,7 @@ static unsigned long pll_gf40lp_frac_recalc_rate(struct clk_hw *hw,
+ unsigned long parent_rate)
+ {
+ struct pistachio_clk_pll *pll = to_pistachio_pll(hw);
+- u32 val, prediv, fbdiv, frac, postdiv1, postdiv2;
+- u64 rate = parent_rate;
++ u64 val, prediv, fbdiv, frac, postdiv1, postdiv2, rate;
+
+ val = pll_readl(pll, PLL_CTRL1);
+ prediv = (val >> PLL_CTRL1_REFDIV_SHIFT) & PLL_CTRL1_REFDIV_MASK;
+@@ -251,7 +286,13 @@ static unsigned long pll_gf40lp_frac_recalc_rate(struct clk_hw *hw,
+ PLL_FRAC_CTRL2_POSTDIV2_MASK;
+ frac = (val >> PLL_FRAC_CTRL2_FRAC_SHIFT) & PLL_FRAC_CTRL2_FRAC_MASK;
+
+- rate *= (fbdiv << 24) + frac;
++ /* get operating mode (int/frac) and calculate rate accordingly */
++ rate = parent_rate;
++ if (pll_frac_get_mode(hw) == PLL_MODE_FRAC)
++ rate *= (fbdiv << 24) + frac;
++ else
++ rate *= (fbdiv << 24);
++
+ rate = do_div_round_closest(rate, (prediv * postdiv1 * postdiv2) << 24);
+
+ return rate;
+@@ -279,7 +320,7 @@ static int pll_gf40lp_laint_enable(struct clk_hw *hw)
+ u32 val;
+
+ val = pll_readl(pll, PLL_CTRL1);
+- val &= ~(PLL_INT_CTRL1_PD | PLL_INT_CTRL1_DSMPD |
++ val &= ~(PLL_INT_CTRL1_PD |
+ PLL_INT_CTRL1_FOUTPOSTDIVPD | PLL_INT_CTRL1_FOUTVCOPD);
+ pll_writel(pll, val, PLL_CTRL1);
+
+@@ -325,12 +366,12 @@ static int pll_gf40lp_laint_set_rate(struct clk_hw *hw, unsigned long rate,
+ if (!params || !params->refdiv)
+ return -EINVAL;
+
+- vco = params->fref * params->fbdiv / params->refdiv;
++ vco = div_u64(params->fref * params->fbdiv, params->refdiv);
+ if (vco < MIN_VCO_LA || vco > MAX_VCO_LA)
+ pr_warn("%s: VCO %u is out of range %lu..%lu\n", name, vco,
+ MIN_VCO_LA, MAX_VCO_LA);
+
+- val = params->fref / params->refdiv;
++ val = div_u64(params->fref, params->refdiv);
+ if (val < MIN_PFD)
+ pr_warn("%s: PFD %u is too low (min %lu)\n",
+ name, val, MIN_PFD);
+diff --git a/drivers/clk/pistachio/clk.h b/drivers/clk/pistachio/clk.h
+index 52fabbc24624..8d45178dbde3 100644
+--- a/drivers/clk/pistachio/clk.h
++++ b/drivers/clk/pistachio/clk.h
+@@ -95,13 +95,13 @@ struct pistachio_fixed_factor {
+ }
+
+ struct pistachio_pll_rate_table {
+- unsigned long fref;
+- unsigned long fout;
+- unsigned int refdiv;
+- unsigned int fbdiv;
+- unsigned int postdiv1;
+- unsigned int postdiv2;
+- unsigned int frac;
++ unsigned long long fref;
++ unsigned long long fout;
++ unsigned long long refdiv;
++ unsigned long long fbdiv;
++ unsigned long long postdiv1;
++ unsigned long long postdiv2;
++ unsigned long long frac;
+ };
+
+ enum pistachio_pll_type {
+diff --git a/drivers/clk/pxa/clk-pxa25x.c b/drivers/clk/pxa/clk-pxa25x.c
+index 6cd88d963a7f..542e45ef5087 100644
+--- a/drivers/clk/pxa/clk-pxa25x.c
++++ b/drivers/clk/pxa/clk-pxa25x.c
+@@ -79,7 +79,7 @@ unsigned int pxa25x_get_clk_frequency_khz(int info)
+ clks[3] / 1000000, (clks[3] % 1000000) / 10000);
+ }
+
+- return (unsigned int)clks[0];
++ return (unsigned int)clks[0] / KHz;
+ }
+
+ static unsigned long clk_pxa25x_memory_get_rate(struct clk_hw *hw,
+diff --git a/drivers/clk/pxa/clk-pxa27x.c b/drivers/clk/pxa/clk-pxa27x.c
+index 9a31b77eed23..5b82d30baf9f 100644
+--- a/drivers/clk/pxa/clk-pxa27x.c
++++ b/drivers/clk/pxa/clk-pxa27x.c
+@@ -80,7 +80,7 @@ unsigned int pxa27x_get_clk_frequency_khz(int info)
+ pr_info("System bus clock: %ld.%02ldMHz\n",
+ clks[4] / 1000000, (clks[4] % 1000000) / 10000);
+ }
+- return (unsigned int)clks[0];
++ return (unsigned int)clks[0] / KHz;
+ }
+
+ bool pxa27x_is_ppll_disabled(void)
+diff --git a/drivers/clk/pxa/clk-pxa3xx.c b/drivers/clk/pxa/clk-pxa3xx.c
+index ac03ba49e9d1..4af4eed5f89f 100644
+--- a/drivers/clk/pxa/clk-pxa3xx.c
++++ b/drivers/clk/pxa/clk-pxa3xx.c
+@@ -78,7 +78,7 @@ unsigned int pxa3xx_get_clk_frequency_khz(int info)
+ pr_info("System bus clock: %ld.%02ldMHz\n",
+ clks[4] / 1000000, (clks[4] % 1000000) / 10000);
+ }
+- return (unsigned int)clks[0];
++ return (unsigned int)clks[0] / KHz;
+ }
+
+ static unsigned long clk_pxa3xx_ac97_get_rate(struct clk_hw *hw,
+diff --git a/drivers/clk/qcom/gcc-apq8084.c b/drivers/clk/qcom/gcc-apq8084.c
+index 54a756b90a37..457c540585f9 100644
+--- a/drivers/clk/qcom/gcc-apq8084.c
++++ b/drivers/clk/qcom/gcc-apq8084.c
+@@ -2105,6 +2105,7 @@ static struct clk_branch gcc_ce1_clk = {
+ "ce1_clk_src",
+ },
+ .num_parents = 1,
++ .flags = CLK_SET_RATE_PARENT,
+ .ops = &clk_branch2_ops,
+ },
+ },
+diff --git a/drivers/clk/qcom/gcc-msm8916.c b/drivers/clk/qcom/gcc-msm8916.c
+index c66f7bc2ae87..5d75bffab141 100644
+--- a/drivers/clk/qcom/gcc-msm8916.c
++++ b/drivers/clk/qcom/gcc-msm8916.c
+@@ -2278,7 +2278,7 @@ static struct clk_branch gcc_prng_ahb_clk = {
+ .halt_check = BRANCH_HALT_VOTED,
+ .clkr = {
+ .enable_reg = 0x45004,
+- .enable_mask = BIT(0),
++ .enable_mask = BIT(8),
+ .hw.init = &(struct clk_init_data){
+ .name = "gcc_prng_ahb_clk",
+ .parent_names = (const char *[]){
+diff --git a/drivers/clk/qcom/gcc-msm8974.c b/drivers/clk/qcom/gcc-msm8974.c
+index c39d09874e74..f06a082e3e87 100644
+--- a/drivers/clk/qcom/gcc-msm8974.c
++++ b/drivers/clk/qcom/gcc-msm8974.c
+@@ -1783,6 +1783,7 @@ static struct clk_branch gcc_ce1_clk = {
+ "ce1_clk_src",
+ },
+ .num_parents = 1,
++ .flags = CLK_SET_RATE_PARENT,
+ .ops = &clk_branch2_ops,
+ },
+ },
+diff --git a/drivers/clk/rockchip/clk-rk3288.c b/drivers/clk/rockchip/clk-rk3288.c
+index 4f817ed9e6ee..0211162ee879 100644
+--- a/drivers/clk/rockchip/clk-rk3288.c
++++ b/drivers/clk/rockchip/clk-rk3288.c
+@@ -578,7 +578,7 @@ static struct rockchip_clk_branch rk3288_clk_branches[] __initdata = {
+ COMPOSITE(0, "mac_pll_src", mux_pll_src_npll_cpll_gpll_p, 0,
+ RK3288_CLKSEL_CON(21), 0, 2, MFLAGS, 8, 5, DFLAGS,
+ RK3288_CLKGATE_CON(2), 5, GFLAGS),
+- MUX(SCLK_MAC, "mac_clk", mux_mac_p, 0,
++ MUX(SCLK_MAC, "mac_clk", mux_mac_p, CLK_SET_RATE_PARENT,
+ RK3288_CLKSEL_CON(21), 4, 1, MFLAGS),
+ GATE(SCLK_MACREF_OUT, "sclk_macref_out", "mac_clk", 0,
+ RK3288_CLKGATE_CON(5), 3, GFLAGS),
+diff --git a/drivers/clk/samsung/clk-exynos4.c b/drivers/clk/samsung/clk-exynos4.c
+index cae2c048488d..d1af2fc53c5f 100644
+--- a/drivers/clk/samsung/clk-exynos4.c
++++ b/drivers/clk/samsung/clk-exynos4.c
+@@ -86,6 +86,7 @@
+ #define DIV_PERIL4 0xc560
+ #define DIV_PERIL5 0xc564
+ #define E4X12_DIV_CAM1 0xc568
++#define E4X12_GATE_BUS_FSYS1 0xc744
+ #define GATE_SCLK_CAM 0xc820
+ #define GATE_IP_CAM 0xc920
+ #define GATE_IP_TV 0xc924
+@@ -1097,6 +1098,7 @@ static struct samsung_gate_clock exynos4x12_gate_clks[] __initdata = {
+ 0),
+ GATE(CLK_PPMUIMAGE, "ppmuimage", "aclk200", E4X12_GATE_IP_IMAGE, 9, 0,
+ 0),
++ GATE(CLK_TSADC, "tsadc", "aclk133", E4X12_GATE_BUS_FSYS1, 16, 0, 0),
+ GATE(CLK_MIPI_HSI, "mipi_hsi", "aclk133", GATE_IP_FSYS, 10, 0, 0),
+ GATE(CLK_CHIPID, "chipid", "aclk100", E4X12_GATE_IP_PERIR, 0, 0, 0),
+ GATE(CLK_SYSREG, "sysreg", "aclk100", E4X12_GATE_IP_PERIR, 1,
+diff --git a/drivers/clk/samsung/clk-s5pv210.c b/drivers/clk/samsung/clk-s5pv210.c
+index cf7e8fa7b624..793cb1d2f7ae 100644
+--- a/drivers/clk/samsung/clk-s5pv210.c
++++ b/drivers/clk/samsung/clk-s5pv210.c
+@@ -828,6 +828,8 @@ static void __init __s5pv210_clk_init(struct device_node *np,
+
+ s5pv210_clk_sleep_init();
+
++ samsung_clk_of_add_provider(np, ctx);
++
+ pr_info("%s clocks: mout_apll = %ld, mout_mpll = %ld\n"
+ "\tmout_epll = %ld, mout_vpll = %ld\n",
+ is_s5p6442 ? "S5P6442" : "S5PV210",
+diff --git a/drivers/clk/versatile/clk-sp810.c b/drivers/clk/versatile/clk-sp810.c
+index a96dd8e53fdb..b674ffc4f5ce 100644
+--- a/drivers/clk/versatile/clk-sp810.c
++++ b/drivers/clk/versatile/clk-sp810.c
+@@ -128,8 +128,8 @@ static struct clk *clk_sp810_timerclken_of_get(struct of_phandle_args *clkspec,
+ {
+ struct clk_sp810 *sp810 = data;
+
+- if (WARN_ON(clkspec->args_count != 1 || clkspec->args[0] >
+- ARRAY_SIZE(sp810->timerclken)))
++ if (WARN_ON(clkspec->args_count != 1 ||
++ clkspec->args[0] >= ARRAY_SIZE(sp810->timerclken)))
+ return NULL;
+
+ return sp810->timerclken[clkspec->args[0]].clk;
+diff --git a/drivers/crypto/vmx/aes_ctr.c b/drivers/crypto/vmx/aes_ctr.c
+index 7adae42a7b79..ed3838781b4c 100644
+--- a/drivers/crypto/vmx/aes_ctr.c
++++ b/drivers/crypto/vmx/aes_ctr.c
+@@ -113,6 +113,7 @@ static int p8_aes_ctr_crypt(struct blkcipher_desc *desc,
+ struct scatterlist *src, unsigned int nbytes)
+ {
+ int ret;
++ u64 inc;
+ struct blkcipher_walk walk;
+ struct p8_aes_ctr_ctx *ctx =
+ crypto_tfm_ctx(crypto_blkcipher_tfm(desc->tfm));
+@@ -140,7 +141,12 @@ static int p8_aes_ctr_crypt(struct blkcipher_desc *desc,
+ walk.iv);
+ pagefault_enable();
+
+- crypto_inc(walk.iv, AES_BLOCK_SIZE);
++ /* We need to update IV mostly for last bytes/round */
++ inc = (nbytes & AES_BLOCK_MASK) / AES_BLOCK_SIZE;
++ if (inc > 0)
++ while (inc--)
++ crypto_inc(walk.iv, AES_BLOCK_SIZE);
++
+ nbytes &= AES_BLOCK_SIZE - 1;
+ ret = blkcipher_walk_done(desc, &walk, nbytes);
+ }
+diff --git a/drivers/crypto/vmx/aesp8-ppc.pl b/drivers/crypto/vmx/aesp8-ppc.pl
+index 6c5c20c6108e..228053921b3f 100644
+--- a/drivers/crypto/vmx/aesp8-ppc.pl
++++ b/drivers/crypto/vmx/aesp8-ppc.pl
+@@ -1437,28 +1437,28 @@ Load_ctr32_enc_key:
+ ?vperm v31,v31,$out0,$keyperm
+ lvx v25,$x10,$key_ # pre-load round[2]
+
+- vadduwm $two,$one,$one
++ vadduqm $two,$one,$one
+ subi $inp,$inp,15 # undo "caller"
+ $SHL $len,$len,4
+
+- vadduwm $out1,$ivec,$one # counter values ...
+- vadduwm $out2,$ivec,$two
++ vadduqm $out1,$ivec,$one # counter values ...
++ vadduqm $out2,$ivec,$two
+ vxor $out0,$ivec,$rndkey0 # ... xored with rndkey[0]
+ le?li $idx,8
+- vadduwm $out3,$out1,$two
++ vadduqm $out3,$out1,$two
+ vxor $out1,$out1,$rndkey0
+ le?lvsl $inpperm,0,$idx
+- vadduwm $out4,$out2,$two
++ vadduqm $out4,$out2,$two
+ vxor $out2,$out2,$rndkey0
+ le?vspltisb $tmp,0x0f
+- vadduwm $out5,$out3,$two
++ vadduqm $out5,$out3,$two
+ vxor $out3,$out3,$rndkey0
+ le?vxor $inpperm,$inpperm,$tmp # transform for lvx_u/stvx_u
+- vadduwm $out6,$out4,$two
++ vadduqm $out6,$out4,$two
+ vxor $out4,$out4,$rndkey0
+- vadduwm $out7,$out5,$two
++ vadduqm $out7,$out5,$two
+ vxor $out5,$out5,$rndkey0
+- vadduwm $ivec,$out6,$two # next counter value
++ vadduqm $ivec,$out6,$two # next counter value
+ vxor $out6,$out6,$rndkey0
+ vxor $out7,$out7,$rndkey0
+
+@@ -1594,27 +1594,27 @@ Loop_ctr32_enc8x_middle:
+
+ vcipherlast $in0,$out0,$in0
+ vcipherlast $in1,$out1,$in1
+- vadduwm $out1,$ivec,$one # counter values ...
++ vadduqm $out1,$ivec,$one # counter values ...
+ vcipherlast $in2,$out2,$in2
+- vadduwm $out2,$ivec,$two
++ vadduqm $out2,$ivec,$two
+ vxor $out0,$ivec,$rndkey0 # ... xored with rndkey[0]
+ vcipherlast $in3,$out3,$in3
+- vadduwm $out3,$out1,$two
++ vadduqm $out3,$out1,$two
+ vxor $out1,$out1,$rndkey0
+ vcipherlast $in4,$out4,$in4
+- vadduwm $out4,$out2,$two
++ vadduqm $out4,$out2,$two
+ vxor $out2,$out2,$rndkey0
+ vcipherlast $in5,$out5,$in5
+- vadduwm $out5,$out3,$two
++ vadduqm $out5,$out3,$two
+ vxor $out3,$out3,$rndkey0
+ vcipherlast $in6,$out6,$in6
+- vadduwm $out6,$out4,$two
++ vadduqm $out6,$out4,$two
+ vxor $out4,$out4,$rndkey0
+ vcipherlast $in7,$out7,$in7
+- vadduwm $out7,$out5,$two
++ vadduqm $out7,$out5,$two
+ vxor $out5,$out5,$rndkey0
+ le?vperm $in0,$in0,$in0,$inpperm
+- vadduwm $ivec,$out6,$two # next counter value
++ vadduqm $ivec,$out6,$two # next counter value
+ vxor $out6,$out6,$rndkey0
+ le?vperm $in1,$in1,$in1,$inpperm
+ vxor $out7,$out7,$rndkey0
+diff --git a/drivers/crypto/vmx/ghashp8-ppc.pl b/drivers/crypto/vmx/ghashp8-ppc.pl
+index 0a6f899839dd..d8429cb71f02 100644
+--- a/drivers/crypto/vmx/ghashp8-ppc.pl
++++ b/drivers/crypto/vmx/ghashp8-ppc.pl
+@@ -61,6 +61,12 @@ $code=<<___;
+ mtspr 256,r0
+ li r10,0x30
+ lvx_u $H,0,r4 # load H
++ le?xor r7,r7,r7
++ le?addi r7,r7,0x8 # need a vperm start with 08
++ le?lvsr 5,0,r7
++ le?vspltisb 6,0x0f
++ le?vxor 5,5,6 # set a b-endian mask
++ le?vperm $H,$H,$H,5
+
+ vspltisb $xC2,-16 # 0xf0
+ vspltisb $t0,1 # one
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+index 27df17a0e620..89c3dd62ba21 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c
+@@ -75,6 +75,11 @@ void amdgpu_connector_hotplug(struct drm_connector *connector)
+ if (!amdgpu_display_hpd_sense(adev, amdgpu_connector->hpd.hpd)) {
+ drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
+ } else if (amdgpu_atombios_dp_needs_link_train(amdgpu_connector)) {
++ /* Don't try to start link training before we
++ * have the dpcd */
++ if (!amdgpu_atombios_dp_get_dpcd(amdgpu_connector))
++ return;
++
+ /* set it to OFF so that drm_helper_connector_dpms()
+ * won't return immediately since the current state
+ * is ON at this point.
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
+index db5422e65ec5..a8207e5a8549 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
+@@ -97,18 +97,12 @@ int amdgpu_ih_ring_init(struct amdgpu_device *adev, unsigned ring_size,
+ /* add 8 bytes for the rptr/wptr shadows and
+ * add them to the end of the ring allocation.
+ */
+- adev->irq.ih.ring = kzalloc(adev->irq.ih.ring_size + 8, GFP_KERNEL);
++ adev->irq.ih.ring = pci_alloc_consistent(adev->pdev,
++ adev->irq.ih.ring_size + 8,
++ &adev->irq.ih.rb_dma_addr);
+ if (adev->irq.ih.ring == NULL)
+ return -ENOMEM;
+- adev->irq.ih.rb_dma_addr = pci_map_single(adev->pdev,
+- (void *)adev->irq.ih.ring,
+- adev->irq.ih.ring_size,
+- PCI_DMA_BIDIRECTIONAL);
+- if (pci_dma_mapping_error(adev->pdev, adev->irq.ih.rb_dma_addr)) {
+- dev_err(&adev->pdev->dev, "Failed to DMA MAP the IH RB page\n");
+- kfree((void *)adev->irq.ih.ring);
+- return -ENOMEM;
+- }
++ memset((void *)adev->irq.ih.ring, 0, adev->irq.ih.ring_size + 8);
+ adev->irq.ih.wptr_offs = (adev->irq.ih.ring_size / 4) + 0;
+ adev->irq.ih.rptr_offs = (adev->irq.ih.ring_size / 4) + 1;
+ }
+@@ -148,9 +142,9 @@ void amdgpu_ih_ring_fini(struct amdgpu_device *adev)
+ /* add 8 bytes for the rptr/wptr shadows and
+ * add them to the end of the ring allocation.
+ */
+- pci_unmap_single(adev->pdev, adev->irq.ih.rb_dma_addr,
+- adev->irq.ih.ring_size + 8, PCI_DMA_BIDIRECTIONAL);
+- kfree((void *)adev->irq.ih.ring);
++ pci_free_consistent(adev->pdev, adev->irq.ih.ring_size + 8,
++ (void *)adev->irq.ih.ring,
++ adev->irq.ih.rb_dma_addr);
+ adev->irq.ih.ring = NULL;
+ }
+ } else {
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+index f5c22556ec2c..2abc661845b6 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+@@ -374,7 +374,8 @@ static int amdgpu_uvd_cs_msg_decode(uint32_t *msg, unsigned buf_sizes[])
+ unsigned height_in_mb = ALIGN(height / 16, 2);
+ unsigned fs_in_mb = width_in_mb * height_in_mb;
+
+- unsigned image_size, tmp, min_dpb_size, num_dpb_buffer, min_ctx_size;
++ unsigned image_size, tmp, min_dpb_size, num_dpb_buffer;
++ unsigned min_ctx_size = 0;
+
+ image_size = width * height;
+ image_size += image_size / 2;
+diff --git a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
+index 9ba0a7d5bc8e..92b6acadfc52 100644
+--- a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
++++ b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
+@@ -139,7 +139,8 @@ amdgpu_atombios_dp_aux_transfer(struct drm_dp_aux *aux, struct drm_dp_aux_msg *m
+
+ tx_buf[0] = msg->address & 0xff;
+ tx_buf[1] = msg->address >> 8;
+- tx_buf[2] = msg->request << 4;
++ tx_buf[2] = (msg->request << 4) |
++ ((msg->address >> 16) & 0xf);
+ tx_buf[3] = msg->size ? (msg->size - 1) : 0;
+
+ switch (msg->request & ~DP_AUX_I2C_MOT) {
+diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
+index e70a26f587a0..e774a437dd65 100644
+--- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
+@@ -1331,7 +1331,7 @@ static void dce_v10_0_program_watermarks(struct amdgpu_device *adev,
+ tmp = REG_SET_FIELD(wm_mask, DPG_WATERMARK_MASK_CONTROL, URGENCY_WATERMARK_MASK, 2);
+ WREG32(mmDPG_WATERMARK_MASK_CONTROL + amdgpu_crtc->crtc_offset, tmp);
+ tmp = RREG32(mmDPG_PIPE_URGENCY_CONTROL + amdgpu_crtc->crtc_offset);
+- tmp = REG_SET_FIELD(tmp, DPG_PIPE_URGENCY_CONTROL, URGENCY_LOW_WATERMARK, latency_watermark_a);
++ tmp = REG_SET_FIELD(tmp, DPG_PIPE_URGENCY_CONTROL, URGENCY_LOW_WATERMARK, latency_watermark_b);
+ tmp = REG_SET_FIELD(tmp, DPG_PIPE_URGENCY_CONTROL, URGENCY_HIGH_WATERMARK, line_time);
+ WREG32(mmDPG_PIPE_URGENCY_CONTROL + amdgpu_crtc->crtc_offset, tmp);
+ /* restore original selection */
+diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
+index dcb402ee048a..c4a21a7afd68 100644
+--- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
+@@ -1329,7 +1329,7 @@ static void dce_v11_0_program_watermarks(struct amdgpu_device *adev,
+ tmp = REG_SET_FIELD(wm_mask, DPG_WATERMARK_MASK_CONTROL, URGENCY_WATERMARK_MASK, 2);
+ WREG32(mmDPG_WATERMARK_MASK_CONTROL + amdgpu_crtc->crtc_offset, tmp);
+ tmp = RREG32(mmDPG_PIPE_URGENCY_CONTROL + amdgpu_crtc->crtc_offset);
+- tmp = REG_SET_FIELD(tmp, DPG_PIPE_URGENCY_CONTROL, URGENCY_LOW_WATERMARK, latency_watermark_a);
++ tmp = REG_SET_FIELD(tmp, DPG_PIPE_URGENCY_CONTROL, URGENCY_LOW_WATERMARK, latency_watermark_b);
+ tmp = REG_SET_FIELD(tmp, DPG_PIPE_URGENCY_CONTROL, URGENCY_HIGH_WATERMARK, line_time);
+ WREG32(mmDPG_PIPE_URGENCY_CONTROL + amdgpu_crtc->crtc_offset, tmp);
+ /* restore original selection */
+diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
+index 884b4f9b81c4..603146ec9868 100644
+--- a/drivers/gpu/drm/i915/i915_drv.c
++++ b/drivers/gpu/drm/i915/i915_drv.c
+@@ -683,15 +683,18 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation)
+
+ pci_disable_device(drm_dev->pdev);
+ /*
+- * During hibernation on some GEN4 platforms the BIOS may try to access
++ * During hibernation on some platforms the BIOS may try to access
+ * the device even though it's already in D3 and hang the machine. So
+ * leave the device in D0 on those platforms and hope the BIOS will
+- * power down the device properly. Platforms where this was seen:
+- * Lenovo Thinkpad X301, X61s
++ * power down the device properly. The issue was seen on multiple old
++ * GENs with different BIOS vendors, so having an explicit blacklist
++ * is impractical; apply the workaround on everything pre GEN6. The
++ * platforms where the issue was seen:
++ * Lenovo Thinkpad X301, X61s, X60, T60, X41
++ * Fujitsu FSC S7110
++ * Acer Aspire 1830T
+ */
+- if (!(hibernation &&
+- drm_dev->pdev->subsystem_vendor == PCI_VENDOR_ID_LENOVO &&
+- INTEL_INFO(dev_priv)->gen == 4))
++ if (!(hibernation && INTEL_INFO(dev_priv)->gen < 6))
+ pci_set_power_state(drm_dev->pdev, PCI_D3hot);
+
+ return 0;
+diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
+index fd1de451c8c6..e1df8feb05be 100644
+--- a/drivers/gpu/drm/i915/i915_drv.h
++++ b/drivers/gpu/drm/i915/i915_drv.h
+@@ -3303,13 +3303,13 @@ int intel_freq_opcode(struct drm_i915_private *dev_priv, int val);
+ #define I915_READ64(reg) dev_priv->uncore.funcs.mmio_readq(dev_priv, (reg), true)
+
+ #define I915_READ64_2x32(lower_reg, upper_reg) ({ \
+- u32 upper, lower, tmp; \
+- tmp = I915_READ(upper_reg); \
++ u32 upper, lower, old_upper, loop = 0; \
++ upper = I915_READ(upper_reg); \
+ do { \
+- upper = tmp; \
++ old_upper = upper; \
+ lower = I915_READ(lower_reg); \
+- tmp = I915_READ(upper_reg); \
+- } while (upper != tmp); \
++ upper = I915_READ(upper_reg); \
++ } while (upper != old_upper && loop++ < 2); \
+ (u64)upper << 32 | lower; })
+
+ #define POSTING_READ(reg) (void)I915_READ_NOTRACE(reg)
+diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+index a7fa14516cda..5e6b4a29e503 100644
+--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
++++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+@@ -1024,6 +1024,7 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
+ u32 old_read = obj->base.read_domains;
+ u32 old_write = obj->base.write_domain;
+
++ obj->dirty = 1; /* be paranoid */
+ obj->base.write_domain = obj->base.pending_write_domain;
+ if (obj->base.write_domain == 0)
+ obj->base.pending_read_domains |= obj->base.read_domains;
+@@ -1031,7 +1032,6 @@ i915_gem_execbuffer_move_to_active(struct list_head *vmas,
+
+ i915_vma_move_to_active(vma, ring);
+ if (obj->base.write_domain) {
+- obj->dirty = 1;
+ i915_gem_request_assign(&obj->last_write_req, req);
+
+ intel_fb_obj_invalidate(obj, ring, ORIGIN_CS);
+diff --git a/drivers/gpu/drm/i915/intel_csr.c b/drivers/gpu/drm/i915/intel_csr.c
+index bcb41e61877d..fb842d6e343f 100644
+--- a/drivers/gpu/drm/i915/intel_csr.c
++++ b/drivers/gpu/drm/i915/intel_csr.c
+@@ -350,7 +350,7 @@ static void finish_csr_load(const struct firmware *fw, void *context)
+ }
+ csr->mmio_count = dmc_header->mmio_count;
+ for (i = 0; i < dmc_header->mmio_count; i++) {
+- if (dmc_header->mmioaddr[i] < CSR_MMIO_START_RANGE &&
++ if (dmc_header->mmioaddr[i] < CSR_MMIO_START_RANGE ||
+ dmc_header->mmioaddr[i] > CSR_MMIO_END_RANGE) {
+ DRM_ERROR(" Firmware has wrong mmio address 0x%x\n",
+ dmc_header->mmioaddr[i]);
+diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
+index 87476ff181dd..107c6c0519fd 100644
+--- a/drivers/gpu/drm/i915/intel_display.c
++++ b/drivers/gpu/drm/i915/intel_display.c
+@@ -14665,6 +14665,24 @@ void intel_modeset_init(struct drm_device *dev)
+ if (INTEL_INFO(dev)->num_pipes == 0)
+ return;
+
++ /*
++ * There may be no VBT; and if the BIOS enabled SSC we can
++ * just keep using it to avoid unnecessary flicker. Whereas if the
++ * BIOS isn't using it, don't assume it will work even if the VBT
++ * indicates as much.
++ */
++ if (HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev)) {
++ bool bios_lvds_use_ssc = !!(I915_READ(PCH_DREF_CONTROL) &
++ DREF_SSC1_ENABLE);
++
++ if (dev_priv->vbt.lvds_use_ssc != bios_lvds_use_ssc) {
++ DRM_DEBUG_KMS("SSC %sabled by BIOS, overriding VBT which says %sabled\n",
++ bios_lvds_use_ssc ? "en" : "dis",
++ dev_priv->vbt.lvds_use_ssc ? "en" : "dis");
++ dev_priv->vbt.lvds_use_ssc = bios_lvds_use_ssc;
++ }
++ }
++
+ intel_init_display(dev);
+ intel_init_audio(dev);
+
+@@ -15160,7 +15178,6 @@ void intel_modeset_setup_hw_state(struct drm_device *dev,
+
+ void intel_modeset_gem_init(struct drm_device *dev)
+ {
+- struct drm_i915_private *dev_priv = dev->dev_private;
+ struct drm_crtc *c;
+ struct drm_i915_gem_object *obj;
+ int ret;
+@@ -15169,16 +15186,6 @@ void intel_modeset_gem_init(struct drm_device *dev)
+ intel_init_gt_powersave(dev);
+ mutex_unlock(&dev->struct_mutex);
+
+- /*
+- * There may be no VBT; and if the BIOS enabled SSC we can
+- * just keep using it to avoid unnecessary flicker. Whereas if the
+- * BIOS isn't using it, don't assume it will work even if the VBT
+- * indicates as much.
+- */
+- if (HAS_PCH_IBX(dev) || HAS_PCH_CPT(dev))
+- dev_priv->vbt.lvds_use_ssc = !!(I915_READ(PCH_DREF_CONTROL) &
+- DREF_SSC1_ENABLE);
+-
+ intel_modeset_init_hw(dev);
+
+ intel_setup_overlay(dev);
+diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
+index 1df0e1fe235f..bd8f8863eb0e 100644
+--- a/drivers/gpu/drm/i915/intel_dp.c
++++ b/drivers/gpu/drm/i915/intel_dp.c
+@@ -4987,9 +4987,12 @@ intel_dp_hpd_pulse(struct intel_digital_port *intel_dig_port, bool long_hpd)
+
+ intel_dp_probe_oui(intel_dp);
+
+- if (!intel_dp_probe_mst(intel_dp))
++ if (!intel_dp_probe_mst(intel_dp)) {
++ drm_modeset_lock(&dev->mode_config.connection_mutex, NULL);
++ intel_dp_check_link_status(intel_dp);
++ drm_modeset_unlock(&dev->mode_config.connection_mutex);
+ goto mst_fail;
+-
++ }
+ } else {
+ if (intel_dp->is_mst) {
+ if (intel_dp_check_mst_status(intel_dp) == -EINVAL)
+@@ -4997,10 +5000,6 @@ intel_dp_hpd_pulse(struct intel_digital_port *intel_dig_port, bool long_hpd)
+ }
+
+ if (!intel_dp->is_mst) {
+- /*
+- * we'll check the link status via the normal hot plug path later -
+- * but for short hpds we should check it now
+- */
+ drm_modeset_lock(&dev->mode_config.connection_mutex, NULL);
+ intel_dp_check_link_status(intel_dp);
+ drm_modeset_unlock(&dev->mode_config.connection_mutex);
+diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
+index b5a5558ecd63..68b25dd525f0 100644
+--- a/drivers/gpu/drm/i915/intel_dsi.c
++++ b/drivers/gpu/drm/i915/intel_dsi.c
+@@ -1036,11 +1036,7 @@ void intel_dsi_init(struct drm_device *dev)
+ intel_connector->unregister = intel_connector_unregister;
+
+ /* Pipe A maps to MIPI DSI port A, pipe B maps to MIPI DSI port C */
+- if (dev_priv->vbt.dsi.config->dual_link) {
+- /* XXX: does dual link work on either pipe? */
+- intel_encoder->crtc_mask = (1 << PIPE_A);
+- intel_dsi->ports = ((1 << PORT_A) | (1 << PORT_C));
+- } else if (dev_priv->vbt.dsi.port == DVO_PORT_MIPIA) {
++ if (dev_priv->vbt.dsi.port == DVO_PORT_MIPIA) {
+ intel_encoder->crtc_mask = (1 << PIPE_A);
+ intel_dsi->ports = (1 << PORT_A);
+ } else if (dev_priv->vbt.dsi.port == DVO_PORT_MIPIC) {
+@@ -1048,6 +1044,9 @@ void intel_dsi_init(struct drm_device *dev)
+ intel_dsi->ports = (1 << PORT_C);
+ }
+
++ if (dev_priv->vbt.dsi.config->dual_link)
++ intel_dsi->ports = ((1 << PORT_A) | (1 << PORT_C));
++
+ /* Create a DSI host (and a device) for each port. */
+ for_each_dsi_port(port, intel_dsi->ports) {
+ struct intel_dsi_host *host;
+diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
+index a8dbb3ef4e3c..7c6225c84ba6 100644
+--- a/drivers/gpu/drm/qxl/qxl_display.c
++++ b/drivers/gpu/drm/qxl/qxl_display.c
+@@ -160,9 +160,35 @@ static int qxl_add_monitors_config_modes(struct drm_connector *connector,
+ *pwidth = head->width;
+ *pheight = head->height;
+ drm_mode_probed_add(connector, mode);
++ /* remember the last custom size for mode validation */
++ qdev->monitors_config_width = mode->hdisplay;
++ qdev->monitors_config_height = mode->vdisplay;
+ return 1;
+ }
+
++static struct mode_size {
++ int w;
++ int h;
++} common_modes[] = {
++ { 640, 480},
++ { 720, 480},
++ { 800, 600},
++ { 848, 480},
++ {1024, 768},
++ {1152, 768},
++ {1280, 720},
++ {1280, 800},
++ {1280, 854},
++ {1280, 960},
++ {1280, 1024},
++ {1440, 900},
++ {1400, 1050},
++ {1680, 1050},
++ {1600, 1200},
++ {1920, 1080},
++ {1920, 1200}
++};
++
+ static int qxl_add_common_modes(struct drm_connector *connector,
+ unsigned pwidth,
+ unsigned pheight)
+@@ -170,29 +196,6 @@ static int qxl_add_common_modes(struct drm_connector *connector,
+ struct drm_device *dev = connector->dev;
+ struct drm_display_mode *mode = NULL;
+ int i;
+- struct mode_size {
+- int w;
+- int h;
+- } common_modes[] = {
+- { 640, 480},
+- { 720, 480},
+- { 800, 600},
+- { 848, 480},
+- {1024, 768},
+- {1152, 768},
+- {1280, 720},
+- {1280, 800},
+- {1280, 854},
+- {1280, 960},
+- {1280, 1024},
+- {1440, 900},
+- {1400, 1050},
+- {1680, 1050},
+- {1600, 1200},
+- {1920, 1080},
+- {1920, 1200}
+- };
+-
+ for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
+ mode = drm_cvt_mode(dev, common_modes[i].w, common_modes[i].h,
+ 60, false, false, false);
+@@ -823,11 +826,22 @@ static int qxl_conn_get_modes(struct drm_connector *connector)
+ static int qxl_conn_mode_valid(struct drm_connector *connector,
+ struct drm_display_mode *mode)
+ {
++ struct drm_device *ddev = connector->dev;
++ struct qxl_device *qdev = ddev->dev_private;
++ int i;
++
+ /* TODO: is this called for user defined modes? (xrandr --add-mode)
+ * TODO: check that the mode fits in the framebuffer */
+- DRM_DEBUG("%s: %dx%d status=%d\n", mode->name, mode->hdisplay,
+- mode->vdisplay, mode->status);
+- return MODE_OK;
++
++ if(qdev->monitors_config_width == mode->hdisplay &&
++ qdev->monitors_config_height == mode->vdisplay)
++ return MODE_OK;
++
++ for (i = 0; i < ARRAY_SIZE(common_modes); i++) {
++ if (common_modes[i].w == mode->hdisplay && common_modes[i].h == mode->vdisplay)
++ return MODE_OK;
++ }
++ return MODE_BAD;
+ }
+
+ static struct drm_encoder *qxl_best_encoder(struct drm_connector *connector)
+diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
+index d8549690801d..01a86948eb8c 100644
+--- a/drivers/gpu/drm/qxl/qxl_drv.h
++++ b/drivers/gpu/drm/qxl/qxl_drv.h
+@@ -325,6 +325,8 @@ struct qxl_device {
+ struct work_struct fb_work;
+
+ struct drm_property *hotplug_mode_update_property;
++ int monitors_config_width;
++ int monitors_config_height;
+ };
+
+ /* forward declaration for QXL_INFO_IO */
+diff --git a/drivers/gpu/drm/radeon/atombios_dp.c b/drivers/gpu/drm/radeon/atombios_dp.c
+index f81e0d7d0232..9cd49c584263 100644
+--- a/drivers/gpu/drm/radeon/atombios_dp.c
++++ b/drivers/gpu/drm/radeon/atombios_dp.c
+@@ -171,8 +171,9 @@ radeon_dp_aux_transfer_atom(struct drm_dp_aux *aux, struct drm_dp_aux_msg *msg)
+ return -E2BIG;
+
+ tx_buf[0] = msg->address & 0xff;
+- tx_buf[1] = msg->address >> 8;
+- tx_buf[2] = msg->request << 4;
++ tx_buf[1] = (msg->address >> 8) & 0xff;
++ tx_buf[2] = (msg->request << 4) |
++ ((msg->address >> 16) & 0xf);
+ tx_buf[3] = msg->size ? (msg->size - 1) : 0;
+
+ switch (msg->request & ~DP_AUX_I2C_MOT) {
+diff --git a/drivers/gpu/drm/radeon/radeon_audio.c b/drivers/gpu/drm/radeon/radeon_audio.c
+index fbc8d88d6e5d..2c02e99b5f95 100644
+--- a/drivers/gpu/drm/radeon/radeon_audio.c
++++ b/drivers/gpu/drm/radeon/radeon_audio.c
+@@ -522,13 +522,15 @@ static int radeon_audio_set_avi_packet(struct drm_encoder *encoder,
+ return err;
+ }
+
+- if (drm_rgb_quant_range_selectable(radeon_connector_edid(connector))) {
+- if (radeon_encoder->output_csc == RADEON_OUTPUT_CSC_TVRGB)
+- frame.quantization_range = HDMI_QUANTIZATION_RANGE_LIMITED;
+- else
+- frame.quantization_range = HDMI_QUANTIZATION_RANGE_FULL;
+- } else {
+- frame.quantization_range = HDMI_QUANTIZATION_RANGE_DEFAULT;
++ if (radeon_encoder->output_csc != RADEON_OUTPUT_CSC_BYPASS) {
++ if (drm_rgb_quant_range_selectable(radeon_connector_edid(connector))) {
++ if (radeon_encoder->output_csc == RADEON_OUTPUT_CSC_TVRGB)
++ frame.quantization_range = HDMI_QUANTIZATION_RANGE_LIMITED;
++ else
++ frame.quantization_range = HDMI_QUANTIZATION_RANGE_FULL;
++ } else {
++ frame.quantization_range = HDMI_QUANTIZATION_RANGE_DEFAULT;
++ }
+ }
+
+ err = hdmi_avi_infoframe_pack(&frame, buffer, sizeof(buffer));
+diff --git a/drivers/gpu/drm/radeon/radeon_connectors.c b/drivers/gpu/drm/radeon/radeon_connectors.c
+index 94b21ae70ef7..5a2cafb4f1bc 100644
+--- a/drivers/gpu/drm/radeon/radeon_connectors.c
++++ b/drivers/gpu/drm/radeon/radeon_connectors.c
+@@ -95,6 +95,11 @@ void radeon_connector_hotplug(struct drm_connector *connector)
+ if (!radeon_hpd_sense(rdev, radeon_connector->hpd.hpd)) {
+ drm_helper_connector_dpms(connector, DRM_MODE_DPMS_OFF);
+ } else if (radeon_dp_needs_link_train(radeon_connector)) {
++ /* Don't try to start link training before we
++ * have the dpcd */
++ if (!radeon_dp_getdpcd(radeon_connector))
++ return;
++
+ /* set it to OFF so that drm_helper_connector_dpms()
+ * won't return immediately since the current state
+ * is ON at this point.
+diff --git a/drivers/gpu/drm/radeon/radeon_dp_auxch.c b/drivers/gpu/drm/radeon/radeon_dp_auxch.c
+index fcbd60bb0349..3b0c229d7dcd 100644
+--- a/drivers/gpu/drm/radeon/radeon_dp_auxch.c
++++ b/drivers/gpu/drm/radeon/radeon_dp_auxch.c
+@@ -116,8 +116,8 @@ radeon_dp_aux_transfer_native(struct drm_dp_aux *aux, struct drm_dp_aux_msg *msg
+ AUX_SW_WR_BYTES(bytes));
+
+ /* write the data header into the registers */
+- /* request, addres, msg size */
+- byte = (msg->request << 4);
++ /* request, address, msg size */
++ byte = (msg->request << 4) | ((msg->address >> 16) & 0xf);
+ WREG32(AUX_SW_DATA + aux_offset[instance],
+ AUX_SW_DATA_MASK(byte) | AUX_SW_AUTOINCREMENT_DISABLE);
+
+diff --git a/drivers/hid/hid-cp2112.c b/drivers/hid/hid-cp2112.c
+index a2dbbbe0d8d7..39bf74793b8b 100644
+--- a/drivers/hid/hid-cp2112.c
++++ b/drivers/hid/hid-cp2112.c
+@@ -537,7 +537,7 @@ static int cp2112_xfer(struct i2c_adapter *adap, u16 addr,
+ struct cp2112_device *dev = (struct cp2112_device *)adap->algo_data;
+ struct hid_device *hdev = dev->hdev;
+ u8 buf[64];
+- __be16 word;
++ __le16 word;
+ ssize_t count;
+ size_t read_length = 0;
+ unsigned int retries;
+@@ -554,7 +554,7 @@ static int cp2112_xfer(struct i2c_adapter *adap, u16 addr,
+ if (I2C_SMBUS_READ == read_write)
+ count = cp2112_read_req(buf, addr, read_length);
+ else
+- count = cp2112_write_req(buf, addr, data->byte, NULL,
++ count = cp2112_write_req(buf, addr, command, NULL,
+ 0);
+ break;
+ case I2C_SMBUS_BYTE_DATA:
+@@ -569,7 +569,7 @@ static int cp2112_xfer(struct i2c_adapter *adap, u16 addr,
+ break;
+ case I2C_SMBUS_WORD_DATA:
+ read_length = 2;
+- word = cpu_to_be16(data->word);
++ word = cpu_to_le16(data->word);
+
+ if (I2C_SMBUS_READ == read_write)
+ count = cp2112_write_read_req(buf, addr, read_length,
+@@ -582,7 +582,7 @@ static int cp2112_xfer(struct i2c_adapter *adap, u16 addr,
+ size = I2C_SMBUS_WORD_DATA;
+ read_write = I2C_SMBUS_READ;
+ read_length = 2;
+- word = cpu_to_be16(data->word);
++ word = cpu_to_le16(data->word);
+
+ count = cp2112_write_read_req(buf, addr, read_length, command,
+ (u8 *)&word, 2);
+@@ -675,7 +675,7 @@ static int cp2112_xfer(struct i2c_adapter *adap, u16 addr,
+ data->byte = buf[0];
+ break;
+ case I2C_SMBUS_WORD_DATA:
+- data->word = be16_to_cpup((__be16 *)buf);
++ data->word = le16_to_cpup((__le16 *)buf);
+ break;
+ case I2C_SMBUS_BLOCK_DATA:
+ if (read_length > I2C_SMBUS_BLOCK_MAX) {
+diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
+index bfbe1bedda7f..eab5bd6a2442 100644
+--- a/drivers/hid/usbhid/hid-core.c
++++ b/drivers/hid/usbhid/hid-core.c
+@@ -164,7 +164,7 @@ static void hid_io_error(struct hid_device *hid)
+ if (time_after(jiffies, usbhid->stop_retry)) {
+
+ /* Retries failed, so do a port reset unless we lack bandwidth*/
+- if (test_bit(HID_NO_BANDWIDTH, &usbhid->iofl)
++ if (!test_bit(HID_NO_BANDWIDTH, &usbhid->iofl)
+ && !test_and_set_bit(HID_RESET_PENDING, &usbhid->iofl)) {
+
+ schedule_work(&usbhid->reset_work);
+diff --git a/drivers/iio/accel/mma8452.c b/drivers/iio/accel/mma8452.c
+index 13ea1ea23328..bda69a4355fa 100644
+--- a/drivers/iio/accel/mma8452.c
++++ b/drivers/iio/accel/mma8452.c
+@@ -229,7 +229,7 @@ static int mma8452_get_hp_filter_index(struct mma8452_data *data,
+ int i = mma8452_get_odr_index(data);
+
+ return mma8452_get_int_plus_micros_index(mma8452_hp_filter_cutoff[i],
+- ARRAY_SIZE(mma8452_scales[0]), val, val2);
++ ARRAY_SIZE(mma8452_hp_filter_cutoff[0]), val, val2);
+ }
+
+ static int mma8452_read_hp_filter(struct mma8452_data *data, int *hz, int *uHz)
+diff --git a/drivers/iio/gyro/Kconfig b/drivers/iio/gyro/Kconfig
+index b3d0e94f72eb..8d2439345673 100644
+--- a/drivers/iio/gyro/Kconfig
++++ b/drivers/iio/gyro/Kconfig
+@@ -53,7 +53,8 @@ config ADXRS450
+ config BMG160
+ tristate "BOSCH BMG160 Gyro Sensor"
+ depends on I2C
+- select IIO_TRIGGERED_BUFFER if IIO_BUFFER
++ select IIO_BUFFER
++ select IIO_TRIGGERED_BUFFER
+ help
+ Say yes here to build support for Bosch BMG160 Tri-axis Gyro Sensor
+ driver. This driver also supports BMI055 gyroscope.
+diff --git a/drivers/iio/imu/adis16400_core.c b/drivers/iio/imu/adis16400_core.c
+index 2fd68f2219a7..d42e4fe2c7ed 100644
+--- a/drivers/iio/imu/adis16400_core.c
++++ b/drivers/iio/imu/adis16400_core.c
+@@ -780,7 +780,7 @@ static struct adis16400_chip_info adis16400_chips[] = {
+ .flags = ADIS16400_HAS_PROD_ID |
+ ADIS16400_HAS_SERIAL_NUMBER |
+ ADIS16400_BURST_DIAG_STAT,
+- .gyro_scale_micro = IIO_DEGREE_TO_RAD(10000), /* 0.01 deg/s */
++ .gyro_scale_micro = IIO_DEGREE_TO_RAD(40000), /* 0.04 deg/s */
+ .accel_scale_micro = IIO_G_TO_M_S_2(833), /* 1/1200 g */
+ .temp_scale_nano = 73860000, /* 0.07386 C */
+ .temp_offset = 31000000 / 73860, /* 31 C = 0x00 */
+diff --git a/drivers/iio/imu/adis16480.c b/drivers/iio/imu/adis16480.c
+index 989605dd6f78..b94bfd3f595b 100644
+--- a/drivers/iio/imu/adis16480.c
++++ b/drivers/iio/imu/adis16480.c
+@@ -110,6 +110,10 @@
+ struct adis16480_chip_info {
+ unsigned int num_channels;
+ const struct iio_chan_spec *channels;
++ unsigned int gyro_max_val;
++ unsigned int gyro_max_scale;
++ unsigned int accel_max_val;
++ unsigned int accel_max_scale;
+ };
+
+ struct adis16480 {
+@@ -497,19 +501,21 @@ static int adis16480_set_filter_freq(struct iio_dev *indio_dev,
+ static int adis16480_read_raw(struct iio_dev *indio_dev,
+ const struct iio_chan_spec *chan, int *val, int *val2, long info)
+ {
++ struct adis16480 *st = iio_priv(indio_dev);
++
+ switch (info) {
+ case IIO_CHAN_INFO_RAW:
+ return adis_single_conversion(indio_dev, chan, 0, val);
+ case IIO_CHAN_INFO_SCALE:
+ switch (chan->type) {
+ case IIO_ANGL_VEL:
+- *val = 0;
+- *val2 = IIO_DEGREE_TO_RAD(20000); /* 0.02 degree/sec */
+- return IIO_VAL_INT_PLUS_MICRO;
++ *val = st->chip_info->gyro_max_scale;
++ *val2 = st->chip_info->gyro_max_val;
++ return IIO_VAL_FRACTIONAL;
+ case IIO_ACCEL:
+- *val = 0;
+- *val2 = IIO_G_TO_M_S_2(800); /* 0.8 mg */
+- return IIO_VAL_INT_PLUS_MICRO;
++ *val = st->chip_info->accel_max_scale;
++ *val2 = st->chip_info->accel_max_val;
++ return IIO_VAL_FRACTIONAL;
+ case IIO_MAGN:
+ *val = 0;
+ *val2 = 100; /* 0.0001 gauss */
+@@ -674,18 +680,39 @@ static const struct adis16480_chip_info adis16480_chip_info[] = {
+ [ADIS16375] = {
+ .channels = adis16485_channels,
+ .num_channels = ARRAY_SIZE(adis16485_channels),
++ /*
++ * storing the value in rad/degree and the scale in degree
+ * gives us the result in rad and better precision than
++ * storing the scale directly in rad.
++ */
++ .gyro_max_val = IIO_RAD_TO_DEGREE(22887),
++ .gyro_max_scale = 300,
++ .accel_max_val = IIO_M_S_2_TO_G(21973),
++ .accel_max_scale = 18,
+ },
+ [ADIS16480] = {
+ .channels = adis16480_channels,
+ .num_channels = ARRAY_SIZE(adis16480_channels),
++ .gyro_max_val = IIO_RAD_TO_DEGREE(22500),
++ .gyro_max_scale = 450,
++ .accel_max_val = IIO_M_S_2_TO_G(12500),
++ .accel_max_scale = 5,
+ },
+ [ADIS16485] = {
+ .channels = adis16485_channels,
+ .num_channels = ARRAY_SIZE(adis16485_channels),
++ .gyro_max_val = IIO_RAD_TO_DEGREE(22500),
++ .gyro_max_scale = 450,
++ .accel_max_val = IIO_M_S_2_TO_G(20000),
++ .accel_max_scale = 5,
+ },
+ [ADIS16488] = {
+ .channels = adis16480_channels,
+ .num_channels = ARRAY_SIZE(adis16480_channels),
++ .gyro_max_val = IIO_RAD_TO_DEGREE(22500),
++ .gyro_max_scale = 450,
++ .accel_max_val = IIO_M_S_2_TO_G(22500),
++ .accel_max_scale = 18,
+ },
+ };
+
+diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
+index 6eee1b044c60..b3fda9ee4174 100644
+--- a/drivers/iio/industrialio-buffer.c
++++ b/drivers/iio/industrialio-buffer.c
+@@ -151,7 +151,7 @@ unsigned int iio_buffer_poll(struct file *filp,
+ struct iio_buffer *rb = indio_dev->buffer;
+
+ if (!indio_dev->info)
+- return -ENODEV;
++ return 0;
+
+ poll_wait(filp, &rb->pollq, wait);
+ if (iio_buffer_ready(indio_dev, rb, rb->watermark, 0))
+diff --git a/drivers/iio/industrialio-event.c b/drivers/iio/industrialio-event.c
+index 894d8137c4cf..52d4fcb0de1d 100644
+--- a/drivers/iio/industrialio-event.c
++++ b/drivers/iio/industrialio-event.c
+@@ -84,7 +84,7 @@ static unsigned int iio_event_poll(struct file *filep,
+ unsigned int events = 0;
+
+ if (!indio_dev->info)
+- return -ENODEV;
++ return events;
+
+ poll_wait(filep, &ev_int->wait, wait);
+
+diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
+index 1fe93cfea7d3..9d0672b58c31 100644
+--- a/drivers/md/dm-cache-target.c
++++ b/drivers/md/dm-cache-target.c
+@@ -1729,6 +1729,8 @@ static void remap_cell_to_origin_clear_discard(struct cache *cache,
+ remap_to_origin(cache, bio);
+ issue(cache, bio);
+ }
++
++ free_prison_cell(cache, cell);
+ }
+
+ static void remap_cell_to_cache_dirty(struct cache *cache, struct dm_bio_prison_cell *cell,
+@@ -1763,6 +1765,8 @@ static void remap_cell_to_cache_dirty(struct cache *cache, struct dm_bio_prison_
+ remap_to_cache(cache, bio, cblock);
+ issue(cache, bio);
+ }
++
++ free_prison_cell(cache, cell);
+ }
+
+ /*----------------------------------------------------------------*/
+diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
+index 8a8b48fa901a..8289804ccd99 100644
+--- a/drivers/md/dm-stats.c
++++ b/drivers/md/dm-stats.c
+@@ -457,12 +457,24 @@ static int dm_stats_list(struct dm_stats *stats, const char *program,
+ list_for_each_entry(s, &stats->list, list_entry) {
+ if (!program || !strcmp(program, s->program_id)) {
+ len = s->end - s->start;
+- DMEMIT("%d: %llu+%llu %llu %s %s\n", s->id,
++ DMEMIT("%d: %llu+%llu %llu %s %s", s->id,
+ (unsigned long long)s->start,
+ (unsigned long long)len,
+ (unsigned long long)s->step,
+ s->program_id,
+ s->aux_data);
++ if (s->stat_flags & STAT_PRECISE_TIMESTAMPS)
++ DMEMIT(" precise_timestamps");
++ if (s->n_histogram_entries) {
++ unsigned i;
++ DMEMIT(" histogram:");
++ for (i = 0; i < s->n_histogram_entries; i++) {
++ if (i)
++ DMEMIT(",");
++ DMEMIT("%llu", s->histogram_boundaries[i]);
++ }
++ }
++ DMEMIT("\n");
+ }
+ }
+ mutex_unlock(&stats->mutex);
+diff --git a/drivers/of/address.c b/drivers/of/address.c
+index 8bfda6ade2c0..384574c3987c 100644
+--- a/drivers/of/address.c
++++ b/drivers/of/address.c
+@@ -845,10 +845,10 @@ struct device_node *of_find_matching_node_by_address(struct device_node *from,
+ struct resource res;
+
+ while (dn) {
+- if (of_address_to_resource(dn, 0, &res))
+- continue;
+- if (res.start == base_address)
++ if (!of_address_to_resource(dn, 0, &res) &&
++ res.start == base_address)
+ return dn;
++
+ dn = of_find_matching_node(dn, matches);
+ }
+
+diff --git a/drivers/pci/access.c b/drivers/pci/access.c
+index d9b64a175990..b965c12168b7 100644
+--- a/drivers/pci/access.c
++++ b/drivers/pci/access.c
+@@ -439,6 +439,56 @@ static const struct pci_vpd_ops pci_vpd_pci22_ops = {
+ .release = pci_vpd_pci22_release,
+ };
+
++static ssize_t pci_vpd_f0_read(struct pci_dev *dev, loff_t pos, size_t count,
++ void *arg)
++{
++ struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
++ ssize_t ret;
++
++ if (!tdev)
++ return -ENODEV;
++
++ ret = pci_read_vpd(tdev, pos, count, arg);
++ pci_dev_put(tdev);
++ return ret;
++}
++
++static ssize_t pci_vpd_f0_write(struct pci_dev *dev, loff_t pos, size_t count,
++ const void *arg)
++{
++ struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
++ ssize_t ret;
++
++ if (!tdev)
++ return -ENODEV;
++
++ ret = pci_write_vpd(tdev, pos, count, arg);
++ pci_dev_put(tdev);
++ return ret;
++}
++
++static const struct pci_vpd_ops pci_vpd_f0_ops = {
++ .read = pci_vpd_f0_read,
++ .write = pci_vpd_f0_write,
++ .release = pci_vpd_pci22_release,
++};
++
++static int pci_vpd_f0_dev_check(struct pci_dev *dev)
++{
++ struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
++ int ret = 0;
++
++ if (!tdev)
++ return -ENODEV;
++ if (!tdev->vpd || !tdev->multifunction ||
++ dev->class != tdev->class || dev->vendor != tdev->vendor ||
++ dev->device != tdev->device)
++ ret = -ENODEV;
++
++ pci_dev_put(tdev);
++ return ret;
++}
++
+ int pci_vpd_pci22_init(struct pci_dev *dev)
+ {
+ struct pci_vpd_pci22 *vpd;
+@@ -447,12 +497,21 @@ int pci_vpd_pci22_init(struct pci_dev *dev)
+ cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
+ if (!cap)
+ return -ENODEV;
++ if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
++ int ret = pci_vpd_f0_dev_check(dev);
++
++ if (ret)
++ return ret;
++ }
+ vpd = kzalloc(sizeof(*vpd), GFP_ATOMIC);
+ if (!vpd)
+ return -ENOMEM;
+
+ vpd->base.len = PCI_VPD_PCI22_SIZE;
+- vpd->base.ops = &pci_vpd_pci22_ops;
++ if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0)
++ vpd->base.ops = &pci_vpd_f0_ops;
++ else
++ vpd->base.ops = &pci_vpd_pci22_ops;
+ mutex_init(&vpd->lock);
+ vpd->cap = cap;
+ vpd->busy = false;
+diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
+index e9fd0e90fa3b..dbd13854f21e 100644
+--- a/drivers/pci/quirks.c
++++ b/drivers/pci/quirks.c
+@@ -1569,6 +1569,18 @@ DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_JMICRON, PCI_DEVICE_ID_JMICRON_JMB3
+
+ #endif
+
++static void quirk_jmicron_async_suspend(struct pci_dev *dev)
++{
++ if (dev->multifunction) {
++ device_disable_async_suspend(&dev->dev);
++ dev_info(&dev->dev, "async suspend disabled to avoid multi-function power-on ordering issue\n");
++ }
++}
++DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_CLASS_STORAGE_IDE, 8, quirk_jmicron_async_suspend);
++DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_CLASS_STORAGE_SATA_AHCI, 0, quirk_jmicron_async_suspend);
++DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_JMICRON, 0x2362, quirk_jmicron_async_suspend);
++DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_JMICRON, 0x236f, quirk_jmicron_async_suspend);
++
+ #ifdef CONFIG_X86_IO_APIC
+ static void quirk_alder_ioapic(struct pci_dev *pdev)
+ {
+@@ -1894,6 +1906,15 @@ static void quirk_netmos(struct pci_dev *dev)
+ DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_NETMOS, PCI_ANY_ID,
+ PCI_CLASS_COMMUNICATION_SERIAL, 8, quirk_netmos);
+
++static void quirk_f0_vpd_link(struct pci_dev *dev)
++{
++ if (!dev->multifunction || !PCI_FUNC(dev->devfn))
++ return;
++ dev->dev_flags |= PCI_DEV_FLAGS_VPD_REF_F0;
++}
++DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
++ PCI_CLASS_NETWORK_ETHERNET, 8, quirk_f0_vpd_link);
++
+ static void quirk_e100_interrupt(struct pci_dev *dev)
+ {
+ u16 command, pmcsr;
+@@ -2829,12 +2850,15 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x3c28, vtd_mask_spec_errors);
+
+ static void fixup_ti816x_class(struct pci_dev *dev)
+ {
++ u32 class = dev->class;
++
+ /* TI 816x devices do not have class code set when in PCIe boot mode */
+- dev_info(&dev->dev, "Setting PCI class for 816x PCIe device\n");
+- dev->class = PCI_CLASS_MULTIMEDIA_VIDEO;
++ dev->class = PCI_CLASS_MULTIMEDIA_VIDEO << 8;
++ dev_info(&dev->dev, "PCI class overridden (%#08x -> %#08x)\n",
++ class, dev->class);
+ }
+ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_TI, 0xb800,
+- PCI_CLASS_NOT_DEFINED, 0, fixup_ti816x_class);
++ PCI_CLASS_NOT_DEFINED, 0, fixup_ti816x_class);
+
+ /* Some PCIe devices do not work reliably with the claimed maximum
+ * payload size supported.
+diff --git a/drivers/regulator/pbias-regulator.c b/drivers/regulator/pbias-regulator.c
+index bd2b75c0d1d1..4fa7bcaf454e 100644
+--- a/drivers/regulator/pbias-regulator.c
++++ b/drivers/regulator/pbias-regulator.c
+@@ -30,6 +30,7 @@
+ struct pbias_reg_info {
+ u32 enable;
+ u32 enable_mask;
++ u32 disable_val;
+ u32 vmode;
+ unsigned int enable_time;
+ char *name;
+@@ -62,6 +63,7 @@ static const struct pbias_reg_info pbias_mmc_omap2430 = {
+ .enable = BIT(1),
+ .enable_mask = BIT(1),
+ .vmode = BIT(0),
++ .disable_val = 0,
+ .enable_time = 100,
+ .name = "pbias_mmc_omap2430"
+ };
+@@ -77,6 +79,7 @@ static const struct pbias_reg_info pbias_sim_omap3 = {
+ static const struct pbias_reg_info pbias_mmc_omap4 = {
+ .enable = BIT(26) | BIT(22),
+ .enable_mask = BIT(26) | BIT(25) | BIT(22),
++ .disable_val = BIT(25),
+ .vmode = BIT(21),
+ .enable_time = 100,
+ .name = "pbias_mmc_omap4"
+@@ -85,6 +88,7 @@ static const struct pbias_reg_info pbias_mmc_omap4 = {
+ static const struct pbias_reg_info pbias_mmc_omap5 = {
+ .enable = BIT(27) | BIT(26),
+ .enable_mask = BIT(27) | BIT(25) | BIT(26),
++ .disable_val = BIT(25),
+ .vmode = BIT(21),
+ .enable_time = 100,
+ .name = "pbias_mmc_omap5"
+@@ -159,6 +163,7 @@ static int pbias_regulator_probe(struct platform_device *pdev)
+ drvdata[data_idx].desc.enable_reg = res->start;
+ drvdata[data_idx].desc.enable_mask = info->enable_mask;
+ drvdata[data_idx].desc.enable_val = info->enable;
++ drvdata[data_idx].desc.disable_val = info->disable_val;
+
+ cfg.init_data = pbias_matches[idx].init_data;
+ cfg.driver_data = &drvdata[data_idx];
+diff --git a/drivers/soc/tegra/pmc.c b/drivers/soc/tegra/pmc.c
+index 75d0457a77b7..fa7036c4daf9 100644
+--- a/drivers/soc/tegra/pmc.c
++++ b/drivers/soc/tegra/pmc.c
+@@ -736,12 +736,12 @@ void tegra_pmc_init_tsense_reset(struct tegra_pmc *pmc)
+ u32 value, checksum;
+
+ if (!pmc->soc->has_tsense_reset)
+- goto out;
++ return;
+
+ np = of_find_node_by_name(pmc->dev->of_node, "i2c-thermtrip");
+ if (!np) {
+ dev_warn(dev, "i2c-thermtrip node not found, %s.\n", disabled);
+- goto out;
++ return;
+ }
+
+ if (of_property_read_u32(np, "nvidia,i2c-controller-id", &ctrl_id)) {
+diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
+index 59705ab23577..c9357bb393d3 100644
+--- a/drivers/spi/spi-bcm2835.c
++++ b/drivers/spi/spi-bcm2835.c
+@@ -553,13 +553,11 @@ static int bcm2835_spi_transfer_one(struct spi_master *master,
+ spi_used_hz = cdiv ? (clk_hz / cdiv) : (clk_hz / 65536);
+ bcm2835_wr(bs, BCM2835_SPI_CLK, cdiv);
+
+- /* handle all the modes */
++ /* handle all the 3-wire mode */
+ if ((spi->mode & SPI_3WIRE) && (tfr->rx_buf))
+ cs |= BCM2835_SPI_CS_REN;
+- if (spi->mode & SPI_CPOL)
+- cs |= BCM2835_SPI_CS_CPOL;
+- if (spi->mode & SPI_CPHA)
+- cs |= BCM2835_SPI_CS_CPHA;
++ else
++ cs &= ~BCM2835_SPI_CS_REN;
+
+ /* for gpio_cs set dummy CS so that no HW-CS get changed
+ * we can not run this in bcm2835_spi_set_cs, as it does
+@@ -592,6 +590,25 @@ static int bcm2835_spi_transfer_one(struct spi_master *master,
+ return bcm2835_spi_transfer_one_irq(master, spi, tfr, cs);
+ }
+
++static int bcm2835_spi_prepare_message(struct spi_master *master,
++ struct spi_message *msg)
++{
++ struct spi_device *spi = msg->spi;
++ struct bcm2835_spi *bs = spi_master_get_devdata(master);
++ u32 cs = bcm2835_rd(bs, BCM2835_SPI_CS);
++
++ cs &= ~(BCM2835_SPI_CS_CPOL | BCM2835_SPI_CS_CPHA);
++
++ if (spi->mode & SPI_CPOL)
++ cs |= BCM2835_SPI_CS_CPOL;
++ if (spi->mode & SPI_CPHA)
++ cs |= BCM2835_SPI_CS_CPHA;
++
++ bcm2835_wr(bs, BCM2835_SPI_CS, cs);
++
++ return 0;
++}
++
+ static void bcm2835_spi_handle_err(struct spi_master *master,
+ struct spi_message *msg)
+ {
+@@ -739,6 +756,7 @@ static int bcm2835_spi_probe(struct platform_device *pdev)
+ master->set_cs = bcm2835_spi_set_cs;
+ master->transfer_one = bcm2835_spi_transfer_one;
+ master->handle_err = bcm2835_spi_handle_err;
++ master->prepare_message = bcm2835_spi_prepare_message;
+ master->dev.of_node = pdev->dev.of_node;
+
+ bs = spi_master_get_devdata(master);
+diff --git a/drivers/spi/spi-bitbang-txrx.h b/drivers/spi/spi-bitbang-txrx.h
+index 06b34e5bcfa3..47bb9b898dfd 100644
+--- a/drivers/spi/spi-bitbang-txrx.h
++++ b/drivers/spi/spi-bitbang-txrx.h
+@@ -49,7 +49,7 @@ bitbang_txrx_be_cpha0(struct spi_device *spi,
+ {
+ /* if (cpol == 0) this is SPI_MODE_0; else this is SPI_MODE_2 */
+
+- bool oldbit = !(word & 1);
++ u32 oldbit = (!(word & (1<<(bits-1)))) << 31;
+ /* clock starts at inactive polarity */
+ for (word <<= (32 - bits); likely(bits); bits--) {
+
+@@ -81,7 +81,7 @@ bitbang_txrx_be_cpha1(struct spi_device *spi,
+ {
+ /* if (cpol == 0) this is SPI_MODE_1; else this is SPI_MODE_3 */
+
+- bool oldbit = !(word & (1 << 31));
++ u32 oldbit = (!(word & (1<<(bits-1)))) << 31;
+ /* clock starts at inactive polarity */
+ for (word <<= (32 - bits); likely(bits); bits--) {
+
+diff --git a/drivers/spi/spi-dw-mmio.c b/drivers/spi/spi-dw-mmio.c
+index eb03e1215195..7edede6e024b 100644
+--- a/drivers/spi/spi-dw-mmio.c
++++ b/drivers/spi/spi-dw-mmio.c
+@@ -74,6 +74,9 @@ static int dw_spi_mmio_probe(struct platform_device *pdev)
+
+ dws->max_freq = clk_get_rate(dwsmmio->clk);
+
++ of_property_read_u32(pdev->dev.of_node, "reg-io-width",
++ &dws->reg_io_width);
++
+ num_cs = 4;
+
+ if (pdev->dev.of_node)
+diff --git a/drivers/spi/spi-dw.c b/drivers/spi/spi-dw.c
+index 8d67d03c71eb..4fbfcdc5cb24 100644
+--- a/drivers/spi/spi-dw.c
++++ b/drivers/spi/spi-dw.c
+@@ -194,7 +194,7 @@ static void dw_writer(struct dw_spi *dws)
+ else
+ txw = *(u16 *)(dws->tx);
+ }
+- dw_writel(dws, DW_SPI_DR, txw);
++ dw_write_io_reg(dws, DW_SPI_DR, txw);
+ dws->tx += dws->n_bytes;
+ }
+ }
+@@ -205,7 +205,7 @@ static void dw_reader(struct dw_spi *dws)
+ u16 rxw;
+
+ while (max--) {
+- rxw = dw_readl(dws, DW_SPI_DR);
++ rxw = dw_read_io_reg(dws, DW_SPI_DR);
+ /* Care rx only if the transfer's original "rx" is not null */
+ if (dws->rx_end - dws->len) {
+ if (dws->n_bytes == 1)
+diff --git a/drivers/spi/spi-dw.h b/drivers/spi/spi-dw.h
+index 6c91391c1a4f..b75ed327d5a2 100644
+--- a/drivers/spi/spi-dw.h
++++ b/drivers/spi/spi-dw.h
+@@ -109,6 +109,7 @@ struct dw_spi {
+ u32 fifo_len; /* depth of the FIFO buffer */
+ u32 max_freq; /* max bus freq supported */
+
++ u32 reg_io_width; /* DR I/O width in bytes */
+ u16 bus_num;
+ u16 num_cs; /* supported slave numbers */
+
+@@ -145,11 +146,45 @@ static inline u32 dw_readl(struct dw_spi *dws, u32 offset)
+ return __raw_readl(dws->regs + offset);
+ }
+
++static inline u16 dw_readw(struct dw_spi *dws, u32 offset)
++{
++ return __raw_readw(dws->regs + offset);
++}
++
+ static inline void dw_writel(struct dw_spi *dws, u32 offset, u32 val)
+ {
+ __raw_writel(val, dws->regs + offset);
+ }
+
++static inline void dw_writew(struct dw_spi *dws, u32 offset, u16 val)
++{
++ __raw_writew(val, dws->regs + offset);
++}
++
++static inline u32 dw_read_io_reg(struct dw_spi *dws, u32 offset)
++{
++ switch (dws->reg_io_width) {
++ case 2:
++ return dw_readw(dws, offset);
++ case 4:
++ default:
++ return dw_readl(dws, offset);
++ }
++}
++
++static inline void dw_write_io_reg(struct dw_spi *dws, u32 offset, u32 val)
++{
++ switch (dws->reg_io_width) {
++ case 2:
++ dw_writew(dws, offset, val);
++ break;
++ case 4:
++ default:
++ dw_writel(dws, offset, val);
++ break;
++ }
++}
++
+ static inline void spi_enable_chip(struct dw_spi *dws, int enable)
+ {
+ dw_writel(dws, DW_SPI_SSIENR, (enable ? 1 : 0));
+diff --git a/drivers/spi/spi-img-spfi.c b/drivers/spi/spi-img-spfi.c
+index acce90ac7371..bb916c8d40db 100644
+--- a/drivers/spi/spi-img-spfi.c
++++ b/drivers/spi/spi-img-spfi.c
+@@ -105,6 +105,10 @@ struct img_spfi {
+ bool rx_dma_busy;
+ };
+
++struct img_spfi_device_data {
++ bool gpio_requested;
++};
++
+ static inline u32 spfi_readl(struct img_spfi *spfi, u32 reg)
+ {
+ return readl(spfi->regs + reg);
+@@ -267,15 +271,15 @@ static int img_spfi_start_pio(struct spi_master *master,
+ cpu_relax();
+ }
+
+- ret = spfi_wait_all_done(spfi);
+- if (ret < 0)
+- return ret;
+-
+ if (rx_bytes > 0 || tx_bytes > 0) {
+ dev_err(spfi->dev, "PIO transfer timed out\n");
+ return -ETIMEDOUT;
+ }
+
++ ret = spfi_wait_all_done(spfi);
++ if (ret < 0)
++ return ret;
++
+ return 0;
+ }
+
+@@ -440,21 +444,50 @@ static int img_spfi_unprepare(struct spi_master *master,
+
+ static int img_spfi_setup(struct spi_device *spi)
+ {
+- int ret;
+-
+- ret = gpio_request_one(spi->cs_gpio, (spi->mode & SPI_CS_HIGH) ?
+- GPIOF_OUT_INIT_LOW : GPIOF_OUT_INIT_HIGH,
+- dev_name(&spi->dev));
+- if (ret)
+- dev_err(&spi->dev, "can't request chipselect gpio %d\n",
++ int ret = -EINVAL;
++ struct img_spfi_device_data *spfi_data = spi_get_ctldata(spi);
++
++ if (!spfi_data) {
++ spfi_data = kzalloc(sizeof(*spfi_data), GFP_KERNEL);
++ if (!spfi_data)
++ return -ENOMEM;
++ spfi_data->gpio_requested = false;
++ spi_set_ctldata(spi, spfi_data);
++ }
++ if (!spfi_data->gpio_requested) {
++ ret = gpio_request_one(spi->cs_gpio,
++ (spi->mode & SPI_CS_HIGH) ?
++ GPIOF_OUT_INIT_LOW : GPIOF_OUT_INIT_HIGH,
++ dev_name(&spi->dev));
++ if (ret)
++ dev_err(&spi->dev, "can't request chipselect gpio %d\n",
+ spi->cs_gpio);
+-
++ else
++ spfi_data->gpio_requested = true;
++ } else {
++ if (gpio_is_valid(spi->cs_gpio)) {
++ int mode = ((spi->mode & SPI_CS_HIGH) ?
++ GPIOF_OUT_INIT_LOW : GPIOF_OUT_INIT_HIGH);
++
++ ret = gpio_direction_output(spi->cs_gpio, mode);
++ if (ret)
++ dev_err(&spi->dev, "chipselect gpio %d setup failed (%d)\n",
++ spi->cs_gpio, ret);
++ }
++ }
+ return ret;
+ }
+
+ static void img_spfi_cleanup(struct spi_device *spi)
+ {
+- gpio_free(spi->cs_gpio);
++ struct img_spfi_device_data *spfi_data = spi_get_ctldata(spi);
++
++ if (spfi_data) {
++ if (spfi_data->gpio_requested)
++ gpio_free(spi->cs_gpio);
++ kfree(spfi_data);
++ spi_set_ctldata(spi, NULL);
++ }
+ }
+
+ static void img_spfi_config(struct spi_master *master, struct spi_device *spi,
+diff --git a/drivers/spi/spi-omap2-mcspi.c b/drivers/spi/spi-omap2-mcspi.c
+index 58673841286c..3d09e0b69b73 100644
+--- a/drivers/spi/spi-omap2-mcspi.c
++++ b/drivers/spi/spi-omap2-mcspi.c
+@@ -245,6 +245,7 @@ static void omap2_mcspi_set_enable(const struct spi_device *spi, int enable)
+
+ static void omap2_mcspi_set_cs(struct spi_device *spi, bool enable)
+ {
++ struct omap2_mcspi *mcspi = spi_master_get_devdata(spi->master);
+ u32 l;
+
+ /* The controller handles the inverted chip selects
+@@ -255,6 +256,12 @@ static void omap2_mcspi_set_cs(struct spi_device *spi, bool enable)
+ enable = !enable;
+
+ if (spi->controller_state) {
++ int err = pm_runtime_get_sync(mcspi->dev);
++ if (err < 0) {
++ dev_err(mcspi->dev, "failed to get sync: %d\n", err);
++ return;
++ }
++
+ l = mcspi_cached_chconf0(spi);
+
+ if (enable)
+@@ -263,6 +270,9 @@ static void omap2_mcspi_set_cs(struct spi_device *spi, bool enable)
+ l |= OMAP2_MCSPI_CHCONF_FORCE;
+
+ mcspi_write_chconf0(spi, l);
++
++ pm_runtime_mark_last_busy(mcspi->dev);
++ pm_runtime_put_autosuspend(mcspi->dev);
+ }
+ }
+
+diff --git a/drivers/spi/spi-orion.c b/drivers/spi/spi-orion.c
+index 8cad107a5b3f..a87cfd4ba17b 100644
+--- a/drivers/spi/spi-orion.c
++++ b/drivers/spi/spi-orion.c
+@@ -41,6 +41,11 @@
+ #define ORION_SPI_DATA_OUT_REG 0x08
+ #define ORION_SPI_DATA_IN_REG 0x0c
+ #define ORION_SPI_INT_CAUSE_REG 0x10
++#define ORION_SPI_TIMING_PARAMS_REG 0x18
++
++#define ORION_SPI_TMISO_SAMPLE_MASK (0x3 << 6)
++#define ORION_SPI_TMISO_SAMPLE_1 (1 << 6)
++#define ORION_SPI_TMISO_SAMPLE_2 (2 << 6)
+
+ #define ORION_SPI_MODE_CPOL (1 << 11)
+ #define ORION_SPI_MODE_CPHA (1 << 12)
+@@ -70,6 +75,7 @@ struct orion_spi_dev {
+ unsigned int min_divisor;
+ unsigned int max_divisor;
+ u32 prescale_mask;
++ bool is_errata_50mhz_ac;
+ };
+
+ struct orion_spi {
+@@ -195,6 +201,41 @@ orion_spi_mode_set(struct spi_device *spi)
+ writel(reg, spi_reg(orion_spi, ORION_SPI_IF_CONFIG_REG));
+ }
+
++static void
++orion_spi_50mhz_ac_timing_erratum(struct spi_device *spi, unsigned int speed)
++{
++ u32 reg;
++ struct orion_spi *orion_spi;
++
++ orion_spi = spi_master_get_devdata(spi->master);
++
++ /*
++ * Erratum description: (Erratum NO. FE-9144572) The device
++ * SPI interface supports frequencies of up to 50 MHz.
++ * However, due to this erratum, when the device core clock is
++ * 250 MHz and the SPI interface is configured for 50MHz SPI
++ * clock and CPOL=CPHA=1 there might occur data corruption on
++ * reads from the SPI device.
++ * Erratum Workaround:
++ * Work in one of the following configurations:
++ * 1. Set CPOL=CPHA=0 in "SPI Interface Configuration
++ * Register".
++ * 2. Set TMISO_SAMPLE value to 0x2 in "SPI Timing Parameters 1
++ * Register" before setting the interface.
++ */
++ reg = readl(spi_reg(orion_spi, ORION_SPI_TIMING_PARAMS_REG));
++ reg &= ~ORION_SPI_TMISO_SAMPLE_MASK;
++
++ if (clk_get_rate(orion_spi->clk) == 250000000 &&
++ speed == 50000000 && spi->mode & SPI_CPOL &&
++ spi->mode & SPI_CPHA)
++ reg |= ORION_SPI_TMISO_SAMPLE_2;
++ else
++ reg |= ORION_SPI_TMISO_SAMPLE_1; /* This is the default value */
++
++ writel(reg, spi_reg(orion_spi, ORION_SPI_TIMING_PARAMS_REG));
++}
++
+ /*
+ * called only when no transfer is active on the bus
+ */
+@@ -216,6 +257,9 @@ orion_spi_setup_transfer(struct spi_device *spi, struct spi_transfer *t)
+
+ orion_spi_mode_set(spi);
+
++ if (orion_spi->devdata->is_errata_50mhz_ac)
++ orion_spi_50mhz_ac_timing_erratum(spi, speed);
++
+ rc = orion_spi_baudrate_set(spi, speed);
+ if (rc)
+ return rc;
+@@ -413,6 +457,14 @@ static const struct orion_spi_dev armada_375_spi_dev_data = {
+ .prescale_mask = ARMADA_SPI_CLK_PRESCALE_MASK,
+ };
+
++static const struct orion_spi_dev armada_380_spi_dev_data = {
++ .typ = ARMADA_SPI,
++ .max_hz = 50000000,
++ .max_divisor = 1920,
++ .prescale_mask = ARMADA_SPI_CLK_PRESCALE_MASK,
++ .is_errata_50mhz_ac = true,
++};
++
+ static const struct of_device_id orion_spi_of_match_table[] = {
+ {
+ .compatible = "marvell,orion-spi",
+@@ -428,7 +480,7 @@ static const struct of_device_id orion_spi_of_match_table[] = {
+ },
+ {
+ .compatible = "marvell,armada-380-spi",
+- .data = &armada_xp_spi_dev_data,
++ .data = &armada_380_spi_dev_data,
+ },
+ {
+ .compatible = "marvell,armada-390-spi",
+diff --git a/drivers/spi/spi-sh-msiof.c b/drivers/spi/spi-sh-msiof.c
+index d3370a612d84..a7629f8edfca 100644
+--- a/drivers/spi/spi-sh-msiof.c
++++ b/drivers/spi/spi-sh-msiof.c
+@@ -48,8 +48,8 @@ struct sh_msiof_spi_priv {
+ const struct sh_msiof_chipdata *chipdata;
+ struct sh_msiof_spi_info *info;
+ struct completion done;
+- int tx_fifo_size;
+- int rx_fifo_size;
++ unsigned int tx_fifo_size;
++ unsigned int rx_fifo_size;
+ void *tx_dma_page;
+ void *rx_dma_page;
+ dma_addr_t tx_dma_addr;
+@@ -95,8 +95,6 @@ struct sh_msiof_spi_priv {
+ #define MDR2_WDLEN1(i) (((i) - 1) << 16) /* Word Count (1-64/256 (SH, A1))) */
+ #define MDR2_GRPMASK1 0x00000001 /* Group Output Mask 1 (SH, A1) */
+
+-#define MAX_WDLEN 256U
+-
+ /* TSCR and RSCR */
+ #define SCR_BRPS_MASK 0x1f00 /* Prescaler Setting (1-32) */
+ #define SCR_BRPS(i) (((i) - 1) << 8)
+@@ -850,7 +848,12 @@ static int sh_msiof_transfer_one(struct spi_master *master,
+ * DMA supports 32-bit words only, hence pack 8-bit and 16-bit
+ * words, with byte resp. word swapping.
+ */
+- unsigned int l = min(len, MAX_WDLEN * 4);
++ unsigned int l = 0;
++
++ if (tx_buf)
++ l = min(len, p->tx_fifo_size * 4);
++ if (rx_buf)
++ l = min(len, p->rx_fifo_size * 4);
+
+ if (bits <= 8) {
+ if (l & 3)
+@@ -963,7 +966,7 @@ static const struct sh_msiof_chipdata sh_data = {
+
+ static const struct sh_msiof_chipdata r8a779x_data = {
+ .tx_fifo_size = 64,
+- .rx_fifo_size = 256,
++ .rx_fifo_size = 64,
+ .master_flags = SPI_MASTER_MUST_TX,
+ };
+
+diff --git a/drivers/spi/spi-xilinx.c b/drivers/spi/spi-xilinx.c
+index 133f53a9c1d4..a339c1e9997a 100644
+--- a/drivers/spi/spi-xilinx.c
++++ b/drivers/spi/spi-xilinx.c
+@@ -249,19 +249,23 @@ static int xilinx_spi_txrx_bufs(struct spi_device *spi, struct spi_transfer *t)
+ xspi->tx_ptr = t->tx_buf;
+ xspi->rx_ptr = t->rx_buf;
+ remaining_words = t->len / xspi->bytes_per_word;
+- reinit_completion(&xspi->done);
+
+ if (xspi->irq >= 0 && remaining_words > xspi->buffer_size) {
++ u32 isr;
+ use_irq = true;
+- xspi->write_fn(XSPI_INTR_TX_EMPTY,
+- xspi->regs + XIPIF_V123B_IISR_OFFSET);
+- /* Enable the global IPIF interrupt */
+- xspi->write_fn(XIPIF_V123B_GINTR_ENABLE,
+- xspi->regs + XIPIF_V123B_DGIER_OFFSET);
+ /* Inhibit irq to avoid spurious irqs on tx_empty*/
+ cr = xspi->read_fn(xspi->regs + XSPI_CR_OFFSET);
+ xspi->write_fn(cr | XSPI_CR_TRANS_INHIBIT,
+ xspi->regs + XSPI_CR_OFFSET);
++ /* ACK old irqs (if any) */
++ isr = xspi->read_fn(xspi->regs + XIPIF_V123B_IISR_OFFSET);
++ if (isr)
++ xspi->write_fn(isr,
++ xspi->regs + XIPIF_V123B_IISR_OFFSET);
++ /* Enable the global IPIF interrupt */
++ xspi->write_fn(XIPIF_V123B_GINTR_ENABLE,
++ xspi->regs + XIPIF_V123B_DGIER_OFFSET);
++ reinit_completion(&xspi->done);
+ }
+
+ while (remaining_words) {
+@@ -302,8 +306,10 @@ static int xilinx_spi_txrx_bufs(struct spi_device *spi, struct spi_transfer *t)
+ remaining_words -= n_words;
+ }
+
+- if (use_irq)
++ if (use_irq) {
+ xspi->write_fn(0, xspi->regs + XIPIF_V123B_DGIER_OFFSET);
++ xspi->write_fn(cr, xspi->regs + XSPI_CR_OFFSET);
++ }
+
+ return t->len;
+ }
+diff --git a/drivers/staging/comedi/drivers/adl_pci7x3x.c b/drivers/staging/comedi/drivers/adl_pci7x3x.c
+index 934af3ff7897..b0fc027cf485 100644
+--- a/drivers/staging/comedi/drivers/adl_pci7x3x.c
++++ b/drivers/staging/comedi/drivers/adl_pci7x3x.c
+@@ -120,8 +120,20 @@ static int adl_pci7x3x_do_insn_bits(struct comedi_device *dev,
+ {
+ unsigned long reg = (unsigned long)s->private;
+
+- if (comedi_dio_update_state(s, data))
+- outl(s->state, dev->iobase + reg);
++ if (comedi_dio_update_state(s, data)) {
++ unsigned int val = s->state;
++
++ if (s->n_chan == 16) {
++ /*
++ * It seems the PCI-7230 needs the 16-bit DO state
++ * to be shifted left by 16 bits before being written
++ * to the 32-bit register. Set the value in both
++ * halves of the register to be sure.
++ */
++ val |= val << 16;
++ }
++ outl(val, dev->iobase + reg);
++ }
+
+ data[1] = s->state;
+
+diff --git a/drivers/staging/comedi/drivers/usbduxsigma.c b/drivers/staging/comedi/drivers/usbduxsigma.c
+index eaa9add491df..dc0b25a54088 100644
+--- a/drivers/staging/comedi/drivers/usbduxsigma.c
++++ b/drivers/staging/comedi/drivers/usbduxsigma.c
+@@ -550,27 +550,6 @@ static int usbduxsigma_ai_cmdtest(struct comedi_device *dev,
+ if (err)
+ return 3;
+
+- /* Step 4: fix up any arguments */
+-
+- if (high_speed) {
+- /*
+- * every 2 channels get a time window of 125us. Thus, if we
+- * sample all 16 channels we need 1ms. If we sample only one
+- * channel we need only 125us
+- */
+- devpriv->ai_interval = interval;
+- devpriv->ai_timer = cmd->scan_begin_arg / (125000 * interval);
+- } else {
+- /* interval always 1ms */
+- devpriv->ai_interval = 1;
+- devpriv->ai_timer = cmd->scan_begin_arg / 1000000;
+- }
+- if (devpriv->ai_timer < 1)
+- err |= -EINVAL;
+-
+- if (err)
+- return 4;
+-
+ return 0;
+ }
+
+@@ -668,6 +647,22 @@ static int usbduxsigma_ai_cmd(struct comedi_device *dev,
+
+ down(&devpriv->sem);
+
++ if (devpriv->high_speed) {
++ /*
++ * every 2 channels get a time window of 125us. Thus, if we
++ * sample all 16 channels we need 1ms. If we sample only one
++ * channel we need only 125us
++ */
++ unsigned int interval = usbduxsigma_chans_to_interval(len);
++
++ devpriv->ai_interval = interval;
++ devpriv->ai_timer = cmd->scan_begin_arg / (125000 * interval);
++ } else {
++ /* interval always 1ms */
++ devpriv->ai_interval = 1;
++ devpriv->ai_timer = cmd->scan_begin_arg / 1000000;
++ }
++
+ for (i = 0; i < len; i++) {
+ unsigned int chan = CR_CHAN(cmd->chanlist[i]);
+
+@@ -917,25 +912,6 @@ static int usbduxsigma_ao_cmdtest(struct comedi_device *dev,
+ if (err)
+ return 3;
+
+- /* Step 4: fix up any arguments */
+-
+- /* we count in timer steps */
+- if (high_speed) {
+- /* timing of the conversion itself: every 125 us */
+- devpriv->ao_timer = cmd->convert_arg / 125000;
+- } else {
+- /*
+- * timing of the scan: every 1ms
+- * we get all channels at once
+- */
+- devpriv->ao_timer = cmd->scan_begin_arg / 1000000;
+- }
+- if (devpriv->ao_timer < 1)
+- err |= -EINVAL;
+-
+- if (err)
+- return 4;
+-
+ return 0;
+ }
+
+@@ -948,6 +924,20 @@ static int usbduxsigma_ao_cmd(struct comedi_device *dev,
+
+ down(&devpriv->sem);
+
++ if (cmd->convert_src == TRIG_TIMER) {
++ /*
++ * timing of the conversion itself: every 125 us
++ * at high speed (not used yet)
++ */
++ devpriv->ao_timer = cmd->convert_arg / 125000;
++ } else {
++ /*
++ * timing of the scan: every 1ms
++ * we get all channels at once
++ */
++ devpriv->ao_timer = cmd->scan_begin_arg / 1000000;
++ }
++
+ devpriv->ao_counter = devpriv->ao_timer;
+
+ if (cmd->start_src == TRIG_NOW) {
+diff --git a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
+index c6cdb43b864c..476808261fa8 100644
+--- a/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
++++ b/drivers/staging/rtl8192e/rtl8192e/rtl_core.c
+@@ -1826,8 +1826,8 @@ void rtl8192_hard_data_xmit(struct sk_buff *skb, struct net_device *dev,
+ return;
+ }
+
+- if (queue_index != TXCMD_QUEUE)
+- netdev_warn(dev, "%s(): queue index != TXCMD_QUEUE\n",
++ if (queue_index == TXCMD_QUEUE)
++ netdev_warn(dev, "%s(): queue index == TXCMD_QUEUE\n",
+ __func__);
+
+ memcpy((unsigned char *)(skb->cb), &dev, sizeof(dev));
+diff --git a/drivers/staging/unisys/visorbus/visorchipset.c b/drivers/staging/unisys/visorbus/visorchipset.c
+index bb8087e70127..44269d58eb51 100644
+--- a/drivers/staging/unisys/visorbus/visorchipset.c
++++ b/drivers/staging/unisys/visorbus/visorchipset.c
+@@ -2381,6 +2381,9 @@ static struct acpi_driver unisys_acpi_driver = {
+ .remove = visorchipset_exit,
+ },
+ };
++
++MODULE_DEVICE_TABLE(acpi, unisys_device_ids);
++
+ static __init uint32_t visorutil_spar_detect(void)
+ {
+ unsigned int eax, ebx, ecx, edx;
+diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
+index d75a66c72750..b470df122642 100644
+--- a/drivers/tty/serial/8250/8250_omap.c
++++ b/drivers/tty/serial/8250/8250_omap.c
+@@ -100,6 +100,7 @@ struct omap8250_priv {
+ struct work_struct qos_work;
+ struct uart_8250_dma omap8250_dma;
+ spinlock_t rx_dma_lock;
++ bool rx_dma_broken;
+ };
+
+ static u32 uart_read(struct uart_8250_port *up, u32 reg)
+@@ -754,6 +755,7 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
+ struct omap8250_priv *priv = p->port.private_data;
+ struct uart_8250_dma *dma = p->dma;
+ unsigned long flags;
++ int ret;
+
+ spin_lock_irqsave(&priv->rx_dma_lock, flags);
+
+@@ -762,7 +764,9 @@ static void omap_8250_rx_dma_flush(struct uart_8250_port *p)
+ return;
+ }
+
+- dmaengine_pause(dma->rxchan);
++ ret = dmaengine_pause(dma->rxchan);
++ if (WARN_ON_ONCE(ret))
++ priv->rx_dma_broken = true;
+
+ spin_unlock_irqrestore(&priv->rx_dma_lock, flags);
+
+@@ -806,6 +810,9 @@ static int omap_8250_rx_dma(struct uart_8250_port *p, unsigned int iir)
+ break;
+ }
+
++ if (priv->rx_dma_broken)
++ return -EINVAL;
++
+ spin_lock_irqsave(&priv->rx_dma_lock, flags);
+
+ if (dma->rx_running)
+@@ -1180,6 +1187,11 @@ static int omap8250_probe(struct platform_device *pdev)
+
+ if (of_machine_is_compatible("ti,am33xx"))
+ priv->habit |= OMAP_DMA_TX_KICK;
++ /*
++ * pause is currently not supported at least on omap-sdma
++ * and edma on most earlier kernels.
++ */
++ priv->rx_dma_broken = true;
+ }
+ }
+ #endif
+diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c
+index e55f18b93fe7..46ddce479f26 100644
+--- a/drivers/tty/serial/8250/8250_pci.c
++++ b/drivers/tty/serial/8250/8250_pci.c
+@@ -2017,6 +2017,12 @@ pci_wch_ch38x_setup(struct serial_private *priv,
+ #define PCIE_DEVICE_ID_WCH_CH382_2S1P 0x3250
+ #define PCIE_DEVICE_ID_WCH_CH384_4S 0x3470
+
++#define PCI_VENDOR_ID_PERICOM 0x12D8
++#define PCI_DEVICE_ID_PERICOM_PI7C9X7951 0x7951
++#define PCI_DEVICE_ID_PERICOM_PI7C9X7952 0x7952
++#define PCI_DEVICE_ID_PERICOM_PI7C9X7954 0x7954
++#define PCI_DEVICE_ID_PERICOM_PI7C9X7958 0x7958
++
+ /* Unknown vendors/cards - this should not be in linux/pci_ids.h */
+ #define PCI_SUBDEVICE_ID_UNKNOWN_0x1584 0x1584
+ #define PCI_SUBDEVICE_ID_UNKNOWN_0x1588 0x1588
+@@ -2331,27 +2337,12 @@ static struct pci_serial_quirk pci_serial_quirks[] __refdata = {
+ * Pericom
+ */
+ {
+- .vendor = 0x12d8,
+- .device = 0x7952,
+- .subvendor = PCI_ANY_ID,
+- .subdevice = PCI_ANY_ID,
+- .setup = pci_pericom_setup,
+- },
+- {
+- .vendor = 0x12d8,
+- .device = 0x7954,
+- .subvendor = PCI_ANY_ID,
+- .subdevice = PCI_ANY_ID,
+- .setup = pci_pericom_setup,
+- },
+- {
+- .vendor = 0x12d8,
+- .device = 0x7958,
+- .subvendor = PCI_ANY_ID,
+- .subdevice = PCI_ANY_ID,
+- .setup = pci_pericom_setup,
++ .vendor = PCI_VENDOR_ID_PERICOM,
++ .device = PCI_ANY_ID,
++ .subvendor = PCI_ANY_ID,
++ .subdevice = PCI_ANY_ID,
++ .setup = pci_pericom_setup,
+ },
+-
+ /*
+ * PLX
+ */
+@@ -3056,6 +3047,10 @@ enum pci_board_num_t {
+ pbn_fintek_8,
+ pbn_fintek_12,
+ pbn_wch384_4,
++ pbn_pericom_PI7C9X7951,
++ pbn_pericom_PI7C9X7952,
++ pbn_pericom_PI7C9X7954,
++ pbn_pericom_PI7C9X7958,
+ };
+
+ /*
+@@ -3881,7 +3876,6 @@ static struct pciserial_board pci_boards[] = {
+ .base_baud = 115200,
+ .first_offset = 0x40,
+ },
+-
+ [pbn_wch384_4] = {
+ .flags = FL_BASE0,
+ .num_ports = 4,
+@@ -3889,6 +3883,33 @@ static struct pciserial_board pci_boards[] = {
+ .uart_offset = 8,
+ .first_offset = 0xC0,
+ },
++ /*
++ * Pericom PI7C9X795[1248] Uno/Dual/Quad/Octal UART
++ */
++ [pbn_pericom_PI7C9X7951] = {
++ .flags = FL_BASE0,
++ .num_ports = 1,
++ .base_baud = 921600,
++ .uart_offset = 0x8,
++ },
++ [pbn_pericom_PI7C9X7952] = {
++ .flags = FL_BASE0,
++ .num_ports = 2,
++ .base_baud = 921600,
++ .uart_offset = 0x8,
++ },
++ [pbn_pericom_PI7C9X7954] = {
++ .flags = FL_BASE0,
++ .num_ports = 4,
++ .base_baud = 921600,
++ .uart_offset = 0x8,
++ },
++ [pbn_pericom_PI7C9X7958] = {
++ .flags = FL_BASE0,
++ .num_ports = 8,
++ .base_baud = 921600,
++ .uart_offset = 0x8,
++ },
+ };
+
+ static const struct pci_device_id blacklist[] = {
+@@ -5154,6 +5175,25 @@ static struct pci_device_id serial_pci_tbl[] = {
+ 0,
+ 0, pbn_exar_XR17V8358 },
+ /*
++ * Pericom PI7C9X795[1248] Uno/Dual/Quad/Octal UART
++ */
++ { PCI_VENDOR_ID_PERICOM, PCI_DEVICE_ID_PERICOM_PI7C9X7951,
++ PCI_ANY_ID, PCI_ANY_ID,
++ 0,
++ 0, pbn_pericom_PI7C9X7951 },
++ { PCI_VENDOR_ID_PERICOM, PCI_DEVICE_ID_PERICOM_PI7C9X7952,
++ PCI_ANY_ID, PCI_ANY_ID,
++ 0,
++ 0, pbn_pericom_PI7C9X7952 },
++ { PCI_VENDOR_ID_PERICOM, PCI_DEVICE_ID_PERICOM_PI7C9X7954,
++ PCI_ANY_ID, PCI_ANY_ID,
++ 0,
++ 0, pbn_pericom_PI7C9X7954 },
++ { PCI_VENDOR_ID_PERICOM, PCI_DEVICE_ID_PERICOM_PI7C9X7958,
++ PCI_ANY_ID, PCI_ANY_ID,
++ 0,
++ 0, pbn_pericom_PI7C9X7958 },
++ /*
+ * Topic TP560 Data/Fax/Voice 56k modem (reported by Evan Clarke)
+ */
+ { PCI_VENDOR_ID_TOPIC, PCI_DEVICE_ID_TOPIC_TP560,
+diff --git a/drivers/tty/serial/8250/8250_pnp.c b/drivers/tty/serial/8250/8250_pnp.c
+index 50a09cd76d50..658b392d1170 100644
+--- a/drivers/tty/serial/8250/8250_pnp.c
++++ b/drivers/tty/serial/8250/8250_pnp.c
+@@ -41,6 +41,12 @@ static const struct pnp_device_id pnp_dev_table[] = {
+ { "AEI1240", 0 },
+ /* Rockwell 56K ACF II Fax+Data+Voice Modem */
+ { "AKY1021", 0 /*SPCI_FL_NO_SHIRQ*/ },
++ /*
++ * ALi Fast Infrared Controller
++ * Native driver (ali-ircc) is broken so at least
++ * it can be used with irtty-sir.
++ */
++ { "ALI5123", 0 },
+ /* AZT3005 PnP SOUND DEVICE */
+ { "AZT4001", 0 },
+ /* Best Data Products Inc. Smart One 336F PnP Modem */
+@@ -364,6 +370,11 @@ static const struct pnp_device_id pnp_dev_table[] = {
+ /* Winbond CIR port, should not be probed. We should keep track
+ of it to prevent the legacy serial driver from probing it */
+ { "WEC1022", CIR_PORT },
++ /*
++ * SMSC IrCC SIR/FIR port, should not be probed by serial driver
++ * as well so its own driver can bind to it.
++ */
++ { "SMCF010", CIR_PORT },
+ { "", 0 }
+ };
+
+diff --git a/drivers/tty/serial/8250/8250_uniphier.c b/drivers/tty/serial/8250/8250_uniphier.c
+index 7d79425c2b09..d11621e2cf1d 100644
+--- a/drivers/tty/serial/8250/8250_uniphier.c
++++ b/drivers/tty/serial/8250/8250_uniphier.c
+@@ -218,6 +218,7 @@ static int uniphier_uart_probe(struct platform_device *pdev)
+ ret = serial8250_register_8250_port(&up);
+ if (ret < 0) {
+ dev_err(dev, "failed to register 8250 port\n");
++ clk_disable_unprepare(priv->clk);
+ return ret;
+ }
+
+diff --git a/drivers/tty/serial/men_z135_uart.c b/drivers/tty/serial/men_z135_uart.c
+index 35c55505b3eb..5a41b8fbb10a 100644
+--- a/drivers/tty/serial/men_z135_uart.c
++++ b/drivers/tty/serial/men_z135_uart.c
+@@ -392,7 +392,6 @@ static irqreturn_t men_z135_intr(int irq, void *data)
+ struct men_z135_port *uart = (struct men_z135_port *)data;
+ struct uart_port *port = &uart->port;
+ bool handled = false;
+- unsigned long flags;
+ int irq_id;
+
+ uart->stat_reg = ioread32(port->membase + MEN_Z135_STAT_REG);
+@@ -401,7 +400,7 @@ static irqreturn_t men_z135_intr(int irq, void *data)
+ if (!irq_id)
+ goto out;
+
+- spin_lock_irqsave(&port->lock, flags);
++ spin_lock(&port->lock);
+ /* It's save to write to IIR[7:6] RXC[9:8] */
+ iowrite8(irq_id, port->membase + MEN_Z135_STAT_REG);
+
+@@ -427,7 +426,7 @@ static irqreturn_t men_z135_intr(int irq, void *data)
+ handled = true;
+ }
+
+- spin_unlock_irqrestore(&port->lock, flags);
++ spin_unlock(&port->lock);
+ out:
+ return IRQ_RETVAL(handled);
+ }
+@@ -717,7 +716,7 @@ static void men_z135_set_termios(struct uart_port *port,
+
+ baud = uart_get_baud_rate(port, termios, old, 0, uart_freq / 16);
+
+- spin_lock(&port->lock);
++ spin_lock_irq(&port->lock);
+ if (tty_termios_baud_rate(termios))
+ tty_termios_encode_baud_rate(termios, baud, baud);
+
+@@ -725,7 +724,7 @@ static void men_z135_set_termios(struct uart_port *port,
+ iowrite32(bd_reg, port->membase + MEN_Z135_BAUD_REG);
+
+ uart_update_timeout(port, termios->c_cflag, baud);
+- spin_unlock(&port->lock);
++ spin_unlock_irq(&port->lock);
+ }
+
+ static const char *men_z135_type(struct uart_port *port)
+diff --git a/drivers/tty/serial/samsung.c b/drivers/tty/serial/samsung.c
+index 67d0c213b1c7..5916311eecb1 100644
+--- a/drivers/tty/serial/samsung.c
++++ b/drivers/tty/serial/samsung.c
+@@ -295,15 +295,6 @@ static int s3c24xx_serial_start_tx_dma(struct s3c24xx_uart_port *ourport,
+ if (ourport->tx_mode != S3C24XX_TX_DMA)
+ enable_tx_dma(ourport);
+
+- while (xmit->tail & (dma_get_cache_alignment() - 1)) {
+- if (rd_regl(port, S3C2410_UFSTAT) & ourport->info->tx_fifofull)
+- return 0;
+- wr_regb(port, S3C2410_UTXH, xmit->buf[xmit->tail]);
+- xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);
+- port->icount.tx++;
+- count--;
+- }
+-
+ dma->tx_size = count & ~(dma_get_cache_alignment() - 1);
+ dma->tx_transfer_addr = dma->tx_addr + xmit->tail;
+
+@@ -342,7 +333,9 @@ static void s3c24xx_serial_start_next_tx(struct s3c24xx_uart_port *ourport)
+ return;
+ }
+
+- if (!ourport->dma || !ourport->dma->tx_chan || count < port->fifosize)
++ if (!ourport->dma || !ourport->dma->tx_chan ||
++ count < ourport->min_dma_size ||
++ xmit->tail & (dma_get_cache_alignment() - 1))
+ s3c24xx_serial_start_tx_pio(ourport);
+ else
+ s3c24xx_serial_start_tx_dma(ourport, count);
+@@ -736,15 +729,20 @@ static irqreturn_t s3c24xx_serial_tx_chars(int irq, void *id)
+ struct uart_port *port = &ourport->port;
+ struct circ_buf *xmit = &port->state->xmit;
+ unsigned long flags;
+- int count;
++ int count, dma_count = 0;
+
+ spin_lock_irqsave(&port->lock, flags);
+
+ count = CIRC_CNT_TO_END(xmit->head, xmit->tail, UART_XMIT_SIZE);
+
+- if (ourport->dma && ourport->dma->tx_chan && count >= port->fifosize) {
+- s3c24xx_serial_start_tx_dma(ourport, count);
+- goto out;
++ if (ourport->dma && ourport->dma->tx_chan &&
++ count >= ourport->min_dma_size) {
++ int align = dma_get_cache_alignment() -
++ (xmit->tail & (dma_get_cache_alignment() - 1));
++ if (count-align >= ourport->min_dma_size) {
++ dma_count = count-align;
++ count = align;
++ }
+ }
+
+ if (port->x_char) {
+@@ -765,14 +763,24 @@ static irqreturn_t s3c24xx_serial_tx_chars(int irq, void *id)
+
+ /* try and drain the buffer... */
+
+- count = port->fifosize;
+- while (!uart_circ_empty(xmit) && count-- > 0) {
++ if (count > port->fifosize) {
++ count = port->fifosize;
++ dma_count = 0;
++ }
++
++ while (!uart_circ_empty(xmit) && count > 0) {
+ if (rd_regl(port, S3C2410_UFSTAT) & ourport->info->tx_fifofull)
+ break;
+
+ wr_regb(port, S3C2410_UTXH, xmit->buf[xmit->tail]);
+ xmit->tail = (xmit->tail + 1) & (UART_XMIT_SIZE - 1);
+ port->icount.tx++;
++ count--;
++ }
++
++ if (!count && dma_count) {
++ s3c24xx_serial_start_tx_dma(ourport, dma_count);
++ goto out;
+ }
+
+ if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS) {
+@@ -1838,6 +1846,13 @@ static int s3c24xx_serial_probe(struct platform_device *pdev)
+ else if (ourport->info->fifosize)
+ ourport->port.fifosize = ourport->info->fifosize;
+
++ /*
++ * DMA transfers must be aligned at least to cache line size,
++ * so find minimal transfer size suitable for DMA mode
++ */
++ ourport->min_dma_size = max_t(int, ourport->port.fifosize,
++ dma_get_cache_alignment());
++
+ probe_index++;
+
+ dbg("%s: initialising port %p...\n", __func__, ourport);
+diff --git a/drivers/tty/serial/samsung.h b/drivers/tty/serial/samsung.h
+index d275032aa68d..fc5deaa4f382 100644
+--- a/drivers/tty/serial/samsung.h
++++ b/drivers/tty/serial/samsung.h
+@@ -82,6 +82,7 @@ struct s3c24xx_uart_port {
+ unsigned char tx_claimed;
+ unsigned int pm_level;
+ unsigned long baudclk_rate;
++ unsigned int min_dma_size;
+
+ unsigned int rx_irq;
+ unsigned int tx_irq;
+diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
+index 69e769c35cf5..06ecd1e6871c 100644
+--- a/drivers/usb/dwc3/ep0.c
++++ b/drivers/usb/dwc3/ep0.c
+@@ -820,6 +820,11 @@ static void dwc3_ep0_complete_data(struct dwc3 *dwc,
+ unsigned maxp = ep0->endpoint.maxpacket;
+
+ transfer_size += (maxp - (transfer_size % maxp));
++
++ /* Maximum of DWC3_EP0_BOUNCE_SIZE can only be received */
++ if (transfer_size > DWC3_EP0_BOUNCE_SIZE)
++ transfer_size = DWC3_EP0_BOUNCE_SIZE;
++
+ transferred = min_t(u32, ur->length,
+ transfer_size - length);
+ memcpy(ur->buf, dwc->ep0_bounce, transferred);
+@@ -941,11 +946,14 @@ static void __dwc3_ep0_do_control_data(struct dwc3 *dwc,
+ return;
+ }
+
+- WARN_ON(req->request.length > DWC3_EP0_BOUNCE_SIZE);
+-
+ maxpacket = dep->endpoint.maxpacket;
+ transfer_size = roundup(req->request.length, maxpacket);
+
++ if (transfer_size > DWC3_EP0_BOUNCE_SIZE) {
++ dev_WARN(dwc->dev, "bounce buf can't handle req len\n");
++ transfer_size = DWC3_EP0_BOUNCE_SIZE;
++ }
++
+ dwc->ep0_bounced = true;
+
+ /*
+diff --git a/drivers/usb/gadget/function/f_uac2.c b/drivers/usb/gadget/function/f_uac2.c
+index 531861547253..96d935b00504 100644
+--- a/drivers/usb/gadget/function/f_uac2.c
++++ b/drivers/usb/gadget/function/f_uac2.c
+@@ -975,6 +975,29 @@ free_ep(struct uac2_rtd_params *prm, struct usb_ep *ep)
+ "%s:%d Error!\n", __func__, __LINE__);
+ }
+
++static void set_ep_max_packet_size(const struct f_uac2_opts *uac2_opts,
++ struct usb_endpoint_descriptor *ep_desc,
++ unsigned int factor, bool is_playback)
++{
++ int chmask, srate, ssize;
++ u16 max_packet_size;
++
++ if (is_playback) {
++ chmask = uac2_opts->p_chmask;
++ srate = uac2_opts->p_srate;
++ ssize = uac2_opts->p_ssize;
++ } else {
++ chmask = uac2_opts->c_chmask;
++ srate = uac2_opts->c_srate;
++ ssize = uac2_opts->c_ssize;
++ }
++
++ max_packet_size = num_channels(chmask) * ssize *
++ DIV_ROUND_UP(srate, factor / (1 << (ep_desc->bInterval - 1)));
++ ep_desc->wMaxPacketSize = cpu_to_le16(min(max_packet_size,
++ le16_to_cpu(ep_desc->wMaxPacketSize)));
++}
++
+ static int
+ afunc_bind(struct usb_configuration *cfg, struct usb_function *fn)
+ {
+@@ -1070,10 +1093,14 @@ afunc_bind(struct usb_configuration *cfg, struct usb_function *fn)
+ uac2->p_prm.uac2 = uac2;
+ uac2->c_prm.uac2 = uac2;
+
++ /* Calculate wMaxPacketSize according to audio bandwidth */
++ set_ep_max_packet_size(uac2_opts, &fs_epin_desc, 1000, true);
++ set_ep_max_packet_size(uac2_opts, &fs_epout_desc, 1000, false);
++ set_ep_max_packet_size(uac2_opts, &hs_epin_desc, 8000, true);
++ set_ep_max_packet_size(uac2_opts, &hs_epout_desc, 8000, false);
++
+ hs_epout_desc.bEndpointAddress = fs_epout_desc.bEndpointAddress;
+- hs_epout_desc.wMaxPacketSize = fs_epout_desc.wMaxPacketSize;
+ hs_epin_desc.bEndpointAddress = fs_epin_desc.bEndpointAddress;
+- hs_epin_desc.wMaxPacketSize = fs_epin_desc.wMaxPacketSize;
+
+ ret = usb_assign_descriptors(fn, fs_audio_desc, hs_audio_desc, NULL);
+ if (ret)
+diff --git a/drivers/usb/gadget/udc/m66592-udc.c b/drivers/usb/gadget/udc/m66592-udc.c
+index 309706fe4bf0..9704053dfe05 100644
+--- a/drivers/usb/gadget/udc/m66592-udc.c
++++ b/drivers/usb/gadget/udc/m66592-udc.c
+@@ -1052,7 +1052,7 @@ static void set_feature(struct m66592 *m66592, struct usb_ctrlrequest *ctrl)
+ tmp = m66592_read(m66592, M66592_INTSTS0) &
+ M66592_CTSQ;
+ udelay(1);
+- } while (tmp != M66592_CS_IDST || timeout-- > 0);
++ } while (tmp != M66592_CS_IDST && timeout-- > 0);
+
+ if (tmp == M66592_CS_IDST)
+ m66592_bset(m66592,
+diff --git a/drivers/usb/host/ehci-sysfs.c b/drivers/usb/host/ehci-sysfs.c
+index 5e44407aa099..5216f2b09d63 100644
+--- a/drivers/usb/host/ehci-sysfs.c
++++ b/drivers/usb/host/ehci-sysfs.c
+@@ -29,7 +29,7 @@ static ssize_t show_companion(struct device *dev,
+ int count = PAGE_SIZE;
+ char *ptr = buf;
+
+- ehci = hcd_to_ehci(bus_to_hcd(dev_get_drvdata(dev)));
++ ehci = hcd_to_ehci(dev_get_drvdata(dev));
+ nports = HCS_N_PORTS(ehci->hcs_params);
+
+ for (index = 0; index < nports; ++index) {
+@@ -54,7 +54,7 @@ static ssize_t store_companion(struct device *dev,
+ struct ehci_hcd *ehci;
+ int portnum, new_owner;
+
+- ehci = hcd_to_ehci(bus_to_hcd(dev_get_drvdata(dev)));
++ ehci = hcd_to_ehci(dev_get_drvdata(dev));
+ new_owner = PORT_OWNER; /* Owned by companion */
+ if (sscanf(buf, "%d", &portnum) != 1)
+ return -EINVAL;
+@@ -85,7 +85,7 @@ static ssize_t show_uframe_periodic_max(struct device *dev,
+ struct ehci_hcd *ehci;
+ int n;
+
+- ehci = hcd_to_ehci(bus_to_hcd(dev_get_drvdata(dev)));
++ ehci = hcd_to_ehci(dev_get_drvdata(dev));
+ n = scnprintf(buf, PAGE_SIZE, "%d\n", ehci->uframe_periodic_max);
+ return n;
+ }
+@@ -101,7 +101,7 @@ static ssize_t store_uframe_periodic_max(struct device *dev,
+ unsigned long flags;
+ ssize_t ret;
+
+- ehci = hcd_to_ehci(bus_to_hcd(dev_get_drvdata(dev)));
++ ehci = hcd_to_ehci(dev_get_drvdata(dev));
+ if (kstrtouint(buf, 0, &uframe_periodic_max) < 0)
+ return -EINVAL;
+
+diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
+index 4c8b3b82103d..a5a0376bbd48 100644
+--- a/drivers/usb/serial/ftdi_sio.c
++++ b/drivers/usb/serial/ftdi_sio.c
+@@ -605,6 +605,10 @@ static const struct usb_device_id id_table_combined[] = {
+ { USB_DEVICE(FTDI_VID, FTDI_NT_ORIONLXM_PID),
+ .driver_info = (kernel_ulong_t)&ftdi_jtag_quirk },
+ { USB_DEVICE(FTDI_VID, FTDI_SYNAPSE_SS200_PID) },
++ { USB_DEVICE(FTDI_VID, FTDI_CUSTOMWARE_MINIPLEX_PID) },
++ { USB_DEVICE(FTDI_VID, FTDI_CUSTOMWARE_MINIPLEX2_PID) },
++ { USB_DEVICE(FTDI_VID, FTDI_CUSTOMWARE_MINIPLEX2WI_PID) },
++ { USB_DEVICE(FTDI_VID, FTDI_CUSTOMWARE_MINIPLEX3_PID) },
+ /*
+ * ELV devices:
+ */
+diff --git a/drivers/usb/serial/ftdi_sio_ids.h b/drivers/usb/serial/ftdi_sio_ids.h
+index 792e054126de..2943b97b2a83 100644
+--- a/drivers/usb/serial/ftdi_sio_ids.h
++++ b/drivers/usb/serial/ftdi_sio_ids.h
+@@ -568,6 +568,14 @@
+ */
+ #define FTDI_SYNAPSE_SS200_PID 0x9090 /* SS200 - SNAP Stick 200 */
+
++/*
++ * CustomWare / ShipModul NMEA multiplexers product ids (FTDI_VID)
++ */
++#define FTDI_CUSTOMWARE_MINIPLEX_PID 0xfd48 /* MiniPlex first generation NMEA Multiplexer */
++#define FTDI_CUSTOMWARE_MINIPLEX2_PID 0xfd49 /* MiniPlex-USB and MiniPlex-2 series */
++#define FTDI_CUSTOMWARE_MINIPLEX2WI_PID 0xfd4a /* MiniPlex-2Wi */
++#define FTDI_CUSTOMWARE_MINIPLEX3_PID 0xfd4b /* MiniPlex-3 series */
++
+
+ /********************************/
+ /** third-party VID/PID combos **/
+diff --git a/drivers/usb/serial/pl2303.c b/drivers/usb/serial/pl2303.c
+index f5257af33ecf..ae682e4eeaef 100644
+--- a/drivers/usb/serial/pl2303.c
++++ b/drivers/usb/serial/pl2303.c
+@@ -362,21 +362,38 @@ static speed_t pl2303_encode_baud_rate_direct(unsigned char buf[4],
+ static speed_t pl2303_encode_baud_rate_divisor(unsigned char buf[4],
+ speed_t baud)
+ {
+- unsigned int tmp;
++ unsigned int baseline, mantissa, exponent;
+
+ /*
+ * Apparently the formula is:
+- * baudrate = 12M * 32 / (2^buf[1]) / buf[0]
++ * baudrate = 12M * 32 / (mantissa * 4^exponent)
++ * where
++ * mantissa = buf[8:0]
++ * exponent = buf[11:9]
+ */
+- tmp = 12000000 * 32 / baud;
++ baseline = 12000000 * 32;
++ mantissa = baseline / baud;
++ if (mantissa == 0)
++ mantissa = 1; /* Avoid dividing by zero if baud > 32*12M. */
++ exponent = 0;
++ while (mantissa >= 512) {
++ if (exponent < 7) {
++ mantissa >>= 2; /* divide by 4 */
++ exponent++;
++ } else {
++ /* Exponent is maxed. Trim mantissa and leave. */
++ mantissa = 511;
++ break;
++ }
++ }
++
+ buf[3] = 0x80;
+ buf[2] = 0;
+- buf[1] = (tmp >= 256);
+- while (tmp >= 256) {
+- tmp >>= 2;
+- buf[1] <<= 1;
+- }
+- buf[0] = tmp;
++ buf[1] = exponent << 1 | mantissa >> 8;
++ buf[0] = mantissa & 0xff;
++
++ /* Calculate and return the exact baud rate. */
++ baud = (baseline / mantissa) >> (exponent << 1);
+
+ return baud;
+ }
+diff --git a/drivers/usb/serial/qcserial.c b/drivers/usb/serial/qcserial.c
+index d156545728c2..ebcec8cda858 100644
+--- a/drivers/usb/serial/qcserial.c
++++ b/drivers/usb/serial/qcserial.c
+@@ -139,6 +139,7 @@ static const struct usb_device_id id_table[] = {
+ {USB_DEVICE(0x0AF0, 0x8120)}, /* Option GTM681W */
+
+ /* non-Gobi Sierra Wireless devices */
++ {DEVICE_SWI(0x03f0, 0x4e1d)}, /* HP lt4111 LTE/EV-DO/HSPA+ Gobi 4G Module */
+ {DEVICE_SWI(0x0f3d, 0x68a2)}, /* Sierra Wireless MC7700 */
+ {DEVICE_SWI(0x114f, 0x68a2)}, /* Sierra Wireless MC7750 */
+ {DEVICE_SWI(0x1199, 0x68a2)}, /* Sierra Wireless MC7710 */
+diff --git a/drivers/usb/serial/symbolserial.c b/drivers/usb/serial/symbolserial.c
+index 8fceec7298e0..6ed804450a5a 100644
+--- a/drivers/usb/serial/symbolserial.c
++++ b/drivers/usb/serial/symbolserial.c
+@@ -94,7 +94,7 @@ exit:
+
+ static int symbol_open(struct tty_struct *tty, struct usb_serial_port *port)
+ {
+- struct symbol_private *priv = usb_get_serial_data(port->serial);
++ struct symbol_private *priv = usb_get_serial_port_data(port);
+ unsigned long flags;
+ int result = 0;
+
+@@ -120,7 +120,7 @@ static void symbol_close(struct usb_serial_port *port)
+ static void symbol_throttle(struct tty_struct *tty)
+ {
+ struct usb_serial_port *port = tty->driver_data;
+- struct symbol_private *priv = usb_get_serial_data(port->serial);
++ struct symbol_private *priv = usb_get_serial_port_data(port);
+
+ spin_lock_irq(&priv->lock);
+ priv->throttled = true;
+@@ -130,7 +130,7 @@ static void symbol_throttle(struct tty_struct *tty)
+ static void symbol_unthrottle(struct tty_struct *tty)
+ {
+ struct usb_serial_port *port = tty->driver_data;
+- struct symbol_private *priv = usb_get_serial_data(port->serial);
++ struct symbol_private *priv = usb_get_serial_port_data(port);
+ int result;
+ bool was_throttled;
+
+diff --git a/fs/ceph/super.c b/fs/ceph/super.c
+index d1c833c321b9..7b6bfcbf801c 100644
+--- a/fs/ceph/super.c
++++ b/fs/ceph/super.c
+@@ -479,7 +479,7 @@ static int ceph_show_options(struct seq_file *m, struct dentry *root)
+ if (fsopt->max_readdir_bytes != CEPH_MAX_READDIR_BYTES_DEFAULT)
+ seq_printf(m, ",readdir_max_bytes=%d", fsopt->max_readdir_bytes);
+ if (strcmp(fsopt->snapdir_name, CEPH_SNAPDIRNAME_DEFAULT))
+- seq_printf(m, ",snapdirname=%s", fsopt->snapdir_name);
++ seq_show_option(m, "snapdirname", fsopt->snapdir_name);
+
+ return 0;
+ }
+diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
+index 0a9fb6b53126..6a1119e87fbb 100644
+--- a/fs/cifs/cifsfs.c
++++ b/fs/cifs/cifsfs.c
+@@ -394,17 +394,17 @@ cifs_show_options(struct seq_file *s, struct dentry *root)
+ struct sockaddr *srcaddr;
+ srcaddr = (struct sockaddr *)&tcon->ses->server->srcaddr;
+
+- seq_printf(s, ",vers=%s", tcon->ses->server->vals->version_string);
++ seq_show_option(s, "vers", tcon->ses->server->vals->version_string);
+ cifs_show_security(s, tcon->ses);
+ cifs_show_cache_flavor(s, cifs_sb);
+
+ if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_MULTIUSER)
+ seq_puts(s, ",multiuser");
+ else if (tcon->ses->user_name)
+- seq_printf(s, ",username=%s", tcon->ses->user_name);
++ seq_show_option(s, "username", tcon->ses->user_name);
+
+ if (tcon->ses->domainName)
+- seq_printf(s, ",domain=%s", tcon->ses->domainName);
++ seq_show_option(s, "domain", tcon->ses->domainName);
+
+ if (srcaddr->sa_family != AF_UNSPEC) {
+ struct sockaddr_in *saddr4;
+diff --git a/fs/ext4/super.c b/fs/ext4/super.c
+index 58987b5c514b..9981064c4a54 100644
+--- a/fs/ext4/super.c
++++ b/fs/ext4/super.c
+@@ -1763,10 +1763,10 @@ static inline void ext4_show_quota_options(struct seq_file *seq,
+ }
+
+ if (sbi->s_qf_names[USRQUOTA])
+- seq_printf(seq, ",usrjquota=%s", sbi->s_qf_names[USRQUOTA]);
++ seq_show_option(seq, "usrjquota", sbi->s_qf_names[USRQUOTA]);
+
+ if (sbi->s_qf_names[GRPQUOTA])
+- seq_printf(seq, ",grpjquota=%s", sbi->s_qf_names[GRPQUOTA]);
++ seq_show_option(seq, "grpjquota", sbi->s_qf_names[GRPQUOTA]);
+ #endif
+ }
+
+diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
+index 2982445947e1..894fb01a91da 100644
+--- a/fs/gfs2/super.c
++++ b/fs/gfs2/super.c
+@@ -1334,11 +1334,11 @@ static int gfs2_show_options(struct seq_file *s, struct dentry *root)
+ if (is_ancestor(root, sdp->sd_master_dir))
+ seq_puts(s, ",meta");
+ if (args->ar_lockproto[0])
+- seq_printf(s, ",lockproto=%s", args->ar_lockproto);
++ seq_show_option(s, "lockproto", args->ar_lockproto);
+ if (args->ar_locktable[0])
+- seq_printf(s, ",locktable=%s", args->ar_locktable);
++ seq_show_option(s, "locktable", args->ar_locktable);
+ if (args->ar_hostdata[0])
+- seq_printf(s, ",hostdata=%s", args->ar_hostdata);
++ seq_show_option(s, "hostdata", args->ar_hostdata);
+ if (args->ar_spectator)
+ seq_puts(s, ",spectator");
+ if (args->ar_localflocks)
+diff --git a/fs/hfs/super.c b/fs/hfs/super.c
+index 55c03b9e9070..4574fdd3d421 100644
+--- a/fs/hfs/super.c
++++ b/fs/hfs/super.c
+@@ -136,9 +136,9 @@ static int hfs_show_options(struct seq_file *seq, struct dentry *root)
+ struct hfs_sb_info *sbi = HFS_SB(root->d_sb);
+
+ if (sbi->s_creator != cpu_to_be32(0x3f3f3f3f))
+- seq_printf(seq, ",creator=%.4s", (char *)&sbi->s_creator);
++ seq_show_option_n(seq, "creator", (char *)&sbi->s_creator, 4);
+ if (sbi->s_type != cpu_to_be32(0x3f3f3f3f))
+- seq_printf(seq, ",type=%.4s", (char *)&sbi->s_type);
++ seq_show_option_n(seq, "type", (char *)&sbi->s_type, 4);
+ seq_printf(seq, ",uid=%u,gid=%u",
+ from_kuid_munged(&init_user_ns, sbi->s_uid),
+ from_kgid_munged(&init_user_ns, sbi->s_gid));
+diff --git a/fs/hfsplus/options.c b/fs/hfsplus/options.c
+index c90b72ee676d..bb806e58c977 100644
+--- a/fs/hfsplus/options.c
++++ b/fs/hfsplus/options.c
+@@ -218,9 +218,9 @@ int hfsplus_show_options(struct seq_file *seq, struct dentry *root)
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(root->d_sb);
+
+ if (sbi->creator != HFSPLUS_DEF_CR_TYPE)
+- seq_printf(seq, ",creator=%.4s", (char *)&sbi->creator);
++ seq_show_option_n(seq, "creator", (char *)&sbi->creator, 4);
+ if (sbi->type != HFSPLUS_DEF_CR_TYPE)
+- seq_printf(seq, ",type=%.4s", (char *)&sbi->type);
++ seq_show_option_n(seq, "type", (char *)&sbi->type, 4);
+ seq_printf(seq, ",umask=%o,uid=%u,gid=%u", sbi->umask,
+ from_kuid_munged(&init_user_ns, sbi->uid),
+ from_kgid_munged(&init_user_ns, sbi->gid));
+diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
+index 059597b23f67..2ac99db3750e 100644
+--- a/fs/hostfs/hostfs_kern.c
++++ b/fs/hostfs/hostfs_kern.c
+@@ -260,7 +260,7 @@ static int hostfs_show_options(struct seq_file *seq, struct dentry *root)
+ size_t offset = strlen(root_ino) + 1;
+
+ if (strlen(root_path) > offset)
+- seq_printf(seq, ",%s", root_path + offset);
++ seq_show_option(seq, root_path + offset, NULL);
+
+ if (append)
+ seq_puts(seq, ",append");
+diff --git a/fs/hpfs/namei.c b/fs/hpfs/namei.c
+index a0872f239f04..9e92c9c2d319 100644
+--- a/fs/hpfs/namei.c
++++ b/fs/hpfs/namei.c
+@@ -8,6 +8,17 @@
+ #include <linux/sched.h>
+ #include "hpfs_fn.h"
+
++static void hpfs_update_directory_times(struct inode *dir)
++{
++ time_t t = get_seconds();
++ if (t == dir->i_mtime.tv_sec &&
++ t == dir->i_ctime.tv_sec)
++ return;
++ dir->i_mtime.tv_sec = dir->i_ctime.tv_sec = t;
++ dir->i_mtime.tv_nsec = dir->i_ctime.tv_nsec = 0;
++ hpfs_write_inode_nolock(dir);
++}
++
+ static int hpfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+ {
+ const unsigned char *name = dentry->d_name.name;
+@@ -99,6 +110,7 @@ static int hpfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+ result->i_mode = mode | S_IFDIR;
+ hpfs_write_inode_nolock(result);
+ }
++ hpfs_update_directory_times(dir);
+ d_instantiate(dentry, result);
+ hpfs_unlock(dir->i_sb);
+ return 0;
+@@ -187,6 +199,7 @@ static int hpfs_create(struct inode *dir, struct dentry *dentry, umode_t mode, b
+ result->i_mode = mode | S_IFREG;
+ hpfs_write_inode_nolock(result);
+ }
++ hpfs_update_directory_times(dir);
+ d_instantiate(dentry, result);
+ hpfs_unlock(dir->i_sb);
+ return 0;
+@@ -262,6 +275,7 @@ static int hpfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, de
+ insert_inode_hash(result);
+
+ hpfs_write_inode_nolock(result);
++ hpfs_update_directory_times(dir);
+ d_instantiate(dentry, result);
+ brelse(bh);
+ hpfs_unlock(dir->i_sb);
+@@ -340,6 +354,7 @@ static int hpfs_symlink(struct inode *dir, struct dentry *dentry, const char *sy
+ insert_inode_hash(result);
+
+ hpfs_write_inode_nolock(result);
++ hpfs_update_directory_times(dir);
+ d_instantiate(dentry, result);
+ hpfs_unlock(dir->i_sb);
+ return 0;
+@@ -423,6 +438,8 @@ again:
+ out1:
+ hpfs_brelse4(&qbh);
+ out:
++ if (!err)
++ hpfs_update_directory_times(dir);
+ hpfs_unlock(dir->i_sb);
+ return err;
+ }
+@@ -477,6 +494,8 @@ static int hpfs_rmdir(struct inode *dir, struct dentry *dentry)
+ out1:
+ hpfs_brelse4(&qbh);
+ out:
++ if (!err)
++ hpfs_update_directory_times(dir);
+ hpfs_unlock(dir->i_sb);
+ return err;
+ }
+@@ -595,7 +614,7 @@ static int hpfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ goto end1;
+ }
+
+- end:
++end:
+ hpfs_i(i)->i_parent_dir = new_dir->i_ino;
+ if (S_ISDIR(i->i_mode)) {
+ inc_nlink(new_dir);
+@@ -610,6 +629,10 @@ static int hpfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ brelse(bh);
+ }
+ end1:
++ if (!err) {
++ hpfs_update_directory_times(old_dir);
++ hpfs_update_directory_times(new_dir);
++ }
+ hpfs_unlock(i->i_sb);
+ return err;
+ }
+diff --git a/fs/libfs.c b/fs/libfs.c
+index 102edfd39000..c7cbfb092e94 100644
+--- a/fs/libfs.c
++++ b/fs/libfs.c
+@@ -1185,7 +1185,7 @@ void make_empty_dir_inode(struct inode *inode)
+ inode->i_uid = GLOBAL_ROOT_UID;
+ inode->i_gid = GLOBAL_ROOT_GID;
+ inode->i_rdev = 0;
+- inode->i_size = 2;
++ inode->i_size = 0;
+ inode->i_blkbits = PAGE_SHIFT;
+ inode->i_blocks = 0;
+
+diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
+index 719f7f4c7a37..33efa334ec76 100644
+--- a/fs/ocfs2/file.c
++++ b/fs/ocfs2/file.c
+@@ -2372,6 +2372,20 @@ relock:
+ /* buffered aio wouldn't have proper lock coverage today */
+ BUG_ON(written == -EIOCBQUEUED && !(iocb->ki_flags & IOCB_DIRECT));
+
++ /*
++ * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io
++ * function pointer which is called when o_direct io completes so that
++ * it can unlock our rw lock.
++ * Unfortunately there are error cases which call end_io and others
++ * that don't. so we don't have to unlock the rw_lock if either an
++ * async dio is going to do it in the future or an end_io after an
++ * error has already done it.
++ */
++ if ((written == -EIOCBQUEUED) || (!ocfs2_iocb_is_rw_locked(iocb))) {
++ rw_level = -1;
++ unaligned_dio = 0;
++ }
++
+ if (unlikely(written <= 0))
+ goto no_sync;
+
+@@ -2396,20 +2410,6 @@ relock:
+ }
+
+ no_sync:
+- /*
+- * deep in g_f_a_w_n()->ocfs2_direct_IO we pass in a ocfs2_dio_end_io
+- * function pointer which is called when o_direct io completes so that
+- * it can unlock our rw lock.
+- * Unfortunately there are error cases which call end_io and others
+- * that don't. so we don't have to unlock the rw_lock if either an
+- * async dio is going to do it in the future or an end_io after an
+- * error has already done it.
+- */
+- if ((ret == -EIOCBQUEUED) || (!ocfs2_iocb_is_rw_locked(iocb))) {
+- rw_level = -1;
+- unaligned_dio = 0;
+- }
+-
+ if (unaligned_dio) {
+ ocfs2_iocb_clear_unaligned_aio(iocb);
+ mutex_unlock(&OCFS2_I(inode)->ip_unaligned_aio);
+diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
+index 403c5660b306..a482e312c7b2 100644
+--- a/fs/ocfs2/super.c
++++ b/fs/ocfs2/super.c
+@@ -1550,8 +1550,8 @@ static int ocfs2_show_options(struct seq_file *s, struct dentry *root)
+ seq_printf(s, ",localflocks,");
+
+ if (osb->osb_cluster_stack[0])
+- seq_printf(s, ",cluster_stack=%.*s", OCFS2_STACK_LABEL_LEN,
+- osb->osb_cluster_stack);
++ seq_show_option_n(s, "cluster_stack", osb->osb_cluster_stack,
++ OCFS2_STACK_LABEL_LEN);
+ if (opts & OCFS2_MOUNT_USRQUOTA)
+ seq_printf(s, ",usrquota");
+ if (opts & OCFS2_MOUNT_GRPQUOTA)
+diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
+index 7466ff339c66..79073d68b475 100644
+--- a/fs/overlayfs/super.c
++++ b/fs/overlayfs/super.c
+@@ -588,10 +588,10 @@ static int ovl_show_options(struct seq_file *m, struct dentry *dentry)
+ struct super_block *sb = dentry->d_sb;
+ struct ovl_fs *ufs = sb->s_fs_info;
+
+- seq_printf(m, ",lowerdir=%s", ufs->config.lowerdir);
++ seq_show_option(m, "lowerdir", ufs->config.lowerdir);
+ if (ufs->config.upperdir) {
+- seq_printf(m, ",upperdir=%s", ufs->config.upperdir);
+- seq_printf(m, ",workdir=%s", ufs->config.workdir);
++ seq_show_option(m, "upperdir", ufs->config.upperdir);
++ seq_show_option(m, "workdir", ufs->config.workdir);
+ }
+ return 0;
+ }
+diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
+index 0e4cf728126f..4a62fe8cc3bf 100644
+--- a/fs/reiserfs/super.c
++++ b/fs/reiserfs/super.c
+@@ -714,18 +714,20 @@ static int reiserfs_show_options(struct seq_file *seq, struct dentry *root)
+ seq_puts(seq, ",acl");
+
+ if (REISERFS_SB(s)->s_jdev)
+- seq_printf(seq, ",jdev=%s", REISERFS_SB(s)->s_jdev);
++ seq_show_option(seq, "jdev", REISERFS_SB(s)->s_jdev);
+
+ if (journal->j_max_commit_age != journal->j_default_max_commit_age)
+ seq_printf(seq, ",commit=%d", journal->j_max_commit_age);
+
+ #ifdef CONFIG_QUOTA
+ if (REISERFS_SB(s)->s_qf_names[USRQUOTA])
+- seq_printf(seq, ",usrjquota=%s", REISERFS_SB(s)->s_qf_names[USRQUOTA]);
++ seq_show_option(seq, "usrjquota",
++ REISERFS_SB(s)->s_qf_names[USRQUOTA]);
+ else if (opts & (1 << REISERFS_USRQUOTA))
+ seq_puts(seq, ",usrquota");
+ if (REISERFS_SB(s)->s_qf_names[GRPQUOTA])
+- seq_printf(seq, ",grpjquota=%s", REISERFS_SB(s)->s_qf_names[GRPQUOTA]);
++ seq_show_option(seq, "grpjquota",
++ REISERFS_SB(s)->s_qf_names[GRPQUOTA]);
+ else if (opts & (1 << REISERFS_GRPQUOTA))
+ seq_puts(seq, ",grpquota");
+ if (REISERFS_SB(s)->s_jquota_fmt) {
+diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
+index 74bcbabfa523..b14bbd6bb05f 100644
+--- a/fs/xfs/libxfs/xfs_da_format.h
++++ b/fs/xfs/libxfs/xfs_da_format.h
+@@ -680,8 +680,15 @@ typedef struct xfs_attr_leaf_name_remote {
+ typedef struct xfs_attr_leafblock {
+ xfs_attr_leaf_hdr_t hdr; /* constant-structure header block */
+ xfs_attr_leaf_entry_t entries[1]; /* sorted on key, not name */
+- xfs_attr_leaf_name_local_t namelist; /* grows from bottom of buf */
+- xfs_attr_leaf_name_remote_t valuelist; /* grows from bottom of buf */
++ /*
++ * The rest of the block contains the following structures after the
++ * leaf entries, growing from the bottom up. The variables are never
++ * referenced and defining them can actually make gcc optimize away
++ * accesses to the 'entries' array above index 0 so don't do that.
++ *
++ * xfs_attr_leaf_name_local_t namelist;
++ * xfs_attr_leaf_name_remote_t valuelist;
++ */
+ } xfs_attr_leafblock_t;
+
+ /*
+diff --git a/fs/xfs/libxfs/xfs_dir2_data.c b/fs/xfs/libxfs/xfs_dir2_data.c
+index de1ea16f5748..534bbf283d6b 100644
+--- a/fs/xfs/libxfs/xfs_dir2_data.c
++++ b/fs/xfs/libxfs/xfs_dir2_data.c
+@@ -252,7 +252,8 @@ xfs_dir3_data_reada_verify(
+ return;
+ case cpu_to_be32(XFS_DIR2_DATA_MAGIC):
+ case cpu_to_be32(XFS_DIR3_DATA_MAGIC):
+- xfs_dir3_data_verify(bp);
++ bp->b_ops = &xfs_dir3_data_buf_ops;
++ bp->b_ops->verify_read(bp);
+ return;
+ default:
+ xfs_buf_ioerror(bp, -EFSCORRUPTED);
+diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
+index 41b80d3d3877..06bb4218b362 100644
+--- a/fs/xfs/libxfs/xfs_dir2_node.c
++++ b/fs/xfs/libxfs/xfs_dir2_node.c
+@@ -2132,6 +2132,7 @@ xfs_dir2_node_replace(
+ int error; /* error return value */
+ int i; /* btree level */
+ xfs_ino_t inum; /* new inode number */
++ int ftype; /* new file type */
+ xfs_dir2_leaf_t *leaf; /* leaf structure */
+ xfs_dir2_leaf_entry_t *lep; /* leaf entry being changed */
+ int rval; /* internal return value */
+@@ -2145,7 +2146,14 @@ xfs_dir2_node_replace(
+ state = xfs_da_state_alloc();
+ state->args = args;
+ state->mp = args->dp->i_mount;
++
++ /*
++ * We have to save new inode number and ftype since
++ * xfs_da3_node_lookup_int() is going to overwrite them
++ */
+ inum = args->inumber;
++ ftype = args->filetype;
++
+ /*
+ * Lookup the entry to change in the btree.
+ */
+@@ -2183,7 +2191,7 @@ xfs_dir2_node_replace(
+ * Fill in the new inode number and log the entry.
+ */
+ dep->inumber = cpu_to_be64(inum);
+- args->dp->d_ops->data_put_ftype(dep, args->filetype);
++ args->dp->d_ops->data_put_ftype(dep, ftype);
+ xfs_dir2_data_log_entry(args, state->extrablk.bp, dep);
+ rval = 0;
+ }
+diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
+index 3859f5e27a4d..458fced2c0f9 100644
+--- a/fs/xfs/xfs_aops.c
++++ b/fs/xfs/xfs_aops.c
+@@ -356,7 +356,8 @@ xfs_end_bio(
+ {
+ xfs_ioend_t *ioend = bio->bi_private;
+
+- ioend->io_error = test_bit(BIO_UPTODATE, &bio->bi_flags) ? 0 : error;
++ if (!ioend->io_error && !test_bit(BIO_UPTODATE, &bio->bi_flags))
++ ioend->io_error = error;
+
+ /* Toss bio and pass work off to an xfsdatad thread */
+ bio->bi_private = NULL;
+diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
+index 1fb16562c159..bbd9b1f10ffb 100644
+--- a/fs/xfs/xfs_super.c
++++ b/fs/xfs/xfs_super.c
+@@ -511,9 +511,9 @@ xfs_showargs(
+ seq_printf(m, "," MNTOPT_LOGBSIZE "=%dk", mp->m_logbsize >> 10);
+
+ if (mp->m_logname)
+- seq_printf(m, "," MNTOPT_LOGDEV "=%s", mp->m_logname);
++ seq_show_option(m, MNTOPT_LOGDEV, mp->m_logname);
+ if (mp->m_rtname)
+- seq_printf(m, "," MNTOPT_RTDEV "=%s", mp->m_rtname);
++ seq_show_option(m, MNTOPT_RTDEV, mp->m_rtname);
+
+ if (mp->m_dalign > 0)
+ seq_printf(m, "," MNTOPT_SUNIT "=%d",
+diff --git a/include/linux/acpi.h b/include/linux/acpi.h
+index d2445fa9999f..0b2394f61af4 100644
+--- a/include/linux/acpi.h
++++ b/include/linux/acpi.h
+@@ -221,7 +221,7 @@ struct pci_dev;
+
+ int acpi_pci_irq_enable (struct pci_dev *dev);
+ void acpi_penalize_isa_irq(int irq, int active);
+-
++void acpi_penalize_sci_irq(int irq, int trigger, int polarity);
+ void acpi_pci_irq_disable (struct pci_dev *dev);
+
+ extern int ec_read(u8 addr, u8 *val);
+diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h
+index f79148261d16..7bb7f673cb3f 100644
+--- a/include/linux/iio/iio.h
++++ b/include/linux/iio/iio.h
+@@ -645,6 +645,15 @@ int iio_str_to_fixpoint(const char *str, int fract_mult, int *integer,
+ #define IIO_DEGREE_TO_RAD(deg) (((deg) * 314159ULL + 9000000ULL) / 18000000ULL)
+
+ /**
++ * IIO_RAD_TO_DEGREE() - Convert rad to degree
++ * @rad: A value in rad
++ *
++ * Returns the given value converted from rad to degree
++ */
++#define IIO_RAD_TO_DEGREE(rad) \
++ (((rad) * 18000000ULL + 314159ULL / 2) / 314159ULL)
++
++/**
+ * IIO_G_TO_M_S_2() - Convert g to meter / second**2
+ * @g: A value in g
+ *
+@@ -652,4 +661,12 @@ int iio_str_to_fixpoint(const char *str, int fract_mult, int *integer,
+ */
+ #define IIO_G_TO_M_S_2(g) ((g) * 980665ULL / 100000ULL)
+
++/**
++ * IIO_M_S_2_TO_G() - Convert meter / second**2 to g
++ * @ms2: A value in meter / second**2
++ *
++ * Returns the given value converted from meter / second**2 to g
++ */
++#define IIO_M_S_2_TO_G(ms2) (((ms2) * 100000ULL + 980665ULL / 2) / 980665ULL)
++
+ #endif /* _INDUSTRIAL_IO_H_ */
+diff --git a/include/linux/pci.h b/include/linux/pci.h
+index 860c751810fc..1d4eb6057f72 100644
+--- a/include/linux/pci.h
++++ b/include/linux/pci.h
+@@ -180,6 +180,8 @@ enum pci_dev_flags {
+ PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 6),
+ /* Do not use PM reset even if device advertises NoSoftRst- */
+ PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
++ /* Get VPD from function 0 VPD */
++ PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+ };
+
+ enum pci_irq_reroute_variant {
+diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h
+index 912a7c482649..d4c7271382cb 100644
+--- a/include/linux/seq_file.h
++++ b/include/linux/seq_file.h
+@@ -149,6 +149,41 @@ static inline struct user_namespace *seq_user_ns(struct seq_file *seq)
+ #endif
+ }
+
++/**
++ * seq_show_option - display mount options with appropriate escapes.
++ * @m: the seq_file handle
++ * @name: the mount option name
++ * @value: the mount option name's value, can be NULL
++ */
++static inline void seq_show_option(struct seq_file *m, const char *name,
++ const char *value)
++{
++ seq_putc(m, ',');
++ seq_escape(m, name, ",= \t\n\\");
++ if (value) {
++ seq_putc(m, '=');
++ seq_escape(m, value, ", \t\n\\");
++ }
++}
++
++/**
++ * seq_show_option_n - display mount options with appropriate escapes
++ * where @value must be a specific length.
++ * @m: the seq_file handle
++ * @name: the mount option name
++ * @value: the mount option name's value, cannot be NULL
++ * @length: the length of @value to display
++ *
++ * This is a macro since this uses "length" to define the size of the
++ * stack buffer.
++ */
++#define seq_show_option_n(m, name, value, length) { \
++ char val_buf[length + 1]; \
++ strncpy(val_buf, value, length); \
++ val_buf[length] = '\0'; \
++ seq_show_option(m, name, val_buf); \
++}
++
+ #define SEQ_START_TOKEN ((void *)1)
+ /*
+ * Helpers for iteration over list_head-s in seq_files
+diff --git a/include/uapi/linux/dm-ioctl.h b/include/uapi/linux/dm-ioctl.h
+index 061aca3a962d..d34611e35a30 100644
+--- a/include/uapi/linux/dm-ioctl.h
++++ b/include/uapi/linux/dm-ioctl.h
+@@ -267,9 +267,9 @@ enum {
+ #define DM_DEV_SET_GEOMETRY _IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl)
+
+ #define DM_VERSION_MAJOR 4
+-#define DM_VERSION_MINOR 32
++#define DM_VERSION_MINOR 33
+ #define DM_VERSION_PATCHLEVEL 0
+-#define DM_VERSION_EXTRA "-ioctl (2015-6-26)"
++#define DM_VERSION_EXTRA "-ioctl (2015-8-18)"
+
+ /* Status bits */
+ #define DM_READONLY_FLAG (1 << 0) /* In/Out */
+diff --git a/kernel/cgroup.c b/kernel/cgroup.c
+index f89d9292eee6..c6c4240e7d28 100644
+--- a/kernel/cgroup.c
++++ b/kernel/cgroup.c
+@@ -1334,7 +1334,7 @@ static int cgroup_show_options(struct seq_file *seq,
+
+ for_each_subsys(ss, ssid)
+ if (root->subsys_mask & (1 << ssid))
+- seq_printf(seq, ",%s", ss->name);
++ seq_show_option(seq, ss->name, NULL);
+ if (root->flags & CGRP_ROOT_NOPREFIX)
+ seq_puts(seq, ",noprefix");
+ if (root->flags & CGRP_ROOT_XATTR)
+@@ -1342,13 +1342,14 @@ static int cgroup_show_options(struct seq_file *seq,
+
+ spin_lock(&release_agent_path_lock);
+ if (strlen(root->release_agent_path))
+- seq_printf(seq, ",release_agent=%s", root->release_agent_path);
++ seq_show_option(seq, "release_agent",
++ root->release_agent_path);
+ spin_unlock(&release_agent_path_lock);
+
+ if (test_bit(CGRP_CPUSET_CLONE_CHILDREN, &root->cgrp.flags))
+ seq_puts(seq, ",clone_children");
+ if (strlen(root->name))
+- seq_printf(seq, ",name=%s", root->name);
++ seq_show_option(seq, "name", root->name);
+ return 0;
+ }
+
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index 78b4bad10081..e9673433cc01 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -5433,6 +5433,14 @@ static int sched_cpu_active(struct notifier_block *nfb,
+ case CPU_STARTING:
+ set_cpu_rq_start_time();
+ return NOTIFY_OK;
++ case CPU_ONLINE:
++ /*
++ * At this point a starting CPU has marked itself as online via
++ * set_cpu_online(). But it might not yet have marked itself
++ * as active, which is essential from here on.
++ *
++ * Thus, fall-through and help the starting CPU along.
++ */
+ case CPU_DOWN_FAILED:
+ set_cpu_active((long)hcpu, true);
+ return NOTIFY_OK;
+diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
+index 6da82bcb0a8b..8fd97dac538a 100644
+--- a/mm/memory_hotplug.c
++++ b/mm/memory_hotplug.c
+@@ -1248,6 +1248,14 @@ int __ref add_memory(int nid, u64 start, u64 size)
+
+ mem_hotplug_begin();
+
++ /*
++ * Add new range to memblock so that when hotadd_new_pgdat() is called
++ * to allocate new pgdat, get_pfn_range_for_nid() will be able to find
++ * this new range and calculate total pages correctly. The range will
++ * be removed at hot-remove time.
++ */
++ memblock_add_node(start, size, nid);
++
+ new_node = !node_online(nid);
+ if (new_node) {
+ pgdat = hotadd_new_pgdat(nid, start);
+@@ -1277,7 +1285,6 @@ int __ref add_memory(int nid, u64 start, u64 size)
+
+ /* create new memmap entry */
+ firmware_map_add_hotplug(start, start + size, "System RAM");
+- memblock_add_node(start, size, nid);
+
+ goto out;
+
+@@ -1286,6 +1293,7 @@ error:
+ if (new_pgdat)
+ rollback_node_hotadd(nid, pgdat);
+ release_memory_resource(res);
++ memblock_remove(start, size);
+
+ out:
+ mem_hotplug_done();
+diff --git a/net/ceph/ceph_common.c b/net/ceph/ceph_common.c
+index f30329f72641..69a4d30a9ccf 100644
+--- a/net/ceph/ceph_common.c
++++ b/net/ceph/ceph_common.c
+@@ -517,8 +517,11 @@ int ceph_print_client_options(struct seq_file *m, struct ceph_client *client)
+ struct ceph_options *opt = client->options;
+ size_t pos = m->count;
+
+- if (opt->name)
+- seq_printf(m, "name=%s,", opt->name);
++ if (opt->name) {
++ seq_puts(m, "name=");
++ seq_escape(m, opt->name, ", \t\n\\");
++ seq_putc(m, ',');
++ }
+ if (opt->key)
+ seq_puts(m, "secret=<hidden>,");
+
+diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
+index 564079c5c49d..cdf4c589a391 100644
+--- a/security/selinux/hooks.c
++++ b/security/selinux/hooks.c
+@@ -1100,7 +1100,7 @@ static void selinux_write_opts(struct seq_file *m,
+ seq_puts(m, prefix);
+ if (has_comma)
+ seq_putc(m, '\"');
+- seq_puts(m, opts->mnt_opts[i]);
++ seq_escape(m, opts->mnt_opts[i], "\"\n\\");
+ if (has_comma)
+ seq_putc(m, '\"');
+ }
+diff --git a/sound/soc/codecs/adav80x.c b/sound/soc/codecs/adav80x.c
+index 36d842570745..69c63b92e078 100644
+--- a/sound/soc/codecs/adav80x.c
++++ b/sound/soc/codecs/adav80x.c
+@@ -865,7 +865,6 @@ const struct regmap_config adav80x_regmap_config = {
+ .val_bits = 8,
+ .pad_bits = 1,
+ .reg_bits = 7,
+- .read_flag_mask = 0x01,
+
+ .max_register = ADAV80X_PLL_OUTE,
+
+diff --git a/sound/soc/codecs/arizona.c b/sound/soc/codecs/arizona.c
+index 802e05eae3e9..4180827a8480 100644
+--- a/sound/soc/codecs/arizona.c
++++ b/sound/soc/codecs/arizona.c
+@@ -1756,17 +1756,6 @@ int arizona_init_dai(struct arizona_priv *priv, int id)
+ }
+ EXPORT_SYMBOL_GPL(arizona_init_dai);
+
+-static irqreturn_t arizona_fll_clock_ok(int irq, void *data)
+-{
+- struct arizona_fll *fll = data;
+-
+- arizona_fll_dbg(fll, "clock OK\n");
+-
+- complete(&fll->ok);
+-
+- return IRQ_HANDLED;
+-}
+-
+ static struct {
+ unsigned int min;
+ unsigned int max;
+@@ -2048,17 +2037,18 @@ static int arizona_is_enabled_fll(struct arizona_fll *fll)
+ static int arizona_enable_fll(struct arizona_fll *fll)
+ {
+ struct arizona *arizona = fll->arizona;
+- unsigned long time_left;
+ bool use_sync = false;
+ int already_enabled = arizona_is_enabled_fll(fll);
+ struct arizona_fll_cfg cfg;
++ int i;
++ unsigned int val;
+
+ if (already_enabled < 0)
+ return already_enabled;
+
+ if (already_enabled) {
+ /* Facilitate smooth refclk across the transition */
+- regmap_update_bits_async(fll->arizona->regmap, fll->base + 0x7,
++ regmap_update_bits_async(fll->arizona->regmap, fll->base + 0x9,
+ ARIZONA_FLL1_GAIN_MASK, 0);
+ regmap_update_bits_async(fll->arizona->regmap, fll->base + 1,
+ ARIZONA_FLL1_FREERUN,
+@@ -2110,9 +2100,6 @@ static int arizona_enable_fll(struct arizona_fll *fll)
+ if (!already_enabled)
+ pm_runtime_get(arizona->dev);
+
+- /* Clear any pending completions */
+- try_wait_for_completion(&fll->ok);
+-
+ regmap_update_bits_async(arizona->regmap, fll->base + 1,
+ ARIZONA_FLL1_ENA, ARIZONA_FLL1_ENA);
+ if (use_sync)
+@@ -2124,10 +2111,24 @@ static int arizona_enable_fll(struct arizona_fll *fll)
+ regmap_update_bits_async(arizona->regmap, fll->base + 1,
+ ARIZONA_FLL1_FREERUN, 0);
+
+- time_left = wait_for_completion_timeout(&fll->ok,
+- msecs_to_jiffies(250));
+- if (time_left == 0)
++ arizona_fll_dbg(fll, "Waiting for FLL lock...\n");
++ val = 0;
++ for (i = 0; i < 15; i++) {
++ if (i < 5)
++ usleep_range(200, 400);
++ else
++ msleep(20);
++
++ regmap_read(arizona->regmap,
++ ARIZONA_INTERRUPT_RAW_STATUS_5,
++ &val);
++ if (val & (ARIZONA_FLL1_CLOCK_OK_STS << (fll->id - 1)))
++ break;
++ }
++ if (i == 15)
+ arizona_fll_warn(fll, "Timed out waiting for lock\n");
++ else
++ arizona_fll_dbg(fll, "FLL locked (%d polls)\n", i);
+
+ return 0;
+ }
+@@ -2212,11 +2213,8 @@ EXPORT_SYMBOL_GPL(arizona_set_fll);
+ int arizona_init_fll(struct arizona *arizona, int id, int base, int lock_irq,
+ int ok_irq, struct arizona_fll *fll)
+ {
+- int ret;
+ unsigned int val;
+
+- init_completion(&fll->ok);
+-
+ fll->id = id;
+ fll->base = base;
+ fll->arizona = arizona;
+@@ -2238,13 +2236,6 @@ int arizona_init_fll(struct arizona *arizona, int id, int base, int lock_irq,
+ snprintf(fll->clock_ok_name, sizeof(fll->clock_ok_name),
+ "FLL%d clock OK", id);
+
+- ret = arizona_request_irq(arizona, ok_irq, fll->clock_ok_name,
+- arizona_fll_clock_ok, fll);
+- if (ret != 0) {
+- dev_err(arizona->dev, "Failed to get FLL%d clock OK IRQ: %d\n",
+- id, ret);
+- }
+-
+ regmap_update_bits(arizona->regmap, fll->base + 1,
+ ARIZONA_FLL1_FREERUN, 0);
+
+diff --git a/sound/soc/codecs/arizona.h b/sound/soc/codecs/arizona.h
+index 43deb0462309..36867d05e0bb 100644
+--- a/sound/soc/codecs/arizona.h
++++ b/sound/soc/codecs/arizona.h
+@@ -242,7 +242,6 @@ struct arizona_fll {
+ int id;
+ unsigned int base;
+ unsigned int vco_mult;
+- struct completion ok;
+
+ unsigned int fout;
+ int sync_src;
+diff --git a/sound/soc/codecs/rt5640.c b/sound/soc/codecs/rt5640.c
+index 9bc78e57513d..ff72cd8c236e 100644
+--- a/sound/soc/codecs/rt5640.c
++++ b/sound/soc/codecs/rt5640.c
+@@ -984,6 +984,35 @@ static int rt5640_hp_event(struct snd_soc_dapm_widget *w,
+ return 0;
+ }
+
++static int rt5640_lout_event(struct snd_soc_dapm_widget *w,
++ struct snd_kcontrol *kcontrol, int event)
++{
++ struct snd_soc_codec *codec = snd_soc_dapm_to_codec(w->dapm);
++
++ switch (event) {
++ case SND_SOC_DAPM_POST_PMU:
++ hp_amp_power_on(codec);
++ snd_soc_update_bits(codec, RT5640_PWR_ANLG1,
++ RT5640_PWR_LM, RT5640_PWR_LM);
++ snd_soc_update_bits(codec, RT5640_OUTPUT,
++ RT5640_L_MUTE | RT5640_R_MUTE, 0);
++ break;
++
++ case SND_SOC_DAPM_PRE_PMD:
++ snd_soc_update_bits(codec, RT5640_OUTPUT,
++ RT5640_L_MUTE | RT5640_R_MUTE,
++ RT5640_L_MUTE | RT5640_R_MUTE);
++ snd_soc_update_bits(codec, RT5640_PWR_ANLG1,
++ RT5640_PWR_LM, 0);
++ break;
++
++ default:
++ return 0;
++ }
++
++ return 0;
++}
++
+ static int rt5640_hp_power_event(struct snd_soc_dapm_widget *w,
+ struct snd_kcontrol *kcontrol, int event)
+ {
+@@ -1179,13 +1208,16 @@ static const struct snd_soc_dapm_widget rt5640_dapm_widgets[] = {
+ 0, rt5640_spo_l_mix, ARRAY_SIZE(rt5640_spo_l_mix)),
+ SND_SOC_DAPM_MIXER("SPOR MIX", SND_SOC_NOPM, 0,
+ 0, rt5640_spo_r_mix, ARRAY_SIZE(rt5640_spo_r_mix)),
+- SND_SOC_DAPM_MIXER("LOUT MIX", RT5640_PWR_ANLG1, RT5640_PWR_LM_BIT, 0,
++ SND_SOC_DAPM_MIXER("LOUT MIX", SND_SOC_NOPM, 0, 0,
+ rt5640_lout_mix, ARRAY_SIZE(rt5640_lout_mix)),
+ SND_SOC_DAPM_SUPPLY_S("Improve HP Amp Drv", 1, SND_SOC_NOPM,
+ 0, 0, rt5640_hp_power_event, SND_SOC_DAPM_POST_PMU),
+ SND_SOC_DAPM_PGA_S("HP Amp", 1, SND_SOC_NOPM, 0, 0,
+ rt5640_hp_event,
+ SND_SOC_DAPM_PRE_PMD | SND_SOC_DAPM_POST_PMU),
++ SND_SOC_DAPM_PGA_S("LOUT amp", 1, SND_SOC_NOPM, 0, 0,
++ rt5640_lout_event,
++ SND_SOC_DAPM_PRE_PMD | SND_SOC_DAPM_POST_PMU),
+ SND_SOC_DAPM_SUPPLY("HP L Amp", RT5640_PWR_ANLG1,
+ RT5640_PWR_HP_L_BIT, 0, NULL, 0),
+ SND_SOC_DAPM_SUPPLY("HP R Amp", RT5640_PWR_ANLG1,
+@@ -1500,8 +1532,10 @@ static const struct snd_soc_dapm_route rt5640_dapm_routes[] = {
+ {"HP R Playback", "Switch", "HP Amp"},
+ {"HPOL", NULL, "HP L Playback"},
+ {"HPOR", NULL, "HP R Playback"},
+- {"LOUTL", NULL, "LOUT MIX"},
+- {"LOUTR", NULL, "LOUT MIX"},
++
++ {"LOUT amp", NULL, "LOUT MIX"},
++ {"LOUTL", NULL, "LOUT amp"},
++ {"LOUTR", NULL, "LOUT amp"},
+ };
+
+ static const struct snd_soc_dapm_route rt5640_specific_dapm_routes[] = {
+diff --git a/sound/soc/codecs/rt5645.c b/sound/soc/codecs/rt5645.c
+index 961bd7e5877e..58713733d314 100644
+--- a/sound/soc/codecs/rt5645.c
++++ b/sound/soc/codecs/rt5645.c
+@@ -3232,6 +3232,13 @@ static struct dmi_system_id dmi_platform_intel_braswell[] = {
+ DMI_MATCH(DMI_PRODUCT_NAME, "Strago"),
+ },
+ },
++ {
++ .ident = "Google Celes",
++ .callback = strago_quirk_cb,
++ .matches = {
++ DMI_MATCH(DMI_PRODUCT_NAME, "Celes"),
++ },
++ },
+ { }
+ };
+
+diff --git a/sound/soc/samsung/arndale_rt5631.c b/sound/soc/samsung/arndale_rt5631.c
+index 8bf2e2c4bafb..9e371eb3e4fa 100644
+--- a/sound/soc/samsung/arndale_rt5631.c
++++ b/sound/soc/samsung/arndale_rt5631.c
+@@ -116,15 +116,6 @@ static int arndale_audio_probe(struct platform_device *pdev)
+ return ret;
+ }
+
+-static int arndale_audio_remove(struct platform_device *pdev)
+-{
+- struct snd_soc_card *card = platform_get_drvdata(pdev);
+-
+- snd_soc_unregister_card(card);
+-
+- return 0;
+-}
+-
+ static const struct of_device_id samsung_arndale_rt5631_of_match[] __maybe_unused = {
+ { .compatible = "samsung,arndale-rt5631", },
+ { .compatible = "samsung,arndale-alc5631", },
+@@ -139,7 +130,6 @@ static struct platform_driver arndale_audio_driver = {
+ .of_match_table = of_match_ptr(samsung_arndale_rt5631_of_match),
+ },
+ .probe = arndale_audio_probe,
+- .remove = arndale_audio_remove,
+ };
+
+ module_platform_driver(arndale_audio_driver);
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-22 11:43 Mike Pagano
From: Mike Pagano @ 2015-09-22 11:43 UTC
To: gentoo-commits
commit: 5d62f231ba9e82e14ed7c8a7e0117cec4fa5973d
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Tue Sep 22 11:43:09 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Tue Sep 22 11:43:09 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=5d62f231
Remove BFQ because of compilation issues. I will add the next working version once it is released.
0000_README | 12 -
...roups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch | 103 -
...introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1 | 7026 --------------------
...Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch | 1097 ---
4 files changed, 8238 deletions(-)
diff --git a/0000_README b/0000_README
index 0c6168a..7050114 100644
--- a/0000_README
+++ b/0000_README
@@ -79,18 +79,6 @@ Patch: 5000_enable-additional-cpu-optimizations-for-gcc.patch
From: https://github.com/graysky2/kernel_gcc_patch/
Desc: Kernel patch enables gcc < v4.9 optimizations for additional CPUs.
-Patch: 5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch
-From: http://algo.ing.unimo.it/people/paolo/disk_sched/
-Desc: BFQ v7r9 patch 1 for 4.2: Build, cgroups and kconfig bits
-
-Patch: 5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1
-From: http://algo.ing.unimo.it/people/paolo/disk_sched/
-Desc: BFQ v7r9 patch 2 for 4.2: BFQ Scheduler
-
-Patch: 5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.0.patch
-From: http://algo.ing.unimo.it/people/paolo/disk_sched/
-Desc: BFQ v7r9 patch 3 for 4.2: Early Queue Merge (EQM)
-
Patch: 5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
From: https://github.com/graysky2/kernel_gcc_patch/
Desc: Kernel patch enables gcc >= v4.9 optimizations for additional CPUs.
diff --git a/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch b/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch
deleted file mode 100644
index fc7ef8e..0000000
--- a/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r9-4.2.patch
+++ /dev/null
@@ -1,103 +0,0 @@
-From f53ecde45f8d40a343aa5b5195e9f0944b7a1a37 Mon Sep 17 00:00:00 2001
-From: Paolo Valente <paolo.valente@unimore.it>
-Date: Tue, 7 Apr 2015 13:39:12 +0200
-Subject: [PATCH 1/3] block: cgroups, kconfig, build bits for BFQ-v7r9-4.2
-
-Update Kconfig.iosched and do the related Makefile changes to include
-kernel configuration options for BFQ. Also increase the number of
-policies supported by the blkio controller so that BFQ can add its
-own.
-
-Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
-Signed-off-by: Arianna Avanzini <avanzini@google.com>
----
- block/Kconfig.iosched | 32 ++++++++++++++++++++++++++++++++
- block/Makefile | 1 +
- include/linux/blkdev.h | 2 +-
- 3 files changed, 34 insertions(+), 1 deletion(-)
-
-diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
-index 421bef9..0ee5f0f 100644
---- a/block/Kconfig.iosched
-+++ b/block/Kconfig.iosched
-@@ -39,6 +39,27 @@ config CFQ_GROUP_IOSCHED
- ---help---
- Enable group IO scheduling in CFQ.
-
-+config IOSCHED_BFQ
-+ tristate "BFQ I/O scheduler"
-+ default n
-+ ---help---
-+ The BFQ I/O scheduler tries to distribute bandwidth among
-+ all processes according to their weights.
-+ It aims at distributing the bandwidth as desired, independently of
-+ the disk parameters and with any workload. It also tries to
-+ guarantee low latency to interactive and soft real-time
-+ applications. If compiled built-in (saying Y here), BFQ can
-+ be configured to support hierarchical scheduling.
-+
-+config CGROUP_BFQIO
-+ bool "BFQ hierarchical scheduling support"
-+ depends on CGROUPS && IOSCHED_BFQ=y
-+ default n
-+ ---help---
-+ Enable hierarchical scheduling in BFQ, using the cgroups
-+ filesystem interface. The name of the subsystem will be
-+ bfqio.
-+
- choice
- prompt "Default I/O scheduler"
- default DEFAULT_CFQ
-@@ -52,6 +73,16 @@ choice
- config DEFAULT_CFQ
- bool "CFQ" if IOSCHED_CFQ=y
-
-+ config DEFAULT_BFQ
-+ bool "BFQ" if IOSCHED_BFQ=y
-+ help
-+ Selects BFQ as the default I/O scheduler which will be
-+ used by default for all block devices.
-+ The BFQ I/O scheduler aims at distributing the bandwidth
-+ as desired, independently of the disk parameters and with
-+ any workload. It also tries to guarantee low latency to
-+ interactive and soft real-time applications.
-+
- config DEFAULT_NOOP
- bool "No-op"
-
-@@ -61,6 +92,7 @@ config DEFAULT_IOSCHED
- string
- default "deadline" if DEFAULT_DEADLINE
- default "cfq" if DEFAULT_CFQ
-+ default "bfq" if DEFAULT_BFQ
- default "noop" if DEFAULT_NOOP
-
- endmenu
-diff --git a/block/Makefile b/block/Makefile
-index 00ecc97..1ed86d5 100644
---- a/block/Makefile
-+++ b/block/Makefile
-@@ -18,6 +18,7 @@ obj-$(CONFIG_BLK_DEV_THROTTLING) += blk-throttle.o
- obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
- obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
- obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
-+obj-$(CONFIG_IOSCHED_BFQ) += bfq-iosched.o
-
- obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
- obj-$(CONFIG_BLK_CMDLINE_PARSER) += cmdline-parser.o
-diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
-index a622f27..e2b4c03 100644
---- a/include/linux/blkdev.h
-+++ b/include/linux/blkdev.h
-@@ -43,7 +43,7 @@ struct blk_flush_queue;
- * Maximum number of blkcg policies allowed to be registered concurrently.
- * Defined here to simplify include dependency.
- */
--#define BLKCG_MAX_POLS 2
-+#define BLKCG_MAX_POLS 3
-
- struct request;
- typedef void (rq_end_io_fn)(struct request *, int);
---
-2.1.4
-
diff --git a/5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1 b/5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1
deleted file mode 100644
index 04dd37c..0000000
--- a/5002_block-introduce-the-BFQ-v7r9-I-O-sched-for-4.2.patch1
+++ /dev/null
@@ -1,7026 +0,0 @@
-From 152cacc8a71a6cd7fe8cedc1110a378721e66ffa Mon Sep 17 00:00:00 2001
-From: Paolo Valente <paolo.valente@unimore.it>
-Date: Thu, 9 May 2013 19:10:02 +0200
-Subject: [PATCH 2/3] block: introduce the BFQ-v7r9 I/O sched for 4.2
-
-Add the BFQ-v7r9 I/O scheduler to 4.2.
-The general structure is borrowed from CFQ, as much of the code for
-handling I/O contexts. Over time, several useful features have been
-ported from CFQ as well (details in the changelog in README.BFQ). A
-(bfq_)queue is associated to each task doing I/O on a device, and each
-time a scheduling decision has to be made a queue is selected and served
-until it expires.
-
- - Slices are given in the service domain: tasks are assigned
- budgets, measured in number of sectors. Once got the disk, a task
- must however consume its assigned budget within a configurable
- maximum time (by default, the maximum possible value of the
- budgets is automatically computed to comply with this timeout).
- This allows the desired latency vs "throughput boosting" tradeoff
- to be set.
-
- - Budgets are scheduled according to a variant of WF2Q+, implemented
- using an augmented rb-tree to take eligibility into account while
- preserving an O(log N) overall complexity.
-
- - A low-latency tunable is provided; if enabled, both interactive
- and soft real-time applications are guaranteed a very low latency.
-
- - Latency guarantees are preserved also in the presence of NCQ.
-
- - Also with flash-based devices, a high throughput is achieved
- while still preserving latency guarantees.
-
- - BFQ features Early Queue Merge (EQM), a sort of fusion of the
- cooperating-queue-merging and the preemption mechanisms present
- in CFQ. EQM is in fact a unified mechanism that tries to get a
- sequential read pattern, and hence a high throughput, with any
- set of processes performing interleaved I/O over a contiguous
- sequence of sectors.
-
- - BFQ supports full hierarchical scheduling, exporting a cgroups
- interface. Since each node has a full scheduler, each group can
- be assigned its own weight.
-
- - If the cgroups interface is not used, only I/O priorities can be
- assigned to processes, with ioprio values mapped to weights
- with the relation weight = IOPRIO_BE_NR - ioprio.
-
- - ioprio classes are served in strict priority order, i.e., lower
- priority queues are not served as long as there are higher
- priority queues. Among queues in the same class the bandwidth is
- distributed in proportion to the weight of each queue. A very
- thin extra bandwidth is however guaranteed to the Idle class, to
- prevent it from starving.
-
-Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
-Signed-off-by: Arianna Avanzini <avanzini@google.com>
----
- block/Kconfig.iosched | 6 +-
- block/bfq-cgroup.c | 1108 +++++++++++++++
- block/bfq-ioc.c | 36 +
- block/bfq-iosched.c | 3753 +++++++++++++++++++++++++++++++++++++++++++++++++
- block/bfq-sched.c | 1197 ++++++++++++++++
- block/bfq.h | 807 +++++++++++
- 6 files changed, 6903 insertions(+), 4 deletions(-)
- create mode 100644 block/bfq-cgroup.c
- create mode 100644 block/bfq-ioc.c
- create mode 100644 block/bfq-iosched.c
- create mode 100644 block/bfq-sched.c
- create mode 100644 block/bfq.h
-
-diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
-index 0ee5f0f..f78cd1a 100644
---- a/block/Kconfig.iosched
-+++ b/block/Kconfig.iosched
-@@ -51,14 +51,12 @@ config IOSCHED_BFQ
- applications. If compiled built-in (saying Y here), BFQ can
- be configured to support hierarchical scheduling.
-
--config CGROUP_BFQIO
-+config BFQ_GROUP_IOSCHED
- bool "BFQ hierarchical scheduling support"
- depends on CGROUPS && IOSCHED_BFQ=y
- default n
- ---help---
-- Enable hierarchical scheduling in BFQ, using the cgroups
-- filesystem interface. The name of the subsystem will be
-- bfqio.
-+ Enable hierarchical scheduling in BFQ, using the blkio controller.
-
- choice
- prompt "Default I/O scheduler"
-diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
-new file mode 100644
-index 0000000..c02d65a
---- /dev/null
-+++ b/block/bfq-cgroup.c
-@@ -0,0 +1,1108 @@
-+/*
-+ * BFQ: CGROUPS support.
-+ *
-+ * Based on ideas and code from CFQ:
-+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
-+ *
-+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
-+ * Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
-+ * file.
-+ */
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+
-+/* bfqg stats flags */
-+enum bfqg_stats_flags {
-+ BFQG_stats_waiting = 0,
-+ BFQG_stats_idling,
-+ BFQG_stats_empty,
-+};
-+
-+#define BFQG_FLAG_FNS(name) \
-+static void bfqg_stats_mark_##name(struct bfqg_stats *stats) \
-+{ \
-+ stats->flags |= (1 << BFQG_stats_##name); \
-+} \
-+static void bfqg_stats_clear_##name(struct bfqg_stats *stats) \
-+{ \
-+ stats->flags &= ~(1 << BFQG_stats_##name); \
-+} \
-+static int bfqg_stats_##name(struct bfqg_stats *stats) \
-+{ \
-+ return (stats->flags & (1 << BFQG_stats_##name)) != 0; \
-+} \
-+
-+BFQG_FLAG_FNS(waiting)
-+BFQG_FLAG_FNS(idling)
-+BFQG_FLAG_FNS(empty)
-+#undef BFQG_FLAG_FNS
-+
-+/* This should be called with the queue_lock held. */
-+static void bfqg_stats_update_group_wait_time(struct bfqg_stats *stats)
-+{
-+ unsigned long long now;
-+
-+ if (!bfqg_stats_waiting(stats))
-+ return;
-+
-+ now = sched_clock();
-+ if (time_after64(now, stats->start_group_wait_time))
-+ blkg_stat_add(&stats->group_wait_time,
-+ now - stats->start_group_wait_time);
-+ bfqg_stats_clear_waiting(stats);
-+}
-+
-+/* This should be called with the queue_lock held. */
-+static void bfqg_stats_set_start_group_wait_time(struct bfq_group *bfqg,
-+ struct bfq_group *curr_bfqg)
-+{
-+ struct bfqg_stats *stats = &bfqg->stats;
-+
-+ if (bfqg_stats_waiting(stats))
-+ return;
-+ if (bfqg == curr_bfqg)
-+ return;
-+ stats->start_group_wait_time = sched_clock();
-+ bfqg_stats_mark_waiting(stats);
-+}
-+
-+/* This should be called with the queue_lock held. */
-+static void bfqg_stats_end_empty_time(struct bfqg_stats *stats)
-+{
-+ unsigned long long now;
-+
-+ if (!bfqg_stats_empty(stats))
-+ return;
-+
-+ now = sched_clock();
-+ if (time_after64(now, stats->start_empty_time))
-+ blkg_stat_add(&stats->empty_time,
-+ now - stats->start_empty_time);
-+ bfqg_stats_clear_empty(stats);
-+}
-+
-+static void bfqg_stats_update_dequeue(struct bfq_group *bfqg)
-+{
-+ blkg_stat_add(&bfqg->stats.dequeue, 1);
-+}
-+
-+static void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg)
-+{
-+ struct bfqg_stats *stats = &bfqg->stats;
-+
-+ if (blkg_rwstat_total(&stats->queued))
-+ return;
-+
-+ /*
-+ * group is already marked empty. This can happen if bfqq got new
-+ * request in parent group and moved to this group while being added
-+ * to service tree. Just ignore the event and move on.
-+ */
-+ if (bfqg_stats_empty(stats))
-+ return;
-+
-+ stats->start_empty_time = sched_clock();
-+ bfqg_stats_mark_empty(stats);
-+}
-+
-+static void bfqg_stats_update_idle_time(struct bfq_group *bfqg)
-+{
-+ struct bfqg_stats *stats = &bfqg->stats;
-+
-+ if (bfqg_stats_idling(stats)) {
-+ unsigned long long now = sched_clock();
-+
-+ if (time_after64(now, stats->start_idle_time))
-+ blkg_stat_add(&stats->idle_time,
-+ now - stats->start_idle_time);
-+ bfqg_stats_clear_idling(stats);
-+ }
-+}
-+
-+static void bfqg_stats_set_start_idle_time(struct bfq_group *bfqg)
-+{
-+ struct bfqg_stats *stats = &bfqg->stats;
-+
-+ stats->start_idle_time = sched_clock();
-+ bfqg_stats_mark_idling(stats);
-+}
-+
-+static void bfqg_stats_update_avg_queue_size(struct bfq_group *bfqg)
-+{
-+ struct bfqg_stats *stats = &bfqg->stats;
-+
-+ blkg_stat_add(&stats->avg_queue_size_sum,
-+ blkg_rwstat_total(&stats->queued));
-+ blkg_stat_add(&stats->avg_queue_size_samples, 1);
-+ bfqg_stats_update_group_wait_time(stats);
-+}
-+
-+static struct blkcg_policy blkcg_policy_bfq;
-+
-+/*
-+ * blk-cgroup policy-related handlers
-+ * The following functions help in converting between blk-cgroup
-+ * internal structures and BFQ-specific structures.
-+ */
-+
-+static struct bfq_group *pd_to_bfqg(struct blkg_policy_data *pd)
-+{
-+ return pd ? container_of(pd, struct bfq_group, pd) : NULL;
-+}
-+
-+static struct blkcg_gq *bfqg_to_blkg(struct bfq_group *bfqg)
-+{
-+ return pd_to_blkg(&bfqg->pd);
-+}
-+
-+static struct bfq_group *blkg_to_bfqg(struct blkcg_gq *blkg)
-+{
-+ return pd_to_bfqg(blkg_to_pd(blkg, &blkcg_policy_bfq));
-+}
-+
-+/*
-+ * bfq_group handlers
-+ * The following functions help in navigating the bfq_group hierarchy
-+ * by allowing to find the parent of a bfq_group or the bfq_group
-+ * associated to a bfq_queue.
-+ */
-+
-+static struct bfq_group *bfqg_parent(struct bfq_group *bfqg)
-+{
-+ struct blkcg_gq *pblkg = bfqg_to_blkg(bfqg)->parent;
-+
-+ return pblkg ? blkg_to_bfqg(pblkg) : NULL;
-+}
-+
-+static struct bfq_group *bfqq_group(struct bfq_queue *bfqq)
-+{
-+ struct bfq_entity *group_entity = bfqq->entity.parent;
-+
-+ return group_entity ? container_of(group_entity, struct bfq_group,
-+ entity) :
-+ bfqq->bfqd->root_group;
-+}
-+
-+/*
-+ * The following two functions handle get and put of a bfq_group by
-+ * wrapping the related blk-cgroup hooks.
-+ */
-+
-+static void bfqg_get(struct bfq_group *bfqg)
-+{
-+ return blkg_get(bfqg_to_blkg(bfqg));
-+}
-+
-+static void bfqg_put(struct bfq_group *bfqg)
-+{
-+ return blkg_put(bfqg_to_blkg(bfqg));
-+}
-+
-+static void bfqg_stats_update_io_add(struct bfq_group *bfqg,
-+ struct bfq_queue *bfqq,
-+ int rw)
-+{
-+ blkg_rwstat_add(&bfqg->stats.queued, rw, 1);
-+ bfqg_stats_end_empty_time(&bfqg->stats);
-+ if (!(bfqq == ((struct bfq_data *)bfqg->bfqd)->in_service_queue))
-+ bfqg_stats_set_start_group_wait_time(bfqg, bfqq_group(bfqq));
-+}
-+
-+static void bfqg_stats_update_io_remove(struct bfq_group *bfqg, int rw)
-+{
-+ blkg_rwstat_add(&bfqg->stats.queued, rw, -1);
-+}
-+
-+static void bfqg_stats_update_io_merged(struct bfq_group *bfqg, int rw)
-+{
-+ blkg_rwstat_add(&bfqg->stats.merged, rw, 1);
-+}
-+
-+static void bfqg_stats_update_dispatch(struct bfq_group *bfqg,
-+ uint64_t bytes, int rw)
-+{
-+ blkg_stat_add(&bfqg->stats.sectors, bytes >> 9);
-+ blkg_rwstat_add(&bfqg->stats.serviced, rw, 1);
-+ blkg_rwstat_add(&bfqg->stats.service_bytes, rw, bytes);
-+}
-+
-+static void bfqg_stats_update_completion(struct bfq_group *bfqg,
-+ uint64_t start_time, uint64_t io_start_time, int rw)
-+{
-+ struct bfqg_stats *stats = &bfqg->stats;
-+ unsigned long long now = sched_clock();
-+
-+ if (time_after64(now, io_start_time))
-+ blkg_rwstat_add(&stats->service_time, rw, now - io_start_time);
-+ if (time_after64(io_start_time, start_time))
-+ blkg_rwstat_add(&stats->wait_time, rw,
-+ io_start_time - start_time);
-+}
-+
-+/* @stats = 0 */
-+static void bfqg_stats_reset(struct bfqg_stats *stats)
-+{
-+ if (!stats)
-+ return;
-+
-+ /* queued stats shouldn't be cleared */
-+ blkg_rwstat_reset(&stats->service_bytes);
-+ blkg_rwstat_reset(&stats->serviced);
-+ blkg_rwstat_reset(&stats->merged);
-+ blkg_rwstat_reset(&stats->service_time);
-+ blkg_rwstat_reset(&stats->wait_time);
-+ blkg_stat_reset(&stats->time);
-+ blkg_stat_reset(&stats->unaccounted_time);
-+ blkg_stat_reset(&stats->avg_queue_size_sum);
-+ blkg_stat_reset(&stats->avg_queue_size_samples);
-+ blkg_stat_reset(&stats->dequeue);
-+ blkg_stat_reset(&stats->group_wait_time);
-+ blkg_stat_reset(&stats->idle_time);
-+ blkg_stat_reset(&stats->empty_time);
-+}
-+
-+/* @to += @from */
-+static void bfqg_stats_merge(struct bfqg_stats *to, struct bfqg_stats *from)
-+{
-+ if (!to || !from)
-+ return;
-+
-+ /* queued stats shouldn't be cleared */
-+ blkg_rwstat_merge(&to->service_bytes, &from->service_bytes);
-+ blkg_rwstat_merge(&to->serviced, &from->serviced);
-+ blkg_rwstat_merge(&to->merged, &from->merged);
-+ blkg_rwstat_merge(&to->service_time, &from->service_time);
-+ blkg_rwstat_merge(&to->wait_time, &from->wait_time);
-+ blkg_stat_merge(&to->time, &from->time);
-+ blkg_stat_merge(&to->unaccounted_time, &from->unaccounted_time);
-+ blkg_stat_merge(&to->avg_queue_size_sum, &from->avg_queue_size_sum);
-+ blkg_stat_merge(&to->avg_queue_size_samples, &from->avg_queue_size_samples);
-+ blkg_stat_merge(&to->dequeue, &from->dequeue);
-+ blkg_stat_merge(&to->group_wait_time, &from->group_wait_time);
-+ blkg_stat_merge(&to->idle_time, &from->idle_time);
-+ blkg_stat_merge(&to->empty_time, &from->empty_time);
-+}
-+
-+/*
-+ * Transfer @bfqg's stats to its parent's dead_stats so that the ancestors'
-+ * recursive stats can still account for the amount used by this bfqg after
-+ * it's gone.
-+ */
-+static void bfqg_stats_xfer_dead(struct bfq_group *bfqg)
-+{
-+ struct bfq_group *parent;
-+
-+ if (!bfqg) /* root_group */
-+ return;
-+
-+ parent = bfqg_parent(bfqg);
-+
-+ lockdep_assert_held(bfqg_to_blkg(bfqg)->q->queue_lock);
-+
-+ if (unlikely(!parent))
-+ return;
-+
-+ bfqg_stats_merge(&parent->dead_stats, &bfqg->stats);
-+ bfqg_stats_merge(&parent->dead_stats, &bfqg->dead_stats);
-+ bfqg_stats_reset(&bfqg->stats);
-+ bfqg_stats_reset(&bfqg->dead_stats);
-+}
-+
-+static void bfq_init_entity(struct bfq_entity *entity,
-+ struct bfq_group *bfqg)
-+{
-+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+
-+ entity->weight = entity->new_weight;
-+ entity->orig_weight = entity->new_weight;
-+ if (bfqq) {
-+ bfqq->ioprio = bfqq->new_ioprio;
-+ bfqq->ioprio_class = bfqq->new_ioprio_class;
-+ bfqg_get(bfqg);
-+ }
-+ entity->parent = bfqg->my_entity;
-+ entity->sched_data = &bfqg->sched_data;
-+}
-+
-+static void bfqg_stats_init(struct bfqg_stats *stats)
-+{
-+ blkg_rwstat_init(&stats->service_bytes);
-+ blkg_rwstat_init(&stats->serviced);
-+ blkg_rwstat_init(&stats->merged);
-+ blkg_rwstat_init(&stats->service_time);
-+ blkg_rwstat_init(&stats->wait_time);
-+ blkg_rwstat_init(&stats->queued);
-+
-+ blkg_stat_init(&stats->sectors);
-+ blkg_stat_init(&stats->time);
-+
-+ blkg_stat_init(&stats->unaccounted_time);
-+ blkg_stat_init(&stats->avg_queue_size_sum);
-+ blkg_stat_init(&stats->avg_queue_size_samples);
-+ blkg_stat_init(&stats->dequeue);
-+ blkg_stat_init(&stats->group_wait_time);
-+ blkg_stat_init(&stats->idle_time);
-+ blkg_stat_init(&stats->empty_time);
-+}
-+
-+static struct bfq_group_data *cpd_to_bfqgd(struct blkcg_policy_data *cpd)
-+ {
-+ return cpd ? container_of(cpd, struct bfq_group_data, pd) : NULL;
-+ }
-+
-+static struct bfq_group_data *blkcg_to_bfqgd(struct blkcg *blkcg)
-+{
-+ return cpd_to_bfqgd(blkcg_to_cpd(blkcg, &blkcg_policy_bfq));
-+}
-+
-+static void bfq_cpd_init(const struct blkcg *blkcg)
-+{
-+ struct bfq_group_data *d =
-+ cpd_to_bfqgd(blkcg->pd[blkcg_policy_bfq.plid]);
-+
-+ d->weight = BFQ_DEFAULT_GRP_WEIGHT;
-+}
-+
-+static void bfq_pd_init(struct blkcg_gq *blkg)
-+{
-+ struct bfq_group *bfqg = blkg_to_bfqg(blkg);
-+ struct bfq_data *bfqd = blkg->q->elevator->elevator_data;
-+ struct bfq_entity *entity = &bfqg->entity;
-+ struct bfq_group_data *d = blkcg_to_bfqgd(blkg->blkcg);
-+
-+ entity->orig_weight = entity->weight = entity->new_weight = d->weight;
-+ entity->my_sched_data = &bfqg->sched_data;
-+ bfqg->my_entity = entity; /*
-+ * the root_group's will be set to NULL
-+ * in bfq_init_queue()
-+ */
-+ bfqg->bfqd = bfqd;
-+ bfqg->active_entities = 0;
-+
-+ /* if the root_group does not exist, we are handling it right now */
-+ if (bfqd->root_group && bfqg != bfqd->root_group)
-+ hlist_add_head(&bfqg->bfqd_node, &bfqd->group_list);
-+
-+ bfqg_stats_init(&bfqg->stats);
-+ bfqg_stats_init(&bfqg->dead_stats);
-+}
-+
-+/* offset delta from bfqg->stats to bfqg->dead_stats */
-+static const int dead_stats_off_delta = offsetof(struct bfq_group, dead_stats) -
-+ offsetof(struct bfq_group, stats);
-+
-+/* to be used by recursive prfill, sums live and dead stats recursively */
-+static u64 bfqg_stat_pd_recursive_sum(struct blkg_policy_data *pd, int off)
-+{
-+ u64 sum = 0;
-+
-+ sum += blkg_stat_recursive_sum(pd, off);
-+ sum += blkg_stat_recursive_sum(pd, off + dead_stats_off_delta);
-+ return sum;
-+}
-+
-+/* to be used by recursive prfill, sums live and dead rwstats recursively */
-+static struct blkg_rwstat bfqg_rwstat_pd_recursive_sum(struct blkg_policy_data *pd,
-+ int off)
-+{
-+ struct blkg_rwstat a, b;
-+
-+ a = blkg_rwstat_recursive_sum(pd, off);
-+ b = blkg_rwstat_recursive_sum(pd, off + dead_stats_off_delta);
-+ blkg_rwstat_merge(&a, &b);
-+ return a;
-+}
-+
-+static void bfq_pd_reset_stats(struct blkcg_gq *blkg)
-+{
-+ struct bfq_group *bfqg = blkg_to_bfqg(blkg);
-+
-+ bfqg_stats_reset(&bfqg->stats);
-+ bfqg_stats_reset(&bfqg->dead_stats);
-+}
-+
-+static void bfq_group_set_parent(struct bfq_group *bfqg,
-+ struct bfq_group *parent)
-+{
-+ struct bfq_entity *entity;
-+
-+ BUG_ON(!parent);
-+ BUG_ON(!bfqg);
-+ BUG_ON(bfqg == parent);
-+
-+ entity = &bfqg->entity;
-+ entity->parent = parent->my_entity;
-+ entity->sched_data = &parent->sched_data;
-+}
-+
-+static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
-+ struct blkcg *blkcg)
-+{
-+ struct request_queue *q = bfqd->queue;
-+ struct bfq_group *bfqg = NULL, *parent;
-+ struct bfq_entity *entity = NULL;
-+
-+ assert_spin_locked(bfqd->queue->queue_lock);
-+
-+ /* avoid lookup for the common case where there's no blkcg */
-+ if (blkcg == &blkcg_root) {
-+ bfqg = bfqd->root_group;
-+ } else {
-+ struct blkcg_gq *blkg;
-+
-+ blkg = blkg_lookup_create(blkcg, q);
-+ if (!IS_ERR(blkg))
-+ bfqg = blkg_to_bfqg(blkg);
-+ else /* fallback to root_group */
-+ bfqg = bfqd->root_group;
-+ }
-+
-+ BUG_ON(!bfqg);
-+
-+ /*
-+ * Update chain of bfq_groups as we might be handling a leaf group
-+ * which, along with some of its relatives, has not been hooked yet
-+ * to the private hierarchy of BFQ.
-+ */
-+ entity = &bfqg->entity;
-+ for_each_entity(entity) {
-+ bfqg = container_of(entity, struct bfq_group, entity);
-+ BUG_ON(!bfqg);
-+ if (bfqg != bfqd->root_group) {
-+ parent = bfqg_parent(bfqg);
-+ if (!parent)
-+ parent = bfqd->root_group;
-+ BUG_ON(!parent);
-+ bfq_group_set_parent(bfqg, parent);
-+ }
-+ }
-+
-+ return bfqg;
-+}
-+
-+/**
-+ * bfq_bfqq_move - migrate @bfqq to @bfqg.
-+ * @bfqd: queue descriptor.
-+ * @bfqq: the queue to move.
-+ * @entity: @bfqq's entity.
-+ * @bfqg: the group to move to.
-+ *
-+ * Move @bfqq to @bfqg, deactivating it from its old group and reactivating
-+ * it on the new one. Avoid putting the entity on the old group idle tree.
-+ *
-+ * Must be called under the queue lock; the cgroup owning @bfqg must
-+ * not disappear (by now this just means that we are called under
-+ * rcu_read_lock()).
-+ */
-+static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+ struct bfq_entity *entity, struct bfq_group *bfqg)
-+{
-+ int busy, resume;
-+
-+ busy = bfq_bfqq_busy(bfqq);
-+ resume = !RB_EMPTY_ROOT(&bfqq->sort_list);
-+
-+ BUG_ON(resume && !entity->on_st);
-+ BUG_ON(busy && !resume && entity->on_st &&
-+ bfqq != bfqd->in_service_queue);
-+
-+ if (busy) {
-+ BUG_ON(atomic_read(&bfqq->ref) < 2);
-+
-+ if (!resume)
-+ bfq_del_bfqq_busy(bfqd, bfqq, 0);
-+ else
-+ bfq_deactivate_bfqq(bfqd, bfqq, 0);
-+ } else if (entity->on_st)
-+ bfq_put_idle_entity(bfq_entity_service_tree(entity), entity);
-+ bfqg_put(bfqq_group(bfqq));
-+
-+ /*
-+ * Here we use a reference to bfqg. We don't need a refcounter
-+ * as the cgroup reference will not be dropped, so that its
-+ * destroy() callback will not be invoked.
-+ */
-+ entity->parent = bfqg->my_entity;
-+ entity->sched_data = &bfqg->sched_data;
-+ bfqg_get(bfqg);
-+
-+ if (busy) {
-+ if (resume)
-+ bfq_activate_bfqq(bfqd, bfqq);
-+ }
-+
-+ if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
-+ bfq_schedule_dispatch(bfqd);
-+}
-+
-+/**
-+ * __bfq_bic_change_cgroup - move @bic to @cgroup.
-+ * @bfqd: the queue descriptor.
-+ * @bic: the bic to move.
-+ * @blkcg: the blk-cgroup to move to.
-+ *
-+ * Move bic to blkcg, assuming that bfqd->queue is locked; the caller
-+ * has to make sure that the reference to cgroup is valid across the call.
-+ *
-+ * NOTE: an alternative approach might have been to store the current
-+ * cgroup in bfqq and getting a reference to it, reducing the lookup
-+ * time here, at the price of slightly more complex code.
-+ */
-+static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
-+ struct bfq_io_cq *bic,
-+ struct blkcg *blkcg)
-+{
-+ struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0);
-+ struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1);
-+ struct bfq_group *bfqg;
-+ struct bfq_entity *entity;
-+
-+ lockdep_assert_held(bfqd->queue->queue_lock);
-+
-+ bfqg = bfq_find_alloc_group(bfqd, blkcg);
-+ if (async_bfqq) {
-+ entity = &async_bfqq->entity;
-+
-+ if (entity->sched_data != &bfqg->sched_data) {
-+ bic_set_bfqq(bic, NULL, 0);
-+ bfq_log_bfqq(bfqd, async_bfqq,
-+ "bic_change_group: %p %d",
-+ async_bfqq, atomic_read(&async_bfqq->ref));
-+ bfq_put_queue(async_bfqq);
-+ }
-+ }
-+
-+ if (sync_bfqq) {
-+ entity = &sync_bfqq->entity;
-+ if (entity->sched_data != &bfqg->sched_data)
-+ bfq_bfqq_move(bfqd, sync_bfqq, entity, bfqg);
-+ }
-+
-+ return bfqg;
-+}
-+
-+static void bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
-+{
-+ struct bfq_data *bfqd = bic_to_bfqd(bic);
-+ struct blkcg *blkcg;
-+ struct bfq_group *bfqg = NULL;
-+ uint64_t id;
-+
-+ rcu_read_lock();
-+ blkcg = bio_blkcg(bio);
-+ id = blkcg->css.serial_nr;
-+ rcu_read_unlock();
-+
-+ /*
-+ * Check whether blkcg has changed. The condition may trigger
-+ * spuriously on a newly created cic but there's no harm.
-+ */
-+ if (unlikely(!bfqd) || likely(bic->blkcg_id == id))
-+ return;
-+
-+ bfqg = __bfq_bic_change_cgroup(bfqd, bic, blkcg);
-+ BUG_ON(!bfqg);
-+ bic->blkcg_id = id;
-+}
-+
-+/**
-+ * bfq_flush_idle_tree - deactivate any entity on the idle tree of @st.
-+ * @st: the service tree being flushed.
-+ */
-+static void bfq_flush_idle_tree(struct bfq_service_tree *st)
-+{
-+ struct bfq_entity *entity = st->first_idle;
-+
-+ for (; entity ; entity = st->first_idle)
-+ __bfq_deactivate_entity(entity, 0);
-+}
-+
-+/**
-+ * bfq_reparent_leaf_entity - move leaf entity to the root_group.
-+ * @bfqd: the device data structure with the root group.
-+ * @entity: the entity to move.
-+ */
-+static void bfq_reparent_leaf_entity(struct bfq_data *bfqd,
-+ struct bfq_entity *entity)
-+{
-+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+
-+ BUG_ON(!bfqq);
-+ bfq_bfqq_move(bfqd, bfqq, entity, bfqd->root_group);
-+ return;
-+}
-+
-+/**
-+ * bfq_reparent_active_entities - move to the root group all active
-+ * entities.
-+ * @bfqd: the device data structure with the root group.
-+ * @bfqg: the group to move from.
-+ * @st: the service tree with the entities.
-+ *
-+ * Needs queue_lock to be taken and reference to be valid over the call.
-+ */
-+static void bfq_reparent_active_entities(struct bfq_data *bfqd,
-+ struct bfq_group *bfqg,
-+ struct bfq_service_tree *st)
-+{
-+ struct rb_root *active = &st->active;
-+ struct bfq_entity *entity = NULL;
-+
-+ if (!RB_EMPTY_ROOT(&st->active))
-+ entity = bfq_entity_of(rb_first(active));
-+
-+ for (; entity ; entity = bfq_entity_of(rb_first(active)))
-+ bfq_reparent_leaf_entity(bfqd, entity);
-+
-+ if (bfqg->sched_data.in_service_entity)
-+ bfq_reparent_leaf_entity(bfqd,
-+ bfqg->sched_data.in_service_entity);
-+
-+ return;
-+}
-+
-+/**
-+ * bfq_destroy_group - destroy @bfqg.
-+ * @bfqg: the group being destroyed.
-+ *
-+ * Destroy @bfqg, making sure that it is not referenced from its parent.
-+ * blkio already grabs the queue_lock for us, so no need to use RCU-based magic
-+ */
-+static void bfq_pd_offline(struct blkcg_gq *blkg)
-+{
-+ struct bfq_service_tree *st;
-+ struct bfq_group *bfqg = blkg_to_bfqg(blkg);
-+ struct bfq_data *bfqd = bfqg->bfqd;
-+ struct bfq_entity *entity = bfqg->my_entity;
-+ int i;
-+
-+ if (!entity) /* root group */
-+ return;
-+
-+ /*
-+ * Empty all service_trees belonging to this group before
-+ * deactivating the group itself.
-+ */
-+ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++) {
-+ st = bfqg->sched_data.service_tree + i;
-+
-+ /*
-+ * The idle tree may still contain bfq_queues belonging
-+ * to exited tasks because they never migrated to a different
-+ * cgroup from the one being destroyed now. No one else
-+ * can access them so it's safe to act without any lock.
-+ */
-+ bfq_flush_idle_tree(st);
-+
-+ /*
-+ * It may happen that some queues are still active
-+ * (busy) upon group destruction (if the corresponding
-+ * processes have been forced to terminate). We move
-+ * all the leaf entities corresponding to these queues
-+ * to the root_group.
-+ * Also, it may happen that the group has an entity
-+ * in service, which is disconnected from the active
-+ * tree: it must be moved, too.
-+ * There is no need to put the sync queues, as the
-+ * scheduler has taken no reference.
-+ */
-+ bfq_reparent_active_entities(bfqd, bfqg, st);
-+ BUG_ON(!RB_EMPTY_ROOT(&st->active));
-+ BUG_ON(!RB_EMPTY_ROOT(&st->idle));
-+ }
-+ BUG_ON(bfqg->sched_data.next_in_service);
-+ BUG_ON(bfqg->sched_data.in_service_entity);
-+
-+ hlist_del(&bfqg->bfqd_node);
-+ __bfq_deactivate_entity(entity, 0);
-+ bfq_put_async_queues(bfqd, bfqg);
-+ BUG_ON(entity->tree);
-+
-+ bfqg_stats_xfer_dead(bfqg);
-+}
-+
-+static void bfq_end_wr_async(struct bfq_data *bfqd)
-+{
-+ struct hlist_node *tmp;
-+ struct bfq_group *bfqg;
-+
-+ hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node)
-+ bfq_end_wr_async_queues(bfqd, bfqg);
-+ bfq_end_wr_async_queues(bfqd, bfqd->root_group);
-+}
-+
-+/**
-+ * bfq_disconnect_groups - disconnect @bfqd from all its groups.
-+ * @bfqd: the device descriptor being exited.
-+ *
-+ * When the device exits we just make sure that no lookup can return
-+ * the now unused group structures. They will be deallocated on cgroup
-+ * destruction.
-+ */
-+static void bfq_disconnect_groups(struct bfq_data *bfqd)
-+{
-+ struct hlist_node *tmp;
-+ struct bfq_group *bfqg;
-+
-+ bfq_log(bfqd, "disconnect_groups beginning");
-+ hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node) {
-+ hlist_del(&bfqg->bfqd_node);
-+
-+ __bfq_deactivate_entity(bfqg->my_entity, 0);
-+
-+ /*
-+ * Don't remove from the group hash, just set an
-+ * invalid key. No lookups can race with the
-+ * assignment as bfqd is being destroyed; this
-+ * implies also that new elements cannot be added
-+ * to the list.
-+ */
-+ rcu_assign_pointer(bfqg->bfqd, NULL);
-+
-+ bfq_log(bfqd, "disconnect_groups: put async for group %p",
-+ bfqg);
-+ bfq_put_async_queues(bfqd, bfqg);
-+ }
-+}
-+
-+static u64 bfqio_cgroup_weight_read(struct cgroup_subsys_state *css,
-+ struct cftype *cftype)
-+{
-+ struct blkcg *blkcg = css_to_blkcg(css);
-+ struct bfq_group_data *bfqgd = blkcg_to_bfqgd(blkcg);
-+ int ret = -EINVAL;
-+
-+ spin_lock_irq(&blkcg->lock);
-+ ret = bfqgd->weight;
-+ spin_unlock_irq(&blkcg->lock);
-+
-+ return ret;
-+}
-+
-+static int bfqio_cgroup_weight_write(struct cgroup_subsys_state *css,
-+ struct cftype *cftype,
-+ u64 val)
-+{
-+ struct blkcg *blkcg = css_to_blkcg(css);
-+ struct bfq_group_data *bfqgd = blkcg_to_bfqgd(blkcg);
-+ struct blkcg_gq *blkg;
-+ int ret = -EINVAL;
-+
-+ if (val < BFQ_MIN_WEIGHT || val > BFQ_MAX_WEIGHT)
-+ return ret;
-+
-+ ret = 0;
-+ spin_lock_irq(&blkcg->lock);
-+ bfqgd->weight = (unsigned short)val;
-+ hlist_for_each_entry(blkg, &blkcg->blkg_list, blkcg_node) {
-+ struct bfq_group *bfqg = blkg_to_bfqg(blkg);
-+ if (!bfqg)
-+ continue;
-+ /*
-+ * Setting the prio_changed flag of the entity
-+ * to 1 with new_weight == weight would re-set
-+ * the value of the weight to its ioprio mapping.
-+ * Set the flag only if necessary.
-+ */
-+ if ((unsigned short)val != bfqg->entity.new_weight) {
-+ bfqg->entity.new_weight = (unsigned short)val;
-+ /*
-+ * Make sure that the above new value has been
-+ * stored in bfqg->entity.new_weight before
-+ * setting the prio_changed flag. In fact,
-+ * this flag may be read asynchronously (in
-+ * critical sections protected by a different
-+ * lock than that held here), and finding this
-+ * flag set may cause the execution of the code
-+ * for updating parameters whose value may
-+ * depend also on bfqg->entity.new_weight (in
-+ * __bfq_entity_update_weight_prio).
-+ * This barrier makes sure that the new value
-+ * of bfqg->entity.new_weight is correctly
-+ * seen in that code.
-+ */
-+ smp_wmb();
-+ bfqg->entity.prio_changed = 1;
-+ }
-+ }
-+ spin_unlock_irq(&blkcg->lock);
-+
-+ return ret;
-+}
-+
-+static int bfqg_print_stat(struct seq_file *sf, void *v)
-+{
-+ blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_stat,
-+ &blkcg_policy_bfq, seq_cft(sf)->private, false);
-+ return 0;
-+}
-+
-+static int bfqg_print_rwstat(struct seq_file *sf, void *v)
-+{
-+ blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), blkg_prfill_rwstat,
-+ &blkcg_policy_bfq, seq_cft(sf)->private, true);
-+ return 0;
-+}
-+
-+static u64 bfqg_prfill_stat_recursive(struct seq_file *sf,
-+ struct blkg_policy_data *pd, int off)
-+{
-+ u64 sum = bfqg_stat_pd_recursive_sum(pd, off);
-+
-+ return __blkg_prfill_u64(sf, pd, sum);
-+}
-+
-+static u64 bfqg_prfill_rwstat_recursive(struct seq_file *sf,
-+ struct blkg_policy_data *pd, int off)
-+{
-+ struct blkg_rwstat sum = bfqg_rwstat_pd_recursive_sum(pd, off);
-+
-+ return __blkg_prfill_rwstat(sf, pd, &sum);
-+}
-+
-+static int bfqg_print_stat_recursive(struct seq_file *sf, void *v)
-+{
-+ blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
-+ bfqg_prfill_stat_recursive, &blkcg_policy_bfq,
-+ seq_cft(sf)->private, false);
-+ return 0;
-+}
-+
-+static int bfqg_print_rwstat_recursive(struct seq_file *sf, void *v)
-+{
-+ blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
-+ bfqg_prfill_rwstat_recursive, &blkcg_policy_bfq,
-+ seq_cft(sf)->private, true);
-+ return 0;
-+}
-+
-+static u64 bfqg_prfill_avg_queue_size(struct seq_file *sf,
-+ struct blkg_policy_data *pd, int off)
-+{
-+ struct bfq_group *bfqg = pd_to_bfqg(pd);
-+ u64 samples = blkg_stat_read(&bfqg->stats.avg_queue_size_samples);
-+ u64 v = 0;
-+
-+ if (samples) {
-+ v = blkg_stat_read(&bfqg->stats.avg_queue_size_sum);
-+ v = div64_u64(v, samples);
-+ }
-+ __blkg_prfill_u64(sf, pd, v);
-+ return 0;
-+}
-+
-+/* print avg_queue_size */
-+static int bfqg_print_avg_queue_size(struct seq_file *sf, void *v)
-+{
-+ blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)),
-+ bfqg_prfill_avg_queue_size, &blkcg_policy_bfq,
-+ 0, false);
-+ return 0;
-+}
-+
-+static struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
-+{
-+ int ret;
-+
-+ ret = blkcg_activate_policy(bfqd->queue, &blkcg_policy_bfq);
-+ if (ret)
-+ return NULL;
-+
-+ return blkg_to_bfqg(bfqd->queue->root_blkg);
-+}
-+
-+static struct cftype bfqio_files[] = {
-+ {
-+ .name = "bfq.weight",
-+ .read_u64 = bfqio_cgroup_weight_read,
-+ .write_u64 = bfqio_cgroup_weight_write,
-+ },
-+ /* statistics, cover only the tasks in the bfqg */
-+ {
-+ .name = "bfq.time",
-+ .private = offsetof(struct bfq_group, stats.time),
-+ .seq_show = bfqg_print_stat,
-+ },
-+ {
-+ .name = "bfq.sectors",
-+ .private = offsetof(struct bfq_group, stats.sectors),
-+ .seq_show = bfqg_print_stat,
-+ },
-+ {
-+ .name = "bfq.io_service_bytes",
-+ .private = offsetof(struct bfq_group, stats.service_bytes),
-+ .seq_show = bfqg_print_rwstat,
-+ },
-+ {
-+ .name = "bfq.io_serviced",
-+ .private = offsetof(struct bfq_group, stats.serviced),
-+ .seq_show = bfqg_print_rwstat,
-+ },
-+ {
-+ .name = "bfq.io_service_time",
-+ .private = offsetof(struct bfq_group, stats.service_time),
-+ .seq_show = bfqg_print_rwstat,
-+ },
-+ {
-+ .name = "bfq.io_wait_time",
-+ .private = offsetof(struct bfq_group, stats.wait_time),
-+ .seq_show = bfqg_print_rwstat,
-+ },
-+ {
-+ .name = "bfq.io_merged",
-+ .private = offsetof(struct bfq_group, stats.merged),
-+ .seq_show = bfqg_print_rwstat,
-+ },
-+ {
-+ .name = "bfq.io_queued",
-+ .private = offsetof(struct bfq_group, stats.queued),
-+ .seq_show = bfqg_print_rwstat,
-+ },
-+
-+ /* the same statistics which cover the bfqg and its descendants */
-+ {
-+ .name = "bfq.time_recursive",
-+ .private = offsetof(struct bfq_group, stats.time),
-+ .seq_show = bfqg_print_stat_recursive,
-+ },
-+ {
-+ .name = "bfq.sectors_recursive",
-+ .private = offsetof(struct bfq_group, stats.sectors),
-+ .seq_show = bfqg_print_stat_recursive,
-+ },
-+ {
-+ .name = "bfq.io_service_bytes_recursive",
-+ .private = offsetof(struct bfq_group, stats.service_bytes),
-+ .seq_show = bfqg_print_rwstat_recursive,
-+ },
-+ {
-+ .name = "bfq.io_serviced_recursive",
-+ .private = offsetof(struct bfq_group, stats.serviced),
-+ .seq_show = bfqg_print_rwstat_recursive,
-+ },
-+ {
-+ .name = "bfq.io_service_time_recursive",
-+ .private = offsetof(struct bfq_group, stats.service_time),
-+ .seq_show = bfqg_print_rwstat_recursive,
-+ },
-+ {
-+ .name = "bfq.io_wait_time_recursive",
-+ .private = offsetof(struct bfq_group, stats.wait_time),
-+ .seq_show = bfqg_print_rwstat_recursive,
-+ },
-+ {
-+ .name = "bfq.io_merged_recursive",
-+ .private = offsetof(struct bfq_group, stats.merged),
-+ .seq_show = bfqg_print_rwstat_recursive,
-+ },
-+ {
-+ .name = "bfq.io_queued_recursive",
-+ .private = offsetof(struct bfq_group, stats.queued),
-+ .seq_show = bfqg_print_rwstat_recursive,
-+ },
-+ {
-+ .name = "bfq.avg_queue_size",
-+ .seq_show = bfqg_print_avg_queue_size,
-+ },
-+ {
-+ .name = "bfq.group_wait_time",
-+ .private = offsetof(struct bfq_group, stats.group_wait_time),
-+ .seq_show = bfqg_print_stat,
-+ },
-+ {
-+ .name = "bfq.idle_time",
-+ .private = offsetof(struct bfq_group, stats.idle_time),
-+ .seq_show = bfqg_print_stat,
-+ },
-+ {
-+ .name = "bfq.empty_time",
-+ .private = offsetof(struct bfq_group, stats.empty_time),
-+ .seq_show = bfqg_print_stat,
-+ },
-+ {
-+ .name = "bfq.dequeue",
-+ .private = offsetof(struct bfq_group, stats.dequeue),
-+ .seq_show = bfqg_print_stat,
-+ },
-+ {
-+ .name = "bfq.unaccounted_time",
-+ .private = offsetof(struct bfq_group, stats.unaccounted_time),
-+ .seq_show = bfqg_print_stat,
-+ },
-+ { } /* terminate */
-+};
-+
-+static struct blkcg_policy blkcg_policy_bfq = {
-+ .pd_size = sizeof(struct bfq_group),
-+ .cpd_size = sizeof(struct bfq_group_data),
-+ .cftypes = bfqio_files,
-+ .pd_init_fn = bfq_pd_init,
-+ .cpd_init_fn = bfq_cpd_init,
-+ .pd_offline_fn = bfq_pd_offline,
-+ .pd_reset_stats_fn = bfq_pd_reset_stats,
-+};
-+
-+#else
-+
-+static void bfq_init_entity(struct bfq_entity *entity,
-+ struct bfq_group *bfqg)
-+{
-+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+ entity->weight = entity->new_weight;
-+ entity->orig_weight = entity->new_weight;
-+ if (bfqq) {
-+ bfqq->ioprio = bfqq->new_ioprio;
-+ bfqq->ioprio_class = bfqq->new_ioprio_class;
-+ }
-+ entity->sched_data = &bfqg->sched_data;
-+}
-+
-+static struct bfq_group *
-+bfq_bic_update_cgroup(struct bfq_io_cq *bic, struct bio *bio)
-+{
-+ struct bfq_data *bfqd = bic_to_bfqd(bic);
-+ return bfqd->root_group;
-+}
-+
-+static void bfq_bfqq_move(struct bfq_data *bfqd,
-+ struct bfq_queue *bfqq,
-+ struct bfq_entity *entity,
-+ struct bfq_group *bfqg)
-+{
-+}
-+
-+static void bfq_end_wr_async(struct bfq_data *bfqd)
-+{
-+ bfq_end_wr_async_queues(bfqd, bfqd->root_group);
-+}
-+
-+static void bfq_disconnect_groups(struct bfq_data *bfqd)
-+{
-+ bfq_put_async_queues(bfqd, bfqd->root_group);
-+}
-+
-+static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
-+ struct blkcg *blkcg)
-+{
-+ return bfqd->root_group;
-+}
-+
-+static struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
-+{
-+ struct bfq_group *bfqg;
-+ int i;
-+
-+ bfqg = kmalloc_node(sizeof(*bfqg), GFP_KERNEL | __GFP_ZERO, node);
-+ if (!bfqg)
-+ return NULL;
-+
-+ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
-+ bfqg->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
-+
-+ return bfqg;
-+}
-+#endif
-diff --git a/block/bfq-ioc.c b/block/bfq-ioc.c
-new file mode 100644
-index 0000000..fb7bb8f
---- /dev/null
-+++ b/block/bfq-ioc.c
-@@ -0,0 +1,36 @@
-+/*
-+ * BFQ: I/O context handling.
-+ *
-+ * Based on ideas and code from CFQ:
-+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
-+ *
-+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
-+ * Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
-+ */
-+
-+/**
-+ * icq_to_bic - convert iocontext queue structure to bfq_io_cq.
-+ * @icq: the iocontext queue.
-+ */
-+static struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
-+{
-+ /* bic->icq is the first member, %NULL will convert to %NULL */
-+ return container_of(icq, struct bfq_io_cq, icq);
-+}
-+
-+/**
-+ * bfq_bic_lookup - look up in @ioc the bic associated with @bfqd.
-+ * @bfqd: the lookup key.
-+ * @ioc: the io_context of the process doing I/O.
-+ *
-+ * Queue lock must be held.
-+ */
-+static struct bfq_io_cq *bfq_bic_lookup(struct bfq_data *bfqd,
-+ struct io_context *ioc)
-+{
-+ if (ioc)
-+ return icq_to_bic(ioc_lookup_icq(ioc, bfqd->queue));
-+ return NULL;
-+}
-diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
-new file mode 100644
-index 0000000..51d24dd
---- /dev/null
-+++ b/block/bfq-iosched.c
-@@ -0,0 +1,3753 @@
-+/*
-+ * Budget Fair Queueing (BFQ) disk scheduler.
-+ *
-+ * Based on ideas and code from CFQ:
-+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
-+ *
-+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
-+ * Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
-+ * file.
-+ *
-+ * BFQ is a proportional-share storage-I/O scheduling algorithm based on
-+ * the slice-by-slice service scheme of CFQ. But BFQ assigns budgets,
-+ * measured in number of sectors, to processes instead of time slices. The
-+ * device is not granted to the in-service process for a given time slice,
-+ * but until it has exhausted its assigned budget. This change from the time
-+ * to the service domain allows BFQ to distribute the device throughput
-+ * among processes as desired, without any distortion due to ZBR, workload
-+ * fluctuations or other factors. BFQ uses an ad hoc internal scheduler,
-+ * called B-WF2Q+, to schedule processes according to their budgets. More
-+ * precisely, BFQ schedules queues associated to processes. Thanks to the
-+ * accurate policy of B-WF2Q+, BFQ can afford to assign high budgets to
-+ * I/O-bound processes issuing sequential requests (to boost the
-+ * throughput), and yet guarantee a low latency to interactive and soft
-+ * real-time applications.
-+ *
-+ * BFQ is described in [1], where also a reference to the initial, more
-+ * theoretical paper on BFQ can be found. The interested reader can find
-+ * in the latter paper full details on the main algorithm, as well as
-+ * formulas of the guarantees and formal proofs of all the properties.
-+ * With respect to the version of BFQ presented in these papers, this
-+ * implementation adds a few more heuristics, such as the one that
-+ * guarantees a low latency to soft real-time applications, and a
-+ * hierarchical extension based on H-WF2Q+.
-+ *
-+ * B-WF2Q+ is based on WF2Q+, that is described in [2], together with
-+ * H-WF2Q+, while the augmented tree used to implement B-WF2Q+ with O(log N)
-+ * complexity derives from the one introduced with EEVDF in [3].
-+ *
-+ * [1] P. Valente and M. Andreolini, ``Improving Application Responsiveness
-+ * with the BFQ Disk I/O Scheduler'',
-+ * Proceedings of the 5th Annual International Systems and Storage
-+ * Conference (SYSTOR '12), June 2012.
-+ *
-+ * http://algogroup.unimo.it/people/paolo/disk_sched/bf1-v1-suite-results.pdf
-+ *
-+ * [2] Jon C.R. Bennett and H. Zhang, ``Hierarchical Packet Fair Queueing
-+ * Algorithms,'' IEEE/ACM Transactions on Networking, 5(5):675-689,
-+ * Oct 1997.
-+ *
-+ * http://www.cs.cmu.edu/~hzhang/papers/TON-97-Oct.ps.gz
-+ *
-+ * [3] I. Stoica and H. Abdel-Wahab, ``Earliest Eligible Virtual Deadline
-+ * First: A Flexible and Accurate Mechanism for Proportional Share
-+ * Resource Allocation,'' technical report.
-+ *
-+ * http://www.cs.berkeley.edu/~istoica/papers/eevdf-tr-95.pdf
-+ */
-+#include <linux/module.h>
-+#include <linux/slab.h>
-+#include <linux/blkdev.h>
-+#include <linux/cgroup.h>
-+#include <linux/elevator.h>
-+#include <linux/jiffies.h>
-+#include <linux/rbtree.h>
-+#include <linux/ioprio.h>
-+#include "bfq.h"
-+#include "blk.h"
-+
-+/* Expiration time of sync (0) and async (1) requests, in jiffies. */
-+static const int bfq_fifo_expire[2] = { HZ / 4, HZ / 8 };
-+
-+/* Maximum backwards seek, in KiB. */
-+static const int bfq_back_max = 16 * 1024;
-+
-+/* Penalty of a backwards seek, in number of sectors. */
-+static const int bfq_back_penalty = 2;
-+
-+/* Idling period duration, in jiffies. */
-+static int bfq_slice_idle = HZ / 125;
-+
-+/* Minimum number of assigned budgets for which stats are safe to compute. */
-+static const int bfq_stats_min_budgets = 194;
-+
-+/* Default maximum budget values, in sectors and number of requests. */
-+static const int bfq_default_max_budget = 16 * 1024;
-+static const int bfq_max_budget_async_rq = 4;
-+
-+/*
-+ * Async to sync throughput distribution is controlled as follows:
-+ * when an async request is served, the entity is charged the number
-+ * of sectors of the request, multiplied by the factor below
-+ */
-+static const int bfq_async_charge_factor = 10;
-+
-+/* Default timeout values, in jiffies, approximating CFQ defaults. */
-+static const int bfq_timeout_sync = HZ / 8;
-+static int bfq_timeout_async = HZ / 25;
-+
-+struct kmem_cache *bfq_pool;
-+
-+/* Below this threshold (in ms), we consider thinktime immediate. */
-+#define BFQ_MIN_TT 2
-+
-+/* hw_tag detection: parallel requests threshold and min samples needed. */
-+#define BFQ_HW_QUEUE_THRESHOLD 4
-+#define BFQ_HW_QUEUE_SAMPLES 32
-+
-+#define BFQQ_SEEK_THR (sector_t)(8 * 1024)
-+#define BFQQ_SEEKY(bfqq) ((bfqq)->seek_mean > BFQQ_SEEK_THR)
-+
-+/* Min samples used for peak rate estimation (for autotuning). */
-+#define BFQ_PEAK_RATE_SAMPLES 32
-+
-+/* Shift used for peak rate fixed precision calculations. */
-+#define BFQ_RATE_SHIFT 16
-+
-+/*
-+ * By default, BFQ computes the duration of the weight raising for
-+ * interactive applications automatically, using the following formula:
-+ * duration = (R / r) * T, where r is the peak rate of the device, and
-+ * R and T are two reference parameters.
-+ * In particular, R is the peak rate of the reference device (see below),
-+ * and T is a reference time: given the systems that are likely to be
-+ * installed on the reference device according to its speed class, T is
-+ * about the maximum time needed, under BFQ and while reading two files in
-+ * parallel, to load typical large applications on these systems.
-+ * In practice, the slower/faster the device at hand is, the more/less time
-+ * it takes to load applications with respect to the reference device.
-+ * Accordingly, the longer/shorter BFQ grants weight raising to interactive
-+ * applications.
-+ *
-+ * BFQ uses four different reference pairs (R, T), depending on:
-+ * . whether the device is rotational or non-rotational;
-+ * . whether the device is slow, such as old or portable HDDs, as well as
-+ * SD cards, or fast, such as newer HDDs and SSDs.
-+ *
-+ * The device's speed class is dynamically (re)detected in
-+ * bfq_update_peak_rate() every time the estimated peak rate is updated.
-+ *
-+ * In the following definitions, R_slow[0]/R_fast[0] and T_slow[0]/T_fast[0]
-+ * are the reference values for a slow/fast rotational device, whereas
-+ * R_slow[1]/R_fast[1] and T_slow[1]/T_fast[1] are the reference values for
-+ * a slow/fast non-rotational device. Finally, device_speed_thresh are the
-+ * thresholds used to switch between speed classes.
-+ * Both the reference peak rates and the thresholds are measured in
-+ * sectors/usec, left-shifted by BFQ_RATE_SHIFT.
-+ */
-+static int R_slow[2] = {1536, 10752};
-+static int R_fast[2] = {17415, 34791};
-+/*
-+ * To improve readability, a conversion function is used to initialize the
-+ * following arrays, which entails that they can be initialized only in a
-+ * function.
-+ */
-+static int T_slow[2];
-+static int T_fast[2];
-+static int device_speed_thresh[2];
-+
-+#define BFQ_SERVICE_TREE_INIT ((struct bfq_service_tree) \
-+ { RB_ROOT, RB_ROOT, NULL, NULL, 0, 0 })
-+
-+#define RQ_BIC(rq) ((struct bfq_io_cq *) (rq)->elv.priv[0])
-+#define RQ_BFQQ(rq) ((rq)->elv.priv[1])
-+
-+static void bfq_schedule_dispatch(struct bfq_data *bfqd);
-+
-+#include "bfq-ioc.c"
-+#include "bfq-sched.c"
-+#include "bfq-cgroup.c"
-+
-+#define bfq_class_idle(bfqq) ((bfqq)->ioprio_class == IOPRIO_CLASS_IDLE)
-+#define bfq_class_rt(bfqq) ((bfqq)->ioprio_class == IOPRIO_CLASS_RT)
-+
-+#define bfq_sample_valid(samples) ((samples) > 80)
-+
-+/*
-+ * We regard a request as SYNC, if either it's a read or has the SYNC bit
-+ * set (in which case it could also be a direct WRITE).
-+ */
-+static int bfq_bio_sync(struct bio *bio)
-+{
-+ if (bio_data_dir(bio) == READ || (bio->bi_rw & REQ_SYNC))
-+ return 1;
-+
-+ return 0;
-+}
-+
-+/*
-+ * Schedule a run of the queue if there are requests pending and none in
-+ * the driver that will restart queueing.
-+ */
-+static void bfq_schedule_dispatch(struct bfq_data *bfqd)
-+{
-+ if (bfqd->queued != 0) {
-+ bfq_log(bfqd, "schedule dispatch");
-+ kblockd_schedule_work(&bfqd->unplug_work);
-+ }
-+}
-+
-+/*
-+ * Lifted from AS - choose which of rq1 and rq2 is best served now.
-+ * We choose the request that is closest to the head right now. Distance
-+ * behind the head is penalized and only allowed to a certain extent.
-+ */
-+static struct request *bfq_choose_req(struct bfq_data *bfqd,
-+ struct request *rq1,
-+ struct request *rq2,
-+ sector_t last)
-+{
-+ sector_t s1, s2, d1 = 0, d2 = 0;
-+ unsigned long back_max;
-+#define BFQ_RQ1_WRAP 0x01 /* request 1 wraps */
-+#define BFQ_RQ2_WRAP 0x02 /* request 2 wraps */
-+ unsigned wrap = 0; /* bit mask: requests behind the disk head? */
-+
-+ if (!rq1 || rq1 == rq2)
-+ return rq2;
-+ if (!rq2)
-+ return rq1;
-+
-+ if (rq_is_sync(rq1) && !rq_is_sync(rq2))
-+ return rq1;
-+ else if (rq_is_sync(rq2) && !rq_is_sync(rq1))
-+ return rq2;
-+ if ((rq1->cmd_flags & REQ_META) && !(rq2->cmd_flags & REQ_META))
-+ return rq1;
-+ else if ((rq2->cmd_flags & REQ_META) && !(rq1->cmd_flags & REQ_META))
-+ return rq2;
-+
-+ s1 = blk_rq_pos(rq1);
-+ s2 = blk_rq_pos(rq2);
-+
-+ /*
-+ * By definition, 1KiB is 2 sectors.
-+ */
-+ back_max = bfqd->bfq_back_max * 2;
-+
-+ /*
-+ * Strict one way elevator _except_ in the case where we allow
-+ * short backward seeks which are biased as twice the cost of a
-+ * similar forward seek.
-+ */
-+ if (s1 >= last)
-+ d1 = s1 - last;
-+ else if (s1 + back_max >= last)
-+ d1 = (last - s1) * bfqd->bfq_back_penalty;
-+ else
-+ wrap |= BFQ_RQ1_WRAP;
-+
-+ if (s2 >= last)
-+ d2 = s2 - last;
-+ else if (s2 + back_max >= last)
-+ d2 = (last - s2) * bfqd->bfq_back_penalty;
-+ else
-+ wrap |= BFQ_RQ2_WRAP;
-+
-+ /* Found required data */
-+
-+ /*
-+ * By doing switch() on the bit mask "wrap" we avoid having to
-+ * check two variables for all permutations: --> faster!
-+ */
-+ switch (wrap) {
-+ case 0: /* common case for CFQ: rq1 and rq2 not wrapped */
-+ if (d1 < d2)
-+ return rq1;
-+ else if (d2 < d1)
-+ return rq2;
-+ else {
-+ if (s1 >= s2)
-+ return rq1;
-+ else
-+ return rq2;
-+ }
-+
-+ case BFQ_RQ2_WRAP:
-+ return rq1;
-+ case BFQ_RQ1_WRAP:
-+ return rq2;
-+ case (BFQ_RQ1_WRAP|BFQ_RQ2_WRAP): /* both rqs wrapped */
-+ default:
-+ /*
-+ * Since both rqs are wrapped,
-+ * start with the one that's further behind head
-+ * (--> only *one* back seek required),
-+ * since back seek takes more time than forward.
-+ */
-+ if (s1 <= s2)
-+ return rq1;
-+ else
-+ return rq2;
-+ }
-+}
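As a reading aid for bfq_choose_req(), the positional part of the decision can be sketched in user space. This is a minimal sketch, not the kernel code: the sync and REQ_META precedence checks and the NULL handling are left out, `sector_t` is a hypothetical stand-in typedef, and `seek_cost()` and `choose_by_position()` are illustrative names that do not exist in the patch. `back_max` is assumed to be already expressed in sectors (the patch doubles the KiB value for that purpose).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's sector_t. */
typedef uint64_t sector_t;

/*
 * Cost of reaching sector s from head position last.
 * Forward seeks cost their distance; short backward seeks (at most
 * back_max sectors behind the head) cost distance * back_penalty.
 * Returns false when s is too far behind the head ("wrapped").
 */
static bool seek_cost(sector_t s, sector_t last, sector_t back_max,
		      unsigned int back_penalty, sector_t *dist)
{
	if (s >= last) {
		*dist = s - last;
		return true;
	}
	if (s + back_max >= last) {
		*dist = (last - s) * back_penalty;
		return true;
	}
	return false;
}

/* Returns 1 if the request at s1 is preferred, 2 if the one at s2 is. */
static int choose_by_position(sector_t s1, sector_t s2, sector_t last,
			      sector_t back_max, unsigned int back_penalty)
{
	sector_t d1 = 0, d2 = 0;
	bool ok1 = seek_cost(s1, last, back_max, back_penalty, &d1);
	bool ok2 = seek_cost(s2, last, back_max, back_penalty, &d2);

	if (ok1 && ok2) {
		/* Neither request wraps: the cheaper seek wins. */
		if (d1 < d2)
			return 1;
		if (d2 < d1)
			return 2;
		/* Equal cost: prefer the higher sector, as the patch does. */
		return s1 >= s2 ? 1 : 2;
	}
	if (ok1)
		return 1;	/* only request 2 wraps */
	if (ok2)
		return 2;	/* only request 1 wraps */

	/* Both wrap: start with the one further behind the head. */
	return s1 <= s2 ? 1 : 2;
}
```

With the head at sector 1000, a 100-sector backward window and a penalty of 2, a request 20 sectors behind the head costs 40 and so beats a request 50 sectors ahead, but loses to one 30 sectors ahead.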
-+
-+/*
-+ * Tell whether there are active queues or groups with differentiated weights.
-+ */
-+static bool bfq_differentiated_weights(struct bfq_data *bfqd)
-+{
-+ /*
-+ * For weights to differ, at least one of the trees must contain
-+ * at least two nodes.
-+ */
-+ return (!RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
-+ (bfqd->queue_weights_tree.rb_node->rb_left ||
-+ bfqd->queue_weights_tree.rb_node->rb_right)
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ ) ||
-+ (!RB_EMPTY_ROOT(&bfqd->group_weights_tree) &&
-+ (bfqd->group_weights_tree.rb_node->rb_left ||
-+ bfqd->group_weights_tree.rb_node->rb_right)
-+#endif
-+ );
-+}
-+
-+/*
-+ * The following function returns true if every queue must receive the
-+ * same share of the throughput (this condition is used when deciding
-+ * whether idling may be disabled, see the comments in the function
-+ * bfq_bfqq_may_idle()).
-+ *
-+ * Such a scenario occurs when:
-+ * 1) all active queues have the same weight,
-+ * 2) all active groups at the same level in the groups tree have the same
-+ * weight,
-+ * 3) all active groups at the same level in the groups tree have the same
-+ * number of children.
-+ *
-+ * Unfortunately, keeping the necessary state for evaluating exactly the
-+ * above symmetry conditions would be quite complex and time-consuming.
-+ * Therefore this function evaluates, instead, the following stronger
-+ * sub-conditions, for which it is much easier to maintain the needed
-+ * state:
-+ * 1) all active queues have the same weight,
-+ * 2) all active groups have the same weight,
-+ * 3) all active groups have at most one active child each.
-+ * In particular, the last two conditions are always true if hierarchical
-+ * support and the cgroups interface are not enabled, thus no state needs
-+ * to be maintained in this case.
-+ */
-+static bool bfq_symmetric_scenario(struct bfq_data *bfqd)
-+{
-+ return
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ !bfqd->active_numerous_groups &&
-+#endif
-+ !bfq_differentiated_weights(bfqd);
-+}
-+
-+/*
-+ * If the weight-counter tree passed as input contains no counter for
-+ * the weight of the input entity, then add that counter; otherwise just
-+ * increment the existing counter.
-+ *
-+ * Note that weight-counter trees contain few nodes in mostly symmetric
-+ * scenarios. For example, if all queues have the same weight, then the
-+ * weight-counter tree for the queues may contain at most one node.
-+ * This holds even if low_latency is on, because weight-raised queues
-+ * are not inserted in the tree.
-+ * In most scenarios, the rate at which nodes are created/destroyed
-+ * should be low too.
-+ */
-+static void bfq_weights_tree_add(struct bfq_data *bfqd,
-+ struct bfq_entity *entity,
-+ struct rb_root *root)
-+{
-+ struct rb_node **new = &(root->rb_node), *parent = NULL;
-+
-+ /*
-+ * Do not insert if the entity is already associated with a
-+ * counter, which happens if:
-+ * 1) the entity is associated with a queue,
-+ * 2) a request arrival has caused the queue to become both
-+ * non-weight-raised, and hence change its weight, and
-+ * backlogged; in this respect, each of the two events
-+ * causes an invocation of this function,
-+ * 3) this is the invocation of this function caused by the
-+ * second event. This second invocation is actually useless,
-+ * and we handle this fact by exiting immediately. More
-+ * efficient or clearer solutions might possibly be adopted.
-+ */
-+ if (entity->weight_counter)
-+ return;
-+
-+ while (*new) {
-+ struct bfq_weight_counter *__counter = container_of(*new,
-+ struct bfq_weight_counter,
-+ weights_node);
-+ parent = *new;
-+
-+ if (entity->weight == __counter->weight) {
-+ entity->weight_counter = __counter;
-+ goto inc_counter;
-+ }
-+ if (entity->weight < __counter->weight)
-+ new = &((*new)->rb_left);
-+ else
-+ new = &((*new)->rb_right);
-+ }
-+
-+ entity->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
-+ GFP_ATOMIC);
-+ entity->weight_counter->weight = entity->weight;
-+ rb_link_node(&entity->weight_counter->weights_node, parent, new);
-+ rb_insert_color(&entity->weight_counter->weights_node, root);
-+
-+inc_counter:
-+ entity->weight_counter->num_active++;
-+}
-+
-+/*
-+ * Decrement the weight counter associated with the entity, and, if the
-+ * counter reaches 0, remove the counter from the tree.
-+ * See the comments to the function bfq_weights_tree_add() for considerations
-+ * about overhead.
-+ */
-+static void bfq_weights_tree_remove(struct bfq_data *bfqd,
-+ struct bfq_entity *entity,
-+ struct rb_root *root)
-+{
-+ if (!entity->weight_counter)
-+ return;
-+
-+ BUG_ON(RB_EMPTY_ROOT(root));
-+ BUG_ON(entity->weight_counter->weight != entity->weight);
-+
-+ BUG_ON(!entity->weight_counter->num_active);
-+ entity->weight_counter->num_active--;
-+ if (entity->weight_counter->num_active > 0)
-+ goto reset_entity_pointer;
-+
-+ rb_erase(&entity->weight_counter->weights_node, root);
-+ kfree(entity->weight_counter);
-+
-+reset_entity_pointer:
-+ entity->weight_counter = NULL;
-+}
-+
-+static struct request *bfq_find_next_rq(struct bfq_data *bfqd,
-+ struct bfq_queue *bfqq,
-+ struct request *last)
-+{
-+ struct rb_node *rbnext = rb_next(&last->rb_node);
-+ struct rb_node *rbprev = rb_prev(&last->rb_node);
-+ struct request *next = NULL, *prev = NULL;
-+
-+ BUG_ON(RB_EMPTY_NODE(&last->rb_node));
-+
-+ if (rbprev)
-+ prev = rb_entry_rq(rbprev);
-+
-+ if (rbnext)
-+ next = rb_entry_rq(rbnext);
-+ else {
-+ rbnext = rb_first(&bfqq->sort_list);
-+ if (rbnext && rbnext != &last->rb_node)
-+ next = rb_entry_rq(rbnext);
-+ }
-+
-+ return bfq_choose_req(bfqd, next, prev, blk_rq_pos(last));
-+}
-+
-+/* see the definition of bfq_async_charge_factor for details */
-+static unsigned long bfq_serv_to_charge(struct request *rq,
-+ struct bfq_queue *bfqq)
-+{
-+ return blk_rq_sectors(rq) *
-+ (1 + ((!bfq_bfqq_sync(bfqq)) * (bfqq->wr_coeff == 1) *
-+ bfq_async_charge_factor));
-+}
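The multiplication of boolean terms in bfq_serv_to_charge() can be read as a plain conditional. The following is a minimal user-space sketch under that reading; `serv_to_charge()` and `ASYNC_CHARGE_FACTOR` are illustrative names that mirror, but are not, the identifiers in the patch.

```c
#include <stdbool.h>

/* Mirrors the value of bfq_async_charge_factor in the patch. */
#define ASYNC_CHARGE_FACTOR 10

/*
 * Service charged for a request of the given size, in sectors.
 * Only async queues that are not weight-raised (wr_coeff == 1) pay the
 * extra factor; every other queue is charged the plain sector count.
 */
static unsigned long serv_to_charge(unsigned long sectors, bool sync,
				    unsigned int wr_coeff)
{
	if (!sync && wr_coeff == 1)
		return sectors * (1 + ASYNC_CHARGE_FACTOR);
	return sectors;
}
```

An 8-sector request is therefore charged 8 sectors on a sync queue and 88 on a non-weight-raised async queue, which is how async I/O is kept from taking throughput away from sync I/O.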
-+
-+/**
-+ * bfq_updated_next_req - update the queue after a new next_rq selection.
-+ * @bfqd: the device data the queue belongs to.
-+ * @bfqq: the queue to update.
-+ *
-+ * If the first request of a queue changes we make sure that the queue
-+ * has enough budget to serve at least its first request (if the
-+ * request has grown). We do this because if the queue has not enough
-+ * budget for its first request, it has to go through two dispatch
-+ * rounds to actually get it dispatched.
-+ */
-+static void bfq_updated_next_req(struct bfq_data *bfqd,
-+ struct bfq_queue *bfqq)
-+{
-+ struct bfq_entity *entity = &bfqq->entity;
-+ struct bfq_service_tree *st = bfq_entity_service_tree(entity);
-+ struct request *next_rq = bfqq->next_rq;
-+ unsigned long new_budget;
-+
-+ if (!next_rq)
-+ return;
-+
-+ if (bfqq == bfqd->in_service_queue)
-+ /*
-+ * In order not to break guarantees, budgets cannot be
-+ * changed after an entity has been selected.
-+ */
-+ return;
-+
-+ BUG_ON(entity->tree != &st->active);
-+ BUG_ON(entity == entity->sched_data->in_service_entity);
-+
-+ new_budget = max_t(unsigned long, bfqq->max_budget,
-+ bfq_serv_to_charge(next_rq, bfqq));
-+ if (entity->budget != new_budget) {
-+ entity->budget = new_budget;
-+ bfq_log_bfqq(bfqd, bfqq, "updated next rq: new budget %lu",
-+ new_budget);
-+ bfq_activate_bfqq(bfqd, bfqq);
-+ }
-+}
-+
-+static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
-+{
-+ u64 dur;
-+
-+ if (bfqd->bfq_wr_max_time > 0)
-+ return bfqd->bfq_wr_max_time;
-+
-+ dur = bfqd->RT_prod;
-+ do_div(dur, bfqd->peak_rate);
-+
-+ return dur;
-+}
-+
-+/* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
-+static void bfq_reset_burst_list(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+ struct bfq_queue *item;
-+ struct hlist_node *n;
-+
-+ hlist_for_each_entry_safe(item, n, &bfqd->burst_list, burst_list_node)
-+ hlist_del_init(&item->burst_list_node);
-+ hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
-+ bfqd->burst_size = 1;
-+}
-+
-+/* Add bfqq to the list of queues in current burst (see bfq_handle_burst) */
-+static void bfq_add_to_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+ /* Increment burst size to take into account also bfqq */
-+ bfqd->burst_size++;
-+
-+ if (bfqd->burst_size == bfqd->bfq_large_burst_thresh) {
-+ struct bfq_queue *pos, *bfqq_item;
-+ struct hlist_node *n;
-+
-+ /*
-+ * Enough queues have been activated shortly after each
-+ * other to consider this burst as large.
-+ */
-+ bfqd->large_burst = true;
-+
-+ /*
-+ * We can now mark all queues in the burst list as
-+ * belonging to a large burst.
-+ */
-+ hlist_for_each_entry(bfqq_item, &bfqd->burst_list,
-+ burst_list_node)
-+ bfq_mark_bfqq_in_large_burst(bfqq_item);
-+ bfq_mark_bfqq_in_large_burst(bfqq);
-+
-+ /*
-+ * From now on, and until the current burst finishes, any
-+ * new queue being activated shortly after the last queue
-+ * was inserted in the burst can be immediately marked as
-+ * belonging to a large burst. So the burst list is not
-+ * needed any more. Remove it.
-+ */
-+ hlist_for_each_entry_safe(pos, n, &bfqd->burst_list,
-+ burst_list_node)
-+ hlist_del_init(&pos->burst_list_node);
-+ } else /* burst not yet large: add bfqq to the burst list */
-+ hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
-+}
-+
-+/*
-+ * If many queues happen to become active shortly after each other, then,
-+ * to help the processes associated with these queues get their job done as
-+ * soon as possible, it is usually better to not grant either weight-raising
-+ * or device idling to these queues. In this comment we describe, firstly,
-+ * the reasons why this fact holds, and, secondly, the next function, which
-+ * implements the main steps needed to properly mark these queues so that
-+ * they can then be treated in a different way.
-+ *
-+ * As for the terminology, we say that a queue becomes active, i.e.,
-+ * switches from idle to backlogged, either when it is created (as a
-+ * consequence of the arrival of an I/O request), or, if already existing,
-+ * when a new request for the queue arrives while the queue is idle.
-+ * Bursts of activations, i.e., activations of different queues occurring
-+ * shortly after each other, are typically caused by services or applications
-+ * that spawn or reactivate many parallel threads/processes. Examples are
-+ * systemd during boot or git grep.
-+ *
-+ * These services or applications benefit mostly from a high throughput:
-+ * the quicker the requests of the activated queues are cumulatively served,
-+ * the sooner the target job of these queues gets completed. As a consequence,
-+ * weight-raising any of these queues, which also implies idling the device
-+ * for it, is almost always counterproductive: in most cases it just lowers
-+ * throughput.
-+ *
-+ * On the other hand, a burst of activations may be also caused by the start
-+ * of an application that does not consist of a lot of parallel I/O-bound
-+ * threads. In fact, with a complex application, the burst may be just a
-+ * consequence of the fact that several processes need to be executed to
-+ * start up the application. To start an application as quickly as possible,
-+ * the best thing to do is to privilege the I/O related to the application
-+ * with respect to all other I/O. Therefore, the best strategy to start as
-+ * quickly as possible an application that causes a burst of activations is
-+ * to weight-raise all the queues activated during the burst. This is the
-+ * exact opposite of the best strategy for the other type of bursts.
-+ *
-+ * In the end, to take the best action for each of the two cases, the two
-+ * types of bursts need to be distinguished. Fortunately, this seems
-+ * relatively easy to do, by looking at the sizes of the bursts. In
-+ * particular, we found a threshold such that bursts with a larger size
-+ * than that threshold are apparently caused only by services or commands
-+ * such as systemd or git grep. For brevity, hereafter we simply call these
-+ * bursts 'large'. BFQ *does not* weight-raise queues whose activations occur
-+ * in a large burst. In addition, for each of these queues BFQ performs or
-+ * does not perform idling depending on which choice boosts the throughput
-+ * most. The exact choice depends on the device and request pattern at
-+ * hand.
-+ *
-+ * Turning back to the next function, it implements all the steps needed
-+ * to detect the occurrence of a large burst and to properly mark all the
-+ * queues belonging to it (so that they can then be treated in a different
-+ * way). This goal is achieved by maintaining a special "burst list" that
-+ * holds, temporarily, the queues that belong to the burst in progress. The
-+ * list is then used to mark these queues as belonging to a large burst if
-+ * the burst does become large. The main steps are the following.
-+ *
-+ * . when the very first queue is activated, the queue is inserted into the
-+ * list (as it could be the first queue in a possible burst)
-+ *
-+ * . if the current burst has not yet become large, and a queue Q that does
-+ * not yet belong to the burst is activated shortly after the last time
-+ * at which a new queue entered the burst list, then the function appends
-+ * Q to the burst list
-+ *
-+ * . if, as a consequence of the previous step, the burst size reaches
-+ * the large-burst threshold, then
-+ *
-+ * . all the queues in the burst list are marked as belonging to a
-+ * large burst
-+ *
-+ * . the burst list is deleted; in fact, the burst list already served
-+ * its purpose (keeping temporarily track of the queues in a burst,
-+ * so as to be able to mark them as belonging to a large burst in the
-+ * previous sub-step), and now is not needed any more
-+ *
-+ * . the device enters a large-burst mode
-+ *
-+ * . if a queue Q that does not belong to the burst is activated while
-+ * the device is in large-burst mode and shortly after the last time
-+ * at which a queue either entered the burst list or was marked as
-+ * belonging to the current large burst, then Q is immediately marked
-+ * as belonging to a large burst.
-+ *
-+ * . if a queue Q that does not belong to the burst is activated a while
-+ * after, i.e., not shortly after, the last time at which a queue
-+ * either entered the burst list or was marked as belonging to the
-+ * current large burst, then the current burst is deemed as finished and:
-+ *
-+ * . the large-burst mode is reset if set
-+ *
-+ * . the burst list is emptied
-+ *
-+ * . Q is inserted in the burst list, as Q may be the first queue
-+ * in a possible new burst (then the burst list contains just Q
-+ * after this step).
-+ */
-+static void bfq_handle_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+ bool idle_for_long_time)
-+{
-+ /*
-+ * If bfqq happened to be activated in a burst, but has been idle
-+ * for at least as long as an interactive queue, then we assume
-+ * that, in the overall I/O initiated in the burst, the I/O
-+ * associated with bfqq is finished. So bfqq does not need to be
-+ * treated as a queue belonging to a burst anymore. Accordingly,
-+ * we reset bfqq's in_large_burst flag if set, and remove bfqq
-+ * from the burst list if it's there. We do not, however, decrement
-+ * burst_size, because the fact that bfqq does not need to belong
-+ * to the burst list any more does not invalidate the fact that
-+ * bfqq may have been activated during the current burst.
-+ */
-+ if (idle_for_long_time) {
-+ hlist_del_init(&bfqq->burst_list_node);
-+ bfq_clear_bfqq_in_large_burst(bfqq);
-+ }
-+
-+ /*
-+ * If bfqq is already in the burst list or is part of a large
-+ * burst, then there is nothing else to do.
-+ */
-+ if (!hlist_unhashed(&bfqq->burst_list_node) ||
-+ bfq_bfqq_in_large_burst(bfqq))
-+ return;
-+
-+ /*
-+ * If bfqq's activation happens late enough, then the current
-+ * burst is finished, and related data structures must be reset.
-+ *
-+ * In this respect, consider the special case where bfqq is the very
-+ * first queue being activated. In this case, last_ins_in_burst is
-+ * not yet significant when we get here. But it is easy to verify
-+ * that, whether or not the following condition is true, bfqq will
-+ * end up being inserted into the burst list. In particular the
-+ * list will happen to contain only bfqq. And this is exactly what
-+ * has to happen, as bfqq may be the first queue in a possible
-+ * burst.
-+ */
-+ if (time_is_before_jiffies(bfqd->last_ins_in_burst +
-+ bfqd->bfq_burst_interval)) {
-+ bfqd->large_burst = false;
-+ bfq_reset_burst_list(bfqd, bfqq);
-+ return;
-+ }
-+
-+ /*
-+ * If we get here, then bfqq is being activated shortly after the
-+ * last queue. So, if the current burst is also large, we can mark
-+ * bfqq as belonging to this large burst immediately.
-+ */
-+ if (bfqd->large_burst) {
-+ bfq_mark_bfqq_in_large_burst(bfqq);
-+ return;
-+ }
-+
-+ /*
-+ * If we get here, then a large-burst state has not yet been
-+ * reached, but bfqq is being activated shortly after the last
-+ * queue. Then we add bfqq to the burst.
-+ */
-+ bfq_add_to_burst(bfqd, bfqq);
-+}
-+
-+static void bfq_add_request(struct request *rq)
-+{
-+ struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+ struct bfq_entity *entity = &bfqq->entity;
-+ struct bfq_data *bfqd = bfqq->bfqd;
-+ struct request *next_rq, *prev;
-+ unsigned long old_wr_coeff = bfqq->wr_coeff;
-+ bool interactive = false;
-+
-+ bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
-+ bfqq->queued[rq_is_sync(rq)]++;
-+ bfqd->queued++;
-+
-+ elv_rb_add(&bfqq->sort_list, rq);
-+
-+ /*
-+ * Check if this request is a better next-serve candidate.
-+ */
-+ prev = bfqq->next_rq;
-+ next_rq = bfq_choose_req(bfqd, bfqq->next_rq, rq, bfqd->last_position);
-+ BUG_ON(!next_rq);
-+ bfqq->next_rq = next_rq;
-+
-+ if (!bfq_bfqq_busy(bfqq)) {
-+ bool soft_rt, in_burst,
-+ idle_for_long_time = time_is_before_jiffies(
-+ bfqq->budget_timeout +
-+ bfqd->bfq_wr_min_idle_time);
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqg_stats_update_io_add(bfqq_group(RQ_BFQQ(rq)), bfqq,
-+ rq->cmd_flags);
-+#endif
-+ if (bfq_bfqq_sync(bfqq)) {
-+ bool already_in_burst =
-+ !hlist_unhashed(&bfqq->burst_list_node) ||
-+ bfq_bfqq_in_large_burst(bfqq);
-+ bfq_handle_burst(bfqd, bfqq, idle_for_long_time);
-+ /*
-+ * If bfqq was not already in the current burst,
-+ * then, at this point, bfqq either has been
-+ * added to the current burst or has caused the
-+ * current burst to terminate. In particular, in
-+ * the second case, bfqq has become the first
-+ * queue in a possible new burst.
-+ * In both cases last_ins_in_burst needs to be
-+ * moved forward.
-+ */
-+ if (!already_in_burst)
-+ bfqd->last_ins_in_burst = jiffies;
-+ }
-+
-+ in_burst = bfq_bfqq_in_large_burst(bfqq);
-+ soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
-+ !in_burst &&
-+ time_is_before_jiffies(bfqq->soft_rt_next_start);
-+ interactive = !in_burst && idle_for_long_time;
-+ entity->budget = max_t(unsigned long, bfqq->max_budget,
-+ bfq_serv_to_charge(next_rq, bfqq));
-+
-+ if (!bfq_bfqq_IO_bound(bfqq)) {
-+ if (time_before(jiffies,
-+ RQ_BIC(rq)->ttime.last_end_request +
-+ bfqd->bfq_slice_idle)) {
-+ bfqq->requests_within_timer++;
-+ if (bfqq->requests_within_timer >=
-+ bfqd->bfq_requests_within_timer)
-+ bfq_mark_bfqq_IO_bound(bfqq);
-+ } else
-+ bfqq->requests_within_timer = 0;
-+ }
-+
-+ if (!bfqd->low_latency)
-+ goto add_bfqq_busy;
-+
-+ /*
-+ * If the queue:
-+ * - is not being boosted,
-+ * - has been idle for enough time,
-+ * - is not a sync queue or is linked to a bfq_io_cq (it is
-+ * shared "by its nature" or it is not shared and its
-+ * requests have not been redirected to a shared queue)
-+ * start a weight-raising period.
-+ */
-+ if (old_wr_coeff == 1 && (interactive || soft_rt) &&
-+ (!bfq_bfqq_sync(bfqq) || bfqq->bic)) {
-+ bfqq->wr_coeff = bfqd->bfq_wr_coeff;
-+ if (interactive)
-+ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
-+ else
-+ bfqq->wr_cur_max_time =
-+ bfqd->bfq_wr_rt_max_time;
-+ bfq_log_bfqq(bfqd, bfqq,
-+ "wrais starting at %lu, rais_max_time %u",
-+ jiffies,
-+ jiffies_to_msecs(bfqq->wr_cur_max_time));
-+ } else if (old_wr_coeff > 1) {
-+ if (interactive)
-+ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
-+ else if (in_burst ||
-+ (bfqq->wr_cur_max_time ==
-+ bfqd->bfq_wr_rt_max_time &&
-+ !soft_rt)) {
-+ bfqq->wr_coeff = 1;
-+ bfq_log_bfqq(bfqd, bfqq,
-+ "wrais ending at %lu, rais_max_time %u",
-+ jiffies,
-+ jiffies_to_msecs(bfqq->
-+ wr_cur_max_time));
-+ } else if (time_before(
-+ bfqq->last_wr_start_finish +
-+ bfqq->wr_cur_max_time,
-+ jiffies +
-+ bfqd->bfq_wr_rt_max_time) &&
-+ soft_rt) {
-+ /*
-+ *
-+ * The remaining weight-raising time is lower
-+ * than bfqd->bfq_wr_rt_max_time, which means
-+ * that the application is enjoying weight
-+ * raising either because it was deemed soft rt
-+ * in the near past, or because it was deemed
-+ * interactive long ago.
-+ * In both cases, resetting now the current
-+ * remaining weight-raising time for the
-+ * application to the weight-raising duration
-+ * for soft rt applications would not cause any
-+ * latency increase for the application (as the
-+ * new duration would be higher than the
-+ * remaining time).
-+ *
-+ * In addition, the application is now meeting
-+ * the requirements for being deemed soft rt.
-+ * In the end we can correctly and safely
-+ * (re)charge the weight-raising duration for
-+ * the application with the weight-raising
-+ * duration for soft rt applications.
-+ *
-+ * In particular, doing this recharge now, i.e.,
-+ * before the weight-raising period for the
-+ * application finishes, reduces the probability
-+ * of the following negative scenario:
-+ * 1) the weight of a soft rt application is
-+ * raised at startup (as for any newly
-+ * created application),
-+ * 2) since the application is not interactive,
-+ * at a certain time weight-raising is
-+ * stopped for the application,
-+ * 3) at that time the application happens to
-+ * still have pending requests, and hence
-+ * is destined to not have a chance to be
-+ * deemed soft rt before these requests are
-+ * completed (see the comments to the
-+ * function bfq_bfqq_softrt_next_start()
-+ * for details on soft rt detection),
-+ * 4) these pending requests experience a high
-+ * latency because the application is not
-+ * weight-raised while they are pending.
-+ */
-+ bfqq->last_wr_start_finish = jiffies;
-+ bfqq->wr_cur_max_time =
-+ bfqd->bfq_wr_rt_max_time;
-+ }
-+ }
-+ if (old_wr_coeff != bfqq->wr_coeff)
-+ entity->prio_changed = 1;
-+add_bfqq_busy:
-+ bfqq->last_idle_bklogged = jiffies;
-+ bfqq->service_from_backlogged = 0;
-+ bfq_clear_bfqq_softrt_update(bfqq);
-+ bfq_add_bfqq_busy(bfqd, bfqq);
-+ } else {
-+ if (bfqd->low_latency && old_wr_coeff == 1 && !rq_is_sync(rq) &&
-+ time_is_before_jiffies(
-+ bfqq->last_wr_start_finish +
-+ bfqd->bfq_wr_min_inter_arr_async)) {
-+ bfqq->wr_coeff = bfqd->bfq_wr_coeff;
-+ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
-+
-+ bfqd->wr_busy_queues++;
-+ entity->prio_changed = 1;
-+ bfq_log_bfqq(bfqd, bfqq,
-+ "non-idle wrais starting at %lu, rais_max_time %u",
-+ jiffies,
-+ jiffies_to_msecs(bfqq->wr_cur_max_time));
-+ }
-+ if (prev != bfqq->next_rq)
-+ bfq_updated_next_req(bfqd, bfqq);
-+ }
-+
-+ if (bfqd->low_latency &&
-+ (old_wr_coeff == 1 || bfqq->wr_coeff == 1 || interactive))
-+ bfqq->last_wr_start_finish = jiffies;
-+}
-+
-+static struct request *bfq_find_rq_fmerge(struct bfq_data *bfqd,
-+ struct bio *bio)
-+{
-+ struct task_struct *tsk = current;
-+ struct bfq_io_cq *bic;
-+ struct bfq_queue *bfqq;
-+
-+ bic = bfq_bic_lookup(bfqd, tsk->io_context);
-+ if (!bic)
-+ return NULL;
-+
-+ bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
-+ if (bfqq)
-+ return elv_rb_find(&bfqq->sort_list, bio_end_sector(bio));
-+
-+ return NULL;
-+}
-+
-+static void bfq_activate_request(struct request_queue *q, struct request *rq)
-+{
-+ struct bfq_data *bfqd = q->elevator->elevator_data;
-+
-+ bfqd->rq_in_driver++;
-+ bfqd->last_position = blk_rq_pos(rq) + blk_rq_sectors(rq);
-+ bfq_log(bfqd, "activate_request: new bfqd->last_position %llu",
-+ (long long unsigned)bfqd->last_position);
-+}
-+
-+static void bfq_deactivate_request(struct request_queue *q, struct request *rq)
-+{
-+ struct bfq_data *bfqd = q->elevator->elevator_data;
-+
-+ BUG_ON(bfqd->rq_in_driver == 0);
-+ bfqd->rq_in_driver--;
-+}
-+
-+static void bfq_remove_request(struct request *rq)
-+{
-+ struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+ struct bfq_data *bfqd = bfqq->bfqd;
-+ const int sync = rq_is_sync(rq);
-+
-+ if (bfqq->next_rq == rq) {
-+ bfqq->next_rq = bfq_find_next_rq(bfqd, bfqq, rq);
-+ bfq_updated_next_req(bfqd, bfqq);
-+ }
-+
-+ if (rq->queuelist.prev != &rq->queuelist)
-+ list_del_init(&rq->queuelist);
-+ BUG_ON(bfqq->queued[sync] == 0);
-+ bfqq->queued[sync]--;
-+ bfqd->queued--;
-+ elv_rb_del(&bfqq->sort_list, rq);
-+
-+ if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
-+ if (bfq_bfqq_busy(bfqq) && bfqq != bfqd->in_service_queue)
-+ bfq_del_bfqq_busy(bfqd, bfqq, 1);
-+ /*
-+ * Remove queue from request-position tree as it is empty.
-+ */
-+ if (bfqq->pos_root) {
-+ rb_erase(&bfqq->pos_node, bfqq->pos_root);
-+ bfqq->pos_root = NULL;
-+ }
-+ }
-+
-+ if (rq->cmd_flags & REQ_META) {
-+ BUG_ON(bfqq->meta_pending == 0);
-+ bfqq->meta_pending--;
-+ }
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqg_stats_update_io_remove(bfqq_group(bfqq), rq->cmd_flags);
-+#endif
-+}
-+
-+static int bfq_merge(struct request_queue *q, struct request **req,
-+ struct bio *bio)
-+{
-+ struct bfq_data *bfqd = q->elevator->elevator_data;
-+ struct request *__rq;
-+
-+ __rq = bfq_find_rq_fmerge(bfqd, bio);
-+ if (__rq && elv_rq_merge_ok(__rq, bio)) {
-+ *req = __rq;
-+ return ELEVATOR_FRONT_MERGE;
-+ }
-+
-+ return ELEVATOR_NO_MERGE;
-+}
-+
-+static void bfq_merged_request(struct request_queue *q, struct request *req,
-+ int type)
-+{
-+ if (type == ELEVATOR_FRONT_MERGE &&
-+ rb_prev(&req->rb_node) &&
-+ blk_rq_pos(req) <
-+ blk_rq_pos(container_of(rb_prev(&req->rb_node),
-+ struct request, rb_node))) {
-+ struct bfq_queue *bfqq = RQ_BFQQ(req);
-+ struct bfq_data *bfqd = bfqq->bfqd;
-+ struct request *prev, *next_rq;
-+
-+ /* Reposition request in its sort_list */
-+ elv_rb_del(&bfqq->sort_list, req);
-+ elv_rb_add(&bfqq->sort_list, req);
-+ /* Choose next request to be served for bfqq */
-+ prev = bfqq->next_rq;
-+ next_rq = bfq_choose_req(bfqd, bfqq->next_rq, req,
-+ bfqd->last_position);
-+ BUG_ON(!next_rq);
-+ bfqq->next_rq = next_rq;
-+ }
-+}
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+static void bfq_bio_merged(struct request_queue *q, struct request *req,
-+ struct bio *bio)
-+{
-+ bfqg_stats_update_io_merged(bfqq_group(RQ_BFQQ(req)), bio->bi_rw);
-+}
-+#endif
-+
-+static void bfq_merged_requests(struct request_queue *q, struct request *rq,
-+ struct request *next)
-+{
-+ struct bfq_queue *bfqq = RQ_BFQQ(rq), *next_bfqq = RQ_BFQQ(next);
-+
-+ /*
-+ * If next and rq belong to the same bfq_queue and next is older
-+ * than rq, then reposition rq in the fifo (by substituting next
-+ * with rq). Otherwise, if next and rq belong to different
-+ * bfq_queues, never reposition rq: in fact, we would have to
-+ * reposition it with respect to next's position in its own fifo,
-+ * which would most certainly be too expensive with respect to
-+ * the benefits.
-+ */
-+ if (bfqq == next_bfqq &&
-+ !list_empty(&rq->queuelist) && !list_empty(&next->queuelist) &&
-+ time_before(next->fifo_time, rq->fifo_time)) {
-+ list_del_init(&rq->queuelist);
-+ list_replace_init(&next->queuelist, &rq->queuelist);
-+ rq->fifo_time = next->fifo_time;
-+ }
-+
-+ if (bfqq->next_rq == next)
-+ bfqq->next_rq = rq;
-+
-+ bfq_remove_request(next);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqg_stats_update_io_merged(bfqq_group(bfqq), next->cmd_flags);
-+#endif
-+}
-+
-+/* Must be called with bfqq != NULL */
-+static void bfq_bfqq_end_wr(struct bfq_queue *bfqq)
-+{
-+ BUG_ON(!bfqq);
-+ if (bfq_bfqq_busy(bfqq))
-+ bfqq->bfqd->wr_busy_queues--;
-+ bfqq->wr_coeff = 1;
-+ bfqq->wr_cur_max_time = 0;
-+ /* Trigger a weight change on the next activation of the queue */
-+ bfqq->entity.prio_changed = 1;
-+}
-+
-+static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
-+ struct bfq_group *bfqg)
-+{
-+ int i, j;
-+
-+ for (i = 0; i < 2; i++)
-+ for (j = 0; j < IOPRIO_BE_NR; j++)
-+ if (bfqg->async_bfqq[i][j])
-+ bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
-+ if (bfqg->async_idle_bfqq)
-+ bfq_bfqq_end_wr(bfqg->async_idle_bfqq);
-+}
-+
-+static void bfq_end_wr(struct bfq_data *bfqd)
-+{
-+ struct bfq_queue *bfqq;
-+
-+ spin_lock_irq(bfqd->queue->queue_lock);
-+
-+ list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
-+ bfq_bfqq_end_wr(bfqq);
-+ list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
-+ bfq_bfqq_end_wr(bfqq);
-+ bfq_end_wr_async(bfqd);
-+
-+ spin_unlock_irq(bfqd->queue->queue_lock);
-+}
-+
-+static int bfq_allow_merge(struct request_queue *q, struct request *rq,
-+ struct bio *bio)
-+{
-+ struct bfq_data *bfqd = q->elevator->elevator_data;
-+ struct bfq_io_cq *bic;
-+
-+ /*
-+ * Disallow merge of a sync bio into an async request.
-+ */
-+ if (bfq_bio_sync(bio) && !rq_is_sync(rq))
-+ return 0;
-+
-+ /*
-+ * Lookup the bfqq that this bio will be queued with. Allow
-+ * merge only if rq is queued there.
-+ * Queue lock is held here.
-+ */
-+ bic = bfq_bic_lookup(bfqd, current->io_context);
-+ if (!bic)
-+ return 0;
-+
-+ return bic_to_bfqq(bic, bfq_bio_sync(bio)) == RQ_BFQQ(rq);
-+}
-+
-+static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
-+ struct bfq_queue *bfqq)
-+{
-+ if (bfqq) {
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqg_stats_update_avg_queue_size(bfqq_group(bfqq));
-+#endif
-+ bfq_mark_bfqq_must_alloc(bfqq);
-+ bfq_mark_bfqq_budget_new(bfqq);
-+ bfq_clear_bfqq_fifo_expire(bfqq);
-+
-+ bfqd->budgets_assigned = (bfqd->budgets_assigned*7 + 256) / 8;
-+
-+ bfq_log_bfqq(bfqd, bfqq,
-+ "set_in_service_queue, cur-budget = %d",
-+ bfqq->entity.budget);
-+ }
-+
-+ bfqd->in_service_queue = bfqq;
-+}
-+
-+/*
-+ * Get and set a new queue for service.
-+ */
-+static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd)
-+{
-+ struct bfq_queue *bfqq = bfq_get_next_queue(bfqd);
-+
-+ __bfq_set_in_service_queue(bfqd, bfqq);
-+ return bfqq;
-+}
-+
-+/*
-+ * If enough samples have been computed, return the current max budget
-+ * stored in bfqd, which is dynamically updated according to the
-+ * estimated disk peak rate; otherwise return the default max budget
-+ */
-+static int bfq_max_budget(struct bfq_data *bfqd)
-+{
-+ if (bfqd->budgets_assigned < bfq_stats_min_budgets)
-+ return bfq_default_max_budget;
-+ else
-+ return bfqd->bfq_max_budget;
-+}
-+
-+/*
-+ * Return min budget, which is a fraction of the current or default
-+ * max budget (trying with 1/32)
-+ */
-+static int bfq_min_budget(struct bfq_data *bfqd)
-+{
-+ if (bfqd->budgets_assigned < bfq_stats_min_budgets)
-+ return bfq_default_max_budget / 32;
-+ else
-+ return bfqd->bfq_max_budget / 32;
-+}
-+
-+static void bfq_arm_slice_timer(struct bfq_data *bfqd)
-+{
-+ struct bfq_queue *bfqq = bfqd->in_service_queue;
-+ struct bfq_io_cq *bic;
-+ unsigned long sl;
-+
-+ BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
-+
-+ /* Processes have exited, don't wait. */
-+ bic = bfqd->in_service_bic;
-+ if (!bic || atomic_read(&bic->icq.ioc->active_ref) == 0)
-+ return;
-+
-+ bfq_mark_bfqq_wait_request(bfqq);
-+
-+ /*
-+ * We don't want to idle for seeks, but we do want to allow
-+ * fair distribution of slice time for a process doing back-to-back
-+ * seeks. So allow a little bit of time for it to submit a new rq.
-+ *
-+ * To prevent processes with (partly) seeky workloads from
-+ * being too ill-treated, grant them a small fraction of the
-+ * assigned budget before reducing the waiting time to
-+ * BFQ_MIN_TT. This happened to help reduce latency.
-+ */
-+ sl = bfqd->bfq_slice_idle;
-+ /*
-+ * Unless the queue is being weight-raised or the scenario is
-+ * asymmetric, grant only minimum idle time if the queue either
-+ * has been seeky for long enough or has already proved to be
-+ * constantly seeky.
-+ */
-+ if (bfq_sample_valid(bfqq->seek_samples) &&
-+ ((BFQQ_SEEKY(bfqq) && bfqq->entity.service >
-+ bfq_max_budget(bfqq->bfqd) / 8) ||
-+ bfq_bfqq_constantly_seeky(bfqq)) && bfqq->wr_coeff == 1 &&
-+ bfq_symmetric_scenario(bfqd))
-+ sl = min(sl, msecs_to_jiffies(BFQ_MIN_TT));
-+ else if (bfqq->wr_coeff > 1)
-+ sl = sl * 3;
-+ bfqd->last_idling_start = ktime_get();
-+ mod_timer(&bfqd->idle_slice_timer, jiffies + sl);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqg_stats_set_start_idle_time(bfqq_group(bfqq));
-+#endif
-+ bfq_log(bfqd, "arm idle: %u/%u ms",
-+ jiffies_to_msecs(sl), jiffies_to_msecs(bfqd->bfq_slice_idle));
-+}
-+
-+/*
-+ * Set the maximum time for the in-service queue to consume its
-+ * budget. This prevents seeky processes from lowering the disk
-+ * throughput (always guaranteed with a time slice scheme as in CFQ).
-+ */
-+static void bfq_set_budget_timeout(struct bfq_data *bfqd)
-+{
-+ struct bfq_queue *bfqq = bfqd->in_service_queue;
-+ unsigned int timeout_coeff;
-+ if (bfqq->wr_cur_max_time == bfqd->bfq_wr_rt_max_time)
-+ timeout_coeff = 1;
-+ else
-+ timeout_coeff = bfqq->entity.weight / bfqq->entity.orig_weight;
-+
-+ bfqd->last_budget_start = ktime_get();
-+
-+ bfq_clear_bfqq_budget_new(bfqq);
-+ bfqq->budget_timeout = jiffies +
-+ bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] * timeout_coeff;
-+
-+ bfq_log_bfqq(bfqd, bfqq, "set budget_timeout %u",
-+ jiffies_to_msecs(bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] *
-+ timeout_coeff));
-+}
-+
-+/*
-+ * Move request from internal lists to the request queue dispatch list.
-+ */
-+static void bfq_dispatch_insert(struct request_queue *q, struct request *rq)
-+{
-+ struct bfq_data *bfqd = q->elevator->elevator_data;
-+ struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+
-+ /*
-+ * For consistency, the next instruction should have been executed
-+ * after removing the request from the queue and dispatching it.
-+ * We execute instead this instruction before bfq_remove_request()
-+ * (and hence introduce a temporary inconsistency), for efficiency.
-+ * In fact, in a forced_dispatch, this prevents two counters related
-+ * to bfqq->dispatched from being uselessly decremented if bfqq
-+ * is not in service, and then incremented again after
-+ * incrementing bfqq->dispatched.
-+ */
-+ bfqq->dispatched++;
-+ bfq_remove_request(rq);
-+ elv_dispatch_sort(q, rq);
-+
-+ if (bfq_bfqq_sync(bfqq))
-+ bfqd->sync_flight++;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqg_stats_update_dispatch(bfqq_group(bfqq), blk_rq_bytes(rq),
-+ rq->cmd_flags);
-+#endif
-+}
-+
-+/*
-+ * Return expired entry, or NULL to just start from scratch in rbtree.
-+ */
-+static struct request *bfq_check_fifo(struct bfq_queue *bfqq)
-+{
-+ struct request *rq = NULL;
-+
-+ if (bfq_bfqq_fifo_expire(bfqq))
-+ return NULL;
-+
-+ bfq_mark_bfqq_fifo_expire(bfqq);
-+
-+ if (list_empty(&bfqq->fifo))
-+ return NULL;
-+
-+ rq = rq_entry_fifo(bfqq->fifo.next);
-+
-+ if (time_before(jiffies, rq->fifo_time))
-+ return NULL;
-+
-+ return rq;
-+}
-+
-+static int bfq_bfqq_budget_left(struct bfq_queue *bfqq)
-+{
-+ struct bfq_entity *entity = &bfqq->entity;
-+ return entity->budget - entity->service;
-+}
-+
-+static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+ BUG_ON(bfqq != bfqd->in_service_queue);
-+
-+ __bfq_bfqd_reset_in_service(bfqd);
-+
-+ if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
-+ /*
-+ * Overloading budget_timeout field to store the time
-+ * at which the queue is left with no backlog; used by
-+ * the weight-raising mechanism.
-+ */
-+ bfqq->budget_timeout = jiffies;
-+ bfq_del_bfqq_busy(bfqd, bfqq, 1);
-+ } else
-+ bfq_activate_bfqq(bfqd, bfqq);
-+}
-+
-+/**
-+ * __bfq_bfqq_recalc_budget - try to adapt the budget to the @bfqq behavior.
-+ * @bfqd: device data.
-+ * @bfqq: queue to update.
-+ * @reason: reason for expiration.
-+ *
-+ * Handle the feedback on @bfqq budget at queue expiration.
-+ * See the body for detailed comments.
-+ */
-+static void __bfq_bfqq_recalc_budget(struct bfq_data *bfqd,
-+ struct bfq_queue *bfqq,
-+ enum bfqq_expiration reason)
-+{
-+ struct request *next_rq;
-+ int budget, min_budget;
-+
-+ budget = bfqq->max_budget;
-+ min_budget = bfq_min_budget(bfqd);
-+
-+ BUG_ON(bfqq != bfqd->in_service_queue);
-+
-+ bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last budg %d, budg left %d",
-+ bfqq->entity.budget, bfq_bfqq_budget_left(bfqq));
-+ bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last max_budg %d, min budg %d",
-+ budget, bfq_min_budget(bfqd));
-+ bfq_log_bfqq(bfqd, bfqq, "recalc_budg: sync %d, seeky %d",
-+ bfq_bfqq_sync(bfqq), BFQQ_SEEKY(bfqd->in_service_queue));
-+
-+ if (bfq_bfqq_sync(bfqq)) {
-+ switch (reason) {
-+ /*
-+ * Caveat: in all the following cases we trade latency
-+ * for throughput.
-+ */
-+ case BFQ_BFQQ_TOO_IDLE:
-+ /*
-+ * This is the only case where we may reduce
-+ * the budget: if there is no request of the
-+ * process still waiting for completion, then
-+ * we assume (tentatively) that the timer has
-+ * expired because the batch of requests of
-+ * the process could have been served with a
-+ * smaller budget. Hence, betting that
-+ * the process will behave in the same way when it
-+ * becomes backlogged again, we reduce its
-+ * next budget. As long as we guess right,
-+ * this budget cut reduces the latency
-+ * experienced by the process.
-+ *
-+ * However, if there are still outstanding
-+ * requests, then the process may not yet have
-+ * issued its next request just because it is
-+ * still waiting for the completion of some of
-+ * the still outstanding ones. So in this
-+ * subcase we do not reduce its budget, on the
-+ * contrary we increase it to possibly boost
-+ * the throughput, as discussed in the
-+ * comments to the BUDGET_TIMEOUT case.
-+ */
-+ if (bfqq->dispatched > 0) /* still outstanding reqs */
-+ budget = min(budget * 2, bfqd->bfq_max_budget);
-+ else {
-+ if (budget > 5 * min_budget)
-+ budget -= 4 * min_budget;
-+ else
-+ budget = min_budget;
-+ }
-+ break;
-+ case BFQ_BFQQ_BUDGET_TIMEOUT:
-+ /*
-+ * We double the budget here because: 1) it
-+ * gives the chance to boost the throughput if
-+ * this is not a seeky process (which may have
-+ * bumped into this timeout because of, e.g.,
-+ * ZBR), 2) together with charge_full_budget
-+ * it helps give seeky processes higher
-+ * timestamps, and hence be served less
-+ * frequently.
-+ */
-+ budget = min(budget * 2, bfqd->bfq_max_budget);
-+ break;
-+ case BFQ_BFQQ_BUDGET_EXHAUSTED:
-+ /*
-+ * The process still has backlog, and did not
-+ * let either the budget timeout or the disk
-+ * idling timeout expire. Hence it is not
-+ * seeky, has a short thinktime and may be
-+ * happy with a higher budget too. So
-+ * definitely increase the budget of this good
-+ * candidate to boost the disk throughput.
-+ */
-+ budget = min(budget * 4, bfqd->bfq_max_budget);
-+ break;
-+ case BFQ_BFQQ_NO_MORE_REQUESTS:
-+ /*
-+ * Leave the budget unchanged.
-+ */
-+ default:
-+ return;
-+ }
-+ } else
-+ /*
-+ * Async queues always get the maximum possible budget
-+ * (their ability to dispatch is limited by
-+ * @bfqd->bfq_max_budget_async_rq).
-+ */
-+ budget = bfqd->bfq_max_budget;
-+
-+ bfqq->max_budget = budget;
-+
-+ if (bfqd->budgets_assigned >= bfq_stats_min_budgets &&
-+ !bfqd->bfq_user_max_budget)
-+ bfqq->max_budget = min(bfqq->max_budget, bfqd->bfq_max_budget);
-+
-+ /*
-+ * Make sure that we have enough budget for the next request.
-+ * Since the finish time of the bfqq must be kept in sync with
-+ * the budget, be sure to call __bfq_bfqq_expire() after the
-+ * update.
-+ */
-+ next_rq = bfqq->next_rq;
-+ if (next_rq)
-+ bfqq->entity.budget = max_t(unsigned long, bfqq->max_budget,
-+ bfq_serv_to_charge(next_rq, bfqq));
-+ else
-+ bfqq->entity.budget = bfqq->max_budget;
-+
-+ bfq_log_bfqq(bfqd, bfqq, "head sect: %u, new budget %d",
-+ next_rq ? blk_rq_sectors(next_rq) : 0,
-+ bfqq->entity.budget);
-+}
-+
-+static unsigned long bfq_calc_max_budget(u64 peak_rate, u64 timeout)
-+{
-+ unsigned long max_budget;
-+
-+ /*
-+ * The max_budget calculated when autotuning is equal to the
-+ * number of sectors transferred in timeout_sync at the
-+ * estimated peak rate.
-+ */
-+ max_budget = (unsigned long)(peak_rate * 1000 *
-+ timeout >> BFQ_RATE_SHIFT);
-+
-+ return max_budget;
-+}
-+
-+/*
-+ * In addition to updating the peak rate, checks whether the process
-+ * is "slow", and returns 1 if so. This slow flag is used, in addition
-+ * to the budget timeout, to reduce the amount of service provided to
-+ * seeky processes, and hence reduce their chances to lower the
-+ * throughput. See the code for more details.
-+ */
-+static bool bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+ bool compensate, enum bfqq_expiration reason)
-+{
-+ u64 bw, usecs, expected, timeout;
-+ ktime_t delta;
-+ int update = 0;
-+
-+ if (!bfq_bfqq_sync(bfqq) || bfq_bfqq_budget_new(bfqq))
-+ return false;
-+
-+ if (compensate)
-+ delta = bfqd->last_idling_start;
-+ else
-+ delta = ktime_get();
-+ delta = ktime_sub(delta, bfqd->last_budget_start);
-+ usecs = ktime_to_us(delta);
-+
-+ /* Don't trust short/unrealistic values. */
-+ if (usecs < 100 || usecs >= LONG_MAX)
-+ return false;
-+
-+ /*
-+ * Calculate the bandwidth for the last slice. We use a 64 bit
-+ * value to store the peak rate, in sectors per usec in fixed
-+ * point math. We do so to have enough precision in the estimate
-+ * and to avoid overflows.
-+ */
-+ bw = (u64)bfqq->entity.service << BFQ_RATE_SHIFT;
-+ do_div(bw, (unsigned long)usecs);
-+
-+ timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
-+
-+ /*
-+ * Use only long (> 20ms) intervals to filter out spikes for
-+ * the peak rate estimation.
-+ */
-+ if (usecs > 20000) {
-+ if (bw > bfqd->peak_rate ||
-+ (!BFQQ_SEEKY(bfqq) &&
-+ reason == BFQ_BFQQ_BUDGET_TIMEOUT)) {
-+ bfq_log(bfqd, "measured bw =%llu", bw);
-+ /*
-+ * To smooth oscillations use a low-pass filter with
-+ * alpha=7/8, i.e.,
-+ * new_rate = (7/8) * old_rate + (1/8) * bw
-+ */
-+ do_div(bw, 8);
-+ if (bw == 0)
-+ return 0;
-+ bfqd->peak_rate *= 7;
-+ do_div(bfqd->peak_rate, 8);
-+ bfqd->peak_rate += bw;
-+ update = 1;
-+ bfq_log(bfqd, "new peak_rate=%llu", bfqd->peak_rate);
-+ }
-+
-+ update |= bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES - 1;
-+
-+ if (bfqd->peak_rate_samples < BFQ_PEAK_RATE_SAMPLES)
-+ bfqd->peak_rate_samples++;
-+
-+ if (bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES &&
-+ update) {
-+ int dev_type = blk_queue_nonrot(bfqd->queue);
-+ if (bfqd->bfq_user_max_budget == 0) {
-+ bfqd->bfq_max_budget =
-+ bfq_calc_max_budget(bfqd->peak_rate,
-+ timeout);
-+ bfq_log(bfqd, "new max_budget=%d",
-+ bfqd->bfq_max_budget);
-+ }
-+ if (bfqd->device_speed == BFQ_BFQD_FAST &&
-+ bfqd->peak_rate < device_speed_thresh[dev_type]) {
-+ bfqd->device_speed = BFQ_BFQD_SLOW;
-+ bfqd->RT_prod = R_slow[dev_type] *
-+ T_slow[dev_type];
-+ } else if (bfqd->device_speed == BFQ_BFQD_SLOW &&
-+ bfqd->peak_rate > device_speed_thresh[dev_type]) {
-+ bfqd->device_speed = BFQ_BFQD_FAST;
-+ bfqd->RT_prod = R_fast[dev_type] *
-+ T_fast[dev_type];
-+ }
-+ }
-+ }
-+
-+ /*
-+ * If the process has been served for too short a time
-+ * interval to let its possible sequential accesses prevail over
-+ * the initial seek time needed to move the disk head to the
-+ * first sector it requested, then give the process a chance
-+ * and for the moment return false.
-+ */
-+ if (bfqq->entity.budget <= bfq_max_budget(bfqd) / 8)
-+ return false;
-+
-+ /*
-+ * A process is considered ``slow'' (i.e., seeky, so that we
-+ * cannot treat it fairly in the service domain, as it would
-+ * slow down the other processes too much) if, when a slice
-+ * ends for whatever reason, it has received service at a
-+ * rate that would not be high enough to complete the budget
-+ * before the budget timeout expiration.
-+ */
-+ expected = bw * 1000 * timeout >> BFQ_RATE_SHIFT;
-+
-+ /*
-+ * Caveat: processes doing IO in the slower disk zones will
-+ * tend to be slow(er) even if not seeky. And the estimated
-+ * peak rate will actually be an average over the disk
-+ * surface. Hence, to not be too harsh with unlucky processes,
-+ * we keep a budget/3 margin of safety before declaring a
-+ * process slow.
-+ */
-+ return expected > (4 * bfqq->entity.budget) / 3;
-+}
-+
-+/*
-+ * To be deemed as soft real-time, an application must meet two
-+ * requirements. First, the application must not require an average
-+ * bandwidth higher than the approximate bandwidth required to play back or
-+ * record a compressed high-definition video.
-+ * The next function is invoked on the completion of the last request of a
-+ * batch, to compute the next-start time instant, soft_rt_next_start, such
-+ * that, if the next request of the application does not arrive before
-+ * soft_rt_next_start, then the above requirement on the bandwidth is met.
-+ *
-+ * The second requirement is that the request pattern of the application is
-+ * isochronous, i.e., that, after issuing a request or a batch of requests,
-+ * the application stops issuing new requests until all its pending requests
-+ * have been completed. After that, the application may issue a new batch,
-+ * and so on.
-+ * For this reason the next function is invoked to compute
-+ * soft_rt_next_start only for applications that meet this requirement,
-+ * whereas soft_rt_next_start is set to infinity for applications that do
-+ * not.
-+ *
-+ * Unfortunately, even a greedy application may happen to behave in an
-+ * isochronous way if the CPU load is high. In fact, the application may
-+ * stop issuing requests while the CPUs are busy serving other processes,
-+ * then restart, then stop again for a while, and so on. In addition, if
-+ * the disk achieves a low enough throughput with the request pattern
-+ * issued by the application (e.g., because the request pattern is random
-+ * and/or the device is slow), then the application may meet the above
-+ * bandwidth requirement too. To prevent such a greedy application from
-+ * being deemed as soft real-time, a further rule is used in the computation of
-+ * soft_rt_next_start: soft_rt_next_start must be higher than the current
-+ * time plus the maximum time for which the arrival of a request is waited
-+ * for when a sync queue becomes idle, namely bfqd->bfq_slice_idle.
-+ * This filters out greedy applications, as the latter issue instead their
-+ * next request as soon as possible after the last one has been completed
-+ * (in contrast, when a batch of requests is completed, a soft real-time
-+ * application spends some time processing data).
-+ *
-+ * Unfortunately, the last filter may easily generate false positives if
-+ * only bfqd->bfq_slice_idle is used as a reference time interval and one
-+ * or both of the following cases occur:
-+ * 1) HZ is so low that the duration of a jiffy is comparable to or higher
-+ * than bfqd->bfq_slice_idle. This happens, e.g., on slow devices with
-+ * HZ=100.
-+ * 2) jiffies, instead of increasing at a constant rate, may stop increasing
-+ * for a while, then suddenly 'jump' by several units to recover the lost
-+ * increments. This seems to happen, e.g., inside virtual machines.
-+ * To address this issue, we do not use as a reference time interval just
-+ * bfqd->bfq_slice_idle, but bfqd->bfq_slice_idle plus a few jiffies. In
-+ * particular we add the minimum number of jiffies for which the filter
-+ * seems to be quite precise also in embedded systems and KVM/QEMU virtual
-+ * machines.
-+ */
-+static unsigned long bfq_bfqq_softrt_next_start(struct bfq_data *bfqd,
-+ struct bfq_queue *bfqq)
-+{
-+ return max(bfqq->last_idle_bklogged +
-+ HZ * bfqq->service_from_backlogged /
-+ bfqd->bfq_wr_max_softrt_rate,
-+ jiffies + bfqq->bfqd->bfq_slice_idle + 4);
-+}
-+
-+/*
-+ * Return the largest-possible time instant such that, for as long as possible,
-+ * the current time will be lower than this time instant according to the macro
-+ * time_is_before_jiffies().
-+ */
-+static unsigned long bfq_infinity_from_now(unsigned long now)
-+{
-+ return now + ULONG_MAX / 2;
-+}
-+
-+/**
-+ * bfq_bfqq_expire - expire a queue.
-+ * @bfqd: device owning the queue.
-+ * @bfqq: the queue to expire.
-+ * @compensate: if true, compensate for the time spent idling.
-+ * @reason: the reason causing the expiration.
-+ *
-+ *
-+ * If the process associated to the queue is slow (i.e., seeky), or in
-+ * case of budget timeout, or, finally, if it is async, we
-+ * artificially charge it an entire budget (independently of the
-+ * actual service it received). As a consequence, the queue will get
-+ * higher timestamps than the correct ones upon reactivation, and
-+ * hence it will be rescheduled as if it had received more service
-+ * than what it actually received. In the end, this class of processes
-+ * will receive less service in proportion to how slowly they consume
-+ * their budgets (and hence how seriously they tend to lower the
-+ * throughput).
-+ *
-+ * In contrast, when a queue expires because it has been idling for
-+ * too long or because it exhausted its budget, we do not touch the
-+ * amount of service it has received. Hence when the queue is
-+ * reactivated and its timestamps updated, the latter will be in sync
-+ * with the actual service received by the queue until expiration.
-+ *
-+ * Charging a full budget to the first type of queues and the exact
-+ * service to the others has the effect of using the WF2Q+ policy to
-+ * schedule the former on a timeslice basis, without violating the
-+ * service domain guarantees of the latter.
-+ */
-+static void bfq_bfqq_expire(struct bfq_data *bfqd,
-+ struct bfq_queue *bfqq,
-+ bool compensate,
-+ enum bfqq_expiration reason)
-+{
-+ bool slow;
-+ BUG_ON(bfqq != bfqd->in_service_queue);
-+
-+ /*
-+ * Update disk peak rate for autotuning and check whether the
-+ * process is slow (see bfq_update_peak_rate).
-+ */
-+ slow = bfq_update_peak_rate(bfqd, bfqq, compensate, reason);
-+
-+ /*
-+ * As explained above, 'punish' slow (i.e., seeky), timed-out
-+ * and async queues, to favor sequential sync workloads.
-+ *
-+ * Processes doing I/O in the slower disk zones will tend to be
-+ * slow(er) even if not seeky. Hence, since the estimated peak
-+ * rate is actually an average over the disk surface, these
-+ * processes may time out just for bad luck. To avoid punishing
-+ * them we do not charge a full budget to a process that
-+ * succeeded in consuming at least 2/3 of its budget.
-+ */
-+ if (slow || (reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
-+ bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3))
-+ bfq_bfqq_charge_full_budget(bfqq);
-+
-+ bfqq->service_from_backlogged += bfqq->entity.service;
-+
-+ if (BFQQ_SEEKY(bfqq) && reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
-+ !bfq_bfqq_constantly_seeky(bfqq)) {
-+ bfq_mark_bfqq_constantly_seeky(bfqq);
-+ if (!blk_queue_nonrot(bfqd->queue))
-+ bfqd->const_seeky_busy_in_flight_queues++;
-+ }
-+
-+ if (reason == BFQ_BFQQ_TOO_IDLE &&
-+ bfqq->entity.service <= 2 * bfqq->entity.budget / 10 )
-+ bfq_clear_bfqq_IO_bound(bfqq);
-+
-+ if (bfqd->low_latency && bfqq->wr_coeff == 1)
-+ bfqq->last_wr_start_finish = jiffies;
-+
-+ if (bfqd->low_latency && bfqd->bfq_wr_max_softrt_rate > 0 &&
-+ RB_EMPTY_ROOT(&bfqq->sort_list)) {
-+ /*
-+ * If we get here, and there are no outstanding requests,
-+ * then the request pattern is isochronous (see the comments
-+ * to the function bfq_bfqq_softrt_next_start()). Hence we
-+ * can compute soft_rt_next_start. If, instead, the queue
-+ * still has outstanding requests, then we have to wait
-+ * for the completion of all the outstanding requests to
-+ * discover whether the request pattern is actually
-+ * isochronous.
-+ */
-+ if (bfqq->dispatched == 0)
-+ bfqq->soft_rt_next_start =
-+ bfq_bfqq_softrt_next_start(bfqd, bfqq);
-+ else {
-+ /*
-+ * The application is still waiting for the
-+ * completion of one or more requests:
-+ * prevent it from possibly being incorrectly
-+ * deemed as soft real-time by setting its
-+ * soft_rt_next_start to infinity. In fact,
-+ * without this assignment, the application
-+ * would be incorrectly deemed as soft
-+ * real-time if:
-+ * 1) it issued a new request before the
-+ * completion of all its in-flight
-+ * requests, and
-+ * 2) at that time, its soft_rt_next_start
-+ * happened to be in the past.
-+ */
-+ bfqq->soft_rt_next_start =
-+ bfq_infinity_from_now(jiffies);
-+ /*
-+ * Schedule an update of soft_rt_next_start to when
-+ * the task may be discovered to be isochronous.
-+ */
-+ bfq_mark_bfqq_softrt_update(bfqq);
-+ }
-+ }
-+
-+ bfq_log_bfqq(bfqd, bfqq,
-+ "expire (%d, slow %d, num_disp %d, idle_win %d)", reason,
-+ slow, bfqq->dispatched, bfq_bfqq_idle_window(bfqq));
-+
-+ /*
-+ * Increase, decrease or leave budget unchanged according to
-+ * reason.
-+ */
-+ __bfq_bfqq_recalc_budget(bfqd, bfqq, reason);
-+ __bfq_bfqq_expire(bfqd, bfqq);
-+}
-+
-+/*
-+ * Budget timeout is not implemented through a dedicated timer, but
-+ * just checked on request arrivals and completions, as well as on
-+ * idle timer expirations.
-+ */
-+static bool bfq_bfqq_budget_timeout(struct bfq_queue *bfqq)
-+{
-+ if (bfq_bfqq_budget_new(bfqq) ||
-+ time_before(jiffies, bfqq->budget_timeout))
-+ return false;
-+ return true;
-+}
-+
-+/*
-+ * If we expire a queue that is waiting for the arrival of a new
-+ * request, we may prevent the fictitious timestamp back-shifting that
-+ * allows the guarantees of the queue to be preserved (see [1] for
-+ * this tricky aspect). Hence we return true only if this condition
-+ * does not hold, or if the queue is slow enough to deserve only to be
-+ * kicked off for preserving a high throughput.
-+ */
-+static bool bfq_may_expire_for_budg_timeout(struct bfq_queue *bfqq)
-+{
-+ bfq_log_bfqq(bfqq->bfqd, bfqq,
-+ "may_budget_timeout: wait_request %d left %d timeout %d",
-+ bfq_bfqq_wait_request(bfqq),
-+ bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3,
-+ bfq_bfqq_budget_timeout(bfqq));
-+
-+ return (!bfq_bfqq_wait_request(bfqq) ||
-+ bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3)
-+ &&
-+ bfq_bfqq_budget_timeout(bfqq);
-+}
-+
-+/*
-+ * For a queue that becomes empty, device idling is allowed only if
-+ * this function returns true for that queue. As a consequence, since
-+ * device idling plays a critical role for both throughput boosting
-+ * and service guarantees, the return value of this function plays a
-+ * critical role as well.
-+ *
-+ * In a nutshell, this function returns true only if idling is
-+ * beneficial for throughput or, even if detrimental for throughput,
-+ * idling is however necessary to preserve service guarantees (low
-+ * latency, desired throughput distribution, ...). In particular, on
-+ * NCQ-capable devices, this function tries to return false, so as to
-+ * help keep the drives' internal queues full, whenever this helps the
-+ * device boost the throughput without causing any service-guarantee
-+ * issue.
-+ *
-+ * In more detail, the return value of this function is obtained by,
-+ * first, computing a number of boolean variables that take into
-+ * account throughput and service-guarantee issues, and, then,
-+ * combining these variables in a logical expression. Most of the
-+ * issues taken into account are not trivial. We discuss these issues
-+ * while introducing the variables.
-+ */
-+static bool bfq_bfqq_may_idle(struct bfq_queue *bfqq)
-+{
-+ struct bfq_data *bfqd = bfqq->bfqd;
-+ bool idling_boosts_thr, idling_boosts_thr_without_issues,
-+ all_queues_seeky, on_hdd_and_not_all_queues_seeky,
-+ idling_needed_for_service_guarantees,
-+ asymmetric_scenario;
-+
-+ /*
-+ * The next variable takes into account the cases where idling
-+ * boosts the throughput.
-+ *
-+ * The value of the variable is computed considering, first, that
-+ * idling is virtually always beneficial for the throughput if:
-+ * (a) the device is not NCQ-capable, or
-+ * (b) regardless of the presence of NCQ, the device is rotational
-+ * and the request pattern for bfqq is I/O-bound and sequential.
-+ *
-+ * Secondly, and in contrast to the above item (b), idling an
-+ * NCQ-capable flash-based device would not boost the
-+ * throughput even with sequential I/O; rather it would lower
-+ * the throughput in proportion to how fast the device
-+ * is. Accordingly, the next variable is true if any of the
-+ * above conditions (a) and (b) is true, and, in particular,
-+ * happens to be false if bfqd is an NCQ-capable flash-based
-+ * device.
-+ */
-+ idling_boosts_thr = !bfqd->hw_tag ||
-+ (!blk_queue_nonrot(bfqd->queue) && bfq_bfqq_IO_bound(bfqq) &&
-+ bfq_bfqq_idle_window(bfqq)) ;
-+
-+ /*
-+ * The value of the next variable,
-+ * idling_boosts_thr_without_issues, is equal to that of
-+ * idling_boosts_thr, unless a special case holds. In this
-+ * special case, described below, idling may cause problems to
-+ * weight-raised queues.
-+ *
-+ * When the request pool is saturated (e.g., in the presence
-+ * of write hogs), if the processes associated with
-+ * non-weight-raised queues ask for requests at a lower rate,
-+ * then processes associated with weight-raised queues have a
-+ * higher probability to get a request from the pool
-+ * immediately (or at least soon) when they need one. Thus
-+ * they have a higher probability to actually get a fraction
-+ * of the device throughput proportional to their high
-+ * weight. This is especially true with NCQ-capable drives,
-+ * which enqueue several requests in advance, and further
-+ * reorder internally-queued requests.
-+ *
-+ * For this reason, we force to false the value of
-+ * idling_boosts_thr_without_issues if there are weight-raised
-+ * busy queues. In this case, and if bfqq is not weight-raised,
-+ * this guarantees that the device is not idled for bfqq (if,
-+ * instead, bfqq is weight-raised, then idling will be
-+ * guaranteed by another variable, see below). Combined with
-+ * the timestamping rules of BFQ (see [1] for details), this
-+ * behavior causes bfqq, and hence any sync non-weight-raised
-+ * queue, to get a lower number of requests served, and thus
-+ * to ask for a lower number of requests from the request
-+ * pool, before the busy weight-raised queues get served
-+ * again. This often mitigates starvation problems in the
-+ * presence of heavy write workloads and NCQ, thereby
-+ * guaranteeing a higher application and system responsiveness
-+ * in these hostile scenarios.
-+ */
-+ idling_boosts_thr_without_issues = idling_boosts_thr &&
-+ bfqd->wr_busy_queues == 0;
-+
-+ /*
-+ * There are then two cases where idling must be performed not
-+ * for throughput concerns, but to preserve service
-+ * guarantees. In the description of these cases, we say, for
-+ * short, that a queue is sequential/random if the process
-+ * associated to the queue issues sequential/random requests
-+ * (in the second case the queue may be tagged as seeky or
-+ * even constantly_seeky).
-+ *
-+ * To introduce the first case, we note that, since
-+ * bfq_bfqq_idle_window(bfqq) is false if the device is
-+ * NCQ-capable and bfqq is random (see
-+ * bfq_update_idle_window()), then, from the above two
-+ * assignments it follows that
-+ * idling_boosts_thr_without_issues is false if the device is
-+ * NCQ-capable and bfqq is random. Therefore, for this case,
-+ * device idling would never be allowed if we used just
-+ * idling_boosts_thr_without_issues to decide whether to allow
-+ * it. And, beneficially, this would imply that throughput
-+ * would always be boosted also with random I/O on NCQ-capable
-+ * HDDs.
-+ *
-+ * But we must be careful on this point, to avoid an unfair
-+ * treatment for bfqq. In fact, because of the same above
-+ * assignments, idling_boosts_thr_without_issues is, on the
-+ * other hand, true if 1) the device is an HDD and bfqq is
-+ * sequential, and 2) there are no busy weight-raised
-+ * queues. As a consequence, if we used just
-+ * idling_boosts_thr_without_issues to decide whether to idle
-+ * the device, then with an HDD we might easily bump into a
-+ * scenario where queues that are sequential and I/O-bound
-+ * would enjoy idling, whereas random queues would not. The
-+ * latter might then get a low share of the device throughput,
-+ * simply because the former would get many requests served
-+ * after being set as in service, while the latter would not.
-+ *
-+ * To address this issue, we start by setting to true a
-+ * sentinel variable, on_hdd_and_not_all_queues_seeky, if the
-+ * device is rotational and not all queues with pending or
-+ * in-flight requests are constantly seeky (i.e., there are
-+ * active sequential queues, and bfqq might then be mistreated
-+ * if it does not enjoy idling because it is random).
-+ */
-+ all_queues_seeky = bfq_bfqq_constantly_seeky(bfqq) &&
-+ bfqd->busy_in_flight_queues ==
-+ bfqd->const_seeky_busy_in_flight_queues;
-+
-+ on_hdd_and_not_all_queues_seeky =
-+ !blk_queue_nonrot(bfqd->queue) && !all_queues_seeky;
-+
-+ /*
-+ * To introduce the second case where idling needs to be
-+ * performed to preserve service guarantees, we can note that
-+ * allowing the drive to enqueue more than one request at a
-+ * time, and hence delegating de facto final scheduling
-+ * decisions to the drive's internal scheduler, causes loss of
-+ * control on the actual request service order. In particular,
-+ * the critical situation is when requests from different
-+ * processes happen to be present, at the same time, in the
-+ * internal queue(s) of the drive. In such a situation, the
-+ * drive, by deciding the service order of the
-+ * internally-queued requests, does determine also the actual
-+ * throughput distribution among these processes. But the
-+ * drive typically has no notion or concern about per-process
-+ * throughput distribution, and makes its decisions only on a
-+ * per-request basis. Therefore, the service distribution
-+ * enforced by the drive's internal scheduler is likely to
-+ * coincide with the desired device-throughput distribution
-+ * only in a completely symmetric scenario where:
-+ * (i) each of these processes must get the same throughput as
-+ * the others;
-+ * (ii) all these processes have the same I/O pattern
-+ * (either sequential or random).
-+ * In fact, in such a scenario, the drive will tend to treat
-+ * the requests of each of these processes in about the same
-+ * way as the requests of the others, and thus to provide
-+ * each of these processes with about the same throughput
-+ * (which is exactly the desired throughput distribution). In
-+ * contrast, in any asymmetric scenario, device idling is
-+ * certainly needed to guarantee that bfqq receives its
-+ * assigned fraction of the device throughput (see [1] for
-+ * details).
-+ *
-+ * We address this issue by controlling, actually, only the
-+ * symmetry sub-condition (i), i.e., provided that
-+ * sub-condition (i) holds, idling is not performed,
-+ * regardless of whether sub-condition (ii) holds. In other
-+ * words, only if sub-condition (i) does not hold, then idling
-+ * is allowed, and the device tends to be prevented from queueing
-+ * many requests, possibly of several processes. The reason
-+ * for not controlling also sub-condition (ii) is that, first,
-+ * in the case of an HDD, the asymmetry in terms of types of
-+ * I/O patterns is already taken into account in the above
-+ * sentinel variable
-+ * on_hdd_and_not_all_queues_seeky. Secondly, in the case of a
-+ * flash-based device, we prefer however to privilege
-+ * throughput (and idling lowers throughput for this type of
-+ * devices), for the following reasons:
-+ * 1) differently from HDDs, the service time of random
-+ * requests is not orders of magnitude higher than the service
-+ * time of sequential requests; thus, even if processes doing
-+ * sequential I/O get a preferential treatment with respect to
-+ * others doing random I/O, the consequences are not as
-+ * dramatic as with HDDs;
-+ * 2) if a process doing random I/O does need strong
-+ * throughput guarantees, it is hopefully already being
-+ * weight-raised, or the user is likely to have assigned it a
-+ * higher weight than the other processes (and thus
-+ * sub-condition (i) is likely to be false, which triggers
-+ * idling).
-+ *
-+ * According to the above considerations, the next variable is
-+ * true (only) if sub-condition (i) does not hold. To compute the
-+ * value of this variable, we not only use the return value of
-+ * the function bfq_symmetric_scenario(), but also check
-+ * whether bfqq is being weight-raised, because
-+ * bfq_symmetric_scenario() does not take weight-raised
-+ * queues into account (see comments to
-+ * bfq_weights_tree_add()).
-+ *
-+ * As a side note, it is worth considering that the above
-+ * device-idling countermeasures may however fail in the
-+ * following unlucky scenario: if idling is (correctly)
-+ * disabled in a time period during which all symmetry
-+ * sub-conditions hold, and hence the device is allowed to
-+ * enqueue many requests, but at some later point in time some
-+ * sub-condition ceases to hold, then it may become impossible
-+ * to let requests be served in the desired order until all
-+ * the requests already queued in the device have been served.
-+ */
-+ asymmetric_scenario = bfqq->wr_coeff > 1 ||
-+ !bfq_symmetric_scenario(bfqd);
-+
-+ /*
-+ * Finally, there is a case where maximizing throughput is the
-+ * best choice even if it may cause unfairness toward
-+ * bfqq. Such a case is when bfqq became active in a burst of
-+ * queue activations. Queues that became active during a large
-+ * burst benefit only from throughput, as discussed in the
-+ * comments to bfq_handle_burst. Thus, if bfqq became active
-+ * in a burst and not idling the device maximizes throughput,
-+ * then the device must not be idled, because not idling the
-+ * device provides bfqq and all other queues in the burst with
-+ * maximum benefit. Combining this and the two cases above, we
-+ * can now establish when idling is actually needed to
-+ * preserve service guarantees.
-+ */
-+ idling_needed_for_service_guarantees =
-+ (on_hdd_and_not_all_queues_seeky || asymmetric_scenario) &&
-+ !bfq_bfqq_in_large_burst(bfqq);
-+
-+ /*
-+ * We have now all the components we need to compute the return
-+ * value of the function, which is true only if both the following
-+ * conditions hold:
-+ * 1) bfqq is sync, because idling makes sense only for sync queues;
-+ * 2) idling either boosts the throughput (without issues), or
-+ * is necessary to preserve service guarantees.
-+ */
-+ return bfq_bfqq_sync(bfqq) &&
-+ (idling_boosts_thr_without_issues ||
-+ idling_needed_for_service_guarantees);
-+}
-+
-+/*
-+ * If the in-service queue is empty but the function bfq_bfqq_may_idle
-+ * returns true, then:
-+ * 1) the queue must remain in service and cannot be expired, and
-+ * 2) the device must be idled to wait for the possible arrival of a new
-+ * request for the queue.
-+ * See the comments to the function bfq_bfqq_may_idle for the reasons
-+ * why performing device idling is the best choice to boost the throughput
-+ * and preserve service guarantees when bfq_bfqq_may_idle itself
-+ * returns true.
-+ */
-+static bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
-+{
-+ struct bfq_data *bfqd = bfqq->bfqd;
-+
-+ return RB_EMPTY_ROOT(&bfqq->sort_list) && bfqd->bfq_slice_idle != 0 &&
-+ bfq_bfqq_may_idle(bfqq);
-+}
-+
-+/*
-+ * Select a queue for service. If we have a current queue in service,
-+ * check whether to continue servicing it, or retrieve and set a new one.
-+ */
-+static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
-+{
-+ struct bfq_queue *bfqq;
-+ struct request *next_rq;
-+ enum bfqq_expiration reason = BFQ_BFQQ_BUDGET_TIMEOUT;
-+
-+ bfqq = bfqd->in_service_queue;
-+ if (!bfqq)
-+ goto new_queue;
-+
-+ bfq_log_bfqq(bfqd, bfqq, "select_queue: already in-service queue");
-+
-+ if (bfq_may_expire_for_budg_timeout(bfqq) &&
-+ !timer_pending(&bfqd->idle_slice_timer) &&
-+ !bfq_bfqq_must_idle(bfqq))
-+ goto expire;
-+
-+ next_rq = bfqq->next_rq;
-+ /*
-+ * If bfqq has requests queued and it has enough budget left to
-+ * serve them, keep the queue, otherwise expire it.
-+ */
-+ if (next_rq) {
-+ if (bfq_serv_to_charge(next_rq, bfqq) >
-+ bfq_bfqq_budget_left(bfqq)) {
-+ reason = BFQ_BFQQ_BUDGET_EXHAUSTED;
-+ goto expire;
-+ } else {
-+ /*
-+ * The idle timer may be pending because we may
-+ * not disable disk idling even when a new request
-+ * arrives.
-+ */
-+ if (timer_pending(&bfqd->idle_slice_timer)) {
-+ /*
-+ * If we get here: 1) at least one new request
-+ * has arrived but we have not disabled the
-+ * timer because the request was too small,
-+ * 2) then the block layer has unplugged
-+ * the device, causing the dispatch to be
-+ * invoked.
-+ *
-+ * Since the device is unplugged, now the
-+ * requests are probably large enough to
-+ * provide a reasonable throughput.
-+ * So we disable idling.
-+ */
-+ bfq_clear_bfqq_wait_request(bfqq);
-+ del_timer(&bfqd->idle_slice_timer);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqg_stats_update_idle_time(bfqq_group(bfqq));
-+#endif
-+ }
-+ goto keep_queue;
-+ }
-+ }
-+
-+ /*
-+ * No requests pending. However, if the in-service queue is idling
-+ * for a new request, or has requests waiting for a completion and
-+ * may idle after their completion, then keep it anyway.
-+ */
-+ if (timer_pending(&bfqd->idle_slice_timer) ||
-+ (bfqq->dispatched != 0 && bfq_bfqq_may_idle(bfqq))) {
-+ bfqq = NULL;
-+ goto keep_queue;
-+ }
-+
-+ reason = BFQ_BFQQ_NO_MORE_REQUESTS;
-+expire:
-+ bfq_bfqq_expire(bfqd, bfqq, false, reason);
-+new_queue:
-+ bfqq = bfq_set_in_service_queue(bfqd);
-+ bfq_log(bfqd, "select_queue: new queue %d returned",
-+ bfqq ? bfqq->pid : 0);
-+keep_queue:
-+ return bfqq;
-+}
-+
-+static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+ struct bfq_entity *entity = &bfqq->entity;
-+ if (bfqq->wr_coeff > 1) { /* queue is being weight-raised */
-+ bfq_log_bfqq(bfqd, bfqq,
-+ "raising period dur %u/%u msec, old coeff %u, w %d(%d)",
-+ jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
-+ jiffies_to_msecs(bfqq->wr_cur_max_time),
-+ bfqq->wr_coeff,
-+ bfqq->entity.weight, bfqq->entity.orig_weight);
-+
-+ BUG_ON(bfqq != bfqd->in_service_queue && entity->weight !=
-+ entity->orig_weight * bfqq->wr_coeff);
-+ if (entity->prio_changed)
-+ bfq_log_bfqq(bfqd, bfqq, "WARN: pending prio change");
-+
-+ /*
-+ * If the queue was activated in a burst, or
-+ * too much time has elapsed from the beginning
-+ * of this weight-raising period, then end weight
-+ * raising.
-+ */
-+ if (bfq_bfqq_in_large_burst(bfqq) ||
-+ time_is_before_jiffies(bfqq->last_wr_start_finish +
-+ bfqq->wr_cur_max_time)) {
-+ bfqq->last_wr_start_finish = jiffies;
-+ bfq_log_bfqq(bfqd, bfqq,
-+ "wrais ending at %lu, rais_max_time %u",
-+ bfqq->last_wr_start_finish,
-+ jiffies_to_msecs(bfqq->wr_cur_max_time));
-+ bfq_bfqq_end_wr(bfqq);
-+ }
-+ }
-+ /* Update weight both if it must be raised and if it must be lowered */
-+ if ((entity->weight > entity->orig_weight) != (bfqq->wr_coeff > 1))
-+ __bfq_entity_update_weight_prio(
-+ bfq_entity_service_tree(entity),
-+ entity);
-+}
-+
-+/*
-+ * Dispatch one request from bfqq, moving it to the request queue
-+ * dispatch list.
-+ */
-+static int bfq_dispatch_request(struct bfq_data *bfqd,
-+ struct bfq_queue *bfqq)
-+{
-+ int dispatched = 0;
-+ struct request *rq;
-+ unsigned long service_to_charge;
-+
-+ BUG_ON(RB_EMPTY_ROOT(&bfqq->sort_list));
-+
-+ /* Follow expired path, else get first next available. */
-+ rq = bfq_check_fifo(bfqq);
-+ if (!rq)
-+ rq = bfqq->next_rq;
-+ service_to_charge = bfq_serv_to_charge(rq, bfqq);
-+
-+ if (service_to_charge > bfq_bfqq_budget_left(bfqq)) {
-+ /*
-+ * This may happen if the next rq is chosen in fifo order
-+ * instead of sector order. The budget is properly
-+ * dimensioned to be always sufficient to serve the next
-+ * request only if it is chosen in sector order. The reason
-+ * is that it would be quite inefficient and of little use
-+ * to always make sure that the budget is large enough to
-+ * serve even the possible next rq in fifo order.
-+ * In fact, requests are seldom served in fifo order.
-+ *
-+ * Expire the queue for budget exhaustion, and make sure
-+ * that the next act_budget is enough to serve the next
-+ * request, even if it comes from the fifo expired path.
-+ */
-+ bfqq->next_rq = rq;
-+ /*
-+ * Since this dispatch has failed, make sure that
-+ * a new one will be performed
-+ */
-+ if (!bfqd->rq_in_driver)
-+ bfq_schedule_dispatch(bfqd);
-+ goto expire;
-+ }
-+
-+ /* Finally, insert request into driver dispatch list. */
-+ bfq_bfqq_served(bfqq, service_to_charge);
-+ bfq_dispatch_insert(bfqd->queue, rq);
-+
-+ bfq_update_wr_data(bfqd, bfqq);
-+
-+ bfq_log_bfqq(bfqd, bfqq,
-+ "dispatched %u sec req (%llu), budg left %d",
-+ blk_rq_sectors(rq),
-+ (long long unsigned)blk_rq_pos(rq),
-+ bfq_bfqq_budget_left(bfqq));
-+
-+ dispatched++;
-+
-+ if (!bfqd->in_service_bic) {
-+ atomic_long_inc(&RQ_BIC(rq)->icq.ioc->refcount);
-+ bfqd->in_service_bic = RQ_BIC(rq);
-+ }
-+
-+ if (bfqd->busy_queues > 1 && ((!bfq_bfqq_sync(bfqq) &&
-+ dispatched >= bfqd->bfq_max_budget_async_rq) ||
-+ bfq_class_idle(bfqq)))
-+ goto expire;
-+
-+ return dispatched;
-+
-+expire:
-+ bfq_bfqq_expire(bfqd, bfqq, false, BFQ_BFQQ_BUDGET_EXHAUSTED);
-+ return dispatched;
-+}
-+
-+static int __bfq_forced_dispatch_bfqq(struct bfq_queue *bfqq)
-+{
-+ int dispatched = 0;
-+
-+ while (bfqq->next_rq) {
-+ bfq_dispatch_insert(bfqq->bfqd->queue, bfqq->next_rq);
-+ dispatched++;
-+ }
-+
-+ BUG_ON(!list_empty(&bfqq->fifo));
-+ return dispatched;
-+}
-+
-+/*
-+ * Drain our current requests.
-+ * Used for barriers and when switching io schedulers on-the-fly.
-+ */
-+static int bfq_forced_dispatch(struct bfq_data *bfqd)
-+{
-+ struct bfq_queue *bfqq, *n;
-+ struct bfq_service_tree *st;
-+ int dispatched = 0;
-+
-+ bfqq = bfqd->in_service_queue;
-+ if (bfqq)
-+ __bfq_bfqq_expire(bfqd, bfqq);
-+
-+ /*
-+ * Loop through classes, and be careful to leave the scheduler
-+ * in a consistent state, as feedback mechanisms and vtime
-+ * updates cannot be disabled during the process.
-+ */
-+ list_for_each_entry_safe(bfqq, n, &bfqd->active_list, bfqq_list) {
-+ st = bfq_entity_service_tree(&bfqq->entity);
-+
-+ dispatched += __bfq_forced_dispatch_bfqq(bfqq);
-+ bfqq->max_budget = bfq_max_budget(bfqd);
-+
-+ bfq_forget_idle(st);
-+ }
-+
-+ BUG_ON(bfqd->busy_queues != 0);
-+
-+ return dispatched;
-+}
-+
-+static int bfq_dispatch_requests(struct request_queue *q, int force)
-+{
-+ struct bfq_data *bfqd = q->elevator->elevator_data;
-+ struct bfq_queue *bfqq;
-+ int max_dispatch;
-+
-+ bfq_log(bfqd, "dispatch requests: %d busy queues", bfqd->busy_queues);
-+ if (bfqd->busy_queues == 0)
-+ return 0;
-+
-+ if (unlikely(force))
-+ return bfq_forced_dispatch(bfqd);
-+
-+ bfqq = bfq_select_queue(bfqd);
-+ if (!bfqq)
-+ return 0;
-+
-+ if (bfq_class_idle(bfqq))
-+ max_dispatch = 1;
-+
-+ if (!bfq_bfqq_sync(bfqq))
-+ max_dispatch = bfqd->bfq_max_budget_async_rq;
-+
-+ if (!bfq_bfqq_sync(bfqq) && bfqq->dispatched >= max_dispatch) {
-+ if (bfqd->busy_queues > 1)
-+ return 0;
-+ if (bfqq->dispatched >= 4 * max_dispatch)
-+ return 0;
-+ }
-+
-+ if (bfqd->sync_flight != 0 && !bfq_bfqq_sync(bfqq))
-+ return 0;
-+
-+ bfq_clear_bfqq_wait_request(bfqq);
-+ BUG_ON(timer_pending(&bfqd->idle_slice_timer));
-+
-+ if (!bfq_dispatch_request(bfqd, bfqq))
-+ return 0;
-+
-+ bfq_log_bfqq(bfqd, bfqq, "dispatched %s request",
-+ bfq_bfqq_sync(bfqq) ? "sync" : "async");
-+
-+ return 1;
-+}
-+
-+/*
-+ * Task holds one reference to the queue, dropped when task exits. Each rq
-+ * in-flight on this queue also holds a reference, dropped when rq is freed.
-+ *
-+ * Queue lock must be held here.
-+ */
-+static void bfq_put_queue(struct bfq_queue *bfqq)
-+{
-+ struct bfq_data *bfqd = bfqq->bfqd;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ struct bfq_group *bfqg = bfqq_group(bfqq);
-+#endif
-+
-+ BUG_ON(atomic_read(&bfqq->ref) <= 0);
-+
-+ bfq_log_bfqq(bfqd, bfqq, "put_queue: %p %d", bfqq,
-+ atomic_read(&bfqq->ref));
-+ if (!atomic_dec_and_test(&bfqq->ref))
-+ return;
-+
-+ BUG_ON(rb_first(&bfqq->sort_list));
-+ BUG_ON(bfqq->allocated[READ] + bfqq->allocated[WRITE] != 0);
-+ BUG_ON(bfqq->entity.tree);
-+ BUG_ON(bfq_bfqq_busy(bfqq));
-+ BUG_ON(bfqd->in_service_queue == bfqq);
-+
-+ if (bfq_bfqq_sync(bfqq))
-+ /*
-+ * The fact that this queue is being destroyed does not
-+ * invalidate the fact that this queue may have been
-+ * activated during the current burst. As a consequence,
-+ * although the queue does not exist anymore, and hence
-+ * needs to be removed from the burst list if there,
-+ * the burst size must not be decremented.
-+ */
-+ hlist_del_init(&bfqq->burst_list_node);
-+
-+ bfq_log_bfqq(bfqd, bfqq, "put_queue: %p freed", bfqq);
-+
-+ kmem_cache_free(bfq_pool, bfqq);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqg_put(bfqg);
-+#endif
-+}
-+
-+static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+ if (bfqq == bfqd->in_service_queue) {
-+ __bfq_bfqq_expire(bfqd, bfqq);
-+ bfq_schedule_dispatch(bfqd);
-+ }
-+
-+ bfq_log_bfqq(bfqd, bfqq, "exit_bfqq: %p, %d", bfqq,
-+ atomic_read(&bfqq->ref));
-+
-+ bfq_put_queue(bfqq);
-+}
-+
-+static void bfq_init_icq(struct io_cq *icq)
-+{
-+ struct bfq_io_cq *bic = icq_to_bic(icq);
-+
-+ bic->ttime.last_end_request = jiffies;
-+}
-+
-+static void bfq_exit_icq(struct io_cq *icq)
-+{
-+ struct bfq_io_cq *bic = icq_to_bic(icq);
-+ struct bfq_data *bfqd = bic_to_bfqd(bic);
-+
-+ if (bic->bfqq[BLK_RW_ASYNC]) {
-+ bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_ASYNC]);
-+ bic->bfqq[BLK_RW_ASYNC] = NULL;
-+ }
-+
-+ if (bic->bfqq[BLK_RW_SYNC]) {
-+ bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
-+ bic->bfqq[BLK_RW_SYNC] = NULL;
-+ }
-+}
-+
-+/*
-+ * Update the entity prio values; note that the new values will not
-+ * be used until the next (re)activation.
-+ */
-+static void bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
-+{
-+ struct task_struct *tsk = current;
-+ int ioprio_class;
-+
-+ ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
-+ switch (ioprio_class) {
-+ default:
-+ dev_err(bfqq->bfqd->queue->backing_dev_info.dev,
-+ "bfq: bad prio class %d\n", ioprio_class);
-+ case IOPRIO_CLASS_NONE:
-+ /*
-+ * No prio set, inherit CPU scheduling settings.
-+ */
-+ bfqq->new_ioprio = task_nice_ioprio(tsk);
-+ bfqq->new_ioprio_class = task_nice_ioclass(tsk);
-+ break;
-+ case IOPRIO_CLASS_RT:
-+ bfqq->new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
-+ bfqq->new_ioprio_class = IOPRIO_CLASS_RT;
-+ break;
-+ case IOPRIO_CLASS_BE:
-+ bfqq->new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
-+ bfqq->new_ioprio_class = IOPRIO_CLASS_BE;
-+ break;
-+ case IOPRIO_CLASS_IDLE:
-+ bfqq->new_ioprio_class = IOPRIO_CLASS_IDLE;
-+ bfqq->new_ioprio = 7;
-+ bfq_clear_bfqq_idle_window(bfqq);
-+ break;
-+ }
-+
-+ if (bfqq->new_ioprio < 0 || bfqq->new_ioprio >= IOPRIO_BE_NR) {
-+ printk(KERN_CRIT "bfq_set_next_ioprio_data: new_ioprio %d\n",
-+ bfqq->new_ioprio);
-+ BUG();
-+ }
-+
-+ bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->new_ioprio);
-+ bfqq->entity.prio_changed = 1;
-+}
-+
-+static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio)
-+{
-+ struct bfq_data *bfqd;
-+ struct bfq_queue *bfqq, *new_bfqq;
-+ unsigned long uninitialized_var(flags);
-+ int ioprio = bic->icq.ioc->ioprio;
-+
-+ bfqd = bfq_get_bfqd_locked(&(bic->icq.q->elevator->elevator_data),
-+ &flags);
-+ /*
-+ * This condition may trigger on a newly created bic, be sure to
-+ * drop the lock before returning.
-+ */
-+ if (unlikely(!bfqd) || likely(bic->ioprio == ioprio))
-+ goto out;
-+
-+ bic->ioprio = ioprio;
-+
-+ bfqq = bic->bfqq[BLK_RW_ASYNC];
-+ if (bfqq) {
-+ new_bfqq = bfq_get_queue(bfqd, bio, BLK_RW_ASYNC, bic,
-+ GFP_ATOMIC);
-+ if (new_bfqq) {
-+ bic->bfqq[BLK_RW_ASYNC] = new_bfqq;
-+ bfq_log_bfqq(bfqd, bfqq,
-+ "check_ioprio_change: bfqq %p %d",
-+ bfqq, atomic_read(&bfqq->ref));
-+ bfq_put_queue(bfqq);
-+ }
-+ }
-+
-+ bfqq = bic->bfqq[BLK_RW_SYNC];
-+ if (bfqq)
-+ bfq_set_next_ioprio_data(bfqq, bic);
-+
-+out:
-+ bfq_put_bfqd_unlock(bfqd, &flags);
-+}
-+
-+static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+ struct bfq_io_cq *bic, pid_t pid, int is_sync)
-+{
-+ RB_CLEAR_NODE(&bfqq->entity.rb_node);
-+ INIT_LIST_HEAD(&bfqq->fifo);
-+ INIT_HLIST_NODE(&bfqq->burst_list_node);
-+
-+ atomic_set(&bfqq->ref, 0);
-+ bfqq->bfqd = bfqd;
-+
-+ if (bic)
-+ bfq_set_next_ioprio_data(bfqq, bic);
-+
-+ if (is_sync) {
-+ if (!bfq_class_idle(bfqq))
-+ bfq_mark_bfqq_idle_window(bfqq);
-+ bfq_mark_bfqq_sync(bfqq);
-+ } else
-+ bfq_clear_bfqq_sync(bfqq);
-+ bfq_mark_bfqq_IO_bound(bfqq);
-+
-+ /* Tentative initial value to trade off between thr and lat */
-+ bfqq->max_budget = (2 * bfq_max_budget(bfqd)) / 3;
-+ bfqq->pid = pid;
-+
-+ bfqq->wr_coeff = 1;
-+ bfqq->last_wr_start_finish = 0;
-+ /*
-+ * Set to the value for which bfqq will not be deemed as
-+ * soft rt when it becomes backlogged.
-+ */
-+ bfqq->soft_rt_next_start = bfq_infinity_from_now(jiffies);
-+}
-+
-+static struct bfq_queue *bfq_find_alloc_queue(struct bfq_data *bfqd,
-+ struct bio *bio, int is_sync,
-+ struct bfq_io_cq *bic,
-+ gfp_t gfp_mask)
-+{
-+ struct bfq_group *bfqg;
-+ struct bfq_queue *bfqq, *new_bfqq = NULL;
-+ struct blkcg *blkcg;
-+
-+retry:
-+ rcu_read_lock();
-+
-+ blkcg = bio_blkcg(bio);
-+ bfqg = bfq_find_alloc_group(bfqd, blkcg);
-+ /* bic always exists here */
-+ bfqq = bic_to_bfqq(bic, is_sync);
-+
-+ /*
-+ * Always try a new alloc if we fall back to the OOM bfqq
-+ * originally, since it should just be a temporary situation.
-+ */
-+ if (!bfqq || bfqq == &bfqd->oom_bfqq) {
-+ bfqq = NULL;
-+ if (new_bfqq) {
-+ bfqq = new_bfqq;
-+ new_bfqq = NULL;
-+ } else if (gfp_mask & __GFP_WAIT) {
-+ rcu_read_unlock();
-+ spin_unlock_irq(bfqd->queue->queue_lock);
-+ new_bfqq = kmem_cache_alloc_node(bfq_pool,
-+ gfp_mask | __GFP_ZERO,
-+ bfqd->queue->node);
-+ spin_lock_irq(bfqd->queue->queue_lock);
-+ if (new_bfqq)
-+ goto retry;
-+ } else {
-+ bfqq = kmem_cache_alloc_node(bfq_pool,
-+ gfp_mask | __GFP_ZERO,
-+ bfqd->queue->node);
-+ }
-+
-+ if (bfqq) {
-+ bfq_init_bfqq(bfqd, bfqq, bic, current->pid,
-+ is_sync);
-+ bfq_init_entity(&bfqq->entity, bfqg);
-+ bfq_log_bfqq(bfqd, bfqq, "allocated");
-+ } else {
-+ bfqq = &bfqd->oom_bfqq;
-+ bfq_log_bfqq(bfqd, bfqq, "using oom bfqq");
-+ }
-+ }
-+
-+ if (new_bfqq)
-+ kmem_cache_free(bfq_pool, new_bfqq);
-+
-+ rcu_read_unlock();
-+
-+ return bfqq;
-+}
-+
-+static struct bfq_queue **bfq_async_queue_prio(struct bfq_data *bfqd,
-+ struct bfq_group *bfqg,
-+ int ioprio_class, int ioprio)
-+{
-+ switch (ioprio_class) {
-+ case IOPRIO_CLASS_RT:
-+ return &bfqg->async_bfqq[0][ioprio];
-+ case IOPRIO_CLASS_NONE:
-+ ioprio = IOPRIO_NORM;
-+ /* fall through */
-+ case IOPRIO_CLASS_BE:
-+ return &bfqg->async_bfqq[1][ioprio];
-+ case IOPRIO_CLASS_IDLE:
-+ return &bfqg->async_idle_bfqq;
-+ default:
-+ BUG();
-+ }
-+}
-+
-+static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
-+ struct bio *bio, int is_sync,
-+ struct bfq_io_cq *bic, gfp_t gfp_mask)
-+{
-+ const int ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
-+ const int ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
-+ struct bfq_queue **async_bfqq = NULL;
-+ struct bfq_queue *bfqq = NULL;
-+
-+ if (!is_sync) {
-+ struct blkcg *blkcg;
-+ struct bfq_group *bfqg;
-+
-+ rcu_read_lock();
-+ blkcg = bio_blkcg(bio);
-+ rcu_read_unlock();
-+ bfqg = bfq_find_alloc_group(bfqd, blkcg);
-+ async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
-+ ioprio);
-+ bfqq = *async_bfqq;
-+ }
-+
-+ if (!bfqq)
-+ bfqq = bfq_find_alloc_queue(bfqd, bio, is_sync, bic, gfp_mask);
-+
-+ /*
-+ * Pin the queue now that it's allocated, scheduler exit will
-+ * prune it.
-+ */
-+ if (!is_sync && !(*async_bfqq)) {
-+ atomic_inc(&bfqq->ref);
-+ bfq_log_bfqq(bfqd, bfqq, "get_queue, bfqq not in async: %p, %d",
-+ bfqq, atomic_read(&bfqq->ref));
-+ *async_bfqq = bfqq;
-+ }
-+
-+ atomic_inc(&bfqq->ref);
-+ bfq_log_bfqq(bfqd, bfqq, "get_queue, at end: %p, %d", bfqq,
-+ atomic_read(&bfqq->ref));
-+ return bfqq;
-+}
-+
-+static void bfq_update_io_thinktime(struct bfq_data *bfqd,
-+ struct bfq_io_cq *bic)
-+{
-+ unsigned long elapsed = jiffies - bic->ttime.last_end_request;
-+ unsigned long ttime = min(elapsed, 2UL * bfqd->bfq_slice_idle);
-+
-+ bic->ttime.ttime_samples = (7*bic->ttime.ttime_samples + 256) / 8;
-+ bic->ttime.ttime_total = (7*bic->ttime.ttime_total + 256*ttime) / 8;
-+ bic->ttime.ttime_mean = (bic->ttime.ttime_total + 128) /
-+ bic->ttime.ttime_samples;
-+}
-+
-+static void bfq_update_io_seektime(struct bfq_data *bfqd,
-+ struct bfq_queue *bfqq,
-+ struct request *rq)
-+{
-+ sector_t sdist;
-+ u64 total;
-+
-+ if (bfqq->last_request_pos < blk_rq_pos(rq))
-+ sdist = blk_rq_pos(rq) - bfqq->last_request_pos;
-+ else
-+ sdist = bfqq->last_request_pos - blk_rq_pos(rq);
-+
-+ /*
-+ * Don't allow the seek distance to get too large from the
-+ * odd fragment, pagein, etc.
-+ */
-+ if (bfqq->seek_samples == 0) /* first request, not really a seek */
-+ sdist = 0;
-+ else if (bfqq->seek_samples <= 60) /* second & third seek */
-+ sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*1024);
-+ else
-+ sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*64);
-+
-+ bfqq->seek_samples = (7*bfqq->seek_samples + 256) / 8;
-+ bfqq->seek_total = (7*bfqq->seek_total + (u64)256*sdist) / 8;
-+ total = bfqq->seek_total + (bfqq->seek_samples/2);
-+ do_div(total, bfqq->seek_samples);
-+ bfqq->seek_mean = (sector_t)total;
-+
-+ bfq_log_bfqq(bfqd, bfqq, "dist=%llu mean=%llu", (u64)sdist,
-+ (u64)bfqq->seek_mean);
-+}
-+
-+/*
-+ * Disable idle window if the process thinks too long or seeks so much that
-+ * it doesn't matter.
-+ */
-+static void bfq_update_idle_window(struct bfq_data *bfqd,
-+ struct bfq_queue *bfqq,
-+ struct bfq_io_cq *bic)
-+{
-+ int enable_idle;
-+
-+ /* Don't idle for async or idle io prio class. */
-+ if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
-+ return;
-+
-+ enable_idle = bfq_bfqq_idle_window(bfqq);
-+
-+ if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
-+ bfqd->bfq_slice_idle == 0 ||
-+ (bfqd->hw_tag && BFQQ_SEEKY(bfqq) &&
-+ bfqq->wr_coeff == 1))
-+ enable_idle = 0;
-+ else if (bfq_sample_valid(bic->ttime.ttime_samples)) {
-+ if (bic->ttime.ttime_mean > bfqd->bfq_slice_idle &&
-+ bfqq->wr_coeff == 1)
-+ enable_idle = 0;
-+ else
-+ enable_idle = 1;
-+ }
-+ bfq_log_bfqq(bfqd, bfqq, "update_idle_window: enable_idle %d",
-+ enable_idle);
-+
-+ if (enable_idle)
-+ bfq_mark_bfqq_idle_window(bfqq);
-+ else
-+ bfq_clear_bfqq_idle_window(bfqq);
-+}
-+
-+/*
-+ * Called when a new fs request (rq) is added to bfqq. Check if there's
-+ * something we should do about it.
-+ */
-+static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+ struct request *rq)
-+{
-+ struct bfq_io_cq *bic = RQ_BIC(rq);
-+
-+ if (rq->cmd_flags & REQ_META)
-+ bfqq->meta_pending++;
-+
-+ bfq_update_io_thinktime(bfqd, bic);
-+ bfq_update_io_seektime(bfqd, bfqq, rq);
-+ if (!BFQQ_SEEKY(bfqq) && bfq_bfqq_constantly_seeky(bfqq)) {
-+ bfq_clear_bfqq_constantly_seeky(bfqq);
-+ if (!blk_queue_nonrot(bfqd->queue)) {
-+ BUG_ON(!bfqd->const_seeky_busy_in_flight_queues);
-+ bfqd->const_seeky_busy_in_flight_queues--;
-+ }
-+ }
-+ if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
-+ !BFQQ_SEEKY(bfqq))
-+ bfq_update_idle_window(bfqd, bfqq, bic);
-+
-+ bfq_log_bfqq(bfqd, bfqq,
-+ "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
-+ bfq_bfqq_idle_window(bfqq), BFQQ_SEEKY(bfqq),
-+ (long long unsigned)bfqq->seek_mean);
-+
-+ bfqq->last_request_pos = blk_rq_pos(rq) + blk_rq_sectors(rq);
-+
-+ if (bfqq == bfqd->in_service_queue && bfq_bfqq_wait_request(bfqq)) {
-+ bool small_req = bfqq->queued[rq_is_sync(rq)] == 1 &&
-+ blk_rq_sectors(rq) < 32;
-+ bool budget_timeout = bfq_bfqq_budget_timeout(bfqq);
-+
-+ /*
-+ * There is just this request queued: if the request
-+ * is small and the queue is not to be expired, then
-+ * just exit.
-+ *
-+ * In this way, if the disk is being idled to wait for
-+ * a new request from the in-service queue, we avoid
-+ * unplugging the device and committing the disk to serve
-+ * just a small request. On the contrary, we wait for
-+ * the block layer to decide when to unplug the device:
-+ * hopefully, new requests will be merged to this one
-+ * quickly, then the device will be unplugged and
-+ * larger requests will be dispatched.
-+ */
-+ if (small_req && !budget_timeout)
-+ return;
-+
-+ /*
-+ * A large enough request arrived, or the queue is to
-+ * be expired: in both cases disk idling is to be
-+ * stopped, so clear wait_request flag and reset
-+ * timer.
-+ */
-+ bfq_clear_bfqq_wait_request(bfqq);
-+ del_timer(&bfqd->idle_slice_timer);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqg_stats_update_idle_time(bfqq_group(bfqq));
-+#endif
-+
-+ /*
-+ * The queue is not empty, because a new request just
-+ * arrived. Hence we can safely expire the queue, in
-+ * case of budget timeout, without risking that the
-+ * timestamps of the queue are not updated correctly.
-+ * See [1] for more details.
-+ */
-+ if (budget_timeout)
-+ bfq_bfqq_expire(bfqd, bfqq, false,
-+ BFQ_BFQQ_BUDGET_TIMEOUT);
-+
-+ /*
-+ * Let the request rip immediately, or let a new queue be
-+ * selected if bfqq has just been expired.
-+ */
-+ __blk_run_queue(bfqd->queue);
-+ }
-+}
-+
-+static void bfq_insert_request(struct request_queue *q, struct request *rq)
-+{
-+ struct bfq_data *bfqd = q->elevator->elevator_data;
-+ struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+
-+ assert_spin_locked(bfqd->queue->queue_lock);
-+
-+ bfq_add_request(rq);
-+
-+ rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
-+ list_add_tail(&rq->queuelist, &bfqq->fifo);
-+
-+ bfq_rq_enqueued(bfqd, bfqq, rq);
-+}
-+
-+static void bfq_update_hw_tag(struct bfq_data *bfqd)
-+{
-+ bfqd->max_rq_in_driver = max(bfqd->max_rq_in_driver,
-+ bfqd->rq_in_driver);
-+
-+ if (bfqd->hw_tag == 1)
-+ return;
-+
-+ /*
-+ * This sample is valid if the number of outstanding requests
-+ * is large enough to allow a queueing behavior. Note that the
-+ * sum is not exact, as it's not taking into account deactivated
-+ * requests.
-+ */
-+ if (bfqd->rq_in_driver + bfqd->queued < BFQ_HW_QUEUE_THRESHOLD)
-+ return;
-+
-+ if (bfqd->hw_tag_samples++ < BFQ_HW_QUEUE_SAMPLES)
-+ return;
-+
-+ bfqd->hw_tag = bfqd->max_rq_in_driver > BFQ_HW_QUEUE_THRESHOLD;
-+ bfqd->max_rq_in_driver = 0;
-+ bfqd->hw_tag_samples = 0;
-+}
-+
-+static void bfq_completed_request(struct request_queue *q, struct request *rq)
-+{
-+ struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+ struct bfq_data *bfqd = bfqq->bfqd;
-+ bool sync = bfq_bfqq_sync(bfqq);
-+
-+ bfq_log_bfqq(bfqd, bfqq, "completed one req with %u sects left (%d)",
-+ blk_rq_sectors(rq), sync);
-+
-+ bfq_update_hw_tag(bfqd);
-+
-+ BUG_ON(!bfqd->rq_in_driver);
-+ BUG_ON(!bfqq->dispatched);
-+ bfqd->rq_in_driver--;
-+ bfqq->dispatched--;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqg_stats_update_completion(bfqq_group(bfqq),
-+ rq_start_time_ns(rq),
-+ rq_io_start_time_ns(rq), rq->cmd_flags);
-+#endif
-+
-+ if (!bfqq->dispatched && !bfq_bfqq_busy(bfqq)) {
-+ bfq_weights_tree_remove(bfqd, &bfqq->entity,
-+ &bfqd->queue_weights_tree);
-+ if (!blk_queue_nonrot(bfqd->queue)) {
-+ BUG_ON(!bfqd->busy_in_flight_queues);
-+ bfqd->busy_in_flight_queues--;
-+ if (bfq_bfqq_constantly_seeky(bfqq)) {
-+ BUG_ON(!bfqd->
-+ const_seeky_busy_in_flight_queues);
-+ bfqd->const_seeky_busy_in_flight_queues--;
-+ }
-+ }
-+ }
-+
-+ if (sync) {
-+ bfqd->sync_flight--;
-+ RQ_BIC(rq)->ttime.last_end_request = jiffies;
-+ }
-+
-+ /*
-+ * If we are waiting to discover whether the request pattern of the
-+ * task associated with the queue is actually isochronous, and
-+ * both requisites for this condition to hold are satisfied, then
-+ * compute soft_rt_next_start (see the comments to the function
-+ * bfq_bfqq_softrt_next_start()).
-+ */
-+ if (bfq_bfqq_softrt_update(bfqq) && bfqq->dispatched == 0 &&
-+ RB_EMPTY_ROOT(&bfqq->sort_list))
-+ bfqq->soft_rt_next_start =
-+ bfq_bfqq_softrt_next_start(bfqd, bfqq);
-+
-+ /*
-+ * If this is the in-service queue, check if it needs to be expired,
-+ * or if we want to idle in case it has no pending requests.
-+ */
-+ if (bfqd->in_service_queue == bfqq) {
-+ if (bfq_bfqq_budget_new(bfqq))
-+ bfq_set_budget_timeout(bfqd);
-+
-+ if (bfq_bfqq_must_idle(bfqq)) {
-+ bfq_arm_slice_timer(bfqd);
-+ goto out;
-+ } else if (bfq_may_expire_for_budg_timeout(bfqq))
-+ bfq_bfqq_expire(bfqd, bfqq, false,
-+ BFQ_BFQQ_BUDGET_TIMEOUT);
-+ else if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
-+ (bfqq->dispatched == 0 ||
-+ !bfq_bfqq_may_idle(bfqq)))
-+ bfq_bfqq_expire(bfqd, bfqq, false,
-+ BFQ_BFQQ_NO_MORE_REQUESTS);
-+ }
-+
-+ if (!bfqd->rq_in_driver)
-+ bfq_schedule_dispatch(bfqd);
-+
-+out:
-+ return;
-+}
-+
-+static int __bfq_may_queue(struct bfq_queue *bfqq)
-+{
-+ if (bfq_bfqq_wait_request(bfqq) && bfq_bfqq_must_alloc(bfqq)) {
-+ bfq_clear_bfqq_must_alloc(bfqq);
-+ return ELV_MQUEUE_MUST;
-+ }
-+
-+ return ELV_MQUEUE_MAY;
-+}
-+
-+static int bfq_may_queue(struct request_queue *q, int rw)
-+{
-+ struct bfq_data *bfqd = q->elevator->elevator_data;
-+ struct task_struct *tsk = current;
-+ struct bfq_io_cq *bic;
-+ struct bfq_queue *bfqq;
-+
-+ /*
-+ * Don't force setup of a queue from here, as a call to may_queue
-+ * does not necessarily imply that a request actually will be
-+ * queued. So just look up a possibly existing queue, or return
-+ * 'may queue' if that fails.
-+ */
-+ bic = bfq_bic_lookup(bfqd, tsk->io_context);
-+ if (!bic)
-+ return ELV_MQUEUE_MAY;
-+
-+ bfqq = bic_to_bfqq(bic, rw_is_sync(rw));
-+ if (bfqq)
-+ return __bfq_may_queue(bfqq);
-+
-+ return ELV_MQUEUE_MAY;
-+}
-+
-+/*
-+ * Queue lock held here.
-+ */
-+static void bfq_put_request(struct request *rq)
-+{
-+ struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+
-+ if (bfqq) {
-+ const int rw = rq_data_dir(rq);
-+
-+ BUG_ON(!bfqq->allocated[rw]);
-+ bfqq->allocated[rw]--;
-+
-+ rq->elv.priv[0] = NULL;
-+ rq->elv.priv[1] = NULL;
-+
-+ bfq_log_bfqq(bfqq->bfqd, bfqq, "put_request %p, %d",
-+ bfqq, atomic_read(&bfqq->ref));
-+ bfq_put_queue(bfqq);
-+ }
-+}
-+
-+/*
-+ * Allocate bfq data structures associated with this request.
-+ */
-+static int bfq_set_request(struct request_queue *q, struct request *rq,
-+ struct bio *bio, gfp_t gfp_mask)
-+{
-+ struct bfq_data *bfqd = q->elevator->elevator_data;
-+ struct bfq_io_cq *bic = icq_to_bic(rq->elv.icq);
-+ const int rw = rq_data_dir(rq);
-+ const int is_sync = rq_is_sync(rq);
-+ struct bfq_queue *bfqq;
-+ unsigned long flags;
-+
-+ might_sleep_if(gfp_mask & __GFP_WAIT);
-+
-+ bfq_check_ioprio_change(bic, bio);
-+
-+ spin_lock_irqsave(q->queue_lock, flags);
-+
-+ if (!bic)
-+ goto queue_fail;
-+
-+ bfq_bic_update_cgroup(bic, bio);
-+
-+ bfqq = bic_to_bfqq(bic, is_sync);
-+ if (!bfqq || bfqq == &bfqd->oom_bfqq) {
-+ bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, gfp_mask);
-+ bic_set_bfqq(bic, bfqq, is_sync);
-+ if (is_sync) {
-+ if (bfqd->large_burst)
-+ bfq_mark_bfqq_in_large_burst(bfqq);
-+ else
-+ bfq_clear_bfqq_in_large_burst(bfqq);
-+ }
-+ }
-+
-+ bfqq->allocated[rw]++;
-+ atomic_inc(&bfqq->ref);
-+ bfq_log_bfqq(bfqd, bfqq, "set_request: bfqq %p, %d", bfqq,
-+ atomic_read(&bfqq->ref));
-+
-+ rq->elv.priv[0] = bic;
-+ rq->elv.priv[1] = bfqq;
-+
-+ spin_unlock_irqrestore(q->queue_lock, flags);
-+
-+ return 0;
-+
-+queue_fail:
-+ bfq_schedule_dispatch(bfqd);
-+ spin_unlock_irqrestore(q->queue_lock, flags);
-+
-+ return 1;
-+}
-+
-+static void bfq_kick_queue(struct work_struct *work)
-+{
-+ struct bfq_data *bfqd =
-+ container_of(work, struct bfq_data, unplug_work);
-+ struct request_queue *q = bfqd->queue;
-+
-+ spin_lock_irq(q->queue_lock);
-+ __blk_run_queue(q);
-+ spin_unlock_irq(q->queue_lock);
-+}
-+
-+/*
-+ * Handler of the expiration of the timer running if the in-service queue
-+ * is idling inside its time slice.
-+ */
-+static void bfq_idle_slice_timer(unsigned long data)
-+{
-+ struct bfq_data *bfqd = (struct bfq_data *)data;
-+ struct bfq_queue *bfqq;
-+ unsigned long flags;
-+ enum bfqq_expiration reason;
-+
-+ spin_lock_irqsave(bfqd->queue->queue_lock, flags);
-+
-+ bfqq = bfqd->in_service_queue;
-+ /*
-+ * Theoretical race here: the in-service queue can be NULL or
-+ * different from the queue that was idling if the timer handler
-+ * spins on the queue_lock and a new request arrives for the
-+ * current queue and there is a full dispatch cycle that changes
-+ * the in-service queue. This can hardly happen, but in the worst
-+ * case we just expire a queue too early.
-+ */
-+ if (bfqq) {
-+ bfq_log_bfqq(bfqd, bfqq, "slice_timer expired");
-+ if (bfq_bfqq_budget_timeout(bfqq))
-+ /*
-+ * Here, too, the queue can be safely expired
-+ * for budget timeout without wasting
-+ * guarantees.
-+ */
-+ reason = BFQ_BFQQ_BUDGET_TIMEOUT;
-+ else if (bfqq->queued[0] == 0 && bfqq->queued[1] == 0)
-+ /*
-+ * The queue may not be empty upon timer expiration,
-+ * because we may not disable the timer when the
-+ * first request of the in-service queue arrives
-+ * during disk idling.
-+ */
-+ reason = BFQ_BFQQ_TOO_IDLE;
-+ else
-+ goto schedule_dispatch;
-+
-+ bfq_bfqq_expire(bfqd, bfqq, true, reason);
-+ }
-+
-+schedule_dispatch:
-+ bfq_schedule_dispatch(bfqd);
-+
-+ spin_unlock_irqrestore(bfqd->queue->queue_lock, flags);
-+}
-+
-+static void bfq_shutdown_timer_wq(struct bfq_data *bfqd)
-+{
-+ del_timer_sync(&bfqd->idle_slice_timer);
-+ cancel_work_sync(&bfqd->unplug_work);
-+}
-+
-+static void __bfq_put_async_bfqq(struct bfq_data *bfqd,
-+ struct bfq_queue **bfqq_ptr)
-+{
-+ struct bfq_group *root_group = bfqd->root_group;
-+ struct bfq_queue *bfqq = *bfqq_ptr;
-+
-+ bfq_log(bfqd, "put_async_bfqq: %p", bfqq);
-+ if (bfqq) {
-+ bfq_bfqq_move(bfqd, bfqq, &bfqq->entity, root_group);
-+ bfq_log_bfqq(bfqd, bfqq, "put_async_bfqq: putting %p, %d",
-+ bfqq, atomic_read(&bfqq->ref));
-+ bfq_put_queue(bfqq);
-+ *bfqq_ptr = NULL;
-+ }
-+}
-+
-+/*
-+ * Release all the bfqg references to its async queues. If we are
-+ * deallocating the group these queues may still contain requests, so
-+ * we reparent them to the root cgroup (i.e., the only one that will
-+ * exist for sure until all the requests on a device are gone).
-+ */
-+static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
-+{
-+ int i, j;
-+
-+ for (i = 0; i < 2; i++)
-+ for (j = 0; j < IOPRIO_BE_NR; j++)
-+ __bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
-+
-+ __bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
-+}
-+
-+static void bfq_exit_queue(struct elevator_queue *e)
-+{
-+ struct bfq_data *bfqd = e->elevator_data;
-+ struct request_queue *q = bfqd->queue;
-+ struct bfq_queue *bfqq, *n;
-+
-+ bfq_shutdown_timer_wq(bfqd);
-+
-+ spin_lock_irq(q->queue_lock);
-+
-+ BUG_ON(bfqd->in_service_queue);
-+ list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
-+ bfq_deactivate_bfqq(bfqd, bfqq, 0);
-+
-+ bfq_disconnect_groups(bfqd);
-+ spin_unlock_irq(q->queue_lock);
-+
-+ bfq_shutdown_timer_wq(bfqd);
-+
-+ synchronize_rcu();
-+
-+ BUG_ON(timer_pending(&bfqd->idle_slice_timer));
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ blkcg_deactivate_policy(q, &blkcg_policy_bfq);
-+#endif
-+
-+ kfree(bfqd);
-+}
-+
-+static void bfq_init_root_group(struct bfq_group *root_group,
-+ struct bfq_data *bfqd)
-+{
-+ int i;
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ root_group->entity.parent = NULL;
-+ root_group->my_entity = NULL;
-+ root_group->bfqd = bfqd;
-+#endif
-+ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
-+ root_group->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
-+}
-+
-+static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
-+{
-+ struct bfq_data *bfqd;
-+ struct elevator_queue *eq;
-+
-+ eq = elevator_alloc(q, e);
-+ if (!eq)
-+ return -ENOMEM;
-+
-+ bfqd = kzalloc_node(sizeof(*bfqd), GFP_KERNEL, q->node);
-+ if (!bfqd) {
-+ kobject_put(&eq->kobj);
-+ return -ENOMEM;
-+ }
-+ eq->elevator_data = bfqd;
-+
-+ /*
-+ * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.
-+ * Grab a permanent reference to it, so that the normal code flow
-+ * will not attempt to free it.
-+ */
-+ bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0);
-+ atomic_inc(&bfqd->oom_bfqq.ref);
-+ bfqd->oom_bfqq.new_ioprio = BFQ_DEFAULT_QUEUE_IOPRIO;
-+ bfqd->oom_bfqq.new_ioprio_class = IOPRIO_CLASS_BE;
-+ bfqd->oom_bfqq.entity.new_weight =
-+ bfq_ioprio_to_weight(bfqd->oom_bfqq.new_ioprio);
-+ /*
-+ * Trigger weight initialization, according to ioprio, at the
-+ * oom_bfqq's first activation. The oom_bfqq's ioprio and ioprio
-+ * class won't be changed any more.
-+ */
-+ bfqd->oom_bfqq.entity.prio_changed = 1;
-+
-+ bfqd->queue = q;
-+
-+ spin_lock_irq(q->queue_lock);
-+ q->elevator = eq;
-+ spin_unlock_irq(q->queue_lock);
-+
-+ bfqd->root_group = bfq_create_group_hierarchy(bfqd, q->node);
-+ if (!bfqd->root_group)
-+ goto out_free;
-+ bfq_init_root_group(bfqd->root_group, bfqd);
-+ bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqd->active_numerous_groups = 0;
-+#endif
-+
-+ init_timer(&bfqd->idle_slice_timer);
-+ bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
-+ bfqd->idle_slice_timer.data = (unsigned long)bfqd;
-+
-+ bfqd->queue_weights_tree = RB_ROOT;
-+ bfqd->group_weights_tree = RB_ROOT;
-+
-+ INIT_WORK(&bfqd->unplug_work, bfq_kick_queue);
-+
-+ INIT_LIST_HEAD(&bfqd->active_list);
-+ INIT_LIST_HEAD(&bfqd->idle_list);
-+ INIT_HLIST_HEAD(&bfqd->burst_list);
-+
-+ bfqd->hw_tag = -1;
-+
-+ bfqd->bfq_max_budget = bfq_default_max_budget;
-+
-+ bfqd->bfq_fifo_expire[0] = bfq_fifo_expire[0];
-+ bfqd->bfq_fifo_expire[1] = bfq_fifo_expire[1];
-+ bfqd->bfq_back_max = bfq_back_max;
-+ bfqd->bfq_back_penalty = bfq_back_penalty;
-+ bfqd->bfq_slice_idle = bfq_slice_idle;
-+ bfqd->bfq_class_idle_last_service = 0;
-+ bfqd->bfq_max_budget_async_rq = bfq_max_budget_async_rq;
-+ bfqd->bfq_timeout[BLK_RW_ASYNC] = bfq_timeout_async;
-+ bfqd->bfq_timeout[BLK_RW_SYNC] = bfq_timeout_sync;
-+
-+ bfqd->bfq_requests_within_timer = 120;
-+
-+ bfqd->bfq_large_burst_thresh = 11;
-+ bfqd->bfq_burst_interval = msecs_to_jiffies(500);
-+
-+ bfqd->low_latency = true;
-+
-+ bfqd->bfq_wr_coeff = 20;
-+ bfqd->bfq_wr_rt_max_time = msecs_to_jiffies(300);
-+ bfqd->bfq_wr_max_time = 0;
-+ bfqd->bfq_wr_min_idle_time = msecs_to_jiffies(2000);
-+ bfqd->bfq_wr_min_inter_arr_async = msecs_to_jiffies(500);
-+ bfqd->bfq_wr_max_softrt_rate = 7000; /*
-+ * Approximate rate required
-+ * to playback or record a
-+ * high-definition compressed
-+ * video.
-+ */
-+ bfqd->wr_busy_queues = 0;
-+ bfqd->busy_in_flight_queues = 0;
-+ bfqd->const_seeky_busy_in_flight_queues = 0;
-+
-+ /*
-+ * Begin by assuming, optimistically, that the device peak rate is
-+ * equal to the highest reference rate.
-+ */
-+ bfqd->RT_prod = R_fast[blk_queue_nonrot(bfqd->queue)] *
-+ T_fast[blk_queue_nonrot(bfqd->queue)];
-+ bfqd->peak_rate = R_fast[blk_queue_nonrot(bfqd->queue)];
-+ bfqd->device_speed = BFQ_BFQD_FAST;
-+
-+ return 0;
-+
-+out_free:
-+ kfree(bfqd);
-+ kobject_put(&eq->kobj);
-+ return -ENOMEM;
-+}
-+
-+static void bfq_slab_kill(void)
-+{
-+ if (bfq_pool)
-+ kmem_cache_destroy(bfq_pool);
-+}
-+
-+static int __init bfq_slab_setup(void)
-+{
-+ bfq_pool = KMEM_CACHE(bfq_queue, 0);
-+ if (!bfq_pool)
-+ return -ENOMEM;
-+ return 0;
-+}
-+
-+static ssize_t bfq_var_show(unsigned int var, char *page)
-+{
-+ return sprintf(page, "%d\n", var);
-+}
-+
-+static ssize_t bfq_var_store(unsigned long *var, const char *page,
-+ size_t count)
-+{
-+ unsigned long new_val;
-+ int ret = kstrtoul(page, 10, &new_val);
-+
-+ if (ret == 0)
-+ *var = new_val;
-+
-+ return count;
-+}
-+
-+static ssize_t bfq_wr_max_time_show(struct elevator_queue *e, char *page)
-+{
-+ struct bfq_data *bfqd = e->elevator_data;
-+ return sprintf(page, "%d\n", bfqd->bfq_wr_max_time > 0 ?
-+ jiffies_to_msecs(bfqd->bfq_wr_max_time) :
-+ jiffies_to_msecs(bfq_wr_duration(bfqd)));
-+}
-+
-+static ssize_t bfq_weights_show(struct elevator_queue *e, char *page)
-+{
-+ struct bfq_queue *bfqq;
-+ struct bfq_data *bfqd = e->elevator_data;
-+ ssize_t num_char = 0;
-+
-+ num_char += sprintf(page + num_char, "Tot reqs queued %d\n\n",
-+ bfqd->queued);
-+
-+ spin_lock_irq(bfqd->queue->queue_lock);
-+
-+ num_char += sprintf(page + num_char, "Active:\n");
-+ list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list) {
-+ num_char += sprintf(page + num_char,
-+ "pid%d: weight %hu, nr_queued %d %d, dur %d/%u\n",
-+ bfqq->pid,
-+ bfqq->entity.weight,
-+ bfqq->queued[0],
-+ bfqq->queued[1],
-+ jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
-+ jiffies_to_msecs(bfqq->wr_cur_max_time));
-+ }
-+
-+ num_char += sprintf(page + num_char, "Idle:\n");
-+ list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list) {
-+ num_char += sprintf(page + num_char,
-+ "pid%d: weight %hu, dur %d/%u\n",
-+ bfqq->pid,
-+ bfqq->entity.weight,
-+ jiffies_to_msecs(jiffies -
-+ bfqq->last_wr_start_finish),
-+ jiffies_to_msecs(bfqq->wr_cur_max_time));
-+ }
-+
-+ spin_unlock_irq(bfqd->queue->queue_lock);
-+
-+ return num_char;
-+}
-+
-+#define SHOW_FUNCTION(__FUNC, __VAR, __CONV) \
-+static ssize_t __FUNC(struct elevator_queue *e, char *page) \
-+{ \
-+ struct bfq_data *bfqd = e->elevator_data; \
-+ unsigned int __data = __VAR; \
-+ if (__CONV) \
-+ __data = jiffies_to_msecs(__data); \
-+ return bfq_var_show(__data, (page)); \
-+}
-+SHOW_FUNCTION(bfq_fifo_expire_sync_show, bfqd->bfq_fifo_expire[1], 1);
-+SHOW_FUNCTION(bfq_fifo_expire_async_show, bfqd->bfq_fifo_expire[0], 1);
-+SHOW_FUNCTION(bfq_back_seek_max_show, bfqd->bfq_back_max, 0);
-+SHOW_FUNCTION(bfq_back_seek_penalty_show, bfqd->bfq_back_penalty, 0);
-+SHOW_FUNCTION(bfq_slice_idle_show, bfqd->bfq_slice_idle, 1);
-+SHOW_FUNCTION(bfq_max_budget_show, bfqd->bfq_user_max_budget, 0);
-+SHOW_FUNCTION(bfq_max_budget_async_rq_show,
-+ bfqd->bfq_max_budget_async_rq, 0);
-+SHOW_FUNCTION(bfq_timeout_sync_show, bfqd->bfq_timeout[BLK_RW_SYNC], 1);
-+SHOW_FUNCTION(bfq_timeout_async_show, bfqd->bfq_timeout[BLK_RW_ASYNC], 1);
-+SHOW_FUNCTION(bfq_low_latency_show, bfqd->low_latency, 0);
-+SHOW_FUNCTION(bfq_wr_coeff_show, bfqd->bfq_wr_coeff, 0);
-+SHOW_FUNCTION(bfq_wr_rt_max_time_show, bfqd->bfq_wr_rt_max_time, 1);
-+SHOW_FUNCTION(bfq_wr_min_idle_time_show, bfqd->bfq_wr_min_idle_time, 1);
-+SHOW_FUNCTION(bfq_wr_min_inter_arr_async_show, bfqd->bfq_wr_min_inter_arr_async,
-+ 1);
-+SHOW_FUNCTION(bfq_wr_max_softrt_rate_show, bfqd->bfq_wr_max_softrt_rate, 0);
-+#undef SHOW_FUNCTION
-+
-+#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV) \
-+static ssize_t \
-+__FUNC(struct elevator_queue *e, const char *page, size_t count) \
-+{ \
-+ struct bfq_data *bfqd = e->elevator_data; \
-+ unsigned long uninitialized_var(__data); \
-+ int ret = bfq_var_store(&__data, (page), count); \
-+ if (__data < (MIN)) \
-+ __data = (MIN); \
-+ else if (__data > (MAX)) \
-+ __data = (MAX); \
-+ if (__CONV) \
-+ *(__PTR) = msecs_to_jiffies(__data); \
-+ else \
-+ *(__PTR) = __data; \
-+ return ret; \
-+}
-+STORE_FUNCTION(bfq_fifo_expire_sync_store, &bfqd->bfq_fifo_expire[1], 1,
-+ INT_MAX, 1);
-+STORE_FUNCTION(bfq_fifo_expire_async_store, &bfqd->bfq_fifo_expire[0], 1,
-+ INT_MAX, 1);
-+STORE_FUNCTION(bfq_back_seek_max_store, &bfqd->bfq_back_max, 0, INT_MAX, 0);
-+STORE_FUNCTION(bfq_back_seek_penalty_store, &bfqd->bfq_back_penalty, 1,
-+ INT_MAX, 0);
-+STORE_FUNCTION(bfq_slice_idle_store, &bfqd->bfq_slice_idle, 0, INT_MAX, 1);
-+STORE_FUNCTION(bfq_max_budget_async_rq_store, &bfqd->bfq_max_budget_async_rq,
-+ 1, INT_MAX, 0);
-+STORE_FUNCTION(bfq_timeout_async_store, &bfqd->bfq_timeout[BLK_RW_ASYNC], 0,
-+ INT_MAX, 1);
-+STORE_FUNCTION(bfq_wr_coeff_store, &bfqd->bfq_wr_coeff, 1, INT_MAX, 0);
-+STORE_FUNCTION(bfq_wr_max_time_store, &bfqd->bfq_wr_max_time, 0, INT_MAX, 1);
-+STORE_FUNCTION(bfq_wr_rt_max_time_store, &bfqd->bfq_wr_rt_max_time, 0, INT_MAX,
-+ 1);
-+STORE_FUNCTION(bfq_wr_min_idle_time_store, &bfqd->bfq_wr_min_idle_time, 0,
-+ INT_MAX, 1);
-+STORE_FUNCTION(bfq_wr_min_inter_arr_async_store,
-+ &bfqd->bfq_wr_min_inter_arr_async, 0, INT_MAX, 1);
-+STORE_FUNCTION(bfq_wr_max_softrt_rate_store, &bfqd->bfq_wr_max_softrt_rate, 0,
-+ INT_MAX, 0);
-+#undef STORE_FUNCTION
-+
-+/* do nothing for the moment */
-+static ssize_t bfq_weights_store(struct elevator_queue *e,
-+ const char *page, size_t count)
-+{
-+ return count;
-+}
-+
-+static unsigned long bfq_estimated_max_budget(struct bfq_data *bfqd)
-+{
-+ u64 timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
-+
-+ if (bfqd->peak_rate_samples >= BFQ_PEAK_RATE_SAMPLES)
-+ return bfq_calc_max_budget(bfqd->peak_rate, timeout);
-+ else
-+ return bfq_default_max_budget;
-+}
-+
-+static ssize_t bfq_max_budget_store(struct elevator_queue *e,
-+ const char *page, size_t count)
-+{
-+ struct bfq_data *bfqd = e->elevator_data;
-+ unsigned long uninitialized_var(__data);
-+ int ret = bfq_var_store(&__data, (page), count);
-+
-+ if (__data == 0)
-+ bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
-+ else {
-+ if (__data > INT_MAX)
-+ __data = INT_MAX;
-+ bfqd->bfq_max_budget = __data;
-+ }
-+
-+ bfqd->bfq_user_max_budget = __data;
-+
-+ return ret;
-+}
-+
-+static ssize_t bfq_timeout_sync_store(struct elevator_queue *e,
-+ const char *page, size_t count)
-+{
-+ struct bfq_data *bfqd = e->elevator_data;
-+ unsigned long uninitialized_var(__data);
-+ int ret = bfq_var_store(&__data, (page), count);
-+
-+ if (__data < 1)
-+ __data = 1;
-+ else if (__data > INT_MAX)
-+ __data = INT_MAX;
-+
-+ bfqd->bfq_timeout[BLK_RW_SYNC] = msecs_to_jiffies(__data);
-+ if (bfqd->bfq_user_max_budget == 0)
-+ bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
-+
-+ return ret;
-+}
-+
-+static ssize_t bfq_low_latency_store(struct elevator_queue *e,
-+ const char *page, size_t count)
-+{
-+ struct bfq_data *bfqd = e->elevator_data;
-+ unsigned long uninitialized_var(__data);
-+ int ret = bfq_var_store(&__data, (page), count);
-+
-+ if (__data > 1)
-+ __data = 1;
-+ if (__data == 0 && bfqd->low_latency != 0)
-+ bfq_end_wr(bfqd);
-+ bfqd->low_latency = __data;
-+
-+ return ret;
-+}
-+
-+#define BFQ_ATTR(name) \
-+ __ATTR(name, S_IRUGO|S_IWUSR, bfq_##name##_show, bfq_##name##_store)
-+
-+static struct elv_fs_entry bfq_attrs[] = {
-+ BFQ_ATTR(fifo_expire_sync),
-+ BFQ_ATTR(fifo_expire_async),
-+ BFQ_ATTR(back_seek_max),
-+ BFQ_ATTR(back_seek_penalty),
-+ BFQ_ATTR(slice_idle),
-+ BFQ_ATTR(max_budget),
-+ BFQ_ATTR(max_budget_async_rq),
-+ BFQ_ATTR(timeout_sync),
-+ BFQ_ATTR(timeout_async),
-+ BFQ_ATTR(low_latency),
-+ BFQ_ATTR(wr_coeff),
-+ BFQ_ATTR(wr_max_time),
-+ BFQ_ATTR(wr_rt_max_time),
-+ BFQ_ATTR(wr_min_idle_time),
-+ BFQ_ATTR(wr_min_inter_arr_async),
-+ BFQ_ATTR(wr_max_softrt_rate),
-+ BFQ_ATTR(weights),
-+ __ATTR_NULL
-+};
-+
-+static struct elevator_type iosched_bfq = {
-+ .ops = {
-+ .elevator_merge_fn = bfq_merge,
-+ .elevator_merged_fn = bfq_merged_request,
-+ .elevator_merge_req_fn = bfq_merged_requests,
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ .elevator_bio_merged_fn = bfq_bio_merged,
-+#endif
-+ .elevator_allow_merge_fn = bfq_allow_merge,
-+ .elevator_dispatch_fn = bfq_dispatch_requests,
-+ .elevator_add_req_fn = bfq_insert_request,
-+ .elevator_activate_req_fn = bfq_activate_request,
-+ .elevator_deactivate_req_fn = bfq_deactivate_request,
-+ .elevator_completed_req_fn = bfq_completed_request,
-+ .elevator_former_req_fn = elv_rb_former_request,
-+ .elevator_latter_req_fn = elv_rb_latter_request,
-+ .elevator_init_icq_fn = bfq_init_icq,
-+ .elevator_exit_icq_fn = bfq_exit_icq,
-+ .elevator_set_req_fn = bfq_set_request,
-+ .elevator_put_req_fn = bfq_put_request,
-+ .elevator_may_queue_fn = bfq_may_queue,
-+ .elevator_init_fn = bfq_init_queue,
-+ .elevator_exit_fn = bfq_exit_queue,
-+ },
-+ .icq_size = sizeof(struct bfq_io_cq),
-+ .icq_align = __alignof__(struct bfq_io_cq),
-+ .elevator_attrs = bfq_attrs,
-+ .elevator_name = "bfq",
-+ .elevator_owner = THIS_MODULE,
-+};
-+
-+static int __init bfq_init(void)
-+{
-+ int ret;
-+
-+ /*
-+ * Can be 0 on HZ < 1000 setups.
-+ */
-+ if (bfq_slice_idle == 0)
-+ bfq_slice_idle = 1;
-+
-+ if (bfq_timeout_async == 0)
-+ bfq_timeout_async = 1;
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ ret = blkcg_policy_register(&blkcg_policy_bfq);
-+ if (ret)
-+ return ret;
-+#endif
-+
-+ ret = -ENOMEM;
-+ if (bfq_slab_setup())
-+ goto err_pol_unreg;
-+
-+ /*
-+ * Times to load large popular applications for the typical systems
-+ * installed on the reference devices (see the comments before the
-+ * definitions of the two arrays).
-+ */
-+ T_slow[0] = msecs_to_jiffies(2600);
-+ T_slow[1] = msecs_to_jiffies(1000);
-+ T_fast[0] = msecs_to_jiffies(5500);
-+ T_fast[1] = msecs_to_jiffies(2000);
-+
-+ /*
-+ * Thresholds that determine the switch between speed classes (see
-+ * the comments before the definition of the array).
-+ */
-+ device_speed_thresh[0] = (R_fast[0] + R_slow[0]) / 2;
-+ device_speed_thresh[1] = (R_fast[1] + R_slow[1]) / 2;
-+
-+ ret = elv_register(&iosched_bfq);
-+ if (ret)
-+ goto err_pol_unreg;
-+
-+ pr_info("BFQ I/O-scheduler: v7r9");
-+
-+ return 0;
-+
-+err_pol_unreg:
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ blkcg_policy_unregister(&blkcg_policy_bfq);
-+#endif
-+ return ret;
-+}
-+
-+static void __exit bfq_exit(void)
-+{
-+ elv_unregister(&iosched_bfq);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ blkcg_policy_unregister(&blkcg_policy_bfq);
-+#endif
-+ bfq_slab_kill();
-+}
-+
-+module_init(bfq_init);
-+module_exit(bfq_exit);
-+
-+MODULE_AUTHOR("Fabio Checconi, Paolo Valente");
-+MODULE_LICENSE("GPL");
-diff --git a/block/bfq-sched.c b/block/bfq-sched.c
-new file mode 100644
-index 0000000..9328a1f
---- /dev/null
-+++ b/block/bfq-sched.c
-@@ -0,0 +1,1197 @@
-+/*
-+ * BFQ: Hierarchical B-WF2Q+ scheduler.
-+ *
-+ * Based on ideas and code from CFQ:
-+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
-+ *
-+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
-+ * Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
-+ */
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+#define for_each_entity(entity) \
-+ for (; entity ; entity = entity->parent)
-+
-+#define for_each_entity_safe(entity, parent) \
-+ for (; entity && ({ parent = entity->parent; 1; }); entity = parent)
-+
-+
-+static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
-+ int extract,
-+ struct bfq_data *bfqd);
-+
-+static struct bfq_group *bfqq_group(struct bfq_queue *bfqq);
-+
-+static void bfq_update_budget(struct bfq_entity *next_in_service)
-+{
-+ struct bfq_entity *bfqg_entity;
-+ struct bfq_group *bfqg;
-+ struct bfq_sched_data *group_sd;
-+
-+ BUG_ON(!next_in_service);
-+
-+ group_sd = next_in_service->sched_data;
-+
-+ bfqg = container_of(group_sd, struct bfq_group, sched_data);
-+ /*
-+ * bfq_group's my_entity field is not NULL only if the group
-+ * is not the root group. We must not touch the root entity
-+ * as it must never become an in-service entity.
-+ */
-+ bfqg_entity = bfqg->my_entity;
-+ if (bfqg_entity)
-+ bfqg_entity->budget = next_in_service->budget;
-+}
-+
-+static int bfq_update_next_in_service(struct bfq_sched_data *sd)
-+{
-+ struct bfq_entity *next_in_service;
-+
-+ if (sd->in_service_entity)
-+ /* will update/requeue at the end of service */
-+ return 0;
-+
-+ /*
-+ * NOTE: this can be improved in many ways, such as returning
-+ * 1 (and thus propagating upwards the update) only when the
-+ * budget changes, or caching the bfqq that will be scheduled
-+ * next from this subtree. By now we worry more about
-+ * correctness than about performance...
-+ */
-+ next_in_service = bfq_lookup_next_entity(sd, 0, NULL);
-+ sd->next_in_service = next_in_service;
-+
-+ if (next_in_service)
-+ bfq_update_budget(next_in_service);
-+
-+ return 1;
-+}
-+
-+static void bfq_check_next_in_service(struct bfq_sched_data *sd,
-+ struct bfq_entity *entity)
-+{
-+ BUG_ON(sd->next_in_service != entity);
-+}
-+#else
-+#define for_each_entity(entity) \
-+ for (; entity ; entity = NULL)
-+
-+#define for_each_entity_safe(entity, parent) \
-+ for (parent = NULL; entity ; entity = parent)
-+
-+static int bfq_update_next_in_service(struct bfq_sched_data *sd)
-+{
-+ return 0;
-+}
-+
-+static void bfq_check_next_in_service(struct bfq_sched_data *sd,
-+ struct bfq_entity *entity)
-+{
-+}
-+
-+static void bfq_update_budget(struct bfq_entity *next_in_service)
-+{
-+}
-+#endif
-+
-+/*
-+ * Shift for timestamp calculations. This actually limits the maximum
-+ * service allowed in one timestamp delta (small shift values increase it),
-+ * the maximum total weight that can be used for the queues in the system
-+ * (big shift values increase it), and the period of virtual time
-+ * wraparounds.
-+ */
-+#define WFQ_SERVICE_SHIFT 22
-+
-+/**
-+ * bfq_gt - compare two timestamps.
-+ * @a: first ts.
-+ * @b: second ts.
-+ *
-+ * Return @a > @b, dealing with wrapping correctly.
-+ */
-+static int bfq_gt(u64 a, u64 b)
-+{
-+ return (s64)(a - b) > 0;
-+}
-+
-+static struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity)
-+{
-+ struct bfq_queue *bfqq = NULL;
-+
-+ BUG_ON(!entity);
-+
-+ if (!entity->my_sched_data)
-+ bfqq = container_of(entity, struct bfq_queue, entity);
-+
-+ return bfqq;
-+}
-+
-+
-+/**
-+ * bfq_delta - map service into the virtual time domain.
-+ * @service: amount of service.
-+ * @weight: scale factor (weight of an entity or weight sum).
-+ */
-+static u64 bfq_delta(unsigned long service, unsigned long weight)
-+{
-+ u64 d = (u64)service << WFQ_SERVICE_SHIFT;
-+
-+ do_div(d, weight);
-+ return d;
-+}
-+
-+/**
-+ * bfq_calc_finish - assign the finish time to an entity.
-+ * @entity: the entity to act upon.
-+ * @service: the service to be charged to the entity.
-+ */
-+static void bfq_calc_finish(struct bfq_entity *entity, unsigned long service)
-+{
-+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+
-+ BUG_ON(entity->weight == 0);
-+
-+ entity->finish = entity->start +
-+ bfq_delta(service, entity->weight);
-+
-+ if (bfqq) {
-+ bfq_log_bfqq(bfqq->bfqd, bfqq,
-+ "calc_finish: serv %lu, w %d",
-+ service, entity->weight);
-+ bfq_log_bfqq(bfqq->bfqd, bfqq,
-+ "calc_finish: start %llu, finish %llu, delta %llu",
-+ entity->start, entity->finish,
-+ bfq_delta(service, entity->weight));
-+ }
-+}
-+
-+/**
-+ * bfq_entity_of - get an entity from a node.
-+ * @node: the node field of the entity.
-+ *
-+ * Convert a node pointer to the relative entity. This is used only
-+ * to simplify the logic of some functions and not as the generic
-+ * conversion mechanism because, e.g., in the tree walking functions,
-+ * the check for a %NULL value would be redundant.
-+ */
-+static struct bfq_entity *bfq_entity_of(struct rb_node *node)
-+{
-+ struct bfq_entity *entity = NULL;
-+
-+ if (node)
-+ entity = rb_entry(node, struct bfq_entity, rb_node);
-+
-+ return entity;
-+}
-+
-+/**
-+ * bfq_extract - remove an entity from a tree.
-+ * @root: the tree root.
-+ * @entity: the entity to remove.
-+ */
-+static void bfq_extract(struct rb_root *root, struct bfq_entity *entity)
-+{
-+ BUG_ON(entity->tree != root);
-+
-+ entity->tree = NULL;
-+ rb_erase(&entity->rb_node, root);
-+}
-+
-+/**
-+ * bfq_idle_extract - extract an entity from the idle tree.
-+ * @st: the service tree of the owning @entity.
-+ * @entity: the entity being removed.
-+ */
-+static void bfq_idle_extract(struct bfq_service_tree *st,
-+ struct bfq_entity *entity)
-+{
-+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+ struct rb_node *next;
-+
-+ BUG_ON(entity->tree != &st->idle);
-+
-+ if (entity == st->first_idle) {
-+ next = rb_next(&entity->rb_node);
-+ st->first_idle = bfq_entity_of(next);
-+ }
-+
-+ if (entity == st->last_idle) {
-+ next = rb_prev(&entity->rb_node);
-+ st->last_idle = bfq_entity_of(next);
-+ }
-+
-+ bfq_extract(&st->idle, entity);
-+
-+ if (bfqq)
-+ list_del(&bfqq->bfqq_list);
-+}
-+
-+/**
-+ * bfq_insert - generic tree insertion.
-+ * @root: tree root.
-+ * @entity: entity to insert.
-+ *
-+ * This is used for the idle and the active tree, since they are both
-+ * ordered by finish time.
-+ */
-+static void bfq_insert(struct rb_root *root, struct bfq_entity *entity)
-+{
-+ struct bfq_entity *entry;
-+ struct rb_node **node = &root->rb_node;
-+ struct rb_node *parent = NULL;
-+
-+ BUG_ON(entity->tree);
-+
-+ while (*node) {
-+ parent = *node;
-+ entry = rb_entry(parent, struct bfq_entity, rb_node);
-+
-+ if (bfq_gt(entry->finish, entity->finish))
-+ node = &parent->rb_left;
-+ else
-+ node = &parent->rb_right;
-+ }
-+
-+ rb_link_node(&entity->rb_node, parent, node);
-+ rb_insert_color(&entity->rb_node, root);
-+
-+ entity->tree = root;
-+}
-+
-+/**
-+ * bfq_update_min - update the min_start field of a entity.
-+ * @entity: the entity to update.
-+ * @node: one of its children.
-+ *
-+ * This function is called when @entity may store an invalid value for
-+ * min_start due to updates to the active tree. The function assumes
-+ * that the subtree rooted at @node (which may be its left or its right
-+ * child) has a valid min_start value.
-+ */
-+static void bfq_update_min(struct bfq_entity *entity, struct rb_node *node)
-+{
-+ struct bfq_entity *child;
-+
-+ if (node) {
-+ child = rb_entry(node, struct bfq_entity, rb_node);
-+ if (bfq_gt(entity->min_start, child->min_start))
-+ entity->min_start = child->min_start;
-+ }
-+}
-+
-+/**
-+ * bfq_update_active_node - recalculate min_start.
-+ * @node: the node to update.
-+ *
-+ * @node may have changed position or one of its children may have moved,
-+ * this function updates its min_start value. The left and right subtrees
-+ * are assumed to hold a correct min_start value.
-+ */
-+static void bfq_update_active_node(struct rb_node *node)
-+{
-+ struct bfq_entity *entity = rb_entry(node, struct bfq_entity, rb_node);
-+
-+ entity->min_start = entity->start;
-+ bfq_update_min(entity, node->rb_right);
-+ bfq_update_min(entity, node->rb_left);
-+}
-+
-+/**
-+ * bfq_update_active_tree - update min_start for the whole active tree.
-+ * @node: the starting node.
-+ *
-+ * @node must be the deepest modified node after an update. This function
-+ * updates its min_start using the values held by its children, assuming
-+ * that they did not change, and then updates all the nodes that may have
-+ * changed in the path to the root. The only nodes that may have changed
-+ * are the ones in the path or their siblings.
-+ */
-+static void bfq_update_active_tree(struct rb_node *node)
-+{
-+ struct rb_node *parent;
-+
-+up:
-+ bfq_update_active_node(node);
-+
-+ parent = rb_parent(node);
-+ if (!parent)
-+ return;
-+
-+ if (node == parent->rb_left && parent->rb_right)
-+ bfq_update_active_node(parent->rb_right);
-+ else if (parent->rb_left)
-+ bfq_update_active_node(parent->rb_left);
-+
-+ node = parent;
-+ goto up;
-+}
-+
-+static void bfq_weights_tree_add(struct bfq_data *bfqd,
-+ struct bfq_entity *entity,
-+ struct rb_root *root);
-+
-+static void bfq_weights_tree_remove(struct bfq_data *bfqd,
-+ struct bfq_entity *entity,
-+ struct rb_root *root);
-+
-+
-+/**
-+ * bfq_active_insert - insert an entity in the active tree of its
-+ * group/device.
-+ * @st: the service tree of the entity.
-+ * @entity: the entity being inserted.
-+ *
-+ * The active tree is ordered by finish time, but an extra key is kept
-+ * per each node, containing the minimum value for the start times of
-+ * its children (and the node itself), so it's possible to search for
-+ * the eligible node with the lowest finish time in logarithmic time.
-+ */
-+static void bfq_active_insert(struct bfq_service_tree *st,
-+ struct bfq_entity *entity)
-+{
-+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+ struct rb_node *node = &entity->rb_node;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ struct bfq_sched_data *sd = NULL;
-+ struct bfq_group *bfqg = NULL;
-+ struct bfq_data *bfqd = NULL;
-+#endif
-+
-+ bfq_insert(&st->active, entity);
-+
-+ if (node->rb_left)
-+ node = node->rb_left;
-+ else if (node->rb_right)
-+ node = node->rb_right;
-+
-+ bfq_update_active_tree(node);
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ sd = entity->sched_data;
-+ bfqg = container_of(sd, struct bfq_group, sched_data);
-+ BUG_ON(!bfqg);
-+ bfqd = (struct bfq_data *)bfqg->bfqd;
-+#endif
-+ if (bfqq)
-+ list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ else { /* bfq_group */
-+ BUG_ON(!bfqd);
-+ bfq_weights_tree_add(bfqd, entity, &bfqd->group_weights_tree);
-+ }
-+ if (bfqg != bfqd->root_group) {
-+ BUG_ON(!bfqg);
-+ BUG_ON(!bfqd);
-+ bfqg->active_entities++;
-+ if (bfqg->active_entities == 2)
-+ bfqd->active_numerous_groups++;
-+ }
-+#endif
-+}
-+
-+/**
-+ * bfq_ioprio_to_weight - calc a weight from an ioprio.
-+ * @ioprio: the ioprio value to convert.
-+ */
-+static unsigned short bfq_ioprio_to_weight(int ioprio)
-+{
-+ BUG_ON(ioprio < 0 || ioprio >= IOPRIO_BE_NR);
-+ return IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - ioprio;
-+}
-+
-+/**
-+ * bfq_weight_to_ioprio - calc an ioprio from a weight.
-+ * @weight: the weight value to convert.
-+ *
-+ * To preserve as much as possible the old only-ioprio user interface,
-+ * 0 is used as an escape ioprio value for weights (numerically) equal or
-+ * larger than IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF.
-+ */
-+static unsigned short bfq_weight_to_ioprio(int weight)
-+{
-+ BUG_ON(weight < BFQ_MIN_WEIGHT || weight > BFQ_MAX_WEIGHT);
-+ return IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight < 0 ?
-+ 0 : IOPRIO_BE_NR * BFQ_WEIGHT_CONVERSION_COEFF - weight;
-+}
-+
-+static void bfq_get_entity(struct bfq_entity *entity)
-+{
-+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+
-+ if (bfqq) {
-+ atomic_inc(&bfqq->ref);
-+ bfq_log_bfqq(bfqq->bfqd, bfqq, "get_entity: %p %d",
-+ bfqq, atomic_read(&bfqq->ref));
-+ }
-+}
-+
-+/**
-+ * bfq_find_deepest - find the deepest node that an extraction can modify.
-+ * @node: the node being removed.
-+ *
-+ * Do the first step of an extraction in an rb tree, looking for the
-+ * node that will replace @node, and returning the deepest node that
-+ * the following modifications to the tree can touch. If @node is the
-+ * last node in the tree return %NULL.
-+ */
-+static struct rb_node *bfq_find_deepest(struct rb_node *node)
-+{
-+ struct rb_node *deepest;
-+
-+ if (!node->rb_right && !node->rb_left)
-+ deepest = rb_parent(node);
-+ else if (!node->rb_right)
-+ deepest = node->rb_left;
-+ else if (!node->rb_left)
-+ deepest = node->rb_right;
-+ else {
-+ deepest = rb_next(node);
-+ if (deepest->rb_right)
-+ deepest = deepest->rb_right;
-+ else if (rb_parent(deepest) != node)
-+ deepest = rb_parent(deepest);
-+ }
-+
-+ return deepest;
-+}
-+
-+/**
-+ * bfq_active_extract - remove an entity from the active tree.
-+ * @st: the service_tree containing the tree.
-+ * @entity: the entity being removed.
-+ */
-+static void bfq_active_extract(struct bfq_service_tree *st,
-+ struct bfq_entity *entity)
-+{
-+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+ struct rb_node *node;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ struct bfq_sched_data *sd = NULL;
-+ struct bfq_group *bfqg = NULL;
-+ struct bfq_data *bfqd = NULL;
-+#endif
-+
-+ node = bfq_find_deepest(&entity->rb_node);
-+ bfq_extract(&st->active, entity);
-+
-+ if (node)
-+ bfq_update_active_tree(node);
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ sd = entity->sched_data;
-+ bfqg = container_of(sd, struct bfq_group, sched_data);
-+ BUG_ON(!bfqg);
-+ bfqd = (struct bfq_data *)bfqg->bfqd;
-+#endif
-+ if (bfqq)
-+ list_del(&bfqq->bfqq_list);
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ else { /* bfq_group */
-+ BUG_ON(!bfqd);
-+ bfq_weights_tree_remove(bfqd, entity,
-+ &bfqd->group_weights_tree);
-+ }
-+ if (bfqg != bfqd->root_group) {
-+ BUG_ON(!bfqg);
-+ BUG_ON(!bfqd);
-+ BUG_ON(!bfqg->active_entities);
-+ bfqg->active_entities--;
-+ if (bfqg->active_entities == 1) {
-+ BUG_ON(!bfqd->active_numerous_groups);
-+ bfqd->active_numerous_groups--;
-+ }
-+ }
-+#endif
-+}
-+
-+/**
-+ * bfq_idle_insert - insert an entity into the idle tree.
-+ * @st: the service tree containing the tree.
-+ * @entity: the entity to insert.
-+ */
-+static void bfq_idle_insert(struct bfq_service_tree *st,
-+ struct bfq_entity *entity)
-+{
-+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+ struct bfq_entity *first_idle = st->first_idle;
-+ struct bfq_entity *last_idle = st->last_idle;
-+
-+ if (!first_idle || bfq_gt(first_idle->finish, entity->finish))
-+ st->first_idle = entity;
-+ if (!last_idle || bfq_gt(entity->finish, last_idle->finish))
-+ st->last_idle = entity;
-+
-+ bfq_insert(&st->idle, entity);
-+
-+ if (bfqq)
-+ list_add(&bfqq->bfqq_list, &bfqq->bfqd->idle_list);
-+}
-+
-+/**
-+ * bfq_forget_entity - remove an entity from the wfq trees.
-+ * @st: the service tree.
-+ * @entity: the entity being removed.
-+ *
-+ * Update the device status and forget everything about @entity, putting
-+ * the device reference to it, if it is a queue. Entities belonging to
-+ * groups are not refcounted.
-+ */
-+static void bfq_forget_entity(struct bfq_service_tree *st,
-+ struct bfq_entity *entity)
-+{
-+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+ struct bfq_sched_data *sd;
-+
-+ BUG_ON(!entity->on_st);
-+
-+ entity->on_st = 0;
-+ st->wsum -= entity->weight;
-+ if (bfqq) {
-+ sd = entity->sched_data;
-+ bfq_log_bfqq(bfqq->bfqd, bfqq, "forget_entity: %p %d",
-+ bfqq, atomic_read(&bfqq->ref));
-+ bfq_put_queue(bfqq);
-+ }
-+}
-+
-+/**
-+ * bfq_put_idle_entity - release the idle tree ref of an entity.
-+ * @st: service tree for the entity.
-+ * @entity: the entity being released.
-+ */
-+static void bfq_put_idle_entity(struct bfq_service_tree *st,
-+ struct bfq_entity *entity)
-+{
-+ bfq_idle_extract(st, entity);
-+ bfq_forget_entity(st, entity);
-+}
-+
-+/**
-+ * bfq_forget_idle - update the idle tree if necessary.
-+ * @st: the service tree to act upon.
-+ *
-+ * To preserve the global O(log N) complexity we only remove one entry here;
-+ * as the idle tree will not grow indefinitely this can be done safely.
-+ */
-+static void bfq_forget_idle(struct bfq_service_tree *st)
-+{
-+ struct bfq_entity *first_idle = st->first_idle;
-+ struct bfq_entity *last_idle = st->last_idle;
-+
-+ if (RB_EMPTY_ROOT(&st->active) && last_idle &&
-+ !bfq_gt(last_idle->finish, st->vtime)) {
-+ /*
-+ * Forget the whole idle tree, increasing the vtime past
-+ * the last finish time of idle entities.
-+ */
-+ st->vtime = last_idle->finish;
-+ }
-+
-+ if (first_idle && !bfq_gt(first_idle->finish, st->vtime))
-+ bfq_put_idle_entity(st, first_idle);
-+}
-+
-+static struct bfq_service_tree *
-+__bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
-+ struct bfq_entity *entity)
-+{
-+ struct bfq_service_tree *new_st = old_st;
-+
-+ if (entity->prio_changed) {
-+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+ unsigned short prev_weight, new_weight;
-+ struct bfq_data *bfqd = NULL;
-+ struct rb_root *root;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ struct bfq_sched_data *sd;
-+ struct bfq_group *bfqg;
-+#endif
-+
-+ if (bfqq)
-+ bfqd = bfqq->bfqd;
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ else {
-+ sd = entity->my_sched_data;
-+ bfqg = container_of(sd, struct bfq_group, sched_data);
-+ BUG_ON(!bfqg);
-+ bfqd = (struct bfq_data *)bfqg->bfqd;
-+ BUG_ON(!bfqd);
-+ }
-+#endif
-+
-+ BUG_ON(old_st->wsum < entity->weight);
-+ old_st->wsum -= entity->weight;
-+
-+ if (entity->new_weight != entity->orig_weight) {
-+ if (entity->new_weight < BFQ_MIN_WEIGHT ||
-+ entity->new_weight > BFQ_MAX_WEIGHT) {
-+ printk(KERN_CRIT "update_weight_prio: "
-+ "new_weight %d\n",
-+ entity->new_weight);
-+ BUG();
-+ }
-+ entity->orig_weight = entity->new_weight;
-+ if (bfqq)
-+ bfqq->ioprio =
-+ bfq_weight_to_ioprio(entity->orig_weight);
-+ }
-+
-+ if (bfqq)
-+ bfqq->ioprio_class = bfqq->new_ioprio_class;
-+ entity->prio_changed = 0;
-+
-+ /*
-+ * NOTE: here we may be changing the weight too early,
-+ * this will cause unfairness. The correct approach
-+ * would have required additional complexity to defer
-+ * weight changes to the proper time instants (i.e.,
-+ * when entity->finish <= old_st->vtime).
-+ */
-+ new_st = bfq_entity_service_tree(entity);
-+
-+ prev_weight = entity->weight;
-+ new_weight = entity->orig_weight *
-+ (bfqq ? bfqq->wr_coeff : 1);
-+ /*
-+ * If the weight of the entity changes, remove the entity
-+ * from its old weight counter (if there is a counter
-+ * associated with the entity), and add it to the counter
-+ * associated with its new weight.
-+ */
-+ if (prev_weight != new_weight) {
-+ root = bfqq ? &bfqd->queue_weights_tree :
-+ &bfqd->group_weights_tree;
-+ bfq_weights_tree_remove(bfqd, entity, root);
-+ }
-+ entity->weight = new_weight;
-+ /*
-+ * Add the entity to its weights tree only if it is
-+ * not associated with a weight-raised queue.
-+ */
-+ if (prev_weight != new_weight &&
-+ (bfqq ? bfqq->wr_coeff == 1 : 1))
-+ /* If we get here, root has been initialized. */
-+ bfq_weights_tree_add(bfqd, entity, root);
-+
-+ new_st->wsum += entity->weight;
-+
-+ if (new_st != old_st)
-+ entity->start = new_st->vtime;
-+ }
-+
-+ return new_st;
-+}
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+static void bfqg_stats_set_start_empty_time(struct bfq_group *bfqg);
-+#endif
-+
-+/**
-+ * bfq_bfqq_served - update the scheduler status after selection for
-+ * service.
-+ * @bfqq: the queue being served.
-+ * @served: bytes to transfer.
-+ *
-+ * NOTE: this can be optimized, as the timestamps of upper level entities
-+ * are synchronized every time a new bfqq is selected for service. By now,
-+ * we keep it to better check consistency.
-+ */
-+static void bfq_bfqq_served(struct bfq_queue *bfqq, int served)
-+{
-+ struct bfq_entity *entity = &bfqq->entity;
-+ struct bfq_service_tree *st;
-+
-+ for_each_entity(entity) {
-+ st = bfq_entity_service_tree(entity);
-+
-+ entity->service += served;
-+ BUG_ON(entity->service > entity->budget);
-+ BUG_ON(st->wsum == 0);
-+
-+ st->vtime += bfq_delta(served, st->wsum);
-+ bfq_forget_idle(st);
-+ }
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqg_stats_set_start_empty_time(bfqq_group(bfqq));
-+#endif
-+ bfq_log_bfqq(bfqq->bfqd, bfqq, "bfqq_served %d secs", served);
-+}
-+
-+/**
-+ * bfq_bfqq_charge_full_budget - set the service to the entity budget.
-+ * @bfqq: the queue that needs a service update.
-+ *
-+ * When it's not possible to be fair in the service domain, because
-+ * a queue is not consuming its budget fast enough (the meaning of
-+ * fast depends on the timeout parameter), we charge it a full
-+ * budget. In this way we should obtain a sort of time-domain
-+ * fairness among all the seeky/slow queues.
-+ */
-+static void bfq_bfqq_charge_full_budget(struct bfq_queue *bfqq)
-+{
-+ struct bfq_entity *entity = &bfqq->entity;
-+
-+ bfq_log_bfqq(bfqq->bfqd, bfqq, "charge_full_budget");
-+
-+ bfq_bfqq_served(bfqq, entity->budget - entity->service);
-+}
-+
-+/**
-+ * __bfq_activate_entity - activate an entity.
-+ * @entity: the entity being activated.
-+ *
-+ * Called whenever an entity is activated, i.e., it is not active and one
-+ * of its children receives a new request, or has to be reactivated due to
-+ * budget exhaustion. It uses the current budget of the entity (and the
-+ * service received if @entity is active) of the queue to calculate its
-+ * timestamps.
-+ */
-+static void __bfq_activate_entity(struct bfq_entity *entity)
-+{
-+ struct bfq_sched_data *sd = entity->sched_data;
-+ struct bfq_service_tree *st = bfq_entity_service_tree(entity);
-+
-+ if (entity == sd->in_service_entity) {
-+ BUG_ON(entity->tree);
-+ /*
-+ * If we are requeueing the current entity, we must
-+ * take care not to charge it for service it has
-+ * not received.
-+ */
-+ bfq_calc_finish(entity, entity->service);
-+ entity->start = entity->finish;
-+ sd->in_service_entity = NULL;
-+ } else if (entity->tree == &st->active) {
-+ /*
-+ * Requeueing an entity due to a change of some
-+ * next_in_service entity below it. We reuse the
-+ * old start time.
-+ */
-+ bfq_active_extract(st, entity);
-+ } else if (entity->tree == &st->idle) {
-+ /*
-+ * Must be on the idle tree, bfq_idle_extract() will
-+ * check for that.
-+ */
-+ bfq_idle_extract(st, entity);
-+ entity->start = bfq_gt(st->vtime, entity->finish) ?
-+ st->vtime : entity->finish;
-+ } else {
-+ /*
-+ * The finish time of the entity may be invalid, and
-+ * it is in the past for sure, otherwise the queue
-+ * would have been on the idle tree.
-+ */
-+ entity->start = st->vtime;
-+ st->wsum += entity->weight;
-+ bfq_get_entity(entity);
-+
-+ BUG_ON(entity->on_st);
-+ entity->on_st = 1;
-+ }
-+
-+ st = __bfq_entity_update_weight_prio(st, entity);
-+ bfq_calc_finish(entity, entity->budget);
-+ bfq_active_insert(st, entity);
-+}
-+
-+/**
-+ * bfq_activate_entity - activate an entity and its ancestors if necessary.
-+ * @entity: the entity to activate.
-+ *
-+ * Activate @entity and all the entities on the path from it to the root.
-+ */
-+static void bfq_activate_entity(struct bfq_entity *entity)
-+{
-+ struct bfq_sched_data *sd;
-+
-+ for_each_entity(entity) {
-+ __bfq_activate_entity(entity);
-+
-+ sd = entity->sched_data;
-+ if (!bfq_update_next_in_service(sd))
-+ /*
-+ * No need to propagate the activation to the
-+ * upper entities, as they will be updated when
-+ * the in-service entity is rescheduled.
-+ */
-+ break;
-+ }
-+}
-+
-+/**
-+ * __bfq_deactivate_entity - deactivate an entity from its service tree.
-+ * @entity: the entity to deactivate.
-+ * @requeue: if false, the entity will not be put into the idle tree.
-+ *
-+ * Deactivate an entity, regardless of its previous state. If the
-+ * entity was not on a service tree just return; otherwise, if it is on
-+ * any scheduler tree, extract it from that tree and, if @requeue is set
-+ * and its finish time is still ahead of vtime, put it on the idle tree.
-+ *
-+ * Return %1 if the caller should update the entity hierarchy, i.e.,
-+ * if the entity was in service or if it was the next_in_service for
-+ * its sched_data; return %0 otherwise.
-+ */
-+static int __bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
-+{
-+ struct bfq_sched_data *sd = entity->sched_data;
-+ struct bfq_service_tree *st = bfq_entity_service_tree(entity);
-+ int was_in_service = entity == sd->in_service_entity;
-+ int ret = 0;
-+
-+ if (!entity->on_st)
-+ return 0;
-+
-+ BUG_ON(was_in_service && entity->tree);
-+
-+ if (was_in_service) {
-+ bfq_calc_finish(entity, entity->service);
-+ sd->in_service_entity = NULL;
-+ } else if (entity->tree == &st->active)
-+ bfq_active_extract(st, entity);
-+ else if (entity->tree == &st->idle)
-+ bfq_idle_extract(st, entity);
-+ else if (entity->tree)
-+ BUG();
-+
-+ if (was_in_service || sd->next_in_service == entity)
-+ ret = bfq_update_next_in_service(sd);
-+
-+ if (!requeue || !bfq_gt(entity->finish, st->vtime))
-+ bfq_forget_entity(st, entity);
-+ else
-+ bfq_idle_insert(st, entity);
-+
-+ BUG_ON(sd->in_service_entity == entity);
-+ BUG_ON(sd->next_in_service == entity);
-+
-+ return ret;
-+}
-+
-+/**
-+ * bfq_deactivate_entity - deactivate an entity.
-+ * @entity: the entity to deactivate.
-+ * @requeue: true if the entity can be put on the idle tree
-+ */
-+static void bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
-+{
-+ struct bfq_sched_data *sd;
-+ struct bfq_entity *parent;
-+
-+ for_each_entity_safe(entity, parent) {
-+ sd = entity->sched_data;
-+
-+ if (!__bfq_deactivate_entity(entity, requeue))
-+ /*
-+ * The parent entity is still backlogged, and
-+ * we don't need to update it as it is still
-+ * in service.
-+ */
-+ break;
-+
-+ if (sd->next_in_service)
-+ /*
-+ * The parent entity is still backlogged and
-+ * the budgets on the path towards the root
-+ * need to be updated.
-+ */
-+ goto update;
-+
-+ /*
-+ * If we reach here, the parent is no longer backlogged and
-+ * we want to propagate the dequeue upwards.
-+ */
-+ requeue = 1;
-+ }
-+
-+ return;
-+
-+update:
-+ entity = parent;
-+ for_each_entity(entity) {
-+ __bfq_activate_entity(entity);
-+
-+ sd = entity->sched_data;
-+ if (!bfq_update_next_in_service(sd))
-+ break;
-+ }
-+}
-+
-+/**
-+ * bfq_update_vtime - update vtime if necessary.
-+ * @st: the service tree to act upon.
-+ *
-+ * If necessary update the service tree vtime to have at least one
-+ * eligible entity, skipping to its start time. Assumes that the
-+ * active tree of the device is not empty.
-+ *
-+ * NOTE: this hierarchical implementation updates vtimes quite often,
-+ * we may end up with reactivated processes getting timestamps after a
-+ * vtime skip done because we needed a ->first_active entity on some
-+ * intermediate node.
-+ */
-+static void bfq_update_vtime(struct bfq_service_tree *st)
-+{
-+ struct bfq_entity *entry;
-+ struct rb_node *node = st->active.rb_node;
-+
-+ entry = rb_entry(node, struct bfq_entity, rb_node);
-+ if (bfq_gt(entry->min_start, st->vtime)) {
-+ st->vtime = entry->min_start;
-+ bfq_forget_idle(st);
-+ }
-+}
-+
-+/**
-+ * bfq_first_active_entity - find the eligible entity with
-+ * the smallest finish time
-+ * @st: the service tree to select from.
-+ *
-+ * This function searches for the first schedulable entity, starting from
-+ * the root of the tree and going left every time the left side has
-+ * a subtree with at least one eligible (start <= vtime) entity. The path on
-+ * the right is followed only if a) the left subtree contains no eligible
-+ * entities and b) no eligible entity has been found yet.
-+ */
-+static struct bfq_entity *bfq_first_active_entity(struct bfq_service_tree *st)
-+{
-+ struct bfq_entity *entry, *first = NULL;
-+ struct rb_node *node = st->active.rb_node;
-+
-+ while (node) {
-+ entry = rb_entry(node, struct bfq_entity, rb_node);
-+left:
-+ if (!bfq_gt(entry->start, st->vtime))
-+ first = entry;
-+
-+ BUG_ON(bfq_gt(entry->min_start, st->vtime));
-+
-+ if (node->rb_left) {
-+ entry = rb_entry(node->rb_left,
-+ struct bfq_entity, rb_node);
-+ if (!bfq_gt(entry->min_start, st->vtime)) {
-+ node = node->rb_left;
-+ goto left;
-+ }
-+ }
-+ if (first)
-+ break;
-+ node = node->rb_right;
-+ }
-+
-+ BUG_ON(!first && !RB_EMPTY_ROOT(&st->active));
-+ return first;
-+}
-+
-+/**
-+ * __bfq_lookup_next_entity - return the first eligible entity in @st.
-+ * @st: the service tree.
-+ *
-+ * Update the virtual time in @st and return the first eligible entity
-+ * it contains.
-+ */
-+static struct bfq_entity *__bfq_lookup_next_entity(struct bfq_service_tree *st,
-+ bool force)
-+{
-+ struct bfq_entity *entity, *new_next_in_service = NULL;
-+
-+ if (RB_EMPTY_ROOT(&st->active))
-+ return NULL;
-+
-+ bfq_update_vtime(st);
-+ entity = bfq_first_active_entity(st);
-+ BUG_ON(bfq_gt(entity->start, st->vtime));
-+
-+ /*
-+ * If the chosen entity does not match the sched_data's
-+ * next_in_service and we are forcibly serving the IDLE priority
-+ * class tree, bubble up budget update.
-+ */
-+ if (unlikely(force && entity != entity->sched_data->next_in_service)) {
-+ new_next_in_service = entity;
-+ for_each_entity(new_next_in_service)
-+ bfq_update_budget(new_next_in_service);
-+ }
-+
-+ return entity;
-+}
-+
-+/**
-+ * bfq_lookup_next_entity - return the first eligible entity in @sd.
-+ * @sd: the sched_data.
-+ * @extract: if true the returned entity will be also extracted from @sd.
-+ *
-+ * NOTE: since we cache the next_in_service entity at each level of the
-+ * hierarchy, the complexity of the lookup can be decreased with
-+ * absolutely no effort just returning the cached next_in_service value;
-+ * we prefer to do full lookups to test the consistency of the data
-+ * structures.
-+ */
-+static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
-+ int extract,
-+ struct bfq_data *bfqd)
-+{
-+ struct bfq_service_tree *st = sd->service_tree;
-+ struct bfq_entity *entity;
-+ int i = 0;
-+
-+ BUG_ON(sd->in_service_entity);
-+
-+ if (bfqd &&
-+ jiffies - bfqd->bfq_class_idle_last_service > BFQ_CL_IDLE_TIMEOUT) {
-+ entity = __bfq_lookup_next_entity(st + BFQ_IOPRIO_CLASSES - 1,
-+ true);
-+ if (entity) {
-+ i = BFQ_IOPRIO_CLASSES - 1;
-+ bfqd->bfq_class_idle_last_service = jiffies;
-+ sd->next_in_service = entity;
-+ }
-+ }
-+ for (; i < BFQ_IOPRIO_CLASSES; i++) {
-+ entity = __bfq_lookup_next_entity(st + i, false);
-+ if (entity) {
-+ if (extract) {
-+ bfq_check_next_in_service(sd, entity);
-+ bfq_active_extract(st + i, entity);
-+ sd->in_service_entity = entity;
-+ sd->next_in_service = NULL;
-+ }
-+ break;
-+ }
-+ }
-+
-+ return entity;
-+}
-+
-+/*
-+ * Get next queue for service.
-+ */
-+static struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
-+{
-+ struct bfq_entity *entity = NULL;
-+ struct bfq_sched_data *sd;
-+ struct bfq_queue *bfqq;
-+
-+ BUG_ON(bfqd->in_service_queue);
-+
-+ if (bfqd->busy_queues == 0)
-+ return NULL;
-+
-+ sd = &bfqd->root_group->sched_data;
-+ for (; sd ; sd = entity->my_sched_data) {
-+ entity = bfq_lookup_next_entity(sd, 1, bfqd);
-+ BUG_ON(!entity);
-+ entity->service = 0;
-+ }
-+
-+ bfqq = bfq_entity_to_bfqq(entity);
-+ BUG_ON(!bfqq);
-+
-+ return bfqq;
-+}
-+
-+static void __bfq_bfqd_reset_in_service(struct bfq_data *bfqd)
-+{
-+ if (bfqd->in_service_bic) {
-+ put_io_context(bfqd->in_service_bic->icq.ioc);
-+ bfqd->in_service_bic = NULL;
-+ }
-+
-+ bfqd->in_service_queue = NULL;
-+ del_timer(&bfqd->idle_slice_timer);
-+}
-+
-+static void bfq_deactivate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+ int requeue)
-+{
-+ struct bfq_entity *entity = &bfqq->entity;
-+
-+ if (bfqq == bfqd->in_service_queue)
-+ __bfq_bfqd_reset_in_service(bfqd);
-+
-+ bfq_deactivate_entity(entity, requeue);
-+}
-+
-+static void bfq_activate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+ struct bfq_entity *entity = &bfqq->entity;
-+
-+ bfq_activate_entity(entity);
-+}
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+static void bfqg_stats_update_dequeue(struct bfq_group *bfqg);
-+#endif
-+
-+/*
-+ * Called when the bfqq no longer has requests pending; removes it from
-+ * the service tree.
-+ */
-+static void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+ int requeue)
-+{
-+ BUG_ON(!bfq_bfqq_busy(bfqq));
-+ BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
-+
-+ bfq_log_bfqq(bfqd, bfqq, "del from busy");
-+
-+ bfq_clear_bfqq_busy(bfqq);
-+
-+ BUG_ON(bfqd->busy_queues == 0);
-+ bfqd->busy_queues--;
-+
-+ if (!bfqq->dispatched) {
-+ bfq_weights_tree_remove(bfqd, &bfqq->entity,
-+ &bfqd->queue_weights_tree);
-+ if (!blk_queue_nonrot(bfqd->queue)) {
-+ BUG_ON(!bfqd->busy_in_flight_queues);
-+ bfqd->busy_in_flight_queues--;
-+ if (bfq_bfqq_constantly_seeky(bfqq)) {
-+ BUG_ON(!bfqd->
-+ const_seeky_busy_in_flight_queues);
-+ bfqd->const_seeky_busy_in_flight_queues--;
-+ }
-+ }
-+ }
-+ if (bfqq->wr_coeff > 1)
-+ bfqd->wr_busy_queues--;
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ bfqg_stats_update_dequeue(bfqq_group(bfqq));
-+#endif
-+
-+ bfq_deactivate_bfqq(bfqd, bfqq, requeue);
-+}
-+
-+/*
-+ * Called when an inactive queue receives a new request.
-+ */
-+static void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+ BUG_ON(bfq_bfqq_busy(bfqq));
-+ BUG_ON(bfqq == bfqd->in_service_queue);
-+
-+ bfq_log_bfqq(bfqd, bfqq, "add to busy");
-+
-+ bfq_activate_bfqq(bfqd, bfqq);
-+
-+ bfq_mark_bfqq_busy(bfqq);
-+ bfqd->busy_queues++;
-+
-+ if (!bfqq->dispatched) {
-+ if (bfqq->wr_coeff == 1)
-+ bfq_weights_tree_add(bfqd, &bfqq->entity,
-+ &bfqd->queue_weights_tree);
-+ if (!blk_queue_nonrot(bfqd->queue)) {
-+ bfqd->busy_in_flight_queues++;
-+ if (bfq_bfqq_constantly_seeky(bfqq))
-+ bfqd->const_seeky_busy_in_flight_queues++;
-+ }
-+ }
-+ if (bfqq->wr_coeff > 1)
-+ bfqd->wr_busy_queues++;
-+}
-diff --git a/block/bfq.h b/block/bfq.h
-new file mode 100644
-index 0000000..ca5ac20
---- /dev/null
-+++ b/block/bfq.h
-@@ -0,0 +1,807 @@
-+/*
-+ * BFQ-v7r9 for 4.2.0: data structures and common functions prototypes.
-+ *
-+ * Based on ideas and code from CFQ:
-+ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
-+ *
-+ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
-+ * Paolo Valente <paolo.valente@unimore.it>
-+ *
-+ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
-+ */
-+
-+#ifndef _BFQ_H
-+#define _BFQ_H
-+
-+#include <linux/blktrace_api.h>
-+#include <linux/hrtimer.h>
-+#include <linux/ioprio.h>
-+#include <linux/rbtree.h>
-+#include <linux/blk-cgroup.h>
-+
-+#define BFQ_IOPRIO_CLASSES 3
-+#define BFQ_CL_IDLE_TIMEOUT (HZ/5)
-+
-+#define BFQ_MIN_WEIGHT 1
-+#define BFQ_MAX_WEIGHT 1000
-+#define BFQ_WEIGHT_CONVERSION_COEFF 10
-+
-+#define BFQ_DEFAULT_QUEUE_IOPRIO 4
-+
-+#define BFQ_DEFAULT_GRP_WEIGHT 10
-+#define BFQ_DEFAULT_GRP_IOPRIO 0
-+#define BFQ_DEFAULT_GRP_CLASS IOPRIO_CLASS_BE
-+
-+struct bfq_entity;
-+
-+/**
-+ * struct bfq_service_tree - per ioprio_class service tree.
-+ * @active: tree for active entities (i.e., those backlogged).
-+ * @idle: tree for idle entities (i.e., those not backlogged, with V <= F_i).
-+ * @first_idle: idle entity with minimum F_i.
-+ * @last_idle: idle entity with maximum F_i.
-+ * @vtime: scheduler virtual time.
-+ * @wsum: scheduler weight sum; active and idle entities contribute to it.
-+ *
-+ * Each service tree represents a B-WF2Q+ scheduler on its own. Each
-+ * ioprio_class has its own independent scheduler, and so its own
-+ * bfq_service_tree. All the fields are protected by the queue lock
-+ * of the containing bfqd.
-+ */
-+struct bfq_service_tree {
-+ struct rb_root active;
-+ struct rb_root idle;
-+
-+ struct bfq_entity *first_idle;
-+ struct bfq_entity *last_idle;
-+
-+ u64 vtime;
-+ unsigned long wsum;
-+};
-+
-+/**
-+ * struct bfq_sched_data - multi-class scheduler.
-+ * @in_service_entity: entity in service.
-+ * @next_in_service: head-of-the-line entity in the scheduler.
-+ * @service_tree: array of service trees, one per ioprio_class.
-+ *
-+ * bfq_sched_data is the basic scheduler queue. It supports three
-+ * ioprio_classes, and can be used either as a toplevel queue or as
-+ * an intermediate queue on a hierarchical setup.
-+ * @next_in_service points to the active entity of the sched_data
-+ * service trees that will be scheduled next.
-+ *
-+ * The supported ioprio_classes are the same as in CFQ, in descending
-+ * priority order, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE.
-+ * Requests from higher priority queues are served before all the
-+ * requests from lower priority queues; among requests of the same
-+ * queue requests are served according to B-WF2Q+.
-+ * All the fields are protected by the queue lock of the containing bfqd.
-+ */
-+struct bfq_sched_data {
-+ struct bfq_entity *in_service_entity;
-+ struct bfq_entity *next_in_service;
-+ struct bfq_service_tree service_tree[BFQ_IOPRIO_CLASSES];
-+};
-+
-+/**
-+ * struct bfq_weight_counter - counter of the number of all active entities
-+ * with a given weight.
-+ * @weight: weight of the entities that this counter refers to.
-+ * @num_active: number of active entities with this weight.
-+ * @weights_node: weights tree member (see bfq_data's @queue_weights_tree
-+ * and @group_weights_tree).
-+ */
-+struct bfq_weight_counter {
-+ short int weight;
-+ unsigned int num_active;
-+ struct rb_node weights_node;
-+};
-+
-+/**
-+ * struct bfq_entity - schedulable entity.
-+ * @rb_node: service_tree member.
-+ * @weight_counter: pointer to the weight counter associated with this entity.
-+ * @on_st: flag, true if the entity is on a tree (either the active or
-+ * the idle one of its service_tree).
-+ * @finish: B-WF2Q+ finish timestamp (aka F_i).
-+ * @start: B-WF2Q+ start timestamp (aka S_i).
-+ * @tree: tree the entity is enqueued into; %NULL if not on a tree.
-+ * @min_start: minimum start time of the (active) subtree rooted at
-+ * this entity; used for O(log N) lookups into active trees.
-+ * @service: service received during the last round of service.
-+ * @budget: budget used to calculate F_i; F_i = S_i + @budget / @weight.
-+ * @weight: weight of the queue
-+ * @parent: parent entity, for hierarchical scheduling.
-+ * @my_sched_data: for non-leaf nodes in the cgroup hierarchy, the
-+ * associated scheduler queue, %NULL on leaf nodes.
-+ * @sched_data: the scheduler queue this entity belongs to.
-+ * @ioprio: the ioprio in use.
-+ * @new_weight: when a weight change is requested, the new weight value.
-+ * @orig_weight: original weight, used to implement weight boosting
-+ * @prio_changed: flag, true when the user requested a weight, ioprio or
-+ * ioprio_class change.
-+ *
-+ * A bfq_entity is used to represent either a bfq_queue (leaf node in the
-+ * cgroup hierarchy) or a bfq_group into the upper level scheduler. Each
-+ * entity belongs to the sched_data of the parent group in the cgroup
-+ * hierarchy. Non-leaf entities have also their own sched_data, stored
-+ * in @my_sched_data.
-+ *
-+ * Each entity independently stores its priority values; this would
-+ * allow different weights on different devices, but this
-+ * functionality is not exported to userspace for now. Priorities and
-+ * weights are updated lazily, first storing the new values into the
-+ * new_* fields, then setting the @prio_changed flag. As soon as
-+ * there is a transition in the entity state that allows the priority
-+ * update to take place the effective and the requested priority
-+ * values are synchronized.
-+ *
-+ * Unless cgroups are used, the weight value is calculated from the
-+ * ioprio to export the same interface as CFQ. When dealing with
-+ * ``well-behaved'' queues (i.e., queues that do not spend too much
-+ * time to consume their budget and have true sequential behavior, and
-+ * when there are no external factors breaking anticipation) the
-+ * relative weights at each level of the cgroups hierarchy should be
-+ * guaranteed. All the fields are protected by the queue lock of the
-+ * containing bfqd.
-+ */
-+struct bfq_entity {
-+ struct rb_node rb_node;
-+ struct bfq_weight_counter *weight_counter;
-+
-+ int on_st;
-+
-+ u64 finish;
-+ u64 start;
-+
-+ struct rb_root *tree;
-+
-+ u64 min_start;
-+
-+ int service, budget;
-+ unsigned short weight, new_weight;
-+ unsigned short orig_weight;
-+
-+ struct bfq_entity *parent;
-+
-+ struct bfq_sched_data *my_sched_data;
-+ struct bfq_sched_data *sched_data;
-+
-+ int prio_changed;
-+};
-+
-+struct bfq_group;
-+
-+/**
-+ * struct bfq_queue - leaf schedulable entity.
-+ * @ref: reference counter.
-+ * @bfqd: parent bfq_data.
-+ * @new_ioprio: when an ioprio change is requested, the new ioprio value.
-+ * @ioprio_class: the ioprio_class in use.
-+ * @new_ioprio_class: when an ioprio_class change is requested, the new
-+ * ioprio_class value.
-+ * @new_bfqq: shared bfq_queue if queue is cooperating with
-+ * one or more other queues.
-+ * @sort_list: sorted list of pending requests.
-+ * @next_rq: if fifo isn't expired, next request to serve.
-+ * @queued: nr of requests queued in @sort_list.
-+ * @allocated: currently allocated requests.
-+ * @meta_pending: pending metadata requests.
-+ * @fifo: fifo list of requests in sort_list.
-+ * @entity: entity representing this queue in the scheduler.
-+ * @max_budget: maximum budget allowed from the feedback mechanism.
-+ * @budget_timeout: budget expiration (in jiffies).
-+ * @dispatched: number of requests on the dispatch list or inside driver.
-+ * @flags: status flags.
-+ * @bfqq_list: node for active/idle bfqq list inside our bfqd.
-+ * @burst_list_node: node for the device's burst list.
-+ * @seek_samples: number of seeks sampled
-+ * @seek_total: sum of the distances of the seeks sampled
-+ * @seek_mean: mean seek distance
-+ * @last_request_pos: position of the last request enqueued
-+ * @requests_within_timer: number of consecutive pairs of request completion
-+ * and arrival, such that the queue becomes idle
-+ * after the completion, but the next request arrives
-+ * within an idle time slice; used only if the queue's
-+ * IO_bound has been cleared.
-+ * @pid: pid of the process owning the queue, used for logging purposes.
-+ * @last_wr_start_finish: start time of the current weight-raising period if
-+ * the @bfq_queue is being weight-raised, otherwise
-+ * finish time of the last weight-raising period
-+ * @wr_cur_max_time: current max raising time for this queue
-+ * @soft_rt_next_start: minimum time instant such that, only if a new
-+ * request is enqueued after this time instant in an
-+ * idle @bfq_queue with no outstanding requests, then
-+ * the task associated with the queue is deemed as
-+ * soft real-time (see the comments to the function
-+ * bfq_bfqq_softrt_next_start())
-+ * @last_idle_bklogged: time of the last transition of the @bfq_queue from
-+ * idle to backlogged
-+ * @service_from_backlogged: cumulative service received from the @bfq_queue
-+ * since the last transition from idle to
-+ * backlogged
-+ * @bic: pointer to the bfq_io_cq owning the bfq_queue, set to %NULL if the
-+ * queue is shared
-+ *
-+ * A bfq_queue is a leaf request queue; it is associated with one
-+ * io_context, or with several if it is async or shared between cooperating
-+ * processes. @cgroup holds a reference to the cgroup, to be sure that it
-+ * does not disappear while a bfqq still references it (mostly to avoid
-+ * races between request issuing and task migration followed by cgroup
-+ * destruction).
-+ * All the fields are protected by the queue lock of the containing bfqd.
-+ */
-+struct bfq_queue {
-+ atomic_t ref;
-+ struct bfq_data *bfqd;
-+
-+ unsigned short ioprio, new_ioprio;
-+ unsigned short ioprio_class, new_ioprio_class;
-+
-+ /* fields for cooperating queues handling */
-+ struct bfq_queue *new_bfqq;
-+ struct rb_node pos_node;
-+ struct rb_root *pos_root;
-+
-+ struct rb_root sort_list;
-+ struct request *next_rq;
-+ int queued[2];
-+ int allocated[2];
-+ int meta_pending;
-+ struct list_head fifo;
-+
-+ struct bfq_entity entity;
-+
-+ int max_budget;
-+ unsigned long budget_timeout;
-+
-+ int dispatched;
-+
-+ unsigned int flags;
-+
-+ struct list_head bfqq_list;
-+
-+ struct hlist_node burst_list_node;
-+
-+ unsigned int seek_samples;
-+ u64 seek_total;
-+ sector_t seek_mean;
-+ sector_t last_request_pos;
-+
-+ unsigned int requests_within_timer;
-+
-+ pid_t pid;
-+ struct bfq_io_cq *bic;
-+
-+ /* weight-raising fields */
-+ unsigned long wr_cur_max_time;
-+ unsigned long soft_rt_next_start;
-+ unsigned long last_wr_start_finish;
-+ unsigned int wr_coeff;
-+ unsigned long last_idle_bklogged;
-+ unsigned long service_from_backlogged;
-+};
-+
-+/**
-+ * struct bfq_ttime - per process thinktime stats.
-+ * @ttime_total: total process thinktime
-+ * @ttime_samples: number of thinktime samples
-+ * @ttime_mean: average process thinktime
-+ */
-+struct bfq_ttime {
-+ unsigned long last_end_request;
-+
-+ unsigned long ttime_total;
-+ unsigned long ttime_samples;
-+ unsigned long ttime_mean;
-+};
-+
-+/**
-+ * struct bfq_io_cq - per (request_queue, io_context) structure.
-+ * @icq: associated io_cq structure
-+ * @bfqq: array of two process queues, the sync and the async
-+ * @ttime: associated @bfq_ttime struct
-+ * @ioprio: per (request_queue, blkcg) ioprio.
-+ * @blkcg_id: id of the blkcg the related io_cq belongs to.
-+ */
-+struct bfq_io_cq {
-+ struct io_cq icq; /* must be the first member */
-+ struct bfq_queue *bfqq[2];
-+ struct bfq_ttime ttime;
-+ int ioprio;
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ uint64_t blkcg_id; /* the current blkcg ID */
-+#endif
-+};
-+
-+enum bfq_device_speed {
-+ BFQ_BFQD_FAST,
-+ BFQ_BFQD_SLOW,
-+};
-+
-+/**
-+ * struct bfq_data - per device data structure.
-+ * @queue: request queue for the managed device.
-+ * @root_group: root bfq_group for the device.
-+ * @active_numerous_groups: number of bfq_groups containing more than one
-+ * active @bfq_entity.
-+ * @queue_weights_tree: rbtree of weight counters of @bfq_queues, sorted by
-+ * weight. Used to keep track of whether all @bfq_queues
-+ * have the same weight. The tree contains one counter
-+ * for each distinct weight associated to some active
-+ * and not weight-raised @bfq_queue (see the comments to
-+ * the functions bfq_weights_tree_[add|remove] for
-+ * further details).
-+ * @group_weights_tree: rbtree of non-queue @bfq_entity weight counters, sorted
-+ * by weight. Used to keep track of whether all
-+ * @bfq_groups have the same weight. The tree contains
-+ * one counter for each distinct weight associated to
-+ * some active @bfq_group (see the comments to the
-+ * functions bfq_weights_tree_[add|remove] for further
-+ * details).
-+ * @busy_queues: number of bfq_queues containing requests (including the
-+ * queue in service, even if it is idling).
-+ * @busy_in_flight_queues: number of @bfq_queues containing pending or
-+ * in-flight requests, plus the @bfq_queue in
-+ * service, even if idle but waiting for the
-+ * possible arrival of its next sync request. This
-+ * field is updated only if the device is rotational,
-+ * but used only if the device is also NCQ-capable.
-+ * The reason why the field is updated also for non-
-+ * NCQ-capable rotational devices is related to the
-+ * fact that the value of @hw_tag may be set also
-+ * later than when busy_in_flight_queues may need to
-+ * be incremented for the first time(s). Taking also
-+ * this possibility into account, to avoid unbalanced
-+ * increments/decrements, would imply more overhead
-+ * than just updating busy_in_flight_queues
-+ * regardless of the value of @hw_tag.
-+ * @const_seeky_busy_in_flight_queues: number of constantly-seeky @bfq_queues
-+ * (that is, seeky queues that expired
-+ * for budget timeout at least once)
-+ * containing pending or in-flight
-+ * requests, including the in-service
-+ * @bfq_queue if constantly seeky. This
-+ * field is updated only if the device
-+ * is rotational, but used only if the
-+ * device is also NCQ-capable (see the
-+ * comments to @busy_in_flight_queues).
-+ * @wr_busy_queues: number of weight-raised busy @bfq_queues.
-+ * @queued: number of queued requests.
-+ * @rq_in_driver: number of requests dispatched and waiting for completion.
-+ * @sync_flight: number of sync requests in the driver.
-+ * @max_rq_in_driver: max number of reqs in driver in the last
-+ * @hw_tag_samples completed requests.
-+ * @hw_tag_samples: nr of samples used to calculate hw_tag.
-+ * @hw_tag: flag set to one if the driver is showing a queueing behavior.
-+ * @budgets_assigned: number of budgets assigned.
-+ * @idle_slice_timer: timer set when idling for the next sequential request
-+ * from the queue in service.
-+ * @unplug_work: delayed work to restart dispatching on the request queue.
-+ * @in_service_queue: bfq_queue in service.
-+ * @in_service_bic: bfq_io_cq (bic) associated with the @in_service_queue.
-+ * @last_position: on-disk position of the last served request.
-+ * @last_budget_start: beginning of the last budget.
-+ * @last_idling_start: beginning of the last idle slice.
-+ * @peak_rate: peak transfer rate observed for a budget.
-+ * @peak_rate_samples: number of samples used to calculate @peak_rate.
-+ * @bfq_max_budget: maximum budget allotted to a bfq_queue before
-+ * rescheduling.
-+ * @group_list: list of all the bfq_groups active on the device.
-+ * @active_list: list of all the bfq_queues active on the device.
-+ * @idle_list: list of all the bfq_queues idle on the device.
-+ * @bfq_fifo_expire: timeout for async/sync requests; when it expires
-+ * requests are served in fifo order.
-+ * @bfq_back_penalty: weight of backward seeks wrt forward ones.
-+ * @bfq_back_max: maximum allowed backward seek.
-+ * @bfq_slice_idle: maximum idling time.
-+ * @bfq_user_max_budget: user-configured max budget value
-+ * (0 for auto-tuning).
-+ * @bfq_max_budget_async_rq: maximum budget (in nr of requests) allotted to
-+ * async queues.
-+ * @bfq_timeout: timeout for bfq_queues to consume their budget; used to
-+ * prevent seeky queues from imposing long latencies on
-+ * well-behaved ones (this also implies that seeky queues cannot
-+ * receive guarantees in the service domain; after a timeout
-+ * they are charged for the whole allocated budget, to try
-+ * to preserve a behavior reasonably fair among them, but
-+ * without service-domain guarantees).
-+ * @bfq_coop_thresh: number of queue merges after which a @bfq_queue is
-+ * no longer granted any weight-raising.
-+ * @bfq_failed_cooperations: number of consecutive failed cooperation
-+ * chances after which weight-raising is restored
-+ * to a queue subject to more than bfq_coop_thresh
-+ * queue merges.
-+ * @bfq_requests_within_timer: number of consecutive requests that must be
-+ * issued within the idle time slice to
-+ * re-enable idling for a queue which was marked as
-+ * non-I/O-bound (see the definition of the
-+ * IO_bound flag for further details).
-+ * @last_ins_in_burst: last time at which a queue entered the current
-+ * burst of queues being activated shortly after
-+ * each other; for more details about this and the
-+ * following parameters related to a burst of
-+ * activations, see the comments to the function
-+ * @bfq_handle_burst.
-+ * @bfq_burst_interval: reference time interval used to decide whether a
-+ * queue has been activated shortly after
-+ * @last_ins_in_burst.
-+ * @burst_size: number of queues in the current burst of queue activations.
-+ * @bfq_large_burst_thresh: maximum burst size above which the current
-+ * queue-activation burst is deemed as 'large'.
-+ * @large_burst: true if a large queue-activation burst is in progress.
-+ * @burst_list: head of the burst list (as for the above fields, more details
-+ * in the comments to the function bfq_handle_burst).
-+ * @low_latency: if set to true, low-latency heuristics are enabled.
-+ * @bfq_wr_coeff: maximum factor by which the weight of a weight-raised
-+ * queue is multiplied.
-+ * @bfq_wr_max_time: maximum duration of a weight-raising period (jiffies).
-+ * @bfq_wr_rt_max_time: maximum duration for soft real-time processes.
-+ * @bfq_wr_min_idle_time: minimum idle period after which weight-raising
-+ * may be reactivated for a queue (in jiffies).
-+ * @bfq_wr_min_inter_arr_async: minimum period between request arrivals
-+ * after which weight-raising may be
-+ * reactivated for an already busy queue
-+ * (in jiffies).
-+ * @bfq_wr_max_softrt_rate: max service-rate for a soft real-time queue,
-+ * sectors per second.
-+ * @RT_prod: cached value of the product R*T used for computing the maximum
-+ * duration of the weight raising automatically.
-+ * @device_speed: device-speed class for the low-latency heuristic.
-+ * @oom_bfqq: fallback dummy bfqq for extreme OOM conditions.
-+ *
-+ * All the fields are protected by the @queue lock.
-+ */
-+struct bfq_data {
-+ struct request_queue *queue;
-+
-+ struct bfq_group *root_group;
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+ int active_numerous_groups;
-+#endif
-+
-+ struct rb_root queue_weights_tree;
-+ struct rb_root group_weights_tree;
-+
-+ int busy_queues;
-+ int busy_in_flight_queues;
-+ int const_seeky_busy_in_flight_queues;
-+ int wr_busy_queues;
-+ int queued;
-+ int rq_in_driver;
-+ int sync_flight;
-+
-+ int max_rq_in_driver;
-+ int hw_tag_samples;
-+ int hw_tag;
-+
-+ int budgets_assigned;
-+
-+ struct timer_list idle_slice_timer;
-+ struct work_struct unplug_work;
-+
-+ struct bfq_queue *in_service_queue;
-+ struct bfq_io_cq *in_service_bic;
-+
-+ sector_t last_position;
-+
-+ ktime_t last_budget_start;
-+ ktime_t last_idling_start;
-+ int peak_rate_samples;
-+ u64 peak_rate;
-+ int bfq_max_budget;
-+
-+ struct hlist_head group_list;
-+ struct list_head active_list;
-+ struct list_head idle_list;
-+
-+ unsigned int bfq_fifo_expire[2];
-+ unsigned int bfq_back_penalty;
-+ unsigned int bfq_back_max;
-+ unsigned int bfq_slice_idle;
-+ u64 bfq_class_idle_last_service;
-+
-+ int bfq_user_max_budget;
-+ int bfq_max_budget_async_rq;
-+ unsigned int bfq_timeout[2];
-+
-+ unsigned int bfq_coop_thresh;
-+ unsigned int bfq_failed_cooperations;
-+ unsigned int bfq_requests_within_timer;
-+
-+ unsigned long last_ins_in_burst;
-+ unsigned long bfq_burst_interval;
-+ int burst_size;
-+ unsigned long bfq_large_burst_thresh;
-+ bool large_burst;
-+ struct hlist_head burst_list;
-+
-+ bool low_latency;
-+
-+ /* parameters of the low_latency heuristics */
-+ unsigned int bfq_wr_coeff;
-+ unsigned int bfq_wr_max_time;
-+ unsigned int bfq_wr_rt_max_time;
-+ unsigned int bfq_wr_min_idle_time;
-+ unsigned long bfq_wr_min_inter_arr_async;
-+ unsigned int bfq_wr_max_softrt_rate;
-+ u64 RT_prod;
-+ enum bfq_device_speed device_speed;
-+
-+ struct bfq_queue oom_bfqq;
-+};
-+
-+enum bfqq_state_flags {
-+ BFQ_BFQQ_FLAG_busy = 0, /* has requests or is in service */
-+ BFQ_BFQQ_FLAG_wait_request, /* waiting for a request */
-+ BFQ_BFQQ_FLAG_must_alloc, /* must be allowed rq alloc */
-+ BFQ_BFQQ_FLAG_fifo_expire, /* FIFO checked in this slice */
-+ BFQ_BFQQ_FLAG_idle_window, /* slice idling enabled */
-+ BFQ_BFQQ_FLAG_sync, /* synchronous queue */
-+ BFQ_BFQQ_FLAG_budget_new, /* no completion with this budget */
-+ BFQ_BFQQ_FLAG_IO_bound, /*
-+ * bfqq has timed-out at least once
-+ * having consumed at most 2/10 of
-+ * its budget
-+ */
-+ BFQ_BFQQ_FLAG_in_large_burst, /*
-+ * bfqq activated in a large burst,
-+ * see comments to bfq_handle_burst.
-+ */
-+ BFQ_BFQQ_FLAG_constantly_seeky, /*
-+ * bfqq has proved to be slow and
-+ * seeky until budget timeout
-+ */
-+ BFQ_BFQQ_FLAG_softrt_update, /*
-+ * may need softrt-next-start
-+ * update
-+ */
-+};
-+
-+#define BFQ_BFQQ_FNS(name) \
-+static void bfq_mark_bfqq_##name(struct bfq_queue *bfqq) \
-+{ \
-+ (bfqq)->flags |= (1 << BFQ_BFQQ_FLAG_##name); \
-+} \
-+static void bfq_clear_bfqq_##name(struct bfq_queue *bfqq) \
-+{ \
-+ (bfqq)->flags &= ~(1 << BFQ_BFQQ_FLAG_##name); \
-+} \
-+static int bfq_bfqq_##name(const struct bfq_queue *bfqq) \
-+{ \
-+ return ((bfqq)->flags & (1 << BFQ_BFQQ_FLAG_##name)) != 0; \
-+}
-+
-+BFQ_BFQQ_FNS(busy);
-+BFQ_BFQQ_FNS(wait_request);
-+BFQ_BFQQ_FNS(must_alloc);
-+BFQ_BFQQ_FNS(fifo_expire);
-+BFQ_BFQQ_FNS(idle_window);
-+BFQ_BFQQ_FNS(sync);
-+BFQ_BFQQ_FNS(budget_new);
-+BFQ_BFQQ_FNS(IO_bound);
-+BFQ_BFQQ_FNS(in_large_burst);
-+BFQ_BFQQ_FNS(constantly_seeky);
-+BFQ_BFQQ_FNS(softrt_update);
-+#undef BFQ_BFQQ_FNS
-+
-+/* Logging facilities. */
-+#define bfq_log_bfqq(bfqd, bfqq, fmt, args...) \
-+ blk_add_trace_msg((bfqd)->queue, "bfq%d " fmt, (bfqq)->pid, ##args)
-+
-+#define bfq_log(bfqd, fmt, args...) \
-+ blk_add_trace_msg((bfqd)->queue, "bfq " fmt, ##args)
-+
-+/* Expiration reasons. */
-+enum bfqq_expiration {
-+ BFQ_BFQQ_TOO_IDLE = 0, /*
-+ * queue has been idling for
-+ * too long
-+ */
-+ BFQ_BFQQ_BUDGET_TIMEOUT, /* budget took too long to be used */
-+ BFQ_BFQQ_BUDGET_EXHAUSTED, /* budget consumed */
-+ BFQ_BFQQ_NO_MORE_REQUESTS, /* the queue has no more requests */
-+};
-+
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+
-+struct bfqg_stats {
-+ /* total bytes transferred */
-+ struct blkg_rwstat service_bytes;
-+ /* total IOs serviced, post merge */
-+ struct blkg_rwstat serviced;
-+ /* number of ios merged */
-+ struct blkg_rwstat merged;
-+ /* total time spent on device in ns, may not be accurate w/ queueing */
-+ struct blkg_rwstat service_time;
-+ /* total time spent waiting in scheduler queue in ns */
-+ struct blkg_rwstat wait_time;
-+ /* number of IOs queued up */
-+ struct blkg_rwstat queued;
-+ /* total sectors transferred */
-+ struct blkg_stat sectors;
-+ /* total disk time and nr sectors dispatched by this group */
-+ struct blkg_stat time;
-+ /* time not charged to this cgroup */
-+ struct blkg_stat unaccounted_time;
-+ /* sum of number of ios queued across all samples */
-+ struct blkg_stat avg_queue_size_sum;
-+ /* count of samples taken for average */
-+ struct blkg_stat avg_queue_size_samples;
-+ /* how many times this group has been removed from service tree */
-+ struct blkg_stat dequeue;
-+ /* total time spent waiting for it to be assigned a timeslice. */
-+ struct blkg_stat group_wait_time;
-+ /* time spent idling for this blkcg_gq */
-+ struct blkg_stat idle_time;
-+ /* total time with empty current active q with other requests queued */
-+ struct blkg_stat empty_time;
-+ /* fields after this shouldn't be cleared on stat reset */
-+ uint64_t start_group_wait_time;
-+ uint64_t start_idle_time;
-+ uint64_t start_empty_time;
-+ uint16_t flags;
-+};
-+
-+/*
-+ * struct bfq_group_data - per-blkcg storage for the blkio subsystem.
-+ *
-+ * @pd: @blkcg_policy_data that this structure inherits
-+ * @weight: weight of the bfq_group
-+ */
-+struct bfq_group_data {
-+ /* must be the first member */
-+ struct blkcg_policy_data pd;
-+
-+ unsigned short weight;
-+};
-+
-+/**
-+ * struct bfq_group - per (device, cgroup) data structure.
-+ * @entity: schedulable entity to insert into the parent group sched_data.
-+ * @sched_data: own sched_data, to contain child entities (they may be
-+ * both bfq_queues and bfq_groups).
-+ * @bfqd_node: node to be inserted into the @bfqd->group_list list
-+ * of the groups active on the same device; used for cleanup.
-+ * @bfqd: the bfq_data for the device this group acts upon.
-+ * @async_bfqq: array of async queues for all the tasks belonging to
-+ * the group, one queue per ioprio value per ioprio_class,
-+ * except for the idle class that has only one queue.
-+ * @async_idle_bfqq: async queue for the idle class (ioprio is ignored).
-+ * @my_entity: pointer to @entity, %NULL for the toplevel group; used
-+ * to avoid too many special cases during group creation/
-+ * migration.
-+ * @active_entities: number of active entities belonging to the group;
-+ * unused for the root group. Used to know whether there
-+ * are groups with more than one active @bfq_entity
-+ * (see the comments to the function
-+ * bfq_bfqq_must_not_expire()).
-+ *
-+ * Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
-+ * there is a set of bfq_groups, each one collecting the lower-level
-+ * entities belonging to the group that are acting on the same device.
-+ *
-+ * Locking works as follows:
-+ * o @bfqd is protected by the queue lock, RCU is used to access it
-+ * from the readers.
-+ * o All the other fields are protected by the @bfqd queue lock.
-+ */
-+struct bfq_group {
-+ /* must be the first member */
-+ struct blkg_policy_data pd;
-+
-+ struct bfq_entity entity;
-+ struct bfq_sched_data sched_data;
-+
-+ struct hlist_node bfqd_node;
-+
-+ void *bfqd;
-+
-+ struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
-+ struct bfq_queue *async_idle_bfqq;
-+
-+ struct bfq_entity *my_entity;
-+
-+ int active_entities;
-+
-+ struct bfqg_stats stats;
-+ struct bfqg_stats dead_stats; /* stats pushed from dead children */
-+};
-+
-+#else
-+struct bfq_group {
-+ struct bfq_sched_data sched_data;
-+
-+ struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
-+ struct bfq_queue *async_idle_bfqq;
-+};
-+#endif
-+
-+static struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity);
-+
-+static struct bfq_service_tree *
-+bfq_entity_service_tree(struct bfq_entity *entity)
-+{
-+ struct bfq_sched_data *sched_data = entity->sched_data;
-+ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
-+ unsigned int idx = bfqq ? bfqq->ioprio_class - 1 :
-+ BFQ_DEFAULT_GRP_CLASS;
-+
-+ BUG_ON(idx >= BFQ_IOPRIO_CLASSES);
-+ BUG_ON(sched_data == NULL);
-+
-+ return sched_data->service_tree + idx;
-+}
-+
-+static struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic, bool is_sync)
-+{
-+ return bic->bfqq[is_sync];
-+}
-+
-+static void bic_set_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq,
-+ bool is_sync)
-+{
-+ bic->bfqq[is_sync] = bfqq;
-+}
-+
-+static struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic)
-+{
-+ return bic->icq.q->elevator->elevator_data;
-+}
-+
-+/**
-+ * bfq_get_bfqd_locked - get a lock on a bfqd using an RCU-protected pointer.
-+ * @ptr: a pointer to a bfqd.
-+ * @flags: storage for the flags to be saved.
-+ *
-+ * This function allows bfqg->bfqd to be protected by the
-+ * queue lock of the bfqd they reference; the pointer is dereferenced
-+ * under RCU, so the storage for bfqd is assured to be safe as long
-+ * as the RCU read side critical section does not end. After the
-+ * bfqd->queue->queue_lock is taken the pointer is rechecked, to be
-+ * sure that no other writer accessed it. If we raced with a writer,
-+ * the function returns NULL, with the queue unlocked, otherwise it
-+ * returns the dereferenced pointer, with the queue locked.
-+ */
-+static struct bfq_data *bfq_get_bfqd_locked(void **ptr, unsigned long *flags)
-+{
-+ struct bfq_data *bfqd;
-+
-+ rcu_read_lock();
-+ bfqd = rcu_dereference(*(struct bfq_data **)ptr);
-+
-+ if (bfqd != NULL) {
-+ spin_lock_irqsave(bfqd->queue->queue_lock, *flags);
-+ if (ptr == NULL)
-+ printk(KERN_CRIT "get_bfqd_locked pointer NULL\n");
-+ else if (*ptr == bfqd)
-+ goto out;
-+ spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
-+ }
-+
-+ bfqd = NULL;
-+out:
-+ rcu_read_unlock();
-+ return bfqd;
-+}
-+
-+static void bfq_put_bfqd_unlock(struct bfq_data *bfqd, unsigned long *flags)
-+{
-+ spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
-+}
-+
-+static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio);
-+static void bfq_put_queue(struct bfq_queue *bfqq);
-+static void bfq_dispatch_insert(struct request_queue *q, struct request *rq);
-+static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
-+ struct bio *bio, int is_sync,
-+ struct bfq_io_cq *bic, gfp_t gfp_mask);
-+static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
-+ struct bfq_group *bfqg);
-+static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
-+static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
-+
-+#endif /* _BFQ_H */
---
-2.1.4
-
diff --git a/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch b/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch
deleted file mode 100644
index dac6db6..0000000
--- a/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r9-for-4.2.patch
+++ /dev/null
@@ -1,1097 +0,0 @@
-From 75c9c5ea340776c0a9e934581cf63cb963a33fd4 Mon Sep 17 00:00:00 2001
-From: Mauro Andreolini <mauro.andreolini@unimore.it>
-Date: Sun, 6 Sep 2015 16:09:05 +0200
-Subject: [PATCH 3/3] block, bfq: add Early Queue Merge (EQM) to BFQ-v7r9 for
- 4.2.0
-
-A set of processes may happen to perform interleaved reads, i.e., requests
-whose union would give rise to a sequential read pattern. There are two
-typical cases: in the first case, processes read fixed-size chunks of
-data at a fixed distance from each other, while in the second case processes
-may read variable-size chunks at variable distances. The latter case occurs
-for example with QEMU, which splits the I/O generated by the guest into
-multiple chunks, and lets these chunks be served by a pool of cooperating
-processes, iteratively assigning the next chunk of I/O to the first
-available process. CFQ uses actual queue merging for the first type of
-processes, whereas it uses preemption to get a sequential read pattern out
-of the read requests performed by the second type of processes. In the end
-it uses two different mechanisms to achieve the same goal: boosting the
-throughput with interleaved I/O.
-
-This patch introduces Early Queue Merge (EQM), a unified mechanism to get a
-sequential read pattern with both types of processes. The main idea is
-checking newly arrived requests against the next request of the active queue
-both in case of actual request insert and in case of request merge. By doing
-so, both the types of processes can be handled by just merging their queues.
-EQM is then simpler and more compact than the pair of mechanisms used in
-CFQ.
-
-Finally, EQM also preserves the typical low-latency properties of BFQ, by
-properly restoring the weight-raising state of a queue when it gets back to
-a non-merged state.
-
-Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
-Signed-off-by: Arianna Avanzini <avanzini@google.com>
-Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
----
- block/bfq-cgroup.c | 4 +
- block/bfq-iosched.c | 684 ++++++++++++++++++++++++++++++++++++++++++++++++++--
- block/bfq.h | 66 +++++
- 3 files changed, 740 insertions(+), 14 deletions(-)
-
-diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
-index c02d65a..bc34d7a 100644
---- a/block/bfq-cgroup.c
-+++ b/block/bfq-cgroup.c
-@@ -382,6 +382,7 @@ static void bfq_pd_init(struct blkcg_gq *blkg)
- */
- bfqg->bfqd = bfqd;
- bfqg->active_entities = 0;
-+ bfqg->rq_pos_tree = RB_ROOT;
-
- /* if the root_group does not exist, we are handling it right now */
- if (bfqd->root_group && bfqg != bfqd->root_group)
-@@ -484,6 +485,8 @@ static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
- return bfqg;
- }
-
-+static void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq);
-+
- /**
- * bfq_bfqq_move - migrate @bfqq to @bfqg.
- * @bfqd: queue descriptor.
-@@ -531,6 +534,7 @@ static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
- bfqg_get(bfqg);
-
- if (busy) {
-+ bfq_pos_tree_add_move(bfqd, bfqq);
- if (resume)
- bfq_activate_bfqq(bfqd, bfqq);
- }
-diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
-index 51d24dd..fcd6eea 100644
---- a/block/bfq-iosched.c
-+++ b/block/bfq-iosched.c
-@@ -296,6 +296,72 @@ static struct request *bfq_choose_req(struct bfq_data *bfqd,
- }
- }
-
-+static struct bfq_queue *
-+bfq_rq_pos_tree_lookup(struct bfq_data *bfqd, struct rb_root *root,
-+ sector_t sector, struct rb_node **ret_parent,
-+ struct rb_node ***rb_link)
-+{
-+ struct rb_node **p, *parent;
-+ struct bfq_queue *bfqq = NULL;
-+
-+ parent = NULL;
-+ p = &root->rb_node;
-+ while (*p) {
-+ struct rb_node **n;
-+
-+ parent = *p;
-+ bfqq = rb_entry(parent, struct bfq_queue, pos_node);
-+
-+ /*
-+ * Sort strictly based on sector. Smallest to the left,
-+ * largest to the right.
-+ */
-+ if (sector > blk_rq_pos(bfqq->next_rq))
-+ n = &(*p)->rb_right;
-+ else if (sector < blk_rq_pos(bfqq->next_rq))
-+ n = &(*p)->rb_left;
-+ else
-+ break;
-+ p = n;
-+ bfqq = NULL;
-+ }
-+
-+ *ret_parent = parent;
-+ if (rb_link)
-+ *rb_link = p;
-+
-+ bfq_log(bfqd, "rq_pos_tree_lookup %llu: returning %d",
-+ (long long unsigned)sector,
-+ bfqq ? bfqq->pid : 0);
-+
-+ return bfqq;
-+}
-+
-+static void bfq_pos_tree_add_move(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-+{
-+ struct rb_node **p, *parent;
-+ struct bfq_queue *__bfqq;
-+
-+ if (bfqq->pos_root) {
-+ rb_erase(&bfqq->pos_node, bfqq->pos_root);
-+ bfqq->pos_root = NULL;
-+ }
-+
-+ if (bfq_class_idle(bfqq))
-+ return;
-+ if (!bfqq->next_rq)
-+ return;
-+
-+ bfqq->pos_root = &bfq_bfqq_to_bfqg(bfqq)->rq_pos_tree;
-+ __bfqq = bfq_rq_pos_tree_lookup(bfqd, bfqq->pos_root,
-+ blk_rq_pos(bfqq->next_rq), &parent, &p);
-+ if (!__bfqq) {
-+ rb_link_node(&bfqq->pos_node, parent, p);
-+ rb_insert_color(&bfqq->pos_node, bfqq->pos_root);
-+ } else
-+ bfqq->pos_root = NULL;
-+}
-+
- /*
- * Tell whether there are active queues or groups with differentiated weights.
- */
-@@ -528,6 +594,57 @@ static unsigned int bfq_wr_duration(struct bfq_data *bfqd)
- return dur;
- }
-
-+static unsigned bfq_bfqq_cooperations(struct bfq_queue *bfqq)
-+{
-+ return bfqq->bic ? bfqq->bic->cooperations : 0;
-+}
-+
-+static void
-+bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
-+{
-+ if (bic->saved_idle_window)
-+ bfq_mark_bfqq_idle_window(bfqq);
-+ else
-+ bfq_clear_bfqq_idle_window(bfqq);
-+ if (bic->saved_IO_bound)
-+ bfq_mark_bfqq_IO_bound(bfqq);
-+ else
-+ bfq_clear_bfqq_IO_bound(bfqq);
-+ /* Assuming that the flag in_large_burst is already correctly set */
-+ if (bic->wr_time_left && bfqq->bfqd->low_latency &&
-+ !bfq_bfqq_in_large_burst(bfqq) &&
-+ bic->cooperations < bfqq->bfqd->bfq_coop_thresh) {
-+ /*
-+ * Start a weight raising period with the duration given by
-+ * the raising_time_left snapshot.
-+ */
-+ if (bfq_bfqq_busy(bfqq))
-+ bfqq->bfqd->wr_busy_queues++;
-+ bfqq->wr_coeff = bfqq->bfqd->bfq_wr_coeff;
-+ bfqq->wr_cur_max_time = bic->wr_time_left;
-+ bfqq->last_wr_start_finish = jiffies;
-+ bfqq->entity.prio_changed = 1;
-+ }
-+ /*
-+ * Clear wr_time_left to prevent bfq_bfqq_save_state() from
-+ * getting confused about the queue's need of a weight-raising
-+ * period.
-+ */
-+ bic->wr_time_left = 0;
-+}
-+
-+static int bfqq_process_refs(struct bfq_queue *bfqq)
-+{
-+ int process_refs, io_refs;
-+
-+ lockdep_assert_held(bfqq->bfqd->queue->queue_lock);
-+
-+ io_refs = bfqq->allocated[READ] + bfqq->allocated[WRITE];
-+ process_refs = atomic_read(&bfqq->ref) - io_refs - bfqq->entity.on_st;
-+ BUG_ON(process_refs < 0);
-+ return process_refs;
-+}
-+
- /* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
- static void bfq_reset_burst_list(struct bfq_data *bfqd, struct bfq_queue *bfqq)
- {
-@@ -764,8 +881,14 @@ static void bfq_add_request(struct request *rq)
- BUG_ON(!next_rq);
- bfqq->next_rq = next_rq;
-
-+ /*
-+ * Adjust priority tree position, if next_rq changes.
-+ */
-+ if (prev != bfqq->next_rq)
-+ bfq_pos_tree_add_move(bfqd, bfqq);
-+
- if (!bfq_bfqq_busy(bfqq)) {
-- bool soft_rt, in_burst,
-+ bool soft_rt, coop_or_in_burst,
- idle_for_long_time = time_is_before_jiffies(
- bfqq->budget_timeout +
- bfqd->bfq_wr_min_idle_time);
-@@ -793,11 +916,12 @@ static void bfq_add_request(struct request *rq)
- bfqd->last_ins_in_burst = jiffies;
- }
-
-- in_burst = bfq_bfqq_in_large_burst(bfqq);
-+ coop_or_in_burst = bfq_bfqq_in_large_burst(bfqq) ||
-+ bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh;
- soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
-- !in_burst &&
-+ !coop_or_in_burst &&
- time_is_before_jiffies(bfqq->soft_rt_next_start);
-- interactive = !in_burst && idle_for_long_time;
-+ interactive = !coop_or_in_burst && idle_for_long_time;
- entity->budget = max_t(unsigned long, bfqq->max_budget,
- bfq_serv_to_charge(next_rq, bfqq));
-
-@@ -816,6 +940,9 @@ static void bfq_add_request(struct request *rq)
- if (!bfqd->low_latency)
- goto add_bfqq_busy;
-
-+ if (bfq_bfqq_just_split(bfqq))
-+ goto set_prio_changed;
-+
- /*
- * If the queue:
- * - is not being boosted,
-@@ -840,7 +967,7 @@ static void bfq_add_request(struct request *rq)
- } else if (old_wr_coeff > 1) {
- if (interactive)
- bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
-- else if (in_burst ||
-+ else if (coop_or_in_burst ||
- (bfqq->wr_cur_max_time ==
- bfqd->bfq_wr_rt_max_time &&
- !soft_rt)) {
-@@ -905,6 +1032,7 @@ static void bfq_add_request(struct request *rq)
- bfqd->bfq_wr_rt_max_time;
- }
- }
-+set_prio_changed:
- if (old_wr_coeff != bfqq->wr_coeff)
- entity->prio_changed = 1;
- add_bfqq_busy:
-@@ -1047,6 +1175,15 @@ static void bfq_merged_request(struct request_queue *q, struct request *req,
- bfqd->last_position);
- BUG_ON(!next_rq);
- bfqq->next_rq = next_rq;
-+ /*
-+ * If next_rq changes, update both the queue's budget to
-+ * fit the new request and the queue's position in its
-+ * rq_pos_tree.
-+ */
-+ if (prev != bfqq->next_rq) {
-+ bfq_updated_next_req(bfqd, bfqq);
-+ bfq_pos_tree_add_move(bfqd, bfqq);
-+ }
- }
- }
-
-@@ -1129,11 +1266,343 @@ static void bfq_end_wr(struct bfq_data *bfqd)
- spin_unlock_irq(bfqd->queue->queue_lock);
- }
-
-+static sector_t bfq_io_struct_pos(void *io_struct, bool request)
-+{
-+ if (request)
-+ return blk_rq_pos(io_struct);
-+ else
-+ return ((struct bio *)io_struct)->bi_iter.bi_sector;
-+}
-+
-+static int bfq_rq_close_to_sector(void *io_struct, bool request,
-+ sector_t sector)
-+{
-+ return abs64(bfq_io_struct_pos(io_struct, request) - sector) <=
-+ BFQQ_SEEK_THR;
-+}
-+
-+static struct bfq_queue *bfqq_find_close(struct bfq_data *bfqd,
-+ struct bfq_queue *bfqq,
-+ sector_t sector)
-+{
-+ struct rb_root *root = &bfq_bfqq_to_bfqg(bfqq)->rq_pos_tree;
-+ struct rb_node *parent, *node;
-+ struct bfq_queue *__bfqq;
-+
-+ if (RB_EMPTY_ROOT(root))
-+ return NULL;
-+
-+ /*
-+ * First, if we find a request starting at the end of the last
-+ * request, choose it.
-+ */
-+ __bfqq = bfq_rq_pos_tree_lookup(bfqd, root, sector, &parent, NULL);
-+ if (__bfqq)
-+ return __bfqq;
-+
-+ /*
-+ * If the exact sector wasn't found, the parent of the NULL leaf
-+ * will contain the closest sector (rq_pos_tree sorted by
-+ * next_request position).
-+ */
-+ __bfqq = rb_entry(parent, struct bfq_queue, pos_node);
-+ if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
-+ return __bfqq;
-+
-+ if (blk_rq_pos(__bfqq->next_rq) < sector)
-+ node = rb_next(&__bfqq->pos_node);
-+ else
-+ node = rb_prev(&__bfqq->pos_node);
-+ if (!node)
-+ return NULL;
-+
-+ __bfqq = rb_entry(node, struct bfq_queue, pos_node);
-+ if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
-+ return __bfqq;
-+
-+ return NULL;
-+}
-+
-+static struct bfq_queue *bfq_find_close_cooperator(struct bfq_data *bfqd,
-+ struct bfq_queue *cur_bfqq,
-+ sector_t sector)
-+{
-+ struct bfq_queue *bfqq;
-+
-+ /*
-+ * We should notice if some of the queues are cooperating, e.g.
-+ * working closely on the same area of the disk. In that case,
-+ * we can group them together and not waste time idling.
-+ */
-+ bfqq = bfqq_find_close(bfqd, cur_bfqq, sector);
-+ if (!bfqq || bfqq == cur_bfqq)
-+ return NULL;
-+
-+ return bfqq;
-+}
-+
-+static struct bfq_queue *
-+bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
-+{
-+ int process_refs, new_process_refs;
-+ struct bfq_queue *__bfqq;
-+
-+ /*
-+ * If there are no process references on the new_bfqq, then it is
-+ * unsafe to follow the ->new_bfqq chain as other bfqq's in the chain
-+ * may have dropped their last reference (not just their last process
-+ * reference).
-+ */
-+ if (!bfqq_process_refs(new_bfqq))
-+ return NULL;
-+
-+ /* Avoid a circular list and skip interim queue merges. */
-+ while ((__bfqq = new_bfqq->new_bfqq)) {
-+ if (__bfqq == bfqq)
-+ return NULL;
-+ new_bfqq = __bfqq;
-+ }
-+
-+ process_refs = bfqq_process_refs(bfqq);
-+ new_process_refs = bfqq_process_refs(new_bfqq);
-+ /*
-+ * If the process for the bfqq has gone away, there is no
-+ * sense in merging the queues.
-+ */
-+ if (process_refs == 0 || new_process_refs == 0)
-+ return NULL;
-+
-+ bfq_log_bfqq(bfqq->bfqd, bfqq, "scheduling merge with queue %d",
-+ new_bfqq->pid);
-+
-+ /*
-+ * Merging is just a redirection: the requests of the process
-+ * owning one of the two queues are redirected to the other queue.
-+ * The latter queue, in its turn, is set as shared if this is the
-+ * first time that the requests of some process are redirected to
-+ * it.
-+ *
-+ * We redirect bfqq to new_bfqq and not the opposite, because we
-+ * are in the context of the process owning bfqq, hence we have
-+ * the io_cq of this process. So we can immediately configure this
-+ * io_cq to redirect the requests of the process to new_bfqq.
-+ *
-+ * NOTE, even if new_bfqq coincides with the in-service queue, the
-+ * io_cq of new_bfqq is not available, because, if the in-service
-+ * queue is shared, bfqd->in_service_bic may not point to the
-+ * io_cq of the in-service queue.
-+ * Redirecting the requests of the process owning bfqq to the
-+ * currently in-service queue is in any case the best option, as
-+ * we feed the in-service queue with new requests close to the
-+ * last request served and, by doing so, hopefully increase the
-+ * throughput.
-+ */
-+ bfqq->new_bfqq = new_bfqq;
-+ atomic_add(process_refs, &new_bfqq->ref);
-+ return new_bfqq;
-+}
-+
-+static bool bfq_may_be_close_cooperator(struct bfq_queue *bfqq,
-+ struct bfq_queue *new_bfqq)
-+{
-+ if (WARN_ON(bfqq->entity.parent != new_bfqq->entity.parent))
-+ return false;
-+
-+ if (bfq_class_idle(bfqq) || bfq_class_idle(new_bfqq) ||
-+ (bfqq->ioprio_class != new_bfqq->ioprio_class))
-+ return false;
-+
-+ /*
-+ * If either of the queues has already been detected as seeky,
-+ * then merging it with the other queue is unlikely to lead to
-+ * sequential I/O.
-+ */
-+ if (BFQQ_SEEKY(bfqq) || BFQQ_SEEKY(new_bfqq))
-+ return false;
-+
-+ /*
-+ * Interleaved I/O is known to be done by (some) applications
-+ * only for reads, so it does not make sense to merge async
-+ * queues.
-+ */
-+ if (!bfq_bfqq_sync(bfqq) || !bfq_bfqq_sync(new_bfqq))
-+ return false;
-+
-+ return true;
-+}
-+
-+/*
-+ * Attempt to schedule a merge of bfqq with the currently in-service queue
-+ * or with a close queue among the scheduled queues.
-+ * Return NULL if no merge was scheduled, a pointer to the shared bfq_queue
-+ * structure otherwise.
-+ *
-+ * The OOM queue is not allowed to participate in cooperation: in fact, since
-+ * the requests temporarily redirected to the OOM queue could be redirected
-+ * again to dedicated queues at any time, the state needed to correctly
-+ * handle merging with the OOM queue would be quite complex and expensive
-+ * to maintain. Besides, in such a critical condition as out of memory,
-+ * the benefits of queue merging may be of little relevance, or even negligible.
-+ */
-+static struct bfq_queue *
-+bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
-+ void *io_struct, bool request)
-+{
-+ struct bfq_queue *in_service_bfqq, *new_bfqq;
-+
-+ if (bfqq->new_bfqq)
-+ return bfqq->new_bfqq;
-+ if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq))
-+ return NULL;
-+ /* If device has only one backlogged bfq_queue, don't search. */
-+ if (bfqd->busy_queues == 1)
-+ return NULL;
-+
-+ in_service_bfqq = bfqd->in_service_queue;
-+
-+ if (!in_service_bfqq || in_service_bfqq == bfqq ||
-+ !bfqd->in_service_bic ||
-+ unlikely(in_service_bfqq == &bfqd->oom_bfqq))
-+ goto check_scheduled;
-+
-+ if (bfq_rq_close_to_sector(io_struct, request, bfqd->last_position) &&
-+ bfq_may_be_close_cooperator(bfqq, in_service_bfqq)) {
-+ new_bfqq = bfq_setup_merge(bfqq, in_service_bfqq);
-+ if (new_bfqq)
-+ return new_bfqq;
-+ }
-+ /*
-+ * Check whether there is a cooperator among currently scheduled
-+ * queues. The only thing we need is that the bio/request is not
-+ * NULL, as we need it to establish whether a cooperator exists.
-+ */
-+check_scheduled:
-+ new_bfqq = bfq_find_close_cooperator(bfqd, bfqq,
-+ bfq_io_struct_pos(io_struct, request));
-+ if (new_bfqq && likely(new_bfqq != &bfqd->oom_bfqq) &&
-+ bfq_may_be_close_cooperator(bfqq, new_bfqq))
-+ return bfq_setup_merge(bfqq, new_bfqq);
-+
-+ return NULL;
-+}
-+
-+static void bfq_bfqq_save_state(struct bfq_queue *bfqq)
-+{
-+ /*
-+ * If !bfqq->bic, the queue is already shared or its requests
-+ * have already been redirected to a shared queue; both idle window
-+ * and weight raising state have already been saved. Do nothing.
-+ */
-+ if (!bfqq->bic)
-+ return;
-+ if (bfqq->bic->wr_time_left)
-+ /*
-+ * This is the queue of a just-started process, and would
-+ * deserve weight raising: we set wr_time_left to the full
-+ * weight-raising duration to trigger weight-raising when
-+ * and if the queue is split and the first request of the
-+ * queue is enqueued.
-+ */
-+ bfqq->bic->wr_time_left = bfq_wr_duration(bfqq->bfqd);
-+ else if (bfqq->wr_coeff > 1) {
-+ unsigned long wr_duration =
-+ jiffies - bfqq->last_wr_start_finish;
-+ /*
-+ * It may happen that a queue's weight raising period lasts
-+ * longer than its wr_cur_max_time, as weight raising is
-+ * handled only when a request is enqueued or dispatched (it
-+ * does not use any timer). If the weight raising period is
-+ * about to end, don't save it.
-+ */
-+ if (bfqq->wr_cur_max_time <= wr_duration)
-+ bfqq->bic->wr_time_left = 0;
-+ else
-+ bfqq->bic->wr_time_left =
-+ bfqq->wr_cur_max_time - wr_duration;
-+ /*
-+ * The bfq_queue is becoming shared or the requests of the
-+ * process owning the queue are being redirected to a shared
-+ * queue. Stop the weight raising period of the queue, as in
-+ * both cases it should not be owned by an interactive or
-+ * soft real-time application.
-+ */
-+ bfq_bfqq_end_wr(bfqq);
-+ } else
-+ bfqq->bic->wr_time_left = 0;
-+ bfqq->bic->saved_idle_window = bfq_bfqq_idle_window(bfqq);
-+ bfqq->bic->saved_IO_bound = bfq_bfqq_IO_bound(bfqq);
-+ bfqq->bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
-+ bfqq->bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node);
-+ bfqq->bic->cooperations++;
-+ bfqq->bic->failed_cooperations = 0;
-+}
-+
-+static void bfq_get_bic_reference(struct bfq_queue *bfqq)
-+{
-+ /*
-+ * If bfqq->bic has a non-NULL value, the bic to which it belongs
-+ * is about to begin using a shared bfq_queue.
-+ */
-+ if (bfqq->bic)
-+ atomic_long_inc(&bfqq->bic->icq.ioc->refcount);
-+}
-+
-+static void
-+bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
-+ struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
-+{
-+ bfq_log_bfqq(bfqd, bfqq, "merging with queue %lu",
-+ (long unsigned)new_bfqq->pid);
-+ /* Save weight raising and idle window of the merged queues */
-+ bfq_bfqq_save_state(bfqq);
-+ bfq_bfqq_save_state(new_bfqq);
-+ if (bfq_bfqq_IO_bound(bfqq))
-+ bfq_mark_bfqq_IO_bound(new_bfqq);
-+ bfq_clear_bfqq_IO_bound(bfqq);
-+ /*
-+ * Grab a reference to the bic, to prevent it from being destroyed
-+ * before being possibly touched by a bfq_split_bfqq().
-+ */
-+ bfq_get_bic_reference(bfqq);
-+ bfq_get_bic_reference(new_bfqq);
-+ /*
-+ * Merge queues (that is, let bic redirect its requests to new_bfqq)
-+ */
-+ bic_set_bfqq(bic, new_bfqq, 1);
-+ bfq_mark_bfqq_coop(new_bfqq);
-+ /*
-+ * new_bfqq now belongs to at least two bics (it is a shared queue):
-+ * set new_bfqq->bic to NULL. bfqq either:
-+ * - does not belong to any bic any more, and hence bfqq->bic must
-+ * be set to NULL, or
-+ * - is a queue whose owning bics have already been redirected to a
-+ * different queue, hence the queue is destined to not belong to
-+ * any bic soon and bfqq->bic is already NULL (therefore the next
-+ * assignment causes no harm).
-+ */
-+ new_bfqq->bic = NULL;
-+ bfqq->bic = NULL;
-+ bfq_put_queue(bfqq);
-+}
-+
-+static void bfq_bfqq_increase_failed_cooperations(struct bfq_queue *bfqq)
-+{
-+ struct bfq_io_cq *bic = bfqq->bic;
-+ struct bfq_data *bfqd = bfqq->bfqd;
-+
-+ if (bic && bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh) {
-+ bic->failed_cooperations++;
-+ if (bic->failed_cooperations >= bfqd->bfq_failed_cooperations)
-+ bic->cooperations = 0;
-+ }
-+}
-+
- static int bfq_allow_merge(struct request_queue *q, struct request *rq,
- struct bio *bio)
- {
- struct bfq_data *bfqd = q->elevator->elevator_data;
- struct bfq_io_cq *bic;
-+ struct bfq_queue *bfqq, *new_bfqq;
-
- /*
- * Disallow merge of a sync bio into an async request.
-@@ -1150,7 +1619,26 @@ static int bfq_allow_merge(struct request_queue *q, struct request *rq,
- if (!bic)
- return 0;
-
-- return bic_to_bfqq(bic, bfq_bio_sync(bio)) == RQ_BFQQ(rq);
-+ bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
-+ /*
-+ * We take advantage of this function to perform an early merge
-+ * of the queues of possible cooperating processes.
-+ */
-+ if (bfqq) {
-+ new_bfqq = bfq_setup_cooperator(bfqd, bfqq, bio, false);
-+ if (new_bfqq) {
-+ bfq_merge_bfqqs(bfqd, bic, bfqq, new_bfqq);
-+ /*
-+ * If we get here, the bio will be queued in the
-+ * shared queue, i.e., new_bfqq, so use new_bfqq
-+ * to decide whether bio and rq can be merged.
-+ */
-+ bfqq = new_bfqq;
-+ } else
-+ bfq_bfqq_increase_failed_cooperations(bfqq);
-+ }
-+
-+ return bfqq == RQ_BFQQ(rq);
- }
-
- static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
-@@ -1349,6 +1837,15 @@ static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
-
- __bfq_bfqd_reset_in_service(bfqd);
-
-+ /*
-+ * If this bfqq is shared between multiple processes, check
-+ * to make sure that those processes are still issuing I/Os
-+ * within the mean seek distance. If not, it may be time to
-+ * break the queues apart again.
-+ */
-+ if (bfq_bfqq_coop(bfqq) && BFQQ_SEEKY(bfqq))
-+ bfq_mark_bfqq_split_coop(bfqq);
-+
- if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
- /*
- * Overloading budget_timeout field to store the time
-@@ -1357,8 +1854,13 @@ static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
- */
- bfqq->budget_timeout = jiffies;
- bfq_del_bfqq_busy(bfqd, bfqq, 1);
-- } else
-+ } else {
- bfq_activate_bfqq(bfqd, bfqq);
-+ /*
-+ * Resort priority tree of potential close cooperators.
-+ */
-+ bfq_pos_tree_add_move(bfqd, bfqq);
-+ }
- }
-
- /**
-@@ -2242,10 +2744,12 @@ static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
- /*
- * If the queue was activated in a burst, or
- * too much time has elapsed from the beginning
-- * of this weight-raising period, then end weight
-- * raising.
-+ * of this weight-raising period, or the queue has
-+ * exceeded the acceptable number of cooperations,
-+ * then end weight raising.
- */
- if (bfq_bfqq_in_large_burst(bfqq) ||
-+ bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh ||
- time_is_before_jiffies(bfqq->last_wr_start_finish +
- bfqq->wr_cur_max_time)) {
- bfqq->last_wr_start_finish = jiffies;
-@@ -2474,6 +2978,25 @@ static void bfq_put_queue(struct bfq_queue *bfqq)
- #endif
- }
-
-+static void bfq_put_cooperator(struct bfq_queue *bfqq)
-+{
-+ struct bfq_queue *__bfqq, *next;
-+
-+ /*
-+ * If this queue was scheduled to merge with another queue, be
-+ * sure to drop the reference taken on that queue (and others in
-+ * the merge chain). See bfq_setup_merge and bfq_merge_bfqqs.
-+ */
-+ __bfqq = bfqq->new_bfqq;
-+ while (__bfqq) {
-+ if (__bfqq == bfqq)
-+ break;
-+ next = __bfqq->new_bfqq;
-+ bfq_put_queue(__bfqq);
-+ __bfqq = next;
-+ }
-+}
-+
- static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
- {
- if (bfqq == bfqd->in_service_queue) {
-@@ -2484,6 +3007,8 @@ static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
- bfq_log_bfqq(bfqd, bfqq, "exit_bfqq: %p, %d", bfqq,
- atomic_read(&bfqq->ref));
-
-+ bfq_put_cooperator(bfqq);
-+
- bfq_put_queue(bfqq);
- }
-
-@@ -2492,6 +3017,25 @@ static void bfq_init_icq(struct io_cq *icq)
- struct bfq_io_cq *bic = icq_to_bic(icq);
-
- bic->ttime.last_end_request = jiffies;
-+ /*
-+ * A newly created bic indicates that the process has just
-+ * started doing I/O, and is probably mapping into memory its
-+ * executable and libraries: it definitely needs weight raising.
-+ * There is however the possibility that the process performs,
-+ * for a while, I/O close to some other process. EQM intercepts
-+ * this behavior and may merge the queue corresponding to the
-+ * process with some other queue, BEFORE the weight of the queue
-+ * is raised. Merged queues are not weight-raised (they are assumed
-+ * to belong to processes that benefit only from high throughput).
-+ * If the merge is basically the consequence of an accident, then
-+ * the queue will be split soon and will get back its old weight.
-+ * It is then important to write down somewhere that this queue
-+ * does need weight raising, even if it did not make it to get its
-+ * weight raised before being merged. To this purpose, we overload
-+ * the field wr_time_left and assign 1 to it, to mark the queue
-+ * as needing weight raising.
-+ */
-+ bic->wr_time_left = 1;
- }
-
- static void bfq_exit_icq(struct io_cq *icq)
-@@ -2505,6 +3049,13 @@ static void bfq_exit_icq(struct io_cq *icq)
- }
-
- if (bic->bfqq[BLK_RW_SYNC]) {
-+ /*
-+ * If the bic is using a shared queue, put the reference
-+ * taken on the io_context when the bic started using a
-+ * shared bfq_queue.
-+ */
-+ if (bfq_bfqq_coop(bic->bfqq[BLK_RW_SYNC]))
-+ put_io_context(icq->ioc);
- bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
- bic->bfqq[BLK_RW_SYNC] = NULL;
- }
-@@ -2809,6 +3360,10 @@ static void bfq_update_idle_window(struct bfq_data *bfqd,
- if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
- return;
-
-+ /* Idle window just restored, statistics are meaningless. */
-+ if (bfq_bfqq_just_split(bfqq))
-+ return;
-+
- enable_idle = bfq_bfqq_idle_window(bfqq);
-
- if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
-@@ -2856,6 +3411,7 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
- if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
- !BFQQ_SEEKY(bfqq))
- bfq_update_idle_window(bfqd, bfqq, bic);
-+ bfq_clear_bfqq_just_split(bfqq);
-
- bfq_log_bfqq(bfqd, bfqq,
- "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
-@@ -2920,12 +3476,47 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
- static void bfq_insert_request(struct request_queue *q, struct request *rq)
- {
- struct bfq_data *bfqd = q->elevator->elevator_data;
-- struct bfq_queue *bfqq = RQ_BFQQ(rq);
-+ struct bfq_queue *bfqq = RQ_BFQQ(rq), *new_bfqq;
-
- assert_spin_locked(bfqd->queue->queue_lock);
-
-+ /*
-+ * An unplug may trigger a requeue of a request from the device
-+ * driver: make sure we are in process context while trying to
-+ * merge two bfq_queues.
-+ */
-+ if (!in_interrupt()) {
-+ new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true);
-+ if (new_bfqq) {
-+ if (bic_to_bfqq(RQ_BIC(rq), 1) != bfqq)
-+ new_bfqq = bic_to_bfqq(RQ_BIC(rq), 1);
-+ /*
-+ * Release the request's reference to the old bfqq
-+ * and make sure one is taken to the shared queue.
-+ */
-+ new_bfqq->allocated[rq_data_dir(rq)]++;
-+ bfqq->allocated[rq_data_dir(rq)]--;
-+ atomic_inc(&new_bfqq->ref);
-+ bfq_put_queue(bfqq);
-+ if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq)
-+ bfq_merge_bfqqs(bfqd, RQ_BIC(rq),
-+ bfqq, new_bfqq);
-+ rq->elv.priv[1] = new_bfqq;
-+ bfqq = new_bfqq;
-+ } else
-+ bfq_bfqq_increase_failed_cooperations(bfqq);
-+ }
-+
- bfq_add_request(rq);
-
-+ /*
-+ * Here a newly-created bfq_queue has already started a weight-raising
-+ * period: clear wr_time_left to prevent bfq_bfqq_save_state()
-+ * from assigning it a full weight-raising period. See the detailed
-+ * comments about this field in bfq_init_icq().
-+ */
-+ if (bfqq->bic)
-+ bfqq->bic->wr_time_left = 0;
- rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
- list_add_tail(&rq->queuelist, &bfqq->fifo);
-
-@@ -3094,6 +3685,32 @@ static void bfq_put_request(struct request *rq)
- }
-
- /*
-+ * Returns NULL if a new bfqq should be allocated, or the old bfqq if this
-+ * was the last process referring to said bfqq.
-+ */
-+static struct bfq_queue *
-+bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
-+{
-+ bfq_log_bfqq(bfqq->bfqd, bfqq, "splitting queue");
-+
-+ put_io_context(bic->icq.ioc);
-+
-+ if (bfqq_process_refs(bfqq) == 1) {
-+ bfqq->pid = current->pid;
-+ bfq_clear_bfqq_coop(bfqq);
-+ bfq_clear_bfqq_split_coop(bfqq);
-+ return bfqq;
-+ }
-+
-+ bic_set_bfqq(bic, NULL, 1);
-+
-+ bfq_put_cooperator(bfqq);
-+
-+ bfq_put_queue(bfqq);
-+ return NULL;
-+}
-+
-+/*
- * Allocate bfq data structures associated with this request.
- */
- static int bfq_set_request(struct request_queue *q, struct request *rq,
-@@ -3105,6 +3722,7 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
- const int is_sync = rq_is_sync(rq);
- struct bfq_queue *bfqq;
- unsigned long flags;
-+ bool split = false;
-
- might_sleep_if(gfp_mask & __GFP_WAIT);
-
-@@ -3117,15 +3735,30 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
-
- bfq_bic_update_cgroup(bic, bio);
-
-+new_queue:
- bfqq = bic_to_bfqq(bic, is_sync);
- if (!bfqq || bfqq == &bfqd->oom_bfqq) {
- bfqq = bfq_get_queue(bfqd, bio, is_sync, bic, gfp_mask);
- bic_set_bfqq(bic, bfqq, is_sync);
-- if (is_sync) {
-- if (bfqd->large_burst)
-+ if (split && is_sync) {
-+ if ((bic->was_in_burst_list && bfqd->large_burst) ||
-+ bic->saved_in_large_burst)
- bfq_mark_bfqq_in_large_burst(bfqq);
-- else
-- bfq_clear_bfqq_in_large_burst(bfqq);
-+ else {
-+ bfq_clear_bfqq_in_large_burst(bfqq);
-+ if (bic->was_in_burst_list)
-+ hlist_add_head(&bfqq->burst_list_node,
-+ &bfqd->burst_list);
-+ }
-+ }
-+ } else {
-+ /* If the queue was seeky for too long, break it apart. */
-+ if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq)) {
-+ bfq_log_bfqq(bfqd, bfqq, "breaking apart bfqq");
-+ bfqq = bfq_split_bfqq(bic, bfqq);
-+ split = true;
-+ if (!bfqq)
-+ goto new_queue;
- }
- }
-
-@@ -3137,6 +3770,26 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
- rq->elv.priv[0] = bic;
- rq->elv.priv[1] = bfqq;
-
-+ /*
-+ * If a bfq_queue has only one process reference, it is owned
-+ * by only one bfq_io_cq: we can set the bic field of the
-+ * bfq_queue to the address of that structure. Also, if the
-+ * queue has just been split, mark a flag so that the
-+ * information is available to the other scheduler hooks.
-+ */
-+ if (likely(bfqq != &bfqd->oom_bfqq) && bfqq_process_refs(bfqq) == 1) {
-+ bfqq->bic = bic;
-+ if (split) {
-+ bfq_mark_bfqq_just_split(bfqq);
-+ /*
-+ * If the queue has just been split from a shared
-+ * queue, restore the idle window and the possible
-+ * weight raising period.
-+ */
-+ bfq_bfqq_resume_state(bfqq, bic);
-+ }
-+ }
-+
- spin_unlock_irqrestore(q->queue_lock, flags);
-
- return 0;
-@@ -3289,6 +3942,7 @@ static void bfq_init_root_group(struct bfq_group *root_group,
- root_group->my_entity = NULL;
- root_group->bfqd = bfqd;
- #endif
-+ root_group->rq_pos_tree = RB_ROOT;
- for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
- root_group->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
- }
-@@ -3369,6 +4023,8 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
- bfqd->bfq_timeout[BLK_RW_ASYNC] = bfq_timeout_async;
- bfqd->bfq_timeout[BLK_RW_SYNC] = bfq_timeout_sync;
-
-+ bfqd->bfq_coop_thresh = 2;
-+ bfqd->bfq_failed_cooperations = 7000;
- bfqd->bfq_requests_within_timer = 120;
-
- bfqd->bfq_large_burst_thresh = 11;
-diff --git a/block/bfq.h b/block/bfq.h
-index ca5ac20..320c438 100644
---- a/block/bfq.h
-+++ b/block/bfq.h
-@@ -183,6 +183,8 @@ struct bfq_group;
- * ioprio_class value.
- * @new_bfqq: shared bfq_queue if queue is cooperating with
- * one or more other queues.
-+ * @pos_node: request-position tree member (see bfq_group's @rq_pos_tree).
-+ * @pos_root: request-position tree root (see bfq_group's @rq_pos_tree).
- * @sort_list: sorted list of pending requests.
- * @next_rq: if fifo isn't expired, next request to serve.
- * @queued: nr of requests queued in @sort_list.
-@@ -304,6 +306,26 @@ struct bfq_ttime {
- * @ttime: associated @bfq_ttime struct
- * @ioprio: per (request_queue, blkcg) ioprio.
- * @blkcg_id: id of the blkcg the related io_cq belongs to.
-+ * @wr_time_left: snapshot of the time left before weight raising ends
-+ * for the sync queue associated to this process; this
-+ * snapshot is taken to remember this value while the weight
-+ * raising is suspended because the queue is merged with a
-+ * shared queue, and is used to set @wr_cur_max_time
-+ * when the queue is split from the shared queue and its
-+ * weight is raised again
-+ * @saved_idle_window: same purpose as the previous field for the idle
-+ * window
-+ * @saved_IO_bound: same purpose as the previous two fields for the I/O
-+ * bound classification of a queue
-+ * @saved_in_large_burst: same purpose as the previous fields for the
-+ * value of the field keeping the queue's belonging
-+ * to a large burst
-+ * @was_in_burst_list: true if the queue belonged to a burst list
-+ * before its merge with another cooperating queue
-+ * @cooperations: counter of consecutive successful queue merges undergone
-+ * by any of the process' @bfq_queues
-+ * @failed_cooperations: counter of consecutive failed queue merges of any
-+ * of the process' @bfq_queues
- */
- struct bfq_io_cq {
- struct io_cq icq; /* must be the first member */
-@@ -314,6 +336,16 @@ struct bfq_io_cq {
- #ifdef CONFIG_BFQ_GROUP_IOSCHED
- uint64_t blkcg_id; /* the current blkcg ID */
- #endif
-+
-+ unsigned int wr_time_left;
-+ bool saved_idle_window;
-+ bool saved_IO_bound;
-+
-+ bool saved_in_large_burst;
-+ bool was_in_burst_list;
-+
-+ unsigned int cooperations;
-+ unsigned int failed_cooperations;
- };
-
- enum bfq_device_speed {
-@@ -559,6 +591,9 @@ enum bfqq_state_flags {
- * may need softrt-next-start
- * update
- */
-+ BFQ_BFQQ_FLAG_coop, /* bfqq is shared */
-+ BFQ_BFQQ_FLAG_split_coop, /* shared bfqq will be split */
-+ BFQ_BFQQ_FLAG_just_split, /* queue has just been split */
- };
-
- #define BFQ_BFQQ_FNS(name) \
-@@ -585,6 +620,9 @@ BFQ_BFQQ_FNS(budget_new);
- BFQ_BFQQ_FNS(IO_bound);
- BFQ_BFQQ_FNS(in_large_burst);
- BFQ_BFQQ_FNS(constantly_seeky);
-+BFQ_BFQQ_FNS(coop);
-+BFQ_BFQQ_FNS(split_coop);
-+BFQ_BFQQ_FNS(just_split);
- BFQ_BFQQ_FNS(softrt_update);
- #undef BFQ_BFQQ_FNS
-
-@@ -679,6 +717,9 @@ struct bfq_group_data {
- * are groups with more than one active @bfq_entity
- * (see the comments to the function
- * bfq_bfqq_must_not_expire()).
-+ * @rq_pos_tree: rbtree sorted by next_request position, used when
-+ * determining if two or more queues have interleaving
-+ * requests (see bfq_find_close_cooperator()).
- *
- * Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
- * there is a set of bfq_groups, each one collecting the lower-level
-@@ -707,6 +748,8 @@ struct bfq_group {
-
- int active_entities;
-
-+ struct rb_root rq_pos_tree;
-+
- struct bfqg_stats stats;
- struct bfqg_stats dead_stats; /* stats pushed from dead children */
- };
-@@ -717,6 +760,8 @@ struct bfq_group {
-
- struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
- struct bfq_queue *async_idle_bfqq;
-+
-+ struct rb_root rq_pos_tree;
- };
- #endif
-
-@@ -793,6 +838,27 @@ static void bfq_put_bfqd_unlock(struct bfq_data *bfqd, unsigned long *flags)
- spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
- }
-
-+#ifdef CONFIG_BFQ_GROUP_IOSCHED
-+
-+static struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq)
-+{
-+ struct bfq_entity *group_entity = bfqq->entity.parent;
-+
-+ if (!group_entity)
-+ group_entity = &bfqq->bfqd->root_group->entity;
-+
-+ return container_of(group_entity, struct bfq_group, entity);
-+}
-+
-+#else
-+
-+static struct bfq_group *bfq_bfqq_to_bfqg(struct bfq_queue *bfqq)
-+{
-+ return bfqq->bfqd->root_group;
-+}
-+
-+#endif
-+
- static void bfq_check_ioprio_change(struct bfq_io_cq *bic, struct bio *bio);
- static void bfq_put_queue(struct bfq_queue *bfqq);
- static void bfq_dispatch_insert(struct request_queue *q, struct request *rq);
---
-2.1.4
-
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-28 16:49 Mike Pagano
From: Mike Pagano @ 2015-09-28 16:49 UTC
To: gentoo-commits
commit: 24113c3716b8d5a19a98dca269fbd61c48ce37dc
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Mon Sep 28 16:49:45 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Mon Sep 28 16:49:45 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=24113c37
Add BFQ v7r8.
0000_README | 12 +
...roups-kconfig-build-bits-for-BFQ-v7r8-4.2.patch | 104 +
...introduce-the-BFQ-v7r8-I-O-sched-for-4.2.patch1 | 6952 ++++++++++++++++++++
...Early-Queue-Merge-EQM-to-BFQ-v7r8-for-4.2.patch | 1220 ++++
4 files changed, 8288 insertions(+)
diff --git a/0000_README b/0000_README
index 7050114..93b94b6 100644
--- a/0000_README
+++ b/0000_README
@@ -79,6 +79,18 @@ Patch: 5000_enable-additional-cpu-optimizations-for-gcc.patch
From: https://github.com/graysky2/kernel_gcc_patch/
Desc: Kernel patch enables gcc < v4.9 optimizations for additional CPUs.
+Patch: 5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r8-4.2.patch
+From: http://algo.ing.unimo.it/people/paolo/disk_sched/
+Desc: BFQ v7r8 patch 1 for 4.2: Build, cgroups and kconfig bits
+
+Patch: 5002_block-introduce-the-BFQ-v7r8-I-O-sched-for-4.2.patch1
+From: http://algo.ing.unimo.it/people/paolo/disk_sched/
+Desc: BFQ v7r8 patch 2 for 4.2: BFQ Scheduler
+
+Patch: 5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r8-for-4.2.0.patch
+From: http://algo.ing.unimo.it/people/paolo/disk_sched/
+Desc: BFQ v7r8 patch 3 for 4.2: Early Queue Merge (EQM)
+
Patch: 5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
From: https://github.com/graysky2/kernel_gcc_patch/
Desc: Kernel patch enables gcc >= v4.9 optimizations for additional CPUs.
diff --git a/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r8-4.2.patch b/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r8-4.2.patch
new file mode 100644
index 0000000..daf9be7
--- /dev/null
+++ b/5001_block-cgroups-kconfig-build-bits-for-BFQ-v7r8-4.2.patch
@@ -0,0 +1,104 @@
+From c710d693f32c3d4952626aa2bdcf68ac7b40dd0e Mon Sep 17 00:00:00 2001
+From: Paolo Valente <paolo.valente@unimore.it>
+Date: Tue, 7 Apr 2015 13:39:12 +0200
+Subject: [PATCH 1/3] block: cgroups, kconfig, build bits for BFQ-v7r8-4.2
+
+Update Kconfig.iosched and do the related Makefile changes to include
+kernel configuration options for BFQ. Also add the bfqio controller
+to the cgroups subsystem.
+
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
+Signed-off-by: Arianna Avanzini <avanzini@google.com>
+---
+ block/Kconfig.iosched | 32 ++++++++++++++++++++++++++++++++
+ block/Makefile | 1 +
+ include/linux/cgroup_subsys.h | 4 ++++
+ 3 files changed, 37 insertions(+)
+
+diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
+index 421bef9..0ee5f0f 100644
+--- a/block/Kconfig.iosched
++++ b/block/Kconfig.iosched
+@@ -39,6 +39,27 @@ config CFQ_GROUP_IOSCHED
+ ---help---
+ Enable group IO scheduling in CFQ.
+
++config IOSCHED_BFQ
++ tristate "BFQ I/O scheduler"
++ default n
++ ---help---
++ The BFQ I/O scheduler tries to distribute bandwidth among
++ all processes according to their weights.
++ It aims at distributing the bandwidth as desired, independently of
++ the disk parameters and with any workload. It also tries to
++ guarantee low latency to interactive and soft real-time
++ applications. If compiled built-in (saying Y here), BFQ can
++ be configured to support hierarchical scheduling.
++
++config CGROUP_BFQIO
++ bool "BFQ hierarchical scheduling support"
++ depends on CGROUPS && IOSCHED_BFQ=y
++ default n
++ ---help---
++ Enable hierarchical scheduling in BFQ, using the cgroups
++ filesystem interface. The name of the subsystem will be
++ bfqio.
++
+ choice
+ prompt "Default I/O scheduler"
+ default DEFAULT_CFQ
+@@ -52,6 +73,16 @@ choice
+ config DEFAULT_CFQ
+ bool "CFQ" if IOSCHED_CFQ=y
+
++ config DEFAULT_BFQ
++ bool "BFQ" if IOSCHED_BFQ=y
++ help
++ Selects BFQ as the default I/O scheduler which will be
++ used by default for all block devices.
++ The BFQ I/O scheduler aims at distributing the bandwidth
++ as desired, independently of the disk parameters and with
++ any workload. It also tries to guarantee low latency to
++ interactive and soft real-time applications.
++
+ config DEFAULT_NOOP
+ bool "No-op"
+
+@@ -61,6 +92,7 @@ config DEFAULT_IOSCHED
+ string
+ default "deadline" if DEFAULT_DEADLINE
+ default "cfq" if DEFAULT_CFQ
++ default "bfq" if DEFAULT_BFQ
+ default "noop" if DEFAULT_NOOP
+
+ endmenu
+diff --git a/block/Makefile b/block/Makefile
+index 00ecc97..1ed86d5 100644
+--- a/block/Makefile
++++ b/block/Makefile
+@@ -18,6 +18,7 @@ obj-$(CONFIG_BLK_DEV_THROTTLING) += blk-throttle.o
+ obj-$(CONFIG_IOSCHED_NOOP) += noop-iosched.o
+ obj-$(CONFIG_IOSCHED_DEADLINE) += deadline-iosched.o
+ obj-$(CONFIG_IOSCHED_CFQ) += cfq-iosched.o
++obj-$(CONFIG_IOSCHED_BFQ) += bfq-iosched.o
+
+ obj-$(CONFIG_BLOCK_COMPAT) += compat_ioctl.o
+ obj-$(CONFIG_BLK_CMDLINE_PARSER) += cmdline-parser.o
+diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
+index 1a96fda..81ad8a0 100644
+--- a/include/linux/cgroup_subsys.h
++++ b/include/linux/cgroup_subsys.h
+@@ -46,6 +46,10 @@ SUBSYS(freezer)
+ SUBSYS(net_cls)
+ #endif
+
++#if IS_ENABLED(CONFIG_CGROUP_BFQIO)
++SUBSYS(bfqio)
++#endif
++
+ #if IS_ENABLED(CONFIG_CGROUP_PERF)
+ SUBSYS(perf_event)
+ #endif
+--
+1.9.1
+
diff --git a/5002_block-introduce-the-BFQ-v7r8-I-O-sched-for-4.2.patch1 b/5002_block-introduce-the-BFQ-v7r8-I-O-sched-for-4.2.patch1
new file mode 100644
index 0000000..4cc232d
--- /dev/null
+++ b/5002_block-introduce-the-BFQ-v7r8-I-O-sched-for-4.2.patch1
@@ -0,0 +1,6952 @@
+From a364e1785d2eef24c2ca0ade5db036721b86c185 Mon Sep 17 00:00:00 2001
+From: Paolo Valente <paolo.valente@unimore.it>
+Date: Thu, 9 May 2013 19:10:02 +0200
+Subject: [PATCH 2/3] block: introduce the BFQ-v7r8 I/O sched for 4.2
+
+Add the BFQ-v7r8 I/O scheduler to 4.2.
+The general structure is borrowed from CFQ, as is much of the code for
+handling I/O contexts. Over time, several useful features have been
+ported from CFQ as well (details in the changelog in README.BFQ). A
+(bfq_)queue is associated to each task doing I/O on a device, and each
+time a scheduling decision has to be made a queue is selected and served
+until it expires.
+
+ - Slices are given in the service domain: tasks are assigned
+ budgets, measured in number of sectors. Once granted the disk, a task
+ must however consume its assigned budget within a configurable
+ maximum time (by default, the maximum possible value of the
+ budgets is automatically computed to comply with this timeout).
+ This allows the desired latency vs "throughput boosting" tradeoff
+ to be set.
+
+ - Budgets are scheduled according to a variant of WF2Q+, implemented
+ using an augmented rb-tree to take eligibility into account while
+ preserving an O(log N) overall complexity.
+
+ - A low-latency tunable is provided; if enabled, both interactive
+ and soft real-time applications are guaranteed a very low latency.
+
+ - Latency guarantees are preserved also in the presence of NCQ.
+
+ - Also with flash-based devices, a high throughput is achieved
+ while still preserving latency guarantees.
+
+ - BFQ features Early Queue Merge (EQM), a sort of fusion of the
+ cooperating-queue-merging and the preemption mechanisms present
+ in CFQ. EQM is in fact a unified mechanism that tries to get a
+ sequential read pattern, and hence a high throughput, with any
+ set of processes performing interleaved I/O over a contiguous
+ sequence of sectors.
+
+ - BFQ supports full hierarchical scheduling, exporting a cgroups
+ interface. Since each node has a full scheduler, each group can
+ be assigned its own weight.
+
+ - If the cgroups interface is not used, only I/O priorities can be
+ assigned to processes, with ioprio values mapped to weights
+ with the relation weight = IOPRIO_BE_NR - ioprio.
+
+ - ioprio classes are served in strict priority order, i.e., lower
+ priority queues are not served as long as there are higher
+ priority queues. Among queues in the same class the bandwidth is
+ distributed in proportion to the weight of each queue. A very
+ thin extra bandwidth is however guaranteed to the Idle class, to
+ prevent it from starving.
+
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
+Signed-off-by: Arianna Avanzini <avanzini@google.com>
+---
+ block/bfq-cgroup.c | 936 +++++++++++++
+ block/bfq-ioc.c | 36 +
+ block/bfq-iosched.c | 3898 +++++++++++++++++++++++++++++++++++++++++++++++++++
+ block/bfq-sched.c | 1208 ++++++++++++++++
+ block/bfq.h | 771 ++++++++++
+ 5 files changed, 6849 insertions(+)
+ create mode 100644 block/bfq-cgroup.c
+ create mode 100644 block/bfq-ioc.c
+ create mode 100644 block/bfq-iosched.c
+ create mode 100644 block/bfq-sched.c
+ create mode 100644 block/bfq.h
+
+diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
+new file mode 100644
+index 0000000..11e2f1d
+--- /dev/null
++++ b/block/bfq-cgroup.c
+@@ -0,0 +1,936 @@
++/*
++ * BFQ: CGROUPS support.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ * Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
++ * file.
++ */
++
++#ifdef CONFIG_CGROUP_BFQIO
++
++static DEFINE_MUTEX(bfqio_mutex);
++
++static bool bfqio_is_removed(struct bfqio_cgroup *bgrp)
++{
++ return bgrp ? !bgrp->online : false;
++}
++
++static struct bfqio_cgroup bfqio_root_cgroup = {
++ .weight = BFQ_DEFAULT_GRP_WEIGHT,
++ .ioprio = BFQ_DEFAULT_GRP_IOPRIO,
++ .ioprio_class = BFQ_DEFAULT_GRP_CLASS,
++};
++
++static inline void bfq_init_entity(struct bfq_entity *entity,
++ struct bfq_group *bfqg)
++{
++ entity->weight = entity->new_weight;
++ entity->orig_weight = entity->new_weight;
++ entity->ioprio = entity->new_ioprio;
++ entity->ioprio_class = entity->new_ioprio_class;
++ entity->parent = bfqg->my_entity;
++ entity->sched_data = &bfqg->sched_data;
++}
++
++static struct bfqio_cgroup *css_to_bfqio(struct cgroup_subsys_state *css)
++{
++ return css ? container_of(css, struct bfqio_cgroup, css) : NULL;
++}
++
++/*
++ * Search for the bfq_group of bfqd in the hash table (for now only a list)
++ * of bgrp. Must be called under rcu_read_lock().
++ */
++static struct bfq_group *bfqio_lookup_group(struct bfqio_cgroup *bgrp,
++ struct bfq_data *bfqd)
++{
++ struct bfq_group *bfqg;
++ void *key;
++
++ hlist_for_each_entry_rcu(bfqg, &bgrp->group_data, group_node) {
++ key = rcu_dereference(bfqg->bfqd);
++ if (key == bfqd)
++ return bfqg;
++ }
++
++ return NULL;
++}
++
++static inline void bfq_group_init_entity(struct bfqio_cgroup *bgrp,
++ struct bfq_group *bfqg)
++{
++ struct bfq_entity *entity = &bfqg->entity;
++
++ /*
++ * If the weight of the entity has never been set via the sysfs
++ * interface, then bgrp->weight == 0. In this case we initialize
++ * the weight from the current ioprio value. Otherwise, the group
++ * weight, if set, has priority over the ioprio value.
++ */
++ if (bgrp->weight == 0) {
++ entity->new_weight = bfq_ioprio_to_weight(bgrp->ioprio);
++ entity->new_ioprio = bgrp->ioprio;
++ } else {
++ if (bgrp->weight < BFQ_MIN_WEIGHT ||
++ bgrp->weight > BFQ_MAX_WEIGHT) {
++ printk(KERN_CRIT "bfq_group_init_entity: "
++ "bgrp->weight %d\n", bgrp->weight);
++ BUG();
++ }
++ entity->new_weight = bgrp->weight;
++ entity->new_ioprio = bfq_weight_to_ioprio(bgrp->weight);
++ }
++ entity->orig_weight = entity->weight = entity->new_weight;
++ entity->ioprio = entity->new_ioprio;
++ entity->ioprio_class = entity->new_ioprio_class = bgrp->ioprio_class;
++ entity->my_sched_data = &bfqg->sched_data;
++ bfqg->active_entities = 0;
++}
++
++static inline void bfq_group_set_parent(struct bfq_group *bfqg,
++ struct bfq_group *parent)
++{
++ struct bfq_entity *entity;
++
++ BUG_ON(parent == NULL);
++ BUG_ON(bfqg == NULL);
++
++ entity = &bfqg->entity;
++ entity->parent = parent->my_entity;
++ entity->sched_data = &parent->sched_data;
++}
++
++/**
++ * bfq_group_chain_alloc - allocate a chain of groups.
++ * @bfqd: queue descriptor.
++ * @css: the leaf cgroup_subsys_state this chain starts from.
++ *
++ * Allocate a chain of groups starting from the one belonging to
++ * @css up to the root cgroup. Stop if a cgroup on the chain
++ * to the root already has an allocated group on @bfqd.
++ */
++static struct bfq_group *bfq_group_chain_alloc(struct bfq_data *bfqd,
++ struct cgroup_subsys_state *css)
++{
++ struct bfqio_cgroup *bgrp;
++ struct bfq_group *bfqg, *prev = NULL, *leaf = NULL;
++
++ for (; css != NULL; css = css->parent) {
++ bgrp = css_to_bfqio(css);
++
++ bfqg = bfqio_lookup_group(bgrp, bfqd);
++ if (bfqg != NULL) {
++ /*
++ * All the cgroups in the path from there to the
++ * root must have a bfq_group for bfqd, so we don't
++ * need any more allocations.
++ */
++ break;
++ }
++
++ bfqg = kzalloc(sizeof(*bfqg), GFP_ATOMIC);
++ if (bfqg == NULL)
++ goto cleanup;
++
++ bfq_group_init_entity(bgrp, bfqg);
++ bfqg->my_entity = &bfqg->entity;
++
++ if (leaf == NULL) {
++ leaf = bfqg;
++ prev = leaf;
++ } else {
++ bfq_group_set_parent(prev, bfqg);
++ /*
++ * Build a list of allocated nodes using the bfqd
++ * field, which is still unused and will be
++ * initialized only after the node has been
++ * connected.
++ */
++ prev->bfqd = bfqg;
++ prev = bfqg;
++ }
++ }
++
++ return leaf;
++
++cleanup:
++ while (leaf != NULL) {
++ prev = leaf;
++ leaf = leaf->bfqd;
++ kfree(prev);
++ }
++
++ return NULL;
++}
++
++/**
++ * bfq_group_chain_link - link an allocated group chain to a cgroup
++ * hierarchy.
++ * @bfqd: the queue descriptor.
++ * @css: the leaf cgroup_subsys_state to start from.
++ * @leaf: the leaf group (to be associated to @css).
++ *
++ * Try to link a chain of groups to a cgroup hierarchy, connecting the
++ * nodes bottom-up, so we can be sure that when we find a cgroup in the
++ * hierarchy that already has a group associated to @bfqd all the nodes
++ * in the path to the root cgroup have one too.
++ *
++ * On locking: the queue lock protects the hierarchy (there is a hierarchy
++ * per device) while the bfqio_cgroup lock protects the list of groups
++ * belonging to the same cgroup.
++ */
++static void bfq_group_chain_link(struct bfq_data *bfqd,
++ struct cgroup_subsys_state *css,
++ struct bfq_group *leaf)
++{
++ struct bfqio_cgroup *bgrp;
++ struct bfq_group *bfqg, *next, *prev = NULL;
++ unsigned long flags;
++
++ assert_spin_locked(bfqd->queue->queue_lock);
++
++ for (; css != NULL && leaf != NULL; css = css->parent) {
++ bgrp = css_to_bfqio(css);
++ next = leaf->bfqd;
++
++ bfqg = bfqio_lookup_group(bgrp, bfqd);
++ BUG_ON(bfqg != NULL);
++
++ spin_lock_irqsave(&bgrp->lock, flags);
++
++ rcu_assign_pointer(leaf->bfqd, bfqd);
++ hlist_add_head_rcu(&leaf->group_node, &bgrp->group_data);
++ hlist_add_head(&leaf->bfqd_node, &bfqd->group_list);
++
++ spin_unlock_irqrestore(&bgrp->lock, flags);
++
++ prev = leaf;
++ leaf = next;
++ }
++
++ BUG_ON(css == NULL && leaf != NULL);
++ if (css != NULL && prev != NULL) {
++ bgrp = css_to_bfqio(css);
++ bfqg = bfqio_lookup_group(bgrp, bfqd);
++ bfq_group_set_parent(prev, bfqg);
++ }
++}
++
++/**
++ * bfq_find_alloc_group - return the group associated to @bfqd in @css.
++ * @bfqd: queue descriptor.
++ * @css: cgroup_subsys_state of the cgroup being searched for.
++ *
++ * Return a group associated to @bfqd in @css, allocating one if
++ * necessary. When a group is returned all the cgroups in the path
++ * to the root have a group associated to @bfqd.
++ *
++ * If the allocation fails, return the root group: this breaks guarantees
++ * but is a safe fallback. If this loss becomes a problem it can be
++ * mitigated using the equivalent weight (given by the product of the
++ * weights of the groups in the path from the group to the root) in the
++ * root scheduler.
++ *
++ * We allocate all the missing nodes in the path from the leaf cgroup
++ * to the root and we connect the nodes only after all the allocations
++ * have been successful.
++ */
++static struct bfq_group *bfq_find_alloc_group(struct bfq_data *bfqd,
++ struct cgroup_subsys_state *css)
++{
++ struct bfqio_cgroup *bgrp = css_to_bfqio(css);
++ struct bfq_group *bfqg;
++
++ bfqg = bfqio_lookup_group(bgrp, bfqd);
++ if (bfqg != NULL)
++ return bfqg;
++
++ bfqg = bfq_group_chain_alloc(bfqd, css);
++ if (bfqg != NULL)
++ bfq_group_chain_link(bfqd, css, bfqg);
++ else
++ bfqg = bfqd->root_group;
++
++ return bfqg;
++}
++
++/**
++ * bfq_bfqq_move - migrate @bfqq to @bfqg.
++ * @bfqd: queue descriptor.
++ * @bfqq: the queue to move.
++ * @entity: @bfqq's entity.
++ * @bfqg: the group to move to.
++ *
++ * Move @bfqq to @bfqg, deactivating it from its old group and reactivating
++ * it on the new one. Avoid putting the entity on the old group idle tree.
++ *
++ * Must be called under the queue lock; the cgroup owning @bfqg must
++ * not disappear (for now this just means that we are called under
++ * rcu_read_lock()).
++ */
++static void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ struct bfq_entity *entity, struct bfq_group *bfqg)
++{
++ int busy, resume;
++
++ busy = bfq_bfqq_busy(bfqq);
++ resume = !RB_EMPTY_ROOT(&bfqq->sort_list);
++
++ BUG_ON(resume && !entity->on_st);
++ BUG_ON(busy && !resume && entity->on_st &&
++ bfqq != bfqd->in_service_queue);
++
++ if (busy) {
++ BUG_ON(atomic_read(&bfqq->ref) < 2);
++
++ if (!resume)
++ bfq_del_bfqq_busy(bfqd, bfqq, 0);
++ else
++ bfq_deactivate_bfqq(bfqd, bfqq, 0);
++ } else if (entity->on_st)
++ bfq_put_idle_entity(bfq_entity_service_tree(entity), entity);
++
++ /*
++ * Here we use a reference to bfqg. We don't need a refcounter
++ * as the cgroup reference will not be dropped, so that its
++ * destroy() callback will not be invoked.
++ */
++ entity->parent = bfqg->my_entity;
++ entity->sched_data = &bfqg->sched_data;
++
++ if (busy && resume)
++ bfq_activate_bfqq(bfqd, bfqq);
++
++ if (bfqd->in_service_queue == NULL && !bfqd->rq_in_driver)
++ bfq_schedule_dispatch(bfqd);
++}
++
++/**
++ * __bfq_bic_change_cgroup - move @bic to the cgroup of @css.
++ * @bfqd: the queue descriptor.
++ * @bic: the bic to move.
++ * @css: the css of the cgroup to move to.
++ *
++ * Move @bic to @css's cgroup, assuming that bfqd->queue is locked; the
++ * caller must keep the reference to the cgroup valid across the call.
++ *
++ * NOTE: an alternative approach might have been to store the current
++ * cgroup in bfqq and get a reference to it, reducing the lookup
++ * time here, at the price of slightly more complex code.
++ */
++static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
++ struct bfq_io_cq *bic,
++ struct cgroup_subsys_state *css)
++{
++ struct bfq_queue *async_bfqq = bic_to_bfqq(bic, 0);
++ struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, 1);
++ struct bfq_entity *entity;
++ struct bfq_group *bfqg;
++ struct bfqio_cgroup *bgrp;
++
++ bgrp = css_to_bfqio(css);
++
++ bfqg = bfq_find_alloc_group(bfqd, css);
++ if (async_bfqq != NULL) {
++ entity = &async_bfqq->entity;
++
++ if (entity->sched_data != &bfqg->sched_data) {
++ bic_set_bfqq(bic, NULL, 0);
++ bfq_log_bfqq(bfqd, async_bfqq,
++ "bic_change_group: %p %d",
++ async_bfqq, atomic_read(&async_bfqq->ref));
++ bfq_put_queue(async_bfqq);
++ }
++ }
++
++ if (sync_bfqq != NULL) {
++ entity = &sync_bfqq->entity;
++ if (entity->sched_data != &bfqg->sched_data)
++ bfq_bfqq_move(bfqd, sync_bfqq, entity, bfqg);
++ }
++
++ return bfqg;
++}
++
++/**
++ * bfq_bic_change_cgroup - move @bic to the cgroup of @css.
++ * @bic: the bic being migrated.
++ * @css: the css of the destination cgroup.
++ *
++ * When the task owning @bic is moved to the cgroup of @css, @bic is
++ * immediately moved into its new parent group.
++ */
++static void bfq_bic_change_cgroup(struct bfq_io_cq *bic,
++ struct cgroup_subsys_state *css)
++{
++ struct bfq_data *bfqd;
++ unsigned long uninitialized_var(flags);
++
++ bfqd = bfq_get_bfqd_locked(&(bic->icq.q->elevator->elevator_data),
++ &flags);
++ if (bfqd != NULL) {
++ __bfq_bic_change_cgroup(bfqd, bic, css);
++ bfq_put_bfqd_unlock(bfqd, &flags);
++ }
++}
++
++/**
++ * bfq_bic_update_cgroup - update the cgroup of @bic.
++ * @bic: the @bic to update.
++ *
++ * Make sure that @bic is enqueued in the cgroup of the current task.
++ * We need this in addition to moving bics during the cgroup attach
++ * phase for two reasons: the task owning @bic could be at its first
++ * disk access, or @bic may have ended up in the root cgroup as the
++ * result of a memory allocation failure, in which case we try to
++ * move it to the right group here.
++ *
++ * Must be called under the queue lock. It is safe to use the returned
++ * value even after the rcu_read_unlock() as the migration/destruction
++ * paths act under the queue lock too. IOW it is impossible to race with
++ * group migration/destruction and end up with an invalid group as:
++ * a) here cgroup has not yet been destroyed, nor has its destroy callback
++ * started execution, as current holds a reference to it,
++ * b) if it is destroyed after rcu_read_unlock() [after current is
++ * migrated to a different cgroup] its attach() callback will have
++ * taken care of removing all the references to the old cgroup data.
++ */
++static struct bfq_group *bfq_bic_update_cgroup(struct bfq_io_cq *bic)
++{
++ struct bfq_data *bfqd = bic_to_bfqd(bic);
++ struct bfq_group *bfqg;
++ struct cgroup_subsys_state *css;
++
++ BUG_ON(bfqd == NULL);
++
++ rcu_read_lock();
++ css = task_css(current, bfqio_cgrp_id);
++ bfqg = __bfq_bic_change_cgroup(bfqd, bic, css);
++ rcu_read_unlock();
++
++ return bfqg;
++}
++
++/**
++ * bfq_flush_idle_tree - deactivate any entity on the idle tree of @st.
++ * @st: the service tree being flushed.
++ */
++static inline void bfq_flush_idle_tree(struct bfq_service_tree *st)
++{
++ struct bfq_entity *entity = st->first_idle;
++
++ for (; entity != NULL; entity = st->first_idle)
++ __bfq_deactivate_entity(entity, 0);
++}
++
++/**
++ * bfq_reparent_leaf_entity - move leaf entity to the root_group.
++ * @bfqd: the device data structure with the root group.
++ * @entity: the entity to move.
++ */
++static inline void bfq_reparent_leaf_entity(struct bfq_data *bfqd,
++ struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++ BUG_ON(bfqq == NULL);
++ bfq_bfqq_move(bfqd, bfqq, entity, bfqd->root_group);
++ return;
++}
++
++/**
++ * bfq_reparent_active_entities - move to the root group all active
++ * entities.
++ * @bfqd: the device data structure with the root group.
++ * @bfqg: the group to move from.
++ * @st: the service tree with the entities.
++ *
++ * Needs queue_lock to be taken and reference to be valid over the call.
++ */
++static inline void bfq_reparent_active_entities(struct bfq_data *bfqd,
++ struct bfq_group *bfqg,
++ struct bfq_service_tree *st)
++{
++ struct rb_root *active = &st->active;
++ struct bfq_entity *entity = NULL;
++
++ if (!RB_EMPTY_ROOT(&st->active))
++ entity = bfq_entity_of(rb_first(active));
++
++ for (; entity != NULL; entity = bfq_entity_of(rb_first(active)))
++ bfq_reparent_leaf_entity(bfqd, entity);
++
++ if (bfqg->sched_data.in_service_entity != NULL)
++ bfq_reparent_leaf_entity(bfqd,
++ bfqg->sched_data.in_service_entity);
++
++ return;
++}
++
++/**
++ * bfq_destroy_group - destroy @bfqg.
++ * @bgrp: the bfqio_cgroup containing @bfqg.
++ * @bfqg: the group being destroyed.
++ *
++ * Destroy @bfqg, making sure that it is not referenced from its parent.
++ */
++static void bfq_destroy_group(struct bfqio_cgroup *bgrp, struct bfq_group *bfqg)
++{
++ struct bfq_data *bfqd;
++ struct bfq_service_tree *st;
++ struct bfq_entity *entity = bfqg->my_entity;
++ unsigned long uninitialized_var(flags);
++ int i;
++
++ hlist_del(&bfqg->group_node);
++
++ /*
++ * Empty all service_trees belonging to this group before
++ * deactivating the group itself.
++ */
++ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++) {
++ st = bfqg->sched_data.service_tree + i;
++
++ /*
++ * The idle tree may still contain bfq_queues belonging
++ * to exited tasks because they never migrated to a different
++ * cgroup from the one being destroyed now. No one else
++ * can access them so it's safe to act without any lock.
++ */
++ bfq_flush_idle_tree(st);
++
++ /*
++ * It may happen that some queues are still active
++ * (busy) upon group destruction (if the corresponding
++ * processes have been forced to terminate). We move
++ * all the leaf entities corresponding to these queues
++ * to the root_group.
++ * Also, it may happen that the group has an entity
++ * in service, which is disconnected from the active
++ * tree: it must be moved, too.
++ * There is no need to put the sync queues, as the
++ * scheduler has taken no reference.
++ */
++ bfqd = bfq_get_bfqd_locked(&bfqg->bfqd, &flags);
++ if (bfqd != NULL) {
++ bfq_reparent_active_entities(bfqd, bfqg, st);
++ bfq_put_bfqd_unlock(bfqd, &flags);
++ }
++ BUG_ON(!RB_EMPTY_ROOT(&st->active));
++ BUG_ON(!RB_EMPTY_ROOT(&st->idle));
++ }
++ BUG_ON(bfqg->sched_data.next_in_service != NULL);
++ BUG_ON(bfqg->sched_data.in_service_entity != NULL);
++
++ /*
++ * We may race with device destruction, take extra care when
++ * dereferencing bfqg->bfqd.
++ */
++ bfqd = bfq_get_bfqd_locked(&bfqg->bfqd, &flags);
++ if (bfqd != NULL) {
++ hlist_del(&bfqg->bfqd_node);
++ __bfq_deactivate_entity(entity, 0);
++ bfq_put_async_queues(bfqd, bfqg);
++ bfq_put_bfqd_unlock(bfqd, &flags);
++ }
++ BUG_ON(entity->tree != NULL);
++
++ /*
++ * No need to defer the kfree() to the end of the RCU grace
++ * period: we are called from the destroy() callback of our
++ * cgroup, so we can be sure that no one is a) still using
++ * this cgroup or b) doing lookups in it.
++ */
++ kfree(bfqg);
++}
++
++static void bfq_end_wr_async(struct bfq_data *bfqd)
++{
++ struct hlist_node *tmp;
++ struct bfq_group *bfqg;
++
++ hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node)
++ bfq_end_wr_async_queues(bfqd, bfqg);
++ bfq_end_wr_async_queues(bfqd, bfqd->root_group);
++}
++
++/**
++ * bfq_disconnect_groups - disconnect @bfqd from all its groups.
++ * @bfqd: the device descriptor being exited.
++ *
++ * When the device exits we just make sure that no lookup can return
++ * the now unused group structures. They will be deallocated on cgroup
++ * destruction.
++ */
++static void bfq_disconnect_groups(struct bfq_data *bfqd)
++{
++ struct hlist_node *tmp;
++ struct bfq_group *bfqg;
++
++ bfq_log(bfqd, "disconnect_groups beginning");
++ hlist_for_each_entry_safe(bfqg, tmp, &bfqd->group_list, bfqd_node) {
++ hlist_del(&bfqg->bfqd_node);
++
++ __bfq_deactivate_entity(bfqg->my_entity, 0);
++
++ /*
++ * Don't remove from the group hash, just set an
++ * invalid key. No lookups can race with the
++ * assignment as bfqd is being destroyed; this
++ * implies also that new elements cannot be added
++ * to the list.
++ */
++ rcu_assign_pointer(bfqg->bfqd, NULL);
++
++ bfq_log(bfqd, "disconnect_groups: put async for group %p",
++ bfqg);
++ bfq_put_async_queues(bfqd, bfqg);
++ }
++}
++
++static inline void bfq_free_root_group(struct bfq_data *bfqd)
++{
++ struct bfqio_cgroup *bgrp = &bfqio_root_cgroup;
++ struct bfq_group *bfqg = bfqd->root_group;
++
++ bfq_put_async_queues(bfqd, bfqg);
++
++ spin_lock_irq(&bgrp->lock);
++ hlist_del_rcu(&bfqg->group_node);
++ spin_unlock_irq(&bgrp->lock);
++
++ /*
++ * No need to synchronize_rcu() here: since the device is gone
++ * there cannot be any read-side access to its root_group.
++ */
++ kfree(bfqg);
++}
++
++static struct bfq_group *bfq_alloc_root_group(struct bfq_data *bfqd, int node)
++{
++ struct bfq_group *bfqg;
++ struct bfqio_cgroup *bgrp;
++ int i;
++
++ bfqg = kzalloc_node(sizeof(*bfqg), GFP_KERNEL, node);
++ if (bfqg == NULL)
++ return NULL;
++
++ bfqg->entity.parent = NULL;
++ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
++ bfqg->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
++
++ bgrp = &bfqio_root_cgroup;
++ spin_lock_irq(&bgrp->lock);
++ rcu_assign_pointer(bfqg->bfqd, bfqd);
++ hlist_add_head_rcu(&bfqg->group_node, &bgrp->group_data);
++ spin_unlock_irq(&bgrp->lock);
++
++ return bfqg;
++}
++
++#define SHOW_FUNCTION(__VAR) \
++static u64 bfqio_cgroup_##__VAR##_read(struct cgroup_subsys_state *css, \
++ struct cftype *cftype) \
++{ \
++ struct bfqio_cgroup *bgrp = css_to_bfqio(css); \
++ u64 ret = -ENODEV; \
++ \
++ mutex_lock(&bfqio_mutex); \
++ if (bfqio_is_removed(bgrp)) \
++ goto out_unlock; \
++ \
++ spin_lock_irq(&bgrp->lock); \
++ ret = bgrp->__VAR; \
++ spin_unlock_irq(&bgrp->lock); \
++ \
++out_unlock: \
++ mutex_unlock(&bfqio_mutex); \
++ return ret; \
++}
++
++SHOW_FUNCTION(weight);
++SHOW_FUNCTION(ioprio);
++SHOW_FUNCTION(ioprio_class);
++#undef SHOW_FUNCTION
++
++#define STORE_FUNCTION(__VAR, __MIN, __MAX) \
++static int bfqio_cgroup_##__VAR##_write(struct cgroup_subsys_state *css,\
++ struct cftype *cftype, \
++ u64 val) \
++{ \
++ struct bfqio_cgroup *bgrp = css_to_bfqio(css); \
++ struct bfq_group *bfqg; \
++ int ret = -EINVAL; \
++ \
++ if (val < (__MIN) || val > (__MAX)) \
++ return ret; \
++ \
++ ret = -ENODEV; \
++ mutex_lock(&bfqio_mutex); \
++ if (bfqio_is_removed(bgrp)) \
++ goto out_unlock; \
++ ret = 0; \
++ \
++ spin_lock_irq(&bgrp->lock); \
++ bgrp->__VAR = (unsigned short)val; \
++ hlist_for_each_entry(bfqg, &bgrp->group_data, group_node) { \
++ /* \
++ * Setting the ioprio_changed flag of the entity \
++ * to 1 with new_##__VAR == ##__VAR would re-set \
++ * the value of the weight to its ioprio mapping. \
++ * Set the flag only if necessary. \
++ */ \
++ if ((unsigned short)val != bfqg->entity.new_##__VAR) { \
++ bfqg->entity.new_##__VAR = (unsigned short)val; \
++ /* \
++ * Make sure that the above new value has been \
++ * stored in bfqg->entity.new_##__VAR before \
++ * setting the ioprio_changed flag. In fact, \
++ * this flag may be read asynchronously (in \
++ * critical sections protected by a different \
++ * lock than that held here), and finding this \
++ * flag set may cause the execution of the code \
++ * for updating parameters whose value may \
++ * depend also on bfqg->entity.new_##__VAR (in \
++ * __bfq_entity_update_weight_prio). \
++ * This barrier makes sure that the new value \
++ * of bfqg->entity.new_##__VAR is correctly \
++ * seen in that code. \
++ */ \
++ smp_wmb(); \
++ bfqg->entity.ioprio_changed = 1; \
++ } \
++ } \
++ spin_unlock_irq(&bgrp->lock); \
++ \
++out_unlock: \
++ mutex_unlock(&bfqio_mutex); \
++ return ret; \
++}
++
++STORE_FUNCTION(weight, BFQ_MIN_WEIGHT, BFQ_MAX_WEIGHT);
++STORE_FUNCTION(ioprio, 0, IOPRIO_BE_NR - 1);
++STORE_FUNCTION(ioprio_class, IOPRIO_CLASS_RT, IOPRIO_CLASS_IDLE);
++#undef STORE_FUNCTION
++
++static struct cftype bfqio_files[] = {
++ {
++ .name = "weight",
++ .read_u64 = bfqio_cgroup_weight_read,
++ .write_u64 = bfqio_cgroup_weight_write,
++ },
++ {
++ .name = "ioprio",
++ .read_u64 = bfqio_cgroup_ioprio_read,
++ .write_u64 = bfqio_cgroup_ioprio_write,
++ },
++ {
++ .name = "ioprio_class",
++ .read_u64 = bfqio_cgroup_ioprio_class_read,
++ .write_u64 = bfqio_cgroup_ioprio_class_write,
++ },
++ { }, /* terminate */
++};
++
++static struct cgroup_subsys_state *bfqio_create(struct cgroup_subsys_state
++ *parent_css)
++{
++ struct bfqio_cgroup *bgrp;
++
++ if (parent_css != NULL) {
++ bgrp = kzalloc(sizeof(*bgrp), GFP_KERNEL);
++ if (bgrp == NULL)
++ return ERR_PTR(-ENOMEM);
++ } else
++ bgrp = &bfqio_root_cgroup;
++
++ spin_lock_init(&bgrp->lock);
++ INIT_HLIST_HEAD(&bgrp->group_data);
++ bgrp->ioprio = BFQ_DEFAULT_GRP_IOPRIO;
++ bgrp->ioprio_class = BFQ_DEFAULT_GRP_CLASS;
++
++ return &bgrp->css;
++}
++
++/*
++ * We cannot support shared io contexts, as we have no means to support
++ * two tasks with the same ioc in two different groups without major rework
++ * of the main bic/bfqq data structures. For now we allow a task to change
++ * its cgroup only if it's the only owner of its ioc; the drawback of this
++ * behavior is that a group containing a task that forked using CLONE_IO
++ * will not be destroyed until the tasks sharing the ioc die.
++ */
++static int bfqio_can_attach(struct cgroup_subsys_state *css,
++ struct cgroup_taskset *tset)
++{
++ struct task_struct *task;
++ struct io_context *ioc;
++ int ret = 0;
++
++ cgroup_taskset_for_each(task, tset) {
++ /*
++ * task_lock() is needed to avoid races with
++ * exit_io_context()
++ */
++ task_lock(task);
++ ioc = task->io_context;
++ if (ioc != NULL && atomic_read(&ioc->nr_tasks) > 1)
++ /*
++ * ioc == NULL means that the task is either too
++ * young or exiting: if it still has no ioc the
++ * ioc can't be shared, if the task is exiting the
++ * attach will fail anyway, no matter what we
++ * return here.
++ */
++ ret = -EINVAL;
++ task_unlock(task);
++ if (ret)
++ break;
++ }
++
++ return ret;
++}
++
++static void bfqio_attach(struct cgroup_subsys_state *css,
++ struct cgroup_taskset *tset)
++{
++ struct task_struct *task;
++ struct io_context *ioc;
++ struct io_cq *icq;
++
++ /*
++ * IMPORTANT NOTE: The move of more than one process at a time to a
++ * new group has not yet been tested.
++ */
++ cgroup_taskset_for_each(task, tset) {
++ ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE);
++ if (ioc) {
++ /*
++ * Handle cgroup change here.
++ */
++ rcu_read_lock();
++ hlist_for_each_entry_rcu(icq, &ioc->icq_list, ioc_node)
++ if (!strncmp(
++ icq->q->elevator->type->elevator_name,
++ "bfq", ELV_NAME_MAX))
++ bfq_bic_change_cgroup(icq_to_bic(icq),
++ css);
++ rcu_read_unlock();
++ put_io_context(ioc);
++ }
++ }
++}
++
++static void bfqio_destroy(struct cgroup_subsys_state *css)
++{
++ struct bfqio_cgroup *bgrp = css_to_bfqio(css);
++ struct hlist_node *tmp;
++ struct bfq_group *bfqg;
++
++ /*
++ * Since we are destroying the cgroup, there are no more tasks
++ * referencing it, and all the RCU grace periods that may have
++ * referenced it have ended (as the destruction of the parent
++ * cgroup is RCU-safe); bgrp->group_data will not be accessed by
++ * anything else and we don't need any synchronization.
++ */
++ hlist_for_each_entry_safe(bfqg, tmp, &bgrp->group_data, group_node)
++ bfq_destroy_group(bgrp, bfqg);
++
++ BUG_ON(!hlist_empty(&bgrp->group_data));
++
++ kfree(bgrp);
++}
++
++static int bfqio_css_online(struct cgroup_subsys_state *css)
++{
++ struct bfqio_cgroup *bgrp = css_to_bfqio(css);
++
++ mutex_lock(&bfqio_mutex);
++ bgrp->online = true;
++ mutex_unlock(&bfqio_mutex);
++
++ return 0;
++}
++
++static void bfqio_css_offline(struct cgroup_subsys_state *css)
++{
++ struct bfqio_cgroup *bgrp = css_to_bfqio(css);
++
++ mutex_lock(&bfqio_mutex);
++ bgrp->online = false;
++ mutex_unlock(&bfqio_mutex);
++}
++
++struct cgroup_subsys bfqio_cgrp_subsys = {
++ .css_alloc = bfqio_create,
++ .css_online = bfqio_css_online,
++ .css_offline = bfqio_css_offline,
++ .can_attach = bfqio_can_attach,
++ .attach = bfqio_attach,
++ .css_free = bfqio_destroy,
++ .legacy_cftypes = bfqio_files,
++};
++#else
++static inline void bfq_init_entity(struct bfq_entity *entity,
++ struct bfq_group *bfqg)
++{
++ entity->weight = entity->new_weight;
++ entity->orig_weight = entity->new_weight;
++ entity->ioprio = entity->new_ioprio;
++ entity->ioprio_class = entity->new_ioprio_class;
++ entity->sched_data = &bfqg->sched_data;
++}
++
++static inline struct bfq_group *
++bfq_bic_update_cgroup(struct bfq_io_cq *bic)
++{
++ struct bfq_data *bfqd = bic_to_bfqd(bic);
++ return bfqd->root_group;
++}
++
++static inline void bfq_bfqq_move(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ struct bfq_entity *entity,
++ struct bfq_group *bfqg)
++{
++}
++
++static void bfq_end_wr_async(struct bfq_data *bfqd)
++{
++ bfq_end_wr_async_queues(bfqd, bfqd->root_group);
++}
++
++static inline void bfq_disconnect_groups(struct bfq_data *bfqd)
++{
++ bfq_put_async_queues(bfqd, bfqd->root_group);
++}
++
++static inline void bfq_free_root_group(struct bfq_data *bfqd)
++{
++ kfree(bfqd->root_group);
++}
++
++static struct bfq_group *bfq_alloc_root_group(struct bfq_data *bfqd, int node)
++{
++ struct bfq_group *bfqg;
++ int i;
++
++ bfqg = kmalloc_node(sizeof(*bfqg), GFP_KERNEL | __GFP_ZERO, node);
++ if (bfqg == NULL)
++ return NULL;
++
++ for (i = 0; i < BFQ_IOPRIO_CLASSES; i++)
++ bfqg->sched_data.service_tree[i] = BFQ_SERVICE_TREE_INIT;
++
++ return bfqg;
++}
++#endif
+diff --git a/block/bfq-ioc.c b/block/bfq-ioc.c
+new file mode 100644
+index 0000000..7f6b000
+--- /dev/null
++++ b/block/bfq-ioc.c
+@@ -0,0 +1,36 @@
++/*
++ * BFQ: I/O context handling.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ * Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ */
++
++/**
++ * icq_to_bic - convert iocontext queue structure to bfq_io_cq.
++ * @icq: the iocontext queue.
++ */
++static inline struct bfq_io_cq *icq_to_bic(struct io_cq *icq)
++{
++ /* bic->icq is the first member, %NULL will convert to %NULL */
++ return container_of(icq, struct bfq_io_cq, icq);
++}
++
++/**
++ * bfq_bic_lookup - search @ioc for a bic associated with @bfqd.
++ * @bfqd: the lookup key.
++ * @ioc: the io_context of the process doing I/O.
++ *
++ * Queue lock must be held.
++ */
++static inline struct bfq_io_cq *bfq_bic_lookup(struct bfq_data *bfqd,
++ struct io_context *ioc)
++{
++ if (ioc)
++ return icq_to_bic(ioc_lookup_icq(ioc, bfqd->queue));
++ return NULL;
++}
+diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
+new file mode 100644
+index 0000000..773b2ee
+--- /dev/null
++++ b/block/bfq-iosched.c
+@@ -0,0 +1,3898 @@
++/*
++ * Budget Fair Queueing (BFQ) disk scheduler.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ * Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Licensed under the GPL-2 as detailed in the accompanying COPYING.BFQ
++ * file.
++ *
++ * BFQ is a proportional-share storage-I/O scheduling algorithm based on
++ * the slice-by-slice service scheme of CFQ. But BFQ assigns budgets,
++ * measured in number of sectors, to processes instead of time slices. The
++ * device is not granted to the in-service process for a given time slice,
++ * but until it has exhausted its assigned budget. This change from the time
++ * to the service domain allows BFQ to distribute the device throughput
++ * among processes as desired, without any distortion due to ZBR, workload
++ * fluctuations or other factors. BFQ uses an ad hoc internal scheduler,
++ * called B-WF2Q+, to schedule processes according to their budgets. More
++ * precisely, BFQ schedules queues associated to processes. Thanks to the
++ * accurate policy of B-WF2Q+, BFQ can afford to assign high budgets to
++ * I/O-bound processes issuing sequential requests (to boost the
++ * throughput), and yet guarantee a low latency to interactive and soft
++ * real-time applications.
++ *
++ * BFQ is described in [1], which also references the initial, more
++ * theoretical paper on BFQ. The interested reader can find
++ * in the latter paper full details on the main algorithm, as well as
++ * formulas of the guarantees and formal proofs of all the properties.
++ * With respect to the version of BFQ presented in these papers, this
++ * implementation adds a few more heuristics, such as the one that
++ * guarantees a low latency to soft real-time applications, and a
++ * hierarchical extension based on H-WF2Q+.
++ *
++ * B-WF2Q+ is based on WF2Q+, that is described in [2], together with
++ * H-WF2Q+, while the augmented tree used to implement B-WF2Q+ with O(log N)
++ * complexity derives from the one introduced with EEVDF in [3].
++ *
++ * [1] P. Valente and M. Andreolini, ``Improving Application Responsiveness
++ * with the BFQ Disk I/O Scheduler'',
++ * Proceedings of the 5th Annual International Systems and Storage
++ * Conference (SYSTOR '12), June 2012.
++ *
++ * http://algogroup.unimo.it/people/paolo/disk_sched/bf1-v1-suite-results.pdf
++ *
++ * [2] Jon C.R. Bennett and H. Zhang, ``Hierarchical Packet Fair Queueing
++ * Algorithms,'' IEEE/ACM Transactions on Networking, 5(5):675-689,
++ * Oct 1997.
++ *
++ * http://www.cs.cmu.edu/~hzhang/papers/TON-97-Oct.ps.gz
++ *
++ * [3] I. Stoica and H. Abdel-Wahab, ``Earliest Eligible Virtual Deadline
++ * First: A Flexible and Accurate Mechanism for Proportional Share
++ * Resource Allocation,'' technical report.
++ *
++ * http://www.cs.berkeley.edu/~istoica/papers/eevdf-tr-95.pdf
++ */
++#include <linux/module.h>
++#include <linux/slab.h>
++#include <linux/blkdev.h>
++#include <linux/cgroup.h>
++#include <linux/elevator.h>
++#include <linux/jiffies.h>
++#include <linux/rbtree.h>
++#include <linux/ioprio.h>
++#include "bfq.h"
++#include "blk.h"
++
++/* Expiration time of sync (0) and async (1) requests, in jiffies. */
++static const int bfq_fifo_expire[2] = { HZ / 4, HZ / 8 };
++
++/* Maximum backwards seek, in KiB. */
++static const int bfq_back_max = 16 * 1024;
++
++/* Penalty of a backwards seek, in number of sectors. */
++static const int bfq_back_penalty = 2;
++
++/* Idling period duration, in jiffies. */
++static int bfq_slice_idle = HZ / 125;
++
++/* Default maximum budget values, in sectors and number of requests. */
++static const int bfq_default_max_budget = 16 * 1024;
++static const int bfq_max_budget_async_rq = 4;
++
++/*
++ * Async to sync throughput distribution is controlled as follows:
++ * when an async request is served, the entity is charged the number
++ * of sectors of the request, multiplied by the factor below
++ */
++static const int bfq_async_charge_factor = 10;
++
++/* Default timeout values, in jiffies, approximating CFQ defaults. */
++static const int bfq_timeout_sync = HZ / 8;
++static int bfq_timeout_async = HZ / 25;
++
++struct kmem_cache *bfq_pool;
++
++/* Below this threshold (in ms), we consider thinktime immediate. */
++#define BFQ_MIN_TT 2
++
++/* hw_tag detection: parallel requests threshold and min samples needed. */
++#define BFQ_HW_QUEUE_THRESHOLD 4
++#define BFQ_HW_QUEUE_SAMPLES 32
++
++#define BFQQ_SEEK_THR (sector_t)(8 * 1024)
++#define BFQQ_SEEKY(bfqq) ((bfqq)->seek_mean > BFQQ_SEEK_THR)
++
++/* Min samples used for peak rate estimation (for autotuning). */
++#define BFQ_PEAK_RATE_SAMPLES 32
++
++/* Shift used for peak rate fixed precision calculations. */
++#define BFQ_RATE_SHIFT 16
++
++/*
++ * By default, BFQ computes the duration of the weight raising for
++ * interactive applications automatically, using the following formula:
++ * duration = (R / r) * T, where r is the peak rate of the device, and
++ * R and T are two reference parameters.
++ * In particular, R is the peak rate of the reference device (see below),
++ * and T is a reference time: given the systems that are likely to be
++ * installed on the reference device according to its speed class, T is
++ * about the maximum time needed, under BFQ and while reading two files in
++ * parallel, to load typical large applications on these systems.
++ * In practice, the slower/faster the device at hand is, the more/less time
++ * it takes to load applications with respect to the reference device.
++ * Accordingly, the longer/shorter BFQ grants weight raising to interactive
++ * applications.
++ *
++ * BFQ uses four different reference pairs (R, T), depending on:
++ * . whether the device is rotational or non-rotational;
++ * . whether the device is slow, such as old or portable HDDs, as well as
++ * SD cards, or fast, such as newer HDDs and SSDs.
++ *
++ * The device's speed class is dynamically (re)detected in
++ * bfq_update_peak_rate() every time the estimated peak rate is updated.
++ *
++ * In the following definitions, R_slow[0]/R_fast[0] and T_slow[0]/T_fast[0]
++ * are the reference values for a slow/fast rotational device, whereas
++ * R_slow[1]/R_fast[1] and T_slow[1]/T_fast[1] are the reference values for
++ * a slow/fast non-rotational device. Finally, device_speed_thresh are the
++ * thresholds used to switch between speed classes.
++ * Both the reference peak rates and the thresholds are measured in
++ * sectors/usec, left-shifted by BFQ_RATE_SHIFT.
++ */
++static int R_slow[2] = {1536, 10752};
++static int R_fast[2] = {17415, 34791};
++/*
++ * To improve readability, a conversion function is used to initialize the
++ * following arrays, which entails that they can be initialized only in a
++ * function.
++ */
++static int T_slow[2];
++static int T_fast[2];
++static int device_speed_thresh[2];
++
++#define BFQ_SERVICE_TREE_INIT ((struct bfq_service_tree) \
++ { RB_ROOT, RB_ROOT, NULL, NULL, 0, 0 })
++
++#define RQ_BIC(rq) ((struct bfq_io_cq *) (rq)->elv.priv[0])
++#define RQ_BFQQ(rq) ((rq)->elv.priv[1])
++
++static inline void bfq_schedule_dispatch(struct bfq_data *bfqd);
++
++#include "bfq-ioc.c"
++#include "bfq-sched.c"
++#include "bfq-cgroup.c"
++
++#define bfq_class_idle(bfqq) ((bfqq)->entity.ioprio_class ==\
++ IOPRIO_CLASS_IDLE)
++#define bfq_class_rt(bfqq) ((bfqq)->entity.ioprio_class ==\
++ IOPRIO_CLASS_RT)
++
++#define bfq_sample_valid(samples) ((samples) > 80)
++
++/*
++ * The following macro groups conditions that need to be evaluated when
++ * checking if existing queues and groups form a symmetric scenario
++ * and therefore idling can be reduced or disabled for some of the
++ * queues. See the comment to the function bfq_bfqq_must_not_expire()
++ * for further details.
++ */
++#ifdef CONFIG_CGROUP_BFQIO
++#define symmetric_scenario (!bfqd->active_numerous_groups && \
++ !bfq_differentiated_weights(bfqd))
++#else
++#define symmetric_scenario (!bfq_differentiated_weights(bfqd))
++#endif
++
++/*
++ * We regard a request as SYNC, if either it's a read or has the SYNC bit
++ * set (in which case it could also be a direct WRITE).
++ */
++static inline int bfq_bio_sync(struct bio *bio)
++{
++ if (bio_data_dir(bio) == READ || (bio->bi_rw & REQ_SYNC))
++ return 1;
++
++ return 0;
++}
++
++/*
++ * Schedule a run of the queue if there are requests pending and no one in
++ * the driver that will restart queueing.
++ */
++static inline void bfq_schedule_dispatch(struct bfq_data *bfqd)
++{
++ if (bfqd->queued != 0) {
++ bfq_log(bfqd, "schedule dispatch");
++ kblockd_schedule_work(&bfqd->unplug_work);
++ }
++}
++
++/*
++ * Lifted from AS - choose which of rq1 and rq2 is best served now.
++ * We choose the request that is closest to the head right now. Distance
++ * behind the head is penalized and only allowed to a certain extent.
++ */
++static struct request *bfq_choose_req(struct bfq_data *bfqd,
++ struct request *rq1,
++ struct request *rq2,
++ sector_t last)
++{
++ sector_t s1, s2, d1 = 0, d2 = 0;
++ unsigned long back_max;
++#define BFQ_RQ1_WRAP 0x01 /* request 1 wraps */
++#define BFQ_RQ2_WRAP 0x02 /* request 2 wraps */
++ unsigned wrap = 0; /* bit mask: requests behind the disk head? */
++
++ if (rq1 == NULL || rq1 == rq2)
++ return rq2;
++ if (rq2 == NULL)
++ return rq1;
++
++ if (rq_is_sync(rq1) && !rq_is_sync(rq2))
++ return rq1;
++ else if (rq_is_sync(rq2) && !rq_is_sync(rq1))
++ return rq2;
++ if ((rq1->cmd_flags & REQ_META) && !(rq2->cmd_flags & REQ_META))
++ return rq1;
++ else if ((rq2->cmd_flags & REQ_META) && !(rq1->cmd_flags & REQ_META))
++ return rq2;
++
++ s1 = blk_rq_pos(rq1);
++ s2 = blk_rq_pos(rq2);
++
++ /*
++ * By definition, 1KiB is 2 sectors.
++ */
++ back_max = bfqd->bfq_back_max * 2;
++
++ /*
++ * Strict one way elevator _except_ in the case where we allow
++ * short backward seeks which are biased as twice the cost of a
++ * similar forward seek.
++ */
++ if (s1 >= last)
++ d1 = s1 - last;
++ else if (s1 + back_max >= last)
++ d1 = (last - s1) * bfqd->bfq_back_penalty;
++ else
++ wrap |= BFQ_RQ1_WRAP;
++
++ if (s2 >= last)
++ d2 = s2 - last;
++ else if (s2 + back_max >= last)
++ d2 = (last - s2) * bfqd->bfq_back_penalty;
++ else
++ wrap |= BFQ_RQ2_WRAP;
++
++ /* Found required data */
++
++ /*
++ * By doing switch() on the bit mask "wrap" we avoid having to
++ * check two variables for all permutations: --> faster!
++ */
++ switch (wrap) {
++ case 0: /* common case for CFQ: rq1 and rq2 not wrapped */
++ if (d1 < d2)
++ return rq1;
++ else if (d2 < d1)
++ return rq2;
++ else {
++ if (s1 >= s2)
++ return rq1;
++ else
++ return rq2;
++ }
++
++ case BFQ_RQ2_WRAP:
++ return rq1;
++ case BFQ_RQ1_WRAP:
++ return rq2;
++ case (BFQ_RQ1_WRAP|BFQ_RQ2_WRAP): /* both rqs wrapped */
++ default:
++ /*
++ * Since both rqs are wrapped,
++ * start with the one that's further behind head
++ * (--> only *one* back seek required),
++ * since back seek takes more time than forward.
++ */
++ if (s1 <= s2)
++ return rq1;
++ else
++ return rq2;
++ }
++}
++
++static struct bfq_queue *
++bfq_rq_pos_tree_lookup(struct bfq_data *bfqd, struct rb_root *root,
++ sector_t sector, struct rb_node **ret_parent,
++ struct rb_node ***rb_link)
++{
++ struct rb_node **p, *parent;
++ struct bfq_queue *bfqq = NULL;
++
++ parent = NULL;
++ p = &root->rb_node;
++ while (*p) {
++ struct rb_node **n;
++
++ parent = *p;
++ bfqq = rb_entry(parent, struct bfq_queue, pos_node);
++
++ /*
++ * Sort strictly based on sector. Smallest to the left,
++ * largest to the right.
++ */
++ if (sector > blk_rq_pos(bfqq->next_rq))
++ n = &(*p)->rb_right;
++ else if (sector < blk_rq_pos(bfqq->next_rq))
++ n = &(*p)->rb_left;
++ else
++ break;
++ p = n;
++ bfqq = NULL;
++ }
++
++ *ret_parent = parent;
++ if (rb_link)
++ *rb_link = p;
++
++ bfq_log(bfqd, "rq_pos_tree_lookup %llu: returning %d",
++ (long long unsigned)sector,
++ bfqq != NULL ? bfqq->pid : 0);
++
++ return bfqq;
++}
++
++static void bfq_rq_pos_tree_add(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ struct rb_node **p, *parent;
++ struct bfq_queue *__bfqq;
++
++ if (bfqq->pos_root != NULL) {
++ rb_erase(&bfqq->pos_node, bfqq->pos_root);
++ bfqq->pos_root = NULL;
++ }
++
++ if (bfq_class_idle(bfqq))
++ return;
++ if (!bfqq->next_rq)
++ return;
++
++ bfqq->pos_root = &bfqd->rq_pos_tree;
++ __bfqq = bfq_rq_pos_tree_lookup(bfqd, bfqq->pos_root,
++ blk_rq_pos(bfqq->next_rq), &parent, &p);
++ if (__bfqq == NULL) {
++ rb_link_node(&bfqq->pos_node, parent, p);
++ rb_insert_color(&bfqq->pos_node, bfqq->pos_root);
++ } else
++ bfqq->pos_root = NULL;
++}
++
++/*
++ * Tell whether there are active queues or groups with differentiated weights.
++ */
++static inline bool bfq_differentiated_weights(struct bfq_data *bfqd)
++{
++ /*
++ * For weights to differ, at least one of the trees must contain
++ * at least two nodes.
++ */
++ return (!RB_EMPTY_ROOT(&bfqd->queue_weights_tree) &&
++ (bfqd->queue_weights_tree.rb_node->rb_left ||
++ bfqd->queue_weights_tree.rb_node->rb_right)
++#ifdef CONFIG_CGROUP_BFQIO
++ ) ||
++ (!RB_EMPTY_ROOT(&bfqd->group_weights_tree) &&
++ (bfqd->group_weights_tree.rb_node->rb_left ||
++ bfqd->group_weights_tree.rb_node->rb_right)
++#endif
++ );
++}
++
++/*
++ * If the weight-counter tree passed as input contains no counter for
++ * the weight of the input entity, then add that counter; otherwise just
++ * increment the existing counter.
++ *
++ * Note that weight-counter trees contain few nodes in mostly symmetric
++ * scenarios. For example, if all queues have the same weight, then the
++ * weight-counter tree for the queues may contain at most one node.
++ * This holds even if low_latency is on, because weight-raised queues
++ * are not inserted in the tree.
++ * In most scenarios, the rate at which nodes are created/destroyed
++ * should be low too.
++ */
++static void bfq_weights_tree_add(struct bfq_data *bfqd,
++ struct bfq_entity *entity,
++ struct rb_root *root)
++{
++ struct rb_node **new = &(root->rb_node), *parent = NULL;
++
++ /*
++ * Do not insert if the entity is already associated with a
++ * counter, which happens if:
++ * 1) the entity is associated with a queue,
++ * 2) a request arrival has caused the queue to become both
++ * non-weight-raised, and hence change its weight, and
++ * backlogged; in this respect, each of the two events
++ * causes an invocation of this function,
++ * 3) this is the invocation of this function caused by the
++ * second event. This second invocation is actually useless,
++ * and we handle this fact by exiting immediately. More
++ * efficient or clearer solutions might possibly be adopted.
++ */
++ if (entity->weight_counter)
++ return;
++
++ while (*new) {
++ struct bfq_weight_counter *__counter = container_of(*new,
++ struct bfq_weight_counter,
++ weights_node);
++ parent = *new;
++
++ if (entity->weight == __counter->weight) {
++ entity->weight_counter = __counter;
++ goto inc_counter;
++ }
++ if (entity->weight < __counter->weight)
++ new = &((*new)->rb_left);
++ else
++ new = &((*new)->rb_right);
++ }
++
++ entity->weight_counter = kzalloc(sizeof(struct bfq_weight_counter),
++ GFP_ATOMIC);
++ entity->weight_counter->weight = entity->weight;
++ rb_link_node(&entity->weight_counter->weights_node, parent, new);
++ rb_insert_color(&entity->weight_counter->weights_node, root);
++
++inc_counter:
++ entity->weight_counter->num_active++;
++}
++
++/*
++ * Decrement the weight counter associated with the entity, and, if the
++ * counter reaches 0, remove the counter from the tree.
++ * See the comments to the function bfq_weights_tree_add() for considerations
++ * about overhead.
++ */
++static void bfq_weights_tree_remove(struct bfq_data *bfqd,
++ struct bfq_entity *entity,
++ struct rb_root *root)
++{
++ if (!entity->weight_counter)
++ return;
++
++ BUG_ON(RB_EMPTY_ROOT(root));
++ BUG_ON(entity->weight_counter->weight != entity->weight);
++
++ BUG_ON(!entity->weight_counter->num_active);
++ entity->weight_counter->num_active--;
++ if (entity->weight_counter->num_active > 0)
++ goto reset_entity_pointer;
++
++ rb_erase(&entity->weight_counter->weights_node, root);
++ kfree(entity->weight_counter);
++
++reset_entity_pointer:
++ entity->weight_counter = NULL;
++}
++
++static struct request *bfq_find_next_rq(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ struct request *last)
++{
++ struct rb_node *rbnext = rb_next(&last->rb_node);
++ struct rb_node *rbprev = rb_prev(&last->rb_node);
++ struct request *next = NULL, *prev = NULL;
++
++ BUG_ON(RB_EMPTY_NODE(&last->rb_node));
++
++ if (rbprev != NULL)
++ prev = rb_entry_rq(rbprev);
++
++ if (rbnext != NULL)
++ next = rb_entry_rq(rbnext);
++ else {
++ rbnext = rb_first(&bfqq->sort_list);
++ if (rbnext && rbnext != &last->rb_node)
++ next = rb_entry_rq(rbnext);
++ }
++
++ return bfq_choose_req(bfqd, next, prev, blk_rq_pos(last));
++}
++
++/* see the definition of bfq_async_charge_factor for details */
++static inline unsigned long bfq_serv_to_charge(struct request *rq,
++ struct bfq_queue *bfqq)
++{
++ return blk_rq_sectors(rq) *
++ (1 + ((!bfq_bfqq_sync(bfqq)) * (bfqq->wr_coeff == 1) *
++ bfq_async_charge_factor));
++}
++
++/**
++ * bfq_updated_next_req - update the queue after a new next_rq selection.
++ * @bfqd: the device data the queue belongs to.
++ * @bfqq: the queue to update.
++ *
++ * If the first request of a queue changes, we make sure that the queue
++ * has enough budget to serve at least its first request (if the
++ * request has grown). We do this because, if the queue does not have
++ * enough budget for its first request, it has to go through two
++ * dispatch rounds to actually get it dispatched.
++ */
++static void bfq_updated_next_req(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++ struct bfq_service_tree *st = bfq_entity_service_tree(entity);
++ struct request *next_rq = bfqq->next_rq;
++ unsigned long new_budget;
++
++ if (next_rq == NULL)
++ return;
++
++ if (bfqq == bfqd->in_service_queue)
++ /*
++ * In order not to break guarantees, budgets cannot be
++ * changed after an entity has been selected.
++ */
++ return;
++
++ BUG_ON(entity->tree != &st->active);
++ BUG_ON(entity == entity->sched_data->in_service_entity);
++
++ new_budget = max_t(unsigned long, bfqq->max_budget,
++ bfq_serv_to_charge(next_rq, bfqq));
++ if (entity->budget != new_budget) {
++ entity->budget = new_budget;
++ bfq_log_bfqq(bfqd, bfqq, "updated next rq: new budget %lu",
++ new_budget);
++ bfq_activate_bfqq(bfqd, bfqq);
++ }
++}
++
++static inline unsigned int bfq_wr_duration(struct bfq_data *bfqd)
++{
++ u64 dur;
++
++ if (bfqd->bfq_wr_max_time > 0)
++ return bfqd->bfq_wr_max_time;
++
++ dur = bfqd->RT_prod;
++ do_div(dur, bfqd->peak_rate);
++
++ return dur;
++}
++
++/* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
++static inline void bfq_reset_burst_list(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ struct bfq_queue *item;
++ struct hlist_node *n;
++
++ hlist_for_each_entry_safe(item, n, &bfqd->burst_list, burst_list_node)
++ hlist_del_init(&item->burst_list_node);
++ hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
++ bfqd->burst_size = 1;
++}
++
++/* Add bfqq to the list of queues in current burst (see bfq_handle_burst) */
++static void bfq_add_to_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ /* Increment burst size to take into account also bfqq */
++ bfqd->burst_size++;
++
++ if (bfqd->burst_size == bfqd->bfq_large_burst_thresh) {
++ struct bfq_queue *pos, *bfqq_item;
++ struct hlist_node *n;
++
++ /*
++ * Enough queues have been activated shortly after each
++ * other to consider this burst as large.
++ */
++ bfqd->large_burst = true;
++
++ /*
++ * We can now mark all queues in the burst list as
++ * belonging to a large burst.
++ */
++ hlist_for_each_entry(bfqq_item, &bfqd->burst_list,
++ burst_list_node)
++ bfq_mark_bfqq_in_large_burst(bfqq_item);
++ bfq_mark_bfqq_in_large_burst(bfqq);
++
++ /*
++ * From now on, and until the current burst finishes, any
++ * new queue being activated shortly after the last queue
++ * was inserted in the burst can be immediately marked as
++ * belonging to a large burst. So the burst list is not
++ * needed any more. Remove it.
++ */
++ hlist_for_each_entry_safe(pos, n, &bfqd->burst_list,
++ burst_list_node)
++ hlist_del_init(&pos->burst_list_node);
++ } else /* burst not yet large: add bfqq to the burst list */
++ hlist_add_head(&bfqq->burst_list_node, &bfqd->burst_list);
++}
++
++/*
++ * If many queues happen to become active shortly after each other, then,
++ * to help the processes associated with these queues get their job done as
++ * soon as possible, it is usually better to not grant either weight-raising
++ * or device idling to these queues. In this comment we describe, firstly,
++ * the reasons why this fact holds, and, secondly, the next function, which
++ * implements the main steps needed to properly mark these queues so that
++ * they can then be treated in a different way.
++ *
++ * As for the terminology, we say that a queue becomes active, i.e.,
++ * switches from idle to backlogged, either when it is created (as a
++ * consequence of the arrival of an I/O request), or, if already existing,
++ * when a new request for the queue arrives while the queue is idle.
++ * Bursts of activations, i.e., activations of different queues occurring
++ * shortly after each other, are typically caused by services or applications
++ * that spawn or reactivate many parallel threads/processes. Examples are
++ * systemd during boot or git grep.
++ *
++ * These services or applications benefit mostly from a high throughput:
++ * the quicker the requests of the activated queues are cumulatively served,
++ * the sooner the target job of these queues gets completed. As a consequence,
++ * weight-raising any of these queues, which also implies idling the device
++ * for it, is almost always counterproductive: in most cases it just lowers
++ * throughput.
++ *
++ * On the other hand, a burst of activations may also be caused by the start
++ * of an application that does not consist of many parallel I/O-bound
++ * threads. In fact, with a complex application, the burst may be just a
++ * consequence of the fact that several processes need to be executed to
++ * start up the application. To start an application as quickly as possible,
++ * the best thing to do is to privilege the I/O related to the application
++ * with respect to all other I/O. Therefore, the best strategy to start as
++ * quickly as possible an application that causes a burst of activations is
++ * to weight-raise all the queues activated during the burst. This is the
++ * exact opposite of the best strategy for the other type of bursts.
++ *
++ * In the end, to take the best action for each of the two cases, the two
++ * types of bursts need to be distinguished. Fortunately, this seems
++ * relatively easy to do, by looking at the sizes of the bursts. In
++ * particular, we found a threshold such that bursts with a larger size
++ * than that threshold are apparently caused only by services or commands
++ * such as systemd or git grep. For brevity, hereafter we simply call these
++ * bursts 'large'. BFQ *does not* weight-raise queues whose activations occur
++ * in a large burst. In addition, for each of these queues BFQ performs or
++ * does not perform idling depending on which choice boosts the throughput
++ * most. The exact choice depends on the device and request pattern at
++ * hand.
++ *
++ * Turning back to the next function, it implements all the steps needed
++ * to detect the occurrence of a large burst and to properly mark all the
++ * queues belonging to it (so that they can then be treated in a different
++ * way). This goal is achieved by maintaining a special "burst list" that
++ * holds, temporarily, the queues that belong to the burst in progress. The
++ * list is then used to mark these queues as belonging to a large burst if
++ * the burst does become large. The main steps are the following.
++ *
++ * . when the very first queue is activated, the queue is inserted into the
++ * list (as it could be the first queue in a possible burst)
++ *
++ * . if the current burst has not yet become large, and a queue Q that does
++ * not yet belong to the burst is activated shortly after the last time
++ * at which a new queue entered the burst list, then the function appends
++ * Q to the burst list
++ *
++ * . if, as a consequence of the previous step, the burst size reaches
++ * the large-burst threshold, then
++ *
++ * . all the queues in the burst list are marked as belonging to a
++ * large burst
++ *
++ * . the burst list is deleted; in fact, the burst list already served
++ * its purpose (keeping temporarily track of the queues in a burst,
++ * so as to be able to mark them as belonging to a large burst in the
++ * previous sub-step), and now is not needed any more
++ *
++ * . the device enters a large-burst mode
++ *
++ * . if a queue Q that does not belong to the burst is activated while
++ * the device is in large-burst mode and shortly after the last time
++ * at which a queue either entered the burst list or was marked as
++ * belonging to the current large burst, then Q is immediately marked
++ * as belonging to a large burst.
++ *
++ * . if a queue Q that does not belong to the burst is activated a while
++ * after, i.e., not shortly after, the last time at which a queue
++ * either entered the burst list or was marked as belonging to the
++ * current large burst, then the current burst is deemed finished and:
++ *
++ * . the large-burst mode is reset if set
++ *
++ * . the burst list is emptied
++ *
++ * . Q is inserted in the burst list, as Q may be the first queue
++ * in a possible new burst (then the burst list contains just Q
++ * after this step).
++ */
++static void bfq_handle_burst(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ bool idle_for_long_time)
++{
++ /*
++ * If bfqq happened to be activated in a burst, but has been idle
++ * for at least as long as an interactive queue, then we assume
++ * that, in the overall I/O initiated in the burst, the I/O
++ * associated with bfqq is finished. So bfqq does not need to be
++ * treated as a queue belonging to a burst anymore. Accordingly,
++ * we reset bfqq's in_large_burst flag if set, and remove bfqq
++ * from the burst list if it's there. We do not, however, decrement
++ * burst_size, because the fact that bfqq does not need to belong
++ * to the burst list any more does not invalidate the fact that
++ * bfqq may have been activated during the current burst.
++ */
++ if (idle_for_long_time) {
++ hlist_del_init(&bfqq->burst_list_node);
++ bfq_clear_bfqq_in_large_burst(bfqq);
++ }
++
++ /*
++ * If bfqq is already in the burst list or is part of a large
++ * burst, then there is nothing else to do.
++ */
++ if (!hlist_unhashed(&bfqq->burst_list_node) ||
++ bfq_bfqq_in_large_burst(bfqq))
++ return;
++
++ /*
++ * If bfqq's activation happens late enough, then the current
++ * burst is finished, and related data structures must be reset.
++ *
++ * In this respect, consider the special case where bfqq is the very
++ * first queue being activated. In this case, last_ins_in_burst is
++ * not yet significant when we get here. But it is easy to verify
++ * that, whether or not the following condition is true, bfqq will
++ * end up being inserted into the burst list. In particular the
++ * list will happen to contain only bfqq. And this is exactly what
++ * has to happen, as bfqq may be the first queue in a possible
++ * burst.
++ */
++ if (time_is_before_jiffies(bfqd->last_ins_in_burst +
++ bfqd->bfq_burst_interval)) {
++ bfqd->large_burst = false;
++ bfq_reset_burst_list(bfqd, bfqq);
++ return;
++ }
++
++ /*
++ * If we get here, then bfqq is being activated shortly after the
++ * last queue. So, if the current burst is also large, we can mark
++ * bfqq as belonging to this large burst immediately.
++ */
++ if (bfqd->large_burst) {
++ bfq_mark_bfqq_in_large_burst(bfqq);
++ return;
++ }
++
++ /*
++ * If we get here, then a large-burst state has not yet been
++ * reached, but bfqq is being activated shortly after the last
++ * queue. Then we add bfqq to the burst.
++ */
++ bfq_add_to_burst(bfqd, bfqq);
++}
++
++static void bfq_add_request(struct request *rq)
++{
++ struct bfq_queue *bfqq = RQ_BFQQ(rq);
++ struct bfq_entity *entity = &bfqq->entity;
++ struct bfq_data *bfqd = bfqq->bfqd;
++ struct request *next_rq, *prev;
++ unsigned long old_wr_coeff = bfqq->wr_coeff;
++ bool interactive = false;
++
++ bfq_log_bfqq(bfqd, bfqq, "add_request %d", rq_is_sync(rq));
++ bfqq->queued[rq_is_sync(rq)]++;
++ bfqd->queued++;
++
++ elv_rb_add(&bfqq->sort_list, rq);
++
++ /*
++ * Check if this request is a better next-serve candidate.
++ */
++ prev = bfqq->next_rq;
++ next_rq = bfq_choose_req(bfqd, bfqq->next_rq, rq, bfqd->last_position);
++ BUG_ON(next_rq == NULL);
++ bfqq->next_rq = next_rq;
++
++ /*
++ * Adjust priority tree position, if next_rq changes.
++ */
++ if (prev != bfqq->next_rq)
++ bfq_rq_pos_tree_add(bfqd, bfqq);
++
++ if (!bfq_bfqq_busy(bfqq)) {
++ bool soft_rt,
++ idle_for_long_time = time_is_before_jiffies(
++ bfqq->budget_timeout +
++ bfqd->bfq_wr_min_idle_time);
++
++ if (bfq_bfqq_sync(bfqq)) {
++ bool already_in_burst =
++ !hlist_unhashed(&bfqq->burst_list_node) ||
++ bfq_bfqq_in_large_burst(bfqq);
++ bfq_handle_burst(bfqd, bfqq, idle_for_long_time);
++ /*
++ * If bfqq was not already in the current burst,
++ * then, at this point, bfqq either has been
++ * added to the current burst or has caused the
++ * current burst to terminate. In particular, in
++ * the second case, bfqq has become the first
++ * queue in a possible new burst.
++ * In both cases last_ins_in_burst needs to be
++ * moved forward.
++ */
++ if (!already_in_burst)
++ bfqd->last_ins_in_burst = jiffies;
++ }
++
++ soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
++ !bfq_bfqq_in_large_burst(bfqq) &&
++ time_is_before_jiffies(bfqq->soft_rt_next_start);
++ interactive = !bfq_bfqq_in_large_burst(bfqq) &&
++ idle_for_long_time;
++ entity->budget = max_t(unsigned long, bfqq->max_budget,
++ bfq_serv_to_charge(next_rq, bfqq));
++
++ if (!bfq_bfqq_IO_bound(bfqq)) {
++ if (time_before(jiffies,
++ RQ_BIC(rq)->ttime.last_end_request +
++ bfqd->bfq_slice_idle)) {
++ bfqq->requests_within_timer++;
++ if (bfqq->requests_within_timer >=
++ bfqd->bfq_requests_within_timer)
++ bfq_mark_bfqq_IO_bound(bfqq);
++ } else
++ bfqq->requests_within_timer = 0;
++ }
++
++ if (!bfqd->low_latency)
++ goto add_bfqq_busy;
++
++ /*
++ * If the queue is not being boosted and has been idle
++ * for enough time, start a weight-raising period
++ */
++ if (old_wr_coeff == 1 && (interactive || soft_rt)) {
++ bfqq->wr_coeff = bfqd->bfq_wr_coeff;
++ if (interactive)
++ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++ else
++ bfqq->wr_cur_max_time =
++ bfqd->bfq_wr_rt_max_time;
++ bfq_log_bfqq(bfqd, bfqq,
++ "wrais starting at %lu, rais_max_time %u",
++ jiffies,
++ jiffies_to_msecs(bfqq->wr_cur_max_time));
++ } else if (old_wr_coeff > 1) {
++ if (interactive)
++ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++ else if (bfq_bfqq_in_large_burst(bfqq) ||
++ (bfqq->wr_cur_max_time ==
++ bfqd->bfq_wr_rt_max_time &&
++ !soft_rt)) {
++ bfqq->wr_coeff = 1;
++ bfq_log_bfqq(bfqd, bfqq,
++ "wrais ending at %lu, rais_max_time %u",
++ jiffies,
++ jiffies_to_msecs(bfqq->
++ wr_cur_max_time));
++ } else if (time_before(
++ bfqq->last_wr_start_finish +
++ bfqq->wr_cur_max_time,
++ jiffies +
++ bfqd->bfq_wr_rt_max_time) &&
++ soft_rt) {
++ /*
++ *
++ * The remaining weight-raising time is lower
++ * than bfqd->bfq_wr_rt_max_time, which
++ * means that the application is enjoying
++ * weight raising either because deemed soft-
++ * rt in the near past, or because deemed
++ * interactive long ago. In both cases,
++ * resetting now the current remaining weight-
++ * raising time for the application to the
++ * weight-raising duration for soft rt
++ * applications would not cause any latency
++ * increase for the application (as the new
++ * duration would be higher than the remaining
++ * time).
++ *
++ * In addition, the application is now meeting
++ * the requirements for being deemed soft rt.
++ * In the end we can correctly and safely
++ * (re)charge the weight-raising duration for
++ * the application with the weight-raising
++ * duration for soft rt applications.
++ *
++ * In particular, doing this recharge now, i.e.,
++ * before the weight-raising period for the
++ * application finishes, reduces the probability
++ * of the following negative scenario:
++ * 1) the weight of a soft rt application is
++ * raised at startup (as for any newly
++ * created application),
++ * 2) since the application is not interactive,
++ * at a certain time weight-raising is
++ * stopped for the application,
++ * 3) at that time the application happens to
++ * still have pending requests, and hence
++ * is destined to not have a chance to be
++ * deemed soft rt before these requests are
++ * completed (see the comments to the
++ * function bfq_bfqq_softrt_next_start()
++ * for details on soft rt detection),
++ * 4) these pending requests experience a high
++ * latency because the application is not
++ * weight-raised while they are pending.
++ */
++ bfqq->last_wr_start_finish = jiffies;
++ bfqq->wr_cur_max_time =
++ bfqd->bfq_wr_rt_max_time;
++ }
++ }
++ if (old_wr_coeff != bfqq->wr_coeff)
++ entity->ioprio_changed = 1;
++add_bfqq_busy:
++ bfqq->last_idle_bklogged = jiffies;
++ bfqq->service_from_backlogged = 0;
++ bfq_clear_bfqq_softrt_update(bfqq);
++ bfq_add_bfqq_busy(bfqd, bfqq);
++ } else {
++ if (bfqd->low_latency && old_wr_coeff == 1 && !rq_is_sync(rq) &&
++ time_is_before_jiffies(
++ bfqq->last_wr_start_finish +
++ bfqd->bfq_wr_min_inter_arr_async)) {
++ bfqq->wr_coeff = bfqd->bfq_wr_coeff;
++ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
++
++ bfqd->wr_busy_queues++;
++ entity->ioprio_changed = 1;
++ bfq_log_bfqq(bfqd, bfqq,
++ "non-idle wrais starting at %lu, rais_max_time %u",
++ jiffies,
++ jiffies_to_msecs(bfqq->wr_cur_max_time));
++ }
++ if (prev != bfqq->next_rq)
++ bfq_updated_next_req(bfqd, bfqq);
++ }
++
++ if (bfqd->low_latency &&
++ (old_wr_coeff == 1 || bfqq->wr_coeff == 1 || interactive))
++ bfqq->last_wr_start_finish = jiffies;
++}
++
++static struct request *bfq_find_rq_fmerge(struct bfq_data *bfqd,
++ struct bio *bio)
++{
++ struct task_struct *tsk = current;
++ struct bfq_io_cq *bic;
++ struct bfq_queue *bfqq;
++
++ bic = bfq_bic_lookup(bfqd, tsk->io_context);
++ if (bic == NULL)
++ return NULL;
++
++ bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
++ if (bfqq != NULL)
++ return elv_rb_find(&bfqq->sort_list, bio_end_sector(bio));
++
++ return NULL;
++}
++
++static void bfq_activate_request(struct request_queue *q, struct request *rq)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++
++ bfqd->rq_in_driver++;
++ bfqd->last_position = blk_rq_pos(rq) + blk_rq_sectors(rq);
++ bfq_log(bfqd, "activate_request: new bfqd->last_position %llu",
++ (long long unsigned)bfqd->last_position);
++}
++
++static inline void bfq_deactivate_request(struct request_queue *q,
++ struct request *rq)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++
++ BUG_ON(bfqd->rq_in_driver == 0);
++ bfqd->rq_in_driver--;
++}
++
++static void bfq_remove_request(struct request *rq)
++{
++ struct bfq_queue *bfqq = RQ_BFQQ(rq);
++ struct bfq_data *bfqd = bfqq->bfqd;
++ const int sync = rq_is_sync(rq);
++
++ if (bfqq->next_rq == rq) {
++ bfqq->next_rq = bfq_find_next_rq(bfqd, bfqq, rq);
++ bfq_updated_next_req(bfqd, bfqq);
++ }
++
++ if (rq->queuelist.prev != &rq->queuelist)
++ list_del_init(&rq->queuelist);
++ BUG_ON(bfqq->queued[sync] == 0);
++ bfqq->queued[sync]--;
++ bfqd->queued--;
++ elv_rb_del(&bfqq->sort_list, rq);
++
++ if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
++ if (bfq_bfqq_busy(bfqq) && bfqq != bfqd->in_service_queue)
++ bfq_del_bfqq_busy(bfqd, bfqq, 1);
++ /*
++ * Remove queue from request-position tree as it is empty.
++ */
++ if (bfqq->pos_root != NULL) {
++ rb_erase(&bfqq->pos_node, bfqq->pos_root);
++ bfqq->pos_root = NULL;
++ }
++ }
++
++ if (rq->cmd_flags & REQ_META) {
++ BUG_ON(bfqq->meta_pending == 0);
++ bfqq->meta_pending--;
++ }
++}
++
++static int bfq_merge(struct request_queue *q, struct request **req,
++ struct bio *bio)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct request *__rq;
++
++ __rq = bfq_find_rq_fmerge(bfqd, bio);
++ if (__rq != NULL && elv_rq_merge_ok(__rq, bio)) {
++ *req = __rq;
++ return ELEVATOR_FRONT_MERGE;
++ }
++
++ return ELEVATOR_NO_MERGE;
++}
++
++static void bfq_merged_request(struct request_queue *q, struct request *req,
++ int type)
++{
++ if (type == ELEVATOR_FRONT_MERGE &&
++ rb_prev(&req->rb_node) &&
++ blk_rq_pos(req) <
++ blk_rq_pos(container_of(rb_prev(&req->rb_node),
++ struct request, rb_node))) {
++ struct bfq_queue *bfqq = RQ_BFQQ(req);
++ struct bfq_data *bfqd = bfqq->bfqd;
++ struct request *prev, *next_rq;
++
++ /* Reposition request in its sort_list */
++ elv_rb_del(&bfqq->sort_list, req);
++ elv_rb_add(&bfqq->sort_list, req);
++ /* Choose next request to be served for bfqq */
++ prev = bfqq->next_rq;
++ next_rq = bfq_choose_req(bfqd, bfqq->next_rq, req,
++ bfqd->last_position);
++ BUG_ON(next_rq == NULL);
++ bfqq->next_rq = next_rq;
++ /*
++ * If next_rq changes, update both the queue's budget to
++ * fit the new request and the queue's position in its
++ * rq_pos_tree.
++ */
++ if (prev != bfqq->next_rq) {
++ bfq_updated_next_req(bfqd, bfqq);
++ bfq_rq_pos_tree_add(bfqd, bfqq);
++ }
++ }
++}
++
++static void bfq_merged_requests(struct request_queue *q, struct request *rq,
++ struct request *next)
++{
++ struct bfq_queue *bfqq = RQ_BFQQ(rq), *next_bfqq = RQ_BFQQ(next);
++
++ /*
++ * If next and rq belong to the same bfq_queue and next is older
++ * than rq, then reposition rq in the fifo (by substituting next
++ * with rq). Otherwise, if next and rq belong to different
++ * bfq_queues, never reposition rq: in fact, we would have to
++ * reposition it with respect to next's position in its own fifo,
++ * which would most certainly be too expensive with respect to
++ * the benefits.
++ */
++ if (bfqq == next_bfqq &&
++ !list_empty(&rq->queuelist) && !list_empty(&next->queuelist) &&
++ time_before(next->fifo_time, rq->fifo_time)) {
++ list_del_init(&rq->queuelist);
++ list_replace_init(&next->queuelist, &rq->queuelist);
++ rq->fifo_time = next->fifo_time;
++ }
++
++ if (bfqq->next_rq == next)
++ bfqq->next_rq = rq;
++
++ bfq_remove_request(next);
++}
++
++/* Must be called with bfqq != NULL */
++static inline void bfq_bfqq_end_wr(struct bfq_queue *bfqq)
++{
++ BUG_ON(bfqq == NULL);
++ if (bfq_bfqq_busy(bfqq))
++ bfqq->bfqd->wr_busy_queues--;
++ bfqq->wr_coeff = 1;
++ bfqq->wr_cur_max_time = 0;
++ /* Trigger a weight change on the next activation of the queue */
++ bfqq->entity.ioprio_changed = 1;
++}
++
++static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
++ struct bfq_group *bfqg)
++{
++ int i, j;
++
++ for (i = 0; i < 2; i++)
++ for (j = 0; j < IOPRIO_BE_NR; j++)
++ if (bfqg->async_bfqq[i][j] != NULL)
++ bfq_bfqq_end_wr(bfqg->async_bfqq[i][j]);
++ if (bfqg->async_idle_bfqq != NULL)
++ bfq_bfqq_end_wr(bfqg->async_idle_bfqq);
++}
++
++static void bfq_end_wr(struct bfq_data *bfqd)
++{
++ struct bfq_queue *bfqq;
++
++ spin_lock_irq(bfqd->queue->queue_lock);
++
++ list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list)
++ bfq_bfqq_end_wr(bfqq);
++ list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list)
++ bfq_bfqq_end_wr(bfqq);
++ bfq_end_wr_async(bfqd);
++
++ spin_unlock_irq(bfqd->queue->queue_lock);
++}
++
++static int bfq_allow_merge(struct request_queue *q, struct request *rq,
++ struct bio *bio)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct bfq_io_cq *bic;
++ struct bfq_queue *bfqq;
++
++ /*
++ * Disallow merge of a sync bio into an async request.
++ */
++ if (bfq_bio_sync(bio) && !rq_is_sync(rq))
++ return 0;
++
++ /*
++ * Lookup the bfqq that this bio will be queued with. Allow
++ * merge only if rq is queued there.
++ * Queue lock is held here.
++ */
++ bic = bfq_bic_lookup(bfqd, current->io_context);
++ if (bic == NULL)
++ return 0;
++
++ bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
++ return bfqq == RQ_BFQQ(rq);
++}
++
++static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ if (bfqq != NULL) {
++ bfq_mark_bfqq_must_alloc(bfqq);
++ bfq_mark_bfqq_budget_new(bfqq);
++ bfq_clear_bfqq_fifo_expire(bfqq);
++
++ bfqd->budgets_assigned = (bfqd->budgets_assigned*7 + 256) / 8;
++
++ bfq_log_bfqq(bfqd, bfqq,
++ "set_in_service_queue, cur-budget = %lu",
++ bfqq->entity.budget);
++ }
++
++ bfqd->in_service_queue = bfqq;
++}
++
++/*
++ * Get and set a new queue for service.
++ */
++static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ if (!bfqq)
++ bfqq = bfq_get_next_queue(bfqd);
++ else
++ bfq_get_next_queue_forced(bfqd, bfqq);
++
++ __bfq_set_in_service_queue(bfqd, bfqq);
++ return bfqq;
++}
++
++static inline sector_t bfq_dist_from_last(struct bfq_data *bfqd,
++ struct request *rq)
++{
++ if (blk_rq_pos(rq) >= bfqd->last_position)
++ return blk_rq_pos(rq) - bfqd->last_position;
++ else
++ return bfqd->last_position - blk_rq_pos(rq);
++}
++
++/*
++ * Return true if rq is close enough to bfqd->last_position, i.e., if
++ * the distance between rq's position and bfqd->last_position is at most
++ * BFQQ_SEEK_THR.
++ */
++static inline int bfq_rq_close(struct bfq_data *bfqd, struct request *rq)
++{
++ return bfq_dist_from_last(bfqd, rq) <= BFQQ_SEEK_THR;
++}
++
++static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
++{
++ struct rb_root *root = &bfqd->rq_pos_tree;
++ struct rb_node *parent, *node;
++ struct bfq_queue *__bfqq;
++ sector_t sector = bfqd->last_position;
++
++ if (RB_EMPTY_ROOT(root))
++ return NULL;
++
++ /*
++ * First, if we find a request starting at the end of the last
++ * request, choose it.
++ */
++ __bfqq = bfq_rq_pos_tree_lookup(bfqd, root, sector, &parent, NULL);
++ if (__bfqq != NULL)
++ return __bfqq;
++
++ /*
++ * If the exact sector wasn't found, the parent of the NULL leaf
++ * will contain the closest sector (rq_pos_tree sorted by
++ * next_request position).
++ */
++ __bfqq = rb_entry(parent, struct bfq_queue, pos_node);
++ if (bfq_rq_close(bfqd, __bfqq->next_rq))
++ return __bfqq;
++
++ if (blk_rq_pos(__bfqq->next_rq) < sector)
++ node = rb_next(&__bfqq->pos_node);
++ else
++ node = rb_prev(&__bfqq->pos_node);
++ if (node == NULL)
++ return NULL;
++
++ __bfqq = rb_entry(node, struct bfq_queue, pos_node);
++ if (bfq_rq_close(bfqd, __bfqq->next_rq))
++ return __bfqq;
++
++ return NULL;
++}
++
++/*
++ * bfqd - obvious
++ * cur_bfqq - passed in so that we don't decide that the current queue
++ * is closely cooperating with itself.
++ *
++ * We are assuming that cur_bfqq has dispatched at least one request,
++ * and that bfqd->last_position reflects a position on the disk associated
++ * with the I/O issued by cur_bfqq.
++ */
++static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
++ struct bfq_queue *cur_bfqq)
++{
++ struct bfq_queue *bfqq;
++
++ if (bfq_class_idle(cur_bfqq))
++ return NULL;
++ if (!bfq_bfqq_sync(cur_bfqq))
++ return NULL;
++ if (BFQQ_SEEKY(cur_bfqq))
++ return NULL;
++
++ /* If device has only one backlogged bfq_queue, don't search. */
++ if (bfqd->busy_queues == 1)
++ return NULL;
++
++ /*
++ * We should notice if some of the queues are cooperating, e.g.
++ * working closely on the same area of the disk. In that case,
++ * we can group them together and avoid wasting time idling.
++ */
++ bfqq = bfqq_close(bfqd);
++ if (bfqq == NULL || bfqq == cur_bfqq)
++ return NULL;
++
++ /*
++ * Do not merge queues from different bfq_groups.
++ */
++ if (bfqq->entity.parent != cur_bfqq->entity.parent)
++ return NULL;
++
++ /*
++ * It only makes sense to merge sync queues.
++ */
++ if (!bfq_bfqq_sync(bfqq))
++ return NULL;
++ if (BFQQ_SEEKY(bfqq))
++ return NULL;
++
++ /*
++ * Do not merge queues of different priority classes.
++ */
++ if (bfq_class_rt(bfqq) != bfq_class_rt(cur_bfqq))
++ return NULL;
++
++ return bfqq;
++}
++
++/*
++ * If enough samples have been computed, return the current max budget
++ * stored in bfqd, which is dynamically updated according to the
++ * estimated disk peak rate; otherwise return the default max budget
++ */
++static inline unsigned long bfq_max_budget(struct bfq_data *bfqd)
++{
++ if (bfqd->budgets_assigned < 194)
++ return bfq_default_max_budget;
++ else
++ return bfqd->bfq_max_budget;
++}
++
++/*
++ * Return min budget, which is a fraction of the current or default
++ * max budget (trying with 1/32)
++ */
++static inline unsigned long bfq_min_budget(struct bfq_data *bfqd)
++{
++ if (bfqd->budgets_assigned < 194)
++ return bfq_default_max_budget / 32;
++ else
++ return bfqd->bfq_max_budget / 32;
++}
++
++static void bfq_arm_slice_timer(struct bfq_data *bfqd)
++{
++ struct bfq_queue *bfqq = bfqd->in_service_queue;
++ struct bfq_io_cq *bic;
++ unsigned long sl;
++
++ BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
++
++ /* Processes have exited, don't wait. */
++ bic = bfqd->in_service_bic;
++ if (bic == NULL || atomic_read(&bic->icq.ioc->active_ref) == 0)
++ return;
++
++ bfq_mark_bfqq_wait_request(bfqq);
++
++ /*
++ * We don't want to idle for seeks, but we do want to allow
++ * fair distribution of slice time for a process doing back-to-back
++ * seeks. So allow a little bit of time for it to submit a new rq.
++ *
++ * To prevent processes with (partly) seeky workloads from
++ * being too ill-treated, grant them a small fraction of the
++ * assigned budget before reducing the waiting time to
++ * BFQ_MIN_TT. This happened to help reduce latency.
++ */
++ sl = bfqd->bfq_slice_idle;
++ /*
++ * Unless the queue is being weight-raised or the scenario is
++ * asymmetric, grant only minimum idle time if the queue either
++ * has been seeky for long enough or has already proved to be
++ * constantly seeky.
++ */
++ if (bfq_sample_valid(bfqq->seek_samples) &&
++ ((BFQQ_SEEKY(bfqq) && bfqq->entity.service >
++ bfq_max_budget(bfqq->bfqd) / 8) ||
++ bfq_bfqq_constantly_seeky(bfqq)) && bfqq->wr_coeff == 1 &&
++ symmetric_scenario)
++ sl = min(sl, msecs_to_jiffies(BFQ_MIN_TT));
++ else if (bfqq->wr_coeff > 1)
++ sl = sl * 3;
++ bfqd->last_idling_start = ktime_get();
++ mod_timer(&bfqd->idle_slice_timer, jiffies + sl);
++ bfq_log(bfqd, "arm idle: %u/%u ms",
++ jiffies_to_msecs(sl), jiffies_to_msecs(bfqd->bfq_slice_idle));
++}
++
++/*
++ * Set the maximum time for the in-service queue to consume its
++ * budget. This prevents seeky processes from lowering the disk
++ * throughput (always guaranteed with a time slice scheme as in CFQ).
++ */
++static void bfq_set_budget_timeout(struct bfq_data *bfqd)
++{
++ struct bfq_queue *bfqq = bfqd->in_service_queue;
++ unsigned int timeout_coeff;
++ if (bfqq->wr_cur_max_time == bfqd->bfq_wr_rt_max_time)
++ timeout_coeff = 1;
++ else
++ timeout_coeff = bfqq->entity.weight / bfqq->entity.orig_weight;
++
++ bfqd->last_budget_start = ktime_get();
++
++ bfq_clear_bfqq_budget_new(bfqq);
++ bfqq->budget_timeout = jiffies +
++ bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] * timeout_coeff;
++
++ bfq_log_bfqq(bfqd, bfqq, "set budget_timeout %u",
++ jiffies_to_msecs(bfqd->bfq_timeout[bfq_bfqq_sync(bfqq)] *
++ timeout_coeff));
++}
++
++/*
++ * Move request from internal lists to the request queue dispatch list.
++ */
++static void bfq_dispatch_insert(struct request_queue *q, struct request *rq)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct bfq_queue *bfqq = RQ_BFQQ(rq);
++
++ /*
++ * For consistency, the next instruction should have been executed
++ * after removing the request from the queue and dispatching it.
++ * Instead, we execute this instruction before bfq_remove_request()
++ * (and hence introduce a temporary inconsistency), for efficiency.
++ * In fact, in a forced_dispatch, this prevents two counters related
++ * to bfqq->dispatched from being uselessly decremented if bfqq
++ * is not in service, and then incremented again after
++ * incrementing bfqq->dispatched.
++ */
++ bfqq->dispatched++;
++ bfq_remove_request(rq);
++ elv_dispatch_sort(q, rq);
++
++ if (bfq_bfqq_sync(bfqq))
++ bfqd->sync_flight++;
++}
++
++/*
++ * Return expired entry, or NULL to just start from scratch in rbtree.
++ */
++static struct request *bfq_check_fifo(struct bfq_queue *bfqq)
++{
++ struct request *rq = NULL;
++
++ if (bfq_bfqq_fifo_expire(bfqq))
++ return NULL;
++
++ bfq_mark_bfqq_fifo_expire(bfqq);
++
++ if (list_empty(&bfqq->fifo))
++ return NULL;
++
++ rq = rq_entry_fifo(bfqq->fifo.next);
++
++ if (time_before(jiffies, rq->fifo_time))
++ return NULL;
++
++ return rq;
++}
++
++/* Must be called with the queue_lock held. */
++static int bfqq_process_refs(struct bfq_queue *bfqq)
++{
++ int process_refs, io_refs;
++
++ io_refs = bfqq->allocated[READ] + bfqq->allocated[WRITE];
++ process_refs = atomic_read(&bfqq->ref) - io_refs - bfqq->entity.on_st;
++ BUG_ON(process_refs < 0);
++ return process_refs;
++}
++
++static void bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
++{
++ int process_refs, new_process_refs;
++ struct bfq_queue *__bfqq;
++
++ /*
++ * If there are no process references on the new_bfqq, then it is
++ * unsafe to follow the ->new_bfqq chain as other bfqq's in the chain
++ * may have dropped their last reference (not just their last process
++ * reference).
++ */
++ if (!bfqq_process_refs(new_bfqq))
++ return;
++
++ /* Avoid a circular list and skip interim queue merges. */
++ while ((__bfqq = new_bfqq->new_bfqq)) {
++ if (__bfqq == bfqq)
++ return;
++ new_bfqq = __bfqq;
++ }
++
++ process_refs = bfqq_process_refs(bfqq);
++ new_process_refs = bfqq_process_refs(new_bfqq);
++ /*
++ * If the process for the bfqq has gone away, there is no
++ * sense in merging the queues.
++ */
++ if (process_refs == 0 || new_process_refs == 0)
++ return;
++
++ /*
++ * Merge in the direction of the lesser amount of work.
++ */
++ if (new_process_refs >= process_refs) {
++ bfqq->new_bfqq = new_bfqq;
++ atomic_add(process_refs, &new_bfqq->ref);
++ } else {
++ new_bfqq->new_bfqq = bfqq;
++ atomic_add(new_process_refs, &bfqq->ref);
++ }
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "scheduling merge with queue %d",
++ new_bfqq->pid);
++}
++
++static inline unsigned long bfq_bfqq_budget_left(struct bfq_queue *bfqq)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++ return entity->budget - entity->service;
++}
++
++static void __bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ BUG_ON(bfqq != bfqd->in_service_queue);
++
++ __bfq_bfqd_reset_in_service(bfqd);
++
++ /*
++ * If this bfqq is shared between multiple processes, check
++ * to make sure that those processes are still issuing I/Os
++ * within the mean seek distance. If not, it may be time to
++ * break the queues apart again.
++ */
++ if (bfq_bfqq_coop(bfqq) && BFQQ_SEEKY(bfqq))
++ bfq_mark_bfqq_split_coop(bfqq);
++
++ if (RB_EMPTY_ROOT(&bfqq->sort_list)) {
++ /*
++ * Overloading budget_timeout field to store the time
++ * at which the queue remains with no backlog; used by
++ * the weight-raising mechanism.
++ */
++ bfqq->budget_timeout = jiffies;
++ bfq_del_bfqq_busy(bfqd, bfqq, 1);
++ } else {
++ bfq_activate_bfqq(bfqd, bfqq);
++ /*
++ * Resort priority tree of potential close cooperators.
++ */
++ bfq_rq_pos_tree_add(bfqd, bfqq);
++ }
++}
++
++/**
++ * __bfq_bfqq_recalc_budget - try to adapt the budget to the @bfqq behavior.
++ * @bfqd: device data.
++ * @bfqq: queue to update.
++ * @reason: reason for expiration.
++ *
++ * Handle the feedback on @bfqq budget. See the body for detailed
++ * comments.
++ */
++static void __bfq_bfqq_recalc_budget(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ enum bfqq_expiration reason)
++{
++ struct request *next_rq;
++ unsigned long budget, min_budget;
++
++ budget = bfqq->max_budget;
++ min_budget = bfq_min_budget(bfqd);
++
++ BUG_ON(bfqq != bfqd->in_service_queue);
++
++ bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last budg %lu, budg left %lu",
++ bfqq->entity.budget, bfq_bfqq_budget_left(bfqq));
++ bfq_log_bfqq(bfqd, bfqq, "recalc_budg: last max_budg %lu, min budg %lu",
++ budget, bfq_min_budget(bfqd));
++ bfq_log_bfqq(bfqd, bfqq, "recalc_budg: sync %d, seeky %d",
++ bfq_bfqq_sync(bfqq), BFQQ_SEEKY(bfqd->in_service_queue));
++
++ if (bfq_bfqq_sync(bfqq)) {
++ switch (reason) {
++ /*
++ * Caveat: in all the following cases we trade latency
++ * for throughput.
++ */
++ case BFQ_BFQQ_TOO_IDLE:
++ /*
++ * This is the only case where we may reduce
++ * the budget: if there is no request of the
++ * process still waiting for completion, then
++ * we assume (tentatively) that the timer has
++ * expired because the batch of requests of
++ * the process could have been served with a
++ * smaller budget. Hence, betting that the
++ * process will behave in the same way when it
++ * becomes backlogged again, we reduce its
++ * next budget. As long as we guess right,
++ * this budget cut reduces the latency
++ * experienced by the process.
++ *
++ * However, if there are still outstanding
++ * requests, then the process may not yet have
++ * issued its next request just because it is
++ * still waiting for the completion of some of
++ * the still outstanding ones. So in this
++ * subcase we do not reduce its budget; on the
++ * contrary we increase it to possibly boost
++ * the throughput, as discussed in the
++ * comments to the BUDGET_TIMEOUT case.
++ */
++ if (bfqq->dispatched > 0) /* still outstanding reqs */
++ budget = min(budget * 2, bfqd->bfq_max_budget);
++ else {
++ if (budget > 5 * min_budget)
++ budget -= 4 * min_budget;
++ else
++ budget = min_budget;
++ }
++ break;
++ case BFQ_BFQQ_BUDGET_TIMEOUT:
++ /*
++ * We double the budget here because: 1) it
++ * gives the chance to boost the throughput if
++ * this is not a seeky process (which may have
++ * bumped into this timeout because of, e.g.,
++ * ZBR), 2) together with charge_full_budget
++ * it helps give seeky processes higher
++ * timestamps, and hence be served less
++ * frequently.
++ */
++ budget = min(budget * 2, bfqd->bfq_max_budget);
++ break;
++ case BFQ_BFQQ_BUDGET_EXHAUSTED:
++ /*
++ * The process still has backlog, and did not
++ * let either the budget timeout or the disk
++ * idling timeout expire. Hence it is not
++ * seeky, has a short thinktime and may be
++ * happy with a higher budget too. So
++ * definitely increase the budget of this good
++ * candidate to boost the disk throughput.
++ */
++ budget = min(budget * 4, bfqd->bfq_max_budget);
++ break;
++ case BFQ_BFQQ_NO_MORE_REQUESTS:
++ /*
++ * Leave the budget unchanged.
++ */
++ default:
++ return;
++ }
++ } else /* async queue */
++ /* async queues always get the maximum possible budget
++ * (their ability to dispatch is limited by
++ * @bfqd->bfq_max_budget_async_rq).
++ */
++ budget = bfqd->bfq_max_budget;
++
++ bfqq->max_budget = budget;
++
++ if (bfqd->budgets_assigned >= 194 && bfqd->bfq_user_max_budget == 0 &&
++ bfqq->max_budget > bfqd->bfq_max_budget)
++ bfqq->max_budget = bfqd->bfq_max_budget;
++
++ /*
++ * Make sure that we have enough budget for the next request.
++ * Since the finish time of the bfqq must be kept in sync with
++ * the budget, be sure to call __bfq_bfqq_expire() after the
++ * update.
++ */
++ next_rq = bfqq->next_rq;
++ if (next_rq != NULL)
++ bfqq->entity.budget = max_t(unsigned long, bfqq->max_budget,
++ bfq_serv_to_charge(next_rq, bfqq));
++ else
++ bfqq->entity.budget = bfqq->max_budget;
++
++ bfq_log_bfqq(bfqd, bfqq, "head sect: %u, new budget %lu",
++ next_rq != NULL ? blk_rq_sectors(next_rq) : 0,
++ bfqq->entity.budget);
++}
++
++static unsigned long bfq_calc_max_budget(u64 peak_rate, u64 timeout)
++{
++ unsigned long max_budget;
++
++ /*
++ * The max_budget calculated when autotuning is equal to the
++ * number of sectors transferred in timeout_sync at the
++ * estimated peak rate.
++ */
++ max_budget = (unsigned long)(peak_rate * 1000 *
++ timeout >> BFQ_RATE_SHIFT);
++
++ return max_budget;
++}
++
++/*
++ * In addition to updating the peak rate, checks whether the process
++ * is "slow", and returns 1 if so. This slow flag is used, in addition
++ * to the budget timeout, to reduce the amount of service provided to
++ * seeky processes, and hence reduce their chances to lower the
++ * throughput. See the code for more details.
++ */
++static int bfq_update_peak_rate(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ int compensate, enum bfqq_expiration reason)
++{
++ u64 bw, usecs, expected, timeout;
++ ktime_t delta;
++ int update = 0;
++
++ if (!bfq_bfqq_sync(bfqq) || bfq_bfqq_budget_new(bfqq))
++ return 0;
++
++ if (compensate)
++ delta = bfqd->last_idling_start;
++ else
++ delta = ktime_get();
++ delta = ktime_sub(delta, bfqd->last_budget_start);
++ usecs = ktime_to_us(delta);
++
++ /* Don't trust short/unrealistic values. */
++ if (usecs < 100 || usecs >= LONG_MAX)
++ return 0;
++
++ /*
++ * Calculate the bandwidth for the last slice. We use a 64 bit
++ * value to store the peak rate, in sectors per usec in fixed
++ * point math. We do so to have enough precision in the estimate
++ * and to avoid overflows.
++ */
++ bw = (u64)bfqq->entity.service << BFQ_RATE_SHIFT;
++ do_div(bw, (unsigned long)usecs);
++
++ timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
++
++ /*
++ * Use only long (> 20ms) intervals to filter out spikes for
++ * the peak rate estimation.
++ */
++ if (usecs > 20000) {
++ if (bw > bfqd->peak_rate ||
++ (!BFQQ_SEEKY(bfqq) &&
++ reason == BFQ_BFQQ_BUDGET_TIMEOUT)) {
++ bfq_log(bfqd, "measured bw =%llu", bw);
++ /*
++ * To smooth oscillations use a low-pass filter with
++ * alpha=7/8, i.e.,
++ * new_rate = (7/8) * old_rate + (1/8) * bw
++ */
++ do_div(bw, 8);
++ if (bw == 0)
++ return 0;
++ bfqd->peak_rate *= 7;
++ do_div(bfqd->peak_rate, 8);
++ bfqd->peak_rate += bw;
++ update = 1;
++ bfq_log(bfqd, "new peak_rate=%llu", bfqd->peak_rate);
++ }
++
++ update |= bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES - 1;
++
++ if (bfqd->peak_rate_samples < BFQ_PEAK_RATE_SAMPLES)
++ bfqd->peak_rate_samples++;
++
++ if (bfqd->peak_rate_samples == BFQ_PEAK_RATE_SAMPLES &&
++ update) {
++ int dev_type = blk_queue_nonrot(bfqd->queue);
++ if (bfqd->bfq_user_max_budget == 0) {
++ bfqd->bfq_max_budget =
++ bfq_calc_max_budget(bfqd->peak_rate,
++ timeout);
++ bfq_log(bfqd, "new max_budget=%lu",
++ bfqd->bfq_max_budget);
++ }
++ if (bfqd->device_speed == BFQ_BFQD_FAST &&
++ bfqd->peak_rate < device_speed_thresh[dev_type]) {
++ bfqd->device_speed = BFQ_BFQD_SLOW;
++ bfqd->RT_prod = R_slow[dev_type] *
++ T_slow[dev_type];
++ } else if (bfqd->device_speed == BFQ_BFQD_SLOW &&
++ bfqd->peak_rate > device_speed_thresh[dev_type]) {
++ bfqd->device_speed = BFQ_BFQD_FAST;
++ bfqd->RT_prod = R_fast[dev_type] *
++ T_fast[dev_type];
++ }
++ }
++ }
++
++ /*
++ * If the process has been served for too short a time
++ * interval to let its possible sequential accesses prevail over
++ * the initial seek time needed to move the disk head to the
++ * first sector it requested, then give the process a chance
++ * and for the moment return false.
++ */
++ if (bfqq->entity.budget <= bfq_max_budget(bfqd) / 8)
++ return 0;
++
++ /*
++ * A process is considered ``slow'' (i.e., seeky, so that we
++ * cannot treat it fairly in the service domain, as it would
++ * slow down the other processes too much) if, when a slice
++ * ends for whatever reason, it has received service at a
++ * rate that would not be high enough to complete the budget
++ * before the budget timeout expiration.
++ */
++ expected = bw * 1000 * timeout >> BFQ_RATE_SHIFT;
++
++ /*
++ * Caveat: processes doing IO in the slower disk zones will
++ * tend to be slow(er) even if not seeky. And the estimated
++ * peak rate will actually be an average over the disk
++ * surface. Hence, to not be too harsh with unlucky processes,
++ * we keep a budget/3 margin of safety before declaring a
++ * process slow.
++ */
++ return expected > (4 * bfqq->entity.budget) / 3;
++}
++
++/*
++ * To be deemed as soft real-time, an application must meet two
++ * requirements. First, the application must not require an average
++ * bandwidth higher than the approximate bandwidth required to play back or
++ * record a compressed high-definition video.
++ * The next function is invoked on the completion of the last request of a
++ * batch, to compute the next-start time instant, soft_rt_next_start, such
++ * that, if the next request of the application does not arrive before
++ * soft_rt_next_start, then the above requirement on the bandwidth is met.
++ *
++ * The second requirement is that the request pattern of the application is
++ * isochronous, i.e., that, after issuing a request or a batch of requests,
++ * the application stops issuing new requests until all its pending requests
++ * have been completed. After that, the application may issue a new batch,
++ * and so on.
++ * For this reason the next function is invoked to compute
++ * soft_rt_next_start only for applications that meet this requirement,
++ * whereas soft_rt_next_start is set to infinity for applications that do
++ * not.
++ *
++ * Unfortunately, even a greedy application may happen to behave in an
++ * isochronous way if the CPU load is high. In fact, the application may
++ * stop issuing requests while the CPUs are busy serving other processes,
++ * then restart, then stop again for a while, and so on. In addition, if
++ * the disk achieves a low enough throughput with the request pattern
++ * issued by the application (e.g., because the request pattern is random
++ * and/or the device is slow), then the application may meet the above
++ * bandwidth requirement too. To prevent such a greedy application from being
++ * deemed soft real-time, a further rule is used in the computation of
++ * soft_rt_next_start: soft_rt_next_start must be higher than the current
++ * time plus the maximum time for which the arrival of a request is waited
++ * for when a sync queue becomes idle, namely bfqd->bfq_slice_idle.
++ * This filters out greedy applications, as the latter issue instead their
++ * next request as soon as possible after the last one has been completed
++ * (in contrast, when a batch of requests is completed, a soft real-time
++ * application spends some time processing data).
++ *
++ * Unfortunately, the last filter may easily generate false positives if
++ * only bfqd->bfq_slice_idle is used as a reference time interval and one
++ * or both of the following cases occur:
++ * 1) HZ is so low that the duration of a jiffy is comparable to or higher
++ * than bfqd->bfq_slice_idle. This happens, e.g., on slow devices with
++ * HZ=100.
++ * 2) jiffies, instead of increasing at a constant rate, may stop increasing
++ * for a while, then suddenly 'jump' by several units to recover the lost
++ * increments. This seems to happen, e.g., inside virtual machines.
++ * To address this issue, we do not use as a reference time interval just
++ * bfqd->bfq_slice_idle, but bfqd->bfq_slice_idle plus a few jiffies. In
++ * particular we add the minimum number of jiffies for which the filter
++ * seems to be quite precise also in embedded systems and KVM/QEMU virtual
++ * machines.
++ */
++static inline unsigned long bfq_bfqq_softrt_next_start(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ return max(bfqq->last_idle_bklogged +
++ HZ * bfqq->service_from_backlogged /
++ bfqd->bfq_wr_max_softrt_rate,
++ jiffies + bfqq->bfqd->bfq_slice_idle + 4);
++}
++
++/*
++ * Return the largest-possible time instant such that, for as long as possible,
++ * the current time will be lower than this time instant according to the macro
++ * time_is_before_jiffies().
++ */
++static inline unsigned long bfq_infinity_from_now(unsigned long now)
++{
++ return now + ULONG_MAX / 2;
++}
++
++/**
++ * bfq_bfqq_expire - expire a queue.
++ * @bfqd: device owning the queue.
++ * @bfqq: the queue to expire.
++ * @compensate: if true, compensate for the time spent idling.
++ * @reason: the reason causing the expiration.
++ *
++ *
++ * If the process associated with the queue is slow (i.e., seeky), or in
++ * case of budget timeout, or, finally, if it is async, we
++ * artificially charge it an entire budget (independently of the
++ * actual service it received). As a consequence, the queue will get
++ * higher timestamps than the correct ones upon reactivation, and
++ * hence it will be rescheduled as if it had received more service
++ * than what it actually received. In the end, this class of processes
++ * will receive less service in proportion to how slowly they consume
++ * their budgets (and hence how seriously they tend to lower the
++ * throughput).
++ *
++ * In contrast, when a queue expires because it has been idling for
++ * too long or because it exhausted its budget, we do not touch the
++ * amount of service it has received. Hence, when the queue is
++ * reactivated and its timestamps updated, the latter will be in sync
++ * with the actual service received by the queue until expiration.
++ *
++ * Charging a full budget to the first type of queues and the exact
++ * service to the others has the effect of using the WF2Q+ policy to
++ * schedule the former on a timeslice basis, without violating the
++ * service domain guarantees of the latter.
++ */
++static void bfq_bfqq_expire(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ int compensate,
++ enum bfqq_expiration reason)
++{
++ int slow;
++ BUG_ON(bfqq != bfqd->in_service_queue);
++
++ /* Update disk peak rate for autotuning and check whether the
++ * process is slow (see bfq_update_peak_rate).
++ */
++ slow = bfq_update_peak_rate(bfqd, bfqq, compensate, reason);
++
++ /*
++ * As explained above, 'punish' slow (i.e., seeky), timed-out
++ * and async queues, to favor sequential sync workloads.
++ *
++ * Processes doing I/O in the slower disk zones will tend to be
++ * slow(er) even if not seeky. Hence, since the estimated peak
++ * rate is actually an average over the disk surface, these
++ * processes may time out just because of bad luck. To avoid punishing
++ * them we do not charge a full budget to a process that
++ * succeeded in consuming at least 2/3 of its budget.
++ */
++ if (slow || (reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
++ bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3))
++ bfq_bfqq_charge_full_budget(bfqq);
++
++ bfqq->service_from_backlogged += bfqq->entity.service;
++
++ if (BFQQ_SEEKY(bfqq) && reason == BFQ_BFQQ_BUDGET_TIMEOUT &&
++ !bfq_bfqq_constantly_seeky(bfqq)) {
++ bfq_mark_bfqq_constantly_seeky(bfqq);
++ if (!blk_queue_nonrot(bfqd->queue))
++ bfqd->const_seeky_busy_in_flight_queues++;
++ }
++
++ if (reason == BFQ_BFQQ_TOO_IDLE &&
++ bfqq->entity.service <= 2 * bfqq->entity.budget / 10)
++ bfq_clear_bfqq_IO_bound(bfqq);
++
++ if (bfqd->low_latency && bfqq->wr_coeff == 1)
++ bfqq->last_wr_start_finish = jiffies;
++
++ if (bfqd->low_latency && bfqd->bfq_wr_max_softrt_rate > 0 &&
++ RB_EMPTY_ROOT(&bfqq->sort_list)) {
++ /*
++ * If we get here, and there are no outstanding requests,
++ * then the request pattern is isochronous (see the comments
++ * to the function bfq_bfqq_softrt_next_start()). Hence we
++ * can compute soft_rt_next_start. If, instead, the queue
++ * still has outstanding requests, then we have to wait
++ * for the completion of all the outstanding requests to
++ * discover whether the request pattern is actually
++ * isochronous.
++ */
++ if (bfqq->dispatched == 0)
++ bfqq->soft_rt_next_start =
++ bfq_bfqq_softrt_next_start(bfqd, bfqq);
++ else {
++ /*
++ * The application is still waiting for the
++ * completion of one or more requests:
++ * prevent it from possibly being incorrectly
++ * deemed as soft real-time by setting its
++ * soft_rt_next_start to infinity. In fact,
++ * without this assignment, the application
++ * would be incorrectly deemed as soft
++ * real-time if:
++ * 1) it issued a new request before the
++ * completion of all its in-flight
++ * requests, and
++ * 2) at that time, its soft_rt_next_start
++ * happened to be in the past.
++ */
++ bfqq->soft_rt_next_start =
++ bfq_infinity_from_now(jiffies);
++ /*
++ * Schedule an update of soft_rt_next_start to when
++ * the task may be discovered to be isochronous.
++ */
++ bfq_mark_bfqq_softrt_update(bfqq);
++ }
++ }
++
++ bfq_log_bfqq(bfqd, bfqq,
++ "expire (%d, slow %d, num_disp %d, idle_win %d)", reason,
++ slow, bfqq->dispatched, bfq_bfqq_idle_window(bfqq));
++
++ /*
++ * Increase, decrease or leave budget unchanged according to
++ * reason.
++ */
++ __bfq_bfqq_recalc_budget(bfqd, bfqq, reason);
++ __bfq_bfqq_expire(bfqd, bfqq);
++}
++
++/*
++ * Budget timeout is not implemented through a dedicated timer, but
++ * just checked on request arrivals and completions, as well as on
++ * idle timer expirations.
++ */
++static int bfq_bfqq_budget_timeout(struct bfq_queue *bfqq)
++{
++ if (bfq_bfqq_budget_new(bfqq) ||
++ time_before(jiffies, bfqq->budget_timeout))
++ return 0;
++ return 1;
++}
++
++/*
++ * If we expire a queue that is waiting for the arrival of a new
++ * request, we may prevent the fictitious timestamp back-shifting that
++ * allows the guarantees of the queue to be preserved (see [1] for
++ * this tricky aspect). Hence we return true only if this condition
++ * does not hold, or if the queue is slow enough to deserve only to be
++ * kicked off for preserving a high throughput.
++ */
++static inline int bfq_may_expire_for_budg_timeout(struct bfq_queue *bfqq)
++{
++ bfq_log_bfqq(bfqq->bfqd, bfqq,
++ "may_budget_timeout: wait_request %d left %d timeout %d",
++ bfq_bfqq_wait_request(bfqq),
++ bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3,
++ bfq_bfqq_budget_timeout(bfqq));
++
++ return (!bfq_bfqq_wait_request(bfqq) ||
++ bfq_bfqq_budget_left(bfqq) >= bfqq->entity.budget / 3)
++ &&
++ bfq_bfqq_budget_timeout(bfqq);
++}
++
++/*
++ * Device idling is allowed only for the queues for which this function
++ * returns true. For this reason, the return value of this function plays a
++ * critical role for both throughput boosting and service guarantees. The
++ * return value is computed through a logical expression. In this rather
++ * long comment, we try to briefly describe all the details and motivations
++ * behind the components of this logical expression.
++ *
++ * First, the expression is false if bfqq is not sync, or if: bfqq happened
++ * to become active during a large burst of queue activations, and the
++ * pattern of requests bfqq contains boosts the throughput if bfqq is
++ * expired. In fact, queues that became active during a large burst benefit
++ * only from throughput, as discussed in the comments to bfq_handle_burst.
++ * In this respect, expiring bfqq certainly boosts the throughput on NCQ-
++ * capable flash-based devices, whereas, on rotational devices, it boosts
++ * the throughput only if bfqq contains random requests.
++ *
++ * On the opposite end, if (a) bfqq is sync, (b) the above burst-related
++ * condition does not hold, and (c) bfqq is being weight-raised, then the
++ * expression always evaluates to true, as device idling is instrumental
++ * for preserving low-latency guarantees (see [1]). If, instead, conditions
++ * (a) and (b) do hold, but (c) does not, then the expression evaluates to
++ * true only if: (1) bfqq is I/O-bound and has a non-null idle window, and
++ * (2) at least one of the following two conditions holds.
++ * The first condition is that the device is not performing NCQ, because
++ * idling the device most certainly boosts the throughput if this condition
++ * holds and bfqq is I/O-bound and has been granted a non-null idle window.
++ * The second compound condition is made of the logical AND of two components.
++ *
++ * The first component is true only if there is no weight-raised busy
++ * queue. This guarantees that the device is not idled for a sync non-
++ * weight-raised queue when there are busy weight-raised queues. The former
++ * is then expired immediately if empty. Combined with the timestamping
++ * rules of BFQ (see [1] for details), this causes sync non-weight-raised
++ * queues to get a lower number of requests served, and hence to ask for a
++ * lower number of requests from the request pool, before the busy weight-
++ * raised queues get served again.
++ *
++ * This is beneficial for the processes associated with weight-raised
++ * queues, when the request pool is saturated (e.g., in the presence of
++ * write hogs). In fact, if the processes associated with the other queues
++ * ask for requests at a lower rate, then weight-raised processes have a
++ * higher probability to get a request from the pool immediately (or at
++ * least soon) when they need one. Hence they have a higher probability to
++ * actually get a fraction of the disk throughput proportional to their
++ * high weight. This is especially true with NCQ-capable drives, which
++ * enqueue several requests in advance and further reorder internally-
++ * queued requests.
++ *
++ * In the end, mistreating non-weight-raised queues when there are busy
++ * weight-raised queues seems to mitigate starvation problems in the
++ * presence of heavy write workloads and NCQ, and hence to guarantee a
++ * higher application and system responsiveness in these hostile scenarios.
++ *
++ * If the first component of the compound condition is instead true, i.e.,
++ * there is no weight-raised busy queue, then the second component of the
++ * compound condition takes into account service-guarantee and throughput
++ * issues related to NCQ (recall that the compound condition is evaluated
++ * only if the device is detected as supporting NCQ).
++ *
++ * As for service guarantees, allowing the drive to enqueue more than one
++ * request at a time, and hence delegating de facto final scheduling
++ * decisions to the drive's internal scheduler, causes loss of control on
++ * the actual request service order. In this respect, when the drive is
++ * allowed to enqueue more than one request at a time, the service
++ * distribution enforced by the drive's internal scheduler is likely to
++ * coincide with the desired device-throughput distribution only in the
++ * following, perfectly symmetric, scenario:
++ * 1) all active queues have the same weight,
++ * 2) all active groups at the same level in the groups tree have the same
++ * weight,
++ * 3) all active groups at the same level in the groups tree have the same
++ * number of children.
++ *
++ * Even in such a scenario, sequential I/O may still receive a preferential
++ * treatment, but this is not likely to be a big issue with flash-based
++ * devices, because of their non-dramatic loss of throughput with random
++ * I/O. Things do differ with HDDs, for which additional care is taken, as
++ * explained after completing the discussion for flash-based devices.
++ *
++ * Unfortunately, keeping the necessary state for evaluating exactly the
++ * above symmetry conditions would be quite complex and time-consuming.
++ * Therefore BFQ evaluates instead the following stronger sub-conditions,
++ * for which it is much easier to maintain the needed state:
++ * 1) all active queues have the same weight,
++ * 2) all active groups have the same weight,
++ * 3) all active groups have at most one active child each.
++ * In particular, the last two conditions are always true if hierarchical
++ * support and the cgroups interface are not enabled, hence no state needs
++ * to be maintained in this case.
++ *
++ * According to the above considerations, the second component of the
++ * compound condition evaluates to true if any of the above symmetry
++ * sub-conditions does not hold, or the device is not flash-based. Therefore,
++ * if the first component is also true, then idling is allowed for a sync
++ * queue. These are the only sub-conditions considered if the device is
++ * flash-based, as, for such a device, it is sensible to force idling only
++ * for service-guarantee issues. In fact, as for throughput, idling
++ * NCQ-capable flash-based devices would not boost the throughput even
++ * with sequential I/O; rather it would lower the throughput in proportion
++ * to how fast the device is. In the end, (only) if all three
++ * sub-conditions hold and the device is flash-based, the compound
++ * condition evaluates to false and therefore no idling is performed.
++ *
++ * As already said, things change with a rotational device, where idling
++ * boosts the throughput with sequential I/O (even with NCQ). Hence, for
++ * such a device the second component of the compound condition evaluates
++ * to true also if the following additional sub-condition does not hold:
++ * the queue is constantly seeky. Unfortunately, this different behavior
++ * with respect to flash-based devices causes an additional asymmetry: if
++ * some sync queues enjoy idling and some other sync queues do not, then
++ * the latter get a low share of the device throughput, simply because the
++ * former get many requests served after being set as in service, whereas
++ * the latter do not. As a consequence, to guarantee the desired throughput
++ * distribution, on HDDs the compound expression evaluates to true (and
++ * hence device idling is performed) also if the following last symmetry
++ * condition does not hold: no other queue is benefiting from idling. This
++ * last condition, too, is actually replaced with a simpler-to-maintain and
++ * stronger condition: there is no busy queue which is not constantly seeky
++ * (and hence may also benefit from idling).
++ *
++ * To sum up, when all the required symmetry and throughput-boosting
++ * sub-conditions hold, the second component of the compound condition
++ * evaluates to false, and hence no idling is performed. This helps to
++ * keep the drives' internal queues full on NCQ-capable devices, and hence
++ * to boost the throughput, while causing 'almost' no loss of service
++ * guarantees. The 'almost' follows from the fact that, if the internal
++ * queue of one such device is filled while all the sub-conditions hold,
++ * but at some point in time some sub-condition ceases to hold, then it may
++ * become impossible to let requests be served in the new desired order
++ * until all the requests already queued in the device have been served.
++ */
++static inline bool bfq_bfqq_must_not_expire(struct bfq_queue *bfqq)
++{
++ struct bfq_data *bfqd = bfqq->bfqd;
++#define cond_for_seeky_on_ncq_hdd (bfq_bfqq_constantly_seeky(bfqq) && \
++ bfqd->busy_in_flight_queues == \
++ bfqd->const_seeky_busy_in_flight_queues)
++
++#define cond_for_expiring_in_burst (bfq_bfqq_in_large_burst(bfqq) && \
++ bfqd->hw_tag && \
++ (blk_queue_nonrot(bfqd->queue) || \
++ bfq_bfqq_constantly_seeky(bfqq)))
++
++/*
++ * Condition for expiring a non-weight-raised queue (and hence not idling
++ * the device).
++ */
++#define cond_for_expiring_non_wr (bfqd->hw_tag && \
++ (bfqd->wr_busy_queues > 0 || \
++ (blk_queue_nonrot(bfqd->queue) || \
++ cond_for_seeky_on_ncq_hdd)))
++
++ return bfq_bfqq_sync(bfqq) &&
++ !cond_for_expiring_in_burst &&
++ (bfqq->wr_coeff > 1 || !symmetric_scenario ||
++ (bfq_bfqq_IO_bound(bfqq) && bfq_bfqq_idle_window(bfqq) &&
++ !cond_for_expiring_non_wr)
++ );
++}
++
++/*
++ * If the in-service queue is empty but sync, and the function
++ * bfq_bfqq_must_not_expire returns true, then:
++ * 1) the queue must remain in service and cannot be expired, and
++ * 2) the disk must be idled to wait for the possible arrival of a new
++ * request for the queue.
++ * See the comments to the function bfq_bfqq_must_not_expire for the reasons
++ * why performing device idling is the best choice to boost the throughput
++ * and preserve service guarantees when bfq_bfqq_must_not_expire itself
++ * returns true.
++ */
++static inline bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
++{
++ struct bfq_data *bfqd = bfqq->bfqd;
++
++ return RB_EMPTY_ROOT(&bfqq->sort_list) && bfqd->bfq_slice_idle != 0 &&
++ bfq_bfqq_must_not_expire(bfqq);
++}
++
++/*
++ * Select a queue for service. If we have a current queue in service,
++ * check whether to continue servicing it, or retrieve and set a new one.
++ */
++static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
++{
++ struct bfq_queue *bfqq, *new_bfqq = NULL;
++ struct request *next_rq;
++ enum bfqq_expiration reason = BFQ_BFQQ_BUDGET_TIMEOUT;
++
++ bfqq = bfqd->in_service_queue;
++ if (bfqq == NULL)
++ goto new_queue;
++
++ bfq_log_bfqq(bfqd, bfqq, "select_queue: already in-service queue");
++
++ /*
++ * If another queue has a request waiting within our mean seek
++ * distance, let it run. The expire code will check for close
++ * cooperators and put the close queue at the front of the
++ * service tree. If possible, merge the expiring queue with the
++ * new bfqq.
++ */
++ new_bfqq = bfq_close_cooperator(bfqd, bfqq);
++ if (new_bfqq != NULL && bfqq->new_bfqq == NULL)
++ bfq_setup_merge(bfqq, new_bfqq);
++
++ if (bfq_may_expire_for_budg_timeout(bfqq) &&
++ !timer_pending(&bfqd->idle_slice_timer) &&
++ !bfq_bfqq_must_idle(bfqq))
++ goto expire;
++
++ next_rq = bfqq->next_rq;
++ /*
++ * If bfqq has requests queued and it has enough budget left to
++ * serve them, keep the queue, otherwise expire it.
++ */
++ if (next_rq != NULL) {
++ if (bfq_serv_to_charge(next_rq, bfqq) >
++ bfq_bfqq_budget_left(bfqq)) {
++ reason = BFQ_BFQQ_BUDGET_EXHAUSTED;
++ goto expire;
++ } else {
++ /*
++ * The idle timer may be pending because we may
++ * not disable disk idling even when a new request
++ * arrives.
++ */
++ if (timer_pending(&bfqd->idle_slice_timer)) {
++ /*
++ * If we get here: 1) at least one new request
++ * has arrived but we have not disabled the
++ * timer because the request was too small,
++ * 2) then the block layer has unplugged
++ * the device, causing the dispatch to be
++ * invoked.
++ *
++ * Since the device is unplugged, now the
++ * requests are probably large enough to
++ * provide a reasonable throughput.
++ * So we disable idling.
++ */
++ bfq_clear_bfqq_wait_request(bfqq);
++ del_timer(&bfqd->idle_slice_timer);
++ }
++ if (new_bfqq == NULL)
++ goto keep_queue;
++ else
++ goto expire;
++ }
++ }
++
++ /*
++ * No requests pending. However, if the in-service queue is idling
++ * for a new request, or has requests waiting for a completion and
++ * may idle after their completion, then keep it anyway.
++ */
++ if (new_bfqq == NULL && (timer_pending(&bfqd->idle_slice_timer) ||
++ (bfqq->dispatched != 0 && bfq_bfqq_must_not_expire(bfqq)))) {
++ bfqq = NULL;
++ goto keep_queue;
++ } else if (new_bfqq != NULL && timer_pending(&bfqd->idle_slice_timer)) {
++ /*
++ * Expiring the queue because there is a close cooperator;
++ * cancel the timer.
++ */
++ bfq_clear_bfqq_wait_request(bfqq);
++ del_timer(&bfqd->idle_slice_timer);
++ }
++
++ reason = BFQ_BFQQ_NO_MORE_REQUESTS;
++expire:
++ bfq_bfqq_expire(bfqd, bfqq, 0, reason);
++new_queue:
++ bfqq = bfq_set_in_service_queue(bfqd, new_bfqq);
++ bfq_log(bfqd, "select_queue: new queue %d returned",
++ bfqq != NULL ? bfqq->pid : 0);
++keep_queue:
++ return bfqq;
++}
++
++static void bfq_update_wr_data(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ if (bfqq->wr_coeff > 1) { /* queue is being boosted */
++ struct bfq_entity *entity = &bfqq->entity;
++
++ bfq_log_bfqq(bfqd, bfqq,
++ "raising period dur %u/%u msec, old coeff %u, w %d(%d)",
++ jiffies_to_msecs(jiffies -
++ bfqq->last_wr_start_finish),
++ jiffies_to_msecs(bfqq->wr_cur_max_time),
++ bfqq->wr_coeff,
++ bfqq->entity.weight, bfqq->entity.orig_weight);
++
++ BUG_ON(bfqq != bfqd->in_service_queue && entity->weight !=
++ entity->orig_weight * bfqq->wr_coeff);
++ if (entity->ioprio_changed)
++ bfq_log_bfqq(bfqd, bfqq, "WARN: pending prio change");
++ /*
++ * If the queue was activated in a burst, or
++ * too much time has elapsed from the beginning
++ * of this weight-raising, then end weight raising.
++ */
++ if (bfq_bfqq_in_large_burst(bfqq) ||
++ time_is_before_jiffies(bfqq->last_wr_start_finish +
++ bfqq->wr_cur_max_time)) {
++ bfqq->last_wr_start_finish = jiffies;
++ bfq_log_bfqq(bfqd, bfqq,
++ "wrais ending at %lu, rais_max_time %u",
++ bfqq->last_wr_start_finish,
++ jiffies_to_msecs(bfqq->wr_cur_max_time));
++ bfq_bfqq_end_wr(bfqq);
++ __bfq_entity_update_weight_prio(
++ bfq_entity_service_tree(entity),
++ entity);
++ }
++ }
++}
++
++/*
++ * Dispatch one request from bfqq, moving it to the request queue
++ * dispatch list.
++ */
++static int bfq_dispatch_request(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ int dispatched = 0;
++ struct request *rq;
++ unsigned long service_to_charge;
++
++ BUG_ON(RB_EMPTY_ROOT(&bfqq->sort_list));
++
++ /* Follow expired path, else get first next available. */
++ rq = bfq_check_fifo(bfqq);
++ if (rq == NULL)
++ rq = bfqq->next_rq;
++ service_to_charge = bfq_serv_to_charge(rq, bfqq);
++
++ if (service_to_charge > bfq_bfqq_budget_left(bfqq)) {
++ /*
++ * This may happen if the next rq is chosen in fifo order
++ * instead of sector order. The budget is properly
++ * dimensioned to be always sufficient to serve the next
++ * request only if it is chosen in sector order. The reason
++ * is that it would be quite inefficient and of little use
++ * to always make sure that the budget is large enough to
++ * serve even the possible next rq in fifo order.
++ * In fact, requests are seldom served in fifo order.
++ *
++ * Expire the queue for budget exhaustion, and make sure
++ * that the next act_budget is enough to serve the next
++ * request, even if it comes from the fifo expired path.
++ */
++ bfqq->next_rq = rq;
++ /*
++ * Since this dispatch has failed, make sure that
++ * a new one will be performed.
++ */
++ if (!bfqd->rq_in_driver)
++ bfq_schedule_dispatch(bfqd);
++ goto expire;
++ }
++
++ /* Finally, insert request into driver dispatch list. */
++ bfq_bfqq_served(bfqq, service_to_charge);
++ bfq_dispatch_insert(bfqd->queue, rq);
++
++ bfq_update_wr_data(bfqd, bfqq);
++
++ bfq_log_bfqq(bfqd, bfqq,
++ "dispatched %u sec req (%llu), budg left %lu",
++ blk_rq_sectors(rq),
++ (long long unsigned)blk_rq_pos(rq),
++ bfq_bfqq_budget_left(bfqq));
++
++ dispatched++;
++
++ if (bfqd->in_service_bic == NULL) {
++ atomic_long_inc(&RQ_BIC(rq)->icq.ioc->refcount);
++ bfqd->in_service_bic = RQ_BIC(rq);
++ }
++
++ if (bfqd->busy_queues > 1 && ((!bfq_bfqq_sync(bfqq) &&
++ dispatched >= bfqd->bfq_max_budget_async_rq) ||
++ bfq_class_idle(bfqq)))
++ goto expire;
++
++ return dispatched;
++
++expire:
++ bfq_bfqq_expire(bfqd, bfqq, 0, BFQ_BFQQ_BUDGET_EXHAUSTED);
++ return dispatched;
++}
++
++static int __bfq_forced_dispatch_bfqq(struct bfq_queue *bfqq)
++{
++ int dispatched = 0;
++
++ while (bfqq->next_rq != NULL) {
++ bfq_dispatch_insert(bfqq->bfqd->queue, bfqq->next_rq);
++ dispatched++;
++ }
++
++ BUG_ON(!list_empty(&bfqq->fifo));
++ return dispatched;
++}
++
++/*
++ * Drain our current requests.
++ * Used for barriers and when switching io schedulers on-the-fly.
++ */
++static int bfq_forced_dispatch(struct bfq_data *bfqd)
++{
++ struct bfq_queue *bfqq, *n;
++ struct bfq_service_tree *st;
++ int dispatched = 0;
++
++ bfqq = bfqd->in_service_queue;
++ if (bfqq != NULL)
++ __bfq_bfqq_expire(bfqd, bfqq);
++
++ /*
++ * Loop through classes, and be careful to leave the scheduler
++ * in a consistent state, as feedback mechanisms and vtime
++ * updates cannot be disabled during the process.
++ */
++ list_for_each_entry_safe(bfqq, n, &bfqd->active_list, bfqq_list) {
++ st = bfq_entity_service_tree(&bfqq->entity);
++
++ dispatched += __bfq_forced_dispatch_bfqq(bfqq);
++ bfqq->max_budget = bfq_max_budget(bfqd);
++
++ bfq_forget_idle(st);
++ }
++
++ BUG_ON(bfqd->busy_queues != 0);
++
++ return dispatched;
++}
++
++static int bfq_dispatch_requests(struct request_queue *q, int force)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct bfq_queue *bfqq;
++ int max_dispatch;
++
++ bfq_log(bfqd, "dispatch requests: %d busy queues", bfqd->busy_queues);
++ if (bfqd->busy_queues == 0)
++ return 0;
++
++ if (unlikely(force))
++ return bfq_forced_dispatch(bfqd);
++
++ bfqq = bfq_select_queue(bfqd);
++ if (bfqq == NULL)
++ return 0;
++
++ if (bfq_class_idle(bfqq))
++ max_dispatch = 1;
++
++ if (!bfq_bfqq_sync(bfqq))
++ max_dispatch = bfqd->bfq_max_budget_async_rq;
++
++ if (!bfq_bfqq_sync(bfqq) && bfqq->dispatched >= max_dispatch) {
++ if (bfqd->busy_queues > 1)
++ return 0;
++ if (bfqq->dispatched >= 4 * max_dispatch)
++ return 0;
++ }
++
++ if (bfqd->sync_flight != 0 && !bfq_bfqq_sync(bfqq))
++ return 0;
++
++ bfq_clear_bfqq_wait_request(bfqq);
++ BUG_ON(timer_pending(&bfqd->idle_slice_timer));
++
++ if (!bfq_dispatch_request(bfqd, bfqq))
++ return 0;
++
++ bfq_log_bfqq(bfqd, bfqq, "dispatched %s request",
++ bfq_bfqq_sync(bfqq) ? "sync" : "async");
++
++ return 1;
++}
++
++/*
++ * Task holds one reference to the queue, dropped when task exits. Each rq
++ * in-flight on this queue also holds a reference, dropped when rq is freed.
++ *
++ * Queue lock must be held here.
++ */
++static void bfq_put_queue(struct bfq_queue *bfqq)
++{
++ struct bfq_data *bfqd = bfqq->bfqd;
++
++ BUG_ON(atomic_read(&bfqq->ref) <= 0);
++
++ bfq_log_bfqq(bfqd, bfqq, "put_queue: %p %d", bfqq,
++ atomic_read(&bfqq->ref));
++ if (!atomic_dec_and_test(&bfqq->ref))
++ return;
++
++ BUG_ON(rb_first(&bfqq->sort_list) != NULL);
++ BUG_ON(bfqq->allocated[READ] + bfqq->allocated[WRITE] != 0);
++ BUG_ON(bfqq->entity.tree != NULL);
++ BUG_ON(bfq_bfqq_busy(bfqq));
++ BUG_ON(bfqd->in_service_queue == bfqq);
++
++ if (bfq_bfqq_sync(bfqq))
++ /*
++ * The fact that this queue is being destroyed does not
++ * invalidate the fact that this queue may have been
++ * activated during the current burst. As a consequence,
++ * although the queue does not exist anymore, and hence
++ * needs to be removed from the burst list if there,
++ * the burst size must not be decremented.
++ */
++ hlist_del_init(&bfqq->burst_list_node);
++
++ bfq_log_bfqq(bfqd, bfqq, "put_queue: %p freed", bfqq);
++
++ kmem_cache_free(bfq_pool, bfqq);
++}
++
++static void bfq_put_cooperator(struct bfq_queue *bfqq)
++{
++ struct bfq_queue *__bfqq, *next;
++
++ /*
++ * If this queue was scheduled to merge with another queue, be
++ * sure to drop the reference taken on that queue (and others in
++ * the merge chain). See bfq_setup_merge and bfq_merge_bfqqs.
++ */
++ __bfqq = bfqq->new_bfqq;
++ while (__bfqq) {
++ if (__bfqq == bfqq)
++ break;
++ next = __bfqq->new_bfqq;
++ bfq_put_queue(__bfqq);
++ __bfqq = next;
++ }
++}
++
++static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ if (bfqq == bfqd->in_service_queue) {
++ __bfq_bfqq_expire(bfqd, bfqq);
++ bfq_schedule_dispatch(bfqd);
++ }
++
++ bfq_log_bfqq(bfqd, bfqq, "exit_bfqq: %p, %d", bfqq,
++ atomic_read(&bfqq->ref));
++
++ bfq_put_cooperator(bfqq);
++
++ bfq_put_queue(bfqq);
++}
++
++static inline void bfq_init_icq(struct io_cq *icq)
++{
++ struct bfq_io_cq *bic = icq_to_bic(icq);
++
++ bic->ttime.last_end_request = jiffies;
++}
++
++static void bfq_exit_icq(struct io_cq *icq)
++{
++ struct bfq_io_cq *bic = icq_to_bic(icq);
++ struct bfq_data *bfqd = bic_to_bfqd(bic);
++
++ if (bic->bfqq[BLK_RW_ASYNC]) {
++ bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_ASYNC]);
++ bic->bfqq[BLK_RW_ASYNC] = NULL;
++ }
++
++ if (bic->bfqq[BLK_RW_SYNC]) {
++ bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
++ bic->bfqq[BLK_RW_SYNC] = NULL;
++ }
++}
++
++/*
++ * Update the entity prio values; note that the new values will not
++ * be used until the next (re)activation.
++ */
++static void bfq_set_next_ioprio_data(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
++{
++ struct task_struct *tsk = current;
++ int ioprio_class;
++
++ ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
++ switch (ioprio_class) {
++ default:
++ dev_err(bfqq->bfqd->queue->backing_dev_info.dev,
++ "bfq: bad prio class %d\n", ioprio_class); /* fall through */
++ case IOPRIO_CLASS_NONE:
++ /*
++ * No prio set, inherit CPU scheduling settings.
++ */
++ bfqq->entity.new_ioprio = task_nice_ioprio(tsk);
++ bfqq->entity.new_ioprio_class = task_nice_ioclass(tsk);
++ break;
++ case IOPRIO_CLASS_RT:
++ bfqq->entity.new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
++ bfqq->entity.new_ioprio_class = IOPRIO_CLASS_RT;
++ break;
++ case IOPRIO_CLASS_BE:
++ bfqq->entity.new_ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
++ bfqq->entity.new_ioprio_class = IOPRIO_CLASS_BE;
++ break;
++ case IOPRIO_CLASS_IDLE:
++ bfqq->entity.new_ioprio_class = IOPRIO_CLASS_IDLE;
++ bfqq->entity.new_ioprio = 7;
++ bfq_clear_bfqq_idle_window(bfqq);
++ break;
++ }
++
++ if (bfqq->entity.new_ioprio < 0 ||
++ bfqq->entity.new_ioprio >= IOPRIO_BE_NR) {
++ printk(KERN_CRIT "bfq_set_next_ioprio_data: new_ioprio %d\n",
++ bfqq->entity.new_ioprio);
++ BUG();
++ }
++
++ bfqq->entity.new_weight = bfq_ioprio_to_weight(bfqq->entity.new_ioprio);
++ bfqq->entity.ioprio_changed = 1;
++}
++
++static void bfq_check_ioprio_change(struct bfq_io_cq *bic)
++{
++ struct bfq_data *bfqd;
++ struct bfq_queue *bfqq, *new_bfqq;
++ struct bfq_group *bfqg;
++ unsigned long uninitialized_var(flags);
++ int ioprio = bic->icq.ioc->ioprio;
++
++ bfqd = bfq_get_bfqd_locked(&(bic->icq.q->elevator->elevator_data),
++ &flags);
++ /*
++ * This condition may trigger on a newly created bic; be sure to
++ * drop the lock before returning.
++ */
++ if (unlikely(bfqd == NULL) || likely(bic->ioprio == ioprio))
++ goto out;
++
++ bic->ioprio = ioprio;
++
++ bfqq = bic->bfqq[BLK_RW_ASYNC];
++ if (bfqq != NULL) {
++ bfqg = container_of(bfqq->entity.sched_data, struct bfq_group,
++ sched_data);
++ new_bfqq = bfq_get_queue(bfqd, bfqg, BLK_RW_ASYNC, bic,
++ GFP_ATOMIC);
++ if (new_bfqq != NULL) {
++ bic->bfqq[BLK_RW_ASYNC] = new_bfqq;
++ bfq_log_bfqq(bfqd, bfqq,
++ "check_ioprio_change: bfqq %p %d",
++ bfqq, atomic_read(&bfqq->ref));
++ bfq_put_queue(bfqq);
++ }
++ }
++
++ bfqq = bic->bfqq[BLK_RW_SYNC];
++ if (bfqq != NULL)
++ bfq_set_next_ioprio_data(bfqq, bic);
++
++out:
++ bfq_put_bfqd_unlock(bfqd, &flags);
++}
++
++static void bfq_init_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ struct bfq_io_cq *bic, pid_t pid, int is_sync)
++{
++ RB_CLEAR_NODE(&bfqq->entity.rb_node);
++ INIT_LIST_HEAD(&bfqq->fifo);
++ INIT_HLIST_NODE(&bfqq->burst_list_node);
++
++ atomic_set(&bfqq->ref, 0);
++ bfqq->bfqd = bfqd;
++
++ if (bic)
++ bfq_set_next_ioprio_data(bfqq, bic);
++
++ if (is_sync) {
++ if (!bfq_class_idle(bfqq))
++ bfq_mark_bfqq_idle_window(bfqq);
++ bfq_mark_bfqq_sync(bfqq);
++ }
++ bfq_mark_bfqq_IO_bound(bfqq);
++
++ /* Tentative initial value, trading off throughput and latency */
++ bfqq->max_budget = (2 * bfq_max_budget(bfqd)) / 3;
++ bfqq->pid = pid;
++
++ bfqq->wr_coeff = 1;
++ bfqq->last_wr_start_finish = 0;
++ /*
++ * Set to the value for which bfqq will not be deemed as
++ * soft rt when it becomes backlogged.
++ */
++ bfqq->soft_rt_next_start = bfq_infinity_from_now(jiffies);
++}
++
++static struct bfq_queue *bfq_find_alloc_queue(struct bfq_data *bfqd,
++ struct bfq_group *bfqg,
++ int is_sync,
++ struct bfq_io_cq *bic,
++ gfp_t gfp_mask)
++{
++ struct bfq_queue *bfqq, *new_bfqq = NULL;
++
++retry:
++ /* bic always exists here */
++ bfqq = bic_to_bfqq(bic, is_sync);
++
++ /*
++ * Always try a new alloc if we originally fell back to the OOM bfqq,
++ * since that should just be a temporary situation.
++ */
++ if (bfqq == NULL || bfqq == &bfqd->oom_bfqq) {
++ bfqq = NULL;
++ if (new_bfqq != NULL) {
++ bfqq = new_bfqq;
++ new_bfqq = NULL;
++ } else if (gfp_mask & __GFP_WAIT) {
++ spin_unlock_irq(bfqd->queue->queue_lock);
++ new_bfqq = kmem_cache_alloc_node(bfq_pool,
++ gfp_mask | __GFP_ZERO,
++ bfqd->queue->node);
++ spin_lock_irq(bfqd->queue->queue_lock);
++ if (new_bfqq != NULL)
++ goto retry;
++ } else {
++ bfqq = kmem_cache_alloc_node(bfq_pool,
++ gfp_mask | __GFP_ZERO,
++ bfqd->queue->node);
++ }
++
++ if (bfqq != NULL) {
++ bfq_init_bfqq(bfqd, bfqq, bic, current->pid,
++ is_sync);
++ bfq_init_entity(&bfqq->entity, bfqg);
++ bfq_log_bfqq(bfqd, bfqq, "allocated");
++ } else {
++ bfqq = &bfqd->oom_bfqq;
++ bfq_log_bfqq(bfqd, bfqq, "using oom bfqq");
++ }
++ }
++
++ if (new_bfqq != NULL)
++ kmem_cache_free(bfq_pool, new_bfqq);
++
++ return bfqq;
++}
++
++static struct bfq_queue **bfq_async_queue_prio(struct bfq_data *bfqd,
++ struct bfq_group *bfqg,
++ int ioprio_class, int ioprio)
++{
++ switch (ioprio_class) {
++ case IOPRIO_CLASS_RT:
++ return &bfqg->async_bfqq[0][ioprio];
++ case IOPRIO_CLASS_NONE:
++ ioprio = IOPRIO_NORM;
++ /* fall through */
++ case IOPRIO_CLASS_BE:
++ return &bfqg->async_bfqq[1][ioprio];
++ case IOPRIO_CLASS_IDLE:
++ return &bfqg->async_idle_bfqq;
++ default:
++ BUG();
++ }
++}
++
++static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
++ struct bfq_group *bfqg, int is_sync,
++ struct bfq_io_cq *bic, gfp_t gfp_mask)
++{
++ const int ioprio = IOPRIO_PRIO_DATA(bic->ioprio);
++ const int ioprio_class = IOPRIO_PRIO_CLASS(bic->ioprio);
++ struct bfq_queue **async_bfqq = NULL;
++ struct bfq_queue *bfqq = NULL;
++
++ if (!is_sync) {
++ async_bfqq = bfq_async_queue_prio(bfqd, bfqg, ioprio_class,
++ ioprio);
++ bfqq = *async_bfqq;
++ }
++
++ if (bfqq == NULL)
++ bfqq = bfq_find_alloc_queue(bfqd, bfqg, is_sync, bic, gfp_mask);
++
++ /*
++ * Pin the queue now that it's allocated; scheduler exit will
++ * prune it.
++ */
++ if (!is_sync && *async_bfqq == NULL) {
++ atomic_inc(&bfqq->ref);
++ bfq_log_bfqq(bfqd, bfqq, "get_queue, bfqq not in async: %p, %d",
++ bfqq, atomic_read(&bfqq->ref));
++ *async_bfqq = bfqq;
++ }
++
++ atomic_inc(&bfqq->ref);
++ bfq_log_bfqq(bfqd, bfqq, "get_queue, at end: %p, %d", bfqq,
++ atomic_read(&bfqq->ref));
++ return bfqq;
++}
++
++static void bfq_update_io_thinktime(struct bfq_data *bfqd,
++ struct bfq_io_cq *bic)
++{
++ unsigned long elapsed = jiffies - bic->ttime.last_end_request;
++ unsigned long ttime = min(elapsed, 2UL * bfqd->bfq_slice_idle);
++
++ bic->ttime.ttime_samples = (7*bic->ttime.ttime_samples + 256) / 8;
++ bic->ttime.ttime_total = (7*bic->ttime.ttime_total + 256*ttime) / 8;
++ bic->ttime.ttime_mean = (bic->ttime.ttime_total + 128) /
++ bic->ttime.ttime_samples;
++}
++
++static void bfq_update_io_seektime(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ struct request *rq)
++{
++ sector_t sdist;
++ u64 total;
++
++ if (bfqq->last_request_pos < blk_rq_pos(rq))
++ sdist = blk_rq_pos(rq) - bfqq->last_request_pos;
++ else
++ sdist = bfqq->last_request_pos - blk_rq_pos(rq);
++
++ /*
++ * Don't allow the seek distance to get too large from the
++ * odd fragment, pagein, etc.
++ */
++ if (bfqq->seek_samples == 0) /* first request, not really a seek */
++ sdist = 0;
++ else if (bfqq->seek_samples <= 60) /* second & third seek */
++ sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*1024);
++ else
++ sdist = min(sdist, (bfqq->seek_mean * 4) + 2*1024*64);
++
++ bfqq->seek_samples = (7*bfqq->seek_samples + 256) / 8;
++ bfqq->seek_total = (7*bfqq->seek_total + (u64)256*sdist) / 8;
++ total = bfqq->seek_total + (bfqq->seek_samples/2);
++ do_div(total, bfqq->seek_samples);
++ bfqq->seek_mean = (sector_t)total;
++
++ bfq_log_bfqq(bfqd, bfqq, "dist=%llu mean=%llu", (u64)sdist,
++ (u64)bfqq->seek_mean);
++}
++
++/*
++ * Disable idle window if the process thinks too long or seeks so much that
++ * it doesn't matter.
++ */
++static void bfq_update_idle_window(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq,
++ struct bfq_io_cq *bic)
++{
++ int enable_idle;
++
++ /* Don't idle for async or idle io prio class. */
++ if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
++ return;
++
++ enable_idle = bfq_bfqq_idle_window(bfqq);
++
++ if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
++ bfqd->bfq_slice_idle == 0 ||
++ (bfqd->hw_tag && BFQQ_SEEKY(bfqq) &&
++ bfqq->wr_coeff == 1))
++ enable_idle = 0;
++ else if (bfq_sample_valid(bic->ttime.ttime_samples)) {
++ if (bic->ttime.ttime_mean > bfqd->bfq_slice_idle &&
++ bfqq->wr_coeff == 1)
++ enable_idle = 0;
++ else
++ enable_idle = 1;
++ }
++ bfq_log_bfqq(bfqd, bfqq, "update_idle_window: enable_idle %d",
++ enable_idle);
++
++ if (enable_idle)
++ bfq_mark_bfqq_idle_window(bfqq);
++ else
++ bfq_clear_bfqq_idle_window(bfqq);
++}
++
++/*
++ * Called when a new fs request (rq) is added to bfqq. Check if there's
++ * something we should do about it.
++ */
++static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ struct request *rq)
++{
++ struct bfq_io_cq *bic = RQ_BIC(rq);
++
++ if (rq->cmd_flags & REQ_META)
++ bfqq->meta_pending++;
++
++ bfq_update_io_thinktime(bfqd, bic);
++ bfq_update_io_seektime(bfqd, bfqq, rq);
++ if (!BFQQ_SEEKY(bfqq) && bfq_bfqq_constantly_seeky(bfqq)) {
++ bfq_clear_bfqq_constantly_seeky(bfqq);
++ if (!blk_queue_nonrot(bfqd->queue)) {
++ BUG_ON(!bfqd->const_seeky_busy_in_flight_queues);
++ bfqd->const_seeky_busy_in_flight_queues--;
++ }
++ }
++ if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
++ !BFQQ_SEEKY(bfqq))
++ bfq_update_idle_window(bfqd, bfqq, bic);
++
++ bfq_log_bfqq(bfqd, bfqq,
++ "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
++ bfq_bfqq_idle_window(bfqq), BFQQ_SEEKY(bfqq),
++ (long long unsigned)bfqq->seek_mean);
++
++ bfqq->last_request_pos = blk_rq_pos(rq) + blk_rq_sectors(rq);
++
++ if (bfqq == bfqd->in_service_queue && bfq_bfqq_wait_request(bfqq)) {
++ int small_req = bfqq->queued[rq_is_sync(rq)] == 1 &&
++ blk_rq_sectors(rq) < 32;
++ int budget_timeout = bfq_bfqq_budget_timeout(bfqq);
++
++ /*
++ * There is just this request queued: if the request
++ * is small and the queue is not to be expired, then
++ * just exit.
++ *
++ * In this way, if the disk is being idled to wait for
++ * a new request from the in-service queue, we avoid
++ * unplugging the device and committing the disk to serve
++ * just a small request. Instead, we wait for
++ * the block layer to decide when to unplug the device:
++ * hopefully, new requests will be merged to this one
++ * quickly, then the device will be unplugged and
++ * larger requests will be dispatched.
++ */
++ if (small_req && !budget_timeout)
++ return;
++
++ /*
++ * A large enough request arrived, or the queue is to
++ * be expired: in both cases disk idling is to be
++ * stopped, so clear wait_request flag and reset
++ * timer.
++ */
++ bfq_clear_bfqq_wait_request(bfqq);
++ del_timer(&bfqd->idle_slice_timer);
++
++ /*
++ * The queue is not empty, because a new request just
++ * arrived. Hence we can safely expire the queue, in
++ * case of budget timeout, without risking that the
++ * timestamps of the queue are not updated correctly.
++ * See [1] for more details.
++ */
++ if (budget_timeout)
++ bfq_bfqq_expire(bfqd, bfqq, 0, BFQ_BFQQ_BUDGET_TIMEOUT);
++
++ /*
++ * Let the request rip immediately, or let a new queue be
++ * selected if bfqq has just been expired.
++ */
++ __blk_run_queue(bfqd->queue);
++ }
++}
++
++static void bfq_insert_request(struct request_queue *q, struct request *rq)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct bfq_queue *bfqq = RQ_BFQQ(rq);
++
++ assert_spin_locked(bfqd->queue->queue_lock);
++
++ bfq_add_request(rq);
++
++ rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
++ list_add_tail(&rq->queuelist, &bfqq->fifo);
++
++ bfq_rq_enqueued(bfqd, bfqq, rq);
++}
++
++static void bfq_update_hw_tag(struct bfq_data *bfqd)
++{
++ bfqd->max_rq_in_driver = max(bfqd->max_rq_in_driver,
++ bfqd->rq_in_driver);
++
++ if (bfqd->hw_tag == 1)
++ return;
++
++ /*
++ * This sample is valid if the number of outstanding requests
++ * is large enough to allow a queueing behavior. Note that the
++ * sum is not exact, as it's not taking into account deactivated
++ * requests.
++ */
++ if (bfqd->rq_in_driver + bfqd->queued < BFQ_HW_QUEUE_THRESHOLD)
++ return;
++
++ if (bfqd->hw_tag_samples++ < BFQ_HW_QUEUE_SAMPLES)
++ return;
++
++ bfqd->hw_tag = bfqd->max_rq_in_driver > BFQ_HW_QUEUE_THRESHOLD;
++ bfqd->max_rq_in_driver = 0;
++ bfqd->hw_tag_samples = 0;
++}
++
++static void bfq_completed_request(struct request_queue *q, struct request *rq)
++{
++ struct bfq_queue *bfqq = RQ_BFQQ(rq);
++ struct bfq_data *bfqd = bfqq->bfqd;
++ bool sync = bfq_bfqq_sync(bfqq);
++
++ bfq_log_bfqq(bfqd, bfqq, "completed one req with %u sects left (%d)",
++ blk_rq_sectors(rq), sync);
++
++ bfq_update_hw_tag(bfqd);
++
++ BUG_ON(!bfqd->rq_in_driver);
++ BUG_ON(!bfqq->dispatched);
++ bfqd->rq_in_driver--;
++ bfqq->dispatched--;
++
++ if (!bfqq->dispatched && !bfq_bfqq_busy(bfqq)) {
++ bfq_weights_tree_remove(bfqd, &bfqq->entity,
++ &bfqd->queue_weights_tree);
++ if (!blk_queue_nonrot(bfqd->queue)) {
++ BUG_ON(!bfqd->busy_in_flight_queues);
++ bfqd->busy_in_flight_queues--;
++ if (bfq_bfqq_constantly_seeky(bfqq)) {
++ BUG_ON(!bfqd->
++ const_seeky_busy_in_flight_queues);
++ bfqd->const_seeky_busy_in_flight_queues--;
++ }
++ }
++ }
++
++ if (sync) {
++ bfqd->sync_flight--;
++ RQ_BIC(rq)->ttime.last_end_request = jiffies;
++ }
++
++ /*
++ * If we are waiting to discover whether the request pattern of the
++ * task associated with the queue is actually isochronous, and
++ * both requisites for this condition to hold are satisfied, then
++ * compute soft_rt_next_start (see the comments to the function
++ * bfq_bfqq_softrt_next_start()).
++ */
++ if (bfq_bfqq_softrt_update(bfqq) && bfqq->dispatched == 0 &&
++ RB_EMPTY_ROOT(&bfqq->sort_list))
++ bfqq->soft_rt_next_start =
++ bfq_bfqq_softrt_next_start(bfqd, bfqq);
++
++ /*
++ * If this is the in-service queue, check if it needs to be expired,
++ * or if we want to idle in case it has no pending requests.
++ */
++ if (bfqd->in_service_queue == bfqq) {
++ if (bfq_bfqq_budget_new(bfqq))
++ bfq_set_budget_timeout(bfqd);
++
++ if (bfq_bfqq_must_idle(bfqq)) {
++ bfq_arm_slice_timer(bfqd);
++ goto out;
++ } else if (bfq_may_expire_for_budg_timeout(bfqq))
++ bfq_bfqq_expire(bfqd, bfqq, 0, BFQ_BFQQ_BUDGET_TIMEOUT);
++ else if (RB_EMPTY_ROOT(&bfqq->sort_list) &&
++ (bfqq->dispatched == 0 ||
++ !bfq_bfqq_must_not_expire(bfqq)))
++ bfq_bfqq_expire(bfqd, bfqq, 0,
++ BFQ_BFQQ_NO_MORE_REQUESTS);
++ }
++
++ if (!bfqd->rq_in_driver)
++ bfq_schedule_dispatch(bfqd);
++
++out:
++ return;
++}
++
++static inline int __bfq_may_queue(struct bfq_queue *bfqq)
++{
++ if (bfq_bfqq_wait_request(bfqq) && bfq_bfqq_must_alloc(bfqq)) {
++ bfq_clear_bfqq_must_alloc(bfqq);
++ return ELV_MQUEUE_MUST;
++ }
++
++ return ELV_MQUEUE_MAY;
++}
++
++static int bfq_may_queue(struct request_queue *q, int rw)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct task_struct *tsk = current;
++ struct bfq_io_cq *bic;
++ struct bfq_queue *bfqq;
++
++ /*
++ * Don't force setup of a queue from here, as a call to may_queue
++ * does not necessarily imply that a request actually will be
++ * queued. So just lookup a possibly existing queue, or return
++ * 'may queue' if that fails.
++ */
++ bic = bfq_bic_lookup(bfqd, tsk->io_context);
++ if (bic == NULL)
++ return ELV_MQUEUE_MAY;
++
++ bfqq = bic_to_bfqq(bic, rw_is_sync(rw));
++ if (bfqq != NULL)
++ return __bfq_may_queue(bfqq);
++
++ return ELV_MQUEUE_MAY;
++}
++
++/*
++ * Queue lock held here.
++ */
++static void bfq_put_request(struct request *rq)
++{
++ struct bfq_queue *bfqq = RQ_BFQQ(rq);
++
++ if (bfqq != NULL) {
++ const int rw = rq_data_dir(rq);
++
++ BUG_ON(!bfqq->allocated[rw]);
++ bfqq->allocated[rw]--;
++
++ rq->elv.priv[0] = NULL;
++ rq->elv.priv[1] = NULL;
++
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "put_request %p, %d",
++ bfqq, atomic_read(&bfqq->ref));
++ bfq_put_queue(bfqq);
++ }
++}
++
++static struct bfq_queue *
++bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
++ struct bfq_queue *bfqq)
++{
++ bfq_log_bfqq(bfqd, bfqq, "merging with queue %lu",
++ (long unsigned)bfqq->new_bfqq->pid);
++ bic_set_bfqq(bic, bfqq->new_bfqq, 1);
++ bfq_mark_bfqq_coop(bfqq->new_bfqq);
++ bfq_put_queue(bfqq);
++ return bic_to_bfqq(bic, 1);
++}
++
++/*
++ * Returns NULL if a new bfqq should be allocated, or the old bfqq if this
++ * was the last process referring to said bfqq.
++ */
++static struct bfq_queue *
++bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
++{
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "splitting queue");
++ if (bfqq_process_refs(bfqq) == 1) {
++ bfqq->pid = current->pid;
++ bfq_clear_bfqq_coop(bfqq);
++ bfq_clear_bfqq_split_coop(bfqq);
++ return bfqq;
++ }
++
++ bic_set_bfqq(bic, NULL, 1);
++
++ bfq_put_cooperator(bfqq);
++
++ bfq_put_queue(bfqq);
++ return NULL;
++}
++
++/*
++ * Allocate bfq data structures associated with this request.
++ */
++static int bfq_set_request(struct request_queue *q, struct request *rq,
++ struct bio *bio, gfp_t gfp_mask)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct bfq_io_cq *bic = icq_to_bic(rq->elv.icq);
++ const int rw = rq_data_dir(rq);
++ const int is_sync = rq_is_sync(rq);
++ struct bfq_queue *bfqq;
++ struct bfq_group *bfqg;
++ unsigned long flags;
++
++ might_sleep_if(gfp_mask & __GFP_WAIT);
++
++ bfq_check_ioprio_change(bic);
++
++ spin_lock_irqsave(q->queue_lock, flags);
++
++ if (bic == NULL)
++ goto queue_fail;
++
++ bfqg = bfq_bic_update_cgroup(bic);
++
++new_queue:
++ bfqq = bic_to_bfqq(bic, is_sync);
++ if (bfqq == NULL || bfqq == &bfqd->oom_bfqq) {
++ bfqq = bfq_get_queue(bfqd, bfqg, is_sync, bic, gfp_mask);
++ bic_set_bfqq(bic, bfqq, is_sync);
++ } else {
++ /*
++ * If the queue was seeky for too long, break it apart.
++ */
++ if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq)) {
++ bfq_log_bfqq(bfqd, bfqq, "breaking apart bfqq");
++ bfqq = bfq_split_bfqq(bic, bfqq);
++ if (!bfqq)
++ goto new_queue;
++ }
++
++ /*
++ * Check to see if this queue is scheduled to merge with
++ * another closely cooperating queue. The merging of queues
++ * happens here as it must be done in process context.
++ * The reference on new_bfqq was taken in merge_bfqqs.
++ */
++ if (bfqq->new_bfqq != NULL)
++ bfqq = bfq_merge_bfqqs(bfqd, bic, bfqq);
++ }
++
++ bfqq->allocated[rw]++;
++ atomic_inc(&bfqq->ref);
++ bfq_log_bfqq(bfqd, bfqq, "set_request: bfqq %p, %d", bfqq,
++ atomic_read(&bfqq->ref));
++
++ rq->elv.priv[0] = bic;
++ rq->elv.priv[1] = bfqq;
++
++ spin_unlock_irqrestore(q->queue_lock, flags);
++
++ return 0;
++
++queue_fail:
++ bfq_schedule_dispatch(bfqd);
++ spin_unlock_irqrestore(q->queue_lock, flags);
++
++ return 1;
++}
++
++static void bfq_kick_queue(struct work_struct *work)
++{
++ struct bfq_data *bfqd =
++ container_of(work, struct bfq_data, unplug_work);
++ struct request_queue *q = bfqd->queue;
++
++ spin_lock_irq(q->queue_lock);
++ __blk_run_queue(q);
++ spin_unlock_irq(q->queue_lock);
++}
++
++/*
++ * Handler of the expiration of the timer running if the in-service queue
++ * is idling inside its time slice.
++ */
++static void bfq_idle_slice_timer(unsigned long data)
++{
++ struct bfq_data *bfqd = (struct bfq_data *)data;
++ struct bfq_queue *bfqq;
++ unsigned long flags;
++ enum bfqq_expiration reason;
++
++ spin_lock_irqsave(bfqd->queue->queue_lock, flags);
++
++ bfqq = bfqd->in_service_queue;
++ /*
++ * Theoretical race here: the in-service queue can be NULL or
++ * different from the queue that was idling if the timer handler
++ * spins on the queue_lock and a new request arrives for the
++ * current queue and there is a full dispatch cycle that changes
++ * the in-service queue. This can hardly happen, but in the worst
++ * case we just expire a queue too early.
++ */
++ if (bfqq != NULL) {
++ bfq_log_bfqq(bfqd, bfqq, "slice_timer expired");
++ if (bfq_bfqq_budget_timeout(bfqq))
++ /*
++ * Also here the queue can be safely expired
++ * for budget timeout without wasting
++ * guarantees
++ */
++ reason = BFQ_BFQQ_BUDGET_TIMEOUT;
++ else if (bfqq->queued[0] == 0 && bfqq->queued[1] == 0)
++ /*
++ * The queue may not be empty upon timer expiration,
++ * because we may not disable the timer when the
++ * first request of the in-service queue arrives
++ * during disk idling.
++ */
++ reason = BFQ_BFQQ_TOO_IDLE;
++ else
++ goto schedule_dispatch;
++
++ bfq_bfqq_expire(bfqd, bfqq, 1, reason);
++ }
++
++schedule_dispatch:
++ bfq_schedule_dispatch(bfqd);
++
++ spin_unlock_irqrestore(bfqd->queue->queue_lock, flags);
++}
++
++static void bfq_shutdown_timer_wq(struct bfq_data *bfqd)
++{
++ del_timer_sync(&bfqd->idle_slice_timer);
++ cancel_work_sync(&bfqd->unplug_work);
++}
++
++static inline void __bfq_put_async_bfqq(struct bfq_data *bfqd,
++ struct bfq_queue **bfqq_ptr)
++{
++ struct bfq_group *root_group = bfqd->root_group;
++ struct bfq_queue *bfqq = *bfqq_ptr;
++
++ bfq_log(bfqd, "put_async_bfqq: %p", bfqq);
++ if (bfqq != NULL) {
++ bfq_bfqq_move(bfqd, bfqq, &bfqq->entity, root_group);
++ bfq_log_bfqq(bfqd, bfqq, "put_async_bfqq: putting %p, %d",
++ bfqq, atomic_read(&bfqq->ref));
++ bfq_put_queue(bfqq);
++ *bfqq_ptr = NULL;
++ }
++}
++
++/*
++ * Release all the bfqg references to its async queues. If we are
++ * deallocating the group these queues may still contain requests, so
++ * we reparent them to the root cgroup (i.e., the only one that will
++ * exist for sure until all the requests on a device are gone).
++ */
++static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg)
++{
++ int i, j;
++
++ for (i = 0; i < 2; i++)
++ for (j = 0; j < IOPRIO_BE_NR; j++)
++ __bfq_put_async_bfqq(bfqd, &bfqg->async_bfqq[i][j]);
++
++ __bfq_put_async_bfqq(bfqd, &bfqg->async_idle_bfqq);
++}
++
++static void bfq_exit_queue(struct elevator_queue *e)
++{
++ struct bfq_data *bfqd = e->elevator_data;
++ struct request_queue *q = bfqd->queue;
++ struct bfq_queue *bfqq, *n;
++
++ bfq_shutdown_timer_wq(bfqd);
++
++ spin_lock_irq(q->queue_lock);
++
++ BUG_ON(bfqd->in_service_queue != NULL);
++ list_for_each_entry_safe(bfqq, n, &bfqd->idle_list, bfqq_list)
++ bfq_deactivate_bfqq(bfqd, bfqq, 0);
++
++ bfq_disconnect_groups(bfqd);
++ spin_unlock_irq(q->queue_lock);
++
++ bfq_shutdown_timer_wq(bfqd);
++
++ synchronize_rcu();
++
++ BUG_ON(timer_pending(&bfqd->idle_slice_timer));
++
++ bfq_free_root_group(bfqd);
++ kfree(bfqd);
++}
++
++static int bfq_init_queue(struct request_queue *q, struct elevator_type *e)
++{
++ struct bfq_group *bfqg;
++ struct bfq_data *bfqd;
++ struct elevator_queue *eq;
++
++ eq = elevator_alloc(q, e);
++ if (eq == NULL)
++ return -ENOMEM;
++
++ bfqd = kzalloc_node(sizeof(*bfqd), GFP_KERNEL, q->node);
++ if (bfqd == NULL) {
++ kobject_put(&eq->kobj);
++ return -ENOMEM;
++ }
++ eq->elevator_data = bfqd;
++
++ /*
++ * Our fallback bfqq if bfq_find_alloc_queue() runs into OOM issues.
++ * Grab a permanent reference to it, so that the normal code flow
++ * will not attempt to free it.
++ */
++ bfq_init_bfqq(bfqd, &bfqd->oom_bfqq, NULL, 1, 0);
++ atomic_inc(&bfqd->oom_bfqq.ref);
++ bfqd->oom_bfqq.entity.new_ioprio = BFQ_DEFAULT_QUEUE_IOPRIO;
++ bfqd->oom_bfqq.entity.new_ioprio_class = IOPRIO_CLASS_BE;
++ bfqd->oom_bfqq.entity.new_weight =
++ bfq_ioprio_to_weight(bfqd->oom_bfqq.entity.new_ioprio);
++ /*
++ * Trigger weight initialization, according to ioprio, at the
++ * oom_bfqq's first activation. The oom_bfqq's ioprio and ioprio
++ * class won't be changed any more.
++ */
++ bfqd->oom_bfqq.entity.ioprio_changed = 1;
++
++ bfqd->queue = q;
++
++ spin_lock_irq(q->queue_lock);
++ q->elevator = eq;
++ spin_unlock_irq(q->queue_lock);
++
++ bfqg = bfq_alloc_root_group(bfqd, q->node);
++ if (bfqg == NULL) {
++ kfree(bfqd);
++ kobject_put(&eq->kobj);
++ return -ENOMEM;
++ }
++
++ bfqd->root_group = bfqg;
++ bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group);
++#ifdef CONFIG_CGROUP_BFQIO
++ bfqd->active_numerous_groups = 0;
++#endif
++
++ init_timer(&bfqd->idle_slice_timer);
++ bfqd->idle_slice_timer.function = bfq_idle_slice_timer;
++ bfqd->idle_slice_timer.data = (unsigned long)bfqd;
++
++ bfqd->rq_pos_tree = RB_ROOT;
++ bfqd->queue_weights_tree = RB_ROOT;
++ bfqd->group_weights_tree = RB_ROOT;
++
++ INIT_WORK(&bfqd->unplug_work, bfq_kick_queue);
++
++ INIT_LIST_HEAD(&bfqd->active_list);
++ INIT_LIST_HEAD(&bfqd->idle_list);
++ INIT_HLIST_HEAD(&bfqd->burst_list);
++
++ bfqd->hw_tag = -1;
++
++ bfqd->bfq_max_budget = bfq_default_max_budget;
++
++ bfqd->bfq_fifo_expire[0] = bfq_fifo_expire[0];
++ bfqd->bfq_fifo_expire[1] = bfq_fifo_expire[1];
++ bfqd->bfq_back_max = bfq_back_max;
++ bfqd->bfq_back_penalty = bfq_back_penalty;
++ bfqd->bfq_slice_idle = bfq_slice_idle;
++ bfqd->bfq_class_idle_last_service = 0;
++ bfqd->bfq_max_budget_async_rq = bfq_max_budget_async_rq;
++ bfqd->bfq_timeout[BLK_RW_ASYNC] = bfq_timeout_async;
++ bfqd->bfq_timeout[BLK_RW_SYNC] = bfq_timeout_sync;
++
++ bfqd->bfq_coop_thresh = 2;
++ bfqd->bfq_failed_cooperations = 7000;
++ bfqd->bfq_requests_within_timer = 120;
++
++ bfqd->bfq_large_burst_thresh = 11;
++ bfqd->bfq_burst_interval = msecs_to_jiffies(500);
++
++ bfqd->low_latency = true;
++
++ bfqd->bfq_wr_coeff = 20;
++ bfqd->bfq_wr_rt_max_time = msecs_to_jiffies(300);
++ bfqd->bfq_wr_max_time = 0;
++ bfqd->bfq_wr_min_idle_time = msecs_to_jiffies(2000);
++ bfqd->bfq_wr_min_inter_arr_async = msecs_to_jiffies(500);
++ bfqd->bfq_wr_max_softrt_rate = 7000; /*
++ * Approximate rate required
++ * to play back or record a
++ * high-definition compressed
++ * video.
++ */
++ bfqd->wr_busy_queues = 0;
++ bfqd->busy_in_flight_queues = 0;
++ bfqd->const_seeky_busy_in_flight_queues = 0;
++
++ /*
++ * Begin by assuming, optimistically, that the device peak rate is
++ * equal to the highest reference rate.
++ */
++ bfqd->RT_prod = R_fast[blk_queue_nonrot(bfqd->queue)] *
++ T_fast[blk_queue_nonrot(bfqd->queue)];
++ bfqd->peak_rate = R_fast[blk_queue_nonrot(bfqd->queue)];
++ bfqd->device_speed = BFQ_BFQD_FAST;
++
++ return 0;
++}
++
++static void bfq_slab_kill(void)
++{
++ if (bfq_pool != NULL)
++ kmem_cache_destroy(bfq_pool);
++}
++
++static int __init bfq_slab_setup(void)
++{
++ bfq_pool = KMEM_CACHE(bfq_queue, 0);
++ if (bfq_pool == NULL)
++ return -ENOMEM;
++ return 0;
++}
++
++static ssize_t bfq_var_show(unsigned int var, char *page)
++{
++ return sprintf(page, "%u\n", var);
++}
++
++static ssize_t bfq_var_store(unsigned long *var, const char *page,
++ size_t count)
++{
++ unsigned long new_val;
++ int ret = kstrtoul(page, 10, &new_val);
++
++ if (ret == 0)
++ *var = new_val;
++
++ return count;
++}
++
++static ssize_t bfq_wr_max_time_show(struct elevator_queue *e, char *page)
++{
++ struct bfq_data *bfqd = e->elevator_data;
++ return sprintf(page, "%d\n", bfqd->bfq_wr_max_time > 0 ?
++ jiffies_to_msecs(bfqd->bfq_wr_max_time) :
++ jiffies_to_msecs(bfq_wr_duration(bfqd)));
++}
++
++static ssize_t bfq_weights_show(struct elevator_queue *e, char *page)
++{
++ struct bfq_queue *bfqq;
++ struct bfq_data *bfqd = e->elevator_data;
++ ssize_t num_char = 0;
++
++ num_char += sprintf(page + num_char, "Tot reqs queued %d\n\n",
++ bfqd->queued);
++
++ spin_lock_irq(bfqd->queue->queue_lock);
++
++ num_char += sprintf(page + num_char, "Active:\n");
++ list_for_each_entry(bfqq, &bfqd->active_list, bfqq_list) {
++ num_char += sprintf(page + num_char,
++ "pid%d: weight %hu, nr_queued %d %d, dur %d/%u\n",
++ bfqq->pid,
++ bfqq->entity.weight,
++ bfqq->queued[0],
++ bfqq->queued[1],
++ jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
++ jiffies_to_msecs(bfqq->wr_cur_max_time));
++ }
++
++ num_char += sprintf(page + num_char, "Idle:\n");
++ list_for_each_entry(bfqq, &bfqd->idle_list, bfqq_list) {
++ num_char += sprintf(page + num_char,
++ "pid%d: weight %hu, dur %d/%u\n",
++ bfqq->pid,
++ bfqq->entity.weight,
++ jiffies_to_msecs(jiffies -
++ bfqq->last_wr_start_finish),
++ jiffies_to_msecs(bfqq->wr_cur_max_time));
++ }
++
++ spin_unlock_irq(bfqd->queue->queue_lock);
++
++ return num_char;
++}
++
++#define SHOW_FUNCTION(__FUNC, __VAR, __CONV) \
++static ssize_t __FUNC(struct elevator_queue *e, char *page) \
++{ \
++ struct bfq_data *bfqd = e->elevator_data; \
++ unsigned int __data = __VAR; \
++ if (__CONV) \
++ __data = jiffies_to_msecs(__data); \
++ return bfq_var_show(__data, (page)); \
++}
++SHOW_FUNCTION(bfq_fifo_expire_sync_show, bfqd->bfq_fifo_expire[1], 1);
++SHOW_FUNCTION(bfq_fifo_expire_async_show, bfqd->bfq_fifo_expire[0], 1);
++SHOW_FUNCTION(bfq_back_seek_max_show, bfqd->bfq_back_max, 0);
++SHOW_FUNCTION(bfq_back_seek_penalty_show, bfqd->bfq_back_penalty, 0);
++SHOW_FUNCTION(bfq_slice_idle_show, bfqd->bfq_slice_idle, 1);
++SHOW_FUNCTION(bfq_max_budget_show, bfqd->bfq_user_max_budget, 0);
++SHOW_FUNCTION(bfq_max_budget_async_rq_show,
++ bfqd->bfq_max_budget_async_rq, 0);
++SHOW_FUNCTION(bfq_timeout_sync_show, bfqd->bfq_timeout[BLK_RW_SYNC], 1);
++SHOW_FUNCTION(bfq_timeout_async_show, bfqd->bfq_timeout[BLK_RW_ASYNC], 1);
++SHOW_FUNCTION(bfq_low_latency_show, bfqd->low_latency, 0);
++SHOW_FUNCTION(bfq_wr_coeff_show, bfqd->bfq_wr_coeff, 0);
++SHOW_FUNCTION(bfq_wr_rt_max_time_show, bfqd->bfq_wr_rt_max_time, 1);
++SHOW_FUNCTION(bfq_wr_min_idle_time_show, bfqd->bfq_wr_min_idle_time, 1);
++SHOW_FUNCTION(bfq_wr_min_inter_arr_async_show, bfqd->bfq_wr_min_inter_arr_async,
++ 1);
++SHOW_FUNCTION(bfq_wr_max_softrt_rate_show, bfqd->bfq_wr_max_softrt_rate, 0);
++#undef SHOW_FUNCTION
++
++#define STORE_FUNCTION(__FUNC, __PTR, MIN, MAX, __CONV) \
++static ssize_t \
++__FUNC(struct elevator_queue *e, const char *page, size_t count) \
++{ \
++ struct bfq_data *bfqd = e->elevator_data; \
++ unsigned long uninitialized_var(__data); \
++ int ret = bfq_var_store(&__data, (page), count); \
++ if (__data < (MIN)) \
++ __data = (MIN); \
++ else if (__data > (MAX)) \
++ __data = (MAX); \
++ if (__CONV) \
++ *(__PTR) = msecs_to_jiffies(__data); \
++ else \
++ *(__PTR) = __data; \
++ return ret; \
++}
++STORE_FUNCTION(bfq_fifo_expire_sync_store, &bfqd->bfq_fifo_expire[1], 1,
++ INT_MAX, 1);
++STORE_FUNCTION(bfq_fifo_expire_async_store, &bfqd->bfq_fifo_expire[0], 1,
++ INT_MAX, 1);
++STORE_FUNCTION(bfq_back_seek_max_store, &bfqd->bfq_back_max, 0, INT_MAX, 0);
++STORE_FUNCTION(bfq_back_seek_penalty_store, &bfqd->bfq_back_penalty, 1,
++ INT_MAX, 0);
++STORE_FUNCTION(bfq_slice_idle_store, &bfqd->bfq_slice_idle, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_max_budget_async_rq_store, &bfqd->bfq_max_budget_async_rq,
++ 1, INT_MAX, 0);
++STORE_FUNCTION(bfq_timeout_async_store, &bfqd->bfq_timeout[BLK_RW_ASYNC], 0,
++ INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_coeff_store, &bfqd->bfq_wr_coeff, 1, INT_MAX, 0);
++STORE_FUNCTION(bfq_wr_max_time_store, &bfqd->bfq_wr_max_time, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_rt_max_time_store, &bfqd->bfq_wr_rt_max_time, 0, INT_MAX,
++ 1);
++STORE_FUNCTION(bfq_wr_min_idle_time_store, &bfqd->bfq_wr_min_idle_time, 0,
++ INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_min_inter_arr_async_store,
++ &bfqd->bfq_wr_min_inter_arr_async, 0, INT_MAX, 1);
++STORE_FUNCTION(bfq_wr_max_softrt_rate_store, &bfqd->bfq_wr_max_softrt_rate, 0,
++ INT_MAX, 0);
++#undef STORE_FUNCTION
++
++/* do nothing for the moment */
++static ssize_t bfq_weights_store(struct elevator_queue *e,
++ const char *page, size_t count)
++{
++ return count;
++}
++
++static inline unsigned long bfq_estimated_max_budget(struct bfq_data *bfqd)
++{
++ u64 timeout = jiffies_to_msecs(bfqd->bfq_timeout[BLK_RW_SYNC]);
++
++ if (bfqd->peak_rate_samples >= BFQ_PEAK_RATE_SAMPLES)
++ return bfq_calc_max_budget(bfqd->peak_rate, timeout);
++ else
++ return bfq_default_max_budget;
++}
++
++static ssize_t bfq_max_budget_store(struct elevator_queue *e,
++ const char *page, size_t count)
++{
++ struct bfq_data *bfqd = e->elevator_data;
++ unsigned long uninitialized_var(__data);
++ int ret = bfq_var_store(&__data, (page), count);
++
++ if (__data == 0)
++ bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
++ else {
++ if (__data > INT_MAX)
++ __data = INT_MAX;
++ bfqd->bfq_max_budget = __data;
++ }
++
++ bfqd->bfq_user_max_budget = __data;
++
++ return ret;
++}
++
++static ssize_t bfq_timeout_sync_store(struct elevator_queue *e,
++ const char *page, size_t count)
++{
++ struct bfq_data *bfqd = e->elevator_data;
++ unsigned long uninitialized_var(__data);
++ int ret = bfq_var_store(&__data, (page), count);
++
++ if (__data < 1)
++ __data = 1;
++ else if (__data > INT_MAX)
++ __data = INT_MAX;
++
++ bfqd->bfq_timeout[BLK_RW_SYNC] = msecs_to_jiffies(__data);
++ if (bfqd->bfq_user_max_budget == 0)
++ bfqd->bfq_max_budget = bfq_estimated_max_budget(bfqd);
++
++ return ret;
++}
++
++static ssize_t bfq_low_latency_store(struct elevator_queue *e,
++ const char *page, size_t count)
++{
++ struct bfq_data *bfqd = e->elevator_data;
++ unsigned long uninitialized_var(__data);
++ int ret = bfq_var_store(&__data, (page), count);
++
++ if (__data > 1)
++ __data = 1;
++ if (__data == 0 && bfqd->low_latency != 0)
++ bfq_end_wr(bfqd);
++ bfqd->low_latency = __data;
++
++ return ret;
++}
++
++#define BFQ_ATTR(name) \
++ __ATTR(name, S_IRUGO|S_IWUSR, bfq_##name##_show, bfq_##name##_store)
++
++static struct elv_fs_entry bfq_attrs[] = {
++ BFQ_ATTR(fifo_expire_sync),
++ BFQ_ATTR(fifo_expire_async),
++ BFQ_ATTR(back_seek_max),
++ BFQ_ATTR(back_seek_penalty),
++ BFQ_ATTR(slice_idle),
++ BFQ_ATTR(max_budget),
++ BFQ_ATTR(max_budget_async_rq),
++ BFQ_ATTR(timeout_sync),
++ BFQ_ATTR(timeout_async),
++ BFQ_ATTR(low_latency),
++ BFQ_ATTR(wr_coeff),
++ BFQ_ATTR(wr_max_time),
++ BFQ_ATTR(wr_rt_max_time),
++ BFQ_ATTR(wr_min_idle_time),
++ BFQ_ATTR(wr_min_inter_arr_async),
++ BFQ_ATTR(wr_max_softrt_rate),
++ BFQ_ATTR(weights),
++ __ATTR_NULL
++};
++
++static struct elevator_type iosched_bfq = {
++ .ops = {
++ .elevator_merge_fn = bfq_merge,
++ .elevator_merged_fn = bfq_merged_request,
++ .elevator_merge_req_fn = bfq_merged_requests,
++ .elevator_allow_merge_fn = bfq_allow_merge,
++ .elevator_dispatch_fn = bfq_dispatch_requests,
++ .elevator_add_req_fn = bfq_insert_request,
++ .elevator_activate_req_fn = bfq_activate_request,
++ .elevator_deactivate_req_fn = bfq_deactivate_request,
++ .elevator_completed_req_fn = bfq_completed_request,
++ .elevator_former_req_fn = elv_rb_former_request,
++ .elevator_latter_req_fn = elv_rb_latter_request,
++ .elevator_init_icq_fn = bfq_init_icq,
++ .elevator_exit_icq_fn = bfq_exit_icq,
++ .elevator_set_req_fn = bfq_set_request,
++ .elevator_put_req_fn = bfq_put_request,
++ .elevator_may_queue_fn = bfq_may_queue,
++ .elevator_init_fn = bfq_init_queue,
++ .elevator_exit_fn = bfq_exit_queue,
++ },
++ .icq_size = sizeof(struct bfq_io_cq),
++ .icq_align = __alignof__(struct bfq_io_cq),
++ .elevator_attrs = bfq_attrs,
++ .elevator_name = "bfq",
++ .elevator_owner = THIS_MODULE,
++};
++
++static int __init bfq_init(void)
++{
++ /*
++ * Can be 0 on HZ < 1000 setups.
++ */
++ if (bfq_slice_idle == 0)
++ bfq_slice_idle = 1;
++
++ if (bfq_timeout_async == 0)
++ bfq_timeout_async = 1;
++
++ if (bfq_slab_setup())
++ return -ENOMEM;
++
++ /*
++ * Times to load large popular applications for the typical systems
++ * installed on the reference devices (see the comments before the
++ * definitions of the two arrays).
++ */
++ T_slow[0] = msecs_to_jiffies(2600);
++ T_slow[1] = msecs_to_jiffies(1000);
++ T_fast[0] = msecs_to_jiffies(5500);
++ T_fast[1] = msecs_to_jiffies(2000);
++
++ /*
++ * Thresholds that determine the switch between speed classes (see
++ * the comments before the definition of the array).
++ */
++ device_speed_thresh[0] = (R_fast[0] + R_slow[0]) / 2;
++ device_speed_thresh[1] = (R_fast[1] + R_slow[1]) / 2;
++
++ elv_register(&iosched_bfq);
++ pr_info("BFQ I/O-scheduler: v7r8\n");
++
++ return 0;
++}
++
++static void __exit bfq_exit(void)
++{
++ elv_unregister(&iosched_bfq);
++ bfq_slab_kill();
++}
++
++module_init(bfq_init);
++module_exit(bfq_exit);
++
++MODULE_AUTHOR("Fabio Checconi, Paolo Valente");
++MODULE_LICENSE("GPL");
+diff --git a/block/bfq-sched.c b/block/bfq-sched.c
+new file mode 100644
+index 0000000..c343099
+--- /dev/null
++++ b/block/bfq-sched.c
+@@ -0,0 +1,1208 @@
++/*
++ * BFQ: Hierarchical B-WF2Q+ scheduler.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ * Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ */
++
++#ifdef CONFIG_CGROUP_BFQIO
++#define for_each_entity(entity) \
++ for (; entity != NULL; entity = entity->parent)
++
++#define for_each_entity_safe(entity, parent) \
++ for (; entity && ({ parent = entity->parent; 1; }); entity = parent)
++
++static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
++ int extract,
++ struct bfq_data *bfqd);
++
++static inline void bfq_update_budget(struct bfq_entity *next_in_service)
++{
++ struct bfq_entity *bfqg_entity;
++ struct bfq_group *bfqg;
++ struct bfq_sched_data *group_sd;
++
++ BUG_ON(next_in_service == NULL);
++
++ group_sd = next_in_service->sched_data;
++
++ bfqg = container_of(group_sd, struct bfq_group, sched_data);
++ /*
++ * bfq_group's my_entity field is not NULL only if the group
++ * is not the root group. We must not touch the root entity
++ * as it must never become an in-service entity.
++ */
++ bfqg_entity = bfqg->my_entity;
++ if (bfqg_entity != NULL)
++ bfqg_entity->budget = next_in_service->budget;
++}
++
++static int bfq_update_next_in_service(struct bfq_sched_data *sd)
++{
++ struct bfq_entity *next_in_service;
++
++ if (sd->in_service_entity != NULL)
++ /* will update/requeue at the end of service */
++ return 0;
++
++ /*
++ * NOTE: this can be improved in many ways, such as returning
++ * 1 (and thus propagating upwards the update) only when the
++ * budget changes, or caching the bfqq that will be scheduled
++ * next from this subtree. For now we worry more about
++ * correctness than about performance...
++ */
++ next_in_service = bfq_lookup_next_entity(sd, 0, NULL);
++ sd->next_in_service = next_in_service;
++
++ if (next_in_service != NULL)
++ bfq_update_budget(next_in_service);
++
++ return 1;
++}
++
++static inline void bfq_check_next_in_service(struct bfq_sched_data *sd,
++ struct bfq_entity *entity)
++{
++ BUG_ON(sd->next_in_service != entity);
++}
++#else
++#define for_each_entity(entity) \
++ for (; entity != NULL; entity = NULL)
++
++#define for_each_entity_safe(entity, parent) \
++ for (parent = NULL; entity != NULL; entity = parent)
++
++static inline int bfq_update_next_in_service(struct bfq_sched_data *sd)
++{
++ return 0;
++}
++
++static inline void bfq_check_next_in_service(struct bfq_sched_data *sd,
++ struct bfq_entity *entity)
++{
++}
++
++static inline void bfq_update_budget(struct bfq_entity *next_in_service)
++{
++}
++#endif
++
++/*
++ * Shift for timestamp calculations. This actually limits the maximum
++ * service allowed in one timestamp delta (small shift values increase it),
++ * the maximum total weight that can be used for the queues in the system
++ * (big shift values increase it), and the period of virtual time
++ * wraparounds.
++ */
++#define WFQ_SERVICE_SHIFT 22
++
++/**
++ * bfq_gt - compare two timestamps.
++ * @a: first ts.
++ * @b: second ts.
++ *
++ * Return @a > @b, dealing with wrapping correctly.
++ */
++static inline int bfq_gt(u64 a, u64 b)
++{
++ return (s64)(a - b) > 0;
++}
++
++static inline struct bfq_queue *bfq_entity_to_bfqq(struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = NULL;
++
++ BUG_ON(entity == NULL);
++
++ if (entity->my_sched_data == NULL)
++ bfqq = container_of(entity, struct bfq_queue, entity);
++
++ return bfqq;
++}
++
++
++/**
++ * bfq_delta - map service into the virtual time domain.
++ * @service: amount of service.
++ * @weight: scale factor (weight of an entity or weight sum).
++ */
++static inline u64 bfq_delta(unsigned long service,
++ unsigned long weight)
++{
++ u64 d = (u64)service << WFQ_SERVICE_SHIFT;
++
++ do_div(d, weight);
++ return d;
++}
++
++/**
++ * bfq_calc_finish - assign the finish time to an entity.
++ * @entity: the entity to act upon.
++ * @service: the service to be charged to the entity.
++ */
++static inline void bfq_calc_finish(struct bfq_entity *entity,
++ unsigned long service)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++ BUG_ON(entity->weight == 0);
++
++ entity->finish = entity->start +
++ bfq_delta(service, entity->weight);
++
++ if (bfqq != NULL) {
++ bfq_log_bfqq(bfqq->bfqd, bfqq,
++ "calc_finish: serv %lu, w %d",
++ service, entity->weight);
++ bfq_log_bfqq(bfqq->bfqd, bfqq,
++ "calc_finish: start %llu, finish %llu, delta %llu",
++ entity->start, entity->finish,
++ bfq_delta(service, entity->weight));
++ }
++}
++
++/**
++ * bfq_entity_of - get an entity from a node.
++ * @node: the node field of the entity.
++ *
++ * Convert a node pointer to the corresponding entity. This is used only
++ * to simplify the logic of some functions and not as the generic
++ * conversion mechanism because, e.g., in the tree walking functions,
++ * the check for a %NULL value would be redundant.
++ */
++static inline struct bfq_entity *bfq_entity_of(struct rb_node *node)
++{
++ struct bfq_entity *entity = NULL;
++
++ if (node != NULL)
++ entity = rb_entry(node, struct bfq_entity, rb_node);
++
++ return entity;
++}
++
++/**
++ * bfq_extract - remove an entity from a tree.
++ * @root: the tree root.
++ * @entity: the entity to remove.
++ */
++static inline void bfq_extract(struct rb_root *root,
++ struct bfq_entity *entity)
++{
++ BUG_ON(entity->tree != root);
++
++ entity->tree = NULL;
++ rb_erase(&entity->rb_node, root);
++}
++
++/**
++ * bfq_idle_extract - extract an entity from the idle tree.
++ * @st: the service tree of the owning @entity.
++ * @entity: the entity being removed.
++ */
++static void bfq_idle_extract(struct bfq_service_tree *st,
++ struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ struct rb_node *next;
++
++ BUG_ON(entity->tree != &st->idle);
++
++ if (entity == st->first_idle) {
++ next = rb_next(&entity->rb_node);
++ st->first_idle = bfq_entity_of(next);
++ }
++
++ if (entity == st->last_idle) {
++ next = rb_prev(&entity->rb_node);
++ st->last_idle = bfq_entity_of(next);
++ }
++
++ bfq_extract(&st->idle, entity);
++
++ if (bfqq != NULL)
++ list_del(&bfqq->bfqq_list);
++}
++
++/**
++ * bfq_insert - generic tree insertion.
++ * @root: tree root.
++ * @entity: entity to insert.
++ *
++ * This is used for the idle and the active tree, since they are both
++ * ordered by finish time.
++ */
++static void bfq_insert(struct rb_root *root, struct bfq_entity *entity)
++{
++ struct bfq_entity *entry;
++ struct rb_node **node = &root->rb_node;
++ struct rb_node *parent = NULL;
++
++ BUG_ON(entity->tree != NULL);
++
++ while (*node != NULL) {
++ parent = *node;
++ entry = rb_entry(parent, struct bfq_entity, rb_node);
++
++ if (bfq_gt(entry->finish, entity->finish))
++ node = &parent->rb_left;
++ else
++ node = &parent->rb_right;
++ }
++
++ rb_link_node(&entity->rb_node, parent, node);
++ rb_insert_color(&entity->rb_node, root);
++
++ entity->tree = root;
++}
++
++/**
++ * bfq_update_min - update the min_start field of an entity.
++ * @entity: the entity to update.
++ * @node: one of its children.
++ *
++ * This function is called when @entity may store an invalid value for
++ * min_start due to updates to the active tree. The function assumes
++ * that the subtree rooted at @node (which may be its left or its right
++ * child) has a valid min_start value.
++ */
++static inline void bfq_update_min(struct bfq_entity *entity,
++ struct rb_node *node)
++{
++ struct bfq_entity *child;
++
++ if (node != NULL) {
++ child = rb_entry(node, struct bfq_entity, rb_node);
++ if (bfq_gt(entity->min_start, child->min_start))
++ entity->min_start = child->min_start;
++ }
++}
++
++/**
++ * bfq_update_active_node - recalculate min_start.
++ * @node: the node to update.
++ *
++ * @node may have changed position or one of its children may have moved;
++ * this function updates its min_start value. The left and right subtrees
++ * are assumed to hold a correct min_start value.
++ */
++static inline void bfq_update_active_node(struct rb_node *node)
++{
++ struct bfq_entity *entity = rb_entry(node, struct bfq_entity, rb_node);
++
++ entity->min_start = entity->start;
++ bfq_update_min(entity, node->rb_right);
++ bfq_update_min(entity, node->rb_left);
++}
++
++/**
++ * bfq_update_active_tree - update min_start for the whole active tree.
++ * @node: the starting node.
++ *
++ * @node must be the deepest modified node after an update. This function
++ * updates its min_start using the values held by its children, assuming
++ * that they did not change, and then updates all the nodes that may have
++ * changed in the path to the root. The only nodes that may have changed
++ * are the ones in the path or their siblings.
++ */
++static void bfq_update_active_tree(struct rb_node *node)
++{
++ struct rb_node *parent;
++
++up:
++ bfq_update_active_node(node);
++
++ parent = rb_parent(node);
++ if (parent == NULL)
++ return;
++
++ if (node == parent->rb_left && parent->rb_right != NULL)
++ bfq_update_active_node(parent->rb_right);
++ else if (parent->rb_left != NULL)
++ bfq_update_active_node(parent->rb_left);
++
++ node = parent;
++ goto up;
++}
++
++static void bfq_weights_tree_add(struct bfq_data *bfqd,
++ struct bfq_entity *entity,
++ struct rb_root *root);
++
++static void bfq_weights_tree_remove(struct bfq_data *bfqd,
++ struct bfq_entity *entity,
++ struct rb_root *root);
++
++
++/**
++ * bfq_active_insert - insert an entity in the active tree of its
++ * group/device.
++ * @st: the service tree of the entity.
++ * @entity: the entity being inserted.
++ *
++ * The active tree is ordered by finish time, but an extra key is kept
++ * for each node, containing the minimum value for the start times of
++ * its children (and the node itself), so it's possible to search for
++ * the eligible node with the lowest finish time in logarithmic time.
++ */
++static void bfq_active_insert(struct bfq_service_tree *st,
++ struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ struct rb_node *node = &entity->rb_node;
++#ifdef CONFIG_CGROUP_BFQIO
++ struct bfq_sched_data *sd = NULL;
++ struct bfq_group *bfqg = NULL;
++ struct bfq_data *bfqd = NULL;
++#endif
++
++ bfq_insert(&st->active, entity);
++
++ if (node->rb_left != NULL)
++ node = node->rb_left;
++ else if (node->rb_right != NULL)
++ node = node->rb_right;
++
++ bfq_update_active_tree(node);
++
++#ifdef CONFIG_CGROUP_BFQIO
++ sd = entity->sched_data;
++ bfqg = container_of(sd, struct bfq_group, sched_data);
++ BUG_ON(!bfqg);
++ bfqd = (struct bfq_data *)bfqg->bfqd;
++#endif
++ if (bfqq != NULL)
++ list_add(&bfqq->bfqq_list, &bfqq->bfqd->active_list);
++#ifdef CONFIG_CGROUP_BFQIO
++ else { /* bfq_group */
++ BUG_ON(!bfqd);
++ bfq_weights_tree_add(bfqd, entity, &bfqd->group_weights_tree);
++ }
++ if (bfqg != bfqd->root_group) {
++ BUG_ON(!bfqg);
++ BUG_ON(!bfqd);
++ bfqg->active_entities++;
++ if (bfqg->active_entities == 2)
++ bfqd->active_numerous_groups++;
++ }
++#endif
++}
++
++/**
++ * bfq_ioprio_to_weight - calc a weight from an ioprio.
++ * @ioprio: the ioprio value to convert.
++ */
++static inline unsigned short bfq_ioprio_to_weight(int ioprio)
++{
++ BUG_ON(ioprio < 0 || ioprio >= IOPRIO_BE_NR);
++ return IOPRIO_BE_NR - ioprio;
++}
++
++/**
++ * bfq_weight_to_ioprio - calc an ioprio from a weight.
++ * @weight: the weight value to convert.
++ *
++ * To preserve as much as possible the old only-ioprio user interface,
++ * 0 is used as an escape ioprio value for weights (numerically) equal to or
++ * larger than IOPRIO_BE_NR.
++ */
++static inline unsigned short bfq_weight_to_ioprio(int weight)
++{
++ BUG_ON(weight < BFQ_MIN_WEIGHT || weight > BFQ_MAX_WEIGHT);
++ return IOPRIO_BE_NR - weight < 0 ? 0 : IOPRIO_BE_NR - weight;
++}
++
++static inline void bfq_get_entity(struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++
++ if (bfqq != NULL) {
++ atomic_inc(&bfqq->ref);
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "get_entity: %p %d",
++ bfqq, atomic_read(&bfqq->ref));
++ }
++}
++
++/**
++ * bfq_find_deepest - find the deepest node that an extraction can modify.
++ * @node: the node being removed.
++ *
++ * Do the first step of an extraction in an rb tree, looking for the
++ * node that will replace @node, and returning the deepest node that
++ * the following modifications to the tree can touch. If @node is the
++ * last node in the tree return %NULL.
++ */
++static struct rb_node *bfq_find_deepest(struct rb_node *node)
++{
++ struct rb_node *deepest;
++
++ if (node->rb_right == NULL && node->rb_left == NULL)
++ deepest = rb_parent(node);
++ else if (node->rb_right == NULL)
++ deepest = node->rb_left;
++ else if (node->rb_left == NULL)
++ deepest = node->rb_right;
++ else {
++ deepest = rb_next(node);
++ if (deepest->rb_right != NULL)
++ deepest = deepest->rb_right;
++ else if (rb_parent(deepest) != node)
++ deepest = rb_parent(deepest);
++ }
++
++ return deepest;
++}
++
++/**
++ * bfq_active_extract - remove an entity from the active tree.
++ * @st: the service_tree containing the tree.
++ * @entity: the entity being removed.
++ */
++static void bfq_active_extract(struct bfq_service_tree *st,
++ struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ struct rb_node *node;
++#ifdef CONFIG_CGROUP_BFQIO
++ struct bfq_sched_data *sd = NULL;
++ struct bfq_group *bfqg = NULL;
++ struct bfq_data *bfqd = NULL;
++#endif
++
++ node = bfq_find_deepest(&entity->rb_node);
++ bfq_extract(&st->active, entity);
++
++ if (node != NULL)
++ bfq_update_active_tree(node);
++
++#ifdef CONFIG_CGROUP_BFQIO
++ sd = entity->sched_data;
++ bfqg = container_of(sd, struct bfq_group, sched_data);
++ BUG_ON(!bfqg);
++ bfqd = (struct bfq_data *)bfqg->bfqd;
++#endif
++ if (bfqq != NULL)
++ list_del(&bfqq->bfqq_list);
++#ifdef CONFIG_CGROUP_BFQIO
++ else { /* bfq_group */
++ BUG_ON(!bfqd);
++ bfq_weights_tree_remove(bfqd, entity,
++ &bfqd->group_weights_tree);
++ }
++ if (bfqg != bfqd->root_group) {
++ BUG_ON(!bfqg);
++ BUG_ON(!bfqd);
++ BUG_ON(!bfqg->active_entities);
++ bfqg->active_entities--;
++ if (bfqg->active_entities == 1) {
++ BUG_ON(!bfqd->active_numerous_groups);
++ bfqd->active_numerous_groups--;
++ }
++ }
++#endif
++}
++
++/**
++ * bfq_idle_insert - insert an entity into the idle tree.
++ * @st: the service tree containing the tree.
++ * @entity: the entity to insert.
++ */
++static void bfq_idle_insert(struct bfq_service_tree *st,
++ struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ struct bfq_entity *first_idle = st->first_idle;
++ struct bfq_entity *last_idle = st->last_idle;
++
++ if (first_idle == NULL || bfq_gt(first_idle->finish, entity->finish))
++ st->first_idle = entity;
++ if (last_idle == NULL || bfq_gt(entity->finish, last_idle->finish))
++ st->last_idle = entity;
++
++ bfq_insert(&st->idle, entity);
++
++ if (bfqq != NULL)
++ list_add(&bfqq->bfqq_list, &bfqq->bfqd->idle_list);
++}
++
++/**
++ * bfq_forget_entity - remove an entity from the wfq trees.
++ * @st: the service tree.
++ * @entity: the entity being removed.
++ *
++ * Update the device status and forget everything about @entity, putting
++ * the device reference to it, if it is a queue. Entities belonging to
++ * groups are not refcounted.
++ */
++static void bfq_forget_entity(struct bfq_service_tree *st,
++ struct bfq_entity *entity)
++{
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ struct bfq_sched_data *sd;
++
++ BUG_ON(!entity->on_st);
++
++ entity->on_st = 0;
++ st->wsum -= entity->weight;
++ if (bfqq != NULL) {
++ sd = entity->sched_data;
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "forget_entity: %p %d",
++ bfqq, atomic_read(&bfqq->ref));
++ bfq_put_queue(bfqq);
++ }
++}
++
++/**
++ * bfq_put_idle_entity - release the idle tree ref of an entity.
++ * @st: service tree for the entity.
++ * @entity: the entity being released.
++ */
++static void bfq_put_idle_entity(struct bfq_service_tree *st,
++ struct bfq_entity *entity)
++{
++ bfq_idle_extract(st, entity);
++ bfq_forget_entity(st, entity);
++}
++
++/**
++ * bfq_forget_idle - update the idle tree if necessary.
++ * @st: the service tree to act upon.
++ *
++ * To preserve the global O(log N) complexity we only remove one entry here;
++ * as the idle tree will not grow indefinitely this can be done safely.
++ */
++static void bfq_forget_idle(struct bfq_service_tree *st)
++{
++ struct bfq_entity *first_idle = st->first_idle;
++ struct bfq_entity *last_idle = st->last_idle;
++
++ if (RB_EMPTY_ROOT(&st->active) && last_idle != NULL &&
++ !bfq_gt(last_idle->finish, st->vtime)) {
++ /*
++ * Forget the whole idle tree, increasing the vtime past
++ * the last finish time of idle entities.
++ */
++ st->vtime = last_idle->finish;
++ }
++
++ if (first_idle != NULL && !bfq_gt(first_idle->finish, st->vtime))
++ bfq_put_idle_entity(st, first_idle);
++}
++
++static struct bfq_service_tree *
++__bfq_entity_update_weight_prio(struct bfq_service_tree *old_st,
++ struct bfq_entity *entity)
++{
++ struct bfq_service_tree *new_st = old_st;
++
++ if (entity->ioprio_changed) {
++ struct bfq_queue *bfqq = bfq_entity_to_bfqq(entity);
++ unsigned short prev_weight, new_weight;
++ struct bfq_data *bfqd = NULL;
++ struct rb_root *root;
++#ifdef CONFIG_CGROUP_BFQIO
++ struct bfq_sched_data *sd;
++ struct bfq_group *bfqg;
++#endif
++
++ if (bfqq != NULL)
++ bfqd = bfqq->bfqd;
++#ifdef CONFIG_CGROUP_BFQIO
++ else {
++ sd = entity->my_sched_data;
++ bfqg = container_of(sd, struct bfq_group, sched_data);
++ BUG_ON(!bfqg);
++ bfqd = (struct bfq_data *)bfqg->bfqd;
++ BUG_ON(!bfqd);
++ }
++#endif
++
++ BUG_ON(old_st->wsum < entity->weight);
++ old_st->wsum -= entity->weight;
++
++ if (entity->new_weight != entity->orig_weight) {
++ if (entity->new_weight < BFQ_MIN_WEIGHT ||
++ entity->new_weight > BFQ_MAX_WEIGHT) {
++ printk(KERN_CRIT "update_weight_prio: "
++ "new_weight %d\n",
++ entity->new_weight);
++ BUG();
++ }
++ entity->orig_weight = entity->new_weight;
++ entity->ioprio =
++ bfq_weight_to_ioprio(entity->orig_weight);
++ }
++
++ entity->ioprio_class = entity->new_ioprio_class;
++ entity->ioprio_changed = 0;
++
++ /*
++ * NOTE: here we may be changing the weight too early,
++ * this will cause unfairness. The correct approach
++ * would have required additional complexity to defer
++ * weight changes to the proper time instants (i.e.,
++ * when entity->finish <= old_st->vtime).
++ */
++ new_st = bfq_entity_service_tree(entity);
++
++ prev_weight = entity->weight;
++ new_weight = entity->orig_weight *
++ (bfqq != NULL ? bfqq->wr_coeff : 1);
++ /*
++ * If the weight of the entity changes, remove the entity
++ * from its old weight counter (if there is a counter
++ * associated with the entity), and add it to the counter
++ * associated with its new weight.
++ */
++ if (prev_weight != new_weight) {
++ root = bfqq ? &bfqd->queue_weights_tree :
++ &bfqd->group_weights_tree;
++ bfq_weights_tree_remove(bfqd, entity, root);
++ }
++ entity->weight = new_weight;
++ /*
++ * Add the entity to its weights tree only if it is
++ * not associated with a weight-raised queue.
++ */
++ if (prev_weight != new_weight &&
++ (bfqq ? bfqq->wr_coeff == 1 : 1))
++ /* If we get here, root has been initialized. */
++ bfq_weights_tree_add(bfqd, entity, root);
++
++ new_st->wsum += entity->weight;
++
++ if (new_st != old_st)
++ entity->start = new_st->vtime;
++ }
++
++ return new_st;
++}
++
++/**
++ * bfq_bfqq_served - update the scheduler status after selection for
++ * service.
++ * @bfqq: the queue being served.
++ * @served: bytes to transfer.
++ *
++ * NOTE: this can be optimized, as the timestamps of upper level entities
++ * are synchronized every time a new bfqq is selected for service. For now,
++ * we keep it to better check consistency.
++ */
++static void bfq_bfqq_served(struct bfq_queue *bfqq, unsigned long served)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++ struct bfq_service_tree *st;
++
++ for_each_entity(entity) {
++ st = bfq_entity_service_tree(entity);
++
++ entity->service += served;
++ BUG_ON(entity->service > entity->budget);
++ BUG_ON(st->wsum == 0);
++
++ st->vtime += bfq_delta(served, st->wsum);
++ bfq_forget_idle(st);
++ }
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "bfqq_served %lu secs", served);
++}
++
++/**
++ * bfq_bfqq_charge_full_budget - set the service to the entity budget.
++ * @bfqq: the queue that needs a service update.
++ *
++ * When it's not possible to be fair in the service domain, because
++ * a queue is not consuming its budget fast enough (the meaning of
++ * fast depends on the timeout parameter), we charge it a full
++ * budget. In this way we should obtain a sort of time-domain
++ * fairness among all the seeky/slow queues.
++ */
++static inline void bfq_bfqq_charge_full_budget(struct bfq_queue *bfqq)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "charge_full_budget");
++
++ bfq_bfqq_served(bfqq, entity->budget - entity->service);
++}
++
++/**
++ * __bfq_activate_entity - activate an entity.
++ * @entity: the entity being activated.
++ *
++ * Called whenever an entity is activated, i.e., it is not active and one
++ * of its children receives a new request, or has to be reactivated due to
++ * budget exhaustion. It uses the current budget of the entity (and the
++ * service received if @entity is active) of the queue to calculate its
++ * timestamps.
++ */
++static void __bfq_activate_entity(struct bfq_entity *entity)
++{
++ struct bfq_sched_data *sd = entity->sched_data;
++ struct bfq_service_tree *st = bfq_entity_service_tree(entity);
++
++ if (entity == sd->in_service_entity) {
++ BUG_ON(entity->tree != NULL);
++ /*
++ * If we are requeueing the current entity we must
++ * take care not to charge it for service it has
++ * not received.
++ */
++ bfq_calc_finish(entity, entity->service);
++ entity->start = entity->finish;
++ sd->in_service_entity = NULL;
++ } else if (entity->tree == &st->active) {
++ /*
++ * Requeueing an entity due to a change of some
++ * next_in_service entity below it. We reuse the
++ * old start time.
++ */
++ bfq_active_extract(st, entity);
++ } else if (entity->tree == &st->idle) {
++ /*
++ * Must be on the idle tree, bfq_idle_extract() will
++ * check for that.
++ */
++ bfq_idle_extract(st, entity);
++ entity->start = bfq_gt(st->vtime, entity->finish) ?
++ st->vtime : entity->finish;
++ } else {
++ /*
++ * The finish time of the entity may be invalid, and
++ * it is in the past for sure, otherwise the queue
++ * would have been on the idle tree.
++ */
++ entity->start = st->vtime;
++ st->wsum += entity->weight;
++ bfq_get_entity(entity);
++
++ BUG_ON(entity->on_st);
++ entity->on_st = 1;
++ }
++
++ st = __bfq_entity_update_weight_prio(st, entity);
++ bfq_calc_finish(entity, entity->budget);
++ bfq_active_insert(st, entity);
++}
++
++/**
++ * bfq_activate_entity - activate an entity and its ancestors if necessary.
++ * @entity: the entity to activate.
++ *
++ * Activate @entity and all the entities on the path from it to the root.
++ */
++static void bfq_activate_entity(struct bfq_entity *entity)
++{
++ struct bfq_sched_data *sd;
++
++ for_each_entity(entity) {
++ __bfq_activate_entity(entity);
++
++ sd = entity->sched_data;
++ if (!bfq_update_next_in_service(sd))
++ /*
++ * No need to propagate the activation to the
++ * upper entities, as they will be updated when
++ * the in-service entity is rescheduled.
++ */
++ break;
++ }
++}
++
++/**
++ * __bfq_deactivate_entity - deactivate an entity from its service tree.
++ * @entity: the entity to deactivate.
++ * @requeue: if false, the entity will not be put into the idle tree.
++ *
++ * Deactivate an entity, independently from its previous state. If the
++ * entity was not on a service tree just return, otherwise if it is on
++ * any scheduler tree, extract it from that tree, and if necessary
++ * and if the caller did specify @requeue, put it on the idle tree.
++ *
++ * Return %1 if the caller should update the entity hierarchy, i.e.,
++ * if the entity was in service or if it was the next_in_service for
++ * its sched_data; return %0 otherwise.
++ */
++static int __bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
++{
++ struct bfq_sched_data *sd = entity->sched_data;
++ struct bfq_service_tree *st = bfq_entity_service_tree(entity);
++ int was_in_service = entity == sd->in_service_entity;
++ int ret = 0;
++
++ if (!entity->on_st)
++ return 0;
++
++ BUG_ON(was_in_service && entity->tree != NULL);
++
++ if (was_in_service) {
++ bfq_calc_finish(entity, entity->service);
++ sd->in_service_entity = NULL;
++ } else if (entity->tree == &st->active)
++ bfq_active_extract(st, entity);
++ else if (entity->tree == &st->idle)
++ bfq_idle_extract(st, entity);
++ else if (entity->tree != NULL)
++ BUG();
++
++ if (was_in_service || sd->next_in_service == entity)
++ ret = bfq_update_next_in_service(sd);
++
++ if (!requeue || !bfq_gt(entity->finish, st->vtime))
++ bfq_forget_entity(st, entity);
++ else
++ bfq_idle_insert(st, entity);
++
++ BUG_ON(sd->in_service_entity == entity);
++ BUG_ON(sd->next_in_service == entity);
++
++ return ret;
++}
++
++/**
++ * bfq_deactivate_entity - deactivate an entity.
++ * @entity: the entity to deactivate.
++ * @requeue: true if the entity can be put on the idle tree
++ */
++static void bfq_deactivate_entity(struct bfq_entity *entity, int requeue)
++{
++ struct bfq_sched_data *sd;
++ struct bfq_entity *parent;
++
++ for_each_entity_safe(entity, parent) {
++ sd = entity->sched_data;
++
++ if (!__bfq_deactivate_entity(entity, requeue))
++ /*
++ * The parent entity is still backlogged, and
++ * we don't need to update it as it is still
++ * in service.
++ */
++ break;
++
++ if (sd->next_in_service != NULL)
++ /*
++ * The parent entity is still backlogged and
++ * the budgets on the path towards the root
++ * need to be updated.
++ */
++ goto update;
++
++ /*
++ * If we get here, the parent is no longer backlogged and
++ * we want to propagate the dequeue upwards.
++ */
++ requeue = 1;
++ }
++
++ return;
++
++update:
++ entity = parent;
++ for_each_entity(entity) {
++ __bfq_activate_entity(entity);
++
++ sd = entity->sched_data;
++ if (!bfq_update_next_in_service(sd))
++ break;
++ }
++}
++
++/**
++ * bfq_update_vtime - update vtime if necessary.
++ * @st: the service tree to act upon.
++ *
++ * If necessary update the service tree vtime to have at least one
++ * eligible entity, skipping to its start time. Assumes that the
++ * active tree of the device is not empty.
++ *
++ * NOTE: this hierarchical implementation updates vtimes quite often,
++ * we may end up with reactivated processes getting timestamps after a
++ * vtime skip done because we needed a ->first_active entity on some
++ * intermediate node.
++ */
++static void bfq_update_vtime(struct bfq_service_tree *st)
++{
++ struct bfq_entity *entry;
++ struct rb_node *node = st->active.rb_node;
++
++ entry = rb_entry(node, struct bfq_entity, rb_node);
++ if (bfq_gt(entry->min_start, st->vtime)) {
++ st->vtime = entry->min_start;
++ bfq_forget_idle(st);
++ }
++}
++
++/**
++ * bfq_first_active_entity - find the eligible entity with
++ * the smallest finish time
++ * @st: the service tree to select from.
++ *
++ * This function searches for the first schedulable entity, starting from the
++ * root of the tree and going on the left every time on this side there is
++ * a subtree with at least one eligible (start <= vtime) entity. The path on
++ * the right is followed only if a) the left subtree contains no eligible
++ * entities and b) no eligible entity has been found yet.
++ */
++static struct bfq_entity *bfq_first_active_entity(struct bfq_service_tree *st)
++{
++ struct bfq_entity *entry, *first = NULL;
++ struct rb_node *node = st->active.rb_node;
++
++ while (node != NULL) {
++ entry = rb_entry(node, struct bfq_entity, rb_node);
++left:
++ if (!bfq_gt(entry->start, st->vtime))
++ first = entry;
++
++ BUG_ON(bfq_gt(entry->min_start, st->vtime));
++
++ if (node->rb_left != NULL) {
++ entry = rb_entry(node->rb_left,
++ struct bfq_entity, rb_node);
++ if (!bfq_gt(entry->min_start, st->vtime)) {
++ node = node->rb_left;
++ goto left;
++ }
++ }
++ if (first != NULL)
++ break;
++ node = node->rb_right;
++ }
++
++ BUG_ON(first == NULL && !RB_EMPTY_ROOT(&st->active));
++ return first;
++}
++
++/**
++ * __bfq_lookup_next_entity - return the first eligible entity in @st.
++ * @st: the service tree.
++ *
++ * Update the virtual time in @st and return the first eligible entity
++ * it contains.
++ */
++static struct bfq_entity *__bfq_lookup_next_entity(struct bfq_service_tree *st,
++ bool force)
++{
++ struct bfq_entity *entity, *new_next_in_service = NULL;
++
++ if (RB_EMPTY_ROOT(&st->active))
++ return NULL;
++
++ bfq_update_vtime(st);
++ entity = bfq_first_active_entity(st);
++ BUG_ON(bfq_gt(entity->start, st->vtime));
++
++ /*
++ * If the chosen entity does not match with the sched_data's
++ * next_in_service and we are forcibly serving the IDLE priority
++ * class tree, bubble up budget update.
++ */
++ if (unlikely(force && entity != entity->sched_data->next_in_service)) {
++ new_next_in_service = entity;
++ for_each_entity(new_next_in_service)
++ bfq_update_budget(new_next_in_service);
++ }
++
++ return entity;
++}
++
++/**
++ * bfq_lookup_next_entity - return the first eligible entity in @sd.
++ * @sd: the sched_data.
++ * @extract: if true the returned entity will be also extracted from @sd.
++ *
++ * NOTE: since we cache the next_in_service entity at each level of the
++ * hierarchy, the complexity of the lookup can be decreased with
++ * absolutely no effort by just returning the cached next_in_service value;
++ * we prefer to do full lookups to test the consistency of the data
++ * structures.
++ */
++static struct bfq_entity *bfq_lookup_next_entity(struct bfq_sched_data *sd,
++ int extract,
++ struct bfq_data *bfqd)
++{
++ struct bfq_service_tree *st = sd->service_tree;
++ struct bfq_entity *entity;
++ int i = 0;
++
++ BUG_ON(sd->in_service_entity != NULL);
++
++ if (bfqd != NULL &&
++ jiffies - bfqd->bfq_class_idle_last_service > BFQ_CL_IDLE_TIMEOUT) {
++ entity = __bfq_lookup_next_entity(st + BFQ_IOPRIO_CLASSES - 1,
++ true);
++ if (entity != NULL) {
++ i = BFQ_IOPRIO_CLASSES - 1;
++ bfqd->bfq_class_idle_last_service = jiffies;
++ sd->next_in_service = entity;
++ }
++ }
++ for (; i < BFQ_IOPRIO_CLASSES; i++) {
++ entity = __bfq_lookup_next_entity(st + i, false);
++ if (entity != NULL) {
++ if (extract) {
++ bfq_check_next_in_service(sd, entity);
++ bfq_active_extract(st + i, entity);
++ sd->in_service_entity = entity;
++ sd->next_in_service = NULL;
++ }
++ break;
++ }
++ }
++
++ return entity;
++}
++
++/*
++ * Get next queue for service.
++ */
++static struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
++{
++ struct bfq_entity *entity = NULL;
++ struct bfq_sched_data *sd;
++ struct bfq_queue *bfqq;
++
++ BUG_ON(bfqd->in_service_queue != NULL);
++
++ if (bfqd->busy_queues == 0)
++ return NULL;
++
++ sd = &bfqd->root_group->sched_data;
++ for (; sd != NULL; sd = entity->my_sched_data) {
++ entity = bfq_lookup_next_entity(sd, 1, bfqd);
++ BUG_ON(entity == NULL);
++ entity->service = 0;
++ }
++
++ bfqq = bfq_entity_to_bfqq(entity);
++ BUG_ON(bfqq == NULL);
++
++ return bfqq;
++}
++
++/*
++ * Forced extraction of the given queue.
++ */
++static void bfq_get_next_queue_forced(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ struct bfq_entity *entity;
++ struct bfq_sched_data *sd;
++
++ BUG_ON(bfqd->in_service_queue != NULL);
++
++ entity = &bfqq->entity;
++ /*
++ * Bubble up extraction/update from the leaf to the root.
++ */
++ for_each_entity(entity) {
++ sd = entity->sched_data;
++ bfq_update_budget(entity);
++ bfq_update_vtime(bfq_entity_service_tree(entity));
++ bfq_active_extract(bfq_entity_service_tree(entity), entity);
++ sd->in_service_entity = entity;
++ sd->next_in_service = NULL;
++ entity->service = 0;
++ }
++
++ return;
++}
++
++static void __bfq_bfqd_reset_in_service(struct bfq_data *bfqd)
++{
++ if (bfqd->in_service_bic != NULL) {
++ put_io_context(bfqd->in_service_bic->icq.ioc);
++ bfqd->in_service_bic = NULL;
++ }
++
++ bfqd->in_service_queue = NULL;
++ del_timer(&bfqd->idle_slice_timer);
++}
++
++static void bfq_deactivate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ int requeue)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++
++ if (bfqq == bfqd->in_service_queue)
++ __bfq_bfqd_reset_in_service(bfqd);
++
++ bfq_deactivate_entity(entity, requeue);
++}
++
++static void bfq_activate_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ struct bfq_entity *entity = &bfqq->entity;
++
++ bfq_activate_entity(entity);
++}
++
++/*
++ * Called when the bfqq no longer has requests pending, remove it from
++ * the service tree.
++ */
++static void bfq_del_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ int requeue)
++{
++ BUG_ON(!bfq_bfqq_busy(bfqq));
++ BUG_ON(!RB_EMPTY_ROOT(&bfqq->sort_list));
++
++ bfq_log_bfqq(bfqd, bfqq, "del from busy");
++
++ bfq_clear_bfqq_busy(bfqq);
++
++ BUG_ON(bfqd->busy_queues == 0);
++ bfqd->busy_queues--;
++
++ if (!bfqq->dispatched) {
++ bfq_weights_tree_remove(bfqd, &bfqq->entity,
++ &bfqd->queue_weights_tree);
++ if (!blk_queue_nonrot(bfqd->queue)) {
++ BUG_ON(!bfqd->busy_in_flight_queues);
++ bfqd->busy_in_flight_queues--;
++ if (bfq_bfqq_constantly_seeky(bfqq)) {
++ BUG_ON(!bfqd->
++ const_seeky_busy_in_flight_queues);
++ bfqd->const_seeky_busy_in_flight_queues--;
++ }
++ }
++ }
++ if (bfqq->wr_coeff > 1)
++ bfqd->wr_busy_queues--;
++
++ bfq_deactivate_bfqq(bfqd, bfqq, requeue);
++}
++
++/*
++ * Called when an inactive queue receives a new request.
++ */
++static void bfq_add_bfqq_busy(struct bfq_data *bfqd, struct bfq_queue *bfqq)
++{
++ BUG_ON(bfq_bfqq_busy(bfqq));
++ BUG_ON(bfqq == bfqd->in_service_queue);
++
++ bfq_log_bfqq(bfqd, bfqq, "add to busy");
++
++ bfq_activate_bfqq(bfqd, bfqq);
++
++ bfq_mark_bfqq_busy(bfqq);
++ bfqd->busy_queues++;
++
++ if (!bfqq->dispatched) {
++ if (bfqq->wr_coeff == 1)
++ bfq_weights_tree_add(bfqd, &bfqq->entity,
++ &bfqd->queue_weights_tree);
++ if (!blk_queue_nonrot(bfqd->queue)) {
++ bfqd->busy_in_flight_queues++;
++ if (bfq_bfqq_constantly_seeky(bfqq))
++ bfqd->const_seeky_busy_in_flight_queues++;
++ }
++ }
++ if (bfqq->wr_coeff > 1)
++ bfqd->wr_busy_queues++;
++}
+diff --git a/block/bfq.h b/block/bfq.h
+new file mode 100644
+index 0000000..e350b5f
+--- /dev/null
++++ b/block/bfq.h
+@@ -0,0 +1,771 @@
++/*
++ * BFQ-v7r8 for 4.2.0: data structures and common functions prototypes.
++ *
++ * Based on ideas and code from CFQ:
++ * Copyright (C) 2003 Jens Axboe <axboe@kernel.dk>
++ *
++ * Copyright (C) 2008 Fabio Checconi <fabio@gandalf.sssup.it>
++ * Paolo Valente <paolo.valente@unimore.it>
++ *
++ * Copyright (C) 2010 Paolo Valente <paolo.valente@unimore.it>
++ */
++
++#ifndef _BFQ_H
++#define _BFQ_H
++
++#include <linux/blktrace_api.h>
++#include <linux/hrtimer.h>
++#include <linux/ioprio.h>
++#include <linux/rbtree.h>
++
++#define BFQ_IOPRIO_CLASSES 3
++#define BFQ_CL_IDLE_TIMEOUT (HZ/5)
++
++#define BFQ_MIN_WEIGHT 1
++#define BFQ_MAX_WEIGHT 1000
++
++#define BFQ_DEFAULT_QUEUE_IOPRIO 4
++
++#define BFQ_DEFAULT_GRP_WEIGHT 10
++#define BFQ_DEFAULT_GRP_IOPRIO 0
++#define BFQ_DEFAULT_GRP_CLASS IOPRIO_CLASS_BE
++
++struct bfq_entity;
++
++/**
++ * struct bfq_service_tree - per ioprio_class service tree.
++ * @active: tree for active entities (i.e., those backlogged).
++ * @idle: tree for idle entities (i.e., those not backlogged, with V <= F_i).
++ * @first_idle: idle entity with minimum F_i.
++ * @last_idle: idle entity with maximum F_i.
++ * @vtime: scheduler virtual time.
++ * @wsum: scheduler weight sum; active and idle entities contribute to it.
++ *
++ * Each service tree represents a B-WF2Q+ scheduler on its own. Each
++ * ioprio_class has its own independent scheduler, and so its own
++ * bfq_service_tree. All the fields are protected by the queue lock
++ * of the containing bfqd.
++ */
++struct bfq_service_tree {
++ struct rb_root active;
++ struct rb_root idle;
++
++ struct bfq_entity *first_idle;
++ struct bfq_entity *last_idle;
++
++ u64 vtime;
++ unsigned long wsum;
++};
++
++/**
++ * struct bfq_sched_data - multi-class scheduler.
++ * @in_service_entity: entity in service.
++ * @next_in_service: head-of-the-line entity in the scheduler.
++ * @service_tree: array of service trees, one per ioprio_class.
++ *
++ * bfq_sched_data is the basic scheduler queue. It supports three
++ * ioprio_classes, and can be used either as a toplevel queue or as
++ * an intermediate queue on a hierarchical setup.
++ * @next_in_service points to the active entity of the sched_data
++ * service trees that will be scheduled next.
++ *
++ * The supported ioprio_classes are the same as in CFQ, in descending
++ * priority order, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE.
++ * Requests from higher priority queues are served before all the
++ * requests from lower priority queues; among requests of the same
++ * queue requests are served according to B-WF2Q+.
++ * All the fields are protected by the queue lock of the containing bfqd.
++ */
++struct bfq_sched_data {
++ struct bfq_entity *in_service_entity;
++ struct bfq_entity *next_in_service;
++ struct bfq_service_tree service_tree[BFQ_IOPRIO_CLASSES];
++};
++
++/**
++ * struct bfq_weight_counter - counter of the number of all active entities
++ * with a given weight.
++ * @weight: weight of the entities that this counter refers to.
++ * @num_active: number of active entities with this weight.
++ * @weights_node: weights tree member (see bfq_data's @queue_weights_tree
++ * and @group_weights_tree).
++ */
++struct bfq_weight_counter {
++ short int weight;
++ unsigned int num_active;
++ struct rb_node weights_node;
++};
++
++/**
++ * struct bfq_entity - schedulable entity.
++ * @rb_node: service_tree member.
++ * @weight_counter: pointer to the weight counter associated with this entity.
++ * @on_st: flag, true if the entity is on a tree (either the active or
++ * the idle one of its service_tree).
++ * @finish: B-WF2Q+ finish timestamp (aka F_i).
++ * @start: B-WF2Q+ start timestamp (aka S_i).
++ * @tree: tree the entity is enqueued into; %NULL if not on a tree.
++ * @min_start: minimum start time of the (active) subtree rooted at
++ * this entity; used for O(log N) lookups into active trees.
++ * @service: service received during the last round of service.
++ * @budget: budget used to calculate F_i; F_i = S_i + @budget / @weight.
++ * @weight: weight of the queue
++ * @parent: parent entity, for hierarchical scheduling.
++ * @my_sched_data: for non-leaf nodes in the cgroup hierarchy, the
++ * associated scheduler queue, %NULL on leaf nodes.
++ * @sched_data: the scheduler queue this entity belongs to.
++ * @ioprio: the ioprio in use.
++ * @new_weight: when a weight change is requested, the new weight value.
++ * @orig_weight: original weight, used to implement weight boosting
++ * @new_ioprio: when an ioprio change is requested, the new ioprio value.
++ * @ioprio_class: the ioprio_class in use.
++ * @new_ioprio_class: when an ioprio_class change is requested, the new
++ * ioprio_class value.
++ * @ioprio_changed: flag, true when the user requested a weight, ioprio or
++ * ioprio_class change.
++ *
++ * A bfq_entity is used to represent either a bfq_queue (leaf node in the
++ * cgroup hierarchy) or a bfq_group in the upper level scheduler. Each
++ * entity belongs to the sched_data of the parent group in the cgroup
++ * hierarchy. Non-leaf entities also have their own sched_data, stored
++ * in @my_sched_data.
++ *
++ * Each entity stores independently its priority values; this would
++ * allow different weights on different devices, but this
++ * functionality is not exported to userspace for now. Priorities and
++ * weights are updated lazily, first storing the new values into the
++ * new_* fields, then setting the @ioprio_changed flag. As soon as
++ * there is a transition in the entity state that allows the priority
++ * update to take place the effective and the requested priority
++ * values are synchronized.
++ *
++ * Unless cgroups are used, the weight value is calculated from the
++ * ioprio to export the same interface as CFQ. When dealing with
++ * ``well-behaved'' queues (i.e., queues that do not spend too much
++ * time to consume their budget and have true sequential behavior, and
++ * when there are no external factors breaking anticipation) the
++ * relative weights at each level of the cgroups hierarchy should be
++ * guaranteed. All the fields are protected by the queue lock of the
++ * containing bfqd.
++ */
++struct bfq_entity {
++ struct rb_node rb_node;
++ struct bfq_weight_counter *weight_counter;
++
++ int on_st;
++
++ u64 finish;
++ u64 start;
++
++ struct rb_root *tree;
++
++ u64 min_start;
++
++ unsigned long service, budget;
++ unsigned short weight, new_weight;
++ unsigned short orig_weight;
++
++ struct bfq_entity *parent;
++
++ struct bfq_sched_data *my_sched_data;
++ struct bfq_sched_data *sched_data;
++
++ unsigned short ioprio, new_ioprio;
++ unsigned short ioprio_class, new_ioprio_class;
++
++ int ioprio_changed;
++};
++
++struct bfq_group;
++
++/**
++ * struct bfq_queue - leaf schedulable entity.
++ * @ref: reference counter.
++ * @bfqd: parent bfq_data.
++ * @new_bfqq: shared bfq_queue if queue is cooperating with
++ * one or more other queues.
++ * @pos_node: request-position tree member (see bfq_data's @rq_pos_tree).
++ * @pos_root: request-position tree root (see bfq_data's @rq_pos_tree).
++ * @sort_list: sorted list of pending requests.
++ * @next_rq: if fifo isn't expired, next request to serve.
++ * @queued: nr of requests queued in @sort_list.
++ * @allocated: currently allocated requests.
++ * @meta_pending: pending metadata requests.
++ * @fifo: fifo list of requests in sort_list.
++ * @entity: entity representing this queue in the scheduler.
++ * @max_budget: maximum budget allowed from the feedback mechanism.
++ * @budget_timeout: budget expiration (in jiffies).
++ * @dispatched: number of requests on the dispatch list or inside driver.
++ * @flags: status flags.
++ * @bfqq_list: node for active/idle bfqq list inside our bfqd.
++ * @burst_list_node: node for the device's burst list.
++ * @seek_samples: number of seeks sampled
++ * @seek_total: sum of the distances of the seeks sampled
++ * @seek_mean: mean seek distance
++ * @last_request_pos: position of the last request enqueued
++ * @requests_within_timer: number of consecutive pairs of request completion
++ * and arrival, such that the queue becomes idle
++ * after the completion, but the next request arrives
++ * within an idle time slice; used only if the queue's
++ * IO_bound has been cleared.
++ * @pid: pid of the process owning the queue, used for logging purposes.
++ * @last_wr_start_finish: start time of the current weight-raising period if
++ * the @bfq_queue is being weight-raised, otherwise
++ * finish time of the last weight-raising period
++ * @wr_cur_max_time: current max raising time for this queue
++ * @soft_rt_next_start: minimum time instant such that, only if a new
++ * request is enqueued after this time instant in an
++ * idle @bfq_queue with no outstanding requests, then
++ * the task associated with the queue is deemed
++ * soft real-time (see the comments to the function
++ * bfq_bfqq_softrt_next_start()).
++ * @last_idle_bklogged: time of the last transition of the @bfq_queue from
++ * idle to backlogged
++ * @service_from_backlogged: cumulative service received from the @bfq_queue
++ * since the last transition from idle to
++ * backlogged
++ *
++ * A bfq_queue is a leaf request queue; it can be associated with one
++ * io_context, or more if it is async or shared between cooperating processes.
++ * @cgroup holds a reference to the cgroup, to be sure that it does not
++ * disappear while a bfqq still references it (mostly to avoid races between
++ * request issuing and task migration followed by cgroup destruction).
++ * All the fields are protected by the queue lock of the containing bfqd.
++ */
++struct bfq_queue {
++ atomic_t ref;
++ struct bfq_data *bfqd;
++
++ /* fields for cooperating queues handling */
++ struct bfq_queue *new_bfqq;
++ struct rb_node pos_node;
++ struct rb_root *pos_root;
++
++ struct rb_root sort_list;
++ struct request *next_rq;
++ int queued[2];
++ int allocated[2];
++ int meta_pending;
++ struct list_head fifo;
++
++ struct bfq_entity entity;
++
++ unsigned long max_budget;
++ unsigned long budget_timeout;
++
++ int dispatched;
++
++ unsigned int flags;
++
++ struct list_head bfqq_list;
++
++ struct hlist_node burst_list_node;
++
++ unsigned int seek_samples;
++ u64 seek_total;
++ sector_t seek_mean;
++ sector_t last_request_pos;
++
++ unsigned int requests_within_timer;
++
++ pid_t pid;
++
++ /* weight-raising fields */
++ unsigned long wr_cur_max_time;
++ unsigned long soft_rt_next_start;
++ unsigned long last_wr_start_finish;
++ unsigned int wr_coeff;
++ unsigned long last_idle_bklogged;
++ unsigned long service_from_backlogged;
++};
++
++/**
++ * struct bfq_ttime - per process thinktime stats.
++ * @ttime_total: total process thinktime
++ * @ttime_samples: number of thinktime samples
++ * @ttime_mean: average process thinktime
++ */
++struct bfq_ttime {
++ unsigned long last_end_request;
++
++ unsigned long ttime_total;
++ unsigned long ttime_samples;
++ unsigned long ttime_mean;
++};
++
++/**
++ * struct bfq_io_cq - per (request_queue, io_context) structure.
++ * @icq: associated io_cq structure
++ * @bfqq: array of two process queues, the sync and the async
++ * @ttime: associated @bfq_ttime struct
++ */
++struct bfq_io_cq {
++ struct io_cq icq; /* must be the first member */
++ struct bfq_queue *bfqq[2];
++ struct bfq_ttime ttime;
++ int ioprio;
++};
++
++enum bfq_device_speed {
++ BFQ_BFQD_FAST,
++ BFQ_BFQD_SLOW,
++};
++
++/**
++ * struct bfq_data - per device data structure.
++ * @queue: request queue for the managed device.
++ * @root_group: root bfq_group for the device.
++ * @rq_pos_tree: rbtree sorted by next_request position, used when
++ * determining if two or more queues have interleaving
++ * requests (see bfq_close_cooperator()).
++ * @active_numerous_groups: number of bfq_groups containing more than one
++ * active @bfq_entity.
++ * @queue_weights_tree: rbtree of weight counters of @bfq_queues, sorted by
++ * weight. Used to keep track of whether all @bfq_queues
++ * have the same weight. The tree contains one counter
++ * for each distinct weight associated to some active
++ * and not weight-raised @bfq_queue (see the comments to
++ * the functions bfq_weights_tree_[add|remove] for
++ * further details).
++ * @group_weights_tree: rbtree of non-queue @bfq_entity weight counters, sorted
++ * by weight. Used to keep track of whether all
++ * @bfq_groups have the same weight. The tree contains
++ * one counter for each distinct weight associated to
++ * some active @bfq_group (see the comments to the
++ * functions bfq_weights_tree_[add|remove] for further
++ * details).
++ * @busy_queues: number of bfq_queues containing requests (including the
++ * queue in service, even if it is idling).
++ * @busy_in_flight_queues: number of @bfq_queues containing pending or
++ * in-flight requests, plus the @bfq_queue in
++ * service, even if idle but waiting for the
++ * possible arrival of its next sync request. This
++ * field is updated only if the device is rotational,
++ * but used only if the device is also NCQ-capable.
++ * The reason why the field is updated also for non-
++ * NCQ-capable rotational devices is related to the
++ * fact that the value of @hw_tag may be set also
++ * later than when busy_in_flight_queues may need to
++ * be incremented for the first time(s). Taking also
++ * this possibility into account, to avoid unbalanced
++ * increments/decrements, would imply more overhead
++ * than just updating busy_in_flight_queues
++ * regardless of the value of @hw_tag.
++ * @const_seeky_busy_in_flight_queues: number of constantly-seeky @bfq_queues
++ * (that is, seeky queues that expired
++ * for budget timeout at least once)
++ * containing pending or in-flight
++ * requests, including the in-service
++ * @bfq_queue if constantly seeky. This
++ * field is updated only if the device
++ * is rotational, but used only if the
++ * device is also NCQ-capable (see the
++ * comments to @busy_in_flight_queues).
++ * @wr_busy_queues: number of weight-raised busy @bfq_queues.
++ * @queued: number of queued requests.
++ * @rq_in_driver: number of requests dispatched and waiting for completion.
++ * @sync_flight: number of sync requests in the driver.
++ * @max_rq_in_driver: max number of reqs in driver in the last
++ * @hw_tag_samples completed requests.
++ * @hw_tag_samples: nr of samples used to calculate hw_tag.
++ * @hw_tag: flag set to one if the driver is showing a queueing behavior.
++ * @budgets_assigned: number of budgets assigned.
++ * @idle_slice_timer: timer set when idling for the next sequential request
++ * from the queue in service.
++ * @unplug_work: delayed work to restart dispatching on the request queue.
++ * @in_service_queue: bfq_queue in service.
++ * @in_service_bic: bfq_io_cq (bic) associated with the @in_service_queue.
++ * @last_position: on-disk position of the last served request.
++ * @last_budget_start: beginning of the last budget.
++ * @last_idling_start: beginning of the last idle slice.
++ * @peak_rate: peak transfer rate observed for a budget.
++ * @peak_rate_samples: number of samples used to calculate @peak_rate.
++ * @bfq_max_budget: maximum budget allotted to a bfq_queue before
++ * rescheduling.
++ * @group_list: list of all the bfq_groups active on the device.
++ * @active_list: list of all the bfq_queues active on the device.
++ * @idle_list: list of all the bfq_queues idle on the device.
++ * @bfq_fifo_expire: timeout for async/sync requests; when it expires
++ * requests are served in fifo order.
++ * @bfq_back_penalty: weight of backward seeks wrt forward ones.
++ * @bfq_back_max: maximum allowed backward seek.
++ * @bfq_slice_idle: maximum idling time.
++ * @bfq_user_max_budget: user-configured max budget value
++ * (0 for auto-tuning).
++ * @bfq_max_budget_async_rq: maximum budget (in nr of requests) allotted to
++ * async queues.
++ * @bfq_timeout: timeout for bfq_queues to consume their budget; used to
++ * prevent seeky queues from imposing long latencies on well-
++ * behaved ones (this also implies that seeky queues cannot
++ * receive guarantees in the service domain; after a timeout
++ * they are charged for the whole allocated budget, to try
++ * to preserve a behavior reasonably fair among them, but
++ * without service-domain guarantees).
++ * @bfq_coop_thresh: number of queue merges after which a @bfq_queue is
++ * no longer granted any weight-raising.
++ * @bfq_failed_cooperations: number of consecutive failed cooperation
++ * chances after which weight-raising is restored
++ * to a queue subject to more than bfq_coop_thresh
++ * queue merges.
++ * @bfq_requests_within_timer: number of consecutive requests that must be
++ * issued within the idle time slice to set
++ * again idling to a queue which was marked as
++ * non-I/O-bound (see the definition of the
++ * IO_bound flag for further details).
++ * @last_ins_in_burst: last time at which a queue entered the current
++ * burst of queues being activated shortly after
++ * each other; for more details about this and the
++ * following parameters related to a burst of
++ * activations, see the comments to the function
++ * @bfq_handle_burst.
++ * @bfq_burst_interval: reference time interval used to decide whether a
++ * queue has been activated shortly after
++ * @last_ins_in_burst.
++ * @burst_size: number of queues in the current burst of queue activations.
++ * @bfq_large_burst_thresh: maximum burst size above which the current
++ * queue-activation burst is deemed as 'large'.
++ * @large_burst: true if a large queue-activation burst is in progress.
++ * @burst_list: head of the burst list (as for the above fields, more details
++ * in the comments to the function bfq_handle_burst).
++ * @low_latency: if set to true, low-latency heuristics are enabled.
++ * @bfq_wr_coeff: maximum factor by which the weight of a weight-raised
++ * queue is multiplied.
++ * @bfq_wr_max_time: maximum duration of a weight-raising period (jiffies).
++ * @bfq_wr_rt_max_time: maximum duration for soft real-time processes.
++ * @bfq_wr_min_idle_time: minimum idle period after which weight-raising
++ * may be reactivated for a queue (in jiffies).
++ * @bfq_wr_min_inter_arr_async: minimum period between request arrivals
++ * after which weight-raising may be
++ * reactivated for an already busy queue
++ * (in jiffies).
++ * @bfq_wr_max_softrt_rate: max service-rate for a soft real-time queue,
++ * in sectors per second.
++ * @RT_prod: cached value of the product R*T used for computing the maximum
++ * duration of the weight raising automatically.
++ * @device_speed: device-speed class for the low-latency heuristic.
++ * @oom_bfqq: fallback dummy bfqq for extreme OOM conditions.
++ *
++ * All the fields are protected by the @queue lock.
++ */
++struct bfq_data {
++ struct request_queue *queue;
++
++ struct bfq_group *root_group;
++ struct rb_root rq_pos_tree;
++
++#ifdef CONFIG_CGROUP_BFQIO
++ int active_numerous_groups;
++#endif
++
++ struct rb_root queue_weights_tree;
++ struct rb_root group_weights_tree;
++
++ int busy_queues;
++ int busy_in_flight_queues;
++ int const_seeky_busy_in_flight_queues;
++ int wr_busy_queues;
++ int queued;
++ int rq_in_driver;
++ int sync_flight;
++
++ int max_rq_in_driver;
++ int hw_tag_samples;
++ int hw_tag;
++
++ int budgets_assigned;
++
++ struct timer_list idle_slice_timer;
++ struct work_struct unplug_work;
++
++ struct bfq_queue *in_service_queue;
++ struct bfq_io_cq *in_service_bic;
++
++ sector_t last_position;
++
++ ktime_t last_budget_start;
++ ktime_t last_idling_start;
++ int peak_rate_samples;
++ u64 peak_rate;
++ unsigned long bfq_max_budget;
++
++ struct hlist_head group_list;
++ struct list_head active_list;
++ struct list_head idle_list;
++
++ unsigned int bfq_fifo_expire[2];
++ unsigned int bfq_back_penalty;
++ unsigned int bfq_back_max;
++ unsigned int bfq_slice_idle;
++ u64 bfq_class_idle_last_service;
++
++ unsigned int bfq_user_max_budget;
++ unsigned int bfq_max_budget_async_rq;
++ unsigned int bfq_timeout[2];
++
++ unsigned int bfq_coop_thresh;
++ unsigned int bfq_failed_cooperations;
++ unsigned int bfq_requests_within_timer;
++
++ unsigned long last_ins_in_burst;
++ unsigned long bfq_burst_interval;
++ int burst_size;
++ unsigned long bfq_large_burst_thresh;
++ bool large_burst;
++ struct hlist_head burst_list;
++
++ bool low_latency;
++
++ /* parameters of the low_latency heuristics */
++ unsigned int bfq_wr_coeff;
++ unsigned int bfq_wr_max_time;
++ unsigned int bfq_wr_rt_max_time;
++ unsigned int bfq_wr_min_idle_time;
++ unsigned long bfq_wr_min_inter_arr_async;
++ unsigned int bfq_wr_max_softrt_rate;
++ u64 RT_prod;
++ enum bfq_device_speed device_speed;
++
++ struct bfq_queue oom_bfqq;
++};
++
++enum bfqq_state_flags {
++ BFQ_BFQQ_FLAG_busy = 0, /* has requests or is in service */
++ BFQ_BFQQ_FLAG_wait_request, /* waiting for a request */
++ BFQ_BFQQ_FLAG_must_alloc, /* must be allowed rq alloc */
++ BFQ_BFQQ_FLAG_fifo_expire, /* FIFO checked in this slice */
++ BFQ_BFQQ_FLAG_idle_window, /* slice idling enabled */
++ BFQ_BFQQ_FLAG_sync, /* synchronous queue */
++ BFQ_BFQQ_FLAG_budget_new, /* no completion with this budget */
++ BFQ_BFQQ_FLAG_IO_bound, /*
++ * bfqq has timed-out at least once
++ * having consumed at most 2/10 of
++ * its budget
++ */
++ BFQ_BFQQ_FLAG_in_large_burst, /*
++ * bfqq activated in a large burst,
++ * see comments to bfq_handle_burst.
++ */
++ BFQ_BFQQ_FLAG_constantly_seeky, /*
++ * bfqq has proved to be slow and
++ * seeky until budget timeout
++ */
++ BFQ_BFQQ_FLAG_softrt_update, /*
++ * may need softrt-next-start
++ * update
++ */
++ BFQ_BFQQ_FLAG_coop, /* bfqq is shared */
++ BFQ_BFQQ_FLAG_split_coop, /* shared bfqq will be split */
++};
++
++#define BFQ_BFQQ_FNS(name) \
++static inline void bfq_mark_bfqq_##name(struct bfq_queue *bfqq) \
++{ \
++ (bfqq)->flags |= (1 << BFQ_BFQQ_FLAG_##name); \
++} \
++static inline void bfq_clear_bfqq_##name(struct bfq_queue *bfqq) \
++{ \
++ (bfqq)->flags &= ~(1 << BFQ_BFQQ_FLAG_##name); \
++} \
++static inline int bfq_bfqq_##name(const struct bfq_queue *bfqq) \
++{ \
++ return ((bfqq)->flags & (1 << BFQ_BFQQ_FLAG_##name)) != 0; \
++}
++
++BFQ_BFQQ_FNS(busy);
++BFQ_BFQQ_FNS(wait_request);
++BFQ_BFQQ_FNS(must_alloc);
++BFQ_BFQQ_FNS(fifo_expire);
++BFQ_BFQQ_FNS(idle_window);
++BFQ_BFQQ_FNS(sync);
++BFQ_BFQQ_FNS(budget_new);
++BFQ_BFQQ_FNS(IO_bound);
++BFQ_BFQQ_FNS(in_large_burst);
++BFQ_BFQQ_FNS(constantly_seeky);
++BFQ_BFQQ_FNS(coop);
++BFQ_BFQQ_FNS(split_coop);
++BFQ_BFQQ_FNS(softrt_update);
++#undef BFQ_BFQQ_FNS
++
++/* Logging facilities. */
++#define bfq_log_bfqq(bfqd, bfqq, fmt, args...) \
++ blk_add_trace_msg((bfqd)->queue, "bfq%d " fmt, (bfqq)->pid, ##args)
++
++#define bfq_log(bfqd, fmt, args...) \
++ blk_add_trace_msg((bfqd)->queue, "bfq " fmt, ##args)
++
++/* Expiration reasons. */
++enum bfqq_expiration {
++ BFQ_BFQQ_TOO_IDLE = 0, /*
++ * queue has been idling for
++ * too long
++ */
++ BFQ_BFQQ_BUDGET_TIMEOUT, /* budget took too long to be used */
++ BFQ_BFQQ_BUDGET_EXHAUSTED, /* budget consumed */
++ BFQ_BFQQ_NO_MORE_REQUESTS, /* the queue has no more requests */
++};
++
++#ifdef CONFIG_CGROUP_BFQIO
++/**
++ * struct bfq_group - per (device, cgroup) data structure.
++ * @entity: schedulable entity to insert into the parent group sched_data.
++ * @sched_data: own sched_data, to contain child entities (they may be
++ * both bfq_queues and bfq_groups).
++ * @group_node: node to be inserted into the bfqio_cgroup->group_data
++ * list of the containing cgroup's bfqio_cgroup.
++ * @bfqd_node: node to be inserted into the @bfqd->group_list list
++ * of the groups active on the same device; used for cleanup.
++ * @bfqd: the bfq_data for the device this group acts upon.
++ * @async_bfqq: array of async queues for all the tasks belonging to
++ * the group, one queue per ioprio value per ioprio_class,
++ * except for the idle class that has only one queue.
++ * @async_idle_bfqq: async queue for the idle class (ioprio is ignored).
++ * @my_entity: pointer to @entity, %NULL for the toplevel group; used
++ * to avoid too many special cases during group creation/
++ * migration.
++ * @active_entities: number of active entities belonging to the group;
++ * unused for the root group. Used to know whether there
++ * are groups with more than one active @bfq_entity
++ * (see the comments to the function
++ * bfq_bfqq_must_not_expire()).
++ *
++ * Each (device, cgroup) pair has its own bfq_group, i.e., for each cgroup
++ * there is a set of bfq_groups, each one collecting the lower-level
++ * entities belonging to the group that are acting on the same device.
++ *
++ * Locking works as follows:
++ * o @group_node is protected by the bfqio_cgroup lock, and is accessed
++ * via RCU from its readers.
++ * o @bfqd is protected by the queue lock, RCU is used to access it
++ * from the readers.
++ * o All the other fields are protected by the @bfqd queue lock.
++ */
++struct bfq_group {
++ struct bfq_entity entity;
++ struct bfq_sched_data sched_data;
++
++ struct hlist_node group_node;
++ struct hlist_node bfqd_node;
++
++ void *bfqd;
++
++ struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
++ struct bfq_queue *async_idle_bfqq;
++
++ struct bfq_entity *my_entity;
++
++ int active_entities;
++};
++
++/**
++ * struct bfqio_cgroup - bfq cgroup data structure.
++ * @css: subsystem state for bfq in the containing cgroup.
++ * @online: flag marked when the subsystem is inserted.
++ * @weight: cgroup weight.
++ * @ioprio: cgroup ioprio.
++ * @ioprio_class: cgroup ioprio_class.
++ * @lock: spinlock that protects @ioprio, @ioprio_class and @group_data.
++ * @group_data: list containing the bfq_group belonging to this cgroup.
++ *
++ * @group_data is accessed using RCU, with @lock protecting the updates,
++ * @ioprio and @ioprio_class are protected by @lock.
++ */
++struct bfqio_cgroup {
++ struct cgroup_subsys_state css;
++ bool online;
++
++ unsigned short weight, ioprio, ioprio_class;
++
++ spinlock_t lock;
++ struct hlist_head group_data;
++};
++#else
++struct bfq_group {
++ struct bfq_sched_data sched_data;
++
++ struct bfq_queue *async_bfqq[2][IOPRIO_BE_NR];
++ struct bfq_queue *async_idle_bfqq;
++};
++#endif
++
++static inline struct bfq_service_tree *
++bfq_entity_service_tree(struct bfq_entity *entity)
++{
++ struct bfq_sched_data *sched_data = entity->sched_data;
++ unsigned int idx = entity->ioprio_class - 1;
++
++ BUG_ON(idx >= BFQ_IOPRIO_CLASSES);
++ BUG_ON(sched_data == NULL);
++
++ return sched_data->service_tree + idx;
++}
++
++static inline struct bfq_queue *bic_to_bfqq(struct bfq_io_cq *bic,
++ bool is_sync)
++{
++ return bic->bfqq[is_sync];
++}
++
++static inline void bic_set_bfqq(struct bfq_io_cq *bic,
++ struct bfq_queue *bfqq, bool is_sync)
++{
++ bic->bfqq[is_sync] = bfqq;
++}
++
++static inline struct bfq_data *bic_to_bfqd(struct bfq_io_cq *bic)
++{
++ return bic->icq.q->elevator->elevator_data;
++}
++
++/**
++ * bfq_get_bfqd_locked - get a lock to a bfqd using an RCU-protected pointer.
++ * @ptr: a pointer to a bfqd.
++ * @flags: storage for the flags to be saved.
++ *
++ * This function allows bfqg->bfqd to be protected by the
++ * queue lock of the bfqd it references; the pointer is dereferenced
++ * under RCU, so the storage for bfqd is assured to be safe as long
++ * as the RCU read side critical section does not end. After the
++ * bfqd->queue->queue_lock is taken the pointer is rechecked, to be
++ * sure that no other writer accessed it. If we raced with a writer,
++ * the function returns NULL, with the queue unlocked, otherwise it
++ * returns the dereferenced pointer, with the queue locked.
++ */
++static inline struct bfq_data *bfq_get_bfqd_locked(void **ptr,
++ unsigned long *flags)
++{
++ struct bfq_data *bfqd;
++
++ rcu_read_lock();
++ bfqd = rcu_dereference(*(struct bfq_data **)ptr);
++
++ if (bfqd != NULL) {
++ spin_lock_irqsave(bfqd->queue->queue_lock, *flags);
++ if (*ptr == bfqd)
++ goto out;
++ spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
++ }
++
++ bfqd = NULL;
++out:
++ rcu_read_unlock();
++ return bfqd;
++}
++
++static inline void bfq_put_bfqd_unlock(struct bfq_data *bfqd,
++ unsigned long *flags)
++{
++ spin_unlock_irqrestore(bfqd->queue->queue_lock, *flags);
++}
++
++static void bfq_check_ioprio_change(struct bfq_io_cq *bic);
++static void bfq_put_queue(struct bfq_queue *bfqq);
++static void bfq_dispatch_insert(struct request_queue *q, struct request *rq);
++static struct bfq_queue *bfq_get_queue(struct bfq_data *bfqd,
++ struct bfq_group *bfqg, int is_sync,
++ struct bfq_io_cq *bic, gfp_t gfp_mask);
++static void bfq_end_wr_async_queues(struct bfq_data *bfqd,
++ struct bfq_group *bfqg);
++static void bfq_put_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
++static void bfq_exit_bfqq(struct bfq_data *bfqd, struct bfq_queue *bfqq);
++
++#endif /* _BFQ_H */
+--
+1.9.1
+
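Note for readers of the bfq.h hunk above: the BFQ_BFQQ_FNS macro stamps out
a mark/clear/test helper triple for each entry of enum bfqq_state_flags,
where each enum value is a bit position in bfqq->flags rather than a mask.
The following is only a minimal userspace sketch of that pattern, not code
from the patch. Every demo_* / DEMO_* name is hypothetical, the struct is
reduced to the flags word, and the sketch shifts 1U where the header
shifts a plain 1.

```c
#include <assert.h>

/* Hypothetical stand-in for struct bfq_queue: only the flags word is kept. */
struct demo_queue {
	unsigned int flags;
};

/* Hypothetical flag indices; each value is a bit position, not a mask. */
enum demo_state_flags {
	DEMO_FLAG_busy = 0,
	DEMO_FLAG_sync,
	DEMO_FLAG_coop,
};

/* Generate demo_mark_<name>(), demo_clear_<name>() and demo_<name>(). */
#define DEMO_FNS(name)							\
static inline void demo_mark_##name(struct demo_queue *q)		\
{									\
	q->flags |= (1U << DEMO_FLAG_##name);				\
}									\
static inline void demo_clear_##name(struct demo_queue *q)		\
{									\
	q->flags &= ~(1U << DEMO_FLAG_##name);				\
}									\
static inline int demo_##name(const struct demo_queue *q)		\
{									\
	return (q->flags & (1U << DEMO_FLAG_##name)) != 0;		\
}

/* No trailing semicolons: each expansion already ends with a function body. */
DEMO_FNS(busy)
DEMO_FNS(sync)
DEMO_FNS(coop)
#undef DEMO_FNS

/* Returns 1 when marking and clearing leave exactly the coop bit set. */
int demo_selfcheck(void)
{
	struct demo_queue q = { 0 };

	demo_mark_sync(&q);
	demo_mark_coop(&q);
	demo_clear_sync(&q);

	assert(demo_coop(&q));
	return !demo_busy(&q) && !demo_sync(&q) &&
	       q.flags == (1U << DEMO_FLAG_coop);
}
```

Keeping bit positions in the enum means a new flag needs only one enum
entry and one macro invocation; the test helper normalizes its result to
0 or 1, so callers can compare it directly.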
diff --git a/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r8-for-4.2.patch b/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r8-for-4.2.patch
new file mode 100644
index 0000000..547a098
--- /dev/null
+++ b/5003_block-bfq-add-Early-Queue-Merge-EQM-to-BFQ-v7r8-for-4.2.patch
@@ -0,0 +1,1220 @@
+From e7a71ea27442adefc78628dedca1477a1ac6994e Mon Sep 17 00:00:00 2001
+From: Mauro Andreolini <mauro.andreolini@unimore.it>
+Date: Fri, 5 Jun 2015 17:45:40 +0200
+Subject: [PATCH 3/3] block, bfq: add Early Queue Merge (EQM) to BFQ-v7r8 for
+ 4.2.0
+
+A set of processes may happen to perform interleaved reads, i.e., requests
+whose union would give rise to a sequential read pattern. There are two
+typical cases: in the first case, processes read fixed-size chunks of
+data at a fixed distance from each other, while in the second case processes
+may read variable-size chunks at variable distances. The latter case occurs
+for example with QEMU, which splits the I/O generated by the guest into
+multiple chunks, and lets these chunks be served by a pool of cooperating
+processes, iteratively assigning the next chunk of I/O to the first
+available process. CFQ uses actual queue merging for the first type of
+processes, whereas it uses preemption to get a sequential read pattern out
+of the read requests performed by the second type of processes. In the end
+it uses two different mechanisms to achieve the same goal: boosting the
+throughput with interleaved I/O.
+
+This patch introduces Early Queue Merge (EQM), a unified mechanism to get a
+sequential read pattern with both types of processes. The main idea is
+checking newly arrived requests against the next request of the active queue
+both in case of actual request insert and in case of request merge. By doing
+so, both types of processes can be handled by just merging their queues.
+EQM is then simpler and more compact than the pair of mechanisms used in
+CFQ.
+
+Finally, EQM also preserves the typical low-latency properties of BFQ, by
+properly restoring the weight-raising state of a queue when it gets back to
+a non-merged state.
+
+Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
+Signed-off-by: Arianna Avanzini <avanzini@google.com>
+Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
+---
+ block/bfq-iosched.c | 750 +++++++++++++++++++++++++++++++++++++---------------
+ block/bfq-sched.c | 28 --
+ block/bfq.h | 54 +++-
+ 3 files changed, 580 insertions(+), 252 deletions(-)
+
+diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
+index 773b2ee..71b51c1 100644
+--- a/block/bfq-iosched.c
++++ b/block/bfq-iosched.c
+@@ -573,6 +573,57 @@ static inline unsigned int bfq_wr_duration(struct bfq_data *bfqd)
+ return dur;
+ }
+
++static inline unsigned
++bfq_bfqq_cooperations(struct bfq_queue *bfqq)
++{
++ return bfqq->bic ? bfqq->bic->cooperations : 0;
++}
++
++static inline void
++bfq_bfqq_resume_state(struct bfq_queue *bfqq, struct bfq_io_cq *bic)
++{
++ if (bic->saved_idle_window)
++ bfq_mark_bfqq_idle_window(bfqq);
++ else
++ bfq_clear_bfqq_idle_window(bfqq);
++ if (bic->saved_IO_bound)
++ bfq_mark_bfqq_IO_bound(bfqq);
++ else
++ bfq_clear_bfqq_IO_bound(bfqq);
++ /* Assuming that the flag in_large_burst is already correctly set */
++ if (bic->wr_time_left && bfqq->bfqd->low_latency &&
++ !bfq_bfqq_in_large_burst(bfqq) &&
++ bic->cooperations < bfqq->bfqd->bfq_coop_thresh) {
++ /*
++ * Start a weight raising period with the duration given by
++ * the wr_time_left snapshot.
++ */
++ if (bfq_bfqq_busy(bfqq))
++ bfqq->bfqd->wr_busy_queues++;
++ bfqq->wr_coeff = bfqq->bfqd->bfq_wr_coeff;
++ bfqq->wr_cur_max_time = bic->wr_time_left;
++ bfqq->last_wr_start_finish = jiffies;
++ bfqq->entity.ioprio_changed = 1;
++ }
++ /*
++ * Clear wr_time_left to prevent bfq_bfqq_save_state() from
++ * getting confused about the queue's need of a weight-raising
++ * period.
++ */
++ bic->wr_time_left = 0;
++}
++
++/* Must be called with the queue_lock held. */
++static int bfqq_process_refs(struct bfq_queue *bfqq)
++{
++ int process_refs, io_refs;
++
++ io_refs = bfqq->allocated[READ] + bfqq->allocated[WRITE];
++ process_refs = atomic_read(&bfqq->ref) - io_refs - bfqq->entity.on_st;
++ BUG_ON(process_refs < 0);
++ return process_refs;
++}
++
+ /* Empty burst list and add just bfqq (see comments to bfq_handle_burst) */
+ static inline void bfq_reset_burst_list(struct bfq_data *bfqd,
+ struct bfq_queue *bfqq)
+@@ -817,7 +868,7 @@ static void bfq_add_request(struct request *rq)
+ bfq_rq_pos_tree_add(bfqd, bfqq);
+
+ if (!bfq_bfqq_busy(bfqq)) {
+- bool soft_rt,
++ bool soft_rt, coop_or_in_burst,
+ idle_for_long_time = time_is_before_jiffies(
+ bfqq->budget_timeout +
+ bfqd->bfq_wr_min_idle_time);
+@@ -841,11 +892,12 @@ static void bfq_add_request(struct request *rq)
+ bfqd->last_ins_in_burst = jiffies;
+ }
+
++ coop_or_in_burst = bfq_bfqq_in_large_burst(bfqq) ||
++ bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh;
+ soft_rt = bfqd->bfq_wr_max_softrt_rate > 0 &&
+- !bfq_bfqq_in_large_burst(bfqq) &&
++ !coop_or_in_burst &&
+ time_is_before_jiffies(bfqq->soft_rt_next_start);
+- interactive = !bfq_bfqq_in_large_burst(bfqq) &&
+- idle_for_long_time;
++ interactive = !coop_or_in_burst && idle_for_long_time;
+ entity->budget = max_t(unsigned long, bfqq->max_budget,
+ bfq_serv_to_charge(next_rq, bfqq));
+
+@@ -864,11 +916,20 @@ static void bfq_add_request(struct request *rq)
+ if (!bfqd->low_latency)
+ goto add_bfqq_busy;
+
++ if (bfq_bfqq_just_split(bfqq))
++ goto set_ioprio_changed;
++
+ /*
+- * If the queue is not being boosted and has been idle
+- * for enough time, start a weight-raising period
++ * If the queue:
++ * - is not being boosted,
++ * - has been idle for enough time,
++ * - is not a sync queue or is linked to a bfq_io_cq (it is
++ * shared "for its nature" or it is not shared and its
++ * requests have not been redirected to a shared queue)
++ * start a weight-raising period.
+ */
+- if (old_wr_coeff == 1 && (interactive || soft_rt)) {
++ if (old_wr_coeff == 1 && (interactive || soft_rt) &&
++ (!bfq_bfqq_sync(bfqq) || bfqq->bic != NULL)) {
+ bfqq->wr_coeff = bfqd->bfq_wr_coeff;
+ if (interactive)
+ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
+@@ -882,7 +943,7 @@ static void bfq_add_request(struct request *rq)
+ } else if (old_wr_coeff > 1) {
+ if (interactive)
+ bfqq->wr_cur_max_time = bfq_wr_duration(bfqd);
+- else if (bfq_bfqq_in_large_burst(bfqq) ||
++ else if (coop_or_in_burst ||
+ (bfqq->wr_cur_max_time ==
+ bfqd->bfq_wr_rt_max_time &&
+ !soft_rt)) {
+@@ -901,18 +962,18 @@ static void bfq_add_request(struct request *rq)
+ /*
+ *
+ * The remaining weight-raising time is lower
+- * than bfqd->bfq_wr_rt_max_time, which
+- * means that the application is enjoying
+- * weight raising either because deemed soft-
+- * rt in the near past, or because deemed
+- * interactive a long ago. In both cases,
+- * resetting now the current remaining weight-
+- * raising time for the application to the
+- * weight-raising duration for soft rt
+- * applications would not cause any latency
+- * increase for the application (as the new
+- * duration would be higher than the remaining
+- * time).
++ * than bfqd->bfq_wr_rt_max_time, which means
++ * that the application is enjoying weight
++ * raising either because deemed soft-rt in
++ * the near past, or because deemed interactive
+ * long ago.
++ * In both cases, resetting now the current
++ * remaining weight-raising time for the
++ * application to the weight-raising duration
++ * for soft rt applications would not cause any
++ * latency increase for the application (as the
++ * new duration would be higher than the
++ * remaining time).
+ *
+ * In addition, the application is now meeting
+ * the requirements for being deemed soft rt.
+@@ -947,6 +1008,7 @@ static void bfq_add_request(struct request *rq)
+ bfqd->bfq_wr_rt_max_time;
+ }
+ }
++set_ioprio_changed:
+ if (old_wr_coeff != bfqq->wr_coeff)
+ entity->ioprio_changed = 1;
+ add_bfqq_busy:
+@@ -1167,90 +1229,35 @@ static void bfq_end_wr(struct bfq_data *bfqd)
+ spin_unlock_irq(bfqd->queue->queue_lock);
+ }
+
+-static int bfq_allow_merge(struct request_queue *q, struct request *rq,
+- struct bio *bio)
++static inline sector_t bfq_io_struct_pos(void *io_struct, bool request)
+ {
+- struct bfq_data *bfqd = q->elevator->elevator_data;
+- struct bfq_io_cq *bic;
+- struct bfq_queue *bfqq;
+-
+- /*
+- * Disallow merge of a sync bio into an async request.
+- */
+- if (bfq_bio_sync(bio) && !rq_is_sync(rq))
+- return 0;
+-
+- /*
+- * Lookup the bfqq that this bio will be queued with. Allow
+- * merge only if rq is queued there.
+- * Queue lock is held here.
+- */
+- bic = bfq_bic_lookup(bfqd, current->io_context);
+- if (bic == NULL)
+- return 0;
+-
+- bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
+- return bfqq == RQ_BFQQ(rq);
+-}
+-
+-static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
+- struct bfq_queue *bfqq)
+-{
+- if (bfqq != NULL) {
+- bfq_mark_bfqq_must_alloc(bfqq);
+- bfq_mark_bfqq_budget_new(bfqq);
+- bfq_clear_bfqq_fifo_expire(bfqq);
+-
+- bfqd->budgets_assigned = (bfqd->budgets_assigned*7 + 256) / 8;
+-
+- bfq_log_bfqq(bfqd, bfqq,
+- "set_in_service_queue, cur-budget = %lu",
+- bfqq->entity.budget);
+- }
+-
+- bfqd->in_service_queue = bfqq;
+-}
+-
+-/*
+- * Get and set a new queue for service.
+- */
+-static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd,
+- struct bfq_queue *bfqq)
+-{
+- if (!bfqq)
+- bfqq = bfq_get_next_queue(bfqd);
++ if (request)
++ return blk_rq_pos(io_struct);
+ else
+- bfq_get_next_queue_forced(bfqd, bfqq);
+-
+- __bfq_set_in_service_queue(bfqd, bfqq);
+- return bfqq;
++ return ((struct bio *)io_struct)->bi_iter.bi_sector;
+ }
+
+-static inline sector_t bfq_dist_from_last(struct bfq_data *bfqd,
+- struct request *rq)
++static inline sector_t bfq_dist_from(sector_t pos1,
++ sector_t pos2)
+ {
+- if (blk_rq_pos(rq) >= bfqd->last_position)
+- return blk_rq_pos(rq) - bfqd->last_position;
++ if (pos1 >= pos2)
++ return pos1 - pos2;
+ else
+- return bfqd->last_position - blk_rq_pos(rq);
++ return pos2 - pos1;
+ }
+
+-/*
+- * Return true if bfqq has no request pending and rq is close enough to
+- * bfqd->last_position, or if rq is closer to bfqd->last_position than
+- * bfqq->next_rq
+- */
+-static inline int bfq_rq_close(struct bfq_data *bfqd, struct request *rq)
++static inline int bfq_rq_close_to_sector(void *io_struct, bool request,
++ sector_t sector)
+ {
+- return bfq_dist_from_last(bfqd, rq) <= BFQQ_SEEK_THR;
++ return bfq_dist_from(bfq_io_struct_pos(io_struct, request), sector) <=
++ BFQQ_SEEK_THR;
+ }
+
+-static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
++static struct bfq_queue *bfqq_close(struct bfq_data *bfqd, sector_t sector)
+ {
+ struct rb_root *root = &bfqd->rq_pos_tree;
+ struct rb_node *parent, *node;
+ struct bfq_queue *__bfqq;
+- sector_t sector = bfqd->last_position;
+
+ if (RB_EMPTY_ROOT(root))
+ return NULL;
+@@ -1269,7 +1276,7 @@ static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
+ * next_request position).
+ */
+ __bfqq = rb_entry(parent, struct bfq_queue, pos_node);
+- if (bfq_rq_close(bfqd, __bfqq->next_rq))
++ if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
+ return __bfqq;
+
+ if (blk_rq_pos(__bfqq->next_rq) < sector)
+@@ -1280,7 +1287,7 @@ static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
+ return NULL;
+
+ __bfqq = rb_entry(node, struct bfq_queue, pos_node);
+- if (bfq_rq_close(bfqd, __bfqq->next_rq))
++ if (bfq_rq_close_to_sector(__bfqq->next_rq, true, sector))
+ return __bfqq;
+
+ return NULL;
+@@ -1289,14 +1296,12 @@ static struct bfq_queue *bfqq_close(struct bfq_data *bfqd)
+ /*
+ * bfqd - obvious
+ * cur_bfqq - passed in so that we don't decide that the current queue
+- * is closely cooperating with itself.
+- *
+- * We are assuming that cur_bfqq has dispatched at least one request,
+- * and that bfqd->last_position reflects a position on the disk associated
+- * with the I/O issued by cur_bfqq.
++ * is closely cooperating with itself
++ * sector - used as a reference point to search for a close queue
+ */
+ static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
+- struct bfq_queue *cur_bfqq)
++ struct bfq_queue *cur_bfqq,
++ sector_t sector)
+ {
+ struct bfq_queue *bfqq;
+
+@@ -1316,7 +1321,7 @@ static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
+ * working closely on the same area of the disk. In that case,
+ * we can group them together and don't waste time idling.
+ */
+- bfqq = bfqq_close(bfqd);
++ bfqq = bfqq_close(bfqd, sector);
+ if (bfqq == NULL || bfqq == cur_bfqq)
+ return NULL;
+
+@@ -1343,6 +1348,315 @@ static struct bfq_queue *bfq_close_cooperator(struct bfq_data *bfqd,
+ return bfqq;
+ }
+
++static struct bfq_queue *
++bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
++{
++ int process_refs, new_process_refs;
++ struct bfq_queue *__bfqq;
++
++ /*
++ * If there are no process references on the new_bfqq, then it is
++ * unsafe to follow the ->new_bfqq chain as other bfqq's in the chain
++ * may have dropped their last reference (not just their last process
++ * reference).
++ */
++ if (!bfqq_process_refs(new_bfqq))
++ return NULL;
++
++ /* Avoid a circular list and skip interim queue merges. */
++ while ((__bfqq = new_bfqq->new_bfqq)) {
++ if (__bfqq == bfqq)
++ return NULL;
++ new_bfqq = __bfqq;
++ }
++
++ process_refs = bfqq_process_refs(bfqq);
++ new_process_refs = bfqq_process_refs(new_bfqq);
++ /*
++ * If the process for the bfqq has gone away, there is no
++ * sense in merging the queues.
++ */
++ if (process_refs == 0 || new_process_refs == 0)
++ return NULL;
++
++ bfq_log_bfqq(bfqq->bfqd, bfqq, "scheduling merge with queue %d",
++ new_bfqq->pid);
++
++ /*
++ * Merging is just a redirection: the requests of the process
++ * owning one of the two queues are redirected to the other queue.
++ * The latter queue, in its turn, is set as shared if this is the
++ * first time that the requests of some process are redirected to
++ * it.
++ *
++ * We redirect bfqq to new_bfqq and not the opposite, because we
++ * are in the context of the process owning bfqq, hence we have
++ * the io_cq of this process. So we can immediately configure this
++ * io_cq to redirect the requests of the process to new_bfqq.
++ *
++ * NOTE, even if new_bfqq coincides with the in-service queue, the
++ * io_cq of new_bfqq is not available, because, if the in-service
++ * queue is shared, bfqd->in_service_bic may not point to the
++ * io_cq of the in-service queue.
++ * Redirecting the requests of the process owning bfqq to the
++ * currently in-service queue is in any case the best option, as
++ * we feed the in-service queue with new requests close to the
++ * last request served and, by doing so, hopefully increase the
++ * throughput.
++ */
++ bfqq->new_bfqq = new_bfqq;
++ atomic_add(process_refs, &new_bfqq->ref);
++ return new_bfqq;
++}
++
++/*
++ * Attempt to schedule a merge of bfqq with the currently in-service queue
++ * or with a close queue among the scheduled queues.
++ * Return NULL if no merge was scheduled, a pointer to the shared bfq_queue
++ * structure otherwise.
++ *
++ * The OOM queue is not allowed to participate in cooperation: in fact, since
++ * the requests temporarily redirected to the OOM queue could be redirected
++ * again to dedicated queues at any time, the state needed to correctly
++ * handle merging with the OOM queue would be quite complex and expensive
++ * to maintain. Besides, in a condition as critical as out of memory,
++ * the benefits of queue merging may be of little relevance, or even negligible.
++ */
++static struct bfq_queue *
++bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
++ void *io_struct, bool request)
++{
++ struct bfq_queue *in_service_bfqq, *new_bfqq;
++
++ if (bfqq->new_bfqq)
++ return bfqq->new_bfqq;
++
++ if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq))
++ return NULL;
++
++ in_service_bfqq = bfqd->in_service_queue;
++
++ if (in_service_bfqq == NULL || in_service_bfqq == bfqq ||
++ !bfqd->in_service_bic ||
++ unlikely(in_service_bfqq == &bfqd->oom_bfqq))
++ goto check_scheduled;
++
++ if (bfq_class_idle(in_service_bfqq) || bfq_class_idle(bfqq))
++ goto check_scheduled;
++
++ if (bfq_class_rt(in_service_bfqq) != bfq_class_rt(bfqq))
++ goto check_scheduled;
++
++ if (in_service_bfqq->entity.parent != bfqq->entity.parent)
++ goto check_scheduled;
++
++ if (bfq_rq_close_to_sector(io_struct, request, bfqd->last_position) &&
++ bfq_bfqq_sync(in_service_bfqq) && bfq_bfqq_sync(bfqq)) {
++ new_bfqq = bfq_setup_merge(bfqq, in_service_bfqq);
++ if (new_bfqq != NULL)
++ return new_bfqq; /* Merge with in-service queue */
++ }
++
++ /*
++ * Check whether there is a cooperator among currently scheduled
++ * queues. The only thing we need is that the bio/request is not
++ * NULL, as we need it to establish whether a cooperator exists.
++ */
++check_scheduled:
++ new_bfqq = bfq_close_cooperator(bfqd, bfqq,
++ bfq_io_struct_pos(io_struct, request));
++ if (new_bfqq && likely(new_bfqq != &bfqd->oom_bfqq))
++ return bfq_setup_merge(bfqq, new_bfqq);
++
++ return NULL;
++}
++
++static inline void
++bfq_bfqq_save_state(struct bfq_queue *bfqq)
++{
++ /*
++ * If bfqq->bic == NULL, the queue is already shared or its requests
++ * have already been redirected to a shared queue; both idle window
++ * and weight raising state have already been saved. Do nothing.
++ */
++ if (bfqq->bic == NULL)
++ return;
++ if (bfqq->bic->wr_time_left)
++ /*
++ * This is the queue of a just-started process, and would
++ * deserve weight raising: we set wr_time_left to the full
++ * weight-raising duration to trigger weight-raising when
++ * and if the queue is split and the first request of the
++ * queue is enqueued.
++ */
++ bfqq->bic->wr_time_left = bfq_wr_duration(bfqq->bfqd);
++ else if (bfqq->wr_coeff > 1) {
++ unsigned long wr_duration =
++ jiffies - bfqq->last_wr_start_finish;
++ /*
++ * It may happen that a queue's weight raising period lasts
++ * longer than its wr_cur_max_time, as weight raising is
++ * handled only when a request is enqueued or dispatched (it
++ * does not use any timer). If the weight raising period is
++ * about to end, don't save it.
++ */
++ if (bfqq->wr_cur_max_time <= wr_duration)
++ bfqq->bic->wr_time_left = 0;
++ else
++ bfqq->bic->wr_time_left =
++ bfqq->wr_cur_max_time - wr_duration;
++ /*
++ * The bfq_queue is becoming shared or the requests of the
++ * process owning the queue are being redirected to a shared
++ * queue. Stop the weight raising period of the queue, as in
++ * both cases it should not be owned by an interactive or
++ * soft real-time application.
++ */
++ bfq_bfqq_end_wr(bfqq);
++ } else
++ bfqq->bic->wr_time_left = 0;
++ bfqq->bic->saved_idle_window = bfq_bfqq_idle_window(bfqq);
++ bfqq->bic->saved_IO_bound = bfq_bfqq_IO_bound(bfqq);
++ bfqq->bic->saved_in_large_burst = bfq_bfqq_in_large_burst(bfqq);
++ bfqq->bic->was_in_burst_list = !hlist_unhashed(&bfqq->burst_list_node);
++ bfqq->bic->cooperations++;
++ bfqq->bic->failed_cooperations = 0;
++}
++
++static inline void
++bfq_get_bic_reference(struct bfq_queue *bfqq)
++{
++ /*
++ * If bfqq->bic has a non-NULL value, the bic to which it belongs
++ * is about to begin using a shared bfq_queue.
++ */
++ if (bfqq->bic)
++ atomic_long_inc(&bfqq->bic->icq.ioc->refcount);
++}
++
++static void
++bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
++ struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
++{
++ bfq_log_bfqq(bfqd, bfqq, "merging with queue %lu",
++ (long unsigned)new_bfqq->pid);
++ /* Save weight raising and idle window of the merged queues */
++ bfq_bfqq_save_state(bfqq);
++ bfq_bfqq_save_state(new_bfqq);
++ if (bfq_bfqq_IO_bound(bfqq))
++ bfq_mark_bfqq_IO_bound(new_bfqq);
++ bfq_clear_bfqq_IO_bound(bfqq);
++ /*
++ * Grab a reference to the bic, to prevent it from being destroyed
++ * before being possibly touched by a bfq_split_bfqq().
++ */
++ bfq_get_bic_reference(bfqq);
++ bfq_get_bic_reference(new_bfqq);
++ /*
++ * Merge queues (that is, let bic redirect its requests to new_bfqq)
++ */
++ bic_set_bfqq(bic, new_bfqq, 1);
++ bfq_mark_bfqq_coop(new_bfqq);
++ /*
++ * new_bfqq now belongs to at least two bics (it is a shared queue):
++ * set new_bfqq->bic to NULL. bfqq either:
++ * - does not belong to any bic any more, and hence bfqq->bic must
++ * be set to NULL, or
++ * - is a queue whose owning bics have already been redirected to a
++ * different queue, hence the queue is destined to not belong to
++ * any bic soon and bfqq->bic is already NULL (therefore the next
++ * assignment causes no harm).
++ */
++ new_bfqq->bic = NULL;
++ bfqq->bic = NULL;
++ bfq_put_queue(bfqq);
++}
++
++static inline void bfq_bfqq_increase_failed_cooperations(struct bfq_queue *bfqq)
++{
++ struct bfq_io_cq *bic = bfqq->bic;
++ struct bfq_data *bfqd = bfqq->bfqd;
++
++ if (bic && bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh) {
++ bic->failed_cooperations++;
++ if (bic->failed_cooperations >= bfqd->bfq_failed_cooperations)
++ bic->cooperations = 0;
++ }
++}
++
++static int bfq_allow_merge(struct request_queue *q, struct request *rq,
++ struct bio *bio)
++{
++ struct bfq_data *bfqd = q->elevator->elevator_data;
++ struct bfq_io_cq *bic;
++ struct bfq_queue *bfqq, *new_bfqq;
++
++ /*
++ * Disallow merge of a sync bio into an async request.
++ */
++ if (bfq_bio_sync(bio) && !rq_is_sync(rq))
++ return 0;
++
++ /*
++ * Lookup the bfqq that this bio will be queued with. Allow
++ * merge only if rq is queued there.
++ * Queue lock is held here.
++ */
++ bic = bfq_bic_lookup(bfqd, current->io_context);
++ if (bic == NULL)
++ return 0;
++
++ bfqq = bic_to_bfqq(bic, bfq_bio_sync(bio));
++ /*
++ * We take advantage of this function to perform an early merge
++ * of the queues of possible cooperating processes.
++ */
++ if (bfqq != NULL) {
++ new_bfqq = bfq_setup_cooperator(bfqd, bfqq, bio, false);
++ if (new_bfqq != NULL) {
++ bfq_merge_bfqqs(bfqd, bic, bfqq, new_bfqq);
++ /*
++ * If we get here, the bio will be queued in the
++ * shared queue, i.e., new_bfqq, so use new_bfqq
++ * to decide whether bio and rq can be merged.
++ */
++ bfqq = new_bfqq;
++ } else
++ bfq_bfqq_increase_failed_cooperations(bfqq);
++ }
++
++ return bfqq == RQ_BFQQ(rq);
++}
++
++static void __bfq_set_in_service_queue(struct bfq_data *bfqd,
++ struct bfq_queue *bfqq)
++{
++ if (bfqq != NULL) {
++ bfq_mark_bfqq_must_alloc(bfqq);
++ bfq_mark_bfqq_budget_new(bfqq);
++ bfq_clear_bfqq_fifo_expire(bfqq);
++
++ bfqd->budgets_assigned = (bfqd->budgets_assigned*7 + 256) / 8;
++
++ bfq_log_bfqq(bfqd, bfqq,
++ "set_in_service_queue, cur-budget = %lu",
++ bfqq->entity.budget);
++ }
++
++ bfqd->in_service_queue = bfqq;
++}
++
++/*
++ * Get and set a new queue for service.
++ */
++static struct bfq_queue *bfq_set_in_service_queue(struct bfq_data *bfqd)
++{
++ struct bfq_queue *bfqq = bfq_get_next_queue(bfqd);
++
++ __bfq_set_in_service_queue(bfqd, bfqq);
++ return bfqq;
++}
++
+ /*
+ * If enough samples have been computed, return the current max budget
+ * stored in bfqd, which is dynamically updated according to the
+@@ -1488,61 +1802,6 @@ static struct request *bfq_check_fifo(struct bfq_queue *bfqq)
+ return rq;
+ }
+
+-/* Must be called with the queue_lock held. */
+-static int bfqq_process_refs(struct bfq_queue *bfqq)
+-{
+- int process_refs, io_refs;
+-
+- io_refs = bfqq->allocated[READ] + bfqq->allocated[WRITE];
+- process_refs = atomic_read(&bfqq->ref) - io_refs - bfqq->entity.on_st;
+- BUG_ON(process_refs < 0);
+- return process_refs;
+-}
+-
+-static void bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
+-{
+- int process_refs, new_process_refs;
+- struct bfq_queue *__bfqq;
+-
+- /*
+- * If there are no process references on the new_bfqq, then it is
+- * unsafe to follow the ->new_bfqq chain as other bfqq's in the chain
+- * may have dropped their last reference (not just their last process
+- * reference).
+- */
+- if (!bfqq_process_refs(new_bfqq))
+- return;
+-
+- /* Avoid a circular list and skip interim queue merges. */
+- while ((__bfqq = new_bfqq->new_bfqq)) {
+- if (__bfqq == bfqq)
+- return;
+- new_bfqq = __bfqq;
+- }
+-
+- process_refs = bfqq_process_refs(bfqq);
+- new_process_refs = bfqq_process_refs(new_bfqq);
+- /*
+- * If the process for the bfqq has gone away, there is no
+- * sense in merging the queues.
+- */
+- if (process_refs == 0 || new_process_refs == 0)
+- return;
+-
+- /*
+- * Merge in the direction of the lesser amount of work.
+- */
+- if (new_process_refs >= process_refs) {
+- bfqq->new_bfqq = new_bfqq;
+- atomic_add(process_refs, &new_bfqq->ref);
+- } else {
+- new_bfqq->new_bfqq = bfqq;
+- atomic_add(new_process_refs, &bfqq->ref);
+- }
+- bfq_log_bfqq(bfqq->bfqd, bfqq, "scheduling merge with queue %d",
+- new_bfqq->pid);
+-}
+-
+ static inline unsigned long bfq_bfqq_budget_left(struct bfq_queue *bfqq)
+ {
+ struct bfq_entity *entity = &bfqq->entity;
+@@ -2269,7 +2528,7 @@ static inline bool bfq_bfqq_must_idle(struct bfq_queue *bfqq)
+ */
+ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
+ {
+- struct bfq_queue *bfqq, *new_bfqq = NULL;
++ struct bfq_queue *bfqq;
+ struct request *next_rq;
+ enum bfqq_expiration reason = BFQ_BFQQ_BUDGET_TIMEOUT;
+
+@@ -2279,17 +2538,6 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
+
+ bfq_log_bfqq(bfqd, bfqq, "select_queue: already in-service queue");
+
+- /*
+- * If another queue has a request waiting within our mean seek
+- * distance, let it run. The expire code will check for close
+- * cooperators and put the close queue at the front of the
+- * service tree. If possible, merge the expiring queue with the
+- * new bfqq.
+- */
+- new_bfqq = bfq_close_cooperator(bfqd, bfqq);
+- if (new_bfqq != NULL && bfqq->new_bfqq == NULL)
+- bfq_setup_merge(bfqq, new_bfqq);
+-
+ if (bfq_may_expire_for_budg_timeout(bfqq) &&
+ !timer_pending(&bfqd->idle_slice_timer) &&
+ !bfq_bfqq_must_idle(bfqq))
+@@ -2328,10 +2576,7 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
+ bfq_clear_bfqq_wait_request(bfqq);
+ del_timer(&bfqd->idle_slice_timer);
+ }
+- if (new_bfqq == NULL)
+- goto keep_queue;
+- else
+- goto expire;
++ goto keep_queue;
+ }
+ }
+
+@@ -2340,40 +2585,30 @@ static struct bfq_queue *bfq_select_queue(struct bfq_data *bfqd)
+ * for a new request, or has requests waiting for a completion and
+ * may idle after their completion, then keep it anyway.
+ */
+- if (new_bfqq == NULL && (timer_pending(&bfqd->idle_slice_timer) ||
+- (bfqq->dispatched != 0 && bfq_bfqq_must_not_expire(bfqq)))) {
++ if (timer_pending(&bfqd->idle_slice_timer) ||
++ (bfqq->dispatched != 0 && bfq_bfqq_must_not_expire(bfqq))) {
+ bfqq = NULL;
+ goto keep_queue;
+- } else if (new_bfqq != NULL && timer_pending(&bfqd->idle_slice_timer)) {
+- /*
+- * Expiring the queue because there is a close cooperator,
+- * cancel timer.
+- */
+- bfq_clear_bfqq_wait_request(bfqq);
+- del_timer(&bfqd->idle_slice_timer);
+ }
+
+ reason = BFQ_BFQQ_NO_MORE_REQUESTS;
+ expire:
+ bfq_bfqq_expire(bfqd, bfqq, 0, reason);
+ new_queue:
+- bfqq = bfq_set_in_service_queue(bfqd, new_bfqq);
++ bfqq = bfq_set_in_service_queue(bfqd);
+ bfq_log(bfqd, "select_queue: new queue %d returned",
+ bfqq != NULL ? bfqq->pid : 0);
+ keep_queue:
+ return bfqq;
+ }
+
+-static void bfq_update_wr_data(struct bfq_data *bfqd,
+- struct bfq_queue *bfqq)
++static void bfq_update_wr_data(struct bfq_data *bfqd, struct bfq_queue *bfqq)
+ {
+- if (bfqq->wr_coeff > 1) { /* queue is being boosted */
+- struct bfq_entity *entity = &bfqq->entity;
+-
++ struct bfq_entity *entity = &bfqq->entity;
++ if (bfqq->wr_coeff > 1) { /* queue is being weight-raised */
+ bfq_log_bfqq(bfqd, bfqq,
+ "raising period dur %u/%u msec, old coeff %u, w %d(%d)",
+- jiffies_to_msecs(jiffies -
+- bfqq->last_wr_start_finish),
++ jiffies_to_msecs(jiffies - bfqq->last_wr_start_finish),
+ jiffies_to_msecs(bfqq->wr_cur_max_time),
+ bfqq->wr_coeff,
+ bfqq->entity.weight, bfqq->entity.orig_weight);
+@@ -2382,12 +2617,16 @@ static void bfq_update_wr_data(struct bfq_data *bfqd,
+ entity->orig_weight * bfqq->wr_coeff);
+ if (entity->ioprio_changed)
+ bfq_log_bfqq(bfqd, bfqq, "WARN: pending prio change");
++
+ /*
+ * If the queue was activated in a burst, or
+ * too much time has elapsed from the beginning
+- * of this weight-raising, then end weight raising.
++ * of this weight-raising period, or the queue has
++ * exceeded the acceptable number of cooperations,
++ * then end weight raising.
+ */
+ if (bfq_bfqq_in_large_burst(bfqq) ||
++ bfq_bfqq_cooperations(bfqq) >= bfqd->bfq_coop_thresh ||
+ time_is_before_jiffies(bfqq->last_wr_start_finish +
+ bfqq->wr_cur_max_time)) {
+ bfqq->last_wr_start_finish = jiffies;
+@@ -2396,11 +2635,13 @@ static void bfq_update_wr_data(struct bfq_data *bfqd,
+ bfqq->last_wr_start_finish,
+ jiffies_to_msecs(bfqq->wr_cur_max_time));
+ bfq_bfqq_end_wr(bfqq);
+- __bfq_entity_update_weight_prio(
+- bfq_entity_service_tree(entity),
+- entity);
+ }
+ }
++ /* Update weight both if it must be raised and if it must be lowered */
++ if ((entity->weight > entity->orig_weight) != (bfqq->wr_coeff > 1))
++ __bfq_entity_update_weight_prio(
++ bfq_entity_service_tree(entity),
++ entity);
+ }
+
+ /*
+@@ -2647,6 +2888,25 @@ static inline void bfq_init_icq(struct io_cq *icq)
+ struct bfq_io_cq *bic = icq_to_bic(icq);
+
+ bic->ttime.last_end_request = jiffies;
++ /*
++ * A newly created bic indicates that the process has just
++ * started doing I/O, and is probably mapping into memory its
++ * executable and libraries: it definitely needs weight raising.
++ * There is however the possibility that the process performs,
++ * for a while, I/O close to some other process. EQM intercepts
++ * this behavior and may merge the queue corresponding to the
++ * process with some other queue, BEFORE the weight of the queue
++ * is raised. Merged queues are not weight-raised (they are assumed
++ * to belong to processes that benefit only from high throughput).
++ * If the merge is basically the consequence of an accident, then
++ * the queue will be split soon and will get back its old weight.
++ * It is then important to write down somewhere that this queue
++ * does need weight raising, even if it did not make it to get its
++ * weight raised before being merged. To this purpose, we overload
++ * the field wr_time_left and assign 1 to it, to mark the queue
++ * as needing weight raising.
++ */
++ bic->wr_time_left = 1;
+ }
+
+ static void bfq_exit_icq(struct io_cq *icq)
+@@ -2660,6 +2920,13 @@ static void bfq_exit_icq(struct io_cq *icq)
+ }
+
+ if (bic->bfqq[BLK_RW_SYNC]) {
++ /*
++ * If the bic is using a shared queue, put the reference
++ * taken on the io_context when the bic started using a
++ * shared bfq_queue.
++ */
++ if (bfq_bfqq_coop(bic->bfqq[BLK_RW_SYNC]))
++ put_io_context(icq->ioc);
+ bfq_exit_bfqq(bfqd, bic->bfqq[BLK_RW_SYNC]);
+ bic->bfqq[BLK_RW_SYNC] = NULL;
+ }
+@@ -2952,6 +3219,10 @@ static void bfq_update_idle_window(struct bfq_data *bfqd,
+ if (!bfq_bfqq_sync(bfqq) || bfq_class_idle(bfqq))
+ return;
+
++ /* Idle window just restored, statistics are meaningless. */
++ if (bfq_bfqq_just_split(bfqq))
++ return;
++
+ enable_idle = bfq_bfqq_idle_window(bfqq);
+
+ if (atomic_read(&bic->icq.ioc->active_ref) == 0 ||
+@@ -2999,6 +3270,7 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ if (bfqq->entity.service > bfq_max_budget(bfqd) / 8 ||
+ !BFQQ_SEEKY(bfqq))
+ bfq_update_idle_window(bfqd, bfqq, bic);
++ bfq_clear_bfqq_just_split(bfqq);
+
+ bfq_log_bfqq(bfqd, bfqq,
+ "rq_enqueued: idle_window=%d (seeky %d, mean %llu)",
+@@ -3059,12 +3331,47 @@ static void bfq_rq_enqueued(struct bfq_data *bfqd, struct bfq_queue *bfqq,
+ static void bfq_insert_request(struct request_queue *q, struct request *rq)
+ {
+ struct bfq_data *bfqd = q->elevator->elevator_data;
+- struct bfq_queue *bfqq = RQ_BFQQ(rq);
++ struct bfq_queue *bfqq = RQ_BFQQ(rq), *new_bfqq;
+
+ assert_spin_locked(bfqd->queue->queue_lock);
+
++ /*
++ * An unplug may trigger a requeue of a request from the device
++ * driver: make sure we are in process context while trying to
++ * merge two bfq_queues.
++ */
++ if (!in_interrupt()) {
++ new_bfqq = bfq_setup_cooperator(bfqd, bfqq, rq, true);
++ if (new_bfqq != NULL) {
++ if (bic_to_bfqq(RQ_BIC(rq), 1) != bfqq)
++ new_bfqq = bic_to_bfqq(RQ_BIC(rq), 1);
++ /*
++ * Release the request's reference to the old bfqq
++ * and make sure one is taken to the shared queue.
++ */
++ new_bfqq->allocated[rq_data_dir(rq)]++;
++ bfqq->allocated[rq_data_dir(rq)]--;
++ atomic_inc(&new_bfqq->ref);
++ bfq_put_queue(bfqq);
++ if (bic_to_bfqq(RQ_BIC(rq), 1) == bfqq)
++ bfq_merge_bfqqs(bfqd, RQ_BIC(rq),
++ bfqq, new_bfqq);
++ rq->elv.priv[1] = new_bfqq;
++ bfqq = new_bfqq;
++ } else
++ bfq_bfqq_increase_failed_cooperations(bfqq);
++ }
++
+ bfq_add_request(rq);
+
++ /*
++ * Here a newly-created bfq_queue has already started a weight-raising
++ * period: clear wr_time_left to prevent bfq_bfqq_save_state()
++ * from assigning it a full weight-raising period. See the detailed
++ * comments about this field in bfq_init_icq().
++ */
++ if (bfqq->bic != NULL)
++ bfqq->bic->wr_time_left = 0;
+ rq->fifo_time = jiffies + bfqd->bfq_fifo_expire[rq_is_sync(rq)];
+ list_add_tail(&rq->queuelist, &bfqq->fifo);
+
+@@ -3226,18 +3533,6 @@ static void bfq_put_request(struct request *rq)
+ }
+ }
+
+-static struct bfq_queue *
+-bfq_merge_bfqqs(struct bfq_data *bfqd, struct bfq_io_cq *bic,
+- struct bfq_queue *bfqq)
+-{
+- bfq_log_bfqq(bfqd, bfqq, "merging with queue %lu",
+- (long unsigned)bfqq->new_bfqq->pid);
+- bic_set_bfqq(bic, bfqq->new_bfqq, 1);
+- bfq_mark_bfqq_coop(bfqq->new_bfqq);
+- bfq_put_queue(bfqq);
+- return bic_to_bfqq(bic, 1);
+-}
+-
+ /*
+ * Returns NULL if a new bfqq should be allocated, or the old bfqq if this
+ * was the last process referring to said bfqq.
+@@ -3246,6 +3541,9 @@ static struct bfq_queue *
+ bfq_split_bfqq(struct bfq_io_cq *bic, struct bfq_queue *bfqq)
+ {
+ bfq_log_bfqq(bfqq->bfqd, bfqq, "splitting queue");
++
++ put_io_context(bic->icq.ioc);
++
+ if (bfqq_process_refs(bfqq) == 1) {
+ bfqq->pid = current->pid;
+ bfq_clear_bfqq_coop(bfqq);
+@@ -3274,6 +3572,7 @@ static int bfq_set_request(struct request_queue *q, struct request *rq,
+ struct bfq_queue *bfqq;
+ struct bfq_group *bfqg;
+ unsigned long flags;
++ bool split = false;
+
+ might_sleep_if(gfp_mask & __GFP_WAIT);
+
+@@ -3291,25 +3590,26 @@ new_queue:
+ if (bfqq == NULL || bfqq == &bfqd->oom_bfqq) {
+ bfqq = bfq_get_queue(bfqd, bfqg, is_sync, bic, gfp_mask);
+ bic_set_bfqq(bic, bfqq, is_sync);
++ if (split && is_sync) {
++ if ((bic->was_in_burst_list && bfqd->large_burst) ||
++ bic->saved_in_large_burst)
++ bfq_mark_bfqq_in_large_burst(bfqq);
++ else {
++ bfq_clear_bfqq_in_large_burst(bfqq);
++ if (bic->was_in_burst_list)
++ hlist_add_head(&bfqq->burst_list_node,
++ &bfqd->burst_list);
++ }
++ }
+ } else {
+- /*
+- * If the queue was seeky for too long, break it apart.
+- */
++ /* If the queue was seeky for too long, break it apart. */
+ if (bfq_bfqq_coop(bfqq) && bfq_bfqq_split_coop(bfqq)) {
+ bfq_log_bfqq(bfqd, bfqq, "breaking apart bfqq");
+ bfqq = bfq_split_bfqq(bic, bfqq);
++ split = true;
+ if (!bfqq)
+ goto new_queue;
+ }
+-
+- /*
+- * Check to see if this queue is scheduled to merge with
+- * another closely cooperating queue. The merging of queues
+- * happens here as it must be done in process context.
+- * The reference on new_bfqq was taken in merge_bfqqs.
+- */
+- if (bfqq->new_bfqq != NULL)
+- bfqq = bfq_merge_bfqqs(bfqd, bic, bfqq);
+ }
+
+ bfqq->allocated[rw]++;
+@@ -3320,6 +3620,26 @@ new_queue:
+ rq->elv.priv[0] = bic;
+ rq->elv.priv[1] = bfqq;
+
++ /*
++ * If a bfq_queue has only one process reference, it is owned
++ * by only one bfq_io_cq: we can set the bic field of the
++ * bfq_queue to the address of that structure. Also, if the
++ * queue has just been split, mark a flag so that the
++ * information is available to the other scheduler hooks.
++ */
++ if (likely(bfqq != &bfqd->oom_bfqq) && bfqq_process_refs(bfqq) == 1) {
++ bfqq->bic = bic;
++ if (split) {
++ bfq_mark_bfqq_just_split(bfqq);
++ /*
++ * If the queue has just been split from a shared
++ * queue, restore the idle window and the possible
++ * weight raising period.
++ */
++ bfq_bfqq_resume_state(bfqq, bic);
++ }
++ }
++
+ spin_unlock_irqrestore(q->queue_lock, flags);
+
+ return 0;
+diff --git a/block/bfq-sched.c b/block/bfq-sched.c
+index c343099..d0890c6 100644
+--- a/block/bfq-sched.c
++++ b/block/bfq-sched.c
+@@ -1085,34 +1085,6 @@ static struct bfq_queue *bfq_get_next_queue(struct bfq_data *bfqd)
+ return bfqq;
+ }
+
+-/*
+- * Forced extraction of the given queue.
+- */
+-static void bfq_get_next_queue_forced(struct bfq_data *bfqd,
+- struct bfq_queue *bfqq)
+-{
+- struct bfq_entity *entity;
+- struct bfq_sched_data *sd;
+-
+- BUG_ON(bfqd->in_service_queue != NULL);
+-
+- entity = &bfqq->entity;
+- /*
+- * Bubble up extraction/update from the leaf to the root.
+- */
+- for_each_entity(entity) {
+- sd = entity->sched_data;
+- bfq_update_budget(entity);
+- bfq_update_vtime(bfq_entity_service_tree(entity));
+- bfq_active_extract(bfq_entity_service_tree(entity), entity);
+- sd->in_service_entity = entity;
+- sd->next_in_service = NULL;
+- entity->service = 0;
+- }
+-
+- return;
+-}
+-
+ static void __bfq_bfqd_reset_in_service(struct bfq_data *bfqd)
+ {
+ if (bfqd->in_service_bic != NULL) {
+diff --git a/block/bfq.h b/block/bfq.h
+index e350b5f..93d3f6e 100644
+--- a/block/bfq.h
++++ b/block/bfq.h
+@@ -218,18 +218,21 @@ struct bfq_group;
+ * idle @bfq_queue with no outstanding requests, then
+ * the task associated with the queue it is deemed as
+ * soft real-time (see the comments to the function
+- * bfq_bfqq_softrt_next_start()).
++ * bfq_bfqq_softrt_next_start())
+ * @last_idle_bklogged: time of the last transition of the @bfq_queue from
+ * idle to backlogged
+ * @service_from_backlogged: cumulative service received from the @bfq_queue
+ * since the last transition from idle to
+ * backlogged
++ * @bic: pointer to the bfq_io_cq owning the bfq_queue, set to %NULL if the
++ * queue is shared
+ *
+- * A bfq_queue is a leaf request queue; it can be associated with an io_context
+- * or more, if it is async or shared between cooperating processes. @cgroup
+- * holds a reference to the cgroup, to be sure that it does not disappear while
+- * a bfqq still references it (mostly to avoid races between request issuing and
+- * task migration followed by cgroup destruction).
++ * A bfq_queue is a leaf request queue; it can be associated with an
++ * io_context or more, if it is async or shared between cooperating
++ * processes. @cgroup holds a reference to the cgroup, to be sure that it
++ * does not disappear while a bfqq still references it (mostly to avoid
++ * races between request issuing and task migration followed by cgroup
++ * destruction).
+ * All the fields are protected by the queue lock of the containing bfqd.
+ */
+ struct bfq_queue {
+@@ -269,6 +272,7 @@ struct bfq_queue {
+ unsigned int requests_within_timer;
+
+ pid_t pid;
++ struct bfq_io_cq *bic;
+
+ /* weight-raising fields */
+ unsigned long wr_cur_max_time;
+@@ -298,12 +302,42 @@ struct bfq_ttime {
+ * @icq: associated io_cq structure
+ * @bfqq: array of two process queues, the sync and the async
+ * @ttime: associated @bfq_ttime struct
++ * @wr_time_left: snapshot of the time left before weight raising ends
++ * for the sync queue associated to this process; this
++ * snapshot is taken to remember this value while the weight
++ * raising is suspended because the queue is merged with a
++ * shared queue, and is used to set @wr_cur_max_time
++ * when the queue is split from the shared queue and its
++ * weight is raised again
++ * @saved_idle_window: same purpose as the previous field for the idle
++ * window
++ * @saved_IO_bound: same purpose as the previous two fields for the I/O
++ * bound classification of a queue
++ * @saved_in_large_burst: same purpose as the previous fields for the
++ * value of the field keeping the queue's belonging
++ * to a large burst
++ * @was_in_burst_list: true if the queue belonged to a burst list
++ * before its merge with another cooperating queue
++ * @cooperations: counter of consecutive successful queue merges undergone
++ * by any of the process' @bfq_queues
++ * @failed_cooperations: counter of consecutive failed queue merges of any
++ * of the process' @bfq_queues
+ */
+ struct bfq_io_cq {
+ struct io_cq icq; /* must be the first member */
+ struct bfq_queue *bfqq[2];
+ struct bfq_ttime ttime;
+ int ioprio;
++
++ unsigned int wr_time_left;
++ bool saved_idle_window;
++ bool saved_IO_bound;
++
++ bool saved_in_large_burst;
++ bool was_in_burst_list;
++
++ unsigned int cooperations;
++ unsigned int failed_cooperations;
+ };
+
+ enum bfq_device_speed {
+@@ -536,7 +570,7 @@ enum bfqq_state_flags {
+ BFQ_BFQQ_FLAG_idle_window, /* slice idling enabled */
+ BFQ_BFQQ_FLAG_sync, /* synchronous queue */
+ BFQ_BFQQ_FLAG_budget_new, /* no completion with this budget */
+- BFQ_BFQQ_FLAG_IO_bound, /*
++ BFQ_BFQQ_FLAG_IO_bound, /*
+ * bfqq has timed-out at least once
+ * having consumed at most 2/10 of
+ * its budget
+@@ -549,12 +583,13 @@ enum bfqq_state_flags {
+ * bfqq has proved to be slow and
+ * seeky until budget timeout
+ */
+- BFQ_BFQQ_FLAG_softrt_update, /*
++ BFQ_BFQQ_FLAG_softrt_update, /*
+ * may need softrt-next-start
+ * update
+ */
+ BFQ_BFQQ_FLAG_coop, /* bfqq is shared */
+- BFQ_BFQQ_FLAG_split_coop, /* shared bfqq will be splitted */
++ BFQ_BFQQ_FLAG_split_coop, /* shared bfqq will be split */
++ BFQ_BFQQ_FLAG_just_split, /* queue has just been split */
+ };
+
+ #define BFQ_BFQQ_FNS(name) \
+@@ -583,6 +618,7 @@ BFQ_BFQQ_FNS(in_large_burst);
+ BFQ_BFQQ_FNS(constantly_seeky);
+ BFQ_BFQQ_FNS(coop);
+ BFQ_BFQQ_FNS(split_coop);
++BFQ_BFQQ_FNS(just_split);
+ BFQ_BFQQ_FNS(softrt_update);
+ #undef BFQ_BFQQ_FNS
+
+--
+1.9.1
+
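Note on the patch above: the chain walk in bfq_setup_merge() can be hard to follow inside the diff, so here is a minimal stand-alone sketch of it. It is not the kernel code. struct toy_queue and toy_setup_merge() are hypothetical names invented for illustration, process_refs is stored directly instead of being derived as bfqq_process_refs() does, and logging and locking are left out. Like the original, it assumes the existing ->new_bfqq chain is acyclic.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct bfq_queue: only what the sketch needs. */
struct toy_queue {
	int process_refs;          /* references held by processes */
	int ref;                   /* total reference count */
	struct toy_queue *new_q;   /* scheduled merge target, or NULL */
};

/*
 * Schedule a merge of q into new_q. Returns the queue that q will be
 * redirected to, or NULL if no merge is scheduled.
 */
static struct toy_queue *toy_setup_merge(struct toy_queue *q,
					 struct toy_queue *new_q)
{
	struct toy_queue *next;

	/* Without process references the chain is not safe to follow. */
	if (new_q->process_refs == 0)
		return NULL;

	/* Skip interim merges; refuse if the chain leads back to q. */
	while ((next = new_q->new_q) != NULL) {
		if (next == q)
			return NULL;
		new_q = next;
	}

	/* No point in merging if either owner process has gone away. */
	if (q->process_refs == 0 || new_q->process_refs == 0)
		return NULL;

	/* Always redirect q to new_q, never the other way round. */
	q->new_q = new_q;
	new_q->ref += q->process_refs;
	return new_q;
}
```

Unlike the removed version, which merged "in the direction of the lesser amount of work", the direction here is fixed: the caller's queue is always the one redirected, which is what lets the new code return the target queue to the caller.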
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-28 23:44 Mike Pagano
0 siblings, 0 replies; 17+ messages in thread
From: Mike Pagano @ 2015-09-28 23:44 UTC
To: gentoo-commits
commit: 226f35b4faf8c37111b54e1449a20137b0b3212c
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Mon Sep 28 23:44:18 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Mon Sep 28 23:44:18 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=226f35b4
dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE. See bug #561558. Thanks to kipplasterjoe for reporting.
0000_README | 4 ++
1600_dm-crypt-limit-max-segment-size.patch | 84 ++++++++++++++++++++++++++++++
2 files changed, 88 insertions(+)
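Note: the reasoning behind the dm-crypt change is spelled out in the patch's own commit message. As a rough illustration only, the toy model below counts segments for a buffer given as a list of page frame numbers. Nothing in it is kernel API: toy_count_segments() and TOY_PAGE_SIZE are hypothetical names, and real segment accounting in the block layer considers more than physical contiguity.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u

/*
 * Count the segments needed for npages pages with the given page frame
 * numbers. A page joins the current segment only if it physically follows
 * the previous page and the segment stays within max_seg_size.
 */
static unsigned int toy_count_segments(const unsigned long *pfn, size_t npages,
				       unsigned int max_seg_size)
{
	unsigned int segs = 0, cur_len = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		int contiguous = i > 0 && pfn[i] == pfn[i - 1] + 1;

		if (contiguous && cur_len + TOY_PAGE_SIZE <= max_seg_size) {
			cur_len += TOY_PAGE_SIZE;   /* extend current segment */
		} else {
			segs++;                     /* start a new segment */
			cur_len = TOY_PAGE_SIZE;
		}
	}
	return segs;
}
```

In this model, four contiguous pages count as one segment under a 64 KiB segment size, while four scattered pages, such as an encryption buffer might get, count as four. With the segment size capped at TOY_PAGE_SIZE, the contiguous buffer also counts as four, so a max_segments check passed by the original bio also holds for its encrypted clone.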
diff --git a/0000_README b/0000_README
index 93b94b6..551dcf3 100644
--- a/0000_README
+++ b/0000_README
@@ -55,6 +55,10 @@ Patch: 1510_fs-enable-link-security-restrictions-by-default.patch
From: http://sources.debian.net/src/linux/3.16.7-ckt4-3/debian/patches/debian/fs-enable-link-security-restrictions-by-default.patch/
Desc: Enable link security restrictions by default.
+Patch: 1600_dm-crypt-limit-max-segment-size.patch
+From: https://bugzilla.kernel.org/show_bug.cgi?id=104421
+Desc: dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE.
+
Patch: 2700_ThinkPad-30-brightness-control-fix.patch
From: Seth Forshee <seth.forshee@canonical.com>
Desc: ACPI: Disable Windows 8 compatibility for some Lenovo ThinkPads.
diff --git a/1600_dm-crypt-limit-max-segment-size.patch b/1600_dm-crypt-limit-max-segment-size.patch
new file mode 100644
index 0000000..82aca44
--- /dev/null
+++ b/1600_dm-crypt-limit-max-segment-size.patch
@@ -0,0 +1,84 @@
+From 586b286b110e94eb31840ac5afc0c24e0881fe34 Mon Sep 17 00:00:00 2001
+From: Mike Snitzer <snitzer@redhat.com>
+Date: Wed, 9 Sep 2015 21:34:51 -0400
+Subject: dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE
+
+Setting the dm-crypt device's max_segment_size to PAGE_SIZE is an
+unfortunate constraint that is required to avoid the potential for
+exceeding dm-crypt's underlying device's max_segments limits -- due to
+crypt_alloc_buffer() possibly allocating pages for the encryption bio
+that are not as physically contiguous as the original bio.
+
+It is interesting to note that this problem was already fixed back in
+2007 via commit 91e106259 ("dm crypt: use bio_add_page"). But Linux 4.0
+commit cf2f1abfb ("dm crypt: don't allocate pages for a partial
+request") regressed dm-crypt back to _not_ using bio_add_page(). But
+given dm-crypt's cpu parallelization changes all depend on commit
+cf2f1abfb's abandoning of the more complex io fragments processing that
+dm-crypt previously had we cannot easily go back to using
+bio_add_page().
+
+So all said the cleanest way to resolve this issue is to fix dm-crypt to
+properly constrain the original bios entering dm-crypt so the encryption
+bios that dm-crypt generates from the original bios are always
+compatible with the underlying device's max_segments queue limits.
+
+It should be noted that technically Linux 4.3 does _not_ need this fix
+because of the block core's new late bio-splitting capability. But, it
+is reasoned, there is little to be gained by having the block core split
+the encrypted bio that is composed of PAGE_SIZE segments. That said, in
+the future we may revert this change.
+
+Fixes: cf2f1abfb ("dm crypt: don't allocate pages for a partial request")
+Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=104421
+Suggested-by: Jeff Moyer <jmoyer@redhat.com>
+Signed-off-by: Mike Snitzer <snitzer@redhat.com>
+Cc: stable@vger.kernel.org # 4.0+
+
+diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
+index d60c88d..4b3b6f8 100644
+--- a/drivers/md/dm-crypt.c
++++ b/drivers/md/dm-crypt.c
+@@ -968,7 +968,8 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone);
+
+ /*
+ * Generate a new unfragmented bio with the given size
+- * This should never violate the device limitations
++ * This should never violate the device limitations (but only because
++ * max_segment_size is being constrained to PAGE_SIZE).
+ *
+ * This function may be called concurrently. If we allocate from the mempool
+ * concurrently, there is a possibility of deadlock. For example, if we have
+@@ -2045,9 +2046,20 @@ static int crypt_iterate_devices(struct dm_target *ti,
+ return fn(ti, cc->dev, cc->start, ti->len, data);
+ }
+
++static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
++{
++ /*
++ * Unfortunate constraint that is required to avoid the potential
++ * for exceeding underlying device's max_segments limits -- due to
++ * crypt_alloc_buffer() possibly allocating pages for the encryption
++ * bio that are not as physically contiguous as the original bio.
++ */
++ limits->max_segment_size = PAGE_SIZE;
++}
++
+ static struct target_type crypt_target = {
+ .name = "crypt",
+- .version = {1, 14, 0},
++ .version = {1, 14, 1},
+ .module = THIS_MODULE,
+ .ctr = crypt_ctr,
+ .dtr = crypt_dtr,
+@@ -2058,6 +2070,7 @@ static struct target_type crypt_target = {
+ .resume = crypt_resume,
+ .message = crypt_message,
+ .iterate_devices = crypt_iterate_devices,
++ .io_hints = crypt_io_hints,
+ };
+
+ static int __init dm_crypt_init(void)
+--
+cgit v0.10.2
+
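Note on the dm-crypt change above: the commit message argues that capping max_segment_size at PAGE_SIZE keeps the encryption bio within the underlying device's max_segments limit, even when crypt_alloc_buffer() hands back pages that are less physically contiguous than the original bio's. The sketch below is a simplified user-space model of that argument, not kernel code. The names count_segments and DEMO_PAGE_SIZE, the page frame numbers, and the merge rule (adjacent pages merge only when physically contiguous and within the size cap) are all assumptions made for illustration; the real block layer applies further constraints.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_PAGE_SIZE 4096u /* hypothetical page size used by this model */

/*
 * Count the segments a bio made of whole pages would occupy.
 * pfn[] holds the page frame number of each page. In this simplified
 * model, a page joins the current segment only if it is physically
 * contiguous with the previous page and the merged segment still fits
 * within max_segment_size; otherwise it starts a new segment.
 */
static unsigned int count_segments(const unsigned long *pfn, size_t npages,
                                   unsigned int max_segment_size)
{
	unsigned int segments = 0;
	unsigned int cur_len = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		int contiguous = i > 0 && pfn[i] == pfn[i - 1] + 1;

		if (contiguous && cur_len + DEMO_PAGE_SIZE <= max_segment_size) {
			cur_len += DEMO_PAGE_SIZE;
		} else {
			segments++;
			cur_len = DEMO_PAGE_SIZE;
		}
	}
	return segments;
}

#ifdef DEMO_MAIN
int main(void)
{
	/* Original bio: four physically contiguous pages (made-up PFNs). */
	const unsigned long orig[] = {100, 101, 102, 103};
	/* Encryption bio: four scattered pages (made-up PFNs). */
	const unsigned long clone[] = {7, 42, 9, 300};

	printf("64 KiB cap:  orig=%u clone=%u\n",
	       count_segments(orig, 4, 65536u),
	       count_segments(clone, 4, 65536u));
	printf("PAGE_SIZE cap: orig=%u clone=%u\n",
	       count_segments(orig, 4, DEMO_PAGE_SIZE),
	       count_segments(clone, 4, DEMO_PAGE_SIZE));
	return 0;
}
#endif
```

Built with -DDEMO_MAIN, the demo compares the two caps. With a large cap the contiguous original collapses into fewer segments than the scattered clone, so a bio that was accepted against max_segments can yield a clone that exceeds it. With the cap at one page, every page is its own segment in both bios, so in this model the clone can never need more segments than the original that was already admitted.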
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-29 17:51 Mike Pagano
From: Mike Pagano @ 2015-09-29 17:51 UTC
To: gentoo-commits
commit: 418b300cac3a4b2286197e6433c3e8a08c638305
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Tue Sep 29 17:51:49 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Tue Sep 29 17:51:49 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=418b300c
Linux patch 4.2.2
0000_README | 4 +
1001_linux-4.2.2.patch | 5014 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 5018 insertions(+)
diff --git a/0000_README b/0000_README
index 551dcf3..9428abc 100644
--- a/0000_README
+++ b/0000_README
@@ -47,6 +47,10 @@ Patch: 1000_linux-4.2.1.patch
From: http://www.kernel.org
Desc: Linux 4.2.1
+Patch: 1001_linux-4.2.2.patch
+From: http://www.kernel.org
+Desc: Linux 4.2.2
+
Patch: 1500_XATTR_USER_PREFIX.patch
From: https://bugs.gentoo.org/show_bug.cgi?id=470644
Desc: Support for namespace user.pax.* on tmpfs.
diff --git a/1001_linux-4.2.2.patch b/1001_linux-4.2.2.patch
new file mode 100644
index 0000000..6e64028
--- /dev/null
+++ b/1001_linux-4.2.2.patch
@@ -0,0 +1,5014 @@
+diff --git a/Makefile b/Makefile
+index a03efc18aa48..3578b4426ecf 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 1
++SUBLEVEL = 2
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+
+diff --git a/arch/arm/boot/compressed/decompress.c b/arch/arm/boot/compressed/decompress.c
+index bd245d34952d..a0765e7ed6c7 100644
+--- a/arch/arm/boot/compressed/decompress.c
++++ b/arch/arm/boot/compressed/decompress.c
+@@ -57,5 +57,5 @@ extern char * strstr(const char * s1, const char *s2);
+
+ int do_decompress(u8 *input, int len, u8 *output, void (*error)(char *x))
+ {
+- return decompress(input, len, NULL, NULL, output, NULL, error);
++ return __decompress(input, len, NULL, NULL, output, 0, NULL, error);
+ }
+diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
+index bc738d2b8392..f9c341c5ae78 100644
+--- a/arch/arm/kvm/arm.c
++++ b/arch/arm/kvm/arm.c
+@@ -449,7 +449,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
+ * Map the VGIC hardware resources before running a vcpu the first
+ * time on this VM.
+ */
+- if (unlikely(!vgic_ready(kvm))) {
++ if (unlikely(irqchip_in_kernel(kvm) && !vgic_ready(kvm))) {
+ ret = kvm_vgic_map_resources(kvm);
+ if (ret)
+ return ret;
+diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
+index 318175f62c24..735456feb08e 100644
+--- a/arch/arm64/Kconfig
++++ b/arch/arm64/Kconfig
+@@ -104,6 +104,10 @@ config NO_IOPORT_MAP
+ config STACKTRACE_SUPPORT
+ def_bool y
+
++config ILLEGAL_POINTER_VALUE
++ hex
++ default 0xdead000000000000
++
+ config LOCKDEP_SUPPORT
+ def_bool y
+
+@@ -417,6 +421,22 @@ config ARM64_ERRATUM_845719
+
+ If unsure, say Y.
+
++config ARM64_ERRATUM_843419
++ bool "Cortex-A53: 843419: A load or store might access an incorrect address"
++ depends on MODULES
++ default y
++ help
++ This option builds kernel modules using the large memory model in
++ order to avoid the use of the ADRP instruction, which can cause
++ a subsequent memory access to use an incorrect address on Cortex-A53
++ parts up to r0p4.
++
++ Note that the kernel itself must be linked with a version of ld
++ which fixes potentially affected ADRP instructions through the
++ use of veneers.
++
++ If unsure, say Y.
++
+ endmenu
+
+
+diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
+index 4d2a925998f9..81151663ef38 100644
+--- a/arch/arm64/Makefile
++++ b/arch/arm64/Makefile
+@@ -30,6 +30,10 @@ endif
+
+ CHECKFLAGS += -D__aarch64__
+
++ifeq ($(CONFIG_ARM64_ERRATUM_843419), y)
++CFLAGS_MODULE += -mcmodel=large
++endif
++
+ # Default value
+ head-y := arch/arm64/kernel/head.o
+
+diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
+index f800d45ea226..44a59c20e773 100644
+--- a/arch/arm64/include/asm/memory.h
++++ b/arch/arm64/include/asm/memory.h
+@@ -114,6 +114,14 @@ extern phys_addr_t memstart_addr;
+ #define PHYS_OFFSET ({ memstart_addr; })
+
+ /*
++ * The maximum physical address that the linear direct mapping
++ * of system RAM can cover. (PAGE_OFFSET can be interpreted as
++ * a 2's complement signed quantity and negated to derive the
++ * maximum size of the linear mapping.)
++ */
++#define MAX_MEMBLOCK_ADDR ({ memstart_addr - PAGE_OFFSET - 1; })
++
++/*
+ * PFNs are used to describe any physical page; this means
+ * PFN 0 == physical address 0.
+ *
+diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
+index e16351819fed..8213ca15abd2 100644
+--- a/arch/arm64/kernel/entry.S
++++ b/arch/arm64/kernel/entry.S
+@@ -116,7 +116,7 @@
+ */
+ .endm
+
+- .macro kernel_exit, el, ret = 0
++ .macro kernel_exit, el
+ ldp x21, x22, [sp, #S_PC] // load ELR, SPSR
+ .if \el == 0
+ ct_user_enter
+@@ -146,11 +146,7 @@
+ .endif
+ msr elr_el1, x21 // set up the return data
+ msr spsr_el1, x22
+- .if \ret
+- ldr x1, [sp, #S_X1] // preserve x0 (syscall return)
+- .else
+ ldp x0, x1, [sp, #16 * 0]
+- .endif
+ ldp x2, x3, [sp, #16 * 1]
+ ldp x4, x5, [sp, #16 * 2]
+ ldp x6, x7, [sp, #16 * 3]
+@@ -613,22 +609,21 @@ ENDPROC(cpu_switch_to)
+ */
+ ret_fast_syscall:
+ disable_irq // disable interrupts
++ str x0, [sp, #S_X0] // returned x0
+ ldr x1, [tsk, #TI_FLAGS] // re-check for syscall tracing
+ and x2, x1, #_TIF_SYSCALL_WORK
+ cbnz x2, ret_fast_syscall_trace
+ and x2, x1, #_TIF_WORK_MASK
+- cbnz x2, fast_work_pending
++ cbnz x2, work_pending
+ enable_step_tsk x1, x2
+- kernel_exit 0, ret = 1
++ kernel_exit 0
+ ret_fast_syscall_trace:
+ enable_irq // enable interrupts
+- b __sys_trace_return
++ b __sys_trace_return_skipped // we already saved x0
+
+ /*
+ * Ok, we need to do extra processing, enter the slow path.
+ */
+-fast_work_pending:
+- str x0, [sp, #S_X0] // returned x0
+ work_pending:
+ tbnz x1, #TIF_NEED_RESCHED, work_resched
+ /* TIF_SIGPENDING, TIF_NOTIFY_RESUME or TIF_FOREIGN_FPSTATE case */
+@@ -652,7 +647,7 @@ ret_to_user:
+ cbnz x2, work_pending
+ enable_step_tsk x1, x2
+ no_work_pending:
+- kernel_exit 0, ret = 0
++ kernel_exit 0
+ ENDPROC(ret_to_user)
+
+ /*
+diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
+index 44d6f7545505..c56956a16d3f 100644
+--- a/arch/arm64/kernel/fpsimd.c
++++ b/arch/arm64/kernel/fpsimd.c
+@@ -158,6 +158,7 @@ void fpsimd_thread_switch(struct task_struct *next)
+ void fpsimd_flush_thread(void)
+ {
+ memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
++ fpsimd_flush_task_state(current);
+ set_thread_flag(TIF_FOREIGN_FPSTATE);
+ }
+
+diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
+index c0ff3ce4299e..370541162658 100644
+--- a/arch/arm64/kernel/head.S
++++ b/arch/arm64/kernel/head.S
+@@ -528,6 +528,11 @@ CPU_LE( movk x0, #0x30d0, lsl #16 ) // Clear EE and E0E on LE systems
+ msr hstr_el2, xzr // Disable CP15 traps to EL2
+ #endif
+
++ /* EL2 debug */
++ mrs x0, pmcr_el0 // Disable debug access traps
++ ubfx x0, x0, #11, #5 // to EL2 and allow access to
++ msr mdcr_el2, x0 // all PMU counters from EL1
++
+ /* Stage-2 translation */
+ msr vttbr_el2, xzr
+
+diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
+index 67bf4107f6ef..876eb8df50bf 100644
+--- a/arch/arm64/kernel/module.c
++++ b/arch/arm64/kernel/module.c
+@@ -332,12 +332,14 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
+ ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 0, 21,
+ AARCH64_INSN_IMM_ADR);
+ break;
++#ifndef CONFIG_ARM64_ERRATUM_843419
+ case R_AARCH64_ADR_PREL_PG_HI21_NC:
+ overflow_check = false;
+ case R_AARCH64_ADR_PREL_PG_HI21:
+ ovf = reloc_insn_imm(RELOC_OP_PAGE, loc, val, 12, 21,
+ AARCH64_INSN_IMM_ADR);
+ break;
++#endif
+ case R_AARCH64_ADD_ABS_LO12_NC:
+ case R_AARCH64_LDST8_ABS_LO12_NC:
+ overflow_check = false;
+diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
+index 948f0ad2de23..71ef6dc89ae5 100644
+--- a/arch/arm64/kernel/signal32.c
++++ b/arch/arm64/kernel/signal32.c
+@@ -212,14 +212,32 @@ int copy_siginfo_from_user32(siginfo_t *to, compat_siginfo_t __user *from)
+
+ /*
+ * VFP save/restore code.
++ *
++ * We have to be careful with endianness, since the fpsimd context-switch
++ * code operates on 128-bit (Q) register values whereas the compat ABI
++ * uses an array of 64-bit (D) registers. Consequently, we need to swap
++ * the two halves of each Q register when running on a big-endian CPU.
+ */
++union __fpsimd_vreg {
++ __uint128_t raw;
++ struct {
++#ifdef __AARCH64EB__
++ u64 hi;
++ u64 lo;
++#else
++ u64 lo;
++ u64 hi;
++#endif
++ };
++};
++
+ static int compat_preserve_vfp_context(struct compat_vfp_sigframe __user *frame)
+ {
+ struct fpsimd_state *fpsimd = &current->thread.fpsimd_state;
+ compat_ulong_t magic = VFP_MAGIC;
+ compat_ulong_t size = VFP_STORAGE_SIZE;
+ compat_ulong_t fpscr, fpexc;
+- int err = 0;
++ int i, err = 0;
+
+ /*
+ * Save the hardware registers to the fpsimd_state structure.
+@@ -235,10 +253,15 @@ static int compat_preserve_vfp_context(struct compat_vfp_sigframe __user *frame)
+ /*
+ * Now copy the FP registers. Since the registers are packed,
+ * we can copy the prefix we want (V0-V15) as it is.
+- * FIXME: Won't work if big endian.
+ */
+- err |= __copy_to_user(&frame->ufp.fpregs, fpsimd->vregs,
+- sizeof(frame->ufp.fpregs));
++ for (i = 0; i < ARRAY_SIZE(frame->ufp.fpregs); i += 2) {
++ union __fpsimd_vreg vreg = {
++ .raw = fpsimd->vregs[i >> 1],
++ };
++
++ __put_user_error(vreg.lo, &frame->ufp.fpregs[i], err);
++ __put_user_error(vreg.hi, &frame->ufp.fpregs[i + 1], err);
++ }
+
+ /* Create an AArch32 fpscr from the fpsr and the fpcr. */
+ fpscr = (fpsimd->fpsr & VFP_FPSCR_STAT_MASK) |
+@@ -263,7 +286,7 @@ static int compat_restore_vfp_context(struct compat_vfp_sigframe __user *frame)
+ compat_ulong_t magic = VFP_MAGIC;
+ compat_ulong_t size = VFP_STORAGE_SIZE;
+ compat_ulong_t fpscr;
+- int err = 0;
++ int i, err = 0;
+
+ __get_user_error(magic, &frame->magic, err);
+ __get_user_error(size, &frame->size, err);
+@@ -273,12 +296,14 @@ static int compat_restore_vfp_context(struct compat_vfp_sigframe __user *frame)
+ if (magic != VFP_MAGIC || size != VFP_STORAGE_SIZE)
+ return -EINVAL;
+
+- /*
+- * Copy the FP registers into the start of the fpsimd_state.
+- * FIXME: Won't work if big endian.
+- */
+- err |= __copy_from_user(fpsimd.vregs, frame->ufp.fpregs,
+- sizeof(frame->ufp.fpregs));
++ /* Copy the FP registers into the start of the fpsimd_state. */
++ for (i = 0; i < ARRAY_SIZE(frame->ufp.fpregs); i += 2) {
++ union __fpsimd_vreg vreg;
++
++ __get_user_error(vreg.lo, &frame->ufp.fpregs[i], err);
++ __get_user_error(vreg.hi, &frame->ufp.fpregs[i + 1], err);
++ fpsimd.vregs[i >> 1] = vreg.raw;
++ }
+
+ /* Extract the fpsr and the fpcr from the fpscr */
+ __get_user_error(fpscr, &frame->ufp.fpscr, err);
+diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
+index 17a8fb14f428..3c6051cbf442 100644
+--- a/arch/arm64/kvm/hyp.S
++++ b/arch/arm64/kvm/hyp.S
+@@ -840,8 +840,6 @@
+ mrs x3, cntv_ctl_el0
+ and x3, x3, #3
+ str w3, [x0, #VCPU_TIMER_CNTV_CTL]
+- bic x3, x3, #1 // Clear Enable
+- msr cntv_ctl_el0, x3
+
+ isb
+
+@@ -849,6 +847,9 @@
+ str x3, [x0, #VCPU_TIMER_CNTV_CVAL]
+
+ 1:
++ // Disable the virtual timer
++ msr cntv_ctl_el0, xzr
++
+ // Allow physical timer/counter access for the host
+ mrs x2, cnthctl_el2
+ orr x2, x2, #3
+@@ -943,13 +944,15 @@ ENTRY(__kvm_vcpu_run)
+ // Guest context
+ add x2, x0, #VCPU_CONTEXT
+
++ // We must restore the 32-bit state before the sysregs, thanks
++ // to Cortex-A57 erratum #852523.
++ restore_guest_32bit_state
+ bl __restore_sysregs
+ bl __restore_fpsimd
+
+ skip_debug_state x3, 1f
+ bl __restore_debug
+ 1:
+- restore_guest_32bit_state
+ restore_guest_regs
+
+ // That's it, no more messing around.
+diff --git a/arch/h8300/boot/compressed/misc.c b/arch/h8300/boot/compressed/misc.c
+index 704274127c07..c4f2cfcb117b 100644
+--- a/arch/h8300/boot/compressed/misc.c
++++ b/arch/h8300/boot/compressed/misc.c
+@@ -70,5 +70,5 @@ void decompress_kernel(void)
+ free_mem_ptr = (unsigned long)&_end;
+ free_mem_end_ptr = free_mem_ptr + HEAP_SIZE;
+
+- decompress(input_data, input_len, NULL, NULL, output, NULL, error);
++ __decompress(input_data, input_len, NULL, NULL, output, 0, NULL, error);
+ }
+diff --git a/arch/m32r/boot/compressed/misc.c b/arch/m32r/boot/compressed/misc.c
+index 28a09529f206..3a7692745868 100644
+--- a/arch/m32r/boot/compressed/misc.c
++++ b/arch/m32r/boot/compressed/misc.c
+@@ -86,6 +86,7 @@ decompress_kernel(int mmu_on, unsigned char *zimage_data,
+ free_mem_end_ptr = free_mem_ptr + BOOT_HEAP_SIZE;
+
+ puts("\nDecompressing Linux... ");
+- decompress(input_data, input_len, NULL, NULL, output_data, NULL, error);
++ __decompress(input_data, input_len, NULL, NULL, output_data, 0,
++ NULL, error);
+ puts("done.\nBooting the kernel.\n");
+ }
+diff --git a/arch/mips/boot/compressed/decompress.c b/arch/mips/boot/compressed/decompress.c
+index 54831069a206..080cd53bac36 100644
+--- a/arch/mips/boot/compressed/decompress.c
++++ b/arch/mips/boot/compressed/decompress.c
+@@ -111,8 +111,8 @@ void decompress_kernel(unsigned long boot_heap_start)
+ puts("\n");
+
+ /* Decompress the kernel with according algorithm */
+- decompress((char *)zimage_start, zimage_size, 0, 0,
+- (void *)VMLINUX_LOAD_ADDRESS_ULL, 0, error);
++ __decompress((char *)zimage_start, zimage_size, 0, 0,
++ (void *)VMLINUX_LOAD_ADDRESS_ULL, 0, 0, error);
+
+ /* FIXME: should we flush cache here? */
+ puts("Now, booting the kernel...\n");
+diff --git a/arch/mips/kernel/cps-vec.S b/arch/mips/kernel/cps-vec.S
+index 1b6ca634e646..9f71c06aebf6 100644
+--- a/arch/mips/kernel/cps-vec.S
++++ b/arch/mips/kernel/cps-vec.S
+@@ -152,7 +152,7 @@ dcache_done:
+
+ /* Enter the coherent domain */
+ li t0, 0xff
+- PTR_S t0, GCR_CL_COHERENCE_OFS(v1)
++ sw t0, GCR_CL_COHERENCE_OFS(v1)
+ ehb
+
+ /* Jump to kseg0 */
+@@ -302,7 +302,7 @@ LEAF(mips_cps_boot_vpes)
+ PTR_L t0, 0(t0)
+
+ /* Calculate a pointer to this cores struct core_boot_config */
+- PTR_L t0, GCR_CL_ID_OFS(t0)
++ lw t0, GCR_CL_ID_OFS(t0)
+ li t1, COREBOOTCFG_SIZE
+ mul t0, t0, t1
+ PTR_LA t1, mips_cps_core_bootcfg
+diff --git a/arch/mips/math-emu/cp1emu.c b/arch/mips/math-emu/cp1emu.c
+index 712f17a2ecf2..f0f1b98a5fde 100644
+--- a/arch/mips/math-emu/cp1emu.c
++++ b/arch/mips/math-emu/cp1emu.c
+@@ -1137,7 +1137,7 @@ emul:
+ break;
+
+ case mfhc_op:
+- if (!cpu_has_mips_r2)
++ if (!cpu_has_mips_r2_r6)
+ goto sigill;
+
+ /* copregister rd -> gpr[rt] */
+@@ -1148,7 +1148,7 @@ emul:
+ break;
+
+ case mthc_op:
+- if (!cpu_has_mips_r2)
++ if (!cpu_has_mips_r2_r6)
+ goto sigill;
+
+ /* copregister rd <- gpr[rt] */
+@@ -1181,6 +1181,24 @@ emul:
+ }
+ break;
+
++ case bc1eqz_op:
++ case bc1nez_op:
++ if (!cpu_has_mips_r6 || delay_slot(xcp))
++ return SIGILL;
++
++ cond = likely = 0;
++ switch (MIPSInst_RS(ir)) {
++ case bc1eqz_op:
++ if (get_fpr32(&current->thread.fpu.fpr[MIPSInst_RT(ir)], 0) & 0x1)
++ cond = 1;
++ break;
++ case bc1nez_op:
++ if (!(get_fpr32(&current->thread.fpu.fpr[MIPSInst_RT(ir)], 0) & 0x1))
++ cond = 1;
++ break;
++ }
++ goto branch_common;
++
+ case bc_op:
+ if (delay_slot(xcp))
+ return SIGILL;
+@@ -1207,7 +1225,7 @@ emul:
+ case bct_op:
+ break;
+ }
+-
++branch_common:
+ set_delay_slot(xcp);
+ if (cond) {
+ /*
+diff --git a/arch/parisc/kernel/irq.c b/arch/parisc/kernel/irq.c
+index f3191db6e2e9..c0eab24f6a9e 100644
+--- a/arch/parisc/kernel/irq.c
++++ b/arch/parisc/kernel/irq.c
+@@ -507,8 +507,8 @@ void do_cpu_irq_mask(struct pt_regs *regs)
+ struct pt_regs *old_regs;
+ unsigned long eirr_val;
+ int irq, cpu = smp_processor_id();
+-#ifdef CONFIG_SMP
+ struct irq_desc *desc;
++#ifdef CONFIG_SMP
+ cpumask_t dest;
+ #endif
+
+@@ -521,8 +521,12 @@ void do_cpu_irq_mask(struct pt_regs *regs)
+ goto set_out;
+ irq = eirr_to_irq(eirr_val);
+
+-#ifdef CONFIG_SMP
++ /* Filter out spurious interrupts, mostly from serial port at bootup */
+ desc = irq_to_desc(irq);
++ if (unlikely(!desc->action))
++ goto set_out;
++
++#ifdef CONFIG_SMP
+ cpumask_copy(&dest, desc->irq_data.affinity);
+ if (irqd_is_per_cpu(&desc->irq_data) &&
+ !cpumask_test_cpu(smp_processor_id(), &dest)) {
+diff --git a/arch/parisc/kernel/syscall.S b/arch/parisc/kernel/syscall.S
+index 7ef22e3387e0..0b8d26d3ba43 100644
+--- a/arch/parisc/kernel/syscall.S
++++ b/arch/parisc/kernel/syscall.S
+@@ -821,7 +821,7 @@ cas2_action:
+ /* 64bit CAS */
+ #ifdef CONFIG_64BIT
+ 19: ldd,ma 0(%sr3,%r26), %r29
+- sub,= %r29, %r25, %r0
++ sub,*= %r29, %r25, %r0
+ b,n cas2_end
+ 20: std,ma %r24, 0(%sr3,%r26)
+ copy %r0, %r28
+diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
+index 73eddda53b8e..4eec430d8fa8 100644
+--- a/arch/powerpc/boot/Makefile
++++ b/arch/powerpc/boot/Makefile
+@@ -28,6 +28,9 @@ BOOTCFLAGS += -m64
+ endif
+ ifdef CONFIG_CPU_BIG_ENDIAN
+ BOOTCFLAGS += -mbig-endian
++else
++BOOTCFLAGS += -mlittle-endian
++BOOTCFLAGS += $(call cc-option,-mabi=elfv2)
+ endif
+
+ BOOTAFLAGS := -D__ASSEMBLY__ $(BOOTCFLAGS) -traditional -nostdinc
+diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
+index 3bb7488bd24b..7ee2300ee392 100644
+--- a/arch/powerpc/include/asm/pgtable-ppc64.h
++++ b/arch/powerpc/include/asm/pgtable-ppc64.h
+@@ -135,7 +135,19 @@
+ #define pte_iterate_hashed_end() } while(0)
+
+ #ifdef CONFIG_PPC_HAS_HASH_64K
+-#define pte_pagesize_index(mm, addr, pte) get_slice_psize(mm, addr)
++/*
++ * We expect this to be called only for user addresses or kernel virtual
++ * addresses other than the linear mapping.
++ */
++#define pte_pagesize_index(mm, addr, pte) \
++ ({ \
++ unsigned int psize; \
++ if (is_kernel_addr(addr)) \
++ psize = MMU_PAGE_4K; \
++ else \
++ psize = get_slice_psize(mm, addr); \
++ psize; \
++ })
+ #else
+ #define pte_pagesize_index(mm, addr, pte) MMU_PAGE_4K
+ #endif
+diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
+index 7a4ede16b283..b77ef369c0f0 100644
+--- a/arch/powerpc/include/asm/rtas.h
++++ b/arch/powerpc/include/asm/rtas.h
+@@ -343,6 +343,7 @@ extern void rtas_power_off(void);
+ extern void rtas_halt(void);
+ extern void rtas_os_term(char *str);
+ extern int rtas_get_sensor(int sensor, int index, int *state);
++extern int rtas_get_sensor_fast(int sensor, int index, int *state);
+ extern int rtas_get_power_level(int powerdomain, int *level);
+ extern int rtas_set_power_level(int powerdomain, int level, int *setlevel);
+ extern bool rtas_indicator_present(int token, int *maxindex);
+diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
+index 58abeda64cb7..15cca17cba4b 100644
+--- a/arch/powerpc/include/asm/switch_to.h
++++ b/arch/powerpc/include/asm/switch_to.h
+@@ -29,6 +29,7 @@ static inline void save_early_sprs(struct thread_struct *prev) {}
+
+ extern void enable_kernel_fp(void);
+ extern void enable_kernel_altivec(void);
++extern void enable_kernel_vsx(void);
+ extern int emulate_altivec(struct pt_regs *);
+ extern void __giveup_vsx(struct task_struct *);
+ extern void giveup_vsx(struct task_struct *);
+diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
+index af9b597b10af..01c961d5d2de 100644
+--- a/arch/powerpc/kernel/eeh.c
++++ b/arch/powerpc/kernel/eeh.c
+@@ -308,11 +308,26 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
+ if (!(pe->type & EEH_PE_PHB)) {
+ if (eeh_has_flag(EEH_ENABLE_IO_FOR_LOG))
+ eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
++
++ /*
++ * The config space of some PCI devices can't be accessed
++ * when their PEs are in frozen state. Otherwise, fenced
++ * PHB might be seen. Those PEs are identified with flag
++ * EEH_PE_CFG_RESTRICTED, indicating EEH_PE_CFG_BLOCKED
++ * is set automatically when the PE is put to EEH_PE_ISOLATED.
++ *
++ * Restoring BARs possibly triggers PCI config access in
++ * (OPAL) firmware and then causes fenced PHB. If the
++ * PCI config is blocked with flag EEH_PE_CFG_BLOCKED, it's
++ * pointless to restore BARs and dump config space.
++ */
+ eeh_ops->configure_bridge(pe);
+- eeh_pe_restore_bars(pe);
++ if (!(pe->state & EEH_PE_CFG_BLOCKED)) {
++ eeh_pe_restore_bars(pe);
+
+- pci_regs_buf[0] = 0;
+- eeh_pe_traverse(pe, eeh_dump_pe_log, &loglen);
++ pci_regs_buf[0] = 0;
++ eeh_pe_traverse(pe, eeh_dump_pe_log, &loglen);
++ }
+ }
+
+ eeh_ops->get_log(pe, severity, pci_regs_buf, loglen);
+@@ -1116,9 +1131,6 @@ void eeh_add_device_late(struct pci_dev *dev)
+ return;
+ }
+
+- if (eeh_has_flag(EEH_PROBE_MODE_DEV))
+- eeh_ops->probe(pdn, NULL);
+-
+ /*
+ * The EEH cache might not be removed correctly because of
+ * unbalanced kref to the device during unplug time, which
+@@ -1142,6 +1154,9 @@ void eeh_add_device_late(struct pci_dev *dev)
+ dev->dev.archdata.edev = NULL;
+ }
+
++ if (eeh_has_flag(EEH_PROBE_MODE_DEV))
++ eeh_ops->probe(pdn, NULL);
++
+ edev->pdev = dev;
+ dev->dev.archdata.edev = edev;
+
+diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
+index 8005e18d1b40..64e6e9d9e656 100644
+--- a/arch/powerpc/kernel/process.c
++++ b/arch/powerpc/kernel/process.c
+@@ -204,8 +204,6 @@ EXPORT_SYMBOL_GPL(flush_altivec_to_thread);
+ #endif /* CONFIG_ALTIVEC */
+
+ #ifdef CONFIG_VSX
+-#if 0
+-/* not currently used, but some crazy RAID module might want to later */
+ void enable_kernel_vsx(void)
+ {
+ WARN_ON(preemptible());
+@@ -220,7 +218,6 @@ void enable_kernel_vsx(void)
+ #endif /* CONFIG_SMP */
+ }
+ EXPORT_SYMBOL(enable_kernel_vsx);
+-#endif
+
+ void giveup_vsx(struct task_struct *tsk)
+ {
+diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
+index 7a488c108410..caffb10e7aa3 100644
+--- a/arch/powerpc/kernel/rtas.c
++++ b/arch/powerpc/kernel/rtas.c
+@@ -584,6 +584,23 @@ int rtas_get_sensor(int sensor, int index, int *state)
+ }
+ EXPORT_SYMBOL(rtas_get_sensor);
+
++int rtas_get_sensor_fast(int sensor, int index, int *state)
++{
++ int token = rtas_token("get-sensor-state");
++ int rc;
++
++ if (token == RTAS_UNKNOWN_SERVICE)
++ return -ENOENT;
++
++ rc = rtas_call(token, 2, 2, state, sensor, index);
++ WARN_ON(rc == RTAS_BUSY || (rc >= RTAS_EXTENDED_DELAY_MIN &&
++ rc <= RTAS_EXTENDED_DELAY_MAX));
++
++ if (rc < 0)
++ return rtas_error_rc(rc);
++ return rc;
++}
++
+ bool rtas_indicator_present(int token, int *maxindex)
+ {
+ int proplen, count, i;
+diff --git a/arch/powerpc/mm/hugepage-hash64.c b/arch/powerpc/mm/hugepage-hash64.c
+index 43dafb9d6a46..4d87122cf6a7 100644
+--- a/arch/powerpc/mm/hugepage-hash64.c
++++ b/arch/powerpc/mm/hugepage-hash64.c
+@@ -85,7 +85,6 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
+ BUG_ON(index >= 4096);
+
+ vpn = hpt_vpn(ea, vsid, ssize);
+- hash = hpt_hash(vpn, shift, ssize);
+ hpte_slot_array = get_hpte_slot_array(pmdp);
+ if (psize == MMU_PAGE_4K) {
+ /*
+@@ -101,6 +100,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
+ valid = hpte_valid(hpte_slot_array, index);
+ if (valid) {
+ /* update the hpte bits */
++ hash = hpt_hash(vpn, shift, ssize);
+ hidx = hpte_hash_index(hpte_slot_array, index);
+ if (hidx & _PTEIDX_SECONDARY)
+ hash = ~hash;
+@@ -126,6 +126,7 @@ int __hash_page_thp(unsigned long ea, unsigned long access, unsigned long vsid,
+ if (!valid) {
+ unsigned long hpte_group;
+
++ hash = hpt_hash(vpn, shift, ssize);
+ /* insert new entry */
+ pa = pmd_pfn(__pmd(old_pmd)) << PAGE_SHIFT;
+ new_pmd |= _PAGE_HASHPTE;
+diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
+index 85cbc96eff6c..8b64f89e68c9 100644
+--- a/arch/powerpc/platforms/powernv/pci-ioda.c
++++ b/arch/powerpc/platforms/powernv/pci-ioda.c
+@@ -2078,9 +2078,23 @@ static long pnv_pci_ioda2_setup_default_config(struct pnv_ioda_pe *pe)
+ struct iommu_table *tbl = NULL;
+ long rc;
+
++ /*
++ * crashkernel= specifies the kdump kernel's maximum memory at
++ * some offset and there is no guaranteed the result is a power
++ * of 2, which will cause errors later.
++ */
++ const u64 max_memory = __rounddown_pow_of_two(memory_hotplug_max());
++
++ /*
++ * In memory constrained environments, e.g. kdump kernel, the
++ * DMA window can be larger than available memory, which will
++ * cause errors later.
++ */
++ const u64 window_size = min((u64)pe->table_group.tce32_size, max_memory);
++
+ rc = pnv_pci_ioda2_create_table(&pe->table_group, 0,
+ IOMMU_PAGE_SHIFT_4K,
+- pe->table_group.tce32_size,
++ window_size,
+ POWERNV_IOMMU_DEFAULT_LEVELS, &tbl);
+ if (rc) {
+ pe_err(pe, "Failed to create 32-bit TCE table, err %ld",
+diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
+index 47d9cebe7159..db17827eb746 100644
+--- a/arch/powerpc/platforms/pseries/dlpar.c
++++ b/arch/powerpc/platforms/pseries/dlpar.c
+@@ -422,8 +422,10 @@ static ssize_t dlpar_cpu_probe(const char *buf, size_t count)
+
+ dn = dlpar_configure_connector(cpu_to_be32(drc_index), parent);
+ of_node_put(parent);
+- if (!dn)
++ if (!dn) {
++ dlpar_release_drc(drc_index);
+ return -EINVAL;
++ }
+
+ rc = dlpar_attach_node(dn);
+ if (rc) {
+diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
+index 02e4a1745516..3b6647e574b6 100644
+--- a/arch/powerpc/platforms/pseries/ras.c
++++ b/arch/powerpc/platforms/pseries/ras.c
+@@ -189,7 +189,8 @@ static irqreturn_t ras_epow_interrupt(int irq, void *dev_id)
+ int state;
+ int critical;
+
+- status = rtas_get_sensor(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX, &state);
++ status = rtas_get_sensor_fast(EPOW_SENSOR_TOKEN, EPOW_SENSOR_INDEX,
++ &state);
+
+ if (state > 3)
+ critical = 1; /* Time Critical */
+diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
+index df6a7041922b..e6e8b241d717 100644
+--- a/arch/powerpc/platforms/pseries/setup.c
++++ b/arch/powerpc/platforms/pseries/setup.c
+@@ -268,6 +268,11 @@ static int pci_dn_reconfig_notifier(struct notifier_block *nb, unsigned long act
+ eeh_dev_init(PCI_DN(np), pci->phb);
+ }
+ break;
++ case OF_RECONFIG_DETACH_NODE:
++ pci = PCI_DN(np);
++ if (pci)
++ list_del(&pci->list);
++ break;
+ default:
+ err = NOTIFY_DONE;
+ break;
+diff --git a/arch/s390/boot/compressed/misc.c b/arch/s390/boot/compressed/misc.c
+index 42506b371b74..4da604ebf6fd 100644
+--- a/arch/s390/boot/compressed/misc.c
++++ b/arch/s390/boot/compressed/misc.c
+@@ -167,7 +167,7 @@ unsigned long decompress_kernel(void)
+ #endif
+
+ puts("Uncompressing Linux... ");
+- decompress(input_data, input_len, NULL, NULL, output, NULL, error);
++ __decompress(input_data, input_len, NULL, NULL, output, 0, NULL, error);
+ puts("Ok, booting the kernel.\n");
+ return (unsigned long) output;
+ }
+diff --git a/arch/sh/boot/compressed/misc.c b/arch/sh/boot/compressed/misc.c
+index 95470a472d2c..208a9753ab38 100644
+--- a/arch/sh/boot/compressed/misc.c
++++ b/arch/sh/boot/compressed/misc.c
+@@ -132,7 +132,7 @@ void decompress_kernel(void)
+
+ puts("Uncompressing Linux... ");
+ cache_control(CACHE_ENABLE);
+- decompress(input_data, input_len, NULL, NULL, output, NULL, error);
++ __decompress(input_data, input_len, NULL, NULL, output, 0, NULL, error);
+ cache_control(CACHE_DISABLE);
+ puts("Ok, booting the kernel.\n");
+ }
+diff --git a/arch/unicore32/boot/compressed/misc.c b/arch/unicore32/boot/compressed/misc.c
+index 176d5bda3559..5c65dfee278c 100644
+--- a/arch/unicore32/boot/compressed/misc.c
++++ b/arch/unicore32/boot/compressed/misc.c
+@@ -119,8 +119,8 @@ unsigned long decompress_kernel(unsigned long output_start,
+ output_ptr = get_unaligned_le32(tmp);
+
+ arch_decomp_puts("Uncompressing Linux...");
+- decompress(input_data, input_data_end - input_data, NULL, NULL,
+- output_data, NULL, error);
++ __decompress(input_data, input_data_end - input_data, NULL, NULL,
++ output_data, 0, NULL, error);
+ arch_decomp_puts(" done, booting the kernel.\n");
+ return output_ptr;
+ }
+diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
+index a107b935e22f..e28437e0f708 100644
+--- a/arch/x86/boot/compressed/misc.c
++++ b/arch/x86/boot/compressed/misc.c
+@@ -424,7 +424,8 @@ asmlinkage __visible void *decompress_kernel(void *rmode, memptr heap,
+ #endif
+
+ debug_putstr("\nDecompressing Linux... ");
+- decompress(input_data, input_len, NULL, NULL, output, NULL, error);
++ __decompress(input_data, input_len, NULL, NULL, output, output_len,
++ NULL, error);
+ parse_elf(output);
+ /*
+ * 32-bit always performs relocations. 64-bit relocations are only
+diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
+index 8340e45c891a..68aec42545c2 100644
+--- a/arch/x86/mm/init_32.c
++++ b/arch/x86/mm/init_32.c
+@@ -137,6 +137,7 @@ page_table_range_init_count(unsigned long start, unsigned long end)
+
+ vaddr = start;
+ pgd_idx = pgd_index(vaddr);
++ pmd_idx = pmd_index(vaddr);
+
+ for ( ; (pgd_idx < PTRS_PER_PGD) && (vaddr != end); pgd_idx++) {
+ for (; (pmd_idx < PTRS_PER_PMD) && (vaddr != end);
+diff --git a/block/blk-flush.c b/block/blk-flush.c
+index 20badd7b9d1b..9c423e53324a 100644
+--- a/block/blk-flush.c
++++ b/block/blk-flush.c
+@@ -73,6 +73,7 @@
+
+ #include "blk.h"
+ #include "blk-mq.h"
++#include "blk-mq-tag.h"
+
+ /* FLUSH/FUA sequences */
+ enum {
+@@ -226,7 +227,12 @@ static void flush_end_io(struct request *flush_rq, int error)
+ struct blk_flush_queue *fq = blk_get_flush_queue(q, flush_rq->mq_ctx);
+
+ if (q->mq_ops) {
++ struct blk_mq_hw_ctx *hctx;
++
++ /* release the tag's ownership to the req cloned from */
+ spin_lock_irqsave(&fq->mq_flush_lock, flags);
++ hctx = q->mq_ops->map_queue(q, flush_rq->mq_ctx->cpu);
++ blk_mq_tag_set_rq(hctx, flush_rq->tag, fq->orig_rq);
+ flush_rq->tag = -1;
+ }
+
+@@ -308,11 +314,18 @@ static bool blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq)
+
+ /*
+ * Borrow tag from the first request since they can't
+- * be in flight at the same time.
++ * be in flight at the same time. And acquire the tag's
++ * ownership for flush req.
+ */
+ if (q->mq_ops) {
++ struct blk_mq_hw_ctx *hctx;
++
+ flush_rq->mq_ctx = first_rq->mq_ctx;
+ flush_rq->tag = first_rq->tag;
++ fq->orig_rq = first_rq;
++
++ hctx = q->mq_ops->map_queue(q, first_rq->mq_ctx->cpu);
++ blk_mq_tag_set_rq(hctx, first_rq->tag, flush_rq);
+ }
+
+ flush_rq->cmd_type = REQ_TYPE_FS;
+diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
+index b79685e06b70..279c5d674edf 100644
+--- a/block/blk-mq-sysfs.c
++++ b/block/blk-mq-sysfs.c
+@@ -141,15 +141,26 @@ static ssize_t blk_mq_sysfs_completed_show(struct blk_mq_ctx *ctx, char *page)
+
+ static ssize_t sysfs_list_show(char *page, struct list_head *list, char *msg)
+ {
+- char *start_page = page;
+ struct request *rq;
++ int len = snprintf(page, PAGE_SIZE - 1, "%s:\n", msg);
++
++ list_for_each_entry(rq, list, queuelist) {
++ const int rq_len = 2 * sizeof(rq) + 2;
++
++ /* if the output will be truncated */
++ if (PAGE_SIZE - 1 < len + rq_len) {
++ /* backspacing if it can't hold '\t...\n' */
++ if (PAGE_SIZE - 1 < len + 5)
++ len -= rq_len;
++ len += snprintf(page + len, PAGE_SIZE - 1 - len,
++ "\t...\n");
++ break;
++ }
++ len += snprintf(page + len, PAGE_SIZE - 1 - len,
++ "\t%p\n", rq);
++ }
+
+- page += sprintf(page, "%s:\n", msg);
+-
+- list_for_each_entry(rq, list, queuelist)
+- page += sprintf(page, "\t%p\n", rq);
+-
+- return page - start_page;
++ return len;
+ }
+
+ static ssize_t blk_mq_sysfs_rq_list_show(struct blk_mq_ctx *ctx, char *page)
+diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
+index 9b6e28830b82..9115c6d59948 100644
+--- a/block/blk-mq-tag.c
++++ b/block/blk-mq-tag.c
+@@ -429,7 +429,7 @@ static void bt_for_each(struct blk_mq_hw_ctx *hctx,
+ for (bit = find_first_bit(&bm->word, bm->depth);
+ bit < bm->depth;
+ bit = find_next_bit(&bm->word, bm->depth, bit + 1)) {
+- rq = blk_mq_tag_to_rq(hctx->tags, off + bit);
++ rq = hctx->tags->rqs[off + bit];
+ if (rq->q == hctx->queue)
+ fn(hctx, rq, data, reserved);
+ }
+@@ -453,7 +453,7 @@ static void bt_tags_for_each(struct blk_mq_tags *tags,
+ for (bit = find_first_bit(&bm->word, bm->depth);
+ bit < bm->depth;
+ bit = find_next_bit(&bm->word, bm->depth, bit + 1)) {
+- rq = blk_mq_tag_to_rq(tags, off + bit);
++ rq = tags->rqs[off + bit];
+ fn(rq, data, reserved);
+ }
+
+diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
+index 75893a34237d..9eb2cf4f01cb 100644
+--- a/block/blk-mq-tag.h
++++ b/block/blk-mq-tag.h
+@@ -89,4 +89,16 @@ static inline void blk_mq_tag_idle(struct blk_mq_hw_ctx *hctx)
+ __blk_mq_tag_idle(hctx);
+ }
+
++/*
++ * This helper should only be used for flush request to share tag
++ * with the request cloned from, and both the two requests can't be
++ * in flight at the same time. The caller has to make sure the tag
++ * can't be freed.
++ */
++static inline void blk_mq_tag_set_rq(struct blk_mq_hw_ctx *hctx,
++ unsigned int tag, struct request *rq)
++{
++ hctx->tags->rqs[tag] = rq;
++}
++
+ #endif
+diff --git a/block/blk-mq.c b/block/blk-mq.c
+index 7d842db59699..176262ec3731 100644
+--- a/block/blk-mq.c
++++ b/block/blk-mq.c
+@@ -559,23 +559,9 @@ void blk_mq_abort_requeue_list(struct request_queue *q)
+ }
+ EXPORT_SYMBOL(blk_mq_abort_requeue_list);
+
+-static inline bool is_flush_request(struct request *rq,
+- struct blk_flush_queue *fq, unsigned int tag)
+-{
+- return ((rq->cmd_flags & REQ_FLUSH_SEQ) &&
+- fq->flush_rq->tag == tag);
+-}
+-
+ struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
+ {
+- struct request *rq = tags->rqs[tag];
+- /* mq_ctx of flush rq is always cloned from the corresponding req */
+- struct blk_flush_queue *fq = blk_get_flush_queue(rq->q, rq->mq_ctx);
+-
+- if (!is_flush_request(rq, fq, tag))
+- return rq;
+-
+- return fq->flush_rq;
++ return tags->rqs[tag];
+ }
+ EXPORT_SYMBOL(blk_mq_tag_to_rq);
+
+diff --git a/block/blk.h b/block/blk.h
+index 026d9594142b..838188b35a83 100644
+--- a/block/blk.h
++++ b/block/blk.h
+@@ -22,6 +22,12 @@ struct blk_flush_queue {
+ struct list_head flush_queue[2];
+ struct list_head flush_data_in_flight;
+ struct request *flush_rq;
++
++ /*
++ * flush_rq shares tag with this rq, both can't be active
++ * at the same time
++ */
++ struct request *orig_rq;
+ spinlock_t mq_flush_lock;
+ };
+
+diff --git a/drivers/base/node.c b/drivers/base/node.c
+index 31df474d72f4..560751bad294 100644
+--- a/drivers/base/node.c
++++ b/drivers/base/node.c
+@@ -392,6 +392,16 @@ int register_mem_sect_under_node(struct memory_block *mem_blk, int nid)
+ for (pfn = sect_start_pfn; pfn <= sect_end_pfn; pfn++) {
+ int page_nid;
+
++ /*
++ * memory block could have several absent sections from start.
++ * skip pfn range from absent section
++ */
++ if (!pfn_present(pfn)) {
++ pfn = round_down(pfn + PAGES_PER_SECTION,
++ PAGES_PER_SECTION) - 1;
++ continue;
++ }
++
+ page_nid = get_nid_for_pfn(pfn);
+ if (page_nid < 0)
+ continue;
+diff --git a/drivers/crypto/vmx/aes.c b/drivers/crypto/vmx/aes.c
+index e79e567e43aa..263af709e536 100644
+--- a/drivers/crypto/vmx/aes.c
++++ b/drivers/crypto/vmx/aes.c
+@@ -84,6 +84,7 @@ static int p8_aes_setkey(struct crypto_tfm *tfm, const u8 *key,
+ preempt_disable();
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+ ret = aes_p8_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
+ ret += aes_p8_set_decrypt_key(key, keylen * 8, &ctx->dec_key);
+ pagefault_enable();
+@@ -103,6 +104,7 @@ static void p8_aes_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+ preempt_disable();
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+ aes_p8_encrypt(src, dst, &ctx->enc_key);
+ pagefault_enable();
+ preempt_enable();
+@@ -119,6 +121,7 @@ static void p8_aes_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+ preempt_disable();
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+ aes_p8_decrypt(src, dst, &ctx->dec_key);
+ pagefault_enable();
+ preempt_enable();
+diff --git a/drivers/crypto/vmx/aes_cbc.c b/drivers/crypto/vmx/aes_cbc.c
+index 7299995c78ec..0b8fe2ec5315 100644
+--- a/drivers/crypto/vmx/aes_cbc.c
++++ b/drivers/crypto/vmx/aes_cbc.c
+@@ -85,6 +85,7 @@ static int p8_aes_cbc_setkey(struct crypto_tfm *tfm, const u8 *key,
+ preempt_disable();
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+ ret = aes_p8_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
+ ret += aes_p8_set_decrypt_key(key, keylen * 8, &ctx->dec_key);
+ pagefault_enable();
+@@ -115,6 +116,7 @@ static int p8_aes_cbc_encrypt(struct blkcipher_desc *desc,
+ preempt_disable();
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+
+ blkcipher_walk_init(&walk, dst, src, nbytes);
+ ret = blkcipher_walk_virt(desc, &walk);
+@@ -155,6 +157,7 @@ static int p8_aes_cbc_decrypt(struct blkcipher_desc *desc,
+ preempt_disable();
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+
+ blkcipher_walk_init(&walk, dst, src, nbytes);
+ ret = blkcipher_walk_virt(desc, &walk);
+diff --git a/drivers/crypto/vmx/aes_ctr.c b/drivers/crypto/vmx/aes_ctr.c
+index ed3838781b4c..ee1306cd8f59 100644
+--- a/drivers/crypto/vmx/aes_ctr.c
++++ b/drivers/crypto/vmx/aes_ctr.c
+@@ -82,6 +82,7 @@ static int p8_aes_ctr_setkey(struct crypto_tfm *tfm, const u8 *key,
+
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+ ret = aes_p8_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
+ pagefault_enable();
+
+@@ -100,6 +101,7 @@ static void p8_aes_ctr_final(struct p8_aes_ctr_ctx *ctx,
+
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+ aes_p8_encrypt(ctrblk, keystream, &ctx->enc_key);
+ pagefault_enable();
+
+@@ -132,6 +134,7 @@ static int p8_aes_ctr_crypt(struct blkcipher_desc *desc,
+ while ((nbytes = walk.nbytes) >= AES_BLOCK_SIZE) {
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+ aes_p8_ctr32_encrypt_blocks(walk.src.virt.addr,
+ walk.dst.virt.addr,
+ (nbytes &
+diff --git a/drivers/crypto/vmx/ghash.c b/drivers/crypto/vmx/ghash.c
+index b5e29002b666..2183a2e77641 100644
+--- a/drivers/crypto/vmx/ghash.c
++++ b/drivers/crypto/vmx/ghash.c
+@@ -119,6 +119,7 @@ static int p8_ghash_setkey(struct crypto_shash *tfm, const u8 *key,
+ preempt_disable();
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+ enable_kernel_fp();
+ gcm_init_p8(ctx->htable, (const u64 *) key);
+ pagefault_enable();
+@@ -149,6 +150,7 @@ static int p8_ghash_update(struct shash_desc *desc,
+ preempt_disable();
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+ enable_kernel_fp();
+ gcm_ghash_p8(dctx->shash, ctx->htable,
+ dctx->buffer, GHASH_DIGEST_SIZE);
+@@ -163,6 +165,7 @@ static int p8_ghash_update(struct shash_desc *desc,
+ preempt_disable();
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+ enable_kernel_fp();
+ gcm_ghash_p8(dctx->shash, ctx->htable, src, len);
+ pagefault_enable();
+@@ -193,6 +196,7 @@ static int p8_ghash_final(struct shash_desc *desc, u8 *out)
+ preempt_disable();
+ pagefault_disable();
+ enable_kernel_altivec();
++ enable_kernel_vsx();
+ enable_kernel_fp();
+ gcm_ghash_p8(dctx->shash, ctx->htable,
+ dctx->buffer, GHASH_DIGEST_SIZE);
+diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c
+index cacb07b7a8f1..32e7b4a686ef 100644
+--- a/drivers/gpu/drm/i915/intel_ddi.c
++++ b/drivers/gpu/drm/i915/intel_ddi.c
+@@ -1293,17 +1293,14 @@ skl_ddi_pll_select(struct intel_crtc *intel_crtc,
+ DPLL_CFGCR2_PDIV(wrpll_params.pdiv) |
+ wrpll_params.central_freq;
+ } else if (intel_encoder->type == INTEL_OUTPUT_DISPLAYPORT) {
+- struct drm_encoder *encoder = &intel_encoder->base;
+- struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
+-
+- switch (intel_dp->link_bw) {
+- case DP_LINK_BW_1_62:
++ switch (crtc_state->port_clock / 2) {
++ case 81000:
+ ctrl1 |= DPLL_CTRL1_LINK_RATE(DPLL_CTRL1_LINK_RATE_810, 0);
+ break;
+- case DP_LINK_BW_2_7:
++ case 135000:
+ ctrl1 |= DPLL_CTRL1_LINK_RATE(DPLL_CTRL1_LINK_RATE_1350, 0);
+ break;
+- case DP_LINK_BW_5_4:
++ case 270000:
+ ctrl1 |= DPLL_CTRL1_LINK_RATE(DPLL_CTRL1_LINK_RATE_2700, 0);
+ break;
+ }
+diff --git a/drivers/gpu/drm/i915/intel_dp.c b/drivers/gpu/drm/i915/intel_dp.c
+index bd8f8863eb0e..ca2d923101fc 100644
+--- a/drivers/gpu/drm/i915/intel_dp.c
++++ b/drivers/gpu/drm/i915/intel_dp.c
+@@ -48,28 +48,28 @@
+ #define INTEL_DP_RESOLUTION_FAILSAFE (3 << INTEL_DP_RESOLUTION_SHIFT_MASK)
+
+ struct dp_link_dpll {
+- int link_bw;
++ int clock;
+ struct dpll dpll;
+ };
+
+ static const struct dp_link_dpll gen4_dpll[] = {
+- { DP_LINK_BW_1_62,
++ { 162000,
+ { .p1 = 2, .p2 = 10, .n = 2, .m1 = 23, .m2 = 8 } },
+- { DP_LINK_BW_2_7,
++ { 270000,
+ { .p1 = 1, .p2 = 10, .n = 1, .m1 = 14, .m2 = 2 } }
+ };
+
+ static const struct dp_link_dpll pch_dpll[] = {
+- { DP_LINK_BW_1_62,
++ { 162000,
+ { .p1 = 2, .p2 = 10, .n = 1, .m1 = 12, .m2 = 9 } },
+- { DP_LINK_BW_2_7,
++ { 270000,
+ { .p1 = 1, .p2 = 10, .n = 2, .m1 = 14, .m2 = 8 } }
+ };
+
+ static const struct dp_link_dpll vlv_dpll[] = {
+- { DP_LINK_BW_1_62,
++ { 162000,
+ { .p1 = 3, .p2 = 2, .n = 5, .m1 = 3, .m2 = 81 } },
+- { DP_LINK_BW_2_7,
++ { 270000,
+ { .p1 = 2, .p2 = 2, .n = 1, .m1 = 2, .m2 = 27 } }
+ };
+
+@@ -83,11 +83,11 @@ static const struct dp_link_dpll chv_dpll[] = {
+ * m2 is stored in fixed point format using formula below
+ * (m2_int << 22) | m2_fraction
+ */
+- { DP_LINK_BW_1_62, /* m2_int = 32, m2_fraction = 1677722 */
++ { 162000, /* m2_int = 32, m2_fraction = 1677722 */
+ { .p1 = 4, .p2 = 2, .n = 1, .m1 = 2, .m2 = 0x819999a } },
+- { DP_LINK_BW_2_7, /* m2_int = 27, m2_fraction = 0 */
++ { 270000, /* m2_int = 27, m2_fraction = 0 */
+ { .p1 = 4, .p2 = 1, .n = 1, .m1 = 2, .m2 = 0x6c00000 } },
+- { DP_LINK_BW_5_4, /* m2_int = 27, m2_fraction = 0 */
++ { 540000, /* m2_int = 27, m2_fraction = 0 */
+ { .p1 = 2, .p2 = 1, .n = 1, .m1 = 2, .m2 = 0x6c00000 } }
+ };
+
+@@ -1089,7 +1089,7 @@ intel_dp_connector_unregister(struct intel_connector *intel_connector)
+ }
+
+ static void
+-skl_edp_set_pll_config(struct intel_crtc_state *pipe_config, int link_clock)
++skl_edp_set_pll_config(struct intel_crtc_state *pipe_config)
+ {
+ u32 ctrl1;
+
+@@ -1101,7 +1101,7 @@ skl_edp_set_pll_config(struct intel_crtc_state *pipe_config, int link_clock)
+ pipe_config->dpll_hw_state.cfgcr2 = 0;
+
+ ctrl1 = DPLL_CTRL1_OVERRIDE(SKL_DPLL0);
+- switch (link_clock / 2) {
++ switch (pipe_config->port_clock / 2) {
+ case 81000:
+ ctrl1 |= DPLL_CTRL1_LINK_RATE(DPLL_CTRL1_LINK_RATE_810,
+ SKL_DPLL0);
+@@ -1134,20 +1134,20 @@ skl_edp_set_pll_config(struct intel_crtc_state *pipe_config, int link_clock)
+ pipe_config->dpll_hw_state.ctrl1 = ctrl1;
+ }
+
+-static void
+-hsw_dp_set_ddi_pll_sel(struct intel_crtc_state *pipe_config, int link_bw)
++void
++hsw_dp_set_ddi_pll_sel(struct intel_crtc_state *pipe_config)
+ {
+ memset(&pipe_config->dpll_hw_state, 0,
+ sizeof(pipe_config->dpll_hw_state));
+
+- switch (link_bw) {
+- case DP_LINK_BW_1_62:
++ switch (pipe_config->port_clock / 2) {
++ case 81000:
+ pipe_config->ddi_pll_sel = PORT_CLK_SEL_LCPLL_810;
+ break;
+- case DP_LINK_BW_2_7:
++ case 135000:
+ pipe_config->ddi_pll_sel = PORT_CLK_SEL_LCPLL_1350;
+ break;
+- case DP_LINK_BW_5_4:
++ case 270000:
+ pipe_config->ddi_pll_sel = PORT_CLK_SEL_LCPLL_2700;
+ break;
+ }
+@@ -1198,7 +1198,7 @@ intel_dp_source_rates(struct drm_device *dev, const int **source_rates)
+
+ static void
+ intel_dp_set_clock(struct intel_encoder *encoder,
+- struct intel_crtc_state *pipe_config, int link_bw)
++ struct intel_crtc_state *pipe_config)
+ {
+ struct drm_device *dev = encoder->base.dev;
+ const struct dp_link_dpll *divisor = NULL;
+@@ -1220,7 +1220,7 @@ intel_dp_set_clock(struct intel_encoder *encoder,
+
+ if (divisor && count) {
+ for (i = 0; i < count; i++) {
+- if (link_bw == divisor[i].link_bw) {
++ if (pipe_config->port_clock == divisor[i].clock) {
+ pipe_config->dpll = divisor[i].dpll;
+ pipe_config->clock_set = true;
+ break;
+@@ -1494,13 +1494,13 @@ found:
+ }
+
+ if (IS_SKYLAKE(dev) && is_edp(intel_dp))
+- skl_edp_set_pll_config(pipe_config, common_rates[clock]);
++ skl_edp_set_pll_config(pipe_config);
+ else if (IS_BROXTON(dev))
+ /* handled in ddi */;
+ else if (IS_HASWELL(dev) || IS_BROADWELL(dev))
+- hsw_dp_set_ddi_pll_sel(pipe_config, intel_dp->link_bw);
++ hsw_dp_set_ddi_pll_sel(pipe_config);
+ else
+- intel_dp_set_clock(encoder, pipe_config, intel_dp->link_bw);
++ intel_dp_set_clock(encoder, pipe_config);
+
+ return true;
+ }
+diff --git a/drivers/gpu/drm/i915/intel_dp_mst.c b/drivers/gpu/drm/i915/intel_dp_mst.c
+index 600afdbef8c9..8c127201ab3c 100644
+--- a/drivers/gpu/drm/i915/intel_dp_mst.c
++++ b/drivers/gpu/drm/i915/intel_dp_mst.c
+@@ -33,6 +33,7 @@
+ static bool intel_dp_mst_compute_config(struct intel_encoder *encoder,
+ struct intel_crtc_state *pipe_config)
+ {
++ struct drm_device *dev = encoder->base.dev;
+ struct intel_dp_mst_encoder *intel_mst = enc_to_mst(&encoder->base);
+ struct intel_digital_port *intel_dig_port = intel_mst->primary;
+ struct intel_dp *intel_dp = &intel_dig_port->dp;
+@@ -97,6 +98,10 @@ static bool intel_dp_mst_compute_config(struct intel_encoder *encoder,
+ &pipe_config->dp_m_n);
+
+ pipe_config->dp_m_n.tu = slots;
++
++ if (IS_HASWELL(dev) || IS_BROADWELL(dev))
++ hsw_dp_set_ddi_pll_sel(pipe_config);
++
+ return true;
+
+ }
+diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
+index 105928382e21..04d426156bdb 100644
+--- a/drivers/gpu/drm/i915/intel_drv.h
++++ b/drivers/gpu/drm/i915/intel_drv.h
+@@ -1194,6 +1194,7 @@ void intel_edp_drrs_disable(struct intel_dp *intel_dp);
+ void intel_edp_drrs_invalidate(struct drm_device *dev,
+ unsigned frontbuffer_bits);
+ void intel_edp_drrs_flush(struct drm_device *dev, unsigned frontbuffer_bits);
++void hsw_dp_set_ddi_pll_sel(struct intel_crtc_state *pipe_config);
+
+ /* intel_dp_mst.c */
+ int intel_dp_mst_encoder_init(struct intel_digital_port *intel_dig_port, int conn_id);
+diff --git a/drivers/gpu/drm/radeon/radeon_combios.c b/drivers/gpu/drm/radeon/radeon_combios.c
+index c097d3a82bda..a9b01bcf7d0a 100644
+--- a/drivers/gpu/drm/radeon/radeon_combios.c
++++ b/drivers/gpu/drm/radeon/radeon_combios.c
+@@ -3387,6 +3387,14 @@ void radeon_combios_asic_init(struct drm_device *dev)
+ rdev->pdev->subsystem_device == 0x30ae)
+ return;
+
++ /* quirk for rs4xx HP Compaq dc5750 Small Form Factor to make it resume
++ * - it hangs on resume inside the dynclk 1 table.
++ */
++ if (rdev->family == CHIP_RS480 &&
++ rdev->pdev->subsystem_vendor == 0x103c &&
++ rdev->pdev->subsystem_device == 0x280a)
++ return;
++
+ /* DYN CLK 1 */
+ table = combios_get_table_offset(dev, COMBIOS_DYN_CLK_1_TABLE);
+ if (table)
+diff --git a/drivers/i2c/busses/i2c-xgene-slimpro.c b/drivers/i2c/busses/i2c-xgene-slimpro.c
+index 1c9cb65ac4cf..4233f5695352 100644
+--- a/drivers/i2c/busses/i2c-xgene-slimpro.c
++++ b/drivers/i2c/busses/i2c-xgene-slimpro.c
+@@ -198,10 +198,10 @@ static int slimpro_i2c_blkrd(struct slimpro_i2c_dev *ctx, u32 chip, u32 addr,
+ int rc;
+
+ paddr = dma_map_single(ctx->dev, ctx->dma_buffer, readlen, DMA_FROM_DEVICE);
+- rc = dma_mapping_error(ctx->dev, paddr);
+- if (rc) {
++ if (dma_mapping_error(ctx->dev, paddr)) {
+ dev_err(&ctx->adapter.dev, "Error in mapping dma buffer %p\n",
+ ctx->dma_buffer);
++ rc = -ENOMEM;
+ goto err;
+ }
+
+@@ -241,10 +241,10 @@ static int slimpro_i2c_blkwr(struct slimpro_i2c_dev *ctx, u32 chip,
+ memcpy(ctx->dma_buffer, data, writelen);
+ paddr = dma_map_single(ctx->dev, ctx->dma_buffer, writelen,
+ DMA_TO_DEVICE);
+- rc = dma_mapping_error(ctx->dev, paddr);
+- if (rc) {
++ if (dma_mapping_error(ctx->dev, paddr)) {
+ dev_err(&ctx->adapter.dev, "Error in mapping dma buffer %p\n",
+ ctx->dma_buffer);
++ rc = -ENOMEM;
+ goto err;
+ }
+
+diff --git a/drivers/infiniband/core/uverbs.h b/drivers/infiniband/core/uverbs.h
+index ba365b6d1e8d..65cbfcc92f11 100644
+--- a/drivers/infiniband/core/uverbs.h
++++ b/drivers/infiniband/core/uverbs.h
+@@ -85,7 +85,7 @@
+ */
+
+ struct ib_uverbs_device {
+- struct kref ref;
++ atomic_t refcount;
+ int num_comp_vectors;
+ struct completion comp;
+ struct device *dev;
+@@ -94,6 +94,7 @@ struct ib_uverbs_device {
+ struct cdev cdev;
+ struct rb_root xrcd_tree;
+ struct mutex xrcd_tree_mutex;
++ struct kobject kobj;
+ };
+
+ struct ib_uverbs_event_file {
+diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
+index bbb02ffe87df..a6ca83b3153f 100644
+--- a/drivers/infiniband/core/uverbs_cmd.c
++++ b/drivers/infiniband/core/uverbs_cmd.c
+@@ -2346,6 +2346,12 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
+ next->send_flags = user_wr->send_flags;
+
+ if (is_ud) {
++ if (next->opcode != IB_WR_SEND &&
++ next->opcode != IB_WR_SEND_WITH_IMM) {
++ ret = -EINVAL;
++ goto out_put;
++ }
++
+ next->wr.ud.ah = idr_read_ah(user_wr->wr.ud.ah,
+ file->ucontext);
+ if (!next->wr.ud.ah) {
+@@ -2385,9 +2391,11 @@ ssize_t ib_uverbs_post_send(struct ib_uverbs_file *file,
+ user_wr->wr.atomic.compare_add;
+ next->wr.atomic.swap = user_wr->wr.atomic.swap;
+ next->wr.atomic.rkey = user_wr->wr.atomic.rkey;
++ case IB_WR_SEND:
+ break;
+ default:
+- break;
++ ret = -EINVAL;
++ goto out_put;
+ }
+ }
+
+diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
+index f6eef2da7097..15f4126a577d 100644
+--- a/drivers/infiniband/core/uverbs_main.c
++++ b/drivers/infiniband/core/uverbs_main.c
+@@ -130,14 +130,18 @@ static int (*uverbs_ex_cmd_table[])(struct ib_uverbs_file *file,
+ static void ib_uverbs_add_one(struct ib_device *device);
+ static void ib_uverbs_remove_one(struct ib_device *device);
+
+-static void ib_uverbs_release_dev(struct kref *ref)
++static void ib_uverbs_release_dev(struct kobject *kobj)
+ {
+ struct ib_uverbs_device *dev =
+- container_of(ref, struct ib_uverbs_device, ref);
++ container_of(kobj, struct ib_uverbs_device, kobj);
+
+- complete(&dev->comp);
++ kfree(dev);
+ }
+
++static struct kobj_type ib_uverbs_dev_ktype = {
++ .release = ib_uverbs_release_dev,
++};
++
+ static void ib_uverbs_release_event_file(struct kref *ref)
+ {
+ struct ib_uverbs_event_file *file =
+@@ -303,13 +307,19 @@ static int ib_uverbs_cleanup_ucontext(struct ib_uverbs_file *file,
+ return context->device->dealloc_ucontext(context);
+ }
+
++static void ib_uverbs_comp_dev(struct ib_uverbs_device *dev)
++{
++ complete(&dev->comp);
++}
++
+ static void ib_uverbs_release_file(struct kref *ref)
+ {
+ struct ib_uverbs_file *file =
+ container_of(ref, struct ib_uverbs_file, ref);
+
+ module_put(file->device->ib_dev->owner);
+- kref_put(&file->device->ref, ib_uverbs_release_dev);
++ if (atomic_dec_and_test(&file->device->refcount))
++ ib_uverbs_comp_dev(file->device);
+
+ kfree(file);
+ }
+@@ -743,9 +753,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
+ int ret;
+
+ dev = container_of(inode->i_cdev, struct ib_uverbs_device, cdev);
+- if (dev)
+- kref_get(&dev->ref);
+- else
++ if (!atomic_inc_not_zero(&dev->refcount))
+ return -ENXIO;
+
+ if (!try_module_get(dev->ib_dev->owner)) {
+@@ -766,6 +774,7 @@ static int ib_uverbs_open(struct inode *inode, struct file *filp)
+ mutex_init(&file->mutex);
+
+ filp->private_data = file;
++ kobject_get(&dev->kobj);
+
+ return nonseekable_open(inode, filp);
+
+@@ -773,13 +782,16 @@ err_module:
+ module_put(dev->ib_dev->owner);
+
+ err:
+- kref_put(&dev->ref, ib_uverbs_release_dev);
++ if (atomic_dec_and_test(&dev->refcount))
++ ib_uverbs_comp_dev(dev);
++
+ return ret;
+ }
+
+ static int ib_uverbs_close(struct inode *inode, struct file *filp)
+ {
+ struct ib_uverbs_file *file = filp->private_data;
++ struct ib_uverbs_device *dev = file->device;
+
+ ib_uverbs_cleanup_ucontext(file, file->ucontext);
+
+@@ -787,6 +799,7 @@ static int ib_uverbs_close(struct inode *inode, struct file *filp)
+ kref_put(&file->async_file->ref, ib_uverbs_release_event_file);
+
+ kref_put(&file->ref, ib_uverbs_release_file);
++ kobject_put(&dev->kobj);
+
+ return 0;
+ }
+@@ -882,10 +895,11 @@ static void ib_uverbs_add_one(struct ib_device *device)
+ if (!uverbs_dev)
+ return;
+
+- kref_init(&uverbs_dev->ref);
++ atomic_set(&uverbs_dev->refcount, 1);
+ init_completion(&uverbs_dev->comp);
+ uverbs_dev->xrcd_tree = RB_ROOT;
+ mutex_init(&uverbs_dev->xrcd_tree_mutex);
++ kobject_init(&uverbs_dev->kobj, &ib_uverbs_dev_ktype);
+
+ spin_lock(&map_lock);
+ devnum = find_first_zero_bit(dev_map, IB_UVERBS_MAX_DEVICES);
+@@ -912,6 +926,7 @@ static void ib_uverbs_add_one(struct ib_device *device)
+ cdev_init(&uverbs_dev->cdev, NULL);
+ uverbs_dev->cdev.owner = THIS_MODULE;
+ uverbs_dev->cdev.ops = device->mmap ? &uverbs_mmap_fops : &uverbs_fops;
++ uverbs_dev->cdev.kobj.parent = &uverbs_dev->kobj;
+ kobject_set_name(&uverbs_dev->cdev.kobj, "uverbs%d", uverbs_dev->devnum);
+ if (cdev_add(&uverbs_dev->cdev, base, 1))
+ goto err_cdev;
+@@ -942,9 +957,10 @@ err_cdev:
+ clear_bit(devnum, overflow_map);
+
+ err:
+- kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
++ if (atomic_dec_and_test(&uverbs_dev->refcount))
++ ib_uverbs_comp_dev(uverbs_dev);
+ wait_for_completion(&uverbs_dev->comp);
+- kfree(uverbs_dev);
++ kobject_put(&uverbs_dev->kobj);
+ return;
+ }
+
+@@ -964,9 +980,10 @@ static void ib_uverbs_remove_one(struct ib_device *device)
+ else
+ clear_bit(uverbs_dev->devnum - IB_UVERBS_MAX_DEVICES, overflow_map);
+
+- kref_put(&uverbs_dev->ref, ib_uverbs_release_dev);
++ if (atomic_dec_and_test(&uverbs_dev->refcount))
++ ib_uverbs_comp_dev(uverbs_dev);
+ wait_for_completion(&uverbs_dev->comp);
+- kfree(uverbs_dev);
++ kobject_put(&uverbs_dev->kobj);
+ }
+
+ static char *uverbs_devnode(struct device *dev, umode_t *mode)
+diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
+index f50a546224ad..33fdd50123f7 100644
+--- a/drivers/infiniband/hw/mlx4/ah.c
++++ b/drivers/infiniband/hw/mlx4/ah.c
+@@ -148,9 +148,13 @@ int mlx4_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr)
+ enum rdma_link_layer ll;
+
+ memset(ah_attr, 0, sizeof *ah_attr);
+- ah_attr->sl = be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 28;
+ ah_attr->port_num = be32_to_cpu(ah->av.ib.port_pd) >> 24;
+ ll = rdma_port_get_link_layer(ibah->device, ah_attr->port_num);
++ if (ll == IB_LINK_LAYER_ETHERNET)
++ ah_attr->sl = be32_to_cpu(ah->av.eth.sl_tclass_flowlabel) >> 29;
++ else
++ ah_attr->sl = be32_to_cpu(ah->av.ib.sl_tclass_flowlabel) >> 28;
++
+ ah_attr->dlid = ll == IB_LINK_LAYER_INFINIBAND ? be16_to_cpu(ah->av.ib.dlid) : 0;
+ if (ah->av.ib.stat_rate)
+ ah_attr->static_rate = ah->av.ib.stat_rate - MLX4_STAT_RATE_OFFSET;
+diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
+index 36eb3d012b6d..2f4259525bb1 100644
+--- a/drivers/infiniband/hw/mlx4/cq.c
++++ b/drivers/infiniband/hw/mlx4/cq.c
+@@ -638,7 +638,7 @@ static void mlx4_ib_poll_sw_comp(struct mlx4_ib_cq *cq, int num_entries,
+ * simulated FLUSH_ERR completions
+ */
+ list_for_each_entry(qp, &cq->send_qp_list, cq_send_list) {
+- mlx4_ib_qp_sw_comp(qp, num_entries, wc, npolled, 1);
++ mlx4_ib_qp_sw_comp(qp, num_entries, wc + *npolled, npolled, 1);
+ if (*npolled >= num_entries)
+ goto out;
+ }
+diff --git a/drivers/infiniband/hw/mlx4/mcg.c b/drivers/infiniband/hw/mlx4/mcg.c
+index ed327e6c8fdc..a0559a8af4f4 100644
+--- a/drivers/infiniband/hw/mlx4/mcg.c
++++ b/drivers/infiniband/hw/mlx4/mcg.c
+@@ -206,15 +206,16 @@ static int send_mad_to_wire(struct mlx4_ib_demux_ctx *ctx, struct ib_mad *mad)
+ {
+ struct mlx4_ib_dev *dev = ctx->dev;
+ struct ib_ah_attr ah_attr;
++ unsigned long flags;
+
+- spin_lock(&dev->sm_lock);
++ spin_lock_irqsave(&dev->sm_lock, flags);
+ if (!dev->sm_ah[ctx->port - 1]) {
+ /* port is not yet Active, sm_ah not ready */
+- spin_unlock(&dev->sm_lock);
++ spin_unlock_irqrestore(&dev->sm_lock, flags);
+ return -EAGAIN;
+ }
+ mlx4_ib_query_ah(dev->sm_ah[ctx->port - 1], &ah_attr);
+- spin_unlock(&dev->sm_lock);
++ spin_unlock_irqrestore(&dev->sm_lock, flags);
+ return mlx4_ib_send_to_wire(dev, mlx4_master_func_num(dev->dev),
+ ctx->port, IB_QPT_GSI, 0, 1, IB_QP1_QKEY,
+ &ah_attr, NULL, mad);
+diff --git a/drivers/infiniband/hw/mlx4/sysfs.c b/drivers/infiniband/hw/mlx4/sysfs.c
+index 6797108ce873..69fb5ba94d0f 100644
+--- a/drivers/infiniband/hw/mlx4/sysfs.c
++++ b/drivers/infiniband/hw/mlx4/sysfs.c
+@@ -640,6 +640,8 @@ static int add_port(struct mlx4_ib_dev *dev, int port_num, int slave)
+ struct mlx4_port *p;
+ int i;
+ int ret;
++ int is_eth = rdma_port_get_link_layer(&dev->ib_dev, port_num) ==
++ IB_LINK_LAYER_ETHERNET;
+
+ p = kzalloc(sizeof *p, GFP_KERNEL);
+ if (!p)
+@@ -657,7 +659,8 @@ static int add_port(struct mlx4_ib_dev *dev, int port_num, int slave)
+
+ p->pkey_group.name = "pkey_idx";
+ p->pkey_group.attrs =
+- alloc_group_attrs(show_port_pkey, store_port_pkey,
++ alloc_group_attrs(show_port_pkey,
++ is_eth ? NULL : store_port_pkey,
+ dev->dev->caps.pkey_table_len[port_num]);
+ if (!p->pkey_group.attrs) {
+ ret = -ENOMEM;
+diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
+index bc9a0de897cb..dbb75c0de848 100644
+--- a/drivers/infiniband/hw/mlx5/mr.c
++++ b/drivers/infiniband/hw/mlx5/mr.c
+@@ -1118,19 +1118,7 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
+ return &mr->ibmr;
+
+ error:
+- /*
+- * Destroy the umem *before* destroying the MR, to ensure we
+- * will not have any in-flight notifiers when destroying the
+- * MR.
+- *
+- * As the MR is completely invalid to begin with, and this
+- * error path is only taken if we can't push the mr entry into
+- * the pagefault tree, this is safe.
+- */
+-
+ ib_umem_release(umem);
+- /* Kill the MR, and return an error code. */
+- clean_mr(mr);
+ return ERR_PTR(err);
+ }
+
+diff --git a/drivers/infiniband/hw/qib/qib_keys.c b/drivers/infiniband/hw/qib/qib_keys.c
+index ad843c786e72..5afaa218508d 100644
+--- a/drivers/infiniband/hw/qib/qib_keys.c
++++ b/drivers/infiniband/hw/qib/qib_keys.c
+@@ -86,6 +86,10 @@ int qib_alloc_lkey(struct qib_mregion *mr, int dma_region)
+ * unrestricted LKEY.
+ */
+ rkt->gen++;
++ /*
++ * bits are capped in qib_verbs.c to ensure enough bits
++ * for generation number
++ */
+ mr->lkey = (r << (32 - ib_qib_lkey_table_size)) |
+ ((((1 << (24 - ib_qib_lkey_table_size)) - 1) & rkt->gen)
+ << 8);
+diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
+index a05d1a372208..77e981abfce4 100644
+--- a/drivers/infiniband/hw/qib/qib_verbs.c
++++ b/drivers/infiniband/hw/qib/qib_verbs.c
+@@ -40,6 +40,7 @@
+ #include <linux/rculist.h>
+ #include <linux/mm.h>
+ #include <linux/random.h>
++#include <linux/vmalloc.h>
+
+ #include "qib.h"
+ #include "qib_common.h"
+@@ -2109,10 +2110,16 @@ int qib_register_ib_device(struct qib_devdata *dd)
+ * the LKEY). The remaining bits act as a generation number or tag.
+ */
+ spin_lock_init(&dev->lk_table.lock);
++ /* ensure generation is at least 4 bits, see qib_keys.c */
++ if (ib_qib_lkey_table_size > MAX_LKEY_TABLE_BITS) {
++ qib_dev_warn(dd, "lkey bits %u too large, reduced to %u\n",
++ ib_qib_lkey_table_size, MAX_LKEY_TABLE_BITS);
++ ib_qib_lkey_table_size = MAX_LKEY_TABLE_BITS;
++ }
+ dev->lk_table.max = 1 << ib_qib_lkey_table_size;
+ lk_tab_size = dev->lk_table.max * sizeof(*dev->lk_table.table);
+ dev->lk_table.table = (struct qib_mregion __rcu **)
+- __get_free_pages(GFP_KERNEL, get_order(lk_tab_size));
++ vmalloc(lk_tab_size);
+ if (dev->lk_table.table == NULL) {
+ ret = -ENOMEM;
+ goto err_lk;
+@@ -2286,7 +2293,7 @@ err_tx:
+ sizeof(struct qib_pio_header),
+ dev->pio_hdrs, dev->pio_hdrs_phys);
+ err_hdrs:
+- free_pages((unsigned long) dev->lk_table.table, get_order(lk_tab_size));
++ vfree(dev->lk_table.table);
+ err_lk:
+ kfree(dev->qp_table);
+ err_qpt:
+@@ -2340,8 +2347,7 @@ void qib_unregister_ib_device(struct qib_devdata *dd)
+ sizeof(struct qib_pio_header),
+ dev->pio_hdrs, dev->pio_hdrs_phys);
+ lk_tab_size = dev->lk_table.max * sizeof(*dev->lk_table.table);
+- free_pages((unsigned long) dev->lk_table.table,
+- get_order(lk_tab_size));
++ vfree(dev->lk_table.table);
+ kfree(dev->qp_table);
+ }
+
+diff --git a/drivers/infiniband/hw/qib/qib_verbs.h b/drivers/infiniband/hw/qib/qib_verbs.h
+index 1635572752ce..bce0fa596b4d 100644
+--- a/drivers/infiniband/hw/qib/qib_verbs.h
++++ b/drivers/infiniband/hw/qib/qib_verbs.h
+@@ -647,6 +647,8 @@ struct qib_qpn_table {
+ struct qpn_map map[QPNMAP_ENTRIES];
+ };
+
++#define MAX_LKEY_TABLE_BITS 23
++
+ struct qib_lkey_table {
+ spinlock_t lock; /* protect changes in this struct */
+ u32 next; /* next unused index (speeds search) */
+diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
+index 6a594aac2290..c933d882c35c 100644
+--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
++++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
+@@ -201,6 +201,7 @@ iser_initialize_task_headers(struct iscsi_task *task,
+ goto out;
+ }
+
++ tx_desc->mapped = true;
+ tx_desc->dma_addr = dma_addr;
+ tx_desc->tx_sg[0].addr = tx_desc->dma_addr;
+ tx_desc->tx_sg[0].length = ISER_HEADERS_LEN;
+@@ -360,16 +361,19 @@ iscsi_iser_task_xmit(struct iscsi_task *task)
+ static void iscsi_iser_cleanup_task(struct iscsi_task *task)
+ {
+ struct iscsi_iser_task *iser_task = task->dd_data;
+- struct iser_tx_desc *tx_desc = &iser_task->desc;
+- struct iser_conn *iser_conn = task->conn->dd_data;
++ struct iser_tx_desc *tx_desc = &iser_task->desc;
++ struct iser_conn *iser_conn = task->conn->dd_data;
+ struct iser_device *device = iser_conn->ib_conn.device;
+
+ /* DEVICE_REMOVAL event might have already released the device */
+ if (!device)
+ return;
+
+- ib_dma_unmap_single(device->ib_device,
+- tx_desc->dma_addr, ISER_HEADERS_LEN, DMA_TO_DEVICE);
++ if (likely(tx_desc->mapped)) {
++ ib_dma_unmap_single(device->ib_device, tx_desc->dma_addr,
++ ISER_HEADERS_LEN, DMA_TO_DEVICE);
++ tx_desc->mapped = false;
++ }
+
+ /* mgmt tasks do not need special cleanup */
+ if (!task->sc)
+diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
+index 262ba1f8ee50..d2b6caf7694d 100644
+--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
++++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
+@@ -270,6 +270,7 @@ enum iser_desc_type {
+ * sg[1] optionally points to either of immediate data
+ * unsolicited data-out or control
+ * @num_sge: number sges used on this TX task
++ * @mapped: Is the task header mapped
+ */
+ struct iser_tx_desc {
+ struct iser_hdr iser_header;
+@@ -278,6 +279,7 @@ struct iser_tx_desc {
+ u64 dma_addr;
+ struct ib_sge tx_sg[2];
+ int num_sge;
++ bool mapped;
+ };
+
+ #define ISER_RX_PAD_SIZE (256 - (ISER_RX_PAYLOAD_SIZE + \
+diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c b/drivers/infiniband/ulp/iser/iser_initiator.c
+index 3e2118e8ed87..0a47f42fec24 100644
+--- a/drivers/infiniband/ulp/iser/iser_initiator.c
++++ b/drivers/infiniband/ulp/iser/iser_initiator.c
+@@ -454,7 +454,7 @@ int iser_send_data_out(struct iscsi_conn *conn,
+ unsigned long buf_offset;
+ unsigned long data_seg_len;
+ uint32_t itt;
+- int err = 0;
++ int err;
+ struct ib_sge *tx_dsg;
+
+ itt = (__force uint32_t)hdr->itt;
+@@ -475,7 +475,9 @@ int iser_send_data_out(struct iscsi_conn *conn,
+ memcpy(&tx_desc->iscsi_header, hdr, sizeof(struct iscsi_hdr));
+
+ /* build the tx desc */
+- iser_initialize_task_headers(task, tx_desc);
++ err = iser_initialize_task_headers(task, tx_desc);
++ if (err)
++ goto send_data_out_error;
+
+ mem_reg = &iser_task->rdma_reg[ISER_DIR_OUT];
+ tx_dsg = &tx_desc->tx_sg[1];
+@@ -502,7 +504,7 @@ int iser_send_data_out(struct iscsi_conn *conn,
+
+ send_data_out_error:
+ kmem_cache_free(ig.desc_cache, tx_desc);
+- iser_err("conn %p failed err %d\n",conn, err);
++ iser_err("conn %p failed err %d\n", conn, err);
+ return err;
+ }
+
+diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
+index 31a20b462266..ffda44ff9375 100644
+--- a/drivers/infiniband/ulp/srp/ib_srp.c
++++ b/drivers/infiniband/ulp/srp/ib_srp.c
+@@ -2757,6 +2757,13 @@ static int srp_sdev_count(struct Scsi_Host *host)
+ return c;
+ }
+
++/*
++ * Return values:
++ * < 0 upon failure. Caller is responsible for SRP target port cleanup.
++ * 0 and target->state == SRP_TARGET_REMOVED if asynchronous target port
++ * removal has been scheduled.
++ * 0 and target->state != SRP_TARGET_REMOVED upon success.
++ */
+ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
+ {
+ struct srp_rport_identifiers ids;
+@@ -3262,7 +3269,7 @@ static ssize_t srp_create_target(struct device *dev,
+ srp_free_ch_ib(target, ch);
+ srp_free_req_data(target, ch);
+ target->ch_count = ch - target->ch;
+- break;
++ goto connected;
+ }
+ }
+
+@@ -3272,6 +3279,7 @@ static ssize_t srp_create_target(struct device *dev,
+ node_idx++;
+ }
+
++connected:
+ target->scsi_host->nr_hw_queues = target->ch_count;
+
+ ret = srp_add_target(host, target);
+@@ -3294,6 +3302,8 @@ out:
+ mutex_unlock(&host->add_target_mutex);
+
+ scsi_host_put(target->scsi_host);
++ if (ret < 0)
++ scsi_host_put(target->scsi_host);
+
+ return ret;
+
+diff --git a/drivers/input/evdev.c b/drivers/input/evdev.c
+index 9d35499faca4..08d496411f75 100644
+--- a/drivers/input/evdev.c
++++ b/drivers/input/evdev.c
+@@ -290,19 +290,14 @@ static int evdev_flush(struct file *file, fl_owner_t id)
+ {
+ struct evdev_client *client = file->private_data;
+ struct evdev *evdev = client->evdev;
+- int retval;
+
+- retval = mutex_lock_interruptible(&evdev->mutex);
+- if (retval)
+- return retval;
++ mutex_lock(&evdev->mutex);
+
+- if (!evdev->exist || client->revoked)
+- retval = -ENODEV;
+- else
+- retval = input_flush_device(&evdev->handle, file);
++ if (evdev->exist && !client->revoked)
++ input_flush_device(&evdev->handle, file);
+
+ mutex_unlock(&evdev->mutex);
+- return retval;
++ return 0;
+ }
+
+ static void evdev_free(struct device *dev)
+diff --git a/drivers/iommu/fsl_pamu.c b/drivers/iommu/fsl_pamu.c
+index abeedc9a78c2..2570f2a25dc4 100644
+--- a/drivers/iommu/fsl_pamu.c
++++ b/drivers/iommu/fsl_pamu.c
+@@ -41,7 +41,6 @@ struct pamu_isr_data {
+
+ static struct paace *ppaact;
+ static struct paace *spaact;
+-static struct ome *omt __initdata;
+
+ /*
+ * Table for matching compatible strings, for device tree
+@@ -50,7 +49,7 @@ static struct ome *omt __initdata;
+ * SOCs. For the older SOCs "fsl,qoriq-device-config-1.0"
+ * string would be used.
+ */
+-static const struct of_device_id guts_device_ids[] __initconst = {
++static const struct of_device_id guts_device_ids[] = {
+ { .compatible = "fsl,qoriq-device-config-1.0", },
+ { .compatible = "fsl,qoriq-device-config-2.0", },
+ {}
+@@ -599,7 +598,7 @@ found_cpu_node:
+ * Memory accesses to QMAN and BMAN private memory need not be coherent, so
+ * clear the PAACE entry coherency attribute for them.
+ */
+-static void __init setup_qbman_paace(struct paace *ppaace, int paace_type)
++static void setup_qbman_paace(struct paace *ppaace, int paace_type)
+ {
+ switch (paace_type) {
+ case QMAN_PAACE:
+@@ -629,7 +628,7 @@ static void __init setup_qbman_paace(struct paace *ppaace, int paace_type)
+ * this table to translate device transaction to appropriate corenet
+ * transaction.
+ */
+-static void __init setup_omt(struct ome *omt)
++static void setup_omt(struct ome *omt)
+ {
+ struct ome *ome;
+
+@@ -666,7 +665,7 @@ static void __init setup_omt(struct ome *omt)
+ * Get the maximum number of PAACT table entries
+ * and subwindows supported by PAMU
+ */
+-static void __init get_pamu_cap_values(unsigned long pamu_reg_base)
++static void get_pamu_cap_values(unsigned long pamu_reg_base)
+ {
+ u32 pc_val;
+
+@@ -676,9 +675,9 @@ static void __init get_pamu_cap_values(unsigned long pamu_reg_base)
+ }
+
+ /* Setup PAMU registers pointing to PAACT, SPAACT and OMT */
+-static int __init setup_one_pamu(unsigned long pamu_reg_base, unsigned long pamu_reg_size,
+- phys_addr_t ppaact_phys, phys_addr_t spaact_phys,
+- phys_addr_t omt_phys)
++static int setup_one_pamu(unsigned long pamu_reg_base, unsigned long pamu_reg_size,
++ phys_addr_t ppaact_phys, phys_addr_t spaact_phys,
++ phys_addr_t omt_phys)
+ {
+ u32 *pc;
+ struct pamu_mmap_regs *pamu_regs;
+@@ -720,7 +719,7 @@ static int __init setup_one_pamu(unsigned long pamu_reg_base, unsigned long pamu
+ }
+
+ /* Enable all device LIODNS */
+-static void __init setup_liodns(void)
++static void setup_liodns(void)
+ {
+ int i, len;
+ struct paace *ppaace;
+@@ -846,7 +845,7 @@ struct ccsr_law {
+ /*
+ * Create a coherence subdomain for a given memory block.
+ */
+-static int __init create_csd(phys_addr_t phys, size_t size, u32 csd_port_id)
++static int create_csd(phys_addr_t phys, size_t size, u32 csd_port_id)
+ {
+ struct device_node *np;
+ const __be32 *iprop;
+@@ -988,7 +987,7 @@ error:
+ static const struct {
+ u32 svr;
+ u32 port_id;
+-} port_id_map[] __initconst = {
++} port_id_map[] = {
+ {(SVR_P2040 << 8) | 0x10, 0xFF000000}, /* P2040 1.0 */
+ {(SVR_P2040 << 8) | 0x11, 0xFF000000}, /* P2040 1.1 */
+ {(SVR_P2041 << 8) | 0x10, 0xFF000000}, /* P2041 1.0 */
+@@ -1006,7 +1005,7 @@ static const struct {
+
+ #define SVR_SECURITY 0x80000 /* The Security (E) bit */
+
+-static int __init fsl_pamu_probe(struct platform_device *pdev)
++static int fsl_pamu_probe(struct platform_device *pdev)
+ {
+ struct device *dev = &pdev->dev;
+ void __iomem *pamu_regs = NULL;
+@@ -1022,6 +1021,7 @@ static int __init fsl_pamu_probe(struct platform_device *pdev)
+ int irq;
+ phys_addr_t ppaact_phys;
+ phys_addr_t spaact_phys;
++ struct ome *omt;
+ phys_addr_t omt_phys;
+ size_t mem_size = 0;
+ unsigned int order = 0;
+@@ -1200,7 +1200,7 @@ error:
+ return ret;
+ }
+
+-static struct platform_driver fsl_of_pamu_driver __initdata = {
++static struct platform_driver fsl_of_pamu_driver = {
+ .driver = {
+ .name = "fsl-of-pamu",
+ },
+diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
+index 0649b94f5958..7553cb90627f 100644
+--- a/drivers/iommu/intel-iommu.c
++++ b/drivers/iommu/intel-iommu.c
+@@ -755,6 +755,7 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu
+ struct context_entry *context;
+ u64 *entry;
+
++ entry = &root->lo;
+ if (ecs_enabled(iommu)) {
+ if (devfn >= 0x80) {
+ devfn -= 0x80;
+@@ -762,7 +763,6 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu
+ }
+ devfn *= 2;
+ }
+- entry = &root->lo;
+ if (*entry & 1)
+ context = phys_to_virt(*entry & VTD_PAGE_MASK);
+ else {
+diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
+index 4e460216bd16..e29d5d7fe220 100644
+--- a/drivers/iommu/io-pgtable-arm.c
++++ b/drivers/iommu/io-pgtable-arm.c
+@@ -200,6 +200,10 @@ typedef u64 arm_lpae_iopte;
+
+ static bool selftest_running = false;
+
++static int __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
++ unsigned long iova, size_t size, int lvl,
++ arm_lpae_iopte *ptep);
++
+ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
+ unsigned long iova, phys_addr_t paddr,
+ arm_lpae_iopte prot, int lvl,
+@@ -207,10 +211,21 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
+ {
+ arm_lpae_iopte pte = prot;
+
+- /* We require an unmap first */
+ if (iopte_leaf(*ptep, lvl)) {
++ /* We require an unmap first */
+ WARN_ON(!selftest_running);
+ return -EEXIST;
++ } else if (iopte_type(*ptep, lvl) == ARM_LPAE_PTE_TYPE_TABLE) {
++ /*
++ * We need to unmap and free the old table before
++ * overwriting it with a block entry.
++ */
++ arm_lpae_iopte *tblp;
++ size_t sz = ARM_LPAE_BLOCK_SIZE(lvl, data);
++
++ tblp = ptep - ARM_LPAE_LVL_IDX(iova, lvl, data);
++ if (WARN_ON(__arm_lpae_unmap(data, iova, sz, lvl, tblp) != sz))
++ return -EINVAL;
+ }
+
+ if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS)
+diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
+index c1f2e521dc52..2cd439203d0f 100644
+--- a/drivers/iommu/tegra-smmu.c
++++ b/drivers/iommu/tegra-smmu.c
+@@ -27,6 +27,7 @@ struct tegra_smmu {
+ const struct tegra_smmu_soc *soc;
+
+ unsigned long pfn_mask;
++ unsigned long tlb_mask;
+
+ unsigned long *asids;
+ struct mutex lock;
+@@ -68,7 +69,8 @@ static inline u32 smmu_readl(struct tegra_smmu *smmu, unsigned long offset)
+ #define SMMU_TLB_CONFIG 0x14
+ #define SMMU_TLB_CONFIG_HIT_UNDER_MISS (1 << 29)
+ #define SMMU_TLB_CONFIG_ROUND_ROBIN_ARBITRATION (1 << 28)
+-#define SMMU_TLB_CONFIG_ACTIVE_LINES(x) ((x) & 0x3f)
++#define SMMU_TLB_CONFIG_ACTIVE_LINES(smmu) \
++ ((smmu)->soc->num_tlb_lines & (smmu)->tlb_mask)
+
+ #define SMMU_PTC_CONFIG 0x18
+ #define SMMU_PTC_CONFIG_ENABLE (1 << 29)
+@@ -816,6 +818,9 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
+ smmu->pfn_mask = BIT_MASK(mc->soc->num_address_bits - PAGE_SHIFT) - 1;
+ dev_dbg(dev, "address bits: %u, PFN mask: %#lx\n",
+ mc->soc->num_address_bits, smmu->pfn_mask);
++ smmu->tlb_mask = (smmu->soc->num_tlb_lines << 1) - 1;
++ dev_dbg(dev, "TLB lines: %u, mask: %#lx\n", smmu->soc->num_tlb_lines,
++ smmu->tlb_mask);
+
+ value = SMMU_PTC_CONFIG_ENABLE | SMMU_PTC_CONFIG_INDEX_MAP(0x3f);
+
+@@ -825,7 +830,7 @@ struct tegra_smmu *tegra_smmu_probe(struct device *dev,
+ smmu_writel(smmu, value, SMMU_PTC_CONFIG);
+
+ value = SMMU_TLB_CONFIG_HIT_UNDER_MISS |
+- SMMU_TLB_CONFIG_ACTIVE_LINES(0x20);
++ SMMU_TLB_CONFIG_ACTIVE_LINES(smmu);
+
+ if (soc->supports_round_robin_arbitration)
+ value |= SMMU_TLB_CONFIG_ROUND_ROBIN_ARBITRATION;
+diff --git a/drivers/media/platform/am437x/am437x-vpfe.c b/drivers/media/platform/am437x/am437x-vpfe.c
+index 1fba339cddc1..c8447fa3fd91 100644
+--- a/drivers/media/platform/am437x/am437x-vpfe.c
++++ b/drivers/media/platform/am437x/am437x-vpfe.c
+@@ -1186,14 +1186,24 @@ static int vpfe_initialize_device(struct vpfe_device *vpfe)
+ static int vpfe_release(struct file *file)
+ {
+ struct vpfe_device *vpfe = video_drvdata(file);
++ bool fh_singular;
+ int ret;
+
+ mutex_lock(&vpfe->lock);
+
+- if (v4l2_fh_is_singular_file(file))
+- vpfe_ccdc_close(&vpfe->ccdc, vpfe->pdev);
++ /* Save the singular status before we call the clean-up helper */
++ fh_singular = v4l2_fh_is_singular_file(file);
++
++ /* the release helper will cleanup any on-going streaming */
+ ret = _vb2_fop_release(file, NULL);
+
++ /*
++ * If this was the last open file.
++ * Then de-initialize hw module.
++ */
++ if (fh_singular)
++ vpfe_ccdc_close(&vpfe->ccdc, vpfe->pdev);
++
+ mutex_unlock(&vpfe->lock);
+
+ return ret;
+@@ -1565,7 +1575,7 @@ static int vpfe_s_fmt(struct file *file, void *priv,
+ return -EBUSY;
+ }
+
+- ret = vpfe_try_fmt(file, priv, fmt);
++ ret = vpfe_try_fmt(file, priv, &format);
+ if (ret)
+ return ret;
+
+diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c
+index 18d0a871747f..12be830d704f 100644
+--- a/drivers/media/platform/omap3isp/isp.c
++++ b/drivers/media/platform/omap3isp/isp.c
+@@ -829,14 +829,14 @@ static int isp_pipeline_link_notify(struct media_link *link, u32 flags,
+ int ret;
+
+ if (notification == MEDIA_DEV_NOTIFY_POST_LINK_CH &&
+- !(link->flags & MEDIA_LNK_FL_ENABLED)) {
++ !(flags & MEDIA_LNK_FL_ENABLED)) {
+ /* Powering off entities is assumed to never fail. */
+ isp_pipeline_pm_power(source, -sink_use);
+ isp_pipeline_pm_power(sink, -source_use);
+ return 0;
+ }
+
+- if (notification == MEDIA_DEV_NOTIFY_POST_LINK_CH &&
++ if (notification == MEDIA_DEV_NOTIFY_PRE_LINK_CH &&
+ (flags & MEDIA_LNK_FL_ENABLED)) {
+
+ ret = isp_pipeline_pm_power(source, sink_use);
+@@ -2000,10 +2000,8 @@ static int isp_register_entities(struct isp_device *isp)
+ ret = v4l2_device_register_subdev_nodes(&isp->v4l2_dev);
+
+ done:
+- if (ret < 0) {
++ if (ret < 0)
+ isp_unregister_entities(isp);
+- v4l2_async_notifier_unregister(&isp->notifier);
+- }
+
+ return ret;
+ }
+@@ -2423,10 +2421,6 @@ static int isp_probe(struct platform_device *pdev)
+ ret = isp_of_parse_nodes(&pdev->dev, &isp->notifier);
+ if (ret < 0)
+ return ret;
+- ret = v4l2_async_notifier_register(&isp->v4l2_dev,
+- &isp->notifier);
+- if (ret)
+- return ret;
+ } else {
+ isp->pdata = pdev->dev.platform_data;
+ isp->syscon = syscon_regmap_lookup_by_pdevname("syscon.0");
+@@ -2557,18 +2551,27 @@ static int isp_probe(struct platform_device *pdev)
+ if (ret < 0)
+ goto error_iommu;
+
+- isp->notifier.bound = isp_subdev_notifier_bound;
+- isp->notifier.complete = isp_subdev_notifier_complete;
+-
+ ret = isp_register_entities(isp);
+ if (ret < 0)
+ goto error_modules;
+
++ if (IS_ENABLED(CONFIG_OF) && pdev->dev.of_node) {
++ isp->notifier.bound = isp_subdev_notifier_bound;
++ isp->notifier.complete = isp_subdev_notifier_complete;
++
++ ret = v4l2_async_notifier_register(&isp->v4l2_dev,
++ &isp->notifier);
++ if (ret)
++ goto error_register_entities;
++ }
++
+ isp_core_init(isp, 1);
+ omap3isp_put(isp);
+
+ return 0;
+
++error_register_entities:
++ isp_unregister_entities(isp);
+ error_modules:
+ isp_cleanup_modules(isp);
+ error_iommu:
+diff --git a/drivers/media/platform/xilinx/xilinx-dma.c b/drivers/media/platform/xilinx/xilinx-dma.c
+index 98e50e446d57..e779c93cb015 100644
+--- a/drivers/media/platform/xilinx/xilinx-dma.c
++++ b/drivers/media/platform/xilinx/xilinx-dma.c
+@@ -699,8 +699,10 @@ int xvip_dma_init(struct xvip_composite_device *xdev, struct xvip_dma *dma,
+
+ /* ... and the buffers queue... */
+ dma->alloc_ctx = vb2_dma_contig_init_ctx(dma->xdev->dev);
+- if (IS_ERR(dma->alloc_ctx))
++ if (IS_ERR(dma->alloc_ctx)) {
++ ret = PTR_ERR(dma->alloc_ctx);
+ goto error;
++ }
+
+ /* Don't enable VB2_READ and VB2_WRITE, as using the read() and write()
+ * V4L2 APIs would be inefficient. Testing on the command line with a
+diff --git a/drivers/media/rc/rc-main.c b/drivers/media/rc/rc-main.c
+index 0ff388a16168..f3b6b2caabf6 100644
+--- a/drivers/media/rc/rc-main.c
++++ b/drivers/media/rc/rc-main.c
+@@ -1191,9 +1191,6 @@ static int rc_dev_uevent(struct device *device, struct kobj_uevent_env *env)
+ {
+ struct rc_dev *dev = to_rc_dev(device);
+
+- if (!dev || !dev->input_dev)
+- return -ENODEV;
+-
+ if (dev->rc_map.name)
+ ADD_HOTPLUG_VAR("NAME=%s", dev->rc_map.name);
+ if (dev->driver_name)
+diff --git a/drivers/memory/tegra/tegra114.c b/drivers/memory/tegra/tegra114.c
+index 9f579589e800..9bf11ea90549 100644
+--- a/drivers/memory/tegra/tegra114.c
++++ b/drivers/memory/tegra/tegra114.c
+@@ -935,6 +935,7 @@ static const struct tegra_smmu_soc tegra114_smmu_soc = {
+ .num_swgroups = ARRAY_SIZE(tegra114_swgroups),
+ .supports_round_robin_arbitration = false,
+ .supports_request_limit = false,
++ .num_tlb_lines = 32,
+ .num_asids = 4,
+ .ops = &tegra114_smmu_ops,
+ };
+diff --git a/drivers/memory/tegra/tegra124.c b/drivers/memory/tegra/tegra124.c
+index 966e1557e6f4..70ed80d23431 100644
+--- a/drivers/memory/tegra/tegra124.c
++++ b/drivers/memory/tegra/tegra124.c
+@@ -1023,6 +1023,7 @@ static const struct tegra_smmu_soc tegra124_smmu_soc = {
+ .num_swgroups = ARRAY_SIZE(tegra124_swgroups),
+ .supports_round_robin_arbitration = true,
+ .supports_request_limit = true,
++ .num_tlb_lines = 32,
+ .num_asids = 128,
+ .ops = &tegra124_smmu_ops,
+ };
+diff --git a/drivers/memory/tegra/tegra30.c b/drivers/memory/tegra/tegra30.c
+index 1abcd8f6f3ba..b2a34fefabef 100644
+--- a/drivers/memory/tegra/tegra30.c
++++ b/drivers/memory/tegra/tegra30.c
+@@ -957,6 +957,7 @@ static const struct tegra_smmu_soc tegra30_smmu_soc = {
+ .num_swgroups = ARRAY_SIZE(tegra30_swgroups),
+ .supports_round_robin_arbitration = false,
+ .supports_request_limit = false,
++ .num_tlb_lines = 16,
+ .num_asids = 4,
+ .ops = &tegra30_smmu_ops,
+ };
+diff --git a/drivers/misc/cxl/api.c b/drivers/misc/cxl/api.c
+index 729e0851167d..4224a6acf4c4 100644
+--- a/drivers/misc/cxl/api.c
++++ b/drivers/misc/cxl/api.c
+@@ -59,7 +59,7 @@ EXPORT_SYMBOL_GPL(cxl_get_phys_dev);
+
+ int cxl_release_context(struct cxl_context *ctx)
+ {
+- if (ctx->status != CLOSED)
++ if (ctx->status >= STARTED)
+ return -EBUSY;
+
+ put_device(&ctx->afu->dev);
+diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c
+index 32ad09705949..dc836071c633 100644
+--- a/drivers/misc/cxl/pci.c
++++ b/drivers/misc/cxl/pci.c
+@@ -851,16 +851,9 @@ int cxl_reset(struct cxl *adapter)
+ {
+ struct pci_dev *dev = to_pci_dev(adapter->dev.parent);
+ int rc;
+- int i;
+- u32 val;
+
+ dev_info(&dev->dev, "CXL reset\n");
+
+- for (i = 0; i < adapter->slices; i++) {
+- cxl_pci_vphb_remove(adapter->afu[i]);
+- cxl_remove_afu(adapter->afu[i]);
+- }
+-
+ /* pcie_warm_reset requests a fundamental pci reset which includes a
+ * PERST assert/deassert. PERST triggers a loading of the image
+ * if "user" or "factory" is selected in sysfs */
+@@ -869,20 +862,6 @@ int cxl_reset(struct cxl *adapter)
+ return rc;
+ }
+
+- /* the PERST done above fences the PHB. So, reset depends on EEH
+- * to unbind the driver, tell Sapphire to reinit the PHB, and rebind
+- * the driver. Do an mmio read explictly to ensure EEH notices the
+- * fenced PHB. Retry for a few seconds before giving up. */
+- i = 0;
+- while (((val = mmio_read32be(adapter->p1_mmio)) != 0xffffffff) &&
+- (i < 5)) {
+- msleep(500);
+- i++;
+- }
+-
+- if (val != 0xffffffff)
+- dev_err(&dev->dev, "cxl: PERST failed to trigger EEH\n");
+-
+ return rc;
+ }
+
+@@ -1140,8 +1119,6 @@ static int cxl_probe(struct pci_dev *dev, const struct pci_device_id *id)
+ int slice;
+ int rc;
+
+- pci_dev_get(dev);
+-
+ if (cxl_verbose)
+ dump_cxl_config_space(dev);
+
+diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
+index 9ad73f30f744..9e3fdbdc4037 100644
+--- a/drivers/mmc/core/core.c
++++ b/drivers/mmc/core/core.c
+@@ -358,8 +358,10 @@ EXPORT_SYMBOL(mmc_start_bkops);
+ */
+ static void mmc_wait_data_done(struct mmc_request *mrq)
+ {
+- mrq->host->context_info.is_done_rcv = true;
+- wake_up_interruptible(&mrq->host->context_info.wait);
++ struct mmc_context_info *context_info = &mrq->host->context_info;
++
++ context_info->is_done_rcv = true;
++ wake_up_interruptible(&context_info->wait);
+ }
+
+ static void mmc_wait_done(struct mmc_request *mrq)
+diff --git a/drivers/mmc/host/sdhci-of-esdhc.c b/drivers/mmc/host/sdhci-of-esdhc.c
+index 797be7549a15..653f335bef15 100644
+--- a/drivers/mmc/host/sdhci-of-esdhc.c
++++ b/drivers/mmc/host/sdhci-of-esdhc.c
+@@ -208,6 +208,12 @@ static void esdhc_of_set_clock(struct sdhci_host *host, unsigned int clock)
+ if (clock == 0)
+ return;
+
++ /* Workaround to start pre_div at 2 for VNN < VENDOR_V_23 */
++ temp = esdhc_readw(host, SDHCI_HOST_VERSION);
++ temp = (temp & SDHCI_VENDOR_VER_MASK) >> SDHCI_VENDOR_VER_SHIFT;
++ if (temp < VENDOR_V_23)
++ pre_div = 2;
++
+ /* Workaround to reduce the clock frequency for p1010 esdhc */
+ if (of_find_compatible_node(NULL, NULL, "fsl,p1010-esdhc")) {
+ if (clock > 20000000)
+diff --git a/drivers/mmc/host/sdhci-pci.c b/drivers/mmc/host/sdhci-pci.c
+index 94f54d2772e8..b3b0a3e4fca1 100644
+--- a/drivers/mmc/host/sdhci-pci.c
++++ b/drivers/mmc/host/sdhci-pci.c
+@@ -618,6 +618,7 @@ static int jmicron_resume(struct sdhci_pci_chip *chip)
+ static const struct sdhci_pci_fixes sdhci_o2 = {
+ .probe = sdhci_pci_o2_probe,
+ .quirks = SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC,
++ .quirks2 = SDHCI_QUIRK2_CLEAR_TRANSFERMODE_REG_BEFORE_CMD,
+ .probe_slot = sdhci_pci_o2_probe_slot,
+ .resume = sdhci_pci_o2_resume,
+ };
+diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
+index 1dbe93232030..b0c915a35a9e 100644
+--- a/drivers/mmc/host/sdhci.c
++++ b/drivers/mmc/host/sdhci.c
+@@ -54,8 +54,7 @@ static void sdhci_finish_command(struct sdhci_host *);
+ static int sdhci_execute_tuning(struct mmc_host *mmc, u32 opcode);
+ static void sdhci_enable_preset_value(struct sdhci_host *host, bool enable);
+ static int sdhci_pre_dma_transfer(struct sdhci_host *host,
+- struct mmc_data *data,
+- struct sdhci_host_next *next);
++ struct mmc_data *data);
+ static int sdhci_do_get_cd(struct sdhci_host *host);
+
+ #ifdef CONFIG_PM
+@@ -496,7 +495,7 @@ static int sdhci_adma_table_pre(struct sdhci_host *host,
+ goto fail;
+ BUG_ON(host->align_addr & host->align_mask);
+
+- host->sg_count = sdhci_pre_dma_transfer(host, data, NULL);
++ host->sg_count = sdhci_pre_dma_transfer(host, data);
+ if (host->sg_count < 0)
+ goto unmap_align;
+
+@@ -635,9 +634,11 @@ static void sdhci_adma_table_post(struct sdhci_host *host,
+ }
+ }
+
+- if (!data->host_cookie)
++ if (data->host_cookie == COOKIE_MAPPED) {
+ dma_unmap_sg(mmc_dev(host->mmc), data->sg,
+ data->sg_len, direction);
++ data->host_cookie = COOKIE_UNMAPPED;
++ }
+ }
+
+ static u8 sdhci_calc_timeout(struct sdhci_host *host, struct mmc_command *cmd)
+@@ -833,7 +834,7 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
+ } else {
+ int sg_cnt;
+
+- sg_cnt = sdhci_pre_dma_transfer(host, data, NULL);
++ sg_cnt = sdhci_pre_dma_transfer(host, data);
+ if (sg_cnt <= 0) {
+ /*
+ * This only happens when someone fed
+@@ -949,11 +950,13 @@ static void sdhci_finish_data(struct sdhci_host *host)
+ if (host->flags & SDHCI_USE_ADMA)
+ sdhci_adma_table_post(host, data);
+ else {
+- if (!data->host_cookie)
++ if (data->host_cookie == COOKIE_MAPPED) {
+ dma_unmap_sg(mmc_dev(host->mmc),
+ data->sg, data->sg_len,
+ (data->flags & MMC_DATA_READ) ?
+ DMA_FROM_DEVICE : DMA_TO_DEVICE);
++ data->host_cookie = COOKIE_UNMAPPED;
++ }
+ }
+ }
+
+@@ -1132,6 +1135,7 @@ static u16 sdhci_get_preset_value(struct sdhci_host *host)
+ preset = sdhci_readw(host, SDHCI_PRESET_FOR_SDR104);
+ break;
+ case MMC_TIMING_UHS_DDR50:
++ case MMC_TIMING_MMC_DDR52:
+ preset = sdhci_readw(host, SDHCI_PRESET_FOR_DDR50);
+ break;
+ case MMC_TIMING_MMC_HS400:
+@@ -1559,7 +1563,8 @@ static void sdhci_do_set_ios(struct sdhci_host *host, struct mmc_ios *ios)
+ (ios->timing == MMC_TIMING_UHS_SDR25) ||
+ (ios->timing == MMC_TIMING_UHS_SDR50) ||
+ (ios->timing == MMC_TIMING_UHS_SDR104) ||
+- (ios->timing == MMC_TIMING_UHS_DDR50))) {
++ (ios->timing == MMC_TIMING_UHS_DDR50) ||
++ (ios->timing == MMC_TIMING_MMC_DDR52))) {
+ u16 preset;
+
+ sdhci_enable_preset_value(host, true);
+@@ -2097,49 +2102,36 @@ static void sdhci_post_req(struct mmc_host *mmc, struct mmc_request *mrq,
+ struct mmc_data *data = mrq->data;
+
+ if (host->flags & SDHCI_REQ_USE_DMA) {
+- if (data->host_cookie)
++ if (data->host_cookie == COOKIE_GIVEN ||
++ data->host_cookie == COOKIE_MAPPED)
+ dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
+ data->flags & MMC_DATA_WRITE ?
+ DMA_TO_DEVICE : DMA_FROM_DEVICE);
+- mrq->data->host_cookie = 0;
++ data->host_cookie = COOKIE_UNMAPPED;
+ }
+ }
+
+ static int sdhci_pre_dma_transfer(struct sdhci_host *host,
+- struct mmc_data *data,
+- struct sdhci_host_next *next)
++ struct mmc_data *data)
+ {
+ int sg_count;
+
+- if (!next && data->host_cookie &&
+- data->host_cookie != host->next_data.cookie) {
+- pr_debug(DRIVER_NAME "[%s] invalid cookie: %d, next-cookie %d\n",
+- __func__, data->host_cookie, host->next_data.cookie);
+- data->host_cookie = 0;
++ if (data->host_cookie == COOKIE_MAPPED) {
++ data->host_cookie = COOKIE_GIVEN;
++ return data->sg_count;
+ }
+
+- /* Check if next job is already prepared */
+- if (next ||
+- (!next && data->host_cookie != host->next_data.cookie)) {
+- sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg,
+- data->sg_len,
+- data->flags & MMC_DATA_WRITE ?
+- DMA_TO_DEVICE : DMA_FROM_DEVICE);
+-
+- } else {
+- sg_count = host->next_data.sg_count;
+- host->next_data.sg_count = 0;
+- }
++ WARN_ON(data->host_cookie == COOKIE_GIVEN);
+
++ sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg, data->sg_len,
++ data->flags & MMC_DATA_WRITE ?
++ DMA_TO_DEVICE : DMA_FROM_DEVICE);
+
+ if (sg_count == 0)
+- return -EINVAL;
++ return -ENOSPC;
+
+- if (next) {
+- next->sg_count = sg_count;
+- data->host_cookie = ++next->cookie < 0 ? 1 : next->cookie;
+- } else
+- host->sg_count = sg_count;
++ data->sg_count = sg_count;
++ data->host_cookie = COOKIE_MAPPED;
+
+ return sg_count;
+ }
+@@ -2149,16 +2141,10 @@ static void sdhci_pre_req(struct mmc_host *mmc, struct mmc_request *mrq,
+ {
+ struct sdhci_host *host = mmc_priv(mmc);
+
+- if (mrq->data->host_cookie) {
+- mrq->data->host_cookie = 0;
+- return;
+- }
++ mrq->data->host_cookie = COOKIE_UNMAPPED;
+
+ if (host->flags & SDHCI_REQ_USE_DMA)
+- if (sdhci_pre_dma_transfer(host,
+- mrq->data,
+- &host->next_data) < 0)
+- mrq->data->host_cookie = 0;
++ sdhci_pre_dma_transfer(host, mrq->data);
+ }
+
+ static void sdhci_card_event(struct mmc_host *mmc)
+@@ -3030,7 +3016,6 @@ int sdhci_add_host(struct sdhci_host *host)
+ host->max_clk = host->ops->get_max_clock(host);
+ }
+
+- host->next_data.cookie = 1;
+ /*
+ * In case of Host Controller v3.00, find out whether clock
+ * multiplier is supported.
+diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
+index 5521d29368e4..a9512a421f52 100644
+--- a/drivers/mmc/host/sdhci.h
++++ b/drivers/mmc/host/sdhci.h
+@@ -309,9 +309,10 @@ struct sdhci_adma2_64_desc {
+ */
+ #define SDHCI_MAX_SEGS 128
+
+-struct sdhci_host_next {
+- unsigned int sg_count;
+- s32 cookie;
++enum sdhci_cookie {
++ COOKIE_UNMAPPED,
++ COOKIE_MAPPED,
++ COOKIE_GIVEN,
+ };
+
+ struct sdhci_host {
+@@ -503,7 +504,6 @@ struct sdhci_host {
+ unsigned int tuning_mode; /* Re-tuning mode supported by host */
+ #define SDHCI_TUNING_MODE_1 0
+
+- struct sdhci_host_next next_data;
+ unsigned long private[0] ____cacheline_aligned;
+ };
+
+diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
+index 73c934cf6c61..79789d8e52da 100644
+--- a/drivers/net/ethernet/broadcom/tg3.c
++++ b/drivers/net/ethernet/broadcom/tg3.c
+@@ -10757,7 +10757,7 @@ static ssize_t tg3_show_temp(struct device *dev,
+ tg3_ape_scratchpad_read(tp, &temperature, attr->index,
+ sizeof(temperature));
+ spin_unlock_bh(&tp->lock);
+- return sprintf(buf, "%u\n", temperature);
++ return sprintf(buf, "%u\n", temperature * 1000);
+ }
+
+
+diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
+index c2bd4f98a837..212d668dabb3 100644
+--- a/drivers/net/ethernet/intel/igb/igb.h
++++ b/drivers/net/ethernet/intel/igb/igb.h
+@@ -540,6 +540,7 @@ void igb_ptp_rx_pktstamp(struct igb_q_vector *q_vector, unsigned char *va,
+ struct sk_buff *skb);
+ int igb_ptp_set_ts_config(struct net_device *netdev, struct ifreq *ifr);
+ int igb_ptp_get_ts_config(struct net_device *netdev, struct ifreq *ifr);
++void igb_set_flag_queue_pairs(struct igb_adapter *, const u32);
+ #ifdef CONFIG_IGB_HWMON
+ void igb_sysfs_exit(struct igb_adapter *adapter);
+ int igb_sysfs_init(struct igb_adapter *adapter);
+diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
+index d5673eb90c54..0afc0913e5b9 100644
+--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
++++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
+@@ -2991,6 +2991,7 @@ static int igb_set_channels(struct net_device *netdev,
+ {
+ struct igb_adapter *adapter = netdev_priv(netdev);
+ unsigned int count = ch->combined_count;
++ unsigned int max_combined = 0;
+
+ /* Verify they are not requesting separate vectors */
+ if (!count || ch->rx_count || ch->tx_count)
+@@ -3001,11 +3002,13 @@ static int igb_set_channels(struct net_device *netdev,
+ return -EINVAL;
+
+ /* Verify the number of channels doesn't exceed hw limits */
+- if (count > igb_max_channels(adapter))
++ max_combined = igb_max_channels(adapter);
++ if (count > max_combined)
+ return -EINVAL;
+
+ if (count != adapter->rss_queues) {
+ adapter->rss_queues = count;
++ igb_set_flag_queue_pairs(adapter, max_combined);
+
+ /* Hardware has to reinitialize queues and interrupts to
+ * match the new configuration.
+diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
+index 830466c49987..8d7b59689722 100644
+--- a/drivers/net/ethernet/intel/igb/igb_main.c
++++ b/drivers/net/ethernet/intel/igb/igb_main.c
+@@ -1205,10 +1205,14 @@ static int igb_alloc_q_vector(struct igb_adapter *adapter,
+
+ /* allocate q_vector and rings */
+ q_vector = adapter->q_vector[v_idx];
+- if (!q_vector)
++ if (!q_vector) {
+ q_vector = kzalloc(size, GFP_KERNEL);
+- else
++ } else if (size > ksize(q_vector)) {
++ kfree_rcu(q_vector, rcu);
++ q_vector = kzalloc(size, GFP_KERNEL);
++ } else {
+ memset(q_vector, 0, size);
++ }
+ if (!q_vector)
+ return -ENOMEM;
+
+@@ -2888,6 +2892,14 @@ static void igb_init_queue_configuration(struct igb_adapter *adapter)
+
+ adapter->rss_queues = min_t(u32, max_rss_queues, num_online_cpus());
+
++ igb_set_flag_queue_pairs(adapter, max_rss_queues);
++}
++
++void igb_set_flag_queue_pairs(struct igb_adapter *adapter,
++ const u32 max_rss_queues)
++{
++ struct e1000_hw *hw = &adapter->hw;
++
+ /* Determine if we need to pair queues. */
+ switch (hw->mac.type) {
+ case e1000_82575:
+diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+index 864b476f7fd5..925f2f8659b8 100644
+--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
++++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+@@ -837,8 +837,11 @@ static int stmmac_init_phy(struct net_device *dev)
+ interface);
+ }
+
+- if (IS_ERR(phydev)) {
++ if (IS_ERR_OR_NULL(phydev)) {
+ pr_err("%s: Could not attach to PHY\n", dev->name);
++ if (!phydev)
++ return -ENODEV;
++
+ return PTR_ERR(phydev);
+ }
+
+diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
+index 23806c243a53..fd4a5353d216 100644
+--- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
++++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
+@@ -321,6 +321,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = {
+ {RTL_USB_DEVICE(0x07b8, 0x8188, rtl92cu_hal_cfg)}, /*Abocom - Abocom*/
+ {RTL_USB_DEVICE(0x07b8, 0x8189, rtl92cu_hal_cfg)}, /*Funai - Abocom*/
+ {RTL_USB_DEVICE(0x0846, 0x9041, rtl92cu_hal_cfg)}, /*NetGear WNA1000M*/
++ {RTL_USB_DEVICE(0x0846, 0x9043, rtl92cu_hal_cfg)}, /*NG WNA1000Mv2*/
+ {RTL_USB_DEVICE(0x0b05, 0x17ba, rtl92cu_hal_cfg)}, /*ASUS-Edimax*/
+ {RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CC&C*/
+ {RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/
+diff --git a/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c b/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c
+index 3236d44b459d..b7f18e2155eb 100644
+--- a/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c
++++ b/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c
+@@ -2180,7 +2180,7 @@ static int _rtl8821ae_set_media_status(struct ieee80211_hw *hw,
+
+ rtl_write_byte(rtlpriv, MSR, bt_msr);
+ rtlpriv->cfg->ops->led_control(hw, ledaction);
+- if ((bt_msr & 0xfc) == MSR_AP)
++ if ((bt_msr & MSR_MASK) == MSR_AP)
+ rtl_write_byte(rtlpriv, REG_BCNTCFG + 1, 0x00);
+ else
+ rtl_write_byte(rtlpriv, REG_BCNTCFG + 1, 0x66);
+diff --git a/drivers/net/wireless/rtlwifi/rtl8821ae/reg.h b/drivers/net/wireless/rtlwifi/rtl8821ae/reg.h
+index 53668fc8f23e..1d6110f9c1fb 100644
+--- a/drivers/net/wireless/rtlwifi/rtl8821ae/reg.h
++++ b/drivers/net/wireless/rtlwifi/rtl8821ae/reg.h
+@@ -429,6 +429,7 @@
+ #define MSR_ADHOC 0x01
+ #define MSR_INFRA 0x02
+ #define MSR_AP 0x03
++#define MSR_MASK 0x03
+
+ #define RRSR_RSC_OFFSET 21
+ #define RRSR_SHORT_OFFSET 23
+diff --git a/drivers/nfc/st-nci/i2c.c b/drivers/nfc/st-nci/i2c.c
+index 06175ce769bb..707ed2eb5936 100644
+--- a/drivers/nfc/st-nci/i2c.c
++++ b/drivers/nfc/st-nci/i2c.c
+@@ -25,15 +25,15 @@
+ #include <linux/interrupt.h>
+ #include <linux/delay.h>
+ #include <linux/nfc.h>
+-#include <linux/platform_data/st_nci.h>
++#include <linux/platform_data/st-nci.h>
+
+ #include "ndlc.h"
+
+-#define DRIVER_DESC "NCI NFC driver for ST21NFCB"
++#define DRIVER_DESC "NCI NFC driver for ST_NCI"
+
+ /* ndlc header */
+-#define ST21NFCB_FRAME_HEADROOM 1
+-#define ST21NFCB_FRAME_TAILROOM 0
++#define ST_NCI_FRAME_HEADROOM 1
++#define ST_NCI_FRAME_TAILROOM 0
+
+ #define ST_NCI_I2C_MIN_SIZE 4 /* PCB(1) + NCI Packet header(3) */
+ #define ST_NCI_I2C_MAX_SIZE 250 /* req 4.2.1 */
+@@ -118,15 +118,10 @@ static int st_nci_i2c_write(void *phy_id, struct sk_buff *skb)
+ /*
+ * Reads an ndlc frame and returns it in a newly allocated sk_buff.
+ * returns:
+- * frame size : if received frame is complete (find ST21NFCB_SOF_EOF at
+- * end of read)
+- * -EAGAIN : if received frame is incomplete (not find ST21NFCB_SOF_EOF
+- * at end of read)
++ * 0 : if received frame is complete
+ * -EREMOTEIO : i2c read error (fatal)
+ * -EBADMSG : frame was incorrect and discarded
+- * (value returned from st_nci_i2c_repack)
+- * -EIO : if no ST21NFCB_SOF_EOF is found after reaching
+- * the read length end sequence
++ * -ENOMEM : cannot allocate skb, frame dropped
+ */
+ static int st_nci_i2c_read(struct st_nci_i2c_phy *phy,
+ struct sk_buff **skb)
+@@ -179,7 +174,7 @@ static int st_nci_i2c_read(struct st_nci_i2c_phy *phy,
+ /*
+ * Reads an ndlc frame from the chip.
+ *
+- * On ST21NFCB, IRQ goes in idle state when read starts.
++ * On ST_NCI, IRQ goes in idle state when read starts.
+ */
+ static irqreturn_t st_nci_irq_thread_fn(int irq, void *phy_id)
+ {
+@@ -325,12 +320,12 @@ static int st_nci_i2c_probe(struct i2c_client *client,
+ }
+ } else {
+ nfc_err(&client->dev,
+- "st21nfcb platform resources not available\n");
++ "st_nci platform resources not available\n");
+ return -ENODEV;
+ }
+
+ r = ndlc_probe(phy, &i2c_phy_ops, &client->dev,
+- ST21NFCB_FRAME_HEADROOM, ST21NFCB_FRAME_TAILROOM,
++ ST_NCI_FRAME_HEADROOM, ST_NCI_FRAME_TAILROOM,
+ &phy->ndlc);
+ if (r < 0) {
+ nfc_err(&client->dev, "Unable to register ndlc layer\n");
+diff --git a/drivers/nfc/st-nci/ndlc.c b/drivers/nfc/st-nci/ndlc.c
+index 56c6a4cb4c96..4f51649d0e75 100644
+--- a/drivers/nfc/st-nci/ndlc.c
++++ b/drivers/nfc/st-nci/ndlc.c
+@@ -171,6 +171,8 @@ static void llt_ndlc_rcv_queue(struct llt_ndlc *ndlc)
+ if ((pcb & PCB_TYPE_MASK) == PCB_TYPE_SUPERVISOR) {
+ switch (pcb & PCB_SYNC_MASK) {
+ case PCB_SYNC_ACK:
++ skb = skb_dequeue(&ndlc->ack_pending_q);
++ kfree_skb(skb);
+ del_timer_sync(&ndlc->t1_timer);
+ del_timer_sync(&ndlc->t2_timer);
+ ndlc->t2_active = false;
+@@ -196,8 +198,10 @@ static void llt_ndlc_rcv_queue(struct llt_ndlc *ndlc)
+ kfree_skb(skb);
+ break;
+ }
+- } else {
++ } else if ((pcb & PCB_TYPE_MASK) == PCB_TYPE_DATAFRAME) {
+ nci_recv_frame(ndlc->ndev, skb);
++ } else {
++ kfree_skb(skb);
+ }
+ }
+ }
+diff --git a/drivers/nfc/st-nci/st-nci_se.c b/drivers/nfc/st-nci/st-nci_se.c
+index 97addfa96c6f..c742ef65a05a 100644
+--- a/drivers/nfc/st-nci/st-nci_se.c
++++ b/drivers/nfc/st-nci/st-nci_se.c
+@@ -189,14 +189,14 @@ int st_nci_hci_load_session(struct nci_dev *ndev)
+ ST_NCI_DEVICE_MGNT_GATE,
+ ST_NCI_DEVICE_MGNT_PIPE);
+ if (r < 0)
+- goto free_info;
++ return r;
+
+ /* Get pipe list */
+ r = nci_hci_send_cmd(ndev, ST_NCI_DEVICE_MGNT_GATE,
+ ST_NCI_DM_GETINFO, pipe_list, sizeof(pipe_list),
+ &skb_pipe_list);
+ if (r < 0)
+- goto free_info;
++ return r;
+
+ /* Complete the existing gate_pipe table */
+ for (i = 0; i < skb_pipe_list->len; i++) {
+@@ -222,6 +222,7 @@ int st_nci_hci_load_session(struct nci_dev *ndev)
+ dm_pipe_info->src_host_id != ST_NCI_ESE_HOST_ID) {
+ pr_err("Unexpected apdu_reader pipe on host %x\n",
+ dm_pipe_info->src_host_id);
++ kfree_skb(skb_pipe_info);
+ continue;
+ }
+
+@@ -241,13 +242,12 @@ int st_nci_hci_load_session(struct nci_dev *ndev)
+ ndev->hci_dev->pipes[st_nci_gates[j].pipe].host =
+ dm_pipe_info->src_host_id;
+ }
++ kfree_skb(skb_pipe_info);
+ }
+
+ memcpy(ndev->hci_dev->init_data.gates, st_nci_gates,
+ sizeof(st_nci_gates));
+
+-free_info:
+- kfree_skb(skb_pipe_info);
+ kfree_skb(skb_pipe_list);
+ return r;
+ }
+diff --git a/drivers/nfc/st21nfca/st21nfca.c b/drivers/nfc/st21nfca/st21nfca.c
+index d251f7229c4e..051286562fab 100644
+--- a/drivers/nfc/st21nfca/st21nfca.c
++++ b/drivers/nfc/st21nfca/st21nfca.c
+@@ -148,14 +148,14 @@ static int st21nfca_hci_load_session(struct nfc_hci_dev *hdev)
+ ST21NFCA_DEVICE_MGNT_GATE,
+ ST21NFCA_DEVICE_MGNT_PIPE);
+ if (r < 0)
+- goto free_info;
++ return r;
+
+ /* Get pipe list */
+ r = nfc_hci_send_cmd(hdev, ST21NFCA_DEVICE_MGNT_GATE,
+ ST21NFCA_DM_GETINFO, pipe_list, sizeof(pipe_list),
+ &skb_pipe_list);
+ if (r < 0)
+- goto free_info;
++ return r;
+
+ /* Complete the existing gate_pipe table */
+ for (i = 0; i < skb_pipe_list->len; i++) {
+@@ -181,6 +181,7 @@ static int st21nfca_hci_load_session(struct nfc_hci_dev *hdev)
+ info->src_host_id != ST21NFCA_ESE_HOST_ID) {
+ pr_err("Unexpected apdu_reader pipe on host %x\n",
+ info->src_host_id);
++ kfree_skb(skb_pipe_info);
+ continue;
+ }
+
+@@ -200,6 +201,7 @@ static int st21nfca_hci_load_session(struct nfc_hci_dev *hdev)
+ hdev->pipes[st21nfca_gates[j].pipe].dest_host =
+ info->src_host_id;
+ }
++ kfree_skb(skb_pipe_info);
+ }
+
+ /*
+@@ -214,13 +216,12 @@ static int st21nfca_hci_load_session(struct nfc_hci_dev *hdev)
+ st21nfca_gates[i].gate,
+ st21nfca_gates[i].pipe);
+ if (r < 0)
+- goto free_info;
++ goto free_list;
+ }
+ }
+
+ memcpy(hdev->init_data.gates, st21nfca_gates, sizeof(st21nfca_gates));
+-free_info:
+- kfree_skb(skb_pipe_info);
++free_list:
+ kfree_skb(skb_pipe_list);
+ return r;
+ }
+diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
+index 07496560e5b9..6e82bc42373b 100644
+--- a/drivers/of/fdt.c
++++ b/drivers/of/fdt.c
+@@ -967,7 +967,9 @@ int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
+ }
+
+ #ifdef CONFIG_HAVE_MEMBLOCK
+-#define MAX_PHYS_ADDR ((phys_addr_t)~0)
++#ifndef MAX_MEMBLOCK_ADDR
++#define MAX_MEMBLOCK_ADDR ((phys_addr_t)~0)
++#endif
+
+ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
+ {
+@@ -984,16 +986,16 @@ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
+ }
+ size &= PAGE_MASK;
+
+- if (base > MAX_PHYS_ADDR) {
++ if (base > MAX_MEMBLOCK_ADDR) {
+ pr_warning("Ignoring memory block 0x%llx - 0x%llx\n",
+ base, base + size);
+ return;
+ }
+
+- if (base + size - 1 > MAX_PHYS_ADDR) {
++ if (base + size - 1 > MAX_MEMBLOCK_ADDR) {
+ pr_warning("Ignoring memory range 0x%llx - 0x%llx\n",
+- ((u64)MAX_PHYS_ADDR) + 1, base + size);
+- size = MAX_PHYS_ADDR - base + 1;
++ ((u64)MAX_MEMBLOCK_ADDR) + 1, base + size);
++ size = MAX_MEMBLOCK_ADDR - base + 1;
+ }
+
+ if (base + size < phys_offset) {
+diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c
+index dceb9ddfd99a..a32c1f6c252c 100644
+--- a/drivers/parisc/lba_pci.c
++++ b/drivers/parisc/lba_pci.c
+@@ -1556,8 +1556,11 @@ lba_driver_probe(struct parisc_device *dev)
+ if (lba_dev->hba.lmmio_space.flags)
+ pci_add_resource_offset(&resources, &lba_dev->hba.lmmio_space,
+ lba_dev->hba.lmmio_space_offset);
+- if (lba_dev->hba.gmmio_space.flags)
+- pci_add_resource(&resources, &lba_dev->hba.gmmio_space);
++ if (lba_dev->hba.gmmio_space.flags) {
++ /* pci_add_resource(&resources, &lba_dev->hba.gmmio_space); */
++ pr_warn("LBA: Not registering GMMIO space %pR\n",
++ &lba_dev->hba.gmmio_space);
++ }
+
+ pci_add_resource(&resources, &lba_dev->hba.bus_num);
+
+diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
+index 944f50015ed0..73de4efcbe6e 100644
+--- a/drivers/pci/Kconfig
++++ b/drivers/pci/Kconfig
+@@ -2,7 +2,7 @@
+ # PCI configuration
+ #
+ config PCI_BUS_ADDR_T_64BIT
+- def_bool y if (ARCH_DMA_ADDR_T_64BIT || (64BIT && !PARISC))
++ def_bool y if (ARCH_DMA_ADDR_T_64BIT || 64BIT)
+ depends on PCI
+
+ config PCI_MSI
+diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
+index ad1ea1695b4a..4a52072d1d3f 100644
+--- a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
++++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
+@@ -1202,12 +1202,6 @@ static int mtk_pctrl_build_state(struct platform_device *pdev)
+ return 0;
+ }
+
+-static struct pinctrl_desc mtk_pctrl_desc = {
+- .confops = &mtk_pconf_ops,
+- .pctlops = &mtk_pctrl_ops,
+- .pmxops = &mtk_pmx_ops,
+-};
+-
+ int mtk_pctrl_init(struct platform_device *pdev,
+ const struct mtk_pinctrl_devdata *data,
+ struct regmap *regmap)
+@@ -1265,12 +1259,17 @@ int mtk_pctrl_init(struct platform_device *pdev,
+
+ for (i = 0; i < pctl->devdata->npins; i++)
+ pins[i] = pctl->devdata->pins[i].pin;
+- mtk_pctrl_desc.name = dev_name(&pdev->dev);
+- mtk_pctrl_desc.owner = THIS_MODULE;
+- mtk_pctrl_desc.pins = pins;
+- mtk_pctrl_desc.npins = pctl->devdata->npins;
++
++ pctl->pctl_desc.name = dev_name(&pdev->dev);
++ pctl->pctl_desc.owner = THIS_MODULE;
++ pctl->pctl_desc.pins = pins;
++ pctl->pctl_desc.npins = pctl->devdata->npins;
++ pctl->pctl_desc.confops = &mtk_pconf_ops;
++ pctl->pctl_desc.pctlops = &mtk_pctrl_ops;
++ pctl->pctl_desc.pmxops = &mtk_pmx_ops;
+ pctl->dev = &pdev->dev;
+- pctl->pctl_dev = pinctrl_register(&mtk_pctrl_desc, &pdev->dev, pctl);
++
++ pctl->pctl_dev = pinctrl_register(&pctl->pctl_desc, &pdev->dev, pctl);
+ if (IS_ERR(pctl->pctl_dev)) {
+ dev_err(&pdev->dev, "couldn't register pinctrl driver\n");
+ return PTR_ERR(pctl->pctl_dev);
+diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common.h b/drivers/pinctrl/mediatek/pinctrl-mtk-common.h
+index 30213e514c2f..c532c23c70b4 100644
+--- a/drivers/pinctrl/mediatek/pinctrl-mtk-common.h
++++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common.h
+@@ -256,6 +256,7 @@ struct mtk_pinctrl_devdata {
+ struct mtk_pinctrl {
+ struct regmap *regmap1;
+ struct regmap *regmap2;
++ struct pinctrl_desc pctl_desc;
+ struct device *dev;
+ struct gpio_chip *chip;
+ struct mtk_pinctrl_group *groups;
+diff --git a/drivers/pinctrl/pinctrl-at91.c b/drivers/pinctrl/pinctrl-at91.c
+index a0824477072b..2deb1309fcac 100644
+--- a/drivers/pinctrl/pinctrl-at91.c
++++ b/drivers/pinctrl/pinctrl-at91.c
+@@ -320,6 +320,9 @@ static const struct pinctrl_ops at91_pctrl_ops = {
+ static void __iomem *pin_to_controller(struct at91_pinctrl *info,
+ unsigned int bank)
+ {
++ if (!gpio_chips[bank])
++ return NULL;
++
+ return gpio_chips[bank]->regbase;
+ }
+
+@@ -729,6 +732,10 @@ static int at91_pmx_set(struct pinctrl_dev *pctldev, unsigned selector,
+ pin = &pins_conf[i];
+ at91_pin_dbg(info->dev, pin);
+ pio = pin_to_controller(info, pin->bank);
++
++ if (!pio)
++ continue;
++
+ mask = pin_to_mask(pin->pin);
+ at91_mux_disable_interrupt(pio, mask);
+ switch (pin->mux) {
+@@ -848,6 +855,10 @@ static int at91_pinconf_get(struct pinctrl_dev *pctldev,
+ *config = 0;
+ dev_dbg(info->dev, "%s:%d, pin_id=%d", __func__, __LINE__, pin_id);
+ pio = pin_to_controller(info, pin_to_bank(pin_id));
++
++ if (!pio)
++ return -EINVAL;
++
+ pin = pin_id % MAX_NB_GPIO_PER_BANK;
+
+ if (at91_mux_get_multidrive(pio, pin))
+@@ -889,6 +900,10 @@ static int at91_pinconf_set(struct pinctrl_dev *pctldev,
+ "%s:%d, pin_id=%d, config=0x%lx",
+ __func__, __LINE__, pin_id, config);
+ pio = pin_to_controller(info, pin_to_bank(pin_id));
++
++ if (!pio)
++ return -EINVAL;
++
+ pin = pin_id % MAX_NB_GPIO_PER_BANK;
+ mask = pin_to_mask(pin);
+
+diff --git a/drivers/platform/x86/ideapad-laptop.c b/drivers/platform/x86/ideapad-laptop.c
+index 76b57388d01b..81c3e582309a 100644
+--- a/drivers/platform/x86/ideapad-laptop.c
++++ b/drivers/platform/x86/ideapad-laptop.c
+@@ -853,6 +853,13 @@ static const struct dmi_system_id no_hw_rfkill_list[] = {
+ },
+ },
+ {
++ .ident = "Lenovo Yoga 3 14",
++ .matches = {
++ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
++ DMI_MATCH(DMI_PRODUCT_VERSION, "Lenovo Yoga 3 14"),
++ },
++ },
++ {
+ .ident = "Lenovo Yoga 3 Pro 1370",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+diff --git a/drivers/rtc/rtc-abx80x.c b/drivers/rtc/rtc-abx80x.c
+index 4337c3bc6ace..afea84c7a155 100644
+--- a/drivers/rtc/rtc-abx80x.c
++++ b/drivers/rtc/rtc-abx80x.c
+@@ -28,7 +28,7 @@
+ #define ABX8XX_REG_WD 0x07
+
+ #define ABX8XX_REG_CTRL1 0x10
+-#define ABX8XX_CTRL_WRITE BIT(1)
++#define ABX8XX_CTRL_WRITE BIT(0)
+ #define ABX8XX_CTRL_12_24 BIT(6)
+
+ #define ABX8XX_REG_CFG_KEY 0x1f
+diff --git a/drivers/rtc/rtc-s3c.c b/drivers/rtc/rtc-s3c.c
+index a0f832362199..2e709e239dbc 100644
+--- a/drivers/rtc/rtc-s3c.c
++++ b/drivers/rtc/rtc-s3c.c
+@@ -39,6 +39,7 @@ struct s3c_rtc {
+ void __iomem *base;
+ struct clk *rtc_clk;
+ struct clk *rtc_src_clk;
++ bool clk_disabled;
+
+ struct s3c_rtc_data *data;
+
+@@ -71,9 +72,12 @@ static void s3c_rtc_enable_clk(struct s3c_rtc *info)
+ unsigned long irq_flags;
+
+ spin_lock_irqsave(&info->alarm_clk_lock, irq_flags);
+- clk_enable(info->rtc_clk);
+- if (info->data->needs_src_clk)
+- clk_enable(info->rtc_src_clk);
++ if (info->clk_disabled) {
++ clk_enable(info->rtc_clk);
++ if (info->data->needs_src_clk)
++ clk_enable(info->rtc_src_clk);
++ info->clk_disabled = false;
++ }
+ spin_unlock_irqrestore(&info->alarm_clk_lock, irq_flags);
+ }
+
+@@ -82,9 +86,12 @@ static void s3c_rtc_disable_clk(struct s3c_rtc *info)
+ unsigned long irq_flags;
+
+ spin_lock_irqsave(&info->alarm_clk_lock, irq_flags);
+- if (info->data->needs_src_clk)
+- clk_disable(info->rtc_src_clk);
+- clk_disable(info->rtc_clk);
++ if (!info->clk_disabled) {
++ if (info->data->needs_src_clk)
++ clk_disable(info->rtc_src_clk);
++ clk_disable(info->rtc_clk);
++ info->clk_disabled = true;
++ }
+ spin_unlock_irqrestore(&info->alarm_clk_lock, irq_flags);
+ }
+
+@@ -128,6 +135,11 @@ static int s3c_rtc_setaie(struct device *dev, unsigned int enabled)
+
+ s3c_rtc_disable_clk(info);
+
++ if (enabled)
++ s3c_rtc_enable_clk(info);
++ else
++ s3c_rtc_disable_clk(info);
++
+ return 0;
+ }
+
+diff --git a/drivers/rtc/rtc-s5m.c b/drivers/rtc/rtc-s5m.c
+index 8c70d785ba73..ab60287ee72d 100644
+--- a/drivers/rtc/rtc-s5m.c
++++ b/drivers/rtc/rtc-s5m.c
+@@ -635,6 +635,16 @@ static int s5m8767_rtc_init_reg(struct s5m_rtc_info *info)
+ case S2MPS13X:
+ data[0] = (0 << BCD_EN_SHIFT) | (1 << MODEL24_SHIFT);
+ ret = regmap_write(info->regmap, info->regs->ctrl, data[0]);
++ if (ret < 0)
++ break;
++
++ /*
++ * Should set WUDR & (RUDR or AUDR) bits to high after writing
++ * RTC_CTRL register like writing Alarm registers. We can't find
++ * the description from datasheet but vendor code does that
++ * really.
++ */
++ ret = s5m8767_rtc_set_alarm_reg(info);
+ break;
+
+ default:
+diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
+index f5021fcb154e..089e7f8543a5 100644
+--- a/fs/btrfs/transaction.c
++++ b/fs/btrfs/transaction.c
+@@ -1893,8 +1893,11 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
+ spin_unlock(&root->fs_info->trans_lock);
+
+ wait_for_commit(root, prev_trans);
++ ret = prev_trans->aborted;
+
+ btrfs_put_transaction(prev_trans);
++ if (ret)
++ goto cleanup_transaction;
+ } else {
+ spin_unlock(&root->fs_info->trans_lock);
+ }
+diff --git a/fs/cifs/ioctl.c b/fs/cifs/ioctl.c
+index 49b8b6e41a18..c7b84f3bf6ad 100644
+--- a/fs/cifs/ioctl.c
++++ b/fs/cifs/ioctl.c
+@@ -70,6 +70,12 @@ static long cifs_ioctl_clone(unsigned int xid, struct file *dst_file,
+ goto out_drop_write;
+ }
+
++ if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
++ rc = -EBADF;
++ cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
++ goto out_fput;
++ }
++
+ if ((!src_file.file->private_data) || (!dst_file->private_data)) {
+ rc = -EBADF;
+ cifs_dbg(VFS, "missing cifsFileInfo on copy range src file\n");
+diff --git a/fs/coredump.c b/fs/coredump.c
+index c5ecde6f3eed..a8f75640ac86 100644
+--- a/fs/coredump.c
++++ b/fs/coredump.c
+@@ -513,10 +513,10 @@ void do_coredump(const siginfo_t *siginfo)
+ const struct cred *old_cred;
+ struct cred *cred;
+ int retval = 0;
+- int flag = 0;
+ int ispipe;
+ struct files_struct *displaced;
+- bool need_nonrelative = false;
++ /* require nonrelative corefile path and be extra careful */
++ bool need_suid_safe = false;
+ bool core_dumped = false;
+ static atomic_t core_dump_count = ATOMIC_INIT(0);
+ struct coredump_params cprm = {
+@@ -550,9 +550,8 @@ void do_coredump(const siginfo_t *siginfo)
+ */
+ if (__get_dumpable(cprm.mm_flags) == SUID_DUMP_ROOT) {
+ /* Setuid core dump mode */
+- flag = O_EXCL; /* Stop rewrite attacks */
+ cred->fsuid = GLOBAL_ROOT_UID; /* Dump root private */
+- need_nonrelative = true;
++ need_suid_safe = true;
+ }
+
+ retval = coredump_wait(siginfo->si_signo, &core_state);
+@@ -633,7 +632,7 @@ void do_coredump(const siginfo_t *siginfo)
+ if (cprm.limit < binfmt->min_coredump)
+ goto fail_unlock;
+
+- if (need_nonrelative && cn.corename[0] != '/') {
++ if (need_suid_safe && cn.corename[0] != '/') {
+ printk(KERN_WARNING "Pid %d(%s) can only dump core "\
+ "to fully qualified path!\n",
+ task_tgid_vnr(current), current->comm);
+@@ -641,8 +640,35 @@ void do_coredump(const siginfo_t *siginfo)
+ goto fail_unlock;
+ }
+
++ /*
++ * Unlink the file if it exists unless this is a SUID
++ * binary - in that case, we're running around with root
++ * privs and don't want to unlink another user's coredump.
++ */
++ if (!need_suid_safe) {
++ mm_segment_t old_fs;
++
++ old_fs = get_fs();
++ set_fs(KERNEL_DS);
++ /*
++ * If it doesn't exist, that's fine. If there's some
++ * other problem, we'll catch it at the filp_open().
++ */
++ (void) sys_unlink((const char __user *)cn.corename);
++ set_fs(old_fs);
++ }
++
++ /*
++ * There is a race between unlinking and creating the
++ * file, but if that causes an EEXIST here, that's
++ * fine - another process raced with us while creating
++ * the corefile, and the other process won. To userspace,
++ * what matters is that at least one of the two processes
++ * writes its coredump successfully, not which one.
++ */
+ cprm.file = filp_open(cn.corename,
+- O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
++ O_CREAT | 2 | O_NOFOLLOW |
++ O_LARGEFILE | O_EXCL,
+ 0600);
+ if (IS_ERR(cprm.file))
+ goto fail_unlock;
+@@ -659,11 +685,15 @@ void do_coredump(const siginfo_t *siginfo)
+ if (!S_ISREG(inode->i_mode))
+ goto close_fail;
+ /*
+- * Dont allow local users get cute and trick others to coredump
+- * into their pre-created files.
++ * Don't dump core if the filesystem changed owner or mode
++ * of the file during file creation. This is an issue when
++ * a process dumps core while its cwd is e.g. on a vfat
++ * filesystem.
+ */
+ if (!uid_eq(inode->i_uid, current_fsuid()))
+ goto close_fail;
++ if ((inode->i_mode & 0677) != 0600)
++ goto close_fail;
+ if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
+ goto close_fail;
+ if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
+diff --git a/fs/ecryptfs/dentry.c b/fs/ecryptfs/dentry.c
+index 8db0b464483f..63cd2c147221 100644
+--- a/fs/ecryptfs/dentry.c
++++ b/fs/ecryptfs/dentry.c
+@@ -45,20 +45,20 @@
+ static int ecryptfs_d_revalidate(struct dentry *dentry, unsigned int flags)
+ {
+ struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
+- int rc;
+-
+- if (!(lower_dentry->d_flags & DCACHE_OP_REVALIDATE))
+- return 1;
++ int rc = 1;
+
+ if (flags & LOOKUP_RCU)
+ return -ECHILD;
+
+- rc = lower_dentry->d_op->d_revalidate(lower_dentry, flags);
++ if (lower_dentry->d_flags & DCACHE_OP_REVALIDATE)
++ rc = lower_dentry->d_op->d_revalidate(lower_dentry, flags);
++
+ if (d_really_is_positive(dentry)) {
+- struct inode *lower_inode =
+- ecryptfs_inode_to_lower(d_inode(dentry));
++ struct inode *inode = d_inode(dentry);
+
+- fsstack_copy_attr_all(d_inode(dentry), lower_inode);
++ fsstack_copy_attr_all(inode, ecryptfs_inode_to_lower(inode));
++ if (!inode->i_nlink)
++ return 0;
+ }
+ return rc;
+ }
+diff --git a/fs/ext4/super.c b/fs/ext4/super.c
+index 9981064c4a54..a5e8c744e962 100644
+--- a/fs/ext4/super.c
++++ b/fs/ext4/super.c
+@@ -325,6 +325,22 @@ static void save_error_info(struct super_block *sb, const char *func,
+ ext4_commit_super(sb, 1);
+ }
+
++/*
++ * The del_gendisk() function uninitializes the disk-specific data
++ * structures, including the bdi structure, without telling anyone
++ * else. Once this happens, any attempt to call mark_buffer_dirty()
++ * (for example, by ext4_commit_super), will cause a kernel OOPS.
++ * This is a kludge to prevent these oops until we can put in a proper
++ * hook in del_gendisk() to inform the VFS and file system layers.
++ */
++static int block_device_ejected(struct super_block *sb)
++{
++ struct inode *bd_inode = sb->s_bdev->bd_inode;
++ struct backing_dev_info *bdi = inode_to_bdi(bd_inode);
++
++ return bdi->dev == NULL;
++}
++
+ static void ext4_journal_commit_callback(journal_t *journal, transaction_t *txn)
+ {
+ struct super_block *sb = journal->j_private;
+@@ -4617,7 +4633,7 @@ static int ext4_commit_super(struct super_block *sb, int sync)
+ struct buffer_head *sbh = EXT4_SB(sb)->s_sbh;
+ int error = 0;
+
+- if (!sbh)
++ if (!sbh || block_device_ejected(sb))
+ return error;
+ if (buffer_write_io_error(sbh)) {
+ /*
+@@ -4833,10 +4849,11 @@ static int ext4_freeze(struct super_block *sb)
+ error = jbd2_journal_flush(journal);
+ if (error < 0)
+ goto out;
++
++ /* Journal blocked and flushed, clear needs_recovery flag. */
++ EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
+ }
+
+- /* Journal blocked and flushed, clear needs_recovery flag. */
+- EXT4_CLEAR_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
+ error = ext4_commit_super(sb, 1);
+ out:
+ if (journal)
+@@ -4854,8 +4871,11 @@ static int ext4_unfreeze(struct super_block *sb)
+ if (sb->s_flags & MS_RDONLY)
+ return 0;
+
+- /* Reset the needs_recovery flag before the fs is unlocked. */
+- EXT4_SET_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
++ if (EXT4_SB(sb)->s_journal) {
++ /* Reset the needs_recovery flag before the fs is unlocked. */
++ EXT4_SET_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER);
++ }
++
+ ext4_commit_super(sb, 1);
+ return 0;
+ }
+diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c
+index d3fa6bd9503e..221719eac5de 100644
+--- a/fs/hfs/bnode.c
++++ b/fs/hfs/bnode.c
+@@ -288,7 +288,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
+ page_cache_release(page);
+ goto fail;
+ }
+- page_cache_release(page);
+ node->page[i] = page;
+ }
+
+@@ -398,11 +397,11 @@ node_error:
+
+ void hfs_bnode_free(struct hfs_bnode *node)
+ {
+- //int i;
++ int i;
+
+- //for (i = 0; i < node->tree->pages_per_bnode; i++)
+- // if (node->page[i])
+- // page_cache_release(node->page[i]);
++ for (i = 0; i < node->tree->pages_per_bnode; i++)
++ if (node->page[i])
++ page_cache_release(node->page[i]);
+ kfree(node);
+ }
+
+diff --git a/fs/hfs/brec.c b/fs/hfs/brec.c
+index 9f4ee7f52026..6fc766df0461 100644
+--- a/fs/hfs/brec.c
++++ b/fs/hfs/brec.c
+@@ -131,13 +131,16 @@ skip:
+ hfs_bnode_write(node, entry, data_off + key_len, entry_len);
+ hfs_bnode_dump(node);
+
+- if (new_node) {
+- /* update parent key if we inserted a key
+- * at the start of the first node
+- */
+- if (!rec && new_node != node)
+- hfs_brec_update_parent(fd);
++ /*
++ * update parent key if we inserted a key
++ * at the start of the node and it is not the new node
++ */
++ if (!rec && new_node != node) {
++ hfs_bnode_read_key(node, fd->search_key, data_off + size);
++ hfs_brec_update_parent(fd);
++ }
+
++ if (new_node) {
+ hfs_bnode_put(fd->bnode);
+ if (!new_node->parent) {
+ hfs_btree_inc_height(tree);
+@@ -166,9 +169,6 @@ skip:
+ goto again;
+ }
+
+- if (!rec)
+- hfs_brec_update_parent(fd);
+-
+ return 0;
+ }
+
+@@ -366,6 +366,8 @@ again:
+ if (IS_ERR(parent))
+ return PTR_ERR(parent);
+ __hfs_brec_find(parent, fd);
++ if (fd->record < 0)
++ return -ENOENT;
+ hfs_bnode_dump(parent);
+ rec = fd->record;
+
+diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c
+index 759708fd9331..63924662aaf3 100644
+--- a/fs/hfsplus/bnode.c
++++ b/fs/hfsplus/bnode.c
+@@ -454,7 +454,6 @@ static struct hfs_bnode *__hfs_bnode_create(struct hfs_btree *tree, u32 cnid)
+ page_cache_release(page);
+ goto fail;
+ }
+- page_cache_release(page);
+ node->page[i] = page;
+ }
+
+@@ -566,13 +565,11 @@ node_error:
+
+ void hfs_bnode_free(struct hfs_bnode *node)
+ {
+-#if 0
+ int i;
+
+ for (i = 0; i < node->tree->pages_per_bnode; i++)
+ if (node->page[i])
+ page_cache_release(node->page[i]);
+-#endif
+ kfree(node);
+ }
+
+diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
+index 4227dc4f7437..8c44654ce274 100644
+--- a/fs/jbd2/checkpoint.c
++++ b/fs/jbd2/checkpoint.c
+@@ -417,12 +417,12 @@ int jbd2_cleanup_journal_tail(journal_t *journal)
+ * journal_clean_one_cp_list
+ *
+ * Find all the written-back checkpoint buffers in the given list and
+- * release them.
++ * release them. If 'destroy' is set, clean all buffers unconditionally.
+ *
+ * Called with j_list_lock held.
+ * Returns 1 if we freed the transaction, 0 otherwise.
+ */
+-static int journal_clean_one_cp_list(struct journal_head *jh)
++static int journal_clean_one_cp_list(struct journal_head *jh, bool destroy)
+ {
+ struct journal_head *last_jh;
+ struct journal_head *next_jh = jh;
+@@ -436,7 +436,10 @@ static int journal_clean_one_cp_list(struct journal_head *jh)
+ do {
+ jh = next_jh;
+ next_jh = jh->b_cpnext;
+- ret = __try_to_free_cp_buf(jh);
++ if (!destroy)
++ ret = __try_to_free_cp_buf(jh);
++ else
++ ret = __jbd2_journal_remove_checkpoint(jh) + 1;
+ if (!ret)
+ return freed;
+ if (ret == 2)
+@@ -459,10 +462,11 @@ static int journal_clean_one_cp_list(struct journal_head *jh)
+ * journal_clean_checkpoint_list
+ *
+ * Find all the written-back checkpoint buffers in the journal and release them.
++ * If 'destroy' is set, release all buffers unconditionally.
+ *
+ * Called with j_list_lock held.
+ */
+-void __jbd2_journal_clean_checkpoint_list(journal_t *journal)
++void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy)
+ {
+ transaction_t *transaction, *last_transaction, *next_transaction;
+ int ret;
+@@ -476,7 +480,8 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal)
+ do {
+ transaction = next_transaction;
+ next_transaction = transaction->t_cpnext;
+- ret = journal_clean_one_cp_list(transaction->t_checkpoint_list);
++ ret = journal_clean_one_cp_list(transaction->t_checkpoint_list,
++ destroy);
+ /*
+ * This function only frees up some memory if possible so we
+ * dont have an obligation to finish processing. Bail out if
+@@ -492,7 +497,7 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal)
+ * we can possibly see not yet submitted buffers on io_list
+ */
+ ret = journal_clean_one_cp_list(transaction->
+- t_checkpoint_io_list);
++ t_checkpoint_io_list, destroy);
+ if (need_resched())
+ return;
+ /*
+@@ -506,6 +511,28 @@ void __jbd2_journal_clean_checkpoint_list(journal_t *journal)
+ }
+
+ /*
++ * Remove buffers from all checkpoint lists as journal is aborted and we just
++ * need to free memory
++ */
++void jbd2_journal_destroy_checkpoint(journal_t *journal)
++{
++ /*
++ * We loop because __jbd2_journal_clean_checkpoint_list() may abort
++ * early due to a need of rescheduling.
++ */
++ while (1) {
++ spin_lock(&journal->j_list_lock);
++ if (!journal->j_checkpoint_transactions) {
++ spin_unlock(&journal->j_list_lock);
++ break;
++ }
++ __jbd2_journal_clean_checkpoint_list(journal, true);
++ spin_unlock(&journal->j_list_lock);
++ cond_resched();
++ }
++}
++
++/*
+ * journal_remove_checkpoint: called after a buffer has been committed
+ * to disk (either by being write-back flushed to disk, or being
+ * committed to the log).
+diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
+index b73e0215baa7..362e5f614450 100644
+--- a/fs/jbd2/commit.c
++++ b/fs/jbd2/commit.c
+@@ -510,7 +510,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
+ * frees some memory
+ */
+ spin_lock(&journal->j_list_lock);
+- __jbd2_journal_clean_checkpoint_list(journal);
++ __jbd2_journal_clean_checkpoint_list(journal, false);
+ spin_unlock(&journal->j_list_lock);
+
+ jbd_debug(3, "JBD2: commit phase 1\n");
+diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
+index 4ff3fad4e9e3..2721513adb1f 100644
+--- a/fs/jbd2/journal.c
++++ b/fs/jbd2/journal.c
+@@ -1693,8 +1693,17 @@ int jbd2_journal_destroy(journal_t *journal)
+ while (journal->j_checkpoint_transactions != NULL) {
+ spin_unlock(&journal->j_list_lock);
+ mutex_lock(&journal->j_checkpoint_mutex);
+- jbd2_log_do_checkpoint(journal);
++ err = jbd2_log_do_checkpoint(journal);
+ mutex_unlock(&journal->j_checkpoint_mutex);
++ /*
++ * If checkpointing failed, just free the buffers to avoid
++ * looping forever
++ */
++ if (err) {
++ jbd2_journal_destroy_checkpoint(journal);
++ spin_lock(&journal->j_list_lock);
++ break;
++ }
+ spin_lock(&journal->j_list_lock);
+ }
+
+diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
+index b3289d701eea..14e3b1e1b17d 100644
+--- a/fs/nfs/flexfilelayout/flexfilelayout.c
++++ b/fs/nfs/flexfilelayout/flexfilelayout.c
+@@ -1199,6 +1199,11 @@ static int ff_layout_write_done_cb(struct rpc_task *task,
+ hdr->res.verf->committed == NFS_DATA_SYNC)
+ ff_layout_set_layoutcommit(hdr);
+
++ /* zero out fattr since we don't care DS attr at all */
++ hdr->fattr.valid = 0;
++ if (task->tk_status >= 0)
++ nfs_writeback_update_inode(hdr);
++
+ return 0;
+ }
+
+diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+index f13e1969eedd..b28fa4cbea52 100644
+--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
++++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+@@ -500,16 +500,19 @@ int ff_layout_encode_ds_ioerr(struct nfs4_flexfile_layout *flo,
+ range->offset, range->length))
+ continue;
+ /* offset(8) + length(8) + stateid(NFS4_STATEID_SIZE)
+- * + deviceid(NFS4_DEVICEID4_SIZE) + status(4) + opnum(4)
++ * + array length + deviceid(NFS4_DEVICEID4_SIZE)
++ * + status(4) + opnum(4)
+ */
+ p = xdr_reserve_space(xdr,
+- 24 + NFS4_STATEID_SIZE + NFS4_DEVICEID4_SIZE);
++ 28 + NFS4_STATEID_SIZE + NFS4_DEVICEID4_SIZE);
+ if (unlikely(!p))
+ return -ENOBUFS;
+ p = xdr_encode_hyper(p, err->offset);
+ p = xdr_encode_hyper(p, err->length);
+ p = xdr_encode_opaque_fixed(p, &err->stateid,
+ NFS4_STATEID_SIZE);
++ /* Encode 1 error */
++ *p++ = cpu_to_be32(1);
+ p = xdr_encode_opaque_fixed(p, &err->deviceid,
+ NFS4_DEVICEID4_SIZE);
+ *p++ = cpu_to_be32(err->status);
+diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
+index 0adc7d245b3d..4afbe13321cb 100644
+--- a/fs/nfs/inode.c
++++ b/fs/nfs/inode.c
+@@ -1273,13 +1273,6 @@ static int nfs_check_inode_attributes(struct inode *inode, struct nfs_fattr *fat
+ return 0;
+ }
+
+-static int nfs_ctime_need_update(const struct inode *inode, const struct nfs_fattr *fattr)
+-{
+- if (!(fattr->valid & NFS_ATTR_FATTR_CTIME))
+- return 0;
+- return timespec_compare(&fattr->ctime, &inode->i_ctime) > 0;
+-}
+-
+ static atomic_long_t nfs_attr_generation_counter;
+
+ static unsigned long nfs_read_attr_generation_counter(void)
+@@ -1428,7 +1421,6 @@ static int nfs_inode_attrs_need_update(const struct inode *inode, const struct n
+ const struct nfs_inode *nfsi = NFS_I(inode);
+
+ return ((long)fattr->gencount - (long)nfsi->attr_gencount) > 0 ||
+- nfs_ctime_need_update(inode, fattr) ||
+ ((long)nfsi->attr_gencount - (long)nfs_read_attr_generation_counter() > 0);
+ }
+
+@@ -1491,6 +1483,13 @@ static int nfs_post_op_update_inode_locked(struct inode *inode, struct nfs_fattr
+ {
+ unsigned long invalid = NFS_INO_INVALID_ATTR|NFS_INO_REVAL_PAGECACHE;
+
++ /*
++ * Don't revalidate the pagecache if we hold a delegation, but do
++ * force an attribute update
++ */
++ if (NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
++ invalid = NFS_INO_INVALID_ATTR|NFS_INO_REVAL_FORCED;
++
+ if (S_ISDIR(inode->i_mode))
+ invalid |= NFS_INO_INVALID_DATA;
+ nfs_set_cache_invalid(inode, invalid);
+diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
+index 9b372b845f6a..1dad18105ed0 100644
+--- a/fs/nfs/internal.h
++++ b/fs/nfs/internal.h
+@@ -490,6 +490,9 @@ void nfs_retry_commit(struct list_head *page_list,
+ void nfs_commitdata_release(struct nfs_commit_data *data);
+ void nfs_request_add_commit_list(struct nfs_page *req, struct list_head *dst,
+ struct nfs_commit_info *cinfo);
++void nfs_request_add_commit_list_locked(struct nfs_page *req,
++ struct list_head *dst,
++ struct nfs_commit_info *cinfo);
+ void nfs_request_remove_commit_list(struct nfs_page *req,
+ struct nfs_commit_info *cinfo);
+ void nfs_init_cinfo(struct nfs_commit_info *cinfo,
+@@ -623,13 +626,15 @@ void nfs_super_set_maxbytes(struct super_block *sb, __u64 maxfilesize)
+ * Record the page as unstable and mark its inode as dirty.
+ */
+ static inline
+-void nfs_mark_page_unstable(struct page *page)
++void nfs_mark_page_unstable(struct page *page, struct nfs_commit_info *cinfo)
+ {
+- struct inode *inode = page_file_mapping(page)->host;
++ if (!cinfo->dreq) {
++ struct inode *inode = page_file_mapping(page)->host;
+
+- inc_zone_page_state(page, NR_UNSTABLE_NFS);
+- inc_wb_stat(&inode_to_bdi(inode)->wb, WB_RECLAIMABLE);
+- __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
++ inc_zone_page_state(page, NR_UNSTABLE_NFS);
++ inc_wb_stat(&inode_to_bdi(inode)->wb, WB_RECLAIMABLE);
++ __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
++ }
+ }
+
+ /*
+diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
+index 3acb1eb72930..73c8204ad463 100644
+--- a/fs/nfs/nfs4proc.c
++++ b/fs/nfs/nfs4proc.c
+@@ -1156,6 +1156,8 @@ static int can_open_delegated(struct nfs_delegation *delegation, fmode_t fmode)
+ return 0;
+ if ((delegation->type & fmode) != fmode)
+ return 0;
++ if (test_bit(NFS_DELEGATION_NEED_RECLAIM, &delegation->flags))
++ return 0;
+ if (test_bit(NFS_DELEGATION_RETURNING, &delegation->flags))
+ return 0;
+ nfs_mark_delegation_referenced(delegation);
+@@ -1220,6 +1222,7 @@ static void nfs_resync_open_stateid_locked(struct nfs4_state *state)
+ }
+
+ static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
++ nfs4_stateid *arg_stateid,
+ nfs4_stateid *stateid, fmode_t fmode)
+ {
+ clear_bit(NFS_O_RDWR_STATE, &state->flags);
+@@ -1238,8 +1241,9 @@ static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
+ if (stateid == NULL)
+ return;
+ /* Handle races with OPEN */
+- if (!nfs4_stateid_match_other(stateid, &state->open_stateid) ||
+- !nfs4_stateid_is_newer(stateid, &state->open_stateid)) {
++ if (!nfs4_stateid_match_other(arg_stateid, &state->open_stateid) ||
++ (nfs4_stateid_match_other(stateid, &state->open_stateid) &&
++ !nfs4_stateid_is_newer(stateid, &state->open_stateid))) {
+ nfs_resync_open_stateid_locked(state);
+ return;
+ }
+@@ -1248,10 +1252,12 @@ static void nfs_clear_open_stateid_locked(struct nfs4_state *state,
+ nfs4_stateid_copy(&state->open_stateid, stateid);
+ }
+
+-static void nfs_clear_open_stateid(struct nfs4_state *state, nfs4_stateid *stateid, fmode_t fmode)
++static void nfs_clear_open_stateid(struct nfs4_state *state,
++ nfs4_stateid *arg_stateid,
++ nfs4_stateid *stateid, fmode_t fmode)
+ {
+ write_seqlock(&state->seqlock);
+- nfs_clear_open_stateid_locked(state, stateid, fmode);
++ nfs_clear_open_stateid_locked(state, arg_stateid, stateid, fmode);
+ write_sequnlock(&state->seqlock);
+ if (test_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags))
+ nfs4_schedule_state_manager(state->owner->so_server->nfs_client);
+@@ -2425,7 +2431,7 @@ static int _nfs4_do_open(struct inode *dir,
+ goto err_free_label;
+ state = ctx->state;
+
+- if ((opendata->o_arg.open_flags & O_EXCL) &&
++ if ((opendata->o_arg.open_flags & (O_CREAT|O_EXCL)) == (O_CREAT|O_EXCL) &&
+ (opendata->o_arg.createmode != NFS4_CREATE_GUARDED)) {
+ nfs4_exclusive_attrset(opendata, sattr);
+
+@@ -2684,7 +2690,8 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
+ goto out_release;
+ }
+ }
+- nfs_clear_open_stateid(state, res_stateid, calldata->arg.fmode);
++ nfs_clear_open_stateid(state, &calldata->arg.stateid,
++ res_stateid, calldata->arg.fmode);
+ out_release:
+ nfs_release_seqid(calldata->arg.seqid);
+ nfs_refresh_inode(calldata->inode, calldata->res.fattr);
+@@ -4984,7 +4991,7 @@ nfs4_init_nonuniform_client_string(struct nfs_client *clp)
+ return 0;
+ retry:
+ rcu_read_lock();
+- len = 10 + strlen(clp->cl_ipaddr) + 1 +
++ len = 14 + strlen(clp->cl_ipaddr) + 1 +
+ strlen(rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR)) +
+ 1 +
+ strlen(rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_PROTO)) +
+@@ -8661,6 +8668,7 @@ static const struct nfs4_minor_version_ops nfs_v4_2_minor_ops = {
+ .reboot_recovery_ops = &nfs41_reboot_recovery_ops,
+ .nograce_recovery_ops = &nfs41_nograce_recovery_ops,
+ .state_renewal_ops = &nfs41_state_renewal_ops,
++ .mig_recovery_ops = &nfs41_mig_recovery_ops,
+ };
+ #endif
+
+diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
+index 4984bbe55ff1..7c5718ba625e 100644
+--- a/fs/nfs/pagelist.c
++++ b/fs/nfs/pagelist.c
+@@ -77,8 +77,8 @@ EXPORT_SYMBOL_GPL(nfs_pgheader_init);
+ void nfs_set_pgio_error(struct nfs_pgio_header *hdr, int error, loff_t pos)
+ {
+ spin_lock(&hdr->lock);
+- if (pos < hdr->io_start + hdr->good_bytes) {
+- set_bit(NFS_IOHDR_ERROR, &hdr->flags);
++ if (!test_and_set_bit(NFS_IOHDR_ERROR, &hdr->flags)
++ || pos < hdr->io_start + hdr->good_bytes) {
+ clear_bit(NFS_IOHDR_EOF, &hdr->flags);
+ hdr->good_bytes = pos - hdr->io_start;
+ hdr->error = error;
+diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
+index f37e25b6311c..e5c679f04099 100644
+--- a/fs/nfs/pnfs_nfs.c
++++ b/fs/nfs/pnfs_nfs.c
+@@ -359,26 +359,31 @@ same_sockaddr(struct sockaddr *addr1, struct sockaddr *addr2)
+ return false;
+ }
+
++/*
++ * Checks if 'dsaddrs1' contains a subset of 'dsaddrs2'. If it does,
++ * declare a match.
++ */
+ static bool
+ _same_data_server_addrs_locked(const struct list_head *dsaddrs1,
+ const struct list_head *dsaddrs2)
+ {
+ struct nfs4_pnfs_ds_addr *da1, *da2;
+-
+- /* step through both lists, comparing as we go */
+- for (da1 = list_first_entry(dsaddrs1, typeof(*da1), da_node),
+- da2 = list_first_entry(dsaddrs2, typeof(*da2), da_node);
+- da1 != NULL && da2 != NULL;
+- da1 = list_entry(da1->da_node.next, typeof(*da1), da_node),
+- da2 = list_entry(da2->da_node.next, typeof(*da2), da_node)) {
+- if (!same_sockaddr((struct sockaddr *)&da1->da_addr,
+- (struct sockaddr *)&da2->da_addr))
+- return false;
++ struct sockaddr *sa1, *sa2;
++ bool match = false;
++
++ list_for_each_entry(da1, dsaddrs1, da_node) {
++ sa1 = (struct sockaddr *)&da1->da_addr;
++ match = false;
++ list_for_each_entry(da2, dsaddrs2, da_node) {
++ sa2 = (struct sockaddr *)&da2->da_addr;
++ match = same_sockaddr(sa1, sa2);
++ if (match)
++ break;
++ }
++ if (!match)
++ break;
+ }
+- if (da1 == NULL && da2 == NULL)
+- return true;
+-
+- return false;
++ return match;
+ }
+
+ /*
+@@ -863,9 +868,10 @@ pnfs_layout_mark_request_commit(struct nfs_page *req,
+ }
+ set_bit(PG_COMMIT_TO_DS, &req->wb_flags);
+ cinfo->ds->nwritten++;
+- spin_unlock(cinfo->lock);
+
+- nfs_request_add_commit_list(req, list, cinfo);
++ nfs_request_add_commit_list_locked(req, list, cinfo);
++ spin_unlock(cinfo->lock);
++ nfs_mark_page_unstable(req->wb_page, cinfo);
+ }
+ EXPORT_SYMBOL_GPL(pnfs_layout_mark_request_commit);
+
+diff --git a/fs/nfs/write.c b/fs/nfs/write.c
+index 75a35a1afa79..fdee9270ca15 100644
+--- a/fs/nfs/write.c
++++ b/fs/nfs/write.c
+@@ -768,6 +768,28 @@ nfs_page_search_commits_for_head_request_locked(struct nfs_inode *nfsi,
+ }
+
+ /**
++ * nfs_request_add_commit_list_locked - add request to a commit list
++ * @req: pointer to a struct nfs_page
++ * @dst: commit list head
++ * @cinfo: holds list lock and accounting info
++ *
++ * This sets the PG_CLEAN bit, updates the cinfo count of
++ * number of outstanding requests requiring a commit as well as
++ * the MM page stats.
++ *
++ * The caller must hold the cinfo->lock, and the nfs_page lock.
++ */
++void
++nfs_request_add_commit_list_locked(struct nfs_page *req, struct list_head *dst,
++ struct nfs_commit_info *cinfo)
++{
++ set_bit(PG_CLEAN, &req->wb_flags);
++ nfs_list_add_request(req, dst);
++ cinfo->mds->ncommit++;
++}
++EXPORT_SYMBOL_GPL(nfs_request_add_commit_list_locked);
++
++/**
+ * nfs_request_add_commit_list - add request to a commit list
+ * @req: pointer to a struct nfs_page
+ * @dst: commit list head
+@@ -784,13 +806,10 @@ void
+ nfs_request_add_commit_list(struct nfs_page *req, struct list_head *dst,
+ struct nfs_commit_info *cinfo)
+ {
+- set_bit(PG_CLEAN, &(req)->wb_flags);
+ spin_lock(cinfo->lock);
+- nfs_list_add_request(req, dst);
+- cinfo->mds->ncommit++;
++ nfs_request_add_commit_list_locked(req, dst, cinfo);
+ spin_unlock(cinfo->lock);
+- if (!cinfo->dreq)
+- nfs_mark_page_unstable(req->wb_page);
++ nfs_mark_page_unstable(req->wb_page, cinfo);
+ }
+ EXPORT_SYMBOL_GPL(nfs_request_add_commit_list);
+
+diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
+index 95202719a1fd..75189cd34583 100644
+--- a/fs/nfsd/nfs4state.c
++++ b/fs/nfsd/nfs4state.c
+@@ -777,13 +777,16 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
+ list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
+ }
+
+-static void
++static bool
+ unhash_delegation_locked(struct nfs4_delegation *dp)
+ {
+ struct nfs4_file *fp = dp->dl_stid.sc_file;
+
+ lockdep_assert_held(&state_lock);
+
++ if (list_empty(&dp->dl_perfile))
++ return false;
++
+ dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
+ /* Ensure that deleg break won't try to requeue it */
+ ++dp->dl_time;
+@@ -792,16 +795,21 @@ unhash_delegation_locked(struct nfs4_delegation *dp)
+ list_del_init(&dp->dl_recall_lru);
+ list_del_init(&dp->dl_perfile);
+ spin_unlock(&fp->fi_lock);
++ return true;
+ }
+
+ static void destroy_delegation(struct nfs4_delegation *dp)
+ {
++ bool unhashed;
++
+ spin_lock(&state_lock);
+- unhash_delegation_locked(dp);
++ unhashed = unhash_delegation_locked(dp);
+ spin_unlock(&state_lock);
+- put_clnt_odstate(dp->dl_clnt_odstate);
+- nfs4_put_deleg_lease(dp->dl_stid.sc_file);
+- nfs4_put_stid(&dp->dl_stid);
++ if (unhashed) {
++ put_clnt_odstate(dp->dl_clnt_odstate);
++ nfs4_put_deleg_lease(dp->dl_stid.sc_file);
++ nfs4_put_stid(&dp->dl_stid);
++ }
+ }
+
+ static void revoke_delegation(struct nfs4_delegation *dp)
+@@ -1004,16 +1012,20 @@ static void nfs4_put_stateowner(struct nfs4_stateowner *sop)
+ sop->so_ops->so_free(sop);
+ }
+
+-static void unhash_ol_stateid(struct nfs4_ol_stateid *stp)
++static bool unhash_ol_stateid(struct nfs4_ol_stateid *stp)
+ {
+ struct nfs4_file *fp = stp->st_stid.sc_file;
+
+ lockdep_assert_held(&stp->st_stateowner->so_client->cl_lock);
+
++ if (list_empty(&stp->st_perfile))
++ return false;
++
+ spin_lock(&fp->fi_lock);
+- list_del(&stp->st_perfile);
++ list_del_init(&stp->st_perfile);
+ spin_unlock(&fp->fi_lock);
+ list_del(&stp->st_perstateowner);
++ return true;
+ }
+
+ static void nfs4_free_ol_stateid(struct nfs4_stid *stid)
+@@ -1063,25 +1075,27 @@ static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
+ list_add(&stp->st_locks, reaplist);
+ }
+
+-static void unhash_lock_stateid(struct nfs4_ol_stateid *stp)
++static bool unhash_lock_stateid(struct nfs4_ol_stateid *stp)
+ {
+ struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
+
+ lockdep_assert_held(&oo->oo_owner.so_client->cl_lock);
+
+ list_del_init(&stp->st_locks);
+- unhash_ol_stateid(stp);
+ nfs4_unhash_stid(&stp->st_stid);
++ return unhash_ol_stateid(stp);
+ }
+
+ static void release_lock_stateid(struct nfs4_ol_stateid *stp)
+ {
+ struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
++ bool unhashed;
+
+ spin_lock(&oo->oo_owner.so_client->cl_lock);
+- unhash_lock_stateid(stp);
++ unhashed = unhash_lock_stateid(stp);
+ spin_unlock(&oo->oo_owner.so_client->cl_lock);
+- nfs4_put_stid(&stp->st_stid);
++ if (unhashed)
++ nfs4_put_stid(&stp->st_stid);
+ }
+
+ static void unhash_lockowner_locked(struct nfs4_lockowner *lo)
+@@ -1129,7 +1143,7 @@ static void release_lockowner(struct nfs4_lockowner *lo)
+ while (!list_empty(&lo->lo_owner.so_stateids)) {
+ stp = list_first_entry(&lo->lo_owner.so_stateids,
+ struct nfs4_ol_stateid, st_perstateowner);
+- unhash_lock_stateid(stp);
++ WARN_ON(!unhash_lock_stateid(stp));
+ put_ol_stateid_locked(stp, &reaplist);
+ }
+ spin_unlock(&clp->cl_lock);
+@@ -1142,21 +1156,26 @@ static void release_open_stateid_locks(struct nfs4_ol_stateid *open_stp,
+ {
+ struct nfs4_ol_stateid *stp;
+
++ lockdep_assert_held(&open_stp->st_stid.sc_client->cl_lock);
++
+ while (!list_empty(&open_stp->st_locks)) {
+ stp = list_entry(open_stp->st_locks.next,
+ struct nfs4_ol_stateid, st_locks);
+- unhash_lock_stateid(stp);
++ WARN_ON(!unhash_lock_stateid(stp));
+ put_ol_stateid_locked(stp, reaplist);
+ }
+ }
+
+-static void unhash_open_stateid(struct nfs4_ol_stateid *stp,
++static bool unhash_open_stateid(struct nfs4_ol_stateid *stp,
+ struct list_head *reaplist)
+ {
++ bool unhashed;
++
+ lockdep_assert_held(&stp->st_stid.sc_client->cl_lock);
+
+- unhash_ol_stateid(stp);
++ unhashed = unhash_ol_stateid(stp);
+ release_open_stateid_locks(stp, reaplist);
++ return unhashed;
+ }
+
+ static void release_open_stateid(struct nfs4_ol_stateid *stp)
+@@ -1164,8 +1183,8 @@ static void release_open_stateid(struct nfs4_ol_stateid *stp)
+ LIST_HEAD(reaplist);
+
+ spin_lock(&stp->st_stid.sc_client->cl_lock);
+- unhash_open_stateid(stp, &reaplist);
+- put_ol_stateid_locked(stp, &reaplist);
++ if (unhash_open_stateid(stp, &reaplist))
++ put_ol_stateid_locked(stp, &reaplist);
+ spin_unlock(&stp->st_stid.sc_client->cl_lock);
+ free_ol_stateid_reaplist(&reaplist);
+ }
+@@ -1210,8 +1229,8 @@ static void release_openowner(struct nfs4_openowner *oo)
+ while (!list_empty(&oo->oo_owner.so_stateids)) {
+ stp = list_first_entry(&oo->oo_owner.so_stateids,
+ struct nfs4_ol_stateid, st_perstateowner);
+- unhash_open_stateid(stp, &reaplist);
+- put_ol_stateid_locked(stp, &reaplist);
++ if (unhash_open_stateid(stp, &reaplist))
++ put_ol_stateid_locked(stp, &reaplist);
+ }
+ spin_unlock(&clp->cl_lock);
+ free_ol_stateid_reaplist(&reaplist);
+@@ -1714,7 +1733,7 @@ __destroy_client(struct nfs4_client *clp)
+ spin_lock(&state_lock);
+ while (!list_empty(&clp->cl_delegations)) {
+ dp = list_entry(clp->cl_delegations.next, struct nfs4_delegation, dl_perclnt);
+- unhash_delegation_locked(dp);
++ WARN_ON(!unhash_delegation_locked(dp));
+ list_add(&dp->dl_recall_lru, &reaplist);
+ }
+ spin_unlock(&state_lock);
+@@ -4345,7 +4364,7 @@ nfs4_laundromat(struct nfsd_net *nn)
+ new_timeo = min(new_timeo, t);
+ break;
+ }
+- unhash_delegation_locked(dp);
++ WARN_ON(!unhash_delegation_locked(dp));
+ list_add(&dp->dl_recall_lru, &reaplist);
+ }
+ spin_unlock(&state_lock);
+@@ -4751,7 +4770,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ if (check_for_locks(stp->st_stid.sc_file,
+ lockowner(stp->st_stateowner)))
+ break;
+- unhash_lock_stateid(stp);
++ WARN_ON(!unhash_lock_stateid(stp));
+ spin_unlock(&cl->cl_lock);
+ nfs4_put_stid(s);
+ ret = nfs_ok;
+@@ -4967,20 +4986,23 @@ out:
+ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
+ {
+ struct nfs4_client *clp = s->st_stid.sc_client;
++ bool unhashed;
+ LIST_HEAD(reaplist);
+
+ s->st_stid.sc_type = NFS4_CLOSED_STID;
+ spin_lock(&clp->cl_lock);
+- unhash_open_stateid(s, &reaplist);
++ unhashed = unhash_open_stateid(s, &reaplist);
+
+ if (clp->cl_minorversion) {
+- put_ol_stateid_locked(s, &reaplist);
++ if (unhashed)
++ put_ol_stateid_locked(s, &reaplist);
+ spin_unlock(&clp->cl_lock);
+ free_ol_stateid_reaplist(&reaplist);
+ } else {
+ spin_unlock(&clp->cl_lock);
+ free_ol_stateid_reaplist(&reaplist);
+- move_to_close_lru(s, clp->net);
++ if (unhashed)
++ move_to_close_lru(s, clp->net);
+ }
+ }
+
+@@ -6019,7 +6041,7 @@ nfsd_inject_add_lock_to_list(struct nfs4_ol_stateid *lst,
+
+ static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
+ struct list_head *collect,
+- void (*func)(struct nfs4_ol_stateid *))
++ bool (*func)(struct nfs4_ol_stateid *))
+ {
+ struct nfs4_openowner *oop;
+ struct nfs4_ol_stateid *stp, *st_next;
+@@ -6033,9 +6055,9 @@ static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
+ list_for_each_entry_safe(lst, lst_next,
+ &stp->st_locks, st_locks) {
+ if (func) {
+- func(lst);
+- nfsd_inject_add_lock_to_list(lst,
+- collect);
++ if (func(lst))
++ nfsd_inject_add_lock_to_list(lst,
++ collect);
+ }
+ ++count;
+ /*
+@@ -6305,7 +6327,7 @@ static u64 nfsd_find_all_delegations(struct nfs4_client *clp, u64 max,
+ continue;
+
+ atomic_inc(&clp->cl_refcount);
+- unhash_delegation_locked(dp);
++ WARN_ON(!unhash_delegation_locked(dp));
+ list_add(&dp->dl_recall_lru, victims);
+ }
+ ++count;
+@@ -6635,7 +6657,7 @@ nfs4_state_shutdown_net(struct net *net)
+ spin_lock(&state_lock);
+ list_for_each_safe(pos, next, &nn->del_recall_lru) {
+ dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru);
+- unhash_delegation_locked(dp);
++ WARN_ON(!unhash_delegation_locked(dp));
+ list_add(&dp->dl_recall_lru, &reaplist);
+ }
+ spin_unlock(&state_lock);
+diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
+index 75e0563c09d1..b81f725ee21d 100644
+--- a/fs/nfsd/nfs4xdr.c
++++ b/fs/nfsd/nfs4xdr.c
+@@ -2140,6 +2140,27 @@ nfsd4_encode_aclname(struct xdr_stream *xdr, struct svc_rqst *rqstp,
+ return nfsd4_encode_user(xdr, rqstp, ace->who_uid);
+ }
+
++static inline __be32
++nfsd4_encode_layout_type(struct xdr_stream *xdr, enum pnfs_layouttype layout_type)
++{
++ __be32 *p;
++
++ if (layout_type) {
++ p = xdr_reserve_space(xdr, 8);
++ if (!p)
++ return nfserr_resource;
++ *p++ = cpu_to_be32(1);
++ *p++ = cpu_to_be32(layout_type);
++ } else {
++ p = xdr_reserve_space(xdr, 4);
++ if (!p)
++ return nfserr_resource;
++ *p++ = cpu_to_be32(0);
++ }
++
++ return 0;
++}
++
+ #define WORD0_ABSENT_FS_ATTRS (FATTR4_WORD0_FS_LOCATIONS | FATTR4_WORD0_FSID | \
+ FATTR4_WORD0_RDATTR_ERROR)
+ #define WORD1_ABSENT_FS_ATTRS FATTR4_WORD1_MOUNTED_ON_FILEID
+@@ -2688,20 +2709,16 @@ out_acl:
+ p = xdr_encode_hyper(p, stat.ino);
+ }
+ #ifdef CONFIG_NFSD_PNFS
+- if ((bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) ||
+- (bmval2 & FATTR4_WORD2_LAYOUT_TYPES)) {
+- if (exp->ex_layout_type) {
+- p = xdr_reserve_space(xdr, 8);
+- if (!p)
+- goto out_resource;
+- *p++ = cpu_to_be32(1);
+- *p++ = cpu_to_be32(exp->ex_layout_type);
+- } else {
+- p = xdr_reserve_space(xdr, 4);
+- if (!p)
+- goto out_resource;
+- *p++ = cpu_to_be32(0);
+- }
++ if (bmval1 & FATTR4_WORD1_FS_LAYOUT_TYPES) {
++ status = nfsd4_encode_layout_type(xdr, exp->ex_layout_type);
++ if (status)
++ goto out;
++ }
++
++ if (bmval2 & FATTR4_WORD2_LAYOUT_TYPES) {
++ status = nfsd4_encode_layout_type(xdr, exp->ex_layout_type);
++ if (status)
++ goto out;
+ }
+
+ if (bmval2 & FATTR4_WORD2_LAYOUT_BLKSIZE) {
+diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
+index edb640ae9a94..eb1cebed3f36 100644
+--- a/include/linux/jbd2.h
++++ b/include/linux/jbd2.h
+@@ -1042,8 +1042,9 @@ void jbd2_update_log_tail(journal_t *journal, tid_t tid, unsigned long block);
+ extern void jbd2_journal_commit_transaction(journal_t *);
+
+ /* Checkpoint list management */
+-void __jbd2_journal_clean_checkpoint_list(journal_t *journal);
++void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
+ int __jbd2_journal_remove_checkpoint(struct journal_head *);
++void jbd2_journal_destroy_checkpoint(journal_t *journal);
+ void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
+
+
+diff --git a/include/linux/platform_data/st_nci.h b/include/linux/platform_data/st_nci.h
+deleted file mode 100644
+index d9d400a297bd..000000000000
+--- a/include/linux/platform_data/st_nci.h
++++ /dev/null
+@@ -1,29 +0,0 @@
+-/*
+- * Driver include for ST NCI NFC chip family.
+- *
+- * Copyright (C) 2014-2015 STMicroelectronics SAS. All rights reserved.
+- *
+- * This program is free software; you can redistribute it and/or modify it
+- * under the terms and conditions of the GNU General Public License,
+- * version 2, as published by the Free Software Foundation.
+- *
+- * This program is distributed in the hope that it will be useful,
+- * but WITHOUT ANY WARRANTY; without even the implied warranty of
+- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+- * GNU General Public License for more details.
+- *
+- * You should have received a copy of the GNU General Public License
+- * along with this program; if not, see <http://www.gnu.org/licenses/>.
+- */
+-
+-#ifndef _ST_NCI_H_
+-#define _ST_NCI_H_
+-
+-#define ST_NCI_DRIVER_NAME "st_nci"
+-
+-struct st_nci_nfc_platform_data {
+- unsigned int gpio_reset;
+- unsigned int irq_polarity;
+-};
+-
+-#endif /* _ST_NCI_H_ */
+diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
+index cb94ee4181d4..4929a8a9fd52 100644
+--- a/include/linux/sunrpc/svc_rdma.h
++++ b/include/linux/sunrpc/svc_rdma.h
+@@ -172,13 +172,6 @@ struct svcxprt_rdma {
+ #define RDMAXPRT_SQ_PENDING 2
+ #define RDMAXPRT_CONN_PENDING 3
+
+-#define RPCRDMA_MAX_SVC_SEGS (64) /* server max scatter/gather */
+-#if RPCSVC_MAXPAYLOAD < (RPCRDMA_MAX_SVC_SEGS << PAGE_SHIFT)
+-#define RPCRDMA_MAXPAYLOAD RPCSVC_MAXPAYLOAD
+-#else
+-#define RPCRDMA_MAXPAYLOAD (RPCRDMA_MAX_SVC_SEGS << PAGE_SHIFT)
+-#endif
+-
+ #define RPCRDMA_LISTEN_BACKLOG 10
+ /* The default ORD value is based on two outstanding full-size writes with a
+ * page size of 4k, or 32k * 2 ops / 4k = 16 outstanding RDMA_READ. */
+@@ -187,6 +180,8 @@ struct svcxprt_rdma {
+ #define RPCRDMA_MAX_REQUESTS 32
+ #define RPCRDMA_MAX_REQ_SIZE 4096
+
++#define RPCSVC_MAXPAYLOAD_RDMA RPCSVC_MAXPAYLOAD
++
+ /* svc_rdma_marshal.c */
+ extern int svc_rdma_xdr_decode_req(struct rpcrdma_msg **, struct svc_rqst *);
+ extern int svc_rdma_xdr_encode_error(struct svcxprt_rdma *,
+diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h
+index 7591788e9fbf..357e44c1a46b 100644
+--- a/include/linux/sunrpc/xprtsock.h
++++ b/include/linux/sunrpc/xprtsock.h
+@@ -42,6 +42,7 @@ struct sock_xprt {
+ /*
+ * Connection of transports
+ */
++ unsigned long sock_state;
+ struct delayed_work connect_worker;
+ struct sockaddr_storage srcaddr;
+ unsigned short srcport;
+@@ -76,6 +77,8 @@ struct sock_xprt {
+ */
+ #define TCP_RPC_REPLY (1UL << 6)
+
++#define XPRT_SOCK_CONNECTING 1U
++
+ #endif /* __KERNEL__ */
+
+ #endif /* _LINUX_SUNRPC_XPRTSOCK_H */
+diff --git a/include/soc/tegra/mc.h b/include/soc/tegra/mc.h
+index 1ab2813273cd..bf2058690ceb 100644
+--- a/include/soc/tegra/mc.h
++++ b/include/soc/tegra/mc.h
+@@ -66,6 +66,7 @@ struct tegra_smmu_soc {
+ bool supports_round_robin_arbitration;
+ bool supports_request_limit;
+
++ unsigned int num_tlb_lines;
+ unsigned int num_asids;
+
+ const struct tegra_smmu_ops *ops;
+diff --git a/include/sound/hda_i915.h b/include/sound/hda_i915.h
+index adb5ba5cbd9d..ff99140831ba 100644
+--- a/include/sound/hda_i915.h
++++ b/include/sound/hda_i915.h
+@@ -11,7 +11,7 @@ int snd_hdac_get_display_clk(struct hdac_bus *bus);
+ int snd_hdac_i915_init(struct hdac_bus *bus);
+ int snd_hdac_i915_exit(struct hdac_bus *bus);
+ #else
+-static int snd_hdac_set_codec_wakeup(struct hdac_bus *bus, bool enable)
++static inline int snd_hdac_set_codec_wakeup(struct hdac_bus *bus, bool enable)
+ {
+ return 0;
+ }
+diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
+index fd1a02cb3c82..003dca933803 100644
+--- a/include/trace/events/sunrpc.h
++++ b/include/trace/events/sunrpc.h
+@@ -529,18 +529,21 @@ TRACE_EVENT(svc_xprt_do_enqueue,
+
+ TP_STRUCT__entry(
+ __field(struct svc_xprt *, xprt)
+- __field(struct svc_rqst *, rqst)
++ __field_struct(struct sockaddr_storage, ss)
++ __field(int, pid)
++ __field(unsigned long, flags)
+ ),
+
+ TP_fast_assign(
+ __entry->xprt = xprt;
+- __entry->rqst = rqst;
++ xprt ? memcpy(&__entry->ss, &xprt->xpt_remote, sizeof(__entry->ss)) : memset(&__entry->ss, 0, sizeof(__entry->ss));
++ __entry->pid = rqst? rqst->rq_task->pid : 0;
++ __entry->flags = xprt ? xprt->xpt_flags : 0;
+ ),
+
+ TP_printk("xprt=0x%p addr=%pIScp pid=%d flags=%s", __entry->xprt,
+- (struct sockaddr *)&__entry->xprt->xpt_remote,
+- __entry->rqst ? __entry->rqst->rq_task->pid : 0,
+- show_svc_xprt_flags(__entry->xprt->xpt_flags))
++ (struct sockaddr *)&__entry->ss,
++ __entry->pid, show_svc_xprt_flags(__entry->flags))
+ );
+
+ TRACE_EVENT(svc_xprt_dequeue,
+@@ -589,16 +592,20 @@ TRACE_EVENT(svc_handle_xprt,
+ TP_STRUCT__entry(
+ __field(struct svc_xprt *, xprt)
+ __field(int, len)
++ __field_struct(struct sockaddr_storage, ss)
++ __field(unsigned long, flags)
+ ),
+
+ TP_fast_assign(
+ __entry->xprt = xprt;
++ xprt ? memcpy(&__entry->ss, &xprt->xpt_remote, sizeof(__entry->ss)) : memset(&__entry->ss, 0, sizeof(__entry->ss));
+ __entry->len = len;
++ __entry->flags = xprt ? xprt->xpt_flags : 0;
+ ),
+
+ TP_printk("xprt=0x%p addr=%pIScp len=%d flags=%s", __entry->xprt,
+- (struct sockaddr *)&__entry->xprt->xpt_remote, __entry->len,
+- show_svc_xprt_flags(__entry->xprt->xpt_flags))
++ (struct sockaddr *)&__entry->ss,
++ __entry->len, show_svc_xprt_flags(__entry->flags))
+ );
+ #endif /* _TRACE_SUNRPC_H */
+
+diff --git a/kernel/fork.c b/kernel/fork.c
+index dbd9b8d7b7cc..26a70dc7a915 100644
+--- a/kernel/fork.c
++++ b/kernel/fork.c
+@@ -1871,13 +1871,21 @@ static int check_unshare_flags(unsigned long unshare_flags)
+ CLONE_NEWUSER|CLONE_NEWPID))
+ return -EINVAL;
+ /*
+- * Not implemented, but pretend it works if there is nothing to
+- * unshare. Note that unsharing CLONE_THREAD or CLONE_SIGHAND
+- * needs to unshare vm.
++ * Not implemented, but pretend it works if there is nothing
++ * to unshare. Note that unsharing the address space or the
++ * signal handlers also need to unshare the signal queues (aka
++ * CLONE_THREAD).
+ */
+ if (unshare_flags & (CLONE_THREAD | CLONE_SIGHAND | CLONE_VM)) {
+- /* FIXME: get_task_mm() increments ->mm_users */
+- if (atomic_read(&current->mm->mm_users) > 1)
++ if (!thread_group_empty(current))
++ return -EINVAL;
++ }
++ if (unshare_flags & (CLONE_SIGHAND | CLONE_VM)) {
++ if (atomic_read(&current->sighand->count) > 1)
++ return -EINVAL;
++ }
++ if (unshare_flags & CLONE_VM) {
++ if (!current_is_single_threaded())
+ return -EINVAL;
+ }
+
+@@ -1946,16 +1954,16 @@ SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
+ if (unshare_flags & CLONE_NEWUSER)
+ unshare_flags |= CLONE_THREAD | CLONE_FS;
+ /*
+- * If unsharing a thread from a thread group, must also unshare vm.
+- */
+- if (unshare_flags & CLONE_THREAD)
+- unshare_flags |= CLONE_VM;
+- /*
+ * If unsharing vm, must also unshare signal handlers.
+ */
+ if (unshare_flags & CLONE_VM)
+ unshare_flags |= CLONE_SIGHAND;
+ /*
++ * If unsharing a signal handlers, must also unshare the signal queues.
++ */
++ if (unshare_flags & CLONE_SIGHAND)
++ unshare_flags |= CLONE_THREAD;
++ /*
+ * If unsharing namespace, must also unshare filesystem information.
+ */
+ if (unshare_flags & CLONE_NEWNS)
+diff --git a/kernel/workqueue.c b/kernel/workqueue.c
+index 4c4f06176f74..a413acb59a07 100644
+--- a/kernel/workqueue.c
++++ b/kernel/workqueue.c
+@@ -2614,7 +2614,7 @@ void flush_workqueue(struct workqueue_struct *wq)
+ out_unlock:
+ mutex_unlock(&wq->mutex);
+ }
+-EXPORT_SYMBOL_GPL(flush_workqueue);
++EXPORT_SYMBOL(flush_workqueue);
+
+ /**
+ * drain_workqueue - drain a workqueue
+diff --git a/lib/decompress_bunzip2.c b/lib/decompress_bunzip2.c
+index 6dd0335ea61b..0234361b24b8 100644
+--- a/lib/decompress_bunzip2.c
++++ b/lib/decompress_bunzip2.c
+@@ -743,12 +743,12 @@ exit_0:
+ }
+
+ #ifdef PREBOOT
+-STATIC int INIT decompress(unsigned char *buf, long len,
++STATIC int INIT __decompress(unsigned char *buf, long len,
+ long (*fill)(void*, unsigned long),
+ long (*flush)(void*, unsigned long),
+- unsigned char *outbuf,
++ unsigned char *outbuf, long olen,
+ long *pos,
+- void(*error)(char *x))
++ void (*error)(char *x))
+ {
+ return bunzip2(buf, len - 4, fill, flush, outbuf, pos, error);
+ }
+diff --git a/lib/decompress_inflate.c b/lib/decompress_inflate.c
+index d4c7891635ec..555c06bf20da 100644
+--- a/lib/decompress_inflate.c
++++ b/lib/decompress_inflate.c
+@@ -1,4 +1,5 @@
+ #ifdef STATIC
++#define PREBOOT
+ /* Pre-boot environment: included */
+
+ /* prevent inclusion of _LINUX_KERNEL_H in pre-boot environment: lots
+@@ -33,23 +34,23 @@ static long INIT nofill(void *buffer, unsigned long len)
+ }
+
+ /* Included from initramfs et al code */
+-STATIC int INIT gunzip(unsigned char *buf, long len,
++STATIC int INIT __gunzip(unsigned char *buf, long len,
+ long (*fill)(void*, unsigned long),
+ long (*flush)(void*, unsigned long),
+- unsigned char *out_buf,
++ unsigned char *out_buf, long out_len,
+ long *pos,
+ void(*error)(char *x)) {
+ u8 *zbuf;
+ struct z_stream_s *strm;
+ int rc;
+- size_t out_len;
+
+ rc = -1;
+ if (flush) {
+ out_len = 0x8000; /* 32 K */
+ out_buf = malloc(out_len);
+ } else {
+- out_len = ((size_t)~0) - (size_t)out_buf; /* no limit */
++ if (!out_len)
++ out_len = ((size_t)~0) - (size_t)out_buf; /* no limit */
+ }
+ if (!out_buf) {
+ error("Out of memory while allocating output buffer");
+@@ -181,4 +182,24 @@ gunzip_nomem1:
+ return rc; /* returns Z_OK (0) if successful */
+ }
+
+-#define decompress gunzip
++#ifndef PREBOOT
++STATIC int INIT gunzip(unsigned char *buf, long len,
++ long (*fill)(void*, unsigned long),
++ long (*flush)(void*, unsigned long),
++ unsigned char *out_buf,
++ long *pos,
++ void (*error)(char *x))
++{
++ return __gunzip(buf, len, fill, flush, out_buf, 0, pos, error);
++}
++#else
++STATIC int INIT __decompress(unsigned char *buf, long len,
++ long (*fill)(void*, unsigned long),
++ long (*flush)(void*, unsigned long),
++ unsigned char *out_buf, long out_len,
++ long *pos,
++ void (*error)(char *x))
++{
++ return __gunzip(buf, len, fill, flush, out_buf, out_len, pos, error);
++}
++#endif
+diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c
+index 40f66ebe57b7..036fc882cd72 100644
+--- a/lib/decompress_unlz4.c
++++ b/lib/decompress_unlz4.c
+@@ -196,12 +196,12 @@ exit_0:
+ }
+
+ #ifdef PREBOOT
+-STATIC int INIT decompress(unsigned char *buf, long in_len,
++STATIC int INIT __decompress(unsigned char *buf, long in_len,
+ long (*fill)(void*, unsigned long),
+ long (*flush)(void*, unsigned long),
+- unsigned char *output,
++ unsigned char *output, long out_len,
+ long *posp,
+- void(*error)(char *x)
++ void (*error)(char *x)
+ )
+ {
+ return unlz4(buf, in_len - 4, fill, flush, output, posp, error);
+diff --git a/lib/decompress_unlzma.c b/lib/decompress_unlzma.c
+index 0be83af62b88..decb64629c14 100644
+--- a/lib/decompress_unlzma.c
++++ b/lib/decompress_unlzma.c
+@@ -667,13 +667,12 @@ exit_0:
+ }
+
+ #ifdef PREBOOT
+-STATIC int INIT decompress(unsigned char *buf, long in_len,
++STATIC int INIT __decompress(unsigned char *buf, long in_len,
+ long (*fill)(void*, unsigned long),
+ long (*flush)(void*, unsigned long),
+- unsigned char *output,
++ unsigned char *output, long out_len,
+ long *posp,
+- void(*error)(char *x)
+- )
++ void (*error)(char *x))
+ {
+ return unlzma(buf, in_len - 4, fill, flush, output, posp, error);
+ }
+diff --git a/lib/decompress_unlzo.c b/lib/decompress_unlzo.c
+index b94a31bdd87d..f4c158e3a022 100644
+--- a/lib/decompress_unlzo.c
++++ b/lib/decompress_unlzo.c
+@@ -31,6 +31,7 @@
+ */
+
+ #ifdef STATIC
++#define PREBOOT
+ #include "lzo/lzo1x_decompress_safe.c"
+ #else
+ #include <linux/decompress/unlzo.h>
+@@ -287,4 +288,14 @@ exit:
+ return ret;
+ }
+
+-#define decompress unlzo
++#ifdef PREBOOT
++STATIC int INIT __decompress(unsigned char *buf, long len,
++ long (*fill)(void*, unsigned long),
++ long (*flush)(void*, unsigned long),
++ unsigned char *out_buf, long olen,
++ long *pos,
++ void (*error)(char *x))
++{
++ return unlzo(buf, len, fill, flush, out_buf, pos, error);
++}
++#endif
+diff --git a/lib/decompress_unxz.c b/lib/decompress_unxz.c
+index b07a78340e9d..25d59a95bd66 100644
+--- a/lib/decompress_unxz.c
++++ b/lib/decompress_unxz.c
+@@ -394,4 +394,14 @@ error_alloc_state:
+ * This macro is used by architecture-specific files to decompress
+ * the kernel image.
+ */
+-#define decompress unxz
++#ifdef XZ_PREBOOT
++STATIC int INIT __decompress(unsigned char *buf, long len,
++ long (*fill)(void*, unsigned long),
++ long (*flush)(void*, unsigned long),
++ unsigned char *out_buf, long olen,
++ long *pos,
++ void (*error)(char *x))
++{
++ return unxz(buf, len, fill, flush, out_buf, pos, error);
++}
++#endif
+diff --git a/mm/vmscan.c b/mm/vmscan.c
+index 8286938c70de..26c86e2fb5af 100644
+--- a/mm/vmscan.c
++++ b/mm/vmscan.c
+@@ -1190,7 +1190,7 @@ cull_mlocked:
+ if (PageSwapCache(page))
+ try_to_free_swap(page);
+ unlock_page(page);
+- putback_lru_page(page);
++ list_add(&page->lru, &ret_pages);
+ continue;
+
+ activate_locked:
+diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
+index b8233505bf9f..8f1df6793650 100644
+--- a/net/mac80211/tx.c
++++ b/net/mac80211/tx.c
+@@ -311,9 +311,6 @@ ieee80211_tx_h_check_assoc(struct ieee80211_tx_data *tx)
+ if (tx->sdata->vif.type == NL80211_IFTYPE_WDS)
+ return TX_CONTINUE;
+
+- if (tx->sdata->vif.type == NL80211_IFTYPE_MESH_POINT)
+- return TX_CONTINUE;
+-
+ if (tx->flags & IEEE80211_TX_PS_BUFFERED)
+ return TX_CONTINUE;
+
+diff --git a/net/nfc/nci/hci.c b/net/nfc/nci/hci.c
+index af002df640c7..609f92283d1b 100644
+--- a/net/nfc/nci/hci.c
++++ b/net/nfc/nci/hci.c
+@@ -233,7 +233,7 @@ int nci_hci_send_cmd(struct nci_dev *ndev, u8 gate, u8 cmd,
+ r = nci_request(ndev, nci_hci_send_data_req, (unsigned long)&data,
+ msecs_to_jiffies(NCI_DATA_TIMEOUT));
+
+- if (r == NCI_STATUS_OK)
++ if (r == NCI_STATUS_OK && skb)
+ *skb = conn_info->rx_skb;
+
+ return r;
+diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c
+index f85f37ed19b2..73d1ca7c546c 100644
+--- a/net/nfc/netlink.c
++++ b/net/nfc/netlink.c
+@@ -1518,12 +1518,13 @@ static int nfc_genl_vendor_cmd(struct sk_buff *skb,
+ if (!dev || !dev->vendor_cmds || !dev->n_vendor_cmds)
+ return -ENODEV;
+
+- data = nla_data(info->attrs[NFC_ATTR_VENDOR_DATA]);
+- if (data) {
++ if (info->attrs[NFC_ATTR_VENDOR_DATA]) {
++ data = nla_data(info->attrs[NFC_ATTR_VENDOR_DATA]);
+ data_len = nla_len(info->attrs[NFC_ATTR_VENDOR_DATA]);
+ if (data_len == 0)
+ return -EINVAL;
+ } else {
++ data = NULL;
+ data_len = 0;
+ }
+
+diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
+index ab5dd621ae0c..2e98f4a243e5 100644
+--- a/net/sunrpc/xprt.c
++++ b/net/sunrpc/xprt.c
+@@ -614,6 +614,7 @@ static void xprt_autoclose(struct work_struct *work)
+ clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+ xprt->ops->close(xprt);
+ xprt_release_write(xprt, NULL);
++ wake_up_bit(&xprt->state, XPRT_LOCKED);
+ }
+
+ /**
+@@ -723,6 +724,7 @@ void xprt_unlock_connect(struct rpc_xprt *xprt, void *cookie)
+ xprt->ops->release_xprt(xprt, NULL);
+ out:
+ spin_unlock_bh(&xprt->transport_lock);
++ wake_up_bit(&xprt->state, XPRT_LOCKED);
+ }
+
+ /**
+@@ -1394,6 +1396,10 @@ out:
+ static void xprt_destroy(struct rpc_xprt *xprt)
+ {
+ dprintk("RPC: destroying transport %p\n", xprt);
++
++ /* Exclude transport connect/disconnect handlers */
++ wait_on_bit_lock(&xprt->state, XPRT_LOCKED, TASK_UNINTERRUPTIBLE);
++
+ del_timer_sync(&xprt->timer);
+
+ rpc_xprt_debugfs_unregister(xprt);
+diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
+index 6b36279e4288..48f6de912f78 100644
+--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
++++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
+@@ -91,7 +91,7 @@ struct svc_xprt_class svc_rdma_class = {
+ .xcl_name = "rdma",
+ .xcl_owner = THIS_MODULE,
+ .xcl_ops = &svc_rdma_ops,
+- .xcl_max_payload = RPCRDMA_MAXPAYLOAD,
++ .xcl_max_payload = RPCSVC_MAXPAYLOAD_RDMA,
+ .xcl_ident = XPRT_TRANSPORT_RDMA,
+ };
+
+diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
+index f49dd8b38122..e718d0959af3 100644
+--- a/net/sunrpc/xprtrdma/xprt_rdma.h
++++ b/net/sunrpc/xprtrdma/xprt_rdma.h
+@@ -51,7 +51,6 @@
+ #include <linux/sunrpc/clnt.h> /* rpc_xprt */
+ #include <linux/sunrpc/rpc_rdma.h> /* RPC/RDMA protocol */
+ #include <linux/sunrpc/xprtrdma.h> /* xprt parameters */
+-#include <linux/sunrpc/svc.h> /* RPCSVC_MAXPAYLOAD */
+
+ #define RDMA_RESOLVE_TIMEOUT (5000) /* 5 seconds */
+ #define RDMA_CONNECT_RETRY_MAX (2) /* retries if no listener backlog */
+diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
+index 0030376327b7..8a39b1e48bc4 100644
+--- a/net/sunrpc/xprtsock.c
++++ b/net/sunrpc/xprtsock.c
+@@ -829,6 +829,7 @@ static void xs_reset_transport(struct sock_xprt *transport)
+ sk->sk_user_data = NULL;
+
+ xs_restore_old_callbacks(transport, sk);
++ xprt_clear_connected(xprt);
+ write_unlock_bh(&sk->sk_callback_lock);
+ xs_sock_reset_connection_flags(xprt);
+
+@@ -1432,6 +1433,7 @@ out:
+ static void xs_tcp_state_change(struct sock *sk)
+ {
+ struct rpc_xprt *xprt;
++ struct sock_xprt *transport;
+
+ read_lock_bh(&sk->sk_callback_lock);
+ if (!(xprt = xprt_from_sock(sk)))
+@@ -1443,13 +1445,12 @@ static void xs_tcp_state_change(struct sock *sk)
+ sock_flag(sk, SOCK_ZAPPED),
+ sk->sk_shutdown);
+
++ transport = container_of(xprt, struct sock_xprt, xprt);
+ trace_rpc_socket_state_change(xprt, sk->sk_socket);
+ switch (sk->sk_state) {
+ case TCP_ESTABLISHED:
+ spin_lock(&xprt->transport_lock);
+ if (!xprt_test_and_set_connected(xprt)) {
+- struct sock_xprt *transport = container_of(xprt,
+- struct sock_xprt, xprt);
+
+ /* Reset TCP record info */
+ transport->tcp_offset = 0;
+@@ -1458,6 +1459,8 @@ static void xs_tcp_state_change(struct sock *sk)
+ transport->tcp_flags =
+ TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
+ xprt->connect_cookie++;
++ clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
++ xprt_clear_connecting(xprt);
+
+ xprt_wake_pending_tasks(xprt, -EAGAIN);
+ }
+@@ -1493,6 +1496,9 @@ static void xs_tcp_state_change(struct sock *sk)
+ smp_mb__after_atomic();
+ break;
+ case TCP_CLOSE:
++ if (test_and_clear_bit(XPRT_SOCK_CONNECTING,
++ &transport->sock_state))
++ xprt_clear_connecting(xprt);
+ xs_sock_mark_closed(xprt);
+ }
+ out:
+@@ -2176,6 +2182,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
+ /* Tell the socket layer to start connecting... */
+ xprt->stat.connect_count++;
+ xprt->stat.connect_start = jiffies;
++ set_bit(XPRT_SOCK_CONNECTING, &transport->sock_state);
+ ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK);
+ switch (ret) {
+ case 0:
+@@ -2237,7 +2244,6 @@ static void xs_tcp_setup_socket(struct work_struct *work)
+ case -EINPROGRESS:
+ case -EALREADY:
+ xprt_unlock_connect(xprt, transport);
+- xprt_clear_connecting(xprt);
+ return;
+ case -EINVAL:
+ /* Happens, for instance, if the user specified a link
+@@ -2279,13 +2285,14 @@ static void xs_connect(struct rpc_xprt *xprt, struct rpc_task *task)
+
+ WARN_ON_ONCE(!xprt_lock_connect(xprt, task, transport));
+
+- /* Start by resetting any existing state */
+- xs_reset_transport(transport);
+-
+- if (transport->sock != NULL && !RPC_IS_SOFTCONN(task)) {
++ if (transport->sock != NULL) {
+ dprintk("RPC: xs_connect delayed xprt %p for %lu "
+ "seconds\n",
+ xprt, xprt->reestablish_timeout / HZ);
++
++ /* Start by resetting any existing state */
++ xs_reset_transport(transport);
++
+ queue_delayed_work(rpciod_workqueue,
+ &transport->connect_worker,
+ xprt->reestablish_timeout);
+diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
+index 374ea53288ca..c8f01ccc2513 100644
+--- a/sound/pci/hda/patch_realtek.c
++++ b/sound/pci/hda/patch_realtek.c
+@@ -1135,7 +1135,7 @@ static const struct hda_fixup alc880_fixups[] = {
+ /* override all pins as BIOS on old Amilo is broken */
+ .type = HDA_FIXUP_PINS,
+ .v.pins = (const struct hda_pintbl[]) {
+- { 0x14, 0x0121411f }, /* HP */
++ { 0x14, 0x0121401f }, /* HP */
+ { 0x15, 0x99030120 }, /* speaker */
+ { 0x16, 0x99030130 }, /* bass speaker */
+ { 0x17, 0x411111f0 }, /* N/A */
+@@ -1155,7 +1155,7 @@ static const struct hda_fixup alc880_fixups[] = {
+ /* almost compatible with FUJITSU, but no bass and SPDIF */
+ .type = HDA_FIXUP_PINS,
+ .v.pins = (const struct hda_pintbl[]) {
+- { 0x14, 0x0121411f }, /* HP */
++ { 0x14, 0x0121401f }, /* HP */
+ { 0x15, 0x99030120 }, /* speaker */
+ { 0x16, 0x411111f0 }, /* N/A */
+ { 0x17, 0x411111f0 }, /* N/A */
+@@ -1364,7 +1364,7 @@ static const struct snd_pci_quirk alc880_fixup_tbl[] = {
+ SND_PCI_QUIRK(0x161f, 0x203d, "W810", ALC880_FIXUP_W810),
+ SND_PCI_QUIRK(0x161f, 0x205d, "Medion Rim 2150", ALC880_FIXUP_MEDION_RIM),
+ SND_PCI_QUIRK(0x1631, 0xe011, "PB 13201056", ALC880_FIXUP_6ST_AUTOMUTE),
+- SND_PCI_QUIRK(0x1734, 0x107c, "FSC F1734", ALC880_FIXUP_F1734),
++ SND_PCI_QUIRK(0x1734, 0x107c, "FSC Amilo M1437", ALC880_FIXUP_FUJITSU),
+ SND_PCI_QUIRK(0x1734, 0x1094, "FSC Amilo M1451G", ALC880_FIXUP_FUJITSU),
+ SND_PCI_QUIRK(0x1734, 0x10ac, "FSC AMILO Xi 1526", ALC880_FIXUP_F1734),
+ SND_PCI_QUIRK(0x1734, 0x10b0, "FSC Amilo Pi1556", ALC880_FIXUP_FUJITSU),
+@@ -5189,8 +5189,11 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
+ SND_PCI_QUIRK(0x1028, 0x06c7, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x1028, 0x06d9, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x1028, 0x06da, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE),
+- SND_PCI_QUIRK(0x1028, 0x06de, "Dell", ALC292_FIXUP_DISABLE_AAMIX),
+ SND_PCI_QUIRK(0x1028, 0x06db, "Dell", ALC292_FIXUP_DISABLE_AAMIX),
++ SND_PCI_QUIRK(0x1028, 0x06dd, "Dell", ALC292_FIXUP_DISABLE_AAMIX),
++ SND_PCI_QUIRK(0x1028, 0x06de, "Dell", ALC292_FIXUP_DISABLE_AAMIX),
++ SND_PCI_QUIRK(0x1028, 0x06df, "Dell", ALC292_FIXUP_DISABLE_AAMIX),
++ SND_PCI_QUIRK(0x1028, 0x06e0, "Dell", ALC292_FIXUP_DISABLE_AAMIX),
+ SND_PCI_QUIRK(0x1028, 0x164a, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x1028, 0x164b, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x103c, 0x1586, "HP", ALC269_FIXUP_HP_MUTE_LED_MIC2),
+@@ -6579,6 +6582,7 @@ static const struct snd_pci_quirk alc662_fixup_tbl[] = {
+ SND_PCI_QUIRK(0x1028, 0x05db, "Dell", ALC668_FIXUP_DELL_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x1028, 0x05fe, "Dell XPS 15", ALC668_FIXUP_DELL_XPS13),
+ SND_PCI_QUIRK(0x1028, 0x060a, "Dell XPS 13", ALC668_FIXUP_DELL_XPS13),
++ SND_PCI_QUIRK(0x1028, 0x060d, "Dell M3800", ALC668_FIXUP_DELL_XPS13),
+ SND_PCI_QUIRK(0x1028, 0x0625, "Dell", ALC668_FIXUP_DELL_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x1028, 0x0626, "Dell", ALC668_FIXUP_DELL_MIC_NO_PRESENCE),
+ SND_PCI_QUIRK(0x1028, 0x0696, "Dell", ALC668_FIXUP_DELL_MIC_NO_PRESENCE),
+diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c
+index 6b3acba5da7a..83d6e76435b4 100644
+--- a/sound/usb/mixer.c
++++ b/sound/usb/mixer.c
+@@ -2522,7 +2522,7 @@ static int restore_mixer_value(struct usb_mixer_elem_list *list)
+ for (c = 0; c < MAX_CHANNELS; c++) {
+ if (!(cval->cmask & (1 << c)))
+ continue;
+- if (cval->cached & (1 << c)) {
++ if (cval->cached & (1 << (c + 1))) {
+ err = snd_usb_set_cur_mix_value(cval, c + 1, idx,
+ cval->cache_val[idx]);
+ if (err < 0)
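
Note on the sound/usb/mixer.c hunk above: the loop index c appears to be zero-based while the channel handed to snd_usb_set_cur_mix_value() is c + 1, so the cached-bit test has to look at bit c + 1 as well. The following is a minimal standalone sketch of that indexing, not the driver code. struct mixer_cache and both count_* helpers are hypothetical names made up for illustration, and the meaning given to cmask and cached is an assumption read off the hunk.

```c
#include <stdint.h>

#define MAX_CHANNELS 16 /* assumed to mirror the driver's constant */

/* Hypothetical stand-in for the cache fields used in the hunk. */
struct mixer_cache {
	uint32_t cmask;  /* assumption: bit c set means channel c + 1 exists */
	uint32_t cached; /* assumption: bit n set means channel n has a cached value */
};

/* Count channels that would be restored with the corrected test. */
static int count_restorable(const struct mixer_cache *cval)
{
	int c, n = 0;

	for (c = 0; c < MAX_CHANNELS; c++) {
		if (!(cval->cmask & (1u << c)))
			continue;
		/* channel number is c + 1, so test bit c + 1 */
		if (cval->cached & (1u << (c + 1)))
			n++;
	}
	return n;
}

/* Same loop with the pre-fix test, kept only for comparison. */
static int count_restorable_old(const struct mixer_cache *cval)
{
	int c, n = 0;

	for (c = 0; c < MAX_CHANNELS; c++) {
		if (!(cval->cmask & (1u << c)))
			continue;
		/* off by one: looks at the bit of the previous channel */
		if (cval->cached & (1u << c))
			n++;
	}
	return n;
}
```

With two channels present (cmask = 0x3) and both cached (cached = 0x6), the corrected test finds both, while the old test misses the first channel because it checks bit 0.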
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-09-29 19:16 Mike Pagano
From: Mike Pagano @ 2015-09-29 19:16 UTC
To: gentoo-commits
commit: ddc71720a6ed4cec05ef162cbcab0a74b71f76a7
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Mon Oct 5 11:49:35 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Mon Oct 5 11:49:35 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=ddc71720
Remove redundant patch
2710_flush-workqueue-non-GPL-availability.patch | 33 -------------------------
1 file changed, 33 deletions(-)
diff --git a/2710_flush-workqueue-non-GPL-availability.patch b/2710_flush-workqueue-non-GPL-availability.patch
deleted file mode 100644
index 3e017d4..0000000
--- a/2710_flush-workqueue-non-GPL-availability.patch
+++ /dev/null
@@ -1,33 +0,0 @@
-From 1dadafa86a779884f14a6e7a3ddde1a57b0a0a65 Mon Sep 17 00:00:00 2001
-From: Tim Gardner <tim.gardner@canonical.com>
-Date: Tue, 4 Aug 2015 11:26:04 -0600
-Subject: workqueue: Make flush_workqueue() available again to non GPL modules
-
-Commit 37b1ef31a568fc02e53587620226e5f3c66454c8 ("workqueue: move
-flush_scheduled_work() to workqueue.h") moved the exported non GPL
-flush_scheduled_work() from a function to an inline wrapper.
-Unfortunately, it directly calls flush_workqueue() which is a GPL function.
-This has the effect of changing the licensing requirement for this function
-and makes it unavailable to non GPL modules.
-
-See commit ad7b1f841f8a54c6d61ff181451f55b68175e15a ("workqueue: Make
-schedule_work() available again to non GPL modules") for precedent.
-
-Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
-Signed-off-by: Tejun Heo <tj@kernel.org>
-
-diff --git a/kernel/workqueue.c b/kernel/workqueue.c
-index 4c4f061..a413acb 100644
---- a/kernel/workqueue.c
-+++ b/kernel/workqueue.c
-@@ -2614,7 +2614,7 @@ void flush_workqueue(struct workqueue_struct *wq)
- out_unlock:
- mutex_unlock(&wq->mutex);
- }
--EXPORT_SYMBOL_GPL(flush_workqueue);
-+EXPORT_SYMBOL(flush_workqueue);
-
- /**
- * drain_workqueue - drain a workqueue
---
-cgit v0.10.2
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-10-03 16:12 Mike Pagano
From: Mike Pagano @ 2015-10-03 16:12 UTC
To: gentoo-commits
commit: 2c0f6c3b92e2248ee19155496c89a7eead78472a
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Sat Oct 3 16:12:47 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Sat Oct 3 16:12:47 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=2c0f6c3b
Linux patch 4.2.3
0000_README | 4 +
1002_linux-4.2.3.patch | 1532 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 1536 insertions(+)
diff --git a/0000_README b/0000_README
index 9428abc..5a14372 100644
--- a/0000_README
+++ b/0000_README
@@ -51,6 +51,10 @@ Patch: 1001_linux-4.2.2.patch
From: http://www.kernel.org
Desc: Linux 4.2.2
+Patch: 1002_linux-4.2.3.patch
+From: http://www.kernel.org
+Desc: Linux 4.2.3
+
Patch: 1500_XATTR_USER_PREFIX.patch
From: https://bugs.gentoo.org/show_bug.cgi?id=470644
Desc: Support for namespace user.pax.* on tmpfs.
diff --git a/1002_linux-4.2.3.patch b/1002_linux-4.2.3.patch
new file mode 100644
index 0000000..018e36c
--- /dev/null
+++ b/1002_linux-4.2.3.patch
@@ -0,0 +1,1532 @@
+diff --git a/Documentation/devicetree/bindings/net/ethernet.txt b/Documentation/devicetree/bindings/net/ethernet.txt
+index 41b3f3f864e8..5d88f37480b6 100644
+--- a/Documentation/devicetree/bindings/net/ethernet.txt
++++ b/Documentation/devicetree/bindings/net/ethernet.txt
+@@ -25,7 +25,11 @@ The following properties are common to the Ethernet controllers:
+ flow control thresholds.
+ - tx-fifo-depth: the size of the controller's transmit fifo in bytes. This
+ is used for components that can have configurable fifo sizes.
++- managed: string, specifies the PHY management type. Supported values are:
++ "auto", "in-band-status". "auto" is the default, it uses MDIO for
++ management if fixed-link is not specified.
+
+ Child nodes of the Ethernet controller are typically the individual PHY devices
+ connected via the MDIO bus (sometimes the MDIO bus controller is separate).
+ They are described in the phy.txt file in this same directory.
++For non-MDIO PHY management see fixed-link.txt.
+diff --git a/Makefile b/Makefile
+index 3578b4426ecf..a6edbb11a69a 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 2
++SUBLEVEL = 3
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+
+diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
+index 965d1afb0eaa..5cb13ca3a3ac 100644
+--- a/drivers/block/zram/zcomp.c
++++ b/drivers/block/zram/zcomp.c
+@@ -330,12 +330,14 @@ void zcomp_destroy(struct zcomp *comp)
+ * allocate new zcomp and initialize it. return compressing
+ * backend pointer or ERR_PTR if things went bad. ERR_PTR(-EINVAL)
+ * if requested algorithm is not supported, ERR_PTR(-ENOMEM) in
+- * case of allocation error.
++ * case of allocation error, or any other error potentially
++ * returned by functions zcomp_strm_{multi,single}_create.
+ */
+ struct zcomp *zcomp_create(const char *compress, int max_strm)
+ {
+ struct zcomp *comp;
+ struct zcomp_backend *backend;
++ int error;
+
+ backend = find_backend(compress);
+ if (!backend)
+@@ -347,12 +349,12 @@ struct zcomp *zcomp_create(const char *compress, int max_strm)
+
+ comp->backend = backend;
+ if (max_strm > 1)
+- zcomp_strm_multi_create(comp, max_strm);
++ error = zcomp_strm_multi_create(comp, max_strm);
+ else
+- zcomp_strm_single_create(comp);
+- if (!comp->stream) {
++ error = zcomp_strm_single_create(comp);
++ if (error) {
+ kfree(comp);
+- return ERR_PTR(-ENOMEM);
++ return ERR_PTR(error);
+ }
+ return comp;
+ }
+diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
+index 079897b3a955..9d56515f4c4d 100644
+--- a/drivers/net/dsa/bcm_sf2.c
++++ b/drivers/net/dsa/bcm_sf2.c
+@@ -418,7 +418,7 @@ static int bcm_sf2_sw_fast_age_port(struct dsa_switch *ds, int port)
+ core_writel(priv, port, CORE_FAST_AGE_PORT);
+
+ reg = core_readl(priv, CORE_FAST_AGE_CTRL);
+- reg |= EN_AGE_PORT | FAST_AGE_STR_DONE;
++ reg |= EN_AGE_PORT | EN_AGE_DYNAMIC | FAST_AGE_STR_DONE;
+ core_writel(priv, reg, CORE_FAST_AGE_CTRL);
+
+ do {
+@@ -432,6 +432,8 @@ static int bcm_sf2_sw_fast_age_port(struct dsa_switch *ds, int port)
+ if (!timeout)
+ return -ETIMEDOUT;
+
++ core_writel(priv, 0, CORE_FAST_AGE_CTRL);
++
+ return 0;
+ }
+
+@@ -507,7 +509,7 @@ static int bcm_sf2_sw_br_set_stp_state(struct dsa_switch *ds, int port,
+ u32 reg;
+
+ reg = core_readl(priv, CORE_G_PCTL_PORT(port));
+- cur_hw_state = reg >> G_MISTP_STATE_SHIFT;
++ cur_hw_state = reg & (G_MISTP_STATE_MASK << G_MISTP_STATE_SHIFT);
+
+ switch (state) {
+ case BR_STATE_DISABLED:
+@@ -531,10 +533,12 @@ static int bcm_sf2_sw_br_set_stp_state(struct dsa_switch *ds, int port,
+ }
+
+ /* Fast-age ARL entries if we are moving a port from Learning or
+- * Forwarding state to Disabled, Blocking or Listening state
++ * Forwarding (cur_hw_state) state to Disabled, Blocking or Listening
++ * state (hw_state)
+ */
+ if (cur_hw_state != hw_state) {
+- if (cur_hw_state & 4 && !(hw_state & 4)) {
++ if (cur_hw_state >= G_MISTP_LEARN_STATE &&
++ hw_state <= G_MISTP_LISTEN_STATE) {
+ ret = bcm_sf2_sw_fast_age_port(ds, port);
+ if (ret) {
+ pr_err("%s: fast-ageing failed\n", __func__);
+@@ -901,15 +905,11 @@ static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,
+ struct fixed_phy_status *status)
+ {
+ struct bcm_sf2_priv *priv = ds_to_priv(ds);
+- u32 duplex, pause, speed;
++ u32 duplex, pause;
+ u32 reg;
+
+ duplex = core_readl(priv, CORE_DUPSTS);
+ pause = core_readl(priv, CORE_PAUSESTS);
+- speed = core_readl(priv, CORE_SPDSTS);
+-
+- speed >>= (port * SPDSTS_SHIFT);
+- speed &= SPDSTS_MASK;
+
+ status->link = 0;
+
+@@ -944,18 +944,6 @@ static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,
+ reg &= ~LINK_STS;
+ core_writel(priv, reg, CORE_STS_OVERRIDE_GMIIP_PORT(port));
+
+- switch (speed) {
+- case SPDSTS_10:
+- status->speed = SPEED_10;
+- break;
+- case SPDSTS_100:
+- status->speed = SPEED_100;
+- break;
+- case SPDSTS_1000:
+- status->speed = SPEED_1000;
+- break;
+- }
+-
+ if ((pause & (1 << port)) &&
+ (pause & (1 << (port + PAUSESTS_TX_PAUSE_SHIFT)))) {
+ status->asym_pause = 1;
+diff --git a/drivers/net/dsa/bcm_sf2.h b/drivers/net/dsa/bcm_sf2.h
+index 22e2ebf31333..789d7b7737da 100644
+--- a/drivers/net/dsa/bcm_sf2.h
++++ b/drivers/net/dsa/bcm_sf2.h
+@@ -112,8 +112,8 @@ static inline u64 name##_readq(struct bcm_sf2_priv *priv, u32 off) \
+ spin_unlock(&priv->indir_lock); \
+ return (u64)indir << 32 | dir; \
+ } \
+-static inline void name##_writeq(struct bcm_sf2_priv *priv, u32 off, \
+- u64 val) \
++static inline void name##_writeq(struct bcm_sf2_priv *priv, u64 val, \
++ u32 off) \
+ { \
+ spin_lock(&priv->indir_lock); \
+ reg_writel(priv, upper_32_bits(val), REG_DIR_DATA_WRITE); \
+diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
+index 561342466076..26ec2fbfaa89 100644
+--- a/drivers/net/dsa/mv88e6xxx.c
++++ b/drivers/net/dsa/mv88e6xxx.c
+@@ -1387,6 +1387,7 @@ static int mv88e6xxx_setup_port(struct dsa_switch *ds, int port)
+ reg = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
+ if (dsa_is_cpu_port(ds, port) ||
+ ds->dsa_port_mask & (1 << port)) {
++ reg &= ~PORT_PCS_CTRL_UNFORCED;
+ reg |= PORT_PCS_CTRL_FORCE_LINK |
+ PORT_PCS_CTRL_LINK_UP |
+ PORT_PCS_CTRL_DUPLEX_FULL |
+diff --git a/drivers/net/ethernet/altera/altera_tse_main.c b/drivers/net/ethernet/altera/altera_tse_main.c
+index da48e66377b5..8207877d6237 100644
+--- a/drivers/net/ethernet/altera/altera_tse_main.c
++++ b/drivers/net/ethernet/altera/altera_tse_main.c
+@@ -511,8 +511,7 @@ static int tse_poll(struct napi_struct *napi, int budget)
+
+ if (rxcomplete < budget) {
+
+- napi_gro_flush(napi, false);
+- __napi_complete(napi);
++ napi_complete(napi);
+
+ netdev_dbg(priv->dev,
+ "NAPI Complete, did %d packets with budget %d\n",
+diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
+index b349e6f36ea7..de63266de16b 100644
+--- a/drivers/net/ethernet/freescale/fec_main.c
++++ b/drivers/net/ethernet/freescale/fec_main.c
+@@ -1402,6 +1402,7 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
+ if ((status & BD_ENET_RX_LAST) == 0)
+ netdev_err(ndev, "rcv is not +last\n");
+
++ writel(FEC_ENET_RXF, fep->hwp + FEC_IEVENT);
+
+ /* Check for errors. */
+ if (status & (BD_ENET_RX_LG | BD_ENET_RX_SH | BD_ENET_RX_NO |
+diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
+index 62e48bc0cb23..09ec32e33076 100644
+--- a/drivers/net/ethernet/marvell/mvneta.c
++++ b/drivers/net/ethernet/marvell/mvneta.c
+@@ -1479,6 +1479,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
+ struct mvneta_rx_desc *rx_desc = mvneta_rxq_next_desc_get(rxq);
+ struct sk_buff *skb;
+ unsigned char *data;
++ dma_addr_t phys_addr;
+ u32 rx_status;
+ int rx_bytes, err;
+
+@@ -1486,6 +1487,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
+ rx_status = rx_desc->status;
+ rx_bytes = rx_desc->data_size - (ETH_FCS_LEN + MVNETA_MH_SIZE);
+ data = (unsigned char *)rx_desc->buf_cookie;
++ phys_addr = rx_desc->buf_phys_addr;
+
+ if (!mvneta_rxq_desc_is_first_last(rx_status) ||
+ (rx_status & MVNETA_RXD_ERR_SUMMARY)) {
+@@ -1534,7 +1536,7 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
+ if (!skb)
+ goto err_drop_frame;
+
+- dma_unmap_single(dev->dev.parent, rx_desc->buf_phys_addr,
++ dma_unmap_single(dev->dev.parent, phys_addr,
+ MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
+
+ rcvd_pkts++;
+@@ -3027,8 +3029,8 @@ static int mvneta_probe(struct platform_device *pdev)
+ const char *dt_mac_addr;
+ char hw_mac_addr[ETH_ALEN];
+ const char *mac_from;
++ const char *managed;
+ int phy_mode;
+- int fixed_phy = 0;
+ int err;
+
+ /* Our multiqueue support is not complete, so for now, only
+@@ -3062,7 +3064,6 @@ static int mvneta_probe(struct platform_device *pdev)
+ dev_err(&pdev->dev, "cannot register fixed PHY\n");
+ goto err_free_irq;
+ }
+- fixed_phy = 1;
+
+ /* In the case of a fixed PHY, the DT node associated
+ * to the PHY is the Ethernet MAC DT node.
+@@ -3086,8 +3087,10 @@ static int mvneta_probe(struct platform_device *pdev)
+ pp = netdev_priv(dev);
+ pp->phy_node = phy_node;
+ pp->phy_interface = phy_mode;
+- pp->use_inband_status = (phy_mode == PHY_INTERFACE_MODE_SGMII) &&
+- fixed_phy;
++
++ err = of_property_read_string(dn, "managed", &managed);
++ pp->use_inband_status = (err == 0 &&
++ strcmp(managed, "in-band-status") == 0);
+
+ pp->clk = devm_clk_get(&pdev->dev, NULL);
+ if (IS_ERR(pp->clk)) {
+diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+index 9c145dddd717..4f95fa7b594d 100644
+--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
++++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+@@ -1250,8 +1250,6 @@ int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv)
+ rss_context->hash_fn = MLX4_RSS_HASH_TOP;
+ memcpy(rss_context->rss_key, priv->rss_key,
+ MLX4_EN_RSS_KEY_SIZE);
+- netdev_rss_key_fill(rss_context->rss_key,
+- MLX4_EN_RSS_KEY_SIZE);
+ } else {
+ en_err(priv, "Unknown RSS hash function requested\n");
+ err = -EINVAL;
+diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
+index 29c2a017a450..a408977a531a 100644
+--- a/drivers/net/ethernet/mellanox/mlx4/main.c
++++ b/drivers/net/ethernet/mellanox/mlx4/main.c
+@@ -2654,9 +2654,14 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
+
+ if (msi_x) {
+ int nreq = dev->caps.num_ports * num_online_cpus() + 1;
++ bool shared_ports = false;
+
+ nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs,
+ nreq);
++ if (nreq > MAX_MSIX) {
++ nreq = MAX_MSIX;
++ shared_ports = true;
++ }
+
+ entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL);
+ if (!entries)
+@@ -2679,6 +2684,9 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
+ bitmap_zero(priv->eq_table.eq[MLX4_EQ_ASYNC].actv_ports.ports,
+ dev->caps.num_ports);
+
++ if (MLX4_IS_LEGACY_EQ_MODE(dev->caps))
++ shared_ports = true;
++
+ for (i = 0; i < dev->caps.num_comp_vectors + 1; i++) {
+ if (i == MLX4_EQ_ASYNC)
+ continue;
+@@ -2686,7 +2694,7 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev)
+ priv->eq_table.eq[i].irq =
+ entries[i + 1 - !!(i > MLX4_EQ_ASYNC)].vector;
+
+- if (MLX4_IS_LEGACY_EQ_MODE(dev->caps)) {
++ if (shared_ports) {
+ bitmap_fill(priv->eq_table.eq[i].actv_ports.ports,
+ dev->caps.num_ports);
+ /* We don't set affinity hint when there
+diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
+index edd77342773a..248478c6f6e4 100644
+--- a/drivers/net/macvtap.c
++++ b/drivers/net/macvtap.c
+@@ -1111,10 +1111,10 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
+ return 0;
+
+ case TUNSETSNDBUF:
+- if (get_user(u, up))
++ if (get_user(s, sp))
+ return -EFAULT;
+
+- q->sk.sk_sndbuf = u;
++ q->sk.sk_sndbuf = s;
+ return 0;
+
+ case TUNGETVNETHDRSZ:
+diff --git a/drivers/net/phy/fixed_phy.c b/drivers/net/phy/fixed_phy.c
+index d7a65247f952..99d9bc19c94a 100644
+--- a/drivers/net/phy/fixed_phy.c
++++ b/drivers/net/phy/fixed_phy.c
+@@ -52,6 +52,10 @@ static int fixed_phy_update_regs(struct fixed_phy *fp)
+ u16 lpagb = 0;
+ u16 lpa = 0;
+
++ if (!fp->status.link)
++ goto done;
++ bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
++
+ if (fp->status.duplex) {
+ bmcr |= BMCR_FULLDPLX;
+
+@@ -96,15 +100,13 @@ static int fixed_phy_update_regs(struct fixed_phy *fp)
+ }
+ }
+
+- if (fp->status.link)
+- bmsr |= BMSR_LSTATUS | BMSR_ANEGCOMPLETE;
+-
+ if (fp->status.pause)
+ lpa |= LPA_PAUSE_CAP;
+
+ if (fp->status.asym_pause)
+ lpa |= LPA_PAUSE_ASYM;
+
++done:
+ fp->regs[MII_PHYSID1] = 0;
+ fp->regs[MII_PHYSID2] = 0;
+
+diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
+index 46a14cbb0215..02a4615b65f8 100644
+--- a/drivers/net/phy/mdio_bus.c
++++ b/drivers/net/phy/mdio_bus.c
+@@ -303,12 +303,12 @@ void mdiobus_unregister(struct mii_bus *bus)
+ BUG_ON(bus->state != MDIOBUS_REGISTERED);
+ bus->state = MDIOBUS_UNREGISTERED;
+
+- device_del(&bus->dev);
+ for (i = 0; i < PHY_MAX_ADDR; i++) {
+ if (bus->phy_map[i])
+ device_unregister(&bus->phy_map[i]->dev);
+ bus->phy_map[i] = NULL;
+ }
++ device_del(&bus->dev);
+ }
+ EXPORT_SYMBOL(mdiobus_unregister);
+
+diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
+index fa8f5046afe9..487be20b6b12 100644
+--- a/drivers/net/ppp/ppp_generic.c
++++ b/drivers/net/ppp/ppp_generic.c
+@@ -2742,6 +2742,7 @@ static struct ppp *ppp_create_interface(struct net *net, int unit,
+ */
+ dev_net_set(dev, net);
+
++ rtnl_lock();
+ mutex_lock(&pn->all_ppp_mutex);
+
+ if (unit < 0) {
+@@ -2772,7 +2773,7 @@ static struct ppp *ppp_create_interface(struct net *net, int unit,
+ ppp->file.index = unit;
+ sprintf(dev->name, "ppp%d", unit);
+
+- ret = register_netdev(dev);
++ ret = register_netdevice(dev);
+ if (ret != 0) {
+ unit_put(&pn->units_idr, unit);
+ netdev_err(ppp->dev, "PPP: couldn't register device %s (%d)\n",
+@@ -2784,6 +2785,7 @@ static struct ppp *ppp_create_interface(struct net *net, int unit,
+
+ atomic_inc(&ppp_unit_count);
+ mutex_unlock(&pn->all_ppp_mutex);
++ rtnl_unlock();
+
+ *retp = 0;
+ return ppp;
+diff --git a/drivers/of/of_mdio.c b/drivers/of/of_mdio.c
+index fdc60db60829..7c8c23cc6896 100644
+--- a/drivers/of/of_mdio.c
++++ b/drivers/of/of_mdio.c
+@@ -266,7 +266,8 @@ EXPORT_SYMBOL(of_phy_attach);
+ bool of_phy_is_fixed_link(struct device_node *np)
+ {
+ struct device_node *dn;
+- int len;
++ int len, err;
++ const char *managed;
+
+ /* New binding */
+ dn = of_get_child_by_name(np, "fixed-link");
+@@ -275,6 +276,10 @@ bool of_phy_is_fixed_link(struct device_node *np)
+ return true;
+ }
+
++ err = of_property_read_string(np, "managed", &managed);
++ if (err == 0 && strcmp(managed, "auto") != 0)
++ return true;
++
+ /* Old binding */
+ if (of_get_property(np, "fixed-link", &len) &&
+ len == (5 * sizeof(__be32)))
+@@ -289,8 +294,18 @@ int of_phy_register_fixed_link(struct device_node *np)
+ struct fixed_phy_status status = {};
+ struct device_node *fixed_link_node;
+ const __be32 *fixed_link_prop;
+- int len;
++ int len, err;
+ struct phy_device *phy;
++ const char *managed;
++
++ err = of_property_read_string(np, "managed", &managed);
++ if (err == 0) {
++ if (strcmp(managed, "in-band-status") == 0) {
++ /* status is zeroed, namely its .link member */
++ phy = fixed_phy_register(PHY_POLL, &status, np);
++ return IS_ERR(phy) ? PTR_ERR(phy) : 0;
++ }
++ }
+
+ /* New binding */
+ fixed_link_node = of_get_child_by_name(np, "fixed-link");
+diff --git a/drivers/platform/x86/hp-wmi.c b/drivers/platform/x86/hp-wmi.c
+index 06697315a088..fb4dd7b3ee71 100644
+--- a/drivers/platform/x86/hp-wmi.c
++++ b/drivers/platform/x86/hp-wmi.c
+@@ -54,8 +54,9 @@ MODULE_ALIAS("wmi:5FB7F034-2C63-45e9-BE91-3D44E2C707E4");
+ #define HPWMI_HARDWARE_QUERY 0x4
+ #define HPWMI_WIRELESS_QUERY 0x5
+ #define HPWMI_BIOS_QUERY 0x9
++#define HPWMI_FEATURE_QUERY 0xb
+ #define HPWMI_HOTKEY_QUERY 0xc
+-#define HPWMI_FEATURE_QUERY 0xd
++#define HPWMI_FEATURE2_QUERY 0xd
+ #define HPWMI_WIRELESS2_QUERY 0x1b
+ #define HPWMI_POSTCODEERROR_QUERY 0x2a
+
+@@ -295,25 +296,33 @@ static int hp_wmi_tablet_state(void)
+ return (state & 0x4) ? 1 : 0;
+ }
+
+-static int __init hp_wmi_bios_2009_later(void)
++static int __init hp_wmi_bios_2008_later(void)
+ {
+ int state = 0;
+ int ret = hp_wmi_perform_query(HPWMI_FEATURE_QUERY, 0, &state,
+ sizeof(state), sizeof(state));
+- if (ret)
+- return ret;
++ if (!ret)
++ return 1;
+
+- return (state & 0x10) ? 1 : 0;
++ return (ret == HPWMI_RET_UNKNOWN_CMDTYPE) ? 0 : -ENXIO;
+ }
+
+-static int hp_wmi_enable_hotkeys(void)
++static int __init hp_wmi_bios_2009_later(void)
+ {
+- int ret;
+- int query = 0x6e;
++ int state = 0;
++ int ret = hp_wmi_perform_query(HPWMI_FEATURE2_QUERY, 0, &state,
++ sizeof(state), sizeof(state));
++ if (!ret)
++ return 1;
+
+- ret = hp_wmi_perform_query(HPWMI_BIOS_QUERY, 1, &query, sizeof(query),
+- 0);
++ return (ret == HPWMI_RET_UNKNOWN_CMDTYPE) ? 0 : -ENXIO;
++}
+
++static int __init hp_wmi_enable_hotkeys(void)
++{
++ int value = 0x6e;
++ int ret = hp_wmi_perform_query(HPWMI_BIOS_QUERY, 1, &value,
++ sizeof(value), 0);
+ if (ret)
+ return -EINVAL;
+ return 0;
+@@ -663,7 +672,7 @@ static int __init hp_wmi_input_setup(void)
+ hp_wmi_tablet_state());
+ input_sync(hp_wmi_input_dev);
+
+- if (hp_wmi_bios_2009_later() == 4)
++ if (!hp_wmi_bios_2009_later() && hp_wmi_bios_2008_later())
+ hp_wmi_enable_hotkeys();
+
+ status = wmi_install_notify_handler(HPWMI_EVENT_GUID, hp_wmi_notify, NULL);
+diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
+index 1285eaf5dc22..03cdb9e18d57 100644
+--- a/net/bridge/br_multicast.c
++++ b/net/bridge/br_multicast.c
+@@ -991,7 +991,7 @@ static int br_ip4_multicast_igmp3_report(struct net_bridge *br,
+
+ ih = igmpv3_report_hdr(skb);
+ num = ntohs(ih->ngrec);
+- len = sizeof(*ih);
++ len = skb_transport_offset(skb) + sizeof(*ih);
+
+ for (i = 0; i < num; i++) {
+ len += sizeof(*grec);
+@@ -1052,7 +1052,7 @@ static int br_ip6_multicast_mld2_report(struct net_bridge *br,
+
+ icmp6h = icmp6_hdr(skb);
+ num = ntohs(icmp6h->icmp6_dataun.un_data16[1]);
+- len = sizeof(*icmp6h);
++ len = skb_transport_offset(skb) + sizeof(*icmp6h);
+
+ for (i = 0; i < num; i++) {
+ __be16 *nsrcs, _nsrcs;
+diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
+index 9a12668f7d62..0ad144fb0c79 100644
+--- a/net/core/fib_rules.c
++++ b/net/core/fib_rules.c
+@@ -615,15 +615,17 @@ static int dump_rules(struct sk_buff *skb, struct netlink_callback *cb,
+ {
+ int idx = 0;
+ struct fib_rule *rule;
++ int err = 0;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(rule, &ops->rules_list, list) {
+ if (idx < cb->args[1])
+ goto skip;
+
+- if (fib_nl_fill_rule(skb, rule, NETLINK_CB(cb->skb).portid,
+- cb->nlh->nlmsg_seq, RTM_NEWRULE,
+- NLM_F_MULTI, ops) < 0)
++ err = fib_nl_fill_rule(skb, rule, NETLINK_CB(cb->skb).portid,
++ cb->nlh->nlmsg_seq, RTM_NEWRULE,
++ NLM_F_MULTI, ops);
++ if (err)
+ break;
+ skip:
+ idx++;
+@@ -632,7 +634,7 @@ skip:
+ cb->args[1] = idx;
+ rules_ops_put(ops);
+
+- return skb->len;
++ return err;
+ }
+
+ static int fib_nl_dumprule(struct sk_buff *skb, struct netlink_callback *cb)
+@@ -648,7 +650,9 @@ static int fib_nl_dumprule(struct sk_buff *skb, struct netlink_callback *cb)
+ if (ops == NULL)
+ return -EAFNOSUPPORT;
+
+- return dump_rules(skb, cb, ops);
++ dump_rules(skb, cb, ops);
++
++ return skb->len;
+ }
+
+ rcu_read_lock();
+diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
+index dc004b1e1f85..0861018be708 100644
+--- a/net/core/rtnetlink.c
++++ b/net/core/rtnetlink.c
+@@ -3021,6 +3021,7 @@ static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
+ u32 portid = NETLINK_CB(cb->skb).portid;
+ u32 seq = cb->nlh->nlmsg_seq;
+ u32 filter_mask = 0;
++ int err;
+
+ if (nlmsg_len(cb->nlh) > sizeof(struct ifinfomsg)) {
+ struct nlattr *extfilt;
+@@ -3041,20 +3042,25 @@ static int rtnl_bridge_getlink(struct sk_buff *skb, struct netlink_callback *cb)
+ struct net_device *br_dev = netdev_master_upper_dev_get(dev);
+
+ if (br_dev && br_dev->netdev_ops->ndo_bridge_getlink) {
+- if (idx >= cb->args[0] &&
+- br_dev->netdev_ops->ndo_bridge_getlink(
+- skb, portid, seq, dev, filter_mask,
+- NLM_F_MULTI) < 0)
+- break;
++ if (idx >= cb->args[0]) {
++ err = br_dev->netdev_ops->ndo_bridge_getlink(
++ skb, portid, seq, dev,
++ filter_mask, NLM_F_MULTI);
++ if (err < 0 && err != -EOPNOTSUPP)
++ break;
++ }
+ idx++;
+ }
+
+ if (ops->ndo_bridge_getlink) {
+- if (idx >= cb->args[0] &&
+- ops->ndo_bridge_getlink(skb, portid, seq, dev,
+- filter_mask,
+- NLM_F_MULTI) < 0)
+- break;
++ if (idx >= cb->args[0]) {
++ err = ops->ndo_bridge_getlink(skb, portid,
++ seq, dev,
++ filter_mask,
++ NLM_F_MULTI);
++ if (err < 0 && err != -EOPNOTSUPP)
++ break;
++ }
+ idx++;
+ }
+ }
+diff --git a/net/core/sock_diag.c b/net/core/sock_diag.c
+index d79866c5f8bc..817622f3dbb7 100644
+--- a/net/core/sock_diag.c
++++ b/net/core/sock_diag.c
+@@ -90,6 +90,9 @@ int sock_diag_put_filterinfo(bool may_report_filterinfo, struct sock *sk,
+ goto out;
+
+ fprog = filter->prog->orig_prog;
++ if (!fprog)
++ goto out;
++
+ flen = bpf_classic_proglen(fprog);
+
+ attr = nla_reserve(skb, attrtype, flen);
+diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
+index b1c218df2c85..b7dedd9d36d8 100644
+--- a/net/ipv4/tcp_output.c
++++ b/net/ipv4/tcp_output.c
+@@ -2898,6 +2898,7 @@ void tcp_send_active_reset(struct sock *sk, gfp_t priority)
+ skb_reserve(skb, MAX_TCP_HEADER);
+ tcp_init_nondata_skb(skb, tcp_acceptable_seq(sk),
+ TCPHDR_ACK | TCPHDR_RST);
++ skb_mstamp_get(&skb->skb_mstamp);
+ /* Send it off. */
+ if (tcp_transmit_skb(sk, skb, 0, priority))
+ NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTFAILED);
+diff --git a/net/ipv6/exthdrs_offload.c b/net/ipv6/exthdrs_offload.c
+index 447a7fbd1bb6..f5e2ba1c18bf 100644
+--- a/net/ipv6/exthdrs_offload.c
++++ b/net/ipv6/exthdrs_offload.c
+@@ -36,6 +36,6 @@ out:
+ return ret;
+
+ out_rt:
+- inet_del_offload(&rthdr_offload, IPPROTO_ROUTING);
++ inet6_del_offload(&rthdr_offload, IPPROTO_ROUTING);
+ goto out;
+ }
+diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
+index 74ceb73c1c9a..5f36266b1f5e 100644
+--- a/net/ipv6/ip6mr.c
++++ b/net/ipv6/ip6mr.c
+@@ -550,7 +550,7 @@ static void ipmr_mfc_seq_stop(struct seq_file *seq, void *v)
+
+ if (it->cache == &mrt->mfc6_unres_queue)
+ spin_unlock_bh(&mfc_unres_lock);
+- else if (it->cache == mrt->mfc6_cache_array)
++ else if (it->cache == &mrt->mfc6_cache_array[it->ct])
+ read_unlock(&mrt_lock);
+ }
+
+diff --git a/net/ipv6/route.c b/net/ipv6/route.c
+index d15586490cec..00b64d402a57 100644
+--- a/net/ipv6/route.c
++++ b/net/ipv6/route.c
+@@ -1727,7 +1727,7 @@ static int ip6_convert_metrics(struct mx6_config *mxc,
+ return -EINVAL;
+ }
+
+-int ip6_route_add(struct fib6_config *cfg)
++int ip6_route_info_create(struct fib6_config *cfg, struct rt6_info **rt_ret)
+ {
+ int err;
+ struct net *net = cfg->fc_nlinfo.nl_net;
+@@ -1735,7 +1735,6 @@ int ip6_route_add(struct fib6_config *cfg)
+ struct net_device *dev = NULL;
+ struct inet6_dev *idev = NULL;
+ struct fib6_table *table;
+- struct mx6_config mxc = { .mx = NULL, };
+ int addr_type;
+
+ if (cfg->fc_dst_len > 128 || cfg->fc_src_len > 128)
+@@ -1941,6 +1940,32 @@ install_route:
+
+ cfg->fc_nlinfo.nl_net = dev_net(dev);
+
++ *rt_ret = rt;
++
++ return 0;
++out:
++ if (dev)
++ dev_put(dev);
++ if (idev)
++ in6_dev_put(idev);
++ if (rt)
++ dst_free(&rt->dst);
++
++ *rt_ret = NULL;
++
++ return err;
++}
++
++int ip6_route_add(struct fib6_config *cfg)
++{
++ struct mx6_config mxc = { .mx = NULL, };
++ struct rt6_info *rt = NULL;
++ int err;
++
++ err = ip6_route_info_create(cfg, &rt);
++ if (err)
++ goto out;
++
+ err = ip6_convert_metrics(&mxc, cfg);
+ if (err)
+ goto out;
+@@ -1948,14 +1973,12 @@ install_route:
+ err = __ip6_ins_rt(rt, &cfg->fc_nlinfo, &mxc);
+
+ kfree(mxc.mx);
++
+ return err;
+ out:
+- if (dev)
+- dev_put(dev);
+- if (idev)
+- in6_dev_put(idev);
+ if (rt)
+ dst_free(&rt->dst);
++
+ return err;
+ }
+
+@@ -2727,19 +2750,78 @@ errout:
+ return err;
+ }
+
+-static int ip6_route_multipath(struct fib6_config *cfg, int add)
++struct rt6_nh {
++ struct rt6_info *rt6_info;
++ struct fib6_config r_cfg;
++ struct mx6_config mxc;
++ struct list_head next;
++};
++
++static void ip6_print_replace_route_err(struct list_head *rt6_nh_list)
++{
++ struct rt6_nh *nh;
++
++ list_for_each_entry(nh, rt6_nh_list, next) {
++ pr_warn("IPV6: multipath route replace failed (check consistency of installed routes): %pI6 nexthop %pI6 ifi %d\n",
++ &nh->r_cfg.fc_dst, &nh->r_cfg.fc_gateway,
++ nh->r_cfg.fc_ifindex);
++ }
++}
++
++static int ip6_route_info_append(struct list_head *rt6_nh_list,
++ struct rt6_info *rt, struct fib6_config *r_cfg)
++{
++ struct rt6_nh *nh;
++ struct rt6_info *rtnh;
++ int err = -EEXIST;
++
++ list_for_each_entry(nh, rt6_nh_list, next) {
++ /* check if rt6_info already exists */
++ rtnh = nh->rt6_info;
++
++ if (rtnh->dst.dev == rt->dst.dev &&
++ rtnh->rt6i_idev == rt->rt6i_idev &&
++ ipv6_addr_equal(&rtnh->rt6i_gateway,
++ &rt->rt6i_gateway))
++ return err;
++ }
++
++ nh = kzalloc(sizeof(*nh), GFP_KERNEL);
++ if (!nh)
++ return -ENOMEM;
++ nh->rt6_info = rt;
++ err = ip6_convert_metrics(&nh->mxc, r_cfg);
++ if (err) {
++ kfree(nh);
++ return err;
++ }
++ memcpy(&nh->r_cfg, r_cfg, sizeof(*r_cfg));
++ list_add_tail(&nh->next, rt6_nh_list);
++
++ return 0;
++}
++
++static int ip6_route_multipath_add(struct fib6_config *cfg)
+ {
+ struct fib6_config r_cfg;
+ struct rtnexthop *rtnh;
++ struct rt6_info *rt;
++ struct rt6_nh *err_nh;
++ struct rt6_nh *nh, *nh_safe;
+ int remaining;
+ int attrlen;
+- int err = 0, last_err = 0;
++ int err = 1;
++ int nhn = 0;
++ int replace = (cfg->fc_nlinfo.nlh &&
++ (cfg->fc_nlinfo.nlh->nlmsg_flags & NLM_F_REPLACE));
++ LIST_HEAD(rt6_nh_list);
+
+ remaining = cfg->fc_mp_len;
+-beginning:
+ rtnh = (struct rtnexthop *)cfg->fc_mp;
+
+- /* Parse a Multipath Entry */
++ /* Parse a Multipath Entry and build a list (rt6_nh_list) of
++ * rt6_info structs per nexthop
++ */
+ while (rtnh_ok(rtnh, remaining)) {
+ memcpy(&r_cfg, cfg, sizeof(*cfg));
+ if (rtnh->rtnh_ifindex)
+@@ -2755,22 +2837,32 @@ beginning:
+ r_cfg.fc_flags |= RTF_GATEWAY;
+ }
+ }
+- err = add ? ip6_route_add(&r_cfg) : ip6_route_del(&r_cfg);
++
++ err = ip6_route_info_create(&r_cfg, &rt);
++ if (err)
++ goto cleanup;
++
++ err = ip6_route_info_append(&rt6_nh_list, rt, &r_cfg);
+ if (err) {
+- last_err = err;
+- /* If we are trying to remove a route, do not stop the
+- * loop when ip6_route_del() fails (because next hop is
+- * already gone), we should try to remove all next hops.
+- */
+- if (add) {
+- /* If add fails, we should try to delete all
+- * next hops that have been already added.
+- */
+- add = 0;
+- remaining = cfg->fc_mp_len - remaining;
+- goto beginning;
+- }
++ dst_free(&rt->dst);
++ goto cleanup;
++ }
++
++ rtnh = rtnh_next(rtnh, &remaining);
++ }
++
++ err_nh = NULL;
++ list_for_each_entry(nh, &rt6_nh_list, next) {
++ err = __ip6_ins_rt(nh->rt6_info, &cfg->fc_nlinfo, &nh->mxc);
++ /* nh->rt6_info is used or freed at this point, reset to NULL*/
++ nh->rt6_info = NULL;
++ if (err) {
++ if (replace && nhn)
++ ip6_print_replace_route_err(&rt6_nh_list);
++ err_nh = nh;
++ goto add_errout;
+ }
++
+ /* Because each route is added like a single route we remove
+ * these flags after the first nexthop: if there is a collision,
+ * we have already failed to add the first nexthop:
+@@ -2780,6 +2872,63 @@ beginning:
+ */
+ cfg->fc_nlinfo.nlh->nlmsg_flags &= ~(NLM_F_EXCL |
+ NLM_F_REPLACE);
++ nhn++;
++ }
++
++ goto cleanup;
++
++add_errout:
++ /* Delete routes that were already added */
++ list_for_each_entry(nh, &rt6_nh_list, next) {
++ if (err_nh == nh)
++ break;
++ ip6_route_del(&nh->r_cfg);
++ }
++
++cleanup:
++ list_for_each_entry_safe(nh, nh_safe, &rt6_nh_list, next) {
++ if (nh->rt6_info)
++ dst_free(&nh->rt6_info->dst);
++ if (nh->mxc.mx)
++ kfree(nh->mxc.mx);
++ list_del(&nh->next);
++ kfree(nh);
++ }
++
++ return err;
++}
++
++static int ip6_route_multipath_del(struct fib6_config *cfg)
++{
++ struct fib6_config r_cfg;
++ struct rtnexthop *rtnh;
++ int remaining;
++ int attrlen;
++ int err = 1, last_err = 0;
++
++ remaining = cfg->fc_mp_len;
++ rtnh = (struct rtnexthop *)cfg->fc_mp;
++
++ /* Parse a Multipath Entry */
++ while (rtnh_ok(rtnh, remaining)) {
++ memcpy(&r_cfg, cfg, sizeof(*cfg));
++ if (rtnh->rtnh_ifindex)
++ r_cfg.fc_ifindex = rtnh->rtnh_ifindex;
++
++ attrlen = rtnh_attrlen(rtnh);
++ if (attrlen > 0) {
++ struct nlattr *nla, *attrs = rtnh_attrs(rtnh);
++
++ nla = nla_find(attrs, attrlen, RTA_GATEWAY);
++ if (nla) {
++ nla_memcpy(&r_cfg.fc_gateway, nla, 16);
++ r_cfg.fc_flags |= RTF_GATEWAY;
++ }
++ }
++ err = ip6_route_del(&r_cfg);
++ if (err)
++ last_err = err;
++
+ rtnh = rtnh_next(rtnh, &remaining);
+ }
+
+@@ -2796,7 +2945,7 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh)
+ return err;
+
+ if (cfg.fc_mp)
+- return ip6_route_multipath(&cfg, 0);
++ return ip6_route_multipath_del(&cfg);
+ else
+ return ip6_route_del(&cfg);
+ }
+@@ -2811,7 +2960,7 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
+ return err;
+
+ if (cfg.fc_mp)
+- return ip6_route_multipath(&cfg, 1);
++ return ip6_route_multipath_add(&cfg);
+ else
+ return ip6_route_add(&cfg);
+ }
+diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
+index a774985489e2..0857f7243797 100644
+--- a/net/netlink/af_netlink.c
++++ b/net/netlink/af_netlink.c
+@@ -124,6 +124,24 @@ static inline u32 netlink_group_mask(u32 group)
+ return group ? 1 << (group - 1) : 0;
+ }
+
++static struct sk_buff *netlink_to_full_skb(const struct sk_buff *skb,
++ gfp_t gfp_mask)
++{
++ unsigned int len = skb_end_offset(skb);
++ struct sk_buff *new;
++
++ new = alloc_skb(len, gfp_mask);
++ if (new == NULL)
++ return NULL;
++
++ NETLINK_CB(new).portid = NETLINK_CB(skb).portid;
++ NETLINK_CB(new).dst_group = NETLINK_CB(skb).dst_group;
++ NETLINK_CB(new).creds = NETLINK_CB(skb).creds;
++
++ memcpy(skb_put(new, len), skb->data, len);
++ return new;
++}
++
+ int netlink_add_tap(struct netlink_tap *nt)
+ {
+ if (unlikely(nt->dev->type != ARPHRD_NETLINK))
+@@ -205,7 +223,11 @@ static int __netlink_deliver_tap_skb(struct sk_buff *skb,
+ int ret = -ENOMEM;
+
+ dev_hold(dev);
+- nskb = skb_clone(skb, GFP_ATOMIC);
++
++ if (netlink_skb_is_mmaped(skb) || is_vmalloc_addr(skb->head))
++ nskb = netlink_to_full_skb(skb, GFP_ATOMIC);
++ else
++ nskb = skb_clone(skb, GFP_ATOMIC);
+ if (nskb) {
+ nskb->dev = dev;
+ nskb->protocol = htons((u16) sk->sk_protocol);
+@@ -278,11 +300,6 @@ static void netlink_rcv_wake(struct sock *sk)
+ }
+
+ #ifdef CONFIG_NETLINK_MMAP
+-static bool netlink_skb_is_mmaped(const struct sk_buff *skb)
+-{
+- return NETLINK_CB(skb).flags & NETLINK_SKB_MMAPED;
+-}
+-
+ static bool netlink_rx_is_mmaped(struct sock *sk)
+ {
+ return nlk_sk(sk)->rx_ring.pg_vec != NULL;
+@@ -834,7 +851,6 @@ static void netlink_ring_set_copied(struct sock *sk, struct sk_buff *skb)
+ }
+
+ #else /* CONFIG_NETLINK_MMAP */
+-#define netlink_skb_is_mmaped(skb) false
+ #define netlink_rx_is_mmaped(sk) false
+ #define netlink_tx_is_mmaped(sk) false
+ #define netlink_mmap sock_no_mmap
+@@ -1082,8 +1098,8 @@ static int netlink_insert(struct sock *sk, u32 portid)
+
+ lock_sock(sk);
+
+- err = -EBUSY;
+- if (nlk_sk(sk)->portid)
++ err = nlk_sk(sk)->portid == portid ? 0 : -EBUSY;
++ if (nlk_sk(sk)->bound)
+ goto err;
+
+ err = -ENOMEM;
+@@ -1103,10 +1119,14 @@ static int netlink_insert(struct sock *sk, u32 portid)
+ err = -EOVERFLOW;
+ if (err == -EEXIST)
+ err = -EADDRINUSE;
+- nlk_sk(sk)->portid = 0;
+ sock_put(sk);
++ goto err;
+ }
+
++ /* We need to ensure that the socket is hashed and visible. */
++ smp_wmb();
++ nlk_sk(sk)->bound = portid;
++
+ err:
+ release_sock(sk);
+ return err;
+@@ -1491,6 +1511,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
+ struct sockaddr_nl *nladdr = (struct sockaddr_nl *)addr;
+ int err;
+ long unsigned int groups = nladdr->nl_groups;
++ bool bound;
+
+ if (addr_len < sizeof(struct sockaddr_nl))
+ return -EINVAL;
+@@ -1507,9 +1528,14 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
+ return err;
+ }
+
+- if (nlk->portid)
++ bound = nlk->bound;
++ if (bound) {
++ /* Ensure nlk->portid is up-to-date. */
++ smp_rmb();
++
+ if (nladdr->nl_pid != nlk->portid)
+ return -EINVAL;
++ }
+
+ if (nlk->netlink_bind && groups) {
+ int group;
+@@ -1525,7 +1551,10 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
+ }
+ }
+
+- if (!nlk->portid) {
++ /* No need for barriers here as we return to user-space without
++ * using any of the bound attributes.
++ */
++ if (!bound) {
+ err = nladdr->nl_pid ?
+ netlink_insert(sk, nladdr->nl_pid) :
+ netlink_autobind(sock);
+@@ -1573,7 +1602,10 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
+ !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
+ return -EPERM;
+
+- if (!nlk->portid)
++ /* No need for barriers here as we return to user-space without
++ * using any of the bound attributes.
++ */
++ if (!nlk->bound)
+ err = netlink_autobind(sock);
+
+ if (err == 0) {
+@@ -2391,10 +2423,13 @@ static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
+ dst_group = nlk->dst_group;
+ }
+
+- if (!nlk->portid) {
++ if (!nlk->bound) {
+ err = netlink_autobind(sock);
+ if (err)
+ goto out;
++ } else {
++ /* Ensure nlk is hashed and visible. */
++ smp_rmb();
+ }
+
+ /* It's a really convoluted way for userland to ask for mmaped
+diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
+index 89008405d6b4..14437d9b1965 100644
+--- a/net/netlink/af_netlink.h
++++ b/net/netlink/af_netlink.h
+@@ -35,6 +35,7 @@ struct netlink_sock {
+ unsigned long state;
+ size_t max_recvmsg_len;
+ wait_queue_head_t wait;
++ bool bound;
+ bool cb_running;
+ struct netlink_callback cb;
+ struct mutex *cb_mutex;
+@@ -59,6 +60,15 @@ static inline struct netlink_sock *nlk_sk(struct sock *sk)
+ return container_of(sk, struct netlink_sock, sk);
+ }
+
++static inline bool netlink_skb_is_mmaped(const struct sk_buff *skb)
++{
++#ifdef CONFIG_NETLINK_MMAP
++ return NETLINK_CB(skb).flags & NETLINK_SKB_MMAPED;
++#else
++ return false;
++#endif /* CONFIG_NETLINK_MMAP */
++}
++
+ struct netlink_table {
+ struct rhashtable hash;
+ struct hlist_head mc_list;
+diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
+index ff8c4a4c1609..ff332d1b94bc 100644
+--- a/net/openvswitch/datapath.c
++++ b/net/openvswitch/datapath.c
+@@ -920,7 +920,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
+ if (error)
+ goto err_kfree_flow;
+
+- ovs_flow_mask_key(&new_flow->key, &key, &mask);
++ ovs_flow_mask_key(&new_flow->key, &key, true, &mask);
+
+ /* Extract flow identifier. */
+ error = ovs_nla_get_identifier(&new_flow->id, a[OVS_FLOW_ATTR_UFID],
+@@ -1047,7 +1047,7 @@ static struct sw_flow_actions *get_flow_actions(const struct nlattr *a,
+ struct sw_flow_key masked_key;
+ int error;
+
+- ovs_flow_mask_key(&masked_key, key, mask);
++ ovs_flow_mask_key(&masked_key, key, true, mask);
+ error = ovs_nla_copy_actions(a, &masked_key, &acts, log);
+ if (error) {
+ OVS_NLERR(log,
+diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
+index 65523948fb95..b5c3bba87fc8 100644
+--- a/net/openvswitch/flow_table.c
++++ b/net/openvswitch/flow_table.c
+@@ -56,20 +56,21 @@ static u16 range_n_bytes(const struct sw_flow_key_range *range)
+ }
+
+ void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
+- const struct sw_flow_mask *mask)
++ bool full, const struct sw_flow_mask *mask)
+ {
+- const long *m = (const long *)((const u8 *)&mask->key +
+- mask->range.start);
+- const long *s = (const long *)((const u8 *)src +
+- mask->range.start);
+- long *d = (long *)((u8 *)dst + mask->range.start);
++ int start = full ? 0 : mask->range.start;
++ int len = full ? sizeof *dst : range_n_bytes(&mask->range);
++ const long *m = (const long *)((const u8 *)&mask->key + start);
++ const long *s = (const long *)((const u8 *)src + start);
++ long *d = (long *)((u8 *)dst + start);
+ int i;
+
+- /* The memory outside of the 'mask->range' are not set since
+- * further operations on 'dst' only uses contents within
+- * 'mask->range'.
++ /* If 'full' is true then all of 'dst' is fully initialized. Otherwise,
++ * if 'full' is false the memory outside of the 'mask->range' is left
++ * uninitialized. This can be used as an optimization when further
++ * operations on 'dst' only use contents within 'mask->range'.
+ */
+- for (i = 0; i < range_n_bytes(&mask->range); i += sizeof(long))
++ for (i = 0; i < len; i += sizeof(long))
+ *d++ = *s++ & *m++;
+ }
+
+@@ -473,7 +474,7 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti,
+ u32 hash;
+ struct sw_flow_key masked_key;
+
+- ovs_flow_mask_key(&masked_key, unmasked, mask);
++ ovs_flow_mask_key(&masked_key, unmasked, false, mask);
+ hash = flow_hash(&masked_key, &mask->range);
+ head = find_bucket(ti, hash);
+ hlist_for_each_entry_rcu(flow, head, flow_table.node[ti->node_ver]) {
+diff --git a/net/openvswitch/flow_table.h b/net/openvswitch/flow_table.h
+index 616eda10d955..2dd9900f533d 100644
+--- a/net/openvswitch/flow_table.h
++++ b/net/openvswitch/flow_table.h
+@@ -86,5 +86,5 @@ struct sw_flow *ovs_flow_tbl_lookup_ufid(struct flow_table *,
+ bool ovs_flow_cmp(const struct sw_flow *, const struct sw_flow_match *);
+
+ void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
+- const struct sw_flow_mask *mask);
++ bool full, const struct sw_flow_mask *mask);
+ #endif /* flow_table.h */
+diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
+index ed458b315ef4..7851b1222a36 100644
+--- a/net/packet/af_packet.c
++++ b/net/packet/af_packet.c
+@@ -229,6 +229,8 @@ struct packet_skb_cb {
+ } sa;
+ };
+
++#define vio_le() virtio_legacy_is_little_endian()
++
+ #define PACKET_SKB_CB(__skb) ((struct packet_skb_cb *)((__skb)->cb))
+
+ #define GET_PBDQC_FROM_RB(x) ((struct tpacket_kbdq_core *)(&(x)->prb_bdqc))
+@@ -2561,15 +2563,15 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
+ goto out_unlock;
+
+ if ((vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
+- (__virtio16_to_cpu(false, vnet_hdr.csum_start) +
+- __virtio16_to_cpu(false, vnet_hdr.csum_offset) + 2 >
+- __virtio16_to_cpu(false, vnet_hdr.hdr_len)))
+- vnet_hdr.hdr_len = __cpu_to_virtio16(false,
+- __virtio16_to_cpu(false, vnet_hdr.csum_start) +
+- __virtio16_to_cpu(false, vnet_hdr.csum_offset) + 2);
++ (__virtio16_to_cpu(vio_le(), vnet_hdr.csum_start) +
++ __virtio16_to_cpu(vio_le(), vnet_hdr.csum_offset) + 2 >
++ __virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len)))
++ vnet_hdr.hdr_len = __cpu_to_virtio16(vio_le(),
++ __virtio16_to_cpu(vio_le(), vnet_hdr.csum_start) +
++ __virtio16_to_cpu(vio_le(), vnet_hdr.csum_offset) + 2);
+
+ err = -EINVAL;
+- if (__virtio16_to_cpu(false, vnet_hdr.hdr_len) > len)
++ if (__virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len) > len)
+ goto out_unlock;
+
+ if (vnet_hdr.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
+@@ -2612,7 +2614,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
+ hlen = LL_RESERVED_SPACE(dev);
+ tlen = dev->needed_tailroom;
+ skb = packet_alloc_skb(sk, hlen + tlen, hlen, len,
+- __virtio16_to_cpu(false, vnet_hdr.hdr_len),
++ __virtio16_to_cpu(vio_le(), vnet_hdr.hdr_len),
+ msg->msg_flags & MSG_DONTWAIT, &err);
+ if (skb == NULL)
+ goto out_unlock;
+@@ -2659,8 +2661,8 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
+
+ if (po->has_vnet_hdr) {
+ if (vnet_hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+- u16 s = __virtio16_to_cpu(false, vnet_hdr.csum_start);
+- u16 o = __virtio16_to_cpu(false, vnet_hdr.csum_offset);
++ u16 s = __virtio16_to_cpu(vio_le(), vnet_hdr.csum_start);
++ u16 o = __virtio16_to_cpu(vio_le(), vnet_hdr.csum_offset);
+ if (!skb_partial_csum_set(skb, s, o)) {
+ err = -EINVAL;
+ goto out_free;
+@@ -2668,7 +2670,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
+ }
+
+ skb_shinfo(skb)->gso_size =
+- __virtio16_to_cpu(false, vnet_hdr.gso_size);
++ __virtio16_to_cpu(vio_le(), vnet_hdr.gso_size);
+ skb_shinfo(skb)->gso_type = gso_type;
+
+ /* Header must be checked, and gso_segs computed. */
+@@ -3042,9 +3044,9 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
+
+ /* This is a hint as to how much should be linear. */
+ vnet_hdr.hdr_len =
+- __cpu_to_virtio16(false, skb_headlen(skb));
++ __cpu_to_virtio16(vio_le(), skb_headlen(skb));
+ vnet_hdr.gso_size =
+- __cpu_to_virtio16(false, sinfo->gso_size);
++ __cpu_to_virtio16(vio_le(), sinfo->gso_size);
+ if (sinfo->gso_type & SKB_GSO_TCPV4)
+ vnet_hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
+ else if (sinfo->gso_type & SKB_GSO_TCPV6)
+@@ -3062,9 +3064,9 @@ static int packet_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
+
+ if (skb->ip_summed == CHECKSUM_PARTIAL) {
+ vnet_hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+- vnet_hdr.csum_start = __cpu_to_virtio16(false,
++ vnet_hdr.csum_start = __cpu_to_virtio16(vio_le(),
+ skb_checksum_start_offset(skb));
+- vnet_hdr.csum_offset = __cpu_to_virtio16(false,
++ vnet_hdr.csum_offset = __cpu_to_virtio16(vio_le(),
+ skb->csum_offset);
+ } else if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
+ vnet_hdr.flags = VIRTIO_NET_HDR_F_DATA_VALID;
+diff --git a/net/sched/cls_fw.c b/net/sched/cls_fw.c
+index 715e01e5910a..f23a3b68bba6 100644
+--- a/net/sched/cls_fw.c
++++ b/net/sched/cls_fw.c
+@@ -33,7 +33,6 @@
+
+ struct fw_head {
+ u32 mask;
+- bool mask_set;
+ struct fw_filter __rcu *ht[HTSIZE];
+ struct rcu_head rcu;
+ };
+@@ -84,7 +83,7 @@ static int fw_classify(struct sk_buff *skb, const struct tcf_proto *tp,
+ }
+ }
+ } else {
+- /* old method */
++ /* Old method: classify the packet using its skb mark. */
+ if (id && (TC_H_MAJ(id) == 0 ||
+ !(TC_H_MAJ(id ^ tp->q->handle)))) {
+ res->classid = id;
+@@ -114,14 +113,9 @@ static unsigned long fw_get(struct tcf_proto *tp, u32 handle)
+
+ static int fw_init(struct tcf_proto *tp)
+ {
+- struct fw_head *head;
+-
+- head = kzalloc(sizeof(struct fw_head), GFP_KERNEL);
+- if (head == NULL)
+- return -ENOBUFS;
+-
+- head->mask_set = false;
+- rcu_assign_pointer(tp->root, head);
++ /* We don't allocate fw_head here, because in the old method
++ * we don't need it at all.
++ */
+ return 0;
+ }
+
+@@ -252,7 +246,7 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
+ int err;
+
+ if (!opt)
+- return handle ? -EINVAL : 0;
++ return handle ? -EINVAL : 0; /* Succeed if it is old method. */
+
+ err = nla_parse_nested(tb, TCA_FW_MAX, opt, fw_policy);
+ if (err < 0)
+@@ -302,11 +296,17 @@ static int fw_change(struct net *net, struct sk_buff *in_skb,
+ if (!handle)
+ return -EINVAL;
+
+- if (!head->mask_set) {
+- head->mask = 0xFFFFFFFF;
++ if (!head) {
++ u32 mask = 0xFFFFFFFF;
+ if (tb[TCA_FW_MASK])
+- head->mask = nla_get_u32(tb[TCA_FW_MASK]);
+- head->mask_set = true;
++ mask = nla_get_u32(tb[TCA_FW_MASK]);
++
++ head = kzalloc(sizeof(*head), GFP_KERNEL);
++ if (!head)
++ return -ENOBUFS;
++ head->mask = mask;
++
++ rcu_assign_pointer(tp->root, head);
+ }
+
+ f = kzalloc(sizeof(struct fw_filter), GFP_KERNEL);
+diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
+index 59e80356672b..3ac604f96da0 100644
+--- a/net/sctp/protocol.c
++++ b/net/sctp/protocol.c
+@@ -1166,7 +1166,7 @@ static void sctp_v4_del_protocol(void)
+ unregister_inetaddr_notifier(&sctp_inetaddr_notifier);
+ }
+
+-static int __net_init sctp_net_init(struct net *net)
++static int __net_init sctp_defaults_init(struct net *net)
+ {
+ int status;
+
+@@ -1259,12 +1259,6 @@ static int __net_init sctp_net_init(struct net *net)
+
+ sctp_dbg_objcnt_init(net);
+
+- /* Initialize the control inode/socket for handling OOTB packets. */
+- if ((status = sctp_ctl_sock_init(net))) {
+- pr_err("Failed to initialize the SCTP control sock\n");
+- goto err_ctl_sock_init;
+- }
+-
+ /* Initialize the local address list. */
+ INIT_LIST_HEAD(&net->sctp.local_addr_list);
+ spin_lock_init(&net->sctp.local_addr_lock);
+@@ -1280,9 +1274,6 @@ static int __net_init sctp_net_init(struct net *net)
+
+ return 0;
+
+-err_ctl_sock_init:
+- sctp_dbg_objcnt_exit(net);
+- sctp_proc_exit(net);
+ err_init_proc:
+ cleanup_sctp_mibs(net);
+ err_init_mibs:
+@@ -1291,15 +1282,12 @@ err_sysctl_register:
+ return status;
+ }
+
+-static void __net_exit sctp_net_exit(struct net *net)
++static void __net_exit sctp_defaults_exit(struct net *net)
+ {
+ /* Free the local address list */
+ sctp_free_addr_wq(net);
+ sctp_free_local_addr_list(net);
+
+- /* Free the control endpoint. */
+- inet_ctl_sock_destroy(net->sctp.ctl_sock);
+-
+ sctp_dbg_objcnt_exit(net);
+
+ sctp_proc_exit(net);
+@@ -1307,9 +1295,32 @@ static void __net_exit sctp_net_exit(struct net *net)
+ sctp_sysctl_net_unregister(net);
+ }
+
+-static struct pernet_operations sctp_net_ops = {
+- .init = sctp_net_init,
+- .exit = sctp_net_exit,
++static struct pernet_operations sctp_defaults_ops = {
++ .init = sctp_defaults_init,
++ .exit = sctp_defaults_exit,
++};
++
++static int __net_init sctp_ctrlsock_init(struct net *net)
++{
++ int status;
++
++ /* Initialize the control inode/socket for handling OOTB packets. */
++ status = sctp_ctl_sock_init(net);
++ if (status)
++ pr_err("Failed to initialize the SCTP control sock\n");
++
++ return status;
++}
++
++static void __net_init sctp_ctrlsock_exit(struct net *net)
++{
++ /* Free the control endpoint. */
++ inet_ctl_sock_destroy(net->sctp.ctl_sock);
++}
++
++static struct pernet_operations sctp_ctrlsock_ops = {
++ .init = sctp_ctrlsock_init,
++ .exit = sctp_ctrlsock_exit,
+ };
+
+ /* Initialize the universe into something sensible. */
+@@ -1442,8 +1453,11 @@ static __init int sctp_init(void)
+ sctp_v4_pf_init();
+ sctp_v6_pf_init();
+
+- status = sctp_v4_protosw_init();
++ status = register_pernet_subsys(&sctp_defaults_ops);
++ if (status)
++ goto err_register_defaults;
+
++ status = sctp_v4_protosw_init();
+ if (status)
+ goto err_protosw_init;
+
+@@ -1451,9 +1465,9 @@ static __init int sctp_init(void)
+ if (status)
+ goto err_v6_protosw_init;
+
+- status = register_pernet_subsys(&sctp_net_ops);
++ status = register_pernet_subsys(&sctp_ctrlsock_ops);
+ if (status)
+- goto err_register_pernet_subsys;
++ goto err_register_ctrlsock;
+
+ status = sctp_v4_add_protocol();
+ if (status)
+@@ -1469,12 +1483,14 @@ out:
+ err_v6_add_protocol:
+ sctp_v4_del_protocol();
+ err_add_protocol:
+- unregister_pernet_subsys(&sctp_net_ops);
+-err_register_pernet_subsys:
++ unregister_pernet_subsys(&sctp_ctrlsock_ops);
++err_register_ctrlsock:
+ sctp_v6_protosw_exit();
+ err_v6_protosw_init:
+ sctp_v4_protosw_exit();
+ err_protosw_init:
++ unregister_pernet_subsys(&sctp_defaults_ops);
++err_register_defaults:
+ sctp_v4_pf_exit();
+ sctp_v6_pf_exit();
+ sctp_sysctl_unregister();
+@@ -1507,12 +1523,14 @@ static __exit void sctp_exit(void)
+ sctp_v6_del_protocol();
+ sctp_v4_del_protocol();
+
+- unregister_pernet_subsys(&sctp_net_ops);
++ unregister_pernet_subsys(&sctp_ctrlsock_ops);
+
+ /* Free protosw registrations */
+ sctp_v6_protosw_exit();
+ sctp_v4_protosw_exit();
+
++ unregister_pernet_subsys(&sctp_defaults_ops);
++
+ /* Unregister with socket layer. */
+ sctp_v6_pf_exit();
+ sctp_v4_pf_exit();
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-10-23 17:14 Mike Pagano
From: Mike Pagano @ 2015-10-23 17:14 UTC
To: gentoo-commits
commit: a66c9411919f0d467ddacb949af14b1336517b90
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Fri Oct 23 17:14:16 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Fri Oct 23 17:14:16 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=a66c9411
Linux patch 4.2.4
0000_README | 4 +
1003_linux-4.2.4.patch | 10010 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 10014 insertions(+)
diff --git a/0000_README b/0000_README
index 5a14372..2a467c2 100644
--- a/0000_README
+++ b/0000_README
@@ -55,6 +55,10 @@ Patch: 1002_linux-4.2.3.patch
From: http://www.kernel.org
Desc: Linux 4.2.3
+Patch: 1003_linux-4.2.4.patch
+From: http://www.kernel.org
+Desc: Linux 4.2.4
+
Patch: 1500_XATTR_USER_PREFIX.patch
From: https://bugs.gentoo.org/show_bug.cgi?id=470644
Desc: Support for namespace user.pax.* on tmpfs.
diff --git a/1003_linux-4.2.4.patch b/1003_linux-4.2.4.patch
new file mode 100644
index 0000000..4118bfa
--- /dev/null
+++ b/1003_linux-4.2.4.patch
@@ -0,0 +1,10010 @@
+diff --git a/Documentation/HOWTO b/Documentation/HOWTO
+index 93aa8604630e..21152d397b88 100644
+--- a/Documentation/HOWTO
++++ b/Documentation/HOWTO
+@@ -218,16 +218,16 @@ The development process
+ Linux kernel development process currently consists of a few different
+ main kernel "branches" and lots of different subsystem-specific kernel
+ branches. These different branches are:
+- - main 3.x kernel tree
+- - 3.x.y -stable kernel tree
+- - 3.x -git kernel patches
++ - main 4.x kernel tree
++ - 4.x.y -stable kernel tree
++ - 4.x -git kernel patches
+ - subsystem specific kernel trees and patches
+- - the 3.x -next kernel tree for integration tests
++ - the 4.x -next kernel tree for integration tests
+
+-3.x kernel tree
++4.x kernel tree
+ -----------------
+-3.x kernels are maintained by Linus Torvalds, and can be found on
+-kernel.org in the pub/linux/kernel/v3.x/ directory. Its development
++4.x kernels are maintained by Linus Torvalds, and can be found on
++kernel.org in the pub/linux/kernel/v4.x/ directory. Its development
+ process is as follows:
+ - As soon as a new kernel is released a two weeks window is open,
+ during this period of time maintainers can submit big diffs to
+@@ -262,20 +262,20 @@ mailing list about kernel releases:
+ released according to perceived bug status, not according to a
+ preconceived timeline."
+
+-3.x.y -stable kernel tree
++4.x.y -stable kernel tree
+ ---------------------------
+ Kernels with 3-part versions are -stable kernels. They contain
+ relatively small and critical fixes for security problems or significant
+-regressions discovered in a given 3.x kernel.
++regressions discovered in a given 4.x kernel.
+
+ This is the recommended branch for users who want the most recent stable
+ kernel and are not interested in helping test development/experimental
+ versions.
+
+-If no 3.x.y kernel is available, then the highest numbered 3.x
++If no 4.x.y kernel is available, then the highest numbered 4.x
+ kernel is the current stable kernel.
+
+-3.x.y are maintained by the "stable" team <stable@vger.kernel.org>, and
++4.x.y are maintained by the "stable" team <stable@vger.kernel.org>, and
+ are released as needs dictate. The normal release period is approximately
+ two weeks, but it can be longer if there are no pressing problems. A
+ security-related problem, instead, can cause a release to happen almost
+@@ -285,7 +285,7 @@ The file Documentation/stable_kernel_rules.txt in the kernel tree
+ documents what kinds of changes are acceptable for the -stable tree, and
+ how the release process works.
+
+-3.x -git patches
++4.x -git patches
+ ------------------
+ These are daily snapshots of Linus' kernel tree which are managed in a
+ git repository (hence the name.) These patches are usually released
+@@ -317,9 +317,9 @@ revisions to it, and maintainers can mark patches as under review,
+ accepted, or rejected. Most of these patchwork sites are listed at
+ http://patchwork.kernel.org/.
+
+-3.x -next kernel tree for integration tests
++4.x -next kernel tree for integration tests
+ ---------------------------------------------
+-Before updates from subsystem trees are merged into the mainline 3.x
++Before updates from subsystem trees are merged into the mainline 4.x
+ tree, they need to be integration-tested. For this purpose, a special
+ testing repository exists into which virtually all subsystem trees are
+ pulled on an almost daily basis:
+diff --git a/Makefile b/Makefile
+index a6edbb11a69a..a952801a6cd5 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 3
++SUBLEVEL = 4
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+
+diff --git a/arch/arc/plat-axs10x/axs10x.c b/arch/arc/plat-axs10x/axs10x.c
+index e7769c3ab5f2..ac79491ee2c0 100644
+--- a/arch/arc/plat-axs10x/axs10x.c
++++ b/arch/arc/plat-axs10x/axs10x.c
+@@ -402,6 +402,8 @@ static void __init axs103_early_init(void)
+ unsigned int num_cores = (read_aux_reg(ARC_REG_MCIP_BCR) >> 16) & 0x3F;
+ if (num_cores > 2)
+ arc_set_core_freq(50 * 1000000);
++ else if (num_cores == 2)
++ arc_set_core_freq(75 * 1000000);
+ #endif
+
+ switch (arc_get_core_freq()/1000000) {
+diff --git a/arch/arm/Makefile b/arch/arm/Makefile
+index 7451b447cc2d..2c2b28ee4811 100644
+--- a/arch/arm/Makefile
++++ b/arch/arm/Makefile
+@@ -54,6 +54,14 @@ AS += -EL
+ LD += -EL
+ endif
+
++#
++# The Scalar Replacement of Aggregates (SRA) optimization pass in GCC 4.9 and
++# later may result in code being generated that handles signed short and signed
++# char struct members incorrectly. So disable it.
++# (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65932)
++#
++KBUILD_CFLAGS += $(call cc-option,-fno-ipa-sra)
++
+ # This selects which instruction set is used.
+ # Note that GCC does not numerically define an architecture version
+ # macro, but instead defines a whole series of macros which makes
+diff --git a/arch/arm/boot/dts/exynos5420.dtsi b/arch/arm/boot/dts/exynos5420.dtsi
+index 534f27ceb10b..fa8107dec109 100644
+--- a/arch/arm/boot/dts/exynos5420.dtsi
++++ b/arch/arm/boot/dts/exynos5420.dtsi
+@@ -1118,7 +1118,7 @@
+ interrupt-parent = <&combiner>;
+ interrupts = <3 0>;
+ clock-names = "sysmmu", "master";
+- clocks = <&clock CLK_SMMU_FIMD1M0>, <&clock CLK_FIMD1>;
++ clocks = <&clock CLK_SMMU_FIMD1M1>, <&clock CLK_FIMD1>;
+ power-domains = <&disp_pd>;
+ #iommu-cells = <0>;
+ };
+diff --git a/arch/arm/boot/dts/imx6qdl-rex.dtsi b/arch/arm/boot/dts/imx6qdl-rex.dtsi
+index 3373fd958e95..a50356243888 100644
+--- a/arch/arm/boot/dts/imx6qdl-rex.dtsi
++++ b/arch/arm/boot/dts/imx6qdl-rex.dtsi
+@@ -35,7 +35,6 @@
+ compatible = "regulator-fixed";
+ reg = <1>;
+ pinctrl-names = "default";
+- pinctrl-0 = <&pinctrl_usbh1>;
+ regulator-name = "usbh1_vbus";
+ regulator-min-microvolt = <5000000>;
+ regulator-max-microvolt = <5000000>;
+@@ -47,7 +46,6 @@
+ compatible = "regulator-fixed";
+ reg = <2>;
+ pinctrl-names = "default";
+- pinctrl-0 = <&pinctrl_usbotg>;
+ regulator-name = "usb_otg_vbus";
+ regulator-min-microvolt = <5000000>;
+ regulator-max-microvolt = <5000000>;
+diff --git a/arch/arm/boot/dts/omap3-beagle.dts b/arch/arm/boot/dts/omap3-beagle.dts
+index a5474113cd50..67659a0ed13e 100644
+--- a/arch/arm/boot/dts/omap3-beagle.dts
++++ b/arch/arm/boot/dts/omap3-beagle.dts
+@@ -202,7 +202,7 @@
+
+ tfp410_pins: pinmux_tfp410_pins {
+ pinctrl-single,pins = <
+- 0x194 (PIN_OUTPUT | MUX_MODE4) /* hdq_sio.gpio_170 */
++ 0x196 (PIN_OUTPUT | MUX_MODE4) /* hdq_sio.gpio_170 */
+ >;
+ };
+
+diff --git a/arch/arm/boot/dts/omap5-uevm.dts b/arch/arm/boot/dts/omap5-uevm.dts
+index 275618f19a43..5771a149ce4a 100644
+--- a/arch/arm/boot/dts/omap5-uevm.dts
++++ b/arch/arm/boot/dts/omap5-uevm.dts
+@@ -174,8 +174,8 @@
+
+ i2c5_pins: pinmux_i2c5_pins {
+ pinctrl-single,pins = <
+- 0x184 (PIN_INPUT | MUX_MODE0) /* i2c5_scl */
+- 0x186 (PIN_INPUT | MUX_MODE0) /* i2c5_sda */
++ 0x186 (PIN_INPUT | MUX_MODE0) /* i2c5_scl */
++ 0x188 (PIN_INPUT | MUX_MODE0) /* i2c5_sda */
+ >;
+ };
+
+diff --git a/arch/arm/boot/dts/sun7i-a20.dtsi b/arch/arm/boot/dts/sun7i-a20.dtsi
+index 6a63f30c9a69..f5f384c04335 100644
+--- a/arch/arm/boot/dts/sun7i-a20.dtsi
++++ b/arch/arm/boot/dts/sun7i-a20.dtsi
+@@ -107,7 +107,7 @@
+ 720000 1200000
+ 528000 1100000
+ 312000 1000000
+- 144000 900000
++ 144000 1000000
+ >;
+ #cooling-cells = <2>;
+ cooling-min-level = <0>;
+diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
+index a6ad93c9bce3..fd9eefce0a7b 100644
+--- a/arch/arm/kernel/kgdb.c
++++ b/arch/arm/kernel/kgdb.c
+@@ -259,15 +259,17 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt)
+ if (err)
+ return err;
+
+- patch_text((void *)bpt->bpt_addr,
+- *(unsigned int *)arch_kgdb_ops.gdb_bpt_instr);
++ /* Machine is already stopped, so we can use __patch_text() directly */
++ __patch_text((void *)bpt->bpt_addr,
++ *(unsigned int *)arch_kgdb_ops.gdb_bpt_instr);
+
+ return err;
+ }
+
+ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
+ {
+- patch_text((void *)bpt->bpt_addr, *(unsigned int *)bpt->saved_instr);
++ /* Machine is already stopped, so we can use __patch_text() directly */
++ __patch_text((void *)bpt->bpt_addr, *(unsigned int *)bpt->saved_instr);
+
+ return 0;
+ }
+diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
+index 54272e0be713..7d5379c1c443 100644
+--- a/arch/arm/kernel/perf_event.c
++++ b/arch/arm/kernel/perf_event.c
+@@ -795,8 +795,10 @@ static int of_pmu_irq_cfg(struct arm_pmu *pmu)
+
+ /* Don't bother with PPIs; they're already affine */
+ irq = platform_get_irq(pdev, 0);
+- if (irq >= 0 && irq_is_percpu(irq))
++ if (irq >= 0 && irq_is_percpu(irq)) {
++ cpumask_setall(&pmu->supported_cpus);
+ return 0;
++ }
+
+ irqs = kcalloc(pdev->num_resources, sizeof(*irqs), GFP_KERNEL);
+ if (!irqs)
+diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
+index 423663e23791..586eef26203d 100644
+--- a/arch/arm/kernel/signal.c
++++ b/arch/arm/kernel/signal.c
+@@ -343,12 +343,17 @@ setup_return(struct pt_regs *regs, struct ksignal *ksig,
+ */
+ thumb = handler & 1;
+
+-#if __LINUX_ARM_ARCH__ >= 7
++#if __LINUX_ARM_ARCH__ >= 6
+ /*
+- * Clear the If-Then Thumb-2 execution state
+- * ARM spec requires this to be all 000s in ARM mode
+- * Snapdragon S4/Krait misbehaves on a Thumb=>ARM
+- * signal transition without this.
++ * Clear the If-Then Thumb-2 execution state. ARM spec
++ * requires this to be all 000s in ARM mode. Snapdragon
++ * S4/Krait misbehaves on a Thumb=>ARM signal transition
++ * without this.
++ *
++ * We must do this whenever we are running on a Thumb-2
++ * capable CPU, which includes ARMv6T2. However, we elect
++ * to do this whenever we're on an ARMv6 or later CPU for
++ * simplicity.
+ */
+ cpsr &= ~PSR_IT_MASK;
+ #endif
+diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
+index 702740d37465..51a59504bef4 100644
+--- a/arch/arm/kvm/interrupts_head.S
++++ b/arch/arm/kvm/interrupts_head.S
+@@ -515,8 +515,7 @@ ARM_BE8(rev r6, r6 )
+
+ mrc p15, 0, r2, c14, c3, 1 @ CNTV_CTL
+ str r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
+- bic r2, #1 @ Clear ENABLE
+- mcr p15, 0, r2, c14, c3, 1 @ CNTV_CTL
++
+ isb
+
+ mrrc p15, 3, rr_lo_hi(r2, r3), c14 @ CNTV_CVAL
+@@ -529,6 +528,9 @@ ARM_BE8(rev r6, r6 )
+ mcrr p15, 4, r2, r2, c14 @ CNTVOFF
+
+ 1:
++ mov r2, #0 @ Clear ENABLE
++ mcr p15, 0, r2, c14, c3, 1 @ CNTV_CTL
++
+ @ Allow physical timer/counter access for the host
+ mrc p15, 4, r2, c14, c1, 0 @ CNTHCTL
+ orr r2, r2, #(CNTHCTL_PL1PCEN | CNTHCTL_PL1PCTEN)
+diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
+index 7b4201294187..6984342da13d 100644
+--- a/arch/arm/kvm/mmu.c
++++ b/arch/arm/kvm/mmu.c
+@@ -1792,8 +1792,10 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
+ if (vma->vm_flags & VM_PFNMAP) {
+ gpa_t gpa = mem->guest_phys_addr +
+ (vm_start - mem->userspace_addr);
+- phys_addr_t pa = (vma->vm_pgoff << PAGE_SHIFT) +
+- vm_start - vma->vm_start;
++ phys_addr_t pa;
++
++ pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
++ pa += vm_start - vma->vm_start;
+
+ /* IO region dirty page logging not allowed */
+ if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES)
+diff --git a/arch/arm/mach-exynos/mcpm-exynos.c b/arch/arm/mach-exynos/mcpm-exynos.c
+index 9bdf54795f05..56978199c479 100644
+--- a/arch/arm/mach-exynos/mcpm-exynos.c
++++ b/arch/arm/mach-exynos/mcpm-exynos.c
+@@ -20,6 +20,7 @@
+ #include <asm/cputype.h>
+ #include <asm/cp15.h>
+ #include <asm/mcpm.h>
++#include <asm/smp_plat.h>
+
+ #include "regs-pmu.h"
+ #include "common.h"
+@@ -70,7 +71,31 @@ static int exynos_cpu_powerup(unsigned int cpu, unsigned int cluster)
+ cluster >= EXYNOS5420_NR_CLUSTERS)
+ return -EINVAL;
+
+- exynos_cpu_power_up(cpunr);
++ if (!exynos_cpu_power_state(cpunr)) {
++ exynos_cpu_power_up(cpunr);
++
++ /*
++ * This assumes the cluster number of the big cores(Cortex A15)
++ * is 0 and the Little cores(Cortex A7) is 1.
++ * When the system was booted from the Little core,
++ * they should be reset during power up cpu.
++ */
++ if (cluster &&
++ cluster == MPIDR_AFFINITY_LEVEL(cpu_logical_map(0), 1)) {
++ /*
++ * Before we reset the Little cores, we should wait
++ * the SPARE2 register is set to 1 because the init
++ * codes of the iROM will set the register after
++ * initialization.
++ */
++ while (!pmu_raw_readl(S5P_PMU_SPARE2))
++ udelay(10);
++
++ pmu_raw_writel(EXYNOS5420_KFC_CORE_RESET(cpu),
++ EXYNOS_SWRESET);
++ }
++ }
++
+ return 0;
+ }
+
+diff --git a/arch/arm/mach-exynos/regs-pmu.h b/arch/arm/mach-exynos/regs-pmu.h
+index b7614333d296..fba9068ed260 100644
+--- a/arch/arm/mach-exynos/regs-pmu.h
++++ b/arch/arm/mach-exynos/regs-pmu.h
+@@ -513,6 +513,12 @@ static inline unsigned int exynos_pmu_cpunr(unsigned int mpidr)
+ #define SPREAD_ENABLE 0xF
+ #define SPREAD_USE_STANDWFI 0xF
+
++#define EXYNOS5420_KFC_CORE_RESET0 BIT(8)
++#define EXYNOS5420_KFC_ETM_RESET0 BIT(20)
++
++#define EXYNOS5420_KFC_CORE_RESET(_nr) \
++ ((EXYNOS5420_KFC_CORE_RESET0 | EXYNOS5420_KFC_ETM_RESET0) << (_nr))
++
+ #define EXYNOS5420_BB_CON1 0x0784
+ #define EXYNOS5420_BB_SEL_EN BIT(31)
+ #define EXYNOS5420_BB_PMOS_EN BIT(7)
+diff --git a/arch/arm/plat-pxa/ssp.c b/arch/arm/plat-pxa/ssp.c
+index ad9529cc4203..daa1a65f2eb7 100644
+--- a/arch/arm/plat-pxa/ssp.c
++++ b/arch/arm/plat-pxa/ssp.c
+@@ -107,7 +107,6 @@ static const struct of_device_id pxa_ssp_of_ids[] = {
+ { .compatible = "mvrl,pxa168-ssp", .data = (void *) PXA168_SSP },
+ { .compatible = "mrvl,pxa910-ssp", .data = (void *) PXA910_SSP },
+ { .compatible = "mrvl,ce4100-ssp", .data = (void *) CE4100_SSP },
+- { .compatible = "mrvl,lpss-ssp", .data = (void *) LPSS_SSP },
+ { },
+ };
+ MODULE_DEVICE_TABLE(of, pxa_ssp_of_ids);
+diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c
+index e8ca6eaedd02..13671a9cf016 100644
+--- a/arch/arm64/kernel/efi.c
++++ b/arch/arm64/kernel/efi.c
+@@ -258,7 +258,8 @@ static bool __init efi_virtmap_init(void)
+ */
+ if (!is_normal_ram(md))
+ prot = __pgprot(PROT_DEVICE_nGnRE);
+- else if (md->type == EFI_RUNTIME_SERVICES_CODE)
++ else if (md->type == EFI_RUNTIME_SERVICES_CODE ||
++ !PAGE_ALIGNED(md->phys_addr))
+ prot = PAGE_KERNEL_EXEC;
+ else
+ prot = PAGE_KERNEL;
+diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
+index 08cafc518b9a..0f03a8fe2314 100644
+--- a/arch/arm64/kernel/entry-ftrace.S
++++ b/arch/arm64/kernel/entry-ftrace.S
+@@ -178,6 +178,24 @@ ENTRY(ftrace_stub)
+ ENDPROC(ftrace_stub)
+
+ #ifdef CONFIG_FUNCTION_GRAPH_TRACER
++ /* save return value regs*/
++ .macro save_return_regs
++ sub sp, sp, #64
++ stp x0, x1, [sp]
++ stp x2, x3, [sp, #16]
++ stp x4, x5, [sp, #32]
++ stp x6, x7, [sp, #48]
++ .endm
++
++ /* restore return value regs*/
++ .macro restore_return_regs
++ ldp x0, x1, [sp]
++ ldp x2, x3, [sp, #16]
++ ldp x4, x5, [sp, #32]
++ ldp x6, x7, [sp, #48]
++ add sp, sp, #64
++ .endm
++
+ /*
+ * void ftrace_graph_caller(void)
+ *
+@@ -204,11 +222,11 @@ ENDPROC(ftrace_graph_caller)
+ * only when CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST is enabled.
+ */
+ ENTRY(return_to_handler)
+- str x0, [sp, #-16]!
++ save_return_regs
+ mov x0, x29 // parent's fp
+ bl ftrace_return_to_handler// addr = ftrace_return_to_hander(fp);
+ mov x30, x0 // restore the original return address
+- ldr x0, [sp], #16
++ restore_return_regs
+ ret
+ END(return_to_handler)
+ #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
+index 94d98cd1aad8..27c3e6fd24c1 100644
+--- a/arch/arm64/mm/fault.c
++++ b/arch/arm64/mm/fault.c
+@@ -278,6 +278,7 @@ retry:
+ * starvation.
+ */
+ mm_flags &= ~FAULT_FLAG_ALLOW_RETRY;
++ mm_flags |= FAULT_FLAG_TRIED;
+ goto retry;
+ }
+ }
+diff --git a/arch/m68k/include/asm/linkage.h b/arch/m68k/include/asm/linkage.h
+index 5a822bb790f7..066e74f666ae 100644
+--- a/arch/m68k/include/asm/linkage.h
++++ b/arch/m68k/include/asm/linkage.h
+@@ -4,4 +4,34 @@
+ #define __ALIGN .align 4
+ #define __ALIGN_STR ".align 4"
+
++/*
++ * Make sure the compiler doesn't do anything stupid with the
++ * arguments on the stack - they are owned by the *caller*, not
++ * the callee. This just fools gcc into not spilling into them,
++ * and keeps it from doing tailcall recursion and/or using the
++ * stack slots for temporaries, since they are live and "used"
++ * all the way to the end of the function.
++ */
++#define asmlinkage_protect(n, ret, args...) \
++ __asmlinkage_protect##n(ret, ##args)
++#define __asmlinkage_protect_n(ret, args...) \
++ __asm__ __volatile__ ("" : "=r" (ret) : "0" (ret), ##args)
++#define __asmlinkage_protect0(ret) \
++ __asmlinkage_protect_n(ret)
++#define __asmlinkage_protect1(ret, arg1) \
++ __asmlinkage_protect_n(ret, "m" (arg1))
++#define __asmlinkage_protect2(ret, arg1, arg2) \
++ __asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2))
++#define __asmlinkage_protect3(ret, arg1, arg2, arg3) \
++ __asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2), "m" (arg3))
++#define __asmlinkage_protect4(ret, arg1, arg2, arg3, arg4) \
++ __asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2), "m" (arg3), \
++ "m" (arg4))
++#define __asmlinkage_protect5(ret, arg1, arg2, arg3, arg4, arg5) \
++ __asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2), "m" (arg3), \
++ "m" (arg4), "m" (arg5))
++#define __asmlinkage_protect6(ret, arg1, arg2, arg3, arg4, arg5, arg6) \
++ __asmlinkage_protect_n(ret, "m" (arg1), "m" (arg2), "m" (arg3), \
++ "m" (arg4), "m" (arg5), "m" (arg6))
++
+ #endif
+diff --git a/arch/mips/kernel/cps-vec.S b/arch/mips/kernel/cps-vec.S
+index 9f71c06aebf6..209ded16806b 100644
+--- a/arch/mips/kernel/cps-vec.S
++++ b/arch/mips/kernel/cps-vec.S
+@@ -39,6 +39,7 @@
+ mfc0 \dest, CP0_CONFIG, 3
+ andi \dest, \dest, MIPS_CONF3_MT
+ beqz \dest, \nomt
++ nop
+ .endm
+
+ .section .text.cps-vec
+@@ -223,10 +224,9 @@ LEAF(excep_ejtag)
+ END(excep_ejtag)
+
+ LEAF(mips_cps_core_init)
+-#ifdef CONFIG_MIPS_MT
++#ifdef CONFIG_MIPS_MT_SMP
+ /* Check that the core implements the MT ASE */
+ has_mt t0, 3f
+- nop
+
+ .set push
+ .set mips64r2
+@@ -310,8 +310,9 @@ LEAF(mips_cps_boot_vpes)
+ PTR_ADDU t0, t0, t1
+
+ /* Calculate this VPEs ID. If the core doesn't support MT use 0 */
++ li t9, 0
++#ifdef CONFIG_MIPS_MT_SMP
+ has_mt ta2, 1f
+- li t9, 0
+
+ /* Find the number of VPEs present in the core */
+ mfc0 t1, CP0_MVPCONF0
+@@ -330,6 +331,7 @@ LEAF(mips_cps_boot_vpes)
+ /* Retrieve the VPE ID from EBase.CPUNum */
+ mfc0 t9, $15, 1
+ and t9, t9, t1
++#endif
+
+ 1: /* Calculate a pointer to this VPEs struct vpe_boot_config */
+ li t1, VPEBOOTCFG_SIZE
+@@ -337,7 +339,7 @@ LEAF(mips_cps_boot_vpes)
+ PTR_L ta3, COREBOOTCFG_VPECONFIG(t0)
+ PTR_ADDU v0, v0, ta3
+
+-#ifdef CONFIG_MIPS_MT
++#ifdef CONFIG_MIPS_MT_SMP
+
+ /* If the core doesn't support MT then return */
+ bnez ta2, 1f
+@@ -451,7 +453,7 @@ LEAF(mips_cps_boot_vpes)
+
+ 2: .set pop
+
+-#endif /* CONFIG_MIPS_MT */
++#endif /* CONFIG_MIPS_MT_SMP */
+
+ /* Return */
+ jr ra
+diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
+index 008b3378653a..4ceac5cdd6b8 100644
+--- a/arch/mips/kernel/setup.c
++++ b/arch/mips/kernel/setup.c
+@@ -338,7 +338,7 @@ static void __init bootmem_init(void)
+ if (end <= reserved_end)
+ continue;
+ #ifdef CONFIG_BLK_DEV_INITRD
+- /* mapstart should be after initrd_end */
++ /* Skip zones before initrd and initrd itself */
+ if (initrd_end && end <= (unsigned long)PFN_UP(__pa(initrd_end)))
+ continue;
+ #endif
+@@ -371,6 +371,14 @@ static void __init bootmem_init(void)
+ max_low_pfn = PFN_DOWN(HIGHMEM_START);
+ }
+
++#ifdef CONFIG_BLK_DEV_INITRD
++ /*
++ * mapstart should be after initrd_end
++ */
++ if (initrd_end)
++ mapstart = max(mapstart, (unsigned long)PFN_UP(__pa(initrd_end)));
++#endif
++
+ /*
+ * Initialize the boot-time allocator with low memory only.
+ */
+diff --git a/arch/mips/loongson64/common/env.c b/arch/mips/loongson64/common/env.c
+index f6c44dd332e2..d6d07ad56180 100644
+--- a/arch/mips/loongson64/common/env.c
++++ b/arch/mips/loongson64/common/env.c
+@@ -64,6 +64,9 @@ void __init prom_init_env(void)
+ }
+ if (memsize == 0)
+ memsize = 256;
++
++ loongson_sysconf.nr_uarts = 1;
++
+ pr_info("memsize=%u, highmemsize=%u\n", memsize, highmemsize);
+ #else
+ struct boot_params *boot_p;
+diff --git a/arch/mips/mm/dma-default.c b/arch/mips/mm/dma-default.c
+index eeaf0245c3b1..815892ed3fe8 100644
+--- a/arch/mips/mm/dma-default.c
++++ b/arch/mips/mm/dma-default.c
+@@ -100,7 +100,7 @@ static gfp_t massage_gfp_flags(const struct device *dev, gfp_t gfp)
+ else
+ #endif
+ #if defined(CONFIG_ZONE_DMA) && !defined(CONFIG_ZONE_DMA32)
+- if (dev->coherent_dma_mask < DMA_BIT_MASK(64))
++ if (dev->coherent_dma_mask < DMA_BIT_MASK(sizeof(phys_addr_t) * 8))
+ dma_flag = __GFP_DMA;
+ else
+ #endif
+diff --git a/arch/mips/net/bpf_jit_asm.S b/arch/mips/net/bpf_jit_asm.S
+index e92726099be0..dabf4179cd7e 100644
+--- a/arch/mips/net/bpf_jit_asm.S
++++ b/arch/mips/net/bpf_jit_asm.S
+@@ -64,8 +64,20 @@ sk_load_word_positive:
+ PTR_ADDU t1, $r_skb_data, offset
+ lw $r_A, 0(t1)
+ #ifdef CONFIG_CPU_LITTLE_ENDIAN
++# if defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
+ wsbh t0, $r_A
+ rotr $r_A, t0, 16
++# else
++ sll t0, $r_A, 24
++ srl t1, $r_A, 24
++ srl t2, $r_A, 8
++ or t0, t0, t1
++ andi t2, t2, 0xff00
++ andi t1, $r_A, 0xff00
++ or t0, t0, t2
++ sll t1, t1, 8
++ or $r_A, t0, t1
++# endif
+ #endif
+ jr $r_ra
+ move $r_ret, zero
+@@ -80,8 +92,16 @@ sk_load_half_positive:
+ PTR_ADDU t1, $r_skb_data, offset
+ lh $r_A, 0(t1)
+ #ifdef CONFIG_CPU_LITTLE_ENDIAN
++# if defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
+ wsbh t0, $r_A
+ seh $r_A, t0
++# else
++ sll t0, $r_A, 24
++ andi t1, $r_A, 0xff00
++ sra t0, t0, 16
++ srl t1, t1, 8
++ or $r_A, t0, t1
++# endif
+ #endif
+ jr $r_ra
+ move $r_ret, zero
+@@ -148,23 +168,47 @@ sk_load_byte_positive:
+ NESTED(bpf_slow_path_word, (6 * SZREG), $r_sp)
+ bpf_slow_path_common(4)
+ #ifdef CONFIG_CPU_LITTLE_ENDIAN
++# if defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
+ wsbh t0, $r_s0
+ jr $r_ra
+ rotr $r_A, t0, 16
+-#endif
++# else
++ sll t0, $r_s0, 24
++ srl t1, $r_s0, 24
++ srl t2, $r_s0, 8
++ or t0, t0, t1
++ andi t2, t2, 0xff00
++ andi t1, $r_s0, 0xff00
++ or t0, t0, t2
++ sll t1, t1, 8
++ jr $r_ra
++ or $r_A, t0, t1
++# endif
++#else
+ jr $r_ra
+- move $r_A, $r_s0
++ move $r_A, $r_s0
++#endif
+
+ END(bpf_slow_path_word)
+
+ NESTED(bpf_slow_path_half, (6 * SZREG), $r_sp)
+ bpf_slow_path_common(2)
+ #ifdef CONFIG_CPU_LITTLE_ENDIAN
++# if defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
+ jr $r_ra
+ wsbh $r_A, $r_s0
+-#endif
++# else
++ sll t0, $r_s0, 8
++ andi t1, $r_s0, 0xff00
++ andi t0, t0, 0xff00
++ srl t1, t1, 8
++ jr $r_ra
++ or $r_A, t0, t1
++# endif
++#else
+ jr $r_ra
+ move $r_A, $r_s0
++#endif
+
+ END(bpf_slow_path_half)
+
+diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
+index 05ea8fc7f829..4816fe2fa857 100644
+--- a/arch/powerpc/kvm/book3s.c
++++ b/arch/powerpc/kvm/book3s.c
+@@ -827,12 +827,15 @@ int kvmppc_h_logical_ci_load(struct kvm_vcpu *vcpu)
+ unsigned long size = kvmppc_get_gpr(vcpu, 4);
+ unsigned long addr = kvmppc_get_gpr(vcpu, 5);
+ u64 buf;
++ int srcu_idx;
+ int ret;
+
+ if (!is_power_of_2(size) || (size > sizeof(buf)))
+ return H_TOO_HARD;
+
++ srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+ ret = kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, size, &buf);
++ srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+ if (ret != 0)
+ return H_TOO_HARD;
+
+@@ -867,6 +870,7 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
+ unsigned long addr = kvmppc_get_gpr(vcpu, 5);
+ unsigned long val = kvmppc_get_gpr(vcpu, 6);
+ u64 buf;
++ int srcu_idx;
+ int ret;
+
+ switch (size) {
+@@ -890,7 +894,9 @@ int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu)
+ return H_TOO_HARD;
+ }
+
++ srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+ ret = kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, size, &buf);
++ srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+ if (ret != 0)
+ return H_TOO_HARD;
+
+diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
+index 68d067ad4222..a9f753fb73a8 100644
+--- a/arch/powerpc/kvm/book3s_hv.c
++++ b/arch/powerpc/kvm/book3s_hv.c
+@@ -2178,7 +2178,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
+ vc->runner = vcpu;
+ if (n_ceded == vc->n_runnable) {
+ kvmppc_vcore_blocked(vc);
+- } else if (should_resched()) {
++ } else if (need_resched()) {
+ vc->vcore_state = VCORE_PREEMPT;
+ /* Let something else run */
+ cond_resched_lock(&vc->lock);
+diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+index 76408cf0ad04..437f64350847 100644
+--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
++++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+@@ -1171,6 +1171,7 @@ mc_cont:
+ bl kvmhv_accumulate_time
+ #endif
+
++ mr r3, r12
+ /* Increment exit count, poke other threads to exit */
+ bl kvmhv_commence_exit
+ nop
+diff --git a/arch/powerpc/platforms/pasemi/msi.c b/arch/powerpc/platforms/pasemi/msi.c
+index 27f2b187a91b..ff1bb4b690b9 100644
+--- a/arch/powerpc/platforms/pasemi/msi.c
++++ b/arch/powerpc/platforms/pasemi/msi.c
+@@ -63,6 +63,7 @@ static struct irq_chip mpic_pasemi_msi_chip = {
+ static void pasemi_msi_teardown_msi_irqs(struct pci_dev *pdev)
+ {
+ struct msi_desc *entry;
++ irq_hw_number_t hwirq;
+
+ pr_debug("pasemi_msi_teardown_msi_irqs, pdev %p\n", pdev);
+
+@@ -70,10 +71,10 @@ static void pasemi_msi_teardown_msi_irqs(struct pci_dev *pdev)
+ if (entry->irq == NO_IRQ)
+ continue;
+
++ hwirq = virq_to_hw(entry->irq);
+ irq_set_msi_desc(entry->irq, NULL);
+- msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap,
+- virq_to_hw(entry->irq), ALLOC_CHUNK);
+ irq_dispose_mapping(entry->irq);
++ msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap, hwirq, ALLOC_CHUNK);
+ }
+
+ return;
+diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
+index 765d8ed558d0..fd16f86e54a9 100644
+--- a/arch/powerpc/platforms/powernv/pci.c
++++ b/arch/powerpc/platforms/powernv/pci.c
+@@ -99,6 +99,7 @@ void pnv_teardown_msi_irqs(struct pci_dev *pdev)
+ struct pci_controller *hose = pci_bus_to_host(pdev->bus);
+ struct pnv_phb *phb = hose->private_data;
+ struct msi_desc *entry;
++ irq_hw_number_t hwirq;
+
+ if (WARN_ON(!phb))
+ return;
+@@ -106,10 +107,10 @@ void pnv_teardown_msi_irqs(struct pci_dev *pdev)
+ list_for_each_entry(entry, &pdev->msi_list, list) {
+ if (entry->irq == NO_IRQ)
+ continue;
++ hwirq = virq_to_hw(entry->irq);
+ irq_set_msi_desc(entry->irq, NULL);
+- msi_bitmap_free_hwirqs(&phb->msi_bmp,
+- virq_to_hw(entry->irq) - phb->msi_base, 1);
+ irq_dispose_mapping(entry->irq);
++ msi_bitmap_free_hwirqs(&phb->msi_bmp, hwirq - phb->msi_base, 1);
+ }
+ }
+ #endif /* CONFIG_PCI_MSI */
+diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c
+index 5236e5427c38..691e8e517b3e 100644
+--- a/arch/powerpc/sysdev/fsl_msi.c
++++ b/arch/powerpc/sysdev/fsl_msi.c
+@@ -128,15 +128,16 @@ static void fsl_teardown_msi_irqs(struct pci_dev *pdev)
+ {
+ struct msi_desc *entry;
+ struct fsl_msi *msi_data;
++ irq_hw_number_t hwirq;
+
+ list_for_each_entry(entry, &pdev->msi_list, list) {
+ if (entry->irq == NO_IRQ)
+ continue;
++ hwirq = virq_to_hw(entry->irq);
+ msi_data = irq_get_chip_data(entry->irq);
+ irq_set_msi_desc(entry->irq, NULL);
+- msi_bitmap_free_hwirqs(&msi_data->bitmap,
+- virq_to_hw(entry->irq), 1);
+ irq_dispose_mapping(entry->irq);
++ msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
+ }
+
+ return;
+diff --git a/arch/powerpc/sysdev/mpic_u3msi.c b/arch/powerpc/sysdev/mpic_u3msi.c
+index fc46ef3b816e..4c3165fa521c 100644
+--- a/arch/powerpc/sysdev/mpic_u3msi.c
++++ b/arch/powerpc/sysdev/mpic_u3msi.c
+@@ -107,15 +107,16 @@ static u64 find_u4_magic_addr(struct pci_dev *pdev, unsigned int hwirq)
+ static void u3msi_teardown_msi_irqs(struct pci_dev *pdev)
+ {
+ struct msi_desc *entry;
++ irq_hw_number_t hwirq;
+
+ list_for_each_entry(entry, &pdev->msi_list, list) {
+ if (entry->irq == NO_IRQ)
+ continue;
+
++ hwirq = virq_to_hw(entry->irq);
+ irq_set_msi_desc(entry->irq, NULL);
+- msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap,
+- virq_to_hw(entry->irq), 1);
+ irq_dispose_mapping(entry->irq);
++ msi_bitmap_free_hwirqs(&msi_mpic->msi_bitmap, hwirq, 1);
+ }
+
+ return;
+diff --git a/arch/powerpc/sysdev/ppc4xx_msi.c b/arch/powerpc/sysdev/ppc4xx_msi.c
+index 6eb21f2ea585..060f23775255 100644
+--- a/arch/powerpc/sysdev/ppc4xx_msi.c
++++ b/arch/powerpc/sysdev/ppc4xx_msi.c
+@@ -124,16 +124,17 @@ void ppc4xx_teardown_msi_irqs(struct pci_dev *dev)
+ {
+ struct msi_desc *entry;
+ struct ppc4xx_msi *msi_data = &ppc4xx_msi;
++ irq_hw_number_t hwirq;
+
+ dev_dbg(&dev->dev, "PCIE-MSI: tearing down msi irqs\n");
+
+ list_for_each_entry(entry, &dev->msi_list, list) {
+ if (entry->irq == NO_IRQ)
+ continue;
++ hwirq = virq_to_hw(entry->irq);
+ irq_set_msi_desc(entry->irq, NULL);
+- msi_bitmap_free_hwirqs(&msi_data->bitmap,
+- virq_to_hw(entry->irq), 1);
+ irq_dispose_mapping(entry->irq);
++ msi_bitmap_free_hwirqs(&msi_data->bitmap, hwirq, 1);
+ }
+ }
+
+diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
+index d4788111c161..fac6ac9790fa 100644
+--- a/arch/s390/boot/compressed/Makefile
++++ b/arch/s390/boot/compressed/Makefile
+@@ -10,7 +10,7 @@ targets += misc.o piggy.o sizes.h head.o
+
+ KBUILD_CFLAGS := -m64 -D__KERNEL__ $(LINUX_INCLUDE) -O2
+ KBUILD_CFLAGS += -DDISABLE_BRANCH_PROFILING
+-KBUILD_CFLAGS += $(cflags-y) -fno-delete-null-pointer-checks
++KBUILD_CFLAGS += $(cflags-y) -fno-delete-null-pointer-checks -msoft-float
+ KBUILD_CFLAGS += $(call cc-option,-mpacked-stack)
+ KBUILD_CFLAGS += $(call cc-option,-ffreestanding)
+
+diff --git a/arch/s390/kernel/compat_signal.c b/arch/s390/kernel/compat_signal.c
+index fe8d6924efaa..c78ba51ae285 100644
+--- a/arch/s390/kernel/compat_signal.c
++++ b/arch/s390/kernel/compat_signal.c
+@@ -48,6 +48,19 @@ typedef struct
+ struct ucontext32 uc;
+ } rt_sigframe32;
+
++static inline void sigset_to_sigset32(unsigned long *set64,
++ compat_sigset_word *set32)
++{
++ set32[0] = (compat_sigset_word) set64[0];
++ set32[1] = (compat_sigset_word)(set64[0] >> 32);
++}
++
++static inline void sigset32_to_sigset(compat_sigset_word *set32,
++ unsigned long *set64)
++{
++ set64[0] = (unsigned long) set32[0] | ((unsigned long) set32[1] << 32);
++}
++
+ int copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from)
+ {
+ int err;
+@@ -303,10 +316,12 @@ COMPAT_SYSCALL_DEFINE0(sigreturn)
+ {
+ struct pt_regs *regs = task_pt_regs(current);
+ sigframe32 __user *frame = (sigframe32 __user *)regs->gprs[15];
++ compat_sigset_t cset;
+ sigset_t set;
+
+- if (__copy_from_user(&set.sig, &frame->sc.oldmask, _SIGMASK_COPY_SIZE32))
++ if (__copy_from_user(&cset.sig, &frame->sc.oldmask, _SIGMASK_COPY_SIZE32))
+ goto badframe;
++ sigset32_to_sigset(cset.sig, set.sig);
+ set_current_blocked(&set);
+ if (restore_sigregs32(regs, &frame->sregs))
+ goto badframe;
+@@ -323,10 +338,12 @@ COMPAT_SYSCALL_DEFINE0(rt_sigreturn)
+ {
+ struct pt_regs *regs = task_pt_regs(current);
+ rt_sigframe32 __user *frame = (rt_sigframe32 __user *)regs->gprs[15];
++ compat_sigset_t cset;
+ sigset_t set;
+
+- if (__copy_from_user(&set, &frame->uc.uc_sigmask, sizeof(set)))
++ if (__copy_from_user(&cset, &frame->uc.uc_sigmask, sizeof(cset)))
+ goto badframe;
++ sigset32_to_sigset(cset.sig, set.sig);
+ set_current_blocked(&set);
+ if (compat_restore_altstack(&frame->uc.uc_stack))
+ goto badframe;
+@@ -397,7 +414,7 @@ static int setup_frame32(struct ksignal *ksig, sigset_t *set,
+ return -EFAULT;
+
+ /* Create struct sigcontext32 on the signal stack */
+- memcpy(&sc.oldmask, &set->sig, _SIGMASK_COPY_SIZE32);
++ sigset_to_sigset32(set->sig, sc.oldmask);
+ sc.sregs = (__u32)(unsigned long __force) &frame->sregs;
+ if (__copy_to_user(&frame->sc, &sc, sizeof(frame->sc)))
+ return -EFAULT;
+@@ -458,6 +475,7 @@ static int setup_frame32(struct ksignal *ksig, sigset_t *set,
+ static int setup_rt_frame32(struct ksignal *ksig, sigset_t *set,
+ struct pt_regs *regs)
+ {
++ compat_sigset_t cset;
+ rt_sigframe32 __user *frame;
+ unsigned long restorer;
+ size_t frame_size;
+@@ -505,11 +523,12 @@ static int setup_rt_frame32(struct ksignal *ksig, sigset_t *set,
+ store_sigregs();
+
+ /* Create ucontext on the signal stack. */
++ sigset_to_sigset32(set->sig, cset.sig);
+ if (__put_user(uc_flags, &frame->uc.uc_flags) ||
+ __put_user(0, &frame->uc.uc_link) ||
+ __compat_save_altstack(&frame->uc.uc_stack, regs->gprs[15]) ||
+ save_sigregs32(regs, &frame->uc.uc_mcontext) ||
+- __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set)) ||
++ __copy_to_user(&frame->uc.uc_sigmask, &cset, sizeof(cset)) ||
+ save_sigregs_ext32(regs, &frame->uc.uc_mcontext_ext))
+ return -EFAULT;
+
+diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
+index 8cb3e438f21e..d330840a2b18 100644
+--- a/arch/x86/entry/entry_64.S
++++ b/arch/x86/entry/entry_64.S
+@@ -1219,7 +1219,18 @@ END(error_exit)
+
+ /* Runs on exception stack */
+ ENTRY(nmi)
++ /*
++ * Fix up the exception frame if we're on Xen.
++ * PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most
++ * one value to the stack on native, so it may clobber the rdx
++ * scratch slot, but it won't clobber any of the important
++ * slots past it.
++ *
++ * Xen is a different story, because the Xen frame itself overlaps
++ * the "NMI executing" variable.
++ */
+ PARAVIRT_ADJUST_EXCEPTION_FRAME
++
+ /*
+ * We allow breakpoints in NMIs. If a breakpoint occurs, then
+ * the iretq it performs will take us out of NMI context.
+@@ -1270,9 +1281,12 @@ ENTRY(nmi)
+ * we don't want to enable interrupts, because then we'll end
+ * up in an awkward situation in which IRQs are on but NMIs
+ * are off.
++ *
++ * We also must not push anything to the stack before switching
++ * stacks lest we corrupt the "NMI executing" variable.
+ */
+
+- SWAPGS
++ SWAPGS_UNSAFE_STACK
+ cld
+ movq %rsp, %rdx
+ movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
+index 9ebc3d009373..2350ab78183a 100644
+--- a/arch/x86/include/asm/msr-index.h
++++ b/arch/x86/include/asm/msr-index.h
+@@ -311,6 +311,7 @@
+ /* C1E active bits in int pending message */
+ #define K8_INTP_C1E_ACTIVE_MASK 0x18000000
+ #define MSR_K8_TSEG_ADDR 0xc0010112
++#define MSR_K8_TSEG_MASK 0xc0010113
+ #define K8_MTRRFIXRANGE_DRAM_ENABLE 0x00040000 /* MtrrFixDramEn bit */
+ #define K8_MTRRFIXRANGE_DRAM_MODIFY 0x00080000 /* MtrrFixDramModEn bit */
+ #define K8_MTRR_RDMEM_WRMEM_MASK 0x18181818 /* Mask: RdMem|WrMem */
+diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h
+index dca71714f860..b12f81022a6b 100644
+--- a/arch/x86/include/asm/preempt.h
++++ b/arch/x86/include/asm/preempt.h
+@@ -90,9 +90,9 @@ static __always_inline bool __preempt_count_dec_and_test(void)
+ /*
+ * Returns true when we need to resched and can (barring IRQ state).
+ */
+-static __always_inline bool should_resched(void)
++static __always_inline bool should_resched(int preempt_offset)
+ {
+- return unlikely(!raw_cpu_read_4(__preempt_count));
++ return unlikely(raw_cpu_read_4(__preempt_count) == preempt_offset);
+ }
+
+ #ifdef CONFIG_PREEMPT
+diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
+index 9d51fae1cba3..eaba08076030 100644
+--- a/arch/x86/include/asm/qspinlock.h
++++ b/arch/x86/include/asm/qspinlock.h
+@@ -39,18 +39,27 @@ static inline void queued_spin_unlock(struct qspinlock *lock)
+ }
+ #endif
+
+-#define virt_queued_spin_lock virt_queued_spin_lock
+-
+-static inline bool virt_queued_spin_lock(struct qspinlock *lock)
++#ifdef CONFIG_PARAVIRT
++#define virt_spin_lock virt_spin_lock
++static inline bool virt_spin_lock(struct qspinlock *lock)
+ {
+ if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
+ return false;
+
+- while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0)
+- cpu_relax();
++ /*
++ * On hypervisors without PARAVIRT_SPINLOCKS support we fall
++ * back to a Test-and-Set spinlock, because fair locks have
++ * horrible lock 'holder' preemption issues.
++ */
++
++ do {
++ while (atomic_read(&lock->val) != 0)
++ cpu_relax();
++ } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
+
+ return true;
+ }
++#endif /* CONFIG_PARAVIRT */
+
+ #include <asm-generic/qspinlock.h>
+
+diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
+index c42827eb86cf..25f909362b7a 100644
+--- a/arch/x86/kernel/alternative.c
++++ b/arch/x86/kernel/alternative.c
+@@ -338,10 +338,15 @@ done:
+
+ static void __init_or_module optimize_nops(struct alt_instr *a, u8 *instr)
+ {
++ unsigned long flags;
++
+ if (instr[0] != 0x90)
+ return;
+
++ local_irq_save(flags);
+ add_nops(instr + (a->instrlen - a->padlen), a->padlen);
++ sync_core();
++ local_irq_restore(flags);
+
+ DUMP_BYTES(instr, a->instrlen, "%p: [%d:%d) optimized NOPs: ",
+ instr, a->instrlen - a->padlen, a->padlen);
+diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
+index cde732c1b495..307a49828826 100644
+--- a/arch/x86/kernel/apic/apic.c
++++ b/arch/x86/kernel/apic/apic.c
+@@ -336,6 +336,13 @@ static void __setup_APIC_LVTT(unsigned int clocks, int oneshot, int irqen)
+ apic_write(APIC_LVTT, lvtt_value);
+
+ if (lvtt_value & APIC_LVT_TIMER_TSCDEADLINE) {
++ /*
++ * See Intel SDM: TSC-Deadline Mode chapter. In xAPIC mode,
++ * writing to the APIC LVTT and TSC_DEADLINE MSR isn't serialized.
++ * According to Intel, MFENCE can do the serialization here.
++ */
++ asm volatile("mfence" : : : "memory");
++
+ printk_once(KERN_DEBUG "TSC deadline timer enabled\n");
+ return;
+ }
+diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
+index 206052e55517..5880b482d83c 100644
+--- a/arch/x86/kernel/apic/io_apic.c
++++ b/arch/x86/kernel/apic/io_apic.c
+@@ -2522,6 +2522,7 @@ void __init setup_ioapic_dest(void)
+ int pin, ioapic, irq, irq_entry;
+ const struct cpumask *mask;
+ struct irq_data *idata;
++ struct irq_chip *chip;
+
+ if (skip_ioapic_setup == 1)
+ return;
+@@ -2545,9 +2546,9 @@ void __init setup_ioapic_dest(void)
+ else
+ mask = apic->target_cpus();
+
+- irq_set_affinity(irq, mask);
++ chip = irq_data_get_irq_chip(idata);
++ chip->irq_set_affinity(idata, mask, false);
+ }
+-
+ }
+ #endif
+
+diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
+index 6326ae24e4d5..1b09c420c7ff 100644
+--- a/arch/x86/kernel/cpu/perf_event_intel.c
++++ b/arch/x86/kernel/cpu/perf_event_intel.c
+@@ -2102,9 +2102,12 @@ static struct event_constraint *
+ intel_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
+ struct perf_event *event)
+ {
+- struct event_constraint *c1 = cpuc->event_constraint[idx];
++ struct event_constraint *c1 = NULL;
+ struct event_constraint *c2;
+
++ if (idx >= 0) /* fake does < 0 */
++ c1 = cpuc->event_constraint[idx];
++
+ /*
+ * first time only
+ * - static constraint: no change across incremental scheduling calls
+diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
+index e068d6683dba..74ca2fe7a0b3 100644
+--- a/arch/x86/kernel/crash.c
++++ b/arch/x86/kernel/crash.c
+@@ -185,10 +185,9 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
+ }
+
+ #ifdef CONFIG_KEXEC_FILE
+-static int get_nr_ram_ranges_callback(unsigned long start_pfn,
+- unsigned long nr_pfn, void *arg)
++static int get_nr_ram_ranges_callback(u64 start, u64 end, void *arg)
+ {
+- int *nr_ranges = arg;
++ unsigned int *nr_ranges = arg;
+
+ (*nr_ranges)++;
+ return 0;
+@@ -214,7 +213,7 @@ static void fill_up_crash_elf_data(struct crash_elf_data *ced,
+
+ ced->image = image;
+
+- walk_system_ram_range(0, -1, &nr_ranges,
++ walk_system_ram_res(0, -1, &nr_ranges,
+ get_nr_ram_ranges_callback);
+
+ ced->max_nr_ranges = nr_ranges;
+diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
+index 58bcfb67c01f..ebb5657ee280 100644
+--- a/arch/x86/kernel/paravirt.c
++++ b/arch/x86/kernel/paravirt.c
+@@ -41,10 +41,18 @@
+ #include <asm/timer.h>
+ #include <asm/special_insns.h>
+
+-/* nop stub */
+-void _paravirt_nop(void)
+-{
+-}
++/*
++ * nop stub, which must not clobber anything *including the stack* to
++ * avoid confusing the entry prologues.
++ */
++extern void _paravirt_nop(void);
++asm (".pushsection .entry.text, \"ax\"\n"
++ ".global _paravirt_nop\n"
++ "_paravirt_nop:\n\t"
++ "ret\n\t"
++ ".size _paravirt_nop, . - _paravirt_nop\n\t"
++ ".type _paravirt_nop, @function\n\t"
++ ".popsection");
+
+ /* identity function, which can be inlined */
+ u32 _paravirt_ident_32(u32 x)
+diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
+index f6b916387590..a90ac95562af 100644
+--- a/arch/x86/kernel/process_64.c
++++ b/arch/x86/kernel/process_64.c
+@@ -497,27 +497,59 @@ void set_personality_ia32(bool x32)
+ }
+ EXPORT_SYMBOL_GPL(set_personality_ia32);
+
++/*
++ * Called from fs/proc with a reference on @p to find the function
++ * which called into schedule(). This needs to be done carefully
++ * because the task might wake up and we might look at a stack
++ * changing under us.
++ */
+ unsigned long get_wchan(struct task_struct *p)
+ {
+- unsigned long stack;
+- u64 fp, ip;
++ unsigned long start, bottom, top, sp, fp, ip;
+ int count = 0;
+
+ if (!p || p == current || p->state == TASK_RUNNING)
+ return 0;
+- stack = (unsigned long)task_stack_page(p);
+- if (p->thread.sp < stack || p->thread.sp >= stack+THREAD_SIZE)
++
++ start = (unsigned long)task_stack_page(p);
++ if (!start)
++ return 0;
++
++ /*
++ * Layout of the stack page:
++ *
++ * ----------- topmax = start + THREAD_SIZE - sizeof(unsigned long)
++ * PADDING
++ * ----------- top = topmax - TOP_OF_KERNEL_STACK_PADDING
++ * stack
++ * ----------- bottom = start + sizeof(thread_info)
++ * thread_info
++ * ----------- start
++ *
++ * The tasks stack pointer points at the location where the
++ * framepointer is stored. The data on the stack is:
++ * ... IP FP ... IP FP
++ *
++ * We need to read FP and IP, so we need to adjust the upper
++ * bound by another unsigned long.
++ */
++ top = start + THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING;
++ top -= 2 * sizeof(unsigned long);
++ bottom = start + sizeof(struct thread_info);
++
++ sp = READ_ONCE(p->thread.sp);
++ if (sp < bottom || sp > top)
+ return 0;
+- fp = *(u64 *)(p->thread.sp);
++
++ fp = READ_ONCE(*(unsigned long *)sp);
+ do {
+- if (fp < (unsigned long)stack ||
+- fp >= (unsigned long)stack+THREAD_SIZE)
++ if (fp < bottom || fp > top)
+ return 0;
+- ip = *(u64 *)(fp+8);
++ ip = READ_ONCE(*(unsigned long *)(fp + sizeof(unsigned long)));
+ if (!in_sched_functions(ip))
+ return ip;
+- fp = *(u64 *)fp;
+- } while (count++ < 16);
++ fp = READ_ONCE(*(unsigned long *)fp);
++ } while (count++ < 16 && p->state != TASK_RUNNING);
+ return 0;
+ }
+
+diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
+index 7437b41f6a47..dc9af7a0839a 100644
+--- a/arch/x86/kernel/tsc.c
++++ b/arch/x86/kernel/tsc.c
+@@ -21,6 +21,7 @@
+ #include <asm/hypervisor.h>
+ #include <asm/nmi.h>
+ #include <asm/x86_init.h>
++#include <asm/geode.h>
+
+ unsigned int __read_mostly cpu_khz; /* TSC clocks / usec, not used here */
+ EXPORT_SYMBOL(cpu_khz);
+@@ -1013,15 +1014,17 @@ EXPORT_SYMBOL_GPL(mark_tsc_unstable);
+
+ static void __init check_system_tsc_reliable(void)
+ {
+-#ifdef CONFIG_MGEODE_LX
+- /* RTSC counts during suspend */
++#if defined(CONFIG_MGEODEGX1) || defined(CONFIG_MGEODE_LX) || defined(CONFIG_X86_GENERIC)
++ if (is_geode_lx()) {
++ /* RTSC counts during suspend */
+ #define RTSC_SUSP 0x100
+- unsigned long res_low, res_high;
++ unsigned long res_low, res_high;
+
+- rdmsr_safe(MSR_GEODE_BUSCONT_CONF0, &res_low, &res_high);
+- /* Geode_LX - the OLPC CPU has a very reliable TSC */
+- if (res_low & RTSC_SUSP)
+- tsc_clocksource_reliable = 1;
++ rdmsr_safe(MSR_GEODE_BUSCONT_CONF0, &res_low, &res_high);
++ /* Geode_LX - the OLPC CPU has a very reliable TSC */
++ if (res_low & RTSC_SUSP)
++ tsc_clocksource_reliable = 1;
++ }
+ #endif
+ if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE))
+ tsc_clocksource_reliable = 1;
+diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
+index 8e0c0844c6b9..2d32b67a1043 100644
+--- a/arch/x86/kvm/svm.c
++++ b/arch/x86/kvm/svm.c
+@@ -513,7 +513,7 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (svm->vmcb->control.next_rip != 0) {
+- WARN_ON(!static_cpu_has(X86_FEATURE_NRIPS));
++ WARN_ON_ONCE(!static_cpu_has(X86_FEATURE_NRIPS));
+ svm->next_rip = svm->vmcb->control.next_rip;
+ }
+
+@@ -865,64 +865,6 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
+ set_msr_interception(msrpm, MSR_IA32_LASTINTTOIP, 0, 0);
+ }
+
+-#define MTRR_TYPE_UC_MINUS 7
+-#define MTRR2PROTVAL_INVALID 0xff
+-
+-static u8 mtrr2protval[8];
+-
+-static u8 fallback_mtrr_type(int mtrr)
+-{
+- /*
+- * WT and WP aren't always available in the host PAT. Treat
+- * them as UC and UC- respectively. Everything else should be
+- * there.
+- */
+- switch (mtrr)
+- {
+- case MTRR_TYPE_WRTHROUGH:
+- return MTRR_TYPE_UNCACHABLE;
+- case MTRR_TYPE_WRPROT:
+- return MTRR_TYPE_UC_MINUS;
+- default:
+- BUG();
+- }
+-}
+-
+-static void build_mtrr2protval(void)
+-{
+- int i;
+- u64 pat;
+-
+- for (i = 0; i < 8; i++)
+- mtrr2protval[i] = MTRR2PROTVAL_INVALID;
+-
+- /* Ignore the invalid MTRR types. */
+- mtrr2protval[2] = 0;
+- mtrr2protval[3] = 0;
+-
+- /*
+- * Use host PAT value to figure out the mapping from guest MTRR
+- * values to nested page table PAT/PCD/PWT values. We do not
+- * want to change the host PAT value every time we enter the
+- * guest.
+- */
+- rdmsrl(MSR_IA32_CR_PAT, pat);
+- for (i = 0; i < 8; i++) {
+- u8 mtrr = pat >> (8 * i);
+-
+- if (mtrr2protval[mtrr] == MTRR2PROTVAL_INVALID)
+- mtrr2protval[mtrr] = __cm_idx2pte(i);
+- }
+-
+- for (i = 0; i < 8; i++) {
+- if (mtrr2protval[i] == MTRR2PROTVAL_INVALID) {
+- u8 fallback = fallback_mtrr_type(i);
+- mtrr2protval[i] = mtrr2protval[fallback];
+- BUG_ON(mtrr2protval[i] == MTRR2PROTVAL_INVALID);
+- }
+- }
+-}
+-
+ static __init int svm_hardware_setup(void)
+ {
+ int cpu;
+@@ -989,7 +931,6 @@ static __init int svm_hardware_setup(void)
+ } else
+ kvm_disable_tdp();
+
+- build_mtrr2protval();
+ return 0;
+
+ err:
+@@ -1144,39 +1085,6 @@ static u64 svm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+ return target_tsc - tsc;
+ }
+
+-static void svm_set_guest_pat(struct vcpu_svm *svm, u64 *g_pat)
+-{
+- struct kvm_vcpu *vcpu = &svm->vcpu;
+-
+- /* Unlike Intel, AMD takes the guest's CR0.CD into account.
+- *
+- * AMD does not have IPAT. To emulate it for the case of guests
+- * with no assigned devices, just set everything to WB. If guests
+- * have assigned devices, however, we cannot force WB for RAM
+- * pages only, so use the guest PAT directly.
+- */
+- if (!kvm_arch_has_assigned_device(vcpu->kvm))
+- *g_pat = 0x0606060606060606;
+- else
+- *g_pat = vcpu->arch.pat;
+-}
+-
+-static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+-{
+- u8 mtrr;
+-
+- /*
+- * 1. MMIO: trust guest MTRR, so same as item 3.
+- * 2. No passthrough: always map as WB, and force guest PAT to WB as well
+- * 3. Passthrough: can't guarantee the result, try to trust guest.
+- */
+- if (!is_mmio && !kvm_arch_has_assigned_device(vcpu->kvm))
+- return 0;
+-
+- mtrr = kvm_mtrr_get_guest_memory_type(vcpu, gfn);
+- return mtrr2protval[mtrr];
+-}
+-
+ static void init_vmcb(struct vcpu_svm *svm, bool init_event)
+ {
+ struct vmcb_control_area *control = &svm->vmcb->control;
+@@ -1260,6 +1168,7 @@ static void init_vmcb(struct vcpu_svm *svm, bool init_event)
+ * It also updates the guest-visible cr0 value.
+ */
+ (void)kvm_set_cr0(&svm->vcpu, X86_CR0_NW | X86_CR0_CD | X86_CR0_ET);
++ kvm_mmu_reset_context(&svm->vcpu);
+
+ save->cr4 = X86_CR4_PAE;
+ /* rdx = ?? */
+@@ -1272,7 +1181,6 @@ static void init_vmcb(struct vcpu_svm *svm, bool init_event)
+ clr_cr_intercept(svm, INTERCEPT_CR3_READ);
+ clr_cr_intercept(svm, INTERCEPT_CR3_WRITE);
+ save->g_pat = svm->vcpu.arch.pat;
+- svm_set_guest_pat(svm, &save->g_pat);
+ save->cr3 = 0;
+ save->cr4 = 0;
+ }
+@@ -3347,16 +3255,6 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
+ case MSR_VM_IGNNE:
+ vcpu_unimpl(vcpu, "unimplemented wrmsr: 0x%x data 0x%llx\n", ecx, data);
+ break;
+- case MSR_IA32_CR_PAT:
+- if (npt_enabled) {
+- if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data))
+- return 1;
+- vcpu->arch.pat = data;
+- svm_set_guest_pat(svm, &svm->vmcb->save.g_pat);
+- mark_dirty(svm->vmcb, VMCB_NPT);
+- break;
+- }
+- /* fall through */
+ default:
+ return kvm_set_msr_common(vcpu, msr);
+ }
+@@ -4191,6 +4089,11 @@ static bool svm_has_high_real_mode_segbase(void)
+ return true;
+ }
+
++static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
++{
++ return 0;
++}
++
+ static void svm_cpuid_update(struct kvm_vcpu *vcpu)
+ {
+ }
+diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
+index 83b7b5cd75d5..aa9e8229571d 100644
+--- a/arch/x86/kvm/vmx.c
++++ b/arch/x86/kvm/vmx.c
+@@ -6134,6 +6134,8 @@ static __init int hardware_setup(void)
+ memcpy(vmx_msr_bitmap_longmode_x2apic,
+ vmx_msr_bitmap_longmode, PAGE_SIZE);
+
++ set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
++
+ if (enable_apicv) {
+ for (msr = 0x800; msr <= 0x8ff; msr++)
+ vmx_disable_intercept_msr_read_x2apic(msr);
+@@ -8632,17 +8634,22 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
+ u64 ipat = 0;
+
+ /* For VT-d and EPT combination
+- * 1. MMIO: guest may want to apply WC, trust it.
++ * 1. MMIO: always map as UC
+ * 2. EPT with VT-d:
+ * a. VT-d without snooping control feature: can't guarantee the
+- * result, try to trust guest. So the same as item 1.
++ * result, try to trust guest.
+ * b. VT-d with snooping control feature: snooping control feature of
+ * VT-d engine can guarantee the cache correctness. Just set it
+ * to WB to keep consistent with host. So the same as item 3.
+ * 3. EPT without VT-d: always map as WB and set IPAT=1 to keep
+ * consistent with host MTRR
+ */
+- if (!is_mmio && !kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
++ if (is_mmio) {
++ cache = MTRR_TYPE_UNCACHABLE;
++ goto exit;
++ }
++
++ if (!kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
+ ipat = VMX_EPT_IPAT_BIT;
+ cache = MTRR_TYPE_WRBACK;
+ goto exit;
+diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
+index 8f0f6eca69da..32c6e6ac5964 100644
+--- a/arch/x86/kvm/x86.c
++++ b/arch/x86/kvm/x86.c
+@@ -2388,6 +2388,8 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+ case MSR_IA32_LASTINTFROMIP:
+ case MSR_IA32_LASTINTTOIP:
+ case MSR_K8_SYSCFG:
++ case MSR_K8_TSEG_ADDR:
++ case MSR_K8_TSEG_MASK:
+ case MSR_K7_HWCR:
+ case MSR_VM_HSAVE_PA:
+ case MSR_K8_INT_PENDING_MSG:
+diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
+index 3fba623e3ba5..f9977a7a9444 100644
+--- a/arch/x86/mm/init_64.c
++++ b/arch/x86/mm/init_64.c
+@@ -1132,7 +1132,7 @@ void mark_rodata_ro(void)
+ * has been zapped already via cleanup_highmem().
+ */
+ all_end = roundup((unsigned long)_brk_end, PMD_SIZE);
+- set_memory_nx(rodata_start, (all_end - rodata_start) >> PAGE_SHIFT);
++ set_memory_nx(text_end, (all_end - text_end) >> PAGE_SHIFT);
+
+ rodata_test();
+
+diff --git a/arch/x86/pci/intel_mid_pci.c b/arch/x86/pci/intel_mid_pci.c
+index 27062303c881..7553921c146c 100644
+--- a/arch/x86/pci/intel_mid_pci.c
++++ b/arch/x86/pci/intel_mid_pci.c
+@@ -35,6 +35,9 @@
+
+ #define PCIE_CAP_OFFSET 0x100
+
++/* Quirks for the listed devices */
++#define PCI_DEVICE_ID_INTEL_MRFL_MMC 0x1190
++
+ /* Fixed BAR fields */
+ #define PCIE_VNDR_CAP_ID_FIXED_BAR 0x00 /* Fixed BAR (TBD) */
+ #define PCI_FIXED_BAR_0_SIZE 0x04
+@@ -214,10 +217,27 @@ static int intel_mid_pci_irq_enable(struct pci_dev *dev)
+ if (dev->irq_managed && dev->irq > 0)
+ return 0;
+
+- if (intel_mid_identify_cpu() == INTEL_MID_CPU_CHIP_TANGIER)
++ switch (intel_mid_identify_cpu()) {
++ case INTEL_MID_CPU_CHIP_TANGIER:
+ polarity = 0; /* active high */
+- else
++
++ /* Special treatment for IRQ0 */
++ if (dev->irq == 0) {
++ /*
++ * TNG has IRQ0 assigned to eMMC controller. But there
++ * are also other devices with bogus PCI configuration
++ * that have IRQ0 assigned. This check ensures that
++ * eMMC gets it.
++ */
++ if (dev->device != PCI_DEVICE_ID_INTEL_MRFL_MMC)
++ return -EBUSY;
++ }
++ break;
++ default:
+ polarity = 1; /* active low */
++ break;
++ }
++
+ ioapic_set_alloc_attr(&info, dev_to_node(&dev->dev), 1, polarity);
+
+ /*
+diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
+index e4308fe6afe8..c6835bfad3a1 100644
+--- a/arch/x86/platform/efi/efi.c
++++ b/arch/x86/platform/efi/efi.c
+@@ -705,6 +705,70 @@ out:
+ }
+
+ /*
++ * Iterate the EFI memory map in reverse order because the regions
++ * will be mapped top-down. The end result is the same as if we had
++ * mapped things forward, but doesn't require us to change the
++ * existing implementation of efi_map_region().
++ */
++static inline void *efi_map_next_entry_reverse(void *entry)
++{
++ /* Initial call */
++ if (!entry)
++ return memmap.map_end - memmap.desc_size;
++
++ entry -= memmap.desc_size;
++ if (entry < memmap.map)
++ return NULL;
++
++ return entry;
++}
++
++/*
++ * efi_map_next_entry - Return the next EFI memory map descriptor
++ * @entry: Previous EFI memory map descriptor
++ *
++ * This is a helper function to iterate over the EFI memory map, which
++ * we do in different orders depending on the current configuration.
++ *
++ * To begin traversing the memory map @entry must be %NULL.
++ *
++ * Returns %NULL when we reach the end of the memory map.
++ */
++static void *efi_map_next_entry(void *entry)
++{
++ if (!efi_enabled(EFI_OLD_MEMMAP) && efi_enabled(EFI_64BIT)) {
++ /*
++ * Starting in UEFI v2.5 the EFI_PROPERTIES_TABLE
++ * config table feature requires us to map all entries
++ * in the same order as they appear in the EFI memory
++ * map. That is to say, entry N must have a lower
++ * virtual address than entry N+1. This is because the
++ * firmware toolchain leaves relative references in
++ * the code/data sections, which are split and become
++ * separate EFI memory regions. Mapping things
++ * out-of-order leads to the firmware accessing
++ * unmapped addresses.
++ *
++ * Since we need to map things this way whether or not
++ * the kernel actually makes use of
++ * EFI_PROPERTIES_TABLE, let's just switch to this
++ * scheme by default for 64-bit.
++ */
++ return efi_map_next_entry_reverse(entry);
++ }
++
++ /* Initial call */
++ if (!entry)
++ return memmap.map;
++
++ entry += memmap.desc_size;
++ if (entry >= memmap.map_end)
++ return NULL;
++
++ return entry;
++}
++
++/*
+ * Map the efi memory ranges of the runtime services and update new_mmap with
+ * virtual addresses.
+ */
+@@ -714,7 +778,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
+ unsigned long left = 0;
+ efi_memory_desc_t *md;
+
+- for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
++ p = NULL;
++ while ((p = efi_map_next_entry(p))) {
+ md = p;
+ if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
+ #ifdef CONFIG_X86_64
+diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
+index 11d6fb4e8483..777ad2f03160 100644
+--- a/arch/x86/xen/enlighten.c
++++ b/arch/x86/xen/enlighten.c
+@@ -33,6 +33,10 @@
+ #include <linux/memblock.h>
+ #include <linux/edd.h>
+
++#ifdef CONFIG_KEXEC_CORE
++#include <linux/kexec.h>
++#endif
++
+ #include <xen/xen.h>
+ #include <xen/events.h>
+ #include <xen/interface/xen.h>
+@@ -1800,6 +1804,21 @@ static struct notifier_block xen_hvm_cpu_notifier = {
+ .notifier_call = xen_hvm_cpu_notify,
+ };
+
++#ifdef CONFIG_KEXEC_CORE
++static void xen_hvm_shutdown(void)
++{
++ native_machine_shutdown();
++ if (kexec_in_progress)
++ xen_reboot(SHUTDOWN_soft_reset);
++}
++
++static void xen_hvm_crash_shutdown(struct pt_regs *regs)
++{
++ native_machine_crash_shutdown(regs);
++ xen_reboot(SHUTDOWN_soft_reset);
++}
++#endif
++
+ static void __init xen_hvm_guest_init(void)
+ {
+ if (xen_pv_domain())
+@@ -1819,6 +1838,10 @@ static void __init xen_hvm_guest_init(void)
+ x86_init.irqs.intr_init = xen_init_IRQ;
+ xen_hvm_init_time_ops();
+ xen_hvm_init_mmu_ops();
++#ifdef CONFIG_KEXEC_CORE
++ machine_ops.shutdown = xen_hvm_shutdown;
++ machine_ops.crash_shutdown = xen_hvm_crash_shutdown;
++#endif
+ }
+ #endif
+
+diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
+index d6283b3f5db5..9cc48d1d7abb 100644
+--- a/block/blk-cgroup.c
++++ b/block/blk-cgroup.c
+@@ -387,6 +387,9 @@ static void blkg_destroy_all(struct request_queue *q)
+ blkg_destroy(blkg);
+ spin_unlock(&blkcg->lock);
+ }
++
++ q->root_blkg = NULL;
++ q->root_rl.blkg = NULL;
+ }
+
+ /*
+diff --git a/block/blk-mq.c b/block/blk-mq.c
+index 176262ec3731..c69902695136 100644
+--- a/block/blk-mq.c
++++ b/block/blk-mq.c
+@@ -1807,7 +1807,6 @@ static void blk_mq_map_swqueue(struct request_queue *q)
+
+ hctx = q->mq_ops->map_queue(q, i);
+ cpumask_set_cpu(i, hctx->cpumask);
+- cpumask_set_cpu(i, hctx->tags->cpumask);
+ ctx->index_hw = hctx->nr_ctx;
+ hctx->ctxs[hctx->nr_ctx++] = ctx;
+ }
+@@ -1847,6 +1846,14 @@ static void blk_mq_map_swqueue(struct request_queue *q)
+ hctx->next_cpu = cpumask_first(hctx->cpumask);
+ hctx->next_cpu_batch = BLK_MQ_CPU_WORK_BATCH;
+ }
++
++ queue_for_each_ctx(q, ctx, i) {
++ if (!cpu_online(i))
++ continue;
++
++ hctx = q->mq_ops->map_queue(q, i);
++ cpumask_set_cpu(i, hctx->tags->cpumask);
++ }
+ }
+
+ static void blk_mq_update_tag_set_depth(struct blk_mq_tag_set *set)
+diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c
+index 764280a91776..e9fd32e91668 100644
+--- a/drivers/base/cacheinfo.c
++++ b/drivers/base/cacheinfo.c
+@@ -148,7 +148,11 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
+
+ if (sibling == cpu) /* skip itself */
+ continue;
++
+ sib_cpu_ci = get_cpu_cacheinfo(sibling);
++ if (!sib_cpu_ci->info_list)
++ continue;
++
+ sib_leaf = sib_cpu_ci->info_list + index;
+ cpumask_clear_cpu(cpu, &sib_leaf->shared_cpu_map);
+ cpumask_clear_cpu(sibling, &this_leaf->shared_cpu_map);
+@@ -159,6 +163,9 @@ static void cache_shared_cpu_map_remove(unsigned int cpu)
+
+ static void free_cache_attributes(unsigned int cpu)
+ {
++ if (!per_cpu_cacheinfo(cpu))
++ return;
++
+ cache_shared_cpu_map_remove(cpu);
+
+ kfree(per_cpu_cacheinfo(cpu));
+@@ -514,8 +521,7 @@ static int cacheinfo_cpu_callback(struct notifier_block *nfb,
+ break;
+ case CPU_DEAD:
+ cache_remove_dev(cpu);
+- if (per_cpu_cacheinfo(cpu))
+- free_cache_attributes(cpu);
++ free_cache_attributes(cpu);
+ break;
+ }
+ return notifier_from_errno(rc);
+diff --git a/drivers/base/property.c b/drivers/base/property.c
+index f3f6d167f3f1..37a7bb7b239d 100644
+--- a/drivers/base/property.c
++++ b/drivers/base/property.c
+@@ -27,9 +27,10 @@
+ */
+ void device_add_property_set(struct device *dev, struct property_set *pset)
+ {
+- if (pset)
+- pset->fwnode.type = FWNODE_PDATA;
++ if (!pset)
++ return;
+
++ pset->fwnode.type = FWNODE_PDATA;
+ set_secondary_fwnode(dev, &pset->fwnode);
+ }
+ EXPORT_SYMBOL_GPL(device_add_property_set);
+diff --git a/drivers/base/regmap/regmap-debugfs.c b/drivers/base/regmap/regmap-debugfs.c
+index 5799a0b9e6cc..c8941f39c919 100644
+--- a/drivers/base/regmap/regmap-debugfs.c
++++ b/drivers/base/regmap/regmap-debugfs.c
+@@ -32,8 +32,7 @@ static DEFINE_MUTEX(regmap_debugfs_early_lock);
+ /* Calculate the length of a fixed format */
+ static size_t regmap_calc_reg_len(int max_val, char *buf, size_t buf_size)
+ {
+- snprintf(buf, buf_size, "%x", max_val);
+- return strlen(buf);
++ return snprintf(NULL, 0, "%x", max_val);
+ }
+
+ static ssize_t regmap_name_read_file(struct file *file,
+@@ -432,7 +431,7 @@ static ssize_t regmap_access_read_file(struct file *file,
+ /* If we're in the region the user is trying to read */
+ if (p >= *ppos) {
+ /* ...but not beyond it */
+- if (buf_pos >= count - 1 - tot_len)
++ if (buf_pos + tot_len + 1 >= count)
+ break;
+
+ /* Format the register */
+diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
+index deb3f001791f..767657565de6 100644
+--- a/drivers/block/xen-blkback/xenbus.c
++++ b/drivers/block/xen-blkback/xenbus.c
+@@ -212,6 +212,9 @@ static int xen_blkif_map(struct xen_blkif *blkif, grant_ref_t *gref,
+
+ static int xen_blkif_disconnect(struct xen_blkif *blkif)
+ {
++ struct pending_req *req, *n;
++ int i = 0, j;
++
+ if (blkif->xenblkd) {
+ kthread_stop(blkif->xenblkd);
+ wake_up(&blkif->shutdown_wq);
+@@ -238,13 +241,28 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
+ /* Remove all persistent grants and the cache of ballooned pages. */
+ xen_blkbk_free_caches(blkif);
+
++ /* Check that there is no request in use */
++ list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
++ list_del(&req->free_list);
++
++ for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
++ kfree(req->segments[j]);
++
++ for (j = 0; j < MAX_INDIRECT_PAGES; j++)
++ kfree(req->indirect_pages[j]);
++
++ kfree(req);
++ i++;
++ }
++
++ WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
++ blkif->nr_ring_pages = 0;
++
+ return 0;
+ }
+
+ static void xen_blkif_free(struct xen_blkif *blkif)
+ {
+- struct pending_req *req, *n;
+- int i = 0, j;
+
+ xen_blkif_disconnect(blkif);
+ xen_vbd_free(&blkif->vbd);
+@@ -257,22 +275,6 @@ static void xen_blkif_free(struct xen_blkif *blkif)
+ BUG_ON(!list_empty(&blkif->free_pages));
+ BUG_ON(!RB_EMPTY_ROOT(&blkif->persistent_gnts));
+
+- /* Check that there is no request in use */
+- list_for_each_entry_safe(req, n, &blkif->pending_free, free_list) {
+- list_del(&req->free_list);
+-
+- for (j = 0; j < MAX_INDIRECT_SEGMENTS; j++)
+- kfree(req->segments[j]);
+-
+- for (j = 0; j < MAX_INDIRECT_PAGES; j++)
+- kfree(req->indirect_pages[j]);
+-
+- kfree(req);
+- i++;
+- }
+-
+- WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
+-
+ kmem_cache_free(xen_blkif_cachep, blkif);
+ }
+
+diff --git a/drivers/clk/samsung/clk-cpu.c b/drivers/clk/samsung/clk-cpu.c
+index 3a1fe07cfe9e..dd02356e2e86 100644
+--- a/drivers/clk/samsung/clk-cpu.c
++++ b/drivers/clk/samsung/clk-cpu.c
+@@ -161,7 +161,7 @@ static int exynos_cpuclk_pre_rate_change(struct clk_notifier_data *ndata,
+ * the values for DIV_COPY and DIV_HPM dividers need not be set.
+ */
+ div0 = cfg_data->div0;
+- if (test_bit(CLK_CPU_HAS_DIV1, &cpuclk->flags)) {
++ if (cpuclk->flags & CLK_CPU_HAS_DIV1) {
+ div1 = cfg_data->div1;
+ if (readl(base + E4210_SRC_CPU) & E4210_MUX_HPM_MASK)
+ div1 = readl(base + E4210_DIV_CPU1) &
+@@ -182,7 +182,7 @@ static int exynos_cpuclk_pre_rate_change(struct clk_notifier_data *ndata,
+ alt_div = DIV_ROUND_UP(alt_prate, tmp_rate) - 1;
+ WARN_ON(alt_div >= MAX_DIV);
+
+- if (test_bit(CLK_CPU_NEEDS_DEBUG_ALT_DIV, &cpuclk->flags)) {
++ if (cpuclk->flags & CLK_CPU_NEEDS_DEBUG_ALT_DIV) {
+ /*
+ * In Exynos4210, ATB clock parent is also mout_core. So
+ * ATB clock also needs to be mantained at safe speed.
+@@ -203,7 +203,7 @@ static int exynos_cpuclk_pre_rate_change(struct clk_notifier_data *ndata,
+ writel(div0, base + E4210_DIV_CPU0);
+ wait_until_divider_stable(base + E4210_DIV_STAT_CPU0, DIV_MASK_ALL);
+
+- if (test_bit(CLK_CPU_HAS_DIV1, &cpuclk->flags)) {
++ if (cpuclk->flags & CLK_CPU_HAS_DIV1) {
+ writel(div1, base + E4210_DIV_CPU1);
+ wait_until_divider_stable(base + E4210_DIV_STAT_CPU1,
+ DIV_MASK_ALL);
+@@ -222,7 +222,7 @@ static int exynos_cpuclk_post_rate_change(struct clk_notifier_data *ndata,
+ unsigned long mux_reg;
+
+ /* find out the divider values to use for clock data */
+- if (test_bit(CLK_CPU_NEEDS_DEBUG_ALT_DIV, &cpuclk->flags)) {
++ if (cpuclk->flags & CLK_CPU_NEEDS_DEBUG_ALT_DIV) {
+ while ((cfg_data->prate * 1000) != ndata->new_rate) {
+ if (cfg_data->prate == 0)
+ return -EINVAL;
+@@ -237,7 +237,7 @@ static int exynos_cpuclk_post_rate_change(struct clk_notifier_data *ndata,
+ writel(mux_reg & ~(1 << 16), base + E4210_SRC_CPU);
+ wait_until_mux_stable(base + E4210_STAT_CPU, 16, 1);
+
+- if (test_bit(CLK_CPU_NEEDS_DEBUG_ALT_DIV, &cpuclk->flags)) {
++ if (cpuclk->flags & CLK_CPU_NEEDS_DEBUG_ALT_DIV) {
+ div |= (cfg_data->div0 & E4210_DIV0_ATB_MASK);
+ div_mask |= E4210_DIV0_ATB_MASK;
+ }
+diff --git a/drivers/clk/ti/clk-3xxx.c b/drivers/clk/ti/clk-3xxx.c
+index 757636d166cf..4ab28cfb8d2a 100644
+--- a/drivers/clk/ti/clk-3xxx.c
++++ b/drivers/clk/ti/clk-3xxx.c
+@@ -163,7 +163,6 @@ static struct ti_dt_clk omap3xxx_clks[] = {
+ DT_CLK(NULL, "gpio2_ick", "gpio2_ick"),
+ DT_CLK(NULL, "wdt3_ick", "wdt3_ick"),
+ DT_CLK(NULL, "uart3_ick", "uart3_ick"),
+- DT_CLK(NULL, "uart4_ick", "uart4_ick"),
+ DT_CLK(NULL, "gpt9_ick", "gpt9_ick"),
+ DT_CLK(NULL, "gpt8_ick", "gpt8_ick"),
+ DT_CLK(NULL, "gpt7_ick", "gpt7_ick"),
+@@ -308,6 +307,7 @@ static struct ti_dt_clk am35xx_clks[] = {
+ static struct ti_dt_clk omap36xx_clks[] = {
+ DT_CLK(NULL, "omap_192m_alwon_fck", "omap_192m_alwon_fck"),
+ DT_CLK(NULL, "uart4_fck", "uart4_fck"),
++ DT_CLK(NULL, "uart4_ick", "uart4_ick"),
+ { .node_name = NULL },
+ };
+
+diff --git a/drivers/clk/ti/clk-7xx.c b/drivers/clk/ti/clk-7xx.c
+index 63b8323df918..0eb82107c421 100644
+--- a/drivers/clk/ti/clk-7xx.c
++++ b/drivers/clk/ti/clk-7xx.c
+@@ -16,7 +16,6 @@
+ #include <linux/clkdev.h>
+ #include <linux/clk/ti.h>
+
+-#define DRA7_DPLL_ABE_DEFFREQ 180633600
+ #define DRA7_DPLL_GMAC_DEFFREQ 1000000000
+ #define DRA7_DPLL_USB_DEFFREQ 960000000
+
+@@ -312,27 +311,12 @@ static struct ti_dt_clk dra7xx_clks[] = {
+ int __init dra7xx_dt_clk_init(void)
+ {
+ int rc;
+- struct clk *abe_dpll_mux, *sys_clkin2, *dpll_ck, *hdcp_ck;
++ struct clk *dpll_ck, *hdcp_ck;
+
+ ti_dt_clocks_register(dra7xx_clks);
+
+ omap2_clk_disable_autoidle_all();
+
+- abe_dpll_mux = clk_get_sys(NULL, "abe_dpll_sys_clk_mux");
+- sys_clkin2 = clk_get_sys(NULL, "sys_clkin2");
+- dpll_ck = clk_get_sys(NULL, "dpll_abe_ck");
+-
+- rc = clk_set_parent(abe_dpll_mux, sys_clkin2);
+- if (!rc)
+- rc = clk_set_rate(dpll_ck, DRA7_DPLL_ABE_DEFFREQ);
+- if (rc)
+- pr_err("%s: failed to configure ABE DPLL!\n", __func__);
+-
+- dpll_ck = clk_get_sys(NULL, "dpll_abe_m2x2_ck");
+- rc = clk_set_rate(dpll_ck, DRA7_DPLL_ABE_DEFFREQ * 2);
+- if (rc)
+- pr_err("%s: failed to configure ABE DPLL m2x2!\n", __func__);
+-
+ dpll_ck = clk_get_sys(NULL, "dpll_gmac_ck");
+ rc = clk_set_rate(dpll_ck, DRA7_DPLL_GMAC_DEFFREQ);
+ if (rc)
+diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
+index 0136dfcdabf0..7c2a7385c2ad 100644
+--- a/drivers/cpufreq/acpi-cpufreq.c
++++ b/drivers/cpufreq/acpi-cpufreq.c
+@@ -146,6 +146,9 @@ static ssize_t show_freqdomain_cpus(struct cpufreq_policy *policy, char *buf)
+ {
+ struct acpi_cpufreq_data *data = per_cpu(acfreq_data, policy->cpu);
+
++ if (unlikely(!data))
++ return -ENODEV;
++
+ return cpufreq_show_cpus(data->freqdomain_cpus, buf);
+ }
+
+diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
+index 528a82bf5038..99a406501e8c 100644
+--- a/drivers/cpufreq/cpufreq-dt.c
++++ b/drivers/cpufreq/cpufreq-dt.c
+@@ -255,7 +255,8 @@ static int cpufreq_init(struct cpufreq_policy *policy)
+ rcu_read_unlock();
+
+ tol_uV = opp_uV * priv->voltage_tolerance / 100;
+- if (regulator_is_supported_voltage(cpu_reg, opp_uV,
++ if (regulator_is_supported_voltage(cpu_reg,
++ opp_uV - tol_uV,
+ opp_uV + tol_uV)) {
+ if (opp_uV < min_uV)
+ min_uV = opp_uV;
+diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
+index b60698b30d30..bc2a55bc35e4 100644
+--- a/drivers/crypto/marvell/cesa.h
++++ b/drivers/crypto/marvell/cesa.h
+@@ -687,6 +687,33 @@ static inline u32 mv_cesa_get_int_mask(struct mv_cesa_engine *engine)
+
+ int mv_cesa_queue_req(struct crypto_async_request *req);
+
++/*
++ * Helper function that indicates whether a crypto request needs to be
++ * cleaned up or not after being enqueued using mv_cesa_queue_req().
++ */
++static inline int mv_cesa_req_needs_cleanup(struct crypto_async_request *req,
++ int ret)
++{
++ /*
++ * The queue still had some space, the request was queued
++ * normally, so there's no need to clean it up.
++ */
++ if (ret == -EINPROGRESS)
++ return false;
++
++ /*
++ * The queue had no space left, but since the request is
++ * flagged with CRYPTO_TFM_REQ_MAY_BACKLOG, it was added to
++ * the backlog and will be processed later. There's no need to
++ * clean it up.
++ */
++ if (ret == -EBUSY && req->flags & CRYPTO_TFM_REQ_MAY_BACKLOG)
++ return false;
++
++ /* Request wasn't queued, we need to clean it up */
++ return true;
++}
++
+ /* TDMA functions */
+
+ static inline void mv_cesa_req_dma_iter_init(struct mv_cesa_dma_iter *iter,
+diff --git a/drivers/crypto/marvell/cipher.c b/drivers/crypto/marvell/cipher.c
+index 0745cf3b9c0e..3df2f4e7adb2 100644
+--- a/drivers/crypto/marvell/cipher.c
++++ b/drivers/crypto/marvell/cipher.c
+@@ -189,7 +189,6 @@ static inline void mv_cesa_ablkcipher_prepare(struct crypto_async_request *req,
+ {
+ struct ablkcipher_request *ablkreq = ablkcipher_request_cast(req);
+ struct mv_cesa_ablkcipher_req *creq = ablkcipher_request_ctx(ablkreq);
+-
+ creq->req.base.engine = engine;
+
+ if (creq->req.base.type == CESA_DMA_REQ)
+@@ -431,7 +430,7 @@ static int mv_cesa_des_op(struct ablkcipher_request *req,
+ return ret;
+
+ ret = mv_cesa_queue_req(&req->base);
+- if (ret && ret != -EINPROGRESS)
++ if (mv_cesa_req_needs_cleanup(&req->base, ret))
+ mv_cesa_ablkcipher_cleanup(req);
+
+ return ret;
+@@ -551,7 +550,7 @@ static int mv_cesa_des3_op(struct ablkcipher_request *req,
+ return ret;
+
+ ret = mv_cesa_queue_req(&req->base);
+- if (ret && ret != -EINPROGRESS)
++ if (mv_cesa_req_needs_cleanup(&req->base, ret))
+ mv_cesa_ablkcipher_cleanup(req);
+
+ return ret;
+@@ -693,7 +692,7 @@ static int mv_cesa_aes_op(struct ablkcipher_request *req,
+ return ret;
+
+ ret = mv_cesa_queue_req(&req->base);
+- if (ret && ret != -EINPROGRESS)
++ if (mv_cesa_req_needs_cleanup(&req->base, ret))
+ mv_cesa_ablkcipher_cleanup(req);
+
+ return ret;
+diff --git a/drivers/crypto/marvell/hash.c b/drivers/crypto/marvell/hash.c
+index ae9272eb9c1a..e8d0d7128137 100644
+--- a/drivers/crypto/marvell/hash.c
++++ b/drivers/crypto/marvell/hash.c
+@@ -739,10 +739,8 @@ static int mv_cesa_ahash_update(struct ahash_request *req)
+ return 0;
+
+ ret = mv_cesa_queue_req(&req->base);
+- if (ret && ret != -EINPROGRESS) {
++ if (mv_cesa_req_needs_cleanup(&req->base, ret))
+ mv_cesa_ahash_cleanup(req);
+- return ret;
+- }
+
+ return ret;
+ }
+@@ -766,7 +764,7 @@ static int mv_cesa_ahash_final(struct ahash_request *req)
+ return 0;
+
+ ret = mv_cesa_queue_req(&req->base);
+- if (ret && ret != -EINPROGRESS)
++ if (mv_cesa_req_needs_cleanup(&req->base, ret))
+ mv_cesa_ahash_cleanup(req);
+
+ return ret;
+@@ -791,7 +789,7 @@ static int mv_cesa_ahash_finup(struct ahash_request *req)
+ return 0;
+
+ ret = mv_cesa_queue_req(&req->base);
+- if (ret && ret != -EINPROGRESS)
++ if (mv_cesa_req_needs_cleanup(&req->base, ret))
+ mv_cesa_ahash_cleanup(req);
+
+ return ret;
+diff --git a/drivers/dma/at_xdmac.c b/drivers/dma/at_xdmac.c
+index 40afa2a16cfc..da7917a2eed2 100644
+--- a/drivers/dma/at_xdmac.c
++++ b/drivers/dma/at_xdmac.c
+@@ -455,6 +455,15 @@ static struct at_xdmac_desc *at_xdmac_alloc_desc(struct dma_chan *chan,
+ return desc;
+ }
+
++void at_xdmac_init_used_desc(struct at_xdmac_desc *desc)
++{
++ memset(&desc->lld, 0, sizeof(desc->lld));
++ INIT_LIST_HEAD(&desc->descs_list);
++ desc->direction = DMA_TRANS_NONE;
++ desc->xfer_size = 0;
++ desc->active_xfer = false;
++}
++
+ /* Call must be protected by lock. */
+ static struct at_xdmac_desc *at_xdmac_get_desc(struct at_xdmac_chan *atchan)
+ {
+@@ -466,7 +475,7 @@ static struct at_xdmac_desc *at_xdmac_get_desc(struct at_xdmac_chan *atchan)
+ desc = list_first_entry(&atchan->free_descs_list,
+ struct at_xdmac_desc, desc_node);
+ list_del(&desc->desc_node);
+- desc->active_xfer = false;
++ at_xdmac_init_used_desc(desc);
+ }
+
+ return desc;
+@@ -797,10 +806,7 @@ at_xdmac_prep_dma_cyclic(struct dma_chan *chan, dma_addr_t buf_addr,
+ list_add_tail(&desc->desc_node, &first->descs_list);
+ }
+
+- prev->lld.mbr_nda = first->tx_dma_desc.phys;
+- dev_dbg(chan2dev(chan),
+- "%s: chain lld: prev=0x%p, mbr_nda=%pad\n",
+- __func__, prev, &prev->lld.mbr_nda);
++ at_xdmac_queue_desc(chan, prev, first);
+ first->tx_dma_desc.flags = flags;
+ first->xfer_size = buf_len;
+ first->direction = direction;
+@@ -878,14 +884,14 @@ at_xdmac_interleaved_queue_desc(struct dma_chan *chan,
+
+ if (xt->src_inc) {
+ if (xt->src_sgl)
+- chan_cc |= AT_XDMAC_CC_SAM_UBS_DS_AM;
++ chan_cc |= AT_XDMAC_CC_SAM_UBS_AM;
+ else
+ chan_cc |= AT_XDMAC_CC_SAM_INCREMENTED_AM;
+ }
+
+ if (xt->dst_inc) {
+ if (xt->dst_sgl)
+- chan_cc |= AT_XDMAC_CC_DAM_UBS_DS_AM;
++ chan_cc |= AT_XDMAC_CC_DAM_UBS_AM;
+ else
+ chan_cc |= AT_XDMAC_CC_DAM_INCREMENTED_AM;
+ }
+diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
+index cf1c87fa1edd..bedce038c6e2 100644
+--- a/drivers/dma/dw/core.c
++++ b/drivers/dma/dw/core.c
+@@ -1591,7 +1591,6 @@ int dw_dma_probe(struct dw_dma_chip *chip, struct dw_dma_platform_data *pdata)
+ INIT_LIST_HEAD(&dw->dma.channels);
+ for (i = 0; i < nr_channels; i++) {
+ struct dw_dma_chan *dwc = &dw->chan[i];
+- int r = nr_channels - i - 1;
+
+ dwc->chan.device = &dw->dma;
+ dma_cookie_init(&dwc->chan);
+@@ -1603,7 +1602,7 @@ int dw_dma_probe(struct dw_dma_chip *chip, struct dw_dma_platform_data *pdata)
+
+ /* 7 is highest priority & 0 is lowest. */
+ if (pdata->chan_priority == CHAN_PRIORITY_ASCENDING)
+- dwc->priority = r;
++ dwc->priority = nr_channels - i - 1;
+ else
+ dwc->priority = i;
+
+@@ -1622,6 +1621,7 @@ int dw_dma_probe(struct dw_dma_chip *chip, struct dw_dma_platform_data *pdata)
+ /* Hardware configuration */
+ if (autocfg) {
+ unsigned int dwc_params;
++ unsigned int r = DW_DMA_MAX_NR_CHANNELS - i - 1;
+ void __iomem *addr = chip->regs + r * sizeof(u32);
+
+ dwc_params = dma_read_byaddr(addr, DWC_PARAMS);
+diff --git a/drivers/dma/pxa_dma.c b/drivers/dma/pxa_dma.c
+index ddcbbf5cd9e9..95bdbbe2a671 100644
+--- a/drivers/dma/pxa_dma.c
++++ b/drivers/dma/pxa_dma.c
+@@ -888,6 +888,7 @@ pxad_tx_prep(struct virt_dma_chan *vc, struct virt_dma_desc *vd,
+ struct dma_async_tx_descriptor *tx;
+ struct pxad_chan *chan = container_of(vc, struct pxad_chan, vc);
+
++ INIT_LIST_HEAD(&vd->node);
+ tx = vchan_tx_prep(vc, vd, tx_flags);
+ tx->tx_submit = pxad_tx_submit;
+ dev_dbg(&chan->vc.chan.dev->device,
+diff --git a/drivers/extcon/extcon.c b/drivers/extcon/extcon.c
+index 43b57b02d050..ca94f475fd05 100644
+--- a/drivers/extcon/extcon.c
++++ b/drivers/extcon/extcon.c
+@@ -126,7 +126,7 @@ static int find_cable_index_by_id(struct extcon_dev *edev, const unsigned int id
+
+ static int find_cable_id_by_name(struct extcon_dev *edev, const char *name)
+ {
+- unsigned int id = -EINVAL;
++ int id = -EINVAL;
+ int i = 0;
+
+ /* Find the id of extcon cable */
+@@ -143,7 +143,7 @@ static int find_cable_id_by_name(struct extcon_dev *edev, const char *name)
+
+ static int find_cable_index_by_name(struct extcon_dev *edev, const char *name)
+ {
+- unsigned int id;
++ int id;
+
+ if (edev->max_supported == 0)
+ return -EINVAL;
+@@ -159,7 +159,7 @@ static int find_cable_index_by_name(struct extcon_dev *edev, const char *name)
+ static bool is_extcon_changed(u32 prev, u32 new, int idx, bool *attached)
+ {
+ if (((prev >> idx) & 0x1) != ((new >> idx) & 0x1)) {
+- *attached = new ? true : false;
++ *attached = ((new >> idx) & 0x1) ? true : false;
+ return true;
+ }
+
+@@ -378,7 +378,7 @@ EXPORT_SYMBOL_GPL(extcon_get_cable_state_);
+ */
+ int extcon_get_cable_state(struct extcon_dev *edev, const char *cable_name)
+ {
+- unsigned int id;
++ int id;
+
+ id = find_cable_id_by_name(edev, cable_name);
+ if (id < 0)
+@@ -426,7 +426,7 @@ EXPORT_SYMBOL_GPL(extcon_set_cable_state_);
+ int extcon_set_cable_state(struct extcon_dev *edev,
+ const char *cable_name, bool cable_state)
+ {
+- unsigned int id;
++ int id;
+
+ id = find_cable_id_by_name(edev, cable_name);
+ if (id < 0)
+diff --git a/drivers/firmware/efi/libstub/arm-stub.c b/drivers/firmware/efi/libstub/arm-stub.c
+index e29560e6b40b..950c87f5d279 100644
+--- a/drivers/firmware/efi/libstub/arm-stub.c
++++ b/drivers/firmware/efi/libstub/arm-stub.c
+@@ -13,6 +13,7 @@
+ */
+
+ #include <linux/efi.h>
++#include <linux/sort.h>
+ #include <asm/efi.h>
+
+ #include "efistub.h"
+@@ -305,6 +306,44 @@ fail:
+ */
+ #define EFI_RT_VIRTUAL_BASE 0x40000000
+
++static int cmp_mem_desc(const void *l, const void *r)
++{
++ const efi_memory_desc_t *left = l, *right = r;
++
++ return (left->phys_addr > right->phys_addr) ? 1 : -1;
++}
++
++/*
++ * Returns whether region @left ends exactly where region @right starts,
++ * or false if either argument is NULL.
++ */
++static bool regions_are_adjacent(efi_memory_desc_t *left,
++ efi_memory_desc_t *right)
++{
++ u64 left_end;
++
++ if (left == NULL || right == NULL)
++ return false;
++
++ left_end = left->phys_addr + left->num_pages * EFI_PAGE_SIZE;
++
++ return left_end == right->phys_addr;
++}
++
++/*
++ * Returns whether region @left and region @right have compatible memory type
++ * mapping attributes, and are both EFI_MEMORY_RUNTIME regions.
++ */
++static bool regions_have_compatible_memory_type_attrs(efi_memory_desc_t *left,
++ efi_memory_desc_t *right)
++{
++ static const u64 mem_type_mask = EFI_MEMORY_WB | EFI_MEMORY_WT |
++ EFI_MEMORY_WC | EFI_MEMORY_UC |
++ EFI_MEMORY_RUNTIME;
++
++ return ((left->attribute ^ right->attribute) & mem_type_mask) == 0;
++}
++
+ /*
+ * efi_get_virtmap() - create a virtual mapping for the EFI memory map
+ *
+@@ -317,33 +356,52 @@ void efi_get_virtmap(efi_memory_desc_t *memory_map, unsigned long map_size,
+ int *count)
+ {
+ u64 efi_virt_base = EFI_RT_VIRTUAL_BASE;
+- efi_memory_desc_t *out = runtime_map;
++ efi_memory_desc_t *in, *prev = NULL, *out = runtime_map;
+ int l;
+
+- for (l = 0; l < map_size; l += desc_size) {
+- efi_memory_desc_t *in = (void *)memory_map + l;
++ /*
++ * To work around potential issues with the Properties Table feature
++ * introduced in UEFI 2.5, which may split PE/COFF executable images
++ * in memory into several RuntimeServicesCode and RuntimeServicesData
++ * regions, we need to preserve the relative offsets between adjacent
++ * EFI_MEMORY_RUNTIME regions with the same memory type attributes.
++ * The easiest way to find adjacent regions is to sort the memory map
++ * before traversing it.
++ */
++ sort(memory_map, map_size / desc_size, desc_size, cmp_mem_desc, NULL);
++
++ for (l = 0; l < map_size; l += desc_size, prev = in) {
+ u64 paddr, size;
+
++ in = (void *)memory_map + l;
+ if (!(in->attribute & EFI_MEMORY_RUNTIME))
+ continue;
+
++ paddr = in->phys_addr;
++ size = in->num_pages * EFI_PAGE_SIZE;
++
+ /*
+ * Make the mapping compatible with 64k pages: this allows
+ * a 4k page size kernel to kexec a 64k page size kernel and
+ * vice versa.
+ */
+- paddr = round_down(in->phys_addr, SZ_64K);
+- size = round_up(in->num_pages * EFI_PAGE_SIZE +
+- in->phys_addr - paddr, SZ_64K);
+-
+- /*
+- * Avoid wasting memory on PTEs by choosing a virtual base that
+- * is compatible with section mappings if this region has the
+- * appropriate size and physical alignment. (Sections are 2 MB
+- * on 4k granule kernels)
+- */
+- if (IS_ALIGNED(in->phys_addr, SZ_2M) && size >= SZ_2M)
+- efi_virt_base = round_up(efi_virt_base, SZ_2M);
++ if (!regions_are_adjacent(prev, in) ||
++ !regions_have_compatible_memory_type_attrs(prev, in)) {
++
++ paddr = round_down(in->phys_addr, SZ_64K);
++ size += in->phys_addr - paddr;
++
++ /*
++ * Avoid wasting memory on PTEs by choosing a virtual
++ * base that is compatible with section mappings if this
++ * region has the appropriate size and physical
++ * alignment. (Sections are 2 MB on 4k granule kernels)
++ */
++ if (IS_ALIGNED(in->phys_addr, SZ_2M) && size >= SZ_2M)
++ efi_virt_base = round_up(efi_virt_base, SZ_2M);
++ else
++ efi_virt_base = round_up(efi_virt_base, SZ_64K);
++ }
+
+ in->virt_addr = efi_virt_base + in->phys_addr - paddr;
+ efi_virt_base += size;
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+index b4d36f0f2153..c098d762089c 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+@@ -140,7 +140,7 @@ void amdgpu_irq_preinstall(struct drm_device *dev)
+ */
+ int amdgpu_irq_postinstall(struct drm_device *dev)
+ {
+- dev->max_vblank_count = 0x001fffff;
++ dev->max_vblank_count = 0x00ffffff;
+ return 0;
+ }
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+index 2abc661845b6..ddcfbf3b188b 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+@@ -543,46 +543,60 @@ static int amdgpu_uvd_cs_msg(struct amdgpu_uvd_cs_ctx *ctx,
+ return -EINVAL;
+ }
+
+- if (msg_type == 1) {
++ switch (msg_type) {
++ case 0:
++ /* it's a create msg, calc image size (width * height) */
++ amdgpu_bo_kunmap(bo);
++
++ /* try to alloc a new handle */
++ for (i = 0; i < AMDGPU_MAX_UVD_HANDLES; ++i) {
++ if (atomic_read(&adev->uvd.handles[i]) == handle) {
++ DRM_ERROR("Handle 0x%x already in use!\n", handle);
++ return -EINVAL;
++ }
++
++ if (!atomic_cmpxchg(&adev->uvd.handles[i], 0, handle)) {
++ adev->uvd.filp[i] = ctx->parser->filp;
++ return 0;
++ }
++ }
++
++ DRM_ERROR("No more free UVD handles!\n");
++ return -EINVAL;
++
++ case 1:
+ /* it's a decode msg, calc buffer sizes */
+ r = amdgpu_uvd_cs_msg_decode(msg, ctx->buf_sizes);
+ amdgpu_bo_kunmap(bo);
+ if (r)
+ return r;
+
+- } else if (msg_type == 2) {
++ /* validate the handle */
++ for (i = 0; i < AMDGPU_MAX_UVD_HANDLES; ++i) {
++ if (atomic_read(&adev->uvd.handles[i]) == handle) {
++ if (adev->uvd.filp[i] != ctx->parser->filp) {
++ DRM_ERROR("UVD handle collision detected!\n");
++ return -EINVAL;
++ }
++ return 0;
++ }
++ }
++
++ DRM_ERROR("Invalid UVD handle 0x%x!\n", handle);
++ return -ENOENT;
++
++ case 2:
+ /* it's a destroy msg, free the handle */
+ for (i = 0; i < AMDGPU_MAX_UVD_HANDLES; ++i)
+ atomic_cmpxchg(&adev->uvd.handles[i], handle, 0);
+ amdgpu_bo_kunmap(bo);
+ return 0;
+- } else {
+- /* it's a create msg */
+- amdgpu_bo_kunmap(bo);
+-
+- if (msg_type != 0) {
+- DRM_ERROR("Illegal UVD message type (%d)!\n", msg_type);
+- return -EINVAL;
+- }
+-
+- /* it's a create msg, no special handling needed */
+- }
+-
+- /* create or decode, validate the handle */
+- for (i = 0; i < AMDGPU_MAX_UVD_HANDLES; ++i) {
+- if (atomic_read(&adev->uvd.handles[i]) == handle)
+- return 0;
+- }
+
+- /* handle not found try to alloc a new one */
+- for (i = 0; i < AMDGPU_MAX_UVD_HANDLES; ++i) {
+- if (!atomic_cmpxchg(&adev->uvd.handles[i], 0, handle)) {
+- adev->uvd.filp[i] = ctx->parser->filp;
+- return 0;
+- }
++ default:
++ DRM_ERROR("Illegal UVD message type (%d)!\n", msg_type);
++ return -EINVAL;
+ }
+-
+- DRM_ERROR("No more free UVD handles!\n");
++ BUG();
+ return -EINVAL;
+ }
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+index 9a4e3b63f1cb..b07402fc8ded 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+@@ -787,7 +787,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev,
+ int r;
+
+ if (mem) {
+- addr = mem->start << PAGE_SHIFT;
++ addr = (u64)mem->start << PAGE_SHIFT;
+ if (mem->mem_type != TTM_PL_TT)
+ addr += adev->vm_manager.vram_base_offset;
+ } else {
+diff --git a/drivers/gpu/drm/amd/amdgpu/atombios_encoders.c b/drivers/gpu/drm/amd/amdgpu/atombios_encoders.c
+index ae8caca61e04..e60557417049 100644
+--- a/drivers/gpu/drm/amd/amdgpu/atombios_encoders.c
++++ b/drivers/gpu/drm/amd/amdgpu/atombios_encoders.c
+@@ -1279,8 +1279,7 @@ amdgpu_atombios_encoder_setup_dig(struct drm_encoder *encoder, int action)
+ amdgpu_atombios_encoder_setup_dig_encoder(encoder, ATOM_ENCODER_CMD_DP_VIDEO_ON, 0);
+ }
+ if (amdgpu_encoder->devices & (ATOM_DEVICE_LCD_SUPPORT))
+- amdgpu_atombios_encoder_setup_dig_transmitter(encoder,
+- ATOM_TRANSMITTER_ACTION_LCD_BLON, 0, 0);
++ amdgpu_atombios_encoder_set_backlight_level(amdgpu_encoder, dig->backlight_level);
+ if (ext_encoder)
+ amdgpu_atombios_encoder_setup_external_encoder(encoder, ext_encoder, ATOM_ENABLE);
+ } else {
+diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
+index 4efd671d7a9b..9488ea6ea93f 100644
+--- a/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
++++ b/drivers/gpu/drm/amd/amdgpu/uvd_v4_2.c
+@@ -224,11 +224,11 @@ static int uvd_v4_2_suspend(void *handle)
+ int r;
+ struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+- r = uvd_v4_2_hw_fini(adev);
++ r = amdgpu_uvd_suspend(adev);
+ if (r)
+ return r;
+
+- r = amdgpu_uvd_suspend(adev);
++ r = uvd_v4_2_hw_fini(adev);
+ if (r)
+ return r;
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
+index b756bd99c0fd..d0ed998228ef 100644
+--- a/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/uvd_v5_0.c
+@@ -220,11 +220,11 @@ static int uvd_v5_0_suspend(void *handle)
+ int r;
+ struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+- r = uvd_v5_0_hw_fini(adev);
++ r = amdgpu_uvd_suspend(adev);
+ if (r)
+ return r;
+
+- r = amdgpu_uvd_suspend(adev);
++ r = uvd_v5_0_hw_fini(adev);
+ if (r)
+ return r;
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+index 49aa931b2cb4..345eb760fd5b 100644
+--- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
+@@ -214,11 +214,11 @@ static int uvd_v6_0_suspend(void *handle)
+ int r;
+ struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+- r = uvd_v6_0_hw_fini(adev);
++ r = amdgpu_uvd_suspend(adev);
+ if (r)
+ return r;
+
+- r = amdgpu_uvd_suspend(adev);
++ r = uvd_v6_0_hw_fini(adev);
+ if (r)
+ return r;
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
+index 68552da40287..4f58a1e18de6 100644
+--- a/drivers/gpu/drm/amd/amdgpu/vi.c
++++ b/drivers/gpu/drm/amd/amdgpu/vi.c
+@@ -1290,7 +1290,8 @@ static int vi_common_early_init(void *handle)
+ case CHIP_CARRIZO:
+ adev->has_uvd = true;
+ adev->cg_flags = 0;
+- adev->pg_flags = AMDGPU_PG_SUPPORT_UVD | AMDGPU_PG_SUPPORT_VCE;
++ /* Disable UVD pg */
++ adev->pg_flags = /* AMDGPU_PG_SUPPORT_UVD | */AMDGPU_PG_SUPPORT_VCE;
+ adev->external_rev_id = adev->rev_id + 0x1;
+ if (amdgpu_smc_load_fw && smc_enabled)
+ adev->firmware.smu_load = true;
+diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
+index eb603f1defc2..969e7898a7ed 100644
+--- a/drivers/gpu/drm/drm_dp_mst_topology.c
++++ b/drivers/gpu/drm/drm_dp_mst_topology.c
+@@ -804,8 +804,6 @@ static void drm_dp_destroy_mst_branch_device(struct kref *kref)
+ struct drm_dp_mst_port *port, *tmp;
+ bool wake_tx = false;
+
+- cancel_work_sync(&mstb->mgr->work);
+-
+ /*
+ * destroy all ports - don't need lock
+ * as there are no more references to the mst branch
+@@ -863,29 +861,33 @@ static void drm_dp_destroy_port(struct kref *kref)
+ {
+ struct drm_dp_mst_port *port = container_of(kref, struct drm_dp_mst_port, kref);
+ struct drm_dp_mst_topology_mgr *mgr = port->mgr;
++
+ if (!port->input) {
+ port->vcpi.num_slots = 0;
+
+ kfree(port->cached_edid);
+
+- /* we can't destroy the connector here, as
+- we might be holding the mode_config.mutex
+- from an EDID retrieval */
++ /*
++ * The only time we don't have a connector
++ * on an output port is if the connector init
++ * fails.
++ */
+ if (port->connector) {
++ /* we can't destroy the connector here, as
++ * we might be holding the mode_config.mutex
++ * from an EDID retrieval */
++
+ mutex_lock(&mgr->destroy_connector_lock);
+ list_add(&port->next, &mgr->destroy_connector_list);
+ mutex_unlock(&mgr->destroy_connector_lock);
+ schedule_work(&mgr->destroy_connector_work);
+ return;
+ }
++ /* no need to clean up vcpi
++ * as if we have no connector we never setup a vcpi */
+ drm_dp_port_teardown_pdt(port, port->pdt);
+-
+- if (!port->input && port->vcpi.vcpi > 0)
+- drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
+ }
+ kfree(port);
+-
+- (*mgr->cbs->hotplug)(mgr);
+ }
+
+ static void drm_dp_put_port(struct drm_dp_mst_port *port)
+@@ -1115,12 +1117,21 @@ static void drm_dp_add_port(struct drm_dp_mst_branch *mstb,
+ char proppath[255];
+ build_mst_prop_path(port, mstb, proppath, sizeof(proppath));
+ port->connector = (*mstb->mgr->cbs->add_connector)(mstb->mgr, port, proppath);
+-
++ if (!port->connector) {
++ /* remove it from the port list */
++ mutex_lock(&mstb->mgr->lock);
++ list_del(&port->next);
++ mutex_unlock(&mstb->mgr->lock);
++ /* drop port list reference */
++ drm_dp_put_port(port);
++ goto out;
++ }
+ if (port->port_num >= 8) {
+ port->cached_edid = drm_get_edid(port->connector, &port->aux.ddc);
+ }
+ }
+
++out:
+ /* put reference to this port */
+ drm_dp_put_port(port);
+ }
+@@ -1978,6 +1989,8 @@ void drm_dp_mst_topology_mgr_suspend(struct drm_dp_mst_topology_mgr *mgr)
+ drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
+ DP_MST_EN | DP_UPSTREAM_IS_SRC);
+ mutex_unlock(&mgr->lock);
++ flush_work(&mgr->work);
++ flush_work(&mgr->destroy_connector_work);
+ }
+ EXPORT_SYMBOL(drm_dp_mst_topology_mgr_suspend);
+
+@@ -2661,7 +2674,7 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
+ {
+ struct drm_dp_mst_topology_mgr *mgr = container_of(work, struct drm_dp_mst_topology_mgr, destroy_connector_work);
+ struct drm_dp_mst_port *port;
+-
++ bool send_hotplug = false;
+ /*
+ * Not a regular list traverse as we have to drop the destroy
+ * connector lock before destroying the connector, to avoid AB->BA
+@@ -2684,7 +2697,10 @@ static void drm_dp_destroy_connector_work(struct work_struct *work)
+ if (!port->input && port->vcpi.vcpi > 0)
+ drm_dp_mst_put_payload_id(mgr, port->vcpi.vcpi);
+ kfree(port);
++ send_hotplug = true;
+ }
++ if (send_hotplug)
++ (*mgr->cbs->hotplug)(mgr);
+ }
+
+ /**
+@@ -2737,6 +2753,7 @@ EXPORT_SYMBOL(drm_dp_mst_topology_mgr_init);
+ */
+ void drm_dp_mst_topology_mgr_destroy(struct drm_dp_mst_topology_mgr *mgr)
+ {
++ flush_work(&mgr->work);
+ flush_work(&mgr->destroy_connector_work);
+ mutex_lock(&mgr->payload_lock);
+ kfree(mgr->payloads);
+diff --git a/drivers/gpu/drm/drm_lock.c b/drivers/gpu/drm/drm_lock.c
+index f861361a635e..4924d381b664 100644
+--- a/drivers/gpu/drm/drm_lock.c
++++ b/drivers/gpu/drm/drm_lock.c
+@@ -61,6 +61,9 @@ int drm_legacy_lock(struct drm_device *dev, void *data,
+ struct drm_master *master = file_priv->master;
+ int ret = 0;
+
++ if (drm_core_check_feature(dev, DRIVER_MODESET))
++ return -EINVAL;
++
+ ++file_priv->lock_count;
+
+ if (lock->context == DRM_KERNEL_CONTEXT) {
+@@ -153,6 +156,9 @@ int drm_legacy_unlock(struct drm_device *dev, void *data, struct drm_file *file_
+ struct drm_lock *lock = data;
+ struct drm_master *master = file_priv->master;
+
++ if (drm_core_check_feature(dev, DRIVER_MODESET))
++ return -EINVAL;
++
+ if (lock->context == DRM_KERNEL_CONTEXT) {
+ DRM_ERROR("Process %d using kernel context %d\n",
+ task_pid_nr(current), lock->context);
+diff --git a/drivers/gpu/drm/i915/intel_bios.c b/drivers/gpu/drm/i915/intel_bios.c
+index 198fc3c3291b..17522f733513 100644
+--- a/drivers/gpu/drm/i915/intel_bios.c
++++ b/drivers/gpu/drm/i915/intel_bios.c
+@@ -42,7 +42,7 @@ find_section(const void *_bdb, int section_id)
+ const struct bdb_header *bdb = _bdb;
+ const u8 *base = _bdb;
+ int index = 0;
+- u16 total, current_size;
++ u32 total, current_size;
+ u8 current_id;
+
+ /* skip to first section */
+@@ -57,6 +57,10 @@ find_section(const void *_bdb, int section_id)
+ current_size = *((const u16 *)(base + index));
+ index += 2;
+
++ /* The MIPI Sequence Block v3+ has a separate size field. */
++ if (current_id == BDB_MIPI_SEQUENCE && *(base + index) >= 3)
++ current_size = *((const u32 *)(base + index + 1));
++
+ if (index + current_size > total)
+ return NULL;
+
+@@ -859,6 +863,12 @@ parse_mipi(struct drm_i915_private *dev_priv, const struct bdb_header *bdb)
+ return;
+ }
+
++ /* Fail gracefully for forward incompatible sequence block. */
++ if (sequence->version >= 3) {
++ DRM_ERROR("Unable to parse MIPI Sequence Block v3+\n");
++ return;
++ }
++
+ DRM_DEBUG_DRIVER("Found MIPI sequence block\n");
+
+ block_size = get_blocksize(sequence);
+diff --git a/drivers/gpu/drm/qxl/qxl_display.c b/drivers/gpu/drm/qxl/qxl_display.c
+index 7c6225c84ba6..4649bd2ed340 100644
+--- a/drivers/gpu/drm/qxl/qxl_display.c
++++ b/drivers/gpu/drm/qxl/qxl_display.c
+@@ -618,7 +618,7 @@ static int qxl_crtc_mode_set(struct drm_crtc *crtc,
+ adjusted_mode->hdisplay,
+ adjusted_mode->vdisplay);
+
+- if (qcrtc->index == 0)
++ if (bo->is_primary == false)
+ recreate_primary = true;
+
+ if (bo->surf.stride * bo->surf.height > qdev->vram_size) {
+@@ -886,13 +886,15 @@ static enum drm_connector_status qxl_conn_detect(
+ drm_connector_to_qxl_output(connector);
+ struct drm_device *ddev = connector->dev;
+ struct qxl_device *qdev = ddev->dev_private;
+- int connected;
++ bool connected = false;
+
+ /* The first monitor is always connected */
+- connected = (output->index == 0) ||
+- (qdev->client_monitors_config &&
+- qdev->client_monitors_config->count > output->index &&
+- qxl_head_enabled(&qdev->client_monitors_config->heads[output->index]));
++ if (!qdev->client_monitors_config) {
++ if (output->index == 0)
++ connected = true;
++ } else
++ connected = qdev->client_monitors_config->count > output->index &&
++ qxl_head_enabled(&qdev->client_monitors_config->heads[output->index]);
+
+ DRM_DEBUG("#%d connected: %d\n", output->index, connected);
+ if (!connected)
+diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c b/drivers/gpu/drm/radeon/atombios_encoders.c
+index c3872598b85a..65adb9c72377 100644
+--- a/drivers/gpu/drm/radeon/atombios_encoders.c
++++ b/drivers/gpu/drm/radeon/atombios_encoders.c
+@@ -1624,8 +1624,9 @@ radeon_atom_encoder_dpms_avivo(struct drm_encoder *encoder, int mode)
+ } else
+ atom_execute_table(rdev->mode_info.atom_context, index, (uint32_t *)&args);
+ if (radeon_encoder->devices & (ATOM_DEVICE_LCD_SUPPORT)) {
+- args.ucAction = ATOM_LCD_BLON;
+- atom_execute_table(rdev->mode_info.atom_context, index, (uint32_t *)&args);
++ struct radeon_encoder_atom_dig *dig = radeon_encoder->enc_priv;
++
++ atombios_set_backlight_level(radeon_encoder, dig->backlight_level);
+ }
+ break;
+ case DRM_MODE_DPMS_STANDBY:
+@@ -1706,8 +1707,7 @@ radeon_atom_encoder_dpms_dig(struct drm_encoder *encoder, int mode)
+ atombios_dig_encoder_setup(encoder, ATOM_ENCODER_CMD_DP_VIDEO_ON, 0);
+ }
+ if (radeon_encoder->devices & (ATOM_DEVICE_LCD_SUPPORT))
+- atombios_dig_transmitter_setup(encoder,
+- ATOM_TRANSMITTER_ACTION_LCD_BLON, 0, 0);
++ atombios_set_backlight_level(radeon_encoder, dig->backlight_level);
+ if (ext_encoder)
+ atombios_external_encoder_setup(encoder, ext_encoder, ATOM_ENABLE);
+ break;
+diff --git a/drivers/hv/hv_utils_transport.c b/drivers/hv/hv_utils_transport.c
+index ea7ba5ef16a9..6a9d80a5332d 100644
+--- a/drivers/hv/hv_utils_transport.c
++++ b/drivers/hv/hv_utils_transport.c
+@@ -186,7 +186,7 @@ int hvutil_transport_send(struct hvutil_transport *hvt, void *msg, int len)
+ return -EINVAL;
+ } else if (hvt->mode == HVUTIL_TRANSPORT_NETLINK) {
+ cn_msg = kzalloc(sizeof(*cn_msg) + len, GFP_ATOMIC);
+- if (!msg)
++ if (!cn_msg)
+ return -ENOMEM;
+ cn_msg->id.idx = hvt->cn_id.idx;
+ cn_msg->id.val = hvt->cn_id.val;
+diff --git a/drivers/hwmon/nct6775.c b/drivers/hwmon/nct6775.c
+index bd1c99deac71..2aaedbe0b023 100644
+--- a/drivers/hwmon/nct6775.c
++++ b/drivers/hwmon/nct6775.c
+@@ -354,6 +354,10 @@ static const u16 NCT6775_REG_TEMP_CRIT[ARRAY_SIZE(nct6775_temp_label) - 1]
+
+ /* NCT6776 specific data */
+
++/* STEP_UP_TIME and STEP_DOWN_TIME regs are swapped for all chips but NCT6775 */
++#define NCT6776_REG_FAN_STEP_UP_TIME NCT6775_REG_FAN_STEP_DOWN_TIME
++#define NCT6776_REG_FAN_STEP_DOWN_TIME NCT6775_REG_FAN_STEP_UP_TIME
++
+ static const s8 NCT6776_ALARM_BITS[] = {
+ 0, 1, 2, 3, 8, 21, 20, 16, /* in0.. in7 */
+ 17, -1, -1, -1, -1, -1, -1, /* in8..in14 */
+@@ -3528,8 +3532,8 @@ static int nct6775_probe(struct platform_device *pdev)
+ data->REG_FAN_PULSES = NCT6776_REG_FAN_PULSES;
+ data->FAN_PULSE_SHIFT = NCT6775_FAN_PULSE_SHIFT;
+ data->REG_FAN_TIME[0] = NCT6775_REG_FAN_STOP_TIME;
+- data->REG_FAN_TIME[1] = NCT6775_REG_FAN_STEP_UP_TIME;
+- data->REG_FAN_TIME[2] = NCT6775_REG_FAN_STEP_DOWN_TIME;
++ data->REG_FAN_TIME[1] = NCT6776_REG_FAN_STEP_UP_TIME;
++ data->REG_FAN_TIME[2] = NCT6776_REG_FAN_STEP_DOWN_TIME;
+ data->REG_TOLERANCE_H = NCT6776_REG_TOLERANCE_H;
+ data->REG_PWM[0] = NCT6775_REG_PWM;
+ data->REG_PWM[1] = NCT6775_REG_FAN_START_OUTPUT;
+@@ -3600,8 +3604,8 @@ static int nct6775_probe(struct platform_device *pdev)
+ data->REG_FAN_PULSES = NCT6779_REG_FAN_PULSES;
+ data->FAN_PULSE_SHIFT = NCT6775_FAN_PULSE_SHIFT;
+ data->REG_FAN_TIME[0] = NCT6775_REG_FAN_STOP_TIME;
+- data->REG_FAN_TIME[1] = NCT6775_REG_FAN_STEP_UP_TIME;
+- data->REG_FAN_TIME[2] = NCT6775_REG_FAN_STEP_DOWN_TIME;
++ data->REG_FAN_TIME[1] = NCT6776_REG_FAN_STEP_UP_TIME;
++ data->REG_FAN_TIME[2] = NCT6776_REG_FAN_STEP_DOWN_TIME;
+ data->REG_TOLERANCE_H = NCT6776_REG_TOLERANCE_H;
+ data->REG_PWM[0] = NCT6775_REG_PWM;
+ data->REG_PWM[1] = NCT6775_REG_FAN_START_OUTPUT;
+@@ -3677,8 +3681,8 @@ static int nct6775_probe(struct platform_device *pdev)
+ data->REG_FAN_PULSES = NCT6779_REG_FAN_PULSES;
+ data->FAN_PULSE_SHIFT = NCT6775_FAN_PULSE_SHIFT;
+ data->REG_FAN_TIME[0] = NCT6775_REG_FAN_STOP_TIME;
+- data->REG_FAN_TIME[1] = NCT6775_REG_FAN_STEP_UP_TIME;
+- data->REG_FAN_TIME[2] = NCT6775_REG_FAN_STEP_DOWN_TIME;
++ data->REG_FAN_TIME[1] = NCT6776_REG_FAN_STEP_UP_TIME;
++ data->REG_FAN_TIME[2] = NCT6776_REG_FAN_STEP_DOWN_TIME;
+ data->REG_TOLERANCE_H = NCT6776_REG_TOLERANCE_H;
+ data->REG_PWM[0] = NCT6775_REG_PWM;
+ data->REG_PWM[1] = NCT6775_REG_FAN_START_OUTPUT;
+diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
+index d851e1828d6f..85761b78bb5f 100644
+--- a/drivers/infiniband/ulp/isert/ib_isert.c
++++ b/drivers/infiniband/ulp/isert/ib_isert.c
+@@ -3012,9 +3012,16 @@ isert_get_dataout(struct iscsi_conn *conn, struct iscsi_cmd *cmd, bool recovery)
+ static int
+ isert_immediate_queue(struct iscsi_conn *conn, struct iscsi_cmd *cmd, int state)
+ {
+- int ret;
++ struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
++ int ret = 0;
+
+ switch (state) {
++ case ISTATE_REMOVE:
++ spin_lock_bh(&conn->cmd_lock);
++ list_del_init(&cmd->i_conn_node);
++ spin_unlock_bh(&conn->cmd_lock);
++ isert_put_cmd(isert_cmd, true);
++ break;
+ case ISTATE_SEND_NOPIN_WANT_RESPONSE:
+ ret = isert_put_nopin(cmd, conn, false);
+ break;
+@@ -3379,6 +3386,41 @@ isert_wait4flush(struct isert_conn *isert_conn)
+ wait_for_completion(&isert_conn->wait_comp_err);
+ }
+
++/**
++ * isert_put_unsol_pending_cmds() - Drop commands waiting for
++ * unsolicited dataout
++ * @conn: iscsi connection
++ *
++ * We might still have commands that are waiting for unsolicited
++ * dataout messages. We must put the extra reference on those
++ * before blocking on target_wait_for_session_cmds().
++ */
++static void
++isert_put_unsol_pending_cmds(struct iscsi_conn *conn)
++{
++ struct iscsi_cmd *cmd, *tmp;
++ static LIST_HEAD(drop_cmd_list);
++
++ spin_lock_bh(&conn->cmd_lock);
++ list_for_each_entry_safe(cmd, tmp, &conn->conn_cmd_list, i_conn_node) {
++ if ((cmd->cmd_flags & ICF_NON_IMMEDIATE_UNSOLICITED_DATA) &&
++ (cmd->write_data_done < conn->sess->sess_ops->FirstBurstLength) &&
++ (cmd->write_data_done < cmd->se_cmd.data_length))
++ list_move_tail(&cmd->i_conn_node, &drop_cmd_list);
++ }
++ spin_unlock_bh(&conn->cmd_lock);
++
++ list_for_each_entry_safe(cmd, tmp, &drop_cmd_list, i_conn_node) {
++ list_del_init(&cmd->i_conn_node);
++ if (cmd->i_state != ISTATE_REMOVE) {
++ struct isert_cmd *isert_cmd = iscsit_priv_cmd(cmd);
++
++ isert_info("conn %p dropping cmd %p\n", conn, cmd);
++ isert_put_cmd(isert_cmd, true);
++ }
++ }
++}
++
+ static void isert_wait_conn(struct iscsi_conn *conn)
+ {
+ struct isert_conn *isert_conn = conn->context;
+@@ -3397,8 +3439,9 @@ static void isert_wait_conn(struct iscsi_conn *conn)
+ isert_conn_terminate(isert_conn);
+ mutex_unlock(&isert_conn->mutex);
+
+- isert_wait4cmds(conn);
+ isert_wait4flush(isert_conn);
++ isert_put_unsol_pending_cmds(conn);
++ isert_wait4cmds(conn);
+ isert_wait4logout(isert_conn);
+
+ queue_work(isert_release_wq, &isert_conn->release_work);
+diff --git a/drivers/irqchip/irq-atmel-aic5.c b/drivers/irqchip/irq-atmel-aic5.c
+index 459bf4429d36..7e077bf13fe1 100644
+--- a/drivers/irqchip/irq-atmel-aic5.c
++++ b/drivers/irqchip/irq-atmel-aic5.c
+@@ -88,28 +88,36 @@ static void aic5_mask(struct irq_data *d)
+ {
+ struct irq_domain *domain = d->domain;
+ struct irq_domain_chip_generic *dgc = domain->gc;
+- struct irq_chip_generic *gc = dgc->gc[0];
++ struct irq_chip_generic *bgc = dgc->gc[0];
++ struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+
+- /* Disable interrupt on AIC5 */
+- irq_gc_lock(gc);
++ /*
++ * Disable interrupt on AIC5. We always take the lock of the
++ * first irq chip as all chips share the same registers.
++ */
++ irq_gc_lock(bgc);
+ irq_reg_writel(gc, d->hwirq, AT91_AIC5_SSR);
+ irq_reg_writel(gc, 1, AT91_AIC5_IDCR);
+ gc->mask_cache &= ~d->mask;
+- irq_gc_unlock(gc);
++ irq_gc_unlock(bgc);
+ }
+
+ static void aic5_unmask(struct irq_data *d)
+ {
+ struct irq_domain *domain = d->domain;
+ struct irq_domain_chip_generic *dgc = domain->gc;
+- struct irq_chip_generic *gc = dgc->gc[0];
++ struct irq_chip_generic *bgc = dgc->gc[0];
++ struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+
+- /* Enable interrupt on AIC5 */
+- irq_gc_lock(gc);
++ /*
++ * Enable interrupt on AIC5. We always take the lock of the
++ * first irq chip as all chips share the same registers.
++ */
++ irq_gc_lock(bgc);
+ irq_reg_writel(gc, d->hwirq, AT91_AIC5_SSR);
+ irq_reg_writel(gc, 1, AT91_AIC5_IECR);
+ gc->mask_cache |= d->mask;
+- irq_gc_unlock(gc);
++ irq_gc_unlock(bgc);
+ }
+
+ static int aic5_retrigger(struct irq_data *d)
+diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
+index c00e2db351ba..9a791dd52199 100644
+--- a/drivers/irqchip/irq-gic-v3-its.c
++++ b/drivers/irqchip/irq-gic-v3-its.c
+@@ -921,8 +921,10 @@ retry_baser:
+ * non-cacheable as well.
+ */
+ shr = tmp & GITS_BASER_SHAREABILITY_MASK;
+- if (!shr)
++ if (!shr) {
+ cache = GITS_BASER_nC;
++ __flush_dcache_area(base, alloc_size);
++ }
+ goto retry_baser;
+ }
+
+@@ -1163,6 +1165,8 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
+ return NULL;
+ }
+
++ __flush_dcache_area(itt, sz);
++
+ dev->its = its;
+ dev->itt = itt;
+ dev->nr_ites = nr_ites;
+diff --git a/drivers/leds/Kconfig b/drivers/leds/Kconfig
+index 9ad35f72ab4c..433fb9df848a 100644
+--- a/drivers/leds/Kconfig
++++ b/drivers/leds/Kconfig
+@@ -229,7 +229,7 @@ config LEDS_LP55XX_COMMON
+ tristate "Common Driver for TI/National LP5521/5523/55231/5562/8501"
+ depends on LEDS_LP5521 || LEDS_LP5523 || LEDS_LP5562 || LEDS_LP8501
+ select FW_LOADER
+- select FW_LOADER_USER_HELPER_FALLBACK
++ select FW_LOADER_USER_HELPER
+ help
+ This option supports common operations for LP5521/5523/55231/5562/8501
+ devices.
+diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c
+index beabfbc6f7cd..ca51d58bed24 100644
+--- a/drivers/leds/led-class.c
++++ b/drivers/leds/led-class.c
+@@ -228,12 +228,15 @@ static int led_classdev_next_name(const char *init_name, char *name,
+ {
+ unsigned int i = 0;
+ int ret = 0;
++ struct device *dev;
+
+ strlcpy(name, init_name, len);
+
+- while (class_find_device(leds_class, NULL, name, match_name) &&
+- (ret < len))
++ while ((ret < len) &&
++ (dev = class_find_device(leds_class, NULL, name, match_name))) {
++ put_device(dev);
+ ret = snprintf(name, len, "%s_%u", init_name, ++i);
++ }
+
+ if (ret >= len)
+ return -ENOMEM;
+diff --git a/drivers/macintosh/windfarm_core.c b/drivers/macintosh/windfarm_core.c
+index 3ee198b65843..cc7ece1712b5 100644
+--- a/drivers/macintosh/windfarm_core.c
++++ b/drivers/macintosh/windfarm_core.c
+@@ -435,7 +435,7 @@ int wf_unregister_client(struct notifier_block *nb)
+ {
+ mutex_lock(&wf_lock);
+ blocking_notifier_chain_unregister(&wf_client_list, nb);
+- wf_client_count++;
++ wf_client_count--;
+ if (wf_client_count == 0)
+ wf_stop_thread();
+ mutex_unlock(&wf_lock);
+diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
+index e51de52eeb94..48b5890c28e3 100644
+--- a/drivers/md/bitmap.c
++++ b/drivers/md/bitmap.c
+@@ -1997,7 +1997,8 @@ int bitmap_resize(struct bitmap *bitmap, sector_t blocks,
+ if (bitmap->mddev->bitmap_info.offset || bitmap->mddev->bitmap_info.file)
+ ret = bitmap_storage_alloc(&store, chunks,
+ !bitmap->mddev->bitmap_info.external,
+- bitmap->cluster_slot);
++ mddev_is_clustered(bitmap->mddev)
++ ? bitmap->cluster_slot : 0);
+ if (ret)
+ goto err;
+
+diff --git a/drivers/md/dm-cache-policy-cleaner.c b/drivers/md/dm-cache-policy-cleaner.c
+index 240c9f0e85e7..8a096456579b 100644
+--- a/drivers/md/dm-cache-policy-cleaner.c
++++ b/drivers/md/dm-cache-policy-cleaner.c
+@@ -436,7 +436,7 @@ static struct dm_cache_policy *wb_create(dm_cblock_t cache_size,
+ static struct dm_cache_policy_type wb_policy_type = {
+ .name = "cleaner",
+ .version = {1, 0, 0},
+- .hint_size = 0,
++ .hint_size = 4,
+ .owner = THIS_MODULE,
+ .create = wb_create
+ };
+diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
+index 0f48fed44a17..0d28c5b9d065 100644
+--- a/drivers/md/dm-crypt.c
++++ b/drivers/md/dm-crypt.c
+@@ -968,7 +968,8 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone);
+
+ /*
+ * Generate a new unfragmented bio with the given size
+- * This should never violate the device limitations
++ * This should never violate the device limitations (but only because
++ * max_segment_size is being constrained to PAGE_SIZE).
+ *
+ * This function may be called concurrently. If we allocate from the mempool
+ * concurrently, there is a possibility of deadlock. For example, if we have
+@@ -2058,9 +2059,20 @@ static int crypt_iterate_devices(struct dm_target *ti,
+ return fn(ti, cc->dev, cc->start, ti->len, data);
+ }
+
++static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
++{
++ /*
++ * Unfortunate constraint that is required to avoid the potential
++ * for exceeding underlying device's max_segments limits -- due to
++ * crypt_alloc_buffer() possibly allocating pages for the encryption
++ * bio that are not as physically contiguous as the original bio.
++ */
++ limits->max_segment_size = PAGE_SIZE;
++}
++
+ static struct target_type crypt_target = {
+ .name = "crypt",
+- .version = {1, 14, 0},
++ .version = {1, 14, 1},
+ .module = THIS_MODULE,
+ .ctr = crypt_ctr,
+ .dtr = crypt_dtr,
+@@ -2072,6 +2084,7 @@ static struct target_type crypt_target = {
+ .message = crypt_message,
+ .merge = crypt_merge,
+ .iterate_devices = crypt_iterate_devices,
++ .io_hints = crypt_io_hints,
+ };
+
+ static int __init dm_crypt_init(void)
+diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
+index 2daa67793511..1257d484392a 100644
+--- a/drivers/md/dm-raid.c
++++ b/drivers/md/dm-raid.c
+@@ -329,8 +329,7 @@ static int validate_region_size(struct raid_set *rs, unsigned long region_size)
+ */
+ if (min_region_size > (1 << 13)) {
+ /* If not a power of 2, make it the next power of 2 */
+- if (min_region_size & (min_region_size - 1))
+- region_size = 1 << fls(region_size);
++ region_size = roundup_pow_of_two(min_region_size);
+ DMINFO("Choosing default region size of %lu sectors",
+ region_size);
+ } else {
+diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
+index d2bbe8cc1e97..75aef240c2d1 100644
+--- a/drivers/md/dm-thin.c
++++ b/drivers/md/dm-thin.c
+@@ -4333,6 +4333,10 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
+ {
+ struct thin_c *tc = ti->private;
+ struct pool *pool = tc->pool;
++ struct queue_limits *pool_limits = dm_get_queue_limits(pool->pool_md);
++
++ if (!pool_limits->discard_granularity)
++ return; /* pool's discard support is disabled */
+
+ limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
+ limits->max_discard_sectors = 2048 * 1024 * 16; /* 16G */
+diff --git a/drivers/md/dm.c b/drivers/md/dm.c
+index 0d7ab20c58df..3e32f4e31bbb 100644
+--- a/drivers/md/dm.c
++++ b/drivers/md/dm.c
+@@ -2952,8 +2952,6 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
+
+ might_sleep();
+
+- map = dm_get_live_table(md, &srcu_idx);
+-
+ spin_lock(&_minor_lock);
+ idr_replace(&_minor_idr, MINOR_ALLOCED, MINOR(disk_devt(dm_disk(md))));
+ set_bit(DMF_FREEING, &md->flags);
+@@ -2967,14 +2965,14 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
+ * do not race with internal suspend.
+ */
+ mutex_lock(&md->suspend_lock);
++ map = dm_get_live_table(md, &srcu_idx);
+ if (!dm_suspended_md(md)) {
+ dm_table_presuspend_targets(map);
+ dm_table_postsuspend_targets(map);
+ }
+- mutex_unlock(&md->suspend_lock);
+-
+ /* dm_put_live_table must be before msleep, otherwise deadlock is possible */
+ dm_put_live_table(md, srcu_idx);
++ mutex_unlock(&md->suspend_lock);
+
+ /*
+ * Rare, but there may be I/O requests still going to complete,
+diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
+index efb654eb5399..0875e5e7e09a 100644
+--- a/drivers/md/raid0.c
++++ b/drivers/md/raid0.c
+@@ -83,7 +83,7 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
+ char b[BDEVNAME_SIZE];
+ char b2[BDEVNAME_SIZE];
+ struct r0conf *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
+- bool discard_supported = false;
++ unsigned short blksize = 512;
+
+ if (!conf)
+ return -ENOMEM;
+@@ -98,6 +98,9 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
+ sector_div(sectors, mddev->chunk_sectors);
+ rdev1->sectors = sectors * mddev->chunk_sectors;
+
++ blksize = max(blksize, queue_logical_block_size(
++ rdev1->bdev->bd_disk->queue));
++
+ rdev_for_each(rdev2, mddev) {
+ pr_debug("md/raid0:%s: comparing %s(%llu)"
+ " with %s(%llu)\n",
+@@ -134,6 +137,18 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
+ }
+ pr_debug("md/raid0:%s: FINAL %d zones\n",
+ mdname(mddev), conf->nr_strip_zones);
++ /*
++ * now since we have the hard sector sizes, we can make sure
++ * chunk size is a multiple of that sector size
++ */
++ if ((mddev->chunk_sectors << 9) % blksize) {
++ printk(KERN_ERR "md/raid0:%s: chunk_size of %d not multiple of block size %d\n",
++ mdname(mddev),
++ mddev->chunk_sectors << 9, blksize);
++ err = -EINVAL;
++ goto abort;
++ }
++
+ err = -ENOMEM;
+ conf->strip_zone = kzalloc(sizeof(struct strip_zone)*
+ conf->nr_strip_zones, GFP_KERNEL);
+@@ -188,19 +203,12 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
+ }
+ dev[j] = rdev1;
+
+- if (mddev->queue)
+- disk_stack_limits(mddev->gendisk, rdev1->bdev,
+- rdev1->data_offset << 9);
+-
+ if (rdev1->bdev->bd_disk->queue->merge_bvec_fn)
+ conf->has_merge_bvec = 1;
+
+ if (!smallest || (rdev1->sectors < smallest->sectors))
+ smallest = rdev1;
+ cnt++;
+-
+- if (blk_queue_discard(bdev_get_queue(rdev1->bdev)))
+- discard_supported = true;
+ }
+ if (cnt != mddev->raid_disks) {
+ printk(KERN_ERR "md/raid0:%s: too few disks (%d of %d) - "
+@@ -261,28 +269,6 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
+ (unsigned long long)smallest->sectors);
+ }
+
+- /*
+- * now since we have the hard sector sizes, we can make sure
+- * chunk size is a multiple of that sector size
+- */
+- if ((mddev->chunk_sectors << 9) % queue_logical_block_size(mddev->queue)) {
+- printk(KERN_ERR "md/raid0:%s: chunk_size of %d not valid\n",
+- mdname(mddev),
+- mddev->chunk_sectors << 9);
+- goto abort;
+- }
+-
+- if (mddev->queue) {
+- blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
+- blk_queue_io_opt(mddev->queue,
+- (mddev->chunk_sectors << 9) * mddev->raid_disks);
+-
+- if (!discard_supported)
+- queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+- else
+- queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+- }
+-
+ pr_debug("md/raid0:%s: done.\n", mdname(mddev));
+ *private_conf = conf;
+
+@@ -433,12 +419,6 @@ static int raid0_run(struct mddev *mddev)
+ if (md_check_no_bitmap(mddev))
+ return -EINVAL;
+
+- if (mddev->queue) {
+- blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
+- blk_queue_max_write_same_sectors(mddev->queue, mddev->chunk_sectors);
+- blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
+- }
+-
+ /* if private is not null, we are here after takeover */
+ if (mddev->private == NULL) {
+ ret = create_strip_zones(mddev, &conf);
+@@ -447,6 +427,29 @@ static int raid0_run(struct mddev *mddev)
+ mddev->private = conf;
+ }
+ conf = mddev->private;
++ if (mddev->queue) {
++ struct md_rdev *rdev;
++ bool discard_supported = false;
++
++ blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
++ blk_queue_max_write_same_sectors(mddev->queue, mddev->chunk_sectors);
++ blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
++
++ blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
++ blk_queue_io_opt(mddev->queue,
++ (mddev->chunk_sectors << 9) * mddev->raid_disks);
++
++ rdev_for_each(rdev, mddev) {
++ disk_stack_limits(mddev->gendisk, rdev->bdev,
++ rdev->data_offset << 9);
++ if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
++ discard_supported = true;
++ }
++ if (!discard_supported)
++ queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
++ else
++ queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
++ }
+
+ /* calculate array device size */
+ md_set_array_sectors(mddev, raid0_size(mddev, 0, 0));
+diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
+index 9e3fdbdc4037..2f4503a7f315 100644
+--- a/drivers/mmc/core/core.c
++++ b/drivers/mmc/core/core.c
+@@ -134,9 +134,11 @@ void mmc_request_done(struct mmc_host *host, struct mmc_request *mrq)
+ int err = cmd->error;
+
+ /* Flag re-tuning needed on CRC errors */
+- if (err == -EILSEQ || (mrq->sbc && mrq->sbc->error == -EILSEQ) ||
++ if ((cmd->opcode != MMC_SEND_TUNING_BLOCK &&
++ cmd->opcode != MMC_SEND_TUNING_BLOCK_HS200) &&
++ (err == -EILSEQ || (mrq->sbc && mrq->sbc->error == -EILSEQ) ||
+ (mrq->data && mrq->data->error == -EILSEQ) ||
+- (mrq->stop && mrq->stop->error == -EILSEQ))
++ (mrq->stop && mrq->stop->error == -EILSEQ)))
+ mmc_retune_needed(host);
+
+ if (err && cmd->retries && mmc_host_is_spi(host)) {
+diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
+index 99a9c9011c50..79979e9d5a09 100644
+--- a/drivers/mmc/core/host.c
++++ b/drivers/mmc/core/host.c
+@@ -457,7 +457,7 @@ int mmc_of_parse(struct mmc_host *host)
+ 0, &cd_gpio_invert);
+ if (!ret)
+ dev_info(host->parent, "Got CD GPIO\n");
+- else if (ret != -ENOENT)
++ else if (ret != -ENOENT && ret != -ENOSYS)
+ return ret;
+
+ /*
+@@ -481,7 +481,7 @@ int mmc_of_parse(struct mmc_host *host)
+ ret = mmc_gpiod_request_ro(host, "wp", 0, false, 0, &ro_gpio_invert);
+ if (!ret)
+ dev_info(host->parent, "Got WP GPIO\n");
+- else if (ret != -ENOENT)
++ else if (ret != -ENOENT && ret != -ENOSYS)
+ return ret;
+
+ if (of_property_read_bool(np, "disable-wp"))
+diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
+index 40e9d8e45f25..e41fb7405426 100644
+--- a/drivers/mmc/host/dw_mmc.c
++++ b/drivers/mmc/host/dw_mmc.c
+@@ -99,6 +99,9 @@ struct idmac_desc {
+
+ __le32 des3; /* buffer 2 physical address */
+ };
++
++/* Each descriptor can transfer up to 4KB of data in chained mode */
++#define DW_MCI_DESC_DATA_LENGTH 0x1000
+ #endif /* CONFIG_MMC_DW_IDMAC */
+
+ static bool dw_mci_reset(struct dw_mci *host);
+@@ -462,66 +465,96 @@ static void dw_mci_idmac_complete_dma(struct dw_mci *host)
+ static void dw_mci_translate_sglist(struct dw_mci *host, struct mmc_data *data,
+ unsigned int sg_len)
+ {
++ unsigned int desc_len;
+ int i;
+ if (host->dma_64bit_address == 1) {
+- struct idmac_desc_64addr *desc = host->sg_cpu;
++ struct idmac_desc_64addr *desc_first, *desc_last, *desc;
++
++ desc_first = desc_last = desc = host->sg_cpu;
+
+- for (i = 0; i < sg_len; i++, desc++) {
++ for (i = 0; i < sg_len; i++) {
+ unsigned int length = sg_dma_len(&data->sg[i]);
+ u64 mem_addr = sg_dma_address(&data->sg[i]);
+
+- /*
+- * Set the OWN bit and disable interrupts for this
+- * descriptor
+- */
+- desc->des0 = IDMAC_DES0_OWN | IDMAC_DES0_DIC |
+- IDMAC_DES0_CH;
+- /* Buffer length */
+- IDMAC_64ADDR_SET_BUFFER1_SIZE(desc, length);
+-
+- /* Physical address to DMA to/from */
+- desc->des4 = mem_addr & 0xffffffff;
+- desc->des5 = mem_addr >> 32;
++ for ( ; length ; desc++) {
++ desc_len = (length <= DW_MCI_DESC_DATA_LENGTH) ?
++ length : DW_MCI_DESC_DATA_LENGTH;
++
++ length -= desc_len;
++
++ /*
++ * Set the OWN bit and disable interrupts
++ * for this descriptor
++ */
++ desc->des0 = IDMAC_DES0_OWN | IDMAC_DES0_DIC |
++ IDMAC_DES0_CH;
++
++ /* Buffer length */
++ IDMAC_64ADDR_SET_BUFFER1_SIZE(desc, desc_len);
++
++ /* Physical address to DMA to/from */
++ desc->des4 = mem_addr & 0xffffffff;
++ desc->des5 = mem_addr >> 32;
++
++ /* Update physical address for the next desc */
++ mem_addr += desc_len;
++
++ /* Save pointer to the last descriptor */
++ desc_last = desc;
++ }
+ }
+
+ /* Set first descriptor */
+- desc = host->sg_cpu;
+- desc->des0 |= IDMAC_DES0_FD;
++ desc_first->des0 |= IDMAC_DES0_FD;
+
+ /* Set last descriptor */
+- desc = host->sg_cpu + (i - 1) *
+- sizeof(struct idmac_desc_64addr);
+- desc->des0 &= ~(IDMAC_DES0_CH | IDMAC_DES0_DIC);
+- desc->des0 |= IDMAC_DES0_LD;
++ desc_last->des0 &= ~(IDMAC_DES0_CH | IDMAC_DES0_DIC);
++ desc_last->des0 |= IDMAC_DES0_LD;
+
+ } else {
+- struct idmac_desc *desc = host->sg_cpu;
++ struct idmac_desc *desc_first, *desc_last, *desc;
++
++ desc_first = desc_last = desc = host->sg_cpu;
+
+- for (i = 0; i < sg_len; i++, desc++) {
++ for (i = 0; i < sg_len; i++) {
+ unsigned int length = sg_dma_len(&data->sg[i]);
+ u32 mem_addr = sg_dma_address(&data->sg[i]);
+
+- /*
+- * Set the OWN bit and disable interrupts for this
+- * descriptor
+- */
+- desc->des0 = cpu_to_le32(IDMAC_DES0_OWN |
+- IDMAC_DES0_DIC | IDMAC_DES0_CH);
+- /* Buffer length */
+- IDMAC_SET_BUFFER1_SIZE(desc, length);
++ for ( ; length ; desc++) {
++ desc_len = (length <= DW_MCI_DESC_DATA_LENGTH) ?
++ length : DW_MCI_DESC_DATA_LENGTH;
++
++ length -= desc_len;
++
++ /*
++ * Set the OWN bit and disable interrupts
++ * for this descriptor
++ */
++ desc->des0 = cpu_to_le32(IDMAC_DES0_OWN |
++ IDMAC_DES0_DIC |
++ IDMAC_DES0_CH);
++
++ /* Buffer length */
++ IDMAC_SET_BUFFER1_SIZE(desc, desc_len);
+
+- /* Physical address to DMA to/from */
+- desc->des2 = cpu_to_le32(mem_addr);
++ /* Physical address to DMA to/from */
++ desc->des2 = cpu_to_le32(mem_addr);
++
++ /* Update physical address for the next desc */
++ mem_addr += desc_len;
++
++ /* Save pointer to the last descriptor */
++ desc_last = desc;
++ }
+ }
+
+ /* Set first descriptor */
+- desc = host->sg_cpu;
+- desc->des0 |= cpu_to_le32(IDMAC_DES0_FD);
++ desc_first->des0 |= cpu_to_le32(IDMAC_DES0_FD);
+
+ /* Set last descriptor */
+- desc = host->sg_cpu + (i - 1) * sizeof(struct idmac_desc);
+- desc->des0 &= cpu_to_le32(~(IDMAC_DES0_CH | IDMAC_DES0_DIC));
+- desc->des0 |= cpu_to_le32(IDMAC_DES0_LD);
++ desc_last->des0 &= cpu_to_le32(~(IDMAC_DES0_CH |
++ IDMAC_DES0_DIC));
++ desc_last->des0 |= cpu_to_le32(IDMAC_DES0_LD);
+ }
+
+ wmb();
+@@ -2394,7 +2427,7 @@ static int dw_mci_init_slot(struct dw_mci *host, unsigned int id)
+ #ifdef CONFIG_MMC_DW_IDMAC
+ mmc->max_segs = host->ring_size;
+ mmc->max_blk_size = 65536;
+- mmc->max_seg_size = 0x1000;
++ mmc->max_seg_size = DW_MCI_DESC_DATA_LENGTH;
+ mmc->max_req_size = mmc->max_seg_size * host->ring_size;
+ mmc->max_blk_count = mmc->max_req_size / 512;
+ #else
+diff --git a/drivers/mmc/host/sdhci-pxav3.c b/drivers/mmc/host/sdhci-pxav3.c
+index 946d37f94a31..f5edf9d3a18a 100644
+--- a/drivers/mmc/host/sdhci-pxav3.c
++++ b/drivers/mmc/host/sdhci-pxav3.c
+@@ -135,6 +135,7 @@ static int armada_38x_quirks(struct platform_device *pdev,
+ struct sdhci_pxa *pxa = pltfm_host->priv;
+ struct resource *res;
+
++ host->quirks &= ~SDHCI_QUIRK_CAP_CLOCK_BASE_BROKEN;
+ host->quirks |= SDHCI_QUIRK_MISSING_CAPS;
+ res = platform_get_resource_byname(pdev, IORESOURCE_MEM,
+ "conf-sdio3");
+@@ -290,6 +291,9 @@ static void pxav3_set_uhs_signaling(struct sdhci_host *host, unsigned int uhs)
+ uhs == MMC_TIMING_UHS_DDR50) {
+ reg_val &= ~SDIO3_CONF_CLK_INV;
+ reg_val |= SDIO3_CONF_SD_FB_CLK;
++ } else if (uhs == MMC_TIMING_MMC_HS) {
++ reg_val &= ~SDIO3_CONF_CLK_INV;
++ reg_val &= ~SDIO3_CONF_SD_FB_CLK;
+ } else {
+ reg_val |= SDIO3_CONF_CLK_INV;
+ reg_val &= ~SDIO3_CONF_SD_FB_CLK;
+@@ -398,7 +402,7 @@ static int sdhci_pxav3_probe(struct platform_device *pdev)
+ if (of_device_is_compatible(np, "marvell,armada-380-sdhci")) {
+ ret = armada_38x_quirks(pdev, host);
+ if (ret < 0)
+- goto err_clk_get;
++ goto err_mbus_win;
+ ret = mv_conf_mbus_windows(pdev, mv_mbus_dram_info());
+ if (ret < 0)
+ goto err_mbus_win;
+diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c
+index 1259cc558ce9..5465fa439c9e 100644
+--- a/drivers/mtd/nand/pxa3xx_nand.c
++++ b/drivers/mtd/nand/pxa3xx_nand.c
+@@ -1473,6 +1473,9 @@ static int pxa3xx_nand_scan(struct mtd_info *mtd)
+ if (pdata->keep_config && !pxa3xx_nand_detect_config(info))
+ goto KEEP_CONFIG;
+
++ /* Set a default chunk size */
++ info->chunk_size = 512;
++
+ ret = pxa3xx_nand_sensing(info);
+ if (ret) {
+ dev_info(&info->pdev->dev, "There is no chip on cs %d!\n",
+diff --git a/drivers/mtd/nand/sunxi_nand.c b/drivers/mtd/nand/sunxi_nand.c
+index 6f93b2990d25..499b8e433d3d 100644
+--- a/drivers/mtd/nand/sunxi_nand.c
++++ b/drivers/mtd/nand/sunxi_nand.c
+@@ -138,6 +138,10 @@
+ #define NFC_ECC_MODE GENMASK(15, 12)
+ #define NFC_RANDOM_SEED GENMASK(30, 16)
+
++/* NFC_USER_DATA helper macros */
++#define NFC_BUF_TO_USER_DATA(buf) ((buf)[0] | ((buf)[1] << 8) | \
++ ((buf)[2] << 16) | ((buf)[3] << 24))
++
+ #define NFC_DEFAULT_TIMEOUT_MS 1000
+
+ #define NFC_SRAM_SIZE 1024
+@@ -632,15 +636,9 @@ static int sunxi_nfc_hw_ecc_write_page(struct mtd_info *mtd,
+ offset = layout->eccpos[i * ecc->bytes] - 4 + mtd->writesize;
+
+ /* Fill OOB data in */
+- if (oob_required) {
+- tmp = 0xffffffff;
+- memcpy_toio(nfc->regs + NFC_REG_USER_DATA_BASE, &tmp,
+- 4);
+- } else {
+- memcpy_toio(nfc->regs + NFC_REG_USER_DATA_BASE,
+- chip->oob_poi + offset - mtd->writesize,
+- 4);
+- }
++ writel(NFC_BUF_TO_USER_DATA(chip->oob_poi +
++ layout->oobfree[i].offset),
++ nfc->regs + NFC_REG_USER_DATA_BASE);
+
+ chip->cmdfunc(mtd, NAND_CMD_RNDIN, offset, -1);
+
+@@ -770,14 +768,8 @@ static int sunxi_nfc_hw_syndrome_ecc_write_page(struct mtd_info *mtd,
+ offset += ecc->size;
+
+ /* Fill OOB data in */
+- if (oob_required) {
+- tmp = 0xffffffff;
+- memcpy_toio(nfc->regs + NFC_REG_USER_DATA_BASE, &tmp,
+- 4);
+- } else {
+- memcpy_toio(nfc->regs + NFC_REG_USER_DATA_BASE, oob,
+- 4);
+- }
++ writel(NFC_BUF_TO_USER_DATA(oob),
++ nfc->regs + NFC_REG_USER_DATA_BASE);
+
+ tmp = NFC_DATA_TRANS | NFC_DATA_SWAP_METHOD | NFC_ACCESS_DIR |
+ (1 << 30);
+@@ -1312,6 +1304,7 @@ static void sunxi_nand_chips_cleanup(struct sunxi_nfc *nfc)
+ node);
+ nand_release(&chip->mtd);
+ sunxi_nand_ecc_cleanup(&chip->nand.ecc);
++ list_del(&chip->node);
+ }
+ }
+
+diff --git a/drivers/mtd/ubi/io.c b/drivers/mtd/ubi/io.c
+index 5bbd1f094f4e..1fc23e48fe8e 100644
+--- a/drivers/mtd/ubi/io.c
++++ b/drivers/mtd/ubi/io.c
+@@ -926,6 +926,11 @@ static int validate_vid_hdr(const struct ubi_device *ubi,
+ goto bad;
+ }
+
++ if (data_size > ubi->leb_size) {
++ ubi_err(ubi, "bad data_size");
++ goto bad;
++ }
++
+ if (vol_type == UBI_VID_STATIC) {
+ /*
+ * Although from high-level point of view static volumes may
+diff --git a/drivers/mtd/ubi/vtbl.c b/drivers/mtd/ubi/vtbl.c
+index 80bdd5b88bac..d85c19762160 100644
+--- a/drivers/mtd/ubi/vtbl.c
++++ b/drivers/mtd/ubi/vtbl.c
+@@ -649,6 +649,7 @@ static int init_volumes(struct ubi_device *ubi,
+ if (ubi->corr_peb_count)
+ ubi_err(ubi, "%d PEBs are corrupted and not used",
+ ubi->corr_peb_count);
++ return -ENOSPC;
+ }
+ ubi->rsvd_pebs += reserved_pebs;
+ ubi->avail_pebs -= reserved_pebs;
+diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
+index 275d9fb6fe5c..eb4489f9082f 100644
+--- a/drivers/mtd/ubi/wl.c
++++ b/drivers/mtd/ubi/wl.c
+@@ -1601,6 +1601,7 @@ int ubi_wl_init(struct ubi_device *ubi, struct ubi_attach_info *ai)
+ if (ubi->corr_peb_count)
+ ubi_err(ubi, "%d PEBs are corrupted and not used",
+ ubi->corr_peb_count);
++ err = -ENOSPC;
+ goto out_free;
+ }
+ ubi->avail_pebs -= reserved_pebs;
+diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
+index 89d788d8f263..adfe1de78d99 100644
+--- a/drivers/net/ethernet/intel/e1000e/netdev.c
++++ b/drivers/net/ethernet/intel/e1000e/netdev.c
+@@ -4280,18 +4280,29 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc)
+ struct e1000_adapter *adapter = container_of(cc, struct e1000_adapter,
+ cc);
+ struct e1000_hw *hw = &adapter->hw;
++ u32 systimel_1, systimel_2, systimeh;
+ cycle_t systim, systim_next;
+- /* SYSTIMH latching upon SYSTIML read does not work well. To fix that
+- * we don't want to allow overflow of SYSTIML and a change to SYSTIMH
+- * to occur between reads, so if we read a vale close to overflow, we
+- * wait for overflow to occur and read both registers when its safe.
++ /* SYSTIMH latching upon SYSTIML read does not work well.
++ * This means that if SYSTIML overflows after we read it but before
++ * we read SYSTIMH, the value of SYSTIMH has been incremented and we
++ * will experience a huge non-linear increment in the systime value.
++ * To fix that, we test for overflow and, if true, re-read systime.
+ */
+- u32 systim_overflow_latch_fix = 0x3FFFFFFF;
+-
+- do {
+- systim = (cycle_t)er32(SYSTIML);
+- } while (systim > systim_overflow_latch_fix);
+- systim |= (cycle_t)er32(SYSTIMH) << 32;
++ systimel_1 = er32(SYSTIML);
++ systimeh = er32(SYSTIMH);
++ systimel_2 = er32(SYSTIML);
++ /* Check for overflow. If there was no overflow, use the values */
++ if (systimel_1 < systimel_2) {
++ systim = (cycle_t)systimel_1;
++ systim |= (cycle_t)systimeh << 32;
++ } else {
++ /* There was an overflow; read SYSTIMH again and use
++ * systimel_2
++ */
++ systimeh = er32(SYSTIMH);
++ systim = (cycle_t)systimel_2;
++ systim |= (cycle_t)systimeh << 32;
++ }
+
+ if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) {
+ u64 incvalue, time_delta, rem, temp;
+diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
+index 8d7b59689722..5bc9fca67957 100644
+--- a/drivers/net/ethernet/intel/igb/igb_main.c
++++ b/drivers/net/ethernet/intel/igb/igb_main.c
+@@ -2851,7 +2851,7 @@ static void igb_probe_vfs(struct igb_adapter *adapter)
+ return;
+
+ pci_sriov_set_totalvfs(pdev, 7);
+- igb_pci_enable_sriov(pdev, max_vfs);
++ igb_enable_sriov(pdev, max_vfs);
+
+ #endif /* CONFIG_PCI_IOV */
+ }
+diff --git a/drivers/net/ethernet/via/Kconfig b/drivers/net/ethernet/via/Kconfig
+index 2f1264b882b9..d3d094742a7e 100644
+--- a/drivers/net/ethernet/via/Kconfig
++++ b/drivers/net/ethernet/via/Kconfig
+@@ -17,7 +17,7 @@ if NET_VENDOR_VIA
+
+ config VIA_RHINE
+ tristate "VIA Rhine support"
+- depends on (PCI || OF_IRQ)
++ depends on PCI || (OF_IRQ && GENERIC_PCI_IOMAP)
+ depends on HAS_DMA
+ select CRC32
+ select MII
+diff --git a/drivers/net/wireless/ath/ath10k/htc.c b/drivers/net/wireless/ath/ath10k/htc.c
+index 85bfa2acb801..32d9ff1b19dc 100644
+--- a/drivers/net/wireless/ath/ath10k/htc.c
++++ b/drivers/net/wireless/ath/ath10k/htc.c
+@@ -145,8 +145,10 @@ int ath10k_htc_send(struct ath10k_htc *htc,
+ skb_cb->eid = eid;
+ skb_cb->paddr = dma_map_single(dev, skb->data, skb->len, DMA_TO_DEVICE);
+ ret = dma_mapping_error(dev, skb_cb->paddr);
+- if (ret)
++ if (ret) {
++ ret = -EIO;
+ goto err_credits;
++ }
+
+ sg_item.transfer_id = ep->eid;
+ sg_item.transfer_context = skb;
+diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
+index a60ef7d1d5fc..7be3ce6e0ffa 100644
+--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
++++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
+@@ -371,8 +371,10 @@ int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
+ skb_cb->paddr = dma_map_single(dev, msdu->data, msdu->len,
+ DMA_TO_DEVICE);
+ res = dma_mapping_error(dev, skb_cb->paddr);
+- if (res)
++ if (res) {
++ res = -EIO;
+ goto err_free_txdesc;
++ }
+
+ skb_put(txdesc, len);
+ cmd = (struct htt_cmd *)txdesc->data;
+@@ -456,8 +458,10 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
+ skb_cb->paddr = dma_map_single(dev, msdu->data, msdu->len,
+ DMA_TO_DEVICE);
+ res = dma_mapping_error(dev, skb_cb->paddr);
+- if (res)
++ if (res) {
++ res = -EIO;
+ goto err_free_txbuf;
++ }
+
+ switch (skb_cb->txmode) {
+ case ATH10K_HW_TXRX_RAW:
+diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
+index 218b6af63447..0d3c474ff76d 100644
+--- a/drivers/net/wireless/ath/ath10k/mac.c
++++ b/drivers/net/wireless/ath/ath10k/mac.c
+@@ -591,11 +591,19 @@ ath10k_mac_get_any_chandef_iter(struct ieee80211_hw *hw,
+ static int ath10k_peer_create(struct ath10k *ar, u32 vdev_id, const u8 *addr,
+ enum wmi_peer_type peer_type)
+ {
++ struct ath10k_vif *arvif;
++ int num_peers = 0;
+ int ret;
+
+ lockdep_assert_held(&ar->conf_mutex);
+
+- if (ar->num_peers >= ar->max_num_peers)
++ num_peers = ar->num_peers;
++
++ /* Each vdev consumes a peer entry as well */
++ list_for_each_entry(arvif, &ar->arvifs, list)
++ num_peers++;
++
++ if (num_peers >= ar->max_num_peers)
+ return -ENOBUFS;
+
+ ret = ath10k_wmi_peer_create(ar, vdev_id, addr, peer_type);
+@@ -2995,6 +3003,8 @@ void ath10k_mac_tx_unlock(struct ath10k *ar, int reason)
+ IEEE80211_IFACE_ITER_RESUME_ALL,
+ ath10k_mac_tx_unlock_iter,
+ ar);
++
++ ieee80211_wake_queue(ar->hw, ar->hw->offchannel_tx_hw_queue);
+ }
+
+ void ath10k_mac_vif_tx_lock(struct ath10k_vif *arvif, int reason)
+@@ -3034,38 +3044,16 @@ static void ath10k_mac_vif_handle_tx_pause(struct ath10k_vif *arvif,
+
+ lockdep_assert_held(&ar->htt.tx_lock);
+
+- switch (pause_id) {
+- case WMI_TLV_TX_PAUSE_ID_MCC:
+- case WMI_TLV_TX_PAUSE_ID_P2P_CLI_NOA:
+- case WMI_TLV_TX_PAUSE_ID_P2P_GO_PS:
+- case WMI_TLV_TX_PAUSE_ID_AP_PS:
+- case WMI_TLV_TX_PAUSE_ID_IBSS_PS:
+- switch (action) {
+- case WMI_TLV_TX_PAUSE_ACTION_STOP:
+- ath10k_mac_vif_tx_lock(arvif, pause_id);
+- break;
+- case WMI_TLV_TX_PAUSE_ACTION_WAKE:
+- ath10k_mac_vif_tx_unlock(arvif, pause_id);
+- break;
+- default:
+- ath10k_warn(ar, "received unknown tx pause action %d on vdev %i, ignoring\n",
+- action, arvif->vdev_id);
+- break;
+- }
++ switch (action) {
++ case WMI_TLV_TX_PAUSE_ACTION_STOP:
++ ath10k_mac_vif_tx_lock(arvif, pause_id);
++ break;
++ case WMI_TLV_TX_PAUSE_ACTION_WAKE:
++ ath10k_mac_vif_tx_unlock(arvif, pause_id);
+ break;
+- case WMI_TLV_TX_PAUSE_ID_AP_PEER_PS:
+- case WMI_TLV_TX_PAUSE_ID_AP_PEER_UAPSD:
+- case WMI_TLV_TX_PAUSE_ID_STA_ADD_BA:
+- case WMI_TLV_TX_PAUSE_ID_HOST:
+ default:
+- /* FIXME: Some pause_ids aren't vdev specific. Instead they
+- * target peer_id and tid. Implementing these could improve
+- * traffic scheduling fairness across multiple connected
+- * stations in AP/IBSS modes.
+- */
+- ath10k_dbg(ar, ATH10K_DBG_MAC,
+- "mac ignoring unsupported tx pause vdev %i id %d\n",
+- arvif->vdev_id, pause_id);
++ ath10k_warn(ar, "received unknown tx pause action %d on vdev %i, ignoring\n",
++ action, arvif->vdev_id);
+ break;
+ }
+ }
+@@ -3082,12 +3070,15 @@ static void ath10k_mac_handle_tx_pause_iter(void *data, u8 *mac,
+ struct ath10k_vif *arvif = ath10k_vif_to_arvif(vif);
+ struct ath10k_mac_tx_pause *arg = data;
+
++ if (arvif->vdev_id != arg->vdev_id)
++ return;
++
+ ath10k_mac_vif_handle_tx_pause(arvif, arg->pause_id, arg->action);
+ }
+
+-void ath10k_mac_handle_tx_pause(struct ath10k *ar, u32 vdev_id,
+- enum wmi_tlv_tx_pause_id pause_id,
+- enum wmi_tlv_tx_pause_action action)
++void ath10k_mac_handle_tx_pause_vdev(struct ath10k *ar, u32 vdev_id,
++ enum wmi_tlv_tx_pause_id pause_id,
++ enum wmi_tlv_tx_pause_action action)
+ {
+ struct ath10k_mac_tx_pause arg = {
+ .vdev_id = vdev_id,
+@@ -4080,6 +4071,11 @@ static int ath10k_add_interface(struct ieee80211_hw *hw,
+ sizeof(arvif->bitrate_mask.control[i].vht_mcs));
+ }
+
++ if (ar->num_peers >= ar->max_num_peers) {
++ ath10k_warn(ar, "refusing vdev creation due to insufficient peer entry resources in firmware\n");
++ return -ENOBUFS;
++ }
++
+ if (ar->free_vdev_map == 0) {
+ ath10k_warn(ar, "Free vdev map is empty, no more interfaces allowed.\n");
+ ret = -EBUSY;
+@@ -4287,6 +4283,11 @@ static int ath10k_add_interface(struct ieee80211_hw *hw,
+ }
+ }
+
++ spin_lock_bh(&ar->htt.tx_lock);
++ if (!ar->tx_paused)
++ ieee80211_wake_queue(ar->hw, arvif->vdev_id);
++ spin_unlock_bh(&ar->htt.tx_lock);
++
+ mutex_unlock(&ar->conf_mutex);
+ return 0;
+
+@@ -5561,6 +5562,21 @@ static int ath10k_set_rts_threshold(struct ieee80211_hw *hw, u32 value)
+ return ret;
+ }
+
++static int ath10k_mac_op_set_frag_threshold(struct ieee80211_hw *hw, u32 value)
++{
++ /* Even though there's a WMI enum for fragmentation threshold, no known
++ * firmware actually implements it. Moreover, it is not possible to leave
++ * frame fragmentation to mac80211 because firmware clears the "more
++ * fragments" bit in frame control, making it impossible for remote
++ * devices to reassemble frames.
++ *
++ * Hence implement a dummy callback just to say fragmentation isn't
++ * supported. This effectively prevents mac80211 from doing frame
++ * fragmentation in software.
++ */
++ return -EOPNOTSUPP;
++}
++
+ static void ath10k_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
+ u32 queues, bool drop)
+ {
+@@ -6395,6 +6411,7 @@ static const struct ieee80211_ops ath10k_ops = {
+ .remain_on_channel = ath10k_remain_on_channel,
+ .cancel_remain_on_channel = ath10k_cancel_remain_on_channel,
+ .set_rts_threshold = ath10k_set_rts_threshold,
++ .set_frag_threshold = ath10k_mac_op_set_frag_threshold,
+ .flush = ath10k_flush,
+ .tx_last_beacon = ath10k_tx_last_beacon,
+ .set_antenna = ath10k_set_antenna,
+diff --git a/drivers/net/wireless/ath/ath10k/mac.h b/drivers/net/wireless/ath/ath10k/mac.h
+index b291f063705c..e3cefe4c7cfd 100644
+--- a/drivers/net/wireless/ath/ath10k/mac.h
++++ b/drivers/net/wireless/ath/ath10k/mac.h
+@@ -61,9 +61,9 @@ int ath10k_mac_vif_chan(struct ieee80211_vif *vif,
+
+ void ath10k_mac_handle_beacon(struct ath10k *ar, struct sk_buff *skb);
+ void ath10k_mac_handle_beacon_miss(struct ath10k *ar, u32 vdev_id);
+-void ath10k_mac_handle_tx_pause(struct ath10k *ar, u32 vdev_id,
+- enum wmi_tlv_tx_pause_id pause_id,
+- enum wmi_tlv_tx_pause_action action);
++void ath10k_mac_handle_tx_pause_vdev(struct ath10k *ar, u32 vdev_id,
++ enum wmi_tlv_tx_pause_id pause_id,
++ enum wmi_tlv_tx_pause_action action);
+
+ u8 ath10k_mac_hw_rate_to_idx(const struct ieee80211_supported_band *sband,
+ u8 hw_rate);
+diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
+index ea656e011a96..8c5cc1facc45 100644
+--- a/drivers/net/wireless/ath/ath10k/pci.c
++++ b/drivers/net/wireless/ath/ath10k/pci.c
+@@ -1546,8 +1546,10 @@ static int ath10k_pci_hif_exchange_bmi_msg(struct ath10k *ar,
+
+ req_paddr = dma_map_single(ar->dev, treq, req_len, DMA_TO_DEVICE);
+ ret = dma_mapping_error(ar->dev, req_paddr);
+- if (ret)
++ if (ret) {
++ ret = -EIO;
+ goto err_dma;
++ }
+
+ if (resp && resp_len) {
+ tresp = kzalloc(*resp_len, GFP_KERNEL);
+@@ -1559,8 +1561,10 @@ static int ath10k_pci_hif_exchange_bmi_msg(struct ath10k *ar,
+ resp_paddr = dma_map_single(ar->dev, tresp, *resp_len,
+ DMA_FROM_DEVICE);
+ ret = dma_mapping_error(ar->dev, resp_paddr);
+- if (ret)
++ if (ret) {
++ ret = -EIO;
+ goto err_req;
++ }
+
+ xfer.wait_for_resp = true;
+ xfer.resp_len = 0;
+diff --git a/drivers/net/wireless/ath/ath10k/wmi-tlv.c b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+index 8fdba3865c96..6f477e83099d 100644
+--- a/drivers/net/wireless/ath/ath10k/wmi-tlv.c
++++ b/drivers/net/wireless/ath/ath10k/wmi-tlv.c
+@@ -377,12 +377,34 @@ static int ath10k_wmi_tlv_event_tx_pause(struct ath10k *ar,
+ "wmi tlv tx pause pause_id %u action %u vdev_map 0x%08x peer_id %u tid_map 0x%08x\n",
+ pause_id, action, vdev_map, peer_id, tid_map);
+
+- for (vdev_id = 0; vdev_map; vdev_id++) {
+- if (!(vdev_map & BIT(vdev_id)))
+- continue;
+-
+- vdev_map &= ~BIT(vdev_id);
+- ath10k_mac_handle_tx_pause(ar, vdev_id, pause_id, action);
++ switch (pause_id) {
++ case WMI_TLV_TX_PAUSE_ID_MCC:
++ case WMI_TLV_TX_PAUSE_ID_P2P_CLI_NOA:
++ case WMI_TLV_TX_PAUSE_ID_P2P_GO_PS:
++ case WMI_TLV_TX_PAUSE_ID_AP_PS:
++ case WMI_TLV_TX_PAUSE_ID_IBSS_PS:
++ for (vdev_id = 0; vdev_map; vdev_id++) {
++ if (!(vdev_map & BIT(vdev_id)))
++ continue;
++
++ vdev_map &= ~BIT(vdev_id);
++ ath10k_mac_handle_tx_pause_vdev(ar, vdev_id, pause_id,
++ action);
++ }
++ break;
++ case WMI_TLV_TX_PAUSE_ID_AP_PEER_PS:
++ case WMI_TLV_TX_PAUSE_ID_AP_PEER_UAPSD:
++ case WMI_TLV_TX_PAUSE_ID_STA_ADD_BA:
++ case WMI_TLV_TX_PAUSE_ID_HOST:
++ ath10k_dbg(ar, ATH10K_DBG_MAC,
++ "mac ignoring unsupported tx pause id %d\n",
++ pause_id);
++ break;
++ default:
++ ath10k_dbg(ar, ATH10K_DBG_MAC,
++ "mac ignoring unknown tx pause id %d\n",
++ pause_id);
++ break;
+ }
+
+ kfree(tb);
+diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
+index 6c046c244705..8dd84c160cfd 100644
+--- a/drivers/net/wireless/ath/ath10k/wmi.c
++++ b/drivers/net/wireless/ath/ath10k/wmi.c
+@@ -2391,6 +2391,7 @@ void ath10k_wmi_event_host_swba(struct ath10k *ar, struct sk_buff *skb)
+ ath10k_warn(ar, "failed to map beacon: %d\n",
+ ret);
+ dev_kfree_skb_any(bcn);
++ ret = -EIO;
+ goto skip;
+ }
+
+diff --git a/drivers/net/wireless/rsi/rsi_91x_sdio_ops.c b/drivers/net/wireless/rsi/rsi_91x_sdio_ops.c
+index 1c6788aecc62..40d72312f3df 100644
+--- a/drivers/net/wireless/rsi/rsi_91x_sdio_ops.c
++++ b/drivers/net/wireless/rsi/rsi_91x_sdio_ops.c
+@@ -203,8 +203,10 @@ static int rsi_load_ta_instructions(struct rsi_common *common)
+
+ /* Copy firmware into DMA-accessible memory */
+ fw = kmemdup(fw_entry->data, fw_entry->size, GFP_KERNEL);
+- if (!fw)
+- return -ENOMEM;
++ if (!fw) {
++ status = -ENOMEM;
++ goto out;
++ }
+ len = fw_entry->size;
+
+ if (len % 4)
+@@ -217,6 +219,8 @@ static int rsi_load_ta_instructions(struct rsi_common *common)
+
+ status = rsi_copy_to_card(common, fw, len, num_blocks);
+ kfree(fw);
++
++out:
+ release_firmware(fw_entry);
+ return status;
+ }
+diff --git a/drivers/net/wireless/rsi/rsi_91x_usb_ops.c b/drivers/net/wireless/rsi/rsi_91x_usb_ops.c
+index 30c2cf7fa93b..de4900862836 100644
+--- a/drivers/net/wireless/rsi/rsi_91x_usb_ops.c
++++ b/drivers/net/wireless/rsi/rsi_91x_usb_ops.c
+@@ -148,8 +148,10 @@ static int rsi_load_ta_instructions(struct rsi_common *common)
+
+ /* Copy firmware into DMA-accessible memory */
+ fw = kmemdup(fw_entry->data, fw_entry->size, GFP_KERNEL);
+- if (!fw)
+- return -ENOMEM;
++ if (!fw) {
++ status = -ENOMEM;
++ goto out;
++ }
+ len = fw_entry->size;
+
+ if (len % 4)
+@@ -162,6 +164,8 @@ static int rsi_load_ta_instructions(struct rsi_common *common)
+
+ status = rsi_copy_to_card(common, fw, len, num_blocks);
+ kfree(fw);
++
++out:
+ release_firmware(fw_entry);
+ return status;
+ }
+diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
+index f948c46d5132..5ff0cfd142ee 100644
+--- a/drivers/net/xen-netfront.c
++++ b/drivers/net/xen-netfront.c
+@@ -1348,7 +1348,8 @@ static void xennet_disconnect_backend(struct netfront_info *info)
+ queue->tx_evtchn = queue->rx_evtchn = 0;
+ queue->tx_irq = queue->rx_irq = 0;
+
+- napi_synchronize(&queue->napi);
++ if (netif_running(info->netdev))
++ napi_synchronize(&queue->napi);
+
+ xennet_release_tx_bufs(queue);
+ xennet_release_rx_bufs(queue);
+diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
+index ade9eb917a4d..b796d1bd8988 100644
+--- a/drivers/nvdimm/pmem.c
++++ b/drivers/nvdimm/pmem.c
+@@ -86,6 +86,8 @@ static int pmem_rw_page(struct block_device *bdev, sector_t sector,
+ struct pmem_device *pmem = bdev->bd_disk->private_data;
+
+ pmem_do_bvec(pmem, page, PAGE_CACHE_SIZE, 0, rw, sector);
++ if (rw & WRITE)
++ wmb_pmem();
+ page_endio(page, rw & WRITE, 0);
+
+ return 0;
+diff --git a/drivers/pci/access.c b/drivers/pci/access.c
+index b965c12168b7..502a82ca1db0 100644
+--- a/drivers/pci/access.c
++++ b/drivers/pci/access.c
+@@ -442,7 +442,8 @@ static const struct pci_vpd_ops pci_vpd_pci22_ops = {
+ static ssize_t pci_vpd_f0_read(struct pci_dev *dev, loff_t pos, size_t count,
+ void *arg)
+ {
+- struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
++ struct pci_dev *tdev = pci_get_slot(dev->bus,
++ PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
+ ssize_t ret;
+
+ if (!tdev)
+@@ -456,7 +457,8 @@ static ssize_t pci_vpd_f0_read(struct pci_dev *dev, loff_t pos, size_t count,
+ static ssize_t pci_vpd_f0_write(struct pci_dev *dev, loff_t pos, size_t count,
+ const void *arg)
+ {
+- struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
++ struct pci_dev *tdev = pci_get_slot(dev->bus,
++ PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
+ ssize_t ret;
+
+ if (!tdev)
+@@ -473,22 +475,6 @@ static const struct pci_vpd_ops pci_vpd_f0_ops = {
+ .release = pci_vpd_pci22_release,
+ };
+
+-static int pci_vpd_f0_dev_check(struct pci_dev *dev)
+-{
+- struct pci_dev *tdev = pci_get_slot(dev->bus, PCI_SLOT(dev->devfn));
+- int ret = 0;
+-
+- if (!tdev)
+- return -ENODEV;
+- if (!tdev->vpd || !tdev->multifunction ||
+- dev->class != tdev->class || dev->vendor != tdev->vendor ||
+- dev->device != tdev->device)
+- ret = -ENODEV;
+-
+- pci_dev_put(tdev);
+- return ret;
+-}
+-
+ int pci_vpd_pci22_init(struct pci_dev *dev)
+ {
+ struct pci_vpd_pci22 *vpd;
+@@ -497,12 +483,7 @@ int pci_vpd_pci22_init(struct pci_dev *dev)
+ cap = pci_find_capability(dev, PCI_CAP_ID_VPD);
+ if (!cap)
+ return -ENODEV;
+- if (dev->dev_flags & PCI_DEV_FLAGS_VPD_REF_F0) {
+- int ret = pci_vpd_f0_dev_check(dev);
+
+- if (ret)
+- return ret;
+- }
+ vpd = kzalloc(sizeof(*vpd), GFP_ATOMIC);
+ if (!vpd)
+ return -ENOMEM;
+diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
+index 6fbd3f2b5992..d3346d23963b 100644
+--- a/drivers/pci/bus.c
++++ b/drivers/pci/bus.c
+@@ -256,6 +256,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx)
+
+ res->start = start;
+ res->end = end;
++ res->flags &= ~IORESOURCE_UNSET;
++ orig_res.flags &= ~IORESOURCE_UNSET;
+ dev_printk(KERN_DEBUG, &dev->dev, "%pR clipped to %pR\n",
+ &orig_res, res);
+
+diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
+index dbd13854f21e..6b1c6a915daa 100644
+--- a/drivers/pci/quirks.c
++++ b/drivers/pci/quirks.c
+@@ -1906,11 +1906,27 @@ static void quirk_netmos(struct pci_dev *dev)
+ DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_NETMOS, PCI_ANY_ID,
+ PCI_CLASS_COMMUNICATION_SERIAL, 8, quirk_netmos);
+
++/*
++ * Quirk non-zero PCI functions to route VPD access through function 0 for
++ * devices that share VPD resources between functions. The functions are
++ * expected to be identical devices.
++ */
+ static void quirk_f0_vpd_link(struct pci_dev *dev)
+ {
+- if (!dev->multifunction || !PCI_FUNC(dev->devfn))
++ struct pci_dev *f0;
++
++ if (!PCI_FUNC(dev->devfn))
+ return;
+- dev->dev_flags |= PCI_DEV_FLAGS_VPD_REF_F0;
++
++ f0 = pci_get_slot(dev->bus, PCI_DEVFN(PCI_SLOT(dev->devfn), 0));
++ if (!f0)
++ return;
++
++ if (f0->vpd && dev->class == f0->class &&
++ dev->vendor == f0->vendor && dev->device == f0->device)
++ dev->dev_flags |= PCI_DEV_FLAGS_VPD_REF_F0;
++
++ pci_dev_put(f0);
+ }
+ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
+ PCI_CLASS_NETWORK_ETHERNET, 8, quirk_f0_vpd_link);
+diff --git a/drivers/pcmcia/sa1100_generic.c b/drivers/pcmcia/sa1100_generic.c
+index 803945259da8..42861cc70158 100644
+--- a/drivers/pcmcia/sa1100_generic.c
++++ b/drivers/pcmcia/sa1100_generic.c
+@@ -93,7 +93,6 @@ static int sa11x0_drv_pcmcia_remove(struct platform_device *dev)
+ for (i = 0; i < sinfo->nskt; i++)
+ soc_pcmcia_remove_one(&sinfo->skt[i]);
+
+- clk_put(sinfo->clk);
+ kfree(sinfo);
+ return 0;
+ }
+diff --git a/drivers/pcmcia/sa11xx_base.c b/drivers/pcmcia/sa11xx_base.c
+index cf6de2c2b329..553d70a67f80 100644
+--- a/drivers/pcmcia/sa11xx_base.c
++++ b/drivers/pcmcia/sa11xx_base.c
+@@ -222,7 +222,7 @@ int sa11xx_drv_pcmcia_probe(struct device *dev, struct pcmcia_low_level *ops,
+ int i, ret = 0;
+ struct clk *clk;
+
+- clk = clk_get(dev, NULL);
++ clk = devm_clk_get(dev, NULL);
+ if (IS_ERR(clk))
+ return PTR_ERR(clk);
+
+@@ -251,7 +251,6 @@ int sa11xx_drv_pcmcia_probe(struct device *dev, struct pcmcia_low_level *ops,
+ if (ret) {
+ while (--i >= 0)
+ soc_pcmcia_remove_one(&sinfo->skt[i]);
+- clk_put(clk);
+ kfree(sinfo);
+ } else {
+ dev_set_drvdata(dev, sinfo);
+diff --git a/drivers/platform/x86/toshiba_acpi.c b/drivers/platform/x86/toshiba_acpi.c
+index 3ad7b1fa24ce..6f4f310de946 100644
+--- a/drivers/platform/x86/toshiba_acpi.c
++++ b/drivers/platform/x86/toshiba_acpi.c
+@@ -2408,11 +2408,9 @@ static int toshiba_acpi_setup_keyboard(struct toshiba_acpi_dev *dev)
+ if (error)
+ return error;
+
+- error = toshiba_hotkey_event_type_get(dev, &events_type);
+- if (error) {
+- pr_err("Unable to query Hotkey Event Type\n");
+- return error;
+- }
++ if (toshiba_hotkey_event_type_get(dev, &events_type))
++ pr_notice("Unable to query Hotkey Event Type\n");
++
+ dev->hotkey_event_type = events_type;
+
+ dev->hotkey_dev = input_allocate_device();
+diff --git a/drivers/power/avs/Kconfig b/drivers/power/avs/Kconfig
+index 7f3d389bd601..a67eeace6a89 100644
+--- a/drivers/power/avs/Kconfig
++++ b/drivers/power/avs/Kconfig
+@@ -13,7 +13,7 @@ menuconfig POWER_AVS
+
+ config ROCKCHIP_IODOMAIN
+ tristate "Rockchip IO domain support"
+- depends on ARCH_ROCKCHIP && OF
++ depends on POWER_AVS && ARCH_ROCKCHIP && OF
+ help
+ Say y here to enable support io domains on Rockchip SoCs. It is
+ necessary for the io domain setting of the SoC to match the
+diff --git a/drivers/regulator/axp20x-regulator.c b/drivers/regulator/axp20x-regulator.c
+index 646829132b59..1dea0e8353e0 100644
+--- a/drivers/regulator/axp20x-regulator.c
++++ b/drivers/regulator/axp20x-regulator.c
+@@ -192,9 +192,9 @@ static const struct regulator_desc axp22x_regulators[] = {
+ AXP_DESC(AXP22X, DCDC3, "dcdc3", "vin3", 600, 1860, 20,
+ AXP22X_DCDC3_V_OUT, 0x3f, AXP22X_PWR_OUT_CTRL1, BIT(3)),
+ AXP_DESC(AXP22X, DCDC4, "dcdc4", "vin4", 600, 1540, 20,
+- AXP22X_DCDC4_V_OUT, 0x3f, AXP22X_PWR_OUT_CTRL1, BIT(3)),
++ AXP22X_DCDC4_V_OUT, 0x3f, AXP22X_PWR_OUT_CTRL1, BIT(4)),
+ AXP_DESC(AXP22X, DCDC5, "dcdc5", "vin5", 1000, 2550, 50,
+- AXP22X_DCDC5_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL1, BIT(4)),
++ AXP22X_DCDC5_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL1, BIT(5)),
+ /* secondary switchable output of DCDC1 */
+ AXP_DESC_SW(AXP22X, DC1SW, "dc1sw", "dcdc1", 1600, 3400, 100,
+ AXP22X_DCDC1_V_OUT, 0x1f, AXP22X_PWR_OUT_CTRL2, BIT(7)),
+diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
+index 78387a6cbae5..5081533858f1 100644
+--- a/drivers/regulator/core.c
++++ b/drivers/regulator/core.c
+@@ -1376,15 +1376,19 @@ static int regulator_resolve_supply(struct regulator_dev *rdev)
+ return 0;
+
+ r = regulator_dev_lookup(dev, rdev->supply_name, &ret);
+- if (ret == -ENODEV) {
+- /*
+- * No supply was specified for this regulator and
+- * there will never be one.
+- */
+- return 0;
+- }
+-
+ if (!r) {
++ if (ret == -ENODEV) {
++ /*
++ * No supply was specified for this regulator and
++ * there will never be one.
++ */
++ return 0;
++ }
++
++ /* Did the lookup explicitly defer for us? */
++ if (ret == -EPROBE_DEFER)
++ return ret;
++
+ if (have_full_constraints()) {
+ r = dummy_regulator_rdev;
+ } else {
+diff --git a/drivers/scsi/3w-9xxx.c b/drivers/scsi/3w-9xxx.c
+index add419d6ff34..a56a7b243e91 100644
+--- a/drivers/scsi/3w-9xxx.c
++++ b/drivers/scsi/3w-9xxx.c
+@@ -212,6 +212,17 @@ static const struct file_operations twa_fops = {
+ .llseek = noop_llseek,
+ };
+
++/*
++ * The controllers use an inline buffer instead of a mapped SGL for small,
++ * single entry buffers. Note that we treat a zero-length transfer like
++ * a mapped SGL.
++ */
++static bool twa_command_mapped(struct scsi_cmnd *cmd)
++{
++ return scsi_sg_count(cmd) != 1 ||
++ scsi_bufflen(cmd) >= TW_MIN_SGL_LENGTH;
++}
++
+ /* This function will complete an aen request from the isr */
+ static int twa_aen_complete(TW_Device_Extension *tw_dev, int request_id)
+ {
+@@ -1339,7 +1350,8 @@ static irqreturn_t twa_interrupt(int irq, void *dev_instance)
+ }
+
+ /* Now complete the io */
+- scsi_dma_unmap(cmd);
++ if (twa_command_mapped(cmd))
++ scsi_dma_unmap(cmd);
+ cmd->scsi_done(cmd);
+ tw_dev->state[request_id] = TW_S_COMPLETED;
+ twa_free_request_id(tw_dev, request_id);
+@@ -1582,7 +1594,8 @@ static int twa_reset_device_extension(TW_Device_Extension *tw_dev)
+ struct scsi_cmnd *cmd = tw_dev->srb[i];
+
+ cmd->result = (DID_RESET << 16);
+- scsi_dma_unmap(cmd);
++ if (twa_command_mapped(cmd))
++ scsi_dma_unmap(cmd);
+ cmd->scsi_done(cmd);
+ }
+ }
+@@ -1765,12 +1778,14 @@ static int twa_scsi_queue_lck(struct scsi_cmnd *SCpnt, void (*done)(struct scsi_
+ retval = twa_scsiop_execute_scsi(tw_dev, request_id, NULL, 0, NULL);
+ switch (retval) {
+ case SCSI_MLQUEUE_HOST_BUSY:
+- scsi_dma_unmap(SCpnt);
++ if (twa_command_mapped(SCpnt))
++ scsi_dma_unmap(SCpnt);
+ twa_free_request_id(tw_dev, request_id);
+ break;
+ case 1:
+ SCpnt->result = (DID_ERROR << 16);
+- scsi_dma_unmap(SCpnt);
++ if (twa_command_mapped(SCpnt))
++ scsi_dma_unmap(SCpnt);
+ done(SCpnt);
+ tw_dev->state[request_id] = TW_S_COMPLETED;
+ twa_free_request_id(tw_dev, request_id);
+@@ -1831,8 +1846,7 @@ static int twa_scsiop_execute_scsi(TW_Device_Extension *tw_dev, int request_id,
+ /* Map sglist from scsi layer to cmd packet */
+
+ if (scsi_sg_count(srb)) {
+- if ((scsi_sg_count(srb) == 1) &&
+- (scsi_bufflen(srb) < TW_MIN_SGL_LENGTH)) {
++ if (!twa_command_mapped(srb)) {
+ if (srb->sc_data_direction == DMA_TO_DEVICE ||
+ srb->sc_data_direction == DMA_BIDIRECTIONAL)
+ scsi_sg_copy_to_buffer(srb,
+@@ -1905,7 +1919,7 @@ static void twa_scsiop_execute_scsi_complete(TW_Device_Extension *tw_dev, int re
+ {
+ struct scsi_cmnd *cmd = tw_dev->srb[request_id];
+
+- if (scsi_bufflen(cmd) < TW_MIN_SGL_LENGTH &&
++ if (!twa_command_mapped(cmd) &&
+ (cmd->sc_data_direction == DMA_FROM_DEVICE ||
+ cmd->sc_data_direction == DMA_BIDIRECTIONAL)) {
+ if (scsi_sg_count(cmd) == 1) {
+diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
+index 1dafeb43333b..cab4e98b2b0e 100644
+--- a/drivers/scsi/hpsa.c
++++ b/drivers/scsi/hpsa.c
+@@ -5104,7 +5104,7 @@ static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd)
+ int rc;
+ struct ctlr_info *h;
+ struct hpsa_scsi_dev_t *dev;
+- char msg[40];
++ char msg[48];
+
+ /* find the controller to which the command to be aborted was sent */
+ h = sdev_to_hba(scsicmd->device);
+@@ -5122,16 +5122,18 @@ static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd)
+
+ /* if controller locked up, we can guarantee command won't complete */
+ if (lockup_detected(h)) {
+- sprintf(msg, "cmd %d RESET FAILED, lockup detected",
+- hpsa_get_cmd_index(scsicmd));
++ snprintf(msg, sizeof(msg),
++ "cmd %d RESET FAILED, lockup detected",
++ hpsa_get_cmd_index(scsicmd));
+ hpsa_show_dev_msg(KERN_WARNING, h, dev, msg);
+ return FAILED;
+ }
+
+ /* this reset request might be the result of a lockup; check */
+ if (detect_controller_lockup(h)) {
+- sprintf(msg, "cmd %d RESET FAILED, new lockup detected",
+- hpsa_get_cmd_index(scsicmd));
++ snprintf(msg, sizeof(msg),
++ "cmd %d RESET FAILED, new lockup detected",
++ hpsa_get_cmd_index(scsicmd));
+ hpsa_show_dev_msg(KERN_WARNING, h, dev, msg);
+ return FAILED;
+ }
+@@ -5145,7 +5147,8 @@ static int hpsa_eh_device_reset_handler(struct scsi_cmnd *scsicmd)
+ /* send a reset to the SCSI LUN which the command was sent to */
+ rc = hpsa_do_reset(h, dev, dev->scsi3addr, HPSA_RESET_TYPE_LUN,
+ DEFAULT_REPLY_QUEUE);
+- sprintf(msg, "reset %s", rc == 0 ? "completed successfully" : "failed");
++ snprintf(msg, sizeof(msg), "reset %s",
++ rc == 0 ? "completed successfully" : "failed");
+ hpsa_show_dev_msg(KERN_WARNING, h, dev, msg);
+ return rc == 0 ? SUCCESS : FAILED;
+ }
+diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c
+index a9aa38903efe..cccab6188328 100644
+--- a/drivers/scsi/ipr.c
++++ b/drivers/scsi/ipr.c
+@@ -4554,7 +4554,7 @@ static ssize_t ipr_store_raw_mode(struct device *dev,
+ spin_lock_irqsave(ioa_cfg->host->host_lock, lock_flags);
+ res = (struct ipr_resource_entry *)sdev->hostdata;
+ if (res) {
+- if (ioa_cfg->sis64 && ipr_is_af_dasd_device(res)) {
++ if (ipr_is_af_dasd_device(res)) {
+ res->raw_mode = simple_strtoul(buf, NULL, 10);
+ len = strlen(buf);
+ if (res->sdev)
+diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
+index 6457a8a0db9c..bf3d801ac5f9 100644
+--- a/drivers/scsi/scsi_error.c
++++ b/drivers/scsi/scsi_error.c
+@@ -2169,8 +2169,17 @@ int scsi_error_handler(void *data)
+ * We never actually get interrupted because kthread_run
+ * disables signal delivery for the created thread.
+ */
+- while (!kthread_should_stop()) {
++ while (true) {
++ /*
++ * The sequence in kthread_stop() sets the stop flag first
++ * then wakes the process. To avoid missed wakeups, the task
++ * should always be in a non-running state before the stop
++ * flag is checked.
++ */
+ set_current_state(TASK_INTERRUPTIBLE);
++ if (kthread_should_stop())
++ break;
++
+ if ((shost->host_failed == 0 && shost->host_eh_scheduled == 0) ||
+ shost->host_failed != atomic_read(&shost->host_busy)) {
+ SCSI_LOG_ERROR_RECOVERY(1,
+diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
+index c9357bb393d3..744596464d33 100644
+--- a/drivers/spi/spi-bcm2835.c
++++ b/drivers/spi/spi-bcm2835.c
+@@ -386,14 +386,14 @@ static bool bcm2835_spi_can_dma(struct spi_master *master,
+ /* otherwise we only allow transfers within the same page
+ * to avoid wasting time on dma_mapping when it is not practical
+ */
+- if (((size_t)tfr->tx_buf & PAGE_MASK) + tfr->len > PAGE_SIZE) {
++ if (((size_t)tfr->tx_buf & (PAGE_SIZE - 1)) + tfr->len > PAGE_SIZE) {
+ dev_warn_once(&spi->dev,
+ "Unaligned spi tx-transfer bridging page\n");
+ return false;
+ }
+- if (((size_t)tfr->rx_buf & PAGE_MASK) + tfr->len > PAGE_SIZE) {
++ if (((size_t)tfr->rx_buf & (PAGE_SIZE - 1)) + tfr->len > PAGE_SIZE) {
+ dev_warn_once(&spi->dev,
+- "Unaligned spi tx-transfer bridging page\n");
++ "Unaligned spi rx-transfer bridging page\n");
+ return false;
+ }
+
+diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c
+index 7293d6d875c5..8e4b1a7c37ce 100644
+--- a/drivers/spi/spi-pxa2xx.c
++++ b/drivers/spi/spi-pxa2xx.c
+@@ -643,6 +643,10 @@ static irqreturn_t ssp_int(int irq, void *dev_id)
+ if (!(sccr1_reg & SSCR1_TIE))
+ mask &= ~SSSR_TFS;
+
++ /* Ignore RX timeout interrupt if it is disabled */
++ if (!(sccr1_reg & SSCR1_TINTE))
++ mask &= ~SSSR_TINT;
++
+ if (!(status & mask))
+ return IRQ_NONE;
+
+diff --git a/drivers/spi/spi-xtensa-xtfpga.c b/drivers/spi/spi-xtensa-xtfpga.c
+index 2e32ea2f194f..be6155cba9de 100644
+--- a/drivers/spi/spi-xtensa-xtfpga.c
++++ b/drivers/spi/spi-xtensa-xtfpga.c
+@@ -34,13 +34,13 @@ struct xtfpga_spi {
+ static inline void xtfpga_spi_write32(const struct xtfpga_spi *spi,
+ unsigned addr, u32 val)
+ {
+- iowrite32(val, spi->regs + addr);
++ __raw_writel(val, spi->regs + addr);
+ }
+
+ static inline unsigned int xtfpga_spi_read32(const struct xtfpga_spi *spi,
+ unsigned addr)
+ {
+- return ioread32(spi->regs + addr);
++ return __raw_readl(spi->regs + addr);
+ }
+
+ static inline void xtfpga_spi_wait_busy(struct xtfpga_spi *xspi)
+diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
+index cf8b91b23a76..9ce2f156d382 100644
+--- a/drivers/spi/spi.c
++++ b/drivers/spi/spi.c
+@@ -1437,8 +1437,7 @@ static struct class spi_master_class = {
+ *
+ * The caller is responsible for assigning the bus number and initializing
+ * the master's methods before calling spi_register_master(); and (after errors
+- * adding the device) calling spi_master_put() and kfree() to prevent a memory
+- * leak.
++ * adding the device) calling spi_master_put() to prevent a memory leak.
+ */
+ struct spi_master *spi_alloc_master(struct device *dev, unsigned size)
+ {
+diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
+index c7de64171c45..97aad8f91c2f 100644
+--- a/drivers/spi/spidev.c
++++ b/drivers/spi/spidev.c
+@@ -651,7 +651,8 @@ static int spidev_release(struct inode *inode, struct file *filp)
+ kfree(spidev->rx_buffer);
+ spidev->rx_buffer = NULL;
+
+- spidev->speed_hz = spidev->spi->max_speed_hz;
++ if (spidev->spi)
++ spidev->speed_hz = spidev->spi->max_speed_hz;
+
+ /* ... after we unbound from the underlying device? */
+ spin_lock_irq(&spidev->spi_lock);
+diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
+index 6f4811263557..b71b1f2d98d5 100644
+--- a/drivers/staging/android/ion/ion.c
++++ b/drivers/staging/android/ion/ion.c
+@@ -1179,13 +1179,13 @@ struct ion_handle *ion_import_dma_buf(struct ion_client *client, int fd)
+ mutex_unlock(&client->lock);
+ goto end;
+ }
+- mutex_unlock(&client->lock);
+
+ handle = ion_handle_create(client, buffer);
+- if (IS_ERR(handle))
++ if (IS_ERR(handle)) {
++ mutex_unlock(&client->lock);
+ goto end;
++ }
+
+- mutex_lock(&client->lock);
+ ret = ion_handle_add(client, handle);
+ mutex_unlock(&client->lock);
+ if (ret) {
+diff --git a/drivers/staging/speakup/fakekey.c b/drivers/staging/speakup/fakekey.c
+index 4299cf45f947..5e1f16c36b49 100644
+--- a/drivers/staging/speakup/fakekey.c
++++ b/drivers/staging/speakup/fakekey.c
+@@ -81,6 +81,7 @@ void speakup_fake_down_arrow(void)
+ __this_cpu_write(reporting_keystroke, true);
+ input_report_key(virt_keyboard, KEY_DOWN, PRESSED);
+ input_report_key(virt_keyboard, KEY_DOWN, RELEASED);
++ input_sync(virt_keyboard);
+ __this_cpu_write(reporting_keystroke, false);
+
+ /* reenable preemption */
+diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
+index fd092909a457..56cf1996f30f 100644
+--- a/drivers/target/iscsi/iscsi_target.c
++++ b/drivers/target/iscsi/iscsi_target.c
+@@ -341,7 +341,6 @@ static struct iscsi_np *iscsit_get_np(
+
+ struct iscsi_np *iscsit_add_np(
+ struct __kernel_sockaddr_storage *sockaddr,
+- char *ip_str,
+ int network_transport)
+ {
+ struct sockaddr_in *sock_in;
+@@ -370,11 +369,9 @@ struct iscsi_np *iscsit_add_np(
+ np->np_flags |= NPF_IP_NETWORK;
+ if (sockaddr->ss_family == AF_INET6) {
+ sock_in6 = (struct sockaddr_in6 *)sockaddr;
+- snprintf(np->np_ip, IPV6_ADDRESS_SPACE, "%s", ip_str);
+ np->np_port = ntohs(sock_in6->sin6_port);
+ } else {
+ sock_in = (struct sockaddr_in *)sockaddr;
+- sprintf(np->np_ip, "%s", ip_str);
+ np->np_port = ntohs(sock_in->sin_port);
+ }
+
+@@ -411,8 +408,8 @@ struct iscsi_np *iscsit_add_np(
+ list_add_tail(&np->np_list, &g_np_list);
+ mutex_unlock(&np_lock);
+
+- pr_debug("CORE[0] - Added Network Portal: %s:%hu on %s\n",
+- np->np_ip, np->np_port, np->np_transport->name);
++ pr_debug("CORE[0] - Added Network Portal: %pISc:%hu on %s\n",
++ &np->np_sockaddr, np->np_port, np->np_transport->name);
+
+ return np;
+ }
+@@ -481,8 +478,8 @@ int iscsit_del_np(struct iscsi_np *np)
+ list_del(&np->np_list);
+ mutex_unlock(&np_lock);
+
+- pr_debug("CORE[0] - Removed Network Portal: %s:%hu on %s\n",
+- np->np_ip, np->np_port, np->np_transport->name);
++ pr_debug("CORE[0] - Removed Network Portal: %pISc:%hu on %s\n",
++ &np->np_sockaddr, np->np_port, np->np_transport->name);
+
+ iscsit_put_transport(np->np_transport);
+ kfree(np);
+@@ -3464,7 +3461,6 @@ iscsit_build_sendtargets_response(struct iscsi_cmd *cmd,
+ tpg_np_list) {
+ struct iscsi_np *np = tpg_np->tpg_np;
+ bool inaddr_any = iscsit_check_inaddr_any(np);
+- char *fmt_str;
+
+ if (np->np_network_transport != network_transport)
+ continue;
+@@ -3492,15 +3488,18 @@ iscsit_build_sendtargets_response(struct iscsi_cmd *cmd,
+ }
+ }
+
+- if (np->np_sockaddr.ss_family == AF_INET6)
+- fmt_str = "TargetAddress=[%s]:%hu,%hu";
+- else
+- fmt_str = "TargetAddress=%s:%hu,%hu";
+-
+- len = sprintf(buf, fmt_str,
+- inaddr_any ? conn->local_ip : np->np_ip,
+- np->np_port,
+- tpg->tpgt);
++ if (inaddr_any) {
++ len = sprintf(buf, "TargetAddress="
++ "%s:%hu,%hu",
++ conn->local_ip,
++ np->np_port,
++ tpg->tpgt);
++ } else {
++ len = sprintf(buf, "TargetAddress="
++ "%pISpc,%hu",
++ &np->np_sockaddr,
++ tpg->tpgt);
++ }
+ len += 1;
+
+ if ((len + payload_len) > buffer_len) {
+diff --git a/drivers/target/iscsi/iscsi_target.h b/drivers/target/iscsi/iscsi_target.h
+index 7d0f9c00d9c2..d294f030a097 100644
+--- a/drivers/target/iscsi/iscsi_target.h
++++ b/drivers/target/iscsi/iscsi_target.h
+@@ -13,7 +13,7 @@ extern int iscsit_deaccess_np(struct iscsi_np *, struct iscsi_portal_group *,
+ extern bool iscsit_check_np_match(struct __kernel_sockaddr_storage *,
+ struct iscsi_np *, int);
+ extern struct iscsi_np *iscsit_add_np(struct __kernel_sockaddr_storage *,
+- char *, int);
++ int);
+ extern int iscsit_reset_np_thread(struct iscsi_np *, struct iscsi_tpg_np *,
+ struct iscsi_portal_group *, bool);
+ extern int iscsit_del_np(struct iscsi_np *);
+diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c
+index c1898c84b3d2..db3b9b986954 100644
+--- a/drivers/target/iscsi/iscsi_target_configfs.c
++++ b/drivers/target/iscsi/iscsi_target_configfs.c
+@@ -99,7 +99,7 @@ static ssize_t lio_target_np_store_sctp(
+ * Use existing np->np_sockaddr for SCTP network portal reference
+ */
+ tpg_np_sctp = iscsit_tpg_add_network_portal(tpg, &np->np_sockaddr,
+- np->np_ip, tpg_np, ISCSI_SCTP_TCP);
++ tpg_np, ISCSI_SCTP_TCP);
+ if (!tpg_np_sctp || IS_ERR(tpg_np_sctp))
+ goto out;
+ } else {
+@@ -177,7 +177,7 @@ static ssize_t lio_target_np_store_iser(
+ }
+
+ tpg_np_iser = iscsit_tpg_add_network_portal(tpg, &np->np_sockaddr,
+- np->np_ip, tpg_np, ISCSI_INFINIBAND);
++ tpg_np, ISCSI_INFINIBAND);
+ if (IS_ERR(tpg_np_iser)) {
+ rc = PTR_ERR(tpg_np_iser);
+ goto out;
+@@ -248,8 +248,8 @@ static struct se_tpg_np *lio_target_call_addnptotpg(
+ return ERR_PTR(-EINVAL);
+ }
+ str++; /* Skip over leading "[" */
+- *str2 = '\0'; /* Terminate the IPv6 address */
+- str2++; /* Skip over the "]" */
++ *str2 = '\0'; /* Terminate the unbracketed IPv6 address */
++ str2++; /* Skip over the \0 */
+ port_str = strstr(str2, ":");
+ if (!port_str) {
+ pr_err("Unable to locate \":port\""
+@@ -316,7 +316,7 @@ static struct se_tpg_np *lio_target_call_addnptotpg(
+ * sys/kernel/config/iscsi/$IQN/$TPG/np/$IP:$PORT/
+ *
+ */
+- tpg_np = iscsit_tpg_add_network_portal(tpg, &sockaddr, str, NULL,
++ tpg_np = iscsit_tpg_add_network_portal(tpg, &sockaddr, NULL,
+ ISCSI_TCP);
+ if (IS_ERR(tpg_np)) {
+ iscsit_put_tpg(tpg);
+@@ -344,8 +344,8 @@ static void lio_target_call_delnpfromtpg(
+
+ se_tpg = &tpg->tpg_se_tpg;
+ pr_debug("LIO_Target_ConfigFS: DEREGISTER -> %s TPGT: %hu"
+- " PORTAL: %s:%hu\n", config_item_name(&se_tpg->se_tpg_wwn->wwn_group.cg_item),
+- tpg->tpgt, tpg_np->tpg_np->np_ip, tpg_np->tpg_np->np_port);
++ " PORTAL: %pISc:%hu\n", config_item_name(&se_tpg->se_tpg_wwn->wwn_group.cg_item),
++ tpg->tpgt, &tpg_np->tpg_np->np_sockaddr, tpg_np->tpg_np->np_port);
+
+ ret = iscsit_tpg_del_network_portal(tpg, tpg_np);
+ if (ret < 0)
+diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
+index 7e8f65e5448f..666c0739bfbe 100644
+--- a/drivers/target/iscsi/iscsi_target_login.c
++++ b/drivers/target/iscsi/iscsi_target_login.c
+@@ -823,8 +823,8 @@ static void iscsi_handle_login_thread_timeout(unsigned long data)
+ struct iscsi_np *np = (struct iscsi_np *) data;
+
+ spin_lock_bh(&np->np_thread_lock);
+- pr_err("iSCSI Login timeout on Network Portal %s:%hu\n",
+- np->np_ip, np->np_port);
++ pr_err("iSCSI Login timeout on Network Portal %pISc:%hu\n",
++ &np->np_sockaddr, np->np_port);
+
+ if (np->np_login_timer_flags & ISCSI_TF_STOP) {
+ spin_unlock_bh(&np->np_thread_lock);
+@@ -1302,8 +1302,8 @@ static int __iscsi_target_login_thread(struct iscsi_np *np)
+ spin_lock_bh(&np->np_thread_lock);
+ if (np->np_thread_state != ISCSI_NP_THREAD_ACTIVE) {
+ spin_unlock_bh(&np->np_thread_lock);
+- pr_err("iSCSI Network Portal on %s:%hu currently not"
+- " active.\n", np->np_ip, np->np_port);
++ pr_err("iSCSI Network Portal on %pISc:%hu currently not"
++ " active.\n", &np->np_sockaddr, np->np_port);
+ iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR,
+ ISCSI_LOGIN_STATUS_SVC_UNAVAILABLE);
+ goto new_sess_out;
+diff --git a/drivers/target/iscsi/iscsi_target_parameters.c b/drivers/target/iscsi/iscsi_target_parameters.c
+index e8a52f7d6204..51d1734d5390 100644
+--- a/drivers/target/iscsi/iscsi_target_parameters.c
++++ b/drivers/target/iscsi/iscsi_target_parameters.c
+@@ -407,6 +407,7 @@ int iscsi_create_default_params(struct iscsi_param_list **param_list_ptr)
+ TYPERANGE_UTF8, USE_INITIAL_ONLY);
+ if (!param)
+ goto out;
++
+ /*
+ * Extra parameters for ISER from RFC-5046
+ */
+@@ -496,9 +497,9 @@ int iscsi_set_keys_to_negotiate(
+ } else if (!strcmp(param->name, SESSIONTYPE)) {
+ SET_PSTATE_NEGOTIATE(param);
+ } else if (!strcmp(param->name, IFMARKER)) {
+- SET_PSTATE_NEGOTIATE(param);
++ SET_PSTATE_REJECT(param);
+ } else if (!strcmp(param->name, OFMARKER)) {
+- SET_PSTATE_NEGOTIATE(param);
++ SET_PSTATE_REJECT(param);
+ } else if (!strcmp(param->name, IFMARKINT)) {
+ SET_PSTATE_REJECT(param);
+ } else if (!strcmp(param->name, OFMARKINT)) {
+diff --git a/drivers/target/iscsi/iscsi_target_tpg.c b/drivers/target/iscsi/iscsi_target_tpg.c
+index 968068ffcb1c..de26bee4bddd 100644
+--- a/drivers/target/iscsi/iscsi_target_tpg.c
++++ b/drivers/target/iscsi/iscsi_target_tpg.c
+@@ -460,7 +460,6 @@ static bool iscsit_tpg_check_network_portal(
+ struct iscsi_tpg_np *iscsit_tpg_add_network_portal(
+ struct iscsi_portal_group *tpg,
+ struct __kernel_sockaddr_storage *sockaddr,
+- char *ip_str,
+ struct iscsi_tpg_np *tpg_np_parent,
+ int network_transport)
+ {
+@@ -470,8 +469,8 @@ struct iscsi_tpg_np *iscsit_tpg_add_network_portal(
+ if (!tpg_np_parent) {
+ if (iscsit_tpg_check_network_portal(tpg->tpg_tiqn, sockaddr,
+ network_transport)) {
+- pr_err("Network Portal: %s already exists on a"
+- " different TPG on %s\n", ip_str,
++ pr_err("Network Portal: %pISc already exists on a"
++ " different TPG on %s\n", sockaddr,
+ tpg->tpg_tiqn->tiqn);
+ return ERR_PTR(-EEXIST);
+ }
+@@ -484,7 +483,7 @@ struct iscsi_tpg_np *iscsit_tpg_add_network_portal(
+ return ERR_PTR(-ENOMEM);
+ }
+
+- np = iscsit_add_np(sockaddr, ip_str, network_transport);
++ np = iscsit_add_np(sockaddr, network_transport);
+ if (IS_ERR(np)) {
+ kfree(tpg_np);
+ return ERR_CAST(np);
+@@ -514,8 +513,8 @@ struct iscsi_tpg_np *iscsit_tpg_add_network_portal(
+ spin_unlock(&tpg_np_parent->tpg_np_parent_lock);
+ }
+
+- pr_debug("CORE[%s] - Added Network Portal: %s:%hu,%hu on %s\n",
+- tpg->tpg_tiqn->tiqn, np->np_ip, np->np_port, tpg->tpgt,
++ pr_debug("CORE[%s] - Added Network Portal: %pISc:%hu,%hu on %s\n",
++ tpg->tpg_tiqn->tiqn, &np->np_sockaddr, np->np_port, tpg->tpgt,
+ np->np_transport->name);
+
+ return tpg_np;
+@@ -528,8 +527,8 @@ static int iscsit_tpg_release_np(
+ {
+ iscsit_clear_tpg_np_login_thread(tpg_np, tpg, true);
+
+- pr_debug("CORE[%s] - Removed Network Portal: %s:%hu,%hu on %s\n",
+- tpg->tpg_tiqn->tiqn, np->np_ip, np->np_port, tpg->tpgt,
++ pr_debug("CORE[%s] - Removed Network Portal: %pISc:%hu,%hu on %s\n",
++ tpg->tpg_tiqn->tiqn, &np->np_sockaddr, np->np_port, tpg->tpgt,
+ np->np_transport->name);
+
+ tpg_np->tpg_np = NULL;
+diff --git a/drivers/target/iscsi/iscsi_target_tpg.h b/drivers/target/iscsi/iscsi_target_tpg.h
+index 95ff5bdecd71..28abda89ea98 100644
+--- a/drivers/target/iscsi/iscsi_target_tpg.h
++++ b/drivers/target/iscsi/iscsi_target_tpg.h
+@@ -22,7 +22,7 @@ extern struct iscsi_node_attrib *iscsit_tpg_get_node_attrib(struct iscsi_session
+ extern void iscsit_tpg_del_external_nps(struct iscsi_tpg_np *);
+ extern struct iscsi_tpg_np *iscsit_tpg_locate_child_np(struct iscsi_tpg_np *, int);
+ extern struct iscsi_tpg_np *iscsit_tpg_add_network_portal(struct iscsi_portal_group *,
+- struct __kernel_sockaddr_storage *, char *, struct iscsi_tpg_np *,
++ struct __kernel_sockaddr_storage *, struct iscsi_tpg_np *,
+ int);
+ extern int iscsit_tpg_del_network_portal(struct iscsi_portal_group *,
+ struct iscsi_tpg_np *);
+diff --git a/drivers/target/target_core_device.c b/drivers/target/target_core_device.c
+index 09e682b1c549..8f1cd194f06a 100644
+--- a/drivers/target/target_core_device.c
++++ b/drivers/target/target_core_device.c
+@@ -427,8 +427,6 @@ void core_disable_device_list_for_node(
+
+ hlist_del_rcu(&orig->link);
+ clear_bit(DEF_PR_REG_ACTIVE, &orig->deve_flags);
+- rcu_assign_pointer(orig->se_lun, NULL);
+- rcu_assign_pointer(orig->se_lun_acl, NULL);
+ orig->lun_flags = 0;
+ orig->creation_time = 0;
+ orig->attach_count--;
+@@ -439,6 +437,9 @@ void core_disable_device_list_for_node(
+ kref_put(&orig->pr_kref, target_pr_kref_release);
+ wait_for_completion(&orig->pr_comp);
+
++ rcu_assign_pointer(orig->se_lun, NULL);
++ rcu_assign_pointer(orig->se_lun_acl, NULL);
++
+ kfree_rcu(orig, rcu_head);
+
+ core_scsi3_free_pr_reg_from_nacl(dev, nacl);
+diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_pr.c
+index 5ab7100de17e..e7933115087a 100644
+--- a/drivers/target/target_core_pr.c
++++ b/drivers/target/target_core_pr.c
+@@ -618,7 +618,7 @@ static struct t10_pr_registration *__core_scsi3_do_alloc_registration(
+ struct se_device *dev,
+ struct se_node_acl *nacl,
+ struct se_lun *lun,
+- struct se_dev_entry *deve,
++ struct se_dev_entry *dest_deve,
+ u64 mapped_lun,
+ unsigned char *isid,
+ u64 sa_res_key,
+@@ -640,7 +640,29 @@ static struct t10_pr_registration *__core_scsi3_do_alloc_registration(
+ INIT_LIST_HEAD(&pr_reg->pr_reg_atp_mem_list);
+ atomic_set(&pr_reg->pr_res_holders, 0);
+ pr_reg->pr_reg_nacl = nacl;
+- pr_reg->pr_reg_deve = deve;
++ /*
++ * For destination registrations for ALL_TG_PT=1 and SPEC_I_PT=1,
++ * the se_dev_entry->pr_ref will have been already obtained by
++ * core_get_se_deve_from_rtpi() or __core_scsi3_alloc_registration().
++ *
++ * Otherwise, locate se_dev_entry now and obtain a reference until
++ * registration completes in __core_scsi3_add_registration().
++ */
++ if (dest_deve) {
++ pr_reg->pr_reg_deve = dest_deve;
++ } else {
++ rcu_read_lock();
++ pr_reg->pr_reg_deve = target_nacl_find_deve(nacl, mapped_lun);
++ if (!pr_reg->pr_reg_deve) {
++ rcu_read_unlock();
++ pr_err("Unable to locate PR deve %s mapped_lun: %llu\n",
++ nacl->initiatorname, mapped_lun);
++ kmem_cache_free(t10_pr_reg_cache, pr_reg);
++ return NULL;
++ }
++ kref_get(&pr_reg->pr_reg_deve->pr_kref);
++ rcu_read_unlock();
++ }
+ pr_reg->pr_res_mapped_lun = mapped_lun;
+ pr_reg->pr_aptpl_target_lun = lun->unpacked_lun;
+ pr_reg->tg_pt_sep_rtpi = lun->lun_rtpi;
+@@ -936,17 +958,29 @@ static int __core_scsi3_check_aptpl_registration(
+ !(strcmp(pr_reg->pr_tport, t_port)) &&
+ (pr_reg->pr_reg_tpgt == tpgt) &&
+ (pr_reg->pr_aptpl_target_lun == target_lun)) {
++ /*
++ * Obtain the ->pr_reg_deve pointer + reference, that
++ * is released by __core_scsi3_add_registration() below.
++ */
++ rcu_read_lock();
++ pr_reg->pr_reg_deve = target_nacl_find_deve(nacl, mapped_lun);
++ if (!pr_reg->pr_reg_deve) {
++ pr_err("Unable to locate PR APTPL %s mapped_lun:"
++ " %llu\n", nacl->initiatorname, mapped_lun);
++ rcu_read_unlock();
++ continue;
++ }
++ kref_get(&pr_reg->pr_reg_deve->pr_kref);
++ rcu_read_unlock();
+
+ pr_reg->pr_reg_nacl = nacl;
+ pr_reg->tg_pt_sep_rtpi = lun->lun_rtpi;
+-
+ list_del(&pr_reg->pr_reg_aptpl_list);
+ spin_unlock(&pr_tmpl->aptpl_reg_lock);
+ /*
+ * At this point all of the pointers in *pr_reg will
+ * be setup, so go ahead and add the registration.
+ */
+-
+ __core_scsi3_add_registration(dev, nacl, pr_reg, 0, 0);
+ /*
+ * If this registration is the reservation holder,
+@@ -1044,18 +1078,11 @@ static void __core_scsi3_add_registration(
+
+ __core_scsi3_dump_registration(tfo, dev, nacl, pr_reg, register_type);
+ spin_unlock(&pr_tmpl->registration_lock);
+-
+- rcu_read_lock();
+- deve = pr_reg->pr_reg_deve;
+- if (deve)
+- set_bit(DEF_PR_REG_ACTIVE, &deve->deve_flags);
+- rcu_read_unlock();
+-
+ /*
+ * Skip extra processing for ALL_TG_PT=0 or REGISTER_AND_MOVE.
+ */
+ if (!pr_reg->pr_reg_all_tg_pt || register_move)
+- return;
++ goto out;
+ /*
+ * Walk pr_reg->pr_reg_atp_list and add registrations for ALL_TG_PT=1
+ * allocated in __core_scsi3_alloc_registration()
+@@ -1075,19 +1102,31 @@ static void __core_scsi3_add_registration(
+ __core_scsi3_dump_registration(tfo, dev, nacl_tmp, pr_reg_tmp,
+ register_type);
+ spin_unlock(&pr_tmpl->registration_lock);
+-
++ /*
++ * Drop configfs group dependency reference and deve->pr_kref
++ * obtained from __core_scsi3_alloc_registration() code.
++ */
+ rcu_read_lock();
+ deve = pr_reg_tmp->pr_reg_deve;
+- if (deve)
++ if (deve) {
+ set_bit(DEF_PR_REG_ACTIVE, &deve->deve_flags);
++ core_scsi3_lunacl_undepend_item(deve);
++ pr_reg_tmp->pr_reg_deve = NULL;
++ }
+ rcu_read_unlock();
+-
+- /*
+- * Drop configfs group dependency reference from
+- * __core_scsi3_alloc_registration()
+- */
+- core_scsi3_lunacl_undepend_item(pr_reg_tmp->pr_reg_deve);
+ }
++out:
++ /*
++ * Drop deve->pr_kref obtained in __core_scsi3_do_alloc_registration()
++ */
++ rcu_read_lock();
++ deve = pr_reg->pr_reg_deve;
++ if (deve) {
++ set_bit(DEF_PR_REG_ACTIVE, &deve->deve_flags);
++ kref_put(&deve->pr_kref, target_pr_kref_release);
++ pr_reg->pr_reg_deve = NULL;
++ }
++ rcu_read_unlock();
+ }
+
+ static int core_scsi3_alloc_registration(
+@@ -1785,9 +1824,11 @@ core_scsi3_decode_spec_i_port(
+ dest_node_acl->initiatorname, i_buf, (dest_se_deve) ?
+ dest_se_deve->mapped_lun : 0);
+
+- if (!dest_se_deve)
++ if (!dest_se_deve) {
++ kref_put(&local_pr_reg->pr_reg_deve->pr_kref,
++ target_pr_kref_release);
+ continue;
+-
++ }
+ core_scsi3_lunacl_undepend_item(dest_se_deve);
+ core_scsi3_nodeacl_undepend_item(dest_node_acl);
+ core_scsi3_tpg_undepend_item(dest_tpg);
+@@ -1823,9 +1864,11 @@ out:
+
+ kmem_cache_free(t10_pr_reg_cache, dest_pr_reg);
+
+- if (!dest_se_deve)
++ if (!dest_se_deve) {
++ kref_put(&local_pr_reg->pr_reg_deve->pr_kref,
++ target_pr_kref_release);
+ continue;
+-
++ }
+ core_scsi3_lunacl_undepend_item(dest_se_deve);
+ core_scsi3_nodeacl_undepend_item(dest_node_acl);
+ core_scsi3_tpg_undepend_item(dest_tpg);
+diff --git a/drivers/target/target_core_xcopy.c b/drivers/target/target_core_xcopy.c
+index 4515f52546f8..47fe94ee10b8 100644
+--- a/drivers/target/target_core_xcopy.c
++++ b/drivers/target/target_core_xcopy.c
+@@ -450,6 +450,8 @@ int target_xcopy_setup_pt(void)
+ memset(&xcopy_pt_sess, 0, sizeof(struct se_session));
+ INIT_LIST_HEAD(&xcopy_pt_sess.sess_list);
+ INIT_LIST_HEAD(&xcopy_pt_sess.sess_acl_list);
++ INIT_LIST_HEAD(&xcopy_pt_sess.sess_cmd_list);
++ spin_lock_init(&xcopy_pt_sess.sess_cmd_lock);
+
+ xcopy_pt_nacl.se_tpg = &xcopy_pt_tpg;
+ xcopy_pt_nacl.nacl_sess = &xcopy_pt_sess;
+@@ -644,7 +646,7 @@ static int target_xcopy_read_source(
+ pr_debug("XCOPY: Built READ_16: LBA: %llu Sectors: %u Length: %u\n",
+ (unsigned long long)src_lba, src_sectors, length);
+
+- transport_init_se_cmd(se_cmd, &xcopy_pt_tfo, NULL, length,
++ transport_init_se_cmd(se_cmd, &xcopy_pt_tfo, &xcopy_pt_sess, length,
+ DMA_FROM_DEVICE, 0, &xpt_cmd->sense_buffer[0]);
+ xop->src_pt_cmd = xpt_cmd;
+
+@@ -704,7 +706,7 @@ static int target_xcopy_write_destination(
+ pr_debug("XCOPY: Built WRITE_16: LBA: %llu Sectors: %u Length: %u\n",
+ (unsigned long long)dst_lba, dst_sectors, length);
+
+- transport_init_se_cmd(se_cmd, &xcopy_pt_tfo, NULL, length,
++ transport_init_se_cmd(se_cmd, &xcopy_pt_tfo, &xcopy_pt_sess, length,
+ DMA_TO_DEVICE, 0, &xpt_cmd->sense_buffer[0]);
+ xop->dst_pt_cmd = xpt_cmd;
+
+diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
+index 620dcd405ff6..42c6f71bdcc1 100644
+--- a/drivers/thermal/cpu_cooling.c
++++ b/drivers/thermal/cpu_cooling.c
+@@ -262,7 +262,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
+ * efficiently. Power is stored in mW, frequency in KHz. The
+ * resulting table is in ascending order.
+ *
+- * Return: 0 on success, -E* on error.
++ * Return: 0 on success, -EINVAL if there are no OPPs for any CPUs,
++ * -ENOMEM if we run out of memory or -EAGAIN if an OPP was
++ * added/enabled while the function was executing.
+ */
+ static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_device,
+ u32 capacitance)
+@@ -273,8 +275,6 @@ static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_device,
+ int num_opps = 0, cpu, i, ret = 0;
+ unsigned long freq;
+
+- rcu_read_lock();
+-
+ for_each_cpu(cpu, &cpufreq_device->allowed_cpus) {
+ dev = get_cpu_device(cpu);
+ if (!dev) {
+@@ -284,24 +284,20 @@ static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_device,
+ }
+
+ num_opps = dev_pm_opp_get_opp_count(dev);
+- if (num_opps > 0) {
++ if (num_opps > 0)
+ break;
+- } else if (num_opps < 0) {
+- ret = num_opps;
+- goto unlock;
+- }
++ else if (num_opps < 0)
++ return num_opps;
+ }
+
+- if (num_opps == 0) {
+- ret = -EINVAL;
+- goto unlock;
+- }
++ if (num_opps == 0)
++ return -EINVAL;
+
+ power_table = kcalloc(num_opps, sizeof(*power_table), GFP_KERNEL);
+- if (!power_table) {
+- ret = -ENOMEM;
+- goto unlock;
+- }
++ if (!power_table)
++ return -ENOMEM;
++
++ rcu_read_lock();
+
+ for (freq = 0, i = 0;
+ opp = dev_pm_opp_find_freq_ceil(dev, &freq), !IS_ERR(opp);
+@@ -309,6 +305,12 @@ static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_device,
+ u32 freq_mhz, voltage_mv;
+ u64 power;
+
++ if (i >= num_opps) {
++ rcu_read_unlock();
++ ret = -EAGAIN;
++ goto free_power_table;
++ }
++
+ freq_mhz = freq / 1000000;
+ voltage_mv = dev_pm_opp_get_voltage(opp) / 1000;
+
+@@ -326,17 +328,22 @@ static int build_dyn_power_table(struct cpufreq_cooling_device *cpufreq_device,
+ power_table[i].power = power;
+ }
+
+- if (i == 0) {
++ rcu_read_unlock();
++
++ if (i != num_opps) {
+ ret = PTR_ERR(opp);
+- goto unlock;
++ goto free_power_table;
+ }
+
+ cpufreq_device->cpu_dev = dev;
+ cpufreq_device->dyn_power_table = power_table;
+ cpufreq_device->dyn_power_table_entries = i;
+
+-unlock:
+- rcu_read_unlock();
++ return 0;
++
++free_power_table:
++ kfree(power_table);
++
+ return ret;
+ }
+
+@@ -847,7 +854,7 @@ __cpufreq_cooling_register(struct device_node *np,
+ ret = get_idr(&cpufreq_idr, &cpufreq_dev->id);
+ if (ret) {
+ cool_dev = ERR_PTR(ret);
+- goto free_table;
++ goto free_power_table;
+ }
+
+ snprintf(dev_name, sizeof(dev_name), "thermal-cpufreq-%d",
+@@ -889,6 +896,8 @@ __cpufreq_cooling_register(struct device_node *np,
+
+ remove_idr:
+ release_idr(&cpufreq_idr, cpufreq_dev->id);
++free_power_table:
++ kfree(cpufreq_dev->dyn_power_table);
+ free_table:
+ kfree(cpufreq_dev->freq_table);
+ free_time_in_idle_timestamp:
+@@ -1039,6 +1048,7 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev)
+
+ thermal_cooling_device_unregister(cpufreq_dev->cool_dev);
+ release_idr(&cpufreq_idr, cpufreq_dev->id);
++ kfree(cpufreq_dev->dyn_power_table);
+ kfree(cpufreq_dev->time_in_idle_timestamp);
+ kfree(cpufreq_dev->time_in_idle);
+ kfree(cpufreq_dev->freq_table);
+diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
+index ee8bfacf2071..afc1879f66e0 100644
+--- a/drivers/tty/n_tty.c
++++ b/drivers/tty/n_tty.c
+@@ -343,8 +343,7 @@ static void n_tty_packet_mode_flush(struct tty_struct *tty)
+ spin_lock_irqsave(&tty->ctrl_lock, flags);
+ tty->ctrl_status |= TIOCPKT_FLUSHREAD;
+ spin_unlock_irqrestore(&tty->ctrl_lock, flags);
+- if (waitqueue_active(&tty->link->read_wait))
+- wake_up_interruptible(&tty->link->read_wait);
++ wake_up_interruptible(&tty->link->read_wait);
+ }
+ }
+
+@@ -1382,8 +1381,7 @@ handle_newline:
+ put_tty_queue(c, ldata);
+ smp_store_release(&ldata->canon_head, ldata->read_head);
+ kill_fasync(&tty->fasync, SIGIO, POLL_IN);
+- if (waitqueue_active(&tty->read_wait))
+- wake_up_interruptible_poll(&tty->read_wait, POLLIN);
++ wake_up_interruptible_poll(&tty->read_wait, POLLIN);
+ return 0;
+ }
+ }
+@@ -1667,8 +1665,7 @@ static void __receive_buf(struct tty_struct *tty, const unsigned char *cp,
+
+ if ((read_cnt(ldata) >= ldata->minimum_to_wake) || L_EXTPROC(tty)) {
+ kill_fasync(&tty->fasync, SIGIO, POLL_IN);
+- if (waitqueue_active(&tty->read_wait))
+- wake_up_interruptible_poll(&tty->read_wait, POLLIN);
++ wake_up_interruptible_poll(&tty->read_wait, POLLIN);
+ }
+ }
+
+@@ -1887,10 +1884,8 @@ static void n_tty_set_termios(struct tty_struct *tty, struct ktermios *old)
+ }
+
+ /* The termios change make the tty ready for I/O */
+- if (waitqueue_active(&tty->write_wait))
+- wake_up_interruptible(&tty->write_wait);
+- if (waitqueue_active(&tty->read_wait))
+- wake_up_interruptible(&tty->read_wait);
++ wake_up_interruptible(&tty->write_wait);
++ wake_up_interruptible(&tty->read_wait);
+ }
+
+ /**
+diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
+index 37fff12dd4d0..c35d96ece8ff 100644
+--- a/drivers/tty/serial/8250/8250_core.c
++++ b/drivers/tty/serial/8250/8250_core.c
+@@ -326,6 +326,14 @@ configured less than Maximum supported fifo bytes */
+ UART_FCR7_64BYTE,
+ .flags = UART_CAP_FIFO,
+ },
++ [PORT_RT2880] = {
++ .name = "Palmchip BK-3103",
++ .fifo_size = 16,
++ .tx_loadsz = 16,
++ .fcr = UART_FCR_ENABLE_FIFO | UART_FCR_R_TRIG_10,
++ .rxtrig_bytes = {1, 4, 8, 14},
++ .flags = UART_CAP_FIFO,
++ },
+ };
+
+ /* Uart divisor latch read */
+diff --git a/drivers/tty/serial/atmel_serial.c b/drivers/tty/serial/atmel_serial.c
+index 2a8f528153e7..40326b342762 100644
+--- a/drivers/tty/serial/atmel_serial.c
++++ b/drivers/tty/serial/atmel_serial.c
+@@ -2641,7 +2641,7 @@ static int atmel_serial_probe(struct platform_device *pdev)
+ ret = atmel_init_gpios(port, &pdev->dev);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "Failed to initialize GPIOs.");
+- goto err;
++ goto err_clear_bit;
+ }
+
+ ret = atmel_init_port(port, pdev);
+diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
+index 57fc6ee12332..774df354af55 100644
+--- a/drivers/tty/tty_io.c
++++ b/drivers/tty/tty_io.c
+@@ -2136,8 +2136,24 @@ retry_open:
+ if (!noctty &&
+ current->signal->leader &&
+ !current->signal->tty &&
+- tty->session == NULL)
+- __proc_set_tty(tty);
++ tty->session == NULL) {
++ /*
++ * Don't let a process that only has write access to the tty
++ * obtain the privileges associated with having a tty as
++ * controlling terminal (being able to reopen it with full
++ * access through /dev/tty, being able to perform pushback).
++ * Many distributions set the group of all ttys to "tty" and
++ * grant write-only access to all terminals for setgid tty
++ * binaries, which should not imply full privileges on all ttys.
++ *
++ * This could theoretically break old code that performs open()
++ * on a write-only file descriptor. In that case, it might be
++ * necessary to also permit this if
++ * inode_permission(inode, MAY_READ) == 0.
++ */
++ if (filp->f_mode & FMODE_READ)
++ __proc_set_tty(tty);
++ }
+ spin_unlock_irq(&current->sighand->siglock);
+ read_unlock(&tasklist_lock);
+ tty_unlock(tty);
+@@ -2426,7 +2442,7 @@ static int fionbio(struct file *file, int __user *p)
+ * Takes ->siglock() when updating signal->tty
+ */
+
+-static int tiocsctty(struct tty_struct *tty, int arg)
++static int tiocsctty(struct tty_struct *tty, struct file *file, int arg)
+ {
+ int ret = 0;
+
+@@ -2460,6 +2476,13 @@ static int tiocsctty(struct tty_struct *tty, int arg)
+ goto unlock;
+ }
+ }
++
++ /* See the comment in tty_open(). */
++ if ((file->f_mode & FMODE_READ) == 0 && !capable(CAP_SYS_ADMIN)) {
++ ret = -EPERM;
++ goto unlock;
++ }
++
+ proc_set_tty(tty);
+ unlock:
+ read_unlock(&tasklist_lock);
+@@ -2852,7 +2875,7 @@ long tty_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+ no_tty();
+ return 0;
+ case TIOCSCTTY:
+- return tiocsctty(tty, arg);
++ return tiocsctty(tty, file, arg);
+ case TIOCGPGRP:
+ return tiocgpgrp(tty, real_tty, p);
+ case TIOCSPGRP:
+diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
+index 389f0e034259..fa774323ebda 100644
+--- a/drivers/usb/chipidea/ci_hdrc_imx.c
++++ b/drivers/usb/chipidea/ci_hdrc_imx.c
+@@ -56,7 +56,7 @@ static const struct of_device_id ci_hdrc_imx_dt_ids[] = {
+ { .compatible = "fsl,imx27-usb", .data = &imx27_usb_data},
+ { .compatible = "fsl,imx6q-usb", .data = &imx6q_usb_data},
+ { .compatible = "fsl,imx6sl-usb", .data = &imx6sl_usb_data},
+- { .compatible = "fsl,imx6sx-usb", .data = &imx6sl_usb_data},
++ { .compatible = "fsl,imx6sx-usb", .data = &imx6sx_usb_data},
+ { /* sentinel */ }
+ };
+ MODULE_DEVICE_TABLE(of, ci_hdrc_imx_dt_ids);
+diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
+index 764f668d45a9..6e53c24fa1cb 100644
+--- a/drivers/usb/chipidea/udc.c
++++ b/drivers/usb/chipidea/udc.c
+@@ -656,6 +656,44 @@ __acquires(hwep->lock)
+ return 0;
+ }
+
++static int _ep_set_halt(struct usb_ep *ep, int value, bool check_transfer)
++{
++ struct ci_hw_ep *hwep = container_of(ep, struct ci_hw_ep, ep);
++ int direction, retval = 0;
++ unsigned long flags;
++
++ if (ep == NULL || hwep->ep.desc == NULL)
++ return -EINVAL;
++
++ if (usb_endpoint_xfer_isoc(hwep->ep.desc))
++ return -EOPNOTSUPP;
++
++ spin_lock_irqsave(hwep->lock, flags);
++
++ if (value && hwep->dir == TX && check_transfer &&
++ !list_empty(&hwep->qh.queue) &&
++ !usb_endpoint_xfer_control(hwep->ep.desc)) {
++ spin_unlock_irqrestore(hwep->lock, flags);
++ return -EAGAIN;
++ }
++
++ direction = hwep->dir;
++ do {
++ retval |= hw_ep_set_halt(hwep->ci, hwep->num, hwep->dir, value);
++
++ if (!value)
++ hwep->wedge = 0;
++
++ if (hwep->type == USB_ENDPOINT_XFER_CONTROL)
++ hwep->dir = (hwep->dir == TX) ? RX : TX;
++
++ } while (hwep->dir != direction);
++
++ spin_unlock_irqrestore(hwep->lock, flags);
++ return retval;
++}
++
++
+ /**
+ * _gadget_stop_activity: stops all USB activity, flushes & disables all endpts
+ * @gadget: gadget
+@@ -1051,7 +1089,7 @@ __acquires(ci->lock)
+ num += ci->hw_ep_max / 2;
+
+ spin_unlock(&ci->lock);
+- err = usb_ep_set_halt(&ci->ci_hw_ep[num].ep);
++ err = _ep_set_halt(&ci->ci_hw_ep[num].ep, 1, false);
+ spin_lock(&ci->lock);
+ if (!err)
+ isr_setup_status_phase(ci);
+@@ -1110,8 +1148,8 @@ delegate:
+
+ if (err < 0) {
+ spin_unlock(&ci->lock);
+- if (usb_ep_set_halt(&hwep->ep))
+- dev_err(ci->dev, "error: ep_set_halt\n");
++ if (_ep_set_halt(&hwep->ep, 1, false))
++ dev_err(ci->dev, "error: _ep_set_halt\n");
+ spin_lock(&ci->lock);
+ }
+ }
+@@ -1142,9 +1180,9 @@ __acquires(ci->lock)
+ err = isr_setup_status_phase(ci);
+ if (err < 0) {
+ spin_unlock(&ci->lock);
+- if (usb_ep_set_halt(&hwep->ep))
++ if (_ep_set_halt(&hwep->ep, 1, false))
+ dev_err(ci->dev,
+- "error: ep_set_halt\n");
++ "error: _ep_set_halt\n");
+ spin_lock(&ci->lock);
+ }
+ }
+@@ -1390,41 +1428,7 @@ static int ep_dequeue(struct usb_ep *ep, struct usb_request *req)
+ */
+ static int ep_set_halt(struct usb_ep *ep, int value)
+ {
+- struct ci_hw_ep *hwep = container_of(ep, struct ci_hw_ep, ep);
+- int direction, retval = 0;
+- unsigned long flags;
+-
+- if (ep == NULL || hwep->ep.desc == NULL)
+- return -EINVAL;
+-
+- if (usb_endpoint_xfer_isoc(hwep->ep.desc))
+- return -EOPNOTSUPP;
+-
+- spin_lock_irqsave(hwep->lock, flags);
+-
+-#ifndef STALL_IN
+- /* g_file_storage MS compliant but g_zero fails chapter 9 compliance */
+- if (value && hwep->type == USB_ENDPOINT_XFER_BULK && hwep->dir == TX &&
+- !list_empty(&hwep->qh.queue)) {
+- spin_unlock_irqrestore(hwep->lock, flags);
+- return -EAGAIN;
+- }
+-#endif
+-
+- direction = hwep->dir;
+- do {
+- retval |= hw_ep_set_halt(hwep->ci, hwep->num, hwep->dir, value);
+-
+- if (!value)
+- hwep->wedge = 0;
+-
+- if (hwep->type == USB_ENDPOINT_XFER_CONTROL)
+- hwep->dir = (hwep->dir == TX) ? RX : TX;
+-
+- } while (hwep->dir != direction);
+-
+- spin_unlock_irqrestore(hwep->lock, flags);
+- return retval;
++ return _ep_set_halt(ep, value, true);
+ }
+
+ /**
+diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c
+index b2a540b43f97..b9ddf0c1ffe5 100644
+--- a/drivers/usb/core/config.c
++++ b/drivers/usb/core/config.c
+@@ -112,7 +112,7 @@ static void usb_parse_ss_endpoint_companion(struct device *ddev, int cfgno,
+ cfgno, inum, asnum, ep->desc.bEndpointAddress);
+ ep->ss_ep_comp.bmAttributes = 16;
+ } else if (usb_endpoint_xfer_isoc(&ep->desc) &&
+- desc->bmAttributes > 2) {
++ USB_SS_MULT(desc->bmAttributes) > 3) {
+ dev_warn(ddev, "Isoc endpoint has Mult of %d in "
+ "config %d interface %d altsetting %d ep %d: "
+ "setting to 3\n", desc->bmAttributes + 1,
+@@ -121,7 +121,8 @@ static void usb_parse_ss_endpoint_companion(struct device *ddev, int cfgno,
+ }
+
+ if (usb_endpoint_xfer_isoc(&ep->desc))
+- max_tx = (desc->bMaxBurst + 1) * (desc->bmAttributes + 1) *
++ max_tx = (desc->bMaxBurst + 1) *
++ (USB_SS_MULT(desc->bmAttributes)) *
+ usb_endpoint_maxp(&ep->desc);
+ else if (usb_endpoint_xfer_int(&ep->desc))
+ max_tx = usb_endpoint_maxp(&ep->desc) *
+diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
+index d85abfed84cc..f5a381945db2 100644
+--- a/drivers/usb/core/quirks.c
++++ b/drivers/usb/core/quirks.c
+@@ -54,6 +54,13 @@ static const struct usb_device_id usb_quirk_list[] = {
+ { USB_DEVICE(0x046d, 0x082d), .driver_info = USB_QUIRK_DELAY_INIT },
+ { USB_DEVICE(0x046d, 0x0843), .driver_info = USB_QUIRK_DELAY_INIT },
+
++ /* Logitech ConferenceCam CC3000e */
++ { USB_DEVICE(0x046d, 0x0847), .driver_info = USB_QUIRK_DELAY_INIT },
++ { USB_DEVICE(0x046d, 0x0848), .driver_info = USB_QUIRK_DELAY_INIT },
++
++ /* Logitech PTZ Pro Camera */
++ { USB_DEVICE(0x046d, 0x0853), .driver_info = USB_QUIRK_DELAY_INIT },
++
+ /* Logitech Quickcam Fusion */
+ { USB_DEVICE(0x046d, 0x08c1), .driver_info = USB_QUIRK_RESET_RESUME },
+
+@@ -78,6 +85,12 @@ static const struct usb_device_id usb_quirk_list[] = {
+ /* Philips PSC805 audio device */
+ { USB_DEVICE(0x0471, 0x0155), .driver_info = USB_QUIRK_RESET_RESUME },
+
++ /* Plantronic Audio 655 DSP */
++ { USB_DEVICE(0x047f, 0xc008), .driver_info = USB_QUIRK_RESET_RESUME },
++
++ /* Plantronic Audio 648 USB */
++ { USB_DEVICE(0x047f, 0xc013), .driver_info = USB_QUIRK_RESET_RESUME },
++
+ /* Artisman Watchdog Dongle */
+ { USB_DEVICE(0x04b4, 0x0526), .driver_info =
+ USB_QUIRK_CONFIG_INTF_STRINGS },
+diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
+index 9a8c936cd42c..41f841fa6c4d 100644
+--- a/drivers/usb/host/xhci-mem.c
++++ b/drivers/usb/host/xhci-mem.c
+@@ -1498,10 +1498,10 @@ int xhci_endpoint_init(struct xhci_hcd *xhci,
+ * use Event Data TRBs, and we don't chain in a link TRB on short
+ * transfers, we're basically dividing by 1.
+ *
+- * xHCI 1.0 specification indicates that the Average TRB Length should
+- * be set to 8 for control endpoints.
++ * xHCI 1.0 and 1.1 specification indicates that the Average TRB Length
++ * should be set to 8 for control endpoints.
+ */
+- if (usb_endpoint_xfer_control(&ep->desc) && xhci->hci_version == 0x100)
++ if (usb_endpoint_xfer_control(&ep->desc) && xhci->hci_version >= 0x100)
+ ep_ctx->tx_info |= cpu_to_le32(AVG_TRB_LENGTH_FOR_EP(8));
+ else
+ ep_ctx->tx_info |=
+@@ -1792,8 +1792,7 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
+ int size;
+ int i, j, num_ports;
+
+- if (timer_pending(&xhci->cmd_timer))
+- del_timer_sync(&xhci->cmd_timer);
++ del_timer_sync(&xhci->cmd_timer);
+
+ /* Free the Event Ring Segment Table and the actual Event Ring */
+ size = sizeof(struct xhci_erst_entry)*(xhci->erst.num_entries);
+@@ -2321,6 +2320,10 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
+
+ INIT_LIST_HEAD(&xhci->cmd_list);
+
++ /* init command timeout timer */
++ setup_timer(&xhci->cmd_timer, xhci_handle_command_timeout,
++ (unsigned long)xhci);
++
+ page_size = readl(&xhci->op_regs->page_size);
+ xhci_dbg_trace(xhci, trace_xhci_dbg_init,
+ "Supported page size register = 0x%x", page_size);
+@@ -2505,10 +2508,6 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
+ "Wrote ERST address to ir_set 0.");
+ xhci_print_ir_set(xhci, 0);
+
+- /* init command timeout timer */
+- setup_timer(&xhci->cmd_timer, xhci_handle_command_timeout,
+- (unsigned long)xhci);
+-
+ /*
+ * XXX: Might need to set the Interrupter Moderation Register to
+ * something other than the default (~1ms minimum between interrupts).
+diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
+index 5590eac2b22d..c79d33676672 100644
+--- a/drivers/usb/host/xhci-pci.c
++++ b/drivers/usb/host/xhci-pci.c
+@@ -180,51 +180,6 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
+ "QUIRK: Resetting on resume");
+ }
+
+-/*
+- * In some Intel xHCI controllers, in order to get D3 working,
+- * through a vendor specific SSIC CONFIG register at offset 0x883c,
+- * SSIC PORT need to be marked as "unused" before putting xHCI
+- * into D3. After D3 exit, the SSIC port need to be marked as "used".
+- * Without this change, xHCI might not enter D3 state.
+- * Make sure PME works on some Intel xHCI controllers by writing 1 to clear
+- * the Internal PME flag bit in vendor specific PMCTRL register at offset 0x80a4
+- */
+-static void xhci_pme_quirk(struct usb_hcd *hcd, bool suspend)
+-{
+- struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+- struct pci_dev *pdev = to_pci_dev(hcd->self.controller);
+- u32 val;
+- void __iomem *reg;
+-
+- if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
+- pdev->device == PCI_DEVICE_ID_INTEL_CHERRYVIEW_XHCI) {
+-
+- reg = (void __iomem *) xhci->cap_regs + PORT2_SSIC_CONFIG_REG2;
+-
+- /* Notify SSIC that SSIC profile programming is not done */
+- val = readl(reg) & ~PROG_DONE;
+- writel(val, reg);
+-
+- /* Mark SSIC port as unused(suspend) or used(resume) */
+- val = readl(reg);
+- if (suspend)
+- val |= SSIC_PORT_UNUSED;
+- else
+- val &= ~SSIC_PORT_UNUSED;
+- writel(val, reg);
+-
+- /* Notify SSIC that SSIC profile programming is done */
+- val = readl(reg) | PROG_DONE;
+- writel(val, reg);
+- readl(reg);
+- }
+-
+- reg = (void __iomem *) xhci->cap_regs + 0x80a4;
+- val = readl(reg);
+- writel(val | BIT(28), reg);
+- readl(reg);
+-}
+-
+ #ifdef CONFIG_ACPI
+ static void xhci_pme_acpi_rtd3_enable(struct pci_dev *dev)
+ {
+@@ -345,6 +300,51 @@ static void xhci_pci_remove(struct pci_dev *dev)
+ }
+
+ #ifdef CONFIG_PM
++/*
++ * In some Intel xHCI controllers, in order to get D3 working,
++ * through a vendor specific SSIC CONFIG register at offset 0x883c,
++ * SSIC PORT need to be marked as "unused" before putting xHCI
++ * into D3. After D3 exit, the SSIC port need to be marked as "used".
++ * Without this change, xHCI might not enter D3 state.
++ * Make sure PME works on some Intel xHCI controllers by writing 1 to clear
++ * the Internal PME flag bit in vendor specific PMCTRL register at offset 0x80a4
++ */
++static void xhci_pme_quirk(struct usb_hcd *hcd, bool suspend)
++{
++ struct xhci_hcd *xhci = hcd_to_xhci(hcd);
++ struct pci_dev *pdev = to_pci_dev(hcd->self.controller);
++ u32 val;
++ void __iomem *reg;
++
++ if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
++ pdev->device == PCI_DEVICE_ID_INTEL_CHERRYVIEW_XHCI) {
++
++ reg = (void __iomem *) xhci->cap_regs + PORT2_SSIC_CONFIG_REG2;
++
++ /* Notify SSIC that SSIC profile programming is not done */
++ val = readl(reg) & ~PROG_DONE;
++ writel(val, reg);
++
++ /* Mark SSIC port as unused(suspend) or used(resume) */
++ val = readl(reg);
++ if (suspend)
++ val |= SSIC_PORT_UNUSED;
++ else
++ val &= ~SSIC_PORT_UNUSED;
++ writel(val, reg);
++
++ /* Notify SSIC that SSIC profile programming is done */
++ val = readl(reg) | PROG_DONE;
++ writel(val, reg);
++ readl(reg);
++ }
++
++ reg = (void __iomem *) xhci->cap_regs + 0x80a4;
++ val = readl(reg);
++ writel(val | BIT(28), reg);
++ readl(reg);
++}
++
+ static int xhci_pci_suspend(struct usb_hcd *hcd, bool do_wakeup)
+ {
+ struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
+index 32f4d564494a..8aadf3def901 100644
+--- a/drivers/usb/host/xhci-ring.c
++++ b/drivers/usb/host/xhci-ring.c
+@@ -302,6 +302,15 @@ static int xhci_abort_cmd_ring(struct xhci_hcd *xhci)
+ ret = xhci_handshake(&xhci->op_regs->cmd_ring,
+ CMD_RING_RUNNING, 0, 5 * 1000 * 1000);
+ if (ret < 0) {
++ /* we are about to kill xhci, give it one more chance */
++ xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
++ &xhci->op_regs->cmd_ring);
++ udelay(1000);
++ ret = xhci_handshake(&xhci->op_regs->cmd_ring,
++ CMD_RING_RUNNING, 0, 3 * 1000 * 1000);
++ if (ret == 0)
++ return 0;
++
+ xhci_err(xhci, "Stopped the command ring failed, "
+ "maybe the host is dead\n");
+ xhci->xhc_state |= XHCI_STATE_DYING;
+@@ -3041,9 +3050,11 @@ static int queue_bulk_sg_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ struct xhci_td *td;
+ struct scatterlist *sg;
+ int num_sgs;
+- int trb_buff_len, this_sg_len, running_total;
++ int trb_buff_len, this_sg_len, running_total, ret;
+ unsigned int total_packet_count;
++ bool zero_length_needed;
+ bool first_trb;
++ int last_trb_num;
+ u64 addr;
+ bool more_trbs_coming;
+
+@@ -3059,13 +3070,27 @@ static int queue_bulk_sg_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ total_packet_count = DIV_ROUND_UP(urb->transfer_buffer_length,
+ usb_endpoint_maxp(&urb->ep->desc));
+
+- trb_buff_len = prepare_transfer(xhci, xhci->devs[slot_id],
++ ret = prepare_transfer(xhci, xhci->devs[slot_id],
+ ep_index, urb->stream_id,
+ num_trbs, urb, 0, mem_flags);
+- if (trb_buff_len < 0)
+- return trb_buff_len;
++ if (ret < 0)
++ return ret;
+
+ urb_priv = urb->hcpriv;
++
++ /* Deal with URB_ZERO_PACKET - need one more td/trb */
++ zero_length_needed = urb->transfer_flags & URB_ZERO_PACKET &&
++ urb_priv->length == 2;
++ if (zero_length_needed) {
++ num_trbs++;
++ xhci_dbg(xhci, "Creating zero length td.\n");
++ ret = prepare_transfer(xhci, xhci->devs[slot_id],
++ ep_index, urb->stream_id,
++ 1, urb, 1, mem_flags);
++ if (ret < 0)
++ return ret;
++ }
++
+ td = urb_priv->td[0];
+
+ /*
+@@ -3095,6 +3120,7 @@ static int queue_bulk_sg_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ trb_buff_len = urb->transfer_buffer_length;
+
+ first_trb = true;
++ last_trb_num = zero_length_needed ? 2 : 1;
+ /* Queue the first TRB, even if it's zero-length */
+ do {
+ u32 field = 0;
+@@ -3112,12 +3138,15 @@ static int queue_bulk_sg_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ /* Chain all the TRBs together; clear the chain bit in the last
+ * TRB to indicate it's the last TRB in the chain.
+ */
+- if (num_trbs > 1) {
++ if (num_trbs > last_trb_num) {
+ field |= TRB_CHAIN;
+- } else {
+- /* FIXME - add check for ZERO_PACKET flag before this */
++ } else if (num_trbs == last_trb_num) {
+ td->last_trb = ep_ring->enqueue;
+ field |= TRB_IOC;
++ } else if (zero_length_needed && num_trbs == 1) {
++ trb_buff_len = 0;
++ urb_priv->td[1]->last_trb = ep_ring->enqueue;
++ field |= TRB_IOC;
+ }
+
+ /* Only set interrupt on short packet for IN endpoints */
+@@ -3179,7 +3208,7 @@ static int queue_bulk_sg_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ if (running_total + trb_buff_len > urb->transfer_buffer_length)
+ trb_buff_len =
+ urb->transfer_buffer_length - running_total;
+- } while (running_total < urb->transfer_buffer_length);
++ } while (num_trbs > 0);
+
+ check_trb_math(urb, num_trbs, running_total);
+ giveback_first_trb(xhci, slot_id, ep_index, urb->stream_id,
+@@ -3197,7 +3226,9 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ int num_trbs;
+ struct xhci_generic_trb *start_trb;
+ bool first_trb;
++ int last_trb_num;
+ bool more_trbs_coming;
++ bool zero_length_needed;
+ int start_cycle;
+ u32 field, length_field;
+
+@@ -3228,7 +3259,6 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ num_trbs++;
+ running_total += TRB_MAX_BUFF_SIZE;
+ }
+- /* FIXME: this doesn't deal with URB_ZERO_PACKET - need one more */
+
+ ret = prepare_transfer(xhci, xhci->devs[slot_id],
+ ep_index, urb->stream_id,
+@@ -3237,6 +3267,20 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ return ret;
+
+ urb_priv = urb->hcpriv;
++
++ /* Deal with URB_ZERO_PACKET - need one more td/trb */
++ zero_length_needed = urb->transfer_flags & URB_ZERO_PACKET &&
++ urb_priv->length == 2;
++ if (zero_length_needed) {
++ num_trbs++;
++ xhci_dbg(xhci, "Creating zero length td.\n");
++ ret = prepare_transfer(xhci, xhci->devs[slot_id],
++ ep_index, urb->stream_id,
++ 1, urb, 1, mem_flags);
++ if (ret < 0)
++ return ret;
++ }
++
+ td = urb_priv->td[0];
+
+ /*
+@@ -3258,7 +3302,7 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ trb_buff_len = urb->transfer_buffer_length;
+
+ first_trb = true;
+-
++ last_trb_num = zero_length_needed ? 2 : 1;
+ /* Queue the first TRB, even if it's zero-length */
+ do {
+ u32 remainder = 0;
+@@ -3275,12 +3319,15 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ /* Chain all the TRBs together; clear the chain bit in the last
+ * TRB to indicate it's the last TRB in the chain.
+ */
+- if (num_trbs > 1) {
++ if (num_trbs > last_trb_num) {
+ field |= TRB_CHAIN;
+- } else {
+- /* FIXME - add check for ZERO_PACKET flag before this */
++ } else if (num_trbs == last_trb_num) {
+ td->last_trb = ep_ring->enqueue;
+ field |= TRB_IOC;
++ } else if (zero_length_needed && num_trbs == 1) {
++ trb_buff_len = 0;
++ urb_priv->td[1]->last_trb = ep_ring->enqueue;
++ field |= TRB_IOC;
+ }
+
+ /* Only set interrupt on short packet for IN endpoints */
+@@ -3318,7 +3365,7 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ trb_buff_len = urb->transfer_buffer_length - running_total;
+ if (trb_buff_len > TRB_MAX_BUFF_SIZE)
+ trb_buff_len = TRB_MAX_BUFF_SIZE;
+- } while (running_total < urb->transfer_buffer_length);
++ } while (num_trbs > 0);
+
+ check_trb_math(urb, num_trbs, running_total);
+ giveback_first_trb(xhci, slot_id, ep_index, urb->stream_id,
+@@ -3385,8 +3432,8 @@ int xhci_queue_ctrl_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
+ if (start_cycle == 0)
+ field |= 0x1;
+
+- /* xHCI 1.0 6.4.1.2.1: Transfer Type field */
+- if (xhci->hci_version == 0x100) {
++ /* xHCI 1.0/1.1 6.4.1.2.1: Transfer Type field */
++ if (xhci->hci_version >= 0x100) {
+ if (urb->transfer_buffer_length > 0) {
+ if (setup->bRequestType & USB_DIR_IN)
+ field |= TRB_TX_TYPE(TRB_DATA_IN);
+diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
+index 526ebc0c7e72..d7b9f484d4e9 100644
+--- a/drivers/usb/host/xhci.c
++++ b/drivers/usb/host/xhci.c
+@@ -146,7 +146,8 @@ static int xhci_start(struct xhci_hcd *xhci)
+ "waited %u microseconds.\n",
+ XHCI_MAX_HALT_USEC);
+ if (!ret)
+- xhci->xhc_state &= ~XHCI_STATE_HALTED;
++ xhci->xhc_state &= ~(XHCI_STATE_HALTED | XHCI_STATE_DYING);
++
+ return ret;
+ }
+
+@@ -654,15 +655,6 @@ int xhci_run(struct usb_hcd *hcd)
+ }
+ EXPORT_SYMBOL_GPL(xhci_run);
+
+-static void xhci_only_stop_hcd(struct usb_hcd *hcd)
+-{
+- struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+-
+- spin_lock_irq(&xhci->lock);
+- xhci_halt(xhci);
+- spin_unlock_irq(&xhci->lock);
+-}
+-
+ /*
+ * Stop xHCI driver.
+ *
+@@ -677,12 +669,14 @@ void xhci_stop(struct usb_hcd *hcd)
+ u32 temp;
+ struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+
+- if (!usb_hcd_is_primary_hcd(hcd)) {
+- xhci_only_stop_hcd(xhci->shared_hcd);
++ if (xhci->xhc_state & XHCI_STATE_HALTED)
+ return;
+- }
+
++ mutex_lock(&xhci->mutex);
+ spin_lock_irq(&xhci->lock);
++ xhci->xhc_state |= XHCI_STATE_HALTED;
++ xhci->cmd_ring_state = CMD_RING_STATE_STOPPED;
++
+ /* Make sure the xHC is halted for a USB3 roothub
+ * (xhci_stop() could be called as part of failed init).
+ */
+@@ -717,6 +711,7 @@ void xhci_stop(struct usb_hcd *hcd)
+ xhci_dbg_trace(xhci, trace_xhci_dbg_init,
+ "xhci_stop completed - status = %x",
+ readl(&xhci->op_regs->status));
++ mutex_unlock(&xhci->mutex);
+ }
+
+ /*
+@@ -1340,6 +1335,11 @@ int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flags)
+
+ if (usb_endpoint_xfer_isoc(&urb->ep->desc))
+ size = urb->number_of_packets;
++ else if (usb_endpoint_is_bulk_out(&urb->ep->desc) &&
++ urb->transfer_buffer_length > 0 &&
++ urb->transfer_flags & URB_ZERO_PACKET &&
++ !(urb->transfer_buffer_length % usb_endpoint_maxp(&urb->ep->desc)))
++ size = 2;
+ else
+ size = 1;
+
+@@ -3788,6 +3788,9 @@ static int xhci_setup_device(struct usb_hcd *hcd, struct usb_device *udev,
+
+ mutex_lock(&xhci->mutex);
+
++ if (xhci->xhc_state) /* dying or halted */
++ goto out;
++
+ if (!udev->slot_id) {
+ xhci_dbg_trace(xhci, trace_xhci_dbg_address,
+ "Bad Slot ID %d", udev->slot_id);
+diff --git a/drivers/usb/misc/chaoskey.c b/drivers/usb/misc/chaoskey.c
+index 3ad5d19e4d04..23c794813e6a 100644
+--- a/drivers/usb/misc/chaoskey.c
++++ b/drivers/usb/misc/chaoskey.c
+@@ -472,7 +472,7 @@ static int chaoskey_rng_read(struct hwrng *rng, void *data,
+ if (this_time > max)
+ this_time = max;
+
+- memcpy(data, dev->buf, this_time);
++ memcpy(data, dev->buf + dev->used, this_time);
+
+ dev->used += this_time;
+
+diff --git a/drivers/usb/musb/musb_cppi41.c b/drivers/usb/musb/musb_cppi41.c
+index 4d1b44c232ee..d07cafb7d5f5 100644
+--- a/drivers/usb/musb/musb_cppi41.c
++++ b/drivers/usb/musb/musb_cppi41.c
+@@ -614,7 +614,7 @@ static int cppi41_dma_controller_start(struct cppi41_dma_controller *controller)
+ {
+ struct musb *musb = controller->musb;
+ struct device *dev = musb->controller;
+- struct device_node *np = dev->of_node;
++ struct device_node *np = dev->parent->of_node;
+ struct cppi41_dma_channel *cppi41_channel;
+ int count;
+ int i;
+@@ -664,7 +664,7 @@ static int cppi41_dma_controller_start(struct cppi41_dma_controller *controller)
+ musb_dma->status = MUSB_DMA_STATUS_FREE;
+ musb_dma->max_len = SZ_4M;
+
+- dc = dma_request_slave_channel(dev, str);
++ dc = dma_request_slave_channel(dev->parent, str);
+ if (!dc) {
+ dev_err(dev, "Failed to request %s.\n", str);
+ ret = -EPROBE_DEFER;
+@@ -695,7 +695,7 @@ cppi41_dma_controller_create(struct musb *musb, void __iomem *base)
+ struct cppi41_dma_controller *controller;
+ int ret = 0;
+
+- if (!musb->controller->of_node) {
++ if (!musb->controller->parent->of_node) {
+ dev_err(musb->controller, "Need DT for the DMA engine.\n");
+ return NULL;
+ }
+diff --git a/drivers/usb/musb/musb_dsps.c b/drivers/usb/musb/musb_dsps.c
+index 1334a3de31b8..67325ec94894 100644
+--- a/drivers/usb/musb/musb_dsps.c
++++ b/drivers/usb/musb/musb_dsps.c
+@@ -225,8 +225,11 @@ static void dsps_musb_enable(struct musb *musb)
+
+ dsps_writel(reg_base, wrp->epintr_set, epmask);
+ dsps_writel(reg_base, wrp->coreintr_set, coremask);
+- /* start polling for ID change. */
+- mod_timer(&glue->timer, jiffies + msecs_to_jiffies(wrp->poll_timeout));
++ /* start polling for ID change in dual-role idle mode */
++ if (musb->xceiv->otg->state == OTG_STATE_B_IDLE &&
++ musb->port_mode == MUSB_PORT_MODE_DUAL_ROLE)
++ mod_timer(&glue->timer, jiffies +
++ msecs_to_jiffies(wrp->poll_timeout));
+ dsps_musb_try_idle(musb, 0);
+ }
+
+diff --git a/drivers/usb/phy/phy-generic.c b/drivers/usb/phy/phy-generic.c
+index deee68eafb72..0cd85f2ccddd 100644
+--- a/drivers/usb/phy/phy-generic.c
++++ b/drivers/usb/phy/phy-generic.c
+@@ -230,7 +230,8 @@ int usb_phy_gen_create_phy(struct device *dev, struct usb_phy_generic *nop,
+ clk_rate = pdata->clk_rate;
+ needs_vcc = pdata->needs_vcc;
+ if (gpio_is_valid(pdata->gpio_reset)) {
+- err = devm_gpio_request_one(dev, pdata->gpio_reset, 0,
++ err = devm_gpio_request_one(dev, pdata->gpio_reset,
++ GPIOF_ACTIVE_LOW,
+ dev_name(dev));
+ if (!err)
+ nop->gpiod_reset =
+diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
+index 876423b8892c..7c8eb4c4c175 100644
+--- a/drivers/usb/serial/option.c
++++ b/drivers/usb/serial/option.c
+@@ -278,6 +278,10 @@ static void option_instat_callback(struct urb *urb);
+ #define ZTE_PRODUCT_MF622 0x0001
+ #define ZTE_PRODUCT_MF628 0x0015
+ #define ZTE_PRODUCT_MF626 0x0031
++#define ZTE_PRODUCT_ZM8620_X 0x0396
++#define ZTE_PRODUCT_ME3620_MBIM 0x0426
++#define ZTE_PRODUCT_ME3620_X 0x1432
++#define ZTE_PRODUCT_ME3620_L 0x1433
+ #define ZTE_PRODUCT_AC2726 0xfff1
+ #define ZTE_PRODUCT_MG880 0xfffd
+ #define ZTE_PRODUCT_CDMA_TECH 0xfffe
+@@ -544,6 +548,18 @@ static const struct option_blacklist_info zte_mc2716_z_blacklist = {
+ .sendsetup = BIT(1) | BIT(2) | BIT(3),
+ };
+
++static const struct option_blacklist_info zte_me3620_mbim_blacklist = {
++ .reserved = BIT(2) | BIT(3) | BIT(4),
++};
++
++static const struct option_blacklist_info zte_me3620_xl_blacklist = {
++ .reserved = BIT(3) | BIT(4) | BIT(5),
++};
++
++static const struct option_blacklist_info zte_zm8620_x_blacklist = {
++ .reserved = BIT(3) | BIT(4) | BIT(5),
++};
++
+ static const struct option_blacklist_info huawei_cdc12_blacklist = {
+ .reserved = BIT(1) | BIT(2),
+ };
+@@ -1591,6 +1607,14 @@ static const struct usb_device_id option_ids[] = {
+ .driver_info = (kernel_ulong_t)&zte_ad3812_z_blacklist },
+ { USB_DEVICE_AND_INTERFACE_INFO(ZTE_VENDOR_ID, ZTE_PRODUCT_MC2716, 0xff, 0xff, 0xff),
+ .driver_info = (kernel_ulong_t)&zte_mc2716_z_blacklist },
++ { USB_DEVICE(ZTE_VENDOR_ID, ZTE_PRODUCT_ME3620_L),
++ .driver_info = (kernel_ulong_t)&zte_me3620_xl_blacklist },
++ { USB_DEVICE(ZTE_VENDOR_ID, ZTE_PRODUCT_ME3620_MBIM),
++ .driver_info = (kernel_ulong_t)&zte_me3620_mbim_blacklist },
++ { USB_DEVICE(ZTE_VENDOR_ID, ZTE_PRODUCT_ME3620_X),
++ .driver_info = (kernel_ulong_t)&zte_me3620_xl_blacklist },
++ { USB_DEVICE(ZTE_VENDOR_ID, ZTE_PRODUCT_ZM8620_X),
++ .driver_info = (kernel_ulong_t)&zte_zm8620_x_blacklist },
+ { USB_VENDOR_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0xff, 0x02, 0x01) },
+ { USB_VENDOR_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0xff, 0x02, 0x05) },
+ { USB_VENDOR_AND_INTERFACE_INFO(ZTE_VENDOR_ID, 0xff, 0x86, 0x10) },
+diff --git a/drivers/usb/serial/whiteheat.c b/drivers/usb/serial/whiteheat.c
+index 6c3734d2b45a..d3ea90bef84d 100644
+--- a/drivers/usb/serial/whiteheat.c
++++ b/drivers/usb/serial/whiteheat.c
+@@ -80,6 +80,8 @@ static int whiteheat_firmware_download(struct usb_serial *serial,
+ static int whiteheat_firmware_attach(struct usb_serial *serial);
+
+ /* function prototypes for the Connect Tech WhiteHEAT serial converter */
++static int whiteheat_probe(struct usb_serial *serial,
++ const struct usb_device_id *id);
+ static int whiteheat_attach(struct usb_serial *serial);
+ static void whiteheat_release(struct usb_serial *serial);
+ static int whiteheat_port_probe(struct usb_serial_port *port);
+@@ -116,6 +118,7 @@ static struct usb_serial_driver whiteheat_device = {
+ .description = "Connect Tech - WhiteHEAT",
+ .id_table = id_table_std,
+ .num_ports = 4,
++ .probe = whiteheat_probe,
+ .attach = whiteheat_attach,
+ .release = whiteheat_release,
+ .port_probe = whiteheat_port_probe,
+@@ -217,6 +220,34 @@ static int whiteheat_firmware_attach(struct usb_serial *serial)
+ /*****************************************************************************
+ * Connect Tech's White Heat serial driver functions
+ *****************************************************************************/
++
++static int whiteheat_probe(struct usb_serial *serial,
++ const struct usb_device_id *id)
++{
++ struct usb_host_interface *iface_desc;
++ struct usb_endpoint_descriptor *endpoint;
++ size_t num_bulk_in = 0;
++ size_t num_bulk_out = 0;
++ size_t min_num_bulk;
++ unsigned int i;
++
++ iface_desc = serial->interface->cur_altsetting;
++
++ for (i = 0; i < iface_desc->desc.bNumEndpoints; i++) {
++ endpoint = &iface_desc->endpoint[i].desc;
++ if (usb_endpoint_is_bulk_in(endpoint))
++ ++num_bulk_in;
++ if (usb_endpoint_is_bulk_out(endpoint))
++ ++num_bulk_out;
++ }
++
++ min_num_bulk = COMMAND_PORT + 1;
++ if (num_bulk_in < min_num_bulk || num_bulk_out < min_num_bulk)
++ return -ENODEV;
++
++ return 0;
++}
++
+ static int whiteheat_attach(struct usb_serial *serial)
+ {
+ struct usb_serial_port *command_port;
+diff --git a/drivers/watchdog/imgpdc_wdt.c b/drivers/watchdog/imgpdc_wdt.c
+index 0f73621827ab..15ab07230960 100644
+--- a/drivers/watchdog/imgpdc_wdt.c
++++ b/drivers/watchdog/imgpdc_wdt.c
+@@ -316,6 +316,7 @@ static int pdc_wdt_remove(struct platform_device *pdev)
+ {
+ struct pdc_wdt_dev *pdc_wdt = platform_get_drvdata(pdev);
+
++ unregister_restart_handler(&pdc_wdt->restart_handler);
+ pdc_wdt_stop(&pdc_wdt->wdt_dev);
+ watchdog_unregister_device(&pdc_wdt->wdt_dev);
+ clk_disable_unprepare(pdc_wdt->wdt_clk);
+diff --git a/drivers/watchdog/sunxi_wdt.c b/drivers/watchdog/sunxi_wdt.c
+index a29afb37c48c..47bd8a14d01f 100644
+--- a/drivers/watchdog/sunxi_wdt.c
++++ b/drivers/watchdog/sunxi_wdt.c
+@@ -184,7 +184,7 @@ static int sunxi_wdt_start(struct watchdog_device *wdt_dev)
+ /* Set system reset function */
+ reg = readl(wdt_base + regs->wdt_cfg);
+ reg &= ~(regs->wdt_reset_mask);
+- reg |= ~(regs->wdt_reset_val);
++ reg |= regs->wdt_reset_val;
+ writel(reg, wdt_base + regs->wdt_cfg);
+
+ /* Enable watchdog */
+diff --git a/drivers/xen/preempt.c b/drivers/xen/preempt.c
+index a1800c150839..08cb419eb4e6 100644
+--- a/drivers/xen/preempt.c
++++ b/drivers/xen/preempt.c
+@@ -31,7 +31,7 @@ EXPORT_SYMBOL_GPL(xen_in_preemptible_hcall);
+ asmlinkage __visible void xen_maybe_preempt_hcall(void)
+ {
+ if (unlikely(__this_cpu_read(xen_in_preemptible_hcall)
+- && should_resched())) {
++ && need_resched())) {
+ /*
+ * Clear flag as we may be rescheduled on a different
+ * cpu.
+diff --git a/fs/block_dev.c b/fs/block_dev.c
+index 198243717da5..1170f8ce5e7f 100644
+--- a/fs/block_dev.c
++++ b/fs/block_dev.c
+@@ -1241,6 +1241,13 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
+ goto out_clear;
+ }
+ bd_set_size(bdev, (loff_t)bdev->bd_part->nr_sects << 9);
++ /*
++ * If the partition is not aligned on a page
++ * boundary, we can't do dax I/O to it.
++ */
++ if ((bdev->bd_part->start_sect % (PAGE_SIZE / 512)) ||
++ (bdev->bd_part->nr_sects % (PAGE_SIZE / 512)))
++ bdev->bd_inode->i_flags &= ~S_DAX;
+ }
+ } else {
+ if (bdev->bd_contains == bdev) {
+diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
+index 02d05817cbdf..3fc4fec9b94e 100644
+--- a/fs/btrfs/extent_io.c
++++ b/fs/btrfs/extent_io.c
+@@ -2798,7 +2798,8 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
+ bio_end_io_t end_io_func,
+ int mirror_num,
+ unsigned long prev_bio_flags,
+- unsigned long bio_flags)
++ unsigned long bio_flags,
++ bool force_bio_submit)
+ {
+ int ret = 0;
+ struct bio *bio;
+@@ -2816,6 +2817,7 @@ static int submit_extent_page(int rw, struct extent_io_tree *tree,
+ contig = bio_end_sector(bio) == sector;
+
+ if (prev_bio_flags != bio_flags || !contig ||
++ force_bio_submit ||
+ merge_bio(rw, tree, page, offset, page_size, bio, bio_flags) ||
+ bio_add_page(bio, page, page_size, offset) < page_size) {
+ ret = submit_one_bio(rw, bio, mirror_num,
+@@ -2909,7 +2911,8 @@ static int __do_readpage(struct extent_io_tree *tree,
+ get_extent_t *get_extent,
+ struct extent_map **em_cached,
+ struct bio **bio, int mirror_num,
+- unsigned long *bio_flags, int rw)
++ unsigned long *bio_flags, int rw,
++ u64 *prev_em_start)
+ {
+ struct inode *inode = page->mapping->host;
+ u64 start = page_offset(page);
+@@ -2957,6 +2960,7 @@ static int __do_readpage(struct extent_io_tree *tree,
+ }
+ while (cur <= end) {
+ unsigned long pnr = (last_byte >> PAGE_CACHE_SHIFT) + 1;
++ bool force_bio_submit = false;
+
+ if (cur >= last_byte) {
+ char *userpage;
+@@ -3007,6 +3011,49 @@ static int __do_readpage(struct extent_io_tree *tree,
+ block_start = em->block_start;
+ if (test_bit(EXTENT_FLAG_PREALLOC, &em->flags))
+ block_start = EXTENT_MAP_HOLE;
++
++ /*
++ * If we have a file range that points to a compressed extent
++ * and it's followed by a consecutive file range that points to
++ * to the same compressed extent (possibly with a different
++ * offset and/or length, so it either points to the whole extent
++ * or only part of it), we must make sure we do not submit a
++ * single bio to populate the pages for the 2 ranges because
++ * this makes the compressed extent read zero out the pages
++ * belonging to the 2nd range. Imagine the following scenario:
++ *
++ * File layout
++ * [0 - 8K] [8K - 24K]
++ * | |
++ * | |
++ * points to extent X, points to extent X,
++ * offset 4K, length of 8K offset 0, length 16K
++ *
++ * [extent X, compressed length = 4K uncompressed length = 16K]
++ *
++ * If the bio to read the compressed extent covers both ranges,
++ * it will decompress extent X into the pages belonging to the
++ * first range and then it will stop, zeroing out the remaining
++ * pages that belong to the other range that points to extent X.
++ * So here we make sure we submit 2 bios, one for the first
++ * range and another one for the second range. Both will target
++ * the same physical extent from disk, but we can't currently
++ * make the compressed bio endio callback populate the pages
++ * for both ranges because each compressed bio is tightly
++ * coupled with a single extent map, and each range can have
++ * an extent map with a different offset value relative to the
++ * uncompressed data of our extent and different lengths. This
++ * is a corner case so we prioritize correctness and accept
++ * non-optimal behavior (submitting 2 bios for the same extent).
++ */
++ if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags) &&
++ prev_em_start && *prev_em_start != (u64)-1 &&
++ *prev_em_start != em->orig_start)
++ force_bio_submit = true;
++
++ if (prev_em_start)
++ *prev_em_start = em->orig_start;
++
+ free_extent_map(em);
+ em = NULL;
+
+@@ -3056,7 +3103,8 @@ static int __do_readpage(struct extent_io_tree *tree,
+ bdev, bio, pnr,
+ end_bio_extent_readpage, mirror_num,
+ *bio_flags,
+- this_bio_flag);
++ this_bio_flag,
++ force_bio_submit);
+ if (!ret) {
+ nr++;
+ *bio_flags = this_bio_flag;
+@@ -3083,7 +3131,8 @@ static inline void __do_contiguous_readpages(struct extent_io_tree *tree,
+ get_extent_t *get_extent,
+ struct extent_map **em_cached,
+ struct bio **bio, int mirror_num,
+- unsigned long *bio_flags, int rw)
++ unsigned long *bio_flags, int rw,
++ u64 *prev_em_start)
+ {
+ struct inode *inode;
+ struct btrfs_ordered_extent *ordered;
+@@ -3103,7 +3152,7 @@ static inline void __do_contiguous_readpages(struct extent_io_tree *tree,
+
+ for (index = 0; index < nr_pages; index++) {
+ __do_readpage(tree, pages[index], get_extent, em_cached, bio,
+- mirror_num, bio_flags, rw);
++ mirror_num, bio_flags, rw, prev_em_start);
+ page_cache_release(pages[index]);
+ }
+ }
+@@ -3113,7 +3162,8 @@ static void __extent_readpages(struct extent_io_tree *tree,
+ int nr_pages, get_extent_t *get_extent,
+ struct extent_map **em_cached,
+ struct bio **bio, int mirror_num,
+- unsigned long *bio_flags, int rw)
++ unsigned long *bio_flags, int rw,
++ u64 *prev_em_start)
+ {
+ u64 start = 0;
+ u64 end = 0;
+@@ -3134,7 +3184,7 @@ static void __extent_readpages(struct extent_io_tree *tree,
+ index - first_index, start,
+ end, get_extent, em_cached,
+ bio, mirror_num, bio_flags,
+- rw);
++ rw, prev_em_start);
+ start = page_start;
+ end = start + PAGE_CACHE_SIZE - 1;
+ first_index = index;
+@@ -3145,7 +3195,8 @@ static void __extent_readpages(struct extent_io_tree *tree,
+ __do_contiguous_readpages(tree, &pages[first_index],
+ index - first_index, start,
+ end, get_extent, em_cached, bio,
+- mirror_num, bio_flags, rw);
++ mirror_num, bio_flags, rw,
++ prev_em_start);
+ }
+
+ static int __extent_read_full_page(struct extent_io_tree *tree,
+@@ -3171,7 +3222,7 @@ static int __extent_read_full_page(struct extent_io_tree *tree,
+ }
+
+ ret = __do_readpage(tree, page, get_extent, NULL, bio, mirror_num,
+- bio_flags, rw);
++ bio_flags, rw, NULL);
+ return ret;
+ }
+
+@@ -3197,7 +3248,7 @@ int extent_read_full_page_nolock(struct extent_io_tree *tree, struct page *page,
+ int ret;
+
+ ret = __do_readpage(tree, page, get_extent, NULL, &bio, mirror_num,
+- &bio_flags, READ);
++ &bio_flags, READ, NULL);
+ if (bio)
+ ret = submit_one_bio(READ, bio, mirror_num, bio_flags);
+ return ret;
+@@ -3450,7 +3501,7 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
+ sector, iosize, pg_offset,
+ bdev, &epd->bio, max_nr,
+ end_bio_extent_writepage,
+- 0, 0, 0);
++ 0, 0, 0, false);
+ if (ret)
+ SetPageError(page);
+ }
+@@ -3752,7 +3803,7 @@ static noinline_for_stack int write_one_eb(struct extent_buffer *eb,
+ ret = submit_extent_page(rw, tree, p, offset >> 9,
+ PAGE_CACHE_SIZE, 0, bdev, &epd->bio,
+ -1, end_bio_extent_buffer_writepage,
+- 0, epd->bio_flags, bio_flags);
++ 0, epd->bio_flags, bio_flags, false);
+ epd->bio_flags = bio_flags;
+ if (ret) {
+ set_btree_ioerr(p);
+@@ -4156,6 +4207,7 @@ int extent_readpages(struct extent_io_tree *tree,
+ struct page *page;
+ struct extent_map *em_cached = NULL;
+ int nr = 0;
++ u64 prev_em_start = (u64)-1;
+
+ for (page_idx = 0; page_idx < nr_pages; page_idx++) {
+ page = list_entry(pages->prev, struct page, lru);
+@@ -4172,12 +4224,12 @@ int extent_readpages(struct extent_io_tree *tree,
+ if (nr < ARRAY_SIZE(pagepool))
+ continue;
+ __extent_readpages(tree, pagepool, nr, get_extent, &em_cached,
+- &bio, 0, &bio_flags, READ);
++ &bio, 0, &bio_flags, READ, &prev_em_start);
+ nr = 0;
+ }
+ if (nr)
+ __extent_readpages(tree, pagepool, nr, get_extent, &em_cached,
+- &bio, 0, &bio_flags, READ);
++ &bio, 0, &bio_flags, READ, &prev_em_start);
+
+ if (em_cached)
+ free_extent_map(em_cached);
+diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
+index e33dff356460..b54e63038b96 100644
+--- a/fs/btrfs/inode.c
++++ b/fs/btrfs/inode.c
+@@ -5051,7 +5051,8 @@ void btrfs_evict_inode(struct inode *inode)
+ goto no_delete;
+ }
+ /* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
+- btrfs_wait_ordered_range(inode, 0, (u64)-1);
++ if (!special_file(inode->i_mode))
++ btrfs_wait_ordered_range(inode, 0, (u64)-1);
+
+ btrfs_free_io_failure_record(inode, 0, (u64)-1);
+
+diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c
+index aa0dc2573374..afa09fce8151 100644
+--- a/fs/cifs/cifsencrypt.c
++++ b/fs/cifs/cifsencrypt.c
+@@ -444,6 +444,48 @@ find_domain_name(struct cifs_ses *ses, const struct nls_table *nls_cp)
+ return 0;
+ }
+
++/* Server has provided av pairs/target info in the type 2 challenge
++ * packet and we have plucked it and stored within smb session.
++ * We parse that blob here to find the server given timestamp
++ * as part of ntlmv2 authentication (or local current time as
++ * default in case of failure)
++ */
++static __le64
++find_timestamp(struct cifs_ses *ses)
++{
++ unsigned int attrsize;
++ unsigned int type;
++ unsigned int onesize = sizeof(struct ntlmssp2_name);
++ unsigned char *blobptr;
++ unsigned char *blobend;
++ struct ntlmssp2_name *attrptr;
++
++ if (!ses->auth_key.len || !ses->auth_key.response)
++ return 0;
++
++ blobptr = ses->auth_key.response;
++ blobend = blobptr + ses->auth_key.len;
++
++ while (blobptr + onesize < blobend) {
++ attrptr = (struct ntlmssp2_name *) blobptr;
++ type = le16_to_cpu(attrptr->type);
++ if (type == NTLMSSP_AV_EOL)
++ break;
++ blobptr += 2; /* advance attr type */
++ attrsize = le16_to_cpu(attrptr->length);
++ blobptr += 2; /* advance attr size */
++ if (blobptr + attrsize > blobend)
++ break;
++ if (type == NTLMSSP_AV_TIMESTAMP) {
++ if (attrsize == sizeof(u64))
++ return *((__le64 *)blobptr);
++ }
++ blobptr += attrsize; /* advance attr value */
++ }
++
++ return cpu_to_le64(cifs_UnixTimeToNT(CURRENT_TIME));
++}
++
+ static int calc_ntlmv2_hash(struct cifs_ses *ses, char *ntlmv2_hash,
+ const struct nls_table *nls_cp)
+ {
+@@ -641,6 +683,7 @@ setup_ntlmv2_rsp(struct cifs_ses *ses, const struct nls_table *nls_cp)
+ struct ntlmv2_resp *ntlmv2;
+ char ntlmv2_hash[16];
+ unsigned char *tiblob = NULL; /* target info blob */
++ __le64 rsp_timestamp;
+
+ if (ses->server->negflavor == CIFS_NEGFLAVOR_EXTENDED) {
+ if (!ses->domainName) {
+@@ -659,6 +702,12 @@ setup_ntlmv2_rsp(struct cifs_ses *ses, const struct nls_table *nls_cp)
+ }
+ }
+
++ /* Must be within 5 minutes of the server (or in range +/-2h
++ * in case of Mac OS X), so simply carry over server timestamp
++ * (as Windows 7 does)
++ */
++ rsp_timestamp = find_timestamp(ses);
++
+ baselen = CIFS_SESS_KEY_SIZE + sizeof(struct ntlmv2_resp);
+ tilen = ses->auth_key.len;
+ tiblob = ses->auth_key.response;
+@@ -675,8 +724,8 @@ setup_ntlmv2_rsp(struct cifs_ses *ses, const struct nls_table *nls_cp)
+ (ses->auth_key.response + CIFS_SESS_KEY_SIZE);
+ ntlmv2->blob_signature = cpu_to_le32(0x00000101);
+ ntlmv2->reserved = 0;
+- /* Must be within 5 minutes of the server */
+- ntlmv2->time = cpu_to_le64(cifs_UnixTimeToNT(CURRENT_TIME));
++ ntlmv2->time = rsp_timestamp;
++
+ get_random_bytes(&ntlmv2->client_chal, sizeof(ntlmv2->client_chal));
+ ntlmv2->reserved2 = 0;
+
+diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
+index f621b44cb800..6b66dd5d1540 100644
+--- a/fs/cifs/inode.c
++++ b/fs/cifs/inode.c
+@@ -2034,7 +2034,6 @@ cifs_set_file_size(struct inode *inode, struct iattr *attrs,
+ struct tcon_link *tlink = NULL;
+ struct cifs_tcon *tcon = NULL;
+ struct TCP_Server_Info *server;
+- struct cifs_io_parms io_parms;
+
+ /*
+ * To avoid spurious oplock breaks from server, in the case of
+@@ -2056,18 +2055,6 @@ cifs_set_file_size(struct inode *inode, struct iattr *attrs,
+ rc = -ENOSYS;
+ cifsFileInfo_put(open_file);
+ cifs_dbg(FYI, "SetFSize for attrs rc = %d\n", rc);
+- if ((rc == -EINVAL) || (rc == -EOPNOTSUPP)) {
+- unsigned int bytes_written;
+-
+- io_parms.netfid = open_file->fid.netfid;
+- io_parms.pid = open_file->pid;
+- io_parms.tcon = tcon;
+- io_parms.offset = 0;
+- io_parms.length = attrs->ia_size;
+- rc = CIFSSMBWrite(xid, &io_parms, &bytes_written,
+- NULL, NULL, 1);
+- cifs_dbg(FYI, "Wrt seteof rc %d\n", rc);
+- }
+ } else
+ rc = -EINVAL;
+
+@@ -2093,28 +2080,7 @@ cifs_set_file_size(struct inode *inode, struct iattr *attrs,
+ else
+ rc = -ENOSYS;
+ cifs_dbg(FYI, "SetEOF by path (setattrs) rc = %d\n", rc);
+- if ((rc == -EINVAL) || (rc == -EOPNOTSUPP)) {
+- __u16 netfid;
+- int oplock = 0;
+
+- rc = SMBLegacyOpen(xid, tcon, full_path, FILE_OPEN,
+- GENERIC_WRITE, CREATE_NOT_DIR, &netfid,
+- &oplock, NULL, cifs_sb->local_nls,
+- cifs_remap(cifs_sb));
+- if (rc == 0) {
+- unsigned int bytes_written;
+-
+- io_parms.netfid = netfid;
+- io_parms.pid = current->tgid;
+- io_parms.tcon = tcon;
+- io_parms.offset = 0;
+- io_parms.length = attrs->ia_size;
+- rc = CIFSSMBWrite(xid, &io_parms, &bytes_written, NULL,
+- NULL, 1);
+- cifs_dbg(FYI, "wrt seteof rc %d\n", rc);
+- CIFSSMBClose(xid, tcon, netfid);
+- }
+- }
+ if (tlink)
+ cifs_put_tlink(tlink);
+
+diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
+index df91bcf56d67..18da19f4f811 100644
+--- a/fs/cifs/smb2ops.c
++++ b/fs/cifs/smb2ops.c
+@@ -50,9 +50,13 @@ change_conf(struct TCP_Server_Info *server)
+ break;
+ default:
+ server->echoes = true;
+- server->oplocks = true;
++ if (enable_oplocks) {
++ server->oplocks = true;
++ server->oplock_credits = 1;
++ } else
++ server->oplocks = false;
++
+ server->echo_credits = 1;
+- server->oplock_credits = 1;
+ }
+ server->credits -= server->echo_credits + server->oplock_credits;
+ return 0;
+diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
+index b8b4f08ee094..60dd83164ed6 100644
+--- a/fs/cifs/smb2pdu.c
++++ b/fs/cifs/smb2pdu.c
+@@ -46,6 +46,7 @@
+ #include "smb2status.h"
+ #include "smb2glob.h"
+ #include "cifspdu.h"
++#include "cifs_spnego.h"
+
+ /*
+ * The following table defines the expected "StructureSize" of SMB2 requests
+@@ -486,19 +487,15 @@ SMB2_negotiate(const unsigned int xid, struct cifs_ses *ses)
+ cifs_dbg(FYI, "missing security blob on negprot\n");
+
+ rc = cifs_enable_signing(server, ses->sign);
+-#ifdef CONFIG_SMB2_ASN1 /* BB REMOVEME when updated asn1.c ready */
+ if (rc)
+ goto neg_exit;
+- if (blob_length)
++ if (blob_length) {
+ rc = decode_negTokenInit(security_blob, blob_length, server);
+- if (rc == 1)
+- rc = 0;
+- else if (rc == 0) {
+- rc = -EIO;
+- goto neg_exit;
++ if (rc == 1)
++ rc = 0;
++ else if (rc == 0)
++ rc = -EIO;
+ }
+-#endif
+-
+ neg_exit:
+ free_rsp_buf(resp_buftype, rsp);
+ return rc;
+@@ -592,7 +589,8 @@ SMB2_sess_setup(const unsigned int xid, struct cifs_ses *ses,
+ __le32 phase = NtLmNegotiate; /* NTLMSSP, if needed, is multistage */
+ struct TCP_Server_Info *server = ses->server;
+ u16 blob_length = 0;
+- char *security_blob;
++ struct key *spnego_key = NULL;
++ char *security_blob = NULL;
+ char *ntlmssp_blob = NULL;
+ bool use_spnego = false; /* else use raw ntlmssp */
+
+@@ -620,7 +618,8 @@ SMB2_sess_setup(const unsigned int xid, struct cifs_ses *ses,
+ ses->ntlmssp->sesskey_per_smbsess = true;
+
+ /* FIXME: allow for other auth types besides NTLMSSP (e.g. krb5) */
+- ses->sectype = RawNTLMSSP;
++ if (ses->sectype != Kerberos && ses->sectype != RawNTLMSSP)
++ ses->sectype = RawNTLMSSP;
+
+ ssetup_ntlmssp_authenticate:
+ if (phase == NtLmChallenge)
+@@ -649,7 +648,48 @@ ssetup_ntlmssp_authenticate:
+ iov[0].iov_base = (char *)req;
+ /* 4 for rfc1002 length field and 1 for pad */
+ iov[0].iov_len = get_rfc1002_length(req) + 4 - 1;
+- if (phase == NtLmNegotiate) {
++
++ if (ses->sectype == Kerberos) {
++#ifdef CONFIG_CIFS_UPCALL
++ struct cifs_spnego_msg *msg;
++
++ spnego_key = cifs_get_spnego_key(ses);
++ if (IS_ERR(spnego_key)) {
++ rc = PTR_ERR(spnego_key);
++ spnego_key = NULL;
++ goto ssetup_exit;
++ }
++
++ msg = spnego_key->payload.data;
++ /*
++ * check version field to make sure that cifs.upcall is
++ * sending us a response in an expected form
++ */
++ if (msg->version != CIFS_SPNEGO_UPCALL_VERSION) {
++ cifs_dbg(VFS,
++ "bad cifs.upcall version. Expected %d got %d",
++ CIFS_SPNEGO_UPCALL_VERSION, msg->version);
++ rc = -EKEYREJECTED;
++ goto ssetup_exit;
++ }
++ ses->auth_key.response = kmemdup(msg->data, msg->sesskey_len,
++ GFP_KERNEL);
++ if (!ses->auth_key.response) {
++ cifs_dbg(VFS,
++ "Kerberos can't allocate (%u bytes) memory",
++ msg->sesskey_len);
++ rc = -ENOMEM;
++ goto ssetup_exit;
++ }
++ ses->auth_key.len = msg->sesskey_len;
++ blob_length = msg->secblob_len;
++ iov[1].iov_base = msg->data + msg->sesskey_len;
++ iov[1].iov_len = blob_length;
++#else
++ rc = -EOPNOTSUPP;
++ goto ssetup_exit;
++#endif /* CONFIG_CIFS_UPCALL */
++ } else if (phase == NtLmNegotiate) { /* if not krb5 must be ntlmssp */
+ ntlmssp_blob = kmalloc(sizeof(struct _NEGOTIATE_MESSAGE),
+ GFP_KERNEL);
+ if (ntlmssp_blob == NULL) {
+@@ -672,6 +712,8 @@ ssetup_ntlmssp_authenticate:
+ /* with raw NTLMSSP we don't encapsulate in SPNEGO */
+ security_blob = ntlmssp_blob;
+ }
++ iov[1].iov_base = security_blob;
++ iov[1].iov_len = blob_length;
+ } else if (phase == NtLmAuthenticate) {
+ req->hdr.SessionId = ses->Suid;
+ ntlmssp_blob = kzalloc(sizeof(struct _NEGOTIATE_MESSAGE) + 500,
+@@ -699,6 +741,8 @@ ssetup_ntlmssp_authenticate:
+ } else {
+ security_blob = ntlmssp_blob;
+ }
++ iov[1].iov_base = security_blob;
++ iov[1].iov_len = blob_length;
+ } else {
+ cifs_dbg(VFS, "illegal ntlmssp phase\n");
+ rc = -EIO;
+@@ -710,8 +754,6 @@ ssetup_ntlmssp_authenticate:
+ cpu_to_le16(sizeof(struct smb2_sess_setup_req) -
+ 1 /* pad */ - 4 /* rfc1001 len */);
+ req->SecurityBufferLength = cpu_to_le16(blob_length);
+- iov[1].iov_base = security_blob;
+- iov[1].iov_len = blob_length;
+
+ inc_rfc1001_len(req, blob_length - 1 /* pad */);
+
+@@ -722,6 +764,7 @@ ssetup_ntlmssp_authenticate:
+
+ kfree(security_blob);
+ rsp = (struct smb2_sess_setup_rsp *)iov[0].iov_base;
++ ses->Suid = rsp->hdr.SessionId;
+ if (resp_buftype != CIFS_NO_BUFFER &&
+ rsp->hdr.Status == STATUS_MORE_PROCESSING_REQUIRED) {
+ if (phase != NtLmNegotiate) {
+@@ -739,7 +782,6 @@ ssetup_ntlmssp_authenticate:
+ /* NTLMSSP Negotiate sent now processing challenge (response) */
+ phase = NtLmChallenge; /* process ntlmssp challenge */
+ rc = 0; /* MORE_PROCESSING is not an error here but expected */
+- ses->Suid = rsp->hdr.SessionId;
+ rc = decode_ntlmssp_challenge(rsp->Buffer,
+ le16_to_cpu(rsp->SecurityBufferLength), ses);
+ }
+@@ -796,6 +838,10 @@ keygen_exit:
+ kfree(ses->auth_key.response);
+ ses->auth_key.response = NULL;
+ }
++ if (spnego_key) {
++ key_invalidate(spnego_key);
++ key_put(spnego_key);
++ }
+ kfree(ses->ntlmssp);
+
+ return rc;
+diff --git a/fs/dax.c b/fs/dax.c
+index a7f77e1fa18c..ef35a2014580 100644
+--- a/fs/dax.c
++++ b/fs/dax.c
+@@ -116,7 +116,8 @@ static ssize_t dax_io(struct inode *inode, struct iov_iter *iter,
+ unsigned len;
+ if (pos == max) {
+ unsigned blkbits = inode->i_blkbits;
+- sector_t block = pos >> blkbits;
++ long page = pos >> PAGE_SHIFT;
++ sector_t block = page << (PAGE_SHIFT - blkbits);
+ unsigned first = pos - (block << blkbits);
+ long size;
+
+diff --git a/fs/dcache.c b/fs/dcache.c
+index 9b5fe503f6cb..e3b44ca75a1b 100644
+--- a/fs/dcache.c
++++ b/fs/dcache.c
+@@ -2926,6 +2926,13 @@ restart:
+
+ if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
+ struct mount *parent = ACCESS_ONCE(mnt->mnt_parent);
++ /* Escaped? */
++ if (dentry != vfsmnt->mnt_root) {
++ bptr = *buffer;
++ blen = *buflen;
++ error = 3;
++ break;
++ }
+ /* Global root? */
+ if (mnt != parent) {
+ dentry = ACCESS_ONCE(mnt->mnt_mountpoint);
+diff --git a/fs/namei.c b/fs/namei.c
+index 1c2105ed20c5..36df4818a635 100644
+--- a/fs/namei.c
++++ b/fs/namei.c
+@@ -560,6 +560,24 @@ static int __nd_alloc_stack(struct nameidata *nd)
+ return 0;
+ }
+
++/**
++ * path_connected - Verify that a path->dentry is below path->mnt.mnt_root
++ * @path: path to verify
++ *
++ * Rename can sometimes move a file or directory outside of a bind
++ * mount; path_connected() allows those cases to be detected.
++ */
++static bool path_connected(const struct path *path)
++{
++ struct vfsmount *mnt = path->mnt;
++
++ /* Only bind mounts can have disconnected paths */
++ if (mnt->mnt_root == mnt->mnt_sb->s_root)
++ return true;
++
++ return is_subdir(path->dentry, mnt->mnt_root);
++}
++
+ static inline int nd_alloc_stack(struct nameidata *nd)
+ {
+ if (likely(nd->depth != EMBEDDED_LEVELS))
+@@ -1296,6 +1314,8 @@ static int follow_dotdot_rcu(struct nameidata *nd)
+ return -ECHILD;
+ nd->path.dentry = parent;
+ nd->seq = seq;
++ if (unlikely(!path_connected(&nd->path)))
++ return -ENOENT;
+ break;
+ } else {
+ struct mount *mnt = real_mount(nd->path.mnt);
+@@ -1396,7 +1416,7 @@ static void follow_mount(struct path *path)
+ }
+ }
+
+-static void follow_dotdot(struct nameidata *nd)
++static int follow_dotdot(struct nameidata *nd)
+ {
+ if (!nd->root.mnt)
+ set_root(nd);
+@@ -1412,6 +1432,8 @@ static void follow_dotdot(struct nameidata *nd)
+ /* rare case of legitimate dget_parent()... */
+ nd->path.dentry = dget_parent(nd->path.dentry);
+ dput(old);
++ if (unlikely(!path_connected(&nd->path)))
++ return -ENOENT;
+ break;
+ }
+ if (!follow_up(&nd->path))
+@@ -1419,6 +1441,7 @@ static void follow_dotdot(struct nameidata *nd)
+ }
+ follow_mount(&nd->path);
+ nd->inode = nd->path.dentry->d_inode;
++ return 0;
+ }
+
+ /*
+@@ -1535,8 +1558,6 @@ static int lookup_fast(struct nameidata *nd,
+ negative = d_is_negative(dentry);
+ if (read_seqcount_retry(&dentry->d_seq, seq))
+ return -ECHILD;
+- if (negative)
+- return -ENOENT;
+
+ /*
+ * This sequence count validates that the parent had no
+@@ -1557,6 +1578,12 @@ static int lookup_fast(struct nameidata *nd,
+ goto unlazy;
+ }
+ }
++ /*
++ * Note: do negative dentry check after revalidation in
++ * case that drops it.
++ */
++ if (negative)
++ return -ENOENT;
+ path->mnt = mnt;
+ path->dentry = dentry;
+ if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
+@@ -1634,7 +1661,7 @@ static inline int handle_dots(struct nameidata *nd, int type)
+ if (nd->flags & LOOKUP_RCU) {
+ return follow_dotdot_rcu(nd);
+ } else
+- follow_dotdot(nd);
++ return follow_dotdot(nd);
+ }
+ return 0;
+ }
+diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c
+index 029d688a969f..c56886829708 100644
+--- a/fs/nfs/delegation.c
++++ b/fs/nfs/delegation.c
+@@ -113,7 +113,8 @@ out:
+ return status;
+ }
+
+-static int nfs_delegation_claim_opens(struct inode *inode, const nfs4_stateid *stateid)
++static int nfs_delegation_claim_opens(struct inode *inode,
++ const nfs4_stateid *stateid, fmode_t type)
+ {
+ struct nfs_inode *nfsi = NFS_I(inode);
+ struct nfs_open_context *ctx;
+@@ -140,7 +141,7 @@ again:
+ /* Block nfs4_proc_unlck */
+ mutex_lock(&sp->so_delegreturn_mutex);
+ seq = raw_seqcount_begin(&sp->so_reclaim_seqcount);
+- err = nfs4_open_delegation_recall(ctx, state, stateid);
++ err = nfs4_open_delegation_recall(ctx, state, stateid, type);
+ if (!err)
+ err = nfs_delegation_claim_locks(ctx, state, stateid);
+ if (!err && read_seqcount_retry(&sp->so_reclaim_seqcount, seq))
+@@ -411,7 +412,8 @@ static int nfs_end_delegation_return(struct inode *inode, struct nfs_delegation
+ do {
+ if (test_bit(NFS_DELEGATION_REVOKED, &delegation->flags))
+ break;
+- err = nfs_delegation_claim_opens(inode, &delegation->stateid);
++ err = nfs_delegation_claim_opens(inode, &delegation->stateid,
++ delegation->type);
+ if (!issync || err != -EAGAIN)
+ break;
+ /*
+diff --git a/fs/nfs/delegation.h b/fs/nfs/delegation.h
+index e3c20a3ccc93..785c8525b576 100644
+--- a/fs/nfs/delegation.h
++++ b/fs/nfs/delegation.h
+@@ -54,7 +54,7 @@ void nfs_delegation_reap_unclaimed(struct nfs_client *clp);
+
+ /* NFSv4 delegation-related procedures */
+ int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4_stateid *stateid, int issync);
+-int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state *state, const nfs4_stateid *stateid);
++int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state *state, const nfs4_stateid *stateid, fmode_t type);
+ int nfs4_lock_delegation_recall(struct file_lock *fl, struct nfs4_state *state, const nfs4_stateid *stateid);
+ bool nfs4_copy_delegation_stateid(nfs4_stateid *dst, struct inode *inode, fmode_t flags);
+
+diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
+index b34f2e228601..02ec07973bc4 100644
+--- a/fs/nfs/filelayout/filelayout.c
++++ b/fs/nfs/filelayout/filelayout.c
+@@ -629,23 +629,18 @@ out_put:
+ goto out;
+ }
+
+-static void filelayout_free_fh_array(struct nfs4_filelayout_segment *fl)
++static void _filelayout_free_lseg(struct nfs4_filelayout_segment *fl)
+ {
+ int i;
+
+- for (i = 0; i < fl->num_fh; i++) {
+- if (!fl->fh_array[i])
+- break;
+- kfree(fl->fh_array[i]);
++ if (fl->fh_array) {
++ for (i = 0; i < fl->num_fh; i++) {
++ if (!fl->fh_array[i])
++ break;
++ kfree(fl->fh_array[i]);
++ }
++ kfree(fl->fh_array);
+ }
+- kfree(fl->fh_array);
+- fl->fh_array = NULL;
+-}
+-
+-static void
+-_filelayout_free_lseg(struct nfs4_filelayout_segment *fl)
+-{
+- filelayout_free_fh_array(fl);
+ kfree(fl);
+ }
+
+@@ -716,21 +711,21 @@ filelayout_decode_layout(struct pnfs_layout_hdr *flo,
+ /* Do we want to use a mempool here? */
+ fl->fh_array[i] = kmalloc(sizeof(struct nfs_fh), gfp_flags);
+ if (!fl->fh_array[i])
+- goto out_err_free;
++ goto out_err;
+
+ p = xdr_inline_decode(&stream, 4);
+ if (unlikely(!p))
+- goto out_err_free;
++ goto out_err;
+ fl->fh_array[i]->size = be32_to_cpup(p++);
+ if (sizeof(struct nfs_fh) < fl->fh_array[i]->size) {
+ printk(KERN_ERR "NFS: Too big fh %d received %d\n",
+ i, fl->fh_array[i]->size);
+- goto out_err_free;
++ goto out_err;
+ }
+
+ p = xdr_inline_decode(&stream, fl->fh_array[i]->size);
+ if (unlikely(!p))
+- goto out_err_free;
++ goto out_err;
+ memcpy(fl->fh_array[i]->data, p, fl->fh_array[i]->size);
+ dprintk("DEBUG: %s: fh len %d\n", __func__,
+ fl->fh_array[i]->size);
+@@ -739,8 +734,6 @@ filelayout_decode_layout(struct pnfs_layout_hdr *flo,
+ __free_page(scratch);
+ return 0;
+
+-out_err_free:
+- filelayout_free_fh_array(fl);
+ out_err:
+ __free_page(scratch);
+ return -EIO;
+diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
+index d731bbf974aa..0f020e4d8421 100644
+--- a/fs/nfs/nfs42proc.c
++++ b/fs/nfs/nfs42proc.c
+@@ -175,10 +175,12 @@ loff_t nfs42_proc_llseek(struct file *filep, loff_t offset, int whence)
+ {
+ struct nfs_server *server = NFS_SERVER(file_inode(filep));
+ struct nfs4_exception exception = { };
+- int err;
++ loff_t err;
+
+ do {
+ err = _nfs42_proc_llseek(filep, offset, whence);
++ if (err >= 0)
++ break;
+ if (err == -ENOTSUPP)
+ return -EOPNOTSUPP;
+ err = nfs4_handle_exception(server, err, &exception);
+diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
+index 73c8204ad463..d2daacad3568 100644
+--- a/fs/nfs/nfs4proc.c
++++ b/fs/nfs/nfs4proc.c
+@@ -1127,6 +1127,21 @@ static int nfs4_wait_for_completion_rpc_task(struct rpc_task *task)
+ return ret;
+ }
+
++static bool nfs4_mode_match_open_stateid(struct nfs4_state *state,
++ fmode_t fmode)
++{
++ switch(fmode & (FMODE_READ|FMODE_WRITE)) {
++ case FMODE_READ|FMODE_WRITE:
++ return state->n_rdwr != 0;
++ case FMODE_WRITE:
++ return state->n_wronly != 0;
++ case FMODE_READ:
++ return state->n_rdonly != 0;
++ }
++ WARN_ON_ONCE(1);
++ return false;
++}
++
+ static int can_open_cached(struct nfs4_state *state, fmode_t mode, int open_mode)
+ {
+ int ret = 0;
+@@ -1561,17 +1576,13 @@ static struct nfs4_opendata *nfs4_open_recoverdata_alloc(struct nfs_open_context
+ return opendata;
+ }
+
+-static int nfs4_open_recover_helper(struct nfs4_opendata *opendata, fmode_t fmode, struct nfs4_state **res)
++static int nfs4_open_recover_helper(struct nfs4_opendata *opendata,
++ fmode_t fmode)
+ {
+ struct nfs4_state *newstate;
+ int ret;
+
+- if ((opendata->o_arg.claim == NFS4_OPEN_CLAIM_DELEGATE_CUR ||
+- opendata->o_arg.claim == NFS4_OPEN_CLAIM_DELEG_CUR_FH) &&
+- (opendata->o_arg.u.delegation_type & fmode) != fmode)
+- /* This mode can't have been delegated, so we must have
+- * a valid open_stateid to cover it - not need to reclaim.
+- */
++ if (!nfs4_mode_match_open_stateid(opendata->state, fmode))
+ return 0;
+ opendata->o_arg.open_flags = 0;
+ opendata->o_arg.fmode = fmode;
+@@ -1587,14 +1598,14 @@ static int nfs4_open_recover_helper(struct nfs4_opendata *opendata, fmode_t fmod
+ newstate = nfs4_opendata_to_nfs4_state(opendata);
+ if (IS_ERR(newstate))
+ return PTR_ERR(newstate);
++ if (newstate != opendata->state)
++ ret = -ESTALE;
+ nfs4_close_state(newstate, fmode);
+- *res = newstate;
+- return 0;
++ return ret;
+ }
+
+ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *state)
+ {
+- struct nfs4_state *newstate;
+ int ret;
+
+ /* Don't trigger recovery in nfs_test_and_clear_all_open_stateid */
+@@ -1605,27 +1616,15 @@ static int nfs4_open_recover(struct nfs4_opendata *opendata, struct nfs4_state *
+ clear_bit(NFS_DELEGATED_STATE, &state->flags);
+ clear_bit(NFS_OPEN_STATE, &state->flags);
+ smp_rmb();
+- if (state->n_rdwr != 0) {
+- ret = nfs4_open_recover_helper(opendata, FMODE_READ|FMODE_WRITE, &newstate);
+- if (ret != 0)
+- return ret;
+- if (newstate != state)
+- return -ESTALE;
+- }
+- if (state->n_wronly != 0) {
+- ret = nfs4_open_recover_helper(opendata, FMODE_WRITE, &newstate);
+- if (ret != 0)
+- return ret;
+- if (newstate != state)
+- return -ESTALE;
+- }
+- if (state->n_rdonly != 0) {
+- ret = nfs4_open_recover_helper(opendata, FMODE_READ, &newstate);
+- if (ret != 0)
+- return ret;
+- if (newstate != state)
+- return -ESTALE;
+- }
++ ret = nfs4_open_recover_helper(opendata, FMODE_READ|FMODE_WRITE);
++ if (ret != 0)
++ return ret;
++ ret = nfs4_open_recover_helper(opendata, FMODE_WRITE);
++ if (ret != 0)
++ return ret;
++ ret = nfs4_open_recover_helper(opendata, FMODE_READ);
++ if (ret != 0)
++ return ret;
+ /*
+ * We may have performed cached opens for all three recoveries.
+ * Check if we need to update the current stateid.
+@@ -1749,18 +1748,32 @@ static int nfs4_handle_delegation_recall_error(struct nfs_server *server, struct
+ return err;
+ }
+
+-int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state *state, const nfs4_stateid *stateid)
++int nfs4_open_delegation_recall(struct nfs_open_context *ctx,
++ struct nfs4_state *state, const nfs4_stateid *stateid,
++ fmode_t type)
+ {
+ struct nfs_server *server = NFS_SERVER(state->inode);
+ struct nfs4_opendata *opendata;
+- int err;
++ int err = 0;
+
+ opendata = nfs4_open_recoverdata_alloc(ctx, state,
+ NFS4_OPEN_CLAIM_DELEG_CUR_FH);
+ if (IS_ERR(opendata))
+ return PTR_ERR(opendata);
+ nfs4_stateid_copy(&opendata->o_arg.u.delegation, stateid);
+- err = nfs4_open_recover(opendata, state);
++ clear_bit(NFS_DELEGATED_STATE, &state->flags);
++ switch (type & (FMODE_READ|FMODE_WRITE)) {
++ case FMODE_READ|FMODE_WRITE:
++ case FMODE_WRITE:
++ err = nfs4_open_recover_helper(opendata, FMODE_READ|FMODE_WRITE);
++ if (err)
++ break;
++ err = nfs4_open_recover_helper(opendata, FMODE_WRITE);
++ if (err)
++ break;
++ case FMODE_READ:
++ err = nfs4_open_recover_helper(opendata, FMODE_READ);
++ }
+ nfs4_opendata_put(opendata);
+ return nfs4_handle_delegation_recall_error(server, state, stateid, err);
+ }
+diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
+index 7c5718ba625e..fe3ddd20ff89 100644
+--- a/fs/nfs/pagelist.c
++++ b/fs/nfs/pagelist.c
+@@ -508,7 +508,7 @@ size_t nfs_generic_pg_test(struct nfs_pageio_descriptor *desc,
+ * for it without upsetting the slab allocator.
+ */
+ if (((mirror->pg_count + req->wb_bytes) >> PAGE_SHIFT) *
+- sizeof(struct page) > PAGE_SIZE)
++ sizeof(struct page *) > PAGE_SIZE)
+ return 0;
+
+ return min(mirror->pg_bsize - mirror->pg_count, (size_t)req->wb_bytes);
+diff --git a/fs/nfs/read.c b/fs/nfs/read.c
+index ae0ff7a11b40..01b8cc8e8cfc 100644
+--- a/fs/nfs/read.c
++++ b/fs/nfs/read.c
+@@ -72,6 +72,9 @@ void nfs_pageio_reset_read_mds(struct nfs_pageio_descriptor *pgio)
+ {
+ struct nfs_pgio_mirror *mirror;
+
++ if (pgio->pg_ops && pgio->pg_ops->pg_cleanup)
++ pgio->pg_ops->pg_cleanup(pgio);
++
+ pgio->pg_ops = &nfs_pgio_rw_ops;
+
+ /* read path should never have more than one mirror */
+diff --git a/fs/nfs/write.c b/fs/nfs/write.c
+index fdee9270ca15..b45b465bc205 100644
+--- a/fs/nfs/write.c
++++ b/fs/nfs/write.c
+@@ -1223,7 +1223,7 @@ static int nfs_can_extend_write(struct file *file, struct page *page, struct ino
+ return 1;
+ if (!flctx || (list_empty_careful(&flctx->flc_flock) &&
+ list_empty_careful(&flctx->flc_posix)))
+- return 0;
++ return 1;
+
+ /* Check to see if there are whole file write locks */
+ ret = 0;
+@@ -1351,6 +1351,9 @@ void nfs_pageio_reset_write_mds(struct nfs_pageio_descriptor *pgio)
+ {
+ struct nfs_pgio_mirror *mirror;
+
++ if (pgio->pg_ops && pgio->pg_ops->pg_cleanup)
++ pgio->pg_ops->pg_cleanup(pgio);
++
+ pgio->pg_ops = &nfs_pgio_rw_ops;
+
+ nfs_pageio_stop_mirroring(pgio);
+diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
+index fdf4b41d0609..482cfd34472d 100644
+--- a/fs/ocfs2/dlm/dlmmaster.c
++++ b/fs/ocfs2/dlm/dlmmaster.c
+@@ -1439,6 +1439,7 @@ int dlm_master_request_handler(struct o2net_msg *msg, u32 len, void *data,
+ int found, ret;
+ int set_maybe;
+ int dispatch_assert = 0;
++ int dispatched = 0;
+
+ if (!dlm_grab(dlm))
+ return DLM_MASTER_RESP_NO;
+@@ -1658,15 +1659,18 @@ send_response:
+ mlog(ML_ERROR, "failed to dispatch assert master work\n");
+ response = DLM_MASTER_RESP_ERROR;
+ dlm_lockres_put(res);
+- } else
++ } else {
++ dispatched = 1;
+ __dlm_lockres_grab_inflight_worker(dlm, res);
++ }
+ spin_unlock(&res->spinlock);
+ } else {
+ if (res)
+ dlm_lockres_put(res);
+ }
+
+- dlm_put(dlm);
++ if (!dispatched)
++ dlm_put(dlm);
+ return response;
+ }
+
+@@ -2090,7 +2094,6 @@ int dlm_dispatch_assert_master(struct dlm_ctxt *dlm,
+
+
+ /* queue up work for dlm_assert_master_worker */
+- dlm_grab(dlm); /* get an extra ref for the work item */
+ dlm_init_work_item(dlm, item, dlm_assert_master_worker, NULL);
+ item->u.am.lockres = res; /* already have a ref */
+ /* can optionally ignore node numbers higher than this node */
+diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
+index ce12e0b1a31f..3d90ad7ff91f 100644
+--- a/fs/ocfs2/dlm/dlmrecovery.c
++++ b/fs/ocfs2/dlm/dlmrecovery.c
+@@ -1694,6 +1694,7 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
+ unsigned int hash;
+ int master = DLM_LOCK_RES_OWNER_UNKNOWN;
+ u32 flags = DLM_ASSERT_MASTER_REQUERY;
++ int dispatched = 0;
+
+ if (!dlm_grab(dlm)) {
+ /* since the domain has gone away on this
+@@ -1719,8 +1720,10 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
+ dlm_put(dlm);
+ /* sender will take care of this and retry */
+ return ret;
+- } else
++ } else {
++ dispatched = 1;
+ __dlm_lockres_grab_inflight_worker(dlm, res);
++ }
+ spin_unlock(&res->spinlock);
+ } else {
+ /* put.. incase we are not the master */
+@@ -1730,7 +1733,8 @@ int dlm_master_requery_handler(struct o2net_msg *msg, u32 len, void *data,
+ }
+ spin_unlock(&dlm->spinlock);
+
+- dlm_put(dlm);
++ if (!dispatched)
++ dlm_put(dlm);
+ return master;
+ }
+
+diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
+index 96f3448b6eb4..fd65b3f1923c 100644
+--- a/fs/ubifs/xattr.c
++++ b/fs/ubifs/xattr.c
+@@ -652,11 +652,8 @@ int ubifs_init_security(struct inode *dentry, struct inode *inode,
+ {
+ int err;
+
+- mutex_lock(&inode->i_mutex);
+ err = security_inode_init_security(inode, dentry, qstr,
+ &init_xattrs, 0);
+- mutex_unlock(&inode->i_mutex);
+-
+ if (err) {
+ struct ubifs_info *c = dentry->i_sb->s_fs_info;
+ ubifs_err(c, "cannot initialize security for inode %lu, error %d",
+diff --git a/include/asm-generic/preempt.h b/include/asm-generic/preempt.h
+index d0a7a4753db2..0bec580a4885 100644
+--- a/include/asm-generic/preempt.h
++++ b/include/asm-generic/preempt.h
+@@ -71,9 +71,10 @@ static __always_inline bool __preempt_count_dec_and_test(void)
+ /*
+ * Returns true when we need to resched and can (barring IRQ state).
+ */
+-static __always_inline bool should_resched(void)
++static __always_inline bool should_resched(int preempt_offset)
+ {
+- return unlikely(!preempt_count() && tif_need_resched());
++ return unlikely(preempt_count() == preempt_offset &&
++ tif_need_resched());
+ }
+
+ #ifdef CONFIG_PREEMPT
+diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
+index 83bfb87f5bf1..e2aadbc7151f 100644
+--- a/include/asm-generic/qspinlock.h
++++ b/include/asm-generic/qspinlock.h
+@@ -111,8 +111,8 @@ static inline void queued_spin_unlock_wait(struct qspinlock *lock)
+ cpu_relax();
+ }
+
+-#ifndef virt_queued_spin_lock
+-static __always_inline bool virt_queued_spin_lock(struct qspinlock *lock)
++#ifndef virt_spin_lock
++static __always_inline bool virt_spin_lock(struct qspinlock *lock)
+ {
+ return false;
+ }
+diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
+index 93755a629299..430c876ad717 100644
+--- a/include/linux/cgroup-defs.h
++++ b/include/linux/cgroup-defs.h
+@@ -463,31 +463,8 @@ struct cgroup_subsys {
+ unsigned int depends_on;
+ };
+
+-extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
+-
+-/**
+- * cgroup_threadgroup_change_begin - threadgroup exclusion for cgroups
+- * @tsk: target task
+- *
+- * Called from threadgroup_change_begin() and allows cgroup operations to
+- * synchronize against threadgroup changes using a percpu_rw_semaphore.
+- */
+-static inline void cgroup_threadgroup_change_begin(struct task_struct *tsk)
+-{
+- percpu_down_read(&cgroup_threadgroup_rwsem);
+-}
+-
+-/**
+- * cgroup_threadgroup_change_end - threadgroup exclusion for cgroups
+- * @tsk: target task
+- *
+- * Called from threadgroup_change_end(). Counterpart of
+- * cgroup_threadcgroup_change_begin().
+- */
+-static inline void cgroup_threadgroup_change_end(struct task_struct *tsk)
+-{
+- percpu_up_read(&cgroup_threadgroup_rwsem);
+-}
++void cgroup_threadgroup_change_begin(struct task_struct *tsk);
++void cgroup_threadgroup_change_end(struct task_struct *tsk);
+
+ #else /* CONFIG_CGROUPS */
+
+diff --git a/include/linux/init_task.h b/include/linux/init_task.h
+index e8493fee8160..bb9b075f0eb0 100644
+--- a/include/linux/init_task.h
++++ b/include/linux/init_task.h
+@@ -25,6 +25,13 @@
+ extern struct files_struct init_files;
+ extern struct fs_struct init_fs;
+
++#ifdef CONFIG_CGROUPS
++#define INIT_GROUP_RWSEM(sig) \
++ .group_rwsem = __RWSEM_INITIALIZER(sig.group_rwsem),
++#else
++#define INIT_GROUP_RWSEM(sig)
++#endif
++
+ #ifdef CONFIG_CPUSETS
+ #define INIT_CPUSET_SEQ(tsk) \
+ .mems_allowed_seq = SEQCNT_ZERO(tsk.mems_allowed_seq),
+@@ -48,6 +55,7 @@ extern struct fs_struct init_fs;
+ }, \
+ .cred_guard_mutex = \
+ __MUTEX_INITIALIZER(sig.cred_guard_mutex), \
++ INIT_GROUP_RWSEM(sig) \
+ }
+
+ extern struct nsproxy init_nsproxy;
+diff --git a/include/linux/mm.h b/include/linux/mm.h
+index bf6f117fcf4d..2b05068f5878 100644
+--- a/include/linux/mm.h
++++ b/include/linux/mm.h
+@@ -916,6 +916,27 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
+ #endif
+ }
+
++#ifdef CONFIG_MEMCG
++static inline struct mem_cgroup *page_memcg(struct page *page)
++{
++ return page->mem_cgroup;
++}
++
++static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
++{
++ page->mem_cgroup = memcg;
++}
++#else
++static inline struct mem_cgroup *page_memcg(struct page *page)
++{
++ return NULL;
++}
++
++static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
++{
++}
++#endif
++
+ /*
+ * Some inline functions in vmstat.h depend on page_zone()
+ */
+diff --git a/include/linux/preempt.h b/include/linux/preempt.h
+index 84991f185173..bea8dd8ff5e0 100644
+--- a/include/linux/preempt.h
++++ b/include/linux/preempt.h
+@@ -84,13 +84,21 @@
+ */
+ #define in_nmi() (preempt_count() & NMI_MASK)
+
++/*
++ * The preempt_count offset after preempt_disable();
++ */
+ #if defined(CONFIG_PREEMPT_COUNT)
+-# define PREEMPT_DISABLE_OFFSET 1
++# define PREEMPT_DISABLE_OFFSET PREEMPT_OFFSET
+ #else
+-# define PREEMPT_DISABLE_OFFSET 0
++# define PREEMPT_DISABLE_OFFSET 0
+ #endif
+
+ /*
++ * The preempt_count offset after spin_lock()
++ */
++#define PREEMPT_LOCK_OFFSET PREEMPT_DISABLE_OFFSET
++
++/*
+ * The preempt_count offset needed for things like:
+ *
+ * spin_lock_bh()
+@@ -103,7 +111,7 @@
+ *
+ * Work as expected.
+ */
+-#define SOFTIRQ_LOCK_OFFSET (SOFTIRQ_DISABLE_OFFSET + PREEMPT_DISABLE_OFFSET)
++#define SOFTIRQ_LOCK_OFFSET (SOFTIRQ_DISABLE_OFFSET + PREEMPT_LOCK_OFFSET)
+
+ /*
+ * Are we running in atomic context? WARNING: this macro cannot
+@@ -124,7 +132,8 @@
+ #if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
+ extern void preempt_count_add(int val);
+ extern void preempt_count_sub(int val);
+-#define preempt_count_dec_and_test() ({ preempt_count_sub(1); should_resched(); })
++#define preempt_count_dec_and_test() \
++ ({ preempt_count_sub(1); should_resched(0); })
+ #else
+ #define preempt_count_add(val) __preempt_count_add(val)
+ #define preempt_count_sub(val) __preempt_count_sub(val)
+@@ -184,7 +193,7 @@ do { \
+
+ #define preempt_check_resched() \
+ do { \
+- if (should_resched()) \
++ if (should_resched(0)) \
+ __preempt_schedule(); \
+ } while (0)
+
+diff --git a/include/linux/sched.h b/include/linux/sched.h
+index 04b5ada460b4..bfca8aa215d1 100644
+--- a/include/linux/sched.h
++++ b/include/linux/sched.h
+@@ -754,6 +754,18 @@ struct signal_struct {
+ unsigned audit_tty_log_passwd;
+ struct tty_audit_buf *tty_audit_buf;
+ #endif
++#ifdef CONFIG_CGROUPS
++ /*
++ * group_rwsem prevents new tasks from entering the threadgroup and
++ * member tasks from exiting, more specifically, setting of
++ * PF_EXITING. fork and exit paths are protected with this rwsem
++ * using threadgroup_change_begin/end(). Users which require
++ * threadgroup to remain stable should use threadgroup_[un]lock()
++ * which also takes care of exec path. Currently, cgroup is the
++ * only user.
++ */
++ struct rw_semaphore group_rwsem;
++#endif
+
+ oom_flags_t oom_flags;
+ short oom_score_adj; /* OOM kill score adjustment */
+@@ -2897,12 +2909,6 @@ extern int _cond_resched(void);
+
+ extern int __cond_resched_lock(spinlock_t *lock);
+
+-#ifdef CONFIG_PREEMPT_COUNT
+-#define PREEMPT_LOCK_OFFSET PREEMPT_OFFSET
+-#else
+-#define PREEMPT_LOCK_OFFSET 0
+-#endif
+-
+ #define cond_resched_lock(lock) ({ \
+ ___might_sleep(__FILE__, __LINE__, PREEMPT_LOCK_OFFSET);\
+ __cond_resched_lock(lock); \
+diff --git a/include/linux/security.h b/include/linux/security.h
+index 79d85ddf8093..2f4c1f7aa7db 100644
+--- a/include/linux/security.h
++++ b/include/linux/security.h
+@@ -946,7 +946,7 @@ static inline int security_task_prctl(int option, unsigned long arg2,
+ unsigned long arg4,
+ unsigned long arg5)
+ {
+- return cap_task_prctl(option, arg2, arg3, arg3, arg5);
++ return cap_task_prctl(option, arg2, arg3, arg4, arg5);
+ }
+
+ static inline void security_task_to_inode(struct task_struct *p, struct inode *inode)
+diff --git a/include/net/netfilter/br_netfilter.h b/include/net/netfilter/br_netfilter.h
+index bab824bde92c..d4c6b5f30acd 100644
+--- a/include/net/netfilter/br_netfilter.h
++++ b/include/net/netfilter/br_netfilter.h
+@@ -59,7 +59,7 @@ static inline unsigned int
+ br_nf_pre_routing_ipv6(const struct nf_hook_ops *ops, struct sk_buff *skb,
+ const struct nf_hook_state *state)
+ {
+- return NF_DROP;
++ return NF_ACCEPT;
+ }
+ #endif
+
+diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
+index 37cd3911d5c5..4023c4ce260f 100644
+--- a/include/net/netfilter/nf_conntrack.h
++++ b/include/net/netfilter/nf_conntrack.h
+@@ -292,6 +292,7 @@ extern unsigned int nf_conntrack_hash_rnd;
+ void init_nf_conntrack_hash_rnd(void);
+
+ struct nf_conn *nf_ct_tmpl_alloc(struct net *net, u16 zone, gfp_t flags);
++void nf_ct_tmpl_free(struct nf_conn *tmpl);
+
+ #define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count)
+ #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
+diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
+index 2a246680a6c3..aa8bee72c9d3 100644
+--- a/include/net/netfilter/nf_tables.h
++++ b/include/net/netfilter/nf_tables.h
+@@ -125,7 +125,7 @@ static inline enum nft_data_types nft_dreg_to_type(enum nft_registers reg)
+
+ static inline enum nft_registers nft_type_to_reg(enum nft_data_types type)
+ {
+- return type == NFT_DATA_VERDICT ? NFT_REG_VERDICT : NFT_REG_1;
++ return type == NFT_DATA_VERDICT ? NFT_REG_VERDICT : NFT_REG_1 * NFT_REG_SIZE / NFT_REG32_SIZE;
+ }
+
+ unsigned int nft_parse_register(const struct nlattr *attr);
+diff --git a/include/target/iscsi/iscsi_target_core.h b/include/target/iscsi/iscsi_target_core.h
+index 0aedbb2c10e0..7e7f8875ac32 100644
+--- a/include/target/iscsi/iscsi_target_core.h
++++ b/include/target/iscsi/iscsi_target_core.h
+@@ -776,7 +776,6 @@ struct iscsi_np {
+ enum iscsi_timer_flags_table np_login_timer_flags;
+ u32 np_exports;
+ enum np_flags_table np_flags;
+- unsigned char np_ip[IPV6_ADDRESS_SPACE];
+ u16 np_port;
+ spinlock_t np_thread_lock;
+ struct completion np_restart_comp;
+diff --git a/include/xen/interface/sched.h b/include/xen/interface/sched.h
+index 9ce083960a25..f18490985fc8 100644
+--- a/include/xen/interface/sched.h
++++ b/include/xen/interface/sched.h
+@@ -107,5 +107,13 @@ struct sched_watchdog {
+ #define SHUTDOWN_suspend 2 /* Clean up, save suspend info, kill. */
+ #define SHUTDOWN_crash 3 /* Tell controller we've crashed. */
+ #define SHUTDOWN_watchdog 4 /* Restart because watchdog time expired. */
++/*
++ * Domain asked to perform 'soft reset' for it. The expected behavior is to
++ * reset internal Xen state for the domain returning it to the point where it
++ * was created but leaving the domain's memory contents and vCPU contexts
++ * intact. This will allow the domain to start over and set up all Xen specific
++ * interfaces again.
++ */
++#define SHUTDOWN_soft_reset 5
+
+ #endif /* __XEN_PUBLIC_SCHED_H__ */
+diff --git a/ipc/msg.c b/ipc/msg.c
+index 66c4f567eb73..1471db9a7e61 100644
+--- a/ipc/msg.c
++++ b/ipc/msg.c
+@@ -137,13 +137,6 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
+ return retval;
+ }
+
+- /* ipc_addid() locks msq upon success. */
+- id = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni);
+- if (id < 0) {
+- ipc_rcu_putref(msq, msg_rcu_free);
+- return id;
+- }
+-
+ msq->q_stime = msq->q_rtime = 0;
+ msq->q_ctime = get_seconds();
+ msq->q_cbytes = msq->q_qnum = 0;
+@@ -153,6 +146,13 @@ static int newque(struct ipc_namespace *ns, struct ipc_params *params)
+ INIT_LIST_HEAD(&msq->q_receivers);
+ INIT_LIST_HEAD(&msq->q_senders);
+
++ /* ipc_addid() locks msq upon success. */
++ id = ipc_addid(&msg_ids(ns), &msq->q_perm, ns->msg_ctlmni);
++ if (id < 0) {
++ ipc_rcu_putref(msq, msg_rcu_free);
++ return id;
++ }
++
+ ipc_unlock_object(&msq->q_perm);
+ rcu_read_unlock();
+
+diff --git a/ipc/shm.c b/ipc/shm.c
+index 4aef24d91b63..0e61fd430547 100644
+--- a/ipc/shm.c
++++ b/ipc/shm.c
+@@ -551,12 +551,6 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
+ if (IS_ERR(file))
+ goto no_file;
+
+- id = ipc_addid(&shm_ids(ns), &shp->shm_perm, ns->shm_ctlmni);
+- if (id < 0) {
+- error = id;
+- goto no_id;
+- }
+-
+ shp->shm_cprid = task_tgid_vnr(current);
+ shp->shm_lprid = 0;
+ shp->shm_atim = shp->shm_dtim = 0;
+@@ -565,6 +559,13 @@ static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
+ shp->shm_nattch = 0;
+ shp->shm_file = file;
+ shp->shm_creator = current;
++
++ id = ipc_addid(&shm_ids(ns), &shp->shm_perm, ns->shm_ctlmni);
++ if (id < 0) {
++ error = id;
++ goto no_id;
++ }
++
+ list_add(&shp->shm_clist, &current->sysvshm.shm_clist);
+
+ /*
+diff --git a/ipc/util.c b/ipc/util.c
+index be4230020a1f..0f401d94b7c6 100644
+--- a/ipc/util.c
++++ b/ipc/util.c
+@@ -237,6 +237,10 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int size)
+ rcu_read_lock();
+ spin_lock(&new->lock);
+
++ current_euid_egid(&euid, &egid);
++ new->cuid = new->uid = euid;
++ new->gid = new->cgid = egid;
++
+ id = idr_alloc(&ids->ipcs_idr, new,
+ (next_id < 0) ? 0 : ipcid_to_idx(next_id), 0,
+ GFP_NOWAIT);
+@@ -249,10 +253,6 @@ int ipc_addid(struct ipc_ids *ids, struct kern_ipc_perm *new, int size)
+
+ ids->in_use++;
+
+- current_euid_egid(&euid, &egid);
+- new->cuid = new->uid = euid;
+- new->gid = new->cgid = egid;
+-
+ if (next_id < 0) {
+ new->seq = ids->seq++;
+ if (ids->seq > IPCID_SEQ_MAX)
+diff --git a/kernel/cgroup.c b/kernel/cgroup.c
+index c6c4240e7d28..fe6f855de3d1 100644
+--- a/kernel/cgroup.c
++++ b/kernel/cgroup.c
+@@ -46,7 +46,6 @@
+ #include <linux/slab.h>
+ #include <linux/spinlock.h>
+ #include <linux/rwsem.h>
+-#include <linux/percpu-rwsem.h>
+ #include <linux/string.h>
+ #include <linux/sort.h>
+ #include <linux/kmod.h>
+@@ -104,8 +103,6 @@ static DEFINE_SPINLOCK(cgroup_idr_lock);
+ */
+ static DEFINE_SPINLOCK(release_agent_path_lock);
+
+-struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
+-
+ #define cgroup_assert_mutex_or_rcu_locked() \
+ rcu_lockdep_assert(rcu_read_lock_held() || \
+ lockdep_is_held(&cgroup_mutex), \
+@@ -870,6 +867,48 @@ static struct css_set *find_css_set(struct css_set *old_cset,
+ return cset;
+ }
+
++void cgroup_threadgroup_change_begin(struct task_struct *tsk)
++{
++ down_read(&tsk->signal->group_rwsem);
++}
++
++void cgroup_threadgroup_change_end(struct task_struct *tsk)
++{
++ up_read(&tsk->signal->group_rwsem);
++}
++
++/**
++ * threadgroup_lock - lock threadgroup
++ * @tsk: member task of the threadgroup to lock
++ *
++ * Lock the threadgroup @tsk belongs to. No new task is allowed to enter
++ * and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
++ * change ->group_leader/pid. This is useful for cases where the threadgroup
++ * needs to stay stable across blockable operations.
++ *
++ * fork and exit explicitly call threadgroup_change_{begin|end}() for
++ * synchronization. While held, no new task will be added to threadgroup
++ * and no existing live task will have its PF_EXITING set.
++ *
++ * de_thread() does threadgroup_change_{begin|end}() when a non-leader
++ * sub-thread becomes a new leader.
++ */
++static void threadgroup_lock(struct task_struct *tsk)
++{
++ down_write(&tsk->signal->group_rwsem);
++}
++
++/**
++ * threadgroup_unlock - unlock threadgroup
++ * @tsk: member task of the threadgroup to unlock
++ *
++ * Reverse threadgroup_lock().
++ */
++static inline void threadgroup_unlock(struct task_struct *tsk)
++{
++ up_write(&tsk->signal->group_rwsem);
++}
++
+ static struct cgroup_root *cgroup_root_from_kf(struct kernfs_root *kf_root)
+ {
+ struct cgroup *root_cgrp = kf_root->kn->priv;
+@@ -2066,9 +2105,9 @@ static void cgroup_task_migrate(struct cgroup *old_cgrp,
+ lockdep_assert_held(&css_set_rwsem);
+
+ /*
+- * We are synchronized through cgroup_threadgroup_rwsem against
+- * PF_EXITING setting such that we can't race against cgroup_exit()
+- * changing the css_set to init_css_set and dropping the old one.
++ * We are synchronized through threadgroup_lock() against PF_EXITING
++ * setting such that we can't race against cgroup_exit() changing the
++ * css_set to init_css_set and dropping the old one.
+ */
+ WARN_ON_ONCE(tsk->flags & PF_EXITING);
+ old_cset = task_css_set(tsk);
+@@ -2125,11 +2164,10 @@ static void cgroup_migrate_finish(struct list_head *preloaded_csets)
+ * @src_cset and add it to @preloaded_csets, which should later be cleaned
+ * up by cgroup_migrate_finish().
+ *
+- * This function may be called without holding cgroup_threadgroup_rwsem
+- * even if the target is a process. Threads may be created and destroyed
+- * but as long as cgroup_mutex is not dropped, no new css_set can be put
+- * into play and the preloaded css_sets are guaranteed to cover all
+- * migrations.
++ * This function may be called without holding threadgroup_lock even if the
++ * target is a process. Threads may be created and destroyed but as long
++ * as cgroup_mutex is not dropped, no new css_set can be put into play and
++ * the preloaded css_sets are guaranteed to cover all migrations.
+ */
+ static void cgroup_migrate_add_src(struct css_set *src_cset,
+ struct cgroup *dst_cgrp,
+@@ -2232,7 +2270,7 @@ err:
+ * @threadgroup: whether @leader points to the whole process or a single task
+ *
+ * Migrate a process or task denoted by @leader to @cgrp. If migrating a
+- * process, the caller must be holding cgroup_threadgroup_rwsem. The
++ * process, the caller must be holding threadgroup_lock of @leader. The
+ * caller is also responsible for invoking cgroup_migrate_add_src() and
+ * cgroup_migrate_prepare_dst() on the targets before invoking this
+ * function and following up with cgroup_migrate_finish().
+@@ -2360,7 +2398,7 @@ out_release_tset:
+ * @leader: the task or the leader of the threadgroup to be attached
+ * @threadgroup: attach the whole threadgroup?
+ *
+- * Call holding cgroup_mutex and cgroup_threadgroup_rwsem.
++ * Call holding cgroup_mutex and threadgroup_lock of @leader.
+ */
+ static int cgroup_attach_task(struct cgroup *dst_cgrp,
+ struct task_struct *leader, bool threadgroup)
+@@ -2452,13 +2490,14 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
+ if (!cgrp)
+ return -ENODEV;
+
+- percpu_down_write(&cgroup_threadgroup_rwsem);
++retry_find_task:
+ rcu_read_lock();
+ if (pid) {
+ tsk = find_task_by_vpid(pid);
+ if (!tsk) {
++ rcu_read_unlock();
+ ret = -ESRCH;
+- goto out_unlock_rcu;
++ goto out_unlock_cgroup;
+ }
+ } else {
+ tsk = current;
+@@ -2474,23 +2513,37 @@ static ssize_t __cgroup_procs_write(struct kernfs_open_file *of, char *buf,
+ */
+ if (tsk == kthreadd_task || (tsk->flags & PF_NO_SETAFFINITY)) {
+ ret = -EINVAL;
+- goto out_unlock_rcu;
++ rcu_read_unlock();
++ goto out_unlock_cgroup;
+ }
+
+ get_task_struct(tsk);
+ rcu_read_unlock();
+
++ threadgroup_lock(tsk);
++ if (threadgroup) {
++ if (!thread_group_leader(tsk)) {
++ /*
++ * a race with de_thread from another thread's exec()
++ * may strip us of our leadership, if this happens,
++ * there is no choice but to throw this task away and
++ * try again; this is
++ * "double-double-toil-and-trouble-check locking".
++ */
++ threadgroup_unlock(tsk);
++ put_task_struct(tsk);
++ goto retry_find_task;
++ }
++ }
++
+ ret = cgroup_procs_write_permission(tsk, cgrp, of);
+ if (!ret)
+ ret = cgroup_attach_task(cgrp, tsk, threadgroup);
+
+- put_task_struct(tsk);
+- goto out_unlock_threadgroup;
++ threadgroup_unlock(tsk);
+
+-out_unlock_rcu:
+- rcu_read_unlock();
+-out_unlock_threadgroup:
+- percpu_up_write(&cgroup_threadgroup_rwsem);
++ put_task_struct(tsk);
++out_unlock_cgroup:
+ cgroup_kn_unlock(of->kn);
+ return ret ?: nbytes;
+ }
+@@ -2635,8 +2688,6 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
+
+ lockdep_assert_held(&cgroup_mutex);
+
+- percpu_down_write(&cgroup_threadgroup_rwsem);
+-
+ /* look up all csses currently attached to @cgrp's subtree */
+ down_read(&css_set_rwsem);
+ css_for_each_descendant_pre(css, cgroup_css(cgrp, NULL)) {
+@@ -2692,8 +2743,17 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
+ goto out_finish;
+ last_task = task;
+
++ threadgroup_lock(task);
++ /* raced against de_thread() from another thread? */
++ if (!thread_group_leader(task)) {
++ threadgroup_unlock(task);
++ put_task_struct(task);
++ continue;
++ }
++
+ ret = cgroup_migrate(src_cset->dfl_cgrp, task, true);
+
++ threadgroup_unlock(task);
+ put_task_struct(task);
+
+ if (WARN(ret, "cgroup: failed to update controllers for the default hierarchy (%d), further operations may crash or hang\n", ret))
+@@ -2703,7 +2763,6 @@ static int cgroup_update_dfl_csses(struct cgroup *cgrp)
+
+ out_finish:
+ cgroup_migrate_finish(&preloaded_csets);
+- percpu_up_write(&cgroup_threadgroup_rwsem);
+ return ret;
+ }
+
+@@ -5013,7 +5072,6 @@ int __init cgroup_init(void)
+ unsigned long key;
+ int ssid, err;
+
+- BUG_ON(percpu_init_rwsem(&cgroup_threadgroup_rwsem));
+ BUG_ON(cgroup_init_cftypes(NULL, cgroup_dfl_base_files));
+ BUG_ON(cgroup_init_cftypes(NULL, cgroup_legacy_base_files));
+
+diff --git a/kernel/fork.c b/kernel/fork.c
+index 26a70dc7a915..e769c8c86f86 100644
+--- a/kernel/fork.c
++++ b/kernel/fork.c
+@@ -1146,6 +1146,10 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
+ tty_audit_fork(sig);
+ sched_autogroup_fork(sig);
+
++#ifdef CONFIG_CGROUPS
++ init_rwsem(&sig->group_rwsem);
++#endif
++
+ sig->oom_score_adj = current->signal->oom_score_adj;
+ sig->oom_score_adj_min = current->signal->oom_score_adj_min;
+
+diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
+index 0e97c142ce40..4e6267a34440 100644
+--- a/kernel/irq/proc.c
++++ b/kernel/irq/proc.c
+@@ -12,6 +12,7 @@
+ #include <linux/seq_file.h>
+ #include <linux/interrupt.h>
+ #include <linux/kernel_stat.h>
++#include <linux/mutex.h>
+
+ #include "internals.h"
+
+@@ -323,18 +324,29 @@ void register_handler_proc(unsigned int irq, struct irqaction *action)
+
+ void register_irq_proc(unsigned int irq, struct irq_desc *desc)
+ {
++ static DEFINE_MUTEX(register_lock);
+ char name [MAX_NAMELEN];
+
+- if (!root_irq_dir || (desc->irq_data.chip == &no_irq_chip) || desc->dir)
++ if (!root_irq_dir || (desc->irq_data.chip == &no_irq_chip))
+ return;
+
++ /*
++ * irq directories are registered only when a handler is
++ * added, not when the descriptor is created, so multiple
++ * tasks might try to register at the same time.
++ */
++ mutex_lock(&register_lock);
++
++ if (desc->dir)
++ goto out_unlock;
++
+ memset(name, 0, MAX_NAMELEN);
+ sprintf(name, "%d", irq);
+
+ /* create /proc/irq/1234 */
+ desc->dir = proc_mkdir(name, root_irq_dir);
+ if (!desc->dir)
+- return;
++ goto out_unlock;
+
+ #ifdef CONFIG_SMP
+ /* create /proc/irq/<irq>/smp_affinity */
+@@ -355,6 +367,9 @@ void register_irq_proc(unsigned int irq, struct irq_desc *desc)
+
+ proc_create_data("spurious", 0444, desc->dir,
+ &irq_spurious_proc_fops, (void *)(long)irq);
++
++out_unlock:
++ mutex_unlock(&register_lock);
+ }
+
+ void unregister_irq_proc(unsigned int irq, struct irq_desc *desc)
+diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
+index 38c49202d532..8ed01611ae73 100644
+--- a/kernel/locking/qspinlock.c
++++ b/kernel/locking/qspinlock.c
+@@ -289,7 +289,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
+ if (pv_enabled())
+ goto queue;
+
+- if (virt_queued_spin_lock(lock))
++ if (virt_spin_lock(lock))
+ return;
+
+ /*
+diff --git a/kernel/sched/core.c b/kernel/sched/core.c
+index e9673433cc01..6776631676e0 100644
+--- a/kernel/sched/core.c
++++ b/kernel/sched/core.c
+@@ -2461,11 +2461,11 @@ static struct rq *finish_task_switch(struct task_struct *prev)
+ * If a task dies, then it sets TASK_DEAD in tsk->state and calls
+ * schedule one last time. The schedule call will never return, and
+ * the scheduled task must drop that reference.
+- * The test for TASK_DEAD must occur while the runqueue locks are
+- * still held, otherwise prev could be scheduled on another cpu, die
+- * there before we look at prev->state, and then the reference would
+- * be dropped twice.
+- * Manfred Spraul <manfred@colorfullife.com>
++ *
++ * We must observe prev->state before clearing prev->on_cpu (in
++ * finish_lock_switch), otherwise a concurrent wakeup can get prev
++ * running on another CPU and we could race with its RUNNING -> DEAD
++ * transition, resulting in a double drop.
+ */
+ prev_state = prev->state;
+ vtime_task_switch(prev);
+@@ -2614,13 +2614,20 @@ unsigned long nr_running(void)
+
+ /*
+ * Check if only the current task is running on the cpu.
++ *
++ * Caution: this function does not check that the caller has disabled
++ * preemption, thus the result might have a time-of-check-to-time-of-use
++ * race. The caller is responsible to use it correctly, for example:
++ *
++ * - from a non-preemptable section (of course)
++ *
++ * - from a thread that is bound to a single CPU
++ *
++ * - in a loop with very short iterations (e.g. a polling loop)
+ */
+ bool single_task_running(void)
+ {
+- if (cpu_rq(smp_processor_id())->nr_running == 1)
+- return true;
+- else
+- return false;
++ return raw_rq()->nr_running == 1;
+ }
+ EXPORT_SYMBOL(single_task_running);
+
+@@ -4492,7 +4499,7 @@ SYSCALL_DEFINE0(sched_yield)
+
+ int __sched _cond_resched(void)
+ {
+- if (should_resched()) {
++ if (should_resched(0)) {
+ preempt_schedule_common();
+ return 1;
+ }
+@@ -4510,7 +4517,7 @@ EXPORT_SYMBOL(_cond_resched);
+ */
+ int __cond_resched_lock(spinlock_t *lock)
+ {
+- int resched = should_resched();
++ int resched = should_resched(PREEMPT_LOCK_OFFSET);
+ int ret = 0;
+
+ lockdep_assert_held(lock);
+@@ -4532,7 +4539,7 @@ int __sched __cond_resched_softirq(void)
+ {
+ BUG_ON(!in_softirq());
+
+- if (should_resched()) {
++ if (should_resched(SOFTIRQ_DISABLE_OFFSET)) {
+ local_bh_enable();
+ preempt_schedule_common();
+ local_bh_disable();
+diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
+index 84d48790bb6d..08ab96b366bf 100644
+--- a/kernel/sched/sched.h
++++ b/kernel/sched/sched.h
+@@ -1091,9 +1091,10 @@ static inline void finish_lock_switch(struct rq *rq, struct task_struct *prev)
+ * After ->on_cpu is cleared, the task can be moved to a different CPU.
+ * We must ensure this doesn't happen until the switch is completely
+ * finished.
++ *
++ * Pairs with the control dependency and rmb in try_to_wake_up().
+ */
+- smp_wmb();
+- prev->on_cpu = 0;
++ smp_store_release(&prev->on_cpu, 0);
+ #endif
+ #ifdef CONFIG_DEBUG_SPINLOCK
+ /* this is a valid case when another task releases the spinlock */
+diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
+index 841b72f720e8..3a38775b50c2 100644
+--- a/kernel/time/clocksource.c
++++ b/kernel/time/clocksource.c
+@@ -217,7 +217,7 @@ static void clocksource_watchdog(unsigned long data)
+ continue;
+
+ /* Check the deviation from the watchdog clocksource. */
+- if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
++ if (abs64(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
+ pr_warn("timekeeping watchdog: Marking clocksource '%s' as unstable because the skew is too large:\n",
+ cs->name);
+ pr_warn(" '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
+diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
+index bca3667a2de1..a20d4110e871 100644
+--- a/kernel/time/timekeeping.c
++++ b/kernel/time/timekeeping.c
+@@ -1607,7 +1607,7 @@ static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
+ negative = (tick_error < 0);
+
+ /* Sort out the magnitude of the correction */
+- tick_error = abs(tick_error);
++ tick_error = abs64(tick_error);
+ for (adj = 0; tick_error > interval; adj++)
+ tick_error >>= 1;
+
+diff --git a/lib/iommu-common.c b/lib/iommu-common.c
+index ff19f66d3f7f..b1c93e94ca7a 100644
+--- a/lib/iommu-common.c
++++ b/lib/iommu-common.c
+@@ -21,8 +21,7 @@ static DEFINE_PER_CPU(unsigned int, iommu_hash_common);
+
+ static inline bool need_flush(struct iommu_map_table *iommu)
+ {
+- return (iommu->lazy_flush != NULL &&
+- (iommu->flags & IOMMU_NEED_FLUSH) != 0);
++ return ((iommu->flags & IOMMU_NEED_FLUSH) != 0);
+ }
+
+ static inline void set_flush(struct iommu_map_table *iommu)
+@@ -211,7 +210,8 @@ unsigned long iommu_tbl_range_alloc(struct device *dev,
+ goto bail;
+ }
+ }
+- if (n < pool->hint || need_flush(iommu)) {
++ if (iommu->lazy_flush &&
++ (n < pool->hint || need_flush(iommu))) {
+ clear_flush(iommu);
+ iommu->lazy_flush(iommu);
+ }
+diff --git a/mm/hugetlb.c b/mm/hugetlb.c
+index a8c3087089d8..62c1ec5a9d31 100644
+--- a/mm/hugetlb.c
++++ b/mm/hugetlb.c
+@@ -2974,6 +2974,14 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
+ continue;
+
+ /*
++ * Shared VMAs have their own reserves and do not affect
++ * MAP_PRIVATE accounting but it is possible that a shared
++ * VMA is using the same page so check and skip such VMAs.
++ */
++ if (iter_vma->vm_flags & VM_MAYSHARE)
++ continue;
++
++ /*
+ * Unmap the page from other VMAs without their own reserves.
+ * They get marked to be SIGKILLed if they fault in these
+ * areas. This is because a future no-page fault on this VMA
+diff --git a/mm/memcontrol.c b/mm/memcontrol.c
+index acb93c554f6e..237d4686482d 100644
+--- a/mm/memcontrol.c
++++ b/mm/memcontrol.c
+@@ -806,12 +806,14 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
+ }
+
+ /*
++ * Return page count for single (non recursive) @memcg.
++ *
+ * Implementation Note: reading percpu statistics for memcg.
+ *
+ * Both of vmstat[] and percpu_counter has threshold and do periodic
+ * synchronization to implement "quick" read. There are trade-off between
+ * reading cost and precision of value. Then, we may have a chance to implement
+- * a periodic synchronizion of counter in memcg's counter.
++ * a periodic synchronization of counter in memcg's counter.
+ *
+ * But this _read() function is used for user interface now. The user accounts
+ * memory usage by memory cgroup and he _always_ requires exact value because
+@@ -821,17 +823,24 @@ mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_zone *mctz)
+ *
+ * If there are kernel internal actions which can make use of some not-exact
+ * value, and reading all cpu value can be performance bottleneck in some
+- * common workload, threashold and synchonization as vmstat[] should be
++ * common workload, threshold and synchronization as vmstat[] should be
+ * implemented.
+ */
+-static long mem_cgroup_read_stat(struct mem_cgroup *memcg,
+- enum mem_cgroup_stat_index idx)
++static unsigned long
++mem_cgroup_read_stat(struct mem_cgroup *memcg, enum mem_cgroup_stat_index idx)
+ {
+ long val = 0;
+ int cpu;
+
++ /* Per-cpu values can be negative, use a signed accumulator */
+ for_each_possible_cpu(cpu)
+ val += per_cpu(memcg->stat->count[idx], cpu);
++ /*
++ * Summing races with updates, so val may be negative. Avoid exposing
++ * transient negative values.
++ */
++ if (val < 0)
++ val = 0;
+ return val;
+ }
+
+@@ -1498,7 +1507,7 @@ void mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
+ for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
+ if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+ continue;
+- pr_cont(" %s:%ldKB", mem_cgroup_stat_names[i],
++ pr_cont(" %s:%luKB", mem_cgroup_stat_names[i],
+ K(mem_cgroup_read_stat(iter, i)));
+ }
+
+@@ -3119,14 +3128,11 @@ static unsigned long tree_stat(struct mem_cgroup *memcg,
+ enum mem_cgroup_stat_index idx)
+ {
+ struct mem_cgroup *iter;
+- long val = 0;
++ unsigned long val = 0;
+
+- /* Per-cpu values can be negative, use a signed accumulator */
+ for_each_mem_cgroup_tree(iter, memcg)
+ val += mem_cgroup_read_stat(iter, idx);
+
+- if (val < 0) /* race ? */
+- val = 0;
+ return val;
+ }
+
+@@ -3469,7 +3475,7 @@ static int memcg_stat_show(struct seq_file *m, void *v)
+ for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
+ if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+ continue;
+- seq_printf(m, "%s %ld\n", mem_cgroup_stat_names[i],
++ seq_printf(m, "%s %lu\n", mem_cgroup_stat_names[i],
+ mem_cgroup_read_stat(memcg, i) * PAGE_SIZE);
+ }
+
+@@ -3494,13 +3500,13 @@ static int memcg_stat_show(struct seq_file *m, void *v)
+ (u64)memsw * PAGE_SIZE);
+
+ for (i = 0; i < MEM_CGROUP_STAT_NSTATS; i++) {
+- long long val = 0;
++ unsigned long long val = 0;
+
+ if (i == MEM_CGROUP_STAT_SWAP && !do_swap_account)
+ continue;
+ for_each_mem_cgroup_tree(mi, memcg)
+ val += mem_cgroup_read_stat(mi, i) * PAGE_SIZE;
+- seq_printf(m, "total_%s %lld\n", mem_cgroup_stat_names[i], val);
++ seq_printf(m, "total_%s %llu\n", mem_cgroup_stat_names[i], val);
+ }
+
+ for (i = 0; i < MEM_CGROUP_EVENTS_NSTATS; i++) {
+diff --git a/mm/migrate.c b/mm/migrate.c
+index eb4267107d1f..fcb6204de108 100644
+--- a/mm/migrate.c
++++ b/mm/migrate.c
+@@ -734,6 +734,15 @@ static int move_to_new_page(struct page *newpage, struct page *page,
+ if (PageSwapBacked(page))
+ SetPageSwapBacked(newpage);
+
++ /*
++ * Indirectly called below, migrate_page_copy() copies PG_dirty and thus
++ * needs newpage's memcg set to transfer memcg dirty page accounting.
++ * So perform memcg migration in two steps:
++ * 1. set newpage->mem_cgroup (here)
++ * 2. clear page->mem_cgroup (below)
++ */
++ set_page_memcg(newpage, page_memcg(page));
++
+ mapping = page_mapping(page);
+ if (!mapping)
+ rc = migrate_page(mapping, newpage, page, mode);
+@@ -750,9 +759,10 @@ static int move_to_new_page(struct page *newpage, struct page *page,
+ rc = fallback_migrate_page(mapping, newpage, page, mode);
+
+ if (rc != MIGRATEPAGE_SUCCESS) {
++ set_page_memcg(newpage, NULL);
+ newpage->mapping = NULL;
+ } else {
+- mem_cgroup_migrate(page, newpage, false);
++ set_page_memcg(page, NULL);
+ if (page_was_mapped)
+ remove_migration_ptes(page, newpage);
+ page->mapping = NULL;
+@@ -1068,7 +1078,7 @@ out:
+ if (rc != MIGRATEPAGE_SUCCESS && put_new_page)
+ put_new_page(new_hpage, private);
+ else
+- put_page(new_hpage);
++ putback_active_hugepage(new_hpage);
+
+ if (result) {
+ if (rc)
+diff --git a/mm/slab.c b/mm/slab.c
+index bbd0b47dc6a9..ae360283029c 100644
+--- a/mm/slab.c
++++ b/mm/slab.c
+@@ -2190,9 +2190,16 @@ __kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
+ size += BYTES_PER_WORD;
+ }
+ #if FORCED_DEBUG && defined(CONFIG_DEBUG_PAGEALLOC)
+- if (size >= kmalloc_size(INDEX_NODE + 1)
+- && cachep->object_size > cache_line_size()
+- && ALIGN(size, cachep->align) < PAGE_SIZE) {
++ /*
++ * To activate debug pagealloc, off-slab management is a necessary
++ * requirement. In early phase of initialization, small sized slab
++ * doesn't get initialized so it would not be possible. So, we need
++ * to check size >= 256. It guarantees that all necessary small
++ * sized slab is initialized in current slab initialization sequence.
++ */
++ if (!slab_early_init && size >= kmalloc_size(INDEX_NODE) &&
++ size >= 256 && cachep->object_size > cache_line_size() &&
++ ALIGN(size, cachep->align) < PAGE_SIZE) {
+ cachep->obj_offset += PAGE_SIZE - ALIGN(size, cachep->align);
+ size = PAGE_SIZE;
+ }
+diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
+index 6d0b471eede8..cc7d87d64987 100644
+--- a/net/batman-adv/distributed-arp-table.c
++++ b/net/batman-adv/distributed-arp-table.c
+@@ -19,6 +19,7 @@
+ #include "main.h"
+
+ #include <linux/atomic.h>
++#include <linux/bitops.h>
+ #include <linux/byteorder/generic.h>
+ #include <linux/errno.h>
+ #include <linux/etherdevice.h>
+@@ -453,7 +454,7 @@ static bool batadv_is_orig_node_eligible(struct batadv_dat_candidate *res,
+ int j;
+
+ /* check if orig node candidate is running DAT */
+- if (!(candidate->capabilities & BATADV_ORIG_CAPA_HAS_DAT))
++ if (!test_bit(BATADV_ORIG_CAPA_HAS_DAT, &candidate->capabilities))
+ goto out;
+
+ /* Check if this node has already been selected... */
+@@ -713,9 +714,9 @@ static void batadv_dat_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
+ uint16_t tvlv_value_len)
+ {
+ if (flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND)
+- orig->capabilities &= ~BATADV_ORIG_CAPA_HAS_DAT;
++ clear_bit(BATADV_ORIG_CAPA_HAS_DAT, &orig->capabilities);
+ else
+- orig->capabilities |= BATADV_ORIG_CAPA_HAS_DAT;
++ set_bit(BATADV_ORIG_CAPA_HAS_DAT, &orig->capabilities);
+ }
+
+ /**
+diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c
+index 7aa480b7edd0..68a9554961eb 100644
+--- a/net/batman-adv/multicast.c
++++ b/net/batman-adv/multicast.c
+@@ -19,6 +19,8 @@
+ #include "main.h"
+
+ #include <linux/atomic.h>
++#include <linux/bitops.h>
++#include <linux/bug.h>
+ #include <linux/byteorder/generic.h>
+ #include <linux/errno.h>
+ #include <linux/etherdevice.h>
+@@ -588,19 +590,26 @@ batadv_mcast_forw_mode(struct batadv_priv *bat_priv, struct sk_buff *skb,
+ *
+ * If the BATADV_MCAST_WANT_ALL_UNSNOOPABLES flag of this originator,
+ * orig, has toggled then this method updates counter and list accordingly.
++ *
++ * Caller needs to hold orig->mcast_handler_lock.
+ */
+ static void batadv_mcast_want_unsnoop_update(struct batadv_priv *bat_priv,
+ struct batadv_orig_node *orig,
+ uint8_t mcast_flags)
+ {
++ struct hlist_node *node = &orig->mcast_want_all_unsnoopables_node;
++ struct hlist_head *head = &bat_priv->mcast.want_all_unsnoopables_list;
++
+ /* switched from flag unset to set */
+ if (mcast_flags & BATADV_MCAST_WANT_ALL_UNSNOOPABLES &&
+ !(orig->mcast_flags & BATADV_MCAST_WANT_ALL_UNSNOOPABLES)) {
+ atomic_inc(&bat_priv->mcast.num_want_all_unsnoopables);
+
+ spin_lock_bh(&bat_priv->mcast.want_lists_lock);
+- hlist_add_head_rcu(&orig->mcast_want_all_unsnoopables_node,
+- &bat_priv->mcast.want_all_unsnoopables_list);
++ /* flag checks above + mcast_handler_lock prevents this */
++ WARN_ON(!hlist_unhashed(node));
++
++ hlist_add_head_rcu(node, head);
+ spin_unlock_bh(&bat_priv->mcast.want_lists_lock);
+ /* switched from flag set to unset */
+ } else if (!(mcast_flags & BATADV_MCAST_WANT_ALL_UNSNOOPABLES) &&
+@@ -608,7 +617,10 @@ static void batadv_mcast_want_unsnoop_update(struct batadv_priv *bat_priv,
+ atomic_dec(&bat_priv->mcast.num_want_all_unsnoopables);
+
+ spin_lock_bh(&bat_priv->mcast.want_lists_lock);
+- hlist_del_rcu(&orig->mcast_want_all_unsnoopables_node);
++ /* flag checks above + mcast_handler_lock prevents this */
++ WARN_ON(hlist_unhashed(node));
++
++ hlist_del_init_rcu(node);
+ spin_unlock_bh(&bat_priv->mcast.want_lists_lock);
+ }
+ }
+@@ -621,19 +633,26 @@ static void batadv_mcast_want_unsnoop_update(struct batadv_priv *bat_priv,
+ *
+ * If the BATADV_MCAST_WANT_ALL_IPV4 flag of this originator, orig, has
+ * toggled then this method updates counter and list accordingly.
++ *
++ * Caller needs to hold orig->mcast_handler_lock.
+ */
+ static void batadv_mcast_want_ipv4_update(struct batadv_priv *bat_priv,
+ struct batadv_orig_node *orig,
+ uint8_t mcast_flags)
+ {
++ struct hlist_node *node = &orig->mcast_want_all_ipv4_node;
++ struct hlist_head *head = &bat_priv->mcast.want_all_ipv4_list;
++
+ /* switched from flag unset to set */
+ if (mcast_flags & BATADV_MCAST_WANT_ALL_IPV4 &&
+ !(orig->mcast_flags & BATADV_MCAST_WANT_ALL_IPV4)) {
+ atomic_inc(&bat_priv->mcast.num_want_all_ipv4);
+
+ spin_lock_bh(&bat_priv->mcast.want_lists_lock);
+- hlist_add_head_rcu(&orig->mcast_want_all_ipv4_node,
+- &bat_priv->mcast.want_all_ipv4_list);
++ /* flag checks above + mcast_handler_lock prevents this */
++ WARN_ON(!hlist_unhashed(node));
++
++ hlist_add_head_rcu(node, head);
+ spin_unlock_bh(&bat_priv->mcast.want_lists_lock);
+ /* switched from flag set to unset */
+ } else if (!(mcast_flags & BATADV_MCAST_WANT_ALL_IPV4) &&
+@@ -641,7 +660,10 @@ static void batadv_mcast_want_ipv4_update(struct batadv_priv *bat_priv,
+ atomic_dec(&bat_priv->mcast.num_want_all_ipv4);
+
+ spin_lock_bh(&bat_priv->mcast.want_lists_lock);
+- hlist_del_rcu(&orig->mcast_want_all_ipv4_node);
++ /* flag checks above + mcast_handler_lock prevents this */
++ WARN_ON(hlist_unhashed(node));
++
++ hlist_del_init_rcu(node);
+ spin_unlock_bh(&bat_priv->mcast.want_lists_lock);
+ }
+ }
+@@ -654,19 +676,26 @@ static void batadv_mcast_want_ipv4_update(struct batadv_priv *bat_priv,
+ *
+ * If the BATADV_MCAST_WANT_ALL_IPV6 flag of this originator, orig, has
+ * toggled then this method updates counter and list accordingly.
++ *
++ * Caller needs to hold orig->mcast_handler_lock.
+ */
+ static void batadv_mcast_want_ipv6_update(struct batadv_priv *bat_priv,
+ struct batadv_orig_node *orig,
+ uint8_t mcast_flags)
+ {
++ struct hlist_node *node = &orig->mcast_want_all_ipv6_node;
++ struct hlist_head *head = &bat_priv->mcast.want_all_ipv6_list;
++
+ /* switched from flag unset to set */
+ if (mcast_flags & BATADV_MCAST_WANT_ALL_IPV6 &&
+ !(orig->mcast_flags & BATADV_MCAST_WANT_ALL_IPV6)) {
+ atomic_inc(&bat_priv->mcast.num_want_all_ipv6);
+
+ spin_lock_bh(&bat_priv->mcast.want_lists_lock);
+- hlist_add_head_rcu(&orig->mcast_want_all_ipv6_node,
+- &bat_priv->mcast.want_all_ipv6_list);
++ /* flag checks above + mcast_handler_lock prevents this */
++ WARN_ON(!hlist_unhashed(node));
++
++ hlist_add_head_rcu(node, head);
+ spin_unlock_bh(&bat_priv->mcast.want_lists_lock);
+ /* switched from flag set to unset */
+ } else if (!(mcast_flags & BATADV_MCAST_WANT_ALL_IPV6) &&
+@@ -674,7 +703,10 @@ static void batadv_mcast_want_ipv6_update(struct batadv_priv *bat_priv,
+ atomic_dec(&bat_priv->mcast.num_want_all_ipv6);
+
+ spin_lock_bh(&bat_priv->mcast.want_lists_lock);
+- hlist_del_rcu(&orig->mcast_want_all_ipv6_node);
++ /* flag checks above + mcast_handler_lock prevents this */
++ WARN_ON(hlist_unhashed(node));
++
++ hlist_del_init_rcu(node);
+ spin_unlock_bh(&bat_priv->mcast.want_lists_lock);
+ }
+ }
+@@ -697,39 +729,42 @@ static void batadv_mcast_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
+ uint8_t mcast_flags = BATADV_NO_FLAGS;
+ bool orig_initialized;
+
+- orig_initialized = orig->capa_initialized & BATADV_ORIG_CAPA_HAS_MCAST;
++ if (orig_mcast_enabled && tvlv_value &&
++ (tvlv_value_len >= sizeof(mcast_flags)))
++ mcast_flags = *(uint8_t *)tvlv_value;
++
++ spin_lock_bh(&orig->mcast_handler_lock);
++ orig_initialized = test_bit(BATADV_ORIG_CAPA_HAS_MCAST,
++ &orig->capa_initialized);
+
+ /* If mcast support is turned on decrease the disabled mcast node
+ * counter only if we had increased it for this node before. If this
+ * is a completely new orig_node no need to decrease the counter.
+ */
+ if (orig_mcast_enabled &&
+- !(orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST)) {
++ !test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities)) {
+ if (orig_initialized)
+ atomic_dec(&bat_priv->mcast.num_disabled);
+- orig->capabilities |= BATADV_ORIG_CAPA_HAS_MCAST;
++ set_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities);
+ /* If mcast support is being switched off or if this is an initial
+ * OGM without mcast support then increase the disabled mcast
+ * node counter.
+ */
+ } else if (!orig_mcast_enabled &&
+- (orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST ||
++ (test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities) ||
+ !orig_initialized)) {
+ atomic_inc(&bat_priv->mcast.num_disabled);
+- orig->capabilities &= ~BATADV_ORIG_CAPA_HAS_MCAST;
++ clear_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities);
+ }
+
+- orig->capa_initialized |= BATADV_ORIG_CAPA_HAS_MCAST;
+-
+- if (orig_mcast_enabled && tvlv_value &&
+- (tvlv_value_len >= sizeof(mcast_flags)))
+- mcast_flags = *(uint8_t *)tvlv_value;
++ set_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capa_initialized);
+
+ batadv_mcast_want_unsnoop_update(bat_priv, orig, mcast_flags);
+ batadv_mcast_want_ipv4_update(bat_priv, orig, mcast_flags);
+ batadv_mcast_want_ipv6_update(bat_priv, orig, mcast_flags);
+
+ orig->mcast_flags = mcast_flags;
++ spin_unlock_bh(&orig->mcast_handler_lock);
+ }
+
+ /**
+@@ -763,11 +798,15 @@ void batadv_mcast_purge_orig(struct batadv_orig_node *orig)
+ {
+ struct batadv_priv *bat_priv = orig->bat_priv;
+
+- if (!(orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST) &&
+- orig->capa_initialized & BATADV_ORIG_CAPA_HAS_MCAST)
++ spin_lock_bh(&orig->mcast_handler_lock);
++
++ if (!test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capabilities) &&
++ test_bit(BATADV_ORIG_CAPA_HAS_MCAST, &orig->capa_initialized))
+ atomic_dec(&bat_priv->mcast.num_disabled);
+
+ batadv_mcast_want_unsnoop_update(bat_priv, orig, BATADV_NO_FLAGS);
+ batadv_mcast_want_ipv4_update(bat_priv, orig, BATADV_NO_FLAGS);
+ batadv_mcast_want_ipv6_update(bat_priv, orig, BATADV_NO_FLAGS);
++
++ spin_unlock_bh(&orig->mcast_handler_lock);
+ }
+diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c
+index f0a50f31d822..46604010dcd4 100644
+--- a/net/batman-adv/network-coding.c
++++ b/net/batman-adv/network-coding.c
+@@ -19,6 +19,7 @@
+ #include "main.h"
+
+ #include <linux/atomic.h>
++#include <linux/bitops.h>
+ #include <linux/byteorder/generic.h>
+ #include <linux/compiler.h>
+ #include <linux/debugfs.h>
+@@ -134,9 +135,9 @@ static void batadv_nc_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
+ uint16_t tvlv_value_len)
+ {
+ if (flags & BATADV_TVLV_HANDLER_OGM_CIFNOTFND)
+- orig->capabilities &= ~BATADV_ORIG_CAPA_HAS_NC;
++ clear_bit(BATADV_ORIG_CAPA_HAS_NC, &orig->capabilities);
+ else
+- orig->capabilities |= BATADV_ORIG_CAPA_HAS_NC;
++ set_bit(BATADV_ORIG_CAPA_HAS_NC, &orig->capabilities);
+ }
+
+ /**
+@@ -894,7 +895,7 @@ void batadv_nc_update_nc_node(struct batadv_priv *bat_priv,
+ goto out;
+
+ /* check if orig node is network coding enabled */
+- if (!(orig_node->capabilities & BATADV_ORIG_CAPA_HAS_NC))
++ if (!test_bit(BATADV_ORIG_CAPA_HAS_NC, &orig_node->capabilities))
+ goto out;
+
+ /* accept ogms from 'good' neighbors and single hop neighbors */
+diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
+index 018b7495ad84..32a0fcfab36d 100644
+--- a/net/batman-adv/originator.c
++++ b/net/batman-adv/originator.c
+@@ -696,8 +696,13 @@ struct batadv_orig_node *batadv_orig_node_new(struct batadv_priv *bat_priv,
+ orig_node->last_seen = jiffies;
+ reset_time = jiffies - 1 - msecs_to_jiffies(BATADV_RESET_PROTECTION_MS);
+ orig_node->bcast_seqno_reset = reset_time;
++
+ #ifdef CONFIG_BATMAN_ADV_MCAST
+ orig_node->mcast_flags = BATADV_NO_FLAGS;
++ INIT_HLIST_NODE(&orig_node->mcast_want_all_unsnoopables_node);
++ INIT_HLIST_NODE(&orig_node->mcast_want_all_ipv4_node);
++ INIT_HLIST_NODE(&orig_node->mcast_want_all_ipv6_node);
++ spin_lock_init(&orig_node->mcast_handler_lock);
+ #endif
+
+ /* create a vlan object for the "untagged" LAN */
+diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
+index a2fc843c2243..51cda3a7c51d 100644
+--- a/net/batman-adv/soft-interface.c
++++ b/net/batman-adv/soft-interface.c
+@@ -202,6 +202,7 @@ static int batadv_interface_tx(struct sk_buff *skb,
+ int gw_mode;
+ enum batadv_forw_mode forw_mode;
+ struct batadv_orig_node *mcast_single_orig = NULL;
++ int network_offset = ETH_HLEN;
+
+ if (atomic_read(&bat_priv->mesh_state) != BATADV_MESH_ACTIVE)
+ goto dropped;
+@@ -214,14 +215,18 @@ static int batadv_interface_tx(struct sk_buff *skb,
+ case ETH_P_8021Q:
+ vhdr = vlan_eth_hdr(skb);
+
+- if (vhdr->h_vlan_encapsulated_proto != ethertype)
++ if (vhdr->h_vlan_encapsulated_proto != ethertype) {
++ network_offset += VLAN_HLEN;
+ break;
++ }
+
+ /* fall through */
+ case ETH_P_BATMAN:
+ goto dropped;
+ }
+
++ skb_set_network_header(skb, network_offset);
++
+ if (batadv_bla_tx(bat_priv, skb, vid))
+ goto dropped;
+
+diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
+index 5809b39c1922..c9b26291ac4c 100644
+--- a/net/batman-adv/translation-table.c
++++ b/net/batman-adv/translation-table.c
+@@ -19,6 +19,7 @@
+ #include "main.h"
+
+ #include <linux/atomic.h>
++#include <linux/bitops.h>
+ #include <linux/bug.h>
+ #include <linux/byteorder/generic.h>
+ #include <linux/compiler.h>
+@@ -1882,7 +1883,7 @@ void batadv_tt_global_del_orig(struct batadv_priv *bat_priv,
+ }
+ spin_unlock_bh(list_lock);
+ }
+- orig_node->capa_initialized &= ~BATADV_ORIG_CAPA_HAS_TT;
++ clear_bit(BATADV_ORIG_CAPA_HAS_TT, &orig_node->capa_initialized);
+ }
+
+ static bool batadv_tt_global_to_purge(struct batadv_tt_global_entry *tt_global,
+@@ -2841,7 +2842,7 @@ static void _batadv_tt_update_changes(struct batadv_priv *bat_priv,
+ return;
+ }
+ }
+- orig_node->capa_initialized |= BATADV_ORIG_CAPA_HAS_TT;
++ set_bit(BATADV_ORIG_CAPA_HAS_TT, &orig_node->capa_initialized);
+ }
+
+ static void batadv_tt_fill_gtable(struct batadv_priv *bat_priv,
+@@ -3343,7 +3344,8 @@ static void batadv_tt_update_orig(struct batadv_priv *bat_priv,
+ bool has_tt_init;
+
+ tt_vlan = (struct batadv_tvlv_tt_vlan_data *)tt_buff;
+- has_tt_init = orig_node->capa_initialized & BATADV_ORIG_CAPA_HAS_TT;
++ has_tt_init = test_bit(BATADV_ORIG_CAPA_HAS_TT,
++ &orig_node->capa_initialized);
+
+ /* orig table not initialised AND first diff is in the OGM OR the ttvn
+ * increased by one -> we can apply the attached changes
+diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
+index 67d63483618e..55610a805b53 100644
+--- a/net/batman-adv/types.h
++++ b/net/batman-adv/types.h
+@@ -221,6 +221,7 @@ struct batadv_orig_bat_iv {
+ * @batadv_dat_addr_t: address of the orig node in the distributed hash
+ * @last_seen: time when last packet from this node was received
+ * @bcast_seqno_reset: time when the broadcast seqno window was reset
++ * @mcast_handler_lock: synchronizes mcast-capability and -flag changes
+ * @mcast_flags: multicast flags announced by the orig node
+ * @mcast_want_all_unsnoop_node: a list node for the
+ * mcast.want_all_unsnoopables list
+@@ -268,13 +269,15 @@ struct batadv_orig_node {
+ unsigned long last_seen;
+ unsigned long bcast_seqno_reset;
+ #ifdef CONFIG_BATMAN_ADV_MCAST
++ /* synchronizes mcast tvlv specific orig changes */
++ spinlock_t mcast_handler_lock;
+ uint8_t mcast_flags;
+ struct hlist_node mcast_want_all_unsnoopables_node;
+ struct hlist_node mcast_want_all_ipv4_node;
+ struct hlist_node mcast_want_all_ipv6_node;
+ #endif
+- uint8_t capabilities;
+- uint8_t capa_initialized;
++ unsigned long capabilities;
++ unsigned long capa_initialized;
+ atomic_t last_ttvn;
+ unsigned char *tt_buff;
+ int16_t tt_buff_len;
+@@ -313,10 +316,10 @@ struct batadv_orig_node {
+ * (= orig node announces a tvlv of type BATADV_TVLV_MCAST)
+ */
+ enum batadv_orig_capabilities {
+- BATADV_ORIG_CAPA_HAS_DAT = BIT(0),
+- BATADV_ORIG_CAPA_HAS_NC = BIT(1),
+- BATADV_ORIG_CAPA_HAS_TT = BIT(2),
+- BATADV_ORIG_CAPA_HAS_MCAST = BIT(3),
++ BATADV_ORIG_CAPA_HAS_DAT,
++ BATADV_ORIG_CAPA_HAS_NC,
++ BATADV_ORIG_CAPA_HAS_TT,
++ BATADV_ORIG_CAPA_HAS_MCAST,
+ };
+
+ /**
+diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c
+index ad82324f710f..0510a577a7b5 100644
+--- a/net/bluetooth/smp.c
++++ b/net/bluetooth/smp.c
+@@ -2311,12 +2311,6 @@ int smp_conn_security(struct hci_conn *hcon, __u8 sec_level)
+ if (!conn)
+ return 1;
+
+- chan = conn->smp;
+- if (!chan) {
+- BT_ERR("SMP security requested but not available");
+- return 1;
+- }
+-
+ if (!hci_dev_test_flag(hcon->hdev, HCI_LE_ENABLED))
+ return 1;
+
+@@ -2330,6 +2324,12 @@ int smp_conn_security(struct hci_conn *hcon, __u8 sec_level)
+ if (smp_ltk_encrypt(conn, hcon->pending_sec_level))
+ return 0;
+
++ chan = conn->smp;
++ if (!chan) {
++ BT_ERR("SMP security requested but not available");
++ return 1;
++ }
++
+ l2cap_chan_lock(chan);
+
+ /* If SMP is already in progress ignore this request */
+diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
+index afe905c208af..691b54fcaf2a 100644
+--- a/net/netfilter/ipset/ip_set_hash_gen.h
++++ b/net/netfilter/ipset/ip_set_hash_gen.h
+@@ -152,9 +152,13 @@ htable_bits(u32 hashsize)
+ #define SET_HOST_MASK(family) (family == AF_INET ? 32 : 128)
+
+ #ifdef IP_SET_HASH_WITH_NET0
++/* cidr from 0 to SET_HOST_MASK() value and c = cidr + 1 */
+ #define NLEN(family) (SET_HOST_MASK(family) + 1)
++#define CIDR_POS(c) ((c) - 1)
+ #else
++/* cidr from 1 to SET_HOST_MASK() value and c = cidr + 1 */
+ #define NLEN(family) SET_HOST_MASK(family)
++#define CIDR_POS(c) ((c) - 2)
+ #endif
+
+ #else
+@@ -305,7 +309,7 @@ mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
+ } else if (h->nets[i].cidr[n] < cidr) {
+ j = i;
+ } else if (h->nets[i].cidr[n] == cidr) {
+- h->nets[cidr - 1].nets[n]++;
++ h->nets[CIDR_POS(cidr)].nets[n]++;
+ return;
+ }
+ }
+@@ -314,7 +318,7 @@ mtype_add_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
+ h->nets[i].cidr[n] = h->nets[i - 1].cidr[n];
+ }
+ h->nets[i].cidr[n] = cidr;
+- h->nets[cidr - 1].nets[n] = 1;
++ h->nets[CIDR_POS(cidr)].nets[n] = 1;
+ }
+
+ static void
+@@ -325,8 +329,8 @@ mtype_del_cidr(struct htype *h, u8 cidr, u8 nets_length, u8 n)
+ for (i = 0; i < nets_length; i++) {
+ if (h->nets[i].cidr[n] != cidr)
+ continue;
+- h->nets[cidr - 1].nets[n]--;
+- if (h->nets[cidr - 1].nets[n] > 0)
++ h->nets[CIDR_POS(cidr)].nets[n]--;
++ if (h->nets[CIDR_POS(cidr)].nets[n] > 0)
+ return;
+ for (j = i; j < net_end && h->nets[j].cidr[n]; j++)
+ h->nets[j].cidr[n] = h->nets[j + 1].cidr[n];
+diff --git a/net/netfilter/ipset/ip_set_hash_netnet.c b/net/netfilter/ipset/ip_set_hash_netnet.c
+index 3c862c0a76d1..a93dfebffa81 100644
+--- a/net/netfilter/ipset/ip_set_hash_netnet.c
++++ b/net/netfilter/ipset/ip_set_hash_netnet.c
+@@ -131,6 +131,13 @@ hash_netnet4_data_next(struct hash_netnet4_elem *next,
+ #define HOST_MASK 32
+ #include "ip_set_hash_gen.h"
+
++static void
++hash_netnet4_init(struct hash_netnet4_elem *e)
++{
++ e->cidr[0] = HOST_MASK;
++ e->cidr[1] = HOST_MASK;
++}
++
+ static int
+ hash_netnet4_kadt(struct ip_set *set, const struct sk_buff *skb,
+ const struct xt_action_param *par,
+@@ -160,7 +167,7 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
+ {
+ const struct hash_netnet *h = set->data;
+ ipset_adtfn adtfn = set->variant->adt[adt];
+- struct hash_netnet4_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
++ struct hash_netnet4_elem e = { };
+ struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
+ u32 ip = 0, ip_to = 0, last;
+ u32 ip2 = 0, ip2_from = 0, ip2_to = 0, last2;
+@@ -169,6 +176,7 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[],
+ if (tb[IPSET_ATTR_LINENO])
+ *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
+
++ hash_netnet4_init(&e);
+ if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
+ !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS)))
+ return -IPSET_ERR_PROTOCOL;
+@@ -357,6 +365,13 @@ hash_netnet6_data_next(struct hash_netnet4_elem *next,
+ #define IP_SET_EMIT_CREATE
+ #include "ip_set_hash_gen.h"
+
++static void
++hash_netnet6_init(struct hash_netnet6_elem *e)
++{
++ e->cidr[0] = HOST_MASK;
++ e->cidr[1] = HOST_MASK;
++}
++
+ static int
+ hash_netnet6_kadt(struct ip_set *set, const struct sk_buff *skb,
+ const struct xt_action_param *par,
+@@ -385,13 +400,14 @@ hash_netnet6_uadt(struct ip_set *set, struct nlattr *tb[],
+ enum ipset_adt adt, u32 *lineno, u32 flags, bool retried)
+ {
+ ipset_adtfn adtfn = set->variant->adt[adt];
+- struct hash_netnet6_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
++ struct hash_netnet6_elem e = { };
+ struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
+ int ret;
+
+ if (tb[IPSET_ATTR_LINENO])
+ *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
+
++ hash_netnet6_init(&e);
+ if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
+ !ip_set_optattr_netorder(tb, IPSET_ATTR_CADT_FLAGS)))
+ return -IPSET_ERR_PROTOCOL;
+diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c b/net/netfilter/ipset/ip_set_hash_netportnet.c
+index 0c68734f5cc4..9a14c237830f 100644
+--- a/net/netfilter/ipset/ip_set_hash_netportnet.c
++++ b/net/netfilter/ipset/ip_set_hash_netportnet.c
+@@ -142,6 +142,13 @@ hash_netportnet4_data_next(struct hash_netportnet4_elem *next,
+ #define HOST_MASK 32
+ #include "ip_set_hash_gen.h"
+
++static void
++hash_netportnet4_init(struct hash_netportnet4_elem *e)
++{
++ e->cidr[0] = HOST_MASK;
++ e->cidr[1] = HOST_MASK;
++}
++
+ static int
+ hash_netportnet4_kadt(struct ip_set *set, const struct sk_buff *skb,
+ const struct xt_action_param *par,
+@@ -175,7 +182,7 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[],
+ {
+ const struct hash_netportnet *h = set->data;
+ ipset_adtfn adtfn = set->variant->adt[adt];
+- struct hash_netportnet4_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
++ struct hash_netportnet4_elem e = { };
+ struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
+ u32 ip = 0, ip_to = 0, ip_last, p = 0, port, port_to;
+ u32 ip2_from = 0, ip2_to = 0, ip2_last, ip2;
+@@ -185,6 +192,7 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[],
+ if (tb[IPSET_ATTR_LINENO])
+ *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
+
++ hash_netportnet4_init(&e);
+ if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
+ !ip_set_attr_netorder(tb, IPSET_ATTR_PORT) ||
+ !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) ||
+@@ -412,6 +420,13 @@ hash_netportnet6_data_next(struct hash_netportnet4_elem *next,
+ #define IP_SET_EMIT_CREATE
+ #include "ip_set_hash_gen.h"
+
++static void
++hash_netportnet6_init(struct hash_netportnet6_elem *e)
++{
++ e->cidr[0] = HOST_MASK;
++ e->cidr[1] = HOST_MASK;
++}
++
+ static int
+ hash_netportnet6_kadt(struct ip_set *set, const struct sk_buff *skb,
+ const struct xt_action_param *par,
+@@ -445,7 +460,7 @@ hash_netportnet6_uadt(struct ip_set *set, struct nlattr *tb[],
+ {
+ const struct hash_netportnet *h = set->data;
+ ipset_adtfn adtfn = set->variant->adt[adt];
+- struct hash_netportnet6_elem e = { .cidr = { HOST_MASK, HOST_MASK, }, };
++ struct hash_netportnet6_elem e = { };
+ struct ip_set_ext ext = IP_SET_INIT_UEXT(set);
+ u32 port, port_to;
+ bool with_ports = false;
+@@ -454,6 +469,7 @@ hash_netportnet6_uadt(struct ip_set *set, struct nlattr *tb[],
+ if (tb[IPSET_ATTR_LINENO])
+ *lineno = nla_get_u32(tb[IPSET_ATTR_LINENO]);
+
++ hash_netportnet6_init(&e);
+ if (unlikely(!tb[IPSET_ATTR_IP] || !tb[IPSET_ATTR_IP2] ||
+ !ip_set_attr_netorder(tb, IPSET_ATTR_PORT) ||
+ !ip_set_optattr_netorder(tb, IPSET_ATTR_PORT_TO) ||
+diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
+index 3c20d02aee73..0625a42df108 100644
+--- a/net/netfilter/nf_conntrack_core.c
++++ b/net/netfilter/nf_conntrack_core.c
+@@ -320,12 +320,13 @@ out_free:
+ }
+ EXPORT_SYMBOL_GPL(nf_ct_tmpl_alloc);
+
+-static void nf_ct_tmpl_free(struct nf_conn *tmpl)
++void nf_ct_tmpl_free(struct nf_conn *tmpl)
+ {
+ nf_ct_ext_destroy(tmpl);
+ nf_ct_ext_free(tmpl);
+ kfree(tmpl);
+ }
++EXPORT_SYMBOL_GPL(nf_ct_tmpl_free);
+
+ static void
+ destroy_conntrack(struct nf_conntrack *nfct)
+diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
+index 675d12c69e32..a5d41dfa9f05 100644
+--- a/net/netfilter/nf_log.c
++++ b/net/netfilter/nf_log.c
+@@ -107,12 +107,17 @@ EXPORT_SYMBOL(nf_log_register);
+
+ void nf_log_unregister(struct nf_logger *logger)
+ {
++ const struct nf_logger *log;
+ int i;
+
+ mutex_lock(&nf_log_mutex);
+- for (i = 0; i < NFPROTO_NUMPROTO; i++)
+- RCU_INIT_POINTER(loggers[i][logger->type], NULL);
++ for (i = 0; i < NFPROTO_NUMPROTO; i++) {
++ log = nft_log_dereference(loggers[i][logger->type]);
++ if (log == logger)
++ RCU_INIT_POINTER(loggers[i][logger->type], NULL);
++ }
+ mutex_unlock(&nf_log_mutex);
++ synchronize_rcu();
+ }
+ EXPORT_SYMBOL(nf_log_unregister);
+
+diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
+index d7f168527903..d6ee8f8b19b6 100644
+--- a/net/netfilter/nf_synproxy_core.c
++++ b/net/netfilter/nf_synproxy_core.c
+@@ -378,7 +378,7 @@ static int __net_init synproxy_net_init(struct net *net)
+ err3:
+ free_percpu(snet->stats);
+ err2:
+- nf_conntrack_free(ct);
++ nf_ct_tmpl_free(ct);
+ err1:
+ return err;
+ }
+diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
+index 0c0e8ecf02ab..70277b11f742 100644
+--- a/net/netfilter/nfnetlink.c
++++ b/net/netfilter/nfnetlink.c
+@@ -444,6 +444,7 @@ done:
+ static void nfnetlink_rcv(struct sk_buff *skb)
+ {
+ struct nlmsghdr *nlh = nlmsg_hdr(skb);
++ u_int16_t res_id;
+ int msglen;
+
+ if (nlh->nlmsg_len < NLMSG_HDRLEN ||
+@@ -468,7 +469,12 @@ static void nfnetlink_rcv(struct sk_buff *skb)
+
+ nfgenmsg = nlmsg_data(nlh);
+ skb_pull(skb, msglen);
+- nfnetlink_rcv_batch(skb, nlh, nfgenmsg->res_id);
++ /* Work around old nft using host byte order */
++ if (nfgenmsg->res_id == NFNL_SUBSYS_NFTABLES)
++ res_id = NFNL_SUBSYS_NFTABLES;
++ else
++ res_id = ntohs(nfgenmsg->res_id);
++ nfnetlink_rcv_batch(skb, nlh, res_id);
+ } else {
+ netlink_rcv_skb(skb, &nfnetlink_rcv_msg);
+ }
+diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c
+index 66def315eb56..9c8fab00164b 100644
+--- a/net/netfilter/nft_compat.c
++++ b/net/netfilter/nft_compat.c
+@@ -619,6 +619,13 @@ struct nft_xt {
+
+ static struct nft_expr_type nft_match_type;
+
++static bool nft_match_cmp(const struct xt_match *match,
++ const char *name, u32 rev, u32 family)
++{
++ return strcmp(match->name, name) == 0 && match->revision == rev &&
++ (match->family == NFPROTO_UNSPEC || match->family == family);
++}
++
+ static const struct nft_expr_ops *
+ nft_match_select_ops(const struct nft_ctx *ctx,
+ const struct nlattr * const tb[])
+@@ -626,7 +633,7 @@ nft_match_select_ops(const struct nft_ctx *ctx,
+ struct nft_xt *nft_match;
+ struct xt_match *match;
+ char *mt_name;
+- __u32 rev, family;
++ u32 rev, family;
+
+ if (tb[NFTA_MATCH_NAME] == NULL ||
+ tb[NFTA_MATCH_REV] == NULL ||
+@@ -641,8 +648,7 @@ nft_match_select_ops(const struct nft_ctx *ctx,
+ list_for_each_entry(nft_match, &nft_match_list, head) {
+ struct xt_match *match = nft_match->ops.data;
+
+- if (strcmp(match->name, mt_name) == 0 &&
+- match->revision == rev && match->family == family) {
++ if (nft_match_cmp(match, mt_name, rev, family)) {
+ if (!try_module_get(match->me))
+ return ERR_PTR(-ENOENT);
+
+@@ -693,6 +699,13 @@ static LIST_HEAD(nft_target_list);
+
+ static struct nft_expr_type nft_target_type;
+
++static bool nft_target_cmp(const struct xt_target *tg,
++ const char *name, u32 rev, u32 family)
++{
++ return strcmp(tg->name, name) == 0 && tg->revision == rev &&
++ (tg->family == NFPROTO_UNSPEC || tg->family == family);
++}
++
+ static const struct nft_expr_ops *
+ nft_target_select_ops(const struct nft_ctx *ctx,
+ const struct nlattr * const tb[])
+@@ -700,7 +713,7 @@ nft_target_select_ops(const struct nft_ctx *ctx,
+ struct nft_xt *nft_target;
+ struct xt_target *target;
+ char *tg_name;
+- __u32 rev, family;
++ u32 rev, family;
+
+ if (tb[NFTA_TARGET_NAME] == NULL ||
+ tb[NFTA_TARGET_REV] == NULL ||
+@@ -715,8 +728,7 @@ nft_target_select_ops(const struct nft_ctx *ctx,
+ list_for_each_entry(nft_target, &nft_target_list, head) {
+ struct xt_target *target = nft_target->ops.data;
+
+- if (strcmp(target->name, tg_name) == 0 &&
+- target->revision == rev && target->family == family) {
++ if (nft_target_cmp(target, tg_name, rev, family)) {
+ if (!try_module_get(target->me))
+ return ERR_PTR(-ENOENT);
+
+diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
+index 43ddeee404e9..f3377ce1ff18 100644
+--- a/net/netfilter/xt_CT.c
++++ b/net/netfilter/xt_CT.c
+@@ -233,7 +233,7 @@ out:
+ return 0;
+
+ err3:
+- nf_conntrack_free(ct);
++ nf_ct_tmpl_free(ct);
+ err2:
+ nf_ct_l3proto_module_put(par->family);
+ err1:
+diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+index d25cd430f9ff..95412abc95b0 100644
+--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
++++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+@@ -384,6 +384,7 @@ static int send_reply(struct svcxprt_rdma *rdma,
+ int byte_count)
+ {
+ struct ib_send_wr send_wr;
++ u32 xdr_off;
+ int sge_no;
+ int sge_bytes;
+ int page_no;
+@@ -418,8 +419,8 @@ static int send_reply(struct svcxprt_rdma *rdma,
+ ctxt->direction = DMA_TO_DEVICE;
+
+ /* Map the payload indicated by 'byte_count' */
++ xdr_off = 0;
+ for (sge_no = 1; byte_count && sge_no < vec->count; sge_no++) {
+- int xdr_off = 0;
+ sge_bytes = min_t(size_t, vec->sge[sge_no].iov_len, byte_count);
+ byte_count -= sge_bytes;
+ ctxt->sge[sge_no].addr =
+@@ -457,6 +458,13 @@ static int send_reply(struct svcxprt_rdma *rdma,
+ }
+ rqstp->rq_next_page = rqstp->rq_respages + 1;
+
++ /* The loop above bumps sc_dma_used for each sge. The
++ * xdr_buf.tail gets a separate sge, but resides in the
++ * same page as xdr_buf.head. Don't count it twice.
++ */
++ if (sge_no > ctxt->count)
++ atomic_dec(&rdma->sc_dma_used);
++
+ if (sge_no > rdma->sc_max_sge) {
+ pr_err("svcrdma: Too many sges (%d)\n", sge_no);
+ goto err;
+diff --git a/sound/arm/Kconfig b/sound/arm/Kconfig
+index 885683a3b0bd..e0406211716b 100644
+--- a/sound/arm/Kconfig
++++ b/sound/arm/Kconfig
+@@ -9,6 +9,14 @@ menuconfig SND_ARM
+ Drivers that are implemented on ASoC can be found in
+ "ALSA for SoC audio support" section.
+
++config SND_PXA2XX_LIB
++ tristate
++ select SND_AC97_CODEC if SND_PXA2XX_LIB_AC97
++ select SND_DMAENGINE_PCM
++
++config SND_PXA2XX_LIB_AC97
++ bool
++
+ if SND_ARM
+
+ config SND_ARMAACI
+@@ -21,13 +29,6 @@ config SND_PXA2XX_PCM
+ tristate
+ select SND_PCM
+
+-config SND_PXA2XX_LIB
+- tristate
+- select SND_AC97_CODEC if SND_PXA2XX_LIB_AC97
+-
+-config SND_PXA2XX_LIB_AC97
+- bool
+-
+ config SND_PXA2XX_AC97
+ tristate "AC97 driver for the Intel PXA2xx chip"
+ depends on ARCH_PXA
+diff --git a/sound/pci/hda/hda_tegra.c b/sound/pci/hda/hda_tegra.c
+index 477742cb70a2..58c0aad37284 100644
+--- a/sound/pci/hda/hda_tegra.c
++++ b/sound/pci/hda/hda_tegra.c
+@@ -73,6 +73,7 @@ struct hda_tegra {
+ struct clk *hda2codec_2x_clk;
+ struct clk *hda2hdmi_clk;
+ void __iomem *regs;
++ struct work_struct probe_work;
+ };
+
+ #ifdef CONFIG_PM
+@@ -294,7 +295,9 @@ static int hda_tegra_dev_disconnect(struct snd_device *device)
+ static int hda_tegra_dev_free(struct snd_device *device)
+ {
+ struct azx *chip = device->device_data;
++ struct hda_tegra *hda = container_of(chip, struct hda_tegra, chip);
+
++ cancel_work_sync(&hda->probe_work);
+ if (azx_bus(chip)->chip_init) {
+ azx_stop_all_streams(chip);
+ azx_stop_chip(chip);
+@@ -426,6 +429,9 @@ static int hda_tegra_first_init(struct azx *chip, struct platform_device *pdev)
+ /*
+ * constructor
+ */
++
++static void hda_tegra_probe_work(struct work_struct *work);
++
+ static int hda_tegra_create(struct snd_card *card,
+ unsigned int driver_caps,
+ struct hda_tegra *hda)
+@@ -452,6 +458,8 @@ static int hda_tegra_create(struct snd_card *card,
+ chip->single_cmd = false;
+ chip->snoop = true;
+
++ INIT_WORK(&hda->probe_work, hda_tegra_probe_work);
++
+ err = azx_bus_init(chip, NULL, &hda_tegra_io_ops);
+ if (err < 0)
+ return err;
+@@ -499,6 +507,21 @@ static int hda_tegra_probe(struct platform_device *pdev)
+ card->private_data = chip;
+
+ dev_set_drvdata(&pdev->dev, card);
++ schedule_work(&hda->probe_work);
++
++ return 0;
++
++out_free:
++ snd_card_free(card);
++ return err;
++}
++
++static void hda_tegra_probe_work(struct work_struct *work)
++{
++ struct hda_tegra *hda = container_of(work, struct hda_tegra, probe_work);
++ struct azx *chip = &hda->chip;
++ struct platform_device *pdev = to_platform_device(hda->dev);
++ int err;
+
+ err = hda_tegra_first_init(chip, pdev);
+ if (err < 0)
+@@ -520,11 +543,8 @@ static int hda_tegra_probe(struct platform_device *pdev)
+ chip->running = 1;
+ snd_hda_set_power_save(&chip->bus, power_save * 1000);
+
+- return 0;
+-
+-out_free:
+- snd_card_free(card);
+- return err;
++ out_free:
++ return; /* no error return from async probe */
+ }
+
+ static int hda_tegra_remove(struct platform_device *pdev)
+diff --git a/sound/pci/hda/patch_cirrus.c b/sound/pci/hda/patch_cirrus.c
+index 584a0343ab0c..85813de26da8 100644
+--- a/sound/pci/hda/patch_cirrus.c
++++ b/sound/pci/hda/patch_cirrus.c
+@@ -633,6 +633,7 @@ static const struct snd_pci_quirk cs4208_mac_fixup_tbl[] = {
+ SND_PCI_QUIRK(0x106b, 0x5e00, "MacBookPro 11,2", CS4208_MBP11),
+ SND_PCI_QUIRK(0x106b, 0x7100, "MacBookAir 6,1", CS4208_MBA6),
+ SND_PCI_QUIRK(0x106b, 0x7200, "MacBookAir 6,2", CS4208_MBA6),
++ SND_PCI_QUIRK(0x106b, 0x7b00, "MacBookPro 12,1", CS4208_MBP11),
+ {} /* terminator */
+ };
+
+diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
+index c8f01ccc2513..6a66139871c6 100644
+--- a/sound/pci/hda/patch_realtek.c
++++ b/sound/pci/hda/patch_realtek.c
+@@ -4188,6 +4188,24 @@ static void alc_fixup_disable_aamix(struct hda_codec *codec,
+ }
+ }
+
++/* fixup for Thinkpad docks: add dock pins, avoid HP parser fixup */
++static void alc_fixup_tpt440_dock(struct hda_codec *codec,
++ const struct hda_fixup *fix, int action)
++{
++ static const struct hda_pintbl pincfgs[] = {
++ { 0x16, 0x21211010 }, /* dock headphone */
++ { 0x19, 0x21a11010 }, /* dock mic */
++ { }
++ };
++ struct alc_spec *spec = codec->spec;
++
++ if (action == HDA_FIXUP_ACT_PRE_PROBE) {
++ spec->parse_flags = HDA_PINCFG_NO_HP_FIXUP;
++ codec->power_save_node = 0; /* avoid click noises */
++ snd_hda_apply_pincfgs(codec, pincfgs);
++ }
++}
++
+ static void alc_shutup_dell_xps13(struct hda_codec *codec)
+ {
+ struct alc_spec *spec = codec->spec;
+@@ -4562,7 +4580,6 @@ enum {
+ ALC255_FIXUP_HEADSET_MODE_NO_HP_MIC,
+ ALC293_FIXUP_DELL1_MIC_NO_PRESENCE,
+ ALC292_FIXUP_TPT440_DOCK,
+- ALC292_FIXUP_TPT440_DOCK2,
+ ALC283_FIXUP_BXBT2807_MIC,
+ ALC255_FIXUP_DELL_WMI_MIC_MUTE_LED,
+ ALC282_FIXUP_ASPIRE_V5_PINS,
+@@ -5029,17 +5046,7 @@ static const struct hda_fixup alc269_fixups[] = {
+ },
+ [ALC292_FIXUP_TPT440_DOCK] = {
+ .type = HDA_FIXUP_FUNC,
+- .v.func = alc269_fixup_pincfg_no_hp_to_lineout,
+- .chained = true,
+- .chain_id = ALC292_FIXUP_TPT440_DOCK2
+- },
+- [ALC292_FIXUP_TPT440_DOCK2] = {
+- .type = HDA_FIXUP_PINS,
+- .v.pins = (const struct hda_pintbl[]) {
+- { 0x16, 0x21211010 }, /* dock headphone */
+- { 0x19, 0x21a11010 }, /* dock mic */
+- { }
+- },
++ .v.func = alc_fixup_tpt440_dock,
+ .chained = true,
+ .chain_id = ALC269_FIXUP_LIMIT_INT_MIC_BOOST
+ },
+@@ -5299,6 +5306,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = {
+ SND_PCI_QUIRK(0x17aa, 0x2212, "Thinkpad T440", ALC292_FIXUP_TPT440_DOCK),
+ SND_PCI_QUIRK(0x17aa, 0x2214, "Thinkpad X240", ALC292_FIXUP_TPT440_DOCK),
+ SND_PCI_QUIRK(0x17aa, 0x2215, "Thinkpad", ALC269_FIXUP_LIMIT_INT_MIC_BOOST),
++ SND_PCI_QUIRK(0x17aa, 0x2223, "ThinkPad T550", ALC292_FIXUP_TPT440_DOCK),
+ SND_PCI_QUIRK(0x17aa, 0x2226, "ThinkPad X250", ALC292_FIXUP_TPT440_DOCK),
+ SND_PCI_QUIRK(0x17aa, 0x3977, "IdeaPad S210", ALC283_FIXUP_INT_MIC),
+ SND_PCI_QUIRK(0x17aa, 0x3978, "IdeaPad Y410P", ALC269_FIXUP_NO_SHUTUP),
+diff --git a/sound/pci/hda/patch_sigmatel.c b/sound/pci/hda/patch_sigmatel.c
+index 9d947aef2c8b..def5cc8dff02 100644
+--- a/sound/pci/hda/patch_sigmatel.c
++++ b/sound/pci/hda/patch_sigmatel.c
+@@ -4520,7 +4520,11 @@ static int patch_stac92hd73xx(struct hda_codec *codec)
+ return err;
+
+ spec = codec->spec;
+- codec->power_save_node = 1;
++ /* enable power_save_node only for new 92HD89xx chips, as it causes
++ * click noises on old 92HD73xx chips.
++ */
++ if ((codec->core.vendor_id & 0xfffffff0) != 0x111d7670)
++ codec->power_save_node = 1;
+ spec->linear_tone_beep = 0;
+ spec->gen.mixer_nid = 0x1d;
+ spec->have_spdif_mux = 1;
+diff --git a/sound/soc/au1x/db1200.c b/sound/soc/au1x/db1200.c
+index 58c3164802b8..8c907ebea189 100644
+--- a/sound/soc/au1x/db1200.c
++++ b/sound/soc/au1x/db1200.c
+@@ -129,6 +129,8 @@ static struct snd_soc_dai_link db1300_i2s_dai = {
+ .cpu_dai_name = "au1xpsc_i2s.2",
+ .platform_name = "au1xpsc-pcm.2",
+ .codec_name = "wm8731.0-001b",
++ .dai_fmt = SND_SOC_DAIFMT_LEFT_J | SND_SOC_DAIFMT_NB_NF |
++ SND_SOC_DAIFMT_CBM_CFM,
+ .ops = &db1200_i2s_wm8731_ops,
+ };
+
+@@ -146,6 +148,8 @@ static struct snd_soc_dai_link db1550_i2s_dai = {
+ .cpu_dai_name = "au1xpsc_i2s.3",
+ .platform_name = "au1xpsc-pcm.3",
+ .codec_name = "wm8731.0-001b",
++ .dai_fmt = SND_SOC_DAIFMT_LEFT_J | SND_SOC_DAIFMT_NB_NF |
++ SND_SOC_DAIFMT_CBM_CFM,
+ .ops = &db1200_i2s_wm8731_ops,
+ };
+
+diff --git a/sound/soc/codecs/sgtl5000.c b/sound/soc/codecs/sgtl5000.c
+index e673f6ceb521..7c411297bfdd 100644
+--- a/sound/soc/codecs/sgtl5000.c
++++ b/sound/soc/codecs/sgtl5000.c
+@@ -1377,8 +1377,8 @@ static int sgtl5000_probe(struct snd_soc_codec *codec)
+ sgtl5000->micbias_resistor << SGTL5000_BIAS_R_SHIFT);
+
+ snd_soc_update_bits(codec, SGTL5000_CHIP_MIC_CTRL,
+- SGTL5000_BIAS_R_MASK,
+- sgtl5000->micbias_voltage << SGTL5000_BIAS_R_SHIFT);
++ SGTL5000_BIAS_VOLT_MASK,
++ sgtl5000->micbias_voltage << SGTL5000_BIAS_VOLT_SHIFT);
+ /*
+ * disable DAP
+ * TODO:
+diff --git a/sound/soc/codecs/tas2552.c b/sound/soc/codecs/tas2552.c
+index 4f25a7d0efa2..b3e5685aca1e 100644
+--- a/sound/soc/codecs/tas2552.c
++++ b/sound/soc/codecs/tas2552.c
+@@ -551,7 +551,7 @@ static struct snd_soc_dai_driver tas2552_dai[] = {
+ /*
+ * DAC digital volumes. From -7 to 24 dB in 1 dB steps
+ */
+-static DECLARE_TLV_DB_SCALE(dac_tlv, -7, 100, 0);
++static DECLARE_TLV_DB_SCALE(dac_tlv, -700, 100, 0);
+
+ static const char * const tas2552_din_source_select[] = {
+ "Muted",
+diff --git a/sound/soc/dwc/designware_i2s.c b/sound/soc/dwc/designware_i2s.c
+index a3e97b46b64e..0d28e3b356f6 100644
+--- a/sound/soc/dwc/designware_i2s.c
++++ b/sound/soc/dwc/designware_i2s.c
+@@ -131,10 +131,10 @@ static inline void i2s_clear_irqs(struct dw_i2s_dev *dev, u32 stream)
+
+ if (stream == SNDRV_PCM_STREAM_PLAYBACK) {
+ for (i = 0; i < 4; i++)
+- i2s_write_reg(dev->i2s_base, TOR(i), 0);
++ i2s_read_reg(dev->i2s_base, TOR(i));
+ } else {
+ for (i = 0; i < 4; i++)
+- i2s_write_reg(dev->i2s_base, ROR(i), 0);
++ i2s_read_reg(dev->i2s_base, ROR(i));
+ }
+ }
+
+diff --git a/sound/soc/pxa/Kconfig b/sound/soc/pxa/Kconfig
+index 39cea80846c3..f2bf8661dd21 100644
+--- a/sound/soc/pxa/Kconfig
++++ b/sound/soc/pxa/Kconfig
+@@ -1,7 +1,6 @@
+ config SND_PXA2XX_SOC
+ tristate "SoC Audio for the Intel PXA2xx chip"
+ depends on ARCH_PXA
+- select SND_ARM
+ select SND_PXA2XX_LIB
+ help
+ Say Y or M if you want to add support for codecs attached to
+@@ -25,7 +24,6 @@ config SND_PXA2XX_AC97
+ config SND_PXA2XX_SOC_AC97
+ tristate
+ select AC97_BUS
+- select SND_ARM
+ select SND_PXA2XX_LIB_AC97
+ select SND_SOC_AC97_BUS
+
+diff --git a/sound/soc/pxa/pxa2xx-ac97.c b/sound/soc/pxa/pxa2xx-ac97.c
+index 1f6054650991..9e4b04e0fbd1 100644
+--- a/sound/soc/pxa/pxa2xx-ac97.c
++++ b/sound/soc/pxa/pxa2xx-ac97.c
+@@ -49,7 +49,7 @@ static struct snd_ac97_bus_ops pxa2xx_ac97_ops = {
+ .reset = pxa2xx_ac97_cold_reset,
+ };
+
+-static unsigned long pxa2xx_ac97_pcm_stereo_in_req = 12;
++static unsigned long pxa2xx_ac97_pcm_stereo_in_req = 11;
+ static struct snd_dmaengine_dai_dma_data pxa2xx_ac97_pcm_stereo_in = {
+ .addr = __PREG(PCDR),
+ .addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES,
+@@ -57,7 +57,7 @@ static struct snd_dmaengine_dai_dma_data pxa2xx_ac97_pcm_stereo_in = {
+ .filter_data = &pxa2xx_ac97_pcm_stereo_in_req,
+ };
+
+-static unsigned long pxa2xx_ac97_pcm_stereo_out_req = 11;
++static unsigned long pxa2xx_ac97_pcm_stereo_out_req = 12;
+ static struct snd_dmaengine_dai_dma_data pxa2xx_ac97_pcm_stereo_out = {
+ .addr = __PREG(PCDR),
+ .addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES,
+diff --git a/sound/synth/emux/emux_oss.c b/sound/synth/emux/emux_oss.c
+index 82e350e9501c..ac75816ada7c 100644
+--- a/sound/synth/emux/emux_oss.c
++++ b/sound/synth/emux/emux_oss.c
+@@ -69,7 +69,8 @@ snd_emux_init_seq_oss(struct snd_emux *emu)
+ struct snd_seq_oss_reg *arg;
+ struct snd_seq_device *dev;
+
+- if (snd_seq_device_new(emu->card, 0, SNDRV_SEQ_DEV_ID_OSS,
++ /* using device#1 here for avoiding conflicts with OPL3 */
++ if (snd_seq_device_new(emu->card, 1, SNDRV_SEQ_DEV_ID_OSS,
+ sizeof(struct snd_seq_oss_reg), &dev) < 0)
+ return;
+
+diff --git a/tools/lguest/lguest.c b/tools/lguest/lguest.c
+index e44052483ed9..80159e6811c2 100644
+--- a/tools/lguest/lguest.c
++++ b/tools/lguest/lguest.c
+@@ -125,7 +125,11 @@ struct device_list {
+ /* The list of Guest devices, based on command line arguments. */
+ static struct device_list devices;
+
+-struct virtio_pci_cfg_cap {
++/*
++ * Just like struct virtio_pci_cfg_cap in uapi/linux/virtio_pci.h,
++ * but uses a u32 explicitly for the data.
++ */
++struct virtio_pci_cfg_cap_u32 {
+ struct virtio_pci_cap cap;
+ u32 pci_cfg_data; /* Data for BAR access. */
+ };
+@@ -157,7 +161,7 @@ struct pci_config {
+ struct virtio_pci_notify_cap notify;
+ struct virtio_pci_cap isr;
+ struct virtio_pci_cap device;
+- struct virtio_pci_cfg_cap cfg_access;
++ struct virtio_pci_cfg_cap_u32 cfg_access;
+ };
+
+ /* The device structure describes a single device. */
+@@ -1291,7 +1295,7 @@ static struct device *dev_and_reg(u32 *reg)
+ * only fault if they try to write with some invalid bar/offset/length.
+ */
+ static bool valid_bar_access(struct device *d,
+- struct virtio_pci_cfg_cap *cfg_access)
++ struct virtio_pci_cfg_cap_u32 *cfg_access)
+ {
+ /* We only have 1 bar (BAR0) */
+ if (cfg_access->cap.bar != 0)
+diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
+index cc25f059ab3d..a843bee66a4f 100644
+--- a/tools/lib/traceevent/event-parse.c
++++ b/tools/lib/traceevent/event-parse.c
+@@ -3721,7 +3721,7 @@ static void print_str_arg(struct trace_seq *s, void *data, int size,
+ struct format_field *field;
+ struct printk_map *printk;
+ long long val, fval;
+- unsigned long addr;
++ unsigned long long addr;
+ char *str;
+ unsigned char *hex;
+ int print;
+@@ -3754,13 +3754,30 @@ static void print_str_arg(struct trace_seq *s, void *data, int size,
+ */
+ if (!(field->flags & FIELD_IS_ARRAY) &&
+ field->size == pevent->long_size) {
+- addr = *(unsigned long *)(data + field->offset);
++
++ /* Handle heterogeneous recording and processing
++ * architectures
++ *
++ * CASE I:
++ * Traces recorded on 32-bit devices (32-bit
++ * addressing) and processed on 64-bit devices:
++ * In this case, only 32 bits should be read.
++ *
++ * CASE II:
++ * Traces recorded on 64 bit devices and processed
++ * on 32-bit devices:
++ * In this case, 64 bits must be read.
++ */
++ addr = (pevent->long_size == 8) ?
++ *(unsigned long long *)(data + field->offset) :
++ (unsigned long long)*(unsigned int *)(data + field->offset);
++
+ /* Check if it matches a print format */
+ printk = find_printk(pevent, addr);
+ if (printk)
+ trace_seq_puts(s, printk->printk);
+ else
+- trace_seq_printf(s, "%lx", addr);
++ trace_seq_printf(s, "%llx", addr);
+ break;
+ }
+ str = malloc(len + 1);
+diff --git a/tools/perf/arch/alpha/Build b/tools/perf/arch/alpha/Build
+new file mode 100644
+index 000000000000..1bb8bf6d7fd4
+--- /dev/null
++++ b/tools/perf/arch/alpha/Build
+@@ -0,0 +1 @@
++# empty
+diff --git a/tools/perf/arch/mips/Build b/tools/perf/arch/mips/Build
+new file mode 100644
+index 000000000000..1bb8bf6d7fd4
+--- /dev/null
++++ b/tools/perf/arch/mips/Build
+@@ -0,0 +1 @@
++# empty
+diff --git a/tools/perf/arch/parisc/Build b/tools/perf/arch/parisc/Build
+new file mode 100644
+index 000000000000..1bb8bf6d7fd4
+--- /dev/null
++++ b/tools/perf/arch/parisc/Build
+@@ -0,0 +1 @@
++# empty
+diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
+index d99d850e1444..ef355fc0e870 100644
+--- a/tools/perf/builtin-stat.c
++++ b/tools/perf/builtin-stat.c
+@@ -694,7 +694,7 @@ static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
+ static void print_aggr(char *prefix)
+ {
+ struct perf_evsel *counter;
+- int cpu, cpu2, s, s2, id, nr;
++ int cpu, s, s2, id, nr;
+ double uval;
+ u64 ena, run, val;
+
+@@ -707,8 +707,7 @@ static void print_aggr(char *prefix)
+ val = ena = run = 0;
+ nr = 0;
+ for (cpu = 0; cpu < perf_evsel__nr_cpus(counter); cpu++) {
+- cpu2 = perf_evsel__cpus(counter)->map[cpu];
+- s2 = aggr_get_id(evsel_list->cpus, cpu2);
++ s2 = aggr_get_id(perf_evsel__cpus(counter), cpu);
+ if (s2 != id)
+ continue;
+ val += perf_counts(counter->counts, cpu, 0)->val;
+diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
+index 03ace57a800c..4215cc155041 100644
+--- a/tools/perf/util/header.c
++++ b/tools/perf/util/header.c
+@@ -1442,7 +1442,7 @@ static int process_nrcpus(struct perf_file_section *section __maybe_unused,
+ if (ph->needs_swap)
+ nr = bswap_32(nr);
+
+- ph->env.nr_cpus_online = nr;
++ ph->env.nr_cpus_avail = nr;
+
+ ret = readn(fd, &nr, sizeof(nr));
+ if (ret != sizeof(nr))
+@@ -1451,7 +1451,7 @@ static int process_nrcpus(struct perf_file_section *section __maybe_unused,
+ if (ph->needs_swap)
+ nr = bswap_32(nr);
+
+- ph->env.nr_cpus_avail = nr;
++ ph->env.nr_cpus_online = nr;
+ return 0;
+ }
+
+diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
+index 6f28d53d4e46..f298c696e24f 100644
+--- a/tools/perf/util/hist.c
++++ b/tools/perf/util/hist.c
+@@ -151,6 +151,9 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
+ hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12);
+ hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12);
+
++ if (h->srcline)
++ hists__new_col_len(hists, HISTC_SRCLINE, strlen(h->srcline));
++
+ if (h->transaction)
+ hists__new_col_len(hists, HISTC_TRANSACTION,
+ hist_entry__transaction_len());
+diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
+index 591905a02b92..9cd70819c795 100644
+--- a/tools/perf/util/parse-events.y
++++ b/tools/perf/util/parse-events.y
+@@ -255,7 +255,7 @@ PE_PMU_EVENT_PRE '-' PE_PMU_EVENT_SUF sep_dc
+ list_add_tail(&term->list, head);
+
+ ALLOC_LIST(list);
+- ABORT_ON(parse_events_add_pmu(list, &data->idx, "cpu", head));
++ ABORT_ON(parse_events_add_pmu(data, list, "cpu", head));
+ parse_events__free_terms(head);
+ $$ = list;
+ }
+diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
+index 381f23a443c7..ae6351db6de4 100644
+--- a/tools/perf/util/probe-event.c
++++ b/tools/perf/util/probe-event.c
+@@ -274,12 +274,13 @@ static int kernel_get_module_dso(const char *module, struct dso **pdso)
+ int ret = 0;
+
+ if (module) {
+- list_for_each_entry(dso, &host_machine->dsos.head, node) {
+- if (!dso->kernel)
+- continue;
+- if (strncmp(dso->short_name + 1, module,
+- dso->short_name_len - 2) == 0)
+- goto found;
++ char module_name[128];
++
++ snprintf(module_name, sizeof(module_name), "[%s]", module);
++ map = map_groups__find_by_name(&host_machine->kmaps, MAP__FUNCTION, module_name);
++ if (map) {
++ dso = map->dso;
++ goto found;
+ }
+ pr_debug("Failed to find module %s.\n", module);
+ return -ENOENT;
+diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
+index 31db6ee7db54..cd55c6db421d 100644
+--- a/tools/perf/util/probe-event.h
++++ b/tools/perf/util/probe-event.h
+@@ -106,6 +106,8 @@ struct variable_list {
+ struct strlist *vars; /* Available variables */
+ };
+
++struct map;
++
+ /* Command string to events */
+ extern int parse_perf_probe_command(const char *cmd,
+ struct perf_probe_event *pev);
+diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
+index 65f7e389ae09..333858821ab0 100644
+--- a/tools/perf/util/symbol-elf.c
++++ b/tools/perf/util/symbol-elf.c
+@@ -1260,8 +1260,6 @@ out_close:
+ static int kcore__init(struct kcore *kcore, char *filename, int elfclass,
+ bool temp)
+ {
+- GElf_Ehdr *ehdr;
+-
+ kcore->elfclass = elfclass;
+
+ if (temp)
+@@ -1278,9 +1276,7 @@ static int kcore__init(struct kcore *kcore, char *filename, int elfclass,
+ if (!gelf_newehdr(kcore->elf, elfclass))
+ goto out_end;
+
+- ehdr = gelf_getehdr(kcore->elf, &kcore->ehdr);
+- if (!ehdr)
+- goto out_end;
++ memset(&kcore->ehdr, 0, sizeof(GElf_Ehdr));
+
+ return 0;
+
+@@ -1337,23 +1333,18 @@ static int kcore__copy_hdr(struct kcore *from, struct kcore *to, size_t count)
+ static int kcore__add_phdr(struct kcore *kcore, int idx, off_t offset,
+ u64 addr, u64 len)
+ {
+- GElf_Phdr gphdr;
+- GElf_Phdr *phdr;
+-
+- phdr = gelf_getphdr(kcore->elf, idx, &gphdr);
+- if (!phdr)
+- return -1;
+-
+- phdr->p_type = PT_LOAD;
+- phdr->p_flags = PF_R | PF_W | PF_X;
+- phdr->p_offset = offset;
+- phdr->p_vaddr = addr;
+- phdr->p_paddr = 0;
+- phdr->p_filesz = len;
+- phdr->p_memsz = len;
+- phdr->p_align = page_size;
+-
+- if (!gelf_update_phdr(kcore->elf, idx, phdr))
++ GElf_Phdr phdr = {
++ .p_type = PT_LOAD,
++ .p_flags = PF_R | PF_W | PF_X,
++ .p_offset = offset,
++ .p_vaddr = addr,
++ .p_paddr = 0,
++ .p_filesz = len,
++ .p_memsz = len,
++ .p_align = page_size,
++ };
++
++ if (!gelf_update_phdr(kcore->elf, idx, &phdr))
+ return -1;
+
+ return 0;
+diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
+index 9ff4193dfa49..79db45336e3a 100644
+--- a/virt/kvm/eventfd.c
++++ b/virt/kvm/eventfd.c
+@@ -771,40 +771,14 @@ static enum kvm_bus ioeventfd_bus_from_flags(__u32 flags)
+ return KVM_MMIO_BUS;
+ }
+
+-static int
+-kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
++static int kvm_assign_ioeventfd_idx(struct kvm *kvm,
++ enum kvm_bus bus_idx,
++ struct kvm_ioeventfd *args)
+ {
+- enum kvm_bus bus_idx;
+- struct _ioeventfd *p;
+- struct eventfd_ctx *eventfd;
+- int ret;
+-
+- bus_idx = ioeventfd_bus_from_flags(args->flags);
+- /* must be natural-word sized, or 0 to ignore length */
+- switch (args->len) {
+- case 0:
+- case 1:
+- case 2:
+- case 4:
+- case 8:
+- break;
+- default:
+- return -EINVAL;
+- }
+-
+- /* check for range overflow */
+- if (args->addr + args->len < args->addr)
+- return -EINVAL;
+
+- /* check for extra flags that we don't understand */
+- if (args->flags & ~KVM_IOEVENTFD_VALID_FLAG_MASK)
+- return -EINVAL;
+-
+- /* ioeventfd with no length can't be combined with DATAMATCH */
+- if (!args->len &&
+- args->flags & (KVM_IOEVENTFD_FLAG_PIO |
+- KVM_IOEVENTFD_FLAG_DATAMATCH))
+- return -EINVAL;
++ struct eventfd_ctx *eventfd;
++ struct _ioeventfd *p;
++ int ret;
+
+ eventfd = eventfd_ctx_fdget(args->fd);
+ if (IS_ERR(eventfd))
+@@ -843,16 +817,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
+ if (ret < 0)
+ goto unlock_fail;
+
+- /* When length is ignored, MMIO is also put on a separate bus, for
+- * faster lookups.
+- */
+- if (!args->len && !(args->flags & KVM_IOEVENTFD_FLAG_PIO)) {
+- ret = kvm_io_bus_register_dev(kvm, KVM_FAST_MMIO_BUS,
+- p->addr, 0, &p->dev);
+- if (ret < 0)
+- goto register_fail;
+- }
+-
+ kvm->buses[bus_idx]->ioeventfd_count++;
+ list_add_tail(&p->list, &kvm->ioeventfds);
+
+@@ -860,8 +824,6 @@ kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
+
+ return 0;
+
+-register_fail:
+- kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
+ unlock_fail:
+ mutex_unlock(&kvm->slots_lock);
+
+@@ -873,14 +835,13 @@ fail:
+ }
+
+ static int
+-kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
++kvm_deassign_ioeventfd_idx(struct kvm *kvm, enum kvm_bus bus_idx,
++ struct kvm_ioeventfd *args)
+ {
+- enum kvm_bus bus_idx;
+ struct _ioeventfd *p, *tmp;
+ struct eventfd_ctx *eventfd;
+ int ret = -ENOENT;
+
+- bus_idx = ioeventfd_bus_from_flags(args->flags);
+ eventfd = eventfd_ctx_fdget(args->fd);
+ if (IS_ERR(eventfd))
+ return PTR_ERR(eventfd);
+@@ -901,10 +862,6 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
+ continue;
+
+ kvm_io_bus_unregister_dev(kvm, bus_idx, &p->dev);
+- if (!p->length) {
+- kvm_io_bus_unregister_dev(kvm, KVM_FAST_MMIO_BUS,
+- &p->dev);
+- }
+ kvm->buses[bus_idx]->ioeventfd_count--;
+ ioeventfd_release(p);
+ ret = 0;
+@@ -918,6 +875,71 @@ kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
+ return ret;
+ }
+
++static int kvm_deassign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
++{
++ enum kvm_bus bus_idx = ioeventfd_bus_from_flags(args->flags);
++ int ret = kvm_deassign_ioeventfd_idx(kvm, bus_idx, args);
++
++ if (!args->len && bus_idx == KVM_MMIO_BUS)
++ kvm_deassign_ioeventfd_idx(kvm, KVM_FAST_MMIO_BUS, args);
++
++ return ret;
++}
++
++static int
++kvm_assign_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
++{
++ enum kvm_bus bus_idx;
++ int ret;
++
++ bus_idx = ioeventfd_bus_from_flags(args->flags);
++ /* must be natural-word sized, or 0 to ignore length */
++ switch (args->len) {
++ case 0:
++ case 1:
++ case 2:
++ case 4:
++ case 8:
++ break;
++ default:
++ return -EINVAL;
++ }
++
++ /* check for range overflow */
++ if (args->addr + args->len < args->addr)
++ return -EINVAL;
++
++ /* check for extra flags that we don't understand */
++ if (args->flags & ~KVM_IOEVENTFD_VALID_FLAG_MASK)
++ return -EINVAL;
++
++ /* ioeventfd with no length can't be combined with DATAMATCH */
++ if (!args->len &&
++ args->flags & (KVM_IOEVENTFD_FLAG_PIO |
++ KVM_IOEVENTFD_FLAG_DATAMATCH))
++ return -EINVAL;
++
++ ret = kvm_assign_ioeventfd_idx(kvm, bus_idx, args);
++ if (ret)
++ goto fail;
++
++ /* When length is ignored, MMIO is also put on a separate bus, for
++ * faster lookups.
++ */
++ if (!args->len && bus_idx == KVM_MMIO_BUS) {
++ ret = kvm_assign_ioeventfd_idx(kvm, KVM_FAST_MMIO_BUS, args);
++ if (ret < 0)
++ goto fast_fail;
++ }
++
++ return 0;
++
++fast_fail:
++ kvm_deassign_ioeventfd_idx(kvm, bus_idx, args);
++fail:
++ return ret;
++}
++
+ int
+ kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
+ {
+diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
+index 8b8a44453670..5a2a78a91d58 100644
+--- a/virt/kvm/kvm_main.c
++++ b/virt/kvm/kvm_main.c
+@@ -3080,10 +3080,25 @@ static void kvm_io_bus_destroy(struct kvm_io_bus *bus)
+ static inline int kvm_io_bus_cmp(const struct kvm_io_range *r1,
+ const struct kvm_io_range *r2)
+ {
+- if (r1->addr < r2->addr)
++ gpa_t addr1 = r1->addr;
++ gpa_t addr2 = r2->addr;
++
++ if (addr1 < addr2)
+ return -1;
+- if (r1->addr + r1->len > r2->addr + r2->len)
++
++ /* If r2->len == 0, match the exact address. If r2->len != 0,
++ * accept any overlapping write. Any order is acceptable for
++ * overlapping ranges, because kvm_io_bus_get_first_dev ensures
++ * we process all of them.
++ */
++ if (r2->len) {
++ addr1 += r1->len;
++ addr2 += r2->len;
++ }
++
++ if (addr1 > addr2)
+ return 1;
++
+ return 0;
+ }
+
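The reworked comparison in kvm_io_bus_cmp() above can be tried outside the kernel. What follows is a minimal user-space sketch, not the kernel code itself: `struct io_range` and `io_bus_cmp()` are hypothetical stand-ins for `struct kvm_io_range` and `kvm_io_bus_cmp()`, `gpa_t` is assumed to be a 64-bit unsigned integer, and r1 is assumed to be the access being looked up while r2 is the registered range, which is how the comment in the hunk reads.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: guest physical addresses are 64-bit unsigned values. */
typedef uint64_t gpa_t;

/* Hypothetical stand-in for struct kvm_io_range (device pointer omitted). */
struct io_range {
	gpa_t addr;
	int len;
};

/*
 * Mirrors the logic of the patched comparison: a zero-length r2 matches
 * on the start address alone; otherwise the end addresses are compared.
 */
static int io_bus_cmp(const struct io_range *r1, const struct io_range *r2)
{
	gpa_t addr1 = r1->addr;
	gpa_t addr2 = r2->addr;

	if (addr1 < addr2)
		return -1;

	/* Only compare end addresses when r2 has a real length. */
	if (r2->len) {
		addr1 += r1->len;
		addr2 += r2->len;
	}

	if (addr1 > addr2)
		return 1;

	return 0;
}
```

With a zero-length registration at 0x1000, a 4-byte write at 0x1000 compares as 0 (a match). Under the removed end-address check the same pair would have compared as 1, because 0x1004 is greater than 0x1000.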
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-10-23 17:19 Mike Pagano
From: Mike Pagano @ 2015-10-23 17:19 UTC
To: gentoo-commits
commit: 6bc02433d40973c69bd8f87e1f849c63dc01a3c4
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Fri Oct 23 17:19:17 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Fri Oct 23 17:19:17 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=6bc02433
Remove redundant patch.
0000_README | 4 --
1600_dm-crypt-limit-max-segment-size.patch | 84 ------------------------------
2 files changed, 88 deletions(-)
diff --git a/0000_README b/0000_README
index 2a467c2..daafdd3 100644
--- a/0000_README
+++ b/0000_README
@@ -67,10 +67,6 @@ Patch: 1510_fs-enable-link-security-restrictions-by-default.patch
From: http://sources.debian.net/src/linux/3.16.7-ckt4-3/debian/patches/debian/fs-enable-link-security-restrictions-by-default.patch/
Desc: Enable link security restrictions by default.
-Patch: 1600_dm-crypt-limit-max-segment-size.patch
-From: https://bugzilla.kernel.org/show_bug.cgi?id=104421
-Desc: dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE.
-
Patch: 2700_ThinkPad-30-brightness-control-fix.patch
From: Seth Forshee <seth.forshee@canonical.com>
Desc: ACPI: Disable Windows 8 compatibility for some Lenovo ThinkPads.
diff --git a/1600_dm-crypt-limit-max-segment-size.patch b/1600_dm-crypt-limit-max-segment-size.patch
deleted file mode 100644
index 82aca44..0000000
--- a/1600_dm-crypt-limit-max-segment-size.patch
+++ /dev/null
@@ -1,84 +0,0 @@
-From 586b286b110e94eb31840ac5afc0c24e0881fe34 Mon Sep 17 00:00:00 2001
-From: Mike Snitzer <snitzer@redhat.com>
-Date: Wed, 9 Sep 2015 21:34:51 -0400
-Subject: dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE
-
-Setting the dm-crypt device's max_segment_size to PAGE_SIZE is an
-unfortunate constraint that is required to avoid the potential for
-exceeding dm-crypt's underlying device's max_segments limits -- due to
-crypt_alloc_buffer() possibly allocating pages for the encryption bio
-that are not as physically contiguous as the original bio.
-
-It is interesting to note that this problem was already fixed back in
-2007 via commit 91e106259 ("dm crypt: use bio_add_page"). But Linux 4.0
-commit cf2f1abfb ("dm crypt: don't allocate pages for a partial
-request") regressed dm-crypt back to _not_ using bio_add_page(). But
-given dm-crypt's cpu parallelization changes all depend on commit
-cf2f1abfb's abandoning of the more complex io fragments processing that
-dm-crypt previously had we cannot easily go back to using
-bio_add_page().
-
-So all said the cleanest way to resolve this issue is to fix dm-crypt to
-properly constrain the original bios entering dm-crypt so the encryption
-bios that dm-crypt generates from the original bios are always
-compatible with the underlying device's max_segments queue limits.
-
-It should be noted that technically Linux 4.3 does _not_ need this fix
-because of the block core's new late bio-splitting capability. But, it
-is reasoned, there is little to be gained by having the block core split
-the encrypted bio that is composed of PAGE_SIZE segments. That said, in
-the future we may revert this change.
-
-Fixes: cf2f1abfb ("dm crypt: don't allocate pages for a partial request")
-Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=104421
-Suggested-by: Jeff Moyer <jmoyer@redhat.com>
-Signed-off-by: Mike Snitzer <snitzer@redhat.com>
-Cc: stable@vger.kernel.org # 4.0+
-
-diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
-index d60c88d..4b3b6f8 100644
---- a/drivers/md/dm-crypt.c
-+++ b/drivers/md/dm-crypt.c
-@@ -968,7 +968,8 @@ static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone);
-
- /*
- * Generate a new unfragmented bio with the given size
-- * This should never violate the device limitations
-+ * This should never violate the device limitations (but only because
-+ * max_segment_size is being constrained to PAGE_SIZE).
- *
- * This function may be called concurrently. If we allocate from the mempool
- * concurrently, there is a possibility of deadlock. For example, if we have
-@@ -2045,9 +2046,20 @@ static int crypt_iterate_devices(struct dm_target *ti,
- return fn(ti, cc->dev, cc->start, ti->len, data);
- }
-
-+static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
-+{
-+ /*
-+ * Unfortunate constraint that is required to avoid the potential
-+ * for exceeding underlying device's max_segments limits -- due to
-+ * crypt_alloc_buffer() possibly allocating pages for the encryption
-+ * bio that are not as physically contiguous as the original bio.
-+ */
-+ limits->max_segment_size = PAGE_SIZE;
-+}
-+
- static struct target_type crypt_target = {
- .name = "crypt",
-- .version = {1, 14, 0},
-+ .version = {1, 14, 1},
- .module = THIS_MODULE,
- .ctr = crypt_ctr,
- .dtr = crypt_dtr,
-@@ -2058,6 +2070,7 @@ static struct target_type crypt_target = {
- .resume = crypt_resume,
- .message = crypt_message,
- .iterate_devices = crypt_iterate_devices,
-+ .io_hints = crypt_io_hints,
- };
-
- static int __init dm_crypt_init(void)
---
-cgit v0.10.2
-
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-10-27 13:36 Mike Pagano
From: Mike Pagano @ 2015-10-27 13:36 UTC
To: gentoo-commits
commit: b00da6f810d31f1fb924713c20c3f3b103f03228
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Tue Oct 27 13:36:07 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Tue Oct 27 13:36:07 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=b00da6f8
Linux patch 4.2.5
0000_README | 4 +
1004_linux-4.2.5.patch | 1945 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 1949 insertions(+)
diff --git a/0000_README b/0000_README
index daafdd3..d40ecf2 100644
--- a/0000_README
+++ b/0000_README
@@ -59,6 +59,10 @@ Patch: 1003_linux-4.2.4.patch
From: http://www.kernel.org
Desc: Linux 4.2.4
+Patch: 1004_linux-4.2.5.patch
+From: http://www.kernel.org
+Desc: Linux 4.2.5
+
Patch: 1500_XATTR_USER_PREFIX.patch
From: https://bugs.gentoo.org/show_bug.cgi?id=470644
Desc: Support for namespace user.pax.* on tmpfs.
diff --git a/1004_linux-4.2.5.patch b/1004_linux-4.2.5.patch
new file mode 100644
index 0000000..b866faf
--- /dev/null
+++ b/1004_linux-4.2.5.patch
@@ -0,0 +1,1945 @@
+diff --git a/Makefile b/Makefile
+index a952801a6cd5..96076dcad18e 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 4
++SUBLEVEL = 5
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+
+diff --git a/arch/arm/mach-ux500/Makefile b/arch/arm/mach-ux500/Makefile
+index 4418a5078833..c8643ac5db71 100644
+--- a/arch/arm/mach-ux500/Makefile
++++ b/arch/arm/mach-ux500/Makefile
+@@ -7,7 +7,7 @@ obj-$(CONFIG_CACHE_L2X0) += cache-l2x0.o
+ obj-$(CONFIG_UX500_SOC_DB8500) += cpu-db8500.o
+ obj-$(CONFIG_MACH_MOP500) += board-mop500-regulators.o \
+ board-mop500-audio.o
+-obj-$(CONFIG_SMP) += platsmp.o headsmp.o
++obj-$(CONFIG_SMP) += platsmp.o
+ obj-$(CONFIG_HOTPLUG_CPU) += hotplug.o
+ obj-$(CONFIG_PM_GENERIC_DOMAINS) += pm_domains.o
+
+diff --git a/arch/arm/mach-ux500/cpu-db8500.c b/arch/arm/mach-ux500/cpu-db8500.c
+index 16913800bbf9..ba708ce08616 100644
+--- a/arch/arm/mach-ux500/cpu-db8500.c
++++ b/arch/arm/mach-ux500/cpu-db8500.c
+@@ -154,7 +154,6 @@ static const char * stericsson_dt_platform_compat[] = {
+ };
+
+ DT_MACHINE_START(U8500_DT, "ST-Ericsson Ux5x0 platform (Device Tree Support)")
+- .smp = smp_ops(ux500_smp_ops),
+ .map_io = u8500_map_io,
+ .init_irq = ux500_init_irq,
+ /* we re-use nomadik timer here */
+diff --git a/arch/arm/mach-ux500/headsmp.S b/arch/arm/mach-ux500/headsmp.S
+deleted file mode 100644
+index 9cdea049485d..000000000000
+--- a/arch/arm/mach-ux500/headsmp.S
++++ /dev/null
+@@ -1,37 +0,0 @@
+-/*
+- * Copyright (c) 2009 ST-Ericsson
+- * This file is based ARM Realview platform
+- * Copyright (c) 2003 ARM Limited
+- * All Rights Reserved
+- *
+- * This program is free software; you can redistribute it and/or modify
+- * it under the terms of the GNU General Public License version 2 as
+- * published by the Free Software Foundation.
+- */
+-#include <linux/linkage.h>
+-#include <linux/init.h>
+-
+-/*
+- * U8500 specific entry point for secondary CPUs.
+- */
+-ENTRY(u8500_secondary_startup)
+- mrc p15, 0, r0, c0, c0, 5
+- and r0, r0, #15
+- adr r4, 1f
+- ldmia r4, {r5, r6}
+- sub r4, r4, r5
+- add r6, r6, r4
+-pen: ldr r7, [r6]
+- cmp r7, r0
+- bne pen
+-
+- /*
+- * we've been released from the holding pen: secondary_stack
+- * should now contain the SVC stack for this core
+- */
+- b secondary_startup
+-ENDPROC(u8500_secondary_startup)
+-
+- .align 2
+-1: .long .
+- .long pen_release
+diff --git a/arch/arm/mach-ux500/platsmp.c b/arch/arm/mach-ux500/platsmp.c
+index 62b1de922bd8..70766b963758 100644
+--- a/arch/arm/mach-ux500/platsmp.c
++++ b/arch/arm/mach-ux500/platsmp.c
+@@ -28,135 +28,81 @@
+ #include "db8500-regs.h"
+ #include "id.h"
+
+-static void __iomem *scu_base;
+-static void __iomem *backupram;
+-
+-/* This is called from headsmp.S to wakeup the secondary core */
+-extern void u8500_secondary_startup(void);
+-
+-/*
+- * Write pen_release in a way that is guaranteed to be visible to all
+- * observers, irrespective of whether they're taking part in coherency
+- * or not. This is necessary for the hotplug code to work reliably.
+- */
+-static void write_pen_release(int val)
+-{
+- pen_release = val;
+- smp_wmb();
+- sync_cache_w(&pen_release);
+-}
+-
+-static DEFINE_SPINLOCK(boot_lock);
+-
+-static void ux500_secondary_init(unsigned int cpu)
+-{
+- /*
+- * let the primary processor know we're out of the
+- * pen, then head off into the C entry point
+- */
+- write_pen_release(-1);
+-
+- /*
+- * Synchronise with the boot thread.
+- */
+- spin_lock(&boot_lock);
+- spin_unlock(&boot_lock);
+-}
++/* Magic triggers in backup RAM */
++#define UX500_CPU1_JUMPADDR_OFFSET 0x1FF4
++#define UX500_CPU1_WAKEMAGIC_OFFSET 0x1FF0
+
+-static int ux500_boot_secondary(unsigned int cpu, struct task_struct *idle)
++static void wakeup_secondary(void)
+ {
+- unsigned long timeout;
+-
+- /*
+- * set synchronisation state between this boot processor
+- * and the secondary one
+- */
+- spin_lock(&boot_lock);
+-
+- /*
+- * The secondary processor is waiting to be released from
+- * the holding pen - release it, then wait for it to flag
+- * that it has been released by resetting pen_release.
+- */
+- write_pen_release(cpu_logical_map(cpu));
+-
+- arch_send_wakeup_ipi_mask(cpumask_of(cpu));
++ struct device_node *np;
++ static void __iomem *backupram;
+
+- timeout = jiffies + (1 * HZ);
+- while (time_before(jiffies, timeout)) {
+- if (pen_release == -1)
+- break;
++ np = of_find_compatible_node(NULL, NULL, "ste,dbx500-backupram");
++ if (!np) {
++ pr_err("No backupram base address\n");
++ return;
++ }
++ backupram = of_iomap(np, 0);
++ of_node_put(np);
++ if (!backupram) {
++ pr_err("No backupram remap\n");
++ return;
+ }
+
+ /*
+- * now the secondary core is starting up let it run its
+- * calibrations, then wait for it to finish
+- */
+- spin_unlock(&boot_lock);
+-
+- return pen_release != -1 ? -ENOSYS : 0;
+-}
+-
+-static void __init wakeup_secondary(void)
+-{
+- /*
+ * write the address of secondary startup into the backup ram register
+ * at offset 0x1FF4, then write the magic number 0xA1FEED01 to the
+ * backup ram register at offset 0x1FF0, which is what boot rom code
+- * is waiting for. This would wake up the secondary core from WFE
++ * is waiting for. This will wake up the secondary core from WFE.
+ */
+-#define UX500_CPU1_JUMPADDR_OFFSET 0x1FF4
+- __raw_writel(virt_to_phys(u8500_secondary_startup),
+- backupram + UX500_CPU1_JUMPADDR_OFFSET);
+-
+-#define UX500_CPU1_WAKEMAGIC_OFFSET 0x1FF0
+- __raw_writel(0xA1FEED01,
+- backupram + UX500_CPU1_WAKEMAGIC_OFFSET);
++ writel(virt_to_phys(secondary_startup),
++ backupram + UX500_CPU1_JUMPADDR_OFFSET);
++ writel(0xA1FEED01,
++ backupram + UX500_CPU1_WAKEMAGIC_OFFSET);
+
+ /* make sure write buffer is drained */
+ mb();
++ iounmap(backupram);
+ }
+
+-/*
+- * Initialise the CPU possible map early - this describes the CPUs
+- * which may be present or become present in the system.
+- */
+-static void __init ux500_smp_init_cpus(void)
++static void __init ux500_smp_prepare_cpus(unsigned int max_cpus)
+ {
+- unsigned int i, ncores;
+ struct device_node *np;
++ static void __iomem *scu_base;
++ unsigned int ncores;
++ int i;
+
+ np = of_find_compatible_node(NULL, NULL, "arm,cortex-a9-scu");
++ if (!np) {
++ pr_err("No SCU base address\n");
++ return;
++ }
+ scu_base = of_iomap(np, 0);
+ of_node_put(np);
+- if (!scu_base)
++ if (!scu_base) {
++ pr_err("No SCU remap\n");
+ return;
+- backupram = ioremap(U8500_BACKUPRAM0_BASE, SZ_8K);
+- ncores = scu_get_core_count(scu_base);
+-
+- /* sanity check */
+- if (ncores > nr_cpu_ids) {
+- pr_warn("SMP: %u cores greater than maximum (%u), clipping\n",
+- ncores, nr_cpu_ids);
+- ncores = nr_cpu_ids;
+ }
+
++ scu_enable(scu_base);
++ ncores = scu_get_core_count(scu_base);
+ for (i = 0; i < ncores; i++)
+ set_cpu_possible(i, true);
++ iounmap(scu_base);
+ }
+
+-static void __init ux500_smp_prepare_cpus(unsigned int max_cpus)
++static int ux500_boot_secondary(unsigned int cpu, struct task_struct *idle)
+ {
+- scu_enable(scu_base);
+ wakeup_secondary();
++ arch_send_wakeup_ipi_mask(cpumask_of(cpu));
++ return 0;
+ }
+
+ struct smp_operations ux500_smp_ops __initdata = {
+- .smp_init_cpus = ux500_smp_init_cpus,
+ .smp_prepare_cpus = ux500_smp_prepare_cpus,
+- .smp_secondary_init = ux500_secondary_init,
+ .smp_boot_secondary = ux500_boot_secondary,
+ #ifdef CONFIG_HOTPLUG_CPU
+ .cpu_die = ux500_cpu_die,
+ #endif
+ };
++CPU_METHOD_OF_DECLARE(ux500_smp, "ste,dbx500-smp", &ux500_smp_ops);
+diff --git a/arch/arm/mach-ux500/setup.h b/arch/arm/mach-ux500/setup.h
+index 1fb6ad2789f1..65876eac0761 100644
+--- a/arch/arm/mach-ux500/setup.h
++++ b/arch/arm/mach-ux500/setup.h
+@@ -26,7 +26,6 @@ extern struct device *ux500_soc_device_init(const char *soc_id);
+
+ extern void ux500_timer_init(void);
+
+-extern struct smp_operations ux500_smp_ops;
+ extern void ux500_cpu_die(unsigned int cpu);
+
+ #endif /* __ASM_ARCH_SETUP_H */
+diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
+index 81151663ef38..3258174e6152 100644
+--- a/arch/arm64/Makefile
++++ b/arch/arm64/Makefile
+@@ -31,7 +31,7 @@ endif
+ CHECKFLAGS += -D__aarch64__
+
+ ifeq ($(CONFIG_ARM64_ERRATUM_843419), y)
+-CFLAGS_MODULE += -mcmodel=large
++KBUILD_CFLAGS_MODULE += -mcmodel=large
+ endif
+
+ # Default value
+diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
+index 56283f8a675c..cf7319422768 100644
+--- a/arch/arm64/include/asm/pgtable.h
++++ b/arch/arm64/include/asm/pgtable.h
+@@ -80,7 +80,7 @@ extern void __pgd_error(const char *file, int line, unsigned long val);
+ #define PAGE_S2 __pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY)
+ #define PAGE_S2_DEVICE __pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_UXN)
+
+-#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_TYPE_MASK) | PTE_PROT_NONE | PTE_PXN | PTE_UXN)
++#define PAGE_NONE __pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_PXN | PTE_UXN)
+ #define PAGE_SHARED __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
+ #define PAGE_SHARED_EXEC __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_WRITE)
+ #define PAGE_COPY __pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN)
+@@ -460,7 +460,7 @@ static inline pud_t *pud_offset(pgd_t *pgd, unsigned long addr)
+ static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
+ {
+ const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
+- PTE_PROT_NONE | PTE_WRITE | PTE_TYPE_MASK;
++ PTE_PROT_NONE | PTE_VALID | PTE_WRITE;
+ pte_val(pte) = (pte_val(pte) & ~mask) | (pgprot_val(newprot) & mask);
+ return pte;
+ }
+diff --git a/arch/sparc/crypto/aes_glue.c b/arch/sparc/crypto/aes_glue.c
+index 2e48eb8813ff..c90930de76ba 100644
+--- a/arch/sparc/crypto/aes_glue.c
++++ b/arch/sparc/crypto/aes_glue.c
+@@ -433,6 +433,7 @@ static struct crypto_alg algs[] = { {
+ .blkcipher = {
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
++ .ivsize = AES_BLOCK_SIZE,
+ .setkey = aes_set_key,
+ .encrypt = cbc_encrypt,
+ .decrypt = cbc_decrypt,
+@@ -452,6 +453,7 @@ static struct crypto_alg algs[] = { {
+ .blkcipher = {
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
++ .ivsize = AES_BLOCK_SIZE,
+ .setkey = aes_set_key,
+ .encrypt = ctr_crypt,
+ .decrypt = ctr_crypt,
+diff --git a/arch/sparc/crypto/camellia_glue.c b/arch/sparc/crypto/camellia_glue.c
+index 6bf2479a12fb..561a84d93cf6 100644
+--- a/arch/sparc/crypto/camellia_glue.c
++++ b/arch/sparc/crypto/camellia_glue.c
+@@ -274,6 +274,7 @@ static struct crypto_alg algs[] = { {
+ .blkcipher = {
+ .min_keysize = CAMELLIA_MIN_KEY_SIZE,
+ .max_keysize = CAMELLIA_MAX_KEY_SIZE,
++ .ivsize = CAMELLIA_BLOCK_SIZE,
+ .setkey = camellia_set_key,
+ .encrypt = cbc_encrypt,
+ .decrypt = cbc_decrypt,
+diff --git a/arch/sparc/crypto/des_glue.c b/arch/sparc/crypto/des_glue.c
+index dd6a34fa6e19..61af794aa2d3 100644
+--- a/arch/sparc/crypto/des_glue.c
++++ b/arch/sparc/crypto/des_glue.c
+@@ -429,6 +429,7 @@ static struct crypto_alg algs[] = { {
+ .blkcipher = {
+ .min_keysize = DES_KEY_SIZE,
+ .max_keysize = DES_KEY_SIZE,
++ .ivsize = DES_BLOCK_SIZE,
+ .setkey = des_set_key,
+ .encrypt = cbc_encrypt,
+ .decrypt = cbc_decrypt,
+@@ -485,6 +486,7 @@ static struct crypto_alg algs[] = { {
+ .blkcipher = {
+ .min_keysize = DES3_EDE_KEY_SIZE,
+ .max_keysize = DES3_EDE_KEY_SIZE,
++ .ivsize = DES3_EDE_BLOCK_SIZE,
+ .setkey = des3_ede_set_key,
+ .encrypt = cbc3_encrypt,
+ .decrypt = cbc3_decrypt,
+diff --git a/arch/x86/crypto/camellia_aesni_avx_glue.c b/arch/x86/crypto/camellia_aesni_avx_glue.c
+index 80a0e4389c9a..bacaa13acac5 100644
+--- a/arch/x86/crypto/camellia_aesni_avx_glue.c
++++ b/arch/x86/crypto/camellia_aesni_avx_glue.c
+@@ -554,6 +554,11 @@ static int __init camellia_aesni_init(void)
+ {
+ const char *feature_name;
+
++ if (!cpu_has_avx || !cpu_has_aes || !cpu_has_osxsave) {
++ pr_info("AVX or AES-NI instructions are not detected.\n");
++ return -ENODEV;
++ }
++
+ if (!cpu_has_xfeatures(XSTATE_SSE | XSTATE_YMM, &feature_name)) {
+ pr_info("CPU feature '%s' is not supported.\n", feature_name);
+ return -ENODEV;
+diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
+index e7a4fde5d631..2392541a96e6 100644
+--- a/arch/x86/kvm/emulate.c
++++ b/arch/x86/kvm/emulate.c
+@@ -2418,7 +2418,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase)
+ u64 val, cr0, cr4;
+ u32 base3;
+ u16 selector;
+- int i;
++ int i, r;
+
+ for (i = 0; i < 16; i++)
+ *reg_write(ctxt, i) = GET_SMSTATE(u64, smbase, 0x7ff8 - i * 8);
+@@ -2460,13 +2460,17 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase)
+ dt.address = GET_SMSTATE(u64, smbase, 0x7e68);
+ ctxt->ops->set_gdt(ctxt, &dt);
+
++ r = rsm_enter_protected_mode(ctxt, cr0, cr4);
++ if (r != X86EMUL_CONTINUE)
++ return r;
++
+ for (i = 0; i < 6; i++) {
+- int r = rsm_load_seg_64(ctxt, smbase, i);
++ r = rsm_load_seg_64(ctxt, smbase, i);
+ if (r != X86EMUL_CONTINUE)
+ return r;
+ }
+
+- return rsm_enter_protected_mode(ctxt, cr0, cr4);
++ return X86EMUL_CONTINUE;
+ }
+
+ static int em_rsm(struct x86_emulate_ctxt *ctxt)
+diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
+index 32c6e6ac5964..373328b71599 100644
+--- a/arch/x86/kvm/x86.c
++++ b/arch/x86/kvm/x86.c
+@@ -6706,6 +6706,12 @@ static inline int vcpu_block(struct kvm *kvm, struct kvm_vcpu *vcpu)
+ return 1;
+ }
+
++static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu)
++{
++ return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
++ !vcpu->arch.apf.halted);
++}
++
+ static int vcpu_run(struct kvm_vcpu *vcpu)
+ {
+ int r;
+@@ -6714,8 +6720,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
+ vcpu->srcu_idx = srcu_read_lock(&kvm->srcu);
+
+ for (;;) {
+- if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
+- !vcpu->arch.apf.halted)
++ if (kvm_vcpu_running(vcpu))
+ r = vcpu_enter_guest(vcpu);
+ else
+ r = vcpu_block(kvm, vcpu);
+@@ -8011,19 +8016,36 @@ void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
+ kvm_mmu_invalidate_zap_all_pages(kvm);
+ }
+
++static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
++{
++ if (!list_empty_careful(&vcpu->async_pf.done))
++ return true;
++
++ if (kvm_apic_has_events(vcpu))
++ return true;
++
++ if (vcpu->arch.pv.pv_unhalted)
++ return true;
++
++ if (atomic_read(&vcpu->arch.nmi_queued))
++ return true;
++
++ if (test_bit(KVM_REQ_SMI, &vcpu->requests))
++ return true;
++
++ if (kvm_arch_interrupt_allowed(vcpu) &&
++ kvm_cpu_has_interrupt(vcpu))
++ return true;
++
++ return false;
++}
++
+ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
+ {
+ if (is_guest_mode(vcpu) && kvm_x86_ops->check_nested_events)
+ kvm_x86_ops->check_nested_events(vcpu, false);
+
+- return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
+- !vcpu->arch.apf.halted)
+- || !list_empty_careful(&vcpu->async_pf.done)
+- || kvm_apic_has_events(vcpu)
+- || vcpu->arch.pv.pv_unhalted
+- || atomic_read(&vcpu->arch.nmi_queued) ||
+- (kvm_arch_interrupt_allowed(vcpu) &&
+- kvm_cpu_has_interrupt(vcpu));
++ return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
+ }
+
+ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
+diff --git a/crypto/ahash.c b/crypto/ahash.c
+index 8acb886032ae..9c1dc8d6106a 100644
+--- a/crypto/ahash.c
++++ b/crypto/ahash.c
+@@ -544,7 +544,8 @@ static int ahash_prepare_alg(struct ahash_alg *alg)
+ struct crypto_alg *base = &alg->halg.base;
+
+ if (alg->halg.digestsize > PAGE_SIZE / 8 ||
+- alg->halg.statesize > PAGE_SIZE / 8)
++ alg->halg.statesize > PAGE_SIZE / 8 ||
++ alg->halg.statesize == 0)
+ return -EINVAL;
+
+ base->cra_type = &crypto_ahash_type;
+diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
+index bc67a93aa4f4..324bf35ec4dd 100644
+--- a/drivers/block/rbd.c
++++ b/drivers/block/rbd.c
+@@ -5201,7 +5201,6 @@ static int rbd_dev_probe_parent(struct rbd_device *rbd_dev)
+ out_err:
+ if (parent) {
+ rbd_dev_unparent(rbd_dev);
+- kfree(rbd_dev->header_name);
+ rbd_dev_destroy(parent);
+ } else {
+ rbd_put_client(rbdc);
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+index b16b9256883e..4c4035fdeb6f 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+@@ -76,8 +76,6 @@ static void amdgpu_flip_work_func(struct work_struct *__work)
+ /* We borrow the event spin lock for protecting flip_status */
+ spin_lock_irqsave(&crtc->dev->event_lock, flags);
+
+- /* set the proper interrupt */
+- amdgpu_irq_get(adev, &adev->pageflip_irq, work->crtc_id);
+ /* do the flip (mmio) */
+ adev->mode_info.funcs->page_flip(adev, work->crtc_id, work->base);
+ /* set the flip status */
+diff --git a/drivers/gpu/drm/amd/amdgpu/ci_dpm.c b/drivers/gpu/drm/amd/amdgpu/ci_dpm.c
+index 82e8d0730517..a1a35a5df8e7 100644
+--- a/drivers/gpu/drm/amd/amdgpu/ci_dpm.c
++++ b/drivers/gpu/drm/amd/amdgpu/ci_dpm.c
+@@ -6185,6 +6185,11 @@ static int ci_dpm_late_init(void *handle)
+ if (!amdgpu_dpm)
+ return 0;
+
++ /* init the sysfs and debugfs files late */
++ ret = amdgpu_pm_sysfs_init(adev);
++ if (ret)
++ return ret;
++
+ ret = ci_set_temperature_range(adev);
+ if (ret)
+ return ret;
+@@ -6232,9 +6237,6 @@ static int ci_dpm_sw_init(void *handle)
+ adev->pm.dpm.current_ps = adev->pm.dpm.requested_ps = adev->pm.dpm.boot_ps;
+ if (amdgpu_dpm == 1)
+ amdgpu_pm_print_power_states(adev);
+- ret = amdgpu_pm_sysfs_init(adev);
+- if (ret)
+- goto dpm_failed;
+ mutex_unlock(&adev->pm.mutex);
+ DRM_INFO("amdgpu: dpm initialized\n");
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c
+index 341c56681841..519fa515c4d8 100644
+--- a/drivers/gpu/drm/amd/amdgpu/cik.c
++++ b/drivers/gpu/drm/amd/amdgpu/cik.c
+@@ -1565,6 +1565,9 @@ static void cik_pcie_gen3_enable(struct amdgpu_device *adev)
+ int ret, i;
+ u16 tmp16;
+
++ if (pci_is_root_bus(adev->pdev->bus))
++ return;
++
+ if (amdgpu_pcie_gen2 == 0)
+ return;
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/cz_dpm.c b/drivers/gpu/drm/amd/amdgpu/cz_dpm.c
+index ace870afc7d4..fd29c18fc14e 100644
+--- a/drivers/gpu/drm/amd/amdgpu/cz_dpm.c
++++ b/drivers/gpu/drm/amd/amdgpu/cz_dpm.c
+@@ -596,6 +596,12 @@ static int cz_dpm_late_init(void *handle)
+ struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+ if (amdgpu_dpm) {
++ int ret;
++ /* init the sysfs and debugfs files late */
++ ret = amdgpu_pm_sysfs_init(adev);
++ if (ret)
++ return ret;
++
+ /* powerdown unused blocks for now */
+ cz_dpm_powergate_uvd(adev, true);
+ cz_dpm_powergate_vce(adev, true);
+@@ -632,10 +638,6 @@ static int cz_dpm_sw_init(void *handle)
+ if (amdgpu_dpm == 1)
+ amdgpu_pm_print_power_states(adev);
+
+- ret = amdgpu_pm_sysfs_init(adev);
+- if (ret)
+- goto dpm_init_failed;
+-
+ mutex_unlock(&adev->pm.mutex);
+ DRM_INFO("amdgpu: dpm initialized\n");
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
+index e774a437dd65..ef36467c7e34 100644
+--- a/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/dce_v10_0.c
+@@ -233,6 +233,24 @@ static u32 dce_v10_0_vblank_get_counter(struct amdgpu_device *adev, int crtc)
+ return RREG32(mmCRTC_STATUS_FRAME_COUNT + crtc_offsets[crtc]);
+ }
+
++static void dce_v10_0_pageflip_interrupt_init(struct amdgpu_device *adev)
++{
++ unsigned i;
++
++ /* Enable pflip interrupts */
++ for (i = 0; i < adev->mode_info.num_crtc; i++)
++ amdgpu_irq_get(adev, &adev->pageflip_irq, i);
++}
++
++static void dce_v10_0_pageflip_interrupt_fini(struct amdgpu_device *adev)
++{
++ unsigned i;
++
++ /* Disable pflip interrupts */
++ for (i = 0; i < adev->mode_info.num_crtc; i++)
++ amdgpu_irq_put(adev, &adev->pageflip_irq, i);
++}
++
+ /**
+ * dce_v10_0_page_flip - pageflip callback.
+ *
+@@ -2641,9 +2659,10 @@ static void dce_v10_0_crtc_dpms(struct drm_crtc *crtc, int mode)
+ dce_v10_0_vga_enable(crtc, true);
+ amdgpu_atombios_crtc_blank(crtc, ATOM_DISABLE);
+ dce_v10_0_vga_enable(crtc, false);
+- /* Make sure VBLANK interrupt is still enabled */
++ /* Make sure VBLANK and PFLIP interrupts are still enabled */
+ type = amdgpu_crtc_idx_to_irq_type(adev, amdgpu_crtc->crtc_id);
+ amdgpu_irq_update(adev, &adev->crtc_irq, type);
++ amdgpu_irq_update(adev, &adev->pageflip_irq, type);
+ drm_vblank_post_modeset(dev, amdgpu_crtc->crtc_id);
+ dce_v10_0_crtc_load_lut(crtc);
+ break;
+@@ -3002,6 +3021,8 @@ static int dce_v10_0_hw_init(void *handle)
+ dce_v10_0_audio_enable(adev, &adev->mode_info.audio.pin[i], false);
+ }
+
++ dce_v10_0_pageflip_interrupt_init(adev);
++
+ return 0;
+ }
+
+@@ -3016,6 +3037,8 @@ static int dce_v10_0_hw_fini(void *handle)
+ dce_v10_0_audio_enable(adev, &adev->mode_info.audio.pin[i], false);
+ }
+
++ dce_v10_0_pageflip_interrupt_fini(adev);
++
+ return 0;
+ }
+
+@@ -3027,6 +3050,8 @@ static int dce_v10_0_suspend(void *handle)
+
+ dce_v10_0_hpd_fini(adev);
+
++ dce_v10_0_pageflip_interrupt_fini(adev);
++
+ return 0;
+ }
+
+@@ -3052,6 +3077,8 @@ static int dce_v10_0_resume(void *handle)
+ /* initialize hpd */
+ dce_v10_0_hpd_init(adev);
+
++ dce_v10_0_pageflip_interrupt_init(adev);
++
+ return 0;
+ }
+
+@@ -3346,7 +3373,6 @@ static int dce_v10_0_pageflip_irq(struct amdgpu_device *adev,
+ spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
+
+ drm_vblank_put(adev->ddev, amdgpu_crtc->crtc_id);
+- amdgpu_irq_put(adev, &adev->pageflip_irq, crtc_id);
+ queue_work(amdgpu_crtc->pflip_queue, &works->unpin_work);
+
+ return 0;
+diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
+index c4a21a7afd68..329bca0f1331 100644
+--- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
+@@ -233,6 +233,24 @@ static u32 dce_v11_0_vblank_get_counter(struct amdgpu_device *adev, int crtc)
+ return RREG32(mmCRTC_STATUS_FRAME_COUNT + crtc_offsets[crtc]);
+ }
+
++static void dce_v11_0_pageflip_interrupt_init(struct amdgpu_device *adev)
++{
++ unsigned i;
++
++ /* Enable pflip interrupts */
++ for (i = 0; i < adev->mode_info.num_crtc; i++)
++ amdgpu_irq_get(adev, &adev->pageflip_irq, i);
++}
++
++static void dce_v11_0_pageflip_interrupt_fini(struct amdgpu_device *adev)
++{
++ unsigned i;
++
++ /* Disable pflip interrupts */
++ for (i = 0; i < adev->mode_info.num_crtc; i++)
++ amdgpu_irq_put(adev, &adev->pageflip_irq, i);
++}
++
+ /**
+ * dce_v11_0_page_flip - pageflip callback.
+ *
+@@ -2640,9 +2658,10 @@ static void dce_v11_0_crtc_dpms(struct drm_crtc *crtc, int mode)
+ dce_v11_0_vga_enable(crtc, true);
+ amdgpu_atombios_crtc_blank(crtc, ATOM_DISABLE);
+ dce_v11_0_vga_enable(crtc, false);
+- /* Make sure VBLANK interrupt is still enabled */
++ /* Make sure VBLANK and PFLIP interrupts are still enabled */
+ type = amdgpu_crtc_idx_to_irq_type(adev, amdgpu_crtc->crtc_id);
+ amdgpu_irq_update(adev, &adev->crtc_irq, type);
++ amdgpu_irq_update(adev, &adev->pageflip_irq, type);
+ drm_vblank_post_modeset(dev, amdgpu_crtc->crtc_id);
+ dce_v11_0_crtc_load_lut(crtc);
+ break;
+@@ -2888,7 +2907,7 @@ static int dce_v11_0_early_init(void *handle)
+
+ switch (adev->asic_type) {
+ case CHIP_CARRIZO:
+- adev->mode_info.num_crtc = 4;
++ adev->mode_info.num_crtc = 3;
+ adev->mode_info.num_hpd = 6;
+ adev->mode_info.num_dig = 9;
+ break;
+@@ -3000,6 +3019,8 @@ static int dce_v11_0_hw_init(void *handle)
+ dce_v11_0_audio_enable(adev, &adev->mode_info.audio.pin[i], false);
+ }
+
++ dce_v11_0_pageflip_interrupt_init(adev);
++
+ return 0;
+ }
+
+@@ -3014,6 +3035,8 @@ static int dce_v11_0_hw_fini(void *handle)
+ dce_v11_0_audio_enable(adev, &adev->mode_info.audio.pin[i], false);
+ }
+
++ dce_v11_0_pageflip_interrupt_fini(adev);
++
+ return 0;
+ }
+
+@@ -3025,6 +3048,8 @@ static int dce_v11_0_suspend(void *handle)
+
+ dce_v11_0_hpd_fini(adev);
+
++ dce_v11_0_pageflip_interrupt_fini(adev);
++
+ return 0;
+ }
+
+@@ -3051,6 +3076,8 @@ static int dce_v11_0_resume(void *handle)
+ /* initialize hpd */
+ dce_v11_0_hpd_init(adev);
+
++ dce_v11_0_pageflip_interrupt_init(adev);
++
+ return 0;
+ }
+
+@@ -3345,7 +3372,6 @@ static int dce_v11_0_pageflip_irq(struct amdgpu_device *adev,
+ spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
+
+ drm_vblank_put(adev->ddev, amdgpu_crtc->crtc_id);
+- amdgpu_irq_put(adev, &adev->pageflip_irq, crtc_id);
+ queue_work(amdgpu_crtc->pflip_queue, &works->unpin_work);
+
+ return 0;
+diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
+index cc050a329c49..937879ed86bc 100644
+--- a/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
++++ b/drivers/gpu/drm/amd/amdgpu/dce_v8_0.c
+@@ -204,6 +204,24 @@ static u32 dce_v8_0_vblank_get_counter(struct amdgpu_device *adev, int crtc)
+ return RREG32(mmCRTC_STATUS_FRAME_COUNT + crtc_offsets[crtc]);
+ }
+
++static void dce_v8_0_pageflip_interrupt_init(struct amdgpu_device *adev)
++{
++ unsigned i;
++
++ /* Enable pflip interrupts */
++ for (i = 0; i < adev->mode_info.num_crtc; i++)
++ amdgpu_irq_get(adev, &adev->pageflip_irq, i);
++}
++
++static void dce_v8_0_pageflip_interrupt_fini(struct amdgpu_device *adev)
++{
++ unsigned i;
++
++ /* Disable pflip interrupts */
++ for (i = 0; i < adev->mode_info.num_crtc; i++)
++ amdgpu_irq_put(adev, &adev->pageflip_irq, i);
++}
++
+ /**
+ * dce_v8_0_page_flip - pageflip callback.
+ *
+@@ -2575,9 +2593,10 @@ static void dce_v8_0_crtc_dpms(struct drm_crtc *crtc, int mode)
+ dce_v8_0_vga_enable(crtc, true);
+ amdgpu_atombios_crtc_blank(crtc, ATOM_DISABLE);
+ dce_v8_0_vga_enable(crtc, false);
+- /* Make sure VBLANK interrupt is still enabled */
++ /* Make sure VBLANK and PFLIP interrupts are still enabled */
+ type = amdgpu_crtc_idx_to_irq_type(adev, amdgpu_crtc->crtc_id);
+ amdgpu_irq_update(adev, &adev->crtc_irq, type);
++ amdgpu_irq_update(adev, &adev->pageflip_irq, type);
+ drm_vblank_post_modeset(dev, amdgpu_crtc->crtc_id);
+ dce_v8_0_crtc_load_lut(crtc);
+ break;
+@@ -2933,6 +2952,8 @@ static int dce_v8_0_hw_init(void *handle)
+ dce_v8_0_audio_enable(adev, &adev->mode_info.audio.pin[i], false);
+ }
+
++ dce_v8_0_pageflip_interrupt_init(adev);
++
+ return 0;
+ }
+
+@@ -2947,6 +2968,8 @@ static int dce_v8_0_hw_fini(void *handle)
+ dce_v8_0_audio_enable(adev, &adev->mode_info.audio.pin[i], false);
+ }
+
++ dce_v8_0_pageflip_interrupt_fini(adev);
++
+ return 0;
+ }
+
+@@ -2958,6 +2981,8 @@ static int dce_v8_0_suspend(void *handle)
+
+ dce_v8_0_hpd_fini(adev);
+
++ dce_v8_0_pageflip_interrupt_fini(adev);
++
+ return 0;
+ }
+
+@@ -2981,6 +3006,8 @@ static int dce_v8_0_resume(void *handle)
+ /* initialize hpd */
+ dce_v8_0_hpd_init(adev);
+
++ dce_v8_0_pageflip_interrupt_init(adev);
++
+ return 0;
+ }
+
+@@ -3376,7 +3403,6 @@ static int dce_v8_0_pageflip_irq(struct amdgpu_device *adev,
+ spin_unlock_irqrestore(&adev->ddev->event_lock, flags);
+
+ drm_vblank_put(adev->ddev, amdgpu_crtc->crtc_id);
+- amdgpu_irq_put(adev, &adev->pageflip_irq, crtc_id);
+ queue_work(amdgpu_crtc->pflip_queue, &works->unpin_work);
+
+ return 0;
+diff --git a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
+index 94ec04a9c4d5..9745ed3a9aef 100644
+--- a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
++++ b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
+@@ -2995,6 +2995,12 @@ static int kv_dpm_late_init(void *handle)
+ {
+ /* powerdown unused blocks for now */
+ struct amdgpu_device *adev = (struct amdgpu_device *)handle;
++ int ret;
++
++ /* init the sysfs and debugfs files late */
++ ret = amdgpu_pm_sysfs_init(adev);
++ if (ret)
++ return ret;
+
+ kv_dpm_powergate_acp(adev, true);
+ kv_dpm_powergate_samu(adev, true);
+@@ -3038,9 +3044,6 @@ static int kv_dpm_sw_init(void *handle)
+ adev->pm.dpm.current_ps = adev->pm.dpm.requested_ps = adev->pm.dpm.boot_ps;
+ if (amdgpu_dpm == 1)
+ amdgpu_pm_print_power_states(adev);
+- ret = amdgpu_pm_sysfs_init(adev);
+- if (ret)
+- goto dpm_failed;
+ mutex_unlock(&adev->pm.mutex);
+ DRM_INFO("amdgpu: dpm initialized\n");
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
+index 4f58a1e18de6..9ffa56cebdbc 100644
+--- a/drivers/gpu/drm/amd/amdgpu/vi.c
++++ b/drivers/gpu/drm/amd/amdgpu/vi.c
+@@ -968,6 +968,9 @@ static void vi_pcie_gen3_enable(struct amdgpu_device *adev)
+ u32 mask;
+ int ret;
+
++ if (pci_is_root_bus(adev->pdev->bus))
++ return;
++
+ if (amdgpu_pcie_gen2 == 0)
+ return;
+
+diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
+index 969e7898a7ed..27a2426c3daa 100644
+--- a/drivers/gpu/drm/drm_dp_mst_topology.c
++++ b/drivers/gpu/drm/drm_dp_mst_topology.c
+@@ -2789,12 +2789,13 @@ static int drm_dp_mst_i2c_xfer(struct i2c_adapter *adapter, struct i2c_msg *msgs
+ if (msgs[num - 1].flags & I2C_M_RD)
+ reading = true;
+
+- if (!reading) {
++ if (!reading || (num - 1 > DP_REMOTE_I2C_READ_MAX_TRANSACTIONS)) {
+ DRM_DEBUG_KMS("Unsupported I2C transaction for MST device\n");
+ ret = -EIO;
+ goto out;
+ }
+
++ memset(&msg, 0, sizeof(msg));
+ msg.req_type = DP_REMOTE_I2C_READ;
+ msg.u.i2c_read.num_transactions = num - 1;
+ msg.u.i2c_read.port_number = port->port_num;
+diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
+index 0f6cd33b531f..684bd4a13843 100644
+--- a/drivers/gpu/drm/drm_sysfs.c
++++ b/drivers/gpu/drm/drm_sysfs.c
+@@ -235,18 +235,12 @@ static ssize_t dpms_show(struct device *device,
+ char *buf)
+ {
+ struct drm_connector *connector = to_drm_connector(device);
+- struct drm_device *dev = connector->dev;
+- uint64_t dpms_status;
+- int ret;
++ int dpms;
+
+- ret = drm_object_property_get_value(&connector->base,
+- dev->mode_config.dpms_property,
+- &dpms_status);
+- if (ret)
+- return 0;
++ dpms = READ_ONCE(connector->dpms);
+
+ return snprintf(buf, PAGE_SIZE, "%s\n",
+- drm_get_dpms_name((int)dpms_status));
++ drm_get_dpms_name(dpms));
+ }
+
+ static ssize_t enabled_show(struct device *device,
+diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+index 6751553abe4a..567791b27d6d 100644
+--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
++++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+@@ -178,8 +178,30 @@ nouveau_fbcon_sync(struct fb_info *info)
+ return 0;
+ }
+
++static int
++nouveau_fbcon_open(struct fb_info *info, int user)
++{
++ struct nouveau_fbdev *fbcon = info->par;
++ struct nouveau_drm *drm = nouveau_drm(fbcon->dev);
++ int ret = pm_runtime_get_sync(drm->dev->dev);
++ if (ret < 0 && ret != -EACCES)
++ return ret;
++ return 0;
++}
++
++static int
++nouveau_fbcon_release(struct fb_info *info, int user)
++{
++ struct nouveau_fbdev *fbcon = info->par;
++ struct nouveau_drm *drm = nouveau_drm(fbcon->dev);
++ pm_runtime_put(drm->dev->dev);
++ return 0;
++}
++
+ static struct fb_ops nouveau_fbcon_ops = {
+ .owner = THIS_MODULE,
++ .fb_open = nouveau_fbcon_open,
++ .fb_release = nouveau_fbcon_release,
+ .fb_check_var = drm_fb_helper_check_var,
+ .fb_set_par = drm_fb_helper_set_par,
+ .fb_fillrect = nouveau_fbcon_fillrect,
+@@ -195,6 +217,8 @@ static struct fb_ops nouveau_fbcon_ops = {
+
+ static struct fb_ops nouveau_fbcon_sw_ops = {
+ .owner = THIS_MODULE,
++ .fb_open = nouveau_fbcon_open,
++ .fb_release = nouveau_fbcon_release,
+ .fb_check_var = drm_fb_helper_check_var,
+ .fb_set_par = drm_fb_helper_set_par,
+ .fb_fillrect = cfb_fillrect,
+diff --git a/drivers/gpu/drm/qxl/qxl_fb.c b/drivers/gpu/drm/qxl/qxl_fb.c
+index 6b6e57e8c2d6..847a902e7385 100644
+--- a/drivers/gpu/drm/qxl/qxl_fb.c
++++ b/drivers/gpu/drm/qxl/qxl_fb.c
+@@ -144,14 +144,17 @@ static void qxl_dirty_update(struct qxl_fbdev *qfbdev,
+
+ spin_lock_irqsave(&qfbdev->dirty.lock, flags);
+
+- if (qfbdev->dirty.y1 < y)
+- y = qfbdev->dirty.y1;
+- if (qfbdev->dirty.y2 > y2)
+- y2 = qfbdev->dirty.y2;
+- if (qfbdev->dirty.x1 < x)
+- x = qfbdev->dirty.x1;
+- if (qfbdev->dirty.x2 > x2)
+- x2 = qfbdev->dirty.x2;
++ if ((qfbdev->dirty.y2 - qfbdev->dirty.y1) &&
++ (qfbdev->dirty.x2 - qfbdev->dirty.x1)) {
++ if (qfbdev->dirty.y1 < y)
++ y = qfbdev->dirty.y1;
++ if (qfbdev->dirty.y2 > y2)
++ y2 = qfbdev->dirty.y2;
++ if (qfbdev->dirty.x1 < x)
++ x = qfbdev->dirty.x1;
++ if (qfbdev->dirty.x2 > x2)
++ x2 = qfbdev->dirty.x2;
++ }
+
+ qfbdev->dirty.x1 = x;
+ qfbdev->dirty.x2 = x2;
+diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
+index d2e9e9efc159..6743174acdbc 100644
+--- a/drivers/gpu/drm/radeon/radeon_display.c
++++ b/drivers/gpu/drm/radeon/radeon_display.c
+@@ -1633,18 +1633,8 @@ int radeon_modeset_init(struct radeon_device *rdev)
+ radeon_fbdev_init(rdev);
+ drm_kms_helper_poll_init(rdev->ddev);
+
+- if (rdev->pm.dpm_enabled) {
+- /* do dpm late init */
+- ret = radeon_pm_late_init(rdev);
+- if (ret) {
+- rdev->pm.dpm_enabled = false;
+- DRM_ERROR("radeon_pm_late_init failed, disabling dpm\n");
+- }
+- /* set the dpm state for PX since there won't be
+- * a modeset to call this.
+- */
+- radeon_pm_compute_clocks(rdev);
+- }
++ /* do pm late init */
++ ret = radeon_pm_late_init(rdev);
+
+ return 0;
+ }
+diff --git a/drivers/gpu/drm/radeon/radeon_dp_mst.c b/drivers/gpu/drm/radeon/radeon_dp_mst.c
+index 257b10be5cda..42986130cc63 100644
+--- a/drivers/gpu/drm/radeon/radeon_dp_mst.c
++++ b/drivers/gpu/drm/radeon/radeon_dp_mst.c
+@@ -283,6 +283,7 @@ static struct drm_connector *radeon_dp_add_mst_connector(struct drm_dp_mst_topol
+ radeon_connector->mst_encoder = radeon_dp_create_fake_mst_encoder(master);
+
+ drm_object_attach_property(&connector->base, dev->mode_config.path_property, 0);
++ drm_object_attach_property(&connector->base, dev->mode_config.tile_property, 0);
+ drm_mode_connector_set_path_property(connector, pathprop);
+ drm_reinit_primary_mode_group(dev);
+
+diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
+index c1ba83a8dd8c..948c33105801 100644
+--- a/drivers/gpu/drm/radeon/radeon_pm.c
++++ b/drivers/gpu/drm/radeon/radeon_pm.c
+@@ -1331,14 +1331,6 @@ static int radeon_pm_init_old(struct radeon_device *rdev)
+ INIT_DELAYED_WORK(&rdev->pm.dynpm_idle_work, radeon_dynpm_idle_work_handler);
+
+ if (rdev->pm.num_power_states > 1) {
+- /* where's the best place to put these? */
+- ret = device_create_file(rdev->dev, &dev_attr_power_profile);
+- if (ret)
+- DRM_ERROR("failed to create device file for power profile\n");
+- ret = device_create_file(rdev->dev, &dev_attr_power_method);
+- if (ret)
+- DRM_ERROR("failed to create device file for power method\n");
+-
+ if (radeon_debugfs_pm_init(rdev)) {
+ DRM_ERROR("Failed to register debugfs file for PM!\n");
+ }
+@@ -1396,20 +1388,6 @@ static int radeon_pm_init_dpm(struct radeon_device *rdev)
+ goto dpm_failed;
+ rdev->pm.dpm_enabled = true;
+
+- ret = device_create_file(rdev->dev, &dev_attr_power_dpm_state);
+- if (ret)
+- DRM_ERROR("failed to create device file for dpm state\n");
+- ret = device_create_file(rdev->dev, &dev_attr_power_dpm_force_performance_level);
+- if (ret)
+- DRM_ERROR("failed to create device file for dpm state\n");
+- /* XXX: these are noops for dpm but are here for backwards compat */
+- ret = device_create_file(rdev->dev, &dev_attr_power_profile);
+- if (ret)
+- DRM_ERROR("failed to create device file for power profile\n");
+- ret = device_create_file(rdev->dev, &dev_attr_power_method);
+- if (ret)
+- DRM_ERROR("failed to create device file for power method\n");
+-
+ if (radeon_debugfs_pm_init(rdev)) {
+ DRM_ERROR("Failed to register debugfs file for dpm!\n");
+ }
+@@ -1550,9 +1528,44 @@ int radeon_pm_late_init(struct radeon_device *rdev)
+ int ret = 0;
+
+ if (rdev->pm.pm_method == PM_METHOD_DPM) {
+- mutex_lock(&rdev->pm.mutex);
+- ret = radeon_dpm_late_enable(rdev);
+- mutex_unlock(&rdev->pm.mutex);
++ if (rdev->pm.dpm_enabled) {
++ ret = device_create_file(rdev->dev, &dev_attr_power_dpm_state);
++ if (ret)
++ DRM_ERROR("failed to create device file for dpm state\n");
++ ret = device_create_file(rdev->dev, &dev_attr_power_dpm_force_performance_level);
++ if (ret)
++ DRM_ERROR("failed to create device file for dpm state\n");
++ /* XXX: these are noops for dpm but are here for backwards compat */
++ ret = device_create_file(rdev->dev, &dev_attr_power_profile);
++ if (ret)
++ DRM_ERROR("failed to create device file for power profile\n");
++ ret = device_create_file(rdev->dev, &dev_attr_power_method);
++ if (ret)
++ DRM_ERROR("failed to create device file for power method\n");
++
++ mutex_lock(&rdev->pm.mutex);
++ ret = radeon_dpm_late_enable(rdev);
++ mutex_unlock(&rdev->pm.mutex);
++ if (ret) {
++ rdev->pm.dpm_enabled = false;
++ DRM_ERROR("radeon_pm_late_init failed, disabling dpm\n");
++ } else {
++ /* set the dpm state for PX since there won't be
++ * a modeset to call this.
++ */
++ radeon_pm_compute_clocks(rdev);
++ }
++ }
++ } else {
++ if (rdev->pm.num_power_states > 1) {
++ /* where's the best place to put these? */
++ ret = device_create_file(rdev->dev, &dev_attr_power_profile);
++ if (ret)
++ DRM_ERROR("failed to create device file for power profile\n");
++ ret = device_create_file(rdev->dev, &dev_attr_power_method);
++ if (ret)
++ DRM_ERROR("failed to create device file for power method\n");
++ }
+ }
+ return ret;
+ }
+diff --git a/drivers/i2c/busses/i2c-designware-platdrv.c b/drivers/i2c/busses/i2c-designware-platdrv.c
+index 3dd2de31a2f8..472b88285c75 100644
+--- a/drivers/i2c/busses/i2c-designware-platdrv.c
++++ b/drivers/i2c/busses/i2c-designware-platdrv.c
+@@ -24,6 +24,7 @@
+ #include <linux/kernel.h>
+ #include <linux/module.h>
+ #include <linux/delay.h>
++#include <linux/dmi.h>
+ #include <linux/i2c.h>
+ #include <linux/clk.h>
+ #include <linux/clk-provider.h>
+@@ -51,6 +52,22 @@ static u32 i2c_dw_get_clk_rate_khz(struct dw_i2c_dev *dev)
+ }
+
+ #ifdef CONFIG_ACPI
++/*
++ * The HCNT/LCNT information coming from ACPI should be the most accurate
++ * for given platform. However, some systems get it wrong. On such systems
++ * we get better results by calculating those based on the input clock.
++ */
++static const struct dmi_system_id dw_i2c_no_acpi_params[] = {
++ {
++ .ident = "Dell Inspiron 7348",
++ .matches = {
++ DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
++ DMI_MATCH(DMI_PRODUCT_NAME, "Inspiron 7348"),
++ },
++ },
++ { }
++};
++
+ static void dw_i2c_acpi_params(struct platform_device *pdev, char method[],
+ u16 *hcnt, u16 *lcnt, u32 *sda_hold)
+ {
+@@ -58,6 +75,9 @@ static void dw_i2c_acpi_params(struct platform_device *pdev, char method[],
+ acpi_handle handle = ACPI_HANDLE(&pdev->dev);
+ union acpi_object *obj;
+
++ if (dmi_check_system(dw_i2c_no_acpi_params))
++ return;
++
+ if (ACPI_FAILURE(acpi_evaluate_object(handle, method, NULL, &buf)))
+ return;
+
+@@ -253,12 +273,6 @@ static int dw_i2c_probe(struct platform_device *pdev)
+ adap->dev.parent = &pdev->dev;
+ adap->dev.of_node = pdev->dev.of_node;
+
+- r = i2c_add_numbered_adapter(adap);
+- if (r) {
+- dev_err(&pdev->dev, "failure adding adapter\n");
+- return r;
+- }
+-
+ if (dev->pm_runtime_disabled) {
+ pm_runtime_forbid(&pdev->dev);
+ } else {
+@@ -268,6 +282,13 @@ static int dw_i2c_probe(struct platform_device *pdev)
+ pm_runtime_enable(&pdev->dev);
+ }
+
++ r = i2c_add_numbered_adapter(adap);
++ if (r) {
++ dev_err(&pdev->dev, "failure adding adapter\n");
++ pm_runtime_disable(&pdev->dev);
++ return r;
++ }
++
+ return 0;
+ }
+
+diff --git a/drivers/i2c/busses/i2c-rcar.c b/drivers/i2c/busses/i2c-rcar.c
+index d8361dada584..d8b5a8fee1e6 100644
+--- a/drivers/i2c/busses/i2c-rcar.c
++++ b/drivers/i2c/busses/i2c-rcar.c
+@@ -690,15 +690,16 @@ static int rcar_i2c_probe(struct platform_device *pdev)
+ return ret;
+ }
+
++ pm_runtime_enable(dev);
++ platform_set_drvdata(pdev, priv);
++
+ ret = i2c_add_numbered_adapter(adap);
+ if (ret < 0) {
+ dev_err(dev, "reg adap failed: %d\n", ret);
++ pm_runtime_disable(dev);
+ return ret;
+ }
+
+- pm_runtime_enable(dev);
+- platform_set_drvdata(pdev, priv);
+-
+ dev_info(dev, "probed\n");
+
+ return 0;
+diff --git a/drivers/i2c/busses/i2c-s3c2410.c b/drivers/i2c/busses/i2c-s3c2410.c
+index 50bfd8cef5f2..5df819610d52 100644
+--- a/drivers/i2c/busses/i2c-s3c2410.c
++++ b/drivers/i2c/busses/i2c-s3c2410.c
+@@ -1243,17 +1243,19 @@ static int s3c24xx_i2c_probe(struct platform_device *pdev)
+ i2c->adap.nr = i2c->pdata->bus_num;
+ i2c->adap.dev.of_node = pdev->dev.of_node;
+
++ platform_set_drvdata(pdev, i2c);
++
++ pm_runtime_enable(&pdev->dev);
++
+ ret = i2c_add_numbered_adapter(&i2c->adap);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "failed to add bus to i2c core\n");
++ pm_runtime_disable(&pdev->dev);
+ s3c24xx_i2c_deregister_cpufreq(i2c);
+ clk_unprepare(i2c->clk);
+ return ret;
+ }
+
+- platform_set_drvdata(pdev, i2c);
+-
+- pm_runtime_enable(&pdev->dev);
+ pm_runtime_enable(&i2c->adap.dev);
+
+ dev_info(&pdev->dev, "%s: S3C I2C adapter\n", dev_name(&i2c->adap.dev));
+diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
+index 75aef240c2d1..493c38e08bd2 100644
+--- a/drivers/md/dm-thin.c
++++ b/drivers/md/dm-thin.c
+@@ -3255,7 +3255,7 @@ static int pool_ctr(struct dm_target *ti, unsigned argc, char **argv)
+ metadata_low_callback,
+ pool);
+ if (r)
+- goto out_free_pt;
++ goto out_flags_changed;
+
+ pt->callbacks.congested_fn = pool_is_congested;
+ dm_table_add_target_callbacks(ti->table, &pt->callbacks);
+diff --git a/drivers/mfd/max77843.c b/drivers/mfd/max77843.c
+index a354ac677ec7..1074a0d68680 100644
+--- a/drivers/mfd/max77843.c
++++ b/drivers/mfd/max77843.c
+@@ -79,7 +79,7 @@ static int max77843_chg_init(struct max77843 *max77843)
+ if (!max77843->i2c_chg) {
+ dev_err(&max77843->i2c->dev,
+ "Cannot allocate I2C device for Charger\n");
+- return PTR_ERR(max77843->i2c_chg);
++ return -ENODEV;
+ }
+ i2c_set_clientdata(max77843->i2c_chg, max77843);
+
+diff --git a/drivers/net/ethernet/ibm/emac/core.h b/drivers/net/ethernet/ibm/emac/core.h
+index 28df37420da9..ac02c675c59c 100644
+--- a/drivers/net/ethernet/ibm/emac/core.h
++++ b/drivers/net/ethernet/ibm/emac/core.h
+@@ -460,8 +460,8 @@ struct emac_ethtool_regs_subhdr {
+ u32 index;
+ };
+
+-#define EMAC_ETHTOOL_REGS_VER 0
+-#define EMAC4_ETHTOOL_REGS_VER 1
+-#define EMAC4SYNC_ETHTOOL_REGS_VER 2
++#define EMAC_ETHTOOL_REGS_VER 3
++#define EMAC4_ETHTOOL_REGS_VER 4
++#define EMAC4SYNC_ETHTOOL_REGS_VER 5
+
+ #endif /* __IBM_NEWEMAC_CORE_H */
+diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
+index 3837ae344f63..2ed75060da50 100644
+--- a/drivers/net/ppp/pppoe.c
++++ b/drivers/net/ppp/pppoe.c
+@@ -313,7 +313,6 @@ static void pppoe_flush_dev(struct net_device *dev)
+ if (po->pppoe_dev == dev &&
+ sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
+ pppox_unbind_sock(sk);
+- sk->sk_state = PPPOX_ZOMBIE;
+ sk->sk_state_change(sk);
+ po->pppoe_dev = NULL;
+ dev_put(dev);
+diff --git a/drivers/pinctrl/freescale/pinctrl-imx25.c b/drivers/pinctrl/freescale/pinctrl-imx25.c
+index faf635654312..293ed4381cc0 100644
+--- a/drivers/pinctrl/freescale/pinctrl-imx25.c
++++ b/drivers/pinctrl/freescale/pinctrl-imx25.c
+@@ -26,7 +26,8 @@
+ #include "pinctrl-imx.h"
+
+ enum imx25_pads {
+- MX25_PAD_RESERVE0 = 1,
++ MX25_PAD_RESERVE0 = 0,
++ MX25_PAD_RESERVE1 = 1,
+ MX25_PAD_A10 = 2,
+ MX25_PAD_A13 = 3,
+ MX25_PAD_A14 = 4,
+@@ -169,6 +170,7 @@ enum imx25_pads {
+ /* Pad names for the pinmux subsystem */
+ static const struct pinctrl_pin_desc imx25_pinctrl_pads[] = {
+ IMX_PINCTRL_PIN(MX25_PAD_RESERVE0),
++ IMX_PINCTRL_PIN(MX25_PAD_RESERVE1),
+ IMX_PINCTRL_PIN(MX25_PAD_A10),
+ IMX_PINCTRL_PIN(MX25_PAD_A13),
+ IMX_PINCTRL_PIN(MX25_PAD_A14),
+diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
+index 802fabb30e15..34cbe3505dac 100644
+--- a/fs/btrfs/backref.c
++++ b/fs/btrfs/backref.c
+@@ -1809,7 +1809,6 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
+ int found = 0;
+ struct extent_buffer *eb;
+ struct btrfs_inode_extref *extref;
+- struct extent_buffer *leaf;
+ u32 item_size;
+ u32 cur_offset;
+ unsigned long ptr;
+@@ -1837,9 +1836,8 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
+ btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
+ btrfs_release_path(path);
+
+- leaf = path->nodes[0];
+- item_size = btrfs_item_size_nr(leaf, slot);
+- ptr = btrfs_item_ptr_offset(leaf, slot);
++ item_size = btrfs_item_size_nr(eb, slot);
++ ptr = btrfs_item_ptr_offset(eb, slot);
+ cur_offset = 0;
+
+ while (cur_offset < item_size) {
+@@ -1853,7 +1851,7 @@ static int iterate_inode_extrefs(u64 inum, struct btrfs_root *fs_root,
+ if (ret)
+ break;
+
+- cur_offset += btrfs_inode_extref_name_len(leaf, extref);
++ cur_offset += btrfs_inode_extref_name_len(eb, extref);
+ cur_offset += sizeof(*extref);
+ }
+ btrfs_tree_read_unlock_blocking(eb);
+diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
+index 0770c91586ca..f490b6155091 100644
+--- a/fs/btrfs/ioctl.c
++++ b/fs/btrfs/ioctl.c
+@@ -4647,6 +4647,11 @@ locked:
+ bctl->flags |= BTRFS_BALANCE_TYPE_MASK;
+ }
+
++ if (bctl->flags & ~(BTRFS_BALANCE_ARGS_MASK | BTRFS_BALANCE_TYPE_MASK)) {
++ ret = -EINVAL;
++ goto out_bargs;
++ }
++
+ do_balance:
+ /*
+ * Ownership of bctl and mutually_exclusive_operation_running
+diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
+index 95842a909e7f..2ac5f8cd701a 100644
+--- a/fs/btrfs/volumes.h
++++ b/fs/btrfs/volumes.h
+@@ -376,6 +376,14 @@ struct map_lookup {
+ #define BTRFS_BALANCE_ARGS_VRANGE (1ULL << 4)
+ #define BTRFS_BALANCE_ARGS_LIMIT (1ULL << 5)
+
++#define BTRFS_BALANCE_ARGS_MASK \
++ (BTRFS_BALANCE_ARGS_PROFILES | \
++ BTRFS_BALANCE_ARGS_USAGE | \
++ BTRFS_BALANCE_ARGS_DEVID | \
++ BTRFS_BALANCE_ARGS_DRANGE | \
++ BTRFS_BALANCE_ARGS_VRANGE | \
++ BTRFS_BALANCE_ARGS_LIMIT)
++
+ /*
+ * Profile changing flags. When SOFT is set we won't relocate chunk if
+ * it already has the target profile (even though it may be
+diff --git a/fs/nfsd/blocklayout.c b/fs/nfsd/blocklayout.c
+index cdefaa331a07..c29d9421bd5e 100644
+--- a/fs/nfsd/blocklayout.c
++++ b/fs/nfsd/blocklayout.c
+@@ -56,14 +56,6 @@ nfsd4_block_proc_layoutget(struct inode *inode, const struct svc_fh *fhp,
+ u32 device_generation = 0;
+ int error;
+
+- /*
+- * We do not attempt to support I/O smaller than the fs block size,
+- * or not aligned to it.
+- */
+- if (args->lg_minlength < block_size) {
+- dprintk("pnfsd: I/O too small\n");
+- goto out_layoutunavailable;
+- }
+ if (seg->offset & (block_size - 1)) {
+ dprintk("pnfsd: I/O misaligned\n");
+ goto out_layoutunavailable;
+diff --git a/include/drm/drm_dp_mst_helper.h b/include/drm/drm_dp_mst_helper.h
+index 86d0b25ed054..a89f505c856b 100644
+--- a/include/drm/drm_dp_mst_helper.h
++++ b/include/drm/drm_dp_mst_helper.h
+@@ -253,6 +253,7 @@ struct drm_dp_remote_dpcd_write {
+ u8 *bytes;
+ };
+
++#define DP_REMOTE_I2C_READ_MAX_TRANSACTIONS 4
+ struct drm_dp_remote_i2c_read {
+ u8 num_transactions;
+ u8 port_number;
+@@ -262,7 +263,7 @@ struct drm_dp_remote_i2c_read {
+ u8 *bytes;
+ u8 no_stop_bit;
+ u8 i2c_transaction_delay;
+- } transactions[4];
++ } transactions[DP_REMOTE_I2C_READ_MAX_TRANSACTIONS];
+ u8 read_i2c_device_id;
+ u8 num_bytes_read;
+ };
+diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
+index 9b88536487e6..275158803824 100644
+--- a/include/linux/skbuff.h
++++ b/include/linux/skbuff.h
+@@ -2601,6 +2601,9 @@ static inline void skb_postpull_rcsum(struct sk_buff *skb,
+ {
+ if (skb->ip_summed == CHECKSUM_COMPLETE)
+ skb->csum = csum_sub(skb->csum, csum_partial(start, len, 0));
++ else if (skb->ip_summed == CHECKSUM_PARTIAL &&
++ skb_checksum_start_offset(skb) < 0)
++ skb->ip_summed = CHECKSUM_NONE;
+ }
+
+ unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len);
+diff --git a/include/net/af_unix.h b/include/net/af_unix.h
+index 4a167b30a12f..cb1b9bbda332 100644
+--- a/include/net/af_unix.h
++++ b/include/net/af_unix.h
+@@ -63,7 +63,11 @@ struct unix_sock {
+ #define UNIX_GC_MAYBE_CYCLE 1
+ struct socket_wq peer_wq;
+ };
+-#define unix_sk(__sk) ((struct unix_sock *)__sk)
++
++static inline struct unix_sock *unix_sk(struct sock *sk)
++{
++ return (struct unix_sock *)sk;
++}
+
+ #define peer_wait peer_wq.wait
+
+diff --git a/include/net/sock.h b/include/net/sock.h
+index f21f0708ec59..4ca4c3fe446f 100644
+--- a/include/net/sock.h
++++ b/include/net/sock.h
+@@ -826,6 +826,14 @@ static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *s
+ if (sk_rcvqueues_full(sk, limit))
+ return -ENOBUFS;
+
++ /*
++ * If the skb was allocated from pfmemalloc reserves, only
++ * allow SOCK_MEMALLOC sockets to use it as this socket is
++ * helping free memory
++ */
++ if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
++ return -ENOMEM;
++
+ __sk_add_backlog(sk, skb);
+ sk->sk_backlog.len += skb->truesize;
+ return 0;
+diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
+index a20d4110e871..3688f1e07ebd 100644
+--- a/kernel/time/timekeeping.c
++++ b/kernel/time/timekeeping.c
+@@ -1244,7 +1244,7 @@ void __init timekeeping_init(void)
+ set_normalized_timespec64(&tmp, -boot.tv_sec, -boot.tv_nsec);
+ tk_set_wall_to_mono(tk, tmp);
+
+- timekeeping_update(tk, TK_MIRROR);
++ timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
+
+ write_seqcount_end(&tk_core.seq);
+ raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+diff --git a/kernel/workqueue.c b/kernel/workqueue.c
+index a413acb59a07..1de0f5fabb98 100644
+--- a/kernel/workqueue.c
++++ b/kernel/workqueue.c
+@@ -1458,13 +1458,13 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
+ timer_stats_timer_set_start_info(&dwork->timer);
+
+ dwork->wq = wq;
++ /* timer isn't guaranteed to run in this cpu, record earlier */
++ if (cpu == WORK_CPU_UNBOUND)
++ cpu = raw_smp_processor_id();
+ dwork->cpu = cpu;
+ timer->expires = jiffies + delay;
+
+- if (unlikely(cpu != WORK_CPU_UNBOUND))
+- add_timer_on(timer, cpu);
+- else
+- add_timer(timer);
++ add_timer_on(timer, cpu);
+ }
+
+ /**
+diff --git a/mm/memcontrol.c b/mm/memcontrol.c
+index 237d4686482d..03a6f7506cf3 100644
+--- a/mm/memcontrol.c
++++ b/mm/memcontrol.c
+@@ -3687,6 +3687,7 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
+ ret = page_counter_memparse(args, "-1", &threshold);
+ if (ret)
+ return ret;
++ threshold <<= PAGE_SHIFT;
+
+ mutex_lock(&memcg->thresholds_lock);
+
+diff --git a/net/core/ethtool.c b/net/core/ethtool.c
+index b495ab1797fa..29edf74846fc 100644
+--- a/net/core/ethtool.c
++++ b/net/core/ethtool.c
+@@ -1284,7 +1284,7 @@ static int ethtool_get_strings(struct net_device *dev, void __user *useraddr)
+
+ gstrings.len = ret;
+
+- data = kmalloc(gstrings.len * ETH_GSTRING_LEN, GFP_USER);
++ data = kcalloc(gstrings.len, ETH_GSTRING_LEN, GFP_USER);
+ if (!data)
+ return -ENOMEM;
+
+diff --git a/net/core/filter.c b/net/core/filter.c
+index be3098fb65e4..8dcdd86b68dd 100644
+--- a/net/core/filter.c
++++ b/net/core/filter.c
+@@ -1412,6 +1412,7 @@ static u64 bpf_clone_redirect(u64 r1, u64 ifindex, u64 flags, u64 r4, u64 r5)
+ return dev_forward_skb(dev, skb2);
+
+ skb2->dev = dev;
++ skb_sender_cpu_clear(skb2);
+ return dev_queue_xmit(skb2);
+ }
+
+@@ -1701,9 +1702,13 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
+ goto out;
+
+ /* We're copying the filter that has been originally attached,
+- * so no conversion/decode needed anymore.
++ * so no conversion/decode needed anymore. eBPF programs that
++ * have no original program cannot be dumped through this.
+ */
++ ret = -EACCES;
+ fprog = filter->prog->orig_prog;
++ if (!fprog)
++ goto out;
+
+ ret = fprog->len;
+ if (!len)
+diff --git a/net/core/skbuff.c b/net/core/skbuff.c
+index 7b84330e5d30..7bfa18746681 100644
+--- a/net/core/skbuff.c
++++ b/net/core/skbuff.c
+@@ -2958,11 +2958,12 @@ EXPORT_SYMBOL_GPL(skb_append_pagefrags);
+ */
+ unsigned char *skb_pull_rcsum(struct sk_buff *skb, unsigned int len)
+ {
++ unsigned char *data = skb->data;
++
+ BUG_ON(len > skb->len);
+- skb->len -= len;
+- BUG_ON(skb->len < skb->data_len);
+- skb_postpull_rcsum(skb, skb->data, len);
+- return skb->data += len;
++ __skb_pull(skb, len);
++ skb_postpull_rcsum(skb, data, len);
++ return skb->data;
+ }
+ EXPORT_SYMBOL_GPL(skb_pull_rcsum);
+
+diff --git a/net/dsa/slave.c b/net/dsa/slave.c
+index 35c47ddd04f0..25dbb91e1bc0 100644
+--- a/net/dsa/slave.c
++++ b/net/dsa/slave.c
+@@ -348,12 +348,17 @@ static int dsa_slave_stp_update(struct net_device *dev, u8 state)
+ static int dsa_slave_port_attr_set(struct net_device *dev,
+ struct switchdev_attr *attr)
+ {
+- int ret = 0;
++ struct dsa_slave_priv *p = netdev_priv(dev);
++ struct dsa_switch *ds = p->parent;
++ int ret;
+
+ switch (attr->id) {
+ case SWITCHDEV_ATTR_PORT_STP_STATE:
+- if (attr->trans == SWITCHDEV_TRANS_COMMIT)
+- ret = dsa_slave_stp_update(dev, attr->u.stp_state);
++ if (attr->trans == SWITCHDEV_TRANS_PREPARE)
++ ret = ds->drv->port_stp_update ? 0 : -EOPNOTSUPP;
++ else
++ ret = ds->drv->port_stp_update(ds, p->port,
++ attr->u.stp_state);
+ break;
+ default:
+ ret = -EOPNOTSUPP;
+diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
+index 134957159c27..61b45a17fc73 100644
+--- a/net/ipv4/inet_connection_sock.c
++++ b/net/ipv4/inet_connection_sock.c
+@@ -577,21 +577,22 @@ EXPORT_SYMBOL(inet_rtx_syn_ack);
+ static bool reqsk_queue_unlink(struct request_sock_queue *queue,
+ struct request_sock *req)
+ {
+- struct listen_sock *lopt = queue->listen_opt;
+ struct request_sock **prev;
++ struct listen_sock *lopt;
+ bool found = false;
+
+ spin_lock(&queue->syn_wait_lock);
+-
+- for (prev = &lopt->syn_table[req->rsk_hash]; *prev != NULL;
+- prev = &(*prev)->dl_next) {
+- if (*prev == req) {
+- *prev = req->dl_next;
+- found = true;
+- break;
++ lopt = queue->listen_opt;
++ if (lopt) {
++ for (prev = &lopt->syn_table[req->rsk_hash]; *prev != NULL;
++ prev = &(*prev)->dl_next) {
++ if (*prev == req) {
++ *prev = req->dl_next;
++ found = true;
++ break;
++ }
+ }
+ }
+-
+ spin_unlock(&queue->syn_wait_lock);
+ if (timer_pending(&req->rsk_timer) && del_timer_sync(&req->rsk_timer))
+ reqsk_put(req);
+@@ -685,20 +686,20 @@ void reqsk_queue_hash_req(struct request_sock_queue *queue,
+ req->num_timeout = 0;
+ req->sk = NULL;
+
++ setup_timer(&req->rsk_timer, reqsk_timer_handler, (unsigned long)req);
++ mod_timer_pinned(&req->rsk_timer, jiffies + timeout);
++ req->rsk_hash = hash;
++
+ /* before letting lookups find us, make sure all req fields
+ * are committed to memory and refcnt initialized.
+ */
+ smp_wmb();
+ atomic_set(&req->rsk_refcnt, 2);
+- setup_timer(&req->rsk_timer, reqsk_timer_handler, (unsigned long)req);
+- req->rsk_hash = hash;
+
+ spin_lock(&queue->syn_wait_lock);
+ req->dl_next = lopt->syn_table[hash];
+ lopt->syn_table[hash] = req;
+ spin_unlock(&queue->syn_wait_lock);
+-
+- mod_timer_pinned(&req->rsk_timer, jiffies + timeout);
+ }
+ EXPORT_SYMBOL(reqsk_queue_hash_req);
+
+diff --git a/net/ipv6/route.c b/net/ipv6/route.c
+index 00b64d402a57..dd6ebba5846c 100644
+--- a/net/ipv6/route.c
++++ b/net/ipv6/route.c
+@@ -139,6 +139,9 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
+ struct net_device *loopback_dev = net->loopback_dev;
+ int cpu;
+
++ if (dev == loopback_dev)
++ return;
++
+ for_each_possible_cpu(cpu) {
+ struct uncached_list *ul = per_cpu_ptr(&rt6_uncached_list, cpu);
+ struct rt6_info *rt;
+@@ -148,14 +151,12 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
+ struct inet6_dev *rt_idev = rt->rt6i_idev;
+ struct net_device *rt_dev = rt->dst.dev;
+
+- if (rt_idev && (rt_idev->dev == dev || !dev) &&
+- rt_idev->dev != loopback_dev) {
++ if (rt_idev->dev == dev) {
+ rt->rt6i_idev = in6_dev_get(loopback_dev);
+ in6_dev_put(rt_idev);
+ }
+
+- if (rt_dev && (rt_dev == dev || !dev) &&
+- rt_dev != loopback_dev) {
++ if (rt_dev == dev) {
+ rt->dst.dev = loopback_dev;
+ dev_hold(rt->dst.dev);
+ dev_put(rt_dev);
+@@ -2577,7 +2578,8 @@ void rt6_ifdown(struct net *net, struct net_device *dev)
+
+ fib6_clean_all(net, fib6_ifdown, &adn);
+ icmp6_clean_all(fib6_ifdown, &adn);
+- rt6_uncached_list_flush_dev(net, dev);
++ if (dev)
++ rt6_uncached_list_flush_dev(net, dev);
+ }
+
+ struct rt6_mtu_change_arg {
+diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
+index f6b090df3930..afca2eb4dfa7 100644
+--- a/net/l2tp/l2tp_core.c
++++ b/net/l2tp/l2tp_core.c
+@@ -1319,7 +1319,7 @@ static void l2tp_tunnel_del_work(struct work_struct *work)
+ tunnel = container_of(work, struct l2tp_tunnel, del_work);
+ sk = l2tp_tunnel_sock_lookup(tunnel);
+ if (!sk)
+- return;
++ goto out;
+
+ sock = sk->sk_socket;
+
+@@ -1341,6 +1341,8 @@ static void l2tp_tunnel_del_work(struct work_struct *work)
+ }
+
+ l2tp_tunnel_sock_put(sk);
++out:
++ l2tp_tunnel_dec_refcount(tunnel);
+ }
+
+ /* Create a socket for the tunnel, if one isn't set up by
+@@ -1636,8 +1638,13 @@ EXPORT_SYMBOL_GPL(l2tp_tunnel_create);
+ */
+ int l2tp_tunnel_delete(struct l2tp_tunnel *tunnel)
+ {
++ l2tp_tunnel_inc_refcount(tunnel);
+ l2tp_tunnel_closeall(tunnel);
+- return (false == queue_work(l2tp_wq, &tunnel->del_work));
++ if (false == queue_work(l2tp_wq, &tunnel->del_work)) {
++ l2tp_tunnel_dec_refcount(tunnel);
++ return 1;
++ }
++ return 0;
+ }
+ EXPORT_SYMBOL_GPL(l2tp_tunnel_delete);
+
+diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
+index 0857f7243797..a133d16eb053 100644
+--- a/net/netlink/af_netlink.c
++++ b/net/netlink/af_netlink.c
+@@ -2750,6 +2750,7 @@ static int netlink_dump(struct sock *sk)
+ struct sk_buff *skb = NULL;
+ struct nlmsghdr *nlh;
+ int len, err = -ENOBUFS;
++ int alloc_min_size;
+ int alloc_size;
+
+ mutex_lock(nlk->cb_mutex);
+@@ -2758,9 +2759,6 @@ static int netlink_dump(struct sock *sk)
+ goto errout_skb;
+ }
+
+- cb = &nlk->cb;
+- alloc_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE);
+-
+ if (!netlink_rx_is_mmaped(sk) &&
+ atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
+ goto errout_skb;
+@@ -2770,23 +2768,35 @@ static int netlink_dump(struct sock *sk)
+ * to reduce number of system calls on dump operations, if user
+ * ever provided a big enough buffer.
+ */
+- if (alloc_size < nlk->max_recvmsg_len) {
+- skb = netlink_alloc_skb(sk,
+- nlk->max_recvmsg_len,
+- nlk->portid,
++ cb = &nlk->cb;
++ alloc_min_size = max_t(int, cb->min_dump_alloc, NLMSG_GOODSIZE);
++
++ if (alloc_min_size < nlk->max_recvmsg_len) {
++ alloc_size = nlk->max_recvmsg_len;
++ skb = netlink_alloc_skb(sk, alloc_size, nlk->portid,
+ GFP_KERNEL |
+ __GFP_NOWARN |
+ __GFP_NORETRY);
+- /* available room should be exact amount to avoid MSG_TRUNC */
+- if (skb)
+- skb_reserve(skb, skb_tailroom(skb) -
+- nlk->max_recvmsg_len);
+ }
+- if (!skb)
++ if (!skb) {
++ alloc_size = alloc_min_size;
+ skb = netlink_alloc_skb(sk, alloc_size, nlk->portid,
+ GFP_KERNEL);
++ }
+ if (!skb)
+ goto errout_skb;
++
++ /* Trim skb to allocated size. User is expected to provide buffer as
+ * large as max(min_dump_alloc, 16KiB (max_recvmsg_len capped at
++ * netlink_recvmsg())). dump will pack as many smaller messages as
++ * could fit within the allocated skb. skb is typically allocated
++ * with larger space than required (could be as much as near 2x the
++ * requested size with align to next power of 2 approach). Allowing
++ * dump to use the excess space makes it difficult for a user to have a
++ * reasonable static buffer based on the expected largest dump of a
++ * single netdev. The outcome is MSG_TRUNC error.
++ */
++ skb_reserve(skb, skb_tailroom(skb) - alloc_size);
+ netlink_skb_set_owner_r(skb, sk);
+
+ len = cb->dump(skb, cb);
+diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
+index b5c3bba87fc8..af08e6fc9860 100644
+--- a/net/openvswitch/flow_table.c
++++ b/net/openvswitch/flow_table.c
+@@ -92,7 +92,8 @@ struct sw_flow *ovs_flow_alloc(void)
+
+ /* Initialize the default stat node. */
+ stats = kmem_cache_alloc_node(flow_stats_cache,
+- GFP_KERNEL | __GFP_ZERO, 0);
++ GFP_KERNEL | __GFP_ZERO,
++ node_online(0) ? 0 : NUMA_NO_NODE);
+ if (!stats)
+ goto err;
+
+diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
+index 268545050ddb..b1768198ad59 100644
+--- a/net/sched/act_mirred.c
++++ b/net/sched/act_mirred.c
+@@ -168,6 +168,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
+
+ skb2->skb_iif = skb->dev->ifindex;
+ skb2->dev = dev;
++ skb_sender_cpu_clear(skb2);
+ err = dev_queue_xmit(skb2);
+
+ out:
+diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+index 2e1348bde325..96d886a866e9 100644
+--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
++++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+@@ -146,7 +146,8 @@ int rdma_read_chunk_lcl(struct svcxprt_rdma *xprt,
+ ctxt->read_hdr = head;
+ pages_needed =
+ min_t(int, pages_needed, rdma_read_max_sge(xprt, pages_needed));
+- read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
++ read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
++ rs_length);
+
+ for (pno = 0; pno < pages_needed; pno++) {
+ int len = min_t(int, rs_length, PAGE_SIZE - pg_off);
+@@ -245,7 +246,8 @@ int rdma_read_chunk_frmr(struct svcxprt_rdma *xprt,
+ ctxt->direction = DMA_FROM_DEVICE;
+ ctxt->frmr = frmr;
+ pages_needed = min_t(int, pages_needed, xprt->sc_frmr_pg_list_len);
+- read = min_t(int, pages_needed << PAGE_SHIFT, rs_length);
++ read = min_t(int, (pages_needed << PAGE_SHIFT) - *page_offset,
++ rs_length);
+
+ frmr->kva = page_address(rqstp->rq_arg.pages[pg_no]);
+ frmr->direction = DMA_FROM_DEVICE;
+diff --git a/net/tipc/msg.h b/net/tipc/msg.h
+index 19c45fb66238..49f9a9648aa9 100644
+--- a/net/tipc/msg.h
++++ b/net/tipc/msg.h
+@@ -357,7 +357,7 @@ static inline u32 msg_importance(struct tipc_msg *m)
+ if (likely((usr <= TIPC_CRITICAL_IMPORTANCE) && !msg_errcode(m)))
+ return usr;
+ if ((usr == MSG_FRAGMENTER) || (usr == MSG_BUNDLER))
+- return msg_bits(m, 5, 13, 0x7);
++ return msg_bits(m, 9, 0, 0x7);
+ return TIPC_SYSTEM_IMPORTANCE;
+ }
+
+@@ -366,7 +366,7 @@ static inline void msg_set_importance(struct tipc_msg *m, u32 i)
+ int usr = msg_user(m);
+
+ if (likely((usr == MSG_FRAGMENTER) || (usr == MSG_BUNDLER)))
+- msg_set_bits(m, 5, 13, 0x7, i);
++ msg_set_bits(m, 9, 0, 0x7, i);
+ else if (i < TIPC_SYSTEM_IMPORTANCE)
+ msg_set_user(m, i);
+ else
+diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
+index 03ee4d359f6a..94f658235fb4 100644
+--- a/net/unix/af_unix.c
++++ b/net/unix/af_unix.c
+@@ -2064,6 +2064,11 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state)
+ goto out;
+ }
+
++ if (flags & MSG_PEEK)
++ skip = sk_peek_offset(sk, flags);
++ else
++ skip = 0;
++
+ do {
+ int chunk;
+ struct sk_buff *skb, *last;
+@@ -2112,7 +2117,6 @@ unlock:
+ break;
+ }
+
+- skip = sk_peek_offset(sk, flags);
+ while (skip >= unix_skb_len(skb)) {
+ skip -= unix_skb_len(skb);
+ last = skb;
+@@ -2181,6 +2185,17 @@ unlock:
+
+ sk_peek_offset_fwd(sk, chunk);
+
++ if (UNIXCB(skb).fp)
++ break;
++
++ skip = 0;
++ last = skb;
++ last_len = skb->len;
++ unix_state_lock(sk);
++ skb = skb_peek_next(skb, &sk->sk_receive_queue);
++ if (skb)
++ goto again;
++ unix_state_unlock(sk);
+ break;
+ }
+ } while (size);
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-11-05 23:30 Mike Pagano
From: Mike Pagano @ 2015-11-05 23:30 UTC
To: gentoo-commits
commit: 3a0e597bb6b80d0db9567050a1fb2c397c1e3594
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Thu Nov 5 23:30:34 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Thu Nov 5 23:30:34 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=3a0e597b
Removing kdbus as per upstream developers. See http://lwn.net/Articles/663062/
0000_README | 4 -
5015_kdbus-8-12-2015.patch | 34349 -------------------------------------------
2 files changed, 34353 deletions(-)
diff --git a/0000_README b/0000_README
index d40ecf2..cf9d964 100644
--- a/0000_README
+++ b/0000_README
@@ -110,7 +110,3 @@ Desc: BFQ v7r8 patch 3 for 4.2: Early Queue Merge (EQM)
Patch: 5010_enable-additional-cpu-optimizations-for-gcc-4.9.patch
From: https://github.com/graysky2/kernel_gcc_patch/
Desc: Kernel patch enables gcc >= v4.9 optimizations for additional CPUs.
-
-Patch: 5015_kdbus-8-12-2015.patch
-From: https://lkml.org
-Desc: Kernel-level IPC implementation
diff --git a/5015_kdbus-8-12-2015.patch b/5015_kdbus-8-12-2015.patch
deleted file mode 100644
index 4e018f2..0000000
--- a/5015_kdbus-8-12-2015.patch
+++ /dev/null
@@ -1,34349 +0,0 @@
-diff --git a/Documentation/Makefile b/Documentation/Makefile
-index bc05482..e2127a7 100644
---- a/Documentation/Makefile
-+++ b/Documentation/Makefile
-@@ -1,4 +1,4 @@
- subdir-y := accounting auxdisplay blackfin connector \
-- filesystems filesystems ia64 laptops mic misc-devices \
-+ filesystems filesystems ia64 kdbus laptops mic misc-devices \
- networking pcmcia prctl ptp spi timers vDSO video4linux \
- watchdog
-diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
-index 51f4221..ec7c81b 100644
---- a/Documentation/ioctl/ioctl-number.txt
-+++ b/Documentation/ioctl/ioctl-number.txt
-@@ -292,6 +292,7 @@ Code Seq#(hex) Include File Comments
- 0x92 00-0F drivers/usb/mon/mon_bin.c
- 0x93 60-7F linux/auto_fs.h
- 0x94 all fs/btrfs/ioctl.h
-+0x95 all uapi/linux/kdbus.h kdbus IPC driver
- 0x97 00-7F fs/ceph/ioctl.h Ceph file system
- 0x99 00-0F 537-Addinboard driver
- <mailto:buk@buks.ipn.de>
-diff --git a/Documentation/kdbus/.gitignore b/Documentation/kdbus/.gitignore
-new file mode 100644
-index 0000000..b4a77cc
---- /dev/null
-+++ b/Documentation/kdbus/.gitignore
-@@ -0,0 +1,2 @@
-+*.7
-+*.html
-diff --git a/Documentation/kdbus/Makefile b/Documentation/kdbus/Makefile
-new file mode 100644
-index 0000000..8caffe5
---- /dev/null
-+++ b/Documentation/kdbus/Makefile
-@@ -0,0 +1,44 @@
-+DOCS := \
-+ kdbus.xml \
-+ kdbus.bus.xml \
-+ kdbus.connection.xml \
-+ kdbus.endpoint.xml \
-+ kdbus.fs.xml \
-+ kdbus.item.xml \
-+ kdbus.match.xml \
-+ kdbus.message.xml \
-+ kdbus.name.xml \
-+ kdbus.policy.xml \
-+ kdbus.pool.xml
-+
-+XMLFILES := $(addprefix $(obj)/,$(DOCS))
-+MANFILES := $(patsubst %.xml, %.7, $(XMLFILES))
-+HTMLFILES := $(patsubst %.xml, %.html, $(XMLFILES))
-+
-+XMLTO_ARGS := -m $(srctree)/$(src)/stylesheet.xsl --skip-validation
-+
-+quiet_cmd_db2man = MAN $@
-+ cmd_db2man = xmlto man $(XMLTO_ARGS) -o $(obj) $<
-+%.7: %.xml
-+ @(which xmlto > /dev/null 2>&1) || \
-+ (echo "*** You need to install xmlto ***"; \
-+ exit 1)
-+ $(call cmd,db2man)
-+
-+quiet_cmd_db2html = HTML $@
-+ cmd_db2html = xmlto html-nochunks $(XMLTO_ARGS) -o $(obj) $<
-+%.html: %.xml
-+ @(which xmlto > /dev/null 2>&1) || \
-+ (echo "*** You need to install xmlto ***"; \
-+ exit 1)
-+ $(call cmd,db2html)
-+
-+mandocs: $(MANFILES)
-+
-+htmldocs: $(HTMLFILES)
-+
-+clean-files := $(MANFILES) $(HTMLFILES)
-+
-+# we don't support other %docs targets right now
-+%docs:
-+ @true
-diff --git a/Documentation/kdbus/kdbus.bus.xml b/Documentation/kdbus/kdbus.bus.xml
-new file mode 100644
-index 0000000..83f1198
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.bus.xml
-@@ -0,0 +1,344 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.bus">
-+
-+ <refentryinfo>
-+ <title>kdbus.bus</title>
-+ <productname>kdbus.bus</productname>
-+ </refentryinfo>
-+
-+ <refmeta>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </refmeta>
-+
-+ <refnamediv>
-+ <refname>kdbus.bus</refname>
-+ <refpurpose>kdbus bus</refpurpose>
-+ </refnamediv>
-+
-+ <refsect1>
-+ <title>Description</title>
-+
-+ <para>
-+ A bus is a resource that is shared between connections in order to
-+ transmit messages (see
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>).
-+ Each bus is independent, and operations on the bus will not have any
-+ effect on other buses. A bus is a management entity that controls the
-+ addresses of its connections, their policies and message transactions
-+ performed via this bus.
-+ </para>
-+ <para>
-+ Each bus is bound to the mount instance it was created on. It has a
-+ custom name that is unique across all buses of a domain. In
-+ <citerefentry>
-+ <refentrytitle>kdbus.fs</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ a bus is presented as a directory. No operations can be performed on
-+ the bus itself; instead you need to perform the operations on an endpoint
-+ associated with the bus. Endpoints are accessible as files underneath the
-+ bus directory. A default endpoint called <constant>bus</constant> is
-+ provided on each bus.
-+ </para>
-+ <para>
-+ Bus names may be chosen freely except for one restriction: the name must
-+ be prefixed with the numeric effective UID of the creator and a dash. This
-+ is required to avoid namespace clashes between different users. When
-+ creating a bus, the name that is passed in must be properly formatted, or
-+ the kernel will refuse creation of the bus. Example:
-+ <literal>1047-foobar</literal> is an acceptable name for a bus
-+ registered by a user with UID 1047. However,
-+ <literal>1024-foobar</literal> is not, and neither is
-+ <literal>foobar</literal>. The UID must be provided in the
-+ user-namespace of the bus owner.
-+ </para>
-+ <para>
-+ To create a new bus, you need to open the control file of a domain and
-+ employ the <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl. The control
-+ file descriptor that was used to issue
-+ <constant>KDBUS_CMD_BUS_MAKE</constant> must not previously have been
-+ used for any other control-ioctl and must be kept open for the entire
-+ life-time of the created bus. Closing it will immediately cleanup the
-+ entire bus and all its associated resources and endpoints. Every control
-+ file descriptor can only be used to create a single new bus; from that
-+ point on, it is not used for any further communication until the final
-+ <citerefentry>
-+ <refentrytitle>close</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ .
-+ </para>
-+ <para>
-+ Each bus will generate a random, 128-bit UUID upon creation. This UUID
-+ will be returned to creators of connections through
-+ <varname>kdbus_cmd_hello.id128</varname> and can be used to uniquely
-+ identify buses, even across different machines or containers. The UUID
-+ will have its variant bits set to <literal>DCE</literal>, and denote
-+ version 4 (random). For more details on UUIDs, see <ulink
-+ url="https://en.wikipedia.org/wiki/Universally_unique_identifier">
-+ the Wikipedia article on UUIDs</ulink>.
-+ </para>
-+
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Creating buses</title>
-+ <para>
-+ To create a new bus, the <constant>KDBUS_CMD_BUS_MAKE</constant>
-+ command is used. It takes a <type>struct kdbus_cmd</type> argument.
-+ </para>
-+ <programlisting>
-+struct kdbus_cmd {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>The flags for creation.</para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_MAKE_ACCESS_GROUP</constant></term>
-+ <listitem>
-+ <para>Make the bus file group-accessible.</para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_MAKE_ACCESS_WORLD</constant></term>
-+ <listitem>
-+ <para>Make the bus file world-accessible.</para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+ <listitem>
-+ <para>
-+ Requests a set of valid flags for this ioctl. When this bit is
-+ set, no action is taken; the ioctl will return
-+ <errorcode>0</errorcode>, and the <varname>flags</varname>
-+ field will have all bits set that are valid for this command.
-+ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+ cleared by the operation.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ The following items (see
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>)
-+ are expected for <constant>KDBUS_CMD_BUS_MAKE</constant>.
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_MAKE_NAME</constant></term>
-+ <listitem>
-+ <para>
-+ Contains a null-terminated string that identifies the
-+ bus. The name must be unique across the kdbus domain and
-+ must start with the effective UID of the caller, followed by
-+ a '<literal>-</literal>' (dash). This item is mandatory.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_BLOOM_PARAMETER</constant></term>
-+ <listitem>
-+ <para>
-+ Bus-wide bloom parameters passed in a
-+ <type>struct kdbus_bloom_parameter</type>. These settings are
-+ copied back to new connections verbatim. This item is
-+ mandatory. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for a more detailed description of this item.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant></term>
-+ <listitem>
-+ <para>
-+ An optional item that contains a set of attach flags that are
-+ returned to connections when they query the bus creator
-+ metadata. If not set, no metadata is returned.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+ <listitem><para>
-+ With this item, programs can <emphasis>probe</emphasis> the
-+ kernel for known item types. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ Unrecognized items are rejected, and the ioctl will fail with
-+ <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ </para>
-+ </refsect1>
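The naming rule for `KDBUS_ITEM_MAKE_NAME` (the effective UID of the caller, followed by a dash) is easy to get wrong, so it may help to build and check the name before issuing the ioctl. The sketch below only handles the string and does not touch kdbus itself. The helper names `make_bus_name` and `bus_name_matches_uid` are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes "<uid>-<suffix>" into buf.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int make_bus_name(unsigned int uid, const char *suffix,
                         char *buf, size_t len)
{
    int n = snprintf(buf, len, "%u-%s", uid, suffix);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical check mirroring the documented rule: name starts with "<uid>-". */
static bool bus_name_matches_uid(const char *name, unsigned int uid)
{
    char prefix[32];
    int n = snprintf(prefix, sizeof(prefix), "%u-", uid);

    if (n < 0 || (size_t)n >= sizeof(prefix))
        return false;
    return strncmp(name, prefix, (size_t)n) == 0;
}
```

A caller would typically pass the result of `geteuid()` as `uid`. A name that fails the prefix check is rejected by the kernel with `EINVAL`, as described in the error list for `KDBUS_CMD_BUS_MAKE`.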
-+
-+ <refsect1>
-+ <title>Return value</title>
-+ <para>
-+ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+ on error, <errorcode>-1</errorcode> is returned, and
-+ <varname>errno</varname> is set to indicate the error.
-+ If the issued ioctl is illegal for the file descriptor used,
-+ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+ </para>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_BUS_MAKE</constant> may fail with the following
-+ errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EBADMSG</constant></term>
-+ <listitem><para>
-+ A mandatory item is missing.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ The flags supplied in the <constant>struct kdbus_cmd</constant>
-+ are invalid or the supplied name does not start with the current
-+ UID and a '<literal>-</literal>' (dash).
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EEXIST</constant></term>
-+ <listitem><para>
-+ A bus of that name already exists.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ESHUTDOWN</constant></term>
-+ <listitem><para>
-+ The kdbus mount instance for the bus was already shut down.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EMFILE</constant></term>
-+ <listitem><para>
-+ The maximum number of buses for the current user is exhausted.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+ </refsect1>
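When `KDBUS_CMD_BUS_MAKE` fails, the `errno` values listed above can be turned into readable diagnostics. The following is a small sketch that uses only the standard `errno.h` constants; the function name `bus_make_strerror` and the message texts are illustrative and not part of any kdbus library.

```c
#include <errno.h>

/*
 * Hypothetical helper: maps the errno values documented for
 * KDBUS_CMD_BUS_MAKE to short descriptions.
 */
static const char *bus_make_strerror(int err)
{
    switch (err) {
    case EBADMSG:
        return "a mandatory item is missing";
    case EINVAL:
        return "invalid flags, bad name prefix, or unrecognized item";
    case EEXIST:
        return "a bus of that name already exists";
    case ESHUTDOWN:
        return "the kdbus mount instance was already shut down";
    case EMFILE:
        return "too many buses for the current user";
    case ENOTTY:
        return "ioctl is illegal for this file descriptor";
    default:
        return "unexpected error";
    }
}
```

The function would be called with the value of `errno` saved immediately after the failing `ioctl()` call, before any other library call can overwrite it.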
-+
-+ <refsect1>
-+ <title>See Also</title>
-+ <simplelist type="inline">
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.fs</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ </simplelist>
-+ </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.connection.xml b/Documentation/kdbus/kdbus.connection.xml
-new file mode 100644
-index 0000000..4bb5f30
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.connection.xml
-@@ -0,0 +1,1244 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.connection">
-+
-+ <refentryinfo>
-+ <title>kdbus.connection</title>
-+ <productname>kdbus.connection</productname>
-+ </refentryinfo>
-+
-+ <refmeta>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </refmeta>
-+
-+ <refnamediv>
-+ <refname>kdbus.connection</refname>
-+ <refpurpose>kdbus connection</refpurpose>
-+ </refnamediv>
-+
-+ <refsect1>
-+ <title>Description</title>
-+
-+ <para>
-+ Connections are identified by their <emphasis>connection ID</emphasis>,
-+ internally implemented as a <type>uint64_t</type> counter.
-+ The IDs of every newly created bus start at <constant>1</constant>, and
-+ every new connection will increment the counter by <constant>1</constant>.
-+ The IDs are not reused.
-+ </para>
-+ <para>
-+ In higher level tools, the user visible representation of a connection is
-+ defined by the D-Bus protocol specification as
-+ <constant>":1.&lt;ID&gt;"</constant>.
-+ </para>
-+ <para>
-+ Messages with a specific <type>uint64_t</type> destination ID are
-+ directly delivered to the connection with the corresponding ID. Signal
-+ messages (see
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>)
-+ may be addressed to the special destination ID
-+ <constant>KDBUS_DST_ID_BROADCAST</constant> (~0ULL) and will then
-+ potentially be delivered to all currently active connections on the bus.
-+ However, in order to receive any signal messages, clients must subscribe
-+ to them by installing a match (see
-+ <citerefentry>
-+ <refentrytitle>kdbus.match</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>).
-+ </para>
-+ <para>
-+ Messages synthesized and sent directly by the kernel will carry the
-+ special source ID <constant>KDBUS_SRC_ID_KERNEL</constant> (0).
-+ </para>
-+ <para>
-+ In addition to the unique <type>uint64_t</type> connection ID,
-+ established connections can request the ownership of
-+ <emphasis>well-known names</emphasis>, under which they can be found and
-+ addressed by other bus clients. A well-known name is associated with one
-+ and only one connection at a time. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ on name acquisition, the name registry, and the validity of names.
-+ </para>
-+ <para>
-+ Messages can specify the special destination ID
-+ <constant>KDBUS_DST_ID_NAME</constant> (0) and carry a well-known name
-+ in the message data. Such a message is delivered to the destination
-+ connection which owns that well-known name.
-+ </para>
-+
-+ <programlisting><![CDATA[
-+ +-------------------------------------------------------------------------+
-+ | +---------------+ +---------------------------+ |
-+ | | Connection | | Message | -----------------+ |
-+ | | :1.22 | --> | src: 22 | | |
-+ | | | | dst: 25 | | |
-+ | | | | | | |
-+ | | | | | | |
-+ | | | +---------------------------+ | |
-+ | | | | |
-+ | | | <--------------------------------------+ | |
-+ | +---------------+ | | |
-+ | | | |
-+ | +---------------+ +---------------------------+ | | |
-+ | | Connection | | Message | -----+ | |
-+ | | :1.25 | --> | src: 25 | | |
-+ | | | | dst: 0xffffffffffffffff | -------------+ | |
-+ | | | | (KDBUS_DST_ID_BROADCAST) | | | |
-+ | | | | | ---------+ | | |
-+ | | | +---------------------------+ | | | |
-+ | | | | | | |
-+ | | | <--------------------------------------------------+ |
-+ | +---------------+ | | |
-+ | | | |
-+ | +---------------+ +---------------------------+ | | |
-+ | | Connection | | Message | --+ | | |
-+ | | :1.55 | --> | src: 55 | | | | |
-+ | | | | dst: 0 / org.foo.bar | | | | |
-+ | | | | | | | | |
-+ | | | | | | | | |
-+ | | | +---------------------------+ | | | |
-+ | | | | | | |
-+ | | | <------------------------------------------+ | |
-+ | +---------------+ | | |
-+ | | | |
-+ | +---------------+ | | |
-+ | | Connection | | | |
-+ | | :1.81 | | | |
-+ | | org.foo.bar | | | |
-+ | | | | | |
-+ | | | | | |
-+ | | | <-----------------------------------+ | |
-+ | | | | |
-+ | | | <----------------------------------------------+ |
-+ | +---------------+ |
-+ +-------------------------------------------------------------------------+
-+ ]]></programlisting>
-+ </refsect1>
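The addressing rules in this section reduce to a three-way decision on the destination ID, plus the `:1.<ID>` textual form used by higher-level tools. The sketch below models both. The two constants are redefined locally with the values given in the text so that the example compiles without the kdbus header; `classify_dst` and `format_unique_name` are hypothetical helper names.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Values as stated in the text; defined locally for illustration only. */
#define KDBUS_DST_ID_NAME       0ULL
#define KDBUS_DST_ID_BROADCAST  (~0ULL)

enum dst_kind { DST_BY_NAME, DST_BROADCAST, DST_UNICAST };

/* Hypothetical helper: decides how a message with this dst ID is routed. */
static enum dst_kind classify_dst(uint64_t dst_id)
{
    if (dst_id == KDBUS_DST_ID_NAME)
        return DST_BY_NAME;     /* well-known name carried in the message */
    if (dst_id == KDBUS_DST_ID_BROADCAST)
        return DST_BROADCAST;   /* signal to all matching connections */
    return DST_UNICAST;         /* delivered to exactly this connection */
}

/* Hypothetical helper: renders the D-Bus style unique name ":1.<ID>". */
static int format_unique_name(uint64_t id, char *buf, size_t len)
{
    int n = snprintf(buf, len, ":1.%" PRIu64, id);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

In terms of the diagram, the message from `:1.22` is a unicast, the one from `:1.25` is a broadcast, and the one from `:1.55` is resolved by name to the owner of `org.foo.bar`.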
-+
-+ <refsect1>
-+ <title>Privileged connections</title>
-+ <para>
-+ A connection is considered <emphasis>privileged</emphasis> if the user
-+ it was created by is the same that created the bus, or if the creating
-+ task had <constant>CAP_IPC_OWNER</constant> set when it called
-+ <constant>KDBUS_CMD_HELLO</constant> (see below).
-+ </para>
-+ <para>
-+ Privileged connections have permission to employ certain restricted
-+ functions and commands, which are explained below and in other kdbus
-+ man-pages.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Activator and policy holder connection</title>
-+ <para>
-+ An <emphasis>activator</emphasis> connection is a placeholder for a
-+ <emphasis>well-known name</emphasis>. Messages sent to such a connection
-+ can be used to start an implementer connection, which will then get all
-+ the messages from the activator copied over. An activator connection
-+ cannot be used to send any message.
-+ </para>
-+ <para>
-+ A <emphasis>policy holder</emphasis> connection only installs a policy
-+ for one or more names. These policy entries are kept active as long as
-+ the connection is alive, and are removed once it terminates. Such a
-+ policy connection type can be used to deploy restrictions for names that
-+ are not yet active on the bus. A policy holder connection cannot be used
-+ to send any message.
-+ </para>
-+ <para>
-+ The creation of activator or policy holder connections is restricted to
-+ privileged users on the bus (see above).
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Monitor connections</title>
-+ <para>
-+ Monitors are eavesdropping connections that receive all the traffic on the
-+ bus, but are invisible to other connections. Such connections have all
-+ properties of any other, regular connection, except for the following
-+ details:
-+ </para>
-+
-+ <itemizedlist>
-+ <listitem><para>
-+ They will get every message sent over the bus, both unicasts and
-+ broadcasts.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ Installing matches for signal messages is neither necessary
-+ nor allowed.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ They cannot send messages or be directly addressed as receiver.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ They cannot own well-known names. Therefore, they also can't operate as
-+ activators.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ Their creation and destruction will not cause
-+ <constant>KDBUS_ITEM_ID_{ADD,REMOVE}</constant> notifications (see
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>).
-+ </para></listitem>
-+
-+ <listitem><para>
-+ They are not listed with their unique name in name registry dumps
-+ (see <constant>KDBUS_CMD_NAME_LIST</constant> in
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>), so other connections cannot detect the presence of
-+ a monitor.
-+ </para></listitem>
-+ </itemizedlist>
-+ <para>
-+ The creation of monitor connections is restricted to privileged users on
-+ the bus (see above).
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Creating connections</title>
-+ <para>
-+ A connection to a bus is created by opening an endpoint file (see
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>)
-+ of a bus and becoming an active client with the
-+ <constant>KDBUS_CMD_HELLO</constant> ioctl. Every connection has a unique
-+ identifier on the bus and can address messages to every other connection
-+ on the same bus by using the peer's connection ID as the destination.
-+ </para>
-+ <para>
-+ The <constant>KDBUS_CMD_HELLO</constant> ioctl takes a <type>struct
-+ kdbus_cmd_hello</type> as argument.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_cmd_hello {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 attach_flags_send;
-+ __u64 attach_flags_recv;
-+ __u64 bus_flags;
-+ __u64 id;
-+ __u64 pool_size;
-+ __u64 offset;
-+ __u8 id128[16];
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem>
-+ <para>Flags to apply to this connection.</para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_HELLO_ACCEPT_FD</constant></term>
-+ <listitem>
-+ <para>
-+ When this flag is set, the connection can be sent file
-+ descriptors as message payload of unicast messages. If it's
-+ not set, an attempt to send file descriptors will result in
-+ <constant>-ECOMM</constant> on the sender's side.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_HELLO_ACTIVATOR</constant></term>
-+ <listitem>
-+ <para>
-+ Make this connection an activator (see above). With this bit
-+ set, an item of type <constant>KDBUS_ITEM_NAME</constant> has
-+ to be attached. This item describes the well-known name this
-+ connection should be an activator for.
-+ A connection cannot be an activator and a policy holder at
-+ the same time, so this bit is not allowed together with
-+ <constant>KDBUS_HELLO_POLICY_HOLDER</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_HELLO_POLICY_HOLDER</constant></term>
-+ <listitem>
-+ <para>
-+ Make this connection a policy holder (see above). With this
-+ bit set, an item of type <constant>KDBUS_ITEM_NAME</constant>
-+ has to be attached. This item describes the well-known name
-+ this connection should hold a policy for.
-+ A connection cannot be an activator and a policy holder at
-+ the same time, so this bit is not allowed together with
-+ <constant>KDBUS_HELLO_ACTIVATOR</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_HELLO_MONITOR</constant></term>
-+ <listitem>
-+ <para>
-+ Make this connection a monitor connection (see above).
-+ </para>
-+ <para>
-+ This flag can only be set by privileged bus connections. See
-+ above for more information.
-+ A connection cannot be a monitor and an activator or a policy
-+ holder at the same time, so this bit is not allowed
-+ together with <constant>KDBUS_HELLO_ACTIVATOR</constant> or
-+ <constant>KDBUS_HELLO_POLICY_HOLDER</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+ <listitem>
-+ <para>
-+ Requests a set of valid flags for this ioctl. When this bit is
-+ set, no action is taken; the ioctl will return
-+ <errorcode>0</errorcode>, and the <varname>flags</varname>
-+ field will have all bits set that are valid for this command.
-+ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+ cleared by the operation.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>attach_flags_send</varname></term>
-+ <listitem><para>
-+ Set the bits for metadata this connection permits to be sent to the
-+ receiving peer. Only metadata items that are both allowed to be sent
-+ by the sender and that are requested by the receiver will be attached
-+ to the message.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>attach_flags_recv</varname></term>
-+ <listitem><para>
-+ Request the attachment of metadata for each message received by this
-+ connection. See
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for information about metadata, and
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ regarding items in general.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>bus_flags</varname></term>
-+ <listitem><para>
-+ Upon successful completion of the ioctl, this member will contain the
-+ flags of the bus it connected to.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>id</varname></term>
-+ <listitem><para>
-+ Upon successful completion of the command, this member will contain
-+ the numerical ID of the new connection.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>pool_size</varname></term>
-+ <listitem><para>
-+ The size of the communication pool, in bytes. The pool can be
-+ accessed by calling
-+ <citerefentry>
-+ <refentrytitle>mmap</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ on the file descriptor that was used to issue the
-+ <constant>KDBUS_CMD_HELLO</constant> ioctl.
-+ The pool size of a connection must be greater than
-+ <constant>0</constant> and a multiple of
-+ <constant>PAGE_SIZE</constant>. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>offset</varname></term>
-+ <listitem><para>
-+ The kernel will return the offset in the pool where returned details
-+ will be stored. See below.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>id128</varname></term>
-+ <listitem><para>
-+ Upon successful completion of the ioctl, this member will contain the
-+ <emphasis>128-bit UUID</emphasis> of the connected bus.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ Variable list of items containing optional additional information.
-+ The following items are currently expected/valid:
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_CONN_DESCRIPTION</constant></term>
-+ <listitem>
-+ <para>
-+ Contains a string that describes this connection, so it can
-+ be identified later.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NAME</constant></term>
-+ <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
-+ <listitem>
-+ <para>
-+ For activators and policy holders only, combinations of
-+ these two items describe policy access entries. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.policy</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for further details.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_CREDS</constant></term>
-+ <term><constant>KDBUS_ITEM_PIDS</constant></term>
-+ <term><constant>KDBUS_ITEM_SECLABEL</constant></term>
-+ <listitem>
-+ <para>
-+ Privileged bus users may submit these types in order to
-+ create connections with faked credentials. This information
-+ will be returned when peer information is queried by
-+ <constant>KDBUS_CMD_CONN_INFO</constant>. See below for more
-+ information on retrieving information on connections.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+ <listitem><para>
-+ With this item, programs can <emphasis>probe</emphasis> the
-+ kernel for known item types. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ Unrecognized items are rejected, and the ioctl will fail with
-+ <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ At the offset returned in the <varname>offset</varname> field of
-+ <type>struct kdbus_cmd_hello</type>, the kernel will store items
-+ of the following types:
-+ </para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_BLOOM_PARAMETER</constant></term>
-+ <listitem>
-+ <para>
-+ Bloom filter parameter as defined by the bus creator.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ The offset in the pool has to be freed with the
-+ <constant>KDBUS_CMD_FREE</constant> ioctl. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for further information.
-+ </para>
-+ </refsect1>
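Several of the constraints on `KDBUS_CMD_HELLO` can be checked before the ioctl is issued: the activator, policy holder and monitor flags are mutually exclusive, all three require a privileged user, and the pool size must be non-zero and a multiple of the page size. The sketch below is a user-space pre-flight check only. The `HELLO_*` bit values are illustrative placeholders rather than the definitions from the kdbus header, and `hello_args_plausible` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative bit values only. The authoritative definitions live in the
 * kdbus UAPI header, which this sketch deliberately does not include.
 */
#define HELLO_ACCEPT_FD      (1ULL << 0)
#define HELLO_ACTIVATOR      (1ULL << 1)
#define HELLO_POLICY_HOLDER  (1ULL << 2)
#define HELLO_MONITOR        (1ULL << 3)

/* Hypothetical pre-flight check mirroring the rules documented above. */
static bool hello_args_plausible(uint64_t flags, uint64_t pool_size,
                                 uint64_t page_size, bool privileged)
{
    uint64_t special = flags & (HELLO_ACTIVATOR | HELLO_POLICY_HOLDER |
                                HELLO_MONITOR);

    /* Activator, policy holder and monitor are mutually exclusive. */
    if (special & (special - 1))
        return false;

    /* All three special connection types need a privileged user. */
    if (special && !privileged)
        return false;

    /* The pool must be non-empty and a multiple of the page size. */
    if (page_size == 0 || pool_size == 0 || pool_size % page_size != 0)
        return false;

    return true;
}
```

The check is intentionally incomplete: it does not verify that an activator or policy holder also attaches a `KDBUS_ITEM_NAME` item, and the kernel remains the final authority on whether a request is accepted. In a real program `page_size` would come from `sysconf(_SC_PAGESIZE)`.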
-+
-+ <refsect1>
-+ <title>Retrieving information on a connection</title>
-+ <para>
-+ The <constant>KDBUS_CMD_CONN_INFO</constant> ioctl can be used to
-+ retrieve credentials and properties of the initial creator of a
-+ connection. This ioctl uses the following struct.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_cmd_info {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 id;
-+ __u64 attach_flags;
-+ __u64 offset;
-+ __u64 info_size;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>
-+ Currently, no flags are supported.
-+ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+ valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
-+ and the <varname>flags</varname> field is set to
-+ <constant>0</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>id</varname></term>
-+ <listitem><para>
-+ The numerical ID of the connection for which information is to be
-+ retrieved. If set to a non-zero value, the
-+ <constant>KDBUS_ITEM_OWNED_NAME</constant> item is ignored.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>attach_flags</varname></term>
-+ <listitem><para>
-+ Specifies which metadata items should be attached to the answer. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>offset</varname></term>
-+ <listitem><para>
-+ When the ioctl returns, this field will contain the offset of the
-+ connection information inside the caller's pool. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for further information.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>info_size</varname></term>
-+ <listitem><para>
-+ The kernel will return the size of the returned information, so
-+ applications can optionally
-+ <citerefentry>
-+ <refentrytitle>mmap</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ specific parts of the pool. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for further information.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ The following items are expected for
-+ <constant>KDBUS_CMD_CONN_INFO</constant>.
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_OWNED_NAME</constant></term>
-+ <listitem>
-+ <para>
-+ Contains the well-known name of the connection to look up.
-+ This item is mandatory if the <varname>id</varname> field is
-+ set to 0.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+ <listitem><para>
-+ With this item, programs can <emphasis>probe</emphasis> the
-+ kernel for known item types. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ <para>
-+ Unrecognized items are rejected, and the ioctl will fail with
-+ <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ When the ioctl returns, the following struct will be stored in the
-+ caller's pool at <varname>offset</varname>. The fields in this struct
-+ are described below.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_info {
-+ __u64 size;
-+ __u64 id;
-+ __u64 flags;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>id</varname></term>
-+ <listitem><para>
-+ The connection's unique ID.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>
-+ The connection's flags as specified when it was created.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ Depending on the <varname>attach_flags</varname> field in
-+ <type>struct kdbus_cmd_info</type>, items of types
-+ <constant>KDBUS_ITEM_OWNED_NAME</constant> and
-+ <constant>KDBUS_ITEM_CONN_DESCRIPTION</constant> may follow here.
-+ <constant>KDBUS_ITEM_NEGOTIATE</constant> is also allowed.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ Once the caller is finished with parsing the return buffer, it needs to
-+ employ the <constant>KDBUS_CMD_FREE</constant> command for the offset, in
-+ order to free the buffer part. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for further information.
-+ </para>
-+ </refsect1>
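The interaction between the `id` field and the `KDBUS_ITEM_OWNED_NAME` item in `KDBUS_CMD_CONN_INFO` can be summarized as a small decision function: a non-zero ID wins and any name item is ignored, while an ID of 0 makes the name mandatory. The sketch below models only that rule. The enum and the function name `conn_info_lookup_mode` are hypothetical, and the name is passed as a plain C string rather than as a kdbus item.

```c
#include <stddef.h>
#include <stdint.h>

enum lookup_mode { LOOKUP_INVALID, LOOKUP_BY_ID, LOOKUP_BY_NAME };

/*
 * Hypothetical helper: decides how a KDBUS_CMD_CONN_INFO request would
 * be resolved, given the id field and an optional well-known name.
 */
static enum lookup_mode conn_info_lookup_mode(uint64_t id,
                                              const char *owned_name)
{
    if (id != 0)
        return LOOKUP_BY_ID;    /* any name item is ignored */
    if (owned_name != NULL && owned_name[0] != '\0')
        return LOOKUP_BY_NAME;  /* id is 0, so the name is used */
    return LOOKUP_INVALID;      /* id is 0 and no name was supplied */
}
```

Whichever mode applies, the result is written into the caller's pool and has to be released with `KDBUS_CMD_FREE` after parsing.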
-+
-+ <refsect1>
-+ <title>Getting information about a connection's bus creator</title>
-+ <para>
-+ The <constant>KDBUS_CMD_BUS_CREATOR_INFO</constant> ioctl takes the same
-+ struct as <constant>KDBUS_CMD_CONN_INFO</constant>, but is used to
-+ retrieve information about the creator of the bus the connection is
-+ attached to. The metadata returned by this call is collected during the
-+ creation of the bus and is never altered afterwards, so it provides
-+ pristine information on the task that created the bus, at the moment when
-+ it did so.
-+ </para>
-+ <para>
-+ In response to this call, a slice in the connection's pool is allocated
-+ and filled with an object of type <type>struct kdbus_info</type>,
-+ pointed to by the ioctl's <varname>offset</varname> field.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_info {
-+ __u64 size;
-+ __u64 id;
-+ __u64 flags;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>id</varname></term>
-+ <listitem><para>
-+ The bus ID.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>
-+ The bus flags as specified when it was created.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ Metadata information is stored in items here. The item list
-+ contains a <constant>KDBUS_ITEM_MAKE_NAME</constant> item that
-+ indicates the bus name of the calling connection.
-+ <constant>KDBUS_ITEM_NEGOTIATE</constant> is allowed to probe
-+ for known item types.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ Once the caller is finished with parsing the return buffer, it needs to
-+ employ the <constant>KDBUS_CMD_FREE</constant> command for the offset, in
-+ order to free the buffer part. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for further information.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Updating connection details</title>
-+ <para>
-+ Some of a connection's details can be updated with the
-+ <constant>KDBUS_CMD_CONN_UPDATE</constant> ioctl, using the file
-+ descriptor that was used to create the connection. The update command
-+ uses the following struct.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_cmd {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>
-+ Currently, no flags are supported.
-+ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+ valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
-+ and the <varname>flags</varname> field is set to
-+ <constant>0</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ Items to describe the connection details to be updated. The
-+ following item types are supported.
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant></term>
-+ <listitem>
-+ <para>
-+ Supply a new set of metadata items that this connection
-+ permits to be sent along with messages.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_ATTACH_FLAGS_RECV</constant></term>
-+ <listitem>
-+ <para>
-+ Supply a new set of metadata items that this connection
-+ requests to be attached to each message.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NAME</constant></term>
-+ <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
-+ <listitem>
-+ <para>
-+ Policy holder connections may supply a new set of policy
-+ information with these items. For other connection types,
-+ <constant>EOPNOTSUPP</constant> is returned in
-+ <varname>errno</varname>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+ <listitem><para>
-+ With this item, programs can <emphasis>probe</emphasis> the
-+ kernel for known item types. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ Unrecognized items are rejected, and the ioctl will fail with
-+ <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Termination of connections</title>
-+ <para>
-+ A connection can be terminated by simply calling
-+ <citerefentry>
-+ <refentrytitle>close</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ on its file descriptor. All pending incoming messages will be discarded,
-+ and the memory allocated by the pool will be freed.
-+ </para>
-+
-+ <para>
-+ An alternative way of closing down a connection is via the
-+ <constant>KDBUS_CMD_BYEBYE</constant> ioctl. This ioctl will succeed only
-+ if the message queue of the connection is empty at the time of closing;
-+ otherwise, the ioctl will fail with <varname>errno</varname> set to
-+ <constant>EBUSY</constant>. When this ioctl returns
-+ successfully, the connection has been terminated and won't accept any new
-+ messages from remote peers. This way, a connection can be terminated
-+ race-free, without losing any messages. The ioctl takes an argument of
-+ type <type>struct kdbus_cmd</type>.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_cmd {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>
-+ Currently, no flags are supported.
-+ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+ valid flags. If set, the ioctl will fail with
-+ <varname>errno</varname> set to <constant>EPROTO</constant>, and
-+ the <varname>flags</varname> field is set to <constant>0</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ The following item types are supported.
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+ <listitem><para>
-+ With this item, programs can <emphasis>probe</emphasis> the
-+ kernel for known item types. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ Unrecognized items are rejected, and the ioctl will fail with
-+ <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Return value</title>
-+ <para>
-+ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+ on error, <errorcode>-1</errorcode> is returned, and
-+ <varname>errno</varname> is set to indicate the error.
-+ If the issued ioctl is illegal for the file descriptor used,
-+ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+ </para>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_HELLO</constant> may fail with the following
-+ errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EFAULT</constant></term>
-+ <listitem><para>
-+ The supplied pool size was 0 or not a multiple of the page size.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ The flags supplied in <type>struct kdbus_cmd_hello</type>
-+ are invalid.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ An illegal combination of
-+ <constant>KDBUS_HELLO_MONITOR</constant>,
-+ <constant>KDBUS_HELLO_ACTIVATOR</constant> and
-+ <constant>KDBUS_HELLO_POLICY_HOLDER</constant> was passed in
-+ <varname>flags</varname>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ An invalid set of items was supplied.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ECONNREFUSED</constant></term>
-+ <listitem><para>
-+ The attach_flags_send field did not satisfy the requirements of
-+ the bus.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EPERM</constant></term>
-+ <listitem><para>
-+ A <constant>KDBUS_ITEM_CREDS</constant> item was supplied, but the
-+ current user is not privileged.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ESHUTDOWN</constant></term>
-+ <listitem><para>
-+ The bus you were trying to connect to has already been shut down.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EMFILE</constant></term>
-+ <listitem><para>
-+ The maximum number of connections on the bus has been reached.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EOPNOTSUPP</constant></term>
-+ <listitem><para>
-+ The endpoint does not support the connection flags supplied in
-+ <type>struct kdbus_cmd_hello</type>.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_BYEBYE</constant> may fail with the following
-+ errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EALREADY</constant></term>
-+ <listitem><para>
-+ The connection has already been shut down.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EBUSY</constant></term>
-+ <listitem><para>
-+ There are still messages queued up in the connection's pool.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_CONN_INFO</constant> may fail with the following
-+ errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ Invalid flags, or neither an ID nor a name was provided, or the
-+ name is invalid.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ESRCH</constant></term>
-+ <listitem><para>
-+ Connection lookup by name failed.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ENXIO</constant></term>
-+ <listitem><para>
-+ No connection with the provided connection ID found.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_CONN_UPDATE</constant> may fail with the following
-+ errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ Illegal flags or items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ Wildcards submitted in policy entries, or illegal sequence
-+ of policy items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EOPNOTSUPP</constant></term>
-+ <listitem><para>
-+ Operation not supported by connection.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>E2BIG</constant></term>
-+ <listitem><para>
-+ Too many policy items attached.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>See Also</title>
-+ <simplelist type="inline">
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.policy</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ </simplelist>
-+ </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.endpoint.xml b/Documentation/kdbus/kdbus.endpoint.xml
-new file mode 100644
-index 0000000..6632485
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.endpoint.xml
-@@ -0,0 +1,429 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.endpoint">
-+
-+ <refentryinfo>
-+ <title>kdbus.endpoint</title>
-+ <productname>kdbus.endpoint</productname>
-+ </refentryinfo>
-+
-+ <refmeta>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </refmeta>
-+
-+ <refnamediv>
-+ <refname>kdbus.endpoint</refname>
-+ <refpurpose>kdbus endpoint</refpurpose>
-+ </refnamediv>
-+
-+ <refsect1>
-+ <title>Description</title>
-+
-+ <para>
-+ Endpoints are entry points to a bus (see
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>).
-+ Each bus has a default
-+ endpoint called 'bus'. The bus owner has the ability to create custom
-+ endpoints with specific names, permissions, and policy databases
-+ (see below). An endpoint is presented as a file underneath the directory
-+ of the parent bus.
-+ </para>
-+ <para>
-+ To create a custom endpoint, open the default endpoint
-+ (<literal>bus</literal>) and use the
-+ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> ioctl with
-+ <type>struct kdbus_cmd</type>. Custom endpoints always have a policy
-+ database that, by default, forbids any operation. You have to explicitly
-+ install policy entries to allow any operation on this endpoint.
-+ </para>
-+ <para>
-+ Once <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> has succeeded, the new
-+ endpoint will appear in the filesystem
-+ (<citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>), and the used file descriptor will manage the
-+ newly created endpoint resource. It cannot be used to manage further
-+ resources and must be kept open as long as the endpoint is needed. The
-+ endpoint will be terminated as soon as the file descriptor is closed.
-+ </para>
-+ <para>
-+ Endpoint names may be chosen freely except for one restriction: the name
-+ must be prefixed with the numeric effective UID of the creator and a dash.
-+ This is required to avoid namespace clashes between different users. When
-+ creating an endpoint, the name that is passed in must be properly
-+ formatted or the kernel will refuse creation of the endpoint. Example:
-+ <literal>1047-my-endpoint</literal> is an acceptable name for an
-+ endpoint registered by a user with UID 1047. However,
-+ <literal>1024-my-endpoint</literal> is not, and neither is
-+ <literal>my-endpoint</literal>. The UID must be provided in the
-+ user-namespace of the bus.
-+ </para>
-+ <para>
-+ To create connections to a bus, use <constant>KDBUS_CMD_HELLO</constant>
-+ on a file descriptor returned by <function>open()</function> on an
-+ endpoint node. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for further details.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Creating custom endpoints</title>
-+ <para>
-+ To create a new endpoint, the
-+ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> command is used. Along with
-+ the endpoint's name, which will be used to expose the endpoint in the
-+ <citerefentry>
-+ <refentrytitle>kdbus.fs</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>,
-+ the command also optionally takes items to set up the endpoint's
-+ <citerefentry>
-+ <refentrytitle>kdbus.policy</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> takes a
-+ <type>struct kdbus_cmd</type> argument.
-+ </para>
-+ <programlisting>
-+struct kdbus_cmd {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>The flags for creation.</para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_MAKE_ACCESS_GROUP</constant></term>
-+ <listitem>
-+ <para>Make the endpoint file group-accessible.</para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_MAKE_ACCESS_WORLD</constant></term>
-+ <listitem>
-+ <para>Make the endpoint file world-accessible.</para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+ <listitem>
-+ <para>
-+ Requests a set of valid flags for this ioctl. When this bit is
-+ set, no action is taken; the ioctl will return
-+ <errorcode>0</errorcode>, and the <varname>flags</varname>
-+ field will have all bits set that are valid for this command.
-+ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+ cleared by the operation.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ The following items are expected for
-+ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>.
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_MAKE_NAME</constant></term>
-+ <listitem>
-+ <para>Contains a string to identify the endpoint name.</para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NAME</constant></term>
-+ <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
-+ <listitem>
-+ <para>
-+ These items are used to set the policy attached to the
-+ endpoint. For more details on bus and endpoint policies, see
-+ <citerefentry>
-+ <refentrytitle>kdbus.policy</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ <para>
-+ Unrecognized items are rejected, and the ioctl will fail with
-+ <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Updating endpoints</title>
-+ <para>
-+ To update an existing endpoint, the
-+ <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant> command is used on the file
-+ descriptor that was used to create the endpoint, using
-+ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>. The only relevant detail of
-+ the endpoint that can be updated is the policy. When the command is
-+ employed, the policy of the endpoint is <emphasis>replaced</emphasis>
-+ atomically with the new set of rules.
-+ The command takes a <type>struct kdbus_cmd</type> argument.
-+ </para>
-+ <programlisting>
-+struct kdbus_cmd {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>
-+ Unused for this command.
-+ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+ valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
-+ and the <varname>flags</varname> field is set to
-+ <constant>0</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ The following items are expected for
-+ <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant>.
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NAME</constant></term>
-+ <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
-+ <listitem>
-+ <para>
-+ These items are used to set the policy attached to the
-+ endpoint. For more details on bus and endpoint policies, see
-+ <citerefentry>
-+ <refentrytitle>kdbus.policy</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ Existing policy is atomically replaced with the new rules
-+ provided.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+ <listitem><para>
-+ With this item, programs can <emphasis>probe</emphasis> the
-+ kernel for known item types. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ <para>
-+ Unrecognized items are rejected, and the ioctl will fail with
-+ <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Return value</title>
-+ <para>
-+ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+ on error, <errorcode>-1</errorcode> is returned, and
-+ <varname>errno</varname> is set to indicate the error.
-+ If the issued ioctl is illegal for the file descriptor used,
-+ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+ </para>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> may fail with the
-+ following errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ The flags supplied in the <type>struct kdbus_cmd</type>
-+ are invalid.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ Illegal combination of <constant>KDBUS_ITEM_NAME</constant> and
-+ <constant>KDBUS_ITEM_POLICY_ACCESS</constant> was provided.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EEXIST</constant></term>
-+ <listitem><para>
-+ An endpoint of that name already exists.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EPERM</constant></term>
-+ <listitem><para>
-+ The calling user is not privileged. See
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for information about privileged users.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant> may fail with the
-+ following errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ The flags supplied in <type>struct kdbus_cmd</type>
-+ are invalid.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ Illegal combination of <constant>KDBUS_ITEM_NAME</constant> and
-+ <constant>KDBUS_ITEM_POLICY_ACCESS</constant> was provided.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>See Also</title>
-+ <simplelist type="inline">
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.fs</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ </simplelist>
-+ </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.fs.xml b/Documentation/kdbus/kdbus.fs.xml
-new file mode 100644
-index 0000000..8c2a90e
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.fs.xml
-@@ -0,0 +1,124 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus_fs">
-+
-+ <refentryinfo>
-+ <title>kdbus.fs</title>
-+ <productname>kdbus.fs</productname>
-+ </refentryinfo>
-+
-+ <refmeta>
-+ <refentrytitle>kdbus.fs</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </refmeta>
-+
-+ <refnamediv>
-+ <refname>kdbus.fs</refname>
-+ <refpurpose>kdbus file system</refpurpose>
-+ </refnamediv>
-+
-+ <refsect1>
-+ <title>File-system Layout</title>
-+
-+ <para>
-+ The <emphasis>kdbusfs</emphasis> pseudo filesystem provides access to
-+ kdbus entities, such as <emphasis>buses</emphasis> and
-+ <emphasis>endpoints</emphasis>. Each time the filesystem is mounted,
-+ a new, isolated kdbus instance is created, which is independent from the
-+ other instances.
-+ </para>
-+ <para>
-+ The system-wide standard mount point for <emphasis>kdbusfs</emphasis> is
-+ <constant>/sys/fs/kdbus</constant>.
-+ </para>
-+
-+ <para>
-+ Buses are represented as directories in the file system layout, whereas
-+ endpoints are exposed as files inside these directories. At the top level,
-+ a <emphasis>control</emphasis> node is present, which can be opened to
-+ create new buses via the <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl.
-+ Each <emphasis>bus</emphasis> shows a default endpoint called
-+ <varname>bus</varname>, which can be opened to either create a connection
-+ with the <constant>KDBUS_CMD_HELLO</constant> ioctl, or to create new
-+ custom endpoints for the bus with
-+ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>,
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry> and
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+
-+ <para>The following shows an example layout of the
-+ <emphasis>kdbusfs</emphasis> filesystem:</para>
-+
-+<programlisting>
-+ /sys/fs/kdbus/ ; mount-point
-+ |-- 0-system ; bus directory
-+ | |-- bus ; default endpoint
-+ | `-- 1017-custom ; custom endpoint
-+ |-- 1000-user ; bus directory
-+ | |-- bus ; default endpoint
-+ | |-- 1000-service-A ; custom endpoint
-+ | `-- 1000-service-B ; custom endpoint
-+ `-- control ; control file
-+</programlisting>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Mounting instances</title>
-+ <para>
-+ In order to get a new and separate kdbus environment, a new instance
-+ of <emphasis>kdbusfs</emphasis> can be mounted like this:
-+ </para>
-+<programlisting>
-+ # mount -t kdbusfs kdbusfs /tmp/new_kdbus/
-+</programlisting>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>See Also</title>
-+ <simplelist type="inline">
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>mount</refentrytitle>
-+ <manvolnum>8</manvolnum>
-+ </citerefentry>
-+ </member>
-+ </simplelist>
-+ </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.item.xml b/Documentation/kdbus/kdbus.item.xml
-new file mode 100644
-index 0000000..ee09dfa
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.item.xml
-@@ -0,0 +1,839 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus">
-+
-+ <refentryinfo>
-+ <title>kdbus.item</title>
-+ <productname>kdbus item</productname>
-+ </refentryinfo>
-+
-+ <refmeta>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </refmeta>
-+
-+ <refnamediv>
-+ <refname>kdbus.item</refname>
-+ <refpurpose>kdbus item structure, layout and usage</refpurpose>
-+ </refnamediv>
-+
-+ <refsect1>
-+ <title>Description</title>
-+
-+ <para>
-+ To flexibly augment transport structures, data blobs of type
-+ <type>struct kdbus_item</type> can be attached to the structs passed
-+ into the ioctls. Some ioctls make items of certain types mandatory,
-+ others are optional. Items that are unsupported by ioctls they are
-+ attached to will cause the ioctl to fail with <varname>errno</varname>
-+ set to <constant>EINVAL</constant>.
-+ Items are also used for information stored in a connection's
-+ <emphasis>pool</emphasis>, such as received messages, name lists or
-+ requested connection or bus owner information. Depending on the type of
-+ an item, its total size is either fixed or variable.
-+ </para>
-+
-+ <refsect2>
-+ <title>Chaining items</title>
-+ <para>
-+ Whenever items are used as part of the kdbus kernel API, they are
-+ embedded in structs that themselves include a
-+ size field containing the overall size of the structure.
-+ This allows multiple items to be chained up, and an item iterator
-+ (see below) is capable of detecting the end of an item chain.
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Alignment</title>
-+ <para>
-+ The kernel expects all items to be aligned to 8-byte boundaries.
-+ Unaligned items will cause the ioctl they are used with to fail
-+ with <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ An item that has an unaligned size itself hence needs to be padded
-+ if it is followed by another item.
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Iterating items</title>
-+ <para>
-+ A simple iterator walks the items until it reaches the end of the
-+ embedding structure, as given by that structure's overall size. An example
-+ implementation is shown below.
-+ </para>
-+
-+ <programlisting><![CDATA[
-+#define KDBUS_ALIGN8(val) (((val) + 7) & ~7)
-+
-+#define KDBUS_ITEM_NEXT(item) \
-+ (typeof(item))((uint8_t *)(item) + KDBUS_ALIGN8((item)->size))
-+
-+#define KDBUS_ITEM_FOREACH(item, head, first) \
-+ for ((item) = (head)->first; \
-+ ((uint8_t *)(item) < (uint8_t *)(head) + (head)->size) && \
-+ ((uint8_t *)(item) >= (uint8_t *)(head)); \
-+ (item) = KDBUS_ITEM_NEXT(item))
-+ ]]></programlisting>
-+ </refsect2>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Item layout</title>
-+ <para>
-+ A <type>struct kdbus_item</type> consists of a
-+ <varname>size</varname> field, describing its overall size, and a
-+ <varname>type</varname> field, both 64 bits wide. They are followed by
-+ a union to store information that is specific to the item's type.
-+ The struct layout is shown below.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_item {
-+ __u64 size;
-+ __u64 type;
-+ /* item payload - see below */
-+ union {
-+ __u8 data[0];
-+ __u32 data32[0];
-+ __u64 data64[0];
-+ char str[0];
-+
-+ __u64 id;
-+ struct kdbus_vec vec;
-+ struct kdbus_creds creds;
-+ struct kdbus_pids pids;
-+ struct kdbus_audit audit;
-+ struct kdbus_caps caps;
-+ struct kdbus_timestamp timestamp;
-+ struct kdbus_name name;
-+ struct kdbus_bloom_parameter bloom_parameter;
-+ struct kdbus_bloom_filter bloom_filter;
-+ struct kdbus_memfd memfd;
-+ int fds[0];
-+ struct kdbus_notify_name_change name_change;
-+ struct kdbus_notify_id_change id_change;
-+ struct kdbus_policy_access policy_access;
-+ };
-+};
-+ </programlisting>
-+
-+ <para>
-+ <type>struct kdbus_item</type> should never be used to allocate
-+ an item instance, as its size may grow in future releases of the API.
-+ Instead, it should be manually assembled by storing the
-+ <varname>size</varname>, <varname>type</varname> and payload to a
-+ struct of its own.
-+ </para>
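-+
-+ <para>
-+ As a minimal sketch of such manual assembly, the helper below builds
-+ a string item in a zeroed buffer. It is an illustration, not part of
-+ the kdbus API: <varname>make_str_item</varname>,
-+ <varname>ITEM_HEADER_SIZE</varname> and <varname>ALIGN8</varname>
-+ are hypothetical names, the caller supplies the item type, and the
-+ code assumes that <varname>size</varname> excludes the padding, as
-+ the alignment step in the iterator macros suggests.
-+ </para>
-+
-+ <programlisting><![CDATA[
-+#include <stdint.h>
-+#include <stdlib.h>
-+#include <string.h>
-+
-+#define ALIGN8(val) (((val) + 7) & ~(size_t)7)
-+#define ITEM_HEADER_SIZE (2 * sizeof(uint64_t)) /* size + type */
-+
-+/*
-+ * Build a string item in a freshly allocated, zeroed buffer.
-+ * Stores the padded buffer length in *padded_len.
-+ * Returns NULL on allocation failure; the caller frees the result.
-+ */
-+static uint8_t *make_str_item(uint64_t type, const char *str,
-+                              size_t *padded_len)
-+{
-+    size_t payload = strlen(str) + 1; /* include the NUL byte */
-+    uint64_t size = ITEM_HEADER_SIZE + payload;
-+    uint8_t *buf;
-+
-+    *padded_len = ALIGN8((size_t)size);
-+    buf = calloc(1, *padded_len);
-+    if (!buf)
-+        return NULL;
-+
-+    memcpy(buf, &size, sizeof(size));
-+    memcpy(buf + sizeof(size), &type, sizeof(type));
-+    memcpy(buf + ITEM_HEADER_SIZE, str, payload);
-+    return buf;
-+}
-+ ]]></programlisting>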
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Item types</title>
-+
-+ <refsect2>
-+ <title>Negotiation item</title>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+ <listitem><para>
-+ When this item is attached to any ioctl, programs can
-+ <emphasis>probe</emphasis> the kernel for known item types.
-+ The item carries an array of <type>uint64_t</type> values in
-+ <varname>item.data64</varname>, each set to an item type to
-+ probe. The kernel will reset each member of this array that is
-+ not recognized as a valid item type to <constant>0</constant>.
-+ This way, users can negotiate kernel features at start-up to
-+ keep newer userspace compatible with older kernels. This item
-+ is never attached by the kernel in response to any command.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Command specific items</title>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_PAYLOAD_VEC</constant></term>
-+ <term><constant>KDBUS_ITEM_PAYLOAD_OFF</constant></term>
-+ <listitem><para>
-+ Messages are directly copied by the sending process into the
-+ receiver's
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ This way, two peers can exchange data by effectively doing a
-+ single-copy from one process to another; the kernel will not buffer
-+ the data anywhere else. <constant>KDBUS_ITEM_PAYLOAD_VEC</constant>
-+ is used when <emphasis>sending</emphasis> a message. The item
-+ references a memory address where the payload data can be found.
-+ <constant>KDBUS_ITEM_PAYLOAD_OFF</constant> is used when messages
-+ are <emphasis>received</emphasis>, and the
-+ <constant>offset</constant> value describes the offset inside the
-+ receiving connection's
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ where the message payload can be found. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on passing of payload data along with a
-+ message.
-+ <programlisting>
-+struct kdbus_vec {
-+ __u64 size;
-+ union {
-+ __u64 address;
-+ __u64 offset;
-+ };
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant></term>
-+ <listitem><para>
-+ Transports a file descriptor of a <emphasis>memfd</emphasis> in
-+ <type>struct kdbus_memfd</type> in <varname>item.memfd</varname>.
-+ The <varname>size</varname> field has to match the actual size of
-+ the memfd that was specified when it was created. The
-+ <varname>start</varname> parameter denotes the offset inside the
-+ memfd at which the referenced payload starts. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on passing of payload data along with a
-+ message.
-+ <programlisting>
-+struct kdbus_memfd {
-+ __u64 start;
-+ __u64 size;
-+ int fd;
-+ __u32 __pad;
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_FDS</constant></term>
-+ <listitem><para>
-+ Contains an array of <emphasis>file descriptors</emphasis>.
-+ When used with <constant>KDBUS_CMD_SEND</constant>, the values of
-+ this array must be filled with valid file descriptor numbers.
-+ When received as item attached to a message, the array will
-+ contain the numbers of the installed file descriptors, or
-+ <constant>-1</constant> in case an error occurred.
-+ In either case, the number of entries in the array is derived from
-+ the item's total size. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Items specific to some commands</title>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_CANCEL_FD</constant></term>
-+ <listitem><para>
-+ Transports a file descriptor that can be used to cancel a
-+ synchronous <constant>KDBUS_CMD_SEND</constant> operation by
-+ writing to it. The file descriptor is stored in
-+ <varname>item.fds[0]</varname>. The item may only contain one
-+ file descriptor. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on this item and how to use it.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_BLOOM_PARAMETER</constant></term>
-+ <listitem><para>
-+ Contains a set of <emphasis>bloom parameters</emphasis> as
-+ <type>struct kdbus_bloom_parameter</type> in
-+ <varname>item.bloom_parameter</varname>.
-+ The item is passed from userspace to kernel during the
-+ <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl, and returned
-+ verbatim when <constant>KDBUS_CMD_HELLO</constant> is called.
-+ The kernel does not use the bloom parameters, but they need to
-+ be known by each connection on the bus in order to define the
-+ bloom filter hash details. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.match</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on matching and bloom filters.
-+ <programlisting>
-+struct kdbus_bloom_parameter {
-+ __u64 size;
-+ __u64 n_hash;
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_BLOOM_FILTER</constant></term>
-+ <listitem><para>
-+ Carries a <emphasis>bloom filter</emphasis> as
-+ <type>struct kdbus_bloom_filter</type> in
-+ <varname>item.bloom_filter</varname>. It is mandatory to send this
-+ item attached to a <type>struct kdbus_msg</type>, in case the
-+ message is a signal. This item is never transported from kernel to
-+ userspace. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.match</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on matching and bloom filters.
-+ <programlisting>
-+struct kdbus_bloom_filter {
-+ __u64 generation;
-+ __u64 data[0];
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_BLOOM_MASK</constant></term>
-+ <listitem><para>
-+ Transports a <emphasis>bloom mask</emphasis> as binary data blob
-+ stored in <varname>item.data</varname>. This item is used to
-+ describe a match into a connection's match database. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.match</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on matching and bloom filters.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_DST_NAME</constant></term>
-+ <listitem><para>
-+ Contains a <emphasis>well-known name</emphasis> to send a
-+ message to, as null-terminated string in
-+ <varname>item.str</varname>. This item is used with
-+ <constant>KDBUS_CMD_SEND</constant>. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on how to send a message.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_MAKE_NAME</constant></term>
-+ <listitem><para>
-+ Contains a <emphasis>bus name</emphasis> or
-+ <emphasis>endpoint name</emphasis>, stored as null-terminated
-+ string in <varname>item.str</varname>. This item is sent from
-+ userspace to kernel when buses or endpoints are created, and
-+ returned back to userspace when the bus creator information is
-+ queried. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ and
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant></term>
-+ <term><constant>KDBUS_ITEM_ATTACH_FLAGS_RECV</constant></term>
-+ <listitem><para>
-+ Contains a set of <emphasis>attach flags</emphasis> at
-+ <emphasis>send</emphasis> or <emphasis>receive</emphasis> time. See
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>,
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry> and
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on attach flags.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_ID</constant></term>
-+ <listitem><para>
-+ Transports the <emphasis>numerical ID</emphasis> of a connection
-+ as a <type>uint64_t</type> value in
-+ <varname>item.id</varname>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NAME</constant></term>
-+ <listitem><para>
-+ Transports a name associated with the
-+ <emphasis>name registry</emphasis> as a null-terminated string,
-+ stored in a <type>struct kdbus_name</type> in
-+ <varname>item.name</varname>. The <varname>flags</varname> field
-+ contains the flags of the name. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on how to access the name registry of a bus.
-+ <programlisting>
-+struct kdbus_name {
-+ __u64 flags;
-+ char name[0];
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Items attached by the kernel as metadata</title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_TIMESTAMP</constant></term>
-+ <listitem><para>
-+ Contains both the <emphasis>monotonic</emphasis> and the
-+ <emphasis>realtime</emphasis> timestamp, taken when the message
-+ was processed on the kernel side.
-+ Stored as <type>struct kdbus_timestamp</type> in
-+ <varname>item.timestamp</varname>.
-+ <programlisting>
-+struct kdbus_timestamp {
-+ __u64 seqnum;
-+ __u64 monotonic_ns;
-+ __u64 realtime_ns;
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_CREDS</constant></term>
-+ <listitem><para>
-+ Contains a set of <emphasis>user</emphasis> and
-+ <emphasis>group</emphasis> information as 32-bit values, in the
-+ usual four flavors: real, effective, saved and filesystem related.
-+ Stored as <type>struct kdbus_creds</type> in
-+ <varname>item.creds</varname>.
-+ <programlisting>
-+struct kdbus_creds {
-+ __u32 uid;
-+ __u32 euid;
-+ __u32 suid;
-+ __u32 fsuid;
-+ __u32 gid;
-+ __u32 egid;
-+ __u32 sgid;
-+ __u32 fsgid;
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_PIDS</constant></term>
-+ <listitem><para>
-+ Contains the <emphasis>PID</emphasis>, <emphasis>TID</emphasis>
-+ and <emphasis>parent PID (PPID)</emphasis> of a remote peer.
-+ Stored as <type>struct kdbus_pids</type> in
-+ <varname>item.pids</varname>.
-+ <programlisting>
-+struct kdbus_pids {
-+ __u64 pid;
-+ __u64 tid;
-+ __u64 ppid;
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_AUXGROUPS</constant></term>
-+ <listitem><para>
-+ Contains the <emphasis>auxiliary (supplementary) groups</emphasis>
-+ a remote peer is a member of, stored as array of
-+ <type>uint32_t</type> values in <varname>item.data32</varname>.
-+ The array length can be determined by looking at the item's total
-+ size, subtracting the size of the header and dividing the
-+ remainder by <constant>sizeof(uint32_t)</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_OWNED_NAME</constant></term>
-+ <listitem><para>
-+ Contains a <emphasis>well-known name</emphasis> currently owned
-+ by a connection. The name is stored as null-terminated string in
-+ <varname>item.str</varname>. Its length can also be derived from
-+ the item's total size.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_TID_COMM</constant> [*]</term>
-+ <listitem><para>
-+ Contains the <emphasis>comm</emphasis> string of a task's
-+ <emphasis>TID</emphasis> (thread ID), stored as null-terminated
-+ string in <varname>item.str</varname>. Its length can also be
-+ derived from the item's total size. Receivers of this item should
-+ not use its contents for any kind of security measures. See below.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_PID_COMM</constant> [*]</term>
-+ <listitem><para>
-+ Contains the <emphasis>comm</emphasis> string of a task's
-+ <emphasis>PID</emphasis> (process ID), stored as null-terminated
-+ string in <varname>item.str</varname>. Its length can also be
-+ derived from the item's total size. Receivers of this item should
-+ not use its contents for any kind of security measures. See below.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_EXE</constant> [*]</term>
-+ <listitem><para>
-+ Contains the <emphasis>path to the executable</emphasis> of a task,
-+ stored as null-terminated string in <varname>item.str</varname>. Its
-+ length can also be derived from the item's total size. Receivers of
-+ this item should not use its contents for any kind of security
-+ measures. See below.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_CMDLINE</constant> [*]</term>
-+ <listitem><para>
-+ Contains the <emphasis>command line arguments</emphasis> of a
-+ task, stored as an <emphasis>array</emphasis> of null-terminated
-+ strings in <varname>item.str</varname>. The total length of all
-+ strings in the array can be derived from the item's total size.
-+ Receivers of this item should not use its contents for any kind
-+ of security measures. See below.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_CGROUP</constant></term>
-+ <listitem><para>
-+ Contains the <emphasis>cgroup path</emphasis> of a task, stored
-+ as null-terminated string in <varname>item.str</varname>. Its
-+ length can also be derived from the item's total size.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_CAPS</constant></term>
-+ <listitem><para>
-+ Contains sets of <emphasis>capabilities</emphasis>, stored as
-+ <type>struct kdbus_caps</type> in <varname>item.caps</varname>.
-+ As the item size may increase in the future, programs should
-+ take <varname>item.caps.last_cap</varname> into account and derive
-+ the number of sets and rows from the item size and the reported
-+ number of valid capability bits.
-+ <programlisting>
-+struct kdbus_caps {
-+ __u32 last_cap;
-+ __u32 caps[0];
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_SECLABEL</constant></term>
-+ <listitem><para>
-+ Contains the <emphasis>LSM label</emphasis> of a task, stored as
-+ null-terminated string in <varname>item.str</varname>. Its length
-+ can also be derived from the item's total size.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_AUDIT</constant></term>
-+ <listitem><para>
-+ Contains the audit <emphasis>sessionid</emphasis> and
-+ <emphasis>loginuid</emphasis> of a task, stored as
-+ <type>struct kdbus_audit</type> in
-+ <varname>item.audit</varname>.
-+ <programlisting>
-+struct kdbus_audit {
-+ __u32 sessionid;
-+ __u32 loginuid;
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_CONN_DESCRIPTION</constant></term>
-+ <listitem><para>
-+ Contains the <emphasis>connection description</emphasis>, as set
-+ by <constant>KDBUS_CMD_HELLO</constant> or
-+ <constant>KDBUS_CMD_CONN_UPDATE</constant>, stored as
-+ null-terminated string in <varname>item.str</varname>. Its length
-+ can also be derived from the item's total size.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ All metadata is automatically translated into the
-+ <emphasis>namespaces</emphasis> of the task that receives them. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information.
-+ </para>
-+
-+ <para>
-+ [*] Note that the content stored in metadata items of type
-+ <constant>KDBUS_ITEM_TID_COMM</constant>,
-+ <constant>KDBUS_ITEM_PID_COMM</constant>,
-+ <constant>KDBUS_ITEM_EXE</constant> and
-+ <constant>KDBUS_ITEM_CMDLINE</constant>
-+ can easily be tampered with by the sending tasks. Therefore, they should
-+ <emphasis>not</emphasis> be used for any sort of security-relevant
-+ assumptions. The only reason they are transmitted is to let
-+ receivers know about details that were set when metadata was
-+ collected, even though the task they were collected from is not
-+ active any longer when the items are received.
-+ </para>
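-+
-+ <para>
-+ For array-style metadata such as
-+ <constant>KDBUS_ITEM_AUXGROUPS</constant>, the length calculation
-+ described above can be sketched as follows. The names
-+ <varname>auxgroups_count</varname> and
-+ <varname>ITEM_HEADER_SIZE</varname> are hypothetical, and the header
-+ is assumed to consist of the two 64-bit fields
-+ <varname>size</varname> and <varname>type</varname>.
-+ </para>
-+
-+ <programlisting><![CDATA[
-+#include <stddef.h>
-+#include <stdint.h>
-+
-+#define ITEM_HEADER_SIZE (2 * sizeof(uint64_t)) /* size + type */
-+
-+/* Number of uint32_t group IDs carried by an item of the given size. */
-+static size_t auxgroups_count(uint64_t item_size)
-+{
-+    if (item_size < ITEM_HEADER_SIZE)
-+        return 0; /* malformed item: smaller than its own header */
-+    return (size_t)((item_size - ITEM_HEADER_SIZE) / sizeof(uint32_t));
-+}
-+ ]]></programlisting>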
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Items used for policy entries, matches and notifications</title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_POLICY_ACCESS</constant></term>
-+ <listitem><para>
-+ This item describes a <emphasis>policy access</emphasis> entry to
-+ access the policy database of a
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry> or
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ Please refer to
-+ <citerefentry>
-+ <refentrytitle>kdbus.policy</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on the policy database and how to access it.
-+ <programlisting>
-+struct kdbus_policy_access {
-+ __u64 type;
-+ __u64 access;
-+ __u64 id;
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_ID_ADD</constant></term>
-+ <term><constant>KDBUS_ITEM_ID_REMOVE</constant></term>
-+ <listitem><para>
-+ This item is sent as attachment to a
-+ <emphasis>kernel notification</emphasis> and indicates that a
-+ new connection was created on the bus, or that a connection was
-+ disconnected, respectively. It stores a
-+ <type>struct kdbus_notify_id_change</type> in
-+ <varname>item.id_change</varname>.
-+ The <varname>id</varname> field contains the numeric ID of the
-+ connection that was added or removed, and <varname>flags</varname>
-+ is set to the connection flags, as passed by
-+ <constant>KDBUS_CMD_HELLO</constant>. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.match</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ and
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on matches and notification messages.
-+ <programlisting>
-+struct kdbus_notify_id_change {
-+ __u64 id;
-+ __u64 flags;
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NAME_ADD</constant></term>
-+ <term><constant>KDBUS_ITEM_NAME_REMOVE</constant></term>
-+ <term><constant>KDBUS_ITEM_NAME_CHANGE</constant></term>
-+ <listitem><para>
-+ This item is sent as attachment to a
-+ <emphasis>kernel notification</emphasis> and indicates that a
-+ <emphasis>well-known name</emphasis> appeared, disappeared or
-+ transferred to another owner on the bus. It stores a
-+ <type>struct kdbus_notify_name_change</type> in
-+ <varname>item.name_change</varname>.
-+ <varname>old_id</varname> describes the former owner of the name
-+ and is set to <constant>0</constant> values in case of
-+ <constant>KDBUS_ITEM_NAME_ADD</constant>.
-+ <varname>new_id</varname> describes the new owner of the name and
-+ is set to <constant>0</constant> values in case of
-+ <constant>KDBUS_ITEM_NAME_REMOVE</constant>.
-+ The <varname>name</varname> field contains the well-known name the
-+ notification is about, as null-terminated string. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.match</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ and
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on matches and notification messages.
-+ <programlisting>
-+struct kdbus_notify_name_change {
-+ struct kdbus_notify_id_change old_id;
-+ struct kdbus_notify_id_change new_id;
-+ char name[0];
-+};
-+ </programlisting>
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_REPLY_TIMEOUT</constant></term>
-+ <listitem><para>
-+ This item is sent as attachment to a
-+ <emphasis>kernel notification</emphasis>. It informs the receiver
-+ that an expected reply to a message was not received in time.
-+ The remote peer ID and the message cookie are stored in the message
-+ header. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information about messages, timeouts and notifications.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_REPLY_DEAD</constant></term>
-+ <listitem><para>
-+ This item is sent as attachment to a
-+ <emphasis>kernel notification</emphasis>. It informs the receiver
-+ that a remote connection a reply is expected from was disconnected
-+ before that reply was sent. The remote peer ID and the message
-+ cookie are stored in the message header. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information about messages, timeouts and notifications.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>See Also</title>
-+ <simplelist type="inline">
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.fs</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>memfd_create</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ </simplelist>
-+ </refsect1>
-+
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.match.xml b/Documentation/kdbus/kdbus.match.xml
-new file mode 100644
-index 0000000..ae38e04
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.match.xml
-@@ -0,0 +1,555 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.match">
-+
-+ <refentryinfo>
-+ <title>kdbus.match</title>
-+ <productname>kdbus.match</productname>
-+ </refentryinfo>
-+
-+ <refmeta>
-+ <refentrytitle>kdbus.match</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </refmeta>
-+
-+ <refnamediv>
-+ <refname>kdbus.match</refname>
-+ <refpurpose>kdbus match</refpurpose>
-+ </refnamediv>
-+
-+ <refsect1>
-+ <title>Description</title>
-+
-+ <para>
-+ kdbus connections can install matches in order to subscribe to signal
-+ messages sent on the bus. Such signal messages can be either directed
-+ to a single connection (by setting a specific connection ID in
-+ <varname>struct kdbus_msg.dst_id</varname> or by sending it to a
-+ well-known name), or to potentially <emphasis>all</emphasis> currently
-+ active connections on the bus (by setting
-+ <varname>struct kdbus_msg.dst_id</varname> to
-+ <constant>KDBUS_DST_ID_BROADCAST</constant>).
-+ A signal message always has the <constant>KDBUS_MSG_SIGNAL</constant>
-+ bit set in the <varname>flags</varname> bitfield.
-+ Also, signal messages can originate from either the kernel (called
-+ <emphasis>notifications</emphasis>), or from other bus connections.
-+ In either case, a bus connection needs to have a suitable
-+ <emphasis>match</emphasis> installed in order to receive any signal
-+ message. Without any rules installed in the connection, no signal message
-+ will be received.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Matches for signal messages from other connections</title>
-+ <para>
-+ Matches for messages from other connections (not kernel notifications)
-+ are implemented as bloom filters (see below). The sender adds certain
-+ properties of the message as elements to a bloom filter bit field, and
-+ sends that along with the signal message.
-+
-+ The receiving connection adds the message properties it is interested in
-+ as elements to a bloom mask bit field, and uploads the mask as match rule,
-+ possibly along with some other rules to further limit the match.
-+
-+ The kernel will match the signal message's bloom filter against the
-+ connection's bloom mask (simply by &amp;-ing it), and will decide whether
-+ the message should be delivered to a connection.
-+ </para>
-+ <para>
-+ The kernel has no notion of any specific properties of the signal message;
-+ all it sees are the bit fields of the bloom filter and the mask to match
-+ against. The use of bloom filters allows simple and efficient matching,
-+ without exposing any message properties or internals to the kernel side.
-+ Clients need to deal with the fact that they might receive signal messages
-+ which they did not subscribe to, as the bloom filter might allow
-+ false-positives to pass the filter.
-+
-+ To allow the future extension of the set of elements in the bloom filter,
-+ the filter specifies a <emphasis>generation</emphasis> number. A later
-+ generation must always contain all elements of the set of the previous
-+ generation, but can add new elements to the set. The match rules mask can
-+ carry an array with all previous generations of masks individually stored.
-+ When the filter and mask are matched by the kernel, the mask with the
-+ closest matching generation is selected as the index into the mask array.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Bloom filters</title>
-+ <para>
-+ Bloom filters allow checking whether a given word is present in a
-+ dictionary. This allows a connection to set up a mask for the information
-+ it is interested in; it will then be delivered signal messages that have
-+ a matching filter.
-+
-+ For general information, see
-+ <ulink url="https://en.wikipedia.org/wiki/Bloom_filter">the Wikipedia
-+ article on bloom filters</ulink>.
-+ </para>
-+ <para>
-+ The size of the bloom filter is defined per bus when it is created, in
-+ <varname>kdbus_bloom_parameter.size</varname>. All bloom filters attached
-+ to signal messages on the bus must match this size, and all bloom filter
-+ matches uploaded by connections must also match the size, or a multiple
-+ thereof (see below).
-+
-+ The calculation of the mask has to be done in userspace applications. The
-+ kernel just checks the bitmasks to decide whether or not to let the
-+ message pass. All bits in the mask must match the filter in a bit-wise
-+ <emphasis>AND</emphasis> logic, but the mask may have more bits set than
-+ the filter. Consequently, false positive matches are expected to happen,
-+ and programs must deal with that fact by checking the contents of the
-+ payload again at receive time.
-+ </para>
-+ <para>
-+ Masks are entities that are always passed to the kernel as part of a
-+ match (with an item of type <constant>KDBUS_ITEM_BLOOM_MASK</constant>),
-+ and filters can be attached to signals, with an item of type
-+ <constant>KDBUS_ITEM_BLOOM_FILTER</constant>. For a filter to match, all
-+ its bits have to be set in the match mask as well.
-+ </para>
-+ <para>
-+ For example, consider a bus that has a bloom size of 8 bytes, and the
-+ following mask/filter combinations:
-+ </para>
-+ <programlisting><![CDATA[
-+ filter 0x0101010101010101
-+ mask 0x0101010101010101
-+ -> matches
-+
-+ filter 0x0303030303030303
-+ mask 0x0101010101010101
-+ -> doesn't match
-+
-+ filter 0x0101010101010101
-+ mask 0x0303030303030303
-+ -> matches
-+ ]]></programlisting>
-+
-+ <para>
-+ Hence, in order to catch all messages, a mask filled with
-+ <constant>0xff</constant> bytes can be installed as a wildcard match rule.
-+ </para>
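-+
-+ <para>
-+ As a minimal sketch of the rule as worded in this section (it is
-+ not taken from the kernel sources), the comparison could be written
-+ as follows. The names <varname>bloom_matches</varname> and
-+ <varname>n_words</varname> are hypothetical, and the bit fields are
-+ assumed to be handled as arrays of 64-bit words.
-+ </para>
-+
-+ <programlisting><![CDATA[
-+#include <stdbool.h>
-+#include <stddef.h>
-+#include <stdint.h>
-+
-+/*
-+ * Return true if every bit set in the filter is also set in the mask,
-+ * which is the condition the three examples above illustrate.
-+ */
-+static bool bloom_matches(const uint64_t *filter, const uint64_t *mask,
-+                          size_t n_words)
-+{
-+    for (size_t i = 0; i < n_words; i++)
-+        if ((filter[i] & mask[i]) != filter[i])
-+            return false; /* a filter bit is missing from the mask */
-+    return true;
-+}
-+ ]]></programlisting>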
-+
-+ <refsect2>
-+ <title>Generations</title>
-+
-+ <para>
-+ Uploaded matches may contain multiple masks, which have to be as large
-+ as the bloom filter size defined by the bus. Each block of a mask is
-+ called a <emphasis>generation</emphasis>, starting at index 0.
-+
-+ At match time, when a signal is about to be delivered, a bloom mask
-+ generation is passed, which denotes which of the bloom masks the filter
-+ should be matched against. This allows programs to provide backward
-+ compatible masks at upload time, while older clients can still match
-+ against older versions of filters.
-+ </para>
-+ </refsect2>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Matches for kernel notifications</title>
-+ <para>
-+ To receive kernel generated notifications (see
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>),
-+ a connection must install match rules that are different from
-+ the bloom filter matches described in the section above. They can be
-+ filtered by the connection ID that caused the notification to be sent, by
-+ one of the names it currently owns, or by the type of the notification
-+ (ID/name add/remove/change).
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Adding a match</title>
-+ <para>
-+ To add a match, the <constant>KDBUS_CMD_MATCH_ADD</constant> ioctl is
-+ used, which takes a <type>struct kdbus_cmd_match</type> as an argument
-+ described below.
-+
-+ Note that each of the items attached to this command will internally
-+ create one match <emphasis>rule</emphasis>, and the collection of them,
-+ which is submitted as one block via the ioctl, is called a
-+ <emphasis>match</emphasis>. To allow a message to pass, all rules of a
-+ match have to be satisfied. Hence, adding more items to the command
-+ only narrows the set of messages that the match lets pass, and
-+ decreases the chance that the connection's process will be woken up
-+ needlessly.
-+
-+ Multiple matches can be installed per connection. As long as one of them
-+ has a set of rules that allows the message to pass, that match is
-+ decisive.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_cmd_match {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 cookie;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>Flags to control the behavior of the ioctl.</para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_MATCH_REPLACE</constant></term>
-+ <listitem>
-+ <para>
-+ If matches with the supplied cookie already exist, remove them
-+ before installing the new one.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+ <listitem>
-+ <para>
-+ Requests a set of valid flags for this ioctl. When this bit is
-+ set, no action is taken; the ioctl will return
-+ <errorcode>0</errorcode>, and the <varname>flags</varname>
-+ field will have all bits set that are valid for this command.
-+ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+ cleared by the operation.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>cookie</varname></term>
-+ <listitem><para>
-+ A cookie which identifies the match, so it can be referred to when
-+ removing it.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ Items to define the actual rules of the matches. The following item
-+ types are expected. Each item will create one new match rule.
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_BLOOM_MASK</constant></term>
-+ <listitem>
-+ <para>
-+ An item that carries the bloom filter mask to match against
-+ in its data field. The payload size must match the bloom
-+ filter size that was specified when the bus was created.
-+ See the "Bloom filters" section above for more information on
-+ bloom filters.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NAME</constant></term>
-+ <listitem>
-+ <para>
-+ When used as part of kernel notifications, this item specifies
-+ a name that is acquired, lost or that changed its owner (see
-+ below). When used as part of a match for user-generated signal
-+ messages, it specifies a name that the sending connection must
-+ own at the time of sending the signal.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_ID</constant></term>
-+ <listitem>
-+ <para>
-+ Specify a sender connection's ID that will match this rule.
-+ For kernel notifications, this specifies the ID of a
-+ connection that was added to or removed from the bus.
-+ For user-generated signals, it specifies the ID of the
-+ connection that sent the signal message.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NAME_ADD</constant></term>
-+ <term><constant>KDBUS_ITEM_NAME_REMOVE</constant></term>
-+ <term><constant>KDBUS_ITEM_NAME_CHANGE</constant></term>
-+ <listitem>
-+ <para>
-+ These items request delivery of kernel notifications that
-+ describe a name acquisition, loss, or change. The details
-+ are stored in the item's
-+ <varname>kdbus_notify_name_change</varname> member.
-+ All information specified must be matched in order to make
-+ the message pass. Use
-+ <constant>KDBUS_MATCH_ID_ANY</constant> to
-+ match against any unique connection ID.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_ID_ADD</constant></term>
-+ <term><constant>KDBUS_ITEM_ID_REMOVE</constant></term>
-+ <listitem>
-+ <para>
-+ These items request delivery of kernel notifications that are
-+ generated when a connection is created or terminated.
-+ <type>struct kdbus_notify_id_change</type> is used to
-+ store the actual match information. This item can be used to
-+ monitor one particular connection ID, or, when the ID field
-+ is set to <constant>KDBUS_MATCH_ID_ANY</constant>,
-+ all of them.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_NEGOTIATE</constant></term>
-+ <listitem><para>
-+ With this item, programs can <emphasis>probe</emphasis> the
-+ kernel for known item types. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ Unrecognized items are rejected, and the ioctl will fail with
-+ <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ Refer to
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on message types.
-+ </para>
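-+
-+ <para>
-+ The evaluation described in this section can be modelled in a few lines
-+ of C. This is an illustrative sketch only, not kernel code: the types
-+ <type>struct sketch_msg</type>, <type>struct sketch_rule</type> and
-+ <type>struct sketch_match</type> and all three functions are
-+ hypothetical. They exist solely to show that all rules of one match
-+ must be satisfied, while a single satisfied match is enough to let a
-+ message pass.
-+ </para>
-+
-+ <programlisting><![CDATA[
-+#include <stdbool.h>
-+#include <stddef.h>
-+#include <stdint.h>
-+
-+/* Hypothetical model types; these are not kdbus structures. */
-+enum sketch_rule_kind { SKETCH_RULE_SENDER_ID, SKETCH_RULE_NOTIFY_TYPE };
-+
-+struct sketch_msg { uint64_t sender_id; uint64_t notify_type; };
-+struct sketch_rule { enum sketch_rule_kind kind; uint64_t value; };
-+struct sketch_match { const struct sketch_rule *rules; size_t n_rules; };
-+
-+/* Check a single rule against a message. */
-+static bool sketch_rule_ok(const struct sketch_rule *r,
-+                           const struct sketch_msg *m)
-+{
-+  if (r->kind == SKETCH_RULE_SENDER_ID)
-+    return r->value == m->sender_id;
-+  return r->value == m->notify_type;
-+}
-+
-+/* A match lets a message pass only if every one of its rules holds. */
-+static bool sketch_match_ok(const struct sketch_match *match,
-+                            const struct sketch_msg *m)
-+{
-+  for (size_t i = 0; i < match->n_rules; i++)
-+    if (!sketch_rule_ok(&match->rules[i], m))
-+      return false;
-+  return true;
-+}
-+
-+/* With several matches installed, one satisfied match is decisive. */
-+static bool sketch_any_match_ok(const struct sketch_match *matches,
-+                                size_t n, const struct sketch_msg *m)
-+{
-+  for (size_t i = 0; i < n; i++)
-+    if (sketch_match_ok(&matches[i], m))
-+      return true;
-+  return false;
-+}
-+ ]]></programlisting>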
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Removing a match</title>
-+ <para>
-+ Matches can be removed with the
-+ <constant>KDBUS_CMD_MATCH_REMOVE</constant> ioctl, which takes a
-+ <type>struct kdbus_cmd_match</type> as its argument. The usage of its
-+ fields differs slightly from that of
-+ <constant>KDBUS_CMD_MATCH_ADD</constant>.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_cmd_match {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 cookie;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>cookie</varname></term>
-+ <listitem><para>
-+ The cookie of the match, as it was passed when the match was added.
-+ All matches that have this cookie will be removed.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>
-+ No flags are supported for this use case.
-+ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+ valid flags. If set, the ioctl will fail with
-+ <errorcode>-1</errorcode>, <varname>errno</varname> is set to
-+ <constant>EPROTO</constant>, and the <varname>flags</varname> field
-+ is set to <constant>0</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ No items are supported for this use case, but
-+ <constant>KDBUS_ITEM_NEGOTIATE</constant> is allowed nevertheless.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Return value</title>
-+ <para>
-+ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+ on error, <errorcode>-1</errorcode> is returned, and
-+ <varname>errno</varname> is set to indicate the error.
-+ If the issued ioctl is illegal for the file descriptor used,
-+ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+ </para>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_MATCH_ADD</constant> may fail with the following
-+ errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ Illegal flags or items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EDOM</constant></term>
-+ <listitem><para>
-+ Illegal bloom filter size.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EMFILE</constant></term>
-+ <listitem><para>
-+ Too many matches for this connection.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_MATCH_REMOVE</constant> may fail with the following
-+ errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ Illegal flags.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EBADSLT</constant></term>
-+ <listitem><para>
-+ A match entry with the given cookie could not be found.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>See Also</title>
-+ <simplelist type="inline">
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.match</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.fs</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ </simplelist>
-+ </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.message.xml b/Documentation/kdbus/kdbus.message.xml
-new file mode 100644
-index 0000000..0115d9d
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.message.xml
-@@ -0,0 +1,1276 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.message">
-+
-+ <refentryinfo>
-+ <title>kdbus.message</title>
-+ <productname>kdbus.message</productname>
-+ </refentryinfo>
-+
-+ <refmeta>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </refmeta>
-+
-+ <refnamediv>
-+ <refname>kdbus.message</refname>
-+ <refpurpose>kdbus message</refpurpose>
-+ </refnamediv>
-+
-+ <refsect1>
-+ <title>Description</title>
-+
-+ <para>
-+ A kdbus message is used to exchange information between two connections
-+ on a bus, or to transport notifications from the kernel to one or many
-+ connections. This document describes the layout of messages, how payload
-+ is added to them and how they are sent and received.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Message layout</title>
-+
-+ <para>The layout of a message is shown below.</para>
-+
-+ <programlisting>
-+ +-------------------------------------------------------------------------+
-+ | Message |
-+ | +---------------------------------------------------------------------+ |
-+ | | Header | |
-+ | | size: overall message size, including the data records | |
-+ | | destination: connection ID of the receiver | |
-+ | | source: connection ID of the sender (set by kernel) | |
-+ | | payload_type: "DBusDBus" textual identifier stored as uint64_t | |
-+ | +---------------------------------------------------------------------+ |
-+ | +---------------------------------------------------------------------+ |
-+ | | Data Record | |
-+ | | size: overall record size (without padding) | |
-+ | | type: type of data | |
-+ | | data: reference to data (address or file descriptor) | |
-+ | +---------------------------------------------------------------------+ |
-+ | +---------------------------------------------------------------------+ |
-+ | | padding bytes to the next 8 byte alignment | |
-+ | +---------------------------------------------------------------------+ |
-+ | +---------------------------------------------------------------------+ |
-+ | | Data Record | |
-+ | | size: overall record size (without padding) | |
-+ | | ... | |
-+ | +---------------------------------------------------------------------+ |
-+ | +---------------------------------------------------------------------+ |
-+ | | padding bytes to the next 8 byte alignment | |
-+ | +---------------------------------------------------------------------+ |
-+ | +---------------------------------------------------------------------+ |
-+ | | Data Record | |
-+ | | size: overall record size | |
-+ | | ... | |
-+ | +---------------------------------------------------------------------+ |
-+ | ... further data records ... |
-+ +-------------------------------------------------------------------------+
-+ </programlisting>
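-+
-+ <para>
-+ The padding shown in the diagram follows from rounding each record
-+ size up to the next multiple of 8. A minimal sketch of that
-+ arithmetic is given below; the helper names
-+ <function>sketch_align8()</function> and
-+ <function>sketch_padding()</function> are assumptions made for this
-+ example and are not taken from the kdbus headers.
-+ </para>
-+
-+ <programlisting><![CDATA[
-+#include <stdint.h>
-+
-+/* Hypothetical helper: round a record size up to the next multiple of 8. */
-+static uint64_t sketch_align8(uint64_t size)
-+{
-+  return (size + 7) & ~(uint64_t)7;
-+}
-+
-+/* Number of padding bytes that follow a record of the given size. */
-+static uint64_t sketch_padding(uint64_t size)
-+{
-+  return sketch_align8(size) - size;
-+}
-+ ]]></programlisting>
-+
-+ <para>
-+ For example, a record with a size of 21 bytes is followed by 3 padding
-+ bytes, so the next record starts 24 bytes after the start of this one.
-+ </para>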
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Message payload</title>
-+
-+ <para>
-+ When connecting to the bus, receivers request a memory pool of a given
-+ size, large enough to carry the entire backlog of data enqueued for the
-+ connection. The pool is internally backed by a shared memory file which
-+ can be <function>mmap()</function>ed by the receiver. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information.
-+ </para>
-+
-+ <para>
-+ Message payload must be described in items attached to a message when
-+ it is sent. A receiver can access the payload by looking at the items
-+ that are attached to a message in its pool. The following items are used.
-+ </para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_PAYLOAD_VEC</constant></term>
-+ <listitem>
-+ <para>
-+ This item references a piece of memory on the sender side which is
-+ directly copied into the receiver's pool. This way, two peers can
-+ exchange data by effectively doing a single-copy from one process
-+ to another; the kernel will not buffer the data anywhere else.
-+ This item is never found in a message received by a connection.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_PAYLOAD_OFF</constant></term>
-+ <listitem>
-+ <para>
-+ This item is attached to messages on the receiving side and points
-+ to a memory area inside the receiver's pool. The
-+ <varname>offset</varname> variable in the item denotes the memory
-+ location relative to the message itself.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant></term>
-+ <listitem>
-+ <para>
-+ Messages can reference <emphasis>memfd</emphasis> files which
-+ contain the data. memfd files are tmpfs-backed files that allow
-+ sealing of the content of the file, which prevents all writable
-+ access to the file content.
-+ </para>
-+ <para>
-+ Only memfds that have
-+ <constant>(F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_SEAL)
-+ </constant>
-+ set are accepted as payload data, which enforces reliable passing of
-+ data. The receiver can assume that neither the sender nor anyone
-+ else can alter the content after the message is sent. If those
-+ seals are not set on the memfd, the ioctl will fail with
-+ <errorcode>-1</errorcode>, and <varname>errno</varname> will be
-+ set to <constant>ETXTBSY</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_FDS</constant></term>
-+ <listitem>
-+ <para>
-+ Messages can transport regular file descriptors via
-+ <constant>KDBUS_ITEM_FDS</constant>. This item carries an array
-+ of <type>int</type> values in <varname>item.fd</varname>. The
-+ maximum number of file descriptors in the item is
-+ <constant>253</constant>, and only one item of this type is
-+ accepted per message. All passed values must be valid file
-+ descriptors; the open count of each file descriptor is increased
-+ by installing it to the receiver's task. This item can only be
-+ used for directed messages, not for broadcasts, and only to
-+ remote peers that have opted-in for receiving file descriptors
-+ at connection time (<constant>KDBUS_HELLO_ACCEPT_FD</constant>).
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
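-+
-+ <para>
-+ As an illustration of the sealing requirement for
-+ <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant>, the following sketch
-+ creates a memfd, fills it and applies the four seals listed above,
-+ using only the generic Linux memfd and
-+ <function>fcntl()</function> interfaces. It does not call into kdbus.
-+ The function name <function>sketch_sealed_memfd()</function>, the
-+ macro <constant>SKETCH_REQUIRED_SEALS</constant> and the memfd name
-+ are assumptions made for this example.
-+ </para>
-+
-+ <programlisting><![CDATA[
-+#define _GNU_SOURCE
-+#include <fcntl.h>
-+#include <stddef.h>
-+#include <sys/mman.h>
-+#include <unistd.h>
-+
-+/* The four seals named above, combined (the macro name is hypothetical). */
-+#define SKETCH_REQUIRED_SEALS \
-+  (F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL)
-+
-+/* Create a memfd holding len bytes of data and seal it; returns fd or -1. */
-+static int sketch_sealed_memfd(const void *data, size_t len)
-+{
-+  /* MFD_ALLOW_SEALING is needed, otherwise no seals can be added later. */
-+  int fd = memfd_create("sketch-payload", MFD_CLOEXEC | MFD_ALLOW_SEALING);
-+
-+  if (fd < 0)
-+    return -1;
-+  if (write(fd, data, len) != (ssize_t)len ||
-+      fcntl(fd, F_ADD_SEALS, SKETCH_REQUIRED_SEALS) < 0) {
-+    close(fd);
-+    return -1;
-+  }
-+  return fd;
-+}
-+ ]]></programlisting>
-+
-+ <para>
-+ After sealing, further writes to the file descriptor are refused, which
-+ is the property the receiver relies on.
-+ </para>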
-+
-+ <para>
-+ The sender must not make any assumptions on the type in which data is
-+ received by the remote peer. The kernel is free to re-pack multiple
-+ <constant>KDBUS_ITEM_PAYLOAD_VEC</constant> and
-+ <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant> payloads. For instance, the
-+ kernel may decide to merge multiple <constant>VECs</constant> into a
-+ single <constant>VEC</constant>, inline <constant>MEMFD</constant>
-+ payloads into memory, or merge all passed <constant>VECs</constant> into a
-+ single <constant>MEMFD</constant>. However, the kernel preserves the order
-+ of passed data. This means that the order of all <constant>VEC</constant>
-+ and <constant>MEMFD</constant> items is not changed in respect to each
-+ other. In other words: All passed <constant>VEC</constant> and
-+ <constant>MEMFD</constant> data payloads are treated as a single stream
-+ of data that may be received by the remote peer in a different set of
-+ chunks than it was sent as.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Sending messages</title>
-+
-+ <para>
-+ Messages are passed to the kernel with the
-+ <constant>KDBUS_CMD_SEND</constant> ioctl. Depending on the destination
-+ address of the message, the kernel delivers the message to the specific
-+ destination connection, or to some subset of all connections on the same
-+ bus. Sending messages across buses is not possible. Messages are always
-+ queued in the memory pool of the destination connection (see above).
-+ </para>
-+
-+ <para>
-+ The <constant>KDBUS_CMD_SEND</constant> ioctl uses a
-+ <type>struct kdbus_cmd_send</type> to describe the message
-+ transfer.
-+ </para>
-+ <programlisting>
-+struct kdbus_cmd_send {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 msg_address;
-+ struct kdbus_msg_info reply;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>Flags for message delivery</para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_SEND_SYNC_REPLY</constant></term>
-+ <listitem>
-+ <para>
-+ By default, all calls to kdbus are considered asynchronous,
-+ non-blocking. However, as there are many use cases that need
-+ to wait for a remote peer to answer a method call, there's a
-+ way to send a message and wait for a reply in a synchronous
-+ fashion. This is what the
-+ <constant>KDBUS_SEND_SYNC_REPLY</constant> controls. The
-+ <constant>KDBUS_CMD_SEND</constant> ioctl will block until the
-+ reply has arrived, the timeout limit is reached, the remote
-+ connection is shut down, or the call is interrupted by a signal
-+ before any reply arrives; see
-+ <citerefentry>
-+ <refentrytitle>signal</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+
-+ The offset of the reply message in the sender's pool is stored
-+ in <varname>reply</varname> when the ioctl has returned without
-+ error. Hence, there is no need for another
-+ <constant>KDBUS_CMD_RECV</constant> ioctl or anything else to
-+ receive the reply.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+ <listitem>
-+ <para>
-+ Request a set of valid flags for this ioctl. When this bit is
-+ set, no action is taken; the ioctl will fail with
-+ <errorcode>-1</errorcode>, <varname>errno</varname>
-+ is set to <constant>EPROTO</constant>.
-+ Once the ioctl has returned, the <varname>flags</varname>
-+ field will have all bits set that the kernel recognizes as
-+ valid for this command.
-+ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+ cleared by the operation.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>msg_address</varname></term>
-+ <listitem><para>
-+ In this field, users have to provide a pointer to a message
-+ (<type>struct kdbus_msg</type>) to send. See below for a
-+ detailed description.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>reply</varname></term>
-+ <listitem><para>
-+ Only used for synchronous replies. See description of
-+ <type>struct kdbus_cmd_recv</type> for more details.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ The following items are currently recognized.
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_CANCEL_FD</constant></term>
-+ <listitem>
-+ <para>
-+ When this optional item is passed in, and the call is
-+ executed as a SYNC call, the passed-in file descriptor can be
-+ used as an alternative cancellation point. The kernel will call
-+ <citerefentry>
-+ <refentrytitle>poll</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ on this file descriptor, and once it reports any incoming
-+ bytes, the blocking send operation will be canceled; the
-+ blocking, synchronous ioctl call will return
-+ <errorcode>-1</errorcode>, and <varname>errno</varname> will
-+ be set to <errorname>ECANCELED</errorname>.
-+ Any type of file descriptor on which
-+ <citerefentry>
-+ <refentrytitle>poll</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ can be called can be used as payload to this item; for
-+ example, an eventfd can be used for this purpose, see
-+ <citerefentry>
-+ <refentrytitle>eventfd</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>.
-+ For asynchronous message sending, this item is allowed but
-+ ignored.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ <para>
-+ Unrecognized items are rejected, and the ioctl will fail with
-+ <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
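-+
-+ <para>
-+ A minimal sketch of preparing a descriptor suitable for
-+ <constant>KDBUS_ITEM_CANCEL_FD</constant> with
-+ <citerefentry>
-+ <refentrytitle>eventfd</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ follows. It only shows the generic eventfd and poll behavior and does
-+ not call into kdbus; attaching the descriptor to the send command is
-+ not shown. The function names are assumptions made for this example.
-+ </para>
-+
-+ <programlisting><![CDATA[
-+#include <poll.h>
-+#include <stdint.h>
-+#include <sys/eventfd.h>
-+#include <unistd.h>
-+
-+/* Returns 1 if fd currently reports incoming data, 0 if not, -1 on error. */
-+static int sketch_is_readable(int fd)
-+{
-+  struct pollfd pfd = { .fd = fd, .events = POLLIN };
-+  int r = poll(&pfd, 1, 0);
-+
-+  if (r < 0)
-+    return -1;
-+  return (r > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
-+}
-+
-+/* Signal cancellation by adding 1 to the eventfd counter; 0 on success. */
-+static int sketch_cancel(int efd)
-+{
-+  uint64_t one = 1;
-+
-+  return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
-+}
-+ ]]></programlisting>
-+
-+ <para>
-+ A thread that wants to abort a blocking synchronous send would call
-+ <function>sketch_cancel()</function> on the eventfd that was passed in
-+ the item.
-+ </para>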
-+
-+ <para>
-+ The message referenced by the <varname>msg_address</varname> above has
-+ the following layout.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_msg {
-+ __u64 size;
-+ __u64 flags;
-+ __s64 priority;
-+ __u64 dst_id;
-+ __u64 src_id;
-+ __u64 payload_type;
-+ __u64 cookie;
-+ __u64 timeout_ns;
-+ __u64 cookie_reply;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>Flags to describe message details.</para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_MSG_EXPECT_REPLY</constant></term>
-+ <listitem>
-+ <para>
-+ Expect a reply to this message from the remote peer. With
-+ this bit set, the <varname>timeout_ns</varname> field must be
-+ set to a non-zero deadline by which the receiving peer is
-+ expected to reply. If such a reply is not received in time,
-+ the sender will be notified with a timeout message (see
-+ below). The value must be an absolute time in nanoseconds,
-+ based on <constant>CLOCK_MONOTONIC</constant>.
-+ </para><para>
-+ For a message to be accepted as reply, it must be a direct
-+ message to the original sender (not a broadcast and not a
-+ signal message), and its
-+ <varname>kdbus_msg.cookie_reply</varname> must match the
-+ previous message's <varname>kdbus_msg.cookie</varname>.
-+ </para><para>
-+ Expected replies also temporarily open the policy of the
-+ sending connection, so the other peer is allowed to respond
-+ within the given time window.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_MSG_NO_AUTO_START</constant></term>
-+ <listitem>
-+ <para>
-+ By default, when a message is sent to an activator
-+ connection, the activator is notified and will start an
-+ implementer. This flag inhibits that behavior. With this bit
-+ set, and the remote being an activator, the ioctl will fail
-+ with <varname>errno</varname> set to
-+ <constant>EADDRNOTAVAIL</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+ <listitem>
-+ <para>
-+ Requests a set of valid flags for this ioctl. When this bit is
-+ set, no action is taken; the ioctl will return
-+ <errorcode>0</errorcode>, and the <varname>flags</varname>
-+ field will have all bits set that are valid for this command.
-+ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+ cleared by the operation.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>priority</varname></term>
-+ <listitem><para>
-+ The priority of this message. Receiving messages (see below) may
-+ optionally be constrained to messages of a minimal priority. This
-+ allows for use cases where timing critical data is interleaved with
-+ control data on the same connection. If unused, the priority field
-+ should be set to <constant>0</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>dst_id</varname></term>
-+ <listitem><para>
-+ The numeric ID of the destination connection, or
-+ <constant>KDBUS_DST_ID_BROADCAST</constant>
-+ (~0ULL) to address every peer on the bus, or
-+ <constant>KDBUS_DST_ID_NAME</constant> (0) to look
-+ it up dynamically from the bus' name registry.
-+ In the latter case, an item of type
-+ <constant>KDBUS_ITEM_DST_NAME</constant> is mandatory.
-+ Also see
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>src_id</varname></term>
-+ <listitem><para>
-+ Upon return of the ioctl, this member will contain the sending
-+ connection's numerical ID. Should be 0 at send time.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>payload_type</varname></term>
-+ <listitem><para>
-+ Type of the payload in the actual data records. Currently, only
-+ <constant>KDBUS_PAYLOAD_DBUS</constant> is accepted as input value
-+ of this field. When receiving messages that are generated by the
-+ kernel (notifications), this field will contain
-+ <constant>KDBUS_PAYLOAD_KERNEL</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>cookie</varname></term>
-+ <listitem><para>
-+ Cookie of this message, for later recognition. Also, when replying
-+ to a message (see above), the <varname>cookie_reply</varname>
-+ field must match this value.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>timeout_ns</varname></term>
-+ <listitem><para>
-+ If the message sent requires a reply from the remote peer (see above),
-+ this field contains the timeout in absolute nanoseconds based on
-+ <constant>CLOCK_MONOTONIC</constant>. Also see
-+ <citerefentry>
-+ <refentrytitle>clock_gettime</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>cookie_reply</varname></term>
-+ <listitem><para>
-+ If the message sent is a reply to another message, this field must
-+ match the cookie of the formerly received message.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ A dynamically sized list of items to contain additional information.
-+ The following items are expected/valid:
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_PAYLOAD_VEC</constant></term>
-+ <term><constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant></term>
-+ <term><constant>KDBUS_ITEM_FDS</constant></term>
-+ <listitem>
-+ <para>
-+ Actual data records containing the payload. See section
-+ "Message payload".
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_BLOOM_FILTER</constant></term>
-+ <listitem>
-+ <para>
-+ Bloom filter for matches (see below).
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ITEM_DST_NAME</constant></term>
-+ <listitem>
-+ <para>
-+ Well-known name to send this message to. Required if
-+ <varname>dst_id</varname> is set to
-+ <constant>KDBUS_DST_ID_NAME</constant>.
-+ If a connection holding the given name can't be found,
-+ the ioctl will fail with <varname>errno</varname> set to
-+ <constant>ESRCH</constant>.
-+ </para>
-+ <para>
-+ For messages to a unique name (ID), this item is optional. If
-+ present, the kernel will make sure the name owner matches the
-+ given unique name. This allows programs to tie the message
-+ sending to the condition that a name is currently owned by a
-+ certain unique name.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
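-+
-+ <para>
-+ Because <varname>timeout_ns</varname> is an absolute
-+ <constant>CLOCK_MONOTONIC</constant> value, a sender that thinks in
-+ relative timeouts has to add the current time first. A minimal sketch
-+ of that conversion is shown below; the helper name
-+ <function>sketch_deadline_ns()</function> is an assumption made for
-+ this example and is not part of kdbus.
-+ </para>
-+
-+ <programlisting><![CDATA[
-+#include <stdint.h>
-+#include <time.h>
-+
-+/*
-+ * Hypothetical helper: absolute CLOCK_MONOTONIC deadline, in nanoseconds,
-+ * that lies relative_ns nanoseconds in the future. Returns 0 on error.
-+ */
-+static uint64_t sketch_deadline_ns(uint64_t relative_ns)
-+{
-+  struct timespec now;
-+
-+  if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
-+    return 0;
-+  return (uint64_t)now.tv_sec * UINT64_C(1000000000) +
-+         (uint64_t)now.tv_nsec + relative_ns;
-+}
-+ ]]></programlisting>
-+
-+ <para>
-+ The result would be stored in <varname>kdbus_msg.timeout_ns</varname>
-+ of a message that has <constant>KDBUS_MSG_EXPECT_REPLY</constant> set
-+ in its <varname>flags</varname>.
-+ </para>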
-+
-+ <para>
-+ The message will be augmented by the requested metadata items when
-+ queued into the receiver's pool. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ and
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on metadata.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Receiving messages</title>
-+
-+ <para>
-+ Messages are received by the client with the
-+ <constant>KDBUS_CMD_RECV</constant> ioctl. The endpoint file of the bus
-+ supports <function>poll()/epoll()/select()</function>; when new messages
-+ are available on the connection's file descriptor,
-+ <constant>POLLIN</constant> is reported. For compatibility reasons,
-+ <constant>POLLOUT</constant> is always reported as well. Note, however,
-+ that the latter does not guarantee that a message can in fact be sent, as
-+ this depends on how many pending messages the receiver has in its pool.
-+ </para>
-+
-+ <para>
-+ With the <constant>KDBUS_CMD_RECV</constant> ioctl, a
-+ <type>struct kdbus_cmd_recv</type> is used.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_cmd_recv {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __s64 priority;
-+ __u64 dropped_msgs;
-+ struct kdbus_msg_info msg;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>Flags to control the receive command.</para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_RECV_PEEK</constant></term>
-+ <listitem>
-+ <para>
-+ Just return the location of the next message. Do not install
-+ file descriptors or anything else. This is usually used to
-+ determine the sender of the next queued message.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_RECV_DROP</constant></term>
-+ <listitem>
-+ <para>
-+ Drop the next message without doing anything else with it,
-+ and free the pool slice. This is a shortcut for
-+ <constant>KDBUS_RECV_PEEK</constant> and
-+ <constant>KDBUS_CMD_FREE</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_RECV_USE_PRIORITY</constant></term>
-+ <listitem>
-+ <para>
-+ Dequeue messages ordered by their priority, and filter
-+ them by the priority field (see below).
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+ <listitem>
-+ <para>
-+ Request a set of valid flags for this ioctl. When this bit is
-+ set, no action is taken; the ioctl will fail with
-+ <errorcode>-1</errorcode>, and <varname>errno</varname>
-+ is set to <constant>EPROTO</constant>.
-+ Once the ioctl has returned, the <varname>flags</varname>
-+ field will have all bits set that the kernel recognizes as
-+ valid for this command.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. If the <varname>dropped_msgs</varname>
-+ field is non-zero, <constant>KDBUS_RECV_RETURN_DROPPED_MSGS</constant>
-+ is set. If a file descriptor could not be installed, the
-+ <constant>KDBUS_RECV_RETURN_INCOMPLETE_FDS</constant> flag is set.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>priority</varname></term>
-+ <listitem><para>
-+ With <constant>KDBUS_RECV_USE_PRIORITY</constant> set in
-+ <varname>flags</varname>, messages will be dequeued ordered by their
-+ priority, starting with the highest value. Also, messages will be
-+ filtered by the value given in this field, so the returned message
-+ will at least have the requested priority. If no such message is
-+ waiting in the queue, the ioctl will fail, and
-+ <varname>errno</varname> will be set to <constant>EAGAIN</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>dropped_msgs</varname></term>
-+ <listitem><para>
-+ Whenever a message with <constant>KDBUS_MSG_SIGNAL</constant> is sent
-+ but cannot be queued on a peer (e.g., as it contains FDs but the peer
-+ does not support FDs, or there is no space left in the peer's pool)
-+ the 'dropped_msgs' counter of the peer is incremented. On the next
-+ RECV ioctl, the 'dropped_msgs' field is copied into the ioctl struct
-+ and cleared on the peer. If it was non-zero, the
-+ <constant>KDBUS_RECV_RETURN_DROPPED_MSGS</constant> flag will be set
-+ in <varname>return_flags</varname>. Note that this will only happen
-+ if the ioctl succeeded or failed with <constant>EAGAIN</constant>. In
-+ other error cases, the 'dropped_msgs' field of the peer is left
-+ untouched.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>msg</varname></term>
-+ <listitem><para>
-+ Embedded struct containing information on the received message when
-+ this command succeeded (see below).
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem><para>
-+ Items to specify further details for the receive command.
-+ Currently unused, and all items will be rejected with
-+ <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ Both <type>struct kdbus_cmd_recv</type> and
-+ <type>struct kdbus_cmd_send</type> embed
-+ <type>struct kdbus_msg_info</type>.
-+ For the <constant>KDBUS_CMD_SEND</constant> ioctl, it is used to catch
-+ synchronous replies, if one was requested, and is unused otherwise.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_msg_info {
-+ __u64 offset;
-+ __u64 msg_size;
-+ __u64 return_flags;
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>offset</varname></term>
-+ <listitem><para>
-+ Upon return of the ioctl, this field contains the offset in the
-+ receiver's memory pool. The memory must be freed with
-+ <constant>KDBUS_CMD_FREE</constant>. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for further details.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>msg_size</varname></term>
-+ <listitem><para>
-+ Upon successful return of the ioctl, this field contains the size of
-+ the allocated slice at offset <varname>offset</varname>.
-+ It is the combination of the size of the stored
-+ <type>struct kdbus_msg</type> object plus all appended VECs.
-+ You can use it in combination with <varname>offset</varname> to map
-+ a single message, instead of mapping the entire pool. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for further details.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem>
-+ <para>
-+ Kernel-provided return flags. Currently, the following flags are
-+ defined.
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_RECV_RETURN_INCOMPLETE_FDS</constant></term>
-+ <listitem>
-+ <para>
-+ The message contained memfds or file descriptors, and the
-+ kernel failed to install one or more of them at receive time.
-+ Most probably that happened because the maximum number of
-+ file descriptors for the receiver's task was exceeded.
-+ In such cases, the message is still delivered, so this is not
-+ a fatal condition. File descriptor numbers inside the
-+ <constant>KDBUS_ITEM_FDS</constant> item or memfd files
-+ referenced by <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant>
-+ items which could not be installed will be set to
-+ <constant>-1</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
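
The arithmetic behind mapping a single message, rather than the whole pool, can be sketched as follows. This is a minimal illustration and not code from the kdbus sources: the struct and function names are hypothetical, and it assumes the slice is mapped with mmap(2), whose file offset has to be a multiple of the page size. The slice offset is therefore rounded down to a page boundary, and the message starts a few bytes into the resulting mapping.

```c
/* Hypothetical helper for illustration; not part of the kdbus API. */
struct slice_mapping {
	unsigned long long map_offset; /* page-aligned offset to hand to mmap() */
	unsigned long long map_length; /* number of bytes to map */
	unsigned long long msg_delta;  /* start of the message inside the mapping */
};

/*
 * offset and msg_size are the values returned in struct kdbus_msg_info.
 * page_size must be a power of two, e.g. the value of sysconf(_SC_PAGESIZE).
 */
struct slice_mapping slice_mapping_for(unsigned long long offset,
				       unsigned long long msg_size,
				       unsigned long long page_size)
{
	struct slice_mapping m;

	/* Round the pool offset down to the start of its page. */
	m.map_offset = offset & ~(page_size - 1);
	/* Distance from the start of the mapping to the message itself. */
	m.msg_delta = offset - m.map_offset;
	/* The mapping has to cover the leading gap plus the whole slice. */
	m.map_length = m.msg_delta + msg_size;
	return m;
}
```

A caller would pass map_offset and map_length to mmap(2) on the connection's file descriptor and then read the message at the returned address plus msg_delta.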
-+
-+ <para>
-+ Unless <constant>KDBUS_RECV_DROP</constant> was passed, the
-+ <varname>offset</varname> field contains the location of the new message
-+ inside the receiver's pool after the <constant>KDBUS_CMD_RECV</constant>
-+ ioctl was employed. The message is stored as <type>struct kdbus_msg</type>
-+ at this offset, and can be interpreted with the semantics described above.
-+ </para>
-+ <para>
-+ Also, if the connection allowed for file descriptors to be passed
-+ (<constant>KDBUS_HELLO_ACCEPT_FD</constant>), and if the message contained
-+ any, they will be installed into the receiving process when the
-+ <constant>KDBUS_CMD_RECV</constant> ioctl is called.
-+ <emphasis>memfds</emphasis> may always be part of the message payload.
-+ The receiving task is obliged to close all file descriptors appropriately
-+ once no longer needed. If <constant>KDBUS_RECV_PEEK</constant> is set, no
-+ file descriptors are installed. This allows for peeking at a message,
-+ looking at its metadata only and dropping it via
-+ <constant>KDBUS_RECV_DROP</constant>, without installing any of the file
-+ descriptors into the receiving process.
-+ </para>
-+ <para>
-+ The caller is obliged to call the <constant>KDBUS_CMD_FREE</constant>
-+ ioctl with the returned offset when the memory is no longer needed.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Notifications</title>
-+ <para>
-+ A kernel notification is a regular kdbus message with the following
-+ details.
-+ </para>
-+
-+ <itemizedlist>
-+ <listitem><para>
-+ kdbus_msg.src_id == <constant>KDBUS_SRC_ID_KERNEL</constant>
-+ </para></listitem>
-+ <listitem><para>
-+ kdbus_msg.dst_id == <constant>KDBUS_DST_ID_BROADCAST</constant>
-+ </para></listitem>
-+ <listitem><para>
-+ kdbus_msg.payload_type == <constant>KDBUS_PAYLOAD_KERNEL</constant>
-+ </para></listitem>
-+ <listitem><para>
-+ Has exactly one of the items attached that are described below.
-+ </para></listitem>
-+ <listitem><para>
-+ Always has a timestamp item (<constant>KDBUS_ITEM_TIMESTAMP</constant>)
-+ attached.
-+ </para></listitem>
-+ </itemizedlist>
-+
-+ <para>
-+ The kernel will notify its users of the following events.
-+ </para>
-+
-+ <itemizedlist>
-+ <listitem><para>
-+ When connection <emphasis>A</emphasis> is terminated while connection
-+ <emphasis>B</emphasis> is waiting for a reply from it, connection
-+ <emphasis>B</emphasis> is notified with a message with an item of
-+ type <constant>KDBUS_ITEM_REPLY_DEAD</constant>.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ When connection <emphasis>A</emphasis> does not receive a reply from
-+ connection <emphasis>B</emphasis> within the specified timeout window,
-+ connection <emphasis>A</emphasis> will receive a message with an
-+ item of type <constant>KDBUS_ITEM_REPLY_TIMEOUT</constant>.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ When an ordinary connection (not a monitor) is created on or removed
-+ from a bus, messages with an item of type
-+ <constant>KDBUS_ITEM_ID_ADD</constant> or
-+ <constant>KDBUS_ITEM_ID_REMOVE</constant>, respectively, are delivered
-+ to all bus members that match these messages through their match
-+ database. Eavesdroppers (monitor connections) do not cause such
-+ notifications to be sent. They are invisible on the bus.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ When a connection gains or loses ownership of a name, messages with an
-+ item of type <constant>KDBUS_ITEM_NAME_ADD</constant>,
-+ <constant>KDBUS_ITEM_NAME_REMOVE</constant> or
-+ <constant>KDBUS_ITEM_NAME_CHANGE</constant> are delivered to all bus
-+ members that match these messages through their match database.
-+ </para></listitem>
-+ </itemizedlist>
-+ </refsect1>
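
The first three conditions in the list of notification properties can be checked with a small predicate. The sketch below is illustrative only: the struct is a reduced, hypothetical view of the message header, and the three SKETCH_ constants are stand-ins whose values are assumptions; real code should use KDBUS_SRC_ID_KERNEL, KDBUS_DST_ID_BROADCAST and KDBUS_PAYLOAD_KERNEL from the kdbus UAPI header that ships with the patch.

```c
/* Assumed stand-in values; take the real ones from the kdbus UAPI header. */
#define SKETCH_SRC_ID_KERNEL     0ULL
#define SKETCH_DST_ID_BROADCAST  (~0ULL)
#define SKETCH_PAYLOAD_KERNEL    0ULL

/* Reduced, hypothetical view of the three header fields named in the list. */
struct msg_header_view {
	unsigned long long src_id;
	unsigned long long dst_id;
	unsigned long long payload_type;
};

/* Returns 1 if all three header fields match a kernel notification. */
int is_kernel_notification(const struct msg_header_view *h)
{
	return h->src_id == SKETCH_SRC_ID_KERNEL &&
	       h->dst_id == SKETCH_DST_ID_BROADCAST &&
	       h->payload_type == SKETCH_PAYLOAD_KERNEL;
}
```

A receiver that gets a positive result would then walk the attached items to find the single notification item and the timestamp item.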
-+
-+ <refsect1>
-+ <title>Return value</title>
-+ <para>
-+ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+ on error, <errorcode>-1</errorcode> is returned, and
-+ <varname>errno</varname> is set to indicate the error.
-+ If the issued ioctl is illegal for the file descriptor used,
-+ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+ </para>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_SEND</constant> may fail with the following
-+ errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EOPNOTSUPP</constant></term>
-+ <listitem><para>
-+ The connection is not an ordinary connection, or the passed
-+ file descriptors in <constant>KDBUS_ITEM_FDS</constant> item are
-+ either kdbus handles or unix domain sockets. Both are currently
-+ unsupported.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ The submitted payload type is
-+ <constant>KDBUS_PAYLOAD_KERNEL</constant>,
-+ <constant>KDBUS_MSG_EXPECT_REPLY</constant> was set without timeout
-+ or cookie values, <constant>KDBUS_SEND_SYNC_REPLY</constant> was
-+ set without <constant>KDBUS_MSG_EXPECT_REPLY</constant>, an invalid
-+ item was supplied, <constant>src_id</constant> was non-zero and was
-+ different from the current connection's ID, a supplied memfd had a
-+ size of 0, or a string was not properly null-terminated.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ENOTUNIQ</constant></term>
-+ <listitem><para>
-+ The supplied destination is
-+ <constant>KDBUS_DST_ID_BROADCAST</constant> and either
-+ file descriptors were passed, or
-+ <constant>KDBUS_MSG_EXPECT_REPLY</constant> was set,
-+ or a timeout was given.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>E2BIG</constant></term>
-+ <listitem><para>
-+ Too many items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EMSGSIZE</constant></term>
-+ <listitem><para>
-+ The size of the message header and items or the payload vector
-+ is excessive.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EEXIST</constant></term>
-+ <listitem><para>
-+ Multiple <constant>KDBUS_ITEM_FDS</constant>,
-+ <constant>KDBUS_ITEM_BLOOM_FILTER</constant> or
-+ <constant>KDBUS_ITEM_DST_NAME</constant> items were supplied.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EBADF</constant></term>
-+ <listitem><para>
-+ The supplied <constant>KDBUS_ITEM_FDS</constant> or
-+ <constant>KDBUS_ITEM_PAYLOAD_MEMFD</constant> items
-+ contained an illegal file descriptor.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EMEDIUMTYPE</constant></term>
-+ <listitem><para>
-+ The supplied memfd is not a sealed kdbus memfd.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EMFILE</constant></term>
-+ <listitem><para>
-+ Too many file descriptors inside a
-+ <constant>KDBUS_ITEM_FDS</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EBADMSG</constant></term>
-+ <listitem><para>
-+ An item had an illegal size, both a <constant>dst_id</constant> and a
-+ <constant>KDBUS_ITEM_DST_NAME</constant> were given, or both a name
-+ and a bloom filter were given.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ETXTBSY</constant></term>
-+ <listitem><para>
-+ The supplied kdbus memfd file cannot be sealed or the seal
-+ was removed, because it is shared with other processes or
-+ still mapped with
-+ <citerefentry>
-+ <refentrytitle>mmap</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ECOMM</constant></term>
-+ <listitem><para>
-+ A peer does not accept the file descriptors addressed to it.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EFAULT</constant></term>
-+ <listitem><para>
-+ The supplied bloom filter size was not 64-bit aligned, or supplied
-+ memory could not be accessed by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EDOM</constant></term>
-+ <listitem><para>
-+ The supplied bloom filter size did not match the bloom filter
-+ size of the bus.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EDESTADDRREQ</constant></term>
-+ <listitem><para>
-+ <constant>dst_id</constant> was set to
-+ <constant>KDBUS_DST_ID_NAME</constant>, but no
-+ <constant>KDBUS_ITEM_DST_NAME</constant> was attached.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ESRCH</constant></term>
-+ <listitem><para>
-+ The name to look up was not found in the name registry.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EADDRNOTAVAIL</constant></term>
-+ <listitem><para>
-+ <constant>KDBUS_MSG_NO_AUTO_START</constant> was given but the
-+ destination connection is an activator.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ENXIO</constant></term>
-+ <listitem><para>
-+ The passed numeric destination connection ID couldn't be found,
-+ or is not connected.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ECONNRESET</constant></term>
-+ <listitem><para>
-+ The destination connection is no longer active.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ETIMEDOUT</constant></term>
-+ <listitem><para>
-+ Timeout while synchronously waiting for a reply.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EINTR</constant></term>
-+ <listitem><para>
-+ Interrupted system call while synchronously waiting for a reply.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EPIPE</constant></term>
-+ <listitem><para>
-+ When sending a message, a synchronous reply from the receiving
-+ connection was expected but the connection died before answering.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ENOBUFS</constant></term>
-+ <listitem><para>
-+ Too many pending messages on the receiver side.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EREMCHG</constant></term>
-+ <listitem><para>
-+ Both a well-known name and a unique name (ID) were given, but
-+ the name is not currently owned by that connection.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EXFULL</constant></term>
-+ <listitem><para>
-+ The memory pool of the receiver is full.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EREMOTEIO</constant></term>
-+ <listitem><para>
-+ While synchronously waiting for a reply, the remote peer
-+ failed with an I/O error.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_RECV</constant> may fail with the following
-+ errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EOPNOTSUPP</constant></term>
-+ <listitem><para>
-+ The connection is not an ordinary connection, or the passed
-+ file descriptors are either kdbus handles or unix domain
-+ sockets. Both are currently unsupported.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ Invalid flags or offset.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EAGAIN</constant></term>
-+ <listitem><para>
-+ No message found in the queue.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>See Also</title>
-+ <simplelist type="inline">
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.fs</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>clock_gettime</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>ioctl</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>poll</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>select</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>epoll</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>eventfd</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>memfd_create</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ </simplelist>
-+ </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.name.xml b/Documentation/kdbus/kdbus.name.xml
-new file mode 100644
-index 0000000..3f5f6a6
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.name.xml
-@@ -0,0 +1,711 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.name">
-+
-+ <refentryinfo>
-+ <title>kdbus.name</title>
-+ <productname>kdbus.name</productname>
-+ </refentryinfo>
-+
-+ <refmeta>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </refmeta>
-+
-+ <refnamediv>
-+ <refname>kdbus.name</refname>
-+ <refpurpose>kdbus.name</refpurpose>
-+ </refnamediv>
-+
-+ <refsect1>
-+ <title>Description</title>
-+ <para>
-+ Each
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ instantiates a name registry to resolve well-known names into unique
-+ connection IDs for message delivery. The registry will be queried when a
-+ message is sent with <varname>kdbus_msg.dst_id</varname> set to
-+ <constant>KDBUS_DST_ID_NAME</constant>, or when a registry dump is
-+ requested with <constant>KDBUS_CMD_LIST</constant>.
-+ </para>
-+
-+ <para>
-+ All of the below is subject to policy rules for <emphasis>SEE</emphasis>
-+ and <emphasis>OWN</emphasis> permissions. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.policy</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Name validity</title>
-+ <para>
-+ A name has to comply with the following rules in order to be considered
-+ valid.
-+ </para>
-+
-+ <itemizedlist>
-+ <listitem>
-+ <para>
-+ The name has two or more elements separated by a
-+ '<literal>.</literal>' (period) character.
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ All elements must contain at least one character.
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ Each element must only contain the ASCII characters
-+ <literal>[A-Z][a-z][0-9]_</literal> and must not begin with a
-+ digit.
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ The name must contain at least one '<literal>.</literal>' (period)
-+ character (and thus at least two elements).
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ The name must not begin with a '<literal>.</literal>' (period)
-+ character.
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ The name must not exceed <constant>255</constant> characters in
-+ length.
-+ </para>
-+ </listitem>
-+ </itemizedlist>
-+ </refsect1>
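
The rules above translate into a short user-space check. The function below is a sketch that follows only the rules as listed in this section; it is not the kernel's validation routine, the function name is hypothetical, and the kernel may accept or reject names this sketch does not.

```c
/* Returns 1 if name satisfies the rules listed above, 0 otherwise. */
int name_is_valid(const char *name)
{
	unsigned long len = 0;
	int dots = 0;
	int at_element_start = 1;
	const char *p;

	if (!name)
		return 0;

	for (p = name; *p; p++) {
		char c = *p;

		/* The name must not exceed 255 characters. */
		if (++len > 255)
			return 0;

		if (c == '.') {
			/* Rejects a leading '.' and empty elements such as "a..b". */
			if (at_element_start)
				return 0;
			dots++;
			at_element_start = 1;
			continue;
		}

		if (c >= '0' && c <= '9') {
			/* An element must not begin with a digit. */
			if (at_element_start)
				return 0;
		} else if (!((c >= 'A' && c <= 'Z') ||
			     (c >= 'a' && c <= 'z') || c == '_')) {
			return 0;
		}
		at_element_start = 0;
	}

	/* At least one '.', and no empty element after the last one. */
	return dots >= 1 && !at_element_start;
}
```

Checking a name this way before issuing an ioctl only avoids an obvious round trip; the kernel's verdict remains authoritative.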
-+
-+ <refsect1>
-+ <title>Acquiring a name</title>
-+ <para>
-+ To acquire a name, a client uses the
-+ <constant>KDBUS_CMD_NAME_ACQUIRE</constant> ioctl with
-+ <type>struct kdbus_cmd</type> as argument.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_cmd {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>Flags to control details in the name acquisition.</para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_NAME_REPLACE_EXISTING</constant></term>
-+ <listitem>
-+ <para>
-+ Acquiring a name that is already present usually fails,
-+ unless this flag is set in the call, and
-+ <constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant> (see below)
-+ was set when the current owner of the name acquired it, or
-+ if the current owner is an activator connection (see
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>).
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant></term>
-+ <listitem>
-+ <para>
-+ Allow other connections to take over this name. When this
-+ happens, the former owner of the name will be notified
-+ of the name loss.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_NAME_QUEUE</constant></term>
-+ <listitem>
-+ <para>
-+ A name that is already acquired by a connection cannot be
-+ acquired again (unless the
-+ <constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant> flag was
-+ set during acquisition; see above).
-+ However, a connection can put itself in a queue of
-+ connections waiting for the name to be released. Once that
-+ happens, the first connection in that queue becomes the new
-+ owner and is notified accordingly.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+ <listitem>
-+ <para>
-+ Request a set of valid flags for this ioctl. When this bit is
-+ set, no action is taken; the ioctl will fail with
-+ <errorcode>-1</errorcode>, and <varname>errno</varname>
-+ is set to <constant>EPROTO</constant>.
-+ Once the ioctl has returned, the <varname>flags</varname>
-+ field will have all bits set that the kernel recognizes as
-+ valid for this command.
-+ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+ cleared by the operation.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem>
-+ <para>
-+ Flags returned by the kernel. Currently, the following may be
-+ returned by the kernel.
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_NAME_IN_QUEUE</constant></term>
-+ <listitem>
-+ <para>
-+ The name was not acquired yet, but the connection was
-+ placed in the queue of peers waiting for the name.
-+ This can only happen if <constant>KDBUS_NAME_QUEUE</constant>
-+ was set in the <varname>flags</varname> member (see above).
-+ The connection will receive a name owner change notification
-+ once the current owner has given up the name and its
-+ ownership was transferred.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ Items to submit the name. Currently, one item of type
-+ <constant>KDBUS_ITEM_NAME</constant> is expected and allowed, and
-+ the contained string must be a valid bus name.
-+ <constant>KDBUS_ITEM_NEGOTIATE</constant> may be used to probe for
-+ valid item types. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for a detailed description of how this item is used.
-+ </para>
-+ <para>
-+ Unrecognized items are rejected, and the ioctl will fail with
-+ <varname>errno</varname> set to <errorname>EINVAL</errorname>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect1>
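
How the three acquisition flags interact can be modelled in a few lines. The sketch below is a user-space model of the behaviour described in this section, not kernel code. The SKETCH_ flag bits, the enum and the struct are hypothetical stand-ins. It also assumes one reading of the text, namely that a name held by an activator can be taken over without KDBUS_NAME_REPLACE_EXISTING, and it ignores the case where the caller already owns the name.

```c
/* Stand-in flag bits for this sketch; real values come from the kdbus UAPI header. */
#define SKETCH_NAME_REPLACE_EXISTING  (1ULL << 0)
#define SKETCH_NAME_ALLOW_REPLACEMENT (1ULL << 1)
#define SKETCH_NAME_QUEUE             (1ULL << 2)

enum acquire_outcome {
	ACQUIRE_OWNER,    /* caller becomes the owner of the name */
	ACQUIRE_IN_QUEUE, /* caller waits; KDBUS_NAME_IN_QUEUE would be reported */
	ACQUIRE_REFUSED   /* the ioctl fails */
};

/* Hypothetical snapshot of a registry entry. */
struct name_state {
	int owned;                      /* non-zero if a connection holds the name */
	int owner_is_activator;         /* non-zero if that connection is an activator */
	unsigned long long owner_flags; /* flags the owner passed when acquiring */
};

enum acquire_outcome model_acquire(const struct name_state *s,
				   unsigned long long flags)
{
	if (!s->owned)
		return ACQUIRE_OWNER;
	/* Assumption: activators always yield the name to a real connection. */
	if (s->owner_is_activator)
		return ACQUIRE_OWNER;
	/* Both sides have to agree before a takeover happens. */
	if ((flags & SKETCH_NAME_REPLACE_EXISTING) &&
	    (s->owner_flags & SKETCH_NAME_ALLOW_REPLACEMENT))
		return ACQUIRE_OWNER;
	if (flags & SKETCH_NAME_QUEUE)
		return ACQUIRE_IN_QUEUE;
	return ACQUIRE_REFUSED;
}
```

The order of the checks mirrors the text: a takeover is attempted first, and queueing is only the fallback when a takeover is not permitted.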
-+
-+ <refsect1>
-+ <title>Releasing a name</title>
-+ <para>
-+ A connection may release a name explicitly with the
-+ <constant>KDBUS_CMD_NAME_RELEASE</constant> ioctl. If the connection was
-+ an implementer of an activatable name, its pending messages are moved
-+ back to the activator. If there are any connections queued up as waiters
-+ for the name, the first one in the queue (the oldest entry) will become
-+ the new owner. The same happens implicitly for all names once a
-+ connection terminates. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on connections.
-+ </para>
-+ <para>
-+ The <constant>KDBUS_CMD_NAME_RELEASE</constant> ioctl uses the same data
-+ structure as the acquisition call
-+ (<constant>KDBUS_CMD_NAME_ACQUIRE</constant>),
-+ but with slightly different field usage.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_cmd {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>
-+ Flags to the command. Currently unused.
-+ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+ valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
-+ and the <varname>flags</varname> field is set to
-+ <constant>0</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ Items to submit the name. Currently, one item of type
-+ <constant>KDBUS_ITEM_NAME</constant> is expected and allowed, and
-+ the contained string must be a valid bus name.
-+ <constant>KDBUS_ITEM_NEGOTIATE</constant> may be used to probe for
-+ valid item types. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for a detailed description of how this item is used.
-+ </para>
-+ <para>
-+ Unrecognized items are rejected, and the ioctl will fail with
-+ <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Dumping the name registry</title>
-+ <para>
-+ A connection may request a complete or filtered dump of currently active
-+ bus names with the <constant>KDBUS_CMD_LIST</constant> ioctl, which
-+ takes a <type>struct kdbus_cmd_list</type> as argument.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_cmd_list {
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 offset;
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem>
-+ <para>
-+ Any combination of flags to specify which names should be dumped.
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_LIST_UNIQUE</constant></term>
-+ <listitem>
-+ <para>
-+ List the unique (numeric) IDs of all connections, whether they
-+ own a name or not.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_LIST_NAMES</constant></term>
-+ <listitem>
-+ <para>
-+ List well-known names stored in the database which are
-+ actively owned by a real connection (not an activator).
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_LIST_ACTIVATORS</constant></term>
-+ <listitem>
-+ <para>
-+ List names that are owned by an activator.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_LIST_QUEUED</constant></term>
-+ <listitem>
-+ <para>
-+ List connections that are not yet owning a name but are
-+ waiting for it to become available.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+ <listitem>
-+ <para>
-+ Request a set of valid flags for this ioctl. When this bit is
-+ set, no action is taken; the ioctl will fail with
-+ <errorcode>-1</errorcode>, and <varname>errno</varname>
-+ is set to <constant>EPROTO</constant>.
-+ Once the ioctl has returned, the <varname>flags</varname>
-+ field will have all bits set that the kernel recognizes as
-+ valid for this command.
-+ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+ cleared by the operation.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>offset</varname></term>
-+ <listitem><para>
-+ When the ioctl returns successfully, the offset to the name registry
-+ dump inside the connection's pool will be stored in this field.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ The returned list of names is stored in a <type>struct kdbus_list</type>
-+ that in turn contains an array of type <type>struct kdbus_info</type>.
-+ The array size in bytes is given as <varname>list_size</varname>.
-+ The fields inside <type>struct kdbus_info</type> are described next.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_info {
-+ __u64 size;
-+ __u64 id;
-+ __u64 flags;
-+ struct kdbus_item items[0];
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>id</varname></term>
-+ <listitem><para>
-+ The owning connection's unique ID.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>
-+ The flags of the owning connection.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem>
-+ <para>
-+ Items containing the actual name. Currently, one item of type
-+ <constant>KDBUS_ITEM_OWNED_NAME</constant> will be attached,
-+ including the name's flags. In that item, the flags field of the
-+ name may carry the following bits:
-+ </para>
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_NAME_ALLOW_REPLACEMENT</constant></term>
-+ <listitem>
-+ <para>
-+ Other connections are allowed to take over this name from the
-+ connection that owns it.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_NAME_IN_QUEUE</constant></term>
-+ <listitem>
-+ <para>
-+ When retrieving a list of currently acquired names in the
-+ registry, this flag indicates whether the connection
-+ actually owns the name or is currently waiting for it to
-+ become available.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_NAME_ACTIVATOR</constant></term>
-+ <listitem>
-+ <para>
-+ An activator connection owns a name as a placeholder for an
-+ implementer, which is started on demand by programs as soon
-+ as the first message arrives. There's some more information
-+ on this topic in
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ .
-+ </para>
-+ <para>
-+ In contrast to
-+ <constant>KDBUS_NAME_REPLACE_EXISTING</constant>,
-+ when a name is taken over from an activator connection, all
-+ the messages that have been queued in the activator
-+ connection will be moved over to the new owner. The activator
-+ connection will still be tracked for the name and will take
-+ control again if the implementer connection terminates.
-+ </para>
-+ <para>
-+ This flag cannot be used when acquiring a name, but is
-+ implicitly set through <constant>KDBUS_CMD_HELLO</constant>
-+ with <constant>KDBUS_HELLO_ACTIVATOR</constant> set in
-+ <varname>kdbus_cmd_hello.conn_flags</varname>.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_FLAG_NEGOTIATE</constant></term>
-+ <listitem>
-+ <para>
-+ Requests a set of valid flags for this ioctl. When this bit is
-+ set, no action is taken; the ioctl will return
-+ <errorcode>0</errorcode>, and the <varname>flags</varname>
-+ field will have all bits set that are valid for this command.
-+ The <constant>KDBUS_FLAG_NEGOTIATE</constant> bit will be
-+ cleared by the operation.
-+ </para>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ The returned buffer must be freed with the
-+ <constant>KDBUS_CMD_FREE</constant> ioctl when the user is finished with
-+ it. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Return value</title>
-+ <para>
-+ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+ on error, <errorcode>-1</errorcode> is returned, and
-+ <varname>errno</varname> is set to indicate the error.
-+ If the issued ioctl is illegal for the file descriptor used,
-+ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+ </para>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_NAME_ACQUIRE</constant> may fail with the following
-+ errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ Illegal command flags, illegal name provided, or an activator
-+ tried to acquire a second name.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EPERM</constant></term>
-+ <listitem><para>
-+ Policy prohibited name ownership.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EALREADY</constant></term>
-+ <listitem><para>
-+ Connection already owns that name.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EEXIST</constant></term>
-+ <listitem><para>
-+ The name already exists and cannot be taken over.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>E2BIG</constant></term>
-+ <listitem><para>
-+ The maximum number of well-known names per connection is exhausted.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_NAME_RELEASE</constant>
-+ may fail with the following errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ Invalid command flags, or invalid name provided.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ESRCH</constant></term>
-+ <listitem><para>
-+ Name is not found in the registry.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EADDRINUSE</constant></term>
-+ <listitem><para>
-+ Name is owned by a different connection and can't be released.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_LIST</constant> may fail with the following
-+ errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ Invalid command flags.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>ENOBUFS</constant></term>
-+ <listitem><para>
-+ No available memory in the connection's pool.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>See Also</title>
-+ <simplelist type="inline">
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.policy</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ </simplelist>
-+ </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.policy.xml b/Documentation/kdbus/kdbus.policy.xml
-new file mode 100644
-index 0000000..6732416
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.policy.xml
-@@ -0,0 +1,406 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.policy">
-+
-+ <refentryinfo>
-+ <title>kdbus.policy</title>
-+ <productname>kdbus.policy</productname>
-+ </refentryinfo>
-+
-+ <refmeta>
-+ <refentrytitle>kdbus.policy</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </refmeta>
-+
-+ <refnamediv>
-+ <refname>kdbus.policy</refname>
-+ <refpurpose>kdbus policy</refpurpose>
-+ </refnamediv>
-+
-+ <refsect1>
-+ <title>Description</title>
-+
-+ <para>
-+ A kdbus policy restricts the ability of connections to own, see, and
-+ talk to well-known names. A policy can be associated with a bus (through a
-+ policy holder connection) or a custom endpoint. kdbus stores its policy
-+ information in a database that can be accessed through the following
-+ ioctl commands:
-+ </para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_CMD_HELLO</constant></term>
-+ <listitem><para>
-+ When creating, or updating, a policy holder connection. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_CMD_ENDPOINT_MAKE</constant></term>
-+ <term><constant>KDBUS_CMD_ENDPOINT_UPDATE</constant></term>
-+ <listitem><para>
-+ When creating, or updating, a custom endpoint of a bus. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ In all cases, the name and policy access information is stored in items
-+ of type <constant>KDBUS_ITEM_NAME</constant> and
-+ <constant>KDBUS_ITEM_POLICY_ACCESS</constant>. For this transport, the
-+ following rules apply.
-+ </para>
-+
-+ <itemizedlist>
-+ <listitem>
-+ <para>
-+ An item of type <constant>KDBUS_ITEM_NAME</constant> must be followed
-+ by at least one <constant>KDBUS_ITEM_POLICY_ACCESS</constant> item.
-+ </para>
-+ </listitem>
-+
-+ <listitem>
-+ <para>
-+ An item of type <constant>KDBUS_ITEM_NAME</constant> can be followed
-+ by an arbitrary number of
-+ <constant>KDBUS_ITEM_POLICY_ACCESS</constant> items.
-+ </para>
-+ </listitem>
-+
-+ <listitem>
-+ <para>
-+ An arbitrary number of groups of names and access levels can be given.
-+ </para>
-+ </listitem>
-+ </itemizedlist>
-+
-+ <para>
-+ Names passed in items of type <constant>KDBUS_ITEM_NAME</constant> must
-+ comply with the rules for valid kdbus names. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information.
-+
-+ The payload of an item of type
-+ <constant>KDBUS_ITEM_POLICY_ACCESS</constant> is defined by the following
-+ struct. For more information on the layout of items, please refer to
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para>
-+
-+ <programlisting>
-+struct kdbus_policy_access {
-+ __u64 type;
-+ __u64 access;
-+ __u64 id;
-+};
-+ </programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>type</varname></term>
-+ <listitem>
-+ <para>
-+ One of the following.
-+ </para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_POLICY_ACCESS_USER</constant></term>
-+ <listitem><para>
-+ Grant access to a user with the UID stored in the
-+ <varname>id</varname> field.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_POLICY_ACCESS_GROUP</constant></term>
-+ <listitem><para>
-+ Grant access to users in the group with the GID stored in the
-+ <varname>id</varname> field.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_POLICY_ACCESS_WORLD</constant></term>
-+ <listitem><para>
-+ Grant access to everyone. The <varname>id</varname> field
-+ is ignored.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>access</varname></term>
-+ <listitem>
-+ <para>
-+ The access to grant. One of the following.
-+ </para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_POLICY_SEE</constant></term>
-+ <listitem><para>
-+ Allow the name to be seen.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_POLICY_TALK</constant></term>
-+ <listitem><para>
-+ Allow the name to be talked to.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_POLICY_OWN</constant></term>
-+ <listitem><para>
-+ Allow the name to be owned.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>id</varname></term>
-+ <listitem><para>
-+ For <constant>KDBUS_POLICY_ACCESS_USER</constant>, stores the UID.
-+ For <constant>KDBUS_POLICY_ACCESS_GROUP</constant>, stores the GID.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ </variablelist>
-+
-+ <para>
-+ All endpoints of buses have an empty policy database by default.
-+ Therefore, unless policy rules are added, all operations will also be
-+ denied by default. Also see
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Wildcard names</title>
-+ <para>
-+ Policy holder connections may upload names that contain the wildcard
-+ suffix (<literal>".*"</literal>). Such a policy entry is effective for
-+ every well-known name that extends the provided name by exactly one more
-+ level.
-+
-+ For example, the name <literal>foo.bar.*</literal> matches both
-+ <literal>"foo.bar.baz"</literal> and
-+ <literal>"foo.bar.bazbaz"</literal>, but not
-+ <literal>"foo.bar.baz.baz"</literal>.
-+
-+ This allows connections to take control over multiple names that the
-+ policy holder doesn't need to know about when uploading the policy.
-+
-+ Such wildcard entries are not allowed for custom endpoints.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Privileged connections</title>
-+ <para>
-+ The policy database is overruled when action is taken by a privileged
-+ connection. Please refer to
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information on what makes a connection privileged.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Examples</title>
-+ <para>
-+ For instance, a set of policy rules may look like this:
-+ </para>
-+
-+ <programlisting>
-+KDBUS_ITEM_NAME: str='org.foo.bar'
-+KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, ID=1000
-+KDBUS_ITEM_POLICY_ACCESS: type=USER, access=TALK, ID=1001
-+KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=SEE
-+
-+KDBUS_ITEM_NAME: str='org.blah.baz'
-+KDBUS_ITEM_POLICY_ACCESS: type=USER, access=OWN, ID=0
-+KDBUS_ITEM_POLICY_ACCESS: type=WORLD, access=TALK
-+ </programlisting>
-+
-+ <para>
-+ That means that 'org.foo.bar' may only be owned by UID 1000, but every
-+ user on the bus is allowed to see the name. However, only UID 1001 may
-+ actually send a message to the connection and receive a reply from it.
-+
-+ The second rule allows 'org.blah.baz' to be owned by UID 0 only, but
-+ every user may talk to it.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>TALK access and multiple well-known names per connection</title>
-+ <para>
-+ Note that TALK access is checked against all names of a connection. For
-+ example, if a connection owns both <constant>'org.foo.bar'</constant> and
-+ <constant>'org.blah.baz'</constant>, and the policy database allows
-+ <constant>'org.blah.baz'</constant> to be talked to by WORLD, then this
-+ permission is also granted to <constant>'org.foo.bar'</constant>. That
-+ might sound illogical, but after all, we allow messages to be directed to
-+ either the ID or a well-known name, and policy is applied to the
-+ connection, not the name. In other words, the effective TALK policy for a
-+ connection is the most permissive of all names the connection owns.
-+
-+ For broadcast messages, the receiver needs TALK permissions to the sender
-+ to receive the broadcast.
-+ </para>
-+ <para>
-+ Both the endpoint and the bus policy databases are consulted to allow
-+ name registry listing, owning a well-known name and message delivery.
-+ If either check fails, the operation fails with
-+ <varname>errno</varname> set to <constant>EPERM</constant>.
-+
-+ As a best practice, connections that own names with a restricted TALK
-+ access should not install matches. This avoids cases where the sent
-+ message may pass the bloom filter due to false-positives and may also
-+ satisfy the policy rules.
-+
-+ Also see
-+ <citerefentry>
-+ <refentrytitle>kdbus.match</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Implicit policies</title>
-+ <para>
-+ Depending on the type of the endpoint, a set of implicit rules that
-+ override installed policies might be enforced.
-+
-+ On default endpoints, the following set is enforced and checked before
-+ any user-supplied policy is checked.
-+ </para>
-+
-+ <itemizedlist>
-+ <listitem>
-+ <para>
-+ Privileged connections always override any installed policy. Those
-+ connections could easily install their own policies, so there is no
-+ reason to enforce installed policies.
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ Connections can always talk to connections of the same user. This
-+ includes broadcast messages.
-+ </para>
-+ </listitem>
-+ </itemizedlist>
-+
-+ <para>
-+ Custom endpoints have stricter policies. The following rules apply:
-+ </para>
-+
-+ <itemizedlist>
-+ <listitem>
-+ <para>
-+ Policy rules are always enforced, even if the connection is a
-+ privileged connection.
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ Policy rules are always enforced for <constant>TALK</constant> access,
-+ even if both ends are running under the same user. This includes
-+ broadcast messages.
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ To restrict the set of names that can be seen, endpoint policies can
-+ install <constant>SEE</constant> policies.
-+ </para>
-+ </listitem>
-+ </itemizedlist>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>See Also</title>
-+ <simplelist type="inline">
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.fs</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ </simplelist>
-+ </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.pool.xml b/Documentation/kdbus/kdbus.pool.xml
-new file mode 100644
-index 0000000..a9e16f1
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.pool.xml
-@@ -0,0 +1,326 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus.pool">
-+
-+ <refentryinfo>
-+ <title>kdbus.pool</title>
-+ <productname>kdbus.pool</productname>
-+ </refentryinfo>
-+
-+ <refmeta>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </refmeta>
-+
-+ <refnamediv>
-+ <refname>kdbus.pool</refname>
-+ <refpurpose>kdbus pool</refpurpose>
-+ </refnamediv>
-+
-+ <refsect1>
-+ <title>Description</title>
-+ <para>
-+ A pool for data received from the kernel is installed for every
-+ <emphasis>connection</emphasis> of the <emphasis>bus</emphasis>, and
-+ is sized according to the information stored in the
-+ <varname>pool_size</varname> member of <type>struct kdbus_cmd_hello</type>
-+ when <constant>KDBUS_CMD_HELLO</constant> is employed. Internally, the
-+ pool is segmented into <emphasis>slices</emphasis>, each referenced by its
-+ <emphasis>offset</emphasis> in the pool, expressed in <type>bytes</type>.
-+ See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more information about <constant>KDBUS_CMD_HELLO</constant>.
-+ </para>
-+
-+ <para>
-+ The pool is written to by the kernel when one of the following
-+ <emphasis>ioctls</emphasis> is issued:
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_CMD_HELLO</constant></term>
-+ <listitem><para>
-+ ... to receive details about the bus the connection was made to
-+ </para></listitem>
-+ </varlistentry>
-+ <varlistentry>
-+ <term><constant>KDBUS_CMD_RECV</constant></term>
-+ <listitem><para>
-+ ... to receive a message
-+ </para></listitem>
-+ </varlistentry>
-+ <varlistentry>
-+ <term><constant>KDBUS_CMD_LIST</constant></term>
-+ <listitem><para>
-+ ... to dump the name registry
-+ </para></listitem>
-+ </varlistentry>
-+ <varlistentry>
-+ <term><constant>KDBUS_CMD_CONN_INFO</constant></term>
-+ <listitem><para>
-+ ... to retrieve information on a connection
-+ </para></listitem>
-+ </varlistentry>
-+ <varlistentry>
-+ <term><constant>KDBUS_CMD_BUS_CREATOR_INFO</constant></term>
-+ <listitem><para>
-+ ... to retrieve information about a connection's bus creator
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ </para>
-+ <para>
-+ The <varname>offset</varname> fields returned by any of the
-+ aforementioned ioctls describe offsets inside the pool. In order to make
-+ the slice available for subsequent calls,
-+ <constant>KDBUS_CMD_FREE</constant> has to be called on that offset
-+ (see below). Otherwise, the pool will fill up, and the connection won't
-+ be able to receive any more information through its pool.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Pool slice allocation</title>
-+ <para>
-+ Pool slices are allocated by the kernel in order to report information
-+ back to a task, such as messages or a returned name list.
-+ Allocation of pool slices cannot be initiated by userspace. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ and
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for examples of commands that use the <emphasis>pool</emphasis> to
-+ return data.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Accessing the pool memory</title>
-+ <para>
-+ Memory in the pool is read-only for userspace and may only be written
-+ to by the kernel. To read from the pool memory, the caller is expected to
-+ <citerefentry>
-+ <refentrytitle>mmap</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ the buffer into its task, like this:
-+ </para>
-+ <programlisting>
-+uint8_t *buf = mmap(NULL, size, PROT_READ, MAP_SHARED, conn_fd, 0);
-+ </programlisting>
-+
-+ <para>
-+ In order to map the entire pool, the <varname>size</varname> parameter in
-+ the example above should be set to the value of the
-+ <varname>pool_size</varname> member of
-+ <type>struct kdbus_cmd_hello</type> when
-+ <constant>KDBUS_CMD_HELLO</constant> was employed to create the
-+ connection (see above).
-+ </para>
-+
-+ <para>
-+ The <emphasis>file descriptor</emphasis> used to map the memory must be
-+ the one that was used to create the <emphasis>connection</emphasis>.
-+ In other words, the one that was used to call
-+ <constant>KDBUS_CMD_HELLO</constant>. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+
-+ <para>
-+ Alternatively, instead of mapping the entire pool buffer, only parts
-+ of it can be mapped. Every kdbus command that returns an
-+ <emphasis>offset</emphasis> (see above) also reports a
-+ <emphasis>size</emphasis> along with it, so programs can be written
-+ to map only the portion of the pool needed to access a specific
-+ <emphasis>slice</emphasis>.
-+ </para>
-+
-+ <para>
-+ When access to the pool memory is no longer needed, programs should
-+ call <function>munmap()</function> on the pointer returned by
-+ <function>mmap()</function>.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Freeing pool slices</title>
-+ <para>
-+ The <constant>KDBUS_CMD_FREE</constant> ioctl is used to free a slice
-+ inside the pool, identified by an offset that was returned in an
-+ <varname>offset</varname> field of another ioctl struct.
-+ The <constant>KDBUS_CMD_FREE</constant> command takes a
-+ <type>struct kdbus_cmd_free</type> as argument.
-+ </para>
-+
-+<programlisting>
-+struct kdbus_cmd_free {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 offset;
-+ struct kdbus_item items[0];
-+};
-+</programlisting>
-+
-+ <para>The fields in this struct are described below.</para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><varname>size</varname></term>
-+ <listitem><para>
-+ The overall size of the struct, including its items.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>flags</varname></term>
-+ <listitem><para>
-+ Currently unused.
-+ <constant>KDBUS_FLAG_NEGOTIATE</constant> is accepted to probe for
-+ valid flags. If set, the ioctl will return <errorcode>0</errorcode>,
-+ and the <varname>flags</varname> field is set to
-+ <constant>0</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>return_flags</varname></term>
-+ <listitem><para>
-+ Flags returned by the kernel. Currently unused and always set to
-+ <constant>0</constant> by the kernel.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>offset</varname></term>
-+ <listitem><para>
-+ The offset to free, as returned by other ioctls that allocated
-+ memory for returned information.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><varname>items</varname></term>
-+ <listitem><para>
-+ Items to specify further details for the free command.
-+ Currently unused.
-+ Unrecognized items are rejected, and the ioctl will fail with
-+ <varname>errno</varname> set to <constant>EINVAL</constant>.
-+ All items except for
-+ <constant>KDBUS_ITEM_NEGOTIATE</constant> (see
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ ) will be rejected.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Return value</title>
-+ <para>
-+ On success, all mentioned ioctl commands return <errorcode>0</errorcode>;
-+ on error, <errorcode>-1</errorcode> is returned, and
-+ <varname>errno</varname> is set to indicate the error.
-+ If the issued ioctl is illegal for the file descriptor used,
-+ <varname>errno</varname> will be set to <constant>ENOTTY</constant>.
-+ </para>
-+
-+ <refsect2>
-+ <title>
-+ <constant>KDBUS_CMD_FREE</constant> may fail with the following
-+ errors
-+ </title>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>ENXIO</constant></term>
-+ <listitem><para>
-+ No pool slice found at given offset.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ Invalid flags provided.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>EINVAL</constant></term>
-+ <listitem><para>
-+ The offset is valid, but the user is not allowed to free the slice.
-+ This happens, for example, if the offset was retrieved with
-+ <constant>KDBUS_RECV_PEEK</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+ </refsect2>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>See Also</title>
-+ <simplelist type="inline">
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>mmap</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>munmap</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ </simplelist>
-+ </refsect1>
-+</refentry>
-diff --git a/Documentation/kdbus/kdbus.xml b/Documentation/kdbus/kdbus.xml
-new file mode 100644
-index 0000000..d8e7400
---- /dev/null
-+++ b/Documentation/kdbus/kdbus.xml
-@@ -0,0 +1,1012 @@
-+<?xml version='1.0'?> <!--*-nxml-*-->
-+<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
-+ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
-+
-+<refentry id="kdbus">
-+
-+ <refentryinfo>
-+ <title>kdbus</title>
-+ <productname>kdbus</productname>
-+ </refentryinfo>
-+
-+ <refmeta>
-+ <refentrytitle>kdbus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </refmeta>
-+
-+ <refnamediv>
-+ <refname>kdbus</refname>
-+ <refpurpose>Kernel Message Bus</refpurpose>
-+ </refnamediv>
-+
-+ <refsect1>
-+ <title>Synopsis</title>
-+ <para>
-+ kdbus is an inter-process communication bus system controlled by the
-+ kernel. It provides user-space with an API to create buses and send
-+ unicast and multicast messages to one, or many, peers connected to the
-+ same bus. It does not enforce any layout on the transmitted data, but
-+ only provides the transport layer used for message interchange between
-+ peers.
-+ </para>
-+ <para>
-+ This set of man-pages gives a comprehensive overview of the kernel-level
-+ API, with all ioctl commands, associated structs and bit masks. However,
-+ most people will not use this API level directly, but rather let one of
-+ the high-level abstraction libraries help them integrate D-Bus
-+ functionality into their applications.
-+ </para>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Description</title>
-+ <para>
-+ kdbus provides a pseudo filesystem called <emphasis>kdbusfs</emphasis>,
-+ which is usually mounted on <filename>/sys/fs/kdbus</filename>. Bus
-+ primitives can be accessed as files and sub-directories underneath this
-+ mount-point. Any advanced operations are done via
-+ <function>ioctl()</function> on files created by
-+ <emphasis>kdbusfs</emphasis>. Multiple mount-points of
-+ <emphasis>kdbusfs</emphasis> are independent of each other. This allows
-+ namespacing of kdbus by mounting a new instance of
-+ <emphasis>kdbusfs</emphasis> in a new mount-namespace. kdbus calls these
-+ mount instances domains and each bus belongs to exactly one domain.
-+ </para>
-+
-+ <para>
-+ kdbus was designed as a transport layer for D-Bus, but is in no way
-+ limited to, nor controlled by, the D-Bus protocol specification. The D-Bus
-+ protocol is one possible application layer on top of kdbus.
-+ </para>
-+
-+ <para>
-+ For the general D-Bus protocol specification, its payload format, its
-+ marshaling, and its communication semantics, please refer to the
-+ <ulink url="http://dbus.freedesktop.org/doc/dbus-specification.html">
-+ D-Bus specification</ulink>.
-+ </para>
-+
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Terminology</title>
-+
-+ <refsect2>
-+ <title>Domain</title>
-+ <para>
-+ A domain is a <emphasis>kdbusfs</emphasis> mount-point containing all
-+ the bus primitives. Each domain is independent, and separate domains
-+ do not affect each other.
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Bus</title>
-+ <para>
-+ A bus is a named object inside a domain. Clients exchange messages
-+ over a bus. Separate buses have no connection to each other;
-+ messages can only be exchanged on the same bus. The default endpoint of
-+ a bus, to which clients establish connections, is the "bus" file
-+ /sys/fs/kdbus/&lt;bus name&gt;/bus.
-+ Common operating system setups create one "system bus" per system,
-+ and one "user bus" for every logged-in user. Applications or services
-+ may create their own private buses. The kernel driver does not
-+ distinguish between different bus types; they are all handled the same
-+ way. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Endpoint</title>
-+ <para>
-+ An endpoint provides a file to talk to a bus. Opening an endpoint
-+ creates a new connection to the bus to which the endpoint belongs. All
-+ endpoints have unique names and are accessible as files underneath the
-+ directory of a bus, e.g., /sys/fs/kdbus/&lt;bus&gt;/&lt;endpoint&gt;.
-+ Every bus has a default endpoint called "bus".
-+ A bus can optionally offer additional endpoints with custom names
-+ to provide restricted access to the bus. Custom endpoints carry
-+ additional policy which can be used to create sandboxes with
-+ locked-down, limited, filtered access to a bus. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Connection</title>
-+ <para>
-+ A connection to a bus is created by opening an endpoint file of a
-+ bus. Every ordinary client connection has a unique identifier on the
-+ bus and can address messages to every other connection on the same
-+ bus by using the peer's connection ID as the destination. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Pool</title>
-+ <para>
-+ Each connection allocates a piece of shmem-backed memory that is
-+ used to receive messages and answers to ioctl commands from the kernel.
-+ It is never used to send anything to the kernel. In order to access that
-+ memory, an application must mmap() it into its address space. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Well-known Name</title>
-+ <para>
-+ A connection can, in addition to its implicit unique connection ID,
-+ request the ownership of a textual well-known name. Well-known names are
-+ written in reverse-domain notation, such as com.example.service1. A
-+ connection that offers a service on a bus is usually reached by its
-+ well-known name. A connection ID relates to a well-known name much as an
-+ IP address relates to a DNS name associated with that address. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Message</title>
-+ <para>
-+ Connections can exchange messages with other connections by addressing
-+ the peers with their connection ID or well-known name. A message
-+ consists of a message header with information on how to route the
-+ message, and the message payload, which is a logical byte stream of
-+ arbitrary size. Messages can carry additional file descriptors to be
-+ passed from one connection to another, just like passing file
-+ descriptors over UNIX domain sockets. Every connection can specify which
-+ set of metadata the kernel should attach to the message when it is
-+ delivered to the receiving connection. Metadata contains information
-+ like: system time stamps, UID, GID, TID, proc-starttime, well-known
-+ names, process comm, process exe, process argv, cgroup, capabilities,
-+ seclabel, audit session, loginuid and the connection's human-readable
-+ name. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Item</title>
-+ <para>
-+ The API of kdbus implements the notion of items, submitted through and
-+ returned by most ioctls, and stored inside data structures in the
-+ connection's pool. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Broadcast, signal, filter, match</title>
-+ <para>
-+ Signals are messages that a receiver opts in for by installing a blob of
-+ bytes, called a 'match'. Signal messages must always carry a
-+ counter-part blob, called a 'filter', and signals are only delivered to
-+ peers which have a match that white-lists the message's filter. Senders
-+ of signal messages can use either a single connection ID as receiver,
-+ or the special connection ID
-+ <constant>KDBUS_DST_ID_BROADCAST</constant> to potentially send it to
-+ all connections of a bus, following the logic described above. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.match</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ and
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Policy</title>
-+ <para>
-+ A policy is a set of rules that define which connections can see, talk
-+ to, or register a well-known name on the bus. A policy is attached to
-+ buses and custom endpoints, and modified by policy holder connections or
-+ owners of custom endpoints. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.policy</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Privileged bus users</title>
-+ <para>
-+ A user connecting to the bus is considered privileged if it is either
-+ the creator of the bus, or if it has the CAP_IPC_OWNER capability flag
-+ set. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+ </refsect2>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Bus Layout</title>
-+
-+ <para>
-+ A <emphasis>bus</emphasis> provides and defines an environment that peers
-+ can connect to for message interchange. A bus is created via the kdbus
-+ control interface and can be modified by the bus creator. It applies the
-+ policy that controls all bus operations. The bus creator itself does not
-+ participate as a peer. To establish a peer
-+ <emphasis>connection</emphasis>, you have to open one of the
-+ <emphasis>endpoints</emphasis> of a bus. Each bus provides a default
-+ endpoint, but further endpoints can be created on-demand. Endpoints are
-+ used to apply additional policies for all connections on this endpoint.
-+ Thus, they provide additional filters to further restrict access of
-+ specific connections to the bus.
-+ </para>
-+
-+ <para>
-+ The following diagram shows an example bus layout:
-+ </para>
-+
-+ <programlisting><![CDATA[
-+                              Bus Creator
-+                                   |
-+                                   |
-+                                +-----+
-+                                | Bus |
-+                                +-----+
-+                                   |
-+                __________________/ \__________________
-+               /                                       \
-+                |                                     |
-+           +----------+                          +----------+
-+           | Endpoint |                          | Endpoint |
-+           +----------+                          +----------+
-+      _________/|\_________                 _________/|\_________
-+     /          |          \               /          |          \
-+     |          |          |               |          |          |
-+     |          |          |               |          |          |
-+ Connection Connection Connection      Connection Connection Connection
-+ ]]></programlisting>
-+
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Data structures and interconnections</title>
-+ <programlisting><![CDATA[
-+ +--------------------------------------------------------------------------+
-+ | Domain (Mount Point)                                                     |
-+ | /sys/fs/kdbus/control                                                    |
-+ | +----------------------------------------------------------------------+ |
-+ | | Bus (System Bus)                                                     | |
-+ | | /sys/fs/kdbus/0-system/                                              | |
-+ | | +-------------------------------+ +--------------------------------+ | |
-+ | | | Endpoint                      | | Endpoint                       | | |
-+ | | | /sys/fs/kdbus/0-system/bus    | | /sys/fs/kdbus/0-system/ep.app  | | |
-+ | | +-------------------------------+ +--------------------------------+ | |
-+ | | +--------------+ +--------------+ +--------------+ +---------------+ | |
-+ | | | Connection   | | Connection   | | Connection   | | Connection    | | |
-+ | | | :1.22        | | :1.25        | | :1.55        | | :1.81         | | |
-+ | | +--------------+ +--------------+ +--------------+ +---------------+ | |
-+ | +----------------------------------------------------------------------+ |
-+ |                                                                          |
-+ | +----------------------------------------------------------------------+ |
-+ | | Bus (User Bus for UID 2702)                                          | |
-+ | | /sys/fs/kdbus/2702-user/                                             | |
-+ | | +-------------------------------+ +--------------------------------+ | |
-+ | | | Endpoint                      | | Endpoint                       | | |
-+ | | | /sys/fs/kdbus/2702-user/bus   | | /sys/fs/kdbus/2702-user/ep.app | | |
-+ | | +-------------------------------+ +--------------------------------+ | |
-+ | | +--------------+ +--------------+ +--------------+ +---------------+ | |
-+ | | | Connection   | | Connection   | | Connection   | | Connection    | | |
-+ | | | :1.22        | | :1.25        | | :1.55        | | :1.81         | | |
-+ | | +--------------+ +--------------+ +--------------+ +---------------+ | |
-+ | +----------------------------------------------------------------------+ |
-+ +--------------------------------------------------------------------------+
-+ ]]></programlisting>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>Metadata</title>
-+
-+ <refsect2>
-+ <title>When metadata is collected</title>
-+ <para>
-+ kdbus records data about the system in certain situations. Such metadata
-+ can refer to the currently active process (creds, PIDs, current user
-+ groups, process names and its executable path, cgroup membership,
-+ capabilities, security label and audit information), connection
-+ information (description string, currently owned names) and time stamps.
-+ </para>
-+ <para>
-+ Metadata is collected at the following times.
-+ </para>
-+
-+ <itemizedlist>
-+ <listitem><para>
-+ When a bus is created (<constant>KDBUS_CMD_BUS_MAKE</constant>),
-+ information about the calling task is collected. This data is returned
-+ by the kernel via the <constant>KDBUS_CMD_BUS_CREATOR_INFO</constant>
-+ call.
-+ </para></listitem>
-+
-+ <listitem>
-+ <para>
-+ When a connection is created (<constant>KDBUS_CMD_HELLO</constant>),
-+ information about the calling task is collected. Alternatively, a
-+ privileged connection may provide 'faked' information about
-+ credentials, PIDs and security labels which will be stored instead.
-+ This data is returned by the kernel as information on a connection
-+ (<constant>KDBUS_CMD_CONN_INFO</constant>). Only metadata that a
-+ connection allowed to be sent (by setting its bit in
-+ <varname>attach_flags_send</varname>) will be exported in this way.
-+ </para>
-+ </listitem>
-+
-+ <listitem>
-+ <para>
-+ When a message is sent (<constant>KDBUS_CMD_SEND</constant>),
-+ information about the sending task and the sending connection is
-+ collected. This metadata will be attached to the message when it
-+ arrives in the receiver's pool. If the connection sending the
-+ message installed faked credentials (see
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>),
-+ the message will not be augmented by any information about the
-+ currently sending task. Note that only metadata that was requested
-+ by the receiving connection will be collected and attached to
-+ messages.
-+ </para>
-+ </listitem>
-+ </itemizedlist>
-+
-+ <para>
-+ Which metadata items are actually delivered depends on the following
-+ sets and masks:
-+ </para>
-+
-+ <itemizedlist>
-+ <listitem><para>
-+ (a) the system-wide kmod creds mask
-+ (module parameter <varname>attach_flags_mask</varname>)
-+ </para></listitem>
-+
-+ <listitem><para>
-+ (b) the per-connection send creds mask, set by the connecting client
-+ </para></listitem>
-+
-+ <listitem><para>
-+ (c) the per-connection receive creds mask, set by the connecting
-+ client
-+ </para></listitem>
-+
-+ <listitem><para>
-+ (d) the per-bus minimal creds mask, set by the bus creator
-+ </para></listitem>
-+
-+ <listitem><para>
-+ (e) the per-bus owner creds mask, set by the bus creator
-+ </para></listitem>
-+
-+ <listitem><para>
-+ (f) the mask specified when querying creds of a bus peer
-+ </para></listitem>
-+
-+ <listitem><para>
-+ (g) the mask specified when querying creds of a bus owner
-+ </para></listitem>
-+ </itemizedlist>
-+
-+ <para>
-+ With the following rules:
-+ </para>
-+
-+ <itemizedlist>
-+ <listitem>
-+ <para>
-+ [1] The creds attached to messages are determined as
-+ <constant>a &amp; b &amp; c</constant>.
-+ </para>
-+ </listitem>
-+
-+ <listitem>
-+ <para>
-+ [2] When connecting to a bus (<constant>KDBUS_CMD_HELLO</constant>),
-+ if <constant>(~b &amp; d) != 0</constant>, that is, if the per-bus
-+ minimal mask contains a bit that is missing from the connection's send
-+ mask, the call will fail with <errorcode>-1</errorcode>, and
-+ <varname>errno</varname> is set to <constant>ECONNREFUSED</constant>.
-+ </para>
-+ </listitem>
-+
-+ <listitem>
-+ <para>
-+ [3] When querying creds of a bus peer, the creds returned are
-+ <constant>a &amp; b &amp; f</constant>.
-+ </para>
-+ </listitem>
-+
-+ <listitem>
-+ <para>
-+ [4] When querying creds of a bus owner, the creds returned are
-+ <constant>a &amp; e &amp; g</constant>.
-+ </para>
-+ </listitem>
-+ </itemizedlist>
-+
-+ <para>
-+ Hence, programs might not always get all of the metadata items that
-+ they requested. Code must be written so that it can cope with this fact.
-+ </para>
-+ </refsect2>
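
The rules [1] to [4] above reduce to a few bitwise operations. The following C sketch is illustrative only: struct meta_masks and the helper names are hypothetical and not part of the kdbus UAPI, and the mask values fed into it are placeholders, not real KDBUS_ATTACH_* bits. It assumes, for rule [1], that mask (b) belongs to the sending connection and mask (c) to the receiving one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for masks (a) to (e); not a kdbus type. */
struct meta_masks {
	uint64_t kmod;      /* (a) module parameter attach_flags_mask */
	uint64_t conn_send; /* (b) per-connection send mask */
	uint64_t conn_recv; /* (c) per-connection receive mask */
	uint64_t bus_min;   /* (d) per-bus minimal mask */
	uint64_t bus_owner; /* (e) per-bus owner mask */
};

/* Rule [1]: creds attached to a message are a & b & c. */
static uint64_t creds_on_message(const struct meta_masks *m)
{
	assert(m != NULL);
	return m->kmod & m->conn_send & m->conn_recv;
}

/* Rule [2]: HELLO is refused when (~b & d) != 0. */
static bool hello_refused(const struct meta_masks *m)
{
	assert(m != NULL);
	return (~m->conn_send & m->bus_min) != 0;
}

/* Rule [3]: creds of a bus peer are a & b & f; f is the query mask. */
static uint64_t creds_of_peer(const struct meta_masks *m, uint64_t f)
{
	assert(m != NULL);
	return m->kmod & m->conn_send & f;
}

/* Rule [4]: creds of a bus owner are a & e & g; g is the query mask. */
static uint64_t creds_of_owner(const struct meta_masks *m, uint64_t g)
{
	assert(m != NULL);
	return m->kmod & m->bus_owner & g;
}
```

Note the explicit parentheses in hello_refused(): in C, != binds more tightly than &, so the unparenthesized form would not express the intended check.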
-+
-+ <refsect2>
-+ <title>Benefits and heads-up</title>
-+ <para>
-+ Attaching metadata to messages has two major benefits.
-+
-+ <itemizedlist>
-+ <listitem>
-+ <para>
-+ Metadata attached to messages is gathered at the moment when the
-+ other side calls <constant>KDBUS_CMD_SEND</constant>, or,
-+ respectively, when the kernel notification is generated. There is
-+ no need for the receiving peer to retrieve information about the
-+ task in a second step. This closes a race gap that would otherwise
-+ be inherent.
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ As metadata is delivered along with messages in the same data
-+ blob, no extra calls to kernel functions etc. are needed to gather
-+ them.
-+ </para>
-+ </listitem>
-+ </itemizedlist>
-+
-+ Note, however, that collecting metadata does come at a price for
-+ performance, so developers should carefully assess which metadata to
-+ opt in for. As a best practice, data that is not needed as part
-+ of a message should not be requested by the connection in the first
-+ place (see <varname>attach_flags_recv</varname> in
-+ <constant>KDBUS_CMD_HELLO</constant>).
-+ </para>
-+ </refsect2>
-+
-+ <refsect2>
-+ <title>Attach flags for metadata items</title>
-+ <para>
-+ A bitmask tells the kernel which metadata to attach as items
-+ to the aforementioned commands. The
-+ following <emphasis>attach flags</emphasis> are currently supported.
-+ Both the <varname>attach_flags_recv</varname> and
-+ <varname>attach_flags_send</varname> fields of
-+ <type>struct kdbus_cmd_hello</type>, as well as the payload of the
-+ <constant>KDBUS_ITEM_ATTACH_FLAGS_SEND</constant> and
-+ <constant>KDBUS_ITEM_ATTACH_FLAGS_RECV</constant> items follow this
-+ scheme.
-+ </para>
-+
-+ <variablelist>
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_TIMESTAMP</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_TIMESTAMP</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_CREDS</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_CREDS</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_PIDS</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_PIDS</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_AUXGROUPS</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_AUXGROUPS</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_NAMES</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_OWNED_NAME</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_TID_COMM</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_TID_COMM</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_PID_COMM</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_PID_COMM</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_EXE</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_EXE</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_CMDLINE</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_CMDLINE</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_CGROUP</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_CGROUP</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_CAPS</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_CAPS</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_SECLABEL</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_SECLABEL</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_AUDIT</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_AUDIT</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+
-+ <varlistentry>
-+ <term><constant>KDBUS_ATTACH_CONN_DESCRIPTION</constant></term>
-+ <listitem><para>
-+ Requests the attachment of an item of type
-+ <constant>KDBUS_ITEM_CONN_DESCRIPTION</constant>.
-+ </para></listitem>
-+ </varlistentry>
-+ </variablelist>
-+
-+ <para>
-+ Please refer to
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for detailed information about the layout and payload of items and
-+ what each kind of metadata should be used for.
-+ </para>
-+ </refsect2>
-+ </refsect1>
-+
-+ <refsect1>
-+ <title>The ioctl interface</title>
-+
-+ <para>
-+ As stated in the 'synopsis' section above, application developers are
-+ strongly encouraged to use kdbus through one of the high-level D-Bus
-+ abstraction libraries, rather than using the low-level API directly.
-+ </para>
-+
-+ <para>
-+ kdbus on the kernel level exposes its functions exclusively through
-+ <citerefentry>
-+ <refentrytitle>ioctl</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>,
-+ employed on file descriptors returned by
-+ <citerefentry>
-+ <refentrytitle>open</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ on pseudo files exposed by
-+ <citerefentry>
-+ <refentrytitle>kdbus.fs</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para>
-+ <para>
-+ Following is a list of all the ioctls, along with the command structs
-+ they must be used with.
-+ </para>
-+
-+ <informaltable frame="none">
-+ <tgroup cols="3" colsep="1">
-+ <thead>
-+ <row>
-+ <entry>ioctl signature</entry>
-+ <entry>command</entry>
-+ <entry>transported struct</entry>
-+ </row>
-+ </thead>
-+ <tbody>
-+ <row>
-+ <entry><constant>0x40189500</constant></entry>
-+ <entry><constant>KDBUS_CMD_BUS_MAKE</constant></entry>
-+ <entry><type>struct kdbus_cmd *</type></entry>
-+ </row><row>
-+ <entry><constant>0x40189510</constant></entry>
-+ <entry><constant>KDBUS_CMD_ENDPOINT_MAKE</constant></entry>
-+ <entry><type>struct kdbus_cmd *</type></entry>
-+ </row><row>
-+ <entry><constant>0xc0609580</constant></entry>
-+ <entry><constant>KDBUS_CMD_HELLO</constant></entry>
-+ <entry><type>struct kdbus_cmd_hello *</type></entry>
-+ </row><row>
-+ <entry><constant>0x40189582</constant></entry>
-+ <entry><constant>KDBUS_CMD_BYEBYE</constant></entry>
-+ <entry><type>struct kdbus_cmd *</type></entry>
-+ </row><row>
-+ <entry><constant>0x40389590</constant></entry>
-+ <entry><constant>KDBUS_CMD_SEND</constant></entry>
-+ <entry><type>struct kdbus_cmd_send *</type></entry>
-+ </row><row>
-+ <entry><constant>0x80409591</constant></entry>
-+ <entry><constant>KDBUS_CMD_RECV</constant></entry>
-+ <entry><type>struct kdbus_cmd_recv *</type></entry>
-+ </row><row>
-+ <entry><constant>0x40209583</constant></entry>
-+ <entry><constant>KDBUS_CMD_FREE</constant></entry>
-+ <entry><type>struct kdbus_cmd_free *</type></entry>
-+ </row><row>
-+ <entry><constant>0x401895a0</constant></entry>
-+ <entry><constant>KDBUS_CMD_NAME_ACQUIRE</constant></entry>
-+ <entry><type>struct kdbus_cmd *</type></entry>
-+ </row><row>
-+ <entry><constant>0x401895a1</constant></entry>
-+ <entry><constant>KDBUS_CMD_NAME_RELEASE</constant></entry>
-+ <entry><type>struct kdbus_cmd *</type></entry>
-+ </row><row>
-+ <entry><constant>0x80289586</constant></entry>
-+ <entry><constant>KDBUS_CMD_LIST</constant></entry>
-+ <entry><type>struct kdbus_cmd_list *</type></entry>
-+ </row><row>
-+ <entry><constant>0x80309584</constant></entry>
-+ <entry><constant>KDBUS_CMD_CONN_INFO</constant></entry>
-+ <entry><type>struct kdbus_cmd_info *</type></entry>
-+ </row><row>
-+ <entry><constant>0x40209551</constant></entry>
-+ <entry><constant>KDBUS_CMD_UPDATE</constant></entry>
-+ <entry><type>struct kdbus_cmd *</type></entry>
-+ </row><row>
-+ <entry><constant>0x80309585</constant></entry>
-+ <entry><constant>KDBUS_CMD_BUS_CREATOR_INFO</constant></entry>
-+ <entry><type>struct kdbus_cmd_info *</type></entry>
-+ </row><row>
-+ <entry><constant>0x40189511</constant></entry>
-+ <entry><constant>KDBUS_CMD_ENDPOINT_UPDATE</constant></entry>
-+ <entry><type>struct kdbus_cmd *</type></entry>
-+ </row><row>
-+ <entry><constant>0x402095b0</constant></entry>
-+ <entry><constant>KDBUS_CMD_MATCH_ADD</constant></entry>
-+ <entry><type>struct kdbus_cmd_match *</type></entry>
-+ </row><row>
-+ <entry><constant>0x402095b1</constant></entry>
-+ <entry><constant>KDBUS_CMD_MATCH_REMOVE</constant></entry>
-+ <entry><type>struct kdbus_cmd_match *</type></entry>
-+ </row>
-+ </tbody>
-+ </tgroup>
-+ </informaltable>
-+
-+ <para>
-+ Depending on the type of <emphasis>kdbusfs</emphasis> node that was
-+ opened and what ioctls have been executed on a file descriptor before,
-+ a different sub-set of ioctl commands is allowed.
-+ </para>
-+
-+ <itemizedlist>
-+ <listitem>
-+ <para>
-+ On a file descriptor resulting from opening a
-+ <emphasis>control node</emphasis>, only the
-+ <constant>KDBUS_CMD_BUS_MAKE</constant> ioctl may be executed.
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ On a file descriptor resulting from opening a
-+ <emphasis>bus endpoint node</emphasis>, only the
-+ <constant>KDBUS_CMD_ENDPOINT_MAKE</constant> and
-+ <constant>KDBUS_CMD_HELLO</constant> ioctls may be executed.
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ A file descriptor that was used to create a bus
-+ (via <constant>KDBUS_CMD_BUS_MAKE</constant>) is called a
-+ <emphasis>bus owner</emphasis> file descriptor. The bus will be
-+ active as long as the file descriptor is kept open.
-+ A bus owner file descriptor cannot be used to
-+ issue any further ioctls. As soon as
-+ <citerefentry>
-+ <refentrytitle>close</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ is called on it, the bus will be shut down, along with all associated
-+ endpoints and connections. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ A file descriptor that was used to create an endpoint
-+ (via <constant>KDBUS_CMD_ENDPOINT_MAKE</constant>) is called an
-+ <emphasis>endpoint owner</emphasis> file descriptor. The endpoint
-+ will be active as long as the file descriptor is kept open.
-+ An endpoint owner file descriptor can only be used
-+ to update details of an endpoint through the
-+ <constant>KDBUS_CMD_ENDPOINT_UPDATE</constant> ioctl. As soon as
-+ <citerefentry>
-+ <refentrytitle>close</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ is called on it, the endpoint will be removed from the bus, and all
-+ connections that are connected to the bus through it are shut down.
-+ See
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ for more details.
-+ </para>
-+ </listitem>
-+ <listitem>
-+ <para>
-+ A file descriptor that was used to create a connection
-+ (via <constant>KDBUS_CMD_HELLO</constant>) is called a
-+ <emphasis>connection owner</emphasis> file descriptor. The connection
-+ will be active as long as the file descriptor is kept open.
-+ A connection owner file descriptor may be used to
-+ issue any of the following ioctls.
-+ </para>
-+
-+ <itemizedlist>
-+ <listitem><para>
-+ <constant>KDBUS_CMD_UPDATE</constant> to tweak details of the
-+ connection. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ <constant>KDBUS_CMD_BYEBYE</constant> to shut down a connection
-+ without losing messages. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ <constant>KDBUS_CMD_FREE</constant> to free a slice of memory in
-+ the pool. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ <constant>KDBUS_CMD_CONN_INFO</constant> to retrieve information
-+ on other connections on the bus. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ <constant>KDBUS_CMD_BUS_CREATOR_INFO</constant> to retrieve
-+ information on the bus creator. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ <constant>KDBUS_CMD_LIST</constant> to retrieve a list of
-+ currently active well-known names and unique IDs on the bus. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ <constant>KDBUS_CMD_SEND</constant> and
-+ <constant>KDBUS_CMD_RECV</constant> to send or receive a message.
-+ See
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ <constant>KDBUS_CMD_NAME_ACQUIRE</constant> and
-+ <constant>KDBUS_CMD_NAME_RELEASE</constant> to acquire or release
-+ a well-known name on the bus. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+
-+ <listitem><para>
-+ <constant>KDBUS_CMD_MATCH_ADD</constant> and
-+ <constant>KDBUS_CMD_MATCH_REMOVE</constant> to add or remove
-+ a match for signal messages. See
-+ <citerefentry>
-+ <refentrytitle>kdbus.match</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>.
-+ </para></listitem>
-+ </itemizedlist>
-+ </listitem>
-+ </itemizedlist>
-+
-+ <para>
-+ These ioctls, along with the structs they transport, are explained in
-+ detail in the other documents linked to in the "See Also" section below.
-+ </para>
-+ </refsect1>
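
The values in the "ioctl signature" column of the table above appear to follow the generic Linux ioctl number encoding (direction, argument size, magic byte, command number), with the magic byte matching KDBUS_IOCTL_MAGIC (0x95) from the UAPI header in this patch. A minimal sketch of decoding them, assuming the asm-generic field layout (some architectures use different field widths); struct ioctl_fields and decode_ioctl() are hypothetical names used only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded parts of an ioctl number (asm-generic layout assumed). */
struct ioctl_fields {
	unsigned int dir;  /* 0 = none, 1 = write, 2 = read, 3 = read/write */
	unsigned int size; /* size in bytes encoded for the transported struct */
	unsigned int type; /* magic byte; 0x95 for kdbus */
	unsigned int nr;   /* command number */
};

/* Split a 32-bit ioctl number into its four fields. */
static void decode_ioctl(uint32_t code, struct ioctl_fields *out)
{
	assert(out != NULL);
	out->nr   = code & 0xffu;            /* bits 0-7 */
	out->type = (code >> 8) & 0xffu;     /* bits 8-15 */
	out->size = (code >> 16) & 0x3fffu;  /* bits 16-29 */
	out->dir  = (code >> 30) & 0x3u;     /* bits 30-31 */
}
```

For example, 0xc0609580 (KDBUS_CMD_HELLO) decodes to direction 3 (read/write), size 0x60, magic 0x95 and command number 0x80, while 0x40189500 (KDBUS_CMD_BUS_MAKE) decodes to direction 1 (write), size 0x18, magic 0x95 and command number 0x00.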
-+
-+ <refsect1>
-+ <title>See Also</title>
-+ <simplelist type="inline">
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.bus</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.connection</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.endpoint</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.fs</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.item</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.message</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.name</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>kdbus.pool</refentrytitle>
-+ <manvolnum>7</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>ioctl</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>mmap</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>open</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <citerefentry>
-+ <refentrytitle>close</refentrytitle>
-+ <manvolnum>2</manvolnum>
-+ </citerefentry>
-+ </member>
-+ <member>
-+ <ulink url="http://freedesktop.org/wiki/Software/dbus">D-Bus</ulink>
-+ </member>
-+ </simplelist>
-+ </refsect1>
-+
-+</refentry>
-diff --git a/Documentation/kdbus/stylesheet.xsl b/Documentation/kdbus/stylesheet.xsl
-new file mode 100644
-index 0000000..52565ea
---- /dev/null
-+++ b/Documentation/kdbus/stylesheet.xsl
-@@ -0,0 +1,16 @@
-+<?xml version="1.0" encoding="UTF-8"?>
-+<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform" version="1.0">
-+ <param name="chunk.quietly">1</param>
-+ <param name="funcsynopsis.style">ansi</param>
-+ <param name="funcsynopsis.tabular.threshold">80</param>
-+ <param name="callout.graphics">0</param>
-+ <param name="paper.type">A4</param>
-+ <param name="generate.section.toc.level">2</param>
-+ <param name="use.id.as.filename">1</param>
-+ <param name="citerefentry.link">1</param>
-+ <strip-space elements="*"/>
-+ <template name="generate.citerefentry.link">
-+ <value-of select="refentrytitle"/>
-+ <text>.html</text>
-+ </template>
-+</stylesheet>
-diff --git a/MAINTAINERS b/MAINTAINERS
-index d8afd29..02f7668 100644
---- a/MAINTAINERS
-+++ b/MAINTAINERS
-@@ -5585,6 +5585,19 @@ S: Maintained
- F: Documentation/kbuild/kconfig-language.txt
- F: scripts/kconfig/
-
-+KDBUS
-+M: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+M: Daniel Mack <daniel@zonque.org>
-+M: David Herrmann <dh.herrmann@googlemail.com>
-+M: Djalal Harouni <tixxdz@opendz.org>
-+L: linux-kernel@vger.kernel.org
-+S: Maintained
-+F: ipc/kdbus/*
-+F: samples/kdbus/*
-+F: Documentation/kdbus/*
-+F: include/uapi/linux/kdbus.h
-+F: tools/testing/selftests/kdbus/
-+
- KDUMP
- M: Vivek Goyal <vgoyal@redhat.com>
- M: Haren Myneni <hbabu@us.ibm.com>
-diff --git a/Makefile b/Makefile
-index f5c8983..a1c8d57 100644
---- a/Makefile
-+++ b/Makefile
-@@ -1343,6 +1343,7 @@ $(help-board-dirs): help-%:
- %docs: scripts_basic FORCE
- $(Q)$(MAKE) $(build)=scripts build_docproc
- $(Q)$(MAKE) $(build)=Documentation/DocBook $@
-+ $(Q)$(MAKE) $(build)=Documentation/kdbus $@
-
- else # KBUILD_EXTMOD
-
-diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
-index 1a0006a..4842a98 100644
---- a/include/uapi/linux/Kbuild
-+++ b/include/uapi/linux/Kbuild
-@@ -215,6 +215,7 @@ header-y += ixjuser.h
- header-y += jffs2.h
- header-y += joystick.h
- header-y += kcmp.h
-+header-y += kdbus.h
- header-y += kdev_t.h
- header-y += kd.h
- header-y += kernelcapi.h
-diff --git a/include/uapi/linux/kdbus.h b/include/uapi/linux/kdbus.h
-new file mode 100644
-index 0000000..4fc44cb
---- /dev/null
-+++ b/include/uapi/linux/kdbus.h
-@@ -0,0 +1,984 @@
-+/*
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef _UAPI_KDBUS_H_
-+#define _UAPI_KDBUS_H_
-+
-+#include <linux/ioctl.h>
-+#include <linux/types.h>
-+
-+#define KDBUS_IOCTL_MAGIC 0x95
-+#define KDBUS_SRC_ID_KERNEL (0)
-+#define KDBUS_DST_ID_NAME (0)
-+#define KDBUS_MATCH_ID_ANY (~0ULL)
-+#define KDBUS_DST_ID_BROADCAST (~0ULL)
-+#define KDBUS_FLAG_NEGOTIATE (1ULL << 63)
-+
-+/**
-+ * struct kdbus_notify_id_change - name registry change message
-+ * @id: New or former owner of the name
-+ * @flags: flags field from KDBUS_HELLO_*
-+ *
-+ * Sent from kernel to userspace when the owner or activator of
-+ * a well-known name changes.
-+ *
-+ * Attached to:
-+ * KDBUS_ITEM_ID_ADD
-+ * KDBUS_ITEM_ID_REMOVE
-+ */
-+struct kdbus_notify_id_change {
-+ __u64 id;
-+ __u64 flags;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_notify_name_change - name registry change message
-+ * @old_id: ID and flags of former owner of a name
-+ * @new_id: ID and flags of new owner of a name
-+ * @name: Well-known name
-+ *
-+ * Sent from kernel to userspace when the owner or activator of
-+ * a well-known name changes.
-+ *
-+ * Attached to:
-+ * KDBUS_ITEM_NAME_ADD
-+ * KDBUS_ITEM_NAME_REMOVE
-+ * KDBUS_ITEM_NAME_CHANGE
-+ */
-+struct kdbus_notify_name_change {
-+ struct kdbus_notify_id_change old_id;
-+ struct kdbus_notify_id_change new_id;
-+ char name[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_creds - process credentials
-+ * @uid: User ID
-+ * @euid: Effective UID
-+ * @suid: Saved UID
-+ * @fsuid: Filesystem UID
-+ * @gid: Group ID
-+ * @egid: Effective GID
-+ * @sgid: Saved GID
-+ * @fsgid: Filesystem GID
-+ *
-+ * Attached to:
-+ * KDBUS_ITEM_CREDS
-+ */
-+struct kdbus_creds {
-+ __u64 uid;
-+ __u64 euid;
-+ __u64 suid;
-+ __u64 fsuid;
-+ __u64 gid;
-+ __u64 egid;
-+ __u64 sgid;
-+ __u64 fsgid;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_pids - process identifiers
-+ * @pid: Process ID
-+ * @tid: Thread ID
-+ * @ppid: Parent process ID
-+ *
-+ * The PID and TID of a process.
-+ *
-+ * Attached to:
-+ * KDBUS_ITEM_PIDS
-+ */
-+struct kdbus_pids {
-+ __u64 pid;
-+ __u64 tid;
-+ __u64 ppid;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_caps - process capabilities
-+ * @last_cap: Highest currently known capability bit
-+ * @caps: Variable number of 32-bit capabilities flags
-+ *
-+ * Contains a variable number of 32-bit capabilities flags.
-+ *
-+ * Attached to:
-+ * KDBUS_ITEM_CAPS
-+ */
-+struct kdbus_caps {
-+ __u32 last_cap;
-+ __u32 caps[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_audit - audit information
-+ * @sessionid: The audit session ID
-+ * @loginuid: The audit login uid
-+ *
-+ * Attached to:
-+ * KDBUS_ITEM_AUDIT
-+ */
-+struct kdbus_audit {
-+ __u32 sessionid;
-+ __u32 loginuid;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_timestamp
-+ * @seqnum: Global per-domain message sequence number
-+ * @monotonic_ns: Monotonic timestamp, in nanoseconds
-+ * @realtime_ns: Realtime timestamp, in nanoseconds
-+ *
-+ * Attached to:
-+ * KDBUS_ITEM_TIMESTAMP
-+ */
-+struct kdbus_timestamp {
-+ __u64 seqnum;
-+ __u64 monotonic_ns;
-+ __u64 realtime_ns;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_vec - I/O vector for kdbus payload items
-+ * @size: The size of the vector
-+ * @address: Memory address of data buffer
-+ * @offset: Offset in the in-message payload memory,
-+ * relative to the message head
-+ *
-+ * Attached to:
-+ * KDBUS_ITEM_PAYLOAD_VEC, KDBUS_ITEM_PAYLOAD_OFF
-+ */
-+struct kdbus_vec {
-+ __u64 size;
-+ union {
-+ __u64 address;
-+ __u64 offset;
-+ };
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_bloom_parameter - bus-wide bloom parameters
-+ * @size: Size of the bit field in bytes (m / 8)
-+ * @n_hash: Number of hash functions used (k)
-+ */
-+struct kdbus_bloom_parameter {
-+ __u64 size;
-+ __u64 n_hash;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_bloom_filter - bloom filter containing n elements
-+ * @generation: Generation of the element set in the filter
-+ * @data: Bit field, multiple of 8 bytes
-+ */
-+struct kdbus_bloom_filter {
-+ __u64 generation;
-+ __u64 data[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_memfd - a kdbus memfd
-+ * @start: The offset into the memfd where the segment starts
-+ * @size: The size of the memfd segment
-+ * @fd: The file descriptor number
-+ * @__pad: Padding to ensure proper alignment and size
-+ *
-+ * Attached to:
-+ * KDBUS_ITEM_PAYLOAD_MEMFD
-+ */
-+struct kdbus_memfd {
-+ __u64 start;
-+ __u64 size;
-+ int fd;
-+ __u32 __pad;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_name - a registered well-known name with its flags
-+ * @flags: Flags from KDBUS_NAME_*
-+ * @name: Well-known name
-+ *
-+ * Attached to:
-+ * KDBUS_ITEM_OWNED_NAME
-+ */
-+struct kdbus_name {
-+ __u64 flags;
-+ char name[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_policy_access_type - permissions of a policy record
-+ * @_KDBUS_POLICY_ACCESS_NULL: Uninitialized/invalid
-+ * @KDBUS_POLICY_ACCESS_USER: Grant access to a uid
-+ * @KDBUS_POLICY_ACCESS_GROUP: Grant access to gid
-+ * @KDBUS_POLICY_ACCESS_WORLD: World-accessible
-+ */
-+enum kdbus_policy_access_type {
-+ _KDBUS_POLICY_ACCESS_NULL,
-+ KDBUS_POLICY_ACCESS_USER,
-+ KDBUS_POLICY_ACCESS_GROUP,
-+ KDBUS_POLICY_ACCESS_WORLD,
-+};
-+
-+/**
-+ * enum kdbus_policy_type - mode flags
-+ * @KDBUS_POLICY_OWN: Allow to own a well-known name
-+ * Implies KDBUS_POLICY_TALK and KDBUS_POLICY_SEE
-+ * @KDBUS_POLICY_TALK: Allow communication to a well-known name
-+ * Implies KDBUS_POLICY_SEE
-+ * @KDBUS_POLICY_SEE: Allow to see a well-known name
-+ */
-+enum kdbus_policy_type {
-+ KDBUS_POLICY_SEE = 0,
-+ KDBUS_POLICY_TALK,
-+ KDBUS_POLICY_OWN,
-+};
-+
-+/**
-+ * struct kdbus_policy_access - policy access item
-+ * @type: One of KDBUS_POLICY_ACCESS_* types
-+ * @access: Access to grant
-+ * @id: For KDBUS_POLICY_ACCESS_USER, the uid
-+ * For KDBUS_POLICY_ACCESS_GROUP, the gid
-+ */
-+struct kdbus_policy_access {
-+ __u64 type; /* USER, GROUP, WORLD */
-+ __u64 access; /* OWN, TALK, SEE */
-+ __u64 id; /* uid, gid, 0 */
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_attach_flags - flags for metadata attachments
-+ * @KDBUS_ATTACH_TIMESTAMP: Timestamp
-+ * @KDBUS_ATTACH_CREDS: Credentials
-+ * @KDBUS_ATTACH_PIDS: PIDs
-+ * @KDBUS_ATTACH_AUXGROUPS: Auxiliary groups
-+ * @KDBUS_ATTACH_NAMES: Well-known names
-+ * @KDBUS_ATTACH_TID_COMM: The "comm" process identifier of the TID
-+ * @KDBUS_ATTACH_PID_COMM: The "comm" process identifier of the PID
-+ * @KDBUS_ATTACH_EXE: The path of the executable
-+ * @KDBUS_ATTACH_CMDLINE: The process command line
-+ * @KDBUS_ATTACH_CGROUP: The cgroup membership
-+ * @KDBUS_ATTACH_CAPS: The process capabilities
-+ * @KDBUS_ATTACH_SECLABEL: The security label
-+ * @KDBUS_ATTACH_AUDIT: The audit IDs
-+ * @KDBUS_ATTACH_CONN_DESCRIPTION: The human-readable connection name
-+ * @_KDBUS_ATTACH_ALL: All of the above
-+ * @_KDBUS_ATTACH_ANY: Wildcard match to enable any kind of
-+ * metadata.
-+ */
-+enum kdbus_attach_flags {
-+ KDBUS_ATTACH_TIMESTAMP = 1ULL << 0,
-+ KDBUS_ATTACH_CREDS = 1ULL << 1,
-+ KDBUS_ATTACH_PIDS = 1ULL << 2,
-+ KDBUS_ATTACH_AUXGROUPS = 1ULL << 3,
-+ KDBUS_ATTACH_NAMES = 1ULL << 4,
-+ KDBUS_ATTACH_TID_COMM = 1ULL << 5,
-+ KDBUS_ATTACH_PID_COMM = 1ULL << 6,
-+ KDBUS_ATTACH_EXE = 1ULL << 7,
-+ KDBUS_ATTACH_CMDLINE = 1ULL << 8,
-+ KDBUS_ATTACH_CGROUP = 1ULL << 9,
-+ KDBUS_ATTACH_CAPS = 1ULL << 10,
-+ KDBUS_ATTACH_SECLABEL = 1ULL << 11,
-+ KDBUS_ATTACH_AUDIT = 1ULL << 12,
-+ KDBUS_ATTACH_CONN_DESCRIPTION = 1ULL << 13,
-+ _KDBUS_ATTACH_ALL = (1ULL << 14) - 1,
-+ _KDBUS_ATTACH_ANY = ~0ULL
-+};
-+
-+/**
-+ * enum kdbus_item_type - item types to chain data in a list
-+ * @_KDBUS_ITEM_NULL: Uninitialized/invalid
-+ * @_KDBUS_ITEM_USER_BASE: Start of user items
-+ * @KDBUS_ITEM_NEGOTIATE: Negotiate supported items
-+ * @KDBUS_ITEM_PAYLOAD_VEC: Vector to data
-+ * @KDBUS_ITEM_PAYLOAD_OFF: Data at returned offset to message head
-+ * @KDBUS_ITEM_PAYLOAD_MEMFD: Data as sealed memfd
-+ * @KDBUS_ITEM_FDS: Attached file descriptors
-+ * @KDBUS_ITEM_CANCEL_FD: FD used to cancel a synchronous
-+ * operation by writing to it from
-+ * userspace
-+ * @KDBUS_ITEM_BLOOM_PARAMETER: Bus-wide bloom parameters, used with
-+ * KDBUS_CMD_BUS_MAKE, carries a
-+ * struct kdbus_bloom_parameter
-+ * @KDBUS_ITEM_BLOOM_FILTER: Bloom filter carried with a message,
-+ * used to match against a bloom mask of a
-+ * connection, carries a struct
-+ * kdbus_bloom_filter
-+ * @KDBUS_ITEM_BLOOM_MASK: Bloom mask used to match against a
-+ * message's bloom filter
-+ * @KDBUS_ITEM_DST_NAME: Destination's well-known name
-+ * @KDBUS_ITEM_MAKE_NAME: Name of domain, bus, endpoint
-+ * @KDBUS_ITEM_ATTACH_FLAGS_SEND: Attach-flags, used for updating which
-+ * metadata a connection opts in to send
-+ * @KDBUS_ITEM_ATTACH_FLAGS_RECV: Attach-flags, used for updating which
-+ * metadata a connection requests to
-+ * receive for each received message
-+ * @KDBUS_ITEM_ID: Connection ID
-+ * @KDBUS_ITEM_NAME: Well-known name with flags
-+ * @_KDBUS_ITEM_ATTACH_BASE: Start of metadata attach items
-+ * @KDBUS_ITEM_TIMESTAMP: Timestamp
-+ * @KDBUS_ITEM_CREDS: Process credentials
-+ * @KDBUS_ITEM_PIDS: Process identifiers
-+ * @KDBUS_ITEM_AUXGROUPS: Auxiliary process groups
-+ * @KDBUS_ITEM_OWNED_NAME: A name owned by the associated
-+ * connection
-+ * @KDBUS_ITEM_TID_COMM: Thread ID "comm" identifier
-+ * (Don't trust this, see below.)
-+ * @KDBUS_ITEM_PID_COMM: Process ID "comm" identifier
-+ * (Don't trust this, see below.)
-+ * @KDBUS_ITEM_EXE: The path of the executable
-+ * (Don't trust this, see below.)
-+ * @KDBUS_ITEM_CMDLINE: The process command line
-+ * (Don't trust this, see below.)
-+ * @KDBUS_ITEM_CGROUP: The cgroup membership
-+ * @KDBUS_ITEM_CAPS: The process capabilities
-+ * @KDBUS_ITEM_SECLABEL: The security label
-+ * @KDBUS_ITEM_AUDIT: The audit IDs
-+ * @KDBUS_ITEM_CONN_DESCRIPTION: The connection's human-readable name
-+ * (debugging)
-+ * @_KDBUS_ITEM_POLICY_BASE: Start of policy items
-+ * @KDBUS_ITEM_POLICY_ACCESS: Policy access block
-+ * @_KDBUS_ITEM_KERNEL_BASE: Start of kernel-generated message items
-+ * @KDBUS_ITEM_NAME_ADD: Notification in kdbus_notify_name_change
-+ * @KDBUS_ITEM_NAME_REMOVE: Notification in kdbus_notify_name_change
-+ * @KDBUS_ITEM_NAME_CHANGE: Notification in kdbus_notify_name_change
-+ * @KDBUS_ITEM_ID_ADD: Notification in kdbus_notify_id_change
-+ * @KDBUS_ITEM_ID_REMOVE: Notification in kdbus_notify_id_change
-+ * @KDBUS_ITEM_REPLY_TIMEOUT: Timeout has been reached
-+ * @KDBUS_ITEM_REPLY_DEAD: Destination died
-+ *
-+ * N.B: The process and thread COMM fields, as well as the CMDLINE and
-+ * EXE fields may be altered by unprivileged processes and should
-+ * hence *not* be used for security decisions. Peers should make use of
-+ * these items only for informational purposes, such as generating log
-+ * records.
-+ */
-+enum kdbus_item_type {
-+ _KDBUS_ITEM_NULL,
-+ _KDBUS_ITEM_USER_BASE,
-+ KDBUS_ITEM_NEGOTIATE = _KDBUS_ITEM_USER_BASE,
-+ KDBUS_ITEM_PAYLOAD_VEC,
-+ KDBUS_ITEM_PAYLOAD_OFF,
-+ KDBUS_ITEM_PAYLOAD_MEMFD,
-+ KDBUS_ITEM_FDS,
-+ KDBUS_ITEM_CANCEL_FD,
-+ KDBUS_ITEM_BLOOM_PARAMETER,
-+ KDBUS_ITEM_BLOOM_FILTER,
-+ KDBUS_ITEM_BLOOM_MASK,
-+ KDBUS_ITEM_DST_NAME,
-+ KDBUS_ITEM_MAKE_NAME,
-+ KDBUS_ITEM_ATTACH_FLAGS_SEND,
-+ KDBUS_ITEM_ATTACH_FLAGS_RECV,
-+ KDBUS_ITEM_ID,
-+ KDBUS_ITEM_NAME,
-+ KDBUS_ITEM_DST_ID,
-+
-+ /* keep these item types in sync with KDBUS_ATTACH_* flags */
-+ _KDBUS_ITEM_ATTACH_BASE = 0x1000,
-+ KDBUS_ITEM_TIMESTAMP = _KDBUS_ITEM_ATTACH_BASE,
-+ KDBUS_ITEM_CREDS,
-+ KDBUS_ITEM_PIDS,
-+ KDBUS_ITEM_AUXGROUPS,
-+ KDBUS_ITEM_OWNED_NAME,
-+ KDBUS_ITEM_TID_COMM,
-+ KDBUS_ITEM_PID_COMM,
-+ KDBUS_ITEM_EXE,
-+ KDBUS_ITEM_CMDLINE,
-+ KDBUS_ITEM_CGROUP,
-+ KDBUS_ITEM_CAPS,
-+ KDBUS_ITEM_SECLABEL,
-+ KDBUS_ITEM_AUDIT,
-+ KDBUS_ITEM_CONN_DESCRIPTION,
-+
-+ _KDBUS_ITEM_POLICY_BASE = 0x2000,
-+ KDBUS_ITEM_POLICY_ACCESS = _KDBUS_ITEM_POLICY_BASE,
-+
-+ _KDBUS_ITEM_KERNEL_BASE = 0x8000,
-+ KDBUS_ITEM_NAME_ADD = _KDBUS_ITEM_KERNEL_BASE,
-+ KDBUS_ITEM_NAME_REMOVE,
-+ KDBUS_ITEM_NAME_CHANGE,
-+ KDBUS_ITEM_ID_ADD,
-+ KDBUS_ITEM_ID_REMOVE,
-+ KDBUS_ITEM_REPLY_TIMEOUT,
-+ KDBUS_ITEM_REPLY_DEAD,
-+};
-+
-+/**
-+ * struct kdbus_item - chain of data blocks
-+ * @size: Overall data record size
-+ * @type: Kdbus_item type of data
-+ * @data: Generic bytes
-+ * @data32: Generic 32 bit array
-+ * @data64: Generic 64 bit array
-+ * @str: Generic string
-+ * @id: Connection ID
-+ * @vec: KDBUS_ITEM_PAYLOAD_VEC
-+ * @creds: KDBUS_ITEM_CREDS
-+ * @audit: KDBUS_ITEM_AUDIT
-+ * @timestamp: KDBUS_ITEM_TIMESTAMP
-+ * @name: KDBUS_ITEM_NAME
-+ * @bloom_parameter: KDBUS_ITEM_BLOOM_PARAMETER
-+ * @bloom_filter: KDBUS_ITEM_BLOOM_FILTER
-+ * @memfd: KDBUS_ITEM_PAYLOAD_MEMFD
-+ * @name_change: KDBUS_ITEM_NAME_ADD
-+ * KDBUS_ITEM_NAME_REMOVE
-+ * KDBUS_ITEM_NAME_CHANGE
-+ * @id_change: KDBUS_ITEM_ID_ADD
-+ * KDBUS_ITEM_ID_REMOVE
-+ * @policy_access: KDBUS_ITEM_POLICY_ACCESS
-+ */
-+struct kdbus_item {
-+ __u64 size;
-+ __u64 type;
-+ union {
-+ __u8 data[0];
-+ __u32 data32[0];
-+ __u64 data64[0];
-+ char str[0];
-+
-+ __u64 id;
-+ struct kdbus_vec vec;
-+ struct kdbus_creds creds;
-+ struct kdbus_pids pids;
-+ struct kdbus_audit audit;
-+ struct kdbus_caps caps;
-+ struct kdbus_timestamp timestamp;
-+ struct kdbus_name name;
-+ struct kdbus_bloom_parameter bloom_parameter;
-+ struct kdbus_bloom_filter bloom_filter;
-+ struct kdbus_memfd memfd;
-+ int fds[0];
-+ struct kdbus_notify_name_change name_change;
-+ struct kdbus_notify_id_change id_change;
-+ struct kdbus_policy_access policy_access;
-+ };
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_msg_flags - type of message
-+ * @KDBUS_MSG_EXPECT_REPLY: Expect a reply message, used for
-+ * method calls. The userspace-supplied
-+ * cookie identifies the message and the
-+ * respective reply carries the cookie
-+ * in cookie_reply
-+ * @KDBUS_MSG_NO_AUTO_START: Do not start a service if the addressed
-+ * name is not currently active. This flag is
-+ * not looked at by the kernel but only
-+ * serves as hint for userspace implementations.
-+ * @KDBUS_MSG_SIGNAL: Treat this message as signal
-+ */
-+enum kdbus_msg_flags {
-+ KDBUS_MSG_EXPECT_REPLY = 1ULL << 0,
-+ KDBUS_MSG_NO_AUTO_START = 1ULL << 1,
-+ KDBUS_MSG_SIGNAL = 1ULL << 2,
-+};
-+
-+/**
-+ * enum kdbus_payload_type - type of payload carried by message
-+ * @KDBUS_PAYLOAD_KERNEL: Kernel-generated simple message
-+ * @KDBUS_PAYLOAD_DBUS: D-Bus marshalling "DBusDBus"
-+ *
-+ * Any payload-type is accepted. Common types will get added here once
-+ * established.
-+ */
-+enum kdbus_payload_type {
-+ KDBUS_PAYLOAD_KERNEL,
-+ KDBUS_PAYLOAD_DBUS = 0x4442757344427573ULL,
-+};
-+
-+/**
-+ * struct kdbus_msg - the representation of a kdbus message
-+ * @size: Total size of the message
-+ * @flags: Message flags (KDBUS_MSG_*), userspace → kernel
-+ * @priority: Message queue priority value
-+ * @dst_id: 64-bit ID of the destination connection
-+ * @src_id: 64-bit ID of the source connection
-+ * @payload_type: Payload type (KDBUS_PAYLOAD_*)
-+ * @cookie: Userspace-supplied cookie, for the connection
-+ * to identify its messages
-+ * @timeout_ns: The time to wait for a message reply from the peer.
-+ * If there is no reply, and the send command is
-+ * executed asynchronously, a kernel-generated message
-+ * with an attached KDBUS_ITEM_REPLY_TIMEOUT item
-+ * is sent to @src_id. For synchronously executed send
-+ * command, the value denotes the maximum time the call
-+ * blocks to wait for a reply. The timeout is expected in
-+ * nanoseconds and as absolute CLOCK_MONOTONIC value.
-+ * @cookie_reply: A reply to the requesting message with the same
-+ * cookie. The requesting connection can match its
-+ * request and the reply with this value
-+ * @items: A list of kdbus_items containing the message payload
-+ */
-+struct kdbus_msg {
-+ __u64 size;
-+ __u64 flags;
-+ __s64 priority;
-+ __u64 dst_id;
-+ __u64 src_id;
-+ __u64 payload_type;
-+ __u64 cookie;
-+ union {
-+ __u64 timeout_ns;
-+ __u64 cookie_reply;
-+ };
-+ struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_msg_info - returned message container
-+ * @offset: Offset of kdbus_msg slice in pool
-+ * @msg_size: Copy of the kdbus_msg.size field
-+ * @return_flags: Command return flags, kernel → userspace
-+ */
-+struct kdbus_msg_info {
-+ __u64 offset;
-+ __u64 msg_size;
-+ __u64 return_flags;
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_send_flags - flags for sending messages
-+ * @KDBUS_SEND_SYNC_REPLY: Wait for destination connection to
-+ * reply to this message. The
-+ * KDBUS_CMD_SEND ioctl() will block
-+ * until the reply is received, and
-+ * reply in struct kdbus_cmd_send will
-+ * yield the offset in the sender's pool
-+ * where the reply can be found.
-+ * This flag is only valid if
-+ * @KDBUS_MSG_EXPECT_REPLY is set as well.
-+ */
-+enum kdbus_send_flags {
-+ KDBUS_SEND_SYNC_REPLY = 1ULL << 0,
-+};
-+
-+/**
-+ * struct kdbus_cmd_send - send message
-+ * @size: Overall size of this structure
-+ * @flags: Flags to change send behavior (KDBUS_SEND_*)
-+ * @return_flags: Command return flags, kernel → userspace
-+ * @msg_address: Storage address of the kdbus_msg to send
-+ * @reply: Storage for message reply if KDBUS_SEND_SYNC_REPLY
-+ * was given
-+ * @items: Additional items for this command
-+ */
-+struct kdbus_cmd_send {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 msg_address;
-+ struct kdbus_msg_info reply;
-+ struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_recv_flags - flags for de-queuing messages
-+ * @KDBUS_RECV_PEEK: Return the next queued message without
-+ * actually de-queuing it, and without installing
-+ * any file descriptors or other resources. It is
-+ * usually used to determine the activating
-+ * connection of a bus name.
-+ * @KDBUS_RECV_DROP: Drop and free the next queued message and all
-+ * its resources without actually receiving it.
-+ * @KDBUS_RECV_USE_PRIORITY: Only de-queue messages with the specified or
-+ * higher priority (lowest values); if not set,
-+ * the priority value is ignored.
-+ */
-+enum kdbus_recv_flags {
-+ KDBUS_RECV_PEEK = 1ULL << 0,
-+ KDBUS_RECV_DROP = 1ULL << 1,
-+ KDBUS_RECV_USE_PRIORITY = 1ULL << 2,
-+};
-+
-+/**
-+ * enum kdbus_recv_return_flags - return flags for message receive commands
-+ * @KDBUS_RECV_RETURN_INCOMPLETE_FDS: One or more file descriptors could not
-+ * be installed. These descriptors in
-+ * KDBUS_ITEM_FDS will carry the value -1.
-+ * @KDBUS_RECV_RETURN_DROPPED_MSGS: There have been dropped messages since
-+ * the last time a message was received.
-+ * The 'dropped_msgs' counter contains the
-+ * number of messages dropped due to pool
-+ * overflows or other missed broadcasts.
-+ */
-+enum kdbus_recv_return_flags {
-+ KDBUS_RECV_RETURN_INCOMPLETE_FDS = 1ULL << 0,
-+ KDBUS_RECV_RETURN_DROPPED_MSGS = 1ULL << 1,
-+};
-+
-+/**
-+ * struct kdbus_cmd_recv - struct to de-queue a buffered message
-+ * @size: Overall size of this object
-+ * @flags: KDBUS_RECV_* flags, userspace → kernel
-+ * @return_flags: Command return flags, kernel → userspace
-+ * @priority: Minimum priority of the messages to de-queue. Lowest
-+ * values have the highest priority.
-+ * @dropped_msgs: In case there were any dropped messages since the last
-+ * time a message was received, this will be set to the
-+ * number of lost messages and
-+ * KDBUS_RECV_RETURN_DROPPED_MSGS will be set in
-+ * 'return_flags'. This can only happen if the ioctl
-+ * returns 0 or EAGAIN.
-+ * @msg: Return storage for received message.
-+ * @items: Additional items for this command.
-+ *
-+ * This struct is used with the KDBUS_CMD_RECV ioctl.
-+ */
-+struct kdbus_cmd_recv {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __s64 priority;
-+ __u64 dropped_msgs;
-+ struct kdbus_msg_info msg;
-+ struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_cmd_free - struct to free a slice of memory in the pool
-+ * @size: Overall size of this structure
-+ * @flags: Flags for the free command, userspace → kernel
-+ * @return_flags: Command return flags, kernel → userspace
-+ * @offset: The offset of the memory slice, as returned by other
-+ * ioctls
-+ * @items: Additional items to modify the behavior
-+ *
-+ * This struct is used with the KDBUS_CMD_FREE ioctl.
-+ */
-+struct kdbus_cmd_free {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 offset;
-+ struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_hello_flags - flags for struct kdbus_cmd_hello
-+ * @KDBUS_HELLO_ACCEPT_FD: The connection allows the reception of
-+ * any passed file descriptors
-+ * @KDBUS_HELLO_ACTIVATOR: Special-purpose connection which registers
-+ * a well-known name for a process to be started
-+ * when traffic arrives
-+ * @KDBUS_HELLO_POLICY_HOLDER: Special-purpose connection which registers
-+ * policy entries for a name. The provided name
-+ * is not activated and not registered with the
-+ * name database, it only allows unprivileged
-+ * connections to acquire a name, talk or discover
-+ * a service
-+ * @KDBUS_HELLO_MONITOR: Special-purpose connection to monitor
-+ * bus traffic
-+ */
-+enum kdbus_hello_flags {
-+ KDBUS_HELLO_ACCEPT_FD = 1ULL << 0,
-+ KDBUS_HELLO_ACTIVATOR = 1ULL << 1,
-+ KDBUS_HELLO_POLICY_HOLDER = 1ULL << 2,
-+ KDBUS_HELLO_MONITOR = 1ULL << 3,
-+};
-+
-+/**
-+ * struct kdbus_cmd_hello - struct to say hello to kdbus
-+ * @size: The total size of the structure
-+ * @flags: Connection flags (KDBUS_HELLO_*), userspace → kernel
-+ * @return_flags: Command return flags, kernel → userspace
-+ * @attach_flags_send: Mask of metadata to attach to each message sent
-+ * off by this connection (KDBUS_ATTACH_*)
-+ * @attach_flags_recv: Mask of metadata to attach to each message received
-+ * by the new connection (KDBUS_ATTACH_*)
-+ * @bus_flags: The flags field copied verbatim from the original
-+ * KDBUS_CMD_BUS_MAKE ioctl. It's intended to be useful
-+ * to do negotiation of features of the payload that is
-+ * transferred (kernel → userspace)
-+ * @id: The ID of this connection (kernel → userspace)
-+ * @pool_size: Size of the connection's buffer where the received
-+ * messages are placed
-+ * @offset: Pool offset where items are returned to report
-+ * additional information about the bus and the newly
-+ * created connection.
-+ * @items_size: Size of buffer returned in the pool slice at @offset.
-+ * @id128: Unique 128-bit ID of the bus (kernel → userspace)
-+ * @items: A list of items
-+ *
-+ * This struct is used with the KDBUS_CMD_HELLO ioctl.
-+ */
-+struct kdbus_cmd_hello {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 attach_flags_send;
-+ __u64 attach_flags_recv;
-+ __u64 bus_flags;
-+ __u64 id;
-+ __u64 pool_size;
-+ __u64 offset;
-+ __u64 items_size;
-+ __u8 id128[16];
-+ struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_info - connection information
-+ * @size: total size of the struct
-+ * @id: 64bit object ID
-+ * @flags: object creation flags
-+ * @items: list of items
-+ *
-+ * Note that the user is responsible for freeing the allocated memory with
-+ * the KDBUS_CMD_FREE ioctl.
-+ */
-+struct kdbus_info {
-+ __u64 size;
-+ __u64 id;
-+ __u64 flags;
-+ struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_list_flags - what to include into the returned list
-+ * @KDBUS_LIST_UNIQUE: active connections
-+ * @KDBUS_LIST_ACTIVATORS: activator connections
-+ * @KDBUS_LIST_NAMES: known well-known names
-+ * @KDBUS_LIST_QUEUED: queued-up names
-+ */
-+enum kdbus_list_flags {
-+ KDBUS_LIST_UNIQUE = 1ULL << 0,
-+ KDBUS_LIST_NAMES = 1ULL << 1,
-+ KDBUS_LIST_ACTIVATORS = 1ULL << 2,
-+ KDBUS_LIST_QUEUED = 1ULL << 3,
-+};
-+
-+/**
-+ * struct kdbus_cmd_list - list connections
-+ * @size: overall size of this object
-+ * @flags: flags for the query (KDBUS_LIST_*), userspace → kernel
-+ * @return_flags: command return flags, kernel → userspace
-+ * @offset: Offset in the caller's pool buffer where an array of
-+ * kdbus_info objects is stored.
-+ * The user must use KDBUS_CMD_FREE to free the
-+ * allocated memory.
-+ * @list_size: size of returned list in bytes
-+ * @items: Items for the command. Reserved for future use.
-+ *
-+ * This structure is used with the KDBUS_CMD_LIST ioctl.
-+ */
-+struct kdbus_cmd_list {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 offset;
-+ __u64 list_size;
-+ struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * struct kdbus_cmd_info - struct used for KDBUS_CMD_CONN_INFO ioctl
-+ * @size: The total size of the struct
-+ * @flags: Flags for this ioctl, userspace → kernel
-+ * @return_flags: Command return flags, kernel → userspace
-+ * @id: The 64-bit ID of the connection. If set to zero, passing
-+ * @name is required. kdbus will look up the name to
-+ * determine the ID in this case.
-+ * @attach_flags: Set of attach flags to specify the set of information
-+ * to receive, userspace → kernel
-+ * @offset: Returned offset in the caller's pool buffer where the
-+ * kdbus_info struct result is stored. The user must
-+ * use KDBUS_CMD_FREE to free the allocated memory.
-+ * @info_size: Output buffer to report size of data at @offset.
-+ * @items: The optional item list, containing the
-+ * well-known name to look up as a KDBUS_ITEM_NAME.
-+ * Only needed in case @id is zero.
-+ *
-+ * On success, the KDBUS_CMD_CONN_INFO ioctl will return 0 and @offset will
-+ * tell the user the offset in the connection pool buffer at which to find the
-+ * result in a struct kdbus_info.
-+ */
-+struct kdbus_cmd_info {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 id;
-+ __u64 attach_flags;
-+ __u64 offset;
-+ __u64 info_size;
-+ struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_cmd_match_flags - flags to control the KDBUS_CMD_MATCH_ADD ioctl
-+ * @KDBUS_MATCH_REPLACE: If entries with the supplied cookie already
-+ * exist, remove them before installing the new
-+ * matches.
-+ */
-+enum kdbus_cmd_match_flags {
-+ KDBUS_MATCH_REPLACE = 1ULL << 0,
-+};
-+
-+/**
-+ * struct kdbus_cmd_match - struct to add or remove matches
-+ * @size: The total size of the struct
-+ * @flags: Flags for match command (KDBUS_MATCH_*),
-+ * userspace → kernel
-+ * @return_flags: Command return flags, kernel → userspace
-+ * @cookie: Userspace supplied cookie. When removing, the cookie
-+ * identifies the match to remove
-+ * @items: A list of items for additional information
-+ *
-+ * This structure is used with the KDBUS_CMD_MATCH_ADD and
-+ * KDBUS_CMD_MATCH_REMOVE ioctl.
-+ */
-+struct kdbus_cmd_match {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ __u64 cookie;
-+ struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * enum kdbus_make_flags - Flags for KDBUS_CMD_{BUS,ENDPOINT}_MAKE
-+ * @KDBUS_MAKE_ACCESS_GROUP: Make the bus or endpoint node group-accessible
-+ * @KDBUS_MAKE_ACCESS_WORLD: Make the bus or endpoint node world-accessible
-+ */
-+enum kdbus_make_flags {
-+ KDBUS_MAKE_ACCESS_GROUP = 1ULL << 0,
-+ KDBUS_MAKE_ACCESS_WORLD = 1ULL << 1,
-+};
-+
-+/**
-+ * enum kdbus_name_flags - flags for KDBUS_CMD_NAME_ACQUIRE
-+ * @KDBUS_NAME_REPLACE_EXISTING: Try to replace name of other connections
-+ * @KDBUS_NAME_ALLOW_REPLACEMENT: Allow the replacement of the name
-+ * @KDBUS_NAME_QUEUE: Name should be queued if busy
-+ * @KDBUS_NAME_IN_QUEUE: Name is queued
-+ * @KDBUS_NAME_ACTIVATOR: Name is owned by an activator connection
-+ * @KDBUS_NAME_PRIMARY: Primary owner of the name
-+ * @KDBUS_NAME_ACQUIRED: Name was acquired/queued _now_
-+ */
-+enum kdbus_name_flags {
-+ KDBUS_NAME_REPLACE_EXISTING = 1ULL << 0,
-+ KDBUS_NAME_ALLOW_REPLACEMENT = 1ULL << 1,
-+ KDBUS_NAME_QUEUE = 1ULL << 2,
-+ KDBUS_NAME_IN_QUEUE = 1ULL << 3,
-+ KDBUS_NAME_ACTIVATOR = 1ULL << 4,
-+ KDBUS_NAME_PRIMARY = 1ULL << 5,
-+ KDBUS_NAME_ACQUIRED = 1ULL << 6,
-+};
-+
-+/**
-+ * struct kdbus_cmd - generic ioctl payload
-+ * @size: Overall size of this structure
-+ * @flags: Flags for this ioctl, userspace → kernel
-+ * @return_flags: Ioctl return flags, kernel → userspace
-+ * @items: Additional items to modify the behavior
-+ *
-+ * This is a generic ioctl payload object. It's used by all ioctls that only
-+ * take flags and items as input.
-+ */
-+struct kdbus_cmd {
-+ __u64 size;
-+ __u64 flags;
-+ __u64 return_flags;
-+ struct kdbus_item items[0];
-+} __attribute__((__aligned__(8)));
-+
-+/**
-+ * Ioctl API
-+ *
-+ * KDBUS_CMD_BUS_MAKE: After opening the "control" node, this command
-+ * creates a new bus with the specified
-+ * name. The bus is immediately shut down and
-+ * cleaned up when the opened file descriptor is
-+ * closed.
-+ *
-+ * KDBUS_CMD_ENDPOINT_MAKE: Creates a new named special endpoint to talk to
-+ * the bus. Such endpoints usually carry a more
-+ * restrictive policy and grant restricted access
-+ * to specific applications.
-+ * KDBUS_CMD_ENDPOINT_UPDATE: Update the properties of a custom endpoint. Used
-+ * to update the policy.
-+ *
-+ * KDBUS_CMD_HELLO: By opening the bus node, a connection is
-+ * created. After a HELLO the opened connection
-+ * becomes an active peer on the bus.
-+ * KDBUS_CMD_UPDATE: Update the properties of a connection. Used to
-+ * update the metadata subscription mask and
-+ * policy.
-+ * KDBUS_CMD_BYEBYE: Disconnect a connection. If there are no
-+ * messages queued up in the connection's pool,
-+ * the call succeeds, and the handle is rendered
-+ * unusable. Otherwise, -EBUSY is returned without
-+ * any further side-effects.
-+ * KDBUS_CMD_FREE: Release the allocated memory in the receiver's
-+ * pool.
-+ * KDBUS_CMD_CONN_INFO: Retrieve credentials and properties of the
-+ * initial creator of the connection. The data was
-+ * stored at registration time and does not
-+ * necessarily represent the connected process or
-+ * the actual state of the process.
-+ * KDBUS_CMD_BUS_CREATOR_INFO: Retrieve information of the creator of the bus
-+ * a connection is attached to.
-+ *
-+ * KDBUS_CMD_SEND: Send a message and pass data from userspace to
-+ * the kernel.
-+ * KDBUS_CMD_RECV: Receive a message from the kernel which is
-+ * placed in the receiver's pool.
-+ *
-+ * KDBUS_CMD_NAME_ACQUIRE: Request a well-known bus name to associate with
-+ * the connection. Well-known names are used to
-+ * address a peer on the bus.
-+ * KDBUS_CMD_NAME_RELEASE: Release a well-known name the connection
-+ * currently owns.
-+ * KDBUS_CMD_LIST: Retrieve the list of all currently registered
-+ * well-known and unique names.
-+ *
-+ * KDBUS_CMD_MATCH_ADD: Install a match that selects which broadcast
-+ * messages should be delivered to the connection.
-+ * KDBUS_CMD_MATCH_REMOVE: Remove a current match for broadcast messages.
-+ */
-+enum kdbus_ioctl_type {
-+ /* bus owner (00-0f) */
-+ KDBUS_CMD_BUS_MAKE = _IOW(KDBUS_IOCTL_MAGIC, 0x00,
-+ struct kdbus_cmd),
-+
-+ /* endpoint owner (10-1f) */
-+ KDBUS_CMD_ENDPOINT_MAKE = _IOW(KDBUS_IOCTL_MAGIC, 0x10,
-+ struct kdbus_cmd),
-+ KDBUS_CMD_ENDPOINT_UPDATE = _IOW(KDBUS_IOCTL_MAGIC, 0x11,
-+ struct kdbus_cmd),
-+
-+ /* connection owner (80-ff) */
-+ KDBUS_CMD_HELLO = _IOWR(KDBUS_IOCTL_MAGIC, 0x80,
-+ struct kdbus_cmd_hello),
-+ KDBUS_CMD_UPDATE = _IOW(KDBUS_IOCTL_MAGIC, 0x81,
-+ struct kdbus_cmd),
-+ KDBUS_CMD_BYEBYE = _IOW(KDBUS_IOCTL_MAGIC, 0x82,
-+ struct kdbus_cmd),
-+ KDBUS_CMD_FREE = _IOW(KDBUS_IOCTL_MAGIC, 0x83,
-+ struct kdbus_cmd_free),
-+ KDBUS_CMD_CONN_INFO = _IOR(KDBUS_IOCTL_MAGIC, 0x84,
-+ struct kdbus_cmd_info),
-+ KDBUS_CMD_BUS_CREATOR_INFO = _IOR(KDBUS_IOCTL_MAGIC, 0x85,
-+ struct kdbus_cmd_info),
-+ KDBUS_CMD_LIST = _IOR(KDBUS_IOCTL_MAGIC, 0x86,
-+ struct kdbus_cmd_list),
-+
-+ KDBUS_CMD_SEND = _IOW(KDBUS_IOCTL_MAGIC, 0x90,
-+ struct kdbus_cmd_send),
-+ KDBUS_CMD_RECV = _IOR(KDBUS_IOCTL_MAGIC, 0x91,
-+ struct kdbus_cmd_recv),
-+
-+ KDBUS_CMD_NAME_ACQUIRE = _IOW(KDBUS_IOCTL_MAGIC, 0xa0,
-+ struct kdbus_cmd),
-+ KDBUS_CMD_NAME_RELEASE = _IOW(KDBUS_IOCTL_MAGIC, 0xa1,
-+ struct kdbus_cmd),
-+
-+ KDBUS_CMD_MATCH_ADD = _IOW(KDBUS_IOCTL_MAGIC, 0xb0,
-+ struct kdbus_cmd_match),
-+ KDBUS_CMD_MATCH_REMOVE = _IOW(KDBUS_IOCTL_MAGIC, 0xb1,
-+ struct kdbus_cmd_match),
-+};
-+
-+#endif /* _UAPI_KDBUS_H_ */
-diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
-index 7b1425a..ce2ac5a 100644
---- a/include/uapi/linux/magic.h
-+++ b/include/uapi/linux/magic.h
-@@ -76,4 +76,6 @@
- #define BTRFS_TEST_MAGIC 0x73727279
- #define NSFS_MAGIC 0x6e736673
-
-+#define KDBUS_SUPER_MAGIC 0x44427573
-+
- #endif /* __LINUX_MAGIC_H__ */
-diff --git a/init/Kconfig b/init/Kconfig
-index dc24dec..9388071 100644
---- a/init/Kconfig
-+++ b/init/Kconfig
-@@ -261,6 +261,19 @@ config POSIX_MQUEUE_SYSCTL
- depends on SYSCTL
- default y
-
-+config KDBUS
-+ tristate "kdbus interprocess communication"
-+ depends on TMPFS
-+ help
-+ D-Bus is a system for low-latency, low-overhead, easy to use
-+ interprocess communication (IPC).
-+
-+ See the man-pages and HTML files in Documentation/kdbus/
-+ that are generated by 'make mandocs' and 'make htmldocs'.
-+
-+ If you have an ordinary machine, select M here. The module
-+ will be called kdbus.
-+
- config CROSS_MEMORY_ATTACH
- bool "Enable process_vm_readv/writev syscalls"
- depends on MMU
-diff --git a/ipc/Makefile b/ipc/Makefile
-index 86c7300..68ec416 100644
---- a/ipc/Makefile
-+++ b/ipc/Makefile
-@@ -9,4 +9,4 @@ obj_mq-$(CONFIG_COMPAT) += compat_mq.o
- obj-$(CONFIG_POSIX_MQUEUE) += mqueue.o msgutil.o $(obj_mq-y)
- obj-$(CONFIG_IPC_NS) += namespace.o
- obj-$(CONFIG_POSIX_MQUEUE_SYSCTL) += mq_sysctl.o
--
-+obj-$(CONFIG_KDBUS) += kdbus/
-diff --git a/ipc/kdbus/Makefile b/ipc/kdbus/Makefile
-new file mode 100644
-index 0000000..66663a1
---- /dev/null
-+++ b/ipc/kdbus/Makefile
-@@ -0,0 +1,33 @@
-+#
-+# By setting KDBUS_EXT=2, the kdbus module will be built as kdbus2.ko, and
-+# KBUILD_MODNAME=kdbus2. This has the effect that all exported objects have
-+# different names than usual (kdbus2fs, /sys/fs/kdbus2/) and you can run
-+# your test-infrastructure against the kdbus2.ko, while running your system
-+# on kdbus.ko.
-+#
-+# To just build the module, use:
-+# make KDBUS_EXT=2 M=ipc/kdbus
-+#
-+
-+kdbus$(KDBUS_EXT)-y := \
-+ bus.o \
-+ connection.o \
-+ endpoint.o \
-+ fs.o \
-+ handle.o \
-+ item.o \
-+ main.o \
-+ match.o \
-+ message.o \
-+ metadata.o \
-+ names.o \
-+ node.o \
-+ notify.o \
-+ domain.o \
-+ policy.o \
-+ pool.o \
-+ reply.o \
-+ queue.o \
-+ util.o
-+
-+obj-$(CONFIG_KDBUS) += kdbus$(KDBUS_EXT).o
-diff --git a/ipc/kdbus/bus.c b/ipc/kdbus/bus.c
-new file mode 100644
-index 0000000..a67f825
---- /dev/null
-+++ b/ipc/kdbus/bus.c
-@@ -0,0 +1,514 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/fs.h>
-+#include <linux/hashtable.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+#include <linux/random.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <linux/uio.h>
-+
-+#include "bus.h"
-+#include "notify.h"
-+#include "connection.h"
-+#include "domain.h"
-+#include "endpoint.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "match.h"
-+#include "message.h"
-+#include "metadata.h"
-+#include "names.h"
-+#include "policy.h"
-+#include "util.h"
-+
-+static void kdbus_bus_free(struct kdbus_node *node)
-+{
-+ struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
-+
-+ WARN_ON(!list_empty(&bus->monitors_list));
-+ WARN_ON(!hash_empty(bus->conn_hash));
-+
-+ kdbus_notify_free(bus);
-+
-+ kdbus_user_unref(bus->creator);
-+ kdbus_name_registry_free(bus->name_registry);
-+ kdbus_domain_unref(bus->domain);
-+ kdbus_policy_db_clear(&bus->policy_db);
-+ kdbus_meta_proc_unref(bus->creator_meta);
-+ kfree(bus);
-+}
-+
-+static void kdbus_bus_release(struct kdbus_node *node, bool was_active)
-+{
-+ struct kdbus_bus *bus = container_of(node, struct kdbus_bus, node);
-+
-+ if (was_active)
-+ atomic_dec(&bus->creator->buses);
-+}
-+
-+static struct kdbus_bus *kdbus_bus_new(struct kdbus_domain *domain,
-+ const char *name,
-+ struct kdbus_bloom_parameter *bloom,
-+ const u64 *pattach_owner,
-+ u64 flags, kuid_t uid, kgid_t gid)
-+{
-+ struct kdbus_bus *b;
-+ u64 attach_owner;
-+ int ret;
-+
-+ if (bloom->size < 8 || bloom->size > KDBUS_BUS_BLOOM_MAX_SIZE ||
-+ !KDBUS_IS_ALIGNED8(bloom->size) || bloom->n_hash < 1)
-+ return ERR_PTR(-EINVAL);
-+
-+ ret = kdbus_sanitize_attach_flags(pattach_owner ? *pattach_owner : 0,
-+ &attach_owner);
-+ if (ret < 0)
-+ return ERR_PTR(ret);
-+
-+ ret = kdbus_verify_uid_prefix(name, domain->user_namespace, uid);
-+ if (ret < 0)
-+ return ERR_PTR(ret);
-+
-+ b = kzalloc(sizeof(*b), GFP_KERNEL);
-+ if (!b)
-+ return ERR_PTR(-ENOMEM);
-+
-+ kdbus_node_init(&b->node, KDBUS_NODE_BUS);
-+
-+ b->node.free_cb = kdbus_bus_free;
-+ b->node.release_cb = kdbus_bus_release;
-+ b->node.uid = uid;
-+ b->node.gid = gid;
-+ b->node.mode = S_IRUSR | S_IXUSR;
-+
-+ if (flags & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
-+ b->node.mode |= S_IRGRP | S_IXGRP;
-+ if (flags & KDBUS_MAKE_ACCESS_WORLD)
-+ b->node.mode |= S_IROTH | S_IXOTH;
-+
-+ b->id = atomic64_inc_return(&domain->last_id);
-+ b->bus_flags = flags;
-+ b->attach_flags_owner = attach_owner;
-+ generate_random_uuid(b->id128);
-+ b->bloom = *bloom;
-+ b->domain = kdbus_domain_ref(domain);
-+
-+ kdbus_policy_db_init(&b->policy_db);
-+
-+ init_rwsem(&b->conn_rwlock);
-+ hash_init(b->conn_hash);
-+ INIT_LIST_HEAD(&b->monitors_list);
-+
-+ INIT_LIST_HEAD(&b->notify_list);
-+ spin_lock_init(&b->notify_lock);
-+ mutex_init(&b->notify_flush_lock);
-+
-+ ret = kdbus_node_link(&b->node, &domain->node, name);
-+ if (ret < 0)
-+ goto exit_unref;
-+
-+ /* cache the metadata/credentials of the creator */
-+ b->creator_meta = kdbus_meta_proc_new();
-+ if (IS_ERR(b->creator_meta)) {
-+ ret = PTR_ERR(b->creator_meta);
-+ b->creator_meta = NULL;
-+ goto exit_unref;
-+ }
-+
-+ ret = kdbus_meta_proc_collect(b->creator_meta,
-+ KDBUS_ATTACH_CREDS |
-+ KDBUS_ATTACH_PIDS |
-+ KDBUS_ATTACH_AUXGROUPS |
-+ KDBUS_ATTACH_TID_COMM |
-+ KDBUS_ATTACH_PID_COMM |
-+ KDBUS_ATTACH_EXE |
-+ KDBUS_ATTACH_CMDLINE |
-+ KDBUS_ATTACH_CGROUP |
-+ KDBUS_ATTACH_CAPS |
-+ KDBUS_ATTACH_SECLABEL |
-+ KDBUS_ATTACH_AUDIT);
-+ if (ret < 0)
-+ goto exit_unref;
-+
-+ b->name_registry = kdbus_name_registry_new();
-+ if (IS_ERR(b->name_registry)) {
-+ ret = PTR_ERR(b->name_registry);
-+ b->name_registry = NULL;
-+ goto exit_unref;
-+ }
-+
-+ /*
-+ * Bus-limits of the creator are accounted on its real UID, just like
-+ * all other per-user limits.
-+ */
-+ b->creator = kdbus_user_lookup(domain, current_uid());
-+ if (IS_ERR(b->creator)) {
-+ ret = PTR_ERR(b->creator);
-+ b->creator = NULL;
-+ goto exit_unref;
-+ }
-+
-+ return b;
-+
-+exit_unref:
-+ kdbus_node_deactivate(&b->node);
-+ kdbus_node_unref(&b->node);
-+ return ERR_PTR(ret);
-+}
-+
-+/**
-+ * kdbus_bus_ref() - increase the reference counter of a kdbus_bus
-+ * @bus: The bus to reference
-+ *
-+ * Every user of a bus, except for its creator, must add a reference to the
-+ * kdbus_bus using this function.
-+ *
-+ * Return: the bus itself
-+ */
-+struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus)
-+{
-+ if (bus)
-+ kdbus_node_ref(&bus->node);
-+ return bus;
-+}
-+
-+/**
-+ * kdbus_bus_unref() - decrease the reference counter of a kdbus_bus
-+ * @bus: The bus to unref
-+ *
-+ * Release a reference. If the reference count drops to 0, the bus will be
-+ * freed.
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus)
-+{
-+ if (bus)
-+ kdbus_node_unref(&bus->node);
-+ return NULL;
-+}
-+
-+/**
-+ * kdbus_bus_find_conn_by_id() - find a connection with a given id
-+ * @bus: The bus to look for the connection
-+ * @id: The 64-bit connection id
-+ *
-+ * Looks up a connection with a given id. The returned connection
-+ * is ref'ed, and needs to be unref'ed by the user. Returns NULL if
-+ * the connection can't be found.
-+ */
-+struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id)
-+{
-+ struct kdbus_conn *conn, *found = NULL;
-+
-+ down_read(&bus->conn_rwlock);
-+ hash_for_each_possible(bus->conn_hash, conn, hentry, id)
-+ if (conn->id == id) {
-+ found = kdbus_conn_ref(conn);
-+ break;
-+ }
-+ up_read(&bus->conn_rwlock);
-+
-+ return found;
-+}
-+
-+/**
-+ * kdbus_bus_broadcast() - send a message to all subscribed connections
-+ * @bus: The bus the connections are connected to
-+ * @conn_src: The source connection, may be %NULL for kernel notifications
-+ * @staging: Staging object containing the message to send
-+ *
-+ * Send message to all connections that are currently active on the bus.
-+ * Connections must still have matches installed in order to let the message
-+ * pass.
-+ *
-+ * The caller must hold the name-registry lock of @bus.
-+ */
-+void kdbus_bus_broadcast(struct kdbus_bus *bus,
-+ struct kdbus_conn *conn_src,
-+ struct kdbus_staging *staging)
-+{
-+ struct kdbus_conn *conn_dst;
-+ unsigned int i;
-+ int ret;
-+
-+ lockdep_assert_held(&bus->name_registry->rwlock);
-+
-+ /*
-+ * Make sure broadcasts are queued on monitors before we send them out to
-+ * anyone else. Otherwise, connections might react to broadcasts before
-+ * the monitor gets the broadcast queued. In the worst case, the
-+ * monitor sees a reaction to the broadcast before the broadcast itself.
-+ * We don't give ordering guarantees across connections (and monitors
-+ * can re-construct order via sequence numbers), but we should at least
-+ * try to avoid re-ordering for monitors.
-+ */
-+ kdbus_bus_eavesdrop(bus, conn_src, staging);
-+
-+ down_read(&bus->conn_rwlock);
-+ hash_for_each(bus->conn_hash, i, conn_dst, hentry) {
-+ if (!kdbus_conn_is_ordinary(conn_dst))
-+ continue;
-+
-+ /*
-+ * Check if there is a match for the kmsg object in
-+ * the destination connection match db
-+ */
-+ if (!kdbus_match_db_match_msg(conn_dst->match_db, conn_src,
-+ staging))
-+ continue;
-+
-+ if (conn_src) {
-+ /*
-+ * Anyone can send broadcasts, as they have no
-+ * destination. But a receiver needs TALK access to
-+ * the sender in order to receive broadcasts.
-+ */
-+ if (!kdbus_conn_policy_talk(conn_dst, NULL, conn_src))
-+ continue;
-+ } else {
-+ /*
-+ * Check if there is a policy db that prevents the
-+ * destination connection from receiving this kernel
-+ * notification
-+ */
-+ if (!kdbus_conn_policy_see_notification(conn_dst, NULL,
-+ staging->msg))
-+ continue;
-+ }
-+
-+ ret = kdbus_conn_entry_insert(conn_src, conn_dst, staging,
-+ NULL, NULL);
-+ if (ret < 0)
-+ kdbus_conn_lost_message(conn_dst);
-+ }
-+ up_read(&bus->conn_rwlock);
-+}
-+
-+/**
-+ * kdbus_bus_eavesdrop() - send a message to all subscribed monitors
-+ * @bus: The bus the monitors are connected to
-+ * @conn_src: The source connection, may be %NULL for kernel notifications
-+ * @staging: Staging object containing the message to send
-+ *
-+ * Send message to all monitors that are currently active on the bus. Monitors
-+ * must still have matches installed in order to let the message pass.
-+ *
-+ * The caller must hold the name-registry lock of @bus.
-+ */
-+void kdbus_bus_eavesdrop(struct kdbus_bus *bus,
-+ struct kdbus_conn *conn_src,
-+ struct kdbus_staging *staging)
-+{
-+ struct kdbus_conn *conn_dst;
-+ int ret;
-+
-+ /*
-+ * Monitor connections get all messages; ignore possible errors
-+ * when sending messages to monitor connections.
-+ */
-+
-+ lockdep_assert_held(&bus->name_registry->rwlock);
-+
-+ down_read(&bus->conn_rwlock);
-+ list_for_each_entry(conn_dst, &bus->monitors_list, monitor_entry) {
-+ ret = kdbus_conn_entry_insert(conn_src, conn_dst, staging,
-+ NULL, NULL);
-+ if (ret < 0)
-+ kdbus_conn_lost_message(conn_dst);
-+ }
-+ up_read(&bus->conn_rwlock);
-+}
-+
-+/**
-+ * kdbus_cmd_bus_make() - handle KDBUS_CMD_BUS_MAKE
-+ * @domain: domain to operate on
-+ * @argp: command payload
-+ *
-+ * Return: NULL or newly created bus on success, ERR_PTR on failure.
-+ */
-+struct kdbus_bus *kdbus_cmd_bus_make(struct kdbus_domain *domain,
-+ void __user *argp)
-+{
-+ struct kdbus_bus *bus = NULL;
-+ struct kdbus_cmd *cmd;
-+ struct kdbus_ep *ep = NULL;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ { .type = KDBUS_ITEM_MAKE_NAME, .mandatory = true },
-+ { .type = KDBUS_ITEM_BLOOM_PARAMETER, .mandatory = true },
-+ { .type = KDBUS_ITEM_ATTACH_FLAGS_SEND },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+ KDBUS_MAKE_ACCESS_GROUP |
-+ KDBUS_MAKE_ACCESS_WORLD,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret < 0)
-+ return ERR_PTR(ret);
-+ if (ret > 0)
-+ return NULL;
-+
-+ bus = kdbus_bus_new(domain,
-+ argv[1].item->str, &argv[2].item->bloom_parameter,
-+ argv[3].item ? argv[3].item->data64 : NULL,
-+ cmd->flags, current_euid(), current_egid());
-+ if (IS_ERR(bus)) {
-+ ret = PTR_ERR(bus);
-+ bus = NULL;
-+ goto exit;
-+ }
-+
-+ if (atomic_inc_return(&bus->creator->buses) > KDBUS_USER_MAX_BUSES) {
-+ atomic_dec(&bus->creator->buses);
-+ ret = -EMFILE;
-+ goto exit;
-+ }
-+
-+ if (!kdbus_node_activate(&bus->node)) {
-+ atomic_dec(&bus->creator->buses);
-+ ret = -ESHUTDOWN;
-+ goto exit;
-+ }
-+
-+ ep = kdbus_ep_new(bus, "bus", cmd->flags, bus->node.uid, bus->node.gid,
-+ false);
-+ if (IS_ERR(ep)) {
-+ ret = PTR_ERR(ep);
-+ ep = NULL;
-+ goto exit;
-+ }
-+
-+ if (!kdbus_node_activate(&ep->node)) {
-+ ret = -ESHUTDOWN;
-+ goto exit;
-+ }
-+
-+ /*
-+ * Drop our own reference, effectively causing the endpoint to be
-+ * deactivated and released when the parent bus is.
-+ */
-+ ep = kdbus_ep_unref(ep);
-+
-+exit:
-+ ret = kdbus_args_clear(&args, ret);
-+ if (ret < 0) {
-+ if (ep) {
-+ kdbus_node_deactivate(&ep->node);
-+ kdbus_ep_unref(ep);
-+ }
-+ if (bus) {
-+ kdbus_node_deactivate(&bus->node);
-+ kdbus_bus_unref(bus);
-+ }
-+ return ERR_PTR(ret);
-+ }
-+ return bus;
-+}
-+
-+/**
-+ * kdbus_cmd_bus_creator_info() - handle KDBUS_CMD_BUS_CREATOR_INFO
-+ * @conn: connection to operate on
-+ * @argp: command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn, void __user *argp)
-+{
-+ struct kdbus_cmd_info *cmd;
-+ struct kdbus_bus *bus = conn->ep->bus;
-+ struct kdbus_pool_slice *slice = NULL;
-+ struct kdbus_item *meta_items = NULL;
-+ struct kdbus_item_header item_hdr;
-+ struct kdbus_info info = {};
-+ size_t meta_size, name_len, cnt = 0;
-+ struct kvec kvec[6];
-+ u64 attach_flags, size = 0;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret != 0)
-+ return ret;
-+
-+ ret = kdbus_sanitize_attach_flags(cmd->attach_flags, &attach_flags);
-+ if (ret < 0)
-+ goto exit;
-+
-+ attach_flags &= bus->attach_flags_owner;
-+
-+ ret = kdbus_meta_emit(bus->creator_meta, NULL, NULL, conn,
-+ attach_flags, &meta_items, &meta_size);
-+ if (ret < 0)
-+ goto exit;
-+
-+ name_len = strlen(bus->node.name) + 1;
-+ info.id = bus->id;
-+ info.flags = bus->bus_flags;
-+ item_hdr.type = KDBUS_ITEM_MAKE_NAME;
-+ item_hdr.size = KDBUS_ITEM_HEADER_SIZE + name_len;
-+
-+ kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &size);
-+ kdbus_kvec_set(&kvec[cnt++], &item_hdr, sizeof(item_hdr), &size);
-+ kdbus_kvec_set(&kvec[cnt++], bus->node.name, name_len, &size);
-+ cnt += !!kdbus_kvec_pad(&kvec[cnt], &size);
-+ if (meta_size > 0) {
-+ kdbus_kvec_set(&kvec[cnt++], meta_items, meta_size, &size);
-+ cnt += !!kdbus_kvec_pad(&kvec[cnt], &size);
-+ }
-+
-+ info.size = size;
-+
-+ slice = kdbus_pool_slice_alloc(conn->pool, size, false);
-+ if (IS_ERR(slice)) {
-+ ret = PTR_ERR(slice);
-+ slice = NULL;
-+ goto exit;
-+ }
-+
-+ ret = kdbus_pool_slice_copy_kvec(slice, 0, kvec, cnt, size);
-+ if (ret < 0)
-+ goto exit;
-+
-+ kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->info_size);
-+
-+ if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
-+ kdbus_member_set_user(&cmd->info_size, argp,
-+ typeof(*cmd), info_size))
-+ ret = -EFAULT;
-+
-+exit:
-+ kdbus_pool_slice_release(slice);
-+ kfree(meta_items);
-+ return kdbus_args_clear(&args, ret);
-+}
-diff --git a/ipc/kdbus/bus.h b/ipc/kdbus/bus.h
-new file mode 100644
-index 0000000..8c2acae
---- /dev/null
-+++ b/ipc/kdbus/bus.h
-@@ -0,0 +1,101 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_BUS_H
-+#define __KDBUS_BUS_H
-+
-+#include <linux/hashtable.h>
-+#include <linux/list.h>
-+#include <linux/mutex.h>
-+#include <linux/rwsem.h>
-+#include <linux/spinlock.h>
-+#include <uapi/linux/kdbus.h>
-+
-+#include "metadata.h"
-+#include "names.h"
-+#include "node.h"
-+#include "policy.h"
-+
-+struct kdbus_conn;
-+struct kdbus_domain;
-+struct kdbus_staging;
-+struct kdbus_user;
-+
-+/**
-+ * struct kdbus_bus - bus in a domain
-+ * @node: kdbus_node
-+ * @id: ID of this bus in the domain
-+ * @bus_flags: Simple pass-through flags from userspace to userspace
-+ * @attach_flags_owner: KDBUS_ATTACH_* flags of bus creator that other
-+ * connections can see or query
-+ * @id128: Unique random 128 bit ID of this bus
-+ * @bloom: Bloom parameters
-+ * @domain: Domain of this bus
-+ * @creator: Creator of the bus
-+ * @creator_meta: Meta information about the bus creator
-+ * @last_message_id: Last used message id
-+ * @policy_db: Policy database for this bus
-+ * @name_registry: Name registry of this bus
-+ * @conn_rwlock: Read/Write lock for all lists of child connections
-+ * @conn_hash: Map of connection IDs
-+ * @monitors_list: Connections that monitor this bus
-+ * @notify_list: List of pending kernel-generated messages
-+ * @notify_lock: Notification list lock
-+ * @notify_flush_lock: Notification flushing lock
-+ */
-+struct kdbus_bus {
-+ struct kdbus_node node;
-+
-+ /* static */
-+ u64 id;
-+ u64 bus_flags;
-+ u64 attach_flags_owner;
-+ u8 id128[16];
-+ struct kdbus_bloom_parameter bloom;
-+ struct kdbus_domain *domain;
-+ struct kdbus_user *creator;
-+ struct kdbus_meta_proc *creator_meta;
-+
-+ /* protected by own locks */
-+ atomic64_t last_message_id;
-+ struct kdbus_policy_db policy_db;
-+ struct kdbus_name_registry *name_registry;
-+
-+ /* protected by conn_rwlock */
-+ struct rw_semaphore conn_rwlock;
-+ DECLARE_HASHTABLE(conn_hash, 8);
-+ struct list_head monitors_list;
-+
-+ /* protected by notify_lock */
-+ struct list_head notify_list;
-+ spinlock_t notify_lock;
-+ struct mutex notify_flush_lock;
-+};
-+
-+struct kdbus_bus *kdbus_bus_ref(struct kdbus_bus *bus);
-+struct kdbus_bus *kdbus_bus_unref(struct kdbus_bus *bus);
-+
-+struct kdbus_conn *kdbus_bus_find_conn_by_id(struct kdbus_bus *bus, u64 id);
-+void kdbus_bus_broadcast(struct kdbus_bus *bus,
-+ struct kdbus_conn *conn_src,
-+ struct kdbus_staging *staging);
-+void kdbus_bus_eavesdrop(struct kdbus_bus *bus,
-+ struct kdbus_conn *conn_src,
-+ struct kdbus_staging *staging);
-+
-+struct kdbus_bus *kdbus_cmd_bus_make(struct kdbus_domain *domain,
-+ void __user *argp);
-+int kdbus_cmd_bus_creator_info(struct kdbus_conn *conn, void __user *argp);
-+
-+#endif
-diff --git a/ipc/kdbus/connection.c b/ipc/kdbus/connection.c
-new file mode 100644
-index 0000000..ef63d65
---- /dev/null
-+++ b/ipc/kdbus/connection.c
-@@ -0,0 +1,2227 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/audit.h>
-+#include <linux/file.h>
-+#include <linux/fs.h>
-+#include <linux/fs_struct.h>
-+#include <linux/hashtable.h>
-+#include <linux/idr.h>
-+#include <linux/init.h>
-+#include <linux/math64.h>
-+#include <linux/mm.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/path.h>
-+#include <linux/poll.h>
-+#include <linux/sched.h>
-+#include <linux/shmem_fs.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/syscalls.h>
-+#include <linux/uio.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "endpoint.h"
-+#include "handle.h"
-+#include "match.h"
-+#include "message.h"
-+#include "metadata.h"
-+#include "names.h"
-+#include "domain.h"
-+#include "item.h"
-+#include "notify.h"
-+#include "policy.h"
-+#include "pool.h"
-+#include "reply.h"
-+#include "util.h"
-+#include "queue.h"
-+
-+#define KDBUS_CONN_ACTIVE_BIAS (INT_MIN + 2)
-+#define KDBUS_CONN_ACTIVE_NEW (INT_MIN + 1)
-+
-+static struct kdbus_conn *kdbus_conn_new(struct kdbus_ep *ep,
-+ struct file *file,
-+ struct kdbus_cmd_hello *hello,
-+ const char *name,
-+ const struct kdbus_creds *creds,
-+ const struct kdbus_pids *pids,
-+ const char *seclabel,
-+ const char *conn_description)
-+{
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+ static struct lock_class_key __key;
-+#endif
-+ struct kdbus_pool_slice *slice = NULL;
-+ struct kdbus_bus *bus = ep->bus;
-+ struct kdbus_conn *conn;
-+ u64 attach_flags_send;
-+ u64 attach_flags_recv;
-+ u64 items_size = 0;
-+ bool is_policy_holder;
-+ bool is_activator;
-+ bool is_monitor;
-+ bool privileged;
-+ bool owner;
-+ struct kvec kvec;
-+ int ret;
-+
-+ struct {
-+ u64 size;
-+ u64 type;
-+ struct kdbus_bloom_parameter bloom;
-+ } bloom_item;
-+
-+ privileged = kdbus_ep_is_privileged(ep, file);
-+ owner = kdbus_ep_is_owner(ep, file);
-+
-+ is_monitor = hello->flags & KDBUS_HELLO_MONITOR;
-+ is_activator = hello->flags & KDBUS_HELLO_ACTIVATOR;
-+ is_policy_holder = hello->flags & KDBUS_HELLO_POLICY_HOLDER;
-+
-+ if (!hello->pool_size || !IS_ALIGNED(hello->pool_size, PAGE_SIZE))
-+ return ERR_PTR(-EINVAL);
-+ if (is_monitor + is_activator + is_policy_holder > 1)
-+ return ERR_PTR(-EINVAL);
-+ if (name && !is_activator && !is_policy_holder)
-+ return ERR_PTR(-EINVAL);
-+ if (!name && (is_activator || is_policy_holder))
-+ return ERR_PTR(-EINVAL);
-+ if (name && !kdbus_name_is_valid(name, true))
-+ return ERR_PTR(-EINVAL);
-+ if (is_monitor && ep->user)
-+ return ERR_PTR(-EOPNOTSUPP);
-+ if (!owner && (is_activator || is_policy_holder || is_monitor))
-+ return ERR_PTR(-EPERM);
-+ if (!owner && (creds || pids || seclabel))
-+ return ERR_PTR(-EPERM);
-+
-+ ret = kdbus_sanitize_attach_flags(hello->attach_flags_send,
-+ &attach_flags_send);
-+ if (ret < 0)
-+ return ERR_PTR(ret);
-+
-+ ret = kdbus_sanitize_attach_flags(hello->attach_flags_recv,
-+ &attach_flags_recv);
-+ if (ret < 0)
-+ return ERR_PTR(ret);
-+
-+ conn = kzalloc(sizeof(*conn), GFP_KERNEL);
-+ if (!conn)
-+ return ERR_PTR(-ENOMEM);
-+
-+ kref_init(&conn->kref);
-+ atomic_set(&conn->active, KDBUS_CONN_ACTIVE_NEW);
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+ lockdep_init_map(&conn->dep_map, "s_active", &__key, 0);
-+#endif
-+ mutex_init(&conn->lock);
-+ INIT_LIST_HEAD(&conn->names_list);
-+ INIT_LIST_HEAD(&conn->reply_list);
-+ atomic_set(&conn->request_count, 0);
-+ atomic_set(&conn->lost_count, 0);
-+ INIT_DELAYED_WORK(&conn->work, kdbus_reply_list_scan_work);
-+ conn->cred = get_cred(file->f_cred);
-+ conn->pid = get_pid(task_pid(current));
-+ get_fs_root(current->fs, &conn->root_path);
-+ init_waitqueue_head(&conn->wait);
-+ kdbus_queue_init(&conn->queue);
-+ conn->privileged = privileged;
-+ conn->owner = owner;
-+ conn->ep = kdbus_ep_ref(ep);
-+ conn->id = atomic64_inc_return(&bus->domain->last_id);
-+ conn->flags = hello->flags;
-+ atomic64_set(&conn->attach_flags_send, attach_flags_send);
-+ atomic64_set(&conn->attach_flags_recv, attach_flags_recv);
-+ INIT_LIST_HEAD(&conn->monitor_entry);
-+
-+ if (conn_description) {
-+ conn->description = kstrdup(conn_description, GFP_KERNEL);
-+ if (!conn->description) {
-+ ret = -ENOMEM;
-+ goto exit_unref;
-+ }
-+ }
-+
-+ conn->pool = kdbus_pool_new(conn->description, hello->pool_size);
-+ if (IS_ERR(conn->pool)) {
-+ ret = PTR_ERR(conn->pool);
-+ conn->pool = NULL;
-+ goto exit_unref;
-+ }
-+
-+ conn->match_db = kdbus_match_db_new();
-+ if (IS_ERR(conn->match_db)) {
-+ ret = PTR_ERR(conn->match_db);
-+ conn->match_db = NULL;
-+ goto exit_unref;
-+ }
-+
-+ /* return properties of this connection to the caller */
-+ hello->bus_flags = bus->bus_flags;
-+ hello->id = conn->id;
-+
-+ BUILD_BUG_ON(sizeof(bus->id128) != sizeof(hello->id128));
-+ memcpy(hello->id128, bus->id128, sizeof(hello->id128));
-+
-+ /* privileged processes can impersonate somebody else */
-+ if (creds || pids || seclabel) {
-+ conn->meta_fake = kdbus_meta_fake_new();
-+ if (IS_ERR(conn->meta_fake)) {
-+ ret = PTR_ERR(conn->meta_fake);
-+ conn->meta_fake = NULL;
-+ goto exit_unref;
-+ }
-+
-+ ret = kdbus_meta_fake_collect(conn->meta_fake,
-+ creds, pids, seclabel);
-+ if (ret < 0)
-+ goto exit_unref;
-+ } else {
-+ conn->meta_proc = kdbus_meta_proc_new();
-+ if (IS_ERR(conn->meta_proc)) {
-+ ret = PTR_ERR(conn->meta_proc);
-+ conn->meta_proc = NULL;
-+ goto exit_unref;
-+ }
-+
-+ ret = kdbus_meta_proc_collect(conn->meta_proc,
-+ KDBUS_ATTACH_CREDS |
-+ KDBUS_ATTACH_PIDS |
-+ KDBUS_ATTACH_AUXGROUPS |
-+ KDBUS_ATTACH_TID_COMM |
-+ KDBUS_ATTACH_PID_COMM |
-+ KDBUS_ATTACH_EXE |
-+ KDBUS_ATTACH_CMDLINE |
-+ KDBUS_ATTACH_CGROUP |
-+ KDBUS_ATTACH_CAPS |
-+ KDBUS_ATTACH_SECLABEL |
-+ KDBUS_ATTACH_AUDIT);
-+ if (ret < 0)
-+ goto exit_unref;
-+ }
-+
-+ /*
-+ * Account the connection against the current user (UID), or for
-+ * custom endpoints use the anonymous user assigned to the endpoint.
-+ * Note that limits are always accounted against the real UID, not
-+ * the effective UID (cred->user always points to the accounting of
-+ * cred->uid, not cred->euid).
-+ * In case the caller is privileged, we allow changing the accounting
-+ * to the faked user.
-+ */
-+ if (ep->user) {
-+ conn->user = kdbus_user_ref(ep->user);
-+ } else {
-+ kuid_t uid;
-+
-+ if (conn->meta_fake && uid_valid(conn->meta_fake->uid) &&
-+ conn->privileged)
-+ uid = conn->meta_fake->uid;
-+ else
-+ uid = conn->cred->uid;
-+
-+ conn->user = kdbus_user_lookup(ep->bus->domain, uid);
-+ if (IS_ERR(conn->user)) {
-+ ret = PTR_ERR(conn->user);
-+ conn->user = NULL;
-+ goto exit_unref;
-+ }
-+ }
-+
-+ if (atomic_inc_return(&conn->user->connections) > KDBUS_USER_MAX_CONN) {
-+ /* decremented by destructor as conn->user is valid */
-+ ret = -EMFILE;
-+ goto exit_unref;
-+ }
-+
-+ bloom_item.size = sizeof(bloom_item);
-+ bloom_item.type = KDBUS_ITEM_BLOOM_PARAMETER;
-+ bloom_item.bloom = bus->bloom;
-+ kdbus_kvec_set(&kvec, &bloom_item, bloom_item.size, &items_size);
-+
-+ slice = kdbus_pool_slice_alloc(conn->pool, items_size, false);
-+ if (IS_ERR(slice)) {
-+ ret = PTR_ERR(slice);
-+ slice = NULL;
-+ goto exit_unref;
-+ }
-+
-+ ret = kdbus_pool_slice_copy_kvec(slice, 0, &kvec, 1, items_size);
-+ if (ret < 0)
-+ goto exit_unref;
-+
-+ kdbus_pool_slice_publish(slice, &hello->offset, &hello->items_size);
-+ kdbus_pool_slice_release(slice);
-+
-+ return conn;
-+
-+exit_unref:
-+ kdbus_pool_slice_release(slice);
-+ kdbus_conn_unref(conn);
-+ return ERR_PTR(ret);
-+}
-+
-+static void __kdbus_conn_free(struct kref *kref)
-+{
-+ struct kdbus_conn *conn = container_of(kref, struct kdbus_conn, kref);
-+
-+ WARN_ON(kdbus_conn_active(conn));
-+ WARN_ON(delayed_work_pending(&conn->work));
-+ WARN_ON(!list_empty(&conn->queue.msg_list));
-+ WARN_ON(!list_empty(&conn->names_list));
-+ WARN_ON(!list_empty(&conn->reply_list));
-+
-+ if (conn->user) {
-+ atomic_dec(&conn->user->connections);
-+ kdbus_user_unref(conn->user);
-+ }
-+
-+ kdbus_meta_fake_free(conn->meta_fake);
-+ kdbus_meta_proc_unref(conn->meta_proc);
-+ kdbus_match_db_free(conn->match_db);
-+ kdbus_pool_free(conn->pool);
-+ kdbus_ep_unref(conn->ep);
-+ path_put(&conn->root_path);
-+ put_pid(conn->pid);
-+ put_cred(conn->cred);
-+ kfree(conn->description);
-+ kfree(conn->quota);
-+ kfree(conn);
-+}
-+
-+/**
-+ * kdbus_conn_ref() - take a connection reference
-+ * @conn: Connection, may be %NULL
-+ *
-+ * Return: the connection itself
-+ */
-+struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn)
-+{
-+ if (conn)
-+ kref_get(&conn->kref);
-+ return conn;
-+}
-+
-+/**
-+ * kdbus_conn_unref() - drop a connection reference
-+ * @conn: Connection (may be NULL)
-+ *
-+ * When the last reference is dropped, the connection's internal structure
-+ * is freed.
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn)
-+{
-+ if (conn)
-+ kref_put(&conn->kref, __kdbus_conn_free);
-+ return NULL;
-+}
-+
-+/**
-+ * kdbus_conn_active() - connection is not disconnected
-+ * @conn: Connection to check
-+ *
-+ * Return true if the connection has not been disconnected yet. Note that a
-+ * connection might be disconnected asynchronously, unless you hold the
-+ * connection lock. If that's not suitable for you, see kdbus_conn_acquire() to
-+ * suppress connection shutdown for a short period.
-+ *
-+ * Return: true if the connection is still active
-+ */
-+bool kdbus_conn_active(const struct kdbus_conn *conn)
-+{
-+ return atomic_read(&conn->active) >= 0;
-+}
-+
-+/**
-+ * kdbus_conn_acquire() - acquire an active connection reference
-+ * @conn: Connection
-+ *
-+ * Users can close a connection via KDBUS_BYEBYE (or by destroying the
-+ * endpoint/bus/...) at any time. Whenever this happens, we should deny any
-+ * user-visible action on this connection and signal ECONNRESET instead.
-+ * To avoid testing for connection availability every time you take the
-+ * connection-lock, you can acquire a connection for short periods.
-+ *
-+ * By calling kdbus_conn_acquire(), you gain an "active reference" to the
-+ * connection. You must also hold a regular reference at any time! As long as
-+ * you hold the active-ref, the connection will not be shut down. However, if
-+ * the connection was shut down, you can never acquire an active-ref again.
-+ *
-+ * kdbus_conn_disconnect() disables the connection and then waits for all active
-+ * references to be dropped. It will also wake up any pending operation.
-+ * However, you must not sleep for an indefinite period while holding an
-+ * active-reference. Otherwise, kdbus_conn_disconnect() might stall. If you need
-+ * to sleep for an indefinite period, either release the reference and try to
-+ * acquire it again after waking up, or make kdbus_conn_disconnect() wake up
-+ * your wait-queue.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_conn_acquire(struct kdbus_conn *conn)
-+{
-+ if (!atomic_inc_unless_negative(&conn->active))
-+ return -ECONNRESET;
-+
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+ rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
-+#endif
-+
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_conn_release() - release an active connection reference
-+ * @conn: Connection
-+ *
-+ * This releases an active reference that has been acquired via
-+ * kdbus_conn_acquire(). If the connection was already disabled and this is the
-+ * last active-ref that is dropped, the disconnect-waiter will be woken up and
-+ * properly close the connection.
-+ */
-+void kdbus_conn_release(struct kdbus_conn *conn)
-+{
-+ int v;
-+
-+ if (!conn)
-+ return;
-+
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+ rwsem_release(&conn->dep_map, 1, _RET_IP_);
-+#endif
-+
-+ v = atomic_dec_return(&conn->active);
-+ if (v != KDBUS_CONN_ACTIVE_BIAS)
-+ return;
-+
-+ wake_up_all(&conn->wait);
-+}
-+
-+static int kdbus_conn_connect(struct kdbus_conn *conn, const char *name)
-+{
-+ struct kdbus_ep *ep = conn->ep;
-+ struct kdbus_bus *bus = ep->bus;
-+ int ret;
-+
-+ if (WARN_ON(atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_NEW))
-+ return -EALREADY;
-+
-+ /* make sure the ep-node is active while we add our connection */
-+ if (!kdbus_node_acquire(&ep->node))
-+ return -ESHUTDOWN;
-+
-+ /* lock order: domain -> bus -> ep -> names -> conn */
-+ mutex_lock(&ep->lock);
-+ down_write(&bus->conn_rwlock);
-+
-+ /* link into monitor list */
-+ if (kdbus_conn_is_monitor(conn))
-+ list_add_tail(&conn->monitor_entry, &bus->monitors_list);
-+
-+ /* link into bus and endpoint */
-+ list_add_tail(&conn->ep_entry, &ep->conn_list);
-+ hash_add(bus->conn_hash, &conn->hentry, conn->id);
-+
-+ /* enable lookups and acquire active ref */
-+ atomic_set(&conn->active, 1);
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+ rwsem_acquire_read(&conn->dep_map, 0, 1, _RET_IP_);
-+#endif
-+
-+ up_write(&bus->conn_rwlock);
-+ mutex_unlock(&ep->lock);
-+
-+ kdbus_node_release(&ep->node);
-+
-+ /*
-+ * Notify subscribers about the new active connection, unless it is
-+ * a monitor. Monitors are invisible on the bus, can't be addressed
-+ * directly, and won't cause any notifications.
-+ */
-+ if (!kdbus_conn_is_monitor(conn)) {
-+ ret = kdbus_notify_id_change(bus, KDBUS_ITEM_ID_ADD,
-+ conn->id, conn->flags);
-+ if (ret < 0)
-+ goto exit_disconnect;
-+ }
-+
-+ if (kdbus_conn_is_activator(conn)) {
-+ u64 flags = KDBUS_NAME_ACTIVATOR;
-+
-+ if (WARN_ON(!name)) {
-+ ret = -EINVAL;
-+ goto exit_disconnect;
-+ }
-+
-+ ret = kdbus_name_acquire(bus->name_registry, conn, name,
-+ flags, NULL);
-+ if (ret < 0)
-+ goto exit_disconnect;
-+ }
-+
-+ kdbus_conn_release(conn);
-+ kdbus_notify_flush(bus);
-+ return 0;
-+
-+exit_disconnect:
-+ kdbus_conn_release(conn);
-+ kdbus_conn_disconnect(conn, false);
-+ return ret;
-+}
-+
-+/**
-+ * kdbus_conn_disconnect() - disconnect a connection
-+ * @conn: The connection to disconnect
-+ * @ensure_queue_empty: Flag to indicate if the call should fail in
-+ * case the connection's message list is not
-+ * empty
-+ *
-+ * If @ensure_queue_empty is true, and the connection has pending messages,
-+ * -EBUSY is returned.
-+ *
-+ * Return: 0 on success, negative errno on failure
-+ */
-+int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty)
-+{
-+ struct kdbus_queue_entry *entry, *tmp;
-+ struct kdbus_bus *bus = conn->ep->bus;
-+ struct kdbus_reply *r, *r_tmp;
-+ struct kdbus_conn *c;
-+ int i, v;
-+
-+ mutex_lock(&conn->lock);
-+ v = atomic_read(&conn->active);
-+ if (v == KDBUS_CONN_ACTIVE_NEW) {
-+ /* was never connected */
-+ mutex_unlock(&conn->lock);
-+ return 0;
-+ }
-+ if (v < 0) {
-+ /* already dead */
-+ mutex_unlock(&conn->lock);
-+ return -ECONNRESET;
-+ }
-+ if (ensure_queue_empty && !list_empty(&conn->queue.msg_list)) {
-+ /* still busy */
-+ mutex_unlock(&conn->lock);
-+ return -EBUSY;
-+ }
-+
-+ atomic_add(KDBUS_CONN_ACTIVE_BIAS, &conn->active);
-+ mutex_unlock(&conn->lock);
-+
-+ wake_up_interruptible(&conn->wait);
-+
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+ rwsem_acquire(&conn->dep_map, 0, 0, _RET_IP_);
-+ if (atomic_read(&conn->active) != KDBUS_CONN_ACTIVE_BIAS)
-+ lock_contended(&conn->dep_map, _RET_IP_);
-+#endif
-+
-+ wait_event(conn->wait,
-+ atomic_read(&conn->active) == KDBUS_CONN_ACTIVE_BIAS);
-+
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+ lock_acquired(&conn->dep_map, _RET_IP_);
-+ rwsem_release(&conn->dep_map, 1, _RET_IP_);
-+#endif
-+
-+ cancel_delayed_work_sync(&conn->work);
-+ kdbus_policy_remove_owner(&conn->ep->bus->policy_db, conn);
-+
-+ /* lock order: domain -> bus -> ep -> names -> conn */
-+ mutex_lock(&conn->ep->lock);
-+ down_write(&bus->conn_rwlock);
-+
-+ /* remove from bus and endpoint */
-+ hash_del(&conn->hentry);
-+ list_del(&conn->monitor_entry);
-+ list_del(&conn->ep_entry);
-+
-+ up_write(&bus->conn_rwlock);
-+ mutex_unlock(&conn->ep->lock);
-+
-+ /*
-+ * Remove all names associated with this connection; this possibly
-+ * moves queued messages back to the activator connection.
-+ */
-+ kdbus_name_release_all(bus->name_registry, conn);
-+
-+ /* if we die while other connections wait for our reply, notify them */
-+ mutex_lock(&conn->lock);
-+ list_for_each_entry_safe(entry, tmp, &conn->queue.msg_list, entry) {
-+ if (entry->reply)
-+ kdbus_notify_reply_dead(bus,
-+ entry->reply->reply_dst->id,
-+ entry->reply->cookie);
-+ kdbus_queue_entry_free(entry);
-+ }
-+
-+ list_for_each_entry_safe(r, r_tmp, &conn->reply_list, entry)
-+ kdbus_reply_unlink(r);
-+ mutex_unlock(&conn->lock);
-+
-+ /* lock order: domain -> bus -> ep -> names -> conn */
-+ down_read(&bus->conn_rwlock);
-+ hash_for_each(bus->conn_hash, i, c, hentry) {
-+ mutex_lock(&c->lock);
-+ list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
-+ if (r->reply_src != conn)
-+ continue;
-+
-+ if (r->sync)
-+ kdbus_sync_reply_wakeup(r, -EPIPE);
-+ else
-+ /* send a 'connection dead' notification */
-+ kdbus_notify_reply_dead(bus, c->id, r->cookie);
-+
-+ kdbus_reply_unlink(r);
-+ }
-+ mutex_unlock(&c->lock);
-+ }
-+ up_read(&bus->conn_rwlock);
-+
-+ if (!kdbus_conn_is_monitor(conn))
-+ kdbus_notify_id_change(bus, KDBUS_ITEM_ID_REMOVE,
-+ conn->id, conn->flags);
-+
-+ kdbus_notify_flush(bus);
-+
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_conn_has_name() - check if a connection owns a name
-+ * @conn: Connection
-+ * @name: Well-known name to check for
-+ *
-+ * The caller must hold the registry lock of conn->ep->bus.
-+ *
-+ * Return: true if the name is currently owned by the connection
-+ */
-+bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name)
-+{
-+ struct kdbus_name_owner *owner;
-+
-+ lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
-+
-+ list_for_each_entry(owner, &conn->names_list, conn_entry)
-+ if (!(owner->flags & KDBUS_NAME_IN_QUEUE) &&
-+ !strcmp(name, owner->name->name))
-+ return true;
-+
-+ return false;
-+}
-+
-+struct kdbus_quota {
-+ u32 memory;
-+ u16 msgs;
-+ u8 fds;
-+};
-+
-+/**
-+ * kdbus_conn_quota_inc() - increase quota accounting
-+ * @c: connection owning the quota tracking
-+ * @u: user to account for (or NULL for kernel accounting)
-+ * @memory: size of memory to account for
-+ * @fds: number of FDs to account for
-+ *
-+ * This call manages the quotas on resource @c. That is, it's used if other
-+ * users want to use the resources of connection @c, which so far only concerns
-+ * the receive queue of the destination.
-+ *
-+ * This increases the quota-accounting for user @u by @memory bytes and @fds
-+ * file descriptors. If the user has already reached the quota limits, this call
-+ * will not do any accounting but return a negative error code indicating the
-+ * failure.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_conn_quota_inc(struct kdbus_conn *c, struct kdbus_user *u,
-+ size_t memory, size_t fds)
-+{
-+ struct kdbus_quota *quota;
-+ size_t available, accounted;
-+ unsigned int id;
-+
-+ /*
-+ * Pool Layout:
-+ * 50% of a pool is always owned by the connection. It is reserved for
-+ * kernel queries, handling received messages and other tasks that are
-+ * under control of the pool owner. The other 50% of the pool is used
-+ * as the incoming queue.
-+ * As we optionally support user-space based policies, we need fair
-+ * allocation schemes. Furthermore, resource utilization should be
-+ * maximized, so only minimal resources stay reserved. However, we need
-+ * to adapt to a dynamic number of users, as we cannot know how many
-+ * users will talk to a connection. Therefore, the current allocation
-+ * works like this:
-+ * We limit the number of bytes in a destination's pool per sending
-+ * user. The space available for a user is 33% of the unused pool space
-+ * (whereas the space used by the user itself is also treated as
-+ * 'unused'). This way, we favor users coming first, but keep enough
-+ * pool space available for any following users. Given that messages are
-+ * dequeued in FIFO order, this should balance nicely if the number of
-+ * users grows. At the same time, this algorithm guarantees that the
-+ * space available to a connection is reduced dynamically, the more
-+ * concurrent users talk to a connection.
-+ */
-+
-+ /* per user-accounting is expensive, so we keep state small */
-+ BUILD_BUG_ON(sizeof(quota->memory) != 4);
-+ BUILD_BUG_ON(sizeof(quota->msgs) != 2);
-+ BUILD_BUG_ON(sizeof(quota->fds) != 1);
-+ BUILD_BUG_ON(KDBUS_CONN_MAX_MSGS > U16_MAX);
-+ BUILD_BUG_ON(KDBUS_CONN_MAX_FDS_PER_USER > U8_MAX);
-+
-+ id = u ? u->id : KDBUS_USER_KERNEL_ID;
-+ if (id >= c->n_quota) {
-+ unsigned int users;
-+
-+ users = max(KDBUS_ALIGN8(id) + 8, id);
-+ quota = krealloc(c->quota, users * sizeof(*quota),
-+ GFP_KERNEL | __GFP_ZERO);
-+ if (!quota)
-+ return -ENOMEM;
-+
-+ c->n_quota = users;
-+ c->quota = quota;
-+ }
-+
-+ quota = &c->quota[id];
-+ kdbus_pool_accounted(c->pool, &available, &accounted);
-+
-+ /* half the pool is _always_ reserved for the pool owner */
-+ available /= 2;
-+
-+ /*
-+ * Pool owner slices are un-accounted slices; they can claim more
-+ * than 50% of the queue. However, the slices we're dealing with here
-+ * belong to the incoming queue, hence they are 'accounted' slices
-+ * to which the 50%-limit applies.
-+ */
-+ if (available < accounted)
-+ return -ENOBUFS;
-+
-+ /* 1/3 of the remaining space (including your own memory) */
-+ available = (available - accounted + quota->memory) / 3;
-+
-+ if (available < quota->memory ||
-+ available - quota->memory < memory ||
-+ quota->memory + memory > U32_MAX)
-+ return -ENOBUFS;
-+ if (quota->msgs >= KDBUS_CONN_MAX_MSGS)
-+ return -ENOBUFS;
-+ if (quota->fds + fds < quota->fds ||
-+ quota->fds + fds > KDBUS_CONN_MAX_FDS_PER_USER)
-+ return -EMFILE;
-+
-+ quota->memory += memory;
-+ quota->fds += fds;
-+ ++quota->msgs;
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_conn_quota_dec() - decrease quota accounting
-+ * @c: connection owning the quota tracking
-+ * @u: user which was accounted for (or NULL for kernel accounting)
-+ * @memory: size of memory which was accounted for
-+ * @fds: number of FDs which were accounted for
-+ *
-+ * This does the reverse of kdbus_conn_quota_inc(). You have to release any
-+ * accounted resources that you called kdbus_conn_quota_inc() for. However, you
-+ * must not call kdbus_conn_quota_dec() if the accounting failed (that is,
-+ * kdbus_conn_quota_inc() failed).
-+ */
-+void kdbus_conn_quota_dec(struct kdbus_conn *c, struct kdbus_user *u,
-+ size_t memory, size_t fds)
-+{
-+ struct kdbus_quota *quota;
-+ unsigned int id;
-+
-+ id = u ? u->id : KDBUS_USER_KERNEL_ID;
-+ if (WARN_ON(id >= c->n_quota))
-+ return;
-+
-+ quota = &c->quota[id];
-+
-+ if (!WARN_ON(quota->msgs == 0))
-+ --quota->msgs;
-+ if (!WARN_ON(quota->memory < memory))
-+ quota->memory -= memory;
-+ if (!WARN_ON(quota->fds < fds))
-+ quota->fds -= fds;
-+}
-+
-+/**
-+ * kdbus_conn_lost_message() - handle lost messages
-+ * @c: connection that lost a message
-+ *
-+ * kdbus is reliable. That means, we try hard to never lose messages. However,
-+ * memory is limited, so we cannot rely on transmissions to never fail.
-+ * Therefore, we use quota-limits to let callers know if their unicast message
-+ * cannot be transmitted to a peer. This works fine for unicasts, but for
-+ * broadcasts we cannot make the caller handle the transmission failure.
-+ * Instead, we must let the destination know that it couldn't receive a
-+ * broadcast.
-+ * As this is an unlikely scenario, we keep it simple. A single lost-counter
-+ * remembers the number of lost messages since the last call to RECV. The next
-+ * message retrieval will notify the connection that it lost messages since the
-+ * last message retrieval and thus should resync its state.
-+ */
-+void kdbus_conn_lost_message(struct kdbus_conn *c)
-+{
-+ if (atomic_inc_return(&c->lost_count) == 1)
-+ wake_up_interruptible(&c->wait);
-+}
-+
-+/* Callers should take the conn_dst lock */
-+static struct kdbus_queue_entry *
-+kdbus_conn_entry_make(struct kdbus_conn *conn_src,
-+ struct kdbus_conn *conn_dst,
-+ struct kdbus_staging *staging)
-+{
-+ /* The remote connection was disconnected */
-+ if (!kdbus_conn_active(conn_dst))
-+ return ERR_PTR(-ECONNRESET);
-+
-+ /*
-+ * If the connection does not accept file descriptors but the message
-+ * has some attached, refuse it.
-+ *
-+ * If this is a monitor connection, accept the message. In that
-+ * case, all file descriptors will be set to -1 at receive time.
-+ */
-+ if (!kdbus_conn_is_monitor(conn_dst) &&
-+ !(conn_dst->flags & KDBUS_HELLO_ACCEPT_FD) &&
-+ staging->gaps && staging->gaps->n_fds > 0)
-+ return ERR_PTR(-ECOMM);
-+
-+ return kdbus_queue_entry_new(conn_src, conn_dst, staging);
-+}
-+
-+/*
-+ * When synchronously responding to a message, allocate a queue entry
-+ * and attach it to the reply tracking object.
-+ * The connection's queue will never get to see it.
-+ */
-+static int kdbus_conn_entry_sync_attach(struct kdbus_conn *conn_dst,
-+ struct kdbus_staging *staging,
-+ struct kdbus_reply *reply_wake)
-+{
-+ struct kdbus_queue_entry *entry;
-+ int remote_ret, ret = 0;
-+
-+ mutex_lock(&reply_wake->reply_dst->lock);
-+
-+ /*
-+ * If we are still waiting then proceed, allocate a queue
-+ * entry and attach it to the reply object
-+ */
-+ if (reply_wake->waiting) {
-+ entry = kdbus_conn_entry_make(reply_wake->reply_src, conn_dst,
-+ staging);
-+ if (IS_ERR(entry))
-+ ret = PTR_ERR(entry);
-+ else
-+ /* Attach the entry to the reply object */
-+ reply_wake->queue_entry = entry;
-+ } else {
-+ ret = -ECONNRESET;
-+ }
-+
-+ /*
-+ * Update the reply object and wake up remote peer only
-+ * on appropriate return codes
-+ *
-+ * * -ECOMM: if the replying connection failed with -ECOMM
-+ * then wakeup remote peer with -EREMOTEIO
-+ *
-+ * We do this to differentiate between -ECOMM errors
-+ * from the original sender perspective:
-+ * -ECOMM error during the sync send and
-+ * -ECOMM error during the sync reply, this last
-+ * one is rewritten to -EREMOTEIO
-+ *
-+ * * Wake up on all other return codes.
-+ */
-+ remote_ret = ret;
-+
-+ if (ret == -ECOMM)
-+ remote_ret = -EREMOTEIO;
-+
-+ kdbus_sync_reply_wakeup(reply_wake, remote_ret);
-+ kdbus_reply_unlink(reply_wake);
-+ mutex_unlock(&reply_wake->reply_dst->lock);
-+
-+ return ret;
-+}
-+
-+/**
-+ * kdbus_conn_entry_insert() - enqueue a message into the receiver's pool
-+ * @conn_src: The sending connection
-+ * @conn_dst: The connection to queue into
-+ * @staging: Message to send
-+ * @reply: The reply tracker to attach to the queue entry
-+ * @name: Destination name this msg is sent to, or NULL
-+ *
-+ * Return: 0 on success, negative error otherwise.
-+ */
-+int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
-+ struct kdbus_conn *conn_dst,
-+ struct kdbus_staging *staging,
-+ struct kdbus_reply *reply,
-+ const struct kdbus_name_entry *name)
-+{
-+ struct kdbus_queue_entry *entry;
-+ int ret;
-+
-+ kdbus_conn_lock2(conn_src, conn_dst);
-+
-+ entry = kdbus_conn_entry_make(conn_src, conn_dst, staging);
-+ if (IS_ERR(entry)) {
-+ ret = PTR_ERR(entry);
-+ goto exit_unlock;
-+ }
-+
-+ if (reply) {
-+ kdbus_reply_link(reply);
-+ if (!reply->sync)
-+ schedule_delayed_work(&conn_src->work, 0);
-+ }
-+
-+ /*
-+ * Record the sequence number of the registered name; it will
-+ * be remembered by the queue, in case messages addressed to a
-+ * name need to be moved from or to an activator.
-+ */
-+ if (name)
-+ entry->dst_name_id = name->name_id;
-+
-+ kdbus_queue_entry_enqueue(entry, reply);
-+ wake_up_interruptible(&conn_dst->wait);
-+
-+ ret = 0;
-+
-+exit_unlock:
-+ kdbus_conn_unlock2(conn_src, conn_dst);
-+ return ret;
-+}
-+
-+static int kdbus_conn_wait_reply(struct kdbus_conn *conn_src,
-+ struct kdbus_cmd_send *cmd_send,
-+ struct file *ioctl_file,
-+ struct file *cancel_fd,
-+ struct kdbus_reply *reply_wait,
-+ ktime_t expire)
-+{
-+ struct kdbus_queue_entry *entry;
-+ struct poll_wqueues pwq = {};
-+ int ret;
-+
-+ if (WARN_ON(!reply_wait))
-+ return -EIO;
-+
-+ /*
-+ * Block until the reply arrives. reply_wait is left untouched
-+ * by the timeout scans that might be conducted for other,
-+ * asynchronous replies of conn_src.
-+ */
-+
-+ poll_initwait(&pwq);
-+ poll_wait(ioctl_file, &conn_src->wait, &pwq.pt);
-+
-+ for (;;) {
-+ /*
-+ * Any of the following conditions will stop our synchronously
-+ * blocking SEND command:
-+ *
-+ * a) The origin sender closed its connection
-+ * b) The remote peer answered, setting reply_wait->waiting = 0
-+ * c) The cancel FD was written to
-+ * d) A signal was received
-+ * e) The specified timeout was reached, and none of the above
-+ * conditions kicked in.
-+ */
-+
-+ /*
-+ * We have already acquired an active reference when
-+ * entering here, but another thread may call
-+ * KDBUS_CMD_BYEBYE which does not acquire an active
-+ * reference, therefore kdbus_conn_disconnect() will
-+ * not wait for us.
-+ */
-+ if (!kdbus_conn_active(conn_src)) {
-+ ret = -ECONNRESET;
-+ break;
-+ }
-+
-+ /*
-+ * After the replying peer unsets the waiting variable,
-+ * it will wake us up.
-+ */
-+ if (!reply_wait->waiting) {
-+ ret = reply_wait->err;
-+ break;
-+ }
-+
-+ if (cancel_fd) {
-+ unsigned int r;
-+
-+ r = cancel_fd->f_op->poll(cancel_fd, &pwq.pt);
-+ if (r & POLLIN) {
-+ ret = -ECANCELED;
-+ break;
-+ }
-+ }
-+
-+ if (signal_pending(current)) {
-+ ret = -EINTR;
-+ break;
-+ }
-+
-+ if (!poll_schedule_timeout(&pwq, TASK_INTERRUPTIBLE,
-+ &expire, 0)) {
-+ ret = -ETIMEDOUT;
-+ break;
-+ }
-+
-+ /*
-+ * Reset the poll worker func, so the waitqueues are not
-+ * added to the poll table again. We just reuse what we've
-+ * collected earlier for further iterations.
-+ */
-+ init_poll_funcptr(&pwq.pt, NULL);
-+ }
-+
-+ poll_freewait(&pwq);
-+
-+ if (ret == -EINTR) {
-+ /*
-+ * Interrupted system call. Unref the reply object, and pass
-+ * the return value down the chain. Mark the reply as
-+ * interrupted, so the cleanup work can remove it, but do not
-+ * unlink it from the list. Once the syscall restarts, we'll
-+ * pick it up and wait on it again.
-+ */
-+ mutex_lock(&conn_src->lock);
-+ reply_wait->interrupted = true;
-+ schedule_delayed_work(&conn_src->work, 0);
-+ mutex_unlock(&conn_src->lock);
-+
-+ return -ERESTARTSYS;
-+ }
-+
-+ mutex_lock(&conn_src->lock);
-+ reply_wait->waiting = false;
-+ entry = reply_wait->queue_entry;
-+ if (entry) {
-+ ret = kdbus_queue_entry_install(entry,
-+ &cmd_send->reply.return_flags,
-+ true);
-+ kdbus_pool_slice_publish(entry->slice, &cmd_send->reply.offset,
-+ &cmd_send->reply.msg_size);
-+ kdbus_queue_entry_free(entry);
-+ }
-+ kdbus_reply_unlink(reply_wait);
-+ mutex_unlock(&conn_src->lock);
-+
-+ return ret;
-+}
-+
-+static int kdbus_pin_dst(struct kdbus_bus *bus,
-+ struct kdbus_staging *staging,
-+ struct kdbus_name_entry **out_name,
-+ struct kdbus_conn **out_dst)
-+{
-+ const struct kdbus_msg *msg = staging->msg;
-+ struct kdbus_name_owner *owner = NULL;
-+ struct kdbus_name_entry *name = NULL;
-+ struct kdbus_conn *dst = NULL;
-+ int ret;
-+
-+ lockdep_assert_held(&bus->name_registry->rwlock);
-+
-+ if (!staging->dst_name) {
-+ dst = kdbus_bus_find_conn_by_id(bus, msg->dst_id);
-+ if (!dst)
-+ return -ENXIO;
-+
-+ if (!kdbus_conn_is_ordinary(dst)) {
-+ ret = -ENXIO;
-+ goto error;
-+ }
-+ } else {
-+ name = kdbus_name_lookup_unlocked(bus->name_registry,
-+ staging->dst_name);
-+ if (name)
-+ owner = kdbus_name_get_owner(name);
-+ if (!owner)
-+ return -ESRCH;
-+
-+ /*
-+ * If both a name and a connection ID are given as destination
-+ * of a message, check that the currently owning connection of
-+ * the name matches the specified ID.
-+ * This way, we allow userspace to send the message to a
-+ * specific connection by ID only if the connection currently
-+ * owns the given name.
-+ */
-+ if (msg->dst_id != KDBUS_DST_ID_NAME &&
-+ msg->dst_id != owner->conn->id)
-+ return -EREMCHG;
-+
-+ if ((msg->flags & KDBUS_MSG_NO_AUTO_START) &&
-+ kdbus_conn_is_activator(owner->conn))
-+ return -EADDRNOTAVAIL;
-+
-+ dst = kdbus_conn_ref(owner->conn);
-+ }
-+
-+ *out_name = name;
-+ *out_dst = dst;
-+ return 0;
-+
-+error:
-+ kdbus_conn_unref(dst);
-+ return ret;
-+}
-+
-+static int kdbus_conn_reply(struct kdbus_conn *src,
-+ struct kdbus_staging *staging)
-+{
-+ const struct kdbus_msg *msg = staging->msg;
-+ struct kdbus_name_entry *name = NULL;
-+ struct kdbus_reply *reply, *wake = NULL;
-+ struct kdbus_conn *dst = NULL;
-+ struct kdbus_bus *bus = src->ep->bus;
-+ int ret;
-+
-+ if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
-+ WARN_ON(msg->flags & KDBUS_MSG_EXPECT_REPLY) ||
-+ WARN_ON(msg->flags & KDBUS_MSG_SIGNAL))
-+ return -EINVAL;
-+
-+ /* name-registry must be locked for lookup *and* collecting data */
-+ down_read(&bus->name_registry->rwlock);
-+
-+ /* find and pin destination */
-+
-+ ret = kdbus_pin_dst(bus, staging, &name, &dst);
-+ if (ret < 0)
-+ goto exit;
-+
-+ mutex_lock(&dst->lock);
-+ reply = kdbus_reply_find(src, dst, msg->cookie_reply);
-+ if (reply) {
-+ if (reply->sync)
-+ wake = kdbus_reply_ref(reply);
-+ kdbus_reply_unlink(reply);
-+ }
-+ mutex_unlock(&dst->lock);
-+
-+ if (!reply) {
-+ ret = -EBADSLT;
-+ goto exit;
-+ }
-+
-+ /* send message */
-+
-+ kdbus_bus_eavesdrop(bus, src, staging);
-+
-+ if (wake)
-+ ret = kdbus_conn_entry_sync_attach(dst, staging, wake);
-+ else
-+ ret = kdbus_conn_entry_insert(src, dst, staging, NULL, name);
-+
-+exit:
-+ up_read(&bus->name_registry->rwlock);
-+ kdbus_reply_unref(wake);
-+ kdbus_conn_unref(dst);
-+ return ret;
-+}
-+
-+static struct kdbus_reply *kdbus_conn_call(struct kdbus_conn *src,
-+ struct kdbus_staging *staging,
-+ ktime_t exp)
-+{
-+ const struct kdbus_msg *msg = staging->msg;
-+ struct kdbus_name_entry *name = NULL;
-+ struct kdbus_reply *wait = NULL;
-+ struct kdbus_conn *dst = NULL;
-+ struct kdbus_bus *bus = src->ep->bus;
-+ int ret;
-+
-+ if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
-+ WARN_ON(msg->flags & KDBUS_MSG_SIGNAL) ||
-+ WARN_ON(!(msg->flags & KDBUS_MSG_EXPECT_REPLY)))
-+ return ERR_PTR(-EINVAL);
-+
-+ /* resume previous wait-context, if available */
-+
-+ mutex_lock(&src->lock);
-+ wait = kdbus_reply_find(NULL, src, msg->cookie);
-+ if (wait) {
-+ if (wait->interrupted) {
-+ kdbus_reply_ref(wait);
-+ wait->interrupted = false;
-+ } else {
-+ wait = NULL;
-+ }
-+ }
-+ mutex_unlock(&src->lock);
-+
-+ if (wait)
-+ return wait;
-+
-+ if (ktime_compare(ktime_get(), exp) >= 0)
-+ return ERR_PTR(-ETIMEDOUT);
-+
-+ /* name-registry must be locked for lookup *and* collecting data */
-+ down_read(&bus->name_registry->rwlock);
-+
-+ /* find and pin destination */
-+
-+ ret = kdbus_pin_dst(bus, staging, &name, &dst);
-+ if (ret < 0)
-+ goto exit;
-+
-+ if (!kdbus_conn_policy_talk(src, current_cred(), dst)) {
-+ ret = -EPERM;
-+ goto exit;
-+ }
-+
-+ wait = kdbus_reply_new(dst, src, msg, name, true);
-+ if (IS_ERR(wait)) {
-+ ret = PTR_ERR(wait);
-+ wait = NULL;
-+ goto exit;
-+ }
-+
-+ /* send message */
-+
-+ kdbus_bus_eavesdrop(bus, src, staging);
-+
-+ ret = kdbus_conn_entry_insert(src, dst, staging, wait, name);
-+ if (ret < 0)
-+ goto exit;
-+
-+ ret = 0;
-+
-+exit:
-+ up_read(&bus->name_registry->rwlock);
-+ if (ret < 0) {
-+ kdbus_reply_unref(wait);
-+ wait = ERR_PTR(ret);
-+ }
-+ kdbus_conn_unref(dst);
-+ return wait;
-+}
-+
-+static int kdbus_conn_unicast(struct kdbus_conn *src,
-+ struct kdbus_staging *staging)
-+{
-+ const struct kdbus_msg *msg = staging->msg;
-+ struct kdbus_name_entry *name = NULL;
-+ struct kdbus_reply *wait = NULL;
-+ struct kdbus_conn *dst = NULL;
-+ struct kdbus_bus *bus = src->ep->bus;
-+ bool is_signal = (msg->flags & KDBUS_MSG_SIGNAL);
-+ int ret = 0;
-+
-+ if (WARN_ON(msg->dst_id == KDBUS_DST_ID_BROADCAST) ||
-+ WARN_ON(!(msg->flags & KDBUS_MSG_EXPECT_REPLY) &&
-+ msg->cookie_reply != 0))
-+ return -EINVAL;
-+
-+ /* name-registry must be locked for lookup *and* collecting data */
-+ down_read(&bus->name_registry->rwlock);
-+
-+ /* find and pin destination */
-+
-+ ret = kdbus_pin_dst(bus, staging, &name, &dst);
-+ if (ret < 0)
-+ goto exit;
-+
-+ if (is_signal) {
-+ /* like broadcasts we eavesdrop even if the msg is dropped */
-+ kdbus_bus_eavesdrop(bus, src, staging);
-+
-+ /* drop silently if peer is not interested or not privileged */
-+ if (!kdbus_match_db_match_msg(dst->match_db, src, staging) ||
-+ !kdbus_conn_policy_talk(dst, NULL, src))
-+ goto exit;
-+ } else if (!kdbus_conn_policy_talk(src, current_cred(), dst)) {
-+ ret = -EPERM;
-+ goto exit;
-+ } else if (msg->flags & KDBUS_MSG_EXPECT_REPLY) {
-+ wait = kdbus_reply_new(dst, src, msg, name, false);
-+ if (IS_ERR(wait)) {
-+ ret = PTR_ERR(wait);
-+ wait = NULL;
-+ goto exit;
-+ }
-+ }
-+
-+ /* send message */
-+
-+ if (!is_signal)
-+ kdbus_bus_eavesdrop(bus, src, staging);
-+
-+ ret = kdbus_conn_entry_insert(src, dst, staging, wait, name);
-+ if (ret < 0 && !is_signal)
-+ goto exit;
-+
-+ /* signals are treated like broadcasts, recv-errors are ignored */
-+ ret = 0;
-+
-+exit:
-+ up_read(&bus->name_registry->rwlock);
-+ kdbus_reply_unref(wait);
-+ kdbus_conn_unref(dst);
-+ return ret;
-+}
-+
-+/**
-+ * kdbus_conn_move_messages() - move messages from one connection to another
-+ * @conn_dst: Connection to copy to
-+ * @conn_src: Connection to copy from
-+ * @name_id: Filter for the sequence number of the registered
-+ * name, 0 means no filtering.
-+ *
-+ * Move all messages from one connection to another. This is used when
-+ * an implementer connection is taking over/giving back a well-known name
-+ * from/to an activator connection.
-+ */
-+void kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
-+ struct kdbus_conn *conn_src,
-+ u64 name_id)
-+{
-+ struct kdbus_queue_entry *e, *e_tmp;
-+ struct kdbus_reply *r, *r_tmp;
-+ struct kdbus_bus *bus;
-+ struct kdbus_conn *c;
-+ LIST_HEAD(msg_list);
-+ int i, ret = 0;
-+
-+ if (WARN_ON(conn_src == conn_dst))
-+ return;
-+
-+ bus = conn_src->ep->bus;
-+
-+ /* lock order: domain -> bus -> ep -> names -> conn */
-+ down_read(&bus->conn_rwlock);
-+ hash_for_each(bus->conn_hash, i, c, hentry) {
-+ if (c == conn_src || c == conn_dst)
-+ continue;
-+
-+ mutex_lock(&c->lock);
-+ list_for_each_entry_safe(r, r_tmp, &c->reply_list, entry) {
-+ if (r->reply_src != conn_src)
-+ continue;
-+
-+ /* filter messages for a specific name */
-+ if (name_id > 0 && r->name_id != name_id)
-+ continue;
-+
-+ kdbus_conn_unref(r->reply_src);
-+ r->reply_src = kdbus_conn_ref(conn_dst);
-+ }
-+ mutex_unlock(&c->lock);
-+ }
-+ up_read(&bus->conn_rwlock);
-+
-+ kdbus_conn_lock2(conn_src, conn_dst);
-+ list_for_each_entry_safe(e, e_tmp, &conn_src->queue.msg_list, entry) {
-+ /* filter messages for a specific name */
-+ if (name_id > 0 && e->dst_name_id != name_id)
-+ continue;
-+
-+ if (!(conn_dst->flags & KDBUS_HELLO_ACCEPT_FD) &&
-+ e->gaps && e->gaps->n_fds > 0) {
-+ kdbus_conn_lost_message(conn_dst);
-+ kdbus_queue_entry_free(e);
-+ continue;
-+ }
-+
-+ ret = kdbus_queue_entry_move(e, conn_dst);
-+ if (ret < 0) {
-+ kdbus_conn_lost_message(conn_dst);
-+ kdbus_queue_entry_free(e);
-+ continue;
-+ }
-+ }
-+ kdbus_conn_unlock2(conn_src, conn_dst);
-+
-+ /* wake up poll() */
-+ wake_up_interruptible(&conn_dst->wait);
-+}
-+
-+/* query the policy-database for all names of @whom */
-+static bool kdbus_conn_policy_query_all(struct kdbus_conn *conn,
-+ const struct cred *conn_creds,
-+ struct kdbus_policy_db *db,
-+ struct kdbus_conn *whom,
-+ unsigned int access)
-+{
-+ struct kdbus_name_owner *owner;
-+ bool pass = false;
-+ int res;
-+
-+ lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
-+
-+ down_read(&db->entries_rwlock);
-+ mutex_lock(&whom->lock);
-+
-+ list_for_each_entry(owner, &whom->names_list, conn_entry) {
-+ if (owner->flags & KDBUS_NAME_IN_QUEUE)
-+ continue;
-+
-+ res = kdbus_policy_query_unlocked(db,
-+ conn_creds ? : conn->cred,
-+ owner->name->name,
-+ kdbus_strhash(owner->name->name));
-+ if (res >= (int)access) {
-+ pass = true;
-+ break;
-+ }
-+ }
-+
-+ mutex_unlock(&whom->lock);
-+ up_read(&db->entries_rwlock);
-+
-+ return pass;
-+}
-+
-+/**
-+ * kdbus_conn_policy_own_name() - verify a connection can own the given name
-+ * @conn: Connection
-+ * @conn_creds: Credentials of @conn to use for policy check
-+ * @name: Name
-+ *
-+ * This verifies that @conn is allowed to acquire the well-known name @name.
-+ *
-+ * Return: true if allowed, false if not.
-+ */
-+bool kdbus_conn_policy_own_name(struct kdbus_conn *conn,
-+ const struct cred *conn_creds,
-+ const char *name)
-+{
-+ unsigned int hash = kdbus_strhash(name);
-+ int res;
-+
-+ if (!conn_creds)
-+ conn_creds = conn->cred;
-+
-+ if (conn->ep->user) {
-+ res = kdbus_policy_query(&conn->ep->policy_db, conn_creds,
-+ name, hash);
-+ if (res < KDBUS_POLICY_OWN)
-+ return false;
-+ }
-+
-+ if (conn->owner)
-+ return true;
-+
-+ res = kdbus_policy_query(&conn->ep->bus->policy_db, conn_creds,
-+ name, hash);
-+ return res >= KDBUS_POLICY_OWN;
-+}
-+
-+/**
-+ * kdbus_conn_policy_talk() - verify a connection can talk to a given peer
-+ * @conn: Connection that tries to talk
-+ * @conn_creds: Credentials of @conn to use for policy check
-+ * @to: Connection that is talked to
-+ *
-+ * This verifies that @conn is allowed to talk to @to.
-+ *
-+ * Return: true if allowed, false if not.
-+ */
-+bool kdbus_conn_policy_talk(struct kdbus_conn *conn,
-+ const struct cred *conn_creds,
-+ struct kdbus_conn *to)
-+{
-+ if (!conn_creds)
-+ conn_creds = conn->cred;
-+
-+ if (conn->ep->user &&
-+ !kdbus_conn_policy_query_all(conn, conn_creds, &conn->ep->policy_db,
-+ to, KDBUS_POLICY_TALK))
-+ return false;
-+
-+ if (conn->owner)
-+ return true;
-+ if (uid_eq(conn_creds->euid, to->cred->uid))
-+ return true;
-+
-+ return kdbus_conn_policy_query_all(conn, conn_creds,
-+ &conn->ep->bus->policy_db, to,
-+ KDBUS_POLICY_TALK);
-+}
-+
-+/**
-+ * kdbus_conn_policy_see_name_unlocked() - verify a connection can see a given
-+ * name
-+ * @conn: Connection
-+ * @conn_creds: Credentials of @conn to use for policy check
-+ * @name: Name
-+ *
-+ * This verifies that @conn is allowed to see the well-known name @name. Caller
-+ * must hold policy-lock.
-+ *
-+ * Return: true if allowed, false if not.
-+ */
-+bool kdbus_conn_policy_see_name_unlocked(struct kdbus_conn *conn,
-+ const struct cred *conn_creds,
-+ const char *name)
-+{
-+ int res;
-+
-+ /*
-+ * By default, all names are visible on a bus. SEE policies can only be
-+ * installed on custom endpoints, where by default no name is visible.
-+ */
-+ if (!conn->ep->user)
-+ return true;
-+
-+ res = kdbus_policy_query_unlocked(&conn->ep->policy_db,
-+ conn_creds ? : conn->cred,
-+ name, kdbus_strhash(name));
-+ return res >= KDBUS_POLICY_SEE;
-+}
-+
-+static bool kdbus_conn_policy_see_name(struct kdbus_conn *conn,
-+ const struct cred *conn_creds,
-+ const char *name)
-+{
-+ bool res;
-+
-+ down_read(&conn->ep->policy_db.entries_rwlock);
-+ res = kdbus_conn_policy_see_name_unlocked(conn, conn_creds, name);
-+ up_read(&conn->ep->policy_db.entries_rwlock);
-+
-+ return res;
-+}
-+
-+static bool kdbus_conn_policy_see(struct kdbus_conn *conn,
-+ const struct cred *conn_creds,
-+ struct kdbus_conn *whom)
-+{
-+ /*
-+ * By default, all names are visible on a bus, so a connection can
-+ * always see other connections. SEE policies can only be installed on
-+ * custom endpoints, where by default no name is visible and we hide
-+ * peers from each other, unless you see at least _one_ name of the
-+ * peer.
-+ */
-+ return !conn->ep->user ||
-+ kdbus_conn_policy_query_all(conn, conn_creds,
-+ &conn->ep->policy_db, whom,
-+ KDBUS_POLICY_SEE);
-+}
-+
-+/**
-+ * kdbus_conn_policy_see_notification() - verify a connection is allowed to
-+ * receive a given kernel notification
-+ * @conn: Connection
-+ * @conn_creds: Credentials of @conn to use for policy check
-+ * @msg: Notification message
-+ *
-+ * This checks whether @conn is allowed to see the kernel notification.
-+ *
-+ * Return: true if allowed, false if not.
-+ */
-+bool kdbus_conn_policy_see_notification(struct kdbus_conn *conn,
-+ const struct cred *conn_creds,
-+ const struct kdbus_msg *msg)
-+{
-+ /*
-+ * Depending on the notification type, broadcasted kernel notifications
-+ * have to be filtered:
-+ *
-+ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}: This notification is forwarded
-+ * to a peer if, and only if, that peer can see the name this
-+ * notification is for.
-+ *
-+ * KDBUS_ITEM_ID_{ADD,REMOVE}: Notifications for ID changes are
-+ * broadcast to everyone, to allow tracking peers.
-+ */
-+
-+ switch (msg->items[0].type) {
-+ case KDBUS_ITEM_NAME_ADD:
-+ case KDBUS_ITEM_NAME_REMOVE:
-+ case KDBUS_ITEM_NAME_CHANGE:
-+ return kdbus_conn_policy_see_name(conn, conn_creds,
-+ msg->items[0].name_change.name);
-+
-+ case KDBUS_ITEM_ID_ADD:
-+ case KDBUS_ITEM_ID_REMOVE:
-+ return true;
-+
-+ default:
-+ WARN(1, "Invalid type for notification broadcast: %llu\n",
-+ (unsigned long long)msg->items[0].type);
-+ return false;
-+ }
-+}
-+
-+/**
-+ * kdbus_cmd_hello() - handle KDBUS_CMD_HELLO
-+ * @ep: Endpoint to operate on
-+ * @file: File this connection is opened on
-+ * @argp: Command payload
-+ *
-+ * Return: NULL or newly created connection on success, ERR_PTR on failure.
-+ */
-+struct kdbus_conn *kdbus_cmd_hello(struct kdbus_ep *ep, struct file *file,
-+ void __user *argp)
-+{
-+ struct kdbus_cmd_hello *cmd;
-+ struct kdbus_conn *c = NULL;
-+ const char *item_name;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ { .type = KDBUS_ITEM_NAME },
-+ { .type = KDBUS_ITEM_CREDS },
-+ { .type = KDBUS_ITEM_PIDS },
-+ { .type = KDBUS_ITEM_SECLABEL },
-+ { .type = KDBUS_ITEM_CONN_DESCRIPTION },
-+ { .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+ KDBUS_HELLO_ACCEPT_FD |
-+ KDBUS_HELLO_ACTIVATOR |
-+ KDBUS_HELLO_POLICY_HOLDER |
-+ KDBUS_HELLO_MONITOR,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret < 0)
-+ return ERR_PTR(ret);
-+ if (ret > 0)
-+ return NULL;
-+
-+ item_name = argv[1].item ? argv[1].item->str : NULL;
-+
-+ c = kdbus_conn_new(ep, file, cmd, item_name,
-+ argv[2].item ? &argv[2].item->creds : NULL,
-+ argv[3].item ? &argv[3].item->pids : NULL,
-+ argv[4].item ? argv[4].item->str : NULL,
-+ argv[5].item ? argv[5].item->str : NULL);
-+ if (IS_ERR(c)) {
-+ ret = PTR_ERR(c);
-+ c = NULL;
-+ goto exit;
-+ }
-+
-+ ret = kdbus_conn_connect(c, item_name);
-+ if (ret < 0)
-+ goto exit;
-+
-+ if (kdbus_conn_is_activator(c) || kdbus_conn_is_policy_holder(c)) {
-+ ret = kdbus_conn_acquire(c);
-+ if (ret < 0)
-+ goto exit;
-+
-+ ret = kdbus_policy_set(&c->ep->bus->policy_db, args.items,
-+ args.items_size, 1,
-+ kdbus_conn_is_policy_holder(c), c);
-+ kdbus_conn_release(c);
-+ if (ret < 0)
-+ goto exit;
-+ }
-+
-+ if (copy_to_user(argp, cmd, sizeof(*cmd)))
-+ ret = -EFAULT;
-+
-+exit:
-+ ret = kdbus_args_clear(&args, ret);
-+ if (ret < 0) {
-+ if (c) {
-+ kdbus_conn_disconnect(c, false);
-+ kdbus_conn_unref(c);
-+ }
-+ return ERR_PTR(ret);
-+ }
-+ return c;
-+}
-+
-+/**
-+ * kdbus_cmd_byebye_unlocked() - handle KDBUS_CMD_BYEBYE
-+ * @conn: connection to operate on
-+ * @argp: command payload
-+ *
-+ * The caller must not hold any active reference to @conn or this will deadlock.
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_byebye_unlocked(struct kdbus_conn *conn, void __user *argp)
-+{
-+ struct kdbus_cmd *cmd;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ if (!kdbus_conn_is_ordinary(conn))
-+ return -EOPNOTSUPP;
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret != 0)
-+ return ret;
-+
-+ ret = kdbus_conn_disconnect(conn, true);
-+ return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_conn_info() - handle KDBUS_CMD_CONN_INFO
-+ * @conn: connection to operate on
-+ * @argp: command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_conn_info(struct kdbus_conn *conn, void __user *argp)
-+{
-+ struct kdbus_meta_conn *conn_meta = NULL;
-+ struct kdbus_pool_slice *slice = NULL;
-+ struct kdbus_name_entry *entry = NULL;
-+ struct kdbus_name_owner *owner = NULL;
-+ struct kdbus_conn *owner_conn = NULL;
-+ struct kdbus_item *meta_items = NULL;
-+ struct kdbus_info info = {};
-+ struct kdbus_cmd_info *cmd;
-+ struct kdbus_bus *bus = conn->ep->bus;
-+ struct kvec kvec[3];
-+ size_t meta_size, cnt = 0;
-+ const char *name;
-+ u64 attach_flags, size = 0;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ { .type = KDBUS_ITEM_NAME },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret != 0)
-+ return ret;
-+
-+ /* registry must be held throughout lookup *and* collecting data */
-+ down_read(&bus->name_registry->rwlock);
-+
-+ ret = kdbus_sanitize_attach_flags(cmd->attach_flags, &attach_flags);
-+ if (ret < 0)
-+ goto exit;
-+
-+ name = argv[1].item ? argv[1].item->str : NULL;
-+
-+ if (name) {
-+ entry = kdbus_name_lookup_unlocked(bus->name_registry, name);
-+ if (entry)
-+ owner = kdbus_name_get_owner(entry);
-+ if (!owner ||
-+ !kdbus_conn_policy_see_name(conn, current_cred(), name) ||
-+ (cmd->id != 0 && owner->conn->id != cmd->id)) {
-+ /* pretend a name doesn't exist if you cannot see it */
-+ ret = -ESRCH;
-+ goto exit;
-+ }
-+
-+ owner_conn = kdbus_conn_ref(owner->conn);
-+ } else if (cmd->id > 0) {
-+ owner_conn = kdbus_bus_find_conn_by_id(bus, cmd->id);
-+ if (!owner_conn || !kdbus_conn_policy_see(conn, current_cred(),
-+ owner_conn)) {
-+ /* pretend an id doesn't exist if you cannot see it */
-+ ret = -ENXIO;
-+ goto exit;
-+ }
-+ } else {
-+ ret = -EINVAL;
-+ goto exit;
-+ }
-+
-+ attach_flags &= atomic64_read(&owner_conn->attach_flags_send);
-+
-+ conn_meta = kdbus_meta_conn_new();
-+ if (IS_ERR(conn_meta)) {
-+ ret = PTR_ERR(conn_meta);
-+ conn_meta = NULL;
-+ goto exit;
-+ }
-+
-+ ret = kdbus_meta_conn_collect(conn_meta, owner_conn, 0, attach_flags);
-+ if (ret < 0)
-+ goto exit;
-+
-+ ret = kdbus_meta_emit(owner_conn->meta_proc, owner_conn->meta_fake,
-+ conn_meta, conn, attach_flags,
-+ &meta_items, &meta_size);
-+ if (ret < 0)
-+ goto exit;
-+
-+ info.id = owner_conn->id;
-+ info.flags = owner_conn->flags;
-+
-+ kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &size);
-+ if (meta_size > 0) {
-+ kdbus_kvec_set(&kvec[cnt++], meta_items, meta_size, &size);
-+ cnt += !!kdbus_kvec_pad(&kvec[cnt], &size);
-+ }
-+
-+ info.size = size;
-+
-+ slice = kdbus_pool_slice_alloc(conn->pool, size, false);
-+ if (IS_ERR(slice)) {
-+ ret = PTR_ERR(slice);
-+ slice = NULL;
-+ goto exit;
-+ }
-+
-+ ret = kdbus_pool_slice_copy_kvec(slice, 0, kvec, cnt, size);
-+ if (ret < 0)
-+ goto exit;
-+
-+ kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->info_size);
-+
-+ if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
-+ kdbus_member_set_user(&cmd->info_size, argp,
-+ typeof(*cmd), info_size)) {
-+ ret = -EFAULT;
-+ goto exit;
-+ }
-+
-+ ret = 0;
-+
-+exit:
-+ up_read(&bus->name_registry->rwlock);
-+ kdbus_pool_slice_release(slice);
-+ kfree(meta_items);
-+ kdbus_meta_conn_unref(conn_meta);
-+ kdbus_conn_unref(owner_conn);
-+ return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_update() - handle KDBUS_CMD_UPDATE
-+ * @conn: connection to operate on
-+ * @argp: command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_update(struct kdbus_conn *conn, void __user *argp)
-+{
-+ struct kdbus_item *item_policy;
-+ u64 *item_attach_send = NULL;
-+ u64 *item_attach_recv = NULL;
-+ struct kdbus_cmd *cmd;
-+ u64 attach_send;
-+ u64 attach_recv;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ { .type = KDBUS_ITEM_ATTACH_FLAGS_SEND },
-+ { .type = KDBUS_ITEM_ATTACH_FLAGS_RECV },
-+ { .type = KDBUS_ITEM_NAME, .multiple = true },
-+ { .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret != 0)
-+ return ret;
-+
-+ item_attach_send = argv[1].item ? &argv[1].item->data64[0] : NULL;
-+ item_attach_recv = argv[2].item ? &argv[2].item->data64[0] : NULL;
-+ item_policy = argv[3].item ? : argv[4].item;
-+
-+ if (item_attach_send) {
-+ if (!kdbus_conn_is_ordinary(conn) &&
-+ !kdbus_conn_is_monitor(conn)) {
-+ ret = -EOPNOTSUPP;
-+ goto exit;
-+ }
-+
-+ ret = kdbus_sanitize_attach_flags(*item_attach_send,
-+ &attach_send);
-+ if (ret < 0)
-+ goto exit;
-+ }
-+
-+ if (item_attach_recv) {
-+ if (!kdbus_conn_is_ordinary(conn) &&
-+ !kdbus_conn_is_monitor(conn) &&
-+ !kdbus_conn_is_activator(conn)) {
-+ ret = -EOPNOTSUPP;
-+ goto exit;
-+ }
-+
-+ ret = kdbus_sanitize_attach_flags(*item_attach_recv,
-+ &attach_recv);
-+ if (ret < 0)
-+ goto exit;
-+ }
-+
-+ if (item_policy && !kdbus_conn_is_policy_holder(conn)) {
-+ ret = -EOPNOTSUPP;
-+ goto exit;
-+ }
-+
-+ /* now that we verified the input, update the connection */
-+
-+ if (item_policy) {
-+ ret = kdbus_policy_set(&conn->ep->bus->policy_db, cmd->items,
-+ KDBUS_ITEMS_SIZE(cmd, items),
-+ 1, true, conn);
-+ if (ret < 0)
-+ goto exit;
-+ }
-+
-+ if (item_attach_send)
-+ atomic64_set(&conn->attach_flags_send, attach_send);
-+
-+ if (item_attach_recv)
-+ atomic64_set(&conn->attach_flags_recv, attach_recv);
-+
-+exit:
-+ return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_send() - handle KDBUS_CMD_SEND
-+ * @conn: connection to operate on
-+ * @f: file this command was called on
-+ * @argp: command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_send(struct kdbus_conn *conn, struct file *f, void __user *argp)
-+{
-+ struct kdbus_cmd_send *cmd;
-+ struct kdbus_staging *staging = NULL;
-+ struct kdbus_msg *msg = NULL;
-+ struct file *cancel_fd = NULL;
-+ int ret, ret2;
-+
-+ /* command arguments */
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ { .type = KDBUS_ITEM_CANCEL_FD },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+ KDBUS_SEND_SYNC_REPLY,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ /* message arguments */
-+ struct kdbus_arg msg_argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ { .type = KDBUS_ITEM_PAYLOAD_VEC, .multiple = true },
-+ { .type = KDBUS_ITEM_PAYLOAD_MEMFD, .multiple = true },
-+ { .type = KDBUS_ITEM_FDS },
-+ { .type = KDBUS_ITEM_BLOOM_FILTER },
-+ { .type = KDBUS_ITEM_DST_NAME },
-+ };
-+ struct kdbus_args msg_args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+ KDBUS_MSG_EXPECT_REPLY |
-+ KDBUS_MSG_NO_AUTO_START |
-+ KDBUS_MSG_SIGNAL,
-+ .argv = msg_argv,
-+ .argc = ARRAY_SIZE(msg_argv),
-+ };
-+
-+ if (!kdbus_conn_is_ordinary(conn))
-+ return -EOPNOTSUPP;
-+
-+ /* make sure to parse both, @cmd and @msg on negotiation */
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret < 0)
-+ goto exit;
-+ else if (ret > 0 && !cmd->msg_address) /* negotiation without msg */
-+ goto exit;
-+
-+ ret2 = kdbus_args_parse_msg(&msg_args, KDBUS_PTR(cmd->msg_address),
-+ &msg);
-+ if (ret2 < 0) { /* cannot parse message */
-+ ret = ret2;
-+ goto exit;
-+ } else if (ret2 > 0 && !ret) { /* msg-negot implies cmd-negot */
-+ ret = -EINVAL;
-+ goto exit;
-+ } else if (ret > 0) { /* negotiation */
-+ goto exit;
-+ }
-+
-+ /* here we parsed both, @cmd and @msg, and neither wants negotiation */
-+
-+ cmd->reply.return_flags = 0;
-+ kdbus_pool_publish_empty(conn->pool, &cmd->reply.offset,
-+ &cmd->reply.msg_size);
-+
-+ if (argv[1].item) {
-+ cancel_fd = fget(argv[1].item->fds[0]);
-+ if (!cancel_fd) {
-+ ret = -EBADF;
-+ goto exit;
-+ }
-+
-+ if (!cancel_fd->f_op->poll) {
-+ ret = -EINVAL;
-+ goto exit;
-+ }
-+ }
-+
-+ /* patch-in the source of this message */
-+ if (msg->src_id > 0 && msg->src_id != conn->id) {
-+ ret = -EINVAL;
-+ goto exit;
-+ }
-+ msg->src_id = conn->id;
-+
-+ staging = kdbus_staging_new_user(conn->ep->bus, cmd, msg);
-+ if (IS_ERR(staging)) {
-+ ret = PTR_ERR(staging);
-+ staging = NULL;
-+ goto exit;
-+ }
-+
-+ if (msg->dst_id == KDBUS_DST_ID_BROADCAST) {
-+ down_read(&conn->ep->bus->name_registry->rwlock);
-+ kdbus_bus_broadcast(conn->ep->bus, conn, staging);
-+ up_read(&conn->ep->bus->name_registry->rwlock);
-+ } else if (cmd->flags & KDBUS_SEND_SYNC_REPLY) {
-+ struct kdbus_reply *r;
-+ ktime_t exp;
-+
-+ exp = ns_to_ktime(msg->timeout_ns);
-+ r = kdbus_conn_call(conn, staging, exp);
-+ if (IS_ERR(r)) {
-+ ret = PTR_ERR(r);
-+ goto exit;
-+ }
-+
-+ ret = kdbus_conn_wait_reply(conn, cmd, f, cancel_fd, r, exp);
-+ kdbus_reply_unref(r);
-+ if (ret < 0)
-+ goto exit;
-+ } else if ((msg->flags & KDBUS_MSG_EXPECT_REPLY) ||
-+ msg->cookie_reply == 0) {
-+ ret = kdbus_conn_unicast(conn, staging);
-+ if (ret < 0)
-+ goto exit;
-+ } else {
-+ ret = kdbus_conn_reply(conn, staging);
-+ if (ret < 0)
-+ goto exit;
-+ }
-+
-+ if (kdbus_member_set_user(&cmd->reply, argp, typeof(*cmd), reply))
-+ ret = -EFAULT;
-+
-+exit:
-+ if (cancel_fd)
-+ fput(cancel_fd);
-+ kdbus_staging_free(staging);
-+ ret = kdbus_args_clear(&msg_args, ret);
-+ return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_recv() - handle KDBUS_CMD_RECV
-+ * @conn: connection to operate on
-+ * @argp: command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_recv(struct kdbus_conn *conn, void __user *argp)
-+{
-+ struct kdbus_queue_entry *entry;
-+ struct kdbus_cmd_recv *cmd;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+ KDBUS_RECV_PEEK |
-+ KDBUS_RECV_DROP |
-+ KDBUS_RECV_USE_PRIORITY,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ if (!kdbus_conn_is_ordinary(conn) &&
-+ !kdbus_conn_is_monitor(conn) &&
-+ !kdbus_conn_is_activator(conn))
-+ return -EOPNOTSUPP;
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret != 0)
-+ return ret;
-+
-+ cmd->dropped_msgs = 0;
-+ cmd->msg.return_flags = 0;
-+ kdbus_pool_publish_empty(conn->pool, &cmd->msg.offset,
-+ &cmd->msg.msg_size);
-+
-+ /* DROP+priority is not reliable, so prevent it */
-+ if ((cmd->flags & KDBUS_RECV_DROP) &&
-+ (cmd->flags & KDBUS_RECV_USE_PRIORITY)) {
-+ ret = -EINVAL;
-+ goto exit;
-+ }
-+
-+ mutex_lock(&conn->lock);
-+
-+ entry = kdbus_queue_peek(&conn->queue, cmd->priority,
-+ cmd->flags & KDBUS_RECV_USE_PRIORITY);
-+ if (!entry) {
-+ mutex_unlock(&conn->lock);
-+ ret = -EAGAIN;
-+ } else if (cmd->flags & KDBUS_RECV_DROP) {
-+ struct kdbus_reply *reply = kdbus_reply_ref(entry->reply);
-+
-+ kdbus_queue_entry_free(entry);
-+
-+ mutex_unlock(&conn->lock);
-+
-+ if (reply) {
-+ mutex_lock(&reply->reply_dst->lock);
-+ if (!list_empty(&reply->entry)) {
-+ kdbus_reply_unlink(reply);
-+ if (reply->sync)
-+ kdbus_sync_reply_wakeup(reply, -EPIPE);
-+ else
-+ kdbus_notify_reply_dead(conn->ep->bus,
-+ reply->reply_dst->id,
-+ reply->cookie);
-+ }
-+ mutex_unlock(&reply->reply_dst->lock);
-+ kdbus_notify_flush(conn->ep->bus);
-+ }
-+
-+ kdbus_reply_unref(reply);
-+ } else {
-+ bool install_fds;
-+
-+ /*
-+ * PEEK just returns the location of the next message. Do not
-+ * install FDs nor memfds nor anything else. The only
-+ * information of interest should be the message header and
-+ * metadata. Any FD numbers in the payload are undefined for
-+ * PEEK'ed messages.
-+ * Also make sure to never install fds into a connection that
-+ * has refused to receive any. Ordinary connections will not get
-+ * messages with FDs queued (the receiver will get -ECOMM), but
-+ * eavesdroppers might.
-+ */
-+ install_fds = (conn->flags & KDBUS_HELLO_ACCEPT_FD) &&
-+ !(cmd->flags & KDBUS_RECV_PEEK);
-+
-+ ret = kdbus_queue_entry_install(entry,
-+ &cmd->msg.return_flags,
-+ install_fds);
-+ if (ret < 0) {
-+ mutex_unlock(&conn->lock);
-+ goto exit;
-+ }
-+
-+ kdbus_pool_slice_publish(entry->slice, &cmd->msg.offset,
-+ &cmd->msg.msg_size);
-+
-+ if (!(cmd->flags & KDBUS_RECV_PEEK))
-+ kdbus_queue_entry_free(entry);
-+
-+ mutex_unlock(&conn->lock);
-+ }
-+
-+ cmd->dropped_msgs = atomic_xchg(&conn->lost_count, 0);
-+ if (cmd->dropped_msgs > 0)
-+ cmd->return_flags |= KDBUS_RECV_RETURN_DROPPED_MSGS;
-+
-+ if (kdbus_member_set_user(&cmd->msg, argp, typeof(*cmd), msg) ||
-+ kdbus_member_set_user(&cmd->dropped_msgs, argp, typeof(*cmd),
-+ dropped_msgs))
-+ ret = -EFAULT;
-+
-+exit:
-+ return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_free() - handle KDBUS_CMD_FREE
-+ * @conn: connection to operate on
-+ * @argp: command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_free(struct kdbus_conn *conn, void __user *argp)
-+{
-+ struct kdbus_cmd_free *cmd;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ if (!kdbus_conn_is_ordinary(conn) &&
-+ !kdbus_conn_is_monitor(conn) &&
-+ !kdbus_conn_is_activator(conn))
-+ return -EOPNOTSUPP;
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret != 0)
-+ return ret;
-+
-+ ret = kdbus_pool_release_offset(conn->pool, cmd->offset);
-+
-+ return kdbus_args_clear(&args, ret);
-+}
-diff --git a/ipc/kdbus/connection.h b/ipc/kdbus/connection.h
-new file mode 100644
-index 0000000..1ad0820
---- /dev/null
-+++ b/ipc/kdbus/connection.h
-@@ -0,0 +1,260 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_CONNECTION_H
-+#define __KDBUS_CONNECTION_H
-+
-+#include <linux/atomic.h>
-+#include <linux/kref.h>
-+#include <linux/lockdep.h>
-+#include <linux/path.h>
-+
-+#include "limits.h"
-+#include "metadata.h"
-+#include "pool.h"
-+#include "queue.h"
-+#include "util.h"
-+
-+#define KDBUS_HELLO_SPECIAL_CONN (KDBUS_HELLO_ACTIVATOR | \
-+ KDBUS_HELLO_POLICY_HOLDER | \
-+ KDBUS_HELLO_MONITOR)
-+
-+struct kdbus_name_entry;
-+struct kdbus_quota;
-+struct kdbus_staging;
-+
-+/**
-+ * struct kdbus_conn - connection to a bus
-+ * @kref: Reference count
-+ * @active: Active references to the connection
-+ * @id: Connection ID
-+ * @flags: KDBUS_HELLO_* flags
-+ * @attach_flags_send: KDBUS_ATTACH_* flags for sending
-+ * @attach_flags_recv: KDBUS_ATTACH_* flags for receiving
-+ * @description: Human-readable connection description, used for
-+ * debugging. This field is only set when the
-+ * connection is created.
-+ * @ep: The endpoint this connection belongs to
-+ * @lock: Connection data lock
-+ * @hentry: Entry in ID <-> connection map
-+ * @ep_entry: Entry in endpoint
-+ * @monitor_entry: Entry in monitor, if the connection is a monitor
-+ * @reply_list: List of connections this connection should
-+ * reply to
-+ * @work: Delayed work to handle timeouts
-+ * activator for
-+ * @match_db: Subscription filter to broadcast messages
-+ * @meta_proc: Process metadata of connection creator, or NULL
-+ * @meta_fake: Faked metadata, or NULL
-+ * @pool: The user's buffer to receive messages
-+ * @user: Owner of the connection
-+ * @cred: The credentials of the connection at creation time
-+ * @pid: Pid at creation time
-+ * @root_path: Root path at creation time
-+ * @request_count: Number of pending requests issued by this
-+ * connection that are waiting for replies from
-+ * other peers
-+ * @lost_count: Number of lost broadcast messages
-+ * @wait: Wake up this endpoint
-+ * @queue: The message queue associated with this connection
-+ * @quota: Array of per-user quota indexed by user->id
-+ * @n_quota: Number of elements in quota array
-+ * @names_list: List of well-known names
-+ * @name_count: Number of owned well-known names
-+ * @privileged: Whether this connection is privileged on the domain
-+ * @owner: Owned by the same user as the bus owner
-+ */
-+struct kdbus_conn {
-+ struct kref kref;
-+ atomic_t active;
-+#ifdef CONFIG_DEBUG_LOCK_ALLOC
-+ struct lockdep_map dep_map;
-+#endif
-+ u64 id;
-+ u64 flags;
-+ atomic64_t attach_flags_send;
-+ atomic64_t attach_flags_recv;
-+ const char *description;
-+ struct kdbus_ep *ep;
-+ struct mutex lock;
-+ struct hlist_node hentry;
-+ struct list_head ep_entry;
-+ struct list_head monitor_entry;
-+ struct list_head reply_list;
-+ struct delayed_work work;
-+ struct kdbus_match_db *match_db;
-+ struct kdbus_meta_proc *meta_proc;
-+ struct kdbus_meta_fake *meta_fake;
-+ struct kdbus_pool *pool;
-+ struct kdbus_user *user;
-+ const struct cred *cred;
-+ struct pid *pid;
-+ struct path root_path;
-+ atomic_t request_count;
-+ atomic_t lost_count;
-+ wait_queue_head_t wait;
-+ struct kdbus_queue queue;
-+
-+ struct kdbus_quota *quota;
-+ unsigned int n_quota;
-+
-+ /* protected by registry->rwlock */
-+ struct list_head names_list;
-+ unsigned int name_count;
-+
-+ bool privileged:1;
-+ bool owner:1;
-+};
-+
-+struct kdbus_conn *kdbus_conn_ref(struct kdbus_conn *conn);
-+struct kdbus_conn *kdbus_conn_unref(struct kdbus_conn *conn);
-+bool kdbus_conn_active(const struct kdbus_conn *conn);
-+int kdbus_conn_acquire(struct kdbus_conn *conn);
-+void kdbus_conn_release(struct kdbus_conn *conn);
-+int kdbus_conn_disconnect(struct kdbus_conn *conn, bool ensure_queue_empty);
-+bool kdbus_conn_has_name(struct kdbus_conn *conn, const char *name);
-+int kdbus_conn_quota_inc(struct kdbus_conn *c, struct kdbus_user *u,
-+ size_t memory, size_t fds);
-+void kdbus_conn_quota_dec(struct kdbus_conn *c, struct kdbus_user *u,
-+ size_t memory, size_t fds);
-+void kdbus_conn_lost_message(struct kdbus_conn *c);
-+int kdbus_conn_entry_insert(struct kdbus_conn *conn_src,
-+ struct kdbus_conn *conn_dst,
-+ struct kdbus_staging *staging,
-+ struct kdbus_reply *reply,
-+ const struct kdbus_name_entry *name);
-+void kdbus_conn_move_messages(struct kdbus_conn *conn_dst,
-+ struct kdbus_conn *conn_src,
-+ u64 name_id);
-+
-+/* policy */
-+bool kdbus_conn_policy_own_name(struct kdbus_conn *conn,
-+ const struct cred *conn_creds,
-+ const char *name);
-+bool kdbus_conn_policy_talk(struct kdbus_conn *conn,
-+ const struct cred *conn_creds,
-+ struct kdbus_conn *to);
-+bool kdbus_conn_policy_see_name_unlocked(struct kdbus_conn *conn,
-+ const struct cred *curr_creds,
-+ const char *name);
-+bool kdbus_conn_policy_see_notification(struct kdbus_conn *conn,
-+ const struct cred *curr_creds,
-+ const struct kdbus_msg *msg);
-+
-+/* command dispatcher */
-+struct kdbus_conn *kdbus_cmd_hello(struct kdbus_ep *ep, struct file *file,
-+ void __user *argp);
-+int kdbus_cmd_byebye_unlocked(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_conn_info(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_update(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_send(struct kdbus_conn *conn, struct file *f, void __user *argp);
-+int kdbus_cmd_recv(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_free(struct kdbus_conn *conn, void __user *argp);
-+
-+/**
-+ * kdbus_conn_is_ordinary() - Check if connection is ordinary
-+ * @conn: The connection to check
-+ *
-+ * Return: Non-zero if the connection is an ordinary connection
-+ */
-+static inline int kdbus_conn_is_ordinary(const struct kdbus_conn *conn)
-+{
-+ return !(conn->flags & KDBUS_HELLO_SPECIAL_CONN);
-+}
-+
-+/**
-+ * kdbus_conn_is_activator() - Check if connection is an activator
-+ * @conn: The connection to check
-+ *
-+ * Return: Non-zero if the connection is an activator
-+ */
-+static inline int kdbus_conn_is_activator(const struct kdbus_conn *conn)
-+{
-+ return conn->flags & KDBUS_HELLO_ACTIVATOR;
-+}
-+
-+/**
-+ * kdbus_conn_is_policy_holder() - Check if connection is a policy holder
-+ * @conn: The connection to check
-+ *
-+ * Return: Non-zero if the connection is a policy holder
-+ */
-+static inline int kdbus_conn_is_policy_holder(const struct kdbus_conn *conn)
-+{
-+ return conn->flags & KDBUS_HELLO_POLICY_HOLDER;
-+}
-+
-+/**
-+ * kdbus_conn_is_monitor() - Check if connection is a monitor
-+ * @conn: The connection to check
-+ *
-+ * Return: Non-zero if the connection is a monitor
-+ */
-+static inline int kdbus_conn_is_monitor(const struct kdbus_conn *conn)
-+{
-+ return conn->flags & KDBUS_HELLO_MONITOR;
-+}
-+
-+/**
-+ * kdbus_conn_lock2() - Lock two connections
-+ * @a: connection A to lock or NULL
-+ * @b: connection B to lock or NULL
-+ *
-+ * Lock two connections at once. As we need to have a stable locking order, we
-+ * always lock the connection with lower memory address first.
-+ */
-+static inline void kdbus_conn_lock2(struct kdbus_conn *a, struct kdbus_conn *b)
-+{
-+ if (a < b) {
-+ if (a)
-+ mutex_lock(&a->lock);
-+ if (b && b != a)
-+ mutex_lock_nested(&b->lock, !!a);
-+ } else {
-+ if (b)
-+ mutex_lock(&b->lock);
-+ if (a && a != b)
-+ mutex_lock_nested(&a->lock, !!b);
-+ }
-+}
-+
-+/**
-+ * kdbus_conn_unlock2() - Unlock two connections
-+ * @a: connection A to unlock or NULL
-+ * @b: connection B to unlock or NULL
-+ *
-+ * Unlock two connections at once. See kdbus_conn_lock2().
-+ */
-+static inline void kdbus_conn_unlock2(struct kdbus_conn *a,
-+ struct kdbus_conn *b)
-+{
-+ if (a)
-+ mutex_unlock(&a->lock);
-+ if (b && b != a)
-+ mutex_unlock(&b->lock);
-+}
-+
-+/**
-+ * kdbus_conn_assert_active() - lockdep assert on active lock
-+ * @conn: connection that shall be active
-+ *
-+ * This verifies via lockdep that the caller holds an active reference to the
-+ * given connection.
-+ */
-+static inline void kdbus_conn_assert_active(struct kdbus_conn *conn)
-+{
-+ lockdep_assert_held(conn);
-+}
-+
-+#endif
-diff --git a/ipc/kdbus/domain.c b/ipc/kdbus/domain.c
-new file mode 100644
-index 0000000..ac9f760
---- /dev/null
-+++ b/ipc/kdbus/domain.c
-@@ -0,0 +1,296 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/fs.h>
-+#include <linux/idr.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+
-+#include "bus.h"
-+#include "domain.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "limits.h"
-+#include "util.h"
-+
-+static void kdbus_domain_control_free(struct kdbus_node *node)
-+{
-+ kfree(node);
-+}
-+
-+static struct kdbus_node *kdbus_domain_control_new(struct kdbus_domain *domain,
-+ unsigned int access)
-+{
-+ struct kdbus_node *node;
-+ int ret;
-+
-+ node = kzalloc(sizeof(*node), GFP_KERNEL);
-+ if (!node)
-+ return ERR_PTR(-ENOMEM);
-+
-+ kdbus_node_init(node, KDBUS_NODE_CONTROL);
-+
-+ node->free_cb = kdbus_domain_control_free;
-+ node->mode = domain->node.mode;
-+ node->mode = S_IRUSR | S_IWUSR;
-+ if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
-+ node->mode |= S_IRGRP | S_IWGRP;
-+ if (access & KDBUS_MAKE_ACCESS_WORLD)
-+ node->mode |= S_IROTH | S_IWOTH;
-+
-+ ret = kdbus_node_link(node, &domain->node, "control");
-+ if (ret < 0)
-+ goto exit_free;
-+
-+ return node;
-+
-+exit_free:
-+ kdbus_node_deactivate(node);
-+ kdbus_node_unref(node);
-+ return ERR_PTR(ret);
-+}
-+
-+static void kdbus_domain_free(struct kdbus_node *node)
-+{
-+ struct kdbus_domain *domain =
-+ container_of(node, struct kdbus_domain, node);
-+
-+ put_user_ns(domain->user_namespace);
-+ ida_destroy(&domain->user_ida);
-+ idr_destroy(&domain->user_idr);
-+ kfree(domain);
-+}
-+
-+/**
-+ * kdbus_domain_new() - create a new domain
-+ * @access: The access mode for this node (KDBUS_MAKE_ACCESS_*)
-+ *
-+ * Return: a new kdbus_domain on success, ERR_PTR on failure
-+ */
-+struct kdbus_domain *kdbus_domain_new(unsigned int access)
-+{
-+ struct kdbus_domain *d;
-+ int ret;
-+
-+ d = kzalloc(sizeof(*d), GFP_KERNEL);
-+ if (!d)
-+ return ERR_PTR(-ENOMEM);
-+
-+ kdbus_node_init(&d->node, KDBUS_NODE_DOMAIN);
-+
-+ d->node.free_cb = kdbus_domain_free;
-+ d->node.mode = S_IRUSR | S_IXUSR;
-+ if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
-+ d->node.mode |= S_IRGRP | S_IXGRP;
-+ if (access & KDBUS_MAKE_ACCESS_WORLD)
-+ d->node.mode |= S_IROTH | S_IXOTH;
-+
-+ mutex_init(&d->lock);
-+ idr_init(&d->user_idr);
-+ ida_init(&d->user_ida);
-+
-+ /* Pin user namespace so we can guarantee domain-unique bus names. */
-+ d->user_namespace = get_user_ns(current_user_ns());
-+
-+ ret = kdbus_node_link(&d->node, NULL, NULL);
-+ if (ret < 0)
-+ goto exit_unref;
-+
-+ return d;
-+
-+exit_unref:
-+ kdbus_node_deactivate(&d->node);
-+ kdbus_node_unref(&d->node);
-+ return ERR_PTR(ret);
-+}
-+
-+/**
-+ * kdbus_domain_ref() - take a domain reference
-+ * @domain: Domain
-+ *
-+ * Return: the domain itself
-+ */
-+struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain)
-+{
-+ if (domain)
-+ kdbus_node_ref(&domain->node);
-+ return domain;
-+}
-+
-+/**
-+ * kdbus_domain_unref() - drop a domain reference
-+ * @domain: Domain
-+ *
-+ * When the last reference is dropped, the domain internal structure
-+ * is freed.
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain)
-+{
-+ if (domain)
-+ kdbus_node_unref(&domain->node);
-+ return NULL;
-+}
-+
-+/**
-+ * kdbus_domain_populate() - populate static domain nodes
-+ * @domain: domain to populate
-+ * @access: KDBUS_MAKE_ACCESS_* access restrictions for new nodes
-+ *
-+ * Allocate and activate static sub-nodes of the given domain. This will fail if
-+ * you call it on a non-active node or if the domain was already populated.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_domain_populate(struct kdbus_domain *domain, unsigned int access)
-+{
-+ struct kdbus_node *control;
-+
-+ /*
-+ * Create a control-node for this domain. We drop our own reference
-+ * immediately, effectively causing the node to be deactivated and
-+ * released when the parent domain is.
-+ */
-+ control = kdbus_domain_control_new(domain, access);
-+ if (IS_ERR(control))
-+ return PTR_ERR(control);
-+
-+ kdbus_node_activate(control);
-+ kdbus_node_unref(control);
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_user_lookup() - lookup a kdbus_user object
-+ * @domain: domain of the user
-+ * @uid: uid of the user; INVALID_UID for an anon user
-+ *
-+ * Lookup the kdbus user accounting object for the given domain. If INVALID_UID
-+ * is passed, a new anonymous user is created which is private to the caller.
-+ *
-+ * Return: The user object is returned, ERR_PTR on failure.
-+ */
-+struct kdbus_user *kdbus_user_lookup(struct kdbus_domain *domain, kuid_t uid)
-+{
-+ struct kdbus_user *u = NULL, *old = NULL;
-+ int ret;
-+
-+ mutex_lock(&domain->lock);
-+
-+ if (uid_valid(uid)) {
-+ old = idr_find(&domain->user_idr, __kuid_val(uid));
-+ /*
-+ * If the object is about to be destroyed, ignore it and
-+ * replace the slot in the IDR later on.
-+ */
-+ if (old && kref_get_unless_zero(&old->kref)) {
-+ mutex_unlock(&domain->lock);
-+ return old;
-+ }
-+ }
-+
-+ u = kzalloc(sizeof(*u), GFP_KERNEL);
-+ if (!u) {
-+ ret = -ENOMEM;
-+ goto exit;
-+ }
-+
-+ kref_init(&u->kref);
-+ u->domain = kdbus_domain_ref(domain);
-+ u->uid = uid;
-+ atomic_set(&u->buses, 0);
-+ atomic_set(&u->connections, 0);
-+
-+ if (uid_valid(uid)) {
-+ if (old) {
-+ idr_replace(&domain->user_idr, u, __kuid_val(uid));
-+ old->uid = INVALID_UID; /* mark old as removed */
-+ } else {
-+ ret = idr_alloc(&domain->user_idr, u, __kuid_val(uid),
-+ __kuid_val(uid) + 1, GFP_KERNEL);
-+ if (ret < 0)
-+ goto exit;
-+ }
-+ }
-+
-+ /*
-+ * Allocate the smallest possible index for this user; used
-+ * in arrays for accounting user quota in receiver queues.
-+ */
-+ ret = ida_simple_get(&domain->user_ida, 1, 0, GFP_KERNEL);
-+ if (ret < 0)
-+ goto exit;
-+
-+ u->id = ret;
-+ mutex_unlock(&domain->lock);
-+ return u;
-+
-+exit:
-+ if (u) {
-+ if (uid_valid(u->uid))
-+ idr_remove(&domain->user_idr, __kuid_val(u->uid));
-+ kdbus_domain_unref(u->domain);
-+ kfree(u);
-+ }
-+ mutex_unlock(&domain->lock);
-+ return ERR_PTR(ret);
-+}
-+
-+static void __kdbus_user_free(struct kref *kref)
-+{
-+ struct kdbus_user *user = container_of(kref, struct kdbus_user, kref);
-+
-+ WARN_ON(atomic_read(&user->buses) > 0);
-+ WARN_ON(atomic_read(&user->connections) > 0);
-+
-+ mutex_lock(&user->domain->lock);
-+ ida_simple_remove(&user->domain->user_ida, user->id);
-+ if (uid_valid(user->uid))
-+ idr_remove(&user->domain->user_idr, __kuid_val(user->uid));
-+ mutex_unlock(&user->domain->lock);
-+
-+ kdbus_domain_unref(user->domain);
-+ kfree(user);
-+}
-+
-+/**
-+ * kdbus_user_ref() - take a user reference
-+ * @u: User
-+ *
-+ * Return: @u is returned
-+ */
-+struct kdbus_user *kdbus_user_ref(struct kdbus_user *u)
-+{
-+ if (u)
-+ kref_get(&u->kref);
-+ return u;
-+}
-+
-+/**
-+ * kdbus_user_unref() - drop a user reference
-+ * @u: User
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_user *kdbus_user_unref(struct kdbus_user *u)
-+{
-+ if (u)
-+ kref_put(&u->kref, __kdbus_user_free);
-+ return NULL;
-+}
-diff --git a/ipc/kdbus/domain.h b/ipc/kdbus/domain.h
-new file mode 100644
-index 0000000..447a2bd
---- /dev/null
-+++ b/ipc/kdbus/domain.h
-@@ -0,0 +1,77 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_DOMAIN_H
-+#define __KDBUS_DOMAIN_H
-+
-+#include <linux/fs.h>
-+#include <linux/idr.h>
-+#include <linux/kref.h>
-+#include <linux/user_namespace.h>
-+
-+#include "node.h"
-+
-+/**
-+ * struct kdbus_domain - domain for buses
-+ * @node: Underlying API node
-+ * @lock: Domain data lock
-+ * @last_id: Last used object id
-+ * @user_idr: Set of all users indexed by UID
-+ * @user_ida: Set of all users to compute small indices
-+ * @user_namespace: User namespace, pinned at creation time
-+ * @dentry: Root dentry of VFS mount (don't use outside of kdbusfs)
-+ */
-+struct kdbus_domain {
-+ struct kdbus_node node;
-+ struct mutex lock;
-+ atomic64_t last_id;
-+ struct idr user_idr;
-+ struct ida user_ida;
-+ struct user_namespace *user_namespace;
-+ struct dentry *dentry;
-+};
-+
-+/**
-+ * struct kdbus_user - resource accounting for users
-+ * @kref: Reference counter
-+ * @domain: Domain of the user
-+ * @id: Index of this user
-+ * @uid: UID of the user
-+ * @buses: Number of buses the user has created
-+ * @connections: Number of connections the user has created
-+ */
-+struct kdbus_user {
-+ struct kref kref;
-+ struct kdbus_domain *domain;
-+ unsigned int id;
-+ kuid_t uid;
-+ atomic_t buses;
-+ atomic_t connections;
-+};
-+
-+#define kdbus_domain_from_node(_node) \
-+ container_of((_node), struct kdbus_domain, node)
-+
-+struct kdbus_domain *kdbus_domain_new(unsigned int access);
-+struct kdbus_domain *kdbus_domain_ref(struct kdbus_domain *domain);
-+struct kdbus_domain *kdbus_domain_unref(struct kdbus_domain *domain);
-+int kdbus_domain_populate(struct kdbus_domain *domain, unsigned int access);
-+
-+#define KDBUS_USER_KERNEL_ID 0 /* ID 0 is reserved for kernel accounting */
-+
-+struct kdbus_user *kdbus_user_lookup(struct kdbus_domain *domain, kuid_t uid);
-+struct kdbus_user *kdbus_user_ref(struct kdbus_user *u);
-+struct kdbus_user *kdbus_user_unref(struct kdbus_user *u);
-+
-+#endif
-diff --git a/ipc/kdbus/endpoint.c b/ipc/kdbus/endpoint.c
-new file mode 100644
-index 0000000..44e7a20
---- /dev/null
-+++ b/ipc/kdbus/endpoint.c
-@@ -0,0 +1,303 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/fs.h>
-+#include <linux/idr.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <linux/uio.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "domain.h"
-+#include "endpoint.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "message.h"
-+#include "policy.h"
-+
-+static void kdbus_ep_free(struct kdbus_node *node)
-+{
-+ struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
-+
-+ WARN_ON(!list_empty(&ep->conn_list));
-+
-+ kdbus_policy_db_clear(&ep->policy_db);
-+ kdbus_bus_unref(ep->bus);
-+ kdbus_user_unref(ep->user);
-+ kfree(ep);
-+}
-+
-+static void kdbus_ep_release(struct kdbus_node *node, bool was_active)
-+{
-+ struct kdbus_ep *ep = container_of(node, struct kdbus_ep, node);
-+
-+ /* disconnect all connections to this endpoint */
-+ for (;;) {
-+ struct kdbus_conn *conn;
-+
-+ mutex_lock(&ep->lock);
-+ conn = list_first_entry_or_null(&ep->conn_list,
-+ struct kdbus_conn,
-+ ep_entry);
-+ if (!conn) {
-+ mutex_unlock(&ep->lock);
-+ break;
-+ }
-+
-+ /* take reference, release lock, disconnect without lock */
-+ kdbus_conn_ref(conn);
-+ mutex_unlock(&ep->lock);
-+
-+ kdbus_conn_disconnect(conn, false);
-+ kdbus_conn_unref(conn);
-+ }
-+}
-+
-+/**
-+ * kdbus_ep_new() - create a new endpoint
-+ * @bus: The bus this endpoint will be created for
-+ * @name: The name of the endpoint
-+ * @access: The access flags for this node (KDBUS_MAKE_ACCESS_*)
-+ * @uid: The uid of the node
-+ * @gid: The gid of the node
-+ * @is_custom: Whether this is a custom endpoint
-+ *
-+ * This function will create a new endpoint with the given
-+ * name and properties for a given bus.
-+ *
-+ * Return: a new kdbus_ep on success, ERR_PTR on failure.
-+ */
-+struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
-+ unsigned int access, kuid_t uid, kgid_t gid,
-+ bool is_custom)
-+{
-+ struct kdbus_ep *e;
-+ int ret;
-+
-+ /*
-+ * Validate only custom endpoint names; default endpoints
-+ * with a "bus" name are created when the bus is created.
-+ */
-+ if (is_custom) {
-+ ret = kdbus_verify_uid_prefix(name, bus->domain->user_namespace,
-+ uid);
-+ if (ret < 0)
-+ return ERR_PTR(ret);
-+ }
-+
-+ e = kzalloc(sizeof(*e), GFP_KERNEL);
-+ if (!e)
-+ return ERR_PTR(-ENOMEM);
-+
-+ kdbus_node_init(&e->node, KDBUS_NODE_ENDPOINT);
-+
-+ e->node.free_cb = kdbus_ep_free;
-+ e->node.release_cb = kdbus_ep_release;
-+ e->node.uid = uid;
-+ e->node.gid = gid;
-+ e->node.mode = S_IRUSR | S_IWUSR;
-+ if (access & (KDBUS_MAKE_ACCESS_GROUP | KDBUS_MAKE_ACCESS_WORLD))
-+ e->node.mode |= S_IRGRP | S_IWGRP;
-+ if (access & KDBUS_MAKE_ACCESS_WORLD)
-+ e->node.mode |= S_IROTH | S_IWOTH;
-+
-+ mutex_init(&e->lock);
-+ INIT_LIST_HEAD(&e->conn_list);
-+ kdbus_policy_db_init(&e->policy_db);
-+ e->bus = kdbus_bus_ref(bus);
-+
-+ ret = kdbus_node_link(&e->node, &bus->node, name);
-+ if (ret < 0)
-+ goto exit_unref;
-+
-+ /*
-+ * Transactions on custom endpoints are never accounted on the global
-+ * user limits. Instead, for each custom endpoint, we create a custom,
-+ * unique user, which all transactions are accounted on. Regardless of
-+ * the user using that endpoint, it is always accounted on the same
-+ * user-object. This budget is not shared with ordinary users on
-+ * non-custom endpoints.
-+ */
-+ if (is_custom) {
-+ e->user = kdbus_user_lookup(bus->domain, INVALID_UID);
-+ if (IS_ERR(e->user)) {
-+ ret = PTR_ERR(e->user);
-+ e->user = NULL;
-+ goto exit_unref;
-+ }
-+ }
-+
-+ return e;
-+
-+exit_unref:
-+ kdbus_node_deactivate(&e->node);
-+ kdbus_node_unref(&e->node);
-+ return ERR_PTR(ret);
-+}
-+
-+/**
-+ * kdbus_ep_ref() - increase the reference counter of a kdbus_ep
-+ * @ep: The endpoint to reference
-+ *
-+ * Every user of an endpoint, except for its creator, must add a reference to
-+ * the kdbus_ep instance using this function.
-+ *
-+ * Return: the ep itself
-+ */
-+struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep)
-+{
-+ if (ep)
-+ kdbus_node_ref(&ep->node);
-+ return ep;
-+}
-+
-+/**
-+ * kdbus_ep_unref() - decrease the reference counter of a kdbus_ep
-+ * @ep: The ep to unref
-+ *
-+ * Release a reference. If the reference count drops to 0, the ep will be
-+ * freed.
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep)
-+{
-+ if (ep)
-+ kdbus_node_unref(&ep->node);
-+ return NULL;
-+}
-+
-+/**
-+ * kdbus_ep_is_privileged() - check whether a file is privileged
-+ * @ep: endpoint to operate on
-+ * @file: file to test
-+ *
-+ * Return: True if @file is privileged in the domain of @ep.
-+ */
-+bool kdbus_ep_is_privileged(struct kdbus_ep *ep, struct file *file)
-+{
-+ return !ep->user &&
-+ file_ns_capable(file, ep->bus->domain->user_namespace,
-+ CAP_IPC_OWNER);
-+}
-+
-+/**
-+ * kdbus_ep_is_owner() - check whether a file should be treated as bus owner
-+ * @ep: endpoint to operate on
-+ * @file: file to test
-+ *
-+ * Return: True if @file should be treated as bus owner on @ep
-+ */
-+bool kdbus_ep_is_owner(struct kdbus_ep *ep, struct file *file)
-+{
-+ return !ep->user &&
-+ (uid_eq(file->f_cred->euid, ep->bus->node.uid) ||
-+ kdbus_ep_is_privileged(ep, file));
-+}
-+
-+/**
-+ * kdbus_cmd_ep_make() - handle KDBUS_CMD_ENDPOINT_MAKE
-+ * @bus: bus to operate on
-+ * @argp: command payload
-+ *
-+ * Return: NULL or newly created endpoint on success, ERR_PTR on failure.
-+ */
-+struct kdbus_ep *kdbus_cmd_ep_make(struct kdbus_bus *bus, void __user *argp)
-+{
-+ const char *item_make_name;
-+ struct kdbus_ep *ep = NULL;
-+ struct kdbus_cmd *cmd;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ { .type = KDBUS_ITEM_MAKE_NAME, .mandatory = true },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+ KDBUS_MAKE_ACCESS_GROUP |
-+ KDBUS_MAKE_ACCESS_WORLD,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret < 0)
-+ return ERR_PTR(ret);
-+ if (ret > 0)
-+ return NULL;
-+
-+ item_make_name = argv[1].item->str;
-+
-+ ep = kdbus_ep_new(bus, item_make_name, cmd->flags,
-+ current_euid(), current_egid(), true);
-+ if (IS_ERR(ep)) {
-+ ret = PTR_ERR(ep);
-+ ep = NULL;
-+ goto exit;
-+ }
-+
-+ if (!kdbus_node_activate(&ep->node)) {
-+ ret = -ESHUTDOWN;
-+ goto exit;
-+ }
-+
-+exit:
-+ ret = kdbus_args_clear(&args, ret);
-+ if (ret < 0) {
-+ if (ep) {
-+ kdbus_node_deactivate(&ep->node);
-+ kdbus_ep_unref(ep);
-+ }
-+ return ERR_PTR(ret);
-+ }
-+ return ep;
-+}
-+
-+/**
-+ * kdbus_cmd_ep_update() - handle KDBUS_CMD_ENDPOINT_UPDATE
-+ * @ep: endpoint to operate on
-+ * @argp: command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_ep_update(struct kdbus_ep *ep, void __user *argp)
-+{
-+ struct kdbus_cmd *cmd;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ { .type = KDBUS_ITEM_NAME, .multiple = true },
-+ { .type = KDBUS_ITEM_POLICY_ACCESS, .multiple = true },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret != 0)
-+ return ret;
-+
-+ ret = kdbus_policy_set(&ep->policy_db, args.items, args.items_size,
-+ 0, true, ep);
-+ return kdbus_args_clear(&args, ret);
-+}
-diff --git a/ipc/kdbus/endpoint.h b/ipc/kdbus/endpoint.h
-new file mode 100644
-index 0000000..e0da59f
---- /dev/null
-+++ b/ipc/kdbus/endpoint.h
-@@ -0,0 +1,70 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_ENDPOINT_H
-+#define __KDBUS_ENDPOINT_H
-+
-+#include <linux/list.h>
-+#include <linux/mutex.h>
-+#include <linux/uidgid.h>
-+#include "node.h"
-+#include "policy.h"
-+
-+struct kdbus_bus;
-+struct kdbus_user;
-+
-+/**
-+ * struct kdbus_ep - endpoint to access a bus
-+ * @node: The kdbus node
-+ * @lock: Endpoint data lock
-+ * @bus: Bus behind this endpoint
-+ * @user: Custom endpoints account against an anonymous user
-+ * @policy_db: Uploaded policy
-+ * @conn_list: Connections of this endpoint
-+ *
-+ * An endpoint offers access to a bus; the default endpoint node name is "bus".
-+ * Additional custom endpoints to the same bus can be created and they can
-+ * carry their own policies/filters.
-+ */
-+struct kdbus_ep {
-+ struct kdbus_node node;
-+ struct mutex lock;
-+
-+ /* static */
-+ struct kdbus_bus *bus;
-+ struct kdbus_user *user;
-+
-+ /* protected by own locks */
-+ struct kdbus_policy_db policy_db;
-+
-+ /* protected by ep->lock */
-+ struct list_head conn_list;
-+};
-+
-+#define kdbus_ep_from_node(_node) \
-+ container_of((_node), struct kdbus_ep, node)
-+
-+struct kdbus_ep *kdbus_ep_new(struct kdbus_bus *bus, const char *name,
-+ unsigned int access, kuid_t uid, kgid_t gid,
-+ bool is_custom);
-+struct kdbus_ep *kdbus_ep_ref(struct kdbus_ep *ep);
-+struct kdbus_ep *kdbus_ep_unref(struct kdbus_ep *ep);
-+
-+bool kdbus_ep_is_privileged(struct kdbus_ep *ep, struct file *file);
-+bool kdbus_ep_is_owner(struct kdbus_ep *ep, struct file *file);
-+
-+struct kdbus_ep *kdbus_cmd_ep_make(struct kdbus_bus *bus, void __user *argp);
-+int kdbus_cmd_ep_update(struct kdbus_ep *ep, void __user *argp);
-+
-+#endif
-diff --git a/ipc/kdbus/fs.c b/ipc/kdbus/fs.c
-new file mode 100644
-index 0000000..09c4809
---- /dev/null
-+++ b/ipc/kdbus/fs.c
-@@ -0,0 +1,508 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/dcache.h>
-+#include <linux/fs.h>
-+#include <linux/fsnotify.h>
-+#include <linux/init.h>
-+#include <linux/ipc_namespace.h>
-+#include <linux/magic.h>
-+#include <linux/module.h>
-+#include <linux/mount.h>
-+#include <linux/mutex.h>
-+#include <linux/namei.h>
-+#include <linux/pagemap.h>
-+#include <linux/sched.h>
-+#include <linux/slab.h>
-+
-+#include "bus.h"
-+#include "domain.h"
-+#include "endpoint.h"
-+#include "fs.h"
-+#include "handle.h"
-+#include "node.h"
-+
-+#define kdbus_node_from_dentry(_dentry) \
-+ ((struct kdbus_node *)(_dentry)->d_fsdata)
-+
-+static struct inode *fs_inode_get(struct super_block *sb,
-+ struct kdbus_node *node);
-+
-+/*
-+ * Directory Management
-+ */
-+
-+static inline unsigned char kdbus_dt_type(struct kdbus_node *node)
-+{
-+ switch (node->type) {
-+ case KDBUS_NODE_DOMAIN:
-+ case KDBUS_NODE_BUS:
-+ return DT_DIR;
-+ case KDBUS_NODE_CONTROL:
-+ case KDBUS_NODE_ENDPOINT:
-+ return DT_REG;
-+ }
-+
-+ return DT_UNKNOWN;
-+}
-+
-+static int fs_dir_fop_iterate(struct file *file, struct dir_context *ctx)
-+{
-+ struct dentry *dentry = file->f_path.dentry;
-+ struct kdbus_node *parent = kdbus_node_from_dentry(dentry);
-+ struct kdbus_node *old, *next = file->private_data;
-+
-+ /*
-+ * kdbusfs directory iterator (modelled after sysfs/kernfs)
-+ * When iterating kdbusfs directories, we iterate all children of the
-+ * parent kdbus_node object. We use ctx->pos to store the hash of the
-+ * child and file->private_data to store a reference to the next node
-+ * object. If ctx->pos is not modified via llseek while you iterate a
-+ * directory, then we use the file->private_data node pointer to
-+ * directly access the next node in the tree.
-+ * However, if you directly seek on the directory, we have to find the
-+ * closest node to that position and cannot use our node pointer. This
-+ * means iterating the rb-tree to find the closest match and start over
-+ * from there.
-+ * Note that hash values are not necessarily unique. Therefore, llseek
-+ * is not guaranteed to seek to the same node that you got when you
-+ * retrieved the position. Seeking to 0, 1, 2 and >=INT_MAX is safe,
-+ * though. We could use the inode-number as position, but this would
-+ * require another rb-tree for fast access. Kernfs and others already
-+ * ignore those conflicts, so we should be fine, too.
-+ */
-+
-+ if (!dir_emit_dots(file, ctx))
-+ return 0;
-+
-+ /* acquire @next; if deactivated, or seek detected, find next node */
-+ old = next;
-+ if (next && ctx->pos == next->hash) {
-+ if (kdbus_node_acquire(next))
-+ kdbus_node_ref(next);
-+ else
-+ next = kdbus_node_next_child(parent, next);
-+ } else {
-+ next = kdbus_node_find_closest(parent, ctx->pos);
-+ }
-+ kdbus_node_unref(old);
-+
-+ while (next) {
-+ /* emit @next */
-+ file->private_data = next;
-+ ctx->pos = next->hash;
-+
-+ kdbus_node_release(next);
-+
-+ if (!dir_emit(ctx, next->name, strlen(next->name), next->id,
-+ kdbus_dt_type(next)))
-+ return 0;
-+
-+ /* find next node after @next */
-+ old = next;
-+ next = kdbus_node_next_child(parent, next);
-+ kdbus_node_unref(old);
-+ }
-+
-+ file->private_data = NULL;
-+ ctx->pos = INT_MAX;
-+
-+ return 0;
-+}
-+
-+static loff_t fs_dir_fop_llseek(struct file *file, loff_t offset, int whence)
-+{
-+ struct inode *inode = file_inode(file);
-+ loff_t ret;
-+
-+ /* protect f_pos against fop_iterate */
-+ mutex_lock(&inode->i_mutex);
-+ ret = generic_file_llseek(file, offset, whence);
-+ mutex_unlock(&inode->i_mutex);
-+
-+ return ret;
-+}
-+
-+static int fs_dir_fop_release(struct inode *inode, struct file *file)
-+{
-+ kdbus_node_unref(file->private_data);
-+ return 0;
-+}
-+
-+static const struct file_operations fs_dir_fops = {
-+ .read = generic_read_dir,
-+ .iterate = fs_dir_fop_iterate,
-+ .llseek = fs_dir_fop_llseek,
-+ .release = fs_dir_fop_release,
-+};
-+
-+static struct dentry *fs_dir_iop_lookup(struct inode *dir,
-+ struct dentry *dentry,
-+ unsigned int flags)
-+{
-+ struct dentry *dnew = NULL;
-+ struct kdbus_node *parent;
-+ struct kdbus_node *node;
-+ struct inode *inode;
-+
-+ parent = kdbus_node_from_dentry(dentry->d_parent);
-+ if (!kdbus_node_acquire(parent))
-+ return NULL;
-+
-+ /* returns reference to _acquired_ child node */
-+ node = kdbus_node_find_child(parent, dentry->d_name.name);
-+ if (node) {
-+ dentry->d_fsdata = node;
-+ inode = fs_inode_get(dir->i_sb, node);
-+ if (IS_ERR(inode))
-+ dnew = ERR_CAST(inode);
-+ else
-+ dnew = d_splice_alias(inode, dentry);
-+
-+ kdbus_node_release(node);
-+ }
-+
-+ kdbus_node_release(parent);
-+ return dnew;
-+}
-+
-+static const struct inode_operations fs_dir_iops = {
-+ .permission = generic_permission,
-+ .lookup = fs_dir_iop_lookup,
-+};
-+
-+/*
-+ * Inode Management
-+ */
-+
-+static const struct inode_operations fs_inode_iops = {
-+ .permission = generic_permission,
-+};
-+
-+static struct inode *fs_inode_get(struct super_block *sb,
-+ struct kdbus_node *node)
-+{
-+ struct inode *inode;
-+
-+ inode = iget_locked(sb, node->id);
-+ if (!inode)
-+ return ERR_PTR(-ENOMEM);
-+ if (!(inode->i_state & I_NEW))
-+ return inode;
-+
-+ inode->i_private = kdbus_node_ref(node);
-+ inode->i_mapping->a_ops = &empty_aops;
-+ inode->i_mode = node->mode & S_IALLUGO;
-+ inode->i_atime = inode->i_ctime = inode->i_mtime = CURRENT_TIME;
-+ inode->i_uid = node->uid;
-+ inode->i_gid = node->gid;
-+
-+ switch (node->type) {
-+ case KDBUS_NODE_DOMAIN:
-+ case KDBUS_NODE_BUS:
-+ inode->i_mode |= S_IFDIR;
-+ inode->i_op = &fs_dir_iops;
-+ inode->i_fop = &fs_dir_fops;
-+ set_nlink(inode, 2);
-+ break;
-+ case KDBUS_NODE_CONTROL:
-+ case KDBUS_NODE_ENDPOINT:
-+ inode->i_mode |= S_IFREG;
-+ inode->i_op = &fs_inode_iops;
-+ inode->i_fop = &kdbus_handle_ops;
-+ break;
-+ }
-+
-+ unlock_new_inode(inode);
-+
-+ return inode;
-+}
-+
-+/*
-+ * Superblock Management
-+ */
-+
-+static int fs_super_dop_revalidate(struct dentry *dentry, unsigned int flags)
-+{
-+ struct kdbus_node *node;
-+
-+ /* Force lookup on negatives */
-+ if (!dentry->d_inode)
-+ return 0;
-+
-+ node = kdbus_node_from_dentry(dentry);
-+
-+ /* see whether the node has been removed */
-+ if (!kdbus_node_is_active(node))
-+ return 0;
-+
-+ return 1;
-+}
-+
-+static void fs_super_dop_release(struct dentry *dentry)
-+{
-+ kdbus_node_unref(dentry->d_fsdata);
-+}
-+
-+static const struct dentry_operations fs_super_dops = {
-+ .d_revalidate = fs_super_dop_revalidate,
-+ .d_release = fs_super_dop_release,
-+};
-+
-+static void fs_super_sop_evict_inode(struct inode *inode)
-+{
-+ struct kdbus_node *node = kdbus_node_from_inode(inode);
-+
-+ truncate_inode_pages_final(&inode->i_data);
-+ clear_inode(inode);
-+ kdbus_node_unref(node);
-+}
-+
-+static const struct super_operations fs_super_sops = {
-+ .statfs = simple_statfs,
-+ .drop_inode = generic_delete_inode,
-+ .evict_inode = fs_super_sop_evict_inode,
-+};
-+
-+static int fs_super_fill(struct super_block *sb)
-+{
-+ struct kdbus_domain *domain = sb->s_fs_info;
-+ struct inode *inode;
-+ int ret;
-+
-+ sb->s_blocksize = PAGE_CACHE_SIZE;
-+ sb->s_blocksize_bits = PAGE_CACHE_SHIFT;
-+ sb->s_magic = KDBUS_SUPER_MAGIC;
-+ sb->s_maxbytes = MAX_LFS_FILESIZE;
-+ sb->s_op = &fs_super_sops;
-+ sb->s_time_gran = 1;
-+
-+ inode = fs_inode_get(sb, &domain->node);
-+ if (IS_ERR(inode))
-+ return PTR_ERR(inode);
-+
-+ sb->s_root = d_make_root(inode);
-+ if (!sb->s_root) {
-+ /* d_make_root iput()s the inode on failure */
-+ return -ENOMEM;
-+ }
-+
-+ /* sb holds domain reference */
-+ sb->s_root->d_fsdata = &domain->node;
-+ sb->s_d_op = &fs_super_dops;
-+
-+ /* sb holds root reference */
-+ domain->dentry = sb->s_root;
-+
-+ if (!kdbus_node_activate(&domain->node))
-+ return -ESHUTDOWN;
-+
-+ ret = kdbus_domain_populate(domain, KDBUS_MAKE_ACCESS_WORLD);
-+ if (ret < 0)
-+ return ret;
-+
-+ sb->s_flags |= MS_ACTIVE;
-+ return 0;
-+}
-+
-+static void fs_super_kill(struct super_block *sb)
-+{
-+ struct kdbus_domain *domain = sb->s_fs_info;
-+
-+ if (domain) {
-+ kdbus_node_deactivate(&domain->node);
-+ domain->dentry = NULL;
-+ }
-+
-+ kill_anon_super(sb);
-+ kdbus_domain_unref(domain);
-+}
-+
-+static int fs_super_set(struct super_block *sb, void *data)
-+{
-+ int ret;
-+
-+ ret = set_anon_super(sb, data);
-+ if (!ret)
-+ sb->s_fs_info = data;
-+
-+ return ret;
-+}
-+
-+static struct dentry *fs_super_mount(struct file_system_type *fs_type,
-+ int flags, const char *dev_name,
-+ void *data)
-+{
-+ struct kdbus_domain *domain;
-+ struct super_block *sb;
-+ int ret;
-+
-+ domain = kdbus_domain_new(KDBUS_MAKE_ACCESS_WORLD);
-+ if (IS_ERR(domain))
-+ return ERR_CAST(domain);
-+
-+ sb = sget(fs_type, NULL, fs_super_set, flags, domain);
-+ if (IS_ERR(sb)) {
-+ kdbus_node_deactivate(&domain->node);
-+ kdbus_domain_unref(domain);
-+ return ERR_CAST(sb);
-+ }
-+
-+ WARN_ON(sb->s_fs_info != domain);
-+ WARN_ON(sb->s_root);
-+
-+ ret = fs_super_fill(sb);
-+ if (ret < 0) {
-+ /* calls into ->kill_sb() when done */
-+ deactivate_locked_super(sb);
-+ return ERR_PTR(ret);
-+ }
-+
-+ return dget(sb->s_root);
-+}
-+
-+static struct file_system_type fs_type = {
-+ .name = KBUILD_MODNAME "fs",
-+ .owner = THIS_MODULE,
-+ .mount = fs_super_mount,
-+ .kill_sb = fs_super_kill,
-+ .fs_flags = FS_USERNS_MOUNT,
-+};
-+
-+/**
-+ * kdbus_fs_init() - register kdbus filesystem
-+ *
-+ * This registers a filesystem with the VFS layer. The filesystem is called
-+ * `KBUILD_MODNAME "fs"', which usually resolves to `kdbusfs'. The naming
-+ * scheme lets you set KBUILD_MODNAME to "kdbus2" and get an
-+ * independent filesystem for developers.
-+ *
-+ * Each mount of the kdbusfs filesystem has a kdbus_domain attached.
-+ * Operations on this mount will only affect the attached domain. On each mount
-+ * a new domain is automatically created and used for this mount exclusively.
-+ * If you want to share a domain across multiple mounts, you need to bind-mount
-+ * it.
-+ *
-+ * Mounts of kdbusfs (with a different domain each) are unrelated to each other
-+ * and will never have any effect on any domain but their own.
-+ *
-+ * Return: 0 on success, negative error otherwise.
-+ */
-+int kdbus_fs_init(void)
-+{
-+ return register_filesystem(&fs_type);
-+}
-+
-+/**
-+ * kdbus_fs_exit() - unregister kdbus filesystem
-+ *
-+ * This does the reverse to kdbus_fs_init(). It unregisters the kdbusfs
-+ * filesystem from VFS and cleans up any allocated resources.
-+ */
-+void kdbus_fs_exit(void)
-+{
-+ unregister_filesystem(&fs_type);
-+}
-+
-+/* acquire domain of @node, making sure all ancestors are active */
-+static struct kdbus_domain *fs_acquire_domain(struct kdbus_node *node)
-+{
-+ struct kdbus_domain *domain;
-+ struct kdbus_node *iter;
-+
-+ /* caller must guarantee that @node is linked */
-+ for (iter = node; iter->parent; iter = iter->parent)
-+ if (!kdbus_node_is_active(iter->parent))
-+ return NULL;
-+
-+ /* root nodes are always domains */
-+ if (WARN_ON(iter->type != KDBUS_NODE_DOMAIN))
-+ return NULL;
-+
-+ domain = kdbus_domain_from_node(iter);
-+ if (!kdbus_node_acquire(&domain->node))
-+ return NULL;
-+
-+ return domain;
-+}
-+
-+/**
-+ * kdbus_fs_flush() - flush dcache entries of a node
-+ * @node: Node to flush entries of
-+ *
-+ * This flushes all VFS filesystem cache entries for a node and all its
-+ * children. This should be called whenever a node is destroyed during
-+ * runtime. It will flush the cache entries so the linked objects can be
-+ * deallocated.
-+ *
-+ * This is a no-op if you call it on active nodes (they really should stay in
-+ * cache) or on nodes with deactivated parents (flushing the parent is enough).
-+ * Furthermore, there is no need to call it on nodes whose lifetime is bound to
-+ * their parents'. In those cases, the parent-flush will always also flush the
-+ * children.
-+ */
-+void kdbus_fs_flush(struct kdbus_node *node)
-+{
-+ struct dentry *dentry, *parent_dentry = NULL;
-+ struct kdbus_domain *domain;
-+ struct qstr name;
-+
-+ /* active nodes should remain in cache */
-+ if (!kdbus_node_is_deactivated(node))
-+ return;
-+
-+ /* nodes that were never linked were never instantiated */
-+ if (!node->parent)
-+ return;
-+
-+ /* acquire domain and verify all ancestors are active */
-+ domain = fs_acquire_domain(node);
-+ if (!domain)
-+ return;
-+
-+ switch (node->type) {
-+ case KDBUS_NODE_ENDPOINT:
-+ if (WARN_ON(!node->parent || !node->parent->name))
-+ goto exit;
-+
-+ name.name = node->parent->name;
-+ name.len = strlen(node->parent->name);
-+ parent_dentry = d_hash_and_lookup(domain->dentry, &name);
-+ if (IS_ERR_OR_NULL(parent_dentry))
-+ goto exit;
-+
-+ /* fallthrough */
-+ case KDBUS_NODE_BUS:
-+ if (WARN_ON(!node->name))
-+ goto exit;
-+
-+ name.name = node->name;
-+ name.len = strlen(node->name);
-+ dentry = d_hash_and_lookup(parent_dentry ? : domain->dentry,
-+ &name);
-+ if (!IS_ERR_OR_NULL(dentry)) {
-+ d_invalidate(dentry);
-+ dput(dentry);
-+ }
-+
-+ dput(parent_dentry);
-+ break;
-+
-+ default:
-+ /* all other types are bound to their parent lifetime */
-+ break;
-+ }
-+
-+exit:
-+ kdbus_node_release(&domain->node);
-+}
-diff --git a/ipc/kdbus/fs.h b/ipc/kdbus/fs.h
-new file mode 100644
-index 0000000..62f7d6a
---- /dev/null
-+++ b/ipc/kdbus/fs.h
-@@ -0,0 +1,28 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUSFS_H
-+#define __KDBUSFS_H
-+
-+#include <linux/kernel.h>
-+
-+struct kdbus_node;
-+
-+int kdbus_fs_init(void);
-+void kdbus_fs_exit(void);
-+void kdbus_fs_flush(struct kdbus_node *node);
-+
-+#define kdbus_node_from_inode(_inode) \
-+ ((struct kdbus_node *)(_inode)->i_private)
-+
-+#endif
-diff --git a/ipc/kdbus/handle.c b/ipc/kdbus/handle.c
-new file mode 100644
-index 0000000..fc60932
---- /dev/null
-+++ b/ipc/kdbus/handle.c
-@@ -0,0 +1,691 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/file.h>
-+#include <linux/fs.h>
-+#include <linux/idr.h>
-+#include <linux/init.h>
-+#include <linux/kdev_t.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/poll.h>
-+#include <linux/rwsem.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <linux/syscalls.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "endpoint.h"
-+#include "fs.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "match.h"
-+#include "message.h"
-+#include "names.h"
-+#include "domain.h"
-+#include "policy.h"
-+
-+static int kdbus_args_verify(struct kdbus_args *args)
-+{
-+ struct kdbus_item *item;
-+ size_t i;
-+ int ret;
-+
-+ KDBUS_ITEMS_FOREACH(item, args->items, args->items_size) {
-+ struct kdbus_arg *arg = NULL;
-+
-+ if (!KDBUS_ITEM_VALID(item, args->items, args->items_size))
-+ return -EINVAL;
-+
-+ for (i = 0; i < args->argc; ++i)
-+ if (args->argv[i].type == item->type)
-+ break;
-+ if (i >= args->argc)
-+ return -EINVAL;
-+
-+ arg = &args->argv[i];
-+
-+ ret = kdbus_item_validate(item);
-+ if (ret < 0)
-+ return ret;
-+
-+ if (arg->item && !arg->multiple)
-+ return -EINVAL;
-+
-+ arg->item = item;
-+ }
-+
-+ if (!KDBUS_ITEMS_END(item, args->items, args->items_size))
-+ return -EINVAL;
-+
-+ return 0;
-+}
-+
-+static int kdbus_args_negotiate(struct kdbus_args *args)
-+{
-+ struct kdbus_item __user *user;
-+ struct kdbus_item *negotiation;
-+ size_t i, j, num;
-+
-+ /*
-+ * If KDBUS_FLAG_NEGOTIATE is set, we overwrite the flags field with
-+ * the set of supported flags. Furthermore, if a KDBUS_ITEM_NEGOTIATE
-+ * item is passed, we iterate its payload (array of u64, each set to an
-+ * item type) and clear all unsupported item-types to 0.
-+ * The caller might do this recursively, if other flags or objects are
-+ * embedded in the payload itself.
-+ */
-+
-+ if (args->cmd->flags & KDBUS_FLAG_NEGOTIATE) {
-+ if (put_user(args->allowed_flags & ~KDBUS_FLAG_NEGOTIATE,
-+ &args->user->flags))
-+ return -EFAULT;
-+ }
-+
-+ if (args->argc < 1 || args->argv[0].type != KDBUS_ITEM_NEGOTIATE ||
-+ !args->argv[0].item)
-+ return 0;
-+
-+ negotiation = args->argv[0].item;
-+ user = (struct kdbus_item __user *)
-+ ((u8 __user *)args->user +
-+ ((u8 *)negotiation - (u8 *)args->cmd));
-+ num = KDBUS_ITEM_PAYLOAD_SIZE(negotiation) / sizeof(u64);
-+
-+ for (i = 0; i < num; ++i) {
-+ for (j = 0; j < args->argc; ++j)
-+ if (negotiation->data64[i] == args->argv[j].type)
-+ break;
-+
-+ if (j < args->argc)
-+ continue;
-+
-+ /* this item is not supported, clear it out */
-+ negotiation->data64[i] = 0;
-+ if (put_user(negotiation->data64[i], &user->data64[i]))
-+ return -EFAULT;
-+ }
-+
-+ return 0;
-+}
-+
-+/**
-+ * __kdbus_args_parse() - parse payload of kdbus command
-+ * @args: object to parse data into
-+ * @is_cmd: whether this is a command or msg payload
-+ * @argp: user-space location of command payload to parse
-+ * @type_size: overall size of command payload to parse
-+ * @items_offset: offset of items array in command payload
-+ * @out: output variable to store pointer to copied payload
-+ *
-+ * This parses the ioctl payload at user-space location @argp into @args. @args
-+ * must be pre-initialized by the caller to reflect the supported flags and
-+ * items of this command. This parser will then copy the command payload into
-+ * kernel-space, verify correctness and consistency and cache pointers to parsed
-+ * items and other data in @args.
-+ *
-+ * If this function succeeded, you must call kdbus_args_clear() to release
-+ * allocated resources before destroying @args.
-+ *
-+ * This can also be used to import kdbus_msg objects. In that case, @is_cmd must
-+ * be set to 'false' and the 'return_flags' field will not be touched (as it
-+ * doesn't exist on kdbus_msg).
-+ *
-+ * Return: On failure a negative error code is returned. Otherwise, 1 is
-+ * returned if negotiation was requested, 0 if not.
-+ */
-+int __kdbus_args_parse(struct kdbus_args *args, bool is_cmd, void __user *argp,
-+ size_t type_size, size_t items_offset, void **out)
-+{
-+ u64 user_size;
-+ int ret, i;
-+
-+ ret = kdbus_copy_from_user(&user_size, argp, sizeof(user_size));
-+ if (ret < 0)
-+ return ret;
-+
-+ if (user_size < type_size)
-+ return -EINVAL;
-+ if (user_size > KDBUS_CMD_MAX_SIZE)
-+ return -EMSGSIZE;
-+
-+ if (user_size <= sizeof(args->cmd_buf)) {
-+ if (copy_from_user(args->cmd_buf, argp, user_size))
-+ return -EFAULT;
-+ args->cmd = (void*)args->cmd_buf;
-+ } else {
-+ args->cmd = memdup_user(argp, user_size);
-+ if (IS_ERR(args->cmd))
-+ return PTR_ERR(args->cmd);
-+ }
-+
-+ if (args->cmd->size != user_size) {
-+ ret = -EINVAL;
-+ goto error;
-+ }
-+
-+ if (is_cmd)
-+ args->cmd->return_flags = 0;
-+ args->user = argp;
-+ args->items = (void *)((u8 *)args->cmd + items_offset);
-+ args->items_size = args->cmd->size - items_offset;
-+ args->is_cmd = is_cmd;
-+
-+ if (args->cmd->flags & ~args->allowed_flags) {
-+ ret = -EINVAL;
-+ goto error;
-+ }
-+
-+ ret = kdbus_args_verify(args);
-+ if (ret < 0)
-+ goto error;
-+
-+ ret = kdbus_args_negotiate(args);
-+ if (ret < 0)
-+ goto error;
-+
-+ /* mandatory items must be given (but not on negotiation) */
-+ if (!(args->cmd->flags & KDBUS_FLAG_NEGOTIATE)) {
-+ for (i = 0; i < args->argc; ++i)
-+ if (args->argv[i].mandatory && !args->argv[i].item) {
-+ ret = -EINVAL;
-+ goto error;
-+ }
-+ }
-+
-+ *out = args->cmd;
-+ return !!(args->cmd->flags & KDBUS_FLAG_NEGOTIATE);
-+
-+error:
-+ return kdbus_args_clear(args, ret);
-+}
-+
-+/**
-+ * kdbus_args_clear() - release allocated command resources
-+ * @args: object to release resources of
-+ * @ret: return value of this command
-+ *
-+ * This frees all allocated resources on @args and copies the command result
-+ * flags into user-space. @ret is usually returned unchanged by this function,
-+ * so it can be used in the final 'return' statement of the command handler.
-+ *
-+ * Return: -EFAULT if return values cannot be copied into user-space, otherwise
-+ * @ret is returned unchanged.
-+ */
-+int kdbus_args_clear(struct kdbus_args *args, int ret)
-+{
-+ if (!args)
-+ return ret;
-+
-+ if (!IS_ERR_OR_NULL(args->cmd)) {
-+ if (args->is_cmd && put_user(args->cmd->return_flags,
-+ &args->user->return_flags))
-+ ret = -EFAULT;
-+ if (args->cmd != (void*)args->cmd_buf)
-+ kfree(args->cmd);
-+ args->cmd = NULL;
-+ }
-+
-+ return ret;
-+}
-+
-+/**
-+ * enum kdbus_handle_type - type a handle can be of
-+ * @KDBUS_HANDLE_NONE: no type set, yet
-+ * @KDBUS_HANDLE_BUS_OWNER: bus owner
-+ * @KDBUS_HANDLE_EP_OWNER: endpoint owner
-+ * @KDBUS_HANDLE_CONNECTED: endpoint connection after HELLO
-+ */
-+enum kdbus_handle_type {
-+ KDBUS_HANDLE_NONE,
-+ KDBUS_HANDLE_BUS_OWNER,
-+ KDBUS_HANDLE_EP_OWNER,
-+ KDBUS_HANDLE_CONNECTED,
-+};
-+
-+/**
-+ * struct kdbus_handle - handle to the kdbus system
-+ * @lock: handle lock
-+ * @type: type of this handle (KDBUS_HANDLE_*)
-+ * @bus_owner: bus this handle owns
-+ * @ep_owner: endpoint this handle owns
-+ * @conn: connection this handle owns
-+ */
-+struct kdbus_handle {
-+ struct mutex lock;
-+
-+ enum kdbus_handle_type type;
-+ union {
-+ struct kdbus_bus *bus_owner;
-+ struct kdbus_ep *ep_owner;
-+ struct kdbus_conn *conn;
-+ };
-+};
-+
-+static int kdbus_handle_open(struct inode *inode, struct file *file)
-+{
-+ struct kdbus_handle *handle;
-+ struct kdbus_node *node;
-+ int ret;
-+
-+ node = kdbus_node_from_inode(inode);
-+ if (!kdbus_node_acquire(node))
-+ return -ESHUTDOWN;
-+
-+ handle = kzalloc(sizeof(*handle), GFP_KERNEL);
-+ if (!handle) {
-+ ret = -ENOMEM;
-+ goto exit;
-+ }
-+
-+ mutex_init(&handle->lock);
-+ handle->type = KDBUS_HANDLE_NONE;
-+
-+ file->private_data = handle;
-+ ret = 0;
-+
-+exit:
-+ kdbus_node_release(node);
-+ return ret;
-+}
-+
-+static int kdbus_handle_release(struct inode *inode, struct file *file)
-+{
-+ struct kdbus_handle *handle = file->private_data;
-+
-+ switch (handle->type) {
-+ case KDBUS_HANDLE_BUS_OWNER:
-+ if (handle->bus_owner) {
-+ kdbus_node_deactivate(&handle->bus_owner->node);
-+ kdbus_bus_unref(handle->bus_owner);
-+ }
-+ break;
-+ case KDBUS_HANDLE_EP_OWNER:
-+ if (handle->ep_owner) {
-+ kdbus_node_deactivate(&handle->ep_owner->node);
-+ kdbus_ep_unref(handle->ep_owner);
-+ }
-+ break;
-+ case KDBUS_HANDLE_CONNECTED:
-+ kdbus_conn_disconnect(handle->conn, false);
-+ kdbus_conn_unref(handle->conn);
-+ break;
-+ case KDBUS_HANDLE_NONE:
-+ /* nothing to clean up */
-+ break;
-+ }
-+
-+ kfree(handle);
-+
-+ return 0;
-+}
-+
-+static long kdbus_handle_ioctl_control(struct file *file, unsigned int cmd,
-+ void __user *argp)
-+{
-+ struct kdbus_handle *handle = file->private_data;
-+ struct kdbus_node *node = file_inode(file)->i_private;
-+ struct kdbus_domain *domain;
-+ int ret = 0;
-+
-+ if (!kdbus_node_acquire(node))
-+ return -ESHUTDOWN;
-+
-+ /*
-+ * The parent of control-nodes is always a domain, make sure to pin it
-+ * so the parent is actually valid.
-+ */
-+ domain = kdbus_domain_from_node(node->parent);
-+ if (!kdbus_node_acquire(&domain->node)) {
-+ kdbus_node_release(node);
-+ return -ESHUTDOWN;
-+ }
-+
-+ switch (cmd) {
-+ case KDBUS_CMD_BUS_MAKE: {
-+ struct kdbus_bus *bus;
-+
-+ bus = kdbus_cmd_bus_make(domain, argp);
-+ if (IS_ERR_OR_NULL(bus)) {
-+ ret = PTR_ERR_OR_ZERO(bus);
-+ break;
-+ }
-+
-+ handle->bus_owner = bus;
-+ ret = KDBUS_HANDLE_BUS_OWNER;
-+ break;
-+ }
-+
-+ default:
-+ ret = -EBADFD;
-+ break;
-+ }
-+
-+ kdbus_node_release(&domain->node);
-+ kdbus_node_release(node);
-+ return ret;
-+}
-+
-+static long kdbus_handle_ioctl_ep(struct file *file, unsigned int cmd,
-+ void __user *buf)
-+{
-+ struct kdbus_handle *handle = file->private_data;
-+ struct kdbus_node *node = file_inode(file)->i_private;
-+ struct kdbus_ep *ep, *file_ep = kdbus_ep_from_node(node);
-+ struct kdbus_bus *bus = file_ep->bus;
-+ struct kdbus_conn *conn;
-+ int ret = 0;
-+
-+ if (!kdbus_node_acquire(node))
-+ return -ESHUTDOWN;
-+
-+ switch (cmd) {
-+ case KDBUS_CMD_ENDPOINT_MAKE: {
-+ /* creating custom endpoints is a privileged operation */
-+ if (!kdbus_ep_is_owner(file_ep, file)) {
-+ ret = -EPERM;
-+ break;
-+ }
-+
-+ ep = kdbus_cmd_ep_make(bus, buf);
-+ if (IS_ERR_OR_NULL(ep)) {
-+ ret = PTR_ERR_OR_ZERO(ep);
-+ break;
-+ }
-+
-+ handle->ep_owner = ep;
-+ ret = KDBUS_HANDLE_EP_OWNER;
-+ break;
-+ }
-+
-+ case KDBUS_CMD_HELLO:
-+ conn = kdbus_cmd_hello(file_ep, file, buf);
-+ if (IS_ERR_OR_NULL(conn)) {
-+ ret = PTR_ERR_OR_ZERO(conn);
-+ break;
-+ }
-+
-+ handle->conn = conn;
-+ ret = KDBUS_HANDLE_CONNECTED;
-+ break;
-+
-+ default:
-+ ret = -EBADFD;
-+ break;
-+ }
-+
-+ kdbus_node_release(node);
-+ return ret;
-+}
-+
-+static long kdbus_handle_ioctl_ep_owner(struct file *file, unsigned int command,
-+ void __user *buf)
-+{
-+ struct kdbus_handle *handle = file->private_data;
-+ struct kdbus_ep *ep = handle->ep_owner;
-+ int ret;
-+
-+ if (!kdbus_node_acquire(&ep->node))
-+ return -ESHUTDOWN;
-+
-+ switch (command) {
-+ case KDBUS_CMD_ENDPOINT_UPDATE:
-+ ret = kdbus_cmd_ep_update(ep, buf);
-+ break;
-+ default:
-+ ret = -EBADFD;
-+ break;
-+ }
-+
-+ kdbus_node_release(&ep->node);
-+ return ret;
-+}
-+
-+static long kdbus_handle_ioctl_connected(struct file *file,
-+ unsigned int command, void __user *buf)
-+{
-+ struct kdbus_handle *handle = file->private_data;
-+ struct kdbus_conn *conn = handle->conn;
-+ struct kdbus_conn *release_conn = NULL;
-+ int ret;
-+
-+ release_conn = conn;
-+ ret = kdbus_conn_acquire(release_conn);
-+ if (ret < 0)
-+ return ret;
-+
-+ switch (command) {
-+ case KDBUS_CMD_BYEBYE:
-+ /*
-+ * BYEBYE is special; we must not acquire a connection when
-+ * calling into kdbus_conn_disconnect() or we will deadlock,
-+ * because kdbus_conn_disconnect() will wait for all acquired
-+ * references to be dropped.
-+ */
-+ kdbus_conn_release(release_conn);
-+ release_conn = NULL;
-+ ret = kdbus_cmd_byebye_unlocked(conn, buf);
-+ break;
-+ case KDBUS_CMD_NAME_ACQUIRE:
-+ ret = kdbus_cmd_name_acquire(conn, buf);
-+ break;
-+ case KDBUS_CMD_NAME_RELEASE:
-+ ret = kdbus_cmd_name_release(conn, buf);
-+ break;
-+ case KDBUS_CMD_LIST:
-+ ret = kdbus_cmd_list(conn, buf);
-+ break;
-+ case KDBUS_CMD_CONN_INFO:
-+ ret = kdbus_cmd_conn_info(conn, buf);
-+ break;
-+ case KDBUS_CMD_BUS_CREATOR_INFO:
-+ ret = kdbus_cmd_bus_creator_info(conn, buf);
-+ break;
-+ case KDBUS_CMD_UPDATE:
-+ ret = kdbus_cmd_update(conn, buf);
-+ break;
-+ case KDBUS_CMD_MATCH_ADD:
-+ ret = kdbus_cmd_match_add(conn, buf);
-+ break;
-+ case KDBUS_CMD_MATCH_REMOVE:
-+ ret = kdbus_cmd_match_remove(conn, buf);
-+ break;
-+ case KDBUS_CMD_SEND:
-+ ret = kdbus_cmd_send(conn, file, buf);
-+ break;
-+ case KDBUS_CMD_RECV:
-+ ret = kdbus_cmd_recv(conn, buf);
-+ break;
-+ case KDBUS_CMD_FREE:
-+ ret = kdbus_cmd_free(conn, buf);
-+ break;
-+ default:
-+ ret = -EBADFD;
-+ break;
-+ }
-+
-+ kdbus_conn_release(release_conn);
-+ return ret;
-+}
-+
-+static long kdbus_handle_ioctl(struct file *file, unsigned int cmd,
-+ unsigned long arg)
-+{
-+ struct kdbus_handle *handle = file->private_data;
-+ struct kdbus_node *node = kdbus_node_from_inode(file_inode(file));
-+ void __user *argp = (void __user *)arg;
-+ long ret = -EBADFD;
-+
-+ switch (cmd) {
-+ case KDBUS_CMD_BUS_MAKE:
-+ case KDBUS_CMD_ENDPOINT_MAKE:
-+ case KDBUS_CMD_HELLO:
-+ mutex_lock(&handle->lock);
-+ if (handle->type == KDBUS_HANDLE_NONE) {
-+ if (node->type == KDBUS_NODE_CONTROL)
-+ ret = kdbus_handle_ioctl_control(file, cmd,
-+ argp);
-+ else if (node->type == KDBUS_NODE_ENDPOINT)
-+ ret = kdbus_handle_ioctl_ep(file, cmd, argp);
-+
-+ if (ret > 0) {
-+ /*
-+ * The data given via open() is not sufficient
-+ * to set up a kdbus handle. Hence, we require
-+ * the user to perform a setup ioctl. This setup
-+ * can only be performed once and defines the
-+ * type of the handle. The different setup
-+ * ioctls are locked against each other so they
-+ * cannot race. Once the handle type is set,
-+ * the type-dependent ioctls are enabled. To
-+ * improve performance, we don't lock those via
-+ * handle->lock. Instead, we issue a
-+ * write-barrier before performing the
-+ * type-change, which pairs with smp_rmb() in
-+ * all handlers that access the type field. This
-+ * guarantees the handle is fully set up, if
-+ * handle->type is set. If handle->type is
-+ * unset, you must not make any assumptions
-+ * without taking handle->lock.
-+ * Note that handle->type is only set once. It
-+ * will never change afterwards.
-+ */
-+ smp_wmb();
-+ handle->type = ret;
-+ }
-+ }
-+ mutex_unlock(&handle->lock);
-+ break;
-+
-+ case KDBUS_CMD_ENDPOINT_UPDATE:
-+ case KDBUS_CMD_BYEBYE:
-+ case KDBUS_CMD_NAME_ACQUIRE:
-+ case KDBUS_CMD_NAME_RELEASE:
-+ case KDBUS_CMD_LIST:
-+ case KDBUS_CMD_CONN_INFO:
-+ case KDBUS_CMD_BUS_CREATOR_INFO:
-+ case KDBUS_CMD_UPDATE:
-+ case KDBUS_CMD_MATCH_ADD:
-+ case KDBUS_CMD_MATCH_REMOVE:
-+ case KDBUS_CMD_SEND:
-+ case KDBUS_CMD_RECV:
-+ case KDBUS_CMD_FREE: {
-+ enum kdbus_handle_type type;
-+
-+ /*
-+ * This read-barrier pairs with smp_wmb() of the handle setup.
-+ * It guarantees the handle is fully written, in case the
-+ * type has been set. It allows us to access the handle without
-+ * taking handle->lock, given the guarantee that the type is
-+ * only ever set once, and stays constant afterwards.
-+ * Furthermore, the handle object itself is not modified in any
-+ * way after the type is set. That is, the type-field is the
-+ * last field that is written on any handle. If it has not been
-+ * set, we must not access the handle here.
-+ */
-+ type = handle->type;
-+ smp_rmb();
-+
-+ if (type == KDBUS_HANDLE_EP_OWNER)
-+ ret = kdbus_handle_ioctl_ep_owner(file, cmd, argp);
-+ else if (type == KDBUS_HANDLE_CONNECTED)
-+ ret = kdbus_handle_ioctl_connected(file, cmd, argp);
-+
-+ break;
-+ }
-+ default:
-+ ret = -ENOTTY;
-+ break;
-+ }
-+
-+ return ret < 0 ? ret : 0;
-+}
-+
-+static unsigned int kdbus_handle_poll(struct file *file,
-+ struct poll_table_struct *wait)
-+{
-+ struct kdbus_handle *handle = file->private_data;
-+ enum kdbus_handle_type type;
-+ unsigned int mask = POLLOUT | POLLWRNORM;
-+
-+ /*
-+ * This pairs with smp_wmb() during handle setup. It guarantees that
-+ * _iff_ the handle type is set, handle->conn is valid. Furthermore,
-+ * _iff_ the type is set, the handle object is constant and never
-+ * changed again. If it's not set, we must not access the handle but
-+ * bail out. We also must assume no setup has taken place, yet.
-+ */
-+ type = handle->type;
-+ smp_rmb();
-+
-+ /* Only a connected endpoint can read/write data */
-+ if (type != KDBUS_HANDLE_CONNECTED)
-+ return POLLERR | POLLHUP;
-+
-+ poll_wait(file, &handle->conn->wait, wait);
-+
-+ /*
-+ * Verify the connection hasn't been deactivated _after_ adding the
-+ * wait-queue. This guarantees that if the connection is deactivated
-+ * after we checked it, the waitqueue is signaled and we're called
-+ * again.
-+ */
-+ if (!kdbus_conn_active(handle->conn))
-+ return POLLERR | POLLHUP;
-+
-+ if (!list_empty(&handle->conn->queue.msg_list) ||
-+ atomic_read(&handle->conn->lost_count) > 0)
-+ mask |= POLLIN | POLLRDNORM;
-+
-+ return mask;
-+}
-+
-+static int kdbus_handle_mmap(struct file *file, struct vm_area_struct *vma)
-+{
-+ struct kdbus_handle *handle = file->private_data;
-+ enum kdbus_handle_type type;
-+ int ret = -EBADFD;
-+
-+ /*
-+ * This pairs with smp_wmb() during handle setup. It guarantees that
-+ * _iff_ the handle type is set, handle->conn is valid. Furthermore,
-+ * _iff_ the type is set, the handle object is constant and never
-+ * changed again. If it's not set, we must not access the handle but
-+ * bail out. We also must assume no setup has taken place, yet.
-+ */
-+ type = handle->type;
-+ smp_rmb();
-+
-+ /* Only connected handles have a pool we can map */
-+ if (type == KDBUS_HANDLE_CONNECTED)
-+ ret = kdbus_pool_mmap(handle->conn->pool, vma);
-+
-+ return ret;
-+}
-+
-+const struct file_operations kdbus_handle_ops = {
-+ .owner = THIS_MODULE,
-+ .open = kdbus_handle_open,
-+ .release = kdbus_handle_release,
-+ .poll = kdbus_handle_poll,
-+ .llseek = noop_llseek,
-+ .unlocked_ioctl = kdbus_handle_ioctl,
-+ .mmap = kdbus_handle_mmap,
-+#ifdef CONFIG_COMPAT
-+ .compat_ioctl = kdbus_handle_ioctl,
-+#endif
-+};
-diff --git a/ipc/kdbus/handle.h b/ipc/kdbus/handle.h
-new file mode 100644
-index 0000000..5dde2c1
---- /dev/null
-+++ b/ipc/kdbus/handle.h
-@@ -0,0 +1,103 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_HANDLE_H
-+#define __KDBUS_HANDLE_H
-+
-+#include <linux/fs.h>
-+#include <uapi/linux/kdbus.h>
-+
-+extern const struct file_operations kdbus_handle_ops;
-+
-+/**
-+ * kdbus_arg - information and state of a single ioctl command item
-+ * @type: item type
-+ * @item: set by the parser to the first found item of this type
-+ * @multiple: whether multiple items of this type are allowed
-+ * @mandatory: whether at least one item of this type is required
-+ *
-+ * This structure describes a single item in an ioctl command payload. The
-+ * caller has to pre-fill the type and flags, the parser will then use this
-+ * information to verify the ioctl payload. @item is set by the parser to point
-+ * to the first occurrence of the item.
-+ */
-+struct kdbus_arg {
-+ u64 type;
-+ struct kdbus_item *item;
-+ bool multiple : 1;
-+ bool mandatory : 1;
-+};
-+
-+/**
-+ * kdbus_args - information and state of ioctl command parser
-+ * @allowed_flags: set of flags this command supports
-+ * @argc: number of items in @argv
-+ * @argv: array of items this command supports
-+ * @user: set by parser to user-space location of current command
-+ * @cmd: set by parser to kernel copy of command payload
-+ * @cmd_buf: inline buf to avoid kmalloc() on small cmds
-+ * @items: points to item array in @cmd
-+ * @items_size: size of @items in bytes
-+ * @is_cmd: whether this is a command-payload or msg-payload
-+ *
-+ * This structure is used to parse ioctl command payloads on each invocation.
-+ * The ioctl handler has to pre-fill the flags and allowed items before passing
-+ * the object to kdbus_args_parse(). The parser will copy the command payload
-+ * into kernel-space and verify the correctness of the data.
-+ *
-+ * We use a 256-byte buffer for small command payloads, to be allocated on
-+ * stack on syscall entrance.
-+ */
-+struct kdbus_args {
-+ u64 allowed_flags;
-+ size_t argc;
-+ struct kdbus_arg *argv;
-+
-+ struct kdbus_cmd __user *user;
-+ struct kdbus_cmd *cmd;
-+ u8 cmd_buf[256];
-+
-+ struct kdbus_item *items;
-+ size_t items_size;
-+ bool is_cmd : 1;
-+};
-+
-+int __kdbus_args_parse(struct kdbus_args *args, bool is_cmd, void __user *argp,
-+ size_t type_size, size_t items_offset, void **out);
-+int kdbus_args_clear(struct kdbus_args *args, int ret);
-+
-+#define kdbus_args_parse(_args, _argp, _v) \
-+ ({ \
-+ BUILD_BUG_ON(offsetof(typeof(**(_v)), size) != \
-+ offsetof(struct kdbus_cmd, size)); \
-+ BUILD_BUG_ON(offsetof(typeof(**(_v)), flags) != \
-+ offsetof(struct kdbus_cmd, flags)); \
-+ BUILD_BUG_ON(offsetof(typeof(**(_v)), return_flags) != \
-+ offsetof(struct kdbus_cmd, return_flags)); \
-+ __kdbus_args_parse((_args), 1, (_argp), sizeof(**(_v)), \
-+ offsetof(typeof(**(_v)), items), \
-+ (void **)(_v)); \
-+ })
-+
-+#define kdbus_args_parse_msg(_args, _argp, _v) \
-+ ({ \
-+ BUILD_BUG_ON(offsetof(typeof(**(_v)), size) != \
-+ offsetof(struct kdbus_cmd, size)); \
-+ BUILD_BUG_ON(offsetof(typeof(**(_v)), flags) != \
-+ offsetof(struct kdbus_cmd, flags)); \
-+ __kdbus_args_parse((_args), 0, (_argp), sizeof(**(_v)), \
-+ offsetof(typeof(**(_v)), items), \
-+ (void **)(_v)); \
-+ })
-+
-+#endif
-diff --git a/ipc/kdbus/item.c b/ipc/kdbus/item.c
-new file mode 100644
-index 0000000..ce78dba
---- /dev/null
-+++ b/ipc/kdbus/item.c
-@@ -0,0 +1,293 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/ctype.h>
-+#include <linux/fs.h>
-+#include <linux/string.h>
-+
-+#include "item.h"
-+#include "limits.h"
-+#include "util.h"
-+
-+/*
-+ * This verifies the string at position @str with size @size is properly
-+ * zero-terminated and contains no 0-byte except at the end.
-+ */
-+static bool kdbus_str_valid(const char *str, size_t size)
-+{
-+ return size > 0 && memchr(str, '\0', size) == str + size - 1;
-+}
-+
-+/**
-+ * kdbus_item_validate_name() - validate an item containing a name
-+ * @item: Item to validate
-+ *
-+ * Return: zero on success or a negative error code on failure
-+ */
-+int kdbus_item_validate_name(const struct kdbus_item *item)
-+{
-+ const char *name = item->str;
-+ unsigned int i;
-+ size_t len;
-+
-+ if (item->size < KDBUS_ITEM_HEADER_SIZE + 2)
-+ return -EINVAL;
-+
-+ if (item->size > KDBUS_ITEM_HEADER_SIZE +
-+ KDBUS_SYSNAME_MAX_LEN + 1)
-+ return -ENAMETOOLONG;
-+
-+ if (!kdbus_str_valid(name, KDBUS_ITEM_PAYLOAD_SIZE(item)))
-+ return -EINVAL;
-+
-+ len = strlen(name);
-+ if (len == 0)
-+ return -EINVAL;
-+
-+ for (i = 0; i < len; i++) {
-+ if (isalpha(name[i]))
-+ continue;
-+ if (isdigit(name[i]))
-+ continue;
-+ if (name[i] == '_')
-+ continue;
-+ if (i > 0 && i + 1 < len && (name[i] == '-' || name[i] == '.'))
-+ continue;
-+
-+ return -EINVAL;
-+ }
-+
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_item_validate() - validate a single item
-+ * @item: item to validate
-+ *
-+ * Return: 0 if item is valid, negative error code if not.
-+ */
-+int kdbus_item_validate(const struct kdbus_item *item)
-+{
-+ size_t payload_size = KDBUS_ITEM_PAYLOAD_SIZE(item);
-+ size_t l;
-+ int ret;
-+
-+ BUILD_BUG_ON(KDBUS_ITEM_HEADER_SIZE !=
-+ sizeof(struct kdbus_item_header));
-+
-+ if (item->size < KDBUS_ITEM_HEADER_SIZE)
-+ return -EINVAL;
-+
-+ switch (item->type) {
-+ case KDBUS_ITEM_NEGOTIATE:
-+ if (payload_size % sizeof(u64) != 0)
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_PAYLOAD_VEC:
-+ case KDBUS_ITEM_PAYLOAD_OFF:
-+ if (payload_size != sizeof(struct kdbus_vec))
-+ return -EINVAL;
-+ if (item->vec.size == 0 || item->vec.size > SIZE_MAX)
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_PAYLOAD_MEMFD:
-+ if (payload_size != sizeof(struct kdbus_memfd))
-+ return -EINVAL;
-+ if (item->memfd.size == 0 || item->memfd.size > SIZE_MAX)
-+ return -EINVAL;
-+ if (item->memfd.fd < 0)
-+ return -EBADF;
-+ break;
-+
-+ case KDBUS_ITEM_FDS:
-+ if (payload_size % sizeof(int) != 0)
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_CANCEL_FD:
-+ if (payload_size != sizeof(int))
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_BLOOM_PARAMETER:
-+ if (payload_size != sizeof(struct kdbus_bloom_parameter))
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_BLOOM_FILTER:
-+ /* followed by the bloom-mask, depends on the bloom-size */
-+ if (payload_size < sizeof(struct kdbus_bloom_filter))
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_BLOOM_MASK:
-+ /* size depends on bloom-size of bus */
-+ break;
-+
-+ case KDBUS_ITEM_CONN_DESCRIPTION:
-+ case KDBUS_ITEM_MAKE_NAME:
-+ ret = kdbus_item_validate_name(item);
-+ if (ret < 0)
-+ return ret;
-+ break;
-+
-+ case KDBUS_ITEM_ATTACH_FLAGS_SEND:
-+ case KDBUS_ITEM_ATTACH_FLAGS_RECV:
-+ case KDBUS_ITEM_ID:
-+ case KDBUS_ITEM_DST_ID:
-+ if (payload_size != sizeof(u64))
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_TIMESTAMP:
-+ if (payload_size != sizeof(struct kdbus_timestamp))
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_CREDS:
-+ if (payload_size != sizeof(struct kdbus_creds))
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_AUXGROUPS:
-+ if (payload_size % sizeof(u32) != 0)
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_NAME:
-+ case KDBUS_ITEM_DST_NAME:
-+ case KDBUS_ITEM_PID_COMM:
-+ case KDBUS_ITEM_TID_COMM:
-+ case KDBUS_ITEM_EXE:
-+ case KDBUS_ITEM_CMDLINE:
-+ case KDBUS_ITEM_CGROUP:
-+ case KDBUS_ITEM_SECLABEL:
-+ if (!kdbus_str_valid(item->str, payload_size))
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_CAPS:
-+ if (payload_size < sizeof(u32))
-+ return -EINVAL;
-+ if (payload_size < sizeof(u32) +
-+ 4 * CAP_TO_INDEX(item->caps.last_cap) * sizeof(u32))
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_AUDIT:
-+ if (payload_size != sizeof(struct kdbus_audit))
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_POLICY_ACCESS:
-+ if (payload_size != sizeof(struct kdbus_policy_access))
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_NAME_ADD:
-+ case KDBUS_ITEM_NAME_REMOVE:
-+ case KDBUS_ITEM_NAME_CHANGE:
-+ if (payload_size < sizeof(struct kdbus_notify_name_change))
-+ return -EINVAL;
-+ l = payload_size - offsetof(struct kdbus_notify_name_change,
-+ name);
-+ if (l > 0 && !kdbus_str_valid(item->name_change.name, l))
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_ID_ADD:
-+ case KDBUS_ITEM_ID_REMOVE:
-+ if (payload_size != sizeof(struct kdbus_notify_id_change))
-+ return -EINVAL;
-+ break;
-+
-+ case KDBUS_ITEM_REPLY_TIMEOUT:
-+ case KDBUS_ITEM_REPLY_DEAD:
-+ if (payload_size != 0)
-+ return -EINVAL;
-+ break;
-+
-+ default:
-+ break;
-+ }
-+
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_items_validate() - validate items passed by user-space
-+ * @items: items to validate
-+ * @items_size: size of @items in bytes
-+ *
-+ * This verifies that the passed items pointer is consistent and valid.
-+ * Furthermore, each item is checked for:
-+ * - valid "size" value
-+ * - payload is of expected type
-+ * - payload is fully included in the item
-+ * - string payloads are zero-terminated
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_items_validate(const struct kdbus_item *items, size_t items_size)
-+{
-+ const struct kdbus_item *item;
-+ int ret;
-+
-+ KDBUS_ITEMS_FOREACH(item, items, items_size) {
-+ if (!KDBUS_ITEM_VALID(item, items, items_size))
-+ return -EINVAL;
-+
-+ ret = kdbus_item_validate(item);
-+ if (ret < 0)
-+ return ret;
-+ }
-+
-+ if (!KDBUS_ITEMS_END(item, items, items_size))
-+ return -EINVAL;
-+
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_item_set() - Set item content
-+ * @item: The item to modify
-+ * @type: The item type to set (KDBUS_ITEM_*)
-+ * @data: Data to copy to item->data, may be %NULL
-+ * @len: Number of bytes in @data
-+ *
-+ * This sets type, size and data fields of an item. If @data is NULL, the data
-+ * memory is cleared.
-+ *
-+ * Note that you must align your @data memory to 8 bytes. Trailing padding (in
-+ * case @len is not 8-byte aligned) is cleared by this call.
-+ *
-+ * Returns: Pointer to the following item.
-+ */
-+struct kdbus_item *kdbus_item_set(struct kdbus_item *item, u64 type,
-+ const void *data, size_t len)
-+{
-+ item->type = type;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + len;
-+
-+ if (data) {
-+ memcpy(item->data, data, len);
-+ memset(item->data + len, 0, KDBUS_ALIGN8(len) - len);
-+ } else {
-+ memset(item->data, 0, KDBUS_ALIGN8(len));
-+ }
-+
-+ return KDBUS_ITEM_NEXT(item);
-+}
-diff --git a/ipc/kdbus/item.h b/ipc/kdbus/item.h
-new file mode 100644
-index 0000000..3a7e6cc
---- /dev/null
-+++ b/ipc/kdbus/item.h
-@@ -0,0 +1,61 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_ITEM_H
-+#define __KDBUS_ITEM_H
-+
-+#include <linux/kernel.h>
-+#include <uapi/linux/kdbus.h>
-+
-+#include "util.h"
-+
-+/* generic access and iterators over a stream of items */
-+#define KDBUS_ITEM_NEXT(_i) (typeof(_i))((u8 *)(_i) + KDBUS_ALIGN8((_i)->size))
-+#define KDBUS_ITEMS_SIZE(_h, _is) ((_h)->size - offsetof(typeof(*(_h)), _is))
-+#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
-+#define KDBUS_ITEM_SIZE(_s) KDBUS_ALIGN8(KDBUS_ITEM_HEADER_SIZE + (_s))
-+#define KDBUS_ITEM_PAYLOAD_SIZE(_i) ((_i)->size - KDBUS_ITEM_HEADER_SIZE)
-+
-+#define KDBUS_ITEMS_FOREACH(_i, _is, _s) \
-+ for ((_i) = (_is); \
-+ ((u8 *)(_i) < (u8 *)(_is) + (_s)) && \
-+ ((u8 *)(_i) >= (u8 *)(_is)); \
-+ (_i) = KDBUS_ITEM_NEXT(_i))
-+
-+#define KDBUS_ITEM_VALID(_i, _is, _s) \
-+ ((_i)->size >= KDBUS_ITEM_HEADER_SIZE && \
-+ (u8 *)(_i) + (_i)->size > (u8 *)(_i) && \
-+ (u8 *)(_i) + (_i)->size <= (u8 *)(_is) + (_s) && \
-+ (u8 *)(_i) >= (u8 *)(_is))
-+
-+#define KDBUS_ITEMS_END(_i, _is, _s) \
-+ ((u8 *)(_i) == ((u8 *)(_is) + KDBUS_ALIGN8(_s)))
-+
-+/**
-+ * struct kdbus_item_header - Describes the fixed part of an item
-+ * @size: The total size of the item
-+ * @type: The item type, one of KDBUS_ITEM_*
-+ */
-+struct kdbus_item_header {
-+ u64 size;
-+ u64 type;
-+};
-+
-+int kdbus_item_validate_name(const struct kdbus_item *item);
-+int kdbus_item_validate(const struct kdbus_item *item);
-+int kdbus_items_validate(const struct kdbus_item *items, size_t items_size);
-+struct kdbus_item *kdbus_item_set(struct kdbus_item *item, u64 type,
-+ const void *data, size_t len);
-+
-+#endif
-diff --git a/ipc/kdbus/limits.h b/ipc/kdbus/limits.h
-new file mode 100644
-index 0000000..c54925a
---- /dev/null
-+++ b/ipc/kdbus/limits.h
-@@ -0,0 +1,61 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_DEFAULTS_H
-+#define __KDBUS_DEFAULTS_H
-+
-+#include <linux/kernel.h>
-+
-+/* maximum size of message header and items */
-+#define KDBUS_MSG_MAX_SIZE SZ_8K
-+
-+/* maximum number of memfd items per message */
-+#define KDBUS_MSG_MAX_MEMFD_ITEMS 16
-+
-+/* max size of ioctl command data */
-+#define KDBUS_CMD_MAX_SIZE SZ_32K
-+
-+/* maximum number of inflight fds in a target queue per user */
-+#define KDBUS_CONN_MAX_FDS_PER_USER 16
-+
-+/* maximum message payload size */
-+#define KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE SZ_2M
-+
-+/* maximum size of bloom bit field in bytes */
-+#define KDBUS_BUS_BLOOM_MAX_SIZE SZ_4K
-+
-+/* maximum length of well-known bus name */
-+#define KDBUS_NAME_MAX_LEN 255
-+
-+/* maximum length of bus, domain, ep name */
-+#define KDBUS_SYSNAME_MAX_LEN 63
-+
-+/* maximum number of matches per connection */
-+#define KDBUS_MATCH_MAX 256
-+
-+/* maximum number of queued messages from the same individual user */
-+#define KDBUS_CONN_MAX_MSGS 256
-+
-+/* maximum number of well-known names per connection */
-+#define KDBUS_CONN_MAX_NAMES 256
-+
-+/* maximum number of queued requests waiting for a reply */
-+#define KDBUS_CONN_MAX_REQUESTS_PENDING 128
-+
-+/* maximum number of connections per user in one domain */
-+#define KDBUS_USER_MAX_CONN 1024
-+
-+/* maximum number of buses per user in one domain */
-+#define KDBUS_USER_MAX_BUSES 16
-+
-+#endif
-diff --git a/ipc/kdbus/main.c b/ipc/kdbus/main.c
-new file mode 100644
-index 0000000..1ad4dc8
---- /dev/null
-+++ b/ipc/kdbus/main.c
-@@ -0,0 +1,114 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-+#include <linux/fs.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+
-+#include "util.h"
-+#include "fs.h"
-+#include "handle.h"
-+#include "metadata.h"
-+#include "node.h"
-+
-+/*
-+ * This is a simplified outline of the internal kdbus object relations, for
-+ * those interested in the inner life of the driver implementation.
-+ *
-+ * From a mount point's (domain's) perspective:
-+ *
-+ * struct kdbus_domain
-+ * |» struct kdbus_user *user (many, owned)
-+ * '» struct kdbus_node node (embedded)
-+ * |» struct kdbus_node children (many, referenced)
-+ * |» struct kdbus_node *parent (pinned)
-+ * '» struct kdbus_bus (many, pinned)
-+ * |» struct kdbus_node node (embedded)
-+ * '» struct kdbus_ep (many, pinned)
-+ * |» struct kdbus_node node (embedded)
-+ * |» struct kdbus_bus *bus (pinned)
-+ * |» struct kdbus_conn conn_list (many, pinned)
-+ * | |» struct kdbus_ep *ep (pinned)
-+ * | |» struct kdbus_name_entry *activator_of (owned)
-+ * | |» struct kdbus_match_db *match_db (owned)
-+ * | |» struct kdbus_meta *meta (owned)
-+ * | |» struct kdbus_match_db *match_db (owned)
-+ * | | '» struct kdbus_match_entry (many, owned)
-+ * | |
-+ * | |» struct kdbus_pool *pool (owned)
-+ * | | '» struct kdbus_pool_slice *slices (many, owned)
-+ * | | '» struct kdbus_pool *pool (pinned)
-+ * | |
-+ * | |» struct kdbus_user *user (pinned)
-+ * | `» struct kdbus_queue_entry entries (many, embedded)
-+ * | |» struct kdbus_pool_slice *slice (pinned)
-+ * | |» struct kdbus_conn_reply *reply (owned)
-+ * | '» struct kdbus_user *user (pinned)
-+ * |
-+ * '» struct kdbus_user *user (pinned)
-+ * '» struct kdbus_policy_db policy_db (embedded)
-+ * |» struct kdbus_policy_db_entry (many, owned)
-+ * | |» struct kdbus_conn (pinned)
-+ * | '» struct kdbus_ep (pinned)
-+ * |
-+ * '» struct kdbus_policy_db_cache_entry (many, owned)
-+ * '» struct kdbus_conn (pinned)
-+ *
-+ * For the life-time of a file descriptor derived from calling open() on a file
-+ * inside the mount point:
-+ *
-+ * struct kdbus_handle
-+ * |» struct kdbus_meta *meta (owned)
-+ * |» struct kdbus_ep *ep (pinned)
-+ * |» struct kdbus_conn *conn (owned)
-+ * '» struct kdbus_ep *ep (owned)
-+ */
-+
-+/* kdbus mount-point /sys/fs/kdbus */
-+static struct kobject *kdbus_dir;
-+
-+static int __init kdbus_init(void)
-+{
-+ int ret;
-+
-+ kdbus_dir = kobject_create_and_add(KBUILD_MODNAME, fs_kobj);
-+ if (!kdbus_dir)
-+ return -ENOMEM;
-+
-+ ret = kdbus_fs_init();
-+ if (ret < 0) {
-+ pr_err("cannot register filesystem: %d\n", ret);
-+ goto exit_dir;
-+ }
-+
-+ pr_info("initialized\n");
-+ return 0;
-+
-+exit_dir:
-+ kobject_put(kdbus_dir);
-+ return ret;
-+}
-+
-+static void __exit kdbus_exit(void)
-+{
-+ kdbus_fs_exit();
-+ kobject_put(kdbus_dir);
-+ ida_destroy(&kdbus_node_ida);
-+}
-+
-+module_init(kdbus_init);
-+module_exit(kdbus_exit);
-+MODULE_LICENSE("GPL");
-+MODULE_DESCRIPTION("D-Bus, powerful, easy to use interprocess communication");
-+MODULE_ALIAS_FS(KBUILD_MODNAME "fs");
-diff --git a/ipc/kdbus/match.c b/ipc/kdbus/match.c
-new file mode 100644
-index 0000000..4ee6a1f
---- /dev/null
-+++ b/ipc/kdbus/match.c
-@@ -0,0 +1,546 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/fs.h>
-+#include <linux/hash.h>
-+#include <linux/init.h>
-+#include <linux/mutex.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "endpoint.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "match.h"
-+#include "message.h"
-+#include "names.h"
-+
-+/**
-+ * struct kdbus_match_db - message filters
-+ * @entries_list: List of matches
-+ * @mdb_rwlock: Match data lock
-+ * @entries_count: Number of entries in database
-+ */
-+struct kdbus_match_db {
-+ struct list_head entries_list;
-+ struct rw_semaphore mdb_rwlock;
-+ unsigned int entries_count;
-+};
-+
-+/**
-+ * struct kdbus_match_entry - a match database entry
-+ * @cookie: User-supplied cookie to lookup the entry
-+ * @list_entry: The list entry element for the db list
-+ * @rules_list: The list head for tracking rules of this entry
-+ */
-+struct kdbus_match_entry {
-+ u64 cookie;
-+ struct list_head list_entry;
-+ struct list_head rules_list;
-+};
-+
-+/**
-+ * struct kdbus_bloom_mask - mask to match against filter
-+ * @generations: Number of generations carried
-+ * @data: Array of bloom bit fields
-+ */
-+struct kdbus_bloom_mask {
-+ u64 generations;
-+ u64 *data;
-+};
-+
-+/**
-+ * struct kdbus_match_rule - a rule appended to a match entry
-+ * @type: An item type to match against
-+ * @bloom_mask: Bloom mask to match a message's filter against, used
-+ * with KDBUS_ITEM_BLOOM_MASK
-+ * @name: Name to match against, used with KDBUS_ITEM_NAME,
-+ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE}
-+ * @old_id: ID to match against, used with
-+ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
-+ * KDBUS_ITEM_ID_REMOVE
-+ * @new_id: ID to match against, used with
-+ * KDBUS_ITEM_NAME_{ADD,REMOVE,CHANGE},
-+ * KDBUS_ITEM_ID_ADD
-+ * @src_id: ID to match against, used with KDBUS_ITEM_ID
-+ * @dst_id: Message destination ID, used with KDBUS_ITEM_DST_ID
-+ * @rules_entry: Entry in the entry's rules list
-+ */
-+struct kdbus_match_rule {
-+ u64 type;
-+ union {
-+ struct kdbus_bloom_mask bloom_mask;
-+ struct {
-+ char *name;
-+ u64 old_id;
-+ u64 new_id;
-+ };
-+ u64 src_id;
-+ u64 dst_id;
-+ };
-+ struct list_head rules_entry;
-+};
-+
-+static void kdbus_match_rule_free(struct kdbus_match_rule *rule)
-+{
-+ if (!rule)
-+ return;
-+
-+ switch (rule->type) {
-+ case KDBUS_ITEM_BLOOM_MASK:
-+ kfree(rule->bloom_mask.data);
-+ break;
-+
-+ case KDBUS_ITEM_NAME:
-+ case KDBUS_ITEM_NAME_ADD:
-+ case KDBUS_ITEM_NAME_REMOVE:
-+ case KDBUS_ITEM_NAME_CHANGE:
-+ kfree(rule->name);
-+ break;
-+
-+ case KDBUS_ITEM_ID:
-+ case KDBUS_ITEM_DST_ID:
-+ case KDBUS_ITEM_ID_ADD:
-+ case KDBUS_ITEM_ID_REMOVE:
-+ break;
-+
-+ default:
-+ BUG();
-+ }
-+
-+ list_del(&rule->rules_entry);
-+ kfree(rule);
-+}
-+
-+static void kdbus_match_entry_free(struct kdbus_match_entry *entry)
-+{
-+ struct kdbus_match_rule *r, *tmp;
-+
-+ if (!entry)
-+ return;
-+
-+ list_for_each_entry_safe(r, tmp, &entry->rules_list, rules_entry)
-+ kdbus_match_rule_free(r);
-+
-+ list_del(&entry->list_entry);
-+ kfree(entry);
-+}
-+
-+/**
-+ * kdbus_match_db_free() - free match db resources
-+ * @mdb: The match database
-+ */
-+void kdbus_match_db_free(struct kdbus_match_db *mdb)
-+{
-+ struct kdbus_match_entry *entry, *tmp;
-+
-+ if (!mdb)
-+ return;
-+
-+ list_for_each_entry_safe(entry, tmp, &mdb->entries_list, list_entry)
-+ kdbus_match_entry_free(entry);
-+
-+ kfree(mdb);
-+}
-+
-+/**
-+ * kdbus_match_db_new() - create a new match database
-+ *
-+ * Return: a new kdbus_match_db on success, ERR_PTR on failure.
-+ */
-+struct kdbus_match_db *kdbus_match_db_new(void)
-+{
-+ struct kdbus_match_db *d;
-+
-+ d = kzalloc(sizeof(*d), GFP_KERNEL);
-+ if (!d)
-+ return ERR_PTR(-ENOMEM);
-+
-+ init_rwsem(&d->mdb_rwlock);
-+ INIT_LIST_HEAD(&d->entries_list);
-+
-+ return d;
-+}
-+
-+static bool kdbus_match_bloom(const struct kdbus_bloom_filter *filter,
-+ const struct kdbus_bloom_mask *mask,
-+ const struct kdbus_conn *conn)
-+{
-+ size_t n = conn->ep->bus->bloom.size / sizeof(u64);
-+ const u64 *m;
-+ size_t i;
-+
-+ /*
-+ * The message's filter carries a generation identifier, the
-+ * match's mask possibly carries an array of multiple generations
-+ * of the mask. Select the mask with the closest match of the
-+ * filter's generation.
-+ */
-+ m = mask->data + (min(filter->generation, mask->generations - 1) * n);
-+
-+ /*
-+ * The message's filter contains the message's properties,
-+ * the match's mask contains the properties to look for in the
-+ * message. Check the mask bit field against the filter bit field,
-+ * if the message possibly carries the properties the connection
-+ * has subscribed to.
-+ */
-+ for (i = 0; i < n; i++)
-+ if ((filter->data[i] & m[i]) != m[i])
-+ return false;
-+
-+ return true;
-+}
-+
-+static bool kdbus_match_rule_conn(const struct kdbus_match_rule *r,
-+ struct kdbus_conn *c,
-+ const struct kdbus_staging *s)
-+{
-+ lockdep_assert_held(&c->ep->bus->name_registry->rwlock);
-+
-+ switch (r->type) {
-+ case KDBUS_ITEM_BLOOM_MASK:
-+ return kdbus_match_bloom(s->bloom_filter, &r->bloom_mask, c);
-+ case KDBUS_ITEM_ID:
-+ return r->src_id == c->id || r->src_id == KDBUS_MATCH_ID_ANY;
-+ case KDBUS_ITEM_DST_ID:
-+ return r->dst_id == s->msg->dst_id ||
-+ r->dst_id == KDBUS_MATCH_ID_ANY;
-+ case KDBUS_ITEM_NAME:
-+ return kdbus_conn_has_name(c, r->name);
-+ default:
-+ return false;
-+ }
-+}
-+
-+static bool kdbus_match_rule_kernel(const struct kdbus_match_rule *r,
-+ const struct kdbus_staging *s)
-+{
-+ struct kdbus_item *n = s->notify;
-+
-+ if (WARN_ON(!n) || n->type != r->type)
-+ return false;
-+
-+ switch (r->type) {
-+ case KDBUS_ITEM_ID_ADD:
-+ return r->new_id == KDBUS_MATCH_ID_ANY ||
-+ r->new_id == n->id_change.id;
-+ case KDBUS_ITEM_ID_REMOVE:
-+ return r->old_id == KDBUS_MATCH_ID_ANY ||
-+ r->old_id == n->id_change.id;
-+ case KDBUS_ITEM_NAME_ADD:
-+ case KDBUS_ITEM_NAME_CHANGE:
-+ case KDBUS_ITEM_NAME_REMOVE:
-+ return (r->old_id == KDBUS_MATCH_ID_ANY ||
-+ r->old_id == n->name_change.old_id.id) &&
-+ (r->new_id == KDBUS_MATCH_ID_ANY ||
-+ r->new_id == n->name_change.new_id.id) &&
-+ (!r->name || !strcmp(r->name, n->name_change.name));
-+ default:
-+ return false;
-+ }
-+}
-+
-+static bool kdbus_match_rules(const struct kdbus_match_entry *entry,
-+ struct kdbus_conn *c,
-+ const struct kdbus_staging *s)
-+{
-+ struct kdbus_match_rule *r;
-+
-+ list_for_each_entry(r, &entry->rules_list, rules_entry)
-+ if ((c && !kdbus_match_rule_conn(r, c, s)) ||
-+ (!c && !kdbus_match_rule_kernel(r, s)))
-+ return false;
-+
-+ return true;
-+}
-+
-+/**
-+ * kdbus_match_db_match_msg() - match a msg object against the database entries
-+ * @mdb: The match database
-+ * @conn_src: The connection object originating the message
-+ * @staging: Staging object containing the message to match against
-+ *
-+ * This function will walk through all the database entries previously uploaded
-+ * with kdbus_cmd_match_add(). As soon as any of them has an all-satisfied rule
-+ * set, this function will return true.
-+ *
-+ * The caller must hold the registry lock of conn_src->ep->bus, in case conn_src
-+ * is non-NULL.
-+ *
-+ * Return: true if there was a matching database entry, false otherwise.
-+ */
-+bool kdbus_match_db_match_msg(struct kdbus_match_db *mdb,
-+ struct kdbus_conn *conn_src,
-+ const struct kdbus_staging *staging)
-+{
-+ struct kdbus_match_entry *entry;
-+ bool matched = false;
-+
-+ down_read(&mdb->mdb_rwlock);
-+ list_for_each_entry(entry, &mdb->entries_list, list_entry) {
-+ matched = kdbus_match_rules(entry, conn_src, staging);
-+ if (matched)
-+ break;
-+ }
-+ up_read(&mdb->mdb_rwlock);
-+
-+ return matched;
-+}
-+
-+static int kdbus_match_db_remove_unlocked(struct kdbus_match_db *mdb,
-+ u64 cookie)
-+{
-+ struct kdbus_match_entry *entry, *tmp;
-+ bool found = false;
-+
-+ list_for_each_entry_safe(entry, tmp, &mdb->entries_list, list_entry)
-+ if (entry->cookie == cookie) {
-+ kdbus_match_entry_free(entry);
-+ --mdb->entries_count;
-+ found = true;
-+ }
-+
-+ return found ? 0 : -EBADSLT;
-+}
-+
-+/**
-+ * kdbus_cmd_match_add() - handle KDBUS_CMD_MATCH_ADD
-+ * @conn: connection to operate on
-+ * @argp: command payload
-+ *
-+ * One call to this function (or one ioctl(KDBUS_CMD_MATCH_ADD), respectively)
-+ * adds one new database entry with n rules attached to it. Each rule is
-+ * described with a kdbus_item, and an entry is considered matching if all
-+ * its rules are satisfied.
-+ *
-+ * The items attached to a kdbus_cmd_match struct have the following mapping:
-+ *
-+ * KDBUS_ITEM_BLOOM_MASK: A bloom mask
-+ * KDBUS_ITEM_NAME: A connection's source name
-+ * KDBUS_ITEM_ID: A connection ID
-+ * KDBUS_ITEM_DST_ID: A connection ID
-+ * KDBUS_ITEM_NAME_ADD:
-+ * KDBUS_ITEM_NAME_REMOVE:
-+ * KDBUS_ITEM_NAME_CHANGE: Well-known name changes, carry
-+ * kdbus_notify_name_change
-+ * KDBUS_ITEM_ID_ADD:
-+ * KDBUS_ITEM_ID_REMOVE: Connection ID changes, carry
-+ * kdbus_notify_id_change
-+ *
-+ * For kdbus_notify_{id,name}_change structs, only the ID and name fields
-+ * are looked at when adding an entry. The flags are unused.
-+ *
-+ * Also note that KDBUS_ITEM_BLOOM_MASK, KDBUS_ITEM_NAME, KDBUS_ITEM_ID,
-+ * and KDBUS_ITEM_DST_ID are used to match messages from userspace, while the
-+ * others apply to kernel-generated notifications.
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_match_add(struct kdbus_conn *conn, void __user *argp)
-+{
-+ struct kdbus_match_db *mdb = conn->match_db;
-+ struct kdbus_match_entry *entry = NULL;
-+ struct kdbus_cmd_match *cmd;
-+ struct kdbus_item *item;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ { .type = KDBUS_ITEM_BLOOM_MASK, .multiple = true },
-+ { .type = KDBUS_ITEM_NAME, .multiple = true },
-+ { .type = KDBUS_ITEM_ID, .multiple = true },
-+ { .type = KDBUS_ITEM_DST_ID, .multiple = true },
-+ { .type = KDBUS_ITEM_NAME_ADD, .multiple = true },
-+ { .type = KDBUS_ITEM_NAME_REMOVE, .multiple = true },
-+ { .type = KDBUS_ITEM_NAME_CHANGE, .multiple = true },
-+ { .type = KDBUS_ITEM_ID_ADD, .multiple = true },
-+ { .type = KDBUS_ITEM_ID_REMOVE, .multiple = true },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+ KDBUS_MATCH_REPLACE,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ if (!kdbus_conn_is_ordinary(conn))
-+ return -EOPNOTSUPP;
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret != 0)
-+ return ret;
-+
-+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
-+ if (!entry) {
-+ ret = -ENOMEM;
-+ goto exit;
-+ }
-+
-+ entry->cookie = cmd->cookie;
-+ INIT_LIST_HEAD(&entry->list_entry);
-+ INIT_LIST_HEAD(&entry->rules_list);
-+
-+ KDBUS_ITEMS_FOREACH(item, cmd->items, KDBUS_ITEMS_SIZE(cmd, items)) {
-+ struct kdbus_match_rule *rule;
-+ size_t size = item->size - offsetof(struct kdbus_item, data);
-+
-+ rule = kzalloc(sizeof(*rule), GFP_KERNEL);
-+ if (!rule) {
-+ ret = -ENOMEM;
-+ goto exit;
-+ }
-+
-+ rule->type = item->type;
-+ INIT_LIST_HEAD(&rule->rules_entry);
-+
-+ switch (item->type) {
-+ case KDBUS_ITEM_BLOOM_MASK: {
-+ u64 bsize = conn->ep->bus->bloom.size;
-+ u64 generations;
-+ u64 remainder;
-+
-+ generations = div64_u64_rem(size, bsize, &remainder);
-+ if (size < bsize || remainder > 0) {
-+ ret = -EDOM;
-+ break;
-+ }
-+
-+ rule->bloom_mask.data = kmemdup(item->data,
-+ size, GFP_KERNEL);
-+ if (!rule->bloom_mask.data) {
-+ ret = -ENOMEM;
-+ break;
-+ }
-+
-+ rule->bloom_mask.generations = generations;
-+ break;
-+ }
-+
-+ case KDBUS_ITEM_NAME:
-+ if (!kdbus_name_is_valid(item->str, false)) {
-+ ret = -EINVAL;
-+ break;
-+ }
-+
-+ rule->name = kstrdup(item->str, GFP_KERNEL);
-+ if (!rule->name)
-+ ret = -ENOMEM;
-+
-+ break;
-+
-+ case KDBUS_ITEM_ID:
-+ rule->src_id = item->id;
-+ break;
-+
-+ case KDBUS_ITEM_DST_ID:
-+ rule->dst_id = item->id;
-+ break;
-+
-+ case KDBUS_ITEM_NAME_ADD:
-+ case KDBUS_ITEM_NAME_REMOVE:
-+ case KDBUS_ITEM_NAME_CHANGE:
-+ rule->old_id = item->name_change.old_id.id;
-+ rule->new_id = item->name_change.new_id.id;
-+
-+ if (size > sizeof(struct kdbus_notify_name_change)) {
-+ rule->name = kstrdup(item->name_change.name,
-+ GFP_KERNEL);
-+ if (!rule->name)
-+ ret = -ENOMEM;
-+ }
-+
-+ break;
-+
-+ case KDBUS_ITEM_ID_ADD:
-+ case KDBUS_ITEM_ID_REMOVE:
-+ if (item->type == KDBUS_ITEM_ID_ADD)
-+ rule->new_id = item->id_change.id;
-+ else
-+ rule->old_id = item->id_change.id;
-+
-+ break;
-+ }
-+
-+ if (ret < 0) {
-+ kdbus_match_rule_free(rule);
-+ goto exit;
-+ }
-+
-+ list_add_tail(&rule->rules_entry, &entry->rules_list);
-+ }
-+
-+ down_write(&mdb->mdb_rwlock);
-+
-+ /* Remove any entry that has the same cookie as the current one. */
-+ if (cmd->flags & KDBUS_MATCH_REPLACE)
-+ kdbus_match_db_remove_unlocked(mdb, entry->cookie);
-+
-+ /*
-+ * If the above removal caught any entry, there will be room for the
-+ * new one.
-+ */
-+ if (++mdb->entries_count > KDBUS_MATCH_MAX) {
-+ --mdb->entries_count;
-+ ret = -EMFILE;
-+ } else {
-+ list_add_tail(&entry->list_entry, &mdb->entries_list);
-+ entry = NULL;
-+ }
-+
-+ up_write(&mdb->mdb_rwlock);
-+
-+exit:
-+ kdbus_match_entry_free(entry);
-+ return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_match_remove() - handle KDBUS_CMD_MATCH_REMOVE
-+ * @conn: connection to operate on
-+ * @argp: command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_match_remove(struct kdbus_conn *conn, void __user *argp)
-+{
-+ struct kdbus_cmd_match *cmd;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ if (!kdbus_conn_is_ordinary(conn))
-+ return -EOPNOTSUPP;
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret != 0)
-+ return ret;
-+
-+ down_write(&conn->match_db->mdb_rwlock);
-+ ret = kdbus_match_db_remove_unlocked(conn->match_db, cmd->cookie);
-+ up_write(&conn->match_db->mdb_rwlock);
-+
-+ return kdbus_args_clear(&args, ret);
-+}
-diff --git a/ipc/kdbus/match.h b/ipc/kdbus/match.h
-new file mode 100644
-index 0000000..ceb492f
---- /dev/null
-+++ b/ipc/kdbus/match.h
-@@ -0,0 +1,35 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_MATCH_H
-+#define __KDBUS_MATCH_H
-+
-+struct kdbus_conn;
-+struct kdbus_match_db;
-+struct kdbus_staging;
-+
-+struct kdbus_match_db *kdbus_match_db_new(void);
-+void kdbus_match_db_free(struct kdbus_match_db *db);
-+int kdbus_match_db_add(struct kdbus_conn *conn,
-+ struct kdbus_cmd_match *cmd);
-+int kdbus_match_db_remove(struct kdbus_conn *conn,
-+ struct kdbus_cmd_match *cmd);
-+bool kdbus_match_db_match_msg(struct kdbus_match_db *db,
-+ struct kdbus_conn *conn_src,
-+ const struct kdbus_staging *staging);
-+
-+int kdbus_cmd_match_add(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_match_remove(struct kdbus_conn *conn, void __user *argp);
-+
-+#endif
-diff --git a/ipc/kdbus/message.c b/ipc/kdbus/message.c
-new file mode 100644
-index 0000000..ae565cd
---- /dev/null
-+++ b/ipc/kdbus/message.c
-@@ -0,0 +1,1040 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/capability.h>
-+#include <linux/cgroup.h>
-+#include <linux/cred.h>
-+#include <linux/file.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/sched.h>
-+#include <linux/shmem_fs.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <net/sock.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "domain.h"
-+#include "endpoint.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "match.h"
-+#include "message.h"
-+#include "names.h"
-+#include "policy.h"
-+
-+static const char * const zeros = "\0\0\0\0\0\0\0";
-+
-+static struct kdbus_gaps *kdbus_gaps_new(size_t n_memfds, size_t n_fds)
-+{
-+ size_t size_offsets, size_memfds, size_fds, size;
-+ struct kdbus_gaps *gaps;
-+
-+ size_offsets = n_memfds * sizeof(*gaps->memfd_offsets);
-+ size_memfds = n_memfds * sizeof(*gaps->memfd_files);
-+ size_fds = n_fds * sizeof(*gaps->fd_files);
-+ size = sizeof(*gaps) + size_offsets + size_memfds + size_fds;
-+
-+ gaps = kzalloc(size, GFP_KERNEL);
-+ if (!gaps)
-+ return ERR_PTR(-ENOMEM);
-+
-+ kref_init(&gaps->kref);
-+ gaps->n_memfds = 0; /* we reserve n_memfds, but don't enforce them */
-+ gaps->memfd_offsets = (void *)(gaps + 1);
-+ gaps->memfd_files = (void *)((u8 *)gaps->memfd_offsets + size_offsets);
-+ gaps->n_fds = 0; /* we reserve n_fds, but don't enforce them */
-+ gaps->fd_files = (void *)((u8 *)gaps->memfd_files + size_memfds);
-+
-+ return gaps;
-+}
-+
-+static void kdbus_gaps_free(struct kref *kref)
-+{
-+ struct kdbus_gaps *gaps = container_of(kref, struct kdbus_gaps, kref);
-+ size_t i;
-+
-+ for (i = 0; i < gaps->n_fds; ++i)
-+ if (gaps->fd_files[i])
-+ fput(gaps->fd_files[i]);
-+ for (i = 0; i < gaps->n_memfds; ++i)
-+ if (gaps->memfd_files[i])
-+ fput(gaps->memfd_files[i]);
-+
-+ kfree(gaps);
-+}
-+
-+/**
-+ * kdbus_gaps_ref() - gain reference
-+ * @gaps: gaps object
-+ *
-+ * Return: @gaps is returned
-+ */
-+struct kdbus_gaps *kdbus_gaps_ref(struct kdbus_gaps *gaps)
-+{
-+ if (gaps)
-+ kref_get(&gaps->kref);
-+ return gaps;
-+}
-+
-+/**
-+ * kdbus_gaps_unref() - drop reference
-+ * @gaps: gaps object
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_gaps *kdbus_gaps_unref(struct kdbus_gaps *gaps)
-+{
-+ if (gaps)
-+ kref_put(&gaps->kref, kdbus_gaps_free);
-+ return NULL;
-+}
-+
-+/**
-+ * kdbus_gaps_install() - install file-descriptors
-+ * @gaps: gaps object, or NULL
-+ * @slice: pool slice that contains the message
-+ * @out_incomplete: output variable to note incomplete fds
-+ *
-+ * This function installs all file-descriptors of @gaps into the current
-+ * process and copies the file-descriptor numbers into the target pool slice.
-+ *
-+ * If the file-descriptors were only partially installed, then @out_incomplete
-+ * will be set to true. Otherwise, it's set to false.
-+ *
-+ * Return: 0 on success, negative error code on failure
-+ */
-+int kdbus_gaps_install(struct kdbus_gaps *gaps, struct kdbus_pool_slice *slice,
-+ bool *out_incomplete)
-+{
-+ bool incomplete_fds = false;
-+ struct kvec kvec;
-+ size_t i, n_fds;
-+ int ret, *fds;
-+
-+ if (!gaps) {
-+ /* nothing to do */
-+ *out_incomplete = incomplete_fds;
-+ return 0;
-+ }
-+
-+ n_fds = gaps->n_fds + gaps->n_memfds;
-+ if (n_fds < 1) {
-+ /* nothing to do */
-+ *out_incomplete = incomplete_fds;
-+ return 0;
-+ }
-+
-+ fds = kmalloc_array(n_fds, sizeof(*fds), GFP_TEMPORARY);
-+ n_fds = 0;
-+ if (!fds)
-+ return -ENOMEM;
-+
-+ /* 1) allocate fds and copy them over */
-+
-+ if (gaps->n_fds > 0) {
-+ for (i = 0; i < gaps->n_fds; ++i) {
-+ int fd;
-+
-+ fd = get_unused_fd_flags(O_CLOEXEC);
-+ if (fd < 0)
-+ incomplete_fds = true;
-+
-+ WARN_ON(!gaps->fd_files[i]);
-+
-+ fds[n_fds++] = fd < 0 ? -1 : fd;
-+ }
-+
-+ /*
-+ * The file-descriptor array can only be present once per
-+ * message. Hence, prepare all fds and then copy them over with
-+ * a single kvec.
-+ */
-+
-+ WARN_ON(!gaps->fd_offset);
-+
-+ kvec.iov_base = fds;
-+ kvec.iov_len = gaps->n_fds * sizeof(*fds);
-+ ret = kdbus_pool_slice_copy_kvec(slice, gaps->fd_offset,
-+ &kvec, 1, kvec.iov_len);
-+ if (ret < 0)
-+ goto exit;
-+ }
-+
-+ for (i = 0; i < gaps->n_memfds; ++i) {
-+ int memfd;
-+
-+ memfd = get_unused_fd_flags(O_CLOEXEC);
-+ if (memfd < 0) {
-+ incomplete_fds = true;
-+ /* memfds are initialized to -1, skip copying it */
-+ continue;
-+ }
-+
-+ fds[n_fds++] = memfd;
-+
-+ /*
-+ * memfds have to be copied individually as they each are put
-+ * into a separate item. This should not be an issue, though,
-+ * as usually there is no need to send more than one memfd per
-+ * message.
-+ */
-+
-+ WARN_ON(!gaps->memfd_offsets[i]);
-+ WARN_ON(!gaps->memfd_files[i]);
-+
-+ kvec.iov_base = &memfd;
-+ kvec.iov_len = sizeof(memfd);
-+ ret = kdbus_pool_slice_copy_kvec(slice, gaps->memfd_offsets[i],
-+ &kvec, 1, kvec.iov_len);
-+ if (ret < 0)
-+ goto exit;
-+ }
-+
-+ /* 2) install fds now that everything was successful */
-+
-+ for (i = 0; i < gaps->n_fds; ++i)
-+ if (fds[i] >= 0)
-+ fd_install(fds[i], get_file(gaps->fd_files[i]));
-+ for (i = 0; i < gaps->n_memfds; ++i)
-+ if (fds[gaps->n_fds + i] >= 0)
-+ fd_install(fds[gaps->n_fds + i],
-+ get_file(gaps->memfd_files[i]));
-+
-+ ret = 0;
-+
-+exit:
-+ if (ret < 0)
-+ for (i = 0; i < n_fds; ++i)
-+ put_unused_fd(fds[i]);
-+ kfree(fds);
-+ *out_incomplete = incomplete_fds;
-+ return ret;
-+}
-+
-+static struct file *kdbus_get_fd(int fd)
-+{
-+ struct file *f, *ret;
-+ struct inode *inode;
-+ struct socket *sock;
-+
-+ if (fd < 0)
-+ return ERR_PTR(-EBADF);
-+
-+ f = fget_raw(fd);
-+ if (!f)
-+ return ERR_PTR(-EBADF);
-+
-+ inode = file_inode(f);
-+ sock = S_ISSOCK(inode->i_mode) ? SOCKET_I(inode) : NULL;
-+
-+ if (f->f_mode & FMODE_PATH)
-+ ret = f; /* O_PATH is always allowed */
-+ else if (f->f_op == &kdbus_handle_ops)
-+ ret = ERR_PTR(-EOPNOTSUPP); /* disallow kdbus-fd over kdbus */
-+ else if (sock && sock->sk && sock->ops && sock->ops->family == PF_UNIX)
-+ ret = ERR_PTR(-EOPNOTSUPP); /* disallow UDS over kdbus */
-+ else
-+ ret = f; /* all others are allowed */
-+
-+ if (f != ret)
-+ fput(f);
-+
-+ return ret;
-+}
-+
-+static struct file *kdbus_get_memfd(const struct kdbus_memfd *memfd)
-+{
-+ const int m = F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL;
-+ struct file *f, *ret;
-+ int s;
-+
-+ if (memfd->fd < 0)
-+ return ERR_PTR(-EBADF);
-+
-+ f = fget(memfd->fd);
-+ if (!f)
-+ return ERR_PTR(-EBADF);
-+
-+ s = shmem_get_seals(f);
-+ if (s < 0)
-+ ret = ERR_PTR(-EMEDIUMTYPE);
-+ else if ((s & m) != m)
-+ ret = ERR_PTR(-ETXTBSY);
-+ else if (memfd->start + memfd->size > (u64)i_size_read(file_inode(f)))
-+ ret = ERR_PTR(-EFAULT);
-+ else
-+ ret = f;
-+
-+ if (f != ret)
-+ fput(f);
-+
-+ return ret;
-+}
-+
-+static int kdbus_msg_examine(struct kdbus_msg *msg, struct kdbus_bus *bus,
-+ struct kdbus_cmd_send *cmd, size_t *out_n_memfds,
-+ size_t *out_n_fds, size_t *out_n_parts)
-+{
-+ struct kdbus_item *item, *fds = NULL, *bloom = NULL, *dstname = NULL;
-+ u64 n_parts, n_memfds, n_fds, vec_size;
-+
-+ /*
-+ * Step 1:
-+ * Validate the message and command parameters.
-+ */
-+
-+ /* KDBUS_PAYLOAD_KERNEL is reserved to kernel messages */
-+ if (msg->payload_type == KDBUS_PAYLOAD_KERNEL)
-+ return -EINVAL;
-+
-+ if (msg->dst_id == KDBUS_DST_ID_BROADCAST) {
-+ /* broadcasts must be marked as signals */
-+ if (!(msg->flags & KDBUS_MSG_SIGNAL))
-+ return -EBADMSG;
-+ /* broadcasts cannot have timeouts */
-+ if (msg->timeout_ns > 0)
-+ return -ENOTUNIQ;
-+ }
-+
-+ if (msg->flags & KDBUS_MSG_EXPECT_REPLY) {
-+ /* if you expect a reply, you must specify a timeout */
-+ if (msg->timeout_ns == 0)
-+ return -EINVAL;
-+ /* signals cannot have replies */
-+ if (msg->flags & KDBUS_MSG_SIGNAL)
-+ return -ENOTUNIQ;
-+ } else {
-+ /* must expect reply if sent as synchronous call */
-+ if (cmd->flags & KDBUS_SEND_SYNC_REPLY)
-+ return -EINVAL;
-+ /* cannot mark replies as signal */
-+ if (msg->cookie_reply && (msg->flags & KDBUS_MSG_SIGNAL))
-+ return -EINVAL;
-+ }
-+
-+ /*
-+ * Step 2:
-+ * Validate all passed items. While at it, collect some statistics that
-+ * are required to allocate state objects later on.
-+ *
-+ * Generic item validation has already been done via
-+ * kdbus_item_validate(). Furthermore, the number of items is naturally
-+ * limited by the maximum message size. Hence, only non-generic item
-+ * checks are performed here (mainly integer overflow tests).
-+ */
-+
-+ n_parts = 0;
-+ n_memfds = 0;
-+ n_fds = 0;
-+ vec_size = 0;
-+
-+ KDBUS_ITEMS_FOREACH(item, msg->items, KDBUS_ITEMS_SIZE(msg, items)) {
-+ switch (item->type) {
-+ case KDBUS_ITEM_PAYLOAD_VEC: {
-+ void __force __user *ptr = KDBUS_PTR(item->vec.address);
-+ u64 size = item->vec.size;
-+
-+ if (vec_size + size < vec_size)
-+ return -EMSGSIZE;
-+ if (vec_size + size > KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE)
-+ return -EMSGSIZE;
-+ if (ptr && unlikely(!access_ok(VERIFY_READ, ptr, size)))
-+ return -EFAULT;
-+
-+ if (ptr || size % 8) /* data or padding */
-+ ++n_parts;
-+ break;
-+ }
-+ case KDBUS_ITEM_PAYLOAD_MEMFD: {
-+ u64 start = item->memfd.start;
-+ u64 size = item->memfd.size;
-+
-+ if (start + size < start)
-+ return -EMSGSIZE;
-+ if (n_memfds >= KDBUS_MSG_MAX_MEMFD_ITEMS)
-+ return -E2BIG;
-+
-+ ++n_memfds;
-+ if (size % 8) /* vec-padding required */
-+ ++n_parts;
-+ break;
-+ }
-+ case KDBUS_ITEM_FDS: {
-+ if (fds)
-+ return -EEXIST;
-+
-+ fds = item;
-+ n_fds = KDBUS_ITEM_PAYLOAD_SIZE(item) / sizeof(int);
-+ if (n_fds > KDBUS_CONN_MAX_FDS_PER_USER)
-+ return -EMFILE;
-+
-+ break;
-+ }
-+ case KDBUS_ITEM_BLOOM_FILTER: {
-+ u64 bloom_size;
-+
-+ if (bloom)
-+ return -EEXIST;
-+
-+ bloom = item;
-+ bloom_size = KDBUS_ITEM_PAYLOAD_SIZE(item) -
-+ offsetof(struct kdbus_bloom_filter, data);
-+ if (!KDBUS_IS_ALIGNED8(bloom_size))
-+ return -EFAULT;
-+ if (bloom_size != bus->bloom.size)
-+ return -EDOM;
-+
-+ break;
-+ }
-+ case KDBUS_ITEM_DST_NAME: {
-+ if (dstname)
-+ return -EEXIST;
-+
-+ dstname = item;
-+ if (!kdbus_name_is_valid(item->str, false))
-+ return -EINVAL;
-+ if (msg->dst_id == KDBUS_DST_ID_BROADCAST)
-+ return -EBADMSG;
-+
-+ break;
-+ }
-+ default:
-+ return -EINVAL;
-+ }
-+ }
-+
-+ /*
-+ * Step 3:
-+ * Validate that required items were actually passed, and that no item
-+ * contradicts the message flags.
-+ */
-+
-+ /* bloom filters must be attached _iff_ it's a signal */
-+ if (!(msg->flags & KDBUS_MSG_SIGNAL) != !bloom)
-+ return -EBADMSG;
-+ /* destination name is required if no ID is given */
-+ if (msg->dst_id == KDBUS_DST_ID_NAME && !dstname)
-+ return -EDESTADDRREQ;
-+ /* cannot send file-descriptors attached to broadcasts */
-+ if (msg->dst_id == KDBUS_DST_ID_BROADCAST && fds)
-+ return -ENOTUNIQ;
-+
-+ *out_n_memfds = n_memfds;
-+ *out_n_fds = n_fds;
-+ *out_n_parts = n_parts;
-+
-+ return 0;
-+}
-+
-+static bool kdbus_staging_merge_vecs(struct kdbus_staging *staging,
-+ struct kdbus_item **prev_item,
-+ struct iovec **prev_vec,
-+ const struct kdbus_item *merge)
-+{
-+ void __user *ptr = (void __user *)KDBUS_PTR(merge->vec.address);
-+ u64 padding = merge->vec.size % 8;
-+ struct kdbus_item *prev = *prev_item;
-+ struct iovec *vec = *prev_vec;
-+
-+ /* XXX: merging is disabled so far */
-+ if (0 && prev && prev->type == KDBUS_ITEM_PAYLOAD_OFF &&
-+ !merge->vec.address == !prev->vec.address) {
-+ /*
-+ * If we merge two VECs, we can always drop the second
-+ * PAYLOAD_VEC item. Hence, include its size in the previous
-+ * one.
-+ */
-+ prev->vec.size += merge->vec.size;
-+
-+ if (ptr) {
-+ /*
-+ * If we merge two data VECs, we need two iovecs to copy
-+ * the data. But the items can be easily merged by
-+ * summing their lengths.
-+ */
-+ vec = &staging->parts[staging->n_parts++];
-+ vec->iov_len = merge->vec.size;
-+ vec->iov_base = ptr;
-+ staging->n_payload += vec->iov_len;
-+ } else if (padding) {
-+ /*
-+ * If we merge two 0-vecs with the second 0-vec
-+ * requiring padding, we need to insert an iovec to copy
-+ * the 0-padding. We try merging it with the previous
-+ * 0-padding iovec. This might end up with an
-+ * iov_len==0, in which case we simply drop the iovec.
-+ */
-+ if (vec) {
-+ staging->n_payload -= vec->iov_len;
-+ vec->iov_len = prev->vec.size % 8;
-+ if (!vec->iov_len) {
-+ --staging->n_parts;
-+ vec = NULL;
-+ } else {
-+ staging->n_payload += vec->iov_len;
-+ }
-+ } else {
-+ vec = &staging->parts[staging->n_parts++];
-+ vec->iov_len = padding;
-+ vec->iov_base = (char __user *)zeros;
-+ staging->n_payload += vec->iov_len;
-+ }
-+ } else {
-+ /*
-+ * If we merge two 0-vecs with the second 0-vec having
-+ * no padding, we know the padding of the first stays
-+ * the same. Hence, @vec needs no adjustment.
-+ */
-+ }
-+
-+ /* successfully merged with previous item */
-+ merge = prev;
-+ } else {
-+ /*
-+ * If we cannot merge the payload item with the previous one,
-+ * we simply insert a new iovec for the data/padding.
-+ */
-+ if (ptr) {
-+ vec = &staging->parts[staging->n_parts++];
-+ vec->iov_len = merge->vec.size;
-+ vec->iov_base = ptr;
-+ staging->n_payload += vec->iov_len;
-+ } else if (padding) {
-+ vec = &staging->parts[staging->n_parts++];
-+ vec->iov_len = padding;
-+ vec->iov_base = (char __user *)zeros;
-+ staging->n_payload += vec->iov_len;
-+ } else {
-+ vec = NULL;
-+ }
-+ }
-+
-+ *prev_item = (struct kdbus_item *)merge;
-+ *prev_vec = vec;
-+
-+ return merge == prev;
-+}
-+
-+static int kdbus_staging_import(struct kdbus_staging *staging)
-+{
-+ struct kdbus_item *it, *item, *last, *prev_payload;
-+ struct kdbus_gaps *gaps = staging->gaps;
-+ struct kdbus_msg *msg = staging->msg;
-+ struct iovec *part, *prev_part;
-+ bool drop_item;
-+
-+ drop_item = false;
-+ last = NULL;
-+ prev_payload = NULL;
-+ prev_part = NULL;
-+
-+ /*
-+ * We modify msg->items along the way; make sure to use @item as offset
-+ * to the next item (instead of the iterator @it).
-+ */
-+ for (it = item = msg->items;
-+ it >= msg->items &&
-+ (u8 *)it < (u8 *)msg + msg->size &&
-+ (u8 *)it + it->size <= (u8 *)msg + msg->size; ) {
-+ /*
-+ * If we dropped items along the way, move current item to
-+ * front. We must not access @it afterwards, but use @item
-+ * instead!
-+ */
-+ if (it != item)
-+ memmove(item, it, it->size);
-+ it = (void *)((u8 *)it + KDBUS_ALIGN8(item->size));
-+
-+ switch (item->type) {
-+ case KDBUS_ITEM_PAYLOAD_VEC: {
-+ size_t offset = staging->n_payload;
-+
-+ if (kdbus_staging_merge_vecs(staging, &prev_payload,
-+ &prev_part, item)) {
-+ drop_item = true;
-+ } else if (item->vec.address) {
-+ /* real offset is patched later on */
-+ item->type = KDBUS_ITEM_PAYLOAD_OFF;
-+ item->vec.offset = offset;
-+ } else {
-+ item->type = KDBUS_ITEM_PAYLOAD_OFF;
-+ item->vec.offset = ~0ULL;
-+ }
-+
-+ break;
-+ }
-+ case KDBUS_ITEM_PAYLOAD_MEMFD: {
-+ struct file *f;
-+
-+ f = kdbus_get_memfd(&item->memfd);
-+ if (IS_ERR(f))
-+ return PTR_ERR(f);
-+
-+ gaps->memfd_files[gaps->n_memfds] = f;
-+ gaps->memfd_offsets[gaps->n_memfds] =
-+ (u8 *)&item->memfd.fd - (u8 *)msg;
-+ ++gaps->n_memfds;
-+
-+ /* memfds cannot be merged */
-+ prev_payload = item;
-+ prev_part = NULL;
-+
-+ /* insert padding to make following VECs aligned */
-+ if (item->memfd.size % 8) {
-+ part = &staging->parts[staging->n_parts++];
-+ part->iov_len = item->memfd.size % 8;
-+ part->iov_base = (char __user *)zeros;
-+ staging->n_payload += part->iov_len;
-+ }
-+
-+ break;
-+ }
-+ case KDBUS_ITEM_FDS: {
-+ size_t i, n_fds;
-+
-+ n_fds = KDBUS_ITEM_PAYLOAD_SIZE(item) / sizeof(int);
-+ for (i = 0; i < n_fds; ++i) {
-+ struct file *f;
-+
-+ f = kdbus_get_fd(item->fds[i]);
-+ if (IS_ERR(f))
-+ return PTR_ERR(f);
-+
-+ gaps->fd_files[gaps->n_fds++] = f;
-+ }
-+
-+ gaps->fd_offset = (u8 *)item->fds - (u8 *)msg;
-+
-+ break;
-+ }
-+ case KDBUS_ITEM_BLOOM_FILTER:
-+ staging->bloom_filter = &item->bloom_filter;
-+ break;
-+ case KDBUS_ITEM_DST_NAME:
-+ staging->dst_name = item->str;
-+ break;
-+ }
-+
-+ /* drop item if we merged it with a previous one */
-+ if (drop_item) {
-+ drop_item = false;
-+ } else {
-+ last = item;
-+ item = KDBUS_ITEM_NEXT(item);
-+ }
-+ }
-+
-+ /* adjust message size regarding dropped items */
-+ msg->size = offsetof(struct kdbus_msg, items);
-+ if (last)
-+ msg->size += ((u8 *)last - (u8 *)msg->items) + last->size;
-+
-+ return 0;
-+}
-+
-+static void kdbus_staging_reserve(struct kdbus_staging *staging)
-+{
-+ struct iovec *part;
-+
-+ part = &staging->parts[staging->n_parts++];
-+ part->iov_base = (void __user *)zeros;
-+ part->iov_len = 0;
-+}
-+
-+static struct kdbus_staging *kdbus_staging_new(struct kdbus_bus *bus,
-+ size_t n_parts,
-+ size_t msg_extra_size)
-+{
-+ const size_t reserved_parts = 5; /* see below for explanation */
-+ struct kdbus_staging *staging;
-+ int ret;
-+
-+ n_parts += reserved_parts;
-+
-+ staging = kzalloc(sizeof(*staging) + n_parts * sizeof(*staging->parts) +
-+ msg_extra_size, GFP_TEMPORARY);
-+ if (!staging)
-+ return ERR_PTR(-ENOMEM);
-+
-+ staging->msg_seqnum = atomic64_inc_return(&bus->last_message_id);
-+ staging->n_parts = 0; /* we reserve n_parts, but don't enforce them */
-+ staging->parts = (void *)(staging + 1);
-+
-+ if (msg_extra_size) /* if requested, allocate message, too */
-+ staging->msg = (void *)((u8 *)staging->parts +
-+ n_parts * sizeof(*staging->parts));
-+
-+ staging->meta_proc = kdbus_meta_proc_new();
-+ if (IS_ERR(staging->meta_proc)) {
-+ ret = PTR_ERR(staging->meta_proc);
-+ staging->meta_proc = NULL;
-+ goto error;
-+ }
-+
-+ staging->meta_conn = kdbus_meta_conn_new();
-+ if (IS_ERR(staging->meta_conn)) {
-+ ret = PTR_ERR(staging->meta_conn);
-+ staging->meta_conn = NULL;
-+ goto error;
-+ }
-+
-+ /*
-+ * Prepare iovecs to copy the message into the target pool. We use the
-+ * following iovecs:
-+ * * iovec to copy "kdbus_msg.size"
-+ * * iovec to copy "struct kdbus_msg" (minus size) plus items
-+ * * iovec for possible padding after the items
-+ * * iovec for metadata items
-+ * * iovec for possible padding after the metadata items
-+ *
-+ * Make sure to update @reserved_parts if you add more parts here.
-+ */
-+
-+ kdbus_staging_reserve(staging); /* msg.size */
-+ kdbus_staging_reserve(staging); /* msg (minus msg.size) plus items */
-+ kdbus_staging_reserve(staging); /* msg padding */
-+ kdbus_staging_reserve(staging); /* meta */
-+ kdbus_staging_reserve(staging); /* meta padding */
-+
-+ return staging;
-+
-+error:
-+ kdbus_staging_free(staging);
-+ return ERR_PTR(ret);
-+}
-+
-+struct kdbus_staging *kdbus_staging_new_kernel(struct kdbus_bus *bus,
-+ u64 dst, u64 cookie_timeout,
-+ size_t it_size, size_t it_type)
-+{
-+ struct kdbus_staging *staging;
-+ size_t size;
-+
-+ size = offsetof(struct kdbus_msg, items) +
-+ KDBUS_ITEM_HEADER_SIZE + it_size;
-+
-+ staging = kdbus_staging_new(bus, 0, KDBUS_ALIGN8(size));
-+ if (IS_ERR(staging))
-+ return ERR_CAST(staging);
-+
-+ staging->msg->size = size;
-+ staging->msg->flags = (dst == KDBUS_DST_ID_BROADCAST) ?
-+ KDBUS_MSG_SIGNAL : 0;
-+ staging->msg->dst_id = dst;
-+ staging->msg->src_id = KDBUS_SRC_ID_KERNEL;
-+ staging->msg->payload_type = KDBUS_PAYLOAD_KERNEL;
-+ staging->msg->cookie_reply = cookie_timeout;
-+ staging->notify = staging->msg->items;
-+ staging->notify->size = KDBUS_ITEM_HEADER_SIZE + it_size;
-+ staging->notify->type = it_type;
-+
-+ return staging;
-+}
-+
-+struct kdbus_staging *kdbus_staging_new_user(struct kdbus_bus *bus,
-+ struct kdbus_cmd_send *cmd,
-+ struct kdbus_msg *msg)
-+{
-+ const size_t reserved_parts = 1; /* see below for explanation */
-+ size_t n_memfds, n_fds, n_parts;
-+ struct kdbus_staging *staging;
-+ int ret;
-+
-+ /*
-+ * Examine user-supplied message and figure out how many resources we
-+ * need to allocate in our staging area. This requires us to iterate
-+ * the message twice, but saves us from re-allocating our resources
-+ * all the time.
-+ */
-+
-+ ret = kdbus_msg_examine(msg, bus, cmd, &n_memfds, &n_fds, &n_parts);
-+ if (ret < 0)
-+ return ERR_PTR(ret);
-+
-+ n_parts += reserved_parts;
-+
-+ /*
-+ * Allocate staging area with the number of required resources. Make
-+ * sure that we have enough iovecs for all required parts pre-allocated
-+ * so this will hopefully be the only memory allocation for this
-+ * message transaction.
-+ */
-+
-+ staging = kdbus_staging_new(bus, n_parts, 0);
-+ if (IS_ERR(staging))
-+ return ERR_CAST(staging);
-+
-+ staging->msg = msg;
-+
-+ /*
-+ * If the message contains memfds or fd items, we need to remember some
-+ * state so we can fill in the requested information at RECV time.
-+ * File-descriptors cannot be passed at SEND time. Hence, allocate a
-+ * gaps-object to remember that state. That gaps object is linked to
-+ * from the staging area, but will also be linked to from the message
-+ * queue of each peer. Hence, each receiver owns a reference to it, and
-+ * it will later be used to fill the 'gaps' in the message that couldn't be
-+ * filled at SEND time.
-+ * Note that the 'gaps' object is read-only once the staging-allocator
-+ * returns. There might be connections receiving a queued message while
-+ * the sender still broadcasts the message to other receivers.
-+ */
-+
-+ if (n_memfds > 0 || n_fds > 0) {
-+ staging->gaps = kdbus_gaps_new(n_memfds, n_fds);
-+ if (IS_ERR(staging->gaps)) {
-+ ret = PTR_ERR(staging->gaps);
-+ staging->gaps = NULL;
-+ kdbus_staging_free(staging);
-+ return ERR_PTR(ret);
-+ }
-+ }
-+
-+ /*
-+ * kdbus_staging_new() already reserves parts for message setup. For
-+ * user-supplied messages, we add the following iovecs:
-+ * ... variable number of iovecs for payload ...
-+ * * final iovec for possible padding of payload
-+ *
-+ * Make sure to update @reserved_parts if you add more parts here.
-+ */
-+
-+ ret = kdbus_staging_import(staging); /* payload */
-+ kdbus_staging_reserve(staging); /* payload padding */
-+
-+ if (ret < 0)
-+ goto error;
-+
-+ return staging;
-+
-+error:
-+ kdbus_staging_free(staging);
-+ return ERR_PTR(ret);
-+}
-+
-+struct kdbus_staging *kdbus_staging_free(struct kdbus_staging *staging)
-+{
-+ if (!staging)
-+ return NULL;
-+
-+ kdbus_meta_conn_unref(staging->meta_conn);
-+ kdbus_meta_proc_unref(staging->meta_proc);
-+ kdbus_gaps_unref(staging->gaps);
-+ kfree(staging);
-+
-+ return NULL;
-+}
-+
-+static int kdbus_staging_collect_metadata(struct kdbus_staging *staging,
-+ struct kdbus_conn *src,
-+ struct kdbus_conn *dst,
-+ u64 *out_attach)
-+{
-+ u64 attach;
-+ int ret;
-+
-+ if (src)
-+ attach = kdbus_meta_msg_mask(src, dst);
-+ else
-+ attach = KDBUS_ATTACH_TIMESTAMP; /* metadata for kernel msgs */
-+
-+ if (src && !src->meta_fake) {
-+ ret = kdbus_meta_proc_collect(staging->meta_proc, attach);
-+ if (ret < 0)
-+ return ret;
-+ }
-+
-+ ret = kdbus_meta_conn_collect(staging->meta_conn, src,
-+ staging->msg_seqnum, attach);
-+ if (ret < 0)
-+ return ret;
-+
-+ *out_attach = attach;
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_staging_emit() - emit linearized message in target pool
-+ * @staging: staging object to create message from
-+ * @src: sender of the message (or NULL)
-+ * @dst: target connection to allocate message for
-+ *
-+ * This allocates a pool-slice for @dst and copies the message provided by
-+ * @staging into it. The new slice is then returned to the caller for further
-+ * processing. It's not linked into any queue, yet.
-+ *
-+ * Return: Newly allocated slice or ERR_PTR on failure.
-+ */
-+struct kdbus_pool_slice *kdbus_staging_emit(struct kdbus_staging *staging,
-+ struct kdbus_conn *src,
-+ struct kdbus_conn *dst)
-+{
-+ struct kdbus_item *item, *meta_items = NULL;
-+ struct kdbus_pool_slice *slice = NULL;
-+ size_t off, size, meta_size;
-+ struct iovec *v;
-+ u64 attach, msg_size;
-+ int ret;
-+
-+ /*
-+ * Step 1:
-+ * Collect metadata from @src depending on the attach-flags allowed for
-+ * @dst. Translate it into the namespaces pinned by @dst.
-+ */
-+
-+ ret = kdbus_staging_collect_metadata(staging, src, dst, &attach);
-+ if (ret < 0)
-+ goto error;
-+
-+ ret = kdbus_meta_emit(staging->meta_proc, NULL, staging->meta_conn,
-+ dst, attach, &meta_items, &meta_size);
-+ if (ret < 0)
-+ goto error;
-+
-+ /*
-+ * Step 2:
-+ * Setup iovecs for the message. See kdbus_staging_new() for allocation
-+ * of those iovecs. All reserved iovecs have been initialized with
-+ * iov_len=0 + iov_base=zeros. Furthermore, the iovecs to copy the
-+ * actual message payload have already been initialized and need not be
-+ * touched.
-+ */
-+
-+ v = staging->parts;
-+ msg_size = staging->msg->size;
-+
-+ /* msg.size */
-+ v->iov_len = sizeof(msg_size);
-+ v->iov_base = (void __user *)&msg_size;
-+ ++v;
-+
-+ /* msg (after msg.size) plus items */
-+ v->iov_len = staging->msg->size - sizeof(staging->msg->size);
-+ v->iov_base = (void __user *)((u8 *)staging->msg +
-+ sizeof(staging->msg->size));
-+ ++v;
-+
-+ /* padding after msg */
-+ v->iov_len = KDBUS_ALIGN8(staging->msg->size) - staging->msg->size;
-+ v->iov_base = (void __user *)zeros;
-+ ++v;
-+
-+ if (meta_size > 0) {
-+ /* metadata items */
-+ v->iov_len = meta_size;
-+ v->iov_base = (void __user *)meta_items;
-+ ++v;
-+
-+ /* padding after metadata */
-+ v->iov_len = KDBUS_ALIGN8(meta_size) - meta_size;
-+ v->iov_base = (void __user *)zeros;
-+ ++v;
-+
-+ msg_size = KDBUS_ALIGN8(msg_size) + meta_size;
-+ } else {
-+ /* metadata items */
-+ v->iov_len = 0;
-+ v->iov_base = (void __user *)zeros;
-+ ++v;
-+
-+ /* padding after metadata */
-+ v->iov_len = 0;
-+ v->iov_base = (void __user *)zeros;
-+ ++v;
-+ }
-+
-+ /* ... payload iovecs are already filled in ... */
-+
-+ /* compute overall size and fill in padding after payload */
-+ size = KDBUS_ALIGN8(msg_size);
-+
-+ if (staging->n_payload > 0) {
-+ size += staging->n_payload;
-+
-+ v = &staging->parts[staging->n_parts - 1];
-+ v->iov_len = KDBUS_ALIGN8(size) - size;
-+ v->iov_base = (void __user *)zeros;
-+
-+ size = KDBUS_ALIGN8(size);
-+ }
-+
-+ /*
-+ * Step 3:
-+ * The PAYLOAD_OFF items in the message contain a relative 'offset'
-+ * field that tells the receiver where to find the actual payload. This
-+ * offset is relative to the start of the message, and as such depends
-+ * on the size of the metadata items we inserted. This size is variable
-+ * and changes for each peer we send the message to. Hence, we remember
-+ * the last relative offset that was used to calculate the 'offset'
-+ * fields. For each message, we re-calculate it and patch all items, in
-+ * case it changed.
-+ */
-+
-+ off = KDBUS_ALIGN8(msg_size);
-+
-+ if (off != staging->i_payload) {
-+ KDBUS_ITEMS_FOREACH(item, staging->msg->items,
-+ KDBUS_ITEMS_SIZE(staging->msg, items)) {
-+ if (item->type != KDBUS_ITEM_PAYLOAD_OFF)
-+ continue;
-+
-+ item->vec.offset -= staging->i_payload;
-+ item->vec.offset += off;
-+ }
-+
-+ staging->i_payload = off;
-+ }
-+
-+ /*
-+ * Step 4:
-+ * Allocate pool slice and copy over all data. Make sure to properly
-+ * account for user quota.
-+ */
-+
-+ ret = kdbus_conn_quota_inc(dst, src ? src->user : NULL, size,
-+ staging->gaps ? staging->gaps->n_fds : 0);
-+ if (ret < 0)
-+ goto error;
-+
-+ slice = kdbus_pool_slice_alloc(dst->pool, size, true);
-+ if (IS_ERR(slice)) {
-+ ret = PTR_ERR(slice);
-+ slice = NULL;
-+ goto error;
-+ }
-+
-+ WARN_ON(kdbus_pool_slice_size(slice) != size);
-+
-+ ret = kdbus_pool_slice_copy_iovec(slice, 0, staging->parts,
-+ staging->n_parts, size);
-+ if (ret < 0)
-+ goto error;
-+
-+ /* all done, return slice to caller */
-+ goto exit;
-+
-+error:
-+ if (slice)
-+ kdbus_conn_quota_dec(dst, src ? src->user : NULL, size,
-+ staging->gaps ? staging->gaps->n_fds : 0);
-+ kdbus_pool_slice_release(slice);
-+ slice = ERR_PTR(ret);
-+exit:
-+ kfree(meta_items);
-+ return slice;
-+}
-diff --git a/ipc/kdbus/message.h b/ipc/kdbus/message.h
-new file mode 100644
-index 0000000..298f9c9
---- /dev/null
-+++ b/ipc/kdbus/message.h
-@@ -0,0 +1,120 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_MESSAGE_H
-+#define __KDBUS_MESSAGE_H
-+
-+#include <linux/fs.h>
-+#include <linux/kref.h>
-+#include <uapi/linux/kdbus.h>
-+
-+struct kdbus_bus;
-+struct kdbus_conn;
-+struct kdbus_meta_conn;
-+struct kdbus_meta_proc;
-+struct kdbus_pool_slice;
-+
-+/**
-+ * struct kdbus_gaps - gaps in message to be filled later
-+ * @kref: Reference counter
-+ * @n_memfds: Number of memfds
-+ * @memfd_offsets: Offsets of kdbus_memfd items in target slice
-+ * @n_fds: Number of fds
-+ * @fd_files: Array of sent files
-+ * @fd_offset: Offset of fd-array in target slice
-+ *
-+ * The 'gaps' object is used to track data that is needed to fill gaps in a
-+ * message at RECV time. Usually, we try to compile the whole message at SEND
-+ * time. This has the advantage that we don't have to cache any information and
-+ * can keep the memory consumption small. Furthermore, all copy operations can
-+ * be combined into a single function call, which speeds up transactions
-+ * considerably.
-+ * However, things like file-descriptors can only be fully installed at RECV
-+ * time. The gaps object tracks this data and pins it until a message is
-+ * received. The gaps object is shared between all receivers of the same
-+ * message.
-+ */
-+struct kdbus_gaps {
-+ struct kref kref;
-+
-+ /* state tracking for KDBUS_ITEM_PAYLOAD_MEMFD entries */
-+ size_t n_memfds;
-+ u64 *memfd_offsets;
-+ struct file **memfd_files;
-+
-+ /* state tracking for KDBUS_ITEM_FDS */
-+ size_t n_fds;
-+ struct file **fd_files;
-+ u64 fd_offset;
-+};
-+
-+struct kdbus_gaps *kdbus_gaps_ref(struct kdbus_gaps *gaps);
-+struct kdbus_gaps *kdbus_gaps_unref(struct kdbus_gaps *gaps);
-+int kdbus_gaps_install(struct kdbus_gaps *gaps, struct kdbus_pool_slice *slice,
-+ bool *out_incomplete);
-+
-+/**
-+ * struct kdbus_staging - staging area to import messages
-+ * @msg: User-supplied message
-+ * @gaps: Gaps-object created during import (or NULL if empty)
-+ * @msg_seqnum: Message sequence number
-+ * @notify_entry: Entry into list of kernel-generated notifications
-+ * @i_payload: Current relative index of start of payload
-+ * @n_payload: Total number of bytes needed for payload
-+ * @n_parts: Number of parts
-+ * @parts: Array of iovecs that make up the whole message
-+ * @meta_proc: Process metadata of the sender (or NULL if empty)
-+ * @meta_conn: Connection metadata of the sender (or NULL if empty)
-+ * @bloom_filter: Pointer to the bloom-item in @msg, or NULL
-+ * @dst_name: Pointer to the dst-name-item in @msg, or NULL
-+ * @notify: Pointer to the notification item in @msg, or NULL
-+ *
-+ * The kdbus_staging object is a temporary staging area to import user-supplied
-+ * messages into the kernel. It is only used during SEND and dropped once the
-+ * message is queued. Any data that cannot be collected during SEND is
-+ * collected in a kdbus_gaps object and attached to the message queue.
-+ */
-+struct kdbus_staging {
-+ struct kdbus_msg *msg;
-+ struct kdbus_gaps *gaps;
-+ u64 msg_seqnum;
-+ struct list_head notify_entry;
-+
-+ /* crafted iovecs to copy the message */
-+ size_t i_payload;
-+ size_t n_payload;
-+ size_t n_parts;
-+ struct iovec *parts;
-+
-+ /* metadata state */
-+ struct kdbus_meta_proc *meta_proc;
-+ struct kdbus_meta_conn *meta_conn;
-+
-+ /* cached pointers into @msg */
-+ const struct kdbus_bloom_filter *bloom_filter;
-+ const char *dst_name;
-+ struct kdbus_item *notify;
-+};
-+
-+struct kdbus_staging *kdbus_staging_new_kernel(struct kdbus_bus *bus,
-+ u64 dst, u64 cookie_timeout,
-+ size_t it_size, size_t it_type);
-+struct kdbus_staging *kdbus_staging_new_user(struct kdbus_bus *bus,
-+ struct kdbus_cmd_send *cmd,
-+ struct kdbus_msg *msg);
-+struct kdbus_staging *kdbus_staging_free(struct kdbus_staging *staging);
-+struct kdbus_pool_slice *kdbus_staging_emit(struct kdbus_staging *staging,
-+ struct kdbus_conn *src,
-+ struct kdbus_conn *dst);
-+
-+#endif
-diff --git a/ipc/kdbus/metadata.c b/ipc/kdbus/metadata.c
-new file mode 100644
-index 0000000..71ca475
---- /dev/null
-+++ b/ipc/kdbus/metadata.c
-@@ -0,0 +1,1347 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/audit.h>
-+#include <linux/capability.h>
-+#include <linux/cgroup.h>
-+#include <linux/cred.h>
-+#include <linux/file.h>
-+#include <linux/fs_struct.h>
-+#include <linux/init.h>
-+#include <linux/kref.h>
-+#include <linux/mutex.h>
-+#include <linux/sched.h>
-+#include <linux/security.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <linux/uidgid.h>
-+#include <linux/uio.h>
-+#include <linux/user_namespace.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "endpoint.h"
-+#include "item.h"
-+#include "message.h"
-+#include "metadata.h"
-+#include "names.h"
-+
-+/**
-+ * struct kdbus_meta_proc - Process metadata
-+ * @kref: Reference counting
-+ * @lock: Object lock
-+ * @collected: Bitmask of collected items
-+ * @valid: Bitmask of collected and valid items
-+ * @cred: Credentials
-+ * @pid: PID of process
-+ * @tgid: TGID of process
-+ * @ppid: PPID of process
-+ * @tid_comm: TID comm line
-+ * @pid_comm: PID comm line
-+ * @exe_path: Executable path
-+ * @root_path: Root-FS path
-+ * @cmdline: Command-line
-+ * @cgroup: Full cgroup path
-+ * @seclabel: Seclabel
-+ * @audit_loginuid: Audit login-UID
-+ * @audit_sessionid: Audit session-ID
-+ */
-+struct kdbus_meta_proc {
-+ struct kref kref;
-+ struct mutex lock;
-+ u64 collected;
-+ u64 valid;
-+
-+ /* KDBUS_ITEM_CREDS */
-+ /* KDBUS_ITEM_AUXGROUPS */
-+ /* KDBUS_ITEM_CAPS */
-+ const struct cred *cred;
-+
-+ /* KDBUS_ITEM_PIDS */
-+ struct pid *pid;
-+ struct pid *tgid;
-+ struct pid *ppid;
-+
-+ /* KDBUS_ITEM_TID_COMM */
-+ char tid_comm[TASK_COMM_LEN];
-+ /* KDBUS_ITEM_PID_COMM */
-+ char pid_comm[TASK_COMM_LEN];
-+
-+ /* KDBUS_ITEM_EXE */
-+ struct path exe_path;
-+ struct path root_path;
-+
-+ /* KDBUS_ITEM_CMDLINE */
-+ char *cmdline;
-+
-+ /* KDBUS_ITEM_CGROUP */
-+ char *cgroup;
-+
-+ /* KDBUS_ITEM_SECLABEL */
-+ char *seclabel;
-+
-+ /* KDBUS_ITEM_AUDIT */
-+ kuid_t audit_loginuid;
-+ unsigned int audit_sessionid;
-+};
-+
-+/**
-+ * struct kdbus_meta_conn - Connection metadata
-+ * @kref: Reference counting
-+ * @lock: Object lock
-+ * @collected: Bitmask of collected items
-+ * @valid: Bitmask of collected and valid items
-+ * @ts: Timestamp values
-+ * @owned_names_items: Serialized items for owned names
-+ * @owned_names_size: Size of @owned_names_items
-+ * @conn_description: Connection description
-+ */
-+struct kdbus_meta_conn {
-+ struct kref kref;
-+ struct mutex lock;
-+ u64 collected;
-+ u64 valid;
-+
-+ /* KDBUS_ITEM_TIMESTAMP */
-+ struct kdbus_timestamp ts;
-+
-+ /* KDBUS_ITEM_OWNED_NAME */
-+ struct kdbus_item *owned_names_items;
-+ size_t owned_names_size;
-+
-+ /* KDBUS_ITEM_CONN_DESCRIPTION */
-+ char *conn_description;
-+};
-+
-+/* fixed size equivalent of "kdbus_caps" */
-+struct kdbus_meta_caps {
-+ u32 last_cap;
-+ struct {
-+ u32 caps[_KERNEL_CAPABILITY_U32S];
-+ } set[4];
-+};
-+
-+/**
-+ * kdbus_meta_proc_new() - Create process metadata object
-+ *
-+ * Return: Pointer to new object on success, ERR_PTR on failure.
-+ */
-+struct kdbus_meta_proc *kdbus_meta_proc_new(void)
-+{
-+ struct kdbus_meta_proc *mp;
-+
-+ mp = kzalloc(sizeof(*mp), GFP_KERNEL);
-+ if (!mp)
-+ return ERR_PTR(-ENOMEM);
-+
-+ kref_init(&mp->kref);
-+ mutex_init(&mp->lock);
-+
-+ return mp;
-+}
-+
-+static void kdbus_meta_proc_free(struct kref *kref)
-+{
-+ struct kdbus_meta_proc *mp = container_of(kref, struct kdbus_meta_proc,
-+ kref);
-+
-+ path_put(&mp->exe_path);
-+ path_put(&mp->root_path);
-+ if (mp->cred)
-+ put_cred(mp->cred);
-+ put_pid(mp->ppid);
-+ put_pid(mp->tgid);
-+ put_pid(mp->pid);
-+
-+ kfree(mp->seclabel);
-+ kfree(mp->cmdline);
-+ kfree(mp->cgroup);
-+ kfree(mp);
-+}
-+
-+/**
-+ * kdbus_meta_proc_ref() - Gain reference
-+ * @mp: Process metadata object
-+ *
-+ * Return: @mp is returned
-+ */
-+struct kdbus_meta_proc *kdbus_meta_proc_ref(struct kdbus_meta_proc *mp)
-+{
-+ if (mp)
-+ kref_get(&mp->kref);
-+ return mp;
-+}
-+
-+/**
-+ * kdbus_meta_proc_unref() - Drop reference
-+ * @mp: Process metadata object
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_meta_proc *kdbus_meta_proc_unref(struct kdbus_meta_proc *mp)
-+{
-+ if (mp)
-+ kref_put(&mp->kref, kdbus_meta_proc_free);
-+ return NULL;
-+}
-+
-+static void kdbus_meta_proc_collect_pids(struct kdbus_meta_proc *mp)
-+{
-+ struct task_struct *parent;
-+
-+ mp->pid = get_pid(task_pid(current));
-+ mp->tgid = get_pid(task_tgid(current));
-+
-+ rcu_read_lock();
-+ parent = rcu_dereference(current->real_parent);
-+ mp->ppid = get_pid(task_tgid(parent));
-+ rcu_read_unlock();
-+
-+ mp->valid |= KDBUS_ATTACH_PIDS;
-+}
-+
-+static void kdbus_meta_proc_collect_tid_comm(struct kdbus_meta_proc *mp)
-+{
-+ get_task_comm(mp->tid_comm, current);
-+ mp->valid |= KDBUS_ATTACH_TID_COMM;
-+}
-+
-+static void kdbus_meta_proc_collect_pid_comm(struct kdbus_meta_proc *mp)
-+{
-+ get_task_comm(mp->pid_comm, current->group_leader);
-+ mp->valid |= KDBUS_ATTACH_PID_COMM;
-+}
-+
-+static void kdbus_meta_proc_collect_exe(struct kdbus_meta_proc *mp)
-+{
-+ struct file *exe_file;
-+
-+ rcu_read_lock();
-+ exe_file = rcu_dereference(current->mm->exe_file);
-+ if (exe_file) {
-+ mp->exe_path = exe_file->f_path;
-+ path_get(&mp->exe_path);
-+ get_fs_root(current->fs, &mp->root_path);
-+ mp->valid |= KDBUS_ATTACH_EXE;
-+ }
-+ rcu_read_unlock();
-+}
-+
-+static int kdbus_meta_proc_collect_cmdline(struct kdbus_meta_proc *mp)
-+{
-+ struct mm_struct *mm = current->mm;
-+ char *cmdline;
-+
-+ if (!mm->arg_end)
-+ return 0;
-+
-+ cmdline = strndup_user((const char __user *)mm->arg_start,
-+ mm->arg_end - mm->arg_start);
-+ if (IS_ERR(cmdline))
-+ return PTR_ERR(cmdline);
-+
-+ mp->cmdline = cmdline;
-+ mp->valid |= KDBUS_ATTACH_CMDLINE;
-+
-+ return 0;
-+}
-+
-+static int kdbus_meta_proc_collect_cgroup(struct kdbus_meta_proc *mp)
-+{
-+#ifdef CONFIG_CGROUPS
-+ void *page;
-+ char *s;
-+
-+ page = (void *)__get_free_page(GFP_TEMPORARY);
-+ if (!page)
-+ return -ENOMEM;
-+
-+ s = task_cgroup_path(current, page, PAGE_SIZE);
-+ if (s) {
-+ mp->cgroup = kstrdup(s, GFP_KERNEL);
-+ if (!mp->cgroup) {
-+ free_page((unsigned long)page);
-+ return -ENOMEM;
-+ }
-+ }
-+
-+ free_page((unsigned long)page);
-+ mp->valid |= KDBUS_ATTACH_CGROUP;
-+#endif
-+
-+ return 0;
-+}
-+
-+static int kdbus_meta_proc_collect_seclabel(struct kdbus_meta_proc *mp)
-+{
-+#ifdef CONFIG_SECURITY
-+ char *ctx = NULL;
-+ u32 sid, len;
-+ int ret;
-+
-+ security_task_getsecid(current, &sid);
-+ ret = security_secid_to_secctx(sid, &ctx, &len);
-+ if (ret < 0) {
-+ /*
-+ * EOPNOTSUPP means no security module is active,
-+ * let's skip adding the seclabel then. This effectively
-+ * drops the SECLABEL item.
-+ */
-+ return (ret == -EOPNOTSUPP) ? 0 : ret;
-+ }
-+
-+ mp->seclabel = kstrdup(ctx, GFP_KERNEL);
-+ security_release_secctx(ctx, len);
-+ if (!mp->seclabel)
-+ return -ENOMEM;
-+
-+ mp->valid |= KDBUS_ATTACH_SECLABEL;
-+#endif
-+
-+ return 0;
-+}
-+
-+static void kdbus_meta_proc_collect_audit(struct kdbus_meta_proc *mp)
-+{
-+#ifdef CONFIG_AUDITSYSCALL
-+ mp->audit_loginuid = audit_get_loginuid(current);
-+ mp->audit_sessionid = audit_get_sessionid(current);
-+ mp->valid |= KDBUS_ATTACH_AUDIT;
-+#endif
-+}
-+
-+/**
-+ * kdbus_meta_proc_collect() - Collect process metadata
-+ * @mp: Process metadata object
-+ * @what: Attach flags to collect
-+ *
-+ * This collects process metadata from current and saves it in @mp.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_meta_proc_collect(struct kdbus_meta_proc *mp, u64 what)
-+{
-+ int ret;
-+
-+ if (!mp || !(what & (KDBUS_ATTACH_CREDS |
-+ KDBUS_ATTACH_PIDS |
-+ KDBUS_ATTACH_AUXGROUPS |
-+ KDBUS_ATTACH_TID_COMM |
-+ KDBUS_ATTACH_PID_COMM |
-+ KDBUS_ATTACH_EXE |
-+ KDBUS_ATTACH_CMDLINE |
-+ KDBUS_ATTACH_CGROUP |
-+ KDBUS_ATTACH_CAPS |
-+ KDBUS_ATTACH_SECLABEL |
-+ KDBUS_ATTACH_AUDIT)))
-+ return 0;
-+
-+ mutex_lock(&mp->lock);
-+
-+ /* creds, auxgrps and caps share "struct cred" as context */
-+ {
-+ const u64 m_cred = KDBUS_ATTACH_CREDS |
-+ KDBUS_ATTACH_AUXGROUPS |
-+ KDBUS_ATTACH_CAPS;
-+
-+ if ((what & m_cred) && !(mp->collected & m_cred)) {
-+ mp->cred = get_current_cred();
-+ mp->valid |= m_cred;
-+ mp->collected |= m_cred;
-+ }
-+ }
-+
-+ if ((what & KDBUS_ATTACH_PIDS) &&
-+ !(mp->collected & KDBUS_ATTACH_PIDS)) {
-+ kdbus_meta_proc_collect_pids(mp);
-+ mp->collected |= KDBUS_ATTACH_PIDS;
-+ }
-+
-+ if ((what & KDBUS_ATTACH_TID_COMM) &&
-+ !(mp->collected & KDBUS_ATTACH_TID_COMM)) {
-+ kdbus_meta_proc_collect_tid_comm(mp);
-+ mp->collected |= KDBUS_ATTACH_TID_COMM;
-+ }
-+
-+ if ((what & KDBUS_ATTACH_PID_COMM) &&
-+ !(mp->collected & KDBUS_ATTACH_PID_COMM)) {
-+ kdbus_meta_proc_collect_pid_comm(mp);
-+ mp->collected |= KDBUS_ATTACH_PID_COMM;
-+ }
-+
-+ if ((what & KDBUS_ATTACH_EXE) &&
-+ !(mp->collected & KDBUS_ATTACH_EXE)) {
-+ kdbus_meta_proc_collect_exe(mp);
-+ mp->collected |= KDBUS_ATTACH_EXE;
-+ }
-+
-+ if ((what & KDBUS_ATTACH_CMDLINE) &&
-+ !(mp->collected & KDBUS_ATTACH_CMDLINE)) {
-+ ret = kdbus_meta_proc_collect_cmdline(mp);
-+ if (ret < 0)
-+ goto exit_unlock;
-+ mp->collected |= KDBUS_ATTACH_CMDLINE;
-+ }
-+
-+ if ((what & KDBUS_ATTACH_CGROUP) &&
-+ !(mp->collected & KDBUS_ATTACH_CGROUP)) {
-+ ret = kdbus_meta_proc_collect_cgroup(mp);
-+ if (ret < 0)
-+ goto exit_unlock;
-+ mp->collected |= KDBUS_ATTACH_CGROUP;
-+ }
-+
-+ if ((what & KDBUS_ATTACH_SECLABEL) &&
-+ !(mp->collected & KDBUS_ATTACH_SECLABEL)) {
-+ ret = kdbus_meta_proc_collect_seclabel(mp);
-+ if (ret < 0)
-+ goto exit_unlock;
-+ mp->collected |= KDBUS_ATTACH_SECLABEL;
-+ }
-+
-+ if ((what & KDBUS_ATTACH_AUDIT) &&
-+ !(mp->collected & KDBUS_ATTACH_AUDIT)) {
-+ kdbus_meta_proc_collect_audit(mp);
-+ mp->collected |= KDBUS_ATTACH_AUDIT;
-+ }
-+
-+ ret = 0;
-+
-+exit_unlock:
-+ mutex_unlock(&mp->lock);
-+ return ret;
-+}
-+
-+/**
-+ * kdbus_meta_fake_new() - Create fake metadata object
-+ *
-+ * Return: Pointer to new object on success, ERR_PTR on failure.
-+ */
-+struct kdbus_meta_fake *kdbus_meta_fake_new(void)
-+{
-+ struct kdbus_meta_fake *mf;
-+
-+ mf = kzalloc(sizeof(*mf), GFP_KERNEL);
-+ if (!mf)
-+ return ERR_PTR(-ENOMEM);
-+
-+ return mf;
-+}
-+
-+/**
-+ * kdbus_meta_fake_free() - Free fake metadata object
-+ * @mf: Fake metadata object
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_meta_fake *kdbus_meta_fake_free(struct kdbus_meta_fake *mf)
-+{
-+ if (mf) {
-+ put_pid(mf->ppid);
-+ put_pid(mf->tgid);
-+ put_pid(mf->pid);
-+ kfree(mf->seclabel);
-+ kfree(mf);
-+ }
-+
-+ return NULL;
-+}
-+
-+/**
-+ * kdbus_meta_fake_collect() - Fill fake metadata from faked credentials
-+ * @mf: Fake metadata object
-+ * @creds: Creds to set, may be %NULL
-+ * @pids: PIDs to set, may be %NULL
-+ * @seclabel: Seclabel to set, may be %NULL
-+ *
-+ * This function takes information stored in @creds, @pids and @seclabel and
-+ * resolves them to kernel-representations, if possible. This call uses the
-+ * current task's namespaces to resolve the given information.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_meta_fake_collect(struct kdbus_meta_fake *mf,
-+ const struct kdbus_creds *creds,
-+ const struct kdbus_pids *pids,
-+ const char *seclabel)
-+{
-+ if (mf->valid)
-+ return -EALREADY;
-+
-+ if (creds) {
-+ struct user_namespace *ns = current_user_ns();
-+
-+ mf->uid = make_kuid(ns, creds->uid);
-+ mf->euid = make_kuid(ns, creds->euid);
-+ mf->suid = make_kuid(ns, creds->suid);
-+ mf->fsuid = make_kuid(ns, creds->fsuid);
-+
-+ mf->gid = make_kgid(ns, creds->gid);
-+ mf->egid = make_kgid(ns, creds->egid);
-+ mf->sgid = make_kgid(ns, creds->sgid);
-+ mf->fsgid = make_kgid(ns, creds->fsgid);
-+
-+ if ((creds->uid != (uid_t)-1 && !uid_valid(mf->uid)) ||
-+ (creds->euid != (uid_t)-1 && !uid_valid(mf->euid)) ||
-+ (creds->suid != (uid_t)-1 && !uid_valid(mf->suid)) ||
-+ (creds->fsuid != (uid_t)-1 && !uid_valid(mf->fsuid)) ||
-+ (creds->gid != (gid_t)-1 && !gid_valid(mf->gid)) ||
-+ (creds->egid != (gid_t)-1 && !gid_valid(mf->egid)) ||
-+ (creds->sgid != (gid_t)-1 && !gid_valid(mf->sgid)) ||
-+ (creds->fsgid != (gid_t)-1 && !gid_valid(mf->fsgid)))
-+ return -EINVAL;
-+
-+ mf->valid |= KDBUS_ATTACH_CREDS;
-+ }
-+
-+ if (pids) {
-+ mf->pid = get_pid(find_vpid(pids->tid));
-+ mf->tgid = get_pid(find_vpid(pids->pid));
-+ mf->ppid = get_pid(find_vpid(pids->ppid));
-+
-+ if ((pids->tid != 0 && !mf->pid) ||
-+ (pids->pid != 0 && !mf->tgid) ||
-+ (pids->ppid != 0 && !mf->ppid)) {
-+ put_pid(mf->pid);
-+ put_pid(mf->tgid);
-+ put_pid(mf->ppid);
-+ mf->pid = NULL;
-+ mf->tgid = NULL;
-+ mf->ppid = NULL;
-+ return -EINVAL;
-+ }
-+
-+ mf->valid |= KDBUS_ATTACH_PIDS;
-+ }
-+
-+ if (seclabel) {
-+ mf->seclabel = kstrdup(seclabel, GFP_KERNEL);
-+ if (!mf->seclabel)
-+ return -ENOMEM;
-+
-+ mf->valid |= KDBUS_ATTACH_SECLABEL;
-+ }
-+
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_meta_conn_new() - Create connection metadata object
-+ *
-+ * Return: Pointer to new object on success, ERR_PTR on failure.
-+ */
-+struct kdbus_meta_conn *kdbus_meta_conn_new(void)
-+{
-+ struct kdbus_meta_conn *mc;
-+
-+ mc = kzalloc(sizeof(*mc), GFP_KERNEL);
-+ if (!mc)
-+ return ERR_PTR(-ENOMEM);
-+
-+ kref_init(&mc->kref);
-+ mutex_init(&mc->lock);
-+
-+ return mc;
-+}
-+
-+static void kdbus_meta_conn_free(struct kref *kref)
-+{
-+ struct kdbus_meta_conn *mc =
-+ container_of(kref, struct kdbus_meta_conn, kref);
-+
-+ kfree(mc->conn_description);
-+ kfree(mc->owned_names_items);
-+ kfree(mc);
-+}
-+
-+/**
-+ * kdbus_meta_conn_ref() - Gain reference
-+ * @mc: Connection metadata object
-+ */
-+struct kdbus_meta_conn *kdbus_meta_conn_ref(struct kdbus_meta_conn *mc)
-+{
-+ if (mc)
-+ kref_get(&mc->kref);
-+ return mc;
-+}
-+
-+/**
-+ * kdbus_meta_conn_unref() - Drop reference
-+ * @mc: Connection metadata object
-+ */
-+struct kdbus_meta_conn *kdbus_meta_conn_unref(struct kdbus_meta_conn *mc)
-+{
-+ if (mc)
-+ kref_put(&mc->kref, kdbus_meta_conn_free);
-+ return NULL;
-+}
-+
-+static void kdbus_meta_conn_collect_timestamp(struct kdbus_meta_conn *mc,
-+ u64 msg_seqnum)
-+{
-+ mc->ts.monotonic_ns = ktime_get_ns();
-+ mc->ts.realtime_ns = ktime_get_real_ns();
-+
-+ if (msg_seqnum)
-+ mc->ts.seqnum = msg_seqnum;
-+
-+ mc->valid |= KDBUS_ATTACH_TIMESTAMP;
-+}
-+
-+static int kdbus_meta_conn_collect_names(struct kdbus_meta_conn *mc,
-+ struct kdbus_conn *conn)
-+{
-+ const struct kdbus_name_owner *owner;
-+ struct kdbus_item *item;
-+ size_t slen, size;
-+
-+ lockdep_assert_held(&conn->ep->bus->name_registry->rwlock);
-+
-+ size = 0;
-+ /* open-code length calculation to avoid final padding */
-+ list_for_each_entry(owner, &conn->names_list, conn_entry)
-+ if (!(owner->flags & KDBUS_NAME_IN_QUEUE))
-+ size = KDBUS_ALIGN8(size) + KDBUS_ITEM_HEADER_SIZE +
-+ sizeof(struct kdbus_name) +
-+ strlen(owner->name->name) + 1;
-+
-+ if (!size)
-+ return 0;
-+
-+ /* make sure we include zeroed padding for convenience helpers */
-+ item = kmalloc(KDBUS_ALIGN8(size), GFP_KERNEL);
-+ if (!item)
-+ return -ENOMEM;
-+
-+ mc->owned_names_items = item;
-+ mc->owned_names_size = size;
-+
-+ list_for_each_entry(owner, &conn->names_list, conn_entry) {
-+ if (owner->flags & KDBUS_NAME_IN_QUEUE)
-+ continue;
-+
-+ slen = strlen(owner->name->name) + 1;
-+ kdbus_item_set(item, KDBUS_ITEM_OWNED_NAME, NULL,
-+ sizeof(struct kdbus_name) + slen);
-+ item->name.flags = owner->flags;
-+ memcpy(item->name.name, owner->name->name, slen);
-+ item = KDBUS_ITEM_NEXT(item);
-+ }
-+
-+ /* sanity check: the buffer should be completely written now */
-+ WARN_ON((u8 *)item !=
-+ (u8 *)mc->owned_names_items + KDBUS_ALIGN8(size));
-+
-+ mc->valid |= KDBUS_ATTACH_NAMES;
-+ return 0;
-+}
-+
-+static int kdbus_meta_conn_collect_description(struct kdbus_meta_conn *mc,
-+ struct kdbus_conn *conn)
-+{
-+ if (!conn->description)
-+ return 0;
-+
-+ mc->conn_description = kstrdup(conn->description, GFP_KERNEL);
-+ if (!mc->conn_description)
-+ return -ENOMEM;
-+
-+ mc->valid |= KDBUS_ATTACH_CONN_DESCRIPTION;
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_meta_conn_collect() - Collect connection metadata
-+ * @mc: Connection metadata object
-+ * @conn: Connection to collect data from
-+ * @msg_seqnum: Sequence number of the message to send
-+ * @what: Attach flags to collect
-+ *
-+ * This collects connection metadata from @msg_seqnum and @conn and saves it
-+ * in @mc.
-+ *
-+ * If KDBUS_ATTACH_NAMES is set in @what and @conn is non-NULL, the caller must
-+ * hold the name-registry read-lock of conn->ep->bus->name_registry.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_meta_conn_collect(struct kdbus_meta_conn *mc,
-+ struct kdbus_conn *conn,
-+ u64 msg_seqnum, u64 what)
-+{
-+ int ret;
-+
-+ if (!mc || !(what & (KDBUS_ATTACH_TIMESTAMP |
-+ KDBUS_ATTACH_NAMES |
-+ KDBUS_ATTACH_CONN_DESCRIPTION)))
-+ return 0;
-+
-+ mutex_lock(&mc->lock);
-+
-+ if (msg_seqnum && (what & KDBUS_ATTACH_TIMESTAMP) &&
-+ !(mc->collected & KDBUS_ATTACH_TIMESTAMP)) {
-+ kdbus_meta_conn_collect_timestamp(mc, msg_seqnum);
-+ mc->collected |= KDBUS_ATTACH_TIMESTAMP;
-+ }
-+
-+ if (conn && (what & KDBUS_ATTACH_NAMES) &&
-+ !(mc->collected & KDBUS_ATTACH_NAMES)) {
-+ ret = kdbus_meta_conn_collect_names(mc, conn);
-+ if (ret < 0)
-+ goto exit_unlock;
-+ mc->collected |= KDBUS_ATTACH_NAMES;
-+ }
-+
-+ if (conn && (what & KDBUS_ATTACH_CONN_DESCRIPTION) &&
-+ !(mc->collected & KDBUS_ATTACH_CONN_DESCRIPTION)) {
-+ ret = kdbus_meta_conn_collect_description(mc, conn);
-+ if (ret < 0)
-+ goto exit_unlock;
-+ mc->collected |= KDBUS_ATTACH_CONN_DESCRIPTION;
-+ }
-+
-+ ret = 0;
-+
-+exit_unlock:
-+ mutex_unlock(&mc->lock);
-+ return ret;
-+}
-+
-+static void kdbus_meta_export_caps(struct kdbus_meta_caps *out,
-+ const struct kdbus_meta_proc *mp,
-+ struct user_namespace *user_ns)
-+{
-+ struct user_namespace *iter;
-+ const struct cred *cred = mp->cred;
-+ bool parent = false, owner = false;
-+ int i;
-+
-+ /*
-+ * This translates the effective capabilities of 'cred' into the given
-+ * user-namespace. If the given user-namespace is a child-namespace of
-+ * the user-namespace of 'cred', the mask can be copied verbatim. If
-+ * not, the mask is cleared.
-+ * There's one exception: If 'cred' is the owner of any user-namespace
-+ * in the path between the given user-namespace and the user-namespace
-+ * of 'cred', then it has all effective capabilities set. This means,
-+ * the user who created a user-namespace always has all effective
-+ * capabilities in any child namespaces. Note that this is based on the
-+ * uid of the namespace creator, not the task hierarchy.
-+ */
-+ for (iter = user_ns; iter; iter = iter->parent) {
-+ if (iter == cred->user_ns) {
-+ parent = true;
-+ break;
-+ }
-+
-+ if (iter == &init_user_ns)
-+ break;
-+
-+ if ((iter->parent == cred->user_ns) &&
-+ uid_eq(iter->owner, cred->euid)) {
-+ owner = true;
-+ break;
-+ }
-+ }
-+
-+ out->last_cap = CAP_LAST_CAP;
-+
-+ CAP_FOR_EACH_U32(i) {
-+ if (parent) {
-+ out->set[0].caps[i] = cred->cap_inheritable.cap[i];
-+ out->set[1].caps[i] = cred->cap_permitted.cap[i];
-+ out->set[2].caps[i] = cred->cap_effective.cap[i];
-+ out->set[3].caps[i] = cred->cap_bset.cap[i];
-+ } else if (owner) {
-+ out->set[0].caps[i] = 0U;
-+ out->set[1].caps[i] = ~0U;
-+ out->set[2].caps[i] = ~0U;
-+ out->set[3].caps[i] = ~0U;
-+ } else {
-+ out->set[0].caps[i] = 0U;
-+ out->set[1].caps[i] = 0U;
-+ out->set[2].caps[i] = 0U;
-+ out->set[3].caps[i] = 0U;
-+ }
-+ }
-+
-+ /* clear unused bits */
-+ for (i = 0; i < 4; i++)
-+ out->set[i].caps[CAP_TO_INDEX(CAP_LAST_CAP)] &=
-+ CAP_LAST_U32_VALID_MASK;
-+}
-+
-+/* This is equivalent to from_kuid_munged(), but maps INVALID_UID to itself */
-+static uid_t kdbus_from_kuid_keep(struct user_namespace *ns, kuid_t uid)
-+{
-+ return uid_valid(uid) ? from_kuid_munged(ns, uid) : ((uid_t)-1);
-+}
-+
-+/* This is equivalent to from_kgid_munged(), but maps INVALID_GID to itself */
-+static gid_t kdbus_from_kgid_keep(struct user_namespace *ns, kgid_t gid)
-+{
-+ return gid_valid(gid) ? from_kgid_munged(ns, gid) : ((gid_t)-1);
-+}
-+
-+struct kdbus_meta_staging {
-+ const struct kdbus_meta_proc *mp;
-+ const struct kdbus_meta_fake *mf;
-+ const struct kdbus_meta_conn *mc;
-+ const struct kdbus_conn *conn;
-+ u64 mask;
-+
-+ void *exe;
-+ const char *exe_path;
-+};
-+
-+static size_t kdbus_meta_measure(struct kdbus_meta_staging *staging)
-+{
-+ const struct kdbus_meta_proc *mp = staging->mp;
-+ const struct kdbus_meta_fake *mf = staging->mf;
-+ const struct kdbus_meta_conn *mc = staging->mc;
-+ const u64 mask = staging->mask;
-+ size_t size = 0;
-+
-+ /* process metadata */
-+
-+ if (mf && (mask & KDBUS_ATTACH_CREDS))
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_creds));
-+ else if (mp && (mask & KDBUS_ATTACH_CREDS))
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_creds));
-+
-+ if (mf && (mask & KDBUS_ATTACH_PIDS))
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_pids));
-+ else if (mp && (mask & KDBUS_ATTACH_PIDS))
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_pids));
-+
-+ if (mp && (mask & KDBUS_ATTACH_AUXGROUPS))
-+ size += KDBUS_ITEM_SIZE(mp->cred->group_info->ngroups *
-+ sizeof(u64));
-+
-+ if (mp && (mask & KDBUS_ATTACH_TID_COMM))
-+ size += KDBUS_ITEM_SIZE(strlen(mp->tid_comm) + 1);
-+
-+ if (mp && (mask & KDBUS_ATTACH_PID_COMM))
-+ size += KDBUS_ITEM_SIZE(strlen(mp->pid_comm) + 1);
-+
-+ if (staging->exe_path && (mask & KDBUS_ATTACH_EXE))
-+ size += KDBUS_ITEM_SIZE(strlen(staging->exe_path) + 1);
-+
-+ if (mp && (mask & KDBUS_ATTACH_CMDLINE))
-+ size += KDBUS_ITEM_SIZE(strlen(mp->cmdline) + 1);
-+
-+ if (mp && (mask & KDBUS_ATTACH_CGROUP))
-+ size += KDBUS_ITEM_SIZE(strlen(mp->cgroup) + 1);
-+
-+ if (mp && (mask & KDBUS_ATTACH_CAPS))
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_meta_caps));
-+
-+ if (mf && (mask & KDBUS_ATTACH_SECLABEL))
-+ size += KDBUS_ITEM_SIZE(strlen(mf->seclabel) + 1);
-+ else if (mp && (mask & KDBUS_ATTACH_SECLABEL))
-+ size += KDBUS_ITEM_SIZE(strlen(mp->seclabel) + 1);
-+
-+ if (mp && (mask & KDBUS_ATTACH_AUDIT))
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_audit));
-+
-+ /* connection metadata */
-+
-+ if (mc && (mask & KDBUS_ATTACH_NAMES))
-+ size += KDBUS_ALIGN8(mc->owned_names_size);
-+
-+ if (mc && (mask & KDBUS_ATTACH_CONN_DESCRIPTION))
-+ size += KDBUS_ITEM_SIZE(strlen(mc->conn_description) + 1);
-+
-+ if (mc && (mask & KDBUS_ATTACH_TIMESTAMP))
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_timestamp));
-+
-+ return size;
-+}
-+
-+static struct kdbus_item *kdbus_write_head(struct kdbus_item **iter,
-+ u64 type, u64 size)
-+{
-+ struct kdbus_item *item = *iter;
-+ size_t padding;
-+
-+ item->type = type;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + size;
-+
-+ /* clear padding */
-+ padding = KDBUS_ALIGN8(item->size) - item->size;
-+ if (padding)
-+ memset(item->data + size, 0, padding);
-+
-+ *iter = KDBUS_ITEM_NEXT(item);
-+ return item;
-+}
-+
-+static struct kdbus_item *kdbus_write_full(struct kdbus_item **iter,
-+ u64 type, u64 size, const void *data)
-+{
-+ struct kdbus_item *item;
-+
-+ item = kdbus_write_head(iter, type, size);
-+ memcpy(item->data, data, size);
-+ return item;
-+}
-+
-+static size_t kdbus_meta_write(struct kdbus_meta_staging *staging, void *mem,
-+ size_t size)
-+{
-+ struct user_namespace *user_ns = staging->conn->cred->user_ns;
-+ struct pid_namespace *pid_ns = ns_of_pid(staging->conn->pid);
-+ struct kdbus_item *item = NULL, *items = mem;
-+ u8 *end, *owned_names_end = NULL;
-+
-+ /* process metadata */
-+
-+ if (staging->mf && (staging->mask & KDBUS_ATTACH_CREDS)) {
-+ const struct kdbus_meta_fake *mf = staging->mf;
-+
-+ item = kdbus_write_head(&items, KDBUS_ITEM_CREDS,
-+ sizeof(struct kdbus_creds));
-+ item->creds = (struct kdbus_creds){
-+ .uid = kdbus_from_kuid_keep(user_ns, mf->uid),
-+ .euid = kdbus_from_kuid_keep(user_ns, mf->euid),
-+ .suid = kdbus_from_kuid_keep(user_ns, mf->suid),
-+ .fsuid = kdbus_from_kuid_keep(user_ns, mf->fsuid),
-+ .gid = kdbus_from_kgid_keep(user_ns, mf->gid),
-+ .egid = kdbus_from_kgid_keep(user_ns, mf->egid),
-+ .sgid = kdbus_from_kgid_keep(user_ns, mf->sgid),
-+ .fsgid = kdbus_from_kgid_keep(user_ns, mf->fsgid),
-+ };
-+ } else if (staging->mp && (staging->mask & KDBUS_ATTACH_CREDS)) {
-+ const struct cred *c = staging->mp->cred;
-+
-+ item = kdbus_write_head(&items, KDBUS_ITEM_CREDS,
-+ sizeof(struct kdbus_creds));
-+ item->creds = (struct kdbus_creds){
-+ .uid = kdbus_from_kuid_keep(user_ns, c->uid),
-+ .euid = kdbus_from_kuid_keep(user_ns, c->euid),
-+ .suid = kdbus_from_kuid_keep(user_ns, c->suid),
-+ .fsuid = kdbus_from_kuid_keep(user_ns, c->fsuid),
-+ .gid = kdbus_from_kgid_keep(user_ns, c->gid),
-+ .egid = kdbus_from_kgid_keep(user_ns, c->egid),
-+ .sgid = kdbus_from_kgid_keep(user_ns, c->sgid),
-+ .fsgid = kdbus_from_kgid_keep(user_ns, c->fsgid),
-+ };
-+ }
-+
-+ if (staging->mf && (staging->mask & KDBUS_ATTACH_PIDS)) {
-+ item = kdbus_write_head(&items, KDBUS_ITEM_PIDS,
-+ sizeof(struct kdbus_pids));
-+ item->pids = (struct kdbus_pids){
-+ .pid = pid_nr_ns(staging->mf->tgid, pid_ns),
-+ .tid = pid_nr_ns(staging->mf->pid, pid_ns),
-+ .ppid = pid_nr_ns(staging->mf->ppid, pid_ns),
-+ };
-+ } else if (staging->mp && (staging->mask & KDBUS_ATTACH_PIDS)) {
-+ item = kdbus_write_head(&items, KDBUS_ITEM_PIDS,
-+ sizeof(struct kdbus_pids));
-+ item->pids = (struct kdbus_pids){
-+ .pid = pid_nr_ns(staging->mp->tgid, pid_ns),
-+ .tid = pid_nr_ns(staging->mp->pid, pid_ns),
-+ .ppid = pid_nr_ns(staging->mp->ppid, pid_ns),
-+ };
-+ }
-+
-+ if (staging->mp && (staging->mask & KDBUS_ATTACH_AUXGROUPS)) {
-+ const struct group_info *info = staging->mp->cred->group_info;
-+ size_t i;
-+
-+ item = kdbus_write_head(&items, KDBUS_ITEM_AUXGROUPS,
-+ info->ngroups * sizeof(u64));
-+ for (i = 0; i < info->ngroups; ++i)
-+ item->data64[i] = from_kgid_munged(user_ns,
-+ GROUP_AT(info, i));
-+ }
-+
-+ if (staging->mp && (staging->mask & KDBUS_ATTACH_TID_COMM))
-+ item = kdbus_write_full(&items, KDBUS_ITEM_TID_COMM,
-+ strlen(staging->mp->tid_comm) + 1,
-+ staging->mp->tid_comm);
-+
-+ if (staging->mp && (staging->mask & KDBUS_ATTACH_PID_COMM))
-+ item = kdbus_write_full(&items, KDBUS_ITEM_PID_COMM,
-+ strlen(staging->mp->pid_comm) + 1,
-+ staging->mp->pid_comm);
-+
-+ if (staging->exe_path && (staging->mask & KDBUS_ATTACH_EXE))
-+ item = kdbus_write_full(&items, KDBUS_ITEM_EXE,
-+ strlen(staging->exe_path) + 1,
-+ staging->exe_path);
-+
-+ if (staging->mp && (staging->mask & KDBUS_ATTACH_CMDLINE))
-+ item = kdbus_write_full(&items, KDBUS_ITEM_CMDLINE,
-+ strlen(staging->mp->cmdline) + 1,
-+ staging->mp->cmdline);
-+
-+ if (staging->mp && (staging->mask & KDBUS_ATTACH_CGROUP))
-+ item = kdbus_write_full(&items, KDBUS_ITEM_CGROUP,
-+ strlen(staging->mp->cgroup) + 1,
-+ staging->mp->cgroup);
-+
-+ if (staging->mp && (staging->mask & KDBUS_ATTACH_CAPS)) {
-+ item = kdbus_write_head(&items, KDBUS_ITEM_CAPS,
-+ sizeof(struct kdbus_meta_caps));
-+ kdbus_meta_export_caps((void*)&item->caps, staging->mp,
-+ user_ns);
-+ }
-+
-+ if (staging->mf && (staging->mask & KDBUS_ATTACH_SECLABEL))
-+ item = kdbus_write_full(&items, KDBUS_ITEM_SECLABEL,
-+ strlen(staging->mf->seclabel) + 1,
-+ staging->mf->seclabel);
-+ else if (staging->mp && (staging->mask & KDBUS_ATTACH_SECLABEL))
-+ item = kdbus_write_full(&items, KDBUS_ITEM_SECLABEL,
-+ strlen(staging->mp->seclabel) + 1,
-+ staging->mp->seclabel);
-+
-+ if (staging->mp && (staging->mask & KDBUS_ATTACH_AUDIT)) {
-+ item = kdbus_write_head(&items, KDBUS_ITEM_AUDIT,
-+ sizeof(struct kdbus_audit));
-+ item->audit = (struct kdbus_audit){
-+ .loginuid = from_kuid(user_ns,
-+ staging->mp->audit_loginuid),
-+ .sessionid = staging->mp->audit_sessionid,
-+ };
-+ }
-+
-+ /* connection metadata */
-+
-+ if (staging->mc && (staging->mask & KDBUS_ATTACH_NAMES)) {
-+ memcpy(items, staging->mc->owned_names_items,
-+ KDBUS_ALIGN8(staging->mc->owned_names_size));
-+ owned_names_end = (u8 *)items + staging->mc->owned_names_size;
-+ items = (void *)KDBUS_ALIGN8((unsigned long)owned_names_end);
-+ }
-+
-+ if (staging->mc && (staging->mask & KDBUS_ATTACH_CONN_DESCRIPTION))
-+ item = kdbus_write_full(&items, KDBUS_ITEM_CONN_DESCRIPTION,
-+ strlen(staging->mc->conn_description) + 1,
-+ staging->mc->conn_description);
-+
-+ if (staging->mc && (staging->mask & KDBUS_ATTACH_TIMESTAMP))
-+ item = kdbus_write_full(&items, KDBUS_ITEM_TIMESTAMP,
-+ sizeof(staging->mc->ts),
-+ &staging->mc->ts);
-+
-+ /*
-+ * Return real size (minus trailing padding). In case of 'owned_names'
-+ * we cannot deduce it from item->size, so treat it special.
-+ */
-+
-+ if (items == (void *)KDBUS_ALIGN8((unsigned long)owned_names_end))
-+ end = owned_names_end;
-+ else if (item)
-+ end = (u8 *)item + item->size;
-+ else
-+ end = mem;
-+
-+ WARN_ON((u8 *)items - (u8 *)mem != size);
-+ WARN_ON((void *)KDBUS_ALIGN8((unsigned long)end) != (void *)items);
-+
-+ return end - (u8 *)mem;
-+}
-+
-+int kdbus_meta_emit(struct kdbus_meta_proc *mp,
-+ struct kdbus_meta_fake *mf,
-+ struct kdbus_meta_conn *mc,
-+ struct kdbus_conn *conn,
-+ u64 mask,
-+ struct kdbus_item **out_items,
-+ size_t *out_size)
-+{
-+ struct kdbus_meta_staging staging = {};
-+ struct kdbus_item *items = NULL;
-+ size_t size = 0;
-+ int ret;
-+
-+ if (WARN_ON(mf && mp))
-+ mp = NULL;
-+
-+ staging.mp = mp;
-+ staging.mf = mf;
-+ staging.mc = mc;
-+ staging.conn = conn;
-+
-+ /* get mask of valid items */
-+ if (mf)
-+ staging.mask |= mf->valid;
-+ if (mp) {
-+ mutex_lock(&mp->lock);
-+ staging.mask |= mp->valid;
-+ mutex_unlock(&mp->lock);
-+ }
-+ if (mc) {
-+ mutex_lock(&mc->lock);
-+ staging.mask |= mc->valid;
-+ mutex_unlock(&mc->lock);
-+ }
-+
-+ staging.mask &= mask;
-+
-+ if (!staging.mask) { /* bail out if nothing to do */
-+ ret = 0;
-+ goto exit;
-+ }
-+
-+ /* EXE is special as it needs a temporary page to assemble */
-+ if (mp && (staging.mask & KDBUS_ATTACH_EXE)) {
-+ struct path p;
-+
-+ /*
-+ * XXX: We need access to __d_path() so we can write the path
-+ * relative to conn->root_path. Once upstream, we need
-+ * EXPORT_SYMBOL(__d_path) or an equivalent of d_path() that
-+ * takes the root path directly. Until then, we drop this item
-+ * if the root-paths differ.
-+ */
-+
-+ get_fs_root(current->fs, &p);
-+ if (path_equal(&p, &conn->root_path)) {
-+ staging.exe = (void *)__get_free_page(GFP_TEMPORARY);
-+ if (!staging.exe) {
-+ path_put(&p);
-+ ret = -ENOMEM;
-+ goto exit;
-+ }
-+
-+ staging.exe_path = d_path(&mp->exe_path, staging.exe,
-+ PAGE_SIZE);
-+ if (IS_ERR(staging.exe_path)) {
-+ path_put(&p);
-+ ret = PTR_ERR(staging.exe_path);
-+ goto exit;
-+ }
-+ }
-+ path_put(&p);
-+ }
-+
-+ size = kdbus_meta_measure(&staging);
-+ if (!size) { /* bail out if nothing to do */
-+ ret = 0;
-+ goto exit;
-+ }
-+
-+ items = kmalloc(size, GFP_KERNEL);
-+ if (!items) {
-+ ret = -ENOMEM;
-+ goto exit;
-+ }
-+
-+ size = kdbus_meta_write(&staging, items, size);
-+ if (!size) {
-+ kfree(items);
-+ items = NULL;
-+ }
-+
-+ ret = 0;
-+
-+exit:
-+ if (staging.exe)
-+ free_page((unsigned long)staging.exe);
-+ if (ret >= 0) {
-+ *out_items = items;
-+ *out_size = size;
-+ }
-+ return ret;
-+}
-+
-+enum {
-+ KDBUS_META_PROC_NONE,
-+ KDBUS_META_PROC_NORMAL,
-+};
-+
-+/**
-+ * kdbus_proc_permission() - check /proc permissions on target pid
-+ * @pid_ns: namespace we operate in
-+ * @cred: credentials of requestor
-+ * @target: target process
-+ *
-+ * This checks whether a process with credentials @cred can access information
-+ * of @target in the namespace @pid_ns. This tries to follow /proc permissions,
-+ * but is slightly more restrictive.
-+ *
-+ * Return: The /proc access level (KDBUS_META_PROC_*) is returned.
-+ */
-+static unsigned int kdbus_proc_permission(const struct pid_namespace *pid_ns,
-+ const struct cred *cred,
-+ struct pid *target)
-+{
-+ if (pid_ns->hide_pid < 1)
-+ return KDBUS_META_PROC_NORMAL;
-+
-+ /* XXX: we need groups_search() exported for aux-groups */
-+ if (gid_eq(cred->egid, pid_ns->pid_gid))
-+ return KDBUS_META_PROC_NORMAL;
-+
-+ /*
-+ * XXX: If ptrace_may_access(PTRACE_MODE_READ) is granted, you can
-+ * override hide_pid. However, ptrace_may_access() only supports
-+ * checking 'current', hence, we cannot use this here. But we
-+ * simply decide to not support this override, so no need to worry.
-+ */
-+
-+ return KDBUS_META_PROC_NONE;
-+}
-+
-+/**
-+ * kdbus_meta_proc_mask() - calculate which metadata would be visible to
-+ * a connection via /proc
-+ * @prv_pid: pid of metadata provider
-+ * @req_pid: pid of metadata requestor
-+ * @req_cred: credentials of metadata requestor
-+ * @wanted: metadata that is requested
-+ *
-+ * This checks which metadata items of @prv_pid can be read via /proc by the
-+ * requestor @req_pid.
-+ *
-+ * Return: Set of metadata flags the requestor can see (limited by @wanted).
-+ */
-+static u64 kdbus_meta_proc_mask(struct pid *prv_pid,
-+ struct pid *req_pid,
-+ const struct cred *req_cred,
-+ u64 wanted)
-+{
-+ struct pid_namespace *prv_ns, *req_ns;
-+ unsigned int proc;
-+
-+ prv_ns = ns_of_pid(prv_pid);
-+ req_ns = ns_of_pid(req_pid);
-+
-+ /*
-+ * If the sender is not visible in the receiver namespace, then the
-+ * receiver cannot access the sender via its own procfs. Hence, we do
-+ * not attach any additional metadata.
-+ */
-+ if (!pid_nr_ns(prv_pid, req_ns))
-+ return 0;
-+
-+ /*
-+ * If the pid-namespace of the receiver has hide_pid set, it cannot see
-+ * any process but its own. We shortcut this /proc permission check if
-+ * provider and requestor are the same. If not, we perform rather
-+ * expensive /proc permission checks.
-+ */
-+ if (prv_pid == req_pid)
-+ proc = KDBUS_META_PROC_NORMAL;
-+ else
-+ proc = kdbus_proc_permission(req_ns, req_cred, prv_pid);
-+
-+ /* you need /proc access to read standard process attributes */
-+ if (proc < KDBUS_META_PROC_NORMAL)
-+ wanted &= ~(KDBUS_ATTACH_TID_COMM |
-+ KDBUS_ATTACH_PID_COMM |
-+ KDBUS_ATTACH_SECLABEL |
-+ KDBUS_ATTACH_CMDLINE |
-+ KDBUS_ATTACH_CGROUP |
-+ KDBUS_ATTACH_AUDIT |
-+ KDBUS_ATTACH_CAPS |
-+ KDBUS_ATTACH_EXE);
-+
-+ /* clear all non-/proc flags */
-+ return wanted & (KDBUS_ATTACH_TID_COMM |
-+ KDBUS_ATTACH_PID_COMM |
-+ KDBUS_ATTACH_SECLABEL |
-+ KDBUS_ATTACH_CMDLINE |
-+ KDBUS_ATTACH_CGROUP |
-+ KDBUS_ATTACH_AUDIT |
-+ KDBUS_ATTACH_CAPS |
-+ KDBUS_ATTACH_EXE);
-+}
-+
-+/**
-+ * kdbus_meta_get_mask() - calculate attach flags mask for metadata request
-+ * @prv_pid: pid of metadata provider
-+ * @prv_mask: mask of metadata the provider grants unchecked
-+ * @req_pid: pid of metadata requestor
-+ * @req_cred: credentials of metadata requestor
-+ * @req_mask: mask of metadata that is requested
-+ *
-+ * This calculates the metadata items that the requestor @req_pid can access
-+ * from the metadata provider @prv_pid. This permission check consists of
-+ * several different parts:
-+ * - Providers can grant metadata items unchecked. Regardless of their type,
-+ * they're always granted to the requestor. This mask is passed as @prv_mask.
-+ * - Basic items (credentials and connection metadata) are granted implicitly
-+ * to everyone. They're publicly available to any bus-user that can see the
-+ * provider.
-+ * - Process credentials that are not granted implicitly follow the same
-+ * permission checks as /proc. This means, we always assume a requestor
-+ * process has access to their *own* /proc mount, if they have access to
-+ * kdbusfs.
-+ *
-+ * Return: Mask of metadata that is granted.
-+ */
-+static u64 kdbus_meta_get_mask(struct pid *prv_pid, u64 prv_mask,
-+ struct pid *req_pid,
-+ const struct cred *req_cred, u64 req_mask)
-+{
-+ u64 missing, impl_mask, proc_mask = 0;
-+
-+ /*
-+ * Connection metadata and basic unix process credentials are
-+ * transmitted implicitly, and cannot be suppressed. Both are required
-+ * to perform user-space policies on the receiver-side. Furthermore,
-+ * connection metadata is public state, anyway, and unix credentials
-+ * are needed for UDS-compatibility. We extend them slightly by
-+ * auxiliary groups and additional uids/gids/pids.
-+ */
-+ impl_mask = /* connection metadata */
-+ KDBUS_ATTACH_CONN_DESCRIPTION |
-+ KDBUS_ATTACH_TIMESTAMP |
-+ KDBUS_ATTACH_NAMES |
-+ /* credentials and pids */
-+ KDBUS_ATTACH_AUXGROUPS |
-+ KDBUS_ATTACH_CREDS |
-+ KDBUS_ATTACH_PIDS;
-+
-+ /*
-+ * Calculate the set of metadata that is not granted implicitly nor by
-+ * the sender, but still requested by the receiver. If any are left,
-+ * perform rather expensive /proc access checks for them.
-+ */
-+ missing = req_mask & ~((prv_mask | impl_mask) & req_mask);
-+ if (missing)
-+ proc_mask = kdbus_meta_proc_mask(prv_pid, req_pid, req_cred,
-+ missing);
-+
-+ return (prv_mask | impl_mask | proc_mask) & req_mask;
-+}
-+
-+/**
-+ */
-+u64 kdbus_meta_info_mask(const struct kdbus_conn *conn, u64 mask)
-+{
-+ return kdbus_meta_get_mask(conn->pid,
-+ atomic64_read(&conn->attach_flags_send),
-+ task_pid(current),
-+ current_cred(),
-+ mask);
-+}
-+
-+/**
-+ */
-+u64 kdbus_meta_msg_mask(const struct kdbus_conn *snd,
-+ const struct kdbus_conn *rcv)
-+{
-+ return kdbus_meta_get_mask(task_pid(current),
-+ atomic64_read(&snd->attach_flags_send),
-+ rcv->pid,
-+ rcv->cred,
-+ atomic64_read(&rcv->attach_flags_recv));
-+}
-diff --git a/ipc/kdbus/metadata.h b/ipc/kdbus/metadata.h
-new file mode 100644
-index 0000000..dba7cc7
---- /dev/null
-+++ b/ipc/kdbus/metadata.h
-@@ -0,0 +1,86 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_METADATA_H
-+#define __KDBUS_METADATA_H
-+
-+#include <linux/kernel.h>
-+
-+struct kdbus_conn;
-+struct kdbus_pool_slice;
-+
-+struct kdbus_meta_proc;
-+struct kdbus_meta_conn;
-+
-+/**
-+ * struct kdbus_meta_fake - Fake metadata
-+ * @valid: Bitmask of collected and valid items
-+ * @uid: UID of process
-+ * @euid: EUID of process
-+ * @suid: SUID of process
-+ * @fsuid: FSUID of process
-+ * @gid: GID of process
-+ * @egid: EGID of process
-+ * @sgid: SGID of process
-+ * @fsgid: FSGID of process
-+ * @pid: PID of process
-+ * @tgid: TGID of process
-+ * @ppid: PPID of process
-+ * @seclabel: Seclabel
-+ */
-+struct kdbus_meta_fake {
-+ u64 valid;
-+
-+ /* KDBUS_ITEM_CREDS */
-+ kuid_t uid, euid, suid, fsuid;
-+ kgid_t gid, egid, sgid, fsgid;
-+
-+ /* KDBUS_ITEM_PIDS */
-+ struct pid *pid, *tgid, *ppid;
-+
-+ /* KDBUS_ITEM_SECLABEL */
-+ char *seclabel;
-+};
-+
-+struct kdbus_meta_proc *kdbus_meta_proc_new(void);
-+struct kdbus_meta_proc *kdbus_meta_proc_ref(struct kdbus_meta_proc *mp);
-+struct kdbus_meta_proc *kdbus_meta_proc_unref(struct kdbus_meta_proc *mp);
-+int kdbus_meta_proc_collect(struct kdbus_meta_proc *mp, u64 what);
-+
-+struct kdbus_meta_fake *kdbus_meta_fake_new(void);
-+struct kdbus_meta_fake *kdbus_meta_fake_free(struct kdbus_meta_fake *mf);
-+int kdbus_meta_fake_collect(struct kdbus_meta_fake *mf,
-+ const struct kdbus_creds *creds,
-+ const struct kdbus_pids *pids,
-+ const char *seclabel);
-+
-+struct kdbus_meta_conn *kdbus_meta_conn_new(void);
-+struct kdbus_meta_conn *kdbus_meta_conn_ref(struct kdbus_meta_conn *mc);
-+struct kdbus_meta_conn *kdbus_meta_conn_unref(struct kdbus_meta_conn *mc);
-+int kdbus_meta_conn_collect(struct kdbus_meta_conn *mc,
-+ struct kdbus_conn *conn,
-+ u64 msg_seqnum, u64 what);
-+
-+int kdbus_meta_emit(struct kdbus_meta_proc *mp,
-+ struct kdbus_meta_fake *mf,
-+ struct kdbus_meta_conn *mc,
-+ struct kdbus_conn *conn,
-+ u64 mask,
-+ struct kdbus_item **out_items,
-+ size_t *out_size);
-+u64 kdbus_meta_info_mask(const struct kdbus_conn *conn, u64 mask);
-+u64 kdbus_meta_msg_mask(const struct kdbus_conn *snd,
-+ const struct kdbus_conn *rcv);
-+
-+#endif
-diff --git a/ipc/kdbus/names.c b/ipc/kdbus/names.c
-new file mode 100644
-index 0000000..bf44ca3
---- /dev/null
-+++ b/ipc/kdbus/names.c
-@@ -0,0 +1,854 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/ctype.h>
-+#include <linux/fs.h>
-+#include <linux/hash.h>
-+#include <linux/idr.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/rwsem.h>
-+#include <linux/sched.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <linux/uio.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "endpoint.h"
-+#include "handle.h"
-+#include "item.h"
-+#include "names.h"
-+#include "notify.h"
-+#include "policy.h"
-+
-+#define KDBUS_NAME_SAVED_MASK (KDBUS_NAME_ALLOW_REPLACEMENT | \
-+ KDBUS_NAME_QUEUE)
-+
-+static bool kdbus_name_owner_is_used(struct kdbus_name_owner *owner)
-+{
-+ return !list_empty(&owner->name_entry) ||
-+ owner == owner->name->activator;
-+}
-+
-+static struct kdbus_name_owner *
-+kdbus_name_owner_new(struct kdbus_conn *conn, struct kdbus_name_entry *name,
-+ u64 flags)
-+{
-+ struct kdbus_name_owner *owner;
-+
-+ kdbus_conn_assert_active(conn);
-+
-+ if (conn->name_count >= KDBUS_CONN_MAX_NAMES)
-+ return ERR_PTR(-E2BIG);
-+
-+ owner = kmalloc(sizeof(*owner), GFP_KERNEL);
-+ if (!owner)
-+ return ERR_PTR(-ENOMEM);
-+
-+ owner->flags = flags & KDBUS_NAME_SAVED_MASK;
-+ owner->conn = conn;
-+ owner->name = name;
-+ list_add_tail(&owner->conn_entry, &conn->names_list);
-+ INIT_LIST_HEAD(&owner->name_entry);
-+
-+ ++conn->name_count;
-+ return owner;
-+}
-+
-+static void kdbus_name_owner_free(struct kdbus_name_owner *owner)
-+{
-+ if (!owner)
-+ return;
-+
-+ WARN_ON(kdbus_name_owner_is_used(owner));
-+ --owner->conn->name_count;
-+ list_del(&owner->conn_entry);
-+ kfree(owner);
-+}
-+
-+static struct kdbus_name_owner *
-+kdbus_name_owner_find(struct kdbus_name_entry *name, struct kdbus_conn *conn)
-+{
-+ struct kdbus_name_owner *owner;
-+
-+ /*
-+ * Use conn->names_list over name->queue to make sure boundaries of
-+ * this linear search are controlled by the connection itself.
-+ * Furthermore, this will find normal owners as well as activators
-+ * without any additional code.
-+ */
-+ list_for_each_entry(owner, &conn->names_list, conn_entry)
-+ if (owner->name == name)
-+ return owner;
-+
-+ return NULL;
-+}
-+
-+static bool kdbus_name_entry_is_used(struct kdbus_name_entry *name)
-+{
-+ return !list_empty(&name->queue) || name->activator;
-+}
-+
-+static struct kdbus_name_owner *
-+kdbus_name_entry_first(struct kdbus_name_entry *name)
-+{
-+ return list_first_entry_or_null(&name->queue, struct kdbus_name_owner,
-+ name_entry);
-+}
-+
-+static struct kdbus_name_entry *
-+kdbus_name_entry_new(struct kdbus_name_registry *r, u32 hash,
-+ const char *name_str)
-+{
-+ struct kdbus_name_entry *name;
-+ size_t namelen;
-+
-+ lockdep_assert_held(&r->rwlock);
-+
-+ namelen = strlen(name_str);
-+
-+ name = kmalloc(sizeof(*name) + namelen + 1, GFP_KERNEL);
-+ if (!name)
-+ return ERR_PTR(-ENOMEM);
-+
-+ name->name_id = ++r->name_seq_last;
-+ name->activator = NULL;
-+ INIT_LIST_HEAD(&name->queue);
-+ hash_add(r->entries_hash, &name->hentry, hash);
-+ memcpy(name->name, name_str, namelen + 1);
-+
-+ return name;
-+}
-+
-+static void kdbus_name_entry_free(struct kdbus_name_entry *name)
-+{
-+ if (!name)
-+ return;
-+
-+ WARN_ON(kdbus_name_entry_is_used(name));
-+ hash_del(&name->hentry);
-+ kfree(name);
-+}
-+
-+static struct kdbus_name_entry *
-+kdbus_name_entry_find(struct kdbus_name_registry *r, u32 hash,
-+ const char *name_str)
-+{
-+ struct kdbus_name_entry *name;
-+
-+ lockdep_assert_held(&r->rwlock);
-+
-+ hash_for_each_possible(r->entries_hash, name, hentry, hash)
-+ if (!strcmp(name->name, name_str))
-+ return name;
-+
-+ return NULL;
-+}
-+
-+/**
-+ * kdbus_name_registry_new() - create a new name registry
-+ *
-+ * Return: a new kdbus_name_registry on success, ERR_PTR on failure.
-+ */
-+struct kdbus_name_registry *kdbus_name_registry_new(void)
-+{
-+ struct kdbus_name_registry *r;
-+
-+ r = kmalloc(sizeof(*r), GFP_KERNEL);
-+ if (!r)
-+ return ERR_PTR(-ENOMEM);
-+
-+ hash_init(r->entries_hash);
-+ init_rwsem(&r->rwlock);
-+ r->name_seq_last = 0;
-+
-+ return r;
-+}
-+
-+/**
-+ * kdbus_name_registry_free() - free name registry
-+ * @r: name registry to free, or NULL
-+ *
-+ * Free a name registry and cleanup all internal objects. This is a no-op if
-+ * you pass NULL as registry.
-+ */
-+void kdbus_name_registry_free(struct kdbus_name_registry *r)
-+{
-+ if (!r)
-+ return;
-+
-+ WARN_ON(!hash_empty(r->entries_hash));
-+ kfree(r);
-+}
-+
-+/**
-+ * kdbus_name_lookup_unlocked() - lookup name in registry
-+ * @reg: name registry
-+ * @name: name to lookup
-+ *
-+ * This looks up @name in the given name-registry and returns the
-+ * kdbus_name_entry object. The caller must hold the registry-lock and must not
-+ * access the returned object after releasing the lock.
-+ *
-+ * Return: Pointer to name-entry, or NULL if not found.
-+ */
-+struct kdbus_name_entry *
-+kdbus_name_lookup_unlocked(struct kdbus_name_registry *reg, const char *name)
-+{
-+ return kdbus_name_entry_find(reg, kdbus_strhash(name), name);
-+}
-+
-+static int kdbus_name_become_activator(struct kdbus_name_owner *owner,
-+ u64 *return_flags)
-+{
-+ if (kdbus_name_owner_is_used(owner))
-+ return -EALREADY;
-+ if (owner->name->activator)
-+ return -EEXIST;
-+
-+ owner->name->activator = owner;
-+ owner->flags |= KDBUS_NAME_ACTIVATOR;
-+
-+ if (kdbus_name_entry_first(owner->name)) {
-+ owner->flags |= KDBUS_NAME_IN_QUEUE;
-+ } else {
-+ owner->flags |= KDBUS_NAME_PRIMARY;
-+ kdbus_notify_name_change(owner->conn->ep->bus,
-+ KDBUS_ITEM_NAME_ADD,
-+ 0, owner->conn->id,
-+ 0, owner->flags,
-+ owner->name->name);
-+ }
-+
-+ if (return_flags)
-+ *return_flags = owner->flags | KDBUS_NAME_ACQUIRED;
-+
-+ return 0;
-+}
-+
-+static int kdbus_name_update(struct kdbus_name_owner *owner, u64 flags,
-+ u64 *return_flags)
-+{
-+ struct kdbus_name_owner *primary, *activator;
-+ struct kdbus_name_entry *name;
-+ struct kdbus_bus *bus;
-+ u64 nflags = 0;
-+ int ret = 0;
-+
-+ name = owner->name;
-+ bus = owner->conn->ep->bus;
-+ primary = kdbus_name_entry_first(name);
-+ activator = name->activator;
-+
-+ /* cannot be activator and acquire a name */
-+ if (owner == activator)
-+ return -EUCLEAN;
-+
-+ /* update saved flags */
-+ owner->flags = flags & KDBUS_NAME_SAVED_MASK;
-+
-+ if (!primary) {
-+ /*
-+ * No primary owner (but maybe an activator). Take over the
-+ * name.
-+ */
-+
-+ list_add(&owner->name_entry, &name->queue);
-+ owner->flags |= KDBUS_NAME_PRIMARY;
-+ nflags |= KDBUS_NAME_ACQUIRED;
-+
-+ /* move messages to new owner on activation */
-+ if (activator) {
-+ kdbus_conn_move_messages(owner->conn, activator->conn,
-+ name->name_id);
-+ kdbus_notify_name_change(bus, KDBUS_ITEM_NAME_CHANGE,
-+ activator->conn->id, owner->conn->id,
-+ activator->flags, owner->flags,
-+ name->name);
-+ activator->flags &= ~KDBUS_NAME_PRIMARY;
-+ activator->flags |= KDBUS_NAME_IN_QUEUE;
-+ } else {
-+ kdbus_notify_name_change(bus, KDBUS_ITEM_NAME_ADD,
-+ 0, owner->conn->id,
-+ 0, owner->flags,
-+ name->name);
-+ }
-+
-+ } else if (owner == primary) {
-+ /*
-+ * Already the primary owner of the name, flags were already
-+ * updated. Nothing to do.
-+ */
-+
-+ owner->flags |= KDBUS_NAME_PRIMARY;
-+
-+ } else if ((primary->flags & KDBUS_NAME_ALLOW_REPLACEMENT) &&
-+ (flags & KDBUS_NAME_REPLACE_EXISTING)) {
-+ /*
-+ * We're not the primary owner but can replace it. Move us
-+ * ahead of the primary owner and acquire the name (possibly
-+ * skipping queued owners ahead of us).
-+ */
-+
-+ list_del_init(&owner->name_entry);
-+ list_add(&owner->name_entry, &name->queue);
-+ owner->flags |= KDBUS_NAME_PRIMARY;
-+ nflags |= KDBUS_NAME_ACQUIRED;
-+
-+ kdbus_notify_name_change(bus, KDBUS_ITEM_NAME_CHANGE,
-+ primary->conn->id, owner->conn->id,
-+ primary->flags, owner->flags,
-+ name->name);
-+
-+ /* requeue old primary, or drop if queueing not wanted */
-+ if (primary->flags & KDBUS_NAME_QUEUE) {
-+ primary->flags &= ~KDBUS_NAME_PRIMARY;
-+ primary->flags |= KDBUS_NAME_IN_QUEUE;
-+ } else {
-+ list_del_init(&primary->name_entry);
-+ kdbus_name_owner_free(primary);
-+ }
-+
-+ } else if (flags & KDBUS_NAME_QUEUE) {
-+ /*
-+ * Name is already occupied and we cannot take it over, but
-+ * queuing is allowed. Put us silently on the queue, if not
-+ * already there.
-+ */
-+
-+ owner->flags |= KDBUS_NAME_IN_QUEUE;
-+ if (!kdbus_name_owner_is_used(owner)) {
-+ list_add_tail(&owner->name_entry, &name->queue);
-+ nflags |= KDBUS_NAME_ACQUIRED;
-+ }
-+ } else if (kdbus_name_owner_is_used(owner)) {
-+ /*
-+ * Already queued on name, but re-queueing was not requested.
-+ * Make sure to unlink it from the name, the caller is
-+ * responsible for releasing it.
-+ */
-+
-+ list_del_init(&owner->name_entry);
-+ } else {
-+ /*
-+ * Name is already claimed and queueing is not requested.
-+ * Return error to the caller.
-+ */
-+
-+ ret = -EEXIST;
-+ }
-+
-+ if (return_flags)
-+ *return_flags = owner->flags | nflags;
-+
-+ return ret;
-+}
-+
-+int kdbus_name_acquire(struct kdbus_name_registry *reg,
-+ struct kdbus_conn *conn, const char *name_str,
-+ u64 flags, u64 *return_flags)
-+{
-+ struct kdbus_name_entry *name = NULL;
-+ struct kdbus_name_owner *owner = NULL;
-+ u32 hash;
-+ int ret;
-+
-+ kdbus_conn_assert_active(conn);
-+
-+ down_write(&reg->rwlock);
-+
-+ /*
-+ * Verify the connection has access to the name. Do this before testing
-+ * for double-acquisitions and other errors to make sure we do not leak
-+ * information about this name through possible custom endpoints.
-+ */
-+ if (!kdbus_conn_policy_own_name(conn, current_cred(), name_str)) {
-+ ret = -EPERM;
-+ goto exit;
-+ }
-+
-+ /*
-+ * Lookup the name entry. If it already exists, search for an owner
-+ * entry as we might already own that name. If either does not exist,
-+ * we will allocate a fresh one.
-+ */
-+ hash = kdbus_strhash(name_str);
-+ name = kdbus_name_entry_find(reg, hash, name_str);
-+ if (name) {
-+ owner = kdbus_name_owner_find(name, conn);
-+ } else {
-+ name = kdbus_name_entry_new(reg, hash, name_str);
-+ if (IS_ERR(name)) {
-+ ret = PTR_ERR(name);
-+ name = NULL;
-+ goto exit;
-+ }
-+ }
-+
-+ /* create name owner object if not already queued */
-+ if (!owner) {
-+ owner = kdbus_name_owner_new(conn, name, flags);
-+ if (IS_ERR(owner)) {
-+ ret = PTR_ERR(owner);
-+ owner = NULL;
-+ goto exit;
-+ }
-+ }
-+
-+ if (flags & KDBUS_NAME_ACTIVATOR)
-+ ret = kdbus_name_become_activator(owner, return_flags);
-+ else
-+ ret = kdbus_name_update(owner, flags, return_flags);
-+ if (ret < 0)
-+ goto exit;
-+
-+exit:
-+ if (owner && !kdbus_name_owner_is_used(owner))
-+ kdbus_name_owner_free(owner);
-+ if (name && !kdbus_name_entry_is_used(name))
-+ kdbus_name_entry_free(name);
-+ up_write(&reg->rwlock);
-+ kdbus_notify_flush(conn->ep->bus);
-+ return ret;
-+}
-+
-+static void kdbus_name_release_unlocked(struct kdbus_name_owner *owner)
-+{
-+ struct kdbus_name_owner *primary, *next;
-+ struct kdbus_name_entry *name;
-+
-+ name = owner->name;
-+ primary = kdbus_name_entry_first(name);
-+
-+ list_del_init(&owner->name_entry);
-+ if (owner == name->activator)
-+ name->activator = NULL;
-+
-+ if (!primary || owner == primary) {
-+ next = kdbus_name_entry_first(name);
-+ if (!next)
-+ next = name->activator;
-+
-+ if (next) {
-+ /* hand to next in queue */
-+ next->flags &= ~KDBUS_NAME_IN_QUEUE;
-+ next->flags |= KDBUS_NAME_PRIMARY;
-+ if (next == name->activator)
-+ kdbus_conn_move_messages(next->conn,
-+ owner->conn,
-+ name->name_id);
-+
-+ kdbus_notify_name_change(owner->conn->ep->bus,
-+ KDBUS_ITEM_NAME_CHANGE,
-+ owner->conn->id, next->conn->id,
-+ owner->flags, next->flags,
-+ name->name);
-+ } else {
-+ kdbus_notify_name_change(owner->conn->ep->bus,
-+ KDBUS_ITEM_NAME_REMOVE,
-+ owner->conn->id, 0,
-+ owner->flags, 0,
-+ name->name);
-+ }
-+ }
-+
-+ kdbus_name_owner_free(owner);
-+ if (!kdbus_name_entry_is_used(name))
-+ kdbus_name_entry_free(name);
-+}
-+
-+static int kdbus_name_release(struct kdbus_name_registry *reg,
-+ struct kdbus_conn *conn,
-+ const char *name_str)
-+{
-+ struct kdbus_name_owner *owner;
-+ struct kdbus_name_entry *name;
-+ int ret = 0;
-+
-+ down_write(&reg->rwlock);
-+ name = kdbus_name_entry_find(reg, kdbus_strhash(name_str), name_str);
-+ if (name) {
-+ owner = kdbus_name_owner_find(name, conn);
-+ if (owner)
-+ kdbus_name_release_unlocked(owner);
-+ else
-+ ret = -EADDRINUSE;
-+ } else {
-+ ret = -ESRCH;
-+ }
-+ up_write(&reg->rwlock);
-+
-+ kdbus_notify_flush(conn->ep->bus);
-+ return ret;
-+}
-+
-+/**
-+ * kdbus_name_release_all() - remove all name entries of a given connection
-+ * @reg: name registry
-+ * @conn: connection
-+ */
-+void kdbus_name_release_all(struct kdbus_name_registry *reg,
-+ struct kdbus_conn *conn)
-+{
-+ struct kdbus_name_owner *owner;
-+
-+ down_write(&reg->rwlock);
-+
-+ while ((owner = list_first_entry_or_null(&conn->names_list,
-+ struct kdbus_name_owner,
-+ conn_entry)))
-+ kdbus_name_release_unlocked(owner);
-+
-+ up_write(&reg->rwlock);
-+
-+ kdbus_notify_flush(conn->ep->bus);
-+}
-+
-+/**
-+ * kdbus_name_is_valid() - check if a name is valid
-+ * @p: The name to check
-+ * @allow_wildcard: Whether or not to allow a wildcard name
-+ *
-+ * A name is valid if all of the following criterias are met:
-+ *
-+ * - The name has two or more elements separated by a period ('.') character.
-+ * - All elements must contain at least one character.
-+ * - Each element must only contain the ASCII characters "[A-Z][a-z][0-9]_-"
-+ * and must not begin with a digit.
-+ * - The name must not exceed KDBUS_NAME_MAX_LEN.
-+ * - If @allow_wildcard is true, the name may end on '.*'
-+ */
-+bool kdbus_name_is_valid(const char *p, bool allow_wildcard)
-+{
-+ bool dot, found_dot = false;
-+ const char *q;
-+
-+ for (dot = true, q = p; *q; q++) {
-+ if (*q == '.') {
-+ if (dot)
-+ return false;
-+
-+ found_dot = true;
-+ dot = true;
-+ } else {
-+ bool good;
-+
-+ good = isalpha(*q) || (!dot && isdigit(*q)) ||
-+ *q == '_' || *q == '-' ||
-+ (allow_wildcard && dot &&
-+ *q == '*' && *(q + 1) == '\0');
-+
-+ if (!good)
-+ return false;
-+
-+ dot = false;
-+ }
-+ }
-+
-+ if (q - p > KDBUS_NAME_MAX_LEN)
-+ return false;
-+
-+ if (dot)
-+ return false;
-+
-+ if (!found_dot)
-+ return false;
-+
-+ return true;
-+}
-+
-+/**
-+ * kdbus_cmd_name_acquire() - handle KDBUS_CMD_NAME_ACQUIRE
-+ * @conn: connection to operate on
-+ * @argp: command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_name_acquire(struct kdbus_conn *conn, void __user *argp)
-+{
-+ const char *item_name;
-+ struct kdbus_cmd *cmd;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ { .type = KDBUS_ITEM_NAME, .mandatory = true },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+ KDBUS_NAME_REPLACE_EXISTING |
-+ KDBUS_NAME_ALLOW_REPLACEMENT |
-+ KDBUS_NAME_QUEUE,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ if (!kdbus_conn_is_ordinary(conn))
-+ return -EOPNOTSUPP;
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret != 0)
-+ return ret;
-+
-+ item_name = argv[1].item->str;
-+ if (!kdbus_name_is_valid(item_name, false)) {
-+ ret = -EINVAL;
-+ goto exit;
-+ }
-+
-+ ret = kdbus_name_acquire(conn->ep->bus->name_registry, conn, item_name,
-+ cmd->flags, &cmd->return_flags);
-+
-+exit:
-+ return kdbus_args_clear(&args, ret);
-+}
-+
-+/**
-+ * kdbus_cmd_name_release() - handle KDBUS_CMD_NAME_RELEASE
-+ * @conn: connection to operate on
-+ * @argp: command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_name_release(struct kdbus_conn *conn, void __user *argp)
-+{
-+ struct kdbus_cmd *cmd;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ { .type = KDBUS_ITEM_NAME, .mandatory = true },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ if (!kdbus_conn_is_ordinary(conn))
-+ return -EOPNOTSUPP;
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret != 0)
-+ return ret;
-+
-+ ret = kdbus_name_release(conn->ep->bus->name_registry, conn,
-+ argv[1].item->str);
-+ return kdbus_args_clear(&args, ret);
-+}
-+
-+static int kdbus_list_write(struct kdbus_conn *conn,
-+ struct kdbus_conn *c,
-+ struct kdbus_pool_slice *slice,
-+ size_t *pos,
-+ struct kdbus_name_owner *o,
-+ bool write)
-+{
-+ struct kvec kvec[4];
-+ size_t cnt = 0;
-+ int ret;
-+
-+ /* info header */
-+ struct kdbus_info info = {
-+ .size = 0,
-+ .id = c->id,
-+ .flags = c->flags,
-+ };
-+
-+ /* fake the header of a kdbus_name item */
-+ struct {
-+ u64 size;
-+ u64 type;
-+ u64 flags;
-+ } h = {};
-+
-+ if (o && !kdbus_conn_policy_see_name_unlocked(conn, current_cred(),
-+ o->name->name))
-+ return 0;
-+
-+ kdbus_kvec_set(&kvec[cnt++], &info, sizeof(info), &info.size);
-+
-+ /* append name */
-+ if (o) {
-+ size_t slen = strlen(o->name->name) + 1;
-+
-+ h.size = offsetof(struct kdbus_item, name.name) + slen;
-+ h.type = KDBUS_ITEM_OWNED_NAME;
-+ h.flags = o->flags;
-+
-+ kdbus_kvec_set(&kvec[cnt++], &h, sizeof(h), &info.size);
-+ kdbus_kvec_set(&kvec[cnt++], o->name->name, slen, &info.size);
-+ cnt += !!kdbus_kvec_pad(&kvec[cnt], &info.size);
-+ }
-+
-+ if (write) {
-+ ret = kdbus_pool_slice_copy_kvec(slice, *pos, kvec,
-+ cnt, info.size);
-+ if (ret < 0)
-+ return ret;
-+ }
-+
-+ *pos += info.size;
-+ return 0;
-+}
-+
-+static int kdbus_list_all(struct kdbus_conn *conn, u64 flags,
-+ struct kdbus_pool_slice *slice,
-+ size_t *pos, bool write)
-+{
-+ struct kdbus_conn *c;
-+ size_t p = *pos;
-+ int ret, i;
-+
-+ hash_for_each(conn->ep->bus->conn_hash, i, c, hentry) {
-+ bool added = false;
-+
-+ /* skip monitors */
-+ if (kdbus_conn_is_monitor(c))
-+ continue;
-+
-+ /* all names the connection owns */
-+ if (flags & (KDBUS_LIST_NAMES |
-+ KDBUS_LIST_ACTIVATORS |
-+ KDBUS_LIST_QUEUED)) {
-+ struct kdbus_name_owner *o;
-+
-+ list_for_each_entry(o, &c->names_list, conn_entry) {
-+ if (o->flags & KDBUS_NAME_ACTIVATOR) {
-+ if (!(flags & KDBUS_LIST_ACTIVATORS))
-+ continue;
-+
-+ ret = kdbus_list_write(conn, c, slice,
-+ &p, o, write);
-+ if (ret < 0) {
-+ mutex_unlock(&c->lock);
-+ return ret;
-+ }
-+
-+ added = true;
-+ } else if (o->flags & KDBUS_NAME_IN_QUEUE) {
-+ if (!(flags & KDBUS_LIST_QUEUED))
-+ continue;
-+
-+ ret = kdbus_list_write(conn, c, slice,
-+ &p, o, write);
-+ if (ret < 0) {
-+ mutex_unlock(&c->lock);
-+ return ret;
-+ }
-+
-+ added = true;
-+ } else if (flags & KDBUS_LIST_NAMES) {
-+ ret = kdbus_list_write(conn, c, slice,
-+ &p, o, write);
-+ if (ret < 0) {
-+ mutex_unlock(&c->lock);
-+ return ret;
-+ }
-+
-+ added = true;
-+ }
-+ }
-+ }
-+
-+ /* nothing added so far, just add the unique ID */
-+ if (!added && (flags & KDBUS_LIST_UNIQUE)) {
-+ ret = kdbus_list_write(conn, c, slice, &p, NULL, write);
-+ if (ret < 0)
-+ return ret;
-+ }
-+ }
-+
-+ *pos = p;
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_cmd_list() - handle KDBUS_CMD_LIST
-+ * @conn: connection to operate on
-+ * @argp: command payload
-+ *
-+ * Return: >=0 on success, negative error code on failure.
-+ */
-+int kdbus_cmd_list(struct kdbus_conn *conn, void __user *argp)
-+{
-+ struct kdbus_name_registry *reg = conn->ep->bus->name_registry;
-+ struct kdbus_pool_slice *slice = NULL;
-+ struct kdbus_cmd_list *cmd;
-+ size_t pos, size;
-+ int ret;
-+
-+ struct kdbus_arg argv[] = {
-+ { .type = KDBUS_ITEM_NEGOTIATE },
-+ };
-+ struct kdbus_args args = {
-+ .allowed_flags = KDBUS_FLAG_NEGOTIATE |
-+ KDBUS_LIST_UNIQUE |
-+ KDBUS_LIST_NAMES |
-+ KDBUS_LIST_ACTIVATORS |
-+ KDBUS_LIST_QUEUED,
-+ .argv = argv,
-+ .argc = ARRAY_SIZE(argv),
-+ };
-+
-+ ret = kdbus_args_parse(&args, argp, &cmd);
-+ if (ret != 0)
-+ return ret;
-+
-+ /* lock order: domain -> bus -> ep -> names -> conn */
-+ down_read(&reg->rwlock);
-+ down_read(&conn->ep->bus->conn_rwlock);
-+ down_read(&conn->ep->policy_db.entries_rwlock);
-+
-+ /* size of records */
-+ size = 0;
-+ ret = kdbus_list_all(conn, cmd->flags, NULL, &size, false);
-+ if (ret < 0)
-+ goto exit_unlock;
-+
-+ if (size == 0) {
-+ kdbus_pool_publish_empty(conn->pool, &cmd->offset,
-+ &cmd->list_size);
-+ } else {
-+ slice = kdbus_pool_slice_alloc(conn->pool, size, false);
-+ if (IS_ERR(slice)) {
-+ ret = PTR_ERR(slice);
-+ slice = NULL;
-+ goto exit_unlock;
-+ }
-+
-+ /* copy the records */
-+ pos = 0;
-+ ret = kdbus_list_all(conn, cmd->flags, slice, &pos, true);
-+ if (ret < 0)
-+ goto exit_unlock;
-+
-+ WARN_ON(pos != size);
-+ kdbus_pool_slice_publish(slice, &cmd->offset, &cmd->list_size);
-+ }
-+
-+ if (kdbus_member_set_user(&cmd->offset, argp, typeof(*cmd), offset) ||
-+ kdbus_member_set_user(&cmd->list_size, argp,
-+ typeof(*cmd), list_size))
-+ ret = -EFAULT;
-+
-+exit_unlock:
-+ up_read(&conn->ep->policy_db.entries_rwlock);
-+ up_read(&conn->ep->bus->conn_rwlock);
-+ up_read(&reg->rwlock);
-+ kdbus_pool_slice_release(slice);
-+ return kdbus_args_clear(&args, ret);
-+}
-diff --git a/ipc/kdbus/names.h b/ipc/kdbus/names.h
-new file mode 100644
-index 0000000..edac59d
---- /dev/null
-+++ b/ipc/kdbus/names.h
-@@ -0,0 +1,105 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_NAMES_H
-+#define __KDBUS_NAMES_H
-+
-+#include <linux/hashtable.h>
-+#include <linux/rwsem.h>
-+
-+struct kdbus_name_entry;
-+struct kdbus_name_owner;
-+struct kdbus_name_registry;
-+
-+/**
-+ * struct kdbus_name_registry - names registered for a bus
-+ * @entries_hash: Map of entries
-+ * @lock: Registry data lock
-+ * @name_seq_last: Last used sequence number to assign to a name entry
-+ */
-+struct kdbus_name_registry {
-+ DECLARE_HASHTABLE(entries_hash, 8);
-+ struct rw_semaphore rwlock;
-+ u64 name_seq_last;
-+};
-+
-+/**
-+ * struct kdbus_name_entry - well-know name entry
-+ * @name_id: sequence number of name entry to be able to uniquely
-+ * identify a name over its registration lifetime
-+ * @activator: activator of this name, or NULL
-+ * @queue: list of queued owners
-+ * @hentry: entry in registry map
-+ * @name: well-known name
-+ */
-+struct kdbus_name_entry {
-+ u64 name_id;
-+ struct kdbus_name_owner *activator;
-+ struct list_head queue;
-+ struct hlist_node hentry;
-+ char name[];
-+};
-+
-+/**
-+ * struct kdbus_name_owner - owner of a well-known name
-+ * @flags: KDBUS_NAME_* flags of this owner
-+ * @conn: connection owning the name
-+ * @name: name that is owned
-+ * @conn_entry: link into @conn
-+ * @name_entry: link into @name
-+ */
-+struct kdbus_name_owner {
-+ u64 flags;
-+ struct kdbus_conn *conn;
-+ struct kdbus_name_entry *name;
-+ struct list_head conn_entry;
-+ struct list_head name_entry;
-+};
-+
-+bool kdbus_name_is_valid(const char *p, bool allow_wildcard);
-+
-+struct kdbus_name_registry *kdbus_name_registry_new(void);
-+void kdbus_name_registry_free(struct kdbus_name_registry *reg);
-+
-+struct kdbus_name_entry *
-+kdbus_name_lookup_unlocked(struct kdbus_name_registry *reg, const char *name);
-+
-+int kdbus_name_acquire(struct kdbus_name_registry *reg,
-+ struct kdbus_conn *conn, const char *name,
-+ u64 flags, u64 *return_flags);
-+void kdbus_name_release_all(struct kdbus_name_registry *reg,
-+ struct kdbus_conn *conn);
-+
-+int kdbus_cmd_name_acquire(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_name_release(struct kdbus_conn *conn, void __user *argp);
-+int kdbus_cmd_list(struct kdbus_conn *conn, void __user *argp);
-+
-+/**
-+ * kdbus_name_get_owner() - get current owner of a name
-+ * @name: name to get current owner of
-+ *
-+ * This returns a pointer to the current owner of a name (or its activator if
-+ * there is no owner). The caller must make sure @name is valid and does not
-+ * vanish.
-+ *
-+ * Return: Pointer to current owner or NULL if there is none.
-+ */
-+static inline struct kdbus_name_owner *
-+kdbus_name_get_owner(struct kdbus_name_entry *name)
-+{
-+ return list_first_entry_or_null(&name->queue, struct kdbus_name_owner,
-+ name_entry) ? : name->activator;
-+}
-+
-+#endif
-diff --git a/ipc/kdbus/node.c b/ipc/kdbus/node.c
-new file mode 100644
-index 0000000..89f58bc
---- /dev/null
-+++ b/ipc/kdbus/node.c
-@@ -0,0 +1,897 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/atomic.h>
-+#include <linux/fs.h>
-+#include <linux/idr.h>
-+#include <linux/kdev_t.h>
-+#include <linux/rbtree.h>
-+#include <linux/rwsem.h>
-+#include <linux/sched.h>
-+#include <linux/slab.h>
-+#include <linux/wait.h>
-+
-+#include "bus.h"
-+#include "domain.h"
-+#include "endpoint.h"
-+#include "fs.h"
-+#include "handle.h"
-+#include "node.h"
-+#include "util.h"
-+
-+/**
-+ * DOC: kdbus nodes
-+ *
-+ * Nodes unify lifetime management across exposed kdbus objects and provide a
-+ * hierarchy. Each kdbus object, that might be exposed to user-space, has a
-+ * kdbus_node object embedded and is linked into the hierarchy. Each node can
-+ * have any number (0-n) of child nodes linked. Each child retains a reference
-+ * to its parent node. For root-nodes, the parent is NULL.
-+ *
-+ * Each node object goes through a bunch of states during it's lifetime:
-+ * * NEW
-+ * * LINKED (can be skipped by NEW->FREED transition)
-+ * * ACTIVE (can be skipped by LINKED->INACTIVE transition)
-+ * * INACTIVE
-+ * * DRAINED
-+ * * FREED
-+ *
-+ * Each node is allocated by the caller and initialized via kdbus_node_init().
-+ * This never fails and sets the object into state NEW. From now on, ref-counts
-+ * on the node manage its lifetime. During init, the ref-count is set to 1. Once
-+ * it drops to 0, the node goes to state FREED and the node->free_cb() callback
-+ * is called to deallocate any memory.
-+ *
-+ * After initializing a node, you usually link it into the hierarchy. You need
-+ * to provide a parent node and a name. The node will be linked as child to the
-+ * parent and a globally unique ID is assigned to the child. The name of the
-+ * child must be unique for all children of this parent. Otherwise, linking the
-+ * child will fail with -EEXIST.
-+ * Note that the child is not marked active, yet. Admittedly, it prevents any
-+ * other node from being linked with the same name (thus, it reserves that
-+ * name), but any child-lookup (via name or unique ID) will never return this
-+ * child unless it has been marked active.
-+ *
-+ * Once successfully linked, you can use kdbus_node_activate() to activate a
-+ * child. This will mark the child active. This state can be skipped by directly
-+ * deactivating the child via kdbus_node_deactivate() (see below).
-+ * By activating a child, you enable any lookups on this child to succeed from
-+ * now on. Furthermore, any code that got its hands on a reference to the node,
-+ * can from now on "acquire" the node.
-+ *
-+ * Active References (or: 'acquiring' and 'releasing' a node)
-+ * Additionally to normal object references, nodes support something we call
-+ * "active references". An active reference can be acquired via
-+ * kdbus_node_acquire() and released via kdbus_node_release(). A caller
-+ * _must_ own a normal object reference whenever calling those functions.
-+ * Unlike object references, acquiring an active reference can fail (by
-+ * returning 'false' from kdbus_node_acquire()). An active reference can
-+ * only be acquired if the node is marked active. If it is not marked
-+ * active, yet, or if it was already deactivated, no more active references
-+ * can be acquired, ever!
-+ * Active references are used to track tasks working on a node. Whenever a
-+ * task enters kernel-space to perform an action on a node, it acquires an
-+ * active reference, performs the action and releases the reference again.
-+ * While holding an active reference, the node is guaranteed to stay active.
-+ * If the node is deactivated in parallel, the node is marked as
-+ * deactivated, then we wait for all active references to be dropped, before
-+ * we finally proceed with any cleanups. That is, if you hold an active
-+ * reference to a node, any resources that are bound to the "active" state
-+ * are guaranteed to stay accessible until you release your reference.
-+ *
-+ * Active-references are very similar to rw-locks, where acquiring a node is
-+ * equal to try-read-lock and releasing to read-unlock. Deactivating a node
-+ * means write-lock and never releasing it again.
-+ * Unlike rw-locks, the 'active reference' concept is more versatile and
-+ * avoids unusual rw-lock usage (never releasing a write-lock..).
-+ *
-+ * It is safe to acquire multiple active-references recursively. But you
-+ * need to check the return value of kdbus_node_acquire() on _each_ call. It
-+ * may stop granting references at _any_ time.
-+ *
-+ * You're free to perform any operations you want while holding an active
-+ * reference, except sleeping for an indefinite period. Sleeping for a fixed
-+ * amount of time is fine, but you usually should not wait on wait-queues
-+ * without a timeout.
-+ * For example, if you wait for I/O to happen, you should gather all data
-+ * and schedule the I/O operation, then release your active reference and
-+ * wait for it to complete. Then try to acquire a new reference. If it
-+ * fails, perform any cleanup (the node is now dead). Otherwise, you can
-+ * finish your operation.
-+ *
-+ * All nodes can be deactivated via kdbus_node_deactivate() at any time. You can
-+ * call this multiple times, even in parallel or on nodes that were never
-+ * linked, and it will just work. The only restriction is, you must not hold an
-+ * active reference when calling kdbus_node_deactivate().
-+ * By deactivating a node, it is immediately marked inactive. Then, we wait for
-+ * all active references to be released (called 'draining' the node). This
-+ * shouldn't take very long as we don't perform long-lasting operations while
-+ * holding an active reference. Note that once the node is marked inactive, no
-+ * new active references can be acquired.
-+ * Once all active references are dropped, the node is considered 'drained'. Now
-+ * kdbus_node_deactivate() is called on each child of the node before we
-+ * continue deactivating our node. That is, once all children are entirely
-+ * deactivated, we call ->release_cb() of our node. ->release_cb() can release
-+ * any resources on that node which are bound to the "active" state of a node.
-+ * When done, we unlink the node from its parent rb-tree, mark it as
-+ * 'released' and return.
-+ * If kdbus_node_deactivate() is called multiple times (even in parallel), all
-+ * but one caller will just wait until the node is fully deactivated. That is,
-+ * one random caller of kdbus_node_deactivate() is selected to call
-+ * ->release_cb() and clean up the node. Only once all this is done, all other
-+ * callers will return from kdbus_node_deactivate(). That is, it doesn't matter
-+ * whether you're the selected caller or not, it will only return after
-+ * everything is fully done.
-+ *
-+ * When a node is activated, we acquire a normal object reference to the node.
-+ * This reference is dropped after deactivation is fully done (and only if the
-+ * node really was activated). This allows callers to link+activate a child node
-+ * and then drop all refs. The node will be deactivated together with the
-+ * parent, and then be freed when this reference is dropped.
-+ *
-+ * Currently, nodes provide a bunch of resources that external code can use
-+ * directly. This includes:
-+ *
-+ * * node->waitq: Each node has its own wait-queue that is used to manage
-+ * the 'active' state. When a node is deactivated, we wait on
-+ * this queue until all active refs are dropped. Analogously,
-+ * when you release an active reference on a deactivated
-+ * node, and the active ref-count drops to 0, we wake up a
-+ * single thread on this queue. Furthermore, once the
-+ * ->release_cb() callback finished, we wake up all waiters.
-+ * The node-owner is free to re-use this wait-queue for other
-+ * purposes. As node-management uses this queue only during
-+ * deactivation, it is usually totally fine to re-use the
-+ * queue for other, preferably low-overhead, use-cases.
-+ *
-+ * * node->type: This field defines the type of the owner of this node. It
-+ * must be set during node initialization and must remain
-+ * constant. The node management never looks at this value,
-+ * but external users might use it to gain access to the owner
-+ * object of a node.
-+ * It is totally up to the owner of the node to define what
-+ * their type means. Usually it means you can access the
-+ * parent structure via container_of(), as long as you hold an
-+ * active reference to the node.
-+ *
-+ * * node->free_cb: callback after all references are dropped
-+ * node->release_cb: callback during node deactivation
-+ * These fields must be set by the node owner during
-+ * node initialization. They must remain constant. If
-+ * NULL, they're skipped.
-+ *
-+ * * node->mode: filesystem access modes
-+ * node->uid: filesystem owner uid
-+ * node->gid: filesystem owner gid
-+ * These fields must be set by the node owner during node
-+ * initialization. They must remain constant and may be
-+ * accessed by other callers to properly initialize
-+ * filesystem nodes.
-+ *
-+ * * node->id: This is an unsigned 32bit integer allocated by an IDA. It is
-+ * always kept as small as possible during allocation and is
-+ * globally unique across all nodes allocated by this module. 0
-+ * is reserved as "not assigned" and is the default.
-+ * The ID is assigned during kdbus_node_link() and is kept until
-+ * the object is freed. Thus, the ID outlives the active
-+ * lifetime of a node. As long as you hold an object reference
-+ * to a node (and the node was linked once), the ID is valid and
-+ * unique.
-+ *
-+ * * node->name: name of this node
-+ * node->hash: 31bit hash-value of @name (range [2..INT_MAX-1])
-+ * These values follow the same lifetime rules as node->id.
-+ * They're initialized when the node is linked and then remain
-+ * constant until the last object reference is dropped.
-+ * Unlike the id, the name is only unique across all siblings
-+ * and only until the node is deactivated. Currently, the name
-+ * is even unique if linked but not activated, yet. This might
-+ * change in the future, though. Code should not rely on this.
-+ *
-+ * * node->lock: lock to protect node->children, node->rb, node->parent
-+ * * node->parent: Reference to parent node. This is set during LINK time
-+ * and is dropped during destruction. You must not access
-+ * it unless you hold an active reference to the node or
-+ * you know the node is dead.
-+ * * node->children: rb-tree of all linked children of this node. You must
-+ * not access this directly, but use one of the iterator
-+ * or lookup helpers.
-+ */
-+
-+/*
-+ * Bias values track states of "active references". They're all negative. If a
-+ * node is active, its active-ref-counter is >=0 and tracks all active
-+ * references. Once a node is deactivated, we add NODE_BIAS. This means, the
-+ * counter is now negative but still counts the active references. Once it drops
-+ * to exactly NODE_BIAS, we know all active references were dropped. Exactly one
-+ * thread will change it to NODE_RELEASE now, perform cleanup and then put it
-+ * into NODE_DRAINED. Once drained, all other threads that tried deactivating
-+ * the node will now be woken up (thus, they wait until the node is fully done).
-+ * The initial state during node-setup is NODE_NEW. If a node is directly
-+ * deactivated without having ever been active, it is put into
-+ * NODE_RELEASE_DIRECT instead of NODE_BIAS. This tracks this one-bit state
-+ * across node-deactivation. The task putting it into NODE_RELEASE now knows
-+ * whether the node was active before or not.
-+ *
-+ * Some archs implement atomic_sub(v) with atomic_add(-v), so reserve INT_MIN
-+ * to avoid overflows if multiplied by -1.
-+ */
-+#define KDBUS_NODE_BIAS (INT_MIN + 5)
-+#define KDBUS_NODE_RELEASE_DIRECT (KDBUS_NODE_BIAS - 1)
-+#define KDBUS_NODE_RELEASE (KDBUS_NODE_BIAS - 2)
-+#define KDBUS_NODE_DRAINED (KDBUS_NODE_BIAS - 3)
-+#define KDBUS_NODE_NEW (KDBUS_NODE_BIAS - 4)
-+
-+/* global unique ID mapping for kdbus nodes */
-+DEFINE_IDA(kdbus_node_ida);
-+
-+/**
-+ * kdbus_node_name_hash() - hash a name
-+ * @name: The string to hash
-+ *
-+ * This computes the hash of @name. It is guaranteed to be in the range
-+ * [2..INT_MAX-1]. The values 0, 1 and INT_MAX are unused as they are reserved
-+ * for the filesystem code.
-+ *
-+ * Return: hash value of the passed string
-+ */
-+static unsigned int kdbus_node_name_hash(const char *name)
-+{
-+ unsigned int hash;
-+
-+ /* reserve hash numbers 0, 1 and >=INT_MAX for magic directories */
-+ hash = kdbus_strhash(name) & INT_MAX;
-+ if (hash < 2)
-+ hash += 2;
-+ if (hash >= INT_MAX)
-+ hash = INT_MAX - 1;
-+
-+ return hash;
-+}
-+
-+/**
-+ * kdbus_node_name_compare() - compare a name with a node's name
-+ * @hash: hash of the string to compare the node with
-+ * @name: name to compare the node with
-+ * @node: node to compare the name with
-+ *
-+ * Return: 0 if @name and @hash exactly match the information in @node, or
-+ * an integer less than or greater than zero if @name is found, respectively,
-+ * to be less than or be greater than the string stored in @node.
-+ */
-+static int kdbus_node_name_compare(unsigned int hash, const char *name,
-+ const struct kdbus_node *node)
-+{
-+ if (hash != node->hash)
-+ return hash - node->hash;
-+
-+ return strcmp(name, node->name);
-+}
-+
-+/**
-+ * kdbus_node_init() - initialize a kdbus_node
-+ * @node: Pointer to the node to initialize
-+ * @type: The type the node will have (KDBUS_NODE_*)
-+ *
-+ * The caller is responsible for allocating @node and initializing it to zero.
-+ * Once this call returns, you must use the kdbus_node_ref() and
-+ * kdbus_node_unref() functions to manage this node.
-+ */
-+void kdbus_node_init(struct kdbus_node *node, unsigned int type)
-+{
-+ atomic_set(&node->refcnt, 1);
-+ mutex_init(&node->lock);
-+ node->id = 0;
-+ node->type = type;
-+ RB_CLEAR_NODE(&node->rb);
-+ node->children = RB_ROOT;
-+ init_waitqueue_head(&node->waitq);
-+ atomic_set(&node->active, KDBUS_NODE_NEW);
-+}
-+
-+/**
-+ * kdbus_node_link() - link a node into the nodes system
-+ * @node: Pointer to the node to link
-+ * @parent: Pointer to a parent node, may be %NULL
-+ * @name: The name of the node (or NULL if root node)
-+ *
-+ * This links a node into the hierarchy. This must not be called multiple times.
-+ * If @parent is NULL, the node becomes a new root node.
-+ *
-+ * This call will fail if @name is not unique across all its siblings or if no
-+ * ID could be allocated. You must not activate a node if linking failed! It is
-+ * safe to deactivate it, though.
-+ *
-+ * Once you linked a node, you must call kdbus_node_deactivate() before you drop
-+ * the last reference (even if you never activate the node).
-+ *
-+ * Return: 0 on success, negative error code otherwise.
-+ */
-+int kdbus_node_link(struct kdbus_node *node, struct kdbus_node *parent,
-+ const char *name)
-+{
-+ int ret;
-+
-+ if (WARN_ON(node->type != KDBUS_NODE_DOMAIN && !parent))
-+ return -EINVAL;
-+
-+ if (WARN_ON(parent && !name))
-+ return -EINVAL;
-+
-+ if (name) {
-+ node->name = kstrdup(name, GFP_KERNEL);
-+ if (!node->name)
-+ return -ENOMEM;
-+
-+ node->hash = kdbus_node_name_hash(name);
-+ }
-+
-+ ret = ida_simple_get(&kdbus_node_ida, 1, 0, GFP_KERNEL);
-+ if (ret < 0)
-+ return ret;
-+
-+ node->id = ret;
-+ ret = 0;
-+
-+ if (parent) {
-+ struct rb_node **n, *prev;
-+
-+ if (!kdbus_node_acquire(parent))
-+ return -ESHUTDOWN;
-+
-+ mutex_lock(&parent->lock);
-+
-+ n = &parent->children.rb_node;
-+ prev = NULL;
-+
-+ while (*n) {
-+ struct kdbus_node *pos;
-+ int result;
-+
-+ pos = kdbus_node_from_rb(*n);
-+ prev = *n;
-+ result = kdbus_node_name_compare(node->hash,
-+ node->name,
-+ pos);
-+ if (result == 0) {
-+ ret = -EEXIST;
-+ goto exit_unlock;
-+ }
-+
-+ if (result < 0)
-+ n = &pos->rb.rb_left;
-+ else
-+ n = &pos->rb.rb_right;
-+ }
-+
-+ /* add new node and rebalance the tree */
-+ rb_link_node(&node->rb, prev, n);
-+ rb_insert_color(&node->rb, &parent->children);
-+ node->parent = kdbus_node_ref(parent);
-+
-+exit_unlock:
-+ mutex_unlock(&parent->lock);
-+ kdbus_node_release(parent);
-+ }
-+
-+ return ret;
-+}
-+
-+/**
-+ * kdbus_node_ref() - Acquire object reference
-+ * @node: node to acquire reference to (or NULL)
-+ *
-+ * This acquires a new reference to @node. You must already own a reference when
-+ * calling this!
-+ * If @node is NULL, this is a no-op.
-+ *
-+ * Return: @node is returned
-+ */
-+struct kdbus_node *kdbus_node_ref(struct kdbus_node *node)
-+{
-+ if (node)
-+ atomic_inc(&node->refcnt);
-+ return node;
-+}
-+
-+/**
-+ * kdbus_node_unref() - Drop object reference
-+ * @node: node to drop reference to (or NULL)
-+ *
-+ * This drops an object reference to @node. You must not access the node if you
-+ * no longer own a reference.
-+ * If the ref-count drops to 0, the object will be destroyed (->free_cb will be
-+ * called).
-+ *
-+ * If you linked or activated the node, you must deactivate the node before you
-+ * drop your last reference! If you didn't link or activate the node, you can
-+ * drop any reference you want.
-+ *
-+ * Note that this calls into ->free_cb() and thus _might_ sleep. The ->free_cb()
-+ * callbacks must not acquire any outer locks, though. So you can safely drop
-+ * references while holding locks.
-+ *
-+ * If @node is NULL, this is a no-op.
-+ *
-+ * Return: This always returns NULL
-+ */
-+struct kdbus_node *kdbus_node_unref(struct kdbus_node *node)
-+{
-+ if (node && atomic_dec_and_test(&node->refcnt)) {
-+ struct kdbus_node safe = *node;
-+
-+ WARN_ON(atomic_read(&node->active) != KDBUS_NODE_DRAINED);
-+ WARN_ON(!RB_EMPTY_NODE(&node->rb));
-+
-+ if (node->free_cb)
-+ node->free_cb(node);
-+ if (safe.id > 0)
-+ ida_simple_remove(&kdbus_node_ida, safe.id);
-+
-+ kfree(safe.name);
-+
-+ /*
-+ * kdbusfs relies on the parent to be available even after the
-+ * node was deactivated and unlinked. Therefore, we pin it
-+ * until a node is destroyed.
-+ */
-+ kdbus_node_unref(safe.parent);
-+ }
-+
-+ return NULL;
-+}
-+
-+/**
-+ * kdbus_node_is_active() - test whether a node is active
-+ * @node: node to test
-+ *
-+ * This checks whether @node is active. That means, @node was linked and
-+ * activated by the node owner and hasn't been deactivated, yet. If, and only
-+ * if, a node is active, kdbus_node_acquire() will be able to acquire active
-+ * references.
-+ *
-+ * Note that this function does not give any lifetime guarantees. After this
-+ * call returns, the node might be deactivated immediately. Normally, what you
-+ * want is to acquire a real active reference via kdbus_node_acquire().
-+ *
-+ * Return: true if @node is active, false otherwise
-+ */
-+bool kdbus_node_is_active(struct kdbus_node *node)
-+{
-+ return atomic_read(&node->active) >= 0;
-+}
-+
-+/**
-+ * kdbus_node_is_deactivated() - test whether a node was already deactivated
-+ * @node: node to test
-+ *
-+ * This checks whether kdbus_node_deactivate() was called on @node. Note that
-+ * this might be true even if you never deactivated the node directly, but only
-+ * one of its ancestors.
-+ *
-+ * Note that even if this returns 'false', the node might get deactivated
-+ * immediately after the call returns.
-+ *
-+ * Return: true if @node was already deactivated, false if not
-+ */
-+bool kdbus_node_is_deactivated(struct kdbus_node *node)
-+{
-+ int v;
-+
-+ v = atomic_read(&node->active);
-+ return v != KDBUS_NODE_NEW && v < 0;
-+}
-+
-+/**
-+ * kdbus_node_activate() - activate a node
-+ * @node: node to activate
-+ *
-+ * This marks @node as active if, and only if, the node wasn't activated nor
-+ * deactivated, yet, and the parent is still active. Any but the first call to
-+ * kdbus_node_activate() is a no-op.
-+ * If you called kdbus_node_deactivate() before, then even the first call to
-+ * kdbus_node_activate() will be a no-op.
-+ *
-+ * This call doesn't give any lifetime guarantees. The node might get
-+ * deactivated immediately after this call returns. Or the parent might already
-+ * be deactivated, which will make this call a no-op.
-+ *
-+ * If this call successfully activated a node, it will take an object reference
-+ * to it. This reference is dropped after the node is deactivated. Therefore,
-+ * the object owner can safely drop their reference to @node iff they know that
-+ * its parent node will get deactivated at some point. Once the parent node is
-+ * deactivated, it will deactivate all its children and thus drop this reference
-+ * again.
-+ *
-+ * Return: True if this call successfully activated the node, otherwise false.
-+ * Note that this might return false, even if the node is still active
-+ * (eg., if you called this a second time).
-+ */
-+bool kdbus_node_activate(struct kdbus_node *node)
-+{
-+ bool res = false;
-+
-+ mutex_lock(&node->lock);
-+ if (atomic_read(&node->active) == KDBUS_NODE_NEW) {
-+ atomic_sub(KDBUS_NODE_NEW, &node->active);
-+ /* activated nodes have ref +1 */
-+ kdbus_node_ref(node);
-+ res = true;
-+ }
-+ mutex_unlock(&node->lock);
-+
-+ return res;
-+}
-+
-+/**
-+ * kdbus_node_deactivate() - deactivate a node
-+ * @node: The node to deactivate.
-+ *
-+ * This function recursively deactivates this node and all its children. It
-+ * returns only once all children and the node itself were recursively disabled
-+ * (even if you call this function multiple times in parallel).
-+ *
-+ * It is safe to call this function on _any_ node that was initialized _any_
-+ * number of times.
-+ *
-+ * This call may sleep, as it waits for all active references to be dropped.
-+ */
-+void kdbus_node_deactivate(struct kdbus_node *node)
-+{
-+ struct kdbus_node *pos, *child;
-+ struct rb_node *rb;
-+ int v_pre, v_post;
-+
-+ pos = node;
-+
-+ /*
-+ * To avoid recursion, we perform back-tracking while deactivating
-+ * nodes. For each node we enter, we first mark the active-counter as
-+ * deactivated by adding BIAS. If the node has children, we set the first
-+ * child as current position and start over. If the node has no
-+ * children, we drain the node by waiting for all active refs to be
-+ * dropped and then releasing the node.
-+ *
-+ * After the node is released, we set its parent as current position
-+ * and start over. If the current position was the initial node, we're
-+ * done.
-+ *
-+ * Note that this function can be called in parallel by multiple
-+ * callers. We make sure that each node is only released once, and any
-+ * racing caller will wait until the other thread fully released that
-+ * node.
-+ */
-+
-+ for (;;) {
-+ /*
-+ * Add BIAS to node->active to mark it as inactive. If it was
-+ * never active before, immediately mark it as RELEASE_DIRECT
-+ * so we remember this state.
-+ * We cannot remember v_pre as we might iterate into the
-+ * children, overwriting v_pre, before we can release our node.
-+ */
-+ mutex_lock(&pos->lock);
-+ v_pre = atomic_read(&pos->active);
-+ if (v_pre >= 0)
-+ atomic_add_return(KDBUS_NODE_BIAS, &pos->active);
-+ else if (v_pre == KDBUS_NODE_NEW)
-+ atomic_set(&pos->active, KDBUS_NODE_RELEASE_DIRECT);
-+ mutex_unlock(&pos->lock);
-+
-+ /* wait until all active references were dropped */
-+ wait_event(pos->waitq,
-+ atomic_read(&pos->active) <= KDBUS_NODE_BIAS);
-+
-+ mutex_lock(&pos->lock);
-+ /* recurse into first child if any */
-+ rb = rb_first(&pos->children);
-+ if (rb) {
-+ child = kdbus_node_ref(kdbus_node_from_rb(rb));
-+ mutex_unlock(&pos->lock);
-+ pos = child;
-+ continue;
-+ }
-+
-+ /* mark object as RELEASE */
-+ v_post = atomic_read(&pos->active);
-+ if (v_post == KDBUS_NODE_BIAS ||
-+ v_post == KDBUS_NODE_RELEASE_DIRECT)
-+ atomic_set(&pos->active, KDBUS_NODE_RELEASE);
-+ mutex_unlock(&pos->lock);
-+
-+ /*
-+ * If this is the thread that marked the object as RELEASE, we
-+ * perform the actual release. Otherwise, we wait until the
-+ * release is done and the node is marked as DRAINED.
-+ */
-+ if (v_post == KDBUS_NODE_BIAS ||
-+ v_post == KDBUS_NODE_RELEASE_DIRECT) {
-+ if (pos->release_cb)
-+ pos->release_cb(pos, v_post == KDBUS_NODE_BIAS);
-+
-+ if (pos->parent) {
-+ mutex_lock(&pos->parent->lock);
-+ if (!RB_EMPTY_NODE(&pos->rb)) {
-+ rb_erase(&pos->rb,
-+ &pos->parent->children);
-+ RB_CLEAR_NODE(&pos->rb);
-+ }
-+ mutex_unlock(&pos->parent->lock);
-+ }
-+
-+ /* mark as DRAINED */
-+ atomic_set(&pos->active, KDBUS_NODE_DRAINED);
-+ wake_up_all(&pos->waitq);
-+
-+ /* drop VFS cache */
-+ kdbus_fs_flush(pos);
-+
-+ /*
-+ * If the node was activated and someone added BIAS
-+ * to it to deactivate it, we, and only we, are
-+ * responsible for releasing the extra ref-count that was
-+ * taken once in kdbus_node_activate().
-+ * If the node was never activated, no-one ever
-+ * added BIAS, but instead skipped that state and
-+ * immediately went to NODE_RELEASE_DIRECT. In that case
-+ * we must not drop the reference.
-+ */
-+ if (v_post == KDBUS_NODE_BIAS)
-+ kdbus_node_unref(pos);
-+ } else {
-+ /* wait until object is DRAINED */
-+ wait_event(pos->waitq,
-+ atomic_read(&pos->active) == KDBUS_NODE_DRAINED);
-+ }
-+
-+ /*
-+ * We're done with the current node. Continue on its parent
-+ * again, which will try deactivating its next child, or itself
-+ * if no child is left.
-+ * If we've reached our initial node again, we are done and
-+ * can safely return.
-+ */
-+ if (pos == node)
-+ break;
-+
-+ child = pos;
-+ pos = pos->parent;
-+ kdbus_node_unref(child);
-+ }
-+}
-+
-+/**
-+ * kdbus_node_acquire() - Acquire an active ref on a node
-+ * @node: The node
-+ *
-+ * This acquires an active-reference to @node. This will only succeed if the
-+ * node is active. You must release this active reference via
-+ * kdbus_node_release() again.
-+ *
-+ * See the introduction to "active references" for more details.
-+ *
-+ * Return: %true if @node was non-NULL and active
-+ */
-+bool kdbus_node_acquire(struct kdbus_node *node)
-+{
-+ return node && atomic_inc_unless_negative(&node->active);
-+}
-+
-+/**
-+ * kdbus_node_release() - Release an active ref on a node
-+ * @node: The node
-+ *
-+ * This releases an active reference that was previously acquired via
-+ * kdbus_node_acquire(). See kdbus_node_acquire() for details.
-+ */
-+void kdbus_node_release(struct kdbus_node *node)
-+{
-+ if (node && atomic_dec_return(&node->active) == KDBUS_NODE_BIAS)
-+ wake_up(&node->waitq);
-+}
-+
-+/**
-+ * kdbus_node_find_child() - Find child by name
-+ * @node: parent node to search through
-+ * @name: name of child node
-+ *
-+ * This searches through all children of @node for a child-node with name @name.
-+ * If not found, or if the child is deactivated, NULL is returned. Otherwise,
-+ * the child is acquired and a new reference is returned.
-+ *
-+ * If you're done with the child, you need to release it and drop your
-+ * reference.
-+ *
-+ * This function does not acquire the parent node. However, if the parent was
-+ * already deactivated, then kdbus_node_deactivate() will, at some point, also
-+ * deactivate the child. Therefore, we can rely on the explicit ordering during
-+ * deactivation.
-+ *
-+ * Return: Reference to acquired child node, or NULL if not found / not active.
-+ */
-+struct kdbus_node *kdbus_node_find_child(struct kdbus_node *node,
-+ const char *name)
-+{
-+ struct kdbus_node *child;
-+ struct rb_node *rb;
-+ unsigned int hash;
-+ int ret;
-+
-+ hash = kdbus_node_name_hash(name);
-+
-+ mutex_lock(&node->lock);
-+ rb = node->children.rb_node;
-+ while (rb) {
-+ child = kdbus_node_from_rb(rb);
-+ ret = kdbus_node_name_compare(hash, name, child);
-+ if (ret < 0)
-+ rb = rb->rb_left;
-+ else if (ret > 0)
-+ rb = rb->rb_right;
-+ else
-+ break;
-+ }
-+ if (rb && kdbus_node_acquire(child))
-+ kdbus_node_ref(child);
-+ else
-+ child = NULL;
-+ mutex_unlock(&node->lock);
-+
-+ return child;
-+}
-+
-+static struct kdbus_node *node_find_closest_unlocked(struct kdbus_node *node,
-+ unsigned int hash,
-+ const char *name)
-+{
-+ struct kdbus_node *n, *pos = NULL;
-+ struct rb_node *rb;
-+ int res;
-+
-+ /*
-+ * Find the closest child with ``node->hash >= hash'', or, if @name is
-+ * valid, ``node->name >= name'' (where '>=' is the lex. order).
-+ */
-+
-+ rb = node->children.rb_node;
-+ while (rb) {
-+ n = kdbus_node_from_rb(rb);
-+
-+ if (name)
-+ res = kdbus_node_name_compare(hash, name, n);
-+ else
-+ res = hash - n->hash;
-+
-+ if (res <= 0) {
-+ rb = rb->rb_left;
-+ pos = n;
-+ } else { /* ``hash > n->hash'', ``name > n->name'' */
-+ rb = rb->rb_right;
-+ }
-+ }
-+
-+ return pos;
-+}
-+
-+/**
-+ * kdbus_node_find_closest() - Find closest child-match
-+ * @node: parent node to search through
-+ * @hash: hash value to find closest match for
-+ *
-+ * Find the closest child of @node with a hash greater than or equal to @hash.
-+ * The closest match is the left-most child of @node with this property. Which
-+ * means, it is the first child with that hash returned by
-+ * kdbus_node_next_child(), if you'd iterate the whole parent node.
-+ *
-+ * Return: Reference to acquired child, or NULL if none found.
-+ */
-+struct kdbus_node *kdbus_node_find_closest(struct kdbus_node *node,
-+ unsigned int hash)
-+{
-+ struct kdbus_node *child;
-+ struct rb_node *rb;
-+
-+ mutex_lock(&node->lock);
-+
-+ child = node_find_closest_unlocked(node, hash, NULL);
-+ while (child && !kdbus_node_acquire(child)) {
-+ rb = rb_next(&child->rb);
-+ if (rb)
-+ child = kdbus_node_from_rb(rb);
-+ else
-+ child = NULL;
-+ }
-+ kdbus_node_ref(child);
-+
-+ mutex_unlock(&node->lock);
-+
-+ return child;
-+}
-+
-+/**
-+ * kdbus_node_next_child() - Acquire next child
-+ * @node: parent node
-+ * @prev: previous child-node position or NULL
-+ *
-+ * This function returns a reference to the next active child of @node, after
-+ * the passed position @prev. If @prev is NULL, a reference to the first active
-+ * child is returned. If no more active children are found, NULL is returned.
-+ *
-+ * This function acquires the next child it returns. If you're done with the
-+ * returned pointer, you need to release _and_ unref it.
-+ *
-+ * The passed in pointer @prev is not modified by this function, and it does
-+ * *not* have to be active. If @prev was acquired via different means, or if it
-+ * was unlinked from its parent before you pass it in, then this iterator will
-+ * still return the next active child (it will have to search through the
-+ * rb-tree based on the node-name, though).
-+ * However, @prev must not be linked to a different parent than @node!
-+ *
-+ * Return: Reference to next acquired child, or NULL if at the end.
-+ */
-+struct kdbus_node *kdbus_node_next_child(struct kdbus_node *node,
-+ struct kdbus_node *prev)
-+{
-+ struct kdbus_node *pos = NULL;
-+ struct rb_node *rb;
-+
-+ mutex_lock(&node->lock);
-+
-+ if (!prev) {
-+ /*
-+ * New iteration; find first node in rb-tree and try to acquire
-+ * it. If we got it, directly return it as first element.
-+ * Otherwise, the loop below will find the next active node.
-+ */
-+ rb = rb_first(&node->children);
-+ if (!rb)
-+ goto exit;
-+ pos = kdbus_node_from_rb(rb);
-+ if (kdbus_node_acquire(pos))
-+ goto exit;
-+ } else if (RB_EMPTY_NODE(&prev->rb)) {
-+ /*
-+ * The current iterator is no longer linked to the rb-tree. Use
-+ * its hash value and name to find the next _higher_ node and
-+ * acquire it. If we got it, return it as next element.
-+ * Otherwise, the loop below will find the next active node.
-+ */
-+ pos = node_find_closest_unlocked(node, prev->hash, prev->name);
-+ if (!pos)
-+ goto exit;
-+ if (kdbus_node_acquire(pos))
-+ goto exit;
-+ } else {
-+ /*
-+ * The current iterator is still linked to the parent. Set it
-+ * as current position and use the loop below to find the next
-+ * active element.
-+ */
-+ pos = prev;
-+ }
-+
-+ /* @pos was already returned or is inactive; find next active node */
-+ do {
-+ rb = rb_next(&pos->rb);
-+ if (rb)
-+ pos = kdbus_node_from_rb(rb);
-+ else
-+ pos = NULL;
-+ } while (pos && !kdbus_node_acquire(pos));
-+
-+exit:
-+ /* @pos is NULL or acquired. Take ref if non-NULL and return it */
-+ kdbus_node_ref(pos);
-+ mutex_unlock(&node->lock);
-+ return pos;
-+}
-diff --git a/ipc/kdbus/node.h b/ipc/kdbus/node.h
-new file mode 100644
-index 0000000..970e02b
---- /dev/null
-+++ b/ipc/kdbus/node.h
-@@ -0,0 +1,86 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_NODE_H
-+#define __KDBUS_NODE_H
-+
-+#include <linux/atomic.h>
-+#include <linux/kernel.h>
-+#include <linux/mutex.h>
-+#include <linux/wait.h>
-+
-+struct kdbus_node;
-+
-+enum kdbus_node_type {
-+ KDBUS_NODE_DOMAIN,
-+ KDBUS_NODE_CONTROL,
-+ KDBUS_NODE_BUS,
-+ KDBUS_NODE_ENDPOINT,
-+};
-+
-+typedef void (*kdbus_node_free_t) (struct kdbus_node *node);
-+typedef void (*kdbus_node_release_t) (struct kdbus_node *node, bool was_active);
-+
-+struct kdbus_node {
-+ atomic_t refcnt;
-+ atomic_t active;
-+ wait_queue_head_t waitq;
-+
-+ /* static members */
-+ unsigned int type;
-+ kdbus_node_free_t free_cb;
-+ kdbus_node_release_t release_cb;
-+ umode_t mode;
-+ kuid_t uid;
-+ kgid_t gid;
-+
-+ /* valid once linked */
-+ char *name;
-+ unsigned int hash;
-+ unsigned int id;
-+ struct kdbus_node *parent; /* may be NULL */
-+
-+ /* valid iff active */
-+ struct mutex lock;
-+ struct rb_node rb;
-+ struct rb_root children;
-+};
-+
-+#define kdbus_node_from_rb(_node) rb_entry((_node), struct kdbus_node, rb)
-+
-+extern struct ida kdbus_node_ida;
-+
-+void kdbus_node_init(struct kdbus_node *node, unsigned int type);
-+
-+int kdbus_node_link(struct kdbus_node *node, struct kdbus_node *parent,
-+ const char *name);
-+
-+struct kdbus_node *kdbus_node_ref(struct kdbus_node *node);
-+struct kdbus_node *kdbus_node_unref(struct kdbus_node *node);
-+
-+bool kdbus_node_is_active(struct kdbus_node *node);
-+bool kdbus_node_is_deactivated(struct kdbus_node *node);
-+bool kdbus_node_activate(struct kdbus_node *node);
-+void kdbus_node_deactivate(struct kdbus_node *node);
-+
-+bool kdbus_node_acquire(struct kdbus_node *node);
-+void kdbus_node_release(struct kdbus_node *node);
-+
-+struct kdbus_node *kdbus_node_find_child(struct kdbus_node *node,
-+ const char *name);
-+struct kdbus_node *kdbus_node_find_closest(struct kdbus_node *node,
-+ unsigned int hash);
-+struct kdbus_node *kdbus_node_next_child(struct kdbus_node *node,
-+ struct kdbus_node *prev);
-+
-+#endif
-diff --git a/ipc/kdbus/notify.c b/ipc/kdbus/notify.c
-new file mode 100644
-index 0000000..375758c
---- /dev/null
-+++ b/ipc/kdbus/notify.c
-@@ -0,0 +1,204 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/fs.h>
-+#include <linux/init.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/spinlock.h>
-+#include <linux/sched.h>
-+#include <linux/slab.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "domain.h"
-+#include "endpoint.h"
-+#include "item.h"
-+#include "message.h"
-+#include "notify.h"
-+
-+static inline void kdbus_notify_add_tail(struct kdbus_staging *staging,
-+ struct kdbus_bus *bus)
-+{
-+ spin_lock(&bus->notify_lock);
-+ list_add_tail(&staging->notify_entry, &bus->notify_list);
-+ spin_unlock(&bus->notify_lock);
-+}
-+
-+static int kdbus_notify_reply(struct kdbus_bus *bus, u64 id,
-+ u64 cookie, u64 msg_type)
-+{
-+ struct kdbus_staging *s;
-+
-+ s = kdbus_staging_new_kernel(bus, id, cookie, 0, msg_type);
-+ if (IS_ERR(s))
-+ return PTR_ERR(s);
-+
-+ kdbus_notify_add_tail(s, bus);
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_notify_reply_timeout() - queue a timeout reply
-+ * @bus: Bus which queues the messages
-+ * @id: The destination's connection ID
-+ * @cookie: The cookie to set in the reply.
-+ *
-+ * Queues a message that has a KDBUS_ITEM_REPLY_TIMEOUT item attached.
-+ *
-+ * Return: 0 on success, negative errno on failure.
-+ */
-+int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie)
-+{
-+ return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_TIMEOUT);
-+}
-+
-+/**
-+ * kdbus_notify_reply_dead() - queue a 'dead' reply
-+ * @bus: Bus which queues the messages
-+ * @id: The destination's connection ID
-+ * @cookie: The cookie to set in the reply.
-+ *
-+ * Queues a message that has a KDBUS_ITEM_REPLY_DEAD item attached.
-+ *
-+ * Return: 0 on success, negative errno on failure.
-+ */
-+int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie)
-+{
-+ return kdbus_notify_reply(bus, id, cookie, KDBUS_ITEM_REPLY_DEAD);
-+}
-+
-+/**
-+ * kdbus_notify_name_change() - queue a notification about a name owner change
-+ * @bus: Bus which queues the messages
-+ * @type: The type of the notification; KDBUS_ITEM_NAME_ADD,
-+ * KDBUS_ITEM_NAME_CHANGE or KDBUS_ITEM_NAME_REMOVE
-+ * @old_id: The id of the connection that used to own the name
-+ * @new_id: The id of the new owner connection
-+ * @old_flags: The flags to pass in the KDBUS_ITEM flags field for
-+ * the old owner
-+ * @new_flags: The flags to pass in the KDBUS_ITEM flags field for
-+ * the new owner
-+ * @name: The name that was removed or assigned to a new owner
-+ *
-+ * Return: 0 on success, negative errno on failure.
-+ */
-+int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
-+ u64 old_id, u64 new_id,
-+ u64 old_flags, u64 new_flags,
-+ const char *name)
-+{
-+ size_t name_len, extra_size;
-+ struct kdbus_staging *s;
-+
-+ name_len = strlen(name) + 1;
-+ extra_size = sizeof(struct kdbus_notify_name_change) + name_len;
-+
-+ s = kdbus_staging_new_kernel(bus, KDBUS_DST_ID_BROADCAST, 0,
-+ extra_size, type);
-+ if (IS_ERR(s))
-+ return PTR_ERR(s);
-+
-+ s->notify->name_change.old_id.id = old_id;
-+ s->notify->name_change.old_id.flags = old_flags;
-+ s->notify->name_change.new_id.id = new_id;
-+ s->notify->name_change.new_id.flags = new_flags;
-+ memcpy(s->notify->name_change.name, name, name_len);
-+
-+ kdbus_notify_add_tail(s, bus);
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_notify_id_change() - queue a notification about a unique ID change
-+ * @bus: Bus which queues the messages
-+ * @type: The type of the notification; KDBUS_ITEM_ID_ADD or
-+ * KDBUS_ITEM_ID_REMOVE
-+ * @id: The id of the connection that was added or removed
-+ * @flags: The flags to pass in the KDBUS_ITEM flags field
-+ *
-+ * Return: 0 on success, negative errno on failure.
-+ */
-+int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags)
-+{
-+ struct kdbus_staging *s;
-+ size_t extra_size;
-+
-+ extra_size = sizeof(struct kdbus_notify_id_change);
-+ s = kdbus_staging_new_kernel(bus, KDBUS_DST_ID_BROADCAST, 0,
-+ extra_size, type);
-+ if (IS_ERR(s))
-+ return PTR_ERR(s);
-+
-+ s->notify->id_change.id = id;
-+ s->notify->id_change.flags = flags;
-+
-+ kdbus_notify_add_tail(s, bus);
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_notify_flush() - send a list of collected messages
-+ * @bus: Bus which queues the messages
-+ *
-+ * The list is empty after sending the messages.
-+ */
-+void kdbus_notify_flush(struct kdbus_bus *bus)
-+{
-+ LIST_HEAD(notify_list);
-+ struct kdbus_staging *s, *tmp;
-+
-+ mutex_lock(&bus->notify_flush_lock);
-+ down_read(&bus->name_registry->rwlock);
-+
-+ spin_lock(&bus->notify_lock);
-+ list_splice_init(&bus->notify_list, &notify_list);
-+ spin_unlock(&bus->notify_lock);
-+
-+ list_for_each_entry_safe(s, tmp, &notify_list, notify_entry) {
-+ if (s->msg->dst_id != KDBUS_DST_ID_BROADCAST) {
-+ struct kdbus_conn *conn;
-+
-+ conn = kdbus_bus_find_conn_by_id(bus, s->msg->dst_id);
-+ if (conn) {
-+ kdbus_bus_eavesdrop(bus, NULL, s);
-+ kdbus_conn_entry_insert(NULL, conn, s, NULL,
-+ NULL);
-+ kdbus_conn_unref(conn);
-+ }
-+ } else {
-+ kdbus_bus_broadcast(bus, NULL, s);
-+ }
-+
-+ list_del(&s->notify_entry);
-+ kdbus_staging_free(s);
-+ }
-+
-+ up_read(&bus->name_registry->rwlock);
-+ mutex_unlock(&bus->notify_flush_lock);
-+}
-+
-+/**
-+ * kdbus_notify_free() - free a list of collected messages
-+ * @bus: Bus which queues the messages
-+ */
-+void kdbus_notify_free(struct kdbus_bus *bus)
-+{
-+ struct kdbus_staging *s, *tmp;
-+
-+ list_for_each_entry_safe(s, tmp, &bus->notify_list, notify_entry) {
-+ list_del(&s->notify_entry);
-+ kdbus_staging_free(s);
-+ }
-+}
-diff --git a/ipc/kdbus/notify.h b/ipc/kdbus/notify.h
-new file mode 100644
-index 0000000..03df464
---- /dev/null
-+++ b/ipc/kdbus/notify.h
-@@ -0,0 +1,30 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_NOTIFY_H
-+#define __KDBUS_NOTIFY_H
-+
-+struct kdbus_bus;
-+
-+int kdbus_notify_id_change(struct kdbus_bus *bus, u64 type, u64 id, u64 flags);
-+int kdbus_notify_reply_timeout(struct kdbus_bus *bus, u64 id, u64 cookie);
-+int kdbus_notify_reply_dead(struct kdbus_bus *bus, u64 id, u64 cookie);
-+int kdbus_notify_name_change(struct kdbus_bus *bus, u64 type,
-+ u64 old_id, u64 new_id,
-+ u64 old_flags, u64 new_flags,
-+ const char *name);
-+void kdbus_notify_flush(struct kdbus_bus *bus);
-+void kdbus_notify_free(struct kdbus_bus *bus);
-+
-+#endif
-diff --git a/ipc/kdbus/policy.c b/ipc/kdbus/policy.c
-new file mode 100644
-index 0000000..f2618e15
---- /dev/null
-+++ b/ipc/kdbus/policy.c
-@@ -0,0 +1,489 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/dcache.h>
-+#include <linux/fs.h>
-+#include <linux/init.h>
-+#include <linux/mutex.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "domain.h"
-+#include "item.h"
-+#include "names.h"
-+#include "policy.h"
-+
-+#define KDBUS_POLICY_HASH_SIZE 64
-+
-+/**
-+ * struct kdbus_policy_db_entry_access - a database entry access item
-+ * @type: One of KDBUS_POLICY_ACCESS_* types
-+ * @access: Access to grant. One of KDBUS_POLICY_*
-+ * @uid: For KDBUS_POLICY_ACCESS_USER, the global uid
-+ * @gid: For KDBUS_POLICY_ACCESS_GROUP, the global gid
-+ * @list: List entry item for the entry's list
-+ *
-+ * This is the internal version of struct kdbus_policy_db_access.
-+ */
-+struct kdbus_policy_db_entry_access {
-+ u8 type; /* USER, GROUP, WORLD */
-+ u8 access; /* OWN, TALK, SEE */
-+ union {
-+ kuid_t uid; /* global uid */
-+ kgid_t gid; /* global gid */
-+ };
-+ struct list_head list;
-+};
-+
-+/**
-+ * struct kdbus_policy_db_entry - a policy database entry
-+ * @name: The name to match the policy entry against
-+ * @hentry: The hash entry for the database's entries_hash
-+ * @access_list: List head for keeping track of the entry's
-+ * access items.
-+ * @owner: The owner of this entry. Can be a kdbus_conn or
-+ * a kdbus_ep object.
-+ * @wildcard: The name is a wildcard, such as ending on '.*'
-+ */
-+struct kdbus_policy_db_entry {
-+ char *name;
-+ struct hlist_node hentry;
-+ struct list_head access_list;
-+ const void *owner;
-+ bool wildcard:1;
-+};
-+
-+static void kdbus_policy_entry_free(struct kdbus_policy_db_entry *e)
-+{
-+ struct kdbus_policy_db_entry_access *a, *tmp;
-+
-+ list_for_each_entry_safe(a, tmp, &e->access_list, list) {
-+ list_del(&a->list);
-+ kfree(a);
-+ }
-+
-+ kfree(e->name);
-+ kfree(e);
-+}
-+
-+static unsigned int kdbus_strnhash(const char *str, size_t len)
-+{
-+ unsigned long hash = init_name_hash();
-+
-+ while (len--)
-+ hash = partial_name_hash(*str++, hash);
-+
-+ return end_name_hash(hash);
-+}
-+
-+static const struct kdbus_policy_db_entry *
-+kdbus_policy_lookup(struct kdbus_policy_db *db, const char *name, u32 hash)
-+{
-+ struct kdbus_policy_db_entry *e;
-+ const char *dot;
-+ size_t len;
-+
-+ /* find exact match */
-+ hash_for_each_possible(db->entries_hash, e, hentry, hash)
-+ if (strcmp(e->name, name) == 0 && !e->wildcard)
-+ return e;
-+
-+ /* find wildcard match */
-+
-+ dot = strrchr(name, '.');
-+ if (!dot)
-+ return NULL;
-+
-+ len = dot - name;
-+ hash = kdbus_strnhash(name, len);
-+
-+ hash_for_each_possible(db->entries_hash, e, hentry, hash)
-+ if (e->wildcard && !strncmp(e->name, name, len) &&
-+ !e->name[len])
-+ return e;
-+
-+ return NULL;
-+}
-+
-+/**
-+ * kdbus_policy_db_clear - release all memory from a policy db
-+ * @db: The policy database
-+ */
-+void kdbus_policy_db_clear(struct kdbus_policy_db *db)
-+{
-+ struct kdbus_policy_db_entry *e;
-+ struct hlist_node *tmp;
-+ unsigned int i;
-+
-+ /* purge entries */
-+ down_write(&db->entries_rwlock);
-+ hash_for_each_safe(db->entries_hash, i, tmp, e, hentry) {
-+ hash_del(&e->hentry);
-+ kdbus_policy_entry_free(e);
-+ }
-+ up_write(&db->entries_rwlock);
-+}
-+
-+/**
-+ * kdbus_policy_db_init() - initialize a new policy database
-+ * @db: The location of the database
-+ *
-+ * This initializes a new policy-db. The underlying memory must have been
-+ * cleared to zero by the caller.
-+ */
-+void kdbus_policy_db_init(struct kdbus_policy_db *db)
-+{
-+ hash_init(db->entries_hash);
-+ init_rwsem(&db->entries_rwlock);
-+}
-+
-+/**
-+ * kdbus_policy_query_unlocked() - Query the policy database
-+ * @db: Policy database
-+ * @cred: Credentials to test against
-+ * @name: Name to query
-+ * @hash: Hash value of @name
-+ *
-+ * Same as kdbus_policy_query() but requires the caller to lock the policy
-+ * database against concurrent writes.
-+ *
-+ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
-+ */
-+int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
-+ const struct cred *cred, const char *name,
-+ unsigned int hash)
-+{
-+ struct kdbus_policy_db_entry_access *a;
-+ const struct kdbus_policy_db_entry *e;
-+ int i, highest = -EPERM;
-+
-+ e = kdbus_policy_lookup(db, name, hash);
-+ if (!e)
-+ return -EPERM;
-+
-+ list_for_each_entry(a, &e->access_list, list) {
-+ if ((int)a->access <= highest)
-+ continue;
-+
-+ switch (a->type) {
-+ case KDBUS_POLICY_ACCESS_USER:
-+ if (uid_eq(cred->euid, a->uid))
-+ highest = a->access;
-+ break;
-+ case KDBUS_POLICY_ACCESS_GROUP:
-+ if (gid_eq(cred->egid, a->gid)) {
-+ highest = a->access;
-+ break;
-+ }
-+
-+ for (i = 0; i < cred->group_info->ngroups; i++) {
-+ kgid_t gid = GROUP_AT(cred->group_info, i);
-+
-+ if (gid_eq(gid, a->gid)) {
-+ highest = a->access;
-+ break;
-+ }
-+ }
-+
-+ break;
-+ case KDBUS_POLICY_ACCESS_WORLD:
-+ highest = a->access;
-+ break;
-+ }
-+
-+ /* OWN is the highest possible policy */
-+ if (highest >= KDBUS_POLICY_OWN)
-+ break;
-+ }
-+
-+ return highest;
-+}
-+
-+/**
-+ * kdbus_policy_query() - Query the policy database
-+ * @db: Policy database
-+ * @cred: Credentials to test against
-+ * @name: Name to query
-+ * @hash: Hash value of @name
-+ *
-+ * Query the policy database @db for the access rights of @cred to the name
-+ * @name. The access rights of @cred are returned, or -EPERM if no access is
-+ * granted.
-+ *
-+ * This call effectively searches for the highest access-right granted to
-+ * @cred. The caller should really cache those as policy lookups are rather
-+ * expensive.
-+ *
-+ * Return: The highest KDBUS_POLICY_* access type found, or -EPERM if none.
-+ */
-+int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
-+ const char *name, unsigned int hash)
-+{
-+ int ret;
-+
-+ down_read(&db->entries_rwlock);
-+ ret = kdbus_policy_query_unlocked(db, cred, name, hash);
-+ up_read(&db->entries_rwlock);
-+
-+ return ret;
-+}
-+
-+static void __kdbus_policy_remove_owner(struct kdbus_policy_db *db,
-+ const void *owner)
-+{
-+ struct kdbus_policy_db_entry *e;
-+ struct hlist_node *tmp;
-+ int i;
-+
-+ hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
-+ if (e->owner == owner) {
-+ hash_del(&e->hentry);
-+ kdbus_policy_entry_free(e);
-+ }
-+}
-+
-+/**
-+ * kdbus_policy_remove_owner() - remove all entries related to a connection
-+ * @db: The policy database
-+ * @owner: The connection which items to remove
-+ */
-+void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
-+ const void *owner)
-+{
-+ down_write(&db->entries_rwlock);
-+ __kdbus_policy_remove_owner(db, owner);
-+ up_write(&db->entries_rwlock);
-+}
-+
-+/*
-+ * Convert user provided policy access to internal kdbus policy
-+ * access
-+ */
-+static struct kdbus_policy_db_entry_access *
-+kdbus_policy_make_access(const struct kdbus_policy_access *uaccess)
-+{
-+ int ret;
-+ struct kdbus_policy_db_entry_access *a;
-+
-+ a = kzalloc(sizeof(*a), GFP_KERNEL);
-+ if (!a)
-+ return ERR_PTR(-ENOMEM);
-+
-+ ret = -EINVAL;
-+ switch (uaccess->access) {
-+ case KDBUS_POLICY_SEE:
-+ case KDBUS_POLICY_TALK:
-+ case KDBUS_POLICY_OWN:
-+ a->access = uaccess->access;
-+ break;
-+ default:
-+ goto err;
-+ }
-+
-+ switch (uaccess->type) {
-+ case KDBUS_POLICY_ACCESS_USER:
-+ a->uid = make_kuid(current_user_ns(), uaccess->id);
-+ if (!uid_valid(a->uid))
-+ goto err;
-+
-+ break;
-+ case KDBUS_POLICY_ACCESS_GROUP:
-+ a->gid = make_kgid(current_user_ns(), uaccess->id);
-+ if (!gid_valid(a->gid))
-+ goto err;
-+
-+ break;
-+ case KDBUS_POLICY_ACCESS_WORLD:
-+ break;
-+ default:
-+ goto err;
-+ }
-+
-+ a->type = uaccess->type;
-+
-+ return a;
-+
-+err:
-+ kfree(a);
-+ return ERR_PTR(ret);
-+}
-+
-+/**
-+ * kdbus_policy_set() - set a connection's policy rules
-+ * @db: The policy database
-+ * @items: A list of kdbus_item elements that contain both
-+ * names and access rules to set.
-+ * @items_size: The total size of the items.
-+ * @max_policies: The maximum number of policy entries to allow.
-+ * Pass 0 for no limit.
-+ * @allow_wildcards: Whether wildcard entries (names ending
-+ * in '.*') should be allowed.
-+ * @owner: The owner of the new policy items.
-+ *
-+ * This function sets a new set of policies for a given owner. The names and
-+ * access rules are gathered by walking the list of items passed in as
-+ * argument. An item of type KDBUS_ITEM_NAME is expected before any number of
-+ * KDBUS_ITEM_POLICY_ACCESS items. If there are more repetitions of this
-+ * pattern than denoted in @max_policies, -E2BIG is returned.
-+ *
-+ * In order to allow atomic replacement of rules, the function first removes
-+ * all entries that have been created for the given owner previously.
-+ *
-+ * Callers to this function must make sure that the owner is a custom
-+ * endpoint, or if the endpoint is a default endpoint, then it must be
-+ * either a policy holder or an activator.
-+ *
-+ * Return: 0 on success, negative errno on failure.
-+ */
-+int kdbus_policy_set(struct kdbus_policy_db *db,
-+ const struct kdbus_item *items,
-+ size_t items_size,
-+ size_t max_policies,
-+ bool allow_wildcards,
-+ const void *owner)
-+{
-+ struct kdbus_policy_db_entry_access *a;
-+ struct kdbus_policy_db_entry *e, *p;
-+ const struct kdbus_item *item;
-+ struct hlist_node *tmp;
-+ HLIST_HEAD(entries);
-+ HLIST_HEAD(restore);
-+ size_t count = 0;
-+ int i, ret = 0;
-+ u32 hash;
-+
-+ /* Walk the list of items and look for new policies */
-+ e = NULL;
-+ KDBUS_ITEMS_FOREACH(item, items, items_size) {
-+ switch (item->type) {
-+ case KDBUS_ITEM_NAME: {
-+ size_t len;
-+
-+ if (max_policies && ++count > max_policies) {
-+ ret = -E2BIG;
-+ goto exit;
-+ }
-+
-+ if (!kdbus_name_is_valid(item->str, true)) {
-+ ret = -EINVAL;
-+ goto exit;
-+ }
-+
-+ e = kzalloc(sizeof(*e), GFP_KERNEL);
-+ if (!e) {
-+ ret = -ENOMEM;
-+ goto exit;
-+ }
-+
-+ INIT_LIST_HEAD(&e->access_list);
-+ e->owner = owner;
-+ hlist_add_head(&e->hentry, &entries);
-+
-+ e->name = kstrdup(item->str, GFP_KERNEL);
-+ if (!e->name) {
-+ ret = -ENOMEM;
-+ goto exit;
-+ }
-+
-+ /*
-+ * If a supplied name ends with an '.*', cut off that
-+ * part, only store anything before it, and mark the
-+ * entry as wildcard.
-+ */
-+ len = strlen(e->name);
-+ if (len > 2 &&
-+ e->name[len - 3] == '.' &&
-+ e->name[len - 2] == '*') {
-+ if (!allow_wildcards) {
-+ ret = -EINVAL;
-+ goto exit;
-+ }
-+
-+ e->name[len - 3] = '\0';
-+ e->wildcard = true;
-+ }
-+
-+ break;
-+ }
-+
-+ case KDBUS_ITEM_POLICY_ACCESS:
-+ if (!e) {
-+ ret = -EINVAL;
-+ goto exit;
-+ }
-+
-+ a = kdbus_policy_make_access(&item->policy_access);
-+ if (IS_ERR(a)) {
-+ ret = PTR_ERR(a);
-+ goto exit;
-+ }
-+
-+ list_add_tail(&a->list, &e->access_list);
-+ break;
-+ }
-+ }
-+
-+ down_write(&db->entries_rwlock);
-+
-+ /* remember previous entries to restore in case of failure */
-+ hash_for_each_safe(db->entries_hash, i, tmp, e, hentry)
-+ if (e->owner == owner) {
-+ hash_del(&e->hentry);
-+ hlist_add_head(&e->hentry, &restore);
-+ }
-+
-+ hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
-+ /* prevent duplicates */
-+ hash = kdbus_strhash(e->name);
-+ hash_for_each_possible(db->entries_hash, p, hentry, hash)
-+ if (strcmp(e->name, p->name) == 0 &&
-+ e->wildcard == p->wildcard) {
-+ ret = -EEXIST;
-+ goto restore;
-+ }
-+
-+ hlist_del(&e->hentry);
-+ hash_add(db->entries_hash, &e->hentry, hash);
-+ }
-+
-+restore:
-+ /* if we failed, flush all entries we added so far */
-+ if (ret < 0)
-+ __kdbus_policy_remove_owner(db, owner);
-+
-+ /* if we failed, restore entries, otherwise release them */
-+ hlist_for_each_entry_safe(e, tmp, &restore, hentry) {
-+ hlist_del(&e->hentry);
-+ if (ret < 0) {
-+ hash = kdbus_strhash(e->name);
-+ hash_add(db->entries_hash, &e->hentry, hash);
-+ } else {
-+ kdbus_policy_entry_free(e);
-+ }
-+ }
-+
-+ up_write(&db->entries_rwlock);
-+
-+exit:
-+ hlist_for_each_entry_safe(e, tmp, &entries, hentry) {
-+ hlist_del(&e->hentry);
-+ kdbus_policy_entry_free(e);
-+ }
-+
-+ return ret;
-+}
-diff --git a/ipc/kdbus/policy.h b/ipc/kdbus/policy.h
-new file mode 100644
-index 0000000..15dd7bc
---- /dev/null
-+++ b/ipc/kdbus/policy.h
-@@ -0,0 +1,51 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_POLICY_H
-+#define __KDBUS_POLICY_H
-+
-+#include <linux/hashtable.h>
-+#include <linux/rwsem.h>
-+
-+struct kdbus_conn;
-+struct kdbus_item;
-+
-+/**
-+ * struct kdbus_policy_db - policy database
-+ * @entries_hash: Hashtable of entries
-+ * @entries_rwlock: Read-write semaphore protecting the database's access entries
-+ */
-+struct kdbus_policy_db {
-+ DECLARE_HASHTABLE(entries_hash, 6);
-+ struct rw_semaphore entries_rwlock;
-+};
-+
-+void kdbus_policy_db_init(struct kdbus_policy_db *db);
-+void kdbus_policy_db_clear(struct kdbus_policy_db *db);
-+
-+int kdbus_policy_query_unlocked(struct kdbus_policy_db *db,
-+ const struct cred *cred, const char *name,
-+ unsigned int hash);
-+int kdbus_policy_query(struct kdbus_policy_db *db, const struct cred *cred,
-+ const char *name, unsigned int hash);
-+
-+void kdbus_policy_remove_owner(struct kdbus_policy_db *db,
-+ const void *owner);
-+int kdbus_policy_set(struct kdbus_policy_db *db,
-+ const struct kdbus_item *items,
-+ size_t items_size,
-+ size_t max_policies,
-+ bool allow_wildcards,
-+ const void *owner);
-+
-+#endif
-diff --git a/ipc/kdbus/pool.c b/ipc/kdbus/pool.c
-new file mode 100644
-index 0000000..63ccd55
---- /dev/null
-+++ b/ipc/kdbus/pool.c
-@@ -0,0 +1,728 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/aio.h>
-+#include <linux/file.h>
-+#include <linux/fs.h>
-+#include <linux/highmem.h>
-+#include <linux/init.h>
-+#include <linux/mm.h>
-+#include <linux/module.h>
-+#include <linux/pagemap.h>
-+#include <linux/rbtree.h>
-+#include <linux/sched.h>
-+#include <linux/shmem_fs.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/uaccess.h>
-+#include <linux/uio.h>
-+
-+#include "pool.h"
-+#include "util.h"
-+
-+/**
-+ * struct kdbus_pool - the receiver's buffer
-+ * @f: The backing shmem file
-+ * @size: The size of the file
-+ * @accounted_size: Currently accounted memory in bytes
-+ * @lock: Pool data lock
-+ * @slices: All slices sorted by address
-+ * @slices_busy: Tree of allocated slices
-+ * @slices_free: Tree of free slices
-+ *
-+ * The receiver's buffer, managed as a pool of allocated and free
-+ * slices containing the queued messages.
-+ *
-+ * Messages sent with KDBUS_CMD_SEND are copied directly by the
-+ * sending process into the receiver's pool.
-+ *
-+ * Messages received with KDBUS_CMD_RECV just return the offset
-+ * to the data placed in the pool.
-+ *
-+ * The internally allocated memory needs to be returned by the receiver
-+ * with KDBUS_CMD_FREE.
-+ */
-+struct kdbus_pool {
-+ struct file *f;
-+ size_t size;
-+ size_t accounted_size;
-+ struct mutex lock;
-+
-+ struct list_head slices;
-+ struct rb_root slices_busy;
-+ struct rb_root slices_free;
-+};
-+
-+/**
-+ * struct kdbus_pool_slice - allocated element in kdbus_pool
-+ * @pool: Pool this slice belongs to
-+ * @off: Offset of slice in the shmem file
-+ * @size: Size of slice
-+ * @entry: Entry in "all slices" list
-+ * @rb_node: Entry in free or busy list
-+ * @free: Unused slice
-+ * @accounted: Accounted as queue slice
-+ * @ref_kernel: Kernel holds a reference
-+ * @ref_user: Userspace holds a reference
-+ *
-+ * The pool has one or more slices, always spanning the entire size of the
-+ * pool.
-+ *
-+ * Every slice is an element in a list sorted by the buffer address, to
-+ * provide access to the next neighbor slice.
-+ *
-+ * Every slice is member in either the busy or the free tree. The free
-+ * tree is organized by slice size, the busy tree organized by buffer
-+ * offset.
-+ */
-+struct kdbus_pool_slice {
-+ struct kdbus_pool *pool;
-+ size_t off;
-+ size_t size;
-+
-+ struct list_head entry;
-+ struct rb_node rb_node;
-+
-+ bool free:1;
-+ bool accounted:1;
-+ bool ref_kernel:1;
-+ bool ref_user:1;
-+};
-+
-+static struct kdbus_pool_slice *kdbus_pool_slice_new(struct kdbus_pool *pool,
-+ size_t off, size_t size)
-+{
-+ struct kdbus_pool_slice *slice;
-+
-+ slice = kzalloc(sizeof(*slice), GFP_KERNEL);
-+ if (!slice)
-+ return NULL;
-+
-+ slice->pool = pool;
-+ slice->off = off;
-+ slice->size = size;
-+ slice->free = true;
-+ return slice;
-+}
-+
-+/* insert a slice into the free tree */
-+static void kdbus_pool_add_free_slice(struct kdbus_pool *pool,
-+ struct kdbus_pool_slice *slice)
-+{
-+ struct rb_node **n;
-+ struct rb_node *pn = NULL;
-+
-+ n = &pool->slices_free.rb_node;
-+ while (*n) {
-+ struct kdbus_pool_slice *pslice;
-+
-+ pn = *n;
-+ pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
-+ if (slice->size < pslice->size)
-+ n = &pn->rb_left;
-+ else
-+ n = &pn->rb_right;
-+ }
-+
-+ rb_link_node(&slice->rb_node, pn, n);
-+ rb_insert_color(&slice->rb_node, &pool->slices_free);
-+}
-+
-+/* insert a slice into the busy tree */
-+static void kdbus_pool_add_busy_slice(struct kdbus_pool *pool,
-+ struct kdbus_pool_slice *slice)
-+{
-+ struct rb_node **n;
-+ struct rb_node *pn = NULL;
-+
-+ n = &pool->slices_busy.rb_node;
-+ while (*n) {
-+ struct kdbus_pool_slice *pslice;
-+
-+ pn = *n;
-+ pslice = rb_entry(pn, struct kdbus_pool_slice, rb_node);
-+ if (slice->off < pslice->off)
-+ n = &pn->rb_left;
-+ else if (slice->off > pslice->off)
-+ n = &pn->rb_right;
-+ else
-+ BUG();
-+ }
-+
-+ rb_link_node(&slice->rb_node, pn, n);
-+ rb_insert_color(&slice->rb_node, &pool->slices_busy);
-+}
-+
-+static struct kdbus_pool_slice *kdbus_pool_find_slice(struct kdbus_pool *pool,
-+ size_t off)
-+{
-+ struct rb_node *n;
-+
-+ n = pool->slices_busy.rb_node;
-+ while (n) {
-+ struct kdbus_pool_slice *s;
-+
-+ s = rb_entry(n, struct kdbus_pool_slice, rb_node);
-+ if (off < s->off)
-+ n = n->rb_left;
-+ else if (off > s->off)
-+ n = n->rb_right;
-+ else
-+ return s;
-+ }
-+
-+ return NULL;
-+}
-+
-+/**
-+ * kdbus_pool_slice_alloc() - allocate memory from a pool
-+ * @pool: The receiver's pool
-+ * @size: The number of bytes to allocate
-+ * @accounted: Whether this slice should be accounted for
-+ *
-+ * The returned slice is passed to kdbus_pool_slice_release() to
-+ * free the allocated memory. The slice starts out with a kernel reference
-+ * and no user-space reference. If @accounted is true, the slice size is
-+ * added to the pool's accounted size.
-+ *
-+ * Return: the allocated slice on success, ERR_PTR on failure.
-+ */
-+struct kdbus_pool_slice *kdbus_pool_slice_alloc(struct kdbus_pool *pool,
-+ size_t size, bool accounted)
-+{
-+ size_t slice_size = KDBUS_ALIGN8(size);
-+ struct rb_node *n, *found = NULL;
-+ struct kdbus_pool_slice *s;
-+ int ret = 0;
-+
-+ if (WARN_ON(!size))
-+ return ERR_PTR(-EINVAL);
-+
-+ /* search a free slice with the closest matching size */
-+ mutex_lock(&pool->lock);
-+ n = pool->slices_free.rb_node;
-+ while (n) {
-+ s = rb_entry(n, struct kdbus_pool_slice, rb_node);
-+ if (slice_size < s->size) {
-+ found = n;
-+ n = n->rb_left;
-+ } else if (slice_size > s->size) {
-+ n = n->rb_right;
-+ } else {
-+ found = n;
-+ break;
-+ }
-+ }
-+
-+ /* no slice with the minimum size found in the pool */
-+ if (!found) {
-+ ret = -EXFULL;
-+ goto exit_unlock;
-+ }
-+
-+ /* no exact match, use the closest one */
-+ if (!n) {
-+ struct kdbus_pool_slice *s_new;
-+
-+ s = rb_entry(found, struct kdbus_pool_slice, rb_node);
-+
-+ /* split-off the remainder of the size to its own slice */
-+ s_new = kdbus_pool_slice_new(pool, s->off + slice_size,
-+ s->size - slice_size);
-+ if (!s_new) {
-+ ret = -ENOMEM;
-+ goto exit_unlock;
-+ }
-+
-+ list_add(&s_new->entry, &s->entry);
-+ kdbus_pool_add_free_slice(pool, s_new);
-+
-+ /* adjust our size now that we split-off another slice */
-+ s->size = slice_size;
-+ }
-+
-+ /* move slice from free to the busy tree */
-+ rb_erase(found, &pool->slices_free);
-+ kdbus_pool_add_busy_slice(pool, s);
-+
-+ WARN_ON(s->ref_kernel || s->ref_user);
-+
-+ s->ref_kernel = true;
-+ s->free = false;
-+ s->accounted = accounted;
-+ if (accounted)
-+ pool->accounted_size += s->size;
-+ mutex_unlock(&pool->lock);
-+
-+ return s;
-+
-+exit_unlock:
-+ mutex_unlock(&pool->lock);
-+ return ERR_PTR(ret);
-+}
-+
-+static void __kdbus_pool_slice_release(struct kdbus_pool_slice *slice)
-+{
-+ struct kdbus_pool *pool = slice->pool;
-+
-+ /* don't free the slice if either has a reference */
-+ if (slice->ref_kernel || slice->ref_user)
-+ return;
-+
-+ if (WARN_ON(slice->free))
-+ return;
-+
-+ rb_erase(&slice->rb_node, &pool->slices_busy);
-+
-+ /* merge with the next free slice */
-+ if (!list_is_last(&slice->entry, &pool->slices)) {
-+ struct kdbus_pool_slice *s;
-+
-+ s = list_entry(slice->entry.next,
-+ struct kdbus_pool_slice, entry);
-+ if (s->free) {
-+ rb_erase(&s->rb_node, &pool->slices_free);
-+ list_del(&s->entry);
-+ slice->size += s->size;
-+ kfree(s);
-+ }
-+ }
-+
-+ /* merge with previous free slice */
-+ if (pool->slices.next != &slice->entry) {
-+ struct kdbus_pool_slice *s;
-+
-+ s = list_entry(slice->entry.prev,
-+ struct kdbus_pool_slice, entry);
-+ if (s->free) {
-+ rb_erase(&s->rb_node, &pool->slices_free);
-+ list_del(&slice->entry);
-+ s->size += slice->size;
-+ kfree(slice);
-+ slice = s;
-+ }
-+ }
-+
-+ slice->free = true;
-+ kdbus_pool_add_free_slice(pool, slice);
-+}
-+
-+/**
-+ * kdbus_pool_slice_release() - drop kernel-reference on allocated slice
-+ * @slice: Slice allocated from the pool
-+ *
-+ * This releases the kernel-reference on the given slice. If the
-+ * kernel-reference and the user-reference on a slice are dropped, the slice is
-+ * returned to the pool.
-+ *
-+ * So far, we do not implement full ref-counting on slices. The kernel and
-+ * user-space can each hold exactly one reference to a slice. Once both
-+ * have been dropped, the slice is released.
-+ */
-+void kdbus_pool_slice_release(struct kdbus_pool_slice *slice)
-+{
-+ struct kdbus_pool *pool;
-+
-+ if (!slice)
-+ return;
-+
-+ /* @slice may be freed, so keep local ptr to @pool */
-+ pool = slice->pool;
-+
-+ mutex_lock(&pool->lock);
-+ /* kernel must own a ref to @slice to drop it */
-+ WARN_ON(!slice->ref_kernel);
-+ slice->ref_kernel = false;
-+ /* no longer kernel-owned, de-account slice */
-+ if (slice->accounted && !WARN_ON(pool->accounted_size < slice->size))
-+ pool->accounted_size -= slice->size;
-+ __kdbus_pool_slice_release(slice);
-+ mutex_unlock(&pool->lock);
-+}
-+
-+/**
-+ * kdbus_pool_release_offset() - release a public offset
-+ * @pool: pool to operate on
-+ * @off: offset to release
-+ *
-+ * This should be called whenever user-space frees a slice given to them. It
-+ * verifies the slice is available and public, and then drops it. It ensures
-+ * correct locking and barriers against queues.
-+ *
-+ * Return: 0 on success, -ENXIO if the offset is invalid or not public.
-+ */
-+int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off)
-+{
-+ struct kdbus_pool_slice *slice;
-+ int ret = 0;
-+
-+ /* 'pool->size' is used as dummy offset for empty slices */
-+ if (off == pool->size)
-+ return 0;
-+
-+ mutex_lock(&pool->lock);
-+ slice = kdbus_pool_find_slice(pool, off);
-+ if (slice && slice->ref_user) {
-+ slice->ref_user = false;
-+ __kdbus_pool_slice_release(slice);
-+ } else {
-+ ret = -ENXIO;
-+ }
-+ mutex_unlock(&pool->lock);
-+
-+ return ret;
-+}
-+
-+/**
-+ * kdbus_pool_publish_empty() - publish empty slice to user-space
-+ * @pool: pool to operate on
-+ * @off: output storage for offset, or NULL
-+ * @size: output storage for size, or NULL
-+ *
-+ * This is the same as kdbus_pool_slice_publish(), but uses a dummy slice with
-+ * size 0. The returned offset points to the end of the pool and is never
-+ * returned on real slices.
-+ */
-+void kdbus_pool_publish_empty(struct kdbus_pool *pool, u64 *off, u64 *size)
-+{
-+ if (off)
-+ *off = pool->size;
-+ if (size)
-+ *size = 0;
-+}
-+
-+/**
-+ * kdbus_pool_slice_publish() - publish slice to user-space
-+ * @slice: The slice
-+ * @out_offset: Output storage for offset, or NULL
-+ * @out_size: Output storage for size, or NULL
-+ *
-+ * This prepares a slice to be published to user-space.
-+ *
-+ * This call combines the following operations:
-+ * * the memory region is flushed so the user's memory view is consistent
-+ * * the slice is marked as referenced by user-space, so user-space has to
-+ * call KDBUS_CMD_FREE to release it
-+ * * the offset and size of the slice are written to the given output
-+ * arguments, if non-NULL
-+ */
-+void kdbus_pool_slice_publish(struct kdbus_pool_slice *slice,
-+ u64 *out_offset, u64 *out_size)
-+{
-+ mutex_lock(&slice->pool->lock);
-+ /* kernel must own a ref to @slice to gain a user-space ref */
-+ WARN_ON(!slice->ref_kernel);
-+ slice->ref_user = true;
-+ mutex_unlock(&slice->pool->lock);
-+
-+ if (out_offset)
-+ *out_offset = slice->off;
-+ if (out_size)
-+ *out_size = slice->size;
-+}
-+
-+/**
-+ * kdbus_pool_slice_offset() - Get a slice's offset inside the pool
-+ * @slice: Slice to return the offset of
-+ *
-+ * Return: The internal offset of @slice inside the pool.
-+ */
-+off_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice)
-+{
-+ return slice->off;
-+}
-+
-+/**
-+ * kdbus_pool_slice_size() - get size of a pool slice
-+ * @slice: slice to query
-+ *
-+ * Return: size of the given slice
-+ */
-+size_t kdbus_pool_slice_size(const struct kdbus_pool_slice *slice)
-+{
-+ return slice->size;
-+}
-+
-+/**
-+ * kdbus_pool_new() - create a new pool
-+ * @name: Name of the (deleted) file which shows up in
-+ * /proc, used for debugging
-+ * @size: Maximum size of the pool
-+ *
-+ * Return: a new kdbus_pool on success, ERR_PTR on failure.
-+ */
-+struct kdbus_pool *kdbus_pool_new(const char *name, size_t size)
-+{
-+ struct kdbus_pool_slice *s;
-+ struct kdbus_pool *p;
-+ struct file *f;
-+ char *n = NULL;
-+ int ret;
-+
-+ p = kzalloc(sizeof(*p), GFP_KERNEL);
-+ if (!p)
-+ return ERR_PTR(-ENOMEM);
-+
-+ if (name) {
-+ n = kasprintf(GFP_KERNEL, KBUILD_MODNAME "-conn:%s", name);
-+ if (!n) {
-+ ret = -ENOMEM;
-+ goto exit_free;
-+ }
-+ }
-+
-+ f = shmem_file_setup(n ?: KBUILD_MODNAME "-conn", size, 0);
-+ kfree(n);
-+
-+ if (IS_ERR(f)) {
-+ ret = PTR_ERR(f);
-+ goto exit_free;
-+ }
-+
-+ ret = get_write_access(file_inode(f));
-+ if (ret < 0)
-+ goto exit_put_shmem;
-+
-+ /* allocate first slice spanning the entire pool */
-+ s = kdbus_pool_slice_new(p, 0, size);
-+ if (!s) {
-+ ret = -ENOMEM;
-+ goto exit_put_write;
-+ }
-+
-+ p->f = f;
-+ p->size = size;
-+ p->slices_free = RB_ROOT;
-+ p->slices_busy = RB_ROOT;
-+ mutex_init(&p->lock);
-+
-+ INIT_LIST_HEAD(&p->slices);
-+ list_add(&s->entry, &p->slices);
-+
-+ kdbus_pool_add_free_slice(p, s);
-+ return p;
-+
-+exit_put_write:
-+ put_write_access(file_inode(f));
-+exit_put_shmem:
-+ fput(f);
-+exit_free:
-+ kfree(p);
-+ return ERR_PTR(ret);
-+}
-+
-+/**
-+ * kdbus_pool_free() - destroy pool
-+ * @pool: The receiver's pool
-+ */
-+void kdbus_pool_free(struct kdbus_pool *pool)
-+{
-+ struct kdbus_pool_slice *s, *tmp;
-+
-+ if (!pool)
-+ return;
-+
-+ list_for_each_entry_safe(s, tmp, &pool->slices, entry) {
-+ list_del(&s->entry);
-+ kfree(s);
-+ }
-+
-+ put_write_access(file_inode(pool->f));
-+ fput(pool->f);
-+ kfree(pool);
-+}
-+
-+/**
-+ * kdbus_pool_accounted() - retrieve accounting information
-+ * @pool: pool to query
-+ * @size: output for overall pool size
-+ * @acc: output for currently accounted size
-+ *
-+ * This returns accounting information of the pool. Note that the data might
-+ * change after the function returns, as the pool lock is dropped. You need to
-+ * protect the data via other means, if you need reliable accounting.
-+ */
-+void kdbus_pool_accounted(struct kdbus_pool *pool, size_t *size, size_t *acc)
-+{
-+ mutex_lock(&pool->lock);
-+ if (size)
-+ *size = pool->size;
-+ if (acc)
-+ *acc = pool->accounted_size;
-+ mutex_unlock(&pool->lock);
-+}
-+
-+/**
-+ * kdbus_pool_slice_copy_iovec() - copy user memory to a slice
-+ * @slice: The slice to write to
-+ * @off: Offset in the slice to write to
-+ * @iov: iovec array, pointing to data to copy
-+ * @iov_len: Number of elements in @iov
-+ * @total_len: Total number of bytes described in members of @iov
-+ *
-+ * User memory referenced by @iov will be copied into @slice at offset @off.
-+ *
-+ * Return: the number of bytes copied, negative errno on failure.
-+ */
-+ssize_t
-+kdbus_pool_slice_copy_iovec(const struct kdbus_pool_slice *slice, loff_t off,
-+ struct iovec *iov, size_t iov_len, size_t total_len)
-+{
-+ struct iov_iter iter;
-+ ssize_t len;
-+
-+ if (WARN_ON(off + total_len > slice->size))
-+ return -EFAULT;
-+
-+ off += slice->off;
-+ iov_iter_init(&iter, WRITE, iov, iov_len, total_len);
-+ len = vfs_iter_write(slice->pool->f, &iter, &off);
-+
-+ return (len >= 0 && len != total_len) ? -EFAULT : len;
-+}
-+
-+/**
-+ * kdbus_pool_slice_copy_kvec() - copy kernel memory to a slice
-+ * @slice: The slice to write to
-+ * @off: Offset in the slice to write to
-+ * @kvec: kvec array, pointing to data to copy
-+ * @kvec_len: Number of elements in @kvec
-+ * @total_len: Total number of bytes described in members of @kvec
-+ *
-+ * Kernel memory referenced by @kvec will be copied into @slice at offset @off.
-+ *
-+ * Return: the number of bytes copied, negative errno on failure.
-+ */
-+ssize_t kdbus_pool_slice_copy_kvec(const struct kdbus_pool_slice *slice,
-+ loff_t off, struct kvec *kvec,
-+ size_t kvec_len, size_t total_len)
-+{
-+ struct iov_iter iter;
-+ mm_segment_t old_fs;
-+ ssize_t len;
-+
-+ if (WARN_ON(off + total_len > slice->size))
-+ return -EFAULT;
-+
-+ off += slice->off;
-+ iov_iter_kvec(&iter, WRITE | ITER_KVEC, kvec, kvec_len, total_len);
-+
-+ old_fs = get_fs();
-+ set_fs(get_ds());
-+ len = vfs_iter_write(slice->pool->f, &iter, &off);
-+ set_fs(old_fs);
-+
-+ return (len >= 0 && len != total_len) ? -EFAULT : len;
-+}
-+
-+/**
-+ * kdbus_pool_slice_copy() - copy data from one slice into another
-+ * @slice_dst: destination slice
-+ * @slice_src: source slice
-+ *
-+ * Return: 0 on success, negative error number on failure.
-+ */
-+int kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice_dst,
-+ const struct kdbus_pool_slice *slice_src)
-+{
-+ struct file *f_src = slice_src->pool->f;
-+ struct file *f_dst = slice_dst->pool->f;
-+ struct inode *i_dst = file_inode(f_dst);
-+ struct address_space *mapping_dst = f_dst->f_mapping;
-+ const struct address_space_operations *aops = mapping_dst->a_ops;
-+ unsigned long len = slice_src->size;
-+ loff_t off_src = slice_src->off;
-+ loff_t off_dst = slice_dst->off;
-+ mm_segment_t old_fs;
-+ int ret = 0;
-+
-+ if (WARN_ON(slice_src->size != slice_dst->size) ||
-+ WARN_ON(slice_src->free || slice_dst->free))
-+ return -EINVAL;
-+
-+ mutex_lock(&i_dst->i_mutex);
-+ old_fs = get_fs();
-+ set_fs(get_ds());
-+ while (len > 0) {
-+ unsigned long page_off;
-+ unsigned long copy_len;
-+ char __user *kaddr;
-+ struct page *page;
-+ ssize_t n_read;
-+ void *fsdata;
-+ long status;
-+
-+ page_off = off_dst & (PAGE_CACHE_SIZE - 1);
-+ copy_len = min_t(unsigned long,
-+ PAGE_CACHE_SIZE - page_off, len);
-+
-+ status = aops->write_begin(f_dst, mapping_dst, off_dst,
-+ copy_len, 0, &page, &fsdata);
-+ if (unlikely(status < 0)) {
-+ ret = status;
-+ break;
-+ }
-+
-+ kaddr = (char __force __user *)kmap(page) + page_off;
-+ n_read = __vfs_read(f_src, kaddr, copy_len, &off_src);
-+ kunmap(page);
-+ mark_page_accessed(page);
-+ flush_dcache_page(page);
-+
-+ if (unlikely(n_read != copy_len)) {
-+ ret = -EFAULT;
-+ break;
-+ }
-+
-+ status = aops->write_end(f_dst, mapping_dst, off_dst,
-+ copy_len, copy_len, page, fsdata);
-+ if (unlikely(status != copy_len)) {
-+ ret = -EFAULT;
-+ break;
-+ }
-+
-+ off_dst += copy_len;
-+ len -= copy_len;
-+ }
-+ set_fs(old_fs);
-+ mutex_unlock(&i_dst->i_mutex);
-+
-+ return ret;
-+}
-+
-+/**
-+ * kdbus_pool_mmap() - map the pool into the process
-+ * @pool: The receiver's pool
-+ * @vma: passed by mmap() syscall
-+ *
-+ * Return: the result of the mmap() call, negative errno on failure.
-+ */
-+int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma)
-+{
-+ /* deny write access to the pool */
-+ if (vma->vm_flags & VM_WRITE)
-+ return -EPERM;
-+ vma->vm_flags &= ~VM_MAYWRITE;
-+
-+ /* do not allow mapping more than the size of the file */
-+ if ((vma->vm_end - vma->vm_start) > pool->size)
-+ return -EFAULT;
-+
-+ /* replace the connection file with our shmem file */
-+ if (vma->vm_file)
-+ fput(vma->vm_file);
-+ vma->vm_file = get_file(pool->f);
-+
-+ return pool->f->f_op->mmap(pool->f, vma);
-+}
-diff --git a/ipc/kdbus/pool.h b/ipc/kdbus/pool.h
-new file mode 100644
-index 0000000..a903821
---- /dev/null
-+++ b/ipc/kdbus/pool.h
-@@ -0,0 +1,46 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_POOL_H
-+#define __KDBUS_POOL_H
-+
-+#include <linux/uio.h>
-+
-+struct kdbus_pool;
-+struct kdbus_pool_slice;
-+
-+struct kdbus_pool *kdbus_pool_new(const char *name, size_t size);
-+void kdbus_pool_free(struct kdbus_pool *pool);
-+void kdbus_pool_accounted(struct kdbus_pool *pool, size_t *size, size_t *acc);
-+int kdbus_pool_mmap(const struct kdbus_pool *pool, struct vm_area_struct *vma);
-+int kdbus_pool_release_offset(struct kdbus_pool *pool, size_t off);
-+void kdbus_pool_publish_empty(struct kdbus_pool *pool, u64 *off, u64 *size);
-+
-+struct kdbus_pool_slice *kdbus_pool_slice_alloc(struct kdbus_pool *pool,
-+ size_t size, bool accounted);
-+void kdbus_pool_slice_release(struct kdbus_pool_slice *slice);
-+void kdbus_pool_slice_publish(struct kdbus_pool_slice *slice,
-+ u64 *out_offset, u64 *out_size);
-+off_t kdbus_pool_slice_offset(const struct kdbus_pool_slice *slice);
-+size_t kdbus_pool_slice_size(const struct kdbus_pool_slice *slice);
-+int kdbus_pool_slice_copy(const struct kdbus_pool_slice *slice_dst,
-+ const struct kdbus_pool_slice *slice_src);
-+ssize_t kdbus_pool_slice_copy_kvec(const struct kdbus_pool_slice *slice,
-+ loff_t off, struct kvec *kvec,
-+ size_t kvec_count, size_t total_len);
-+ssize_t kdbus_pool_slice_copy_iovec(const struct kdbus_pool_slice *slice,
-+ loff_t off, struct iovec *iov,
-+ size_t iov_count, size_t total_len);
-+
-+#endif
-diff --git a/ipc/kdbus/queue.c b/ipc/kdbus/queue.c
-new file mode 100644
-index 0000000..f9c44d7
---- /dev/null
-+++ b/ipc/kdbus/queue.c
-@@ -0,0 +1,363 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/audit.h>
-+#include <linux/file.h>
-+#include <linux/fs.h>
-+#include <linux/hashtable.h>
-+#include <linux/idr.h>
-+#include <linux/init.h>
-+#include <linux/math64.h>
-+#include <linux/mm.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/poll.h>
-+#include <linux/sched.h>
-+#include <linux/sizes.h>
-+#include <linux/slab.h>
-+#include <linux/syscalls.h>
-+#include <linux/uio.h>
-+
-+#include "util.h"
-+#include "domain.h"
-+#include "connection.h"
-+#include "item.h"
-+#include "message.h"
-+#include "metadata.h"
-+#include "queue.h"
-+#include "reply.h"
-+
-+/**
-+ * kdbus_queue_init() - initialize data structure related to a queue
-+ * @queue: The queue to initialize
-+ */
-+void kdbus_queue_init(struct kdbus_queue *queue)
-+{
-+ INIT_LIST_HEAD(&queue->msg_list);
-+ queue->msg_prio_queue = RB_ROOT;
-+}
-+
-+/**
-+ * kdbus_queue_peek() - Retrieves an entry from a queue
-+ * @queue: The queue
-+ * @priority: The minimum priority of the entry to peek
-+ * @use_priority: Boolean flag whether or not to peek by priority
-+ *
-+ * Look for an entry in a queue, either by priority, or the oldest one (FIFO).
-+ * The entry is neither freed nor removed from the queue's lists.
-+ *
-+ * Return: the peeked queue entry on success, NULL if no suitable msg is found
-+ */
-+struct kdbus_queue_entry *kdbus_queue_peek(struct kdbus_queue *queue,
-+ s64 priority, bool use_priority)
-+{
-+ struct kdbus_queue_entry *e;
-+
-+ if (list_empty(&queue->msg_list))
-+ return NULL;
-+
-+ if (use_priority) {
-+ /* get next entry with highest priority */
-+ e = rb_entry(queue->msg_prio_highest,
-+ struct kdbus_queue_entry, prio_node);
-+
-+ /* no entry with the requested priority */
-+ if (e->priority > priority)
-+ return NULL;
-+ } else {
-+ /* ignore the priority, return the next entry in the list */
-+ e = list_first_entry(&queue->msg_list,
-+ struct kdbus_queue_entry, entry);
-+ }
-+
-+ return e;
-+}
-+
-+static void kdbus_queue_entry_link(struct kdbus_queue_entry *entry)
-+{
-+ struct kdbus_queue *queue = &entry->conn->queue;
-+ struct rb_node **n, *pn = NULL;
-+ bool highest = true;
-+
-+ lockdep_assert_held(&entry->conn->lock);
-+ if (WARN_ON(!list_empty(&entry->entry)))
-+ return;
-+
-+ /* sort into priority entry tree */
-+ n = &queue->msg_prio_queue.rb_node;
-+ while (*n) {
-+ struct kdbus_queue_entry *e;
-+
-+ pn = *n;
-+ e = rb_entry(pn, struct kdbus_queue_entry, prio_node);
-+
-+ /* existing node for this priority, add to its list */
-+ if (likely(entry->priority == e->priority)) {
-+ list_add_tail(&entry->prio_entry, &e->prio_entry);
-+ goto prio_done;
-+ }
-+
-+ if (entry->priority < e->priority) {
-+ n = &pn->rb_left;
-+ } else {
-+ n = &pn->rb_right;
-+ highest = false;
-+ }
-+ }
-+
-+ /* cache highest-priority entry */
-+ if (highest)
-+ queue->msg_prio_highest = &entry->prio_node;
-+
-+ /* new node for this priority */
-+ rb_link_node(&entry->prio_node, pn, n);
-+ rb_insert_color(&entry->prio_node, &queue->msg_prio_queue);
-+ INIT_LIST_HEAD(&entry->prio_entry);
-+
-+prio_done:
-+ /* add to unsorted fifo list */
-+ list_add_tail(&entry->entry, &queue->msg_list);
-+}
-+
-+static void kdbus_queue_entry_unlink(struct kdbus_queue_entry *entry)
-+{
-+ struct kdbus_queue *queue = &entry->conn->queue;
-+
-+ lockdep_assert_held(&entry->conn->lock);
-+ if (list_empty(&entry->entry))
-+ return;
-+
-+ list_del_init(&entry->entry);
-+
-+ if (list_empty(&entry->prio_entry)) {
-+ /*
-+ * Single entry for this priority, update cached
-+ * highest-priority entry, remove the tree node.
-+ */
-+ if (queue->msg_prio_highest == &entry->prio_node)
-+ queue->msg_prio_highest = rb_next(&entry->prio_node);
-+
-+ rb_erase(&entry->prio_node, &queue->msg_prio_queue);
-+ } else {
-+ struct kdbus_queue_entry *q;
-+
-+ /*
-+ * Multiple entries for this priority entry, get next one in
-+ * the list. Update cached highest-priority entry, store the
-+ * new one as the tree node.
-+ */
-+ q = list_first_entry(&entry->prio_entry,
-+ struct kdbus_queue_entry, prio_entry);
-+ list_del(&entry->prio_entry);
-+
-+ if (queue->msg_prio_highest == &entry->prio_node)
-+ queue->msg_prio_highest = &q->prio_node;
-+
-+ rb_replace_node(&entry->prio_node, &q->prio_node,
-+ &queue->msg_prio_queue);
-+ }
-+}
-+
-+/**
-+ * kdbus_queue_entry_new() - allocate a queue entry
-+ * @src: source connection, or NULL
-+ * @dst: destination connection
-+ * @s: staging object carrying the message
-+ *
-+ * Allocates a queue entry based on a given msg and allocates space for
-+ * the message payload and the requested metadata in the connection's pool.
-+ * The entry is not actually added to the queue's lists at this point.
-+ *
-+ * Return: the allocated entry on success, or an ERR_PTR on failure.
-+ */
-+struct kdbus_queue_entry *kdbus_queue_entry_new(struct kdbus_conn *src,
-+ struct kdbus_conn *dst,
-+ struct kdbus_staging *s)
-+{
-+ struct kdbus_queue_entry *entry;
-+ int ret;
-+
-+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
-+ if (!entry)
-+ return ERR_PTR(-ENOMEM);
-+
-+ INIT_LIST_HEAD(&entry->entry);
-+ entry->priority = s->msg->priority;
-+ entry->conn = kdbus_conn_ref(dst);
-+ entry->gaps = kdbus_gaps_ref(s->gaps);
-+
-+ entry->slice = kdbus_staging_emit(s, src, dst);
-+ if (IS_ERR(entry->slice)) {
-+ ret = PTR_ERR(entry->slice);
-+ entry->slice = NULL;
-+ goto error;
-+ }
-+
-+ entry->user = src ? kdbus_user_ref(src->user) : NULL;
-+ return entry;
-+
-+error:
-+ kdbus_queue_entry_free(entry);
-+ return ERR_PTR(ret);
-+}
-+
-+/**
-+ * kdbus_queue_entry_free() - free resources of an entry
-+ * @entry: The entry to free
-+ *
-+ * Removes resources allocated by a queue entry, along with the entry itself.
-+ * The kernel's reference to the entry's slice is dropped as well.
-+ */
-+void kdbus_queue_entry_free(struct kdbus_queue_entry *entry)
-+{
-+ if (!entry)
-+ return;
-+
-+ lockdep_assert_held(&entry->conn->lock);
-+
-+ kdbus_queue_entry_unlink(entry);
-+ kdbus_reply_unref(entry->reply);
-+
-+ if (entry->slice) {
-+ kdbus_conn_quota_dec(entry->conn, entry->user,
-+ kdbus_pool_slice_size(entry->slice),
-+ entry->gaps ? entry->gaps->n_fds : 0);
-+ kdbus_pool_slice_release(entry->slice);
-+ }
-+
-+ kdbus_user_unref(entry->user);
-+ kdbus_gaps_unref(entry->gaps);
-+ kdbus_conn_unref(entry->conn);
-+ kfree(entry);
-+}
-+
-+/**
-+ * kdbus_queue_entry_install() - install message components into the
-+ * receiver's process
-+ * @entry: The queue entry to install
-+ * @return_flags: Pointer to store the return flags for userspace
-+ * @install_fds: Whether or not to install associated file descriptors
-+ *
-+ * Return: 0 on success.
-+ */
-+int kdbus_queue_entry_install(struct kdbus_queue_entry *entry,
-+ u64 *return_flags, bool install_fds)
-+{
-+ bool incomplete_fds = false;
-+ int ret;
-+
-+ lockdep_assert_held(&entry->conn->lock);
-+
-+ ret = kdbus_gaps_install(entry->gaps, entry->slice, &incomplete_fds);
-+ if (ret < 0)
-+ return ret;
-+
-+ if (incomplete_fds)
-+ *return_flags |= KDBUS_RECV_RETURN_INCOMPLETE_FDS;
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_queue_entry_enqueue() - enqueue an entry
-+ * @entry: entry to enqueue
-+ * @reply: reply to link to this entry (or NULL if none)
-+ *
-+ * This enqueues an unqueued entry into the message queue of the linked
-+ * connection. It also binds a reply object to the entry so we can remember it
-+ * when the message is moved.
-+ *
-+ * Once this call returns (and the connection lock is released), this entry can
-+ * be dequeued by the target connection. Note that the entry will not be removed
-+ * from the queue until it is destroyed.
-+ */
-+void kdbus_queue_entry_enqueue(struct kdbus_queue_entry *entry,
-+ struct kdbus_reply *reply)
-+{
-+ lockdep_assert_held(&entry->conn->lock);
-+
-+ if (WARN_ON(entry->reply) || WARN_ON(!list_empty(&entry->entry)))
-+ return;
-+
-+ entry->reply = kdbus_reply_ref(reply);
-+ kdbus_queue_entry_link(entry);
-+}
-+
-+/**
-+ * kdbus_queue_entry_move() - move queue entry
-+ * @e: queue entry to move
-+ * @dst: destination connection to queue the entry on
-+ *
-+ * This moves a queue entry onto a different connection. It allocates a new
-+ * slice on the target connection and copies the message over. If the copy
-+ * succeeded, we move the entry from its current connection to @dst.
-+ *
-+ * On failure, the entry is left untouched.
-+ *
-+ * The queue entry must be queued right now, and after the call succeeds it will
-+ * be queued on the destination, but no longer on the source.
-+ *
-+ * The caller must hold the connection lock of the source *and* destination.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_queue_entry_move(struct kdbus_queue_entry *e,
-+ struct kdbus_conn *dst)
-+{
-+ struct kdbus_pool_slice *slice = NULL;
-+ struct kdbus_conn *src = e->conn;
-+ size_t size, fds;
-+ int ret;
-+
-+ lockdep_assert_held(&src->lock);
-+ lockdep_assert_held(&dst->lock);
-+
-+ if (WARN_ON(list_empty(&e->entry)))
-+ return -EINVAL;
-+ if (src == dst)
-+ return 0;
-+
-+ size = kdbus_pool_slice_size(e->slice);
-+ fds = e->gaps ? e->gaps->n_fds : 0;
-+
-+ ret = kdbus_conn_quota_inc(dst, e->user, size, fds);
-+ if (ret < 0)
-+ return ret;
-+
-+ slice = kdbus_pool_slice_alloc(dst->pool, size, true);
-+ if (IS_ERR(slice)) {
-+ ret = PTR_ERR(slice);
-+ slice = NULL;
-+ goto error;
-+ }
-+
-+ ret = kdbus_pool_slice_copy(slice, e->slice);
-+ if (ret < 0)
-+ goto error;
-+
-+ kdbus_queue_entry_unlink(e);
-+ kdbus_conn_quota_dec(src, e->user, size, fds);
-+ kdbus_pool_slice_release(e->slice);
-+ kdbus_conn_unref(e->conn);
-+
-+ e->slice = slice;
-+ e->conn = kdbus_conn_ref(dst);
-+ kdbus_queue_entry_link(e);
-+
-+ return 0;
-+
-+error:
-+ kdbus_pool_slice_release(slice);
-+ kdbus_conn_quota_dec(dst, e->user, size, fds);
-+ return ret;
-+}
-diff --git a/ipc/kdbus/queue.h b/ipc/kdbus/queue.h
-new file mode 100644
-index 0000000..bf686d1
---- /dev/null
-+++ b/ipc/kdbus/queue.h
-@@ -0,0 +1,84 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_QUEUE_H
-+#define __KDBUS_QUEUE_H
-+
-+#include <linux/list.h>
-+#include <linux/rbtree.h>
-+
-+struct kdbus_conn;
-+struct kdbus_pool_slice;
-+struct kdbus_reply;
-+struct kdbus_staging;
-+struct kdbus_user;
-+
-+/**
-+ * struct kdbus_queue - a connection's message queue
-+ * @msg_list: List head for kdbus_queue_entry objects
-+ * @msg_prio_queue: RB tree root for messages, sorted by priority
-+ * @msg_prio_highest: Link to the RB node referencing the message with the
-+ * highest priority in the tree.
-+ */
-+struct kdbus_queue {
-+ struct list_head msg_list;
-+ struct rb_root msg_prio_queue;
-+ struct rb_node *msg_prio_highest;
-+};
-+
-+/**
-+ * struct kdbus_queue_entry - messages waiting to be read
-+ * @entry: Entry in the connection's list
-+ * @prio_node: Entry in the priority queue tree
-+ * @prio_entry: Queue tree node entry in the list of one priority
-+ * @priority: Message priority
-+ * @dst_name_id: The sequence number of the name this message is
-+ * addressed to, 0 for messages sent to an ID
-+ * @conn: Connection this entry is queued on
-+ * @gaps: Gaps object to fill message gaps at RECV time
-+ * @user: User used for accounting
-+ * @slice: Slice in the receiver's pool for the message
-+ * @reply: The reply block if a reply to this message is expected
-+ */
-+struct kdbus_queue_entry {
-+ struct list_head entry;
-+ struct rb_node prio_node;
-+ struct list_head prio_entry;
-+
-+ s64 priority;
-+ u64 dst_name_id;
-+
-+ struct kdbus_conn *conn;
-+ struct kdbus_gaps *gaps;
-+ struct kdbus_user *user;
-+ struct kdbus_pool_slice *slice;
-+ struct kdbus_reply *reply;
-+};
-+
-+void kdbus_queue_init(struct kdbus_queue *queue);
-+struct kdbus_queue_entry *kdbus_queue_peek(struct kdbus_queue *queue,
-+ s64 priority, bool use_priority);
-+
-+struct kdbus_queue_entry *kdbus_queue_entry_new(struct kdbus_conn *src,
-+ struct kdbus_conn *dst,
-+ struct kdbus_staging *s);
-+void kdbus_queue_entry_free(struct kdbus_queue_entry *entry);
-+int kdbus_queue_entry_install(struct kdbus_queue_entry *entry,
-+ u64 *return_flags, bool install_fds);
-+void kdbus_queue_entry_enqueue(struct kdbus_queue_entry *entry,
-+ struct kdbus_reply *reply);
-+int kdbus_queue_entry_move(struct kdbus_queue_entry *entry,
-+ struct kdbus_conn *dst);
-+
-+#endif /* __KDBUS_QUEUE_H */
-diff --git a/ipc/kdbus/reply.c b/ipc/kdbus/reply.c
-new file mode 100644
-index 0000000..e6791d8
---- /dev/null
-+++ b/ipc/kdbus/reply.c
-@@ -0,0 +1,252 @@
-+#include <linux/init.h>
-+#include <linux/mm.h>
-+#include <linux/module.h>
-+#include <linux/mutex.h>
-+#include <linux/slab.h>
-+#include <linux/uio.h>
-+
-+#include "bus.h"
-+#include "connection.h"
-+#include "endpoint.h"
-+#include "message.h"
-+#include "metadata.h"
-+#include "names.h"
-+#include "domain.h"
-+#include "item.h"
-+#include "notify.h"
-+#include "policy.h"
-+#include "reply.h"
-+#include "util.h"
-+
-+/**
-+ * kdbus_reply_new() - Allocate and set up a new kdbus_reply object
-+ * @reply_src: The connection a reply is expected from
-+ * @reply_dst: The connection this reply object belongs to
-+ * @msg: Message associated with the reply
-+ * @name_entry: Name entry used to send the message
-+ * @sync: Whether or not to make this reply synchronous
-+ *
-+ * Allocate and fill a new kdbus_reply object.
-+ *
-+ * Return: New kdbus_reply object on success, ERR_PTR on error.
-+ */
-+struct kdbus_reply *kdbus_reply_new(struct kdbus_conn *reply_src,
-+ struct kdbus_conn *reply_dst,
-+ const struct kdbus_msg *msg,
-+ struct kdbus_name_entry *name_entry,
-+ bool sync)
-+{
-+ struct kdbus_reply *r;
-+ int ret;
-+
-+ if (atomic_inc_return(&reply_dst->request_count) >
-+ KDBUS_CONN_MAX_REQUESTS_PENDING) {
-+ ret = -EMLINK;
-+ goto exit_dec_request_count;
-+ }
-+
-+ r = kzalloc(sizeof(*r), GFP_KERNEL);
-+ if (!r) {
-+ ret = -ENOMEM;
-+ goto exit_dec_request_count;
-+ }
-+
-+ kref_init(&r->kref);
-+ INIT_LIST_HEAD(&r->entry);
-+ r->reply_src = kdbus_conn_ref(reply_src);
-+ r->reply_dst = kdbus_conn_ref(reply_dst);
-+ r->cookie = msg->cookie;
-+ r->name_id = name_entry ? name_entry->name_id : 0;
-+ r->deadline_ns = msg->timeout_ns;
-+
-+ if (sync) {
-+ r->sync = true;
-+ r->waiting = true;
-+ }
-+
-+ return r;
-+
-+exit_dec_request_count:
-+ atomic_dec(&reply_dst->request_count);
-+ return ERR_PTR(ret);
-+}
-+
-+static void __kdbus_reply_free(struct kref *kref)
-+{
-+ struct kdbus_reply *reply =
-+ container_of(kref, struct kdbus_reply, kref);
-+
-+ atomic_dec(&reply->reply_dst->request_count);
-+ kdbus_conn_unref(reply->reply_src);
-+ kdbus_conn_unref(reply->reply_dst);
-+ kfree(reply);
-+}
-+
-+/**
-+ * kdbus_reply_ref() - Increase reference on kdbus_reply
-+ * @r: The reply, may be %NULL
-+ *
-+ * Return: The reply object with an extra reference
-+ */
-+struct kdbus_reply *kdbus_reply_ref(struct kdbus_reply *r)
-+{
-+ if (r)
-+ kref_get(&r->kref);
-+ return r;
-+}
-+
-+/**
-+ * kdbus_reply_unref() - Decrease reference on kdbus_reply
-+ * @r: The reply, may be %NULL
-+ *
-+ * Return: NULL
-+ */
-+struct kdbus_reply *kdbus_reply_unref(struct kdbus_reply *r)
-+{
-+ if (r)
-+ kref_put(&r->kref, __kdbus_reply_free);
-+ return NULL;
-+}
-+
-+/**
-+ * kdbus_reply_link() - Link reply object into target connection
-+ * @r: Reply to link
-+ */
-+void kdbus_reply_link(struct kdbus_reply *r)
-+{
-+ if (WARN_ON(!list_empty(&r->entry)))
-+ return;
-+
-+ list_add(&r->entry, &r->reply_dst->reply_list);
-+ kdbus_reply_ref(r);
-+}
-+
-+/**
-+ * kdbus_reply_unlink() - Unlink reply object from target connection
-+ * @r: Reply to unlink
-+ */
-+void kdbus_reply_unlink(struct kdbus_reply *r)
-+{
-+ if (!list_empty(&r->entry)) {
-+ list_del_init(&r->entry);
-+ kdbus_reply_unref(r);
-+ }
-+}
-+
-+/**
-+ * kdbus_sync_reply_wakeup() - Wake a synchronously blocking reply
-+ * @reply: The reply object
-+ * @err: Error code to set on the remote side
-+ *
-+ * Wake up remote peer (method origin) with the appropriate synchronous reply
-+ * code.
-+ */
-+void kdbus_sync_reply_wakeup(struct kdbus_reply *reply, int err)
-+{
-+ if (WARN_ON(!reply->sync))
-+ return;
-+
-+ reply->waiting = false;
-+ reply->err = err;
-+ wake_up_interruptible(&reply->reply_dst->wait);
-+}
-+
-+/**
-+ * kdbus_reply_find() - Find the corresponding reply object
-+ * @replying: The replying connection or NULL
-+ * @reply_dst: The connection the reply will be sent to
-+ * (method origin)
-+ * @cookie: The cookie of the requesting message
-+ *
-+ * Lookup a reply object that should be sent as a reply by
-+ * @replying to @reply_dst with the given cookie.
-+ *
-+ * Callers must take the @reply_dst lock.
-+ *
-+ * Return: the corresponding reply object or NULL if not found
-+ */
-+struct kdbus_reply *kdbus_reply_find(struct kdbus_conn *replying,
-+ struct kdbus_conn *reply_dst,
-+ u64 cookie)
-+{
-+ struct kdbus_reply *r;
-+
-+ list_for_each_entry(r, &reply_dst->reply_list, entry) {
-+ if (r->cookie == cookie &&
-+ (!replying || r->reply_src == replying))
-+ return r;
-+ }
-+
-+ return NULL;
-+}
-+
-+/**
-+ * kdbus_reply_list_scan_work() - Worker callback to scan the replies of a
-+ * connection for exceeded timeouts
-+ * @work: Work struct of the connection to scan
-+ *
-+ * Walk the list of replies stored with a connection and look for entries
-+ * that have exceeded their timeout. If such an entry is found, a timeout
-+ * notification is sent to the waiting peer, and the reply is removed from
-+ * the list.
-+ *
-+ * The work is rescheduled to the nearest timeout found during the list
-+ * iteration.
-+ */
-+void kdbus_reply_list_scan_work(struct work_struct *work)
-+{
-+ struct kdbus_conn *conn =
-+ container_of(work, struct kdbus_conn, work.work);
-+ struct kdbus_reply *reply, *reply_tmp;
-+ u64 deadline = ~0ULL;
-+ u64 now;
-+
-+ now = ktime_get_ns();
-+
-+ mutex_lock(&conn->lock);
-+ if (!kdbus_conn_active(conn)) {
-+ mutex_unlock(&conn->lock);
-+ return;
-+ }
-+
-+ list_for_each_entry_safe(reply, reply_tmp, &conn->reply_list, entry) {
-+ /*
-+ * If the reply block is waiting for synchronous I/O,
-+ * the timeout is handled by wait_event_*_timeout(),
-+ * so we don't have to care for it here.
-+ */
-+ if (reply->sync && !reply->interrupted)
-+ continue;
-+
-+ WARN_ON(reply->reply_dst != conn);
-+
-+ if (reply->deadline_ns > now) {
-+ /* remember next timeout */
-+ if (deadline > reply->deadline_ns)
-+ deadline = reply->deadline_ns;
-+
-+ continue;
-+ }
-+
-+ /*
-+ * A zero deadline means the connection died, was
-+ * cleaned up already and the notification was sent.
-+ * Don't send notifications for reply trackers that were
-+ * left in an interrupted syscall state.
-+ */
-+ if (reply->deadline_ns != 0 && !reply->interrupted)
-+ kdbus_notify_reply_timeout(conn->ep->bus, conn->id,
-+ reply->cookie);
-+
-+ kdbus_reply_unlink(reply);
-+ }
-+
-+ /* rearm delayed work with next timeout */
-+ if (deadline != ~0ULL)
-+ schedule_delayed_work(&conn->work,
-+ nsecs_to_jiffies(deadline - now));
-+
-+ mutex_unlock(&conn->lock);
-+
-+ kdbus_notify_flush(conn->ep->bus);
-+}
-diff --git a/ipc/kdbus/reply.h b/ipc/kdbus/reply.h
-new file mode 100644
-index 0000000..68d5232
---- /dev/null
-+++ b/ipc/kdbus/reply.h
-@@ -0,0 +1,68 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_REPLY_H
-+#define __KDBUS_REPLY_H
-+
-+/**
-+ * struct kdbus_reply - an entry of kdbus_conn's list of replies
-+ * @kref: Ref-count of this object
-+ * @entry: The entry of the connection's reply_list
-+ * @reply_src: The connection the reply will be sent from
-+ * @reply_dst: The connection the reply will be sent to
-+ * @queue_entry: The queue entry item that is prepared by the replying
-+ * connection
-+ * @deadline_ns: The deadline of the reply, in nanoseconds
-+ * @cookie: The cookie of the requesting message
-+ * @name_id: ID of the well-known name the original msg was sent to
-+ * @sync: The reply block is waiting for synchronous I/O
-+ * @waiting: The condition to synchronously wait for
-+ * @interrupted: The sync reply was left in an interrupted state
-+ * @err: The error code for the synchronous reply
-+ */
-+struct kdbus_reply {
-+ struct kref kref;
-+ struct list_head entry;
-+ struct kdbus_conn *reply_src;
-+ struct kdbus_conn *reply_dst;
-+ struct kdbus_queue_entry *queue_entry;
-+ u64 deadline_ns;
-+ u64 cookie;
-+ u64 name_id;
-+ bool sync:1;
-+ bool waiting:1;
-+ bool interrupted:1;
-+ int err;
-+};
-+
-+struct kdbus_reply *kdbus_reply_new(struct kdbus_conn *reply_src,
-+ struct kdbus_conn *reply_dst,
-+ const struct kdbus_msg *msg,
-+ struct kdbus_name_entry *name_entry,
-+ bool sync);
-+
-+struct kdbus_reply *kdbus_reply_ref(struct kdbus_reply *r);
-+struct kdbus_reply *kdbus_reply_unref(struct kdbus_reply *r);
-+
-+void kdbus_reply_link(struct kdbus_reply *r);
-+void kdbus_reply_unlink(struct kdbus_reply *r);
-+
-+struct kdbus_reply *kdbus_reply_find(struct kdbus_conn *replying,
-+ struct kdbus_conn *reply_dst,
-+ u64 cookie);
-+
-+void kdbus_sync_reply_wakeup(struct kdbus_reply *reply, int err);
-+void kdbus_reply_list_scan_work(struct work_struct *work);
-+
-+#endif /* __KDBUS_REPLY_H */
-diff --git a/ipc/kdbus/util.c b/ipc/kdbus/util.c
-new file mode 100644
-index 0000000..72b1883
---- /dev/null
-+++ b/ipc/kdbus/util.c
-@@ -0,0 +1,156 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <linux/capability.h>
-+#include <linux/cred.h>
-+#include <linux/ctype.h>
-+#include <linux/err.h>
-+#include <linux/file.h>
-+#include <linux/slab.h>
-+#include <linux/string.h>
-+#include <linux/uaccess.h>
-+#include <linux/uio.h>
-+#include <linux/user_namespace.h>
-+
-+#include "limits.h"
-+#include "util.h"
-+
-+/**
-+ * kdbus_copy_from_user() - copy aligned data from user-space
-+ * @dest: target buffer in kernel memory
-+ * @user_ptr: user-provided source buffer
-+ * @size: memory size to copy from user
-+ *
-+ * This copies @size bytes from @user_ptr into the kernel, just like
-+ * copy_from_user() does. But we enforce an 8-byte alignment and reject any
-+ * unaligned user-space pointers.
-+ *
-+ * Return: 0 on success, negative error code on failure.
-+ */
-+int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size)
-+{
-+ if (!KDBUS_IS_ALIGNED8((uintptr_t)user_ptr))
-+ return -EFAULT;
-+
-+ if (copy_from_user(dest, user_ptr, size))
-+ return -EFAULT;
-+
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_verify_uid_prefix() - verify UID prefix of a user-supplied name
-+ * @name: user-supplied name to verify
-+ * @user_ns: user-namespace to act in
-+ * @kuid: Kernel internal uid of user
-+ *
-+ * This verifies that the user-supplied name @name has their UID as prefix. This
-+ * is the default name-spacing policy we enforce on user-supplied names for
-+ * public kdbus entities like buses and endpoints.
-+ *
-+ * The user must supply names prefixed with "<UID>-", where the UID is
-+ * interpreted in the user-namespace of the domain. If the user fails to supply
-+ * such a prefixed name, we reject it.
-+ *
-+ * Return: 0 on success, negative error code on failure
-+ */
-+int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
-+ kuid_t kuid)
-+{
-+ uid_t uid;
-+ char prefix[16];
-+
-+ /*
-+ * The kuid must have a mapping into the userns of the domain
-+ * otherwise do not allow creation of buses nor endpoints.
-+ */
-+ uid = from_kuid(user_ns, kuid);
-+ if (uid == (uid_t) -1)
-+ return -EINVAL;
-+
-+ snprintf(prefix, sizeof(prefix), "%u-", uid);
-+ if (strncmp(name, prefix, strlen(prefix)) != 0)
-+ return -EINVAL;
-+
-+ return 0;
-+}
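The prefix rule enforced by kdbus_verify_uid_prefix() can be tried from userspace. The sketch below is an assumption-laden illustration, not kernel code: `name_has_uid_prefix()` is a hypothetical helper, it takes an already translated numeric UID and skips the user-namespace mapping done by from_kuid().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if 'name' starts with "<uid>-", 0 otherwise. The trailing
 * dash is part of the prefix, so UID 1000 does not match "10000-bus".
 */
static int name_has_uid_prefix(const char *name, unsigned int uid)
{
	char prefix[16];

	assert(name);

	/* a 32-bit UID needs at most 10 digits, plus '-' and the NUL byte */
	snprintf(prefix, sizeof(prefix), "%u-", uid);

	return strncmp(name, prefix, strlen(prefix)) == 0;
}
```

For the sample program later in this patch, a bus created by UID 1000 with the name "example-workers" would have to be called "1000-example-workers" to pass this check.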
-+
-+/**
-+ * kdbus_sanitize_attach_flags() - Sanitize attach flags from user-space
-+ * @flags: Attach flags provided by userspace
-+ * @attach_flags: A pointer where to store the valid attach flags
-+ *
-+ * Convert attach-flags provided by user-space into a valid mask. If the mask
-+ * is invalid, an error is returned. The sanitized attach flags are stored in
-+ * the output parameter.
-+ *
-+ * Return: 0 on success, negative error on failure.
-+ */
-+int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags)
-+{
-+ /* 'any' degrades to 'all' for compatibility */
-+ if (flags == _KDBUS_ATTACH_ANY)
-+ flags = _KDBUS_ATTACH_ALL;
-+
-+ /* reject unknown attach flags */
-+ if (flags & ~_KDBUS_ATTACH_ALL)
-+ return -EINVAL;
-+
-+ *attach_flags = flags;
-+ return 0;
-+}
-+
-+/**
-+ * kdbus_kvec_set - helper utility to assemble kvec arrays
-+ * @kvec: kvec entry to use
-+ * @src: Source address to set in @kvec
-+ * @len: Number of bytes in @src
-+ * @total_len: Pointer to total length variable
-+ *
-+ * Set @src and @len in @kvec, and increase @total_len by @len.
-+ */
-+void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len)
-+{
-+ kvec->iov_base = src;
-+ kvec->iov_len = len;
-+ *total_len += len;
-+}
-+
-+static const char * const zeros = "\0\0\0\0\0\0\0";
-+
-+/**
-+ * kdbus_kvec_pad - conditionally write a padding kvec
-+ * @kvec: kvec entry to use
-+ * @len: Total length used for kvec array
-+ *
-+ * Check if the current total byte length of the array in @len is aligned to
-+ * 8 bytes. If it isn't, fill @kvec with padding information and increase @len
-+ * by the number of bytes stored in @kvec.
-+ *
-+ * Return: the number of added padding bytes.
-+ */
-+size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len)
-+{
-+ size_t pad = KDBUS_ALIGN8(*len) - *len;
-+
-+ if (!pad)
-+ return 0;
-+
-+ kvec->iov_base = (void *)zeros;
-+ kvec->iov_len = pad;
-+
-+ *len += pad;
-+
-+ return pad;
-+}
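The padding arithmetic in kdbus_kvec_pad() is easy to check in isolation. The following sketch assumes nothing beyond standard C: `ALIGN8()` and `pad_to_8()` are hypothetical userspace names that reproduce only the length calculation, without touching any kvec.

```c
#include <assert.h>
#include <stdint.h>

/* Round up to the next multiple of 8 (same idea as KDBUS_ALIGN8). */
#define ALIGN8(l) (((l) + 7) & ~(uint64_t)7)

/*
 * Advance *len to the next 8-byte boundary and return the number of
 * padding bytes that were added (0 if *len was already aligned).
 */
static uint64_t pad_to_8(uint64_t *len)
{
	uint64_t pad = ALIGN8(*len) - *len;

	assert(pad < 8);
	*len += pad;

	return pad;
}
```

Because the padding never exceeds 7 bytes, a static buffer of seven zero bytes, like the `zeros` string above, is always large enough to back the padding vector.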
-diff --git a/ipc/kdbus/util.h b/ipc/kdbus/util.h
-new file mode 100644
-index 0000000..5297166
---- /dev/null
-+++ b/ipc/kdbus/util.h
-@@ -0,0 +1,73 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-+ * Copyright (C) 2013-2015 Daniel Mack <daniel@zonque.org>
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ * Copyright (C) 2013-2015 Linux Foundation
-+ * Copyright (C) 2014-2015 Djalal Harouni <tixxdz@opendz.org>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#ifndef __KDBUS_UTIL_H
-+#define __KDBUS_UTIL_H
-+
-+#include <linux/dcache.h>
-+#include <linux/ioctl.h>
-+
-+#include <uapi/linux/kdbus.h>
-+
-+/* all exported addresses are 64 bit */
-+#define KDBUS_PTR(addr) ((void __user *)(uintptr_t)(addr))
-+
-+/* all exported sizes are 64 bit and data aligned to 64 bit */
-+#define KDBUS_ALIGN8(s) ALIGN((s), 8)
-+#define KDBUS_IS_ALIGNED8(s) (IS_ALIGNED(s, 8))
-+
-+/**
-+ * kdbus_member_set_user - write a structure member to user memory
-+ * @_s: Variable to copy from
-+ * @_b: Buffer to write to
-+ * @_t: Structure type
-+ * @_m: Member name in the passed structure
-+ *
-+ * Return: the result of copy_to_user()
-+ */
-+#define kdbus_member_set_user(_s, _b, _t, _m) \
-+({ \
-+ u64 __user *_sz = \
-+ (void __user *)((u8 __user *)(_b) + offsetof(_t, _m)); \
-+ copy_to_user(_sz, _s, FIELD_SIZEOF(_t, _m)); \
-+})
-+
-+/**
-+ * kdbus_strhash - calculate a hash
-+ * @str: String
-+ *
-+ * Return: hash value
-+ */
-+static inline unsigned int kdbus_strhash(const char *str)
-+{
-+ unsigned long hash = init_name_hash();
-+
-+ while (*str)
-+ hash = partial_name_hash(*str++, hash);
-+
-+ return end_name_hash(hash);
-+}
-+
-+int kdbus_verify_uid_prefix(const char *name, struct user_namespace *user_ns,
-+ kuid_t kuid);
-+int kdbus_sanitize_attach_flags(u64 flags, u64 *attach_flags);
-+
-+int kdbus_copy_from_user(void *dest, void __user *user_ptr, size_t size);
-+
-+struct kvec;
-+
-+void kdbus_kvec_set(struct kvec *kvec, void *src, size_t len, u64 *total_len);
-+size_t kdbus_kvec_pad(struct kvec *kvec, u64 *len);
-+
-+#endif
-diff --git a/samples/Kconfig b/samples/Kconfig
-index 224ebb4..a4c6b2f 100644
---- a/samples/Kconfig
-+++ b/samples/Kconfig
-@@ -55,6 +55,13 @@ config SAMPLE_KDB
- Build an example of how to dynamically add the hello
- command to the kdb shell.
-
-+config SAMPLE_KDBUS
-+ bool "Build kdbus API example"
-+ depends on KDBUS
-+ help
-+ Build an example of how the kdbus API can be used from
-+ userspace.
-+
- config SAMPLE_RPMSG_CLIENT
- tristate "Build rpmsg client sample -- loadable modules only"
- depends on RPMSG && m
-diff --git a/samples/Makefile b/samples/Makefile
-index f00257b..f0ad51e 100644
---- a/samples/Makefile
-+++ b/samples/Makefile
-@@ -1,4 +1,5 @@
- # Makefile for Linux samples code
-
- obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ trace_events/ livepatch/ \
-- hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/
-+ hw_breakpoint/ kfifo/ kdb/ kdbus/ hidraw/ rpmsg/ \
-+ seccomp/
-diff --git a/samples/kdbus/.gitignore b/samples/kdbus/.gitignore
-new file mode 100644
-index 0000000..ee07d98
---- /dev/null
-+++ b/samples/kdbus/.gitignore
-@@ -0,0 +1 @@
-+kdbus-workers
-diff --git a/samples/kdbus/Makefile b/samples/kdbus/Makefile
-new file mode 100644
-index 0000000..137f842
---- /dev/null
-+++ b/samples/kdbus/Makefile
-@@ -0,0 +1,9 @@
-+# kbuild trick to avoid linker error. Can be omitted if a module is built.
-+obj- := dummy.o
-+
-+hostprogs-$(CONFIG_SAMPLE_KDBUS) += kdbus-workers
-+
-+always := $(hostprogs-y)
-+
-+HOSTCFLAGS_kdbus-workers.o += -I$(objtree)/usr/include
-+HOSTLOADLIBES_kdbus-workers := -lrt
-diff --git a/samples/kdbus/kdbus-api.h b/samples/kdbus/kdbus-api.h
-new file mode 100644
-index 0000000..7f3abae
---- /dev/null
-+++ b/samples/kdbus/kdbus-api.h
-@@ -0,0 +1,114 @@
-+#ifndef KDBUS_API_H
-+#define KDBUS_API_H
-+
-+#include <sys/ioctl.h>
-+#include <linux/kdbus.h>
-+
-+#define KDBUS_ALIGN8(l) (((l) + 7) & ~7)
-+#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
-+#define KDBUS_ITEM_SIZE(s) KDBUS_ALIGN8((s) + KDBUS_ITEM_HEADER_SIZE)
-+#define KDBUS_ITEM_NEXT(item) \
-+ (typeof(item))((uint8_t *)(item) + KDBUS_ALIGN8((item)->size))
-+#define KDBUS_FOREACH(iter, first, _size) \
-+ for ((iter) = (first); \
-+ ((uint8_t *)(iter) < (uint8_t *)(first) + (_size)) && \
-+ ((uint8_t *)(iter) >= (uint8_t *)(first)); \
-+ (iter) = (void *)((uint8_t *)(iter) + KDBUS_ALIGN8((iter)->size)))
-+
-+static inline int kdbus_cmd_bus_make(int control_fd, struct kdbus_cmd *cmd)
-+{
-+ int ret = ioctl(control_fd, KDBUS_CMD_BUS_MAKE, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_endpoint_make(int bus_fd, struct kdbus_cmd *cmd)
-+{
-+ int ret = ioctl(bus_fd, KDBUS_CMD_ENDPOINT_MAKE, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_endpoint_update(int ep_fd, struct kdbus_cmd *cmd)
-+{
-+ int ret = ioctl(ep_fd, KDBUS_CMD_ENDPOINT_UPDATE, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_hello(int bus_fd, struct kdbus_cmd_hello *cmd)
-+{
-+ int ret = ioctl(bus_fd, KDBUS_CMD_HELLO, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_update(int fd, struct kdbus_cmd *cmd)
-+{
-+ int ret = ioctl(fd, KDBUS_CMD_UPDATE, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_byebye(int conn_fd, struct kdbus_cmd *cmd)
-+{
-+ int ret = ioctl(conn_fd, KDBUS_CMD_BYEBYE, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_free(int conn_fd, struct kdbus_cmd_free *cmd)
-+{
-+ int ret = ioctl(conn_fd, KDBUS_CMD_FREE, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_conn_info(int conn_fd, struct kdbus_cmd_info *cmd)
-+{
-+ int ret = ioctl(conn_fd, KDBUS_CMD_CONN_INFO, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_bus_creator_info(int conn_fd, struct kdbus_cmd_info *cmd)
-+{
-+ int ret = ioctl(conn_fd, KDBUS_CMD_BUS_CREATOR_INFO, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_list(int fd, struct kdbus_cmd_list *cmd)
-+{
-+ int ret = ioctl(fd, KDBUS_CMD_LIST, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_send(int conn_fd, struct kdbus_cmd_send *cmd)
-+{
-+ int ret = ioctl(conn_fd, KDBUS_CMD_SEND, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_recv(int conn_fd, struct kdbus_cmd_recv *cmd)
-+{
-+ int ret = ioctl(conn_fd, KDBUS_CMD_RECV, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_name_acquire(int conn_fd, struct kdbus_cmd *cmd)
-+{
-+ int ret = ioctl(conn_fd, KDBUS_CMD_NAME_ACQUIRE, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_name_release(int conn_fd, struct kdbus_cmd *cmd)
-+{
-+ int ret = ioctl(conn_fd, KDBUS_CMD_NAME_RELEASE, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_match_add(int conn_fd, struct kdbus_cmd_match *cmd)
-+{
-+ int ret = ioctl(conn_fd, KDBUS_CMD_MATCH_ADD, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+static inline int kdbus_cmd_match_remove(int conn_fd, struct kdbus_cmd_match *cmd)
-+{
-+ int ret = ioctl(conn_fd, KDBUS_CMD_MATCH_REMOVE, cmd);
-+ return (ret < 0) ? (errno > 0 ? -errno : -EINVAL) : 0;
-+}
-+
-+#endif /* KDBUS_API_H */
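KDBUS_FOREACH() and KDBUS_ITEM_NEXT() above walk a packed array of variable-sized items, each starting on an 8-byte boundary. The sketch below shows that walk on a simplified layout. It is only an illustration: `struct item`, `item_append()` and `item_count()` are hypothetical and are not the real struct kdbus_item or any kdbus API. The buffer passed in is assumed to be 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN8(l) (((l) + 7) & ~(uint64_t)7)

/* Simplified stand-in for an item: fixed header plus inline payload. */
struct item {
	uint64_t size;	/* header + payload bytes, excluding padding */
	uint64_t type;
	uint8_t data[];
};

/* Write one item at offset 'off' and return the next aligned offset. */
static size_t item_append(uint8_t *buf, size_t off, uint64_t type,
			  const void *data, size_t len)
{
	struct item *it = (struct item *)(buf + off);

	it->size = offsetof(struct item, data) + len;
	it->type = type;
	memcpy(it->data, data, len);

	return off + (size_t)ALIGN8(it->size);
}

/* Count the items stored in the first 'total' bytes of 'buf'. */
static size_t item_count(const uint8_t *buf, size_t total)
{
	const uint8_t *p = buf;
	size_t n = 0;

	while (p < buf + total) {
		const struct item *it = (const struct item *)p;

		/* every item is at least as large as its header */
		assert(it->size >= offsetof(struct item, data));

		/* the next item starts at the next 8-byte boundary */
		p += (size_t)ALIGN8(it->size);
		n++;
	}

	return n;
}
```

The stored `size` excludes padding, which is why both the writer and the reader round it up before stepping to the next item.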
-diff --git a/samples/kdbus/kdbus-workers.c b/samples/kdbus/kdbus-workers.c
-new file mode 100644
-index 0000000..5a6dfdc
---- /dev/null
-+++ b/samples/kdbus/kdbus-workers.c
-@@ -0,0 +1,1346 @@
-+/*
-+ * Copyright (C) 2013-2015 David Herrmann <dh.herrmann@gmail.com>
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+/*
-+ * Example: Workers
-+ * This program computes prime-numbers based on the sieve of Eratosthenes. The
-+ * master sets up a shared memory region and spawns workers which clear out the
-+ * non-primes. The master reacts to keyboard input and to client-requests to
-+ * control what each worker does. Note that this is in no way meant as an
-+ * efficient way to compute primes. It only serves as an example of how a
-+ * master/worker concept can be implemented with kdbus for control messages.
-+ *
-+ * The main process is called the 'master'. It creates a new, private bus which
-+ * will be used between the master and its workers to communicate. The master
-+ * then spawns a fixed number of workers. Whenever a worker dies (detected via
-+ * SIGCHLD), the master spawns a new worker. When done, the master waits for all
-+ * workers to exit, prints a status report and exits itself.
-+ *
-+ * The master process does *not* keep track of its workers. Instead, this
-+ * example implements a PULL model. That is, the master acquires a well-known
-+ * name on the bus which each worker uses to request tasks from the master. If
-+ * there are no more tasks, the master will return an empty task-list, which
-+ * causes a worker to exit immediately.
-+ *
-+ * As tasks can be computationally expensive, we support cancellation. Whenever
-+ * the master process is interrupted, it will drop its well-known name on the
-+ * bus. This causes kdbus to broadcast a name-change notification. The workers
-+ * check for broadcast messages regularly and will exit if they receive one.
-+ *
-+ * This example consists of 4 objects:
-+ * * master: The master object contains the context of the master process. This
-+ * process manages the prime-context, spawns workers and assigns
-+ * prime-ranges to each worker to compute.
-+ * The master does not do any prime-computations itself.
-+ * * child: The child object contains the context of a worker. It inherits the
-+ * prime context from its parent (the master) and then creates a new
-+ * bus context to request prime-ranges to compute.
-+ * * prime: The "prime" object is used to abstract how we compute primes. When
-+ * allocated, it prepares a memory region to hold 1 bit for each
-+ * natural number up to a fixed maximum ('MAX_PRIMES').
-+ * The memory region is backed by a memfd which we share between
-+ * processes. Each worker is assigned a range of natural
-+ * numbers whose multiples it clears from the memory region. The
-+ * master process is responsible for distributing all natural numbers
-+ * up to the fixed maximum to its workers.
-+ * * bus: The bus object is an abstraction of the kdbus API. It is pretty
-+ * straightforward and only manages the connection-fd plus the
-+ * memory-mapped pool in a single object.
-+ *
-+ * This example is in reversed order, which should make it easier to read
-+ * top-down, but requires some forward-declarations. Just ignore those.
-+ */
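The "prime" object described in the comment above keeps one bit per natural number and lets each worker clear the multiples of the numbers in its assigned range. A minimal sketch of that idea follows. It is not the sample's prime_run(): the `sieve_*` names are hypothetical, a local byte array replaces the shared memfd region, and cancellation is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Set all bits in 'area'; every number starts out as a prime candidate. */
static void sieve_init(uint8_t *area, size_t bytes)
{
	assert(area);
	memset(area, 0xff, bytes);
}

/*
 * Clear the bits of all proper multiples of the numbers in [from, to).
 * 'max' is the number of bits available in 'area'. Different ranges can
 * be processed in any order, which is what lets workers share the job.
 */
static void sieve_clear_multiples(uint8_t *area, size_t max,
				  size_t from, size_t to)
{
	size_t i, j;

	for (i = from; i < to; i++) {
		if (i < 2)
			continue;
		for (j = i * 2; j < max; j += i)
			area[j / 8] &= (uint8_t)~(1u << (j % 8));
	}
}

/* Return 1 if bit 'n' is still set, i.e. n has not been ruled out. */
static int sieve_is_set(const uint8_t *area, size_t n)
{
	return (area[n / 8] >> (n % 8)) & 1;
}
```

Bits 0 and 1 are never cleared by this sketch, so a caller has to ignore them when reading the result. In a multi-process setup the read-modify-write on a shared byte would also need to be atomic or otherwise coordinated; the sketch does not address that.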
-+
-+#include <stdio.h>
-+#include <stdlib.h>
-+#include <sys/syscall.h>
-+
-+/* glibc < 2.7 does not ship sys/signalfd.h */
-+/* we require kernels with __NR_memfd_create */
-+#if __GLIBC__ >= 2 && __GLIBC_MINOR__ >= 7 && defined(__NR_memfd_create)
-+
-+#include <ctype.h>
-+#include <errno.h>
-+#include <fcntl.h>
-+#include <linux/memfd.h>
-+#include <signal.h>
-+#include <stdbool.h>
-+#include <stddef.h>
-+#include <stdint.h>
-+#include <string.h>
-+#include <sys/mman.h>
-+#include <sys/poll.h>
-+#include <sys/signalfd.h>
-+#include <sys/time.h>
-+#include <sys/wait.h>
-+#include <time.h>
-+#include <unistd.h>
-+#include "kdbus-api.h"
-+
-+/* FORWARD DECLARATIONS */
-+
-+#define POOL_SIZE (16 * 1024 * 1024)
-+#define MAX_PRIMES (2UL << 24)
-+#define WORKER_COUNT (16)
-+#define PRIME_STEPS (65536 * 4)
-+
-+static const char *arg_busname = "example-workers";
-+static const char *arg_modname = "kdbus";
-+static const char *arg_master = "org.freedesktop.master";
-+
-+static int err_assert(int r_errno, const char *msg, const char *func, int line,
-+ const char *file)
-+{
-+ r_errno = (r_errno != 0) ? -abs(r_errno) : -EFAULT;
-+ if (r_errno < 0) {
-+ errno = -r_errno;
-+ fprintf(stderr, "ERR: %s: %m (%s:%d in %s)\n",
-+ msg, func, line, file);
-+ }
-+ return r_errno;
-+}
-+
-+#define err_r(_r, _msg) err_assert((_r), (_msg), __func__, __LINE__, __FILE__)
-+#define err(_msg) err_r(errno, (_msg))
-+
-+struct prime;
-+struct bus;
-+struct master;
-+struct child;
-+
-+struct prime {
-+ int fd;
-+ uint8_t *area;
-+ size_t max;
-+ size_t done;
-+ size_t status;
-+};
-+
-+static int prime_new(struct prime **out);
-+static void prime_free(struct prime *p);
-+static bool prime_done(struct prime *p);
-+static void prime_consume(struct prime *p, size_t amount);
-+static int prime_run(struct prime *p, struct bus *cancel, size_t number);
-+static void prime_print(struct prime *p);
-+
-+struct bus {
-+ int fd;
-+ uint8_t *pool;
-+};
-+
-+static int bus_open_connection(struct bus **out, uid_t uid, const char *name,
-+ uint64_t recv_flags);
-+static void bus_close_connection(struct bus *b);
-+static void bus_poool_free_slice(struct bus *b, uint64_t offset);
-+static int bus_acquire_name(struct bus *b, const char *name);
-+static int bus_install_name_loss_match(struct bus *b, const char *name);
-+static int bus_poll(struct bus *b);
-+static int bus_make(uid_t uid, const char *name);
-+
-+struct master {
-+ size_t n_workers;
-+ size_t max_workers;
-+
-+ int signal_fd;
-+ int control_fd;
-+
-+ struct prime *prime;
-+ struct bus *bus;
-+};
-+
-+static int master_new(struct master **out);
-+static void master_free(struct master *m);
-+static int master_run(struct master *m);
-+static int master_poll(struct master *m);
-+static int master_handle_stdin(struct master *m);
-+static int master_handle_signal(struct master *m);
-+static int master_handle_bus(struct master *m);
-+static int master_reply(struct master *m, const struct kdbus_msg *msg);
-+static int master_waitpid(struct master *m);
-+static int master_spawn(struct master *m);
-+
-+struct child {
-+ struct bus *bus;
-+ struct prime *prime;
-+};
-+
-+static int child_new(struct child **out, struct prime *p);
-+static void child_free(struct child *c);
-+static int child_run(struct child *c);
-+
-+/* END OF FORWARD DECLARATIONS */
-+
-+/*
-+ * This is the main entrypoint of this example. It is pretty straightforward. We
-+ * create a master object, run the computation, print a status report and then
-+ * exit. Nothing particularly interesting here, so let's look into the master
-+ * object...
-+ */
-+int main(int argc, char **argv)
-+{
-+ struct master *m = NULL;
-+ int r;
-+
-+ r = master_new(&m);
-+ if (r < 0)
-+ goto out;
-+
-+ r = master_run(m);
-+ if (r < 0)
-+ goto out;
-+
-+ if (0)
-+ prime_print(m->prime);
-+
-+out:
-+ master_free(m);
-+ if (r < 0 && r != -EINTR)
-+ fprintf(stderr, "failed\n");
-+ else
-+ fprintf(stderr, "done\n");
-+ return r < 0 ? EXIT_FAILURE : EXIT_SUCCESS;
-+}
-+
-+/*
-+ * ...this will allocate a new master context. It keeps track of the current
-+ * number of children/workers that are running, manages a signalfd to track
-+ * SIGCHLD, and creates a private kdbus bus. Afterwards, it opens its connection
-+ * to the bus and acquires a well-known name (arg_master).
-+ */
-+static int master_new(struct master **out)
-+{
-+ struct master *m;
-+ sigset_t smask;
-+ int r;
-+
-+ m = calloc(1, sizeof(*m));
-+ if (!m)
-+ return err("cannot allocate master");
-+
-+ m->max_workers = WORKER_COUNT;
-+ m->signal_fd = -1;
-+ m->control_fd = -1;
-+
-+ /* Block SIGINT and SIGCHLD signals */
-+ sigemptyset(&smask);
-+ sigaddset(&smask, SIGINT);
-+ sigaddset(&smask, SIGCHLD);
-+ sigprocmask(SIG_BLOCK, &smask, NULL);
-+
-+ m->signal_fd = signalfd(-1, &smask, SFD_CLOEXEC);
-+ if (m->signal_fd < 0) {
-+ r = err("cannot create signalfd");
-+ goto error;
-+ }
-+
-+ r = prime_new(&m->prime);
-+ if (r < 0)
-+ goto error;
-+
-+ m->control_fd = bus_make(getuid(), arg_busname);
-+ if (m->control_fd < 0) {
-+ r = m->control_fd;
-+ goto error;
-+ }
-+
-+ /*
-+ * Open a bus connection for the master, and require each received
-+ * message to have a metadata item of type KDBUS_ITEM_PIDS attached.
-+ * The current UID is needed to compute the name of the bus node to
-+ * connect to.
-+ */
-+ r = bus_open_connection(&m->bus, getuid(),
-+ arg_busname, KDBUS_ATTACH_PIDS);
-+ if (r < 0)
-+ goto error;
-+
-+ /*
-+ * Acquire a well-known name on the bus, so children can address
-+ * messages to the master using KDBUS_DST_ID_NAME as destination-ID
-+ * of messages.
-+ */
-+ r = bus_acquire_name(m->bus, arg_master);
-+ if (r < 0)
-+ goto error;
-+
-+ *out = m;
-+ return 0;
-+
-+error:
-+ master_free(m);
-+ return r;
-+}
-+
-+/* pretty straightforward destructor of a master object */
-+static void master_free(struct master *m)
-+{
-+ if (!m)
-+ return;
-+
-+ bus_close_connection(m->bus);
-+ if (m->control_fd >= 0)
-+ close(m->control_fd);
-+ prime_free(m->prime);
-+ if (m->signal_fd >= 0)
-+ close(m->signal_fd);
-+ free(m);
-+}
-+
-+static int master_run(struct master *m)
-+{
-+ int res, r = 0;
-+
-+ while (!prime_done(m->prime)) {
-+ while (m->n_workers < m->max_workers) {
-+ r = master_spawn(m);
-+ if (r < 0)
-+ break;
-+ }
-+
-+ r = master_poll(m);
-+ if (r < 0)
-+ break;
-+ }
-+
-+ if (r < 0) {
-+ bus_close_connection(m->bus);
-+ m->bus = NULL;
-+ }
-+
-+ while (m->n_workers > 0) {
-+ res = master_poll(m);
-+ if (res < 0) {
-+ if (m->bus) {
-+ bus_close_connection(m->bus);
-+ m->bus = NULL;
-+ }
-+ r = res;
-+ }
-+ }
-+
-+ return r == -EINTR ? 0 : r;
-+}
-+
-+static int master_poll(struct master *m)
-+{
-+ struct pollfd fds[3] = {};
-+ int r = 0, n = 0;
-+
-+ /*
-+ * Add stdin, the signalfd and the connection owner file descriptor to
-+ * the pollfd table, and handle incoming traffic on the latter in
-+ * master_handle_bus().
-+ */
-+ fds[n].fd = STDIN_FILENO;
-+ fds[n++].events = POLLIN;
-+ fds[n].fd = m->signal_fd;
-+ fds[n++].events = POLLIN;
-+ if (m->bus) {
-+ fds[n].fd = m->bus->fd;
-+ fds[n++].events = POLLIN;
-+ }
-+
-+ r = poll(fds, n, -1);
-+ if (r < 0)
-+ return err("poll() failed");
-+
-+ if (fds[0].revents & POLLIN)
-+ r = master_handle_stdin(m);
-+ else if (fds[0].revents)
-+ r = err("ERR/HUP on stdin");
-+ if (r < 0)
-+ return r;
-+
-+ if (fds[1].revents & POLLIN)
-+ r = master_handle_signal(m);
-+ else if (fds[1].revents)
-+ r = err("ERR/HUP on signalfd");
-+ if (r < 0)
-+ return r;
-+
-+ if (fds[2].revents & POLLIN)
-+ r = master_handle_bus(m);
-+ else if (fds[2].revents)
-+ r = err("ERR/HUP on bus");
-+
-+ return r;
-+}
-+
-+static int master_handle_stdin(struct master *m)
-+{
-+ char buf[128];
-+ ssize_t l;
-+ int r = 0;
-+
-+ l = read(STDIN_FILENO, buf, sizeof(buf));
-+ if (l < 0)
-+ return err("cannot read stdin");
-+ if (l == 0)
-+ return err_r(-EINVAL, "EOF on stdin");
-+
-+ while (l-- > 0) {
-+ switch (buf[l]) {
-+ case 'q':
-+ /* quit */
-+ r = -EINTR;
-+ break;
-+ case '\n':
-+ case ' ':
-+ /* ignore */
-+ break;
-+ default:
-+ if (isgraph(buf[l]))
-+ fprintf(stderr, "invalid input '%c'\n", buf[l]);
-+ else
-+ fprintf(stderr, "invalid input 0x%x\n", buf[l]);
-+ break;
-+ }
-+ }
-+
-+ return r;
-+}
-+
-+static int master_handle_signal(struct master *m)
-+{
-+ struct signalfd_siginfo val;
-+ ssize_t l;
-+
-+ l = read(m->signal_fd, &val, sizeof(val));
-+ if (l < 0)
-+ return err("cannot read signalfd");
-+ if (l != sizeof(val))
-+ return err_r(-EINVAL, "invalid data from signalfd");
-+
-+ switch (val.ssi_signo) {
-+ case SIGCHLD:
-+ return master_waitpid(m);
-+ case SIGINT:
-+ return err_r(-EINTR, "interrupted");
-+ default:
-+ return err_r(-EINVAL, "caught invalid signal");
-+ }
-+}
-+
-+static int master_handle_bus(struct master *m)
-+{
-+ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
-+ const struct kdbus_msg *msg = NULL;
-+ const struct kdbus_item *item;
-+ const struct kdbus_vec *vec = NULL;
-+ int r = 0;
-+
-+ /*
-+ * To receive a message, the KDBUS_CMD_RECV ioctl is used.
-+ * It takes an argument of type 'struct kdbus_cmd_recv', which
-+ * will contain information on the received message when the call
-+ * returns. See kdbus.message(7).
-+ */
-+ r = kdbus_cmd_recv(m->bus->fd, &recv);
-+ /*
-+ * EAGAIN is returned when there is no message waiting on this
-+ * connection. This is not an error - simply bail out.
-+ */
-+ if (r == -EAGAIN)
-+ return 0;
-+ if (r < 0)
-+ return err_r(r, "cannot receive message");
-+
-+ /*
-+ * Messages received by a connection are stored inside the connection's
-+ * pool, at an offset that has been returned in the 'recv' command
-+ * struct above. The value describes the relative offset from the
-+ * start address of the pool. A message is described with
-+ * 'struct kdbus_msg'. See kdbus.message(7).
-+ */
-+ msg = (void *)(m->bus->pool + recv.msg.offset);
-+
-+ /*
-+ * A message describes its actual payload in an array of items.
-+ * KDBUS_FOREACH() is a simple iterator that walks such an array.
-+ * struct kdbus_msg has a field to denote its total size, which is
-+ * needed to determine the number of items in the array.
-+ */
-+ KDBUS_FOREACH(item, msg->items,
-+ msg->size - offsetof(struct kdbus_msg, items)) {
-+ /*
-+ * An item of type PAYLOAD_OFF describes in-line memory
-+ * stored in the pool at a described offset. That offset is
-+ * relative to the start address of the message header.
-+ * This example program only expects one single item of that
-+ * type, remembers the struct kdbus_vec member of the item
-+ * when it sees it, and bails out if there is more than one
-+ * of them.
-+ */
-+ if (item->type == KDBUS_ITEM_PAYLOAD_OFF) {
-+ if (vec) {
-+ r = err_r(-EEXIST,
-+ "message with multiple vecs");
-+ break;
-+ }
-+ vec = &item->vec;
-+ if (vec->size != 1) {
-+ r = err_r(-EINVAL, "invalid message size");
-+ break;
-+ }
-+
-+ /*
-+ * MEMFDs are transported as items of type PAYLOAD_MEMFD.
-+ * If such an item is attached, a new file descriptor was
-+ * installed into the task when KDBUS_CMD_RECV was called, and
-+ * its number is stored in item->memfd.fd.
-+ * Implementers *must* handle this item type and close the
-+ * file descriptor when no longer needed in order to prevent
-+ * file descriptor exhaustion. This example program just bails
-+ * out with an error in this case, as memfds are not expected
-+ * in this context.
-+ */
-+ } else if (item->type == KDBUS_ITEM_PAYLOAD_MEMFD) {
-+ r = err_r(-EINVAL, "message with memfd");
-+ break;
-+ }
-+ }
-+ if (r < 0)
-+ goto exit;
-+ if (!vec) {
-+ r = err_r(-EINVAL, "empty message");
-+ goto exit;
-+ }
-+
-+ switch (*((const uint8_t *)msg + vec->offset)) {
-+ case 'r': {
-+ r = master_reply(m, msg);
-+ break;
-+ }
-+ default:
-+ r = err_r(-EINVAL, "invalid message type");
-+ break;
-+ }
-+
-+exit:
-+ /*
-+ * We are done with the memory slice that was given to us through
-+ * recv.msg.offset. Tell the kernel it can use it for other content
-+ * in the future. See kdbus.pool(7).
-+ */
-+ bus_poool_free_slice(m->bus, recv.msg.offset);
-+ return r;
-+}
-+
-+static int master_reply(struct master *m, const struct kdbus_msg *msg)
-+{
-+ struct kdbus_cmd_send cmd;
-+ struct kdbus_item *item;
-+ struct kdbus_msg *reply;
-+ size_t size, status, p[2];
-+ int r;
-+
-+ /*
-+ * This function sends a message over kdbus. To do this, it uses the
-+ * KDBUS_CMD_SEND ioctl, which takes a command struct argument of type
-+ * 'struct kdbus_cmd_send'. This struct stores a pointer to the actual
-+ * message to send. See kdbus.message(7).
-+ */
-+ p[0] = m->prime->done;
-+ p[1] = prime_done(m->prime) ? 0 : PRIME_STEPS;
-+
-+ size = sizeof(*reply);
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+ /* Prepare the message to send */
-+ reply = alloca(size);
-+ memset(reply, 0, size);
-+ reply->size = size;
-+
-+ /* Each message has a cookie that can be used to send replies */
-+ reply->cookie = 1;
-+
-+ /* The payload_type is arbitrary, but it must be non-zero */
-+ reply->payload_type = 0xdeadbeef;
-+
-+ /*
-+ * We are sending a reply. Let the kernel know the cookie of the
-+ * message we are replying to.
-+ */
-+ reply->cookie_reply = msg->cookie;
-+
-+ /*
-+ * Messages can either be directed to a well-known name (stored as
-+ * string) or to a unique name (stored as number). This example does
-+ * the latter. If the message were directed to a well-known name
-+ * instead, the message's dst_id field would be set to
-+ * KDBUS_DST_ID_NAME, and the name would be attached in an item of type
-+ * KDBUS_ITEM_DST_NAME. See below for an example, and also refer to
-+ * kdbus.message(7).
-+ */
-+ reply->dst_id = msg->src_id;
-+
-+ /* Our message has exactly one item to store its payload */
-+ item = reply->items;
-+ item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+ item->vec.address = (uintptr_t)p;
-+ item->vec.size = sizeof(p);
-+
-+ /*
-+ * Now prepare the command struct, and reference the message we want
-+ * to send.
-+ */
-+ memset(&cmd, 0, sizeof(cmd));
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)reply;
-+
-+ /*
-+ * Finally, employ the command on the connection owner
-+ * file descriptor.
-+ */
-+ r = kdbus_cmd_send(m->bus->fd, &cmd);
-+ if (r < 0)
-+ return err_r(r, "cannot send reply");
-+
-+ if (p[1]) {
-+ prime_consume(m->prime, p[1]);
-+ status = m->prime->done * 10000 / m->prime->max;
-+ if (status != m->prime->status) {
-+ m->prime->status = status;
-+ fprintf(stderr, "status: %7.3lf%%\n",
-+ (double)status / 100);
-+ }
-+ }
-+
-+ return 0;
-+}
-+
-+static int master_waitpid(struct master *m)
-+{
-+ pid_t pid;
-+ int r;
-+
-+ while ((pid = waitpid(-1, &r, WNOHANG)) > 0) {
-+ if (m->n_workers > 0)
-+ --m->n_workers;
-+ if (!WIFEXITED(r))
-+ r = err_r(-EINVAL, "child died unexpectedly");
-+ else if (WEXITSTATUS(r) != 0)
-+ r = err_r(-WEXITSTATUS(r), "child failed");
-+ }
-+
-+ return r;
-+}
-+
-+static int master_spawn(struct master *m)
-+{
-+ struct child *c = NULL;
-+ struct prime *p = NULL;
-+ pid_t pid;
-+ int r;
-+
-+ /* Spawn off one child and call child_run() inside it */
-+
-+ pid = fork();
-+ if (pid < 0)
-+ return err("cannot fork");
-+ if (pid > 0) {
-+ /* parent */
-+ ++m->n_workers;
-+ return 0;
-+ }
-+
-+ /* child */
-+
-+ p = m->prime;
-+ m->prime = NULL;
-+ master_free(m);
-+
-+ r = child_new(&c, p);
-+ if (r < 0)
-+ goto exit;
-+
-+ r = child_run(c);
-+
-+exit:
-+ child_free(c);
-+ exit(abs(r));
-+}
-+
-+static int child_new(struct child **out, struct prime *p)
-+{
-+ struct child *c;
-+ int r;
-+
-+ c = calloc(1, sizeof(*c));
-+ if (!c)
-+ return err("cannot allocate child");
-+
-+ c->prime = p;
-+
-+ /*
-+ * Open a connection to the bus and require each received message to
-+ * carry a list of the well-known names the sending connection currently
-+ * owns. The current UID is needed in order to determine the name of the
-+ * bus node to connect to.
-+ */
-+ r = bus_open_connection(&c->bus, getuid(),
-+ arg_busname, KDBUS_ATTACH_NAMES);
-+ if (r < 0)
-+ goto error;
-+
-+ /*
-+ * Install a kdbus match so the child's connection gets notified when
-+ * the master loses its well-known name.
-+ */
-+ r = bus_install_name_loss_match(c->bus, arg_master);
-+ if (r < 0)
-+ goto error;
-+
-+ *out = c;
-+ return 0;
-+
-+error:
-+ child_free(c);
-+ return r;
-+}
-+
-+static void child_free(struct child *c)
-+{
-+ if (!c)
-+ return;
-+
-+ bus_close_connection(c->bus);
-+ prime_free(c->prime);
-+ free(c);
-+}
-+
-+static int child_run(struct child *c)
-+{
-+ struct kdbus_cmd_send cmd;
-+ struct kdbus_item *item;
-+ struct kdbus_vec *vec = NULL;
-+ struct kdbus_msg *msg;
-+ struct timespec spec;
-+ size_t n, steps, size;
-+ int r = 0;
-+
-+ /*
-+ * Let's send a message to the master and ask for work. To do this,
-+ * we use the KDBUS_CMD_SEND ioctl, which takes an argument of type
-+ * 'struct kdbus_cmd_send'. This struct stores a pointer to the actual
-+ * message to send. See kdbus.message(7).
-+ */
-+ size = sizeof(*msg);
-+ size += KDBUS_ITEM_SIZE(strlen(arg_master) + 1);
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+ msg = alloca(size);
-+ memset(msg, 0, size);
-+ msg->size = size;
-+
-+ /*
-+ * Tell the kernel that we expect a reply to this message. This means
-+ * that
-+ *
-+ * a) The remote peer will gain temporary permission to talk to us
-+ * even if it would not be allowed to normally.
-+ *
-+ * b) A timeout value is required.
-+ *
-+ * For asynchronous send commands, if no reply is received, we will
-+ * get a kernel notification with an item of type
-+ * KDBUS_ITEM_REPLY_TIMEOUT attached.
-+ *
-+ * For synchronous send commands (which this example does), the
-+ * ioctl will block until a reply is received or the timeout is
-+ * exceeded.
-+ */
-+ msg->flags = KDBUS_MSG_EXPECT_REPLY;
-+
-+ /* Set our cookie. A reply must carry it in its cookie_reply field. */
-+ msg->cookie = 1;
-+
-+ /* The payload_type is arbitrary, but it must be non-zero */
-+ msg->payload_type = 0xdeadbeef;
-+
-+ /*
-+ * We are sending our message to the current owner of a well-known
-+ * name. This makes an item of type KDBUS_ITEM_DST_NAME mandatory.
-+ */
-+ msg->dst_id = KDBUS_DST_ID_NAME;
-+
-+ /*
-+ * Set the reply timeout to 5 seconds. Timeouts are always set in
-+ * absolute timestamps, based on CLOCK_MONOTONIC. See kdbus.message(7).
-+ */
-+ clock_gettime(CLOCK_MONOTONIC_COARSE, &spec);
-+ msg->timeout_ns += (5 + spec.tv_sec) * 1000ULL * 1000ULL * 1000ULL;
-+ msg->timeout_ns += spec.tv_nsec;
-+
-+ /*
-+ * Fill the appended items. First, set the well-known name of the
-+ * destination we want to talk to.
-+ */
-+ item = msg->items;
-+ item->type = KDBUS_ITEM_DST_NAME;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(arg_master) + 1;
-+ strcpy(item->str, arg_master);
-+
-+ /*
-+ * The 2nd item contains a vector to memory we want to send. It
-+ * can be content of any type. In our case, we're sending a one-byte
-+ * string only. The memory referenced by this item will be copied into
-+ * the pool of the receiver connection, and does not need to be valid
-+ * after the command is employed.
-+ */
-+ item = KDBUS_ITEM_NEXT(item);
-+ item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+ item->vec.address = (uintptr_t)"r";
-+ item->vec.size = 1;
-+
-+ /* Set up the command struct and reference the message we prepared */
-+ memset(&cmd, 0, sizeof(cmd));
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)msg;
-+
-+ /*
-+ * The send command has a mode in which it will block until a
-+ * reply to a message is received. This example uses that mode.
-+ * The pool offset to the received reply will be stored in the command
-+ * struct after the send command returned. See below.
-+ */
-+ cmd.flags = KDBUS_SEND_SYNC_REPLY;
-+
-+ /*
-+ * Finally, employ the command on the connection owner
-+ * file descriptor.
-+ */
-+ r = kdbus_cmd_send(c->bus->fd, &cmd);
-+ if (r == -ESRCH || r == -EPIPE || r == -ECONNRESET)
-+ return 0;
-+ if (r < 0)
-+ return err_r(r, "cannot send request to master");
-+
-+ /*
-+ * The command was sent with the KDBUS_SEND_SYNC_REPLY flag set,
-+ * and returned successfully, which means that cmd.reply.offset now
-+ * points to a message inside our connection's pool where the reply
-+ * is found. This is equivalent to receiving the reply with
-+ * KDBUS_CMD_RECV, but it doesn't require waiting for the reply with
-+ * poll() and also saves the ioctl to receive the message.
-+ */
-+ msg = (void *)(c->bus->pool + cmd.reply.offset);
-+
-+ /*
-+ * A message describes its actual payload in an array of items.
-+ * KDBUS_FOREACH() is a simple iterator that walks such an array.
-+ * struct kdbus_msg has a field to denote its total size, which is
-+ * needed to determine the number of items in the array.
-+ */
-+ KDBUS_FOREACH(item, msg->items,
-+ msg->size - offsetof(struct kdbus_msg, items)) {
-+ /*
-+ * An item of type PAYLOAD_OFF describes in-line memory
-+ * stored in the pool at a described offset. That offset is
-+ * relative to the start address of the message header.
-+ * This example program only expects one single item of that
-+ * type, remembers the struct kdbus_vec member of the item
-+ * when it sees it, and bails out if there is more than one
-+ * of them.
-+ */
-+ if (item->type == KDBUS_ITEM_PAYLOAD_OFF) {
-+ if (vec) {
-+ r = err_r(-EEXIST,
-+ "message with multiple vecs");
-+ break;
-+ }
-+ vec = &item->vec;
-+ if (vec->size != 2 * sizeof(size_t)) {
-+ r = err_r(-EINVAL, "invalid message size");
-+ break;
-+ }
-+ /*
-+ * MEMFDs are transported as items of type PAYLOAD_MEMFD.
-+ * If such an item is attached, a new file descriptor was
-+ * installed into the task when KDBUS_CMD_RECV was called, and
-+ * its number is stored in item->memfd.fd.
-+ * Implementers *must* handle this item type and close the
-+ * file descriptor when no longer needed in order to prevent
-+ * file descriptor exhaustion. This example program just bails
-+ * out with an error in this case, as memfds are not expected
-+ * in this context.
-+ */
-+ } else if (item->type == KDBUS_ITEM_PAYLOAD_MEMFD) {
-+ r = err_r(-EINVAL, "message with memfd");
-+ break;
-+ }
-+ }
-+ if (r < 0)
-+ goto exit;
-+ if (!vec) {
-+ r = err_r(-EINVAL, "empty message");
-+ goto exit;
-+ }
-+
-+ n = ((size_t *)((const uint8_t *)msg + vec->offset))[0];
-+ steps = ((size_t *)((const uint8_t *)msg + vec->offset))[1];
-+
-+ while (steps-- > 0) {
-+ ++n;
-+ r = prime_run(c->prime, c->bus, n);
-+ if (r < 0)
-+ break;
-+ r = bus_poll(c->bus);
-+ if (r != 0) {
-+ r = r < 0 ? r : -EINTR;
-+ break;
-+ }
-+ }
-+
-+exit:
-+ /*
-+ * We are done with the memory slice that was given to us through
-+ * cmd.reply.offset. Tell the kernel it can use it for other content
-+ * in the future. See kdbus.pool(7).
-+ */
-+ bus_poool_free_slice(c->bus, cmd.reply.offset);
-+ return r;
-+}
-+
-+/*
-+ * Prime Computation
-+ *
-+ */
-+
-+static int prime_new(struct prime **out)
-+{
-+ struct prime *p;
-+ int r;
-+
-+ p = calloc(1, sizeof(*p));
-+ if (!p)
-+ return err("cannot allocate prime memory");
-+
-+ p->fd = -1;
-+ p->area = MAP_FAILED;
-+ p->max = MAX_PRIMES;
-+
-+ /*
-+ * Prepare and map a memfd to store the bit-fields for the number
-+ * ranges we want to perform the prime detection on.
-+ */
-+ p->fd = syscall(__NR_memfd_create, "prime-area", MFD_CLOEXEC);
-+ if (p->fd < 0) {
-+ r = err("cannot create memfd");
-+ goto error;
-+ }
-+
-+ r = ftruncate(p->fd, p->max / 8 + 1);
-+ if (r < 0) {
-+ r = err("cannot ftruncate area");
-+ goto error;
-+ }
-+
-+ p->area = mmap(NULL, p->max / 8 + 1, PROT_READ | PROT_WRITE,
-+ MAP_SHARED, p->fd, 0);
-+ if (p->area == MAP_FAILED) {
-+ r = err("cannot mmap memfd");
-+ goto error;
-+ }
-+
-+ *out = p;
-+ return 0;
-+
-+error:
-+ prime_free(p);
-+ return r;
-+}
-+
-+static void prime_free(struct prime *p)
-+{
-+ if (!p)
-+ return;
-+
-+ if (p->area != MAP_FAILED)
-+ munmap(p->area, p->max / 8 + 1);
-+ if (p->fd >= 0)
-+ close(p->fd);
-+ free(p);
-+}
-+
-+static bool prime_done(struct prime *p)
-+{
-+ return p->done >= p->max;
-+}
-+
-+static void prime_consume(struct prime *p, size_t amount)
-+{
-+ p->done += amount;
-+}
-+
-+static int prime_run(struct prime *p, struct bus *cancel, size_t number)
-+{
-+ size_t i, n = 0;
-+ int r;
-+
-+ if (number < 2 || number > 65535)
-+ return 0;
-+
-+ for (i = number * number;
-+ i < p->max && i > number;
-+ i += number) {
-+ p->area[i / 8] |= 1 << (i % 8);
-+
-+ if (!(++n % (1 << 20))) {
-+ r = bus_poll(cancel);
-+ if (r != 0)
-+ return r < 0 ? r : -EINTR;
-+ }
-+ }
-+
-+ return 0;
-+}
-+
-+static void prime_print(struct prime *p)
-+{
-+ size_t i, l = 0;
-+
-+ fprintf(stderr, "PRIMES:");
-+ for (i = 0; i < p->max; ++i) {
-+ if (!(p->area[i / 8] & (1 << (i % 8))))
-+ fprintf(stderr, "%c%7zu", !(l++ % 16) ? '\n' : ' ', i);
-+ }
-+ fprintf(stderr, "\nEND\n");
-+}
-+
-+static int bus_open_connection(struct bus **out, uid_t uid, const char *name,
-+ uint64_t recv_flags)
-+{
-+ struct kdbus_cmd_hello hello;
-+ char path[128];
-+ struct bus *b;
-+ int r;
-+
-+ /*
-+ * The 'bus' object is our representation of a kdbus connection which
-+ * stores two details: the connection owner file descriptor, and the
-+ * mmap()ed memory of its associated pool. See kdbus.connection(7) and
-+ * kdbus.pool(7).
-+ */
-+ b = calloc(1, sizeof(*b));
-+ if (!b)
-+ return err("cannot allocate bus memory");
-+
-+ b->fd = -1;
-+ b->pool = MAP_FAILED;
-+
-+ /* Compute the name of the bus node to connect to. */
-+ snprintf(path, sizeof(path), "/sys/fs/%s/%lu-%s/bus",
-+ arg_modname, (unsigned long)uid, name);
-+ b->fd = open(path, O_RDWR | O_CLOEXEC);
-+ if (b->fd < 0) {
-+ r = err("cannot open bus");
-+ goto error;
-+ }
-+
-+ /*
-+ * To make a connection to the bus, the KDBUS_CMD_HELLO ioctl is used.
-+ * It takes an argument of type 'struct kdbus_cmd_hello'.
-+ */
-+ memset(&hello, 0, sizeof(hello));
-+ hello.size = sizeof(hello);
-+
-+ /*
-+ * Specify a mask of metadata attach flags, describing metadata items
-+ * that this new connection allows to be sent.
-+ */
-+ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+
-+ /*
-+ * Specify a mask of metadata attach flags, describing metadata items
-+ * that this new connection wants to receive along with each message.
-+ */
-+ hello.attach_flags_recv = recv_flags;
-+
-+ /*
-+ * A connection may choose the size of its pool, but the number has to
-+ * comply with two rules: a) it must be greater than 0, and b) it must
-+ * be a multiple of PAGE_SIZE. See kdbus.pool(7).
-+ */
-+ hello.pool_size = POOL_SIZE;
-+
-+ /*
-+ * Now employ the command on the file descriptor opened above.
-+ * This command will turn the file descriptor into a connection-owner
-+ * file descriptor that controls the life-time of the connection; once
-+ * it's closed, the connection is shut down.
-+ */
-+ r = kdbus_cmd_hello(b->fd, &hello);
-+ if (r < 0) {
-+ err_r(r, "HELLO failed");
-+ goto error;
-+ }
-+
-+ bus_poool_free_slice(b, hello.offset);
-+
-+ /*
-+ * Map the pool of the connection. Its size has been set in the
-+ * command struct above. See kdbus.pool(7).
-+ */
-+ b->pool = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, b->fd, 0);
-+ if (b->pool == MAP_FAILED) {
-+ r = err("cannot mmap pool");
-+ goto error;
-+ }
-+
-+ *out = b;
-+ return 0;
-+
-+error:
-+ bus_close_connection(b);
-+ return r;
-+}
-+
-+static void bus_close_connection(struct bus *b)
-+{
-+ if (!b)
-+ return;
-+
-+ /*
-+ * A bus connection is closed by simply calling close() on the
-+ * connection owner file descriptor. The unique name and all owned
-+ * well-known names of the connection will disappear.
-+ * See kdbus.connection(7).
-+ */
-+ if (b->pool != MAP_FAILED)
-+ munmap(b->pool, POOL_SIZE);
-+ if (b->fd >= 0)
-+ close(b->fd);
-+ free(b);
-+}
-+
-+static void bus_poool_free_slice(struct bus *b, uint64_t offset)
-+{
-+ struct kdbus_cmd_free cmd = {
-+ .size = sizeof(cmd),
-+ .offset = offset,
-+ };
-+ int r;
-+
-+ /*
-+ * Once we're done with a piece of pool memory that was returned
-+ * by a command, we have to call the KDBUS_CMD_FREE ioctl on it so it
-+ * can be reused. The command takes an argument of type
-+ * 'struct kdbus_cmd_free', in which the pool offset of the slice to
-+ * free is stored. The ioctl is employed on the connection owner
-+ * file descriptor. See kdbus.pool(7).
-+ */
-+ r = kdbus_cmd_free(b->fd, &cmd);
-+ if (r < 0)
-+ err_r(r, "cannot free pool slice");
-+}
-+
-+static int bus_acquire_name(struct bus *b, const char *name)
-+{
-+ struct kdbus_item *item;
-+ struct kdbus_cmd *cmd;
-+ size_t size;
-+ int r;
-+
-+ /*
-+ * This function acquires a well-known name on the bus through the
-+ * KDBUS_CMD_NAME_ACQUIRE ioctl. This ioctl takes an argument of type
-+ * 'struct kdbus_cmd', which is assembled below. See kdbus.name(7).
-+ */
-+ size = sizeof(*cmd);
-+ size += KDBUS_ITEM_SIZE(strlen(name) + 1);
-+
-+ cmd = alloca(size);
-+ memset(cmd, 0, size);
-+ cmd->size = size;
-+
-+ /*
-+ * The command requires an item of type KDBUS_ITEM_NAME, and its
-+ * content must be a valid bus name.
-+ */
-+ item = cmd->items;
-+ item->type = KDBUS_ITEM_NAME;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
-+ strcpy(item->str, name);
-+
-+ /*
-+ * Employ the command on the connection owner file descriptor.
-+ */
-+ r = kdbus_cmd_name_acquire(b->fd, cmd);
-+ if (r < 0)
-+ return err_r(r, "cannot acquire name");
-+
-+ return 0;
-+}
-+
-+static int bus_install_name_loss_match(struct bus *b, const char *name)
-+{
-+ struct kdbus_cmd_match *match;
-+ struct kdbus_item *item;
-+ size_t size;
-+ int r;
-+
-+ /*
-+ * In order to install a match for signal messages, we have to
-+ * assemble a 'struct kdbus_cmd_match' and use it along with the
-+ * KDBUS_CMD_MATCH_ADD ioctl. See kdbus.match(7).
-+ */
-+ size = sizeof(*match);
-+ size += KDBUS_ITEM_SIZE(sizeof(item->name_change) + strlen(name) + 1);
-+
-+ match = alloca(size);
-+ memset(match, 0, size);
-+ match->size = size;
-+
-+ /*
-+ * A match is comprised of many 'rules', each of which describes a
-+ * mandatory detail of the message. All rules of a match must be
-+ * satisfied in order to make a message pass.
-+ */
-+ item = match->items;
-+
-+ /*
-+ * In this case, we're interested in notifications that inform us
-+ * about a well-known name being removed from the bus.
-+ */
-+ item->type = KDBUS_ITEM_NAME_REMOVE;
-+ item->size = KDBUS_ITEM_HEADER_SIZE +
-+ sizeof(item->name_change) + strlen(name) + 1;
-+
-+ /*
-+ * We could limit the match further and require a specific unique-ID
-+ * to be the new or the old owner of the name. In this case, however,
-+ * we don't, and allow 'any' id.
-+ */
-+ item->name_change.old_id.id = KDBUS_MATCH_ID_ANY;
-+ item->name_change.new_id.id = KDBUS_MATCH_ID_ANY;
-+
-+ /* Copy in the well-known name we're interested in */
-+ strcpy(item->name_change.name, name);
-+
-+ /*
-+ * Add the match through the KDBUS_CMD_MATCH_ADD ioctl, employed on
-+ * the connection owner fd.
-+ */
-+ r = kdbus_cmd_match_add(b->fd, match);
-+ if (r < 0)
-+ return err_r(r, "cannot add match");
-+
-+ return 0;
-+}
-+
-+static int bus_poll(struct bus *b)
-+{
-+ struct pollfd fds[1] = {};
-+ int r;
-+
-+ /*
-+ * A connection endpoint supports poll() and will wake-up the
-+ * task with POLLIN set once a message has arrived.
-+ */
-+ fds[0].fd = b->fd;
-+ fds[0].events = POLLIN;
-+ r = poll(fds, sizeof(fds) / sizeof(*fds), 0);
-+ if (r < 0)
-+ return err("cannot poll bus");
-+ return !!(fds[0].revents & POLLIN);
-+}
-+
-+static int bus_make(uid_t uid, const char *name)
-+{
-+ struct kdbus_item *item;
-+ struct kdbus_cmd *make;
-+ char path[128], busname[128];
-+ size_t size;
-+ int r, fd;
-+
-+ /*
-+ * Compute the full path to the 'control' node. 'arg_modname' may be
-+ * set to a different value than 'kdbus' for development purposes.
-+ * The 'control' node is the primary entry point to kdbus that must be
-+ * used in order to create a bus. See kdbus(7) and kdbus.bus(7).
-+ */
-+ snprintf(path, sizeof(path), "/sys/fs/%s/control", arg_modname);
-+
-+ /*
-+ * Compute the bus name. A valid bus name must always be prefixed with
-+ * the EUID of the currently running process in order to avoid name
-+ * conflicts. See kdbus.bus(7).
-+ */
-+ snprintf(busname, sizeof(busname), "%lu-%s", (unsigned long)uid, name);
-+
-+ fd = open(path, O_RDWR | O_CLOEXEC);
-+ if (fd < 0)
-+ return err("cannot open control file");
-+
-+ /*
-+ * The KDBUS_CMD_BUS_MAKE ioctl takes an argument of type
-+ * 'struct kdbus_cmd', and expects at least two items attached to
-+ * it: one to describe the bloom parameters to be propagated to
-+ * connections of the bus, and the name of the bus that was computed
-+ * above. Assemble this struct now, and fill it with values.
-+ */
-+ size = sizeof(*make);
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_parameter));
-+ size += KDBUS_ITEM_SIZE(strlen(busname) + 1);
-+
-+ make = alloca(size);
-+ memset(make, 0, size);
-+ make->size = size;
-+
-+ /*
-+ * Each item has a 'type' and 'size' field, and must be stored at an
-+ * 8-byte aligned address. The KDBUS_ITEM_NEXT macro is used to advance
-+ * the pointer. See kdbus.item(7) for more details.
-+ */
-+ item = make->items;
-+ item->type = KDBUS_ITEM_BLOOM_PARAMETER;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(item->bloom_parameter);
-+ item->bloom_parameter.size = 8;
-+ item->bloom_parameter.n_hash = 1;
-+
-+ /* The name of the new bus is stored in the next item. */
-+ item = KDBUS_ITEM_NEXT(item);
-+ item->type = KDBUS_ITEM_MAKE_NAME;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(busname) + 1;
-+ strcpy(item->str, busname);
-+
-+ /*
-+ * Now create the bus via the KDBUS_CMD_BUS_MAKE ioctl and return the
-+ * fd that was used back to the caller of this function. This fd is now
-+ * called a 'bus owner file descriptor', and it controls the life-time
-+ * of the newly created bus; once the file descriptor is closed, the
-+ * bus goes away, and all connections are shut down. See kdbus.bus(7).
-+ */
-+ r = kdbus_cmd_bus_make(fd, make);
-+ if (r < 0) {
-+ err_r(r, "cannot make bus");
-+ close(fd);
-+ return r;
-+ }
-+
-+ return fd;
-+}
-+
-+#else
-+
-+#warning "Skipping compilation due to unsupported libc version"
-+
-+int main(int argc, char **argv)
-+{
-+ fprintf(stderr,
-+ "Compilation of %s was skipped due to unsupported libc.\n",
-+ argv[0]);
-+
-+ return EXIT_FAILURE;
-+}
-+
-+#endif /* libc sanity check */
-diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
-index 95abddc..b57100c 100644
---- a/tools/testing/selftests/Makefile
-+++ b/tools/testing/selftests/Makefile
-@@ -5,6 +5,7 @@ TARGETS += exec
- TARGETS += firmware
- TARGETS += ftrace
- TARGETS += kcmp
-+TARGETS += kdbus
- TARGETS += memfd
- TARGETS += memory-hotplug
- TARGETS += mount
-diff --git a/tools/testing/selftests/kdbus/.gitignore b/tools/testing/selftests/kdbus/.gitignore
-new file mode 100644
-index 0000000..d3ef42f
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/.gitignore
-@@ -0,0 +1 @@
-+kdbus-test
-diff --git a/tools/testing/selftests/kdbus/Makefile b/tools/testing/selftests/kdbus/Makefile
-new file mode 100644
-index 0000000..8f36cb5
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/Makefile
-@@ -0,0 +1,49 @@
-+CFLAGS += -I../../../../usr/include/
-+CFLAGS += -I../../../../samples/kdbus/
-+CFLAGS += -I../../../../include/uapi/
-+CFLAGS += -std=gnu99
-+CFLAGS += -DKBUILD_MODNAME=\"kdbus\" -D_GNU_SOURCE
-+LDLIBS = -pthread -lcap -lm
-+
-+OBJS= \
-+ kdbus-enum.o \
-+ kdbus-util.o \
-+ kdbus-test.o \
-+ kdbus-test.o \
-+ test-activator.o \
-+ test-benchmark.o \
-+ test-bus.o \
-+ test-chat.o \
-+ test-connection.o \
-+ test-daemon.o \
-+ test-endpoint.o \
-+ test-fd.o \
-+ test-free.o \
-+ test-match.o \
-+ test-message.o \
-+ test-metadata-ns.o \
-+ test-monitor.o \
-+ test-names.o \
-+ test-policy.o \
-+ test-policy-ns.o \
-+ test-policy-priv.o \
-+ test-sync.o \
-+ test-timeout.o
-+
-+all: kdbus-test
-+
-+include ../lib.mk
-+
-+%.o: %.c kdbus-enum.h kdbus-test.h kdbus-util.h
-+ $(CC) $(CFLAGS) -c $< -o $@
-+
-+kdbus-test: $(OBJS)
-+ $(CC) $(CFLAGS) $^ $(LDLIBS) -o $@
-+
-+TEST_PROGS := kdbus-test
-+
-+run_tests:
-+ ./kdbus-test --tap
-+
-+clean:
-+ rm -f *.o kdbus-test
-diff --git a/tools/testing/selftests/kdbus/kdbus-enum.c b/tools/testing/selftests/kdbus/kdbus-enum.c
-new file mode 100644
-index 0000000..4f1e579
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/kdbus-enum.c
-@@ -0,0 +1,94 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+struct kdbus_enum_table {
-+ long long id;
-+ const char *name;
-+};
-+
-+#define TABLE(what) static struct kdbus_enum_table kdbus_table_##what[]
-+#define ENUM(_id) { .id = _id, .name = STRINGIFY(_id) }
-+#define LOOKUP(what) \
-+ const char *enum_##what(long long id) \
-+ { \
-+ for (size_t i = 0; i < ELEMENTSOF(kdbus_table_##what); i++) \
-+ if (id == kdbus_table_##what[i].id) \
-+ return kdbus_table_##what[i].name; \
-+ return "UNKNOWN"; \
-+ }
-+
-+TABLE(CMD) = {
-+ ENUM(KDBUS_CMD_BUS_MAKE),
-+ ENUM(KDBUS_CMD_ENDPOINT_MAKE),
-+ ENUM(KDBUS_CMD_HELLO),
-+ ENUM(KDBUS_CMD_SEND),
-+ ENUM(KDBUS_CMD_RECV),
-+ ENUM(KDBUS_CMD_LIST),
-+ ENUM(KDBUS_CMD_NAME_RELEASE),
-+ ENUM(KDBUS_CMD_CONN_INFO),
-+ ENUM(KDBUS_CMD_MATCH_ADD),
-+ ENUM(KDBUS_CMD_MATCH_REMOVE),
-+};
-+LOOKUP(CMD);
-+
-+TABLE(MSG) = {
-+ ENUM(_KDBUS_ITEM_NULL),
-+ ENUM(KDBUS_ITEM_PAYLOAD_VEC),
-+ ENUM(KDBUS_ITEM_PAYLOAD_OFF),
-+ ENUM(KDBUS_ITEM_PAYLOAD_MEMFD),
-+ ENUM(KDBUS_ITEM_FDS),
-+ ENUM(KDBUS_ITEM_BLOOM_PARAMETER),
-+ ENUM(KDBUS_ITEM_BLOOM_FILTER),
-+ ENUM(KDBUS_ITEM_DST_NAME),
-+ ENUM(KDBUS_ITEM_MAKE_NAME),
-+ ENUM(KDBUS_ITEM_ATTACH_FLAGS_SEND),
-+ ENUM(KDBUS_ITEM_ATTACH_FLAGS_RECV),
-+ ENUM(KDBUS_ITEM_ID),
-+ ENUM(KDBUS_ITEM_NAME),
-+ ENUM(KDBUS_ITEM_TIMESTAMP),
-+ ENUM(KDBUS_ITEM_CREDS),
-+ ENUM(KDBUS_ITEM_PIDS),
-+ ENUM(KDBUS_ITEM_AUXGROUPS),
-+ ENUM(KDBUS_ITEM_OWNED_NAME),
-+ ENUM(KDBUS_ITEM_TID_COMM),
-+ ENUM(KDBUS_ITEM_PID_COMM),
-+ ENUM(KDBUS_ITEM_EXE),
-+ ENUM(KDBUS_ITEM_CMDLINE),
-+ ENUM(KDBUS_ITEM_CGROUP),
-+ ENUM(KDBUS_ITEM_CAPS),
-+ ENUM(KDBUS_ITEM_SECLABEL),
-+ ENUM(KDBUS_ITEM_AUDIT),
-+ ENUM(KDBUS_ITEM_CONN_DESCRIPTION),
-+ ENUM(KDBUS_ITEM_NAME_ADD),
-+ ENUM(KDBUS_ITEM_NAME_REMOVE),
-+ ENUM(KDBUS_ITEM_NAME_CHANGE),
-+ ENUM(KDBUS_ITEM_ID_ADD),
-+ ENUM(KDBUS_ITEM_ID_REMOVE),
-+ ENUM(KDBUS_ITEM_REPLY_TIMEOUT),
-+ ENUM(KDBUS_ITEM_REPLY_DEAD),
-+};
-+LOOKUP(MSG);
-+
-+TABLE(PAYLOAD) = {
-+ ENUM(KDBUS_PAYLOAD_KERNEL),
-+ ENUM(KDBUS_PAYLOAD_DBUS),
-+};
-+LOOKUP(PAYLOAD);
-diff --git a/tools/testing/selftests/kdbus/kdbus-enum.h b/tools/testing/selftests/kdbus/kdbus-enum.h
-new file mode 100644
-index 0000000..ed28cca
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/kdbus-enum.h
-@@ -0,0 +1,15 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#pragma once
-+
-+const char *enum_CMD(long long id);
-+const char *enum_MSG(long long id);
-+const char *enum_MATCH(long long id);
-+const char *enum_PAYLOAD(long long id);
-diff --git a/tools/testing/selftests/kdbus/kdbus-test.c b/tools/testing/selftests/kdbus/kdbus-test.c
-new file mode 100644
-index 0000000..db57381
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/kdbus-test.c
-@@ -0,0 +1,905 @@
-+#include <errno.h>
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <time.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <assert.h>
-+#include <getopt.h>
-+#include <stdbool.h>
-+#include <signal.h>
-+#include <sys/mount.h>
-+#include <sys/prctl.h>
-+#include <sys/wait.h>
-+#include <sys/syscall.h>
-+#include <sys/eventfd.h>
-+#include <linux/sched.h>
-+
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+enum {
-+ TEST_CREATE_BUS = 1 << 0,
-+ TEST_CREATE_CONN = 1 << 1,
-+};
-+
-+struct kdbus_test {
-+ const char *name;
-+ const char *desc;
-+ int (*func)(struct kdbus_test_env *env);
-+ unsigned int flags;
-+};
-+
-+struct kdbus_test_args {
-+ bool mntns;
-+ bool pidns;
-+ bool userns;
-+ char *uid_map;
-+ char *gid_map;
-+ int loop;
-+ int wait;
-+ int fork;
-+ int tap_output;
-+ char *module;
-+ char *root;
-+ char *test;
-+ char *busname;
-+};
-+
-+static const struct kdbus_test tests[] = {
-+ {
-+ .name = "bus-make",
-+ .desc = "bus make functions",
-+ .func = kdbus_test_bus_make,
-+ .flags = 0,
-+ },
-+ {
-+ .name = "hello",
-+ .desc = "the HELLO command",
-+ .func = kdbus_test_hello,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "byebye",
-+ .desc = "the BYEBYE command",
-+ .func = kdbus_test_byebye,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "chat",
-+ .desc = "a chat pattern",
-+ .func = kdbus_test_chat,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "daemon",
-+ .desc = "a simple daemon",
-+ .func = kdbus_test_daemon,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "fd-passing",
-+ .desc = "file descriptor passing",
-+ .func = kdbus_test_fd_passing,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "endpoint",
-+ .desc = "custom endpoint",
-+ .func = kdbus_test_custom_endpoint,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "monitor",
-+ .desc = "monitor functionality",
-+ .func = kdbus_test_monitor,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "name-basics",
-+ .desc = "basic name registry functions",
-+ .func = kdbus_test_name_basic,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "name-conflict",
-+ .desc = "name registry conflict details",
-+ .func = kdbus_test_name_conflict,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "name-queue",
-+ .desc = "queuing of names",
-+ .func = kdbus_test_name_queue,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "name-takeover",
-+ .desc = "takeover of names",
-+ .func = kdbus_test_name_takeover,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "message-basic",
-+ .desc = "basic message handling",
-+ .func = kdbus_test_message_basic,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "message-prio",
-+ .desc = "handling of messages with priority",
-+ .func = kdbus_test_message_prio,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "message-quota",
-+ .desc = "message quotas are enforced",
-+ .func = kdbus_test_message_quota,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "memory-access",
-+ .desc = "memory access",
-+ .func = kdbus_test_memory_access,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "timeout",
-+ .desc = "timeout",
-+ .func = kdbus_test_timeout,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "sync-byebye",
-+ .desc = "synchronous replies vs. BYEBYE",
-+ .func = kdbus_test_sync_byebye,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "sync-reply",
-+ .desc = "synchronous replies",
-+ .func = kdbus_test_sync_reply,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "message-free",
-+ .desc = "freeing of memory",
-+ .func = kdbus_test_free,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "connection-info",
-+ .desc = "retrieving connection information",
-+ .func = kdbus_test_conn_info,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "connection-update",
-+ .desc = "updating connection information",
-+ .func = kdbus_test_conn_update,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "writable-pool",
-+ .desc = "verifying pools are never writable",
-+ .func = kdbus_test_writable_pool,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "policy",
-+ .desc = "policy",
-+ .func = kdbus_test_policy,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "policy-priv",
-+ .desc = "unprivileged bus access",
-+ .func = kdbus_test_policy_priv,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "policy-ns",
-+ .desc = "policy in user namespaces",
-+ .func = kdbus_test_policy_ns,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "metadata-ns",
-+ .desc = "metadata in different namespaces",
-+ .func = kdbus_test_metadata_ns,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "match-id-add",
-+ .desc = "adding of matches by id",
-+ .func = kdbus_test_match_id_add,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "match-id-remove",
-+ .desc = "removing of matches by id",
-+ .func = kdbus_test_match_id_remove,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "match-replace",
-+ .desc = "replace of matches with the same cookie",
-+ .func = kdbus_test_match_replace,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "match-name-add",
-+ .desc = "adding of matches by name",
-+ .func = kdbus_test_match_name_add,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "match-name-remove",
-+ .desc = "removing of matches by name",
-+ .func = kdbus_test_match_name_remove,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "match-name-change",
-+ .desc = "matching for name changes",
-+ .func = kdbus_test_match_name_change,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "match-bloom",
-+ .desc = "matching with bloom filters",
-+ .func = kdbus_test_match_bloom,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "activator",
-+ .desc = "activator connections",
-+ .func = kdbus_test_activator,
-+ .flags = TEST_CREATE_BUS | TEST_CREATE_CONN,
-+ },
-+ {
-+ .name = "benchmark",
-+ .desc = "benchmark",
-+ .func = kdbus_test_benchmark,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "benchmark-nomemfds",
-+ .desc = "benchmark without using memfds",
-+ .func = kdbus_test_benchmark_nomemfds,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+ {
-+ .name = "benchmark-uds",
-+ .desc = "benchmark comparison to UDS",
-+ .func = kdbus_test_benchmark_uds,
-+ .flags = TEST_CREATE_BUS,
-+ },
-+};
-+
-+#define N_TESTS ((int) (sizeof(tests) / sizeof(tests[0])))
-+
-+static int test_prepare_env(const struct kdbus_test *t,
-+ const struct kdbus_test_args *args,
-+ struct kdbus_test_env *env)
-+{
-+ if (t->flags & TEST_CREATE_BUS) {
-+ char *s;
-+ char *n = NULL;
-+ int ret;
-+
-+ asprintf(&s, "%s/control", args->root);
-+
-+ env->control_fd = open(s, O_RDWR);
-+ free(s);
-+ ASSERT_RETURN(env->control_fd >= 0);
-+
-+ if (!args->busname) {
-+ n = unique_name("test-bus");
-+ ASSERT_RETURN(n);
-+ }
-+
-+ ret = kdbus_create_bus(env->control_fd,
-+ args->busname ?: n,
-+ _KDBUS_ATTACH_ALL, &s);
-+ free(n);
-+ ASSERT_RETURN(ret == 0);
-+
-+ asprintf(&env->buspath, "%s/%s/bus", args->root, s);
-+ free(s);
-+ }
-+
-+ if (t->flags & TEST_CREATE_CONN) {
-+ env->conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(env->conn);
-+ }
-+
-+ env->root = args->root;
-+ env->module = args->module;
-+
-+ return 0;
-+}
-+
-+void test_unprepare_env(const struct kdbus_test *t, struct kdbus_test_env *env)
-+{
-+ if (env->conn) {
-+ kdbus_conn_free(env->conn);
-+ env->conn = NULL;
-+ }
-+
-+ if (env->control_fd >= 0) {
-+ close(env->control_fd);
-+ env->control_fd = -1;
-+ }
-+
-+ if (env->buspath) {
-+ free(env->buspath);
-+ env->buspath = NULL;
-+ }
-+}
-+
-+static int test_run(const struct kdbus_test *t,
-+ const struct kdbus_test_args *kdbus_args,
-+ int wait)
-+{
-+ int ret;
-+ struct kdbus_test_env env = {};
-+
-+ ret = test_prepare_env(t, kdbus_args, &env);
-+ if (ret != TEST_OK)
-+ return ret;
-+
-+ if (wait > 0) {
-+ printf("Sleeping %d seconds before running test ...\n", wait);
-+ sleep(wait);
-+ }
-+
-+ ret = t->func(&env);
-+ test_unprepare_env(t, &env);
-+ return ret;
-+}
-+
-+static int test_run_forked(const struct kdbus_test *t,
-+ const struct kdbus_test_args *kdbus_args,
-+ int wait)
-+{
-+ int ret;
-+ pid_t pid;
-+
-+ pid = fork();
-+ if (pid < 0) {
-+ return TEST_ERR;
-+ } else if (pid == 0) {
-+ ret = test_run(t, kdbus_args, wait);
-+ _exit(ret);
-+ }
-+
-+ pid = waitpid(pid, &ret, 0);
-+ if (pid <= 0)
-+ return TEST_ERR;
-+ else if (!WIFEXITED(ret))
-+ return TEST_ERR;
-+ else
-+ return WEXITSTATUS(ret);
-+}
-+
-+static void print_test_result(int ret)
-+{
-+ switch (ret) {
-+ case TEST_OK:
-+ printf("OK");
-+ break;
-+ case TEST_SKIP:
-+ printf("SKIPPED");
-+ break;
-+ case TEST_ERR:
-+ printf("ERROR");
-+ break;
-+ }
-+}
-+
-+static int start_all_tests(struct kdbus_test_args *kdbus_args)
-+{
-+ int ret;
-+ unsigned int fail_cnt = 0;
-+ unsigned int skip_cnt = 0;
-+ unsigned int ok_cnt = 0;
-+ unsigned int i;
-+
-+ if (kdbus_args->tap_output) {
-+ printf("1..%d\n", N_TESTS);
-+ fflush(stdout);
-+ }
-+
-+ kdbus_util_verbose = false;
-+
-+ for (i = 0; i < N_TESTS; i++) {
-+ const struct kdbus_test *t = tests + i;
-+
-+ if (!kdbus_args->tap_output) {
-+ unsigned int n;
-+
-+ printf("Testing %s (%s) ", t->desc, t->name);
-+ for (n = 0; n < 60 - strlen(t->desc) - strlen(t->name); n++)
-+ printf(".");
-+ printf(" ");
-+ }
-+
-+ ret = test_run_forked(t, kdbus_args, 0);
-+ switch (ret) {
-+ case TEST_OK:
-+ ok_cnt++;
-+ break;
-+ case TEST_SKIP:
-+ skip_cnt++;
-+ break;
-+ case TEST_ERR:
-+ fail_cnt++;
-+ break;
-+ }
-+
-+ if (kdbus_args->tap_output) {
-+ printf("%sok %d - %s%s (%s)\n",
-+ (ret == TEST_ERR) ? "not " : "", i + 1,
-+ (ret == TEST_SKIP) ? "# SKIP " : "",
-+ t->desc, t->name);
-+ fflush(stdout);
-+ } else {
-+ print_test_result(ret);
-+ printf("\n");
-+ }
-+ }
-+
-+ if (kdbus_args->tap_output)
-+ printf("Failed %d/%d tests, %.2f%% okay\n", fail_cnt, N_TESTS,
-+ 100.0 - (fail_cnt * 100.0) / ((float) N_TESTS));
-+ else
-+ printf("\nSUMMARY: %u tests passed, %u skipped, %u failed\n",
-+ ok_cnt, skip_cnt, fail_cnt);
-+
-+ return fail_cnt > 0 ? TEST_ERR : TEST_OK;
-+}
-+
-+static int start_one_test(struct kdbus_test_args *kdbus_args)
-+{
-+ int i, ret;
-+ bool test_found = false;
-+
-+ for (i = 0; i < N_TESTS; i++) {
-+ const struct kdbus_test *t = tests + i;
-+
-+ if (strcmp(t->name, kdbus_args->test))
-+ continue;
-+
-+ do {
-+ test_found = true;
-+ if (kdbus_args->fork)
-+ ret = test_run_forked(t, kdbus_args,
-+ kdbus_args->wait);
-+ else
-+ ret = test_run(t, kdbus_args,
-+ kdbus_args->wait);
-+
-+ printf("Testing %s: ", t->desc);
-+ print_test_result(ret);
-+ printf("\n");
-+
-+ if (ret != TEST_OK)
-+ break;
-+ } while (kdbus_args->loop);
-+
-+ return ret;
-+ }
-+
-+ if (!test_found) {
-+ printf("Unknown test-id '%s'\n", kdbus_args->test);
-+ return TEST_ERR;
-+ }
-+
-+ return TEST_OK;
-+}
-+
-+static void usage(const char *argv0)
-+{
-+ unsigned int i, j;
-+
-+ printf("Usage: %s [options]\n"
-+ "Options:\n"
-+ "\t-a, --tap Output test results in TAP format\n"
-+ "\t-m, --module <module> Kdbus module name\n"
-+ "\t-x, --loop Run in a loop\n"
-+ "\t-f, --fork Fork before running a test\n"
-+ "\t-h, --help Print this help\n"
-+ "\t-r, --root <root> Toplevel of the kdbus hierarchy\n"
-+ "\t-t, --test <test-id> Run one specific test only, in verbose mode\n"
-+ "\t-b, --bus <busname> Instead of generating a random bus name, take <busname>.\n"
-+ "\t-w, --wait <secs> Wait <secs> before actually starting test\n"
-+ "\t --mntns New mount namespace\n"
-+ "\t --pidns New PID namespace\n"
-+ "\t --userns New user namespace\n"
-+ "\t --uidmap uid_map UID map for user namespace\n"
-+ "\t --gidmap gid_map GID map for user namespace\n"
-+ "\n", argv0);
-+
-+ printf("By default, all tests are run once, and a summary is printed.\n"
-+ "Available tests for --test:\n\n");
-+
-+ for (i = 0; i < N_TESTS; i++) {
-+ const struct kdbus_test *t = tests + i;
-+
-+ printf("\t%s", t->name);
-+
-+ for (j = 0; j < 24 - strlen(t->name); j++)
-+ printf(" ");
-+
-+ printf("Test %s\n", t->desc);
-+ }
-+
-+ printf("\n");
-+ printf("Note that some tests may, if run specifically by --test, "
-+ "behave differently, and not terminate by themselves.\n");
-+
-+ exit(EXIT_FAILURE);
-+}
-+
-+void print_kdbus_test_args(struct kdbus_test_args *args)
-+{
-+ if (args->userns || args->pidns || args->mntns)
-+ printf("# Starting tests in new %s%s%s namespaces%s\n",
-+ args->mntns ? "MOUNT " : "",
-+ args->pidns ? "PID " : "",
-+ args->userns ? "USER " : "",
-+ args->mntns ? ", kdbusfs will be remounted" : "");
-+ else
-+ printf("# Starting tests in the same namespaces\n");
-+}
-+
-+void print_metadata_support(void)
-+{
-+ bool no_meta_audit, no_meta_cgroups, no_meta_seclabel;
-+
-+ /*
-+ * KDBUS_ATTACH_CGROUP, KDBUS_ATTACH_AUDIT and
-+ * KDBUS_ATTACH_SECLABEL
-+ */
-+ no_meta_audit = !config_auditsyscall_is_enabled();
-+ no_meta_cgroups = !config_cgroups_is_enabled();
-+ no_meta_seclabel = !config_security_is_enabled();
-+
-+ if (no_meta_audit | no_meta_cgroups | no_meta_seclabel)
-+ printf("# Starting tests without %s%s%s metadata support\n",
-+ no_meta_audit ? "AUDIT " : "",
-+ no_meta_cgroups ? "CGROUP " : "",
-+ no_meta_seclabel ? "SECLABEL " : "");
-+ else
-+ printf("# Starting tests with full metadata support\n");
-+}
-+
-+int run_tests(struct kdbus_test_args *kdbus_args)
-+{
-+ int ret;
-+ static char control[4096];
-+
-+ snprintf(control, sizeof(control), "%s/control", kdbus_args->root);
-+
-+ if (access(control, W_OK) < 0) {
-+ printf("Unable to locate control node at '%s'.\n",
-+ control);
-+ return TEST_ERR;
-+ }
-+
-+ if (kdbus_args->test) {
-+ ret = start_one_test(kdbus_args);
-+ } else {
-+ do {
-+ ret = start_all_tests(kdbus_args);
-+ if (ret != TEST_OK)
-+ break;
-+ } while (kdbus_args->loop);
-+ }
-+
-+ return ret;
-+}
-+
-+static void nop_handler(int sig) {}
-+
-+static int test_prepare_mounts(struct kdbus_test_args *kdbus_args)
-+{
-+ int ret;
-+ char kdbusfs[64] = {'\0'};
-+
-+ snprintf(kdbusfs, sizeof(kdbusfs), "%sfs", kdbus_args->module);
-+
-+ /* make current mount slave */
-+ ret = mount(NULL, "/", NULL, MS_SLAVE|MS_REC, NULL);
-+ if (ret < 0) {
-+ ret = -errno;
-+ printf("error mount() root: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ /* Remount procfs since we need it in our tests */
-+ if (kdbus_args->pidns) {
-+ ret = mount("proc", "/proc", "proc",
-+ MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL);
-+ if (ret < 0) {
-+ ret = -errno;
-+ printf("error mount() /proc : %d (%m)\n", ret);
-+ return ret;
-+ }
-+ }
-+
-+ /* Remount kdbusfs */
-+ ret = mount(kdbusfs, kdbus_args->root, kdbusfs,
-+ MS_NOSUID|MS_NOEXEC|MS_NODEV, NULL);
-+ if (ret < 0) {
-+ ret = -errno;
-+ printf("error mount() %s :%d (%m)\n", kdbusfs, ret);
-+ return ret;
-+ }
-+
-+ return 0;
-+}
-+
-+int run_tests_in_namespaces(struct kdbus_test_args *kdbus_args)
-+{
-+ int ret;
-+ int efd = -1;
-+ int status;
-+ pid_t pid, rpid;
-+ struct sigaction oldsa;
-+ struct sigaction sa = {
-+ .sa_handler = nop_handler,
-+ .sa_flags = SA_NOCLDSTOP,
-+ };
-+
-+ efd = eventfd(0, EFD_CLOEXEC);
-+ if (efd < 0) {
-+ ret = -errno;
-+ printf("eventfd() failed: %d (%m)\n", ret);
-+ return TEST_ERR;
-+ }
-+
-+ ret = sigaction(SIGCHLD, &sa, &oldsa);
-+ if (ret < 0) {
-+ ret = -errno;
-+ printf("sigaction() failed: %d (%m)\n", ret);
-+ return TEST_ERR;
-+ }
-+
-+ /* setup namespaces */
-+ pid = syscall(__NR_clone, SIGCHLD|
-+ (kdbus_args->userns ? CLONE_NEWUSER : 0) |
-+ (kdbus_args->mntns ? CLONE_NEWNS : 0) |
-+ (kdbus_args->pidns ? CLONE_NEWPID : 0), NULL);
-+ if (pid < 0) {
-+ printf("clone() failed: %d (%m)\n", -errno);
-+ return TEST_ERR;
-+ }
-+
-+ if (pid == 0) {
-+ eventfd_t event_status = 0;
-+
-+ ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
-+ if (ret < 0) {
-+ ret = -errno;
-+ printf("error prctl(): %d (%m)\n", ret);
-+ _exit(TEST_ERR);
-+ }
-+
-+ /* restore the original SIGCHLD handler in the child */
-+ ret = sigaction(SIGCHLD, &oldsa, NULL);
-+ if (ret < 0) {
-+ ret = -errno;
-+ printf("sigaction() failed: %d (%m)\n", ret);
-+ _exit(TEST_ERR);
-+ }
-+
-+ ret = eventfd_read(efd, &event_status);
-+ if (ret < 0 || event_status != 1) {
-+ printf("error eventfd_read()\n");
-+ _exit(TEST_ERR);
-+ }
-+
-+ if (kdbus_args->mntns) {
-+ ret = test_prepare_mounts(kdbus_args);
-+ if (ret < 0) {
-+ printf("error preparing mounts\n");
-+ _exit(TEST_ERR);
-+ }
-+ }
-+
-+ ret = run_tests(kdbus_args);
-+ _exit(ret);
-+ }
-+
-+ /* Setup userns mapping */
-+ if (kdbus_args->userns) {
-+ ret = userns_map_uid_gid(pid, kdbus_args->uid_map,
-+ kdbus_args->gid_map);
-+ if (ret < 0) {
-+ printf("error mapping uid and gid in userns\n");
-+ eventfd_write(efd, 2);
-+ return TEST_ERR;
-+ }
-+ }
-+
-+ ret = eventfd_write(efd, 1);
-+ if (ret < 0) {
-+ ret = -errno;
-+ printf("error eventfd_write(): %d (%m)\n", ret);
-+ return TEST_ERR;
-+ }
-+
-+ rpid = waitpid(pid, &status, 0);
-+ ASSERT_RETURN_VAL(rpid == pid, TEST_ERR);
-+
-+ close(efd);
-+
-+ if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
-+ return TEST_ERR;
-+
-+ return TEST_OK;
-+}
-+
-+int start_tests(struct kdbus_test_args *kdbus_args)
-+{
-+ int ret;
-+ bool namespaces;
-+ static char fspath[4096];
-+
-+ namespaces = (kdbus_args->mntns || kdbus_args->pidns ||
-+ kdbus_args->userns);
-+
-+ /* for pidns we need mntns set */
-+ if (kdbus_args->pidns && !kdbus_args->mntns) {
-+ printf("Failed: please set both pid and mnt namespaces\n");
-+ return TEST_ERR;
-+ }
-+
-+ if (kdbus_args->userns) {
-+ if (!config_user_ns_is_enabled()) {
-+ printf("User namespace not supported\n");
-+ return TEST_ERR;
-+ }
-+
-+ if (!kdbus_args->uid_map || !kdbus_args->gid_map) {
-+ printf("Failed: please specify both uid and gid mappings\n");
-+ return TEST_ERR;
-+ }
-+ }
-+
-+ print_kdbus_test_args(kdbus_args);
-+ print_metadata_support();
-+
-+ /* setup kdbus paths */
-+ if (!kdbus_args->module)
-+ kdbus_args->module = "kdbus";
-+
-+ if (!kdbus_args->root) {
-+ snprintf(fspath, sizeof(fspath), "/sys/fs/%s",
-+ kdbus_args->module);
-+ kdbus_args->root = fspath;
-+ }
-+
-+ /* Start tests */
-+ if (namespaces)
-+ ret = run_tests_in_namespaces(kdbus_args);
-+ else
-+ ret = run_tests(kdbus_args);
-+
-+ return ret;
-+}
-+
-+int main(int argc, char *argv[])
-+{
-+ int t, ret = 0;
-+ struct kdbus_test_args *kdbus_args;
-+ enum {
-+ ARG_MNTNS = 0x100,
-+ ARG_PIDNS,
-+ ARG_USERNS,
-+ ARG_UIDMAP,
-+ ARG_GIDMAP,
-+ };
-+
-+ kdbus_args = malloc(sizeof(*kdbus_args));
-+ if (!kdbus_args) {
-+ printf("unable to malloc() kdbus_args\n");
-+ return EXIT_FAILURE;
-+ }
-+
-+ memset(kdbus_args, 0, sizeof(*kdbus_args));
-+
-+ static const struct option options[] = {
-+ { "loop", no_argument, NULL, 'x' },
-+ { "help", no_argument, NULL, 'h' },
-+ { "root", required_argument, NULL, 'r' },
-+ { "test", required_argument, NULL, 't' },
-+ { "bus", required_argument, NULL, 'b' },
-+ { "wait", required_argument, NULL, 'w' },
-+ { "fork", no_argument, NULL, 'f' },
-+ { "module", required_argument, NULL, 'm' },
-+ { "tap", no_argument, NULL, 'a' },
-+ { "mntns", no_argument, NULL, ARG_MNTNS },
-+ { "pidns", no_argument, NULL, ARG_PIDNS },
-+ { "userns", no_argument, NULL, ARG_USERNS },
-+ { "uidmap", required_argument, NULL, ARG_UIDMAP },
-+ { "gidmap", required_argument, NULL, ARG_GIDMAP },
-+ {}
-+ };
-+
-+ srand(time(NULL));
-+
-+ while ((t = getopt_long(argc, argv, "hxfm:r:t:b:w:a", options, NULL)) >= 0) {
-+ switch (t) {
-+ case 'x':
-+ kdbus_args->loop = 1;
-+ break;
-+
-+ case 'm':
-+ kdbus_args->module = optarg;
-+ break;
-+
-+ case 'r':
-+ kdbus_args->root = optarg;
-+ break;
-+
-+ case 't':
-+ kdbus_args->test = optarg;
-+ break;
-+
-+ case 'b':
-+ kdbus_args->busname = optarg;
-+ break;
-+
-+ case 'w':
-+ kdbus_args->wait = strtol(optarg, NULL, 10);
-+ break;
-+
-+ case 'f':
-+ kdbus_args->fork = 1;
-+ break;
-+
-+ case 'a':
-+ kdbus_args->tap_output = 1;
-+ break;
-+
-+ case ARG_MNTNS:
-+ kdbus_args->mntns = true;
-+ break;
-+
-+ case ARG_PIDNS:
-+ kdbus_args->pidns = true;
-+ break;
-+
-+ case ARG_USERNS:
-+ kdbus_args->userns = true;
-+ break;
-+
-+ case ARG_UIDMAP:
-+ kdbus_args->uid_map = optarg;
-+ break;
-+
-+ case ARG_GIDMAP:
-+ kdbus_args->gid_map = optarg;
-+ break;
-+
-+ default:
-+ case 'h':
-+ usage(argv[0]);
-+ }
-+ }
-+
-+ ret = start_tests(kdbus_args);
-+ if (ret == TEST_ERR)
-+ return EXIT_FAILURE;
-+
-+ free(kdbus_args);
-+
-+ return 0;
-+}
-diff --git a/tools/testing/selftests/kdbus/kdbus-test.h b/tools/testing/selftests/kdbus/kdbus-test.h
-new file mode 100644
-index 0000000..ee937f9
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/kdbus-test.h
-@@ -0,0 +1,84 @@
-+#ifndef _TEST_KDBUS_H_
-+#define _TEST_KDBUS_H_
-+
-+struct kdbus_test_env {
-+ char *buspath;
-+ const char *root;
-+ const char *module;
-+ int control_fd;
-+ struct kdbus_conn *conn;
-+};
-+
-+enum {
-+ TEST_OK,
-+ TEST_SKIP,
-+ TEST_ERR,
-+};
-+
-+#define ASSERT_RETURN_VAL(cond, val) \
-+ if (!(cond)) { \
-+ fprintf(stderr, "Assertion '%s' failed in %s(), %s:%d\n", \
-+ #cond, __func__, __FILE__, __LINE__); \
-+ return val; \
-+ }
-+
-+#define ASSERT_EXIT_VAL(cond, val) \
-+ if (!(cond)) { \
-+ fprintf(stderr, "Assertion '%s' failed in %s(), %s:%d\n", \
-+ #cond, __func__, __FILE__, __LINE__); \
-+ _exit(val); \
-+ }
-+
-+#define ASSERT_BREAK(cond) \
-+ if (!(cond)) { \
-+ fprintf(stderr, "Assertion '%s' failed in %s(), %s:%d\n", \
-+ #cond, __func__, __FILE__, __LINE__); \
-+ break; \
-+ }
-+
-+#define ASSERT_RETURN(cond) \
-+ ASSERT_RETURN_VAL(cond, TEST_ERR)
-+
-+#define ASSERT_EXIT(cond) \
-+ ASSERT_EXIT_VAL(cond, EXIT_FAILURE)
-+
-+int kdbus_test_activator(struct kdbus_test_env *env);
-+int kdbus_test_benchmark(struct kdbus_test_env *env);
-+int kdbus_test_benchmark_nomemfds(struct kdbus_test_env *env);
-+int kdbus_test_benchmark_uds(struct kdbus_test_env *env);
-+int kdbus_test_bus_make(struct kdbus_test_env *env);
-+int kdbus_test_byebye(struct kdbus_test_env *env);
-+int kdbus_test_chat(struct kdbus_test_env *env);
-+int kdbus_test_conn_info(struct kdbus_test_env *env);
-+int kdbus_test_conn_update(struct kdbus_test_env *env);
-+int kdbus_test_daemon(struct kdbus_test_env *env);
-+int kdbus_test_custom_endpoint(struct kdbus_test_env *env);
-+int kdbus_test_fd_passing(struct kdbus_test_env *env);
-+int kdbus_test_free(struct kdbus_test_env *env);
-+int kdbus_test_hello(struct kdbus_test_env *env);
-+int kdbus_test_match_bloom(struct kdbus_test_env *env);
-+int kdbus_test_match_id_add(struct kdbus_test_env *env);
-+int kdbus_test_match_id_remove(struct kdbus_test_env *env);
-+int kdbus_test_match_replace(struct kdbus_test_env *env);
-+int kdbus_test_match_name_add(struct kdbus_test_env *env);
-+int kdbus_test_match_name_change(struct kdbus_test_env *env);
-+int kdbus_test_match_name_remove(struct kdbus_test_env *env);
-+int kdbus_test_message_basic(struct kdbus_test_env *env);
-+int kdbus_test_message_prio(struct kdbus_test_env *env);
-+int kdbus_test_message_quota(struct kdbus_test_env *env);
-+int kdbus_test_memory_access(struct kdbus_test_env *env);
-+int kdbus_test_metadata_ns(struct kdbus_test_env *env);
-+int kdbus_test_monitor(struct kdbus_test_env *env);
-+int kdbus_test_name_basic(struct kdbus_test_env *env);
-+int kdbus_test_name_conflict(struct kdbus_test_env *env);
-+int kdbus_test_name_queue(struct kdbus_test_env *env);
-+int kdbus_test_name_takeover(struct kdbus_test_env *env);
-+int kdbus_test_policy(struct kdbus_test_env *env);
-+int kdbus_test_policy_ns(struct kdbus_test_env *env);
-+int kdbus_test_policy_priv(struct kdbus_test_env *env);
-+int kdbus_test_sync_byebye(struct kdbus_test_env *env);
-+int kdbus_test_sync_reply(struct kdbus_test_env *env);
-+int kdbus_test_timeout(struct kdbus_test_env *env);
-+int kdbus_test_writable_pool(struct kdbus_test_env *env);
-+
-+#endif /* _TEST_KDBUS_H_ */
-diff --git a/tools/testing/selftests/kdbus/kdbus-util.c b/tools/testing/selftests/kdbus/kdbus-util.c
-new file mode 100644
-index 0000000..82fa89b
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/kdbus-util.c
-@@ -0,0 +1,1612 @@
-+/*
-+ * Copyright (C) 2013-2015 Daniel Mack
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2014-2015 Djalal Harouni
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <stdio.h>
-+#include <stdarg.h>
-+#include <string.h>
-+#include <time.h>
-+#include <inttypes.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <stdbool.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <poll.h>
-+#include <grp.h>
-+#include <sys/capability.h>
-+#include <sys/mman.h>
-+#include <sys/stat.h>
-+#include <sys/time.h>
-+#include <linux/unistd.h>
-+#include <linux/memfd.h>
-+
-+#ifndef __NR_memfd_create
-+ #ifdef __x86_64__
-+ #define __NR_memfd_create 319
-+ #elif defined __arm__
-+ #define __NR_memfd_create 385
-+ #else
-+ #define __NR_memfd_create 356
-+ #endif
-+#endif
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+#ifndef F_ADD_SEALS
-+#define F_LINUX_SPECIFIC_BASE 1024
-+#define F_ADD_SEALS (F_LINUX_SPECIFIC_BASE + 9)
-+#define F_GET_SEALS (F_LINUX_SPECIFIC_BASE + 10)
-+
-+#define F_SEAL_SEAL 0x0001 /* prevent further seals from being set */
-+#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
-+#define F_SEAL_GROW 0x0004 /* prevent file from growing */
-+#define F_SEAL_WRITE 0x0008 /* prevent writes */
-+#endif
-+
-+int kdbus_util_verbose = true;
-+
-+int kdbus_sysfs_get_parameter_mask(const char *path, uint64_t *mask)
-+{
-+ int ret;
-+ FILE *file;
-+ unsigned long long value;
-+
-+ file = fopen(path, "r");
-+ if (!file) {
-+ ret = -errno;
-+ kdbus_printf("--- error fopen(): %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ ret = fscanf(file, "%llu", &value);
-+ if (ret != 1) {
-+ if (ferror(file))
-+ ret = -errno;
-+ else
-+ ret = -EIO;
-+
-+ kdbus_printf("--- error fscanf(): %d\n", ret);
-+ fclose(file);
-+ return ret;
-+ }
-+
-+ *mask = (uint64_t)value;
-+
-+ fclose(file);
-+
-+ return 0;
-+}
-+
-+int kdbus_sysfs_set_parameter_mask(const char *path, uint64_t mask)
-+{
-+ int ret;
-+ FILE *file;
-+
-+ file = fopen(path, "w");
-+ if (!file) {
-+ ret = -errno;
-+ kdbus_printf("--- error open(): %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ ret = fprintf(file, "%llu", (unsigned long long)mask);
-+ if (ret <= 0) {
-+ ret = -EIO;
-+ kdbus_printf("--- error fprintf(): %d\n", ret);
-+ }
-+
-+ fclose(file);
-+
-+ return ret > 0 ? 0 : ret;
-+}
-+
-+int kdbus_create_bus(int control_fd, const char *name,
-+ uint64_t owner_meta, char **path)
-+{
-+ struct {
-+ struct kdbus_cmd cmd;
-+
-+ /* bloom size item */
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ struct kdbus_bloom_parameter bloom;
-+ } bp;
-+
-+ /* owner metadata items */
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ uint64_t flags;
-+ } attach;
-+
-+ /* name item */
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ char str[64];
-+ } name;
-+ } bus_make;
-+ int ret;
-+
-+ memset(&bus_make, 0, sizeof(bus_make));
-+ bus_make.bp.size = sizeof(bus_make.bp);
-+ bus_make.bp.type = KDBUS_ITEM_BLOOM_PARAMETER;
-+ bus_make.bp.bloom.size = 64;
-+ bus_make.bp.bloom.n_hash = 1;
-+
-+ snprintf(bus_make.name.str, sizeof(bus_make.name.str),
-+ "%u-%s", getuid(), name);
-+
-+ bus_make.attach.type = KDBUS_ITEM_ATTACH_FLAGS_SEND;
-+ bus_make.attach.size = sizeof(bus_make.attach);
-+ bus_make.attach.flags = owner_meta;
-+
-+ bus_make.name.type = KDBUS_ITEM_MAKE_NAME;
-+ bus_make.name.size = KDBUS_ITEM_HEADER_SIZE +
-+ strlen(bus_make.name.str) + 1;
-+
-+ bus_make.cmd.flags = KDBUS_MAKE_ACCESS_WORLD;
-+ bus_make.cmd.size = sizeof(bus_make.cmd) +
-+ bus_make.bp.size +
-+ bus_make.attach.size +
-+ bus_make.name.size;
-+
-+ kdbus_printf("Creating bus with name >%s< on control fd %d ...\n",
-+ name, control_fd);
-+
-+ ret = kdbus_cmd_bus_make(control_fd, &bus_make.cmd);
-+ if (ret < 0) {
-+ kdbus_printf("--- error when making bus: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ if (ret == 0 && path)
-+ *path = strdup(bus_make.name.str);
-+
-+ return ret;
-+}
-+
-+struct kdbus_conn *
-+kdbus_hello(const char *path, uint64_t flags,
-+ const struct kdbus_item *item, size_t item_size)
-+{
-+ struct kdbus_cmd_free cmd_free = {};
-+ int fd, ret;
-+ struct {
-+ struct kdbus_cmd_hello hello;
-+
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ char str[16];
-+ } conn_name;
-+
-+ uint8_t extra_items[item_size];
-+ } h;
-+ struct kdbus_conn *conn;
-+
-+ memset(&h, 0, sizeof(h));
-+
-+ if (item_size > 0)
-+ memcpy(h.extra_items, item, item_size);
-+
-+ kdbus_printf("-- opening bus connection %s\n", path);
-+ fd = open(path, O_RDWR|O_CLOEXEC);
-+ if (fd < 0) {
-+ kdbus_printf("--- error %d (%m)\n", fd);
-+ return NULL;
-+ }
-+
-+ h.hello.flags = flags | KDBUS_HELLO_ACCEPT_FD;
-+ h.hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+ h.hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
-+ h.conn_name.type = KDBUS_ITEM_CONN_DESCRIPTION;
-+ strcpy(h.conn_name.str, "this-is-my-name");
-+ h.conn_name.size = KDBUS_ITEM_HEADER_SIZE + strlen(h.conn_name.str) + 1;
-+
-+ h.hello.size = sizeof(h);
-+ h.hello.pool_size = POOL_SIZE;
-+
-+ ret = kdbus_cmd_hello(fd, (struct kdbus_cmd_hello *) &h.hello);
-+ if (ret < 0) {
-+ kdbus_printf("--- error when saying hello: %d (%m)\n", ret);
-+ return NULL;
-+ }
-+ kdbus_printf("-- Our peer ID for %s: %llu -- bus uuid: '%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x%02x'\n",
-+ path, (unsigned long long)h.hello.id,
-+ h.hello.id128[0], h.hello.id128[1], h.hello.id128[2],
-+ h.hello.id128[3], h.hello.id128[4], h.hello.id128[5],
-+ h.hello.id128[6], h.hello.id128[7], h.hello.id128[8],
-+ h.hello.id128[9], h.hello.id128[10], h.hello.id128[11],
-+ h.hello.id128[12], h.hello.id128[13], h.hello.id128[14],
-+ h.hello.id128[15]);
-+
-+ cmd_free.size = sizeof(cmd_free);
-+ cmd_free.offset = h.hello.offset;
-+ kdbus_cmd_free(fd, &cmd_free);
-+
-+ conn = malloc(sizeof(*conn));
-+ if (!conn) {
-+ kdbus_printf("unable to malloc()!?\n");
-+ return NULL;
-+ }
-+
-+ conn->buf = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, fd, 0);
-+ if (conn->buf == MAP_FAILED) {
-+ free(conn);
-+ close(fd);
-+ kdbus_printf("--- error mmap (%m)\n");
-+ return NULL;
-+ }
-+
-+ conn->fd = fd;
-+ conn->id = h.hello.id;
-+ return conn;
-+}
-+
-+struct kdbus_conn *
-+kdbus_hello_registrar(const char *path, const char *name,
-+ const struct kdbus_policy_access *access,
-+ size_t num_access, uint64_t flags)
-+{
-+ struct kdbus_item *item, *items;
-+ size_t i, size;
-+
-+ size = KDBUS_ITEM_SIZE(strlen(name) + 1) +
-+ num_access * KDBUS_ITEM_SIZE(sizeof(*access));
-+
-+ items = alloca(size);
-+
-+ item = items;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
-+ item->type = KDBUS_ITEM_NAME;
-+ strcpy(item->str, name);
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ for (i = 0; i < num_access; i++) {
-+ item->size = KDBUS_ITEM_HEADER_SIZE +
-+ sizeof(struct kdbus_policy_access);
-+ item->type = KDBUS_ITEM_POLICY_ACCESS;
-+
-+ item->policy_access.type = access[i].type;
-+ item->policy_access.access = access[i].access;
-+ item->policy_access.id = access[i].id;
-+
-+ item = KDBUS_ITEM_NEXT(item);
-+ }
-+
-+ return kdbus_hello(path, flags, items, size);
-+}
-+
-+struct kdbus_conn *kdbus_hello_activator(const char *path, const char *name,
-+ const struct kdbus_policy_access *access,
-+ size_t num_access)
-+{
-+ return kdbus_hello_registrar(path, name, access, num_access,
-+ KDBUS_HELLO_ACTIVATOR);
-+}
-+
-+bool kdbus_item_in_message(struct kdbus_msg *msg, uint64_t type)
-+{
-+ const struct kdbus_item *item;
-+
-+ KDBUS_ITEM_FOREACH(item, msg, items)
-+ if (item->type == type)
-+ return true;
-+
-+ return false;
-+}
-+
-+int kdbus_bus_creator_info(struct kdbus_conn *conn,
-+ uint64_t flags,
-+ uint64_t *offset)
-+{
-+ struct kdbus_cmd_info *cmd;
-+ size_t size = sizeof(*cmd);
-+ int ret;
-+
-+ cmd = alloca(size);
-+ memset(cmd, 0, size);
-+ cmd->size = size;
-+ cmd->attach_flags = flags;
-+
-+ ret = kdbus_cmd_bus_creator_info(conn->fd, cmd);
-+ if (ret < 0) {
-+ kdbus_printf("--- error when requesting info: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ if (offset)
-+ *offset = cmd->offset;
-+ else
-+ kdbus_free(conn, cmd->offset);
-+
-+ return 0;
-+}
-+
-+int kdbus_conn_info(struct kdbus_conn *conn, uint64_t id,
-+ const char *name, uint64_t flags,
-+ uint64_t *offset)
-+{
-+ struct kdbus_cmd_info *cmd;
-+ size_t size = sizeof(*cmd);
-+ struct kdbus_info *info;
-+ int ret;
-+
-+ if (name)
-+ size += KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
-+
-+ cmd = alloca(size);
-+ memset(cmd, 0, size);
-+ cmd->size = size;
-+ cmd->attach_flags = flags;
-+
-+ if (name) {
-+ cmd->items[0].size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
-+ cmd->items[0].type = KDBUS_ITEM_NAME;
-+ strcpy(cmd->items[0].str, name);
-+ } else {
-+ cmd->id = id;
-+ }
-+
-+ ret = kdbus_cmd_conn_info(conn->fd, cmd);
-+ if (ret < 0) {
-+ kdbus_printf("--- error when requesting info: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ info = (struct kdbus_info *) (conn->buf + cmd->offset);
-+ if (info->size != cmd->info_size) {
-+ kdbus_printf("%s(): size mismatch: %d != %d\n", __func__,
-+ (int) info->size, (int) cmd->info_size);
-+ return -EIO;
-+ }
-+
-+ if (offset)
-+ *offset = cmd->offset;
-+ else
-+ kdbus_free(conn, cmd->offset);
-+
-+ return 0;
-+}
-+
-+void kdbus_conn_free(struct kdbus_conn *conn)
-+{
-+ if (!conn)
-+ return;
-+
-+ if (conn->buf)
-+ munmap(conn->buf, POOL_SIZE);
-+
-+ if (conn->fd >= 0)
-+ close(conn->fd);
-+
-+ free(conn);
-+}
-+
-+int sys_memfd_create(const char *name, __u64 size)
-+{
-+ int ret, fd;
-+
-+ fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
-+ if (fd < 0)
-+ return fd;
-+
-+ ret = ftruncate(fd, size);
-+ if (ret < 0) {
-+ close(fd);
-+ return ret;
-+ }
-+
-+ return fd;
-+}
-+
-+int sys_memfd_seal_set(int fd)
-+{
-+ return fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK |
-+ F_SEAL_GROW | F_SEAL_WRITE | F_SEAL_SEAL);
-+}
-+
-+off_t sys_memfd_get_size(int fd, off_t *size)
-+{
-+ struct stat stat;
-+ int ret;
-+
-+ ret = fstat(fd, &stat);
-+ if (ret < 0) {
-+ kdbus_printf("stat() failed: %m\n");
-+ return ret;
-+ }
-+
-+ *size = stat.st_size;
-+ return 0;
-+}
-+
-+static int __kdbus_msg_send(const struct kdbus_conn *conn,
-+ const char *name,
-+ uint64_t cookie,
-+ uint64_t flags,
-+ uint64_t timeout,
-+ int64_t priority,
-+ uint64_t dst_id,
-+ uint64_t cmd_flags,
-+ int cancel_fd)
-+{
-+ struct kdbus_cmd_send *cmd = NULL;
-+ struct kdbus_msg *msg = NULL;
-+ const char ref1[1024 * 128 + 3] = "0123456789_0";
-+ const char ref2[] = "0123456789_1";
-+ struct kdbus_item *item;
-+ struct timespec now;
-+ uint64_t size;
-+ int memfd = -1;
-+ int ret;
-+
-+ size = sizeof(*msg) + 3 * KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+ if (dst_id == KDBUS_DST_ID_BROADCAST)
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
-+ else {
-+ memfd = sys_memfd_create("my-name-is-nice", 1024 * 1024);
-+ if (memfd < 0) {
-+ kdbus_printf("failed to create memfd: %m\n");
-+ return memfd;
-+ }
-+
-+ if (write(memfd, "kdbus memfd 1234567", 19) != 19) {
-+ ret = -errno;
-+ kdbus_printf("writing to memfd failed: %m\n");
-+ goto out;
-+ }
-+
-+ ret = sys_memfd_seal_set(memfd);
-+ if (ret < 0) {
-+ ret = -errno;
-+ kdbus_printf("memfd sealing failed: %m\n");
-+ goto out;
-+ }
-+
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
-+ }
-+
-+ if (name)
-+ size += KDBUS_ITEM_SIZE(strlen(name) + 1);
-+
-+ msg = malloc(size);
-+ if (!msg) {
-+ ret = -errno;
-+ kdbus_printf("unable to malloc()!?\n");
-+ goto out;
-+ }
-+
-+ if (dst_id == KDBUS_DST_ID_BROADCAST)
-+ flags |= KDBUS_MSG_SIGNAL;
-+
-+ memset(msg, 0, size);
-+ msg->flags = flags;
-+ msg->priority = priority;
-+ msg->size = size;
-+ msg->src_id = conn->id;
-+ msg->dst_id = name ? 0 : dst_id;
-+ msg->cookie = cookie;
-+ msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+ if (timeout) {
-+ ret = clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
-+ if (ret < 0)
-+ goto out;
-+
-+ msg->timeout_ns = now.tv_sec * 1000000000ULL +
-+ now.tv_nsec + timeout;
-+ }
-+
-+ item = msg->items;
-+
-+ if (name) {
-+ item->type = KDBUS_ITEM_DST_NAME;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
-+ strcpy(item->str, name);
-+ item = KDBUS_ITEM_NEXT(item);
-+ }
-+
-+ item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+ item->vec.address = (uintptr_t)&ref1;
-+ item->vec.size = sizeof(ref1);
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ /* data padding for ref1 */
-+ item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+ item->vec.address = (uintptr_t)NULL;
-+ item->vec.size = KDBUS_ALIGN8(sizeof(ref1)) - sizeof(ref1);
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+ item->vec.address = (uintptr_t)&ref2;
-+ item->vec.size = sizeof(ref2);
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ if (dst_id == KDBUS_DST_ID_BROADCAST) {
-+ item->type = KDBUS_ITEM_BLOOM_FILTER;
-+ item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
-+ item->bloom_filter.generation = 0;
-+ } else {
-+ item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_memfd);
-+ item->memfd.size = 16;
-+ item->memfd.fd = memfd;
-+ }
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ size = sizeof(*cmd);
-+ if (cancel_fd != -1)
-+ size += KDBUS_ITEM_SIZE(sizeof(cancel_fd));
-+
-+ cmd = malloc(size);
-+ if (!cmd) {
-+ ret = -errno;
-+ kdbus_printf("unable to malloc()!?\n");
-+ goto out;
-+ }
-+
-+ cmd->size = size;
-+ cmd->flags = cmd_flags;
-+ cmd->msg_address = (uintptr_t)msg;
-+
-+ item = cmd->items;
-+
-+ if (cancel_fd != -1) {
-+ item->type = KDBUS_ITEM_CANCEL_FD;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(cancel_fd);
-+ item->fds[0] = cancel_fd;
-+ item = KDBUS_ITEM_NEXT(item);
-+ }
-+
-+ ret = kdbus_cmd_send(conn->fd, cmd);
-+ if (ret < 0) {
-+ kdbus_printf("error sending message: %d (%m)\n", ret);
-+ goto out;
-+ }
-+
-+ if (cmd_flags & KDBUS_SEND_SYNC_REPLY) {
-+ struct kdbus_msg *reply;
-+
-+ kdbus_printf("SYNC REPLY @offset %llu:\n", cmd->reply.offset);
-+ reply = (struct kdbus_msg *)(conn->buf + cmd->reply.offset);
-+ kdbus_msg_dump(conn, reply);
-+
-+ kdbus_msg_free(reply);
-+
-+ ret = kdbus_free(conn, cmd->reply.offset);
-+ if (ret < 0)
-+ goto out;
-+ }
-+
-+out:
-+ free(msg);
-+ free(cmd);
-+
-+ if (memfd >= 0)
-+ close(memfd);
-+
-+ return ret < 0 ? ret : 0;
-+}
-+
-+int kdbus_msg_send(const struct kdbus_conn *conn, const char *name,
-+ uint64_t cookie, uint64_t flags, uint64_t timeout,
-+ int64_t priority, uint64_t dst_id)
-+{
-+ return __kdbus_msg_send(conn, name, cookie, flags, timeout, priority,
-+ dst_id, 0, -1);
-+}
-+
-+int kdbus_msg_send_sync(const struct kdbus_conn *conn, const char *name,
-+ uint64_t cookie, uint64_t flags, uint64_t timeout,
-+ int64_t priority, uint64_t dst_id, int cancel_fd)
-+{
-+ return __kdbus_msg_send(conn, name, cookie, flags, timeout, priority,
-+ dst_id, KDBUS_SEND_SYNC_REPLY, cancel_fd);
-+}
-+
-+int kdbus_msg_send_reply(const struct kdbus_conn *conn,
-+ uint64_t reply_cookie,
-+ uint64_t dst_id)
-+{
-+ struct kdbus_cmd_send cmd = {};
-+ struct kdbus_msg *msg;
-+ const char ref1[1024 * 128 + 3] = "0123456789_0";
-+ struct kdbus_item *item;
-+ uint64_t size;
-+ int ret;
-+
-+ size = sizeof(struct kdbus_msg);
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+ msg = malloc(size);
-+ if (!msg) {
-+ kdbus_printf("unable to malloc()!?\n");
-+ return -ENOMEM;
-+ }
-+
-+ memset(msg, 0, size);
-+ msg->size = size;
-+ msg->src_id = conn->id;
-+ msg->dst_id = dst_id;
-+ msg->cookie_reply = reply_cookie;
-+ msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+ item = msg->items;
-+
-+ item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+ item->vec.address = (uintptr_t)&ref1;
-+ item->vec.size = sizeof(ref1);
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)msg;
-+
-+ ret = kdbus_cmd_send(conn->fd, &cmd);
-+ if (ret < 0)
-+ kdbus_printf("error sending message: %d (%m)\n", ret);
-+
-+ free(msg);
-+
-+ return ret;
-+}
-+
-+static char *msg_id(uint64_t id, char *buf)
-+{
-+ if (id == 0)
-+ return "KERNEL";
-+ if (id == ~0ULL)
-+ return "BROADCAST";
-+ sprintf(buf, "%llu", (unsigned long long)id);
-+ return buf;
-+}
-+
-+int kdbus_msg_dump(const struct kdbus_conn *conn, const struct kdbus_msg *msg)
-+{
-+ const struct kdbus_item *item = msg->items;
-+ char buf_src[32];
-+ char buf_dst[32];
-+ uint64_t timeout = 0;
-+ uint64_t cookie_reply = 0;
-+ int ret = 0;
-+
-+ if (msg->flags & KDBUS_MSG_EXPECT_REPLY)
-+ timeout = msg->timeout_ns;
-+ else
-+ cookie_reply = msg->cookie_reply;
-+
-+ kdbus_printf("MESSAGE: %s (%llu bytes) flags=0x%08llx, %s → %s, "
-+ "cookie=%llu, timeout=%llu cookie_reply=%llu priority=%lli\n",
-+ enum_PAYLOAD(msg->payload_type), (unsigned long long)msg->size,
-+ (unsigned long long)msg->flags,
-+ msg_id(msg->src_id, buf_src), msg_id(msg->dst_id, buf_dst),
-+ (unsigned long long)msg->cookie, (unsigned long long)timeout,
-+ (unsigned long long)cookie_reply, (long long)msg->priority);
-+
-+ KDBUS_ITEM_FOREACH(item, msg, items) {
-+ if (item->size < KDBUS_ITEM_HEADER_SIZE) {
-+ kdbus_printf(" +%s (%llu bytes) invalid data record\n",
-+ enum_MSG(item->type), item->size);
-+ ret = -EINVAL;
-+ break;
-+ }
-+
-+ switch (item->type) {
-+ case KDBUS_ITEM_PAYLOAD_OFF: {
-+ char *s;
-+
-+ if (item->vec.offset == ~0ULL)
-+ s = "[\\0-bytes]";
-+ else
-+ s = (char *)msg + item->vec.offset;
-+
-+ kdbus_printf(" +%s (%llu bytes) off=%llu size=%llu '%s'\n",
-+ enum_MSG(item->type), item->size,
-+ (unsigned long long)item->vec.offset,
-+ (unsigned long long)item->vec.size, s);
-+ break;
-+ }
-+
-+ case KDBUS_ITEM_FDS: {
-+ int i, n = (item->size - KDBUS_ITEM_HEADER_SIZE) /
-+ sizeof(int);
-+
-+ kdbus_printf(" +%s (%llu bytes, %d fds)\n",
-+ enum_MSG(item->type), item->size, n);
-+
-+ for (i = 0; i < n; i++)
-+ kdbus_printf(" fd[%d] = %d\n",
-+ i, item->fds[i]);
-+
-+ break;
-+ }
-+
-+ case KDBUS_ITEM_PAYLOAD_MEMFD: {
-+ char *buf;
-+ off_t size;
-+
-+ buf = mmap(NULL, item->memfd.size, PROT_READ,
-+ MAP_PRIVATE, item->memfd.fd, 0);
-+ if (buf == MAP_FAILED) {
-+ kdbus_printf("mmap() fd=%i size=%llu failed: %m\n",
-+ item->memfd.fd, item->memfd.size);
-+ break;
-+ }
-+
-+ if (sys_memfd_get_size(item->memfd.fd, &size) < 0) {
-+ kdbus_printf("KDBUS_CMD_MEMFD_SIZE_GET failed: %m\n");
-+ break;
-+ }
-+
-+ kdbus_printf(" +%s (%llu bytes) fd=%i size=%llu filesize=%llu '%s'\n",
-+ enum_MSG(item->type), item->size, item->memfd.fd,
-+ (unsigned long long)item->memfd.size,
-+ (unsigned long long)size, buf);
-+ munmap(buf, item->memfd.size);
-+ break;
-+ }
-+
-+ case KDBUS_ITEM_CREDS:
-+ kdbus_printf(" +%s (%llu bytes) uid=%lld, euid=%lld, suid=%lld, fsuid=%lld, "
-+ "gid=%lld, egid=%lld, sgid=%lld, fsgid=%lld\n",
-+ enum_MSG(item->type), item->size,
-+ item->creds.uid, item->creds.euid,
-+ item->creds.suid, item->creds.fsuid,
-+ item->creds.gid, item->creds.egid,
-+ item->creds.sgid, item->creds.fsgid);
-+ break;
-+
-+ case KDBUS_ITEM_PIDS:
-+ kdbus_printf(" +%s (%llu bytes) pid=%lld, tid=%lld, ppid=%lld\n",
-+ enum_MSG(item->type), item->size,
-+ item->pids.pid, item->pids.tid,
-+ item->pids.ppid);
-+ break;
-+
-+ case KDBUS_ITEM_AUXGROUPS: {
-+ int i, n;
-+
-+ kdbus_printf(" +%s (%llu bytes)\n",
-+ enum_MSG(item->type), item->size);
-+ n = (item->size - KDBUS_ITEM_HEADER_SIZE) /
-+ sizeof(uint64_t);
-+
-+ for (i = 0; i < n; i++)
-+ kdbus_printf(" gid[%d] = %lld\n",
-+ i, item->data64[i]);
-+ break;
-+ }
-+
-+ case KDBUS_ITEM_NAME:
-+ case KDBUS_ITEM_PID_COMM:
-+ case KDBUS_ITEM_TID_COMM:
-+ case KDBUS_ITEM_EXE:
-+ case KDBUS_ITEM_CGROUP:
-+ case KDBUS_ITEM_SECLABEL:
-+ case KDBUS_ITEM_DST_NAME:
-+ case KDBUS_ITEM_CONN_DESCRIPTION:
-+ kdbus_printf(" +%s (%llu bytes) '%s' (%zu)\n",
-+ enum_MSG(item->type), item->size,
-+ item->str, strlen(item->str));
-+ break;
-+
-+ case KDBUS_ITEM_OWNED_NAME: {
-+ kdbus_printf(" +%s (%llu bytes) '%s' (%zu) flags=0x%08llx\n",
-+ enum_MSG(item->type), item->size,
-+ item->name.name, strlen(item->name.name),
-+ item->name.flags);
-+ break;
-+ }
-+
-+ case KDBUS_ITEM_CMDLINE: {
-+ size_t size = item->size - KDBUS_ITEM_HEADER_SIZE;
-+ const char *str = item->str;
-+ int count = 0;
-+
-+ kdbus_printf(" +%s (%llu bytes) ",
-+ enum_MSG(item->type), item->size);
-+ while (size) {
-+ kdbus_printf("'%s' ", str);
-+ size -= strlen(str) + 1;
-+ str += strlen(str) + 1;
-+ count++;
-+ }
-+
-+ kdbus_printf("(%d string%s)\n",
-+ count, (count == 1) ? "" : "s");
-+ break;
-+ }
-+
-+ case KDBUS_ITEM_AUDIT:
-+ kdbus_printf(" +%s (%llu bytes) loginuid=%u sessionid=%u\n",
-+ enum_MSG(item->type), item->size,
-+ item->audit.loginuid, item->audit.sessionid);
-+ break;
-+
-+ case KDBUS_ITEM_CAPS: {
-+ const uint32_t *cap;
-+ int n, i;
-+
-+ kdbus_printf(" +%s (%llu bytes) len=%llu bytes, last_cap %d\n",
-+ enum_MSG(item->type), item->size,
-+ (unsigned long long)item->size -
-+ KDBUS_ITEM_HEADER_SIZE,
-+ (int) item->caps.last_cap);
-+
-+ cap = item->caps.caps;
-+ n = (item->size - offsetof(struct kdbus_item, caps.caps))
-+ / 4 / sizeof(uint32_t);
-+
-+ kdbus_printf(" CapInh=");
-+ for (i = 0; i < n; i++)
-+ kdbus_printf("%08x", cap[(0 * n) + (n - i - 1)]);
-+
-+ kdbus_printf(" CapPrm=");
-+ for (i = 0; i < n; i++)
-+ kdbus_printf("%08x", cap[(1 * n) + (n - i - 1)]);
-+
-+ kdbus_printf(" CapEff=");
-+ for (i = 0; i < n; i++)
-+ kdbus_printf("%08x", cap[(2 * n) + (n - i - 1)]);
-+
-+ kdbus_printf(" CapBnd=");
-+ for (i = 0; i < n; i++)
-+ kdbus_printf("%08x", cap[(3 * n) + (n - i - 1)]);
-+ kdbus_printf("\n");
-+ break;
-+ }
-+
-+ case KDBUS_ITEM_TIMESTAMP:
-+ kdbus_printf(" +%s (%llu bytes) seq=%llu realtime=%lluns monotonic=%lluns\n",
-+ enum_MSG(item->type), item->size,
-+ (unsigned long long)item->timestamp.seqnum,
-+ (unsigned long long)item->timestamp.realtime_ns,
-+ (unsigned long long)item->timestamp.monotonic_ns);
-+ break;
-+
-+ case KDBUS_ITEM_REPLY_TIMEOUT:
-+ kdbus_printf(" +%s (%llu bytes) cookie=%llu\n",
-+ enum_MSG(item->type), item->size,
-+ msg->cookie_reply);
-+ break;
-+
-+ case KDBUS_ITEM_NAME_ADD:
-+ case KDBUS_ITEM_NAME_REMOVE:
-+ case KDBUS_ITEM_NAME_CHANGE:
-+ kdbus_printf(" +%s (%llu bytes) '%s', old id=%lld, now id=%lld, old_flags=0x%llx new_flags=0x%llx\n",
-+ enum_MSG(item->type),
-+ (unsigned long long) item->size,
-+ item->name_change.name,
-+ item->name_change.old_id.id,
-+ item->name_change.new_id.id,
-+ item->name_change.old_id.flags,
-+ item->name_change.new_id.flags);
-+ break;
-+
-+ case KDBUS_ITEM_ID_ADD:
-+ case KDBUS_ITEM_ID_REMOVE:
-+ kdbus_printf(" +%s (%llu bytes) id=%llu flags=%llu\n",
-+ enum_MSG(item->type),
-+ (unsigned long long) item->size,
-+ (unsigned long long) item->id_change.id,
-+ (unsigned long long) item->id_change.flags);
-+ break;
-+
-+ default:
-+ kdbus_printf(" +%s (%llu bytes)\n",
-+ enum_MSG(item->type), item->size);
-+ break;
-+ }
-+ }
-+
-+ if ((char *)item - ((char *)msg + msg->size) >= 8) {
-+ kdbus_printf("invalid padding at end of message\n");
-+ ret = -EINVAL;
-+ }
-+
-+ kdbus_printf("\n");
-+
-+ return ret;
-+}
-+
-+void kdbus_msg_free(struct kdbus_msg *msg)
-+{
-+ const struct kdbus_item *item;
-+ int nfds, i;
-+
-+ if (!msg)
-+ return;
-+
-+ KDBUS_ITEM_FOREACH(item, msg, items) {
-+ switch (item->type) {
-+ /* close all memfds */
-+ case KDBUS_ITEM_PAYLOAD_MEMFD:
-+ close(item->memfd.fd);
-+ break;
-+ case KDBUS_ITEM_FDS:
-+ nfds = (item->size - KDBUS_ITEM_HEADER_SIZE) /
-+ sizeof(int);
-+
-+ for (i = 0; i < nfds; i++)
-+ close(item->fds[i]);
-+
-+ break;
-+ }
-+ }
-+}
-+
-+int kdbus_msg_recv(struct kdbus_conn *conn,
-+ struct kdbus_msg **msg_out,
-+ uint64_t *offset)
-+{
-+ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
-+ struct kdbus_msg *msg;
-+ int ret;
-+
-+ ret = kdbus_cmd_recv(conn->fd, &recv);
-+ if (ret < 0)
-+ return ret;
-+
-+ msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
-+ ret = kdbus_msg_dump(conn, msg);
-+ if (ret < 0) {
-+ kdbus_msg_free(msg);
-+ return ret;
-+ }
-+
-+ if (msg_out) {
-+ *msg_out = msg;
-+
-+ if (offset)
-+ *offset = recv.msg.offset;
-+ } else {
-+ kdbus_msg_free(msg);
-+
-+ ret = kdbus_free(conn, recv.msg.offset);
-+ if (ret < 0)
-+ return ret;
-+ }
-+
-+ return 0;
-+}
-+
-+/*
-+ * Returns: 0 on success, negative errno on failure.
-+ *
-+ * We must return -ETIMEDOUT, -ECONNRESET, -EAGAIN and other errors.
-+ * We must return the result of kdbus_msg_recv()
-+ */
-+int kdbus_msg_recv_poll(struct kdbus_conn *conn,
-+ int timeout_ms,
-+ struct kdbus_msg **msg_out,
-+ uint64_t *offset)
-+{
-+ int ret;
-+
-+ do {
-+ struct timeval before, after, diff;
-+ struct pollfd fd;
-+
-+ fd.fd = conn->fd;
-+ fd.events = POLLIN | POLLPRI | POLLHUP;
-+ fd.revents = 0;
-+
-+ gettimeofday(&before, NULL);
-+ ret = poll(&fd, 1, timeout_ms);
-+ gettimeofday(&after, NULL);
-+
-+ if (ret == 0) {
-+ ret = -ETIMEDOUT;
-+ break;
-+ }
-+
-+ if (ret > 0) {
-+ if (fd.revents & POLLIN)
-+ ret = kdbus_msg_recv(conn, msg_out, offset);
-+
-+ if (fd.revents & (POLLHUP | POLLERR))
-+ ret = -ECONNRESET;
-+ }
-+
-+ if (ret == 0 || ret != -EAGAIN)
-+ break;
-+
-+ timersub(&after, &before, &diff);
-+ timeout_ms -= diff.tv_sec * 1000UL +
-+ diff.tv_usec / 1000UL;
-+ } while (timeout_ms > 0);
-+
-+ return ret;
-+}
-+
-+int kdbus_free(const struct kdbus_conn *conn, uint64_t offset)
-+{
-+ struct kdbus_cmd_free cmd_free = {};
-+ int ret;
-+
-+ cmd_free.size = sizeof(cmd_free);
-+ cmd_free.offset = offset;
-+ cmd_free.flags = 0;
-+
-+ ret = kdbus_cmd_free(conn->fd, &cmd_free);
-+ if (ret < 0) {
-+ kdbus_printf("KDBUS_CMD_FREE failed: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ return 0;
-+}
-+
-+int kdbus_name_acquire(struct kdbus_conn *conn,
-+ const char *name, uint64_t *flags)
-+{
-+ struct kdbus_cmd *cmd_name;
-+ size_t name_len = strlen(name) + 1;
-+ uint64_t size = sizeof(*cmd_name) + KDBUS_ITEM_SIZE(name_len);
-+ struct kdbus_item *item;
-+ int ret;
-+
-+ cmd_name = alloca(size);
-+
-+ memset(cmd_name, 0, size);
-+
-+ item = cmd_name->items;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + name_len;
-+ item->type = KDBUS_ITEM_NAME;
-+ strcpy(item->str, name);
-+
-+ cmd_name->size = size;
-+ if (flags)
-+ cmd_name->flags = *flags;
-+
-+ ret = kdbus_cmd_name_acquire(conn->fd, cmd_name);
-+ if (ret < 0) {
-+ kdbus_printf("error acquiring name: %s\n", strerror(-ret));
-+ return ret;
-+ }
-+
-+ kdbus_printf("%s(): flags after call: 0x%llx\n", __func__,
-+ cmd_name->return_flags);
-+
-+ if (flags)
-+ *flags = cmd_name->return_flags;
-+
-+ return 0;
-+}
-+
-+int kdbus_name_release(struct kdbus_conn *conn, const char *name)
-+{
-+ struct kdbus_cmd *cmd_name;
-+ size_t name_len = strlen(name) + 1;
-+ uint64_t size = sizeof(*cmd_name) + KDBUS_ITEM_SIZE(name_len);
-+ struct kdbus_item *item;
-+ int ret;
-+
-+ cmd_name = alloca(size);
-+
-+ memset(cmd_name, 0, size);
-+
-+ item = cmd_name->items;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + name_len;
-+ item->type = KDBUS_ITEM_NAME;
-+ strcpy(item->str, name);
-+
-+ cmd_name->size = size;
-+
-+ kdbus_printf("conn %lld giving up name '%s'\n",
-+ (unsigned long long) conn->id, name);
-+
-+ ret = kdbus_cmd_name_release(conn->fd, cmd_name);
-+ if (ret < 0) {
-+ kdbus_printf("error releasing name: %s\n", strerror(-ret));
-+ return ret;
-+ }
-+
-+ return 0;
-+}
-+
-+int kdbus_list(struct kdbus_conn *conn, uint64_t flags)
-+{
-+ struct kdbus_cmd_list cmd_list = {};
-+ struct kdbus_info *list, *name;
-+ int ret;
-+
-+ cmd_list.size = sizeof(cmd_list);
-+ cmd_list.flags = flags;
-+
-+ ret = kdbus_cmd_list(conn->fd, &cmd_list);
-+ if (ret < 0) {
-+ kdbus_printf("error listing names: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ kdbus_printf("REGISTRY:\n");
-+ list = (struct kdbus_info *)(conn->buf + cmd_list.offset);
-+
-+ KDBUS_FOREACH(name, list, cmd_list.list_size) {
-+ uint64_t flags = 0;
-+ struct kdbus_item *item;
-+ const char *n = "MISSING-NAME";
-+
-+ if (name->size == sizeof(struct kdbus_cmd))
-+ continue;
-+
-+ KDBUS_ITEM_FOREACH(item, name, items)
-+ if (item->type == KDBUS_ITEM_OWNED_NAME) {
-+ n = item->name.name;
-+ flags = item->name.flags;
-+
-+ kdbus_printf("%8llu flags=0x%08llx conn=0x%08llx '%s'\n",
-+ name->id,
-+ (unsigned long long) flags,
-+ name->flags, n);
-+ }
-+ }
-+ kdbus_printf("\n");
-+
-+ ret = kdbus_free(conn, cmd_list.offset);
-+
-+ return ret;
-+}
-+
-+int kdbus_conn_update_attach_flags(struct kdbus_conn *conn,
-+ uint64_t attach_flags_send,
-+ uint64_t attach_flags_recv)
-+{
-+ int ret;
-+ size_t size;
-+ struct kdbus_cmd *update;
-+ struct kdbus_item *item;
-+
-+ size = sizeof(struct kdbus_cmd);
-+ size += KDBUS_ITEM_SIZE(sizeof(uint64_t)) * 2;
-+
-+ update = malloc(size);
-+ if (!update) {
-+ kdbus_printf("error malloc: %m\n");
-+ return -ENOMEM;
-+ }
-+
-+ memset(update, 0, size);
-+ update->size = size;
-+
-+ item = update->items;
-+
-+ item->type = KDBUS_ITEM_ATTACH_FLAGS_SEND;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(uint64_t);
-+ item->data64[0] = attach_flags_send;
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ item->type = KDBUS_ITEM_ATTACH_FLAGS_RECV;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(uint64_t);
-+ item->data64[0] = attach_flags_recv;
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ ret = kdbus_cmd_update(conn->fd, update);
-+ if (ret < 0)
-+ kdbus_printf("error conn update: %d (%m)\n", ret);
-+
-+ free(update);
-+
-+ return ret;
-+}
-+
-+int kdbus_conn_update_policy(struct kdbus_conn *conn, const char *name,
-+ const struct kdbus_policy_access *access,
-+ size_t num_access)
-+{
-+ struct kdbus_cmd *update;
-+ struct kdbus_item *item;
-+ size_t i, size;
-+ int ret;
-+
-+ size = sizeof(struct kdbus_cmd);
-+ size += KDBUS_ITEM_SIZE(strlen(name) + 1);
-+ size += num_access * KDBUS_ITEM_SIZE(sizeof(struct kdbus_policy_access));
-+
-+ update = malloc(size);
-+ if (!update) {
-+ kdbus_printf("error malloc: %m\n");
-+ return -ENOMEM;
-+ }
-+
-+ memset(update, 0, size);
-+ update->size = size;
-+
-+ item = update->items;
-+
-+ item->type = KDBUS_ITEM_NAME;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + strlen(name) + 1;
-+ strcpy(item->str, name);
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ for (i = 0; i < num_access; i++) {
-+ item->size = KDBUS_ITEM_HEADER_SIZE +
-+ sizeof(struct kdbus_policy_access);
-+ item->type = KDBUS_ITEM_POLICY_ACCESS;
-+
-+ item->policy_access.type = access[i].type;
-+ item->policy_access.access = access[i].access;
-+ item->policy_access.id = access[i].id;
-+
-+ item = KDBUS_ITEM_NEXT(item);
-+ }
-+
-+ ret = kdbus_cmd_update(conn->fd, update);
-+ if (ret < 0)
-+ kdbus_printf("error conn update: %d (%m)\n", ret);
-+
-+ free(update);
-+
-+ return ret;
-+}
-+
-+int kdbus_add_match_id(struct kdbus_conn *conn, uint64_t cookie,
-+ uint64_t type, uint64_t id)
-+{
-+ struct {
-+ struct kdbus_cmd_match cmd;
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ struct kdbus_notify_id_change chg;
-+ } item;
-+ } buf;
-+ int ret;
-+
-+ memset(&buf, 0, sizeof(buf));
-+
-+ buf.cmd.size = sizeof(buf);
-+ buf.cmd.cookie = cookie;
-+ buf.item.size = sizeof(buf.item);
-+ buf.item.type = type;
-+ buf.item.chg.id = id;
-+
-+ ret = kdbus_cmd_match_add(conn->fd, &buf.cmd);
-+ if (ret < 0)
-+ kdbus_printf("--- error adding conn match: %d (%m)\n", ret);
-+
-+ return ret;
-+}
-+
-+int kdbus_add_match_empty(struct kdbus_conn *conn)
-+{
-+ struct {
-+ struct kdbus_cmd_match cmd;
-+ struct kdbus_item item;
-+ } buf;
-+ int ret;
-+
-+ memset(&buf, 0, sizeof(buf));
-+
-+ buf.item.size = sizeof(uint64_t) * 3;
-+ buf.item.type = KDBUS_ITEM_ID;
-+ buf.item.id = KDBUS_MATCH_ID_ANY;
-+
-+ buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
-+
-+ ret = kdbus_cmd_match_add(conn->fd, &buf.cmd);
-+ if (ret < 0)
-+ kdbus_printf("--- error adding conn match: %d (%m)\n", ret);
-+
-+ return ret;
-+}
-+
-+static int all_ids_are_mapped(const char *path)
-+{
-+ int ret;
-+ FILE *file;
-+ uint32_t inside_id, length;
-+
-+ file = fopen(path, "r");
-+ if (!file) {
-+ ret = -errno;
-+ kdbus_printf("error fopen() %s: %d (%m)\n",
-+ path, ret);
-+ return ret;
-+ }
-+
-+ ret = fscanf(file, "%u\t%*u\t%u", &inside_id, &length);
-+ if (ret != 2) {
-+ if (ferror(file))
-+ ret = -errno;
-+ else
-+ ret = -EIO;
-+
-+ kdbus_printf("--- error fscanf(): %d\n", ret);
-+ fclose(file);
-+ return ret;
-+ }
-+
-+ fclose(file);
-+
-+ /*
-+ * If length is 4294967295, i.e. the invalid uid (uid_t) -1,
-+ * and the map starts at inside id 0, all uids/gids are mapped.
-+ */
-+ if (inside_id == 0 && length == (uid_t) -1)
-+ return 1;
-+
-+ return 0;
-+}
-+
-+int all_uids_gids_are_mapped(void)
-+{
-+ int ret;
-+
-+ ret = all_ids_are_mapped("/proc/self/uid_map");
-+ if (ret <= 0) {
-+ kdbus_printf("--- error not all uids are mapped\n");
-+ return 0;
-+ }
-+
-+ ret = all_ids_are_mapped("/proc/self/gid_map");
-+ if (ret <= 0) {
-+ kdbus_printf("--- error not all gids are mapped\n");
-+ return 0;
-+ }
-+
-+ return 1;
-+}
-+
-+int drop_privileges(uid_t uid, gid_t gid)
-+{
-+ int ret;
-+
-+ ret = setgroups(0, NULL);
-+ if (ret < 0) {
-+ ret = -errno;
-+ kdbus_printf("error setgroups: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ ret = setresgid(gid, gid, gid);
-+ if (ret < 0) {
-+ ret = -errno;
-+ kdbus_printf("error setresgid: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ ret = setresuid(uid, uid, uid);
-+ if (ret < 0) {
-+ ret = -errno;
-+ kdbus_printf("error setresuid: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ return ret;
-+}
-+
-+uint64_t now(clockid_t clock)
-+{
-+ struct timespec spec;
-+
-+ clock_gettime(clock, &spec);
-+ return spec.tv_sec * 1000ULL * 1000ULL * 1000ULL + spec.tv_nsec;
-+}
-+
-+char *unique_name(const char *prefix)
-+{
-+ unsigned int i;
-+ uint64_t u_now;
-+ char n[17];
-+ char *str;
-+ int r;
-+
-+ /*
-+ * This returns a random string which is guaranteed to be
-+ * globally unique across all calls to unique_name(). We
-+ * compose the string as:
-+ * <prefix>-<random>-<time>
-+ * With:
-+ * <prefix>: string provided by the caller
-+ * <random>: a random alpha string of 16 characters
-+ * <time>: the current time in nanoseconds since last boot
-+ *
-+ * The <random> part makes the string always look vastly different,
-+ * the <time> part makes sure no two calls return the same string.
-+ */
-+
-+ u_now = now(CLOCK_MONOTONIC);
-+
-+ for (i = 0; i < sizeof(n) - 1; ++i)
-+ n[i] = 'a' + (rand() % ('z' - 'a'));
-+ n[sizeof(n) - 1] = 0;
-+
-+ r = asprintf(&str, "%s-%s-%" PRIu64, prefix, n, u_now);
-+ if (r < 0)
-+ return NULL;
-+
-+ return str;
-+}
-+
-+static int do_userns_map_id(pid_t pid,
-+ const char *map_file,
-+ const char *map_id)
-+{
-+ int ret;
-+ int fd;
-+ char *map;
-+ unsigned int i;
-+
-+ map = strndupa(map_id, strlen(map_id));
-+ if (!map) {
-+ ret = -errno;
-+ kdbus_printf("error strndupa %s: %d (%m)\n",
-+ map_file, ret);
-+ return ret;
-+ }
-+
-+ for (i = 0; i < strlen(map); i++)
-+ if (map[i] == ',')
-+ map[i] = '\n';
-+
-+ fd = open(map_file, O_RDWR);
-+ if (fd < 0) {
-+ ret = -errno;
-+ kdbus_printf("error open %s: %d (%m)\n",
-+ map_file, ret);
-+ return ret;
-+ }
-+
-+ ret = write(fd, map, strlen(map));
-+ if (ret < 0) {
-+ ret = -errno;
-+ kdbus_printf("error write to %s: %d (%m)\n",
-+ map_file, ret);
-+ goto out;
-+ }
-+
-+ ret = 0;
-+
-+out:
-+ close(fd);
-+ return ret;
-+}
-+
-+int userns_map_uid_gid(pid_t pid,
-+ const char *map_uid,
-+ const char *map_gid)
-+{
-+ int fd, ret;
-+ char file_id[128] = {'\0'};
-+
-+ snprintf(file_id, sizeof(file_id), "/proc/%ld/uid_map",
-+ (long) pid);
-+
-+ ret = do_userns_map_id(pid, file_id, map_uid);
-+ if (ret < 0)
-+ return ret;
-+
-+ snprintf(file_id, sizeof(file_id), "/proc/%ld/setgroups",
-+ (long) pid);
-+
-+ fd = open(file_id, O_WRONLY);
-+ if (fd >= 0) {
-+ write(fd, "deny\n", 5);
-+ close(fd);
-+ }
-+
-+ snprintf(file_id, sizeof(file_id), "/proc/%ld/gid_map",
-+ (long) pid);
-+
-+ return do_userns_map_id(pid, file_id, map_gid);
-+}
-+
-+static int do_cap_get_flag(cap_t caps, cap_value_t cap)
-+{
-+ int ret;
-+ cap_flag_value_t flag_set;
-+
-+ ret = cap_get_flag(caps, cap, CAP_EFFECTIVE, &flag_set);
-+ if (ret < 0) {
-+ ret = -errno;
-+ kdbus_printf("error cap_get_flag(): %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ return (flag_set == CAP_SET);
-+}
-+
-+/*
-+ * Returns:
-+ * 1 in case all the requested effective capabilities are set.
-+ * 0 in case we do not have the requested capabilities. This value
-+ * will be used to abort tests with TEST_SKIP
-+ * Negative errno on failure.
-+ *
-+ * Terminate args with a negative value.
-+ */
-+int test_is_capable(int cap, ...)
-+{
-+ int ret;
-+ va_list ap;
-+ cap_t caps;
-+
-+ caps = cap_get_proc();
-+ if (!caps) {
-+ ret = -errno;
-+ kdbus_printf("error cap_get_proc(): %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ ret = do_cap_get_flag(caps, (cap_value_t)cap);
-+ if (ret <= 0)
-+ goto out;
-+
-+ va_start(ap, cap);
-+ while ((cap = va_arg(ap, int)) > 0) {
-+ ret = do_cap_get_flag(caps, (cap_value_t)cap);
-+ if (ret <= 0)
-+ break;
-+ }
-+ va_end(ap);
-+
-+out:
-+ cap_free(caps);
-+ return ret;
-+}
-+
-+int config_user_ns_is_enabled(void)
-+{
-+ return (access("/proc/self/uid_map", F_OK) == 0);
-+}
-+
-+int config_auditsyscall_is_enabled(void)
-+{
-+ return (access("/proc/self/loginuid", F_OK) == 0);
-+}
-+
-+int config_cgroups_is_enabled(void)
-+{
-+ return (access("/proc/self/cgroup", F_OK) == 0);
-+}
-+
-+int config_security_is_enabled(void)
-+{
-+ int fd;
-+ int ret;
-+ char buf[128];
-+
-+ /* CONFIG_SECURITY is disabled */
-+ if (access("/proc/self/attr/current", F_OK) != 0)
-+ return 0;
-+
-+ /*
-+ * Only if read() fails with EINVAL do we assume
-+ * that SECLABEL and LSM are disabled.
-+ */
-+ fd = open("/proc/self/attr/current", O_RDONLY|O_CLOEXEC);
-+ if (fd < 0)
-+ return 1;
-+
-+ ret = read(fd, buf, sizeof(buf));
-+ if (ret == -1 && errno == EINVAL)
-+ ret = 0;
-+ else
-+ ret = 1;
-+
-+ close(fd);
-+
-+ return ret;
-+}
-diff --git a/tools/testing/selftests/kdbus/kdbus-util.h b/tools/testing/selftests/kdbus/kdbus-util.h
-new file mode 100644
-index 0000000..e1e18b9
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/kdbus-util.h
-@@ -0,0 +1,218 @@
-+/*
-+ * Copyright (C) 2013-2015 Kay Sievers
-+ * Copyright (C) 2013-2015 Daniel Mack
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#pragma once
-+
-+#define BIT(X) (1 << (X))
-+
-+#include <time.h>
-+#include <stdbool.h>
-+#include <linux/kdbus.h>
-+
-+#define _STRINGIFY(x) #x
-+#define STRINGIFY(x) _STRINGIFY(x)
-+#define ELEMENTSOF(x) (sizeof(x)/sizeof((x)[0]))
-+
-+#define KDBUS_PTR(addr) ((void *)(uintptr_t)(addr))
-+
-+#define KDBUS_ALIGN8(l) (((l) + 7) & ~7)
-+#define KDBUS_ITEM_HEADER_SIZE offsetof(struct kdbus_item, data)
-+#define KDBUS_ITEM_SIZE(s) KDBUS_ALIGN8((s) + KDBUS_ITEM_HEADER_SIZE)
-+
-+#define KDBUS_ITEM_NEXT(item) \
-+ (typeof(item))((uint8_t *)(item) + KDBUS_ALIGN8((item)->size))
-+#define KDBUS_ITEM_FOREACH(item, head, first) \
-+ for ((item) = (head)->first; \
-+ ((uint8_t *)(item) < (uint8_t *)(head) + (head)->size) && \
-+ ((uint8_t *)(item) >= (uint8_t *)(head)); \
-+ (item) = KDBUS_ITEM_NEXT(item))
-+#define KDBUS_FOREACH(iter, first, _size) \
-+ for ((iter) = (first); \
-+ ((uint8_t *)(iter) < (uint8_t *)(first) + (_size)) && \
-+ ((uint8_t *)(iter) >= (uint8_t *)(first)); \
-+ (iter) = (void *)((uint8_t *)(iter) + KDBUS_ALIGN8((iter)->size)))
-+
-+#define _KDBUS_ATTACH_BITS_SET_NR (__builtin_popcountll(_KDBUS_ATTACH_ALL))
-+
-+/* Sum of KDBUS_ITEM_* that reflects _KDBUS_ATTACH_ALL */
-+#define KDBUS_ATTACH_ITEMS_TYPE_SUM \
-+ ((((_KDBUS_ATTACH_BITS_SET_NR - 1) * \
-+ ((_KDBUS_ATTACH_BITS_SET_NR - 1) + 1)) / 2) + \
-+ (_KDBUS_ITEM_ATTACH_BASE * _KDBUS_ATTACH_BITS_SET_NR))
-+
-+#define POOL_SIZE (16 * 1024LU * 1024LU)
-+
-+#define UNPRIV_UID 65534
-+#define UNPRIV_GID 65534
-+
-+/* Dump as user of process, useful for user namespace testing */
-+#define SUID_DUMP_USER 1
-+
-+extern int kdbus_util_verbose;
-+
-+#define kdbus_printf(X...) \
-+ if (kdbus_util_verbose) \
-+ printf(X)
-+
-+#define RUN_UNPRIVILEGED(child_uid, child_gid, _child_, _parent_) ({ \
-+ pid_t pid, rpid; \
-+ int ret; \
-+ \
-+ pid = fork(); \
-+ if (pid == 0) { \
-+ ret = drop_privileges(child_uid, child_gid); \
-+ ASSERT_EXIT_VAL(ret == 0, ret); \
-+ \
-+ _child_; \
-+ _exit(0); \
-+ } else if (pid > 0) { \
-+ _parent_; \
-+ rpid = waitpid(pid, &ret, 0); \
-+ ASSERT_RETURN(rpid == pid); \
-+ ASSERT_RETURN(WIFEXITED(ret)); \
-+ ASSERT_RETURN(WEXITSTATUS(ret) == 0); \
-+ ret = TEST_OK; \
-+ } else { \
-+ ret = pid; \
-+ } \
-+ \
-+ ret; \
-+ })
-+
-+#define RUN_UNPRIVILEGED_CONN(_var_, _bus_, _code_) \
-+ RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({ \
-+ struct kdbus_conn *_var_; \
-+ _var_ = kdbus_hello(_bus_, 0, NULL, 0); \
-+ ASSERT_EXIT(_var_); \
-+ _code_; \
-+ kdbus_conn_free(_var_); \
-+ }), ({ 0; }))
-+
-+#define RUN_CLONE_CHILD(clone_ret, flags, _setup_, _child_body_, \
-+ _parent_setup_, _parent_body_) ({ \
-+ pid_t pid, rpid; \
-+ int ret; \
-+ int efd = -1; \
-+ \
-+ _setup_; \
-+ efd = eventfd(0, EFD_CLOEXEC); \
-+ ASSERT_RETURN(efd >= 0); \
-+ *(clone_ret) = 0; \
-+ pid = syscall(__NR_clone, flags, NULL); \
-+ if (pid == 0) { \
-+ eventfd_t event_status = 0; \
-+ ret = prctl(PR_SET_PDEATHSIG, SIGKILL); \
-+ ASSERT_EXIT(ret == 0); \
-+ ret = eventfd_read(efd, &event_status); \
-+ if (ret < 0 || event_status != 1) { \
-+ kdbus_printf("error eventfd_read()\n"); \
-+ _exit(EXIT_FAILURE); \
-+ } \
-+ _child_body_; \
-+ _exit(0); \
-+ } else if (pid > 0) { \
-+ _parent_setup_; \
-+ ret = eventfd_write(efd, 1); \
-+ ASSERT_RETURN(ret >= 0); \
-+ _parent_body_; \
-+ rpid = waitpid(pid, &ret, 0); \
-+ ASSERT_RETURN(rpid == pid); \
-+ ASSERT_RETURN(WIFEXITED(ret)); \
-+ ASSERT_RETURN(WEXITSTATUS(ret) == 0); \
-+ ret = TEST_OK; \
-+ } else { \
-+ ret = -errno; \
-+ *(clone_ret) = -errno; \
-+ } \
-+ close(efd); \
-+ ret; \
-+})
-+
-+/* Whether and how the parent should drop privileges */
-+enum kdbus_drop_parent {
-+ DO_NOT_DROP,
-+ DROP_SAME_UNPRIV,
-+ DROP_OTHER_UNPRIV,
-+};
-+
-+struct kdbus_conn {
-+ int fd;
-+ uint64_t id;
-+ unsigned char *buf;
-+};
-+
-+int kdbus_sysfs_get_parameter_mask(const char *path, uint64_t *mask);
-+int kdbus_sysfs_set_parameter_mask(const char *path, uint64_t mask);
-+
-+int sys_memfd_create(const char *name, __u64 size);
-+int sys_memfd_seal_set(int fd);
-+off_t sys_memfd_get_size(int fd, off_t *size);
-+
-+int kdbus_list(struct kdbus_conn *conn, uint64_t flags);
-+int kdbus_name_release(struct kdbus_conn *conn, const char *name);
-+int kdbus_name_acquire(struct kdbus_conn *conn, const char *name,
-+ uint64_t *flags);
-+void kdbus_msg_free(struct kdbus_msg *msg);
-+int kdbus_msg_recv(struct kdbus_conn *conn,
-+ struct kdbus_msg **msg, uint64_t *offset);
-+int kdbus_msg_recv_poll(struct kdbus_conn *conn, int timeout_ms,
-+ struct kdbus_msg **msg_out, uint64_t *offset);
-+int kdbus_free(const struct kdbus_conn *conn, uint64_t offset);
-+int kdbus_msg_dump(const struct kdbus_conn *conn,
-+ const struct kdbus_msg *msg);
-+int kdbus_create_bus(int control_fd, const char *name,
-+ uint64_t owner_meta, char **path);
-+int kdbus_msg_send(const struct kdbus_conn *conn, const char *name,
-+ uint64_t cookie, uint64_t flags, uint64_t timeout,
-+ int64_t priority, uint64_t dst_id);
-+int kdbus_msg_send_sync(const struct kdbus_conn *conn, const char *name,
-+ uint64_t cookie, uint64_t flags, uint64_t timeout,
-+ int64_t priority, uint64_t dst_id, int cancel_fd);
-+int kdbus_msg_send_reply(const struct kdbus_conn *conn,
-+ uint64_t reply_cookie,
-+ uint64_t dst_id);
-+struct kdbus_conn *kdbus_hello(const char *path, uint64_t hello_flags,
-+ const struct kdbus_item *item,
-+ size_t item_size);
-+struct kdbus_conn *kdbus_hello_registrar(const char *path, const char *name,
-+ const struct kdbus_policy_access *access,
-+ size_t num_access, uint64_t flags);
-+struct kdbus_conn *kdbus_hello_activator(const char *path, const char *name,
-+ const struct kdbus_policy_access *access,
-+ size_t num_access);
-+bool kdbus_item_in_message(struct kdbus_msg *msg, uint64_t type);
-+int kdbus_bus_creator_info(struct kdbus_conn *conn,
-+ uint64_t flags,
-+ uint64_t *offset);
-+int kdbus_conn_info(struct kdbus_conn *conn, uint64_t id,
-+ const char *name, uint64_t flags, uint64_t *offset);
-+void kdbus_conn_free(struct kdbus_conn *conn);
-+int kdbus_conn_update_attach_flags(struct kdbus_conn *conn,
-+ uint64_t attach_flags_send,
-+ uint64_t attach_flags_recv);
-+int kdbus_conn_update_policy(struct kdbus_conn *conn, const char *name,
-+ const struct kdbus_policy_access *access,
-+ size_t num_access);
-+
-+int kdbus_add_match_id(struct kdbus_conn *conn, uint64_t cookie,
-+ uint64_t type, uint64_t id);
-+int kdbus_add_match_empty(struct kdbus_conn *conn);
-+
-+int all_uids_gids_are_mapped(void);
-+int drop_privileges(uid_t uid, gid_t gid);
-+uint64_t now(clockid_t clock);
-+char *unique_name(const char *prefix);
-+
-+int userns_map_uid_gid(pid_t pid, const char *map_uid, const char *map_gid);
-+int test_is_capable(int cap, ...);
-+int config_user_ns_is_enabled(void);
-+int config_auditsyscall_is_enabled(void);
-+int config_cgroups_is_enabled(void);
-+int config_security_is_enabled(void);
-diff --git a/tools/testing/selftests/kdbus/test-activator.c b/tools/testing/selftests/kdbus/test-activator.c
-new file mode 100644
-index 0000000..3d1b763
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-activator.c
-@@ -0,0 +1,318 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stdbool.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <poll.h>
-+#include <sys/capability.h>
-+#include <sys/types.h>
-+#include <sys/wait.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+static int kdbus_starter_poll(struct kdbus_conn *conn)
-+{
-+ int ret;
-+ struct pollfd fd;
-+
-+ fd.fd = conn->fd;
-+ fd.events = POLLIN | POLLPRI | POLLHUP;
-+ fd.revents = 0;
-+
-+ ret = poll(&fd, 1, 100);
-+ if (ret == 0)
-+ return -ETIMEDOUT;
-+ else if (ret > 0) {
-+ if (fd.revents & POLLIN)
-+ return 0;
-+
-+ if (fd.revents & (POLLHUP | POLLERR))
-+ ret = -ECONNRESET;
-+ }
-+
-+ return ret;
-+}
-+
-+/* Ensure that kdbus activator logic is safe */
-+static int kdbus_priv_activator(struct kdbus_test_env *env)
-+{
-+ int ret;
-+ struct kdbus_msg *msg = NULL;
-+ uint64_t cookie = 0xdeadbeef;
-+ uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
-+ struct kdbus_conn *activator;
-+ struct kdbus_conn *service;
-+ struct kdbus_conn *client;
-+ struct kdbus_conn *holder;
-+ struct kdbus_policy_access *access;
-+
-+ access = (struct kdbus_policy_access[]){
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = getuid(),
-+ .access = KDBUS_POLICY_OWN,
-+ },
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = getuid(),
-+ .access = KDBUS_POLICY_TALK,
-+ },
-+ };
-+
-+ activator = kdbus_hello_activator(env->buspath, "foo.priv.activator",
-+ access, 2);
-+ ASSERT_RETURN(activator);
-+
-+ service = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(service);
-+
-+ client = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(client);
-+
-+ /*
-+ * Make sure that other users can't TALK to the activator
-+ */
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ /* Try to talk using the ID */
-+ ret = kdbus_msg_send(unpriv, NULL, 0xdeadbeef, 0, 0,
-+ 0, activator->id);
-+ ASSERT_EXIT(ret == -ENXIO);
-+
-+ /* Try to talk to the name */
-+ ret = kdbus_msg_send(unpriv, "foo.priv.activator",
-+ 0xdeadbeef, 0, 0, 0,
-+ KDBUS_DST_ID_NAME);
-+ ASSERT_EXIT(ret == -EPERM);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /*
-+ * Make sure that we did not receive anything, so the
-+ * service will not be started automatically
-+ */
-+
-+ ret = kdbus_starter_poll(activator);
-+ ASSERT_RETURN(ret == -ETIMEDOUT);
-+
-+ /*
-+ * Now try to emulate the starter/service logic and
-+ * acquire the name.
-+ */
-+
-+ cookie++;
-+ ret = kdbus_msg_send(service, "foo.priv.activator", cookie,
-+ 0, 0, 0, KDBUS_DST_ID_NAME);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_starter_poll(activator);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Policies are still checked, access denied */
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "foo.priv.activator",
-+ &flags);
-+ ASSERT_RETURN(ret == -EPERM);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = kdbus_name_acquire(service, "foo.priv.activator",
-+ &flags);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* We read our previous starter message */
-+
-+ ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Try to talk, we still fail */
-+
-+ cookie++;
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ /* Try to talk to the name */
-+ ret = kdbus_msg_send(unpriv, "foo.priv.activator",
-+ cookie, 0, 0, 0,
-+ KDBUS_DST_ID_NAME);
-+ ASSERT_EXIT(ret == -EPERM);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /* Still nothing to read */
-+
-+ ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
-+ ASSERT_RETURN(ret == -ETIMEDOUT);
-+
-+ /* We receive everything now */
-+
-+ cookie++;
-+ ret = kdbus_msg_send(client, "foo.priv.activator", cookie,
-+ 0, 0, 0, KDBUS_DST_ID_NAME);
-+ ASSERT_RETURN(ret == 0);
-+ ret = kdbus_msg_recv_poll(service, 100, &msg, NULL);
-+ ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
-+
-+ kdbus_msg_free(msg);
-+
-+ /* Policies default to deny TALK now */
-+ kdbus_conn_free(activator);
-+
-+ cookie++;
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ /* Try to talk to the name */
-+ ret = kdbus_msg_send(unpriv, "foo.priv.activator",
-+ cookie, 0, 0, 0,
-+ KDBUS_DST_ID_NAME);
-+ ASSERT_EXIT(ret == -EPERM);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
-+ ASSERT_RETURN(ret == -ETIMEDOUT);
-+
-+ /* Same user is able to TALK */
-+ cookie++;
-+ ret = kdbus_msg_send(client, "foo.priv.activator", cookie,
-+ 0, 0, 0, KDBUS_DST_ID_NAME);
-+ ASSERT_RETURN(ret == 0);
-+ ret = kdbus_msg_recv_poll(service, 100, &msg, NULL);
-+ ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
-+
-+ kdbus_msg_free(msg);
-+
-+ access = (struct kdbus_policy_access []){
-+ {
-+ .type = KDBUS_POLICY_ACCESS_WORLD,
-+ .id = getuid(),
-+ .access = KDBUS_POLICY_TALK,
-+ },
-+ };
-+
-+ holder = kdbus_hello_registrar(env->buspath, "foo.priv.activator",
-+ access, 1, KDBUS_HELLO_POLICY_HOLDER);
-+ ASSERT_RETURN(holder);
-+
-+ /* Now we are able to TALK to the name */
-+
-+ cookie++;
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ /* Try to talk to the name */
-+ ret = kdbus_msg_send(unpriv, "foo.priv.activator",
-+ cookie, 0, 0, 0,
-+ KDBUS_DST_ID_NAME);
-+ ASSERT_EXIT(ret == 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = kdbus_msg_recv_poll(service, 100, NULL, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "foo.priv.activator",
-+ &flags);
-+ ASSERT_RETURN(ret == -EPERM);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ kdbus_conn_free(service);
-+ kdbus_conn_free(client);
-+ kdbus_conn_free(holder);
-+
-+ return 0;
-+}
-+
-+int kdbus_test_activator(struct kdbus_test_env *env)
-+{
-+ int ret;
-+ struct kdbus_conn *activator;
-+ struct pollfd fds[2];
-+ bool activator_done = false;
-+ struct kdbus_policy_access access[2];
-+
-+ access[0].type = KDBUS_POLICY_ACCESS_USER;
-+ access[0].id = getuid();
-+ access[0].access = KDBUS_POLICY_OWN;
-+
-+ access[1].type = KDBUS_POLICY_ACCESS_WORLD;
-+ access[1].access = KDBUS_POLICY_TALK;
-+
-+ activator = kdbus_hello_activator(env->buspath, "foo.test.activator",
-+ access, 2);
-+ ASSERT_RETURN(activator);
-+
-+ ret = kdbus_add_match_empty(env->conn);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_list(env->conn, KDBUS_LIST_NAMES |
-+ KDBUS_LIST_UNIQUE |
-+ KDBUS_LIST_ACTIVATORS |
-+ KDBUS_LIST_QUEUED);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_send(env->conn, "foo.test.activator", 0xdeafbeef,
-+ 0, 0, 0, KDBUS_DST_ID_NAME);
-+ ASSERT_RETURN(ret == 0);
-+
-+ fds[0].fd = activator->fd;
-+ fds[1].fd = env->conn->fd;
-+
-+ kdbus_printf("-- entering poll loop ...\n");
-+
-+ for (;;) {
-+ int i, nfds = sizeof(fds) / sizeof(fds[0]);
-+
-+ for (i = 0; i < nfds; i++) {
-+ fds[i].events = POLLIN | POLLPRI;
-+ fds[i].revents = 0;
-+ }
-+
-+ ret = poll(fds, nfds, 3000);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = kdbus_list(env->conn, KDBUS_LIST_NAMES);
-+ ASSERT_RETURN(ret == 0);
-+
-+ if ((fds[0].revents & POLLIN) && !activator_done) {
-+ uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
-+
-+ kdbus_printf("Starter was called back!\n");
-+
-+ ret = kdbus_name_acquire(env->conn,
-+ "foo.test.activator", &flags);
-+ ASSERT_RETURN(ret == 0);
-+
-+ activator_done = true;
-+ }
-+
-+ if (fds[1].revents & POLLIN) {
-+ kdbus_msg_recv(env->conn, NULL, NULL);
-+ break;
-+ }
-+ }
-+
-+ /* Check if all uids/gids are mapped */
-+ if (!all_uids_gids_are_mapped())
-+ return TEST_SKIP;
-+
-+ /* Check capabilities only now, so that the previous tests always run */
-+ ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ if (!ret)
-+ return TEST_SKIP;
-+
-+ ret = kdbus_priv_activator(env);
-+ ASSERT_RETURN(ret == 0);
-+
-+ kdbus_conn_free(activator);
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-benchmark.c b/tools/testing/selftests/kdbus/test-benchmark.c
-new file mode 100644
-index 0000000..8a9744b
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-benchmark.c
-@@ -0,0 +1,451 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <locale.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <stdbool.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <poll.h>
-+#include <sys/time.h>
-+#include <sys/mman.h>
-+#include <sys/socket.h>
-+#include <math.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+#define SERVICE_NAME "foo.bar.echo"
-+
-+/*
-+ * To have a benchmark comparison with unix socket, set:
-+ * use_memfd = false;
-+ * compare_uds = true;
-+ * attach_none = true; do not attach metadata
-+ */
-+
-+static bool use_memfd = true; /* transmit memfd? */
-+static bool compare_uds = false; /* unix-socket comparison? */
-+static bool attach_none = false; /* clear attach-flags? */
-+static char stress_payload[8192];
-+
-+struct stats {
-+ uint64_t count;
-+ uint64_t latency_acc;
-+ uint64_t latency_low;
-+ uint64_t latency_high;
-+ uint64_t latency_avg;
-+ uint64_t latency_ssquares;
-+};
-+
-+static struct stats stats;
-+
-+static void reset_stats(void)
-+{
-+ stats.count = 0;
-+ stats.latency_acc = 0;
-+ stats.latency_low = UINT64_MAX;
-+ stats.latency_high = 0;
-+ stats.latency_avg = 0;
-+ stats.latency_ssquares = 0;
-+}
-+
-+static void dump_stats(bool is_uds)
-+{
-+ if (stats.count > 0) {
-+ kdbus_printf("stats %s: %'llu packets processed, latency (nsecs) min/max/avg/dev %'7llu // %'7llu // %'7llu // %'7.f\n",
-+ is_uds ? " (UNIX)" : "(KDBUS)",
-+ (unsigned long long) stats.count,
-+ (unsigned long long) stats.latency_low,
-+ (unsigned long long) stats.latency_high,
-+ (unsigned long long) stats.latency_avg,
-+ sqrt(stats.latency_ssquares / stats.count));
-+ } else {
-+ kdbus_printf("*** no packets received. bus stuck?\n");
-+ }
-+}
-+
-+static void add_stats(uint64_t prev)
-+{
-+ uint64_t diff, latency_avg_prev;
-+
-+ diff = now(CLOCK_THREAD_CPUTIME_ID) - prev;
-+
-+ stats.count++;
-+ stats.latency_acc += diff;
-+
-+ /* see Welford62 */
-+ latency_avg_prev = stats.latency_avg;
-+ stats.latency_avg = stats.latency_acc / stats.count;
-+ stats.latency_ssquares += (diff - latency_avg_prev) * (diff - stats.latency_avg);
-+
-+ if (stats.latency_low > diff)
-+ stats.latency_low = diff;
-+
-+ if (stats.latency_high < diff)
-+ stats.latency_high = diff;
-+}
-+
-+static int setup_simple_kdbus_msg(struct kdbus_conn *conn,
-+ uint64_t dst_id,
-+ struct kdbus_msg **msg_out)
-+{
-+ struct kdbus_msg *msg;
-+ struct kdbus_item *item;
-+ uint64_t size;
-+
-+ size = sizeof(struct kdbus_msg);
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+ msg = malloc(size);
-+ ASSERT_RETURN_VAL(msg, -ENOMEM);
-+
-+ memset(msg, 0, size);
-+ msg->size = size;
-+ msg->src_id = conn->id;
-+ msg->dst_id = dst_id;
-+ msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+ item = msg->items;
-+
-+ item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+ item->vec.address = (uintptr_t) stress_payload;
-+ item->vec.size = sizeof(stress_payload);
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ *msg_out = msg;
-+
-+ return 0;
-+}
-+
-+static int setup_memfd_kdbus_msg(struct kdbus_conn *conn,
-+ uint64_t dst_id,
-+ off_t *memfd_item_offset,
-+ struct kdbus_msg **msg_out)
-+{
-+ struct kdbus_msg *msg;
-+ struct kdbus_item *item;
-+ uint64_t size;
-+
-+ size = sizeof(struct kdbus_msg);
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
-+
-+ msg = malloc(size);
-+ ASSERT_RETURN_VAL(msg, -ENOMEM);
-+
-+ memset(msg, 0, size);
-+ msg->size = size;
-+ msg->src_id = conn->id;
-+ msg->dst_id = dst_id;
-+ msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+ item = msg->items;
-+
-+ item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+ item->vec.address = (uintptr_t) stress_payload;
-+ item->vec.size = sizeof(stress_payload);
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_memfd);
-+ item->memfd.size = sizeof(uint64_t);
-+
-+ *memfd_item_offset = (unsigned char *)item - (unsigned char *)msg;
-+ *msg_out = msg;
-+
-+ return 0;
-+}
-+
-+static int
-+send_echo_request(struct kdbus_conn *conn, uint64_t dst_id,
-+ void *kdbus_msg, off_t memfd_item_offset)
-+{
-+ struct kdbus_cmd_send cmd = {};
-+ int memfd = -1;
-+ int ret;
-+
-+ if (use_memfd) {
-+ uint64_t now_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+ struct kdbus_item *item = memfd_item_offset + kdbus_msg;
-+ memfd = sys_memfd_create("memfd-name", 0);
-+ ASSERT_RETURN_VAL(memfd >= 0, memfd);
-+
-+ ret = write(memfd, &now_ns, sizeof(now_ns));
-+ ASSERT_RETURN_VAL(ret == sizeof(now_ns), -EAGAIN);
-+
-+ ret = sys_memfd_seal_set(memfd);
-+ ASSERT_RETURN_VAL(ret == 0, -errno);
-+
-+ item->memfd.fd = memfd;
-+ }
-+
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)kdbus_msg;
-+
-+ ret = kdbus_cmd_send(conn->fd, &cmd);
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+ close(memfd);
-+
-+ return 0;
-+}
-+
-+static int
-+handle_echo_reply(struct kdbus_conn *conn, uint64_t send_ns)
-+{
-+ int ret;
-+ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
-+ struct kdbus_msg *msg;
-+ const struct kdbus_item *item;
-+ bool has_memfd = false;
-+
-+ ret = kdbus_cmd_recv(conn->fd, &recv);
-+ if (ret == -EAGAIN)
-+ return ret;
-+
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+ if (!use_memfd)
-+ goto out;
-+
-+ msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
-+
-+ KDBUS_ITEM_FOREACH(item, msg, items) {
-+ switch (item->type) {
-+ case KDBUS_ITEM_PAYLOAD_MEMFD: {
-+ char *buf;
-+
-+ buf = mmap(NULL, item->memfd.size, PROT_READ,
-+ MAP_PRIVATE, item->memfd.fd, 0);
-+ ASSERT_RETURN_VAL(buf != MAP_FAILED, -EINVAL);
-+ ASSERT_RETURN_VAL(item->memfd.size == sizeof(uint64_t),
-+ -EINVAL);
-+
-+ add_stats(*(uint64_t*)buf);
-+ munmap(buf, item->memfd.size);
-+ close(item->memfd.fd);
-+ has_memfd = true;
-+ break;
-+ }
-+
-+ case KDBUS_ITEM_PAYLOAD_OFF:
-+ /* ignore */
-+ break;
-+ }
-+ }
-+
-+out:
-+ if (!has_memfd)
-+ add_stats(send_ns);
-+
-+ ret = kdbus_free(conn, recv.msg.offset);
-+ ASSERT_RETURN_VAL(ret == 0, -errno);
-+
-+ return 0;
-+}
-+
-+static int benchmark(struct kdbus_test_env *env)
-+{
-+ static char buf[sizeof(stress_payload)];
-+ struct kdbus_msg *kdbus_msg = NULL;
-+ off_t memfd_cached_offset = 0;
-+ int ret;
-+ struct kdbus_conn *conn_a, *conn_b;
-+ struct pollfd fds[2];
-+ uint64_t start, send_ns, now_ns, diff;
-+ unsigned int i;
-+ int uds[2];
-+
-+ setlocale(LC_ALL, "");
-+
-+ for (i = 0; i < sizeof(stress_payload); i++)
-+ stress_payload[i] = i;
-+
-+ /* setup kdbus pair */
-+
-+ conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn_a && conn_b);
-+
-+ ret = kdbus_add_match_empty(conn_a);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_add_match_empty(conn_b);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_name_acquire(conn_a, SERVICE_NAME, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ if (attach_none) {
-+ ret = kdbus_conn_update_attach_flags(conn_a,
-+ _KDBUS_ATTACH_ALL,
-+ 0);
-+ ASSERT_RETURN(ret == 0);
-+ }
-+
-+ /* setup UDS pair */
-+
-+ ret = socketpair(AF_UNIX, SOCK_SEQPACKET | SOCK_NONBLOCK, 0, uds);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* setup a kdbus msg now */
-+ if (use_memfd) {
-+ ret = setup_memfd_kdbus_msg(conn_b, conn_a->id,
-+ &memfd_cached_offset,
-+ &kdbus_msg);
-+ ASSERT_RETURN(ret == 0);
-+ } else {
-+ ret = setup_simple_kdbus_msg(conn_b, conn_a->id, &kdbus_msg);
-+ ASSERT_RETURN(ret == 0);
-+ }
-+
-+ /* start benchmark */
-+
-+ kdbus_printf("-- entering poll loop ...\n");
-+
-+ do {
-+ /* run kdbus benchmark */
-+ fds[0].fd = conn_a->fd;
-+ fds[1].fd = conn_b->fd;
-+
-+ /* cancel any pending message */
-+ handle_echo_reply(conn_a, 0);
-+
-+ start = now(CLOCK_THREAD_CPUTIME_ID);
-+ reset_stats();
-+
-+ send_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+ ret = send_echo_request(conn_b, conn_a->id,
-+ kdbus_msg, memfd_cached_offset);
-+ ASSERT_RETURN(ret == 0);
-+
-+ while (1) {
-+ unsigned int nfds = sizeof(fds) / sizeof(fds[0]);
-+ unsigned int i;
-+
-+ for (i = 0; i < nfds; i++) {
-+ fds[i].events = POLLIN | POLLPRI | POLLHUP;
-+ fds[i].revents = 0;
-+ }
-+
-+ ret = poll(fds, nfds, 10);
-+ if (ret < 0)
-+ break;
-+
-+ if (fds[0].revents & POLLIN) {
-+ ret = handle_echo_reply(conn_a, send_ns);
-+ ASSERT_RETURN(ret == 0);
-+
-+ send_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+ ret = send_echo_request(conn_b, conn_a->id,
-+ kdbus_msg,
-+ memfd_cached_offset);
-+ ASSERT_RETURN(ret == 0);
-+ }
-+
-+ now_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+ diff = now_ns - start;
-+ if (diff > 1000000000ULL) {
-+ start = now_ns;
-+
-+ dump_stats(false);
-+ break;
-+ }
-+ }
-+
-+ if (!compare_uds)
-+ continue;
-+
-+ /* run unix-socket benchmark as comparison */
-+
-+ fds[0].fd = uds[0];
-+ fds[1].fd = uds[1];
-+
-+ /* cancel any pending message */
-+ read(uds[1], buf, sizeof(buf));
-+
-+ start = now(CLOCK_THREAD_CPUTIME_ID);
-+ reset_stats();
-+
-+ send_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+ ret = write(uds[0], stress_payload, sizeof(stress_payload));
-+ ASSERT_RETURN(ret == sizeof(stress_payload));
-+
-+ while (1) {
-+ unsigned int nfds = sizeof(fds) / sizeof(fds[0]);
-+ unsigned int i;
-+
-+ for (i = 0; i < nfds; i++) {
-+ fds[i].events = POLLIN | POLLPRI | POLLHUP;
-+ fds[i].revents = 0;
-+ }
-+
-+ ret = poll(fds, nfds, 10);
-+ if (ret < 0)
-+ break;
-+
-+ if (fds[1].revents & POLLIN) {
-+ ret = read(uds[1], buf, sizeof(buf));
-+ ASSERT_RETURN(ret == sizeof(buf));
-+
-+ add_stats(send_ns);
-+
-+ send_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+ ret = write(uds[0], buf, sizeof(buf));
-+ ASSERT_RETURN(ret == sizeof(buf));
-+ }
-+
-+ now_ns = now(CLOCK_THREAD_CPUTIME_ID);
-+ diff = now_ns - start;
-+ if (diff > 1000000000ULL) {
-+ start = now_ns;
-+
-+ dump_stats(true);
-+ break;
-+ }
-+ }
-+
-+ } while (kdbus_util_verbose);
-+
-+ kdbus_printf("-- closing bus connections\n");
-+
-+ free(kdbus_msg);
-+
-+ kdbus_conn_free(conn_a);
-+ kdbus_conn_free(conn_b);
-+
-+ return (stats.count > 1) ? TEST_OK : TEST_ERR;
-+}
-+
-+int kdbus_test_benchmark(struct kdbus_test_env *env)
-+{
-+ use_memfd = true;
-+ attach_none = false;
-+ compare_uds = false;
-+ return benchmark(env);
-+}
-+
-+int kdbus_test_benchmark_nomemfds(struct kdbus_test_env *env)
-+{
-+ use_memfd = false;
-+ attach_none = false;
-+ compare_uds = false;
-+ return benchmark(env);
-+}
-+
-+int kdbus_test_benchmark_uds(struct kdbus_test_env *env)
-+{
-+ use_memfd = false;
-+ attach_none = true;
-+ compare_uds = true;
-+ return benchmark(env);
-+}
-diff --git a/tools/testing/selftests/kdbus/test-bus.c b/tools/testing/selftests/kdbus/test-bus.c
-new file mode 100644
-index 0000000..762fb30
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-bus.c
-@@ -0,0 +1,175 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <limits.h>
-+#include <sys/mman.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+static struct kdbus_item *kdbus_get_item(struct kdbus_info *info,
-+ uint64_t type)
-+{
-+ struct kdbus_item *item;
-+
-+ KDBUS_ITEM_FOREACH(item, info, items)
-+ if (item->type == type)
-+ return item;
-+
-+ return NULL;
-+}
-+
-+static int test_bus_creator_info(const char *bus_path)
-+{
-+ int ret;
-+ uint64_t offset;
-+ struct kdbus_conn *conn;
-+ struct kdbus_info *info;
-+ struct kdbus_item *item;
-+ char *tmp, *busname;
-+
-+ /* extract the bus-name from @bus_path */
-+ tmp = strdup(bus_path);
-+ ASSERT_RETURN(tmp);
-+ busname = strrchr(tmp, '/');
-+ ASSERT_RETURN(busname);
-+ *busname = 0;
-+ busname = strrchr(tmp, '/');
-+ ASSERT_RETURN(busname);
-+ ++busname;
-+
-+ conn = kdbus_hello(bus_path, 0, NULL, 0);
-+ ASSERT_RETURN(conn);
-+
-+ ret = kdbus_bus_creator_info(conn, _KDBUS_ATTACH_ALL, &offset);
-+ ASSERT_RETURN(ret == 0);
-+
-+ info = (struct kdbus_info *)(conn->buf + offset);
-+
-+ item = kdbus_get_item(info, KDBUS_ITEM_MAKE_NAME);
-+ ASSERT_RETURN(item);
-+ ASSERT_RETURN(!strcmp(item->str, busname));
-+
-+ ret = kdbus_free(conn, offset);
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+ free(tmp);
-+ kdbus_conn_free(conn);
-+ return 0;
-+}
-+
-+int kdbus_test_bus_make(struct kdbus_test_env *env)
-+{
-+ struct {
-+ struct kdbus_cmd cmd;
-+
-+ /* bloom size item */
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ struct kdbus_bloom_parameter bloom;
-+ } bs;
-+
-+ /* name item */
-+ uint64_t n_size;
-+ uint64_t n_type;
-+ char name[64];
-+ } bus_make;
-+ char s[PATH_MAX], *name;
-+ int ret, control_fd2;
-+ uid_t uid;
-+
-+ name = unique_name("");
-+ ASSERT_RETURN(name);
-+
-+ snprintf(s, sizeof(s), "%s/control", env->root);
-+ env->control_fd = open(s, O_RDWR|O_CLOEXEC);
-+ ASSERT_RETURN(env->control_fd >= 0);
-+
-+ control_fd2 = open(s, O_RDWR|O_CLOEXEC);
-+ ASSERT_RETURN(control_fd2 >= 0);
-+
-+ memset(&bus_make, 0, sizeof(bus_make));
-+
-+ bus_make.bs.size = sizeof(bus_make.bs);
-+ bus_make.bs.type = KDBUS_ITEM_BLOOM_PARAMETER;
-+ bus_make.bs.bloom.size = 64;
-+ bus_make.bs.bloom.n_hash = 1;
-+
-+ bus_make.n_type = KDBUS_ITEM_MAKE_NAME;
-+
-+ uid = getuid();
-+
-+ /* missing uid prefix */
-+ snprintf(bus_make.name, sizeof(bus_make.name), "foo");
-+ bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
-+ bus_make.cmd.size = sizeof(struct kdbus_cmd) +
-+ sizeof(bus_make.bs) + bus_make.n_size;
-+ ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ /* non alphanumeric character */
-+ snprintf(bus_make.name, sizeof(bus_make.name), "%u-blah@123", uid);
-+ bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
-+ bus_make.cmd.size = sizeof(struct kdbus_cmd) +
-+ sizeof(bus_make.bs) + bus_make.n_size;
-+ ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ /* '-' at the end */
-+ snprintf(bus_make.name, sizeof(bus_make.name), "%u-blah-", uid);
-+ bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
-+ bus_make.cmd.size = sizeof(struct kdbus_cmd) +
-+ sizeof(bus_make.bs) + bus_make.n_size;
-+ ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ /* create a new bus */
-+ snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-1", uid, name);
-+ bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
-+ bus_make.cmd.size = sizeof(struct kdbus_cmd) +
-+ sizeof(bus_make.bs) + bus_make.n_size;
-+ ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_cmd_bus_make(control_fd2, &bus_make.cmd);
-+ ASSERT_RETURN(ret == -EEXIST);
-+
-+ snprintf(s, sizeof(s), "%s/%u-%s-1/bus", env->root, uid, name);
-+ ASSERT_RETURN(access(s, F_OK) == 0);
-+
-+ ret = test_bus_creator_info(s);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* can't use the same fd for bus make twice, even though a different
-+ * bus name is used
-+ */
-+ snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-2", uid, name);
-+ bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
-+ bus_make.cmd.size = sizeof(struct kdbus_cmd) +
-+ sizeof(bus_make.bs) + bus_make.n_size;
-+ ret = kdbus_cmd_bus_make(env->control_fd, &bus_make.cmd);
-+ ASSERT_RETURN(ret == -EBADFD);
-+
-+ /* create a new bus, with different fd and different bus name */
-+ snprintf(bus_make.name, sizeof(bus_make.name), "%u-%s-2", uid, name);
-+ bus_make.n_size = KDBUS_ITEM_HEADER_SIZE + strlen(bus_make.name) + 1;
-+ bus_make.cmd.size = sizeof(struct kdbus_cmd) +
-+ sizeof(bus_make.bs) + bus_make.n_size;
-+ ret = kdbus_cmd_bus_make(control_fd2, &bus_make.cmd);
-+ ASSERT_RETURN(ret == 0);
-+
-+ close(control_fd2);
-+ free(name);
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-chat.c b/tools/testing/selftests/kdbus/test-chat.c
-new file mode 100644
-index 0000000..41e5b53
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-chat.c
-@@ -0,0 +1,124 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <poll.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+int kdbus_test_chat(struct kdbus_test_env *env)
-+{
-+ int ret, cookie;
-+ struct kdbus_conn *conn_a, *conn_b;
-+ struct pollfd fds[2];
-+ uint64_t flags;
-+ int count;
-+
-+ conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn_a && conn_b);
-+
-+ flags = KDBUS_NAME_ALLOW_REPLACEMENT;
-+ ret = kdbus_name_acquire(conn_a, "foo.bar.test", &flags);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_name_acquire(conn_a, "foo.bar.baz", NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ flags = KDBUS_NAME_QUEUE;
-+ ret = kdbus_name_acquire(conn_b, "foo.bar.baz", &flags);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_name_acquire(conn_a, "foo.bar.double", NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ flags = 0;
-+ ret = kdbus_name_acquire(conn_a, "foo.bar.double", &flags);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(!(flags & KDBUS_NAME_ACQUIRED));
-+
-+ ret = kdbus_name_release(conn_a, "foo.bar.double");
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_name_release(conn_a, "foo.bar.double");
-+ ASSERT_RETURN(ret == -ESRCH);
-+
-+ ret = kdbus_list(conn_b, KDBUS_LIST_UNIQUE |
-+ KDBUS_LIST_NAMES |
-+ KDBUS_LIST_QUEUED |
-+ KDBUS_LIST_ACTIVATORS);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_add_match_empty(conn_a);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_add_match_empty(conn_b);
-+ ASSERT_RETURN(ret == 0);
-+
-+ cookie = 0;
-+ ret = kdbus_msg_send(conn_b, NULL, 0xc0000000 | cookie, 0, 0, 0,
-+ KDBUS_DST_ID_BROADCAST);
-+ ASSERT_RETURN(ret == 0);
-+
-+ fds[0].fd = conn_a->fd;
-+ fds[1].fd = conn_b->fd;
-+
-+ kdbus_printf("-- entering poll loop ...\n");
-+
-+ for (count = 0;; count++) {
-+ int i, nfds = sizeof(fds) / sizeof(fds[0]);
-+
-+ for (i = 0; i < nfds; i++) {
-+ fds[i].events = POLLIN | POLLPRI | POLLHUP;
-+ fds[i].revents = 0;
-+ }
-+
-+ ret = poll(fds, nfds, 3000);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ if (fds[0].revents & POLLIN) {
-+ if (count > 2)
-+ kdbus_name_release(conn_a, "foo.bar.baz");
-+
-+ ret = kdbus_msg_recv(conn_a, NULL, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ret = kdbus_msg_send(conn_a, NULL,
-+ 0xc0000000 | cookie++,
-+ 0, 0, 0, conn_b->id);
-+ ASSERT_RETURN(ret == 0);
-+ }
-+
-+ if (fds[1].revents & POLLIN) {
-+ ret = kdbus_msg_recv(conn_b, NULL, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ret = kdbus_msg_send(conn_b, NULL,
-+ 0xc0000000 | cookie++,
-+ 0, 0, 0, conn_a->id);
-+ ASSERT_RETURN(ret == 0);
-+ }
-+
-+ ret = kdbus_list(conn_b, KDBUS_LIST_UNIQUE |
-+ KDBUS_LIST_NAMES |
-+ KDBUS_LIST_QUEUED |
-+ KDBUS_LIST_ACTIVATORS);
-+ ASSERT_RETURN(ret == 0);
-+
-+ if (count > 10)
-+ break;
-+ }
-+
-+ kdbus_printf("-- closing bus connections\n");
-+ kdbus_conn_free(conn_a);
-+ kdbus_conn_free(conn_b);
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-connection.c b/tools/testing/selftests/kdbus/test-connection.c
-new file mode 100644
-index 0000000..4688ce8
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-connection.c
-@@ -0,0 +1,597 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <limits.h>
-+#include <sys/types.h>
-+#include <sys/capability.h>
-+#include <sys/mman.h>
-+#include <sys/syscall.h>
-+#include <sys/wait.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+int kdbus_test_hello(struct kdbus_test_env *env)
-+{
-+ struct kdbus_cmd_free cmd_free = {};
-+ struct kdbus_cmd_hello hello;
-+ int fd, ret;
-+
-+ memset(&hello, 0, sizeof(hello));
-+
-+ fd = open(env->buspath, O_RDWR|O_CLOEXEC);
-+ ASSERT_RETURN(fd >= 0);
-+
-+ hello.flags = KDBUS_HELLO_ACCEPT_FD;
-+ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+ hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
-+ hello.size = sizeof(struct kdbus_cmd_hello);
-+ hello.pool_size = POOL_SIZE;
-+
-+ /* an unaligned hello must result in -EFAULT */
-+ ret = kdbus_cmd_hello(fd, (struct kdbus_cmd_hello *) ((char *) &hello + 1));
-+ ASSERT_RETURN(ret == -EFAULT);
-+
-+ /* a size smaller than the command header (here 1) must return -EINVAL */
-+ hello.size = 1;
-+ hello.flags = KDBUS_HELLO_ACCEPT_FD;
-+ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+ ret = kdbus_cmd_hello(fd, &hello);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ hello.size = sizeof(struct kdbus_cmd_hello);
-+
-+ /* check faulty flags */
-+ hello.flags = 1ULL << 32;
-+ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+ ret = kdbus_cmd_hello(fd, &hello);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ /* check for faulty pool sizes */
-+ hello.pool_size = 0;
-+ hello.flags = KDBUS_HELLO_ACCEPT_FD;
-+ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+ ret = kdbus_cmd_hello(fd, &hello);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ hello.pool_size = 4097;
-+ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+ ret = kdbus_cmd_hello(fd, &hello);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ hello.pool_size = POOL_SIZE;
-+
-+ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+ hello.offset = (__u64)-1;
-+
-+ /* success test */
-+ ret = kdbus_cmd_hello(fd, &hello);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* The kernel should have returned some items */
-+ ASSERT_RETURN(hello.offset != (__u64)-1);
-+ cmd_free.size = sizeof(cmd_free);
-+ cmd_free.offset = hello.offset;
-+ ret = kdbus_cmd_free(fd, &cmd_free);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ close(fd);
-+
-+ fd = open(env->buspath, O_RDWR|O_CLOEXEC);
-+ ASSERT_RETURN(fd >= 0);
-+
-+ /* no ACTIVATOR flag without a name */
-+ hello.flags = KDBUS_HELLO_ACTIVATOR;
-+ ret = kdbus_cmd_hello(fd, &hello);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ close(fd);
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_byebye(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *conn;
-+ struct kdbus_cmd_recv cmd_recv = { .size = sizeof(cmd_recv) };
-+ struct kdbus_cmd cmd_byebye = { .size = sizeof(cmd_byebye) };
-+ int ret;
-+
-+ /* create a 2nd connection */
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn != NULL);
-+
-+ ret = kdbus_add_match_empty(conn);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_add_match_empty(env->conn);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* send over 1st connection */
-+ ret = kdbus_msg_send(env->conn, NULL, 0, 0, 0, 0,
-+ KDBUS_DST_ID_BROADCAST);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* say byebye on the 2nd, which must fail */
-+ ret = kdbus_cmd_byebye(conn->fd, &cmd_byebye);
-+ ASSERT_RETURN(ret == -EBUSY);
-+
-+ /* receive the message */
-+ ret = kdbus_cmd_recv(conn->fd, &cmd_recv);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_free(conn, cmd_recv.msg.offset);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* and try again */
-+ ret = kdbus_cmd_byebye(conn->fd, &cmd_byebye);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* a 2nd try should result in -ECONNRESET */
-+ ret = kdbus_cmd_byebye(conn->fd, &cmd_byebye);
-+ ASSERT_RETURN(ret == -ECONNRESET);
-+
-+ kdbus_conn_free(conn);
-+
-+ return TEST_OK;
-+}
-+
-+/* Get only the first item */
-+static struct kdbus_item *kdbus_get_item(struct kdbus_info *info,
-+ uint64_t type)
-+{
-+ struct kdbus_item *item;
-+
-+ KDBUS_ITEM_FOREACH(item, info, items)
-+ if (item->type == type)
-+ return item;
-+
-+ return NULL;
-+}
-+
-+static unsigned int kdbus_count_item(struct kdbus_info *info,
-+ uint64_t type)
-+{
-+ unsigned int i = 0;
-+ const struct kdbus_item *item;
-+
-+ KDBUS_ITEM_FOREACH(item, info, items)
-+ if (item->type == type)
-+ i++;
-+
-+ return i;
-+}
-+
-+static int kdbus_fuzz_conn_info(struct kdbus_test_env *env, int capable)
-+{
-+ int ret;
-+ unsigned int cnt = 0;
-+ uint64_t offset = 0;
-+ struct kdbus_info *info;
-+ struct kdbus_conn *conn;
-+ struct kdbus_conn *privileged;
-+ const struct kdbus_item *item;
-+ uint64_t valid_flags = KDBUS_ATTACH_NAMES |
-+ KDBUS_ATTACH_CREDS |
-+ KDBUS_ATTACH_PIDS |
-+ KDBUS_ATTACH_CONN_DESCRIPTION;
-+
-+ uint64_t invalid_flags = KDBUS_ATTACH_NAMES |
-+ KDBUS_ATTACH_CREDS |
-+ KDBUS_ATTACH_PIDS |
-+ KDBUS_ATTACH_CAPS |
-+ KDBUS_ATTACH_CGROUP |
-+ KDBUS_ATTACH_CONN_DESCRIPTION;
-+
-+ struct kdbus_creds cached_creds;
-+ uid_t ruid, euid, suid;
-+ gid_t rgid, egid, sgid;
-+
-+ getresuid(&ruid, &euid, &suid);
-+ getresgid(&rgid, &egid, &sgid);
-+
-+ cached_creds.uid = ruid;
-+ cached_creds.euid = euid;
-+ cached_creds.suid = suid;
-+ cached_creds.fsuid = ruid;
-+
-+ cached_creds.gid = rgid;
-+ cached_creds.egid = egid;
-+ cached_creds.sgid = sgid;
-+ cached_creds.fsgid = rgid;
-+
-+ struct kdbus_pids cached_pids = {
-+ .pid = getpid(),
-+ .tid = syscall(SYS_gettid),
-+ .ppid = getppid(),
-+ };
-+
-+ ret = kdbus_conn_info(env->conn, env->conn->id, NULL,
-+ valid_flags, &offset);
-+ ASSERT_RETURN(ret == 0);
-+
-+ info = (struct kdbus_info *)(env->conn->buf + offset);
-+ ASSERT_RETURN(info->id == env->conn->id);
-+
-+ /* We do not have any well-known name */
-+ item = kdbus_get_item(info, KDBUS_ITEM_NAME);
-+ ASSERT_RETURN(item == NULL);
-+
-+ item = kdbus_get_item(info, KDBUS_ITEM_CONN_DESCRIPTION);
-+ if (valid_flags & KDBUS_ATTACH_CONN_DESCRIPTION) {
-+ ASSERT_RETURN(item);
-+ } else {
-+ ASSERT_RETURN(item == NULL);
-+ }
-+
-+ kdbus_free(env->conn, offset);
-+
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn);
-+
-+ privileged = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(privileged);
-+
-+ ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
-+ ASSERT_RETURN(ret == 0);
-+
-+ info = (struct kdbus_info *)(conn->buf + offset);
-+ ASSERT_RETURN(info->id == conn->id);
-+
-+ /* We do not have any well-known name */
-+ item = kdbus_get_item(info, KDBUS_ITEM_NAME);
-+ ASSERT_RETURN(item == NULL);
-+
-+ cnt = kdbus_count_item(info, KDBUS_ITEM_CREDS);
-+ if (valid_flags & KDBUS_ATTACH_CREDS) {
-+ ASSERT_RETURN(cnt == 1);
-+
-+ item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
-+ ASSERT_RETURN(item);
-+
-+ /* Compare received items with cached creds */
-+ ASSERT_RETURN(memcmp(&item->creds, &cached_creds,
-+ sizeof(struct kdbus_creds)) == 0);
-+ } else {
-+ ASSERT_RETURN(cnt == 0);
-+ }
-+
-+ item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
-+ if (valid_flags & KDBUS_ATTACH_PIDS) {
-+ ASSERT_RETURN(item);
-+
-+ /* Compare item->pids with cached PIDs */
-+ ASSERT_RETURN(item->pids.pid == cached_pids.pid &&
-+ item->pids.tid == cached_pids.tid &&
-+ item->pids.ppid == cached_pids.ppid);
-+ } else {
-+ ASSERT_RETURN(item == NULL);
-+ }
-+
-+ /* We did not request KDBUS_ITEM_CAPS */
-+ item = kdbus_get_item(info, KDBUS_ITEM_CAPS);
-+ ASSERT_RETURN(item == NULL);
-+
-+ kdbus_free(conn, offset);
-+
-+ ret = kdbus_name_acquire(conn, "com.example.a", NULL);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
-+ ASSERT_RETURN(ret == 0);
-+
-+ info = (struct kdbus_info *)(conn->buf + offset);
-+ ASSERT_RETURN(info->id == conn->id);
-+
-+ item = kdbus_get_item(info, KDBUS_ITEM_OWNED_NAME);
-+ if (valid_flags & KDBUS_ATTACH_NAMES) {
-+ ASSERT_RETURN(item && !strcmp(item->name.name, "com.example.a"));
-+ } else {
-+ ASSERT_RETURN(item == NULL);
-+ }
-+
-+ kdbus_free(conn, offset);
-+
-+ ret = kdbus_conn_info(conn, 0, "com.example.a", valid_flags, &offset);
-+ ASSERT_RETURN(ret == 0);
-+
-+ info = (struct kdbus_info *)(conn->buf + offset);
-+ ASSERT_RETURN(info->id == conn->id);
-+
-+ kdbus_free(conn, offset);
-+
-+ /* does not have the necessary caps to drop to unprivileged */
-+ if (!capable)
-+ goto continue_test;
-+
-+ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
-+ ret = kdbus_conn_info(conn, conn->id, NULL,
-+ valid_flags, &offset);
-+ ASSERT_EXIT(ret == 0);
-+
-+ info = (struct kdbus_info *)(conn->buf + offset);
-+ ASSERT_EXIT(info->id == conn->id);
-+
-+ if (valid_flags & KDBUS_ATTACH_NAMES) {
-+ item = kdbus_get_item(info, KDBUS_ITEM_OWNED_NAME);
-+ ASSERT_EXIT(item &&
-+ strcmp(item->name.name,
-+ "com.example.a") == 0);
-+ }
-+
-+ if (valid_flags & KDBUS_ATTACH_CREDS) {
-+ item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
-+ ASSERT_EXIT(item);
-+
-+ /* Compare received items with cached creds */
-+ ASSERT_EXIT(memcmp(&item->creds, &cached_creds,
-+ sizeof(struct kdbus_creds)) == 0);
-+ }
-+
-+ if (valid_flags & KDBUS_ATTACH_PIDS) {
-+ item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
-+ ASSERT_EXIT(item);
-+
-+ /*
-+ * Compare item->pids with cached pids of
-+ * privileged one.
-+ *
-+ * cmd_info will always return cached pids.
-+ */
-+ ASSERT_EXIT(item->pids.pid == cached_pids.pid &&
-+ item->pids.tid == cached_pids.tid);
-+ }
-+
-+ kdbus_free(conn, offset);
-+
-+ /*
-+ * Use invalid_flags and make sure that userspace
-+ * does not play with us.
-+ */
-+ ret = kdbus_conn_info(conn, conn->id, NULL,
-+ invalid_flags, &offset);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /*
-+ * Make sure that we return only one creds item and
-+ * it points to the cached creds.
-+ */
-+ cnt = kdbus_count_item(info, KDBUS_ITEM_CREDS);
-+ if (invalid_flags & KDBUS_ATTACH_CREDS) {
-+ ASSERT_EXIT(cnt == 1);
-+
-+ item = kdbus_get_item(info, KDBUS_ITEM_CREDS);
-+ ASSERT_EXIT(item);
-+
-+ /* Compare received items with cached creds */
-+ ASSERT_EXIT(memcmp(&item->creds, &cached_creds,
-+ sizeof(struct kdbus_creds)) == 0);
-+ } else {
-+ ASSERT_EXIT(cnt == 0);
-+ }
-+
-+ if (invalid_flags & KDBUS_ATTACH_PIDS) {
-+ cnt = kdbus_count_item(info, KDBUS_ITEM_PIDS);
-+ ASSERT_EXIT(cnt == 1);
-+
-+ item = kdbus_get_item(info, KDBUS_ITEM_PIDS);
-+ ASSERT_EXIT(item);
-+
-+ /* Compare item->pids with cached pids */
-+ ASSERT_EXIT(item->pids.pid == cached_pids.pid &&
-+ item->pids.tid == cached_pids.tid);
-+ }
-+
-+ cnt = kdbus_count_item(info, KDBUS_ITEM_CGROUP);
-+ if (invalid_flags & KDBUS_ATTACH_CGROUP) {
-+ ASSERT_EXIT(cnt == 1);
-+ } else {
-+ ASSERT_EXIT(cnt == 0);
-+ }
-+
-+ cnt = kdbus_count_item(info, KDBUS_ITEM_CAPS);
-+ if (invalid_flags & KDBUS_ATTACH_CAPS) {
-+ ASSERT_EXIT(cnt == 1);
-+ } else {
-+ ASSERT_EXIT(cnt == 0);
-+ }
-+
-+ kdbus_free(conn, offset);
-+ }),
-+ ({ 0; }));
-+ ASSERT_RETURN(ret == 0);
-+
-+continue_test:
-+
-+ /* A second name */
-+ ret = kdbus_name_acquire(conn, "com.example.b", NULL);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = kdbus_conn_info(conn, conn->id, NULL, valid_flags, &offset);
-+ ASSERT_RETURN(ret == 0);
-+
-+ info = (struct kdbus_info *)(conn->buf + offset);
-+ ASSERT_RETURN(info->id == conn->id);
-+
-+ cnt = kdbus_count_item(info, KDBUS_ITEM_OWNED_NAME);
-+ if (valid_flags & KDBUS_ATTACH_NAMES) {
-+ ASSERT_RETURN(cnt == 2);
-+ } else {
-+ ASSERT_RETURN(cnt == 0);
-+ }
-+
-+ kdbus_free(conn, offset);
-+
-+ ASSERT_RETURN(ret == 0);
-+
-+ return 0;
-+}
-+
-+int kdbus_test_conn_info(struct kdbus_test_env *env)
-+{
-+ int ret;
-+ int have_caps;
-+ struct {
-+ struct kdbus_cmd_info cmd_info;
-+
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ char str[64];
-+ } name;
-+ } buf;
-+
-+ buf.cmd_info.size = sizeof(struct kdbus_cmd_info);
-+ buf.cmd_info.flags = 0;
-+ buf.cmd_info.attach_flags = 0;
-+ buf.cmd_info.id = env->conn->id;
-+
-+ ret = kdbus_conn_info(env->conn, env->conn->id, NULL, 0, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* try to pass a name that is longer than the buffer's size */
-+ buf.name.size = KDBUS_ITEM_HEADER_SIZE + 1;
-+ buf.name.type = KDBUS_ITEM_NAME;
-+ strcpy(buf.name.str, "foo.bar.bla");
-+
-+ buf.cmd_info.id = 0;
-+ buf.cmd_info.size = sizeof(buf.cmd_info) + buf.name.size;
-+ ret = kdbus_cmd_conn_info(env->conn->fd, (struct kdbus_cmd_info *) &buf);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ /* Pass a non-existent name */
-+ ret = kdbus_conn_info(env->conn, 0, "non.existent.name", 0, NULL);
-+ ASSERT_RETURN(ret == -ESRCH);
-+
-+ if (!all_uids_gids_are_mapped())
-+ return TEST_SKIP;
-+
-+ /* Test for caps here, so we run the previous test */
-+ have_caps = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
-+ ASSERT_RETURN(have_caps >= 0);
-+
-+ ret = kdbus_fuzz_conn_info(env, have_caps);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Now if we have skipped some tests then let the user know */
-+ if (!have_caps)
-+ return TEST_SKIP;
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_conn_update(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *conn;
-+ struct kdbus_msg *msg;
-+ int found = 0;
-+ int ret;
-+
-+ /*
-+ * kdbus_hello() sets all attach flags. Receive a message by this
-+ * connection, and make sure a timestamp item (just to pick one) is
-+ * present.
-+ */
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn);
-+
-+ ret = kdbus_msg_send(env->conn, NULL, 0x12345678, 0, 0, 0, conn->id);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(conn, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ found = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
-+ ASSERT_RETURN(found == 1);
-+
-+ kdbus_msg_free(msg);
-+
-+ /*
-+ * Now, modify the attach flags and repeat the action. The item must
-+ * now be missing.
-+ */
-+ found = 0;
-+
-+ ret = kdbus_conn_update_attach_flags(conn,
-+ _KDBUS_ATTACH_ALL,
-+ _KDBUS_ATTACH_ALL &
-+ ~KDBUS_ATTACH_TIMESTAMP);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_send(env->conn, NULL, 0x12345678, 0, 0, 0, conn->id);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(conn, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ found = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
-+ ASSERT_RETURN(found == 0);
-+
-+ /* Provide a bogus attach_flags value */
-+ ret = kdbus_conn_update_attach_flags(conn,
-+ _KDBUS_ATTACH_ALL + 1,
-+ _KDBUS_ATTACH_ALL);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ kdbus_msg_free(msg);
-+
-+ kdbus_conn_free(conn);
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_writable_pool(struct kdbus_test_env *env)
-+{
-+ struct kdbus_cmd_free cmd_free = {};
-+ struct kdbus_cmd_hello hello;
-+ int fd, ret;
-+ void *map;
-+
-+ fd = open(env->buspath, O_RDWR | O_CLOEXEC);
-+ ASSERT_RETURN(fd >= 0);
-+
-+ memset(&hello, 0, sizeof(hello));
-+ hello.flags = KDBUS_HELLO_ACCEPT_FD;
-+ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+ hello.attach_flags_recv = _KDBUS_ATTACH_ALL;
-+ hello.size = sizeof(struct kdbus_cmd_hello);
-+ hello.pool_size = POOL_SIZE;
-+ hello.offset = (__u64)-1;
-+
-+ /* success test */
-+ ret = kdbus_cmd_hello(fd, &hello);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* The kernel should have returned some items */
-+ ASSERT_RETURN(hello.offset != (__u64)-1);
-+ cmd_free.size = sizeof(cmd_free);
-+ cmd_free.offset = hello.offset;
-+ ret = kdbus_cmd_free(fd, &cmd_free);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /* pools cannot be mapped writable */
-+ map = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-+ ASSERT_RETURN(map == MAP_FAILED);
-+
-+ /* pools can always be mapped readable */
-+ map = mmap(NULL, POOL_SIZE, PROT_READ, MAP_SHARED, fd, 0);
-+ ASSERT_RETURN(map != MAP_FAILED);
-+
-+ /* make sure we cannot change protection masks to writable */
-+ ret = mprotect(map, POOL_SIZE, PROT_READ | PROT_WRITE);
-+ ASSERT_RETURN(ret < 0);
-+
-+ munmap(map, POOL_SIZE);
-+ close(fd);
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-daemon.c b/tools/testing/selftests/kdbus/test-daemon.c
-new file mode 100644
-index 0000000..8bc2386
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-daemon.c
-@@ -0,0 +1,65 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <poll.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+int kdbus_test_daemon(struct kdbus_test_env *env)
-+{
-+ struct pollfd fds[2];
-+ int count;
-+ int ret;
-+
-+ /* This test doesn't make any sense in non-interactive mode */
-+ if (!kdbus_util_verbose)
-+ return TEST_OK;
-+
-+ printf("Created connection %llu on bus '%s'\n",
-+ (unsigned long long) env->conn->id, env->buspath);
-+
-+ ret = kdbus_name_acquire(env->conn, "com.example.kdbus-test", NULL);
-+ ASSERT_RETURN(ret == 0);
-+ printf(" Acquired name: com.example.kdbus-test\n");
-+
-+ fds[0].fd = env->conn->fd;
-+ fds[1].fd = STDIN_FILENO;
-+
-+ printf("Monitoring connections:\n");
-+
-+ for (count = 0;; count++) {
-+ int i, nfds = sizeof(fds) / sizeof(fds[0]);
-+
-+ for (i = 0; i < nfds; i++) {
-+ fds[i].events = POLLIN | POLLPRI | POLLHUP;
-+ fds[i].revents = 0;
-+ }
-+
-+ ret = poll(fds, nfds, -1);
-+ if (ret <= 0)
-+ break;
-+
-+ if (fds[0].revents & POLLIN) {
-+ ret = kdbus_msg_recv(env->conn, NULL, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ }
-+
-+ /* stdin */
-+ if (fds[1].revents & POLLIN)
-+ break;
-+ }
-+
-+ printf("Closing bus connection\n");
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-endpoint.c b/tools/testing/selftests/kdbus/test-endpoint.c
-new file mode 100644
-index 0000000..34a7be4
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-endpoint.c
-@@ -0,0 +1,352 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <libgen.h>
-+#include <sys/capability.h>
-+#include <sys/wait.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+#define KDBUS_SYSNAME_MAX_LEN 63
-+
-+static int install_name_add_match(struct kdbus_conn *conn, const char *name)
-+{
-+ struct {
-+ struct kdbus_cmd_match cmd;
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ struct kdbus_notify_name_change chg;
-+ } item;
-+ char name[64];
-+ } buf;
-+ int ret;
-+
-+ /* install the match rule */
-+ memset(&buf, 0, sizeof(buf));
-+ buf.item.type = KDBUS_ITEM_NAME_ADD;
-+ buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
-+ buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
-+ strncpy(buf.name, name, sizeof(buf.name) - 1);
-+ buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
-+ buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
-+
-+ ret = kdbus_cmd_match_add(conn->fd, &buf.cmd);
-+ if (ret < 0)
-+ return ret;
-+
-+ return 0;
-+}
-+
-+static int create_endpoint(const char *buspath, uid_t uid, const char *name,
-+ uint64_t flags)
-+{
-+ struct {
-+ struct kdbus_cmd cmd;
-+
-+ /* name item */
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ /* max should be KDBUS_SYSNAME_MAX_LEN */
-+ char str[128];
-+ } name;
-+ } ep_make;
-+ int fd, ret;
-+
-+ fd = open(buspath, O_RDWR);
-+ if (fd < 0)
-+ return fd;
-+
-+ memset(&ep_make, 0, sizeof(ep_make));
-+
-+ snprintf(ep_make.name.str,
-+ /* Use the KDBUS_SYSNAME_MAX_LEN or sizeof(str) */
-+ KDBUS_SYSNAME_MAX_LEN > strlen(name) ?
-+ KDBUS_SYSNAME_MAX_LEN : sizeof(ep_make.name.str),
-+ "%u-%s", uid, name);
-+
-+ ep_make.name.type = KDBUS_ITEM_MAKE_NAME;
-+ ep_make.name.size = KDBUS_ITEM_HEADER_SIZE +
-+ strlen(ep_make.name.str) + 1;
-+
-+ ep_make.cmd.flags = flags;
-+ ep_make.cmd.size = sizeof(ep_make.cmd) + ep_make.name.size;
-+
-+ ret = kdbus_cmd_endpoint_make(fd, &ep_make.cmd);
-+ if (ret < 0) {
-+ kdbus_printf("error creating endpoint: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ return fd;
-+}
-+
-+static int unpriv_test_custom_ep(const char *buspath)
-+{
-+ int ret, ep_fd1, ep_fd2;
-+ char *ep1, *ep2, *tmp1, *tmp2;
-+
-+ tmp1 = strdup(buspath);
-+ tmp2 = strdup(buspath);
-+ ASSERT_RETURN(tmp1 && tmp2);
-+
-+ ret = asprintf(&ep1, "%s/%u-%s", dirname(tmp1), getuid(), "apps1");
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = asprintf(&ep2, "%s/%u-%s", dirname(tmp2), getuid(), "apps2");
-+ ASSERT_RETURN(ret >= 0);
-+
-+ free(tmp1);
-+ free(tmp2);
-+
-+ /* endpoint only accessible to current uid */
-+ ep_fd1 = create_endpoint(buspath, getuid(), "apps1", 0);
-+ ASSERT_RETURN(ep_fd1 >= 0);
-+
-+ /* endpoint world accessible */
-+ ep_fd2 = create_endpoint(buspath, getuid(), "apps2",
-+ KDBUS_MAKE_ACCESS_WORLD);
-+ ASSERT_RETURN(ep_fd2 >= 0);
-+
-+ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_UID, ({
-+ int ep_fd;
-+ struct kdbus_conn *ep_conn;
-+
-+ /*
-+ * Make sure that we are not able to create custom
-+ * endpoints
-+ */
-+ ep_fd = create_endpoint(buspath, getuid(),
-+ "unpriv_custom_ep", 0);
-+ ASSERT_EXIT(ep_fd == -EPERM);
-+
-+ /*
-+ * Endpoint "apps1" is only accessible to the user
-+ * that owns the endpoint. Access is denied by the VFS.
-+ */
-+ ep_conn = kdbus_hello(ep1, 0, NULL, 0);
-+ ASSERT_EXIT(!ep_conn && errno == EACCES);
-+
-+ /* Endpoint "apps2" world accessible */
-+ ep_conn = kdbus_hello(ep2, 0, NULL, 0);
-+ ASSERT_EXIT(ep_conn);
-+
-+ kdbus_conn_free(ep_conn);
-+
-+ _exit(EXIT_SUCCESS);
-+ }),
-+ ({ 0; }));
-+ ASSERT_RETURN(ret == 0);
-+
-+ close(ep_fd1);
-+ close(ep_fd2);
-+ free(ep1);
-+ free(ep2);
-+
-+ return 0;
-+}
-+
-+static int update_endpoint(int fd, const char *name)
-+{
-+ int len = strlen(name) + 1;
-+ struct {
-+ struct kdbus_cmd cmd;
-+
-+ /* name item */
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ char str[KDBUS_ALIGN8(len)];
-+ } name;
-+
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ struct kdbus_policy_access access;
-+ } access;
-+ } ep_update;
-+ int ret;
-+
-+ memset(&ep_update, 0, sizeof(ep_update));
-+
-+ ep_update.name.size = KDBUS_ITEM_HEADER_SIZE + len;
-+ ep_update.name.type = KDBUS_ITEM_NAME;
-+ strncpy(ep_update.name.str, name, sizeof(ep_update.name.str) - 1);
-+
-+ ep_update.access.size = sizeof(ep_update.access);
-+ ep_update.access.type = KDBUS_ITEM_POLICY_ACCESS;
-+ ep_update.access.access.type = KDBUS_POLICY_ACCESS_WORLD;
-+ ep_update.access.access.access = KDBUS_POLICY_SEE;
-+
-+ ep_update.cmd.size = sizeof(ep_update);
-+
-+ ret = kdbus_cmd_endpoint_update(fd, &ep_update.cmd);
-+ if (ret < 0) {
-+ kdbus_printf("error updating endpoint: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ return 0;
-+}
-+
-+int kdbus_test_custom_endpoint(struct kdbus_test_env *env)
-+{
-+ char *ep, *tmp;
-+ int ret, ep_fd;
-+ struct kdbus_msg *msg;
-+ struct kdbus_conn *ep_conn;
-+ struct kdbus_conn *reader;
-+ const char *name = "foo.bar.baz";
-+ const char *epname = "foo";
-+ char fake_ep[KDBUS_SYSNAME_MAX_LEN + 1] = {'\0'};
-+
-+ memset(fake_ep, 'X', sizeof(fake_ep) - 1);
-+
-+ /* Try to create a custom endpoint with a long name */
-+ ret = create_endpoint(env->buspath, getuid(), fake_ep, 0);
-+ ASSERT_RETURN(ret == -ENAMETOOLONG);
-+
-+ /* Try to create a custom endpoint with a different uid */
-+ ret = create_endpoint(env->buspath, getuid() + 1, "foobar", 0);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ /* create a custom endpoint, and open a connection on it */
-+ ep_fd = create_endpoint(env->buspath, getuid(), "foo", 0);
-+ ASSERT_RETURN(ep_fd >= 0);
-+
-+ tmp = strdup(env->buspath);
-+ ASSERT_RETURN(tmp);
-+
-+ ret = asprintf(&ep, "%s/%u-%s", dirname(tmp), getuid(), epname);
-+ free(tmp);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /* Register a connection that listens to broadcasts */
-+ reader = kdbus_hello(ep, 0, NULL, 0);
-+ ASSERT_RETURN(reader);
-+
-+ /* Register for kernel signals */
-+ ret = kdbus_add_match_id(reader, 0x1, KDBUS_ITEM_ID_ADD,
-+ KDBUS_MATCH_ID_ANY);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_add_match_id(reader, 0x2, KDBUS_ITEM_ID_REMOVE,
-+ KDBUS_MATCH_ID_ANY);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = install_name_add_match(reader, name);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Monitor connections are not supported on custom endpoints */
-+ ep_conn = kdbus_hello(ep, KDBUS_HELLO_MONITOR, NULL, 0);
-+ ASSERT_RETURN(!ep_conn && errno == EOPNOTSUPP);
-+
-+ ep_conn = kdbus_hello(ep, 0, NULL, 0);
-+ ASSERT_RETURN(ep_conn);
-+
-+ /* Check that the reader got the IdAdd notification */
-+ ret = kdbus_msg_recv(reader, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_ADD);
-+ ASSERT_RETURN(msg->items[0].id_change.id == ep_conn->id);
-+ kdbus_msg_free(msg);
-+
-+ /*
-+ * Add a name add match on the endpoint connection, acquire name from
-+ * the unfiltered connection, and make sure the filtered connection
-+ * did not get the notification on the name owner change. Also, the
-+ * endpoint connection must not be able to call conn_info, either on
-+ * the name or on the ID.
-+ */
-+ ret = install_name_add_match(ep_conn, name);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_name_acquire(env->conn, name, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(ep_conn, NULL, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ ret = kdbus_conn_info(ep_conn, 0, name, 0, NULL);
-+ ASSERT_RETURN(ret == -ESRCH);
-+
-+ ret = kdbus_conn_info(ep_conn, 0, "random.crappy.name", 0, NULL);
-+ ASSERT_RETURN(ret == -ESRCH);
-+
-+ ret = kdbus_conn_info(ep_conn, env->conn->id, NULL, 0, NULL);
-+ ASSERT_RETURN(ret == -ENXIO);
-+
-+ ret = kdbus_conn_info(ep_conn, 0x0fffffffffffffffULL, NULL, 0, NULL);
-+ ASSERT_RETURN(ret == -ENXIO);
-+
-+ /* Check that the reader did not receive the name notification */
-+ ret = kdbus_msg_recv(reader, NULL, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ /*
-+ * Release the name again, update the custom endpoint policy,
-+ * and try again. This time, the connection on the custom endpoint
-+ * should have gotten it.
-+ */
-+ ret = kdbus_name_release(env->conn, name);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Check that the reader did not receive the name notification */
-+ ret = kdbus_msg_recv(reader, NULL, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ ret = update_endpoint(ep_fd, name);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_name_acquire(env->conn, name, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(ep_conn, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_ADD);
-+ ASSERT_RETURN(msg->items[0].name_change.old_id.id == 0);
-+ ASSERT_RETURN(msg->items[0].name_change.new_id.id == env->conn->id);
-+ ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
-+ kdbus_msg_free(msg);
-+
-+ ret = kdbus_msg_recv(reader, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
-+
-+ kdbus_msg_free(msg);
-+
-+ ret = kdbus_conn_info(ep_conn, 0, name, 0, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_conn_info(ep_conn, env->conn->id, NULL, 0, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* If we have privileges, test custom endpoints */
-+ ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /*
-+ * All uids/gids are mapped and we have the necessary caps
-+ */
-+ if (ret && all_uids_gids_are_mapped()) {
-+ ret = unpriv_test_custom_ep(env->buspath);
-+ ASSERT_RETURN(ret == 0);
-+ }
-+
-+ kdbus_conn_free(reader);
-+ kdbus_conn_free(ep_conn);
-+ close(ep_fd);
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-fd.c b/tools/testing/selftests/kdbus/test-fd.c
-new file mode 100644
-index 0000000..2ae0f5a
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-fd.c
-@@ -0,0 +1,789 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stdbool.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <sys/types.h>
-+#include <sys/mman.h>
-+#include <sys/socket.h>
-+#include <sys/wait.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+#define KDBUS_MSG_MAX_ITEMS 128
-+#define KDBUS_USER_MAX_CONN 256
-+
-+/* maximum number of inflight fds in a target queue per user */
-+#define KDBUS_CONN_MAX_FDS_PER_USER 16
-+
-+/* maximum number of memfd items per message */
-+#define KDBUS_MSG_MAX_MEMFD_ITEMS 16
-+
-+static int make_msg_payload_dbus(uint64_t src_id, uint64_t dst_id,
-+ uint64_t msg_size,
-+ struct kdbus_msg **msg_dbus)
-+{
-+ struct kdbus_msg *msg;
-+
-+ msg = malloc(msg_size);
-+ ASSERT_RETURN_VAL(msg, -ENOMEM);
-+
-+ memset(msg, 0, msg_size);
-+ msg->size = msg_size;
-+ msg->src_id = src_id;
-+ msg->dst_id = dst_id;
-+ msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+ *msg_dbus = msg;
-+
-+ return 0;
-+}
-+
-+static void make_item_memfds(struct kdbus_item *item,
-+ int *memfds, size_t memfd_size)
-+{
-+ size_t i;
-+
-+ for (i = 0; i < memfd_size; i++) {
-+ item->type = KDBUS_ITEM_PAYLOAD_MEMFD;
-+ item->size = KDBUS_ITEM_HEADER_SIZE +
-+ sizeof(struct kdbus_memfd);
-+ item->memfd.fd = memfds[i];
-+ item->memfd.size = sizeof(uint64_t); /* const size */
-+ item = KDBUS_ITEM_NEXT(item);
-+ }
-+}
-+
-+static void make_item_fds(struct kdbus_item *item,
-+ int *fd_array, size_t fd_size)
-+{
-+ size_t i;
-+ item->type = KDBUS_ITEM_FDS;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + (sizeof(int) * fd_size);
-+
-+ for (i = 0; i < fd_size; i++)
-+ item->fds[i] = fd_array[i];
-+}
-+
-+static int memfd_write(const char *name, void *buf, size_t bufsize)
-+{
-+ ssize_t ret;
-+ int memfd;
-+
-+ memfd = sys_memfd_create(name, 0);
-+ ASSERT_RETURN_VAL(memfd >= 0, memfd);
-+
-+ ret = write(memfd, buf, bufsize);
-+ ASSERT_RETURN_VAL(ret == (ssize_t)bufsize, -EAGAIN);
-+
-+ ret = sys_memfd_seal_set(memfd);
-+ ASSERT_RETURN_VAL(ret == 0, -errno);
-+
-+ return memfd;
-+}
-+
-+static int send_memfds(struct kdbus_conn *conn, uint64_t dst_id,
-+ int *memfds_array, size_t memfd_count)
-+{
-+ struct kdbus_cmd_send cmd = {};
-+ struct kdbus_item *item;
-+ struct kdbus_msg *msg;
-+ uint64_t size;
-+ int ret;
-+
-+ size = sizeof(struct kdbus_msg);
-+ size += memfd_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
-+
-+ if (dst_id == KDBUS_DST_ID_BROADCAST)
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
-+
-+ ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+ item = msg->items;
-+
-+ if (dst_id == KDBUS_DST_ID_BROADCAST) {
-+ item->type = KDBUS_ITEM_BLOOM_FILTER;
-+ item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ msg->flags |= KDBUS_MSG_SIGNAL;
-+ }
-+
-+ make_item_memfds(item, memfds_array, memfd_count);
-+
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)msg;
-+
-+ ret = kdbus_cmd_send(conn->fd, &cmd);
-+ if (ret < 0) {
-+ kdbus_printf("error sending message: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ free(msg);
-+ return 0;
-+}
-+
-+static int send_fds(struct kdbus_conn *conn, uint64_t dst_id,
-+ int *fd_array, size_t fd_count)
-+{
-+ struct kdbus_cmd_send cmd = {};
-+ struct kdbus_item *item;
-+ struct kdbus_msg *msg;
-+ uint64_t size;
-+ int ret;
-+
-+ size = sizeof(struct kdbus_msg);
-+ size += KDBUS_ITEM_SIZE(sizeof(int) * fd_count);
-+
-+ if (dst_id == KDBUS_DST_ID_BROADCAST)
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
-+
-+ ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+ item = msg->items;
-+
-+ if (dst_id == KDBUS_DST_ID_BROADCAST) {
-+ item->type = KDBUS_ITEM_BLOOM_FILTER;
-+ item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + 64;
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ msg->flags |= KDBUS_MSG_SIGNAL;
-+ }
-+
-+ make_item_fds(item, fd_array, fd_count);
-+
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)msg;
-+
-+ ret = kdbus_cmd_send(conn->fd, &cmd);
-+ if (ret < 0) {
-+ kdbus_printf("error sending message: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ free(msg);
-+ return ret;
-+}
-+
-+static int send_fds_memfds(struct kdbus_conn *conn, uint64_t dst_id,
-+ int *fds_array, size_t fd_count,
-+ int *memfds_array, size_t memfd_count)
-+{
-+ struct kdbus_cmd_send cmd = {};
-+ struct kdbus_item *item;
-+ struct kdbus_msg *msg;
-+ uint64_t size;
-+ int ret;
-+
-+ size = sizeof(struct kdbus_msg);
-+ size += memfd_count * KDBUS_ITEM_SIZE(sizeof(struct kdbus_memfd));
-+ size += KDBUS_ITEM_SIZE(sizeof(int) * fd_count);
-+
-+ ret = make_msg_payload_dbus(conn->id, dst_id, size, &msg);
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+ item = msg->items;
-+
-+ make_item_fds(item, fds_array, fd_count);
-+ item = KDBUS_ITEM_NEXT(item);
-+ make_item_memfds(item, memfds_array, memfd_count);
-+
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)msg;
-+
-+ ret = kdbus_cmd_send(conn->fd, &cmd);
-+ if (ret < 0) {
-+ kdbus_printf("error sending message: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ free(msg);
-+ return ret;
-+}
-+
-+/* Return the number of received fds */
-+static unsigned int kdbus_item_get_nfds(struct kdbus_msg *msg)
-+{
-+ unsigned int fds = 0;
-+ const struct kdbus_item *item;
-+
-+ KDBUS_ITEM_FOREACH(item, msg, items) {
-+ switch (item->type) {
-+ case KDBUS_ITEM_FDS: {
-+ fds += (item->size - KDBUS_ITEM_HEADER_SIZE) /
-+ sizeof(int);
-+ break;
-+ }
-+
-+ case KDBUS_ITEM_PAYLOAD_MEMFD:
-+ fds++;
-+ break;
-+
-+ default:
-+ break;
-+ }
-+ }
-+
-+ return fds;
-+}
-+
-+static struct kdbus_msg *
-+get_kdbus_msg_with_fd(struct kdbus_conn *conn_src,
-+ uint64_t dst_id, uint64_t cookie, int fd)
-+{
-+ int ret;
-+ uint64_t size;
-+ struct kdbus_item *item;
-+ struct kdbus_msg *msg;
-+
-+ size = sizeof(struct kdbus_msg);
-+ if (fd >= 0)
-+ size += KDBUS_ITEM_SIZE(sizeof(int));
-+
-+ ret = make_msg_payload_dbus(conn_src->id, dst_id, size, &msg);
-+ ASSERT_RETURN_VAL(ret == 0, NULL);
-+
-+ msg->cookie = cookie;
-+
-+ if (fd >= 0) {
-+ item = msg->items;
-+
-+ make_item_fds(item, (int *)&fd, 1);
-+ }
-+
-+ return msg;
-+}
-+
-+static int kdbus_test_no_fds(struct kdbus_test_env *env,
-+ int *fds, int *memfd)
-+{
-+ pid_t pid;
-+ int ret, status;
-+ uint64_t cookie;
-+ int connfd1, connfd2;
-+ struct kdbus_msg *msg, *msg_sync_reply;
-+ struct kdbus_cmd_hello hello;
-+ struct kdbus_conn *conn_src, *conn_dst, *conn_dummy;
-+ struct kdbus_cmd_send cmd = {};
-+ struct kdbus_cmd_free cmd_free = {};
-+
-+ conn_src = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn_src);
-+
-+ connfd1 = open(env->buspath, O_RDWR|O_CLOEXEC);
-+ ASSERT_RETURN(connfd1 >= 0);
-+
-+ connfd2 = open(env->buspath, O_RDWR|O_CLOEXEC);
-+ ASSERT_RETURN(connfd2 >= 0);
-+
-+ /*
-+ * Create connections without KDBUS_HELLO_ACCEPT_FD
-+ * to test if send fd operations are blocked
-+ */
-+ conn_dst = malloc(sizeof(*conn_dst));
-+ ASSERT_RETURN(conn_dst);
-+
-+ conn_dummy = malloc(sizeof(*conn_dummy));
-+ ASSERT_RETURN(conn_dummy);
-+
-+ memset(&hello, 0, sizeof(hello));
-+ hello.size = sizeof(struct kdbus_cmd_hello);
-+ hello.pool_size = POOL_SIZE;
-+ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+
-+ ret = kdbus_cmd_hello(connfd1, &hello);
-+ ASSERT_RETURN(ret == 0);
-+
-+ cmd_free.size = sizeof(cmd_free);
-+ cmd_free.offset = hello.offset;
-+ ret = kdbus_cmd_free(connfd1, &cmd_free);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ conn_dst->fd = connfd1;
-+ conn_dst->id = hello.id;
-+
-+ memset(&hello, 0, sizeof(hello));
-+ hello.size = sizeof(struct kdbus_cmd_hello);
-+ hello.pool_size = POOL_SIZE;
-+ hello.attach_flags_send = _KDBUS_ATTACH_ALL;
-+
-+ ret = kdbus_cmd_hello(connfd2, &hello);
-+ ASSERT_RETURN(ret == 0);
-+
-+ cmd_free.size = sizeof(cmd_free);
-+ cmd_free.offset = hello.offset;
-+ ret = kdbus_cmd_free(connfd2, &cmd_free);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ conn_dummy->fd = connfd2;
-+ conn_dummy->id = hello.id;
-+
-+ conn_dst->buf = mmap(NULL, POOL_SIZE, PROT_READ,
-+ MAP_SHARED, connfd1, 0);
-+ ASSERT_RETURN(conn_dst->buf != MAP_FAILED);
-+
-+ conn_dummy->buf = mmap(NULL, POOL_SIZE, PROT_READ,
-+ MAP_SHARED, connfd2, 0);
-+ ASSERT_RETURN(conn_dummy->buf != MAP_FAILED);
-+
-+ /*
-+ * Send fds to a connection that does not accept fd passing
-+ */
-+ ret = send_fds(conn_src, conn_dst->id, fds, 1);
-+ ASSERT_RETURN(ret == -ECOMM);
-+
-+ /*
-+ * memfds are kdbus payload, so sending them must succeed
-+ */
-+ ret = send_memfds(conn_src, conn_dst->id, memfd, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv_poll(conn_dst, 100, NULL, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ cookie = time(NULL);
-+
-+ pid = fork();
-+ ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+ if (pid == 0) {
-+ struct timespec now;
-+
-+ /*
-+ * A sync send/reply to a connection that does not
-+ * accept fds should fail if it contains an fd
-+ */
-+ msg_sync_reply = get_kdbus_msg_with_fd(conn_dst,
-+ conn_dummy->id,
-+ cookie, fds[0]);
-+ ASSERT_EXIT(msg_sync_reply);
-+
-+ ret = clock_gettime(CLOCK_MONOTONIC_COARSE, &now);
-+ ASSERT_EXIT(ret == 0);
-+
-+ msg_sync_reply->timeout_ns = now.tv_sec * 1000000000ULL +
-+ now.tv_nsec + 100000000ULL;
-+ msg_sync_reply->flags = KDBUS_MSG_EXPECT_REPLY;
-+
-+ memset(&cmd, 0, sizeof(cmd));
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)msg_sync_reply;
-+ cmd.flags = KDBUS_SEND_SYNC_REPLY;
-+
-+ ret = kdbus_cmd_send(conn_dst->fd, &cmd);
-+ ASSERT_EXIT(ret == -ECOMM);
-+
-+ /*
-+ * Now send a normal message, but the sync reply
-+ * will fail since it contains an fd that the
-+ * original sender does not want.
-+ *
-+ * The original sender will fail with -EREMOTEIO
-+ */
-+ cookie++;
-+ ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
-+ KDBUS_MSG_EXPECT_REPLY,
-+ 5000000000ULL, 0, conn_src->id, -1);
-+ ASSERT_EXIT(ret == -EREMOTEIO);
-+
-+ cookie++;
-+ ret = kdbus_msg_recv_poll(conn_dst, 100, &msg, NULL);
-+ ASSERT_EXIT(ret == 0);
-+ ASSERT_EXIT(msg->cookie == cookie);
-+
-+ free(msg_sync_reply);
-+ kdbus_msg_free(msg);
-+
-+ _exit(EXIT_SUCCESS);
-+ }
-+
-+ ret = kdbus_msg_recv_poll(conn_dummy, 100, NULL, NULL);
-+ ASSERT_RETURN(ret == -ETIMEDOUT);
-+
-+ cookie++;
-+ ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
-+ ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
-+
-+ kdbus_msg_free(msg);
-+
-+ /*
-+ * Try to reply with a kdbus connection handle, this should
-+ * fail with -EOPNOTSUPP
-+ */
-+ msg_sync_reply = get_kdbus_msg_with_fd(conn_src,
-+ conn_dst->id,
-+ cookie, conn_dst->fd);
-+ ASSERT_RETURN(msg_sync_reply);
-+
-+ msg_sync_reply->cookie_reply = cookie;
-+
-+ memset(&cmd, 0, sizeof(cmd));
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)msg_sync_reply;
-+
-+ ret = kdbus_cmd_send(conn_src->fd, &cmd);
-+ ASSERT_RETURN(ret == -EOPNOTSUPP);
-+
-+ free(msg_sync_reply);
-+
-+ /*
-+ * Try to reply with a normal fd, this should fail even
-+ * if the response is a sync reply
-+ *
-+ * From the sender view we fail with -ECOMM
-+ */
-+ msg_sync_reply = get_kdbus_msg_with_fd(conn_src,
-+ conn_dst->id,
-+ cookie, fds[0]);
-+ ASSERT_RETURN(msg_sync_reply);
-+
-+ msg_sync_reply->cookie_reply = cookie;
-+
-+ memset(&cmd, 0, sizeof(cmd));
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)msg_sync_reply;
-+
-+ ret = kdbus_cmd_send(conn_src->fd, &cmd);
-+ ASSERT_RETURN(ret == -ECOMM);
-+
-+ free(msg_sync_reply);
-+
-+ /*
-+ * Resend another normal message and check if the queue
-+ * is clear
-+ */
-+ cookie++;
-+ ret = kdbus_msg_send(conn_src, NULL, cookie, 0, 0, 0,
-+ conn_dst->id);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = waitpid(pid, &status, 0);
-+ ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+ kdbus_conn_free(conn_dummy);
-+ kdbus_conn_free(conn_dst);
-+ kdbus_conn_free(conn_src);
-+
-+ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+static int kdbus_send_multiple_fds(struct kdbus_conn *conn_src,
-+ struct kdbus_conn *conn_dst)
-+{
-+ int ret, i;
-+ unsigned int nfds;
-+ int fds[KDBUS_CONN_MAX_FDS_PER_USER + 1];
-+ int memfds[KDBUS_MSG_MAX_ITEMS + 1];
-+ struct kdbus_msg *msg;
-+ uint64_t dummy_value;
-+
-+ dummy_value = time(NULL);
-+
-+ for (i = 0; i < KDBUS_CONN_MAX_FDS_PER_USER + 1; i++) {
-+ fds[i] = open("/dev/null", O_RDWR|O_CLOEXEC);
-+ ASSERT_RETURN_VAL(fds[i] >= 0, -errno);
-+ }
-+
-+ /* Send KDBUS_CONN_MAX_FDS_PER_USER with one more fd */
-+ ret = send_fds(conn_src, conn_dst->id, fds,
-+ KDBUS_CONN_MAX_FDS_PER_USER + 1);
-+ ASSERT_RETURN(ret == -EMFILE);
-+
-+ /* Retry with the correct KDBUS_CONN_MAX_FDS_PER_USER */
-+ ret = send_fds(conn_src, conn_dst->id, fds,
-+ KDBUS_CONN_MAX_FDS_PER_USER);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(conn_dst, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Check we got the right number of fds */
-+ nfds = kdbus_item_get_nfds(msg);
-+ ASSERT_RETURN(nfds == KDBUS_CONN_MAX_FDS_PER_USER);
-+
-+ kdbus_msg_free(msg);
-+
-+ for (i = 0; i < KDBUS_MSG_MAX_ITEMS + 1; i++, dummy_value++) {
-+ memfds[i] = memfd_write("memfd-name",
-+ &dummy_value,
-+ sizeof(dummy_value));
-+ ASSERT_RETURN_VAL(memfds[i] >= 0, memfds[i]);
-+ }
-+
-+ /* Send KDBUS_MSG_MAX_ITEMS with one more memfd */
-+ ret = send_memfds(conn_src, conn_dst->id,
-+ memfds, KDBUS_MSG_MAX_ITEMS + 1);
-+ ASSERT_RETURN(ret == -E2BIG);
-+
-+ ret = send_memfds(conn_src, conn_dst->id,
-+ memfds, KDBUS_MSG_MAX_MEMFD_ITEMS + 1);
-+ ASSERT_RETURN(ret == -E2BIG);
-+
-+ /* Retry with the correct KDBUS_MSG_MAX_MEMFD_ITEMS */
-+ ret = send_memfds(conn_src, conn_dst->id,
-+ memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(conn_dst, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Check we got the right number of fds */
-+ nfds = kdbus_item_get_nfds(msg);
-+ ASSERT_RETURN(nfds == KDBUS_MSG_MAX_MEMFD_ITEMS);
-+
-+ kdbus_msg_free(msg);
-+
-+
-+ /*
-+ * Combine multiple KDBUS_CONN_MAX_FDS_PER_USER+1 fds and
-+ * 10 memfds
-+ */
-+ ret = send_fds_memfds(conn_src, conn_dst->id,
-+ fds, KDBUS_CONN_MAX_FDS_PER_USER + 1,
-+ memfds, 10);
-+ ASSERT_RETURN(ret == -EMFILE);
-+
-+ ret = kdbus_msg_recv(conn_dst, NULL, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ /*
-+ * Combine multiple KDBUS_CONN_MAX_FDS_PER_USER fds and
-+ * (128 - 1) + 1 memfds, all fds take one item, while each
-+ * memfd takes one item
-+ */
-+ ret = send_fds_memfds(conn_src, conn_dst->id,
-+ fds, KDBUS_CONN_MAX_FDS_PER_USER,
-+ memfds, (KDBUS_MSG_MAX_ITEMS - 1) + 1);
-+ ASSERT_RETURN(ret == -E2BIG);
-+
-+ ret = send_fds_memfds(conn_src, conn_dst->id,
-+ fds, KDBUS_CONN_MAX_FDS_PER_USER,
-+ memfds, KDBUS_MSG_MAX_MEMFD_ITEMS + 1);
-+ ASSERT_RETURN(ret == -E2BIG);
-+
-+ ret = kdbus_msg_recv(conn_dst, NULL, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ /*
-+ * Send KDBUS_CONN_MAX_FDS_PER_USER fds +
-+ * KDBUS_MSG_MAX_MEMFD_ITEMS memfds
-+ */
-+ ret = send_fds_memfds(conn_src, conn_dst->id,
-+ fds, KDBUS_CONN_MAX_FDS_PER_USER,
-+ memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(conn_dst, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Check we got the right number of fds */
-+ nfds = kdbus_item_get_nfds(msg);
-+ ASSERT_RETURN(nfds == KDBUS_CONN_MAX_FDS_PER_USER +
-+ KDBUS_MSG_MAX_MEMFD_ITEMS);
-+
-+ kdbus_msg_free(msg);
-+
-+
-+ /*
-+ * Re-send fds + memfds, close them, but do not receive them
-+ * and try to queue more
-+ */
-+ ret = send_fds_memfds(conn_src, conn_dst->id,
-+ fds, KDBUS_CONN_MAX_FDS_PER_USER,
-+ memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* close old references and get new ones */
-+ for (i = 0; i < KDBUS_CONN_MAX_FDS_PER_USER + 1; i++) {
-+ close(fds[i]);
-+ fds[i] = open("/dev/null", O_RDWR|O_CLOEXEC);
-+ ASSERT_RETURN_VAL(fds[i] >= 0, -errno);
-+ }
-+
-+ /* should fail since we already have fds in the queue */
-+ ret = send_fds(conn_src, conn_dst->id, fds,
-+ KDBUS_CONN_MAX_FDS_PER_USER);
-+ ASSERT_RETURN(ret == -EMFILE);
-+
-+ /* This should succeed */
-+ ret = send_memfds(conn_src, conn_dst->id,
-+ memfds, KDBUS_MSG_MAX_MEMFD_ITEMS);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(conn_dst, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ nfds = kdbus_item_get_nfds(msg);
-+ ASSERT_RETURN(nfds == KDBUS_CONN_MAX_FDS_PER_USER +
-+ KDBUS_MSG_MAX_MEMFD_ITEMS);
-+
-+ kdbus_msg_free(msg);
-+
-+ ret = kdbus_msg_recv(conn_dst, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ nfds = kdbus_item_get_nfds(msg);
-+ ASSERT_RETURN(nfds == KDBUS_MSG_MAX_MEMFD_ITEMS);
-+
-+ kdbus_msg_free(msg);
-+
-+ ret = kdbus_msg_recv(conn_dst, NULL, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ for (i = 0; i < KDBUS_CONN_MAX_FDS_PER_USER + 1; i++)
-+ close(fds[i]);
-+
-+ for (i = 0; i < KDBUS_MSG_MAX_ITEMS + 1; i++)
-+ close(memfds[i]);
-+
-+ return 0;
-+}
-+
-+int kdbus_test_fd_passing(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *conn_src, *conn_dst;
-+ const char *str = "stackenblocken";
-+ const struct kdbus_item *item;
-+ struct kdbus_msg *msg;
-+ unsigned int i;
-+ uint64_t now;
-+ int fds_conn[2];
-+ int sock_pair[2];
-+ int fds[2];
-+ int memfd;
-+ int ret;
-+
-+ now = (uint64_t) time(NULL);
-+
-+ /* create two connections */
-+ conn_src = kdbus_hello(env->buspath, 0, NULL, 0);
-+ conn_dst = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn_src && conn_dst);
-+
-+ fds_conn[0] = conn_src->fd;
-+ fds_conn[1] = conn_dst->fd;
-+
-+ ret = socketpair(AF_UNIX, SOCK_STREAM, 0, sock_pair);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Setup memfd */
-+ memfd = memfd_write("memfd-name", &now, sizeof(now));
-+ ASSERT_RETURN(memfd >= 0);
-+
-+ /* Setup pipes */
-+ ret = pipe(fds);
-+ ASSERT_RETURN(ret == 0);
-+
-+ i = write(fds[1], str, strlen(str));
-+ ASSERT_RETURN(i == strlen(str));
-+
-+ /*
-+ * Try to pass the handle of a connection as message payload.
-+ * This must fail.
-+ */
-+ ret = send_fds(conn_src, conn_dst->id, fds_conn, 2);
-+ ASSERT_RETURN(ret == -ENOTSUP);
-+
-+ ret = send_fds(conn_dst, conn_src->id, fds_conn, 2);
-+ ASSERT_RETURN(ret == -ENOTSUP);
-+
-+ ret = send_fds(conn_src, conn_dst->id, sock_pair, 2);
-+ ASSERT_RETURN(ret == -ENOTSUP);
-+
-+ /*
-+ * Send fds and memfds to a connection that does not accept fds
-+ */
-+ ret = kdbus_test_no_fds(env, fds, (int *)&memfd);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Try to broadcast file descriptors. This must fail. */
-+ ret = send_fds(conn_src, KDBUS_DST_ID_BROADCAST, fds, 1);
-+ ASSERT_RETURN(ret == -ENOTUNIQ);
-+
-+ /* Try to broadcast memfd. This must succeed. */
-+ ret = send_memfds(conn_src, KDBUS_DST_ID_BROADCAST, (int *)&memfd, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Open code this loop */
-+loop_send_fds:
-+
-+ /*
-+ * Send the read end of the pipe and close it.
-+ */
-+ ret = send_fds(conn_src, conn_dst->id, fds, 1);
-+ ASSERT_RETURN(ret == 0);
-+ close(fds[0]);
-+
-+ ret = kdbus_msg_recv(conn_dst, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ KDBUS_ITEM_FOREACH(item, msg, items) {
-+ if (item->type == KDBUS_ITEM_FDS) {
-+ char tmp[14];
-+ int nfds = (item->size - KDBUS_ITEM_HEADER_SIZE) /
-+ sizeof(int);
-+ ASSERT_RETURN(nfds == 1);
-+
-+ i = read(item->fds[0], tmp, sizeof(tmp));
-+ if (i != 0) {
-+ ASSERT_RETURN(i == sizeof(tmp));
-+ ASSERT_RETURN(memcmp(tmp, str, sizeof(tmp)) == 0);
-+
-+ /* Write EOF */
-+ close(fds[1]);
-+
-+ /*
-+ * Resend the read end of the pipe,
-+ * the receiver still holds a reference
-+ * to it...
-+ */
-+ goto loop_send_fds;
-+ }
-+
-+ /* Got EOF */
-+
-+ /*
-+ * Close the last reference to the read end
-+ * of the pipe, other references are
-+ * automatically closed just after send.
-+ */
-+ close(item->fds[0]);
-+ }
-+ }
-+
-+ /*
-+ * Try to resend the read end of the pipe. Must fail with
-+ * -EBADF since both the sender and receiver closed their
-+ * references to it. We assume the above since sender and
-+ * receiver are on the same process.
-+ */
-+ ret = send_fds(conn_src, conn_dst->id, fds, 1);
-+ ASSERT_RETURN(ret == -EBADF);
-+
-+ /* Then we clear out any received data... */
-+ kdbus_msg_free(msg);
-+
-+ ret = kdbus_send_multiple_fds(conn_src, conn_dst);
-+ ASSERT_RETURN(ret == 0);
-+
-+ close(sock_pair[0]);
-+ close(sock_pair[1]);
-+ close(memfd);
-+
-+ kdbus_conn_free(conn_src);
-+ kdbus_conn_free(conn_dst);
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-free.c b/tools/testing/selftests/kdbus/test-free.c
-new file mode 100644
-index 0000000..f666da3
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-free.c
-@@ -0,0 +1,64 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+static int sample_ioctl_call(struct kdbus_test_env *env)
-+{
-+ int ret;
-+ struct kdbus_cmd_list cmd_list = {
-+ .flags = KDBUS_LIST_QUEUED,
-+ .size = sizeof(cmd_list),
-+ };
-+
-+ ret = kdbus_cmd_list(env->conn->fd, &cmd_list);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* DON'T FREE THIS SLICE OF MEMORY! */
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_free(struct kdbus_test_env *env)
-+{
-+ int ret;
-+ struct kdbus_cmd_free cmd_free = {};
-+
-+ /* free an unallocated buffer */
-+ cmd_free.size = sizeof(cmd_free);
-+ cmd_free.flags = 0;
-+ cmd_free.offset = 0;
-+ ret = kdbus_cmd_free(env->conn->fd, &cmd_free);
-+ ASSERT_RETURN(ret == -ENXIO);
-+
-+ /* free a buffer out of the pool's bounds */
-+ cmd_free.size = sizeof(cmd_free);
-+ cmd_free.offset = POOL_SIZE + 1;
-+ ret = kdbus_cmd_free(env->conn->fd, &cmd_free);
-+ ASSERT_RETURN(ret == -ENXIO);
-+
-+ /*
-+ * The user application is responsible for freeing the allocated
-+ * memory with the KDBUS_CMD_FREE ioctl, so let's test what happens
-+ * if we forget about it.
-+ */
-+
-+ ret = sample_ioctl_call(env);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = sample_ioctl_call(env);
-+ ASSERT_RETURN(ret == 0);
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-match.c b/tools/testing/selftests/kdbus/test-match.c
-new file mode 100644
-index 0000000..2360dc1
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-match.c
-@@ -0,0 +1,441 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+int kdbus_test_match_id_add(struct kdbus_test_env *env)
-+{
-+ struct {
-+ struct kdbus_cmd_match cmd;
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ struct kdbus_notify_id_change chg;
-+ } item;
-+ } buf;
-+ struct kdbus_conn *conn;
-+ struct kdbus_msg *msg;
-+ int ret;
-+
-+ memset(&buf, 0, sizeof(buf));
-+
-+ buf.cmd.size = sizeof(buf);
-+ buf.cmd.cookie = 0xdeafbeefdeaddead;
-+ buf.item.size = sizeof(buf.item);
-+ buf.item.type = KDBUS_ITEM_ID_ADD;
-+ buf.item.chg.id = KDBUS_MATCH_ID_ANY;
-+
-+ /* match on id add */
-+ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* create 2nd connection */
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn != NULL);
-+
-+ /* 1st connection should have received a notification */
-+ ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_ADD);
-+ ASSERT_RETURN(msg->items[0].id_change.id == conn->id);
-+
-+ kdbus_conn_free(conn);
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_match_id_remove(struct kdbus_test_env *env)
-+{
-+ struct {
-+ struct kdbus_cmd_match cmd;
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ struct kdbus_notify_id_change chg;
-+ } item;
-+ } buf;
-+ struct kdbus_conn *conn;
-+ struct kdbus_msg *msg;
-+ size_t id;
-+ int ret;
-+
-+ /* create 2nd connection */
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn != NULL);
-+ id = conn->id;
-+
-+ memset(&buf, 0, sizeof(buf));
-+ buf.cmd.size = sizeof(buf);
-+ buf.cmd.cookie = 0xdeafbeefdeaddead;
-+ buf.item.size = sizeof(buf.item);
-+ buf.item.type = KDBUS_ITEM_ID_REMOVE;
-+ buf.item.chg.id = id;
-+
-+ /* register match on 2nd connection */
-+ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* remove 2nd connection again */
-+ kdbus_conn_free(conn);
-+
-+ /* 1st connection should have received a notification */
-+ ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_REMOVE);
-+ ASSERT_RETURN(msg->items[0].id_change.id == id);
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_match_replace(struct kdbus_test_env *env)
-+{
-+ struct {
-+ struct kdbus_cmd_match cmd;
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ struct kdbus_notify_id_change chg;
-+ } item;
-+ } buf;
-+ struct kdbus_conn *conn;
-+ struct kdbus_msg *msg;
-+ size_t id;
-+ int ret;
-+
-+ /* add a match to id_add */
-+ ASSERT_RETURN(kdbus_test_match_id_add(env) == TEST_OK);
-+
-+ /* do a replace of the match from id_add to id_remove */
-+ memset(&buf, 0, sizeof(buf));
-+
-+ buf.cmd.size = sizeof(buf);
-+ buf.cmd.cookie = 0xdeafbeefdeaddead;
-+ buf.cmd.flags = KDBUS_MATCH_REPLACE;
-+ buf.item.size = sizeof(buf.item);
-+ buf.item.type = KDBUS_ITEM_ID_REMOVE;
-+ buf.item.chg.id = KDBUS_MATCH_ID_ANY;
-+
-+ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+
-+ /* create 2nd connection */
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn != NULL);
-+ id = conn->id;
-+
-+ /* 1st connection should _not_ have received a notification */
-+ ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+ ASSERT_RETURN(ret != 0);
-+
-+ /* remove 2nd connection */
-+ kdbus_conn_free(conn);
-+
-+ /* 1st connection should _now_ have received a notification */
-+ ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_ID_REMOVE);
-+ ASSERT_RETURN(msg->items[0].id_change.id == id);
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_match_name_add(struct kdbus_test_env *env)
-+{
-+ struct {
-+ struct kdbus_cmd_match cmd;
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ struct kdbus_notify_name_change chg;
-+ } item;
-+ char name[64];
-+ } buf;
-+ struct kdbus_msg *msg;
-+ char *name;
-+ int ret;
-+
-+ name = "foo.bla.blaz";
-+
-+ /* install the match rule */
-+ memset(&buf, 0, sizeof(buf));
-+ buf.item.type = KDBUS_ITEM_NAME_ADD;
-+ buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
-+ buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
-+ strncpy(buf.name, name, sizeof(buf.name) - 1);
-+ buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
-+ buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
-+
-+ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* acquire the name */
-+ ret = kdbus_name_acquire(env->conn, name, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* we should have received a notification */
-+ ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_ADD);
-+ ASSERT_RETURN(msg->items[0].name_change.old_id.id == 0);
-+ ASSERT_RETURN(msg->items[0].name_change.new_id.id == env->conn->id);
-+ ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_match_name_remove(struct kdbus_test_env *env)
-+{
-+ struct {
-+ struct kdbus_cmd_match cmd;
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ struct kdbus_notify_name_change chg;
-+ } item;
-+ char name[64];
-+ } buf;
-+ struct kdbus_msg *msg;
-+ char *name;
-+ int ret;
-+
-+ name = "foo.bla.blaz";
-+
-+ /* acquire the name */
-+ ret = kdbus_name_acquire(env->conn, name, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* install the match rule */
-+ memset(&buf, 0, sizeof(buf));
-+ buf.item.type = KDBUS_ITEM_NAME_REMOVE;
-+ buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
-+ buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
-+ strncpy(buf.name, name, sizeof(buf.name) - 1);
-+ buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
-+ buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
-+
-+ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* release the name again */
-+ ret = kdbus_name_release(env->conn, name);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* we should have received a notification */
-+ ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_REMOVE);
-+ ASSERT_RETURN(msg->items[0].name_change.old_id.id == env->conn->id);
-+ ASSERT_RETURN(msg->items[0].name_change.new_id.id == 0);
-+ ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_match_name_change(struct kdbus_test_env *env)
-+{
-+ struct {
-+ struct kdbus_cmd_match cmd;
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ struct kdbus_notify_name_change chg;
-+ } item;
-+ char name[64];
-+ } buf;
-+ struct kdbus_conn *conn;
-+ struct kdbus_msg *msg;
-+ uint64_t flags;
-+ char *name = "foo.bla.baz";
-+ int ret;
-+
-+ /* acquire the name */
-+ ret = kdbus_name_acquire(env->conn, name, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* install the match rule */
-+ memset(&buf, 0, sizeof(buf));
-+ buf.item.type = KDBUS_ITEM_NAME_CHANGE;
-+ buf.item.chg.old_id.id = KDBUS_MATCH_ID_ANY;
-+ buf.item.chg.new_id.id = KDBUS_MATCH_ID_ANY;
-+ strncpy(buf.name, name, sizeof(buf.name) - 1);
-+ buf.item.size = sizeof(buf.item) + strlen(buf.name) + 1;
-+ buf.cmd.size = sizeof(buf.cmd) + buf.item.size;
-+
-+ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* create a 2nd connection */
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn != NULL);
-+
-+ /* allow the new connection to own the same name */
-+ /* queue the 2nd connection as waiting owner */
-+ flags = KDBUS_NAME_QUEUE;
-+ ret = kdbus_name_acquire(conn, name, &flags);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(flags & KDBUS_NAME_IN_QUEUE);
-+
-+ /* release name from 1st connection */
-+ ret = kdbus_name_release(env->conn, name);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* we should have received a notification */
-+ ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ASSERT_RETURN(msg->items[0].type == KDBUS_ITEM_NAME_CHANGE);
-+ ASSERT_RETURN(msg->items[0].name_change.old_id.id == env->conn->id);
-+ ASSERT_RETURN(msg->items[0].name_change.new_id.id == conn->id);
-+ ASSERT_RETURN(strcmp(msg->items[0].name_change.name, name) == 0);
-+
-+ kdbus_conn_free(conn);
-+
-+ return TEST_OK;
-+}
-+
-+static int send_bloom_filter(const struct kdbus_conn *conn,
-+ uint64_t cookie,
-+ const uint8_t *filter,
-+ size_t filter_size,
-+ uint64_t filter_generation)
-+{
-+ struct kdbus_cmd_send cmd = {};
-+ struct kdbus_msg *msg;
-+ struct kdbus_item *item;
-+ uint64_t size;
-+ int ret;
-+
-+ size = sizeof(struct kdbus_msg);
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) + filter_size;
-+
-+ msg = alloca(size);
-+
-+ memset(msg, 0, size);
-+ msg->size = size;
-+ msg->src_id = conn->id;
-+ msg->dst_id = KDBUS_DST_ID_BROADCAST;
-+ msg->flags = KDBUS_MSG_SIGNAL;
-+ msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+ msg->cookie = cookie;
-+
-+ item = msg->items;
-+ item->type = KDBUS_ITEM_BLOOM_FILTER;
-+ item->size = KDBUS_ITEM_SIZE(sizeof(struct kdbus_bloom_filter)) +
-+ filter_size;
-+
-+ item->bloom_filter.generation = filter_generation;
-+ memcpy(item->bloom_filter.data, filter, filter_size);
-+
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)msg;
-+
-+ ret = kdbus_cmd_send(conn->fd, &cmd);
-+ if (ret < 0) {
-+ kdbus_printf("error sending message: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ return 0;
-+}
-+
-+int kdbus_test_match_bloom(struct kdbus_test_env *env)
-+{
-+ struct {
-+ struct kdbus_cmd_match cmd;
-+ struct {
-+ uint64_t size;
-+ uint64_t type;
-+ uint8_t data_gen0[64];
-+ uint8_t data_gen1[64];
-+ } item;
-+ } buf;
-+ struct kdbus_conn *conn;
-+ struct kdbus_msg *msg;
-+ uint64_t cookie = 0xf000f00f;
-+ uint8_t filter[64];
-+ int ret;
-+
-+ /* install the match rule */
-+ memset(&buf, 0, sizeof(buf));
-+ buf.cmd.size = sizeof(buf);
-+
-+ buf.item.size = sizeof(buf.item);
-+ buf.item.type = KDBUS_ITEM_BLOOM_MASK;
-+ buf.item.data_gen0[0] = 0x55;
-+ buf.item.data_gen0[63] = 0x80;
-+
-+ buf.item.data_gen1[1] = 0xaa;
-+ buf.item.data_gen1[9] = 0x02;
-+
-+ ret = kdbus_cmd_match_add(env->conn->fd, &buf.cmd);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* create a 2nd connection */
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn != NULL);
-+
-+ /* a message with a 0'ed out filter must not reach the other peer */
-+ memset(filter, 0, sizeof(filter));
-+ ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ /* now set the filter to the connection's mask and expect success */
-+ filter[0] = 0x55;
-+ filter[63] = 0x80;
-+ ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->cookie == cookie);
-+
-+ /* broaden the filter and try again. this should also succeed. */
-+ filter[0] = 0xff;
-+ filter[8] = 0xff;
-+ filter[63] = 0xff;
-+ ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 0);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->cookie == cookie);
-+
-+ /* the same filter must not match against bloom generation 1 */
-+ ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ /* set a different filter and try again */
-+ filter[1] = 0xaa;
-+ filter[9] = 0x02;
-+ ret = send_bloom_filter(conn, ++cookie, filter, sizeof(filter), 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(env->conn, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->cookie == cookie);
-+
-+ kdbus_conn_free(conn);
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-message.c b/tools/testing/selftests/kdbus/test-message.c
-new file mode 100644
-index 0000000..563dc85
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-message.c
-@@ -0,0 +1,734 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <time.h>
-+#include <stdbool.h>
-+#include <sys/eventfd.h>
-+#include <sys/types.h>
-+#include <sys/wait.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+/* maximum number of queued messages from the same individual user */
-+#define KDBUS_CONN_MAX_MSGS 256
-+
-+/* maximum number of queued requests waiting for a reply */
-+#define KDBUS_CONN_MAX_REQUESTS_PENDING 128
-+
-+/* maximum message payload size */
-+#define KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE (2 * 1024UL * 1024UL)
-+
-+int kdbus_test_message_basic(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *conn;
-+ struct kdbus_conn *sender;
-+ struct kdbus_msg *msg;
-+ uint64_t cookie = 0x1234abcd5678eeff;
-+ uint64_t offset;
-+ int ret;
-+
-+ sender = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(sender != NULL);
-+
-+ /* create a 2nd connection */
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn != NULL);
-+
-+ ret = kdbus_add_match_empty(conn);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_add_match_empty(sender);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* send over 1st connection */
-+ ret = kdbus_msg_send(sender, NULL, cookie, 0, 0, 0,
-+ KDBUS_DST_ID_BROADCAST);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Make sure that we do get our own broadcasts */
-+ ret = kdbus_msg_recv(sender, &msg, &offset);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->cookie == cookie);
-+
-+ kdbus_msg_free(msg);
-+
-+ /* ... and receive on the 2nd */
-+ ret = kdbus_msg_recv_poll(conn, 100, &msg, &offset);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->cookie == cookie);
-+
-+ kdbus_msg_free(msg);
-+
-+ /* Msgs that expect a reply must have timeout and cookie */
-+ ret = kdbus_msg_send(sender, NULL, 0, KDBUS_MSG_EXPECT_REPLY,
-+ 0, 0, conn->id);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ /* Faked replies with a valid reply cookie are rejected */
-+ ret = kdbus_msg_send_reply(conn, time(NULL) ^ cookie, sender->id);
-+ ASSERT_RETURN(ret == -EBADSLT);
-+
-+ ret = kdbus_free(conn, offset);
-+ ASSERT_RETURN(ret == 0);
-+
-+ kdbus_conn_free(sender);
-+ kdbus_conn_free(conn);
-+
-+ return TEST_OK;
-+}
-+
-+static int msg_recv_prio(struct kdbus_conn *conn,
-+ int64_t requested_prio,
-+ int64_t expected_prio)
-+{
-+ struct kdbus_cmd_recv recv = {
-+ .size = sizeof(recv),
-+ .flags = KDBUS_RECV_USE_PRIORITY,
-+ .priority = requested_prio,
-+ };
-+ struct kdbus_msg *msg;
-+ int ret;
-+
-+ ret = kdbus_cmd_recv(conn->fd, &recv);
-+ if (ret < 0) {
-+ kdbus_printf("error receiving message: %d (%m)\n", -errno);
-+ return ret;
-+ }
-+
-+ msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
-+ kdbus_msg_dump(conn, msg);
-+
-+ if (msg->priority != expected_prio) {
-+ kdbus_printf("expected message prio %lld, got %lld\n",
-+ (long long) expected_prio,
-+ (long long) msg->priority);
-+ return -EINVAL;
-+ }
-+
-+ kdbus_msg_free(msg);
-+ ret = kdbus_free(conn, recv.msg.offset);
-+ if (ret < 0)
-+ return ret;
-+
-+ return 0;
-+}
-+
-+int kdbus_test_message_prio(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *a, *b;
-+ uint64_t cookie = 0;
-+
-+ a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ b = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(a && b);
-+
-+ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, 25, a->id) == 0);
-+ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -600, a->id) == 0);
-+ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, 10, a->id) == 0);
-+ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -35, a->id) == 0);
-+ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -100, a->id) == 0);
-+ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, 20, a->id) == 0);
-+ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -15, a->id) == 0);
-+ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -800, a->id) == 0);
-+ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -150, a->id) == 0);
-+ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, 10, a->id) == 0);
-+ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -800, a->id) == 0);
-+ ASSERT_RETURN(kdbus_msg_send(b, NULL, ++cookie, 0, 0, -10, a->id) == 0);
-+
-+ ASSERT_RETURN(msg_recv_prio(a, -200, -800) == 0);
-+ ASSERT_RETURN(msg_recv_prio(a, -100, -800) == 0);
-+ ASSERT_RETURN(msg_recv_prio(a, -400, -600) == 0);
-+ ASSERT_RETURN(msg_recv_prio(a, -400, -600) == -EAGAIN);
-+ ASSERT_RETURN(msg_recv_prio(a, 10, -150) == 0);
-+ ASSERT_RETURN(msg_recv_prio(a, 10, -100) == 0);
-+
-+ kdbus_printf("--- get priority (all)\n");
-+ ASSERT_RETURN(kdbus_msg_recv(a, NULL, NULL) == 0);
-+
-+ kdbus_conn_free(a);
-+ kdbus_conn_free(b);
-+
-+ return TEST_OK;
-+}
-+
-+static int kdbus_test_notify_kernel_quota(struct kdbus_test_env *env)
-+{
-+ int ret;
-+ unsigned int i;
-+ struct kdbus_conn *conn;
-+ struct kdbus_conn *reader;
-+ struct kdbus_msg *msg = NULL;
-+ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
-+
-+ reader = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(reader);
-+
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn);
-+
-+ /* Register for ID signals */
-+ ret = kdbus_add_match_id(reader, 0x1, KDBUS_ITEM_ID_ADD,
-+ KDBUS_MATCH_ID_ANY);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_add_match_id(reader, 0x2, KDBUS_ITEM_ID_REMOVE,
-+ KDBUS_MATCH_ID_ANY);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Each iteration two notifications: add and remove ID */
-+ for (i = 0; i < KDBUS_CONN_MAX_MSGS / 2; i++) {
-+ struct kdbus_conn *notifier;
-+
-+ notifier = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(notifier);
-+
-+ kdbus_conn_free(notifier);
-+ }
-+
-+ /*
-+ * Now the reader queue is full with kernel notifications,
-+ * but as a user we still have room to push our messages.
-+ */
-+ ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0, 0, reader->id);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* More ID kernel notifications that will be lost */
-+ kdbus_conn_free(conn);
-+
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn);
-+
-+ kdbus_conn_free(conn);
-+
-+ /*
-+ * We lost only 3 packets since only signal msgs are
-+ * accounted. The connection ID add/remove notification
-+ */
-+ ret = kdbus_cmd_recv(reader->fd, &recv);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(recv.return_flags & KDBUS_RECV_RETURN_DROPPED_MSGS);
-+ ASSERT_RETURN(recv.dropped_msgs == 3);
-+
-+ msg = (struct kdbus_msg *)(reader->buf + recv.msg.offset);
-+ kdbus_msg_free(msg);
-+
-+ /* Read our queue */
-+ for (i = 0; i < KDBUS_CONN_MAX_MSGS - 1; i++) {
-+ memset(&recv, 0, sizeof(recv));
-+ recv.size = sizeof(recv);
-+
-+ ret = kdbus_cmd_recv(reader->fd, &recv);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(!(recv.return_flags &
-+ KDBUS_RECV_RETURN_DROPPED_MSGS));
-+
-+ msg = (struct kdbus_msg *)(reader->buf + recv.msg.offset);
-+ kdbus_msg_free(msg);
-+ }
-+
-+ ret = kdbus_msg_recv(reader, NULL, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(reader, NULL, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ kdbus_conn_free(reader);
-+
-+ return 0;
-+}
-+
-+/* Return the number of messages successfully sent */
-+static int kdbus_fill_conn_queue(struct kdbus_conn *conn_src,
-+ uint64_t dst_id,
-+ unsigned int max_msgs)
-+{
-+ unsigned int i;
-+ uint64_t cookie = 0;
-+ size_t size;
-+ struct kdbus_cmd_send cmd = {};
-+ struct kdbus_msg *msg;
-+ int ret;
-+
-+ size = sizeof(struct kdbus_msg);
-+ msg = malloc(size);
-+ ASSERT_RETURN_VAL(msg, -ENOMEM);
-+
-+ memset(msg, 0, size);
-+ msg->size = size;
-+ msg->src_id = conn_src->id;
-+ msg->dst_id = dst_id;
-+ msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)msg;
-+
-+ for (i = 0; i < max_msgs; i++) {
-+ msg->cookie = cookie++;
-+ ret = kdbus_cmd_send(conn_src->fd, &cmd);
-+ if (ret < 0)
-+ break;
-+ }
-+
-+ free(msg);
-+
-+ return i;
-+}
-+
-+static int kdbus_test_activator_quota(struct kdbus_test_env *env)
-+{
-+ int ret;
-+ unsigned int i;
-+ unsigned int activator_msgs_count = 0;
-+ uint64_t cookie = time(NULL);
-+ struct kdbus_conn *conn;
-+ struct kdbus_conn *sender;
-+ struct kdbus_conn *activator;
-+ struct kdbus_msg *msg;
-+ uint64_t flags = KDBUS_NAME_REPLACE_EXISTING;
-+ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
-+ struct kdbus_policy_access access = {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = geteuid(),
-+ .access = KDBUS_POLICY_OWN,
-+ };
-+
-+ activator = kdbus_hello_activator(env->buspath, "foo.test.activator",
-+ &access, 1);
-+ ASSERT_RETURN(activator);
-+
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ sender = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn && sender);
-+
-+ ret = kdbus_list(sender, KDBUS_LIST_NAMES |
-+ KDBUS_LIST_UNIQUE |
-+ KDBUS_LIST_ACTIVATORS |
-+ KDBUS_LIST_QUEUED);
-+ ASSERT_RETURN(ret == 0);
-+
-+ for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
-+ ret = kdbus_msg_send(sender, "foo.test.activator",
-+ cookie++, 0, 0, 0,
-+ KDBUS_DST_ID_NAME);
-+ if (ret < 0)
-+ break;
-+ activator_msgs_count++;
-+ }
-+
-+ /* we must have at least sent one message */
-+ ASSERT_RETURN_VAL(i > 0, -errno);
-+ ASSERT_RETURN(ret == -ENOBUFS);
-+
-+ /* Good, activator queue is full now */
-+
-+ /* ENXIO on direct send (activators can never be addressed by ID) */
-+ ret = kdbus_msg_send(conn, NULL, cookie++, 0, 0, 0, activator->id);
-+ ASSERT_RETURN(ret == -ENXIO);
-+
-+ /* can't queue more */
-+ ret = kdbus_msg_send(conn, "foo.test.activator", cookie++,
-+ 0, 0, 0, KDBUS_DST_ID_NAME);
-+ ASSERT_RETURN(ret == -ENOBUFS);
-+
-+ /* no match installed, so the broadcast will not inc dropped_msgs */
-+ ret = kdbus_msg_send(sender, NULL, cookie++, 0, 0, 0,
-+ KDBUS_DST_ID_BROADCAST);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Check activator queue */
-+ ret = kdbus_cmd_recv(activator->fd, &recv);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(recv.dropped_msgs == 0);
-+
-+ activator_msgs_count--;
-+
-+ msg = (struct kdbus_msg *)(activator->buf + recv.msg.offset);
-+ kdbus_msg_free(msg);
-+
-+
-+ /* Stage 1) of test check the pool memory quota */
-+
-+ /* Consume the connection pool memory */
-+ for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
-+ ret = kdbus_msg_send(sender, NULL,
-+ cookie++, 0, 0, 0, conn->id);
-+ if (ret < 0)
-+ break;
-+ }
-+
-+ /* consume one message, so later at least one can be moved */
-+ memset(&recv, 0, sizeof(recv));
-+ recv.size = sizeof(recv);
-+ ret = kdbus_cmd_recv(conn->fd, &recv);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(recv.dropped_msgs == 0);
-+ msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
-+ kdbus_msg_free(msg);
-+
-+ /* Try to acquire the name now */
-+ ret = kdbus_name_acquire(conn, "foo.test.activator", &flags);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* try to read messages and see if we have lost some */
-+ memset(&recv, 0, sizeof(recv));
-+ recv.size = sizeof(recv);
-+ ret = kdbus_cmd_recv(conn->fd, &recv);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(recv.dropped_msgs != 0);
-+
-+ /* number of dropped msgs < received ones (at least one was moved) */
-+ ASSERT_RETURN(recv.dropped_msgs < activator_msgs_count);
-+
-+ /* Deduct the number of dropped msgs from the activator msgs */
-+ activator_msgs_count -= recv.dropped_msgs;
-+
-+ msg = (struct kdbus_msg *)(activator->buf + recv.msg.offset);
-+ kdbus_msg_free(msg);
-+
-+ /*
-+ * Release the name and hand it back to activator, now
-+ * we should have 'activator_msgs_count' msgs again in
-+ * the activator queue
-+ */
-+ ret = kdbus_name_release(conn, "foo.test.activator");
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* make sure that we got our previous activator msgs */
-+ ret = kdbus_msg_recv(activator, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->src_id == sender->id);
-+
-+ activator_msgs_count--;
-+
-+ kdbus_msg_free(msg);
-+
-+
-+ /* Stage 2) of test check max message quota */
-+
-+ /* Empty conn queue */
-+ for (i = 0; i < KDBUS_CONN_MAX_MSGS; i++) {
-+ ret = kdbus_msg_recv(conn, NULL, NULL);
-+ if (ret == -EAGAIN)
-+ break;
-+ }
-+
-+ /* fill queue with max msgs quota */
-+ ret = kdbus_fill_conn_queue(sender, conn->id, KDBUS_CONN_MAX_MSGS);
-+ ASSERT_RETURN(ret == KDBUS_CONN_MAX_MSGS);
-+
-+ /* This one is lost but it is not accounted */
-+ ret = kdbus_msg_send(sender, NULL,
-+ cookie++, 0, 0, 0, conn->id);
-+ ASSERT_RETURN(ret == -ENOBUFS);
-+
-+ /* Acquire the name again */
-+ ret = kdbus_name_acquire(conn, "foo.test.activator", &flags);
-+ ASSERT_RETURN(ret == 0);
-+
-+ memset(&recv, 0, sizeof(recv));
-+ recv.size = sizeof(recv);
-+
-+ /*
-+ * Try to read messages and make sure that we have lost all
-+ * the activator messages due to quota checks. Our queue is
-+ * already full.
-+ */
-+ ret = kdbus_cmd_recv(conn->fd, &recv);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(recv.dropped_msgs == activator_msgs_count);
-+
-+ msg = (struct kdbus_msg *)(activator->buf + recv.msg.offset);
-+ kdbus_msg_free(msg);
-+
-+ kdbus_conn_free(sender);
-+ kdbus_conn_free(conn);
-+ kdbus_conn_free(activator);
-+
-+ return 0;
-+}
-+
-+static int kdbus_test_expected_reply_quota(struct kdbus_test_env *env)
-+{
-+ int ret;
-+ unsigned int i, n;
-+ unsigned int count;
-+ uint64_t cookie = 0x1234abcd5678eeff;
-+ struct kdbus_conn *conn;
-+ struct kdbus_conn *connections[9];
-+
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn);
-+
-+ for (i = 0; i < 9; i++) {
-+ connections[i] = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(connections[i]);
-+ }
-+
-+ count = 0;
-+ /* Send 16 messages to 8 different connections */
-+ for (i = 0; i < 8; i++) {
-+ for (n = 0; n < 16; n++) {
-+ ret = kdbus_msg_send(conn, NULL, cookie++,
-+ KDBUS_MSG_EXPECT_REPLY,
-+ 100000000ULL, 0,
-+ connections[i]->id);
-+ if (ret < 0)
-+ break;
-+
-+ count++;
-+ }
-+ }
-+
-+ /*
-+ * We should have queued at least
-+ * KDBUS_CONN_MAX_REQUESTS_PENDING method calls
-+ */
-+ ASSERT_RETURN(count == KDBUS_CONN_MAX_REQUESTS_PENDING);
-+
-+ /*
-+ * Now try to send a message to the last connection,
-+ * if we have reached KDBUS_CONN_MAX_REQUESTS_PENDING
-+ * no further requests are allowed
-+ */
-+ ret = kdbus_msg_send(conn, NULL, cookie++, KDBUS_MSG_EXPECT_REPLY,
-+ 1000000000ULL, 0, connections[8]->id);
-+ ASSERT_RETURN(ret == -EMLINK);
-+
-+ for (i = 0; i < 9; i++)
-+ kdbus_conn_free(connections[i]);
-+
-+ kdbus_conn_free(conn);
-+
-+ return 0;
-+}
-+
-+int kdbus_test_pool_quota(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *a, *b, *c;
-+ struct kdbus_cmd_send cmd = {};
-+ struct kdbus_item *item;
-+ struct kdbus_msg *recv_msg;
-+ struct kdbus_msg *msg;
-+ uint64_t cookie = time(NULL);
-+ uint64_t size;
-+ unsigned int i;
-+ char *payload;
-+ int ret;
-+
-+ /* just a guard */
-+ if (POOL_SIZE <= KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE ||
-+ POOL_SIZE % KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE != 0)
-+ return 0;
-+
-+ payload = calloc(KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE, sizeof(char));
-+ ASSERT_RETURN_VAL(payload, -ENOMEM);
-+
-+ a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ b = kdbus_hello(env->buspath, 0, NULL, 0);
-+ c = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(a && b && c);
-+
-+ size = sizeof(struct kdbus_msg);
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+ msg = malloc(size);
-+ ASSERT_RETURN_VAL(msg, -ENOMEM);
-+
-+ memset(msg, 0, size);
-+ msg->size = size;
-+ msg->src_id = a->id;
-+ msg->dst_id = c->id;
-+ msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+ item = msg->items;
-+ item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+ item->vec.address = (uintptr_t)payload;
-+ item->vec.size = KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE;
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)msg;
-+
-+ /*
-+ * Send 2097248 bytes, a user is only allowed to get 33% of half of
-+ * the free space of the pool, the already used space is
-+ * accounted as free space
-+ */
-+ size += KDBUS_MSG_MAX_PAYLOAD_VEC_SIZE;
-+ for (i = size; i < (POOL_SIZE / 2 / 3); i += size) {
-+ msg->cookie = cookie++;
-+
-+ ret = kdbus_cmd_send(a->fd, &cmd);
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+ }
-+
-+ /* Try to get more than 33% */
-+ msg->cookie = cookie++;
-+ ret = kdbus_cmd_send(a->fd, &cmd);
-+ ASSERT_RETURN(ret == -ENOBUFS);
-+
-+ /* We still can pass small messages */
-+ ret = kdbus_msg_send(b, NULL, cookie++, 0, 0, 0, c->id);
-+ ASSERT_RETURN(ret == 0);
-+
-+ for (i = size; i < (POOL_SIZE / 2 / 3); i += size) {
-+ ret = kdbus_msg_recv(c, &recv_msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(recv_msg->src_id == a->id);
-+
-+ kdbus_msg_free(recv_msg);
-+ }
-+
-+ ret = kdbus_msg_recv(c, &recv_msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(recv_msg->src_id == b->id);
-+
-+ kdbus_msg_free(recv_msg);
-+
-+ ret = kdbus_msg_recv(c, NULL, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ free(msg);
-+ free(payload);
-+
-+ kdbus_conn_free(c);
-+ kdbus_conn_free(b);
-+ kdbus_conn_free(a);
-+
-+ return 0;
-+}
-+
-+int kdbus_test_message_quota(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *a, *b;
-+ uint64_t cookie = 0;
-+ int ret;
-+ int i;
-+
-+ ret = kdbus_test_activator_quota(env);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_test_notify_kernel_quota(env);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_test_pool_quota(env);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_test_expected_reply_quota(env);
-+ ASSERT_RETURN(ret == 0);
-+
-+ a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ b = kdbus_hello(env->buspath, 0, NULL, 0);
-+
-+ ret = kdbus_fill_conn_queue(b, a->id, KDBUS_CONN_MAX_MSGS);
-+ ASSERT_RETURN(ret == KDBUS_CONN_MAX_MSGS);
-+
-+ ret = kdbus_msg_send(b, NULL, ++cookie, 0, 0, 0, a->id);
-+ ASSERT_RETURN(ret == -ENOBUFS);
-+
-+ for (i = 0; i < KDBUS_CONN_MAX_MSGS; ++i) {
-+ ret = kdbus_msg_recv(a, NULL, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ }
-+
-+ ret = kdbus_msg_recv(a, NULL, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ ret = kdbus_fill_conn_queue(b, a->id, KDBUS_CONN_MAX_MSGS + 1);
-+ ASSERT_RETURN(ret == KDBUS_CONN_MAX_MSGS);
-+
-+ ret = kdbus_msg_send(b, NULL, ++cookie, 0, 0, 0, a->id);
-+ ASSERT_RETURN(ret == -ENOBUFS);
-+
-+ kdbus_conn_free(a);
-+ kdbus_conn_free(b);
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_memory_access(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *a, *b;
-+ struct kdbus_cmd_send cmd = {};
-+ struct kdbus_item *item;
-+ struct kdbus_msg *msg;
-+ uint64_t test_addr = 0;
-+ char line[256];
-+ uint64_t size;
-+ FILE *f;
-+ int ret;
-+
-+ /*
-+ * Search in /proc/kallsyms for the address of a kernel symbol that
-+ * should always be there, regardless of the config. Use that address
-+ * in a PAYLOAD_VEC item and make sure it's inaccessible.
-+ */
-+
-+ f = fopen("/proc/kallsyms", "r");
-+ if (!f)
-+ return TEST_SKIP;
-+
-+ while (fgets(line, sizeof(line), f)) {
-+ char *s = line;
-+
-+ if (!strsep(&s, " "))
-+ continue;
-+
-+ if (!strsep(&s, " "))
-+ continue;
-+
-+ if (!strncmp(s, "mutex_lock", 10)) {
-+ test_addr = strtoull(line, NULL, 16);
-+ break;
-+ }
-+ }
-+
-+ fclose(f);
-+
-+ if (!test_addr)
-+ return TEST_SKIP;
-+
-+ a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ b = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(a && b);
-+
-+ size = sizeof(struct kdbus_msg);
-+ size += KDBUS_ITEM_SIZE(sizeof(struct kdbus_vec));
-+
-+ msg = alloca(size);
-+ ASSERT_RETURN_VAL(msg, -ENOMEM);
-+
-+ memset(msg, 0, size);
-+ msg->size = size;
-+ msg->src_id = a->id;
-+ msg->dst_id = b->id;
-+ msg->payload_type = KDBUS_PAYLOAD_DBUS;
-+
-+ item = msg->items;
-+ item->type = KDBUS_ITEM_PAYLOAD_VEC;
-+ item->size = KDBUS_ITEM_HEADER_SIZE + sizeof(struct kdbus_vec);
-+ item->vec.address = test_addr;
-+ item->vec.size = sizeof(void*);
-+ item = KDBUS_ITEM_NEXT(item);
-+
-+ cmd.size = sizeof(cmd);
-+ cmd.msg_address = (uintptr_t)msg;
-+
-+ ret = kdbus_cmd_send(a->fd, &cmd);
-+ ASSERT_RETURN(ret == -EFAULT);
-+
-+ kdbus_conn_free(b);
-+ kdbus_conn_free(a);
-+
-+ return 0;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-metadata-ns.c b/tools/testing/selftests/kdbus/test-metadata-ns.c
-new file mode 100644
-index 0000000..1f6edc0
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-metadata-ns.c
-@@ -0,0 +1,500 @@
-+/*
-+ * Test metadata in new namespaces. Even if our tests can run
-+ * in a namespaced setup, this test is necessary so we can inspect
-+ * metadata on the same kdbusfs but between multiple namespaces
-+ */
-+
-+#include <stdio.h>
-+#include <string.h>
-+#include <sched.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <signal.h>
-+#include <sys/wait.h>
-+#include <sys/prctl.h>
-+#include <sys/eventfd.h>
-+#include <sys/syscall.h>
-+#include <sys/capability.h>
-+#include <linux/sched.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+static const struct kdbus_creds privileged_creds = {};
-+
-+static const struct kdbus_creds unmapped_creds = {
-+ .uid = UNPRIV_UID,
-+ .euid = UNPRIV_UID,
-+ .suid = UNPRIV_UID,
-+ .fsuid = UNPRIV_UID,
-+ .gid = UNPRIV_GID,
-+ .egid = UNPRIV_GID,
-+ .sgid = UNPRIV_GID,
-+ .fsgid = UNPRIV_GID,
-+};
-+
-+static const struct kdbus_pids unmapped_pids = {};
-+
-+/* Get only the first item */
-+static struct kdbus_item *kdbus_get_item(struct kdbus_msg *msg,
-+ uint64_t type)
-+{
-+ struct kdbus_item *item;
-+
-+ KDBUS_ITEM_FOREACH(item, msg, items)
-+ if (item->type == type)
-+ return item;
-+
-+ return NULL;
-+}
-+
-+static int kdbus_match_kdbus_creds(struct kdbus_msg *msg,
-+ const struct kdbus_creds *expected_creds)
-+{
-+ struct kdbus_item *item;
-+
-+ item = kdbus_get_item(msg, KDBUS_ITEM_CREDS);
-+ ASSERT_RETURN(item);
-+
-+ ASSERT_RETURN(memcmp(&item->creds, expected_creds,
-+ sizeof(struct kdbus_creds)) == 0);
-+
-+ return 0;
-+}
-+
-+static int kdbus_match_kdbus_pids(struct kdbus_msg *msg,
-+ const struct kdbus_pids *expected_pids)
-+{
-+ struct kdbus_item *item;
-+
-+ item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
-+ ASSERT_RETURN(item);
-+
-+ ASSERT_RETURN(memcmp(&item->pids, expected_pids,
-+ sizeof(struct kdbus_pids)) == 0);
-+
-+ return 0;
-+}
-+
-+static int __kdbus_clone_userns_test(const char *bus,
-+ struct kdbus_conn *conn,
-+ uint64_t grandpa_pid,
-+ int signal_fd)
-+{
-+ int clone_ret;
-+ int ret;
-+ struct kdbus_msg *msg = NULL;
-+ const struct kdbus_item *item;
-+ uint64_t cookie = time(NULL) ^ 0xdeadbeef;
-+ struct kdbus_conn *unpriv_conn = NULL;
-+ struct kdbus_pids parent_pids = {
-+ .pid = getppid(),
-+ .tid = getppid(),
-+ .ppid = grandpa_pid,
-+ };
-+
-+ ret = drop_privileges(UNPRIV_UID, UNPRIV_GID);
-+ ASSERT_EXIT(ret == 0);
-+
-+ unpriv_conn = kdbus_hello(bus, 0, NULL, 0);
-+ ASSERT_EXIT(unpriv_conn);
-+
-+ ret = kdbus_add_match_empty(unpriv_conn);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /*
-+ * ping privileged connection from this new unprivileged
-+ * one
-+ */
-+
-+ ret = kdbus_msg_send(unpriv_conn, NULL, cookie, 0, 0,
-+ 0, conn->id);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /*
-+ * Since we just dropped privileges, the dumpable flag
-+ * was just cleared which makes the /proc/$clone_child/uid_map
-+ * to be owned by root, hence any userns uid mapping will fail
-+ * with -EPERM since the mapping will be done by uid 65534.
-+ *
-+ * To avoid this set the dumpable flag again which makes
-+ * procfs update the /proc/$clone_child/ inodes owner to 65534.
-+ *
-+ * Using this we will be able write to /proc/$clone_child/uid_map
-+ * as uid 65534 and map the uid 65534 to 0 inside the user namespace.
-+ */
-+ ret = prctl(PR_SET_DUMPABLE, SUID_DUMP_USER);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /* Make child privileged in its new userns and run tests */
-+
-+ ret = RUN_CLONE_CHILD(&clone_ret,
-+ SIGCHLD | CLONE_NEWUSER | CLONE_NEWPID,
-+ ({ 0; /* Clone setup, nothing */ }),
-+ ({
-+ eventfd_t event_status = 0;
-+ struct kdbus_conn *userns_conn;
-+
-+ /* ping connection from the new user namespace */
-+ userns_conn = kdbus_hello(bus, 0, NULL, 0);
-+ ASSERT_EXIT(userns_conn);
-+
-+ ret = kdbus_add_match_empty(userns_conn);
-+ ASSERT_EXIT(ret == 0);
-+
-+ cookie++;
-+ ret = kdbus_msg_send(userns_conn, NULL, cookie,
-+ 0, 0, 0, conn->id);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /* Parent did send */
-+ ret = eventfd_read(signal_fd, &event_status);
-+ ASSERT_RETURN(ret >= 0 && event_status == 1);
-+
-+ /*
-+ * Receive from privileged connection
-+ */
-+ kdbus_printf("Privileged → unprivileged/privileged "
-+ "in its userns "
-+ "(different userns and pidns):\n");
-+ ret = kdbus_msg_recv_poll(userns_conn, 300, &msg, NULL);
-+ ASSERT_EXIT(ret == 0);
-+ ASSERT_EXIT(msg->dst_id == userns_conn->id);
-+
-+ item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
-+ ASSERT_EXIT(item);
-+
-+ /* uid/gid not mapped, so we have unpriv cached creds */
-+ ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /*
-+ * Different pid namespaces. This is the child pidns
-+ * so it should not see its parent kdbus_pids
-+ */
-+ ret = kdbus_match_kdbus_pids(msg, &unmapped_pids);
-+ ASSERT_EXIT(ret == 0);
-+
-+ kdbus_msg_free(msg);
-+
-+
-+ /*
-+ * Receive broadcast from privileged connection
-+ */
-+ kdbus_printf("Privileged → unprivileged/privileged "
-+ "in its userns "
-+ "(different userns and pidns):\n");
-+ ret = kdbus_msg_recv_poll(userns_conn, 300, &msg, NULL);
-+ ASSERT_EXIT(ret == 0);
-+ ASSERT_EXIT(msg->dst_id == KDBUS_DST_ID_BROADCAST);
-+
-+ item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
-+ ASSERT_EXIT(item);
-+
-+ /* uid/gid not mapped, so we have unpriv cached creds */
-+ ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /*
-+ * Different pid namespaces. This is the child pidns
-+ * so it should not see its parent kdbus_pids
-+ */
-+ ret = kdbus_match_kdbus_pids(msg, &unmapped_pids);
-+ ASSERT_EXIT(ret == 0);
-+
-+ kdbus_msg_free(msg);
-+
-+ kdbus_conn_free(userns_conn);
-+ }),
-+ ({
-+ /* Parent setup map child uid/gid */
-+ ret = userns_map_uid_gid(pid, "0 65534 1", "0 65534 1");
-+ ASSERT_EXIT(ret == 0);
-+ }),
-+ ({ 0; }));
-+ /* Unprivileged was not able to create user namespace */
-+ if (clone_ret == -EPERM) {
-+ kdbus_printf("-- CLONE_NEWUSER TEST Failed for "
-+ "uid: %u\n -- Make sure that your kernel "
-+ "do not allow CLONE_NEWUSER for "
-+ "unprivileged users\n", UNPRIV_UID);
-+ ret = 0;
-+ goto out;
-+ }
-+
-+ ASSERT_EXIT(ret == 0);
-+
-+
-+ /*
-+ * Receive from privileged connection
-+ */
-+ kdbus_printf("\nPrivileged → unprivileged (same namespaces):\n");
-+ ret = kdbus_msg_recv_poll(unpriv_conn, 300, &msg, NULL);
-+
-+ ASSERT_EXIT(ret == 0);
-+ ASSERT_EXIT(msg->dst_id == unpriv_conn->id);
-+
-+ /* will get the privileged creds */
-+ ret = kdbus_match_kdbus_creds(msg, &privileged_creds);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /* Same pidns so will get the kdbus_pids */
-+ ret = kdbus_match_kdbus_pids(msg, &parent_pids);
-+ ASSERT_RETURN(ret == 0);
-+
-+ kdbus_msg_free(msg);
-+
-+
-+ /*
-+ * Receive broadcast from privileged connection
-+ */
-+ kdbus_printf("\nPrivileged → unprivileged (same namespaces):\n");
-+ ret = kdbus_msg_recv_poll(unpriv_conn, 300, &msg, NULL);
-+
-+ ASSERT_EXIT(ret == 0);
-+ ASSERT_EXIT(msg->dst_id == KDBUS_DST_ID_BROADCAST);
-+
-+ /* will get the privileged creds */
-+ ret = kdbus_match_kdbus_creds(msg, &privileged_creds);
-+ ASSERT_EXIT(ret == 0);
-+
-+ ret = kdbus_match_kdbus_pids(msg, &parent_pids);
-+ ASSERT_RETURN(ret == 0);
-+
-+ kdbus_msg_free(msg);
-+
-+out:
-+ kdbus_conn_free(unpriv_conn);
-+
-+ return ret;
-+}
-+
-+static int kdbus_clone_userns_test(const char *bus,
-+ struct kdbus_conn *conn)
-+{
-+ int ret, status, efd;
-+ pid_t pid, ppid;
-+ uint64_t unpriv_conn_id, userns_conn_id;
-+ struct kdbus_msg *msg;
-+ const struct kdbus_item *item;
-+ struct kdbus_pids expected_pids;
-+ struct kdbus_conn *monitor;
-+
-+ kdbus_printf("STARTING TEST 'metadata-ns'.\n");
-+
-+ monitor = kdbus_hello(bus, KDBUS_HELLO_MONITOR, NULL, 0);
-+ ASSERT_EXIT(monitor);
-+
-+ /*
-+ * parent will signal to child that is in its
-+ * userns to read its queue
-+ */
-+ efd = eventfd(0, EFD_CLOEXEC);
-+ ASSERT_RETURN_VAL(efd >= 0, efd);
-+
-+ ppid = getppid();
-+
-+ pid = fork();
-+ ASSERT_RETURN_VAL(pid >= 0, -errno);
-+
-+ if (pid == 0) {
-+ ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
-+ ASSERT_EXIT_VAL(ret == 0, -errno);
-+
-+ ret = __kdbus_clone_userns_test(bus, conn, ppid, efd);
-+ _exit(ret);
-+ }
-+
-+
-+ /* Phase 1) privileged receives from unprivileged */
-+
-+ /*
-+ * Receive from the unprivileged child
-+ */
-+ kdbus_printf("\nUnprivileged → privileged (same namespaces):\n");
-+ ret = kdbus_msg_recv_poll(conn, 300, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ unpriv_conn_id = msg->src_id;
-+
-+ /* Unprivileged user */
-+ ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Set the expected creds_pids */
-+ expected_pids = (struct kdbus_pids) {
-+ .pid = pid,
-+ .tid = pid,
-+ .ppid = getpid(),
-+ };
-+ ret = kdbus_match_kdbus_pids(msg, &expected_pids);
-+ ASSERT_RETURN(ret == 0);
-+
-+ kdbus_msg_free(msg);
-+
-+
-+ /*
-+ * Receive from the unprivileged that is in his own
-+ * userns and pidns
-+ */
-+
-+ kdbus_printf("\nUnprivileged/privileged in its userns → privileged "
-+ "(different userns and pidns)\n");
-+ ret = kdbus_msg_recv_poll(conn, 300, &msg, NULL);
-+ if (ret == -ETIMEDOUT)
-+ /* perhaps unprivileged userns is not allowed */
-+ goto wait;
-+
-+ ASSERT_RETURN(ret == 0);
-+
-+ userns_conn_id = msg->src_id;
-+
-+ item = kdbus_get_item(msg, KDBUS_ITEM_CAPS);
-+ ASSERT_RETURN(item);
-+
-+ /*
-+ * Compare received items, creds must be translated into
-+ * the receiver user namespace, so the user is unprivileged
-+ */
-+ ret = kdbus_match_kdbus_creds(msg, &unmapped_creds);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * We should have the kdbus_pids since we are the parent
-+ * pidns
-+ */
-+ item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
-+ ASSERT_RETURN(item);
-+
-+ ASSERT_RETURN(memcmp(&item->pids, &unmapped_pids,
-+ sizeof(struct kdbus_pids)) != 0);
-+
-+ /*
-+ * Parent pid of the unprivileged/privileged in its userns
-+ * is the unprivileged child pid that was forked here.
-+ */
-+ ASSERT_RETURN((uint64_t)pid == item->pids.ppid);
-+
-+ kdbus_msg_free(msg);
-+
-+
-+ /* Phase 2) Privileged connection sends now 3 packets */
-+
-+ /*
-+ * Sending to unprivileged connections a unicast
-+ */
-+ ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
-+ 0, unpriv_conn_id);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* signal to child that is in its userns */
-+ ret = eventfd_write(efd, 1);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /*
-+ * Sending to unprivileged/privileged in its userns
-+ * connections a unicast
-+ */
-+ ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
-+ 0, userns_conn_id);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Sending to unprivileged connections a broadcast
-+ */
-+ ret = kdbus_msg_send(conn, NULL, 0xdeadbeef, 0, 0,
-+ 0, KDBUS_DST_ID_BROADCAST);
-+ ASSERT_RETURN(ret == 0);
-+
-+
-+wait:
-+ ret = waitpid(pid, &status, 0);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ASSERT_RETURN(WIFEXITED(status));
-+ ASSERT_RETURN(!WEXITSTATUS(status));
-+
-+ /* Dump monitor queue */
-+ kdbus_printf("\n\nMonitor queue:\n");
-+ for (;;) {
-+ ret = kdbus_msg_recv_poll(monitor, 100, &msg, NULL);
-+ if (ret < 0)
-+ break;
-+
-+ if (msg->payload_type == KDBUS_PAYLOAD_DBUS) {
-+ /*
-+ * Parent pidns should see all the
-+ * pids
-+ */
-+ item = kdbus_get_item(msg, KDBUS_ITEM_PIDS);
-+ ASSERT_RETURN(item);
-+
-+ ASSERT_RETURN(item->pids.pid != 0 &&
-+ item->pids.tid != 0 &&
-+ item->pids.ppid != 0);
-+ }
-+
-+ kdbus_msg_free(msg);
-+ }
-+
-+ kdbus_conn_free(monitor);
-+ close(efd);
-+
-+ return 0;
-+}
-+
-+int kdbus_test_metadata_ns(struct kdbus_test_env *env)
-+{
-+ int ret;
-+ struct kdbus_conn *holder, *conn;
-+ struct kdbus_policy_access policy_access = {
-+ /* Allow world so we can inspect metadata in namespace */
-+ .type = KDBUS_POLICY_ACCESS_WORLD,
-+ .id = geteuid(),
-+ .access = KDBUS_POLICY_TALK,
-+ };
-+
-+ /*
-+ * We require user-namespaces and all uids/gids
-+ * should be mapped (we can just require the necessary ones)
-+ */
-+ if (!config_user_ns_is_enabled() ||
-+ !all_uids_gids_are_mapped())
-+ return TEST_SKIP;
-+
-+ ret = test_is_capable(CAP_SETUID, CAP_SETGID, CAP_SYS_ADMIN, -1);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /* not enough privileges, SKIP test */
-+ if (!ret)
-+ return TEST_SKIP;
-+
-+ holder = kdbus_hello_registrar(env->buspath, "com.example.metadata",
-+ &policy_access, 1,
-+ KDBUS_HELLO_POLICY_HOLDER);
-+ ASSERT_RETURN(holder);
-+
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn);
-+
-+ ret = kdbus_add_match_empty(conn);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_name_acquire(conn, "com.example.metadata", NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ ret = kdbus_clone_userns_test(env->buspath, conn);
-+ ASSERT_RETURN(ret == 0);
-+
-+ kdbus_conn_free(holder);
-+ kdbus_conn_free(conn);
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-monitor.c b/tools/testing/selftests/kdbus/test-monitor.c
-new file mode 100644
-index 0000000..e00d738
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-monitor.c
-@@ -0,0 +1,176 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <stdbool.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <signal.h>
-+#include <sys/time.h>
-+#include <sys/mman.h>
-+#include <sys/capability.h>
-+#include <sys/wait.h>
-+
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+int kdbus_test_monitor(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *monitor, *conn;
-+ unsigned int cookie = 0xdeadbeef;
-+ struct kdbus_msg *msg;
-+ uint64_t offset = 0;
-+ int ret;
-+
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn);
-+
-+ /* add matches to make sure the monitor does not trigger an item add or
-+ * remove on connect and disconnect, respectively.
-+ */
-+ ret = kdbus_add_match_id(conn, 0x1, KDBUS_ITEM_ID_ADD,
-+ KDBUS_MATCH_ID_ANY);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_add_match_id(conn, 0x2, KDBUS_ITEM_ID_REMOVE,
-+ KDBUS_MATCH_ID_ANY);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* register a monitor */
-+ monitor = kdbus_hello(env->buspath, KDBUS_HELLO_MONITOR, NULL, 0);
-+ ASSERT_RETURN(monitor);
-+
-+ /* make sure we did not receive a monitor connect notification */
-+ ret = kdbus_msg_recv(conn, &msg, &offset);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ /* check that a monitor cannot acquire a name */
-+ ret = kdbus_name_acquire(monitor, "foo.bar.baz", NULL);
-+ ASSERT_RETURN(ret == -EOPNOTSUPP);
-+
-+ ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0, conn->id);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* the recipient should have gotten the message */
-+ ret = kdbus_msg_recv(conn, &msg, &offset);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->cookie == cookie);
-+ kdbus_msg_free(msg);
-+ kdbus_free(conn, offset);
-+
-+ /* and so should the monitor */
-+ ret = kdbus_msg_recv(monitor, &msg, &offset);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->cookie == cookie);
-+
-+ kdbus_msg_free(msg);
-+ kdbus_free(monitor, offset);
-+
-+ /* Installing matches for monitors must fail */
-+ ret = kdbus_add_match_empty(monitor);
-+ ASSERT_RETURN(ret == -EOPNOTSUPP);
-+
-+ cookie++;
-+ ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
-+ KDBUS_DST_ID_BROADCAST);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* The monitor should get the message. */
-+ ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->cookie == cookie);
-+
-+ kdbus_msg_free(msg);
-+ kdbus_free(monitor, offset);
-+
-+ /*
-+ * Since we are the only monitor, update the attach flags
-+ * and tell we are not interested in attach flags recv
-+ */
-+
-+ ret = kdbus_conn_update_attach_flags(monitor,
-+ _KDBUS_ATTACH_ALL,
-+ 0);
-+ ASSERT_RETURN(ret == 0);
-+
-+ cookie++;
-+ ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
-+ KDBUS_DST_ID_BROADCAST);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->cookie == cookie);
-+
-+ ret = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
-+ ASSERT_RETURN(ret == 0);
-+
-+ kdbus_msg_free(msg);
-+ kdbus_free(monitor, offset);
-+
-+ /*
-+ * Now we are interested in KDBUS_ITEM_TIMESTAMP and
-+ * KDBUS_ITEM_CREDS
-+ */
-+ ret = kdbus_conn_update_attach_flags(monitor,
-+ _KDBUS_ATTACH_ALL,
-+ KDBUS_ATTACH_TIMESTAMP |
-+ KDBUS_ATTACH_CREDS);
-+ ASSERT_RETURN(ret == 0);
-+
-+ cookie++;
-+ ret = kdbus_msg_send(env->conn, NULL, cookie, 0, 0, 0,
-+ KDBUS_DST_ID_BROADCAST);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv_poll(monitor, 100, &msg, &offset);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->cookie == cookie);
-+
-+ ret = kdbus_item_in_message(msg, KDBUS_ITEM_TIMESTAMP);
-+ ASSERT_RETURN(ret == 1);
-+
-+ ret = kdbus_item_in_message(msg, KDBUS_ITEM_CREDS);
-+ ASSERT_RETURN(ret == 1);
-+
-+ /* the KDBUS_ITEM_PID_COMM was not requested */
-+ ret = kdbus_item_in_message(msg, KDBUS_ITEM_PID_COMM);
-+ ASSERT_RETURN(ret == 0);
-+
-+ kdbus_msg_free(msg);
-+ kdbus_free(monitor, offset);
-+
-+ kdbus_conn_free(monitor);
-+ /* make sure we did not receive a monitor disconnect notification */
-+ ret = kdbus_msg_recv(conn, &msg, &offset);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ kdbus_conn_free(conn);
-+
-+ /* Make sure that monitor as unprivileged is not allowed */
-+ ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ if (ret && all_uids_gids_are_mapped()) {
-+ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_UID, ({
-+ monitor = kdbus_hello(env->buspath,
-+ KDBUS_HELLO_MONITOR,
-+ NULL, 0);
-+ ASSERT_EXIT(!monitor && errno == EPERM);
-+
-+ _exit(EXIT_SUCCESS);
-+ }),
-+ ({ 0; }));
-+ ASSERT_RETURN(ret == 0);
-+ }
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-names.c b/tools/testing/selftests/kdbus/test-names.c
-new file mode 100644
-index 0000000..e400dc8
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-names.c
-@@ -0,0 +1,272 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <limits.h>
-+#include <getopt.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+#include "kdbus-test.h"
-+
-+struct test_name {
-+ const char *name;
-+ __u64 owner_id;
-+ __u64 flags;
-+};
-+
-+static bool conn_test_names(const struct kdbus_conn *conn,
-+ const struct test_name *tests,
-+ unsigned int n_tests)
-+{
-+ struct kdbus_cmd_list cmd_list = {};
-+ struct kdbus_info *name, *list;
-+ unsigned int i;
-+ int ret;
-+
-+ cmd_list.size = sizeof(cmd_list);
-+ cmd_list.flags = KDBUS_LIST_NAMES |
-+ KDBUS_LIST_ACTIVATORS |
-+ KDBUS_LIST_QUEUED;
-+
-+ ret = kdbus_cmd_list(conn->fd, &cmd_list);
-+ ASSERT_RETURN(ret == 0);
-+
-+ list = (struct kdbus_info *)(conn->buf + cmd_list.offset);
-+
-+ for (i = 0; i < n_tests; i++) {
-+ const struct test_name *t = tests + i;
-+ bool found = false;
-+
-+ KDBUS_FOREACH(name, list, cmd_list.list_size) {
-+ struct kdbus_item *item;
-+
-+ KDBUS_ITEM_FOREACH(item, name, items) {
-+ if (item->type != KDBUS_ITEM_OWNED_NAME ||
-+ strcmp(item->name.name, t->name) != 0)
-+ continue;
-+
-+ if (t->owner_id == name->id &&
-+ t->flags == item->name.flags) {
-+ found = true;
-+ break;
-+ }
-+ }
-+ }
-+
-+ if (!found)
-+ return false;
-+ }
-+
-+ return true;
-+}
-+
-+static bool conn_is_name_primary_owner(const struct kdbus_conn *conn,
-+ const char *needle)
-+{
-+ struct test_name t = {
-+ .name = needle,
-+ .owner_id = conn->id,
-+ .flags = KDBUS_NAME_PRIMARY,
-+ };
-+
-+ return conn_test_names(conn, &t, 1);
-+}
-+
-+int kdbus_test_name_basic(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *conn;
-+ char *name, *dot_name, *invalid_name, *wildcard_name;
-+ int ret;
-+
-+ name = "foo.bla.blaz";
-+ dot_name = ".bla.blaz";
-+ invalid_name = "foo";
-+ wildcard_name = "foo.bla.bl.*";
-+
-+ /* create a 2nd connection */
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn != NULL);
-+
-+ /* acquire name "foo.bar.xxx" */
-+ ret = kdbus_name_acquire(conn, "foo.bar.xxx", NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Name is not valid, must fail */
-+ ret = kdbus_name_acquire(env->conn, dot_name, NULL);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ ret = kdbus_name_acquire(env->conn, invalid_name, NULL);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ ret = kdbus_name_acquire(env->conn, wildcard_name, NULL);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ /* check that we can acquire a name */
-+ ret = kdbus_name_acquire(env->conn, name, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = conn_is_name_primary_owner(env->conn, name);
-+ ASSERT_RETURN(ret == true);
-+
-+ /* ... and release it again */
-+ ret = kdbus_name_release(env->conn, name);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = conn_is_name_primary_owner(env->conn, name);
-+ ASSERT_RETURN(ret == false);
-+
-+ /* check that we can't release it again */
-+ ret = kdbus_name_release(env->conn, name);
-+ ASSERT_RETURN(ret == -ESRCH);
-+
-+ /* check that we can't release a name that we don't own */
-+ ret = kdbus_name_release(env->conn, "foo.bar.xxx");
-+ ASSERT_RETURN(ret == -EADDRINUSE);
-+
-+ /* Name is not valid, must fail */
-+ ret = kdbus_name_release(env->conn, dot_name);
-+ ASSERT_RETURN(ret == -ESRCH);
-+
-+ ret = kdbus_name_release(env->conn, invalid_name);
-+ ASSERT_RETURN(ret == -ESRCH);
-+
-+ ret = kdbus_name_release(env->conn, wildcard_name);
-+ ASSERT_RETURN(ret == -ESRCH);
-+
-+ kdbus_conn_free(conn);
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_name_conflict(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *conn;
-+ char *name;
-+ int ret;
-+
-+ name = "foo.bla.blaz";
-+
-+ /* create a 2nd connection */
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn != NULL);
-+
-+ /* allow the new connection to own the same name */
-+ /* acquire name from the 1st connection */
-+ ret = kdbus_name_acquire(env->conn, name, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = conn_is_name_primary_owner(env->conn, name);
-+ ASSERT_RETURN(ret == true);
-+
-+ /* check that we also can't acquire it again from the 2nd connection */
-+ ret = kdbus_name_acquire(conn, name, NULL);
-+ ASSERT_RETURN(ret == -EEXIST);
-+
-+ kdbus_conn_free(conn);
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_name_queue(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *conn;
-+ struct test_name t[2];
-+ const char *name;
-+ uint64_t flags;
-+ int ret;
-+
-+ name = "foo.bla.blaz";
-+
-+ flags = 0;
-+
-+ /* create a 2nd connection */
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn != NULL);
-+
-+ /* allow the new connection to own the same name */
-+ /* acquire name from the 1st connection */
-+ ret = kdbus_name_acquire(env->conn, name, &flags);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = conn_is_name_primary_owner(env->conn, name);
-+ ASSERT_RETURN(ret == true);
-+
-+ /* queue the 2nd connection as waiting owner */
-+ flags = KDBUS_NAME_QUEUE;
-+ ret = kdbus_name_acquire(conn, name, &flags);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(flags & KDBUS_NAME_IN_QUEUE);
-+
-+ t[0].name = name;
-+ t[0].owner_id = env->conn->id;
-+ t[0].flags = KDBUS_NAME_PRIMARY;
-+ t[1].name = name;
-+ t[1].owner_id = conn->id;
-+ t[1].flags = KDBUS_NAME_QUEUE | KDBUS_NAME_IN_QUEUE;
-+ ret = conn_test_names(conn, t, 2);
-+ ASSERT_RETURN(ret == true);
-+
-+ /* release name from 1st connection */
-+ ret = kdbus_name_release(env->conn, name);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* now the name should be owned by the 2nd connection */
-+ t[0].name = name;
-+ t[0].owner_id = conn->id;
-+ t[0].flags = KDBUS_NAME_PRIMARY | KDBUS_NAME_QUEUE;
-+ ret = conn_test_names(conn, t, 1);
-+ ASSERT_RETURN(ret == true);
-+
-+ kdbus_conn_free(conn);
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_name_takeover(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *conn;
-+ struct test_name t;
-+ const char *name;
-+ uint64_t flags;
-+ int ret;
-+
-+ name = "foo.bla.blaz";
-+
-+ flags = KDBUS_NAME_ALLOW_REPLACEMENT;
-+
-+ /* create a 2nd connection */
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn != NULL);
-+
-+ /* acquire name for 1st connection */
-+ ret = kdbus_name_acquire(env->conn, name, &flags);
-+ ASSERT_RETURN(ret == 0);
-+
-+ t.name = name;
-+ t.owner_id = env->conn->id;
-+ t.flags = KDBUS_NAME_ALLOW_REPLACEMENT | KDBUS_NAME_PRIMARY;
-+ ret = conn_test_names(conn, &t, 1);
-+ ASSERT_RETURN(ret == true);
-+
-+ /* now steal name with 2nd connection */
-+ flags = KDBUS_NAME_REPLACE_EXISTING;
-+ ret = kdbus_name_acquire(conn, name, &flags);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(flags & KDBUS_NAME_ACQUIRED);
-+
-+ ret = conn_is_name_primary_owner(conn, name);
-+ ASSERT_RETURN(ret == true);
-+
-+ kdbus_conn_free(conn);
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-policy-ns.c b/tools/testing/selftests/kdbus/test-policy-ns.c
-new file mode 100644
-index 0000000..3437012
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-policy-ns.c
-@@ -0,0 +1,632 @@
-+/*
-+ * Test metadata and policies in new namespaces. Even if our tests
-+ * can run in a namespaced setup, this test is necessary so we can
-+ * inspect policies on the same kdbusfs but between multiple
-+ * namespaces.
-+ *
-+ * Copyright (C) 2014-2015 Djalal Harouni
-+ *
-+ * kdbus is free software; you can redistribute it and/or modify it under
-+ * the terms of the GNU Lesser General Public License as published by the
-+ * Free Software Foundation; either version 2.1 of the License, or (at
-+ * your option) any later version.
-+ */
-+
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <pthread.h>
-+#include <sched.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <stdint.h>
-+#include <stdbool.h>
-+#include <unistd.h>
-+#include <errno.h>
-+#include <signal.h>
-+#include <sys/wait.h>
-+#include <sys/prctl.h>
-+#include <sys/eventfd.h>
-+#include <sys/syscall.h>
-+#include <sys/capability.h>
-+#include <linux/sched.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+#define MAX_CONN 64
-+#define POLICY_NAME "foo.test.policy-test"
-+
-+#define KDBUS_CONN_MAX_MSGS_PER_USER 16
-+
-+/**
-+ * Note: this test can be used to inspect policy_db->talk_access_hash
-+ *
-+ * The purpose of these tests:
-+ * 1) Check KDBUS_POLICY_TALK
-+ * 2) Check the cache state: kdbus_policy_db->talk_access_hash
-+ * Should be extended
-+ */
-+
-+/**
-+ * Check a list of connections against conn_db[0]
-+ * conn_db[0] will own the name "foo.test.policy-test" and the
-+ * policy holder connection for this name will update the policy
-+ * entries, so different use cases can be tested.
-+ */
-+static struct kdbus_conn **conn_db;
-+
-+static void *kdbus_recv_echo(void *ptr)
-+{
-+ int ret;
-+ struct kdbus_conn *conn = ptr;
-+
-+ ret = kdbus_msg_recv_poll(conn, 200, NULL, NULL);
-+
-+ return (void *)(long)ret;
-+}
-+
-+/* Trigger kdbus_policy_set() */
-+static int kdbus_set_policy_talk(struct kdbus_conn *conn,
-+ const char *name,
-+ uid_t id, unsigned int type)
-+{
-+ int ret;
-+ struct kdbus_policy_access access = {
-+ .type = type,
-+ .id = id,
-+ .access = KDBUS_POLICY_TALK,
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn, name, &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ return TEST_OK;
-+}
-+
-+/* return TEST_OK or TEST_ERR on failure */
-+static int kdbus_register_same_activator(char *bus, const char *name,
-+ struct kdbus_conn **c)
-+{
-+ int ret;
-+ struct kdbus_conn *activator;
-+
-+ activator = kdbus_hello_activator(bus, name, NULL, 0);
-+ if (activator) {
-+ *c = activator;
-+ fprintf(stderr, "--- error was able to register name twice '%s'.\n",
-+ name);
-+ return TEST_ERR;
-+ }
-+
-+ ret = -errno;
-+ /* -EEXIST means test succeeded */
-+ if (ret == -EEXIST)
-+ return TEST_OK;
-+
-+ return TEST_ERR;
-+}
-+
-+/* return TEST_OK or TEST_ERR on failure */
-+static int kdbus_register_policy_holder(char *bus, const char *name,
-+ struct kdbus_conn **conn)
-+{
-+ struct kdbus_conn *c;
-+ struct kdbus_policy_access access[2];
-+
-+ access[0].type = KDBUS_POLICY_ACCESS_USER;
-+ access[0].access = KDBUS_POLICY_OWN;
-+ access[0].id = geteuid();
-+
-+ access[1].type = KDBUS_POLICY_ACCESS_WORLD;
-+ access[1].access = KDBUS_POLICY_TALK;
-+ access[1].id = geteuid();
-+
-+ c = kdbus_hello_registrar(bus, name, access, 2,
-+ KDBUS_HELLO_POLICY_HOLDER);
-+ ASSERT_RETURN(c);
-+
-+ *conn = c;
-+
-+ return TEST_OK;
-+}
-+
-+/**
-+ * Create new threads for receiving from multiple senders.
-+ * The 'conn_db' will be populated by newly created connections.
-+ * Caller should free all allocated connections.
-+ *
-+ * return 0 on success, negative errno on failure.
-+ */
-+static int kdbus_recv_in_threads(const char *bus, const char *name,
-+ struct kdbus_conn **conn_db)
-+{
-+ int ret;
-+ bool pool_full = false;
-+ unsigned int sent_packets = 0;
-+ unsigned int lost_packets = 0;
-+ unsigned int i, tid;
-+ unsigned long dst_id;
-+ unsigned long cookie = 1;
-+ unsigned int thread_nr = MAX_CONN - 1;
-+ pthread_t thread_id[MAX_CONN - 1] = {'\0'};
-+
-+ dst_id = name ? KDBUS_DST_ID_NAME : conn_db[0]->id;
-+
-+ for (tid = 0, i = 1; tid < thread_nr; tid++, i++) {
-+ ret = pthread_create(&thread_id[tid], NULL,
-+ kdbus_recv_echo, (void *)conn_db[0]);
-+ if (ret < 0) {
-+ ret = -errno;
-+ kdbus_printf("error pthread_create: %d (%m)\n",
-+ ret);
-+ break;
-+ }
-+
-+ /* just free before re-using */
-+ kdbus_conn_free(conn_db[i]);
-+ conn_db[i] = NULL;
-+
-+ /* We need to create connections here */
-+ conn_db[i] = kdbus_hello(bus, 0, NULL, 0);
-+ if (!conn_db[i]) {
-+ ret = -errno;
-+ break;
-+ }
-+
-+ ret = kdbus_add_match_empty(conn_db[i]);
-+ if (ret < 0)
-+ break;
-+
-+ ret = kdbus_msg_send(conn_db[i], name, cookie++,
-+ 0, 0, 0, dst_id);
-+ if (ret < 0) {
-+ /*
-+ * Receivers are not reading their messages,
-+ * not scheduled ?!
-+ *
-+ * So set the pool full here, perhaps the
-+ * connection pool or queue was full, later
-+ * recheck receivers errors
-+ */
-+ if (ret == -ENOBUFS || ret == -EXFULL)
-+ pool_full = true;
-+ break;
-+ }
-+
-+ sent_packets++;
-+ }
-+
-+ for (tid = 0; tid < thread_nr; tid++) {
-+ int thread_ret = 0;
-+
-+ if (thread_id[tid]) {
-+ pthread_join(thread_id[tid], (void *)&thread_ret);
-+ if (thread_ret < 0) {
-+ /* Update only if send did not fail */
-+ if (ret == 0)
-+ ret = thread_ret;
-+
-+ lost_packets++;
-+ }
-+ }
-+ }
-+
-+ /*
-+ * If sending failed with -ENOBUFS or -EXFULL, then
-+ * lost_packets must be non-zero and sent_packets must be
-+ * at least KDBUS_CONN_MAX_MSGS_PER_USER
-+ */
-+ if (pool_full) {
-+ ASSERT_RETURN(lost_packets > 0);
-+
-+ /*
-+ * We should at least send KDBUS_CONN_MAX_MSGS_PER_USER
-+ *
-+ * For every send operation we create a thread to
-+ * recv the packet, so we keep the queue clean
-+ */
-+ ASSERT_RETURN(sent_packets >= KDBUS_CONN_MAX_MSGS_PER_USER);
-+
-+ /*
-+ * Set ret to zero since we only failed due to
-+ * the receiving threads that have not been
-+ * scheduled
-+ */
-+ ret = 0;
-+ }
-+
-+ return ret;
-+}
-+
-+/* Return: TEST_OK or TEST_ERR on failure */
-+static int kdbus_normal_test(const char *bus, const char *name,
-+ struct kdbus_conn **conn_db)
-+{
-+ int ret;
-+
-+ ret = kdbus_recv_in_threads(bus, name, conn_db);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ return TEST_OK;
-+}
-+
-+static int kdbus_fork_test_by_id(const char *bus,
-+ struct kdbus_conn **conn_db,
-+ int parent_status, int child_status)
-+{
-+ int ret;
-+ pid_t pid;
-+ uint64_t cookie = 0x9876ecba;
-+ struct kdbus_msg *msg = NULL;
-+ uint64_t offset = 0;
-+ int status = 0;
-+
-+ /*
-+ * If the child_status is not EXIT_SUCCESS, then we expect
-+ * that sending from the child will fail, thus receiving
-+ * from parent must error with -ETIMEDOUT, and vice versa.
-+ */
-+ bool parent_timedout = !!child_status;
-+ bool child_timedout = !!parent_status;
-+
-+ pid = fork();
-+ ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+ if (pid == 0) {
-+ struct kdbus_conn *conn_src;
-+
-+ ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
-+ ASSERT_EXIT(ret == 0);
-+
-+ ret = drop_privileges(65534, 65534);
-+ ASSERT_EXIT(ret == 0);
-+
-+ conn_src = kdbus_hello(bus, 0, NULL, 0);
-+ ASSERT_EXIT(conn_src);
-+
-+ ret = kdbus_add_match_empty(conn_src);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /*
-+ * child_status is always checked against send
-+ * operations, in case it fails always return
-+ * EXIT_FAILURE.
-+ */
-+ ret = kdbus_msg_send(conn_src, NULL, cookie,
-+ 0, 0, 0, conn_db[0]->id);
-+ ASSERT_EXIT(ret == child_status);
-+
-+ ret = kdbus_msg_recv_poll(conn_src, 100, NULL, NULL);
-+
-+ kdbus_conn_free(conn_src);
-+
-+ /*
-+ * Child kdbus_msg_recv_poll() should timeout since
-+ * the parent_status was set to a non EXIT_SUCCESS
-+ * value.
-+ */
-+ if (child_timedout)
-+ _exit(ret == -ETIMEDOUT ? EXIT_SUCCESS : EXIT_FAILURE);
-+
-+ _exit(ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
-+ }
-+
-+ ret = kdbus_msg_recv_poll(conn_db[0], 100, &msg, &offset);
-+ /*
-+ * If parent_timedout is set then this should fail with
-+ * -ETIMEDOUT since the child_status was set to a non
-+ * EXIT_SUCCESS value. Otherwise, assume
-+ * that kdbus_msg_recv_poll() has succeeded.
-+ */
-+ if (parent_timedout) {
-+ ASSERT_RETURN_VAL(ret == -ETIMEDOUT, TEST_ERR);
-+
-+ /* timed out, no need to continue: we don't have the
-+ * child connection ID, so just terminate. */
-+ goto out;
-+ } else {
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+ }
-+
-+ ret = kdbus_msg_send(conn_db[0], NULL, ++cookie,
-+ 0, 0, 0, msg->src_id);
-+ /*
-+ * parent_status is checked against send operations,
-+ * on failures always return TEST_ERR.
-+ */
-+ ASSERT_RETURN_VAL(ret == parent_status, TEST_ERR);
-+
-+ kdbus_msg_free(msg);
-+ kdbus_free(conn_db[0], offset);
-+
-+out:
-+ ret = waitpid(pid, &status, 0);
-+ ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+/*
-+ * Return: TEST_OK, TEST_ERR or TEST_SKIP
-+ * we return TEST_OK only if the children return with the expected
-+ * 'expected_status' that is specified as an argument.
-+ */
-+static int kdbus_fork_test(const char *bus, const char *name,
-+ struct kdbus_conn **conn_db, int expected_status)
-+{
-+ pid_t pid;
-+ int ret = 0;
-+ int status = 0;
-+
-+ pid = fork();
-+ ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+ if (pid == 0) {
-+ ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
-+ ASSERT_EXIT(ret == 0);
-+
-+ ret = drop_privileges(65534, 65534);
-+ ASSERT_EXIT(ret == 0);
-+
-+ ret = kdbus_recv_in_threads(bus, name, conn_db);
-+ _exit(ret == expected_status ? EXIT_SUCCESS : EXIT_FAILURE);
-+ }
-+
-+ ret = waitpid(pid, &status, 0);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+/* Return EXIT_SUCCESS, EXIT_FAILURE or negative errno */
-+static int __kdbus_clone_userns_test(const char *bus,
-+ const char *name,
-+ struct kdbus_conn **conn_db,
-+ int expected_status)
-+{
-+ int efd;
-+ pid_t pid;
-+ int ret = 0;
-+ unsigned int uid = 65534;
-+ int status;
-+
-+ ret = drop_privileges(uid, uid);
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+ /*
-+ * Since we just dropped privileges, the dumpable flag was just
-+ * cleared which makes the /proc/$clone_child/uid_map to be
-+ * owned by root, hence any userns uid mapping will fail with
-+ * -EPERM since the mapping will be done by uid 65534.
-+ *
-+ * To avoid this set the dumpable flag again which makes procfs
-+ * update the /proc/$clone_child/ inodes owner to 65534.
-+ *
-+ * Using this we will be able write to /proc/$clone_child/uid_map
-+ * as uid 65534 and map the uid 65534 to 0 inside the user
-+ * namespace.
-+ */
-+ ret = prctl(PR_SET_DUMPABLE, SUID_DUMP_USER);
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+ /* sync parent/child */
-+ efd = eventfd(0, EFD_CLOEXEC);
-+ ASSERT_RETURN_VAL(efd >= 0, efd);
-+
-+ pid = syscall(__NR_clone, SIGCHLD|CLONE_NEWUSER, NULL);
-+ if (pid < 0) {
-+ ret = -errno;
-+ kdbus_printf("error clone: %d (%m)\n", ret);
-+ /*
-+ * Normal user not allowed to create userns,
-+ * so nothing to worry about ?
-+ */
-+ if (ret == -EPERM) {
-+ kdbus_printf("-- CLONE_NEWUSER TEST Failed for uid: %u\n"
-+ "-- Make sure that your kernel do not allow "
-+ "CLONE_NEWUSER for unprivileged users\n"
-+ "-- Upstream Commit: "
-+ "https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5eaf563e\n",
-+ uid);
-+ ret = 0;
-+ }
-+
-+ return ret;
-+ }
-+
-+ if (pid == 0) {
-+ struct kdbus_conn *conn_src;
-+ eventfd_t event_status = 0;
-+
-+ ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
-+ ASSERT_EXIT(ret == 0);
-+
-+ ret = eventfd_read(efd, &event_status);
-+ ASSERT_EXIT(ret >= 0 && event_status == 1);
-+
-+ /* ping connection from the new user namespace */
-+ conn_src = kdbus_hello(bus, 0, NULL, 0);
-+ ASSERT_EXIT(conn_src);
-+
-+ ret = kdbus_add_match_empty(conn_src);
-+ ASSERT_EXIT(ret == 0);
-+
-+ ret = kdbus_msg_send(conn_src, name, 0xabcd1234,
-+ 0, 0, 0, KDBUS_DST_ID_NAME);
-+ kdbus_conn_free(conn_src);
-+
-+ _exit(ret == expected_status ? EXIT_SUCCESS : EXIT_FAILURE);
-+ }
-+
-+ ret = userns_map_uid_gid(pid, "0 65534 1", "0 65534 1");
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+ /* Tell child we are ready */
-+ ret = eventfd_write(efd, 1);
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+ ret = waitpid(pid, &status, 0);
-+ ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+ close(efd);
-+
-+ return status == EXIT_SUCCESS ? TEST_OK : TEST_ERR;
-+}
-+
-+static int kdbus_clone_userns_test(const char *bus,
-+ const char *name,
-+ struct kdbus_conn **conn_db,
-+ int expected_status)
-+{
-+ pid_t pid;
-+ int ret = 0;
-+ int status;
-+
-+ pid = fork();
-+ ASSERT_RETURN_VAL(pid >= 0, -errno);
-+
-+ if (pid == 0) {
-+ ret = prctl(PR_SET_PDEATHSIG, SIGKILL);
-+ if (ret < 0)
-+ _exit(EXIT_FAILURE);
-+
-+ ret = __kdbus_clone_userns_test(bus, name, conn_db,
-+ expected_status);
-+ _exit(ret);
-+ }
-+
-+ /*
-+ * Receive in the original (root privileged) user namespace,
-+ * must fail with -ETIMEDOUT.
-+ */
-+ ret = kdbus_msg_recv_poll(conn_db[0], 100, NULL, NULL);
-+ ASSERT_RETURN_VAL(ret == -ETIMEDOUT, ret);
-+
-+ ret = waitpid(pid, &status, 0);
-+ ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+int kdbus_test_policy_ns(struct kdbus_test_env *env)
-+{
-+ int i;
-+ int ret;
-+ struct kdbus_conn *activator = NULL;
-+ struct kdbus_conn *policy_holder = NULL;
-+ char *bus = env->buspath;
-+
-+ ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /* not enough privileges, skip test */
-+ if (!ret)
-+ return TEST_SKIP;
-+
-+ /* we require user-namespaces */
-+ if (access("/proc/self/uid_map", F_OK) != 0)
-+ return TEST_SKIP;
-+
-+ /* uids/gids must be mapped */
-+ if (!all_uids_gids_are_mapped())
-+ return TEST_SKIP;
-+
-+ conn_db = calloc(MAX_CONN, sizeof(struct kdbus_conn *));
-+ ASSERT_RETURN(conn_db);
-+
-+ memset(conn_db, 0, MAX_CONN * sizeof(struct kdbus_conn *));
-+
-+ conn_db[0] = kdbus_hello(bus, 0, NULL, 0);
-+ ASSERT_RETURN(conn_db[0]);
-+
-+ ret = kdbus_add_match_empty(conn_db[0]);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_fork_test_by_id(bus, conn_db, -EPERM, -EPERM);
-+ ASSERT_EXIT(ret == 0);
-+
-+ ret = kdbus_register_policy_holder(bus, POLICY_NAME,
-+ &policy_holder);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Try to register the same name with an activator */
-+ ret = kdbus_register_same_activator(bus, POLICY_NAME,
-+ &activator);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Acquire POLICY_NAME */
-+ ret = kdbus_name_acquire(conn_db[0], POLICY_NAME, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_normal_test(bus, POLICY_NAME, conn_db);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_list(conn_db[0], KDBUS_LIST_NAMES |
-+ KDBUS_LIST_UNIQUE |
-+ KDBUS_LIST_ACTIVATORS |
-+ KDBUS_LIST_QUEUED);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_fork_test(bus, POLICY_NAME, conn_db, EXIT_SUCCESS);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * children connections are able to talk to conn_db[0] since
-+ * current POLICY_NAME TALK type is KDBUS_POLICY_ACCESS_WORLD,
-+ * so expect EXIT_SUCCESS when sending from child. However,
-+ * since the child's connection does not own any well-known
-+ * name, the parent connection conn_db[0] should fail with
-+ * -EPERM but since it is a privileged bus user the TALK is
-+ * allowed.
-+ */
-+ ret = kdbus_fork_test_by_id(bus, conn_db,
-+ EXIT_SUCCESS, EXIT_SUCCESS);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /*
-+ * Connections that can talk are perhaps being destroyed now.
-+ * Restrict the policy and purge cache entries where the
-+ * conn_db[0] is the destination.
-+ *
-+ * Now only connections with uid == 0 are allowed to talk.
-+ */
-+ ret = kdbus_set_policy_talk(policy_holder, POLICY_NAME,
-+ geteuid(), KDBUS_POLICY_ACCESS_USER);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Testing connections (FORK+DROP) again:
-+ * After setting the policy re-check connections
-+ * we expect the children to fail with -EPERM
-+ */
-+ ret = kdbus_fork_test(bus, POLICY_NAME, conn_db, -EPERM);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Now expect both parent and child to fail.
-+ *
-+ * Child should fail with -EPERM since we just restricted
-+ * the POLICY_NAME TALK to uid 0 and its uid is 65534.
-+ *
-+ * Since the parent's connection will timeout when receiving
-+ * from the child, we never continue. FWIW just put -EPERM.
-+ */
-+ ret = kdbus_fork_test_by_id(bus, conn_db, -EPERM, -EPERM);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /* Check if the name can be reached in a new userns */
-+ ret = kdbus_clone_userns_test(bus, POLICY_NAME, conn_db, -EPERM);
-+ ASSERT_RETURN(ret == 0);
-+
-+ for (i = 0; i < MAX_CONN; i++)
-+ kdbus_conn_free(conn_db[i]);
-+
-+ kdbus_conn_free(activator);
-+ kdbus_conn_free(policy_holder);
-+
-+ free(conn_db);
-+
-+ return ret;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-policy-priv.c b/tools/testing/selftests/kdbus/test-policy-priv.c
-new file mode 100644
-index 0000000..0208638
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-policy-priv.c
-@@ -0,0 +1,1285 @@
-+#include <errno.h>
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stdint.h>
-+#include <stdbool.h>
-+#include <unistd.h>
-+#include <time.h>
-+#include <sys/capability.h>
-+#include <sys/eventfd.h>
-+#include <sys/wait.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+static int test_policy_priv_by_id(const char *bus,
-+ struct kdbus_conn *conn_dst,
-+ bool drop_second_user,
-+ int parent_status,
-+ int child_status)
-+{
-+ int ret = 0;
-+ uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
-+
-+ ASSERT_RETURN(conn_dst);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, bus, ({
-+ ret = kdbus_msg_send(unpriv, NULL,
-+ expected_cookie, 0, 0, 0,
-+ conn_dst->id);
-+ ASSERT_EXIT(ret == child_status);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = kdbus_msg_recv_poll(conn_dst, 300, NULL, NULL);
-+ ASSERT_RETURN(ret == parent_status);
-+
-+ return 0;
-+}
-+
-+static int test_policy_priv_by_broadcast(const char *bus,
-+ struct kdbus_conn *conn_dst,
-+ int drop_second_user,
-+ int parent_status,
-+ int child_status)
-+{
-+ int efd;
-+ int ret = 0;
-+ eventfd_t event_status = 0;
-+ struct kdbus_msg *msg = NULL;
-+ uid_t second_uid = UNPRIV_UID;
-+ gid_t second_gid = UNPRIV_GID;
-+ struct kdbus_conn *child_2 = conn_dst;
-+ uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
-+
-+ /* Drop to another unprivileged user other than UNPRIV_UID */
-+ if (drop_second_user == DROP_OTHER_UNPRIV) {
-+ second_uid = UNPRIV_UID - 1;
-+ second_gid = UNPRIV_GID - 1;
-+ }
-+
-+ /* child will signal parent to send broadcast */
-+ efd = eventfd(0, EFD_CLOEXEC);
-+ ASSERT_RETURN_VAL(efd >= 0, efd);
-+
-+ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
-+ struct kdbus_conn *child;
-+
-+ child = kdbus_hello(bus, 0, NULL, 0);
-+ ASSERT_EXIT(child);
-+
-+ ret = kdbus_add_match_empty(child);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /* signal parent */
-+ ret = eventfd_write(efd, 1);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /* Use a slightly longer timeout */
-+ ret = kdbus_msg_recv_poll(child, 500, &msg, NULL);
-+ ASSERT_EXIT(ret == child_status);
-+
-+ /*
-+ * If we expect the child to get the broadcast
-+ * message, then check the received cookie.
-+ */
-+ if (ret == 0) {
-+ ASSERT_EXIT(expected_cookie == msg->cookie);
-+ }
-+
-+ /* Use expected_cookie since 'msg' might be NULL */
-+ ret = kdbus_msg_send(child, NULL, expected_cookie + 1,
-+ 0, 0, 0, KDBUS_DST_ID_BROADCAST);
-+ ASSERT_EXIT(ret == 0);
-+
-+ kdbus_msg_free(msg);
-+ kdbus_conn_free(child);
-+ }),
-+ ({
-+ if (drop_second_user == DO_NOT_DROP) {
-+ ASSERT_RETURN(child_2);
-+
-+ ret = eventfd_read(efd, &event_status);
-+ ASSERT_RETURN(ret >= 0 && event_status == 1);
-+
-+ ret = kdbus_msg_send(child_2, NULL,
-+ expected_cookie, 0, 0, 0,
-+ KDBUS_DST_ID_BROADCAST);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* drop own broadcast */
-+ ret = kdbus_msg_recv(child_2, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->src_id == child_2->id);
-+ kdbus_msg_free(msg);
-+
-+ /* Use a slightly longer timeout */
-+ ret = kdbus_msg_recv_poll(child_2, 1000,
-+ &msg, NULL);
-+ ASSERT_RETURN(ret == parent_status);
-+
-+ /*
-+ * Check returned cookie in case we expect
-+ * success.
-+ */
-+ if (ret == 0) {
-+ ASSERT_RETURN(msg->cookie ==
-+ expected_cookie + 1);
-+ }
-+
-+ kdbus_msg_free(msg);
-+ } else {
-+ /*
-+ * Two unprivileged users will try to
-+ * communicate using broadcast.
-+ */
-+ ret = RUN_UNPRIVILEGED(second_uid, second_gid, ({
-+ child_2 = kdbus_hello(bus, 0, NULL, 0);
-+ ASSERT_EXIT(child_2);
-+
-+ ret = kdbus_add_match_empty(child_2);
-+ ASSERT_EXIT(ret == 0);
-+
-+ ret = eventfd_read(efd, &event_status);
-+ ASSERT_EXIT(ret >= 0 && event_status == 1);
-+
-+ ret = kdbus_msg_send(child_2, NULL,
-+ expected_cookie, 0, 0, 0,
-+ KDBUS_DST_ID_BROADCAST);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /* drop own broadcast */
-+ ret = kdbus_msg_recv(child_2, &msg, NULL);
-+ ASSERT_EXIT(ret == 0);
-+ ASSERT_EXIT(msg->src_id == child_2->id);
-+ kdbus_msg_free(msg);
-+
-+ /* Use a slightly longer timeout */
-+ ret = kdbus_msg_recv_poll(child_2, 1000,
-+ &msg, NULL);
-+ ASSERT_EXIT(ret == parent_status);
-+
-+ /*
-+ * Check returned cookie in case we expect
-+ * success.
-+ */
-+ if (ret == 0) {
-+ ASSERT_EXIT(msg->cookie ==
-+ expected_cookie + 1);
-+ }
-+
-+ kdbus_msg_free(msg);
-+ kdbus_conn_free(child_2);
-+ }),
-+ ({ 0; }));
-+ ASSERT_RETURN(ret == 0);
-+ }
-+ }));
-+ ASSERT_RETURN(ret == 0);
-+
-+ close(efd);
-+
-+ return ret;
-+}
-+
-+static void nosig(int sig)
-+{
-+}
-+
-+static int test_priv_before_policy_upload(struct kdbus_test_env *env)
-+{
-+ int ret = 0;
-+ struct kdbus_conn *conn;
-+
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn);
-+
-+ /*
-+ * Make sure unprivileged bus user cannot acquire names
-+ * before registering any policy holder.
-+ */
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+ ASSERT_EXIT(ret < 0);
-+ }));
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Make sure unprivileged bus users cannot talk by default
-+ * to privileged ones, unless a policy holder that allows
-+ * this was uploaded.
-+ */
-+
-+ ret = test_policy_priv_by_id(env->buspath, conn, false,
-+ -ETIMEDOUT, -EPERM);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Activate matching for a privileged connection */
-+ ret = kdbus_add_match_empty(conn);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * First make sure that BROADCAST with msg flag
-+ * KDBUS_MSG_EXPECT_REPLY will fail with -ENOTUNIQ
-+ */
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_msg_send(unpriv, NULL, 0xdeadbeef,
-+ KDBUS_MSG_EXPECT_REPLY,
-+ 5000000000ULL, 0,
-+ KDBUS_DST_ID_BROADCAST);
-+ ASSERT_EXIT(ret == -ENOTUNIQ);
-+ }));
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Test broadcast with a privileged connection.
-+ *
-+ * The first unprivileged receiver should not get the
-+ * broadcast message sent by the privileged connection,
-+ * since there is no TALK policy that allows the
-+ * unprivileged to TALK to the privileged connection. It
-+ * will fail with -ETIMEDOUT
-+ *
-+ * Then second case:
-+ * The privileged connection should get the broadcast
-+ * message from the unprivileged one. Since the receiver is
-+ * a privileged bus user and it has default TALK access to
-+ * all connections it will receive those.
-+ */
-+
-+ ret = test_policy_priv_by_broadcast(env->buspath, conn,
-+ DO_NOT_DROP,
-+ 0, -ETIMEDOUT);
-+ ASSERT_RETURN(ret == 0);
-+
-+
-+ /*
-+ * Test broadcast with two unprivileged connections running
-+ * under the same user.
-+ *
-+ * Both connections should succeed.
-+ */
-+
-+ ret = test_policy_priv_by_broadcast(env->buspath, NULL,
-+ DROP_SAME_UNPRIV, 0, 0);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Test broadcast with two unprivileged connections running
-+ * under different users.
-+ *
-+ * Both connections will fail with -ETIMEDOUT.
-+ */
-+
-+ ret = test_policy_priv_by_broadcast(env->buspath, NULL,
-+ DROP_OTHER_UNPRIV,
-+ -ETIMEDOUT, -ETIMEDOUT);
-+ ASSERT_RETURN(ret == 0);
-+
-+ kdbus_conn_free(conn);
-+
-+ return ret;
-+}
-+
-+static int test_broadcast_after_policy_upload(struct kdbus_test_env *env)
-+{
-+ int ret;
-+ int efd;
-+ eventfd_t event_status = 0;
-+ struct kdbus_msg *msg = NULL;
-+ struct kdbus_conn *owner_a, *owner_b;
-+ struct kdbus_conn *holder_a, *holder_b;
-+ struct kdbus_policy_access access = {};
-+ uint64_t expected_cookie = time(NULL) ^ 0xdeadbeef;
-+
-+ owner_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(owner_a);
-+
-+ ret = kdbus_name_acquire(owner_a, "com.example.broadcastA", NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ /*
-+ * Make sure unprivileged bus users cannot talk by default
-+ * to privileged ones, unless a policy holder that allows
-+ * this was uploaded.
-+ */
-+
-+ ++expected_cookie;
-+ ret = test_policy_priv_by_id(env->buspath, owner_a, false,
-+ -ETIMEDOUT, -EPERM);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Make sure that a privileged connection won't receive broadcasts
-+ * unless it installs a match. It will fail with -ETIMEDOUT.
-+ *
-+ * At the same time, check that the unprivileged connection will
-+ * not receive the broadcast message from the privileged one,
-+ * since the privileged one owns a name with a restricted
-+ * TALK policy (actually the TALK policy is not yet
-+ * registered, so we fail by default); thus the unprivileged
-+ * receiver is not able to TALK to that name.
-+ */
-+
-+ /* Activate matching for a privileged connection */
-+ ret = kdbus_add_match_empty(owner_a);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Redo the previous test. The privileged conn owner_a is
-+ * able to TALK to any connection so it will receive the
-+ * broadcast message now.
-+ */
-+
-+ ret = test_policy_priv_by_broadcast(env->buspath, owner_a,
-+ DO_NOT_DROP,
-+ 0, -ETIMEDOUT);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Test that broadcasts between two unprivileged connections
-+ * running under the same user still succeed.
-+ */
-+
-+ ret = test_policy_priv_by_broadcast(env->buspath, NULL,
-+ DROP_SAME_UNPRIV, 0, 0);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Test broadcast with two unprivileged connections running
-+ * under different users.
-+ *
-+ * Both connections will fail with -ETIMEDOUT.
-+ */
-+
-+ ret = test_policy_priv_by_broadcast(env->buspath, NULL,
-+ DROP_OTHER_UNPRIV,
-+ -ETIMEDOUT, -ETIMEDOUT);
-+ ASSERT_RETURN(ret == 0);
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = geteuid(),
-+ .access = KDBUS_POLICY_OWN,
-+ };
-+
-+ holder_a = kdbus_hello_registrar(env->buspath,
-+ "com.example.broadcastA",
-+ &access, 1,
-+ KDBUS_HELLO_POLICY_HOLDER);
-+ ASSERT_RETURN(holder_a);
-+
-+ holder_b = kdbus_hello_registrar(env->buspath,
-+ "com.example.broadcastB",
-+ &access, 1,
-+ KDBUS_HELLO_POLICY_HOLDER);
-+ ASSERT_RETURN(holder_b);
-+
-+ /* Free connections and their received messages and restart */
-+ kdbus_conn_free(owner_a);
-+
-+ owner_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(owner_a);
-+
-+ /* Activate matching for a privileged connection */
-+ ret = kdbus_add_match_empty(owner_a);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_name_acquire(owner_a, "com.example.broadcastA", NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ owner_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(owner_b);
-+
-+ ret = kdbus_name_acquire(owner_b, "com.example.broadcastB", NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ /* Activate matching for a privileged connection */
-+ ret = kdbus_add_match_empty(owner_b);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Test that even though "com.example.broadcastA" and
-+ * "com.example.broadcastB" have restricted TALK access by
-+ * default, they are able to signal each other using broadcast:
-+ * since they are privileged connections, they receive
-+ * all broadcasts if the match allows it.
-+ */
-+
-+ ++expected_cookie;
-+ ret = kdbus_msg_send(owner_a, NULL, expected_cookie, 0,
-+ 0, 0, KDBUS_DST_ID_BROADCAST);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv_poll(owner_a, 100, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->cookie == expected_cookie);
-+
-+ /* Check src ID */
-+ ASSERT_RETURN(msg->src_id == owner_a->id);
-+
-+ kdbus_msg_free(msg);
-+
-+ ret = kdbus_msg_recv_poll(owner_b, 100, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+ ASSERT_RETURN(msg->cookie == expected_cookie);
-+
-+ /* Check src ID */
-+ ASSERT_RETURN(msg->src_id == owner_a->id);
-+
-+ kdbus_msg_free(msg);
-+
-+ /* Release name "com.example.broadcastB" */
-+
-+ ret = kdbus_name_release(owner_b, "com.example.broadcastB");
-+ ASSERT_EXIT(ret >= 0);
-+
-+ /* KDBUS_POLICY_OWN for unprivileged connections */
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_WORLD,
-+ .id = geteuid(),
-+ .access = KDBUS_POLICY_OWN,
-+ };
-+
-+ /* Update the policy so unprivileged connections can own the name */
-+
-+ ret = kdbus_conn_update_policy(holder_b,
-+ "com.example.broadcastB",
-+ &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Send broadcasts from an unprivileged connection that
-+ * owns a name "com.example.broadcastB".
-+ *
-+ * We'll have four destinations here:
-+ *
-+ * owner_a: privileged connection that owns
-+ * "com.example.broadcastA". It will receive the broadcast
-+ * since it is privileged, has default TALK access to all
-+ * connections, and is subscribed to the match.
-+ * Will succeed.
-+ *
-+ * owner_b: privileged connection (running under a different
-+ * uid) that does not own names, but with an empty broadcast
-+ * match, so it will receive broadcasts since it has default
-+ * TALK access to all connections.
-+ *
-+ * unpriv_a: unpriv connection that does not own any name.
-+ * It will receive the broadcast since it is running under
-+ * the same user as the one broadcasting and did install
-+ * matches. It should get the message.
-+ *
-+ * unpriv_b: unpriv connection that is not interested in
-+ * broadcast messages, so it did not install broadcast
-+ * matches. Should fail with -ETIMEDOUT.
-+ */
-+
-+ ++expected_cookie;
-+ efd = eventfd(0, EFD_CLOEXEC);
-+ ASSERT_RETURN_VAL(efd >= 0, efd);
-+
-+ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
-+ struct kdbus_conn *unpriv_owner;
-+ struct kdbus_conn *unpriv_a, *unpriv_b;
-+
-+ unpriv_owner = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_EXIT(unpriv_owner);
-+
-+ unpriv_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_EXIT(unpriv_a);
-+
-+ unpriv_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_EXIT(unpriv_b);
-+
-+ ret = kdbus_name_acquire(unpriv_owner,
-+ "com.example.broadcastB",
-+ NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ ret = kdbus_add_match_empty(unpriv_a);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /* Signal that we are doing broadcasts */
-+ ret = eventfd_write(efd, 1);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /*
-+ * Do broadcast from a connection that owns the
-+ * name "com.example.broadcastB".
-+ */
-+ ret = kdbus_msg_send(unpriv_owner, NULL,
-+ expected_cookie,
-+ 0, 0, 0,
-+ KDBUS_DST_ID_BROADCAST);
-+ ASSERT_EXIT(ret == 0);
-+
-+ /*
-+ * Unprivileged connection running under the same
-+ * user. It should succeed.
-+ */
-+ ret = kdbus_msg_recv_poll(unpriv_a, 300, &msg, NULL);
-+ ASSERT_EXIT(ret == 0 && msg->cookie == expected_cookie);
-+
-+ /*
-+ * Did not install matches, not interested in
-+ * broadcasts
-+ */
-+ ret = kdbus_msg_recv_poll(unpriv_b, 300, NULL, NULL);
-+ ASSERT_EXIT(ret == -ETIMEDOUT);
-+ }),
-+ ({
-+ ret = eventfd_read(efd, &event_status);
-+ ASSERT_RETURN(ret >= 0 && event_status == 1);
-+
-+ /*
-+ * owner_a owns name "com.example.broadcastA", but it is
-+ * privileged and installed a match, so it must receive
-+ * the broadcast from the unprivileged connection.
-+ */
-+ ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* confirm the received cookie */
-+ ASSERT_RETURN(msg->cookie == expected_cookie);
-+
-+ kdbus_msg_free(msg);
-+
-+ /*
-+ * owner_b got the broadcast from an unprivileged
-+ * connection.
-+ */
-+ ret = kdbus_msg_recv_poll(owner_b, 300, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* confirm the received cookie */
-+ ASSERT_RETURN(msg->cookie == expected_cookie);
-+
-+ kdbus_msg_free(msg);
-+
-+ }));
-+ ASSERT_RETURN(ret == 0);
-+
-+ close(efd);
-+
-+ /*
-+ * Test broadcast with two unprivileged connections running
-+ * under different users.
-+ *
-+ * Both connections will fail with -ETIMEDOUT.
-+ */
-+
-+ ret = test_policy_priv_by_broadcast(env->buspath, NULL,
-+ DROP_OTHER_UNPRIV,
-+ -ETIMEDOUT, -ETIMEDOUT);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* Drop broadcasts received by the privileged connections */
-+ ret = kdbus_msg_recv_poll(owner_a, 100, NULL, NULL);
-+ ret = kdbus_msg_recv_poll(owner_a, 100, NULL, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(owner_a, NULL, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ ret = kdbus_msg_recv_poll(owner_b, 100, NULL, NULL);
-+ ret = kdbus_msg_recv_poll(owner_b, 100, NULL, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_recv(owner_b, NULL, NULL);
-+ ASSERT_RETURN(ret == -EAGAIN);
-+
-+ /*
-+ * Perform last tests, allow others to talk to name
-+ * "com.example.broadcastA". So now receiving broadcasts
-+ * from it should succeed since the TALK policy allows it.
-+ */
-+
-+ /* KDBUS_POLICY_TALK for unprivileged connections */
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_WORLD,
-+ .id = geteuid(),
-+ .access = KDBUS_POLICY_TALK,
-+ };
-+
-+ ret = kdbus_conn_update_policy(holder_a,
-+ "com.example.broadcastA",
-+ &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * The unprivileged connection is now able to TALK to
-+ * "com.example.broadcastA", so it will receive its broadcasts.
-+ */
-+ ret = test_policy_priv_by_broadcast(env->buspath, owner_a,
-+ DO_NOT_DROP, 0, 0);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ++expected_cookie;
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "com.example.broadcastB",
-+ NULL);
-+ ASSERT_EXIT(ret >= 0);
-+ ret = kdbus_msg_send(unpriv, NULL, expected_cookie,
-+ 0, 0, 0, KDBUS_DST_ID_BROADCAST);
-+ ASSERT_EXIT(ret == 0);
-+ }));
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* owner_a is privileged, so it will get the broadcast now. */
-+ ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* confirm the received cookie */
-+ ASSERT_RETURN(msg->cookie == expected_cookie);
-+
-+ kdbus_msg_free(msg);
-+
-+ /*
-+ * owner_a released name "com.example.broadcastA". It should
-+ * receive broadcasts since it is still privileged and has
-+ * the right match.
-+ *
-+ * Unprivileged connection will own a name and will try to
-+ * signal to the privileged connection.
-+ */
-+
-+ ret = kdbus_name_release(owner_a, "com.example.broadcastA");
-+ ASSERT_EXIT(ret >= 0);
-+
-+ ++expected_cookie;
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "com.example.broadcastB",
-+ NULL);
-+ ASSERT_EXIT(ret >= 0);
-+ ret = kdbus_msg_send(unpriv, NULL, expected_cookie,
-+ 0, 0, 0, KDBUS_DST_ID_BROADCAST);
-+ ASSERT_EXIT(ret == 0);
-+ }));
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* owner_a will get the broadcast now. */
-+ ret = kdbus_msg_recv_poll(owner_a, 300, &msg, NULL);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /* confirm the received cookie */
-+ ASSERT_RETURN(msg->cookie == expected_cookie);
-+
-+ kdbus_msg_free(msg);
-+
-+ kdbus_conn_free(owner_a);
-+ kdbus_conn_free(owner_b);
-+ kdbus_conn_free(holder_a);
-+ kdbus_conn_free(holder_b);
-+
-+ return 0;
-+}
-+
-+static int test_policy_priv(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *conn_a, *conn_b, *conn, *owner;
-+ struct kdbus_policy_access access, *acc;
-+ sigset_t sset;
-+ size_t num;
-+ int ret;
-+
-+ /*
-+ * Make sure we have CAP_SETUID/SETGID so we can drop privileges
-+ */
-+
-+ ret = test_is_capable(CAP_SETUID, CAP_SETGID, -1);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ if (!ret)
-+ return TEST_SKIP;
-+
-+ /* make sure that uids and gids are mapped */
-+ if (!all_uids_gids_are_mapped())
-+ return TEST_SKIP;
-+
-+ /*
-+ * Setup:
-+ * conn_a: policy holder for com.example.a
-+ * conn_b: name holder of com.example.b
-+ */
-+
-+ signal(SIGUSR1, nosig);
-+ sigemptyset(&sset);
-+ sigaddset(&sset, SIGUSR1);
-+ sigprocmask(SIG_BLOCK, &sset, NULL);
-+
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn);
-+
-+ /*
-+ * Before registering any policy holder, make sure that the
-+ * bus is secure by default. This test is necessary: it catches
-+ * several cases where old D-Bus was vulnerable.
-+ */
-+
-+ ret = test_priv_before_policy_upload(env);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Make sure unprivileged users are not able to register
-+ * policy holders.
-+ */
-+
-+ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
-+ struct kdbus_conn *holder;
-+
-+ holder = kdbus_hello_registrar(env->buspath,
-+ "com.example.a", NULL, 0,
-+ KDBUS_HELLO_POLICY_HOLDER);
-+ ASSERT_EXIT(holder == NULL && errno == EPERM);
-+ }),
-+ ({ 0; }));
-+ ASSERT_RETURN(ret == 0);
-+
-+
-+ /* Register policy holder */
-+
-+ conn_a = kdbus_hello_registrar(env->buspath, "com.example.a",
-+ NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
-+ ASSERT_RETURN(conn_a);
-+
-+ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn_b);
-+
-+ ret = kdbus_name_acquire(conn_b, "com.example.b", NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ /*
-+ * Make sure bus-owners can always acquire names.
-+ */
-+ ret = kdbus_name_acquire(conn, "com.example.a", NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ kdbus_conn_free(conn);
-+
-+ /*
-+ * Make sure unprivileged users cannot acquire names with default
-+ * policy assigned.
-+ */
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+ ASSERT_EXIT(ret < 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /*
-+ * Make sure unprivileged users can acquire names if we make them
-+ * world-accessible.
-+ */
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_WORLD,
-+ .id = 0,
-+ .access = KDBUS_POLICY_OWN,
-+ };
-+
-+ /*
-+ * Make sure unprivileged/normal connections are not able
-+ * to update policies
-+ */
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_conn_update_policy(unpriv, "com.example.a",
-+ &access, 1);
-+ ASSERT_EXIT(ret == -EOPNOTSUPP);
-+ }));
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+ ASSERT_EXIT(ret >= 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /*
-+ * Make sure unprivileged users can acquire names if we make them
-+ * gid-accessible. But only if the gid matches.
-+ */
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_GROUP,
-+ .id = UNPRIV_GID,
-+ .access = KDBUS_POLICY_OWN,
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+ ASSERT_EXIT(ret >= 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_GROUP,
-+ .id = 1,
-+ .access = KDBUS_POLICY_OWN,
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+ ASSERT_EXIT(ret < 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /*
-+ * Make sure unprivileged users can acquire names if we make them
-+ * uid-accessible. But only if the uid matches.
-+ */
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = UNPRIV_UID,
-+ .access = KDBUS_POLICY_OWN,
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+ ASSERT_EXIT(ret >= 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = 1,
-+ .access = KDBUS_POLICY_OWN,
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+ ASSERT_EXIT(ret < 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /*
-+ * Make sure unprivileged users cannot acquire names if no owner-policy
-+ * matches, even if SEE/TALK policies match.
-+ */
-+
-+ num = 4;
-+ acc = (struct kdbus_policy_access[]){
-+ {
-+ .type = KDBUS_POLICY_ACCESS_GROUP,
-+ .id = UNPRIV_GID,
-+ .access = KDBUS_POLICY_SEE,
-+ },
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = UNPRIV_UID,
-+ .access = KDBUS_POLICY_TALK,
-+ },
-+ {
-+ .type = KDBUS_POLICY_ACCESS_WORLD,
-+ .id = 0,
-+ .access = KDBUS_POLICY_TALK,
-+ },
-+ {
-+ .type = KDBUS_POLICY_ACCESS_WORLD,
-+ .id = 0,
-+ .access = KDBUS_POLICY_SEE,
-+ },
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.a", acc, num);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+ ASSERT_EXIT(ret < 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /*
-+ * Make sure unprivileged users can acquire names if the only matching
-+ * policy is somewhere in the middle.
-+ */
-+
-+ num = 5;
-+ acc = (struct kdbus_policy_access[]){
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = 1,
-+ .access = KDBUS_POLICY_OWN,
-+ },
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = 2,
-+ .access = KDBUS_POLICY_OWN,
-+ },
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = UNPRIV_UID,
-+ .access = KDBUS_POLICY_OWN,
-+ },
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = 3,
-+ .access = KDBUS_POLICY_OWN,
-+ },
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = 4,
-+ .access = KDBUS_POLICY_OWN,
-+ },
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.a", acc, num);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_name_acquire(unpriv, "com.example.a", NULL);
-+ ASSERT_EXIT(ret >= 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /*
-+ * Clear policies
-+ */
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.a", NULL, 0);
-+ ASSERT_RETURN(ret == 0);
-+
-+ /*
-+ * Make sure privileged bus users can _always_ talk to others.
-+ */
-+
-+ conn = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn);
-+
-+ ret = kdbus_msg_send(conn, "com.example.b", 0xdeadbeef, 0, 0, 0, 0);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ ret = kdbus_msg_recv_poll(conn_b, 300, NULL, NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ kdbus_conn_free(conn);
-+
-+ /*
-+ * Make sure unprivileged bus users cannot talk by default.
-+ */
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+ 0, 0);
-+ ASSERT_EXIT(ret == -EPERM);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /*
-+ * Make sure unprivileged bus users can talk to equals, even without
-+ * policy.
-+ */
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = UNPRIV_UID,
-+ .access = KDBUS_POLICY_OWN,
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.c", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ struct kdbus_conn *owner;
-+
-+ owner = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(owner);
-+
-+ ret = kdbus_name_acquire(owner, "com.example.c", NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
-+ 0, 0);
-+ ASSERT_EXIT(ret >= 0);
-+ ret = kdbus_msg_recv_poll(owner, 100, NULL, NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ kdbus_conn_free(owner);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /*
-+ * Make sure unprivileged bus users can talk to privileged users if a
-+ * suitable UID policy is set.
-+ */
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = UNPRIV_UID,
-+ .access = KDBUS_POLICY_TALK,
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+ 0, 0);
-+ ASSERT_EXIT(ret >= 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ /*
-+ * Make sure unprivileged bus users can talk to privileged users if a
-+ * suitable GID policy is set.
-+ */
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_GROUP,
-+ .id = UNPRIV_GID,
-+ .access = KDBUS_POLICY_TALK,
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+ 0, 0);
-+ ASSERT_EXIT(ret >= 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ /*
-+ * Make sure unprivileged bus users can talk to privileged users if a
-+ * suitable WORLD policy is set.
-+ */
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_WORLD,
-+ .id = 0,
-+ .access = KDBUS_POLICY_TALK,
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+ 0, 0);
-+ ASSERT_EXIT(ret >= 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ /*
-+ * Make sure unprivileged bus users cannot talk to privileged users if
-+ * no suitable policy is set.
-+ */
-+
-+ num = 5;
-+ acc = (struct kdbus_policy_access[]){
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = 0,
-+ .access = KDBUS_POLICY_OWN,
-+ },
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = 1,
-+ .access = KDBUS_POLICY_TALK,
-+ },
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = UNPRIV_UID,
-+ .access = KDBUS_POLICY_SEE,
-+ },
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = 3,
-+ .access = KDBUS_POLICY_TALK,
-+ },
-+ {
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = 4,
-+ .access = KDBUS_POLICY_TALK,
-+ },
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.b", acc, num);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+ 0, 0);
-+ ASSERT_EXIT(ret == -EPERM);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /*
-+ * Make sure unprivileged bus users can talk to privileged users if a
-+ * suitable OWN privilege overwrites TALK.
-+ */
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_WORLD,
-+ .id = 0,
-+ .access = KDBUS_POLICY_OWN,
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+ 0, 0);
-+ ASSERT_EXIT(ret >= 0);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ /*
-+ * Make sure the TALK cache is reset correctly when policies are
-+ * updated.
-+ */
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_WORLD,
-+ .id = 0,
-+ .access = KDBUS_POLICY_TALK,
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.b", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = RUN_UNPRIVILEGED_CONN(unpriv, env->buspath, ({
-+ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+ 0, 0);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ ret = kdbus_msg_recv_poll(conn_b, 100, NULL, NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.b",
-+ NULL, 0);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_msg_send(unpriv, "com.example.b", 0xdeadbeef, 0, 0,
-+ 0, 0);
-+ ASSERT_EXIT(ret == -EPERM);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+ /*
-+ * Make sure the TALK cache is reset correctly when policy holders
-+ * disconnect.
-+ */
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_WORLD,
-+ .id = 0,
-+ .access = KDBUS_POLICY_OWN,
-+ };
-+
-+ conn = kdbus_hello_registrar(env->buspath, "com.example.c",
-+ NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
-+ ASSERT_RETURN(conn);
-+
-+ ret = kdbus_conn_update_policy(conn, "com.example.c", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ owner = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(owner);
-+
-+ ret = kdbus_name_acquire(owner, "com.example.c", NULL);
-+ ASSERT_RETURN(ret >= 0);
-+
-+ ret = RUN_UNPRIVILEGED(UNPRIV_UID, UNPRIV_GID, ({
-+ struct kdbus_conn *unpriv;
-+
-+ /* wait for parent to be finished */
-+ sigemptyset(&sset);
-+ ret = sigsuspend(&sset);
-+ ASSERT_RETURN(ret == -1 && errno == EINTR);
-+
-+ unpriv = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(unpriv);
-+
-+ ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
-+ 0, 0);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ ret = kdbus_msg_recv_poll(owner, 100, NULL, NULL);
-+ ASSERT_EXIT(ret >= 0);
-+
-+ /* free policy holder */
-+ kdbus_conn_free(conn);
-+
-+ ret = kdbus_msg_send(unpriv, "com.example.c", 0xdeadbeef, 0, 0,
-+ 0, 0);
-+ ASSERT_EXIT(ret == -EPERM);
-+
-+ kdbus_conn_free(unpriv);
-+ }), ({
-+ /* make sure policy holder is only valid in child */
-+ kdbus_conn_free(conn);
-+ kill(pid, SIGUSR1);
-+ }));
-+ ASSERT_RETURN(ret >= 0);
-+
-+
-+ /*
-+ * Re-run the broadcast tests now that policies are uploaded.
-+ */
-+
-+ ret = test_broadcast_after_policy_upload(env);
-+ ASSERT_RETURN(ret == 0);
-+
-+ kdbus_conn_free(owner);
-+
-+ /*
-+ * cleanup resources
-+ */
-+
-+ kdbus_conn_free(conn_b);
-+ kdbus_conn_free(conn_a);
-+
-+ return TEST_OK;
-+}
-+
-+int kdbus_test_policy_priv(struct kdbus_test_env *env)
-+{
-+ pid_t pid;
-+ int ret;
-+
-+ /* make sure to exit() if a child returns from fork() */
-+ pid = getpid();
-+ ret = test_policy_priv(env);
-+ if (pid != getpid())
-+ exit(1);
-+
-+ return ret;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-policy.c b/tools/testing/selftests/kdbus/test-policy.c
-new file mode 100644
-index 0000000..96d20d5
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-policy.c
-@@ -0,0 +1,80 @@
-+#include <errno.h>
-+#include <stdio.h>
-+#include <string.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stdint.h>
-+#include <stdbool.h>
-+#include <unistd.h>
-+
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+int kdbus_test_policy(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *conn_a, *conn_b;
-+ struct kdbus_policy_access access;
-+ int ret;
-+
-+ /* Invalid name */
-+ conn_a = kdbus_hello_registrar(env->buspath, ".example.a",
-+ NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
-+ ASSERT_RETURN(conn_a == NULL);
-+
-+ conn_a = kdbus_hello_registrar(env->buspath, "example",
-+ NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
-+ ASSERT_RETURN(conn_a == NULL);
-+
-+ conn_a = kdbus_hello_registrar(env->buspath, "com.example.a",
-+ NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
-+ ASSERT_RETURN(conn_a);
-+
-+ conn_b = kdbus_hello_registrar(env->buspath, "com.example.b",
-+ NULL, 0, KDBUS_HELLO_POLICY_HOLDER);
-+ ASSERT_RETURN(conn_b);
-+
-+ /*
-+ * Verify there cannot be any duplicate entries, except for specific vs.
-+ * wildcard entries.
-+ */
-+
-+ access = (struct kdbus_policy_access){
-+ .type = KDBUS_POLICY_ACCESS_USER,
-+ .id = geteuid(),
-+ .access = KDBUS_POLICY_SEE,
-+ };
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.a", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_conn_update_policy(conn_b, "com.example.a", &access, 1);
-+ ASSERT_RETURN(ret == -EEXIST);
-+
-+ ret = kdbus_conn_update_policy(conn_b, "com.example.a.*", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.a.*", &access, 1);
-+ ASSERT_RETURN(ret == -EEXIST);
-+
-+ ret = kdbus_conn_update_policy(conn_a, "com.example.*", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_conn_update_policy(conn_b, "com.example.a", &access, 1);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = kdbus_conn_update_policy(conn_b, "com.example.*", &access, 1);
-+ ASSERT_RETURN(ret == -EEXIST);
-+
-+ /* Invalid name */
-+ ret = kdbus_conn_update_policy(conn_b, ".example.*", &access, 1);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ ret = kdbus_conn_update_policy(conn_b, "example", &access, 1);
-+ ASSERT_RETURN(ret == -EINVAL);
-+
-+ kdbus_conn_free(conn_b);
-+ kdbus_conn_free(conn_a);
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-sync.c b/tools/testing/selftests/kdbus/test-sync.c
-new file mode 100644
-index 0000000..0655a54
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-sync.c
-@@ -0,0 +1,369 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <pthread.h>
-+#include <stdbool.h>
-+#include <signal.h>
-+#include <sys/wait.h>
-+#include <sys/eventfd.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+static struct kdbus_conn *conn_a, *conn_b;
-+static unsigned int cookie = 0xdeadbeef;
-+
-+static void nop_handler(int sig) {}
-+
-+static int interrupt_sync(struct kdbus_conn *conn_src,
-+ struct kdbus_conn *conn_dst)
-+{
-+ pid_t pid;
-+ int ret, status;
-+ struct kdbus_msg *msg = NULL;
-+ struct sigaction sa = {
-+ .sa_handler = nop_handler,
-+ .sa_flags = SA_NOCLDSTOP|SA_RESTART,
-+ };
-+
-+ cookie++;
-+ pid = fork();
-+ ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+ if (pid == 0) {
-+ ret = sigaction(SIGINT, &sa, NULL);
-+ ASSERT_EXIT(ret == 0);
-+
-+ ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
-+ KDBUS_MSG_EXPECT_REPLY,
-+ 100000000ULL, 0, conn_src->id, -1);
-+ ASSERT_EXIT(ret == -ETIMEDOUT);
-+
-+ _exit(EXIT_SUCCESS);
-+ }
-+
-+ ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
-+ ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
-+
-+ kdbus_msg_free(msg);
-+
-+ ret = kill(pid, SIGINT);
-+ ASSERT_RETURN_VAL(ret == 0, ret);
-+
-+ ret = waitpid(pid, &status, 0);
-+ ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+ if (WIFSIGNALED(status))
-+ return TEST_ERR;
-+
-+ ret = kdbus_msg_recv_poll(conn_src, 100, NULL, NULL);
-+ ASSERT_RETURN(ret == -ETIMEDOUT);
-+
-+ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+static int close_epipe_sync(const char *bus)
-+{
-+ pid_t pid;
-+ int ret, status;
-+ struct kdbus_conn *conn_src;
-+ struct kdbus_conn *conn_dst;
-+ struct kdbus_msg *msg = NULL;
-+
-+ conn_src = kdbus_hello(bus, 0, NULL, 0);
-+ ASSERT_RETURN(conn_src);
-+
-+ ret = kdbus_add_match_empty(conn_src);
-+ ASSERT_RETURN(ret == 0);
-+
-+ conn_dst = kdbus_hello(bus, 0, NULL, 0);
-+ ASSERT_RETURN(conn_dst);
-+
-+ cookie++;
-+ pid = fork();
-+ ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+ if (pid == 0) {
-+ uint64_t dst_id;
-+
-+ /* close our reference */
-+ dst_id = conn_dst->id;
-+ kdbus_conn_free(conn_dst);
-+
-+ ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
-+ ASSERT_EXIT(ret == 0 && msg->cookie == cookie);
-+ ASSERT_EXIT(msg->src_id == dst_id);
-+
-+ cookie++;
-+ ret = kdbus_msg_send_sync(conn_src, NULL, cookie,
-+ KDBUS_MSG_EXPECT_REPLY,
-+ 100000000ULL, 0, dst_id, -1);
-+ ASSERT_EXIT(ret == -EPIPE);
-+
-+ _exit(EXIT_SUCCESS);
-+ }
-+
-+ ret = kdbus_msg_send(conn_dst, NULL, cookie, 0, 0, 0,
-+ KDBUS_DST_ID_BROADCAST);
-+ ASSERT_RETURN(ret == 0);
-+
-+ cookie++;
-+ ret = kdbus_msg_recv_poll(conn_dst, 100, &msg, NULL);
-+ ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
-+
-+ kdbus_msg_free(msg);
-+
-+ /* destroy connection */
-+ kdbus_conn_free(conn_dst);
-+ kdbus_conn_free(conn_src);
-+
-+ ret = waitpid(pid, &status, 0);
-+ ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+ if (!WIFEXITED(status))
-+ return TEST_ERR;
-+
-+ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+static int cancel_fd_sync(struct kdbus_conn *conn_src,
-+ struct kdbus_conn *conn_dst)
-+{
-+ pid_t pid;
-+ int cancel_fd;
-+ int ret, status;
-+ uint64_t counter = 1;
-+ struct kdbus_msg *msg = NULL;
-+
-+ cancel_fd = eventfd(0, 0);
-+ ASSERT_RETURN_VAL(cancel_fd >= 0, cancel_fd);
-+
-+ cookie++;
-+ pid = fork();
-+ ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+ if (pid == 0) {
-+ ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
-+ KDBUS_MSG_EXPECT_REPLY,
-+ 100000000ULL, 0, conn_src->id,
-+ cancel_fd);
-+ ASSERT_EXIT(ret == -ECANCELED);
-+
-+ _exit(EXIT_SUCCESS);
-+ }
-+
-+ ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
-+ ASSERT_RETURN(ret == 0 && msg->cookie == cookie);
-+
-+ kdbus_msg_free(msg);
-+
-+ ret = write(cancel_fd, &counter, sizeof(counter));
-+ ASSERT_RETURN(ret == sizeof(counter));
-+
-+ ret = waitpid(pid, &status, 0);
-+ ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+ if (WIFSIGNALED(status))
-+ return TEST_ERR;
-+
-+ return (status == EXIT_SUCCESS) ? TEST_OK : TEST_ERR;
-+}
-+
-+static int no_cancel_sync(struct kdbus_conn *conn_src,
-+ struct kdbus_conn *conn_dst)
-+{
-+ pid_t pid;
-+ int cancel_fd;
-+ int ret, status;
-+ struct kdbus_msg *msg = NULL;
-+
-+ /* pass eventfd, but never signal it so it shouldn't have any effect */
-+
-+ cancel_fd = eventfd(0, 0);
-+ ASSERT_RETURN_VAL(cancel_fd >= 0, cancel_fd);
-+
-+ cookie++;
-+ pid = fork();
-+ ASSERT_RETURN_VAL(pid >= 0, pid);
-+
-+ if (pid == 0) {
-+ ret = kdbus_msg_send_sync(conn_dst, NULL, cookie,
-+ KDBUS_MSG_EXPECT_REPLY,
-+ 100000000ULL, 0, conn_src->id,
-+ cancel_fd);
-+ ASSERT_EXIT(ret == 0);
-+
-+ _exit(EXIT_SUCCESS);
-+ }
-+
-+ ret = kdbus_msg_recv_poll(conn_src, 100, &msg, NULL);
-+ ASSERT_RETURN_VAL(ret == 0 && msg->cookie == cookie, -1);
-+
-+ kdbus_msg_free(msg);
-+
-+ ret = kdbus_msg_send_reply(conn_src, cookie, conn_dst->id);
-+ ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+ ret = waitpid(pid, &status, 0);
-+ ASSERT_RETURN_VAL(ret >= 0, ret);
-+
-+ if (WIFSIGNALED(status))
-+ return -1;
-+
-+ return (status == EXIT_SUCCESS) ? 0 : -1;
-+}
-+
-+static void *run_thread_reply(void *data)
-+{
-+ int ret;
-+ unsigned long status = TEST_OK;
-+
-+ ret = kdbus_msg_recv_poll(conn_a, 3000, NULL, NULL);
-+ if (ret < 0)
-+ goto exit_thread;
-+
-+ kdbus_printf("Thread received message, sending reply ...\n");
-+
-+ /* using an unknown cookie must fail */
-+ ret = kdbus_msg_send_reply(conn_a, ~cookie, conn_b->id);
-+ if (ret != -EBADSLT) {
-+ status = TEST_ERR;
-+ goto exit_thread;
-+ }
-+
-+ ret = kdbus_msg_send_reply(conn_a, cookie, conn_b->id);
-+ if (ret != 0) {
-+ status = TEST_ERR;
-+ goto exit_thread;
-+ }
-+
-+exit_thread:
-+ pthread_exit(NULL);
-+ return (void *) status;
-+}
-+
-+int kdbus_test_sync_reply(struct kdbus_test_env *env)
-+{
-+ unsigned long status;
-+ pthread_t thread;
-+ int ret;
-+
-+ conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn_a && conn_b);
-+
-+ pthread_create(&thread, NULL, run_thread_reply, NULL);
-+
-+ ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
-+ KDBUS_MSG_EXPECT_REPLY,
-+ 5000000000ULL, 0, conn_a->id, -1);
-+
-+ pthread_join(thread, (void *) &status);
-+ ASSERT_RETURN(status == 0);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = interrupt_sync(conn_a, conn_b);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = close_epipe_sync(env->buspath);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = cancel_fd_sync(conn_a, conn_b);
-+ ASSERT_RETURN(ret == 0);
-+
-+ ret = no_cancel_sync(conn_a, conn_b);
-+ ASSERT_RETURN(ret == 0);
-+
-+ kdbus_printf("-- closing bus connections\n");
-+
-+ kdbus_conn_free(conn_a);
-+ kdbus_conn_free(conn_b);
-+
-+ return TEST_OK;
-+}
-+
-+#define BYEBYE_ME ((void*)0L)
-+#define BYEBYE_THEM ((void*)1L)
-+
-+static void *run_thread_byebye(void *data)
-+{
-+ struct kdbus_cmd cmd_byebye = { .size = sizeof(cmd_byebye) };
-+ int ret;
-+
-+ ret = kdbus_msg_recv_poll(conn_a, 3000, NULL, NULL);
-+ if (ret == 0) {
-+ kdbus_printf("Thread received message, invoking BYEBYE ...\n");
-+ kdbus_msg_recv(conn_a, NULL, NULL);
-+ if (data == BYEBYE_ME)
-+ kdbus_cmd_byebye(conn_b->fd, &cmd_byebye);
-+ else if (data == BYEBYE_THEM)
-+ kdbus_cmd_byebye(conn_a->fd, &cmd_byebye);
-+ }
-+
-+ pthread_exit(NULL);
-+ return NULL;
-+}
-+
-+int kdbus_test_sync_byebye(struct kdbus_test_env *env)
-+{
-+ pthread_t thread;
-+ int ret;
-+
-+ /*
-+ * This sends a synchronous message to a thread, which waits until it
-+ * received the message and then invokes BYEBYE on the *ORIGINAL*
-+ * connection. That is, on the same connection that synchronously waits
-+ * for an reply.
-+ * This should properly wake the connection up and cause ECONNRESET as
-+ * the connection is disconnected now.
-+ *
-+ * The second time, we do the same but invoke BYEBYE on the *TARGET*
-+ * connection. This should also wake up the synchronous sender as the
-+ * reply cannot be sent by a disconnected target.
-+ */
-+
-+ conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn_a && conn_b);
-+
-+ pthread_create(&thread, NULL, run_thread_byebye, BYEBYE_ME);
-+
-+ ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
-+ KDBUS_MSG_EXPECT_REPLY,
-+ 5000000000ULL, 0, conn_a->id, -1);
-+
-+ ASSERT_RETURN(ret == -ECONNRESET);
-+
-+ pthread_join(thread, NULL);
-+
-+ kdbus_conn_free(conn_a);
-+ kdbus_conn_free(conn_b);
-+
-+ conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn_a && conn_b);
-+
-+ pthread_create(&thread, NULL, run_thread_byebye, BYEBYE_THEM);
-+
-+ ret = kdbus_msg_send_sync(conn_b, NULL, cookie,
-+ KDBUS_MSG_EXPECT_REPLY,
-+ 5000000000ULL, 0, conn_a->id, -1);
-+
-+ ASSERT_RETURN(ret == -EPIPE);
-+
-+ pthread_join(thread, NULL);
-+
-+ kdbus_conn_free(conn_a);
-+ kdbus_conn_free(conn_b);
-+
-+ return TEST_OK;
-+}
-diff --git a/tools/testing/selftests/kdbus/test-timeout.c b/tools/testing/selftests/kdbus/test-timeout.c
-new file mode 100644
-index 0000000..cfd1930
---- /dev/null
-+++ b/tools/testing/selftests/kdbus/test-timeout.c
-@@ -0,0 +1,99 @@
-+#include <stdio.h>
-+#include <string.h>
-+#include <time.h>
-+#include <fcntl.h>
-+#include <stdlib.h>
-+#include <stddef.h>
-+#include <unistd.h>
-+#include <stdint.h>
-+#include <errno.h>
-+#include <assert.h>
-+#include <poll.h>
-+#include <stdbool.h>
-+
-+#include "kdbus-api.h"
-+#include "kdbus-test.h"
-+#include "kdbus-util.h"
-+#include "kdbus-enum.h"
-+
-+int timeout_msg_recv(struct kdbus_conn *conn, uint64_t *expected)
-+{
-+ struct kdbus_cmd_recv recv = { .size = sizeof(recv) };
-+ struct kdbus_msg *msg;
-+ int ret;
-+
-+ ret = kdbus_cmd_recv(conn->fd, &recv);
-+ if (ret < 0) {
-+ kdbus_printf("error receiving message: %d (%m)\n", ret);
-+ return ret;
-+ }
-+
-+ msg = (struct kdbus_msg *)(conn->buf + recv.msg.offset);
-+
-+ ASSERT_RETURN_VAL(msg->payload_type == KDBUS_PAYLOAD_KERNEL, -EINVAL);
-+ ASSERT_RETURN_VAL(msg->src_id == KDBUS_SRC_ID_KERNEL, -EINVAL);
-+ ASSERT_RETURN_VAL(msg->dst_id == conn->id, -EINVAL);
-+
-+ *expected &= ~(1ULL << msg->cookie_reply);
-+ kdbus_printf("Got message timeout for cookie %llu\n",
-+ msg->cookie_reply);
-+
-+ ret = kdbus_free(conn, recv.msg.offset);
-+ if (ret < 0)
-+ return ret;
-+
-+ return 0;
-+}
-+
-+int kdbus_test_timeout(struct kdbus_test_env *env)
-+{
-+ struct kdbus_conn *conn_a, *conn_b;
-+ struct pollfd fd;
-+ int ret, i, n_msgs = 4;
-+ uint64_t expected = 0;
-+ uint64_t cookie = 0xdeadbeef;
-+
-+ conn_a = kdbus_hello(env->buspath, 0, NULL, 0);
-+ conn_b = kdbus_hello(env->buspath, 0, NULL, 0);
-+ ASSERT_RETURN(conn_a && conn_b);
-+
-+ fd.fd = conn_b->fd;
-+
-+ /*
-+ * send messages that expect a reply (within 100 msec),
-+ * but never answer it.
-+ */
-+ for (i = 0; i < n_msgs; i++, cookie++) {
-+ kdbus_printf("Sending message with cookie %llu ...\n",
-+ (unsigned long long)cookie);
-+ ASSERT_RETURN(kdbus_msg_send(conn_b, NULL, cookie,
-+ KDBUS_MSG_EXPECT_REPLY,
-+ (i + 1) * 100ULL * 1000000ULL, 0,
-+ conn_a->id) == 0);
-+ expected |= 1ULL << cookie;
-+ }
-+
-+ for (;;) {
-+ fd.events = POLLIN | POLLPRI | POLLHUP;
-+ fd.revents = 0;
-+
-+ ret = poll(&fd, 1, (n_msgs + 1) * 100);
-+ if (ret == 0)
-+ kdbus_printf("--- timeout\n");
-+ if (ret <= 0)
-+ break;
-+
-+ if (fd.revents & POLLIN)
-+ ASSERT_RETURN(!timeout_msg_recv(conn_b, &expected));
-+
-+ if (expected == 0)
-+ break;
-+ }
-+
-+ ASSERT_RETURN(expected == 0);
-+
-+ kdbus_conn_free(conn_a);
-+ kdbus_conn_free(conn_b);
-+
-+ return TEST_OK;
-+}
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-11-10 0:58 Mike Pagano
From: Mike Pagano @ 2015-11-10 0:58 UTC
To: gentoo-commits
commit: decadf545e156ac9100bb99a7dc63d44bbfb1c08
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Tue Nov 10 00:58:38 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Tue Nov 10 00:58:38 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=decadf54
Linux patch 4.2.6
0000_README | 4 +
1005_linux-4.2.6.patch | 3380 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 3384 insertions(+)
diff --git a/0000_README b/0000_README
index cf9d964..8190b77 100644
--- a/0000_README
+++ b/0000_README
@@ -63,6 +63,10 @@ Patch: 1004_linux-4.2.5.patch
From: http://www.kernel.org
Desc: Linux 4.2.5
+Patch: 1005_linux-4.2.6.patch
+From: http://www.kernel.org
+Desc: Linux 4.2.6
+
Patch: 1500_XATTR_USER_PREFIX.patch
From: https://bugs.gentoo.org/show_bug.cgi?id=470644
Desc: Support for namespace user.pax.* on tmpfs.
diff --git a/1005_linux-4.2.6.patch b/1005_linux-4.2.6.patch
new file mode 100644
index 0000000..39cc395
--- /dev/null
+++ b/1005_linux-4.2.6.patch
@@ -0,0 +1,3380 @@
+diff --git a/Makefile b/Makefile
+index 96076dcad18e..9ef37399b4e8 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 5
++SUBLEVEL = 6
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+
+diff --git a/arch/arm/boot/dts/am57xx-beagle-x15.dts b/arch/arm/boot/dts/am57xx-beagle-x15.dts
+index a63bf78191ea..03385fabf839 100644
+--- a/arch/arm/boot/dts/am57xx-beagle-x15.dts
++++ b/arch/arm/boot/dts/am57xx-beagle-x15.dts
+@@ -415,11 +415,12 @@
+ /* SMPS9 unused */
+
+ ldo1_reg: ldo1 {
+- /* VDD_SD */
++ /* VDD_SD / VDDSHV8 */
+ regulator-name = "ldo1";
+ regulator-min-microvolt = <1800000>;
+ regulator-max-microvolt = <3300000>;
+ regulator-boot-on;
++ regulator-always-on;
+ };
+
+ ldo2_reg: ldo2 {
+diff --git a/arch/arm/boot/dts/armada-385-db-ap.dts b/arch/arm/boot/dts/armada-385-db-ap.dts
+index 89f5a95954ed..4047621b137e 100644
+--- a/arch/arm/boot/dts/armada-385-db-ap.dts
++++ b/arch/arm/boot/dts/armada-385-db-ap.dts
+@@ -46,7 +46,7 @@
+
+ / {
+ model = "Marvell Armada 385 Access Point Development Board";
+- compatible = "marvell,a385-db-ap", "marvell,armada385", "marvell,armada38x";
++ compatible = "marvell,a385-db-ap", "marvell,armada385", "marvell,armada380";
+
+ chosen {
+ stdout-path = "serial1:115200n8";
+diff --git a/arch/arm/boot/dts/berlin2q.dtsi b/arch/arm/boot/dts/berlin2q.dtsi
+index 63a48490e2f9..d4dbd28d348c 100644
+--- a/arch/arm/boot/dts/berlin2q.dtsi
++++ b/arch/arm/boot/dts/berlin2q.dtsi
+@@ -152,7 +152,7 @@
+ };
+
+ usb_phy2: phy@a2f400 {
+- compatible = "marvell,berlin2-usb-phy";
++ compatible = "marvell,berlin2cd-usb-phy";
+ reg = <0xa2f400 0x128>;
+ #phy-cells = <0>;
+ resets = <&chip_rst 0x104 14>;
+@@ -170,7 +170,7 @@
+ };
+
+ usb_phy0: phy@b74000 {
+- compatible = "marvell,berlin2-usb-phy";
++ compatible = "marvell,berlin2cd-usb-phy";
+ reg = <0xb74000 0x128>;
+ #phy-cells = <0>;
+ resets = <&chip_rst 0x104 12>;
+@@ -178,7 +178,7 @@
+ };
+
+ usb_phy1: phy@b78000 {
+- compatible = "marvell,berlin2-usb-phy";
++ compatible = "marvell,berlin2cd-usb-phy";
+ reg = <0xb78000 0x128>;
+ #phy-cells = <0>;
+ resets = <&chip_rst 0x104 13>;
+diff --git a/arch/arm/boot/dts/exynos5420-peach-pit.dts b/arch/arm/boot/dts/exynos5420-peach-pit.dts
+index 8f4d76c5e11c..1b95da79293c 100644
+--- a/arch/arm/boot/dts/exynos5420-peach-pit.dts
++++ b/arch/arm/boot/dts/exynos5420-peach-pit.dts
+@@ -915,6 +915,11 @@
+ };
+ };
+
++&pmu_system_controller {
++ assigned-clocks = <&pmu_system_controller 0>;
++ assigned-clock-parents = <&clock CLK_FIN_PLL>;
++};
++
+ &rtc {
+ status = "okay";
+ clocks = <&clock CLK_RTC>, <&max77802 MAX77802_CLK_32K_AP>;
+diff --git a/arch/arm/boot/dts/exynos5800-peach-pi.dts b/arch/arm/boot/dts/exynos5800-peach-pi.dts
+index 7d5b386b5ae6..8f40c7e549bd 100644
+--- a/arch/arm/boot/dts/exynos5800-peach-pi.dts
++++ b/arch/arm/boot/dts/exynos5800-peach-pi.dts
+@@ -878,6 +878,11 @@
+ };
+ };
+
++&pmu_system_controller {
++ assigned-clocks = <&pmu_system_controller 0>;
++ assigned-clock-parents = <&clock CLK_FIN_PLL>;
++};
++
+ &rtc {
+ status = "okay";
+ clocks = <&clock CLK_RTC>, <&max77802 MAX77802_CLK_32K_AP>;
+diff --git a/arch/arm/boot/dts/imx7d.dtsi b/arch/arm/boot/dts/imx7d.dtsi
+index c42cf8db0451..9accbae15374 100644
+--- a/arch/arm/boot/dts/imx7d.dtsi
++++ b/arch/arm/boot/dts/imx7d.dtsi
+@@ -340,10 +340,10 @@
+ status = "disabled";
+ };
+
+- uart2: serial@30870000 {
++ uart2: serial@30890000 {
+ compatible = "fsl,imx7d-uart",
+ "fsl,imx6q-uart";
+- reg = <0x30870000 0x10000>;
++ reg = <0x30890000 0x10000>;
+ interrupts = <GIC_SPI 27 IRQ_TYPE_LEVEL_HIGH>;
+ clocks = <&clks IMX7D_UART2_ROOT_CLK>,
+ <&clks IMX7D_UART2_ROOT_CLK>;
+diff --git a/arch/arm/boot/dts/ste-hrefv60plus.dtsi b/arch/arm/boot/dts/ste-hrefv60plus.dtsi
+index 810cda743b6d..9c2387b34d0c 100644
+--- a/arch/arm/boot/dts/ste-hrefv60plus.dtsi
++++ b/arch/arm/boot/dts/ste-hrefv60plus.dtsi
+@@ -56,7 +56,7 @@
+ /* VMMCI level-shifter enable */
+ default_hrefv60_cfg2 {
+ pins = "GPIO169_D22";
+- ste,config = <&gpio_out_lo>;
++ ste,config = <&gpio_out_hi>;
+ };
+ /* VMMCI level-shifter voltage select */
+ default_hrefv60_cfg3 {
+diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
+index bfb915d05665..dd5fc1e36384 100644
+--- a/arch/arm/kvm/Kconfig
++++ b/arch/arm/kvm/Kconfig
+@@ -21,6 +21,7 @@ config KVM
+ depends on MMU && OF
+ select PREEMPT_NOTIFIERS
+ select ANON_INODES
++ select ARM_GIC
+ select HAVE_KVM_CPU_RELAX_INTERCEPT
+ select HAVE_KVM_ARCH_TLB_FLUSH_ALL
+ select KVM_MMIO
+diff --git a/arch/arm/mach-exynos/pm_domains.c b/arch/arm/mach-exynos/pm_domains.c
+index 4a87e86dec45..7c21760f590f 100644
+--- a/arch/arm/mach-exynos/pm_domains.c
++++ b/arch/arm/mach-exynos/pm_domains.c
+@@ -200,15 +200,15 @@ no_clk:
+ args.args_count = 0;
+ child_domain = of_genpd_get_from_provider(&args);
+ if (IS_ERR(child_domain))
+- goto next_pd;
++ continue;
+
+ if (of_parse_phandle_with_args(np, "power-domains",
+ "#power-domain-cells", 0, &args) != 0)
+- goto next_pd;
++ continue;
+
+ parent_domain = of_genpd_get_from_provider(&args);
+ if (IS_ERR(parent_domain))
+- goto next_pd;
++ continue;
+
+ if (pm_genpd_add_subdomain(parent_domain, child_domain))
+ pr_warn("%s failed to add subdomain: %s\n",
+@@ -216,8 +216,6 @@ no_clk:
+ else
+ pr_info("%s has as child subdomain: %s.\n",
+ parent_domain->name, child_domain->name);
+-next_pd:
+- of_node_put(np);
+ }
+
+ return 0;
+diff --git a/arch/arm/plat-orion/common.c b/arch/arm/plat-orion/common.c
+index 2235081a04ee..8861c367d061 100644
+--- a/arch/arm/plat-orion/common.c
++++ b/arch/arm/plat-orion/common.c
+@@ -495,7 +495,7 @@ void __init orion_ge00_switch_init(struct dsa_platform_data *d, int irq)
+
+ d->netdev = &orion_ge00.dev;
+ for (i = 0; i < d->nr_chips; i++)
+- d->chip[i].host_dev = &orion_ge00_shared.dev;
++ d->chip[i].host_dev = &orion_ge_mvmdio.dev;
+ orion_switch_device.dev.platform_data = d;
+
+ platform_device_register(&orion_switch_device);
+diff --git a/arch/arm/vdso/vdsomunge.c b/arch/arm/vdso/vdsomunge.c
+index aedec81d1198..f6455273b2f8 100644
+--- a/arch/arm/vdso/vdsomunge.c
++++ b/arch/arm/vdso/vdsomunge.c
+@@ -45,7 +45,6 @@
+ * it does.
+ */
+
+-#include <byteswap.h>
+ #include <elf.h>
+ #include <errno.h>
+ #include <fcntl.h>
+@@ -59,6 +58,16 @@
+ #include <sys/types.h>
+ #include <unistd.h>
+
++#define swab16(x) \
++ ((((x) & 0x00ff) << 8) | \
++ (((x) & 0xff00) >> 8))
++
++#define swab32(x) \
++ ((((x) & 0x000000ff) << 24) | \
++ (((x) & 0x0000ff00) << 8) | \
++ (((x) & 0x00ff0000) >> 8) | \
++ (((x) & 0xff000000) >> 24))
++
+ #if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+ #define HOST_ORDER ELFDATA2LSB
+ #elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+@@ -104,17 +113,17 @@ static void cleanup(void)
+
+ static Elf32_Word read_elf_word(Elf32_Word word, bool swap)
+ {
+- return swap ? bswap_32(word) : word;
++ return swap ? swab32(word) : word;
+ }
+
+ static Elf32_Half read_elf_half(Elf32_Half half, bool swap)
+ {
+- return swap ? bswap_16(half) : half;
++ return swap ? swab16(half) : half;
+ }
+
+ static void write_elf_word(Elf32_Word val, Elf32_Word *dst, bool swap)
+ {
+- *dst = swap ? bswap_32(val) : val;
++ *dst = swap ? swab32(val) : val;
+ }
+
+ int main(int argc, char **argv)
+diff --git a/arch/arm64/kernel/armv8_deprecated.c b/arch/arm64/kernel/armv8_deprecated.c
+index 7922c2e710ca..7ac3920b1356 100644
+--- a/arch/arm64/kernel/armv8_deprecated.c
++++ b/arch/arm64/kernel/armv8_deprecated.c
+@@ -279,22 +279,24 @@ static void register_insn_emulation_sysctl(struct ctl_table *table)
+ */
+ #define __user_swpX_asm(data, addr, res, temp, B) \
+ __asm__ __volatile__( \
+- " mov %w2, %w1\n" \
+- "0: ldxr"B" %w1, [%3]\n" \
+- "1: stxr"B" %w0, %w2, [%3]\n" \
++ "0: ldxr"B" %w2, [%3]\n" \
++ "1: stxr"B" %w0, %w1, [%3]\n" \
+ " cbz %w0, 2f\n" \
+ " mov %w0, %w4\n" \
++ " b 3f\n" \
+ "2:\n" \
++ " mov %w1, %w2\n" \
++ "3:\n" \
+ " .pushsection .fixup,\"ax\"\n" \
+ " .align 2\n" \
+- "3: mov %w0, %w5\n" \
+- " b 2b\n" \
++ "4: mov %w0, %w5\n" \
++ " b 3b\n" \
+ " .popsection" \
+ " .pushsection __ex_table,\"a\"\n" \
+ " .align 3\n" \
+- " .quad 0b, 3b\n" \
+- " .quad 1b, 3b\n" \
+- " .popsection" \
++ " .quad 0b, 4b\n" \
++ " .quad 1b, 4b\n" \
++ " .popsection\n" \
+ : "=&r" (res), "+r" (data), "=&r" (temp) \
+ : "r" (addr), "i" (-EAGAIN), "i" (-EFAULT) \
+ : "memory")
+diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
+index 407991bf79f5..ccb6078ed9f2 100644
+--- a/arch/arm64/kernel/stacktrace.c
++++ b/arch/arm64/kernel/stacktrace.c
+@@ -48,11 +48,7 @@ int notrace unwind_frame(struct stackframe *frame)
+
+ frame->sp = fp + 0x10;
+ frame->fp = *(unsigned long *)(fp);
+- /*
+- * -4 here because we care about the PC at time of bl,
+- * not where the return will go.
+- */
+- frame->pc = *(unsigned long *)(fp + 8) - 4;
++ frame->pc = *(unsigned long *)(fp + 8);
+
+ return 0;
+ }
+diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
+index 8297d502217e..44ca4143b013 100644
+--- a/arch/arm64/kernel/suspend.c
++++ b/arch/arm64/kernel/suspend.c
+@@ -80,17 +80,21 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
+ if (ret == 0) {
+ /*
+ * We are resuming from reset with TTBR0_EL1 set to the
+- * idmap to enable the MMU; restore the active_mm mappings in
+- * TTBR0_EL1 unless the active_mm == &init_mm, in which case
+- * the thread entered cpu_suspend with TTBR0_EL1 set to
+- * reserved TTBR0 page tables and should be restored as such.
++ * idmap to enable the MMU; set the TTBR0 to the reserved
++ * page tables to prevent speculative TLB allocations, flush
++ * the local tlb and set the default tcr_el1.t0sz so that
++ * the TTBR0 address space set-up is properly restored.
++ * If the current active_mm != &init_mm we entered cpu_suspend
++ * with mappings in TTBR0 that must be restored, so we switch
++ * them back to complete the address space configuration
++ * restoration before returning.
+ */
+- if (mm == &init_mm)
+- cpu_set_reserved_ttbr0();
+- else
+- cpu_switch_mm(mm->pgd, mm);
+-
++ cpu_set_reserved_ttbr0();
+ flush_tlb_all();
++ cpu_set_default_tcr_t0sz();
++
++ if (mm != &init_mm)
++ cpu_switch_mm(mm->pgd, mm);
+
+ /*
+ * Restore per-cpu offset before any kernel
+diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
+index caffb10e7aa3..5607693f35cf 100644
+--- a/arch/powerpc/kernel/rtas.c
++++ b/arch/powerpc/kernel/rtas.c
+@@ -1041,6 +1041,9 @@ asmlinkage int ppc_rtas(struct rtas_args __user *uargs)
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
++ if (!rtas.entry)
++ return -EINVAL;
++
+ if (copy_from_user(&args, uargs, 3 * sizeof(u32)) != 0)
+ return -EFAULT;
+
+diff --git a/arch/um/kernel/trap.c b/arch/um/kernel/trap.c
+index 557232f758b6..5610b185d1e9 100644
+--- a/arch/um/kernel/trap.c
++++ b/arch/um/kernel/trap.c
+@@ -220,7 +220,7 @@ unsigned long segv(struct faultinfo fi, unsigned long ip, int is_user,
+ show_regs(container_of(regs, struct pt_regs, regs));
+ panic("Segfault with no mm");
+ }
+- else if (!is_user && address < TASK_SIZE) {
++ else if (!is_user && address > PAGE_SIZE && address < TASK_SIZE) {
+ show_regs(container_of(regs, struct pt_regs, regs));
+ panic("Kernel tried to access user memory at addr 0x%lx, ip 0x%lx",
+ address, ip);
+diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
+index 7d69afd8b6fa..16edc0f169fa 100644
+--- a/arch/x86/boot/compressed/eboot.c
++++ b/arch/x86/boot/compressed/eboot.c
+@@ -667,6 +667,7 @@ setup_gop32(struct screen_info *si, efi_guid_t *proto,
+ bool conout_found = false;
+ void *dummy = NULL;
+ u32 h = handles[i];
++ u32 current_fb_base;
+
+ status = efi_call_early(handle_protocol, h,
+ proto, (void **)&gop32);
+@@ -678,7 +679,7 @@ setup_gop32(struct screen_info *si, efi_guid_t *proto,
+ if (status == EFI_SUCCESS)
+ conout_found = true;
+
+- status = __gop_query32(gop32, &info, &size, &fb_base);
++ status = __gop_query32(gop32, &info, &size, &current_fb_base);
+ if (status == EFI_SUCCESS && (!first_gop || conout_found)) {
+ /*
+ * Systems that use the UEFI Console Splitter may
+@@ -692,6 +693,7 @@ setup_gop32(struct screen_info *si, efi_guid_t *proto,
+ pixel_format = info->pixel_format;
+ pixel_info = info->pixel_information;
+ pixels_per_scan_line = info->pixels_per_scan_line;
++ fb_base = current_fb_base;
+
+ /*
+ * Once we've found a GOP supporting ConOut,
+@@ -770,6 +772,7 @@ setup_gop64(struct screen_info *si, efi_guid_t *proto,
+ bool conout_found = false;
+ void *dummy = NULL;
+ u64 h = handles[i];
++ u32 current_fb_base;
+
+ status = efi_call_early(handle_protocol, h,
+ proto, (void **)&gop64);
+@@ -781,7 +784,7 @@ setup_gop64(struct screen_info *si, efi_guid_t *proto,
+ if (status == EFI_SUCCESS)
+ conout_found = true;
+
+- status = __gop_query64(gop64, &info, &size, &fb_base);
++ status = __gop_query64(gop64, &info, &size, &current_fb_base);
+ if (status == EFI_SUCCESS && (!first_gop || conout_found)) {
+ /*
+ * Systems that use the UEFI Console Splitter may
+@@ -795,6 +798,7 @@ setup_gop64(struct screen_info *si, efi_guid_t *proto,
+ pixel_format = info->pixel_format;
+ pixel_info = info->pixel_information;
+ pixels_per_scan_line = info->pixels_per_scan_line;
++ fb_base = current_fb_base;
+
+ /*
+ * Once we've found a GOP supporting ConOut,
+diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
+index 5880b482d83c..11b46d91f4e5 100644
+--- a/arch/x86/kernel/apic/io_apic.c
++++ b/arch/x86/kernel/apic/io_apic.c
+@@ -2547,7 +2547,9 @@ void __init setup_ioapic_dest(void)
+ mask = apic->target_cpus();
+
+ chip = irq_data_get_irq_chip(idata);
+- chip->irq_set_affinity(idata, mask, false);
++ /* Might be lapic_chip for irq 0 */
++ if (chip->irq_set_affinity)
++ chip->irq_set_affinity(idata, mask, false);
+ }
+ }
+ #endif
+diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
+index 777ad2f03160..3cebc65221a2 100644
+--- a/arch/x86/xen/enlighten.c
++++ b/arch/x86/xen/enlighten.c
+@@ -33,7 +33,7 @@
+ #include <linux/memblock.h>
+ #include <linux/edd.h>
+
+-#ifdef CONFIG_KEXEC_CORE
++#ifdef CONFIG_KEXEC
+ #include <linux/kexec.h>
+ #endif
+
+@@ -1804,7 +1804,7 @@ static struct notifier_block xen_hvm_cpu_notifier = {
+ .notifier_call = xen_hvm_cpu_notify,
+ };
+
+-#ifdef CONFIG_KEXEC_CORE
++#ifdef CONFIG_KEXEC
+ static void xen_hvm_shutdown(void)
+ {
+ native_machine_shutdown();
+@@ -1838,7 +1838,7 @@ static void __init xen_hvm_guest_init(void)
+ x86_init.irqs.intr_init = xen_init_IRQ;
+ xen_hvm_init_time_ops();
+ xen_hvm_init_mmu_ops();
+-#ifdef CONFIG_KEXEC_CORE
++#ifdef CONFIG_KEXEC
+ machine_ops.shutdown = xen_hvm_shutdown;
+ machine_ops.crash_shutdown = xen_hvm_crash_shutdown;
+ #endif
+diff --git a/block/blk-core.c b/block/blk-core.c
+index 627ed0c593fb..1955ed3a1fa9 100644
+--- a/block/blk-core.c
++++ b/block/blk-core.c
+@@ -578,7 +578,7 @@ void blk_cleanup_queue(struct request_queue *q)
+ q->queue_lock = &q->__queue_lock;
+ spin_unlock_irq(lock);
+
+- bdi_destroy(&q->backing_dev_info);
++ bdi_unregister(&q->backing_dev_info);
+
+ /* @q is and will stay empty, shutdown and put */
+ blk_put_queue(q);
+diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
+index 9115c6d59948..273519894951 100644
+--- a/block/blk-mq-tag.c
++++ b/block/blk-mq-tag.c
+@@ -628,6 +628,7 @@ void blk_mq_free_tags(struct blk_mq_tags *tags)
+ {
+ bt_free(&tags->bitmap_tags);
+ bt_free(&tags->breserved_tags);
++ free_cpumask_var(tags->cpumask);
+ kfree(tags);
+ }
+
+diff --git a/block/blk-mq.c b/block/blk-mq.c
+index c69902695136..4d6ff5259a61 100644
+--- a/block/blk-mq.c
++++ b/block/blk-mq.c
+@@ -2263,10 +2263,8 @@ void blk_mq_free_tag_set(struct blk_mq_tag_set *set)
+ int i;
+
+ for (i = 0; i < set->nr_hw_queues; i++) {
+- if (set->tags[i]) {
++ if (set->tags[i])
+ blk_mq_free_rq_map(set, set->tags[i], i);
+- free_cpumask_var(set->tags[i]->cpumask);
+- }
+ }
+
+ kfree(set->tags);
+diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
+index 6264b382d4d1..145ddb6c6d31 100644
+--- a/block/blk-sysfs.c
++++ b/block/blk-sysfs.c
+@@ -502,6 +502,7 @@ static void blk_release_queue(struct kobject *kobj)
+ struct request_queue *q =
+ container_of(kobj, struct request_queue, kobj);
+
++ bdi_exit(&q->backing_dev_info);
+ blkcg_exit_queue(q);
+
+ if (q->elevator) {
+diff --git a/crypto/ablkcipher.c b/crypto/ablkcipher.c
+index b788f169cc98..b4ffc5be1a93 100644
+--- a/crypto/ablkcipher.c
++++ b/crypto/ablkcipher.c
+@@ -706,7 +706,7 @@ struct crypto_ablkcipher *crypto_alloc_ablkcipher(const char *alg_name,
+ err:
+ if (err != -EAGAIN)
+ break;
+- if (signal_pending(current)) {
++ if (fatal_signal_pending(current)) {
+ err = -EINTR;
+ break;
+ }
+diff --git a/crypto/algapi.c b/crypto/algapi.c
+index 3c079b7f23f6..b603b34ce8a8 100644
+--- a/crypto/algapi.c
++++ b/crypto/algapi.c
+@@ -335,7 +335,7 @@ static void crypto_wait_for_test(struct crypto_larval *larval)
+ crypto_alg_tested(larval->alg.cra_driver_name, 0);
+ }
+
+- err = wait_for_completion_interruptible(&larval->completion);
++ err = wait_for_completion_killable(&larval->completion);
+ WARN_ON(err);
+
+ out:
+diff --git a/crypto/api.c b/crypto/api.c
+index afe4610afc4b..bbc147cb5dec 100644
+--- a/crypto/api.c
++++ b/crypto/api.c
+@@ -172,7 +172,7 @@ static struct crypto_alg *crypto_larval_wait(struct crypto_alg *alg)
+ struct crypto_larval *larval = (void *)alg;
+ long timeout;
+
+- timeout = wait_for_completion_interruptible_timeout(
++ timeout = wait_for_completion_killable_timeout(
+ &larval->completion, 60 * HZ);
+
+ alg = larval->adult;
+@@ -445,7 +445,7 @@ struct crypto_tfm *crypto_alloc_base(const char *alg_name, u32 type, u32 mask)
+ err:
+ if (err != -EAGAIN)
+ break;
+- if (signal_pending(current)) {
++ if (fatal_signal_pending(current)) {
+ err = -EINTR;
+ break;
+ }
+@@ -562,7 +562,7 @@ void *crypto_alloc_tfm(const char *alg_name,
+ err:
+ if (err != -EAGAIN)
+ break;
+- if (signal_pending(current)) {
++ if (fatal_signal_pending(current)) {
+ err = -EINTR;
+ break;
+ }
+diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
+index 08ea2867fc8a..d59fb4eeed2b 100644
+--- a/crypto/crypto_user.c
++++ b/crypto/crypto_user.c
+@@ -376,7 +376,7 @@ static struct crypto_alg *crypto_user_skcipher_alg(const char *name, u32 type,
+ err = PTR_ERR(alg);
+ if (err != -EAGAIN)
+ break;
+- if (signal_pending(current)) {
++ if (fatal_signal_pending(current)) {
+ err = -EINTR;
+ break;
+ }
+diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c
+index 7920c2741b47..cf91c114ed9f 100644
+--- a/drivers/block/nvme-core.c
++++ b/drivers/block/nvme-core.c
+@@ -597,6 +597,7 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
+ struct nvme_iod *iod = ctx;
+ struct request *req = iod_get_private(iod);
+ struct nvme_cmd_info *cmd_rq = blk_mq_rq_to_pdu(req);
++ bool requeue = false;
+
+ u16 status = le16_to_cpup(&cqe->status) >> 1;
+
+@@ -605,12 +606,13 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
+ && (jiffies - req->start_time) < req->timeout) {
+ unsigned long flags;
+
++ requeue = true;
+ blk_mq_requeue_request(req);
+ spin_lock_irqsave(req->q->queue_lock, flags);
+ if (!blk_queue_stopped(req->q))
+ blk_mq_kick_requeue_list(req->q);
+ spin_unlock_irqrestore(req->q->queue_lock, flags);
+- return;
++ goto release_iod;
+ }
+ if (req->cmd_type == REQ_TYPE_DRV_PRIV) {
+ if (cmd_rq->ctx == CMD_CTX_CANCELLED)
+@@ -631,7 +633,7 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
+ dev_warn(nvmeq->dev->dev,
+ "completing aborted command with status:%04x\n",
+ status);
+-
++ release_iod:
+ if (iod->nents) {
+ dma_unmap_sg(nvmeq->dev->dev, iod->sg, iod->nents,
+ rq_data_dir(req) ? DMA_TO_DEVICE : DMA_FROM_DEVICE);
+@@ -644,7 +646,8 @@ static void req_completion(struct nvme_queue *nvmeq, void *ctx,
+ }
+ nvme_free_iod(nvmeq->dev, iod);
+
+- blk_mq_complete_request(req);
++ if (likely(!requeue))
++ blk_mq_complete_request(req);
+ }
+
+ /* length is in bytes. gfp flags indicates whether we may sleep. */
+@@ -1764,7 +1767,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
+
+ length = (io.nblocks + 1) << ns->lba_shift;
+ meta_len = (io.nblocks + 1) * ns->ms;
+- metadata = (void __user *)(unsigned long)io.metadata;
++ metadata = (void __user *)(uintptr_t)io.metadata;
+ write = io.opcode & 1;
+
+ if (ns->ext) {
+@@ -1804,7 +1807,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
+ c.rw.metadata = cpu_to_le64(meta_dma);
+
+ status = __nvme_submit_sync_cmd(ns->queue, &c, NULL,
+- (void __user *)io.addr, length, NULL, 0);
++ (void __user *)(uintptr_t)io.addr, length, NULL, 0);
+ unmap:
+ if (meta) {
+ if (status == NVME_SC_SUCCESS && !write) {
+@@ -1846,7 +1849,7 @@ static int nvme_user_cmd(struct nvme_dev *dev, struct nvme_ns *ns,
+ timeout = msecs_to_jiffies(cmd.timeout_ms);
+
+ status = __nvme_submit_sync_cmd(ns ? ns->queue : dev->admin_q, &c,
+- NULL, (void __user *)cmd.addr, cmd.data_len,
++ NULL, (void __user *)(uintptr_t)cmd.addr, cmd.data_len,
+ &cmd.result, timeout);
+ if (status >= 0) {
+ if (put_user(cmd.result, &ucmd->result))
+diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
+index 324bf35ec4dd..017b7d58ae06 100644
+--- a/drivers/block/rbd.c
++++ b/drivers/block/rbd.c
+@@ -96,6 +96,8 @@ static int atomic_dec_return_safe(atomic_t *v)
+ #define RBD_MINORS_PER_MAJOR 256
+ #define RBD_SINGLE_MAJOR_PART_SHIFT 4
+
++#define RBD_MAX_PARENT_CHAIN_LEN 16
++
+ #define RBD_SNAP_DEV_NAME_PREFIX "snap_"
+ #define RBD_MAX_SNAP_NAME_LEN \
+ (NAME_MAX - (sizeof (RBD_SNAP_DEV_NAME_PREFIX) - 1))
+@@ -426,7 +428,7 @@ static ssize_t rbd_add_single_major(struct bus_type *bus, const char *buf,
+ size_t count);
+ static ssize_t rbd_remove_single_major(struct bus_type *bus, const char *buf,
+ size_t count);
+-static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping);
++static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth);
+ static void rbd_spec_put(struct rbd_spec *spec);
+
+ static int rbd_dev_id_to_minor(int dev_id)
+@@ -3819,6 +3821,9 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
+ q->limits.discard_zeroes_data = 1;
+
+ blk_queue_merge_bvec(q, rbd_merge_bvec);
++ if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
++ q->backing_dev_info.capabilities |= BDI_CAP_STABLE_WRITES;
++
+ disk->queue = q;
+
+ q->queuedata = rbd_dev;
+@@ -5169,44 +5174,51 @@ out_err:
+ return ret;
+ }
+
+-static int rbd_dev_probe_parent(struct rbd_device *rbd_dev)
++/*
++ * @depth is rbd_dev_image_probe() -> rbd_dev_probe_parent() ->
++ * rbd_dev_image_probe() recursion depth, which means it's also the
++ * length of the already discovered part of the parent chain.
++ */
++static int rbd_dev_probe_parent(struct rbd_device *rbd_dev, int depth)
+ {
+ struct rbd_device *parent = NULL;
+- struct rbd_spec *parent_spec;
+- struct rbd_client *rbdc;
+ int ret;
+
+ if (!rbd_dev->parent_spec)
+ return 0;
+- /*
+- * We need to pass a reference to the client and the parent
+- * spec when creating the parent rbd_dev. Images related by
+- * parent/child relationships always share both.
+- */
+- parent_spec = rbd_spec_get(rbd_dev->parent_spec);
+- rbdc = __rbd_get_client(rbd_dev->rbd_client);
+
+- ret = -ENOMEM;
+- parent = rbd_dev_create(rbdc, parent_spec, NULL);
+- if (!parent)
++ if (++depth > RBD_MAX_PARENT_CHAIN_LEN) {
++ pr_info("parent chain is too long (%d)\n", depth);
++ ret = -EINVAL;
+ goto out_err;
++ }
+
+- ret = rbd_dev_image_probe(parent, false);
++ parent = rbd_dev_create(rbd_dev->rbd_client, rbd_dev->parent_spec,
++ NULL);
++ if (!parent) {
++ ret = -ENOMEM;
++ goto out_err;
++ }
++
++ /*
++ * Images related by parent/child relationships always share
++ * rbd_client and spec/parent_spec, so bump their refcounts.
++ */
++ __rbd_get_client(rbd_dev->rbd_client);
++ rbd_spec_get(rbd_dev->parent_spec);
++
++ ret = rbd_dev_image_probe(parent, depth);
+ if (ret < 0)
+ goto out_err;
++
+ rbd_dev->parent = parent;
+ atomic_set(&rbd_dev->parent_ref, 1);
+-
+ return 0;
++
+ out_err:
+- if (parent) {
+- rbd_dev_unparent(rbd_dev);
++ rbd_dev_unparent(rbd_dev);
++ if (parent)
+ rbd_dev_destroy(parent);
+- } else {
+- rbd_put_client(rbdc);
+- rbd_spec_put(parent_spec);
+- }
+-
+ return ret;
+ }
+
+@@ -5324,7 +5336,7 @@ static void rbd_dev_image_release(struct rbd_device *rbd_dev)
+ * parent), initiate a watch on its header object before using that
+ * object to get detailed information about the rbd image.
+ */
+-static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
++static int rbd_dev_image_probe(struct rbd_device *rbd_dev, int depth)
+ {
+ int ret;
+
+@@ -5342,7 +5354,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
+ if (ret)
+ goto err_out_format;
+
+- if (mapping) {
++ if (!depth) {
+ ret = rbd_dev_header_watch_sync(rbd_dev);
+ if (ret) {
+ if (ret == -ENOENT)
+@@ -5363,7 +5375,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
+ * Otherwise this is a parent image, identified by pool, image
+ * and snap ids - need to fill in names for those ids.
+ */
+- if (mapping)
++ if (!depth)
+ ret = rbd_spec_fill_snap_id(rbd_dev);
+ else
+ ret = rbd_spec_fill_names(rbd_dev);
+@@ -5385,12 +5397,12 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
+ * Need to warn users if this image is the one being
+ * mapped and has a parent.
+ */
+- if (mapping && rbd_dev->parent_spec)
++ if (!depth && rbd_dev->parent_spec)
+ rbd_warn(rbd_dev,
+ "WARNING: kernel layering is EXPERIMENTAL!");
+ }
+
+- ret = rbd_dev_probe_parent(rbd_dev);
++ ret = rbd_dev_probe_parent(rbd_dev, depth);
+ if (ret)
+ goto err_out_probe;
+
+@@ -5401,7 +5413,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
+ err_out_probe:
+ rbd_dev_unprobe(rbd_dev);
+ err_out_watch:
+- if (mapping)
++ if (!depth)
+ rbd_dev_header_unwatch_sync(rbd_dev);
+ out_header_name:
+ kfree(rbd_dev->header_name);
+@@ -5464,7 +5476,7 @@ static ssize_t do_rbd_add(struct bus_type *bus,
+ spec = NULL; /* rbd_dev now owns this */
+ rbd_opts = NULL; /* rbd_dev now owns this */
+
+- rc = rbd_dev_image_probe(rbd_dev, true);
++ rc = rbd_dev_image_probe(rbd_dev, 0);
+ if (rc < 0)
+ goto err_out_rbd_dev;
+
+diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
+index 7a8a73f1fc04..d68b08ae4be1 100644
+--- a/drivers/block/xen-blkfront.c
++++ b/drivers/block/xen-blkfront.c
+@@ -1984,7 +1984,8 @@ static void blkback_changed(struct xenbus_device *dev,
+ break;
+ /* Missed the backend's Closing state -- fallthrough */
+ case XenbusStateClosing:
+- blkfront_closing(info);
++ if (info)
++ blkfront_closing(info);
+ break;
+ }
+ }
+diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c
+index 7d9879e166cf..395cb7f9f5a4 100644
+--- a/drivers/bus/arm-ccn.c
++++ b/drivers/bus/arm-ccn.c
+@@ -1188,7 +1188,8 @@ static int arm_ccn_pmu_cpu_notifier(struct notifier_block *nb,
+ break;
+ perf_pmu_migrate_context(&dt->pmu, cpu, target);
+ cpumask_set_cpu(target, &dt->cpu);
+- WARN_ON(irq_set_affinity(ccn->irq, &dt->cpu) != 0);
++ if (ccn->irq)
++ WARN_ON(irq_set_affinity(ccn->irq, &dt->cpu) != 0);
+ default:
+ break;
+ }
+diff --git a/drivers/clk/clkdev.c b/drivers/clk/clkdev.c
+index c0eaf0973bd2..779b6ff0c7ad 100644
+--- a/drivers/clk/clkdev.c
++++ b/drivers/clk/clkdev.c
+@@ -333,7 +333,8 @@ int clk_add_alias(const char *alias, const char *alias_dev_name,
+ if (IS_ERR(r))
+ return PTR_ERR(r);
+
+- l = clkdev_create(r, alias, "%s", alias_dev_name);
++ l = clkdev_create(r, alias, alias_dev_name ? "%s" : NULL,
++ alias_dev_name);
+ clk_put(r);
+
+ return l ? 0 : -ENODEV;
+diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
+index fcb929ec5304..aba2117a80c1 100644
+--- a/drivers/cpufreq/intel_pstate.c
++++ b/drivers/cpufreq/intel_pstate.c
+@@ -766,6 +766,11 @@ static inline void intel_pstate_sample(struct cpudata *cpu)
+ local_irq_save(flags);
+ rdmsrl(MSR_IA32_APERF, aperf);
+ rdmsrl(MSR_IA32_MPERF, mperf);
++ if (cpu->prev_mperf == mperf) {
++ local_irq_restore(flags);
++ return;
++ }
++
+ tsc = native_read_tsc();
+ local_irq_restore(flags);
+
+diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c
+index ca7831168298..91cf71008e11 100644
+--- a/drivers/edac/sb_edac.c
++++ b/drivers/edac/sb_edac.c
+@@ -1648,6 +1648,7 @@ static int sbridge_mci_bind_devs(struct mem_ctl_info *mci,
+ {
+ struct sbridge_pvt *pvt = mci->pvt_info;
+ struct pci_dev *pdev;
++ u8 saw_chan_mask = 0;
+ int i;
+
+ for (i = 0; i < sbridge_dev->n_devs; i++) {
+@@ -1681,6 +1682,7 @@ static int sbridge_mci_bind_devs(struct mem_ctl_info *mci,
+ {
+ int id = pdev->device - PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TAD0;
+ pvt->pci_tad[id] = pdev;
++ saw_chan_mask |= 1 << id;
+ }
+ break;
+ case PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_DDRIO:
+@@ -1701,10 +1703,8 @@ static int sbridge_mci_bind_devs(struct mem_ctl_info *mci,
+ !pvt-> pci_tad || !pvt->pci_ras || !pvt->pci_ta)
+ goto enodev;
+
+- for (i = 0; i < NUM_CHANNELS; i++) {
+- if (!pvt->pci_tad[i])
+- goto enodev;
+- }
++ if (saw_chan_mask != 0x0f)
++ goto enodev;
+ return 0;
+
+ enodev:
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+index f7b49d5ce4b8..e3305a5aedfd 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+@@ -1583,6 +1583,7 @@ struct amdgpu_pm {
+ u8 fan_max_rpm;
+ /* dpm */
+ bool dpm_enabled;
++ bool sysfs_initialized;
+ struct amdgpu_dpm dpm;
+ const struct firmware *fw; /* SMC firmware */
+ uint32_t fw_version;
+diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+index ed13baa7c976..91c7556a365a 100644
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+@@ -693,6 +693,9 @@ int amdgpu_pm_sysfs_init(struct amdgpu_device *adev)
+ {
+ int ret;
+
++ if (adev->pm.sysfs_initialized)
++ return 0;
++
+ if (adev->pm.funcs->get_temperature == NULL)
+ return 0;
+ adev->pm.int_hwmon_dev = hwmon_device_register_with_groups(adev->dev,
+@@ -721,6 +724,8 @@ int amdgpu_pm_sysfs_init(struct amdgpu_device *adev)
+ return ret;
+ }
+
++ adev->pm.sysfs_initialized = true;
++
+ return 0;
+ }
+
+diff --git a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
+index 9745ed3a9aef..7e9154c7f1db 100644
+--- a/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
++++ b/drivers/gpu/drm/amd/amdgpu/kv_dpm.c
+@@ -2997,6 +2997,9 @@ static int kv_dpm_late_init(void *handle)
+ struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+ int ret;
+
++ if (!amdgpu_dpm)
++ return 0;
++
+ /* init the sysfs and debugfs files late */
+ ret = amdgpu_pm_sysfs_init(adev);
+ if (ret)
+diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
+index fed748311b92..4e8d72d40af4 100644
+--- a/drivers/gpu/drm/drm_crtc.c
++++ b/drivers/gpu/drm/drm_crtc.c
+@@ -4221,7 +4221,7 @@ drm_property_create_blob(struct drm_device *dev, size_t length,
+ struct drm_property_blob *blob;
+ int ret;
+
+- if (!length)
++ if (!length || length > ULONG_MAX - sizeof(struct drm_property_blob))
+ return ERR_PTR(-EINVAL);
+
+ blob = kzalloc(sizeof(struct drm_property_blob)+length, GFP_KERNEL);
+@@ -4573,7 +4573,7 @@ int drm_mode_createblob_ioctl(struct drm_device *dev,
+ * not associated with any file_priv. */
+ mutex_lock(&dev->mode_config.blob_lock);
+ out_resp->blob_id = blob->base.id;
+- list_add_tail(&file_priv->blobs, &blob->head_file);
++ list_add_tail(&blob->head_file, &file_priv->blobs);
+ mutex_unlock(&dev->mode_config.blob_lock);
+
+ return 0;
+diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
+index 27a2426c3daa..1f94219f3e0e 100644
+--- a/drivers/gpu/drm/drm_dp_mst_topology.c
++++ b/drivers/gpu/drm/drm_dp_mst_topology.c
+@@ -1193,17 +1193,18 @@ static struct drm_dp_mst_branch *drm_dp_get_mst_branch_device(struct drm_dp_mst_
+
+ list_for_each_entry(port, &mstb->ports, next) {
+ if (port->port_num == port_num) {
+- if (!port->mstb) {
++ mstb = port->mstb;
++ if (!mstb) {
+ DRM_ERROR("failed to lookup MSTB with lct %d, rad %02x\n", lct, rad[0]);
+- return NULL;
++ goto out;
+ }
+
+- mstb = port->mstb;
+ break;
+ }
+ }
+ }
+ kref_get(&mstb->kref);
++out:
+ mutex_unlock(&mgr->lock);
+ return mstb;
+ }
+diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
+index 8fd431bcdfd3..a96b9006a51e 100644
+--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
++++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
+@@ -804,7 +804,10 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
+ * Also note, that the object created here is not currently a "first class"
+ * object, in that several ioctls are banned. These are the CPU access
+ * ioctls: mmap(), pwrite and pread. In practice, you are expected to use
+- * direct access via your pointer rather than use those ioctls.
++ * direct access via your pointer rather than use those ioctls. Another
++ * restriction is that we do not allow userptr surfaces to be pinned to the
++ * hardware and so we reject any attempt to create a framebuffer out of a
++ * userptr.
+ *
+ * If you think this is a good interface to use to pass GPU memory between
+ * drivers, please use dma-buf instead. In fact, wherever possible use
+diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
+index 107c6c0519fd..10b1b657d32a 100644
+--- a/drivers/gpu/drm/i915/intel_display.c
++++ b/drivers/gpu/drm/i915/intel_display.c
+@@ -1729,6 +1729,8 @@ static void i9xx_enable_pll(struct intel_crtc *crtc)
+ I915_READ(DPLL(!crtc->pipe)) | DPLL_DVO_2X_MODE);
+ }
+
++ I915_WRITE(reg, dpll);
++
+ /* Wait for the clocks to stabilize. */
+ POSTING_READ(reg);
+ udelay(150);
+@@ -14070,6 +14072,11 @@ static int intel_user_framebuffer_create_handle(struct drm_framebuffer *fb,
+ struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
+ struct drm_i915_gem_object *obj = intel_fb->obj;
+
++ if (obj->userptr.mm) {
++ DRM_DEBUG("attempting to use a userptr for a framebuffer, denied\n");
++ return -EINVAL;
++ }
++
+ return drm_gem_handle_create(file, &obj->base, handle);
+ }
+
+diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
+index 7f2161a1ff5d..504728b401b6 100644
+--- a/drivers/gpu/drm/i915/intel_lrc.c
++++ b/drivers/gpu/drm/i915/intel_lrc.c
+@@ -1250,6 +1250,7 @@ static int gen8_emit_flush_render(struct intel_ringbuffer *ringbuf,
+ if (flush_domains) {
+ flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+ flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
++ flags |= PIPE_CONTROL_FLUSH_ENABLE;
+ }
+
+ if (invalidate_domains) {
+diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
+index 3817a6f00d9e..ba672aa980e1 100644
+--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
++++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
+@@ -342,6 +342,7 @@ gen7_render_ring_flush(struct intel_engine_cs *ring,
+ if (flush_domains) {
+ flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+ flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
++ flags |= PIPE_CONTROL_FLUSH_ENABLE;
+ }
+ if (invalidate_domains) {
+ flags |= PIPE_CONTROL_TLB_INVALIDATE;
+@@ -412,6 +413,7 @@ gen8_render_ring_flush(struct intel_engine_cs *ring,
+ if (flush_domains) {
+ flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
+ flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
++ flags |= PIPE_CONTROL_FLUSH_ENABLE;
+ }
+ if (invalidate_domains) {
+ flags |= PIPE_CONTROL_TLB_INVALIDATE;
+diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
+index af1ee517f372..0b2239423a37 100644
+--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
++++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
+@@ -227,11 +227,12 @@ nouveau_gem_info(struct drm_file *file_priv, struct drm_gem_object *gem,
+ struct nouveau_bo *nvbo = nouveau_gem_object(gem);
+ struct nvkm_vma *vma;
+
+- if (nvbo->bo.mem.mem_type == TTM_PL_TT)
++ if (is_power_of_2(nvbo->valid_domains))
++ rep->domain = nvbo->valid_domains;
++ else if (nvbo->bo.mem.mem_type == TTM_PL_TT)
+ rep->domain = NOUVEAU_GEM_DOMAIN_GART;
+ else
+ rep->domain = NOUVEAU_GEM_DOMAIN_VRAM;
+-
+ rep->offset = nvbo->bo.offset;
+ if (cli->vm) {
+ vma = nouveau_bo_vma_find(nvbo, cli->vm);
+diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c b/drivers/gpu/drm/radeon/atombios_encoders.c
+index 65adb9c72377..bb292143997e 100644
+--- a/drivers/gpu/drm/radeon/atombios_encoders.c
++++ b/drivers/gpu/drm/radeon/atombios_encoders.c
+@@ -237,6 +237,7 @@ void radeon_atom_backlight_init(struct radeon_encoder *radeon_encoder,
+ backlight_update_status(bd);
+
+ DRM_INFO("radeon atom DIG backlight initialized\n");
++ rdev->mode_info.bl_encoder = radeon_encoder;
+
+ return;
+
+@@ -1624,9 +1625,14 @@ radeon_atom_encoder_dpms_avivo(struct drm_encoder *encoder, int mode)
+ } else
+ atom_execute_table(rdev->mode_info.atom_context, index, (uint32_t *)&args);
+ if (radeon_encoder->devices & (ATOM_DEVICE_LCD_SUPPORT)) {
+- struct radeon_encoder_atom_dig *dig = radeon_encoder->enc_priv;
++ if (rdev->mode_info.bl_encoder) {
++ struct radeon_encoder_atom_dig *dig = radeon_encoder->enc_priv;
+
+- atombios_set_backlight_level(radeon_encoder, dig->backlight_level);
++ atombios_set_backlight_level(radeon_encoder, dig->backlight_level);
++ } else {
++ args.ucAction = ATOM_LCD_BLON;
++ atom_execute_table(rdev->mode_info.atom_context, index, (uint32_t *)&args);
++ }
+ }
+ break;
+ case DRM_MODE_DPMS_STANDBY:
+@@ -1706,8 +1712,13 @@ radeon_atom_encoder_dpms_dig(struct drm_encoder *encoder, int mode)
+ if (ASIC_IS_DCE4(rdev))
+ atombios_dig_encoder_setup(encoder, ATOM_ENCODER_CMD_DP_VIDEO_ON, 0);
+ }
+- if (radeon_encoder->devices & (ATOM_DEVICE_LCD_SUPPORT))
+- atombios_set_backlight_level(radeon_encoder, dig->backlight_level);
++ if (radeon_encoder->devices & (ATOM_DEVICE_LCD_SUPPORT)) {
++ if (rdev->mode_info.bl_encoder)
++ atombios_set_backlight_level(radeon_encoder, dig->backlight_level);
++ else
++ atombios_dig_transmitter_setup(encoder,
++ ATOM_TRANSMITTER_ACTION_LCD_BLON, 0, 0);
++ }
+ if (ext_encoder)
+ atombios_external_encoder_setup(encoder, ext_encoder, ATOM_ENABLE);
+ break;
+diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
+index f03b7eb15233..b6cbd816537e 100644
+--- a/drivers/gpu/drm/radeon/radeon.h
++++ b/drivers/gpu/drm/radeon/radeon.h
+@@ -1658,6 +1658,7 @@ struct radeon_pm {
+ u8 fan_max_rpm;
+ /* dpm */
+ bool dpm_enabled;
++ bool sysfs_initialized;
+ struct radeon_dpm dpm;
+ };
+
+diff --git a/drivers/gpu/drm/radeon/radeon_encoders.c b/drivers/gpu/drm/radeon/radeon_encoders.c
+index ef99917f000d..c6ee80216cf4 100644
+--- a/drivers/gpu/drm/radeon/radeon_encoders.c
++++ b/drivers/gpu/drm/radeon/radeon_encoders.c
+@@ -194,7 +194,6 @@ static void radeon_encoder_add_backlight(struct radeon_encoder *radeon_encoder,
+ radeon_atom_backlight_init(radeon_encoder, connector);
+ else
+ radeon_legacy_backlight_init(radeon_encoder, connector);
+- rdev->mode_info.bl_encoder = radeon_encoder;
+ }
+ }
+
+diff --git a/drivers/gpu/drm/radeon/radeon_legacy_encoders.c b/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
+index 45715307db71..30de43366eae 100644
+--- a/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
++++ b/drivers/gpu/drm/radeon/radeon_legacy_encoders.c
+@@ -441,6 +441,7 @@ void radeon_legacy_backlight_init(struct radeon_encoder *radeon_encoder,
+ backlight_update_status(bd);
+
+ DRM_INFO("radeon legacy LVDS backlight initialized\n");
++ rdev->mode_info.bl_encoder = radeon_encoder;
+
+ return;
+
+diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
+index 948c33105801..91764320c56f 100644
+--- a/drivers/gpu/drm/radeon/radeon_pm.c
++++ b/drivers/gpu/drm/radeon/radeon_pm.c
+@@ -720,10 +720,14 @@ static umode_t hwmon_attributes_visible(struct kobject *kobj,
+ struct radeon_device *rdev = dev_get_drvdata(dev);
+ umode_t effective_mode = attr->mode;
+
+- /* Skip limit attributes if DPM is not enabled */
++ /* Skip attributes if DPM is not enabled */
+ if (rdev->pm.pm_method != PM_METHOD_DPM &&
+ (attr == &sensor_dev_attr_temp1_crit.dev_attr.attr ||
+- attr == &sensor_dev_attr_temp1_crit_hyst.dev_attr.attr))
++ attr == &sensor_dev_attr_temp1_crit_hyst.dev_attr.attr ||
++ attr == &sensor_dev_attr_pwm1.dev_attr.attr ||
++ attr == &sensor_dev_attr_pwm1_enable.dev_attr.attr ||
++ attr == &sensor_dev_attr_pwm1_max.dev_attr.attr ||
++ attr == &sensor_dev_attr_pwm1_min.dev_attr.attr))
+ return 0;
+
+ /* Skip fan attributes if fan is not present */
+@@ -1529,19 +1533,23 @@ int radeon_pm_late_init(struct radeon_device *rdev)
+
+ if (rdev->pm.pm_method == PM_METHOD_DPM) {
+ if (rdev->pm.dpm_enabled) {
+- ret = device_create_file(rdev->dev, &dev_attr_power_dpm_state);
+- if (ret)
+- DRM_ERROR("failed to create device file for dpm state\n");
+- ret = device_create_file(rdev->dev, &dev_attr_power_dpm_force_performance_level);
+- if (ret)
+- DRM_ERROR("failed to create device file for dpm state\n");
+- /* XXX: these are noops for dpm but are here for backwards compat */
+- ret = device_create_file(rdev->dev, &dev_attr_power_profile);
+- if (ret)
+- DRM_ERROR("failed to create device file for power profile\n");
+- ret = device_create_file(rdev->dev, &dev_attr_power_method);
+- if (ret)
+- DRM_ERROR("failed to create device file for power method\n");
++ if (!rdev->pm.sysfs_initialized) {
++ ret = device_create_file(rdev->dev, &dev_attr_power_dpm_state);
++ if (ret)
++ DRM_ERROR("failed to create device file for dpm state\n");
++ ret = device_create_file(rdev->dev, &dev_attr_power_dpm_force_performance_level);
++ if (ret)
++ DRM_ERROR("failed to create device file for dpm state\n");
++ /* XXX: these are noops for dpm but are here for backwards compat */
++ ret = device_create_file(rdev->dev, &dev_attr_power_profile);
++ if (ret)
++ DRM_ERROR("failed to create device file for power profile\n");
++ ret = device_create_file(rdev->dev, &dev_attr_power_method);
++ if (ret)
++ DRM_ERROR("failed to create device file for power method\n");
++ if (!ret)
++ rdev->pm.sysfs_initialized = true;
++ }
+
+ mutex_lock(&rdev->pm.mutex);
+ ret = radeon_dpm_late_enable(rdev);
+@@ -1557,7 +1565,8 @@ int radeon_pm_late_init(struct radeon_device *rdev)
+ }
+ }
+ } else {
+- if (rdev->pm.num_power_states > 1) {
++ if ((rdev->pm.num_power_states > 1) &&
++ (!rdev->pm.sysfs_initialized)) {
+ /* where's the best place to put these? */
+ ret = device_create_file(rdev->dev, &dev_attr_power_profile);
+ if (ret)
+@@ -1565,6 +1574,8 @@ int radeon_pm_late_init(struct radeon_device *rdev)
+ ret = device_create_file(rdev->dev, &dev_attr_power_method);
+ if (ret)
+ DRM_ERROR("failed to create device file for power method\n");
++ if (!ret)
++ rdev->pm.sysfs_initialized = true;
+ }
+ }
+ return ret;
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+index 620bb5cf617c..15a8d7746fd2 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+@@ -1458,6 +1458,9 @@ static void __exit vmwgfx_exit(void)
+ drm_pci_exit(&driver, &vmw_pci_driver);
+ }
+
++MODULE_INFO(vmw_patch, "ed7d78b2");
++MODULE_INFO(vmw_patch, "54c12bc3");
++
+ module_init(vmwgfx_init);
+ module_exit(vmwgfx_exit);
+
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+index d26a6daa9719..d8896ed41b9e 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+@@ -636,7 +636,8 @@ extern int vmw_user_dmabuf_alloc(struct vmw_private *dev_priv,
+ uint32_t size,
+ bool shareable,
+ uint32_t *handle,
+- struct vmw_dma_buffer **p_dma_buf);
++ struct vmw_dma_buffer **p_dma_buf,
++ struct ttm_base_object **p_base);
+ extern int vmw_user_dmabuf_reference(struct ttm_object_file *tfile,
+ struct vmw_dma_buffer *dma_buf,
+ uint32_t *handle);
+@@ -650,7 +651,8 @@ extern uint32_t vmw_dmabuf_validate_node(struct ttm_buffer_object *bo,
+ uint32_t cur_validate_node);
+ extern void vmw_dmabuf_validate_clear(struct ttm_buffer_object *bo);
+ extern int vmw_user_dmabuf_lookup(struct ttm_object_file *tfile,
+- uint32_t id, struct vmw_dma_buffer **out);
++ uint32_t id, struct vmw_dma_buffer **out,
++ struct ttm_base_object **base);
+ extern int vmw_stream_claim_ioctl(struct drm_device *dev, void *data,
+ struct drm_file *file_priv);
+ extern int vmw_stream_unref_ioctl(struct drm_device *dev, void *data,
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+index 97ad3bcb99a7..aee1c6ccc52d 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+@@ -887,7 +887,8 @@ static int vmw_translate_mob_ptr(struct vmw_private *dev_priv,
+ struct vmw_relocation *reloc;
+ int ret;
+
+- ret = vmw_user_dmabuf_lookup(sw_context->fp->tfile, handle, &vmw_bo);
++ ret = vmw_user_dmabuf_lookup(sw_context->fp->tfile, handle, &vmw_bo,
++ NULL);
+ if (unlikely(ret != 0)) {
+ DRM_ERROR("Could not find or use MOB buffer.\n");
+ ret = -EINVAL;
+@@ -949,7 +950,8 @@ static int vmw_translate_guest_ptr(struct vmw_private *dev_priv,
+ struct vmw_relocation *reloc;
+ int ret;
+
+- ret = vmw_user_dmabuf_lookup(sw_context->fp->tfile, handle, &vmw_bo);
++ ret = vmw_user_dmabuf_lookup(sw_context->fp->tfile, handle, &vmw_bo,
++ NULL);
+ if (unlikely(ret != 0)) {
+ DRM_ERROR("Could not find or use GMR region.\n");
+ ret = -EINVAL;
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
+index 87e39f68e9d0..e1898982b44a 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
+@@ -484,7 +484,7 @@ int vmw_overlay_ioctl(struct drm_device *dev, void *data,
+ goto out_unlock;
+ }
+
+- ret = vmw_user_dmabuf_lookup(tfile, arg->handle, &buf);
++ ret = vmw_user_dmabuf_lookup(tfile, arg->handle, &buf, NULL);
+ if (ret)
+ goto out_unlock;
+
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+index 210ef15b1d09..c5b4c47e86d6 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+@@ -356,7 +356,7 @@ int vmw_user_lookup_handle(struct vmw_private *dev_priv,
+ }
+
+ *out_surf = NULL;
+- ret = vmw_user_dmabuf_lookup(tfile, handle, out_buf);
++ ret = vmw_user_dmabuf_lookup(tfile, handle, out_buf, NULL);
+ return ret;
+ }
+
+@@ -483,7 +483,8 @@ int vmw_user_dmabuf_alloc(struct vmw_private *dev_priv,
+ uint32_t size,
+ bool shareable,
+ uint32_t *handle,
+- struct vmw_dma_buffer **p_dma_buf)
++ struct vmw_dma_buffer **p_dma_buf,
++ struct ttm_base_object **p_base)
+ {
+ struct vmw_user_dma_buffer *user_bo;
+ struct ttm_buffer_object *tmp;
+@@ -517,6 +518,10 @@ int vmw_user_dmabuf_alloc(struct vmw_private *dev_priv,
+ }
+
+ *p_dma_buf = &user_bo->dma;
++ if (p_base) {
++ *p_base = &user_bo->prime.base;
++ kref_get(&(*p_base)->refcount);
++ }
+ *handle = user_bo->prime.base.hash.key;
+
+ out_no_base_object:
+@@ -633,6 +638,7 @@ int vmw_user_dmabuf_synccpu_ioctl(struct drm_device *dev, void *data,
+ struct vmw_dma_buffer *dma_buf;
+ struct vmw_user_dma_buffer *user_bo;
+ struct ttm_object_file *tfile = vmw_fpriv(file_priv)->tfile;
++ struct ttm_base_object *buffer_base;
+ int ret;
+
+ if ((arg->flags & (drm_vmw_synccpu_read | drm_vmw_synccpu_write)) == 0
+@@ -645,7 +651,8 @@ int vmw_user_dmabuf_synccpu_ioctl(struct drm_device *dev, void *data,
+
+ switch (arg->op) {
+ case drm_vmw_synccpu_grab:
+- ret = vmw_user_dmabuf_lookup(tfile, arg->handle, &dma_buf);
++ ret = vmw_user_dmabuf_lookup(tfile, arg->handle, &dma_buf,
++ &buffer_base);
+ if (unlikely(ret != 0))
+ return ret;
+
+@@ -653,6 +660,7 @@ int vmw_user_dmabuf_synccpu_ioctl(struct drm_device *dev, void *data,
+ dma);
+ ret = vmw_user_dmabuf_synccpu_grab(user_bo, tfile, arg->flags);
+ vmw_dmabuf_unreference(&dma_buf);
++ ttm_base_object_unref(&buffer_base);
+ if (unlikely(ret != 0 && ret != -ERESTARTSYS &&
+ ret != -EBUSY)) {
+ DRM_ERROR("Failed synccpu grab on handle 0x%08x.\n",
+@@ -694,7 +702,8 @@ int vmw_dmabuf_alloc_ioctl(struct drm_device *dev, void *data,
+ return ret;
+
+ ret = vmw_user_dmabuf_alloc(dev_priv, vmw_fpriv(file_priv)->tfile,
+- req->size, false, &handle, &dma_buf);
++ req->size, false, &handle, &dma_buf,
++ NULL);
+ if (unlikely(ret != 0))
+ goto out_no_dmabuf;
+
+@@ -723,7 +732,8 @@ int vmw_dmabuf_unref_ioctl(struct drm_device *dev, void *data,
+ }
+
+ int vmw_user_dmabuf_lookup(struct ttm_object_file *tfile,
+- uint32_t handle, struct vmw_dma_buffer **out)
++ uint32_t handle, struct vmw_dma_buffer **out,
++ struct ttm_base_object **p_base)
+ {
+ struct vmw_user_dma_buffer *vmw_user_bo;
+ struct ttm_base_object *base;
+@@ -745,7 +755,10 @@ int vmw_user_dmabuf_lookup(struct ttm_object_file *tfile,
+ vmw_user_bo = container_of(base, struct vmw_user_dma_buffer,
+ prime.base);
+ (void)ttm_bo_reference(&vmw_user_bo->dma.base);
+- ttm_base_object_unref(&base);
++ if (p_base)
++ *p_base = base;
++ else
++ ttm_base_object_unref(&base);
+ *out = &vmw_user_bo->dma;
+
+ return 0;
+@@ -1006,7 +1019,7 @@ int vmw_dumb_create(struct drm_file *file_priv,
+
+ ret = vmw_user_dmabuf_alloc(dev_priv, vmw_fpriv(file_priv)->tfile,
+ args->size, false, &args->handle,
+- &dma_buf);
++ &dma_buf, NULL);
+ if (unlikely(ret != 0))
+ goto out_no_dmabuf;
+
+@@ -1034,7 +1047,7 @@ int vmw_dumb_map_offset(struct drm_file *file_priv,
+ struct vmw_dma_buffer *out_buf;
+ int ret;
+
+- ret = vmw_user_dmabuf_lookup(tfile, handle, &out_buf);
++ ret = vmw_user_dmabuf_lookup(tfile, handle, &out_buf, NULL);
+ if (ret != 0)
+ return -EINVAL;
+
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
+index 6a4584a43aa6..d2751ada19b1 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
+@@ -470,7 +470,7 @@ int vmw_shader_define_ioctl(struct drm_device *dev, void *data,
+
+ if (arg->buffer_handle != SVGA3D_INVALID_ID) {
+ ret = vmw_user_dmabuf_lookup(tfile, arg->buffer_handle,
+- &buffer);
++ &buffer, NULL);
+ if (unlikely(ret != 0)) {
+ DRM_ERROR("Could not find buffer for shader "
+ "creation.\n");
+diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+index 4ecdbf3e59da..17a4107639b2 100644
+--- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
++++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c
+@@ -43,6 +43,7 @@ struct vmw_user_surface {
+ struct vmw_surface srf;
+ uint32_t size;
+ struct drm_master *master;
++ struct ttm_base_object *backup_base;
+ };
+
+ /**
+@@ -652,6 +653,8 @@ static void vmw_user_surface_base_release(struct ttm_base_object **p_base)
+ struct vmw_resource *res = &user_srf->srf.res;
+
+ *p_base = NULL;
++ if (user_srf->backup_base)
++ ttm_base_object_unref(&user_srf->backup_base);
+ vmw_resource_unreference(&res);
+ }
+
+@@ -846,7 +849,8 @@ int vmw_surface_define_ioctl(struct drm_device *dev, void *data,
+ res->backup_size,
+ true,
+ &backup_handle,
+- &res->backup);
++ &res->backup,
++ &user_srf->backup_base);
+ if (unlikely(ret != 0)) {
+ vmw_resource_unreference(&res);
+ goto out_unlock;
+@@ -1309,7 +1313,8 @@ int vmw_gb_surface_define_ioctl(struct drm_device *dev, void *data,
+
+ if (req->buffer_handle != SVGA3D_INVALID_ID) {
+ ret = vmw_user_dmabuf_lookup(tfile, req->buffer_handle,
+- &res->backup);
++ &res->backup,
++ &user_srf->backup_base);
+ } else if (req->drm_surface_flags &
+ drm_vmw_surface_flag_create_buffer)
+ ret = vmw_user_dmabuf_alloc(dev_priv, tfile,
+@@ -1317,7 +1322,8 @@ int vmw_gb_surface_define_ioctl(struct drm_device *dev, void *data,
+ req->drm_surface_flags &
+ drm_vmw_surface_flag_shareable,
+ &backup_handle,
+- &res->backup);
++ &res->backup,
++ &user_srf->backup_base);
+
+ if (unlikely(ret != 0)) {
+ vmw_resource_unreference(&res);
+diff --git a/drivers/i2c/busses/i2c-mv64xxx.c b/drivers/i2c/busses/i2c-mv64xxx.c
+index 30059c1df2a3..5801227b97ab 100644
+--- a/drivers/i2c/busses/i2c-mv64xxx.c
++++ b/drivers/i2c/busses/i2c-mv64xxx.c
+@@ -669,8 +669,6 @@ mv64xxx_i2c_can_offload(struct mv64xxx_i2c_data *drv_data)
+ struct i2c_msg *msgs = drv_data->msgs;
+ int num = drv_data->num_msgs;
+
+- return false;
+-
+ if (!drv_data->offload_enabled)
+ return false;
+
+diff --git a/drivers/iio/accel/st_accel_core.c b/drivers/iio/accel/st_accel_core.c
+index 4002e6410444..c472477f9a7d 100644
+--- a/drivers/iio/accel/st_accel_core.c
++++ b/drivers/iio/accel/st_accel_core.c
+@@ -149,8 +149,6 @@
+ #define ST_ACCEL_4_BDU_MASK 0x40
+ #define ST_ACCEL_4_DRDY_IRQ_ADDR 0x21
+ #define ST_ACCEL_4_DRDY_IRQ_INT1_MASK 0x04
+-#define ST_ACCEL_4_IG1_EN_ADDR 0x21
+-#define ST_ACCEL_4_IG1_EN_MASK 0x08
+ #define ST_ACCEL_4_MULTIREAD_BIT true
+
+ /* CUSTOM VALUES FOR SENSOR 5 */
+@@ -484,10 +482,6 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = {
+ .drdy_irq = {
+ .addr = ST_ACCEL_4_DRDY_IRQ_ADDR,
+ .mask_int1 = ST_ACCEL_4_DRDY_IRQ_INT1_MASK,
+- .ig1 = {
+- .en_addr = ST_ACCEL_4_IG1_EN_ADDR,
+- .en_mask = ST_ACCEL_4_IG1_EN_MASK,
+- },
+ },
+ .multi_read_bit = ST_ACCEL_4_MULTIREAD_BIT,
+ .bootime = 2, /* guess */
+diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
+index 3a972ebf3c0d..8be73524aabd 100644
+--- a/drivers/infiniband/core/cm.c
++++ b/drivers/infiniband/core/cm.c
+@@ -873,6 +873,11 @@ retest:
+ case IB_CM_SIDR_REQ_RCVD:
+ spin_unlock_irq(&cm_id_priv->lock);
+ cm_reject_sidr_req(cm_id_priv, IB_SIDR_REJECT);
++ spin_lock_irq(&cm.lock);
++ if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node))
++ rb_erase(&cm_id_priv->sidr_id_node,
++ &cm.remote_sidr_table);
++ spin_unlock_irq(&cm.lock);
+ break;
+ case IB_CM_REQ_SENT:
+ case IB_CM_MRA_REQ_RCVD:
+@@ -3112,7 +3117,10 @@ int ib_send_cm_sidr_rep(struct ib_cm_id *cm_id,
+ spin_unlock_irqrestore(&cm_id_priv->lock, flags);
+
+ spin_lock_irqsave(&cm.lock, flags);
+- rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
++ if (!RB_EMPTY_NODE(&cm_id_priv->sidr_id_node)) {
++ rb_erase(&cm_id_priv->sidr_id_node, &cm.remote_sidr_table);
++ RB_CLEAR_NODE(&cm_id_priv->sidr_id_node);
++ }
+ spin_unlock_irqrestore(&cm.lock, flags);
+ return 0;
+
+diff --git a/drivers/input/mouse/alps.c b/drivers/input/mouse/alps.c
+index 4d246861d692..41e6cb501e6a 100644
+--- a/drivers/input/mouse/alps.c
++++ b/drivers/input/mouse/alps.c
+@@ -100,7 +100,7 @@ static const struct alps_nibble_commands alps_v6_nibble_commands[] = {
+ #define ALPS_FOUR_BUTTONS 0x40 /* 4 direction button present */
+ #define ALPS_PS2_INTERLEAVED 0x80 /* 3-byte PS/2 packet interleaved with
+ 6-byte ALPS packet */
+-#define ALPS_DELL 0x100 /* device is a Dell laptop */
++#define ALPS_STICK_BITS 0x100 /* separate stick button bits */
+ #define ALPS_BUTTONPAD 0x200 /* device is a clickpad */
+
+ static const struct alps_model_info alps_model_data[] = {
+@@ -159,6 +159,43 @@ static const struct alps_protocol_info alps_v8_protocol_data = {
+ ALPS_PROTO_V8, 0x18, 0x18, 0
+ };
+
++/*
++ * Some v2 models report the stick buttons in separate bits
++ */
++static const struct dmi_system_id alps_dmi_has_separate_stick_buttons[] = {
++#if defined(CONFIG_DMI) && defined(CONFIG_X86)
++ {
++ /* Extrapolated from other entries */
++ .matches = {
++ DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
++ DMI_MATCH(DMI_PRODUCT_NAME, "Latitude D420"),
++ },
++ },
++ {
++ /* Reported-by: Hans de Bruin <jmdebruin@xmsnet.nl> */
++ .matches = {
++ DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
++ DMI_MATCH(DMI_PRODUCT_NAME, "Latitude D430"),
++ },
++ },
++ {
++ /* Reported-by: Hans de Goede <hdegoede@redhat.com> */
++ .matches = {
++ DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
++ DMI_MATCH(DMI_PRODUCT_NAME, "Latitude D620"),
++ },
++ },
++ {
++ /* Extrapolated from other entries */
++ .matches = {
++ DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
++ DMI_MATCH(DMI_PRODUCT_NAME, "Latitude D630"),
++ },
++ },
++#endif
++ { }
++};
++
+ static void alps_set_abs_params_st(struct alps_data *priv,
+ struct input_dev *dev1);
+ static void alps_set_abs_params_semi_mt(struct alps_data *priv,
+@@ -253,9 +290,8 @@ static void alps_process_packet_v1_v2(struct psmouse *psmouse)
+ return;
+ }
+
+- /* Dell non interleaved V2 dualpoint has separate stick button bits */
+- if (priv->proto_version == ALPS_PROTO_V2 &&
+- priv->flags == (ALPS_DELL | ALPS_PASS | ALPS_DUALPOINT)) {
++ /* Some models have separate stick button bits */
++ if (priv->flags & ALPS_STICK_BITS) {
+ left |= packet[0] & 1;
+ right |= packet[0] & 2;
+ middle |= packet[0] & 4;
+@@ -2552,8 +2588,6 @@ static int alps_set_protocol(struct psmouse *psmouse,
+ priv->byte0 = protocol->byte0;
+ priv->mask0 = protocol->mask0;
+ priv->flags = protocol->flags;
+- if (dmi_name_in_vendors("Dell"))
+- priv->flags |= ALPS_DELL;
+
+ priv->x_max = 2000;
+ priv->y_max = 1400;
+@@ -2568,6 +2602,8 @@ static int alps_set_protocol(struct psmouse *psmouse,
+ priv->set_abs_params = alps_set_abs_params_st;
+ priv->x_max = 1023;
+ priv->y_max = 767;
++ if (dmi_check_system(alps_dmi_has_separate_stick_buttons))
++ priv->flags |= ALPS_STICK_BITS;
+ break;
+
+ case ALPS_PROTO_V3:
+diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
+index 658ee39e6569..1b10e5fd6ef6 100644
+--- a/drivers/iommu/amd_iommu.c
++++ b/drivers/iommu/amd_iommu.c
+@@ -1974,8 +1974,8 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
+ static void clear_dte_entry(u16 devid)
+ {
+ /* remove entry from the device table seen by the hardware */
+- amd_iommu_dev_table[devid].data[0] = IOMMU_PTE_P | IOMMU_PTE_TV;
+- amd_iommu_dev_table[devid].data[1] = 0;
++ amd_iommu_dev_table[devid].data[0] = IOMMU_PTE_P | IOMMU_PTE_TV;
++ amd_iommu_dev_table[devid].data[1] &= DTE_FLAG_MASK;
+
+ amd_iommu_apply_erratum_63(devid);
+ }
+diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
+index f65908841be0..c9b64722f623 100644
+--- a/drivers/iommu/amd_iommu_types.h
++++ b/drivers/iommu/amd_iommu_types.h
+@@ -295,6 +295,7 @@
+ #define IOMMU_PTE_IR (1ULL << 61)
+ #define IOMMU_PTE_IW (1ULL << 62)
+
++#define DTE_FLAG_MASK (0x3ffULL << 32)
+ #define DTE_FLAG_IOTLB (0x01UL << 32)
+ #define DTE_FLAG_GV (0x01ULL << 55)
+ #define DTE_GLX_SHIFT (56)
+diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
+index f7b875bb70d4..c3b8a5b9f035 100644
+--- a/drivers/iommu/amd_iommu_v2.c
++++ b/drivers/iommu/amd_iommu_v2.c
+@@ -516,6 +516,13 @@ static void do_fault(struct work_struct *work)
+ goto out;
+ }
+
++ if (!(vma->vm_flags & (VM_READ | VM_EXEC | VM_WRITE))) {
++ /* handle_mm_fault would BUG_ON() */
++ up_read(&mm->mmap_sem);
++ handle_fault_error(fault);
++ goto out;
++ }
++
+ ret = handle_mm_fault(mm, vma, address, write);
+ if (ret & VM_FAULT_ERROR) {
+ /* failed to service fault */
+diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
+index 7553cb90627f..bd1b8ad8af44 100644
+--- a/drivers/iommu/intel-iommu.c
++++ b/drivers/iommu/intel-iommu.c
+@@ -2109,15 +2109,19 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
+ return -ENOMEM;
+ /* It is large page*/
+ if (largepage_lvl > 1) {
++ unsigned long nr_superpages, end_pfn;
++
+ pteval |= DMA_PTE_LARGE_PAGE;
+ lvl_pages = lvl_to_nr_pages(largepage_lvl);
++
++ nr_superpages = sg_res / lvl_pages;
++ end_pfn = iov_pfn + nr_superpages * lvl_pages - 1;
++
+ /*
+ * Ensure that old small page tables are
+- * removed to make room for superpage,
+- * if they exist.
++ * removed to make room for superpage(s).
+ */
+- dma_pte_free_pagetable(domain, iov_pfn,
+- iov_pfn + lvl_pages - 1);
++ dma_pte_free_pagetable(domain, iov_pfn, end_pfn);
+ } else {
+ pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
+ }
+diff --git a/drivers/irqchip/irq-tegra.c b/drivers/irqchip/irq-tegra.c
+index f67bbd80433e..ab5353a96a82 100644
+--- a/drivers/irqchip/irq-tegra.c
++++ b/drivers/irqchip/irq-tegra.c
+@@ -215,6 +215,7 @@ static struct irq_chip tegra_ictlr_chip = {
+ .irq_unmask = tegra_unmask,
+ .irq_retrigger = tegra_retrigger,
+ .irq_set_wake = tegra_set_wake,
++ .irq_set_type = irq_chip_set_type_parent,
+ .flags = IRQCHIP_MASK_ON_SUSPEND,
+ #ifdef CONFIG_SMP
+ .irq_set_affinity = irq_chip_set_affinity_parent,
+diff --git a/drivers/md/dm-cache-metadata.c b/drivers/md/dm-cache-metadata.c
+index 20cc36b01b77..0a17d1b91a81 100644
+--- a/drivers/md/dm-cache-metadata.c
++++ b/drivers/md/dm-cache-metadata.c
+@@ -634,10 +634,10 @@ static int __commit_transaction(struct dm_cache_metadata *cmd,
+
+ disk_super = dm_block_data(sblock);
+
++ disk_super->flags = cpu_to_le32(cmd->flags);
+ if (mutator)
+ update_flags(disk_super, mutator);
+
+- disk_super->flags = cpu_to_le32(cmd->flags);
+ disk_super->mapping_root = cpu_to_le64(cmd->root);
+ disk_super->hint_root = cpu_to_le64(cmd->hint_root);
+ disk_super->discard_root = cpu_to_le64(cmd->discard_root);
+diff --git a/drivers/md/md.c b/drivers/md/md.c
+index e25f00f0138a..95e7b72a164a 100644
+--- a/drivers/md/md.c
++++ b/drivers/md/md.c
+@@ -8030,8 +8030,7 @@ static int remove_and_add_spares(struct mddev *mddev,
+ !test_bit(Bitmap_sync, &rdev->flags)))
+ continue;
+
+- if (rdev->saved_raid_disk < 0)
+- rdev->recovery_offset = 0;
++ rdev->recovery_offset = 0;
+ if (mddev->pers->
+ hot_add_disk(mddev, rdev) == 0) {
+ if (sysfs_link_rdev(mddev, rdev))
+diff --git a/drivers/md/persistent-data/dm-btree-remove.c b/drivers/md/persistent-data/dm-btree-remove.c
+index 4222f774cf36..1dac15d1697c 100644
+--- a/drivers/md/persistent-data/dm-btree-remove.c
++++ b/drivers/md/persistent-data/dm-btree-remove.c
+@@ -301,11 +301,16 @@ static void redistribute3(struct dm_btree_info *info, struct btree_node *parent,
+ {
+ int s;
+ uint32_t max_entries = le32_to_cpu(left->header.max_entries);
+- unsigned target = (nr_left + nr_center + nr_right) / 3;
+- BUG_ON(target > max_entries);
++ unsigned total = nr_left + nr_center + nr_right;
++ unsigned target_right = total / 3;
++ unsigned remainder = (target_right * 3) != total;
++ unsigned target_left = target_right + remainder;
++
++ BUG_ON(target_left > max_entries);
++ BUG_ON(target_right > max_entries);
+
+ if (nr_left < nr_right) {
+- s = nr_left - target;
++ s = nr_left - target_left;
+
+ if (s < 0 && nr_center < -s) {
+ /* not enough in central node */
+@@ -316,10 +321,10 @@ static void redistribute3(struct dm_btree_info *info, struct btree_node *parent,
+ } else
+ shift(left, center, s);
+
+- shift(center, right, target - nr_right);
++ shift(center, right, target_right - nr_right);
+
+ } else {
+- s = target - nr_right;
++ s = target_right - nr_right;
+ if (s > 0 && nr_center < s) {
+ /* not enough in central node */
+ shift(center, right, nr_center);
+@@ -329,7 +334,7 @@ static void redistribute3(struct dm_btree_info *info, struct btree_node *parent,
+ } else
+ shift(center, right, s);
+
+- shift(left, center, nr_left - target);
++ shift(left, center, nr_left - target_left);
+ }
+
+ *key_ptr(parent, c->index) = center->keys[0];
+diff --git a/drivers/md/persistent-data/dm-btree.c b/drivers/md/persistent-data/dm-btree.c
+index c7726cebc495..d6e47033b5e0 100644
+--- a/drivers/md/persistent-data/dm-btree.c
++++ b/drivers/md/persistent-data/dm-btree.c
+@@ -523,7 +523,7 @@ static int btree_split_beneath(struct shadow_spine *s, uint64_t key)
+
+ r = new_block(s->info, &right);
+ if (r < 0) {
+- /* FIXME: put left */
++ unlock_block(s->info, left);
+ return r;
+ }
+
+diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
+index 967a4ed73929..d10d3008227e 100644
+--- a/drivers/md/raid1.c
++++ b/drivers/md/raid1.c
+@@ -2249,7 +2249,7 @@ static int narrow_write_error(struct r1bio *r1_bio, int i)
+ bio_trim(wbio, sector - r1_bio->sector, sectors);
+ wbio->bi_iter.bi_sector += rdev->data_offset;
+ wbio->bi_bdev = rdev->bdev;
+- if (submit_bio_wait(WRITE, wbio) == 0)
++ if (submit_bio_wait(WRITE, wbio) < 0)
+ /* failure! */
+ ok = rdev_set_badblocks(rdev, sector,
+ sectors, 0)
+diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
+index 38c58e19cfce..d4b70d90de9c 100644
+--- a/drivers/md/raid10.c
++++ b/drivers/md/raid10.c
+@@ -2580,7 +2580,7 @@ static int narrow_write_error(struct r10bio *r10_bio, int i)
+ choose_data_offset(r10_bio, rdev) +
+ (sector - r10_bio->sector));
+ wbio->bi_bdev = rdev->bdev;
+- if (submit_bio_wait(WRITE, wbio) == 0)
++ if (submit_bio_wait(WRITE, wbio) < 0)
+ /* Failure! */
+ ok = rdev_set_badblocks(rdev, sector,
+ sectors, 0)
+diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
+index f757023fc458..0d4f7b1b7f73 100644
+--- a/drivers/md/raid5.c
++++ b/drivers/md/raid5.c
+@@ -3505,6 +3505,7 @@ returnbi:
+ }
+ if (!discard_pending &&
+ test_bit(R5_Discard, &sh->dev[sh->pd_idx].flags)) {
++ int hash;
+ clear_bit(R5_Discard, &sh->dev[sh->pd_idx].flags);
+ clear_bit(R5_UPTODATE, &sh->dev[sh->pd_idx].flags);
+ if (sh->qd_idx >= 0) {
+@@ -3518,16 +3519,17 @@ returnbi:
+ * no updated data, so remove it from hash list and the stripe
+ * will be reinitialized
+ */
+- spin_lock_irq(&conf->device_lock);
+ unhash:
++ hash = sh->hash_lock_index;
++ spin_lock_irq(conf->hash_locks + hash);
+ remove_hash(sh);
++ spin_unlock_irq(conf->hash_locks + hash);
+ if (head_sh->batch_head) {
+ sh = list_first_entry(&sh->batch_list,
+ struct stripe_head, batch_list);
+ if (sh != head_sh)
+ goto unhash;
+ }
+- spin_unlock_irq(&conf->device_lock);
+ sh = head_sh;
+
+ if (test_bit(STRIPE_SYNC_REQUESTED, &sh->state))
+diff --git a/drivers/media/dvb-frontends/m88ds3103.c b/drivers/media/dvb-frontends/m88ds3103.c
+index e9b2d2b69b1d..377fb6991ab3 100644
+--- a/drivers/media/dvb-frontends/m88ds3103.c
++++ b/drivers/media/dvb-frontends/m88ds3103.c
+@@ -18,6 +18,27 @@
+
+ static struct dvb_frontend_ops m88ds3103_ops;
+
++/* write single register with mask */
++static int m88ds3103_update_bits(struct m88ds3103_dev *dev,
++ u8 reg, u8 mask, u8 val)
++{
++ int ret;
++ u8 tmp;
++
++ /* no need for read if whole reg is written */
++ if (mask != 0xff) {
++ ret = regmap_bulk_read(dev->regmap, reg, &tmp, 1);
++ if (ret)
++ return ret;
++
++ val &= mask;
++ tmp &= ~mask;
++ val |= tmp;
++ }
++
++ return regmap_bulk_write(dev->regmap, reg, &val, 1);
++}
++
+ /* write reg val table using reg addr auto increment */
+ static int m88ds3103_wr_reg_val_tab(struct m88ds3103_dev *dev,
+ const struct m88ds3103_reg_val *tab, int tab_len)
+@@ -394,10 +415,10 @@ static int m88ds3103_set_frontend(struct dvb_frontend *fe)
+ u8tmp2 = 0x00; /* 0b00 */
+ break;
+ }
+- ret = regmap_update_bits(dev->regmap, 0x22, 0xc0, u8tmp1 << 6);
++ ret = m88ds3103_update_bits(dev, 0x22, 0xc0, u8tmp1 << 6);
+ if (ret)
+ goto err;
+- ret = regmap_update_bits(dev->regmap, 0x24, 0xc0, u8tmp2 << 6);
++ ret = m88ds3103_update_bits(dev, 0x24, 0xc0, u8tmp2 << 6);
+ if (ret)
+ goto err;
+ }
+@@ -455,13 +476,13 @@ static int m88ds3103_set_frontend(struct dvb_frontend *fe)
+ if (ret)
+ goto err;
+ }
+- ret = regmap_update_bits(dev->regmap, 0x9d, 0x08, 0x08);
++ ret = m88ds3103_update_bits(dev, 0x9d, 0x08, 0x08);
+ if (ret)
+ goto err;
+ ret = regmap_write(dev->regmap, 0xf1, 0x01);
+ if (ret)
+ goto err;
+- ret = regmap_update_bits(dev->regmap, 0x30, 0x80, 0x80);
++ ret = m88ds3103_update_bits(dev, 0x30, 0x80, 0x80);
+ if (ret)
+ goto err;
+ }
+@@ -498,7 +519,7 @@ static int m88ds3103_set_frontend(struct dvb_frontend *fe)
+ switch (dev->cfg->ts_mode) {
+ case M88DS3103_TS_SERIAL:
+ case M88DS3103_TS_SERIAL_D7:
+- ret = regmap_update_bits(dev->regmap, 0x29, 0x20, u8tmp1);
++ ret = m88ds3103_update_bits(dev, 0x29, 0x20, u8tmp1);
+ if (ret)
+ goto err;
+ u8tmp1 = 0;
+@@ -567,11 +588,11 @@ static int m88ds3103_set_frontend(struct dvb_frontend *fe)
+ if (ret)
+ goto err;
+
+- ret = regmap_update_bits(dev->regmap, 0x4d, 0x02, dev->cfg->spec_inv << 1);
++ ret = m88ds3103_update_bits(dev, 0x4d, 0x02, dev->cfg->spec_inv << 1);
+ if (ret)
+ goto err;
+
+- ret = regmap_update_bits(dev->regmap, 0x30, 0x10, dev->cfg->agc_inv << 4);
++ ret = m88ds3103_update_bits(dev, 0x30, 0x10, dev->cfg->agc_inv << 4);
+ if (ret)
+ goto err;
+
+@@ -625,13 +646,13 @@ static int m88ds3103_init(struct dvb_frontend *fe)
+ dev->warm = false;
+
+ /* wake up device from sleep */
+- ret = regmap_update_bits(dev->regmap, 0x08, 0x01, 0x01);
++ ret = m88ds3103_update_bits(dev, 0x08, 0x01, 0x01);
+ if (ret)
+ goto err;
+- ret = regmap_update_bits(dev->regmap, 0x04, 0x01, 0x00);
++ ret = m88ds3103_update_bits(dev, 0x04, 0x01, 0x00);
+ if (ret)
+ goto err;
+- ret = regmap_update_bits(dev->regmap, 0x23, 0x10, 0x00);
++ ret = m88ds3103_update_bits(dev, 0x23, 0x10, 0x00);
+ if (ret)
+ goto err;
+
+@@ -749,18 +770,18 @@ static int m88ds3103_sleep(struct dvb_frontend *fe)
+ utmp = 0x29;
+ else
+ utmp = 0x27;
+- ret = regmap_update_bits(dev->regmap, utmp, 0x01, 0x00);
++ ret = m88ds3103_update_bits(dev, utmp, 0x01, 0x00);
+ if (ret)
+ goto err;
+
+ /* sleep */
+- ret = regmap_update_bits(dev->regmap, 0x08, 0x01, 0x00);
++ ret = m88ds3103_update_bits(dev, 0x08, 0x01, 0x00);
+ if (ret)
+ goto err;
+- ret = regmap_update_bits(dev->regmap, 0x04, 0x01, 0x01);
++ ret = m88ds3103_update_bits(dev, 0x04, 0x01, 0x01);
+ if (ret)
+ goto err;
+- ret = regmap_update_bits(dev->regmap, 0x23, 0x10, 0x10);
++ ret = m88ds3103_update_bits(dev, 0x23, 0x10, 0x10);
+ if (ret)
+ goto err;
+
+@@ -992,12 +1013,12 @@ static int m88ds3103_set_tone(struct dvb_frontend *fe,
+ }
+
+ utmp = tone << 7 | dev->cfg->envelope_mode << 5;
+- ret = regmap_update_bits(dev->regmap, 0xa2, 0xe0, utmp);
++ ret = m88ds3103_update_bits(dev, 0xa2, 0xe0, utmp);
+ if (ret)
+ goto err;
+
+ utmp = 1 << 2;
+- ret = regmap_update_bits(dev->regmap, 0xa1, reg_a1_mask, utmp);
++ ret = m88ds3103_update_bits(dev, 0xa1, reg_a1_mask, utmp);
+ if (ret)
+ goto err;
+
+@@ -1047,7 +1068,7 @@ static int m88ds3103_set_voltage(struct dvb_frontend *fe,
+ voltage_dis ^= dev->cfg->lnb_en_pol;
+
+ utmp = voltage_dis << 1 | voltage_sel << 0;
+- ret = regmap_update_bits(dev->regmap, 0xa2, 0x03, utmp);
++ ret = m88ds3103_update_bits(dev, 0xa2, 0x03, utmp);
+ if (ret)
+ goto err;
+
+@@ -1080,7 +1101,7 @@ static int m88ds3103_diseqc_send_master_cmd(struct dvb_frontend *fe,
+ }
+
+ utmp = dev->cfg->envelope_mode << 5;
+- ret = regmap_update_bits(dev->regmap, 0xa2, 0xe0, utmp);
++ ret = m88ds3103_update_bits(dev, 0xa2, 0xe0, utmp);
+ if (ret)
+ goto err;
+
+@@ -1115,12 +1136,12 @@ static int m88ds3103_diseqc_send_master_cmd(struct dvb_frontend *fe,
+ } else {
+ dev_dbg(&client->dev, "diseqc tx timeout\n");
+
+- ret = regmap_update_bits(dev->regmap, 0xa1, 0xc0, 0x40);
++ ret = m88ds3103_update_bits(dev, 0xa1, 0xc0, 0x40);
+ if (ret)
+ goto err;
+ }
+
+- ret = regmap_update_bits(dev->regmap, 0xa2, 0xc0, 0x80);
++ ret = m88ds3103_update_bits(dev, 0xa2, 0xc0, 0x80);
+ if (ret)
+ goto err;
+
+@@ -1152,7 +1173,7 @@ static int m88ds3103_diseqc_send_burst(struct dvb_frontend *fe,
+ }
+
+ utmp = dev->cfg->envelope_mode << 5;
+- ret = regmap_update_bits(dev->regmap, 0xa2, 0xe0, utmp);
++ ret = m88ds3103_update_bits(dev, 0xa2, 0xe0, utmp);
+ if (ret)
+ goto err;
+
+@@ -1194,12 +1215,12 @@ static int m88ds3103_diseqc_send_burst(struct dvb_frontend *fe,
+ } else {
+ dev_dbg(&client->dev, "diseqc tx timeout\n");
+
+- ret = regmap_update_bits(dev->regmap, 0xa1, 0xc0, 0x40);
++ ret = m88ds3103_update_bits(dev, 0xa1, 0xc0, 0x40);
+ if (ret)
+ goto err;
+ }
+
+- ret = regmap_update_bits(dev->regmap, 0xa2, 0xc0, 0x80);
++ ret = m88ds3103_update_bits(dev, 0xa2, 0xc0, 0x80);
+ if (ret)
+ goto err;
+
+@@ -1435,13 +1456,13 @@ static int m88ds3103_probe(struct i2c_client *client,
+ goto err_kfree;
+
+ /* sleep */
+- ret = regmap_update_bits(dev->regmap, 0x08, 0x01, 0x00);
++ ret = m88ds3103_update_bits(dev, 0x08, 0x01, 0x00);
+ if (ret)
+ goto err_kfree;
+- ret = regmap_update_bits(dev->regmap, 0x04, 0x01, 0x01);
++ ret = m88ds3103_update_bits(dev, 0x04, 0x01, 0x01);
+ if (ret)
+ goto err_kfree;
+- ret = regmap_update_bits(dev->regmap, 0x23, 0x10, 0x10);
++ ret = m88ds3103_update_bits(dev, 0x23, 0x10, 0x10);
+ if (ret)
+ goto err_kfree;
+
+diff --git a/drivers/media/dvb-frontends/si2168.c b/drivers/media/dvb-frontends/si2168.c
+index 25e238c370e5..cb6a49b8c1ce 100644
+--- a/drivers/media/dvb-frontends/si2168.c
++++ b/drivers/media/dvb-frontends/si2168.c
+@@ -502,6 +502,10 @@ static int si2168_init(struct dvb_frontend *fe)
+ /* firmware is in the new format */
+ for (remaining = fw->size; remaining > 0; remaining -= 17) {
+ len = fw->data[fw->size - remaining];
++ if (len > SI2168_ARGLEN) {
++ ret = -EINVAL;
++ break;
++ }
+ memcpy(cmd.args, &fw->data[(fw->size - remaining) + 1], len);
+ cmd.wlen = len;
+ cmd.rlen = 1;
+diff --git a/drivers/media/tuners/si2157.c b/drivers/media/tuners/si2157.c
+index a6245ef379c4..416c865eb595 100644
+--- a/drivers/media/tuners/si2157.c
++++ b/drivers/media/tuners/si2157.c
+@@ -166,6 +166,10 @@ static int si2157_init(struct dvb_frontend *fe)
+
+ for (remaining = fw->size; remaining > 0; remaining -= 17) {
+ len = fw->data[fw->size - remaining];
++ if (len > SI2157_ARGLEN) {
++ dev_err(&client->dev, "Bad firmware length\n");
++ goto err_release_firmware;
++ }
+ memcpy(cmd.args, &fw->data[(fw->size - remaining) + 1], len);
+ cmd.wlen = len;
+ cmd.rlen = 1;
+diff --git a/drivers/media/usb/dvb-usb-v2/rtl28xxu.c b/drivers/media/usb/dvb-usb-v2/rtl28xxu.c
+index c3cac4c12fb3..197a4f2e54d2 100644
+--- a/drivers/media/usb/dvb-usb-v2/rtl28xxu.c
++++ b/drivers/media/usb/dvb-usb-v2/rtl28xxu.c
+@@ -34,6 +34,14 @@ static int rtl28xxu_ctrl_msg(struct dvb_usb_device *d, struct rtl28xxu_req *req)
+ unsigned int pipe;
+ u8 requesttype;
+
++ mutex_lock(&d->usb_mutex);
++
++ if (req->size > sizeof(dev->buf)) {
++ dev_err(&d->intf->dev, "too large message %u\n", req->size);
++ ret = -EINVAL;
++ goto err_mutex_unlock;
++ }
++
+ if (req->index & CMD_WR_FLAG) {
+ /* write */
+ memcpy(dev->buf, req->data, req->size);
+@@ -50,14 +58,17 @@ static int rtl28xxu_ctrl_msg(struct dvb_usb_device *d, struct rtl28xxu_req *req)
+ dvb_usb_dbg_usb_control_msg(d->udev, 0, requesttype, req->value,
+ req->index, dev->buf, req->size);
+ if (ret < 0)
+- goto err;
++ goto err_mutex_unlock;
+
+ /* read request, copy returned data to return buf */
+ if (requesttype == (USB_TYPE_VENDOR | USB_DIR_IN))
+ memcpy(req->data, dev->buf, req->size);
+
++ mutex_unlock(&d->usb_mutex);
++
+ return 0;
+-err:
++err_mutex_unlock:
++ mutex_unlock(&d->usb_mutex);
+ dev_dbg(&d->intf->dev, "failed=%d\n", ret);
+ return ret;
+ }
+diff --git a/drivers/media/usb/dvb-usb-v2/rtl28xxu.h b/drivers/media/usb/dvb-usb-v2/rtl28xxu.h
+index 9f6115a2ee01..138062960a73 100644
+--- a/drivers/media/usb/dvb-usb-v2/rtl28xxu.h
++++ b/drivers/media/usb/dvb-usb-v2/rtl28xxu.h
+@@ -71,7 +71,7 @@
+
+
+ struct rtl28xxu_dev {
+- u8 buf[28];
++ u8 buf[128];
+ u8 chip_id;
+ u8 tuner;
+ char *tuner_name;
+diff --git a/drivers/mmc/card/mmc_test.c b/drivers/mmc/card/mmc_test.c
+index b78cf5d403a3..7fc9174d4619 100644
+--- a/drivers/mmc/card/mmc_test.c
++++ b/drivers/mmc/card/mmc_test.c
+@@ -2263,15 +2263,12 @@ static int mmc_test_profile_sglen_r_nonblock_perf(struct mmc_test_card *test)
+ /*
+ * eMMC hardware reset.
+ */
+-static int mmc_test_hw_reset(struct mmc_test_card *test)
++static int mmc_test_reset(struct mmc_test_card *test)
+ {
+ struct mmc_card *card = test->card;
+ struct mmc_host *host = card->host;
+ int err;
+
+- if (!mmc_card_mmc(card) || !mmc_can_reset(card))
+- return RESULT_UNSUP_CARD;
+-
+ err = mmc_hw_reset(host);
+ if (!err)
+ return RESULT_OK;
+@@ -2605,8 +2602,8 @@ static const struct mmc_test_case mmc_test_cases[] = {
+ },
+
+ {
+- .name = "eMMC hardware reset",
+- .run = mmc_test_hw_reset,
++ .name = "Reset test",
++ .run = mmc_test_reset,
+ },
+ };
+
+diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
+index e726903170a8..f6cd995dbe92 100644
+--- a/drivers/mmc/core/mmc.c
++++ b/drivers/mmc/core/mmc.c
+@@ -1924,7 +1924,6 @@ EXPORT_SYMBOL(mmc_can_reset);
+ static int mmc_reset(struct mmc_host *host)
+ {
+ struct mmc_card *card = host->card;
+- u32 status;
+
+ if (!(host->caps & MMC_CAP_HW_RESET) || !host->ops->hw_reset)
+ return -EOPNOTSUPP;
+@@ -1937,12 +1936,6 @@ static int mmc_reset(struct mmc_host *host)
+
+ host->ops->hw_reset(host);
+
+- /* If the reset has happened, then a status command will fail */
+- if (!mmc_send_status(card, &status)) {
+- mmc_host_clk_release(host);
+- return -ENOSYS;
+- }
+-
+ /* Set initial state and call mmc_set_ios */
+ mmc_set_initial_state(host);
+ mmc_host_clk_release(host);
+diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
+index eff0e5325e6a..bfddc9efd6cc 100644
+--- a/drivers/net/wireless/ath/ath9k/init.c
++++ b/drivers/net/wireless/ath/ath9k/init.c
+@@ -874,6 +874,7 @@ static void ath9k_set_hw_capab(struct ath_softc *sc, struct ieee80211_hw *hw)
+ hw->max_rate_tries = 10;
+ hw->sta_data_size = sizeof(struct ath_node);
+ hw->vif_data_size = sizeof(struct ath_vif);
++ hw->extra_tx_headroom = 4;
+
+ hw->wiphy->available_antennas_rx = BIT(ah->caps.max_rxchains) - 1;
+ hw->wiphy->available_antennas_tx = BIT(ah->caps.max_txchains) - 1;
+diff --git a/drivers/net/wireless/iwlwifi/dvm/lib.c b/drivers/net/wireless/iwlwifi/dvm/lib.c
+index 1d2223df5cb0..e7d3566c714b 100644
+--- a/drivers/net/wireless/iwlwifi/dvm/lib.c
++++ b/drivers/net/wireless/iwlwifi/dvm/lib.c
+@@ -1022,7 +1022,7 @@ static void iwlagn_wowlan_program_keys(struct ieee80211_hw *hw,
+ u8 *pn = seq.ccmp.pn;
+
+ ieee80211_get_key_rx_seq(key, i, &seq);
+- aes_sc->pn = cpu_to_le64(
++ aes_sc[i].pn = cpu_to_le64(
+ (u64)pn[5] |
+ ((u64)pn[4] << 8) |
+ ((u64)pn[3] << 16) |
+diff --git a/drivers/net/wireless/iwlwifi/iwl-7000.c b/drivers/net/wireless/iwlwifi/iwl-7000.c
+index cc35f796d406..d7acbd147bd1 100644
+--- a/drivers/net/wireless/iwlwifi/iwl-7000.c
++++ b/drivers/net/wireless/iwlwifi/iwl-7000.c
+@@ -348,6 +348,6 @@ const struct iwl_cfg iwl7265d_n_cfg = {
+ };
+
+ MODULE_FIRMWARE(IWL7260_MODULE_FIRMWARE(IWL7260_UCODE_API_OK));
+-MODULE_FIRMWARE(IWL3160_MODULE_FIRMWARE(IWL3160_UCODE_API_OK));
++MODULE_FIRMWARE(IWL3160_MODULE_FIRMWARE(IWL7260_UCODE_API_OK));
+ MODULE_FIRMWARE(IWL7265_MODULE_FIRMWARE(IWL7260_UCODE_API_OK));
+ MODULE_FIRMWARE(IWL7265D_MODULE_FIRMWARE(IWL7260_UCODE_API_OK));
+diff --git a/drivers/net/wireless/iwlwifi/mvm/d3.c b/drivers/net/wireless/iwlwifi/mvm/d3.c
+index 4165d104e4c3..f60b89baab7a 100644
+--- a/drivers/net/wireless/iwlwifi/mvm/d3.c
++++ b/drivers/net/wireless/iwlwifi/mvm/d3.c
+@@ -274,18 +274,13 @@ static void iwl_mvm_wowlan_program_keys(struct ieee80211_hw *hw,
+ break;
+ case WLAN_CIPHER_SUITE_CCMP:
+ if (sta) {
+- u8 *pn = seq.ccmp.pn;
++ u64 pn64;
+
+ aes_sc = data->rsc_tsc->all_tsc_rsc.aes.unicast_rsc;
+ aes_tx_sc = &data->rsc_tsc->all_tsc_rsc.aes.tsc;
+
+- ieee80211_get_key_tx_seq(key, &seq);
+- aes_tx_sc->pn = cpu_to_le64((u64)pn[5] |
+- ((u64)pn[4] << 8) |
+- ((u64)pn[3] << 16) |
+- ((u64)pn[2] << 24) |
+- ((u64)pn[1] << 32) |
+- ((u64)pn[0] << 40));
++ pn64 = atomic64_read(&key->tx_pn);
++ aes_tx_sc->pn = cpu_to_le64(pn64);
+ } else {
+ aes_sc = data->rsc_tsc->all_tsc_rsc.aes.multicast_rsc;
+ }
+@@ -298,12 +293,12 @@ static void iwl_mvm_wowlan_program_keys(struct ieee80211_hw *hw,
+ u8 *pn = seq.ccmp.pn;
+
+ ieee80211_get_key_rx_seq(key, i, &seq);
+- aes_sc->pn = cpu_to_le64((u64)pn[5] |
+- ((u64)pn[4] << 8) |
+- ((u64)pn[3] << 16) |
+- ((u64)pn[2] << 24) |
+- ((u64)pn[1] << 32) |
+- ((u64)pn[0] << 40));
++ aes_sc[i].pn = cpu_to_le64((u64)pn[5] |
++ ((u64)pn[4] << 8) |
++ ((u64)pn[3] << 16) |
++ ((u64)pn[2] << 24) |
++ ((u64)pn[1] << 32) |
++ ((u64)pn[0] << 40));
+ }
+ data->use_rsc_tsc = true;
+ break;
+@@ -1446,15 +1441,15 @@ static void iwl_mvm_d3_update_gtks(struct ieee80211_hw *hw,
+
+ switch (key->cipher) {
+ case WLAN_CIPHER_SUITE_CCMP:
+- iwl_mvm_aes_sc_to_seq(&sc->aes.tsc, &seq);
+ iwl_mvm_set_aes_rx_seq(sc->aes.unicast_rsc, key);
++ atomic64_set(&key->tx_pn, le64_to_cpu(sc->aes.tsc.pn));
+ break;
+ case WLAN_CIPHER_SUITE_TKIP:
+ iwl_mvm_tkip_sc_to_seq(&sc->tkip.tsc, &seq);
+ iwl_mvm_set_tkip_rx_seq(sc->tkip.unicast_rsc, key);
++ ieee80211_set_key_tx_seq(key, &seq);
+ break;
+ }
+- ieee80211_set_key_tx_seq(key, &seq);
+
+ /* that's it for this key */
+ return;
+diff --git a/drivers/net/wireless/iwlwifi/mvm/fw.c b/drivers/net/wireless/iwlwifi/mvm/fw.c
+index eb10c5ee4a14..b49367e1cfd2 100644
+--- a/drivers/net/wireless/iwlwifi/mvm/fw.c
++++ b/drivers/net/wireless/iwlwifi/mvm/fw.c
+@@ -364,7 +364,7 @@ int iwl_run_init_mvm_ucode(struct iwl_mvm *mvm, bool read_nvm)
+ * abort after reading the nvm in case RF Kill is on, we will complete
+ * the init seq later when RF kill will switch to off
+ */
+- if (iwl_mvm_is_radio_killed(mvm)) {
++ if (iwl_mvm_is_radio_hw_killed(mvm)) {
+ IWL_DEBUG_RF_KILL(mvm,
+ "jump over all phy activities due to RF kill\n");
+ iwl_remove_notification(&mvm->notif_wait, &calib_wait);
+@@ -397,7 +397,7 @@ int iwl_run_init_mvm_ucode(struct iwl_mvm *mvm, bool read_nvm)
+ ret = iwl_wait_notification(&mvm->notif_wait, &calib_wait,
+ MVM_UCODE_CALIB_TIMEOUT);
+
+- if (ret && iwl_mvm_is_radio_killed(mvm)) {
++ if (ret && iwl_mvm_is_radio_hw_killed(mvm)) {
+ IWL_DEBUG_RF_KILL(mvm, "RFKILL while calibrating.\n");
+ ret = 1;
+ }
+diff --git a/drivers/net/wireless/iwlwifi/mvm/mac80211.c b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
+index dfdab38e2d4a..f82019c0c4c0 100644
+--- a/drivers/net/wireless/iwlwifi/mvm/mac80211.c
++++ b/drivers/net/wireless/iwlwifi/mvm/mac80211.c
+@@ -2373,6 +2373,7 @@ static void iwl_mvm_stop_ap_ibss(struct ieee80211_hw *hw,
+ iwl_mvm_remove_time_event(mvm, mvmvif,
+ &mvmvif->time_event_data);
+ RCU_INIT_POINTER(mvm->csa_vif, NULL);
++ mvmvif->csa_countdown = false;
+ }
+
+ if (rcu_access_pointer(mvm->csa_tx_blocked_vif) == vif) {
+diff --git a/drivers/net/wireless/iwlwifi/mvm/mvm.h b/drivers/net/wireless/iwlwifi/mvm/mvm.h
+index 2d4bad5fe825..4a6f1627ae43 100644
+--- a/drivers/net/wireless/iwlwifi/mvm/mvm.h
++++ b/drivers/net/wireless/iwlwifi/mvm/mvm.h
+@@ -848,6 +848,11 @@ static inline bool iwl_mvm_is_radio_killed(struct iwl_mvm *mvm)
+ test_bit(IWL_MVM_STATUS_HW_CTKILL, &mvm->status);
+ }
+
++static inline bool iwl_mvm_is_radio_hw_killed(struct iwl_mvm *mvm)
++{
++ return test_bit(IWL_MVM_STATUS_HW_RFKILL, &mvm->status);
++}
++
+ /* Must be called with rcu_read_lock() held and it can only be
+ * released when mvmsta is not needed anymore.
+ */
+diff --git a/drivers/net/wireless/iwlwifi/mvm/ops.c b/drivers/net/wireless/iwlwifi/mvm/ops.c
+index e4fa50075ffd..61c2b0ad5db7 100644
+--- a/drivers/net/wireless/iwlwifi/mvm/ops.c
++++ b/drivers/net/wireless/iwlwifi/mvm/ops.c
+@@ -582,6 +582,7 @@ iwl_op_mode_mvm_start(struct iwl_trans *trans, const struct iwl_cfg *cfg,
+ ieee80211_unregister_hw(mvm->hw);
+ iwl_mvm_leds_exit(mvm);
+ out_free:
++ flush_delayed_work(&mvm->fw_dump_wk);
+ iwl_phy_db_free(mvm->phy_db);
+ kfree(mvm->scan_cmd);
+ if (!cfg->no_power_up_nic_in_init || !mvm->nvm_file_name)
+diff --git a/drivers/net/wireless/iwlwifi/pcie/drv.c b/drivers/net/wireless/iwlwifi/pcie/drv.c
+index 9f65c1cff1b1..865d578dee82 100644
+--- a/drivers/net/wireless/iwlwifi/pcie/drv.c
++++ b/drivers/net/wireless/iwlwifi/pcie/drv.c
+@@ -414,6 +414,11 @@ static const struct pci_device_id iwl_hw_card_ids[] = {
+ {IWL_PCI_DEVICE(0x095A, 0x5590, iwl7265_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x095B, 0x5290, iwl7265_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x095A, 0x5490, iwl7265_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x095A, 0x5F10, iwl7265_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x095B, 0x5212, iwl7265_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x095B, 0x520A, iwl7265_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x095A, 0x9000, iwl7265_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x095A, 0x9400, iwl7265_2ac_cfg)},
+
+ /* 8000 Series */
+ {IWL_PCI_DEVICE(0x24F3, 0x0010, iwl8260_2ac_cfg)},
+diff --git a/drivers/net/wireless/rtlwifi/pci.h b/drivers/net/wireless/rtlwifi/pci.h
+index d4567d12e07e..5da6703942d9 100644
+--- a/drivers/net/wireless/rtlwifi/pci.h
++++ b/drivers/net/wireless/rtlwifi/pci.h
+@@ -247,6 +247,8 @@ struct rtl_pci {
+ /* MSI support */
+ bool msi_support;
+ bool using_msi;
++ /* interrupt clear before set */
++ bool int_clear;
+ };
+
+ struct mp_adapter {
+diff --git a/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c b/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c
+index b7f18e2155eb..6e9418ed90c2 100644
+--- a/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c
++++ b/drivers/net/wireless/rtlwifi/rtl8821ae/hw.c
+@@ -2253,11 +2253,28 @@ void rtl8821ae_set_qos(struct ieee80211_hw *hw, int aci)
+ }
+ }
+
++static void rtl8821ae_clear_interrupt(struct ieee80211_hw *hw)
++{
++ struct rtl_priv *rtlpriv = rtl_priv(hw);
++ u32 tmp = rtl_read_dword(rtlpriv, REG_HISR);
++
++ rtl_write_dword(rtlpriv, REG_HISR, tmp);
++
++ tmp = rtl_read_dword(rtlpriv, REG_HISRE);
++ rtl_write_dword(rtlpriv, REG_HISRE, tmp);
++
++ tmp = rtl_read_dword(rtlpriv, REG_HSISR);
++ rtl_write_dword(rtlpriv, REG_HSISR, tmp);
++}
++
+ void rtl8821ae_enable_interrupt(struct ieee80211_hw *hw)
+ {
+ struct rtl_priv *rtlpriv = rtl_priv(hw);
+ struct rtl_pci *rtlpci = rtl_pcidev(rtl_pcipriv(hw));
+
++ if (!rtlpci->int_clear)
++ rtl8821ae_clear_interrupt(hw);/*clear it here first*/
++
+ rtl_write_dword(rtlpriv, REG_HIMR, rtlpci->irq_mask[0] & 0xFFFFFFFF);
+ rtl_write_dword(rtlpriv, REG_HIMRE, rtlpci->irq_mask[1] & 0xFFFFFFFF);
+ rtlpci->irq_enabled = true;
+diff --git a/drivers/net/wireless/rtlwifi/rtl8821ae/sw.c b/drivers/net/wireless/rtlwifi/rtl8821ae/sw.c
+index a4988121e1ab..8ee141a55bc5 100644
+--- a/drivers/net/wireless/rtlwifi/rtl8821ae/sw.c
++++ b/drivers/net/wireless/rtlwifi/rtl8821ae/sw.c
+@@ -96,6 +96,7 @@ int rtl8821ae_init_sw_vars(struct ieee80211_hw *hw)
+
+ rtl8821ae_bt_reg_init(hw);
+ rtlpci->msi_support = rtlpriv->cfg->mod_params->msi_support;
++ rtlpci->int_clear = rtlpriv->cfg->mod_params->int_clear;
+ rtlpriv->btcoexist.btc_ops = rtl_btc_get_ops_pointer();
+
+ rtlpriv->dm.dm_initialgain_enable = 1;
+@@ -167,6 +168,7 @@ int rtl8821ae_init_sw_vars(struct ieee80211_hw *hw)
+ rtlpriv->psc.swctrl_lps = rtlpriv->cfg->mod_params->swctrl_lps;
+ rtlpriv->psc.fwctrl_lps = rtlpriv->cfg->mod_params->fwctrl_lps;
+ rtlpci->msi_support = rtlpriv->cfg->mod_params->msi_support;
++ rtlpci->int_clear = rtlpriv->cfg->mod_params->int_clear;
+ if (rtlpriv->cfg->mod_params->disable_watchdog)
+ pr_info("watchdog disabled\n");
+ rtlpriv->psc.reg_fwctrl_lps = 3;
+@@ -308,6 +310,7 @@ static struct rtl_mod_params rtl8821ae_mod_params = {
+ .swctrl_lps = false,
+ .fwctrl_lps = true,
+ .msi_support = true,
++ .int_clear = true,
+ .debug = DBG_EMERG,
+ .disable_watchdog = 0,
+ };
+@@ -437,6 +440,7 @@ module_param_named(fwlps, rtl8821ae_mod_params.fwctrl_lps, bool, 0444);
+ module_param_named(msi, rtl8821ae_mod_params.msi_support, bool, 0444);
+ module_param_named(disable_watchdog, rtl8821ae_mod_params.disable_watchdog,
+ bool, 0444);
++module_param_named(int_clear, rtl8821ae_mod_params.int_clear, bool, 0444);
+ MODULE_PARM_DESC(swenc, "Set to 1 for software crypto (default 0)\n");
+ MODULE_PARM_DESC(ips, "Set to 0 to not use link power save (default 1)\n");
+ MODULE_PARM_DESC(swlps, "Set to 1 to use SW control power save (default 0)\n");
+@@ -444,6 +448,7 @@ MODULE_PARM_DESC(fwlps, "Set to 1 to use FW control power save (default 1)\n");
+ MODULE_PARM_DESC(msi, "Set to 1 to use MSI interrupts mode (default 1)\n");
+ MODULE_PARM_DESC(debug, "Set debug level (0-5) (default 0)");
+ MODULE_PARM_DESC(disable_watchdog, "Set to 1 to disable the watchdog (default 0)\n");
++MODULE_PARM_DESC(int_clear, "Set to 1 to disable interrupt clear before set (default 0)\n");
+
+ static SIMPLE_DEV_PM_OPS(rtlwifi_pm_ops, rtl_pci_suspend, rtl_pci_resume);
+
+diff --git a/drivers/net/wireless/rtlwifi/wifi.h b/drivers/net/wireless/rtlwifi/wifi.h
+index 2b770b5e2620..0a3570aa6651 100644
+--- a/drivers/net/wireless/rtlwifi/wifi.h
++++ b/drivers/net/wireless/rtlwifi/wifi.h
+@@ -2234,6 +2234,9 @@ struct rtl_mod_params {
+
+ /* default 0: 1 means disable */
+ bool disable_watchdog;
++
++ /* default 0: 1 means do not disable interrupts */
++ bool int_clear;
+ };
+
+ struct rtl_hal_usbint_cfg {
+diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
+index 312f23a8429c..92618686604c 100644
+--- a/drivers/pci/pci-sysfs.c
++++ b/drivers/pci/pci-sysfs.c
+@@ -216,7 +216,7 @@ static ssize_t numa_node_store(struct device *dev,
+ if (ret)
+ return ret;
+
+- if (!node_online(node))
++ if (node >= MAX_NUMNODES || !node_online(node))
+ return -EINVAL;
+
+ add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
+diff --git a/drivers/pinctrl/intel/pinctrl-baytrail.c b/drivers/pinctrl/intel/pinctrl-baytrail.c
+index 2062c224e32f..b2602210784d 100644
+--- a/drivers/pinctrl/intel/pinctrl-baytrail.c
++++ b/drivers/pinctrl/intel/pinctrl-baytrail.c
+@@ -146,7 +146,7 @@ struct byt_gpio_pin_context {
+ struct byt_gpio {
+ struct gpio_chip chip;
+ struct platform_device *pdev;
+- spinlock_t lock;
++ raw_spinlock_t lock;
+ void __iomem *reg_base;
+ struct pinctrl_gpio_range *range;
+ struct byt_gpio_pin_context *saved_context;
+@@ -174,11 +174,11 @@ static void byt_gpio_clear_triggering(struct byt_gpio *vg, unsigned offset)
+ unsigned long flags;
+ u32 value;
+
+- spin_lock_irqsave(&vg->lock, flags);
++ raw_spin_lock_irqsave(&vg->lock, flags);
+ value = readl(reg);
+ value &= ~(BYT_TRIG_POS | BYT_TRIG_NEG | BYT_TRIG_LVL);
+ writel(value, reg);
+- spin_unlock_irqrestore(&vg->lock, flags);
++ raw_spin_unlock_irqrestore(&vg->lock, flags);
+ }
+
+ static u32 byt_get_gpio_mux(struct byt_gpio *vg, unsigned offset)
+@@ -201,6 +201,9 @@ static int byt_gpio_request(struct gpio_chip *chip, unsigned offset)
+ struct byt_gpio *vg = to_byt_gpio(chip);
+ void __iomem *reg = byt_gpio_reg(chip, offset, BYT_CONF0_REG);
+ u32 value, gpio_mux;
++ unsigned long flags;
++
++ raw_spin_lock_irqsave(&vg->lock, flags);
+
+ /*
+ * In most cases, func pin mux 000 means GPIO function.
+@@ -214,18 +217,16 @@ static int byt_gpio_request(struct gpio_chip *chip, unsigned offset)
+ value = readl(reg) & BYT_PIN_MUX;
+ gpio_mux = byt_get_gpio_mux(vg, offset);
+ if (WARN_ON(gpio_mux != value)) {
+- unsigned long flags;
+-
+- spin_lock_irqsave(&vg->lock, flags);
+ value = readl(reg) & ~BYT_PIN_MUX;
+ value |= gpio_mux;
+ writel(value, reg);
+- spin_unlock_irqrestore(&vg->lock, flags);
+
+ dev_warn(&vg->pdev->dev,
+ "pin %u forcibly re-configured as GPIO\n", offset);
+ }
+
++ raw_spin_unlock_irqrestore(&vg->lock, flags);
++
+ pm_runtime_get(&vg->pdev->dev);
+
+ return 0;
+@@ -250,7 +251,7 @@ static int byt_irq_type(struct irq_data *d, unsigned type)
+ if (offset >= vg->chip.ngpio)
+ return -EINVAL;
+
+- spin_lock_irqsave(&vg->lock, flags);
++ raw_spin_lock_irqsave(&vg->lock, flags);
+ value = readl(reg);
+
+ WARN(value & BYT_DIRECT_IRQ_EN,
+@@ -269,7 +270,7 @@ static int byt_irq_type(struct irq_data *d, unsigned type)
+ else if (type & IRQ_TYPE_LEVEL_MASK)
+ __irq_set_handler_locked(d->irq, handle_level_irq);
+
+- spin_unlock_irqrestore(&vg->lock, flags);
++ raw_spin_unlock_irqrestore(&vg->lock, flags);
+
+ return 0;
+ }
+@@ -277,7 +278,15 @@ static int byt_irq_type(struct irq_data *d, unsigned type)
+ static int byt_gpio_get(struct gpio_chip *chip, unsigned offset)
+ {
+ void __iomem *reg = byt_gpio_reg(chip, offset, BYT_VAL_REG);
+- return readl(reg) & BYT_LEVEL;
++ struct byt_gpio *vg = to_byt_gpio(chip);
++ unsigned long flags;
++ u32 val;
++
++ raw_spin_lock_irqsave(&vg->lock, flags);
++ val = readl(reg);
++ raw_spin_unlock_irqrestore(&vg->lock, flags);
++
++ return val & BYT_LEVEL;
+ }
+
+ static void byt_gpio_set(struct gpio_chip *chip, unsigned offset, int value)
+@@ -287,7 +296,7 @@ static void byt_gpio_set(struct gpio_chip *chip, unsigned offset, int value)
+ unsigned long flags;
+ u32 old_val;
+
+- spin_lock_irqsave(&vg->lock, flags);
++ raw_spin_lock_irqsave(&vg->lock, flags);
+
+ old_val = readl(reg);
+
+@@ -296,7 +305,7 @@ static void byt_gpio_set(struct gpio_chip *chip, unsigned offset, int value)
+ else
+ writel(old_val & ~BYT_LEVEL, reg);
+
+- spin_unlock_irqrestore(&vg->lock, flags);
++ raw_spin_unlock_irqrestore(&vg->lock, flags);
+ }
+
+ static int byt_gpio_direction_input(struct gpio_chip *chip, unsigned offset)
+@@ -306,13 +315,13 @@ static int byt_gpio_direction_input(struct gpio_chip *chip, unsigned offset)
+ unsigned long flags;
+ u32 value;
+
+- spin_lock_irqsave(&vg->lock, flags);
++ raw_spin_lock_irqsave(&vg->lock, flags);
+
+ value = readl(reg) | BYT_DIR_MASK;
+ value &= ~BYT_INPUT_EN; /* active low */
+ writel(value, reg);
+
+- spin_unlock_irqrestore(&vg->lock, flags);
++ raw_spin_unlock_irqrestore(&vg->lock, flags);
+
+ return 0;
+ }
+@@ -326,7 +335,7 @@ static int byt_gpio_direction_output(struct gpio_chip *chip,
+ unsigned long flags;
+ u32 reg_val;
+
+- spin_lock_irqsave(&vg->lock, flags);
++ raw_spin_lock_irqsave(&vg->lock, flags);
+
+ /*
+ * Before making any direction modifications, do a check if gpio
+@@ -345,7 +354,7 @@ static int byt_gpio_direction_output(struct gpio_chip *chip,
+ else
+ writel(reg_val & ~BYT_LEVEL, reg);
+
+- spin_unlock_irqrestore(&vg->lock, flags);
++ raw_spin_unlock_irqrestore(&vg->lock, flags);
+
+ return 0;
+ }
+@@ -354,18 +363,19 @@ static void byt_gpio_dbg_show(struct seq_file *s, struct gpio_chip *chip)
+ {
+ struct byt_gpio *vg = to_byt_gpio(chip);
+ int i;
+- unsigned long flags;
+ u32 conf0, val, offs;
+
+- spin_lock_irqsave(&vg->lock, flags);
+-
+ for (i = 0; i < vg->chip.ngpio; i++) {
+ const char *pull_str = NULL;
+ const char *pull = NULL;
++ unsigned long flags;
+ const char *label;
+ offs = vg->range->pins[i] * 16;
++
++ raw_spin_lock_irqsave(&vg->lock, flags);
+ conf0 = readl(vg->reg_base + offs + BYT_CONF0_REG);
+ val = readl(vg->reg_base + offs + BYT_VAL_REG);
++ raw_spin_unlock_irqrestore(&vg->lock, flags);
+
+ label = gpiochip_is_requested(chip, i);
+ if (!label)
+@@ -418,7 +428,6 @@ static void byt_gpio_dbg_show(struct seq_file *s, struct gpio_chip *chip)
+
+ seq_puts(s, "\n");
+ }
+- spin_unlock_irqrestore(&vg->lock, flags);
+ }
+
+ static void byt_gpio_irq_handler(unsigned irq, struct irq_desc *desc)
+@@ -450,8 +459,10 @@ static void byt_irq_ack(struct irq_data *d)
+ unsigned offset = irqd_to_hwirq(d);
+ void __iomem *reg;
+
++ raw_spin_lock(&vg->lock);
+ reg = byt_gpio_reg(&vg->chip, offset, BYT_INT_STAT_REG);
+ writel(BIT(offset % 32), reg);
++ raw_spin_unlock(&vg->lock);
+ }
+
+ static void byt_irq_unmask(struct irq_data *d)
+@@ -463,9 +474,9 @@ static void byt_irq_unmask(struct irq_data *d)
+ void __iomem *reg;
+ u32 value;
+
+- spin_lock_irqsave(&vg->lock, flags);
+-
+ reg = byt_gpio_reg(&vg->chip, offset, BYT_CONF0_REG);
++
++ raw_spin_lock_irqsave(&vg->lock, flags);
+ value = readl(reg);
+
+ switch (irqd_get_trigger_type(d)) {
+@@ -486,7 +497,7 @@ static void byt_irq_unmask(struct irq_data *d)
+
+ writel(value, reg);
+
+- spin_unlock_irqrestore(&vg->lock, flags);
++ raw_spin_unlock_irqrestore(&vg->lock, flags);
+ }
+
+ static void byt_irq_mask(struct irq_data *d)
+@@ -578,7 +589,7 @@ static int byt_gpio_probe(struct platform_device *pdev)
+ if (IS_ERR(vg->reg_base))
+ return PTR_ERR(vg->reg_base);
+
+- spin_lock_init(&vg->lock);
++ raw_spin_lock_init(&vg->lock);
+
+ gc = &vg->chip;
+ gc->label = dev_name(&pdev->dev);
+diff --git a/drivers/scsi/mvsas/mv_sas.c b/drivers/scsi/mvsas/mv_sas.c
+index 454536c49315..9c780740fb82 100644
+--- a/drivers/scsi/mvsas/mv_sas.c
++++ b/drivers/scsi/mvsas/mv_sas.c
+@@ -887,6 +887,8 @@ static void mvs_slot_free(struct mvs_info *mvi, u32 rx_desc)
+ static void mvs_slot_task_free(struct mvs_info *mvi, struct sas_task *task,
+ struct mvs_slot_info *slot, u32 slot_idx)
+ {
++ if (!slot)
++ return;
+ if (!slot->task)
+ return;
+ if (!sas_protocol_ata(task->task_proto))
+diff --git a/drivers/staging/iio/accel/sca3000_ring.c b/drivers/staging/iio/accel/sca3000_ring.c
+index 23685e74917e..bd2c69f85949 100644
+--- a/drivers/staging/iio/accel/sca3000_ring.c
++++ b/drivers/staging/iio/accel/sca3000_ring.c
+@@ -116,7 +116,7 @@ static int sca3000_read_first_n_hw_rb(struct iio_buffer *r,
+ if (ret)
+ goto error_ret;
+
+- for (i = 0; i < num_read; i++)
++ for (i = 0; i < num_read / sizeof(u16); i++)
+ *(((u16 *)rx) + i) = be16_to_cpup((__be16 *)rx + i);
+
+ if (copy_to_user(buf, rx, num_read))
+diff --git a/drivers/staging/iio/adc/mxs-lradc.c b/drivers/staging/iio/adc/mxs-lradc.c
+index d7c5223f1c3e..2931ea9b75d1 100644
+--- a/drivers/staging/iio/adc/mxs-lradc.c
++++ b/drivers/staging/iio/adc/mxs-lradc.c
+@@ -919,11 +919,12 @@ static int mxs_lradc_read_raw(struct iio_dev *iio_dev,
+ case IIO_CHAN_INFO_OFFSET:
+ if (chan->type == IIO_TEMP) {
+ /* The calculated value from the ADC is in Kelvin, we
+- * want Celsius for hwmon so the offset is
+- * -272.15 * scale
++ * want Celsius for hwmon so the offset is -273.15
++ * The offset is applied before scaling so it is
++ * actually -273.15 * 4 / 1.012 = -1079.644268
+ */
+- *val = -1075;
+- *val2 = 691699;
++ *val = -1079;
++ *val2 = 644268;
+
+ return IIO_VAL_INT_PLUS_MICRO;
+ }
+diff --git a/drivers/thermal/samsung/exynos_tmu.c b/drivers/thermal/samsung/exynos_tmu.c
+index c96ff10b869e..af68d06fd193 100644
+--- a/drivers/thermal/samsung/exynos_tmu.c
++++ b/drivers/thermal/samsung/exynos_tmu.c
+@@ -933,7 +933,7 @@ static void exynos4412_tmu_set_emulation(struct exynos_tmu_data *data,
+
+ if (data->soc == SOC_ARCH_EXYNOS5260)
+ emul_con = EXYNOS5260_EMUL_CON;
+- if (data->soc == SOC_ARCH_EXYNOS5433)
++ else if (data->soc == SOC_ARCH_EXYNOS5433)
+ emul_con = EXYNOS5433_TMU_EMUL_CON;
+ else if (data->soc == SOC_ARCH_EXYNOS7)
+ emul_con = EXYNOS7_TMU_REG_EMUL_CON;
+diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
+index 21d01a491405..e508939daea3 100644
+--- a/drivers/tty/serial/8250/8250_dma.c
++++ b/drivers/tty/serial/8250/8250_dma.c
+@@ -80,10 +80,6 @@ int serial8250_tx_dma(struct uart_8250_port *p)
+ return 0;
+
+ dma->tx_size = CIRC_CNT_TO_END(xmit->head, xmit->tail, UART_XMIT_SIZE);
+- if (dma->tx_size < p->port.fifosize) {
+- ret = -EINVAL;
+- goto err;
+- }
+
+ desc = dmaengine_prep_slave_single(dma->txchan,
+ dma->tx_addr + xmit->tail,
+diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
+index c79d33676672..c47d3e480586 100644
+--- a/drivers/usb/host/xhci-pci.c
++++ b/drivers/usb/host/xhci-pci.c
+@@ -147,6 +147,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
+ if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
+ pdev->device == PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI) {
+ xhci->quirks |= XHCI_SPURIOUS_REBOOT;
++ xhci->quirks |= XHCI_SPURIOUS_WAKEUP;
+ }
+ if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
+ (pdev->device == PCI_DEVICE_ID_INTEL_SUNRISEPOINT_LP_XHCI ||
+diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
+index 8aadf3def901..63041c1e5a9f 100644
+--- a/drivers/usb/host/xhci-ring.c
++++ b/drivers/usb/host/xhci-ring.c
+@@ -2239,6 +2239,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
+ u32 trb_comp_code;
+ int ret = 0;
+ int td_num = 0;
++ bool handling_skipped_tds = false;
+
+ slot_id = TRB_TO_SLOT_ID(le32_to_cpu(event->flags));
+ xdev = xhci->devs[slot_id];
+@@ -2372,6 +2373,10 @@ static int handle_tx_event(struct xhci_hcd *xhci,
+ ep->skip = true;
+ xhci_dbg(xhci, "Miss service interval error, set skip flag\n");
+ goto cleanup;
++ case COMP_PING_ERR:
++ ep->skip = true;
++ xhci_dbg(xhci, "No Ping response error, Skip one Isoc TD\n");
++ goto cleanup;
+ default:
+ if (xhci_is_vendor_info_code(xhci, trb_comp_code)) {
+ status = 0;
+@@ -2508,13 +2513,18 @@ static int handle_tx_event(struct xhci_hcd *xhci,
+ ep, &status);
+
+ cleanup:
++
++
++ handling_skipped_tds = ep->skip &&
++ trb_comp_code != COMP_MISSED_INT &&
++ trb_comp_code != COMP_PING_ERR;
++
+ /*
+- * Do not update event ring dequeue pointer if ep->skip is set.
+- * Will roll back to continue process missed tds.
++ * Do not update event ring dequeue pointer if we're in a loop
++ * processing missed tds.
+ */
+- if (trb_comp_code == COMP_MISSED_INT || !ep->skip) {
++ if (!handling_skipped_tds)
+ inc_deq(xhci, xhci->event_ring);
+- }
+
+ if (ret) {
+ urb = td->urb;
+@@ -2549,7 +2559,7 @@ cleanup:
+ * Process them as short transfer until reach the td pointed by
+ * the event.
+ */
+- } while (ep->skip && trb_comp_code != COMP_MISSED_INT);
++ } while (handling_skipped_tds);
+
+ return 0;
+ }
+diff --git a/drivers/usb/serial/qcserial.c b/drivers/usb/serial/qcserial.c
+index ebcec8cda858..f49d262e926b 100644
+--- a/drivers/usb/serial/qcserial.c
++++ b/drivers/usb/serial/qcserial.c
+@@ -153,6 +153,8 @@ static const struct usb_device_id id_table[] = {
+ {DEVICE_SWI(0x1199, 0x9056)}, /* Sierra Wireless Modem */
+ {DEVICE_SWI(0x1199, 0x9060)}, /* Sierra Wireless Modem */
+ {DEVICE_SWI(0x1199, 0x9061)}, /* Sierra Wireless Modem */
++ {DEVICE_SWI(0x1199, 0x9070)}, /* Sierra Wireless MC74xx/EM74xx */
++ {DEVICE_SWI(0x1199, 0x9071)}, /* Sierra Wireless MC74xx/EM74xx */
+ {DEVICE_SWI(0x413c, 0x81a2)}, /* Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card */
+ {DEVICE_SWI(0x413c, 0x81a3)}, /* Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card */
+ {DEVICE_SWI(0x413c, 0x81a4)}, /* Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card */
+diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c
+index 1aaf89300621..92f394927f24 100644
+--- a/drivers/video/console/fbcon.c
++++ b/drivers/video/console/fbcon.c
+@@ -1093,6 +1093,7 @@ static void fbcon_init(struct vc_data *vc, int init)
+ con_copy_unimap(vc, svc);
+
+ ops = info->fbcon_par;
++ ops->cur_blink_jiffies = msecs_to_jiffies(vc->vc_cur_blink_ms);
+ p->con_rotate = initial_rotation;
+ set_blitting_type(vc, info);
+
+diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
+index f490b6155091..641d3dc4f31e 100644
+--- a/fs/btrfs/ioctl.c
++++ b/fs/btrfs/ioctl.c
+@@ -4649,7 +4649,7 @@ locked:
+
+ if (bctl->flags & ~(BTRFS_BALANCE_ARGS_MASK | BTRFS_BALANCE_TYPE_MASK)) {
+ ret = -EINVAL;
+- goto out_bargs;
++ goto out_bctl;
+ }
+
+ do_balance:
+@@ -4663,12 +4663,15 @@ do_balance:
+ need_unlock = false;
+
+ ret = btrfs_balance(bctl, bargs);
++ bctl = NULL;
+
+ if (arg) {
+ if (copy_to_user(arg, bargs, sizeof(*bargs)))
+ ret = -EFAULT;
+ }
+
++out_bctl:
++ kfree(bctl);
+ out_bargs:
+ kfree(bargs);
+ out_unlock:
+diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
+index 84d693d37428..871fcb67be97 100644
+--- a/fs/overlayfs/copy_up.c
++++ b/fs/overlayfs/copy_up.c
+@@ -81,11 +81,11 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
+ if (len == 0)
+ return 0;
+
+- old_file = ovl_path_open(old, O_RDONLY);
++ old_file = ovl_path_open(old, O_LARGEFILE | O_RDONLY);
+ if (IS_ERR(old_file))
+ return PTR_ERR(old_file);
+
+- new_file = ovl_path_open(new, O_WRONLY);
++ new_file = ovl_path_open(new, O_LARGEFILE | O_WRONLY);
+ if (IS_ERR(new_file)) {
+ error = PTR_ERR(new_file);
+ goto out_fput;
+@@ -267,7 +267,7 @@ out:
+
+ out_cleanup:
+ ovl_cleanup(wdir, newdentry);
+- goto out;
++ goto out2;
+ }
+
+ /*
+diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
+index d9da5a4e9382..ec0c2a050043 100644
+--- a/fs/overlayfs/inode.c
++++ b/fs/overlayfs/inode.c
+@@ -363,6 +363,9 @@ struct inode *ovl_d_select_inode(struct dentry *dentry, unsigned file_flags)
+ ovl_path_upper(dentry, &realpath);
+ }
+
++ if (realpath.dentry->d_flags & DCACHE_OP_SELECT_INODE)
++ return realpath.dentry->d_op->d_select_inode(realpath.dentry, file_flags);
++
+ return d_backing_inode(realpath.dentry);
+ }
+
+diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
+index 79073d68b475..e38ee0fed24a 100644
+--- a/fs/overlayfs/super.c
++++ b/fs/overlayfs/super.c
+@@ -544,6 +544,7 @@ static void ovl_put_super(struct super_block *sb)
+ mntput(ufs->upper_mnt);
+ for (i = 0; i < ufs->numlower; i++)
+ mntput(ufs->lower_mnt[i]);
++ kfree(ufs->lower_mnt);
+
+ kfree(ufs->config.lowerdir);
+ kfree(ufs->config.upperdir);
+@@ -1048,6 +1049,7 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
+ oe->lowerstack[i].dentry = stack[i].dentry;
+ oe->lowerstack[i].mnt = ufs->lower_mnt[i];
+ }
++ kfree(stack);
+
+ root_dentry->d_fsdata = oe;
+
+diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
+index 0fe9df983ab7..fe0ab983859b 100644
+--- a/include/linux/backing-dev.h
++++ b/include/linux/backing-dev.h
+@@ -18,13 +18,17 @@
+ #include <linux/slab.h>
+
+ int __must_check bdi_init(struct backing_dev_info *bdi);
+-void bdi_destroy(struct backing_dev_info *bdi);
++void bdi_exit(struct backing_dev_info *bdi);
+
+ __printf(3, 4)
+ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
+ const char *fmt, ...);
+ int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
++void bdi_unregister(struct backing_dev_info *bdi);
++
+ int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
++void bdi_destroy(struct backing_dev_info *bdi);
++
+ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
+ bool range_cyclic, enum wb_reason reason);
+ void wb_start_background_writeback(struct bdi_writeback *wb);
+diff --git a/include/linux/omap-dma.h b/include/linux/omap-dma.h
+index e5a70132a240..88fa8af2b937 100644
+--- a/include/linux/omap-dma.h
++++ b/include/linux/omap-dma.h
+@@ -17,7 +17,7 @@
+
+ #include <linux/platform_device.h>
+
+-#define INT_DMA_LCD 25
++#define INT_DMA_LCD (NR_IRQS_LEGACY + 25)
+
+ #define OMAP1_DMA_TOUT_IRQ (1 << 0)
+ #define OMAP_DMA_DROP_IRQ (1 << 1)
+diff --git a/include/sound/soc.h b/include/sound/soc.h
+index 93df8bf9d54a..334d0d292020 100644
+--- a/include/sound/soc.h
++++ b/include/sound/soc.h
+@@ -86,7 +86,7 @@
+ .access = SNDRV_CTL_ELEM_ACCESS_TLV_READ | \
+ SNDRV_CTL_ELEM_ACCESS_READWRITE, \
+ .tlv.p = (tlv_array),\
+- .info = snd_soc_info_volsw, \
++ .info = snd_soc_info_volsw_sx, \
+ .get = snd_soc_get_volsw_sx,\
+ .put = snd_soc_put_volsw_sx, \
+ .private_value = (unsigned long)&(struct soc_mixer_control) \
+@@ -156,7 +156,7 @@
+ .access = SNDRV_CTL_ELEM_ACCESS_TLV_READ | \
+ SNDRV_CTL_ELEM_ACCESS_READWRITE, \
+ .tlv.p = (tlv_array), \
+- .info = snd_soc_info_volsw, \
++ .info = snd_soc_info_volsw_sx, \
+ .get = snd_soc_get_volsw_sx, \
+ .put = snd_soc_put_volsw_sx, \
+ .private_value = (unsigned long)&(struct soc_mixer_control) \
+@@ -573,6 +573,8 @@ int snd_soc_put_enum_double(struct snd_kcontrol *kcontrol,
+ struct snd_ctl_elem_value *ucontrol);
+ int snd_soc_info_volsw(struct snd_kcontrol *kcontrol,
+ struct snd_ctl_elem_info *uinfo);
++int snd_soc_info_volsw_sx(struct snd_kcontrol *kcontrol,
++ struct snd_ctl_elem_info *uinfo);
+ #define snd_soc_info_bool_ext snd_ctl_boolean_mono_info
+ int snd_soc_get_volsw(struct snd_kcontrol *kcontrol,
+ struct snd_ctl_elem_value *ucontrol);
+diff --git a/include/sound/wm8904.h b/include/sound/wm8904.h
+index 898be3a8db9a..6d8f8fba3341 100644
+--- a/include/sound/wm8904.h
++++ b/include/sound/wm8904.h
+@@ -119,7 +119,7 @@
+ #define WM8904_MIC_REGS 2
+ #define WM8904_GPIO_REGS 4
+ #define WM8904_DRC_REGS 4
+-#define WM8904_EQ_REGS 25
++#define WM8904_EQ_REGS 24
+
+ /**
+ * DRC configurations are specified with a label and a set of register
+diff --git a/kernel/module.c b/kernel/module.c
+index b86b7bf1be38..8f051a106676 100644
+--- a/kernel/module.c
++++ b/kernel/module.c
+@@ -1063,11 +1063,15 @@ void symbol_put_addr(void *addr)
+ if (core_kernel_text(a))
+ return;
+
+- /* module_text_address is safe here: we're supposed to have reference
+- * to module from symbol_get, so it can't go away. */
++ /*
++ * Even though we hold a reference on the module; we still need to
++ * disable preemption in order to safely traverse the data structure.
++ */
++ preempt_disable();
+ modaddr = __module_text_address(a);
+ BUG_ON(!modaddr);
+ module_put(modaddr);
++ preempt_enable();
+ }
+ EXPORT_SYMBOL_GPL(symbol_put_addr);
+
+diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
+index 0a17af35670a..da7f8266913b 100644
+--- a/kernel/sched/deadline.c
++++ b/kernel/sched/deadline.c
+@@ -1066,8 +1066,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
+ int target = find_later_rq(p);
+
+ if (target != -1 &&
+- dl_time_before(p->dl.deadline,
+- cpu_rq(target)->dl.earliest_dl.curr))
++ (dl_time_before(p->dl.deadline,
++ cpu_rq(target)->dl.earliest_dl.curr) ||
++ (cpu_rq(target)->dl.dl_nr_running == 0)))
+ cpu = target;
+ }
+ rcu_read_unlock();
+@@ -1417,7 +1418,8 @@ static struct rq *find_lock_later_rq(struct task_struct *task, struct rq *rq)
+
+ later_rq = cpu_rq(cpu);
+
+- if (!dl_time_before(task->dl.deadline,
++ if (later_rq->dl.dl_nr_running &&
++ !dl_time_before(task->dl.deadline,
+ later_rq->dl.earliest_dl.curr)) {
+ /*
+ * Target rq has tasks of equal or earlier deadline,
+diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c
+index 3f34496244e9..96969012f242 100644
+--- a/kernel/trace/trace_stack.c
++++ b/kernel/trace/trace_stack.c
+@@ -94,6 +94,12 @@ check_stack(unsigned long ip, unsigned long *stack)
+ local_irq_save(flags);
+ arch_spin_lock(&max_stack_lock);
+
++ /*
++ * RCU may not be watching, make it see us.
++ * The stack trace code uses rcu_sched.
++ */
++ rcu_irq_enter();
++
+ /* In case another CPU set the tracer_frame on us */
+ if (unlikely(!frame_size))
+ this_size -= tracer_frame;
+@@ -174,6 +180,7 @@ check_stack(unsigned long ip, unsigned long *stack)
+ }
+
+ out:
++ rcu_irq_exit();
+ arch_spin_unlock(&max_stack_lock);
+ local_irq_restore(flags);
+ }
+diff --git a/lib/fault-inject.c b/lib/fault-inject.c
+index f1cdeb024d17..6a823a53e357 100644
+--- a/lib/fault-inject.c
++++ b/lib/fault-inject.c
+@@ -44,7 +44,7 @@ static void fail_dump(struct fault_attr *attr)
+ printk(KERN_NOTICE "FAULT_INJECTION: forcing a failure.\n"
+ "name %pd, interval %lu, probability %lu, "
+ "space %d, times %d\n", attr->dname,
+- attr->probability, attr->interval,
++ attr->interval, attr->probability,
+ atomic_read(&attr->space),
+ atomic_read(&attr->times));
+ if (attr->verbose > 1)
+diff --git a/mm/backing-dev.c b/mm/backing-dev.c
+index dac5bf59309d..dc07d8866d9a 100644
+--- a/mm/backing-dev.c
++++ b/mm/backing-dev.c
+@@ -823,7 +823,7 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi)
+ synchronize_rcu_expedited();
+ }
+
+-void bdi_destroy(struct backing_dev_info *bdi)
++void bdi_unregister(struct backing_dev_info *bdi)
+ {
+ /* make sure nobody finds us on the bdi_list anymore */
+ bdi_remove_from_list(bdi);
+@@ -835,9 +835,19 @@ void bdi_destroy(struct backing_dev_info *bdi)
+ device_unregister(bdi->dev);
+ bdi->dev = NULL;
+ }
++}
+
++void bdi_exit(struct backing_dev_info *bdi)
++{
++ WARN_ON_ONCE(bdi->dev);
+ wb_exit(&bdi->wb);
+ }
++
++void bdi_destroy(struct backing_dev_info *bdi)
++{
++ bdi_unregister(bdi);
++ bdi_exit(bdi);
++}
+ EXPORT_SYMBOL(bdi_destroy);
+
+ /*
+diff --git a/mm/filemap.c b/mm/filemap.c
+index 1283fc825458..3fd68ee183c6 100644
+--- a/mm/filemap.c
++++ b/mm/filemap.c
+@@ -2488,6 +2488,11 @@ again:
+ break;
+ }
+
++ if (fatal_signal_pending(current)) {
++ status = -EINTR;
++ break;
++ }
++
+ status = a_ops->write_begin(file, mapping, pos, bytes, flags,
+ &page, &fsdata);
+ if (unlikely(status < 0))
+@@ -2525,10 +2530,6 @@ again:
+ written += copied;
+
+ balance_dirty_pages_ratelimited(mapping);
+- if (fatal_signal_pending(current)) {
+- status = -EINTR;
+- break;
+- }
+ } while (iov_iter_count(i));
+
+ return written ? written : status;
+diff --git a/mm/huge_memory.c b/mm/huge_memory.c
+index 097c7a4bfbd9..da0ac6a0445f 100644
+--- a/mm/huge_memory.c
++++ b/mm/huge_memory.c
+@@ -2132,7 +2132,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
+ for (_pte = pte; _pte < pte+HPAGE_PMD_NR;
+ _pte++, address += PAGE_SIZE) {
+ pte_t pteval = *_pte;
+- if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
++ if (pte_none(pteval) || (pte_present(pteval) &&
++ is_zero_pfn(pte_pfn(pteval)))) {
+ if (++none_or_zero <= khugepaged_max_ptes_none)
+ continue;
+ else
+diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
+index 3ea8b7de9633..58d9a8167dd2 100644
+--- a/net/mac80211/debugfs.c
++++ b/net/mac80211/debugfs.c
+@@ -148,7 +148,7 @@ static ssize_t hwflags_read(struct file *file, char __user *user_buf,
+
+ for (i = 0; i < NUM_IEEE80211_HW_FLAGS; i++) {
+ if (test_bit(i, local->hw.flags))
+- pos += scnprintf(pos, end - pos, "%s",
++ pos += scnprintf(pos, end - pos, "%s\n",
+ hw_flag_names[i]);
+ }
+
+diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c
+index a1fe5377a2b3..5a30ce6e8c90 100644
+--- a/net/netfilter/ipset/ip_set_list_set.c
++++ b/net/netfilter/ipset/ip_set_list_set.c
+@@ -297,7 +297,7 @@ list_set_uadd(struct ip_set *set, void *value, const struct ip_set_ext *ext,
+ ip_set_timeout_expired(ext_timeout(n, set))))
+ n = NULL;
+
+- e = kzalloc(set->dsize, GFP_KERNEL);
++ e = kzalloc(set->dsize, GFP_ATOMIC);
+ if (!e)
+ return -ENOMEM;
+ e->id = d->id;
+diff --git a/sound/hda/ext/hdac_ext_bus.c b/sound/hda/ext/hdac_ext_bus.c
+index 0aa5d9eb6c3f..d85aa1a75188 100644
+--- a/sound/hda/ext/hdac_ext_bus.c
++++ b/sound/hda/ext/hdac_ext_bus.c
+@@ -19,6 +19,7 @@
+
+ #include <linux/module.h>
+ #include <linux/slab.h>
++#include <linux/io.h>
+ #include <sound/hdaudio_ext.h>
+
+ MODULE_DESCRIPTION("HDA extended core");
+diff --git a/sound/pci/hda/hda_codec.c b/sound/pci/hda/hda_codec.c
+index d1a2cb65e27c..ca374462d7e5 100644
+--- a/sound/pci/hda/hda_codec.c
++++ b/sound/pci/hda/hda_codec.c
+@@ -3438,10 +3438,8 @@ int snd_hda_codec_build_pcms(struct hda_codec *codec)
+ int dev, err;
+
+ err = snd_hda_codec_parse_pcms(codec);
+- if (err < 0) {
+- snd_hda_codec_reset(codec);
++ if (err < 0)
+ return err;
+- }
+
+ /* attach a new PCM streams */
+ list_for_each_entry(cpcm, &codec->pcm_list_head, list) {
+diff --git a/sound/pci/hda/patch_conexant.c b/sound/pci/hda/patch_conexant.c
+index ca03c40609fc..2f0ec7c45fc7 100644
+--- a/sound/pci/hda/patch_conexant.c
++++ b/sound/pci/hda/patch_conexant.c
+@@ -819,6 +819,7 @@ static const struct snd_pci_quirk cxt5066_fixups[] = {
+ SND_PCI_QUIRK(0x17aa, 0x21da, "Lenovo X220", CXT_PINCFG_LENOVO_TP410),
+ SND_PCI_QUIRK(0x17aa, 0x21db, "Lenovo X220-tablet", CXT_PINCFG_LENOVO_TP410),
+ SND_PCI_QUIRK(0x17aa, 0x38af, "Lenovo IdeaPad Z560", CXT_FIXUP_MUTE_LED_EAPD),
++ SND_PCI_QUIRK(0x17aa, 0x390b, "Lenovo G50-80", CXT_FIXUP_STEREO_DMIC),
+ SND_PCI_QUIRK(0x17aa, 0x3975, "Lenovo U300s", CXT_FIXUP_STEREO_DMIC),
+ SND_PCI_QUIRK(0x17aa, 0x3977, "Lenovo IdeaPad U310", CXT_FIXUP_STEREO_DMIC),
+ SND_PCI_QUIRK(0x17aa, 0x397b, "Lenovo S205", CXT_FIXUP_STEREO_DMIC),
+diff --git a/sound/soc/soc-ops.c b/sound/soc/soc-ops.c
+index 100d92b5b77e..05977ae1ff2a 100644
+--- a/sound/soc/soc-ops.c
++++ b/sound/soc/soc-ops.c
+@@ -207,6 +207,34 @@ int snd_soc_info_volsw(struct snd_kcontrol *kcontrol,
+ EXPORT_SYMBOL_GPL(snd_soc_info_volsw);
+
+ /**
++ * snd_soc_info_volsw_sx - Mixer info callback for SX TLV controls
++ * @kcontrol: mixer control
++ * @uinfo: control element information
++ *
++ * Callback to provide information about a single mixer control, or a double
++ * mixer control that spans 2 registers of the SX TLV type. SX TLV controls
++ * have a range that represents both positive and negative values either side
++ * of zero but without a sign bit.
++ *
++ * Returns 0 for success.
++ */
++int snd_soc_info_volsw_sx(struct snd_kcontrol *kcontrol,
++ struct snd_ctl_elem_info *uinfo)
++{
++ struct soc_mixer_control *mc =
++ (struct soc_mixer_control *)kcontrol->private_value;
++
++ snd_soc_info_volsw(kcontrol, uinfo);
++ /* Max represents the number of levels in an SX control not the
++ * maximum value, so add the minimum value back on
++ */
++ uinfo->value.integer.max += mc->min;
++
++ return 0;
++}
++EXPORT_SYMBOL_GPL(snd_soc_info_volsw_sx);
++
++/**
+ * snd_soc_get_volsw - single mixer get callback
+ * @kcontrol: mixer control
+ * @ucontrol: control element information
+diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
+index 21c14244f4c4..d7ea8e20dae4 100644
+--- a/virt/kvm/irqchip.c
++++ b/virt/kvm/irqchip.c
+@@ -213,11 +213,15 @@ int kvm_set_irq_routing(struct kvm *kvm,
+ goto out;
+
+ r = -EINVAL;
+- if (ue->flags)
++ if (ue->flags) {
++ kfree(e);
+ goto out;
++ }
+ r = setup_routing_entry(new, e, ue);
+- if (r)
++ if (r) {
++ kfree(e);
+ goto out;
++ }
+ ++ue;
+ }
+
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-12-11 14:31 Mike Pagano
From: Mike Pagano @ 2015-12-11 14:31 UTC
To: gentoo-commits
commit: 42c0079efa9fff84d8c63842c360060aad2f75cb
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Fri Dec 11 14:31:16 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Fri Dec 11 14:31:16 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=42c0079e
Linux patch 4.2.7
0000_README | 4 +
1006_linux-4.2.7.patch | 4131 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 4135 insertions(+)
diff --git a/0000_README b/0000_README
index 8190b77..2299001 100644
--- a/0000_README
+++ b/0000_README
@@ -67,6 +67,10 @@ Patch: 1005_linux-4.2.6.patch
From: http://www.kernel.org
Desc: Linux 4.2.6
+Patch: 1006_linux-4.2.7.patch
+From: http://www.kernel.org
+Desc: Linux 4.2.7
+
Patch: 1500_XATTR_USER_PREFIX.patch
From: https://bugs.gentoo.org/show_bug.cgi?id=470644
Desc: Support for namespace user.pax.* on tmpfs.
diff --git a/1006_linux-4.2.7.patch b/1006_linux-4.2.7.patch
new file mode 100644
index 0000000..35ba2e4
--- /dev/null
+++ b/1006_linux-4.2.7.patch
@@ -0,0 +1,4131 @@
+diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt
+index 0815eac5b185..e12f3448846a 100644
+--- a/Documentation/devicetree/bindings/usb/dwc3.txt
++++ b/Documentation/devicetree/bindings/usb/dwc3.txt
+@@ -35,6 +35,8 @@ Optional properties:
+ LTSSM during USB3 Compliance mode.
+ - snps,dis_u3_susphy_quirk: when set core will disable USB3 suspend phy.
+ - snps,dis_u2_susphy_quirk: when set core will disable USB2 suspend phy.
++ - snps,dis_enblslpm_quirk: when set clears the enblslpm in GUSB2PHYCFG,
++ disabling the suspend signal to the PHY.
+ - snps,is-utmi-l1-suspend: true when DWC3 asserts output signal
+ utmi_l1_suspend_n, false when asserts utmi_sleep_n
+ - snps,hird-threshold: HIRD threshold
+diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
+index 6f7fafde0884..3e2844eca266 100644
+--- a/Documentation/filesystems/proc.txt
++++ b/Documentation/filesystems/proc.txt
+@@ -140,7 +140,8 @@ Table 1-1: Process specific entries in /proc
+ stat Process status
+ statm Process memory status information
+ status Process status in human readable form
+- wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan
++ wchan Present with CONFIG_KALLSYMS=y: it shows the kernel function
++ symbol the task is blocked in - or "0" if not blocked.
+ pagemap Page table
+ stack Report full stack trace, enable via CONFIG_STACKTRACE
+ smaps a extension based on maps, showing the memory consumption of
+@@ -310,7 +311,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
+ blocked bitmap of blocked signals
+ sigign bitmap of ignored signals
+ sigcatch bitmap of caught signals
+- wchan address where process went to sleep
++ 0 (place holder, used to be the wchan address, use /proc/PID/wchan instead)
+ 0 (place holder)
+ 0 (place holder)
+ exit_signal signal to send to parent thread on exit
+diff --git a/Makefile b/Makefile
+index 9ef37399b4e8..f5014eaf2532 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 6
++SUBLEVEL = 7
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+
+diff --git a/arch/arm/boot/dts/imx27.dtsi b/arch/arm/boot/dts/imx27.dtsi
+index b69be5c499cf..8c603fdf9da1 100644
+--- a/arch/arm/boot/dts/imx27.dtsi
++++ b/arch/arm/boot/dts/imx27.dtsi
+@@ -477,7 +477,10 @@
+ compatible = "fsl,imx27-usb";
+ reg = <0x10024000 0x200>;
+ interrupts = <56>;
+- clocks = <&clks IMX27_CLK_USB_IPG_GATE>;
++ clocks = <&clks IMX27_CLK_USB_IPG_GATE>,
++ <&clks IMX27_CLK_USB_AHB_GATE>,
++ <&clks IMX27_CLK_USB_DIV>;
++ clock-names = "ipg", "ahb", "per";
+ fsl,usbmisc = <&usbmisc 0>;
+ status = "disabled";
+ };
+@@ -486,7 +489,10 @@
+ compatible = "fsl,imx27-usb";
+ reg = <0x10024200 0x200>;
+ interrupts = <54>;
+- clocks = <&clks IMX27_CLK_USB_IPG_GATE>;
++ clocks = <&clks IMX27_CLK_USB_IPG_GATE>,
++ <&clks IMX27_CLK_USB_AHB_GATE>,
++ <&clks IMX27_CLK_USB_DIV>;
++ clock-names = "ipg", "ahb", "per";
+ fsl,usbmisc = <&usbmisc 1>;
+ dr_mode = "host";
+ status = "disabled";
+@@ -496,7 +502,10 @@
+ compatible = "fsl,imx27-usb";
+ reg = <0x10024400 0x200>;
+ interrupts = <55>;
+- clocks = <&clks IMX27_CLK_USB_IPG_GATE>;
++ clocks = <&clks IMX27_CLK_USB_IPG_GATE>,
++ <&clks IMX27_CLK_USB_AHB_GATE>,
++ <&clks IMX27_CLK_USB_DIV>;
++ clock-names = "ipg", "ahb", "per";
+ fsl,usbmisc = <&usbmisc 2>;
+ dr_mode = "host";
+ status = "disabled";
+@@ -506,7 +515,6 @@
+ #index-cells = <1>;
+ compatible = "fsl,imx27-usbmisc";
+ reg = <0x10024600 0x200>;
+- clocks = <&clks IMX27_CLK_USB_AHB_GATE>;
+ };
+
+ sahara2: sahara@10025000 {
+diff --git a/arch/arm/boot/dts/omap5-uevm.dts b/arch/arm/boot/dts/omap5-uevm.dts
+index 5771a149ce4a..23d645daeac1 100644
+--- a/arch/arm/boot/dts/omap5-uevm.dts
++++ b/arch/arm/boot/dts/omap5-uevm.dts
+@@ -31,6 +31,24 @@
+ regulator-max-microvolt = <3000000>;
+ };
+
++ mmc3_pwrseq: sdhci0_pwrseq {
++ compatible = "mmc-pwrseq-simple";
++ clocks = <&clk32kgaudio>;
++ clock-names = "ext_clock";
++ };
++
++ vmmcsdio_fixed: fixedregulator-mmcsdio {
++ compatible = "regulator-fixed";
++ regulator-name = "vmmcsdio_fixed";
++ regulator-min-microvolt = <1800000>;
++ regulator-max-microvolt = <1800000>;
++ gpio = <&gpio5 12 GPIO_ACTIVE_HIGH>; /* gpio140 WLAN_EN */
++ enable-active-high;
++ startup-delay-us = <70000>;
++ pinctrl-names = "default";
++ pinctrl-0 = <&wlan_pins>;
++ };
++
+ /* HS USB Host PHY on PORT 2 */
+ hsusb2_phy: hsusb2_phy {
+ compatible = "usb-nop-xceiv";
+@@ -197,12 +215,20 @@
+ >;
+ };
+
+- mcspi4_pins: pinmux_mcspi4_pins {
++ mmc3_pins: pinmux_mmc3_pins {
++ pinctrl-single,pins = <
++ OMAP5_IOPAD(0x01a4, PIN_INPUT_PULLUP | MUX_MODE0) /* wlsdio_clk */
++ OMAP5_IOPAD(0x01a6, PIN_INPUT_PULLUP | MUX_MODE0) /* wlsdio_cmd */
++ OMAP5_IOPAD(0x01a8, PIN_INPUT_PULLUP | MUX_MODE0) /* wlsdio_data0 */
++ OMAP5_IOPAD(0x01aa, PIN_INPUT_PULLUP | MUX_MODE0) /* wlsdio_data1 */
++ OMAP5_IOPAD(0x01ac, PIN_INPUT_PULLUP | MUX_MODE0) /* wlsdio_data2 */
++ OMAP5_IOPAD(0x01ae, PIN_INPUT_PULLUP | MUX_MODE0) /* wlsdio_data3 */
++ >;
++ };
++
++ wlan_pins: pinmux_wlan_pins {
+ pinctrl-single,pins = <
+- 0x164 (PIN_INPUT | MUX_MODE1) /* mcspi4_clk */
+- 0x168 (PIN_INPUT | MUX_MODE1) /* mcspi4_simo */
+- 0x16a (PIN_INPUT | MUX_MODE1) /* mcspi4_somi */
+- 0x16c (PIN_INPUT | MUX_MODE1) /* mcspi4_cs0 */
++ OMAP5_IOPAD(0x1bc, PIN_OUTPUT | MUX_MODE6) /* mcspi1_clk.gpio5_140 */
+ >;
+ };
+
+@@ -276,6 +302,12 @@
+ 0x1A (PIN_OUTPUT | MUX_MODE0) /* fref_clk1_out, USB hub clk */
+ >;
+ };
++
++ wlcore_irq_pin: pinmux_wlcore_irq_pin {
++ pinctrl-single,pins = <
++ OMAP5_IOPAD(0x040, WAKEUP_EN | PIN_INPUT_PULLUP | MUX_MODE6) /* llia_wakereqin.gpio1_wk14 */
++ >;
++ };
+ };
+
+ &mmc1 {
+@@ -290,8 +322,25 @@
+ };
+
+ &mmc3 {
++ vmmc-supply = <&vmmcsdio_fixed>;
++ mmc-pwrseq = <&mmc3_pwrseq>;
+ bus-width = <4>;
+- ti,non-removable;
++ non-removable;
++ cap-power-off-card;
++ pinctrl-names = "default";
++ pinctrl-0 = <&mmc3_pins &wlcore_irq_pin>;
++ interrupts-extended = <&gic GIC_SPI 94 IRQ_TYPE_LEVEL_HIGH
++ &omap5_pmx_core 0x168>;
++
++ #address-cells = <1>;
++ #size-cells = <0>;
++ wlcore: wlcore@2 {
++ compatible = "ti,wl1271";
++ reg = <2>;
++ interrupt-parent = <&gpio1>;
++ interrupts = <14 IRQ_TYPE_LEVEL_HIGH>; /* gpio 14 */
++ ref-clock-frequency = <26000000>;
++ };
+ };
+
+ &mmc4 {
+@@ -591,11 +640,6 @@
+ pinctrl-0 = <&mcspi3_pins>;
+ };
+
+-&mcspi4 {
+- pinctrl-names = "default";
+- pinctrl-0 = <&mcspi4_pins>;
+-};
+-
+ &uart1 {
+ pinctrl-names = "default";
+ pinctrl-0 = <&uart1_pins>;
+diff --git a/arch/arm/boot/dts/sama5d4.dtsi b/arch/arm/boot/dts/sama5d4.dtsi
+index 3ee22ee13c5a..1ba10e495f21 100644
+--- a/arch/arm/boot/dts/sama5d4.dtsi
++++ b/arch/arm/boot/dts/sama5d4.dtsi
+@@ -939,11 +939,11 @@
+ reg = <0xf8018000 0x4000>;
+ interrupts = <33 IRQ_TYPE_LEVEL_HIGH 6>;
+ dmas = <&dma1
+- (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1))
+- AT91_XDMAC_DT_PERID(4)>,
++ (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1)
++ | AT91_XDMAC_DT_PERID(4))>,
+ <&dma1
+- (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1))
+- AT91_XDMAC_DT_PERID(5)>;
++ (AT91_XDMAC_DT_MEM_IF(0) | AT91_XDMAC_DT_PER_IF(1)
++ | AT91_XDMAC_DT_PERID(5))>;
+ dma-names = "tx", "rx";
+ pinctrl-names = "default";
+ pinctrl-0 = <&pinctrl_i2c1>;
+diff --git a/arch/arm/boot/dts/sun6i-a31-hummingbird.dts b/arch/arm/boot/dts/sun6i-a31-hummingbird.dts
+index d0cfadac0691..18f26ca4e375 100644
+--- a/arch/arm/boot/dts/sun6i-a31-hummingbird.dts
++++ b/arch/arm/boot/dts/sun6i-a31-hummingbird.dts
+@@ -184,18 +184,18 @@
+ regulator-name = "vcc-3v0";
+ };
+
+- vdd_cpu: dcdc2 {
++ vdd_gpu: dcdc2 {
+ regulator-always-on;
+ regulator-min-microvolt = <700000>;
+ regulator-max-microvolt = <1320000>;
+- regulator-name = "vdd-cpu";
++ regulator-name = "vdd-gpu";
+ };
+
+- vdd_gpu: dcdc3 {
++ vdd_cpu: dcdc3 {
+ regulator-always-on;
+ regulator-min-microvolt = <700000>;
+ regulator-max-microvolt = <1320000>;
+- regulator-name = "vdd-gpu";
++ regulator-name = "vdd-cpu";
+ };
+
+ vdd_sys_dll: dcdc4 {
+diff --git a/arch/arm/common/edma.c b/arch/arm/common/edma.c
+index 873dbfcc7dc9..56fc339571f9 100644
+--- a/arch/arm/common/edma.c
++++ b/arch/arm/common/edma.c
+@@ -406,7 +406,8 @@ static irqreturn_t dma_irq_handler(int irq, void *data)
+ BIT(slot));
+ if (edma_cc[ctlr]->intr_data[channel].callback)
+ edma_cc[ctlr]->intr_data[channel].callback(
+- channel, EDMA_DMA_COMPLETE,
++ EDMA_CTLR_CHAN(ctlr, channel),
++ EDMA_DMA_COMPLETE,
+ edma_cc[ctlr]->intr_data[channel].data);
+ }
+ } while (sh_ipr);
+@@ -460,7 +461,8 @@ static irqreturn_t dma_ccerr_handler(int irq, void *data)
+ if (edma_cc[ctlr]->intr_data[k].
+ callback) {
+ edma_cc[ctlr]->intr_data[k].
+- callback(k,
++ callback(
++ EDMA_CTLR_CHAN(ctlr, k),
+ EDMA_DMA_CC_ERROR,
+ edma_cc[ctlr]->intr_data
+ [k].data);
+diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
+index 53c15dec7af6..6a9851ea6a60 100644
+--- a/arch/arm/include/asm/irq.h
++++ b/arch/arm/include/asm/irq.h
+@@ -35,6 +35,11 @@ extern void (*handle_arch_irq)(struct pt_regs *);
+ extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
+ #endif
+
++static inline int nr_legacy_irqs(void)
++{
++ return NR_IRQS_LEGACY;
++}
++
+ #endif
+
+ #endif
+diff --git a/arch/arm/mach-at91/pm_suspend.S b/arch/arm/mach-at91/pm_suspend.S
+index 0d95f488b47a..a25defda3d22 100644
+--- a/arch/arm/mach-at91/pm_suspend.S
++++ b/arch/arm/mach-at91/pm_suspend.S
+@@ -80,6 +80,8 @@ tmp2 .req r5
+ * @r2: base address of second SDRAM Controller or 0 if not present
+ * @r3: pm information
+ */
++/* at91_pm_suspend_in_sram must be 8-byte aligned per the requirements of fncpy() */
++ .align 3
+ ENTRY(at91_pm_suspend_in_sram)
+ /* Save registers on stack */
+ stmfd sp!, {r4 - r12, lr}
+diff --git a/arch/arm/mach-pxa/include/mach/pxa27x.h b/arch/arm/mach-pxa/include/mach/pxa27x.h
+index 599b925a657c..1a4291936c58 100644
+--- a/arch/arm/mach-pxa/include/mach/pxa27x.h
++++ b/arch/arm/mach-pxa/include/mach/pxa27x.h
+@@ -19,7 +19,7 @@
+ #define ARB_CORE_PARK (1<<24) /* Be parked with core when idle */
+ #define ARB_LOCK_FLAG (1<<23) /* Only Locking masters gain access to the bus */
+
+-extern int __init pxa27x_set_pwrmode(unsigned int mode);
++extern int pxa27x_set_pwrmode(unsigned int mode);
+ extern void pxa27x_cpu_pm_enter(suspend_state_t state);
+
+ #endif /* __MACH_PXA27x_H */
+diff --git a/arch/arm/mach-pxa/pxa27x.c b/arch/arm/mach-pxa/pxa27x.c
+index b5abdeb5bb2d..aa97547099fb 100644
+--- a/arch/arm/mach-pxa/pxa27x.c
++++ b/arch/arm/mach-pxa/pxa27x.c
+@@ -84,7 +84,7 @@ EXPORT_SYMBOL_GPL(pxa27x_configure_ac97reset);
+ */
+ static unsigned int pwrmode = PWRMODE_SLEEP;
+
+-int __init pxa27x_set_pwrmode(unsigned int mode)
++int pxa27x_set_pwrmode(unsigned int mode)
+ {
+ switch (mode) {
+ case PWRMODE_SLEEP:
+diff --git a/arch/arm/mach-tegra/board-paz00.c b/arch/arm/mach-tegra/board-paz00.c
+index fbe74c6806f3..49d1110cff53 100644
+--- a/arch/arm/mach-tegra/board-paz00.c
++++ b/arch/arm/mach-tegra/board-paz00.c
+@@ -39,8 +39,8 @@ static struct platform_device wifi_rfkill_device = {
+ static struct gpiod_lookup_table wifi_gpio_lookup = {
+ .dev_id = "rfkill_gpio",
+ .table = {
+- GPIO_LOOKUP_IDX("tegra-gpio", 25, NULL, 0, 0),
+- GPIO_LOOKUP_IDX("tegra-gpio", 85, NULL, 1, 0),
++ GPIO_LOOKUP("tegra-gpio", 25, "reset", 0),
++ GPIO_LOOKUP("tegra-gpio", 85, "shutdown", 0),
+ { },
+ },
+ };
+diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
+index cba12f34ff77..25ecc6afec4c 100644
+--- a/arch/arm/mm/dma-mapping.c
++++ b/arch/arm/mm/dma-mapping.c
+@@ -1413,12 +1413,19 @@ static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+ unsigned long uaddr = vma->vm_start;
+ unsigned long usize = vma->vm_end - vma->vm_start;
+ struct page **pages = __iommu_get_pages(cpu_addr, attrs);
++ unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
++ unsigned long off = vma->vm_pgoff;
+
+ vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
+
+ if (!pages)
+ return -ENXIO;
+
++ if (off >= nr_pages || (usize >> PAGE_SHIFT) > nr_pages - off)
++ return -ENXIO;
++
++ pages += off;
++
+ do {
+ int ret = vm_insert_page(vma, uaddr, *pages++);
+ if (ret) {
+diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
+index bbb251b14746..8b9bf54105b3 100644
+--- a/arch/arm64/include/asm/irq.h
++++ b/arch/arm64/include/asm/irq.h
+@@ -21,4 +21,9 @@ static inline void acpi_irq_init(void)
+ }
+ #define acpi_irq_init acpi_irq_init
+
++static inline int nr_legacy_irqs(void)
++{
++ return 0;
++}
++
+ #endif
+diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
+index d6dd9fdbc3be..d4264bb0a409 100644
+--- a/arch/arm64/include/asm/ptrace.h
++++ b/arch/arm64/include/asm/ptrace.h
+@@ -83,14 +83,14 @@
+ #define compat_sp regs[13]
+ #define compat_lr regs[14]
+ #define compat_sp_hyp regs[15]
+-#define compat_sp_irq regs[16]
+-#define compat_lr_irq regs[17]
+-#define compat_sp_svc regs[18]
+-#define compat_lr_svc regs[19]
+-#define compat_sp_abt regs[20]
+-#define compat_lr_abt regs[21]
+-#define compat_sp_und regs[22]
+-#define compat_lr_und regs[23]
++#define compat_lr_irq regs[16]
++#define compat_sp_irq regs[17]
++#define compat_lr_svc regs[18]
++#define compat_sp_svc regs[19]
++#define compat_lr_abt regs[20]
++#define compat_sp_abt regs[21]
++#define compat_lr_und regs[22]
++#define compat_sp_und regs[23]
+ #define compat_r8_fiq regs[24]
+ #define compat_r9_fiq regs[25]
+ #define compat_r10_fiq regs[26]
+diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
+index 98073332e2d0..4d77757b5894 100644
+--- a/arch/arm64/kernel/vmlinux.lds.S
++++ b/arch/arm64/kernel/vmlinux.lds.S
+@@ -60,9 +60,12 @@ PECOFF_FILE_ALIGNMENT = 0x200;
+ #define PECOFF_EDATA_PADDING
+ #endif
+
+-#ifdef CONFIG_DEBUG_ALIGN_RODATA
++#if defined(CONFIG_DEBUG_ALIGN_RODATA)
+ #define ALIGN_DEBUG_RO . = ALIGN(1<<SECTION_SHIFT);
+ #define ALIGN_DEBUG_RO_MIN(min) ALIGN_DEBUG_RO
++#elif defined(CONFIG_DEBUG_RODATA)
++#define ALIGN_DEBUG_RO . = ALIGN(1<<PAGE_SHIFT);
++#define ALIGN_DEBUG_RO_MIN(min) ALIGN_DEBUG_RO
+ #else
+ #define ALIGN_DEBUG_RO
+ #define ALIGN_DEBUG_RO_MIN(min) . = ALIGN(min);
+diff --git a/arch/mips/ath79/setup.c b/arch/mips/ath79/setup.c
+index 1ba21204ebe0..9a0013703579 100644
+--- a/arch/mips/ath79/setup.c
++++ b/arch/mips/ath79/setup.c
+@@ -216,9 +216,9 @@ void __init plat_mem_setup(void)
+ AR71XX_RESET_SIZE);
+ ath79_pll_base = ioremap_nocache(AR71XX_PLL_BASE,
+ AR71XX_PLL_SIZE);
++ ath79_detect_sys_type();
+ ath79_ddr_ctrl_init();
+
+- ath79_detect_sys_type();
+ if (mips_machtype != ATH79_MACH_GENERIC_OF)
+ detect_memory_region(0, ATH79_MEM_SIZE_MIN, ATH79_MEM_SIZE_MAX);
+
+diff --git a/arch/mips/include/asm/cdmm.h b/arch/mips/include/asm/cdmm.h
+index 16e22ce9719f..85dc4ce401ad 100644
+--- a/arch/mips/include/asm/cdmm.h
++++ b/arch/mips/include/asm/cdmm.h
+@@ -84,6 +84,17 @@ void mips_cdmm_driver_unregister(struct mips_cdmm_driver *);
+ module_driver(__mips_cdmm_driver, mips_cdmm_driver_register, \
+ mips_cdmm_driver_unregister)
+
++/*
++ * builtin_mips_cdmm_driver() - Helper macro for drivers that don't do anything
++ * special in init and have no exit. This eliminates some boilerplate. Each
++ * driver may only use this macro once, and calling it replaces device_initcall
++ * (or in some cases, the legacy __initcall). This is meant to be a direct
++ * parallel of module_mips_cdmm_driver() above but without the __exit stuff that
++ * is not used for builtin cases.
++ */
++#define builtin_mips_cdmm_driver(__mips_cdmm_driver) \
++ builtin_driver(__mips_cdmm_driver, mips_cdmm_driver_register)
++
+ /* drivers/tty/mips_ejtag_fdc.c */
+
+ #ifdef CONFIG_MIPS_EJTAG_FDC_EARLYCON
+diff --git a/arch/mips/kvm/emulate.c b/arch/mips/kvm/emulate.c
+index d5fa3eaf39a1..41b1b090f56f 100644
+--- a/arch/mips/kvm/emulate.c
++++ b/arch/mips/kvm/emulate.c
+@@ -1581,7 +1581,7 @@ enum emulation_result kvm_mips_emulate_cache(uint32_t inst, uint32_t *opc,
+
+ base = (inst >> 21) & 0x1f;
+ op_inst = (inst >> 16) & 0x1f;
+- offset = inst & 0xffff;
++ offset = (int16_t)inst;
+ cache = (inst >> 16) & 0x3;
+ op = (inst >> 18) & 0x7;
+
+diff --git a/arch/mips/kvm/locore.S b/arch/mips/kvm/locore.S
+index c567240386a0..d1ee95a7f7dd 100644
+--- a/arch/mips/kvm/locore.S
++++ b/arch/mips/kvm/locore.S
+@@ -165,9 +165,11 @@ FEXPORT(__kvm_mips_vcpu_run)
+
+ FEXPORT(__kvm_mips_load_asid)
+ /* Set the ASID for the Guest Kernel */
+- INT_SLL t0, t0, 1 /* with kseg0 @ 0x40000000, kernel */
+- /* addresses shift to 0x80000000 */
+- bltz t0, 1f /* If kernel */
++ PTR_L t0, VCPU_COP0(k1)
++ LONG_L t0, COP0_STATUS(t0)
++ andi t0, KSU_USER | ST0_ERL | ST0_EXL
++ xori t0, KSU_USER
++ bnez t0, 1f /* If kernel */
+ INT_ADDIU t1, k1, VCPU_GUEST_KERNEL_ASID /* (BD) */
+ INT_ADDIU t1, k1, VCPU_GUEST_USER_ASID /* else user */
+ 1:
+@@ -482,9 +484,11 @@ __kvm_mips_return_to_guest:
+ mtc0 t0, CP0_EPC
+
+ /* Set the ASID for the Guest Kernel */
+- INT_SLL t0, t0, 1 /* with kseg0 @ 0x40000000, kernel */
+- /* addresses shift to 0x80000000 */
+- bltz t0, 1f /* If kernel */
++ PTR_L t0, VCPU_COP0(k1)
++ LONG_L t0, COP0_STATUS(t0)
++ andi t0, KSU_USER | ST0_ERL | ST0_EXL
++ xori t0, KSU_USER
++ bnez t0, 1f /* If kernel */
+ INT_ADDIU t1, k1, VCPU_GUEST_KERNEL_ASID /* (BD) */
+ INT_ADDIU t1, k1, VCPU_GUEST_USER_ASID /* else user */
+ 1:
+diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
+index cd4c129ce743..bafb32b4c6b4 100644
+--- a/arch/mips/kvm/mips.c
++++ b/arch/mips/kvm/mips.c
+@@ -278,7 +278,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+
+ if (!gebase) {
+ err = -ENOMEM;
+- goto out_free_cpu;
++ goto out_uninit_cpu;
+ }
+ kvm_debug("Allocated %d bytes for KVM Exception Handlers @ %p\n",
+ ALIGN(size, PAGE_SIZE), gebase);
+@@ -342,6 +342,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
+ out_free_gebase:
+ kfree(gebase);
+
++out_uninit_cpu:
++ kvm_vcpu_uninit(vcpu);
++
+ out_free_cpu:
+ kfree(vcpu);
+
+diff --git a/arch/mips/lantiq/clk.c b/arch/mips/lantiq/clk.c
+index 3fc2e6d70c77..a0706fd4ce0a 100644
+--- a/arch/mips/lantiq/clk.c
++++ b/arch/mips/lantiq/clk.c
+@@ -99,6 +99,23 @@ int clk_set_rate(struct clk *clk, unsigned long rate)
+ }
+ EXPORT_SYMBOL(clk_set_rate);
+
++long clk_round_rate(struct clk *clk, unsigned long rate)
++{
++ if (unlikely(!clk_good(clk)))
++ return 0;
++ if (clk->rates && *clk->rates) {
++ unsigned long *r = clk->rates;
++
++ while (*r && (*r != rate))
++ r++;
++ if (!*r) {
++ return clk->rate;
++ }
++ }
++ return rate;
++}
++EXPORT_SYMBOL(clk_round_rate);
++
+ int clk_enable(struct clk *clk)
+ {
+ if (unlikely(!clk_good(clk)))
+diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
+index c98d89708e99..cbee788d9625 100644
+--- a/arch/s390/kvm/interrupt.c
++++ b/arch/s390/kvm/interrupt.c
+@@ -1051,8 +1051,7 @@ static int __inject_extcall(struct kvm_vcpu *vcpu, struct kvm_s390_irq *irq)
+ src_id, 0, 2);
+
+ /* sending vcpu invalid */
+- if (src_id >= KVM_MAX_VCPUS ||
+- kvm_get_vcpu(vcpu->kvm, src_id) == NULL)
++ if (kvm_get_vcpu_by_id(vcpu->kvm, src_id) == NULL)
+ return -EINVAL;
+
+ if (sclp.has_sigpif)
+@@ -1131,6 +1130,10 @@ static int __inject_sigp_emergency(struct kvm_vcpu *vcpu,
+ trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_INT_EMERGENCY,
+ irq->u.emerg.code, 0, 2);
+
++ /* sending vcpu invalid */
++ if (kvm_get_vcpu_by_id(vcpu->kvm, irq->u.emerg.code) == NULL)
++ return -EINVAL;
++
+ set_bit(irq->u.emerg.code, li->sigp_emerg_pending);
+ set_bit(IRQ_PEND_EXT_EMERGENCY, &li->pending_irqs);
+ atomic_set_mask(CPUSTAT_EXT_INT, li->cpuflags);
+diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
+index f32f843a3631..4a001c1b5a1a 100644
+--- a/arch/s390/kvm/kvm-s390.c
++++ b/arch/s390/kvm/kvm-s390.c
+@@ -289,12 +289,16 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
+ r = 0;
+ break;
+ case KVM_CAP_S390_VECTOR_REGISTERS:
+- if (MACHINE_HAS_VX) {
++ mutex_lock(&kvm->lock);
++ if (atomic_read(&kvm->online_vcpus)) {
++ r = -EBUSY;
++ } else if (MACHINE_HAS_VX) {
+ set_kvm_facility(kvm->arch.model.fac->mask, 129);
+ set_kvm_facility(kvm->arch.model.fac->list, 129);
+ r = 0;
+ } else
+ r = -EINVAL;
++ mutex_unlock(&kvm->lock);
+ break;
+ case KVM_CAP_S390_USER_STSI:
+ kvm->arch.user_stsi = 1;
+@@ -1037,7 +1041,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+ if (!kvm->arch.sca)
+ goto out_err;
+ spin_lock(&kvm_lock);
+- sca_offset = (sca_offset + 16) & 0x7f0;
++ sca_offset += 16;
++ if (sca_offset + sizeof(struct sca_block) > PAGE_SIZE)
++ sca_offset = 0;
+ kvm->arch.sca = (struct sca_block *) ((char *) kvm->arch.sca + sca_offset);
+ spin_unlock(&kvm_lock);
+
+diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c
+index 72e58bd2bee7..7171056fc24d 100644
+--- a/arch/s390/kvm/sigp.c
++++ b/arch/s390/kvm/sigp.c
+@@ -294,12 +294,8 @@ static int handle_sigp_dst(struct kvm_vcpu *vcpu, u8 order_code,
+ u16 cpu_addr, u32 parameter, u64 *status_reg)
+ {
+ int rc;
+- struct kvm_vcpu *dst_vcpu;
++ struct kvm_vcpu *dst_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, cpu_addr);
+
+- if (cpu_addr >= KVM_MAX_VCPUS)
+- return SIGP_CC_NOT_OPERATIONAL;
+-
+- dst_vcpu = kvm_get_vcpu(vcpu->kvm, cpu_addr);
+ if (!dst_vcpu)
+ return SIGP_CC_NOT_OPERATIONAL;
+
+@@ -481,7 +477,7 @@ int kvm_s390_handle_sigp_pei(struct kvm_vcpu *vcpu)
+ trace_kvm_s390_handle_sigp_pei(vcpu, order_code, cpu_addr);
+
+ if (order_code == SIGP_EXTERNAL_CALL) {
+- dest_vcpu = kvm_get_vcpu(vcpu->kvm, cpu_addr);
++ dest_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, cpu_addr);
+ BUG_ON(dest_vcpu == NULL);
+
+ kvm_s390_vcpu_wakeup(dest_vcpu);
+diff --git a/arch/tile/kernel/usb.c b/arch/tile/kernel/usb.c
+index f0da5a237e94..9f1e05e12255 100644
+--- a/arch/tile/kernel/usb.c
++++ b/arch/tile/kernel/usb.c
+@@ -22,6 +22,7 @@
+ #include <linux/platform_device.h>
+ #include <linux/usb/tilegx.h>
+ #include <linux/init.h>
++#include <linux/module.h>
+ #include <linux/types.h>
+
+ static u64 ehci_dmamask = DMA_BIT_MASK(32);
+diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h
+index ccffa53750a8..39bcefc20de7 100644
+--- a/arch/x86/include/asm/i8259.h
++++ b/arch/x86/include/asm/i8259.h
+@@ -60,6 +60,7 @@ struct legacy_pic {
+ void (*mask_all)(void);
+ void (*restore_mask)(void);
+ void (*init)(int auto_eoi);
++ int (*probe)(void);
+ int (*irq_pending)(unsigned int irq);
+ void (*make_irq)(unsigned int irq);
+ };
+diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
+index e16466ec473c..e9cd7befcb76 100644
+--- a/arch/x86/include/asm/kvm_emulate.h
++++ b/arch/x86/include/asm/kvm_emulate.h
+@@ -112,6 +112,16 @@ struct x86_emulate_ops {
+ struct x86_exception *fault);
+
+ /*
++ * read_phys: Read bytes of standard (non-emulated/special) memory.
++ * Used for descriptor reading.
++ * @addr: [IN ] Physical address from which to read.
++ * @val: [OUT] Value read from memory.
++ * @bytes: [IN ] Number of bytes to read from memory.
++ */
++ int (*read_phys)(struct x86_emulate_ctxt *ctxt, unsigned long addr,
++ void *val, unsigned int bytes);
++
++ /*
+ * write_std: Write bytes of standard (non-emulated/special) memory.
+ * Used for descriptor writing.
+ * @addr: [IN ] Linear address to which to write.
+diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
+index b5d7640abc5d..8a4add8e4639 100644
+--- a/arch/x86/include/uapi/asm/svm.h
++++ b/arch/x86/include/uapi/asm/svm.h
+@@ -100,6 +100,7 @@
+ { SVM_EXIT_EXCP_BASE + UD_VECTOR, "UD excp" }, \
+ { SVM_EXIT_EXCP_BASE + PF_VECTOR, "PF excp" }, \
+ { SVM_EXIT_EXCP_BASE + NM_VECTOR, "NM excp" }, \
++ { SVM_EXIT_EXCP_BASE + AC_VECTOR, "AC excp" }, \
+ { SVM_EXIT_EXCP_BASE + MC_VECTOR, "MC excp" }, \
+ { SVM_EXIT_INTR, "interrupt" }, \
+ { SVM_EXIT_NMI, "nmi" }, \
+diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
+index 2683f36e4e0a..ea4ba83ca0cf 100644
+--- a/arch/x86/kernel/apic/vector.c
++++ b/arch/x86/kernel/apic/vector.c
+@@ -360,7 +360,11 @@ int __init arch_probe_nr_irqs(void)
+ if (nr < nr_irqs)
+ nr_irqs = nr;
+
+- return nr_legacy_irqs();
++ /*
++ * We don't know if PIC is present at this point so we need to do
++ * probe() to get the right number of legacy IRQs.
++ */
++ return legacy_pic->probe();
+ }
+
+ #ifdef CONFIG_X86_IO_APIC
+diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
+index cb9e5df42dd2..e4f929d97c42 100644
+--- a/arch/x86/kernel/cpu/common.c
++++ b/arch/x86/kernel/cpu/common.c
+@@ -272,10 +272,9 @@ __setup("nosmap", setup_disable_smap);
+
+ static __always_inline void setup_smap(struct cpuinfo_x86 *c)
+ {
+- unsigned long eflags;
++ unsigned long eflags = native_save_fl();
+
+ /* This should have been cleared long ago */
+- raw_local_save_flags(eflags);
+ BUG_ON(eflags & X86_EFLAGS_AC);
+
+ if (cpu_has(c, X86_FEATURE_SMAP)) {
+diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
+index 50ec9af1bd51..6545e6ddbfb1 100644
+--- a/arch/x86/kernel/fpu/signal.c
++++ b/arch/x86/kernel/fpu/signal.c
+@@ -385,20 +385,19 @@ fpu__alloc_mathframe(unsigned long sp, int ia32_frame,
+ */
+ void fpu__init_prepare_fx_sw_frame(void)
+ {
+- int fsave_header_size = sizeof(struct fregs_state);
+ int size = xstate_size + FP_XSTATE_MAGIC2_SIZE;
+
+- if (config_enabled(CONFIG_X86_32))
+- size += fsave_header_size;
+-
+ fx_sw_reserved.magic1 = FP_XSTATE_MAGIC1;
+ fx_sw_reserved.extended_size = size;
+ fx_sw_reserved.xfeatures = xfeatures_mask;
+ fx_sw_reserved.xstate_size = xstate_size;
+
+- if (config_enabled(CONFIG_IA32_EMULATION)) {
++ if (config_enabled(CONFIG_IA32_EMULATION) ||
++ config_enabled(CONFIG_X86_32)) {
++ int fsave_header_size = sizeof(struct fregs_state);
++
+ fx_sw_reserved_ia32 = fx_sw_reserved;
+- fx_sw_reserved_ia32.extended_size += fsave_header_size;
++ fx_sw_reserved_ia32.extended_size = size + fsave_header_size;
+ }
+ }
+
+diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
+index 62fc001c7846..2c4ac072a702 100644
+--- a/arch/x86/kernel/fpu/xstate.c
++++ b/arch/x86/kernel/fpu/xstate.c
+@@ -402,7 +402,6 @@ void *get_xsave_addr(struct xregs_state *xsave, int xstate_feature)
+ if (!boot_cpu_has(X86_FEATURE_XSAVE))
+ return NULL;
+
+- xsave = &current->thread.fpu.state.xsave;
+ /*
+ * We should not ever be requesting features that we
+ * have not enabled. Remember that pcntxt_mask is
+diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
+index 1d40ca8a73f2..ffdc0e860390 100644
+--- a/arch/x86/kernel/head_64.S
++++ b/arch/x86/kernel/head_64.S
+@@ -65,6 +65,9 @@ startup_64:
+ * tables and then reload them.
+ */
+
++ /* Sanitize CPU configuration */
++ call verify_cpu
++
+ /*
+ * Compute the delta between the address I am compiled to run at and the
+ * address I am actually running at.
+@@ -174,6 +177,9 @@ ENTRY(secondary_startup_64)
+ * after the boot processor executes this code.
+ */
+
++ /* Sanitize CPU configuration */
++ call verify_cpu
++
+ movq $(init_level4_pgt - __START_KERNEL_map), %rax
+ 1:
+
+@@ -288,6 +294,8 @@ ENTRY(secondary_startup_64)
+ pushq %rax # target address in negative space
+ lretq
+
++#include "verify_cpu.S"
++
+ #ifdef CONFIG_HOTPLUG_CPU
+ /*
+ * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
+diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
+index 16cb827a5b27..be22f5a2192e 100644
+--- a/arch/x86/kernel/i8259.c
++++ b/arch/x86/kernel/i8259.c
+@@ -295,16 +295,11 @@ static void unmask_8259A(void)
+ raw_spin_unlock_irqrestore(&i8259A_lock, flags);
+ }
+
+-static void init_8259A(int auto_eoi)
++static int probe_8259A(void)
+ {
+ unsigned long flags;
+ unsigned char probe_val = ~(1 << PIC_CASCADE_IR);
+ unsigned char new_val;
+-
+- i8259A_auto_eoi = auto_eoi;
+-
+- raw_spin_lock_irqsave(&i8259A_lock, flags);
+-
+ /*
+ * Check to see if we have a PIC.
+ * Mask all except the cascade and read
+@@ -312,16 +307,28 @@ static void init_8259A(int auto_eoi)
+ * have a PIC, we will read 0xff as opposed to the
+ * value we wrote.
+ */
++ raw_spin_lock_irqsave(&i8259A_lock, flags);
++
+ outb(0xff, PIC_SLAVE_IMR); /* mask all of 8259A-2 */
+ outb(probe_val, PIC_MASTER_IMR);
+ new_val = inb(PIC_MASTER_IMR);
+ if (new_val != probe_val) {
+ printk(KERN_INFO "Using NULL legacy PIC\n");
+ legacy_pic = &null_legacy_pic;
+- raw_spin_unlock_irqrestore(&i8259A_lock, flags);
+- return;
+ }
+
++ raw_spin_unlock_irqrestore(&i8259A_lock, flags);
++ return nr_legacy_irqs();
++}
++
++static void init_8259A(int auto_eoi)
++{
++ unsigned long flags;
++
++ i8259A_auto_eoi = auto_eoi;
++
++ raw_spin_lock_irqsave(&i8259A_lock, flags);
++
+ outb(0xff, PIC_MASTER_IMR); /* mask all of 8259A-1 */
+
+ /*
+@@ -379,6 +386,10 @@ static int legacy_pic_irq_pending_noop(unsigned int irq)
+ {
+ return 0;
+ }
++static int legacy_pic_probe(void)
++{
++ return 0;
++}
+
+ struct legacy_pic null_legacy_pic = {
+ .nr_legacy_irqs = 0,
+@@ -388,6 +399,7 @@ struct legacy_pic null_legacy_pic = {
+ .mask_all = legacy_pic_noop,
+ .restore_mask = legacy_pic_noop,
+ .init = legacy_pic_int_noop,
++ .probe = legacy_pic_probe,
+ .irq_pending = legacy_pic_irq_pending_noop,
+ .make_irq = legacy_pic_uint_noop,
+ };
+@@ -400,6 +412,7 @@ struct legacy_pic default_legacy_pic = {
+ .mask_all = mask_8259A,
+ .restore_mask = unmask_8259A,
+ .init = init_8259A,
++ .probe = probe_8259A,
+ .irq_pending = i8259A_irq_pending,
+ .make_irq = make_8259A_irq,
+ };
+diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
+index 80f874bf999e..1e6f70f1f251 100644
+--- a/arch/x86/kernel/setup.c
++++ b/arch/x86/kernel/setup.c
+@@ -1198,6 +1198,14 @@ void __init setup_arch(char **cmdline_p)
+ clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
+ swapper_pg_dir + KERNEL_PGD_BOUNDARY,
+ KERNEL_PGD_PTRS);
++
++ /*
++ * sync back low identity map too. It is used for example
++ * in the 32-bit EFI stub.
++ */
++ clone_pgd_range(initial_page_table,
++ swapper_pg_dir + KERNEL_PGD_BOUNDARY,
++ min(KERNEL_PGD_PTRS, KERNEL_PGD_BOUNDARY));
+ #endif
+
+ tboot_probe();
+diff --git a/arch/x86/kernel/verify_cpu.S b/arch/x86/kernel/verify_cpu.S
+index b9242bacbe59..4cf401f581e7 100644
+--- a/arch/x86/kernel/verify_cpu.S
++++ b/arch/x86/kernel/verify_cpu.S
+@@ -34,10 +34,11 @@
+ #include <asm/msr-index.h>
+
+ verify_cpu:
+- pushfl # Save caller passed flags
+- pushl $0 # Kill any dangerous flags
+- popfl
++ pushf # Save caller passed flags
++ push $0 # Kill any dangerous flags
++ popf
+
++#ifndef __x86_64__
+ pushfl # standard way to check for cpuid
+ popl %eax
+ movl %eax,%ebx
+@@ -48,6 +49,7 @@ verify_cpu:
+ popl %eax
+ cmpl %eax,%ebx
+ jz verify_cpu_no_longmode # cpu has no cpuid
++#endif
+
+ movl $0x0,%eax # See if cpuid 1 is implemented
+ cpuid
+@@ -130,10 +132,10 @@ verify_cpu_sse_test:
+ jmp verify_cpu_sse_test # try again
+
+ verify_cpu_no_longmode:
+- popfl # Restore caller passed flags
++ popf # Restore caller passed flags
+ movl $1,%eax
+ ret
+ verify_cpu_sse_ok:
+- popfl # Restore caller passed flags
++ popf # Restore caller passed flags
+ xorl %eax, %eax
+ ret
+diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
+index 2392541a96e6..f17c342355f6 100644
+--- a/arch/x86/kvm/emulate.c
++++ b/arch/x86/kvm/emulate.c
+@@ -2272,8 +2272,8 @@ static int emulator_has_longmode(struct x86_emulate_ctxt *ctxt)
+ #define GET_SMSTATE(type, smbase, offset) \
+ ({ \
+ type __val; \
+- int r = ctxt->ops->read_std(ctxt, smbase + offset, &__val, \
+- sizeof(__val), NULL); \
++ int r = ctxt->ops->read_phys(ctxt, smbase + offset, &__val, \
++ sizeof(__val)); \
+ if (r != X86EMUL_CONTINUE) \
+ return X86EMUL_UNHANDLEABLE; \
+ __val; \
+@@ -2484,17 +2484,36 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
+
+ /*
+ * Get back to real mode, to prepare a safe state in which to load
+- * CR0/CR3/CR4/EFER. Also this will ensure that addresses passed
+- * to read_std/write_std are not virtual.
+- *
+- * CR4.PCIDE must be zero, because it is a 64-bit mode only feature.
++ * CR0/CR3/CR4/EFER. It's all a bit more complicated if the vCPU
++ * supports long mode.
+ */
++ cr4 = ctxt->ops->get_cr(ctxt, 4);
++ if (emulator_has_longmode(ctxt)) {
++ struct desc_struct cs_desc;
++
++ /* Zero CR4.PCIDE before CR0.PG. */
++ if (cr4 & X86_CR4_PCIDE) {
++ ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PCIDE);
++ cr4 &= ~X86_CR4_PCIDE;
++ }
++
++ /* A 32-bit code segment is required to clear EFER.LMA. */
++ memset(&cs_desc, 0, sizeof(cs_desc));
++ cs_desc.type = 0xb;
++ cs_desc.s = cs_desc.g = cs_desc.p = 1;
++ ctxt->ops->set_segment(ctxt, 0, &cs_desc, 0, VCPU_SREG_CS);
++ }
++
++ /* For the 64-bit case, this will clear EFER.LMA. */
+ cr0 = ctxt->ops->get_cr(ctxt, 0);
+ if (cr0 & X86_CR0_PE)
+ ctxt->ops->set_cr(ctxt, 0, cr0 & ~(X86_CR0_PG | X86_CR0_PE));
+- cr4 = ctxt->ops->get_cr(ctxt, 4);
++
++ /* Now clear CR4.PAE (which must be done before clearing EFER.LME). */
+ if (cr4 & X86_CR4_PAE)
+ ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PAE);
++
++ /* And finally go back to 32-bit mode. */
+ efer = 0;
+ ctxt->ops->set_msr(ctxt, MSR_EFER, efer);
+
+@@ -4455,7 +4474,7 @@ static const struct opcode twobyte_table[256] = {
+ F(DstMem | SrcReg | Src2CL | ModRM, em_shld), N, N,
+ /* 0xA8 - 0xAF */
+ I(Stack | Src2GS, em_push_sreg), I(Stack | Src2GS, em_pop_sreg),
+- II(No64 | EmulateOnUD | ImplicitOps, em_rsm, rsm),
++ II(EmulateOnUD | ImplicitOps, em_rsm, rsm),
+ F(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_bts),
+ F(DstMem | SrcReg | Src2ImmByte | ModRM, em_shrd),
+ F(DstMem | SrcReg | Src2CL | ModRM, em_shrd),
+diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
+index 2a5ca97c263b..236e346584c3 100644
+--- a/arch/x86/kvm/lapic.c
++++ b/arch/x86/kvm/lapic.c
+@@ -348,6 +348,8 @@ void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir)
+ struct kvm_lapic *apic = vcpu->arch.apic;
+
+ __kvm_apic_update_irr(pir, apic->regs);
++
++ kvm_make_request(KVM_REQ_EVENT, vcpu);
+ }
+ EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
+
+diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
+index 2d32b67a1043..00da6e85a27f 100644
+--- a/arch/x86/kvm/svm.c
++++ b/arch/x86/kvm/svm.c
+@@ -1085,7 +1085,7 @@ static u64 svm_compute_tsc_offset(struct kvm_vcpu *vcpu, u64 target_tsc)
+ return target_tsc - tsc;
+ }
+
+-static void init_vmcb(struct vcpu_svm *svm, bool init_event)
++static void init_vmcb(struct vcpu_svm *svm)
+ {
+ struct vmcb_control_area *control = &svm->vmcb->control;
+ struct vmcb_save_area *save = &svm->vmcb->save;
+@@ -1106,6 +1106,7 @@ static void init_vmcb(struct vcpu_svm *svm, bool init_event)
+ set_exception_intercept(svm, PF_VECTOR);
+ set_exception_intercept(svm, UD_VECTOR);
+ set_exception_intercept(svm, MC_VECTOR);
++ set_exception_intercept(svm, AC_VECTOR);
+
+ set_intercept(svm, INTERCEPT_INTR);
+ set_intercept(svm, INTERCEPT_NMI);
+@@ -1156,8 +1157,7 @@ static void init_vmcb(struct vcpu_svm *svm, bool init_event)
+ init_sys_seg(&save->ldtr, SEG_TYPE_LDT);
+ init_sys_seg(&save->tr, SEG_TYPE_BUSY_TSS16);
+
+- if (!init_event)
+- svm_set_efer(&svm->vcpu, 0);
++ svm_set_efer(&svm->vcpu, 0);
+ save->dr6 = 0xffff0ff0;
+ kvm_set_rflags(&svm->vcpu, 2);
+ save->rip = 0x0000fff0;
+@@ -1211,7 +1211,7 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+ if (kvm_vcpu_is_reset_bsp(&svm->vcpu))
+ svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
+ }
+- init_vmcb(svm, init_event);
++ init_vmcb(svm);
+
+ kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy);
+ kvm_register_write(vcpu, VCPU_REGS_RDX, eax);
+@@ -1267,7 +1267,7 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
+ clear_page(svm->vmcb);
+ svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
+ svm->asid_generation = 0;
+- init_vmcb(svm, false);
++ init_vmcb(svm);
+
+ svm_init_osvw(&svm->vcpu);
+
+@@ -1795,6 +1795,12 @@ static int ud_interception(struct vcpu_svm *svm)
+ return 1;
+ }
+
++static int ac_interception(struct vcpu_svm *svm)
++{
++ kvm_queue_exception_e(&svm->vcpu, AC_VECTOR, 0);
++ return 1;
++}
++
+ static void svm_fpu_activate(struct kvm_vcpu *vcpu)
+ {
+ struct vcpu_svm *svm = to_svm(vcpu);
+@@ -1889,7 +1895,7 @@ static int shutdown_interception(struct vcpu_svm *svm)
+ * so reinitialize it.
+ */
+ clear_page(svm->vmcb);
+- init_vmcb(svm, false);
++ init_vmcb(svm);
+
+ kvm_run->exit_reason = KVM_EXIT_SHUTDOWN;
+ return 0;
+@@ -3369,6 +3375,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
+ [SVM_EXIT_EXCP_BASE + PF_VECTOR] = pf_interception,
+ [SVM_EXIT_EXCP_BASE + NM_VECTOR] = nm_interception,
+ [SVM_EXIT_EXCP_BASE + MC_VECTOR] = mc_interception,
++ [SVM_EXIT_EXCP_BASE + AC_VECTOR] = ac_interception,
+ [SVM_EXIT_INTR] = intr_interception,
+ [SVM_EXIT_NMI] = nmi_interception,
+ [SVM_EXIT_SMI] = nop_on_interception,
+diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
+index aa9e8229571d..e77d75b8772a 100644
+--- a/arch/x86/kvm/vmx.c
++++ b/arch/x86/kvm/vmx.c
+@@ -1567,7 +1567,7 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu)
+ u32 eb;
+
+ eb = (1u << PF_VECTOR) | (1u << UD_VECTOR) | (1u << MC_VECTOR) |
+- (1u << NM_VECTOR) | (1u << DB_VECTOR);
++ (1u << NM_VECTOR) | (1u << DB_VECTOR) | (1u << AC_VECTOR);
+ if ((vcpu->guest_debug &
+ (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP)) ==
+ (KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP))
+@@ -4780,8 +4780,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
+ vmx_set_cr0(vcpu, cr0); /* enter rmode */
+ vmx->vcpu.arch.cr0 = cr0;
+ vmx_set_cr4(vcpu, 0);
+- if (!init_event)
+- vmx_set_efer(vcpu, 0);
++ vmx_set_efer(vcpu, 0);
+ vmx_fpu_activate(vcpu);
+ update_exception_bitmap(vcpu);
+
+@@ -5118,6 +5117,9 @@ static int handle_exception(struct kvm_vcpu *vcpu)
+ return handle_rmode_exception(vcpu, ex_no, error_code);
+
+ switch (ex_no) {
++ case AC_VECTOR:
++ kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
++ return 1;
+ case DB_VECTOR:
+ dr6 = vmcs_readl(EXIT_QUALIFICATION);
+ if (!(vcpu->guest_debug &
+diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
+index 373328b71599..2781e2b0201d 100644
+--- a/arch/x86/kvm/x86.c
++++ b/arch/x86/kvm/x86.c
+@@ -621,7 +621,9 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
+ if ((cr0 ^ old_cr0) & update_bits)
+ kvm_mmu_reset_context(vcpu);
+
+- if ((cr0 ^ old_cr0) & X86_CR0_CD)
++ if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
++ kvm_arch_has_noncoherent_dma(vcpu->kvm) &&
++ !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
+ kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
+
+ return 0;
+@@ -4260,6 +4262,15 @@ static int kvm_read_guest_virt_system(struct x86_emulate_ctxt *ctxt,
+ return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, exception);
+ }
+
++static int kvm_read_guest_phys_system(struct x86_emulate_ctxt *ctxt,
++ unsigned long addr, void *val, unsigned int bytes)
++{
++ struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
++ int r = kvm_vcpu_read_guest(vcpu, addr, val, bytes);
++
++ return r < 0 ? X86EMUL_IO_NEEDED : X86EMUL_CONTINUE;
++}
++
+ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
+ gva_t addr, void *val,
+ unsigned int bytes,
+@@ -4995,6 +5006,7 @@ static const struct x86_emulate_ops emulate_ops = {
+ .write_gpr = emulator_write_gpr,
+ .read_std = kvm_read_guest_virt_system,
+ .write_std = kvm_write_guest_virt_system,
++ .read_phys = kvm_read_guest_phys_system,
+ .fetch = kvm_fetch_guest_virt,
+ .read_emulated = emulator_read_emulated,
+ .write_emulated = emulator_write_emulated,
+diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
+index db1b0bc5017c..c28f6185f8a4 100644
+--- a/arch/x86/mm/mpx.c
++++ b/arch/x86/mm/mpx.c
+@@ -622,6 +622,29 @@ static unsigned long mpx_bd_entry_to_bt_addr(struct mm_struct *mm,
+ }
+
+ /*
++ * We only want to do a 4-byte get_user() on 32-bit. Otherwise,
++ * we might run off the end of the bounds table if we are on
++ * a 64-bit kernel and try to get 8 bytes.
++ */
++int get_user_bd_entry(struct mm_struct *mm, unsigned long *bd_entry_ret,
++ long __user *bd_entry_ptr)
++{
++ u32 bd_entry_32;
++ int ret;
++
++ if (is_64bit_mm(mm))
++ return get_user(*bd_entry_ret, bd_entry_ptr);
++
++ /*
++ * Note that get_user() uses the type of the *pointer* to
++ * establish the size of the get, not the destination.
++ */
++ ret = get_user(bd_entry_32, (u32 __user *)bd_entry_ptr);
++ *bd_entry_ret = bd_entry_32;
++ return ret;
++}
++
++/*
+ * Get the base of bounds tables pointed by specific bounds
+ * directory entry.
+ */
+@@ -641,7 +664,7 @@ static int get_bt_addr(struct mm_struct *mm,
+ int need_write = 0;
+
+ pagefault_disable();
+- ret = get_user(bd_entry, bd_entry_ptr);
++ ret = get_user_bd_entry(mm, &bd_entry, bd_entry_ptr);
+ pagefault_enable();
+ if (!ret)
+ break;
+@@ -736,11 +759,23 @@ static unsigned long mpx_get_bt_entry_offset_bytes(struct mm_struct *mm,
+ */
+ static inline unsigned long bd_entry_virt_space(struct mm_struct *mm)
+ {
+- unsigned long long virt_space = (1ULL << boot_cpu_data.x86_virt_bits);
+- if (is_64bit_mm(mm))
+- return virt_space / MPX_BD_NR_ENTRIES_64;
+- else
+- return virt_space / MPX_BD_NR_ENTRIES_32;
++ unsigned long long virt_space;
++ unsigned long long GB = (1ULL << 30);
++
++ /*
++ * This covers 32-bit emulation as well as 32-bit kernels
++ * running on 64-bit hardware.
++ */
++ if (!is_64bit_mm(mm))
++ return (4ULL * GB) / MPX_BD_NR_ENTRIES_32;
++
++ /*
++ * 'x86_virt_bits' returns what the hardware is capable
++ * of, and returns the full >32-bit address space when
++ * running 32-bit kernels on 64-bit hardware.
++ */
++ virt_space = (1ULL << boot_cpu_data.x86_virt_bits);
++ return virt_space / MPX_BD_NR_ENTRIES_64;
+ }
+
+ /*
+diff --git a/drivers/bluetooth/ath3k.c b/drivers/bluetooth/ath3k.c
+index e527a3e13939..fa893c3ec408 100644
+--- a/drivers/bluetooth/ath3k.c
++++ b/drivers/bluetooth/ath3k.c
+@@ -93,6 +93,7 @@ static const struct usb_device_id ath3k_table[] = {
+ { USB_DEVICE(0x04CA, 0x300f) },
+ { USB_DEVICE(0x04CA, 0x3010) },
+ { USB_DEVICE(0x0930, 0x0219) },
++ { USB_DEVICE(0x0930, 0x021c) },
+ { USB_DEVICE(0x0930, 0x0220) },
+ { USB_DEVICE(0x0930, 0x0227) },
+ { USB_DEVICE(0x0b05, 0x17d0) },
+@@ -104,6 +105,7 @@ static const struct usb_device_id ath3k_table[] = {
+ { USB_DEVICE(0x0CF3, 0x311F) },
+ { USB_DEVICE(0x0cf3, 0x3121) },
+ { USB_DEVICE(0x0CF3, 0x817a) },
++ { USB_DEVICE(0x0CF3, 0x817b) },
+ { USB_DEVICE(0x0cf3, 0xe003) },
+ { USB_DEVICE(0x0CF3, 0xE004) },
+ { USB_DEVICE(0x0CF3, 0xE005) },
+@@ -153,6 +155,7 @@ static const struct usb_device_id ath3k_blist_tbl[] = {
+ { USB_DEVICE(0x04ca, 0x300f), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x04ca, 0x3010), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0930, 0x0219), .driver_info = BTUSB_ATH3012 },
++ { USB_DEVICE(0x0930, 0x021c), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0930, 0x0220), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0930, 0x0227), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0b05, 0x17d0), .driver_info = BTUSB_ATH3012 },
+@@ -164,6 +167,7 @@ static const struct usb_device_id ath3k_blist_tbl[] = {
+ { USB_DEVICE(0x0cf3, 0x311F), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0cf3, 0x3121), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0CF3, 0x817a), .driver_info = BTUSB_ATH3012 },
++ { USB_DEVICE(0x0CF3, 0x817b), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0cf3, 0xe004), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0cf3, 0xe005), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0cf3, 0xe006), .driver_info = BTUSB_ATH3012 },
+diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c
+index b4cf8d9c9dac..7d9b09f4158c 100644
+--- a/drivers/bluetooth/btusb.c
++++ b/drivers/bluetooth/btusb.c
+@@ -192,6 +192,7 @@ static const struct usb_device_id blacklist_table[] = {
+ { USB_DEVICE(0x04ca, 0x300f), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x04ca, 0x3010), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0930, 0x0219), .driver_info = BTUSB_ATH3012 },
++ { USB_DEVICE(0x0930, 0x021c), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0930, 0x0220), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0930, 0x0227), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0b05, 0x17d0), .driver_info = BTUSB_ATH3012 },
+@@ -203,6 +204,7 @@ static const struct usb_device_id blacklist_table[] = {
+ { USB_DEVICE(0x0cf3, 0x311f), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0cf3, 0x3121), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0cf3, 0x817a), .driver_info = BTUSB_ATH3012 },
++ { USB_DEVICE(0x0cf3, 0x817b), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0cf3, 0xe003), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0cf3, 0xe004), .driver_info = BTUSB_ATH3012 },
+ { USB_DEVICE(0x0cf3, 0xe005), .driver_info = BTUSB_ATH3012 },
+diff --git a/drivers/clk/bcm/clk-iproc-pll.c b/drivers/clk/bcm/clk-iproc-pll.c
+index 2dda4e8295a9..d679ab869653 100644
+--- a/drivers/clk/bcm/clk-iproc-pll.c
++++ b/drivers/clk/bcm/clk-iproc-pll.c
+@@ -345,8 +345,8 @@ static unsigned long iproc_pll_recalc_rate(struct clk_hw *hw,
+ struct iproc_pll *pll = clk->pll;
+ const struct iproc_pll_ctrl *ctrl = pll->ctrl;
+ u32 val;
+- u64 ndiv;
+- unsigned int ndiv_int, ndiv_frac, pdiv;
++ u64 ndiv, ndiv_int, ndiv_frac;
++ unsigned int pdiv;
+
+ if (parent_rate == 0)
+ return 0;
+@@ -366,22 +366,19 @@ static unsigned long iproc_pll_recalc_rate(struct clk_hw *hw,
+ val = readl(pll->pll_base + ctrl->ndiv_int.offset);
+ ndiv_int = (val >> ctrl->ndiv_int.shift) &
+ bit_mask(ctrl->ndiv_int.width);
+- ndiv = (u64)ndiv_int << ctrl->ndiv_int.shift;
++ ndiv = ndiv_int << 20;
+
+ if (ctrl->flags & IPROC_CLK_PLL_HAS_NDIV_FRAC) {
+ val = readl(pll->pll_base + ctrl->ndiv_frac.offset);
+ ndiv_frac = (val >> ctrl->ndiv_frac.shift) &
+ bit_mask(ctrl->ndiv_frac.width);
+-
+- if (ndiv_frac != 0)
+- ndiv = ((u64)ndiv_int << ctrl->ndiv_int.shift) |
+- ndiv_frac;
++ ndiv += ndiv_frac;
+ }
+
+ val = readl(pll->pll_base + ctrl->pdiv.offset);
+ pdiv = (val >> ctrl->pdiv.shift) & bit_mask(ctrl->pdiv.width);
+
+- clk->rate = (ndiv * parent_rate) >> ctrl->ndiv_int.shift;
++ clk->rate = (ndiv * parent_rate) >> 20;
+
+ if (pdiv == 0)
+ clk->rate *= 2;
+diff --git a/drivers/clk/versatile/clk-icst.c b/drivers/clk/versatile/clk-icst.c
+index bc96f103bd7c..9064636a867f 100644
+--- a/drivers/clk/versatile/clk-icst.c
++++ b/drivers/clk/versatile/clk-icst.c
+@@ -156,8 +156,10 @@ struct clk *icst_clk_register(struct device *dev,
+ icst->lockreg = base + desc->lock_offset;
+
+ clk = clk_register(dev, &icst->hw);
+- if (IS_ERR(clk))
++ if (IS_ERR(clk)) {
++ kfree(pclone);
+ kfree(icst);
++ }
+
+ return clk;
+ }
+diff --git a/drivers/mfd/twl6040.c b/drivers/mfd/twl6040.c
+index c5265c1262c5..6aacd205a774 100644
+--- a/drivers/mfd/twl6040.c
++++ b/drivers/mfd/twl6040.c
+@@ -647,6 +647,8 @@ static int twl6040_probe(struct i2c_client *client,
+
+ twl6040->clk32k = devm_clk_get(&client->dev, "clk32k");
+ if (IS_ERR(twl6040->clk32k)) {
++ if (PTR_ERR(twl6040->clk32k) == -EPROBE_DEFER)
++ return -EPROBE_DEFER;
+ dev_info(&client->dev, "clk32k is not handled\n");
+ twl6040->clk32k = NULL;
+ }
+diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
+index a98dd4f1b0e3..cbbb1c93386d 100644
+--- a/drivers/net/bonding/bond_main.c
++++ b/drivers/net/bonding/bond_main.c
+@@ -1751,6 +1751,7 @@ err_undo_flags:
+ slave_dev->dev_addr))
+ eth_hw_addr_random(bond_dev);
+ if (bond_dev->type != ARPHRD_ETHER) {
++ dev_close(bond_dev);
+ ether_setup(bond_dev);
+ bond_dev->flags |= IFF_MASTER;
+ bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
+index aede704605c6..141c2a42d7ed 100644
+--- a/drivers/net/can/dev.c
++++ b/drivers/net/can/dev.c
+@@ -915,7 +915,7 @@ static int can_fill_info(struct sk_buff *skb, const struct net_device *dev)
+ nla_put(skb, IFLA_CAN_BITTIMING_CONST,
+ sizeof(*priv->bittiming_const), priv->bittiming_const)) ||
+
+- nla_put(skb, IFLA_CAN_CLOCK, sizeof(cm), &priv->clock) ||
++ nla_put(skb, IFLA_CAN_CLOCK, sizeof(priv->clock), &priv->clock) ||
+ nla_put_u32(skb, IFLA_CAN_STATE, state) ||
+ nla_put(skb, IFLA_CAN_CTRLMODE, sizeof(cm), &cm) ||
+ nla_put_u32(skb, IFLA_CAN_RESTART_MS, priv->restart_ms) ||
+diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
+index 7b92e911a616..f10834be48a5 100644
+--- a/drivers/net/can/sja1000/sja1000.c
++++ b/drivers/net/can/sja1000/sja1000.c
+@@ -218,6 +218,9 @@ static void sja1000_start(struct net_device *dev)
+ priv->write_reg(priv, SJA1000_RXERR, 0x0);
+ priv->read_reg(priv, SJA1000_ECC);
+
++ /* clear interrupt flags */
++ priv->read_reg(priv, SJA1000_IR);
++
+ /* leave reset mode */
+ set_normal_mode(dev);
+ }
+diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
+index a4473d8ff4fa..f672dba345f7 100644
+--- a/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
++++ b/drivers/net/ethernet/amd/xgbe/xgbe-dev.c
+@@ -1595,7 +1595,7 @@ static void xgbe_dev_xmit(struct xgbe_channel *channel)
+ packet->rdesc_count, 1);
+
+ /* Make sure ownership is written to the descriptor */
+- dma_wmb();
++ smp_wmb();
+
+ ring->cur = cur_index + 1;
+ if (!packet->skb->xmit_more ||
+diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+index aae9d5ecd182..dde0486667e0 100644
+--- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
++++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c
+@@ -1807,6 +1807,7 @@ static int xgbe_tx_poll(struct xgbe_channel *channel)
+ struct netdev_queue *txq;
+ int processed = 0;
+ unsigned int tx_packets = 0, tx_bytes = 0;
++ unsigned int cur;
+
+ DBGPR("-->xgbe_tx_poll\n");
+
+@@ -1814,10 +1815,15 @@ static int xgbe_tx_poll(struct xgbe_channel *channel)
+ if (!ring)
+ return 0;
+
++ cur = ring->cur;
++
++ /* Be sure we get ring->cur before accessing descriptor data */
++ smp_rmb();
++
+ txq = netdev_get_tx_queue(netdev, channel->queue_index);
+
+ while ((processed < XGBE_TX_DESC_MAX_PROC) &&
+- (ring->dirty != ring->cur)) {
++ (ring->dirty != cur)) {
+ rdata = XGBE_GET_DESC_DATA(ring, ring->dirty);
+ rdesc = rdata->rdesc;
+
+diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
+index de63266de16b..5d1dde3f3540 100644
+--- a/drivers/net/ethernet/freescale/fec_main.c
++++ b/drivers/net/ethernet/freescale/fec_main.c
+@@ -1775,7 +1775,7 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
+ int ret = 0;
+
+ ret = pm_runtime_get_sync(dev);
+- if (IS_ERR_VALUE(ret))
++ if (ret < 0)
+ return ret;
+
+ fep->mii_timeout = 0;
+@@ -1811,11 +1811,13 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
+ struct fec_enet_private *fep = bus->priv;
+ struct device *dev = &fep->pdev->dev;
+ unsigned long time_left;
+- int ret = 0;
++ int ret;
+
+ ret = pm_runtime_get_sync(dev);
+- if (IS_ERR_VALUE(ret))
++ if (ret < 0)
+ return ret;
++ else
++ ret = 0;
+
+ fep->mii_timeout = 0;
+ reinit_completion(&fep->mdio_done);
+@@ -2866,7 +2868,7 @@ fec_enet_open(struct net_device *ndev)
+ int ret;
+
+ ret = pm_runtime_get_sync(&fep->pdev->dev);
+- if (IS_ERR_VALUE(ret))
++ if (ret < 0)
+ return ret;
+
+ pinctrl_pm_select_default_state(&fep->pdev->dev);
+diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
+index 09ec32e33076..7e788073c154 100644
+--- a/drivers/net/ethernet/marvell/mvneta.c
++++ b/drivers/net/ethernet/marvell/mvneta.c
+@@ -949,7 +949,7 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
+ /* Set CPU queue access map - all CPUs have access to all RX
+ * queues and to all TX queues
+ */
+- for (cpu = 0; cpu < CONFIG_NR_CPUS; cpu++)
++ for_each_present_cpu(cpu)
+ mvreg_write(pp, MVNETA_CPU_MAP(cpu),
+ (MVNETA_CPU_RXQ_ACCESS_ALL_MASK |
+ MVNETA_CPU_TXQ_ACCESS_ALL_MASK));
+@@ -1533,12 +1533,16 @@ static int mvneta_rx(struct mvneta_port *pp, int rx_todo,
+ }
+
+ skb = build_skb(data, pp->frag_size > PAGE_SIZE ? 0 : pp->frag_size);
+- if (!skb)
+- goto err_drop_frame;
+
++ /* After refill, the old buffer has to be unmapped regardless of
++ * whether the skb is successfully built or not.
++ */
+ dma_unmap_single(dev->dev.parent, phys_addr,
+ MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
+
++ if (!skb)
++ goto err_drop_frame;
++
+ rcvd_pkts++;
+ rcvd_bytes += rx_bytes;
+
+diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
+index 0a3202047569..2177e56ed0be 100644
+--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
++++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
+@@ -2398,7 +2398,7 @@ int mlx4_multi_func_init(struct mlx4_dev *dev)
+ }
+ }
+
+- memset(&priv->mfunc.master.cmd_eqe, 0, dev->caps.eqe_size);
++ memset(&priv->mfunc.master.cmd_eqe, 0, sizeof(struct mlx4_eqe));
+ priv->mfunc.master.cmd_eqe.type = MLX4_EVENT_TYPE_CMD;
+ INIT_WORK(&priv->mfunc.master.comm_work,
+ mlx4_master_comm_channel);
+diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
+index 8e81e53c370e..ad8f95df4310 100644
+--- a/drivers/net/ethernet/mellanox/mlx4/eq.c
++++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
+@@ -196,7 +196,7 @@ static void slave_event(struct mlx4_dev *dev, u8 slave, struct mlx4_eqe *eqe)
+ return;
+ }
+
+- memcpy(s_eqe, eqe, dev->caps.eqe_size - 1);
++ memcpy(s_eqe, eqe, sizeof(struct mlx4_eqe) - 1);
+ s_eqe->slave_id = slave;
+ /* ensure all information is written before setting the ownersip bit */
+ dma_wmb();
+diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
+index b1a4ea21c91c..4dd18f4bb5ae 100644
+--- a/drivers/net/ethernet/sfc/ef10.c
++++ b/drivers/net/ethernet/sfc/ef10.c
+@@ -1809,7 +1809,9 @@ static void efx_ef10_tx_write(struct efx_tx_queue *tx_queue)
+ unsigned int write_ptr;
+ efx_qword_t *txd;
+
+- BUG_ON(tx_queue->write_count == tx_queue->insert_count);
++ tx_queue->xmit_more_available = false;
++ if (unlikely(tx_queue->write_count == tx_queue->insert_count))
++ return;
+
+ do {
+ write_ptr = tx_queue->write_count & tx_queue->ptr_mask;
+diff --git a/drivers/net/ethernet/sfc/farch.c b/drivers/net/ethernet/sfc/farch.c
+index f08266f0eca2..5a1c5a8f278a 100644
+--- a/drivers/net/ethernet/sfc/farch.c
++++ b/drivers/net/ethernet/sfc/farch.c
+@@ -321,7 +321,9 @@ void efx_farch_tx_write(struct efx_tx_queue *tx_queue)
+ unsigned write_ptr;
+ unsigned old_write_count = tx_queue->write_count;
+
+- BUG_ON(tx_queue->write_count == tx_queue->insert_count);
++ tx_queue->xmit_more_available = false;
++ if (unlikely(tx_queue->write_count == tx_queue->insert_count))
++ return;
+
+ do {
+ write_ptr = tx_queue->write_count & tx_queue->ptr_mask;
+diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
+index 47d1e3a96522..b8e8ce1caf0f 100644
+--- a/drivers/net/ethernet/sfc/net_driver.h
++++ b/drivers/net/ethernet/sfc/net_driver.h
+@@ -219,6 +219,7 @@ struct efx_tx_buffer {
+ * @tso_packets: Number of packets via the TSO xmit path
+ * @pushes: Number of times the TX push feature has been used
+ * @pio_packets: Number of times the TX PIO feature has been used
++ * @xmit_more_available: Are any packets waiting to be pushed to the NIC
+ * @empty_read_count: If the completion path has seen the queue as empty
+ * and the transmission path has not yet checked this, the value of
+ * @read_count bitwise-added to %EFX_EMPTY_COUNT_VALID; otherwise 0.
+@@ -253,6 +254,7 @@ struct efx_tx_queue {
+ unsigned int tso_packets;
+ unsigned int pushes;
+ unsigned int pio_packets;
++ bool xmit_more_available;
+ /* Statistics to supplement MAC stats */
+ unsigned long tx_packets;
+
+diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
+index 1833a0146571..67f6afaa022f 100644
+--- a/drivers/net/ethernet/sfc/tx.c
++++ b/drivers/net/ethernet/sfc/tx.c
+@@ -431,8 +431,20 @@ finish_packet:
+ efx_tx_maybe_stop_queue(tx_queue);
+
+ /* Pass off to hardware */
+- if (!skb->xmit_more || netif_xmit_stopped(tx_queue->core_txq))
++ if (!skb->xmit_more || netif_xmit_stopped(tx_queue->core_txq)) {
++ struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);
++
++ /* There could be packets left on the partner queue if those
++ * SKBs had skb->xmit_more set. If we do not push those they
++ * could be left for a long time and cause a netdev watchdog.
++ */
++ if (txq2->xmit_more_available)
++ efx_nic_push_buffers(txq2);
++
+ efx_nic_push_buffers(tx_queue);
++ } else {
++ tx_queue->xmit_more_available = skb->xmit_more;
++ }
+
+ tx_queue->tx_packets++;
+
+@@ -722,6 +734,7 @@ void efx_init_tx_queue(struct efx_tx_queue *tx_queue)
+ tx_queue->read_count = 0;
+ tx_queue->old_read_count = 0;
+ tx_queue->empty_read_count = 0 | EFX_EMPTY_COUNT_VALID;
++ tx_queue->xmit_more_available = false;
+
+ /* Set up TX descriptor ring */
+ efx_nic_init_tx(tx_queue);
+@@ -747,6 +760,7 @@ void efx_fini_tx_queue(struct efx_tx_queue *tx_queue)
+
+ ++tx_queue->read_count;
+ }
++ tx_queue->xmit_more_available = false;
+ netdev_tx_reset_queue(tx_queue->core_txq);
+ }
+
+@@ -1302,8 +1316,20 @@ static int efx_enqueue_skb_tso(struct efx_tx_queue *tx_queue,
+ efx_tx_maybe_stop_queue(tx_queue);
+
+ /* Pass off to hardware */
+- if (!skb->xmit_more || netif_xmit_stopped(tx_queue->core_txq))
++ if (!skb->xmit_more || netif_xmit_stopped(tx_queue->core_txq)) {
++ struct efx_tx_queue *txq2 = efx_tx_queue_partner(tx_queue);
++
++ /* There could be packets left on the partner queue if those
++ * SKBs had skb->xmit_more set. If we do not push those they
++ * could be left for a long time and cause a netdev watchdog.
++ */
++ if (txq2->xmit_more_available)
++ efx_nic_push_buffers(txq2);
++
+ efx_nic_push_buffers(tx_queue);
++ } else {
++ tx_queue->xmit_more_available = skb->xmit_more;
++ }
+
+ tx_queue->tso_bursts++;
+ return NETDEV_TX_OK;
+diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+index 771cda2a48b2..2e51b816a7e8 100644
+--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
++++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+@@ -721,10 +721,13 @@ static int stmmac_get_ts_info(struct net_device *dev,
+ {
+ struct stmmac_priv *priv = netdev_priv(dev);
+
+- if ((priv->hwts_tx_en) && (priv->hwts_rx_en)) {
++ if ((priv->dma_cap.time_stamp || priv->dma_cap.atime_stamp)) {
+
+- info->so_timestamping = SOF_TIMESTAMPING_TX_HARDWARE |
++ info->so_timestamping = SOF_TIMESTAMPING_TX_SOFTWARE |
++ SOF_TIMESTAMPING_TX_HARDWARE |
++ SOF_TIMESTAMPING_RX_SOFTWARE |
+ SOF_TIMESTAMPING_RX_HARDWARE |
++ SOF_TIMESTAMPING_SOFTWARE |
+ SOF_TIMESTAMPING_RAW_HARDWARE;
+
+ if (priv->ptp_clock)
+diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
+index 248478c6f6e4..197c93937c2d 100644
+--- a/drivers/net/macvtap.c
++++ b/drivers/net/macvtap.c
+@@ -137,7 +137,7 @@ static const struct proto_ops macvtap_socket_ops;
+ #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
+ NETIF_F_TSO6 | NETIF_F_UFO)
+ #define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO)
+-#define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG)
++#define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG | NETIF_F_FRAGLIST)
+
+ static struct macvlan_dev *macvtap_get_vlan_rcu(const struct net_device *dev)
+ {
+diff --git a/drivers/net/ppp/pppoe.c b/drivers/net/ppp/pppoe.c
+index 2ed75060da50..5e0b43283bce 100644
+--- a/drivers/net/ppp/pppoe.c
++++ b/drivers/net/ppp/pppoe.c
+@@ -589,7 +589,7 @@ static int pppoe_release(struct socket *sock)
+
+ po = pppox_sk(sk);
+
+- if (sk->sk_state & (PPPOX_CONNECTED | PPPOX_BOUND | PPPOX_ZOMBIE)) {
++ if (po->pppoe_dev) {
+ dev_put(po->pppoe_dev);
+ po->pppoe_dev = NULL;
+ }
+diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
+index 64a60afbe50c..8f1738c3b3c5 100644
+--- a/drivers/net/usb/qmi_wwan.c
++++ b/drivers/net/usb/qmi_wwan.c
+@@ -765,6 +765,10 @@ static const struct usb_device_id products[] = {
+ {QMI_FIXED_INTF(0x1199, 0x9056, 8)}, /* Sierra Wireless Modem */
+ {QMI_FIXED_INTF(0x1199, 0x9057, 8)},
+ {QMI_FIXED_INTF(0x1199, 0x9061, 8)}, /* Sierra Wireless Modem */
++ {QMI_FIXED_INTF(0x1199, 0x9070, 8)}, /* Sierra Wireless MC74xx/EM74xx */
++ {QMI_FIXED_INTF(0x1199, 0x9070, 10)}, /* Sierra Wireless MC74xx/EM74xx */
++ {QMI_FIXED_INTF(0x1199, 0x9071, 8)}, /* Sierra Wireless MC74xx/EM74xx */
++ {QMI_FIXED_INTF(0x1199, 0x9071, 10)}, /* Sierra Wireless MC74xx/EM74xx */
+ {QMI_FIXED_INTF(0x1bbb, 0x011e, 4)}, /* Telekom Speedstick LTE II (Alcatel One Touch L100V LTE) */
+ {QMI_FIXED_INTF(0x1bbb, 0x0203, 2)}, /* Alcatel L800MA */
+ {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */
+diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
+index 0d3c474ff76d..a5ea8a984c53 100644
+--- a/drivers/net/wireless/ath/ath10k/mac.c
++++ b/drivers/net/wireless/ath/ath10k/mac.c
+@@ -2070,7 +2070,8 @@ static void ath10k_peer_assoc_h_ht(struct ath10k *ar,
+ enum ieee80211_band band;
+ const u8 *ht_mcs_mask;
+ const u16 *vht_mcs_mask;
+- int i, n, max_nss;
++ int i, n;
++ u8 max_nss;
+ u32 stbc;
+
+ lockdep_assert_held(&ar->conf_mutex);
+@@ -2155,7 +2156,7 @@ static void ath10k_peer_assoc_h_ht(struct ath10k *ar,
+ arg->peer_ht_rates.rates[i] = i;
+ } else {
+ arg->peer_ht_rates.num_rates = n;
+- arg->peer_num_spatial_streams = max_nss;
++ arg->peer_num_spatial_streams = min(sta->rx_nss, max_nss);
+ }
+
+ ath10k_dbg(ar, ATH10K_DBG_MAC, "mac ht peer %pM mcs cnt %d nss %d\n",
+@@ -4021,7 +4022,7 @@ static int ath10k_config(struct ieee80211_hw *hw, u32 changed)
+
+ static u32 get_nss_from_chainmask(u16 chain_mask)
+ {
+- if ((chain_mask & 0x15) == 0x15)
++ if ((chain_mask & 0xf) == 0xf)
+ return 4;
+ else if ((chain_mask & 0x7) == 0x7)
+ return 3;
+diff --git a/drivers/net/wireless/iwlwifi/pcie/drv.c b/drivers/net/wireless/iwlwifi/pcie/drv.c
+index 865d578dee82..fd6aef7d4496 100644
+--- a/drivers/net/wireless/iwlwifi/pcie/drv.c
++++ b/drivers/net/wireless/iwlwifi/pcie/drv.c
+@@ -423,14 +423,21 @@ static const struct pci_device_id iwl_hw_card_ids[] = {
+ /* 8000 Series */
+ {IWL_PCI_DEVICE(0x24F3, 0x0010, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x1010, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x0130, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x1130, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x0132, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x1132, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x0110, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x01F0, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x0012, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x1012, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x1110, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x0050, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x0250, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x1050, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x0150, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x1150, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F4, 0x0030, iwl8260_2ac_cfg)},
+- {IWL_PCI_DEVICE(0x24F4, 0x1130, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F4, 0x1030, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0xC010, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0xC110, iwl8260_2ac_cfg)},
+@@ -438,18 +445,28 @@ static const struct pci_device_id iwl_hw_card_ids[] = {
+ {IWL_PCI_DEVICE(0x24F3, 0xC050, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0xD050, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x8010, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x8110, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x9010, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x9110, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F4, 0x8030, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F4, 0x9030, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x8130, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x9130, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x8132, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x9132, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x8050, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x8150, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x9050, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x9150, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x0004, iwl8260_2n_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x0044, iwl8260_2n_cfg)},
+ {IWL_PCI_DEVICE(0x24F5, 0x0010, iwl4165_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F6, 0x0030, iwl4165_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x0810, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x0910, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x0850, iwl8260_2ac_cfg)},
+ {IWL_PCI_DEVICE(0x24F3, 0x0950, iwl8260_2ac_cfg)},
++ {IWL_PCI_DEVICE(0x24F3, 0x0930, iwl8260_2ac_cfg)},
+ #endif /* CONFIG_IWLMVM */
+
+ {0}
+diff --git a/drivers/net/wireless/iwlwifi/pcie/trans.c b/drivers/net/wireless/iwlwifi/pcie/trans.c
+index 9e144e71da0b..dab9b91b3f3d 100644
+--- a/drivers/net/wireless/iwlwifi/pcie/trans.c
++++ b/drivers/net/wireless/iwlwifi/pcie/trans.c
+@@ -592,10 +592,8 @@ static int iwl_pcie_prepare_card_hw(struct iwl_trans *trans)
+
+ do {
+ ret = iwl_pcie_set_hw_ready(trans);
+- if (ret >= 0) {
+- ret = 0;
+- goto out;
+- }
++ if (ret >= 0)
++ return 0;
+
+ usleep_range(200, 1000);
+ t += 200;
+@@ -605,10 +603,6 @@ static int iwl_pcie_prepare_card_hw(struct iwl_trans *trans)
+
+ IWL_ERR(trans, "Couldn't prepare the card\n");
+
+-out:
+- iwl_clear_bit(trans, CSR_DBG_LINK_PWR_MGMT_REG,
+- CSR_RESET_LINK_PWR_MGMT_DISABLED);
+-
+ return ret;
+ }
+
+diff --git a/drivers/net/wireless/mwifiex/debugfs.c b/drivers/net/wireless/mwifiex/debugfs.c
+index 5a0636d43a1b..5583856fc5c4 100644
+--- a/drivers/net/wireless/mwifiex/debugfs.c
++++ b/drivers/net/wireless/mwifiex/debugfs.c
+@@ -731,7 +731,7 @@ mwifiex_rdeeprom_read(struct file *file, char __user *ubuf,
+ (struct mwifiex_private *) file->private_data;
+ unsigned long addr = get_zeroed_page(GFP_KERNEL);
+ char *buf = (char *) addr;
+- int pos = 0, ret = 0, i;
++ int pos, ret, i;
+ u8 value[MAX_EEPROM_DATA];
+
+ if (!buf)
+@@ -739,7 +739,7 @@ mwifiex_rdeeprom_read(struct file *file, char __user *ubuf,
+
+ if (saved_offset == -1) {
+ /* No command has been given */
+- pos += snprintf(buf, PAGE_SIZE, "0");
++ pos = snprintf(buf, PAGE_SIZE, "0");
+ goto done;
+ }
+
+@@ -748,17 +748,17 @@ mwifiex_rdeeprom_read(struct file *file, char __user *ubuf,
+ (u16) saved_bytes, value);
+ if (ret) {
+ ret = -EINVAL;
+- goto done;
++ goto out_free;
+ }
+
+- pos += snprintf(buf, PAGE_SIZE, "%d %d ", saved_offset, saved_bytes);
++ pos = snprintf(buf, PAGE_SIZE, "%d %d ", saved_offset, saved_bytes);
+
+ for (i = 0; i < saved_bytes; i++)
+- pos += snprintf(buf + strlen(buf), PAGE_SIZE, "%d ", value[i]);
+-
+- ret = simple_read_from_buffer(ubuf, count, ppos, buf, pos);
++ pos += scnprintf(buf + pos, PAGE_SIZE - pos, "%d ", value[i]);
+
+ done:
++ ret = simple_read_from_buffer(ubuf, count, ppos, buf, pos);
++out_free:
+ free_page(addr);
+ return ret;
+ }
+diff --git a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+index a9c9a077c77d..bc3d907fd20f 100644
+--- a/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
++++ b/drivers/staging/lustre/include/linux/lnet/lib-lnet.h
+@@ -680,7 +680,7 @@ void lnet_debug_peer(lnet_nid_t nid);
+ static inline void
+ lnet_peer_set_alive(lnet_peer_t *lp)
+ {
+- lp->lp_last_alive = lp->lp_last_query = get_seconds();
++ lp->lp_last_alive = lp->lp_last_query = jiffies;
+ if (!lp->lp_alive)
+ lnet_notify_locked(lp, 0, 1, lp->lp_last_alive);
+ }
+diff --git a/drivers/staging/rtl8712/usb_intf.c b/drivers/staging/rtl8712/usb_intf.c
+index f8b5b332e7c3..943a0e204532 100644
+--- a/drivers/staging/rtl8712/usb_intf.c
++++ b/drivers/staging/rtl8712/usb_intf.c
+@@ -144,6 +144,7 @@ static struct usb_device_id rtl871x_usb_id_tbl[] = {
+ {USB_DEVICE(0x0DF6, 0x0058)},
+ {USB_DEVICE(0x0DF6, 0x0049)},
+ {USB_DEVICE(0x0DF6, 0x004C)},
++ {USB_DEVICE(0x0DF6, 0x006C)},
+ {USB_DEVICE(0x0DF6, 0x0064)},
+ /* Skyworth */
+ {USB_DEVICE(0x14b2, 0x3300)},
+diff --git a/drivers/tty/mips_ejtag_fdc.c b/drivers/tty/mips_ejtag_fdc.c
+index 358323c83b4f..43a2ba0c0fe9 100644
+--- a/drivers/tty/mips_ejtag_fdc.c
++++ b/drivers/tty/mips_ejtag_fdc.c
+@@ -1045,38 +1045,6 @@ err_destroy_ports:
+ return ret;
+ }
+
+-static int mips_ejtag_fdc_tty_remove(struct mips_cdmm_device *dev)
+-{
+- struct mips_ejtag_fdc_tty *priv = mips_cdmm_get_drvdata(dev);
+- struct mips_ejtag_fdc_tty_port *dport;
+- int nport;
+- unsigned int cfg;
+-
+- if (priv->irq >= 0) {
+- raw_spin_lock_irq(&priv->lock);
+- cfg = mips_ejtag_fdc_read(priv, REG_FDCFG);
+- /* Disable interrupts */
+- cfg &= ~(REG_FDCFG_TXINTTHRES | REG_FDCFG_RXINTTHRES);
+- cfg |= REG_FDCFG_TXINTTHRES_DISABLED;
+- cfg |= REG_FDCFG_RXINTTHRES_DISABLED;
+- mips_ejtag_fdc_write(priv, REG_FDCFG, cfg);
+- raw_spin_unlock_irq(&priv->lock);
+- } else {
+- priv->removing = true;
+- del_timer_sync(&priv->poll_timer);
+- }
+- kthread_stop(priv->thread);
+- if (dev->cpu == 0)
+- mips_ejtag_fdc_con.tty_drv = NULL;
+- tty_unregister_driver(priv->driver);
+- for (nport = 0; nport < NUM_TTY_CHANNELS; nport++) {
+- dport = &priv->ports[nport];
+- tty_port_destroy(&dport->port);
+- }
+- put_tty_driver(priv->driver);
+- return 0;
+-}
+-
+ static int mips_ejtag_fdc_tty_cpu_down(struct mips_cdmm_device *dev)
+ {
+ struct mips_ejtag_fdc_tty *priv = mips_cdmm_get_drvdata(dev);
+@@ -1149,12 +1117,11 @@ static struct mips_cdmm_driver mips_ejtag_fdc_tty_driver = {
+ .name = "mips_ejtag_fdc",
+ },
+ .probe = mips_ejtag_fdc_tty_probe,
+- .remove = mips_ejtag_fdc_tty_remove,
+ .cpu_down = mips_ejtag_fdc_tty_cpu_down,
+ .cpu_up = mips_ejtag_fdc_tty_cpu_up,
+ .id_table = mips_ejtag_fdc_tty_ids,
+ };
+-module_mips_cdmm_driver(mips_ejtag_fdc_tty_driver);
++builtin_mips_cdmm_driver(mips_ejtag_fdc_tty_driver);
+
+ static int __init mips_ejtag_fdc_init_console(void)
+ {
+diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
+index afc1879f66e0..dedac8ab85bf 100644
+--- a/drivers/tty/n_tty.c
++++ b/drivers/tty/n_tty.c
+@@ -169,7 +169,7 @@ static inline int tty_copy_to_user(struct tty_struct *tty,
+ {
+ struct n_tty_data *ldata = tty->disc_data;
+
+- tty_audit_add_data(tty, to, n, ldata->icanon);
++ tty_audit_add_data(tty, from, n, ldata->icanon);
+ return copy_to_user(to, from, n);
+ }
+
+diff --git a/drivers/tty/tty_audit.c b/drivers/tty/tty_audit.c
+index 90ca082935f6..3d245cd3d8e6 100644
+--- a/drivers/tty/tty_audit.c
++++ b/drivers/tty/tty_audit.c
+@@ -265,7 +265,7 @@ static struct tty_audit_buf *tty_audit_buf_get(struct tty_struct *tty,
+ *
+ * Audit @data of @size from @tty, if necessary.
+ */
+-void tty_audit_add_data(struct tty_struct *tty, unsigned char *data,
++void tty_audit_add_data(struct tty_struct *tty, const void *data,
+ size_t size, unsigned icanon)
+ {
+ struct tty_audit_buf *buf;
+diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
+index 774df354af55..1aa028638120 100644
+--- a/drivers/tty/tty_io.c
++++ b/drivers/tty/tty_io.c
+@@ -1279,18 +1279,22 @@ int tty_send_xchar(struct tty_struct *tty, char ch)
+ int was_stopped = tty->stopped;
+
+ if (tty->ops->send_xchar) {
++ down_read(&tty->termios_rwsem);
+ tty->ops->send_xchar(tty, ch);
++ up_read(&tty->termios_rwsem);
+ return 0;
+ }
+
+ if (tty_write_lock(tty, 0) < 0)
+ return -ERESTARTSYS;
+
++ down_read(&tty->termios_rwsem);
+ if (was_stopped)
+ start_tty(tty);
+ tty->ops->write(tty, &ch, 1);
+ if (was_stopped)
+ stop_tty(tty);
++ up_read(&tty->termios_rwsem);
+ tty_write_unlock(tty);
+ return 0;
+ }
+diff --git a/drivers/tty/tty_ioctl.c b/drivers/tty/tty_ioctl.c
+index 5232fb60b0b1..043e332e7423 100644
+--- a/drivers/tty/tty_ioctl.c
++++ b/drivers/tty/tty_ioctl.c
+@@ -1142,16 +1142,12 @@ int n_tty_ioctl_helper(struct tty_struct *tty, struct file *file,
+ spin_unlock_irq(&tty->flow_lock);
+ break;
+ case TCIOFF:
+- down_read(&tty->termios_rwsem);
+ if (STOP_CHAR(tty) != __DISABLED_CHAR)
+ retval = tty_send_xchar(tty, STOP_CHAR(tty));
+- up_read(&tty->termios_rwsem);
+ break;
+ case TCION:
+- down_read(&tty->termios_rwsem);
+ if (START_CHAR(tty) != __DISABLED_CHAR)
+ retval = tty_send_xchar(tty, START_CHAR(tty));
+- up_read(&tty->termios_rwsem);
+ break;
+ default:
+ return -EINVAL;
+diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
+index fa774323ebda..846ceb91ec14 100644
+--- a/drivers/usb/chipidea/ci_hdrc_imx.c
++++ b/drivers/usb/chipidea/ci_hdrc_imx.c
+@@ -68,6 +68,12 @@ struct ci_hdrc_imx_data {
+ struct imx_usbmisc_data *usbmisc_data;
+ bool supports_runtime_pm;
+ bool in_lpm;
++ /* SoCs before i.MX6 (except i.MX23/i.MX28) need three clks */
++ bool need_three_clks;
++ struct clk *clk_ipg;
++ struct clk *clk_ahb;
++ struct clk *clk_per;
++ /* --------------------------------- */
+ };
+
+ /* Common functions shared by usbmisc drivers */
+@@ -119,6 +125,102 @@ static struct imx_usbmisc_data *usbmisc_get_init_data(struct device *dev)
+ }
+
+ /* End of common functions shared by usbmisc drivers*/
++static int imx_get_clks(struct device *dev)
++{
++ struct ci_hdrc_imx_data *data = dev_get_drvdata(dev);
++ int ret = 0;
++
++ data->clk_ipg = devm_clk_get(dev, "ipg");
++ if (IS_ERR(data->clk_ipg)) {
++ /* If the platform only needs one clock */
++ data->clk = devm_clk_get(dev, NULL);
++ if (IS_ERR(data->clk)) {
++ ret = PTR_ERR(data->clk);
++ dev_err(dev,
++ "Failed to get clks, err=%ld,%ld\n",
++ PTR_ERR(data->clk), PTR_ERR(data->clk_ipg));
++ return ret;
++ }
++ return ret;
++ }
++
++ data->clk_ahb = devm_clk_get(dev, "ahb");
++ if (IS_ERR(data->clk_ahb)) {
++ ret = PTR_ERR(data->clk_ahb);
++ dev_err(dev,
++ "Failed to get ahb clock, err=%d\n", ret);
++ return ret;
++ }
++
++ data->clk_per = devm_clk_get(dev, "per");
++ if (IS_ERR(data->clk_per)) {
++ ret = PTR_ERR(data->clk_per);
++ dev_err(dev,
++ "Failed to get per clock, err=%d\n", ret);
++ return ret;
++ }
++
++ data->need_three_clks = true;
++ return ret;
++}
++
++static int imx_prepare_enable_clks(struct device *dev)
++{
++ struct ci_hdrc_imx_data *data = dev_get_drvdata(dev);
++ int ret = 0;
++
++ if (data->need_three_clks) {
++ ret = clk_prepare_enable(data->clk_ipg);
++ if (ret) {
++ dev_err(dev,
++ "Failed to prepare/enable ipg clk, err=%d\n",
++ ret);
++ return ret;
++ }
++
++ ret = clk_prepare_enable(data->clk_ahb);
++ if (ret) {
++ dev_err(dev,
++ "Failed to prepare/enable ahb clk, err=%d\n",
++ ret);
++ clk_disable_unprepare(data->clk_ipg);
++ return ret;
++ }
++
++ ret = clk_prepare_enable(data->clk_per);
++ if (ret) {
++ dev_err(dev,
++ "Failed to prepare/enable per clk, err=%d\n",
++ ret);
++ clk_disable_unprepare(data->clk_ahb);
++ clk_disable_unprepare(data->clk_ipg);
++ return ret;
++ }
++ } else {
++ ret = clk_prepare_enable(data->clk);
++ if (ret) {
++ dev_err(dev,
++ "Failed to prepare/enable clk, err=%d\n",
++ ret);
++ return ret;
++ }
++ }
++
++ return ret;
++}
++
++static void imx_disable_unprepare_clks(struct device *dev)
++{
++ struct ci_hdrc_imx_data *data = dev_get_drvdata(dev);
++
++ if (data->need_three_clks) {
++ clk_disable_unprepare(data->clk_per);
++ clk_disable_unprepare(data->clk_ahb);
++ clk_disable_unprepare(data->clk_ipg);
++ } else {
++ clk_disable_unprepare(data->clk);
++ }
++}
+
+ static int ci_hdrc_imx_probe(struct platform_device *pdev)
+ {
+@@ -137,23 +239,18 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
+ if (!data)
+ return -ENOMEM;
+
++ platform_set_drvdata(pdev, data);
+ data->usbmisc_data = usbmisc_get_init_data(&pdev->dev);
+ if (IS_ERR(data->usbmisc_data))
+ return PTR_ERR(data->usbmisc_data);
+
+- data->clk = devm_clk_get(&pdev->dev, NULL);
+- if (IS_ERR(data->clk)) {
+- dev_err(&pdev->dev,
+- "Failed to get clock, err=%ld\n", PTR_ERR(data->clk));
+- return PTR_ERR(data->clk);
+- }
++ ret = imx_get_clks(&pdev->dev);
++ if (ret)
++ return ret;
+
+- ret = clk_prepare_enable(data->clk);
+- if (ret) {
+- dev_err(&pdev->dev,
+- "Failed to prepare or enable clock, err=%d\n", ret);
++ ret = imx_prepare_enable_clks(&pdev->dev);
++ if (ret)
+ return ret;
+- }
+
+ data->phy = devm_usb_get_phy_by_phandle(&pdev->dev, "fsl,usbphy", 0);
+ if (IS_ERR(data->phy)) {
+@@ -196,8 +293,6 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
+ goto disable_device;
+ }
+
+- platform_set_drvdata(pdev, data);
+-
+ if (data->supports_runtime_pm) {
+ pm_runtime_set_active(&pdev->dev);
+ pm_runtime_enable(&pdev->dev);
+@@ -210,7 +305,7 @@ static int ci_hdrc_imx_probe(struct platform_device *pdev)
+ disable_device:
+ ci_hdrc_remove_device(data->ci_pdev);
+ err_clk:
+- clk_disable_unprepare(data->clk);
++ imx_disable_unprepare_clks(&pdev->dev);
+ return ret;
+ }
+
+@@ -224,7 +319,7 @@ static int ci_hdrc_imx_remove(struct platform_device *pdev)
+ pm_runtime_put_noidle(&pdev->dev);
+ }
+ ci_hdrc_remove_device(data->ci_pdev);
+- clk_disable_unprepare(data->clk);
++ imx_disable_unprepare_clks(&pdev->dev);
+
+ return 0;
+ }
+@@ -236,7 +331,7 @@ static int imx_controller_suspend(struct device *dev)
+
+ dev_dbg(dev, "at %s\n", __func__);
+
+- clk_disable_unprepare(data->clk);
++ imx_disable_unprepare_clks(dev);
+ data->in_lpm = true;
+
+ return 0;
+@@ -254,7 +349,7 @@ static int imx_controller_resume(struct device *dev)
+ return 0;
+ }
+
+- ret = clk_prepare_enable(data->clk);
++ ret = imx_prepare_enable_clks(dev);
+ if (ret)
+ return ret;
+
+@@ -269,7 +364,7 @@ static int imx_controller_resume(struct device *dev)
+ return 0;
+
+ clk_disable:
+- clk_disable_unprepare(data->clk);
++ imx_disable_unprepare_clks(dev);
+ return ret;
+ }
+
+diff --git a/drivers/usb/chipidea/udc.c b/drivers/usb/chipidea/udc.c
+index 6e53c24fa1cb..92937c14f818 100644
+--- a/drivers/usb/chipidea/udc.c
++++ b/drivers/usb/chipidea/udc.c
+@@ -1730,6 +1730,22 @@ static int ci_udc_start(struct usb_gadget *gadget,
+ return retval;
+ }
+
++static void ci_udc_stop_for_otg_fsm(struct ci_hdrc *ci)
++{
++ if (!ci_otg_is_fsm_mode(ci))
++ return;
++
++ mutex_lock(&ci->fsm.lock);
++ if (ci->fsm.otg->state == OTG_STATE_A_PERIPHERAL) {
++ ci->fsm.a_bidl_adis_tmout = 1;
++ ci_hdrc_otg_fsm_start(ci);
++ } else if (ci->fsm.otg->state == OTG_STATE_B_PERIPHERAL) {
++ ci->fsm.protocol = PROTO_UNDEF;
++ ci->fsm.otg->state = OTG_STATE_UNDEFINED;
++ }
++ mutex_unlock(&ci->fsm.lock);
++}
++
+ /**
+ * ci_udc_stop: unregister a gadget driver
+ */
+@@ -1754,6 +1770,7 @@ static int ci_udc_stop(struct usb_gadget *gadget)
+ ci->driver = NULL;
+ spin_unlock_irqrestore(&ci->lock, flags);
+
++ ci_udc_stop_for_otg_fsm(ci);
+ return 0;
+ }
+
+diff --git a/drivers/usb/class/usblp.c b/drivers/usb/class/usblp.c
+index f38e875a3fb1..8218ba7eb263 100644
+--- a/drivers/usb/class/usblp.c
++++ b/drivers/usb/class/usblp.c
+@@ -873,11 +873,11 @@ static int usblp_wwait(struct usblp *usblp, int nonblock)
+
+ add_wait_queue(&usblp->wwait, &waita);
+ for (;;) {
+- set_current_state(TASK_INTERRUPTIBLE);
+ if (mutex_lock_interruptible(&usblp->mut)) {
+ rc = -EINTR;
+ break;
+ }
++ set_current_state(TASK_INTERRUPTIBLE);
+ rc = usblp_wtest(usblp, nonblock);
+ mutex_unlock(&usblp->mut);
+ if (rc <= 0)
+diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
+index ff5773c66b84..c0566ecd9977 100644
+--- a/drivers/usb/dwc3/core.c
++++ b/drivers/usb/dwc3/core.c
+@@ -490,6 +490,9 @@ static int dwc3_phy_setup(struct dwc3 *dwc)
+ if (dwc->dis_u2_susphy_quirk)
+ reg &= ~DWC3_GUSB2PHYCFG_SUSPHY;
+
++ if (dwc->dis_enblslpm_quirk)
++ reg &= ~DWC3_GUSB2PHYCFG_ENBLSLPM;
++
+ dwc3_writel(dwc->regs, DWC3_GUSB2PHYCFG(0), reg);
+
+ return 0;
+@@ -509,12 +512,18 @@ static int dwc3_core_init(struct dwc3 *dwc)
+
+ reg = dwc3_readl(dwc->regs, DWC3_GSNPSID);
+ /* This should read as U3 followed by revision number */
+- if ((reg & DWC3_GSNPSID_MASK) != 0x55330000) {
++ if ((reg & DWC3_GSNPSID_MASK) == 0x55330000) {
++ /* Detected DWC_usb3 IP */
++ dwc->revision = reg;
++ } else if ((reg & DWC3_GSNPSID_MASK) == 0x33310000) {
++ /* Detected DWC_usb31 IP */
++ dwc->revision = dwc3_readl(dwc->regs, DWC3_VER_NUMBER);
++ dwc->revision |= DWC3_REVISION_IS_DWC31;
++ } else {
+ dev_err(dwc->dev, "this is not a DesignWare USB3 DRD Core\n");
+ ret = -ENODEV;
+ goto err0;
+ }
+- dwc->revision = reg;
+
+ /*
+ * Write Linux Version Code to our GUID register so it's easy to figure
+@@ -881,6 +890,8 @@ static int dwc3_probe(struct platform_device *pdev)
+ "snps,dis_u3_susphy_quirk");
+ dwc->dis_u2_susphy_quirk = of_property_read_bool(node,
+ "snps,dis_u2_susphy_quirk");
++ dwc->dis_enblslpm_quirk = device_property_read_bool(dev,
++ "snps,dis_enblslpm_quirk");
+
+ dwc->tx_de_emphasis_quirk = of_property_read_bool(node,
+ "snps,tx_de_emphasis_quirk");
+@@ -911,6 +922,7 @@ static int dwc3_probe(struct platform_device *pdev)
+ dwc->rx_detect_poll_quirk = pdata->rx_detect_poll_quirk;
+ dwc->dis_u3_susphy_quirk = pdata->dis_u3_susphy_quirk;
+ dwc->dis_u2_susphy_quirk = pdata->dis_u2_susphy_quirk;
++ dwc->dis_enblslpm_quirk = pdata->dis_enblslpm_quirk;
+
+ dwc->tx_de_emphasis_quirk = pdata->tx_de_emphasis_quirk;
+ if (pdata->tx_de_emphasis)
+diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
+index 044778884585..6e53ce9ce320 100644
+--- a/drivers/usb/dwc3/core.h
++++ b/drivers/usb/dwc3/core.h
+@@ -108,6 +108,9 @@
+ #define DWC3_GPRTBIMAP_FS0 0xc188
+ #define DWC3_GPRTBIMAP_FS1 0xc18c
+
++#define DWC3_VER_NUMBER 0xc1a0
++#define DWC3_VER_TYPE 0xc1a4
++
+ #define DWC3_GUSB2PHYCFG(n) (0xc200 + (n * 0x04))
+ #define DWC3_GUSB2I2CCTL(n) (0xc240 + (n * 0x04))
+
+@@ -175,6 +178,7 @@
+ #define DWC3_GUSB2PHYCFG_PHYSOFTRST (1 << 31)
+ #define DWC3_GUSB2PHYCFG_SUSPHY (1 << 6)
+ #define DWC3_GUSB2PHYCFG_ULPI_UTMI (1 << 4)
++#define DWC3_GUSB2PHYCFG_ENBLSLPM (1 << 8)
+
+ /* Global USB2 PHY Vendor Control Register */
+ #define DWC3_GUSB2PHYACC_NEWREGREQ (1 << 25)
+@@ -712,6 +716,8 @@ struct dwc3_scratchpad_array {
+ * @rx_detect_poll_quirk: set if we enable rx_detect to polling lfps quirk
+ * @dis_u3_susphy_quirk: set if we disable usb3 suspend phy
+ * @dis_u2_susphy_quirk: set if we disable usb2 suspend phy
++ * @dis_enblslpm_quirk: set if we clear enblslpm in GUSB2PHYCFG,
++ * disabling the suspend signal to the PHY.
+ * @tx_de_emphasis_quirk: set if we enable Tx de-emphasis quirk
+ * @tx_de_emphasis: Tx de-emphasis value
+ * 0 - -6dB de-emphasis
+@@ -766,6 +772,14 @@ struct dwc3 {
+ u32 num_event_buffers;
+ u32 u1u2;
+ u32 maximum_speed;
++
++ /*
++ * All 3.1 IP version constants are greater than the 3.0 IP
++ * version constants. This works for most version checks in
++ * dwc3. However, in the future, this may not apply as
++ * features may be developed on newer versions of the 3.0 IP
++ * that are not in the 3.1 IP.
++ */
+ u32 revision;
+
+ #define DWC3_REVISION_173A 0x5533173a
+@@ -788,6 +802,13 @@ struct dwc3 {
+ #define DWC3_REVISION_270A 0x5533270a
+ #define DWC3_REVISION_280A 0x5533280a
+
++/*
++ * NOTICE: we're using bit 31 as an "is usb 3.1" flag. This is really
++ * just so dwc31 revisions are always larger than dwc3.
++ */
++#define DWC3_REVISION_IS_DWC31 0x80000000
++#define DWC3_USB31_REVISION_110A (0x3131302a | DWC3_REVISION_IS_DWC31)
++
+ enum dwc3_ep0_next ep0_next_event;
+ enum dwc3_ep0_state ep0state;
+ enum dwc3_link_state link_state;
+@@ -841,6 +862,7 @@ struct dwc3 {
+ unsigned rx_detect_poll_quirk:1;
+ unsigned dis_u3_susphy_quirk:1;
+ unsigned dis_u2_susphy_quirk:1;
++ unsigned dis_enblslpm_quirk:1;
+
+ unsigned tx_de_emphasis_quirk:1;
+ unsigned tx_de_emphasis:2;
+diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
+index 27e4fc896e9d..04b87ebe6f94 100644
+--- a/drivers/usb/dwc3/dwc3-pci.c
++++ b/drivers/usb/dwc3/dwc3-pci.c
+@@ -27,6 +27,8 @@
+ #include "platform_data.h"
+
+ #define PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3 0xabcd
++#define PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3_AXI 0xabce
++#define PCI_DEVICE_ID_SYNOPSYS_HAPSUSB31 0xabcf
+ #define PCI_DEVICE_ID_INTEL_BYT 0x0f37
+ #define PCI_DEVICE_ID_INTEL_MRFLD 0x119e
+ #define PCI_DEVICE_ID_INTEL_BSW 0x22B7
+@@ -100,6 +102,22 @@ static int dwc3_pci_quirks(struct pci_dev *pdev)
+ }
+ }
+
++ if (pdev->vendor == PCI_VENDOR_ID_SYNOPSYS &&
++ (pdev->device == PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3 ||
++ pdev->device == PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3_AXI ||
++ pdev->device == PCI_DEVICE_ID_SYNOPSYS_HAPSUSB31)) {
++
++ struct dwc3_platform_data pdata;
++
++ memset(&pdata, 0, sizeof(pdata));
++ pdata.usb3_lpm_capable = true;
++ pdata.has_lpm_erratum = true;
++ pdata.dis_enblslpm_quirk = true;
++
++ return platform_device_add_data(pci_get_drvdata(pdev), &pdata,
++ sizeof(pdata));
++ }
++
+ return 0;
+ }
+
+@@ -172,6 +190,14 @@ static const struct pci_device_id dwc3_pci_id_table[] = {
+ PCI_DEVICE(PCI_VENDOR_ID_SYNOPSYS,
+ PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3),
+ },
++ {
++ PCI_DEVICE(PCI_VENDOR_ID_SYNOPSYS,
++ PCI_DEVICE_ID_SYNOPSYS_HAPSUSB3_AXI),
++ },
++ {
++ PCI_DEVICE(PCI_VENDOR_ID_SYNOPSYS,
++ PCI_DEVICE_ID_SYNOPSYS_HAPSUSB31),
++ },
+ { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_BSW), },
+ { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_BYT), },
+ { PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_MRFLD), },
+diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
+index 333a7c0078fc..6fbf461d523c 100644
+--- a/drivers/usb/dwc3/gadget.c
++++ b/drivers/usb/dwc3/gadget.c
+@@ -1859,27 +1859,32 @@ static int dwc3_cleanup_done_reqs(struct dwc3 *dwc, struct dwc3_ep *dep,
+ unsigned int i;
+ int ret;
+
+- req = next_request(&dep->req_queued);
+- if (!req) {
+- WARN_ON_ONCE(1);
+- return 1;
+- }
+- i = 0;
+ do {
+- slot = req->start_slot + i;
+- if ((slot == DWC3_TRB_NUM - 1) &&
++ req = next_request(&dep->req_queued);
++ if (!req) {
++ WARN_ON_ONCE(1);
++ return 1;
++ }
++ i = 0;
++ do {
++ slot = req->start_slot + i;
++ if ((slot == DWC3_TRB_NUM - 1) &&
+ usb_endpoint_xfer_isoc(dep->endpoint.desc))
+- slot++;
+- slot %= DWC3_TRB_NUM;
+- trb = &dep->trb_pool[slot];
++ slot++;
++ slot %= DWC3_TRB_NUM;
++ trb = &dep->trb_pool[slot];
++
++ ret = __dwc3_cleanup_done_trbs(dwc, dep, req, trb,
++ event, status);
++ if (ret)
++ break;
++ } while (++i < req->request.num_mapped_sgs);
++
++ dwc3_gadget_giveback(dep, req, status);
+
+- ret = __dwc3_cleanup_done_trbs(dwc, dep, req, trb,
+- event, status);
+ if (ret)
+ break;
+- } while (++i < req->request.num_mapped_sgs);
+-
+- dwc3_gadget_giveback(dep, req, status);
++ } while (1);
+
+ if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+ list_empty(&dep->req_queued)) {
+@@ -2709,12 +2714,34 @@ int dwc3_gadget_init(struct dwc3 *dwc)
+ }
+
+ dwc->gadget.ops = &dwc3_gadget_ops;
+- dwc->gadget.max_speed = USB_SPEED_SUPER;
+ dwc->gadget.speed = USB_SPEED_UNKNOWN;
+ dwc->gadget.sg_supported = true;
+ dwc->gadget.name = "dwc3-gadget";
+
+ /*
++ * FIXME We might be setting max_speed to <SUPER, however versions
++ * <2.20a of dwc3 have an issue with metastability (documented
++ * elsewhere in this driver) which tells us we can't set max speed to
++ * anything lower than SUPER.
++ *
++ * Because gadget.max_speed is only used by composite.c and function
++ * drivers (i.e. it won't go into dwc3's registers) we are allowing this
++ * to happen so we avoid sending SuperSpeed Capability descriptor
++ * together with our BOS descriptor as that could confuse host into
++ * thinking we can handle super speed.
++ *
++ * Note that, in fact, we won't even support GetBOS requests when speed
++ * is less than super speed because we don't have means, yet, to tell
++ * composite.c that we are USB 2.0 + LPM ECN.
++ */
++ if (dwc->revision < DWC3_REVISION_220A)
++ dwc3_trace(trace_dwc3_gadget,
++ "Changing max_speed on rev %08x\n",
++ dwc->revision);
++
++ dwc->gadget.max_speed = dwc->maximum_speed;
++
++ /*
+ * Per databook, DWC3 needs buffer size to be aligned to MaxPacketSize
+ * on ep out.
+ */
+diff --git a/drivers/usb/dwc3/platform_data.h b/drivers/usb/dwc3/platform_data.h
+index d3614ecbb9ca..db2938002260 100644
+--- a/drivers/usb/dwc3/platform_data.h
++++ b/drivers/usb/dwc3/platform_data.h
+@@ -42,6 +42,7 @@ struct dwc3_platform_data {
+ unsigned rx_detect_poll_quirk:1;
+ unsigned dis_u3_susphy_quirk:1;
+ unsigned dis_u2_susphy_quirk:1;
++ unsigned dis_enblslpm_quirk:1;
+
+ unsigned tx_de_emphasis_quirk:1;
+ unsigned tx_de_emphasis:2;
+diff --git a/drivers/usb/gadget/udc/atmel_usba_udc.c b/drivers/usb/gadget/udc/atmel_usba_udc.c
+index 4095cce05e6a..35fff450bdc8 100644
+--- a/drivers/usb/gadget/udc/atmel_usba_udc.c
++++ b/drivers/usb/gadget/udc/atmel_usba_udc.c
+@@ -1634,7 +1634,7 @@ static irqreturn_t usba_udc_irq(int irq, void *devid)
+ spin_lock(&udc->lock);
+
+ int_enb = usba_int_enb_get(udc);
+- status = usba_readl(udc, INT_STA) & int_enb;
++ status = usba_readl(udc, INT_STA) & (int_enb | USBA_HIGH_SPEED);
+ DBG(DBG_INT, "irq, status=%#08x\n", status);
+
+ if (status & USBA_DET_SUSPEND) {
+diff --git a/drivers/usb/gadget/udc/net2280.c b/drivers/usb/gadget/udc/net2280.c
+index 2bee912ca65b..baa0191666aa 100644
+--- a/drivers/usb/gadget/udc/net2280.c
++++ b/drivers/usb/gadget/udc/net2280.c
+@@ -1846,7 +1846,7 @@ static void defect7374_disable_data_eps(struct net2280 *dev)
+
+ for (i = 1; i < 5; i++) {
+ ep = &dev->ep[i];
+- writel(0, &ep->cfg->ep_cfg);
++ writel(i, &ep->cfg->ep_cfg);
+ }
+
+ /* CSROUT, CSRIN, PCIOUT, PCIIN, STATIN, RCIN */
+diff --git a/drivers/usb/host/ehci-orion.c b/drivers/usb/host/ehci-orion.c
+index bfcbb9aa8816..ee8d5faa0194 100644
+--- a/drivers/usb/host/ehci-orion.c
++++ b/drivers/usb/host/ehci-orion.c
+@@ -224,7 +224,8 @@ static int ehci_orion_drv_probe(struct platform_device *pdev)
+ priv->phy = devm_phy_optional_get(&pdev->dev, "usb");
+ if (IS_ERR(priv->phy)) {
+ err = PTR_ERR(priv->phy);
+- goto err_phy_get;
++ if (err != -ENOSYS)
++ goto err_phy_get;
+ } else {
+ err = phy_init(priv->phy);
+ if (err)
+diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
+index d7b9f484d4e9..6062996d35a6 100644
+--- a/drivers/usb/host/xhci.c
++++ b/drivers/usb/host/xhci.c
+@@ -175,6 +175,16 @@ int xhci_reset(struct xhci_hcd *xhci)
+ command |= CMD_RESET;
+ writel(command, &xhci->op_regs->command);
+
++ /* Existing Intel xHCI controllers require a delay of 1 ms
++ * after setting the CMD_RESET bit and before accessing any
++ * HC registers. This allows the HC to complete the
++ * reset operation and be ready for HC register access.
++ * Without this delay, a subsequent HC register access
++ * may, very rarely, result in a system hang.
++ */
++ if (xhci->quirks & XHCI_INTEL_HOST)
++ udelay(1000);
++
+ ret = xhci_handshake(&xhci->op_regs->command,
+ CMD_RESET, 0, 10 * 1000 * 1000);
+ if (ret)
+diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c
+index 514a6cdaeff6..2fe6d263eb6b 100644
+--- a/drivers/usb/musb/musb_core.c
++++ b/drivers/usb/musb/musb_core.c
+@@ -132,7 +132,7 @@ static inline struct musb *dev_to_musb(struct device *dev)
+ /*-------------------------------------------------------------------------*/
+
+ #ifndef CONFIG_BLACKFIN
+-static int musb_ulpi_read(struct usb_phy *phy, u32 offset)
++static int musb_ulpi_read(struct usb_phy *phy, u32 reg)
+ {
+ void __iomem *addr = phy->io_priv;
+ int i = 0;
+@@ -151,7 +151,7 @@ static int musb_ulpi_read(struct usb_phy *phy, u32 offset)
+ * ULPICarKitControlDisableUTMI after clearing POWER_SUSPENDM.
+ */
+
+- musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)offset);
++ musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)reg);
+ musb_writeb(addr, MUSB_ULPI_REG_CONTROL,
+ MUSB_ULPI_REG_REQ | MUSB_ULPI_RDN_WR);
+
+@@ -176,7 +176,7 @@ out:
+ return ret;
+ }
+
+-static int musb_ulpi_write(struct usb_phy *phy, u32 offset, u32 data)
++static int musb_ulpi_write(struct usb_phy *phy, u32 val, u32 reg)
+ {
+ void __iomem *addr = phy->io_priv;
+ int i = 0;
+@@ -191,8 +191,8 @@ static int musb_ulpi_write(struct usb_phy *phy, u32 offset, u32 data)
+ power &= ~MUSB_POWER_SUSPENDM;
+ musb_writeb(addr, MUSB_POWER, power);
+
+- musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)offset);
+- musb_writeb(addr, MUSB_ULPI_REG_DATA, (u8)data);
++ musb_writeb(addr, MUSB_ULPI_REG_ADDR, (u8)reg);
++ musb_writeb(addr, MUSB_ULPI_REG_DATA, (u8)val);
+ musb_writeb(addr, MUSB_ULPI_REG_CONTROL, MUSB_ULPI_REG_REQ);
+
+ while (!(musb_readb(addr, MUSB_ULPI_REG_CONTROL)
+diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c
+index 7c8eb4c4c175..4021846139c9 100644
+--- a/drivers/usb/serial/option.c
++++ b/drivers/usb/serial/option.c
+@@ -162,6 +162,7 @@ static void option_instat_callback(struct urb *urb);
+ #define NOVATELWIRELESS_PRODUCT_HSPA_EMBEDDED_HIGHSPEED 0x9001
+ #define NOVATELWIRELESS_PRODUCT_E362 0x9010
+ #define NOVATELWIRELESS_PRODUCT_E371 0x9011
++#define NOVATELWIRELESS_PRODUCT_U620L 0x9022
+ #define NOVATELWIRELESS_PRODUCT_G2 0xA010
+ #define NOVATELWIRELESS_PRODUCT_MC551 0xB001
+
+@@ -357,6 +358,7 @@ static void option_instat_callback(struct urb *urb);
+ /* This is the 4G XS Stick W14 a.k.a. Mobilcom Debitel Surf-Stick *
+ * It seems to contain a Qualcomm QSC6240/6290 chipset */
+ #define FOUR_G_SYSTEMS_PRODUCT_W14 0x9603
++#define FOUR_G_SYSTEMS_PRODUCT_W100 0x9b01
+
+ /* iBall 3.5G connect wireless modem */
+ #define IBALL_3_5G_CONNECT 0x9605
+@@ -522,6 +524,11 @@ static const struct option_blacklist_info four_g_w14_blacklist = {
+ .sendsetup = BIT(0) | BIT(1),
+ };
+
++static const struct option_blacklist_info four_g_w100_blacklist = {
++ .sendsetup = BIT(1) | BIT(2),
++ .reserved = BIT(3),
++};
++
+ static const struct option_blacklist_info alcatel_x200_blacklist = {
+ .sendsetup = BIT(0) | BIT(1),
+ .reserved = BIT(4),
+@@ -1060,6 +1067,7 @@ static const struct usb_device_id option_ids[] = {
+ { USB_DEVICE_AND_INTERFACE_INFO(NOVATELWIRELESS_VENDOR_ID, NOVATELWIRELESS_PRODUCT_MC551, 0xff, 0xff, 0xff) },
+ { USB_DEVICE_AND_INTERFACE_INFO(NOVATELWIRELESS_VENDOR_ID, NOVATELWIRELESS_PRODUCT_E362, 0xff, 0xff, 0xff) },
+ { USB_DEVICE_AND_INTERFACE_INFO(NOVATELWIRELESS_VENDOR_ID, NOVATELWIRELESS_PRODUCT_E371, 0xff, 0xff, 0xff) },
++ { USB_DEVICE_AND_INTERFACE_INFO(NOVATELWIRELESS_VENDOR_ID, NOVATELWIRELESS_PRODUCT_U620L, 0xff, 0x00, 0x00) },
+
+ { USB_DEVICE(AMOI_VENDOR_ID, AMOI_PRODUCT_H01) },
+ { USB_DEVICE(AMOI_VENDOR_ID, AMOI_PRODUCT_H01A) },
+@@ -1653,6 +1661,9 @@ static const struct usb_device_id option_ids[] = {
+ { USB_DEVICE(LONGCHEER_VENDOR_ID, FOUR_G_SYSTEMS_PRODUCT_W14),
+ .driver_info = (kernel_ulong_t)&four_g_w14_blacklist
+ },
++ { USB_DEVICE(LONGCHEER_VENDOR_ID, FOUR_G_SYSTEMS_PRODUCT_W100),
++ .driver_info = (kernel_ulong_t)&four_g_w100_blacklist
++ },
+ { USB_DEVICE_INTERFACE_CLASS(LONGCHEER_VENDOR_ID, SPEEDUP_PRODUCT_SU9800, 0xff) },
+ { USB_DEVICE(LONGCHEER_VENDOR_ID, ZOOM_PRODUCT_4597) },
+ { USB_DEVICE(LONGCHEER_VENDOR_ID, IBALL_3_5G_CONNECT) },
+diff --git a/drivers/usb/serial/qcserial.c b/drivers/usb/serial/qcserial.c
+index f49d262e926b..514fa91cf74e 100644
+--- a/drivers/usb/serial/qcserial.c
++++ b/drivers/usb/serial/qcserial.c
+@@ -22,6 +22,8 @@
+ #define DRIVER_AUTHOR "Qualcomm Inc"
+ #define DRIVER_DESC "Qualcomm USB Serial driver"
+
++#define QUECTEL_EC20_PID 0x9215
++
+ /* standard device layouts supported by this driver */
+ enum qcserial_layouts {
+ QCSERIAL_G2K = 0, /* Gobi 2000 */
+@@ -169,6 +171,38 @@ static const struct usb_device_id id_table[] = {
+ };
+ MODULE_DEVICE_TABLE(usb, id_table);
+
++static int handle_quectel_ec20(struct device *dev, int ifnum)
++{
++ int altsetting = 0;
++
++ /*
++ * Quectel EC20 Mini PCIe LTE module layout:
++ * 0: DM/DIAG (use libqcdm from ModemManager for communication)
++ * 1: NMEA
++ * 2: AT-capable modem port
++ * 3: Modem interface
++ * 4: NDIS
++ */
++ switch (ifnum) {
++ case 0:
++ dev_dbg(dev, "Quectel EC20 DM/DIAG interface found\n");
++ break;
++ case 1:
++ dev_dbg(dev, "Quectel EC20 NMEA GPS interface found\n");
++ break;
++ case 2:
++ case 3:
++ dev_dbg(dev, "Quectel EC20 Modem port found\n");
++ break;
++ case 4:
++ /* Don't claim the QMI/net interface */
++ altsetting = -1;
++ break;
++ }
++
++ return altsetting;
++}
++
+ static int qcprobe(struct usb_serial *serial, const struct usb_device_id *id)
+ {
+ struct usb_host_interface *intf = serial->interface->cur_altsetting;
+@@ -178,6 +212,10 @@ static int qcprobe(struct usb_serial *serial, const struct usb_device_id *id)
+ __u8 ifnum;
+ int altsetting = -1;
+
++ /* we only support vendor specific functions */
++ if (intf->desc.bInterfaceClass != USB_CLASS_VENDOR_SPEC)
++ goto done;
++
+ nintf = serial->dev->actconfig->desc.bNumInterfaces;
+ dev_dbg(dev, "Num Interfaces = %d\n", nintf);
+ ifnum = intf->desc.bInterfaceNumber;
+@@ -237,6 +275,12 @@ static int qcprobe(struct usb_serial *serial, const struct usb_device_id *id)
+ altsetting = -1;
+ break;
+ case QCSERIAL_G2K:
++ /* handle non-standard layouts */
++ if (nintf == 5 && id->idProduct == QUECTEL_EC20_PID) {
++ altsetting = handle_quectel_ec20(dev, ifnum);
++ goto done;
++ }
++
+ /*
+ * Gobi 2K+ USB layout:
+ * 0: QMI/net
+@@ -297,29 +341,39 @@ static int qcprobe(struct usb_serial *serial, const struct usb_device_id *id)
+ break;
+ case QCSERIAL_HWI:
+ /*
+- * Huawei layout:
+- * 0: AT-capable modem port
+- * 1: DM/DIAG
+- * 2: AT-capable modem port
+- * 3: CCID-compatible PCSC interface
+- * 4: QMI/net
+- * 5: NMEA
++ * Huawei devices map functions by subclass + protocol
++ * instead of interface numbers. The protocol identifies
++ * a specific function, while the subclass indicates a
++ * specific firmware source.
++ *
++ * This is a blacklist of functions known to be
++ * non-serial. The rest are assumed to be serial and
++ * will be handled by this driver
+ */
+- switch (ifnum) {
+- case 0:
+- case 2:
+- dev_dbg(dev, "Modem port found\n");
+- break;
+- case 1:
+- dev_dbg(dev, "DM/DIAG interface found\n");
+- break;
+- case 5:
+- dev_dbg(dev, "NMEA GPS interface found\n");
+- break;
+- default:
+- /* don't claim any unsupported interface */
++ switch (intf->desc.bInterfaceProtocol) {
++ /* QMI combined (qmi_wwan) */
++ case 0x07:
++ case 0x37:
++ case 0x67:
++ /* QMI data (qmi_wwan) */
++ case 0x08:
++ case 0x38:
++ case 0x68:
++ /* QMI control (qmi_wwan) */
++ case 0x09:
++ case 0x39:
++ case 0x69:
++ /* NCM like (huawei_cdc_ncm) */
++ case 0x16:
++ case 0x46:
++ case 0x76:
+ altsetting = -1;
+ break;
++ default:
++ dev_dbg(dev, "Huawei type serial port found (%02x/%02x/%02x)\n",
++ intf->desc.bInterfaceClass,
++ intf->desc.bInterfaceSubClass,
++ intf->desc.bInterfaceProtocol);
+ }
+ break;
+ default:
+diff --git a/drivers/usb/serial/ti_usb_3410_5052.c b/drivers/usb/serial/ti_usb_3410_5052.c
+index e9da41d9fe7f..2694df2f4559 100644
+--- a/drivers/usb/serial/ti_usb_3410_5052.c
++++ b/drivers/usb/serial/ti_usb_3410_5052.c
+@@ -159,6 +159,7 @@ static const struct usb_device_id ti_id_table_3410[] = {
+ { USB_DEVICE(ABBOTT_VENDOR_ID, ABBOTT_STEREO_PLUG_ID) },
+ { USB_DEVICE(ABBOTT_VENDOR_ID, ABBOTT_STRIP_PORT_ID) },
+ { USB_DEVICE(TI_VENDOR_ID, FRI2_PRODUCT_ID) },
++ { USB_DEVICE(HONEYWELL_VENDOR_ID, HONEYWELL_HGI80_PRODUCT_ID) },
+ { } /* terminator */
+ };
+
+@@ -191,6 +192,7 @@ static const struct usb_device_id ti_id_table_combined[] = {
+ { USB_DEVICE(ABBOTT_VENDOR_ID, ABBOTT_PRODUCT_ID) },
+ { USB_DEVICE(ABBOTT_VENDOR_ID, ABBOTT_STRIP_PORT_ID) },
+ { USB_DEVICE(TI_VENDOR_ID, FRI2_PRODUCT_ID) },
++ { USB_DEVICE(HONEYWELL_VENDOR_ID, HONEYWELL_HGI80_PRODUCT_ID) },
+ { } /* terminator */
+ };
+
+diff --git a/drivers/usb/serial/ti_usb_3410_5052.h b/drivers/usb/serial/ti_usb_3410_5052.h
+index 4a2423e84d55..98f35c656c02 100644
+--- a/drivers/usb/serial/ti_usb_3410_5052.h
++++ b/drivers/usb/serial/ti_usb_3410_5052.h
+@@ -56,6 +56,10 @@
+ #define ABBOTT_PRODUCT_ID ABBOTT_STEREO_PLUG_ID
+ #define ABBOTT_STRIP_PORT_ID 0x3420
+
++/* Honeywell vendor and product IDs */
++#define HONEYWELL_VENDOR_ID 0x10ac
++#define HONEYWELL_HGI80_PRODUCT_ID 0x0102 /* Honeywell HGI80 */
++
+ /* Commands */
+ #define TI_GET_VERSION 0x01
+ #define TI_GET_PORT_STATUS 0x02
+diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
+index 96093ae369a5..cdc3d3360764 100644
+--- a/drivers/xen/events/events_base.c
++++ b/drivers/xen/events/events_base.c
+@@ -39,6 +39,7 @@
+ #include <asm/irq.h>
+ #include <asm/idle.h>
+ #include <asm/io_apic.h>
++#include <asm/i8259.h>
+ #include <asm/xen/pci.h>
+ #include <xen/page.h>
+ #endif
+@@ -420,7 +421,7 @@ static int __must_check xen_allocate_irq_gsi(unsigned gsi)
+ return xen_allocate_irq_dynamic();
+
+ /* Legacy IRQ descriptors are already allocated by the arch. */
+- if (gsi < NR_IRQS_LEGACY)
++ if (gsi < nr_legacy_irqs())
+ irq = gsi;
+ else
+ irq = irq_alloc_desc_at(gsi, -1);
+@@ -446,7 +447,7 @@ static void xen_free_irq(unsigned irq)
+ kfree(info);
+
+ /* Legacy IRQ descriptors are managed by the arch. */
+- if (irq < NR_IRQS_LEGACY)
++ if (irq < nr_legacy_irqs())
+ return;
+
+ irq_free_desc(irq);
+diff --git a/fs/proc/array.c b/fs/proc/array.c
+index ce065cf3104f..57fde2dfd4af 100644
+--- a/fs/proc/array.c
++++ b/fs/proc/array.c
+@@ -372,7 +372,7 @@ int proc_pid_status(struct seq_file *m, struct pid_namespace *ns,
+ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
+ struct pid *pid, struct task_struct *task, int whole)
+ {
+- unsigned long vsize, eip, esp, wchan = ~0UL;
++ unsigned long vsize, eip, esp, wchan = 0;
+ int priority, nice;
+ int tty_pgrp = -1, tty_nr = 0;
+ sigset_t sigign, sigcatch;
+@@ -504,7 +504,19 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
+ seq_put_decimal_ull(m, ' ', task->blocked.sig[0] & 0x7fffffffUL);
+ seq_put_decimal_ull(m, ' ', sigign.sig[0] & 0x7fffffffUL);
+ seq_put_decimal_ull(m, ' ', sigcatch.sig[0] & 0x7fffffffUL);
+- seq_put_decimal_ull(m, ' ', wchan);
++
++ /*
++ * We used to output the absolute kernel address, but that's an
++ * information leak - so instead we show a 0/1 flag here, to signal
++ * to user-space whether there's a wchan field in /proc/PID/wchan.
++ *
++ * This works with older implementations of procps as well.
++ */
++ if (wchan)
++ seq_puts(m, " 1");
++ else
++ seq_puts(m, " 0");
++
+ seq_put_decimal_ull(m, ' ', 0);
+ seq_put_decimal_ull(m, ' ', 0);
+ seq_put_decimal_ll(m, ' ', task->exit_signal);
+diff --git a/fs/proc/base.c b/fs/proc/base.c
+index aa50d1ac28fc..83a43c131e9d 100644
+--- a/fs/proc/base.c
++++ b/fs/proc/base.c
+@@ -430,13 +430,10 @@ static int proc_pid_wchan(struct seq_file *m, struct pid_namespace *ns,
+
+ wchan = get_wchan(task);
+
+- if (lookup_symbol_name(wchan, symname) < 0) {
+- if (!ptrace_may_access(task, PTRACE_MODE_READ))
+- return 0;
+- seq_printf(m, "%lu", wchan);
+- } else {
++ if (wchan && ptrace_may_access(task, PTRACE_MODE_READ) && !lookup_symbol_name(wchan, symname))
+ seq_printf(m, "%s", symname);
+- }
++ else
++ seq_putc(m, '0');
+
+ return 0;
+ }
+diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
+index 05e99b8ef465..053f122b592d 100644
+--- a/include/linux/kvm_host.h
++++ b/include/linux/kvm_host.h
+@@ -436,6 +436,17 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
+ (vcpup = kvm_get_vcpu(kvm, idx)) != NULL; \
+ idx++)
+
++static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
++{
++ struct kvm_vcpu *vcpu;
++ int i;
++
++ kvm_for_each_vcpu(i, vcpu, kvm)
++ if (vcpu->vcpu_id == id)
++ return vcpu;
++ return NULL;
++}
++
+ #define kvm_for_each_memslot(memslot, slots) \
+ for (memslot = &slots->memslots[0]; \
+ memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
+diff --git a/include/linux/tty.h b/include/linux/tty.h
+index ad6c8913aa3e..342a760d5729 100644
+--- a/include/linux/tty.h
++++ b/include/linux/tty.h
+@@ -605,7 +605,7 @@ extern void n_tty_inherit_ops(struct tty_ldisc_ops *ops);
+
+ /* tty_audit.c */
+ #ifdef CONFIG_AUDIT
+-extern void tty_audit_add_data(struct tty_struct *tty, unsigned char *data,
++extern void tty_audit_add_data(struct tty_struct *tty, const void *data,
+ size_t size, unsigned icanon);
+ extern void tty_audit_exit(void);
+ extern void tty_audit_fork(struct signal_struct *sig);
+@@ -613,8 +613,8 @@ extern void tty_audit_tiocsti(struct tty_struct *tty, char ch);
+ extern void tty_audit_push(struct tty_struct *tty);
+ extern int tty_audit_push_current(void);
+ #else
+-static inline void tty_audit_add_data(struct tty_struct *tty,
+- unsigned char *data, size_t size, unsigned icanon)
++static inline void tty_audit_add_data(struct tty_struct *tty, const void *data,
++ size_t size, unsigned icanon)
+ {
+ }
+ static inline void tty_audit_tiocsti(struct tty_struct *tty, char ch)
+diff --git a/include/net/inet_common.h b/include/net/inet_common.h
+index 279f83591971..109e3ee9108c 100644
+--- a/include/net/inet_common.h
++++ b/include/net/inet_common.h
+@@ -41,7 +41,8 @@ int inet_recv_error(struct sock *sk, struct msghdr *msg, int len,
+
+ static inline void inet_ctl_sock_destroy(struct sock *sk)
+ {
+- sock_release(sk->sk_socket);
++ if (sk)
++ sock_release(sk->sk_socket);
+ }
+
+ #endif
+diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
+index 5fa643b4e891..ff6d78ff68df 100644
+--- a/include/net/ip_fib.h
++++ b/include/net/ip_fib.h
+@@ -306,7 +306,7 @@ void fib_flush_external(struct net *net);
+
+ /* Exported by fib_semantics.c */
+ int ip_fib_check_default(__be32 gw, struct net_device *dev);
+-int fib_sync_down_dev(struct net_device *dev, unsigned long event);
++int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force);
+ int fib_sync_down_addr(struct net *net, __be32 local);
+ int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
+ void fib_select_multipath(struct fib_result *res);
+diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
+index f1a117f8cad2..0bec4588c3c8 100644
+--- a/net/bluetooth/hidp/core.c
++++ b/net/bluetooth/hidp/core.c
+@@ -401,6 +401,20 @@ static void hidp_idle_timeout(unsigned long arg)
+ {
+ struct hidp_session *session = (struct hidp_session *) arg;
+
++ /* The HIDP user-space API only contains calls to add and remove
++ * devices. There is no way to forward events of any kind. Therefore,
++ * we have to forcefully disconnect a device on idle-timeouts. This is
++ * unfortunate and weird API design, but it is spec-compliant and
++ * required for backwards-compatibility. Hence, on idle-timeout, we
++ * signal driver-detach events, so poll() will be woken up with an
++ * error-condition on both sockets.
++ */
++
++ session->intr_sock->sk->sk_err = EUNATCH;
++ session->ctrl_sock->sk->sk_err = EUNATCH;
++ wake_up_interruptible(sk_sleep(session->intr_sock->sk));
++ wake_up_interruptible(sk_sleep(session->ctrl_sock->sk));
++
+ hidp_session_terminate(session);
+ }
+
+diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
+index 92720f3fe573..e32a9e4910da 100644
+--- a/net/bluetooth/mgmt.c
++++ b/net/bluetooth/mgmt.c
+@@ -3090,6 +3090,11 @@ static int unpair_device(struct sock *sk, struct hci_dev *hdev, void *data,
+ } else {
+ u8 addr_type;
+
++ if (cp->addr.type == BDADDR_LE_PUBLIC)
++ addr_type = ADDR_LE_DEV_PUBLIC;
++ else
++ addr_type = ADDR_LE_DEV_RANDOM;
++
+ conn = hci_conn_hash_lookup_ba(hdev, LE_LINK,
+ &cp->addr.bdaddr);
+ if (conn) {
+@@ -3105,13 +3110,10 @@ static int unpair_device(struct sock *sk, struct hci_dev *hdev, void *data,
+ */
+ if (!cp->disconnect)
+ conn = NULL;
++ } else {
++ hci_conn_params_del(hdev, &cp->addr.bdaddr, addr_type);
+ }
+
+- if (cp->addr.type == BDADDR_LE_PUBLIC)
+- addr_type = ADDR_LE_DEV_PUBLIC;
+- else
+- addr_type = ADDR_LE_DEV_RANDOM;
+-
+ hci_remove_irk(hdev, &cp->addr.bdaddr, addr_type);
+
+ err = hci_remove_ltk(hdev, &cp->addr.bdaddr, addr_type);
+diff --git a/net/core/dst.c b/net/core/dst.c
+index 002144bea935..cc4a086ae09c 100644
+--- a/net/core/dst.c
++++ b/net/core/dst.c
+@@ -287,7 +287,7 @@ void dst_release(struct dst_entry *dst)
+ if (unlikely(newrefcnt < 0))
+ net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
+ __func__, dst, newrefcnt);
+- if (unlikely(dst->flags & DST_NOCACHE) && !newrefcnt)
++ if (!newrefcnt && unlikely(dst->flags & DST_NOCACHE))
+ call_rcu(&dst->rcu_head, dst_destroy_rcu);
+ }
+ }
+diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
+index 6bbc54940eb4..d7116cf4eba4 100644
+--- a/net/ipv4/fib_frontend.c
++++ b/net/ipv4/fib_frontend.c
+@@ -1063,9 +1063,10 @@ static void nl_fib_lookup_exit(struct net *net)
+ net->ipv4.fibnl = NULL;
+ }
+
+-static void fib_disable_ip(struct net_device *dev, unsigned long event)
++static void fib_disable_ip(struct net_device *dev, unsigned long event,
++ bool force)
+ {
+- if (fib_sync_down_dev(dev, event))
++ if (fib_sync_down_dev(dev, event, force))
+ fib_flush(dev_net(dev));
+ rt_cache_flush(dev_net(dev));
+ arp_ifdown(dev);
+@@ -1093,7 +1094,7 @@ static int fib_inetaddr_event(struct notifier_block *this, unsigned long event,
+ /* Last address was deleted from this interface.
+ * Disable IP.
+ */
+- fib_disable_ip(dev, event);
++ fib_disable_ip(dev, event, true);
+ } else {
+ rt_cache_flush(dev_net(dev));
+ }
+@@ -1110,7 +1111,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
+ unsigned int flags;
+
+ if (event == NETDEV_UNREGISTER) {
+- fib_disable_ip(dev, event);
++ fib_disable_ip(dev, event, true);
+ rt_flush_dev(dev);
+ return NOTIFY_DONE;
+ }
+@@ -1131,14 +1132,14 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
+ rt_cache_flush(net);
+ break;
+ case NETDEV_DOWN:
+- fib_disable_ip(dev, event);
++ fib_disable_ip(dev, event, false);
+ break;
+ case NETDEV_CHANGE:
+ flags = dev_get_flags(dev);
+ if (flags & (IFF_RUNNING | IFF_LOWER_UP))
+ fib_sync_up(dev, RTNH_F_LINKDOWN);
+ else
+- fib_sync_down_dev(dev, event);
++ fib_sync_down_dev(dev, event, false);
+ /* fall through */
+ case NETDEV_CHANGEMTU:
+ rt_cache_flush(net);
+diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
+index 3a06586b170c..71bad5c82445 100644
+--- a/net/ipv4/fib_semantics.c
++++ b/net/ipv4/fib_semantics.c
+@@ -1132,7 +1132,13 @@ int fib_sync_down_addr(struct net *net, __be32 local)
+ return ret;
+ }
+
+-int fib_sync_down_dev(struct net_device *dev, unsigned long event)
++/* Event force Flags Description
++ * NETDEV_CHANGE 0 LINKDOWN Carrier OFF, not for scope host
++ * NETDEV_DOWN 0 LINKDOWN|DEAD Link down, not for scope host
++ * NETDEV_DOWN 1 LINKDOWN|DEAD Last address removed
++ * NETDEV_UNREGISTER 1 LINKDOWN|DEAD Device removed
++ */
++int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force)
+ {
+ int ret = 0;
+ int scope = RT_SCOPE_NOWHERE;
+@@ -1141,8 +1147,7 @@ int fib_sync_down_dev(struct net_device *dev, unsigned long event)
+ struct hlist_head *head = &fib_info_devhash[hash];
+ struct fib_nh *nh;
+
+- if (event == NETDEV_UNREGISTER ||
+- event == NETDEV_DOWN)
++ if (force)
+ scope = -1;
+
+ hlist_for_each_entry(nh, head, nh_hash) {
+@@ -1291,6 +1296,13 @@ int fib_sync_up(struct net_device *dev, unsigned int nh_flags)
+ if (!(dev->flags & IFF_UP))
+ return 0;
+
++ if (nh_flags & RTNH_F_DEAD) {
++ unsigned int flags = dev_get_flags(dev);
++
++ if (flags & (IFF_RUNNING | IFF_LOWER_UP))
++ nh_flags |= RTNH_F_LINKDOWN;
++ }
++
+ prev_fi = NULL;
+ hash = fib_devindex_hashfn(dev->ifindex);
+ head = &fib_info_devhash[hash];
+diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
+index b0c6258ffb79..ea3aedb7dd0e 100644
+--- a/net/ipv4/fib_trie.c
++++ b/net/ipv4/fib_trie.c
+@@ -1561,7 +1561,7 @@ static struct key_vector *leaf_walk_rcu(struct key_vector **tn, t_key key)
+ do {
+ /* record parent and next child index */
+ pn = n;
+- cindex = key ? get_index(key, pn) : 0;
++ cindex = (key > pn->key) ? get_index(key, pn) : 0;
+
+ if (cindex >> pn->bits)
+ break;
+diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
+index 5aa46d4b44ef..5a8ee3282550 100644
+--- a/net/ipv4/gre_offload.c
++++ b/net/ipv4/gre_offload.c
+@@ -36,7 +36,8 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
+ SKB_GSO_TCP_ECN |
+ SKB_GSO_GRE |
+ SKB_GSO_GRE_CSUM |
+- SKB_GSO_IPIP)))
++ SKB_GSO_IPIP |
++ SKB_GSO_SIT)))
+ goto out;
+
+ if (!skb->encapsulation)
+diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
+index 3a2c0162c3ba..df28693f32e1 100644
+--- a/net/ipv4/ipmr.c
++++ b/net/ipv4/ipmr.c
+@@ -1683,8 +1683,8 @@ static inline int ipmr_forward_finish(struct sock *sk, struct sk_buff *skb)
+ {
+ struct ip_options *opt = &(IPCB(skb)->opt);
+
+- IP_INC_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);
+- IP_ADD_STATS_BH(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTOCTETS, skb->len);
++ IP_INC_STATS(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTFORWDATAGRAMS);
++ IP_ADD_STATS(dev_net(skb_dst(skb)->dev), IPSTATS_MIB_OUTOCTETS, skb->len);
+
+ if (unlikely(opt->optlen))
+ ip_forward_options(skb);
+@@ -1746,7 +1746,7 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
+ * to blackhole.
+ */
+
+- IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
++ IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
+ ip_rt_put(rt);
+ goto out_free;
+ }
+diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
+index 0330ab2e2b63..a1442c5a3e0c 100644
+--- a/net/ipv4/sysctl_net_ipv4.c
++++ b/net/ipv4/sysctl_net_ipv4.c
+@@ -47,14 +47,14 @@ static void set_local_port_range(struct net *net, int range[2])
+ {
+ bool same_parity = !((range[0] ^ range[1]) & 1);
+
+- write_seqlock(&net->ipv4.ip_local_ports.lock);
++ write_seqlock_bh(&net->ipv4.ip_local_ports.lock);
+ if (same_parity && !net->ipv4.ip_local_ports.warned) {
+ net->ipv4.ip_local_ports.warned = true;
+ pr_err_ratelimited("ip_local_port_range: prefer different parity for start/end values.\n");
+ }
+ net->ipv4.ip_local_ports.range[0] = range[0];
+ net->ipv4.ip_local_ports.range[1] = range[1];
+- write_sequnlock(&net->ipv4.ip_local_ports.lock);
++ write_sequnlock_bh(&net->ipv4.ip_local_ports.lock);
+ }
+
+ /* Validate changes from /proc interface. */
+diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
+index b7dedd9d36d8..747a4c47e070 100644
+--- a/net/ipv4/tcp_output.c
++++ b/net/ipv4/tcp_output.c
+@@ -3406,7 +3406,7 @@ static int tcp_xmit_probe_skb(struct sock *sk, int urgent, int mib)
+ */
+ tcp_init_nondata_skb(skb, tp->snd_una - !urgent, TCPHDR_ACK);
+ skb_mstamp_get(&skb->skb_mstamp);
+- NET_INC_STATS_BH(sock_net(sk), mib);
++ NET_INC_STATS(sock_net(sk), mib);
+ return tcp_transmit_skb(sk, skb, 0, GFP_ATOMIC);
+ }
+
+diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
+index 21c2c818df3b..c8c1fea06003 100644
+--- a/net/ipv6/addrconf.c
++++ b/net/ipv6/addrconf.c
+@@ -411,6 +411,7 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
+ if (err) {
+ ipv6_mc_destroy_dev(ndev);
+ del_timer(&ndev->regen_timer);
++ snmp6_unregister_dev(ndev);
+ goto err_release;
+ }
+ /* protected by rtnl_lock */
+diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
+index ac35a28599be..85c4b2fff504 100644
+--- a/net/ipv6/sit.c
++++ b/net/ipv6/sit.c
+@@ -1394,34 +1394,20 @@ static int ipip6_tunnel_init(struct net_device *dev)
+ return 0;
+ }
+
+-static int __net_init ipip6_fb_tunnel_init(struct net_device *dev)
++static void __net_init ipip6_fb_tunnel_init(struct net_device *dev)
+ {
+ struct ip_tunnel *tunnel = netdev_priv(dev);
+ struct iphdr *iph = &tunnel->parms.iph;
+ struct net *net = dev_net(dev);
+ struct sit_net *sitn = net_generic(net, sit_net_id);
+
+- tunnel->dev = dev;
+- tunnel->net = dev_net(dev);
+-
+ iph->version = 4;
+ iph->protocol = IPPROTO_IPV6;
+ iph->ihl = 5;
+ iph->ttl = 64;
+
+- dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
+- if (!dev->tstats)
+- return -ENOMEM;
+-
+- tunnel->dst_cache = alloc_percpu(struct ip_tunnel_dst);
+- if (!tunnel->dst_cache) {
+- free_percpu(dev->tstats);
+- return -ENOMEM;
+- }
+-
+ dev_hold(dev);
+ rcu_assign_pointer(sitn->tunnels_wc[0], tunnel);
+- return 0;
+ }
+
+ static int ipip6_validate(struct nlattr *tb[], struct nlattr *data[])
+@@ -1831,23 +1817,19 @@ static int __net_init sit_init_net(struct net *net)
+ */
+ sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
+
+- err = ipip6_fb_tunnel_init(sitn->fb_tunnel_dev);
+- if (err)
+- goto err_dev_free;
+-
+- ipip6_tunnel_clone_6rd(sitn->fb_tunnel_dev, sitn);
+ err = register_netdev(sitn->fb_tunnel_dev);
+ if (err)
+ goto err_reg_dev;
+
++ ipip6_tunnel_clone_6rd(sitn->fb_tunnel_dev, sitn);
++ ipip6_fb_tunnel_init(sitn->fb_tunnel_dev);
++
+ t = netdev_priv(sitn->fb_tunnel_dev);
+
+ strcpy(t->parms.name, sitn->fb_tunnel_dev->name);
+ return 0;
+
+ err_reg_dev:
+- dev_put(sitn->fb_tunnel_dev);
+-err_dev_free:
+ ipip6_dev_free(sitn->fb_tunnel_dev);
+ err_alloc_dev:
+ return err;
+diff --git a/net/irda/irlmp.c b/net/irda/irlmp.c
+index a26c401ef4a4..43964594aa12 100644
+--- a/net/irda/irlmp.c
++++ b/net/irda/irlmp.c
+@@ -1839,7 +1839,7 @@ static void *irlmp_seq_hb_idx(struct irlmp_iter_state *iter, loff_t *off)
+ for (element = hashbin_get_first(iter->hashbin);
+ element != NULL;
+ element = hashbin_get_next(iter->hashbin)) {
+- if (!off || *off-- == 0) {
++ if (!off || (*off)-- == 0) {
+ /* NB: hashbin left locked */
+ return element;
+ }
+diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
+index 9b2cc278ac2a..33bf779df350 100644
+--- a/net/mac80211/mlme.c
++++ b/net/mac80211/mlme.c
+@@ -3378,7 +3378,7 @@ static void ieee80211_rx_mgmt_beacon(struct ieee80211_sub_if_data *sdata,
+
+ if (ifmgd->rssi_min_thold != ifmgd->rssi_max_thold &&
+ ifmgd->count_beacon_signal >= IEEE80211_SIGNAL_AVE_MIN_COUNT) {
+- int sig = ifmgd->ave_beacon_signal;
++ int sig = ifmgd->ave_beacon_signal / 16;
+ int last_sig = ifmgd->last_ave_beacon_signal;
+ struct ieee80211_event event = {
+ .type = RSSI_EVENT,
+@@ -4999,6 +4999,25 @@ int ieee80211_mgd_deauth(struct ieee80211_sub_if_data *sdata,
+ return 0;
+ }
+
++ if (ifmgd->assoc_data &&
++ ether_addr_equal(ifmgd->assoc_data->bss->bssid, req->bssid)) {
++ sdata_info(sdata,
++ "aborting association with %pM by local choice (Reason: %u=%s)\n",
++ req->bssid, req->reason_code,
++ ieee80211_get_reason_code_string(req->reason_code));
++
++ drv_mgd_prepare_tx(sdata->local, sdata);
++ ieee80211_send_deauth_disassoc(sdata, req->bssid,
++ IEEE80211_STYPE_DEAUTH,
++ req->reason_code, tx,
++ frame_buf);
++ ieee80211_destroy_assoc_data(sdata, false);
++ ieee80211_report_disconnect(sdata, frame_buf,
++ sizeof(frame_buf), true,
++ req->reason_code);
++ return 0;
++ }
++
+ if (ifmgd->associated &&
+ ether_addr_equal(ifmgd->associated->bssid, req->bssid)) {
+ sdata_info(sdata,
+diff --git a/net/mac80211/trace.h b/net/mac80211/trace.h
+index 6f14591d8ca9..0b13bfa6f32f 100644
+--- a/net/mac80211/trace.h
++++ b/net/mac80211/trace.h
+@@ -33,11 +33,11 @@
+ __field(u32, chan_width) \
+ __field(u32, center_freq1) \
+ __field(u32, center_freq2)
+-#define CHANDEF_ASSIGN(c) \
+- __entry->control_freq = (c)->chan ? (c)->chan->center_freq : 0; \
+- __entry->chan_width = (c)->width; \
+- __entry->center_freq1 = (c)->center_freq1; \
+- __entry->center_freq2 = (c)->center_freq2;
++#define CHANDEF_ASSIGN(c) \
++ __entry->control_freq = (c) ? ((c)->chan ? (c)->chan->center_freq : 0) : 0; \
++ __entry->chan_width = (c) ? (c)->width : 0; \
++ __entry->center_freq1 = (c) ? (c)->center_freq1 : 0; \
++ __entry->center_freq2 = (c) ? (c)->center_freq2 : 0;
+ #define CHANDEF_PR_FMT " control:%d MHz width:%d center: %d/%d MHz"
+ #define CHANDEF_PR_ARG __entry->control_freq, __entry->chan_width, \
+ __entry->center_freq1, __entry->center_freq2
+diff --git a/net/mac80211/util.c b/net/mac80211/util.c
+index 43e5aadd7a89..f5fa8c09cb42 100644
+--- a/net/mac80211/util.c
++++ b/net/mac80211/util.c
+@@ -2984,6 +2984,13 @@ ieee80211_extend_noa_desc(struct ieee80211_noa_data *data, u32 tsf, int i)
+ if (end > 0)
+ return false;
+
++ /* One shot NOA */
++ if (data->count[i] == 1)
++ return false;
++
++ if (data->desc[i].interval == 0)
++ return false;
++
+ /* End time is in the past, check for repetitions */
+ skip = DIV_ROUND_UP(-end, data->desc[i].interval);
+ if (data->count[i] < 255) {
+diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
+index a133d16eb053..8b158f71bff6 100644
+--- a/net/netlink/af_netlink.c
++++ b/net/netlink/af_netlink.c
+@@ -2346,7 +2346,7 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname,
+ int pos, idx, shift;
+
+ err = 0;
+- netlink_table_grab();
++ netlink_lock_table();
+ for (pos = 0; pos * 8 < nlk->ngroups; pos += sizeof(u32)) {
+ if (len - pos < sizeof(u32))
+ break;
+@@ -2361,7 +2361,7 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname,
+ }
+ if (put_user(ALIGN(nlk->ngroups / 8, sizeof(u32)), optlen))
+ err = -EFAULT;
+- netlink_table_ungrab();
++ netlink_unlock_table();
+ break;
+ }
+ default:
+diff --git a/net/nfc/nci/hci.c b/net/nfc/nci/hci.c
+index 609f92283d1b..30b09f04c142 100644
+--- a/net/nfc/nci/hci.c
++++ b/net/nfc/nci/hci.c
+@@ -101,6 +101,20 @@ struct nci_hcp_packet {
+ #define NCI_HCP_MSG_GET_CMD(header) (header & 0x3f)
+ #define NCI_HCP_MSG_GET_PIPE(header) (header & 0x7f)
+
++static int nci_hci_result_to_errno(u8 result)
++{
++ switch (result) {
++ case NCI_HCI_ANY_OK:
++ return 0;
++ case NCI_HCI_ANY_E_REG_PAR_UNKNOWN:
++ return -EOPNOTSUPP;
++ case NCI_HCI_ANY_E_TIMEOUT:
++ return -ETIME;
++ default:
++ return -1;
++ }
++}
++
+ /* HCI core */
+ static void nci_hci_reset_pipes(struct nci_hci_dev *hdev)
+ {
+@@ -146,18 +160,18 @@ static int nci_hci_send_data(struct nci_dev *ndev, u8 pipe,
+ if (!conn_info)
+ return -EPROTO;
+
+- skb = nci_skb_alloc(ndev, 2 + conn_info->max_pkt_payload_len +
++ i = 0;
++ skb = nci_skb_alloc(ndev, conn_info->max_pkt_payload_len +
+ NCI_DATA_HDR_SIZE, GFP_KERNEL);
+ if (!skb)
+ return -ENOMEM;
+
+- skb_reserve(skb, 2 + NCI_DATA_HDR_SIZE);
++ skb_reserve(skb, NCI_DATA_HDR_SIZE + 2);
+ *skb_push(skb, 1) = data_type;
+
+- i = 0;
+- len = conn_info->max_pkt_payload_len;
+-
+ do {
++ len = conn_info->max_pkt_payload_len;
++
+ /* If last packet add NCI_HFP_NO_CHAINING */
+ if (i + conn_info->max_pkt_payload_len -
+ (skb->len + 1) >= data_len) {
+@@ -177,9 +191,15 @@ static int nci_hci_send_data(struct nci_dev *ndev, u8 pipe,
+ return r;
+
+ i += len;
++
+ if (i < data_len) {
+- skb_trim(skb, 0);
+- skb_pull(skb, len);
++ skb = nci_skb_alloc(ndev,
++ conn_info->max_pkt_payload_len +
++ NCI_DATA_HDR_SIZE, GFP_KERNEL);
++ if (!skb)
++ return -ENOMEM;
++
++ skb_reserve(skb, NCI_DATA_HDR_SIZE + 1);
+ }
+ } while (i < data_len);
+
+@@ -212,7 +232,8 @@ int nci_hci_send_cmd(struct nci_dev *ndev, u8 gate, u8 cmd,
+ const u8 *param, size_t param_len,
+ struct sk_buff **skb)
+ {
+- struct nci_conn_info *conn_info;
++ struct nci_hcp_message *message;
++ struct nci_conn_info *conn_info;
+ struct nci_data data;
+ int r;
+ u8 pipe = ndev->hci_dev->gate2pipe[gate];
+@@ -232,9 +253,15 @@ int nci_hci_send_cmd(struct nci_dev *ndev, u8 gate, u8 cmd,
+
+ r = nci_request(ndev, nci_hci_send_data_req, (unsigned long)&data,
+ msecs_to_jiffies(NCI_DATA_TIMEOUT));
+-
+- if (r == NCI_STATUS_OK && skb)
+- *skb = conn_info->rx_skb;
++ if (r == NCI_STATUS_OK) {
++ message = (struct nci_hcp_message *)conn_info->rx_skb->data;
++ r = nci_hci_result_to_errno(
++ NCI_HCP_MSG_GET_CMD(message->header));
++ skb_pull(conn_info->rx_skb, NCI_HCI_HCP_MESSAGE_HEADER_LEN);
++
++ if (!r && skb)
++ *skb = conn_info->rx_skb;
++ }
+
+ return r;
+ }
+@@ -328,9 +355,6 @@ static void nci_hci_resp_received(struct nci_dev *ndev, u8 pipe,
+ struct nci_conn_info *conn_info;
+ u8 status = result;
+
+- if (result != NCI_HCI_ANY_OK)
+- goto exit;
+-
+ conn_info = ndev->hci_dev->conn_info;
+ if (!conn_info) {
+ status = NCI_STATUS_REJECTED;
+@@ -340,7 +364,7 @@ static void nci_hci_resp_received(struct nci_dev *ndev, u8 pipe,
+ conn_info->rx_skb = skb;
+
+ exit:
+- nci_req_complete(ndev, status);
++ nci_req_complete(ndev, NCI_STATUS_OK);
+ }
+
+ /* Receive hcp message for pipe, with type and cmd.
+@@ -378,7 +402,7 @@ static void nci_hci_msg_rx_work(struct work_struct *work)
+ u8 pipe, type, instruction;
+
+ while ((skb = skb_dequeue(&hdev->msg_rx_queue)) != NULL) {
+- pipe = skb->data[0];
++ pipe = NCI_HCP_MSG_GET_PIPE(skb->data[0]);
+ skb_pull(skb, NCI_HCI_HCP_PACKET_HEADER_LEN);
+ message = (struct nci_hcp_message *)skb->data;
+ type = NCI_HCP_MSG_GET_TYPE(message->header);
+@@ -395,7 +419,7 @@ void nci_hci_data_received_cb(void *context,
+ {
+ struct nci_dev *ndev = (struct nci_dev *)context;
+ struct nci_hcp_packet *packet;
+- u8 pipe, type, instruction;
++ u8 pipe, type;
+ struct sk_buff *hcp_skb;
+ struct sk_buff *frag_skb;
+ int msg_len;
+@@ -415,7 +439,7 @@ void nci_hci_data_received_cb(void *context,
+
+ /* it's the last fragment. Does it need re-aggregation? */
+ if (skb_queue_len(&ndev->hci_dev->rx_hcp_frags)) {
+- pipe = packet->header & NCI_HCI_FRAGMENT;
++ pipe = NCI_HCP_MSG_GET_PIPE(packet->header);
+ skb_queue_tail(&ndev->hci_dev->rx_hcp_frags, skb);
+
+ msg_len = 0;
+@@ -434,7 +458,7 @@ void nci_hci_data_received_cb(void *context,
+ *skb_put(hcp_skb, NCI_HCI_HCP_PACKET_HEADER_LEN) = pipe;
+
+ skb_queue_walk(&ndev->hci_dev->rx_hcp_frags, frag_skb) {
+- msg_len = frag_skb->len - NCI_HCI_HCP_PACKET_HEADER_LEN;
++ msg_len = frag_skb->len - NCI_HCI_HCP_PACKET_HEADER_LEN;
+ memcpy(skb_put(hcp_skb, msg_len), frag_skb->data +
+ NCI_HCI_HCP_PACKET_HEADER_LEN, msg_len);
+ }
+@@ -452,11 +476,10 @@ void nci_hci_data_received_cb(void *context,
+ packet = (struct nci_hcp_packet *)hcp_skb->data;
+ type = NCI_HCP_MSG_GET_TYPE(packet->message.header);
+ if (type == NCI_HCI_HCP_RESPONSE) {
+- pipe = packet->header;
+- instruction = NCI_HCP_MSG_GET_CMD(packet->message.header);
+- skb_pull(hcp_skb, NCI_HCI_HCP_PACKET_HEADER_LEN +
+- NCI_HCI_HCP_MESSAGE_HEADER_LEN);
+- nci_hci_hcp_message_rx(ndev, pipe, type, instruction, hcp_skb);
++ pipe = NCI_HCP_MSG_GET_PIPE(packet->header);
++ skb_pull(hcp_skb, NCI_HCI_HCP_PACKET_HEADER_LEN);
++ nci_hci_hcp_message_rx(ndev, pipe, type,
++ NCI_STATUS_OK, hcp_skb);
+ } else {
+ skb_queue_tail(&ndev->hci_dev->msg_rx_queue, hcp_skb);
+ schedule_work(&ndev->hci_dev->msg_rx_work);
+@@ -488,6 +511,7 @@ EXPORT_SYMBOL(nci_hci_open_pipe);
+ int nci_hci_set_param(struct nci_dev *ndev, u8 gate, u8 idx,
+ const u8 *param, size_t param_len)
+ {
++ struct nci_hcp_message *message;
+ struct nci_conn_info *conn_info;
+ struct nci_data data;
+ int r;
+@@ -520,6 +544,12 @@ int nci_hci_set_param(struct nci_dev *ndev, u8 gate, u8 idx,
+ r = nci_request(ndev, nci_hci_send_data_req,
+ (unsigned long)&data,
+ msecs_to_jiffies(NCI_DATA_TIMEOUT));
++ if (r == NCI_STATUS_OK) {
++ message = (struct nci_hcp_message *)conn_info->rx_skb->data;
++ r = nci_hci_result_to_errno(
++ NCI_HCP_MSG_GET_CMD(message->header));
++ skb_pull(conn_info->rx_skb, NCI_HCI_HCP_MESSAGE_HEADER_LEN);
++ }
+
+ kfree(tmp);
+ return r;
+@@ -529,6 +559,7 @@ EXPORT_SYMBOL(nci_hci_set_param);
+ int nci_hci_get_param(struct nci_dev *ndev, u8 gate, u8 idx,
+ struct sk_buff **skb)
+ {
++ struct nci_hcp_message *message;
+ struct nci_conn_info *conn_info;
+ struct nci_data data;
+ int r;
+@@ -553,8 +584,15 @@ int nci_hci_get_param(struct nci_dev *ndev, u8 gate, u8 idx,
+ r = nci_request(ndev, nci_hci_send_data_req, (unsigned long)&data,
+ msecs_to_jiffies(NCI_DATA_TIMEOUT));
+
+- if (r == NCI_STATUS_OK)
+- *skb = conn_info->rx_skb;
++ if (r == NCI_STATUS_OK) {
++ message = (struct nci_hcp_message *)conn_info->rx_skb->data;
++ r = nci_hci_result_to_errno(
++ NCI_HCP_MSG_GET_CMD(message->header));
++ skb_pull(conn_info->rx_skb, NCI_HCI_HCP_MESSAGE_HEADER_LEN);
++
++ if (!r && skb)
++ *skb = conn_info->rx_skb;
++ }
+
+ return r;
+ }
+diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
+index 7851b1222a36..71cb085e16fd 100644
+--- a/net/packet/af_packet.c
++++ b/net/packet/af_packet.c
+@@ -2784,22 +2784,40 @@ static int packet_release(struct socket *sock)
+ * Attach a packet hook.
+ */
+
+-static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 proto)
++static int packet_do_bind(struct sock *sk, const char *name, int ifindex,
++ __be16 proto)
+ {
+ struct packet_sock *po = pkt_sk(sk);
+ struct net_device *dev_curr;
+ __be16 proto_curr;
+ bool need_rehook;
++ struct net_device *dev = NULL;
++ int ret = 0;
++ bool unlisted = false;
+
+- if (po->fanout) {
+- if (dev)
+- dev_put(dev);
+-
++ if (po->fanout)
+ return -EINVAL;
+- }
+
+ lock_sock(sk);
+ spin_lock(&po->bind_lock);
++ rcu_read_lock();
++
++ if (name) {
++ dev = dev_get_by_name_rcu(sock_net(sk), name);
++ if (!dev) {
++ ret = -ENODEV;
++ goto out_unlock;
++ }
++ } else if (ifindex) {
++ dev = dev_get_by_index_rcu(sock_net(sk), ifindex);
++ if (!dev) {
++ ret = -ENODEV;
++ goto out_unlock;
++ }
++ }
++
++ if (dev)
++ dev_hold(dev);
+
+ proto_curr = po->prot_hook.type;
+ dev_curr = po->prot_hook.dev;
+@@ -2807,14 +2825,29 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 proto)
+ need_rehook = proto_curr != proto || dev_curr != dev;
+
+ if (need_rehook) {
+- unregister_prot_hook(sk, true);
++ if (po->running) {
++ rcu_read_unlock();
++ __unregister_prot_hook(sk, true);
++ rcu_read_lock();
++ dev_curr = po->prot_hook.dev;
++ if (dev)
++ unlisted = !dev_get_by_index_rcu(sock_net(sk),
++ dev->ifindex);
++ }
+
+ po->num = proto;
+ po->prot_hook.type = proto;
+- po->prot_hook.dev = dev;
+
+- po->ifindex = dev ? dev->ifindex : 0;
+- packet_cached_dev_assign(po, dev);
++ if (unlikely(unlisted)) {
++ dev_put(dev);
++ po->prot_hook.dev = NULL;
++ po->ifindex = -1;
++ packet_cached_dev_reset(po);
++ } else {
++ po->prot_hook.dev = dev;
++ po->ifindex = dev ? dev->ifindex : 0;
++ packet_cached_dev_assign(po, dev);
++ }
+ }
+ if (dev_curr)
+ dev_put(dev_curr);
+@@ -2822,7 +2855,7 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 proto)
+ if (proto == 0 || !need_rehook)
+ goto out_unlock;
+
+- if (!dev || (dev->flags & IFF_UP)) {
++ if (!unlisted && (!dev || (dev->flags & IFF_UP))) {
+ register_prot_hook(sk);
+ } else {
+ sk->sk_err = ENETDOWN;
+@@ -2831,9 +2864,10 @@ static int packet_do_bind(struct sock *sk, struct net_device *dev, __be16 proto)
+ }
+
+ out_unlock:
++ rcu_read_unlock();
+ spin_unlock(&po->bind_lock);
+ release_sock(sk);
+- return 0;
++ return ret;
+ }
+
+ /*
+@@ -2845,8 +2879,6 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr,
+ {
+ struct sock *sk = sock->sk;
+ char name[15];
+- struct net_device *dev;
+- int err = -ENODEV;
+
+ /*
+ * Check legality
+@@ -2856,19 +2888,13 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr,
+ return -EINVAL;
+ strlcpy(name, uaddr->sa_data, sizeof(name));
+
+- dev = dev_get_by_name(sock_net(sk), name);
+- if (dev)
+- err = packet_do_bind(sk, dev, pkt_sk(sk)->num);
+- return err;
++ return packet_do_bind(sk, name, 0, pkt_sk(sk)->num);
+ }
+
+ static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
+ {
+ struct sockaddr_ll *sll = (struct sockaddr_ll *)uaddr;
+ struct sock *sk = sock->sk;
+- struct net_device *dev = NULL;
+- int err;
+-
+
+ /*
+ * Check legality
+@@ -2879,16 +2905,8 @@ static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len
+ if (sll->sll_family != AF_PACKET)
+ return -EINVAL;
+
+- if (sll->sll_ifindex) {
+- err = -ENODEV;
+- dev = dev_get_by_index(sock_net(sk), sll->sll_ifindex);
+- if (dev == NULL)
+- goto out;
+- }
+- err = packet_do_bind(sk, dev, sll->sll_protocol ? : pkt_sk(sk)->num);
+-
+-out:
+- return err;
++ return packet_do_bind(sk, NULL, sll->sll_ifindex,
++ sll->sll_protocol ? : pkt_sk(sk)->num);
+ }
+
+ static struct proto packet_proto = {
+diff --git a/net/rds/connection.c b/net/rds/connection.c
+index da6da57e5f36..9d66705f9d41 100644
+--- a/net/rds/connection.c
++++ b/net/rds/connection.c
+@@ -187,6 +187,12 @@ new_conn:
+ }
+ }
+
++ if (trans == NULL) {
++ kmem_cache_free(rds_conn_slab, conn);
++ conn = ERR_PTR(-ENODEV);
++ goto out;
++ }
++
+ conn->c_trans = trans;
+
+ ret = trans->conn_alloc(conn, gfp);
+diff --git a/net/rds/tcp_recv.c b/net/rds/tcp_recv.c
+index fbc5ef88bc0e..27a992154804 100644
+--- a/net/rds/tcp_recv.c
++++ b/net/rds/tcp_recv.c
+@@ -214,8 +214,15 @@ static int rds_tcp_data_recv(read_descriptor_t *desc, struct sk_buff *skb,
+ }
+
+ to_copy = min(tc->t_tinc_data_rem, left);
+- pskb_pull(clone, offset);
+- pskb_trim(clone, to_copy);
++ if (!pskb_pull(clone, offset) ||
++ pskb_trim(clone, to_copy)) {
++ pr_warn("rds_tcp_data_recv: pull/trim failed "
++ "left %zu data_rem %zu skb_len %d\n",
++ left, tc->t_tinc_data_rem, skb->len);
++ kfree_skb(clone);
++ desc->error = -ENOMEM;
++ goto out;
++ }
+ skb_queue_tail(&tinc->ti_skb_list, clone);
+
+ rdsdebug("skb %p data %p len %d off %u to_copy %zu -> "
+diff --git a/net/tipc/msg.c b/net/tipc/msg.c
+index 08b4cc7d496d..b3a393104b17 100644
+--- a/net/tipc/msg.c
++++ b/net/tipc/msg.c
+@@ -121,7 +121,7 @@ int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf)
+ {
+ struct sk_buff *head = *headbuf;
+ struct sk_buff *frag = *buf;
+- struct sk_buff *tail;
++ struct sk_buff *tail = NULL;
+ struct tipc_msg *msg;
+ u32 fragid;
+ int delta;
+@@ -141,9 +141,15 @@ int tipc_buf_append(struct sk_buff **headbuf, struct sk_buff **buf)
+ if (unlikely(skb_unclone(frag, GFP_ATOMIC)))
+ goto err;
+ head = *headbuf = frag;
+- skb_frag_list_init(head);
+- TIPC_SKB_CB(head)->tail = NULL;
+ *buf = NULL;
++ TIPC_SKB_CB(head)->tail = NULL;
++ if (skb_is_nonlinear(head)) {
++ skb_walk_frags(head, tail) {
++ TIPC_SKB_CB(head)->tail = tail;
++ }
++ } else {
++ skb_frag_list_init(head);
++ }
+ return 0;
+ }
+
+diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
+index 66deebc66aa1..f8dfee5072c0 100644
+--- a/net/tipc/udp_media.c
++++ b/net/tipc/udp_media.c
+@@ -48,6 +48,7 @@
+ #include <linux/tipc_netlink.h>
+ #include "core.h"
+ #include "bearer.h"
++#include "msg.h"
+
+ /* IANA assigned UDP port */
+ #define UDP_PORT_DEFAULT 6118
+@@ -216,6 +217,10 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb)
+ {
+ struct udp_bearer *ub;
+ struct tipc_bearer *b;
++ int usr = msg_user(buf_msg(skb));
++
++ if ((usr == LINK_PROTOCOL) || (usr == NAME_DISTRIBUTOR))
++ skb_linearize(skb);
+
+ ub = rcu_dereference_sk_user_data(sk);
+ if (!ub) {
+diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
+index 76b41578a838..d059cf31d754 100644
+--- a/net/wireless/nl80211.c
++++ b/net/wireless/nl80211.c
+@@ -3408,12 +3408,6 @@ static int nl80211_start_ap(struct sk_buff *skb, struct genl_info *info)
+ wdev->iftype))
+ return -EINVAL;
+
+- if (info->attrs[NL80211_ATTR_ACL_POLICY]) {
+- params.acl = parse_acl_data(&rdev->wiphy, info);
+- if (IS_ERR(params.acl))
+- return PTR_ERR(params.acl);
+- }
+-
+ if (info->attrs[NL80211_ATTR_SMPS_MODE]) {
+ params.smps_mode =
+ nla_get_u8(info->attrs[NL80211_ATTR_SMPS_MODE]);
+@@ -3437,6 +3431,12 @@ static int nl80211_start_ap(struct sk_buff *skb, struct genl_info *info)
+ params.smps_mode = NL80211_SMPS_OFF;
+ }
+
++ if (info->attrs[NL80211_ATTR_ACL_POLICY]) {
++ params.acl = parse_acl_data(&rdev->wiphy, info);
++ if (IS_ERR(params.acl))
++ return PTR_ERR(params.acl);
++ }
++
+ wdev_lock(wdev);
+ err = rdev_start_ap(rdev, dev, &params);
+ if (!err) {
+diff --git a/sound/usb/midi.c b/sound/usb/midi.c
+index 417ebb11cf48..bec63e0d2605 100644
+--- a/sound/usb/midi.c
++++ b/sound/usb/midi.c
+@@ -174,6 +174,8 @@ struct snd_usb_midi_in_endpoint {
+ u8 running_status_length;
+ } ports[0x10];
+ u8 seen_f5;
++ bool in_sysex;
++ u8 last_cin;
+ u8 error_resubmit;
+ int current_port;
+ };
+@@ -468,6 +470,39 @@ static void snd_usbmidi_maudio_broken_running_status_input(
+ }
+
+ /*
++ * QinHeng CH345 is buggy: every second packet inside a SysEx has not CIN 4
++ * but the previously seen CIN, but still with three data bytes.
++ */
++static void ch345_broken_sysex_input(struct snd_usb_midi_in_endpoint *ep,
++ uint8_t *buffer, int buffer_length)
++{
++ unsigned int i, cin, length;
++
++ for (i = 0; i + 3 < buffer_length; i += 4) {
++ if (buffer[i] == 0 && i > 0)
++ break;
++ cin = buffer[i] & 0x0f;
++ if (ep->in_sysex &&
++ cin == ep->last_cin &&
++ (buffer[i + 1 + (cin == 0x6)] & 0x80) == 0)
++ cin = 0x4;
++#if 0
++ if (buffer[i + 1] == 0x90) {
++ /*
++ * Either a corrupted running status or a real note-on
++ * message; impossible to detect reliably.
++ */
++ }
++#endif
++ length = snd_usbmidi_cin_length[cin];
++ snd_usbmidi_input_data(ep, 0, &buffer[i + 1], length);
++ ep->in_sysex = cin == 0x4;
++ if (!ep->in_sysex)
++ ep->last_cin = cin;
++ }
++}
++
++/*
+ * CME protocol: like the standard protocol, but SysEx commands are sent as a
+ * single USB packet preceded by a 0x0F byte.
+ */
+@@ -660,6 +695,12 @@ static struct usb_protocol_ops snd_usbmidi_cme_ops = {
+ .output_packet = snd_usbmidi_output_standard_packet,
+ };
+
++static struct usb_protocol_ops snd_usbmidi_ch345_broken_sysex_ops = {
++ .input = ch345_broken_sysex_input,
++ .output = snd_usbmidi_standard_output,
++ .output_packet = snd_usbmidi_output_standard_packet,
++};
++
+ /*
+ * AKAI MPD16 protocol:
+ *
+@@ -1341,6 +1382,7 @@ static int snd_usbmidi_out_endpoint_create(struct snd_usb_midi *umidi,
+ * Various chips declare a packet size larger than 4 bytes, but
+ * do not actually work with larger packets:
+ */
++ case USB_ID(0x0a67, 0x5011): /* Medeli DD305 */
+ case USB_ID(0x0a92, 0x1020): /* ESI M4U */
+ case USB_ID(0x1430, 0x474b): /* RedOctane GH MIDI INTERFACE */
+ case USB_ID(0x15ca, 0x0101): /* Textech USB Midi Cable */
+@@ -2375,6 +2417,10 @@ int snd_usbmidi_create(struct snd_card *card,
+
+ err = snd_usbmidi_detect_per_port_endpoints(umidi, endpoints);
+ break;
++ case QUIRK_MIDI_CH345:
++ umidi->usb_protocol_ops = &snd_usbmidi_ch345_broken_sysex_ops;
++ err = snd_usbmidi_detect_per_port_endpoints(umidi, endpoints);
++ break;
+ default:
+ dev_err(&umidi->dev->dev, "invalid quirk type %d\n",
+ quirk->type);
+diff --git a/sound/usb/quirks-table.h b/sound/usb/quirks-table.h
+index e4756651a52c..ecc2a4ea014d 100644
+--- a/sound/usb/quirks-table.h
++++ b/sound/usb/quirks-table.h
+@@ -2820,6 +2820,17 @@ YAMAHA_DEVICE(0x7010, "UB99"),
+ .idProduct = 0x1020,
+ },
+
++/* QinHeng devices */
++{
++ USB_DEVICE(0x1a86, 0x752d),
++ .driver_info = (unsigned long) &(const struct snd_usb_audio_quirk) {
++ .vendor_name = "QinHeng",
++ .product_name = "CH345",
++ .ifnum = 1,
++ .type = QUIRK_MIDI_CH345
++ }
++},
++
+ /* KeithMcMillen Stringport */
+ {
+ USB_DEVICE(0x1f38, 0x0001),
+diff --git a/sound/usb/quirks.c b/sound/usb/quirks.c
+index 00ebc0ca008e..eef9b8e4b949 100644
+--- a/sound/usb/quirks.c
++++ b/sound/usb/quirks.c
+@@ -535,6 +535,7 @@ int snd_usb_create_quirk(struct snd_usb_audio *chip,
+ [QUIRK_MIDI_CME] = create_any_midi_quirk,
+ [QUIRK_MIDI_AKAI] = create_any_midi_quirk,
+ [QUIRK_MIDI_FTDI] = create_any_midi_quirk,
++ [QUIRK_MIDI_CH345] = create_any_midi_quirk,
+ [QUIRK_AUDIO_STANDARD_INTERFACE] = create_standard_audio_quirk,
+ [QUIRK_AUDIO_FIXED_ENDPOINT] = create_fixed_stream_quirk,
+ [QUIRK_AUDIO_EDIROL_UAXX] = create_uaxx_quirk,
+@@ -1271,6 +1272,7 @@ u64 snd_usb_interface_dsd_format_quirks(struct snd_usb_audio *chip,
+ case USB_ID(0x20b1, 0x000a): /* Gustard DAC-X20U */
+ case USB_ID(0x20b1, 0x2009): /* DIYINHK DSD DXD 384kHz USB to I2S/DSD */
+ case USB_ID(0x20b1, 0x2023): /* JLsounds I2SoverUSB */
++ case USB_ID(0x20b1, 0x3023): /* Aune X1S 32BIT/384 DSD DAC */
+ if (fp->altsetting == 3)
+ return SNDRV_PCM_FMTBIT_DSD_U32_BE;
+ break;
+diff --git a/sound/usb/usbaudio.h b/sound/usb/usbaudio.h
+index 91d0380431b4..991aa84491cd 100644
+--- a/sound/usb/usbaudio.h
++++ b/sound/usb/usbaudio.h
+@@ -94,6 +94,7 @@ enum quirk_type {
+ QUIRK_MIDI_AKAI,
+ QUIRK_MIDI_US122L,
+ QUIRK_MIDI_FTDI,
++ QUIRK_MIDI_CH345,
+ QUIRK_AUDIO_STANDARD_INTERFACE,
+ QUIRK_AUDIO_FIXED_ENDPOINT,
+ QUIRK_AUDIO_EDIROL_UAXX,
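
For readers following the sound/usb/midi.c hunk above, the CIN fix-up that ch345_broken_sysex_input() applies can be tried outside the kernel. What follows is a minimal userspace sketch, not the driver code: struct ch345_state, ch345_fix_cin() and ch345_demo() are hypothetical names introduced here, and the packet bytes are made up for illustration. It only shows how a packet that repeats the last non-SysEx CIN is reinterpreted as CIN 0x4 while a SysEx is in progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-endpoint state, mirroring the in_sysex/last_cin fields the patch adds. */
struct ch345_state {
	bool in_sysex;     /* previous packet was a SysEx continuation (CIN 0x4) */
	uint8_t last_cin;  /* CIN of the last packet that was not CIN 0x4 */
};

/*
 * Hypothetical helper: return the corrected CIN for one 4-byte USB MIDI
 * packet and update the state, following the logic of the patch.
 */
static uint8_t ch345_fix_cin(struct ch345_state *st, const uint8_t pkt[4])
{
	uint8_t cin;

	assert(st != NULL && pkt != NULL);
	cin = pkt[0] & 0x0f;

	/*
	 * Inside a SysEx, a packet that repeats the previously seen CIN and
	 * whose checked byte has the top bit clear (i.e. is a data byte) is
	 * treated as a SysEx continuation. For CIN 0x6 the second data byte
	 * is checked instead of the first.
	 */
	if (st->in_sysex && cin == st->last_cin &&
	    (pkt[1 + (cin == 0x6)] & 0x80) == 0)
		cin = 0x4;

	st->in_sysex = (cin == 0x4);
	if (!st->in_sysex)
		st->last_cin = cin;
	return cin;
}

/* Feed a made-up packet sequence through the helper; out[] gets the CINs. */
static void ch345_demo(uint8_t out[4])
{
	static const uint8_t pkts[4][4] = {
		{ 0x09, 0x90, 0x3c, 0x40 },  /* note-on, CIN 0x9 */
		{ 0x04, 0xf0, 0x7e, 0x7f },  /* SysEx start, CIN 0x4 */
		{ 0x09, 0x06, 0x01, 0x02 },  /* SysEx data mislabelled with CIN 0x9 */
		{ 0x05, 0xf7, 0x00, 0x00 },  /* SysEx end, CIN 0x5 */
	};
	struct ch345_state st = { false, 0 };
	size_t i;

	for (i = 0; i < 4; i++)
		out[i] = ch345_fix_cin(&st, pkts[i]);
}
```

Calling ch345_demo() leaves 0x9, 0x4, 0x4, 0x5 in out[]: the third packet is relabelled as CIN 0x4 because it arrives during a SysEx and repeats the CIN of the earlier note-on. As the #if 0 block in the patch notes, a real status byte such as 0x90 in that position cannot be told apart reliably, so the sketch (like the patch) leaves such packets alone.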
* [gentoo-commits] proj/linux-patches:4.2 commit in: /
@ 2015-12-15 11:15 Mike Pagano
From: Mike Pagano @ 2015-12-15 11:15 UTC
To: gentoo-commits
commit: 3a9d9184e1f0d412574eabf24e5cd3586f69d3e9
Author: Mike Pagano <mpagano <AT> gentoo <DOT> org>
AuthorDate: Tue Dec 15 11:15:05 2015 +0000
Commit: Mike Pagano <mpagano <AT> gentoo <DOT> org>
CommitDate: Tue Dec 15 11:15:05 2015 +0000
URL: https://gitweb.gentoo.org/proj/linux-patches.git/commit/?id=3a9d9184
Linux patch 4.2.8
0000_README | 4 +
1007_linux-4.2.8.patch | 3882 ++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 3886 insertions(+)
diff --git a/0000_README b/0000_README
index 2299001..5645178 100644
--- a/0000_README
+++ b/0000_README
@@ -71,6 +71,10 @@ Patch: 1006_linux-4.2.7.patch
From: http://www.kernel.org
Desc: Linux 4.2.7
+Patch: 1007_linux-4.2.8.patch
+From: http://www.kernel.org
+Desc: Linux 4.2.8
+
Patch: 1500_XATTR_USER_PREFIX.patch
From: https://bugs.gentoo.org/show_bug.cgi?id=470644
Desc: Support for namespace user.pax.* on tmpfs.
diff --git a/1007_linux-4.2.8.patch b/1007_linux-4.2.8.patch
new file mode 100644
index 0000000..7aca417
--- /dev/null
+++ b/1007_linux-4.2.8.patch
@@ -0,0 +1,3882 @@
+diff --git a/Makefile b/Makefile
+index f5014eaf2532..06b988951ccb 100644
+--- a/Makefile
++++ b/Makefile
+@@ -1,6 +1,6 @@
+ VERSION = 4
+ PATCHLEVEL = 2
+-SUBLEVEL = 7
++SUBLEVEL = 8
+ EXTRAVERSION =
+ NAME = Hurr durr I'ma sheep
+
+diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
+index 017b7d58ae06..55f8a6a706fc 100644
+--- a/drivers/block/rbd.c
++++ b/drivers/block/rbd.c
+@@ -3439,6 +3439,7 @@ static void rbd_queue_workfn(struct work_struct *work)
+ goto err_rq;
+ }
+ img_request->rq = rq;
++ snapc = NULL; /* img_request consumes a ref */
+
+ if (op_type == OBJ_OP_DISCARD)
+ result = rbd_img_request_fill(img_request, OBJ_REQUEST_NODATA,
+diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
+index f51d376d10ba..c2f5117fd8cb 100644
+--- a/drivers/firewire/ohci.c
++++ b/drivers/firewire/ohci.c
+@@ -3675,6 +3675,11 @@ static int pci_probe(struct pci_dev *dev,
+
+ reg_write(ohci, OHCI1394_IsoXmitIntMaskSet, ~0);
+ ohci->it_context_support = reg_read(ohci, OHCI1394_IsoXmitIntMaskSet);
++ /* JMicron JMB38x often shows 0 at first read, just ignore it */
++ if (!ohci->it_context_support) {
++ ohci_notice(ohci, "overriding IsoXmitIntMask\n");
++ ohci->it_context_support = 0xf;
++ }
+ reg_write(ohci, OHCI1394_IsoXmitIntMaskClear, ~0);
+ ohci->it_context_mask = ohci->it_context_support;
+ ohci->n_it = hweight32(ohci->it_context_mask);
+diff --git a/drivers/media/pci/cobalt/Kconfig b/drivers/media/pci/cobalt/Kconfig
+index 6a1c0089bb62..4ecf171d14a2 100644
+--- a/drivers/media/pci/cobalt/Kconfig
++++ b/drivers/media/pci/cobalt/Kconfig
+@@ -1,6 +1,6 @@
+ config VIDEO_COBALT
+ tristate "Cisco Cobalt support"
+- depends on VIDEO_V4L2 && I2C && MEDIA_CONTROLLER
++ depends on VIDEO_V4L2 && I2C && VIDEO_V4L2_SUBDEV_API
+ depends on PCI_MSI && MTD_COMPLEX_MAPPINGS && GPIOLIB
+ depends on SND
+ select I2C_ALGOBIT
+diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+index 3b90afb8c293..6f2a748524f3 100644
+--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
++++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+@@ -1325,7 +1325,12 @@ err_disable_device:
+ static void nicvf_remove(struct pci_dev *pdev)
+ {
+ struct net_device *netdev = pci_get_drvdata(pdev);
+- struct nicvf *nic = netdev_priv(netdev);
++ struct nicvf *nic;
++
++ if (!netdev)
++ return;
++
++ nic = netdev_priv(netdev);
+
+ unregister_netdev(netdev);
+ nicvf_unregister_interrupts(nic);
+diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+index 731423ca575d..8bead97373ab 100644
+--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
++++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+@@ -4934,26 +4934,41 @@ static void rem_slave_counters(struct mlx4_dev *dev, int slave)
+ struct res_counter *counter;
+ struct res_counter *tmp;
+ int err;
+- int index;
++ int *counters_arr = NULL;
++ int i, j;
+
+ err = move_all_busy(dev, slave, RES_COUNTER);
+ if (err)
+ mlx4_warn(dev, "rem_slave_counters: Could not move all counters - too busy for slave %d\n",
+ slave);
+
+- spin_lock_irq(mlx4_tlock(dev));
+- list_for_each_entry_safe(counter, tmp, counter_list, com.list) {
+- if (counter->com.owner == slave) {
+- index = counter->com.res_id;
+- rb_erase(&counter->com.node,
+- &tracker->res_tree[RES_COUNTER]);
+- list_del(&counter->com.list);
+- kfree(counter);
+- __mlx4_counter_free(dev, index);
++ counters_arr = kmalloc_array(dev->caps.max_counters,
++ sizeof(*counters_arr), GFP_KERNEL);
++ if (!counters_arr)
++ return;
++
++ do {
++ i = 0;
++ j = 0;
++ spin_lock_irq(mlx4_tlock(dev));
++ list_for_each_entry_safe(counter, tmp, counter_list, com.list) {
++ if (counter->com.owner == slave) {
++ counters_arr[i++] = counter->com.res_id;
++ rb_erase(&counter->com.node,
++ &tracker->res_tree[RES_COUNTER]);
++ list_del(&counter->com.list);
++ kfree(counter);
++ }
++ }
++ spin_unlock_irq(mlx4_tlock(dev));
++
++ while (j < i) {
++ __mlx4_counter_free(dev, counters_arr[j++]);
+ mlx4_release_resource(dev, slave, RES_COUNTER, 1, 0);
+ }
+- }
+- spin_unlock_irq(mlx4_tlock(dev));
++ } while (i);
++
++ kfree(counters_arr);
+ }
+
+ static void rem_slave_xrcdns(struct mlx4_dev *dev, int slave)
+diff --git a/drivers/net/ethernet/via/via-rhine.c b/drivers/net/ethernet/via/via-rhine.c
+index a83263743665..2b7550c43f78 100644
+--- a/drivers/net/ethernet/via/via-rhine.c
++++ b/drivers/net/ethernet/via/via-rhine.c
+@@ -2134,10 +2134,11 @@ static int rhine_rx(struct net_device *dev, int limit)
+ }
+
+ skb_put(skb, pkt_len);
+- skb->protocol = eth_type_trans(skb, dev);
+
+ rhine_rx_vlan_tag(skb, desc, data_size);
+
++ skb->protocol = eth_type_trans(skb, dev);
++
+ netif_receive_skb(skb);
+
+ u64_stats_update_begin(&rp->rx_stats.syncp);
+diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
+index 9c71295f2fef..85e640440bd9 100644
+--- a/drivers/net/phy/broadcom.c
++++ b/drivers/net/phy/broadcom.c
+@@ -675,7 +675,7 @@ static struct mdio_device_id __maybe_unused broadcom_tbl[] = {
+ { PHY_ID_BCM5461, 0xfffffff0 },
+ { PHY_ID_BCM54616S, 0xfffffff0 },
+ { PHY_ID_BCM5464, 0xfffffff0 },
+- { PHY_ID_BCM5482, 0xfffffff0 },
++ { PHY_ID_BCM5481, 0xfffffff0 },
+ { PHY_ID_BCM5482, 0xfffffff0 },
+ { PHY_ID_BCM50610, 0xfffffff0 },
+ { PHY_ID_BCM50610M, 0xfffffff0 },
+diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
+index 8f1738c3b3c5..de27f510c0f3 100644
+--- a/drivers/net/usb/qmi_wwan.c
++++ b/drivers/net/usb/qmi_wwan.c
+@@ -775,6 +775,7 @@ static const struct usb_device_id products[] = {
+ {QMI_FIXED_INTF(0x2357, 0x9000, 4)}, /* TP-LINK MA260 */
+ {QMI_FIXED_INTF(0x1bc7, 0x1200, 5)}, /* Telit LE920 */
+ {QMI_FIXED_INTF(0x1bc7, 0x1201, 2)}, /* Telit LE920 */
++ {QMI_FIXED_INTF(0x1c9e, 0x9b01, 3)}, /* XS Stick W100-2 from 4G Systems */
+ {QMI_FIXED_INTF(0x0b3c, 0xc000, 4)}, /* Olivetti Olicard 100 */
+ {QMI_FIXED_INTF(0x0b3c, 0xc001, 4)}, /* Olivetti Olicard 120 */
+ {QMI_FIXED_INTF(0x0b3c, 0xc002, 4)}, /* Olivetti Olicard 140 */
+diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
+index aac314e14188..bb25b8d00570 100644
+--- a/fs/btrfs/ctree.h
++++ b/fs/btrfs/ctree.h
+@@ -3404,7 +3404,7 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans,
+ int btrfs_free_extent(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root,
+ u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
+- u64 owner, u64 offset, int no_quota);
++ u64 owner, u64 offset);
+
+ int btrfs_free_reserved_extent(struct btrfs_root *root, u64 start, u64 len,
+ int delalloc);
+@@ -3417,7 +3417,7 @@ int btrfs_finish_extent_commit(struct btrfs_trans_handle *trans,
+ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root,
+ u64 bytenr, u64 num_bytes, u64 parent,
+- u64 root_objectid, u64 owner, u64 offset, int no_quota);
++ u64 root_objectid, u64 owner, u64 offset);
+
+ int btrfs_start_dirty_block_groups(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root);
+diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
+index ac3e81da6d4e..7832031fef68 100644
+--- a/fs/btrfs/delayed-ref.c
++++ b/fs/btrfs/delayed-ref.c
+@@ -197,6 +197,119 @@ static inline void drop_delayed_ref(struct btrfs_trans_handle *trans,
+ trans->delayed_ref_updates--;
+ }
+
++static bool merge_ref(struct btrfs_trans_handle *trans,
++ struct btrfs_delayed_ref_root *delayed_refs,
++ struct btrfs_delayed_ref_head *head,
++ struct btrfs_delayed_ref_node *ref,
++ u64 seq)
++{
++ struct btrfs_delayed_ref_node *next;
++ bool done = false;
++
++ next = list_first_entry(&head->ref_list, struct btrfs_delayed_ref_node,
++ list);
++ while (!done && &next->list != &head->ref_list) {
++ int mod;
++ struct btrfs_delayed_ref_node *next2;
++
++ next2 = list_next_entry(next, list);
++
++ if (next == ref)
++ goto next;
++
++ if (seq && next->seq >= seq)
++ goto next;
++
++ if (next->type != ref->type)
++ goto next;
++
++ if ((ref->type == BTRFS_TREE_BLOCK_REF_KEY ||
++ ref->type == BTRFS_SHARED_BLOCK_REF_KEY) &&
++ comp_tree_refs(btrfs_delayed_node_to_tree_ref(ref),
++ btrfs_delayed_node_to_tree_ref(next),
++ ref->type))
++ goto next;
++ if ((ref->type == BTRFS_EXTENT_DATA_REF_KEY ||
++ ref->type == BTRFS_SHARED_DATA_REF_KEY) &&
++ comp_data_refs(btrfs_delayed_node_to_data_ref(ref),
++ btrfs_delayed_node_to_data_ref(next)))
++ goto next;
++
++ if (ref->action == next->action) {
++ mod = next->ref_mod;
++ } else {
++ if (ref->ref_mod < next->ref_mod) {
++ swap(ref, next);
++ done = true;
++ }
++ mod = -next->ref_mod;
++ }
++
++ drop_delayed_ref(trans, delayed_refs, head, next);
++ ref->ref_mod += mod;
++ if (ref->ref_mod == 0) {
++ drop_delayed_ref(trans, delayed_refs, head, ref);
++ done = true;
++ } else {
++ /*
++ * Can't have multiples of the same ref on a tree block.
++ */
++ WARN_ON(ref->type == BTRFS_TREE_BLOCK_REF_KEY ||
++ ref->type == BTRFS_SHARED_BLOCK_REF_KEY);
++ }
++next:
++ next = next2;
++ }
++
++ return done;
++}
++
++void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
++ struct btrfs_fs_info *fs_info,
++ struct btrfs_delayed_ref_root *delayed_refs,
++ struct btrfs_delayed_ref_head *head)
++{
++ struct btrfs_delayed_ref_node *ref;
++ u64 seq = 0;
++
++ assert_spin_locked(&head->lock);
++
++ if (list_empty(&head->ref_list))
++ return;
++
++ /* We don't have too many refs to merge for data. */
++ if (head->is_data)
++ return;
++
++ spin_lock(&fs_info->tree_mod_seq_lock);
++ if (!list_empty(&fs_info->tree_mod_seq_list)) {
++ struct seq_list *elem;
++
++ elem = list_first_entry(&fs_info->tree_mod_seq_list,
++ struct seq_list, list);
++ seq = elem->seq;
++ }
++ spin_unlock(&fs_info->tree_mod_seq_lock);
++
++ ref = list_first_entry(&head->ref_list, struct btrfs_delayed_ref_node,
++ list);
++ while (&ref->list != &head->ref_list) {
++ if (seq && ref->seq >= seq)
++ goto next;
++
++ if (merge_ref(trans, delayed_refs, head, ref, seq)) {
++ if (list_empty(&head->ref_list))
++ break;
++ ref = list_first_entry(&head->ref_list,
++ struct btrfs_delayed_ref_node,
++ list);
++ continue;
++ }
++next:
++ ref = list_next_entry(ref, list);
++ }
++}
++
+ int btrfs_check_delayed_seq(struct btrfs_fs_info *fs_info,
+ struct btrfs_delayed_ref_root *delayed_refs,
+ u64 seq)
+@@ -292,8 +405,7 @@ add_delayed_ref_tail_merge(struct btrfs_trans_handle *trans,
+ exist = list_entry(href->ref_list.prev, struct btrfs_delayed_ref_node,
+ list);
+ /* No need to compare bytenr nor is_head */
+- if (exist->type != ref->type || exist->no_quota != ref->no_quota ||
+- exist->seq != ref->seq)
++ if (exist->type != ref->type || exist->seq != ref->seq)
+ goto add_tail;
+
+ if ((exist->type == BTRFS_TREE_BLOCK_REF_KEY ||
+@@ -524,7 +636,7 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+ struct btrfs_delayed_ref_head *head_ref,
+ struct btrfs_delayed_ref_node *ref, u64 bytenr,
+ u64 num_bytes, u64 parent, u64 ref_root, int level,
+- int action, int no_quota)
++ int action)
+ {
+ struct btrfs_delayed_tree_ref *full_ref;
+ struct btrfs_delayed_ref_root *delayed_refs;
+@@ -546,7 +658,6 @@ add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+ ref->action = action;
+ ref->is_head = 0;
+ ref->in_tree = 1;
+- ref->no_quota = no_quota;
+ ref->seq = seq;
+
+ full_ref = btrfs_delayed_node_to_tree_ref(ref);
+@@ -579,7 +690,7 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
+ struct btrfs_delayed_ref_head *head_ref,
+ struct btrfs_delayed_ref_node *ref, u64 bytenr,
+ u64 num_bytes, u64 parent, u64 ref_root, u64 owner,
+- u64 offset, int action, int no_quota)
++ u64 offset, int action)
+ {
+ struct btrfs_delayed_data_ref *full_ref;
+ struct btrfs_delayed_ref_root *delayed_refs;
+@@ -602,7 +713,6 @@ add_delayed_data_ref(struct btrfs_fs_info *fs_info,
+ ref->action = action;
+ ref->is_head = 0;
+ ref->in_tree = 1;
+- ref->no_quota = no_quota;
+ ref->seq = seq;
+
+ full_ref = btrfs_delayed_node_to_data_ref(ref);
+@@ -633,17 +743,13 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+ struct btrfs_trans_handle *trans,
+ u64 bytenr, u64 num_bytes, u64 parent,
+ u64 ref_root, int level, int action,
+- struct btrfs_delayed_extent_op *extent_op,
+- int no_quota)
++ struct btrfs_delayed_extent_op *extent_op)
+ {
+ struct btrfs_delayed_tree_ref *ref;
+ struct btrfs_delayed_ref_head *head_ref;
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_qgroup_extent_record *record = NULL;
+
+- if (!is_fstree(ref_root) || !fs_info->quota_enabled)
+- no_quota = 0;
+-
+ BUG_ON(extent_op && extent_op->is_data);
+ ref = kmem_cache_alloc(btrfs_delayed_tree_ref_cachep, GFP_NOFS);
+ if (!ref)
+@@ -672,8 +778,7 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+ bytenr, num_bytes, action, 0);
+
+ add_delayed_tree_ref(fs_info, trans, head_ref, &ref->node, bytenr,
+- num_bytes, parent, ref_root, level, action,
+- no_quota);
++ num_bytes, parent, ref_root, level, action);
+ spin_unlock(&delayed_refs->lock);
+
+ return 0;
+@@ -694,17 +799,13 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
+ u64 bytenr, u64 num_bytes,
+ u64 parent, u64 ref_root,
+ u64 owner, u64 offset, int action,
+- struct btrfs_delayed_extent_op *extent_op,
+- int no_quota)
++ struct btrfs_delayed_extent_op *extent_op)
+ {
+ struct btrfs_delayed_data_ref *ref;
+ struct btrfs_delayed_ref_head *head_ref;
+ struct btrfs_delayed_ref_root *delayed_refs;
+ struct btrfs_qgroup_extent_record *record = NULL;
+
+- if (!is_fstree(ref_root) || !fs_info->quota_enabled)
+- no_quota = 0;
+-
+ BUG_ON(extent_op && !extent_op->is_data);
+ ref = kmem_cache_alloc(btrfs_delayed_data_ref_cachep, GFP_NOFS);
+ if (!ref)
+@@ -740,7 +841,7 @@ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
+
+ add_delayed_data_ref(fs_info, trans, head_ref, &ref->node, bytenr,
+ num_bytes, parent, ref_root, owner, offset,
+- action, no_quota);
++ action);
+ spin_unlock(&delayed_refs->lock);
+
+ return 0;
+diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
+index 13fb5e6090fe..930887a4275f 100644
+--- a/fs/btrfs/delayed-ref.h
++++ b/fs/btrfs/delayed-ref.h
+@@ -68,7 +68,6 @@ struct btrfs_delayed_ref_node {
+
+ unsigned int action:8;
+ unsigned int type:8;
+- unsigned int no_quota:1;
+ /* is this node still in the rbtree? */
+ unsigned int is_head:1;
+ unsigned int in_tree:1;
+@@ -233,15 +232,13 @@ int btrfs_add_delayed_tree_ref(struct btrfs_fs_info *fs_info,
+ struct btrfs_trans_handle *trans,
+ u64 bytenr, u64 num_bytes, u64 parent,
+ u64 ref_root, int level, int action,
+- struct btrfs_delayed_extent_op *extent_op,
+- int no_quota);
++ struct btrfs_delayed_extent_op *extent_op);
+ int btrfs_add_delayed_data_ref(struct btrfs_fs_info *fs_info,
+ struct btrfs_trans_handle *trans,
+ u64 bytenr, u64 num_bytes,
+ u64 parent, u64 ref_root,
+ u64 owner, u64 offset, int action,
+- struct btrfs_delayed_extent_op *extent_op,
+- int no_quota);
++ struct btrfs_delayed_extent_op *extent_op);
+ int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
+ struct btrfs_trans_handle *trans,
+ u64 bytenr, u64 num_bytes,
+diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
+index 07204bf601ed..5d870c4eac05 100644
+--- a/fs/btrfs/extent-tree.c
++++ b/fs/btrfs/extent-tree.c
+@@ -95,8 +95,7 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root,
+ u64 parent, u64 root_objectid,
+ u64 flags, struct btrfs_disk_key *key,
+- int level, struct btrfs_key *ins,
+- int no_quota);
++ int level, struct btrfs_key *ins);
+ static int do_chunk_alloc(struct btrfs_trans_handle *trans,
+ struct btrfs_root *extent_root, u64 flags,
+ int force);
+@@ -1941,8 +1940,7 @@ int btrfs_discard_extent(struct btrfs_root *root, u64 bytenr,
+ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root,
+ u64 bytenr, u64 num_bytes, u64 parent,
+- u64 root_objectid, u64 owner, u64 offset,
+- int no_quota)
++ u64 root_objectid, u64 owner, u64 offset)
+ {
+ int ret;
+ struct btrfs_fs_info *fs_info = root->fs_info;
+@@ -1954,12 +1952,12 @@ int btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
+ ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
+ num_bytes,
+ parent, root_objectid, (int)owner,
+- BTRFS_ADD_DELAYED_REF, NULL, no_quota);
++ BTRFS_ADD_DELAYED_REF, NULL);
+ } else {
+ ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
+ num_bytes,
+ parent, root_objectid, owner, offset,
+- BTRFS_ADD_DELAYED_REF, NULL, no_quota);
++ BTRFS_ADD_DELAYED_REF, NULL);
+ }
+ return ret;
+ }
+@@ -1980,15 +1978,11 @@ static int __btrfs_inc_extent_ref(struct btrfs_trans_handle *trans,
+ u64 num_bytes = node->num_bytes;
+ u64 refs;
+ int ret;
+- int no_quota = node->no_quota;
+
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+
+- if (!is_fstree(root_objectid) || !root->fs_info->quota_enabled)
+- no_quota = 1;
+-
+ path->reada = 1;
+ path->leave_spinning = 1;
+ /* this will setup the path even if it fails to insert the back ref */
+@@ -2223,8 +2217,7 @@ static int run_delayed_tree_ref(struct btrfs_trans_handle *trans,
+ parent, ref_root,
+ extent_op->flags_to_set,
+ &extent_op->key,
+- ref->level, &ins,
+- node->no_quota);
++ ref->level, &ins);
+ } else if (node->action == BTRFS_ADD_DELAYED_REF) {
+ ret = __btrfs_inc_extent_ref(trans, root, node,
+ parent, ref_root,
+@@ -2365,7 +2358,21 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
+ }
+ }
+
++ /*
++ * We need to try and merge add/drops of the same ref since we
++ * can run into issues with relocate dropping the implicit ref
++ * and then it being added back again before the drop can
++ * finish. If we merged anything we need to re-loop so we can
++ * get a good ref.
++ * Or we can get node references of the same type that weren't
++ * merged when created due to bumps in the tree mod seq, and
++ * we need to merge them to prevent adding an inline extent
++ * backref before dropping it (triggering a BUG_ON at
++ * insert_inline_extent_backref()).
++ */
+ spin_lock(&locked_ref->lock);
++ btrfs_merge_delayed_refs(trans, fs_info, delayed_refs,
++ locked_ref);
+
+ /*
+ * locked_ref is the head node, so we have to go one
+@@ -3038,7 +3045,7 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
+ int level;
+ int ret = 0;
+ int (*process_func)(struct btrfs_trans_handle *, struct btrfs_root *,
+- u64, u64, u64, u64, u64, u64, int);
++ u64, u64, u64, u64, u64, u64);
+
+
+ if (btrfs_test_is_dummy_root(root))
+@@ -3079,15 +3086,14 @@ static int __btrfs_mod_ref(struct btrfs_trans_handle *trans,
+ key.offset -= btrfs_file_extent_offset(buf, fi);
+ ret = process_func(trans, root, bytenr, num_bytes,
+ parent, ref_root, key.objectid,
+- key.offset, 1);
++ key.offset);
+ if (ret)
+ goto fail;
+ } else {
+ bytenr = btrfs_node_blockptr(buf, i);
+ num_bytes = root->nodesize;
+ ret = process_func(trans, root, bytenr, num_bytes,
+- parent, ref_root, level - 1, 0,
+- 1);
++ parent, ref_root, level - 1, 0);
+ if (ret)
+ goto fail;
+ }
+@@ -6137,7 +6143,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
+ int extent_slot = 0;
+ int found_extent = 0;
+ int num_to_del = 1;
+- int no_quota = node->no_quota;
+ u32 item_size;
+ u64 refs;
+ u64 bytenr = node->bytenr;
+@@ -6146,9 +6151,6 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
+ bool skinny_metadata = btrfs_fs_incompat(root->fs_info,
+ SKINNY_METADATA);
+
+- if (!info->quota_enabled || !is_fstree(root_objectid))
+- no_quota = 1;
+-
+ path = btrfs_alloc_path();
+ if (!path)
+ return -ENOMEM;
+@@ -6474,7 +6476,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
+ buf->start, buf->len,
+ parent, root->root_key.objectid,
+ btrfs_header_level(buf),
+- BTRFS_DROP_DELAYED_REF, NULL, 0);
++ BTRFS_DROP_DELAYED_REF, NULL);
+ BUG_ON(ret); /* -ENOMEM */
+ }
+
+@@ -6522,7 +6524,7 @@ out:
+ /* Can return -ENOMEM */
+ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+ u64 bytenr, u64 num_bytes, u64 parent, u64 root_objectid,
+- u64 owner, u64 offset, int no_quota)
++ u64 owner, u64 offset)
+ {
+ int ret;
+ struct btrfs_fs_info *fs_info = root->fs_info;
+@@ -6545,13 +6547,13 @@ int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+ ret = btrfs_add_delayed_tree_ref(fs_info, trans, bytenr,
+ num_bytes,
+ parent, root_objectid, (int)owner,
+- BTRFS_DROP_DELAYED_REF, NULL, no_quota);
++ BTRFS_DROP_DELAYED_REF, NULL);
+ } else {
+ ret = btrfs_add_delayed_data_ref(fs_info, trans, bytenr,
+ num_bytes,
+ parent, root_objectid, owner,
+ offset, BTRFS_DROP_DELAYED_REF,
+- NULL, no_quota);
++ NULL);
+ }
+ return ret;
+ }
+@@ -7333,8 +7335,7 @@ static int alloc_reserved_tree_block(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root,
+ u64 parent, u64 root_objectid,
+ u64 flags, struct btrfs_disk_key *key,
+- int level, struct btrfs_key *ins,
+- int no_quota)
++ int level, struct btrfs_key *ins)
+ {
+ int ret;
+ struct btrfs_fs_info *fs_info = root->fs_info;
+@@ -7424,7 +7425,7 @@ int btrfs_alloc_reserved_file_extent(struct btrfs_trans_handle *trans,
+ ret = btrfs_add_delayed_data_ref(root->fs_info, trans, ins->objectid,
+ ins->offset, 0,
+ root_objectid, owner, offset,
+- BTRFS_ADD_DELAYED_EXTENT, NULL, 0);
++ BTRFS_ADD_DELAYED_EXTENT, NULL);
+ return ret;
+ }
+
+@@ -7641,7 +7642,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
+ ins.objectid, ins.offset,
+ parent, root_objectid, level,
+ BTRFS_ADD_DELAYED_EXTENT,
+- extent_op, 0);
++ extent_op);
+ if (ret)
+ goto out_free_delayed;
+ }
+@@ -8189,7 +8190,7 @@ skip:
+ }
+ }
+ ret = btrfs_free_extent(trans, root, bytenr, blocksize, parent,
+- root->root_key.objectid, level - 1, 0, 0);
++ root->root_key.objectid, level - 1, 0);
+ BUG_ON(ret); /* -ENOMEM */
+ }
+ btrfs_tree_unlock(next);
+diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
+index b823fac91c92..5e314856a58c 100644
+--- a/fs/btrfs/file.c
++++ b/fs/btrfs/file.c
+@@ -756,8 +756,16 @@ next_slot:
+ }
+
+ btrfs_item_key_to_cpu(leaf, &key, path->slots[0]);
+- if (key.objectid > ino ||
+- key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
++
++ if (key.objectid > ino)
++ break;
++ if (WARN_ON_ONCE(key.objectid < ino) ||
++ key.type < BTRFS_EXTENT_DATA_KEY) {
++ ASSERT(del_nr == 0);
++ path->slots[0]++;
++ goto next_slot;
++ }
++ if (key.type > BTRFS_EXTENT_DATA_KEY || key.offset >= end)
+ break;
+
+ fi = btrfs_item_ptr(leaf, path->slots[0],
+@@ -776,8 +784,8 @@ next_slot:
+ btrfs_file_extent_inline_len(leaf,
+ path->slots[0], fi);
+ } else {
+- WARN_ON(1);
+- extent_end = search_start;
++ /* can't happen */
++ BUG();
+ }
+
+ /*
+@@ -847,7 +855,7 @@ next_slot:
+ disk_bytenr, num_bytes, 0,
+ root->root_key.objectid,
+ new_key.objectid,
+- start - extent_offset, 1);
++ start - extent_offset);
+ BUG_ON(ret); /* -ENOMEM */
+ }
+ key.offset = start;
+@@ -925,7 +933,7 @@ delete_extent_item:
+ disk_bytenr, num_bytes, 0,
+ root->root_key.objectid,
+ key.objectid, key.offset -
+- extent_offset, 0);
++ extent_offset);
+ BUG_ON(ret); /* -ENOMEM */
+ inode_sub_bytes(inode,
+ extent_end - key.offset);
+@@ -1204,7 +1212,7 @@ again:
+
+ ret = btrfs_inc_extent_ref(trans, root, bytenr, num_bytes, 0,
+ root->root_key.objectid,
+- ino, orig_offset, 1);
++ ino, orig_offset);
+ BUG_ON(ret); /* -ENOMEM */
+
+ if (split == start) {
+@@ -1231,7 +1239,7 @@ again:
+ del_nr++;
+ ret = btrfs_free_extent(trans, root, bytenr, num_bytes,
+ 0, root->root_key.objectid,
+- ino, orig_offset, 0);
++ ino, orig_offset);
+ BUG_ON(ret); /* -ENOMEM */
+ }
+ other_start = 0;
+@@ -1248,7 +1256,7 @@ again:
+ del_nr++;
+ ret = btrfs_free_extent(trans, root, bytenr, num_bytes,
+ 0, root->root_key.objectid,
+- ino, orig_offset, 0);
++ ino, orig_offset);
+ BUG_ON(ret); /* -ENOMEM */
+ }
+ if (del_nr == 0) {
+@@ -1868,8 +1876,13 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
+ struct btrfs_log_ctx ctx;
+ int ret = 0;
+ bool full_sync = 0;
+- const u64 len = end - start + 1;
++ u64 len;
+
++ /*
++ * The range length can be represented by u64; we have to do the typecasts
++ * to avoid signed overflow if it's [0, LLONG_MAX], e.g. from fsync()
++ */
++ len = (u64)end - (u64)start + 1;
+ trace_btrfs_sync_file(file, datasync);
+
+ /*
+@@ -2057,8 +2070,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
+ }
+ }
+ if (!full_sync) {
+- ret = btrfs_wait_ordered_range(inode, start,
+- end - start + 1);
++ ret = btrfs_wait_ordered_range(inode, start, len);
+ if (ret) {
+ btrfs_end_transaction(trans, root);
+ goto out;
+diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
+index b54e63038b96..9aabff2102f8 100644
+--- a/fs/btrfs/inode.c
++++ b/fs/btrfs/inode.c
+@@ -1294,8 +1294,14 @@ next_slot:
+ num_bytes = 0;
+ btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
+
+- if (found_key.objectid > ino ||
+- found_key.type > BTRFS_EXTENT_DATA_KEY ||
++ if (found_key.objectid > ino)
++ break;
++ if (WARN_ON_ONCE(found_key.objectid < ino) ||
++ found_key.type < BTRFS_EXTENT_DATA_KEY) {
++ path->slots[0]++;
++ goto next_slot;
++ }
++ if (found_key.type > BTRFS_EXTENT_DATA_KEY ||
+ found_key.offset > end)
+ break;
+
+@@ -2569,7 +2575,7 @@ again:
+ ret = btrfs_inc_extent_ref(trans, root, new->bytenr,
+ new->disk_len, 0,
+ backref->root_id, backref->inum,
+- new->file_pos, 0); /* start - extent_offset */
++ new->file_pos); /* start - extent_offset */
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ goto out_free_path;
+@@ -4184,6 +4190,47 @@ static int truncate_space_check(struct btrfs_trans_handle *trans,
+
+ }
+
++static int truncate_inline_extent(struct inode *inode,
++ struct btrfs_path *path,
++ struct btrfs_key *found_key,
++ const u64 item_end,
++ const u64 new_size)
++{
++ struct extent_buffer *leaf = path->nodes[0];
++ int slot = path->slots[0];
++ struct btrfs_file_extent_item *fi;
++ u32 size = (u32)(new_size - found_key->offset);
++ struct btrfs_root *root = BTRFS_I(inode)->root;
++
++ fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item);
++
++ if (btrfs_file_extent_compression(leaf, fi) != BTRFS_COMPRESS_NONE) {
++ loff_t offset = new_size;
++ loff_t page_end = ALIGN(offset, PAGE_CACHE_SIZE);
++
++ /*
++ * Zero out the remainder of the last page of our inline extent,
++ * instead of directly truncating our inline extent here - that
++ * would be much more complex (decompressing all the data, then
++ * compressing the truncated data, which might be bigger than
++ * the size of the inline extent, resize the extent, etc).
++ * We release the path because to get the page we might need to
++ * read the extent item from disk (data not in the page cache).
++ */
++ btrfs_release_path(path);
++ return btrfs_truncate_page(inode, offset, page_end - offset, 0);
++ }
++
++ btrfs_set_file_extent_ram_bytes(leaf, fi, size);
++ size = btrfs_file_extent_calc_inline_size(size);
++ btrfs_truncate_item(root, path, size, 1);
++
++ if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
++ inode_sub_bytes(inode, item_end + 1 - new_size);
++
++ return 0;
++}
++
+ /*
+ * this can truncate away extent items, csum items and directory items.
+ * It starts at a high offset and removes keys until it can't find
+@@ -4378,27 +4425,40 @@ search_again:
+ * special encodings
+ */
+ if (!del_item &&
+- btrfs_file_extent_compression(leaf, fi) == 0 &&
+ btrfs_file_extent_encryption(leaf, fi) == 0 &&
+ btrfs_file_extent_other_encoding(leaf, fi) == 0) {
+- u32 size = new_size - found_key.offset;
+-
+- if (test_bit(BTRFS_ROOT_REF_COWS, &root->state))
+- inode_sub_bytes(inode, item_end + 1 -
+- new_size);
+
+ /*
+- * update the ram bytes to properly reflect
+- * the new size of our item
++ * Need to release path in order to truncate a
++ * compressed extent. So delete any accumulated
++ * extent items so far.
+ */
+- btrfs_set_file_extent_ram_bytes(leaf, fi, size);
+- size =
+- btrfs_file_extent_calc_inline_size(size);
+- btrfs_truncate_item(root, path, size, 1);
++ if (btrfs_file_extent_compression(leaf, fi) !=
++ BTRFS_COMPRESS_NONE && pending_del_nr) {
++ err = btrfs_del_items(trans, root, path,
++ pending_del_slot,
++ pending_del_nr);
++ if (err) {
++ btrfs_abort_transaction(trans,
++ root,
++ err);
++ goto error;
++ }
++ pending_del_nr = 0;
++ }
++
++ err = truncate_inline_extent(inode, path,
++ &found_key,
++ item_end,
++ new_size);
++ if (err) {
++ btrfs_abort_transaction(trans,
++ root, err);
++ goto error;
++ }
+ } else if (test_bit(BTRFS_ROOT_REF_COWS,
+ &root->state)) {
+- inode_sub_bytes(inode, item_end + 1 -
+- found_key.offset);
++ inode_sub_bytes(inode, item_end + 1 - new_size);
+ }
+ }
+ delete:
+@@ -4428,7 +4488,7 @@ delete:
+ ret = btrfs_free_extent(trans, root, extent_start,
+ extent_num_bytes, 0,
+ btrfs_header_owner(leaf),
+- ino, extent_offset, 0);
++ ino, extent_offset);
+ BUG_ON(ret);
+ if (btrfs_should_throttle_delayed_refs(trans, root))
+ btrfs_async_run_delayed_refs(root,
+diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
+index 641d3dc4f31e..be4e53c61dd9 100644
+--- a/fs/btrfs/ioctl.c
++++ b/fs/btrfs/ioctl.c
+@@ -3195,41 +3195,6 @@ out:
+ return ret;
+ }
+
+-/* Helper to check and see if this root currently has a ref on the given disk
+- * bytenr. If it does then we need to update the quota for this root. This
+- * doesn't do anything if quotas aren't enabled.
+- */
+-static int check_ref(struct btrfs_trans_handle *trans, struct btrfs_root *root,
+- u64 disko)
+-{
+- struct seq_list tree_mod_seq_elem = SEQ_LIST_INIT(tree_mod_seq_elem);
+- struct ulist *roots;
+- struct ulist_iterator uiter;
+- struct ulist_node *root_node = NULL;
+- int ret;
+-
+- if (!root->fs_info->quota_enabled)
+- return 1;
+-
+- btrfs_get_tree_mod_seq(root->fs_info, &tree_mod_seq_elem);
+- ret = btrfs_find_all_roots(trans, root->fs_info, disko,
+- tree_mod_seq_elem.seq, &roots);
+- if (ret < 0)
+- goto out;
+- ret = 0;
+- ULIST_ITER_INIT(&uiter);
+- while ((root_node = ulist_next(roots, &uiter))) {
+- if (root_node->val == root->objectid) {
+- ret = 1;
+- break;
+- }
+- }
+- ulist_free(roots);
+-out:
+- btrfs_put_tree_mod_seq(root->fs_info, &tree_mod_seq_elem);
+- return ret;
+-}
+-
+ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
+ struct inode *inode,
+ u64 endoff,
+@@ -3320,6 +3285,150 @@ static void clone_update_extent_map(struct inode *inode,
+ &BTRFS_I(inode)->runtime_flags);
+ }
+
++/*
++ * Make sure we do not end up inserting an inline extent into a file that has
++ * already other (non-inline) extents. If a file has an inline extent it can
++ * not have any other extents and the (single) inline extent must start at the
++ * file offset 0. Failing to respect these rules will lead to file corruption,
++ * resulting in EIO errors on read/write operations, hitting BUG_ON's in mm, etc
++ *
++ * We can have extents that have been already written to disk or we can have
++ * dirty ranges still in delalloc, in which case the extent maps and items are
++ * created only when we run delalloc, and the delalloc ranges might fall outside
++ * the range we are currently locking in the inode's io tree. So we check the
++ * inode's i_size because of that (i_size updates are done while holding the
++ * i_mutex, which we are holding here).
++ * We also check to see if the inode has a size not greater than "datal" but has
++ * extents beyond it, due to an fallocate with FALLOC_FL_KEEP_SIZE (and we are
++ * protected against such concurrent fallocate calls by the i_mutex).
++ *
++ * If the file has no extents but a size greater than datal, do not allow the
++ * copy because we would need to turn the inline extent into a non-inline one (even
++ * with NO_HOLES enabled). If we find our destination inode only has one inline
++ * extent, just overwrite it with the source inline extent if its size is less
++ * than the source extent's size, or we could copy the source inline extent's
++ * data into the destination inode's inline extent if the latter is greater than
++ * the former.
++ */
++static int clone_copy_inline_extent(struct inode *src,
++ struct inode *dst,
++ struct btrfs_trans_handle *trans,
++ struct btrfs_path *path,
++ struct btrfs_key *new_key,
++ const u64 drop_start,
++ const u64 datal,
++ const u64 skip,
++ const u64 size,
++ char *inline_data)
++{
++ struct btrfs_root *root = BTRFS_I(dst)->root;
++ const u64 aligned_end = ALIGN(new_key->offset + datal,
++ root->sectorsize);
++ int ret;
++ struct btrfs_key key;
++
++ if (new_key->offset > 0)
++ return -EOPNOTSUPP;
++
++ key.objectid = btrfs_ino(dst);
++ key.type = BTRFS_EXTENT_DATA_KEY;
++ key.offset = 0;
++ ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
++ if (ret < 0) {
++ return ret;
++ } else if (ret > 0) {
++ if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
++ ret = btrfs_next_leaf(root, path);
++ if (ret < 0)
++ return ret;
++ else if (ret > 0)
++ goto copy_inline_extent;
++ }
++ btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
++ if (key.objectid == btrfs_ino(dst) &&
++ key.type == BTRFS_EXTENT_DATA_KEY) {
++ ASSERT(key.offset > 0);
++ return -EOPNOTSUPP;
++ }
++ } else if (i_size_read(dst) <= datal) {
++ struct btrfs_file_extent_item *ei;
++ u64 ext_len;
++
++ /*
++ * If the file size is <= datal, make sure there are no other
++ * extents following (can happen due to an fallocate call with
++ * the flag FALLOC_FL_KEEP_SIZE).
++ */
++ ei = btrfs_item_ptr(path->nodes[0], path->slots[0],
++ struct btrfs_file_extent_item);
++ /*
++ * If it's an inline extent, it can not have other extents
++ * following it.
++ */
++ if (btrfs_file_extent_type(path->nodes[0], ei) ==
++ BTRFS_FILE_EXTENT_INLINE)
++ goto copy_inline_extent;
++
++ ext_len = btrfs_file_extent_num_bytes(path->nodes[0], ei);
++ if (ext_len > aligned_end)
++ return -EOPNOTSUPP;
++
++ ret = btrfs_next_item(root, path);
++ if (ret < 0) {
++ return ret;
++ } else if (ret == 0) {
++ btrfs_item_key_to_cpu(path->nodes[0], &key,
++ path->slots[0]);
++ if (key.objectid == btrfs_ino(dst) &&
++ key.type == BTRFS_EXTENT_DATA_KEY)
++ return -EOPNOTSUPP;
++ }
++ }
++
++copy_inline_extent:
++ /*
++ * We have no extent items, or we have an extent at offset 0 which may
++ * or may not be inlined. All these cases are dealt with the same way.
++ */
++ if (i_size_read(dst) > datal) {
++ /*
++ * If the destination inode has an inline extent...
++ * This would require copying the data from the source inline
++ * extent into the beginning of the destination's inline extent.
++ * But this is really complex, both extents can be compressed
++ * or just one of them, which would require decompressing and
++ * re-compressing data (which could increase the new compressed
++ * size, not allowing the compressed data to fit anymore in an
++ * inline extent).
++ * So just don't support this case for now (it should be rare,
++ * we are not really saving space when cloning inline extents).
++ */
++ return -EOPNOTSUPP;
++ }
++
++ btrfs_release_path(path);
++ ret = btrfs_drop_extents(trans, root, dst, drop_start, aligned_end, 1);
++ if (ret)
++ return ret;
++ ret = btrfs_insert_empty_item(trans, root, path, new_key, size);
++ if (ret)
++ return ret;
++
++ if (skip) {
++ const u32 start = btrfs_file_extent_calc_inline_size(0);
++
++ memmove(inline_data + start, inline_data + start + skip, datal);
++ }
++
++ write_extent_buffer(path->nodes[0], inline_data,
++ btrfs_item_ptr_offset(path->nodes[0],
++ path->slots[0]),
++ size);
++ inode_add_bytes(dst, datal);
++
++ return 0;
++}
++
+ /**
+ * btrfs_clone() - clone a range from inode file to another
+ *
+@@ -3344,9 +3453,7 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
+ u32 nritems;
+ int slot;
+ int ret;
+- int no_quota;
+ const u64 len = olen_aligned;
+- u64 last_disko = 0;
+ u64 last_dest_end = destoff;
+
+ ret = -ENOMEM;
+@@ -3392,7 +3499,6 @@ static int btrfs_clone(struct inode *src, struct inode *inode,
+
+ nritems = btrfs_header_nritems(path->nodes[0]);
+ process_slot:
+- no_quota = 1;
+ if (path->slots[0] >= nritems) {
+ ret = btrfs_next_leaf(BTRFS_I(src)->root, path);
+ if (ret < 0)
+@@ -3544,35 +3650,13 @@ process_slot:
+ btrfs_set_file_extent_num_bytes(leaf, extent,
+ datal);
+
+- /*
+- * We need to look up the roots that point at
+- * this bytenr and see if the new root does. If
+- * it does not we need to make sure we update
+- * quotas appropriately.
+- */
+- if (disko && root != BTRFS_I(src)->root &&
+- disko != last_disko) {
+- no_quota = check_ref(trans, root,
+- disko);
+- if (no_quota < 0) {
+- btrfs_abort_transaction(trans,
+- root,
+- ret);
+- btrfs_end_transaction(trans,
+- root);
+- ret = no_quota;
+- goto out;
+- }
+- }
+-
+ if (disko) {
+ inode_add_bytes(inode, datal);
+ ret = btrfs_inc_extent_ref(trans, root,
+ disko, diskl, 0,
+ root->root_key.objectid,
+ btrfs_ino(inode),
+- new_key.offset - datao,
+- no_quota);
++ new_key.offset - datao);
+ if (ret) {
+ btrfs_abort_transaction(trans,
+ root,
+@@ -3586,21 +3670,6 @@ process_slot:
+ } else if (type == BTRFS_FILE_EXTENT_INLINE) {
+ u64 skip = 0;
+ u64 trim = 0;
+- u64 aligned_end = 0;
+-
+- /*
+- * Don't copy an inline extent into an offset
+- * greater than zero. Having an inline extent
+- * at such an offset results in chaos as btrfs
+- * isn't prepared for such cases. Just skip
+- * this case for the same reasons as commented
+- * at btrfs_ioctl_clone().
+- */
+- if (last_dest_end > 0) {
+- ret = -EOPNOTSUPP;
+- btrfs_end_transaction(trans, root);
+- goto out;
+- }
+
+ if (off > key.offset) {
+ skip = off - key.offset;
+@@ -3618,42 +3687,22 @@ process_slot:
+ size -= skip + trim;
+ datal -= skip + trim;
+
+- aligned_end = ALIGN(new_key.offset + datal,
+- root->sectorsize);
+- ret = btrfs_drop_extents(trans, root, inode,
+- drop_start,
+- aligned_end,
+- 1);
++ ret = clone_copy_inline_extent(src, inode,
++ trans, path,
++ &new_key,
++ drop_start,
++ datal,
++ skip, size, buf);
+ if (ret) {
+ if (ret != -EOPNOTSUPP)
+ btrfs_abort_transaction(trans,
+- root, ret);
+- btrfs_end_transaction(trans, root);
+- goto out;
+- }
+-
+- ret = btrfs_insert_empty_item(trans, root, path,
+- &new_key, size);
+- if (ret) {
+- btrfs_abort_transaction(trans, root,
+- ret);
++ root,
++ ret);
+ btrfs_end_transaction(trans, root);
+ goto out;
+ }
+-
+- if (skip) {
+- u32 start =
+- btrfs_file_extent_calc_inline_size(0);
+- memmove(buf+start, buf+start+skip,
+- datal);
+- }
+-
+ leaf = path->nodes[0];
+ slot = path->slots[0];
+- write_extent_buffer(leaf, buf,
+- btrfs_item_ptr_offset(leaf, slot),
+- size);
+- inode_add_bytes(inode, datal);
+ }
+
+ /* If we have an implicit hole (NO_HOLES feature). */
+diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
+index 88cbb5995667..3a828a33cd67 100644
+--- a/fs/btrfs/relocation.c
++++ b/fs/btrfs/relocation.c
+@@ -1716,7 +1716,7 @@ int replace_file_extents(struct btrfs_trans_handle *trans,
+ ret = btrfs_inc_extent_ref(trans, root, new_bytenr,
+ num_bytes, parent,
+ btrfs_header_owner(leaf),
+- key.objectid, key.offset, 1);
++ key.objectid, key.offset);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ break;
+@@ -1724,7 +1724,7 @@ int replace_file_extents(struct btrfs_trans_handle *trans,
+
+ ret = btrfs_free_extent(trans, root, bytenr, num_bytes,
+ parent, btrfs_header_owner(leaf),
+- key.objectid, key.offset, 1);
++ key.objectid, key.offset);
+ if (ret) {
+ btrfs_abort_transaction(trans, root, ret);
+ break;
+@@ -1900,23 +1900,21 @@ again:
+
+ ret = btrfs_inc_extent_ref(trans, src, old_bytenr, blocksize,
+ path->nodes[level]->start,
+- src->root_key.objectid, level - 1, 0,
+- 1);
++ src->root_key.objectid, level - 1, 0);
+ BUG_ON(ret);
+ ret = btrfs_inc_extent_ref(trans, dest, new_bytenr, blocksize,
+ 0, dest->root_key.objectid, level - 1,
+- 0, 1);
++ 0);
+ BUG_ON(ret);
+
+ ret = btrfs_free_extent(trans, src, new_bytenr, blocksize,
+ path->nodes[level]->start,
+- src->root_key.objectid, level - 1, 0,
+- 1);
++ src->root_key.objectid, level - 1, 0);
+ BUG_ON(ret);
+
+ ret = btrfs_free_extent(trans, dest, old_bytenr, blocksize,
+ 0, dest->root_key.objectid, level - 1,
+- 0, 1);
++ 0);
+ BUG_ON(ret);
+
+ btrfs_unlock_up_safe(path, 0);
+@@ -2746,7 +2744,7 @@ static int do_relocation(struct btrfs_trans_handle *trans,
+ node->eb->start, blocksize,
+ upper->eb->start,
+ btrfs_header_owner(upper->eb),
+- node->level, 0, 1);
++ node->level, 0);
+ BUG_ON(ret);
+
+ ret = btrfs_drop_subtree(trans, root, eb, upper->eb);
+diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
+index aa72bfd28f7d..890933b61267 100644
+--- a/fs/btrfs/send.c
++++ b/fs/btrfs/send.c
+@@ -2351,8 +2351,14 @@ static int send_subvol_begin(struct send_ctx *sctx)
+ }
+
+ TLV_PUT_STRING(sctx, BTRFS_SEND_A_PATH, name, namelen);
+- TLV_PUT_UUID(sctx, BTRFS_SEND_A_UUID,
+- sctx->send_root->root_item.uuid);
++
++ if (!btrfs_is_empty_uuid(sctx->send_root->root_item.received_uuid))
++ TLV_PUT_UUID(sctx, BTRFS_SEND_A_UUID,
++ sctx->send_root->root_item.received_uuid);
++ else
++ TLV_PUT_UUID(sctx, BTRFS_SEND_A_UUID,
++ sctx->send_root->root_item.uuid);
++
+ TLV_PUT_U64(sctx, BTRFS_SEND_A_CTRANSID,
+ le64_to_cpu(sctx->send_root->root_item.ctransid));
+ if (parent_root) {
+diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
+index 9c45431e69ab..7639695075dd 100644
+--- a/fs/btrfs/tree-log.c
++++ b/fs/btrfs/tree-log.c
+@@ -700,7 +700,7 @@ static noinline int replay_one_extent(struct btrfs_trans_handle *trans,
+ ret = btrfs_inc_extent_ref(trans, root,
+ ins.objectid, ins.offset,
+ 0, root->root_key.objectid,
+- key->objectid, offset, 0);
++ key->objectid, offset);
+ if (ret)
+ goto out;
+ } else {
+diff --git a/fs/btrfs/xattr.c b/fs/btrfs/xattr.c
+index 6f518c90e1c1..1fcd7b6e7564 100644
+--- a/fs/btrfs/xattr.c
++++ b/fs/btrfs/xattr.c
+@@ -313,8 +313,10 @@ ssize_t btrfs_listxattr(struct dentry *dentry, char *buffer, size_t size)
+ /* check to make sure this item is what we want */
+ if (found_key.objectid != key.objectid)
+ break;
+- if (found_key.type != BTRFS_XATTR_ITEM_KEY)
++ if (found_key.type > BTRFS_XATTR_ITEM_KEY)
+ break;
++ if (found_key.type < BTRFS_XATTR_ITEM_KEY)
++ goto next;
+
+ di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
+ if (verify_dir_item(root, leaf, di))
+diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
+index 6aa07af67603..df45a818c570 100644
+--- a/fs/ceph/mds_client.c
++++ b/fs/ceph/mds_client.c
+@@ -1935,7 +1935,7 @@ static struct ceph_msg *create_request_message(struct ceph_mds_client *mdsc,
+
+ len = sizeof(*head) +
+ pathlen1 + pathlen2 + 2*(1 + sizeof(u32) + sizeof(u64)) +
+- sizeof(struct timespec);
++ sizeof(struct ceph_timespec);
+
+ /* calculate (max) length for cap releases */
+ len += sizeof(struct ceph_mds_request_release) *
+diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
+index c711be8d6a3c..9c8d23316da1 100644
+--- a/fs/debugfs/inode.c
++++ b/fs/debugfs/inode.c
+@@ -271,8 +271,12 @@ static struct dentry *start_creating(const char *name, struct dentry *parent)
+ dput(dentry);
+ dentry = ERR_PTR(-EEXIST);
+ }
+- if (IS_ERR(dentry))
++
++ if (IS_ERR(dentry)) {
+ mutex_unlock(&d_inode(parent)->i_mutex);
++ simple_release_fs(&debugfs_mount, &debugfs_mount_count);
++ }
++
+ return dentry;
+ }
+
+diff --git a/fs/ext4/crypto.c b/fs/ext4/crypto.c
+index 45731558138c..54a5169327a3 100644
+--- a/fs/ext4/crypto.c
++++ b/fs/ext4/crypto.c
+@@ -296,7 +296,6 @@ static int ext4_page_crypto(struct ext4_crypto_ctx *ctx,
+ else
+ res = crypto_ablkcipher_encrypt(req);
+ if (res == -EINPROGRESS || res == -EBUSY) {
+- BUG_ON(req->base.data != &ecr);
+ wait_for_completion(&ecr.completion);
+ res = ecr.res;
+ }
+diff --git a/fs/ext4/crypto_fname.c b/fs/ext4/crypto_fname.c
+index 7dc4eb55913c..f9d53c2bd756 100644
+--- a/fs/ext4/crypto_fname.c
++++ b/fs/ext4/crypto_fname.c
+@@ -121,7 +121,6 @@ static int ext4_fname_encrypt(struct inode *inode,
+ ablkcipher_request_set_crypt(req, &src_sg, &dst_sg, ciphertext_len, iv);
+ res = crypto_ablkcipher_encrypt(req);
+ if (res == -EINPROGRESS || res == -EBUSY) {
+- BUG_ON(req->base.data != &ecr);
+ wait_for_completion(&ecr.completion);
+ res = ecr.res;
+ }
+@@ -183,7 +182,6 @@ static int ext4_fname_decrypt(struct inode *inode,
+ ablkcipher_request_set_crypt(req, &src_sg, &dst_sg, iname->len, iv);
+ res = crypto_ablkcipher_decrypt(req);
+ if (res == -EINPROGRESS || res == -EBUSY) {
+- BUG_ON(req->base.data != &ecr);
+ wait_for_completion(&ecr.completion);
+ res = ecr.res;
+ }
+diff --git a/fs/ext4/crypto_key.c b/fs/ext4/crypto_key.c
+index 442d24e8efc0..9bad1132ac8f 100644
+--- a/fs/ext4/crypto_key.c
++++ b/fs/ext4/crypto_key.c
+@@ -71,7 +71,6 @@ static int ext4_derive_key_aes(char deriving_key[EXT4_AES_128_ECB_KEY_SIZE],
+ EXT4_AES_256_XTS_KEY_SIZE, NULL);
+ res = crypto_ablkcipher_encrypt(req);
+ if (res == -EINPROGRESS || res == -EBUSY) {
+- BUG_ON(req->base.data != &ecr);
+ wait_for_completion(&ecr.completion);
+ res = ecr.res;
+ }
+@@ -208,7 +207,12 @@ retry:
+ goto out;
+ }
+ crypt_info->ci_keyring_key = keyring_key;
+- BUG_ON(keyring_key->type != &key_type_logon);
++ if (keyring_key->type != &key_type_logon) {
++ printk_once(KERN_WARNING
++ "ext4: key type must be logon\n");
++ res = -ENOKEY;
++ goto out;
++ }
+ ukp = ((struct user_key_payload *)keyring_key->payload.data);
+ if (ukp->datalen != sizeof(struct ext4_encryption_key)) {
+ res = -EINVAL;
+@@ -217,7 +221,13 @@ retry:
+ master_key = (struct ext4_encryption_key *)ukp->data;
+ BUILD_BUG_ON(EXT4_AES_128_ECB_KEY_SIZE !=
+ EXT4_KEY_DERIVATION_NONCE_SIZE);
+- BUG_ON(master_key->size != EXT4_AES_256_XTS_KEY_SIZE);
++ if (master_key->size != EXT4_AES_256_XTS_KEY_SIZE) {
++ printk_once(KERN_WARNING
++ "ext4: key size incorrect: %d\n",
++ master_key->size);
++ res = -ENOKEY;
++ goto out;
++ }
+ res = ext4_derive_key_aes(ctx.nonce, master_key->raw,
+ raw_key);
+ got_key:
+diff --git a/fs/ext4/crypto_policy.c b/fs/ext4/crypto_policy.c
+index 02c4e5df7afb..f92fa93e67f1 100644
+--- a/fs/ext4/crypto_policy.c
++++ b/fs/ext4/crypto_policy.c
+@@ -137,7 +137,8 @@ int ext4_is_child_context_consistent_with_parent(struct inode *parent,
+
+ if ((parent == NULL) || (child == NULL)) {
+ pr_err("parent %p child %p\n", parent, child);
+- BUG_ON(1);
++ WARN_ON(1); /* Should never happen */
++ return 0;
+ }
+ /* no restrictions if the parent directory is not encrypted */
+ if (!ext4_encrypted_inode(parent))
+diff --git a/fs/ext4/ext4_jbd2.c b/fs/ext4/ext4_jbd2.c
+index d41843181818..e770c1ee4613 100644
+--- a/fs/ext4/ext4_jbd2.c
++++ b/fs/ext4/ext4_jbd2.c
+@@ -88,13 +88,13 @@ int __ext4_journal_stop(const char *where, unsigned int line, handle_t *handle)
+ return 0;
+ }
+
++ err = handle->h_err;
+ if (!handle->h_transaction) {
+- err = jbd2_journal_stop(handle);
+- return handle->h_err ? handle->h_err : err;
++ rc = jbd2_journal_stop(handle);
++ return err ? err : rc;
+ }
+
+ sb = handle->h_transaction->t_journal->j_private;
+- err = handle->h_err;
+ rc = jbd2_journal_stop(handle);
+
+ if (!err)
+diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
+index 5602450f03f6..89e96f99dae7 100644
+--- a/fs/ext4/page-io.c
++++ b/fs/ext4/page-io.c
+@@ -425,6 +425,7 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
+ struct buffer_head *bh, *head;
+ int ret = 0;
+ int nr_submitted = 0;
++ int nr_to_submit = 0;
+
+ blocksize = 1 << inode->i_blkbits;
+
+@@ -477,11 +478,13 @@ int ext4_bio_write_page(struct ext4_io_submit *io,
+ unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);
+ }
+ set_buffer_async_write(bh);
++ nr_to_submit++;
+ } while ((bh = bh->b_this_page) != head);
+
+ bh = head = page_buffers(page);
+
+- if (ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode)) {
++ if (ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode) &&
++ nr_to_submit) {
+ data_page = ext4_encrypt(inode, page);
+ if (IS_ERR(data_page)) {
+ ret = PTR_ERR(data_page);
+diff --git a/fs/ext4/super.c b/fs/ext4/super.c
+index a5e8c744e962..bc24d1b44b8f 100644
+--- a/fs/ext4/super.c
++++ b/fs/ext4/super.c
+@@ -397,9 +397,13 @@ static void ext4_handle_error(struct super_block *sb)
+ smp_wmb();
+ sb->s_flags |= MS_RDONLY;
+ }
+- if (test_opt(sb, ERRORS_PANIC))
++ if (test_opt(sb, ERRORS_PANIC)) {
++ if (EXT4_SB(sb)->s_journal &&
++ !(EXT4_SB(sb)->s_journal->j_flags & JBD2_REC_ERR))
++ return;
+ panic("EXT4-fs (device %s): panic forced after error\n",
+ sb->s_id);
++ }
+ }
+
+ #define ext4_error_ratelimit(sb) \
+@@ -588,8 +592,12 @@ void __ext4_abort(struct super_block *sb, const char *function,
+ jbd2_journal_abort(EXT4_SB(sb)->s_journal, -EIO);
+ save_error_info(sb, function, line);
+ }
+- if (test_opt(sb, ERRORS_PANIC))
++ if (test_opt(sb, ERRORS_PANIC)) {
++ if (EXT4_SB(sb)->s_journal &&
++ !(EXT4_SB(sb)->s_journal->j_flags & JBD2_REC_ERR))
++ return;
+ panic("EXT4-fs panic from previous error\n");
++ }
+ }
+
+ void __ext4_msg(struct super_block *sb,
+diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
+index 2721513adb1f..fd2787a39b87 100644
+--- a/fs/jbd2/journal.c
++++ b/fs/jbd2/journal.c
+@@ -2071,8 +2071,12 @@ static void __journal_abort_soft (journal_t *journal, int errno)
+
+ __jbd2_journal_abort_hard(journal);
+
+- if (errno)
++ if (errno) {
+ jbd2_journal_update_sb_errno(journal);
++ write_lock(&journal->j_state_lock);
++ journal->j_flags |= JBD2_REC_ERR;
++ write_unlock(&journal->j_state_lock);
++ }
+ }
+
+ /**
+diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
+index 4afbe13321cb..f27cc76ed5e6 100644
+--- a/fs/nfs/inode.c
++++ b/fs/nfs/inode.c
+@@ -1816,7 +1816,11 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
+ if ((long)fattr->gencount - (long)nfsi->attr_gencount > 0)
+ nfsi->attr_gencount = fattr->gencount;
+ }
+- invalid &= ~NFS_INO_INVALID_ATTR;
++
++ /* Don't declare attrcache up to date if there were no attrs! */
++ if (fattr->valid != 0)
++ invalid &= ~NFS_INO_INVALID_ATTR;
++
+ /* Don't invalidate the data if we were to blame */
+ if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode)
+ || S_ISLNK(inode->i_mode)))
+diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
+index 3aa6a9ba5113..199648d5fcc5 100644
+--- a/fs/nfs/nfs4client.c
++++ b/fs/nfs/nfs4client.c
+@@ -33,7 +33,7 @@ static int nfs_get_cb_ident_idr(struct nfs_client *clp, int minorversion)
+ return ret;
+ idr_preload(GFP_KERNEL);
+ spin_lock(&nn->nfs_client_lock);
+- ret = idr_alloc(&nn->cb_ident_idr, clp, 0, 0, GFP_NOWAIT);
++ ret = idr_alloc(&nn->cb_ident_idr, clp, 1, 0, GFP_NOWAIT);
+ if (ret >= 0)
+ clp->cl_cb_ident = ret;
+ spin_unlock(&nn->nfs_client_lock);
+diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
+index 75189cd34583..5ea13286e2b0 100644
+--- a/fs/nfsd/nfs4state.c
++++ b/fs/nfsd/nfs4state.c
+@@ -765,16 +765,68 @@ void nfs4_unhash_stid(struct nfs4_stid *s)
+ s->sc_type = 0;
+ }
+
+-static void
++/**
++ * nfs4_get_existing_delegation - Discover if this delegation already exists
++ * @clp: a pointer to the nfs4_client we're granting a delegation to
++ * @fp: a pointer to the nfs4_file we're granting a delegation on
++ *
++ * Return:
++ * On success: 0 if an existing delegation was not found.
++ *
++ * On error: -EAGAIN if one was previously granted to this nfs4_client
++ * for this nfs4_file.
++ *
++ */
++
++static int
++nfs4_get_existing_delegation(struct nfs4_client *clp, struct nfs4_file *fp)
++{
++ struct nfs4_delegation *searchdp = NULL;
++ struct nfs4_client *searchclp = NULL;
++
++ lockdep_assert_held(&state_lock);
++ lockdep_assert_held(&fp->fi_lock);
++
++ list_for_each_entry(searchdp, &fp->fi_delegations, dl_perfile) {
++ searchclp = searchdp->dl_stid.sc_client;
++ if (clp == searchclp) {
++ return -EAGAIN;
++ }
++ }
++ return 0;
++}
++
++/**
++ * hash_delegation_locked - Add a delegation to the appropriate lists
++ * @dp: a pointer to the nfs4_delegation we are adding.
++ * @fp: a pointer to the nfs4_file we're granting a delegation on
++ *
++ * Return:
++ * On success: 0 if the delegation was successfully hashed.
++ *
++ * On error: -EAGAIN if one was previously granted to this
++ * nfs4_client for this nfs4_file. Delegation is not hashed.
++ *
++ */
++
++static int
+ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
+ {
++ int status;
++ struct nfs4_client *clp = dp->dl_stid.sc_client;
++
+ lockdep_assert_held(&state_lock);
+ lockdep_assert_held(&fp->fi_lock);
+
++ status = nfs4_get_existing_delegation(clp, fp);
++ if (status)
++ return status;
++ ++fp->fi_delegees;
+ atomic_inc(&dp->dl_stid.sc_count);
+ dp->dl_stid.sc_type = NFS4_DELEG_STID;
+ list_add(&dp->dl_perfile, &fp->fi_delegations);
+- list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
++ list_add(&dp->dl_perclnt, &clp->cl_delegations);
++ return 0;
+ }
+
+ static bool
+@@ -3351,6 +3403,7 @@ static void init_open_stateid(struct nfs4_ol_stateid *stp, struct nfs4_file *fp,
+ stp->st_access_bmap = 0;
+ stp->st_deny_bmap = 0;
+ stp->st_openstp = NULL;
++ init_rwsem(&stp->st_rwsem);
+ spin_lock(&oo->oo_owner.so_client->cl_lock);
+ list_add(&stp->st_perstateowner, &oo->oo_owner.so_stateids);
+ spin_lock(&fp->fi_lock);
+@@ -3939,6 +3992,18 @@ static struct file_lock *nfs4_alloc_init_lease(struct nfs4_file *fp, int flag)
+ return fl;
+ }
+
++/**
++ * nfs4_setlease - Obtain a delegation by requesting lease from vfs layer
++ * @dp: a pointer to the nfs4_delegation we're adding.
++ *
++ * Return:
++ * On success: 0.
++ *
++ * On error: -EAGAIN if there was an existing delegation.
++ * nonzero if there is an error in other cases.
++ *
++ */
++
+ static int nfs4_setlease(struct nfs4_delegation *dp)
+ {
+ struct nfs4_file *fp = dp->dl_stid.sc_file;
+@@ -3970,16 +4035,19 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
+ goto out_unlock;
+ /* Race breaker */
+ if (fp->fi_deleg_file) {
+- status = 0;
+- ++fp->fi_delegees;
+- hash_delegation_locked(dp, fp);
++ status = hash_delegation_locked(dp, fp);
+ goto out_unlock;
+ }
+ fp->fi_deleg_file = filp;
+- fp->fi_delegees = 1;
+- hash_delegation_locked(dp, fp);
++ fp->fi_delegees = 0;
++ status = hash_delegation_locked(dp, fp);
+ spin_unlock(&fp->fi_lock);
+ spin_unlock(&state_lock);
++ if (status) {
++ /* Should never happen, this is a new fi_deleg_file */
++ WARN_ON_ONCE(1);
++ goto out_fput;
++ }
+ return 0;
+ out_unlock:
+ spin_unlock(&fp->fi_lock);
+@@ -3999,6 +4067,15 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh,
+ if (fp->fi_had_conflict)
+ return ERR_PTR(-EAGAIN);
+
++ spin_lock(&state_lock);
++ spin_lock(&fp->fi_lock);
++ status = nfs4_get_existing_delegation(clp, fp);
++ spin_unlock(&fp->fi_lock);
++ spin_unlock(&state_lock);
++
++ if (status)
++ return ERR_PTR(status);
++
+ dp = alloc_init_deleg(clp, fh, odstate);
+ if (!dp)
+ return ERR_PTR(-ENOMEM);
+@@ -4017,9 +4094,7 @@ nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh,
+ status = -EAGAIN;
+ goto out_unlock;
+ }
+- ++fp->fi_delegees;
+- hash_delegation_locked(dp, fp);
+- status = 0;
++ status = hash_delegation_locked(dp, fp);
+ out_unlock:
+ spin_unlock(&fp->fi_lock);
+ spin_unlock(&state_lock);
+@@ -4180,15 +4255,20 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
+ */
+ if (stp) {
+ /* Stateid was found, this is an OPEN upgrade */
++ down_read(&stp->st_rwsem);
+ status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open);
+- if (status)
++ if (status) {
++ up_read(&stp->st_rwsem);
+ goto out;
++ }
+ } else {
+ stp = open->op_stp;
+ open->op_stp = NULL;
+ init_open_stateid(stp, fp, open);
++ down_read(&stp->st_rwsem);
+ status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open);
+ if (status) {
++ up_read(&stp->st_rwsem);
+ release_open_stateid(stp);
+ goto out;
+ }
+@@ -4200,6 +4280,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
+ }
+ update_stateid(&stp->st_stid.sc_stateid);
+ memcpy(&open->op_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
++ up_read(&stp->st_rwsem);
+
+ if (nfsd4_has_session(&resp->cstate)) {
+ if (open->op_deleg_want & NFS4_SHARE_WANT_NO_DELEG) {
+@@ -4814,10 +4895,13 @@ static __be32 nfs4_seqid_op_checks(struct nfsd4_compound_state *cstate, stateid_
+ * revoked delegations are kept only for free_stateid.
+ */
+ return nfserr_bad_stateid;
++ down_write(&stp->st_rwsem);
+ status = check_stateid_generation(stateid, &stp->st_stid.sc_stateid, nfsd4_has_session(cstate));
+- if (status)
+- return status;
+- return nfs4_check_fh(current_fh, &stp->st_stid);
++ if (status == nfs_ok)
++ status = nfs4_check_fh(current_fh, &stp->st_stid);
++ if (status != nfs_ok)
++ up_write(&stp->st_rwsem);
++ return status;
+ }
+
+ /*
+@@ -4864,6 +4948,7 @@ static __be32 nfs4_preprocess_confirmed_seqid_op(struct nfsd4_compound_state *cs
+ return status;
+ oo = openowner(stp->st_stateowner);
+ if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) {
++ up_write(&stp->st_rwsem);
+ nfs4_put_stid(&stp->st_stid);
+ return nfserr_bad_stateid;
+ }
+@@ -4894,11 +4979,14 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ goto out;
+ oo = openowner(stp->st_stateowner);
+ status = nfserr_bad_stateid;
+- if (oo->oo_flags & NFS4_OO_CONFIRMED)
++ if (oo->oo_flags & NFS4_OO_CONFIRMED) {
++ up_write(&stp->st_rwsem);
+ goto put_stateid;
++ }
+ oo->oo_flags |= NFS4_OO_CONFIRMED;
+ update_stateid(&stp->st_stid.sc_stateid);
+ memcpy(&oc->oc_resp_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
++ up_write(&stp->st_rwsem);
+ dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n",
+ __func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid));
+
+@@ -4977,6 +5065,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp,
+ memcpy(&od->od_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
+ status = nfs_ok;
+ put_stateid:
++ up_write(&stp->st_rwsem);
+ nfs4_put_stid(&stp->st_stid);
+ out:
+ nfsd4_bump_seqid(cstate, status);
+@@ -5030,6 +5119,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ goto out;
+ update_stateid(&stp->st_stid.sc_stateid);
+ memcpy(&close->cl_stateid, &stp->st_stid.sc_stateid, sizeof(stateid_t));
++ up_write(&stp->st_rwsem);
+
+ nfsd4_close_open_stateid(stp);
+
+@@ -5260,6 +5350,7 @@ init_lock_stateid(struct nfs4_ol_stateid *stp, struct nfs4_lockowner *lo,
+ stp->st_access_bmap = 0;
+ stp->st_deny_bmap = open_stp->st_deny_bmap;
+ stp->st_openstp = open_stp;
++ init_rwsem(&stp->st_rwsem);
+ list_add(&stp->st_locks, &open_stp->st_locks);
+ list_add(&stp->st_perstateowner, &lo->lo_owner.so_stateids);
+ spin_lock(&fp->fi_lock);
+@@ -5428,6 +5519,7 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ &open_stp, nn);
+ if (status)
+ goto out;
++ up_write(&open_stp->st_rwsem);
+ open_sop = openowner(open_stp->st_stateowner);
+ status = nfserr_bad_stateid;
+ if (!same_clid(&open_sop->oo_owner.so_client->cl_clientid,
+@@ -5435,6 +5527,8 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ goto out;
+ status = lookup_or_create_lock_state(cstate, open_stp, lock,
+ &lock_stp, &new);
++ if (status == nfs_ok)
++ down_write(&lock_stp->st_rwsem);
+ } else {
+ status = nfs4_preprocess_seqid_op(cstate,
+ lock->lk_old_lock_seqid,
+@@ -5540,6 +5634,8 @@ out:
+ seqid_mutating_err(ntohl(status)))
+ lock_sop->lo_owner.so_seqid++;
+
++ up_write(&lock_stp->st_rwsem);
++
+ /*
+ * If this is a new, never-before-used stateid, and we are
+ * returning an error, then just go ahead and release it.
+@@ -5710,6 +5806,7 @@ nfsd4_locku(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
+ fput:
+ fput(filp);
+ put_stateid:
++ up_write(&stp->st_rwsem);
+ nfs4_put_stid(&stp->st_stid);
+ out:
+ nfsd4_bump_seqid(cstate, status);
+diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
+index 4874ce515fc1..fada614d6db1 100644
+--- a/fs/nfsd/state.h
++++ b/fs/nfsd/state.h
+@@ -534,15 +534,16 @@ struct nfs4_file {
+ * Better suggestions welcome.
+ */
+ struct nfs4_ol_stateid {
+- struct nfs4_stid st_stid; /* must be first field */
+- struct list_head st_perfile;
+- struct list_head st_perstateowner;
+- struct list_head st_locks;
+- struct nfs4_stateowner * st_stateowner;
+- struct nfs4_clnt_odstate * st_clnt_odstate;
+- unsigned char st_access_bmap;
+- unsigned char st_deny_bmap;
+- struct nfs4_ol_stateid * st_openstp;
++ struct nfs4_stid st_stid;
++ struct list_head st_perfile;
++ struct list_head st_perstateowner;
++ struct list_head st_locks;
++ struct nfs4_stateowner *st_stateowner;
++ struct nfs4_clnt_odstate *st_clnt_odstate;
++ unsigned char st_access_bmap;
++ unsigned char st_deny_bmap;
++ struct nfs4_ol_stateid *st_openstp;
++ struct rw_semaphore st_rwsem;
+ };
+
+ static inline struct nfs4_ol_stateid *openlockstateid(struct nfs4_stid *s)
+diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
+index 6e6abb93fda5..ff040125c190 100644
+--- a/fs/ocfs2/namei.c
++++ b/fs/ocfs2/namei.c
+@@ -365,6 +365,8 @@ static int ocfs2_mknod(struct inode *dir,
+ mlog_errno(status);
+ goto leave;
+ }
++ /* Update inode->i_mode after the umask has been applied to mode. */
++ inode->i_mode = mode;
+
+ handle = ocfs2_start_trans(osb, ocfs2_mknod_credits(osb->sb,
+ S_ISDIR(mode),
+diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
+index 82806c60aa42..e4b464983322 100644
+--- a/include/linux/ipv6.h
++++ b/include/linux/ipv6.h
+@@ -224,7 +224,7 @@ struct ipv6_pinfo {
+ struct ipv6_ac_socklist *ipv6_ac_list;
+ struct ipv6_fl_socklist __rcu *ipv6_fl_list;
+
+- struct ipv6_txoptions *opt;
++ struct ipv6_txoptions __rcu *opt;
+ struct sk_buff *pktoptions;
+ struct sk_buff *rxpmtu;
+ struct inet6_cork cork;
+diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
+index eb1cebed3f36..c90c9b70e568 100644
+--- a/include/linux/jbd2.h
++++ b/include/linux/jbd2.h
+@@ -1007,6 +1007,7 @@ struct journal_s
+ #define JBD2_ABORT_ON_SYNCDATA_ERR 0x040 /* Abort the journal on file
+ * data write error in ordered
+ * mode */
++#define JBD2_REC_ERR 0x080 /* The errno in the sb has been recorded */
+
+ /*
+ * Function declarations for the journaling transaction and buffer
+diff --git a/include/net/af_unix.h b/include/net/af_unix.h
+index cb1b9bbda332..49c7683e1096 100644
+--- a/include/net/af_unix.h
++++ b/include/net/af_unix.h
+@@ -62,6 +62,7 @@ struct unix_sock {
+ #define UNIX_GC_CANDIDATE 0
+ #define UNIX_GC_MAYBE_CYCLE 1
+ struct socket_wq peer_wq;
++ wait_queue_t peer_wake;
+ };
+
+ static inline struct unix_sock *unix_sk(struct sock *sk)
+diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
+index 3b76849c190f..75a888c254e4 100644
+--- a/include/net/ip6_fib.h
++++ b/include/net/ip6_fib.h
+@@ -165,7 +165,8 @@ static inline void rt6_update_expires(struct rt6_info *rt0, int timeout)
+
+ static inline u32 rt6_get_cookie(const struct rt6_info *rt)
+ {
+- if (rt->rt6i_flags & RTF_PCPU || unlikely(rt->dst.flags & DST_NOCACHE))
++ if (rt->rt6i_flags & RTF_PCPU ||
++ (unlikely(rt->dst.flags & DST_NOCACHE) && rt->dst.from))
+ rt = (struct rt6_info *)(rt->dst.from);
+
+ return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;
+diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
+index b8529aa1dae7..b0f7445c0fdc 100644
+--- a/include/net/ip6_tunnel.h
++++ b/include/net/ip6_tunnel.h
+@@ -83,11 +83,12 @@ static inline void ip6tunnel_xmit(struct sock *sk, struct sk_buff *skb,
+ err = ip6_local_out_sk(sk, skb);
+
+ if (net_xmit_eval(err) == 0) {
+- struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
++ struct pcpu_sw_netstats *tstats = get_cpu_ptr(dev->tstats);
+ u64_stats_update_begin(&tstats->syncp);
+ tstats->tx_bytes += pkt_len;
+ tstats->tx_packets++;
+ u64_stats_update_end(&tstats->syncp);
++ put_cpu_ptr(tstats);
+ } else {
+ stats->tx_errors++;
+ stats->tx_aborted_errors++;
+diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
+index d8214cb88bbc..9c2897e56ee1 100644
+--- a/include/net/ip_tunnels.h
++++ b/include/net/ip_tunnels.h
+@@ -207,12 +207,13 @@ static inline void iptunnel_xmit_stats(int err,
+ struct pcpu_sw_netstats __percpu *stats)
+ {
+ if (err > 0) {
+- struct pcpu_sw_netstats *tstats = this_cpu_ptr(stats);
++ struct pcpu_sw_netstats *tstats = get_cpu_ptr(stats);
+
+ u64_stats_update_begin(&tstats->syncp);
+ tstats->tx_bytes += err;
+ tstats->tx_packets++;
+ u64_stats_update_end(&tstats->syncp);
++ put_cpu_ptr(tstats);
+ } else if (err < 0) {
+ err_stats->tx_errors++;
+ err_stats->tx_aborted_errors++;
+diff --git a/include/net/ipv6.h b/include/net/ipv6.h
+index 82dbdb092a5d..177a89689095 100644
+--- a/include/net/ipv6.h
++++ b/include/net/ipv6.h
+@@ -205,6 +205,7 @@ extern rwlock_t ip6_ra_lock;
+ */
+
+ struct ipv6_txoptions {
++ atomic_t refcnt;
+ /* Length of this structure */
+ int tot_len;
+
+@@ -217,7 +218,7 @@ struct ipv6_txoptions {
+ struct ipv6_opt_hdr *dst0opt;
+ struct ipv6_rt_hdr *srcrt; /* Routing Header */
+ struct ipv6_opt_hdr *dst1opt;
+-
++ struct rcu_head rcu;
+ /* Option buffer, as read by IPV6_PKTOPTIONS, starts here. */
+ };
+
+@@ -252,6 +253,24 @@ struct ipv6_fl_socklist {
+ struct rcu_head rcu;
+ };
+
++static inline struct ipv6_txoptions *txopt_get(const struct ipv6_pinfo *np)
++{
++ struct ipv6_txoptions *opt;
++
++ rcu_read_lock();
++ opt = rcu_dereference(np->opt);
++ if (opt && !atomic_inc_not_zero(&opt->refcnt))
++ opt = NULL;
++ rcu_read_unlock();
++ return opt;
++}
++
++static inline void txopt_put(struct ipv6_txoptions *opt)
++{
++ if (opt && atomic_dec_and_test(&opt->refcnt))
++ kfree_rcu(opt, rcu);
++}
++
+ struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk, __be32 label);
+ struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions *opt_space,
+ struct ip6_flowlabel *fl,
+@@ -490,6 +509,7 @@ struct ip6_create_arg {
+ u32 user;
+ const struct in6_addr *src;
+ const struct in6_addr *dst;
++ int iif;
+ u8 ecn;
+ };
+
+diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
+index 2738f6f87908..49dda3835061 100644
+--- a/include/net/sch_generic.h
++++ b/include/net/sch_generic.h
+@@ -61,6 +61,9 @@ struct Qdisc {
+ */
+ #define TCQ_F_WARN_NONWC (1 << 16)
+ #define TCQ_F_CPUSTATS 0x20 /* run using percpu statistics */
++#define TCQ_F_NOPARENT 0x40 /* root of its hierarchy :
++ * qdisc_tree_decrease_qlen() should stop.
++ */
+ u32 limit;
+ const struct Qdisc_ops *ops;
+ struct qdisc_size_table __rcu *stab;
+diff --git a/include/net/switchdev.h b/include/net/switchdev.h
+index d5671f118bfc..0b9197975603 100644
+--- a/include/net/switchdev.h
++++ b/include/net/switchdev.h
+@@ -268,7 +268,7 @@ static inline int switchdev_port_fdb_dump(struct sk_buff *skb,
+ struct net_device *filter_dev,
+ int idx)
+ {
+- return -EOPNOTSUPP;
++ return idx;
+ }
+
+ #endif
+diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
+index cb31229a6fa4..34265a1ddb51 100644
+--- a/kernel/bpf/arraymap.c
++++ b/kernel/bpf/arraymap.c
+@@ -104,7 +104,7 @@ static int array_map_update_elem(struct bpf_map *map, void *key, void *value,
+ /* all elements already exist */
+ return -EEXIST;
+
+- memcpy(array->value + array->elem_size * index, value, array->elem_size);
++ memcpy(array->value + array->elem_size * index, value, map->value_size);
+ return 0;
+ }
+
+diff --git a/net/core/neighbour.c b/net/core/neighbour.c
+index 84195dacb8b6..ecdb1717ef3a 100644
+--- a/net/core/neighbour.c
++++ b/net/core/neighbour.c
+@@ -2210,7 +2210,7 @@ static int pneigh_fill_info(struct sk_buff *skb, struct pneigh_entry *pn,
+ ndm->ndm_pad2 = 0;
+ ndm->ndm_flags = pn->flags | NTF_PROXY;
+ ndm->ndm_type = RTN_UNICAST;
+- ndm->ndm_ifindex = pn->dev->ifindex;
++ ndm->ndm_ifindex = pn->dev ? pn->dev->ifindex : 0;
+ ndm->ndm_state = NUD_NONE;
+
+ if (nla_put(skb, NDA_DST, tbl->key_len, pn->key))
+@@ -2285,7 +2285,7 @@ static int pneigh_dump_table(struct neigh_table *tbl, struct sk_buff *skb,
+ if (h > s_h)
+ s_idx = 0;
+ for (n = tbl->phash_buckets[h], idx = 0; n; n = n->next) {
+- if (dev_net(n->dev) != net)
++ if (pneigh_net(n) != net)
+ continue;
+ if (idx < s_idx)
+ goto next;
+diff --git a/net/core/scm.c b/net/core/scm.c
+index 3b6899b7d810..8a1741b14302 100644
+--- a/net/core/scm.c
++++ b/net/core/scm.c
+@@ -305,6 +305,8 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
+ err = put_user(cmlen, &cm->cmsg_len);
+ if (!err) {
+ cmlen = CMSG_SPACE(i*sizeof(int));
++ if (msg->msg_controllen < cmlen)
++ cmlen = msg->msg_controllen;
+ msg->msg_control += cmlen;
+ msg->msg_controllen -= cmlen;
+ }
+diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
+index 5165571f397a..a0490508d213 100644
+--- a/net/dccp/ipv6.c
++++ b/net/dccp/ipv6.c
+@@ -202,7 +202,9 @@ static int dccp_v6_send_response(struct sock *sk, struct request_sock *req)
+ security_req_classify_flow(req, flowi6_to_flowi(&fl6));
+
+
+- final_p = fl6_update_dst(&fl6, np->opt, &final);
++ rcu_read_lock();
++ final_p = fl6_update_dst(&fl6, rcu_dereference(np->opt), &final);
++ rcu_read_unlock();
+
+ dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
+ if (IS_ERR(dst)) {
+@@ -219,7 +221,10 @@ static int dccp_v6_send_response(struct sock *sk, struct request_sock *req)
+ &ireq->ir_v6_loc_addr,
+ &ireq->ir_v6_rmt_addr);
+ fl6.daddr = ireq->ir_v6_rmt_addr;
+- err = ip6_xmit(sk, skb, &fl6, np->opt, np->tclass);
++ rcu_read_lock();
++ err = ip6_xmit(sk, skb, &fl6, rcu_dereference(np->opt),
++ np->tclass);
++ rcu_read_unlock();
+ err = net_xmit_eval(err);
+ }
+
+@@ -415,6 +420,7 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk,
+ {
+ struct inet_request_sock *ireq = inet_rsk(req);
+ struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
++ struct ipv6_txoptions *opt;
+ struct inet_sock *newinet;
+ struct dccp6_sock *newdp6;
+ struct sock *newsk;
+@@ -534,13 +540,15 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk,
+ * Yes, keeping reference count would be much more clever, but we make
+ * one more one thing there: reattach optmem to newsk.
+ */
+- if (np->opt != NULL)
+- newnp->opt = ipv6_dup_options(newsk, np->opt);
+-
++ opt = rcu_dereference(np->opt);
++ if (opt) {
++ opt = ipv6_dup_options(newsk, opt);
++ RCU_INIT_POINTER(newnp->opt, opt);
++ }
+ inet_csk(newsk)->icsk_ext_hdr_len = 0;
+- if (newnp->opt != NULL)
+- inet_csk(newsk)->icsk_ext_hdr_len = (newnp->opt->opt_nflen +
+- newnp->opt->opt_flen);
++ if (opt)
++ inet_csk(newsk)->icsk_ext_hdr_len = opt->opt_nflen +
++ opt->opt_flen;
+
+ dccp_sync_mss(newsk, dst_mtu(dst));
+
+@@ -793,6 +801,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
+ struct ipv6_pinfo *np = inet6_sk(sk);
+ struct dccp_sock *dp = dccp_sk(sk);
+ struct in6_addr *saddr = NULL, *final_p, final;
++ struct ipv6_txoptions *opt;
+ struct flowi6 fl6;
+ struct dst_entry *dst;
+ int addr_type;
+@@ -892,7 +901,8 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
+ fl6.fl6_sport = inet->inet_sport;
+ security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
+
+- final_p = fl6_update_dst(&fl6, np->opt, &final);
++ opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
++ final_p = fl6_update_dst(&fl6, opt, &final);
+
+ dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
+ if (IS_ERR(dst)) {
+@@ -912,9 +922,8 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
+ __ip6_dst_store(sk, dst, NULL, NULL);
+
+ icsk->icsk_ext_hdr_len = 0;
+- if (np->opt != NULL)
+- icsk->icsk_ext_hdr_len = (np->opt->opt_flen +
+- np->opt->opt_nflen);
++ if (opt)
++ icsk->icsk_ext_hdr_len = opt->opt_flen + opt->opt_nflen;
+
+ inet->inet_dport = usin->sin6_port;
+
+diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
+index df28693f32e1..c3bfebd501ed 100644
+--- a/net/ipv4/ipmr.c
++++ b/net/ipv4/ipmr.c
+@@ -134,7 +134,7 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
+ struct mfc_cache *c, struct rtmsg *rtm);
+ static void mroute_netlink_event(struct mr_table *mrt, struct mfc_cache *mfc,
+ int cmd);
+-static void mroute_clean_tables(struct mr_table *mrt);
++static void mroute_clean_tables(struct mr_table *mrt, bool all);
+ static void ipmr_expire_process(unsigned long arg);
+
+ #ifdef CONFIG_IP_MROUTE_MULTIPLE_TABLES
+@@ -351,7 +351,7 @@ static struct mr_table *ipmr_new_table(struct net *net, u32 id)
+ static void ipmr_free_table(struct mr_table *mrt)
+ {
+ del_timer_sync(&mrt->ipmr_expire_timer);
+- mroute_clean_tables(mrt);
++ mroute_clean_tables(mrt, true);
+ kfree(mrt);
+ }
+
+@@ -1209,7 +1209,7 @@ static int ipmr_mfc_add(struct net *net, struct mr_table *mrt,
+ * Close the multicast socket, and clear the vif tables etc
+ */
+
+-static void mroute_clean_tables(struct mr_table *mrt)
++static void mroute_clean_tables(struct mr_table *mrt, bool all)
+ {
+ int i;
+ LIST_HEAD(list);
+@@ -1218,8 +1218,9 @@ static void mroute_clean_tables(struct mr_table *mrt)
+ /* Shut down all active vif entries */
+
+ for (i = 0; i < mrt->maxvif; i++) {
+- if (!(mrt->vif_table[i].flags & VIFF_STATIC))
+- vif_delete(mrt, i, 0, &list);
++ if (!all && (mrt->vif_table[i].flags & VIFF_STATIC))
++ continue;
++ vif_delete(mrt, i, 0, &list);
+ }
+ unregister_netdevice_many(&list);
+
+@@ -1227,7 +1228,7 @@ static void mroute_clean_tables(struct mr_table *mrt)
+
+ for (i = 0; i < MFC_LINES; i++) {
+ list_for_each_entry_safe(c, next, &mrt->mfc_cache_array[i], list) {
+- if (c->mfc_flags & MFC_STATIC)
++ if (!all && (c->mfc_flags & MFC_STATIC))
+ continue;
+ list_del_rcu(&c->list);
+ mroute_netlink_event(mrt, c, RTM_DELROUTE);
+@@ -1262,7 +1263,7 @@ static void mrtsock_destruct(struct sock *sk)
+ NETCONFA_IFINDEX_ALL,
+ net->ipv4.devconf_all);
+ RCU_INIT_POINTER(mrt->mroute_sk, NULL);
+- mroute_clean_tables(mrt);
++ mroute_clean_tables(mrt, false);
+ }
+ }
+ rtnl_unlock();
+diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
+index 728f5b3d3c64..77730b43469d 100644
+--- a/net/ipv4/tcp_input.c
++++ b/net/ipv4/tcp_input.c
+@@ -4434,19 +4434,34 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int
+ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
+ {
+ struct sk_buff *skb;
++ int err = -ENOMEM;
++ int data_len = 0;
+ bool fragstolen;
+
+ if (size == 0)
+ return 0;
+
+- skb = alloc_skb(size, sk->sk_allocation);
++ if (size > PAGE_SIZE) {
++ int npages = min_t(size_t, size >> PAGE_SHIFT, MAX_SKB_FRAGS);
++
++ data_len = npages << PAGE_SHIFT;
++ size = data_len + (size & ~PAGE_MASK);
++ }
++ skb = alloc_skb_with_frags(size - data_len, data_len,
++ PAGE_ALLOC_COSTLY_ORDER,
++ &err, sk->sk_allocation);
+ if (!skb)
+ goto err;
+
++ skb_put(skb, size - data_len);
++ skb->data_len = data_len;
++ skb->len = size;
++
+ if (tcp_try_rmem_schedule(sk, skb, skb->truesize))
+ goto err_free;
+
+- if (memcpy_from_msg(skb_put(skb, size), msg, size))
++ err = skb_copy_datagram_from_iter(skb, 0, &msg->msg_iter, size);
++ if (err)
+ goto err_free;
+
+ TCP_SKB_CB(skb)->seq = tcp_sk(sk)->rcv_nxt;
+@@ -4462,7 +4477,8 @@ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, size_t size)
+ err_free:
+ kfree_skb(skb);
+ err:
+- return -ENOMEM;
++ return err;
++
+ }
+
+ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
+@@ -5620,6 +5636,7 @@ discard:
+ }
+
+ tp->rcv_nxt = TCP_SKB_CB(skb)->seq + 1;
++ tp->copied_seq = tp->rcv_nxt;
+ tp->rcv_wup = TCP_SKB_CB(skb)->seq + 1;
+
+ /* RFC1323: The window in SYN & SYN/ACK segments is
+diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
+index 0ea2e1c5d395..569c63894472 100644
+--- a/net/ipv4/tcp_ipv4.c
++++ b/net/ipv4/tcp_ipv4.c
+@@ -922,7 +922,8 @@ int tcp_md5_do_add(struct sock *sk, const union tcp_md5_addr *addr,
+ }
+
+ md5sig = rcu_dereference_protected(tp->md5sig_info,
+- sock_owned_by_user(sk));
++ sock_owned_by_user(sk) ||
++ lockdep_is_held(&sk->sk_lock.slock));
+ if (!md5sig) {
+ md5sig = kmalloc(sizeof(*md5sig), gfp);
+ if (!md5sig)
+diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
+index 5b752f58a900..1e63c8fe1db8 100644
+--- a/net/ipv4/tcp_timer.c
++++ b/net/ipv4/tcp_timer.c
+@@ -176,6 +176,18 @@ static int tcp_write_timeout(struct sock *sk)
+ syn_set = true;
+ } else {
+ if (retransmits_timed_out(sk, sysctl_tcp_retries1, 0, 0)) {
++ /* Some middle-boxes may black-hole Fast Open _after_
++ * the handshake. Therefore we conservatively disable
++ * Fast Open on this path on recurring timeouts with
++ * few or zero bytes acked after Fast Open.
++ */
++ if (tp->syn_data_acked &&
++ tp->bytes_acked <= tp->rx_opt.mss_clamp) {
++ tcp_fastopen_cache_set(sk, 0, NULL, true, 0);
++ if (icsk->icsk_retransmits == sysctl_tcp_retries1)
++ NET_INC_STATS_BH(sock_net(sk),
++ LINUX_MIB_TCPFASTOPENACTIVEFAIL);
++ }
+ /* Black hole detection */
+ tcp_mtu_probing(icsk, sk);
+
+diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
+index 7de52b65173f..d87519efc3bd 100644
+--- a/net/ipv6/af_inet6.c
++++ b/net/ipv6/af_inet6.c
+@@ -426,9 +426,11 @@ void inet6_destroy_sock(struct sock *sk)
+
+ /* Free tx options */
+
+- opt = xchg(&np->opt, NULL);
+- if (opt)
+- sock_kfree_s(sk, opt, opt->tot_len);
++ opt = xchg((__force struct ipv6_txoptions **)&np->opt, NULL);
++ if (opt) {
++ atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
++ txopt_put(opt);
++ }
+ }
+ EXPORT_SYMBOL_GPL(inet6_destroy_sock);
+
+@@ -657,7 +659,10 @@ int inet6_sk_rebuild_header(struct sock *sk)
+ fl6.fl6_sport = inet->inet_sport;
+ security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
+
+- final_p = fl6_update_dst(&fl6, np->opt, &final);
++ rcu_read_lock();
++ final_p = fl6_update_dst(&fl6, rcu_dereference(np->opt),
++ &final);
++ rcu_read_unlock();
+
+ dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
+ if (IS_ERR(dst)) {
+diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
+index b10a88986a98..13ca4cf5616f 100644
+--- a/net/ipv6/datagram.c
++++ b/net/ipv6/datagram.c
+@@ -167,8 +167,10 @@ ipv4_connected:
+
+ security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
+
+- opt = flowlabel ? flowlabel->opt : np->opt;
++ rcu_read_lock();
++ opt = flowlabel ? flowlabel->opt : rcu_dereference(np->opt);
+ final_p = fl6_update_dst(&fl6, opt, &final);
++ rcu_read_unlock();
+
+ dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
+ err = 0;
+diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
+index a7bbbe45570b..adbd6958c398 100644
+--- a/net/ipv6/exthdrs.c
++++ b/net/ipv6/exthdrs.c
+@@ -727,6 +727,7 @@ ipv6_dup_options(struct sock *sk, struct ipv6_txoptions *opt)
+ *((char **)&opt2->dst1opt) += dif;
+ if (opt2->srcrt)
+ *((char **)&opt2->srcrt) += dif;
++ atomic_set(&opt2->refcnt, 1);
+ }
+ return opt2;
+ }
+@@ -790,7 +791,7 @@ ipv6_renew_options(struct sock *sk, struct ipv6_txoptions *opt,
+ return ERR_PTR(-ENOBUFS);
+
+ memset(opt2, 0, tot_len);
+-
++ atomic_set(&opt2->refcnt, 1);
+ opt2->tot_len = tot_len;
+ p = (char *)(opt2 + 1);
+
+diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
+index 6927f3fb5597..9beed302eb36 100644
+--- a/net/ipv6/inet6_connection_sock.c
++++ b/net/ipv6/inet6_connection_sock.c
+@@ -77,7 +77,9 @@ struct dst_entry *inet6_csk_route_req(struct sock *sk,
+ memset(fl6, 0, sizeof(*fl6));
+ fl6->flowi6_proto = IPPROTO_TCP;
+ fl6->daddr = ireq->ir_v6_rmt_addr;
+- final_p = fl6_update_dst(fl6, np->opt, &final);
++ rcu_read_lock();
++ final_p = fl6_update_dst(fl6, rcu_dereference(np->opt), &final);
++ rcu_read_unlock();
+ fl6->saddr = ireq->ir_v6_loc_addr;
+ fl6->flowi6_oif = ireq->ir_iif;
+ fl6->flowi6_mark = ireq->ir_mark;
+@@ -207,7 +209,9 @@ static struct dst_entry *inet6_csk_route_socket(struct sock *sk,
+ fl6->fl6_dport = inet->inet_dport;
+ security_sk_classify_flow(sk, flowi6_to_flowi(fl6));
+
+- final_p = fl6_update_dst(fl6, np->opt, &final);
++ rcu_read_lock();
++ final_p = fl6_update_dst(fl6, rcu_dereference(np->opt), &final);
++ rcu_read_unlock();
+
+ dst = __inet6_csk_dst_check(sk, np->dst_cookie);
+ if (!dst) {
+@@ -240,7 +244,8 @@ int inet6_csk_xmit(struct sock *sk, struct sk_buff *skb, struct flowi *fl_unused
+ /* Restore final destination back after routing done */
+ fl6.daddr = sk->sk_v6_daddr;
+
+- res = ip6_xmit(sk, skb, &fl6, np->opt, np->tclass);
++ res = ip6_xmit(sk, skb, &fl6, rcu_dereference(np->opt),
++ np->tclass);
+ rcu_read_unlock();
+ return res;
+ }
+diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
+index 5f36266b1f5e..a7aef4b52d65 100644
+--- a/net/ipv6/ip6mr.c
++++ b/net/ipv6/ip6mr.c
+@@ -118,7 +118,7 @@ static void mr6_netlink_event(struct mr6_table *mrt, struct mfc6_cache *mfc,
+ int cmd);
+ static int ip6mr_rtm_dumproute(struct sk_buff *skb,
+ struct netlink_callback *cb);
+-static void mroute_clean_tables(struct mr6_table *mrt);
++static void mroute_clean_tables(struct mr6_table *mrt, bool all);
+ static void ipmr_expire_process(unsigned long arg);
+
+ #ifdef CONFIG_IPV6_MROUTE_MULTIPLE_TABLES
+@@ -335,7 +335,7 @@ static struct mr6_table *ip6mr_new_table(struct net *net, u32 id)
+ static void ip6mr_free_table(struct mr6_table *mrt)
+ {
+ del_timer_sync(&mrt->ipmr_expire_timer);
+- mroute_clean_tables(mrt);
++ mroute_clean_tables(mrt, true);
+ kfree(mrt);
+ }
+
+@@ -1543,7 +1543,7 @@ static int ip6mr_mfc_add(struct net *net, struct mr6_table *mrt,
+ * Close the multicast socket, and clear the vif tables etc
+ */
+
+-static void mroute_clean_tables(struct mr6_table *mrt)
++static void mroute_clean_tables(struct mr6_table *mrt, bool all)
+ {
+ int i;
+ LIST_HEAD(list);
+@@ -1553,8 +1553,9 @@ static void mroute_clean_tables(struct mr6_table *mrt)
+ * Shut down all active vif entries
+ */
+ for (i = 0; i < mrt->maxvif; i++) {
+- if (!(mrt->vif6_table[i].flags & VIFF_STATIC))
+- mif6_delete(mrt, i, &list);
++ if (!all && (mrt->vif6_table[i].flags & VIFF_STATIC))
++ continue;
++ mif6_delete(mrt, i, &list);
+ }
+ unregister_netdevice_many(&list);
+
+@@ -1563,7 +1564,7 @@ static void mroute_clean_tables(struct mr6_table *mrt)
+ */
+ for (i = 0; i < MFC6_LINES; i++) {
+ list_for_each_entry_safe(c, next, &mrt->mfc6_cache_array[i], list) {
+- if (c->mfc_flags & MFC_STATIC)
++ if (!all && (c->mfc_flags & MFC_STATIC))
+ continue;
+ write_lock_bh(&mrt_lock);
+ list_del(&c->list);
+@@ -1626,7 +1627,7 @@ int ip6mr_sk_done(struct sock *sk)
+ net->ipv6.devconf_all);
+ write_unlock_bh(&mrt_lock);
+
+- mroute_clean_tables(mrt);
++ mroute_clean_tables(mrt, false);
+ err = 0;
+ break;
+ }
+diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
+index 63e6956917c9..4449ad1f8114 100644
+--- a/net/ipv6/ipv6_sockglue.c
++++ b/net/ipv6/ipv6_sockglue.c
+@@ -111,7 +111,8 @@ struct ipv6_txoptions *ipv6_update_options(struct sock *sk,
+ icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
+ }
+ }
+- opt = xchg(&inet6_sk(sk)->opt, opt);
++ opt = xchg((__force struct ipv6_txoptions **)&inet6_sk(sk)->opt,
++ opt);
+ sk_dst_reset(sk);
+
+ return opt;
+@@ -231,9 +232,12 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
+ sk->sk_socket->ops = &inet_dgram_ops;
+ sk->sk_family = PF_INET;
+ }
+- opt = xchg(&np->opt, NULL);
+- if (opt)
+- sock_kfree_s(sk, opt, opt->tot_len);
++ opt = xchg((__force struct ipv6_txoptions **)&np->opt,
++ NULL);
++ if (opt) {
++ atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
++ txopt_put(opt);
++ }
+ pktopt = xchg(&np->pktoptions, NULL);
+ kfree_skb(pktopt);
+
+@@ -403,7 +407,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
+ if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
+ break;
+
+- opt = ipv6_renew_options(sk, np->opt, optname,
++ opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
++ opt = ipv6_renew_options(sk, opt, optname,
+ (struct ipv6_opt_hdr __user *)optval,
+ optlen);
+ if (IS_ERR(opt)) {
+@@ -432,8 +437,10 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
+ retv = 0;
+ opt = ipv6_update_options(sk, opt);
+ sticky_done:
+- if (opt)
+- sock_kfree_s(sk, opt, opt->tot_len);
++ if (opt) {
++ atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
++ txopt_put(opt);
++ }
+ break;
+ }
+
+@@ -486,6 +493,7 @@ sticky_done:
+ break;
+
+ memset(opt, 0, sizeof(*opt));
++ atomic_set(&opt->refcnt, 1);
+ opt->tot_len = sizeof(*opt) + optlen;
+ retv = -EFAULT;
+ if (copy_from_user(opt+1, optval, optlen))
+@@ -502,8 +510,10 @@ update:
+ retv = 0;
+ opt = ipv6_update_options(sk, opt);
+ done:
+- if (opt)
+- sock_kfree_s(sk, opt, opt->tot_len);
++ if (opt) {
++ atomic_sub(opt->tot_len, &sk->sk_omem_alloc);
++ txopt_put(opt);
++ }
+ break;
+ }
+ case IPV6_UNICAST_HOPS:
+@@ -1110,10 +1120,11 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
+ case IPV6_RTHDR:
+ case IPV6_DSTOPTS:
+ {
++ struct ipv6_txoptions *opt;
+
+ lock_sock(sk);
+- len = ipv6_getsockopt_sticky(sk, np->opt,
+- optname, optval, len);
++ opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
++ len = ipv6_getsockopt_sticky(sk, opt, optname, optval, len);
+ release_sock(sk);
+ /* check if ipv6_getsockopt_sticky() returns err code */
+ if (len < 0)
+diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
+index 083b2927fc67..41e3b5ee8d0b 100644
+--- a/net/ipv6/mcast.c
++++ b/net/ipv6/mcast.c
+@@ -1651,7 +1651,6 @@ out:
+ if (!err) {
+ ICMP6MSGOUT_INC_STATS(net, idev, ICMPV6_MLD2_REPORT);
+ ICMP6_INC_STATS(net, idev, ICMP6_MIB_OUTMSGS);
+- IP6_UPD_PO_STATS(net, idev, IPSTATS_MIB_OUTMCAST, payload_len);
+ } else {
+ IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
+ }
+@@ -2014,7 +2013,6 @@ out:
+ if (!err) {
+ ICMP6MSGOUT_INC_STATS(net, idev, type);
+ ICMP6_INC_STATS(net, idev, ICMP6_MIB_OUTMSGS);
+- IP6_UPD_PO_STATS(net, idev, IPSTATS_MIB_OUTMCAST, full_len);
+ } else
+ IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);
+
+diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
+index 6d02498172c1..2a4682c847b0 100644
+--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
++++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
+@@ -190,7 +190,7 @@ static void nf_ct_frag6_expire(unsigned long data)
+ /* Creation primitives. */
+ static inline struct frag_queue *fq_find(struct net *net, __be32 id,
+ u32 user, struct in6_addr *src,
+- struct in6_addr *dst, u8 ecn)
++ struct in6_addr *dst, int iif, u8 ecn)
+ {
+ struct inet_frag_queue *q;
+ struct ip6_create_arg arg;
+@@ -200,6 +200,7 @@ static inline struct frag_queue *fq_find(struct net *net, __be32 id,
+ arg.user = user;
+ arg.src = src;
+ arg.dst = dst;
++ arg.iif = iif;
+ arg.ecn = ecn;
+
+ local_bh_disable();
+@@ -603,7 +604,7 @@ struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user)
+ fhdr = (struct frag_hdr *)skb_transport_header(clone);
+
+ fq = fq_find(net, fhdr->identification, user, &hdr->saddr, &hdr->daddr,
+- ip6_frag_ecn(hdr));
++ skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr));
+ if (fq == NULL) {
+ pr_debug("Can't find and can't create new queue\n");
+ goto ret_orig;
+diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
+index ca4700cb26c4..92d532967c90 100644
+--- a/net/ipv6/raw.c
++++ b/net/ipv6/raw.c
+@@ -731,6 +731,7 @@ static int raw6_getfrag(void *from, char *to, int offset, int len, int odd,
+
+ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+ {
++ struct ipv6_txoptions *opt_to_free = NULL;
+ struct ipv6_txoptions opt_space;
+ DECLARE_SOCKADDR(struct sockaddr_in6 *, sin6, msg->msg_name);
+ struct in6_addr *daddr, *final_p, final;
+@@ -837,8 +838,10 @@ static int rawv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+ if (!(opt->opt_nflen|opt->opt_flen))
+ opt = NULL;
+ }
+- if (!opt)
+- opt = np->opt;
++ if (!opt) {
++ opt = txopt_get(np);
++ opt_to_free = opt;
++ }
+ if (flowlabel)
+ opt = fl6_merge_options(&opt_space, flowlabel, opt);
+ opt = ipv6_fixup_options(&opt_space, opt);
+@@ -904,6 +907,7 @@ done:
+ dst_release(dst);
+ out:
+ fl6_sock_release(flowlabel);
++ txopt_put(opt_to_free);
+ return err < 0 ? err : len;
+ do_confirm:
+ dst_confirm(dst);
+diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
+index f1159bb76e0a..04013a910ce5 100644
+--- a/net/ipv6/reassembly.c
++++ b/net/ipv6/reassembly.c
+@@ -108,7 +108,10 @@ bool ip6_frag_match(const struct inet_frag_queue *q, const void *a)
+ return fq->id == arg->id &&
+ fq->user == arg->user &&
+ ipv6_addr_equal(&fq->saddr, arg->src) &&
+- ipv6_addr_equal(&fq->daddr, arg->dst);
++ ipv6_addr_equal(&fq->daddr, arg->dst) &&
++ (arg->iif == fq->iif ||
++ !(ipv6_addr_type(arg->dst) & (IPV6_ADDR_MULTICAST |
++ IPV6_ADDR_LINKLOCAL)));
+ }
+ EXPORT_SYMBOL(ip6_frag_match);
+
+@@ -180,7 +183,7 @@ static void ip6_frag_expire(unsigned long data)
+
+ static struct frag_queue *
+ fq_find(struct net *net, __be32 id, const struct in6_addr *src,
+- const struct in6_addr *dst, u8 ecn)
++ const struct in6_addr *dst, int iif, u8 ecn)
+ {
+ struct inet_frag_queue *q;
+ struct ip6_create_arg arg;
+@@ -190,6 +193,7 @@ fq_find(struct net *net, __be32 id, const struct in6_addr *src,
+ arg.user = IP6_DEFRAG_LOCAL_DELIVER;
+ arg.src = src;
+ arg.dst = dst;
++ arg.iif = iif;
+ arg.ecn = ecn;
+
+ hash = inet6_hash_frag(id, src, dst);
+@@ -551,7 +555,7 @@ static int ipv6_frag_rcv(struct sk_buff *skb)
+ }
+
+ fq = fq_find(net, fhdr->identification, &hdr->saddr, &hdr->daddr,
+- ip6_frag_ecn(hdr));
++ skb->dev ? skb->dev->ifindex : 0, ip6_frag_ecn(hdr));
+ if (fq) {
+ int ret;
+
+diff --git a/net/ipv6/route.c b/net/ipv6/route.c
+index dd6ebba5846c..8478719ef500 100644
+--- a/net/ipv6/route.c
++++ b/net/ipv6/route.c
+@@ -401,6 +401,14 @@ static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev,
+ }
+ }
+
++static bool __rt6_check_expired(const struct rt6_info *rt)
++{
++ if (rt->rt6i_flags & RTF_EXPIRES)
++ return time_after(jiffies, rt->dst.expires);
++ else
++ return false;
++}
++
+ static bool rt6_check_expired(const struct rt6_info *rt)
+ {
+ if (rt->rt6i_flags & RTF_EXPIRES) {
+@@ -1255,7 +1263,8 @@ static struct dst_entry *rt6_check(struct rt6_info *rt, u32 cookie)
+
+ static struct dst_entry *rt6_dst_from_check(struct rt6_info *rt, u32 cookie)
+ {
+- if (rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK &&
++ if (!__rt6_check_expired(rt) &&
++ rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK &&
+ rt6_check((struct rt6_info *)(rt->dst.from), cookie))
+ return &rt->dst;
+ else
+@@ -1275,7 +1284,8 @@ static struct dst_entry *ip6_dst_check(struct dst_entry *dst, u32 cookie)
+
+ rt6_dst_from_metrics_check(rt);
+
+- if ((rt->rt6i_flags & RTF_PCPU) || unlikely(dst->flags & DST_NOCACHE))
++ if (rt->rt6i_flags & RTF_PCPU ||
++ (unlikely(dst->flags & DST_NOCACHE) && rt->dst.from))
+ return rt6_dst_from_check(rt, cookie);
+ else
+ return rt6_check(rt, cookie);
+@@ -1326,6 +1336,12 @@ static void rt6_do_update_pmtu(struct rt6_info *rt, u32 mtu)
+ rt6_update_expires(rt, net->ipv6.sysctl.ip6_rt_mtu_expires);
+ }
+
++static bool rt6_cache_allowed_for_pmtu(const struct rt6_info *rt)
++{
++ return !(rt->rt6i_flags & RTF_CACHE) &&
++ (rt->rt6i_flags & RTF_PCPU || rt->rt6i_node);
++}
++
+ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
+ const struct ipv6hdr *iph, u32 mtu)
+ {
+@@ -1339,7 +1355,7 @@ static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
+ if (mtu >= dst_mtu(dst))
+ return;
+
+- if (rt6->rt6i_flags & RTF_CACHE) {
++ if (!rt6_cache_allowed_for_pmtu(rt6)) {
+ rt6_do_update_pmtu(rt6, mtu);
+ } else {
+ const struct in6_addr *daddr, *saddr;
+diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
+index 0909f4e0d53c..f30bfdcdea54 100644
+--- a/net/ipv6/syncookies.c
++++ b/net/ipv6/syncookies.c
+@@ -225,7 +225,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
+ memset(&fl6, 0, sizeof(fl6));
+ fl6.flowi6_proto = IPPROTO_TCP;
+ fl6.daddr = ireq->ir_v6_rmt_addr;
+- final_p = fl6_update_dst(&fl6, np->opt, &final);
++ final_p = fl6_update_dst(&fl6, rcu_dereference(np->opt), &final);
+ fl6.saddr = ireq->ir_v6_loc_addr;
+ fl6.flowi6_oif = sk->sk_bound_dev_if;
+ fl6.flowi6_mark = ireq->ir_mark;
+diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
+index 7a6cea5e4274..45e473ee340b 100644
+--- a/net/ipv6/tcp_ipv6.c
++++ b/net/ipv6/tcp_ipv6.c
+@@ -120,6 +120,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
+ struct ipv6_pinfo *np = inet6_sk(sk);
+ struct tcp_sock *tp = tcp_sk(sk);
+ struct in6_addr *saddr = NULL, *final_p, final;
++ struct ipv6_txoptions *opt;
+ struct flowi6 fl6;
+ struct dst_entry *dst;
+ int addr_type;
+@@ -235,7 +236,8 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
+ fl6.fl6_dport = usin->sin6_port;
+ fl6.fl6_sport = inet->inet_sport;
+
+- final_p = fl6_update_dst(&fl6, np->opt, &final);
++ opt = rcu_dereference_protected(np->opt, sock_owned_by_user(sk));
++ final_p = fl6_update_dst(&fl6, opt, &final);
+
+ security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
+
+@@ -263,9 +265,9 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
+ tcp_fetch_timewait_stamp(sk, dst);
+
+ icsk->icsk_ext_hdr_len = 0;
+- if (np->opt)
+- icsk->icsk_ext_hdr_len = (np->opt->opt_flen +
+- np->opt->opt_nflen);
++ if (opt)
++ icsk->icsk_ext_hdr_len = opt->opt_flen +
++ opt->opt_nflen;
+
+ tp->rx_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr);
+
+@@ -461,7 +463,8 @@ static int tcp_v6_send_synack(struct sock *sk, struct dst_entry *dst,
+ fl6->flowlabel = ip6_flowlabel(ipv6_hdr(ireq->pktopts));
+
+ skb_set_queue_mapping(skb, queue_mapping);
+- err = ip6_xmit(sk, skb, fl6, np->opt, np->tclass);
++ err = ip6_xmit(sk, skb, fl6, rcu_dereference(np->opt),
++ np->tclass);
+ err = net_xmit_eval(err);
+ }
+
+@@ -991,6 +994,7 @@ static struct sock *tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
+ struct inet_request_sock *ireq;
+ struct ipv6_pinfo *newnp, *np = inet6_sk(sk);
+ struct tcp6_sock *newtcp6sk;
++ struct ipv6_txoptions *opt;
+ struct inet_sock *newinet;
+ struct tcp_sock *newtp;
+ struct sock *newsk;
+@@ -1126,13 +1130,15 @@ static struct sock *tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
+ but we make one more one thing there: reattach optmem
+ to newsk.
+ */
+- if (np->opt)
+- newnp->opt = ipv6_dup_options(newsk, np->opt);
+-
++ opt = rcu_dereference(np->opt);
++ if (opt) {
++ opt = ipv6_dup_options(newsk, opt);
++ RCU_INIT_POINTER(newnp->opt, opt);
++ }
+ inet_csk(newsk)->icsk_ext_hdr_len = 0;
+- if (newnp->opt)
+- inet_csk(newsk)->icsk_ext_hdr_len = (newnp->opt->opt_nflen +
+- newnp->opt->opt_flen);
++ if (opt)
++ inet_csk(newsk)->icsk_ext_hdr_len = opt->opt_nflen +
++ opt->opt_flen;
+
+ tcp_ca_openreq_child(newsk, dst);
+
+diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
+index e51fc3eee6db..7333f3575fc5 100644
+--- a/net/ipv6/udp.c
++++ b/net/ipv6/udp.c
+@@ -1107,6 +1107,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+ DECLARE_SOCKADDR(struct sockaddr_in6 *, sin6, msg->msg_name);
+ struct in6_addr *daddr, *final_p, final;
+ struct ipv6_txoptions *opt = NULL;
++ struct ipv6_txoptions *opt_to_free = NULL;
+ struct ip6_flowlabel *flowlabel = NULL;
+ struct flowi6 fl6;
+ struct dst_entry *dst;
+@@ -1260,8 +1261,10 @@ do_udp_sendmsg:
+ opt = NULL;
+ connected = 0;
+ }
+- if (!opt)
+- opt = np->opt;
++ if (!opt) {
++ opt = txopt_get(np);
++ opt_to_free = opt;
++ }
+ if (flowlabel)
+ opt = fl6_merge_options(&opt_space, flowlabel, opt);
+ opt = ipv6_fixup_options(&opt_space, opt);
+@@ -1370,6 +1373,7 @@ release_dst:
+ out:
+ dst_release(dst);
+ fl6_sock_release(flowlabel);
++ txopt_put(opt_to_free);
+ if (!err)
+ return len;
+ /*
+diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
+index d1ded3777815..0ce9da948ad7 100644
+--- a/net/l2tp/l2tp_ip6.c
++++ b/net/l2tp/l2tp_ip6.c
+@@ -486,6 +486,7 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+ DECLARE_SOCKADDR(struct sockaddr_l2tpip6 *, lsa, msg->msg_name);
+ struct in6_addr *daddr, *final_p, final;
+ struct ipv6_pinfo *np = inet6_sk(sk);
++ struct ipv6_txoptions *opt_to_free = NULL;
+ struct ipv6_txoptions *opt = NULL;
+ struct ip6_flowlabel *flowlabel = NULL;
+ struct dst_entry *dst = NULL;
+@@ -575,8 +576,10 @@ static int l2tp_ip6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
+ opt = NULL;
+ }
+
+- if (opt == NULL)
+- opt = np->opt;
++ if (!opt) {
++ opt = txopt_get(np);
++ opt_to_free = opt;
++ }
+ if (flowlabel)
+ opt = fl6_merge_options(&opt_space, flowlabel, opt);
+ opt = ipv6_fixup_options(&opt_space, opt);
+@@ -631,6 +634,7 @@ done:
+ dst_release(dst);
+ out:
+ fl6_sock_release(flowlabel);
++ txopt_put(opt_to_free);
+
+ return err < 0 ? err : len;
+
+diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
+index 71cb085e16fd..71d671c06952 100644
+--- a/net/packet/af_packet.c
++++ b/net/packet/af_packet.c
+@@ -1622,6 +1622,20 @@ static void fanout_release(struct sock *sk)
+ kfree_rcu(po->rollover, rcu);
+ }
+
++static bool packet_extra_vlan_len_allowed(const struct net_device *dev,
++ struct sk_buff *skb)
++{
++ /* Earlier code assumed this would be a VLAN pkt, double-check
++ * this now that we have the actual packet in hand. We can only
++ * do this check on Ethernet devices.
++ */
++ if (unlikely(dev->type != ARPHRD_ETHER))
++ return false;
++
++ skb_reset_mac_header(skb);
++ return likely(eth_hdr(skb)->h_proto == htons(ETH_P_8021Q));
++}
++
+ static const struct proto_ops packet_ops;
+
+ static const struct proto_ops packet_ops_spkt;
+@@ -1783,18 +1797,10 @@ retry:
+ goto retry;
+ }
+
+- if (len > (dev->mtu + dev->hard_header_len + extra_len)) {
+- /* Earlier code assumed this would be a VLAN pkt,
+- * double-check this now that we have the actual
+- * packet in hand.
+- */
+- struct ethhdr *ehdr;
+- skb_reset_mac_header(skb);
+- ehdr = eth_hdr(skb);
+- if (ehdr->h_proto != htons(ETH_P_8021Q)) {
+- err = -EMSGSIZE;
+- goto out_unlock;
+- }
++ if (len > (dev->mtu + dev->hard_header_len + extra_len) &&
++ !packet_extra_vlan_len_allowed(dev, skb)) {
++ err = -EMSGSIZE;
++ goto out_unlock;
+ }
+
+ skb->protocol = proto;
+@@ -2213,6 +2219,15 @@ static bool ll_header_truncated(const struct net_device *dev, int len)
+ return false;
+ }
+
++static void tpacket_set_protocol(const struct net_device *dev,
++ struct sk_buff *skb)
++{
++ if (dev->type == ARPHRD_ETHER) {
++ skb_reset_mac_header(skb);
++ skb->protocol = eth_hdr(skb)->h_proto;
++ }
++}
++
+ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
+ void *frame, struct net_device *dev, int size_max,
+ __be16 proto, unsigned char *addr, int hlen)
+@@ -2249,8 +2264,6 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
+ skb_reserve(skb, hlen);
+ skb_reset_network_header(skb);
+
+- if (!packet_use_direct_xmit(po))
+- skb_probe_transport_header(skb, 0);
+ if (unlikely(po->tp_tx_has_off)) {
+ int off_min, off_max, off;
+ off_min = po->tp_hdrlen - sizeof(struct sockaddr_ll);
+@@ -2296,6 +2309,8 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
+ dev->hard_header_len);
+ if (unlikely(err))
+ return err;
++ if (!skb->protocol)
++ tpacket_set_protocol(dev, skb);
+
+ data += dev->hard_header_len;
+ to_write -= dev->hard_header_len;
+@@ -2330,6 +2345,8 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
+ len = ((to_write > len_max) ? len_max : to_write);
+ }
+
++ skb_probe_transport_header(skb, 0);
++
+ return tp_len;
+ }
+
+@@ -2374,12 +2391,13 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
+ if (unlikely(!(dev->flags & IFF_UP)))
+ goto out_put;
+
+- reserve = dev->hard_header_len + VLAN_HLEN;
++ if (po->sk.sk_socket->type == SOCK_RAW)
++ reserve = dev->hard_header_len;
+ size_max = po->tx_ring.frame_size
+ - (po->tp_hdrlen - sizeof(struct sockaddr_ll));
+
+- if (size_max > dev->mtu + reserve)
+- size_max = dev->mtu + reserve;
++ if (size_max > dev->mtu + reserve + VLAN_HLEN)
++ size_max = dev->mtu + reserve + VLAN_HLEN;
+
+ do {
+ ph = packet_current_frame(po, &po->tx_ring,
+@@ -2406,18 +2424,10 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
+ tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto,
+ addr, hlen);
+ if (likely(tp_len >= 0) &&
+- tp_len > dev->mtu + dev->hard_header_len) {
+- struct ethhdr *ehdr;
+- /* Earlier code assumed this would be a VLAN pkt,
+- * double-check this now that we have the actual
+- * packet in hand.
+- */
++ tp_len > dev->mtu + reserve &&
++ !packet_extra_vlan_len_allowed(dev, skb))
++ tp_len = -EMSGSIZE;
+
+- skb_reset_mac_header(skb);
+- ehdr = eth_hdr(skb);
+- if (ehdr->h_proto != htons(ETH_P_8021Q))
+- tp_len = -EMSGSIZE;
+- }
+ if (unlikely(tp_len < 0)) {
+ if (po->tp_loss) {
+ __packet_set_status(po, ph,
+@@ -2638,18 +2648,10 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
+
+ sock_tx_timestamp(sk, &skb_shinfo(skb)->tx_flags);
+
+- if (!gso_type && (len > dev->mtu + reserve + extra_len)) {
+- /* Earlier code assumed this would be a VLAN pkt,
+- * double-check this now that we have the actual
+- * packet in hand.
+- */
+- struct ethhdr *ehdr;
+- skb_reset_mac_header(skb);
+- ehdr = eth_hdr(skb);
+- if (ehdr->h_proto != htons(ETH_P_8021Q)) {
+- err = -EMSGSIZE;
+- goto out_free;
+- }
++ if (!gso_type && (len > dev->mtu + reserve + extra_len) &&
++ !packet_extra_vlan_len_allowed(dev, skb)) {
++ err = -EMSGSIZE;
++ goto out_free;
+ }
+
+ skb->protocol = proto;
+@@ -2680,8 +2682,8 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
+ len += vnet_hdr_len;
+ }
+
+- if (!packet_use_direct_xmit(po))
+- skb_probe_transport_header(skb, reserve);
++ skb_probe_transport_header(skb, reserve);
++
+ if (unlikely(extra_len == 4))
+ skb->no_fcs = 1;
+
+diff --git a/net/rds/connection.c b/net/rds/connection.c
+index 9d66705f9d41..da6da57e5f36 100644
+--- a/net/rds/connection.c
++++ b/net/rds/connection.c
+@@ -187,12 +187,6 @@ new_conn:
+ }
+ }
+
+- if (trans == NULL) {
+- kmem_cache_free(rds_conn_slab, conn);
+- conn = ERR_PTR(-ENODEV);
+- goto out;
+- }
+-
+ conn->c_trans = trans;
+
+ ret = trans->conn_alloc(conn, gfp);
+diff --git a/net/rds/send.c b/net/rds/send.c
+index e9430f537f9c..7b30c0f3180d 100644
+--- a/net/rds/send.c
++++ b/net/rds/send.c
+@@ -986,11 +986,13 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
+ release_sock(sk);
+ }
+
+- /* racing with another thread binding seems ok here */
++ lock_sock(sk);
+ if (daddr == 0 || rs->rs_bound_addr == 0) {
++ release_sock(sk);
+ ret = -ENOTCONN; /* XXX not a great errno */
+ goto out;
+ }
++ release_sock(sk);
+
+ /* size of rm including all sgs */
+ ret = rds_rm_size(msg, payload_len);
+diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
+index f06aa01d60fd..1a0aa2a7cfeb 100644
+--- a/net/sched/sch_api.c
++++ b/net/sched/sch_api.c
+@@ -253,7 +253,8 @@ int qdisc_set_default(const char *name)
+ }
+
+ /* We know handle. Find qdisc among all qdisc's attached to device
+- (root qdisc, all its children, children of children etc.)
++ * (root qdisc, all its children, children of children etc.)
++ * Note: caller either uses rtnl or rcu_read_lock()
+ */
+
+ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
+@@ -264,7 +265,7 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
+ root->handle == handle)
+ return root;
+
+- list_for_each_entry(q, &root->list, list) {
++ list_for_each_entry_rcu(q, &root->list, list) {
+ if (q->handle == handle)
+ return q;
+ }
+@@ -277,15 +278,18 @@ void qdisc_list_add(struct Qdisc *q)
+ struct Qdisc *root = qdisc_dev(q)->qdisc;
+
+ WARN_ON_ONCE(root == &noop_qdisc);
+- list_add_tail(&q->list, &root->list);
++ ASSERT_RTNL();
++ list_add_tail_rcu(&q->list, &root->list);
+ }
+ }
+ EXPORT_SYMBOL(qdisc_list_add);
+
+ void qdisc_list_del(struct Qdisc *q)
+ {
+- if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS))
+- list_del(&q->list);
++ if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
++ ASSERT_RTNL();
++ list_del_rcu(&q->list);
++ }
+ }
+ EXPORT_SYMBOL(qdisc_list_del);
+
+@@ -750,14 +754,18 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
+ if (n == 0)
+ return;
+ drops = max_t(int, n, 0);
++ rcu_read_lock();
+ while ((parentid = sch->parent)) {
+ if (TC_H_MAJ(parentid) == TC_H_MAJ(TC_H_INGRESS))
+- return;
++ break;
+
++ if (sch->flags & TCQ_F_NOPARENT)
++ break;
++ /* TODO: perform the search on a per txq basis */
+ sch = qdisc_lookup(qdisc_dev(sch), TC_H_MAJ(parentid));
+ if (sch == NULL) {
+- WARN_ON(parentid != TC_H_ROOT);
+- return;
++ WARN_ON_ONCE(parentid != TC_H_ROOT);
++ break;
+ }
+ cops = sch->ops->cl_ops;
+ if (cops->qlen_notify) {
+@@ -768,6 +776,7 @@ void qdisc_tree_decrease_qlen(struct Qdisc *sch, unsigned int n)
+ sch->q.qlen -= n;
+ __qdisc_qstats_drop(sch, drops);
+ }
++ rcu_read_unlock();
+ }
+ EXPORT_SYMBOL(qdisc_tree_decrease_qlen);
+
+@@ -941,7 +950,7 @@ qdisc_create(struct net_device *dev, struct netdev_queue *dev_queue,
+ }
+ lockdep_set_class(qdisc_lock(sch), &qdisc_tx_lock);
+ if (!netif_is_multiqueue(dev))
+- sch->flags |= TCQ_F_ONETXQUEUE;
++ sch->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+ }
+
+ sch->handle = handle;
+diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
+index 6efca30894aa..b453270be3fd 100644
+--- a/net/sched/sch_generic.c
++++ b/net/sched/sch_generic.c
+@@ -743,7 +743,7 @@ static void attach_one_default_qdisc(struct net_device *dev,
+ return;
+ }
+ if (!netif_is_multiqueue(dev))
+- qdisc->flags |= TCQ_F_ONETXQUEUE;
++ qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+ }
+ dev_queue->qdisc_sleeping = qdisc;
+ }
+diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
+index f3cbaecd283a..3e82f047caaf 100644
+--- a/net/sched/sch_mq.c
++++ b/net/sched/sch_mq.c
+@@ -63,7 +63,7 @@ static int mq_init(struct Qdisc *sch, struct nlattr *opt)
+ if (qdisc == NULL)
+ goto err;
+ priv->qdiscs[ntx] = qdisc;
+- qdisc->flags |= TCQ_F_ONETXQUEUE;
++ qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+ }
+
+ sch->flags |= TCQ_F_MQROOT;
+@@ -156,7 +156,7 @@ static int mq_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
+
+ *old = dev_graft_qdisc(dev_queue, new);
+ if (new)
+- new->flags |= TCQ_F_ONETXQUEUE;
++ new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+ if (dev->flags & IFF_UP)
+ dev_activate(dev);
+ return 0;
+diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
+index 3811a745452c..ad70ecf57ce7 100644
+--- a/net/sched/sch_mqprio.c
++++ b/net/sched/sch_mqprio.c
+@@ -132,7 +132,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt)
+ goto err;
+ }
+ priv->qdiscs[i] = qdisc;
+- qdisc->flags |= TCQ_F_ONETXQUEUE;
++ qdisc->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+ }
+
+ /* If the mqprio options indicate that hardware should own
+@@ -209,7 +209,7 @@ static int mqprio_graft(struct Qdisc *sch, unsigned long cl, struct Qdisc *new,
+ *old = dev_graft_qdisc(dev_queue, new);
+
+ if (new)
+- new->flags |= TCQ_F_ONETXQUEUE;
++ new->flags |= TCQ_F_ONETXQUEUE | TCQ_F_NOPARENT;
+
+ if (dev->flags & IFF_UP)
+ dev_activate(dev);
+diff --git a/net/sctp/auth.c b/net/sctp/auth.c
+index 4f15b7d730e1..1543e39f47c3 100644
+--- a/net/sctp/auth.c
++++ b/net/sctp/auth.c
+@@ -809,8 +809,8 @@ int sctp_auth_ep_set_hmacs(struct sctp_endpoint *ep,
+ if (!has_sha1)
+ return -EINVAL;
+
+- memcpy(ep->auth_hmacs_list->hmac_ids, &hmacs->shmac_idents[0],
+- hmacs->shmac_num_idents * sizeof(__u16));
++ for (i = 0; i < hmacs->shmac_num_idents; i++)
++ ep->auth_hmacs_list->hmac_ids[i] = htons(hmacs->shmac_idents[i]);
+ ep->auth_hmacs_list->param_hdr.length = htons(sizeof(sctp_paramhdr_t) +
+ hmacs->shmac_num_idents * sizeof(__u16));
+ return 0;
+diff --git a/net/sctp/socket.c b/net/sctp/socket.c
+index 17bef01b9aa3..3ec88be0faec 100644
+--- a/net/sctp/socket.c
++++ b/net/sctp/socket.c
+@@ -7375,6 +7375,13 @@ struct proto sctp_prot = {
+
+ #if IS_ENABLED(CONFIG_IPV6)
+
++#include <net/transp_v6.h>
++static void sctp_v6_destroy_sock(struct sock *sk)
++{
++ sctp_destroy_sock(sk);
++ inet6_destroy_sock(sk);
++}
++
+ struct proto sctpv6_prot = {
+ .name = "SCTPv6",
+ .owner = THIS_MODULE,
+@@ -7384,7 +7391,7 @@ struct proto sctpv6_prot = {
+ .accept = sctp_accept,
+ .ioctl = sctp_ioctl,
+ .init = sctp_init_sock,
+- .destroy = sctp_destroy_sock,
++ .destroy = sctp_v6_destroy_sock,
+ .shutdown = sctp_shutdown,
+ .setsockopt = sctp_setsockopt,
+ .getsockopt = sctp_getsockopt,
+diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
+index 94f658235fb4..128b0982c96b 100644
+--- a/net/unix/af_unix.c
++++ b/net/unix/af_unix.c
+@@ -326,6 +326,118 @@ found:
+ return s;
+ }
+
++/* Support code for asymmetrically connected dgram sockets
++ *
++ * If a datagram socket is connected to a socket not itself connected
++ * to the first socket (eg, /dev/log), clients may only enqueue more
++ * messages if the present receive queue of the server socket is not
++ * "too large". This means there's a second writeability condition
++ * poll and sendmsg need to test. The dgram recv code will do a wake
++ * up on the peer_wait wait queue of a socket upon reception of a
++ * datagram which needs to be propagated to sleeping would-be writers
++ * since these might not have sent anything so far. This can't be
++ * accomplished via poll_wait because the lifetime of the server
++ * socket might be less than that of its clients if these break their
++ * association with it or if the server socket is closed while clients
++ * are still connected to it and there's no way to inform "a polling
++ * implementation" that it should let go of a certain wait queue
++ *
++ * In order to propagate a wake up, a wait_queue_t of the client
++ * socket is enqueued on the peer_wait queue of the server socket
++ * whose wake function does a wake_up on the ordinary client socket
++ * wait queue. This connection is established whenever a write (or
++ * poll for write) hit the flow control condition and broken when the
++ * association to the server socket is dissolved or after a wake up
++ * was relayed.
++ */
++
++static int unix_dgram_peer_wake_relay(wait_queue_t *q, unsigned mode, int flags,
++ void *key)
++{
++ struct unix_sock *u;
++ wait_queue_head_t *u_sleep;
++
++ u = container_of(q, struct unix_sock, peer_wake);
++
++ __remove_wait_queue(&unix_sk(u->peer_wake.private)->peer_wait,
++ q);
++ u->peer_wake.private = NULL;
++
++ /* relaying can only happen while the wq still exists */
++ u_sleep = sk_sleep(&u->sk);
++ if (u_sleep)
++ wake_up_interruptible_poll(u_sleep, key);
++
++ return 0;
++}
++
++static int unix_dgram_peer_wake_connect(struct sock *sk, struct sock *other)
++{
++ struct unix_sock *u, *u_other;
++ int rc;
++
++ u = unix_sk(sk);
++ u_other = unix_sk(other);
++ rc = 0;
++ spin_lock(&u_other->peer_wait.lock);
++
++ if (!u->peer_wake.private) {
++ u->peer_wake.private = other;
++ __add_wait_queue(&u_other->peer_wait, &u->peer_wake);
++
++ rc = 1;
++ }
++
++ spin_unlock(&u_other->peer_wait.lock);
++ return rc;
++}
++
++static void unix_dgram_peer_wake_disconnect(struct sock *sk,
++ struct sock *other)
++{
++ struct unix_sock *u, *u_other;
++
++ u = unix_sk(sk);
++ u_other = unix_sk(other);
++ spin_lock(&u_other->peer_wait.lock);
++
++ if (u->peer_wake.private == other) {
++ __remove_wait_queue(&u_other->peer_wait, &u->peer_wake);
++ u->peer_wake.private = NULL;
++ }
++
++ spin_unlock(&u_other->peer_wait.lock);
++}
++
++static void unix_dgram_peer_wake_disconnect_wakeup(struct sock *sk,
++ struct sock *other)
++{
++ unix_dgram_peer_wake_disconnect(sk, other);
++ wake_up_interruptible_poll(sk_sleep(sk),
++ POLLOUT |
++ POLLWRNORM |
++ POLLWRBAND);
++}
++
++/* preconditions:
++ * - unix_peer(sk) == other
++ * - association is stable
++ */
++static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other)
++{
++ int connected;
++
++ connected = unix_dgram_peer_wake_connect(sk, other);
++
++ if (unix_recvq_full(other))
++ return 1;
++
++ if (connected)
++ unix_dgram_peer_wake_disconnect(sk, other);
++
++ return 0;
++}
++
+ static inline int unix_writable(struct sock *sk)
+ {
+ return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
+@@ -430,6 +542,8 @@ static void unix_release_sock(struct sock *sk, int embrion)
+ skpair->sk_state_change(skpair);
+ sk_wake_async(skpair, SOCK_WAKE_WAITD, POLL_HUP);
+ }
++
++ unix_dgram_peer_wake_disconnect(sk, skpair);
+ sock_put(skpair); /* It may now die */
+ unix_peer(sk) = NULL;
+ }
+@@ -440,6 +554,7 @@ static void unix_release_sock(struct sock *sk, int embrion)
+ if (state == TCP_LISTEN)
+ unix_release_sock(skb->sk, 1);
+ /* passed fds are erased in the kfree_skb hook */
++ UNIXCB(skb).consumed = skb->len;
+ kfree_skb(skb);
+ }
+
+@@ -664,6 +779,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
+ INIT_LIST_HEAD(&u->link);
+ mutex_init(&u->readlock); /* single task reading lock */
+ init_waitqueue_head(&u->peer_wait);
++ init_waitqueue_func_entry(&u->peer_wake, unix_dgram_peer_wake_relay);
+ unix_insert_socket(unix_sockets_unbound(sk), sk);
+ out:
+ if (sk == NULL)
+@@ -1031,6 +1147,8 @@ restart:
+ if (unix_peer(sk)) {
+ struct sock *old_peer = unix_peer(sk);
+ unix_peer(sk) = other;
++ unix_dgram_peer_wake_disconnect_wakeup(sk, old_peer);
++
+ unix_state_double_unlock(sk, other);
+
+ if (other != old_peer)
+@@ -1432,6 +1550,14 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen
+ return err;
+ }
+
++static bool unix_passcred_enabled(const struct socket *sock,
++ const struct sock *other)
++{
++ return test_bit(SOCK_PASSCRED, &sock->flags) ||
++ !other->sk_socket ||
++ test_bit(SOCK_PASSCRED, &other->sk_socket->flags);
++}
++
+ /*
+ * Some apps rely on write() giving SCM_CREDENTIALS
+ * We include credentials if source or destination socket
+@@ -1442,14 +1568,41 @@ static void maybe_add_creds(struct sk_buff *skb, const struct socket *sock,
+ {
+ if (UNIXCB(skb).pid)
+ return;
+- if (test_bit(SOCK_PASSCRED, &sock->flags) ||
+- !other->sk_socket ||
+- test_bit(SOCK_PASSCRED, &other->sk_socket->flags)) {
++ if (unix_passcred_enabled(sock, other)) {
+ UNIXCB(skb).pid = get_pid(task_tgid(current));
+ current_uid_gid(&UNIXCB(skb).uid, &UNIXCB(skb).gid);
+ }
+ }
+
++static int maybe_init_creds(struct scm_cookie *scm,
++ struct socket *socket,
++ const struct sock *other)
++{
++ int err;
++ struct msghdr msg = { .msg_controllen = 0 };
++
++ err = scm_send(socket, &msg, scm, false);
++ if (err)
++ return err;
++
++ if (unix_passcred_enabled(socket, other)) {
++ scm->pid = get_pid(task_tgid(current));
++ current_uid_gid(&scm->creds.uid, &scm->creds.gid);
++ }
++ return err;
++}
++
++static bool unix_skb_scm_eq(struct sk_buff *skb,
++ struct scm_cookie *scm)
++{
++ const struct unix_skb_parms *u = &UNIXCB(skb);
++
++ return u->pid == scm->pid &&
++ uid_eq(u->uid, scm->creds.uid) &&
++ gid_eq(u->gid, scm->creds.gid) &&
++ unix_secdata_eq(scm, skb);
++}
++
+ /*
+ * Send AF_UNIX data.
+ */
+@@ -1470,6 +1623,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
+ struct scm_cookie scm;
+ int max_level;
+ int data_len = 0;
++ int sk_locked;
+
+ wait_for_unix_gc();
+ err = scm_send(sock, msg, &scm, false);
+@@ -1548,12 +1702,14 @@ restart:
+ goto out_free;
+ }
+
++ sk_locked = 0;
+ unix_state_lock(other);
++restart_locked:
+ err = -EPERM;
+ if (!unix_may_send(sk, other))
+ goto out_unlock;
+
+- if (sock_flag(other, SOCK_DEAD)) {
++ if (unlikely(sock_flag(other, SOCK_DEAD))) {
+ /*
+ * Check with 1003.1g - what should
+ * datagram error
+@@ -1561,10 +1717,14 @@ restart:
+ unix_state_unlock(other);
+ sock_put(other);
+
++ if (!sk_locked)
++ unix_state_lock(sk);
++
+ err = 0;
+- unix_state_lock(sk);
+ if (unix_peer(sk) == other) {
+ unix_peer(sk) = NULL;
++ unix_dgram_peer_wake_disconnect_wakeup(sk, other);
++
+ unix_state_unlock(sk);
+
+ unix_dgram_disconnected(sk, other);
+@@ -1590,21 +1750,38 @@ restart:
+ goto out_unlock;
+ }
+
+- if (unix_peer(other) != sk && unix_recvq_full(other)) {
+- if (!timeo) {
+- err = -EAGAIN;
+- goto out_unlock;
++ if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
++ if (timeo) {
++ timeo = unix_wait_for_peer(other, timeo);
++
++ err = sock_intr_errno(timeo);
++ if (signal_pending(current))
++ goto out_free;
++
++ goto restart;
+ }
+
+- timeo = unix_wait_for_peer(other, timeo);
++ if (!sk_locked) {
++ unix_state_unlock(other);
++ unix_state_double_lock(sk, other);
++ }
+
+- err = sock_intr_errno(timeo);
+- if (signal_pending(current))
+- goto out_free;
++ if (unix_peer(sk) != other ||
++ unix_dgram_peer_wake_me(sk, other)) {
++ err = -EAGAIN;
++ sk_locked = 1;
++ goto out_unlock;
++ }
+
+- goto restart;
++ if (!sk_locked) {
++ sk_locked = 1;
++ goto restart_locked;
++ }
+ }
+
++ if (unlikely(sk_locked))
++ unix_state_unlock(sk);
++
+ if (sock_flag(other, SOCK_RCVTSTAMP))
+ __net_timestamp(skb);
+ maybe_add_creds(skb, sock, other);
+@@ -1618,6 +1795,8 @@ restart:
+ return len;
+
+ out_unlock:
++ if (sk_locked)
++ unix_state_unlock(sk);
+ unix_state_unlock(other);
+ out_free:
+ kfree_skb(skb);
+@@ -1739,8 +1918,10 @@ out_err:
+ static ssize_t unix_stream_sendpage(struct socket *socket, struct page *page,
+ int offset, size_t size, int flags)
+ {
+- int err = 0;
+- bool send_sigpipe = true;
++ int err;
++ bool send_sigpipe = false;
++ bool init_scm = true;
++ struct scm_cookie scm;
+ struct sock *other, *sk = socket->sk;
+ struct sk_buff *skb, *newskb = NULL, *tail = NULL;
+
+@@ -1758,7 +1939,7 @@ alloc_skb:
+ newskb = sock_alloc_send_pskb(sk, 0, 0, flags & MSG_DONTWAIT,
+ &err, 0);
+ if (!newskb)
+- return err;
++ goto err;
+ }
+
+ /* we must acquire readlock as we modify already present
+@@ -1767,12 +1948,12 @@ alloc_skb:
+ err = mutex_lock_interruptible(&unix_sk(other)->readlock);
+ if (err) {
+ err = flags & MSG_DONTWAIT ? -EAGAIN : -ERESTARTSYS;
+- send_sigpipe = false;
+ goto err;
+ }
+
+ if (sk->sk_shutdown & SEND_SHUTDOWN) {
+ err = -EPIPE;
++ send_sigpipe = true;
+ goto err_unlock;
+ }
+
+@@ -1781,23 +1962,34 @@ alloc_skb:
+ if (sock_flag(other, SOCK_DEAD) ||
+ other->sk_shutdown & RCV_SHUTDOWN) {
+ err = -EPIPE;
++ send_sigpipe = true;
+ goto err_state_unlock;
+ }
+
++ if (init_scm) {
++ err = maybe_init_creds(&scm, socket, other);
++ if (err)
++ goto err_state_unlock;
++ init_scm = false;
++ }
++
+ skb = skb_peek_tail(&other->sk_receive_queue);
+ if (tail && tail == skb) {
+ skb = newskb;
+- } else if (!skb) {
+- if (newskb)
++ } else if (!skb || !unix_skb_scm_eq(skb, &scm)) {
++ if (newskb) {
+ skb = newskb;
+- else
++ } else {
++ tail = skb;
+ goto alloc_skb;
++ }
+ } else if (newskb) {
+ /* this is fast path, we don't necessarily need to
+ * call to kfree_skb even though with newskb == NULL
+ * this - does no harm
+ */
+ consume_skb(newskb);
++ newskb = NULL;
+ }
+
+ if (skb_append_pagefrags(skb, page, offset, size)) {
+@@ -1810,14 +2002,20 @@ alloc_skb:
+ skb->truesize += size;
+ atomic_add(size, &sk->sk_wmem_alloc);
+
+- if (newskb)
++ if (newskb) {
++ err = unix_scm_to_skb(&scm, skb, false);
++ if (err)
++ goto err_state_unlock;
++ spin_lock(&other->sk_receive_queue.lock);
+ __skb_queue_tail(&other->sk_receive_queue, newskb);
++ spin_unlock(&other->sk_receive_queue.lock);
++ }
+
+ unix_state_unlock(other);
+ mutex_unlock(&unix_sk(other)->readlock);
+
+ other->sk_data_ready(other);
+-
++ scm_destroy(&scm);
+ return size;
+
+ err_state_unlock:
+@@ -1828,6 +2026,8 @@ err:
+ kfree_skb(newskb);
+ if (send_sigpipe && !(flags & MSG_NOSIGNAL))
+ send_sig(SIGPIPE, current, 0);
++ if (!init_scm)
++ scm_destroy(&scm);
+ return err;
+ }
+
+@@ -2071,6 +2271,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state)
+
+ do {
+ int chunk;
++ bool drop_skb;
+ struct sk_buff *skb, *last;
+
+ unix_state_lock(sk);
+@@ -2130,10 +2331,7 @@ unlock:
+
+ if (check_creds) {
+ /* Never glue messages from different writers */
+- if ((UNIXCB(skb).pid != scm.pid) ||
+- !uid_eq(UNIXCB(skb).uid, scm.creds.uid) ||
+- !gid_eq(UNIXCB(skb).gid, scm.creds.gid) ||
+- !unix_secdata_eq(&scm, skb))
++ if (!unix_skb_scm_eq(skb, &scm))
+ break;
+ } else if (test_bit(SOCK_PASSCRED, &sock->flags)) {
+ /* Copy credentials */
+@@ -2151,7 +2349,11 @@ unlock:
+ }
+
+ chunk = min_t(unsigned int, unix_skb_len(skb) - skip, size);
++ skb_get(skb);
+ chunk = state->recv_actor(skb, skip, chunk, state);
++ drop_skb = !unix_skb_len(skb);
++ /* skb is only safe to use if !drop_skb */
++ consume_skb(skb);
+ if (chunk < 0) {
+ if (copied == 0)
+ copied = -EFAULT;
+@@ -2160,6 +2362,18 @@ unlock:
+ copied += chunk;
+ size -= chunk;
+
++ if (drop_skb) {
++ /* the skb was touched by a concurrent reader;
++ * we should not expect anything from this skb
++ * anymore and assume it invalid - we can be
++ * sure it was dropped from the socket queue
++ *
++ * let's report a short read
++ */
++ err = 0;
++ break;
++ }
++
+ /* Mark read part of skb as used */
+ if (!(flags & MSG_PEEK)) {
+ UNIXCB(skb).consumed += chunk;
+@@ -2453,14 +2667,16 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
+ return mask;
+
+ writable = unix_writable(sk);
+- other = unix_peer_get(sk);
+- if (other) {
+- if (unix_peer(other) != sk) {
+- sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
+- if (unix_recvq_full(other))
+- writable = 0;
+- }
+- sock_put(other);
++ if (writable) {
++ unix_state_lock(sk);
++
++ other = unix_peer(sk);
++ if (other && unix_peer(other) != sk &&
++ unix_recvq_full(other) &&
++ unix_dgram_peer_wake_me(sk, other))
++ writable = 0;
++
++ unix_state_unlock(sk);
+ }
+
+ if (writable)
+diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c
+index a97db5fc8a15..9d1f91db57e6 100644
+--- a/sound/pci/hda/patch_hdmi.c
++++ b/sound/pci/hda/patch_hdmi.c
+@@ -48,8 +48,9 @@ MODULE_PARM_DESC(static_hdmi_pcm, "Don't restrict PCM parameters per ELD info");
+ #define is_haswell(codec) ((codec)->core.vendor_id == 0x80862807)
+ #define is_broadwell(codec) ((codec)->core.vendor_id == 0x80862808)
+ #define is_skylake(codec) ((codec)->core.vendor_id == 0x80862809)
++#define is_broxton(codec) ((codec)->core.vendor_id == 0x8086280a)
+ #define is_haswell_plus(codec) (is_haswell(codec) || is_broadwell(codec) \
+- || is_skylake(codec))
++ || is_skylake(codec) || is_broxton(codec))
+
+ #define is_valleyview(codec) ((codec)->core.vendor_id == 0x80862882)
+ #define is_cherryview(codec) ((codec)->core.vendor_id == 0x80862883)
+diff --git a/tools/net/Makefile b/tools/net/Makefile
+index ee577ea03ba5..ddf888010652 100644
+--- a/tools/net/Makefile
++++ b/tools/net/Makefile
+@@ -4,6 +4,9 @@ CC = gcc
+ LEX = flex
+ YACC = bison
+
++CFLAGS += -Wall -O2
++CFLAGS += -D__EXPORTED_HEADERS__ -I../../include/uapi -I../../include
++
+ %.yacc.c: %.y
+ $(YACC) -o $@ -d $<
+
+@@ -12,15 +15,13 @@ YACC = bison
+
+ all : bpf_jit_disasm bpf_dbg bpf_asm
+
+-bpf_jit_disasm : CFLAGS = -Wall -O2 -DPACKAGE='bpf_jit_disasm'
++bpf_jit_disasm : CFLAGS += -DPACKAGE='bpf_jit_disasm'
+ bpf_jit_disasm : LDLIBS = -lopcodes -lbfd -ldl
+ bpf_jit_disasm : bpf_jit_disasm.o
+
+-bpf_dbg : CFLAGS = -Wall -O2
+ bpf_dbg : LDLIBS = -lreadline
+ bpf_dbg : bpf_dbg.o
+
+-bpf_asm : CFLAGS = -Wall -O2 -I.
+ bpf_asm : LDLIBS =
+ bpf_asm : bpf_asm.o bpf_exp.yacc.o bpf_exp.lex.o
+ bpf_exp.lex.o : bpf_exp.yacc.c
end of thread, other threads:[~2015-12-15 11:15 UTC | newest]
Thread overview: 17+ messages
2015-10-03 16:12 [gentoo-commits] proj/linux-patches:4.2 commit in: / Mike Pagano
-- strict thread matches above, loose matches on Subject: below --
2015-12-15 11:15 Mike Pagano
2015-12-11 14:31 Mike Pagano
2015-11-10 0:58 Mike Pagano
2015-11-05 23:30 Mike Pagano
2015-10-27 13:36 Mike Pagano
2015-10-23 17:19 Mike Pagano
2015-10-23 17:14 Mike Pagano
2015-09-29 19:16 Mike Pagano
2015-09-29 17:51 Mike Pagano
2015-09-28 23:44 Mike Pagano
2015-09-28 16:49 Mike Pagano
2015-09-22 11:43 Mike Pagano
2015-09-21 22:19 Mike Pagano
2015-09-15 12:31 Mike Pagano
2015-09-02 16:34 Mike Pagano
2015-08-19 14:58 Mike Pagano